7/31/2019 Notes 4 Supp Diffeq

Additional Topics for Chapter 4: Linear Algebra and Differential Equations¹
Matrix Factorization
Review of Elementary Matrices
Definition 1: An elementary matrix is an $n \times n$ matrix that can be obtained by performing a single elementary row operation on the identity matrix $I_n$. (Note that the identity matrix itself is an elementary matrix because we could multiply any row of $I_n$ by the scalar 1.)
Recall that the elementary row operations are:
1. Swap two rows
2. Multiply a row by a nonzero constant
3. Add a multiple of one row to another row
Example 2 (Row swap): Multiplying matrix $A$ by the elementary matrix $E_1$, in which rows 1 and 2 of $I_3$ are swapped, produces a matrix in which rows 1 and 2 of $A$ have also been swapped:

$$\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 2&5&6\\ 1&4&7\\ 3&1&2 \end{bmatrix}$$

Example 3 (Multiplication of a row by a scalar): Multiplying matrix $A$ by the elementary matrix $E_2$, in which the second row of $I_3$ has been multiplied by $\tfrac13$, produces a new matrix in which the second row of $A$ has been multiplied by $\tfrac13$:

$$\begin{bmatrix} 1&0&0\\ 0&\tfrac13&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 1&4&7\\ \tfrac23&\tfrac53&2\\ 3&1&2 \end{bmatrix}$$
Example 4 (Adding a multiple of one row to another): Multiplying matrix $A$ by the elementary matrix $E_3$, in which two times the first row has been subtracted from the second row of $I_3$, produces a new matrix in which two times the first row of $A$ has been subtracted from the second row of $A$:

$$\begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&4&7\\ 2&5&6\\ 3&1&2 \end{bmatrix} = \begin{bmatrix} 1&4&7\\ 0&-3&-8\\ 3&1&2 \end{bmatrix}$$

This leads us to the following theorems, the second of which is a direct result of the fact that elementary row operations are reversible.
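The three examples above are easy to check numerically. Here is a minimal NumPy sketch, using the matrix $A$ and the elementary matrices $E_1$, $E_2$, $E_3$ from Examples 2 through 4:

```python
import numpy as np

A = np.array([[1, 4, 7],
              [2, 5, 6],
              [3, 1, 2]], dtype=float)

# E1: rows 1 and 2 of I3 swapped
E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
# E2: second row of I3 multiplied by 1/3
E2 = np.array([[1, 0, 0], [0, 1/3, 0], [0, 0, 1]])
# E3: two times row 1 subtracted from row 2 of I3
E3 = np.array([[1, 0, 0], [-2, 1, 0], [0, 0, 1]], dtype=float)

print(E1 @ A)  # rows 1 and 2 of A swapped
print(E2 @ A)  # row 2 of A scaled by 1/3
print(E3 @ A)  # row 2 of A minus 2*(row 1 of A)
```

Left-multiplying by each $E_i$ reproduces exactly the row operation used to build that $E_i$ from $I_3$.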
Theorem 5: If an elementary row operation is performed on a matrix $A$, the resulting matrix can also be obtained by multiplying $A$ (on the left) by the corresponding elementary matrix $E$.

Theorem 6: If $E$ is an elementary matrix, then $E^{-1}$ exists and is also an elementary matrix.
As confirmation of the previous theorem, note that the elementary matrices $E_1$, $E_2$, and $E_3$ from above have inverses

$$E_1^{-1} = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}, \quad E_2^{-1} = \begin{bmatrix} 1&0&0\\ 0&3&0\\ 0&0&1 \end{bmatrix}, \quad\text{and}\quad E_3^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}$$

because

$$E_1E_1^{-1} = E_1^{-1}E_1 = \begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 0&1&0\\ 1&0&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},$$

¹ Material from Falvo, David C. and Larson, Ron. Elementary Linear Algebra, 6th ed. Brooks/Cole, 2010.
while

$$E_2E_2^{-1} = E_2^{-1}E_2 = \begin{bmatrix} 1&0&0\\ 0&\tfrac13&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&3&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix},$$

and

$$E_3E_3^{-1} = E_3^{-1}E_3 = \begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix}.$$
Theorem 7: Two matrices $A$ and $B$ are row equivalent if there exists a finite number of elementary matrices $E_1, E_2, \ldots, E_k$ such that $B = E_kE_{k-1}\cdots E_2E_1A$. (In other words, $A$ and $B$ are row equivalent if we can get from $A$ to $B$ via a finite number of elementary row operations.)

Following is an example of elementary matrices in use to reduce a $2 \times 2$ matrix to reduced row-echelon form (i.e., $I_2$ in this case):
Example 8: Start with $A = \begin{bmatrix} 5&18\\ 1&4 \end{bmatrix}$. Each step below lists the current matrix, the elementary row operation applied to it, the corresponding elementary matrix, and its inverse:

$\begin{bmatrix} 5&18\\ 1&4 \end{bmatrix}$ : swap $R_1$ and $R_2$; $E_1 = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 5&18 \end{bmatrix}$ : add $-5R_1$ to $R_2$; $E_2 = \begin{bmatrix} 1&0\\ -5&1 \end{bmatrix}$, $E_2^{-1} = \begin{bmatrix} 1&0\\ 5&1 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 0&-2 \end{bmatrix}$ : multiply $R_2$ by $-\tfrac12$; $E_3 = \begin{bmatrix} 1&0\\ 0&-\tfrac12 \end{bmatrix}$, $E_3^{-1} = \begin{bmatrix} 1&0\\ 0&-2 \end{bmatrix}$

$\begin{bmatrix} 1&4\\ 0&1 \end{bmatrix}$ : add $-4R_2$ to $R_1$; $E_4 = \begin{bmatrix} 1&-4\\ 0&1 \end{bmatrix}$, $E_4^{-1} = \begin{bmatrix} 1&4\\ 0&1 \end{bmatrix}$

$\begin{bmatrix} 1&0\\ 0&1 \end{bmatrix} = I_2$
Then $E_4E_3E_2E_1A = I$. Since each of the $E_i$ are invertible, we also see that

$$E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_4E_3E_2E_1A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}I$$
$$A = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}.$$

In other words,

$$A = \begin{bmatrix} 0&1\\ 1&0 \end{bmatrix}\begin{bmatrix} 1&0\\ 5&1 \end{bmatrix}\begin{bmatrix} 1&0\\ 0&-2 \end{bmatrix}\begin{bmatrix} 1&4\\ 0&1 \end{bmatrix},$$

or, $A$ is the product of the inverses of the elementary matrices that were used to reduce $A$ to $I$.
The LU-Factorization (without row interchanges)

There are a number of "matrix factorizations" in frequent use. Perhaps the most basic of these is what is known as the "LU-factorization." To motivate its development, let us consider an example:

Example 9: Start with $A = \begin{bmatrix} 2&1\\ 8&7 \end{bmatrix}$. We can accomplish row-echelon form with only one row operation. Here is that row operation and its associated elementary matrix:

$\begin{bmatrix} 2&1\\ 8&7 \end{bmatrix}$ : add $-4R_1$ to $R_2$; $E_1 = \begin{bmatrix} 1&0\\ -4&1 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 1&0\\ 4&1 \end{bmatrix}$

$\begin{bmatrix} 2&1\\ 0&3 \end{bmatrix}$
The above example shows that $E_1A = U$, so the relation $A = LU$ implies that $L$ must actually be $E_1^{-1}$, or

$$A = \begin{bmatrix} 2&1\\ 8&7 \end{bmatrix} = \begin{bmatrix} 1&0\\ 4&1 \end{bmatrix}\begin{bmatrix} 2&1\\ 0&3 \end{bmatrix} = LU.$$
What is the significance of this factorization? First of all, we use the letters $L$ and $U$ for a reason. Note that $L$ is lower triangular (any nonzero elements are on or below the diagonal) and $U$ is upper triangular (any nonzero elements are on or above the diagonal). Additionally, the diagonal elements of the $L$ matrix are 1s. Once we have an LU-factorization of a matrix, we can generate an algorithm to easily solve numerous systems involving that same coefficient matrix. The practical significance of this is that it is even more efficient than Gaussian elimination when we need to reuse a coefficient matrix with varying right-hand sides (i.e., what we've been calling the $b$ vector).² Before we proceed, we need to mention an important "lemma" (a lemma is a sort of warm-up to a theorem):
Lemma 10: If $L$ and $\hat{L}$ are lower triangular matrices of the same size, so is their product $L\hat{L}$. Furthermore, if both of the matrices have ones on their diagonals, then so does their product. If $U$ and $\hat{U}$ are upper triangular matrices of the same size, so is their product $U\hat{U}$.
Let us illustrate with another example, this time taking note of the result of the above lemma.
Example 11: Find an LU-factorization of the matrix $A = \begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}$.

Here is the procedure (Gaussian elimination) and its associated elementary matrices:

$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}$ : add $-2R_1$ to $R_2$; $E_1 = \begin{bmatrix} 1&0&0\\ -2&1&0\\ 0&0&1 \end{bmatrix}$, $E_1^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ -2&2&0 \end{bmatrix}$ : add $R_1$ to $R_3$; $E_2 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 1&0&1 \end{bmatrix}$, $E_2^{-1} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ -1&0&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&3&1 \end{bmatrix}$ : add $-R_2$ to $R_3$; $E_3 = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&-1&1 \end{bmatrix}$, $E_3^{-1} = \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&1&1 \end{bmatrix}$

$\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix} = U$
Just as in the earlier $2 \times 2$ example, we have

$$E_3E_2E_1A = U,$$

so

$$E_1^{-1}E_2^{-1}E_3^{-1}E_3E_2E_1A = A = E_1^{-1}E_2^{-1}E_3^{-1}U.$$
² For $n \times n$ systems, LU-factorization requires $(4n^3 - 3n^2 - n)/6$ arithmetic operations for the factorization itself (which only has to be done once and can then be reused). Then each solution of the two resulting triangular systems (more on this later) can be carried out in $2n^2 - n$ operations per system. On the other hand, Gaussian elimination uses $(4n^3 + 9n^2 - 7n)/6$ arithmetic operations to arrive at a solution, and it requires this many operations for each system.
But note that each of the $E_i^{-1}$ is lower triangular with ones on its diagonal. According to the previous lemma, their product will also have this form. Indeed,

$$E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&1&0\\ -1&0&1 \end{bmatrix}\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&1&1 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}, \tag{1}$$

and we realize that $E_1^{-1}E_2^{-1}E_3^{-1} = L$, and that $A = LU$, as desired. In other words, $A$ can be "factored" into

$$A = \begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix} = LU, \tag{2}$$

which again is a product of a lower and an upper triangular matrix. Note too that the result of the multiplication in (1) is a matrix whose diagonal elements are ones and whose other elements are the individual elements of the elementary matrices "condensed" into one matrix. We can look directly at $L$ (at least in this case) and see exactly what row operations were performed to get from $A$ to $U$.
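The elimination in Example 11 can be sketched in code. The `lu_nopivot` helper below is a hypothetical name, not a library routine: it is a Doolittle-style elimination without row interchanges, matching the procedure above. (Library routines such as `scipy.linalg.lu` normally perform row interchanges, so their factors can differ from the hand computation by a permutation.)

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without row interchanges: returns unit lower L and upper U."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]    # multiplier recorded in L
            U[i, :] -= L[i, k] * U[k, :]   # row operation: R_i <- R_i - m * R_k
    return L, U

A = np.array([[2, 1, 1],
              [4, 5, 2],
              [-2, 2, 0]], dtype=float)
L, U = lu_nopivot(A)
print(L)  # matches the L of Eq. (2): multipliers 2, -1, 1 below the unit diagonal
print(U)  # matches U of Eq. (2)
```

The multipliers stored in $L$ are exactly the "condensed" row-operation entries described after Eq. (1).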
Using A = LU to Solve Systems

So how do we use this factorization to solve a system $Ax = b$? We can use a simple two-stage process:

1. Solve the lower triangular system $Ly = b$ for the vector $y$ by forward substitution.

2. Solve the resulting upper triangular system $Ux = y$ for $x$ by back substitution.

The above two-stage process works because if $Ux = y$ and $Ly = b$, then $Ax = LUx = Ly = b$.
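The two stages translate directly into code. `forward_sub` and `back_sub` below are hypothetical helper names, and the right-hand side $b = (1, 2, 2)$ is used purely for illustration, with $L$ and $U$ taken from the factorization in (2):

```python
import numpy as np

def forward_sub(L, b):
    """Solve Ly = b for unit-lower-triangular L, top row first."""
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    """Solve Ux = y for upper-triangular U, bottom row first."""
    n = len(y)
    x = np.zeros_like(y, dtype=float)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1, 0, 0], [2, 1, 0], [-1, 1, 1]], dtype=float)
U = np.array([[2, 1, 1], [0, 3, 0], [0, 0, 1]], dtype=float)
b = np.array([1, 2, 2], dtype=float)   # assumed right-hand side for illustration

y = forward_sub(L, b)   # stage 1: Ly = b
x = back_sub(U, y)      # stage 2: Ux = y
print(x)
```

Each triangular solve costs only $O(n^2)$ operations, which is why the factorization pays off when many right-hand sides share the same coefficient matrix.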
As an example, consider the LU-factorization we found in (2) above, namely

$$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix} = \begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} 2&1&1\\ 0&3&0\\ 0&0&1 \end{bmatrix}.$$

Suppose we seek to find the solution to the system

$$\begin{bmatrix} 2&1&1\\ 4&5&2\\ -2&2&0 \end{bmatrix}\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 2 \end{bmatrix}.$$

We first solve the lower triangular system

$$\begin{bmatrix} 1&0&0\\ 2&1&0\\ -1&1&1 \end{bmatrix}\begin{bmatrix} a\\ b\\ c \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 2 \end{bmatrix},$$
$$\langle u, v\rangle = c_1u_1v_1 + c_2u_2v_2 + \cdots + c_nu_nv_n$$

with positive scalars $c_i$ (called weights) are all inner products on $\mathbb{R}^n$. Note the condition $c_i > 0$: if any of the $c_i$ are zero or negative, the product is no longer an inner product.
Example 17: Consider the real-valued, continuous functions in the vector space $C[a, b]$ (the space of all continuous functions on the interval $[a, b]$). Then $\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx$ is an inner product on $C[a, b]$:

(1) $\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx = \int_a^b g(x)\,f(x)\,dx = \langle g, f\rangle$

(2) $\langle f, g + h\rangle = \int_a^b f(x)\,[g(x) + h(x)]\,dx = \int_a^b f(x)\,g(x)\,dx + \int_a^b f(x)\,h(x)\,dx = \langle f, g\rangle + \langle f, h\rangle$

(3) $c\,\langle f, g\rangle = c\int_a^b f(x)\,g(x)\,dx = \int_a^b (cf(x))\,g(x)\,dx = \langle cf, g\rangle$

(4) $\langle f, f\rangle = \int_a^b f(x)\,f(x)\,dx \ge 0$ because $(f(x))^2 \ge 0$ for all $x$. Additionally, $\langle f, f\rangle = 0$ if and only if $f(x) = 0$ or $a = b$.
Orthogonal Projections

Review of Dot Products and Orthogonality

Recall the following:

- Two vectors are said to be orthogonal if their dot product is zero, namely $u \cdot v = 0$ or $u^Tv = 0$, where $u$ and $v$ are column vectors. By definition, the zero vector is orthogonal to all other vectors.

- The angle $\theta$ between two vectors is given by the relation $u \cdot v = \|u\|\,\|v\|\cos\theta$, or $\cos\theta = \dfrac{u \cdot v}{\|u\|\,\|v\|}$.

- The length or norm of a vector is given by $\|v\|^2 = v \cdot v$.

- The distance between two points (or vectors) is given by $d(u, v) = \|u - v\| = \|v - u\|$.

- A set of vectors is said to be mutually orthogonal if every pair of vectors in the set is orthogonal. Additionally, if all of the vectors are unit vectors (i.e., have length of one), the set is said to be orthonormal.

- An orthogonal set of nonzero vectors is linearly independent.

- A basis that is an orthogonal set is called an orthogonal basis. If the vectors in the basis are all of length one, the basis is called an orthonormal basis. (All of the familiar "standard" bases are orthonormal, e.g. $\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$.)
Orthogonal and Orthonormal Bases

Why make a big deal out of orthogonal and orthonormal bases? It turns out that the orthonormal bases of a vector space are quite useful because there is a simple formula for writing any vector in the vector space as a linear combination of those orthonormal basis vectors. We do not have to start over and solve a system of equations just to determine the coefficients of the given vector relative to the basis every single time. Here is the derivation of that formula.

Suppose we have an orthonormal basis $\{u_1, \ldots, u_n\}$ for a vector space $V$. If $v$ is a vector in $V$, there must exist scalars $c_1, \ldots, c_n$ such that

$$v = c_1u_1 + c_2u_2 + \cdots + c_nu_n. \tag{3}$$

We seek a formula to determine each of the $c_i$s. Start with the $i$th basis vector, namely $u_i$. If we take the dot product of $u_i$ with both sides of (3), we have

$$v \cdot u_i = (c_1u_1 + c_2u_2 + \cdots + c_nu_n) \cdot u_i,$$
and using the properties of dot products, this leads to

$$v \cdot u_i = (c_1u_1 + c_2u_2 + \cdots + c_nu_n) \cdot u_i = c_1u_1 \cdot u_i + c_2u_2 \cdot u_i + \cdots + c_nu_n \cdot u_i.$$

Now, since each of the basis vectors are mutually orthogonal, we must have $u_i \cdot u_j = 0$ for any two distinct vectors in the set $\{u_1, \ldots, u_n\}$ (i.e., $u_i \cdot u_j = 0$ unless $i = j$). Therefore,

$$v \cdot u_i = 0 + 0 + \cdots + c_iu_i \cdot u_i + \cdots + 0 + 0.$$

Since the basis vectors are orthonormal, we know their lengths are all one, so $u_i \cdot u_i = \|u_i\|^2 = 1$, and

$$v \cdot u_i = c_i(u_i \cdot u_i) = c_i.$$

We have therefore found a formula for the $i$th coefficient $c_i$. As $i$ ranges from 1 to $n$, we find that $c_1 = v \cdot u_1$, $c_2 = v \cdot u_2$, ..., $c_n = v \cdot u_n$. Consequently, we have proven the following theorem.

Theorem 18: If $\{u_1, \ldots, u_n\}$ is an orthonormal basis for a vector space $V$, any vector $v$ in $V$ can be written as a linear combination of these basis vectors as follows:

$$v = c_1u_1 + c_2u_2 + \cdots + c_nu_n = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 + \cdots + (v \cdot u_n)u_n.$$
Example 19: The vectors $u_1 = (0, 1, 0)$, $u_2 = \left(\tfrac35, 0, \tfrac45\right)$, and $u_3 = \left(-\tfrac45, 0, \tfrac35\right)$ form an orthonormal basis $B$ for $\mathbb{R}^3$. Express the vector $v = (-2, 3, 1)$ as a linear combination of these basis vectors.

Solution 20: Take the three required dot products:

$$v \cdot u_1 = (-2, 3, 1) \cdot (0, 1, 0) = 3$$
$$v \cdot u_2 = (-2, 3, 1) \cdot \left(\tfrac35, 0, \tfrac45\right) = -\tfrac25$$
$$v \cdot u_3 = (-2, 3, 1) \cdot \left(-\tfrac45, 0, \tfrac35\right) = \tfrac{11}5$$

These scalars represent the "coordinates of $v$ relative to the basis $B$," and

$$v = 3\,(0, 1, 0) - \tfrac25\left(\tfrac35, 0, \tfrac45\right) + \tfrac{11}5\left(-\tfrac45, 0, \tfrac35\right).$$

(Multiply it out to confirm this!)
Furthermore, note that taking dot products in this manner, with the first vector the same each time, is equivalent to the following matrix multiplications:

$$\begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} = 3, \quad \begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} \tfrac35\\ 0\\ \tfrac45 \end{bmatrix} = -\tfrac25, \quad\text{and}\quad \begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} -\tfrac45\\ 0\\ \tfrac35 \end{bmatrix} = \tfrac{11}5,$$

and we can combine all of them into a single matrix multiplication:

$$\begin{bmatrix} -2&3&1 \end{bmatrix}\begin{bmatrix} 0&\tfrac35&-\tfrac45\\ 1&0&0\\ 0&\tfrac45&\tfrac35 \end{bmatrix} = \begin{bmatrix} 3&-\tfrac25&\tfrac{11}5 \end{bmatrix},$$

yielding the desired coefficients of $u_1$, $u_2$, and $u_3$, respectively. (Compare this to the technique we had to use to find the coordinates of a vector relative to a nonstandard basis.)
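Theorem 18 and the matrix form above are easy to replay numerically, using the data of Example 19 (the basis vectors are stored one per row, so one matrix-vector product computes all three dot products at once):

```python
import numpy as np

v = np.array([-2, 3, 1], dtype=float)
# Orthonormal basis from Example 19, one vector per row
B = np.array([[0, 1, 0],
              [3/5, 0, 4/5],
              [-4/5, 0, 3/5]])

coords = B @ v      # c_i = v . u_i, all three dot products at once
print(coords)       # [3, -2/5, 11/5]
print(coords @ B)   # reassembles v from its coordinates: c1*u1 + c2*u2 + c3*u3
```

No linear system needs to be solved: the coordinates relative to an orthonormal basis are just dot products.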
Distance and Projections
We quite often need to determine the distance between a point $b$ and a line in the direction of vector $a$, as shown in the figure below. Or, we might want to determine "how much" of the force vector $b$ is pointing in the direction of $a$. (We have probably all done this with respect to the coordinate axes in the former case or horizontal and vertical vector components in the latter.) Regardless of the question, the approach is the same. We need to determine the projection of $b$ onto $a$, denoted by $\operatorname{proj}_a b$ and represented by $p$ in the figure.

[Figure: the vectors $a$ and $b$ drawn from the origin $O$, the projection $p$ of $b$ onto $a$, and the error vector $e = b - p$ perpendicular to $a$.]

It might help to think of $\operatorname{proj}_a b$ as what $b$ would look like if you were "above" it and looking directly down at $a$, with a line of sight perpendicular to $a$.

We will now derive the formula for $p$. Note that $p$ must be some scalar multiple of vector $a$ because it is in the same direction (or the opposite direction if the angle is obtuse). Therefore, $p = ca$, and we need to solve for $c$. Of course, the point on the vector $a$ that is closest to $b$ is the point at the foot of the perpendicular dropped from $b$ onto $a$. In other words, the line from $b$ to the closest point $p$ on $a$ is perpendicular to $a$. Note that in terms of vector subtraction, the side opposite the vertex $O$ (denoted $e$ in the figure) represents the vector subtraction $e = b - p$, or, because $p = ca$, $e = b - ca$. Since vector $e$ is perpendicular to $a$, we must have
$$a \cdot e = 0, \quad\text{or}\quad a \cdot (b - ca) = 0, \quad\text{or}\quad a \cdot b - c\,a \cdot a = 0,$$

which in turn leads to the solution

$$c = \frac{a \cdot b}{a \cdot a}.$$

Therefore, the projection $p$ of vector $b$ onto $a$ is given by

$$p = \operatorname{proj}_a b = ca = \frac{a \cdot b}{a \cdot a}\,a. \tag{4}$$

If we rewrite the dot products in (4) in the equivalent form $a \cdot b = a^Tb$ and $a \cdot a = a^Ta$, we have

$$\operatorname{proj}_a b = \frac{a^Tb}{a^Ta}\,a.$$

Realizing that this is a scalar $\frac{a^Tb}{a^Ta}$ multiplied by the vector $a$ and rearranging, we have³

$$\operatorname{proj}_a b = \frac{a\,a^Tb}{a^Ta} = \frac{aa^T}{a^Ta}\,b.$$

Note that the quantity $\frac{aa^T}{a^Ta}$ actually represents a matrix called the projection matrix $P$. (It is a matrix because $aa^T$ is a column times a row, say an $n \times 1$ times a $1 \times n$, so the product is an $n \times n$ matrix, while $a^Ta$ is the familiar dot product of $a$ with itself.) Thus we conclude that the projection of $b$ onto $a$ can be found by multiplying the projection matrix $P = \frac{aa^T}{a^Ta}$ by the vector $b$:

$$p = Pb.$$

³ The $1 \times 1$ "matrix" (i.e., scalar) $a^Ta$ is called an "inner product," while the $n \times n$ matrix $aa^T$ is called an "outer product."
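Formula (4) and the projection matrix can be sketched in a few lines; `projection_matrix` is a hypothetical helper name, and the data anticipates the example that follows:

```python
import numpy as np

def projection_matrix(a):
    """P = (a a^T) / (a^T a): projects any vector onto the line through a."""
    a = np.asarray(a, dtype=float)
    return np.outer(a, a) / (a @ a)   # outer product over inner product

a = np.array([1, 1, 1], dtype=float)
P = projection_matrix(a)   # every entry equals 1/3
b = np.array([2, 1, 5], dtype=float)
p = P @ b
print(p)                   # (8/3, 8/3, 8/3)
print(P @ p)               # projecting a second time changes nothing: PP = P
```

The last line illustrates the defining property of a projection matrix: applying it twice is the same as applying it once.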
Example 21: The matrix that projects any vector onto the line through the point $a = (1, 1, 1)$ is given by

$$P = \frac{aa^T}{a^Ta} = \frac13\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}\begin{bmatrix} 1&1&1 \end{bmatrix} = \begin{bmatrix} \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13 \end{bmatrix}.$$

For example, to determine the projection of $(2, 1, 5)$ onto the line through $(1, 1, 1)$, we would simply calculate

$$\begin{bmatrix} \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13\\ \tfrac13&\tfrac13&\tfrac13 \end{bmatrix}\begin{bmatrix} 2\\ 1\\ 5 \end{bmatrix} = \begin{bmatrix} \tfrac83\\ \tfrac83\\ \tfrac83 \end{bmatrix}.$$

Note again the ease with which projections can be found if the vector $a$ has unit length. The dot product $a \cdot a$ would be 1, and the resulting formulas would become

$$\operatorname{proj}_a b = (a \cdot b)\,a \quad\text{and}\quad P = aa^T.$$
Example 22: Determine the projection of the vector $v = (6, 7)$ onto the vector $u = (1, 4)$.

Method 1: Using the formula $\operatorname{proj}_a b = \frac{a \cdot b}{a \cdot a}\,a$, we have

$$\operatorname{proj}_u v = \tfrac{34}{17}(1, 4) = (2, 8).$$

Method 2: Using the projection matrix $P = \frac{aa^T}{a^Ta}$, we find

$$P = \frac1{17}\begin{bmatrix} 1\\ 4 \end{bmatrix}\begin{bmatrix} 1&4 \end{bmatrix} = \begin{bmatrix} \tfrac1{17}&\tfrac4{17}\\ \tfrac4{17}&\tfrac{16}{17} \end{bmatrix}.$$

Then

$$\operatorname{proj}_u v = Pv = \begin{bmatrix} \tfrac1{17}&\tfrac4{17}\\ \tfrac4{17}&\tfrac{16}{17} \end{bmatrix}\begin{bmatrix} 6\\ 7 \end{bmatrix} = \begin{bmatrix} 2\\ 8 \end{bmatrix},$$

both of which agree with the figure shown below.
both of which appear to agree with the gure shown below.
(2,8)
(1,4)
(6,7)
(2,8)
(1,4)
(6,7)
u
p
O
v
Gram-Schmidt Orthonormalization

Recall that, in $\mathbb{R}^2$, the projection of a vector $v$ onto a nonzero vector $u$ is given by

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u.$$

If the vector $u$ is of unit length, this projection becomes

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u = (u \cdot v)\,u. \tag{5}$$

Now suppose we have a basis $\{w_1, \ldots, w_n\}$ for some vector space $V$ and we wish to use this basis to construct an orthogonal (or orthonormal) basis $\{v_1, \ldots, v_n\}$ for $V$. Start by choosing

$$v_1 = w_1$$

(where $v_1 \neq 0$ because $w_1$ was a member of the original basis). We then require that the second vector be orthogonal to the first, or $v_1 \cdot v_2 = 0$. We've seen previously that at least one way to obtain an orthogonal vector is to consider the perpendicular dropped from $v$ onto $u$ in the projection $\operatorname{proj}_u v$:
[Figure: the vector $v$, its projection $\operatorname{proj}_u(v)$ onto $u$, and the perpendicular $v - \operatorname{proj}_u(v)$, drawn from the origin $O$.]
So let's take the next vector, $v_2$, to be the perpendicular dropped from $w_2$ onto $v_1$, i.e.

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2. \tag{6}$$

As confirmation of this choice, note that this will satisfy the orthogonality requirement because

$$v_1 \cdot v_2 = v_1 \cdot \left(w_2 - \operatorname{proj}_{v_1} w_2\right) = v_1 \cdot w_2 - \frac{v_1 \cdot w_2}{v_1 \cdot v_1}\,v_1 \cdot v_1 = v_1 \cdot w_2 - v_1 \cdot w_2 = 0.$$

Because $v_1 = w_1$ and $w_2$ are members of the original basis, we know they are linearly independent, and therefore $v_1$ and $v_2$ are also linearly independent; thus $v_2 = w_2 - \frac{v_1 \cdot w_2}{v_1 \cdot v_1}\,v_1 \neq 0$.

Now we need the third basis vector to be perpendicular to the first two. Note from Eq. (6) that in order to construct a new orthogonal basis vector (i.e., $v_2$), we took the next given basis vector (i.e., $w_2$) and removed the component of $w_2$ that pointed in the direction of $v_1$, our already settled basis vector. If we continue in this manner, to find $v_3$ we subtract the components of $w_3$ in the directions of $v_1$ and $v_2$ to obtain a vector that is perpendicular to both $v_1$ and $v_2$; then to find $v_4$ we subtract the components of $w_4$ in the directions of $v_1$, $v_2$, and $v_3$, and so on. In other words, we will take

$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3,$$

and then

$$v_4 = w_4 - \operatorname{proj}_{v_1} w_4 - \operatorname{proj}_{v_2} w_4 - \operatorname{proj}_{v_3} w_4,$$

and so on. This leads to the following generalization:
Theorem 23 (Gram-Schmidt Orthogonalization): Let $W = \{w_1, \ldots, w_n\}$ be a basis for a vector space $V$. To create a set of orthogonal basis vectors $B = \{v_1, \ldots, v_n\}$ from $W$, construct the $v_i$ as follows:

$$v_1 = w_1$$
$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2$$
$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3$$
$$\vdots$$
$$v_n = w_n - \operatorname{proj}_{v_1} w_n - \operatorname{proj}_{v_2} w_n - \cdots - \operatorname{proj}_{v_{n-1}} w_n$$

To create an orthonormal basis, normalize each of the vectors $v_i$.

If we normalize the vectors as we go through the process, all of the dot products, as we are reminded in (5), are easier to calculate. However, the normalization usually introduces many square roots into the calculation, which may be cumbersome to work with.
Here are some examples of this process.
Example 24: Apply the Gram-Schmidt process to the following basis for $\mathbb{R}^2$: $B = \{(1, 1), (0, 1)\}$.

Solution: Choose $v_1 = (1, 1)$. Then remove the component of $w_2 = (0, 1)$ that points in the direction of $v_1$:

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2 = (0, 1) - \frac{(1, 1) \cdot (0, 1)}{(1, 1) \cdot (1, 1)}\,(1, 1) = (0, 1) - \left(\tfrac12, \tfrac12\right) = \left(-\tfrac12, \tfrac12\right).$$

Therefore an orthogonal basis for $\mathbb{R}^2$ based on the two vectors $(1, 1)$ and $(0, 1)$ would be $(1, 1)$ and $\left(-\tfrac12, \tfrac12\right)$. If we desire an orthonormal basis, divide each vector by its respective length, namely $\|v_1\| = \sqrt2$ and $\|v_2\| = \tfrac1{\sqrt2}$, so the basis would be $\left(\tfrac{\sqrt2}2, \tfrac{\sqrt2}2\right)$ and $\left(-\tfrac{\sqrt2}2, \tfrac{\sqrt2}2\right)$.

Note: Had we chosen $v_1 = (0, 1)$, we would have found

$$v_2 = (1, 1) - \frac{(0, 1) \cdot (1, 1)}{(0, 1) \cdot (0, 1)}\,(0, 1) = (1, 0),$$

which we should have been able to guess in the first place, since $(1, 0)$ and $(0, 1)$ make up the standard basis for $\mathbb{R}^2$!
Example 25: Apply the Gram-Schmidt process to the following basis for a three-dimensional subspace of $\mathbb{R}^4$: $B = \{(1, 2, 0, 3), (4, 0, 5, 8), (8, 1, 5, 6)\}$.

Solution: Choose $v_1 = (1, 2, 0, 3)$. Then remove the component of $w_2 = (4, 0, 5, 8)$ that points in the direction of $v_1$:

$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2 = (4, 0, 5, 8) - \frac{(1, 2, 0, 3) \cdot (4, 0, 5, 8)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) = (4, 0, 5, 8) - (2, 4, 0, 6) = (2, -4, 5, 2).$$
Now remove the components of $w_3 = (8, 1, 5, 6)$ that point in the directions of $v_1$ and $v_2$:

$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3 = (8, 1, 5, 6) - \frac{(1, 2, 0, 3) \cdot (8, 1, 5, 6)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) - \frac{(2, -4, 5, 2) \cdot (8, 1, 5, 6)}{(2, -4, 5, 2) \cdot (2, -4, 5, 2)}\,(2, -4, 5, 2)$$
$$= (8, 1, 5, 6) - (2, 4, 0, 6) - (2, -4, 5, 2) = (4, 1, 0, -2).$$

We conclude that the set $\{(1, 2, 0, 3), (2, -4, 5, 2), (4, 1, 0, -2)\}$ constitutes an orthogonal basis for this particular subspace. We get an orthonormal basis by dividing each vector by its length:

$$\|(1, 2, 0, 3)\| = \sqrt{14}, \quad \|(2, -4, 5, 2)\| = 7, \quad \|(4, 1, 0, -2)\| = \sqrt{21},$$

so the orthonormal basis is given by

$$\left(\tfrac1{\sqrt{14}}, \tfrac2{\sqrt{14}}, 0, \tfrac3{\sqrt{14}}\right), \quad \left(\tfrac27, -\tfrac47, \tfrac57, \tfrac27\right), \quad \left(\tfrac4{\sqrt{21}}, \tfrac1{\sqrt{21}}, 0, -\tfrac2{\sqrt{21}}\right).$$
Projection and Distances on Subspaces; QR-Factorization

Quick Review

We now know how to project one vector onto another vector, namely via any of the following formulas:

$$\operatorname{proj}_u v = \frac{u \cdot v}{u \cdot u}\,u \quad\text{or}\quad \operatorname{proj}_u v = \frac{u^Tv}{u^Tu}\,u \quad\text{or}\quad \operatorname{proj}_u v = \frac{uu^T}{u^Tu}\,v.$$

We also know how to write any vector $w$ in a vector space $V$ in terms of its orthonormal basis vectors $\{u_1, \ldots, u_n\}$:

$$w = (w \cdot u_1)u_1 + (w \cdot u_2)u_2 + \cdots + (w \cdot u_n)u_n.$$

Finally, we've devised a way to generate an orthonormal basis $\{v_1, \ldots, v_n\}$ from another basis $\{w_1, \ldots, w_n\}$ via the Gram-Schmidt process:

$$v_1 = w_1$$
$$v_2 = w_2 - \operatorname{proj}_{v_1} w_2$$
$$v_3 = w_3 - \operatorname{proj}_{v_1} w_3 - \operatorname{proj}_{v_2} w_3$$
$$\vdots$$
$$v_n = w_n - \operatorname{proj}_{v_1} w_n - \operatorname{proj}_{v_2} w_n - \cdots - \operatorname{proj}_{v_{n-1}} w_n.$$
Projection onto a Subspace

The projection of a vector $v$ onto a subspace tells us "how much" of the given vector $v$ lies in that particular subspace. Put another way (and rather non-rigorously), the projection of $v$ onto the subspace tells us "how many" of each of the subspace's orthonormal basis vectors we would need to represent $v$. We have met this quantity before, and you should recognize the right-hand side of the following.

Definition 26: Consider the subspace $W$ of $\mathbb{R}^n$ and let $\{u_1, \ldots, u_k\}$ be an orthonormal basis for $W$. If $v$ is a vector in $\mathbb{R}^n$, the projection of the vector $v$ onto the subspace $W$, denoted $\operatorname{proj}_W v$, is defined as

$$\operatorname{proj}_W v = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 + \cdots + (v \cdot u_k)u_k.$$
This is the exact same formula we encountered when writing a vector in terms of orthonormal basis vectors of a particular subspace! In addition, it would make sense (and we accept without proof) that every vector in $\mathbb{R}^n$ can be "decomposed" into a vector $w$ within a vector space $W$ and a vector $w^\perp$ orthogonal to $W$. In symbols,

$$v = w + w^\perp, \quad\text{where } w \text{ is in } W \text{ and } w^\perp \text{ is in } W^\perp.$$

It should come as no surprise, especially if one considers the two-dimensional case, that

$$w = \operatorname{proj}_W v,$$

and because $v = w + w^\perp$, we must have

$$w^\perp = v - \operatorname{proj}_W v.$$
Example 27: Suppose we have the vector $v = (3, 2, 6)$ in $\mathbb{R}^3$, and we wish to decompose $v$ into the sum of a vector that lies in the subspace $W$ consisting of all vectors of the form $(a, b, b)$ and a vector orthogonal to that subspace.

Solution: The vectors $(1, 0, 0)$ and $(0, 1, 1)$ span all of $W$ and are orthogonal (hence linearly independent), and therefore form a basis for $W$. Normalizing, we find orthonormal basis vectors

$$u_1 = (1, 0, 0) \quad\text{and}\quad u_2 = \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right).$$

Then

$$w = \operatorname{proj}_W v = (v \cdot u_1)u_1 + (v \cdot u_2)u_2 = \left((3, 2, 6) \cdot (1, 0, 0)\right)(1, 0, 0) + \left((3, 2, 6) \cdot \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)\right)\left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)$$
$$= (3, 0, 0) + (0, 4, 4) = (3, 4, 4).$$

Now,

$$w^\perp = v - \operatorname{proj}_W v = (3, 2, 6) - (3, 4, 4) = (0, -2, 2).$$

We can then conclude that $(3, 4, 4)$ is a vector in $W$ while $(0, -2, 2)$ is a vector that is orthogonal to $W$.
Distance from a Point to a Subspace

Again, it would seem reasonable to extend the concept of "distance between points" to "distance from a point to a line" to "distance from a point to a subspace" by realizing that the latter is simply the distance of the point from its projection in the subspace. In symbols,

$$d(x, W) = \left\|x - \operatorname{proj}_W x\right\|.$$

Example 28: Determine the distance of the point $x = (4, -1, 7)$ from the subspace $W$ discussed in the previous example.

Solution: We have already found an orthonormal basis for $W$, namely

$$u_1 = (1, 0, 0) \quad\text{and}\quad u_2 = \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right).$$
Then

$$\operatorname{proj}_W x = (x \cdot u_1)u_1 + (x \cdot u_2)u_2 = \left((4, -1, 7) \cdot (1, 0, 0)\right)(1, 0, 0) + \left((4, -1, 7) \cdot \left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)\right)\left(0, \tfrac1{\sqrt2}, \tfrac1{\sqrt2}\right)$$
$$= (4, 0, 0) + (0, 3, 3) = (4, 3, 3).$$

The distance of the point $x$ from the subspace $W$ is then

$$\left\|x - \operatorname{proj}_W x\right\| = \|(4, -1, 7) - (4, 3, 3)\| = \|(0, -4, 4)\| = \sqrt{32}.$$
Orthogonal Matrices

Definition 29: An orthogonal matrix is a square matrix with orthonormal columns. Denoting this matrix $Q$, it is easy to determine that $Q^TQ = I$, and therefore $Q^T = Q^{-1}$. In other words, the transpose of an orthogonal matrix is its inverse.⁴

Example 30: Consider the rotation matrix $Q = \begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}$. Then $Q^T = \begin{bmatrix} \cos\theta&\sin\theta\\ -\sin\theta&\cos\theta \end{bmatrix}$, and it is easy to verify that $Q^TQ = I$. This type of matrix is called an isometry because it represents a length-preserving transformation. We can calculate the length of $(1, 2)^T$ to be $\sqrt5$. Then

$$\begin{bmatrix} \cos\theta&-\sin\theta\\ \sin\theta&\cos\theta \end{bmatrix}\begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} \cos\theta - 2\sin\theta\\ 2\cos\theta + \sin\theta \end{bmatrix},$$

which still has a length of $\sqrt5$.

Example 31: All permutation matrices are orthogonal; hence we confirm that the inverse of a permutation matrix is actually its transpose.
Another important property of orthogonal matrices is that multiplication by $Q$ preserves lengths, inner products, and angles (i.e., lengths, inner products, and angles that existed before multiplication by $Q$ will be the same after multiplication by $Q$). For instance, lengths are preserved (i.e., $\|Qx\|^2 = \|x\|^2$) because $(Qx)^T(Qx) = x^TQ^TQx = x^Tx$, and inner products are preserved because $(Qx)^T(Qy) = x^TQ^TQy = x^Ty$. Therefore, the following statements are equivalent if they are about an $n \times n$ matrix $Q$:

1. $Q$ is orthogonal.

2. $\|Qx\| = \|x\|$ for every $x$ in $\mathbb{R}^n$.

3. $Qx \cdot Qy = x \cdot y$ for every $x$ and $y$ in $\mathbb{R}^n$.

Note that the discussion earlier regarding the expression of a vector $v$ as a linear combination of a subspace's orthonormal basis vectors can be reinterpreted here if we consider again the system $Ax = b$. This time, however, we will consider $Qx = b$, where the columns of $Q$ are the orthonormal basis vectors. Then writing $b$ as a linear combination of the basis vectors $\{q_1, \ldots, q_n\}$ simply equates to solving the system

$$x_1q_1 + x_2q_2 + \cdots + x_nq_n = b, \quad\text{or}\quad Qx = b.$$

The solution to this system is $x = Q^{-1}b$, and since $Q^{-1} = Q^T$, this becomes
$$x = Q^Tb = \begin{bmatrix} q_1^T\\ \vdots\\ q_n^T \end{bmatrix}\begin{bmatrix} \vdots\\ b\\ \vdots \end{bmatrix} = \begin{bmatrix} q_1^Tb\\ \vdots\\ q_n^Tb \end{bmatrix}, \tag{7}$$

⁴ The $Q^TQ = I$ relation still works even if $Q$ is not square (a rectangular matrix with orthonormal columns). If $Q$ is an $m \times n$ matrix, $Q^T$ would be an $n \times m$ matrix, and their product $Q^TQ$ would be the $n \times n$ identity matrix.
where the components of $x$ are the dot products of the orthonormal basis vectors with $b$, as we would expect.

Note: When we projected a vector $b$ onto a line, we ended up with the expression $\frac{a^Tb}{a^Ta}$. Note here that $a$ is actually $q_i$, and because of the unit lengths, the denominator is 1. What Eq. (7) then shows is that every vector $b$ is the sum of its one-dimensional projections onto the lines spanned by each of the orthonormal vectors $q_i$.

Note: Furthermore, because $Q^T = Q^{-1}$ we have $QQ^T = I$ (in addition to $Q^TQ = I$). This leads to the somewhat remarkable conclusion that the rows of a square matrix are orthonormal whenever the columns are!
QR-Factorization

In the Gram-Schmidt process, we start with independent vectors in $\mathbb{R}^m$, namely $\{a_1, \ldots, a_n\}$, and end with orthonormal vectors $\{q_1, \ldots, q_n\}$ (again in $\mathbb{R}^m$). If we make these vectors the columns of matrices $A$ and $Q$, respectively, we have two $m \times n$ matrices. Is there a third matrix that connects these two?

Recall that we can easily write vectors in a space as linear combinations of the vectors in any orthonormal basis of that space. Since the $q_i$ constitute an orthonormal basis, we have

$$a_1 = (q_1^Ta_1)q_1 + (q_2^Ta_1)q_2 + \cdots + (q_n^Ta_1)q_n$$
$$a_2 = (q_1^Ta_2)q_1 + (q_2^Ta_2)q_2 + \cdots + (q_n^Ta_2)q_n$$
$$a_3 = (q_1^Ta_3)q_1 + (q_2^Ta_3)q_2 + \cdots + (q_n^Ta_3)q_n$$
$$\vdots$$
$$a_n = (q_1^Ta_n)q_1 + (q_2^Ta_n)q_2 + \cdots + (q_n^Ta_n)q_n.$$
However, because of the manner in which the Gram-Schmidt process is performed, we know that the vector $a_1$ is orthogonal to the vectors $q_2, q_3, q_4, \ldots$, the vector $a_2$ is orthogonal to the vectors $q_3, q_4, q_5, \ldots$, the vector $a_3$ is orthogonal to the vectors $q_4, q_5, q_6, \ldots$, and so on. Therefore, all of the dot products $q_j^Ta_i$ with $j > i$ will equal zero, yielding the following:

$$a_1 = (q_1^Ta_1)q_1$$
$$a_2 = (q_1^Ta_2)q_1 + (q_2^Ta_2)q_2$$
$$a_3 = (q_1^Ta_3)q_1 + (q_2^Ta_3)q_2 + (q_3^Ta_3)q_3$$
$$\vdots$$
$$a_n = (q_1^Ta_n)q_1 + (q_2^Ta_n)q_2 + \cdots + (q_n^Ta_n)q_n.$$

Of course, this corresponds exactly to the following system:
$$A = \underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}}_{m \times n} = \underbrace{\begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}}_{m \times n} \underbrace{\begin{bmatrix} q_1^Ta_1 & q_1^Ta_2 & q_1^Ta_3 & \cdots & q_1^Ta_n\\ 0 & q_2^Ta_2 & q_2^Ta_3 & \cdots & q_2^Ta_n\\ 0 & 0 & q_3^Ta_3 & \cdots & q_3^Ta_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & q_n^Ta_n \end{bmatrix}}_{n \times n} = QR,$$

and we have arrived at the QR-factorization of matrix $A$, in which $Q$ has orthonormal columns and $R$ is upper triangular (because of how Gram-Schmidt is performed: we start with vector $a_1$, which falls on the same line as $q_1$; then vectors $a_1$ and $a_2$ are in the same plane as $q_1$ and $q_2$, and so on). Thus matrix $R$ is the matrix that connects $Q$ back to $A$, and we have the following theorem:
Theorem 32: Let $A$ be an $m \times n$ matrix with linearly independent columns. Then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an invertible upper triangular matrix.
Example 33: Find a QR factorization of

$$A = \begin{bmatrix} 1&2&2\\ -1&1&2\\ -1&0&1\\ 1&1&2 \end{bmatrix}.$$

Solution: It is easy to determine that the columns of $A$ are linearly independent, so they form a basis for the subspace spanned by those columns (i.e., the column space of $A$). Start the Gram-Schmidt process by setting $v_1 = a_1$:

$$v_1 = \begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix}.$$

Then,

$$v_2 = \begin{bmatrix} 2\\ 1\\ 0\\ 1 \end{bmatrix} - \frac{v_1 \cdot a_2}{v_1 \cdot v_1}\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ 0\\ 1 \end{bmatrix} - \frac24\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac32\\ \tfrac32\\ \tfrac12\\ \tfrac12 \end{bmatrix}.$$

Note: Since we will be normalizing later, we can "rescale" $v_2$ without changing any orthogonality relationships to make future calculations easier. So we'll replace $v_2$ with $v_2' = (3, 3, 1, 1)$. Finally,

$$v_3 = \begin{bmatrix} 2\\ 2\\ 1\\ 2 \end{bmatrix} - \frac{v_1 \cdot a_3}{v_1 \cdot v_1}\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} - \frac{v_2' \cdot a_3}{v_2' \cdot v_2'}\begin{bmatrix} 3\\ 3\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 2\\ 1\\ 2 \end{bmatrix} - \frac14\begin{bmatrix} 1\\ -1\\ -1\\ 1 \end{bmatrix} - \frac{15}{20}\begin{bmatrix} 3\\ 3\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} -\tfrac12\\ 0\\ \tfrac12\\ 1 \end{bmatrix}.$$

We can again rescale $v_3$ to obtain $v_3' = \begin{bmatrix} -1\\ 0\\ 1\\ 2 \end{bmatrix}$. We now have an orthogonal basis $\{v_1, v_2', v_3'\}$ for the column space $W$. Now, to obtain an orthonormal basis, normalize each vector (the details are left to you):

$$\{q_1, q_2, q_3\} = \left\{\begin{bmatrix} 1/2\\ -1/2\\ -1/2\\ 1/2 \end{bmatrix}, \begin{bmatrix} 3\sqrt5/10\\ 3\sqrt5/10\\ \sqrt5/10\\ \sqrt5/10 \end{bmatrix}, \begin{bmatrix} -\sqrt6/6\\ 0\\ \sqrt6/6\\ \sqrt6/3 \end{bmatrix}\right\}.$$

Now, to obtain a QR factorization for $A$, we have

$$Q = \begin{bmatrix} 1/2 & 3\sqrt5/10 & -\sqrt6/6\\ -1/2 & 3\sqrt5/10 & 0\\ -1/2 & \sqrt5/10 & \sqrt6/6\\ 1/2 & \sqrt5/10 & \sqrt6/3 \end{bmatrix}.$$

Because $Q$ has orthonormal columns, we know that $Q^TQ = I$. Therefore, if $A = QR$,

$$Q^TA = Q^TQR = IR = R.$$
So to find $R$, just calculate $Q^TA$:

$$Q^TA = \begin{bmatrix} 1/2 & -1/2 & -1/2 & 1/2\\ 3\sqrt5/10 & 3\sqrt5/10 & \sqrt5/10 & \sqrt5/10\\ -\sqrt6/6 & 0 & \sqrt6/6 & \sqrt6/3 \end{bmatrix}\begin{bmatrix} 1&2&2\\ -1&1&2\\ -1&0&1\\ 1&1&2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & \tfrac12\\ 0 & \sqrt5 & \tfrac32\sqrt5\\ 0 & 0 & \tfrac12\sqrt6 \end{bmatrix} = R.$$

Note that the diagonal of $R$ contains the lengths of the vectors $v_1$, $v_2$, and $v_3$.
Using the QR Factorization to Solve Systems

Note that the system $Ax = b$ becomes $QRx = b$, and hence

$$Rx = Q^Tb \tag{8}$$

(because $Q^{-1} = Q^T$). Because $R$ is upper triangular, the equation in (8) can be solved easily via back substitution. For example, given the system $Ax = (0, 4, 5)$ and the fact that the $A = QR$ factorization yields

$$\begin{bmatrix} 1&-1&-2\\ 1&0&2\\ 1&2&3 \end{bmatrix} = \begin{bmatrix} \tfrac1{\sqrt3} & -\tfrac4{\sqrt{42}} & -\tfrac2{\sqrt{14}}\\ \tfrac1{\sqrt3} & -\tfrac1{\sqrt{42}} & \tfrac3{\sqrt{14}}\\ \tfrac1{\sqrt3} & \tfrac5{\sqrt{42}} & -\tfrac1{\sqrt{14}} \end{bmatrix}\begin{bmatrix} \sqrt3 & \tfrac1{\sqrt3} & \sqrt3\\ 0 & \tfrac{\sqrt{14}}{\sqrt3} & \tfrac{\sqrt{21}}{\sqrt2}\\ 0 & 0 & \tfrac{\sqrt7}{\sqrt2} \end{bmatrix},$$

we find
$$Q^Tb = \begin{bmatrix} \tfrac13\sqrt3 & \tfrac13\sqrt3 & \tfrac13\sqrt3\\ -\tfrac2{21}\sqrt{42} & -\tfrac1{42}\sqrt{42} & \tfrac5{42}\sqrt{42}\\ -\tfrac17\sqrt{14} & \tfrac3{14}\sqrt{14} & -\tfrac1{14}\sqrt{14} \end{bmatrix}\begin{bmatrix} 0\\ 4\\ 5 \end{bmatrix} = \begin{bmatrix} 3\sqrt3\\ \tfrac12\sqrt{42}\\ \tfrac12\sqrt{14} \end{bmatrix}.$$
Then solve

$$Rx = \begin{bmatrix} \sqrt3 & \tfrac1{\sqrt3} & \sqrt3\\ 0 & \tfrac{\sqrt{14}}{\sqrt3} & \tfrac{\sqrt{21}}{\sqrt2}\\ 0 & 0 & \tfrac{\sqrt7}{\sqrt2} \end{bmatrix}\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 3\sqrt3\\ \tfrac12\sqrt{42}\\ \tfrac12\sqrt{14} \end{bmatrix}$$

by back substitution to obtain

$$x = \begin{bmatrix} 2\\ 0\\ 1 \end{bmatrix}.$$
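The same solve works in code, with the matrix and right-hand side taken from the example above; `back_sub` is a hypothetical helper name:

```python
import numpy as np

def back_sub(R, c):
    """Solve Rx = c for upper-triangular R by back substitution."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[1, -1, -2],
              [1, 0, 2],
              [1, 2, 3]], dtype=float)
b = np.array([0, 4, 5], dtype=float)

Q, R = np.linalg.qr(A)
x = back_sub(R, Q.T @ b)   # Rx = Q^T b, Eq. (8)
print(x)                   # (2, 0, 1)
```

Any sign convention the library chooses for $Q$ and $R$ cancels out in $Rx = Q^Tb$, so the solution agrees with the hand computation.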
Least Squares and the QR Factorization

Review of Least Squares and the Normal Equations

This topic builds off of what we did in Computer Lab #10. In the lab, we learned:

- In a least-squares situation, in order to minimize all of the errors (specifically, the sum of the squared distances between the "best-fit" line and the actual data points), we needed to determine the vector in $Ax$ that was closest to the vector $b$.

- This is the same as determining the projection of $b$ onto a subspace, and that subspace was actually the column space of $A$.

- Typically, in a least squares setting, we have many more data points than variables, so if $A$ is $m \times n$, then $m > n$, and we most likely do not have an exact solution (i.e., rarely will all the points follow the mathematical model exactly).
- In terms of matrix subspaces, the vector b will most likely be outside the column space of A.
- However, the point p in the subspace that is closest to b would be in the column space of A, so it can be written as p = Ax̂, where x̂ represents the "best estimate" vector for the "almost" solution vector x.
- Since p is the projection of b onto the column space, the error vector we wish to minimize, i.e., e = b − Ax̂, will be orthogonal to that space.
- However, if a vector is orthogonal to the column space of the matrix A, it is also orthogonal to the row space of the transpose A^T, and any vector orthogonal to the row space of a matrix is in the null space of that matrix.
- Therefore, because e is orthogonal to the column space of A, we can conclude that it is in the null space of A^T. This is what finally allowed us to make the following important connection:
$$
\begin{aligned}
A^T (b - A\hat{x}) &= 0 \\
A^T b - A^T A \hat{x} &= 0 \\
A^T A \hat{x} &= A^T b, \qquad (9)
\end{aligned}
$$

the last line of which describes what are called the normal equations.
Finally, the matrix A^T A is invertible exactly when the columns of A are linearly independent.⁵ Then the best estimate x̂, which gives us the coefficients in the mathematical model (or "line" of best fit),⁶ can be found as

$$
\hat{x} = \left(A^T A\right)^{-1} A^T b.
$$

Example 34 Find a least squares solution to the inconsistent system Ax = b, where
$$
A = \begin{bmatrix} 1 & 5 \\ 2 & -2 \\ -1 & 1 \end{bmatrix}
\quad \text{and} \quad
b = \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}.
$$
Solution: Compute

$$
A^T A =
\begin{bmatrix} 1 & 2 & -1 \\ 5 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 5 \\ 2 & -2 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix}
$$

and

$$
A^T b =
\begin{bmatrix} 1 & 2 & -1 \\ 5 & -2 & 1 \end{bmatrix}
\begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 2 \\ 16 \end{bmatrix}.
$$
Then the normal equations are

$$
A^T A \hat{x} = A^T b
\quad\Longrightarrow\quad
\begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix} \hat{x} = \begin{bmatrix} 2 \\ 16 \end{bmatrix},
$$

from which it is easy to see that x̂ = (1/3, 8/15)^T.
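The normal-equations computation can be verified with a short Python sketch, assuming NumPy is available (the signs of some entries of A were lost in this copy of the notes; the values below use one consistent reconstruction):

```python
import numpy as np

A = np.array([[ 1.0,  5.0],
              [ 2.0, -2.0],
              [-1.0,  1.0]])
b = np.array([3.0, 2.0, 5.0])

AtA = A.T @ A                      # here diag(6, 30): the columns of A are orthogonal
Atb = A.T @ b                      # here (2, 16)
xhat = np.linalg.solve(AtA, Atb)   # solve the normal equations A^T A xhat = A^T b

print(xhat)  # approximately [1/3, 8/15]
```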
Example 35 Find the least squares approximating line for the data points (1, 2), (2, 2), and (3, 4).

Solution: We want the line y = a + bx that best fits these three points. The appropriate system would be

$$
\begin{aligned}
a + b(1) &= 2 \\
a + b(2) &= 2 \\
a + b(3) &= 4
\end{aligned}
$$
⁵ Be careful here - because A might be rectangular, we are actually dealing with what is called a "left inverse," and the relation (A^T A)^{-1} = A^{-1}(A^T)^{-1} does not hold as it does with square matrices.

⁶ I use quotes here because we are not limited to linear models with this technique.
which can be reformed into Ax = b as

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}.
$$
Again, compute

$$
A^T A =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
= \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}
$$

and

$$
A^T b =
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
= \begin{bmatrix} 8 \\ 18 \end{bmatrix}.
$$
Solving

$$
\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix} \hat{x} = \begin{bmatrix} 8 \\ 18 \end{bmatrix}
$$

leads to the solution x̂ = (2/3, 1)^T, so the equation for the line of best fit would be y = 2/3 + x, shown in the plot below along with the three data points:
[Plot: the best-fit line y = 2/3 + x together with the data points (1, 2), (2, 2), and (3, 4).]
While we're at it, we can also calculate the actual least squares error. If x̂ represents the least squares solution of Ax = b, then Ax̂ is the vector in the column space of A that is closest to b. The actual distance from b to Ax̂ is simply the length of the component of b perpendicular to the column space of A. In symbols,

$$
\|e\| = \|b - A\hat{x}\|.
$$

Now,

$$
e = b - A\hat{x} =
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
-
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} \tfrac{2}{3} \\ 1 \end{bmatrix}
=
\begin{bmatrix} \tfrac{1}{3} \\ -\tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix},
$$

and the length of e is then

$$
\|e\| = \sqrt{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2} = \sqrt{\tfrac{2}{3}} \approx 0.816.
$$
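Both the fitted coefficients and the least squares error for this example can be checked with a minimal Python sketch, assuming NumPy is available:

```python
import numpy as np

# Data points (1,2), (2,2), (3,4); model y = a + b*x
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.0, 4.0])

A = np.column_stack([np.ones_like(xs), xs])   # design matrix [1, x_i]
xhat = np.linalg.solve(A.T @ A, A.T @ ys)     # normal equations

e = ys - A @ xhat                             # error (residual) vector
print(xhat)                 # approximately [2/3, 1], i.e. y = 2/3 + x
print(np.linalg.norm(e))    # approximately sqrt(2/3) ≈ 0.816
```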
Least Squares and the QR Factorization
One major advantage of orthogonalization is that it greatly simplifies the least squares problem Ax = b. The normal equations from (9) are still

$$
A^T A \hat{x} = A^T b,
$$
but with the QR factorization, A^T A becomes

$$
A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R
$$

(because Q^T Q = I). Then the equations in (9) become

$$
A^T A \hat{x} = A^T b
\quad\Longrightarrow\quad
R^T R \hat{x} = R^T Q^T b,
$$

or, since R^T is invertible,

$$
R \hat{x} = Q^T b. \qquad (10)
$$

Although this may not look like much of an improvement, it most certainly is, particularly because R is upper triangular. Therefore, the solution to (10) can be found via back substitution. We still need to use Gram-Schmidt to produce Q and R, but the payoff is that the equations in (10) are less prone to numerical inaccuracies such as round-off error.
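The "factor, then back-substitute" recipe can be sketched as a small Python routine, assuming NumPy is available; `lstsq_via_qr` is an illustrative name chosen here, and NumPy's QR stands in for the hand Gram-Schmidt:

```python
import numpy as np

def lstsq_via_qr(A, b):
    """Least squares via R xhat = Q^T b, solved by back substitution.

    Illustrative sketch: uses NumPy's reduced QR in place of hand Gram-Schmidt.
    """
    Q, R = np.linalg.qr(A)            # Q has orthonormal columns, R is upper triangular
    y = Q.T @ b
    n = R.shape[0]
    xhat = np.zeros(n)
    for i in range(n - 1, -1, -1):    # back substitution, bottom row up
        xhat[i] = (y[i] - R[i, i + 1:] @ xhat[i + 1:]) / R[i, i]
    return xhat

# The line-fitting example: points (1,2), (2,2), (3,4)
x = lstsq_via_qr(np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]),
                 np.array([2.0, 2.0, 4.0]))
print(x)  # approximately [2/3, 1]
```

Note that the routine never forms A^T A, which is exactly the numerical advantage mentioned above.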
Example 36 Consider the previous example in which we found the line of best fit for the points (1, 2), (2, 2), and (3, 4). If we instead find the QR factorization, we have

$$
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
=
\begin{bmatrix}
\tfrac{1}{3}\sqrt{3} & -\tfrac{1}{2}\sqrt{2} \\
\tfrac{1}{3}\sqrt{3} & 0 \\
\tfrac{1}{3}\sqrt{3} & \tfrac{1}{2}\sqrt{2}
\end{bmatrix}
\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix}
= QR.
$$

Then Rx̂ = Q^T b becomes

$$
\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix} \hat{x}
=
\begin{bmatrix}
\tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} \\
-\tfrac{1}{2}\sqrt{2} & 0 & \tfrac{1}{2}\sqrt{2}
\end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}
=
\begin{bmatrix} \tfrac{8}{3}\sqrt{3} \\ \sqrt{2} \end{bmatrix}.
$$

Hence √2·b = √2, so b = 1, and then √3·a + 2√3(1) = (8/3)√3, so a = 2/3, as we found earlier.
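The Gram-Schmidt process that produces this Q and R can be sketched in Python, assuming NumPy is available; `gram_schmidt_qr` is an illustrative name, not a library routine:

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR: Q has orthonormal columns, R is upper triangular."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i . a_j
            v -= R[i, j] * Q[:, i]        # strip off the q_i component
        R[j, j] = np.linalg.norm(v)       # diagonal entry = length of v_j
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = gram_schmidt_qr(A)
print(R)  # approximately [[sqrt(3), 2 sqrt(3)], [0, sqrt(2)]]
```

(In practice, the "modified" Gram-Schmidt variant or Householder reflections are preferred for numerical stability, but the classical version matches the hand computation in these notes.)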
An Aside: Least Squares and Calculus
Consider the simple system

$$
\begin{aligned}
a_1 x &= b_1 \\
a_2 x &= b_2 \\
a_3 x &= b_3
\end{aligned}
$$

This is solvable only if b1, b2, and b3 are in the ratio a1 : a2 : a3. In practice, this would rarely be the case if the above equations came from "real" data. So, instead of trying to solve the unsolvable, we proceed by choosing an x that minimizes the average error E in the equations. A convenient error measurement to use is the "sum of squares," namely

$$
E^2 = (a_1 x - b_1)^2 + (a_2 x - b_2)^2 + (a_3 x - b_3)^2.
$$

If there were an exact solution, E would be 0. If there is not an exact solution, we can find the minimum error by setting the derivative of E² equal to 0,
$$
\frac{dE^2}{dx} = 2\left[(a_1 x - b_1)a_1 + (a_2 x - b_2)a_2 + (a_3 x - b_3)a_3\right] = 0,
$$

and then solving for x:

$$
\begin{aligned}
0 &= 2\left((a_1 x - b_1)a_1 + (a_2 x - b_2)a_2 + (a_3 x - b_3)a_3\right) \\
&= 2x a_1^2 + 2x a_2^2 + 2x a_3^2 - 2a_1 b_1 - 2a_2 b_2 - 2a_3 b_3 \\
\Longrightarrow\ x &= \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{a_1^2 + a_2^2 + a_3^2} = \frac{a^T b}{a^T a}.
\end{aligned}
$$
This result, which you should recognize as the coefficient in the projection calculations, gives us the least-squares solution to a problem ax = b in one variable x.
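As a quick numerical illustration of x = a^T b / a^T a (the vectors a and b below are hypothetical data, not from the notes), assuming NumPy is available:

```python
import numpy as np

# Hypothetical data for the one-variable system a_i x = b_i
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 4.0])

x = (a @ b) / (a @ a)   # x = a^T b / a^T a; here 18/14 = 9/7

# Sanity check: E^2 at x is no larger than at nearby points
E2 = lambda t: float(np.sum((a * t - b) ** 2))
assert E2(x) <= E2(x + 0.01) and E2(x) <= E2(x - 0.01)
print(x)
```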