lec07-leastsquares
TRANSCRIPT
8/9/2019 Lec07-LeastSquares
Least-Squares Systems and The QR factorization
Orthogonality
Least-squares systems. Text: 5.3
The Gram-Schmidt and Modified Gram-Schmidt processes. Text: 5.2.7, 5.2.8
The Householder QR and the Givens QR. Text: 5.1, 5.2.
7-1 GvL 5, 5.3; Heath 3.1-3.5?; TB 7-(8),11? QR-NEW
Orthogonality The Gram-Schmidt algorithm
1. Two vectors u and v are orthogonal if (u, v) = 0.
2. A system of vectors {v_1, ..., v_n} is orthogonal if (v_i, v_j) = 0 for i ≠ j; and orthonormal if (v_i, v_j) = δ_{ij}
3. A matrix is orthogonal if its columns are orthonormal
Notation: V = [v_1, ..., v_n] ≡ matrix with column-vectors v_1, ..., v_n.
IMPORTANT: we reserve the term unitary for square matrices. The term orthonormal matrix is not used. Even orthogonal is often used for square matrices.
Least-Squares systems
Given: an m × n matrix A, with n < m. Problem: find x which minimizes:
‖b − Ax‖_2
Good illustration: Data fitting.
Typical problem of data fitting: We seek an unknown function as a linear combination of n known functions φ_i (e.g. polynomials, trig. functions). Experimental data (not accurate) provides measures β_1, ..., β_m of this unknown function at points t_1, ..., t_m. Problem: find the best possible approximation φ to this data.
φ(t) = Σ_{i=1}^n ξ_i φ_i(t),  s.t.  φ(t_j) ≈ β_j,  j = 1, ..., m
Question: Close in what sense?
Least-squares approximation: Find φ such that
φ(t) = Σ_{i=1}^n ξ_i φ_i(t),  and  Σ_{j=1}^m |φ(t_j) − β_j|^2 = Min
Translated in linear algebra terms: find the best approximation vector to a vector b from linear combinations of the vectors f_i, i = 1, ..., n, where
b = [β_1; β_2; ...; β_m],   f_i = [φ_i(t_1); φ_i(t_2); ...; φ_i(t_m)]
We want to find x = {ξ_i}_{i=1,...,n} such that
‖Σ_{i=1}^n ξ_i f_i − b‖_2 = Minimum
Define
F = [f_1, f_2, ..., f_n],   x = [ξ_1; ...; ξ_n]
We want to find x to minimize ‖b − Fx‖_2.
Least-squares linear system. F is m × n, with m > n.
THEOREM. The vector x minimizes ψ(x) = ‖b − Fx‖_2^2 if and only if it is the solution of the normal equations:
F^T F x = F^T b
Proof: Expand out the formula for ψ(x + δx):
ψ(x + δx) = (b − F(x + δx))^T (b − F(x + δx))
= b^T b + (x + δx)^T F^T F (x + δx) − 2 (x + δx)^T F^T b
= ψ(x) + 2 (δx)^T F^T F x − 2 (δx)^T F^T b + (δx)^T F^T F (δx)
= ψ(x) + 2 (δx)^T (F^T F x − F^T b) + (δx)^T F^T F (δx)
The last term, (δx)^T F^T F (δx) = ‖F δx‖_2^2, is always ≥ 0.
In order for ψ(x + δx) ≥ ψ(x) for any δx, the boxed quantity F^T F x − F^T b [the gradient vector] must be zero. Q.E.D.
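The theorem is easy to check numerically. A minimal sketch (assuming numpy; it fits a straight line to the five-point data set used in the example later in this lecture):

```python
import numpy as np

# Five measurement points t_j and values beta_j (the data from the example below)
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
beta = np.array([0.1, 0.3, 0.3, 0.2, 0.0])

# Fit a straight line: F = [1, t], so psi(x) = ||beta - F x||_2^2
F = np.column_stack([np.ones_like(t), t])

# Solve the normal equations F^T F x = F^T beta
x = np.linalg.solve(F.T @ F, F.T @ beta)

# The residual is orthogonal to Ran(F), as the theorem requires
r = beta - F @ x
assert np.allclose(F.T @ r, 0.0)

# And x agrees with a QR-based least-squares solver
x_ls, *_ = np.linalg.lstsq(F, beta, rcond=None)
assert np.allclose(x, x_ls)
```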
[Figure: b, its projection Fx* onto span{F}, and the residual b − Fx* orthogonal to span{F}]
Illustration of theorem: x* is the best approximation to the vector b from the subspace span{F} if and only if b − Fx* is ⊥ to the whole subspace span{F}. This in turn is equivalent to F^T (b − Fx*) = 0: Normal equations.
Example:
Points:  t_1 = −1,  t_2 = −1/2,  t_3 = 0,  t_4 = 1/2,  t_5 = 1
Values:  β_1 = 0.1,  β_2 = 0.3,  β_3 = 0.3,  β_4 = 0.2,  β_5 = 0.0
[Figure: plot of the five data points]
2) Approximation by polynomials of degree 2:
φ_1(t) = 1,  φ_2(t) = t,  φ_3(t) = t^2.
Best polynomial found:
0.3085714285 − 0.06 t − 0.2571428571 t^2
[Figure: the data points and the fitted parabola]
Problem with Normal Equations
Condition number is high: if A is square and nonsingular with SVD A = U Σ V^T, then
κ_2(A) = ‖A‖_2 ‖A^{−1}‖_2 = σ_max/σ_min
κ_2(A^T A) = ‖A^T A‖_2 ‖(A^T A)^{−1}‖_2 = (σ_max/σ_min)^2
Example: Let A = [1 1; ε 0; 0 ε].
Then κ(A) ≈ √2/ε, but κ(A^T A) = O(ε^{−2}).
fl(A^T A) = fl([1 + ε^2, 1; 1, 1 + ε^2]) = [1, 1; 1, 1]
is singular to working precision (if ε ≤ √u).
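The squaring of the condition number is easy to observe. A small numpy sketch (the value of ε is illustrative):

```python
import numpy as np

eps = 1e-5
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])   # Lauchli-type matrix: kappa(A) = O(1/eps)

kappa_A = np.linalg.cond(A)
kappa_AtA = np.linalg.cond(A.T @ A)

# kappa_2(A^T A) is approximately kappa_2(A)^2
assert kappa_AtA > 1e9
assert abs(kappa_AtA / kappa_A**2 - 1.0) < 0.1
```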
Find Orthonormal Basis for a subspace
Goal: Find vector in span( X ) closest to b.
Much easier with an orthonormal basis for span( X ) .
Problem: Given X = [x_1, ..., x_n], compute Q = [q_1, ..., q_n] which is orthonormal and s.t. span(Q) = span(X).
[Figure: X = Q · R]
Q is orthogonal (Q^T Q = I); R is upper triangular.
Another decomposition: A matrix X, with linearly independent columns, is the product of an orthogonal matrix Q and an upper triangular matrix R.
Lines 5 and 7-8 show that
x_j = r_{1j} q_1 + r_{2j} q_2 + ... + r_{jj} q_j
If X = [x_1, x_2, ..., x_n], Q = [q_1, q_2, ..., q_n], and if R is the n × n upper triangular matrix
R = {r_{ij}},  i, j = 1, ..., n
then the above relation can be written as
X = QR
R is upper triangular, Q is orthogonal. This is called the QR factorization of X.
What is the cost of the factorization when X ∈ R^{m×n}?
Better algorithm: Modified Gram-Schmidt.
ALGORITHM : 2  Modified Gram-Schmidt
1. For j = 1, ..., n Do:
2.   Define q := x_j
3.   For i = 1, ..., j − 1, Do:
4.     r_{ij} := (q, q_i)
5.     q := q − r_{ij} q_i
6.   EndDo
7.   Compute r_{jj} := ‖q‖_2,
8.   If r_{jj} = 0 then Stop, else q_j := q/r_{jj}
9. EndDo
Only difference from classical Gram-Schmidt: the inner product in line 4 uses the accumulated (partially orthogonalized) q instead of the original x_j
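A direct numpy transcription of the algorithm above (the function name mgs and the test matrix are mine, for illustration):

```python
import numpy as np

def mgs(X):
    """Modified Gram-Schmidt: X = QR with orthonormal Q and upper triangular R."""
    m, n = X.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        q = X[:, j].copy()                 # line 2: q := x_j
        for i in range(j):
            R[i, j] = q @ Q[:, i]          # line 4: uses the *updated* q (the MGS twist)
            q -= R[i, j] * Q[:, i]         # line 5
        R[j, j] = np.linalg.norm(q)        # line 7
        if R[j, j] == 0:
            raise ValueError("linearly dependent columns")
        Q[:, j] = q / R[j, j]              # line 8
    return Q, R

X = np.array([[1., 1., 1.],
              [1., 1., 0.],
              [1., 0., -1.],
              [1., 0., 4.]])
Q, R = mgs(X)
assert np.allclose(Q.T @ Q, np.eye(3))     # orthonormal columns
assert np.allclose(Q @ R, X)               # X = QR
```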
The operations in lines 4 and 5 can be written as
q := ORTH(q, q_i)
where ORTH(x, q) denotes the operation of orthogonalizing a vector x against a unit vector q.
[Figure: the vector x, the unit vector q, the component (x, q) q along q, and the result z = ORTH(x, q)]
Modified Gram-Schmidt algorithm is much more stable than classical Gram-Schmidt in general. [A few examples easily show this].
Suppose MGS is applied to A, yielding computed matrices Q̂ and R̂. Then there are constants c_i (depending on (m, n)) such that
A + E_1 = Q̂ R̂,   ‖E_1‖_2 ≤ c_1 u ‖A‖_2
‖Q̂^T Q̂ − I‖_2 ≤ c_2 u κ_2(A) + O((u κ_2(A))^2)
for a certain perturbation matrix E_1, and there exists an orthonormal matrix Q such that
A + E_2 = Q R̂,   ‖E_2(:, j)‖_2 ≤ c_3 u ‖A(:, j)‖_2
for a certain perturbation matrix E_2.
An equivalent version:
ALGORITHM : 3  Modified Gram-Schmidt - 2
0. Set Q := X
1. For i = 1, ..., n Do:
2.   Compute r_{ii} := ‖q_i‖_2,
3.   If r_{ii} = 0 then Stop, else q_i := q_i/r_{ii}
4.   For j = i + 1, ..., n, Do:
5.     r_{ij} := (q_j, q_i)
6.     q_j := q_j − r_{ij} q_i
7.   EndDo
8. EndDo
Does exactly the same computation as the previous algorithm, but in a different order.
Example: Orthonormalize the system of vectors:
X = [x_1, x_2, x_3] =
[ 1  1   1
  1  1   0
  1  0  −1
  1  0   4 ]
Answer:
q_1 = [1/2; 1/2; 1/2; 1/2];
q̂_2 = x_2 − (x_2, q_1) q_1 = [1; 1; 0; 0] − 1 · [1/2; 1/2; 1/2; 1/2]
q̂_2 = [1/2; 1/2; −1/2; −1/2];   q_2 = [1/2; 1/2; −1/2; −1/2]
For this example: compute Q^T Q.
Result is the identity matrix.
Recall: For any orthogonal matrix Q, we have
Q^T Q = I
(In complex case: Q^H Q = I). Consequence: For an n × n orthogonal matrix,
Q^{−1} = Q^T. (Q is unitary)
Application: another method for solving linear systems.
Ax = b
A is an n × n nonsingular matrix. Compute its QR factorization.
Multiply both sides by Q^T:
Q^T Q R x = Q^T b  ⟹  Rx = Q^T b
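In numpy terms, a small sketch (np.linalg.solve stands in for the triangular back-substitution one would use in practice; the matrix and right-hand side are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

Q, R = np.linalg.qr(A)             # A = QR with Q orthogonal, R upper triangular
x = np.linalg.solve(R, Q.T @ b)    # Rx = Q^T b, solved by back substitution

assert np.allclose(A @ x, b)
```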
Use of the QR factorization
Problem: Ax ≈ b in least-squares sense. A is an m × n (full-rank) matrix. Let
A = QR
the QR factorization of A and consider the normal equations:
A^T A x = A^T b  ⟹  R^T Q^T Q R x = R^T Q^T b  ⟹  R^T R x = R^T Q^T b  ⟹  Rx = Q^T b
(R^T is an n × n nonsingular matrix). Therefore,
x = R^{−1} Q^T b
Another derivation:
Recall: span(Q) = span(A)
So ‖b − Ax‖_2 is minimum when b − Ax ⊥ span{Q}. Therefore the solution x must satisfy Q^T (b − Ax) = 0
Q^T (b − QRx) = 0  ⟹  Rx = Q^T b  ⟹  x = R^{−1} Q^T b
Also observe that for any vector w
w = QQ^T w + (I − QQ^T) w
and that QQ^T w ⊥ (I − QQ^T) w. Pythagoras' theorem:
‖w‖_2^2 = ‖QQ^T w‖_2^2 + ‖(I − QQ^T) w‖_2^2
‖b − Ax‖_2^2 = ‖b − QRx‖_2^2
= ‖(I − QQ^T) b + Q (Q^T b − Rx)‖_2^2
= ‖(I − QQ^T) b‖_2^2 + ‖Q (Q^T b − Rx)‖_2^2
= ‖(I − QQ^T) b‖_2^2 + ‖Q^T b − Rx‖_2^2
Min is reached when the 2nd term of the r.h.s. is zero.
As a rule it is not a good idea to form A^T A and solve the normal equations. Methods using the QR factorization are better.
Total cost?? (depends on the algorithm used to get the QR decomp.)
Using matlab, find the parabola that fits the data in the previous example in the L.S. sense [verify that the result found is correct.]
Householder QR
Householder reflectors are matrices of the form
P = I − 2ww^T,
where w is a unit vector (a vector of 2-norm unity)
7-34 GvL 5.1; Heath 3.5; TB 10 hou
[Figure: x and Px as mirror images across the hyperplane orthogonal to w]
Geometrically, Px represents a mirror image of x with respect to the hyperplane span{w}^⊥.
P can be written as P = I − βvv^T with β = 2/‖v‖_2^2, where v is a multiple of w. [storage: v and β]
Px can be evaluated as x − β (x^T v) v (op count?)
Similarly: PA = A − v z^T where z^T = β v^T A
NOTE: we work in R^m, so all vectors are of length m, P is of size m × m, etc.
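Because of this rank-one structure, Px and PA never require forming the m × m matrix P explicitly. A numpy sketch (random data, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
v = rng.standard_normal(m)
beta = 2.0 / (v @ v)                      # P = I - beta v v^T
P = np.eye(m) - beta * np.outer(v, v)     # explicit P, only for checking

# O(m) evaluation: P x = x - beta (x^T v) v
x = rng.standard_normal(m)
assert np.allclose(P @ x, x - beta * (x @ v) * v)

# O(mn) evaluation: P A = A - v z^T with z^T = beta v^T A
A = rng.standard_normal((m, n))
z = beta * (v @ A)
assert np.allclose(P @ A, A - np.outer(v, z))
```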
Problem 1: Given a vector x ≠ 0, find w such that
(I − 2ww^T) x = α e_1,
where α is a (free) scalar.
Writing (I − βvv^T) x = α e_1 yields β (v^T x) v = x − α e_1. (1)
The desired w is a multiple of x − α e_1, i.e., we can take
v = x − α e_1
To determine α we just recall that
‖(I − 2ww^T) x‖_2 = ‖x‖_2
As a result: |α| = ‖x‖_2, or
α = ±‖x‖_2
Show that (I − βvv^T) x = α e_1 when v = x − α e_1 and α = ±‖x‖_2.
Equivalent to showing that
x − β (x^T v) v = α e_1  ⟺  x − α e_1 = β (x^T v) v
but recall that v = x − α e_1, so we need to show that β x^T v = 1, i.e., that
2 x^T v / ‖x − α e_1‖_2^2 = 1
Denominator = ‖x‖_2^2 + α^2 − 2 α e_1^T x = 2 (‖x‖_2^2 − α e_1^T x),  using α^2 = ‖x‖_2^2
Numerator = 2 x^T v = 2 x^T (x − α e_1) = 2 (‖x‖_2^2 − α x^T e_1)
Numerator / Denominator = 1. Done
Alternative:
Define σ = Σ_{i=2}^m x_i^2.
Always set v_1 = x_1 − ‖x‖_2. Update OK when x_1 ≤ 0. When x_1 > 0, compute v_1 as
v_1 = x_1 − ‖x‖_2 = (x_1^2 − ‖x‖_2^2)/(x_1 + ‖x‖_2) = −σ/(x_1 + ‖x‖_2)
So:
v_1 = −σ/(x_1 + ‖x‖_2)  if x_1 > 0
v_1 = x_1 − ‖x‖_2       if x_1 ≤ 0
It is customary to compute a vector v such that v_1 = 1. So v is scaled by its first component.
If σ is zero, the procedure will return v = [1; x(2 : m)] and β = 0.
Matlab function (the transcript cuts off mid-function; the completion below follows the v_1 and β formulas from the previous slide):
function [v,bet] = house (x)
%% computes the householder vector for x
m = length(x);
v = [1 ; x(2:m)];
sigma = x(2:m)' * x(2:m);
if (sigma == 0)
   bet = 0;
else
   xnrm = sqrt(x(1)^2 + sigma);
   if (x(1) <= 0)
      v(1) = x(1) - xnrm;
   else
      v(1) = -sigma / (x(1) + xnrm);
   end
   bet = 2*v(1)^2 / (sigma + v(1)^2);
   v = v / v(1);
end
Problem 2: Generalization.
Given an m × n matrix X, find w_1, w_2, ..., w_n such that
(I − 2 w_n w_n^T) ··· (I − 2 w_2 w_2^T)(I − 2 w_1 w_1^T) X = R
where r_{ij} = 0 for i > j
First step is easy: select w_1 so that the first column of X becomes a multiple of e_1
Second step: select w_2 so that x_2 has zeros below its 2nd component.
etc.. After k − 1 steps: X_k ≡ P_{k−1} ··· P_1 X has the following shape:
Example: Reduce the system of vectors:
X = [x_1, x_2, x_3] =
[ 1  1   1
  1  1   0
  1  0  −1
  1  0   4 ]
Answer:
x_1 = [1; 1; 1; 1],   ‖x_1‖_2 = 2,   v_1 = [1 + 2; 1; 1; 1],   w_1 = (1/(2√3)) [1 + 2; 1; 1; 1]
P_1 = I − 2 w_1 w_1^T = (1/6) ×
[ −3 −3 −3 −3
  −3  5 −1 −1
  −3 −1  5 −1
  −3 −1 −1  5 ],
P_1 X =
[ −2   −1   −2
   0   1/3  −1
   0  −2/3  −2
   0  −2/3   3 ]
next stage:
x̃_2 = [0; 1/3; −2/3; −2/3],   ‖x̃_2‖_2 = 1,   v_2 = [0; 1/3 + 1; −2/3; −2/3],
P_2 = I − (2/(v_2^T v_2)) v_2 v_2^T = (1/3) ×
[ 3  0  0  0
  0 −1  2  2
  0  2  2 −1
  0  2 −1  2 ],
P_2 P_1 X =
[ −2 −1 −2
   0 −1  1
   0  0 −3
   0  0  2 ]
last stage:
x̃_3 = [0; 0; −3; 2],   ‖x̃_3‖_2 = √13,   v_3 = [0; 0; −3 − √13; 2],
P_3 = I − (2/(v_3^T v_3)) v_3 v_3^T =
[ 1  0  0       0
  0  1  0       0
  0  0 −.83205  .55470
  0  0  .55470  .83205 ],
P_3 P_2 P_1 X =
[ −2 −1  −2
   0 −1   1
   0  0 √13
   0  0   0 ]
= R ,
P_3 P_2 P_1 =
[ −.50000 −.50000 −.50000 −.50000
  −.50000 −.50000  .50000  .50000
   .13868 −.13868 −.69338  .69338
  −.69338  .69338 −.13868  .13868 ]
So we have the factorization
X = (P_1 P_2 P_3) R = QR,  with Q = P_1 P_2 P_3
X_k =
[ x_11 x_12 x_13 ...           ... x_1n
       x_22 x_23 ...           ... x_2n
            x_33 ...           ... x_3n
                  ⋱                ...
                  x_kk             ...
                  x_{k+1,k}    ... x_{k+1,n}
                  ...          ... ...
                  x_{m,k}      ... x_{m,n} ]
To do: transform this matrix into one which is upper triangular up to the k-th column...
... while leaving the previous columns untouched.
To leave the first k − 1 columns unchanged, w must have zeros in positions 1 through k − 1.
P_k = I − 2 w_k w_k^T,   w_k = v/‖v‖_2,
where the vector v can be expressed as a Householder vector for a shorter vector using the matlab function house:
v = [0; house(X(k : m, k))]   (with k − 1 leading zeros)
The result is that work is done on the (k : m, k : n) submatrix.
Yields the factorization:
X = QR
where Q = P_1 P_2 ... P_n and R = X_n
MAJOR difference with Gram-Schmidt: Q is m × m and R is m × n (same as X). The matrix R has zeros below the n-th row. Note also: this factorization always exists.
Cost of Householder QR? Compare with Gram-Schmidt
Question: How to obtain X = Q_1 R_1 where Q_1 = same size as X and R_1 is n × n (as in MGS)?
Answer: simply use the partitioning
X = [Q_1 Q_2] [R_1; 0]  ⟹  X = Q_1 R_1
Referred to as the thin QR factorization (or economy-size QR factorization in matlab)
How to solve a least-squares problem Ax ≈ b using the Householder factorization?
Answer: no need to compute Q_1. Just apply Q^T to b. This entails applying the successive Householder reflections to b
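A numpy sketch of this solve (the helper names are mine; house follows the earlier matlab function, and the data is the parabola-fitting example from this lecture):

```python
import numpy as np

def house(x):
    """Householder vector v (v[0] = 1) and beta with (I - beta v v^T) x = alpha e_1."""
    sigma = x[1:] @ x[1:]
    if sigma == 0:
        return np.concatenate(([1.0], x[1:])), 0.0
    xnrm = np.sqrt(x[0]**2 + sigma)
    v0 = x[0] - xnrm if x[0] <= 0 else -sigma / (x[0] + xnrm)
    beta = 2 * v0**2 / (sigma + v0**2)
    return np.concatenate(([1.0], x[1:] / v0)), beta

def lstsq_householder(A, b):
    """min ||b - Ax||_2: reduce A to R while applying each reflector to b."""
    A = A.astype(float).copy(); b = b.astype(float).copy()
    m, n = A.shape
    for k in range(n):
        v, beta = house(A[k:, k])
        A[k:, k:] -= beta * np.outer(v, v @ A[k:, k:])   # P_k applied to active block
        b[k:] -= beta * (v @ b[k:]) * v                  # same P_k applied to b
    return np.linalg.solve(np.triu(A[:n, :n]), b[:n])    # R_1 x = (Q^T b)(1:n)

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
beta_vals = np.array([0.1, 0.3, 0.3, 0.2, 0.0])
A = np.column_stack([np.ones_like(t), t, t**2])
x = lstsq_householder(A, beta_vals)
assert np.allclose(x, [0.3085714285, -0.06, -0.2571428571])
```

Note that Q is never formed: only the n Householder vectors touch b.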
The rank-deficient case
Result of Householder QR: Q_1 and R_1 such that Q_1 R_1 = X. In the rank-deficient case, can have span{Q_1} ≠ span{X} because R_1 may be singular.
Remedy: Householder QR with column pivoting. Result will be:
A Π = Q [ R_11  R_12 ; 0  0 ]
R_11 is nonsingular. So rank(X) = size of R_11 = rank(Q_1), and Q_1 and X span the same subspace.
Π permutes the columns of X.
Algorithm: At step k, the active matrix is X(k : m, k : n). Swap the k-th column with the column of largest 2-norm in X(k : m, k : n). If all the columns have zero norm, stop.
[Figure: active submatrix X(k : m, k : n); swap with the column of largest norm]
Practical Question: how to implement this ???
Properties of the QR factorization
Consider the thin factorization A = QR, (size(Q) = [m, n] = size(A)). Assume r_{ii} > 0, i = 1, ..., n
1. When A is of full column rank this factorization exists and is unique
2. It satisfies:
span{a_1, ..., a_k} = span{q_1, ..., q_k},  k = 1, ..., n
3. R is identical with the Cholesky factor G^T of A^T A.
When A is rank-deficient and Householder with pivoting is used, then
Ran{Q_1} = Ran{A}
Givens Rotations
Matrices of the form
G(i, k, θ) =
[ 1 ...  0 ...  0 ... 0
  ... ⋱ ...    ...   ...
  0 ...  c ...  s ... 0    ← row i
  ...   ... ⋱  ...   ...
  0 ... −s ...  c ... 0    ← row k
  ...   ...    ... ⋱ ...
  0 ...  0 ...  0 ... 1 ]
with c = cos θ and s = sin θ
represents a rotation in the span of e_i and e_k.
Main idea of Givens rotations: consider y = Gx. Then
y_i = c x_i + s x_k
y_k = −s x_i + c x_k
y_j = x_j for j ≠ i, k
Can make y_k = 0 by selecting
s = x_k/t;   c = x_i/t;   t = √(x_i^2 + x_k^2)
This is used to introduce zeros in the first column of a matrix A (for example G(m − 1, m), G(m − 2, m − 1), etc., G(1, 2))..
See text for details
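A small numpy sketch of one rotation (the 3-4-5 vector is illustrative):

```python
import numpy as np

def givens(xi, xk):
    """c, s such that [[c, s], [-s, c]] rotates (xi, xk) onto (t, 0)."""
    t = np.hypot(xi, xk)
    if t == 0.0:
        return 1.0, 0.0
    return xi / t, xk / t

c, s = givens(3.0, 4.0)
G = np.array([[c, s],
              [-s, c]])
y = G @ np.array([3.0, 4.0])
assert np.allclose(y, [5.0, 0.0])          # y_k eliminated; 2-norm preserved
assert np.allclose(G.T @ G, np.eye(2))     # the rotation is orthogonal
```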
Orthogonal projectors and subspaces
Notation: Given a subspace X of R^m, define
X^⊥ = {y | y ⊥ x, ∀ x ∈ X}
Let Q = [q_1, ..., q_r] be an orthonormal basis of X
How would you obtain such a basis?
Then define the orthogonal projector P = QQ^T
7-61 GvL 5.4 URV
Properties
(a) P^2 = P
(b) (I − P)^2 = I − P
(c) Ran(P) = X
(d) Null(P) = X^⊥
(e) Ran(I − P) = Null(P) = X^⊥
Note that (b) means that I − P is also a projector
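Properties (a) and (b) are easy to confirm numerically. A sketch (assuming numpy; the subspace is a random one):

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 6, 3
Q, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthonormal basis of a subspace X
P = Q @ Q.T                                        # orthogonal projector onto X
I = np.eye(m)

assert np.allclose(P @ P, P)                       # (a) P^2 = P
assert np.allclose((I - P) @ (I - P), I - P)       # (b) I - P is also a projector
x = rng.standard_normal(m)
assert np.allclose(Q.T @ (I - P) @ x, 0.0)         # (I - P)x lies in X-perp
```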
Proof. (a), (b) are trivial
(c): Clearly Ran(P) = {x | x = QQ^T y, y ∈ R^m} ⊆ X. Any x ∈ X is of the form x = Qy, y ∈ R^r. Take Px = QQ^T (Qy) = Qy = x. Since x = Px, x ∈ Ran(P). So X ⊆ Ran(P). In the end X = Ran(P).
(d): x ∈ X^⊥ ⟺ (x, y) = 0 ∀ y ∈ X ⟺ (x, Qz) = 0 ∀ z ∈ R^r ⟺ (Q^T x, z) = 0 ∀ z ∈ R^r ⟺ Q^T x = 0 ⟺ QQ^T x = 0 ⟺ Px = 0.
(e): Need to show inclusion both ways. x ∈ Null(P) ⟹ Px = 0 ⟹ (I − P)x = x ⟹ x ∈ Ran(I − P). x ∈ Ran(I − P) ⟹ ∃ y ∈ R^m | x = (I − P)y ⟹ Px = P(I − P)y = 0 ⟹ x ∈ Null(P)
Orthogonal decomposition
Result: Any x ∈ R^m can be written in a unique way as
x = x_1 + x_2,   x_1 ∈ X,   x_2 ∈ X^⊥
Called the Orthogonal Decomposition
Just set x_1 = Px, x_2 = (I − P)x
In other words: R^m = P R^m ⊕ (I − P) R^m, or
R^m = Ran(P) ⊕ Ran(I − P)
R^m = Ran(P) ⊕ Null(P)
Can complete the basis {q_1, ..., q_r} into an orthonormal basis of R^m: q_{r+1}, ..., q_m
{q_{r+1}, q_{r+2}, ..., q_m} = basis of X^⊥.
⟹ dim(X^⊥) = m − r.
Four fundamental subspaces - URV decomposition
Let A ∈ R^{m×n} and consider Ran(A)
Property 1: Ran(A)^⊥ = Null(A^T)
Proof: x ∈ Ran(A)^⊥ iff (Ay, x) = 0 for all y, iff (y, A^T x) = 0 for all y ...
Property 2: Ran(A^T)^⊥ = Null(A)
Take X = Ran(A) in the orthogonal decomposition
Result:
R^m = Ran(A) ⊕ Null(A^T)
R^n = Ran(A^T) ⊕ Null(A)
4 fundamental subspaces:
Ran(A), Null(A), Ran(A^T), Null(A^T)
Far from unique.
Show how you can get a decomposition in which C is lower (or upper) triangular, from the above factorization.
Can select the decomposition so that R is upper triangular: URV decomposition.
Can select the decomposition so that R is lower triangular: ULV decomposition.
SVD = special case of URV where R = diagonal
How can you get the ULV decomposition by using only the Householder QR factorization (possibly with pivoting)? [Hint: you must use Householder twice]