lec08-svd



    THE SINGULAR VALUE DECOMPOSITION

    The SVD: existence and properties

    Pseudo-inverses and the SVD

    Use of SVD for least-squares problems

    Applications of the SVD

    Text: mainly sect. 2.4


    The Singular Value Decomposition (SVD)

    Theorem: For any matrix $A \in \mathbb{R}^{m \times n}$ there exist unitary matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ such that

    $$ A = U \Sigma V^T $$

    where $\Sigma$ is a diagonal matrix with entries $\sigma_{ii} \ge 0$ and

    $$ \sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0, \qquad p = \min(n, m). $$

    The $\sigma_{ii}$ are the singular values of $A$; $\sigma_{ii}$ is denoted simply by $\sigma_i$.
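    As a quick illustration (my own sketch, not part of the slides), the following NumPy snippet computes a full SVD and checks the factorization and the ordering of the singular values:

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((5, 3))                 # m x n with m = 5, n = 3

        # Full SVD: U is m x m, Vt is n x n, s holds the p = min(m, n) singular values
        U, s, Vt = np.linalg.svd(A, full_matrices=True)

        # Rebuild the m x n "diagonal" Sigma and verify A = U Sigma V^T
        Sigma = np.zeros_like(A)
        np.fill_diagonal(Sigma, s)
        assert np.allclose(A, U @ Sigma @ Vt)

        # Singular values are nonnegative and sorted in decreasing order
        assert np.all(s >= 0) and np.all(np.diff(s) <= 0)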


    Proof: Let $\sigma_1 = \|A\|_2 = \max_{x,\, \|x\|_2 = 1} \|Ax\|_2$. There exists a pair of unit vectors $v_1, u_1$ such that $A v_1 = \sigma_1 u_1$.


    Complete $v_1$ into an orthonormal basis of $\mathbb{R}^n$:

    $V \equiv [v_1, V_2]$ = $n \times n$ unitary.

    Complete $u_1$ into an orthonormal basis of $\mathbb{R}^m$:

    $U \equiv [u_1, U_2]$ = $m \times m$ unitary.

    Define $U$, $V$ as single Householder reflectors.

    Then, it is easy to show that

    $$ A V = U \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \quad\Rightarrow\quad U^T A V = \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \equiv A_1 $$


    [Figure: block shapes of $U$, $\Sigma$, $V^T$ in $A = U \Sigma V^T$. Case 1: $m > n$ (tall $A$, $\Sigma$ has zero rows at the bottom). Case 2: $m < n$ (flat $A$, $\Sigma$ has zero columns on the right).]


    The thin SVD

    Consider Case 1 ($m \ge n$). The SVD can be rewritten as

    $$ A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V^T \quad\text{which gives}\quad A = U_1 \Sigma_1 V^T $$

    where $U_1$ is $m \times n$ (same shape as $A$), and $\Sigma_1$ and $V$ are $n \times n$.

    Referred to as the thin SVD. Important in practice.

    How can you obtain the thin SVD from the QR factorization of $A$ and the SVD of an $n \times n$ matrix? (One route is sketched below.)
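    A minimal NumPy sketch (my own, not part of the slides) of the thin SVD, together with one possible answer to the question above: factor $A = QR$, take the SVD of the small $n \times n$ factor $R$, and combine.

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((100, 4))            # tall matrix, m >> n

        # Thin (economy) SVD directly: U1 is m x n, s has length n, Vt is n x n
        U1, s, Vt = np.linalg.svd(A, full_matrices=False)
        assert np.allclose(A, U1 @ np.diag(s) @ Vt)

        # Via QR: A = Q R with Q (m x n) and R (n x n), then SVD of the small R
        Q, R = np.linalg.qr(A)                       # reduced QR by default
        Ur, s2, Vt2 = np.linalg.svd(R)
        U1_qr = Q @ Ur                               # A = (Q Ur) diag(s2) Vt2
        assert np.allclose(A, U1_qr @ np.diag(s2) @ Vt2)
        assert np.allclose(s, s2)                    # same singular values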


    A few properties. Assume that

    $$ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 \quad\text{and}\quad \sigma_{r+1} = \cdots = \sigma_p = 0. $$

    Then:

    rank$(A) = r$ = number of nonzero singular values.

    Ran$(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$

    Null$(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$

    The matrix $A$ admits the SVD expansion:

    $$ A = \sum_{i=1}^{r} \sigma_i u_i v_i^T $$
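    A small NumPy check of these properties (my own sketch): build a matrix of known rank, count the nonzero singular values, and rebuild $A$ from its rank-$r$ expansion.

        import numpy as np

        rng = np.random.default_rng(2)
        r = 2
        A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))   # rank 2 by construction

        U, s, Vt = np.linalg.svd(A)
        tol = s[0] * max(A.shape) * np.finfo(float).eps
        print("numerical rank:", np.sum(s > tol))                       # -> 2

        # Rank-r SVD expansion: A = sum_i sigma_i u_i v_i^T
        A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
        assert np.allclose(A, A_rebuilt)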


    Properties of the SVD (continued)

    $\|A\|_2 = \sigma_1$ = largest singular value.

    $\|A\|_F = \left( \sum_{i=1}^{r} \sigma_i^2 \right)^{1/2}$

    When $A$ is an $n \times n$ nonsingular matrix, then $\|A^{-1}\|_2 = 1/\sigma_n$ = inverse of the smallest singular value.
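    These identities are easy to confirm numerically; a minimal sketch (my own, not part of the slides):

        import numpy as np

        rng = np.random.default_rng(3)
        A = rng.standard_normal((5, 5))
        s = np.linalg.svd(A, compute_uv=False)       # singular values only

        assert np.isclose(np.linalg.norm(A, 2), s[0])                        # 2-norm = sigma_1
        assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # Frobenius norm
        assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / s[-1])  # ||A^{-1}||_2 = 1/sigma_n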


    Proof (of the best rank-$k$ approximation property: $\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1}$, attained by the truncated expansion $B = A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$):

    First show: for any rank-$k$ matrix $B$ we have $\|A - B\|_2 \ge \sigma_{k+1}$.

    Consider $\mathcal{X} = \mathrm{span}\{v_1, v_2, \ldots, v_{k+1}\}$. Note:

    $\dim(\mathrm{Null}(B)) = n - k \ \Rightarrow\ \mathrm{Null}(B) \cap \mathcal{X} \ne \{0\}$

    Let $x_0 \in \mathrm{Null}(B) \cap \mathcal{X}$, $x_0 \ne 0$. Write $x_0 = V y$. Then

    $$ \|(A - B) x_0\|_2 = \|A x_0\|_2 = \|U \Sigma V^T V y\|_2 = \|\Sigma y\|_2 $$

    But $\|\Sigma y\|_2 \ge \sigma_{k+1} \|x_0\|_2$ (show this).

    Second: take $B = A_k$. It achieves the min.
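    A quick numerical illustration of this property (my own sketch): truncating the SVD expansion after $k$ terms gives $\|A - A_k\|_2 = \sigma_{k+1}$.

        import numpy as np

        rng = np.random.default_rng(4)
        A = rng.standard_normal((8, 6))
        U, s, Vt = np.linalg.svd(A)

        k = 3
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # best rank-k approximation
        assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])   # distance equals sigma_{k+1}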


    Right and Left Singular vectors:

    $$ A v_i = \sigma_i u_i \qquad\text{and}\qquad A^T u_j = \sigma_j v_j $$

    Consequence: $A^T A v_i = \sigma_i^2 v_i$ and $A A^T u_i = \sigma_i^2 u_i$.

    Right singular vectors ($v_i$'s) are eigenvectors of $A^T A$.

    Left singular vectors ($u_i$'s) are eigenvectors of $A A^T$.

    It is possible to get the SVD from eigenvectors of $A A^T$ and $A^T A$, but there are difficulties due to the non-uniqueness of the SVD.
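    A quick check of these relations (my own sketch): the squared singular values are eigenvalues of $A^T A$, and each right singular vector is a corresponding eigenvector.

        import numpy as np

        rng = np.random.default_rng(5)
        A = rng.standard_normal((7, 4))
        U, s, Vt = np.linalg.svd(A)

        # Eigenvalues of A^T A, sorted decreasingly, equal sigma_i^2
        evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
        assert np.allclose(evals, s**2)

        # A^T A v_1 = sigma_1^2 v_1
        v1 = Vt[0, :]
        assert np.allclose(A.T @ A @ v1, s[0]**2 * v1)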


    Define the $r \times r$ matrix

    $$ \Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r) $$

    Let $A \in \mathbb{R}^{m \times n}$ and consider $A^T A$ ($\in \mathbb{R}^{n \times n}$):

    $$ A^T A = V \Sigma^T \Sigma V^T \quad\Rightarrow\quad A^T A = V \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{n \times n} V^T $$

    This gives the spectral decomposition of $A^T A$.


    Similarly, $U$ gives the eigenvectors of $A A^T$:

    $$ A A^T = U \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{m \times m} U^T $$

    Important: $A^T A = V D_1 V^T$ and $A A^T = U D_2 U^T$ give the SVD factors $U$, $V$ only up to signs!


    Pseudo-inverse of an arbitrary matrix

    The pseudo-inverse of $A$ is the mapping from a vector $b$ to the solution of $\min_x \|Ax - b\|_2^2$.

    In the full-rank overdetermined case, the normal equations yield

    $$ x^* = (A^T A)^{-1} A^T b $$


    In the rank-deficient case, consider the full ("fat") SVD of $A$:

    $$ A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} = \sum_{i=1}^{r} \sigma_i u_i v_i^T $$

    Then left-multiply by $U^T$ to get

    $$ \|Ax - b\|_2^2 = \left\| \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} b \right\|_2^2 \quad\text{with}\quad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} x $$

    The minimum-norm solution to $\min_x \|Ax - b\|_2^2$ is therefore the solution to $\Sigma_1 y_1 = U_1^T b$, $y_2 = 0$:

    $$ x^* = [V_1\ V_2] \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \underbrace{[V_1\ V_2] \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix}}_{\equiv\, A^\dagger} b. $$


    Moore-Penrose Inverse

    The pseudo-inverse of $A$ is given by

    $$ A^\dagger = V \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i} $$

    Moore-Penrose conditions:

    The pseudo-inverse of a matrix is uniquely determined by these four conditions:

    (1) $A X A = A$  (2) $X A X = X$

    (3) $(A X)^H = A X$  (4) $(X A)^H = X A$

    In the full-rank overdetermined case, $A^\dagger = (A^T A)^{-1} A^T$.
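    A short NumPy sketch (my own, not part of the slides) that builds $A^\dagger$ from the SVD, compares it with np.linalg.pinv, and checks the four Moore-Penrose conditions (real case, so $X^H = X^T$):

        import numpy as np

        rng = np.random.default_rng(6)
        A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank-deficient (rank 3)

        U, s, Vt = np.linalg.svd(A)
        r = np.sum(s > 1e-10 * s[0])                                    # numerical rank
        # A_dagger = sum_i (1 / sigma_i) v_i u_i^T over the nonzero singular values
        A_dag = sum((1.0 / s[i]) * np.outer(Vt[i, :], U[:, i]) for i in range(r))

        assert np.allclose(A_dag, np.linalg.pinv(A))

        X = A_dag
        assert np.allclose(A @ X @ A, A)            # (1)
        assert np.allclose(X @ A @ X, X)            # (2)
        assert np.allclose((A @ X).T, A @ X)        # (3)
        assert np.allclose((X @ A).T, X @ A)        # (4)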


    Least-squares problems and the SVD

    The SVD can give much information about solving overdetermined and underdetermined linear systems.

    Let $A$ be an $m \times n$ matrix and $A = U \Sigma V^T$ its SVD, with $r = \mathrm{rank}(A)$, $V = [v_1, \ldots, v_n]$, $U = [u_1, \ldots, u_m]$. Then

    $$ x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i $$

    minimizes $\|b - Ax\|_2$ and has the smallest 2-norm among all possible minimizers. In addition,

    $$ \rho_{LS} \equiv \|b - A x_{LS}\|_2 = \|z\|_2 \quad\text{with}\quad z = [u_{r+1}, \ldots, u_m]^T b $$
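    A minimal NumPy sketch (my own) of the SVD least-squares formula, compared against np.linalg.lstsq, including the residual identity:

        import numpy as np

        rng = np.random.default_rng(7)
        A = rng.standard_normal((8, 4))
        b = rng.standard_normal(8)

        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = np.sum(s > 1e-10 * s[0])
        x_ls = sum((U[:, i] @ b / s[i]) * Vt[i, :] for i in range(r))

        x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
        assert np.allclose(x_ls, x_ref)

        # Residual norm = norm of the components of b outside Ran(A)
        Ufull, _, _ = np.linalg.svd(A)               # full U, m columns
        assert np.isclose(np.linalg.norm(b - A @ x_ls),
                          np.linalg.norm(Ufull[:, r:].T @ b))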


    Least-squares problems and pseudo-inverses

    A restatement of the first part of the previous result:

    Consider the general linear least-squares problem

    $$ \min_{x \in S} \|x\|_2, \qquad S = \{\, x \in \mathbb{R}^n \ :\ \|b - Ax\|_2 = \min \,\}. $$

    This problem always has a unique solution, given by

    $$ x^* = A^\dagger b $$


    Consider the matrix:

    $$ A = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 0 & 2 & 1 \end{pmatrix} $$

    Compute the singular value decomposition of $A$.

    Find the matrix $B$ of rank 1 which is closest to the above matrix in the 2-norm sense.

    What is the pseudo-inverse of $A$?

    What is the pseudo-inverse of $B$?

    Find the vector $x$ of smallest norm which minimizes $\|b - Ax\|_2$ with $b = (1, 1)^T$.

    Find the vector $x$ of smallest norm which minimizes $\|b - Bx\|_2$ with $b = (1, 1)^T$.
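    For checking answers numerically, here is a small NumPy sketch (mine, not part of the slides) that computes each requested quantity:

        import numpy as np

        A = np.array([[1., 0., 2., 0.],
                      [0., 0., 2., 1.]])
        b = np.array([1., 1.])

        U, s, Vt = np.linalg.svd(A)
        print("singular values:", s)

        # Closest rank-1 matrix in the 2-norm: truncate the SVD after one term
        B = s[0] * np.outer(U[:, 0], Vt[0, :])

        # Pseudo-inverses
        A_dag = np.linalg.pinv(A)
        B_dag = np.linalg.pinv(B)

        # Minimum-norm least-squares solutions
        print("x for A:", A_dag @ b)
        print("x for B:", B_dag @ b)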


    Ill-conditioned systems and the SVD

    Let $A$ be $m \times m$ and $A = U \Sigma V^T$ its SVD.

    The solution of $Ax = b$ is

    $$ x = A^{-1} b = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i} v_i $$

    When $A$ is very ill-conditioned, it has many small singular values. The division by these small $\sigma_i$'s will amplify any noise in the data. If $\tilde{b} = b + \epsilon$, then

    $$ A^{-1} \tilde{b} = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i} v_i + \underbrace{\sum_{i=1}^{m} \frac{u_i^T \epsilon}{\sigma_i} v_i}_{\text{error}} $$

    Result: the computed solution could be completely meaningless.


    Remedy: SVD regularization

    Truncate the SVD by keeping only the $\sigma_i$'s that satisfy $\sigma_i \ge \tau$, where $\tau$ is a threshold.

    This gives the Truncated SVD (TSVD) solution:

    $$ x_{TSVD} = \sum_{\sigma_i \ge \tau} \frac{u_i^T b}{\sigma_i} v_i $$

    Many applications [e.g., image processing, ...].
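    A small sketch (my own, with made-up synthetic data) of TSVD regularization on an ill-conditioned system: dividing by the tiny $\sigma_i$'s blows up the noise, while truncating them gives a much better solution.

        import numpy as np

        rng = np.random.default_rng(8)
        n = 20
        U, _ = np.linalg.qr(rng.standard_normal((n, n)))
        V, _ = np.linalg.qr(rng.standard_normal((n, n)))
        s = np.logspace(0, -12, n)                      # rapidly decaying singular values
        A = U @ np.diag(s) @ V.T

        x_true = rng.standard_normal(n)
        b = A @ x_true + 1e-8 * rng.standard_normal(n)  # slightly noisy right-hand side

        # Naive solution: the noise divided by the tiny sigmas dominates
        x_naive = sum((U[:, i] @ b / s[i]) * V[:, i] for i in range(n))

        # TSVD solution: keep only sigma_i >= tau
        tau = 1e-6
        x_tsvd = sum((U[:, i] @ b / s[i]) * V[:, i] for i in range(n) if s[i] >= tau)

        print("error (naive):", np.linalg.norm(x_naive - x_true))
        print("error (TSVD): ", np.linalg.norm(x_tsvd - x_true))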


    Numerical rank and the SVD

    Assume that the original matrix $A$ is exactly of rank $k$. The computed SVD of $A$ will be the SVD of a nearby matrix $A + E$.

    It is easy to show that $|\tilde{\sigma}_i - \sigma_i| \le \sigma_1 \mathbf{u}$ (up to a modest constant, with $\mathbf{u}$ the unit roundoff).

    Result: zero singular values will yield small computed singular values.

    Determining the numerical rank: treat singular values below a certain threshold $\tau$ as zero.

    Practical problem: how to set $\tau$.
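    A small sketch (my own) of numerical-rank determination: an exactly rank-3 matrix acquires tiny but nonzero trailing singular values in floating point, which a threshold filters out.

        import numpy as np

        rng = np.random.default_rng(9)
        A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))   # exact rank 3

        s = np.linalg.svd(A, compute_uv=False)
        print(s[:5])                # three O(1) values, then values near machine precision

        tau = s[0] * max(A.shape) * np.finfo(float).eps    # one common choice of threshold
        print("numerical rank:", np.sum(s > tau))          # -> 3
        print("matrix_rank:   ", np.linalg.matrix_rank(A)) # uses a similar default criterion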


    A few applications of the SVD

    Many methods require approximating the original data (matrix) by a low-rank matrix before attempting to solve the original problem.

    Regularization methods require solving a least-squares linear system $Ax = b$ approximately in the dominant singular space of $A$.

    The Latent Semantic Indexing (LSI) method in information retrieval performs the query in the dominant singular space of $A$.

    Methods utilizing Principal Component Analysis, e.g., face recognition.


    Commonality: approximate $A$ by a lower-rank approximation $A_k$ (using the dominant singular space) before solving the original problem.

    This approximation captures the main features of the data while getting rid of noise and redundancy.

    Note a common misconception: that we need to reduce dimension in order to reduce computational cost. In reality, using less information often yields better results; this is the problem of overfitting.

    Good illustration: Information Retrieval (IR).


    Information Retrieval: Vector Space Model

    Given: a collection of documents (columns of a matrix $A$) and a query vector $q$.

    The collection is represented by an $m \times n$ term-by-document matrix with $a_{ij} = L_{ij} G_i N_j$.

    Queries ("pseudo-documents") $q$ are represented similarly to a column.


    Vector Space Model - continued

    Problem: find the column of $A$ that best matches $q$.

    Similarity metric: the angle between a column $c$ and $q$; use cosines:

    $$ \frac{|c^T q|}{\|c\|_2 \|q\|_2} $$

    To rank all documents we need to compute

    $$ s = A^T q $$

    $s$ = similarity vector.

    Literal matching is not very effective.
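    A minimal sketch (my own; cosine_scores is a hypothetical helper, not from the slides) of cosine scoring against the columns of a term-by-document matrix:

        import numpy as np

        def cosine_scores(A, q):
            """Cosine of the angle between q and each (nonzero) column of A; larger = better match."""
            col_norms = np.linalg.norm(A, axis=0)
            return np.abs(A.T @ q) / (col_norms * np.linalg.norm(q))

        # Usage: scores = cosine_scores(A, q); ranking = np.argsort(-scores)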


    Use of the SVD

    Many problems with literal matching: polysemy, synonymy, ...

    Need to extract intrinsic, underlying semantic information.

    Solution (LSI): replace the matrix $A$ by a low-rank approximation obtained from the Singular Value Decomposition (SVD):

    $$ A = U \Sigma V^T \ \longrightarrow\ A_k = U_k \Sigma_k V_k^T $$

    $U_k$: term space, $V_k$: document space.

    Refer to this as the Truncated SVD (TSVD) approach.


    New similarity vector:

    $$ s_k = A_k^T q = V_k \Sigma_k U_k^T q $$

    Issues:

    Problem 1: how to select $k$?

    Problem 2: computational cost (memory + computation).

    Problem 3: updates [e.g., Google data changes all the time].

    Not practical for very large sets.


    LSI : an example

    %% D1 : INFANT & TODDLER first aid
    %% D2 : BABIES & CHILDRENs room for your HOME
    %% D3 : CHILD SAFETY at HOME
    %% D4 : Your BABYs HEALTH and SAFETY: From INFANT to TODDLER
    %% D5 : BABY PROOFING basics
    %% D6 : Your GUIDE to easy rust PROOFING
    %% D7 : Beanie BABIES collectors GUIDE
    %% D8 : SAFETY GUIDE for CHILD PROOFING your HOME
    %% TERMS: 1:BABY 2:CHILD 3:GUIDE 4:HEALTH 5:HOME
    %%        6:INFANT 7:PROOFING 8:SAFETY 9:TODDLER
    %% Source: Berry and Browne, SIAM, '99

    Number of documents: 8. Number of terms: 9.


    Raw matrix (before scaling); rows = terms, columns = documents:

    A =
             d1 d2 d3 d4 d5 d6 d7 d8
        bab   0  1  0  1  1  0  1  0
        chi   0  1  1  0  0  0  0  1
        gui   0  0  0  0  0  1  1  1
        hea   0  0  0  1  0  0  0  0
        hom   0  1  1  0  0  0  0  1
        inf   1  0  0  1  0  0  0  0
        pro   0  0  0  0  1  1  0  1
        saf   0  0  1  1  0  0  0  1
        tod   1  0  0  1  0  0  0  0

    Get the answer to the query "Child Safety", i.e.,

    q = [0 1 0 0 0 0 0 1 0]^T,

    using cosines and then using LSI with k = 3.
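    A NumPy sketch (my own, following the slide's setup) that builds this 9 x 8 term-by-document matrix and ranks the documents for the query "Child Safety", first with plain cosines and then with LSI using k = 3:

        import numpy as np

        # Rows: baby child guide health home infant proofing safety toddler; columns: d1..d8
        A = np.array([
            [0, 1, 0, 1, 1, 0, 1, 0],   # baby
            [0, 1, 1, 0, 0, 0, 0, 1],   # child
            [0, 0, 0, 0, 0, 1, 1, 1],   # guide
            [0, 0, 0, 1, 0, 0, 0, 0],   # health
            [0, 1, 1, 0, 0, 0, 0, 1],   # home
            [1, 0, 0, 1, 0, 0, 0, 0],   # infant
            [0, 0, 0, 0, 1, 1, 0, 1],   # proofing
            [0, 0, 1, 1, 0, 0, 0, 1],   # safety
            [1, 0, 0, 1, 0, 0, 0, 0],   # toddler
        ], dtype=float)
        q = np.array([0, 1, 0, 0, 0, 0, 0, 1, 0], dtype=float)   # "child safety"

        # Plain cosine scores against the columns of A
        cos = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))
        print("cosine ranking of documents:", np.argsort(-cos) + 1)

        # LSI: score with the rank-k approximation, s_k = V_k Sigma_k U_k^T q
        k = 3
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        sk = Vt[:k, :].T @ (np.diag(s[:k]) @ (U[:, :k].T @ q))
        print("LSI ranking of documents:   ", np.argsort(-sk) + 1)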


    Dimension reduction

    Dimensionality Reduction (DR) techniques are pervasive in many applications.

    Often the main goal of dimension reduction is not to reduce computational cost. Instead:

    Dimension reduction is used to reduce noise and redundancy in data.

    Dimension reduction is used to discover patterns (e.g., supervised learning).

    Techniques depend on the desired features or the application: preserve angles? preserve distances? maximize variance? ...


    The problem

    Given $d \ll m$, find a mapping

    $$ \Phi : x \in \mathbb{R}^m \ \longrightarrow\ y \in \mathbb{R}^d $$

    The mapping may be explicit (e.g., $y = V^T x$) or implicit (nonlinear).

    Practically: find a low-dimensional representation $Y \in \mathbb{R}^{d \times n}$ of $X \in \mathbb{R}^{m \times n}$.

    Two classes of methods: (1) projection techniques and (2) nonlinear implicit methods.


    Example: Digit images (a sample of 30)


    A few 2-D reductions:

    [Figure: 2-D embeddings of the digit images (classes 0 through 4) produced by PCA, LLE, KPCA, and ONPP.]


    Projection-based Dimensionality Reduction

    Given: a data set $X = [x_1, x_2, \ldots, x_n]$ and $d$, the dimension of the desired reduced space $Y$.

    Want: a linear transformation from $X$ to $Y$:

    $$ Y = V^T X, \qquad X \in \mathbb{R}^{m \times n}, \quad V \in \mathbb{R}^{m \times d}, \quad Y \in \mathbb{R}^{d \times n} $$

    The $m$-dimensional objects ($x_i$) are flattened to the $d$-dimensional space ($y_i$).

    Problem: find the best such mapping (an optimization problem), given that the $y_i$'s must satisfy certain constraints.


    Principal Component Analysis (PCA)

    PCA: find $V$ (orthogonal) so that the projected data $Y = V^T X$ has maximum variance.

    Maximize over all orthogonal $m \times d$ matrices $V$:

    $$ \sum_i \Big\| y_i - \frac{1}{n} \sum_j y_j \Big\|_2^2 = \mathrm{Tr}\left[ V^T \bar{X} \bar{X}^T V \right] $$

    where $\bar{X} = [\bar{x}_1, \ldots, \bar{x}_n]$ with $\bar{x}_i = x_i - \mu$, $\mu$ = mean of the data.

    Solution: $V$ = { $d$ dominant eigenvectors } of the covariance matrix $\bar{X} \bar{X}^T$,

    i.e., the optimal $V$ = set of left singular vectors of $\bar{X}$ associated with the $d$ largest singular values.
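    A minimal PCA sketch via the SVD of the centered data matrix (my own illustration; the data and the choice d = 2 are made up):

        import numpy as np

        rng = np.random.default_rng(10)
        m, n, d = 10, 200, 2
        X = rng.standard_normal((m, n))               # columns are the data points x_i

        mu = X.mean(axis=1, keepdims=True)
        Xbar = X - mu                                 # centered data

        # Optimal V = d dominant left singular vectors of the centered data
        U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
        V = U[:, :d]                                  # m x d, orthonormal columns

        Y = V.T @ Xbar                                # d x n reduced representation

        # Same V (up to column signs) as the d dominant eigenvectors of Xbar Xbar^T
        evals, evecs = np.linalg.eigh(Xbar @ Xbar.T)
        V_cov = evecs[:, ::-1][:, :d]
        assert np.allclose(np.abs(V.T @ V_cov), np.eye(d), atol=1e-8)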


    Show that $\bar{X} = X \left( I - \frac{1}{n} e e^T \right)$ (here $e$ = vector of all ones). What does the projector $I - \frac{1}{n} e e^T$ do?

    Show that the solution $V$ also minimizes the reconstruction error

    $$ \sum_i \| x_i - V V^T x_i \|^2 = \sum_i \| x_i - V y_i \|^2 $$

    ... and that it also maximizes $\sum_{i,j} \| y_i - y_j \|^2$.
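    A quick numerical check of the centering identity in the first exercise (my own sketch):

        import numpy as np

        rng = np.random.default_rng(11)
        m, n = 5, 8
        X = rng.standard_normal((m, n))
        e = np.ones(n)

        Xbar = X - X.mean(axis=1, keepdims=True)      # subtract the mean from every column
        P = np.eye(n) - np.outer(e, e) / n            # orthogonal projector onto span{e}-perp
        assert np.allclose(Xbar, X @ P)               # Xbar = X (I - (1/n) e e^T)
        assert np.allclose(P @ P, P)                  # P is a projector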
