lec08-svd
TRANSCRIPT
THE SINGULAR VALUE DECOMPOSITION
The SVD: existence and properties
Pseudo-inverses and the SVD
Use of the SVD for least-squares problems
Applications of the SVD

Text: mainly Sect. 2.4 [GvL 2.4, 5.5; Heath 3.6; TB 4-5]
The Singular Value Decomposition (SVD)
Theorem: For any matrix $A \in \mathbb{R}^{m\times n}$ there exist unitary matrices $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ such that

$$A = U \Sigma V^T$$

where $\Sigma$ is a diagonal matrix with entries $\sigma_{ii} \ge 0$ and

$$\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0, \qquad p = \min(m, n).$$

The $\sigma_{ii}$ are the singular values of $A$. $\sigma_{ii}$ is denoted simply by $\sigma_i$.
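As a quick sanity check of the theorem, here is a minimal NumPy sketch (the matrix is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # arbitrary 5x3 example matrix

# full_matrices=True returns U (5x5) and Vt (3x3), as in the theorem
U, s, Vt = np.linalg.svd(A, full_matrices=True)

Sigma = np.zeros_like(A)                 # 5x3 "diagonal" matrix
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)))   # U is orthogonal (real unitary)
print(s)                                 # singular values, sorted descending
```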
Proof: Let $\sigma_1 = \|A\|_2 = \max_{x,\,\|x\|_2 = 1} \|Ax\|_2$. There exists a pair of unit vectors $v_1, u_1$ such that $Av_1 = \sigma_1 u_1$.
Complete $v_1$ into an orthonormal basis of $\mathbb{R}^n$: $V \equiv [v_1, V_2]$, an $n\times n$ unitary matrix.

Complete $u_1$ into an orthonormal basis of $\mathbb{R}^m$: $U \equiv [u_1, U_2]$, an $m\times m$ unitary matrix.

Note: $U$ and $V$ can be defined as single Householder reflectors. Then it is easy to show that

$$AV = U \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \quad\Longrightarrow\quad U^T A V = \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \equiv A_1$$

Since $\|A_1\|_2 = \|A\|_2 = \sigma_1$, while applying $A_1$ to $z = (\sigma_1, w^T)^T$ gives $\|A_1 z\|_2 \ge \sigma_1^2 + \|w\|_2^2 = \|z\|_2^2$, we get $\sigma_1 \ge \|z\|_2 = (\sigma_1^2 + \|w\|_2^2)^{1/2}$, which forces $w = 0$. The proof is then completed by induction on the $(m-1)\times(n-1)$ block $B$.
Case 1 ($m \ge n$): $A = U \Sigma V^T$ with $\Sigma$ the same tall shape as $A$ (a square diagonal block sitting on top of zero rows).

Case 2 ($m < n$): $A = U \Sigma V^T$ with $\Sigma$ the same wide shape as $A$ (a square diagonal block followed by zero columns).
The thin SVD

Consider Case 1 ($m \ge n$). The SVD can be rewritten as

$$A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V^T$$

which gives

$$A = U_1 \Sigma_1 V^T$$

where $U_1$ is $m\times n$ (same shape as $A$), and $\Sigma_1$ and $V$ are $n\times n$.

Referred to as the thin SVD. Important in practice.

Question: How can you obtain the thin SVD from the QR factorization of $A$ and the SVD of an $n\times n$ matrix? (One possible answer is sketched below.)
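A sketch of one answer in NumPy, assuming $m \ge n$: factor $A = QR$, take the SVD of the small $n\times n$ factor $R = \tilde{U}\Sigma_1 V^T$, and set $U_1 = Q\tilde{U}$:

```python
import numpy as np

def thin_svd_via_qr(A):
    """Thin SVD A = U1 @ diag(s) @ Vt via a QR factorization of A.

    Assumes A is m x n with m >= n; only an n x n SVD is needed.
    """
    Q, R = np.linalg.qr(A)               # reduced QR: Q is m x n, R is n x n
    Ur, s, Vt = np.linalg.svd(R)         # SVD of the small n x n factor
    return Q @ Ur, s, Vt                 # U1 = Q @ Ur is m x n

A = np.random.default_rng(1).standard_normal((100, 4))
U1, s, Vt = thin_svd_via_qr(A)
print(np.allclose(A, U1 @ np.diag(s) @ Vt))   # True
```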
A few properties. Assume that

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 \quad\text{and}\quad \sigma_{r+1} = \cdots = \sigma_p = 0.$$

Then:

$\mathrm{rank}(A) = r$ = number of nonzero singular values.

$\mathrm{Ran}(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$

$\mathrm{Null}(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$

The matrix $A$ admits the SVD expansion:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
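These properties translate directly into code. A small sketch (the tolerance `tol` is an assumption, needed because computed singular values are never exactly zero):

```python
import numpy as np

A = np.array([[1., 0., 2., 0.],
              [2., 0., 4., 0.]])          # rank-1 example: row 2 = 2 * row 1

U, s, Vt = np.linalg.svd(A)               # full SVD: Vt is 4x4
tol = max(A.shape) * np.finfo(float).eps * s[0]

r = int(np.sum(s > tol))                  # rank = number of "nonzero" sigma_i
range_basis = U[:, :r]                    # orthonormal basis of Ran(A)
null_basis  = Vt[r:, :].T                 # orthonormal basis of Null(A)

print(r)                                  # 1
print(np.allclose(A @ null_basis, 0))     # True
```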
Properties of the SVD (continued)
$\|A\|_2 = \sigma_1$ = largest singular value.

$\|A\|_F = \left( \sum_{i=1}^{r} \sigma_i^2 \right)^{1/2}$

When $A$ is an $n\times n$ nonsingular matrix, then $\|A^{-1}\|_2 = 1/\sigma_n$ = inverse of the smallest singular value.
Optimal rank-$k$ approximation: let $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. Then

$$\min_{\mathrm{rank}(B)=k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$
Proof: First show that for any rank-$k$ matrix $B$ we have $\|A - B\|_2 \ge \sigma_{k+1}$.

Consider $X = \mathrm{span}\{v_1, v_2, \ldots, v_{k+1}\}$. Note:

$$\dim(\mathrm{Null}(B)) = n - k \quad\Longrightarrow\quad \mathrm{Null}(B) \cap X \ne \{0\}$$

Let $x_0 \in \mathrm{Null}(B) \cap X$, $x_0 \ne 0$. Write $x_0 = V y$. Then

$$\|(A - B)x_0\|_2 = \|A x_0\|_2 = \|U \Sigma V^T V y\|_2 = \|\Sigma y\|_2$$

But $\|\Sigma y\|_2 \ge \sigma_{k+1} \|x_0\|_2$ (Show this).

Second: take $B = A_k$. It achieves the min.
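A numerical illustration of the result (a sketch; the test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

# ||A - A_k||_2 equals sigma_{k+1} (0-based index k)
print(np.linalg.norm(A - Ak, 2), s[k])        # the two numbers agree
```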
Right and Left Singular Vectors:

$$A v_i = \sigma_i u_i, \qquad A^T u_j = \sigma_j v_j$$

Consequence: $A^T A v_i = \sigma_i^2 v_i$ and $A A^T u_i = \sigma_i^2 u_i$.

The right singular vectors ($v_i$'s) are eigenvectors of $A^T A$.

The left singular vectors ($u_i$'s) are eigenvectors of $A A^T$.

It is possible to get the SVD from eigenvectors of $A A^T$ and $A^T A$, but: difficulties arise due to the non-uniqueness of the SVD.
Define the $r\times r$ matrix

$$\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$$

Let $A \in \mathbb{R}^{m\times n}$ and consider $A^T A$ ($\in \mathbb{R}^{n\times n}$):

$$A^T A = V \Sigma^T \Sigma V^T \quad\Longrightarrow\quad A^T A = V \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{n\times n} V^T$$

This gives the spectral decomposition of $A^T A$.
Similarly, $U$ gives the eigenvectors of $A A^T$:

$$A A^T = U \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{m\times m} U^T$$

Important: $A^T A = V D_1 V^T$ and $A A^T = U D_2 U^T$ give the SVD factors $U, V$ only up to signs!
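A small sketch of this pitfall: eigenvectors of $A^TA$ recover the $v_i$ only up to sign, so the consistent left vectors must be derived as $u_i = Av_i/\sigma_i$ rather than from an independent eigendecomposition of $AA^T$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

# eigenvectors of A^T A: columns are right singular vectors up to sign
w, V = np.linalg.eigh(A.T @ A)            # eigh returns ascending eigenvalues
V = V[:, ::-1]                            # reorder to descending
s = np.sqrt(w[::-1])

# derive left vectors consistently as u_i = A v_i / sigma_i
U = (A @ V) / s
print(np.allclose(A, U @ np.diag(s) @ V.T))   # True with consistent signs

# By contrast, an independent eigendecomposition of A A^T gives left
# vectors with arbitrary signs, which need not satisfy A = U Sigma V^T.
```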
Pseudo-inverse of an arbitrary matrix

The pseudo-inverse of $A$ is the mapping from a vector $b$ to the solution of $\min_x \|Ax - b\|_2^2$.

In the full-rank overdetermined case, the normal equations yield

$$x = \underbrace{(A^T A)^{-1} A^T}_{A^\dagger}\, b$$
In the rank-deficient case, consider the SVD of $A$:

$$A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

Then left-multiply by $U^T$ inside the norm (which leaves the 2-norm unchanged) to get

$$\|Ax - b\|_2^2 = \left\| \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} b \right\|_2^2 \quad\text{with}\quad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} x$$

Then the minimum-norm solution to $\min_x \|Ax - b\|_2^2$ is the solution to $\Sigma_1 y_1 = U_1^T b$, $y_2 = 0$:

$$x = [V_1\ V_2] \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \underbrace{[V_1\ V_2] \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix}}_{A^\dagger}\, b.$$
Moore-Penrose Inverse

The pseudo-inverse of $A$ is given by

$$A^\dagger = V \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i}$$

Moore-Penrose conditions: the pseudo-inverse of a matrix is uniquely determined by these four conditions:

(1) $AXA = A$    (2) $XAX = X$
(3) $(AX)^H = AX$    (4) $(XA)^H = XA$

In the full-rank overdetermined case, $A^\dagger = (A^T A)^{-1} A^T$.
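A quick check of the four conditions with NumPy's `pinv` (a sketch on a deliberately rank-deficient example):

```python
import numpy as np

A = np.array([[1., 0., 2., 0.],
              [0., 0., 2., 1.],
              [1., 0., 2., 0.]])         # rank 2: row 3 repeats row 1

X = np.linalg.pinv(A)                    # Moore-Penrose pseudo-inverse

print(np.allclose(A @ X @ A, A))         # (1) AXA = A
print(np.allclose(X @ A @ X, X))         # (2) XAX = X
print(np.allclose((A @ X).T, A @ X))     # (3) AX symmetric (real case)
print(np.allclose((X @ A).T, X @ A))     # (4) XA symmetric (real case)
```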
Least-squares problems and the SVD

The SVD gives much information about solving overdetermined and underdetermined linear systems.

Let $A$ be an $m\times n$ matrix and $A = U \Sigma V^T$ its SVD, with $r = \mathrm{rank}(A)$, $V = [v_1, \ldots, v_n]$, $U = [u_1, \ldots, u_m]$. Then

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i$$

minimizes $\|b - Ax\|_2$ and has the smallest 2-norm among all possible minimizers. In addition,

$$\rho_{LS} \equiv \|b - A x_{LS}\|_2 = \|z\|_2 \quad\text{with}\quad z = [u_{r+1}, \ldots, u_m]^T b$$
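A minimal sketch of this formula in NumPy (the rank tolerance `tol` is an assumption):

```python
import numpy as np

def min_norm_lstsq(A, b, tol=1e-12):
    """Minimum-norm least-squares solution x_LS via the SVD formula."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank
    c = (U[:, :r].T @ b) / s[:r]             # coefficients u_i^T b / sigma_i
    return Vt[:r, :].T @ c                   # sum of c_i * v_i

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = min_norm_lstsq(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```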
Least-squares problems and pseudo-inverses

A restatement of the first part of the previous result:

Consider the general linear least-squares problem

$$\min_{x \in S} \|x\|_2, \qquad S = \{x \in \mathbb{R}^n \mid \|b - Ax\|_2\ \text{min}\}.$$

This problem always has a unique solution, given by

$$x = A^\dagger b$$
Exercise: Consider the matrix

$$A = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 0 & 2 & 1 \end{pmatrix}$$

Compute the singular value decomposition of $A$.

Find the matrix $B$ of rank 1 which is closest to $A$ in the 2-norm sense.

What is the pseudo-inverse of $A$?

What is the pseudo-inverse of $B$?

Find the vector $x$ of smallest norm which minimizes $\|b - Ax\|_2$ with $b = (1, 1)^T$.

Find the vector $x$ of smallest norm which minimizes $\|b - Bx\|_2$ with $b = (1, 1)^T$. (A numerical check is sketched below.)
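To verify your hand computations numerically (a sketch; it prints reference answers rather than deriving them):

```python
import numpy as np

A = np.array([[1., 0., 2., 0.],
              [0., 0., 2., 1.]])
b = np.array([1., 1.])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", s)

B = s[0] * np.outer(U[:, 0], Vt[0, :])    # closest rank-1 matrix in 2-norm
print("B =", B)

print("pinv(A) =", np.linalg.pinv(A))
print("pinv(B) =", np.linalg.pinv(B))
print("min-norm LS for A:", np.linalg.pinv(A) @ b)
print("min-norm LS for B:", np.linalg.pinv(B) @ b)
```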
Ill-conditioned systems and the SVD

Let $A$ be an $m\times m$ matrix and $A = U \Sigma V^T$ its SVD. The solution of $Ax = b$ is

$$x = A^{-1} b = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i}\, v_i$$

When $A$ is very ill-conditioned, it has many small singular values. The division by these small $\sigma_i$'s will amplify any noise in the data. If $\hat{b} = b + \epsilon$, then

$$A^{-1} \hat{b} = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i}\, v_i + \underbrace{\sum_{i=1}^{m} \frac{u_i^T \epsilon}{\sigma_i}\, v_i}_{\text{Error}}$$

Result: the computed solution could be completely meaningless.
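A small demonstration of this effect (a sketch using the notoriously ill-conditioned Hilbert matrix from `scipy.linalg`; any ill-conditioned matrix would do):

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                            # cond(A) is astronomically large
x_true = np.ones(n)
b = A @ x_true

eps = 1e-10 * np.random.default_rng(5).standard_normal(n)  # tiny noise
x_noisy = np.linalg.solve(A, b + eps)

print(np.linalg.cond(A))                  # huge condition number
print(np.linalg.norm(x_noisy - x_true))   # error far larger than 1e-10
```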
Remedy: SVD regularization

Truncate the SVD by keeping only the $\sigma_i$'s that satisfy $\sigma_i > \tau$, where $\tau$ is a threshold.

This gives the Truncated SVD (TSVD) solution:

$$x_{TSVD} = \sum_{\sigma_i > \tau} \frac{u_i^T b}{\sigma_i}\, v_i$$

Many applications [e.g., image processing, ...].
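Continuing the Hilbert-matrix sketch above, the TSVD solution recovers a meaningful answer from the same noisy setup (the threshold `tau` is an assumption one must tune):

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)
x_true = np.ones(n)
b = A @ x_true + 1e-10 * np.random.default_rng(5).standard_normal(n)

U, s, Vt = np.linalg.svd(A)
tau = 1e-8                                # keep only sigma_i > tau
keep = s > tau
coeffs = (U[:, keep].T @ b) / s[keep]
x_tsvd = Vt[keep, :].T @ coeffs

# orders of magnitude smaller than the error of the naive solve above
print(np.linalg.norm(x_tsvd - x_true))
```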
Numerical rank and the SVD

Assume that the original matrix $A$ is exactly of rank $k$. The computed SVD of $A$ will be the SVD of a nearby matrix $A + E$.

It is easy to show that $|\tilde{\sigma}_i - \sigma_i| \le \alpha\, \sigma_1 u$, where $u$ is the unit roundoff and $\alpha$ a modest constant.

Result: zero singular values will yield small computed singular values.

Determining the numerical rank: treat singular values below a certain threshold $\tau$ as zero.

Practical problem: need to set $\tau$. (A common default is sketched below.)
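A sketch of a common choice, the default used by `numpy.linalg.matrix_rank`: $\tau = \max(m, n)\, u\, \sigma_1$:

```python
import numpy as np

rng = np.random.default_rng(6)
# build a matrix of exact rank 3, then inspect its computed singular values
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))

s = np.linalg.svd(A, compute_uv=False)
tau = max(A.shape) * np.finfo(float).eps * s[0]   # threshold ~ m * u * sigma_1

print(s[:5])                             # 3 O(1) values, then tiny ones
print(np.sum(s > tau))                   # numerical rank: 3
print(np.linalg.matrix_rank(A))          # same rule built in: 3
```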
A few applications of the SVD

Many methods require approximating the original data (matrix) by a low-rank matrix before attempting to solve the original problem.

Regularization methods require solving a least-squares linear system $Ax = b$ approximately in the dominant singular space of $A$.

The Latent Semantic Indexing (LSI) method in information retrieval performs the query in the dominant singular space of $A$.

Methods utilizing Principal Component Analysis, e.g., Face Recognition.
Commonality: approximate $A$ (or $A^\dagger$) by a lower-rank approximation $A_k$ (using the dominant singular space) before solving the original problem.

This approximation captures the main features of the data while getting rid of noise and redundancy.

Note a common misconception: that we need to reduce dimension in order to reduce computational cost. In reality, using less information often yields better results. This is the problem of overfitting.

Good illustration: Information Retrieval (IR).
Information Retrieval: Vector Space Model

Given: a collection of documents (the columns of a matrix $A$) and a query vector $q$.

The collection is represented by an $m\times n$ term-by-document matrix with entries $a_{ij} = L_{ij} G_i N_j$ (a product of local, global, and normalization weights).

Queries (pseudo-documents) $q$ are represented similarly to a column.
Vector Space Model - continued

Problem: find the column of $A$ that best matches $q$.

Similarity metric: the angle between a column $c$ and $q$. Use cosines:

$$\frac{|c^T q|}{\|c\|_2 \|q\|_2}$$

To rank all documents we need to compute

$$s = A^T q$$

$s$ = similarity vector.

Literal matching is not very effective.
Use of the SVD

There are many problems with literal matching: polysemy, synonymy, ...

Need to extract intrinsic, underlying semantic information.

Solution (LSI): replace the matrix $A$ by a low-rank approximation obtained from the Singular Value Decomposition (SVD):

$$A = U \Sigma V^T \quad\longrightarrow\quad A_k = U_k \Sigma_k V_k^T$$

$U_k$: term space; $V_k$: document space.

Refer to this as the Truncated SVD (TSVD) approach.
New similarity vector:

$$s_k = A_k^T q = V_k \Sigma_k U_k^T q$$

Issues:

Problem 1: how to select $k$?

Problem 2: computational cost (memory + computation).

Problem 3: updates [e.g., Google data changes all the time].

Not practical for very large sets.
LSI: an example

D1: INFANT & TODDLER first aid
D2: BABIES & CHILDREN's room for your HOME
D3: CHILD SAFETY at HOME
D4: Your BABY's HEALTH and SAFETY: from INFANT to TODDLER
D5: BABY PROOFING basics
D6: Your GUIDE to easy rust PROOFING
D7: Beanie BABIES collectors GUIDE
D8: SAFETY GUIDE for CHILD PROOFING your HOME

TERMS: 1: BABY  2: CHILD  3: GUIDE  4: HEALTH  5: HOME  6: INFANT  7: PROOFING  8: SAFETY  9: TODDLER

Source: Berry and Browne, SIAM, '99.

Number of documents: 8. Number of terms: 9.
Raw matrix (before scaling):

        d1  d2  d3  d4  d5  d6  d7  d8
bab      .   1   .   1   1   .   1   .
chi      .   1   1   .   .   .   .   1
gui      .   .   .   .   .   1   1   1
hea      .   .   .   1   .   .   .   .
hom      .   1   1   .   .   .   .   1
inf      1   .   .   1   .   .   .   .
pro      .   .   .   .   1   1   .   1
saf      .   .   1   1   .   .   .   1
tod      1   .   .   1   .   .   .   .

Exercise: Get the answer to the query "Child Safety", i.e.,

$$q = [0\ 1\ 0\ 0\ 0\ 0\ 0\ 1\ 0]^T$$

using cosines, and then using LSI with $k = 3$. (A sketch is given below.)
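A sketch of the whole computation in NumPy, using the raw 0/1 matrix above (the scaled weights $L_{ij}G_iN_j$ are not applied), with columns normalized via cosines:

```python
import numpy as np

# term-by-document matrix (rows: bab chi gui hea hom inf pro saf tod)
A = np.array([
    [0,1,0,1,1,0,1,0],   # baby
    [0,1,1,0,0,0,0,1],   # child
    [0,0,0,0,0,1,1,1],   # guide
    [0,0,0,1,0,0,0,0],   # health
    [0,1,1,0,0,0,0,1],   # home
    [1,0,0,1,0,0,0,0],   # infant
    [0,0,0,0,1,1,0,1],   # proofing
    [0,0,1,1,0,0,0,1],   # safety
    [1,0,0,1,0,0,0,0],   # toddler
], dtype=float)
q = np.array([0,1,0,0,0,0,0,1,0], dtype=float)    # query "child safety"

# plain cosine ranking (literal matching)
cos = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))
print(np.round(cos, 2))

# LSI ranking with k = 3
k = 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
cos_k = (Ak.T @ q) / (np.linalg.norm(Ak, axis=0) * np.linalg.norm(q))
print(np.round(cos_k, 2))
```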
Dimension reduction

Dimensionality Reduction (DR) techniques are pervasive in many applications.

Often the main goal of dimension reduction is not to reduce computational cost. Instead:

Dimension reduction is used to reduce noise and redundancy in the data.

Dimension reduction is used to discover patterns (e.g., supervised learning).

Techniques depend on the desired features or the application: Preserve angles? Preserve distances? Maximize variance? ...
The problem

Given $d \ll m$, find a mapping

$$x \in \mathbb{R}^m \longrightarrow y \in \mathbb{R}^d$$

The mapping may be explicit (e.g., $y = V^T x$) or implicit (nonlinear).

Practically: find a low-dimensional representation $Y \in \mathbb{R}^{d\times n}$ of $X \in \mathbb{R}^{m\times n}$.

Two classes of methods: (1) projection techniques and (2) nonlinear implicit methods.
Example: Digit images (a sample of 30)
A few 2-D reductions:
[Figure: 2-D embeddings of the digit images (digits 0-4) produced by PCA, LLE, KPCA, and ONPP.]
Projection-based Dimensionality Reduction

Given: a data set $X = [x_1, x_2, \ldots, x_n]$, and $d$, the dimension of the desired reduced space $Y$.

Want: a linear transformation from $X$ to $Y$:

$$X \in \mathbb{R}^{m\times n}, \qquad V \in \mathbb{R}^{m\times d}, \qquad Y = V^T X, \qquad Y \in \mathbb{R}^{d\times n}$$

The $m$-dimensional objects ($x_i$) are flattened to the $d$-dimensional space ($y_i$).

Problem: find the best such mapping (an optimization problem), given that the $y_i$'s must satisfy certain constraints.
Principal Component Analysis (PCA)

PCA: find $V$ (orthogonal) so that the projected data $Y = V^T X$ has maximum variance.

Maximize over all orthogonal $m\times d$ matrices $V$:

$$\sum_i \left\| y_i - \frac{1}{n}\sum_j y_j \right\|_2^2 = \cdots = \mathrm{Tr}\left[ V^T \bar{X} \bar{X}^T V \right]$$

where $\bar{X} = [\bar{x}_1, \ldots, \bar{x}_n]$ with $\bar{x}_i = x_i - \mu$, $\mu$ = the mean.

Solution: $V$ = {$d$ dominant eigenvectors} of the covariance matrix $\bar{X}\bar{X}^T$,

i.e., the optimal $V$ = the set of left singular vectors of $\bar{X}$ associated with the $d$ largest singular values.
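A minimal PCA sketch along these lines (center the data, then take the top-$d$ left singular vectors):

```python
import numpy as np

def pca(X, d):
    """Project the columns of X (an m x n data matrix) to d dimensions."""
    mu = X.mean(axis=1, keepdims=True)   # mean of the columns
    Xbar = X - mu                        # centered data
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    V = U[:, :d]                         # top-d left singular vectors
    return V.T @ Xbar, V                 # Y = V^T Xbar, plus the basis V

rng = np.random.default_rng(7)
X = rng.standard_normal((20, 200))       # 200 samples in R^20
Y, V = pca(X, 2)
print(Y.shape)                           # (2, 200)
```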
Exercises:

Show that $\bar{X} = X\left(I - \frac{1}{n} e e^T\right)$ (here $e$ = the vector of all ones). What does the projector $I - \frac{1}{n} e e^T$ do? (A numerical check is sketched below.)

Show that the solution $V$ also minimizes the reconstruction error

$$\sum_i \|x_i - V V^T x_i\|^2 = \sum_i \|x_i - V y_i\|^2$$

... and that it also maximizes $\sum_{i,j} \|y_i - y_j\|^2$.
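A quick numerical check of the first claim (a sketch; `X` is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 5, 7
X = rng.standard_normal((m, n))
e = np.ones((n, 1))

Xbar = X - X.mean(axis=1, keepdims=True)      # subtract the column mean
P = np.eye(n) - (e @ e.T) / n                 # centering projector

print(np.allclose(Xbar, X @ P))               # True: Xbar = X (I - ee^T/n)
print(np.allclose(P @ P, P))                  # P is indeed a projector
```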