lec08-svd



    THE SINGULAR VALUE DECOMPOSITION

    The SVD: existence and properties

    Pseudo-inverses and the SVD

    Use of SVD for least-squares problems

    Applications of the SVD

    Text: mainly sect. 2.4


    The Singular Value Decomposition (SVD)

    Theorem: For any matrix $A \in \mathbb{R}^{m \times n}$ there exist unitary matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ such that

    $$ A = U \Sigma V^T $$

    where $\Sigma$ is a diagonal matrix with entries $\sigma_{ii} \ge 0$ and

    $$ \sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{pp} \ge 0, \qquad p = \min(n, m). $$

    The $\sigma_{ii}$ are the singular values of $A$; $\sigma_{ii}$ is denoted simply by $\sigma_i$.
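    As a quick illustration (my own sketch, not part of the slides), the following NumPy snippet computes a full SVD and checks the factorization and the ordering of the singular values:

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((5, 3))                 # m x n with m = 5, n = 3

        # Full SVD: U is m x m, Vt is n x n, s holds the p = min(m, n) singular values
        U, s, Vt = np.linalg.svd(A, full_matrices=True)

        # Rebuild the m x n "diagonal" Sigma and verify A = U Sigma V^T
        Sigma = np.zeros_like(A)
        np.fill_diagonal(Sigma, s)
        assert np.allclose(A, U @ Sigma @ Vt)

        # Singular values are nonnegative and sorted in decreasing order
        assert np.all(s >= 0) and np.all(np.diff(s) <= 0)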


    Proof: Let $\sigma_1 = \|A\|_2 = \max_{x,\, \|x\|_2 = 1} \|Ax\|_2$. There exists a pair of unit vectors $v_1, u_1$ such that $A v_1 = \sigma_1 u_1$.


    Complete $v_1$ into an orthonormal basis of $\mathbb{R}^n$:

    $V \equiv [v_1, V_2]$ = $n \times n$ unitary.

    Complete $u_1$ into an orthonormal basis of $\mathbb{R}^m$:

    $U \equiv [u_1, U_2]$ = $m \times m$ unitary.

    Define $U$, $V$ as single Householder reflectors.

    Then, it is easy to show that

    $$ A V = U \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \quad\Rightarrow\quad U^T A V = \begin{pmatrix} \sigma_1 & w^T \\ 0 & B \end{pmatrix} \equiv A_1 $$


    [Figure: block shapes of $U$, $\Sigma$, $V^T$ in $A = U \Sigma V^T$. Case 1: $m > n$ (tall $A$, $\Sigma$ has zero rows at the bottom). Case 2: $m < n$ (flat $A$, $\Sigma$ has zero columns on the right).]


    The thin SVD

    Consider Case 1 ($m \ge n$). The SVD can be rewritten as

    $$ A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V^T \quad\text{which gives}\quad A = U_1 \Sigma_1 V^T $$

    where $U_1$ is $m \times n$ (same shape as $A$), and $\Sigma_1$ and $V$ are $n \times n$.

    Referred to as the thin SVD. Important in practice.

    How can you obtain the thin SVD from the QR factorization of $A$ and the SVD of an $n \times n$ matrix? (One route is sketched below.)
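    A minimal NumPy sketch (my own, not part of the slides) of the thin SVD, together with one possible answer to the question above: factor $A = QR$, take the SVD of the small $n \times n$ factor $R$, and combine.

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((100, 4))            # tall matrix, m >> n

        # Thin (economy) SVD directly: U1 is m x n, s has length n, Vt is n x n
        U1, s, Vt = np.linalg.svd(A, full_matrices=False)
        assert np.allclose(A, U1 @ np.diag(s) @ Vt)

        # Via QR: A = Q R with Q (m x n) and R (n x n), then SVD of the small R
        Q, R = np.linalg.qr(A)                       # reduced QR by default
        Ur, s2, Vt2 = np.linalg.svd(R)
        U1_qr = Q @ Ur                               # A = (Q Ur) diag(s2) Vt2
        assert np.allclose(A, U1_qr @ np.diag(s2) @ Vt2)
        assert np.allclose(s, s2)                    # same singular values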


    A few properties. Assume that

    $$ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 \quad\text{and}\quad \sigma_{r+1} = \cdots = \sigma_p = 0. $$

    Then:

    rank$(A) = r$ = number of nonzero singular values.

    Ran$(A) = \mathrm{span}\{u_1, u_2, \ldots, u_r\}$

    Null$(A) = \mathrm{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$

    The matrix $A$ admits the SVD expansion:

    $$ A = \sum_{i=1}^{r} \sigma_i u_i v_i^T $$
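    A small NumPy check of these properties (my own sketch): build a matrix of known rank, count the nonzero singular values, and rebuild $A$ from its rank-$r$ expansion.

        import numpy as np

        rng = np.random.default_rng(2)
        r = 2
        A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))   # rank 2 by construction

        U, s, Vt = np.linalg.svd(A)
        tol = s[0] * max(A.shape) * np.finfo(float).eps
        print("numerical rank:", np.sum(s > tol))                       # -> 2

        # Rank-r SVD expansion: A = sum_i sigma_i u_i v_i^T
        A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
        assert np.allclose(A, A_rebuilt)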


    Properties of the SVD (continued)

    $\|A\|_2 = \sigma_1$ = largest singular value.

    $\|A\|_F = \left( \sum_{i=1}^{r} \sigma_i^2 \right)^{1/2}$

    When $A$ is an $n \times n$ nonsingular matrix, then $\|A^{-1}\|_2 = 1/\sigma_n$ = inverse of the smallest singular value.
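    These identities are easy to confirm numerically; a minimal sketch (my own, not part of the slides):

        import numpy as np

        rng = np.random.default_rng(3)
        A = rng.standard_normal((5, 5))
        s = np.linalg.svd(A, compute_uv=False)       # singular values only

        assert np.isclose(np.linalg.norm(A, 2), s[0])                        # 2-norm = sigma_1
        assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # Frobenius norm
        assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / s[-1])  # ||A^{-1}||_2 = 1/sigma_n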


    Proof (of the best rank-$k$ approximation property: $\min_{\mathrm{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1}$, attained by the truncated expansion $B = A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$):

    First show: for any rank-$k$ matrix $B$ we have $\|A - B\|_2 \ge \sigma_{k+1}$.

    Consider $\mathcal{X} = \mathrm{span}\{v_1, v_2, \ldots, v_{k+1}\}$. Note:

    $\dim(\mathrm{Null}(B)) = n - k \ \Rightarrow\ \mathrm{Null}(B) \cap \mathcal{X} \ne \{0\}$

    Let $x_0 \in \mathrm{Null}(B) \cap \mathcal{X}$, $x_0 \ne 0$. Write $x_0 = V y$. Then

    $$ \|(A - B) x_0\|_2 = \|A x_0\|_2 = \|U \Sigma V^T V y\|_2 = \|\Sigma y\|_2 $$

    But $\|\Sigma y\|_2 \ge \sigma_{k+1} \|x_0\|_2$ (show this).

    Second: take $B = A_k$. It achieves the min.
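    A quick numerical illustration of this property (my own sketch): truncating the SVD expansion after $k$ terms gives $\|A - A_k\|_2 = \sigma_{k+1}$.

        import numpy as np

        rng = np.random.default_rng(4)
        A = rng.standard_normal((8, 6))
        U, s, Vt = np.linalg.svd(A)

        k = 3
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # best rank-k approximation
        assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])   # distance equals sigma_{k+1}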


    Right and Left Singular vectors:

    $$ A v_i = \sigma_i u_i \qquad\text{and}\qquad A^T u_j = \sigma_j v_j $$

    Consequence: $A^T A v_i = \sigma_i^2 v_i$ and $A A^T u_i = \sigma_i^2 u_i$.

    Right singular vectors ($v_i$'s) are eigenvectors of $A^T A$.

    Left singular vectors ($u_i$'s) are eigenvectors of $A A^T$.

    It is possible to get the SVD from eigenvectors of $A A^T$ and $A^T A$, but there are difficulties due to the non-uniqueness of the SVD.
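    A quick check of these relations (my own sketch): the squared singular values are eigenvalues of $A^T A$, and each right singular vector is a corresponding eigenvector.

        import numpy as np

        rng = np.random.default_rng(5)
        A = rng.standard_normal((7, 4))
        U, s, Vt = np.linalg.svd(A)

        # Eigenvalues of A^T A, sorted decreasingly, equal sigma_i^2
        evals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
        assert np.allclose(evals, s**2)

        # A^T A v_1 = sigma_1^2 v_1
        v1 = Vt[0, :]
        assert np.allclose(A.T @ A @ v1, s[0]**2 * v1)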


    Define the $r \times r$ matrix

    $$ \Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r) $$

    Let $A \in \mathbb{R}^{m \times n}$ and consider $A^T A$ ($\in \mathbb{R}^{n \times n}$):

    $$ A^T A = V \Sigma^T \Sigma V^T \quad\Rightarrow\quad A^T A = V \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{n \times n} V^T $$

    This gives the spectral decomposition of $A^T A$.


    Similarly, $U$ gives the eigenvectors of $A A^T$:

    $$ A A^T = U \underbrace{\begin{pmatrix} \Sigma_1^2 & 0 \\ 0 & 0 \end{pmatrix}}_{m \times m} U^T $$

    Important: $A^T A = V D_1 V^T$ and $A A^T = U D_2 U^T$ give the SVD factors $U$, $V$ only up to signs!


    Pseudo-inverse of an arbitrary matrix

    The pseudo-inverse of $A$ is the mapping from a vector $b$ to the solution of $\min_x \|Ax - b\|_2^2$.

    In the full-rank overdetermined case, the normal equations yield

    $$ x^* = (A^T A)^{-1} A^T b $$


    In the rank-deficient case, consider the full ("fat") SVD of $A$:

    $$ A = [U_1\ U_2] \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} = \sum_{i=1}^{r} \sigma_i u_i v_i^T $$

    Then left-multiply by $U^T$ to get

    $$ \|Ax - b\|_2^2 = \left\| \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} b \right\|_2^2 \quad\text{with}\quad \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix} x $$

    The minimum-norm solution to $\min_x \|Ax - b\|_2^2$ is therefore the solution to $\Sigma_1 y_1 = U_1^T b$, $y_2 = 0$:

    $$ x^* = [V_1\ V_2] \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \underbrace{[V_1\ V_2] \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix}}_{\equiv\, A^\dagger} b. $$


    Moore-Penrose Inverse

    The pseudo-inverse of $A$ is given by

    $$ A^\dagger = V \begin{pmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i} $$

    Moore-Penrose conditions:

    The pseudo-inverse of a matrix is uniquely determined by these four conditions:

    (1) $A X A = A$  (2) $X A X = X$

    (3) $(A X)^H = A X$  (4) $(X A)^H = X A$

    In the full-rank overdetermined case, $A^\dagger = (A^T A)^{-1} A^T$.
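    A short NumPy sketch (my own, not part of the slides) that builds $A^\dagger$ from the SVD, compares it with np.linalg.pinv, and checks the four Moore-Penrose conditions (real case, so $X^H = X^T$):

        import numpy as np

        rng = np.random.default_rng(6)
        A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank-deficient (rank 3)

        U, s, Vt = np.linalg.svd(A)
        r = np.sum(s > 1e-10 * s[0])                                    # numerical rank
        # A_dagger = sum_i (1 / sigma_i) v_i u_i^T over the nonzero singular values
        A_dag = sum((1.0 / s[i]) * np.outer(Vt[i, :], U[:, i]) for i in range(r))

        assert np.allclose(A_dag, np.linalg.pinv(A))

        X = A_dag
        assert np.allclose(A @ X @ A, A)            # (1)
        assert np.allclose(X @ A @ X, X)            # (2)
        assert np.allclose((A @ X).T, A @ X)        # (3)
        assert np.allclose((X @ A).T, X @ A)        # (4)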


    Least-squares problems and the SVD

    The SVD can give much information about solving overdetermined and underdetermined linear systems.

    Let $A$ be an $m \times n$ matrix and $A = U \Sigma V^T$ its SVD, with $r = \mathrm{rank}(A)$, $V = [v_1, \ldots, v_n]$, $U = [u_1, \ldots, u_m]$. Then

    $$ x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i $$

    minimizes $\|b - Ax\|_2$ and has the smallest 2-norm among all possible minimizers. In addition,

    $$ \rho_{LS} \equiv \|b - A x_{LS}\|_2 = \|z\|_2 \quad\text{with}\quad z = [u_{r+1}, \ldots, u_m]^T b $$
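    A minimal NumPy sketch (my own) of the SVD least-squares formula, compared against np.linalg.lstsq, including the residual identity:

        import numpy as np

        rng = np.random.default_rng(7)
        A = rng.standard_normal((8, 4))
        b = rng.standard_normal(8)

        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = np.sum(s > 1e-10 * s[0])
        x_ls = sum((U[:, i] @ b / s[i]) * Vt[i, :] for i in range(r))

        x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
        assert np.allclose(x_ls, x_ref)

        # Residual norm = norm of the components of b outside Ran(A)
        Ufull, _, _ = np.linalg.svd(A)               # full U, m columns
        assert np.isclose(np.linalg.norm(b - A @ x_ls),
                          np.linalg.norm(Ufull[:, r:].T @ b))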


    Least-squares problems and pseudo-inverses

    A restatement of the first part of the previous result:

    Consider the general linear least-squares problem

    $$ \min_{x \in S} \|x\|_2, \qquad S = \{\, x \in \mathbb{R}^n \ :\ \|b - Ax\|_2 = \min \,\}. $$

    This problem always has a unique solution, given by

    $$ x^* = A^\dagger b $$


    Consider the matrix:

    $$ A = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 0 & 2 & 1 \end{pmatrix} $$

    Compute the singular value decomposition of $A$.

    Find the matrix $B$ of rank 1 which is closest to the above matrix in the 2-norm sense.

    What is the pseudo-inverse of $A$?

    What is the pseudo-inverse of $B$?

    Find the vector $x$ of smallest norm which minimizes $\|b - Ax\|_2$ with $b = (1, 1)^T$.

    Find the vector $x$ of smallest norm which minimizes $\|b - Bx\|_2$ with $b = (1, 1)^T$.
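    For checking answers numerically, here is a small NumPy sketch (mine, not part of the slides) that computes each requested quantity:

        import numpy as np

        A = np.array([[1., 0., 2., 0.],
                      [0., 0., 2., 1.]])
        b = np.array([1., 1.])

        U, s, Vt = np.linalg.svd(A)
        print("singular values:", s)

        # Closest rank-1 matrix in the 2-norm: truncate the SVD after one term
        B = s[0] * np.outer(U[:, 0], Vt[0, :])

        # Pseudo-inverses
        A_dag = np.linalg.pinv(A)
        B_dag = np.linalg.pinv(B)

        # Minimum-norm least-squares solutions
        print("x for A:", A_dag @ b)
        print("x for B:", B_dag @ b)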


    Ill-conditioned systems and the SVD

    Let $A$ be $m \times m$ and $A = U \Sigma V^T$ its SVD.

    The solution of $Ax = b$ is

    $$ x = A^{-1} b = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i} v_i $$

    When $A$ is very ill-conditioned, it has many small singular values. The division by these small $\sigma_i$'s will amplify any noise in the data. If $\tilde{b} = b + \epsilon$, then

    $$ A^{-1} \tilde{b} = \sum_{i=1}^{m} \frac{u_i^T b}{\sigma_i} v_i + \underbrace{\sum_{i=1}^{m} \frac{u_i^T \epsilon}{\sigma_i} v_i}_{\text{error}} $$

    Result: the computed solution could be completely meaningless.


    Remedy: SVD regularization

    Truncate the SVD by keeping only the $\sigma_i$'s that satisfy $\sigma_i \ge \tau$, where $\tau$ is a threshold.

    This gives the Truncated SVD (TSVD) solution:

    $$ x_{TSVD} = \sum_{\sigma_i \ge \tau} \frac{u_i^T b}{\sigma_i} v_i $$

    Many applications [e.g., image processing, ...].
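    A small sketch (my own, with made-up synthetic data) of TSVD regularization on an ill-conditioned system: dividing by the tiny $\sigma_i$'s blows up the noise, while truncating them gives a much better solution.

        import numpy as np

        rng = np.random.default_rng(8)
        n = 20
        U, _ = np.linalg.qr(rng.standard_normal((n, n)))
        V, _ = np.linalg.qr(rng.standard_normal((n, n)))
        s = np.logspace(0, -12, n)                      # rapidly decaying singular values
        A = U @ np.diag(s) @ V.T

        x_true = rng.standard_normal(n)
        b = A @ x_true + 1e-8 * rng.standard_normal(n)  # slightly noisy right-hand side

        # Naive solution: the noise divided by the tiny sigmas dominates
        x_naive = sum((U[:, i] @ b / s[i]) * V[:, i] for i in range(n))

        # TSVD solution: keep only sigma_i >= tau
        tau = 1e-6
        x_tsvd = sum((U[:, i] @ b / s[i]) * V[:, i] for i in range(n) if s[i] >= tau)

        print("error (naive):", np.linalg.norm(x_naive - x_true))
        print("error (TSVD): ", np.linalg.norm(x_tsvd - x_true))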


    Numerical rank and the SVD

    Assume that the original matrix $A$ is exactly of rank $k$. The computed SVD of $A$ will be the SVD of a nearby matrix $A + E$.

    It is easy to show that $|\tilde{\sigma}_i - \sigma_i| \le \sigma_1 \mathbf{u}$ (up to a modest constant, with $\mathbf{u}$ the unit roundoff).

    Result: zero singular values will yield small computed singular values.

    Determining the numerical rank: treat singular values below a certain threshold $\tau$ as zero.

    Practical problem: how to set $\tau$.
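    A small sketch (my own) of numerical-rank determination: an exactly rank-3 matrix acquires tiny but nonzero trailing singular values in floating point, which a threshold filters out.

        import numpy as np

        rng = np.random.default_rng(9)
        A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))   # exact rank 3

        s = np.linalg.svd(A, compute_uv=False)
        print(s[:5])                # three O(1) values, then values near machine precision

        tau = s[0] * max(A.shape) * np.finfo(float).eps    # one common choice of threshold
        print("numerical rank:", np.sum(s > tau))          # -> 3
        print("matrix_rank:   ", np.linalg.matrix_rank(A)) # uses a similar default criterion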


    A few applications of the SVD

    Many methods require approximating the original data (matrix) by a low-rank matrix before attempting to solve the original problem.

    Regularization methods require solving a least-squares linear system $Ax = b$ approximately in the dominant singular space of $A$.

    The Latent Semantic Indexing (LSI) method in information retrieval performs the query in the dominant singular space of $A$.

    Methods utilizing Principal Component Analysis, e.g., face recognition.


    Commonality: approximate $A$ by a lower-rank approximation $A_k$ (using the dominant singular space) before solving the original problem.

    This approximation captures the main features of the data while getting rid of noise and redundancy.

    Note a common misconception: that we need to reduce dimension in order to reduce computational cost. In reality, using less information often yields better results; this is the problem of overfitting.

    Good illustration: Information Retrieval (IR).


    Information Retrieval: Vector Space Model

    Given: a collection of documents (columns of a matrix $A$) and a query vector $q$.

    The collection is represented by an $m \times n$ term-by-document matrix with $a_{ij} = L_{ij} G_i N_j$.

    Queries ("pseudo-documents") $q$ are represented similarly to a column.


    Vector Space Model - continued

    Problem: find the column of $A$ that best matches $q$.

    Similarity metric: the angle between a column $c$ and $q$; use cosines:

    $$ \frac{|c^T q|}{\|c\|_2 \|q\|_2} $$

    To rank all documents we need to compute

    $$ s = A^T q $$

    $s$ = similarity vector.

    Literal matching is not very effective.
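    A minimal sketch (my own; cosine_scores is a hypothetical helper, not from the slides) of cosine scoring against the columns of a term-by-document matrix:

        import numpy as np

        def cosine_scores(A, q):
            """Cosine of the angle between q and each (nonzero) column of A; larger = better match."""
            col_norms = np.linalg.norm(A, axis=0)
            return np.abs(A.T @ q) / (col_norms * np.linalg.norm(q))

        # Usage: scores = cosine_scores(A, q); ranking = np.argsort(-scores)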


    Use of the SVD

    Many problems with literal matching: polysemy, synonymy, ...

    Need to extract intrinsic, underlying semantic information.

    Solution (LSI): replace the matrix $A$ by a low-rank approximation obtained from the Singular Value Decomposition (SVD):

    $$ A = U \Sigma V^T \ \longrightarrow\ A_k = U_k \Sigma_k V_k^T $$

    $U_k$: term space, $V_k$: document space.

    Refer to this as the Truncated SVD (TSVD) approach.


    New similarity vector:

    $$ s_k = A_k^T q = V_k \Sigma_k U_k^T q $$

    Issues:

    Problem 1: how to select $k$?

    Problem 2: computational cost (memory + computation).

    Problem 3: updates [e.g., Google data changes all the time].

    Not practical for very large sets.


    LSI : an example

    %% D1 : INFANT & TODDLER first aid
    %% D2 : BABIES & CHILDRENs room for your HOME
    %% D3 : CHILD SAFETY at HOME
    %% D4 : Your BABYs HEALTH and SAFETY: From INFANT to TODDLER
    %% D5 : BABY PROOFING basics
    %% D6 : Your GUIDE to easy rust PROOFING
    %% D7 : Beanie BABIES collectors GUIDE
    %% D8 : SAFETY GUIDE for CHILD PROOFING your HOME
    %% TERMS: 1:BABY 2:CHILD 3:GUIDE 4:HEALTH 5:HOME
    %%        6:INFANT 7:PROOFING 8:SAFETY 9:TODDLER
    %% Source: Berry and Browne, SIAM, '99

    Number of documents: 8. Number of terms: 9.


    Raw matrix (before scaling); rows = terms, columns = documents:

    A =
             d1 d2 d3 d4 d5 d6 d7 d8
        bab   0  1  0  1  1  0  1  0
        chi   0  1  1  0  0  0  0  1
        gui   0  0  0  0  0  1  1  1
        hea   0  0  0  1  0  0  0  0
        hom   0  1  1  0  0  0  0  1
        inf   1  0  0  1  0  0  0  0
        pro   0  0  0  0  1  1  0  1
        saf   0  0  1  1  0  0  0  1
        tod   1  0  0  1  0  0  0  0

    Get the answer to the query "Child Safety", i.e.,

    q = [0 1 0 0 0 0 0 1 0]^T,

    using cosines and then using LSI with k = 3.
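    A NumPy sketch (my own, following the slide's setup) that builds this 9 x 8 term-by-document matrix and ranks the documents for the query "Child Safety", first with plain cosines and then with LSI using k = 3:

        import numpy as np

        # Rows: baby child guide health home infant proofing safety toddler; columns: d1..d8
        A = np.array([
            [0, 1, 0, 1, 1, 0, 1, 0],   # baby
            [0, 1, 1, 0, 0, 0, 0, 1],   # child
            [0, 0, 0, 0, 0, 1, 1, 1],   # guide
            [0, 0, 0, 1, 0, 0, 0, 0],   # health
            [0, 1, 1, 0, 0, 0, 0, 1],   # home
            [1, 0, 0, 1, 0, 0, 0, 0],   # infant
            [0, 0, 0, 0, 1, 1, 0, 1],   # proofing
            [0, 0, 1, 1, 0, 0, 0, 1],   # safety
            [1, 0, 0, 1, 0, 0, 0, 0],   # toddler
        ], dtype=float)
        q = np.array([0, 1, 0, 0, 0, 0, 0, 1, 0], dtype=float)   # "child safety"

        # Plain cosine scores against the columns of A
        cos = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))
        print("cosine ranking of documents:", np.argsort(-cos) + 1)

        # LSI: score with the rank-k approximation, s_k = V_k Sigma_k U_k^T q
        k = 3
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        sk = Vt[:k, :].T @ (np.diag(s[:k]) @ (U[:, :k].T @ q))
        print("LSI ranking of documents:   ", np.argsort(-sk) + 1)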


    Dimension reduction

    Dimensionality Reduction (DR) techniques are pervasive in many applications.

    Often the main goal of dimension reduction is not to reduce computational cost. Instead:

    Dimension reduction is used to reduce noise and redundancy in data.

    Dimension reduction is used to discover patterns (e.g., supervised learning).

    Techniques depend on the desired features or the application: preserve angles? preserve distances? maximize variance? ...


    The problem

    Given $d \ll m$, find a mapping

    $$ \Phi : x \in \mathbb{R}^m \ \longrightarrow\ y \in \mathbb{R}^d $$

    The mapping may be explicit (e.g., $y = V^T x$) or implicit (nonlinear).

    Practically: find a low-dimensional representation $Y \in \mathbb{R}^{d \times n}$ of $X \in \mathbb{R}^{m \times n}$.

    Two classes of methods: (1) projection techniques and (2) nonlinear implicit methods.


    Example: Digit images (a sample of 30)


    A few 2-D reductions:

    [Figure: 2-D embeddings of the digit images (classes 0 through 4) produced by PCA, LLE, KPCA, and ONPP.]


    Projection-based Dimensionality Reduction

    Given: a data set $X = [x_1, x_2, \ldots, x_n]$ and $d$, the dimension of the desired reduced space $Y$.

    Want: a linear transformation from $X$ to $Y$:

    $$ Y = V^T X, \qquad X \in \mathbb{R}^{m \times n}, \quad V \in \mathbb{R}^{m \times d}, \quad Y \in \mathbb{R}^{d \times n} $$

    The $m$-dimensional objects ($x_i$) are flattened to the $d$-dimensional space ($y_i$).

    Problem: find the best such mapping (an optimization problem), given that the $y_i$'s must satisfy certain constraints.


    Principal Component Analysis (PCA)

    PCA: find $V$ (orthogonal) so that the projected data $Y = V^T X$ has maximum variance.

    Maximize over all orthogonal $m \times d$ matrices $V$:

    $$ \sum_i \Big\| y_i - \frac{1}{n} \sum_j y_j \Big\|_2^2 = \mathrm{Tr}\left[ V^T \bar{X} \bar{X}^T V \right] $$

    where $\bar{X} = [\bar{x}_1, \ldots, \bar{x}_n]$ with $\bar{x}_i = x_i - \mu$, $\mu$ = mean of the data.

    Solution: $V$ = { $d$ dominant eigenvectors } of the covariance matrix $\bar{X} \bar{X}^T$,

    i.e., the optimal $V$ = set of left singular vectors of $\bar{X}$ associated with the $d$ largest singular values.
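    A minimal PCA sketch via the SVD of the centered data matrix (my own illustration; the data and the choice d = 2 are made up):

        import numpy as np

        rng = np.random.default_rng(10)
        m, n, d = 10, 200, 2
        X = rng.standard_normal((m, n))               # columns are the data points x_i

        mu = X.mean(axis=1, keepdims=True)
        Xbar = X - mu                                 # centered data

        # Optimal V = d dominant left singular vectors of the centered data
        U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
        V = U[:, :d]                                  # m x d, orthonormal columns

        Y = V.T @ Xbar                                # d x n reduced representation

        # Same V (up to column signs) as the d dominant eigenvectors of Xbar Xbar^T
        evals, evecs = np.linalg.eigh(Xbar @ Xbar.T)
        V_cov = evecs[:, ::-1][:, :d]
        assert np.allclose(np.abs(V.T @ V_cov), np.eye(d), atol=1e-8)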


    Show that $\bar{X} = X \left( I - \frac{1}{n} e e^T \right)$ (here $e$ = vector of all ones). What does the projector $I - \frac{1}{n} e e^T$ do?

    Show that the solution $V$ also minimizes the reconstruction error

    $$ \sum_i \| x_i - V V^T x_i \|^2 = \sum_i \| x_i - V y_i \|^2 $$

    ... and that it also maximizes $\sum_{i,j} \| y_i - y_j \|^2$.
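    A quick numerical check of the centering identity in the first exercise (my own sketch):

        import numpy as np

        rng = np.random.default_rng(11)
        m, n = 5, 8
        X = rng.standard_normal((m, n))
        e = np.ones(n)

        Xbar = X - X.mean(axis=1, keepdims=True)      # subtract the mean from every column
        P = np.eye(n) - np.outer(e, e) / n            # orthogonal projector onto span{e}-perp
        assert np.allclose(Xbar, X @ P)               # Xbar = X (I - (1/n) e e^T)
        assert np.allclose(P @ P, P)                  # P is a projector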
