
Dimensionality Reduction

Hung Le
University of Victoria
March 23, 2019

Motivation

Matrices are everywhere:

Graphs: the Web or a social network.

Pairwise interactions between two types of entities: movie ratings, image-tags.

Spatial representations: images.

In many applications, the raw matrix can be summarized by "narrower" matrices.

"Narrower" means the output matrices have far fewer columns/rows.

"Can be summarized" means we can almost recover the original matrix from the summarized matrices.

Dimensionality reduction: find the "narrower" matrices of the original matrix.


Eigenvectors and Eigenvalues of Symmetric Matrices

e is an eigenvector corresponding to an eigenvalue λ of a square matrix M iff:

Me = λe    (1)

Fact 1: If e is an eigenvector of M, then for any constant c ≠ 0, ce is also an eigenvector of M (with the same eigenvalue) ⇒ we often require that ||e||_2 = 1.

Fact 2: If M ∈ R^{n×n} is real and symmetric, then M has n real eigenvalues and n real eigenvectors ⇒ in this lecture, M is real and symmetric.

Fact 3: We can order the eigenvalues, with their corresponding eigenvectors, as

λ_1 ≥ λ_2 ≥ ... ≥ λ_n, with eigenvectors e_1, e_2, ..., e_n    (2)

and we can make the eigenvectors pairwise orthogonal, i.e., e_i^T e_j = 0 for all i ≠ j.

λ_1 and e_1 are called the principal eigenvalue and the principal eigenvector, respectively.


Eigenvectors and Eigenvalues of Symmetric Matrices - An Example

M = [3 2; 2 6]    (3)

has:

λ_1 = 7 with x_1 = [1/√5, 2/√5]^T
λ_2 = 2 with x_2 = [2/√5, -1/√5]^T    (4)
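These values are easy to check numerically; a minimal NumPy sketch (numpy.linalg.eigh is specialized for real symmetric matrices):

    import numpy as np

    M = np.array([[3.0, 2.0],
                  [2.0, 6.0]])

    # eigh returns eigenvalues in ascending order; reverse to match the slides.
    vals, vecs = np.linalg.eigh(M)
    vals, vecs = vals[::-1], vecs[:, ::-1]

    print(vals)        # [7. 2.]
    print(vecs[:, 0])  # +/- [0.447, 0.894] = [1/sqrt(5), 2/sqrt(5)]
    print(vecs[:, 1])  # +/- [0.894, -0.447] = [2/sqrt(5), -1/sqrt(5)]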

Finding Eigenvalues by Solving Equations

Eigenvalues are the roots of the following equation (in the variable λ):

det(M − λI) = 0    (5)

where I is the identity matrix and det(X) is the determinant of an n × n matrix X.

Fact 4: det(M − λI) is a degree-n polynomial in λ.

Fact 5: If M is real and symmetric, Equation (5) has n real roots.

Fact 6: Computing the determinant of a matrix takes O(n^3) time.


Finding Eigenvalues by Solving Equations - An Example

M = [3 2; 2 6]    (6)

We have:

det(M − λI) = det([3−λ 2; 2 6−λ]) = λ^2 − 9λ + 14    (7)

The two roots are λ_1 = 7 and λ_2 = 2.
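The same computation can be sketched in NumPy: numpy.poly returns the coefficients of the characteristic polynomial of a square matrix (highest degree first), and numpy.roots recovers the eigenvalues from them.

    import numpy as np

    M = np.array([[3.0, 2.0],
                  [2.0, 6.0]])

    coeffs = np.poly(M)       # [1., -9., 14.]  ->  lambda^2 - 9*lambda + 14
    print(np.roots(coeffs))   # [7. 2.]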

Finding Eigenvalues and Eigenvectors by Power Iteration

Finding the principal eigenvector and eigenvalue:

PowerIteration1(M)
    v_0 ← a random vector
    for t ← 1 to k
        v_t ← M v_{t−1} / ||M v_{t−1}||
    return v_k, v_k^T M v_k

Running time: O((m + n)k), where m is the number of non-zeros of M.

e_1 ≈ v_k and λ_1 ≈ v_k^T M v_k.
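A direct NumPy translation of PowerIteration1 might look like the sketch below; the fixed iteration count k plays the same role as on the slide.

    import numpy as np

    def power_iteration(M, k=100, seed=0):
        """Approximate the principal eigenvector/eigenvalue of a symmetric M."""
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(M.shape[0])      # v_0: a random vector
        for _ in range(k):
            w = M @ v
            v = w / np.linalg.norm(w)            # v_t = M v_{t-1} / ||M v_{t-1}||
        return v, v @ M @ v                      # e_1 ~ v_k, lambda_1 ~ v_k^T M v_k

    M = np.array([[3.0, 2.0], [2.0, 6.0]])
    e1, lam1 = power_iteration(M)
    print(np.round(e1, 3), round(lam1, 3))       # ~[0.447 0.894] (up to sign), ~7.0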

Power Iteration - An Example

M = [3 2; 2 6]    (8)

and v_0 = [1, 1]^T. Then

M v_0 = [3 2; 2 6][1, 1]^T = [5, 8]^T    (9)

Thus, v_1 = M v_0 / ||M v_0||_2 = [0.530, 0.848]^T. Repeating a second time gives:

v_2 = [0.471, 0.882]^T    (10)

The limiting vector is:

v_k = [0.447, 0.894]^T    (11)

with λ = v_k^T M v_k ≈ 6.993.
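This trace can be reproduced in a few lines, starting from v_0 = [1, 1]^T instead of a random vector:

    import numpy as np

    M = np.array([[3.0, 2.0], [2.0, 6.0]])
    v = np.array([1.0, 1.0])
    for t in range(50):
        v = M @ v / np.linalg.norm(M @ v)
        if t < 2:
            print(np.round(v, 3))                # [0.53 0.848], then [0.471 0.882]
    print(np.round(v, 3), round(v @ M @ v, 3))   # [0.447 0.894] and eigenvalue ~7.0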

Finding Eigenvalues and Eigenvectors by Power Iteration

Finding the second-largest eigenvalue and its eigenvector:

PowerIteration2(M)
    (e_1, λ_1) ← PowerIteration1(M)
    v_0 ← a random vector
    M_2 ← M − λ_1 e_1 e_1^T
    for t ← 1 to k
        v_t ← M_2 v_{t−1} / ||M_2 v_{t−1}||
    return v_k, v_k^T M_2 v_k

Running time: O((m + n)k), where m is the number of non-zeros of M.

e_2 ≈ v_k and λ_2 ≈ v_k^T M_2 v_k.
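PowerIteration2 is the same routine applied to the deflated matrix M_2; a minimal sketch, assuming the power_iteration function from the previous snippet is in scope:

    import numpy as np

    def second_eigenpair(M, k=100):
        e1, lam1 = power_iteration(M, k)
        M2 = M - lam1 * np.outer(e1, e1)    # deflate: remove the principal component
        return power_iteration(M2, k)       # principal pair of M2 ~ second pair of M

    M = np.array([[3.0, 2.0], [2.0, 6.0]])
    e2, lam2 = second_eigenpair(M)
    print(np.round(e2, 3), round(lam2, 3))  # ~[0.894 -0.447] (up to sign), ~2.0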

The Matrix of Eigenvectors

Let E be the matrix whose columns are the eigenvectors:

E = [e_1 e_2 ... e_n]    (12)

Then:

E^T E = E E^T = I    (13)

The Matrix of Eigenvectors - An Example

M = [3 2; 2 6]    (14)

has:

x_1 = [1/√5, 2/√5]^T,  x_2 = [2/√5, −1/√5]^T    (15)

Thus,

E = [1/√5 2/√5; 2/√5 −1/√5]    (16)

It is straightforward to verify that:

E E^T = E^T E = [1 0; 0 1]    (17)
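A quick numerical check of Equation (17):

    import numpy as np

    s5 = np.sqrt(5.0)
    E = np.array([[1/s5,  2/s5],
                  [2/s5, -1/s5]])             # eigenvectors of M as columns

    print(np.allclose(E.T @ E, np.eye(2)))    # True
    print(np.allclose(E @ E.T, np.eye(2)))    # True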

Principal Component Analysis - PCA

Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find a direction along which the points line up best.

A direction is a (unit) vector w.

The points line up best along w when

∑_{i=1}^n (x_i^T w)^2    (18)

is maximized.


Principal Component Analysis - PCA

Let:

X = [x_1^T; x_2^T; ...; x_n^T]    (19)

Then

∑_{i=1}^n (x_i^T w)^2 = ||Xw||_2^2 = w^T X^T X w    (20)

Thus, ∑_{i=1}^n (x_i^T w)^2 is maximized when w is the principal eigenvector of X^T X.

Question: what is w^T X^T X w when w is the principal eigenvector of X^T X?
Answer: λ_1(X^T X).


PCA - An Example

X = [1 2; 2 1; 3 4; 4 3]  and  X^T X = [30 28; 28 30]    (21)

has λ_1(X^T X) = 58 with eigenvector x_1 = [1/√2, 1/√2]^T.
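In NumPy, the principal direction for this example comes straight out of the eigendecomposition of X^T X; a minimal sketch:

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0]])

    vals, vecs = np.linalg.eigh(X.T @ X)    # X^T X = [[30, 28], [28, 30]]
    lam1, w = vals[-1], vecs[:, -1]         # largest eigenvalue and its eigenvector

    print(lam1)                  # ~58.0
    print(np.round(w, 3))        # +/- [0.707 0.707] = [1/sqrt(2), 1/sqrt(2)]
    print(np.sum((X @ w) ** 2))  # ~58.0: the maximized sum from Equation (18)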

PCA - More Components

Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the second-best direction along which the points line up.

A direction u is the second best if:

u^T w = 0, where w is the best direction.

The points line up best along u among all directions orthogonal to w. That is,

∑_{i=1}^n (x_i^T u)^2    (22)

is maximized among all directions orthogonal to w.

Fact: u is the eigenvector of X^T X corresponding to the second-largest eigenvalue λ_2(X^T X).


PCA - More Components

Given a set of points D = {x_1, x_2, ..., x_n} in R^d, find the k-th best direction along which the points line up.

Fact: The k-th best direction is the eigenvector of X^T X corresponding to the k-th largest eigenvalue λ_k(X^T X).

Using PCA for Dimensionality Reduction

Given the k eigenvectors e_1, e_2, ..., e_k corresponding to the k largest eigenvalues, we can represent each data point x_i ∈ D as:

[x_i^T e_1, x_i^T e_2, ..., x_i^T e_k]^T    (23)
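Putting the pieces together: the reduced representation of the whole data set is X E_k, where the columns of E_k are the top-k eigenvectors of X^T X. A minimal sketch (note that, as on the slides, the data is not mean-centered):

    import numpy as np

    def pca_reduce(X, k):
        """Represent each row x_i of X as (x_i^T e_1, ..., x_i^T e_k)."""
        vals, vecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
        Ek = vecs[:, ::-1][:, :k]              # top-k eigenvectors as columns
        return X @ Ek

    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
    print(np.round(pca_reduce(X, 1), 3))  # ~[[2.121], [2.121], [4.95], [4.95]] (up to sign)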

Singular Value Decomposition

Let M be an m × n matrix and let r be the rank of M. A Singular Value Decomposition of M is a decomposition of M into three matrices U, Σ, V, with M = UΣV^T, where:

U is a column-orthonormal m × r matrix, i.e., U^T U = I_r.

Σ is a diagonal r × r matrix. The elements on the diagonal of Σ are the singular values, ordered decreasingly.

V is a column-orthonormal n × r matrix, i.e., V^T V = I_r.
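numpy.linalg.svd computes such a decomposition. The matrix below is only an assumed stand-in for the movie-rating example the figures refer to (rows are users; columns are The Matrix, Alien, Star Wars, Casablanca, Titanic); any rank-r matrix would illustrate the same shapes.

    import numpy as np

    M = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                  [3.0, 3.0, 3.0, 0.0, 0.0],
                  [4.0, 4.0, 4.0, 0.0, 0.0],
                  [5.0, 5.0, 5.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 4.0, 4.0],
                  [0.0, 0.0, 0.0, 5.0, 5.0],
                  [0.0, 0.0, 0.0, 2.0, 2.0]])

    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > 1e-10))                   # numerical rank, here r = 2
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]        # keep the m x r, r, and r x n parts

    print(np.allclose(M, U @ np.diag(s) @ Vt))   # True: M = U Sigma V^T
    print(np.allclose(U.T @ U, np.eye(r)))       # True: U is column-orthonormal
    print(np.allclose(Vt @ Vt.T, np.eye(r)))     # True: V is column-orthonormal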

SVD

[Figure: M = UΣV^T (image not included in this transcript).]

Understanding SVD

Think of U, Σ, V as a representation of the concepts hidden in M.

Two concepts:

Science Fiction = {The Matrix, Alien, Star Wars}
Romance = {Casablanca, Titanic}

Dimensionality Reduction by SVD

A rank-k SVD approximation of M is the matrix U_k Σ_k V_k^T where:

U_k contains the first k columns of U.

Σ_k contains the k largest elements of Σ.

V_k contains the first k columns of V.
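A minimal sketch of the rank-k construction (the input here is an arbitrary small matrix, just for illustration):

    import numpy as np

    def svd_rank_k(M, k):
        """Rank-k SVD approximation A_k = U_k Sigma_k V_k^T."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    M = np.array([[3.0, 2.0, 2.0],
                  [2.0, 3.0, -2.0]])
    A1 = svd_rank_k(M, 1)
    print(np.linalg.matrix_rank(A1))   # 1
    print(np.round(A1, 2))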

Dimensionality Reduction by SVD - An Example

[Figure not included in this transcript.]

Dimensionality Reduction by SVD - An Example

A rank-2 approximation of M′:

[Figure not included in this transcript.]

Dimensionality Reduction by SVD - Why?

Theorem. Given M, the rank-k SVD approximation of M, denoted by A_k = U_k Σ_k V_k^T, has

||M − A_k||_F    (24)

minimum among all possible rank-k matrices.

||X||_F is the Frobenius norm of X:

||X||_F = √( ∑_{i=1}^n ∑_{j=1}^m X[i, j]^2 )    (25)
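The theorem can be sanity-checked empirically: no other rank-k matrix, e.g. a random one, should achieve a smaller Frobenius error. A rough sketch, assuming svd_rank_k from the previous snippet is in scope:

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 5))
    k = 2

    A_k = svd_rank_k(M, k)                                         # truncated SVD
    B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))  # some other rank-k matrix

    print(np.linalg.norm(M - A_k, 'fro'))   # the minimum possible rank-k error
    print(np.linalg.norm(M - B, 'fro'))     # never smaller, typically much larger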

How to Choose k

Choose k so that at least 90% of the energy of Σ is preserved, where:

Energy(Σ) = ∑_{i=1}^r Σ[i, i]^2    (26)
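A minimal sketch of the 90%-energy rule:

    import numpy as np

    def choose_k(singular_values, energy_frac=0.9):
        """Smallest k whose leading singular values retain energy_frac of Energy(Sigma)."""
        energy = np.asarray(singular_values) ** 2
        cumulative = np.cumsum(energy) / energy.sum()
        return int(np.searchsorted(cumulative, energy_frac) + 1)

    print(choose_k([12.4, 9.5, 1.3]))   # 2: the first two values already hold >90% of the energy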

Querying Concept

Suppose that a new person P has seen only one movie, The Matrix, and rated it 4. Recall:

[Figure not included in this transcript.]

Querying Concept (Cont.)

Let q be the row representation of P, that is:

q = [4 0 0 0 0]    (27)

We can determine the "concept space" of P by:

qV = [2.32 0]    (28)
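In code this is a single matrix product. The V below is an assumption made for illustration: a two-concept matrix of the kind the figures describe (rows indexed by The Matrix, Alien, Star Wars, Casablanca, Titanic), not a matrix given in this transcript.

    import numpy as np

    # Assumed V: column 1 = science-fiction concept, column 2 = romance concept.
    V = np.array([[0.58, 0.00],
                  [0.58, 0.00],
                  [0.58, 0.00],
                  [0.00, 0.71],
                  [0.00, 0.71]])

    q = np.array([4.0, 0.0, 0.0, 0.0, 0.0])   # P rated only The Matrix, with a 4
    print(q @ V)                               # [2.32 0.]  -> P leans toward science fiction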

Computing SVD

Recall that

M = UΣV^T    (29)

Hence,

M^T M = VΣU^T UΣV^T = VΣ^2 V^T    (30)

which implies:

M^T M V = VΣ^2    (31)

Conclusion: the columns of V are eigenvectors of M^T M. By the same argument, the columns of U are eigenvectors of M M^T.

Question: What is Σ?
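Equation (31) also suggests the answer to the closing question: the diagonal of Σ holds the square roots of the eigenvalues of M^T M (equivalently of M M^T), with the columns of V as the corresponding eigenvectors. A sketch checking this numerically:

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((6, 4))

    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    vals, vecs = np.linalg.eigh(M.T @ M)            # eigen-decomposition of M^T M
    vals, vecs = vals[::-1], vecs[:, ::-1]          # sort descending to match Sigma

    print(np.allclose(np.sqrt(vals), s))            # True: Sigma = sqrt(eigenvalues of M^T M)
    print(np.allclose(np.abs(vecs), np.abs(Vt.T)))  # True: columns of V, up to sign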