Lecture 9: Linear algebra
DS GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15
Carlos Fernandez-Granda
11/16/2015
Projections
Matrices
Eigendecomposition
Principal Component Analysis
Orthogonal projection
The orthogonal projection of x onto a subspace S is the vector P_S x ∈ S such that x − P_S x ∈ S⊥
For any orthonormal basis b_1, …, b_m of S,

P_S x = ∑_{i=1}^m 〈x, b_i〉 b_i
The projection of x onto the span of a single vector v is

P_S x = 〈x, v / ||v||〉 v / ||v||
Lemma: The orthogonal projection is unique
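As a quick numerical sketch (with made-up vectors, using NumPy), the basis formula for P_S x and the orthogonality of the residual can be checked directly:

```python
import numpy as np

# Illustrative example: project x onto the span of two orthonormal vectors
x = np.array([3.0, 4.0, 5.0])
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 1.0, 0.0])

# P_S x = sum_i <x, b_i> b_i
proj = np.dot(x, b1) * b1 + np.dot(x, b2) * b2

# The residual x - P_S x lies in the orthogonal complement of S
residual = x - proj
assert np.isclose(np.dot(residual, b1), 0.0)
assert np.isclose(np.dot(residual, b2), 0.0)
```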
Orthogonal projection
PS x is the vector in S that is closest to x
It is the solution to the optimization problem

minimize_u ||x − u|| subject to u ∈ S
Linear minimum-MSE estimation
Aim: Estimate X from Y using a linear estimator
Assumption: We know the means μ_X, μ_Y, variances σ²_X, σ²_Y, and correlation coefficient ρ_XY

The best linear estimate of X given Y in terms of MSE is

g_LMMSE(y) = μ_X + ρ_XY (σ_X / σ_Y) (y − μ_Y)
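The LMMSE formula can be sanity-checked on simulated data. This is a sketch under an assumed toy model (Y equals X plus independent noise; all numbers are illustrative), comparing its MSE against the constant estimate μ_X:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy model: Y = X + noise
x = rng.normal(2.0, 1.0, 100000)
y = x + rng.normal(0.0, 1.0, 100000)

mu_x, mu_y = x.mean(), y.mean()
s_x, s_y = x.std(), y.std()
rho = np.corrcoef(x, y)[0, 1]

def g_lmmse(yv):
    # g_LMMSE(y) = mu_X + rho * (sigma_X / sigma_Y) * (y - mu_Y)
    return mu_x + rho * s_x / s_y * (yv - mu_y)

# The linear estimate should have lower MSE than the constant estimate mu_X
mse_lmmse = np.mean((x - g_lmmse(y)) ** 2)
mse_const = np.mean((x - mu_x) ** 2)
assert mse_lmmse < mse_const
```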
Projections
Matrices
Eigendecomposition
Principal Component Analysis
Matrices
A matrix A ∈ R^{m×n} is a rectangular array of numbers:

A = [ A_11 A_12 ··· A_1n
      A_21 A_22 ··· A_2n
              ···
      A_m1 A_m2 ··· A_mn ]

Notation:

- A_{i:} is the ith row of A
- A_{:j} is the jth column of A

The transpose A^T ∈ R^{n×m} satisfies (A^T)_{ij} = A_{ji}
Matrix-vector multiplication
The product of a matrix A ∈ Rm×n and a vector x ∈ Rn equals
(Ax)_i = ∑_{j=1}^n A_ij x(j),  1 ≤ i ≤ m
Matrix-vector multiplication

Ax = [ A_11 A_12 ··· A_1n ] [ x_1 ]   [ v_1 ]
     [ A_21 A_22 ··· A_2n ] [ x_2 ] = [ v_2 ]
     [         ···        ] [ ··· ]   [ ··· ]
     [ A_m1 A_m2 ··· A_mn ] [ x_n ]   [ v_m ]
Matrix-vector multiplication
Row interpretation:
Ax = [ 〈A_1:, x〉 ]
     [ 〈A_2:, x〉 ]
     [    ···    ]
     [ 〈A_m:, x〉 ]
Column interpretation:
Ax = ∑_{j=1}^n A_:j x(j)
Dot product between x and y
x · y = xT y = yT x
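Both interpretations compute the same product, which can be verified numerically (the matrix and vector below are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Row interpretation: each entry of Ax is an inner product with a row of A
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column interpretation: Ax is a linear combination of the columns of A
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(A @ x, row_view)
assert np.allclose(A @ x, col_view)
```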
Identity matrix
I = [ 1 0 ··· 0
      0 1 ··· 0
        ···
      0 0 ··· 1 ]
It maps every vector to itself: Ix = x for all x ∈ R^n
Matrix product
The product of A ∈ Rm×n and B ∈ Rn×p is a matrix AB ∈ Rm×p,
(AB)_ij = ∑_{k=1}^n A_ik B_kj = 〈A_i:, B_:j〉
Matrix product
AB = [ A_11 A_12 ··· A_1n ] [ B_11 B_12 ··· B_1p ]   [ (AB)_11 ··· (AB)_1p ]
     [ A_21 A_22 ··· A_2n ] [ B_21 B_22 ··· B_2p ] = [ (AB)_21 ··· (AB)_2p ]
     [         ···        ] [         ···        ]   [          ···        ]
     [ A_m1 A_m2 ··· A_mn ] [ B_n1 B_n2 ··· B_np ]   [ (AB)_m1 ··· (AB)_mp ]
Matrix product
Column interpretation:
AB = [ AB_:1  AB_:2  ···  AB_:p ]

The inverse A^{-1} of a square matrix A satisfies
AA−1 = A−1A = I
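A small sketch verifying the entrywise formula, the column interpretation, and the defining property of the inverse (the example matrices are made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

# (AB)_{ij} = <A_{i:}, B_{:j}>
AB = A @ B
assert np.isclose(AB[0, 2], A[0, :] @ B[:, 2])

# Column interpretation: the j-th column of AB equals A B_{:j}
assert np.allclose(AB[:, 1], A @ B[:, 1])

# The inverse of a square matrix satisfies A A^{-1} = I
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))
```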
Orthogonal matrix
An orthogonal matrix U is a square matrix such that
UTU = UUT = I
The columns U:1,U:2, . . . ,U:n form an orthonormal basis
For any vector x,

x = UU^T x = ∑_{i=1}^n 〈U_:i, x〉 U_:i
UT x contains the basis coefficients of x
Applying an orthogonal matrix rotates (or reflects) a vector without changing its norm:
||Ux ||2 = ||x ||2
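A quick check of these properties with a 2×2 rotation matrix (an illustrative choice of U):

```python
import numpy as np

# A rotation by 45 degrees is an orthogonal matrix
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# U^T U = I, so the columns form an orthonormal basis
assert np.allclose(U.T @ U, np.eye(2))

x = np.array([3.0, 4.0])
# U^T x contains the basis coefficients of x, and UU^T x recovers x
assert np.allclose(U @ (U.T @ x), x)
# Norm preservation: ||Ux||_2 = ||x||_2
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```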
Projections
Matrices
Eigendecomposition
Principal Component Analysis
Eigenvectors and eigenvalues
A nonzero vector v is an eigenvector of A, with corresponding eigenvalue λ, if

Av = λv

Even if A is real, its eigenvectors and eigenvalues can be complex
Eigendecomposition
If a square matrix A ∈ R^{n×n} has n linearly independent eigenvectors v_1, …, v_n with eigenvalues λ_1, …, λ_n, then

A = [ v_1 v_2 ··· v_n ] [ λ_1  0  ···  0  ] [ v_1 v_2 ··· v_n ]^{-1} = QΛQ^{-1}
                        [  0  λ_2 ···  0  ]
                        [        ···      ]
                        [  0   0  ··· λ_n ]

Usually, by convention, λ_1 ≥ λ_2 ≥ ··· ≥ λ_n

This is the eigendecomposition of A

Not all matrices have an eigendecomposition, e.g.

[ 0 1 ]
[ 0 0 ]
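In NumPy, `np.linalg.eig` computes this decomposition; a sketch on a made-up symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and the eigenvectors as columns of Q
eigvals, Q = np.linalg.eig(A)
Lam = np.diag(eigvals)

# A = Q Lambda Q^{-1}
assert np.allclose(A, Q @ Lam @ np.linalg.inv(Q))

# Each column of Q satisfies A v = lambda v
for i in range(2):
    assert np.allclose(A @ Q[:, i], eigvals[i] * Q[:, i])
```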
Power method
Let A = QΛQ^{-1}

For an arbitrary vector x, express x in terms of the columns Q_:1, Q_:2, …, Q_:n:

x = ∑_{i=1}^n α_i Q_:i

If we apply A to x k times,

A^k x = ∑_{i=1}^n α_i λ_i^k Q_:i

If α_1 ≠ 0 and λ_1 dominates the other eigenvalues in magnitude, as k → ∞ the term α_1 λ_1^k Q_:1 dominates
Power method
Input: Matrix A, vector x
Output: Eigenvector corresponding to dominant eigenvalue
Initialization: Set u_1 := x / ||x||_2

For i = 2, …, m compute

u_i := A u_{i−1} / ||A u_{i−1}||_2
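The iteration above can be sketched in a few lines (the toy matrix is chosen so that the dominant eigenvalue is 3 with eigenvector (1, 1)/√2):

```python
import numpy as np

def power_method(A, x, m=50):
    """Iterate u_i := A u_{i-1} / ||A u_{i-1}||_2 starting from x."""
    u = x / np.linalg.norm(x)
    for _ in range(m):
        u = A @ u
        u /= np.linalg.norm(u)
    return u

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
u = power_method(A, np.array([1.0, 0.0]))

# The iterates converge to the eigenvector of the dominant eigenvalue (here 3)
assert np.allclose(np.abs(u), np.ones(2) / np.sqrt(2), atol=1e-6)
```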
Markov chains
Consider a sequence of discrete random variables X0,X1, . . . such that
p_{X_{k+1} | X_0, X_1, …, X_k}(x_{k+1} | x_0, x_1, …, x_k) = p_{X_{k+1} | X_k}(x_{k+1} | x_k)
The sequence is a Markov chain
If each X_k takes values in {α_1, …, α_n}, the Markov chain is time homogeneous if

P_ij := p_{X_{k+1} | X_k}(α_j | α_i)

depends only on i and j, not on k, for all 1 ≤ i, j ≤ n and k ≥ 0
Time-homogeneous Markov chains
Pij can be interpreted as entries of a transition matrix P
Consider the vector of probabilities

π_k = [ p_{X_k}(α_1) ]
      [ p_{X_k}(α_2) ]
      [     ···      ]
      [ p_{X_k}(α_n) ]

Then

π_k = P P ··· P π_0 = P^k π_0

If P has an eigendecomposition and a dominant eigenvalue, then as k → ∞

lim_{k→∞} π_k = α v_1 for some α ∈ R
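A sketch of this limit on a made-up two-state chain. Note the transition matrix below is written column-stochastically so that the update takes the form π_{k+1} = T π_k matching the formula above; the transition probabilities themselves are illustrative:

```python
import numpy as np

# Toy two-state chain: column j holds the probabilities of moving
# out of state j, so each column sums to 1 and pi_{k+1} = T pi_k
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])

pi = np.array([1.0, 0.0])   # start deterministically in state 1
for _ in range(100):
    pi = T @ pi

# The limit is the eigenvector of T with eigenvalue 1,
# normalized to sum to 1: here (5/6, 1/6)
assert np.allclose(pi, [5 / 6, 1 / 6], atol=1e-8)
```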
Projections
Matrices
Eigendecomposition
Principal Component Analysis
Row and column space
The row space row (A) of a matrix A is the span of its rows
The column space col (A) is the span of its columns
Lemma:
rank (A) := dim (col (A)) = dim (row (A))
Number of linearly independent rows or columns is the same!
Singular-value decomposition
Every real matrix A has a singular-value decomposition (SVD), in which the singular values are uniquely determined:

A = [ u_1 u_2 ··· u_r ] [ σ_1  0  ···  0  ] [ v_1^T ]
                        [  0  σ_2 ···  0  ] [ v_2^T ] = UΣV^T
                        [        ···      ] [  ···  ]
                        [  0   0  ··· σ_r ] [ v_r^T ]
The singular values are σ1 ≥ σ2 ≥ · · · ≥ σr ≥ 0
The left singular vectors u1, u2, . . . , ur form a basis for the column space
The right singular vectors v1, v2, . . . , vr form a basis for the row space
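In NumPy, `np.linalg.svd` returns these factors; a sketch on a small made-up matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# full_matrices=False gives the compact SVD A = U Sigma V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Singular values are nonnegative and returned in nonincreasing order
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
```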
Principal Component Analysis
Aim: Find a basis for a low-dimensional subspace S such that

x_i ≈ P_S x_i
Idea: Greedy approach
1. Find unit-norm vector u1, such that the projection of the data ontoits span is as large as possible
2. Find unit-norm vector u2 orthogonal to u1, such that the projectionof the data onto its span is as large as possible
3. Find unit-norm vector u3 orthogonal to u1 and u2, such that theprojection of the data onto its span is as large as possible
4. . . .
Principal Component Analysis
We group the data x1, x2, . . . , xn as columns of a matrix X
X = [ x_1 x_2 ··· x_n ]

For a unit-norm vector u, X^T u contains the projections of x_1, …, x_n onto the span of u
What u maximizes the norm of the projection?
The top left singular vector, because

σ_1 = max_{||u||_2 = 1} ||X^T u||_2

u_1 = arg max_{||u||_2 = 1} ||X^T u||_2
Principal Component Analysis
Similarly
σ_2 = max_{||u||_2 = 1, u ⊥ u_1} ||X^T u||_2

u_2 = arg max_{||u||_2 = 1, u ⊥ u_1} ||X^T u||_2

σ_i = max_{||u||_2 = 1, u ⊥ u_1, …, u_{i−1}} ||X^T u||_2

u_i = arg max_{||u||_2 = 1, u ⊥ u_1, …, u_{i−1}} ||X^T u||_2
Dimensionality reduction / compression using the truncated svd
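The greedy characterization can be illustrated with the SVD of simulated data (the data model below is made up: 2-D points stretched along one direction), checking that no unit vector beats the top left singular vector:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: columns of X are 2-D points stretched along (1, 1)
X = np.array([[3.0, 2.0], [2.0, 3.0]]) @ rng.normal(size=(2, 500))
X -= X.mean(axis=1, keepdims=True)   # center the data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
u1 = U[:, 0]   # first principal direction

# u1 maximizes ||X^T u||_2 over unit vectors: compare against random u
best = np.linalg.norm(X.T @ u1)
for _ in range(100):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)
    assert np.linalg.norm(X.T @ u) <= best + 1e-9
```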
Example

(figures not included in the transcript)