
Page 1:

ISyE 6416: Computational Statistics, Spring 2017

Lecture 11: Principal Component Analysis (PCA)

Prof. Yao Xie

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

Page 2:

Why PCA

- Real-world data sets usually exhibit structure among their variables

- Principal component analysis (PCA) rotates the original data to new coordinates aligned with the directions of highest variance. Typical uses:

  - Dimension reduction
  - Classification
  - Denoising

Page 3:

Data compression

Page 4:

Face recognition

Eigenfaces

Page 5:

Denoising

Page 6:

Data visualization

“The system collects opinions on statements as scalar values on a continuous scale and applies dimensionality reduction to project the data onto a two-dimensional plane for visualization and navigation.” (Using Canonical Correlation Analysis (CCA) for Opinion Visualization, Faridani et al., UC Berkeley, 2010.)

Page 7:

What is PCA

- Given a data matrix $X \in \mathbb{R}^{n \times p}$: $n$ samples and $p$ variables

- Transform the data set to one with a small number of principal components

  $v_j^\top x_i, \quad j = 1, 2, \dots, K, \quad i = 1, 2, \dots, n$

- It is a form of linear dimension reduction

- Specifically, “interesting directions” means “high variance”

Page 8:

Projection onto unit vectors

$(x^\top v) \in \mathbb{R}$: score

$(x^\top v)\, v \in \mathbb{R}^p$: projection
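
A minimal MATLAB sketch of these two operations, assuming x and v are column vectors in R^p with norm(v) = 1 (the variable names are illustrative):

    s = x' * v;       % score: the scalar coordinate of x along v
    proj = s * v;     % projection: the point on the line spanned by v closest to x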

Page 9:

Example: projection onto unit vectors

Page 10:

Projections onto orthonormal vectors

Source: R. Tibshirani.

Page 11:

First principal component

- The first principal component direction of $X$ is the unit vector $v_1 \in \mathbb{R}^p$ that maximizes the sample variance of $Xv_1 \in \mathbb{R}^n$ among all unit-length vectors:

  $v_1 = \arg\max_{\|v\|_2 = 1} (Xv)^\top (Xv)$

  Note that

  $Xv = \begin{bmatrix} x_1^\top v \\ \vdots \\ x_n^\top v \end{bmatrix}$

  collects the projections of each sample onto the unit-length vector $v$

- $Xv_1$ is the first principal component score
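
As the next slide notes, this maximization is solved by the top eigenvector of $X^\top X$; a short MATLAB sketch, assuming the columns of X have already been centered:

    S = X' * X;                 % p x p (unnormalized) sample covariance
    [V, D] = eig(S);
    [~, idx] = max(diag(D));    % index of the largest eigenvalue
    v1 = V(:, idx);             % first principal component direction
    score1 = X * v1;            % first principal component scores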

Page 12:

Eigenvector and eigenvalue

$v_1 = \arg\max_{\|v\|_2 = 1} (Xv)^\top (Xv)$

- $v_1$ is the eigenvector of $X^\top X$ (the sample covariance matrix) corresponding to its largest eigenvalue

- $v_1^\top X^\top X v_1$, the largest eigenvalue of $X^\top X$, is the variance explained by $v_1$

- Rayleigh quotient: $R(v) = \dfrac{v^\top A v}{v^\top v}$; for a symmetric matrix $A$,

  $\lambda_{\min} \le \dfrac{v^\top A v}{v^\top v} \le \lambda_{\max}$
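
A quick numerical check of the Rayleigh quotient bounds in MATLAB (random symmetric test matrix, not from the slides):

    A = randn(5); A = (A + A') / 2;     % symmetric test matrix
    v = randn(5, 1);                    % arbitrary nonzero vector
    R = (v' * A * v) / (v' * v);        % Rayleigh quotient
    lams = eig(A);
    fprintf('%.4f <= %.4f <= %.4f\n', min(lams), R, max(lams))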

Page 13:

Example: $X \in \mathbb{R}^{50 \times 2}$

Page 14:

Beyond the first principal component

- What's next? The idea is to find orthogonal directions of the remaining highest variance

- Orthogonal: since we have already explained the variance along $v_1$, we need a new direction that has no “overlap” with $v_1$, to avoid redundancy:

  $v_2 = \arg\max_{\|v\|_2 = 1,\; v^\top v_1 = 0} (Xv)^\top (Xv)$

- We can repeat this process to find the $k$-th principal component direction (a MATLAB sketch follows this list):

  $v_k = \arg\max_{\|v\|_2 = 1,\; v^\top v_j = 0,\, j = 1, \dots, k-1} (Xv)^\top (Xv)$
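
In practice all $p$ directions come at once from the eigendecomposition of $X^\top X$ with the eigenvalues sorted in decreasing order; a minimal MATLAB sketch (X assumed centered):

    S = X' * X;
    [V, D] = eig(S);
    [~, order] = sort(diag(D), 'descend');
    V = V(:, order);                       % columns are v_1, v_2, ..., v_p
    v2 = V(:, 2);                          % second principal component direction
    disp(norm(V' * V - eye(size(V, 2))))   % ~0: the directions are orthonormal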

Page 15:

Example: $X \in \mathbb{R}^{50 \times 2}$

Page 16:

Properties

- There are at most $p$ principal components

- $[v_1, v_2, \dots, v_p]$ can be found from the eigendecomposition: let $\Sigma = X^\top X$, then

  $\Sigma = U \Lambda U^\top$

  where $\Lambda = \mathrm{diag}\{\lambda_1, \dots, \lambda_p\}$ and $U$ is an orthogonal matrix

- Can be computed efficiently if you only need the first principal component: the power method (a sketch follows this list)

  $v^{(k)} := \Sigma v^{(k-1)} / \|\Sigma v^{(k-1)}\|$

- In general, can be computed using Jacobi's method

- Sparse $\Sigma$
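
A minimal MATLAB sketch of the power method for the leading direction (the iteration cap and tolerance are illustrative choices, not from the slides):

    Sigma = X' * X;                        % X assumed centered
    v = randn(size(Sigma, 1), 1);
    v = v / norm(v);                       % random unit-length start
    for k = 1:1000                         % iteration cap (illustrative)
        w = Sigma * v;
        w = w / norm(w);                   % v^(k) = Sigma v^(k-1) / ||Sigma v^(k-1)||
        if norm(w - v) < 1e-10, v = w; break; end
        v = w;
    end
    v1 = v;   % converges to the leading eigenvector for almost every start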

Page 17:

PCA example: financial data analysis

- Daily swap rates of eight maturities from 7/3/2000 to 7/15/2005

- Data from http://www.stanford.edu/~xing/statfinbook/data.html

Page 18:

Page 19:

We typically take differences of these time series, $y_t = x_t - x_{t-1}$, to make them more “stationary”.
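
In MATLAB this first-order differencing is a single call, assuming the matrix X holds the series with rows as days and columns as maturities:

    Y = diff(X);    % Y(t, :) = X(t+1, :) - X(t, :), one fewer row than X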

Page 20:

Use the MATLAB command to perform PCA

[Figure: eigenvalues and the first 3 eigenvectors]
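
The slide shows the output rather than the command itself; a standard way to do this with MATLAB's Statistics and Machine Learning Toolbox is:

    [coeff, score, latent] = pca(Y);   % pca centers the data by default
    latent(1:3)                        % top 3 eigenvalues of the covariance of Y
    coeff(:, 1:3)                      % first 3 eigenvectors (loadings)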

Page 21:

The resulting first 3 reduced dimensions (principal component scores)