Principal Component Analysis
Jieping Ye
Department of Computer Science and Engineering
Arizona State University
http://www.public.asu.edu/~jye02
Outline of lecture
• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis (PCA)
• Nonlinear PCA using Kernels
What is feature reduction?
• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
– The criterion for feature reduction depends on the problem setting:
• Unsupervised setting: minimize the information loss.
• Supervised setting: maximize the class discrimination.
• Given a set of data points of $p$ variables $x_1, x_2, \ldots, x_n$, compute the linear transformation (projection)

$$G \in \mathbb{R}^{p \times d}: \; x \in \mathbb{R}^p \to y = G^T x \in \mathbb{R}^d \quad (d \ll p).$$
What is feature reduction?
Linear transformation:

$$G^T \in \mathbb{R}^{d \times p}: \; X \in \mathbb{R}^p \to Y = G^T X \in \mathbb{R}^d$$

Original data $X \in \mathbb{R}^p$, reduced data $Y \in \mathbb{R}^d$.
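As a concrete illustration of the projection $y = G^T x$ (a minimal sketch with made-up dimensions; here `G` is just a random orthonormal basis, not yet the PCA solution):

```python
import numpy as np

rng = np.random.default_rng(0)
p, d, n = 10, 2, 5                            # original dim, reduced dim, number of points

X = rng.normal(size=(p, n))                   # data matrix, one point per column
G, _ = np.linalg.qr(rng.normal(size=(p, d)))  # any p x d matrix with orthonormal columns

Y = G.T @ X                                   # reduced data, d x n
print(Y.shape)                                # (2, 5)
```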
High-dimensional data
Gene expression Face images Handwritten digits
Why feature reduction?
• Most machine learning and data mining techniques may not be effective for high-dimensional data.
– Curse of dimensionality: query accuracy and efficiency degrade rapidly as the dimension increases.
• The intrinsic dimension may be small.
– For example, the number of genes responsible for a certain type of disease may be small.
Why feature reduction?
• Visualization: projection of high-dimensional data onto 2D or 3D.
• Data compression: efficient storage and retrieval.
• Noise removal: positive effect on query accuracy.
Application of feature reduction
• Face recognition
• Handwritten digit recognition
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification
Feature reduction algorithms
• Unsupervised
– Latent Semantic Indexing (LSI): truncated SVD
– Independent Component Analysis (ICA)
– Principal Component Analysis (PCA)
– Canonical Correlation Analysis (CCA)
• Supervised
– Linear Discriminant Analysis (LDA)
• Semi-supervised
– An active research topic
What is Principal Component Analysis?
• Principal component analysis (PCA)
– Reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables.
– Retains most of the sample's information.
– Useful for the compression and classification of data.
• By information we mean the variation present in the sample, given by the correlations between the original variables.
– The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.
Geometric picture of principal components (PCs)
• The 1st PC $z_1$ is a minimum-distance fit to a line in $X$ space.
• The 2nd PC $z_2$ is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.

PCs are a series of linear least squares fits to a sample, each orthogonal to all the previous ones.
Algebraic definition of PCs
Given a sample of $n$ observations on a vector of $p$ variables

$$x_1, x_2, \ldots, x_n \in \mathbb{R}^p,$$

define the first principal component of the sample by the linear transformation

$$z_1 = a_1^T x_j = \sum_{i=1}^{p} a_{i1} x_{ij}, \quad j = 1, 2, \ldots, n,$$

where $x_j = (x_{1j}, x_{2j}, \ldots, x_{pj})$ and the vector

$$a_1 = (a_{11}, a_{21}, \ldots, a_{p1})$$

is chosen such that $\mathrm{var}[z_1]$ is maximum.
Algebraic derivation of PCs
To find $a_1$, first note that

$$\mathrm{var}[z_1] = E[(z_1 - \bar z_1)^2] = \frac{1}{n}\sum_{i=1}^{n}\left(a_1^T x_i - a_1^T \bar x\right)^2 = \frac{1}{n}\sum_{i=1}^{n} a_1^T (x_i - \bar x)(x_i - \bar x)^T a_1 = a_1^T S a_1,$$

where

$$S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)(x_i - \bar x)^T$$

is the covariance matrix and $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean.

In the following, we assume the data is centered: $\bar x = 0$.
Algebraic derivation of PCs
Assume $\bar x = 0$. Form the matrix

$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n},$$

then

$$S = \frac{1}{n} X X^T.$$

Obtain the eigenvectors of $S$ by computing the SVD of $X$:

$$X = U \Sigma V^T.$$
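A small numerical check of this equivalence (a sketch with made-up sizes, assuming centered data; the left singular vectors of $X$ match the eigenvectors of $S$ up to sign, and the eigenvalues of $S$ are the squared singular values divided by $n$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 100
X = rng.normal(size=(p, n))
X = X - X.mean(axis=1, keepdims=True)     # center: x_bar = 0

S = (X @ X.T) / n                         # covariance matrix, p x p
evals, evecs = np.linalg.eigh(S)          # ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

U, sing, Vt = np.linalg.svd(X)            # X = U Sigma V^T
assert np.allclose(evals, sing**2 / n)    # eigenvalues of S = squared singular values / n
for k in range(p):                        # eigenvectors = columns of U, up to sign
    assert np.allclose(np.abs(U[:, k]), np.abs(evecs[:, k]))
```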
Algebraic derivation of PCs

To find $a_1$ that maximizes $\mathrm{var}[z_1]$ subject to $a_1^T a_1 = 1$, let $\lambda$ be a Lagrange multiplier:

$$L = a_1^T S a_1 - \lambda (a_1^T a_1 - 1)$$

$$\frac{\partial L}{\partial a_1} = S a_1 - \lambda a_1 = 0$$

$$(S - \lambda I_p)\, a_1 = 0$$

Therefore $a_1$ is an eigenvector of $S$ corresponding to the largest eigenvalue $\lambda = \lambda_1$.
Algebraic derivation of PCs

To find the next coefficient vector $a_2$ maximizing $\mathrm{var}[z_2]$ subject to $a_2^T a_2 = 1$ and to $\mathrm{cov}[z_2, z_1] = 0$ (uncorrelated), first note that

$$\mathrm{cov}[z_2, z_1] = a_1^T S a_2 = \lambda_1 a_1^T a_2,$$

then let $\lambda$ and $\phi$ be Lagrange multipliers, and maximize

$$L = a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi\, a_2^T a_1.$$
Algebraic derivation of PCs
$$L = a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi\, a_2^T a_1$$

$$\frac{\partial L}{\partial a_2} = S a_2 - \lambda a_2 - \phi a_1 = 0$$

Left-multiplying by $a_1^T$ and using $a_1^T a_2 = 0$ gives $\phi = 0$, so

$$S a_2 = \lambda a_2 \quad \text{and} \quad \lambda = a_2^T S a_2.$$
Algebraic derivation of PCs

We find that $a_2$ is also an eigenvector of $S$, whose eigenvalue $\lambda = \lambda_2$ is the second largest.

In general,

$$\mathrm{var}[z_k] = a_k^T S a_k = \lambda_k.$$

• The $k$th largest eigenvalue of $S$ is the variance of the $k$th PC $z_k$.
• The $k$th PC $z_k$ retains the $k$th greatest fraction of the variation in the sample.
Algebraic derivation of PCs
• Main steps for computing PCs
– Form the covariance matrix $S$.
– Compute its eigenvectors: $\{a_i\}_{i=1}^{p}$.
– Use the first $d$ eigenvectors $\{a_i\}_{i=1}^{d}$ to form the $d$ PCs.
– The transformation $G$ is given by $G = [a_1, a_2, \ldots, a_d]$.

A test point $x \in \mathbb{R}^p \to G^T x \in \mathbb{R}^d$.
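The steps above can be sketched directly in NumPy (an illustrative implementation, not code from the slides; function and variable names are my own):

```python
import numpy as np

def pca_transform(X, d):
    """X: p x n data matrix (one point per column). Returns G (p x d) and Y = G^T X."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center the data
    S = (Xc @ Xc.T) / X.shape[1]             # form the covariance matrix S
    evals, evecs = np.linalg.eigh(S)         # compute eigenvectors (ascending eigenvalues)
    order = np.argsort(evals)[::-1]          # sort descending
    G = evecs[:, order[:d]]                  # first d eigenvectors form G
    return G, G.T @ Xc                       # project: Y = G^T X

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 50))
G, Y = pca_transform(X, 2)
print(G.shape, Y.shape)                      # (5, 2) (2, 50)
```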
Optimality property of PCA
Dimension reduction: $Y = G^T X \in \mathbb{R}^{d \times n}$

Reconstruction: $\tilde X = G Y = G G^T X \in \mathbb{R}^{p \times n}$

Original data: $X \in \mathbb{R}^{p \times n}$, with $G \in \mathbb{R}^{p \times d}$.
Optimality property of PCA
Main theoretical result:

The matrix $G$ consisting of the first $d$ eigenvectors of the covariance matrix $S$ solves the following minimization problem:

$$\min_{G \in \mathbb{R}^{p \times d}} \; \|X - G(G^T X)\|_F^2 \quad \text{subject to} \quad G^T G = I_d,$$

where $\|X - G(G^T X)\|_F^2$ is the reconstruction error.

PCA projection minimizes the reconstruction error among all linear projections of size $d$.
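This optimality can be checked numerically (a sketch: the PCA basis is compared against random orthonormal bases of the same size; the sizes and `d = 2` are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, d = 6, 200, 2
X = rng.normal(size=(p, n))
X = X - X.mean(axis=1, keepdims=True)

# PCA basis: first d eigenvectors of S = X X^T / n
evals, evecs = np.linalg.eigh(X @ X.T / n)
G_pca = evecs[:, np.argsort(evals)[::-1][:d]]

def recon_error(G):
    """Frobenius reconstruction error ||X - G G^T X||_F^2."""
    return np.linalg.norm(X - G @ (G.T @ X), 'fro') ** 2

# no random orthonormal G of the same size does better
for _ in range(20):
    G_rand, _ = np.linalg.qr(rng.normal(size=(p, d)))
    assert recon_error(G_pca) <= recon_error(G_rand) + 1e-9
```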
Applications of PCA
• Eigenfaces for recognition. Turk and Pentland. 1991.
• Principal Component Analysis for clustering gene expression data. Yeung and Ruzzo. 2001.
• Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Lilien. 2003.
PCA for image compression
Reconstructions with d = 1, 2, 4, 8, 16, 32, 64, 100 components, alongside the original image.
Motivation
Linear projections will not detect the pattern.
Nonlinear PCA using Kernels
• Traditional PCA applies a linear transformation.
– May not be effective for nonlinear data.
• Solution: apply a nonlinear transformation to a potentially very high-dimensional space:

$$\Phi: x \to \Phi(x)$$

• Computational efficiency: apply the kernel trick.
– Requires that PCA can be rewritten in terms of dot products:

$$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$$

(More on kernels later.)
Nonlinear PCA using Kernels
Rewrite PCA in terms of dot products.

Assume the data has been centered, i.e., $\sum_i x_i = 0$.

The covariance matrix $S$ can be written as

$$S = \frac{1}{n} \sum_i x_i x_i^T.$$

Let $v$ be an eigenvector of $S$ corresponding to a nonzero eigenvalue $\lambda$:

$$S v = \frac{1}{n} \sum_i x_i x_i^T v = \lambda v \quad \Rightarrow \quad v = \frac{1}{n\lambda} \sum_i (x_i^T v)\, x_i.$$

Eigenvectors of $S$ lie in the space spanned by all data points.
Nonlinear PCA using Kernels
The covariance matrix can be written in matrix form:

$$S = \frac{1}{n} X X^T, \quad \text{where } X = [x_1, x_2, \ldots, x_n],$$

and the eigenvector can be written as

$$v = \sum_i \alpha_i x_i = X \alpha.$$

Then

$$S v = \lambda v \;\Rightarrow\; \frac{1}{n} X X^T X \alpha = \lambda X \alpha \;\Rightarrow\; \frac{1}{n} (X^T X)(X^T X)\,\alpha = \lambda (X^T X)\,\alpha \;\Rightarrow\; \frac{1}{n} (X^T X)\,\alpha = \lambda \alpha.$$

Any benefits?
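One benefit: the final eigenproblem involves only the $n \times n$ Gram matrix $X^T X$, whose entries are dot products, rather than the $p \times p$ covariance matrix. A quick numerical check (a sketch with made-up sizes where $p \gg n$):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 1000, 30                        # p >> n
X = rng.normal(size=(p, n))
X = X - X.mean(axis=1, keepdims=True)

# eigen-decompose the small n x n Gram matrix instead of the p x p covariance
gram = X.T @ X                         # n x n, entries are dot products x_i . x_j
evals, alphas = np.linalg.eigh(gram / n)
lam, alpha = evals[-1], alphas[:, -1]  # top eigenpair

v = X @ alpha                          # recover eigenvector of S = X X^T / n
v /= np.linalg.norm(v)
assert np.allclose((X @ X.T / n) @ v, lam * v)   # S v = lambda v
```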
Nonlinear PCA using Kernels
Next consider the feature space $\Phi: x \to \Phi(x)$:

$$S^{\Phi} = \frac{1}{n} X^{\Phi} (X^{\Phi})^T, \quad \text{where } X^{\Phi} = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_n)], \quad v = \sum_i \alpha_i \Phi(x_i) = X^{\Phi} \alpha.$$

The $(i,j)$-th entry of $(X^{\Phi})^T X^{\Phi}$ is $\Phi(x_i)^T \Phi(x_j)$.

Apply the kernel trick, $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, giving

$$\frac{1}{n} K \alpha = \lambda \alpha.$$

$K$ is called the kernel matrix.
Nonlinear PCA using Kernels
• Projection of a test point x onto v:
iii
iii
iii
xxKxx
xxvx
),()()(
)()()(
Explicit mapping is not required here.
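Putting the pieces together, a minimal kernel PCA sketch (my own illustration with an RBF kernel and arbitrary sizes; it omits the centering of $K$ in feature space that a full implementation would include):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2); stands in for Phi(a).Phi(b)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 2))           # n = 20 points in 2-D (rows are points here)
n = len(X)

# kernel matrix K_ij = K(x_i, x_j)
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

# solve (1/n) K alpha = lambda alpha and keep the top eigenvector
evals, evecs = np.linalg.eigh(K / n)
alpha = evecs[:, -1]

# projection of a test point onto v uses only kernel evaluations, never Phi itself
x_test = rng.normal(size=2)
proj = sum(alpha[i] * rbf(x_test, X[i]) for i in range(n))
```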
Reference
• Principal Component Analysis. I. T. Jolliffe.

• Kernel Principal Component Analysis. B. Schölkopf et al.

• Geometric Methods for Feature Extraction and Dimensional Reduction. C. J. C. Burges.