Principal Component Analysis
A Brief Introduction
Mingfu Liu, PhD, DCS
Methodology Journal Club, University of Calgary
April 19, 2011
1
How Old is Principal Component Analysis?
Pearson, K. (1901) On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559-572.
Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441.
2
Principal Components Visual Presentation
3
How do you describe him?
1. How many attributes can you get from him?
2. How do you describe him generally?
4
Now we have p variables from a single population of size n
X11, X12, X13, …, X1p
X21, X22, X23, …, X2p
…
Xn1, Xn2, Xn3, …, Xnp
Which variables should be used to represent the characteristics of the population?
How do we classify these variables, independent or dependent?
5
Which variables should be used to represent the characteristics of the population?
The simplest way is to keep one variable and discard all others: not reasonable!
Weight all variables equally: not reasonable (even if they have the same variance)
Use a weighted average based on some criterion.
6
Which criterion?
The weighted average f(X1, X2, X3, … Xp) seems reasonable.
If this is true, the Xs are independent variables. Where are the dependent variables? In this case, the dependent variables are unobservable latent variables (we assume they are dependent variables for now).
We need to set up a criterion to find the function → Principal Component Analysis
7
Principal Component Analysis
Principal component analysis (PCA) involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.
Objectives of principal component analysis
To discover or to reduce the dimensionality of the data set.
To identify new meaningful underlying variables.
8
Transformation
• Look for a transformation of the original data vector X (p×1) so that each new variable, a principal component ($Y_i$), can be defined as
$Y_1 = \alpha_1^T X = \alpha_{11} X_1 + \alpha_{12} X_2 + \dots + \alpha_{1p} X_p$
…
$Y_i = \alpha_i^T X = \alpha_{i1} X_1 + \alpha_{i2} X_2 + \dots + \alpha_{ip} X_p$
…
$Y_p = \alpha_p^T X = \alpha_{p1} X_1 + \alpha_{p2} X_2 + \dots + \alpha_{pp} X_p$
• where $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \dots, \alpha_{ip})^T$ is a column vector of weights with
$\alpha_i^T \alpha_i = \alpha_{i1}^2 + \alpha_{i2}^2 + \dots + \alpha_{ip}^2 = 1$
to restrict the vector to unit length and so eliminate indeterminacy
9
Variance Maximization
Maximize the variance of the projection of the observations on the principal component ($Y_i$) to find a vector $\alpha_i$ for each principal component ($Y_i$).
$\mathrm{Var}(Y_1) = \mathrm{Var}(\alpha_1^T X) = \alpha_1^T \mathrm{Var}(X)\, \alpha_1$ is maximal
…
$\mathrm{Var}(Y_i) = \mathrm{Var}(\alpha_i^T X) = \alpha_i^T \mathrm{Var}(X)\, \alpha_i$ is maximal
…
$\mathrm{Var}(Y_p) = \mathrm{Var}(\alpha_p^T X) = \alpha_p^T \mathrm{Var}(X)\, \alpha_p$ is maximal
Each maximization is subject to $\alpha_i^T \alpha_i = 1$ and, for $i > 1$, to $Y_i$ being uncorrelated with the components already found.
The matrix $C = \mathrm{Var}(X)$ is the variance matrix of the X variables
To maximize a function of several variables subject to one or more constraints, the method of Lagrange multipliers is used.
In this case this leads to the solution that $\alpha_i$ is the ith eigenvector of the variance matrix C.
10
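The Lagrange multiplier step can be sketched for the first component as follows. This is a standard textbook derivation written out under the slide's notation, not a reproduction of the presenter's own working; the Lagrangian symbol $L$ is introduced here for illustration.

```latex
\begin{aligned}
L(\alpha_1, \lambda) &= \alpha_1^T C \alpha_1 - \lambda \left( \alpha_1^T \alpha_1 - 1 \right) \\
\frac{\partial L}{\partial \alpha_1} &= 2 C \alpha_1 - 2 \lambda \alpha_1 = 0
  \;\Longrightarrow\; C \alpha_1 = \lambda \alpha_1 \\
\mathrm{Var}(Y_1) &= \alpha_1^T C \alpha_1 = \lambda\, \alpha_1^T \alpha_1 = \lambda
\end{aligned}
```

So any stationary point is an eigenvector of C, and the variance it attains is the matching eigenvalue; the maximum is therefore reached at the eigenvector with the largest eigenvalue.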
Variance Matrix
\[
C = \begin{pmatrix}
v(x_1) & c(x_1, x_2) & \cdots & c(x_1, x_p) \\
c(x_2, x_1) & v(x_2) & \cdots & c(x_2, x_p) \\
\vdots & \vdots & \ddots & \vdots \\
c(x_p, x_1) & c(x_p, x_2) & \cdots & v(x_p)
\end{pmatrix}
\]
C has p eigenvalue–eigenvector (latent value – latent vector) pairs $(\lambda_1, \alpha_1), (\lambda_2, \alpha_2), \dots, (\lambda_p, \alpha_p)$ corresponding to the variances and coefficient vectors of the p principal components, where $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_p$
11
Principal Component Variance - Eigenvalue
A principal component has a variance equal to the corresponding eigenvalue
$\mathrm{Var}(Y_i) = \lambda_i$ for all $i = 1, \dots, p$
Small $\lambda_i$ ⇒ small variance ⇒ the data change little in the direction of the component $Y_i$
Principal components $Y_i$ are derived in decreasing order of importance: $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_p$
The relative variance explained by each principal component is given by $\lambda_i / \sum_{k=1}^{p} \lambda_k$
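As a minimal sketch of the relative-variance formula, the snippet below computes the eigenvalues of a small variance matrix and the share of variance each one explains. The 3x3 matrix is hypothetical, made up purely for illustration; NumPy is an assumed tool, not something the talk uses.

```python
import numpy as np

# Hypothetical 3x3 variance matrix, made up for illustration only.
C = np.array([
    [4.0, 2.0, 0.5],
    [2.0, 3.0, 0.3],
    [0.5, 0.3, 1.0],
])

# eigvalsh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reverse them to get lambda_1 >= ... >= lambda_p.
eigenvalues = np.linalg.eigvalsh(C)[::-1]

# Relative variance explained by each component: lambda_i / sum of lambdas.
explained = eigenvalues / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 3))
print("proportion explained:", np.round(explained, 3))
```

The eigenvalues sum to the trace of C, so the proportions always add up to 1.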
12
Principal Component Weights - Eigenvector
A principal component $Y_i$, which is a linear combination of the original variables (X), is calculated using the eigenvector $\alpha_i$ as weights
$Y_1 = \alpha_1^T X = \alpha_{11} x_1 + \alpha_{12} x_2 + \dots + \alpha_{1p} x_p$
$Y_2 = \alpha_2^T X = \alpha_{21} x_1 + \alpha_{22} x_2 + \dots + \alpha_{2p} x_p$
…
$Y_p = \alpha_p^T X = \alpha_{p1} x_1 + \alpha_{p2} x_2 + \dots + \alpha_{pp} x_p$
or
$Y_i = \alpha_i^T X = \alpha_{i1} x_1 + \alpha_{i2} x_2 + \dots + \alpha_{ip} x_p$; $i = 1, \dots, p$
As the eigenvectors are orthogonal to one another, the principal components are orthogonal (uncorrelated) to one another
13
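The weights-and-scores relationship can be checked numerically. The sketch below uses simulated data (the mixing matrix and sample size are arbitrary assumptions) and NumPy's symmetric eigensolver to form the scores and confirm that they are uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n = 500 observations of p = 3 correlated variables.
# The mixing matrix is arbitrary and only serves to induce correlation.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing.T

Xc = X - X.mean(axis=0)          # center each variable
C = np.cov(Xc, rowvar=False)     # p x p variance matrix

eigenvalues, A = np.linalg.eigh(C)       # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]    # reorder to decreasing
eigenvalues, A = eigenvalues[order], A[:, order]

# Column i of A is alpha_i; row k of Y holds the scores alpha_i^T x_k.
Y = Xc @ A

# Eigenvectors are orthonormal; the score covariance matrix is diagonal
# with the eigenvalues on the diagonal.
print(np.round(A.T @ A, 6))
print(np.round(np.cov(Y, rowvar=False), 6))
```

The sign of each eigenvector is arbitrary, so scores from different software may differ by a factor of -1 per component.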
Principal Components Visual Presentation
14
Pearson’s Visual Presentation
15
Key Points
What we need to remember for now are:
1. We transform the original variables Xs to Principal Components Ys using eigenanalysis
2. The eigenvalues are the corresponding variances of the principal components in decreasing order of importance
3. The eigenvectors are the corresponding weight sets for the principal components, which are linear combinations of the original variables
4. Principal Components are uncorrelated
16
Eigenanalysis – Square Matrix Decomposition and Diagonalization
The eigenvalues λi are found by solving the equation
det(C-λI)=0
Eigenvectors are columns of the matrix A such that
\[ C = A D A^T \]
where
\[
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_p
\end{pmatrix}
\]
17
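A small numerical check of the decomposition, assuming NumPy and a hypothetical 2x2 variance matrix chosen only for illustration. Note that `np.linalg.eigh` returns eigenvalues in ascending order, the reverse of the slide's ordering; the identities hold in either order.

```python
import numpy as np

# Hypothetical symmetric variance matrix, made up for illustration.
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])

eigenvalues, A = np.linalg.eigh(C)   # columns of A are the eigenvectors
D = np.diag(eigenvalues)

# Each eigenvalue is a root of det(C - lambda I) = 0 (up to rounding).
for lam in eigenvalues:
    print(lam, np.linalg.det(C - lam * np.eye(2)))

# Decomposition C = A D A^T and diagonalization A^T C A = D.
print(np.allclose(C, A @ D @ A.T))
print(np.allclose(A.T @ C @ A, D))
```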
Application Strategy
Keep enough principal components that the cumulative variance they explain exceeds 50-70%
Kaiser criterion: keep principal components with eigenvalues > 1
Scree plot: represents the ability of principal components to explain the variation in data
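The first two rules can be expressed as short helper functions. This is only a sketch: the function names, the default 70% threshold, and the example eigenvalues are assumptions made for illustration, not part of the talk.

```python
import numpy as np

def n_components_cumulative(eigenvalues, threshold=0.7):
    """Smallest k whose cumulative proportion of variance reaches threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, lam.size)   # guard against rounding at threshold = 1.0

def n_components_kaiser(eigenvalues):
    """Number of eigenvalues strictly greater than 1."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5).
lam = [2.5, 1.2, 0.7, 0.4, 0.2]
print(n_components_cumulative(lam, threshold=0.7))
print(n_components_kaiser(lam))
```

With these made-up eigenvalues the cumulative proportions are 0.50, 0.74, 0.88, 0.96 and 1.00, so both rules happen to keep two components; in practice they can disagree, which is one reason a scree plot is inspected as well.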
18
Scree Plot
19
Standardization of Original Variables
If variables have very heterogeneous variances, we standardize them. The standardized variables $Z_i$ are
$Z_i = (X_i - \mathrm{mean}(X_i)) / \sqrt{\mathrm{variance}(X_i)}$
The new variables all have unit variance (= 1)
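A brief sketch of the standardization step, assuming NumPy and simulated data on two very different scales (the scales 1 and 10 are arbitrary). It uses the sample variance (`ddof=1`), which is an assumption; the slide does not say which variance estimator is meant.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data on very different scales (standard deviations 1 and 10).
X = rng.normal(size=(200, 2)) * np.array([1.0, 10.0])

# Z_i = (X_i - mean) / sqrt(variance), using the sample variance (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Each standardized variable has variance 1.
print(np.round(Z.var(axis=0, ddof=1), 6))

# The covariance matrix of Z equals the correlation matrix of X.
print(np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False)))
```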
20
Correlation Matrix
When the original variables are standardized, the covariance matrix becomes the correlation matrix.
We apply the same methods to do Principal Component Analysis. The only difference is that the C matrix is replaced by the R matrix
R =
21
Covariance Matrix or Correlation Matrix?
Covariance Matrix is used when
1. The original variables have the same unit/scale
2. The original variables have similar variance
Otherwise, Correlation Matrix should be used
22
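Simple Example
Covariance matrix:
\[ C = \begin{pmatrix} 1 & 4 \\ 4 & 100 \end{pmatrix} \]
$\lambda_1 = 100.16$, $\alpha_1^T = [0.04, 0.999]$, $Y_1 = 0.04 X_1 + 0.999 X_2$
$\lambda_2 = 0.84$, $\alpha_2^T = [0.999, -0.04]$, $Y_2 = 0.999 X_1 - 0.04 X_2$
Correlation matrix:
\[ R = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix} \]
$\lambda_1 = 1.4$, $\alpha_1^T = [0.707, 0.707]$, $Y_1 = 0.707 Z_1 + 0.707 Z_2 = 0.707 (X_1 - M_1) + 0.0707 (X_2 - M_2)$
$\lambda_2 = 0.6$, $\alpha_2^T = [0.707, -0.707]$, $Y_2 = 0.707 Z_1 - 0.707 Z_2 = 0.707 (X_1 - M_1) - 0.0707 (X_2 - M_2)$
The coefficient 0.0707 arises because $Z_1 = (X_1 - M_1)/1$ and $Z_2 = (X_2 - M_2)/10$, the standard deviations being $\sqrt{1}$ and $\sqrt{100}$.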
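The numbers in this example can be reproduced with a few lines of NumPy (an assumed tool; the talk itself demonstrates SAS). The helper `sorted_eigh` is a hypothetical name introduced here. Eigenvector signs are arbitrary, so absolute values are printed.

```python
import numpy as np

def sorted_eigh(M):
    """Eigenvalues in decreasing order with matching eigenvector columns."""
    values, vectors = np.linalg.eigh(M)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]

C = np.array([[1.0, 4.0],
              [4.0, 100.0]])

# Correlation matrix derived from C: r_ij = c_ij / (sd_i * sd_j).
sd = np.sqrt(np.diag(C))
R = C / np.outer(sd, sd)

lam_C, A_C = sorted_eigh(C)
lam_R, A_R = sorted_eigh(R)

print(np.round(lam_C, 2), np.round(np.abs(A_C[:, 0]), 3))
print(np.round(lam_R, 2), np.round(np.abs(A_R[:, 0]), 3))
```

With the covariance matrix, the first component is dominated by the high-variance variable X2; with the correlation matrix, both variables are weighted equally. This is the practical consequence of the covariance-versus-correlation choice.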
23
Something to think about
Are original variables really independent variables?
It depends on
1. How to approach the problem
2. What methods to use to solve the problem
24
How to Approach the Problem
1. When we treat the original variables as analytical input, as we have done, they are independent variables
2. When we treat the original variables as realizations of the underlying latent variables, they are dependent variables
25
What methods to use to solve the problem
Three Methods are Available:
1. Maximizing Variance
2. Minimizing Error
3. Diagonalizing the Correlation Matrix
When the minimizing error method is used, the original variables are dependent variables
26
What methods to use to solve the problem
X = YB+ E
where
X is an n x p matrix of the centered observed variables;
Y is the n x j matrix of scores on the first j principal components;
B is the j x p matrix of eigenvectors;
E is an n x p matrix of residuals.
The method is to minimize the sum of all the squared elements in E
In this case the original variables are dependent variables, just as in SEM and latent variable models.
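The error-minimizing view can be illustrated with simulated data. In this sketch (NumPy assumed; the data, sample size, and j = 2 are arbitrary choices), X is rebuilt from the first j components and the residual sum of squares is compared with the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: n = 300 observations, p = 4 variables, arbitrary mixing.
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)                  # centered observed variables

eigenvalues, A = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]    # decreasing eigenvalue order
eigenvalues, A = eigenvalues[order], A[:, order]

j = 2
B = A[:, :j].T          # j x p matrix whose rows are the first j eigenvectors
Y = Xc @ B.T            # n x j matrix of scores
E = Xc - Y @ B          # n x p matrix of residuals

# The residual sum of squares equals (n - 1) times the sum of the
# discarded eigenvalues, so keeping the largest eigenvalues minimizes it.
n = Xc.shape[0]
print(np.sum(E**2), (n - 1) * eigenvalues[j:].sum())
```

The two printed values agree up to rounding, which shows why the variance-maximizing and error-minimizing formulations lead to the same components.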
27
Key Points To Take Home
1. We transform the original variables Xs to Principal Components Ys using eigenanalysis
2. The eigenvalues are the corresponding variances of the principal components in decreasing order of importance.
3. The eigenvectors are the corresponding weight sets for the principal components, which are linear combinations of the original variables
4. Principal Components are uncorrelated
28
Principal Component Analysis
Questions?
29
SAS Example Demo
30