
Principal Component Analysis

A Brief Introduction

Mingfu Liu, PhD, DCS

Methodology Journal Club, University of Calgary
April 19, 2011

How Old is Principal Component Analysis?

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559-572.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441.

Principal Components Visual Presentation


How do you describe him?

1. How many attributes can you get from him?
2. How do you describe him generally?

Now we have p variables from a single population of size n

X11, X12, X13, …, X1p
X21, X22, X23, …, X2p
…
Xn1, Xn2, Xn3, …, Xnp

Which variables should be used to represent the characteristics of the population?

How do we classify these variables, independent or dependent?

Which variables should be used to represent the characteristics of the population?

The simplest way is to keep one variable and discard all the others: not reasonable!

Weight all variables equally: not reasonable (even if they have the same variance).

Use a weighted average based on some criterion.

Which criterion?

The weighted average f(X1, X2, X3, … Xp) seems reasonable.

If this is true, the Xs are independent variables. Where are the dependent variables? In this case, the dependent variables are unobservable latent variables (we assume they are dependent variables for now).

We need to set up a criterion to find the function –> Principal Component Analysis

Principal Component Analysis

Principal component analysis (PCA) involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.

Objectives of principal component analysis

To discover or to reduce the dimensionality of the data set.

To identify new meaningful underlying variables.


Transformation

• Look for a transformation of the original data vector X (p×1) so that each new variable, the principal component Yi, can be defined as

$$
\begin{aligned}
Y_1 &= \alpha_1^T X = \alpha_{11} X_1 + \alpha_{12} X_2 + \dots + \alpha_{1p} X_p \\
&\ \ \vdots \\
Y_i &= \alpha_i^T X = \alpha_{i1} X_1 + \alpha_{i2} X_2 + \dots + \alpha_{ip} X_p \\
&\ \ \vdots \\
Y_p &= \alpha_p^T X = \alpha_{p1} X_1 + \alpha_{p2} X_2 + \dots + \alpha_{pp} X_p
\end{aligned}
$$

• where αi = (αi1, αi2, …, αip)ᵀ is a column vector of weights with

$$
\alpha_i^T \alpha_i = \alpha_{i1}^2 + \alpha_{i2}^2 + \dots + \alpha_{ip}^2 = 1
$$

to restrict the vector to unit length and eliminate indeterminacy.
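The need for the unit-length restriction can be seen numerically. The sketch below is only an illustration on made-up data (the array `X` and the weight vector `alpha` are assumptions, not values from the slides): stretching a weight vector without changing its direction inflates the variance of the combination, so the variance could be made arbitrarily large unless the length is fixed.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3))               # made-up data: n = 1000 observations, p = 3 variables

alpha = np.array([1.0, 2.0, 2.0])            # arbitrary (hypothetical) weight vector
alpha_unit = alpha / np.linalg.norm(alpha)   # rescaled so that alpha^T alpha = 1

# Variance of the linear combination Y = alpha^T X for each observation
var_unit = np.var(X @ alpha_unit, ddof=1)
var_scaled = np.var(X @ (3.0 * alpha_unit), ddof=1)   # same direction, 3 times the length

print(var_unit, var_scaled)                  # the second is 9 times the first
```

Multiplying the weights by k multiplies the variance by k², which is the indeterminacy the constraint removes.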

Variance Maximization

Maximize the variance of the projection of the observations on the principal component (Yi) to find a vector αi for each principal component (Yi).

$$
\begin{aligned}
\mathrm{Var}(Y_1) &= \mathrm{Var}(\alpha_1^T X) = \alpha_1^T \,\mathrm{Var}(X)\, \alpha_1 \ \text{is maximal} \\
&\ \ \vdots \\
\mathrm{Var}(Y_i) &= \mathrm{Var}(\alpha_i^T X) = \alpha_i^T \,\mathrm{Var}(X)\, \alpha_i \ \text{is maximal} \\
&\ \ \vdots \\
\mathrm{Var}(Y_p) &= \mathrm{Var}(\alpha_p^T X) = \alpha_p^T \,\mathrm{Var}(X)\, \alpha_p \ \text{is maximal}
\end{aligned}
$$

Each maximum is taken subject to αiᵀαi = 1 and, for i > 1, to Yi being uncorrelated with Y1, …, Yi−1.

The matrix C = Var(X) is the variance matrix of the X variables.

To maximize a function of several variables subject to one or more constraints, the method of Lagrange multipliers is used.

In this case this leads to the solution that αi is the ith eigenvector of the variance matrix C.
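For the first component, the Lagrange multiplier step can be written out as below. This is the standard textbook derivation, sketched here for reference rather than taken from the slides; λ denotes the multiplier attached to the unit-length constraint.

```latex
\begin{aligned}
L(\alpha_1, \lambda) &= \alpha_1^T C \alpha_1 - \lambda\,(\alpha_1^T \alpha_1 - 1) \\
\frac{\partial L}{\partial \alpha_1} &= 2 C \alpha_1 - 2 \lambda \alpha_1 = 0
  \quad\Longrightarrow\quad C \alpha_1 = \lambda \alpha_1 \\
\mathrm{Var}(Y_1) &= \alpha_1^T C \alpha_1 = \lambda\, \alpha_1^T \alpha_1 = \lambda
\end{aligned}
```

So any stationary point is an eigenvector of C, and the variance attained there equals its eigenvalue. The maximum is therefore reached at the eigenvector with the largest eigenvalue.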

Variance Matrix

$$
C = \begin{pmatrix}
v(x_1) & c(x_1,x_2) & \cdots & c(x_1,x_p) \\
c(x_1,x_2) & v(x_2) & \cdots & c(x_2,x_p) \\
\vdots & \vdots & \ddots & \vdots \\
c(x_1,x_p) & c(x_2,x_p) & \cdots & v(x_p)
\end{pmatrix}
$$

C has p eigenvalue–eigenvector (latent value – latent vector) pairs (λ1 , α1), (λ2 , α2), … (λp , αp) corresponding to the variances and coefficient vectors of the p principal components, where λ1 >= λ2 >= λ3 >= … >= λp
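A minimal sketch of how the eigenvalue–eigenvector pairs might be computed with NumPy. The helper name `eigen_pairs` and the simulated data are assumptions made for illustration. `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match the ordering λ1 >= … >= λp.

```python
import numpy as np

def eigen_pairs(X):
    """Return eigenvalues (descending) and matching eigenvectors (columns) of Var(X)."""
    C = np.cov(X, rowvar=False)            # p x p variance matrix; rows of X are observations
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh is meant for symmetric matrices; ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_p
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)             # made-up data, for illustration only
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
lam, A = eigen_pairs(X)
print(lam)        # variances of the principal components
print(A[:, 0])    # coefficient vector alpha_1 of the first component
```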

Principal Component Variance - Eigenvalue

A principal component has a variance equal to the corresponding eigenvalue

Var(Yi) = λi for all i = 1…p

A small λi means a small variance: the data change little in the direction of the component Yi

Principal Components Yi are derived in decreasing order of importance λ1 >= λ2 >= λ3 >= … >= λp

The relative variance explained by each principal component is given by λi / Σ λi
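As a small worked illustration of λi / Σ λi, the sketch below uses the two eigenvalues quoted for the covariance matrix in the Simple Example slide (100.16 and 0.84). The function name `explained_variance_ratio` is an assumption chosen for this example.

```python
import numpy as np

def explained_variance_ratio(eigvals):
    """Share of the total variance carried by each principal component."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()

# Eigenvalues from the covariance-matrix case of the Simple Example slide
ratios = explained_variance_ratio([100.16, 0.84])
print(ratios)
```

With these values the first component accounts for roughly 99% of the total variance, which reflects how strongly the variable with variance 100 dominates the covariance-based analysis.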

Principal Component Weights - Eigenvector

A Principal Component Yi, which is a linear combination of the original variables (X), is calculated using the eigenvector αi as weights

$$
\begin{aligned}
Y_1 &= \alpha_1^T X = \alpha_{11} X_1 + \alpha_{12} X_2 + \dots + \alpha_{1p} X_p \\
Y_2 &= \alpha_2^T X = \alpha_{21} X_1 + \alpha_{22} X_2 + \dots + \alpha_{2p} X_p \\
&\ \ \vdots \\
Y_p &= \alpha_p^T X = \alpha_{p1} X_1 + \alpha_{p2} X_2 + \dots + \alpha_{pp} X_p
\end{aligned}
$$

or

$$
Y_i = \alpha_i^T X = \alpha_{i1} X_1 + \alpha_{i2} X_2 + \dots + \alpha_{ip} X_p ; \quad i = 1..p
$$

As the eigenvectors are orthogonal to one another, the Principal Components are orthogonal (uncorrelated) to one another
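The sketch below computes the component scores on simulated data and checks the two claims above: the eigenvectors are orthogonal, and the resulting components are uncorrelated with variances equal to the eigenvalues. The data are made up, and centering X before projecting is an assumption (it does not change the variances).

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up sample drawn with the covariance matrix used in the Simple Example slide
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 4.0], [4.0, 100.0]], size=500)

Xc = X - X.mean(axis=0)                  # center the variables (assumption; variances unchanged)
C = np.cov(Xc, rowvar=False)
lam, A = np.linalg.eigh(C)
lam, A = lam[::-1], A[:, ::-1]           # decreasing order of eigenvalues

Y = Xc @ A                               # row k holds (Y_1, ..., Y_p) for observation k
score_cov = np.cov(Y, rowvar=False)      # diagonal: Var(Y_i); off-diagonal: covariances

print(np.round(score_cov, 6))            # off-diagonal entries are zero up to rounding
print(np.round(A.T @ A, 6))              # identity matrix: eigenvectors are orthonormal
```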

Principal Components Visual Presentation


Pearson’s Visual Presentation


Key Points

What we need to remember for now are:

1. We transform the original variables Xs to Principal Components Ys using eigenanalysis

2. The eigenvalues are the corresponding variances of the principal components in decreasing order of importance

3. The eigenvectors are the corresponding weight sets for the principal components, which are linear combinations of the original variables

4. Principal Components are uncorrelated

Eigenanalysis – Square Matrix Decomposition and Diagonalization

The eigenvalues λi are found by solving the equation

det(C − λI) = 0

Eigenvectors are the columns of the matrix A such that

C = A D Aᵀ

where

$$
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_p
\end{pmatrix}
$$
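Both statements on this slide can be checked numerically. The sketch below uses the 2 x 2 variance matrix from the Simple Example slide. `np.linalg.eigh` is one reasonable way to obtain A and the eigenvalues, not necessarily the routine used in the talk.

```python
import numpy as np

C = np.array([[1.0, 4.0],
              [4.0, 100.0]])           # variance matrix from the Simple Example slide

lam, A = np.linalg.eigh(C)             # columns of A are unit-length eigenvectors
D = np.diag(lam)                       # diagonal matrix of eigenvalues (same order as A's columns)

reconstructed = A @ D @ A.T            # should reproduce C
char_poly_at_eigs = [np.linalg.det(C - l * np.eye(2)) for l in lam]

print(np.round(reconstructed, 6))
print(char_poly_at_eigs)               # each value is zero up to rounding error
```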

Application Strategy

Keep enough principal components so that the cumulative variance they explain is >50-70%

Kaiser criterion: keep principal components with eigenvalues >1 (this applies when the correlation matrix is analyzed, where the average eigenvalue is 1)

Scree plot: represents the ability of principal components to explain the variation in data
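The first two rules are easy to express in code. The sketch below is one possible reading of them: the function names, the 70% default threshold, and the eigenvalues are all hypothetical, and "reaches the threshold" is taken to mean cumulative share >= threshold.

```python
import numpy as np

def n_components_by_cumulative_variance(eigvals, threshold=0.70):
    """Smallest number of components whose cumulative variance share reaches the threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals))        # guard against rounding when threshold is 1.0

def n_components_by_kaiser(eigvals):
    """Number of components with eigenvalue > 1 (intended for correlation-matrix PCA)."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))

# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5)
eigvals = [2.5, 1.2, 0.8, 0.3, 0.2]
print(n_components_by_cumulative_variance(eigvals, 0.70))
print(n_components_by_kaiser(eigvals))
```

The rules need not agree in general, so the choice of rule remains a judgment call.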

Scree Plot


Standardization of Original Variables

If the variables have very heterogeneous variances, we standardize them. The standardized variables Zi are

Zi = (Xi − mean) / √variance

The new variables all have unit variance (= 1)

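Correlation Matrix

When the original variables are standardized, the covariance matrix becomes the correlation matrix.

We apply the same methods to do Principal Component Analysis. The only difference is that the C matrix is replaced by the R matrix

$$
R = \begin{pmatrix}
1 & r(x_1,x_2) & \cdots & r(x_1,x_p) \\
r(x_1,x_2) & 1 & \cdots & r(x_2,x_p) \\
\vdots & \vdots & \ddots & \vdots \\
r(x_1,x_p) & r(x_2,x_p) & \cdots & 1
\end{pmatrix}
$$

where r(xi, xj) is the correlation between xi and xj.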
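A quick numerical check that the covariance matrix of the standardized variables is the correlation matrix of the originals. The data are simulated, with deliberately different scales, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up data whose three variables have very different scales
X = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 100.0])

# Zi = (Xi - mean) / sqrt(variance), using the sample variance (ddof=1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_of_Z = np.cov(Z, rowvar=False)     # covariance matrix of the standardized variables
R = np.corrcoef(X, rowvar=False)       # correlation matrix of the original variables

print(np.round(cov_of_Z - R, 10))      # all zeros up to rounding
```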

Covariance Matrix or Correlation Matrix?

Covariance Matrix is used when

1. The original variables have the same unit/scale

2. The original variables have similar variance

Otherwise, Correlation Matrix should be used

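Simple Example

Covariance matrix:

$$
C = \begin{pmatrix} 1 & 4 \\ 4 & 100 \end{pmatrix}
$$

λ1 = 100.16, α1ᵀ = [0.04, 0.999], Y1 = 0.04 X1 + 0.999 X2
λ2 = 0.84, α2ᵀ = [0.999, -0.04], Y2 = 0.999 X1 - 0.04 X2

Correlation matrix:

$$
R = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix}
$$

λ1 = 1.4, α1ᵀ = [0.707, 0.707], Y1 = 0.707 Z1 + 0.707 Z2 = 0.707 (X1 – M1) + 0.0707 (X2 – M2)
λ2 = 0.6, α2ᵀ = [0.707, -0.707], Y2 = 0.707 Z1 - 0.707 Z2 = 0.707 (X1 – M1) - 0.0707 (X2 – M2)

Here Z1 = (X1 – M1)/1 and Z2 = (X2 – M2)/10, because the standard deviations of X1 and X2 are √1 = 1 and √100 = 10.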
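The numbers on this slide can be reproduced with a few lines of NumPy. The helper `pca_from_matrix` is a name made up for this sketch. Eigenvectors are only defined up to sign, so the helper flips each one to make its first weight positive, which happens to match the slide (this assumes the first weight is not zero).

```python
import numpy as np

def pca_from_matrix(M):
    """Eigenvalues (descending) and eigenvectors (columns) of a symmetric matrix."""
    lam, A = np.linalg.eigh(M)
    lam, A = lam[::-1], A[:, ::-1]
    A = A * np.sign(A[0, :])           # sign convention: first weight positive (assumes it is nonzero)
    return lam, A

C = np.array([[1.0, 4.0],
              [4.0, 100.0]])
sd = np.sqrt(np.diag(C))               # standard deviations: 1 and 10
R = C / np.outer(sd, sd)               # correlation matrix implied by C

lam_C, A_C = pca_from_matrix(C)
lam_R, A_R = pca_from_matrix(R)

# Weights of the correlation-based components expressed per unit of (X - M)
weights_in_original_units = A_R / sd[:, None]

print(np.round(lam_C, 2), np.round(A_C[:, 0], 3))
print(np.round(lam_R, 2), np.round(weights_in_original_units[:, 0], 4))
```

The covariance-based first component is almost entirely X2, whereas the correlation-based one weights the two standardized variables equally.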

Something to think about

Are the original variables really independent variables?

It depends on

1. How to approach the problem
2. What methods to use to solve the problem

How to Approach the Problem

1. When we treat the original variables as analytical input, as we have done, they are independent variables

2. When we treat the original variables as realizations of the underlying latent variables, they are dependent variables

What methods to use to solve the problem

Three Methods are Available:

1. Maximizing Variance
2. Minimizing Error
3. Diagonalizing the Correlation Matrix

When the minimizing error method is used, the original variables are dependent variables

What methods to use to solve the problem

X = YB+ E

where

X is an n x p matrix of the centered observed variables;
Y is the n x j matrix of scores on the first j principal components;
B is the j x p matrix of eigenvectors;
E is an n x p matrix of residuals.

The method is to minimize the sum of all the squared elements in E.

In this case the original variables are dependent variables, just as in SEM and latent variable models.
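The decomposition X = YB + E can be sketched on simulated data as follows. The data and the choice j = 2 are assumptions for illustration. When the first j eigenvectors are used, the sum of squared residuals equals (n − 1) times the sum of the discarded eigenvalues, which ties the error-minimizing view back to the variance-maximizing one.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))   # made-up correlated data
X = X - X.mean(axis=0)                                    # centered, n x p

lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
lam, A = lam[::-1], A[:, ::-1]                            # decreasing eigenvalues

j = 2                      # number of components kept (arbitrary choice)
B = A[:, :j].T             # j x p matrix of eigenvectors (one per row)
Y = X @ B.T                # n x j matrix of scores on the first j components
E = X - Y @ B              # n x p matrix of residuals

residual_ss = np.sum(E ** 2)
discarded_ss = (X.shape[0] - 1) * lam[j:].sum()
print(residual_ss, discarded_ss)                          # the two agree
```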

Key Points To Take Home

1. We transform the original variables Xs to Principal Components Ys using eigenanalysis

2. The eigenvalues are the corresponding variances of the principal components in decreasing order of importance.

3. The eigenvectors are the corresponding weight sets for the principal components, which are linear combinations of the original variables

4. Principal Components are uncorrelated

Principal Component Analysis

Questions?

SAS Example Demo
