Lecture 20: Empirical Orthogonal Functions and Factor Analysis

Posted on 21-Dec-2015

TRANSCRIPT
Motivation
In Fourier analysis, the choice of sine and cosine “patterns” was prescribed by the method. Could we use the data itself as a source of information about the shape of the patterns?
Example
Maps of some hypothetical function, say, sea surface temperature, forming a sequence in time.
[Figures: the data, a sequence of maps in time; and pattern importance versus pattern number]
Choose just the most important patterns: the 3 most important patterns.
Comparison: original versus reconstruction using only 3 patterns. Note that this process has reduced the noise (since noise has no pattern common to all the images).
[Figure: amplitudes of patterns versus time] Note: there is no requirement that a pattern be periodic in time.
Discussion: mixing of end-members

[Ternary diagram with vertices A, B, C]
Ternary diagram: a useful tool for data that has three “components”.

[Diagram: vertex A at the top; lines of constant A at 100% A, 75% A, 50% A, 25% A, and 0% A, moving toward the B-C edge.]

This works for 3 end-members, as long as A + B + C = 100% … and similarly for B and C.
Suppose the data fall near a line on the diagram.

[Ternary diagrams, vertices A, B, C: dots = data, clustered near a line; two points f1 and f2 on the line = end-members or factors. The line through f1 and f2 is the mixing line; the point midway between them is a 50%/50% mix.]

The data are idealized as lying on the mixing line.
[Ternary diagram, vertices A, B, C: f1 and f2 on the mixing line, plus a third factor f3 off the line.]

You could represent the data exactly with a third ‘noise’ factor. It doesn’t much matter where you put f3, as long as it’s not on the line.
S: components (A, B, C, …) in each sample, s

      (A in s1) (B in s1) (C in s1)
      (A in s2) (B in s2) (C in s2)
S =   (A in s3) (B in s3) (C in s3)
      …
      (A in sN) (B in sN) (C in sN)

Note: a sample is along a row in S. With N samples and M components, S is N×M.
F: components (A, B, C, …) in each factor, f

      (A in f1) (B in f1) (C in f1)
F =   (A in f2) (B in f2) (C in f2)
      (A in f3) (B in f3) (C in f3)

With M components and M factors, F is M×M.
C: coefficients of the factors

      (f1 in s1) (f2 in s1) (f3 in s1)
      (f1 in s2) (f2 in s2) (f3 in s2)
C =   (f1 in s3) (f2 in s3) (f3 in s3)
      …
      (f1 in sN) (f2 in sN) (f3 in sN)

With N samples and M factors, C is N×M.
S = C F

Samples (N×M) = Coefficients (N×M) × Factors (M×M):

(A in s1) (B in s1) (C in s1)     (f1 in s1) (f2 in s1) (f3 in s1)
(A in s2) (B in s2) (C in s2)     (f1 in s2) (f2 in s2) (f3 in s2)     (A in f1) (B in f1) (C in f1)
(A in s3) (B in s3) (C in s3)  =  (f1 in s3) (f2 in s3) (f3 in s3)  ×  (A in f2) (B in f2) (C in f2)
…                                 …                                    (A in f3) (B in f3) (C in f3)
(A in sN) (B in sN) (C in sN)     (f1 in sN) (f2 in sN) (f3 in sN)
Now ignore f3 and keep only the selected factors:

S ≈ C′ F′

Samples (N×M) ≈ selected coefficients C′ (N×p) × selected factors F′ (p×M):

(A in s1) (B in s1) (C in s1)     (f1 in s1) (f2 in s1)
(A in s2) (B in s2) (C in s2)     (f1 in s2) (f2 in s2)     (A in f1) (B in f1) (C in f1)
(A in s3) (B in s3) (C in s3)  ≈  (f1 in s3) (f2 in s3)  ×  (A in f2) (B in f2) (C in f2)
…                                 …
(A in sN) (B in sN) (C in sN)     (f1 in sN) (f2 in sN)

The data are approximated with only the most important factors; the p most important factors are those with the biggest coefficients.
View the samples as vectors in space.

[Diagram: axes A, B, C; sample vectors s1, s2, s3 and a factor f.]

Let the factors be unit vectors; then the coefficients are the projections (dot products) of the samples onto the factors.

This suggests a method of choosing factors so that they have large coefficients: find the factor f that maximizes

E = Σi [ si · f ]²

with the constraint that f · f = 1. (Note: we square the dot product since it can be negative.)
Find the factor f that maximizes E = Σi [ si · f ]² with the constraint that L = f · f − 1 = 0.

E = Σi [ si · f ]² = Σi [ Σj Sij fj ] [ Σk Sik fk ] = Σj Σk [ Σi Sij Sik ] fj fk
  = Σj Σk Mjk fj fk   with   Mjk = Σi Sij Sik ,  or  M = SᵀS   (a symmetric matrix)

L = Σi fi² − 1

Use Lagrange multipliers, minimizing Φ = E − λ²L, where λ² is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago. Its solution is the algebraic eigenvalue problem M f = λ² f. Recall that the eigenvalue is the corresponding value of E.
So the factors solve the algebraic eigenvalue problem:

[SᵀS] f = λ² f

[SᵀS] is a square matrix with the same number of rows and columns as there are components. So there are as many factors as there are components, and the factors span a space of the same dimension as the components.

If you sort the eigenvectors by the size of their eigenvalues, then the ones with the largest eigenvalues have the largest coefficients. So selecting the most important factors is easy.
An important tidbit from the theory of eigenvalues and eigenvectors that we’ll use later on:

[SᵀS] f = λ² f

Let Λ² be a diagonal matrix of the eigenvalues, λi², and let V be a matrix whose columns are the corresponding factors, f(i). Then

[SᵀS] = V Λ² Vᵀ

Note also that the factors are orthogonal: f(i) · f(j) = 0 if i ≠ j.
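As a quick numerical check, here is a Python/NumPy sketch (with a made-up random S; not part of the lecture) verifying the tidbit and the orthogonality of the factors:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((10, 3))    # hypothetical N=10 samples, M=3 components

M = S.T @ S                         # symmetric M x M matrix
lam2, V = np.linalg.eigh(M)         # eigenvalues lambda^2, factors in columns of V

# the eigendecomposition reproduces S^T S ...
print(np.allclose(V @ np.diag(lam2) @ V.T, M))   # True
# ... and the factors are mutually orthogonal
print(np.allclose(V.T @ V, np.eye(3)))           # True
```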
This is a mathematically pleasant property
But it may not always be the physically most-relevant choice
[Two ternary diagrams, vertices A, B, C, comparing choices of factors: a non-orthogonal pair f1, f2 lying inside the diagram and close to the mean of the data, versus an orthogonal pair in which one factor contains negative A.]
Upshot:

The eigenvectors of [SᵀS] f = λ² f with the p largest eigenvalues identify a p-dimensional subspace in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary diagram example, they must lie on the line connecting the two SVD factors.
Singular Value Decomposition (SVD)

Any N×M matrix S can be written as the product of three matrices

S = U Λ Vᵀ

where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values.
Now note that if

S = U Λ Vᵀ

then

SᵀS = [U Λ Vᵀ]ᵀ [U Λ Vᵀ] = V Λ UᵀU Λ Vᵀ = V Λ² Vᵀ

Compare with the tidbit mentioned earlier: SᵀS = V Λ² Vᵀ. The SVD V is the same V we were talking about earlier. The columns of V are the eigenvectors f, so F = Vᵀ. So we can use the SVD to calculate the factors, F.
But it’s even better than that! Write

S = U Λ Vᵀ = [U Λ] [Vᵀ] = C F

So the coefficients are C = U Λ and, as shown previously, the factors are F = Vᵀ. So we can use the SVD to calculate both the coefficients, C, and the factors, F.
MatLab code for computing C and F:

[U,LAMBDA,V] = svd(S);
C = U*LAMBDA;
F = V';
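For readers working outside MatLab, the same computation can be sketched in Python with NumPy; note that numpy.linalg.svd returns the singular values as a vector and Vᵀ directly. The matrix S here is hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((10, 3))    # hypothetical sample matrix, N=10, M=3

# U has orthonormal columns, s holds the singular values, F is V^T
U, s, F = np.linalg.svd(S, full_matrices=False)
C = U * s                           # coefficients: U * LAMBDA

print(np.allclose(C @ F, S))        # True: S = C F
```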
MatLab code for approximating S ≈ Sp using only the p most important factors:

p = (whatever);
Up = U(:,1:p);
LAMBDAp = LAMBDA(1:p,1:p);
Cp = Up*LAMBDAp;
Vp = V(:,1:p);
Fp = (Vp)';
Sp = Cp * Fp;
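A Python/NumPy sketch of the same truncation, using hypothetical data built from two factors plus a little noise, shows that keeping p = 2 factors recovers the data almost exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical data: a rank-2 signal plus a little noise
S = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 5))
S += 0.01 * rng.standard_normal(S.shape)

U, s, Vt = np.linalg.svd(S, full_matrices=False)

p = 2                               # keep the p most important factors
Cp = U[:, :p] * s[:p]               # selected coefficients, N x p
Fp = Vt[:p, :]                      # selected factors, p x M
Sp = Cp @ Fp                        # approximation to S

print(np.max(np.abs(S - Sp)))       # small: the rank-2 signal is captured
```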
Back to my example. Each pixel is a component of the image, and the patterns are factors.

Our derivation assumed that the data (samples, s(i)) were vectors. However, in this example, the data are images (matrices), so what I had to do was to write out the pixels of each image as a vector.

Steps:
1) load the images
2) reorganize the images into S
3) SVD of S to get U and V
4) examine the singular values to identify the number of significant factors
5) build S′, using only the significant factors
6) reorganize S′ back into images
MatLab code for reorganizing a sequence of images D(p,q,r) (p=1…Nx) (q=1…Nx) (r=1…Nt) into the sample matrix S(r,s) (r=1…Nt) (s=1…Nx²):

for r = [1:Nt]           % time r
    for p = [1:Nx]       % row p
        for q = [1:Nx]   % col q
            s = Nx*(p-1)+q;    % index s
            S(r,s) = D(p,q,r);
        end
    end
end
MatLab code for reorganizing the sample matrix S(r,s) (r=1…Nt) (s=1…Nx²) back into a sequence of images D(p,q,r) (p=1…Nx) (q=1…Nx) (r=1…Nt):

for r = [1:Nt]              % time r
    for s = [1:Nx*Nx]       % index s
        p = floor( (s-1)/Nx + 0.01 ) + 1;   % row p
        q = s - Nx*(p-1);                   % col q
        D(p,q,r) = S(r,s);
    end
end
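In Python/NumPy, both reorganizations collapse to transpose/reshape calls. This sketch assumes a hypothetical image stack D of shape (Nx, Nx, Nt) and uses the same unwrapping order s = Nx·(p−1)+q:

```python
import numpy as np

Nx, Nt = 4, 5
rng = np.random.default_rng(3)
D = rng.standard_normal((Nx, Nx, Nt))    # hypothetical image sequence

# images -> sample matrix: row r of S holds the pixels of image r,
# unwrapped row by row
S = D.transpose(2, 0, 1).reshape(Nt, Nx * Nx)

# sample matrix -> images
D2 = S.reshape(Nt, Nx, Nx).transpose(1, 2, 0)

print(np.array_equal(D, D2))             # True: the round trip is exact
```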
Reality of Factors: are factors intrinsically meaningful, or just a convenient way of representing data?

Example: suppose the samples are rocks and the components are element concentrations; then thinking of the factors as minerals might make intuitive sense.
Minerals: fixed element composition
Rock: mixture of minerals
Many rocks – but just a few minerals
[Ternary diagram: mineral (factor) 1, mineral (factor) 2, and mineral (factor) 3 at the vertices; rocks 1 through 7 plotted as mixtures of the minerals.]
Possibly Desirable Properties of Factors

Factors are unlike each other: different minerals typically contain different elements.
A factor contains either large or near-zero components: a mineral typically contains only a few elements.
Factors have only positive components: minerals are composed of positive amounts of chemical elements.
Coefficients of the factors are positive: rocks are composed of positive amounts of minerals.
Coefficients are typically either large or near-zero: rocks are composed of just a few major minerals.
Transformations of Factors

S = C F

Suppose we mix the factors together to get a new set of factors:

New Factors (M×M) = Transformation (M×M) × Old Factors (M×M):

(A in f′1) (B in f′1) (C in f′1)     (f1 in f′1) (f2 in f′1) (f3 in f′1)     (A in f1) (B in f1) (C in f1)
(A in f′2) (B in f′2) (C in f′2)  =  (f1 in f′2) (f2 in f′2) (f3 in f′2)  ×  (A in f2) (B in f2) (C in f2)
(A in f′3) (B in f′3) (C in f′3)     (f1 in f′3) (f2 in f′3) (f3 in f′3)     (A in f3) (B in f3) (C in f3)

Fnew = T Fold
Transformations of Factors: Fnew = T Fold

A requirement is that T⁻¹ exists, else Fnew will not span the same space as Fold.

S = C F = C I F = (C T⁻¹) (T F) = Cnew Fnew

So you could try to implement the desirable properties by designing an appropriate transformation matrix, T. A somewhat restrictive choice of T is T = R, where R is a rotation matrix (rotation matrices satisfy R⁻¹ = Rᵀ).
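A small numerical illustration (a Python/NumPy sketch with hypothetical C and F): inserting T⁻¹T between C and F leaves S unchanged, and for T = R the inverse is just Rᵀ:

```python
import numpy as np

rng = np.random.default_rng(4)
C = rng.standard_normal((6, 3))                  # hypothetical coefficients
F = rng.standard_normal((3, 3))                  # hypothetical factors
S = C @ F

theta = 0.3                                      # rotate factors 1 and 2 in their plane
R = np.array([[ np.cos(theta), np.sin(theta), 0],
              [-np.sin(theta), np.cos(theta), 0],
              [ 0,             0,             1]])

F_new = R @ F                                    # transformed factors, T F
C_new = C @ R.T                                  # C T^-1, using R^-1 = R^T

print(np.allclose(C_new @ F_new, S))             # True: S is unchanged
```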
A method for implementing one of these properties:
“A factor contains either large or near-zero components” is more-or-less equivalent to: lots of variance in the amounts of the components contained in the factor.
The usual formula for the variance of data x:

σd² = N⁻² [ N Σi xi² − (Σi xi)² ]

Applied to a factor f:

σf² = N⁻² [ N Σi fi⁴ − (Σi fi²)² ]

Note that we are measuring the variance of the squares of the elements of f. Thus a factor has a large σf² if the absolute value of its elements has a lot of variation. The sign of the elements is irrelevant.
Varimax Factors

A procedure for maximizing the variance of the factors while still preserving their orthogonality. It is based on rotating pairs of factors in their plane.

[Diagram: f1old and f2old rotated in their plane to f1new and f2new.]
Rotating a pair of factors (here f2 and f3) in their plane by an amount θ:

  f1                               f1
  cos(θ) f2 + sin(θ) f3     =  R   f2
 −sin(θ) f2 + cos(θ) f3            f3
  f4                               f4

with

      1     0        0      0
R =   0   cos(θ)   sin(θ)   0
      0  −sin(θ)   cos(θ)   0
      0     0        0      1

(This R is called a Givens rotation, by the way.)
Varimax Procedure

For a pair of factors fs and ft, find the θ that maximizes the sum of their variances:

E = σf′s² + σf′t² = N Σi (f′is)⁴ − (Σi (f′is)²)² + N Σi (f′it)⁴ − (Σi (f′it)²)²

where

f′is = cos(θ) fis + sin(θ) fit
f′it = −sin(θ) fis + cos(θ) fit

Just solve dE/dθ = 0.
After much algebra:

θ = ¼ tan⁻¹ [ ( 2N Σi ui vi − (Σi ui)(Σi vi) ) / ( N Σi (ui² − vi²) − ( (Σi ui)² − (Σi vi)² ) ) ]

where ui = (fis)² − (fit)² and vi = 2 fis fit.
Then just apply this rotation to every pair of factors.* The result is a new set of factors that are mutually orthogonal but that have maximal variance, hence the name Varimax.

*Actually, you need to run the whole procedure multiple times to get convergence, since subsequent rotations to some extent undo the work of previous rotations.
Example 1: fs = [½, ½, ½, ½]ᵀ and ft = [½, −½, ½, −½]ᵀ

θ = 45°: f′s = [1/√2, 0, 1/√2, 0]ᵀ and f′t = [0, −1/√2, 0, −1/√2]ᵀ

[Plot: sum of variances, σf′s² + σf′t², versus rotation angle θ (degrees). The starting point θ = 0° is the worst case: zero variance.]
Example 2: fs = [0.63, 0.31, 0.63, 0.31]ᵀ and ft = [0.31, −0.63, 0.31, −0.63]ᵀ

θ = 26.56°:

f′s = [0.71, 0.00, 0.71, 0.00]ᵀ and f′t = [0.00, −0.71, 0.00, −0.71]ᵀ

[Plot: sum of variances, σf′s² + σf′t², versus rotation angle θ (degrees).]
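Example 2 can be checked numerically. The following Python/NumPy sketch (not from the lecture; it simply sweeps θ rather than using the closed-form formula) searches for the angle that maximizes the summed variance of the squared factor elements:

```python
import numpy as np

# factors from Example 2
fs = np.array([0.63, 0.31, 0.63, 0.31])
ft = np.array([0.31, -0.63, 0.31, -0.63])
N = len(fs)

def var_of_squares(f):
    # variance of the squared elements, up to the constant 1/N^2 factor
    return N * np.sum(f**4) - np.sum(f**2)**2

# brute-force sweep of the rotation angle theta over [0, 90) degrees
thetas = np.deg2rad(np.arange(0.0, 90.0, 0.01))
E = [var_of_squares( np.cos(t)*fs + np.sin(t)*ft) +
     var_of_squares(-np.sin(t)*fs + np.cos(t)*ft) for t in thetas]
best = thetas[np.argmax(E)]

fs_new = np.cos(best)*fs + np.sin(best)*ft
print(np.rad2deg(best))        # near 26.56 degrees, as on the slide
print(np.round(fs_new, 2))     # near [0.71, 0, 0.71, 0]
```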