TRANSCRIPT
Laboratory in Oceanography:
Data and Methods
MAR599, Spring 2009
Anne-Marie E.G. Brunner-Suzuki
Empirical Orthogonal Functions
Motivation
- Miles's class
- Distinguish patterns from noise
- Reduce dimensionality
- Prediction
- Smoothing
The Goal
1. Separate the time and space dependence of the data.
2. Filter out the noise and reveal "hidden" structure.
Matlab Example 1
Data
t: time; x_t is one "map" at time t. There are n timesteps and p different measurements at each timestep.
Matlab Example 2 – artificial signal
Summary
- EOF analysis lets us separate an ensemble of data into k different modes.
- Each mode has a 'space' component (EOF = u) and a 'time' component (EC = c).
- Pre-treating the data (e.g. taking out the temporal/spatial mean) can be useful in finding "hidden" structures.
- But all the information is contained in the data; EOF is "just" a mathematical construct. We, the researchers, are responsible for finding appropriate explanations.
Naming convention
The same technique appears under several names:
- Empirical Orthogonal Function (EOF) analysis
- Principal Component Analysis (PCA)
- Discrete Karhunen–Loève transform
- Hotelling transform
- Proper orthogonal decomposition
How to deal with gaps?
- Ignore them; leave them be.
- Introduce randomly generated data to fill the gaps and test for M realizations.
- Fill the gaps in each data series using e.g. optimal interpolation.
Next time
Some math:
- What happens inside the black box?
- How do we know how many modes are significant?
- Some problems and pitfalls
- More advanced EOF
- Matlab's own functions
References: Preisendorfer; von Storch; Hannachi (full citations at the end).
Empirical Orthogonal Functions, Part II
Pre-treating the data X:
1. Shape the data set: X has n timesteps and p different measurements: [n,p] = size(X); use 'reshape' to convert from 3D to 2D: X = reshape(X3D, [nx*ny ntimes]);
2. Remove the mean from the data, so that each column (= timeseries) has zero mean: X = detrend(X,0);
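The same pre-treatment can be sketched in Python/NumPy (a translation of the Matlab calls above; the array sizes are illustrative, and note that NumPy's reshape is row-major while Matlab's is column-major):

```python
import numpy as np

# Hypothetical 3D field: ny x nx grid points, ntimes snapshots
ny, nx, ntimes = 4, 5, 10
X3D = np.random.rand(ny, nx, ntimes)

# 1. Shape the data into 2D: rows = timesteps, columns = grid points
#    (NumPy is row-major, so element ordering differs from Matlab's reshape)
X = X3D.reshape(ny * nx, ntimes).T

# 2. Remove the mean of each column, so every timeseries has zero mean
X = X - X.mean(axis=0)
```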
How to do it:
1. Form the covariance matrix Cx = X'X (up to a 1/(n-1) normalization).
2. Solve the eigenvalue problem Cx R = R Λ. Λ is a diagonal matrix containing the eigenvalues λ of Cx; the columns r_i of R are the eigenvectors of Cx, each corresponding to its λ_i. We pick the r_i to be our EOF patterns: R = EOFs.
3. Arrange the eigenvalues so that λ1 > λ2 > … > λp and reorder the r_i correspondingly.
Eigenvectors & Eigenvalues
Cx R = R Λ. Here, R is a set of vectors that are transformed by Cx into the same vectors, except for a multiplicative factor from Λ: each vector changes in length, but not in direction. The columns of R are called eigenvectors; the entries of Λ are called eigenvalues.
Also, because Cx is Hermitian (symmetric about the diagonal: Cx' = Cx) and has rank p, there will be p eigenvectors, and these eigenvectors are orthogonal to each other.
4. Together, all EOFs explain 100% of the variance; each mode explains part of the total variance.
5. All eigenvectors are orthogonal to each other; hence Empirical ORTHOGONAL Functions.
6. To see how the EOFs evolve in time, we compute the 'expansion coefficients' or amplitudes: EC_i = X EOF_i.
In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,0);
3. Compute the covariance: Cx = cov(X);
4. Compute eigenvectors and eigenvalues: [EOFs, Lambda] = eig(Cx);
5. Sort according to size; note that Matlab sorts in ascending order.
6. Compute the ECs: EC1 = X * EOFs(:,1);
7. Compute the variance explained: var_expl = diag(Lambda)/trace(Lambda);
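The same recipe in Python/NumPy (a sketch with synthetic, illustrative data; `eigh` returns eigenvalues in ascending order, mirroring Matlab's `eig`):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 6                        # n timesteps, p measurement points
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)               # demean each column

Cx = np.cov(X, rowvar=False)         # p x p covariance matrix
evals, EOFs = np.linalg.eigh(Cx)     # ascending order, like Matlab's eig

order = np.argsort(evals)[::-1]      # sort descending: mode 1 first
evals, EOFs = evals[order], EOFs[:, order]

EC = X @ EOFs                        # expansion coefficients
var_expl = evals / evals.sum()       # fraction of variance per mode
```

Because the EOFs form an orthonormal basis, `EC @ EOFs.T` reconstructs the original data exactly.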
Normalization
Often the EOFs are normalized, so that the highest value is 1 (or 100). Since the reconstruction X = EOF * EC has to remain valid, the ECs then need to be rescaled correspondingly, by the inverse factor.
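This rescaling can be checked numerically; a minimal NumPy sketch (SVD-based EOFs, illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
EOFs = Vt.T                      # columns are EOFs
EC = U * s                       # expansion coefficients

# Normalize each EOF so its largest absolute value is 1, and
# rescale the matching EC by the same factor so X = EC @ EOFs' still holds
scale = np.abs(EOFs).max(axis=0)
EOFs_n = EOFs / scale
EC_n = EC * scale
```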
How to understand this?
Let's assume we only have 2 samples x_a and y_a that evolve in time. If all observations were random, they would form an unstructured blob in this space; any regularities show up as preferred directions in the blob.
EOF analysis aims to find these directions by defining a new coordinate system whose axes lie right along them.
With p measurements, we have a p-dimensional space, and we want to find every cluster by laying a new coordinate system (basis) through the data.
The EOF method takes all the variability in a time-evolving field and breaks it into a few standing oscillations, each with a time series to go with it. The ECs show how the EOF modes vary in time.
A word about removing the mean
Removing the time mean has nothing to do with the process of finding eigenvectors, but it allows us to interpret Cx as a covariance matrix, and hence to understand our results. Strictly speaking, one can find EOFs without removing any mean.
EOF via SVD
SVD: Singular Value Decomposition. It decomposes any n x p matrix X into the form X = U S V', where:
- U is an n x n orthonormal matrix;
- S is a diagonal n x p matrix with elements s_ii on the diagonal; the s are called singular values;
- the columns of U and V contain the singular vectors of X.
Connecting SVD and EOF
X is the demeaned data matrix as before.
1. Cx = X'X = (U S V')' (U S V') = V S' U' U S V' = V S'S V'
2. Cx = EOFs Λ EOFs' (the eigenvalue problem rewritten)
Comparing 1. and 2.: EOFs = V (at least almost), and Λ = S'S: the squared singular values are the eigenvalues.
The columns of V contain the eigenvectors of Cx = X'X: our EOFs.
The columns of U contain the eigenvectors of X X', which are also the normalized time series (expansion coefficients).
How to do it
1. Use SVD to find U, S, and V such that X = U S V'.
2. Compute the eigenvalues of Cx from the squared singular values.
3. The eigenvectors of Cx are the column vectors of V.
We never have to actually compute Cx!
In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,0);
3. Perform the SVD: [U, S, V] = svd(X);
4. Compute the eigenvalues: EVal = diag(S).^2;
5. Compute the explained variance: expl_var = EVal/sum(EVal);
6. The EOFs are the column vectors of V: EOFs = V;
7. Compute the expansion coefficients: EC = U*S;
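The SVD recipe translates directly to Python/NumPy (a sketch with illustrative synthetic data; `svd` returns V transposed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 5
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)               # demean each column

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U S V'

EVal = s**2                          # squared singular values = eigenvalues of X'X
expl_var = EVal / EVal.sum()
EOFs = Vt.T                          # columns of V are the EOFs
EC = U * s                           # expansion coefficients, EC = U S
```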
The two techniques
There are basically two techniques:
1. Computing the eigenvectors and eigenvalues of the covariance matrix.
2. Singular value decomposition (SVD) of the data.
Both methods give similar results. Check it out!
However:
1. There are some differences in dimensionality.
2. SVD is much faster, especially when your data exceed 1000 x 1000 points.
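The agreement of the two techniques can be verified directly; the only subtlety is the 1/(n-1) normalization that the covariance carries but the raw squared singular values do not (synthetic data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

# Technique 1: eigendecomposition of the covariance matrix
evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Technique 2: SVD of the data; the squared singular values carry
# an extra factor of (n - 1) relative to np.cov
s = np.linalg.svd(X, compute_uv=False)
evals_svd = s**2 / (n - 1)
```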
Testing Domain Dependency
If the first EOF is unimodal and the second bimodal, the EOF analysis might be domain dependent.
Testing:
- Split your domain into two sections (e.g. north and south).
- Repeat the EOF analysis for each sub-domain.
- Are the same results (unimodal and bimodal structures) obtained for each sub-domain? If yes, the EOF analysis is domain dependent, and interpretation becomes difficult or impossible.
A possible solution is "rotated EOFs" (REOF): after the EOF analysis, some of the eigenvectors are rotated.
Example from Hannachi
Winter (DJF) monthly SLP over the Northern Hemisphere (NH) from the NCEP/NCAR reanalyses, January 1948 to December 2000. The mean annual cycle was removed.
[Figure: EOF maps; positive contours solid, negative contours dashed. EOFs have been multiplied by 100.]
Selection Rules
Visual inspection.
North's Rule of Thumb
North et al. defined "typical errors" between two neighboring eigenvalues λ, δλ_k ≈ λ_k (2/n)^(1/2), and "typical errors" between neighboring eigenvectors ψ, δψ_k ≈ (δλ_k / Δλ) ψ_j, where Δλ is the spacing to the closest neighboring eigenvalue λ_j.
Here n is the number of degrees of freedom, which is generally less than the number of data points.
If two modes are too close together (their spacing is comparable to the typical error), they are called degenerate.
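A sketch of the rule, assuming the δλ ≈ λ√(2/n) form from North et al. (1982) and an illustrative eigenvalue spectrum:

```python
import numpy as np

evals = np.array([10.0, 9.5, 5.0, 1.0])   # illustrative eigenvalue spectrum
n_dof = 50                                 # effective degrees of freedom

# Typical sampling error of each eigenvalue (North et al., 1982)
err = evals * np.sqrt(2.0 / n_dof)

# Neighboring modes are degenerate if their spacing is smaller
# than the typical error of the larger eigenvalue
spacing = -np.diff(evals)
degenerate = spacing < err[:-1]
```

Here the first two modes (10.0 and 9.5) are closer than their sampling error, so they would be flagged as degenerate and should not be interpreted individually.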
Complex EOF
Allows the analysis of propagating signals. A set of time series is analyzed by creating a phase lag among them, adding a 90-degree phase shift; this is done in complex space using the Hilbert transform.
It is a cool technique, but pretty complex.
Monte Carlo
Create surrogate data: a randomized data set made by scrambling the monthly maps in the time domain, in order to break the chronological order.
Compute the EOFs of the scrambled dataset and compare them with the EOFs of the real data.
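One caveat: plain covariance EOFs are unchanged by reordering whole maps in time, so a common variant shuffles each series independently, which does destroy the spatial covariance. A NumPy sketch with synthetic data (the signal amplitude and surrogate count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 6
t = np.arange(n)
# Coherent oscillation shared by all p series, plus noise
X = np.outer(np.sin(2 * np.pi * t / 50), np.ones(p))
X = X + 0.5 * rng.standard_normal((n, p))

def leading_fraction(A):
    """Variance fraction explained by the first EOF mode."""
    A = A - A.mean(axis=0)
    s = np.linalg.svd(A, compute_uv=False)
    return (s**2 / (s**2).sum())[0]

real = leading_fraction(X)

# Surrogates: shuffle each series independently to destroy the
# spatial covariance (shuffling whole maps would leave the
# covariance matrix, and hence the EOFs, unchanged)
surrogates = []
for _ in range(20):
    Xs = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    surrogates.append(leading_fraction(Xs))
```

If the leading mode of the real data explains far more variance than any surrogate's, it is unlikely to be a noise artifact.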
Matlab's own functions
PRINCOMP: [COEFF, SCORE, latent] = princomp(X), i.e. [EOFs, EC, EigVal] = princomp(data); the EOFs are columns, and so are the ECs.
PCACOV: [COEFF, latent, explained] = pcacov(C) operates on a covariance matrix, i.e. [EOFs, EigVal, expl_var] = pcacov(Cx); I believe this uses SVD internally.
Assumptions we made:
- Orthogonality
- Normally distributed data
- High signal-to-noise ratio
- Standing patterns only
- "The mean"
Problems that might occur:
- No physical interpretation possible
- Degenerate modes
- Domain dependency
A warning from von Storch and Navarra:“I have learned the following rule to be useful
when dealing with advanced methods. Such methods are often needed to find a signal in a vast noisy space, i.e. the needle in the haystack. But after having the needle in our hand, we should be able to identify the needle by simply looking at it. Whenever you are unable to do so there is a good chance that something is rotten in the analysis.”
References
- R. W. Preisendorfer: Principal Component Analysis in Meteorology and Oceanography. Elsevier Science, 1988.
- Hans von Storch and Francis W. Zwiers: Statistical Analysis in Climate Research. Cambridge University Press, 2002.
- North, G. R., T. L. Bell, R. F. Cahalan, and F. J. Moeng: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699–706, 1982.
- Hannachi, A., I. T. Jolliffe, and D. B. Stephenson: Empirical orthogonal functions and related techniques in atmospheric science: A review. International Journal of Climatology, 27, 1119–1152, 2007.