TRANSCRIPT
Laboratory in Oceanography:
Data and Methods
MAR599, Spring 2009
Anne-Marie E.G. Brunner-Suzuki
Empirical Orthogonal Functions
Motivation
- Miles's class
- Distinguish patterns from noise
- Reduce dimensionality
- Prediction
- Smoothing
The Goal
1. Separate the time and space dependence of the data.
2. Filter out the noise and reveal "hidden" structure.
Matlab Example 1
Data
t: time; x_t is one "map" at time t. There are n timesteps and p different measurements at each timestep.
Matlab Example 2 – artificial signal
Summary
- EOF analysis lets us separate an ensemble of data into k different modes.
- Each mode has a 'space' component (EOF = u) and a 'time' component (EC = c).
- Pre-treating the data (e.g. taking out the temporal/spatial mean) can be useful in finding "hidden" structures.
- But all the information is contained in the data; EOF is "just" a mathematical construct. We, the researchers, are responsible for finding appropriate explanations.
Naming convention
The same technique appears under several names:
- Empirical Orthogonal Function (EOF) analysis
- Principal Component Analysis (PCA)
- Discrete Karhunen–Loève transform
- Hotelling transform
- Proper orthogonal decomposition
How to deal with gaps?
- Ignore them; leave them be.
- Introduce randomly generated data to fill the gaps and test for M realizations.
- Fill the gaps in each data series using e.g. optimal interpolation.
Next time
Some math:
- What happens inside the black box?
- How do we know how many modes are significant?
- Some problems and pitfalls
- More advanced EOF
- Matlab's own functions
References: Preisendorfer; von Storch; Hannachi (full citations at the end).
Empirical Orthogonal Functions, Part II
Pre-treating the data X:
1. Shape the data set: X has n timesteps and p different measurements: [n,p] = size(X); use 'reshape' to convert from 3D to 2D: X = reshape(X3D, [nx*ny ntimes]);
2. Remove the mean from the data, so that each column (= timeseries) has zero mean: X = detrend(X,0);
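The same pre-treatment can be sketched in Python/NumPy (a translation of the Matlab calls above; the array sizes are illustrative, and note that NumPy's reshape is row-major while Matlab's is column-major):

```python
import numpy as np

# Hypothetical 3D field: ny x nx grid points, ntimes snapshots
ny, nx, ntimes = 4, 5, 10
X3D = np.random.rand(ny, nx, ntimes)

# 1. Shape the data into 2D: rows = timesteps, columns = grid points
#    (NumPy is row-major, so element ordering differs from Matlab's reshape)
X = X3D.reshape(ny * nx, ntimes).T

# 2. Remove the mean of each column, so every timeseries has zero mean
X = X - X.mean(axis=0)
```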
How to do it:
1. Form the covariance matrix Cx = X'X (up to a 1/(n-1) normalization).
2. Solve the eigenvalue problem Cx R = R Λ. Λ is a diagonal matrix containing the eigenvalues λ of Cx; the columns r_i of R are the eigenvectors of Cx, each corresponding to its λ_i. We pick the r_i to be our EOF patterns: R = EOFs.
3. Arrange the eigenvalues so that λ1 > λ2 > … > λp and reorder the r_i correspondingly.
Eigenvectors & Eigenvalues
Cx R = R Λ. Here, R is a set of vectors that are transformed by Cx into the same vectors, except for a multiplicative factor from Λ: each vector changes in length, but not in direction. The columns of R are called eigenvectors; the entries of Λ are called eigenvalues.
Also, because Cx is Hermitian (symmetric about the diagonal: Cx' = Cx) and has rank p, there will be p eigenvectors, and these eigenvectors are orthogonal to each other.
4. Together, all EOFs explain 100% of the variance; each mode explains part of the total variance.
5. All eigenvectors are orthogonal to each other; hence Empirical ORTHOGONAL Functions.
6. To see how the EOFs evolve in time, we compute the 'expansion coefficients' or amplitudes: EC_i = X EOF_i.
In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,0);
3. Compute the covariance: Cx = cov(X);
4. Compute eigenvectors and eigenvalues: [EOFs, Lambda] = eig(Cx);
5. Sort according to size; note that Matlab sorts in ascending order.
6. Compute the ECs: EC1 = X * EOFs(:,1);
7. Compute the variance explained: var_expl = diag(Lambda)/trace(Lambda);
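The same recipe in Python/NumPy (a sketch with synthetic, illustrative data; `eigh` returns eigenvalues in ascending order, mirroring Matlab's `eig`):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 6                        # n timesteps, p measurement points
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)               # demean each column

Cx = np.cov(X, rowvar=False)         # p x p covariance matrix
evals, EOFs = np.linalg.eigh(Cx)     # ascending order, like Matlab's eig

order = np.argsort(evals)[::-1]      # sort descending: mode 1 first
evals, EOFs = evals[order], EOFs[:, order]

EC = X @ EOFs                        # expansion coefficients
var_expl = evals / evals.sum()       # fraction of variance per mode
```

Because the EOFs form an orthonormal basis, `EC @ EOFs.T` reconstructs the original data exactly.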
Normalization
Often the EOFs are normalized, so that the highest value is 1 (or 100). Since the reconstruction X = EOF * EC has to remain valid, the ECs then need to be rescaled correspondingly, by the inverse factor.
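This rescaling can be checked numerically; a minimal NumPy sketch (SVD-based EOFs, illustrative data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
EOFs = Vt.T                      # columns are EOFs
EC = U * s                       # expansion coefficients

# Normalize each EOF so its largest absolute value is 1, and
# rescale the matching EC by the same factor so X = EC @ EOFs' still holds
scale = np.abs(EOFs).max(axis=0)
EOFs_n = EOFs / scale
EC_n = EC * scale
```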
How to understand this?
Let's assume we only have 2 samples x_a and y_a that evolve in time. If all observations were random, they would form an unstructured blob in this space; any regularities show up as preferred directions in the blob.
EOF analysis aims to find these directions by defining a new coordinate system whose axes lie right along them.
With p measurements, we have a p-dimensional space, and we want to find every cluster by laying a new coordinate system (basis) through the data.
The EOF method takes all the variability in a time-evolving field and breaks it into a few standing oscillations, each with a time series to go with it. The ECs show how the EOF modes vary in time.
A word about removing the mean
Removing the time mean has nothing to do with the process of finding eigenvectors, but it allows us to interpret Cx as a covariance matrix, and hence to understand our results. Strictly speaking, one can find EOFs without removing any mean.
EOF via SVD
SVD: Singular Value Decomposition. It decomposes any n x p matrix X into the form X = U S V', where:
- U is an n x n orthonormal matrix;
- S is a diagonal n x p matrix with elements s_ii on the diagonal; the s are called singular values;
- the columns of U and V contain the singular vectors of X.
Connecting SVD and EOF
X is the demeaned data matrix as before.
1. Cx = X'X = (U S V')' (U S V') = V S' U' U S V' = V S'S V'
2. Cx = EOFs Λ EOFs' (the eigenvalue problem rewritten)
Comparing 1. and 2.: EOFs = V (at least almost), and Λ = S'S: the squared singular values are the eigenvalues.
The columns of V contain the eigenvectors of Cx = X'X: our EOFs.
The columns of U contain the eigenvectors of X X', which are also the normalized time series (expansion coefficients).
How to do it
1. Use SVD to find U, S, and V such that X = U S V'.
2. Compute the eigenvalues of Cx from the squared singular values.
3. The eigenvectors of Cx are the column vectors of V.
We never have to actually compute Cx!
In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,0);
3. Perform the SVD: [U, S, V] = svd(X);
4. Compute the eigenvalues: EVal = diag(S).^2;
5. Compute the explained variance: expl_var = EVal/sum(EVal);
6. The EOFs are the column vectors of V: EOFs = V;
7. Compute the expansion coefficients: EC = U*S;
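The SVD recipe translates directly to Python/NumPy (a sketch with illustrative synthetic data; `svd` returns V transposed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 5
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)               # demean each column

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U S V'

EVal = s**2                          # squared singular values = eigenvalues of X'X
expl_var = EVal / EVal.sum()
EOFs = Vt.T                          # columns of V are the EOFs
EC = U * s                           # expansion coefficients, EC = U S
```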
The two techniques
There are basically two techniques:
1. Computing the eigenvectors and eigenvalues of the covariance matrix.
2. Singular value decomposition (SVD) of the data.
Both methods give similar results. Check it out!
However:
1. There are some differences in dimensionality.
2. SVD is much faster, especially when your data exceed 1000 x 1000 points.
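The agreement of the two techniques can be verified directly; the only subtlety is the 1/(n-1) normalization that the covariance carries but the raw squared singular values do not (synthetic data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 4
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)

# Technique 1: eigendecomposition of the covariance matrix
evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Technique 2: SVD of the data; the squared singular values carry
# an extra factor of (n - 1) relative to np.cov
s = np.linalg.svd(X, compute_uv=False)
evals_svd = s**2 / (n - 1)
```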
Testing Domain Dependency
If the first EOF is unimodal and the second bimodal, the EOF analysis might be domain dependent.
Testing:
- Split your domain into two sections (e.g. north and south).
- Repeat the EOF analysis for each sub-domain.
- Are the same results (unimodal and bimodal structures) obtained for each sub-domain? If yes, the EOF analysis is domain dependent, and interpretation becomes difficult or impossible.
A possible solution is "rotated EOFs" (REOF): after the EOF analysis, some of the eigenvectors are rotated.
Example from Hannachi
Winter (DJF) monthly SLP over the Northern Hemisphere (NH) from the NCEP/NCAR reanalyses, January 1948 to December 2000. The mean annual cycle was removed.
[Figure: EOF maps; positive contours solid, negative contours dashed. EOFs have been multiplied by 100.]
Selection Rules
Visual inspection.
North's Rule of Thumb
North et al. defined "typical errors" between two neighboring eigenvalues λ, δλ_k ≈ λ_k (2/n)^(1/2), and "typical errors" between neighboring eigenvectors ψ, δψ_k ≈ (δλ_k / Δλ) ψ_j, where Δλ is the spacing to the closest neighboring eigenvalue λ_j.
Here n is the number of degrees of freedom, which is generally less than the number of data points.
If two modes are too close together (their spacing is comparable to the typical error), they are called degenerate.
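A sketch of the rule, assuming the δλ ≈ λ√(2/n) form from North et al. (1982) and an illustrative eigenvalue spectrum:

```python
import numpy as np

evals = np.array([10.0, 9.5, 5.0, 1.0])   # illustrative eigenvalue spectrum
n_dof = 50                                 # effective degrees of freedom

# Typical sampling error of each eigenvalue (North et al., 1982)
err = evals * np.sqrt(2.0 / n_dof)

# Neighboring modes are degenerate if their spacing is smaller
# than the typical error of the larger eigenvalue
spacing = -np.diff(evals)
degenerate = spacing < err[:-1]
```

Here the first two modes (10.0 and 9.5) are closer than their sampling error, so they would be flagged as degenerate and should not be interpreted individually.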
Complex EOF
Allows the analysis of propagating signals. A set of time series is analyzed by creating a phase lag among them, adding a 90-degree phase shift; this is done in complex space using the Hilbert transform.
It is a cool technique, but pretty complex.
Monte Carlo
Create surrogate data: a randomized data set made by scrambling the monthly maps in the time domain, in order to break the chronological order.
Compute the EOFs of the scrambled dataset and compare them with the EOFs of the real data.
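One caveat: plain covariance EOFs are unchanged by reordering whole maps in time, so a common variant shuffles each series independently, which does destroy the spatial covariance. A NumPy sketch with synthetic data (the signal amplitude and surrogate count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 6
t = np.arange(n)
# Coherent oscillation shared by all p series, plus noise
X = np.outer(np.sin(2 * np.pi * t / 50), np.ones(p))
X = X + 0.5 * rng.standard_normal((n, p))

def leading_fraction(A):
    """Variance fraction explained by the first EOF mode."""
    A = A - A.mean(axis=0)
    s = np.linalg.svd(A, compute_uv=False)
    return (s**2 / (s**2).sum())[0]

real = leading_fraction(X)

# Surrogates: shuffle each series independently to destroy the
# spatial covariance (shuffling whole maps would leave the
# covariance matrix, and hence the EOFs, unchanged)
surrogates = []
for _ in range(20):
    Xs = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    surrogates.append(leading_fraction(Xs))
```

If the leading mode of the real data explains far more variance than any surrogate's, it is unlikely to be a noise artifact.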
Matlab's own functions
PRINCOMP: [COEFF, SCORE, latent] = princomp(X), i.e. [EOFs, EC, EigVal] = princomp(data); the EOFs are columns, and so are the ECs.
PCACOV: [COEFF, latent, explained] = pcacov(C) operates on a covariance matrix, i.e. [EOFs, EigVal, expl_var] = pcacov(Cx); I believe this uses SVD internally.
Assumptions we made:
- Orthogonality
- Normally distributed data
- High signal-to-noise ratio
- Standing patterns only
- "The mean"
Problems that might occur:
- No physical interpretation possible
- Degenerate modes
- Domain dependency
A warning from von Storch and Navarra:“I have learned the following rule to be useful
when dealing with advanced methods. Such methods are often needed to find a signal in a vast noisy space, i.e. the needle in the haystack. But after having the needle in our hand, we should be able to identify the needle by simply looking at it. Whenever you are unable to do so there is a good chance that something is rotten in the analysis.”
References
- R. W. Preisendorfer: Principal Component Analysis in Meteorology and Oceanography. Elsevier Science, 1988.
- Hans von Storch and Francis W. Zwiers: Statistical Analysis in Climate Research. Cambridge University Press, 2002.
- North, G. R., T. L. Bell, R. F. Cahalan, and F. J. Moeng: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699–706, 1982.
- Hannachi, A., I. T. Jolliffe, and D. B. Stephenson: Empirical orthogonal functions and related techniques in atmospheric science: A review. International Journal of Climatology, 27, 1119–1152, 2007.