Time Domain to Frequency Domain
Autocorrelation and Crosscorrelation
Discussion of basic concept around the following figure from Davis’s text
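Recall the basic definition of the correlation coefficient:

$$ r_{xy} = \frac{\mathrm{cov}_{xy}}{s_x s_y} $$

and also recall the basic definitions of the covariance and standard deviation:

$$ \mathrm{cov}_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$

$$ s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$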
Combine these terms, assume zero mean, and consider how r simplifies. You should get

$$ r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$

Explicit reference to the summation index i = 1 through n has been left out for simplicity.
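The intermediate step can be sketched as follows, assuming the same n - 1 normalization is used for the covariance and for both standard deviations (any common normalization cancels in the same way). With zero mean, the deviations reduce to x and y themselves:

$$ r_{xy} = \frac{\frac{1}{n-1}\sum xy}{\sqrt{\frac{1}{n-1}\sum x^2}\,\sqrt{\frac{1}{n-1}\sum y^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$

Setting y = x gives $\sum x^2 / \sqrt{\sum x^2 \sum x^2} = 1$, so a zero-mean series correlated with itself has r = 1.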
Consider the following sequence of numbers. Note that the set of numbers has 0 average.
Verify that r, the correlation of the series with itself, equals 1.
1 12 41 , 1,
Computational steps of the autocorrelation function are illustrated graphically below.
Autocorrelation involves the repeated computation of the correlation coefficient r between a series and a shifted version of the series. The shift is referred to as the lag. The computation of the autocorrelation for our simple function with lag = 1 is shown below.
The lag 2 value of the autocorrelation is computed in the same way, but after shifting an image of the input series two sample values relative to the original sequence.
The resultant autocorrelation function consists of 3 terms.
To convert these numbers into correlation coefficients, we need only normalize each term in the series by 3.5.
We’ll consider the mathematical representation of the autocorrelation function, leading to

$$ a_\tau = \sum_{t=0}^{n-1} f_t \, f_{t+\tau} $$

in its discrete form, and

$$ a(\tau) = \int f(t) \, f(t+\tau) \, dt $$

in its continuous form.
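The discrete form can be sketched in a few lines of Python. This is a minimal illustration, not the slide's own computation: the three-term series (1.5, -1, -0.5) is an assumption, chosen because it has zero mean and a zero-lag sum of 3.5, which matches the normalization quoted above. Samples shifted past the end of the series are treated as zero, which is also an assumption.

```python
import numpy as np

def autocorrelation(f):
    """Unnormalized autocorrelation a[lag] = sum over t of f[t] * f[t + lag].

    Samples beyond the end of the series are treated as zero, so each
    lag uses only the overlapping part of the series and its shifted copy.
    """
    f = np.asarray(f, dtype=float)
    n = f.size
    return np.array([np.sum(f[:n - lag] * f[lag:]) for lag in range(n)])

# Hypothetical zero-mean series (assumed for illustration).
series = [1.5, -1.0, -0.5]

a = autocorrelation(series)   # lags 0, 1, 2 -> 3.5, -1.0, -0.75
r = a / a[0]                  # normalize by the zero-lag term so r[0] = 1

print(a)
print(r)
```

Dividing by the zero-lag term is what turns the raw sums into correlation coefficients: the lag 0 value becomes exactly 1, and every other lag is expressed as a fraction of it.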
Autocorrelation
Let’s take another look at this diagram from Davis and see if we understand it a little better.
Crosscorrelation
Take the following two series of numbers, assume they are paired observations and compute the correlation coefficient between them.
Given series 1: 2, -1, -1, and
series 2: 1.5, -1, -0.5
Note that both series have 0 mean value.
$$ r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{3 + 1 + 0.5}{\sqrt{(4 + 1 + 1)(2.25 + 1 + 0.25)}} = \frac{4.5}{4.58} \approx 0.982 $$
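The arithmetic can be checked with a short Python sketch. The function name `zero_mean_r` is made up for this illustration; it implements the simplified zero-mean formula and therefore assumes both inputs already have zero mean, as series 1 and series 2 do.

```python
import math

def zero_mean_r(x, y):
    """Correlation coefficient for two zero-mean series of equal length."""
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

series1 = [2.0, -1.0, -1.0]
series2 = [1.5, -1.0, -0.5]

r = zero_mean_r(series1, series2)
print(round(r, 3))                    # 0.982
print(zero_mean_r(series1, series1))  # a series correlated with itself gives 1.0
```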
Ziegler et al, 1997
A “noise free” data set and its autocorrelation. This simulated data set is composed of two periodic components.
The presence of the two components is easily seen in either the raw data or its autocorrelation.
In the presence of other influences (measurement error or a process influenced by many variables but controlled by only a few as in our multivariate analysis) our data may not be so easily interpretable. The autocorrelation helps clean it up and reveal the presence of dominant cyclical components.
The amplitudes of the different frequency components are represented in the upper plot.
The relative phase shifts imposed on the set of cosine waves are defined by the second plot from the top.
We noted that time and spatial views of our data can actually be constructed from a sum of cosine and/or sine waves (in time or space).
The data you are looking at can range from simple to complex, but it can usually be broken down into a series of individual spectral components.
Even when our data have abrupt changes in value, it is still possible to replicate these details using a sum of sines and cosines.
A data set depicting the amplitude and frequency of the different sines and cosines used to create the temporal or spatial features in your data is referred to as the amplitude spectrum.
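As a minimal sketch of this idea, the Python below builds a signal from two cosines and then recovers their amplitudes and phases with NumPy's FFT. The amplitudes, frequencies, phases, and record length are all hypothetical values chosen for illustration; the record length of 240 samples is picked so that both frequencies fall exactly on FFT bins.

```python
import numpy as np

n = 240                      # number of samples (assumed)
t = np.arange(n)             # sample index, unit sampling interval

# Hypothetical components: (amplitude, frequency in cycles/sample, phase in radians)
components = [(2.0, 1.0 / 60.0, 0.0),
              (1.0, 1.0 / 20.0, np.pi / 3.0)]

# Time-domain view: the sum of the cosine waves.
x = np.zeros(n)
for amp, freq, phase in components:
    x += amp * np.cos(2.0 * np.pi * freq * t + phase)

# Frequency-domain view: amplitude and phase at each frequency.
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, d=1.0)
amplitude = 2.0 * np.abs(spectrum) / n   # 2/n scaling recovers cosine amplitudes
phase_est = np.angle(spectrum)

for amp, freq, phase in components:
    k = int(round(freq * n))             # FFT bin holding this component
    print(freqs[k], amplitude[k], phase_est[k])
```

When a frequency does not fall exactly on a bin, its energy leaks into neighboring bins and the peak amplitude reads low, which is one reason real spectra look less tidy than this example.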
[Figure: Data set DS4.dat. Observed value versus profile distance or time of measurement.]
[Figure: Autocorrelation of DS4.dat. Correlation coefficient versus lag (time or distance).]
Given more complicated data sets like the ones we were analyzing before, the autocorrelation and cross correlation give us some idea of the frequency or wavelength of embedded cyclical components. We would guess that the amplitude spectrum should reveal certain prominent frequencies.
[Figure: Amplitude Spectrum of DS4.dat. Amplitude versus frequency.]
The first peak on the left has f ~ 0.016, which corresponds to a period of about 60 samples. The second major peak has f = 0.05 and corresponds to a period of 20 samples.
We also examined oxygen isotope data from the Caribbean and Mediterranean using autocorrelation and cross correlation methods and found indications of pronounced cyclical variation through time.
[Figure: Amplitude spectrum of the del O data from the Caribbean. Amplitude versus frequency (cycles per million years). Peak labels: 18,700; 26,200; 40,000?; 110,000; 125,000; ?]
The autocorrelation and amplitude spectrum of the Caribbean Sea O18 variations.
Three components representing an ideal model of the “Milankovitch” cycles.
The real world is not that simple.
The superposition of all influences over a 500,000 year period of time.
100,000 years
41,000 years
21,000 years
Variations in orbital parameters computed over 5 million and 1 million year time frames.
Summation of these responses over the past 800,000 years yields a complicated function that might be viewed as controlling earth climate.
The composite response calculated over the past 5 million years and its amplitude spectrum.
The astronomical components show up as separate peaks in the amplitude spectrum, and the outcome is a little more complicated than the simple 3 component forcing model.
Anyone recall what the Nyquist frequency is?
Recall, this frequency is related to the sampling interval.
What is the maximum frequency you can see when sampling at a given sample interval $\Delta t$? $f_{Ny} = 1/(2\Delta t)$
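A small Python sketch can make the Nyquist limit concrete. The sampling interval and the test frequency of 0.7 cycles per sample are hypothetical values. The sketch shows that a cosine above the Nyquist frequency produces exactly the same samples as a lower-frequency alias, so the two cannot be told apart after sampling.

```python
import numpy as np

def nyquist_frequency(dt):
    """Highest frequency resolvable with sampling interval dt."""
    return 1.0 / (2.0 * dt)

dt = 1.0                          # sampling interval (assumed)
f_ny = nyquist_frequency(dt)      # 0.5 cycles per sample

samples = np.arange(50) * dt      # sample times

f_high = 0.7                      # above the Nyquist frequency (assumed)
f_alias = 1.0 / dt - f_high       # 0.3, the frequency it masquerades as

high = np.cos(2.0 * np.pi * f_high * samples)
alias = np.cos(2.0 * np.pi * f_alias * samples)

print(f_ny)
print(np.allclose(high, alias))   # True: the sampled values are identical
```

For data sampled every 1000 years, the same formula gives a Nyquist frequency of 0.5 cycles per 1000 years, so the shortest period that can be seen is 2000 years.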
[Figure: Simulated Climate Data. Relative response versus time in multiples of 1000 years.]
In today’s lab exercise you’ll simulate noisy climate data containing “hidden” Milankovitch cycles and then compute its amplitude spectrum.
[Figure: Spectrum of Simulated Climate Data. Relative amplitude versus frequency (cycles/1000 years).]
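One way such a simulation might look in Python is sketched below. It is not the lab's own procedure: the unit amplitudes, the noise level, the random seed, and the helper names `amplitude_spectrum` and `top_peaks` are all assumptions made for illustration. The three periods (100, 41, and 21 thousand years) and the 600-sample record at 1000-year spacing follow the values shown earlier.

```python
import numpy as np

def amplitude_spectrum(x, dt):
    """Return frequencies and amplitudes of a real series sampled every dt."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the mean (zero-frequency term)
    n = x.size
    freqs = np.fft.rfftfreq(n, d=dt)
    amps = 2.0 * np.abs(np.fft.rfft(x)) / n   # 2/n scaling for cosine amplitudes
    return freqs, amps

def top_peaks(freqs, amps, count):
    """Frequencies of the `count` strongest local maxima, in ascending order."""
    interior = np.arange(1, amps.size - 1)
    is_peak = (amps[interior] > amps[interior - 1]) & (amps[interior] > amps[interior + 1])
    peak_idx = interior[is_peak]
    strongest = peak_idx[np.argsort(amps[peak_idx])[::-1][:count]]
    return np.sort(freqs[strongest])

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
dt = 1.0                              # one sample per 1000 years
t = np.arange(0.0, 600.0, dt)         # time in multiples of 1000 years

periods = [100.0, 41.0, 21.0]         # cycle periods in thousands of years
signal = sum(np.cos(2.0 * np.pi * t / p) for p in periods)   # unit amplitudes (assumed)
noisy = signal + rng.normal(0.0, 1.0, t.size)                # noise level (assumed)

freqs, amps = amplitude_spectrum(noisy, dt)
peak_freqs = top_peaks(freqs, amps, 3)
peak_periods = 1.0 / peak_freqs       # back to periods, longest first

print(peak_periods)
```

The recovered periods only approximate 41 and 21 because those frequencies fall between FFT bins for a 600-sample record; the 100,000-year cycle lands exactly on a bin.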