

Computers & Geosciences 91 (2016) 11–18


journal homepage: www.elsevier.com/locate/cageo

Research paper

REDFIT-X: Cross-spectral analysis of unevenly spaced paleoclimate time series

Kristín Björg Ólafsdóttir a,b,*, Michael Schulz b, Manfred Mudelsee a

a Climate Risk Analysis, Heckenbeck, 37581 Bad Gandersheim, Germany
b MARUM - Center for Marine Environmental Sciences, University of Bremen, 28334 Bremen, Germany

Article info

Article history:
Received 13 July 2015
Received in revised form 23 December 2015
Accepted 1 March 2016
Available online 4 March 2016

Keywords:
Cross-spectral analysis
Coherency
Phase spectrum
Irregular sampling
Monte Carlo simulations
Paleoclimate time series

http://dx.doi.org/10.1016/j.cageo.2016.03.001
0098-3004/© 2016 Elsevier Ltd. All rights reserved.

* Corresponding author at: Climate Risk Analysis, Heckenbeck, 37581 Bad Gandersheim, Germany.
E-mail address: [email protected] (K. Björg Ólafsdóttir).

Abstract

Cross-spectral analysis is commonly used in climate research to identify joint variability between two variables and to assess the phase (lead/lag) between them. Here we present a Fortran 90 program (REDFIT-X) that is specially developed to perform cross-spectral analysis of unevenly spaced paleoclimate time series. The data properties of climate time series that need to be taken into account include the data spacing (unequal time scales and/or uneven spacing between time points) and the persistence in the data. The Lomb–Scargle Fourier transform is used for the cross-spectral analysis between two time series with unequal and/or uneven time scales, and the persistence in the data is taken into account when estimating the uncertainty associated with the cross-spectral estimates. We use a Monte Carlo approach to estimate the uncertainty associated with coherency and phase. The false-alarm level is estimated from the empirical distribution of coherency estimates, and confidence intervals for the phase angle are formed from the empirical distribution of the phase estimates. The method is validated by comparing the Monte Carlo uncertainty estimates with the traditionally used measures. Examples are given where the method is applied to paleoceanographic time series.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Cross-spectral analysis is often used to estimate the relationship between two time series as a function of frequency. Of particular importance are the coherency and phase spectra. Coherency is a dimensionless measure of how well two time series co-vary at different frequencies, while the phase spectrum shows whether the variations happen synchronously at each frequency or whether there is a phase difference between them. Cross-spectral analysis is used in climate research to identify joint variability between two variables and to assess the phase (lead/lag) between them.

Paleoclimate proxy time series come from various archives such as marine sediments, ice caps, lake sediments, speleothems, tree rings or corals. Usually the sampling is carried out at constant length intervals and then transferred into the time domain, either by direct dating or by aligning with other dated time series. Most cross-spectral analysis methods require that the two time series are sampled at identical times and have constant spacing between time points (evenly spaced). This is rarely the case with paleoclimate time series, as the archives normally do not accumulate at a constant rate, which in many cases makes some kind of interpolation necessary prior to the analysis. Unfortunately, interpolation can bias the spectral results substantially, as spectral power may be shifted from higher to lower frequencies, that is, the spectrum becomes redder (Schulz and Stattegger, 1997). Interpolation can be avoided by estimating the spectrum directly from the unevenly spaced time series with the Lomb–Scargle Fourier transform (Lomb, 1976; Scargle, 1982), as done for example in the computer programs by Schulz and Stattegger (1997), Schulz and Mudelsee (2002) and Pardo-Igúzquiza and Rodríguez-Tovar (2012).

Generally, climate time series include persistence (serial correlation) or memory, as there is natural inertia in the climate system. Due to the persistence, the spectra of climate time series are characterized by greater amplitude values at lower frequencies (red noise). To distinguish the signals (spectral peaks) in the spectrum of a climate time series from background variability, they need to be tested against red noise. A first-order autoregressive or AR(1) process can be used to model the climate noise (Hasselmann, 1976). The model is normally fitted to the observed time series and the estimated AR(1) parameter is used to form the red noise spectrum (Allen and Smith, 1996). In REDFIT (Schulz and Mudelsee, 2002), the AR(1) parameter is estimated directly from the unevenly spaced time series, so there is no need to interpolate the time series, which can bias the estimated value. The estimated AR(1) parameter is used to form a theoretical AR(1) spectrum and a false-alarm level for testing the significance of spectral peaks via the $\chi^2$ distribution. In addition, Monte Carlo simulations can be used, where a large number of red noise processes are generated with the same estimated AR(1) parameter, to form a false-alarm level from percentiles of the Monte Carlo ensemble (Schulz and Mudelsee, 2002).

The REDFIT program can only be used for univariate spectral analysis, that is, for the autospectrum. Given the need for cross-spectral analysis of unevenly spaced data where the significance is evaluated with Monte Carlo simulations, we present the computer program REDFIT-X, in which cross-spectral analysis has been implemented. Until now, the computer program SPECTRUM (Schulz and Stattegger, 1997) has been available for cross-spectral analysis of unevenly spaced climate time series. However, the significance measures in SPECTRUM do not allow for the persistence included in paleoclimate time series. Therefore it is necessary to combine the two approaches in order to perform both auto- and cross-spectral analysis with reliable uncertainty estimates.

2. Method

2.1. Cross-spectral analysis – background

2.1.1. Coherency and phase spectrum

The most important features of the cross-spectrum are the coherency spectrum and the phase between the two signals. Coherency is defined as:

$$
c_{xy}^{2}(f) = \frac{|G_{xy}(f)|^{2}}{G_{xx}(f)\,G_{yy}(f)},
\tag{1}
$$

where $G_{xx}(f)$ and $G_{yy}(f)$ are the autospectra of the signals $x(t)$ and $y(t)$ (with $t$ being time), respectively, and $G_{xy}(f)$ is the cross-spectrum between them (e.g., Bendat and Piersol, 2010). Coherency is a dimensionless measure of the degree of linear relationship between two time series as a function of frequency ($f$). It ranges from 0 (no relationship) to 1 (perfect relationship) and can be thought of as a frequency-dependent squared correlation coefficient (von Storch and Zwiers, 2003).

Coherency is estimated as:

$$
\hat{c}_{xy}^{2}(f_k) = \frac{|\hat{G}_{xy}(f_k)|^{2}}{\hat{G}_{xx}(f_k)\,\hat{G}_{yy}(f_k)},
\tag{2}
$$

where $\hat{G}_{xx}(f_k)$ and $\hat{G}_{yy}(f_k)$ are the estimated autospectra and $\hat{G}_{xy}(f_k)$ is the estimated cross-spectrum of two weakly stationary time series $\{t_x(i), x(i)\}_{i=1}^{n_x}$ and $\{t_y(i), y(i)\}_{i=1}^{n_y}$ ($n_x$ and $n_y$ are the numbers of data points in each series) (e.g., Bendat and Piersol, 2010). The frequency $f_k$ is in the range from the fundamental frequency $\bar{f} = 1/(n\bar{d})$ to the average Nyquist frequency $f_{\mathrm{Nyq}} = 1/(2\bar{d})$, where $n$ is the number of data points and $\bar{d} = [t(n) - t(1)]/(n - 1)$ is the average spacing of the time series (Mudelsee, 2010). When the two time series do not have the same sampling points, the average spacing ($\bar{d}_x$, $\bar{d}_y$) and the fundamental frequency ($\bar{f}_x$, $\bar{f}_y$) of each time series can differ. To ensure that the time series with the lower resolution determines these variables, we use $\bar{d}_{xy} = \max(\bar{d}_x, \bar{d}_y)$ and $\bar{f}_{xy} = \max(\bar{f}_x, \bar{f}_y)$, and the average Nyquist frequency is determined as $f_{\mathrm{Nyq}} = 1/(2\bar{d}_{xy})$ (Schulz and Stattegger, 1997).
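The choice of common frequency limits for two differently sampled series can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the REDFIT-X source; the function names and the example time vectors are hypothetical.

```python
def average_spacing(t):
    """Average spacing d = [t(n) - t(1)] / (n - 1) of a sorted time vector."""
    return (t[-1] - t[0]) / (len(t) - 1)


def common_frequency_limits(tx, ty):
    """Return (d_xy, f_xy, f_nyq) for two series with different sampling times."""
    dx, dy = average_spacing(tx), average_spacing(ty)
    fx = 1.0 / (len(tx) * dx)    # fundamental frequency of x
    fy = 1.0 / (len(ty) * dy)    # fundamental frequency of y
    d_xy = max(dx, dy)           # larger of the two average spacings
    f_xy = max(fx, fy)           # larger of the two fundamental frequencies
    f_nyq = 1.0 / (2.0 * d_xy)   # average Nyquist frequency
    return d_xy, f_xy, f_nyq


tx = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical sampling times of x
ty = [0.0, 2.0, 4.0, 6.0]        # hypothetical, coarser sampling times of y
d_xy, f_xy, f_nyq = common_frequency_limits(tx, ty)
```

In this toy case the coarser series `ty` sets the spacing and hence the Nyquist frequency, while the shorter record `tx` sets the fundamental frequency.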

The auto- and cross-spectra are estimated with the Lomb–Scargle Fourier transform (Lomb, 1976; Scargle, 1982) in combination with Welch's Overlapped Segment Averaging (WOSA) procedure (Welch, 1967), as done in Schulz and Stattegger (1997).
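To illustrate how a spectrum can be estimated directly on uneven sampling times, the sketch below evaluates the classical real-valued Lomb–Scargle periodogram at one frequency. It is a simplification: the program described here uses the complex Lomb–Scargle Fourier transform, which also carries the phase information needed for cross-spectra. The function name, the jittered time axis and the test frequencies are hypothetical.

```python
import math


def lomb_scargle_power(t, x, freq):
    """Unnormalised Lomb-Scargle periodogram of x(t) at one frequency (freq > 0)."""
    mean = sum(x) / len(x)
    w = 2.0 * math.pi * freq
    # The time offset tau makes the sine and cosine terms orthogonal
    # on the actual (uneven) sampling times.
    s2 = sum(math.sin(2.0 * w * ti) for ti in t)
    c2 = sum(math.cos(2.0 * w * ti) for ti in t)
    tau = math.atan2(s2, c2) / (2.0 * w)
    cos_terms = [math.cos(w * (ti - tau)) for ti in t]
    sin_terms = [math.sin(w * (ti - tau)) for ti in t]
    xc = sum((xi - mean) * c for xi, c in zip(x, cos_terms))
    xs = sum((xi - mean) * s for xi, s in zip(x, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (xc * xc / cc + xs * xs / ss)


# Hypothetical test signal: a sinusoid of frequency 0.1 on jittered sampling times.
t = [i + 0.3 * math.sin(i) for i in range(100)]
x = [math.sin(2.0 * math.pi * 0.1 * ti) for ti in t]
p_signal = lomb_scargle_power(t, x, 0.1)   # power at the signal frequency
p_off = lomb_scargle_power(t, x, 0.23)     # power away from the signal frequency
```

No interpolation onto a regular grid is involved: the sums run over the original sampling times, which is the property that motivates the approach in the text.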

The WOSA segmenting is used to smooth the estimated raw spectrum and make it consistent (the raw spectrum is an inconsistent estimator, as its variance does not decrease with increasing data size). The time series of length $n_x$ and $n_y$ are split into a number $n_{50}$ of overlapping segments of length $n_{\mathrm{seg}}^{x}$ and $n_{\mathrm{seg}}^{y}$ (with 50% overlap)

$$
\begin{aligned}
x_i(j) &= x\left[(i-1)\left(n_{\mathrm{seg}}^{x}/2\right) + j\right], \quad j = 1, \ldots, n_{\mathrm{seg}}^{x},\\
y_i(j) &= y\left[(i-1)\left(n_{\mathrm{seg}}^{y}/2\right) + j\right], \quad j = 1, \ldots, n_{\mathrm{seg}}^{y},
\end{aligned}
\tag{3}
$$

where $i = 1, \ldots, n_{50}$. A linear trend is subtracted from each segment to avoid possible artifacts at low frequencies (i.e., resulting from periods that exceed the segment length). Interpretation of the low-frequency part of a spectrum therefore requires sufficiently long segments. The segments are multiplied by a taper $w(j)$, $j = 1, \ldots, n_{\mathrm{seg}}$ (see the different types of spectral windows in Harris, 1978) to reduce spectral leakage and then Fourier transformed

$$
\begin{aligned}
X_i(f_k) &= \mathrm{LS}\{x_i(j)\,w(j)\},\\
Y_i(f_k) &= \mathrm{LS}\{y_i(j)\,w(j)\},
\end{aligned}
\tag{4}
$$

where LS denotes the Lomb–Scargle Fourier transform (Lomb, 1976; Scargle, 1982). Finally, the $n_{50}$ segments are averaged to form consistent auto- and cross-spectra

$$
\hat{G}_{xx}(f_k) = \frac{2}{n_{50}\,\bar{f}_{xy}\,n_{\mathrm{seg}}^{x}} \sum_{i=1}^{n_{50}} |X_i(f_k)|^{2},
\tag{5}
$$

$$
\hat{G}_{yy}(f_k) = \frac{2}{n_{50}\,\bar{f}_{xy}\,n_{\mathrm{seg}}^{y}} \sum_{i=1}^{n_{50}} |Y_i(f_k)|^{2},
\tag{6}
$$

$$
\hat{G}_{xy}(f_k) = \frac{2}{n_{50}\,\bar{f}_{xy}\,\sqrt{n_{\mathrm{seg}}^{x}\,n_{\mathrm{seg}}^{y}}} \sum_{i=1}^{n_{50}} \left[X_i(f_k)\,Y_i^{*}(f_k)\right],
\tag{7}
$$

where $*$ denotes the complex conjugate (Schulz and Stattegger, 1997).
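The segmenting of Eq. (3) and the averaging of Eqs. (5)–(7) can be sketched as follows. The sketch is illustrative only and is not taken from the REDFIT-X source: the rule used for the segment length, the function names and the toy segment transforms are assumptions, and the detrending, tapering and Lomb–Scargle transform of each segment are left out.

```python
def wosa_segments(x, n50):
    """Split x into n50 segments with 50% overlap, as in Eq. (3)."""
    n_seg = (2 * len(x)) // (n50 + 1)  # assumed rule for the segment length
    half = n_seg // 2                  # offset between consecutive segments
    return [x[i * half:i * half + n_seg] for i in range(n50)]


def wosa_spectra(X, Y, f_xy, nx_seg, ny_seg):
    """Average segment transforms into spectra, following Eqs. (5)-(7).

    X[i][k] and Y[i][k] hold the complex transform of segment i at frequency k.
    """
    n50 = len(X)
    scale = 2.0 / (n50 * f_xy)
    gxx, gyy, gxy = [], [], []
    for k in range(len(X[0])):
        gxx.append(scale / nx_seg * sum(abs(X[i][k]) ** 2 for i in range(n50)))
        gyy.append(scale / ny_seg * sum(abs(Y[i][k]) ** 2 for i in range(n50)))
        cross = sum(X[i][k] * Y[i][k].conjugate() for i in range(n50))
        gxy.append(scale / (nx_seg * ny_seg) ** 0.5 * cross)
    return gxx, gyy, gxy


def coherency(gxx, gyy, gxy):
    """Squared coherency of Eq. (2) at every frequency."""
    return [abs(c) ** 2 / (a * b) for a, b, c in zip(gxx, gyy, gxy)]


segments = wosa_segments(list(range(12)), 3)

# Hypothetical segment transforms at a single frequency (two segments each).
X = [[1 + 1j], [2 - 1j]]
Y_same = [[1 + 1j], [2 - 1j]]    # identical to X
Y_flip = [[1 + 1j], [-2 + 1j]]   # second segment has the opposite sign
c_same = coherency(*wosa_spectra(X, Y_same, 0.01, 6, 6))
c_flip = coherency(*wosa_spectra(X, Y_flip, 0.01, 6, 6))
```

With a single segment the ratio in Eq. (2) equals one for any pair of series, which is one way to see why averaging over several segments is needed before the coherency becomes informative.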

The coherency estimate is biased: the estimated coherency between two uncoupled time series is expected to be greater than zero (Benignus, 1969). We use the bias approximation from Bendat and Piersol (2010) to form a bias-corrected coherency estimate

$$
\begin{aligned}
\mathrm{bias}\left[\hat{c}_{xy}^{2}(f_k)\right] &\approx \left[1 - \hat{c}_{xy}^{2}(f_k)\right]^{2} / n_{\mathrm{eff}},\\
\hat{c}_{xy}^{\prime 2}(f_k) &= \hat{c}_{xy}^{2}(f_k) - \mathrm{bias}\left[\hat{c}_{xy}^{2}(f_k)\right],
\end{aligned}
\tag{8}
$$

where $n_{\mathrm{eff}}$ is the effective number of segments (defined in Section 2.1.2).
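As a small worked example of Eq. (8), the following sketch applies the bias correction to one coherency estimate. The function name and the input values are hypothetical.

```python
def bias_corrected_coherency(c2_hat, n_eff):
    """Bias-corrected squared coherency according to Eq. (8)."""
    bias = (1.0 - c2_hat) ** 2 / n_eff
    return c2_hat - bias


# Hypothetical estimate of 0.5 obtained with 10 effective segments.
c2_corrected = bias_corrected_coherency(0.5, 10.0)
```

The correction is largest for small estimates, where it can push the corrected value slightly below zero; how such values should be treated is not stated in the text, so the sketch leaves them unchanged.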

The phase spectrum is estimated as

$$
\hat{\phi}_{xy}(f_k) = \tan^{-1}\left(\frac{\hat{Q}_{xy}(f_k)}{\hat{C}_{xy}(f_k)}\right),
\tag{9}
$$

where $\hat{Q}_{xy}(f_k)$ and $\hat{C}_{xy}(f_k)$ are the imaginary and real parts of the estimated cross-spectrum $\hat{G}_{xy}(f_k)$, respectively (e.g., Bendat and Piersol, 2010). We use the four-quadrant inverse tangent function in Fortran (atan2), which returns the result in the appropriate quadrant. The estimated phase angle therefore falls in the range [−180°, 180°], where a zero value means that the two time series are in phase while a nonzero value means that they are out of phase.
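The effect of the four-quadrant inverse tangent can be illustrated with Python's `math.atan2`, which behaves like the Fortran intrinsic of the same name. The sketch assumes that the imaginary part of the cross-spectrum is used directly as the numerator; sign conventions for the quadrature spectrum differ between texts, and the function name is hypothetical.

```python
import math


def phase_degrees(g_xy):
    """Four-quadrant phase angle of a complex cross-spectrum value, in degrees."""
    return math.degrees(math.atan2(g_xy.imag, g_xy.real))


in_phase = phase_degrees(complex(1.0, 0.0))
quarter_cycle = phase_degrees(complex(0.0, 1.0))
second_quadrant = phase_degrees(complex(-1.0, 1.0))

# A plain arctangent of the ratio loses the quadrant of the same value.
plain_arctan = math.degrees(math.atan(1.0 / -1.0))
```

For the value in the second quadrant the two-argument form returns 135°, whereas the arctangent of the ratio alone returns −45°, which is why the two-argument form is needed to cover the full range.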


2.1.2. Theoretical uncertainty measurements

A false-alarm level for the estimated coherency has been derived from the statistical distribution of the coherency estimate when the true coherency equals zero

$$
z_{xy}^{2} = 1 - \alpha^{1/(n_{\mathrm{eff}} - 1)},
\tag{10}
$$

where $\alpha$ is the significance level and $n_{\mathrm{eff}}$ is the effective number of segments (cf. Carter, 1977; Koopmans, 1995). The statistical distribution of the coherency estimate (Goodman, 1957) was derived under the assumptions that the two signals are stationary Gaussian random processes and that the data segments are independent (non-overlapping) (Carter, 1977). Overlapping segments are known to reduce the bias and variance of the coherency estimate (Carter et al., 1973), but the overlap also introduces correlation between segments and changes the statistical distribution of the coherency estimate. Therefore it is necessary to use the effective number of segments ($n_{\mathrm{eff}}$) instead of the actual number of segments ($n_{50}$) to calculate the false-alarm level for coherency. The effective number of segments for 50% overlap is calculated according to Welch (1967) as $n_{\mathrm{eff}} = n_{50}/(1 + 2c_{50}^{2} - 2c_{50}^{2}/n_{50})$, where $c_{50}$ is a constant depending on the applied spectral window (Harris, 1978). The estimated phase is approximately normally distributed and the standard deviation of the phase estimate is used to form a confidence interval for the phase angle. The standard deviation (in radians) is approximated as:

$$
\sigma\left[\hat{\phi}_{xy}(f_k)\right] \approx \frac{\left(1 - \hat{c}_{xy}^{\prime 2}(f_k)\right)^{1/2}}{\hat{c}_{xy}^{\prime}(f_k)\,\sqrt{2 n_{\mathrm{eff}}}}
\tag{11}
$$

(Bendat and Piersol, 2010). Uncertainties for phase values, $\hat{\phi}_{xy}(f_k) \pm z(\alpha)\cdot\sigma[\hat{\phi}_{xy}(f_k)]$, are derived from percentage points of the normal distribution, $z(\alpha)$. Because the standard deviation depends on coherency, the uncertainty of a phase angle can be very large if the coherency is close to zero. Accordingly, the phase should only be analysed at frequencies with significant coherency.
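The theoretical uncertainty measures of this section can be collected in a short sketch. The window constant `C50_WELCH = 0.344` is an assumption; it is chosen because it reproduces the values of $n_{\mathrm{eff}}$ quoted for the Welch window in Section 3 (8.24 for 10 segments and 5.82 for 7 segments). The function names are hypothetical, and the two-sided use of the normal percentage point is likewise an assumption.

```python
import math
from statistics import NormalDist

C50_WELCH = 0.344  # assumed overlap constant for the Welch window


def effective_segments(n50, c50=C50_WELCH):
    """Effective number of segments for 50% overlap (Welch, 1967)."""
    return n50 / (1.0 + 2.0 * c50 ** 2 - 2.0 * c50 ** 2 / n50)


def coherency_false_alarm(alpha, n_eff):
    """Theoretical false-alarm level for coherency, Eq. (10)."""
    return 1.0 - alpha ** (1.0 / (n_eff - 1.0))


def phase_std(c2_corrected, n_eff):
    """Approximate standard deviation of the phase in radians, Eq. (11)."""
    return math.sqrt(1.0 - c2_corrected) / (math.sqrt(c2_corrected) * math.sqrt(2.0 * n_eff))


def phase_interval_deg(phase_deg, c2_corrected, n_eff, alpha=0.05):
    """Normal-theory confidence interval for the phase, in degrees."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided percentage point (assumed)
    half_width = math.degrees(z * phase_std(c2_corrected, n_eff))
    return phase_deg - half_width, phase_deg + half_width


n_eff = effective_segments(10)
level_95 = coherency_false_alarm(0.05, n_eff)
lower, upper = phase_interval_deg(60.0, 0.8, n_eff)
```

Because `phase_std` divides by the coherency, it grows without bound as the coherency approaches zero, which mirrors the recommendation to interpret the phase only where the coherency is significant.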

2.2. Monte Carlo procedure

2.2.1. Monte Carlo false-alarm level for coherency

We use a Monte Carlo simulation technique to approximate an additional false-alarm level for detecting whether there is significant coherency between the two time series at a given frequency. The Monte Carlo false-alarm level is formed from an empirical sampling distribution of the coherency estimator. The sampling distribution is obtained by estimating the coherency between many pairs of uncoupled time series formed by Monte Carlo simulations, which mimic key properties of the observed signals, such as autocorrelation and time spacing. The detailed steps of the simulations are as follows:

1. The persistence times $\tau_x$ and $\tau_y$ are estimated from the two observed time series. They are estimated with the least-squares algorithm TAUEST (Mudelsee, 2002), which fits a first-order autoregressive or AR(1) persistence model to unevenly spaced time series.

2. Monte Carlo simulation loop, repeated $n_{\mathrm{sim}}$ times (where $n_{\mathrm{sim}}$ is the number of simulations, usually around 1000).
• Two uncoupled AR(1) processes $r_x$ and $r_y$ are generated as

$$
\begin{aligned}
r_x(1) &= \varepsilon_x(1) \sim N(0, 1),\\
&\text{and, for } i = 2, \ldots, n_x,\\
r_x(i) &= \exp\{-[t_x(i) - t_x(i-1)]/\tau_x\}\cdot r_x(i-1) + \varepsilon_x(i),\\
\varepsilon_x(i) &\sim N\left(0,\, 1 - \exp\{-2[t_x(i) - t_x(i-1)]/\tau_x\}\right),
\end{aligned}
\tag{12}
$$

and

$$
\begin{aligned}
r_y(1) &= \varepsilon_y(1) \sim N(0, 1),\\
&\text{and, for } i = 2, \ldots, n_y,\\
r_y(i) &= \exp\{-[t_y(i) - t_y(i-1)]/\tau_y\}\cdot r_y(i-1) + \varepsilon_y(i),\\
\varepsilon_y(i) &\sim N\left(0,\, 1 - \exp\{-2[t_y(i) - t_y(i-1)]/\tau_y\}\right),
\end{aligned}
\tag{13}
$$

see Mudelsee (2010). We include the estimated $\tau_x$ and $\tau_y$ from Step 1, the sampling times $t_x(i)$ and $t_y(i)$ (unevenly or evenly spaced) from each of the observed time series, and an unrelated set of $\varepsilon_x$ and $\varepsilon_y$ for each process, which are purely random processes drawn from a normal distribution $N(\mu, \sigma^2)$ with zero mean $\mu$ and unit variance $\sigma^2$. The variance of the innovation term $\varepsilon_y(i)$ is set to, for example, $\sigma^{2} = 1 - \exp(-2[t_y(i) - t_y(i-1)]/\tau_y)$, so that the AR(1) process is stationary with unit variance.
• The coherency $\hat{c}_{r_x r_y}^{2}(f_k)$ between the two generated AR(1) processes is estimated using Eq. (2).

3. The false-alarm level for $\hat{c}_{xy}^{\prime 2}(f_k)$ is determined as the $n_{\mathrm{sim}}(1 - \alpha)$th point of the empirical sampling distribution of the $\hat{c}_{r_x r_y}^{2}(f_k)$ estimates. This is done at all frequencies $f_k$. Coherency values above this false-alarm level can be taken as significantly different from zero at the given significance level $\alpha$.
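The two building blocks of the steps above, generating an AR(1) process on given sampling times and reading a false-alarm level off a Monte Carlo ensemble, can be sketched as follows. This is a sketch under stated assumptions rather than the REDFIT-X code: the function names are hypothetical, a non-positive persistence time is treated as white noise, and the example uses evenly spaced times only so that the lag-1 autocorrelation has a single known target value.

```python
import math
import random


def ar1_uneven(t, tau, rng):
    """Unit-variance AR(1) series on sampling times t, following Eqs. (12)-(13)."""
    r = [rng.gauss(0.0, 1.0)]
    for i in range(1, len(t)):
        # tau <= 0 is treated as white noise (no memory); this is an assumption.
        rho = math.exp(-(t[i] - t[i - 1]) / tau) if tau > 0.0 else 0.0
        innovation_sd = math.sqrt(1.0 - rho * rho)  # keeps the variance at one
        r.append(rho * r[-1] + rng.gauss(0.0, innovation_sd))
    return r


def false_alarm_level(null_coherencies, alpha):
    """The n_sim*(1 - alpha)-th smallest value of the null ensemble (Step 3)."""
    ordered = sorted(null_coherencies)
    n_sim = len(ordered)
    rank = min(n_sim, max(1, round(n_sim * (1.0 - alpha))))
    return ordered[rank - 1]


rng = random.Random(42)
t = [float(i) for i in range(20000)]   # even spacing of 1.0 for this check
series = ar1_uneven(t, 3.0, rng)

mean = sum(series) / len(series)
variance = sum((v - mean) ** 2 for v in series) / len(series)
lag1 = sum((series[i] - mean) * (series[i - 1] - mean)
           for i in range(1, len(series))) / (len(series) * variance)

# Stand-in ensemble of 1000 null coherency values at one frequency.
level = false_alarm_level([k / 1000.0 for k in range(1, 1001)], 0.05)
```

In a full simulation the stand-in ensemble would be replaced by the coherency estimates between `ar1_uneven(tx, tau_x, rng)` and `ar1_uneven(ty, tau_y, rng)`, collected separately at every frequency.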

2.2.2. Monte Carlo phase confidence interval

Monte Carlo confidence intervals for the phase are estimated at frequencies $f_k$ where the coherency exceeds the false-alarm level. The confidence interval is formed from the empirical distribution of the phase estimates obtained from Monte Carlo simulations. The uncertainty in the phase angle depends on the coherency value and the effective number of segments $n_{\mathrm{eff}}$, which means that we need to generate two time series with prescribed coherency. The method described in Carter et al. (1973) is used to generate two white noise processes with prescribed coherency independent of frequency (similar approaches are also found in Foster and Guinzy, 1967; Miles, 2011)

$$
\varepsilon_x^{*}(i) = \varepsilon_x(i) + G\,\varepsilon_y(i), \quad i = 1, \ldots, n,
\tag{14}
$$

$$
\varepsilon_y^{*}(i) = \varepsilon_y(i) + G\,\varepsilon_x(i), \quad i = 1, \ldots, n,
\tag{15}
$$

where $\varepsilon_x$ and $\varepsilon_y$ are unrelated random processes with normal distribution, zero mean and unit variance, and $n$ is the number of data points. $G$ can be selected from $c_{xy}^{2}(f_k) = 4G^{2}/(1 + G^{2})^{2}$ for the desired value of $c_{xy}^{2}$ (Carter et al., 1973). This leads to a fourth-order polynomial equation for $G$, $c_{xy}^{2}G^{4} + 2c_{xy}^{2}G^{2} + c_{xy}^{2} - 4G^{2} = 0$, which has four solutions for $G$. Here we use $G = \sqrt{-2\sqrt{1 - c_{xy}^{2}}/c_{xy}^{2} + 2/c_{xy}^{2} - 1}$ to generate the coupled time series. This method works only if the two time series have identical time steps. Therefore, to include both time scales, the Monte Carlo simulation loop is performed twice. In the first round the sample times $\{t(i)\}_{i=1}^{n}$ are set equal to the times of the first time series $\{t_x(i)\}_{i=1}^{n_x}$ and the persistence time $\tau$ is set equal to $\tau_x$. In the second round the times are set equal to the times of the second time series $\{t_y(i)\}_{i=1}^{n_y}$ and the persistence time is set equal to $\tau_y$. Confidence intervals are formed from the empirical distributions of the phase estimates for both cases and the mean value is taken as the final result.

Fig. 1. Empirical distribution of the coherency estimate at a single frequency for four different prescribed coherency values. White noise processes, $n_{\mathrm{eff}} = 8.24$, $n = 300$, $n_{\mathrm{sim}} = 10{,}000$. Black lines show the theoretical distributions (Eq. (18)).

The detailed steps of the simulations are the following:

1. For each frequency $f_k$ with significant coherency, a Monte Carlo simulation loop is repeated $n_{\mathrm{sim}}$ times with $n_{\mathrm{sim}} = 1000$:
• Two coupled AR(1) processes $r_x^{*}$ and $r_y^{*}$ are generated as

$$
\begin{aligned}
r_x^{*}(1) &= \varepsilon_x^{*}(1),\\
&\text{and, for } i = 2, \ldots, n,\\
r_x^{*}(i) &= \exp\{-[t(i) - t(i-1)]/\tau\}\cdot r_x^{*}(i-1) + \varepsilon_x^{*}(i),
\end{aligned}
\tag{16}
$$

$$
\begin{aligned}
r_y^{*}(1) &= \varepsilon_y^{*}(1),\\
&\text{and, for } i = 2, \ldots, n,\\
r_y^{*}(i) &= \exp\{-[t(i) - t(i-1)]/\tau\}\cdot r_y^{*}(i-1) + \varepsilon_y^{*}(i).
\end{aligned}
\tag{17}
$$

The white noise terms $\varepsilon_x^{*}$ and $\varepsilon_y^{*}$ are coupled with the method explained above (Eqs. (14) and (15)), where the bias-corrected coherency estimate $\hat{c}_{xy}^{\prime 2}(f_k)$ is used to select the appropriate $G$ value.
• The phase $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k)$ is estimated according to Eq. (9).

2. The estimated phase $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k)$ is centered around zero by subtracting the average phase (the average of $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k)$ over the $n_{\mathrm{sim}}$ simulations) from each phase value.

3. Confidence intervals for the phase are determined as the $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k) + (100\alpha)$th and $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k) + 100(1 - \alpha)$th percentage points of the empirical distribution of the estimated phase values $\hat{\phi}_{r_x^{*} r_y^{*}}(f_k)$.

The Monte Carlo simulation loop is performed twice for each frequency with significant coherency, first with the time scale $t_x$ and persistence time $\tau_x$ and second with the time scale $t_y$ and persistence time $\tau_y$. The phase confidence interval at a given frequency is the mean of the intervals from the two Monte Carlo simulation loops.
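The coupling step of Eqs. (14) and (15) can be checked numerically with a short sketch. For white noise the squared correlation between the two coupled series equals the prescribed coherency, which gives a simple test of the expression for $G$. The function names and the seed are hypothetical, and the AR(1) filtering of Eqs. (16) and (17) as well as the phase estimation are not included.

```python
import math
import random


def coupling_gain(c2):
    """Gain G that gives squared coherency c2 via c2 = 4 G^2 / (1 + G^2)^2."""
    if c2 <= 0.0:
        return 0.0  # no coupling for zero coherency
    g2 = -2.0 * math.sqrt(1.0 - c2) / c2 + 2.0 / c2 - 1.0
    return math.sqrt(max(g2, 0.0))  # guard against tiny negative rounding errors


def coupled_white_noise(n, c2, rng):
    """Two white-noise series with prescribed coherency, Eqs. (14)-(15)."""
    g = coupling_gain(c2)
    ex = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ey = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ex_star = [a + g * b for a, b in zip(ex, ey)]
    ey_star = [b + g * a for a, b in zip(ex, ey)]
    return ex_star, ey_star


g = coupling_gain(0.6)
recovered_c2 = 4.0 * g * g / (1.0 + g * g) ** 2

rng = random.Random(7)
ex_star, ey_star = coupled_white_noise(20000, 0.6, rng)
mx = sum(ex_star) / len(ex_star)
my = sum(ey_star) / len(ey_star)
sxy = sum((a - mx) * (b - my) for a, b in zip(ex_star, ey_star))
sxx = sum((a - mx) ** 2 for a in ex_star)
syy = sum((b - my) ** 2 for b in ey_star)
squared_correlation = sxy * sxy / (sxx * syy)
```

Of the four roots of the polynomial, this expression gives the positive one with $G \le 1$; its reciprocal yields the same coherency.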

3. Comparison of theoretical and Monte Carlo results

3.1. Coherency

We compare the empirical sampling distribution of coherency formed by the Monte Carlo simulations with the theoretical distribution of the coherency estimate from Carter et al. (1973), as the false-alarm levels for coherency are formed from these distributions.

The Monte Carlo simulation loop in the program is used to form the empirical distribution of coherency (see Section 2.2.1). In the first example 10,000 pairs ($n_{\mathrm{sim}} = 10{,}000$) of white noise processes were generated, where $\{r_x\}_{i=1}^{n_x}$ and $\{r_y\}_{i=1}^{n_y}$ in Eqs. (12) and (13) are set as Gaussian random processes $\varepsilon_x$ and $\varepsilon_y$ with mean $\mu = 0$ and variance $\sigma^{2} = 1$. The number of data points in the generated time series is $n = 300$ and the two series have identical and evenly spaced time points. Several experiments were done for different predefined coherency values between the two processes, independent of frequency. The white noise processes were coupled as in Eqs. (14) and (15), with four different prescribed coherency values $c_{xy}^{2} \in \{0.0, 0.3, 0.6, 0.9\}$. A Welch window was used in the spectral analysis and the number of segments $n_{50}$ was set to 10, which gives $n_{\mathrm{eff}} = 8.24$. The histogram of the coherency estimates from the Monte Carlo simulations at a single frequency is compared with the probability density function of the coherency estimate from Carter et al. (1973),

$$
\begin{aligned}
p\left(\hat{c}_{xy}^{2} \mid n_{\mathrm{eff}}, c_{xy}^{2}\right) ={}& (n_{\mathrm{eff}} - 1)\left(1 - c_{xy}^{2}\right)^{n_{\mathrm{eff}}}\left(1 - \hat{c}_{xy}^{2}\right)^{n_{\mathrm{eff}} - 2}\\
&\cdot \left(1 - c_{xy}^{2}\,\hat{c}_{xy}^{2}\right)^{1 - 2n_{\mathrm{eff}}}\,{}_2F_1\left(1 - n_{\mathrm{eff}}, 1 - n_{\mathrm{eff}}; 1; c_{xy}^{2}\,\hat{c}_{xy}^{2}\right).
\end{aligned}
\tag{18}
$$

Matlab code from Pearson (2009) was used to compute the hypergeometric function ${}_2F_1(1 - n_{\mathrm{eff}}, 1 - n_{\mathrm{eff}}; 1; c_{xy}^{2}\,\hat{c}_{xy}^{2})$. It is clear from Fig. 1 that we are able to reproduce the theoretical distribution of the coherency estimate with the Monte Carlo simulations in the case of two evenly spaced white noise processes.

The false-alarm level (Eq. (10)) is used to test whether the estimated coherency values are different from zero, that is, different from the estimated coherency between two independent processes. The false-alarm level is derived from the probability function of the coherency estimator when $c_{xy}^{2} = 0$, which can be written as

$$
p\left(\hat{c}_{xy}^{2} \mid n_{\mathrm{eff}}, c_{xy}^{2} = 0\right) = (n_{\mathrm{eff}} - 1)\left(1 - \hat{c}_{xy}^{2}\right)^{n_{\mathrm{eff}} - 2}
\tag{19}
$$

(Carter, 1977). The estimated coherency between two completely unrelated time series is expected to be greater than zero, with the expected value $E[\hat{c}_{xy}^{2} \mid n_{\mathrm{eff}}, c_{xy}^{2} = 0] = 1/n_{\mathrm{eff}}$ (which is also the bias) (Miles, 2006). This is correspondingly observed in the empirical distribution from the simulations above. We can determine the empirical bias of the estimated coherency by averaging $(\hat{c}_{r_x r_y}^{2}(f_k) - c_{xy}^{2})$ over the $n_{\mathrm{sim}} = 10{,}000$ simulations. The empirical bias of the estimated coherency when $c_{xy}^{2} = 0$ and $n_{\mathrm{eff}} = 8.24$ equals 0.12, which agrees with the theoretical bias. The theoretical false-alarm level is derived from the theoretical distribution of the coherency estimator when the true coherency is zero, without any prior bias correction. Accordingly, we do not correct for bias in the coherency estimates formed in the Monte Carlo loop, as doing so would cause a mismatch between the theoretical and Monte Carlo false-alarm levels (the false-alarm level estimated with the Monte Carlo simulations would be lower than the false-alarm level from Eq. (10)).

Fig. 2. As Fig. 1 but for red noise processes ($\tau_x = 3.0$, $\tau_y = 5.0$) with uneven spacing. $n_{\mathrm{eff}} = 8.24$, $n = 300$, $n_{\mathrm{sim}} = 10{,}000$. Black lines show the theoretical distributions (Eq. (18)).

Fig. 3. 95% false-alarm level for coherency plotted against the effective number of segments ($n_{\mathrm{eff}}$). Grey line with triangles shows the theoretical false-alarm level, grey dashed line shows the Monte Carlo false-alarm level generated with white noise processes, black line with diamonds shows the Monte Carlo false-alarm level generated with red noise processes ($\tau_x = 3.0$, $\tau_y = 5.0$). The number of simulations was 1000. (a) Monte Carlo false-alarm level formed with evenly spaced time series, $n = 300$. (b) Monte Carlo false-alarm level formed with unevenly spaced time series, $n = 300$. (c) Monte Carlo false-alarm level formed with unevenly spaced time series with unequal time scales, $n_x = 305$, $n_y = 293$.

In the second example the same settings were used in the spectral analysis but the generated processes were more complex. Here we generated 10,000 pairs of unevenly spaced Gaussian AR(1) processes. Several experiments were done with different predefined coherency values. Two coupled AR(1) processes $r_x^{*}$ and $r_y^{*}$ were generated as in Eqs. (16) and (17), where the $\tau$ values were set to 3.0 for the first time series $r_x^{*}$ and to 5.0 for the second time series $r_y^{*}$. The number of data points in the time series is $n = 300$. The two time series have the same sampling times but the time scale is unevenly spaced. The uneven time spacing was drawn from a Gamma distribution and the average time spacing was set to 1.0. Fig. 2 shows the distribution of the coherency estimates at a single frequency for four different prescribed coherency values $c_{xy}^{2} \in \{0.0, 0.3, 0.6, 0.9\}$. The figure shows that the empirical distribution clearly agrees with the theoretical distribution (Eq. (18)). Neither the persistence in the data nor the uneven spacing seems to influence the distributional shape of the coherency estimates.

Fig. 5. Comparison between Monte Carlo phase confidence intervals and theoretical phase confidence intervals. (a) Coherency spectrum for two generated time series with 95% Monte Carlo false-alarm level (the mean value over all frequencies). (b) Phase spectrum with 95% confidence intervals estimated at frequencies with significant coherency. Monte Carlo phase confidence intervals (black narrow error bars) estimated with $n_{\mathrm{sim}} = 1000$, theoretical confidence intervals (grey thick error bars).

To analyse this further and to see whether the Monte Carlo false-alarm level follows the theoretical false-alarm level for any number of segments, we plot the Monte Carlo false-alarm level against the effective number of segments. As the Monte Carlo false-alarm level is more or less constant over all frequencies, we use the mean value over all frequencies. This was done for Monte Carlo false-alarm levels formed with both white noise and red noise processes ($\tau_x = 3.0$, $\tau_y = 5.0$) and compared with the theoretical false-alarm level for different numbers of segments (Fig. 3). The generated processes are either evenly spaced (Fig. 3a), unevenly spaced on equal time scales (Fig. 3b) or unevenly spaced on unequal time scales (Fig. 3c). The results show again that there is practically no difference between the Monte Carlo false-alarm levels formed with white noise processes and those formed with red noise processes. There is a small difference between the false-alarm level formed with the Monte Carlo simulations (generated with both white and red noise processes) and the theoretical false-alarm level when the number of segments is less than 7 ($n_{\mathrm{eff}} = 5.82$). The uneven spacing does not have any influence, but there seems to be a slightly larger mismatch between the theoretical and Monte Carlo false-alarm levels in the case of unequal time scales. In this experiment we used $n_{\mathrm{sim}} = 1000$. The results were unchanged when the number of simulations was increased to 10,000, which shows that there is no reason to use more than 1000 simulations to form the Monte Carlo false-alarm level.

3.2. Phase

Fig. 4. Empirical distributions of phase estimates at a single frequency with significant coherency for two cases, white noise and red noise. The lines show normal curves with mean and standard deviation estimated from the data (black dashed line for white noise and grey line for red noise).

The Monte Carlo phase confidence intervals are compared with the theoretical phase confidence intervals. For the comparison we generated two time series with identical periodicities and a known phase difference. The first time series consists of two sine waves (frequencies $f_1 = 0.03$ and $f_2 = 0.2$; amplitudes $A_1 = 1.0$ and $A_2 = 1.4$; phase values $\phi_1 = 0^\circ$ and $\phi_2 = 60^\circ$). Additionally, Gaussian noise was added to the signal (variance $\sigma^2 = 1.0$). The time scale is unevenly spaced (drawn from a Gamma distribution) with average time spacing $\bar{d} = 1.0$ year and number of data points $n = 300$. The second time series consists of the same periodicities (frequencies $f_1 = 0.03$ and $f_2 = 0.02$; amplitudes $A_1 = 1.4$ and $A_2 = 1.0$; phase values $\phi_1 = 90^\circ$ and $\phi_2 = 0^\circ$) plus Gaussian noise (variance $\sigma^2 = 1.0$). The time scale is evenly spaced with time spacing $d = 1.0$ year and number of data points $n = 300$. Coherency and phase spectra were estimated along with Monte Carlo and theoretical phase confidence intervals. A Welch window was used in the spectral analysis, with number of segments $n_{50} = 7$ ($n_{eff} = 5.82$).
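A test series of the kind described above can be generated along the following lines. This is a minimal sketch, not the code used for the paper: the Gamma shape parameter (2.0), the sine convention $A \sin(2\pi f t + \phi)$, the random seed and the function names are all assumptions, since the text only states that the spacings are Gamma distributed with a mean of 1.0 year.

```python
import numpy as np


def gamma_spaced_times(n, mean_spacing=1.0, shape=2.0, rng=None):
    """Return n increasing times whose spacings are Gamma distributed."""
    rng = np.random.default_rng() if rng is None else rng
    # scale = mean / shape gives spacings with the requested mean.
    spacings = rng.gamma(shape, mean_spacing / shape, size=n - 1)
    return np.concatenate(([0.0], np.cumsum(spacings)))


def sine_series(times, freqs, amps, phases_deg, noise_var=1.0, rng=None):
    """Sum of sine waves sampled at `times` plus Gaussian white noise."""
    rng = np.random.default_rng() if rng is None else rng
    times = np.asarray(times, dtype=float)
    signal = np.zeros_like(times)
    for f, a, p in zip(freqs, amps, phases_deg):
        signal += a * np.sin(2.0 * np.pi * f * times + np.deg2rad(p))
    noise = rng.normal(0.0, np.sqrt(noise_var), size=times.size)
    return signal + noise


rng = np.random.default_rng(42)
# First series: uneven time scale, parameters as stated for the first time series.
t_uneven = gamma_spaced_times(300, mean_spacing=1.0, rng=rng)
x = sine_series(t_uneven, freqs=(0.03, 0.2), amps=(1.0, 1.4),
                phases_deg=(0.0, 60.0), noise_var=1.0, rng=rng)
# Evenly spaced time scale (spacing 1.0 year) for the second series.
t_even = np.arange(300) * 1.0
```

The second series would be built by calling `sine_series` on `t_even` with its own amplitudes and phase values.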

A histogram of the phase estimates from the Monte Carlo simulation loop (with $n_{sim} = 10{,}000$) at a single frequency with significant coherency is used to infer whether the phase estimates follow a normal distribution. In the first case, white noise processes ($\tau_x = \tau_y = 0$) were generated to form the empirical distribution of the phase estimates, while in the second case red noise processes ($\tau_x = 5.0$ and $\tau_y = 3.0$) were generated (Fig. 4). A $\chi^2$ test was used to test, at the 5% significance level, the null hypothesis that the phase estimates come from a normal distribution. The $\chi^2$ test statistics in both cases are above the critical value, which means that the sample distribution of the phase estimates is significantly different from a normal distribution. The color of the noise does not influence the distributional shape of the phase estimates (Fig. 4). The phase values are in general shifted along the x-axis when persistence is included, but this is corrected for in the REDFIT-X program: the phase values are centered around zero by subtracting the average phase value of the Monte Carlo ensemble from each phase value.
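The normality check and the centering step can be sketched as follows. The binning is an assumption (ten equal-probability bins under the fitted normal, giving 10 - 1 - 2 = 7 degrees of freedom and a 5% critical value of 14.067); the paper does not state how the histogram was binned. The ensemble below is a deliberately skewed synthetic stand-in, not output from REDFIT-X, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

N_BINS = 10
CHI2_CRIT_5PCT_DF7 = 14.067  # upper 5% point of chi-square with 7 degrees of freedom


def chi2_normality_statistic(sample, n_bins=N_BINS):
    """Pearson chi-square statistic against a normal fitted to the sample."""
    sample = np.asarray(sample, dtype=float)
    fitted = NormalDist(sample.mean(), sample.std(ddof=1))
    # Interior bin edges at equal-probability quantiles of the fitted normal.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed = np.bincount(np.searchsorted(edges, sample), minlength=n_bins)
    expected = sample.size / n_bins
    return float(np.sum((observed - expected) ** 2 / expected))


def rejects_normality(sample):
    """True if the 10-bin statistic exceeds the 5% critical value."""
    return chi2_normality_statistic(sample) > CHI2_CRIT_5PCT_DF7


rng = np.random.default_rng(1)
mc_phases = rng.exponential(20.0, size=10_000)  # synthetic, skewed stand-in (degrees)
centered = mc_phases - mc_phases.mean()         # center the ensemble around zero
print(rejects_normality(centered))
```

Centering only shifts the ensemble, so it changes neither the shape of the histogram nor the outcome of the test.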

Fig. 5 shows a comparison of the Monte Carlo phase confidence intervals estimated with $n_{sim} = 1000$ and the theoretical phase confidence intervals. The Monte Carlo phase confidence intervals are slightly wider than the theoretical ones.
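One way to form a phase confidence interval from a Monte Carlo ensemble is a percentile interval: center the simulated phase values, take the lower and upper percentiles, and add them to the estimated phase. The sketch below illustrates that idea under stated assumptions; it ignores phase wrapping at ±180°, uses a synthetic normal ensemble in place of simulated phase values, and the function name is hypothetical.

```python
import numpy as np


def monte_carlo_phase_interval(phase_est_deg, mc_phases_deg, alpha=0.05):
    """Percentile interval for a phase estimate from a centered Monte Carlo ensemble."""
    mc = np.asarray(mc_phases_deg, dtype=float)
    centered = mc - mc.mean()  # remove the shift of the ensemble
    lower, upper = np.percentile(centered, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return phase_est_deg + lower, phase_est_deg + upper


rng = np.random.default_rng(7)
ensemble = rng.normal(5.0, 20.0, size=10_000)  # synthetic stand-in (degrees)
lo, hi = monte_carlo_phase_interval(17.0, ensemble)
print(lo, hi)
```

Because the bounds come from empirical percentiles, the interval need not be symmetric about the estimate, unlike an interval based on a normal approximation.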

4. Application

As an example we applied the method to the same time series used in the SPECTRUM paper (Schulz and Stattegger, 1997). The first time series is the SPECMAP stack (oxygen isotope data, $\delta^{18}$O) (Imbrie et al., 1984) and the second time series is a proxy record of North Atlantic sea-surface temperature (SST) (Ruddiman et al., 1989). The example investigates the relation between global climate and regional SST changes at the Milankovitch periods of Earth orbital variations. The intention is not to answer the underlying climatological questions but to allow comparison between the outcomes of the new program and the previous SPECTRUM results. Therefore we used the same age model as in the previous paper, although this has since been revised (Lisiecki and Raymo, 2005). The oxygen isotope time series is evenly spaced, with time spacing $d = 2$ ka and number of data points $n = 325$ (Fig. 6a). The SST time series is unevenly spaced, with number of data points $n = 322$ and average spacing $\bar{d} = 2$ ka (Fig. 6b). The settings used in the cross-spectral analysis were the same as in the example in Schulz and Stattegger (1997): number of segments $n_{50} = 4$, Welch window, OFAC = 5 and HIFAC = 1.0. The parameters OFAC and HIFAC set the frequency grid of the spectra. OFAC is the oversampling factor that determines how many frequencies are investigated in the Lomb–Scargle Fourier transform, and HIFAC determines the highest frequency to analyse in the Fourier transform, HIFAC $\cdot f_{Nyq}$ (Schulz and Stattegger, 1997).

Fig. 6. (a) SPECMAP oxygen isotope data ($\delta^{18}$O). (b) North Atlantic sea-surface temperatures (SST).

Fig. 7. Estimated coherency and phase spectrum for the SPECMAP and SST time series from Fig. 6. (a) Coherency spectrum; the grey line indicates the 95% false-alarm level (the mean value over all frequencies) formed with 1000 Monte Carlo simulations. (b) Phase spectrum; 95% Monte Carlo confidence intervals were estimated at frequencies where the coherency exceeds the false-alarm level.

The minimum values in the $\delta^{18}$O record are expected to correlate with maxima in the SST record. Therefore the sign of the $\delta^{18}$O record was changed prior to the analysis to prevent an artificial offset in the phase by 180°. The coherency spectrum indicated significant coherencies at frequencies 1/100 and 1/23 ka$^{-1}$ (Fig. 7a). The Monte Carlo false-alarm level ($\alpha = 0.05$) was estimated with 1000 simulations. The phase spectrum showed that the ice-volume minima lead SST maxima by 17° [−23.5°; 57.1°] ($f = 1/99.26$ ka$^{-1}$) and 95° [58.0°; 131.1°] ($f = 1/23.04$ ka$^{-1}$) (Fig. 7b). The confidence intervals for the phase were estimated with Monte Carlo simulations ($n_{sim} = 1000$). The results for the coherency spectrum were in agreement with the previous results of Schulz and Stattegger (1997). The Monte Carlo false-alarm level was slightly lower than the theoretical one; as mentioned in the previous section, there is a small difference between them when the number of segments is less than 7. The confidence intervals for the phase angles were wider than in the previous paper, as the Monte Carlo phase confidence interval is in general wider than the theoretical one. The seesaw shape of the phase spectrum is due to the fact that the mean sampling time in the two time series differs. This was discussed in Schulz and Stattegger (1997), and the SPECTRUM program has an option to estimate an aligned phase spectrum to make it easier to read the phase angles from the phase spectrum plot. When alignment is used, the phase angles need to be corrected back before interpreting the results. For simplicity, this alignment option is not included in the new version of the program.
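The role of OFAC, HIFAC and the number of segments in setting the frequency grid of the application can be sketched in Python. The specific rules used here are assumptions rather than statements from this section: the segment length is taken as the integer part of 2n/(n50 + 1) points, the frequency spacing as 1/(OFAC × segment duration), and the Nyquist frequency as 1/(2 × average spacing). The function name is hypothetical.

```python
import numpy as np


def frequency_grid(n_points, avg_spacing, n50, ofac, hifac):
    """Frequencies from 0 to hifac * f_Nyq for 50%-overlapping segments."""
    # Assumed segment length (in points) for n50 segments with 50% overlap.
    seg_points = int(2 * n_points / (n50 + 1))
    seg_duration = avg_spacing * seg_points
    # Oversampling by ofac refines the fundamental spacing 1 / seg_duration.
    df = 1.0 / (ofac * seg_duration)
    f_nyq = 1.0 / (2.0 * avg_spacing)
    f_max = hifac * f_nyq
    # Small tolerance guards against floating-point round-off in the division.
    n_freq = int(np.floor(f_max / df + 1e-9)) + 1
    return df * np.arange(n_freq)


# Values quoted for the evenly spaced oxygen isotope series: n = 325, d = 2 ka.
freqs = frequency_grid(n_points=325, avg_spacing=2.0, n50=4, ofac=5, hifac=1.0)
print(len(freqs), freqs[1], freqs[-1])
```

Under these assumptions the grid spacing is 1/1300 ka$^{-1}$, so a larger OFAC gives a finer grid without changing the spectral resolution, which is set by the segment duration.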

5. Discussion

Simulation approaches have previously been applied to estimate the false-alarm level for coherency. Pardo-Igúzquiza and Rodríguez-Tovar (2012) use Monte Carlo simulations to assess the statistical significance of the coherency spectrum estimated via the Lomb–Scargle cross-periodogram. They use a permutation test, which preserves the distribution of the underlying data, but they do not take into account the persistence of the observed time series, as they generate white noise processes. Faes et al. (2004) define a threshold value for zero coherence with three different types of surrogate analysis, where the surrogates preserve either the distributional shape or the spectral properties of the observed data. Huybers and Denton (2008) analyse the significance of the coherency, as well as confidence intervals for the phase, with a Monte Carlo simulation technique, but they do not consider unevenly spaced data.

We show that the Monte Carlo false-alarm level fits the theoretical false-alarm level when more than 6 segments are used. This is in agreement with results from Gallet and Julien (2011), who modified the false-alarm level equation (Eq. (10)) to take into account the effect of 50% overlapping segments. This is similar to what is done here with the effective number of segments. The persistence of the generated data in the Monte Carlo simulations does not influence the distributional shape of the coherency estimate or, consequently, the false-alarm level for coherency. Grinsted et al. (2004) likewise found that the color of the noise has little impact on the significance level for wavelet coherence formed with Monte Carlo simulations. Although there is only a small difference between the Monte Carlo false-alarm level and the theoretical false-alarm level, we highly recommend using the Monte Carlo false-alarm level to add more confidence to the result. It can be especially convenient when the method is applied to, for example, noisy or otherwise unusual data.

The Monte Carlo phase confidence intervals are slightly wider than the theoretical confidence intervals used in SPECTRUM. The sample distribution of the Monte Carlo phase estimates does not follow a normal distribution. In theory, the phase estimates are known to be approximately normally distributed (Bloomfield, 2000). When the approximation is good, that is, when the variance is small and the coherency is not small, the theoretical confidence interval from SPECTRUM can be suitable. Hannan (1970), for example, uses percentage points of the t distribution to form confidence intervals for the phase angle, as it is thought to give a better approximation of the distribution of the phase estimates than the normal distribution.

6. Conclusion

Here we present a computer program, REDFIT-X, that performs cross-spectral analysis of unevenly spaced paleoclimate time series. We use a Monte Carlo approach to estimate the uncertainty associated with coherency and phase. The program is an updated version of the existing programs REDFIT (Schulz and Mudelsee, 2002) and SPECTRUM (Schulz and Stattegger, 1997). Both auto- and cross-spectral analysis can now be performed within the same program, where the significance measures are estimated with Monte Carlo simulations.

A comparison between the empirical sample distribution of coherency formed by the Monte Carlo simulations and the theoretical distribution of coherency shows that we are able to reproduce the theoretical distribution with the Monte Carlo simulations. The color of the noise does not have a large influence on the distribution or the false-alarm level for coherency. The worked example demonstrates the applicability of the method to real paleoclimate time series and allows a direct comparison between the outcomes of the new program and the previous SPECTRUM results. The Monte Carlo phase confidence intervals are slightly wider than the theoretical confidence intervals for phase used in SPECTRUM. The program is suitable for unevenly spaced climate time series where the persistence of the data is taken into account when estimating the uncertainty associated with the coherency and phase estimates.

Acknowledgments

This study has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) - Marie-Curie ITN, under Grant agreement no. 238512, GATEWAYS project. Funding also came from the DFG-Research Center MARUM - The Ocean in the Earth System and GLOMAR - Bremen International Graduate School for Marine Sciences. We thank Heather Johnstone for helpful comments regarding the English. We thank two anonymous reviewers for constructive comments.

Appendix A. Supplementary data

Supplementary data associated with this paper can be found in the online version at http://dx.doi.org/10.1016/j.cageo.2016.03.001.

References

Allen, M.R., Smith, L.A., 1996. Monte Carlo SSA: detecting irregular oscillations in the presence of colored noise. J. Clim. 9, 3373–3404.

Bendat, J.S., Piersol, A.G., 2010. Random Data: Analysis and Measurement Procedures, 4th Edition. Wiley, New Jersey.

Benignus, V.A., 1969. Estimation of the coherence spectrum and its confidence interval using the fast Fourier transform. IEEE Trans. Audio Electroacoust. 17, 145–150.

Bloomfield, P., 2000. Fourier Analysis of Time Series: An Introduction, 2nd Edition. Wiley, New York.

Carter, G.C., 1977. Receiver operating characteristics for a linearly thresholded coherence estimation detector. IEEE Trans. Acoust. Speech Signal Process. 25, 90–92.

Carter, G.C., Knapp, C.H., Nuttall, A.H., 1973. Estimation of the magnitude-squared coherence function via overlapped fast Fourier transform processing. IEEE Trans. Audio Electroacoust. AU-21, 337–344.

Faes, L., Pinna, G.D., Porta, A., Maestri, R., Nollo, G., 2004. Surrogate data analysis for assessing the significance of the coherence function. IEEE Trans. Biomed. Eng. 51, 1156–1166.

Foster, M.R., Guinzy, N.J., 1967. The coefficient of coherence; its estimation and use in geophysical data processing. Geophysics 32, 602–616.

Gallet, C., Julien, C., 2011. The significance threshold for coherence when using the Welch's periodogram method: effect of overlapping segments. Biomed. Signal Process. Control 6, 405–409.

Goodman, N.R., 1957. On the Joint Estimation of the Spectra, Cospectrum, and Quadrature Spectrum of a Two-dimensional Stationary Gaussian Process. Scientific Paper No. 10 (AD 134919). New York University, New York.

Grinsted, A., Moore, J.C., Jevrejeva, S., 2004. Application of the cross wavelet transform and wavelet coherence to geophysical time series. Nonlinear Process. Geophys. 11, 561–566.

Hannan, E.J., 1970. Multiple Time Series. Wiley, New York.

Harris, F.J., 1978. On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE 66, 51–83.

Hasselmann, K., 1976. Stochastic climate models: Part I. Theory. Tellus 28, 473–485.

Huybers, P., Denton, G., 2008. Antarctic temperature at orbital timescales controlled by local summer duration. Nat. Geosci. 1, 787–792.

Imbrie, J., Hays, J.D., Martinson, D.G., McIntyre, A., Mix, A.C., Morley, J.J., Pisias, N.G., Prell, W.L., Shackleton, N.J., 1984. The orbital theory of Pleistocene climate: support from a revised chronology of the marine $\delta^{18}$O record. In: Berger, A., Imbrie, J., Hays, H., Kukla, G., Saltzman, B. (Eds.), Milankovitch and Climate, Part I. D. Reidel Publishing, Dordrecht, pp. 269–305.

Koopmans, L.H., 1995. The Spectral Analysis of Time Series. Academic Press, New York.

Lisiecki, L.E., Raymo, M.E., 2005. A Pliocene–Pleistocene stack of 57 globally distributed benthic $\delta^{18}$O records. Paleoceanography 20. http://dx.doi.org/10.1029/2004PA001071, PA1003.

Lomb, N.R., 1976. Least-squares frequency analysis of unequally spaced data. Astrophys. Space Sci. 39, 447–462.

Miles, J.H., 2006. Aligned and Unaligned Coherence: A New Diagnostic Tool, AIAA Paper No. 2006-10. Technical Report NASA/TM-2006-214112, NASA, Cleveland, OH.

Miles, J.H., 2011. Estimation of signal coherence threshold and concealed spectral lines applied to detection of turbofan engine combustion noise. J. Acoust. Soc. Am. 129, 3068–3081.

Mudelsee, M., 2002. TAUEST: a computer program for estimating persistence in unevenly spaced weather/climate time series. Comput. Geosci. 28, 69–72.

Mudelsee, M., 2010. Climate Time Series Analysis: Classical Statistical and Bootstrap Methods, 1st Edition. Springer, Dordrecht.

Pardo-Igúzquiza, E., Rodríguez-Tovar, F.J., 2012. Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb–Scargle periodogram and Monte Carlo evaluation of statistical significance. Comput. Geosci. 49, 207–216.

Pearson, J., 2009. Computation of Hypergeometric Functions [Master's thesis]. University of Oxford.

Ruddiman, W.F., Raymo, M.E., Martinson, D.G., Clement, B.M., Backman, J., 1989. Pleistocene evolution: northern hemisphere ice sheets and North Atlantic Ocean. Paleoceanography 4, 353–412.

Scargle, J.D., 1982. Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data. Astrophys. J. 263, 835–853.

Schulz, M., Mudelsee, M., 2002. REDFIT: estimating red-noise spectra directly from unevenly spaced paleoclimatic time series. Comput. Geosci. 28, 421–426.

Schulz, M., Stattegger, K., 1997. SPECTRUM: spectral analysis of unevenly spaced paleoclimatic time series. Comput. Geosci. 23, 929–945.

von Storch, H., Zwiers, F.W., 2003. Statistical Analysis in Climate Research. Cambridge University Press, Cambridge.

Welch, P.D., 1967. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. AU-15, 70–73.