Evaluation of proxy-based millennial reconstruction methods
Terry C. K. Lee · Francis W. Zwiers · Min Tsao
Received: 3 July 2007 / Accepted: 9 November 2007
© Springer-Verlag 2007
Abstract A range of existing statistical approaches for
reconstructing historical temperature variations from proxy
data are compared using both climate model data and real-
world paleoclimate proxy data. We also propose a new
method for reconstruction that is based on a state-space
time series model and Kalman filter algorithm. The state-
space modelling approach and the recently developed
RegEM method generally perform better than their com-
petitors when reconstructing interannual variations in
Northern Hemispheric mean surface air temperature. On
the other hand, a variety of methods are seen to perform
well when reconstructing surface air temperature variabi-
lity on decadal time scales. An advantage of the new
method is that it can incorporate additional, non-tempera-
ture, information into the reconstruction, such as the
estimated response to external forcing, thereby permitting a
simultaneous reconstruction and detection analysis as well
as future projection. An application of these extensions is
also demonstrated in the paper.
Keywords Kalman filter · State-space model · Temperature reconstruction
1 Introduction
A number of studies have attempted to reconstruct hemi-
spheric mean temperature for the past millennium from
proxy climate indicators. However, the available recon-
structions vary considerably. Different statistical methods
are employed in these reconstructions and it therefore
seems natural to ask how much of this discrepancy is
caused by the variations in methods. The reliability of
some of the reconstruction methods has been looked at in
different studies (e.g. Zorita et al. 2003; von Storch et al.
2004; Bürger and Cubasch 2005; Esper et al. 2005; Mann
et al. 2005; Zorita and von Storch 2005; Bürger et al. 2006;
Juckes et al. 2006).
In this paper, we provide an empirical comparison
between different reconstruction methods. Analyses are
carried out using both pseudo-proxy data from climate
models and real-world paleoclimate proxy data. We will
also propose a method for reconstruction that is based on a
state-space time series model and Kalman filter algorithm.
As an option, this method allows one to simultaneously
incorporate historical forcing information to reconstruct the
unknown historical temperature, and to carry out detection
analysis to quantify the influence of external forcing on
historical temperature.
The remainder of this paper is organized as follows. The
existing statistical procedures used for reconstruction are
discussed in Sect. 2, together with the approach that uti-
lizes a state-space time series model and Kalman filter
algorithm. Following the general approach proposed by
von Storch et al. (2004), comparisons between these
methods are made with the help of both climate model data
and real-world paleoclimate proxy data. The results are
presented in Sect. 3. Concluding remarks are given in
Sect. 4.
T. C. K. Lee · M. Tsao
Department of Mathematics and Statistics,
University of Victoria, Victoria, BC, Canada

F. W. Zwiers (✉)
Climate Research Division, Environment Canada,
4905 Dufferin Street, Toronto, ON M3H 5T4, Canada
e-mail: [email protected]
Clim Dyn
DOI 10.1007/s00382-007-0351-9
2 Reconstruction approaches
2.1 Existing approaches
All reconstruction approaches, in general, involve statisti-
cal procedures that map the proxy series onto the
reconstructed temperature. The mapping involves first
finding the relationship between the proxy data and the
instrumental record during a calibration period when both
records are available. This relationship is then applied to
the proxy data to provide the reconstructed historical
temperature.
Mann et al. (2005) classified the existing reconstruction
approaches into two major categories: composite plus scale
(CPS) methods and climate field reconstruction (CFR)
methods. In the CPS approach, a number of proxy series
are first combined to form a composite record, which
is then used to reconstruct the temporal evolution of mean
temperature over some spatial domain, typically the
hemispheric mean. In many studies, the reconstruction
target is the northern hemispheric mean temperature
because proxy data, derived mainly from trees, are mostly
available from the northern hemisphere land masses. The
composite record is typically formed by calculating a
weighted average of all the available proxy series. The
weights can either be uniform (e.g. Jones et al. 1998; Briffa
et al. 2001; Esper et al. 2002) or can be determined using
the correlation between the proxy series and the instru-
mental record during the calibration period (e.g. Hegerl
et al. 2007). The correlation based weighting scheme has
the advantage of minimizing the influence of potentially
unreliable proxy series on the composite record. One recent
study by Moberg et al. (2005) used a different averaging
scheme, in which high-frequency and low-frequency
composites were first formed individually using wavelet
transformation. Each of these composites was formed using
only proxy indicators that were thought to be able to cap-
ture variability in the corresponding frequency range. The
two composites are then combined to form a single com-
posite. This averaging scheme can be beneficial when the
individual proxy series are known to capture variability at
different timescales.
Once the composite is formed, it is then calibrated to
produce the reconstructed temperature. One calibration
approach, the forward regression approach, utilizes the
ordinary least squares regression model which assumes
that, at time t, for t = n + 1, n + 2, …, n + m,
    Tt = aPt + et    (1)

where n + m and m are the lengths of the composite proxy record and the instrumental record, respectively (m is in general much smaller than n), T is the hemispheric mean instrumental record, P is the composite proxy record and et represents error that results from incomplete spatial sampling in the instrumental record. The coefficient a scales Pt to Tt and is estimated by ordinary least squares regression, which minimizes the residual sum of squares between aPt and Tt during the calibration period. By assuming that Eq. (1) also holds at times prior to the calibration period, the unknown historical hemispheric mean temperature can then be reconstructed by scaling the pre-calibration period composite record Pt by â, for t = 1, 2, ..., n. The estimate of a, denoted by â, is in general negatively biased because of the measurement error and non-temperature variability inherent in the composite proxy series (see, e.g. Fuller 1987; Allen and Stott 2003). This under-estimation will result in a loss of variance in the reconstructed series because Var(âP) < Var(aP) for 0 < â < a.

One way to avoid this underestimation in a is to use the
total least squares (TLS) method to estimate a (see Allen and Stott 2003 for an application of this technique in climate change detection analysis). In this case the statistical relationship between Pt and Tt is defined by

    Tt = a(Pt − gt) + et    (2)

where the additional term gt represents non-temperature variability in the composite proxy series. By explicitly incorporating gt into the regression model, the underlying value of a can, in theory, be better estimated. Hegerl et al. (2007) applied the total least squares method in their reconstruction. To derive the best-guess estimate of a, one needs to know the ratio of Var(gt) to Var(et). Since this ratio is unknown in real-world applications, it is estimated using climate model simulations. Hegerl et al. (2007) find that their reconstruction is insensitive to the precise choice of the Var(gt) to Var(et) ratio.

Another CPS reconstruction method, which is termed
the variance matching approach, is favored by Jones et al.
(1998) and others. In this approach, the pre-calibration
period Pt is scaled by a parameter b, where b is determined so that, during the calibration period, Var(T) = Var(bP).
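As a minimal numerical sketch (with an invented synthetic calibration series, not the paper's data), the variance-matching factor b is simply the ratio of calibration-period standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration-period series (illustrative only).
T = rng.normal(0.0, 0.8, 100)             # instrumental record
P = 0.5 * T + rng.normal(0.0, 0.4, 100)   # composite proxy record

# Choose b so that Var(b*P) = Var(T) over the calibration period.
b = np.sqrt(np.var(T) / np.var(P))

# In practice b would then scale the pre-calibration composite record.
scaled = b * P
print(np.isclose(np.var(scaled), np.var(T)))  # → True
```

By construction Var(bP) = b²Var(P) = Var(T), so the reconstruction reproduces the calibration-period variance exactly, regardless of how noisy the composite is.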
A variant of the forward regression model (Eq. 1) is the
inverse regression model (see Brown 1993, for an introduction; Coelho et al. 2004, for an example)

    Pt = fTt + gt    (3)

where gt is defined similarly as in Eq. (2) and f is estimated by minimizing the residual sum of squares between fTt and Pt during the calibration period. The reconstructed hemispheric temperature is then estimated by Pt/f. Different implementations of the inverse regression method have
been used in the paleoclimate reconstruction literature. In
Mann et al. (1998), the regression is done between the principal components of the instrumental record and the individual proxy indicators (see the latter part of this section for details), some of which were obtained by principal component analysis from a set of proxy series. In Juckes
et al. (2006), the inverse regression is done between each
individual proxy series and the Northern Hemisphere
annual mean temperature. A weighted average of the
individual proxy series, weights determined by the
regression coefficients, is then used as the reconstructed
series.
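A sketch of the CPS version of inverse regression (Eq. 3) on synthetic data: f is estimated by regressing the composite proxy on temperature over the calibration period, and a pre-calibration temperature is then estimated as Pt divided by the estimate of f. All values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

f_true = 0.7
T_cal = rng.normal(0.0, 1.0, 150)                    # calibration-period temperature
P_cal = f_true * T_cal + rng.normal(0.0, 0.2, 150)   # Pt = f*Tt + gt

# Least squares estimate of f in Pt = f*Tt + gt
# (proxy regressed on temperature, i.e. inverse regression).
f_hat = (T_cal @ P_cal) / (T_cal @ T_cal)

# A pre-calibration composite value is mapped to temperature as Pt / f_hat.
P_pre = rng.normal(0.0, 0.7, 50)   # hypothetical pre-calibration composite
T_recon = P_pre / f_hat

print(abs(f_hat - f_true) < 0.1)
```

With the proxy noise much smaller than the temperature variance, f̂ lands close to the true coefficient, illustrating why the bias in inverse regression is governed by noise in Tt rather than noise in Pt.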
Note that the instrumental record Tt is subject to sam-
pling uncertainty (see Jones et al. 1997 for an estimate of
this uncertainty) and neglecting such uncertainty when
estimating f in inverse regression will result in an estimate that is negatively biased. However, such bias would be
smaller than the bias incurred by using forward regression
because the non-temperature variability inherent in the
composite record is usually larger than the sampling
uncertainty in the instrumental record. On the other hand,
the total least squares approach used by Hegerl et al.
(2007), in theory, should provide a more accurate estimate
of the regression coefficient when both Pt and Tt are noise
contaminated. However, this only holds if the ratio of
Var(gt) to Var(et), which is required to derive the estimate of a, is known. This ratio is unknown in real-world applications and is estimated using a limited number of
climate model simulations. The bias from such estimation
is hard to determine and thus its impact on the recon-
struction is unclear. For this reason, it is not obvious
whether inverse regression or total least squares regression
will result in smaller bias.
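The attenuation bias discussed above is easy to demonstrate numerically. In this sketch (synthetic data, with proxy noise variance set equal to the signal variance), the forward-regression estimate of a in Eq. (1) is computed over many synthetic calibration periods and comes out well below its true value:

```python
import numpy as np

rng = np.random.default_rng(7)

n_cal, trials = 120, 200
a_true = 1.0
errors = []
for _ in range(trials):
    T_sig = rng.normal(0.0, 1.0, n_cal)               # true temperature signal
    P = T_sig + rng.normal(0.0, 1.0, n_cal)           # proxy: signal + equal-variance noise
    T = a_true * T_sig + rng.normal(0.0, 0.1, n_cal)  # instrumental record
    a_hat = (P @ T) / (P @ P)                         # forward OLS estimate of a
    errors.append(a_hat - a_true)

mean_bias = float(np.mean(errors))
# Errors-in-variables theory attenuates a_hat toward
# a_true * Var(signal) / (Var(signal) + Var(noise)) = 0.5 here.
print(mean_bias < 0.0)
```

The average estimate is roughly half the true coefficient, which is the variance-loss mechanism that motivates the TLS, variance-matching and inverse-regression alternatives.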
In the case of the CFR approach, the proxy series are
used to reconstruct both the underlying temporal and spa-
tial patterns of historical temperature. The Mann et al.
(1998) method (often referred to as the MBH method) is an
example of a technique that uses the climate field recon-
struction approach. The MBH method brings together
techniques used in principal component analysis and
regression. The instrumental record is first decomposed
into its spatial and temporal parts through principal com-
ponent analysis and only a subset of the components are
retained. Next, the relationship between the subset of
temporal principal components (PCs) and the ith proxy
series (i = 1, 2, ..., p) during the calibration period is
obtained by inverse regression which estimates the coeffi-
cients bj^(i) (j = 1, 2, ..., Neof) in

    [ Pn+1^(i) ]        [ b1^(i)    ]
    [ Pn+2^(i) ]  =  U  [ b2^(i)    ]  +  d^(i)
    [    ...   ]        [    ...    ]
    [ Pn+m^(i) ]        [ bNeof^(i) ]

where Neof is the number of EOFs retained, Pt^(i) is the ith proxy datum at time t, U is the matrix of temporal PCs (m × Neof) of the instrumental record and d^(i) is a noise series (m × 1). This procedure is repeated for each proxy series and this yields a matrix of coefficients, denoted by G (p × Neof), which is defined as
    G = [ b1^(1)  b2^(1)  ...  bNeof^(1) ]
        [ b1^(2)  b2^(2)  ...  bNeof^(2) ]
        [  ...     ...    ...     ...    ]
        [ b1^(p)  b2^(p)  ...  bNeof^(p) ]
The pre-calibration period temporal PCs at time t, denoted by Zt (Neof × 1), are then reconstructed through least squares regression using

    [ Pt^(1) ]
    [ Pt^(2) ]  =  G Zt + jt
    [   ...  ]
    [ Pt^(p) ]

where jt is a noise series (p × 1). The sequence of reconstructed temporal PCs, Ẑt, is then scaled to have the
same variance as the instrumental temporal PCs over the
calibration period and subsequently re-combined with
the calibration period spatial PCs to provide an estimate of
the unknown local temperature. A hemispheric mean
reconstruction is then formed by the appropriate spatial
average of the reconstructed local temperatures. The
regression procedures above are both inverse regression.
However, unlike the CPS inverse regression, the individual
proxy series are used in the regression process. Also, there
are Neof coefficients to estimate in each regression, com-
pared to only one coefficient in the CPS case.
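A minimal sketch of the two MBH-style regression steps, using random matrices in place of real instrumental PCs and proxies (the EOF truncation, rescaling and spatial recombination steps are omitted; all sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

m, p, n_eof = 80, 12, 4     # calibration length, proxies, retained EOFs (toy values)

U = rng.normal(size=(m, n_eof))   # temporal PCs of the instrumental record
P_cal = rng.normal(size=(m, p))   # calibration-period proxy matrix

# Step 1: regress each proxy on the temporal PCs; row i of G holds the
# coefficients b_j^(i).  All p regressions solved at once via least squares.
G = np.linalg.lstsq(U, P_cal, rcond=None)[0].T   # G is p x n_eof

# Step 2: for a pre-calibration time t, recover Z_t from the p proxy
# values by least squares in P_t = G Z_t + noise.
P_t = rng.normal(size=p)
Z_t = np.linalg.lstsq(G, P_t, rcond=None)[0]     # n_eof-vector

print(G.shape, Z_t.shape)
```

Note that step 2 inverts a regression fitted in step 1, which is why the text describes both stages as inverse regression with Neof coefficients per proxy.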
Another CFR method uses the regularized expectation
maximization (RegEM) algorithm (Schneider 2001) and is
advocated by Rutherford et al. (2005) and others. Let Tt^(j) be the local temperature at time t at the jth location (j = 1, 2, ..., q) and let X be an (n + m) × (q + p) matrix with missing data which is defined as

    X = [ T1^(1)    T1^(2)    ...  T1^(q)    P1^(1)    P1^(2)    ...  P1^(p)   ]
        [ T2^(1)    T2^(2)    ...  T2^(q)    P2^(1)    P2^(2)    ...  P2^(p)   ]
        [   ...       ...     ...    ...       ...       ...     ...    ...    ]
        [ Tn+m^(1)  Tn+m^(2)  ...  Tn+m^(q)  Pn+m^(1)  Pn+m^(2)  ...  Pn+m^(p) ]

      ≝ [ T(1)     P(1)   ]
        [ T(2)     P(2)   ]
        [  ...      ...   ]
        [ T(n+m)  P(n+m)  ]
with
    E[T(t) P(t)] = μ = [μT μP], for t = 1, 2, ..., n + m

    Var(X) = Σ = [ ΣTT  ΣTP ]
                 [ ΣPT  ΣPP ]

where μT and μP have length q and p, respectively, T(t) = [Tt^(1) Tt^(2) ... Tt^(q)] and P(t) = [Pt^(1) Pt^(2) ... Pt^(p)]. The (q + p) × (q + p) matrix Σ is partitioned according to T and P so that ΣTT, ΣPP and ΣTP = ΣPT^T are q × q, p × p and q × p matrices, respectively. In the matrix X, T(t) is unknown for t = 1, 2, ..., n. It is assumed that each record T(t) can be represented by a linear model of the form

    T(t) = μT + (P(t) − μP)B + ε(t).
Here, ε(t) is the residual and B is a p × q matrix of regression coefficients. The reconstructed temperature T̂(s) at time s (s = 1, 2, ..., n) is thus defined as μT + (P(s) − μP)B.

To calculate T̂(s), estimates of B and μ are required. If n + m ≥ p + 1, the expectation maximization (EM) algorithm provides a way to iteratively estimate these parameters (see, e.g. Schneider 2001). First, initialize the unknown T̂(t) with some values and obtain an estimate of the mean and covariance of matrix X (see Schneider 2001, for the formulas). Then, estimate B using the maximum likelihood estimate B̂ = Σ̂PP^(−1) Σ̂PT, where Σ̂PP and Σ̂PT denote the partitioned covariance matrix estimates. If n + m < p + 1, the estimate of ΣPP is singular and the coefficient B is not defined. Next, impute the missing T̂(t) by μ̂T + (P(t) − μ̂P)B̂ and update the X matrix. Re-estimate μ, Σ and B using the updated matrix and then recalculate the missing temperatures. This process is continued until the change in imputed values becomes sufficiently small.

In a typical CFR application, n + m is greater than p + 1. However, the estimate of Σ is rank deficient because n + m < p + q + 1, which can result in a poor estimate of the coefficient B. Hence, the RegEM algorithm is used instead.
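The EM imputation cycle described above can be sketched as follows. This is a toy, well-conditioned case (synthetic data, n + m well above p + q + 1, no regularization), so the plain maximum likelihood estimate B̂ = Σ̂PP⁻¹Σ̂PT is used directly:

```python
import numpy as np

rng = np.random.default_rng(4)

n, m, q, p = 60, 40, 2, 3        # toy sizes with n + m >= p + q + 1
B_true = rng.normal(size=(p, q))
P_all = rng.normal(size=(n + m, p))
T_all = P_all @ B_true + 0.1 * rng.normal(size=(n + m, q))

T_obs = T_all.copy()
T_obs[:n] = 0.0                  # initialize the unknown pre-calibration rows

for _ in range(50):
    X = np.hstack([T_obs, P_all])
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)           # estimate of Sigma
    S_PP = S[q:, q:]                      # Sigma_PP block
    S_PT = S[q:, :q]                      # Sigma_PT block
    B_hat = np.linalg.solve(S_PP, S_PT)   # B = Sigma_PP^{-1} Sigma_PT
    # Impute the missing temperatures: mu_T + (P - mu_P) B
    T_new = mu[:q] + (P_all[:n] - mu[q:]) @ B_hat
    done = np.max(np.abs(T_new - T_obs[:n])) < 1e-8
    T_obs[:n] = T_new
    if done:
        break

rmse = np.sqrt(np.mean((T_obs[:n] - T_all[:n]) ** 2))
print(rmse < 0.2)
```

In the realistic rank-deficient case (n + m < p + q + 1) the `solve` step above would fail or be unstable, which is exactly where RegEM substitutes a regularized regression for the maximum likelihood estimate.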
The RegEM algorithm consists of the same steps as the EM
algorithm, except the maximum likelihood estimate B̂ is
replaced with a regularized estimate, which is obtained by a
regularized regression procedure. Different regularization
scheme have been used. In Rutherford et al. (2005) and
Mann et al. (2005), the ridge regression scheme is used to
estimate the coefficient matrix B. Readers are referred to
Schneider (2001) for more details of the ridge regression
scheme. In Mann et al. (2007), truncated total least squares
(TTLS) (Fierro et al. 1997) is used to estimate B.
2.2 State-space model and Kalman filter algorithm
In addition to the methods described above, we propose a
new method to be used for reconstructing historical tem-
perature that is based on a state-space time series model
and the Kalman filter algorithm. (Kalman 1960; see Harvey
1989 or Durbin and Koopman 2001, for an introduction).
The proposed technique is essentially a further variant on
inverse regression. A state-space time series model is a
system that consists of two equations: the observation
equation and the state equation. In the context of paleo-
climate reconstruction, the observation equation describes
the relationship between the proxy series and the unknown
hemispheric mean temperature. On the other hand, the state
equation models the dynamics of the unknown hemispheric
mean temperature. Based on the state-space model, one can
estimate the unknown hemispheric mean temperature using
the Kalman filter algorithm, a Bayesian updating scheme
that estimates the unobserved hemispheric mean tempera-
ture based on the proxy data. The estimates from the
Kalman filter algorithm are the best linear estimator of the
unobserved process in the sense of minimum mean square
error. The Kalman filter algorithm can also be extended to
provide a forecast of the hemispheric mean temperature.
To describe the state-space representation of the hemi-
spheric mean temperature, we must first introduce some
notation and make certain assumptions. Thus, let GSt,
VOLt and SOLt be the climate model estimated responses
to greenhouse gas and sulphate aerosol forcing combined
(GS), volcanic forcing (VOL) and solar forcing (SOL) at
time t, respectively. We can then think of the hemispheric
mean temperature Tt as being the sum of the response to
external forcing plus the effect of internal variability. With
these assumptions, a reasonable statistical model for Tt (t = 1, 2, ..., n + m) might be

    Tt = s + dGS GSt + dVOL VOLt + dSOL SOLt + Zt = δXt + Zt    (4)

where δ = [s dGS dVOL dSOL], Xt = [1 GSt VOLt SOLt]^T, s is the mean state of the climate system, dGS, dVOL and dSOL are scaling factors that account for error in the magnitude of the estimated response to the specified external forcing, and Zt represents random variations resulting from internal variability. We further assume that Zt is an AR(1) process with lag-one autocorrelation φ. Thus,

    Zt = Σ_{j=0..∞} φ^j νt−j,    (5)

where the νt's are white noise with mean 0 and variance Q.

With these assumptions, our state-space representation
of the hemispheric mean temperature Tt (t = 1, 2, ...,
n + m) is given by the following system of equations:
    Tt = φTt−1 + δFt + νt
    Pt = fTt + gt    (6)

where Ft = Xt − φXt−1 and gt is white noise with mean zero and variance R. The first equation is the state equation
which determines how the unknown hemispheric mean
temperature evolves over time. It is obtained by considering the difference between Tt and φTt−1 using Eq. (4). This equation assumes that Tt follows an autoregressive process of order 1, i.e. AR(1), with exogenous variables Ft and additive white noise νt. The state equation in effect states that the rate of change in temperature depends upon the response to forcing, which is governed by the forcing coefficients dGS, dVOL and dSOL, and upon natural internal variability. Alternatively, one can also define a state equation that does not contain the response to forcings. This can be achieved by setting Xt = 1 and δ = s. The second equation in Eq. (6), which is the same equation as
used in inverse regression, is the observation equation. This
equation simply assumes that the relationship between the
composite proxy series and hemispheric temperature that
holds during the calibration period in Eq. (3) also holds in
the pre-calibration period.
The problem of estimating the pre-calibration period Tt in a state-space model can be approached by using the
Kalman filter and smoother algorithm. For notational pur-
poses, let Tt^s represent the estimate of T at time t given P1, P2, ..., Ps and F1, F2, ..., Fs, where s ≤ t. The Kalman filter estimates of Tt for t = 1, 2, ..., n + m, denoted by Tt^t, are defined by the following set of recursive equations (Kalman 1960; see also Shumway and Stoffer 2000, p. 313),

    Tt^(t−1) = φ Tt−1^(t−1) + δFt
    Tt^t = Tt^(t−1) + Kt (Pt − f Tt^(t−1))    (7)

where

    Kt = f St^(t−1) (f² St^(t−1) + R)^(−1)
    St^(t−1) = φ² St−1^(t−1) + Q
    St^t = St^(t−1) (1 − f Kt).
Appropriate initial conditions for this recursion are T0^0 = μ and S0^0 = Σ, where μ and Σ are constants. The Kalman prediction Tt^(t−1) is an estimate of Tt based on all the information we have at time t − 1, that is, our prior knowledge of Tt before observing Pt. Such prior knowledge is expressed through our Kalman filter estimate Tt−1^(t−1). After observing Pt, our knowledge of Tt is updated by calculating the Kalman filter estimate, Tt^t. Once all of the Kalman filter estimates are found, we can then further update our estimates of Tt based on the entire data set {Pt, Ft; t = 1, 2, ..., n} using the Kalman smoother algorithm. It should be noted that even though Tt is known for t = n + 1, ..., n + m, we will still provide filter estimates for them. This will enable us to utilize all of the available data to estimate the unknown Tt when using the Kalman smoother. With initial condition Tn+m^(n+m) obtained from the Kalman filter algorithm (Eq. 7), the Kalman smoother estimates Tt^(n+m) of Tt, for t = n + m − 1, n + m − 2, ..., 0, are

    Tt^(n+m) = Tt^t + Jt (Tt+1^(n+m) − Tt+1^t),    (8)

where Jt = φ St^t / St+1^t.
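As an illustration of Eqs. (6)–(8), the scalar filter and smoother recursions can be sketched in a few lines. All parameter values below (φ, f, Q, R, the zero exogenous term and the initial conditions) are invented for the example, not estimated as described in the Appendix:

```python
import numpy as np

rng = np.random.default_rng(5)

n_tot = 200
phi, f, Q, R = 0.6, 0.8, 0.5, 0.3   # illustrative parameter values
dF = np.zeros(n_tot)                # exogenous term delta*F_t (zero here)

# Simulate a state T and proxies P from the model in Eq. (6).
T = np.zeros(n_tot)
for t in range(1, n_tot):
    T[t] = phi * T[t - 1] + dF[t] + rng.normal(0.0, np.sqrt(Q))
P = f * T + rng.normal(0.0, np.sqrt(R), n_tot)

# Kalman filter (Eq. 7): predictions, updates and their variances.
Tf = np.zeros(n_tot)   # filtered estimates T_t^t
Sf = np.zeros(n_tot)   # filtered variances S_t^t
Tp = np.zeros(n_tot)   # predictions T_t^(t-1)
Sp = np.zeros(n_tot)   # prediction variances S_t^(t-1)
T_prev, S_prev = 0.0, 1.0            # initial conditions mu, Sigma
for t in range(n_tot):
    Tp[t] = phi * T_prev + dF[t]
    Sp[t] = phi ** 2 * S_prev + Q
    K = f * Sp[t] / (f ** 2 * Sp[t] + R)
    Tf[t] = Tp[t] + K * (P[t] - f * Tp[t])
    Sf[t] = Sp[t] * (1 - f * K)
    T_prev, S_prev = Tf[t], Sf[t]

# Kalman smoother (Eq. 8), run backwards from the last filtered estimate.
Ts = Tf.copy()
for t in range(n_tot - 2, -1, -1):
    J = phi * Sf[t] / Sp[t + 1]
    Ts[t] = Tf[t] + J * (Ts[t + 1] - Tp[t + 1])

# The smoother also uses future data, so on average it should not do worse.
mse_filter = np.mean((Tf - T) ** 2)
mse_smooth = np.mean((Ts - T) ** 2)
print(mse_smooth <= mse_filter)
```

The backward pass is what lets calibration-period observations inform pre-calibration estimates, which is why filter estimates are computed even where Tt is known.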
A difficulty in using the state-space time series model is that the parameters, {f, R, φ, δ, Q, μ, Σ}, are unknown and need to be estimated. An optimization algorithm such as Newton–Raphson scoring or the EM algorithm can be used to estimate them numerically (see Appendix for details; see Shumway and Stoffer 2000 for examples). Note that the optimization algorithm for the state-space model assumes that the exogenous variable Ft is known. If φ is unknown, the exogenous variables therefore cannot contain the parameter φ. Thus, in our state-space representation of the hemispheric mean temperature, one needs to fix the parameter φ that is involved in Ft = Xt − φXt−1 before estimating the other parameters. More details on the choice of φ are given in the next section. Note that there is no need to fix the parameter φ in front of Tt−1 in Eq. (6) in order to use the optimization algorithm. Thus, this parameter can be estimated together with the other parameters.
The state-space model approach can also be extended to
produce forecasts. Provided that Ft is known for the future
time period, one can use the state equation to generate a
forecast recursively. Recall that the state equation at time t is given by Tt = φTt−1 + δFt + νt. By forecasting the error term νt as zero, the forecast of Tn+m+s (s = 1, 2, ...) can be given by

    T̃n+m+s = φ T̃n+m+s−1 + δFn+m+s    (9)

with T̃n+m = Tn+m being the known instrumental record at time n + m. Forecasts can also be made in the case when the response to external forcing is not included in Eq. (6), i.e. when Xt = 1 and δ = s. However, such forecasts will likely not be very useful because they do not take the impact of forcings into account and thus quickly revert to a forecast of the mean s.

An advantage of using the state-space model in Eq. (6)
is that it provides the flexibility to incorporate forcing
response information into the estimation of the unknown
temperature. This is achieved in a two step process that first
determines the impact of each forcing on the unknown
temperature through the estimation of the forcing coeffi-
cients. The estimation process uses the information
available in both the proxy series and the calibration data.
From Eq. (6), it is obvious that if the forcing coefficient is
significantly different from zero, one can claim that the
corresponding forcing has a significant impact on the
hemispheric mean temperature. Such an assessment can be
made using the confidence bound of each coefficient
obtained during the estimation step (see the Appendix for
details). Once the coefficients are estimated, the forcing
information is then incorporated into the final estimate of
the unknown temperature using the Kalman filter and
smoother algorithm. In other words, the use of the state-
space model allows one to simultaneously reconstruct the
unknown hemispheric mean temperature and conduct a
detection assessment of the importance of the response to
GS, VOL and SOL forcing on hemispheric temperature.
Furthermore, the use of the state-space model allows one to
provide projections of future climate which are based on
parameters that are estimated using the proxy records and
past observations.
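The forecast recursion in Eq. (9) reduces to a few lines of code. The values of φ, the last observed temperature and the zero forcing term below are invented for illustration; with no forcing response the forecast decays geometrically toward the mean (zero here):

```python
import numpy as np

phi = 0.7                  # fixed AR(1) parameter (illustrative)
T_last = 1.5               # known instrumental value at time n + m
dF_future = np.zeros(10)   # delta*F_{n+m+s}; zero with no forcing response

# Eq. (9): T~_{n+m+s} = phi * T~_{n+m+s-1} + delta*F_{n+m+s}
forecast = []
T_tilde = T_last
for s in range(10):
    T_tilde = phi * T_tilde + dF_future[s]
    forecast.append(T_tilde)
forecast = np.asarray(forecast)

# Without forcing, the forecast is T_last * phi^s: geometric reversion to the mean.
print(np.allclose(forecast, T_last * phi ** np.arange(1, 11)))  # → True
```

Supplying nonzero δF from a forcing scenario is what keeps the projection from collapsing to the mean state s.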
3 Results
3.1 Analysis with climate model simulations
The performance of a particular reconstruction method
depends on many factors, such as the statistical methods
used, the choice of proxies, the quality of the proxy records
and the target season or latitude band. It is, in general,
difficult to compare the performance of different recon-
struction methods using the past instrumental temperature
record because the instrumental period is simply too short
to calibrate such techniques and reliably assess their per-
formance. Climate model simulations, however, can
provide a test bed for assessing the reliability of these
methods as first introduced by Zorita et al. (2003; see also
von Storch et al. 2004; Mann et al. 2005; Zorita and von
Storch 2005 and others). In this paper, we will follow the
methodology proposed by von Storch et al. (2004). The
idea is to generate pseudo-proxy records by sampling a
selection of simulated grid-box temperatures from the cli-
mate model and degrading them with additive noise. The
reconstruction method is then applied to these pseudo-
proxy records and the resulting reconstruction is validated
against the known simulated hemispheric mean tempera-
ture record. To reflect the differences in the quality and
properties of real-world proxy data, different colours of
noise (e.g. red rather than white) and varying amplitudes of
the noise variance have been used in different studies.
In this section, we followed these procedures for testing
the reconstruction methods that are described in Sect. 2.
We have conducted a suite of experiments that explore the
sensitivity of each method to (1) the climate model simu-
lation used, (2) the length of the calibration period and (3)
the amount of noise introduced into the pseudo-proxy
series. The two simulations used are the GKSS ECHO-G
simulation (von Storch et al. 2004) and a simulation from
the NCAR CSM 1.4 model (Ammann et al. 2007). Both
simulations were forced with reconstructions of solar,
volcanic and greenhouse gas forcing. The CSM 1.4 run was
also forced with a reconstruction of aerosol forcing over
the millennium. We used the ECHO-G simulation in its
original form, rather than as adjusted (Osborn et al. 2006).
For both simulations, only output between years 1000 and
1990 is used. To represent the varying calibration intervals
that were used in actual reconstructions, we have used two
calibration periods in our analysis (1880–1960 and 1860–
1970). These calibration periods are similar to that used in
Hegerl et al. (2007) and Moberg et al. (2005). In our
experiments, pseudo-proxy records were formed by
degrading the grid box temperatures with additive noise.
The amount of noise introduced into the pseudo-proxy
series is expressed in terms of the signal-to-noise ratio
(SNR), which is defined as √(Var(X)/Var(N)), where X is
the grid box temperature series and N is the additive noise
series. For all methods, experiments with SNR = 0.5 and 1
were performed.
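The pseudo-proxy construction can be sketched as follows, using a random series as a stand-in for a simulated grid-box temperature and white (rather than red) noise; the function name is ours, not from the cited studies:

```python
import numpy as np

rng = np.random.default_rng(6)

grid_temp = rng.normal(0.0, 1.0, 991)   # stand-in for a simulated grid-box series

def make_pseudo_proxy(x, snr, rng):
    """Degrade a grid-box temperature series with additive white noise
    so that sqrt(Var(x) / Var(noise)) equals the requested SNR."""
    noise_sd = np.sqrt(np.var(x)) / snr
    noise = rng.normal(0.0, noise_sd, x.size)
    return x + noise, noise

proxy, noise = make_pseudo_proxy(grid_temp, snr=0.5, rng=rng)

achieved = np.sqrt(np.var(grid_temp) / np.var(noise))
print(round(achieved, 1))
```

Red noise would replace the white-noise draw with an AR(1) series scaled to the same variance, which is how varying noise colour is handled in the studies cited above.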
First, we obtain a pseudo northern hemisphere instru-
mental record from the simulation. CFR methods use a set
of continuous grid-box temperatures for analysis; we
therefore fixed the spatial coverage of the entire pseudo-
instrumental record at that of the instrumental network in
1920, as represented in the HadCRUT3 data set (Brohan
et al. 2006). The choice of the year 1920 is arbitrary, but it
corresponds roughly to the mid points of the calibration
periods that were used in our analysis. For the other meth-
ods, the pseudo NH mean instrumental record is calculated
from the appropriate areal average of the same grid-box
temperatures used above. This ensures that all methods are
provided with the same information for calibration.
Next, we define two pseudo-proxy networks of 15 and
100 randomly selected model grid boxes. These networks
were sampled from the 321 NH grid boxes that are co-
located with actual tree ring data found in the International
Tree Ring Data Base (http://www.ncdc.noaa.gov/paleo/
treering.html). These networks of grid-box temperature
were converted to pseudo-proxy records by degrading the
grid-box temperatures with added noise to mimic the
measurement error that is inherent in the proxy records.
The resulting pseudo-proxy series were then standardized
relative to the calibration period. Since the RegEM method
is computationally intensive, we will only apply this
method to the larger network of the two. The size of this
network is similar to networks that are used in the real-
world application of this technique. On the other hand, the
CPS methods and state-space model approach are less
computationally intensive, and these methods were tested
using both networks.
We used uniformly weighted spatial averages to form
the composite record in our experiments; composites that
are formed by wavelet transformation (Moberg et al. 2005)
or correlation based weighted averages (Hegerl et al. 2007)
are not considered here. Since the SNR and colour of the
noise in each pseudo-proxy series is the same, the different
weighting scheme should not substantially impact the
resulting composite record.
For the total least squares method, Var(et) is estimated by calculating the variability of the difference between the
pseudo-instrumental record and the climate model simu-
lated NH temperature over the whole reconstruction period.
Since the pre-noise contaminated composite record is
known in our experiment, Var(gt) is estimated by calculating the variability of the difference between the pre-noise
contaminated and noise contaminated composite pseudo-
proxy record, instead of following the procedure described
in Hegerl et al. (2007). This is a rather idealized situation
in the sense that we have used information that is not
available in real-world application to estimate the two
variances.
For the MBH method, we retained the 10 largest EOFs
instead of using the selection rule that was described in
Mann et al. (1998). We also repeated our analysis by
retaining the 5 or 15 largest EOFs and the results are almost
identical to that obtained with the 10 largest EOFs. Thus,
these results will not be shown. Also, no detrending was
done to the data prior to calibration (see Wahl et al. 2006;
von Storch et al. 2006 for further discussions on the use of
detrended data).
For the RegEM method, the hybrid non-stepwise
approach is used to reconstruct the grid box temperatures
(Rutherford et al. 2005; Mann et al. 2005). As in Ruther-
ford et al. (2005) and Mann et al. (2005), the ridge
regression procedure is used to regularize the EM algo-
rithm. Prior to reconstruction, a weight is applied to each
standardized proxy series to ensure that the error variance
of the signal in the series is homogeneous among all
records. The weight for the ith proxy series is defined as √(Var(P(i))/Var(S(i))), where P(i) is the ith proxy series and
S(i) is the signal in the ith proxy series. However, S(i) is
unknown in real-world applications. An approximation of
this weight can be provided by the sample correlation
coefficient between the proxy series and the associated grid
box temperature over the calibration period (M. Mann and
S. Rutherford, personal communication, 2006). Such a
weighting approach is not implemented in Rutherford et al.
(2005) and Mann et al. (2005). However, through our
experiments with the RegEM method (not shown) and
personal communications (2006) with M. Mann and S.
Rutherford, we confirmed that reconstruction of the CSM
hemispheric mean temperature is sensitive to whether
weighted proxy series are used. However, this only applies
to reconstructions that use ridge regression to regularize the
EM algorithm. Mann et al. (2007) found that results
obtained using TTLS regression for regularization are
insensitive to whether weights are used. They also pointed
out that regularization using TTLS regression is less
computationally intensive and tends to provide more robust
results than regularization with ridge regression. Hence
regularization using TTLS regression would be preferable.
However, we were not aware of these advantages at the
time of running our experiments, and hence we report on
results obtained with the ridge regression procedure. Mann
et al. (2005) found that RegEM reconstructions are rela-
tively insensitive to the use of a shorter calibration period
and hence, in this analysis, RegEM experiments were only
carried out for the 1860–1970 calibration period.
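The correlation-based weighting described above can be sketched as follows. The weight for each standardized proxy is its calibration-period correlation with the associated grid-box temperature (the function names are ours, not from the RegEM code):

```python
import math

def correlation(x, y):
    """Sample correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weight_proxies(proxies, grid_temps):
    """Weight each standardized proxy series by its calibration-period
    correlation with the associated grid-box temperature."""
    weights = [correlation(p, t) for p, t in zip(proxies, grid_temps)]
    return [[w * v for v in p] for w, p in zip(weights, proxies)]
```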
For the state-space model approach, we have run two
separate types of experiments in which the variable Ft in
Eq. (6) is defined differently. The first type of experiment
accounts for the impact of external forcing when recon-
structing the unknown temperature. This is achieved by
setting Xt = [1 GSt VOLt SOLt]T. The second type
of experiment only uses the information from the proxy
data for reconstruction; this is done by setting Xt = 1 with
a scalar coefficient δ. For experiments that investigate the impact of
external forcing, the variable Xt is obtained from an energy
balance model (EBM) driven with reconstructed solar,
volcanic and anthropogenic forcings (Hegerl et al. 2003,
2007). The EBM simulation used here is the same as that
used in Hegerl et al. (2007), from which we had available
the 30N–90N average response to greenhouse gas, sulfate
aerosol, volcanic and solar forcing.
In all the experiments, all parameters except one in Eq.
(6) are estimated through the EM algorithm (see Appendix
for details). The exception is the parameter φ in the
exogenous variable Ft, which needs to be estimated outside
of the EM algorithm. The value of φ will be data dependent
because φ represents the lag-one autocorrelation of internal
variability. For our analysis, it is estimated by calculating
the lag-one autocorrelation of the residuals that result from
fitting Eq. (4) using the CSM or the ECHO-G model
simulated northern hemisphere mean temperature as Tt and
the EBM simulated response to the forcings mentioned above
as Xt. For the CSM and ECHO-G simulations, φ is estimated to be 0.581 and 0.741, respectively. The exogenous
variable Ft used in our analysis is then obtained using these
φ values. The precise choice of φ turns out to have very
little impact on the resulting reconstruction (not shown).
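The lag-one autocorrelation used to fix φ can be computed directly from the residual series (a sketch; producing the residuals from the fit of Eq. (4) is not reproduced here):

```python
def lag_one_autocorrelation(resid):
    """Lag-one sample autocorrelation of a residual series."""
    n = len(resid)
    m = sum(resid) / n
    num = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, n))
    den = sum((r - m) ** 2 for r in resid)
    return num / den
```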
Annual mean data are used for all reconstruction
methods, with some exceptions. Following the typical CPS
procedure, the parameters a and b in the forward regression
and variance matching methods are estimated from decadally
smoothed data. The estimated parameters are then used to
scale the annual mean pseudo-proxy
series to provide the reconstructed annual hemispheric
mean temperature. For comparison, we also reconstruct the
temperature using parameters that are estimated with
annual mean data and we will denote such methods as the
non-smoothed forward regression and non-smoothed vari-
ance matching methods. On the other hand, as in Mann
et al. (1998), the pseudo-instrumental record used in the
principal component analysis of the MBH method is
expressed as monthly means and annual mean PCs are
subsequently obtained from the monthly mean PCs for
analysis.
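The smoothed CPS estimation described above can be sketched with the variance matching method: the scale and offset are fitted to decadally smoothed calibration data, then applied to the annual composite. A simplified sketch, in which a plain running mean stands in for whatever decadal filter is actually used:

```python
def moving_average(x, window=10):
    """Simple running mean as a stand-in for decadal smoothing."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

def variance_matching_params(temp, composite, window=10):
    """Estimate scale b and offset a so the smoothed composite matches
    the mean and standard deviation of the smoothed temperature."""
    ts = moving_average(temp, window)
    cs = moving_average(composite, window)
    mt, mc = sum(ts) / len(ts), sum(cs) / len(cs)
    st = (sum((v - mt) ** 2 for v in ts) / (len(ts) - 1)) ** 0.5
    sc = (sum((v - mc) ** 2 for v in cs) / (len(cs) - 1)) ** 0.5
    b = st / sc
    a = mt - b * mc
    return a, b

def reconstruct(composite, a, b):
    """Scale the annual composite to reconstructed temperature."""
    return [a + b * v for v in composite]
```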
Figure 1a shows examples of reconstructed NH tem-
perature evolution simulated by the CSM. The
reconstructions are based on 15 pseudo-proxy series with
SNR = 0.5. It should be noted that the same pseudo-proxy
series and pseudo-instrumental record were used for each
method to ensure that differences in performance are solely
due to the methods themselves. Among the methods
considered, most provided a faithful estimate of the
simulated NH temperature. The forward regression
(smoothed and non-smoothed) and the non-smoothed vari-
ance matching methods are exceptions. Comparing the
series reconstructed with inverse regression and total least
squares regression, it is clear that the two series are visually
indistinguishable, suggesting that the neglect of sampling
uncertainty in the instrumental record when estimating f
(Eq. 3) does not have a substantial impact on the results. On
the other hand, neglecting the measurement error in the
composite record when estimating a (Eq. 1) can cause a
substantial bias in the reconstruction. This can be observed
by comparing the reconstructed series obtained with for-
ward regression to that obtained with total least squares
regression. Experiments using 100 pseudo-proxy series
with SNR = 0.5 are displayed in Fig. 1b. The increase in
the number of pseudo-proxy series does seem to alleviate
the problem in the smoothed forward regression method.
The improvement gained when going from 15 to 100
pseudo-proxy series is expected because the noise variance
in the composite record of 100 pseudo-proxy series is only
15% of that with 15 pseudo-proxy series. However, such
improvement may be less significant in real-world appli-
cations given that errors might be spatially correlated.
Results obtained with the ECHO-G simulation are very
similar and thus not shown.
The test results for a reconstruction method can be
affected by both the specific realizations of noise that are
added when creating the pseudo-proxy record, and the
locations of the pseudo-proxy record. One would expect
the reconstruction to differ as a result of sampling vari-
ability from at least these two sources. Therefore, to get a
better picture of the performance of each reconstruction
method, it is necessary to apply them to a number of
realizations of the pseudo-proxy record. Hence, we
reconstructed the hemispheric mean temperature series 100
times for each method using 100 different realizations of
pseudo-proxy records. For each realization, both the noise
and the locations of the pseudo-proxies within the 321
grid boxes specified earlier change randomly.
For the network with 100 proxy series, only 40 recon-
structions are produced for each method due to
computational limitations. However, 100 reconstructions
were produced for each method for the smaller 15-location
proxy network.
To provide a quantitative assessment of each reconstruction method, we computed the relative root mean
squared error (RRMSE) of the reconstruction,
expressed relative to the variability of the model simulated
NH temperature during the pre-calibration period. The
RRMSE is simply defined as
RRMSE = √[ Σ_{t=1}^{n} (Tt − T̂t)² / Σ_{t=1}^{n} (Tt − T̄)² ]
where Tt, T̂t and T̄ are the model simulated NH temperature, the reconstructed NH temperature and the temporal
mean of the model simulated NH temperature during the
pre-calibration period, respectively. In general, a smaller
RRMSE means a better reconstruction. The RRMSE can
lie between zero and infinity, and an RRMSE value of less
than 1 indicates that the reconstructed series is better than a
reconstruction that has a constant value equal to the cli-
matology of the pre-calibration period.
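In code, the RRMSE defined above is:

```python
def rrmse(true_temp, recon_temp):
    """Relative RMSE over the pre-calibration period: reconstruction
    error normalized by variability about the period mean."""
    n = len(true_temp)
    mean_true = sum(true_temp) / n
    num = sum((t - r) ** 2 for t, r in zip(true_temp, recon_temp))
    den = sum((t - mean_true) ** 2 for t in true_temp)
    return (num / den) ** 0.5
```

A constant reconstruction equal to the pre-calibration climatology gives RRMSE = 1, matching the interpretation in the text.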
Figure 2 shows the median of the RRMSEs obtained
from the 100 (or 40) realizations, as a function of the
degree of smoothing of the climate model simulated annual
mean series Tt and the reconstructed annual mean series T̂t.
An estimated 5–95% uncertainty range of the RRMSE is
also displayed in the figure, which is obtained by using the
5th and 95th percentiles of the sample of RRMSEs.
Comparing the two simulations, the results are robust for
most of the methods. The
performance of most of the methods considered is very
similar at decadal and lower resolution. The non-smoothed
forward regression, non-smoothed variance matching and
MBH methods are exceptions. The performance of all
methods was found to be insensitive to the two choices of
calibration period, and thus results obtained using the
shorter calibration periods are not shown. In fact, for both
the CSM and ECHO-G simulation, there is almost no
change in the median value of RRMSE when the shorter
calibration period is used.
At the annual resolution, the RegEM and state-space
model approaches produce the smallest RRMSEs. This is a
result of the more sophisticated procedures that are involved
in these methods, which in effect filter out the measurement
errors in the pseudo-proxy series. In contrast, the recon-
structed series from a typical CPS method is merely a scaled
version of the composite series, with the result that the
measurement error contained in the composite series is
directly transferred to the reconstructed series. This prob-
lem is more severe when only 15 proxy series are available
for reconstruction (Fig. 3). Hence, the simple CPS methods
should be avoided if the goal is to reconstruct high fre-
quency climate variation. At the same time, the MBH
method is observed to be the worst performer when
SNR = 0.5. However, when SNR is increased to 1, the
MBH method, at annual resolution, produces comparable
RRMSEs to that of most CPS methods considered.
The estimated RRMSE uncertainty ranges provide
information on the sensitivity to sampling variability for
each method. From Fig. 2, it is obvious that such sensi-
tivity is larger when only 15 pseudo-proxy series are used,
reflecting the greater sampling variability in the composite
record when only 15 proxy series are used. Inter-compa-
rison between the different methods suggests that the
sensitivity to sampling variability at annual resolution is
Fig. 1 Examples of reconstructed CSM northern hemisphere mean temperature series. Experiments were run using SNR = 0.5 and the 1860–1970 calibration period with a 15 and b 100 pseudo-proxy series. 11-year moving averages are shown. The CSM NH mean is shown in black. Reconstructions are shown as coloured lines. All series are expressed as anomalies relative to the calibration period. Units: degrees Kelvin
somewhat smaller for the RegEM and state-space model
approaches. At decadal or lower resolution, the RRMSE
uncertainty range is very similar across most methods.
By comparing the results of the two state-space model
approaches (with external forcing response and without), it
is clear that the RRMSE is insensitive to the inclusion or
Fig. 2 Relative root mean squared error (RRMSE) of the reconstruction, expressed relative to the variability of the simulated hemispheric temperature. Units: degrees Kelvin. The median RRMSE is indicated with horizontal bars and the estimated 5–95% range of the RRMSEs is shown with vertical lines. Results using the 1860–1970 calibration period with different signal-to-noise ratios and varying numbers of pseudo-proxies are shown for the two climate model simulations: a CSM and b ECHO-G
exclusion of the EBM estimated forcing response infor-
mation. Nevertheless, the reconstructed series from the two
approaches are slightly different when only 15 pseudo-
proxy series are used (Fig. 1a). This suggests that the
Kalman filter and smoother algorithm relies more heavily
on the proxy series than the estimated response to forcings
to estimate the unknown temperature. Even though the
reconstructions from the two state-space model approaches
are very similar, the approach that takes forcing changes
into account may be more useful in some instances since it
may be possible to use it to provide a detection assessment
and perhaps also a projection of future climate.
As mentioned above, one can use the state-space model
to simultaneously reconstruct the NH temperature and
conduct detection analysis. Both the CSM and ECHO-G
simulations are forced with a combination of external
forcing factors and therefore we should be able to detect
the effect of external forcing in our experiments provided
that these models respond similarly to forcing as the EBM.
Figure 4 shows the confidence bounds on the forcing
response coefficients for the experiments using SNR = 0.5.
For δGS and δVOL, the confidence bounds do not include
zero for almost all experiments that were run, indicating
in the reconstruction of the CSM and ECHO-G simula-
tions. However, the response to solar forcing as simulated
by the EBM is not detectable in about half of the CSM
reconstructions obtained using 15 pseudo-proxy series.
This fraction is reduced to near zero when the SNR is
increased to 1. In contrast, the EBM simulated SOL signal
is detectable in all the CSM experiments that use 100
pseudo-proxy series and in all ECHO-G experiments. The
inability to detect the SOL forcing in some experiments
may be due to the fact that the climate response to solar
forcing is relatively weaker than that to the other forcings
and hence may be harder to detect when the noise con-
tamination in the pseudo-proxy series increases. At the
same time, unlike the ECHO-G and EBM simulations, the
solar forcing estimates used in the CSM simulation
excluded the 11-yr solar cycle and this may also contribute
to the varying detection results for the response to solar
forcing. The inability to consistently detect the response to
SOL forcing in our experiments is consistent with detection
work on real-world paleo-reconstructions (Hegerl et al.
2003, 2007).
Figure 5 displays hindcasts of annual NH mean tem-
perature for 1971 to 1990 with 100 pseudo-proxy series,
SNR = 0.5 and the 1860–1970 calibration period. The
hindcasts are produced using Eq. (9). For comparison, the
sum of the responses to external forcings for the average of
30N–90N as simulated by the EBM is also displayed in the
figure. We have produced two hindcasts for each climate
model using the same set of pseudo-proxy series, one using
the full 1000–1970 period to estimate the parameters for
the state-space model and another using only data from
1800 to 1970. It can be observed that the skill of the
hindcast varies between the two analysis periods. In fact,
the estimates of the forcing response coefficients, which
strongly influence the hindcast values, are substantially different
for the two analysis periods (Table 1). However, these
Fig. 3 Example of the difference in temperature anomalies between the ECHO-G simulation and the reconstructed ECHO-G series using SNR = 0.5 and the 1860–1970 calibration period, plotted at annual resolution with a 15 pseudo-proxy series and b 100 pseudo-proxy series
differences have only a minor influence on the recon-
structed series (not shown) because the Kalman smoother
algorithm relies more heavily on the proxy data than the
EBM to reconstruct the unknown temperature.
Figure 6 displays examples of reconstructed NH tem-
perature from the MBH method using the two different
simulations and different SNR with the 1860–1970 cali-
bration period. Von Storch et al. (2004) included a
detrending step in their test of the MBH method. We
therefore repeated our experiments with detrended cali-
bration data and those results are also shown in Fig. 6.
When the variance of the noise added in the pseudo-proxy
series is the same as the grid-box temperature variance, that
is, when SNR=1, the non-detrended MBH method was able
to provide a reasonable reconstruction of the ECHO-G
simulated hemispheric mean temperature. However, results
Fig. 4 95% confidence bounds for the coefficients used to scale the EBM simulated responses to external forcing in the state-space model when the pseudo-proxy SNR is 0.5 and using the 1860–1970 calibration period. Only results from 5 out of the 100 (or 40) experiments are displayed. The number in the bottom left corner of each box indicates the percentage of confidence bounds (out of 100 or 40) that excludes zero
Fig. 5 Hindcasts of annual mean NH temperature based on the estimated state equation from 100 proxy series for two analysis periods: 1000–1970 and 1800–1970. The sum of responses to external forcings for the 30N–90N average as simulated by the energy balance model is also displayed for comparison purposes. All series are expressed as anomalies relative to the 1860–1970 period
become unsatisfactory when the SNR is decreased to 0.5.
For the reconstruction of CSM hemispheric mean tempera-
ture, the result is poor with both SNRs. Although our
calibration period is longer than that used in Mann et al.
(1998), our result does not change substantially when the
shorter 1880–1960 calibration period is used (not shown).
While our result does suggest the MBH method under-
estimates long term variability, the under-estimation is
smaller when non-detrended data is used (see Wahl et al.
2006; von Storch et al. 2006 for further discussions).
We also tested the CPS methods that are mentioned in
the previous section using detrended calibration data (not
shown). The performance of the CPS methods varies across
different realizations of pseudo-proxy records. In some
cases, detrending does not affect the reconstructed series
irrespective of which CPS method is considered. On the
other hand, there were also realizations of pseudo proxies
for which there was under-estimation of variability when
detrended data are used. Therefore, in the context of
pseudo-proxies constructed with white noise, detrending
results in less robust reconstructions of hemispheric mean
temperature variability.
3.1.1 Analysis with red pseudo-proxy noise
Up to this point, our experiments have not taken into
account the possibility that the proxies consist of a tem-
perature signal plus correlated errors. We therefore now
examine the impact of red pseudo-proxy noise on the
reconstruction methods. Following Mann et al. (2007), red
pseudo-proxy noise is generated from an AR(1) process
with lag-one autocorrelation equal to 0.32. As before,
analyses are conducted using two calibration periods and
with SNR = 0.5 and 1.
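Red pseudo-proxy noise of this kind can be generated as follows. This is a sketch under our reading of the SNR convention stated earlier (SNR = 1 when the noise variance equals the signal variance, so Var(noise) = Var(signal)/SNR²); the innovation variance is scaled so the AR(1) process has the target stationary variance:

```python
import random

def ar1_noise(n, rho=0.32, signal_var=1.0, snr=0.5, seed=0):
    """Generate AR(1) noise with lag-one autocorrelation rho, scaled so
    the stationary variance equals signal_var / snr**2."""
    rng = random.Random(seed)
    target_var = signal_var / snr ** 2
    innov_sd = (target_var * (1 - rho ** 2)) ** 0.5
    x = [rng.gauss(0.0, target_var ** 0.5)]  # draw from stationary dist.
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, innov_sd))
    return x
```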
Examples of reconstructed NH temperature evolution
simulated by the CSM based on red pseudo-proxy series
with SNR=0.5 are displayed in Fig. 7. It is clear that the
redness of the pseudo-proxy noise has slightly increased
the variability of the reconstructed series when 15 pseudo-
proxy series are used. On the other hand, series recon-
structed with 100 pseudo-proxy series do not seem to be
affected. Results obtained with the ECHO-G simulation are
very similar and thus not shown.
The estimated RRMSEs obtained with red pseudo-proxy
series (not shown) are very similar to those shown in
Fig. 2. In particular, the estimated median RRMSEs are
almost unchanged and the relative ranking of the RRMSEs
between the different reconstruction methods remains the
same as in the case of white pseudo-proxy series. Hence,
similar conclusions regarding the performance of the dif-
ferent methods can be drawn as before. However, the
RRMSE uncertainty ranges are slightly larger than before
when 15 pseudo-proxy series are used.
3.2 Analysis with real-world paleoclimate proxy data
We apply the CPS methods and state-space model
approach to the paleoclimate proxy data used in Hegerl
et al. (2007). This data set consists of 14 proxy series. All
records are available as decadally smoothed series. As in
Hegerl et al. (2007), we calculated correlation based
weighted averages to form the composite record. Using
1880–1960 as the calibration period, the 30N–90N mean
temperature is reconstructed for the period 1510–1960. As
noted previously, the use of the state-space model approach
requires fixing the parameter φ in the exogenous variable
Ft. Here, the value of this parameter is estimated by the
lag-one autocorrelation of the decadally smoothed 30N–
90N mean temperature from a control simulation of the
CCCma CGCM2 (Flato and Boer 2001). The resulting φ
value, 0.982, was used to obtain the exogenous variable Ft.
Figure 8 compares the reconstructed series obtained
from the variance matching method, state-space model
approach and Hegerl et al. (2007), who used the total least
squares method. The reconstruction estimates obtained from
these approaches are nearly identical. Reconstructed series
from the other CPS approaches also agree closely with the
series in Fig. 8 (not shown). This finding is consistent with
results obtained using climate model simulations in the
previous section where it was seen that these methods have
similar RRMSEs at the decadal resolution.
Estimates of the parameters of the state-space model
(with forcing) are given in Table 2. The parameter φ is
estimated to be close to 1, which is larger than the values
obtained using climate model simulations with annual
mean data, which range from 0.3 to 0.7. This is not
surprising given that decadally smoothed data are strongly
dependent between successive time points. The estimated
parameter values for δGS and δVOL are significantly different
from zero, suggesting that the responses to GS and VOL
forcing are detectable. On the other hand, the response to
SOL forcing is not detected. These detection results agree
with the findings reported in Hegerl et al. (2007).
Table 1 Estimated parameter values of the state-space model obtained with 100 pseudo-proxy series using two analysis periods

Parameter   CSM 1000–1970   CSM 1800–1970   ECHO-G 1000–1970   ECHO-G 1800–1970
δGS         1.315           2.105           0.775              1.978
δVOL        0.994           2.334           1.270              1.875
δSOL        0.891           -0.298          2.779              6.456

Boldfaced values are significant
Fig. 6 Comparison of the reconstructed hemispheric mean temperature series obtained using the MBH method with non-detrended or detrended data at two signal-to-noise ratios a SNR = 0.5 and b SNR = 1 with 100 pseudo-proxy series and the 1860–1970 calibration period. All series are expressed as 11-year running means and as anomalies relative to the calibration period
The 30N–90N decadally smoothed mean temperature
hindcast for 1961–1990 is displayed in Fig. 9. Hindcasts
are generated for three analysis periods (1270–1960, 1510–
1960 and 1800–1960). Confidence bounds on the hindcast
for the 1510–1960 analysis period are also displayed in the
figure. The hindcasts obtained from the 1270–1960 and
1510–1960 analysis periods are very similar and very
close to the sum of the responses to external forcings as
simulated by the EBM. The confidence bounds on the
hindcasts generated for the 1510–1960 analysis were able
to include almost all the observed temperature anomalies.
Similar results can be obtained for the 1270–1960 analysis
Fig. 7 Examples of reconstructed CSM northern hemisphere mean temperature series. Experiments were run using SNR = 0.5 and the 1860–1970 calibration period with a 15 and b 100 red pseudo-proxy series. 11-year moving averages are shown. The CSM NH mean is shown in black. Reconstructions are shown as coloured lines. All series are expressed as anomalies relative to the calibration period. Units: degrees Kelvin
period (not shown). On the other hand, the hindcasts
obtained from the 1800–1960 analysis period warm too
quickly. This is because the estimated value of δGS is four
times larger than that in the other two analysis periods
(Table 2).
4 Conclusion
In this paper, we have compared the skill of several dif-
ferent reconstruction methods using climate model
simulations. At the annual resolution, the state-space model
and RegEM approaches provide the best reconstructions.
On the other hand, when compared at decadal or lower
resolution, we find that most methods can provide satis-
factory and similar results. Exceptions are the MBH, non-
smoothed forward regression and non-smoothed variance
matching methods. When analysed with decadally
smoothed real-world paleoclimate proxy data, we find that
all of the CPS methods considered provide almost identical
results. The similarity in performance provides evidence
that the difference between many real-world reconstruc-
tions is more likely to be due to the choice of the proxy
series, or the use of different target seasons or latitudes,
than to the choice of statistical reconstruction method (see
also Juckes et al. 2006).
We have also put forward another approach to historical
temperature reconstruction that is based on a state-space
time series model and the Kalman filter and smoother
algorithm. This approach allows the possibility of incor-
porating additional non-proxy information into the
reconstruction analysis, such as the estimated response to
external forcing. However, our experiments show that the
state-space model approach does not produce substantially
different reconstructions when such information is inclu-
ded. Nevertheless, both state-space model approaches
provided better reconstructions than existing CPS methods
at annual resolutions. At the same time, including forcing
response terms in the state-space model allows one to carry
out a simultaneous reconstruction and detection analysis. It
can also be used to provide forecasts of future climates.
Consistent with the results of Hegerl et al. (2007), we have
detected the effects of anthropogenic forcing (greenhouse
gas and aerosol) and volcanic forcing in real-world
paleoclimate proxy data.
Fig. 8 Reconstructed series for the 30N–90N mean using the variance matching method and state-space model approach for real-world paleoclimate proxy data. Temperature series are expressed as 11-year moving averages. All series are expressed as anomalies relative to the 1880–1960 period and in units of degrees Kelvin
Table 2 Estimated parameter values of the state-space model obtained with paleoclimate proxy data for three reconstruction periods

Parameter   1270–1960               1510–1960               1800–1960
φ           0.981 (0.967, 0.994)    0.979 (0.961, 0.997)    0.944 (0.900, 0.988)
δGS         1.022 (0.043, 2.001)    1.128 (0.112, 2.143)    4.271 (1.054, 7.489)
δVOL        0.953 (0.685, 1.220)    0.814 (0.522, 1.106)    0.998 (0.461, 1.535)
δSOL        0.784 (-0.592, 2.161)   0.368 (-1.218, 1.955)   -0.203 (-3.211, 2.805)

The 95% confidence interval is listed in brackets. Boldfaced values are significant
Fig. 9 State equation hindcast of decadally smoothed 30N–90N mean temperature with real-world paleoclimate proxy data. All hindcasts are based on the estimated state equation for three analysis periods: 1270–1960, 1510–1960 and 1800–1960. Nine proxy series are available for the 1270–1960 analysis period and 14 proxy series are available for the other two analysis periods. 95% confidence bounds of the hindcasts for the 1510–1960 analysis period are shown as dashed lines. All series are expressed as anomalies relative to the 1880–1960 period and in units of degrees Kelvin
Acknowledgments We thank Caspar Ammann, Gabriele Hegerl and Eduardo Zorita for providing their data for use in this study. We
also thank Gabriele Hegerl for helpful and constructive discussion.
We gratefully acknowledge that Terry Lee was supported by the
Canadian Foundation for Climate and Atmospheric Science through
the Canadian CLIVAR Research Network. Work by Min Tsao was
supported by the Natural Sciences and Engineering Research Council
through a Discovery Grant. This paper was improved by insightful
and helpful comments provided by Scott Rutherford, Walter Skinner,
Xuebin Zhang and an anonymous referee.
5 Appendix
5.1 Expectation maximization algorithm
Here we present the steps required to estimate the unknown
parameters that specify the state-space model (Eq. 6). We
use Θ = {f, R, φ, δ, Q, μ, Σ} to represent the vector of
unknown parameters. The derivations presented here are
expanded from Shumway and Stoffer (1982, 2000, pp.
321–325). Detailed derivations of the following procedure are
given in Lee (2008). We first write down the likelihood
function for {Pt; t = 1, 2, ..., n} as in Shumway and Stoffer
(2000). Ignoring a constant, the likelihood function LP(Θ)
can be expressed as:
−2 ln LP(Θ) = Σ_{t=1}^{n} ln(Ωt) + Σ_{t=1}^{n} et²/Ωt

where Ωt = f²·S_t^{t−1} + R > 0 and et = Pt − f·T_t^{t−1} for t = 1, 2,
..., n. The calibration period hemispheric mean
temperatures {Tt; t = n + 1, ..., n + m } are not involved
in the above likelihood function. To better utilize the
available information, we have modified the likelihood
function to include {Tt; t = n + 1, ..., n + m}. Ignoring a
constant, one can express the likelihood function for the
observation data set {Pt,Ts; t = 1, 2, ..., n; s = n + 1, ...,
n + m}, namely LP,T(Θ), as
−2 ln LP,T(Θ) = Σ_{t=1}^{n+m} ln(Ωt) + Σ_{t=1}^{n+m} et²/Ωt + (m − 1) ln(Q)
              + ln(φ²·S_n^n + Q) + (T_{n+1} − φT_n^n − δF_{n+1})²/(φ²·S_n^n + Q)
              + Σ_{t=n+2}^{n+m} (Tt − φT_{t−1} − δFt)²/Q

where Ωt and et are defined as before for t = 1, 2, ..., n. For
t = n + 1, ..., n + m, Ωt = R and et = Pt − f·Tt. The goal
here is to find the Θ values that maximize the likelihood
function LP,T(Θ). Using the EM algorithm, such Θ values
can be found through an iterative procedure. The
estimation procedure is summarized as follows.
1. Select the starting values for the parameters Θ^(0) = {f^(0), R^(0), φ^(0), δ^(0), Q^(0), μ^(0)} while fixing Σ, one of the initial conditions for the Kalman recursion. (In our analysis, we set Σ = 0.05. Results were almost identical when different Σ values were used.) On iteration j (j = 1, 2, ...), do steps 2–4.
2. Let N = n + m. Using Θ^(j−1), compute the Kalman filter and smoother estimates using Eqs. (7) and (8) for t = 1, 2, ..., N and the likelihood function ln LP,T(Θ^(j−1)). Also calculate, for t = N − 1, N − 2, ..., 0,

S_t^N = S_t^t + Jt²(S_{t+1}^N − S_{t+1}^t)

and for t = n, n − 1, ..., 0,

Ct = J_{t−1}·S_t^N
T̃_t^N = T_t^N + (Jt·J_{t+1}···Jn)(T_{n+1} − T_{n+1}^N)
S̃_t^N = S_t^N − (Jt·J_{t+1}···Jn)²·S_{n+1}^N

and the following quantities:

Z11 = Σ_{t=1}^{n} [(T̃_t^N)² + S̃_t^N] + Σ_{t=n+1}^{N} Tt²
Z00 = Σ_{t=0}^{n} [(T̃_t^N)² + S̃_t^N] + Σ_{t=n+1}^{N−1} Tt²
Z10 = Σ_{t=1}^{n} (T̃_t^N·T̃_{t−1}^N + Ct) + T_{n+1}·T̃_n^N + Σ_{t=n+2}^{N} Tt·T_{t−1}
F11 = Σ_{t=1}^{n} Ft·T̃_t^N + Σ_{t=n+1}^{N} Ft·Tt
F10 = Σ_{t=1}^{n+1} Ft·T̃_{t−1}^N + Σ_{t=n+2}^{N} Ft·T_{t−1}
F00 = Σ_{t=1}^{N} Ft·Ft^T
3. Obtain Θ^(j) using the following:

f^(j) = [ Σ_{t=1}^{n} ((T̃_t^N)² + S̃_t^N) + Σ_{t=n+1}^{N} Tt² ]^{−1} [ Σ_{t=1}^{n} T̃_t^N·Pt + Σ_{t=n+1}^{N} Tt·Pt ]
R^(j) = N^{−1} Σ_{t=1}^{n} [ S̃_t^N·(f^(j))² + (Pt − f^(j)·T̃_t^N)² ] + N^{−1} Σ_{t=n+1}^{N} (Pt − f^(j)·Tt)²
δ^(j) = (F11^T − F10^T·Z10/Z00)(F00 − F10·F10^T/Z00)^{−1}
φ^(j) = (Z10 − δ^(j)·F10)·Z00^{−1}
Q^(j) = N^{−1} (Z11 − φ^(j)·Z10 − δ^(j)·F11)
μ^(j) = T̃_0^N
4. Repeat steps 2 and 3 until convergence. For the analysis presented in this paper, the algorithm is stopped when ln LP,T(Θ^(j)) − ln LP,T(Θ^(j−1)) < 0.0005.
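The Kalman filter and smoother recursions invoked in step 2 (Eqs. 7 and 8, not reproduced here) have, for a scalar state, the standard textbook form below. This is a generic sketch of the recursions in Shumway and Stoffer (2000), not the authors' code, and it omits the exogenous forcing term δFt for brevity:

```python
def kalman_filter_smoother(P, f, R, phi, Q, mu0, Sigma0):
    """Scalar Kalman filter and fixed-interval (RTS) smoother for
    T_t = phi*T_{t-1} + w_t,  P_t = f*T_t + v_t."""
    n = len(P)
    Tp, Sp, Tf, Sf = [], [], [], []  # predicted/filtered means and variances
    tprev, sprev = mu0, Sigma0
    for t in range(n):
        # predict
        tp = phi * tprev
        sp = phi ** 2 * sprev + Q
        # update
        K = sp * f / (f ** 2 * sp + R)      # Kalman gain
        tf = tp + K * (P[t] - f * tp)
        sf = (1 - K * f) * sp
        Tp.append(tp); Sp.append(sp); Tf.append(tf); Sf.append(sf)
        tprev, sprev = tf, sf
    # Rauch-Tung-Striebel smoother pass
    Ts, Ss = [0.0] * n, [0.0] * n
    Ts[-1], Ss[-1] = Tf[-1], Sf[-1]
    for t in range(n - 2, -1, -1):
        J = Sf[t] * phi / Sp[t + 1]         # smoother gain J_t
        Ts[t] = Tf[t] + J * (Ts[t + 1] - Tp[t + 1])
        Ss[t] = Sf[t] + J ** 2 * (Ss[t + 1] - Sp[t + 1])
    return Ts, Ss
```

With near-noiseless observations (tiny R), the smoothed states track the observations, as expected.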
To provide a final estimate of $T_t$, the Kalman filter and smoother estimates are recalculated from Eqs. (7) and (8) with the estimated parameters $\hat{\Theta}$ for $t = 1, 2, \ldots, n+m$. At convergence, one can also calculate standard errors for $\hat{\Theta}$. For parameters estimated using $\ln L_P(\Theta)$, the asymptotic variance of $\hat{\Theta}$ is defined as (Caines 1988, Chap. 7; Jensen and Petersen 1999)

$$\mathrm{Var}(\hat{\Theta}) = -\left[ \left. \frac{\partial^2}{\partial \Theta^2} \ln L_P(\Theta) \right|_{\Theta = \hat{\Theta}} \right]^{-1},$$

where $\partial^2/\partial\Theta^2$ denotes the matrix of second derivatives with respect to $\Theta$. In our application we used $L_{P,T}(\Theta)$ in the estimation procedure, and hence a reasonable estimate of the asymptotic variance can be obtained by replacing $L_P(\Theta)$ with $L_{P,T}(\Theta)$ in the above formula; further discussion can be found in Lee (2008). An analytical form of the asymptotic variance is generally hard to obtain, and we therefore calculated it numerically. The asymptotic variance can be used to provide confidence bounds for the estimated parameters: in particular, the 95% confidence bound for $\Theta$ is $\hat{\Theta} \pm 1.96\sqrt{\mathrm{Var}(\hat{\Theta})}$. In our application, we are interested in the confidence bounds for the parameters $\delta_{GS}$, $\delta_{VOL}$ and $\delta_{SOL}$; these bounds provide a detection assessment of the importance of the responses to the GS, VOL and SOL forcings on the hemispheric mean temperature.
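Since the asymptotic variance is computed numerically, one plausible way to do so (the paper does not specify its scheme) is a central-finite-difference approximation to the Hessian of the negative log-likelihood, inverted to give the covariance of the estimates. The function name and the toy Gaussian check below are our own illustration:

```python
import numpy as np

def asymptotic_se(negloglik, theta_hat, h=1e-4):
    """Standard errors from the inverse observed information matrix.

    Approximates the Hessian of -ln L at the ML estimate theta_hat with
    central finite differences, inverts it, and returns the square roots
    of the diagonal, i.e. sqrt(diag(Var(theta_hat)))."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = theta_hat.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = h
            ej = np.zeros(p); ej[j] = h
            # four-point central difference for the (i, j) second derivative
            H[i, j] = (negloglik(theta_hat + ei + ej)
                       - negloglik(theta_hat + ei - ej)
                       - negloglik(theta_hat - ei + ej)
                       + negloglik(theta_hat - ei - ej)) / (4.0 * h * h)
    cov = np.linalg.inv(H)               # asymptotic covariance matrix
    return np.sqrt(np.diag(cov))

# Sanity check on a case with a known answer: for n iid N(mu, 1) draws,
# Var(mu_hat) = 1/n, so with n = 400 the standard error should be 0.05.
rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=400)
negloglik = lambda th: 0.5 * np.sum((y - th[0]) ** 2)
se = asymptotic_se(negloglik, np.array([y.mean()]))
lower, upper = y.mean() - 1.96 * se[0], y.mean() + 1.96 * se[0]
```

The final two lines form the 95% confidence bound in exactly the way described above; for a detection assessment, one would check whether the bound for each forcing coefficient excludes zero.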
References
Allen MR, Stott PA (2003) Estimating signal amplitudes in optimal
fingerprinting. Part I: theory. Clim Dyn 21:477–491
Ammann CM, Joos F, Schimel D, Otto-Bliesner BL, Tomas R (2007)
Solar influence on climate during the past millennium: results
from transient simulations with the NCAR Climate System
Model. Proc Nat Acad Sci 104:3713–3718
Briffa KR, Osborn TJ, Schweingruber FH, Harris IC, Jones PD,
Shiyatov SG, Vaganov EA (2001) Low-frequency temperature
variations from a northern tree-ring density network. J Geophys
Res 106:2929–2941
Brohan P, Kennedy JJ, Harris I, Tett SFB, Jones PD (2006)
Uncertainty estimates in regional and global observed temper-
ature changes: a new dataset from 1850. J Geophys Res
111:D12106
Brown PJ (1993) Measurement, regression, and calibration. Oxford
University Press, Oxford, 201 pp
Bürger G, Cubasch U (2005) Are multiproxy climate reconstructions
robust? Geophys Res Lett 32:L23711
Bürger G, Fast I, Cubasch U (2006) Climate reconstruction by
regression—32 variations on a theme. Tellus 58A:227–235
Caines PE (1988) Linear stochastic systems. Wiley, New York,
847 pp
Coelho CAS, Pezzulli S, Balmaseda M, Doblas-Reyes JJ, Stephenson
D (2004) Forecast calibration and combination: a simple
Bayesian approach for ENSO. J Clim 17:1504–1516
Durbin J, Koopman SJ (2001) Time series analysis by state space
methods. Oxford University Press, Oxford, 253 pp
Esper J, Cook ER, Schweingruber FH (2002) Low-frequency signals in
long tree-ring chronologies for reconstructing past temperature
variability. Science 295:2250–2253
Esper J, Frank DC, Wilson RJ, Briffa KR (2005) Effect of scaling and
regression on reconstructed temperature amplitude for the past
millennium. Geophys Res Lett 32:L07711
Fierro RD, Golub GH, Hansen PC, O’Leary DP (1997) Regularization
by truncated total least squares. SIAM J Sci Comput 18:1223–
1241
Flato G, Boer GJ (2001) Warming asymmetry in climate change
experiments. Geophys Res Lett 28:195–198
Fuller WA (1987) Measurement error models. Wiley, New York,
440 pp
Harvey A (1989) Forecasting, structural time series models and the
Kalman filter. Cambridge University Press, Cambridge, 554 pp
Hegerl GC, Crowley TJ, Baum SK, Kim K-Y, Hyde WT (2003)
Detection of volcanic, solar, and greenhouse gas signals in
paleo-reconstructions of Northern Hemispheric temperature.
Geophys Res Lett 30:1242 doi:10.1029/2002GL016635
Hegerl GC, Crowley TJ, Allen MR, Hyde WT, Pollack HN, Smerdon
JE, Zorita E (2007) Detection of human influence on a new,
validated 1500 year temperature reconstruction. J Clim 20:650–
666
Jensen JL, Petersen NV (1999) Asymptotic normality of the
maximum likelihood estimator in state space models. Ann Stat
27:514–535
Jones PD, Osborn TJ, Briffa KR (1997) Estimating sampling errors in
large-scale temperature averages. J Clim 10:2548–2568
Jones PD, Briffa KR, Barnett TP, Tett SFB (1998) High-resolution
palaeoclimatic records for the last millennium: Integration,
interpretation and comparison with general circulation model
control run temperatures. Holocene 8:455–471
Juckes MN, Allen MR, Briffa KR, Esper J, Hegerl GC, Moberg A,
Osborn TJ, Weber SL, Zorita E (2006) Millennial temperature
reconstruction intercomparison and evaluation. Clim Past Dis-
cuss 2:1001–1049
Kalman R (1960) A new approach to linear filtering and prediction
problems. J Basic Eng 82:34–45
Lee TCK (2008) On statistical approaches to climate change analysis.
PhD dissertation, University of Victoria, to be available at
https://dspace.library.uvic.ca:8443/dspace/
Mann ME, Bradley RS, Hughes MK (1998) Global-scale temperature
patterns and climate forcing over the past six centuries. Nature
392:779–787
Mann ME, Rutherford S, Wahl E, Ammann CM (2005) Testing the
fidelity of methods used in proxy-based reconstructions of past
climate. J Clim 18:4097–4107
Mann ME, Rutherford S, Wahl E, Ammann CM (2007) Robustness of
proxy-based climate field reconstruction methods. J Geophys
Res (in press)
Moberg A, Sonechkin DM, Holmgren K, Datsenko NM, Karlen W
(2005) Highly variable Northern Hemisphere temperatures
reconstructed from low- and high-resolution proxy data. Nature
433:613–617
Osborn TJ, Raper SCB, Briffa KR (2006) Simulated climate change
during the last 1,000 years: comparing the ECHO-G general
circulation model with the MAGICC simple climate model. Clim
Dyn 27:185–197
Rutherford S, Mann ME, Osborn TJ, Bradley RS, Briffa KR, Hughes
MK, Jones PD (2005) Proxy-based Northern Hemisphere surface
temperature reconstructions: Sensitivity to methodology, predic-
tor network, target season, and target domain. J Clim 18:2308–
2329
Schneider T (2001) Analysis of incomplete climate data: Estimation
of mean values and covariance matrices and imputation of
missing values. J Clim 14:853–871
Shumway RH, Stoffer DS (1982) An approach to time series
smoothing and forecasting using the EM algorithm. J Time Ser
Anal 3:253–264
Shumway RH, Stoffer DS (2000) Time series analysis and its
applications. Springer, Heidelberg, 549 pp
von Storch H, Zorita E, Jones JM, Dimitriev Y, Gonzalez-Rouco F,
Tett SFB (2004) Reconstructing past climate from noisy data.
Science 306:679–682
von Storch H, Zorita E, Jones JM, Gonzalez-Rouco F, Tett SFB
(2006) Response to comment on ‘‘Reconstructing past climate
from noisy data’’. Science 312:529c
Wahl ER, Ritson DM, Ammann CM (2006) Comment on ‘‘Recon-
structing past climate from noisy data’’. Science 312:529b
Zorita E, von Storch H (2005) Methodical aspects of reconstructing non-
local historical temperatures. Memorie Soc Astron Ital 76:794–801
Zorita E, Gonzalez-Rouco JF, Legutke S (2003) Testing the Mann
et al. (1998) approach to paleoclimate reconstructions in the
context of a 1000-yr control simulation with the ECHO-G
coupled climate model. J Clim 16:1378–1390