Evaluation of proxy-based millennial reconstruction methods
Terry C. K. Lee · Francis W. Zwiers · Min Tsao
Received: 3 July 2007 / Accepted: 9 November 2007
© Springer-Verlag 2007
Abstract A range of existing statistical approaches for
reconstructing historical temperature variations from proxy
data are compared using both climate model data and real-
world paleoclimate proxy data. We also propose a new
method for reconstruction that is based on a state-space
time series model and Kalman filter algorithm. The state-
space modelling approach and the recently developed
RegEM method generally perform better than their com-
petitors when reconstructing interannual variations in
Northern Hemispheric mean surface air temperature. On
the other hand, a variety of methods are seen to perform
well when reconstructing surface air temperature variabi-
lity on decadal time scales. An advantage of the new
method is that it can incorporate additional, non-tempera-
ture, information into the reconstruction, such as the
estimated response to external forcing, thereby permitting a
simultaneous reconstruction and detection analysis as well
as future projection. An application of these extensions is
also demonstrated in the paper.
Keywords Kalman filter · State-space model · Temperature reconstruction
1 Introduction
A number of studies have attempted to reconstruct hemi-
spheric mean temperature for the past millennium from
proxy climate indicators. However, the available recon-
structions vary considerably. Different statistical methods
are employed in these reconstructions and it therefore
seems natural to ask how much of this discrepancy is
caused by the variations in methods. The reliability of
some of the reconstruction methods has been looked at in
different studies (e.g. Zorita et al. 2003; von Storch et al.
2004; Bürger and Cubasch 2005; Esper et al. 2005; Mann
et al. 2005; Zorita and von Storch 2005; Bürger et al. 2006;
Juckes et al. 2006).
In this paper, we provide an empirical comparison
between different reconstruction methods. Analyses are
carried out using both pseudo-proxy data from climate
models and real-world paleoclimate proxy data. We will
also propose a method for reconstruction that is based on a
state-space time series model and Kalman filter algorithm.
As an option, this method allows one to simultaneously
incorporate historical forcing information to reconstruct the
unknown historical temperature, and to carry out detection
analysis to quantify the influence of external forcing on
historical temperature.
The remainder of this paper is organized as follows. The
existing statistical procedures used for reconstruction are
discussed in Sect. 2, together with the approach that uti-
lizes a state-space time series model and Kalman filter
algorithm. Following the general approach proposed by
von Storch et al. (2004), comparisons between these
methods are made with the help of both climate model data
and real-world paleoclimate proxy data. The results are
presented in Sect. 3. Concluding remarks are given in
Sect. 4.
T. C. K. Lee · M. Tsao
Department of Mathematics and Statistics,
University of Victoria, Victoria, BC, Canada

F. W. Zwiers (✉)
Climate Research Division, Environment Canada,
4905 Dufferin Street, Toronto, ON M3H 5T4, Canada
e-mail: [email protected]
Clim Dyn
DOI 10.1007/s00382-007-0351-9
2 Reconstruction approaches
2.1 Existing approaches
All reconstruction approaches, in general, involve statisti-
cal procedures that map the proxy series onto the
reconstructed temperature. The mapping involves first
finding the relationship between the proxy data and the
instrumental record during a calibration period when both
records are available. This relationship is then applied to
the proxy data to provide the reconstructed historical
temperature.
Mann et al. (2005) classified the existing reconstruction
approaches into two major categories: composite plus scale
(CPS) methods and climate field reconstruction (CFR)
methods. In the CPS approach, a number of proxy series
are first combined to form a composite record, which
is then used to reconstruct the temporal evolution of mean
temperature over some spatial domain, typically the
hemispheric mean. In many studies, the reconstruction
target is the northern hemispheric mean temperature
because proxy data, derived mainly from trees, are mostly
available from the northern hemisphere land masses. The
composite record is typically formed by calculating a
weighted average of all the available proxy series. The
weights can either be uniform (e.g. Jones et al. 1998; Briffa
et al. 2001; Esper et al. 2002) or can be determined using
the correlation between the proxy series and the instru-
mental record during the calibration period (e.g. Hegerl
et al. 2007). The correlation based weighting scheme has
the advantage of minimizing the influence of potentially
unreliable proxy series on the composite record. One recent
study by Moberg et al. (2005) used a different averaging
scheme, in which high-frequency and low-frequency
composites were first formed individually using wavelet
transformation. Each of these composites was formed using
only proxy indicators that were thought to be able to cap-
ture variability in the corresponding frequency range. The
two composites are then combined to form a single com-
posite. This averaging scheme can be beneficial when the
individual proxy series are known to capture variability at
different timescales.
Once the composite is formed, it is then calibrated to
produce the reconstructed temperature. One calibration
approach, the forward regression approach, utilizes the
ordinary least squares regression model which assumes
that, at time t, for t = n + 1, n + 2, …, n + m,
    Tt = aPt + et    (1)

where n + m and m are the lengths of the composite proxy record and the instrumental record, respectively (m is in general much smaller than n), T is the hemispheric mean instrumental record, P is the composite proxy record and et represents error that results from incomplete spatial sampling in the instrumental record. The coefficient a scales Pt to Tt and is estimated by ordinary least squares regression, which minimizes the residual sum of squares between aPt and Tt during the calibration period. By assuming that Eq. (1) also holds at times prior to the calibration period, the unknown historical hemispheric mean temperature can then be reconstructed by scaling the pre-calibration period composite record Pt by â, for t = 1, 2, ..., n. The estimate of a, denoted by â, is in general negatively biased because of the measurement error and non-temperature variability inherent in the composite proxy series (see, e.g. Fuller 1987; Allen and Stott 2003). This under-estimation will result in a loss of variance in the reconstructed series because Var(âP) < Var(aP) for 0 < â < a.

One way to avoid this underestimation in a is to use the
total least squares (TLS) method to estimate a (see Allen and Stott 2003 for an application of this technique in climate change detection analysis). In this case the statistical relationship between Pt and Tt is defined by

    Tt = a(Pt − gt) + et    (2)

where the additional term gt represents non-temperature variability in the composite proxy series. By explicitly incorporating gt into the regression model, the underlying value of a can, in theory, be better estimated. Hegerl et al. (2007) applied the total least squares method in their reconstruction. To derive the best-guess estimate of a, one needs to know the ratio of Var(gt) to Var(et). Since this ratio is unknown in real-world applications, it is estimated using climate model simulations. Hegerl et al. (2007) find that their reconstruction is insensitive to the precise choice of the Var(gt) to Var(et) ratio.

Another CPS reconstruction method, which is termed
the variance matching approach, is favored by Jones et al.
(1998) and others. In this approach, the pre-calibration
period Pt is scaled by a parameter b, where b is determined so that, during the calibration period, Var(T) = Var(bP).
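As a minimal numerical sketch (with an invented synthetic calibration series, not the paper's data), the variance-matching factor b is simply the ratio of calibration-period standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic calibration-period series (illustrative only).
T = rng.normal(0.0, 0.8, 100)             # instrumental record
P = 0.5 * T + rng.normal(0.0, 0.4, 100)   # composite proxy record

# Choose b so that Var(b*P) = Var(T) over the calibration period.
b = np.sqrt(np.var(T) / np.var(P))

# In practice b would then scale the pre-calibration composite record.
scaled = b * P
print(np.isclose(np.var(scaled), np.var(T)))  # → True
```

By construction Var(bP) = b²Var(P) = Var(T), so the reconstruction reproduces the calibration-period variance exactly, regardless of how noisy the composite is.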
A variant of the forward regression model (Eq. 1) is the
inverse regression model (see Brown 1993, for an introduction; Coelho et al. 2004, for an example)

    Pt = fTt + gt    (3)

where gt is defined similarly as in Eq. (2) and f is estimated by minimizing the residual sum of squares between fTt and Pt during the calibration period. The reconstructed hemispheric temperature is then estimated by Pt/f. Different implementations of the inverse regression method have
been used in the paleoclimate reconstruction literature. In
Mann et al. (1998), the regression is done between the principal components of the instrumental record and the individual proxy indicators (see the latter part of this section for details), some of which were obtained by principal component analysis from a set of proxy series. In Juckes
et al. (2006), the inverse regression is done between each
individual proxy series and the Northern Hemisphere
annual mean temperature. A weighted average of the
individual proxy series, weights determined by the
regression coefficients, is then used as the reconstructed
series.
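A sketch of the CPS version of inverse regression (Eq. 3) on synthetic data: f is estimated by regressing the composite proxy on temperature over the calibration period, and a pre-calibration temperature is then estimated as Pt divided by the estimate of f. All values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

f_true = 0.7
T_cal = rng.normal(0.0, 1.0, 150)                    # calibration-period temperature
P_cal = f_true * T_cal + rng.normal(0.0, 0.2, 150)   # Pt = f*Tt + gt

# Least squares estimate of f in Pt = f*Tt + gt
# (proxy regressed on temperature, i.e. inverse regression).
f_hat = (T_cal @ P_cal) / (T_cal @ T_cal)

# A pre-calibration composite value is mapped to temperature as Pt / f_hat.
P_pre = rng.normal(0.0, 0.7, 50)   # hypothetical pre-calibration composite
T_recon = P_pre / f_hat

print(abs(f_hat - f_true) < 0.1)
```

With the proxy noise much smaller than the temperature variance, f̂ lands close to the true coefficient, illustrating why the bias in inverse regression is governed by noise in Tt rather than noise in Pt.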
Note that the instrumental record Tt is subject to sam-
pling uncertainty (see Jones et al. 1997 for an estimate of
this uncertainty) and neglecting such uncertainty when
estimating f in inverse regression will result in an estimate that is negatively biased. However, such bias would be
smaller than the bias incurred by using forward regression
because the non-temperature variability inherent in the
composite record is usually larger than the sampling
uncertainty in the instrumental record. On the other hand,
the total least squares approach used by Hegerl et al.
(2007), in theory, should provide a more accurate estimate
of the regression coefficient when both Pt and Tt are noise
contaminated. However, this only holds if the ratio of
Var(gt) to Var(et), which is required to derive the estimate of a, is known. This ratio is unknown in real-world applications and is estimated using a limited number of
climate model simulations. The bias from such estimation
is hard to determine and thus its impact on the recon-
struction is unclear. For this reason, it is not obvious
whether inverse regression or total least squares regression
will result in smaller bias.
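The attenuation bias discussed above is easy to demonstrate numerically. In this sketch (synthetic data, with proxy noise variance set equal to the signal variance), the forward-regression estimate of a in Eq. (1) is computed over many synthetic calibration periods and comes out well below its true value:

```python
import numpy as np

rng = np.random.default_rng(7)

n_cal, trials = 120, 200
a_true = 1.0
errors = []
for _ in range(trials):
    T_sig = rng.normal(0.0, 1.0, n_cal)               # true temperature signal
    P = T_sig + rng.normal(0.0, 1.0, n_cal)           # proxy: signal + equal-variance noise
    T = a_true * T_sig + rng.normal(0.0, 0.1, n_cal)  # instrumental record
    a_hat = (P @ T) / (P @ P)                         # forward OLS estimate of a
    errors.append(a_hat - a_true)

mean_bias = float(np.mean(errors))
# Errors-in-variables theory attenuates a_hat toward
# a_true * Var(signal) / (Var(signal) + Var(noise)) = 0.5 here.
print(mean_bias < 0.0)
```

The average estimate is roughly half the true coefficient, which is the variance-loss mechanism that motivates the TLS, variance-matching and inverse-regression alternatives.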
In the case of the CFR approach, the proxy series are
used to reconstruct both the underlying temporal and spa-
tial patterns of historical temperature. The Mann et al.
(1998) method (often referred to as the MBH method) is an
example of a technique that uses the climate field recon-
struction approach. The MBH method brings together
techniques used in principal component analysis and
regression. The instrumental record is first decomposed
into its spatial and temporal parts through principal com-
ponent analysis and only a subset of the components are
retained. Next, the relationship between the subset of
temporal principal components (PCs) and the ith proxy
series (i = 1, 2, ..., p) during the calibration period is
obtained by inverse regression which estimates the coeffi-
cients bj^(i) (j = 1, 2, ..., Neof) in

    [ Pn+1^(i) ]        [ b1^(i)    ]
    [ Pn+2^(i) ]  =  U  [ b2^(i)    ]  +  d^(i)
    [    ...   ]        [    ...    ]
    [ Pn+m^(i) ]        [ bNeof^(i) ]

where Neof is the number of EOFs retained, Pt^(i) is the ith proxy datum at time t, U is the matrix of temporal PCs (m × Neof) of the instrumental record and d^(i) is a noise series (m × 1). This procedure is repeated for each proxy series and this yields a matrix of coefficients, denoted by G (p × Neof), which is defined as
    G = [ b1^(1)  b2^(1)  ...  bNeof^(1) ]
        [ b1^(2)  b2^(2)  ...  bNeof^(2) ]
        [  ...     ...    ...     ...    ]
        [ b1^(p)  b2^(p)  ...  bNeof^(p) ]
The pre-calibration period temporal PCs at time t, denoted by Zt (Neof × 1), are then reconstructed through least squares regression using

    [ Pt^(1) ]
    [ Pt^(2) ]  =  G Zt + jt
    [   ...  ]
    [ Pt^(p) ]

where jt is a noise series (p × 1). The sequence of reconstructed temporal PCs, Ẑt, is then scaled to have the
same variance as the instrumental temporal PCs over the
calibration period and subsequently re-combined with
the calibration period spatial PCs to provide an estimate of
the unknown local temperature. A hemispheric mean
reconstruction is then formed by the appropriate spatial
average of the reconstructed local temperatures. The
regression procedures above are both inverse regression.
However, unlike the CPS inverse regression, the individual
proxy series are used in the regression process. Also, there
are Neof coefficients to estimate in each regression, com-
pared to only one coefficient in the CPS case.
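A minimal sketch of the two MBH-style regression steps, using random matrices in place of real instrumental PCs and proxies (the EOF truncation, rescaling and spatial recombination steps are omitted; all sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

m, p, n_eof = 80, 12, 4     # calibration length, proxies, retained EOFs (toy values)

U = rng.normal(size=(m, n_eof))   # temporal PCs of the instrumental record
P_cal = rng.normal(size=(m, p))   # calibration-period proxy matrix

# Step 1: regress each proxy on the temporal PCs; row i of G holds the
# coefficients b_j^(i).  All p regressions solved at once via least squares.
G = np.linalg.lstsq(U, P_cal, rcond=None)[0].T   # G is p x n_eof

# Step 2: for a pre-calibration time t, recover Z_t from the p proxy
# values by least squares in P_t = G Z_t + noise.
P_t = rng.normal(size=p)
Z_t = np.linalg.lstsq(G, P_t, rcond=None)[0]     # n_eof-vector

print(G.shape, Z_t.shape)
```

Note that step 2 inverts a regression fitted in step 1, which is why the text describes both stages as inverse regression with Neof coefficients per proxy.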
Another CFR method uses the regularized expectation
maximization (RegEM) algorithm (Schneider 2001) and is
advocated by Rutherford et al. (2005) and others. Let Tt^(j) be the local temperature at time t at the jth location (j = 1, 2, ..., q) and let X be an (n + m) × (q + p) matrix with missing data which is defined as

    X = [ T1^(1)    T1^(2)    ...  T1^(q)    P1^(1)    P1^(2)    ...  P1^(p)   ]
        [ T2^(1)    T2^(2)    ...  T2^(q)    P2^(1)    P2^(2)    ...  P2^(p)   ]
        [   ...       ...     ...    ...       ...       ...     ...    ...    ]
        [ Tn+m^(1)  Tn+m^(2)  ...  Tn+m^(q)  Pn+m^(1)  Pn+m^(2)  ...  Pn+m^(p) ]

      ≝ [ T(1)     P(1)   ]
        [ T(2)     P(2)   ]
        [  ...      ...   ]
        [ T(n+m)  P(n+m)  ]
with
    E[T(t) P(t)] = μ = [μT μP], for t = 1, 2, ..., n + m

    Var(X) = Σ = [ ΣTT  ΣTP ]
                 [ ΣPT  ΣPP ]

where μT and μP have length q and p, respectively, T(t) = [Tt^(1) Tt^(2) ... Tt^(q)] and P(t) = [Pt^(1) Pt^(2) ... Pt^(p)]. The (q + p) × (q + p) matrix Σ is partitioned according to T and P so that ΣTT, ΣPP and ΣTP = ΣPT^T are q × q, p × p and q × p matrices, respectively. In the matrix X, T(t) is unknown for t = 1, 2, ..., n. It is assumed that each record T(t) can be represented by a linear model of the form

    T(t) = μT + (P(t) − μP)B + ε(t).
Here, ε(t) is the residual and B is a p × q matrix of regression coefficients. The reconstructed temperature T̂(s) at time s (s = 1, 2, ..., n) is thus defined as μT + (P(s) − μP)B.

To calculate T̂(s), estimates of B and μ are required. If n + m ≥ p + 1, the expectation maximization (EM) algorithm provides a way to iteratively estimate these parameters (see, e.g. Schneider 2001). First, initialize the unknown T̂(t) with some values and obtain an estimate of the mean and covariance of matrix X (see Schneider 2001, for the formulas). Then, estimate B using the maximum likelihood estimate B̂ = Σ̂PP^(−1) Σ̂PT, where Σ̂PP and Σ̂PT denote the partitioned covariance matrix estimates. If n + m < p + 1, the estimate of ΣPP is singular and the coefficient B is not defined. Next, impute the missing T̂(t) by μ̂T + (P(t) − μ̂P)B̂ and update the X matrix. Re-estimate μ, Σ and B using the updated matrix and then recalculate the missing temperatures. This process is continued until the change in imputed values becomes sufficiently small.

In a typical CFR application, n + m is greater than p + 1. However, the estimate of Σ is rank deficient because n + m < p + q + 1, which can result in a poor estimate of the coefficient B. Hence, the RegEM algorithm is used instead.
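The EM imputation cycle described above can be sketched as follows. This is a toy, well-conditioned case (synthetic data, n + m well above p + q + 1, no regularization), so the plain maximum likelihood estimate B̂ = Σ̂PP⁻¹Σ̂PT is used directly:

```python
import numpy as np

rng = np.random.default_rng(4)

n, m, q, p = 60, 40, 2, 3        # toy sizes with n + m >= p + q + 1
B_true = rng.normal(size=(p, q))
P_all = rng.normal(size=(n + m, p))
T_all = P_all @ B_true + 0.1 * rng.normal(size=(n + m, q))

T_obs = T_all.copy()
T_obs[:n] = 0.0                  # initialize the unknown pre-calibration rows

for _ in range(50):
    X = np.hstack([T_obs, P_all])
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)           # estimate of Sigma
    S_PP = S[q:, q:]                      # Sigma_PP block
    S_PT = S[q:, :q]                      # Sigma_PT block
    B_hat = np.linalg.solve(S_PP, S_PT)   # B = Sigma_PP^{-1} Sigma_PT
    # Impute the missing temperatures: mu_T + (P - mu_P) B
    T_new = mu[:q] + (P_all[:n] - mu[q:]) @ B_hat
    done = np.max(np.abs(T_new - T_obs[:n])) < 1e-8
    T_obs[:n] = T_new
    if done:
        break

rmse = np.sqrt(np.mean((T_obs[:n] - T_all[:n]) ** 2))
print(rmse < 0.2)
```

In the realistic rank-deficient case (n + m < p + q + 1) the `solve` step above would fail or be unstable, which is exactly where RegEM substitutes a regularized regression for the maximum likelihood estimate.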
The RegEM algorithm consists of the same steps as the EM
algorithm, except the maximum likelihood estimate B̂ is
replaced with a regularized estimate, which is obtained by a
regularized regression procedure. Different regularization
scheme have been used. In Rutherford et al. (2005) and
Mann et al. (2005), the ridge regression scheme is used to
estimate the coefficient matrix B. Readers are referred to
Schneider (2001) for more details of the ridge regression
scheme. In Mann et al. (2007), truncated total least squares
(TTLS) (Fierro et al. 1997) is used to estimate B.
2.2 State-space model and Kalman filter algorithm
In addition to the methods described above, we propose a
new method to be used for reconstructing historical tem-
perature that is based on a state-space time series model
and the Kalman filter algorithm. (Kalman 1960; see Harvey
1989 or Durbin and Koopman 2001, for an introduction).
The proposed technique is essentially a further variant on
inverse regression. A state-space time series model is a
system that consists of two equations: the observation
equation and the state equation. In the context of paleo-
climate reconstruction, the observation equation describes
the relationship between the proxy series and the unknown
hemispheric mean temperature. On the other hand, the state
equation models the dynamics of the unknown hemispheric
mean temperature. Based on the state-space model, one can
estimate the unknown hemispheric mean temperature using
the Kalman filter algorithm, a Bayesian updating scheme
that estimates the unobserved hemispheric mean tempera-
ture based on the proxy data. The estimates from the
Kalman filter algorithm are the best linear estimator of the
unobserved process in the sense of minimum mean square
error. The Kalman filter algorithm can also be extended to
provide a forecast of the hemispheric mean temperature.
To describe the state-space representation of the hemi-
spheric mean temperature, we must first introduce some
notation and make certain assumptions. Thus, let GSt,
VOLt and SOLt be the climate model estimated responses
to greenhouse gas and sulphate aerosol forcing combined
(GS), volcanic forcing (VOL) and solar forcing (SOL) at
time t, respectively. We can then think of the hemispheric
mean temperature Tt as being the sum of the response to
external forcing plus the effect of internal variability. With
these assumptions, a reasonable statistical model for Tt (t = 1, 2, ..., n + m) might be

    Tt = s + dGS GSt + dVOL VOLt + dSOL SOLt + Zt = δXt + Zt    (4)

where δ = [s dGS dVOL dSOL], Xt = [1 GSt VOLt SOLt]^T, s is the mean state of the climate system, dGS, dVOL and dSOL are scaling factors that account for error in the magnitude of the estimated response to the specified external forcing, and Zt represents random variations resulting from internal variability. We further assume that Zt is an AR(1) process with lag-one autocorrelation φ. Thus,

    Zt = Σ_{j=0..∞} φ^j νt−j,    (5)

where the νt's are white noise with mean 0 and variance Q.

With these assumptions, our state-space representation
of the hemispheric mean temperature Tt (t = 1, 2, ...,
n + m) is given by the following system of equations:
    Tt = φTt−1 + δFt + νt
    Pt = fTt + gt    (6)

where Ft = Xt − φXt−1 and gt is white noise with mean zero and variance R. The first equation is the state equation
which determines how the unknown hemispheric mean
temperature evolves over time. It is obtained by considering the difference between Tt and φTt−1 using Eq. (4). This equation assumes that Tt follows an autoregressive process of order 1, i.e. AR(1), with exogenous variables Ft and additive white noise νt. The state equation in effect states that the rate of change in temperature depends upon the response to forcing, which is governed by the forcing coefficients dGS, dVOL and dSOL, and upon natural internal variability. Alternatively, one can also define a state equation that does not contain the response to forcings. This can be achieved by setting Xt = 1 and δ = s. The second equation in Eq. (6), which is the same equation as
used in inverse regression, is the observation equation. This
equation simply assumes that the relationship between the
composite proxy series and hemispheric temperature that
holds during the calibration period in Eq. (3) also holds in
the pre-calibration period.
The problem of estimating the pre-calibration period Tt in a state-space model can be approached by using the
Kalman filter and smoother algorithm. For notational pur-
poses, let Tt^s represent the estimate of T at time t given P1, P2, ..., Ps and F1, F2, ..., Fs, where s ≤ t. The Kalman filter estimates of Tt for t = 1, 2, ..., n + m, denoted by Tt^t, are defined by the following set of recursive equations (Kalman 1960; see also Shumway and Stoffer 2000, p. 313),

    Tt^(t−1) = φ Tt−1^(t−1) + δFt
    Tt^t = Tt^(t−1) + Kt (Pt − f Tt^(t−1))    (7)

where

    Kt = f St^(t−1) (f² St^(t−1) + R)^(−1)
    St^(t−1) = φ² St−1^(t−1) + Q
    St^t = St^(t−1) (1 − f Kt).
Appropriate initial conditions for this recursion are T0^0 = μ and S0^0 = Σ, where μ and Σ are constants. The Kalman prediction Tt^(t−1) is an estimate of Tt based on all the information we have at time t − 1, that is, our prior knowledge of Tt before observing Pt. Such prior knowledge is expressed through our Kalman filter estimate Tt−1^(t−1). After observing Pt, our knowledge of Tt is updated by calculating the Kalman filter estimate, Tt^t. Once all of the Kalman filter estimates are found, we can then further update our estimates of Tt based on the entire data set {Pt, Ft; t = 1, 2, ..., n} using the Kalman smoother algorithm. It should be noted that even though Tt is known for t = n + 1, ..., n + m, we will still provide filter estimates for them. This will enable us to utilize all of the available data to estimate the unknown Tt when using the Kalman smoother. With initial condition Tn+m^(n+m) obtained from the Kalman filter algorithm (Eq. 7), the Kalman smoother estimates Tt^(n+m) of Tt, for t = n + m − 1, n + m − 2, ..., 0, are

    Tt^(n+m) = Tt^t + Jt (Tt+1^(n+m) − Tt+1^t),    (8)

where Jt = φ St^t / St+1^t.
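As an illustration of Eqs. (6)–(8), the scalar filter and smoother recursions can be sketched in a few lines. All parameter values below (φ, f, Q, R, the zero exogenous term and the initial conditions) are invented for the example, not estimated as described in the Appendix:

```python
import numpy as np

rng = np.random.default_rng(5)

n_tot = 200
phi, f, Q, R = 0.6, 0.8, 0.5, 0.3   # illustrative parameter values
dF = np.zeros(n_tot)                # exogenous term delta*F_t (zero here)

# Simulate a state T and proxies P from the model in Eq. (6).
T = np.zeros(n_tot)
for t in range(1, n_tot):
    T[t] = phi * T[t - 1] + dF[t] + rng.normal(0.0, np.sqrt(Q))
P = f * T + rng.normal(0.0, np.sqrt(R), n_tot)

# Kalman filter (Eq. 7): predictions, updates and their variances.
Tf = np.zeros(n_tot)   # filtered estimates T_t^t
Sf = np.zeros(n_tot)   # filtered variances S_t^t
Tp = np.zeros(n_tot)   # predictions T_t^(t-1)
Sp = np.zeros(n_tot)   # prediction variances S_t^(t-1)
T_prev, S_prev = 0.0, 1.0            # initial conditions mu, Sigma
for t in range(n_tot):
    Tp[t] = phi * T_prev + dF[t]
    Sp[t] = phi ** 2 * S_prev + Q
    K = f * Sp[t] / (f ** 2 * Sp[t] + R)
    Tf[t] = Tp[t] + K * (P[t] - f * Tp[t])
    Sf[t] = Sp[t] * (1 - f * K)
    T_prev, S_prev = Tf[t], Sf[t]

# Kalman smoother (Eq. 8), run backwards from the last filtered estimate.
Ts = Tf.copy()
for t in range(n_tot - 2, -1, -1):
    J = phi * Sf[t] / Sp[t + 1]
    Ts[t] = Tf[t] + J * (Ts[t + 1] - Tp[t + 1])

# The smoother also uses future data, so on average it should not do worse.
mse_filter = np.mean((Tf - T) ** 2)
mse_smooth = np.mean((Ts - T) ** 2)
print(mse_smooth <= mse_filter)
```

The backward pass is what lets calibration-period observations inform pre-calibration estimates, which is why filter estimates are computed even where Tt is known.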
A difficulty in using the state-space time series model is that the parameters, {f, R, φ, δ, Q, μ, Σ}, are unknown and need to be estimated. An optimization algorithm such as Newton–Raphson scoring or the EM algorithm can be used to estimate them numerically (see Appendix for details; see Shumway and Stoffer 2000 for examples). Note that the optimization algorithm for the state-space model assumes that the exogenous variable Ft is known. If φ is unknown, the exogenous variables therefore cannot contain the parameter φ. Thus, in our state-space representation of the hemispheric mean temperature, one needs to fix the parameter φ that is involved in Ft = Xt − φXt−1 before estimating the other parameters. More details on the choice of φ are given in the next section. Note that there is no need to fix the parameter φ in front of Tt−1 in Eq. (6) in order to use the optimization algorithm. Thus, this parameter can be estimated together with the other parameters.
The state-space model approach can also be extended to
produce forecasts. Provided that Ft is known for the future
time period, one can use the state equation to generate a
forecast recursively. Recall that the state equation at time t is given by Tt = φTt−1 + δFt + νt. By forecasting the error term νt as zero, the forecast of Tn+m+s (s = 1, 2, ...) can be given by

    T̃n+m+s = φ T̃n+m+s−1 + δFn+m+s    (9)

with T̃n+m = Tn+m being the known instrumental record at time n + m. Forecasts can also be made in the case when the response to external forcing is not included in Eq. (6), i.e. when Xt = 1 and δ = s. However, such forecasts will likely not be very useful because they do not take the impact of forcings into account and thus quickly revert to a forecast of the mean s.

An advantage of using the state-space model in Eq. (6)
is that it provides the flexibility to incorporate forcing
response information into the estimation of the unknown
temperature. This is achieved in a two step process that first
determines the impact of each forcing on the unknown
temperature through the estimation of the forcing coeffi-
cients. The estimation process uses the information
available in both the proxy series and the calibration data.
From Eq. (6), it is obvious that if the forcing coefficient is
significantly different from zero, one can claim that the
corresponding forcing has a significant impact on the
hemispheric mean temperature. Such an assessment can be
made using the confidence bound of each coefficient
obtained during the estimation step (see the Appendix for
details). Once the coefficients are estimated, the forcing
information is then incorporated into the final estimate of
the unknown temperature using the Kalman filter and
smoother algorithm. In other words, the use of the state-
space model allows one to simultaneously reconstruct the
unknown hemispheric mean temperature and conduct a
detection assessment of the importance of the response to
GS, VOL and SOL forcing on hemispheric temperature.
Furthermore, the use of the state-space model allows one to
provide projections of future climate which are based on
parameters that are estimated using the proxy records and
past observations.
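The forecast recursion in Eq. (9) reduces to a few lines of code. The values of φ, the last observed temperature and the zero forcing term below are invented for illustration; with no forcing response the forecast decays geometrically toward the mean (zero here):

```python
import numpy as np

phi = 0.7                  # fixed AR(1) parameter (illustrative)
T_last = 1.5               # known instrumental value at time n + m
dF_future = np.zeros(10)   # delta*F_{n+m+s}; zero with no forcing response

# Eq. (9): T~_{n+m+s} = phi * T~_{n+m+s-1} + delta*F_{n+m+s}
forecast = []
T_tilde = T_last
for s in range(10):
    T_tilde = phi * T_tilde + dF_future[s]
    forecast.append(T_tilde)
forecast = np.asarray(forecast)

# Without forcing, the forecast is T_last * phi^s: geometric reversion to the mean.
print(np.allclose(forecast, T_last * phi ** np.arange(1, 11)))  # → True
```

Supplying nonzero δF from a forcing scenario is what keeps the projection from collapsing to the mean state s.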
3 Results
3.1 Analysis with climate model simulations
The performance of a particular reconstruction method
depends on many factors, such as the statistical methods
used, the choice of proxies, the quality of the proxy records
and the target season or latitude band. It is, in general,
difficult to compare the performance of different recon-
struction methods using the past instrumental temperature
record because the instrumental period is simply too short
to calibrate such techniques and reliably assess their per-
formance. Climate model simulations, however, can
provide a test bed for assessing the reliability of these
methods as first introduced by Zorita et al. (2003; see also
von Storch et al. 2004; Mann et al. 2005; Zorita and von
Storch 2005 and others). In this paper, we will follow the
methodology proposed by von Storch et al. (2004). The
idea is to generate pseudo-proxy records by sampling a
selection of simulated grid-box temperatures from the cli-
mate model and degrading them with additive noise. The
reconstruction method is then applied to these pseudo-
proxy records and the resulting reconstruction is validated
against the known simulated hemispheric mean tempera-
ture record. To reflect the differences in the quality and
properties of real-world proxy data, different colours of
noise (e.g. red rather than white) and varying amplitudes of
the noise variance have been used in different studies.
In this section, we followed these procedures for testing
the reconstruction methods that are described in Sect. 2.
We have conducted a suite of experiments that explore the
sensitivity of each method to (1) the climate model simu-
lation used, (2) the length of the calibration period and (3)
the amount of noise introduced into the pseudo-proxy
series. The two simulations used are the GKSS ECHO-G
simulation (von Storch et al. 2004) and a simulation from
the NCAR CSM 1.4 model (Ammann et al. 2007). Both
simulations were forced with reconstructions of solar,
volcanic and greenhouse gas forcing. The CSM 1.4 run was
also forced with a reconstruction of aerosol forcing over
the millennium. We used the ECHO-G simulation in its
original form, rather than as adjusted (Osborn et al. 2006).
For both simulations, only output between years 1000 and
1990 is used. To represent the varying calibration intervals
that were used in actual reconstructions, we have used two
calibration periods in our analysis (1880–1960 and 1860–
1970). These calibration periods are similar to that used in
Hegerl et al. (2007) and Moberg et al. (2005). In our
experiments, pseudo-proxy records were formed by
degrading the grid box temperatures with additive noise.
The amount of noise introduced into the pseudo-proxy
series is expressed in terms of the signal-to-noise ratio
(SNR), which is defined as √(Var(X)/Var(N)), where X is
the grid box temperature series and N is the additive noise
series. For all methods, experiments with SNR = 0.5 and 1
were performed.
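The pseudo-proxy construction can be sketched as follows, using a random series as a stand-in for a simulated grid-box temperature and white (rather than red) noise; the function name is ours, not from the cited studies:

```python
import numpy as np

rng = np.random.default_rng(6)

grid_temp = rng.normal(0.0, 1.0, 991)   # stand-in for a simulated grid-box series

def make_pseudo_proxy(x, snr, rng):
    """Degrade a grid-box temperature series with additive white noise
    so that sqrt(Var(x) / Var(noise)) equals the requested SNR."""
    noise_sd = np.sqrt(np.var(x)) / snr
    noise = rng.normal(0.0, noise_sd, x.size)
    return x + noise, noise

proxy, noise = make_pseudo_proxy(grid_temp, snr=0.5, rng=rng)

achieved = np.sqrt(np.var(grid_temp) / np.var(noise))
print(round(achieved, 1))
```

Red noise would replace the white-noise draw with an AR(1) series scaled to the same variance, which is how varying noise colour is handled in the studies cited above.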
First, we obtain a pseudo northern hemisphere instru-
mental record from the simulation. CFR methods use a set
of continuous grid-box temperatures for analysis; we
therefore fixed the spatial coverage of the entire pseudo-
instrumental record at that of the instrumental network in
1920, as represented in the HadCRUT3 data set (Brohan
et al. 2006). The choice of the year 1920 is arbitrary, but it
corresponds roughly to the mid points of the calibration
periods that were used in our analysis. For the other meth-
ods, the pseudo NH mean instrumental record is calculated
from the appropriate areal average of the same grid-box
temperatures used above. This ensures that all methods are
provided with the same information for calibration.
Next, we define two pseudo-proxy networks of 15 and
100 randomly selected model grid boxes. These networks
were sampled from the 321 NH grid boxes that are co-
located with actual tree ring data found in the International
Tree Ring Data Base (http://www.ncdc.noaa.gov/paleo/
treering.html). These networks of grid-box temperature
were converted to pseudo-proxy records by degrading the
grid-box temperatures with added noise to mimic the
measurement error that is inherent in the proxy records.
The resulting pseudo-proxy series were then standardized
relative to the calibration period. Since the RegEM method
is computationally intensive, we will only apply this
method to the larger network of the two. The size of this
network is similar to networks that are used in the real-
world application of this technique. On the other hand, the
CPS methods and state-space model approach are less
computationally intensive, and these methods were tested
using both networks.
We used uniformly weighted spatial averages to form
the composite record in our experiments; composites that
are formed by wavelet transformation (Moberg et al. 2005)
or correlation based weighted averages (Hegerl et al. 2007)
are not considered here. Since the SNR and colour of the
noise in each pseudo-proxy series is the same, the different
weighting scheme should not substantially impact the
resulting composite record.
For the total least squares method, Var(et) is estimated by calculating the variability of the difference between the
pseudo-instrumental record and the climate model simu-
lated NH temperature over the whole reconstruction period.
Since the pre-noise contaminated composite record is
known in our experiment, Var(gt) is estimated by calculating the variability of the difference between the pre-noise
contaminated and noise contaminated composite pseudo-
proxy record, instead of following the procedure described
in Hegerl et al. (2007). This is a rather idealized situation
in the sense that we have used information that is not
available in real-world application to estimate the two
variances.
For the MBH method, we retained the 10 largest EOFs
instead of using the selection rule that was described in
Mann et al. (1998). We also repeated our analysis by
retaining the 5 or 15 largest EOFs and the results are almost
identical to that obtained with the 10 largest EOFs. Thus,
these results will not be shown. Also, no detrending was
done to the data prior to calibration (see Wahl et al. 2006;
von Storch et al. 2006 for further discussions on the use of
detrended data).
For the RegEM method, the hybrid non-stepwise
approach is used to reconstruct the grid box temperatures
(Rutherford et al. 2005; Mann et al. 2005). As in Ruther-
ford et al. (2005) and Mann et al. (2005), the ridge
regression procedure is used to regularize the EM algo-
rithm. Prior to reconstruction, a weight is applied to each
standardized proxy series to ensure that the error variance
of the signal in the series is homogeneous among all
records. The weight for the ith proxy series is defined as √(Var(P(i))/Var(S(i))), where P(i) is the ith proxy series and
S(i) is the signal in the ith proxy series. However, S(i) is
unknown in real-world applications. An approximation of
this weight can be provided by the sample correlation
coefficient between the proxy series and the associated grid
box temperature over the calibration period (M. Mann and
S. Rutherford, personal communication, 2006). Such a
weighting approach is not implemented in Rutherford et al.
(2005) and Mann et al. (2005). However, through our
experiments with the RegEM method (not shown) and
personal communications (2006) with M. Mann and S.
Rutherford, we confirmed that reconstruction of the CSM
hemispheric mean temperature is sensitive to whether
weighted proxy series are used. However, this only applies
to reconstructions that use ridge regression to regularize the
EM algorithm. Mann et al. (2007) found that results
obtained using TTLS regression for regularization are
insensitive to whether weights are used. They also pointed
out that regularization using TTLS regression is less
computationally intensive and tends to provide more robust
results than regularization with ridge regression. Hence
regularization using TTLS regression would be preferable.
However, we were not aware of these advantages at the
time of running our experiments, and hence we report on
results obtained with the ridge regression procedure. Mann
et al. (2005) found that RegEM reconstructions are rela-
tively insensitive to the use of a shorter calibration period
and hence, in this analysis, RegEM experiments were only
carried out for the 1860–1970 calibration period.
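The correlation-based weighting described above can be sketched as follows. The weight for each standardized proxy is its calibration-period correlation with the associated grid-box temperature (the function names are ours, not from the RegEM code):

```python
import math

def correlation(x, y):
    """Sample correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weight_proxies(proxies, grid_temps):
    """Weight each standardized proxy series by its calibration-period
    correlation with the associated grid-box temperature."""
    weights = [correlation(p, t) for p, t in zip(proxies, grid_temps)]
    return [[w * v for v in p] for w, p in zip(weights, proxies)]
```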
For the state-space model approach, we have run two
separate types of experiments in which the variable Ft in
Eq. (6) is defined differently. The first type of experiment
accounts for the impact of external forcing when recon-
structing the unknown temperature. This is achieved by
setting Xt = [1 GSt VOLt SOLt]T. The second type
of experiment only uses the information from the proxy
data for reconstruction; this is done by setting Xt = 1 with
a scalar coefficient δ. For experiments that investigate the impact of
external forcing, the variable Xt is obtained from an energy
balance model (EBM) driven with reconstructed solar,
volcanic and anthropogenic forcings (Hegerl et al. 2003,
2007). The EBM simulation used here is the same as that
used in Hegerl et al. (2007), from which we had available
the 30N–90N average response to greenhouse gas, sulfate
aerosol, volcanic and solar forcing.
In all the experiments, all parameters except one in Eq.
(6) are estimated through the EM algorithm (see Appendix
for details). The exception is the parameter φ in the
exogenous variable Ft, which needs to be estimated outside
of the EM algorithm. The value of φ will be data dependent
because φ represents the lag-one autocorrelation of internal
variability. For our analysis, it is estimated by calculating
the lag-one autocorrelation of the residuals that result from
fitting Eq. (4) using the CSM or the ECHO-G model
simulated northern hemisphere mean temperature as Tt and
the EBM simulated response to the forcings mentioned above
as Xt. For the CSM and ECHO-G simulations, φ is estimated to be 0.581 and 0.741, respectively. The exogenous
variable Ft used in our analysis is then obtained using these
φ values. The precise choice of φ turns out to have very
little impact on the resulting reconstruction (not shown).
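The lag-one autocorrelation used to fix φ can be computed directly from the residual series (a sketch; producing the residuals from the fit of Eq. (4) is not reproduced here):

```python
def lag_one_autocorrelation(resid):
    """Lag-one sample autocorrelation of a residual series."""
    n = len(resid)
    m = sum(resid) / n
    num = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, n))
    den = sum((r - m) ** 2 for r in resid)
    return num / den
```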
Annual mean data are used for all reconstruction
methods, with some exceptions. Following the typical CPS
procedure, the parameters a and b in the forward regression
and variance matching methods are estimated from decadally
smoothed data. The estimated parameters are then used to
scale the annual mean pseudo-proxy
series to provide the reconstructed annual hemispheric
mean temperature. For comparison, we also reconstruct the
temperature using parameters that are estimated with
annual mean data and we will denote such methods as the
non-smoothed forward regression and non-smoothed vari-
ance matching methods. On the other hand, as in Mann
et al. (1998), the pseudo-instrumental record used in the
principal component analysis of the MBH method is
expressed as monthly means and annual mean PCs are
subsequently obtained from the monthly mean PCs for
analysis.
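The smoothed CPS estimation described above can be sketched with the variance matching method: the scale and offset are fitted to decadally smoothed calibration data, then applied to the annual composite. A simplified sketch, in which a plain running mean stands in for whatever decadal filter is actually used:

```python
def moving_average(x, window=10):
    """Simple running mean as a stand-in for decadal smoothing."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]

def variance_matching_params(temp, composite, window=10):
    """Estimate scale b and offset a so the smoothed composite matches
    the mean and standard deviation of the smoothed temperature."""
    ts = moving_average(temp, window)
    cs = moving_average(composite, window)
    mt, mc = sum(ts) / len(ts), sum(cs) / len(cs)
    st = (sum((v - mt) ** 2 for v in ts) / (len(ts) - 1)) ** 0.5
    sc = (sum((v - mc) ** 2 for v in cs) / (len(cs) - 1)) ** 0.5
    b = st / sc
    a = mt - b * mc
    return a, b

def reconstruct(composite, a, b):
    """Scale the annual composite to reconstructed temperature."""
    return [a + b * v for v in composite]
```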
Figure 1a shows examples of reconstructed NH tem-
perature evolution simulated by the CSM. The
reconstructions are based on 15 pseudo-proxy series with
SNR = 0.5. It should be noted that the same pseudo-proxy
series and pseudo-instrumental record were used for each
method to ensure that differences in performance are solely
due to the methods themselves. Among the methods
considered, most provided a faithful estimate of the
simulated NH temperature. The forward regression
(smoothed and non-smoothed) and the non-smoothed vari-
ance matching methods are exceptions. Comparing the
series reconstructed with inverse regression and total least
squares regression, it is clear that the two series are visually
indistinguishable, suggesting that the neglect of sampling
uncertainty in the instrumental record when estimating f
(Eq. 3) does not have a substantial impact on the results. On
the other hand, neglecting the measurement error in the
composite record when estimating a (Eq. 1) can cause a
substantial bias in the reconstruction. This can be observed
by comparing the reconstructed series obtained with for-
ward regression to that obtained with total least squares
regression. Experiments using 100 pseudo-proxy series
with SNR = 0.5 are displayed in Fig. 1b. The increase in
the number of pseudo-proxy series does seem to alleviate
the problem in the smoothed forward regression method.
The improvement gained when going from 15 to 100
pseudo-proxy series is expected because the noise variance
in the composite record of 100 pseudo-proxy series is only
15% of that with 15 pseudo-proxy series. However, such
improvement may be less significant in real-world appli-
cations given that errors might be spatially correlated.
Results obtained with the ECHO-G simulation are very
similar and thus not shown.
The test results for a reconstruction method can be
affected by both the specific realizations of noise that are
added when creating the pseudo-proxy record, and the
locations of the pseudo-proxy record. One would expect
the reconstruction to differ as a result of sampling vari-
ability from at least these two sources. Therefore, to get a
better picture of the performance of each reconstruction
method, it is necessary to apply them to a number of
realizations of the pseudo-proxy record. Hence, we
reconstructed the hemispheric mean temperature series 100
times for each method using 100 different realizations of
pseudo-proxy records. For each realization, both the noise
and the locations of the pseudo-proxies within the 321
grid boxes specified earlier change randomly.
For the network with 100 proxy series, only 40 recon-
structions are produced for each method due to
computational limitations. However, 100 reconstructions
were produced for each method for the smaller 15-location
proxy network.
To provide a quantitative assessment of each reconstruction method, we computed the relative root mean
squared error (RRMSE) of the reconstruction,
expressed relative to the variability of the model simulated
NH temperature during the pre-calibration period. The
RRMSE is simply defined as
RRMSE = √[ Σ_{t=1}^{n} (Tt − T̂t)² / Σ_{t=1}^{n} (Tt − T̄)² ]
where Tt, T̂t and T̄ are the model simulated NH temperature, the reconstructed NH temperature and the temporal
mean of the model simulated NH temperature during the
pre-calibration period, respectively. In general, a smaller
RRMSE means a better reconstruction. The RRMSE can
lie between zero and infinity, and an RRMSE value of less
than 1 indicates that the reconstructed series is better than a
reconstruction that has a constant value equal to the cli-
matology of the pre-calibration period.
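In code, the RRMSE defined above is:

```python
def rrmse(true_temp, recon_temp):
    """Relative RMSE over the pre-calibration period: reconstruction
    error normalized by variability about the period mean."""
    n = len(true_temp)
    mean_true = sum(true_temp) / n
    num = sum((t - r) ** 2 for t, r in zip(true_temp, recon_temp))
    den = sum((t - mean_true) ** 2 for t in true_temp)
    return (num / den) ** 0.5
```

A constant reconstruction equal to the pre-calibration climatology gives RRMSE = 1, matching the interpretation in the text.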
Figure 2 shows the median of the RRMSEs obtained
from the 100 (or 40) realizations, as a function of the
degree of smoothing of the climate model simulated annual
mean series Tt and the reconstructed annual mean series T̂t.
An estimated 5–95% uncertainty range of the RRMSE is
also displayed in the figure, which is obtained by using the
5th and 95th percentiles of the sample of RRMSEs.
Comparing the two simulations, the results are robust for
most of the methods. The
performance of most of the methods considered is very
similar at decadal and lower resolution. The non-smoothed
forward regression, non-smoothed variance matching and
MBH methods are exceptions. The performance of all
methods was found to be insensitive to the two choices of
calibration period, and thus results obtained using the
shorter calibration periods are not shown. In fact, for both
the CSM and ECHO-G simulation, there is almost no
change in the median value of RRMSE when the shorter
calibration period is used.
At the annual resolution, the RegEM and state-space
model approaches produce the smallest RRMSEs. This is a
result of the more sophisticated procedures that are involved
in these methods, which in effect filter out the measurement
errors in the pseudo-proxy series. In contrast, the recon-
structed series from a typical CPS method is merely a scaled
version of the composite series, with the result that the
measurement error contained in the composite series is
directly transferred to the reconstructed series. This prob-
lem is more severe when only 15 proxy series are available
for reconstruction (Fig. 3). Hence, the simple CPS methods
should be avoided if the goal is to reconstruct high fre-
quency climate variation. At the same time, the MBH
method is observed to be the worst performer when
SNR = 0.5. However, when SNR is increased to 1, the
MBH method, at annual resolution, produces comparable
RRMSEs to that of most CPS methods considered.
The estimated RRMSE uncertainty ranges provide
information on the sensitivity to sampling variability for
each method. From Fig. 2, it is obvious that such sensi-
tivity is larger when only 15 pseudo-proxy series are used,
reflecting the greater sampling variability in the composite
record when only 15 proxy series are used. Inter-compa-
rison between the different methods suggests that the
sensitivity to sampling variability at annual resolution is
Fig. 1 Examples of reconstructed CSM northern hemisphere mean temperature series. Experiments were run using SNR = 0.5 and the 1860–1970 calibration period with a 15 and b 100 pseudo-proxy series. 11-year moving averages are shown. The CSM NH mean is shown in black. Reconstructions are shown as coloured lines. All series are expressed as anomalies relative to the calibration period. Units: degrees Kelvin
somewhat smaller for the RegEM and state-space model
approaches. At decadal or lower resolution, the RRMSE
uncertainty range is very similar across most methods.
By comparing the results of the two state-space model
approaches (with external forcing response and without), it
is clear that the RRMSE is insensitive to the inclusion or
Fig. 2 Relative root mean squared error (RRMSE) of the reconstruction, expressed relative to the variability of the simulated hemispheric temperature. Units: degrees Kelvin. The median RRMSE is indicated with horizontal bars and the estimated 5–95% range of the RRMSEs is shown with vertical lines. Results using the 1860–1970 calibration period with different signal-to-noise ratios and varying numbers of pseudo-proxies are shown for the two climate model simulations: a CSM and b ECHO-G
exclusion of the EBM estimated forcing response infor-
mation. Nevertheless, the reconstructed series from the two
approaches are slightly different when only 15 pseudo-
proxy series are used (Fig. 1a). This suggests that the
Kalman filter and smoother algorithm relies more heavily
on the proxy series than the estimated response to forcings
to estimate the unknown temperature. Even though the
reconstructions from the two state-space model approaches
are very similar, the approach that takes forcing changes
into account may be more useful in some instances since it
may be possible to use it to provide a detection assessment
and perhaps also a projection of future climate.
As mentioned above, one can use the state-space model
to simultaneously reconstruct the NH temperature and
conduct detection analysis. Both the CSM and ECHO-G
simulations are forced with a combination of external
forcing factors and therefore we should be able to detect
the effect of external forcing in our experiments provided
that these models respond similarly to forcing as the EBM.
Figure 4 shows the confidence bounds on the forcing
response coefficients for the experiments using SNR = 0.5.
For δGS and δVOL, the confidence bounds do not include
zero for almost all experiments that were run, indicating
in the reconstruction of the CSM and ECHO-G simula-
tions. However, the response to solar forcing as simulated
by the EBM is not detectable in about half of the CSM
reconstructions obtained using 15 pseudo-proxy series.
This fraction is reduced to near zero when the SNR is
increased to 1. In contrast, the EBM simulated SOL signal
is detectable in all the CSM experiments that use 100
pseudo-proxy series and in all ECHO-G experiments. The
inability to detect the SOL forcing in some experiments
may be due to the fact that the climate response to solar
forcing is relatively weaker than that to the other forcings
and hence may be harder to detect when the noise con-
tamination in the pseudo-proxy series increases. At the
same time, unlike the ECHO-G and EBM simulations, the
solar forcing estimates used in the CSM simulation
excluded the 11-yr solar cycle and this may also contribute
to the varying detection results for the response to solar
forcing. The inability to consistently detect the response to
SOL forcing in our experiments is consistent with detection
work on real-world paleo-reconstructions (Hegerl et al.
2003, 2007).
Figure 5 displays hindcasts of annual NH mean tem-
perature for 1971 to 1990 with 100 pseudo-proxy series,
SNR = 0.5 and the 1860–1970 calibration period. The
hindcasts are produced using Eq. (9). For comparison, the
sum of the responses to external forcings for the average of
30N–90N as simulated by the EBM is also displayed in the
figure. We have produced two hindcasts for each climate
model using the same set of pseudo-proxy series, one using
the full 1000–1970 period to estimate the parameters for
the state-space model and another using only data from
1800 to 1970. It can be observed that the skill of the
hindcast varies between the two analysis periods. In fact,
the estimates of the forcing response coefficients, which
strongly influence the hindcast values, are substantially different
for the two analysis periods (Table 1). However, these
Fig. 3 Example of the difference in temperature anomalies between the ECHO-G simulation and the reconstructed ECHO-G series using SNR = 0.5 and the 1860–1970 calibration period, plotted at annual resolution with a 15 pseudo-proxy series and b 100 pseudo-proxy series
differences have only a minor influence on the recon-
structed series (not shown) because the Kalman smoother
algorithm relies more heavily on the proxy data than the
EBM to reconstruct the unknown temperature.
Figure 6 displays examples of reconstructed NH tem-
perature from the MBH method using the two different
simulations and different SNR with the 1860–1970 cali-
bration period. Von Storch et al. (2004) included a
detrending step in their test of the MBH method. We
therefore repeated our experiments with detrended cali-
bration data and those results are also shown in Fig. 6.
When the variance of the noise added in the pseudo-proxy
series is the same as the grid-box temperature variance, that
is, when SNR=1, the non-detrended MBH method was able
to provide a reasonable reconstruction of the ECHO-G
simulated hemispheric mean temperature. However, results
Fig. 4 95% confidence bounds for the coefficients used to scale the EBM simulated responses to external forcing in the state-space model when the pseudo-proxy SNR is 0.5 and using the 1860–1970 calibration period. Only results from 5 out of the 100 (or 40) experiments are displayed. The number in the bottom left corner of each box indicates the percentage of confidence bounds (out of 100 or 40) that excludes zero
Fig. 5 Hindcasts of annual mean NH temperature based on the estimated state equation from 100 proxy series for two analysis periods: 1000–1970 and 1800–1970. The sum of responses to external forcings for the 30N–90N average as simulated by the energy balance model is also displayed for comparison purposes. All series are expressed as anomalies relative to the 1860–1970 period
become unsatisfactory when the SNR is decreased to 0.5.
For the reconstruction of CSM hemispheric mean tempera-
ture, the result is poor with both SNRs. Although our
calibration period is longer than that used in Mann et al.
(1998), our result does not change substantially when the
shorter 1880–1960 calibration period is used (not shown).
While our result does suggest the MBH method under-
estimates long term variability, the under-estimation is
smaller when non-detrended data is used (see Wahl et al.
2006; von Storch et al. 2006 for further discussions).
We also tested the CPS methods that are mentioned in
the previous section using detrended calibration data (not
shown). The performance of the CPS methods varies across
different realizations of pseudo-proxy records. In some
cases, detrending does not affect the reconstructed series
irrespective of which CPS method is considered. On the
other hand, there were also realizations of pseudo proxies
for which there was under-estimation of variability when
detrended data are used. Therefore, in the context of
pseudo-proxies constructed with white noise, detrending
results in less robust reconstructions of hemispheric mean
temperature variability.
3.1.1 Analysis with red pseudo-proxy noise
Up to this point, our experiments have not taken into
account the possibility that the proxies consist of a tem-
perature signal plus correlated errors. We therefore now
examine the impact of red pseudo-proxy noise on the
reconstruction methods. Following Mann et al. (2007), red
pseudo-proxy noise is generated from an AR(1) process
with lag-one autocorrelation equal to 0.32. As before,
analyses are conducted using two calibration periods and
with SNR = 0.5 and 1.
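Red pseudo-proxy noise of this kind can be generated as follows. This is a sketch under our reading of the SNR convention stated earlier (SNR = 1 when the noise variance equals the signal variance, so Var(noise) = Var(signal)/SNR²); the innovation variance is scaled so the AR(1) process has the target stationary variance:

```python
import random

def ar1_noise(n, rho=0.32, signal_var=1.0, snr=0.5, seed=0):
    """Generate AR(1) noise with lag-one autocorrelation rho, scaled so
    the stationary variance equals signal_var / snr**2."""
    rng = random.Random(seed)
    target_var = signal_var / snr ** 2
    innov_sd = (target_var * (1 - rho ** 2)) ** 0.5
    x = [rng.gauss(0.0, target_var ** 0.5)]  # draw from stationary dist.
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, innov_sd))
    return x
```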
Examples of reconstructed NH temperature evolution
simulated by the CSM based on red pseudo-proxy series
with SNR=0.5 are displayed in Fig. 7. It is clear that the
redness of the pseudo-proxy noise has slightly increased
the variability of the reconstructed series when 15 pseudo-
proxy series are used. On the other hand, series recon-
structed with 100 pseudo-proxy series do not seem to be
affected. Results obtained with the ECHO-G simulation are
very similar and thus not shown.
The estimated RRMSEs obtained with red pseudo-proxy
series (not shown) are very similar to those shown in
Fig. 2. In particular, the estimated median RRMSEs are
almost unchanged and the relative ranking of the RRMSEs
between the different reconstruction methods remains the
same as in the case of white pseudo-proxy series. Hence,
similar conclusions regarding the performance of the dif-
ferent methods can be drawn as before. However, the
RRMSE uncertainty ranges are slightly larger than before
when 15 pseudo-proxy series are used.
3.2 Analysis with real-world paleoclimate proxy data
We apply the CPS methods and state-space model
approach to the paleoclimate proxy data used in Hegerl
et al. (2007). This data set consists of 14 proxy series. All
records are available as decadally smoothed series. As in
Hegerl et al. (2007), we calculated correlation based
weighted averages to form the composite record. Using
1880–1960 as the calibration period, the 30N–90N mean
temperature is reconstructed for the period 1510–1960. As
noted previously, the use of the state-space model approach
requires fixing the parameter φ in the exogenous variable
Ft. Here, the value of this parameter is estimated by the
lag-one autocorrelation of the decadally smoothed 30N–
90N mean temperature from a control simulation of the
CCCma CGCM2 (Flato and Boer 2001). The resulting φ
value, 0.982, was used to obtain the exogenous variable Ft.
Figure 8 compares the reconstructed series obtained
from the variance matching method, state-space model
approach and Hegerl et al. (2007), who used the total least
squares method. The reconstruction estimates obtained from
these approaches are nearly identical. Reconstructed series
from the other CPS approaches also agree closely with the
series in Fig. 8 (not shown). This finding is consistent with
results obtained using climate model simulations in the
previous section where it was seen that these methods have
similar RRMSEs at the decadal resolution.
Estimates of the parameters of the state-space model
(with forcing) are given in Table 2. The parameter φ is
estimated to be close to 1, which is larger than the values
obtained using climate model simulations with annual
mean data, which range from 0.3 to 0.7. This is not
surprising given that decadally smoothed data are strongly
dependent between successive time points. The estimated
parameter values for δGS and δVOL are significantly different
from zero, suggesting that the responses to GS and VOL
forcing are detectable. On the other hand, the response to
SOL forcing is not detected. These detection results agree
with the findings reported in Hegerl et al. (2007).
Table 1 Estimated parameter values of the state-space model obtained with 100 pseudo-proxy series using two analysis periods

Parameter   CSM 1000–1970   CSM 1800–1970   ECHO-G 1000–1970   ECHO-G 1800–1970
δGS         1.315           2.105           0.775              1.978
δVOL        0.994           2.334           1.270              1.875
δSOL        0.891           -0.298          2.779              6.456

Boldfaced values are significant
Fig. 6 Comparison of the reconstructed hemispheric mean temperature series obtained using the MBH method with non-detrended or detrended data at two signal-to-noise ratios a SNR = 0.5 and b SNR = 1 with 100 pseudo-proxy series and the 1860–1970 calibration period. All series are expressed as 11-year running means and as anomalies relative to the calibration period
The 30N–90N decadally smoothed mean temperature
hindcast for 1961–1990 is displayed in Fig. 9. Hindcasts
are generated for three analysis periods (1270–1960, 1510–
1960 and 1800–1960). Confidence bounds on the hindcast
for the 1510–1960 analysis period are also displayed in the
figure. The hindcasts obtained from the 1270–1960 and
1510–1960 analysis periods are very similar and very
close to the sum of the responses to external forcings as
simulated by the EBM. The confidence bounds on the
hindcasts generated for the 1510–1960 analysis were able
to include almost all the observed temperature anomalies.
Similar results can be obtained for the 1270–1960 analysis
Fig. 7 Examples of reconstructed CSM northern hemisphere mean temperature series. Experiments were run using SNR = 0.5 and the 1860–1970 calibration period with a 15 and b 100 red pseudo-proxy series. 11-year moving averages are shown. The CSM NH mean is shown in black. Reconstructions are shown as coloured lines. All series are expressed as anomalies relative to the calibration period. Units: degrees Kelvin
period (not shown). On the other hand, the hindcasts
obtained from the 1800–1960 analysis period warm too
quickly. This is because the estimated value of δGS is four
times larger than that in the other two analysis periods
(Table 2).
4 Conclusion
In this paper, we have compared the skill of several dif-
ferent reconstruction methods using climate model
simulations. At the annual resolution, the state-space model
and RegEM approaches provide the best reconstructions.
On the other hand, when compared at decadal or lower
resolution, we find that most methods can provide satis-
factory and similar results. Exceptions are the MBH, non-
smoothed forward regression and non-smoothed variance
matching methods. When analysed with decadally
smoothed real-world paleoclimate proxy data, we find that
all of the CPS methods considered provide almost identical
results. The similarity in performance provides evidence
that the difference between many real-world reconstruc-
tions is more likely to be due to the choice of the proxy
series, or the use of different target seasons or latitudes,
than to the choice of statistical reconstruction method (see
also Juckes et al. 2006).
We have also put forward another approach to historical
temperature reconstruction that is based on a state-space
time series model and the Kalman filter and smoother
algorithm. This approach allows the possibility of incor-
porating additional non-proxy information into the
reconstruction analysis, such as the estimated response to
external forcing. However, our experiments show that the
state-space model approach does not produce substantially
different reconstructions when such information is inclu-
ded. Nevertheless, both state-space model approaches
provided better reconstructions than existing CPS methods
at annual resolutions. At the same time, including forcing
response terms in the state-space model allows one to carry
out a simultaneous reconstruction and detection analysis. It
can also be used to provide forecasts of future climates.
Consistent with the results of Hegerl et al. (2007), we have
detected the effects of anthropogenic forcing (greenhouse
gas and aerosol) and volcanic forcing in real-world
paleoclimate proxy data.
Fig. 8 Reconstructed series for the 30N–90N mean using the variance matching method and state-space model approach for real-world paleoclimate proxy data. Temperature series are expressed as 11-year moving averages. All series are expressed as anomalies relative to the 1880–1960 period and in units of degrees Kelvin
Table 2 Estimated parameter values of the state-space model obtained with paleoclimate proxy data for three reconstruction periods

Parameter   1270–1960               1510–1960               1800–1960
φ           0.981 (0.967, 0.994)    0.979 (0.961, 0.997)    0.944 (0.900, 0.988)
δGS         1.022 (0.043, 2.001)    1.128 (0.112, 2.143)    4.271 (1.054, 7.489)
δVOL        0.953 (0.685, 1.220)    0.814 (0.522, 1.106)    0.998 (0.461, 1.535)
δSOL        0.784 (-0.592, 2.161)   0.368 (-1.218, 1.955)   -0.203 (-3.211, 2.805)

The 95% confidence interval is listed in brackets. Boldfaced values are significant
Fig. 9 State equation hindcast of decadally smoothed 30N–90N mean temperature with real-world paleoclimate proxy data. All hindcasts are based on the estimated state equation for three analysis periods: 1270–1960, 1510–1960 and 1800–1960. Nine proxy series are available for the 1270–1960 analysis period and 14 proxy series are available for the other two analysis periods. 95% confidence bounds of the hindcasts for the 1510–1960 analysis period are shown as dashed lines. All series are expressed as anomalies relative to the 1880–1960 period and in units of degrees Kelvin
Acknowledgments We thank Caspar Ammann, Gabriele Hegerl and Eduardo Zorita for providing their data for use in this study. We
also thank Gabriele Hegerl for helpful and constructive discussion.
We gratefully acknowledge that Terry Lee was supported by the
Canadian Foundation for Climate and Atmospheric Science through
the Canadian CLIVAR Research Network. Work by Min Tsao was
supported by the Natural Sciences and Engineering Research Council
through a Discovery Grant. This paper was improved by insightful
and helpful comments provided by Scott Rutherford, Walter Skinner,
Xuebin Zhang and an anonymous referee.
5 Appendix
5.1 Expectation maximization algorithm
Here we present the steps required to estimate the unknown
parameters that specify the state-space model (Eq. 6). We
use Θ = {f, R, φ, δ, Q, μ, Σ} to represent the vector of
unknown parameters. The derivations presented here are
expanded from Shumway and Stoffer (1982, 2000, pp.
321–325). Detailed derivations of the following procedure are
given in Lee (2008). We first write down the likelihood
function for {Pt; t = 1, 2, ..., n} as in Shumway and Stoffer
(2000). Ignoring a constant, the likelihood function LP(Θ)
can be expressed as:
−2 ln LP(Θ) = Σ_{t=1}^{n} ln(Ωt) + Σ_{t=1}^{n} et²/Ωt

where Ωt = f²·S_t^{t−1} + R > 0 and et = Pt − f·T_t^{t−1} for t = 1, 2,
..., n. The calibration period hemispheric mean
temperatures {Tt; t = n + 1, ..., n + m } are not involved
in the above likelihood function. To better utilize the
available information, we have modified the likelihood
function to include {Tt; t = n + 1, ..., n + m}. Ignoring a
constant, one can express the likelihood function for the
observation data set {Pt,Ts; t = 1, 2, ..., n; s = n + 1, ...,
n + m}, namely LP,T(Θ), as
−2 ln LP,T(Θ) = Σ_{t=1}^{n+m} ln(Ωt) + Σ_{t=1}^{n+m} et²/Ωt + (m − 1) ln(Q)
              + ln(φ²·S_n^n + Q) + (T_{n+1} − φT_n^n − δF_{n+1})²/(φ²·S_n^n + Q)
              + Σ_{t=n+2}^{n+m} (Tt − φT_{t−1} − δFt)²/Q

where Ωt and et are defined as before for t = 1, 2, ..., n. For
t = n + 1, ..., n + m, Ωt = R and et = Pt − f·Tt. The goal
here is to find the Θ values that maximize the likelihood
function LP,T(Θ). Using the EM algorithm, such Θ values
can be found through an iterative procedure. The
estimation procedure is summarized as follows.
1. Select the starting values for the parameters Θ^(0) = {f^(0), R^(0), φ^(0), δ^(0), Q^(0), μ^(0)} while fixing Σ, one of the initial conditions for the Kalman recursion. (In our analysis, we set Σ = 0.05. Results were almost identical when different Σ values were used.) On iteration j (j = 1, 2, ...), do steps 2–4.
2. Let N = n + m. Using Θ^(j−1), compute the Kalman filter and smoother estimates using Eqs. (7) and (8) for t = 1, 2, ..., N and the likelihood function ln LP,T(Θ^(j−1)). Also calculate, for t = N − 1, N − 2, ..., 0,

S_t^N = S_t^t + Jt²(S_{t+1}^N − S_{t+1}^t)

and for t = n, n − 1, ..., 0,

Ct = J_{t−1}·S_t^N
T̃_t^N = T_t^N + (Jt·J_{t+1}···Jn)(T_{n+1} − T_{n+1}^N)
S̃_t^N = S_t^N − (Jt·J_{t+1}···Jn)²·S_{n+1}^N

and the following quantities:

Z11 = Σ_{t=1}^{n} [(T̃_t^N)² + S̃_t^N] + Σ_{t=n+1}^{N} Tt²
Z00 = Σ_{t=0}^{n} [(T̃_t^N)² + S̃_t^N] + Σ_{t=n+1}^{N−1} Tt²
Z10 = Σ_{t=1}^{n} (T̃_t^N·T̃_{t−1}^N + Ct) + T_{n+1}·T̃_n^N + Σ_{t=n+2}^{N} Tt·T_{t−1}
F11 = Σ_{t=1}^{n} Ft·T̃_t^N + Σ_{t=n+1}^{N} Ft·Tt
F10 = Σ_{t=1}^{n+1} Ft·T̃_{t−1}^N + Σ_{t=n+2}^{N} Ft·T_{t−1}
F00 = Σ_{t=1}^{N} Ft·Ft^T
3. Obtain Θ^(j) using the following:

f^(j) = [ Σ_{t=1}^{n} ((T̃_t^N)² + S̃_t^N) + Σ_{t=n+1}^{N} Tt² ]^{−1} [ Σ_{t=1}^{n} T̃_t^N·Pt + Σ_{t=n+1}^{N} Tt·Pt ]
R^(j) = N^{−1} Σ_{t=1}^{n} [ S̃_t^N·(f^(j))² + (Pt − f^(j)·T̃_t^N)² ] + N^{−1} Σ_{t=n+1}^{N} (Pt − f^(j)·Tt)²
δ^(j) = (F11^T − F10^T·Z10/Z00)(F00 − F10·F10^T/Z00)^{−1}
φ^(j) = (Z10 − δ^(j)·F10)·Z00^{−1}
Q^(j) = N^{−1} (Z11 − φ^(j)·Z10 − δ^(j)·F11)
μ^(j) = T̃_0^N
4. Repeat steps 2 and 3 until convergence. For the analysis presented in this paper, the algorithm is stopped when ln LP,T(Θ^(j)) − ln LP,T(Θ^(j−1)) < 0.0005.
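The Kalman filter and smoother recursions invoked in step 2 (Eqs. 7 and 8, not reproduced here) have, for a scalar state, the standard textbook form below. This is a generic sketch of the recursions in Shumway and Stoffer (2000), not the authors' code, and it omits the exogenous forcing term δFt for brevity:

```python
def kalman_filter_smoother(P, f, R, phi, Q, mu0, Sigma0):
    """Scalar Kalman filter and fixed-interval (RTS) smoother for
    T_t = phi*T_{t-1} + w_t,  P_t = f*T_t + v_t."""
    n = len(P)
    Tp, Sp, Tf, Sf = [], [], [], []  # predicted/filtered means and variances
    tprev, sprev = mu0, Sigma0
    for t in range(n):
        # predict
        tp = phi * tprev
        sp = phi ** 2 * sprev + Q
        # update
        K = sp * f / (f ** 2 * sp + R)      # Kalman gain
        tf = tp + K * (P[t] - f * tp)
        sf = (1 - K * f) * sp
        Tp.append(tp); Sp.append(sp); Tf.append(tf); Sf.append(sf)
        tprev, sprev = tf, sf
    # Rauch-Tung-Striebel smoother pass
    Ts, Ss = [0.0] * n, [0.0] * n
    Ts[-1], Ss[-1] = Tf[-1], Sf[-1]
    for t in range(n - 2, -1, -1):
        J = Sf[t] * phi / Sp[t + 1]         # smoother gain J_t
        Ts[t] = Tf[t] + J * (Ts[t + 1] - Tp[t + 1])
        Ss[t] = Sf[t] + J ** 2 * (Ss[t + 1] - Sp[t + 1])
    return Ts, Ss
```

With near-noiseless observations (tiny R), the smoothed states track the observations, as expected.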
To provide a final estimate of $T_t$, the Kalman filter and smoother estimates are recalculated from Eqs. (7) and (8) with the estimated parameters $\hat{\Theta}$ for $t = 1, 2, \ldots, n+m$. At convergence, one can also calculate standard errors for $\hat{\Theta}$. For parameters estimated using $\ln L_P(\Theta)$, the asymptotic variance of $\hat{\Theta}$ is defined as (Caines 1988, Chap. 7; Jensen and Petersen 1999)

$$\mathrm{Var}(\hat{\Theta}) = -\left[ \left. \frac{\partial^2}{\partial \Theta^2} \ln L_P(\Theta) \right|_{\Theta = \hat{\Theta}} \right]^{-1},$$

where $\partial^2/\partial\Theta^2$ denotes the matrix of second derivatives with respect to $\Theta$. In our application we used $L_{P,T}(\Theta)$ in the estimation procedure, and hence a reasonable estimate of the asymptotic variance can be obtained by replacing $L_P(\Theta)$ with $L_{P,T}(\Theta)$ in the above formula; further discussion can be found in Lee (2008). An analytical form of the asymptotic variance is generally hard to obtain, and we therefore calculated it numerically. The asymptotic variance can be used to provide confidence bounds for the estimated parameters: in particular, the 95% confidence bound for $\Theta$ is $\hat{\Theta} \pm 1.96\sqrt{\mathrm{Var}(\hat{\Theta})}$. In our application, we are interested in the confidence bounds for the parameters $\delta_{GS}$, $\delta_{VOL}$ and $\delta_{SOL}$; these bounds provide a detection assessment of the importance of the responses to the GS, VOL and SOL forcings on the hemispheric mean temperature.
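Since the asymptotic variance is computed numerically, one plausible way to do so (the paper does not specify its scheme) is a central-finite-difference approximation to the Hessian of the negative log-likelihood, inverted to give the covariance of the estimates. The function name and the toy Gaussian check below are our own illustration:

```python
import numpy as np

def asymptotic_se(negloglik, theta_hat, h=1e-4):
    """Standard errors from the inverse observed information matrix.

    Approximates the Hessian of -ln L at the ML estimate theta_hat with
    central finite differences, inverts it, and returns the square roots
    of the diagonal, i.e. sqrt(diag(Var(theta_hat)))."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = theta_hat.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = h
            ej = np.zeros(p); ej[j] = h
            # four-point central difference for the (i, j) second derivative
            H[i, j] = (negloglik(theta_hat + ei + ej)
                       - negloglik(theta_hat + ei - ej)
                       - negloglik(theta_hat - ei + ej)
                       + negloglik(theta_hat - ei - ej)) / (4.0 * h * h)
    cov = np.linalg.inv(H)               # asymptotic covariance matrix
    return np.sqrt(np.diag(cov))

# Sanity check on a case with a known answer: for n iid N(mu, 1) draws,
# Var(mu_hat) = 1/n, so with n = 400 the standard error should be 0.05.
rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=400)
negloglik = lambda th: 0.5 * np.sum((y - th[0]) ** 2)
se = asymptotic_se(negloglik, np.array([y.mean()]))
lower, upper = y.mean() - 1.96 * se[0], y.mean() + 1.96 * se[0]
```

The final two lines form the 95% confidence bound in exactly the way described above; for a detection assessment, one would check whether the bound for each forcing coefficient excludes zero.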
References
Allen MR, Stott PA (2003) Estimating signal amplitudes in optimal
fingerprinting. Part I: theory. Clim Dyn 21:477–491
Ammann CM, Joos F, Schimel D, Otto-Bliesner BL, Tomas R (2007)
Solar influence on climate during the past millennium: results
from transient simulations with the NCAR Climate System
Model. Proc Nat Acad Sci 104:3713–3718
Briffa KR, Osborn TJ, Schweingruber FH, Harris IC, Jones PD,
Shiyatov SG, Vaganov EA (2001) Low-frequency temperature
variations from a northern tree-ring density network. J Geophys
Res 106:2929–2941
Brohan P, Kennedy JJ, Harris I, Tett SFB, Jones PD (2006)
Uncertainty estimates in regional and global observed temper-
ature changes: a new dataset from 1850. J Geophys Res
111:D12106
Brown PJ (1993) Measurement, regression, and calibration. Oxford
University Press, Oxford, 201 pp
Bürger G, Cubasch U (2005) Are multiproxy climate reconstructions
robust? Geophys Res Lett 32:L23711
Bürger G, Fast I, Cubasch U (2006) Climate reconstruction by
regression—32 variations on a theme. Tellus 58A:227–235
Caines PE (1988) Linear stochastic systems. Wiley, New York,
847 pp
Coelho CAS, Pezzulli S, Balmaseda M, Doblas-Reyes JJ, Stephenson
D (2004) Forecast calibration and combination: a simple
Bayesian approach for ENSO. J Clim 17:1504–1516
Durbin J, Koopman SJ (2001) Time series analysis by state space
methods. Oxford University Press, Oxford, 253 pp
Esper J, Cook ER, Schweingruber FH (2002) Low-frequency signals in
long tree-ring chronologies for reconstructing past temperature
variability. Science 295:2250–2253
Esper J, Frank DC, Wilson RJ, Briffa KR (2005) Effect of scaling and
regression on reconstructed temperature amplitude for the past
millennium. Geophys Res Lett 32:L07711
Fierro RD, Golub GH, Hansen PC, O’Leary DP (1997) Regularization
by truncated total least squares. SIAM J Sci Comput 18:1223–
1241
Flato G, Boer GJ (2001) Warming asymmetry in climate change
experiments. Geophys Res Lett 28:195–198
Fuller WA (1987) Measurement error models. Wiley, New York,
440 pp
Harvey A (1989) Forecasting, structural time series models and the
Kalman filter. Cambridge University Press, Cambridge, 554 pp
Hegerl GC, Crowley TJ, Baum SK, Kim K-Y, Hyde WT (2003)
Detection of volcanic, solar, and greenhouse gas signals in
paleo-reconstructions of Northern Hemispheric temperature.
Geophys Res Lett 30:1242 doi:10.1029/2002GL016635
Hegerl GC, Crowley TJ, Allen MR, Hyde WT, Pollack HN, Smerdon
JE, Zorita E (2007) Detection of human influence on a new,
validated 1500 year temperature reconstruction. J Clim 20:650–
666
Jensen JL, Petersen NV (1999) Asymptotic normality of the
maximum likelihood estimator in state space models. Ann Stat
27:514–535
Jones PD, Osborn TJ, Briffa KR (1997) Estimating sampling errors in
large-scale temperature averages. J Clim 10:2548–2568
Jones PD, Briffa KR, Barnett TP, Tett SFB (1998) High-resolution
palaeoclimatic records for the last millennium: Integration,
interpretation and comparison with general circulation model
control run temperatures. Holocene 8:455–471
Juckes MN, Allen MR, Briffa KR, Esper J, Hegerl GC, Moberg A,
Osborn TJ, Weber SL, Zorita E (2006) Millennial temperature
reconstruction intercomparison and evaluation. Clim Past Dis-
cuss 2:1001–1049
Kalman R (1960) A new approach to linear filtering and prediction
problems. J Basic Eng 82:34–45
Lee TCK (2008) On statistical approaches to climate change analysis.
PhD dissertation, University of Victoria, to be available at
https://dspace.library.uvic.ca:8443/dspace/
Mann ME, Bradley RS, Hughes MK (1998) Global-scale temperature
patterns and climate forcing over the past six centuries. Nature
392:779–787
Mann ME, Rutherford S, Wahl E, Ammann CM (2005) Testing the
fidelity of methods used in proxy-based reconstructions of past
climate. J Clim 18:4097–4107
Mann ME, Rutherford S, Wahl E, Ammann CM (2007) Robustness of
proxy-based climate field reconstruction methods. J Geophys
Res (in press)
Moberg A, Sonechkin DM, Holmgren K, Datsenko NM, Karlen W
(2005) Highly variable Northern Hemisphere temperatures
reconstructed from low- and high-resolution proxy data. Nature
433:613–617
Osborn TJ, Raper SCB, Briffa KR (2006) Simulated climate change
during the last 1,000 years: comparing the ECHO-G general
circulation model with the MAGICC simple climate model. Clim
Dyn 27:185–197
Rutherford S, Mann ME, Osborn TJ, Bradley RS, Briffa KR, Hughes
MK, Jones PD (2005) Proxy-based Northern Hemisphere surface
temperature reconstructions: Sensitivity to methodology, predic-
tor network, target season, and target domain. J Clim 18:2308–
2329
Schneider T (2001) Analysis of incomplete climate data: Estimation
of mean values and covariance matrices and imputation of
missing values. J Clim 14:853–871
Shumway RH, Stoffer DS (1982) An approach to time series
smoothing and forecasting using the EM algorithm. J Time Ser
Anal 3:253–264
Shumway RH, Stoffer DS (2000) Time series analysis and its
applications. Springer, Heidelberg, 549 pp
von Storch H, Zorita E, Jones JM, Dimitriev Y, Gonzalez-Rouco F,
Tett SFB (2004) Reconstructing past climate from noisy data.
Science 306:679–682
von Storch H, Zorita E, Jones JM, Gonzalez-Rouco F, Tett SFB
(2006) Response to comment on ‘‘Reconstructing past climate
from noisy data’’. Science 312:529c
Wahl ER, Ritson DM, Ammann CM (2006) Comment on ‘‘Recon-
structing past climate from noisy data’’. Science 312:529b
Zorita E, von Storch H (2005) Methodical aspects of reconstructing non-
local historical temperatures. Memorie Soc Astron Ital 76:794–801
Zorita E, Gonzalez-Rouco JF, Legutke S (2003) Testing the Mann
et al. (1998) approach to paleoclimate reconstructions in the
context of a 1000-yr control simulation with the ECHO-G
coupled climate model. J Clim 16:1378–1390