On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records
Separation of true from spurious breaks
Ralf Lindau & Victor Venema, University of Bonn
12th EMS Annual Meeting, Lodz, Poland – 13. September 2012
Internal and External Variance
Consider the differences of one station compared to a neighbour or a reference.
Breaks are defined by abrupt changes in the station-reference time series.
Internal variance: within the subperiods
External variance: between the means of different subperiods
Criterion: Maximum external variance attained by a minimum number of breaks
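The criterion above asks for the break positions that maximise the external variance for a given number of breaks. A minimal sketch of one way to find that maximum is dynamic programming over segment end points. The slides do not state which search method is used, so the function name `best_external_variance` and the approach itself are assumptions made for illustration.

```python
def best_external_variance(x, k):
    """Largest external variance fraction v reachable with exactly k breaks.

    Hypothetical helper: exhaustive optimum via dynamic programming,
    not necessarily the search used by the authors.
    """
    n = len(x)
    mean = sum(x) / n
    total = sum((xi - mean) ** 2 for xi in x)

    # Prefix sums of deviations from the overall mean.
    prefix = [0.0]
    for xi in x:
        prefix.append(prefix[-1] + (xi - mean))

    def seg(a, b):
        # n_i * (segment mean - overall mean)^2 for the segment x[a:b]
        s = prefix[b] - prefix[a]
        return s * s / (b - a)

    # best[j][b]: maximum external sum of squares for x[:b] with j breaks
    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(k + 1)]
    for b in range(1, n + 1):
        best[0][b] = seg(0, b)
    for j in range(1, k + 1):
        for b in range(j + 1, n + 1):
            best[j][b] = max(best[j - 1][a] + seg(a, b) for a in range(j, b))
    return best[k][n] / total
```

Calling the function for k = 0, 1, 2, ... gives the growth of the maximum external variance with the number of breaks, which is the quantity a stop criterion has to judge.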
Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: number of years within subperiod i
The sum of external and internal variance is constant.
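The decomposition can be checked numerically. The sketch below computes the external part (segment means against the overall mean, weighted by segment length) and the internal part (values against their own segment mean) for arbitrary break positions. The function name `decompose` and the convention that a break index marks the first element of a new subperiod are assumptions of this example.

```python
def decompose(x, breaks):
    """Return (external, internal, total) variance of x for given breaks.

    breaks: indices at which a new subperiod starts (hypothetical convention).
    """
    n = len(x)
    mean = sum(x) / n
    edges = [0] + sorted(breaks) + [n]
    external = 0.0
    internal = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        segment = x[a:b]
        seg_mean = sum(segment) / len(segment)
        # Between-segment part, weighted by the segment length n_i
        external += len(segment) * (seg_mean - mean) ** 2
        # Within-segment part
        internal += sum((xi - seg_mean) ** 2 for xi in segment)
    total = sum((xi - mean) ** 2 for xi in x)
    return external / n, internal / n, total / n
```

Whatever break positions are chosen, the first two return values add up to the third, so maximising the external variance is the same as minimising the internal variance.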
First Question
How do random data behave?
Needed as stop criterion for the number of significant breaks.
Random Time Series
with stddev = 1
Segment averages x_i scatter randomly:
mean: 0
stddev: $1/\sqrt{n_i}$
Because any deviation from zero can be seen as inaccuracy due to the limited number of members.
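A small simulation illustrates the scatter of segment averages for random data with standard deviation 1: the averages of n_i values should have a standard deviation close to 1/sqrt(n_i). The function name, the number of trials and the seed are arbitrary choices made for this sketch.

```python
import math
import random

def stddev_of_segment_means(n_i, trials=20000, seed=0):
    """Empirical standard deviation of means of n_i standard normal values."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, 1.0) for _ in range(n_i)) / n_i
        for _ in range(trials)
    ]
    mu = sum(means) / trials
    return math.sqrt(sum((m - mu) ** 2 for m in means) / trials)
```

For n_i = 4 the result should lie near 0.5, for n_i = 16 near 0.25.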
χ²-distribution
The external variance is equal to the mean sum of squares of a standard normally distributed random variable.
Weighted measure for the variability of the subperiods' means
From χ² to β distribution
n = 21 years, k = 7 breaks
As the total variance is normalized to 1, a kind of normalized
chi2-distribution is expected:
This is the β-distribution:
$$p(v) = \frac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-k-1}{2}-1}}{B\!\left(\frac{k}{2}, \frac{n-k-1}{2}\right)}$$
The exceeding probability P gives the best (maximum) solution for v.
Incomplete Beta Function:
$$P(v) = 1 - \int_0^v p(v')\,dv'$$
7 breaks in 21 years
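The β-distribution of the external variance can be checked by simulation. The sketch below assumes that the distribution has the parameters k/2 and (n-1-k)/2, so that its mean is k/(n-1). It draws random series of n = 21 values, inserts k = 7 breaks at randomly chosen positions (not optimised ones), and averages the resulting external variance fraction v. All function names, the seed and the number of trials are assumptions of this example.

```python
import random

def external_variance_fraction(x, breaks):
    """External variance divided by total variance for given break indices."""
    n = len(x)
    mean = sum(x) / n
    edges = [0] + sorted(breaks) + [n]
    external = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        seg_mean = sum(x[a:b]) / (b - a)
        external += (b - a) * (seg_mean - mean) ** 2
    total = sum((xi - mean) ** 2 for xi in x)
    return external / total

def simulate_mean_v(n=21, k=7, trials=20000, seed=1):
    """Average v over random series with k randomly placed breaks."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # k distinct break indices in 1..n-1 give k+1 non-empty segments
        breaks = rng.sample(range(1, n), k)
        acc += external_variance_fraction(x, breaks)
    return acc / trials
```

Under the stated assumption the average should be close to 7/20 = 0.35. The optimum segmentation reaches a much larger v, which is why the exceeding probability rather than the mean is the relevant quantity for the best solution.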
Added variance per break
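Incomplete β-function:
$$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l} (1-v)^{m-l}, \qquad m = \frac{n-3}{2}, \quad i = \frac{k}{2}$$
Transformation to dv/dk:
[Equation: $\frac{1}{1-v}\frac{dv}{dk^{*}}$ as a function of $k^{*}$; the right-hand side is not recoverable from the transcript]
[Figure: curves for the mean and the 90% and 95% levels]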
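The finite sum for the exceeding probability applies when i = k/2 and m = (n-3)/2 are integers, that is for even k and odd n. The sketch below evaluates that sum and, as a cross-check, integrates a β density with parameters k/2 and (n-1-k)/2 numerically from v to 1. Both function names and the midpoint integration are choices made for this illustration, not part of the slides.

```python
from math import comb, gamma

def exceeding_probability(v, n, k):
    """P(v) as a finite binomial sum; requires odd n and even k."""
    if n % 2 == 0 or k % 2 != 0:
        raise ValueError("finite sum needs odd n and even k")
    m = (n - 3) // 2
    i = k // 2
    return sum(comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))

def exceeding_probability_numeric(v, n, k, steps=20000):
    """Cross-check: midpoint-rule integral of the beta density from v to 1."""
    a = k / 2
    b = (n - 1 - k) / 2
    norm = gamma(a + b) / (gamma(a) * gamma(b))  # 1 / B(a, b)
    h = (1 - v) / steps
    area = 0.0
    for s in range(steps):
        t = v + (s + 0.5) * h
        area += t ** (a - 1) * (1 - t) ** (b - 1)
    return norm * area * h
```

For example, with n = 21 and k = 6 the sum runs over l = 0, 1, 2 with m = 9, and both functions should agree to within the integration error.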
The existing algorithm Prodige
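Original formulation of Caussinus and Mestre for the penalty term in Prodige:
$$C_k(Y) = \ln\left(1 - \frac{\sum_{j=1}^{k+1} n_j (\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \frac{2k}{n-1}\ln(n) \rightarrow \min$$
Translation into terms used by us:
$$\ln(1-v) + \frac{2k}{n-1}\ln n \rightarrow \min$$
Normalisation by k* = k / (n - 1):
$$\ln(1-v) + 2k^{*}\ln n \rightarrow \min$$
Derivation to get the minimum:
$$-\frac{1}{1-v}\frac{dv}{dk^{*}} + 2\ln n = 0$$
$$\frac{1}{1-v}\frac{dv}{dk^{*}} = 2\ln n$$
In Prodige it is postulated that the relative gain of external variance is a constant for given n.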
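As an illustration of how such a penalised criterion selects the number of breaks, the sketch below evaluates ln(1 - v) + 2k/(n - 1) ln(n) for a list of external variance fractions and returns the k with the smallest value. The form of the criterion is the reading of the slide used here. The function names and the input convention (`v_by_k[k]` is the maximum external variance fraction reached with k breaks) are assumptions of this example.

```python
import math

def prodige_criterion(v, k, n):
    """Penalised log-likelihood term: ln(1 - v) + 2k/(n - 1) * ln(n)."""
    return math.log(1.0 - v) + 2.0 * k / (n - 1) * math.log(n)

def select_break_number(v_by_k, n):
    """Return the k that minimises the criterion.

    v_by_k[k]: maximum external variance fraction with k breaks
    (hypothetical input convention).
    """
    return min(
        range(len(v_by_k)),
        key=lambda k: prodige_criterion(v_by_k[k], k, n),
    )
```

Each additional break adds the fixed amount 2 ln(n)/(n - 1) to the criterion, so a further break is accepted only if the drop in ln(1 - v) outweighs that constant, which is the constant relative gain postulated in Prodige.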
Shorter length, less certainty
[Figure panels: n = 21 years and n = 101 years; curves for the exceeding probabilities 1/128, 1/64, 1/32, 1/16, 1/8 and 1/4]
Second Question
How do true breaks behave?
Identical Behaviour
True breaks behave identically to random data.
But the abscissa scale is now k / n_k instead of k / n, where n_k is the number of true breaks.
Compared to random time series, the external variance grows faster by the factor n / n_k.
[Figure: data and theory for n_k = 19 true breaks within a time series of n = 100 years; x-axis: assumed / true break number, k / n_k]
Break vs Scatter Regime
Simulated data with 19 breaks, disturbed by scatter.
The internal variance decreases as a function of break number.
In the break regime the variance decreases faster by the factor n / n_k.
15 breaks are detectable, depending on the signal-to-noise ratio.
[Figure axes: time series length; number of true breaks]
Conclusions
• The analysis of random data shows that the external variance is β-distributed, which leads to a new formulation for the penalty term.
• True breaks are also β-distributed. Their external variance increases faster by a factor of n / n_k compared to random scatter.