
On the multiple breakpoint problem and the number of significant breaks in homogenisation of climate records

Separation of true from spurious breaks

Ralf Lindau & Victor Venema

University of Bonn

12th EMS Annual Meeting, Lodz, Poland – 13. September 2012

Internal and External Variance

Consider the differences of one station compared to a neighbour or a reference.

Breaks are defined by abrupt changes in the station-reference time series.

Internal variance: within the subperiods

External variance: between the means of different subperiods

Criterion: maximum external variance attained by a minimum number of breaks
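As an illustration of the criterion, the following minimal sketch searches exhaustively for the break positions that maximise the external variance for a given number of breaks. The function names and the toy series are hypothetical, and the brute-force search is only practical for very short series; it is not claimed to be the optimisation method used by the authors.

```python
from itertools import combinations

def external_variance(x, breaks):
    """Weighted variance of the segment means about the overall mean.

    `breaks` holds the indices at which a new segment starts (0 < b < len(x)).
    """
    n = len(x)
    overall = sum(x) / n
    edges = [0, *sorted(breaks), n]
    ext = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        ext += len(seg) * (sum(seg) / len(seg) - overall) ** 2
    return ext / n

def best_breaks(x, k):
    """Try every combination of k break positions and keep the best one."""
    return max(combinations(range(1, len(x)), k),
               key=lambda b: external_variance(x, b))

# Hypothetical difference series with three levels (about 0, +1, -1).
series = [0.1, -0.2, 0.0, 1.1, 0.9, 1.0, -0.9, -1.1]
print(best_breaks(series, 2))
```

For this toy series the search places the two breaks where the level changes, at indices 3 and 6.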

Decomposition of Variance

n: total number of years

N: number of subperiods

n_i: years within a subperiod

The sum of external and internal variance is constant.
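The constancy of the sum can be checked numerically. The sketch below, with hypothetical names and a toy series, computes internal, external and total variance for two different segmentations and shows that internal plus external always equals the total.

```python
def variance_parts(x, breaks):
    """Return (internal, external, total) variance for a given segmentation."""
    n = len(x)
    overall = sum(x) / n
    edges = [0, *sorted(breaks), n]
    internal = external = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        mean = sum(seg) / len(seg)
        external += len(seg) * (mean - overall) ** 2      # between segment means
        internal += sum((v - mean) ** 2 for v in seg)      # within the segment
    total = sum((v - overall) ** 2 for v in x)
    return internal / n, external / n, total / n

series = [0.1, -0.2, 0.0, 1.1, 0.9, 1.0, -0.9, -1.1]
for breaks in [(3, 6), (2, 5)]:
    internal, external, total = variance_parts(series, breaks)
    print(breaks, internal + external, total)
```

Moving the breaks shifts variance between the two parts, but the total is fixed by the data alone.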

First Question

How do random data behave?

Needed as a stop criterion for the number of significant breaks.

Random Time Series

with stddev = 1

Segment averages x_i scatter randomly

mean: 0

stddev: $1/\sqrt{n_i}$

Because any deviation from zero can be seen as inaccuracy due to the limited number of members.
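A quick simulation illustrates the scatter of the segment averages. This is a minimal sketch with hypothetical names and a fixed random seed: it draws many segments of n_i independent standard normal values and reports the mean and standard deviation of their averages, which should come out close to 0 and 1/sqrt(n_i).

```python
import random
import statistics

def segment_mean_stats(n_i, n_segments=5000, seed=0):
    """Mean and standard deviation of the averages of random segments."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_i))
        for _ in range(n_segments)
    ]
    return statistics.fmean(means), statistics.pstdev(means)

mean, std = segment_mean_stats(n_i=25)
print(mean, std)  # expected to be close to 0 and 1/sqrt(25) = 0.2
```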

χ²-distribution

The external variance is equal to the mean square sum of a standard normally distributed random variable.

Weighted measure for the variability of the subperiods' means
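The reasoning can be written out as a short derivation. The notation is assumed here for illustration: $\bar{x}_i$ are the segment means, $\bar{x}$ is the overall mean, there are $N = k + 1$ segments for $k$ breaks, and the data are independent with unit variance.

```latex
\bar{x}_i \sim \mathcal{N}\!\left(0, \tfrac{1}{n_i}\right)
\;\Rightarrow\;
\sqrt{n_i}\,\bar{x}_i \sim \mathcal{N}(0, 1),
\qquad
\sum_{i=1}^{N} n_i \left(\bar{x}_i - \bar{x}\right)^2 \sim \chi^2_{N-1} = \chi^2_{k}
```

One degree of freedom is lost because the overall mean is estimated from the same data.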

From χ² to β distribution

n = 21 years

k = 7 breaks

As the total variance is normalized to 1, a kind of normalized χ²-distribution is expected:

$p(v) = \dfrac{v^{\frac{k}{2}-1}\,(1-v)^{\frac{n-1-k}{2}-1}}{B\left(\frac{k}{2},\,\frac{n-1-k}{2}\right)}$

This is the β-distribution.
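The β-distribution can be checked against simulated random data. The sketch below is an illustration under stated assumptions, not the authors' experiment: the break positions are drawn at random and are not optimised, the names are hypothetical, and the seed is fixed. Under the assumed parameters a = k/2 and b = (n - 1 - k)/2, the mean of v should be a/(a + b) = k/(n - 1).

```python
import random

def external_fraction(x, breaks):
    """External variance divided by total variance (total normalised to 1)."""
    n = len(x)
    overall = sum(x) / n
    total = sum((v - overall) ** 2 for v in x)
    edges = [0, *sorted(breaks), n]
    between = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = x[lo:hi]
        between += len(seg) * (sum(seg) / len(seg) - overall) ** 2
    return between / total

def simulate(n=21, k=7, trials=4000, seed=1):
    """External variance fractions for random series with random breaks."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        breaks = rng.sample(range(1, n), k)   # k distinct break positions
        values.append(external_fraction(x, breaks))
    return values

values = simulate()
sample_mean = sum(values) / len(values)
a, b = 7 / 2, (21 - 1 - 7) / 2                # assumed beta parameters
beta_mean = a / (a + b)                       # equals k / (n - 1) = 0.35
print(sample_mean, beta_mean)
```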

The exceeding probability P gives the best (maximum) solution for v.

Incomplete Beta Function:

$P(v) = 1 - \int_0^v p(v')\,dv'$

7 breaks in 21 years
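The exceeding probability can be evaluated numerically and inverted for v. The following is a minimal sketch assuming v follows a β-distribution with parameters k/2 and (n - 1 - k)/2; the function names, the midpoint integration and the bisection are illustrative choices, and the level 1/128 is just an example value.

```python
import math

def beta_pdf(v, k, n):
    """Density of Beta(k/2, (n-1-k)/2), the assumed distribution of v."""
    a, b = k / 2, (n - 1 - k) / 2
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(v) + (b - 1) * math.log(1 - v) - log_norm)

def exceedance(v, k, n, steps=2000):
    """P(v) = 1 - integral of the density from 0 to v (midpoint rule)."""
    if v <= 0.0:
        return 1.0
    h = v / steps
    area = h * sum(beta_pdf((i + 0.5) * h, k, n) for i in range(steps))
    return 1.0 - area

def v_at_level(level, k, n, iterations=40):
    """Bisection for the v whose exceeding probability equals `level`."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if exceedance(mid, k, n) > level:
            lo = mid   # still exceeded too often: the solution lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)

v_128 = v_at_level(1 / 128, k=7, n=21)
print(v_128, exceedance(v_128, k=7, n=21))
```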

Added variance per break

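Incomplete β-function:

$P(v) = \sum_{l=0}^{i-1} \binom{m}{l}\, v^{l}\,(1-v)^{m-l}$, with $m = \frac{n-3}{2}$ and $i = \frac{k}{2}$

Transformation to dv/dk:

[Equation relating dv/dk* to v and k*; not recoverable from the transcript]

[Figure legend: mean, 90%, 95%]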
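For integer parameters the incomplete β-function reduces to a finite binomial sum. The sketch below assumes the form P(v) = sum over l from 0 to i - 1 of C(m, l) v^l (1 - v)^(m - l), with m = (n - 3)/2 and i = k/2, which requires even k and odd n. It compares the sum with a direct numerical integration of the β density; all names are hypothetical.

```python
import math

def exceedance_sum(v, k, n):
    """Finite-sum form of the exceeding probability (even k, odd n only)."""
    if k % 2 or n % 2 == 0:
        raise ValueError("the finite sum needs even k and odd n")
    m = (n - 3) // 2
    i = k // 2
    return sum(math.comb(m, l) * v**l * (1 - v) ** (m - l) for l in range(i))

def exceedance_numeric(v, k, n, steps=20000):
    """1 minus the integral of the Beta(k/2, (n-1-k)/2) density from 0 to v."""
    a, b = k / 2, (n - 1 - k) / 2
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    h = v / steps
    area = h * sum(((j + 0.5) * h) ** (a - 1) * (1 - (j + 0.5) * h) ** (b - 1)
                   for j in range(steps)) / norm
    return 1.0 - area

# Example with n = 21 years and k = 6 breaks (k must be even for the sum).
print(exceedance_sum(0.3, k=6, n=21), exceedance_numeric(0.3, k=6, n=21))
```

The two values agree to within the integration error, which supports reading the sum as the exceeding probability of the β-distribution.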

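The existing algorithm Prodige

Original formulation of Caussinus and Mestre for the penalty term in Prodige:

$C_k(Y) = \ln\left(1 - \dfrac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) + \dfrac{2l}{n-1}\,\ln(n) = \min$

Translation into terms used by us:

$\ln(1-v) + \dfrac{2k}{n-1}\,\ln(n) = \min$

Normalisation by k* = k / (n - 1):

$\ln(1-v) + 2k^{*}\ln(n) = \min$

Differentiation with respect to k* to get the minimum:

$-\dfrac{1}{1-v}\,\dfrac{dv}{dk^{*}} + 2\ln(n) = 0$

$\dfrac{1}{1-v}\,\dfrac{dv}{dk^{*}} = 2\ln(n)$

In Prodige it is postulated that the relative gain of external variance is a constant for given n.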
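To make the penalised criterion concrete, the sketch below evaluates ln(1 - v) + 2k/(n - 1) ln(n) for a given segmentation, where v is the external variance divided by the total variance. It assumes l = k, uses hypothetical names and a toy series, and is not the Prodige implementation itself.

```python
import math

def penalised_criterion(y, breaks):
    """ln(1 - v) + 2k/(n - 1) * ln(n) for the segmentation given by `breaks`."""
    n = len(y)
    k = len(breaks)
    overall = sum(y) / n
    total = sum((value - overall) ** 2 for value in y)
    edges = [0, *sorted(breaks), n]
    between = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = y[lo:hi]
        between += len(seg) * (sum(seg) / len(seg) - overall) ** 2
    v = between / total                      # external variance fraction
    return math.log(1.0 - v) + 2.0 * k / (n - 1) * math.log(n)

series = [0.1, -0.2, 0.0, 1.1, 0.9, 1.0, -0.9, -1.1]
print(penalised_criterion(series, ()))       # no breaks: no gain, no penalty
print(penalised_criterion(series, (3, 6)))   # breaks at the level changes
print(penalised_criterion(series, (1,)))     # a break that explains almost nothing
```

A segmentation is preferred when its value is lower: breaks that explain much variance outweigh the penalty, while a break that explains little only adds the penalty.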

Shorter length, less certainty

n = 21 years

n = 101 years

Exceeding probability: 1/128, 1/64, 1/32, 1/16, 1/8, 1/4

Second Question

How do true breaks behave?

True Breaks

Identical Behaviour

True breaks behave identically to random data.

But the abscissa scale is now k / n_k instead of k / n.

Compared to random time series, the external variance grows faster by the factor n / n_k.

[Figure: data and theory for n_k = 19 true breaks within a time series of n = 100 years; abscissa: assumed / true break number k / n_k]

Break vs Scatter Regime

Simulated data with 19 breaks and superimposed scatter

The internal variance decreases as a function of break number.

In the break regime the variance decreases faster by the factor:

15 breaks are detectable, depending on the signal-to-noise ratio.

[Figure axes: time series length; number of true breaks]

Conclusions

• The analysis of random data shows that the external variance is β-distributed, which leads to a new formulation for the penalty term.

• True breaks are also β-distributed. Their external variance increases faster by a factor of n / n_k compared to random scatter.