statistical validation of numerical models: some methods


Ricardo Lemos

1. data setup

2. standard methods of model validation

3. model validation for a single location - time-series analysis

4. model validation for a single instant - spatial data analysis

5. model validation for variable space and time - spatiotemporal data analysis

6. summary

subject index

1. data setup


a) deterministic model already calibrated – the whole dataset is used to validate the model

b) deterministic model needs calibration – data subsetting according to the purpose of the numerical model (description vs. prediction)

[Figure: partition of the data in space and time into calibration and validation subsets. For description, the subsets are chosen by random subsampling; for prediction, by subsampling with the aim of forecasting.]
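the two subsetting strategies can be sketched as index splits along the time axis. This is a minimal illustration, not the procedure used in the presentation: the function names, the 30% validation fraction and the record length are assumptions.

```python
import numpy as np

def random_split(n, validation_fraction=0.3, seed=0):
    """Description: validation points are drawn at random from the whole record."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(round(validation_fraction * n))
    # Returns (calibration indices, validation indices), both sorted in time.
    return np.sort(idx[n_val:]), np.sort(idx[:n_val])

def forecast_split(n, validation_fraction=0.3):
    """Prediction: the final part of the record is held out for validation."""
    n_val = int(round(validation_fraction * n))
    idx = np.arange(n)
    return idx[:n - n_val], idx[n - n_val:]

calib_d, valid_d = random_split(100)    # description
calib_p, valid_p = forecast_split(100)  # prediction
print(len(calib_d), len(valid_d), valid_p.min(), calib_p.max())
```

With the forecasting split, every validation time lies after every calibration time, which is what makes the test a genuine forecast.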

Chang, J.C., Hanna, S.R., 2004. Air quality model performance evaluation. Meteorol Atmos Phys 87: 167–196

«Because there is not a single best performance measure or best evaluation methodology, it is recommended that a suite of different performance measures be applied.» (Chang and Hanna, 2004)

2. standard methods of model validation

resource: WWRP/WGNE (World Weather Research Program / Working Group on Numerical Experimentation) Joint Working Group on Verification, Forecast Verification - Issues, Methods and FAQ

http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.shtml

contents of the web site:

issues: why verify?; types of forecasts and verification; what makes a forecast good?; forecast quality vs. value; what is "truth"?; validity of verification results; pooling vs. stratifying results

standard verification methods: methods for dichotomous (yes/no) forecasts; multi-category forecasts; forecasts of continuous variables; probabilistic forecasts

scientific or diagnostic verification methods: methods for spatial forecasts; methods for probabilistic forecasts, including ensemble prediction systems; other methods

sample forecast datasets: Finley tornado forecasts; Sydney 2000 Forecast Demonstration Project radar-based rainfall nowcasts


methods:

a) compare raw model predictions and observed data

b) analyse residuals (observed – predicted)


a) compare raw model predictions and observed data

a1) “eyeball” verification

a2) straightforward statistical analysis - steps:

i. define important features of the data

ii. quantify them in some way - “statistical probes” (Kendall et al., 1999)

iii. investigate to what extent those features are captured by the model

table of probes, each evaluated for the data and for the model: mean, variance, min, max, lag-1 autocorrelation, amplitude of periodical fluctuation (e.g., period = 29.5 d), phase, trend
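the probes listed above can be computed with a few lines of NumPy. This is a sketch under assumptions: the function name, the synthetic series and the choice of a least-squares harmonic fit for amplitude, phase and trend are illustrative and are not taken from Kendall et al. (1999).

```python
import numpy as np

def statistical_probes(x, t, period):
    """Summarize a series with the probes listed above."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    xc = x - x.mean()
    lag1 = np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)
    # Least-squares fit of an intercept, a linear trend and one harmonic.
    w = 2.0 * np.pi / period
    design = np.column_stack([np.ones_like(t), t, np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return {
        "mean": x.mean(),
        "variance": x.var(ddof=1),
        "min": x.min(),
        "max": x.max(),
        "lag1_autocorrelation": lag1,
        "amplitude": np.hypot(coef[2], coef[3]),
        "phase": np.arctan2(coef[3], coef[2]),  # x ~ amplitude * sin(w t + phase)
        "trend": coef[1],
    }

# Synthetic example (assumed values): daily series with a 29.5 d cycle.
rng = np.random.default_rng(0)
t = np.arange(295.0)
w = 2.0 * np.pi / 29.5
data = 15 + 0.01 * t + 2.0 * np.sin(w * t + 0.5) + rng.normal(0, 0.3, t.size)
model = 15 + 0.01 * t + 1.5 * np.sin(w * t + 0.5)

probes_data = statistical_probes(data, t, 29.5)
probes_model = statistical_probes(model, t, 29.5)
for name in probes_data:
    print(f"{name:22s} data={probes_data[name]:8.3f} model={probes_model[name]:8.3f}")
```

In this synthetic case the model matches the mean, phase and trend of the data but underestimates the amplitude of the cycle, which is the kind of mismatch the probes are meant to expose.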

Kendall B.E., et al., 1999. Why do populations cycle? A synthesis of statistical and mechanistic modeling approaches. Ecology 80(6): 1789-1805


b) analyse residuals (observed – predicted)

b1) “eyeball” verification:

[Figure: three example plots of the residual against time or space.]

b2) straightforward statistical analysis:

tools: autocorrelation plot, periodogram

rationale: if the model performs well, it should closely follow the observations and leave white noise only

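[Figure: example autocorrelation plots and periodograms of residuals. Panel labels: validated model; overfitted?; incomplete? Annotations: significant lag-1 autocorrelation in residuals; unmatched periodicity.]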
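the white-noise rationale can be checked numerically. The sketch below is an assumption-laden illustration: the residual series are synthetic, and the bound 1.96/sqrt(n) is the usual approximate 95% limit for the autocorrelation of white noise.

```python
import numpy as np

def lag_autocorrelation(r, lag):
    """Sample autocorrelation of the residuals at a given lag."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    return np.sum(r[lag:] * r[:-lag]) / np.sum(r ** 2)

def periodogram(r):
    """Raw periodogram of the residuals (unit sampling interval)."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    n = r.size
    power = np.abs(np.fft.rfft(r)) ** 2 / n
    freq = np.fft.rfftfreq(n, d=1.0)
    return freq[1:], power[1:]  # drop the zero frequency

rng = np.random.default_rng(1)
n = 400
t = np.arange(n)
white = rng.normal(0.0, 1.0, n)                             # residuals of a good model
incomplete = white + 1.5 * np.sin(2.0 * np.pi * t / 25.0)   # a periodicity was left out

bound = 1.96 / np.sqrt(n)
r1_white = lag_autocorrelation(white, 1)
r1_incomplete = lag_autocorrelation(incomplete, 1)
freq, power = periodogram(incomplete)
peak_period = 1.0 / freq[np.argmax(power)]

print(f"white-noise bound: +/-{bound:.3f}")
print(f"lag-1 autocorrelation, white residuals:      {r1_white:.3f}")
print(f"lag-1 autocorrelation, incomplete residuals: {r1_incomplete:.3f}")
print(f"period of the largest periodogram peak: {peak_period:.1f}")
```

A lag-1 autocorrelation well outside the bound, or a dominant periodogram peak, suggests that the model left some structure in the residuals.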


3. model validation for a single location - time-series analysis

methods:

i. compare the performance of the numerical model with the performance of statistical time-series models:

a) Autoregressive Integrated Moving Average Models (ARIMA)

b) Bayesian Dynamic Linear Models (DLM)

c) Analogue Forecasting Models

this method requires subsetting in order to build the statistical models.

ii. examine in detail the performance of the numerical model

a) models for known periodicities

b) bootstrap R

c) process convolutions



a) Autoregressive Integrated Moving Average Models (ARIMA; Box and Jenkins, 1976)

example: $X_{t+1} = 0.9 X_t - 0.2 X_{t-1} + \varepsilon_t$, with $\varepsilon_t \sim N(0, 1.2)$

ARIMA models can contain seasonal components.

[Figure: temperature T [ºC] against time, with forecasts at t+1, t+2 and t+3.]

Box, G. E. P., Jenkins, G. M. 1976. Time Series Analysis: forecasting and control. Holden Day, Oakland, CA.
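the autoregressive example can be simulated, fitted and used for a three-step forecast as follows. This is a minimal sketch: it treats 1.2 as the noise variance (an assumption, since the slide does not say whether it is a variance or a standard deviation), fits the coefficients by ordinary least squares rather than by a full ARIMA procedure, and uses hypothetical function names.

```python
import numpy as np

def simulate_ar2(n, phi1=0.9, phi2=-0.2, noise_var=1.2, seed=0):
    """Simulate X[t+1] = phi1*X[t] + phi2*X[t-1] + eps[t]."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(noise_var), n)
    x = np.zeros(n)
    for t in range(1, n - 1):
        x[t + 1] = phi1 * x[t] + phi2 * x[t - 1] + eps[t]
    return x

def fit_ar2(x):
    """Least-squares estimate of (phi1, phi2)."""
    design = np.column_stack([x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(design, x[2:], rcond=None)
    return coef

def forecast_ar2(x, coef, steps=3):
    """Iterate the fitted recursion to forecast t+1, t+2, ..."""
    history = [x[-2], x[-1]]
    out = []
    for _ in range(steps):
        nxt = coef[0] * history[-1] + coef[1] * history[-2]
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

x = simulate_ar2(5000)
coef = fit_ar2(x)
forecast = forecast_ar2(x, coef, steps=3)
print("estimated coefficients:", np.round(coef, 3))
print("forecasts for t+1, t+2, t+3:", np.round(forecast, 3))
```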



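b) Bayesian Dynamic Linear Models (West and Harrison, 2000)

$X_{t+1} = \alpha_t X_t + \beta_t X_{t-1} + \varepsilon_t$, with $\varepsilon_t \sim N(0, 1.2)$

$\alpha_t = \alpha_{t-1} + \omega_t$, with $\omega_t \sim N(0, 0.05)$

$\beta_t = \beta_{t-1} + \nu_t$, with $\nu_t \sim N(0, 0.02)$

[Figure: temperature T [ºC] against time, with the forecast at t+1.]

West, M., Harrison, J., 2000. Bayesian Forecasting and Dynamic Models. Springer-Verlag, NY.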
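a dynamic linear model with autoregressive coefficients that drift as random walks can be filtered with the standard Kalman recursions. The sketch below is an illustration under assumptions: the three variances are treated as known and set to 1.2, 0.05 and 0.02, the prior mean and variance are arbitrary, the test series is simulated with fixed coefficients, and the function name is hypothetical. A full Bayesian analysis in the sense of West and Harrison (2000) would also learn the variances.

```python
import numpy as np

def dlm_filter(x, obs_var=1.2, state_var=(0.05, 0.02), m0=(0.0, 0.0), c0=1.0):
    """One-step forecasts from a DLM whose two AR coefficients follow random walks."""
    x = np.asarray(x, dtype=float)
    n = x.size
    W = np.diag(state_var)          # evolution variance of the two coefficients
    m = np.array(m0, dtype=float)   # posterior mean of the coefficients
    C = c0 * np.eye(2)              # posterior covariance of the coefficients
    forecasts = np.full(n, np.nan)
    forecast_var = np.full(n, np.nan)
    states = np.full((n, 2), np.nan)
    for t in range(1, n - 1):
        F = np.array([x[t], x[t - 1]])      # regressors for X[t+1]
        R = C + W                           # prior covariance after evolution
        f = F @ m                           # one-step forecast of X[t+1]
        Q = F @ R @ F + obs_var             # forecast variance
        gain = R @ F / Q
        m = m + gain * (x[t + 1] - f)       # update with the new observation
        C = R - np.outer(gain, gain) * Q
        forecasts[t + 1] = f
        forecast_var[t + 1] = Q
        states[t + 1] = m
    return forecasts, forecast_var, states

# Synthetic series with fixed coefficients 0.9 and -0.2 (assumed test case).
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
eps = rng.normal(0.0, np.sqrt(1.2), n)
for t in range(1, n - 1):
    x[t + 1] = 0.9 * x[t] - 0.2 * x[t - 1] + eps[t]

forecasts, forecast_var, states = dlm_filter(x)
print("mean filtered coefficients:", np.round(np.nanmean(states[200:], axis=0), 2))
```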



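c) Analogue Forecasting Models (McNames, 2002)

$X_{t+1,L} = 0.7 X_{t+1,A1} + 0.3 X_{t+1,A2} + \varepsilon$, with $\varepsilon \sim N(0, 1.2)$

[Figure: temperature T [ºC] against time, showing two analogue segments (A1, A2) and the latest segment (L), each followed by its value at t+1.]

McNames, J. 2002. Local averaging optimization for chaotic time series prediction, Neurocomputing 48(1-4): 279-297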
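an analogue forecast can be sketched as a weighted average of the values that followed the past segments most similar to the latest one. The window length, the number of analogues and the inverse-distance weights below are assumptions made for illustration; they are not the optimized local averaging scheme of McNames (2002).

```python
import numpy as np

def analogue_forecast(x, window=5, k=2):
    """Forecast the next value from the k past segments closest to the latest one."""
    x = np.asarray(x, dtype=float)
    latest = x[-window:]
    # Candidate analogues: every past window that has a known successor.
    starts = np.arange(x.size - window)
    dists = np.array([np.linalg.norm(x[s:s + window] - latest) for s in starts])
    best = np.argsort(dists)[:k]
    weights = 1.0 / (dists[best] + 1e-12)   # closer analogues weigh more
    weights /= weights.sum()
    successors = x[starts[best] + window]
    return float(np.sum(weights * successors)), weights

# Synthetic periodic series (assumed test case); the true next value is 0.
t = np.arange(200)
x = np.sin(2.0 * np.pi * t / 20.0)
forecast, weights = analogue_forecast(x, window=5, k=2)
print(f"forecast for t+1: {forecast:.6f}, weights: {np.round(weights, 2)}")
```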


i. compare the performance of the numerical model with the performance of statistical time-series models

[Figure: temperature T [ºC] against time, showing the observations together with the numerical model, ARIMA, Bayesian DLM and analogue forecasting model.]


[Figure: Taylor diagram summarizing the performance of the models (Taylor, 2001).]

Taylor, K.E. 2001. Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106(D7): 7183–7192
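the statistics that a Taylor diagram displays (standard deviations, correlation and centred root-mean-square difference) can be computed directly. The sketch below uses synthetic series and a hypothetical function name; it also checks the geometric relation between the three statistics on which the diagram is based.

```python
import numpy as np

def taylor_statistics(obs, mod):
    """Standard deviations, correlation and centred RMS difference."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    sd_obs = obs.std()
    sd_mod = mod.std()
    corr = np.corrcoef(obs, mod)[0, 1]
    crmsd = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_mod, corr, crmsd

rng = np.random.default_rng(0)
t = np.arange(365)
obs = 15 + 5 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 1.0, t.size)
mod = 16 + 4 * np.sin(2 * np.pi * (t - 10) / 365)   # biased, damped, lagged model

sd_obs, sd_mod, corr, crmsd = taylor_statistics(obs, mod)
# Law-of-cosines relation: crmsd^2 = sd_obs^2 + sd_mod^2 - 2*sd_obs*sd_mod*corr
law_of_cosines = sd_obs ** 2 + sd_mod ** 2 - 2 * sd_obs * sd_mod * corr
print(f"sd_obs={sd_obs:.2f} sd_mod={sd_mod:.2f} R={corr:.2f} centred RMSD={crmsd:.2f}")
```

Note that the centred statistics ignore the overall bias (here 1 ºC), which has to be reported separately.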



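a) models for known periodicities

e.g.: is the numerical model emulating the major tide components?

model for the observations:

$X = a_1 \sin(2\pi f_1 t) + a_2 \cos(2\pi f_1 t) + a_3 \sin(2\pi f_2 t) + a_4 \cos(2\pi f_2 t) + a_5 \sin(2\pi f_3 t) + a_6 \cos(2\pi f_3 t) + a_7 \sin(2\pi f_4 t) + a_8 \cos(2\pi f_4 t)$

model for the numerical model output:

$X = (a_1 + \delta_1) \sin(2\pi f_1 t) + (a_2 + \delta_2) \cos(2\pi f_1 t) + (a_3 + \delta_3) \sin(2\pi f_2 t) + (a_4 + \delta_4) \cos(2\pi f_2 t) + (a_5 + \delta_5) \sin(2\pi f_3 t) + (a_6 + \delta_6) \cos(2\pi f_3 t) + (a_7 + \delta_7) \sin(2\pi f_4 t) + (a_8 + \delta_8) \cos(2\pi f_4 t)$

if, for example, $\delta_2$ is significantly different from 0, we may conclude that the model is not reproducing the $f_1$ periodicity well.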
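the discrepancy coefficients can be estimated by regressing the difference between model output and observations on the sine and cosine terms. The sketch below is an illustration under assumptions: it uses two tidal frequencies instead of four (periods of 12.42 h and 12.00 h), synthetic hourly series, and ordinary least-squares standard errors that treat the errors as independent, which is optimistic for serially correlated data.

```python
import numpy as np

def harmonic_design(t, freqs):
    """Columns sin(2*pi*f*t), cos(2*pi*f*t) for each known frequency f."""
    cols = []
    for f in freqs:
        cols.append(np.sin(2.0 * np.pi * f * t))
        cols.append(np.cos(2.0 * np.pi * f * t))
    return np.column_stack(cols)

def harmonic_discrepancy(t, obs, model, freqs):
    """Estimate the discrepancy coefficients and their z-scores."""
    A = harmonic_design(t, freqs)
    diff = model - obs
    delta, *_ = np.linalg.lstsq(A, diff, rcond=None)
    resid = diff - A @ delta
    dof = t.size - A.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(A.T @ A)
    return delta, delta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
t = np.arange(720.0)                       # 30 days of hourly values
freqs = [1.0 / 12.42, 1.0 / 12.0]          # cycles per hour
A = harmonic_design(t, freqs)
a_obs = np.array([1.0, 0.5, 0.3, 0.2])
a_mod = np.array([1.0, 0.1, 0.3, 0.2])     # second coefficient is off by -0.4
obs = A @ a_obs + rng.normal(0.0, 0.2, t.size)
model = A @ a_mod

delta, z = harmonic_discrepancy(t, obs, model, freqs)
for i, (d, zi) in enumerate(zip(delta, z), start=1):
    print(f"delta_{i} = {d:6.3f}  z = {zi:7.2f}")
```

A large absolute z-score for a coefficient flags the corresponding periodicity as poorly reproduced.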



b) bootstrap R (Mudelsee, 2003) – time series usually have positive serial dependence, a.k.a. persistence (i.e., lagged autocorrelations are significant and positive). This reduces the effective sample size and therefore affects the estimation of confidence intervals for the correlation coefficient (R) between observations and predictions.

[Figure: temperature T [ºC] against time for the observations and the numerical model.]

Mudelsee, M., 2003. Estimating Pearson’s Correlation Coefficient With Bootstrap Confidence Interval From Serially Dependent Time Series. Mathematical Geology 35(6): 651-665
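a bootstrap that resamples blocks of consecutive pairs keeps the serial dependence within each block. The sketch below is a simplified pairwise moving-block bootstrap with a percentile interval; the fixed block length, the number of resamples and the synthetic series are assumptions, and it does not reproduce the block-length selection or interval construction of Mudelsee (2003).

```python
import numpy as np

def block_bootstrap_r(x, y, block_len=10, n_boot=1000, alpha=0.05, seed=0):
    """Correlation with a moving-block bootstrap percentile interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    r_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Draw block start points; the same indices are used for x and y (pairs).
        starts = rng.integers(0, n - block_len + 1, n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        r_boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    r = np.corrcoef(x, y)[0, 1]
    lower, upper = np.quantile(r_boot, [alpha / 2, 1 - alpha / 2])
    return r, lower, upper

# Synthetic persistent series (assumed): a shared AR(1) signal plus independent noise.
rng = np.random.default_rng(1)
n = 300
signal = np.zeros(n)
for t in range(1, n):
    signal[t] = 0.8 * signal[t - 1] + rng.normal()
obs = signal + rng.normal(0.0, 0.5, n)
mod = signal + rng.normal(0.0, 0.5, n)

r, lower, upper = block_bootstrap_r(obs, mod)
print(f"R = {r:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```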



c) process convolutions (Higdon, 2002) – help to define time periods where observations and predictions differ significantly. Should be applied to residuals (observations – predictions)

[Figure: top panel, temperature T [ºC] against time for the observations and the numerical model; bottom panel, residual [ºC] against time with a 95% confidence band and a zero reference line. Annotations: observational missing values lead to wider confidence bands; significant model misfit.]

Higdon, D., 2002. Space and space-time modeling using process convolutions. In Quantitative Methods for Current Environmental Issues, eds. C. Anderson, V. Barnett, P. C. Chatwin, and A. H. El-Shaarawi, 37–56. London: Springer-Verlag
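a process convolution represents the smooth part of the residuals as a kernel-weighted sum of latent variables placed at a grid of knots. The sketch below is a simplified version with strong assumptions: Gaussian kernel, knot spacing, noise variance and latent variance are all fixed by hand, so the band is a conditional one, whereas Higdon (2002) treats such quantities within a Bayesian model. All names and values are hypothetical.

```python
import numpy as np

def process_convolution_band(t_obs, resid, t_grid, knots, kernel_sd,
                             noise_var, latent_var):
    """Posterior mean and 95% band of the smooth residual process."""
    def kernel(t):
        d = np.asarray(t, dtype=float)[:, None] - knots[None, :]
        return np.exp(-0.5 * (d / kernel_sd) ** 2)

    K = kernel(t_obs)
    # Gaussian prior on the latent knot values, Gaussian observation noise.
    precision = K.T @ K / noise_var + np.eye(knots.size) / latent_var
    cov = np.linalg.inv(precision)
    mean = cov @ K.T @ resid / noise_var
    Kg = kernel(t_grid)
    smooth = Kg @ mean
    sd = np.sqrt(np.einsum("ij,jk,ik->i", Kg, cov, Kg))
    return smooth, smooth - 1.96 * sd, smooth + 1.96 * sd

# Synthetic residuals (assumed): misfit of 2 ºC between t=40 and t=60,
# and missing observations between t=75 and t=90.
rng = np.random.default_rng(0)
t_all = np.arange(100.0)
misfit = np.where((t_all >= 40) & (t_all <= 60), 2.0, 0.0)
resid_all = misfit + rng.normal(0.0, 0.5, t_all.size)
keep = ~((t_all >= 75) & (t_all < 90))
knots = np.arange(-10.0, 111.0, 5.0)

smooth, lower, upper = process_convolution_band(
    t_all[keep], resid_all[keep], t_all, knots,
    kernel_sd=5.0, noise_var=0.25, latent_var=1.0)
significant = (lower > 0) | (upper < 0)   # band excludes zero
width = upper - lower
print("significant misfit at t=50:", bool(significant[50]))
print(f"band width at t=20: {width[20]:.2f}, inside the data gap (t=82): {width[82]:.2f}")
```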

4. model validation for a single instant – spatial data analysis


methods:

i. direct comparison between numerical model and observations

a) figure of merit in space (FMS) / measure of effectiveness (MOE)

b) entity-based verification

ii. residual analysis

a) process convolutions

[Figure: maps of temperature T [ºC]: output of the numerical model and in-situ measurements.]

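i. direct comparison between numerical model and observations

a) figure of merit in space (FMS) / measure of effectiveness (MOE)

[Figure: maps of temperature T [ºC] for the output of the numerical model (predictions) and the in-situ measurements (observations), with the areas bounded by the thresholds T1 and T2.]

A_P: predicted area, where T1 < T_P < T2

A_O: observed area, where T1 < T_O < T2

A_P ∩ A_O: overlap of the predicted and observed areas

A_False Negative: observed area that was not predicted

A_False Positive: predicted area that was not observed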



[Figure: the areas A_P and A_O with directions at azimuths 0º, 45º and 90º, and a plot of d against azimuth [º].]

this is a simple statistical approach that is easy to interpret and can carry weight with decision-makers. However, the outcome depends strongly on some subjective choices: the boundaries (T1 and T2), the interpolation algorithm and the interpolation smoothness. The density and location of the observations are also important.
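the area-based scores can be computed from gridded fields. The slides do not state the formulas, so the sketch below assumes the commonly used definitions: FMS as the ratio of the overlap area to the union of the predicted and observed areas, and MOE as the pair (overlap / observed area, overlap / predicted area). It also assumes equal-area grid cells and observations already interpolated to the model grid; the function name and the test fields are hypothetical.

```python
import numpy as np

def fms_moe(pred, obs, t1, t2):
    """Area-based scores for the region where T1 < T < T2 (equal-area cells)."""
    a_p = (pred > t1) & (pred < t2)        # predicted area
    a_o = (obs > t1) & (obs < t2)          # observed area
    overlap = np.sum(a_p & a_o)
    union = np.sum(a_p | a_o)
    false_negative = np.sum(a_o & ~a_p)    # observed but not predicted
    false_positive = np.sum(a_p & ~a_o)    # predicted but not observed
    fms = overlap / union
    moe = (overlap / np.sum(a_o), overlap / np.sum(a_p))
    return fms, moe, false_negative, false_positive

# Synthetic 10 x 10 fields (assumed): a warm patch displaced by one row.
obs = np.full((10, 10), 20.0)
obs[2:6, 2:6] = 25.0
pred = np.full((10, 10), 20.0)
pred[3:7, 2:6] = 25.0

fms, moe, false_negative, false_positive = fms_moe(pred, obs, t1=22.0, t2=28.0)
print(f"FMS = {fms:.2f}, MOE = ({moe[0]:.2f}, {moe[1]:.2f})")
print(f"false negative cells = {false_negative}, false positive cells = {false_positive}")
```

Changing T1 and T2 in this example changes the areas and hence the scores, which illustrates the sensitivity to subjective boundaries.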



b) entity-based verification (Ebert and McBride, 2000)

the total mean squared error (MSE) can be written as:

$\mathrm{MSE}_{total} = \mathrm{MSE}_{displacement} + \mathrm{MSE}_{volume} + \mathrm{MSE}_{pattern}$

the difference between the mean squared error before and after translation is the contribution to total error due to displacement:

$\mathrm{MSE}_{displacement} = \mathrm{MSE}_{total} - \mathrm{MSE}_{shifted}$

the error component due to volume represents the bias in mean intensity:

$\mathrm{MSE}_{volume} = (\bar{F} - \bar{X})^2$

where $\bar{F}$ and $\bar{X}$ are the entity’s mean forecast and observed values after the shift. The pattern error accounts for differences in the fine structure of forecast and observed fields:

$\mathrm{MSE}_{pattern} = \mathrm{MSE}_{shifted} - \mathrm{MSE}_{volume}$

Ebert, E.E., McBride, J.L. 2000. Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrology 239: 179-202.
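the error decomposition can be illustrated on a small grid. The sketch below makes simplifying assumptions that are not part of Ebert and McBride (2000): the best translation is found by brute force over integer shifts, the shift wraps around the grid edges (np.roll), and the whole grid is treated as one entity. The function name and test fields are hypothetical.

```python
import numpy as np

def mse_decomposition(forecast, observed, max_shift=5):
    """Split the total MSE into displacement, volume and pattern components."""
    mse_total = np.mean((forecast - observed) ** 2)
    best_mse, best_shift = mse_total, (0, 0)
    # Brute-force search for the translation that minimizes the MSE.
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(forecast, (dy, dx), axis=(0, 1))
            mse = np.mean((shifted - observed) ** 2)
            if mse < best_mse:
                best_mse, best_shift = mse, (dy, dx)
    shifted = np.roll(forecast, best_shift, axis=(0, 1))
    mse_shifted = best_mse
    mse_displacement = mse_total - mse_shifted
    mse_volume = (shifted.mean() - observed.mean()) ** 2
    mse_pattern = mse_shifted - mse_volume
    return {"total": mse_total, "displacement": mse_displacement,
            "volume": mse_volume, "pattern": mse_pattern, "shift": best_shift}

# Synthetic 20 x 20 fields (assumed): the forecast entity is displaced by
# three columns and is too intense.
observed = np.zeros((20, 20))
observed[8:12, 6:10] = 2.0
forecast = np.zeros((20, 20))
forecast[8:12, 9:13] = 3.0

parts = mse_decomposition(forecast, observed)
print(parts)
```

In this example most of the total error comes from the displacement, with small volume and pattern components.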

ii. residual analysis

a) process convolutions (Higdon, 2002)

[Figure: residual surface z over the spatial coordinates x and y, with zero reference levels and the 95% confidence interval.]

5. model validation for variable space and time - spatiotemporal data analysis

methods:

i. analyse observations and predictions at a single location (time-series analysis) or time instant (spatial data analysis) – see sections 3 & 4

ii. residual analysis – dynamic process convolutions


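ii. residual analysis - dynamic process convolutions (Higdon, 2002)

residuals = spatial process + noise

[Figure: maps of the residuals, the spatial process and the noise at times 1, 2 and 3; the spatial process at those times is S(·, 1), S(·, 2) and S(·, 3).]

for example, a residual $y_i$ observed at location $x_i$ at time 2 is modelled as

$y_i = S(x_i, 2) + \varepsilon_i$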


6. summary


in essence, two validation approaches were proposed:

1) signal analysis – used to investigate to what extent the most important features of the data are captured by the numerical model

2) residual analysis – used to investigate if some significant features were left out by the numerical model

a third option is available: compare the performance of the numerical model with that of statistical models (ARIMA, DLMs, etc.).
