Ensemble Forecasting: Calibration, Verification, and use in Applications
Tom Hopson
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
Goals of an Ensemble Prediction System (EPS)
• Predict the observed distribution of events and atmospheric states
• Predict uncertainty in the day's prediction
• Predict the extreme events that are possible on a particular day
• Provide a range of possible scenarios for a particular forecast
1. Greater accuracy of the ensemble mean forecast (half the error variance of a single forecast)
2. Likelihood of extremes
3. Non-Gaussian forecast PDFs
4. Ensemble spread as a representation of forecast uncertainty
=> All rely on forecasts being calibrated
Further …
-- Argue that calibration is essential for tailoring to a local application: NWP provides spatially and temporally averaged gridded forecast output
-- Applying gridded forecasts to point locations requires location-specific calibration to account for local spatial and temporal scales of variability (=> increasing ensemble dispersion)
More technically …

Take home message: for a "calibrated ensemble", the error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled.
[Figure: forecast PDF and observation; x-axis: discharge, y-axis: probability]
Compare the ensemble-average squared error of the members with the squared error of the ensemble mean (the overbar with superscript $i$ denotes the ensemble average):

$$\overline{(f_i - o)^2}^{\,i} \quad \text{versus} \quad (\bar{f} - o)^2$$

Simplifying:

$$\text{eq1}: \; \overline{f_i^2} - 2o\bar{f} + o^2$$
$$\text{eq2}: \; \bar{f}^2 - 2o\bar{f} + o^2$$

Treating the observation as exchangeable with the ensemble members, substitute $o \to f_j$ and average over $j$:

$$\text{eq1}: \; 2\left(\overline{f^2} - \bar{f}^2\right)$$
$$\text{eq2}: \; \overline{f^2} - \bar{f}^2$$

$$\Rightarrow \; \text{eq1} = 2\,\text{eq2}$$
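As a quick numerical check of this factor-of-2 result, the following R sketch (illustrative, not from the talk) draws the observation from the same distribution as the members:

```r
# Check: average member squared error ~ 2x the ensemble-mean squared error
set.seed(1)
n_cases <- 1e5; m <- 20
ens <- matrix(rnorm(n_cases * m), n_cases, m)  # m-member ensembles
obs <- rnorm(n_cases)                          # obs exchangeable with members
err_member <- mean((ens - obs)^2)              # average member squared error
err_mean   <- mean((rowMeans(ens) - obs)^2)    # ensemble-mean squared error
err_member / err_mean                          # ~2 (exactly 2 as m -> Inf)
```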
Forecast "calibration" or "post-processing"

[Figure: raw and calibrated forecast PDFs; x-axis: flow rate [m3/s], y-axis: probability]
Post-processing has corrected:
• the "on average" bias
• the under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its "dispersion" or "spread")
[Figure: forecast PDFs vs. obs, annotated with "bias" and "spread" or "dispersion"; x-axis: flow rate [m3/s]]
Our approach:
• under-utilized "quantile regression" approach
• probability distribution function "means what it says"
• daily variation in the ensemble dispersion relates directly to changes in forecast skill => informative ensemble skill-spread relationship
Rank Histograms – measuring the reliability of an ensemble forecast
• You cannot verify an ensemble forecast with a single observation.
• The more data you have for verification, the more certain you are (as is true in general for other statistical measures).
• Rare (low-probability) events require more data to verify => as do systems with many ensemble members.
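As a minimal sketch (assumed inputs, not the presenters' code), a rank histogram can be tallied in R as follows: for each case, find the rank of the observation among the ensemble members and count how often each of the m+1 ranks occurs.

```r
# Tally a rank histogram for an m-member ensemble.
# ens: matrix [n_cases x n_members]; obs: vector of length n_cases.
rank_histogram <- function(ens, obs) {
  ranks <- vapply(seq_along(obs),
                  function(i) sum(ens[i, ] < obs[i]) + 1L,  # obs rank among members
                  integer(1))
  tabulate(ranks, nbins = ncol(ens) + 1L)  # counts in the m+1 rank bins
}
```

A flat histogram indicates a reliable (well-dispersed) ensemble; a U-shape indicates under-dispersion, a dome over-dispersion.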
From Barb Brown
From Tom Hamill
Troubled Rank Histograms
Slide from Matt Pocernic
[Figure: two rank histograms; x-axis: ensemble # (1–10), y-axis: counts (0–30)]
From Tom Hamill
Example of Quantile Regression (QR)
Our application
Fitting T quantiles using QR conditioned on:
1) ranked forecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble stdev
5) Persistence
R package: quantreg
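For concreteness, a minimal sketch of this kind of fit with quantreg (the synthetic data and variable names are illustrative assumptions, not the talk's actual regressor set):

```r
# Fit conditional quantiles of observed T on ensemble statistics.
library(quantreg)

set.seed(42)
n        <- 500
ens_mean <- rnorm(n, 285, 5)              # ensemble-mean temperature [K]
ens_sd   <- runif(n, 0.5, 3)              # ensemble spread [K]
persist  <- ens_mean + rnorm(n, 0, 2)     # persistence regressor
obs      <- ens_mean + ens_sd * rnorm(n)  # synthetic verifying observations

taus  <- c(0.05, 0.25, 0.50, 0.75, 0.95)  # quantiles to fit
fit   <- rq(obs ~ ens_mean + ens_sd + persist, tau = taus)
q_hat <- predict(fit)                     # n x length(taus) quantile forecasts
```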
[Figure: ensemble forecasts and observations vs. time; y-axis: T [K]]
Regressor set:
1. reforecast ensemble
2. ensemble mean
3. ensemble stdev
4. persistence
5. LR quantile (not shown)
[Figure: climatological PDF; x-axis: temperature [K], y-axis: probability/°K]

Step 1: Determine climatological quantiles
Step 2: For each quantile, use "forward step-wise cross-validation" to iteratively select the best regressor subset.
Selection requirements:
a) QR cost function minimum
b) Satisfy the binomial distribution at 95% confidence
If the requirements are not met, retain the climatological "prior".
Step 3: segregate forecasts into differing ranges of ensemble dispersion and refit models (Step 2) uniquely for each range
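A sketch of this dispersion segregation, reusing the synthetic variables from the quantreg sketch above (the tercile bin edges are an illustrative choice, not the talk's):

```r
# Refit the quantile regressions separately within ensemble-spread bins.
spread_bin <- cut(ens_sd,
                  breaks = quantile(ens_sd, c(0, 1/3, 2/3, 1)),
                  include.lowest = TRUE)          # tercile dispersion ranges
fits <- lapply(levels(spread_bin), function(b) {
  idx <- spread_bin == b
  rq(obs[idx] ~ ens_mean[idx] + ens_sd[idx], tau = taus)
})
```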
[Figure: forecasts vs. time (T [K]), segregated into dispersion ranges I, II, III; resulting PDF vs. temperature [K], y-axis: probability/°K]
[Figure: prior vs. posterior forecast PDF]

Final result: "sharper" posterior PDF represented by interpolated quantiles
Rank Probability Score for multi-categorical or continuous variables:

$$\mathrm{RPS} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathrm{CDF}_{fc,i} - \mathrm{CDF}_{obs,i}\right)^2$$
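A minimal R sketch of this formula for a single categorical forecast (the probability vector and observed category are illustrative):

```r
# RPS over n categories: squared CDF differences, per the formula above.
rps <- function(p_fc, obs_cat) {
  n       <- length(p_fc)
  cdf_fc  <- cumsum(p_fc)                   # forecast CDF over categories
  cdf_obs <- cumsum(seq_len(n) == obs_cat)  # 0/1 step CDF of the observation
  sum((cdf_fc - cdf_obs)^2) / (n - 1)
}

rps(p_fc = c(0.1, 0.3, 0.4, 0.2), obs_cat = 3)  # 0 would be a perfect forecast
```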
Scatter-plot and Contingency Table
Does the forecast correctly detect temperatures above 18 degrees?
Slide from Barbara Casati
Brier Score:

$$\mathrm{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - o_i\right)^2$$

y = forecast probability of event occurrence
o = observed occurrence (0 or 1)
i = sample # of total n samples

=> Note similarity to MSE
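A one-line R sketch of this score (inputs illustrative):

```r
# Brier score: mean squared difference between forecast probability
# and the 0/1 observed outcome.
brier <- function(y, o) mean((y - o)^2)

brier(y = c(0.9, 0.2, 0.7), o = c(1, 0, 1))  # lower is better
```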
Other post-processing approaches …
1) Bayesian Model Averaging (BMA) – Raftery et al. (1997)
2) Analogue approaches – Hopson and Webster, J. Hydromet. (2010)
3) Kalman Filter with analogues – Delle Monache et al. (2010)
4) Quantile regression – Hopson and Hacker, MWR (under review)
5) Quantile-to-quantile (quantile matching) approach – Hopson and Webster, J. Hydromet. (2010)
… many others
Quantile Matching: another approach when matched forecast-observation pairs are not available => useful for climate change studies
2004 Brahmaputra Catchment-averaged Forecasts
- black line: satellite observations
- colored lines: ensemble forecasts
- Basic structure of catchment rainfall is similar for both forecasts and observations
- But large relative over-bias in the forecasts

ECMWF 51-member ensemble precipitation forecasts compared to observations
[Figure: precipitation vs. quantile (25th, 50th, 75th, 100th, up to Pmax) for the forecast climatology (Pfcst) and the adjusted forecast (Padj)]
Forecast Bias Adjustment - done independently for each forecast grid
(bias-correct the whole PDF, not just the median)
[Figure: Model Climatology CDF vs. "Observed" Climatology CDF]
In practical terms …
[Figure: ranked forecasts and ranked observations mapped onto each other; x-axis: precipitation, 0–1 m]
Hopson and Webster (2010)
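A minimal sketch of this quantile-to-quantile mapping (function and variable names are assumptions for illustration):

```r
# Replace each forecast value with the observed-climatology value
# at the same quantile of the model climatology.
q2q_adjust <- function(fcst, model_clim, obs_clim) {
  u <- ecdf(model_clim)(fcst)                          # model-CDF quantile of each forecast
  as.numeric(quantile(obs_clim, probs = u, type = 6))  # map onto observed CDF
}
```

Because only the two climatological CDFs are needed, no matched forecast-observation pairs are required, which is what makes the approach usable for climate change studies.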
Brahmaputra Corrected Forecasts
[Figure panels: Original Forecast vs. Corrected Forecast]
=> The observed precipitation now lies within the "ensemble bundle"
Bias-corrected Precipitation Forecasts
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
• TIGGE: the THORPEX Interactive Grand Global Ensemble
• a component of the World Weather Research Programme
• the TIGGE archive consists of ensemble forecast data from ten global NWP centers
• designed to accelerate improvements in the accuracy of 1-day to 2-week high-impact weather forecasts for the benefit of humanity
• starting from October 2006
• available for scientific research
• near-real-time forecasts (some centers delayed)
THORPEX Interactive Grand Global Ensemble
Archive Status and Monitoring, Data Receipt
[Diagram: ensemble data from the current data providers (NCEP, CMC, UKMO, ECMWF, MeteoFrance, JMA, KMA, CMA, BoM, CPTEC) delivered via IDD/LDM, HTTP, and FTP to archive centres including NCAR and NCDC]

Unidata IDD/LDM (Internet Data Distribution / Local Data Manager): a commodity internet application to send and receive data
Archive Status and Monitoring, Variability between providers
[Chart: model spatial resolution (N200, N128, 0.56x0.56, 1.00x1.00, 1.25x0.83, 1.25x1.25, 1.50x1.50) vs. number of data providers (0–4): ECMWF, UKMO, JMA, NCEP, CMA, CMC, BOM, MF, KMA, CPTEC]

[Chart: number of conforming parameter fields and ensemble members (0–80) per provider]

Forecast Length, Initialization

[Chart: forecast length (days) and forecasts/day (0–18) per provider]
Archive Status and Monitoring, Archive Completeness
PL = Pressure Level, PT = 320K θ Level, PV = ± 2 Potential Vorticity Level, SL = Single/Surface Level
Variable Lvl ECMWF UKMO JMA NCEP CMA CMC BOM MetF KMA CPTEC
Geopotential Z PL
Specific H PL
T PL
U-velocity PL
V-velocity PL
Potential Vor PT
Potential T PV
U-velocity PV
V-Velocity PV
U 10m SL
V 10m SL
CAPE SL
Conv. Inhib. SL
Land-sea SL
Mean SLP SL
Orog. SL
Skin T SL
Snow D. H20 SL
Snow F. H20 SL
Archive Status and Monitoring, Archive Completeness

Variable Lvl ECMWF UKMO JMA NCEP CMA CMC BOM MetF KMA CPTEC
Soil Moist. SL
Soil T SL
Sunshine D. SL
Surf. DPT SL
Surf. ATmax SL
Surf. ATmin SL
Surf. AT SL
Surf. P SL
LW Rad. Out SL
LH flux SL
Net Rad SL
Net Therm. Rad SL
Sensible Rad. SL
Cloud Cov SL
Column Water SL
Precipitation SL
Wilt. Point SL
Field Cap. SL
PL = Pressure Level, PT = 320K θ Level, PV = ± 2 Potential Vorticity Level, SL = Single/Surface Level
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
Early May 2011, floods in southwestern Africa
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … ECMWF 24-hr precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … NCEP GEFS 24-hr precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … ECMWF 5-day precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … NCEP GEFS 5-day precip
A Cautionary Warning about using Probabilistic Precipitation Forecasts in Hydrologic Modeling
(Importance of maintaining spatial and temporal covariances for hydrologic forecasting => one option: the "Schaake Shuffle")
[Diagram: river catchment A with sub-catchments B and C; discharges QA, QB, QC under ensemble1, ensemble2, ensemble3]

Scenario for the smallest possible QA? No.
Scenario for the largest possible QA? No.
QA is the same for all 3 possible ensembles.
Scenario for the average QA?
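One way to restore those covariances is the Schaake Shuffle (Clark et al. 2004): reorder each site's post-processed ensemble members so they reproduce the space/time rank structure of a common set of historical observations. A minimal sketch (the matrix layout is an assumption):

```r
# ens, hist: matrices [n_members x n_sites]; the rows of hist are the same
# historical dates at every site, supplying the rank template.
schaake_shuffle <- function(ens, hist) {
  shuffled <- ens
  for (j in seq_len(ncol(ens))) {
    r <- rank(hist[, j], ties.method = "first")  # historical rank pattern
    shuffled[, j] <- sort(ens[, j])[r]           # impose it on sorted members
  }
  shuffled
}
```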
Dugway Proving Ground
Dugway Proving Grounds, Utah: e.g., T thresholds
• Includes random and systematic differences between members.
• Not an actual chance of exceedance unless calibrated.
Challenges in probabilistic mesoscale prediction
• Model formulation
  - Bias (marginal and conditional)
  - Lack of variability caused by truncation and approximation
  - Non-universality of closure and forcing
• Initial conditions
  - Small scales are damped in analysis systems, and the model must develop them
  - Perturbation methods designed for medium-range systems may not be appropriate
• Lateral boundary conditions
  - After short time periods the lateral boundary conditions can dominate
  - Representing uncertainty in lateral boundary conditions is critical
• Lower boundary conditions
  - Dominate the boundary-layer response
  - Difficult to estimate uncertainty in lower boundary conditions
RTFDDA and Ensemble-RTFDDA
Liu et al. 2010 AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA. January 18 – 23, [email protected]
National Security Applications Program Research Applications Laboratory
3-hr dewpoint time series: Before Calibration / After Calibration
Station DPG S01
42-hr dewpoint time series: Before Calibration / After Calibration
Station DPG S01
Blue is the "raw" ensemble, black is the calibrated ensemble, red is the observed value.
Notice: significant change in both “bias” and dispersion of final PDF
(also notice PDF asymmetries)
PDFs: raw vs. calibrated
3-hr dewpoint rank histograms – Station DPG S01
42-hr dewpoint rank histograms – Station DPG S01
Measures Used:
1) Rank histogram (converted to a scalar measure)
2) Root mean square error (RMSE)
3) Brier score
4) Rank Probability Score (RPS)
5) Relative Operating Characteristic (ROC) curve
6) New measure of ensemble skill-spread utility

=> These are used for automated calibration model selection via a weighted sum of the skill scores of each measure
Utilizing verification measures in near-real time …
Skill Scores
• Single value to summarize performance
• Reference forecast: best naive guess (persistence, climatology)
• A perfect forecast implies that the object can be perfectly observed
• Positively oriented: positive is good

$$SS = \frac{A_{forc} - A_{ref}}{A_{perf} - A_{ref}}$$
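As a small illustration of this formula (the values are made up; for error-type scores like RMSE a perfect forecast gives A_perf = 0):

```r
# Generic skill score relative to a reference forecast.
skill_score <- function(A_forc, A_ref, A_perf = 0) {
  (A_forc - A_ref) / (A_perf - A_ref)
}

skill_score(A_forc = 1.2, A_ref = 1.8)  # > 0 => beats the reference
```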
National Security Applications Program Research Applications Laboratory
Skill Score Verification: RMSE Skill Score and CRPS Skill Score

Reference Forecasts: black -- raw ensemble; blue -- persistence
Thank You!