Ensemble Forecasting: Calibration, Verification, and use in Applications
Tom Hopson
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
Goals of an Ensemble Prediction System (EPS)
• Predict the observed distribution of events and atmospheric states
• Predict uncertainty in the day's prediction
• Predict the extreme events that are possible on a particular day
• Provide a range of possible scenarios for a particular forecast
1. Greater accuracy of the ensemble mean forecast (half the error variance of a single forecast)
2. Likelihood of extremes
3. Non-Gaussian forecast PDFs
4. Ensemble spread as a representation of forecast uncertainty
=> All rely on forecasts being calibrated
Further …
-- Argue that calibration is essential for tailoring to a local application: NWP provides spatially and temporally averaged gridded forecast output
-- Applying gridded forecasts to point locations requires location-specific calibration to account for local spatial and temporal scales of variability (=> increasing ensemble dispersion)
More technically …

Take home message: for a "calibrated ensemble", the error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled.
[Figure: forecast PDF and observation; x-axis: discharge, y-axis: probability]
Compare the ensemble-average squared error of the members with the squared error of the ensemble mean (the overbar with superscript $i$ denotes the ensemble average):

$$\overline{(f_i - o)^2}^{\,i} \quad \text{versus} \quad (\bar{f} - o)^2$$

Simplifying:

$$\text{eq1}: \; \overline{f_i^2} - 2o\bar{f} + o^2$$
$$\text{eq2}: \; \bar{f}^2 - 2o\bar{f} + o^2$$

Treating the observation as exchangeable with the ensemble members, substitute $o \to f_j$ and average over $j$:

$$\text{eq1}: \; 2\left(\overline{f^2} - \bar{f}^2\right)$$
$$\text{eq2}: \; \overline{f^2} - \bar{f}^2$$

$$\Rightarrow \; \text{eq1} = 2\,\text{eq2}$$
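As a quick numerical check of this factor-of-2 result, the following R sketch (illustrative, not from the talk) draws the observation from the same distribution as the members:

```r
# Check: average member squared error ~ 2x the ensemble-mean squared error
set.seed(1)
n_cases <- 1e5; m <- 20
ens <- matrix(rnorm(n_cases * m), n_cases, m)  # m-member ensembles
obs <- rnorm(n_cases)                          # obs exchangeable with members
err_member <- mean((ens - obs)^2)              # average member squared error
err_mean   <- mean((rowMeans(ens) - obs)^2)    # ensemble-mean squared error
err_member / err_mean                          # ~2 (exactly 2 as m -> Inf)
```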
Forecast "calibration" or "post-processing"

[Figure: raw and calibrated forecast PDFs; x-axis: flow rate [m3/s], y-axis: probability]
Post-processing has corrected:
• the "on average" bias
• the under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its "dispersion" or "spread")
[Figure: forecast PDFs vs. obs, annotated with "bias" and "spread" or "dispersion"; x-axis: flow rate [m3/s]]
Our approach:
• under-utilized "quantile regression" approach
• probability distribution function "means what it says"
• daily variation in the ensemble dispersion relates directly to changes in forecast skill => informative ensemble skill-spread relationship
Rank Histograms – measuring the reliability of an ensemble forecast
• You cannot verify an ensemble forecast with a single observation.
• The more data you have for verification, the more certain you are (as is true in general for other statistical measures).
• Rare (low-probability) events require more data to verify => as do systems with many ensemble members.
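As a minimal sketch (assumed inputs, not the presenters' code), a rank histogram can be tallied in R as follows: for each case, find the rank of the observation among the ensemble members and count how often each of the m+1 ranks occurs.

```r
# Tally a rank histogram for an m-member ensemble.
# ens: matrix [n_cases x n_members]; obs: vector of length n_cases.
rank_histogram <- function(ens, obs) {
  ranks <- vapply(seq_along(obs),
                  function(i) sum(ens[i, ] < obs[i]) + 1L,  # obs rank among members
                  integer(1))
  tabulate(ranks, nbins = ncol(ens) + 1L)  # counts in the m+1 rank bins
}
```

A flat histogram indicates a reliable (well-dispersed) ensemble; a U-shape indicates under-dispersion, a dome over-dispersion.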
From Barb Brown
From Tom Hamill
Troubled Rank Histograms
Slide from Matt Pocernic
[Figure: two rank histograms; x-axis: ensemble # (1–10), y-axis: counts (0–30)]
From Tom Hamill
Example of Quantile Regression (QR)
Our application
Fitting T quantiles using QR conditioned on:
1) ranked forecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble stdev
5) Persistence
R package: quantreg
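For concreteness, a minimal sketch of this kind of fit with quantreg (the synthetic data and variable names are illustrative assumptions, not the talk's actual regressor set):

```r
# Fit conditional quantiles of observed T on ensemble statistics.
library(quantreg)

set.seed(42)
n        <- 500
ens_mean <- rnorm(n, 285, 5)              # ensemble-mean temperature [K]
ens_sd   <- runif(n, 0.5, 3)              # ensemble spread [K]
persist  <- ens_mean + rnorm(n, 0, 2)     # persistence regressor
obs      <- ens_mean + ens_sd * rnorm(n)  # synthetic verifying observations

taus  <- c(0.05, 0.25, 0.50, 0.75, 0.95)  # quantiles to fit
fit   <- rq(obs ~ ens_mean + ens_sd + persist, tau = taus)
q_hat <- predict(fit)                     # n x length(taus) quantile forecasts
```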
[Figure: ensemble forecasts and observations vs. time; y-axis: T [K]]
Regressor set:
1. reforecast ensemble
2. ensemble mean
3. ensemble stdev
4. persistence
5. LR quantile (not shown)
[Figure: climatological PDF; x-axis: temperature [K], y-axis: probability/°K]

Step 1: Determine climatological quantiles
Step 2: For each quantile, use "forward step-wise cross-validation" to iteratively select the best regressor subset.
Selection requirements:
a) QR cost function minimum
b) Satisfy the binomial distribution at 95% confidence
If the requirements are not met, retain the climatological "prior".
Step 3: segregate forecasts into differing ranges of ensemble dispersion and refit models (Step 2) uniquely for each range
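A sketch of this dispersion segregation, reusing the synthetic variables from the quantreg sketch above (the tercile bin edges are an illustrative choice, not the talk's):

```r
# Refit the quantile regressions separately within ensemble-spread bins.
spread_bin <- cut(ens_sd,
                  breaks = quantile(ens_sd, c(0, 1/3, 2/3, 1)),
                  include.lowest = TRUE)          # tercile dispersion ranges
fits <- lapply(levels(spread_bin), function(b) {
  idx <- spread_bin == b
  rq(obs[idx] ~ ens_mean[idx] + ens_sd[idx], tau = taus)
})
```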
[Figure: forecasts vs. time (T [K]), segregated into dispersion ranges I, II, III; resulting PDF vs. temperature [K], y-axis: probability/°K]
[Figure: prior vs. posterior forecast PDF]

Final result: "sharper" posterior PDF represented by interpolated quantiles
Rank Probability Score for multi-categorical or continuous variables:

$$\mathrm{RPS} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathrm{CDF}_{fc,i} - \mathrm{CDF}_{obs,i}\right)^2$$
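A minimal R sketch of this formula for a single categorical forecast (the probability vector and observed category are illustrative):

```r
# RPS over n categories: squared CDF differences, per the formula above.
rps <- function(p_fc, obs_cat) {
  n       <- length(p_fc)
  cdf_fc  <- cumsum(p_fc)                   # forecast CDF over categories
  cdf_obs <- cumsum(seq_len(n) == obs_cat)  # 0/1 step CDF of the observation
  sum((cdf_fc - cdf_obs)^2) / (n - 1)
}

rps(p_fc = c(0.1, 0.3, 0.4, 0.2), obs_cat = 3)  # 0 would be a perfect forecast
```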
Scatter-plot and Contingency Table
Does the forecast correctly detect temperatures above 18 degrees?
Slide from Barbara Casati
Brier Score:

$$\mathrm{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - o_i\right)^2$$

y = forecast probability of event occurrence
o = observed occurrence (0 or 1)
i = sample # of total n samples

=> Note similarity to MSE
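A one-line R sketch of this score (inputs illustrative):

```r
# Brier score: mean squared difference between forecast probability
# and the 0/1 observed outcome.
brier <- function(y, o) mean((y - o)^2)

brier(y = c(0.9, 0.2, 0.7), o = c(1, 0, 1))  # lower is better
```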
Other post-processing approaches …
1) Bayesian Model Averaging (BMA) – Raftery et al. (1997)
2) Analogue approaches – Hopson and Webster, J. Hydromet. (2010)
3) Kalman Filter with analogues – Delle Monache et al. (2010)
4) Quantile regression – Hopson and Hacker, MWR (under review)
5) Quantile-to-quantile (quantile matching) approach – Hopson and Webster, J. Hydromet. (2010)
… many others
Quantile Matching: another approach when matched forecast-observation pairs are not available => useful for climate change studies
2004 Brahmaputra Catchment-averaged Forecasts
- black line: satellite observations
- colored lines: ensemble forecasts
- Basic structure of catchment rainfall is similar for both forecasts and observations
- But large relative over-bias in the forecasts

ECMWF 51-member ensemble precipitation forecasts compared to observations
[Figure: precipitation vs. quantile (25th, 50th, 75th, 100th, up to Pmax) for the forecast climatology (Pfcst) and the adjusted forecast (Padj)]
Forecast Bias Adjustment - done independently for each forecast grid
(bias-correct the whole PDF, not just the median)
[Figure: Model Climatology CDF vs. "Observed" Climatology CDF]
In practical terms …
[Figure: ranked forecasts and ranked observations mapped onto each other; x-axis: precipitation, 0–1 m]
Hopson and Webster (2010)
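A minimal sketch of this quantile-to-quantile mapping (function and variable names are assumptions for illustration):

```r
# Replace each forecast value with the observed-climatology value
# at the same quantile of the model climatology.
q2q_adjust <- function(fcst, model_clim, obs_clim) {
  u <- ecdf(model_clim)(fcst)                          # model-CDF quantile of each forecast
  as.numeric(quantile(obs_clim, probs = u, type = 6))  # map onto observed CDF
}
```

Because only the two climatological CDFs are needed, no matched forecast-observation pairs are required, which is what makes the approach usable for climate change studies.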
Brahmaputra Corrected Forecasts
[Figure panels: Original Forecast vs. Corrected Forecast]
=> The observed precipitation now lies within the "ensemble bundle"
Bias-corrected Precipitation Forecasts
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
• TIGGE: the THORPEX Interactive Grand Global Ensemble
• a component of the World Weather Research Programme
• the TIGGE archive consists of ensemble forecast data from ten global NWP centers
• designed to accelerate improvements in the accuracy of 1-day to 2-week high-impact weather forecasts for the benefit of humanity
• starting from October 2006
• available for scientific research
• near-real-time forecasts (some centers delayed)
THORPEX Interactive Grand Global Ensemble
Archive Status and Monitoring, Data Receipt
[Diagram: ensemble data from the current data providers (NCEP, CMC, UKMO, ECMWF, MeteoFrance, JMA, KMA, CMA, BoM, CPTEC) delivered via IDD/LDM, HTTP, and FTP to archive centres including NCAR and NCDC]

Unidata IDD/LDM (Internet Data Distribution / Local Data Manager): a commodity internet application to send and receive data
Archive Status and Monitoring, Variability between providers
[Chart: model spatial resolution (N200, N128, 0.56x0.56, 1.00x1.00, 1.25x0.83, 1.25x1.25, 1.50x1.50) vs. number of data providers (0–4): ECMWF, UKMO, JMA, NCEP, CMA, CMC, BOM, MF, KMA, CPTEC]

[Chart: number of conforming parameter fields and ensemble members (0–80) per provider]

Forecast Length, Initialization

[Chart: forecast length (days) and forecasts/day (0–18) per provider]
Archive Status and Monitoring, Archive Completeness
PL = Pressure Level, PT = 320K θ Level, PV = ± 2 Potential Vorticity Level, SL = Single/Surface Level
Variable Lvl ECMWF UKMO JMA NCEP CMA CMC BOM MetF KMA CPTEC
Geopotential Z PL
Specific H PL
T PL
U-velocity PL
V-velocity PL
Potential Vor PT
Potential T PV
U-velocity PV
V-Velocity PV
U 10m SL
V 10m SL
CAPE SL
Conv. Inhib. SL
Land-sea SL
Mean SLP SL
Orog. SL
Skin T SL
Snow D. H20 SL
Snow F. H20 SL
Archive Status and Monitoring, Archive Completeness

Variable Lvl ECMWF UKMO JMA NCEP CMA CMC BOM MetF KMA CPTEC
Soil Moist. SL
Soil T SL
Sunshine D. SL
Surf. DPT SL
Surf. ATmax SL
Surf. ATmin SL
Surf. AT SL
Surf. P SL
LW Rad. Out SL
LH flux SL
Net Rad SL
Net Therm. Rad SL
Sensible Rad. SL
Cloud Cov SL
Column Water SL
Precipitation SL
Wilt. Point SL
Field Cap. SL
PL = Pressure Level, PT = 320K θ Level, PV = ± 2 Potential Vorticity Level, SL = Single/Surface Level
Outline
I. Motivation for ensemble forecasting and post-processing
   a) Introduce the Quantile Regression (QR; Koenker and Bassett, 1978) post-processing procedure
II. Ensemble forecast verification
III. THORPEX-TIGGE data set
IV. Ensemble forecast examples:
   a) Southwestern African flooding
   b) African meningitis
   c) US Army test range weather forecasting
   d) Bangladesh flood forecasting
Early May 2011, floods in southwestern Africa
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … ECMWF 24-hr precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … NCEP GEFS 24-hr precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … ECMWF 5-day precip
Early May 2011, floods in southwestern Africa -- examine ensemble forecasts … NCEP GEFS 5-day precip
A Cautionary Warning about using Probabilistic Precipitation Forecasts in Hydrologic Modeling
(Importance of maintaining spatial and temporal covariances for hydrologic forecasting => one option: the "Schaake Shuffle")
[Diagram: river catchment A with sub-catchments B and C; discharges QA, QB, QC under ensemble1, ensemble2, ensemble3]

Scenario for the smallest possible QA? No.
Scenario for the largest possible QA? No.
QA is the same for all 3 possible ensembles.
Scenario for the average QA?
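One way to restore those covariances is the Schaake Shuffle (Clark et al. 2004): reorder each site's post-processed ensemble members so they reproduce the space/time rank structure of a common set of historical observations. A minimal sketch (the matrix layout is an assumption):

```r
# ens, hist: matrices [n_members x n_sites]; the rows of hist are the same
# historical dates at every site, supplying the rank template.
schaake_shuffle <- function(ens, hist) {
  shuffled <- ens
  for (j in seq_len(ncol(ens))) {
    r <- rank(hist[, j], ties.method = "first")  # historical rank pattern
    shuffled[, j] <- sort(ens[, j])[r]           # impose it on sorted members
  }
  shuffled
}
```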
Dugway Proving Ground
Dugway Proving Grounds, Utah: e.g., T thresholds
• Includes random and systematic differences between members.
• Not an actual chance of exceedance unless calibrated.
Challenges in probabilistic mesoscale prediction
• Model formulation
  - Bias (marginal and conditional)
  - Lack of variability caused by truncation and approximation
  - Non-universality of closure and forcing
• Initial conditions
  - Small scales are damped in analysis systems, and the model must develop them
  - Perturbation methods designed for medium-range systems may not be appropriate
• Lateral boundary conditions
  - After short time periods the lateral boundary conditions can dominate
  - Representing uncertainty in lateral boundary conditions is critical
• Lower boundary conditions
  - Dominate the boundary-layer response
  - Difficult to estimate uncertainty in lower boundary conditions
RTFDDA and Ensemble-RTFDDA
Liu et al. 2010 AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA. January 18 – 23, [email protected]
National Security Applications Program Research Applications Laboratory
3-hr dewpoint time series: Before Calibration / After Calibration
Station DPG S01
42-hr dewpoint time series: Before Calibration / After Calibration
Station DPG S01
Blue is the "raw" ensemble, black is the calibrated ensemble, red is the observed value.
Notice: significant change in both “bias” and dispersion of final PDF
(also notice PDF asymmetries)
PDFs: raw vs. calibrated
3-hr dewpoint rank histograms – Station DPG S01
42-hr dewpoint rank histograms – Station DPG S01
Measures Used:
1) Rank histogram (converted to a scalar measure)
2) Root mean square error (RMSE)
3) Brier score
4) Rank Probability Score (RPS)
5) Relative Operating Characteristic (ROC) curve
6) New measure of ensemble skill-spread utility

=> These are used for automated calibration model selection via a weighted sum of the skill scores of each measure
Utilizing verification measures in near-real time …
Skill Scores
• Single value to summarize performance
• Reference forecast: best naive guess (persistence, climatology)
• A perfect forecast implies that the object can be perfectly observed
• Positively oriented: positive is good

$$SS = \frac{A_{forc} - A_{ref}}{A_{perf} - A_{ref}}$$
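As a small illustration of this formula (the values are made up; for error-type scores like RMSE a perfect forecast gives A_perf = 0):

```r
# Generic skill score relative to a reference forecast.
skill_score <- function(A_forc, A_ref, A_perf = 0) {
  (A_forc - A_ref) / (A_perf - A_ref)
}

skill_score(A_forc = 1.2, A_ref = 1.8)  # > 0 => beats the reference
```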
National Security Applications Program Research Applications Laboratory
Skill Score Verification: RMSE Skill Score and CRPS Skill Score

Reference Forecasts: black -- raw ensemble; blue -- persistence
Thank You!