Comparing Postprocessing Approaches to Calibrating Operational River Discharge Forecasts


  • Comparing Postprocessing Approaches to Calibrating Operational River Discharge Forecasts. Tom Hopson; Peter Webster, EAS-Georgia Tech; Andy Wood, CBRFC-NOAA

  • Questions
    How are forecasting errors partitioned (i.e. where is the "lowest hanging fruit")?
    Capturing (reducing?) hydrologic model errors
    Which algorithms are best (KNN, QR)?
    What is the impact of the number of hindcasts?
    What is the optimal point to apply post-processing algorithms?

  • Context: CFAB Project
    PI: Peter Webster, Georgia Tech ([email protected])
    Partners: USAID, CARE, ECMWF, Bangladesh's Meteorology Dept and Flood Forecasting Warning Centre (FFWC), NASA-TRMM, NOAA-CMORPH
    Purpose: provide flood forecasts of the Ganges and Brahmaputra rivers for Bangladesh, operational 2003-ongoing

  • Daily Operational Flood Forecasting Sequence
    Reference: Hopson, T. M., and P. J. Webster, 2010: A 1-10-Day Ensemble Forecasting Scheme for the Major River Basins of Bangladesh: Forecasting Severe Floods of 2003-07. J. Hydrometeor., 11.

    [Schematic flowchart. Forecast trigger: arrival of the ECMWF forecast files. Inputs: statistically corrected downscaled precipitation forecasts, updated TRMM-CMORPH-CPC precipitation estimates, updated outlet discharge estimates, updated soil moisture states and in-stream flows, and updated distributed model parameters. Stages: Lumped Model Hindcast/Forecast Discharge Generation, Distributed Model Hindcast/Forecast Discharge Generation, Multi-Model Hindcast/Forecast Discharge Generation, and Discharge Forecast PDF Generation; each stage calibrates its model (lumped/distributed model, multi-model, AR error model) and then generates hindcasts and forecasts. The error stage generates a forecasted model error PDF and convolves the multi-model forecast PDF with the model error PDF. Output: above-critical-level forecast probabilities transferred to Bangladesh.]

  • Precipitation Forecast Bias Adjustment, done independently for each forecast grid (bias-correct the whole PDF, not just the median). In practical terms: a forecast precipitation value Pfcst is located on the model climatology CDF (quantiles 25th, 50th, 75th, 100th of the ranked forecasts) and replaced by the value Padj at the same quantile of the observed climatology CDF (ranked observations).
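    As a rough illustration, a minimal sketch of this kind of quantile-to-quantile mapping for one grid cell, assuming ranked model- and observed-climatology arrays; the function name q2q_adjust, the gamma-distributed sample data, and the interpolation choice are illustrative, not the operational code:

```python
import numpy as np

def q2q_adjust(p_fcst, model_clim, obs_clim):
    """Map a forecast precipitation value onto the observed climatology
    by matching empirical quantiles (CDF to CDF)."""
    model_sorted = np.sort(model_clim)      # model climatology, ranked
    obs_sorted = np.sort(obs_clim)          # observed climatology, ranked
    # Non-exceedance quantile of the forecast value in the model climatology
    q = np.searchsorted(model_sorted, p_fcst) / len(model_sorted)
    q = np.clip(q, 0.0, 1.0)
    # Value at the same quantile of the observed climatology
    quantiles = np.linspace(0.0, 1.0, len(obs_sorted))
    return np.interp(q, quantiles, obs_sorted)

# Example: adjust every member of a 51-member ensemble at one grid cell
rng = np.random.default_rng(0)
model_clim = rng.gamma(2.0, 5.0, 1000)   # hypothetical model climatology [mm/day]
obs_clim = rng.gamma(2.0, 7.0, 1000)     # hypothetical observed climatology [mm/day]
ensemble = rng.gamma(2.0, 5.0, 51)
adjusted = [q2q_adjust(p, model_clim, obs_clim) for p in ensemble]
```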

  • Brahmaputra Discharge Forecasts for 2007. Model driven with the ECMWF 51-member ensemble => need to account for hydrologic model (and observed precipitation) errors!

  • Skill Scores
    A single value to summarize performance.
    Reference forecast: best naive guess (persistence, climatology)
    A perfect forecast implies that the object can be perfectly observed
    Positively oriented: positive is good
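    For the negatively oriented scores used in the following slides (RMSE, CRPS), the usual skill-score convention can be sketched as follows (a minimal illustration; the function name is ours):

```python
def skill_score(score_forecast, score_reference):
    """Skill score for a negatively oriented score such as RMSE or CRPS:
    1 is perfect, 0 is no better than the reference, negative is worse."""
    return 1.0 - score_forecast / score_reference

# Example: a forecast CRPS of 300 m^3/s against a persistence CRPS of 1200 m^3/s
print(skill_score(300.0, 1200.0))  # 0.75, i.e. a 75% improvement over persistence
```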

  • Skill Score Verification
    [Panels: RMSE Score; CRPS Score; RMSE Skill Score and CRPS Skill Score. Reference forecasts: green = climatology, red = persistence]

  • Assessment of Hydrologic Forecast Error Sources
    Recall that for errors of two independent variables v1 and v2 added in quadrature, sigma_total^2 = sigma_v1^2 + sigma_v2^2; here the components are the weather-variable (precipitation forcing) stddev and the hydrologic model / observed-precipitation stddev, which together give the total stddev.
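    A tiny worked example of the quadrature relation, with purely illustrative numbers:

```python
import math

sigma_weather = 3000.0   # illustrative weather-forcing error stddev [m^3/s]
sigma_hydro = 4000.0     # illustrative hydrologic model / obs-precip error stddev [m^3/s]

# For uncorrelated error sources the stddevs add in quadrature, not linearly:
sigma_total = math.sqrt(sigma_weather**2 + sigma_hydro**2)
print(sigma_total)       # 5000.0, which is less than 3000 + 4000 = 7000
```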


    Final flood forecast calibration, or post-processing
    [Panels: forecast PDF vs. observation before and after calibration; axes: probability vs. flow rate [m3/s]; annotations: bias, spread or dispersion]
    Post-processing has corrected the on-average bias as well as the under-representation of the 2nd moment of the empirical forecast PDF (i.e. corrected its dispersion or spread).
    Additional goals: the probability distribution function means what it says (flat rank histogram); daily variation in the dispersion relates directly to changes in forecast skill; produce a PDF that has as much information content as possible (i.e. narrow).
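    A minimal sketch of how the rank histogram behind the "flat rank histogram" goal can be computed (array names and the random tie-breaking convention are illustrative assumptions):

```python
import numpy as np

def rank_histogram(ensembles, observations, rng=None):
    """Count where each observation ranks within its ensemble.
    ensembles: (n_forecasts, n_members); observations: (n_forecasts,).
    A calibrated system gives roughly equal counts in all n_members+1 bins."""
    rng = rng or np.random.default_rng()
    n_members = ensembles.shape[1]
    counts = np.zeros(n_members + 1, dtype=int)
    for ens, obs in zip(ensembles, observations):
        # Rank = number of members below the observation; random tie-breaking
        below = np.sum(ens < obs)
        ties = np.sum(ens == obs)
        rank = below + rng.integers(0, ties + 1)
        counts[rank] += 1
    return counts
```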

  • Producing a Reliable Probabilistic Discharge Forecast

  • Our approach: Quantile Regression (QR). Benefits:

    1) Less sensitivity to outliers

    2) Works with heteroscedastic errors

    3) Optimally fit for each part of the PDF

    4) Flat rank histograms

  • Results: time series
    [Panels: 5-day lead-time and 10-day lead-time; Uncorrected, KNN, QR]
    => QR appears to be more stable than KNN

  • Rank Histograms
    [Panels: Uncorrected, KNN, QR]
    => Increased stability of QR is reflected in slightly more consistent rank histograms

  • CRPS Skill Score Comparisons of: 1) post-processing (KNN vs. QR); 2) data size (125 pts vs. 1000 pts). Referenced to the uncorrected forecasts.
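    For reference, a minimal sketch of the CRPS for a single ensemble forecast, using the standard kernel form E|X - y| - 0.5 E|X - X'| (names are illustrative):

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a single observation,
    via the kernel representation E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# Averaged over many forecast days this is the "CRPS Score"; the skill
# score then references it to persistence, climatology, or (here) the
# uncorrected forecasts, as in the skill-score sketch earlier.
```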

  • RMSE Skill Score Comparisons of: 1) post-processing (KNN vs. QR); 2) data size (125 pts vs. 1000 pts). Referenced to the uncorrected forecasts.
    Points: 1) degradation of the mean for small error corrections (short lead-times), especially for QR; 2) KNN appears to provide an actual forecast correction at long lead-times; 3) but KNN is less stable for small data sets.

  • Summary
    Investigated different algorithms (KNN and QR) and training data sizes (125-1000 pts) for post-processing ensemble discharge forecasts in the operational setting of the Brahmaputra river entering Bangladesh

    The forecasting system post-processes precipitation forecasting error separately from hydrologic model (and observed precipitation) error

    For the Brahmaputra catchment (with a time of concentration roughly 7 days), impact of precipitation forecasting error inconsequential out to 4 days

    Under this particular forecasting approach, both KNN and QR provide reliable (flat rank histograms) and sharp (roughly 70-80% improvements over persistence) final PDFs

    Post-processing inflates the dispersion (2nd moment) of the PDFs, which increases the sensitivity of the PDFs to degradation of their 1st moment skill

    Comparisons of KNN and QR show greater stability in the QR approach, but possibly greater sensitivity of KNN to forecasting hydrologic model errors

    Slide 9 covers point 1. Slides 14-16 show the results of comparing KNN with QR and different numbers of hindcasts to train the model with. Point 3: no results for this, but it is what we had been discussing; you can either delete this point, or discuss at the end of the talk whether it is best to partition the error post-processing between precipitation and model error separately (as is done for Bangladesh) or just calibrate at the very end of the forecasting algorithm.

    *** Can be very brief with this slide; the reference is provided in the next slide.

    Points: CFAB stands for Climate Forecasting Applications for Bangladesh (PI Peter Webster); it has provided operational flood forecasts of the Ganges and Brahmaputra Rivers for Bangladesh since 2003 (ongoing), utilizing ECMWF ensemble forecasts and NASA and NOAA satellite precipitation estimates.

    FYI more information below

    The goal of CFAB is to improve flood warning techniques. Why the flooding problem is exacerbated: India almost completely surrounds Bangladesh, and historically there has been no data sharing between the countries => no advance warning of severe flood stages until discharges reach the India-Bangladesh border (at which point Bangladesh's extensively developed hydraulic model routes water downstream, giving Dhaka ~24- to 48-hr in-advance lead-time of severe flood stages). CFAB goal: to extend this lead-time to additional days/weeks/months in advance.

    What allows us to make skillful river discharge forecasts at long lead-times (especially since forecasting precipitation in the tropics is notoriously difficult)? Large catchments (Ganges ~1,000,000 km^2, Brahmaputra ~500,000 km^2; the red line separates the catchments; combined Meghna/Brahmaputra/Ganges peak discharges can reach up to 150,000 m^3/s => ~10X larger than the historic Danube peak flows [2006] and 5X larger than the historic Mississippi peak flows [1993]) => roughly speaking, the catchments spatially and temporally integrate precipitation. We utilize ECMWF GCM output (1- to 10-day 51-member EPS weather variable forecasts and 1- to 6-month in-advance 41-member ensemble seasonal forecasts) from one of the (or arguably the) premier weather forecasting centers in the world, and a partnership with Bangladesh's Flood Forecasting Warning Centre, which sends us near-real-time border discharge measurements; we capitalize on this data source by incorporating it into our forecasting and error-correction data-assimilation schemes.

    Discharge forecast schemes went operational in 2003; focus on 2004 results in this talk (2005 and 2006 Ganges and Brahmaputra river flows remained relatively low throughout the monsoon seasons).

    Conceptual idea of catchments being able to spatially/temporally integrate. First, the temporal integration idea: say it takes 10 days for the water that falls on the top end of the catchment to reach the flow outlet point. So the discharge flowing out today would be the result of rainfall that fell 10 days ago at the top, 9 days ago a little lower down, 8 days ago a little further down still, etc., including today's rainfall that is falling right near the channel outlet. This means that if I wanted to generate, say, a 5-day discharge forecast, a lot of the discharge 5 days from now would be due to "observed" precipitation observed today, yesterday, etc., up to 5 days ago; only part (5 days' worth, if you will) of the discharge would be due to "forecasted" rainfall. In this way, loosely speaking, large catchments "temporally integrate" rainfall (inputs).

    Similarly, they "spatially integrate" in the sense that all the rainfall events that fall within the catchment will eventually (excepting evapotranspiration, dam holdings, etc.) reach the outlet. So in terms of forecast skill, if ECMWF forecasts that it will rain, say, 100 km away from where it actually will rain, as long as it is still within the catchment (and still roughly equidistant from the channel outlet) this displacement error will not greatly affect the final discharge forecast; in this way (again loosely speaking) large catchments "spatially integrate" rainfall, and this spatial integration enhances the relevant skill of the precipitation forecasts themselves.
    ** Schematic of the 1-10 day discharge forecasting sequence for the Brahmaputra and Ganges rivers (which was fully automated in 2004). Skip over the details of the scheme (more details than you need are given below; reference in the lower right corner). Points to hit: it is a multi-model automated scheme with forecasts generated independently for the Ganges and Brahmaputra Rivers, and independently for each forecast lead-time. Also, we make a precipitation correction in the white (with oval) box, along with a hydrologic model error correction in the orange (with oval) box, then convolve the hydrologic model error with the precipitation error. The other option would be to instead post-process only the final output, without doing these separate error corrections first.

    More details, FYI. Schematic of the 1-10 day discharge forecasting sequence for the Brahmaputra and Ganges rivers (which was fully automated in 2004, to the huge relief of the first author!). Red boxes show inputs to the hydrologic forecast model. From right to left, these include: updated model calibrations (from an automated background calibration process); daily discharge observations of the Ganges and Brahmaputra at the border crossings provided by the FFWC (note that their consistent availability has allowed us to implement an error forecasting and correction scheme discussed later -- orange box below); and TRMM and CMORPH satellite-derived precipitation estimates provided near-real-time by the GSFC Laboratory for Atmospheres and the NOAA Climate Prediction Center, respectively, along with daily-updated interpolated rain-gauge fields (also NOAA-CPC). These precipitation sources were incorporated into the operational forecasts in 2004, but as such represented a new technical challenge: how to incorporate different sources of precipitation inputs that may each have their own biases relative to each other. To deal with these biases, we implemented the quantile-to-quantile mapping approach that same year, represented by the clear box (with oval), which we also show in slide 5.

    The left-most red box represents the ECMWF ensemble weather forecasts; once they are received each day, the forecasting sequence is triggered and all initial conditions (soil moisture, in-stream flows) are locked in.

    The orange box accounts for all sources of uncertainty in the discharge forecast.

    These inputs feed into a separate catchment-lumped model (blue box) and a semi-distributed model (green box). Note that all steps shown below the horizontal dashed line are done independently for each day and each forecast lead-time. Both of these hydrologic models are then blended together (again, independently for each day and forecast lead-time) to form a multi-model forecast, shown in yellow. We won't say much here about this multi-model process, except that operational forecasts using 51-member ECMWF forecasts began in 2003 using only the catchment-lumped model (blue box), since it was far simpler to implement (thanks to Keith Beven for constructive advice on implementing this model in these early stages); in 2004 we brought the semi-distributed model on-line, and at that point it was natural to retain both forecast models under a multi-model framework.

    A separate algorithm is then used to both forecast and statistically account for all sources of error, shown in the orange box. This algorithm was also brought on-line in 2004 and has been tweaked over the last couple of years; KNN was the original algorithm, and we compare this with Quantile Regression in this talk.

    The last box represents the final operationally derived forecasts, which we will show results from (focusing on 2004).

    Aside: a "lumped" model means that the modeling of the discharge process (i.e. going from rainfall inputs to discharge output) treats the catchment as a single point, without regard to where the rainfall falls within it (so the rainfall inputs to the lumped model are spatially averaged over the whole catchment first, before going into the model) -- this is the simplest model one can use for discharge forecasting. "Distributed" means you are now accounting for where the rainfall/runoff occurs within the catchment itself, and for the local conditions of the watershed at those locations (i.e. how saturated the soil is, etc.).

    Further comments: each day after the ECMWF forecasts are received, it takes approximately 2 hrs for forecasts for both the Ganges and Brahmaputra to be generated (again, all done automatically). The ECMWF forecasts are updated daily (automatically sent through FTP to Georgia Tech from ECMWF -- ECMWF is under the European Union's jurisdiction, but based in Reading, UK), the TRMM estimates are updated approximately every 3 hrs, and the CMORPH and CPC rain-gauge fields are updated daily.

    Note that the forecasts are done independently for the Ganges and Brahmaputra rivers, close to the border crossing point where each river enters Bangladesh from India. Severe flooding within Bangladesh can occur in years when both rivers crest at the same time, especially affecting the regions near and downstream of their junction (see slide 2), and this in turn can affect the flow rates of these rivers as they enter the country if the backwater effects from their junction point propagate far enough upstream. Currently this is not being accounted for, and it is one weakness of our current forecast scheme.
    * Quantile-to-quantile approach: no need to cover, just FYI if needed.
    What are shown are the 51-member ECMWF precipitation ensemble members run through the hydrologic multi-model, without accounting for hydrologic modeling error, at 5- and 10-day lead-times. U-shaped rank histograms are shown on the right; therefore, we also need to account for hydrologic model errors (implicitly also accounting for observed satellite precipitation errors).

    Notes: left panels -- colored lines are the 51-member ensemble forecasts; the black solid line is the observed; the horizontal dashed line is the severe discharge level. Right panels -- red dashed lines are 95% confidence intervals (for 51-member ensembles, we would expect roughly 3 bins to fall outside of these bounds).
    * If needed, a brief discussion of the skill-score idea, since we will present skill scores in the remaining slides.
    * Should be brief with this slide. Shown in the top 2 plots are the RMSE and CRPS scores of the multi-model forecasts, running just the 51-member ECMWF ensemble through the system (which was itself quantile-to-quantile (Q2Q) corrected): black solid line, with dashed lines the 95% confidence intervals; red lines are persistence forecasts; green (dashed) lines are for a climatology forecast.

    The lower two plots show skill scores of the multi-model forecasts referenced to persistence (red) and climatology (green).

    Points: by themselves, the multi-model forecasts with just the Q2Q correction applied are skillful. Any further corrections (i.e. hydrologic model error corrections) can only improve on this starting point.
    * Total error as a function of lead-time is shown with the blue line; red shows just the hydrologic modeling (and observed precipitation) error; green shows the precipitation forcing (i.e. ECMWF) error.

    Points: 1) lots of error is due to hydrologic model errors; 2) precipitation forcing error is only significant after 5 days or so; 3) the errors do not sum linearly, i.e. they need to be added in quadrature for uncorrelated errors (i.e. green + red is NOT equal to blue).
    ** Overall goal: produce final forecasts that have flat rank histograms and are as sharp as possible.
    * Lots of details are given below. The main point is that step 1) accounts for precipitation forcing error; step 2) is where hydrologic (and observed precipitation) error is accounted for; and step 3) is where the two sources of error are convolved together. Also, shown here in step 2) is the KNN approach; QR could also be applied at this point (which we also show results for in the following slides). We also investigate (in further slides) the impact of hindcast size (shown in step 2a) on post-processing skill.

    Here we discuss a technique that we have implemented as part of our operational forecasts beginning in 2004: the idea is to make a separate estimate of the uncertainty due to weather forecast uncertainty and a separate estimate of combined initial condition and hydrologic modeling error. Note that it is not necessary that these uncertainty sources be separated to produce total uncertainty estimates of the discharge forecasts. One could instead generate hindcasts of all error sources combined, and then use these to directly estimate future forecasting errors. However, by separating out these two error sources, we can specifically estimate the effects of combined initial condition and hydrologic modeling error on each ensemble member generated in step 1, producing a more refined uncertainty estimate, and also better utilizing the useful information provided by ECMWF in the varying weather forecast ensemble dispersion. Once these two separate error estimates are made, they are then merged (convolved) to produce a final estimate of hydrologic forecasting error. We argue that making separate estimates of these sources of hydrologic forecasting error as we have done here has real utility only if one can capture large variations in these separate error estimates; otherwise, treating these error sources as lumped together is likely the most expedient and accurate. These steps are described below.

    In the first step, we estimate the discharge forecast uncertainty due to weather forecast uncertainty. To do so, the ECMWF 51-member ensemble weather forecasts are passed individually through the hydrologic multi-model, producing transformed ensembles of river discharge. These ensembles are shown schematically in step 1, where each discharge ensemble member (Qp) is equally weighted with probability 1/51.

    In the second step, the uncertainties in the initial conditions (IC; these are primarily due to uncertainties in the TRMM/CMORPH/rain-gauge rainfall estimates) and the hydrologic model errors are estimated using the KNN approach. This is done by first producing a hindcast time-series of hydrologic multi-model error. These errors are generated by driving the current calibrated model with the estimated rainfall fields only (as if they were forecasts) and differencing the forecasted discharge with the historic discharge (step 2a above). Current conditions in the watershed at the time of the forecast horizon (see figure 2a) are then compared to past conditions (using the error, its slope and curvature, the discharge, and the catchment precipitation as selector variables, and the Mahalanobis distance as the gauge). The most similar past periods are selected and given weights, and the *succeeding* error from the time-series (the 1-day succeeding error for the 1-day forecasts, the 2nd-day succeeding error for the 2-day forecasts, etc.; Fig 2a) from each period is then extracted (note that this represents a *forecasted* estimate of hydrologic error). These weighted forecasted errors then represent an empirical PDF of the combined IC and hydrologic model error (2b above). (This is represented by the red/blue/green residual strings shown in 2a, which have been appended at the point of the forecast horizon (vertical dashed line).)
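    A minimal sketch of this style of KNN analog selection, assuming a hindcast archive of selector-variable vectors and their succeeding errors; the function name, the value of k, and the inverse-distance weighting are illustrative assumptions, not the operational algorithm:

```python
import numpy as np

def knn_error_forecast(current_state, past_states, succeeding_errors, k=20):
    """Select the k past periods most similar to the current watershed state
    (Mahalanobis distance over the selector variables) and return their
    succeeding hydrologic-model errors with normalized weights.
    current_state: (n_vars,); past_states: (n_times, n_vars);
    succeeding_errors: (n_times,) error at the lead time of interest."""
    # The Mahalanobis distance standardizes the selectors and removes covariances
    cov_inv = np.linalg.pinv(np.cov(past_states, rowvar=False))
    diff = past_states - current_state
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

    nearest = np.argsort(d2)[:k]
    # Simple inverse-distance weights, normalized to sum to 1 (illustrative choice)
    w = 1.0 / (np.sqrt(d2[nearest]) + 1e-9)
    w /= w.sum()
    return succeeding_errors[nearest], w   # an empirical, weighted error PDF
```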

    The final step is to combine the forecasts of model error (step 2) with the discharge forecasts derived from precipitation uncertainty (step 1). This is done by SUBTRACTING the value of each weighted error ensemble member of step 2 from each EPS-derived discharge ensemble member (Qp) of step 1 (recall, the error ensembles of step 2 represent error FORECASTS, so we want to remove this error forecast from the final discharge forecast; in essence, the model error PDF then represents a final model error correction).

    We then end up with a new set of ensemble members of size [51 X (size of the step-2 error ensemble)], and each new member has a probability given by [(1/51) X (step-2 error ensemble weight)]. This set of ensembles represents a combined PDF of both precipitation forecast uncertainty and IC/hydrologic model error uncertainty, and it also provides an additional error correction to the forecast.
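    A minimal sketch of this combination step, continuing the hypothetical names from the KNN sketch above (again illustrative, not the operational code):

```python
import numpy as np

def convolve_forecast_with_error(discharge_members, error_members, error_weights):
    """Combine the precipitation-driven discharge members (each with
    probability 1/n_q, e.g. 1/51) with the weighted forecasted-error
    ensemble by subtracting each error member from each discharge member."""
    discharge_members = np.asarray(discharge_members, dtype=float)
    error_members = np.asarray(error_members, dtype=float)
    error_weights = np.asarray(error_weights, dtype=float)

    n_q = len(discharge_members)
    # Outer difference: shape (n_q, n_err), flattened into the combined ensemble
    combined = (discharge_members[:, None] - error_members[None, :]).ravel()
    # Each combined member's weight is (1/n_q) * (error-member weight)
    weights = (np.full(n_q, 1.0 / n_q)[:, None] * error_weights[None, :]).ravel()
    return combined, weights   # weights sum to 1 if error_weights sum to 1
```

    Quantiles of the final forecast PDF then follow from the weighted empirical distribution of the combined members.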

    Note that this process is done independently for each day and for each forecast lead-time.

    Note: the Mahalanobis distance both standardizes the selector variables and normalizes out their covariances.
    * This slide introduces QR as the underlying statistical tool we use in the calibration process. Here, I've used persistence (yesterday's discharge) as a regressor for forecasting the 90th, 50th, and 10th quantiles (upper, middle, and lower black lines, respectively) of today's discharge. The red line is the fit to the mean (fit using standard linear regression). Notice that the median (50th percentile or quantile) fit is almost identical to the mean fit. Notice, however, how QR captures the heteroscedastic behavior of the data. Also, by virtue of minimizing absolute errors, QR is less sensitive to outliers, and a by-product of the QR cost function is a flat rank histogram. To fit the median, the sum of the absolute errors is minimized (as opposed to the sum of the squared errors for linear regression); to fit the other quantiles, points above the line are preferentially weighted over points below the line: e.g. for the 90th quantile, the absolute distance of points above the line is given a weight of 0.9, while the absolute distance from the line to the points below gets a weight of 0.1.
    Top panels are 5-day 51-member ensemble forecasts for 2007: forecasts done with just the precipitation Q2Q correction (top left), after also accounting for hydrologic model error using a KNN correction (top middle), and using a QR correction (top right). Lower panels are the same, but for 10-day lead-times.
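    A minimal sketch of the quantile-regression (pinball) cost function described above, fit here by a brute-force grid search purely for illustration; the variable names, synthetic data, and fitting method are assumptions, not the operational code:

```python
import numpy as np

def pinball_loss(residuals, tau):
    """QR cost: points above the fitted line (positive residuals) get weight tau,
    points below get weight (1 - tau); tau = 0.5 recovers the absolute error."""
    return np.mean(np.where(residuals >= 0, tau * residuals,
                            (tau - 1.0) * residuals))

def fit_quantile_line(x, y, tau, slopes, intercepts):
    """Brute-force search for the (slope, intercept) minimizing the pinball loss."""
    best, best_loss = None, np.inf
    for a in slopes:
        for b in intercepts:
            loss = pinball_loss(y - (a * x + b), tau)
            if loss < best_loss:
                best, best_loss = (a, b), loss
    return best

# Example: yesterday's discharge (x) as a regressor for today's discharge (y),
# fitting the 10th, 50th, and 90th quantile lines over a coarse parameter grid.
rng = np.random.default_rng(1)
x = rng.uniform(1e3, 5e4, 500)
y = x + rng.normal(0.0, 0.1 * x)          # heteroscedastic noise, for illustration
grid_a = np.linspace(0.5, 1.5, 41)
grid_b = np.linspace(-5e3, 5e3, 41)
for tau in (0.1, 0.5, 0.9):
    print(tau, fit_quantile_line(x, y, tau, grid_a, grid_b))
```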

    Points: the dispersion has increased after KNN and QR are applied (middle and right panels) as compared to the raw forecasts (left panels); QR seems to be less jittery than KNN. These are all qualitative observations; in the next slides we provide quantitative comparisons.
    * Rank histogram results for forecasts from 2004-2007 with and without hydrologic model corrections (corresponding roughly to the time-series shown in the previous slide).

    Points: for 51-member ensembles, we would expect roughly 3 bins to fall outside of these bounds, implying that both KNN and QR provide almost perfectly calibrated resultant PDFs, with QR perhaps a bit better.
    * These slides show the CRPS skill score improvements of the hydrologic model error corrections (KNN and QR) as compared to the uncorrected raw ensemble forecasts (i.e. with just Q2Q applied to the precipitation ensembles), which serve as the reference forecast for these calculations. The 95% confidence bounds are the dashed lines.

    Points: we see 10-20% improvements in skill after post-processing, with slight further improvements (say, 5%) when processing with 1000 pts as compared to 125 pts. Confidence intervals also narrow with 1000 pts compared to 125 pts. The impact degrades with longer lead-times.

    * Same as the previous slide, but the skill measure is now the ensemble mean square error.

    Points are given at the bottom of the slide, but in addition: 1) the skill improvements shown with the CRPS (previous slide) are more focused on 2nd-moment improvements, while here we see much less improvement (if any) in the 1st moment; 2) but note that some of this degradation would be expected, because inflating the ensemble dispersion during hydrologic model post-processing increases the sensitivity of the skill measure to deviations in the mean.
