Bureau Research Report - 045
STEPS3 – ADV – VERIFICATION REPORT
Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed
September 2020
National Library of Australia Cataloguing-in-Publication entry
Authors: Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed
Title: STEPS3 – ADV – VERIFICATION REPORT
ISBN: 978-1-925738-20-9
Series: Bureau Research Report – BRR045
Enquiries should be addressed to:
Dr. Carlos Velasco-Forero:
Bureau of Meteorology
GPO Box 1289, Melbourne
Victoria 3001, Australia
Copyright and Disclaimer
© 2020 Bureau of Meteorology. To the extent permitted by law, all rights are reserved and no part of
this publication covered by copyright may be reproduced or copied in any form or by any means except
with the written permission of the Bureau of Meteorology.
The Bureau of Meteorology advises that the information contained in this publication comprises general
statements based on scientific research. The reader is advised and needs to be aware that such
information may be incomplete or unable to be used in any specific situation. No reliance or actions
must therefore be made on that information without seeking prior expert professional, scientific and
technical advice. To the extent permitted by law, the Bureau of Meteorology (including each of its
employees and consultants) excludes all liability to any person for any consequences, including but not
limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly
from using this publication (in part or in whole) and any information or material contained in it.
Contents
1. Executive Summary
2. Short description of STEPS
3. Methodology
3.1 User needs
3.2 STEPS3
3.3 Datasets
4. Rainfall Ensembles and Probabilistic Verification
4.1 Qualitative evaluation of forecast mean areal rainfall
4.2 Qualitative evaluation of forecast rainfall fields
4.3 Probabilistic Verification
4.3.1 Root Mean Square Error – Ensemble Spread
4.3.2 Continuous Ranked Probability Score (CRPS)
4.3.3 Relative Operating Characteristic (ROC)
4.3.4 Rank (Talagrand) Histogram
4.3.5 Reliability (Attribute) Diagrams
4.4 Performance based on different number of ensemble members
4.5 Comparison with existing operational system, STEPS1-ADV
4.6 Comparison with pySTEPS
5. Operational configuration for STEPS3-ADV
6. Conclusions
7. References
8. Acknowledgements
9. Appendix
List of Figures
Figure 1. Indicative locations of all Australian weather radars (green), with radars used in this study pinpointed in red. Dashed squares correspond with the approximate extents of radar data used in this experiment (square regions of 256 km per side).
Figure 2. Time series of mean and maximum rainfall rate for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Mean rainfall rate is shown as black lines, with maximum values per time step in red and interquartile range in shaded blue.
Figure 3. Time series of Wet Area Ratio (WAR) for a threshold of 1 mm/hr (red) and ratio of mean rainfall rates and standard deviation (blue) for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). High values of WAR correspond with widespread rainfall, with values close to zero corresponding to isolated cells of high-intensity rain rates. The ratio of mean and standard deviation shows a high correlation with WAR and may be used as an alternative score.
Figure 4. 5-min accumulated radar rainfall fields for time steps with maximum Wet Area Ratio (left), highest maximum rainfall (centre) and maximum mean rainfall (right) from event 5 of Brisbane radar (ID 66). Each field covers 256 x 256 km with a pixel size of 0.5 km.
Figure 5. Time series of domain-wide mean (observed and forecast) rainfall for the first half of event 5 of Brisbane radar. Values in black correspond with the mean of 60-min accumulated rainfall fields calculated from 5-min accumulated radar data every 5 minutes. Multi-coloured lines correspond with the mean of forecast rainfall fields for each member of a selection of 96-member STEPS3-ADV rainfall ensembles.
Figure 6. Observed and forecast mean rainfall values for selected STEPS3-ADV rainfall ensembles. Values are 60-min accumulated rainfall fields. Forecasts correspond to 96-member rainfall ensembles calculated using observed radar rainfall at 2020-02-05 13:00UTC (left), 2020-02-06 00:00UTC (centre) and 2020-02-06 19:00UTC (right) over the Brisbane (Mt. Stapylton) radar.
Figure 7. 60-min accumulated rainfall fields for Brisbane radar at 2020-02-05 14:00UTC, that correspond with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 13:00UTC, and six individual members from the same rainfall ensemble.
Figure 8. As Figure 7 but at 2020-02-06 01:00UTC. In this case, rainfall ensemble was calculated at 2020-02-06 00:00UTC.
Figure 9. As Figure 7 but at 2020-02-06 20:00UTC. In this case, forecast rainfall ensemble was calculated at 2020-02-06 19:00UTC.
Figure 10. 5-min rainfall fields for Brisbane radar at 2020-02-05 21:55UTC, that correspond with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 21:45UTC, and six individual members from the same forecast rainfall ensemble.
Figure 11. 5-min forecast rainfall fields for member 23 from a 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 21:45UTC for Brisbane radar. Rainfall fields correspond with (from top to bottom, and left to right) 5-, 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80- and 90-min lead times. Size of rainfall fields is 256 x 256 km at 0.5 km resolution.
Figure 12. As Figure 11, but 5-min forecast rainfall fields are here extracted from member 43. Size of rainfall fields is 256 x 256 km at 0.5 km resolution.
Figure 13. Same as in Figure 11, but 5-min forecast rainfall fields depict here a close-up area of 100 x 100 km centred at Brisbane city.
Figure 14. Same as Figure 12, but 5-min forecast rainfall fields depict here close-up areas of 100 x 100 km centred at Brisbane city.
Figure 15. RMS error vs spread of STEPS3-ADV 60-min rainfall ensemble forecasts for a selection of rainfall events. a) Brisbane radar (ID 66), event 05; b) Melbourne radar (ID 02), event 08; c) Sydney (Terrey Hills) radar (ID 71), event 09; d) Perth (Serpentine) radar (ID 70), event 04; e) Canberra radar (ID 40), event 01; and f) Cairns radar (ID 19), event 03.
Figure 16. Distribution of the RMSE (left) and Ensemble Spread (right) based on 100 rainfall events for lead times from 60 to 90 minutes.
Figure 17. Distribution of CRPS values of STEPS3-ADV rainfall forecasts for 100 rainfall events, per lead time.
Figure 18. ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). ROCs correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.
Figure 19. Mean and spread of ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for 100 events. Data correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) for a lead time of 60 minutes.
Figure 20. ROC area results for 100 rainfall events and multiple rainfall thresholds (mm), grouped by lead time (minutes).
Figure 21. Rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Histograms correspond to lead times of a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.
Figure 22. Distribution of rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for 100 rainfall events. Histograms correspond to a lead time of 60 minutes.
Figure 23. Reliability diagrams for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond with a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm). Dashed horizontal lines show the climatological frequency for the given threshold, and the dotted lines midway between the 1:1 diagonal line and the horizontal denote "no skill" relative to climatology. Shaded regions show the areas where ensembles have good reliability and therefore skill. Bar charts below each diagram show the number of times each probability value was predicted.
Figure 24. Reliability diagrams for 100 rainfall events for multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times (60, 70, 80 and 90 minutes). Lines correspond to the mean of the results for the 100 events and shaded areas depict the 5–95 percentiles for the indicated lead times and rainfall thresholds. Background corresponds with climatology areas of positive skill for a lead time of 60 min.
Figure 25. ROCs for STEPS3-ADV 60-min accumulated rainfall ensembles with different ensemble members for (top row) T+60, (centre) T+70 and (bottom row) T+80 for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
Figure 26. ROC areas for different number of members in a STEPS3-ADV rainfall ensemble for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond with 60-min accumulated rainfall ensembles for event 05 of Brisbane radar.
Figure 27. Evolution of CRPS values over domain for different number of ensemble members. Results correspond with 60-min accumulated rainfall ensembles for event 05 of Brisbane radar (ID 66).
Figure 28. Comparison of ROC area distribution for 60-minute lead time based on STEPS1-ADV and STEPS3-ADV.
Figure 29. Comparison of reliability plots for 60-, 70-, 80- and 90-minute lead times considering all 100 events based on STEPS1-ADV and STEPS3-ADV. Results are based on 24-member ensembles. Background corresponds with climatology areas of positive skill for a lead time of 60 min.
Figure 30. Comparison of CRPS based on STEPS3-ADV and STEPS1-ADV. Results are based on 24-member ensembles for all 100 events.
Figure 31. Comparison of RMSE (left panel) and ensemble spread (right panel) for STEPS3-ADV (orange) and STEPS1-ADV (blue).
Figure 32. Comparison of ROCs for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS (blue line) using 24 members for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5) for a lead time of 60 min.
Figure 33. ROC areas for STEPS3-ADV, STEPS1-ADV and pySTEPS using 24 rainfall ensembles for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond with 60-min accumulated rainfall ensembles for event 05 of Brisbane radar.
Figure 34. Comparison of reliability diagrams for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS (blue line) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond with a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, and left to right, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm).
Figure 35. Comparison of RMSE (left) and ensemble spread (right) based on STEPS3-ADV (green), STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
Figure 36. Evolution of CRPS values over domain based on STEPS3-ADV (green), STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
Figure 37. ROC area results for 10 rainfall events per radar and multiple rainfall thresholds (mm) for the 60-minute lead time. Colours indicate different rainfall thresholds.
STEPS3 – ADV – Verification Report
Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed
Radar Science and Nowcasting Team
Research Program, Science and Innovation Group
Australian Bureau of Meteorology
03 September 2020
1. EXECUTIVE SUMMARY
A new version of the Short-Term Ensemble Prediction System (STEPS) (Seed et al., 2003) is
under development as part of the Public Service Transformation (PST) program in the Bureau of
Meteorology. The main goals of this development are 1) to reduce the computational time
required to produce rainfall ensembles from minutes (the average production time in the current
operational implementation, known as STEPS1) to seconds, 2) to improve the quality of rainfall
ensemble forecasts, 3) to extend the current coverage of the service to most of Australia and 4) to
use the latest programming techniques so that the new implementation can be deployed in
cloud-based systems.
The focus of this report is to assess the quality of the rainfall ensemble forecasts and define
operational rules and configurations that fulfil most of the stakeholders' requirements, in this case,
60-min accumulated rainfall fields for lead times in the range of 60 to 90 minutes. This report
assesses a new STEPS implementation that produces rainfall ensemble forecasts using weather
radar data only (henceforth referred to as STEPS3-ADV). STEPS implementations that use
Numerical Weather Prediction (NWP) rainfall forecasts jointly with weather radar data to
generate rainfall ensembles are not within the scope of this work and will be assessed in future
reports.
In this experiment, a total of 47,057 individual 5-min radar rainfall fields (equivalent to more than
163 days of rain) were analysed, using the Bureau’s operational Rainfields (Seed et al., 2008)
datasets from 10 radars across Australia. For each of these radar fields, a 96-member rainfall
ensemble was calculated for lead times of up to 90 minutes.
This extensive verification exercise shows that STEPS3-ADV can generate reliable ensemble
rainfall forecasts in a large variety of rainfall conditions, with quality comparable to available
open-source alternatives while delivering results up to 15 times faster. Compared with the current
operational version, STEPS3-ADV performs better on all scores analysed in this report
and can deliver results up to 30 times faster, showing strong potential for use as an operational
system in the Bureau.
STEPS3-ADV rainfall forecasts are suitable for predicting the probability of occurrence
of hourly rainfall accumulations for thresholds in the range of 0.2 to 50 mm at lead times of
60 to 90 minutes. However, users should apply some caution for thresholds above 20 mm
until additional datasets with a larger number of occurrences of
rainfall in that range have been incorporated into the verification.
STEPS3-ADV ensembles seem to be under-dispersive and additional spread may be required to
improve the accuracy of the rainfall ensembles. However, expected errors are small with at least
75% of the case studies showing mean error values lower than 0.80 mm. Additionally, an
assessment of the influence of the number of ensemble members on the quality of the
rainfall forecasts was carried out, finding that about 48 members may be needed to accurately
forecast the probability of the 50 mm accumulation during extreme events. Finally, it is important
to note that there is significant variation in the quality of the predictions: verification results
vary from radar to radar and from event to event, depending on the nature of the radar, event,
accumulation threshold, and lead time.
2. SHORT DESCRIPTION OF STEPS
The Short-Term Ensemble Prediction System (STEPS) method uses a multiplicative cascade
scale decomposition approach for generating high-resolution ensembles of short-term rainfall
forecasts (nowcasts) from radar observations (Seed, 2003; Bowler et al., 2006). The main goal of
STEPS is to generate ensembles of rainfall forecasts that exhibit similar space-time structures to
those of observed rainfall over a range of space and time scales. Originally, this system blended
an advection forecast from radar observations with a noise model possessing the space-time
properties of observed rain fields (Bowler et al., 2004, 2006). This method has since been
extended to allow radar and numerical weather prediction (NWP) forecasts to be blended (Seed
et al., 2013).
The current operational implementation of STEPS in the Bureau of Meteorology consists of two
product types. The first uses 5-min radar rainfall estimates from a single radar to generate
10-member ensembles of 5-min rainfall forecasts up to 90 minutes ahead. These products have a
spatial resolution of 0.5 x 0.5 km on a 256 x 256 km domain centred at the radar location and are
available for the radars at Adelaide, Melbourne, Sydney, and Brisbane only. The second product
type combines 10-min multi-radar rainfall estimates with 10-min ACCESS-C NWP rainfall
forecasts (Bureau National Operations Centre Operations, Bulletin Number 114, 2018) to create
10-member ensembles of 10-min rainfall forecasts up to 12 hours ahead, for seven regions across
Australia. These combined radar-NWP rainfall ensembles have a spatial resolution of 1 x 1 km
and cover a domain of 512 x 512 km. An updated implementation of STEPS is required to fulfil
current user requirements, given the increased demand for nowcasting products across the country
(see next section), and to incorporate the latest computational techniques that allow the use of
modern architectures such as cloud computing.
3. METHODOLOGY
3.1 User needs
Currently, the Australian weather radar network consists of 58 radars, and the number is expected
to increase to 70 radars in the next decade. Stakeholders have expressed interest in having
ensemble nowcasts of 5-min rainfall for each of the radars in the Australian network at a high
spatial resolution (1 km or less). End-users in the Public Weather and Public Safety teams in the
Bureau were consulted about which products are required for nowcasts of rainfall accumulations,
and consensus was reached that a forecast of the rainfall accumulation in the next hour, and
the probability that it would exceed a number of thresholds, were required. Rainfall ensembles
generated by STEPS must be of sufficient quality to accurately predict the chance of
both light rainfall (<1 mm/hr) for public weather applications and heavy rainfall (50 mm/hr) for
warning applications.
3.2 STEPS3
STEPS3 is a completely new implementation which has been built with the goal of providing low
cost, high performance and reliable operational nowcasting services for large scale radar
networks.
Scientific improvements to the STEPS algorithm have been realized in several important areas.
The decomposition filters have been redesigned to increase spectral isolation of cascade levels
while reducing ringing artefacts. An alternative optical flow technique has been adopted that
provides superior tracking of the rain fields in areas of low texture. Also, the process parameters
of the autoregressive models that drive stochastic evolution of the nowcast have been made
spatially varying. This change ensures that localized scaling characteristics are retained during
the life of the nowcast rather than becoming statistically homogeneous.
As a piece of software, the engineering priorities for STEPS3 are performance, reliability and
suitability for operational deployment. The code base is highly parallelized, allowing the
generation of large nowcast ensembles with very low latency. While STEPS3 remains highly
configurable, significant effort has been made to ease the setup burden on users by optimizing
and tuning the model during development. This was facilitated through the use of a Continuous
Integration server that automatically evaluated every proposed change by generating and
verifying over 2,000 ensembles against a reference dataset of real-world scenarios.
Finally, STEPS3 can be deployed as a cloud-based application, allowing it to scale dynamically as
needed based on weather conditions, minimizing costs without placing demands on internal IT
resources. It is hoped that cloud-based deployment will be a first step towards providing easily
accessible high-quality radar-based nowcasting as a service to a broad range of users.
The new STEPS3 implementation that produces rainfall ensemble forecasts using weather radar
data only is henceforth referred to as STEPS3-ADV. STEPS3-ADV is designed to
generate ensembles of rainfall forecasts for all radars in the Bureau of Meteorology’s weather
radar network across Australia. The new STEPS3-ADV system can generate ensembles of 5-min
rainfall nowcasts up to 90 minutes ahead, the corresponding 60-min accumulations, and the
probability that a range of thresholds will be exceeded in the next hour.
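As an illustration of the last product, the probability of exceeding a threshold can be estimated at each pixel as the fraction of ensemble members above that threshold. The following is a minimal sketch under stated assumptions, not the STEPS3-ADV implementation: the function name exceedance_probability, the nested-list field layout and the strict ">" comparison are all hypothetical choices made for the example.

```python
from typing import List, Sequence

Field = List[List[float]]  # 60-min rainfall accumulation (mm) on a 2-D grid


def exceedance_probability(members: Sequence[Field], threshold_mm: float) -> Field:
    """Return, per pixel, the fraction of members whose accumulation exceeds the threshold."""
    n_members = len(members)
    n_rows, n_cols = len(members[0]), len(members[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for field in members:
        for i in range(n_rows):
            for j in range(n_cols):
                # Assumption: "exceeded" is read as strictly greater than.
                if field[i][j] > threshold_mm:
                    counts[i][j] += 1
    return [[c / n_members for c in row] for row in counts]


# Toy 4-member ensemble on a 1 x 2 grid (values in mm).
members = [[[0.0, 6.0]], [[0.3, 4.0]], [[0.1, 7.5]], [[0.5, 5.0]]]
print(exceedance_probability(members, 0.2))  # [[0.5, 1.0]]
print(exceedance_probability(members, 5.0))  # [[0.0, 0.5]]
```

With a 96-member ensemble the same calculation resolves probabilities in steps of 1/96, which is one reason ensemble size matters for rare, high thresholds.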
3.3 Datasets
To analyse the quality of STEPS3-ADV rainfall ensembles under different weather conditions,
several rainfall events were selected for each of 10 weather radars located around Australia.
Radars were selected either for their location near capital cities (8 radars) or for their
exposure to extreme rainfall events (2 radars). The selected radars are listed in Table 1 and
their indicative locations are shown in Figure 1.
A 6-month period from 1 October 2019 to 31 March 2020 was used to identify significant
rainfall events for all radars. This period corresponds with the warm season in
Australia, so the results may be influenced by the absence of cool-season events in
this analysis. The rainfall product chosen is the 5-min calibrated radar rainfall accumulation
generated in real time by the Bureau’s operational Rainfields system (Seed et al., 2008). This
calibrated radar rainfall product is obtained using a series of quality control measures including
removal of ground and sea clutter, interference, anomalous propagation, second trip and bright
band contamination and partial beam blockages. The cleaned reflectivity is then converted to
a surface rainfall map by first estimating the reflectivity at the earth's surface using
three-dimensional interpolation, then converting these reflectivity values into rainfall estimates at
ground level based on static Z-R relationships, and finally correcting gauge/radar bias using near
real-time rain gauge information.
Radar ID  Name                             Type            Latitude (° S)  Longitude (° E)
2         Melbourne (VIC)                  S-band DualPol  37.86           144.76
19        Cairns/Saddle Mountain (QLD)     C-band Doppler  16.82           145.68
40        Canberra/Captain's Flat (ACT)    S-band Doppler  35.66           149.51
63        Darwin - Berrimah (NT)           C-band Doppler  12.46           130.93
64        Adelaide/Buckland Park (SA)      S-band DualPol  34.617          138.469
66        Brisbane/Mt. Stapylton (QLD)     S-band DualPol  27.718          153.24
70        Perth/Serpentine (WA)            C-band Doppler  32.39           115.87
71        Sydney/Terrey Hills (NSW)        S-band Doppler  33.70           151.21
76        Hobart/Mt. Koonya (TAS)          C-band Doppler  43.11           147.81
78        Weipa Airport (QLD)              C-band Doppler  12.67           141.92
Table 1. List of selected weather radars and their main characteristics.
Figure 1. Indicative locations of all Australian weather radars (green), with radars used in this study
pinpointed in red. Dashed squares correspond with the approximate extents of radar data used in this
experiment (square regions of 256 km per side).
For each radar, rainfall events were identified using the Wet Area Ratio (WAR), defined as the
fraction of the rainfall field above 1 mm/h at each 5-min time step. Events were required to
fulfil the following three criteria:
• Minimum WAR in the event: 0.01
• Minimum storm duration: 3 hr
• Maximum gap with no rain within a storm: 1 hr
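One way to read these three criteria is as a scan over a 5-min WAR time series. The sketch below is only an interpretation, not the procedure used in the report: the function name find_rain_events is hypothetical, a time step is treated as wet when its WAR reaches 0.01, a dry gap longer than 1 hr closes the current event, and duration is measured from the first to the last wet step.

```python
from typing import List, Sequence, Tuple


def find_rain_events(
    war_series: Sequence[float],
    step_min: int = 5,
    min_war: float = 0.01,
    min_duration_hr: float = 3.0,
    max_gap_hr: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) of each event in a regularly sampled WAR series."""
    max_gap_steps = round(max_gap_hr * 60 / step_min)
    min_steps = round(min_duration_hr * 60 / step_min)
    events = []
    start = None      # index of the first wet step of the current event
    last_wet = None   # index of the most recent wet step
    for i, war in enumerate(war_series):
        if war < min_war:
            continue
        if start is None:
            start = i
        elif i - last_wet - 1 > max_gap_steps:
            # The dry gap was too long: close the previous event and start a new one.
            if last_wet - start + 1 >= min_steps:
                events.append((start, last_wet))
            start = i
        last_wet = i
    if start is not None and last_wet - start + 1 >= min_steps:
        events.append((start, last_wet))
    return events


# 40 wet steps, a 30-min gap, 10 wet steps, a 100-min gap, then 10 wet steps.
war = [0.05] * 40 + [0.0] * 6 + [0.05] * 10 + [0.0] * 20 + [0.05] * 10
print(find_rain_events(war))  # [(0, 55)]
```

In this toy series the 30-min gap is bridged, so the first two wet spells form one event of 56 steps (280 minutes), while the final 50-minute spell is discarded as shorter than 3 hr.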
Table 2 shows the total number of rainfall events identified using the selection criteria above
and the duration of the longest rainfall event for each of the 10 radars.
Radar ID  Name                             No. of rainfall events  Longest duration (hr)
2         Melbourne (VIC)                  45                      66.08
19        Cairns (QLD)                     95                      106.16
40        Canberra (ACT)                   80                      108.83
63        Darwin - Berrimah (NT)           128                     111.58
64        Adelaide/Buckland Park (SA)      45                      42.41
66        Brisbane/Mt. Stapylton (QLD)     63                      168
70        Perth/Serpentine (WA)            57                      70.58
71        Sydney/Terrey Hills (NSW)        70                      104.41
76        Hobart/Mt. Koonya (TAS)          81                      90
78        Weipa Airport (QLD)              122                     114.25
Table 2. Total number of identified rainfall events per radar and duration of the longest event.
Among the 786 rainfall events identified, 10 rainfall events that showed different types of
precipitation and rain field evolution were handpicked for each of the 10 radars and later
used in the verification analyses of the STEPS3-ADV implementation.
A total of 47,057 individual 5-min radar rainfall fields (equivalent to more than 163 days of rain)
were analysed in this experiment. The main characteristics of the selected 100 rainfall events are
summarized per radar in the Appendix.
Examples of mean areal rainfall time series and typical rainfall fields for some selected rainfall
events are shown next. Figure 2 shows the temporal evolution of mean areal rainfall rate (black),
maximum rain rate (red) and interquartile range (blue shading) for the longest event in the
archive (event 5 of Brisbane radar [ID 66]). Time steps with high maximum values but low mean
areal rainfall rates indicate intense convective cells travelling through the area of analysis, while
time steps with high mean areal values and moderate maximum values usually correspond with
widespread rainfall events with embedded high-intensity cells. Time series of the Wet Area Ratio
and of the ratio of mean and standard deviation of rain rates for the same rainfall event are shown
in Figure 3. Note the strong correlation between the two ratios.
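The per-time-step statistics discussed above can be sketched for a single rain-rate field as follows. This is an illustrative example only: the function name field_summary is hypothetical, the ratio is assumed to be the mean divided by the standard deviation, and both are assumed to be taken over all pixels, including dry ones.

```python
from statistics import fmean, pstdev
from typing import Dict, List


def field_summary(rain_rate: List[List[float]], wet_threshold: float = 1.0) -> Dict[str, float]:
    """Summarise one rain-rate field (mm/h) with the statistics plotted in Figures 2 and 3."""
    values = [v for row in rain_rate for v in row]
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation over all pixels
    return {
        "mean": mean,
        "max": max(values),
        # Wet Area Ratio: fraction of pixels above the wet threshold.
        "war": sum(v > wet_threshold for v in values) / len(values),
        # Undefined for a perfectly uniform field, so return NaN in that case.
        "mean_std_ratio": mean / std if std > 0 else float("nan"),
    }


# Toy 2 x 4 field: half the pixels are dry, half carry 2 to 8 mm/h.
field = [[0.0, 0.0, 0.0, 0.0], [2.0, 4.0, 6.0, 8.0]]
summary = field_summary(field)
print(summary["mean"], summary["max"], summary["war"])  # 2.5 8.0 0.5
```

Under these assumptions, a field dominated by a few intense cells has a small WAR and a standard deviation that is large relative to its mean, so both ratios fall together, which is consistent with the correlation noted in the text.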
5-min accumulated rainfall fields for the time steps with the maximum observed mean, highest
observed maximum value and maximum wet area ratio for event 5 over the Brisbane radar are
shown in Figure 4. The adaptive scale-based scheme used by STEPS3-ADV allows ensembles of
rainfall forecasts to be generated from this diverse range of observed rainfall fields.
Plots were generated for all 100 events but are not included in this report for brevity.
Figure 2. Time series of mean and maximum rainfall rate for Brisbane radar (ID 66) from 2020-02-04
19:00UTC to 2020-02-10 19:00UTC (event 5). Mean rainfall rate is shown as black lines, with maximum
values per time step in red and interquartile range in shaded blue.
Figure 3. Time series of Wet Area Ratio (WAR) for a threshold of 1 mm/hr (red) and ratio of mean rainfall
rates and standard deviation (blue) for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10
19:00UTC (event 5). High values of WAR correspond with widespread rainfall, with values close to zero
corresponding to isolated cells of high-intensity rain rates. The ratio of mean and standard deviation shows
a high correlation with WAR and may be used as an alternative score.
Figure 4. 5-min accumulated radar rainfall fields for time steps with maximum Wet Area Ratio (left), highest
maximum rainfall (centre) and maximum mean rainfall (right) from event 5 of Brisbane radar (ID 66). Extent of
areas is 256 x 256 km with pixel size of 0.5 km.
4. RAINFALL ENSEMBLES AND PROBABILISTIC VERIFICATION
5-min rainfall ensemble forecasts were calculated every 5 minutes up to 90 minutes after the
observation time using each one of the time steps of the 100 selected rainfall events. A total of 96
members were calculated for each lead time providing an opportunity to build robust statistical
scores and test the impact of ensemble size on the performance and accuracy of the forecast rainfall
fields. Simulations were undertaken with the assistance of resources and services from the
National Computational Infrastructure (NCI), which is supported by the Australian Government,
using supercomputer "GADI".
STEPS3-ADV stakeholders are mainly interested in the chance of rainfall for the next hour after
the observation time, and therefore the verification analyses in this report use only 60-min
accumulated rainfall fields.
Observed 60-min accumulated rainfall fields were calculated every 5 minutes by adding the
twelve 5-min accumulated rainfall fields up to the accumulation time. 60-min
accumulated rainfall fields were calculated only if all twelve 5-min accumulated
rainfall fields in the period were available in the archive. The same accumulation scheme was
applied to each member of the STEPS3-ADV rainfall ensemble forecasts, and therefore a 96-member
ensemble of hourly rainfall forecasts was calculated by adding 5-min rainfall forecasts.
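The accumulation rule described above (twelve consecutive 5-min fields, and no hourly value when any of them is missing) can be sketched as below. The function name, the array layout (time, y, x) and the use of all-NaN fields to mark missing time steps are assumptions made for illustration only.

```python
import numpy as np

def hourly_accumulations(fields_5min, n_steps=12):
    """Rolling 60-min accumulation of 5-min fields (mm).

    fields_5min: array (time, y, x); a missing time step is an all-NaN field.
    Returns an array of the same shape where index t holds the sum of the
    n_steps fields ending at t, or NaN when any of them is unavailable.
    """
    fields_5min = np.asarray(fields_5min, dtype=float)
    n_times = fields_5min.shape[0]
    out = np.full(fields_5min.shape, np.nan)
    available = ~np.all(np.isnan(fields_5min), axis=(1, 2))
    for t in range(n_steps - 1, n_times):
        window = slice(t - n_steps + 1, t + 1)
        if available[window].all():  # all twelve fields must be present
            out[t] = fields_5min[window].sum(axis=0)
    return out

# Synthetic example: 15 time steps of 1 mm per 5 min, with step 13 missing
fields = np.ones((15, 2, 2))
fields[13] = np.nan
acc = hourly_accumulations(fields)
```

The same function could be applied member by member to a forecast ensemble, which mirrors the statement that the observed and forecast fields share one accumulation scheme.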
Verification analyses were made for each rainfall event and then individual verification results
were concatenated to increase the number of samples and obtain statistically significant results
for each radar. Additionally, verification results from multiple radars were combined to
understand the overall performance of STEPS3-ADV.
4.1 Qualitative evaluation of forecast mean areal rainfall
The following figures show examples of STEPS3-ADV rainfall ensemble forecasts compared
with observed mean rainfall for the first half of event 5 of the Brisbane radar using 60-min
accumulated rainfall fields.
Figure 5 shows mean observed rainfall values for the whole Brisbane radar domain (black) and a
selection of domain wide mean forecast rainfall values for each member of STEPS3-ADV
ensembles (each member in a different colour) calculated using observed rainfall at different
times. Observed mean rainfall corresponds with the 60-min accumulation ending at the marked
time. As the original dataset has a 5-min time step, 60-min accumulations can also be estimated every
5 minutes for lead times in the range of 60 to 90 minutes. This figure shows how the
forecast mean rainfall of the members of the STEPS3-ADV rainfall forecast ensemble evolves
around the observed mean rainfall, and that the spread of the ensemble members seems to vary depending
on the calculation time.
Figure 5. Time series of domain wide mean (observed and forecast) rainfall for first half of event 5 of Brisbane
radar. Values in black correspond with the mean of 60-min accumulated rainfall fields calculated from 5-min
accumulated radar data every 5 minutes. Multi-coloured lines correspond with the mean of forecast rainfall
fields for each member of a selection of 96-member STEPS3-ADV rainfall ensembles.
To facilitate comparisons, Figure 6 shows detailed versions of some of the ensemble results
presented in Figure 5. In addition to domain wide mean rainfall forecast for each of the members
of the ensemble, red lines in Figure 6 represent the domain wide mean rainfall of the whole
ensemble, and the base times used to calculate the 60-min rainfall forecasts ensembles are
highlighted in red. Results come from 96-member ensembles that were calculated using observed
rainfall at a) 2020-02-05 13:00UTC, b) 2020-02-06 00:00UTC and c) 2020-02-06 19:00UTC.
Note that as the comparison is done with 60-min accumulations for both observed and forecast
rainfall fields, the first forecasts are only available one hour after the base time of the ensemble;
from then on, accumulated forecasts are calculated every 5 minutes until the end of the forecast (90
minutes).
For all cases in Figure 6, the ensemble mean areal rainfall remains almost the same for all lead
times, which is the expected behaviour of the STEPS methodology. The mean areal rainfall values
calculated for each of the ensemble members spread well around the observed domain wide
mean rainfall, with the spread increasing with lead time and changing depending on the
conditions at the base time. For example, in the first case (Figure 6 (left), 2020-02-05 13:00UTC)
the mean areal rainfall of ensemble members is scattered over about 1.5 mm for the first lead time and
extends to about 2.5 mm for the 90-min lead time. In the last case (Figure 6 (right), 2020-02-06
19:00UTC), the mean areal rainfall of the ensemble members scatters over about 4 mm for the 60-min
lead time and expands to almost 5 mm for the 90-min lead time.
It is important to note, however, that STEPS3-ADV rainfall ensembles may not show this
spread around the observed mean in other conditions. For example, in cases with strong changes
in the mean rainfall, either abrupt rises or decays, the assumption that the mean rainfall must
remain the same for the duration of the forecast may not be applicable and ensemble forecasts
may spread away from the observed mean rainfall values. This is the main reason for limiting the forecast
duration of radar-only STEPS forecasts to 90 minutes, and it motivates the use of additional data
sources (such as NWP) to blend with radar that may better estimate the evolution of the mean
rainfall in the forecast area for longer periods.
Figure 6. Observed and forecast mean rainfall values for selected STEPS3-ADV rainfall ensembles. Values
are 60-min accumulated rainfall fields. Forecasts correspond to 96-member rainfall ensembles calculated
using observed radar rainfall at 2020-02-05 13:00UTC (left), 2020-02-06 00:00UTC (centre) and
2020-02-06 19:00UTC (right) over the Brisbane (Mt. Stapylton) Radar.
4.2 Qualitative evaluation of forecast rainfall fields
Previous figures have compared the STEPS3-ADV rainfall ensembles with observed rainfall only
in terms of their mean areal rainfall values. Next, some examples of 60-min rainfall fields are
presented in the form of “postage stamps” to illustrate spatial similarities between observed and
forecast ensemble rainfall fields. Each figure includes, for one given time step, the estimated
60-min accumulated radar rainfall (‘true’ rainfall) (top left) and, from the STEPS3-ADV rainfall
ensemble calculated one hour earlier (lead time 60 minutes), the 60-min ensemble mean rainfall
forecast (second stamp in the top row) and six 60-min rainfall forecast fields that correspond with
six different members of the 96 members available. Rainfall fields from single members in Figure
7, Figure 8 and Figure 9 generally show good agreement with the rainfall estimated by radar at
the large scale, while providing clear alternatives at medium and small scales. In other words,
large areas of rainfall predicted by the members of the ensemble usually cover the same areas as
the observed accumulation, and local areas of heavy intensities are usually predicted in
locations similar to those of intense rain in the observed field. The ensemble mean rainfall fields are smoother
than the individual member fields, clearly highlighting those areas where rainfall is most likely
to occur, with values to be considered by users as the expected (most likely) value for a given
location, lead time and time step. However, the ensemble mean rainfall does not provide information about
possible extreme values for a given location and lead time, which may be useful in some cases for
some users (such as in extreme weather).
Figure 7. 60-min accumulated rainfall fields for Brisbane radar at 2020-02-05 14:00UTC, that correspond
with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall
forecast ensemble calculated at 2020-02-05 13:00UTC, and six individual members from the same rainfall
ensemble.
Figure 8. As Figure 7 but at 2020-02-06 01:00UTC. In this case, rainfall ensemble was calculated at 2020-
02-06 00:00UTC.
Figure 9. As Figure 7 but at 2020-02-06 20:00UTC. In this case, forecast rainfall ensemble was calculated
at 2020-02-06 19:00UTC.
It is worth noting again that the 60-min forecast fields presented in the previous figures correspond
with accumulations of ‘native’ 5-min rainfall forecasts produced by STEPS3-ADV. Next,
examples of 5-min rainfall forecasts from STEPS3-ADV are presented and some of their characteristics
are discussed. Figure 10 shows an example of 5-min rainfall observations and forecasts for
Brisbane radar at 2020-02-05 21:55UTC where ensemble mean and members correspond with a
lead time of 10 minutes (i.e., the forecast rainfall ensemble was calculated using radar rainfall data
observed at 2020-02-05 21:45UTC). Figure 10 is a clear example of the diversity of forecast
rainfall fields produced by STEPS3-ADV at 5-min time steps, where large-scale features are
preserved in all the members of the ensemble and, additionally, each member is enriched with
numerous, different, spatially coherent medium- and small-scale features. In this way, forecast
rainfall fields match large features and highlight areas with high intensities that were later observed by the
weather radar.
In addition to these spatially correlated features, STEPS3-ADV also provides a temporal connection
between different lead times for each of the ensemble members. For one given member, forecast
rainfall fields evolve in time in the same way that observed rainfall fields do. It is important to
note that each member evolves in time differently to other members of the same ensemble. For
example, Figure 11 shows the temporal evolution of the rainfall fields labelled as
member 23 of the STEPS3-ADV ensemble calculated at 2020-02-05 21:45UTC for increasing lead
times from 5 minutes to 90 minutes. In this case, as the forecast moves to longer lead times, the
overall rainy area decays, with high intensity cells persisting across the region for the
whole duration of the forecast while their locations and intensities change slowly. Alternative
scenarios can be found by selecting other members in the same ensemble, such as the one presented in
Figure 12 for member 43. Forecast rainfall fields from member 43 indicate that after 30 minutes
the high intensity cells will mostly disappear from the western half of the region, with overall
rainfall areas heavily reduced as well.
This diversity in STEPS3-ADV rainfall forecasts becomes more relevant when smaller regions are
examined. Figure 13 and Figure 14 show close-ups of 100 x 100 km around Brisbane City for the same
forecast rainfall fields shown in Figure 11 and Figure 12, respectively. It could be argued that this
may be the working scale for many hydrological models and large flash-flood warning
systems. Variability at kilometre and sub-kilometre scales is quite remarkable in this case. For
example, member 23 predicts a band of high-intensity cells crossing North Stradbroke Island from
north to south while increasing in intensity during the first 30 minutes of the forecast period, and
then limiting its development to the eastern half of the domain for the rest of the forecast period.
On the other hand, member 43 predicts the arrival of a band of rainfall cells of high, though lower, intensity
during the first 15 minutes of the forecast period, which quickly decays and reduces to a narrow
area of intense precipitation localised about Gold Coast city. Both members agree strongly in
forecasting the occurrence of high-intensity precipitation areas around Brisbane City (top third, centre
of the images) within the first 15 minutes of the forecast period.
Figure 10. 5-min rainfall fields for Brisbane radar at 2020-02-05 21:55UTC, that correspond with (top row,
from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast
ensemble calculated at 2020-02-05 21:45UTC, and six individual members from the same forecast rainfall
ensemble.
Figure 11. 5-min forecast rainfall fields for member 23 from a 96-member STEPS3-ADV rainfall forecast
ensemble calculated at 2020-02-05 21:45UTC for Brisbane radar. Rainfall fields correspond with (from top
to bottom, and left to right) 5-, 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80- and 90-min lead times. Size of rainfall
fields is 256 x 256 km at 0.5 km resolution.
Figure 12. As Figure 11, but 5-min forecast rainfall fields are here extracted from member 43. Size of rainfall
fields is 256 x 256 km at 0.5 km resolution.
Figure 13. Same as in Figure 11, but 5-min forecast rainfall fields depict here a close-up area of 100x100
km centred at Brisbane city.
Figure 14. Same as Figure 12, but 5-min forecast rainfall fields depict here close-up areas of 100x100 km
centred at Brisbane city.
4.3 Probabilistic Verification
In order to evaluate the quality of STEPS3-ADV rainfall ensembles, the following thresholds for
rainfall in mm/hr were used: 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, and 50.0. The first threshold
identifies the chance of any rainfall; the next four thresholds assess the ability of STEPS3-ADV
to properly identify the chance of light rain, while the last four thresholds address the goal
of properly identifying significant and very intense rainfall amounts.
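As an illustration of how an ensemble is turned into a probability forecast for each of these thresholds, the sketch below counts the fraction of members at or above a threshold at every pixel. The function name and the inclusive comparison are assumptions; the report does not describe its implementation.

```python
import numpy as np

# Rainfall thresholds listed in the text (60-min accumulation, mm)
THRESHOLDS_MM = [0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, 50.0]

def exceedance_probability(ensemble, threshold):
    """Per-pixel fraction of members at or above `threshold`.

    ensemble: array (member, y, x) of 60-min accumulations (mm).
    """
    return np.mean(ensemble >= threshold, axis=0)

# Synthetic 4-member ensemble over a 1 x 2 pixel domain, for illustration only
ens = np.array([[[0.0, 6.0]],
                [[0.3, 0.1]],
                [[1.2, 12.0]],
                [[0.0, 7.0]]])
prob_any_rain = exceedance_probability(ens, THRESHOLDS_MM[0])   # 0.2 mm
prob_5mm = exceedance_probability(ens, 5.0)
```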
An ensemble forecast must have at least the following characteristics to be considered useful:
• enough spread in the ensemble where the forecast values adequately represent the
uncertainty of the forecasts,
• enough reliability where the predicted probabilities of an event correspond to their
observed frequencies, and
• enough skill to forecast extreme values (probabilities near 0 or 100 %) rather than values
clustered around the mean.
In this study, different aspects of forecast quality are characterized by evaluating the root-mean-
square error (RMSE), the ensemble spread, the continuous ranked probability score (CRPS), the
Reliability curve, the Relative Operating Characteristic (ROC) curve, and the Rank histogram.
4.3.1 Root Mean Square Error – Ensemble Spread
One of the more common ways to assess the accuracy of rainfall forecasts consists of plotting,
as a function of lead time, both the root-mean-square error (RMSE) of the ensemble mean and
the average spread of the ensemble. Palmer et al. (2006) showed that in a 'perfect ensemble' the
mean of the spread should be equal to the RMSE over the same period.
RMSE is the square root of the average squared error of the forecasts and has the same units
as the forecasts and observations. The lower the RMSE, the better the ensemble. On the other
hand, the spread of the ensemble is calculated, in some cases, as the square root of the average
ensemble variance, and more commonly as the average of the ensemble standard deviation. However, Fortin et
al. (2014) proved that only the first option is correct.
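The two quantities compared in a Spread-RMSE diagram can be sketched as follows, with the spread computed as the square root of the average ensemble variance (the first option above). The function name, the array layout and the use of the unbiased variance estimator (`ddof=1`) are assumptions for illustration.

```python
import numpy as np

def rmse_and_spread(ensemble, observed):
    """RMSE of the ensemble mean and ensemble spread for one lead time.

    ensemble: array (member, y, x); observed: array (y, x).
    Spread is the square root of the mean ensemble variance, not the
    mean of the ensemble standard deviations.
    """
    valid = np.isfinite(observed) & np.all(np.isfinite(ensemble), axis=0)
    ens_mean = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((ens_mean[valid] - observed[valid]) ** 2))
    variance = ensemble.var(axis=0, ddof=1)  # assumed unbiased estimator
    spread = np.sqrt(np.mean(variance[valid]))
    return float(rmse), float(spread)

# Synthetic 2-member ensemble over a 1 x 2 pixel domain, for illustration only
ensemble = np.array([[[1.0, 2.0]],
                     [[3.0, 6.0]]])
observed = np.array([[2.0, 2.0]])
rmse, spread = rmse_and_spread(ensemble, observed)
```

A spread below the RMSE indicates an under-dispersive ensemble and a spread above it an over-dispersive one.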
Figure 15 shows some examples of RMSE and Spread values of STEPS3-ADV rainfall ensembles
for a selection of rainfall events for multiple lead times. In these Spread-RMSE diagrams, an
under-dispersive ensemble (i.e., an ensemble that needs more spread) will have the spread
values sitting below the RMSE values, while spread points sitting above RMSE values represent
an over-dispersive ensemble (i.e., ensemble spread is greater than the RMS error). Results for
STEPS3-ADV ensembles indicate that RMSE increases with lead time, as expected, with ensembles
being slightly under-dispersive. Additional spread or additional reduction in error may be required
to improve the accuracy of the rainfall ensembles. This overall behaviour seems to persist among
all radars and, interestingly, the level of under-dispersion does not change significantly for the
different lead times assessed here (60 to 90 minutes). Figure 16 shows the mean and dispersion of
RMSE and ensemble spread values for the full set of 100 rainfall events analysed, confirming the
increase of RMSE with lead time and an under-dispersive behaviour of about 20% for all lead
times. Additional trials and adjustments to the algorithm may be explored to reduce this
under-dispersion (such as increasing the variance of the perturbations in the diagnosed field advection
vectors).
Figure 15. RMS error vs spread of STEPS3-ADV 60-min rainfall ensemble forecasts for a selection of rainfall
events. a) Brisbane radar (ID 66), event 05; b) Melbourne radar (ID 02), event 08; c) Sydney (Terrey Hills)
radar (ID 71), event 09; d) Perth (Serpentine) radar (ID 70), event 04; e) Canberra radar (ID 40), event 01;
and f) Cairns radar (ID 19), event 03.
Figure 16. Distribution of the RMSE (left) and Ensemble Spread (right) based on 100 rainfall events for lead
times from 60 to 90 minutes
4.3.2 Continuous Ranked Probability Score (CRPS)
The continuous ranked probability score (CRPS) (Hersbach, 2000) is a summary statistic
comparing the forecast cumulative distribution with the corresponding distribution from the
observations. The mean CRPS is the mean of the CRPS values calculated for all forecasts. The
smaller the CRPS values the better, because the differences between forecast and observed
probability distributions are smaller. The CRPS is expressed in the same unit as the observed
variable. The CRPS generalizes the mean absolute error (MAE) to probabilistic forecasts: it reduces to
the MAE if the forecast is deterministic.
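For a finite ensemble treated as an empirical distribution, the CRPS at one location can be written as the mean absolute difference between members and the observation, minus half the mean absolute difference between all pairs of members. The sketch below uses that standard identity; it is an illustration only, the function name is hypothetical, and it is not necessarily the estimator used in this study.

```python
import numpy as np

def ensemble_crps(members, observation):
    """CRPS of an ensemble treated as an empirical distribution.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are drawn
    independently from the ensemble members and y is the observation.
    """
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - observation))
    term_pair = np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term_obs - 0.5 * term_pair)

# A single-member (deterministic) forecast reduces to the absolute error
crps_deterministic = ensemble_crps([2.0], 5.0)
# Two members bracketing the observation
crps_two_members = ensemble_crps([1.0, 3.0], 2.0)
```

Averaging this value over all pixels and time steps gives a mean CRPS in mm, comparable in form to the values summarized per lead time.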
CRPS values were calculated for all rainfall events and for lead times in the range of 60 to 90
minutes. Figure 17 shows the distribution of the CRPS values for all events. Mean CRPS values
are small for all lead times with interquartile ranges varying between 0.3 and 0.8 mm. As
expected, CRPS values degrade (increase) as lead time increases but remain on average in the
same range, with at least 75% of the cases showing CRPS values lower than 0.80 mm.
Figure 17. Distribution of CRPS values of STEPS3-ADV rainfall forecasts for 100 rainfall events, per lead
time.
4.3.3 Relative Operating Characteristic (ROC)
The Relative Operating Characteristic (ROC) measures the ability of the forecast to discriminate
between two alternative outcomes, therefore measuring resolution. For any event, a graph known
as the ROC curve can be constructed to offer information on the expected hit rates and false alarm
rates from using different probability thresholds to initiate action. ROC curves can be used to
identify the probability threshold that provides the best trade-off between hit rate and false alarm
rate for a given decision. When the hit rates exceed the false alarm rates, the forecast is skilful.
The closer the curve is to the top left corner of the plot, the more skilful the forecast. A perfect score
is obtained if the curve travels from the bottom left to the top left of the diagram and then across to the top right of
the diagram. A forecast that has no skill will have hit rates equal to the false alarm rates and the
ROC curve will be positioned along the diagonal.
The areas under ROC curves (AUC) are commonly used to compare the usefulness of different
scenarios, where a greater area (closer to 1) means a more useful scenario. In general, an AUC of
0.5 suggests no ability of discrimination, 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is
considered excellent, and more than 0.9 is considered outstanding (Hosmer et al., 2013).
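A ROC curve and its area can be sketched from forecast probabilities and binary observations as shown below. The function names, the choice of one probability level per possible member count, and the trapezoidal integration are assumptions for illustration; the code also assumes that the sample contains both events and non-events.

```python
import numpy as np

def roc_points(forecast_prob, observed_event, n_members=96):
    """Hit rate and false alarm rate for each probability decision level.

    forecast_prob: fraction of members exceeding the rainfall threshold.
    observed_event: True where the observation exceeded the threshold.
    """
    prob = np.ravel(forecast_prob)
    event = np.ravel(observed_event).astype(bool)
    # Levels 0, 1/n, ..., 1, plus one level that is never reached -> point (0, 0)
    levels = np.append(np.arange(n_members + 1) / n_members, np.inf)
    hits = np.array([np.sum((prob >= p) & event) for p in levels])
    false_alarms = np.array([np.sum((prob >= p) & ~event) for p in levels])
    return false_alarms / (~event).sum(), hits / event.sum()

def roc_area(false_alarm_rate, hit_rate):
    """Area under the ROC curve by the trapezoidal rule."""
    x = false_alarm_rate[::-1]  # reorder from (0, 0) to (1, 1)
    y = hit_rate[::-1]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Perfect discrimination and no discrimination, synthetic 4-member examples
far, hr = roc_points([0.0, 0.25, 0.75, 1.0], [False, False, True, True], n_members=4)
auc_perfect = roc_area(far, hr)
far, hr = roc_points([0.5, 0.5, 0.5, 0.5], [False, True, False, True], n_members=4)
auc_no_skill = roc_area(far, hr)
```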
Figure 18. ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane
radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). ROCs correspond with multiple
rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times a) 60 minutes, b)
70 minutes, c) 80 minutes and d) 90 minutes.
Figure 18 shows an example of ROC curves for multiple lead times and rainfall thresholds for
STEPS3-ADV 96-member rainfall forecasts corresponding to event 5 of Brisbane radar. ROC curves
for all thresholds are typical of a system that allows good discrimination of events, with curves
close to the top left corner of the diagram. As expected, the longer the lead time, the less able the system is
to identify events, but in this case the ROC curves show that the system is still useful to identify
events up to 50 mm in an hour for a lead time of 90 minutes. Although ROC curves change from event
to event and from radar to radar, the behaviour shown in Figure 18 seems to be representative of
other rainfall events, as shown in Figure 19 where the mean ROC and spread for 100 events for a
lead time of 60 minutes are presented for multiple thresholds.
Figure 19. Mean and spread of ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall
ensembles for 100 events. Data correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10,
20 and 50 mm) for a lead time of 60 minutes.
Overall results can be seen in Figure 20, which summarizes the ROC areas for all 100
events analysed in this study for multiple lead times and thresholds. These results indicate that
STEPS3-ADV can effectively identify the occurrence of rainfall events using rainfall thresholds
up to 10 mm in an hour for all lead times, and up to 20 mm for lead times of 60 or 70 minutes.
Unfortunately, STEPS3-ADV seems to be unable to properly identify the occurrence of rainfall
events of 50 mm in an hour for the longer lead times, although for a lead time of 60 minutes it was
able to provide useful advice for about half of the events. This is mainly due to the limited number
of time steps having 50 mm in an hour. Note that there were only 38 events where at least one
of the observed rainfall values exceeded the threshold of 50 mm.
Figure 20. ROC area results for 100 rainfall events and multiple rainfall thresholds (mm), grouped by Lead
Time (minutes)
4.3.4 Rank (Talagrand) Histogram
A more detailed way of analysing the ensemble spread is to construct a rank histogram or
Talagrand diagram (Talagrand et al., 1997). The Talagrand diagram is the histogram of
frequencies of the rank of the observed data within the forecast ensemble.
In a good ensemble forecast system, all members should have equal ability to capture the
observations, thus the observed dataset should be distributed among the ensemble members
uniformly and the Talagrand diagram would be flat. If the ensemble spread is too large, the rank
histogram is ∩-shaped indicating that many observations are falling near the centre of the
ensemble; on the contrary, if ensemble spread is too small and therefore many observations are
falling outside the extremes of the ensemble, the rank histogram is ∪-shaped. If rank histogram
shows an asymmetric shape, that indicates the presence of bias in the ensemble.
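A rank histogram can be sketched by counting, for each observation, how many ensemble members fall below it. Rainfall fields contain many exact zeros, so ties between members and the observation are broken at random here; that tie-breaking rule, the function name and the array layout are assumptions, not a description of the method used in this study.

```python
import numpy as np

def rank_histogram(ensemble, observed, rng=None):
    """Counts of the rank of each observation within its ensemble.

    ensemble: array (member, n_samples); observed: array (n_samples,).
    Returns member + 1 counts; ties are resolved with a random rank.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_members = ensemble.shape[0]
    below = np.sum(ensemble < observed, axis=0)
    ties = np.sum(ensemble == observed, axis=0)
    ranks = below + rng.integers(0, ties + 1)  # upper bound is exclusive
    return np.bincount(ranks, minlength=n_members + 1)

# Synthetic 3-member ensemble with three observations, for illustration only
ensemble = np.array([[1.0, 1.0, 1.0],
                     [2.0, 2.0, 2.0],
                     [3.0, 3.0, 3.0]])
observed = np.array([0.5, 2.5, 3.5])
counts = rank_histogram(ensemble, observed)
```

In this toy case one observation lies below all members, one between the second and third member, and one above all members.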
Figure 21 shows rank histograms for lead times from 60 to 90 minutes for STEPS3-ADV 96-member
rainfall ensembles for event 05 from the Brisbane radar. All histograms show an
asymmetric shape, with the first and last ranks accounting for more observations than the other
ranks. This shape may indicate the presence of a bias in the ensemble; in this case, rainfall
forecasts tend to be lower than observations. This shape may also result from the initiation of new
raining areas not being captured by STEPS. Note that STEPS requires some rain to have
been detected by the radar to be able to nowcast rain rates for the following time steps. If the radar
has not detected any rainfall, the nowcast values will be zero. Therefore, STEPS will underestimate
forecast rainfall in cases where the radar has not detected any rain but rapidly moving rain bands
enter the radar umbrella, or when convective rainfall is initiated by orographic enhancement or
coastal effects. Once again, rank histograms vary from radar to radar and from event to event, but
these ones can be considered typical. Figure 22 summarizes rank histogram results, showing the
distribution of rank frequencies for all 100 events analysed in this study for the 60-minute lead
time. The overall behaviour matches that of the single event described earlier: the summary rank
histogram is clearly asymmetric, with the ensembles tending to be lower than observations.
Figure 21. Rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for
Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Histograms
correspond to lead times of a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.
Figure 22. Distribution of rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall
ensembles for 100 rainfall events. Histograms correspond to a lead time of 60 minutes.
4.3.5 Reliability (Attribute) Diagrams
The reliability or attribute diagram measures how well the predicted probabilities of an event
correspond to their observed frequencies (reliability). In this diagram, the observed frequency is
plotted against forecast probability for all probability categories; a line
close to the diagonal indicates good reliability. A deviation below the diagonal represents forecast probabilities that are too
high (the ensemble is predicting higher chances than the observed frequency). On the other hand, a
deviation above the diagonal indicates forecast probabilities that are too low, i.e., the ensemble is
predicting less chance than the observed frequency. The flatter the curve in the diagram, the less
resolution and reliability it has.
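The points of a reliability diagram can be sketched by binning the forecast probabilities and computing the observed frequency in each bin. The function name and the choice of ten equal-width bins are assumptions for illustration; the report does not state the binning that was used.

```python
import numpy as np

def reliability_curve(forecast_prob, observed_event, n_bins=10):
    """Mean forecast probability, observed frequency and count per bin.

    Bins without any forecast return NaN for both curve coordinates.
    """
    prob = np.ravel(forecast_prob).astype(float)
    event = np.ravel(observed_event).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # A probability of exactly 1.0 is placed in the last bin
    bin_index = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_index, minlength=n_bins)
    sum_prob = np.bincount(bin_index, weights=prob, minlength=n_bins)
    sum_event = np.bincount(bin_index, weights=event, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_prob = sum_prob / counts
        obs_freq = sum_event / counts
    return mean_prob, obs_freq, counts

# Synthetic sharp and reliable forecasts, for illustration only
mean_prob, obs_freq, counts = reliability_curve(
    [0.05, 0.05, 0.95, 0.95], [False, False, True, True])
```

The `counts` array corresponds to the bar charts of forecast frequency that usually accompany a reliability diagram, and the mean of `observed_event` gives the sample climatological frequency.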
Figure 23 shows an example of reliability diagrams for a lead time of 60 minutes and multiple rainfall
thresholds for STEPS3-ADV 96-member rainfall forecasts corresponding to event 5 of
Brisbane radar. Dashed horizontal lines show the climatological frequency for the given
threshold, and the dotted lines midway between the diagonal line and the horizontal denote "no
skill" relative to climatology. Shaded regions highlight areas where an ensemble has good
reliability and therefore skill. The bar charts below each diagram show the number of times each
probability value was predicted.
In this case, the reliability plots seem to indicate that the ensembles adequately predict the extreme
probabilities but deviate from the expected values in the middle probabilities for the
lowest rainfall threshold. However, results for rainfall thresholds in the range of 0.4 to 20 mm
seem to have good reliability over the whole range of probabilities for this lead time. Reliabilities
for the 50 mm threshold are clearly inadequate and seem to be affected by the limited number of
observed values reaching 50 mm in an hour, which prevents a valid comparison.
It is important to note that, according to the probability histograms, STEPS3-ADV does not
forecast the middle probabilities very often, which is another indication that the ensembles are
very ‘sharp’. STEPS3-ADV ensembles seem to be very confident in saying that a rainfall event
above a given threshold 'definitely won't happen' or 'definitely will happen', but they seem to have
reduced reliability when forecasting middle probabilities. These results are consistent with the high ROC
area values observed for this event, discussed earlier and displayed in Figure 18.
As the reliability of a rainfall ensemble changes from rain event to rain event, from radar to radar and
from lead time to lead time, the behaviour shown in Figure 23 may not be representative of
the whole archive. In order to identify an overall result, Figure 24 summarizes reliability diagrams
for the 100 events analysed in this experiment. Each line in the figure aggregates all values
of observed relative frequency at each forecast probability bin and displays the mean and a
confidence interval (5th to 95th percentiles) using coloured shaded areas.
In an attempt to summarize areas with positive skill for all events, grey-scaled backgrounds have
been added to each of the reliability plots in Figure 24. In the case of a single event, blue
shaded areas were used to indicate areas of positive skill (as shown in Figure 23). These areas
of positive skill change from event to event as they are based on the climatological frequency of
the rainfall event. As 100 events are summarized here, the larger the number of events for which a
forecast probability/observed frequency pair has positive skill, the darker the background
becomes. If the pair has positive skill for all events (such as all the points along the 1:1
line), the background for that pair is the darkest. On the other hand, if the pair has positive skill
for only a few events or none, that pair is barely shaded. Backgrounds in Figure 24
correspond with these climatology areas for a lead time of 60 minutes.
Aggregated results show that STEPS3-ADV ensembles have good reliability for rainfall
thresholds up to 5 mm/hr for all the lead times assessed in this study. Reliabilities seem to have
positive skill for the 10 mm/hr threshold up to the 80-min lead time, and for 20 mm/hr up to the 60-min lead time
only, with reliability decaying for longer lead times. Reliabilities for the 50 mm/hr threshold seem
to be on average inadequate for all of the lead times analysed, although a lack of sufficient
observed data in this range of precipitation to allow a robust score for this threshold may be
the cause of these results. These results are consistent with those shown in Figure 23 for
the longest event in the dataset and also with the high ROC area values discussed earlier and
displayed in Figure 20.
Figure 23. Reliability diagrams for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for
Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond
with a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0,
10, 20 and 50 mm). Dashed horizontal lines show the climatological frequency for the given threshold, and
the dotted lines midway between the 1:1 diagonal line and the horizontal denote "no skill" relative to
climatology. Shaded regions show the areas where ensembles have good reliability and therefore skill. Bar
charts below each diagram show the number of times each probability value was predicted.
Figure 24. Reliability diagrams for 100 rainfall events for multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0,
5.0, 10, 20 and 50 mm) and different Lead Times (60, 70, 80 and 90 minutes). Lines correspond to the mean
of the results for 100 events and shaded areas depict the 5th to 95th percentiles for the indicated lead times and rainfall
thresholds. Background corresponds with climatology areas of positive skill for a lead time of 60 minutes.
4.4 Performance based on different number of ensemble members
To understand the performance of forecast based on different number of ensemble members, the
longest event in the dataset (Radar 66, Event 5) was further verified with the different number of
ensemble members having 48, 24, 12 and 6 in addition to the original 96 members. This analysis
provides insight into the optimum number of ensemble members for reliable probabilistic
forecasts of rain/no rain and of the probability of extreme/high rainfall intensity.
Figure 25 compares the ROC curves obtained using 48, 24 and 12 members for lead times from 60
to 80 minutes. ROCs in this figure can be directly compared with the ROC curves based on 96
members shown in Figure 18. Overall, the area under the ROC curve decreases as the number of
ensemble members is reduced. This decrease in ROC area seems to be more prominent at higher
rainfall thresholds (e.g. 20 and 50 mm/hr).
Figure 25. ROCs for STEPS3-ADV 60-min accumulated rainfall ensembles with different numbers of ensemble
members for (top row) T+60, (centre) T+70 and (bottom row) T+80 for Brisbane radar (ID 66) from
2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
The area under the ROC curve for ensembles with different numbers of members is presented in
Figure 26. Results show that, as expected, the larger the number of ensemble members, the
higher the area under the ROC curve and the more useful the ensemble. However, for lower
rainfall thresholds (from 0.2 up to 5 mm), the ROC areas remain close to the baseline derived
from the 96-member ensemble for ensembles with as few as 12 members. For the higher rainfall
thresholds (10 to 50 mm), however, ROC areas obtained using fewer ensemble members are
significantly lower than the baseline. This means ensembles with as few as 12 members may be
enough for predicting rain/no rain for the lower rainfall intensities (up to 5 mm) for all lead
times; for the higher-intensity rainfall, however, a larger number of ensemble members is
required to provide useful predictions for lead times longer than 70 minutes. Although this
result is based on a single event, a similar trend in the area under the ROC curve is expected
for other radars and rainfall events.
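The dependence of ROC area on ensemble size can be explored with a small sketch such as the one below. It is an illustration only, not the verification code used for this report: the function name `roc_area`, the use of "at or above" to define an event, and the synthetic rainfall data are all assumptions. The forecast probability at each point is taken as the fraction of members at or above the threshold, and one ROC point is produced for each rule "warn if at least k members forecast the event".

```python
import random


def roc_area(ensemble, observed, threshold):
    """Area under the ROC curve for the event 'rainfall >= threshold'.

    ensemble: list of members, each a list of forecasts (one per verification point).
    observed: list of observations at the same points.
    """
    n_members = len(ensemble)
    # Number of members forecasting the event at each point (0..n_members).
    counts = [sum(member[i] >= threshold for member in ensemble)
              for i in range(len(observed))]
    event = [obs >= threshold for obs in observed]
    n_event = sum(event)
    n_non_event = len(event) - n_event
    if n_event == 0 or n_non_event == 0:
        return float("nan")  # ROC is undefined unless both outcomes occur

    # One ROC point per rule "warn if at least k members exceed the threshold",
    # from k = n_members + 1 (never warn) down to k = 0 (always warn).
    points = []
    for k in range(n_members + 1, -1, -1):
        hits = sum(c >= k for c, e in zip(counts, event) if e)
        false_alarms = sum(c >= k for c, e in zip(counts, event) if not e)
        points.append((false_alarms / n_non_event, hits / n_event))

    # Trapezoidal integration of hit rate over false alarm rate.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical synthetic data: skewed "observed" rainfall and noisy members.
random.seed(0)
truth = [random.gammavariate(0.5, 4.0) for _ in range(2000)]
members = [[t * random.lognormvariate(0.0, 0.8) for t in truth] for _ in range(96)]
for n in (96, 48, 24, 12, 6):
    print(n, round(roc_area(members[:n], truth, 10.0), 3))
```

With fewer members the forecast probability can take fewer distinct values, so the ROC curve has fewer points, which is one reason the area tends to shrink for rare, high-threshold events.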
Figure 26. ROC areas for different numbers of members in a STEPS3-ADV rainfall ensemble for multiple
lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond to
60-min accumulated rainfall ensembles for event 5 of Brisbane radar (ID 66).
CRPS results using ensembles with different numbers of members for the same event are
presented in Figure 27. Results seem to confirm the conclusions drawn from the ROC areas. The
higher the number of members, the lower the CRPS values and therefore the more accurate the
ensemble. As expected, when the number of members is reduced, higher CRPS values are
obtained, indicating that the ensembles are less accurate. CRPS values are lower than
0.9 mm/hr for ensembles with more than 24 members; however, this value increases above
1.0 mm/hr for an ensemble with just 6 members, indicating larger errors and therefore lower
performance when the number of members is heavily reduced.
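For readers unfamiliar with the score, the sketch below shows one common way to estimate the CRPS of an ensemble at a single point, using the empirical distribution of the members. It is a minimal illustration under stated assumptions: the function names `crps_ensemble` and `mean_crps` are hypothetical, and the estimator may differ in detail from the one used to produce Figure 27.

```python
def crps_ensemble(members, observation):
    """CRPS of the empirical distribution of `members` against one observation.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are
    drawn independently from the ensemble. Lower values are better; for a
    single member the score reduces to the absolute error.
    """
    m = len(members)
    mean_abs_error = sum(abs(x - observation) for x in members) / m
    mean_abs_diff = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return mean_abs_error - 0.5 * mean_abs_diff


def mean_crps(ensemble_by_point, observed):
    """Average CRPS over verification points (for example, pixels of the domain)."""
    scores = [crps_ensemble(m, o) for m, o in zip(ensemble_by_point, observed)]
    return sum(scores) / len(scores)


# Hypothetical example: four members (mm/hr) verified against 1.0 mm/hr.
print(crps_ensemble([0.0, 0.4, 1.2, 3.5], 1.0))
```

The score has the same units as the forecast variable, which is why the CRPS values in this report are quoted in rainfall units.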
Figure 27. Evolution of CRPS values over the domain for different numbers of ensemble members. Results
correspond to 60-min accumulated rainfall ensembles for event 5 of Brisbane radar (ID 66).
4.5 Comparison with existing operational system, STEPS1-ADV
This section compares STEPS3-ADV with the existing operational system (henceforth referred to
as STEPS1-ADV). STEPS1-ADV is used operationally at the Bureau of Meteorology, where it
generates 10 ensemble members for four radars. Table 3 shows the average computation times
for STEPS3-ADV and STEPS1-ADV using 24-core machines on NCI's supercomputer GADI. Recorded
durations include the total computation time for all processes, from reading data and
creating ensembles to writing results back to disk. STEPS3-ADV is about 30 times more
computationally efficient than the existing STEPS1-ADV (32.5 times for 96 members and 30
times for 24 members). One of the main reasons for this improvement is that STEPS3-ADV was
designed to use multiple cores and threads, making more efficient use of available resources,
while STEPS1-ADV is a single-core implementation.
Table 3. Average computation times (in seconds) for STEPS3-ADV and STEPS1-ADV using 24 cores
Ensemble members | STEPS3-ADV (s) | STEPS1-ADV (s)
96 | 24 | 780
24 | 12 | 360
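The speed-up factors implied by Table 3 can be checked with a few lines of arithmetic. The sketch below assumes that "computational efficiency" means the ratio of average wall-clock times on the same 24-core machine; the names `TABLE_3` and `speedup` are illustrative only.

```python
# Average computation times in seconds, copied from Table 3.
TABLE_3 = {
    96: {"STEPS3-ADV": 24, "STEPS1-ADV": 780},
    24: {"STEPS3-ADV": 12, "STEPS1-ADV": 360},
}


def speedup(times, fast="STEPS3-ADV", slow="STEPS1-ADV"):
    """Ratio of slow to fast computation time for each ensemble size."""
    return {n_members: row[slow] / row[fast] for n_members, row in times.items()}


print(speedup(TABLE_3))  # {96: 32.5, 24: 30.0}
```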
Because STEPS1-ADV takes significant time to generate 96 members, rainfall ensembles with
only 24 members were calculated with STEPS1-ADV for all 100 rainfall events to reduce the
computational time. Those 24-member rainfall ensembles are compared in this section with the
rainfall ensembles calculated using STEPS3-ADV (reduced to 24 members). Figure 28 compares
the distribution of the ROC area at the 60-minute lead time, by threshold and radar, for
both versions of STEPS-ADV. The ROC area for the rainfall ensembles calculated by STEPS3-ADV
is clearly higher for all thresholds and radars. The variability of the ROC area is also
lower for STEPS3-ADV than for the existing operational STEPS1-ADV.
Figure 28. Comparison of ROC Area distribution for 60-minute lead time based on STEPS1-ADV and
STEPS3-ADV
Similarly, Figure 29 compares the reliability plots for STEPS3-ADV and STEPS1-ADV at 60-,
70-, 80- and 90-minute lead times considering all 100 rainfall events. STEPS3-ADV shows a
clear improvement for all rainfall thresholds. For rainfall thresholds from 0.2 to 5.0 mm,
reliability curves produced from STEPS3-ADV rainfall ensembles are close to the 1:1 line,
indicating superior reliability compared with STEPS1-ADV rainfall ensembles, which predict
significantly lower chances than the observed frequencies for the lower probabilities. For
higher rainfall thresholds (10 and 20 mm/hr), STEPS3-ADV provided adequately reliable results
for lead times up to 60 minutes, performing better than STEPS1-ADV, which was unable to
provide reliable results for the same rainfall thresholds.
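As an illustration of how the points of a reliability curve can be derived, the sketch below bins forecast probabilities and compares the mean forecast probability in each bin with the observed frequency of the event. The function name `reliability_curve`, the equal-width binning and the example numbers are assumptions made for this sketch; they are not taken from the verification code behind Figure 29.

```python
def reliability_curve(probabilities, occurred, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) per bin.

    probabilities: forecast probabilities in [0, 1], for example the fraction
                   of ensemble members at or above a rainfall threshold.
    occurred:      matching booleans, True where the event was observed.
    Empty bins are omitted from the result.
    """
    totals = [0] * n_bins
    hits = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, o in zip(probabilities, occurred):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        totals[b] += 1
        hits[b] += bool(o)
        prob_sums[b] += p
    return [(prob_sums[b] / totals[b], hits[b] / totals[b], totals[b])
            for b in range(n_bins) if totals[b] > 0]


# Hypothetical forecasts: points above the 1:1 line (observed frequency higher
# than forecast probability) indicate that the event was under-forecast.
forecast_p = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6, 0.9, 0.9]
observed = [False, True, False, True, True, True, True, True]
for mean_p, freq, count in reliability_curve(forecast_p, observed, n_bins=5):
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  n = {count}")
```

The per-bin counts correspond to the bar charts shown below each reliability diagram in this report, which give the number of times each probability value was predicted.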
Figure 30 compares the CRPS distributions of rainfall ensembles calculated by STEPS3-ADV and
STEPS1-ADV for 100 events. The mean CRPS for all lead times is lower for STEPS3-ADV than for
STEPS1-ADV, indicating better performance; however, STEPS3-ADV showed higher outlier values
of CRPS than STEPS1-ADV.
Figure 31 compares the RMSE and ensemble spread of STEPS3-ADV and STEPS1-ADV for 100 events.
Overall, RMSE values for all lead times (60 to 90 minutes) are lower for STEPS3-ADV than for
STEPS1-ADV, which indicates that the new STEPS3-ADV is more accurate. STEPS3-ADV also shows
higher values of ensemble spread than STEPS1-ADV, which indicates that STEPS3-ADV produces
rainfall ensembles that are less under-dispersive than the current ones.
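The comparison between RMSE and spread can be sketched as follows. This is a minimal illustration that assumes one common pair of definitions: the RMSE of the ensemble mean, and the spread as the square root of the average ensemble variance. The function name `rmse_and_spread` and the example values are hypothetical, and the report's own computation may differ in detail.

```python
import math
from statistics import fmean, variance


def rmse_and_spread(ensemble_by_point, observed):
    """Return (RMSE of the ensemble mean, ensemble spread).

    ensemble_by_point: one list of member values per verification point.
    observed:          the observation at each point.
    """
    squared_errors = []
    variances = []
    for members, obs in zip(ensemble_by_point, observed):
        squared_errors.append((fmean(members) - obs) ** 2)
        variances.append(variance(members))  # sample variance (divides by n - 1)
    return math.sqrt(fmean(squared_errors)), math.sqrt(fmean(variances))


# Hypothetical values: a spread well below the RMSE suggests under-dispersion.
rmse, spread = rmse_and_spread([[0.8, 1.0, 1.2], [2.9, 3.0, 3.1]], [2.0, 4.5])
print(f"RMSE = {rmse:.2f}, spread = {spread:.2f}")
```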
[Figure 29 panel titles: STEPS3-ADV, STEPS1-ADV]
Figure 29. Comparison of reliability plots for 60-, 70-, 80- and 90-minute lead times considering all 100 events,
based on STEPS1-ADV and STEPS3-ADV. Results are based on 24-member ensembles. The background
corresponds to the climatology areas of positive skill for a lead time of 60 min.
Figure 30. Comparison of CRPS based on STEPS3-ADV and STEPS1-ADV. Results are based on 24-
member ensembles for all 100 events.
Figure 31. Comparison of RMSE (left panel) and ensemble spread (right panel) for STEPS3-ADV (orange)
and STEPS1-ADV (blue)
4.6 Comparison with pySTEPS
A few open-source tools are used by the broad scientific community involved in rainfall
nowcasting, mainly for research purposes. It is therefore worthwhile to compare the
performance of at least one of those research-based tools with the latest STEPS3-ADV to gauge
its operational and scientific value. This section compares the performance of STEPS3-ADV and
pySTEPS in producing rainfall forecast ensembles for lead times from 60 to 90 minutes. pySTEPS
(Pulkkinen et al., 2019) is an open-source Python library for probabilistic precipitation
nowcasting based on the STEPS methodology, written mainly for research purposes. For the sake
of simplicity and to reduce the computation time, only 24-member ensembles for the longest
rainfall event in the database were considered for this analysis. Table 4 shows the average
computation times for STEPS3-ADV and pySTEPS using a 24-core machine on NCI's supercomputer
GADI; these include all processes, from reading data and creating ensembles to writing results
back to disk. STEPS3-ADV is at least 15 times more computationally efficient than pySTEPS
(15 times for 96 members and 20 times for 24 members).
Table 4. Average computation times (in seconds) for STEPS3-ADV and pySTEPS using 24 cores
Ensemble members | STEPS3-ADV (s) | pySTEPS (s)
96 | 24 | 360
24 | 12 | 240
Comparisons between the performance and quality of rainfall forecasts of STEPS3-ADV,
STEPS1-ADV and pySTEPS using a few probabilistic scores are presented next.
Figure 32 compares the ROC curves obtained from the three systems using 24 ensemble members
for a lead time of 60 minutes and multiple thresholds. For the lower rainfall thresholds, the
area under the ROC curve obtained using STEPS3-ADV is very similar to that obtained from
pySTEPS; for the higher rainfall thresholds, however, STEPS3-ADV showed a higher area,
indicating better performance than STEPS1-ADV and pySTEPS. For all thresholds, STEPS1-ADV
showed the worst performance among the three alternatives analysed here.
Figure 33 shows the evolution of the area under the ROC curve over multiple lead times for
the three systems. For all thresholds and all lead times, ROC areas are higher for STEPS3-ADV
and pySTEPS than for STEPS1-ADV. STEPS3-ADV and pySTEPS have very similar values of area under
the ROC curve for all lead times and thresholds except for the highest rainfall threshold
(50 mm/hr), where STEPS3-ADV clearly shows superior performance. We believe that this result
arises because STEPS3-ADV is free to dynamically evolve higher rain rates, while pySTEPS and
STEPS1-ADV usually apply post-processing techniques such as probability matching that may cap
rain rates in the rainfall ensemble to those already observed in the input radar field.
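The capping effect described above can be illustrated with a generic rank-based form of probability matching, in which forecast values are replaced by the values of the same rank in a reference field. The sketch below is not the implementation used in pySTEPS or STEPS1-ADV; the function name `probability_match`, the equal-size requirement and the example values are assumptions made purely for illustration.

```python
def probability_match(forecast, reference):
    """Map forecast values onto the empirical distribution of `reference` by rank.

    The smallest forecast value receives the smallest reference value, and so
    on, so the output can never exceed the largest value in `reference`.
    """
    if len(forecast) != len(reference):
        raise ValueError("this sketch assumes equally sized fields")
    order = sorted(range(len(forecast)), key=forecast.__getitem__)
    target = sorted(reference)
    matched = [0.0] * len(forecast)
    for rank, index in enumerate(order):
        matched[index] = target[rank]
    return matched


# Hypothetical rain rates (mm/hr): the member has evolved an 80 mm/hr cell,
# but the observed input field peaks at 30 mm/hr.
member = [5.0, 80.0, 0.0, 20.0]
observed_input = [0.0, 1.0, 10.0, 30.0]
print(probability_match(member, observed_input))  # [1.0, 30.0, 0.0, 10.0]
```

In this example the 80 mm/hr value is mapped to 30 mm/hr, so a 50 mm/hr threshold could never be exceeded after matching, which is consistent with the explanation offered above for the difference at the highest threshold.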
Figure 32. Comparison of ROCs for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS
(blue line) using 24 members for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10
19:00UTC (event 5) for a lead time of 60-min
Figure 33. ROC areas for STEPS3-ADV, STEPS1-ADV and pySTEPS using 24-member rainfall ensembles for
multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond
to 60-min accumulated rainfall ensembles for event 5 of Brisbane radar (ID 66).
Figure 34 compares the reliability plots of STEPS3-ADV, STEPS1-ADV and pySTEPS for the
60-minute lead time and multiple rainfall thresholds. For this event, pySTEPS seems to be
more reliable than STEPS3-ADV and STEPS1-ADV for rainfall thresholds lower than 0.6 mm/hr.
However, for rainfall thresholds from 0.8 to 20 mm/hr, results from STEPS3-ADV and pySTEPS
are comparable, showing high reliability that is superior to that of STEPS1-ADV.
Figure 34. Comparison of reliability diagrams for STEPS3-ADV (green line), STEPS1-ADV (orange line) and
pySTEPS (blue line) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66)
from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond to a lead time of 60
minutes and multiple rainfall thresholds (from top to bottom, and left to right, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10,
20 and 50 mm).
Figure 35 compares the RMSE and ensemble spread of STEPS3-ADV, STEPS1-ADV and pySTEPS. RMSE
values for STEPS1-ADV are significantly higher than those for pySTEPS and STEPS3-ADV, and
STEPS1-ADV also has the lowest ensemble spread among the three alternatives. RMSE values for
STEPS3-ADV and pySTEPS are quite similar for all lead times, but STEPS3-ADV provides rainfall
ensembles with higher spread than either of the alternatives.
Figure 35. Comparison of RMSE (left) and ensemble spread (right) based on STEPS3-ADV (green),
STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for
Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
Finally, when comparing CRPS values for STEPS3-ADV, STEPS1-ADV and pySTEPS (Figure 36),
STEPS1-ADV again has the highest level of error among the alternatives (higher CRPS values
for all lead times), with STEPS3-ADV and pySTEPS performing at similar levels for all lead
times for this rainfall event.
Figure 36. Evolution of CRPS values over domain based on STEPS3-ADV (green), STEPS1-ADV (orange)
and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66)
from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).
5. OPERATIONAL CONFIGURATION FOR STEPS3-ADV
As mentioned earlier, the main requirement from STEPS stakeholders was to obtain, from
STEPS3-ADV rainfall ensembles, the probability of rainfall exceeding given rainfall
thresholds in the next hour (60 minutes). STEPS3-ADV clearly shows large improvements in both
the quality of the rainfall ensembles and computing efficiency when compared with the existing
production system STEPS1-ADV and an open-source alternative (pySTEPS).
After the analysis of the STEPS3-ADV case studies, the following configuration for the
operational STEPS3-ADV system is recommended:
• Update Frequency: 5 minutes
• Product: 60-min accumulated rainfall fields from 5-min rainfall ensembles
• Lead Time: 60 to 90 minutes
• Minimum number of members in the ensemble: 48
• Minimum threshold to identify event occurrence: 0.2 mm in one hour
• Maximum threshold to identify event occurrence: 50 mm in one hour
The recommended minimum of 48 members will provide rainfall ensembles with performance
similar to that of the very large ensembles analysed here, but with half the data file size
and less computing. See the description and analysis of Figure 25, Figure 26 and Figure 27
for more details.
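To make the recommended product concrete, the sketch below derives exceedance probabilities for one pixel from a 48-member ensemble of 5-min rainfall fields. It is an illustration under stated assumptions, not the operational implementation: the 5-min values are assumed to be depths in mm, "exceeding" is implemented as "at or above", and the names `THRESHOLDS_MM`, `STEPS_PER_HOUR` and `hourly_exceedance_probability` are hypothetical. The threshold list is the one used throughout this report.

```python
# Rainfall thresholds (mm in one hour) used in this report.
THRESHOLDS_MM = (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, 50.0)
STEPS_PER_HOUR = 12  # twelve 5-min fields make one 60-min accumulation


def hourly_exceedance_probability(ensemble_5min, thresholds=THRESHOLDS_MM):
    """Probability that the 60-min accumulation at one pixel reaches each threshold.

    ensemble_5min: one list per member holding twelve 5-min depths (mm).
    Returns a dict mapping threshold -> fraction of members at or above it.
    """
    totals = []
    for steps in ensemble_5min:
        if len(steps) != STEPS_PER_HOUR:
            raise ValueError("expected 12 five-minute steps per member")
        totals.append(sum(steps))
    return {thr: sum(total >= thr for total in totals) / len(totals)
            for thr in thresholds}


# Hypothetical 48-member ensemble: half the members dry, half with 1 mm per step.
ensemble = [[0.0] * 12] * 24 + [[1.0] * 12] * 24
print(hourly_exceedance_probability(ensemble))
```

With 48 members the probabilities are resolved in steps of 1/48 (about 2%), compared with about 1% for the 96-member ensembles analysed here.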
It is important to note that some degree of caution may be required by users for thresholds
above 20 mm until additional datasets with a larger number of occurrences of rainfall in that
range have been incorporated into the verification. Verification results show that STEPS3-ADV
can identify the occurrence or non-occurrence of a rainfall event (high or low chances), but
it may be too sharp in its current configuration to identify intermediate chances of
occurrence for some rainfall thresholds.
Also, there is significant spread in the results from one event to the next and from one
radar to the next. For example, Figure 37 shows the distribution of ROC area of STEPS3-ADV
rainfall forecasts for a 60-minute lead time for each of the 10 radars analysed (Table 2) for
multiple rainfall thresholds. Figure 37 disaggregates per radar the results summarized in
Figure 20 for the 60-minute lead time. Overall good performance (high ROC values) occurred at
some radars for most of the rainfall thresholds (e.g., radars Brisbane [66] and Weipa [78]),
while performance at other radars shows larger spread and strong decay for rainfall thresholds
as low as 1 mm in one hour (e.g., radar Adelaide [64]). This could be an effect of local
conditions around the radar (such as topography) that may induce localized growth or decay of
rainfall rates under some flow conditions that are not properly modelled by STEPS, or of
anomalies in the correction of ground echoes in other conditions (such as anomalous
propagation) that could produce fictitious rainfall echoes in the input files used to
generate and verify ensembles.
Figure 37. ROC area results for 10 rainfall events per radar and multiple rainfall thresholds (mm) for the
60-minute lead time. Colours indicate different rainfall thresholds.
6. CONCLUSIONS
An extensive verification exercise was carried out to assess the quality of the new
generation of the Bureau's high-resolution rainfall ensemble generator, STEPS3-ADV. More than
47,000 5-min radar rainfall fields from 10 different weather radars formed the verification
dataset. 96-member ensembles were calculated for all time steps in the verification dataset
using the NCI supercomputer GADI. STEPS3-ADV ensembles were compared with ensembles generated
by the current operational system, STEPS1-ADV, and by an open-source alternative, pySTEPS.
STEPS3-ADV rainfall forecasts are suitable for predicting the probability of occurrence of
hourly rainfall accumulations for rainfall thresholds in the range of 0.2 to 50 mm in the hour
for the 60- to 90-minute lead times. However, some degree of caution may be required by users
for thresholds above 20 mm until additional datasets with a larger number of occurrences of
rainfall in that range have been incorporated into the verification.
STEPS3-ADV ensembles seem to be under-dispersive, and additional spread may be required to
improve the accuracy of the rainfall ensembles, although the new system clearly produces
ensembles that are more accurate and have more spread than those of the current operational
version. Nevertheless, expected errors are small, with at least 75% of the case studies
showing mean CRPS values lower than 0.80 mm.
A limited assessment of the influence of the number of ensemble members on the quality of
the rainfall forecasts was carried out. From the analysis of the longest event (2017 time
steps), it was found that the ability to successfully identify events for rainfall thresholds
from 0.2 to 5 mm/hr remains mostly similar for ensembles with 6, 12, 24, 48 and 96 members
for lead times from 60 to 90 minutes. For the higher rainfall thresholds (10 mm/hr and
above), this ability is heavily reduced if ensembles with fewer than 48 members are used.
Expected error in the rainfall ensembles reduces as the number of members is increased, but
48-member ensembles seem to perform similarly to the larger 96-member ensembles.
Results show that STEPS3-ADV can generate reliable ensemble rainfall forecasts in a large
range of rainfall conditions, with quality comparable to that of available open-source
alternatives while delivering results at least 15 times faster (Table 4). When compared with
the current operational version, STEPS3-ADV has better performance in all scores analysed in
this report and can deliver results about 30 times faster (Table 3), showing strong
capabilities for use as an operational system in the Bureau.
Finally, it is important to note that there is significant variability in the quality of
the predictions, and the verification results vary from radar to radar and from event to
event depending on the nature of the event, threshold and lead time.
7. REFERENCES
Bureau National Operations Centre Operations, Bulletin Number 114, 2018. APS2 upgrade of
the ACCESS-C Numerical Weather Prediction system.
http://www.bom.gov.au/australia/charts/bulletins/BNOC_Operations_Bulletin_114.pdf
Bowler, N.E., Pierce, C.E. and Seed, A., 2004. Development of a precipitation nowcasting
algorithm based upon optical flow techniques. Journal of Hydrology, 288(1-2), pp.74-91.
Bowler, N.E., Pierce, C.E. and Seed, A.W., 2006. STEPS: A probabilistic precipitation
forecasting scheme which merges an extrapolation nowcast with downscaled NWP.
Quarterly Journal of the Royal Meteorological Society, 132(620), pp.2127-2155.
Fortin, V., Abaza, M., Anctil, F., Turcotte, R., 2014. Why Should Ensemble Spread Match the
RMSE of the Ensemble Mean? J. Hydrometeorol. 15, 1708–1713.
https://doi.org/10.1175/jhm-d-14-0008.1
Hersbach, H., 2000. Decomposition of the Continuous Ranked Probability Score for Ensemble
Prediction Systems. Weather Forecast. 15, 559–570.
https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2
Hosmer, David W., Lemeshow, Stanley., and Sturdivant, Rodney X., 2013. Applied Logistic Regression, 3rd Ed. Chapter 5, John Wiley and Sons, New York, NY, pp. 177
Palmer, T., Buizza, R., Hagedorn, R., Lawrence, A., Leutbecher, M., Smith, L., 2006. Ensemble
prediction: a pedagogical perspective. ECMWF Newsl. 106, 10–17.
https://doi.org/10.21957/ab129056ew
Pulkkinen, S., Nerini, D., Perez Hortal, A., Velasco-Forero, C., Germann, U., Seed, A., and
Foresti, L., 2019. Pysteps: an open-source Python library for probabilistic precipitation
nowcasting (v1.0). Geosci. Model Dev., 12 (10), 4185–4219, doi:10.5194/gmd-12-4185-2019.
Roberts, N.M., Lean, H.W., 2008. Scale-Selective Verification of Rainfall Accumulations from
High-Resolution Forecasts of Convective Events. Mon. Weather Rev. 136, 78–97.
https://doi.org/10.1175/2007mwr2123.1
Seed, A.W., 2003. A dynamic and spatial scaling approach to advection forecasting. Journal of
Applied Meteorology, 42(3), pp.381-388.
Seed, A.W., 2008. Rainfields: The Australian Bureau of Meteorology System for Quantitative
Precipitation Estimation, and its use in Hydrological Modelling, Proceedings of Water
Down Under 2008, Modbury, SA, 661-670.
Seed, A.W., Pierce, C.E. and Norman, K., 2013. Formulation and evaluation of a scale
decomposition‐based stochastic precipitation nowcast scheme. Water Resources Research,
49(10), pp.6624-6641.
Talagrand, O., Vautard, R., Strauss, B., 1997. Evaluation of Probabilistic Prediction Systems.
8. ACKNOWLEDGEMENTS
The authors express their gratitude to Dr. Beth Ebert and Dr. Shaun Cooper (Bureau of
Meteorology) for their insightful comments when reviewing the manuscript.
This project was undertaken with the assistance of resources and services from the National
Computational Infrastructure (NCI), which is supported by the Australian Government.
9. APPENDIX
Main characteristics of the 100 selected rainfall events are summarized per radar in the following
tables:
Table A1: Rainfall events for Radar 2 (Melbourne)
No | Start time | End time | Number of time steps | Description
1 | 15 Oct 2019 18:00 | 17 Oct 2019 09:00 | 469 | Starts with a convective cell with slowly moving precipitation band
2 | 1 Nov 2019 09:00 | 2 Nov 2019 09:00 | 289 | Starts with a narrow convective band later forming well spread rain
3 | 6 Nov 2019 05:00 | 8 Nov 2019 23:00 | 793 | Widespread rain
4 | 12 Nov 2019 00:00 | 12 Nov 2019 19:00 | 229 | Small patches of rain cells moving quickly
5 | 1 Dec 2019 00:00 | 2 Dec 2019 13:00 | 445 | Scattered light rain
6 | 4 Jan 2020 15:00 | 6 Jan 2020 02:00 | 421 | Rainfall occurred during the high level of smoke recorded in Melbourne
7 | 15 Jan 2020 01:00 | 15 Jan 2020 11:00 | 121 | Fast moving convective system from west to south east
8 | 19 Jan 2020 02:00 | 20 Jan 2020 12:00 | 409 | High intensity rain with the presence of large-hail stones up to 6 cm.
9 | 4 Mar 2020 00:00 | 5 Mar 2020 08:00 | 385 | Widespread rain
10 | 3 Apr 2020 04:00 | 4 Apr 2020 14:00 | 409 | South westerly cold front forming intermediate convective cells
Table A2: Rainfall events for Radar 19 (Cairns)
No | Start time | End time | Number of time steps | Description
1 | 21 Oct 2019 03:00 | 22 Oct 2019 16:00 | 445 | Scattered rain
2 | 5 Dec 2019 04:00 | 5 Dec 2019 14:00 | 121 | Small patches of convective cells
3 | 9 Dec 2019 03:00 | 11 Dec 2019 09:00 | 649 | Starts with a narrow band of convective cells and later forming scatter rain
4 | 2 Jan 2020 22:00 | 5 Jan 2020 00:00 | 601 | Fast moving scattered rain
5 | 8 Jan 2020 15:00 | 9 Jan 2020 16:00 | 301 | Localized high intensity rain
6 | 22 Jan 2020 10:00 | 24 Jan 2020 00:00 | 457 | Starts with high intensity scattered rain later forming widespread rain
7 | 26 Jan 2020 00:00 | 29 Jan 2020 09:00 | 973 | Widespread rain
8 | 8 Feb 2020 02:00 | 8 Feb 2020 20:00 | 217 | Fast moving convective cells
9 | 20 Feb 2020 14:00 | 25 Feb 2020 00:00 | 1273 | Southerly moving wind forming widespread rain
10 | 8 Mar 2020 23:00 | 9 Mar 2020 17:00 | 217 | Start with scattered rain later forming widespread rain
Table A3: Rainfall events for Radar 40 (Canberra)
No | Start time | End time | Number of time steps | Description
1 | 7 Oct 2019 17:00 | 8 Oct 2019 09:00 | 193 | Scattered rain in the beginning later covering about 80 % of the radar
2 | 15 Oct 2019 22:00 | 16 Oct 2019 15:00 | 205 | Starts with a localized rain later forming widespread rain
3 | 2 Nov 2019 05:00 | 3 Nov 2019 16:00 | 421 | Widespread rain
4 | 21 Dec 2019 00:00 | 21 Dec 2019 09:00 | 109 | Convective rain
5 | 29 Dec 2019 23:00 | 31 Dec 2019 13:00 | 457 | Precipitation starts out as small patches with some scattered convective cells and later evolves predominantly into convective precipitation
6 | 15 Jan 2020 02:00 | 15 Jan 2020 21:00 | 229 | Localized rain patches later forming widespread rain
7 | 18 Jan 2020 23:00 | 19 Jan 2020 15:00 | 193 | Fast moving convective cells
8 | 19 Jan 2020 23:00 | 20 Jan 2020 17:00 | 217 | High intensity rain with the presence of large hail stones up to 5 cm.
9 | 7 Feb 2020 00:00 | 11 Feb 2020 12:00 | 1309 | Widespread rain
10 | 3 Mar 2020 07:00 | 5 Mar 2020 07:00 | 577 | Cold front with widespread rain
Table A4: Rainfall events for Radar 63 (Darwin/Berrimah)
No | Start time | End time | Number of time steps | Description
1 | 27 Dec 2019 00:00 | 27 Dec 2019 09:00 | 109 | Fast evolving rain cells forming predominantly convective rain
2 | 5 Jan 2020 00:00 | 6 Jan 2020 16:00 | 481 | Widespread rain
3 | 8 Jan 2020 07:00 | 11 Jan 2020 13:00 | 937 | Starting with the narrow band later forming widespread rain
4 | 18 Jan 2020 20:00 | 23 Jan 2020 11:00 | 1333 | Mostly scattered rain with widespread rain in between
5 | 28 Jan 2020 18:00 | 29 Jan 2020 10:00 | 193 | Fast evolving rain cells
6 | 8 Feb 2020 04:00 | 9 Feb 2020 13:00 | 397 | Scattered rain
7 | 19 Feb 2020 06:00 | 20 Feb 2020 00:00 | 217 | Starts with the rainfall band later forming widespread rain
8 | 26 Feb 2020 11:00 | 29 Feb 2020 11:00 | 865 | Predominantly convective rain with scatter rain and narrow band
9 | 7 Mar 2020 02:00 | 8 Mar 2020 14:00 | 433 | Starts with cumulus cloud with fast evolving rain cells
10 | 23 Mar 2020 14:00 | 25 Mar 2020 19:00 | 637 | Mostly localised rain
Table A5: Rainfall events for Radar 64 (Adelaide/ Buckland Park)
No | Start time | End time | Number of time steps | Description
1 | 12 Oct 2019 17:00 | 12 Oct 2019 21:00 | 49 | Fast moving convective cells approaching from the west
2 | 15 Oct 2019 02:00 | 15 Oct 2019 23:00 | 253 | Started with the scattered rain, later high rainfall intensity convective band observed close to Port Lincoln
3 | 1 Nov 2019 05:00 | 2 Nov 2019 00:00 | 229 | Started with series of convective bands later formed wide spread rain with localized high intensity rainfall.
4 | 28 Nov 2019 23:00 | 29 Nov 2019 11:00 | 145 | Narrow band of convection rain
5 | 27 Dec 2019 12:00 | 27 Dec 2019 21:00 | 109 | Medium intensity rain band moving from south west to north east.
6 | 4 Jan 2020 06:00 | 5 Jan 2020 19:00 | 445 | Started with a narrow rain band, intermittently forming a wide spread with low to medium intensity rainfall
7 | 9 Jan 2020 22:00 | 10 Jan 2020 09:00 | 133 | Started with the scattered rain later forming a narrow band of fast moving moderate intensity localized rain
8 | 30 Jan 2020 12:00 | 1 Feb 2020 07:00 | 517 | Scattered rain in the beginning coming from the North later forming convective rain
9 | 1 Mar 2020 03:00 | 1 Mar 2020 15:00 | 145 | Fast evolving rainfall mostly convective rain cells
10 | 3 Apr 2020 03:00 | 4 Apr 2020 11:00 | 385 | Started with a narrow band of convective rainfall later formed scatter rain.
Table A6: Rainfall events for Radar 66 (Brisbane/Mt. Stapylton)
No | Start time | End time | Number of time steps | Description
1 | 10 Oct 2019 11:00 | 12 Oct 2019 12:00 | 589 | Starts with a narrow convection band of rain followed with widespread rain
2 | 17 Oct 2019 04:00 | 17 Oct 2019 10:00 | 73 | Strong convective event
3 | 12 Dec 2019 16:00 | 13 Dec 2019 09:00 | 205 | Fast evolving convective system with high intensity localised rain
4 | 24 Dec 2019 04:00 | 25 Dec 2019 21:00 | 493 | Heavy precipitation event
5 | 3 Feb 2020 19:00 | 10 Feb 2020 19:00 | 2017 | Longest rainfall event considered for this verification
6 | 10 Feb 2020 20:00 | 14 Feb 2020 08:00 | 1009 | Starts with a wide band of convective rain followed by thunderstorm with some reported flooding cases around Brisbane
7 | 22 Feb 2020 11:00 | 25 Feb 2020 12:00 | 985 | Rain band passing over Brisbane
8 | 26 Feb 2020 00:00 | 27 Feb 2020 08:00 | 385 | Fast evolving convective rain cells
9 | 8 Mar 2020 02:00 | 10 Mar 2020 11:00 | 685 | Widespread rain event
10 | 30 Mar 2020 14:00 | 30 Mar 2020 18:00 | 49 | Localised rain event
Table A7: Rainfall events for Radar 70 (Perth/Serpentine)
No | Start time | End time | Number of time steps | Description
1 | 3 Oct 2019 06:00 | 4 Oct 2019 13:00 | 373 | Precipitation starts as a widespread rain later forms scattered rain
2 | 11 Oct 2019 09:00 | 12 Oct 2019 00:00 | 181 | Scattered rain
3 | 30 Oct 2019 02:00 | 2 Nov 2019 01:00 | 853 | A narrow band of precipitation followed by scattered rain
4 | 16 Dec 2019 07:00 | 16 Dec 2019 18:00 | 133 | Light scattered rain moving towards NE
5 | 10 Feb 2020 21:00 | 11 Feb 2020 01:00 | 49 | Light scattered rain
6 | 21 Feb 2020 05:00 | 22 Feb 2020 06:00 | 301 | Widespread rain
7 | 24 Feb 2020 00:00 | 24 Feb 2020 09:00 | 109 | Fast evolving localised convective cells
8 | 25 Feb 2020 20:00 | 28 Feb 2020 10:00 | 745 | Convective rain approaching from NW direction
9 | 14 Mar 2020 06:00 | 14 Mar 2020 18:00 | 145 | Scattered localized rain
10 | 16 Mar 2020 22:00 | 18 Mar 2020 00:00 | 313 | Rain band passing through radar
Table A8: Rainfall events for Radar 71 (Sydney/Terrey Hills)
No | Start time | End time | Number of time steps | Description
1 | 4 Oct 2019 11:00 | 5 Oct 2019 01:00 | 169 | Cluster of intense rain surrounded by light rain cells
2 | 10 Oct 2019 08:00 | 12 Oct 2019 13:00 | 634 | Continuously generated convective rain cells
3 | 3 Nov 2019 02:00 | 3 Nov 2019 17:00 | 162 | Starts with localized rain later forming widespread rain
4 | 23 Nov 2019 02:00 | 23 Nov 2019 10:00 | 97 | Localized high intensity rain
5 | 15 Jan 2020 14:00 | 19 Jan 2020 09:00 | 1091 | Cluster of intense rain
6 | 19 Jan 2020 22:00 | 20 Jan 2020 12:00 | 150 | Hail event
7 | 5 Feb 2020 12:00 | 9 Feb 2020 20:00 | 1249 | Widespread rain
8 | 3 Mar 2020 00:00 | 4 Mar 2020 08:00 | 385 | Scattered rain
9 | 24 Mar 2020 19:00 | 27 Mar 2020 02:00 | 661 | Band of convective cells moving towards NE
10 | 29 Mar 2020 12:00 | 30 Mar 2020 05:00 | 205 | Started with localized high intensity rain, later forming widespread rain
Table A9: Rainfall events for Radar 76 (Hobart/Mt.Koonya)
No | Start time | End time | Number of time steps | Description
1 | 1 Nov 2019 02:00 | 2 Nov 2019 00:00 | 265 | Started with high intensity localized rain cells, forming widespread later during the event
2 | 6 Nov 2019 02:00 | 9 Nov 2019 20:00 | 1081 | Mostly scattered rain
3 | 30 Dec 2019 02:00 | 30 Dec 2019 19:00 | 205 | Fast evolving high intensity localized rain with widespread rain in the later phase
4 | 9 Jan 2020 21:00 | 10 Jan 2020 22:00 | 301 | Mostly widespread rain
5 | 22 Jan 2020 04:00 | 24 Jan 2020 02:00 | 553 | Widespread rain
6 | 18 Feb 2020 08:00 | 19 Feb 2020 09:00 | 301 | Light rainfall band
7 | 20 Feb 2020 08:00 | 21 Feb 2020 03:00 | 229 | Mostly scattered localized rain cells
8 | 25 Feb 2020 15:00 | 27 Feb 2020 23:00 | 673 | Widespread rain moving from SW to NE
9 | 4 Mar 2020 14:00 | 5 Mar 2020 16:00 | 313 | Mostly widespread rain coming from the north direction
10 | 19 Mar 2020 01:00 | 19 Mar 2020 19:00 | 217 | Started with the scatter rain later forming widespread with intermittent high intensity rain bands
Table A10: Rainfall events for Radar 78 (Weipa airport)
No | Start time | End time | Number of time steps | Description
1 | 26 Dec 2019 04:00 | 26 Dec 2019 22:00 | 217 | Starts with the high intensity localized rain later widespread rain approaches from the south direction
2 | 7 Jan 2020 14:00 | 10 Jan 2020 20:00 | 937 | Mostly scattered rain, also forming narrow rain band
3 | 27 Jan 2020 20:00 | 29 Jan 2020 12:00 | 481 | Begins with high intensity scatter rain later forming widespread rain
4 | 29 Jan 2020 13:00 | 30 Jan 2020 17:00 | 337 | Convective cells moving from NW direction covering almost 80% the area at the later phase
5 | 30 Jan 2020 21:00 | 4 Feb 2020 13:00 | 1333 | Intermittent rain event forming with localized high intensity rainfall
6 | 10 Feb 2020 23:00 | 13 Feb 2020 19:00 | 817 | Intermittent localized convective cells
7 | 19 Feb 2020 13:00 | 24 Feb 2020 07:00 | 1356 | Begins with the scattered light rain followed by some high intensity narrow rain bands, later forming widespread rain
8 | 5 Mar 2020 17:00 | 7 Mar 2020 17:00 | 577 | Convective rain cells coming from NE
9 | 10 Mar 2020 22:00 | 13 Mar 2020 08:00 | 682 | Mostly scattered rain intermittently forming narrow rain bands
10 | 22 Mar 2020 03:00 | 24 Mar 2020 00:00 | 541 | Rain moving towards west direction with mostly scattered rain