Bureau Research Report - 045

STEPS3 – ADV – VERIFICATION REPORT

Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed

September 2020


National Library of Australia Cataloguing-in-Publication entry

Authors: Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed

Title: STEPS3 – ADV – VERIFICATION REPORT

ISBN: 978-1-925738-20-9

Series: Bureau Research Report – BRR045


Enquiries should be addressed to:

Dr. Carlos Velasco-Forero:

Bureau of Meteorology

GPO Box 1289, Melbourne

Victoria 3001, Australia

[email protected]

Copyright and Disclaimer

© 2020 Bureau of Meteorology. To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of the Bureau of Meteorology.

The Bureau of Meteorology advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the extent permitted by law, the Bureau of Meteorology (including each of its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.


Contents

1. Executive Summary

2. Short description of STEPS

3. Methodology

3.1 User needs

3.2 STEPS3

3.3 Datasets

4. Rainfall Ensembles and Probabilistic Verification

4.1 Qualitative evaluation of forecast mean areal rainfall

4.2 Qualitative evaluation of forecast rainfall fields

4.3 Probabilistic Verification

4.3.1 Root Mean Square Error – Ensemble Spread

4.3.2 Continuous Ranked Probability Score (CRPS)

4.3.3 Relative Operating Characteristic (ROC)

4.3.4 Rank (Talagrand) Histogram

4.3.5 Reliability (Attribute) Diagrams

4.4 Performance based on different number of ensemble members

4.5 Comparison with existing operational system, STEPS1-ADV

4.6 Comparison with pySTEPS

5. Operational configuration for STEPS3-ADV

6. Conclusions

7. References

8. Acknowledgements

9. Appendix


List of Figures

Figure 1. Indicative locations of all Australian weather radars (green), with radars used in this study pinpointed in red. Dashed squares correspond with approximate extents of radar data used in this experiment (square regions of 256 km per side).

Figure 2. Time series of mean and maximum rainfall rate for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Mean rainfall rate is shown as black lines, with maximum values per time step in red and interquartile range in shaded blue.

Figure 3. Time series of Wet Area Ratio (WAR) for a threshold of 1 mm/hr (red) and ratio of mean rainfall rates and standard deviation (blue) for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). High values of WAR correspond with wide-spread rainfall, with values close to zero corresponding to isolated cells of high intensity rain rates. The ratio of mean and standard deviation shows a high correlation with WAR and may be used as an alternative score.

Figure 4. 5-min accumulated radar rainfall fields for time steps with maximum Wet Area Ratio (left), highest maximum rainfall (centre) and maximum mean rainfall (right) from event 5 of Brisbane radar (66). Each field covers 256 x 256 km with a pixel size of 0.5 km.

Figure 5. Time series of domain-wide mean (observed and forecast) rainfall for the first half of event 5 of Brisbane radar. Values in black correspond with the mean of 60-min accumulated rainfall fields calculated from 5-min accumulated radar data every 5 minutes. Multi-coloured lines correspond with the mean of forecast rainfall fields for each member of a selection of 96-member STEPS3-ADV rainfall ensembles.

Figure 6. Observed and forecast mean rainfall values for selected STEPS3-ADV rainfall ensembles. Values are 60-min accumulated rainfall fields. Forecasts correspond to 96-member rainfall ensembles calculated using observed radar rainfall at 2020-02-05 13:00UTC (left), 2020-02-06 00:00UTC (centre) and 2020-02-06 19:00UTC (right) over the Brisbane (Mt. Stapylton) Radar.

Figure 7. 60-min accumulated rainfall fields for Brisbane radar at 2020-02-05 14:00UTC, that correspond with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 13:00UTC, and six individual members from the same rainfall ensemble.

Figure 8. As Figure 7 but at 2020-02-06 01:00UTC. In this case, rainfall ensemble was calculated at 2020-02-06 00:00UTC.

Figure 9. As Figure 7 but at 2020-02-06 20:00UTC. In this case, forecast rainfall ensemble was calculated at 2020-02-06 19:00UTC.

Figure 10. 5-min rainfall fields for Brisbane radar at 2020-02-05 21:55UTC, that correspond with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 21:45UTC, and six individual members from the same forecast rainfall ensemble.

Figure 11. 5-min forecast rainfall fields for member 23 from a 96-member STEPS3-ADV rainfall forecast ensemble calculated at 2020-02-05 21:45UTC for Brisbane radar. Rainfall fields correspond with (from top to bottom, and left to right) 5-, 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80- and 90-min lead times. Size of rainfall fields is 256 x 256 km at 0.5 km resolution.

Figure 12. As Figure 11, but 5-min forecast rainfall fields are here extracted from member 43. Size of rainfall fields is 256 x 256 km at 0.5 km resolution.

Figure 13. Same as in Figure 11, but 5-min forecast rainfall fields depict here a close-up area of 100 x 100 km centred at Brisbane city.

Figure 14. Same as Figure 12, but 5-min forecast rainfall fields depict here close-up areas of 100 x 100 km centred at Brisbane city.


Figure 15. RMS error vs spread of STEPS3-ADV 60-min rainfall ensemble forecasts for a selection of rainfall events. a) Brisbane radar (ID66), event 05; b) Melbourne radar (ID02), event 08; c) Sydney (Terrey Hills) radar (ID71), event 09; d) Perth (Serpentine) radar (ID70), event 04; e) Canberra radar (ID40), event 01; and f) Cairns Radar (ID19), event 03.

Figure 16. Distribution of the RMSE (left) and Ensemble Spread (right) based on 100 rainfall events for lead times from 60 to 90 minutes.

Figure 17. Distribution of CRPS values of STEPS3-ADV rainfall forecasts for 100 rainfall events, per lead time.

Figure 18. ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). ROCs correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.

Figure 19. Mean and spread of ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for 100 events. Data correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) for a lead time of 60 minutes.

Figure 20. ROC area results for 100 rainfall events and multiple rainfall thresholds (mm), grouped by Lead Time (minutes).

Figure 21. Rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Histograms correspond to lead times of a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.

Figure 22. Distribution of rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for 100 rainfall events. Histograms correspond to a lead time of 60 minutes.

Figure 23. Reliability diagrams for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond with a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm). Dashed horizontal lines show the climatological frequency for the given threshold, and the dotted lines midway between the 1:1 diagonal line and the horizontal denote "no skill" relative to climatology. Shaded regions show the areas where ensembles have good reliability and therefore skill. Bar charts below each diagram show the number of times each probability value was predicted.

Figure 24. Reliability diagrams for 100 rainfall events for multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different Lead Times (60, 70, 80 and 90 minutes). Lines correspond to the mean of 100 events results and shaded areas depict the 5th to 95th percentiles for the indicated lead times and rainfall thresholds. The background corresponds with climatology areas of positive skill for a lead time of 60-min.

Figure 25. ROCs for STEPS3-ADV 60-min accumulated rainfall ensembles with different ensemble members for (top row) T+60, (centre) T+70 and (bottom row) T+80 for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

Figure 26. ROC areas for different number of members in a STEPS3-ADV rainfall ensemble for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond with 60-min accumulated rainfall ensembles for the event 05 of Brisbane radar.

Figure 27. Evolution of CRPS values over domain for different number of ensemble members. Results correspond with 60-min accumulated rainfall ensembles for the event 05 of Brisbane radar (ID 66).


Figure 28. Comparison of ROC Area distribution for 60-minute lead time based on STEPS1-ADV and STEPS3-ADV.

Figure 29. Comparison of reliability plot for 60-, 70-, 80- and 90-minute lead times considering all 100 events based on STEPS1-ADV and STEPS3-ADV. Results are based on 24-member ensembles. The background corresponds with climatology areas of positive skill for a lead time of 60-min.

Figure 30. Comparison of CRPS based on STEPS3-ADV and STEPS1-ADV. Results are based on 24-member ensembles for all 100 events.

Figure 31. Comparison of RMSE (left panel) and ensemble spread (right panel) for STEPS3-ADV (orange) and STEPS1-ADV (blue).

Figure 32. Comparison of ROCs for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS (blue line) using 24 members for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5) for a lead time of 60-min.

Figure 33. ROC areas for STEPS3-ADV, STEPS1-ADV and pySTEPS using 24 rainfall ensembles for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond with 60-min accumulated rainfall ensembles for the event 05 of Brisbane radar.

Figure 34. Comparison of reliability diagrams for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS (blue line) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond with a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, and left to right, 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm).

Figure 35. Comparison of RMSE (left) and ensemble spread (right) based on STEPS3-ADV (green), STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

Figure 36. Evolution of CRPS values over domain based on STEPS3-ADV (green), STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

Figure 37. ROC area results for 10 rainfall events per radar and multiple rainfall thresholds (mm) for the 60-minute lead time. Colours indicate different rainfall thresholds.


STEPS3 – ADV – Verification Report

Carlos Velasco-Forero, Jayaram Pudashine, Mark Curtis, and Alan Seed

Radar Science and Nowcasting Team

Research Program, Science and Innovation Group

Australian Bureau of Meteorology

03 September 2020

1. EXECUTIVE SUMMARY

A new version of the Short-Term Ensemble Prediction System (STEPS) (Seed et al., 2003) is under development as part of the Public Service Transformation (PST) program in the Bureau of Meteorology. The main goals of this development are 1) to reduce the computational time required to produce rainfall ensembles from minutes (the average production time of the current operational implementation, known as STEPS1) to seconds, 2) to improve the quality of rainfall ensemble forecasts, 3) to extend the current coverage of the service to most of Australia and 4) to use the latest programming techniques, which allow the new implementation to be deployed in cloud-based systems.

The focus of this report is to assess the quality of the rainfall ensemble forecasts and define operational rules and configurations that fulfil most of the stakeholders' requirements, in this case, 60-min accumulated rainfall fields for lead times in the range of 60 to 90 minutes. This report assesses a new STEPS implementation that produces rainfall ensemble forecasts using weather radar data only (henceforth referred to as STEPS3-ADV). STEPS implementations that use Numerical Weather Prediction (NWP) rainfall forecasts jointly with weather radar data to generate rainfall ensembles are not within the scope of this work and will be assessed in future reports.

In this experiment, a total of 47,057 individual 5-min radar rainfall fields (equivalent to more than 163 days of rain) were analysed, using the Bureau’s operational Rainfields (Seed et al., 2008) datasets from 10 radars across Australia. A 96-member rainfall ensemble was calculated for each of these individual radar fields, for lead times of up to 90 minutes.

This extensive verification exercise shows that STEPS3-ADV can generate reliable ensemble rainfall forecasts in a large variety of rainfall conditions, with quality comparable to that of available open-source alternatives while delivering results up to 15 times faster. Compared with the current operational version, STEPS3-ADV performs better on all scores analysed in this report and can deliver results up to 30 times faster, showing strong capabilities for use as an operational system in the Bureau.

STEPS3-ADV rainfall forecasts are suitable for correctly predicting the probability of occurrence of hourly rainfall accumulations for rainfall thresholds in the range of 0.2 to 50 mm in the hour, for lead times of 60 to 90 minutes. However, users may need to exercise some caution for thresholds above 20 mm until additional datasets with a larger number of occurrences of rainfall in that range have been incorporated into the verification.

STEPS3-ADV ensembles seem to be under-dispersive, and additional spread may be required to improve the accuracy of the rainfall ensembles. However, expected errors are small, with at least 75% of the case studies showing mean error values lower than 0.80 mm. Additionally, an assessment of the influence of the number of ensemble members on the quality of the rainfall forecasts was carried out, finding that about 48 members may be needed to accurately forecast the probability of the 50 mm accumulation during extreme events. Finally, it is important to note that there is significant variation in the quality of the predictions: verification results vary from radar to radar and from event to event, depending on the nature of the radar, event, accumulation threshold, and lead time.


2. SHORT DESCRIPTION OF STEPS

The Short-Term Ensemble Prediction System (STEPS) method uses a multiplicative cascade scale decomposition approach for generating high-resolution ensembles of short-term rainfall forecasts (nowcasts) from radar observations (Seed, 2003; Bowler et al., 2006). The main goal of STEPS is to generate ensembles of rainfall forecasts that exhibit similar space-time structures to those of observed rainfall over a range of space and time scales. Originally, this system blended an advection forecast from radar observations with a noise model possessing the space-time properties of observed rain fields (Bowler et al., 2004, 2006). This method has since been extended to allow radar and numerical weather prediction (NWP) forecasts to be blended (Seed et al., 2013).
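As an illustration of what a scale decomposition does, the sketch below splits a field into band-pass levels with Gaussian filters in log-frequency space. It is a minimal example under stated assumptions and not the STEPS or STEPS3 filter design: the function name `cascade_decompose`, the filter shape, the filter width and the halving of centre frequencies are all choices made here for illustration. In a multiplicative cascade the decomposition is typically applied to a log-transformed rain field, so that the sum of the levels corresponds to a product in rain-rate space.

```python
import numpy as np

def cascade_decompose(field, n_levels=6):
    """Split a 2-D field into n_levels band-pass components that sum back to it."""
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    radius = np.hypot(fy, fx)  # radial frequency in cycles per pixel
    # Treat the zero frequency like the lowest resolvable one so log2 is defined.
    radius = np.maximum(radius, 1.0 / max(ny, nx))
    # Centre frequencies halve from one level to the next (assumed for illustration).
    centres = 0.5 / 2.0 ** np.arange(n_levels)
    log_r = np.log2(radius)
    weights = np.stack(
        [np.exp(-0.5 * ((log_r - np.log2(c)) / 0.5) ** 2) for c in centres]
    )
    # Normalise so that the filters sum to one at every frequency.
    weights /= weights.sum(axis=0)
    spectrum = np.fft.fft2(field)
    return np.real(np.fft.ifft2(spectrum[None] * weights, axes=(-2, -1)))
```

Because the filter weights are normalised to sum to one, adding the levels together recovers the input field, which is the property that lets each level be evolved separately and then recombined.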

The current operational implementation of STEPS in the Bureau of Meteorology consists of two product types. The first uses 5-min radar rainfall estimates from a single radar to generate 10-member ensembles of 5-min rainfall forecasts up to 90 minutes ahead. These products have a spatial resolution of 0.5 x 0.5 km on a 256 x 256 km domain centred at the radar location and are available for the radars at Adelaide, Melbourne, Sydney, and Brisbane only. The second product type combines 10-min multi-radar rainfall estimates with 10-min ACCESS-C NWP rainfall forecasts (Bureau National Operations Centre Operations, Bulletin Number 114, 2018) to create 10-member ensembles of 10-min rainfall forecasts up to 12 hours ahead, for seven regions across Australia. These combined radar-NWP rainfall ensembles have a spatial resolution of 1 x 1 km and cover a domain of 512 x 512 km. An updated implementation of STEPS is required to fulfil current user requirements, with increased demand for nowcasting products across the country (see next section), and to incorporate the latest computational techniques, which allow the use of modern architectures such as cloud computing.

3. METHODOLOGY

3.1 User needs

Currently, the Australian weather radar network consists of 58 radars, and the number is expected to increase to 70 radars in the next decade. Stakeholders have expressed interest in having ensemble nowcasts of 5-min rainfall for each of the radars in the Australian network at a high spatial resolution (1 km or less). End-users in the Public Weather and Public Safety teams in the Bureau were consulted about which products are required for nowcasts of rainfall accumulations, and consensus was reached that a forecast of the accumulation of rainfall in the next hour, together with the probability that it would exceed a number of thresholds, was required. Rainfall ensembles generated by STEPS must have sufficient quality to accurately predict the chance of both light rainfall (<1 mm/hr) for public weather applications and heavy rainfall (50 mm/hr) for warning applications.


3.2 STEPS3

STEPS3 is a completely new implementation, built with the goal of providing low-cost, high-performance and reliable operational nowcasting services for large-scale radar networks.

Scientific improvements to the STEPS algorithm have been realized in several important areas. The decomposition filters have been redesigned to increase spectral isolation of cascade levels while reducing ringing artefacts. An alternative optical flow technique has been adopted that provides superior tracking of the rain fields in areas of low texture. Also, the process parameters of the autoregressive models that drive the stochastic evolution of the nowcast have been made spatially varying. This change ensures that localized scaling characteristics are retained during the life of the nowcast rather than becoming statistically homogeneous.
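To make the idea of spatially varying autoregressive parameters concrete, the sketch below advances a unit-variance field with a first-order autoregressive update whose coefficient differs from pixel to pixel. This is an illustration only: the report does not state the model order or how the parameters are estimated in STEPS3, the function name `ar1_step` and the coefficient map `phi` are hypothetical, and the noise here is spatially white rather than having the correlation structure of rain.

```python
import numpy as np

def ar1_step(previous, phi, rng):
    """Advance a unit-variance field one time step with a per-pixel AR(1) coefficient."""
    noise = rng.standard_normal(previous.shape)
    # The sqrt(1 - phi**2) factor keeps the stationary variance at one for |phi| < 1.
    return phi * previous + np.sqrt(1.0 - phi ** 2) * noise

rng = np.random.default_rng(42)
shape = (128, 128)
# Hypothetical coefficient map: persistent on the left half, short-lived on the right.
left_half = np.arange(shape[1])[None, :] < shape[1] // 2
phi = np.where(left_half, 0.95, 0.50) * np.ones(shape)

field = rng.standard_normal(shape)
for _ in range(200):
    field = ar1_step(field, phi, rng)
```

With a single global coefficient, every part of the domain would decorrelate at the same rate; a coefficient map lets long-lived and short-lived structures coexist in the same nowcast.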

As a piece of software, the engineering priorities for STEPS3 are performance, reliability and suitability for operational deployment. The code base is highly parallelized, allowing the generation of large nowcast ensembles with very low latency. While STEPS3 remains highly configurable, significant effort has been made to ease the setup burden on users by optimizing and tuning the model during development. This was facilitated through the use of a Continuous Integration server that automatically evaluated every proposed change by generating and verifying over 2,000 ensembles against a reference dataset of real-world scenarios.

Finally, STEPS3 can be deployed as a cloud-based application, allowing it to scale dynamically as needed based on weather conditions, minimizing costs without placing demands on internal IT resources. It is hoped that cloud-based deployment will be a first step towards providing easily accessible high-quality radar-based nowcasting as a service to a broad range of users.

The new STEPS3 implementation that produces rainfall ensemble forecasts using weather radar data only is henceforth referred to as STEPS3-ADV. STEPS3-ADV is designed to generate ensembles of rainfall forecasts for all radars in the Bureau of Meteorology’s weather radar network across Australia. The system can generate ensembles of 5-min rainfall nowcasts up to 90 minutes ahead, the 60-min accumulation, and the probability that a range of thresholds will be exceeded in the next hour.
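The relationship between these three products can be sketched as follows, assuming the ensemble is held as an array of 5-min accumulations in mm with shape (members, 12, ny, nx). The function name and array layout are assumptions made for this example, and the probability is taken as the fraction of members above the threshold, which is a common estimator but is not confirmed here as the one used by STEPS3-ADV.

```python
import numpy as np

def hourly_exceedance_probability(ensemble_5min, thresholds_mm):
    """Probability that the 60-min accumulation exceeds each threshold.

    ensemble_5min: array (members, 12, ny, nx) of 5-min accumulations in mm.
    Returns an array (n_thresholds, ny, nx) of probabilities in [0, 1].
    """
    ensemble_5min = np.asarray(ensemble_5min, dtype=float)
    # Sum the twelve 5-min steps of each member into a 60-min total.
    hourly = ensemble_5min.sum(axis=1)
    thresholds = np.asarray(thresholds_mm, dtype=float)
    exceed = hourly[None] > thresholds[:, None, None, None]
    # The fraction of members above the threshold is the probability estimate.
    return exceed.mean(axis=1)
```

For example, if one quarter of the members of an ensemble exceed 10 mm in the hour at a pixel, the returned probability for that threshold and pixel is 0.25.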

3.3 Datasets

To analyse the quality of STEPS3-ADV rainfall ensembles under different weather conditions, several rainfall events were selected for each of 10 weather radars located around Australia. Radars were selected because they are located around capital cities (8 radars) or are subject to extreme rainfall events (2 radars). The selected radars are listed in Table 1 and their indicative locations are shown in Figure 1.

A 6-month period from 1 October 2019 to 31 March 2020 was used to identify significant rainfall events for all radars. This period corresponds with the warm season in Australia, and the results may be influenced by the absence of cool-season events from this analysis. The rainfall product chosen is the 5-min calibrated radar rainfall accumulation generated in real time by the Bureau’s operational Rainfields system (Seed et al., 2008). This calibrated radar rainfall product is obtained by applying a series of quality control measures, including removal of ground and sea clutter, interference, anomalous propagation, second-trip echoes, bright band contamination and partial beam blockage. The cleaned reflectivity is then converted to a surface rainfall map by first estimating the reflectivity at the earth's surface using a three-dimensional interpolation, then converting these reflectivity values into rainfall estimates at the ground based on static Z-R relationships, and finally correcting the gauge/radar bias using near real-time rain gauge information.
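A static Z-R relationship has the power-law form Z = a R^b, where Z is the linear reflectivity factor and R is the rain rate. The sketch below inverts it for R. The default coefficients a = 200 and b = 1.6 are the classic Marshall-Palmer values and are used here only as an assumption; the coefficients actually used by Rainfields are not given in this report, and the function name is hypothetical.

```python
import numpy as np

def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for the rain rate R (mm/h), given reflectivity in dBZ."""
    # Convert from logarithmic dBZ to the linear reflectivity factor (mm^6 m^-3).
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)
    return (z_linear / a) ** (1.0 / b)
```

A rain rate in mm/h converts to a 5-min accumulation in mm by multiplying by 5/60.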

Radar ID  Name                            Type            Latitude (° S)  Longitude (° E)
2         Melbourne (VIC)                 S-band DualPol  37.86           144.76
19        Cairns/Saddle Mountain (QLD)    C-band Doppler  16.82           145.68
40        Canberra/Captain's Flat (ACT)   S-band Doppler  35.66           149.51
63        Darwin - Berrimah (NT)          C-band Doppler  12.46           130.93
64        Adelaide/Buckland Park (SA)     S-band DualPol  34.617          138.469
66        Brisbane/Mt. Stapylton (QLD)    S-band DualPol  27.718          153.24
70        Perth/Serpentine (WA)           C-band Doppler  32.39           115.87
71        Sydney/Terrey Hills (NSW)       S-band Doppler  33.70           151.21
76        Hobart/Mt. Koonya (TAS)         C-band Doppler  43.11           147.81
78        Weipa Airport (QLD)             C-band Doppler  12.67           141.92

Table 1. List of selected weather radars and their main characteristics.

Figure 1. Indicative locations of all Australian weather radars (green), with radars used in this study pinpointed in red. Dashed squares correspond with approximate extents of radar data used in this experiment (square regions of 256 km per side).


For each radar, rainfall events were identified using the Wet Area Ratio (WAR), defined as the fraction of the rainfall field above 1 mm/h in each 5-min time step. Events had to fulfil the following three criteria:

• Minimum WAR in the event: 0.01

• Minimum storm duration: 3 hr

• Maximum gap with no rain within a storm: 1 hr
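The three criteria can be read as a simple rule over the WAR time series, sketched below. This is one interpretation and not the authors' code: it assumes a time step counts as rainy when its WAR is at least 0.01, that rainy steps separated by a dry spell of no more than 1 hour belong to the same event, and that an event is kept only if it spans at least 3 hours. The function names are hypothetical.

```python
import numpy as np

def wet_area_ratio(rain_rate, threshold=1.0):
    """Fraction of pixels whose rain rate (mm/h) is above the threshold."""
    return float(np.mean(np.asarray(rain_rate) > threshold))

def find_events(war, step_min=5, min_war=0.01, min_duration_hr=3.0, max_gap_hr=1.0):
    """Return (start, end) time-step indices (end inclusive) of rainfall events."""
    wet = np.flatnonzero(np.asarray(war) >= min_war)
    if wet.size == 0:
        return []
    max_gap_steps = round(max_gap_hr * 60 / step_min)
    min_steps = round(min_duration_hr * 60 / step_min)
    # Number of dry steps between consecutive rainy steps.
    dry_between = np.diff(wet) - 1
    # A dry spell longer than the permitted gap closes one event and opens the next.
    breaks = np.flatnonzero(dry_between > max_gap_steps)
    starts = np.concatenate(([wet[0]], wet[breaks + 1]))
    ends = np.concatenate((wet[breaks], [wet[-1]]))
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s + 1 >= min_steps]
```

Pixels with missing data compare as not wet in `wet_area_ratio`, so a production version would need to decide whether to exclude them from the denominator.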

Table 2 shows the total number of rainfall events identified with these criteria and the duration of the longest rainfall event for each of the 10 radars.

Radar ID  Name                            No. of rainfall events  Longest duration (hr)
2         Melbourne (VIC)                 45                      66.08
19        Cairns (QLD)                    95                      106.16
40        Canberra (ACT)                  80                      108.83
63        Darwin - Berrimah (NT)          128                     111.58
64        Adelaide/Buckland Park (SA)     45                      42.41
66        Brisbane/Mt. Stapylton (QLD)    63                      168
70        Perth/Serpentine (WA)           57                      70.58
71        Sydney/Terrey Hills (NSW)       70                      104.41
76        Hobart/Mt. Koonya (TAS)         81                      90
78        Weipa Airport (QLD)             122                     114.25

Table 2. Total number of identified rainfall events per radar and duration of the longest event.

Among the 786 rainfall events identified, 10 events that showed different types of precipitation and rain field evolution were handpicked for each of the 10 radars and later used in the verification analyses of the STEPS3-ADV implementation.

A total of 47,057 individual 5-min radar rainfall fields (equivalent to more than 163 days of rain) were analysed in this experiment. The main characteristics of the 100 selected rainfall events are summarized per radar in the Appendix.
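The totals quoted in this section follow directly from Table 2 and from the 5-min time step, as the short check below shows. The member-forecast count simply multiplies the number of fields by the 96 members generated for each one; the variable names are chosen for this example.

```python
# Number of identified events per radar ID, copied from Table 2.
events_per_radar = {2: 45, 19: 95, 40: 80, 63: 128, 64: 45,
                    66: 63, 70: 57, 71: 70, 76: 81, 78: 122}
total_events = sum(events_per_radar.values())

n_fields = 47_057                          # individual 5-min radar rainfall fields
days_of_rain = n_fields * 5 / (60 * 24)    # minutes of rain converted to days
n_member_forecasts = n_fields * 96         # one 96-member ensemble per field

print(total_events, round(days_of_rain, 1), n_member_forecasts)
```

The sum of the per-radar counts gives the 786 events quoted above, and the 5-min fields add up to slightly more than 163 days.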

Examples of mean areal rainfall time series and typical rainfall fields for some selected rainfall

events are shown next. Figure 2 shows temporal evolution of mean areal rainfall rate (black),

maximum rain rate value (red) and interquartile range (blue shade) for the longest event in the

archive (Event 5 of Brisbane Radar [Id 66]). Time steps with high maximum values but low mean

areal rainfall rates are indicative of intense convective cells travelling in the area of analysis, while

time steps with high mean areal values with moderate maximum values usually correspond with

wide-spread rainfall events with embedded high intensity cells. Time series of the Wet Area Ratio (WAR)

and of the ratio of the mean and standard deviation of rain rates for the same rainfall event are shown in Figure

3. Note the strong correlation between the two ratios.
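As an illustration only, the two diagnostics plotted in Figure 3 could be computed from a single rain-rate field along the following lines. This is a minimal sketch, not the code used in this study: the function names are hypothetical, the use of `>=` at the 1 mm/hr threshold and the direction of the ratio (mean divided by standard deviation) are assumptions, and the field is synthetic.

```python
import numpy as np


def wet_area_ratio(rain_rate, threshold=1.0):
    """Fraction of valid pixels with rain rate at or above `threshold` (mm/hr)."""
    valid = np.isfinite(rain_rate)          # ignore missing (NaN) pixels
    if not valid.any():
        return float("nan")
    return float((rain_rate[valid] >= threshold).sum() / valid.sum())


def mean_to_std_ratio(rain_rate):
    """Ratio of the field mean to the field standard deviation (assumed direction)."""
    values = rain_rate[np.isfinite(rain_rate)]
    std = values.std()
    return float(values.mean() / std) if std > 0 else float("nan")


# Synthetic field: half of the domain rains at 2 mm/hr, the rest is dry.
field = np.zeros((4, 4))
field[:2, :] = 2.0

war = wet_area_ratio(field)
ratio = mean_to_std_ratio(field)
```

Applied to every 5-min field of an event, these two functions would give time series comparable in spirit to those of Figure 3.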

5-min accumulated rainfall fields for the time steps with maximum observed mean, highest

observed maximum value and maximum wet area ratio for event 5 over the Brisbane Radar are

shown in Figure 4. The adaptive scale-based scheme used by STEPS3-ADV allows for the

generation of ensembles of rainfall forecasts using this diverse range of observed rainfall fields.

Plots were generated for all 100 events but are not included in this report for the sake of simplicity.


Figure 2. Time series of mean and maximum rainfall rate for Brisbane radar (ID 66) from 2020-02-04

19:00UTC to 2020-02-10 19:00UTC (event 5). Mean rainfall rate is shown as black lines, with maximum

values per time step in red and interquartile range in shaded blue.

Figure 3. Time series of Wet Area Ratio (WAR) for a threshold of 1 mm/hr (red) and ratio of mean rainfall

rates and standard deviation (blue) for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10

19:00UTC (event 5). High values of WAR correspond with wide-spread rainfall, with values close to zero

corresponding to isolated cells of high intensity rain rates. The ratio of mean and standard deviation shows a high

correlation with WAR and may be used as an alternative score.


Figure 4. 5-min accumulated radar rainfall fields for time steps with maximum Wet Area Ratio (left), highest

maximum rainfall (centre) and maximum mean rainfall (right) from event 5 of Brisbane radar (66). Extent of

areas is 256 x 256 km with a pixel size of 0.5 km.

4. RAINFALL ENSEMBLES AND PROBABILISTIC VERIFICATION

5-min rainfall ensemble forecasts were calculated every 5 minutes up to 90 minutes after the

observation time using each one of the time steps of the 100 selected rainfall events. A total of 96

members were calculated for each lead time providing an opportunity to build robust statistical

scores and test the impact of ensemble size on the performance and accuracy of the forecast rainfall

fields. Simulations were undertaken with the assistance of resources and services from the

National Computational Infrastructure (NCI), which is supported by the Australian Government,

using supercomputer "GADI".

STEPS3-ADV stakeholders are mainly interested in the chance of rainfall for the next hour after

the observation time, and therefore the verification analyses in this report use only 60-min

accumulated rainfall fields.

Observed 60-min accumulated rainfall fields were calculated every 5 minutes by adding the

twelve previous 5-min accumulated rainfall fields up to the accumulation time. 60-min

accumulated rainfall fields were calculated only if all twelve 5-min accumulated

rainfall fields in the period were available in the archive. The same accumulation scheme was

applied to each member of the STEPS3-ADV rainfall ensemble forecasts, and therefore a 96-member

ensemble of hourly rainfall forecasts was calculated by adding 5-min rainfall forecasts.
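The accumulation rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in this study: the function name is hypothetical, missing 5-min fields are represented by `None`, and the input list is assumed to be ordered in time with a regular 5-min step.

```python
import numpy as np

STEPS_PER_HOUR = 12  # twelve 5-min fields make one 60-min accumulation


def hourly_accumulations(fields_5min):
    """Return one 60-min accumulation per time step, or None where it cannot be built.

    fields_5min: list of 2-D arrays (mm per 5 min), with None for missing steps.
    Entry i of the result is the sum of fields i-11 .. i, and is None if fewer
    than twelve fields precede it or if any field in the window is missing.
    """
    result = []
    for i in range(len(fields_5min)):
        window = fields_5min[max(0, i - STEPS_PER_HOUR + 1): i + 1]
        if len(window) < STEPS_PER_HOUR or any(f is None for f in window):
            result.append(None)
        else:
            result.append(np.sum(window, axis=0))
    return result


# Thirteen fields of 0.5 mm per 5 min followed by one missing time step.
fields = [np.full((2, 2), 0.5) for _ in range(13)] + [None]
acc = hourly_accumulations(fields)
```

The same function could be applied member by member to the 5-min forecast fields, which is how the text describes the hourly ensemble being built.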

Verification analyses were made for each rainfall event and then individual verification results

were concatenated to increase the number of samples and obtain statistically significant results

for each radar. Additionally, verification results from multiple radars were combined to

understand the overall performance of STEPS3-ADV.


4.1 Qualitative evaluation of forecast mean areal rainfall

The following figures show examples of STEPS3-ADV rainfall ensemble forecasts compared

with observed mean rainfall for the first half of event 5 of the Brisbane radar using 60-min

accumulation rainfall fields.

Figure 5 shows mean observed rainfall values for the whole Brisbane radar domain (black) and a

selection of domain wide mean forecast rainfall values for each member of STEPS3-ADV

ensembles (each member in a different colour) calculated using observed rainfall at different

times. Observed mean rainfall corresponds with the 60-min accumulation ending at the marked

time. As the frequency of the original dataset is 5 min, 60-min accumulations can also be estimated every

5 minutes for lead times in the range of 60 to 90 minutes. This figure shows how the

forecast mean rainfall of the members of the STEPS3-ADV rainfall forecast ensemble evolves

around the observed mean rainfall and how the spread of the ensemble members seems to vary depending

on the calculation time.

Figure 5. Time series of domain wide mean (observed and forecast) rainfall for first half of event 5 of Brisbane

radar. Values in black correspond with the mean of 60-min accumulated rainfall fields calculated from 5-min

accumulated radar data every 5 minutes. Multi-coloured lines correspond with the mean of forecast rainfall

fields for each member of a selection of 96-member STEPS3-ADV rainfall ensembles.

To facilitate comparisons, Figure 6 shows detailed versions of some of the ensemble results

presented in Figure 5. In addition to domain wide mean rainfall forecast for each of the members

of the ensemble, red lines in Figure 6 represent the domain wide mean rainfall of the whole

ensemble, and the base times used to calculate the 60-min rainfall forecasts ensembles are

highlighted in red. Results come from 96-member ensembles that were calculated using observed

rainfall at a) 2020-02-05 13:00UTC, b) 2020-02-06 00:00UTC and c) 2020-02-06 19:00UTC.

Note that as the comparison is done with 60-min accumulations for both observed and forecast

rainfall fields, the first forecasts are only available one hour after the base time of the ensemble

and from then, accumulated forecasts are calculated every 5 minutes until the end of forecast (90

minutes).


For all cases in Figure 6, the ensemble mean areal rainfall remains almost the same for all lead

times, which is the expected behaviour of the STEPS methodology. The mean areal rainfall values

calculated for each of the ensemble members spread nicely around the observed domain wide

mean rainfall, with the spread increasing with the lead time and changing depending on the

conditions at the base time. For example, in the first case (Figure 6 (left), 2020-02-05 13:00UTC)

the mean areal rainfall of ensemble members is scattered about 1.5 mm for the first lead time and

extends to about 2.5 mm for the 90-min lead time. In the last case (Figure 6 (right), 2020-02-06

19:00UTC), mean areal rainfall of the ensemble members scatters about 4 mm for the 60-min

lead time and expands to almost 5 mm for the 90-min lead time.

It is important to note that STEPS3-ADV rainfall ensembles may however not show this nice

spread around the observed mean in other conditions. For example, in cases with strong changes

in the mean rainfall, either abrupt rises or decays, the assumption that the mean rainfall must

remain the same for the duration of the forecast may not be applicable and ensemble forecasts

may spread away from the observed mean rainfall values. This is the main reason to limit the forecast

duration of radar-only STEPS forecasts to 90 minutes, and motivates the use of additional data

sources (such as NWP) to blend with radar that may better estimate the evolution of the mean

rainfall in the forecast area for longer periods.

Figure 6. Observed and forecast mean rainfall values for selected STEPS3-ADV rainfall ensembles. Values

are 60-min accumulated rainfall fields. Forecasts correspond to 96-member rainfall ensembles calculated

using observed radar rainfall at 2020-02-05 13:00UTC (left), 2020-02-06 00:00UTC (centre) and

2020-02-06 19:00UTC (right) over the Brisbane (Mt. Stapylton) Radar.

4.2 Qualitative evaluation of forecast rainfall fields

Previous figures have compared the STEPS3-ADV rainfall ensembles with observed rainfall only

in terms of their mean areal rainfall values. Next, some examples of 60-min rainfall fields are

presented in the form of “postage stamps” to illustrate spatial similarities between observed and

forecast ensemble rainfall fields. Each figure includes, for one given time step, the estimated 60-

min accumulated radar rainfall (‘true’ rainfall) (top left), and from the STEPS3-ADV rainfall

ensemble calculated one hour earlier (lead time 60 minutes), the 60-min ensemble mean rainfall

forecast (second stamp from top row), and six 60-min rainfall forecast fields that correspond with

six different members of the 96 members available. Rainfall fields from single members in Figure

7, Figure 8 and Figure 9 generally show a good agreement with the estimated rainfall by radar at

the large scale, while providing clear alternatives at medium and small scales. In other words,

large areas of rainfall predicted in the members of the ensemble usually cover the same areas as

the observed accumulation, and local areas of heavy intensities are usually predicted near the


locations of intense rain in the observed field. The ensemble mean rainfall fields are smoother

than the individual member fields, clearly highlighting those areas where rainfall is most likely

to occur, with values to be considered by users as the expected (most likely) value for a given

location, lead time and time step. The ensemble mean rainfall does not, however, provide information about

the possible extreme values for a given location and lead time, which may be useful to some

users (for example, in extreme weather situations).

Figure 7. 60-min accumulated rainfall fields for Brisbane radar at 2020-02-05 14:00UTC, that correspond

with (top row, from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall

forecast ensemble calculated at 2020-02-05 13:00UTC, and six individual members from the same rainfall

ensemble.

Figure 8. As Figure 7 but at 2020-02-06 01:00UTC. In this case, rainfall ensemble was calculated at 2020-

02-06 00:00UTC.


Figure 9. As Figure 7 but at 2020-02-06 20:00UTC. In this case, forecast rainfall ensemble was calculated

at 2020-02-06 19:00UTC.

It is worth noting again that 60-min forecast fields presented in the previous figures correspond

with accumulations of ‘native’ 5-min rainfall forecasts produced by STEPS3-ADV. Next,

examples of 5-min rainfall forecasts from STEPS3-ADV are presented and some of their characteristics

are discussed. Figure 10 shows an example of 5-min rainfall observations and forecast for

Brisbane radar at 2020-02-05 21:55UTC where ensemble mean and members correspond with a

lead time of 10 minutes (i.e., forecast rainfall ensemble was calculated using radar rainfall data

observed at 2020-02-05 21:45UTC). Figure 10 is a clear example of the diversity of forecast

rainfall fields produced by STEPS3-ADV at 5-min time steps, where large-scale features are

preserved in all the members of the ensemble but additionally, each member is enriched with

numerous, different, spatially coherent medium- and small-scale features. In this way, forecast

rainfall fields match large features and stress areas with high intensities that were observed by the

weather radar later.

In addition to these spatially correlated features, STEPS3-ADV also provides a temporal connection

between different lead times for each of the ensemble members. For one given member, forecast

rainfall fields evolve in time in the same way that observed rainfall fields do. It is important to

note that each member evolves in time differently to other members of the same ensemble. For

example, Figure 11 shows an example of the temporal evolution of the rainfall fields labelled as

member 23 of the STEPS3-ADV ensemble calculated at 2020-02-05 21:45UTC for increasing lead

times from 5 minutes to 90 minutes. In this case, as the forecast moves to longer lead times, the

overall rainy area decays, with high intensity cells persisting across the region for the

whole duration of the forecast, while their locations and intensities change slowly. Alternative

scenarios can be found by selecting other members in the same ensemble, such as the one presented in

Figure 12 for member 43. Forecast rainfall fields from member 43 indicate that after 30 minutes

the high intensity cells will mostly disappear from the western half of the region, with overall

rainfall areas heavily reduced as well.


This diversity in STEPS3-ADV rainfall forecasts becomes more relevant when smaller regions are

examined. Figure 13 and Figure 14 show close-ups of 100 km around Brisbane City for the same

forecast rainfall fields shown in Figure 11 and Figure 12, respectively. It could be argued that this

may be the working scale for many hydrological models and large flash-flooding warning

systems. Variability at kilometre and sub-kilometre scales is quite remarkable in this case. For

example, member 23 predicts that a band of high-intensity cells will cross North Stradbroke Island from

north to south while increasing in intensity during the first 30 minutes of the forecast period, but

then limits its development to the eastern half of the domain for the rest of the forecast period.

On the other hand, member 43 predicts the arrival of a band of intense, though weaker, rainfall cells

during the first 15 minutes of the forecast period, which quickly decays and reduces to a narrow

area of intense precipitation localised around Gold Coast city. Both members agree strongly in

forecasting the occurrence of high-intensity precipitation areas around Brisbane City (top third, centre

of the images) within the first 15 minutes of the forecast period.

Figure 10. 5-min rainfall fields for Brisbane radar at 2020-02-05 21:55UTC, that correspond with (top row,

from left to right) estimated radar rainfall, mean rainfall from 96-member STEPS3-ADV rainfall forecast

ensemble calculated at 2020-02-05 21:45UTC, and six individual members from the same forecast rainfall

ensemble.


Figure 11. 5-min forecast rainfall fields for member 23 from a 96-member STEPS3-ADV rainfall forecast

ensemble calculated at 2020-02-05 21:45UTC for Brisbane radar. Rainfall fields correspond with (from top

to bottom, and left to right) 5-, 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80- and 90-min lead times. Size of rainfall

fields is 256 x 256 km at 0.5 km resolution.

Figure 12. As Figure 11, but 5-min forecast rainfall fields are here extracted from member 43. Size of rainfall

fields is 256 x 256 km at 0.5 km resolution.


Figure 13. Same as in Figure 11, but 5-min forecast rainfall fields depict here a close-up area of 100x100

km centred at Brisbane city.

Figure 14. Same as Figure 12, but 5-min forecast rainfall fields depict here close-up areas of 100x100 km

centred at Brisbane city.

4.3 Probabilistic Verification

In order to evaluate the quality of STEPS3-ADV rainfall ensembles, the following thresholds for

rainfall in mm/hr were used: 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, and 50.0. The first threshold

identifies the chance of any rainfall; the next four thresholds assess the ability of STEPS3-ADV

to properly identify the chance of light rain, while the last four thresholds address the goal

to properly identify significant and very intense rainfall amounts.
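For illustration, an ensemble of accumulated rainfall fields can be turned into a probability of exceeding each threshold by counting the fraction of members at or above it. The sketch below assumes this simple member-counting estimate, which the report does not spell out; the function name and the tiny four-member ensemble are hypothetical.

```python
import numpy as np

# Thresholds listed in the text above.
THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, 50.0]


def exceedance_probability(ensemble, threshold):
    """Fraction of members at or above `threshold` at each pixel.

    ensemble: array of shape (n_members, ny, nx) with accumulated rainfall.
    Returns an array of shape (ny, nx) with values between 0 and 1.
    """
    return (np.asarray(ensemble) >= threshold).mean(axis=0)


# Four hypothetical members on a 1 x 1 pixel domain.
ensemble = np.array([0.0, 0.5, 1.2, 6.0]).reshape(4, 1, 1)
probabilities = {t: float(exceedance_probability(ensemble, t)[0, 0])
                 for t in THRESHOLDS}
```

With a 96-member ensemble, the same estimate gives probabilities in steps of 1/96.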

An ensemble forecast must have at least the following characteristics to be considered useful:


• enough spread in the ensemble where the forecast values adequately represent the

uncertainty of the forecasts,

• enough reliability where the predicted probabilities of an event correspond to their

observed frequencies, and

• enough skill to forecast extreme values (probabilities near 0 or 100 %) rather than values

clustered around the mean.

In this study, different aspects of forecast quality are characterized by evaluating the root-mean-

square error (RMSE), the ensemble spread, the continuous ranked probability score (CRPS), the

Reliability curve, the Relative Operating Characteristic (ROC) curve, and the Rank histogram.

4.3.1 Root Mean Square Error – Ensemble Spread

One of the more common scores to assess the accuracy of rainfall forecasts consists of plotting,

as a function of lead time, both the root-mean-square error (RMSE) of the ensemble mean and

the average spread of the ensemble. Palmer et al. (2006) showed that in a 'perfect ensemble' the

mean of the spread should be equal to the RMSE over the same period.

RMSE provides the square root of the average square error of the forecasts and has the same units

as the forecasts and observations. The lower the RMSE, the better the ensemble. On the other

hand, the spread of the ensemble is calculated, in some cases, as the square root of the average

ensemble variance, and more commonly as the average of the ensemble standard deviation. Fortin et

al. (2014), however, proved that only the first option is correct.
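The difference between the two spread definitions can be seen in a short sketch. It is illustrative only: the function names are hypothetical, the unbiased variance estimator (`ddof=1`) is an assumption, and no finite-ensemble correction factor is applied.

```python
import numpy as np


def rmse_of_ensemble_mean(ensemble, observed):
    """RMSE between the ensemble mean and the observations over all points."""
    ensemble_mean = ensemble.mean(axis=0)
    return float(np.sqrt(np.mean((ensemble_mean - observed) ** 2)))


def spread_sqrt_mean_variance(ensemble):
    """Spread as the square root of the average ensemble variance (first option)."""
    return float(np.sqrt(np.mean(ensemble.var(axis=0, ddof=1))))


def spread_mean_std(ensemble):
    """Spread as the average ensemble standard deviation (second option)."""
    return float(np.mean(ensemble.std(axis=0, ddof=1)))


# Two hypothetical members at two verification points; shape (n_members, n_points).
ensemble = np.array([[0.0, 0.0],
                     [2.0, 4.0]])
observed = np.array([1.0, 4.0])

rmse = rmse_of_ensemble_mean(ensemble, observed)
spread_a = spread_sqrt_mean_variance(ensemble)
spread_b = spread_mean_std(ensemble)
```

Because the square root is concave, the average of standard deviations can never exceed the square root of the average variance, so the second option tends to report a smaller spread whenever the variance differs between points.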

Figure 15 shows some examples of RMSE and Spread values of STEPS3-ADV rainfall ensembles

for a selection of rainfall events for multiple lead times. In these Spread-RMSE diagrams, an

under-dispersive ensemble (i.e., an ensemble that needs more spread) will have the spread

values sitting below the RMSE values, while spread points sitting above RMSE values represent

an over-dispersive ensemble (i.e., ensemble spread is greater than the RMS error). Results of

STEPS3-ADV ensembles indicate that RMSE increases with lead time, as expected, with the ensembles

being slightly under-dispersive. Additional spread or additional reduction in error may be required

to improve the accuracy of the rainfall ensembles. This overall behaviour seems to persist among

all radars and interestingly the level of under dispersion does not change significantly for the

different lead times assessed here (60 to 90 minutes). Figure 16 shows mean and dispersion of

RMSE and Ensemble spread values for the full set of 100 rainfall events analysed confirming the

increase of RMSE with lead time and an under dispersive behaviour of about 20% for all lead

times. Additional trials and adjustments to the algorithm may be explored to reduce this under

dispersion (such as increasing the variance of the perturbations in the diagnosed field advection

vectors).


Figure 15. RMS error vs spread of STEPS3-ADV 60-min rainfall ensemble forecasts for a selection of rainfall

events. a) Brisbane radar (ID66), event 05; b) Melbourne radar (ID02), event 08; c) Sydney (Terrey Hills)

radar (ID71), event 09; d) Perth (Serpentine) radar (ID70), event 04; e) Canberra radar (ID40), event 01;

and f) Cairns Radar (ID19), event 03.

Figure 16. Distribution of the RMSE (left) and Ensemble Spread (right) based on 100 rainfall events for lead

times from 60 to 90 minutes.


4.3.2 Continuous Ranked Probability Score (CRPS)

The continuous ranked probability score (CRPS) (Hersbach, 2000) is a summary statistic

comparing the forecast cumulative distribution with the corresponding distribution from the

observations. The mean CRPS is the mean of the CRPS values calculated for all forecasts. The

smaller the CRPS values the better, because the differences between the forecast and observed

probability distributions are smaller. The CRPS is expressed in the same unit as the observed

variable. The CRPS generalizes the mean absolute error to probabilistic forecasts. It reduces to

the mean absolute error (MAE) if the forecast is deterministic.
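One common way to estimate the CRPS of an ensemble at a single point is the pairwise-difference form, CRPS = mean|x_i - y| - 0.5 mean|x_i - x_j|, which equals the integrated squared difference between the empirical ensemble distribution and the step function at the observation. The sketch below uses that estimator as an assumption; the report cites Hersbach (2000) and does not state which formulation was implemented, and the function name is hypothetical.

```python
import numpy as np


def crps_ensemble(members, observation):
    """CRPS of a 1-D ensemble against a scalar observation (pairwise form)."""
    members = np.asarray(members, dtype=float)
    # Mean absolute difference between each member and the observation.
    term_obs = np.mean(np.abs(members - observation))
    # Mean absolute difference over all ordered pairs of members.
    term_pairs = np.mean(np.abs(members[:, None] - members[None, :]))
    return float(term_obs - 0.5 * term_pairs)


# A single-member "ensemble" reduces to the absolute error.
crps_deterministic = crps_ensemble([3.0], 1.0)

# Two members bracketing the observation.
crps_two_members = crps_ensemble([0.0, 2.0], 1.0)
```

Averaging this value over all pixels and time steps gives a mean CRPS in millimetres, the unit used in Figure 17.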

CRPS values were calculated for all rainfall events and for lead times in the range of 60 to 90

minutes. Figure 17 shows the distribution of the CRPS values for all events. Mean CRPS values

are small for all lead times with interquartile ranges varying between 0.3 and 0.8 mm. As

expected, CRPS values degrade (increase) as lead time increases but remain on average in the

same range, with at least 75% of the cases showing CRPS values lower than 0.80 mm.

Figure 17. Distribution of CRPS values of STEPS3-ADV rainfall forecasts for 100 rainfall events, per lead

time.

4.3.3 Relative Operating Characteristic (ROC)

Relative Operating Characteristic (ROC) measures the ability of the forecast to discriminate

between two alternative outcomes, therefore measuring resolution. For any event, a graph known

as ROC curve can be constructed to offer information on the expected hit rates and false alarm

rates from using different probability thresholds to initiate action. ROC curves can be used to

identify the probability threshold that provides the best trade-off between hit rate and false alarm

rate for a given decision. When the hit rates exceed the false alarm rates, the forecast is skilful.

The closer the curve is to the top left corner of the plot, the more skilful the forecast will be. A perfect score

is obtained if the curve travels from the bottom left to the top left of the diagram and then across the top to the top right of

the diagram. A forecast that has no skill will have hit rates equal to the false alarm rates and the

ROC curve will be positioned along the diagonal.


The areas under ROC curves (AUC) are commonly used to compare the usefulness of different

scenarios, where a greater area (closer to 1) means a more useful scenario. In general, an AUC of

0.5 suggests no ability of discrimination, 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is

considered excellent, and more than 0.9 is considered outstanding (Hosmer et al., 2013).
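A minimal sketch of how a ROC curve and its area could be built from forecast probabilities and binary observations is given below. It is not the code used in this study: the function names and probability thresholds are hypothetical, the area is integrated with the trapezoidal rule after adding the end points (0, 0) and (1, 1), and the sample is assumed to contain both events and non-events.

```python
import numpy as np


def roc_points(prob, occurred, prob_thresholds):
    """False alarm rate and hit rate for each probability threshold."""
    prob = np.asarray(prob, dtype=float)
    occurred = np.asarray(occurred, dtype=bool)
    false_alarm_rates, hit_rates = [], []
    for p in prob_thresholds:
        warned = prob >= p                      # action taken at this threshold
        hits = np.sum(warned & occurred)
        misses = np.sum(~warned & occurred)
        false_alarms = np.sum(warned & ~occurred)
        correct_negatives = np.sum(~warned & ~occurred)
        # Assumes at least one event and one non-event in the sample.
        hit_rates.append(hits / (hits + misses))
        false_alarm_rates.append(false_alarms / (false_alarms + correct_negatives))
    return np.array(false_alarm_rates), np.array(hit_rates)


def roc_area(false_alarm_rates, hit_rates):
    """Area under the ROC curve using the trapezoidal rule."""
    far = np.concatenate(([0.0], false_alarm_rates, [1.0]))
    hr = np.concatenate(([0.0], hit_rates, [1.0]))
    order = np.lexsort((hr, far))               # sort by false alarm rate, then hit rate
    far, hr = far[order], hr[order]
    return float(np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2.0))


thresholds = [0.25, 0.5, 0.75]
auc_perfect = roc_area(*roc_points([0.0, 0.1, 0.9, 1.0], [0, 0, 1, 1], thresholds))
auc_no_skill = roc_area(*roc_points([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1], thresholds))
```

The two toy cases reproduce the limits described in the text: a perfectly discriminating forecast reaches the top left corner, and a forecast that issues the same probability regardless of the outcome lies on the diagonal.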


Figure 18. ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane

radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). ROCs correspond with multiple

rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times a) 60 minutes, b)

70 minutes, c) 80 minutes and d) 90 minutes.

Figure 18 shows an example of ROC curves for multiple lead times and rainfall thresholds for

STEPS3-ADV 96-member rainfall forecasts corresponding to event 5 of the Brisbane radar. ROC curves

for all thresholds are typical of a system that allows a good discrimination of events, with curves

close to the top left corner of the diagram. As expected, the longer the lead time, the less able the system is

to identify events, but in this case, ROCs show that the system is still useful to identify

events up to 50 mm in an hour for a lead time of 90 minutes. Although ROCs change from event

to event and from radar to radar, the behaviours shown in Figure 18 seem to be representative of


other rainfall events, as shown in Figure 19 where the mean ROC and spread for 100 events for a

lead time of 60 minutes are presented for multiple thresholds.

Figure 19. Mean and spread of ROC curves for STEPS3-ADV 96-member 60-min accumulated rainfall

ensembles for 100 events. Data correspond with multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10,

20 and 50 mm) for a lead time of 60 minutes.

Overall results can be seen in Figure 20, which summarizes the ROC areas for all 100

events analysed in this study for multiple lead times and thresholds. These results indicate that

STEPS3-ADV can effectively identify the occurrence of rainfall events using rainfall thresholds

up to 10 mm in an hour for all lead times, and up to 20 mm for 60- or 70-minute lead times.

Unfortunately, STEPS3-ADV seems to be unable to properly identify the occurrence of rainfall

events of 50mm in an hour for the longer lead times, although for a lead time of 60 minutes it was

able to provide useful advice for about half of the events. This is mainly due to the limited number

of time steps having 50 mm in an hour. Note that there were only 38 events where at least one

of the observed rainfall values exceeded the threshold of 50 mm.


Figure 20. ROC area results for 100 rainfall events and multiple rainfall thresholds (mm), grouped by Lead

Time (minutes).

4.3.4 Rank (Talagrand) Histogram

A more detailed way of analysing the ensemble spread is to construct a rank histogram or

Talagrand diagram (Talagrand et al., 1997). The Talagrand diagram is the histogram of

frequencies of the rank of the observed data within the forecast ensemble.

In a good ensemble forecast system, all members should have equal ability to capture the

observations, thus the observed dataset should be distributed among the ensemble members

uniformly and the Talagrand diagram would be flat. If the ensemble spread is too large, the rank

histogram is ∩-shaped indicating that many observations are falling near the centre of the

ensemble; on the contrary, if ensemble spread is too small and therefore many observations are

falling outside the extremes of the ensemble, the rank histogram is ∪-shaped. If rank histogram

shows an asymmetric shape, that indicates the presence of bias in the ensemble.
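A rank histogram of this kind can be sketched in a few lines. The example is illustrative only: the function name is hypothetical, and the random tie-breaking (useful for rainfall, where many members and observations are exactly zero) is an assumption that the report does not describe.

```python
import numpy as np


def rank_histogram(ensemble, observed, rng=None):
    """Counts of the rank of each observation within its ensemble.

    ensemble: array of shape (n_members, n_samples).
    observed: array of shape (n_samples,).
    Returns an array of n_members + 1 counts (rank 0 = below all members).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_members = ensemble.shape[0]
    below = np.sum(ensemble < observed, axis=0)   # members strictly below the observation
    ties = np.sum(ensemble == observed, axis=0)   # members equal to the observation
    # Assumption: ties are broken at random so that zeros do not pile up in one rank.
    ranks = below + rng.integers(0, ties + 1)
    return np.bincount(ranks, minlength=n_members + 1)


# Three hypothetical members and three verification samples, without ties.
ensemble = np.array([[1.0, 1.0, 1.0],
                     [2.0, 2.0, 2.0],
                     [3.0, 3.0, 3.0]])
observed = np.array([0.5, 2.5, 3.5])
counts = rank_histogram(ensemble, observed, rng=np.random.default_rng(0))
```

For a 96-member ensemble the histogram has 97 bins, and a flat histogram corresponds to each bin holding about 1/97 of the observations.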

Figure 21 shows rank histograms for lead times from 60 to 90 minutes for STEPS3-ADV 96-

member rainfall ensembles for the event 05 from the Brisbane radar. All histograms show an

asymmetric shape, with the first and last rank accounting for more observations than the other

ranks. This shape may indicate the presence of a bias in the ensemble; in this case, rainfall

forecasts tend to be lower than observations. This shape may also result from the initiation of new

raining areas not being captured by STEPS. Note that STEPS requires that some rain has

been detected by the radar to be able to nowcast rain rates for the following time steps. If the radar

has not detected any rainfall, the nowcast values will be zero. Therefore, STEPS will underestimate

forecast rainfall in cases where the radar has not detected any rain, but rapidly moving rain bands

enter the radar umbrella, or when convective rainfall is initiated by orographic enhancement or

coastal effects. Once again, rank histograms vary from radar to radar and from event to event, but

these ones can be considered typical. Figure 22 summarizes Rank histogram results showing the

distribution of rank frequencies for all 100 events analysed in this study for the 60-minute lead

time. The overall behaviour matches that of the single event described earlier: the asymmetric

shape of the summary rank histogram shows that the ensembles tend to be lower than the

observations.


Figure 21. Rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for

Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Histograms

correspond to lead times of a) 60 minutes, b) 70 minutes, c) 80 minutes and d) 90 minutes.

Figure 22. Distribution of rank histograms for STEPS3-ADV 96-member 60-min accumulated rainfall

ensembles for 100 rainfall events. Histograms correspond to a lead time of 60 minutes.

4.3.5 Reliability (Attribute) Diagrams

The reliability or attribute diagram measures how well the predicted probabilities of an event

correspond to their observed frequencies (reliability). In this diagram, the observed frequency is

plotted against forecast probability for all probability categories, with a line close to the

diagonal indicating good reliability. A deviation below the diagonal represents forecast probabilities that are too

high (ensemble is predicting higher chances than observed frequency). On the other hand, a

deviation above the diagonal indicates forecast probabilities that are too low, i.e., the ensemble is


predicting less chance than the observed frequency. The flatter the curve in the diagram, the less

resolution and reliability it has.
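The points of a reliability diagram can be obtained by grouping forecast probabilities into bins and computing the observed frequency in each bin, as in the sketch below. It is illustrative only: the function name and the choice of ten equal-width bins are assumptions, not details taken from this study.

```python
import numpy as np


def reliability_curve(prob, occurred, n_bins=10):
    """Mean forecast probability, observed frequency and sample count per bin."""
    prob = np.asarray(prob, dtype=float).ravel()
    occurred = np.asarray(occurred, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index from 0 to n_bins - 1; a probability of exactly 1 goes in the last bin.
    index = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(index, minlength=n_bins)
    sum_prob = np.bincount(index, weights=prob, minlength=n_bins)
    sum_obs = np.bincount(index, weights=occurred, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_prob = sum_prob / counts     # NaN where a bin is empty
        obs_freq = sum_obs / counts
    return mean_prob, obs_freq, counts


# Four hypothetical forecast/observation pairs.
mean_prob, obs_freq, counts = reliability_curve([0.05, 0.05, 0.95, 0.95],
                                                [0, 0, 1, 1])
```

Plotting `obs_freq` against `mean_prob` gives the reliability line, while `counts` corresponds to the bar chart of how often each probability was issued.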

Figure 23 shows an example of Reliability diagrams for a lead time of 60-min and multiple rainfall

thresholds for STEPS3-ADV 96-member rainfall forecasts corresponding to event 5 of the

Brisbane radar. Dashed horizontal lines show the climatological frequency for the given

threshold, and the dotted lines midway between the diagonal line and the horizontal denote "no

skill" relative to climatology. Shaded regions highlight areas where an ensemble has good

reliability and therefore skill. The bar charts below each diagram show the number of times each

probability value was predicted.
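Because the "no skill" line lies midway between the diagonal and the climatological frequency, a point falls in the region of positive skill when its observed frequency is closer to the forecast probability than to climatology. A minimal sketch of that test follows; the function name and the sample values are hypothetical.

```python
def has_positive_skill(forecast_prob, observed_freq, climatology):
    """True if the point lies between the diagonal and the 'no skill' line.

    The 'no skill' line is observed_freq = (forecast_prob + climatology) / 2,
    so a point has positive skill when it is closer to the diagonal than to
    the climatological frequency.
    """
    return abs(observed_freq - forecast_prob) < abs(observed_freq - climatology)


# Hypothetical threshold with a climatological frequency of 0.2.
near_diagonal = has_positive_skill(0.8, 0.75, 0.2)     # close to the 1:1 line
near_climatology = has_positive_skill(0.8, 0.3, 0.2)   # close to the horizontal line
```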

In this case, Reliability plots seem to indicate that ensembles adequately predict the extreme

probabilities but show deviations from the expected values in the middle probabilities for the

lowest rainfall threshold. However, results for rainfall thresholds in the range of 0.4 to 20 mm

seem to have a good reliability for the whole range of probabilities for this lead time. Reliabilities

for the 50 mm threshold are clearly inadequate and seem to be affected by the limited number of

observed values having 50 mm in an hour to establish a valid comparison.

It is important to note that according to the probability histograms, STEPS3-ADV does not

forecast the middle probabilities very often, which is another indication that ensembles are

very ‘sharp’. STEPS3-ADV ensembles seem to be very confident in saying that a rainfall event

above a given threshold 'definitely won't happen' or 'definitely will happen', but they seem to have a

reduced reliability when forecasting middle probabilities. These results are consistent with the high ROC

area values observed for this event discussed earlier and displayed in Figure 18.

As the reliability of a rainfall ensemble changes from rain event to rain event, from radar to radar and

from lead time to lead time, the behaviours shown in Figure 23 may not be representative of

the whole archive. In order to identify an overall result, Figure 24 summarizes reliability diagrams

for the 100 events analysed in this experiment. Each line in the figure aggregates over all values

of observed relative frequency at each forecast probability bin and displays the mean and a

confidence interval (5 - 95 percentiles) using coloured shaded areas.

In an attempt to summarize areas with positive skill for all events, grey scaled backgrounds have

been added to each one of the reliability plots in Figure 24. In the case of one single event, blue

shaded areas were used to indicate areas of positive skill (as shown in Figure 23). These areas

of positive skill change from event to event as they are based on the climatological frequency of

the rainfall event. As 100 events are summarized here, the larger the number of events for which a

forecast probability – observed frequency pair has positive skill, the darker the background

becomes. If the pair has positive skill in all events (such as all the points along the 1:1

line), the background for that pair is the darkest. On the other hand, if only few events or no events

identified a pair with a positive skill that pair is barely painted. Backgrounds in Figure 24

correspond with these climatology areas for a lead time of 60-min.

Aggregated results show that STEPS3-ADV ensembles have a good reliability for rainfall

thresholds up to 5 mm/hr for all the lead times assessed in this study. Reliabilities seem to have

a positive skill for the threshold of 10 mm/hr up to 80-min lead time, and for 20 mm/hr up to 60-min

only, with reliability decaying for longer lead times. Reliabilities for the 50 mm/hr threshold seem

to be on average inadequate for any of the lead times analysed, although a lack of enough

observed data in this range of precipitation to allow a robust score for this threshold may be

the cause of these results. These results are consistent with those shown in Figure 23 for

Page 32: STEPS3 ADV VERIFICATION REPORT

STEPS3 – ADV – VERIFICATION REPORT

24

the longest event in the dataset and also with the high ROC area values discussed earlier and

displayed in Figure 20.

Figure 23. Reliability diagrams for STEPS3-ADV 96-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond to a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom: 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm). Dashed horizontal lines show the climatological frequency for the given threshold, and the dotted lines midway between the 1:1 diagonal line and the horizontal line denote "no skill" relative to climatology. Shaded regions show the areas where ensembles have good reliability and therefore skill. Bar charts below each diagram show the number of times each probability value was predicted.

Figure 24. Reliability diagrams for 100 rainfall events for multiple rainfall thresholds (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm) and different lead times (60, 70, 80 and 90 minutes). Lines correspond to the mean of the 100 event results and shaded areas depict the 5th to 95th percentiles for the indicated lead times and rainfall thresholds. Backgrounds correspond to climatology areas of positive skill for a lead time of 60 min.

4.4 Performance based on different number of ensemble members

To understand the performance of forecasts based on different numbers of ensemble members, the longest event in the dataset (Radar 66, Event 5) was further verified using ensembles of 48, 24, 12 and 6 members in addition to the original 96 members. This analysis provides insight into the optimum number of ensemble members for providing reliable probabilistic forecasts of rain/no rain and of the probability of extreme/high rainfall intensity.

Figure 25 presents a comparison of the ROC curves using 48, 24 and 12 members for lead times from 60 to 80 minutes. ROCs in this figure can be directly compared with the ROC curves based on 96 members that are shown in Figure 18. Overall, the area under the ROC curve decreases as the number of ensemble members is reduced. This decrease in ROC area seems to be more prominent at higher rainfall thresholds (e.g. 20 and 50 mm/hr).

Figure 25. ROCs for STEPS3-ADV 60-min accumulated rainfall ensembles with different numbers of ensemble members for (top row) T+60, (centre) T+70 and (bottom row) T+80 for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

The area under the ROC curve for ensembles with different numbers of members is presented in Figure 26. Results show that, as expected, the larger the number of ensemble members, the higher the area under the ROC curve and the more useful the ensemble is. However, for lower rainfall thresholds (from 0.2 up to 5 mm), the ROC areas remain close to the baseline derived from the 96-member ensemble for ensembles with as few as 12 members. For the higher rainfall thresholds (10 to 50 mm), however, ROC areas obtained using fewer ensemble members are significantly lower than the baseline. This means ensembles with as few as 12 members may be enough for predicting rain/no rain for the lower rainfall intensities (up to 5 mm) for all lead times; however, for higher-intensity rainfall, a larger number of ensemble members is required to provide useful predictions for lead times longer than 70 minutes. Although this result is based on a single event, a similar trend in the area under the ROC curve is expected for other radars and rainfall events.
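A minimal sketch of this kind of experiment is given below. It assumes that reduced ensembles are formed by keeping the first members of the full ensemble and that the ROC curve is built from 11 equally spaced probability levels; the report does not state how members were selected or which levels were used, so both choices, and the function names, are hypothetical.

```python
import numpy as np

def exceedance_probability(ensemble, threshold, n_members=None):
    """Fraction of members at or above the threshold, using the first n_members."""
    members = ensemble if n_members is None else ensemble[:n_members]
    return (members >= threshold).mean(axis=0)

def roc_area(prob, event, n_levels=11):
    """Area under the ROC curve (hit rate against false alarm rate)."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=bool)
    n_event = event.sum()
    n_non_event = (~event).sum()
    if n_event == 0 or n_non_event == 0:
        return float("nan")  # ROC is undefined without both outcomes
    pod, pofd = [0.0], [0.0]  # "never warn" corner of the curve
    for level in np.linspace(1.0, 0.0, n_levels):
        warned = prob >= level
        pod.append((warned & event).sum() / n_event)
        pofd.append((warned & ~event).sum() / n_non_event)
    pod, pofd = np.array(pod), np.array(pofd)
    # Trapezoidal integration of hit rate over false alarm rate.
    return float(np.sum(np.diff(pofd) * (pod[1:] + pod[:-1]) / 2.0))
```

Looping `n_members` over 96, 48, 24, 12 and 6 for each threshold and lead time would give a set of ROC areas comparable in form to those in Figure 26. An area of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.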

Figure 26. ROC areas for different numbers of members in a STEPS3-ADV rainfall ensemble for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond to 60-min accumulated rainfall ensembles for event 5 of the Brisbane radar.

CRPS results using ensembles with different numbers of members for the same event as above are presented in Figure 27. Results seem to confirm the conclusions drawn from the ROC areas. The higher the number of members, the lower the CRPS values and therefore the more accurate the ensemble. As expected, when the number of members is reduced, higher CRPS values are obtained, indicating that the ensembles are less accurate. CRPS values are lower than 0.9 mm/hr for ensembles with more than 24 members; however, this value increases above 1.0 mm/hr for an ensemble with just 6 members, indicating larger errors and therefore lower performance when the number of members is heavily reduced.
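For readers unfamiliar with the score, the sketch below shows one common way to estimate CRPS from an ensemble, the "energy" form E|X - y| - 0.5 E|X - X'|, averaged over the domain. Whether this report used this estimator or the decomposition of Hersbach (2000) is not stated here, so the choice of estimator and the function name are assumptions.

```python
import numpy as np

def ensemble_crps(ensemble, observed):
    """Domain-mean CRPS of an ensemble forecast.

    ensemble: array (n_members, n_points); observed: array (n_points,).
    The result has the same units as the inputs (lower is better).
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Mean absolute error of the members against the observation.
    term1 = np.abs(ens - obs).mean(axis=0)
    # Mean absolute difference between all pairs of members.
    # Note: builds an (n_members, n_members, n_points) array, so large
    # grids may need to be processed in chunks.
    term2 = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))
    return float((term1 - 0.5 * term2).mean())
```

With a single member the second term vanishes and the score reduces to the mean absolute error, which is why CRPS is often read as a probabilistic generalisation of that error.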

Figure 27. Evolution of CRPS values over the domain for different numbers of ensemble members. Results correspond to 60-min accumulated rainfall ensembles for event 5 of the Brisbane radar (ID 66).

4.5 Comparison with existing operational system, STEPS1-ADV

This section compares STEPS3-ADV with the existing operational system (henceforth referred to as STEPS1-ADV). STEPS1-ADV is used operationally at the Bureau of Meteorology, where it generates 10 ensemble members for four radars. Table 3 shows the difference in average computation time for STEPS3-ADV and STEPS1-ADV using 24-core machines on NCI's supercomputer GADI. Recorded durations include the total computation time for all processes, from reading data and creating ensembles to writing results back to disk. STEPS3-ADV is at least 30 times faster than the existing STEPS1-ADV. One of the main reasons behind this improvement is that STEPS3-ADV was designed to utilise multiple cores and threads, making more efficient use of available resources, while STEPS1-ADV is a single-core implementation.

Table 3. Average computation times (in seconds) for STEPS3-ADV and STEPS1-ADV using 24 cores

Ensemble members | STEPS3-ADV | STEPS1-ADV
96 | 24 | 780
24 | 12 | 360

As STEPS1-ADV takes significant time to generate 96 members, rainfall ensembles with only 24 members were calculated with STEPS1-ADV for all 100 rainfall events to reduce the computational time. Those 24-member rainfall ensembles are compared in this section with the rainfall ensembles calculated using STEPS3-ADV (reduced to 24 members).

Figure 28 compares the distribution of the ROC area for a 60-minute lead time by threshold and radar for both versions of STEPS-ADV. An increase in the ROC area for the rainfall ensembles calculated by STEPS3-ADV is clear for all thresholds and radars. Also, the variability of the ROC area is reduced for STEPS3-ADV compared with the existing operational STEPS1-ADV.

Figure 28. Comparison of ROC Area distribution for 60-minute lead time based on STEPS1-ADV and

STEPS3-ADV

Similarly, Figure 29 compares the reliability plots based on STEPS3-ADV and STEPS1-ADV for 60-, 70-, 80- and 90-minute lead times considering all 100 rainfall events. There is a clear and significant improvement in the results based on STEPS3-ADV for all rainfall thresholds. For rainfall thresholds from 0.2 to 5.0 mm, reliability curves produced from STEPS3-ADV rainfall ensembles are close to the 1:1 line, indicating superior reliability compared with STEPS1-ADV rainfall ensembles, which predict significantly lower chances than the observed frequencies for the lower probabilities. For higher rainfall thresholds (10 and 20 mm/hr), STEPS3-ADV was able to provide adequately reliable results up to 60-minute lead times, showing better performance than STEPS1-ADV, which was unable to provide reliable results for the same rainfall thresholds.

Figure 30 compares the CRPS distributions of rainfall ensembles calculated by STEPS3-ADV and STEPS1-ADV for 100 events. The mean CRPS for all lead times is lower for STEPS3-ADV than for STEPS1-ADV, indicating better performance; however, STEPS3-ADV showed higher outlier values of CRPS than STEPS1-ADV.

Figure 31 compares RMSE and ensemble spread for STEPS3-ADV and STEPS1-ADV for 100 events. Overall, RMSE values for all lead times (60 to 90 minutes) based on STEPS3-ADV are lower than those obtained using STEPS1-ADV, which indicates that the new STEPS3-ADV is more accurate. STEPS3-ADV also shows higher values of ensemble spread than STEPS1-ADV, which indicates that STEPS3-ADV produces rainfall ensembles that are less under-dispersive than the current ones.
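The comparison of RMSE with ensemble spread can be sketched as below. The definitions used here (RMSE of the ensemble mean, and spread as the square root of the domain-averaged ensemble variance, in the spirit of Fortin et al., 2014) are assumptions about how the two quantities were computed; the function name is hypothetical.

```python
import numpy as np

def rmse_and_spread(ensemble, observed):
    """Return (RMSE of the ensemble mean, ensemble spread).

    ensemble: array (n_members, n_points) with at least two members
    observed: array (n_points,)
    """
    ens = np.asarray(ensemble, dtype=float)
    obs = np.asarray(observed, dtype=float)
    ens_mean = ens.mean(axis=0)
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    # Square root of the average (unbiased) ensemble variance.
    spread = np.sqrt(np.mean(ens.var(axis=0, ddof=1)))
    return float(rmse), float(spread)
```

A spread clearly smaller than the RMSE suggests an under-dispersive ensemble, while a spread close to the RMSE suggests that the ensemble represents its own uncertainty reasonably well.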

Figure 29. Comparison of reliability plots for 60-, 70-, 80- and 90-minute lead times considering all 100 events based on STEPS1-ADV and STEPS3-ADV (panel titles: STEPS3-ADV, STEPS1-ADV). Results are based on 24-member ensembles. Backgrounds correspond to climatology areas of positive skill for a lead time of 60 min.

Figure 30. Comparison of CRPS based on STEPS3-ADV and STEPS1-ADV. Results are based on 24-member ensembles for all 100 events.

Figure 31. Comparison of RMSE (left panel) and ensemble spread (right panel) for STEPS3-ADV (orange)

and STEPS1-ADV (blue)

4.6 Comparison with pySTEPS

A few open-source tools are used by the broad scientific community involved in rainfall nowcasting, mainly for research purposes. It is therefore worth comparing the performance of at least one of those research-based tools with our latest STEPS3-ADV to get an idea of its operational and scientific value. Thus, this section compares the performance of STEPS3-ADV and pySTEPS in providing ensemble rainfall forecasts for lead times from 60 to 90 minutes. pySTEPS (Pulkkinen et al., 2019) is an open-source Python library for probabilistic precipitation nowcasting based on the STEPS methodology, written mainly for research purposes. For the sake of simplicity and to reduce the computation time, 24-member ensembles for the longest rainfall event in the database were considered for this analysis. Table 4 shows the difference in average computation times (including the total computation time for all processes, from reading data and creating ensembles to writing results back to disk) for STEPS3-ADV and pySTEPS using a 24-core machine on NCI's supercomputer GADI. STEPS3-ADV was observed to be at least 15 times faster than pySTEPS.

Table 4. Average computation times (in seconds) for STEPS3-ADV and pySTEPS using 24 cores

Ensemble members | STEPS3-ADV | pySTEPS
96 | 24 | 360
24 | 12 | 240
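The speed-up factors quoted in the text follow directly from the tabulated times. The short calculation below only restates the values of Table 3 and Table 4; the variable and function names are illustrative.

```python
# Average computation times in seconds, copied from Table 3 and Table 4.
# Keys are ensemble sizes; values are (STEPS3-ADV, other system).
table3 = {96: (24, 780), 24: (12, 360)}   # other system: STEPS1-ADV
table4 = {96: (24, 360), 24: (12, 240)}   # other system: pySTEPS

def speed_up(table):
    """Ratio of the other system's time to the STEPS3-ADV time."""
    return {members: other / steps3 for members, (steps3, other) in table.items()}

print(speed_up(table3))  # {96: 32.5, 24: 30.0}
print(speed_up(table4))  # {96: 15.0, 24: 20.0}
```

The ratios range from 30 to 32.5 against STEPS1-ADV and from 15 to 20 against pySTEPS, depending on the ensemble size.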

Comparisons between the performance and quality of rainfall forecasts of STEPS3-ADV,

STEPS1-ADV and pySTEPS using a few probabilistic scores are presented next.

Figure 32 presents a comparison of the ROC curves obtained from the three systems using 24 ensemble members for a lead time of 60 min and multiple thresholds. For the lower rainfall thresholds, the area under the ROC curve obtained using STEPS3-ADV is very similar to that obtained from pySTEPS; however, for the higher rainfall thresholds STEPS3-ADV showed a higher area, indicating better performance than STEPS1-ADV and pySTEPS. For all thresholds, STEPS1-ADV showed the worst performance among the three alternatives analysed here.

Figure 33 shows the evolution of the area under the ROC curve over multiple lead times for the three systems. For all thresholds and all lead times, ROC areas are higher for STEPS3-ADV and pySTEPS than for STEPS1-ADV. Note that STEPS3-ADV and pySTEPS have very similar values of area under the ROC curve for all lead times and thresholds except for the highest rainfall threshold (50 mm/hr), where STEPS3-ADV clearly shows superior performance. We believe that this result arises because STEPS3-ADV is free to dynamically evolve higher rain rates, while pySTEPS and STEPS1-ADV usually apply post-processing techniques such as probability matching that may cap rain rates in the rainfall ensemble to those already observed in the input radar field.
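The capping effect attributed to probability matching can be seen in a generic rank-based version of the technique, sketched below. This is not the pySTEPS or STEPS1-ADV implementation; it is a simplified illustration that assumes the forecast and reference fields have the same number of pixels, and the function name is hypothetical.

```python
import numpy as np

def probability_match(forecast, reference):
    """Give the forecast field the value distribution of the reference field.

    The k-th smallest forecast pixel receives the k-th smallest reference
    value, so the spatial pattern of the forecast is kept while its values
    are drawn entirely from the reference field.
    """
    forecast = np.asarray(forecast, dtype=float)
    ref_sorted = np.sort(np.asarray(reference, dtype=float).ravel())
    order = np.argsort(forecast.ravel(), kind="stable")
    matched = np.empty(forecast.size, dtype=float)
    matched[order] = ref_sorted
    return matched.reshape(forecast.shape)
```

Because every output value is taken from the reference field, the matched forecast can never exceed the maximum rain rate in that field. A forecast pixel that had evolved to 80 mm/hr would be mapped back to the largest observed value.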

Figure 32. Comparison of ROCs for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS

(blue line) using 24 members for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10

19:00UTC (event 5) for a lead time of 60-min

Figure 33. ROC areas for STEPS3-ADV, STEPS1-ADV and pySTEPS using 24-member rainfall ensembles for multiple lead times (60 to 90 minutes) and different detection thresholds (0.2 to 50 mm). Results correspond to 60-min accumulated rainfall ensembles for event 5 of the Brisbane radar.

Figure 34 compares the reliability plots of STEPS3-ADV, STEPS1-ADV and pySTEPS for the 60-minute lead time and multiple rainfall thresholds. For this event, pySTEPS seems to be more reliable than STEPS3-ADV and STEPS1-ADV for rainfall thresholds lower than 0.6 mm/hr. However, for rainfall thresholds in the range of 0.8 to 20 mm/hr, results from the STEPS3-ADV and pySTEPS models are comparable, showing high reliabilities that are superior to those of STEPS1-ADV.

Figure 34. Comparison of reliability diagrams for STEPS3-ADV (green line), STEPS1-ADV (orange line) and pySTEPS (blue line) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5). Diagrams correspond to a lead time of 60 minutes and multiple rainfall thresholds (from top to bottom, and left to right: 0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10, 20 and 50 mm).

Figure 35 compares the RMSE and ensemble spread of STEPS3-ADV, STEPS1-ADV and pySTEPS. RMSE values for STEPS1-ADV are significantly higher than those for pySTEPS and STEPS3-ADV, and STEPS1-ADV has the lowest ensemble spread values among the three alternatives. RMSE values for STEPS3-ADV and pySTEPS are quite similar for all lead times, but STEPS3-ADV provides rainfall ensembles with higher spread than either of the alternatives.

Figure 35. Comparison of RMSE (left) and ensemble spread (right) based on STEPS3-ADV (green), STEPS1-ADV (orange) and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66) from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

Finally, when comparing CRPS values for STEPS3-ADV, STEPS1-ADV and pySTEPS (Figure 36), STEPS1-ADV again has the highest level of error among the alternatives (higher CRPS values for all lead times), with STEPS3-ADV and pySTEPS performing at similar levels for all lead times for this rainfall event.

Figure 36. Evolution of CRPS values over domain based on STEPS3-ADV (green), STEPS1-ADV (orange)

and pySTEPS (blue) using 24-member 60-min accumulated rainfall ensembles for Brisbane radar (ID 66)

from 2020-02-04 19:00UTC to 2020-02-10 19:00UTC (event 5).

5. OPERATIONAL CONFIGURATION FOR STEPS3-ADV

As mentioned earlier, the main requirement from STEPS stakeholders was to obtain from STEPS3-ADV rainfall ensembles the probability of rainfall exceeding given thresholds in the next hour (60 min). STEPS3-ADV clearly shows large improvements in both the quality of the rainfall ensembles and computing efficiency when compared with the existing production system STEPS1-ADV and an open-source alternative (pySTEPS).

After the analysis of the STEPS3-ADV case studies, the following configuration for the

operational STEPS3-ADV system is recommended:

• Update Frequency: 5 minutes

• Product: 60-min accumulated rainfall fields from 5-min rainfall ensembles

• Lead Time: 60 to 90 minutes

• Minimum number of members in the ensemble: 48

• Minimum threshold to identify event occurrence: 0.2 mm in one hour

• Maximum threshold to identify event occurrence: 50 mm in one hour

The recommended 48 members will give rainfall ensembles with performance similar to that of the very large ensembles analysed here, but with half the data file size and less computation. See the description and analysis of Figure 25, Figure 26 and Figure 27 for more details.
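The recommended product can be illustrated with a sketch that turns 5-min ensemble fields into 60-min accumulations and exceedance probabilities. The array layout (members, 5-min steps, y, x), the convention that step k covers the 5 minutes ending at lead time 5(k+1) minutes, the `>=` test and the function names are all assumptions; the threshold list is the one used in the verification figures.

```python
import numpy as np

THRESHOLDS_MM = (0.2, 0.4, 0.6, 0.8, 1.0, 5.0, 10.0, 20.0, 50.0)

def accumulate_60min(rain_5min, lead_time_min):
    """Sum the twelve 5-min fields that end at the given lead time.

    rain_5min: array (n_members, n_steps, ny, nx) of 5-min rainfall depth (mm).
    Returns an array (n_members, ny, nx) of 60-min accumulations (mm).
    """
    end = lead_time_min // 5
    start = end - 12
    if start < 0 or end > rain_5min.shape[1]:
        raise ValueError("lead time not covered by the available 5-min steps")
    return rain_5min[:, start:end].sum(axis=1)

def exceedance_probabilities(accum, thresholds=THRESHOLDS_MM):
    """Per-pixel fraction of members at or above each threshold."""
    return {t: (accum >= t).mean(axis=0) for t in thresholds}
```

For example, a 48-member forecast with 18 five-minute steps supports lead times from 60 to 90 minutes: `accumulate_60min(rain, 90)` uses steps 6 to 17 and returns one accumulation field per member.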

Users may need to exercise some caution for thresholds above 20 mm until additional datasets with a larger number of occurrences of rainfall in that range have been incorporated into the verification. Verification results show that STEPS3-ADV is capable of identifying the occurrence or non-occurrence of a rainfall event (high or low chances), but it may be too sharp in its current configuration to identify intermediate chances of occurrence for some rainfall thresholds.

Also, there is significant spread in the results from one event to the next and from one radar to the next. For example, Figure 37 shows the distribution of the ROC area of STEPS3-ADV rainfall forecasts for a 60-minute lead time for each of the 10 radars analysed (Table 2) for multiple rainfall thresholds. Figure 37 disaggregates per radar the results summarized in Figure 20 for the 60-minute lead time. Overall good performance (high ROC values) clearly occurred at some radars for most of the rainfall thresholds (e.g. radars Brisbane [66] and Weipa [78]), while performance at other radars shows larger spread and strong decay for rainfall thresholds as low as 1 mm in one hour (e.g. radar Adelaide [64]). This could be an effect of local conditions around the radar (such as topography) that may induce localized growth or decay of rainfall rates under some flow conditions that are not properly modelled by STEPS, or of anomalies in the correction of ground echoes in other conditions (such as anomalous propagation) that could produce fictitious rainfall echoes in the input files used to generate and verify ensembles.

Figure 37. ROC area results for 10 rainfall events per radar and multiple rainfall thresholds (mm) for the

60-minute lead time. Colours indicate different rainfall thresholds.

6. CONCLUSIONS

An extensive verification exercise was carried out to assess the quality of the new generation of the Bureau's high-resolution rainfall ensemble generator, STEPS3-ADV. More than 47,000 5-min radar rainfall fields from 10 different weather radars formed the verification dataset. 96-member ensembles were calculated for all time steps in the verification dataset using the NCI supercomputer "GADI". STEPS3-ADV ensembles were compared with ensembles generated by the current operational system, STEPS1-ADV, and by an open-source alternative, pySTEPS.

STEPS3-ADV rainfall forecasts are suitable for predicting the probability of occurrence of hourly rainfall accumulations for rainfall thresholds in the range of 0.2 to 50 mm in the hour for 60- to 90-minute lead times. However, users may need to exercise some caution for thresholds above 20 mm until additional datasets with a larger number of occurrences of rainfall in that range have been incorporated into the verification.

STEPS3-ADV ensembles seem to be under-dispersive and additional spread may be required to improve the accuracy of the rainfall ensembles, although the new system clearly produces ensembles that are more accurate and have more spread than the current operational version. Nevertheless, expected errors are small, with at least 75% of the case studies showing mean CRPS values lower than 0.80 mm.

A limited assessment of the influence of the number of ensemble members on the quality of the rainfall forecasts was carried out. From the analysis of the longest event (2017 time steps), it was found that the ability to successfully identify events for rainfall thresholds from 0.2 to 5 mm/hr remains mostly similar for ensembles with 6, 12, 24, 48 and 96 members for lead times from 60 to 90 minutes. For the higher rainfall thresholds (10 mm/hr and above), this ability is heavily reduced if ensembles with fewer than 48 members are used. The expected error in the rainfall ensembles reduces as the number of members is increased, but 48-member ensembles seem to perform similarly to the larger 96-member ensembles.

Results show that STEPS3-ADV can generate reliable ensemble rainfall forecasts in a large range of rainfall conditions, with quality comparable to available open-source alternatives but delivering results up to 15 times faster. When compared with the current operational version, STEPS3-ADV performs better in all scores analysed in this report and can deliver results up to 30 times faster, showing strong capabilities for use as an operational system in the Bureau.

Finally, it is important to note that there is significant variability in the quality of the predictions, and the verification results vary from radar to radar and from event to event depending on the nature of the event, threshold, and lead time.

7. REFERENCES

Bureau National Operations Centre Operations, Bulletin Number 114, 2018. APS2 upgrade of the ACCESS-C Numerical Weather Prediction system. http://www.bom.gov.au/australia/charts/bulletins/BNOC_Operations_Bulletin_114.pdf

Bowler, N.E., Pierce, C.E. and Seed, A., 2004. Development of a precipitation nowcasting algorithm based upon optical flow techniques. Journal of Hydrology, 288(1-2), pp.74-91.

Bowler, N.E., Pierce, C.E. and Seed, A.W., 2006. STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quarterly Journal of the Royal Meteorological Society, 132(620), pp.2127-2155.

Fortin, V., Abaza, M., Anctil, F., Turcotte, R., 2014. Why Should Ensemble Spread Match the RMSE of the Ensemble Mean? J. Hydrometeorol. 15, 1708–1713. https://doi.org/10.1175/jhm-d-14-0008.1

Hersbach, H., 2000. Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems. Weather Forecast. 15, 559–570. https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2

Hosmer, David W., Lemeshow, Stanley, and Sturdivant, Rodney X., 2013. Applied Logistic Regression, 3rd Ed. Chapter 5, John Wiley and Sons, New York, NY, pp. 177.

Palmer, T., Buizza, R., Hagedorn, R., Lawrence, A., Leutbecher, M., Smith, L., 2006. Ensemble prediction: a pedagogical perspective. ECMWF Newsl. 106, 10–17. https://doi.org/10.21957/ab129056ew

Pulkkinen, S., Nerini, D., Perez Hortal, A., Velasco-Forero, C., Germann, U., Seed, A., and Foresti, L., 2019. Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1.0). Geosci. Model Dev., 12 (10), 4185–4219, doi:10.5194/gmd-12-4185-2019.

Roberts, N.M., Lean, H.W., 2008. Scale-Selective Verification of Rainfall Accumulations from High-Resolution Forecasts of Convective Events. Mon. Weather Rev. 136, 78–97. https://doi.org/10.1175/2007mwr2123.1

Seed, A.W., 2003. A dynamic and spatial scaling approach to advection forecasting. Journal of Applied Meteorology, 42(3), pp.381-388.

Seed, A.W., 2008. Rainfields: The Australian Bureau of Meteorology System for Quantitative Precipitation Estimation, and its use in Hydrological Modelling, Proceedings of Water Down Under 2008, Modbury, SA, 661-670.

Seed, A.W., Pierce, C.E. and Norman, K., 2013. Formulation and evaluation of a scale decomposition-based stochastic precipitation nowcast scheme. Water Resources Research, 49(10), pp.6624-6641.

Talagrand, O., Vautard, R., Strauss, B., 1997. Evaluation of Probabilistic Prediction Systems.

8. ACKNOWLEDGEMENTS

The authors express their gratitude to Dr. Beth Ebert and Dr. Shaun Cooper (Bureau of Meteorology) for their insightful comments when reviewing the manuscript.

This project was undertaken with the assistance of resources and services from the National

Computational Infrastructure (NCI), which is supported by the Australian Government.

9. APPENDIX

The main characteristics of the 100 selected rainfall events are summarized per radar in the following tables:

Table A1: Rainfall events for Radar 2 (Melbourne)

No | Start time | End time | Number of time steps | Description
1 | 15 Oct 2019 18:00 | 17 Oct 2019 09:00 | 469 | Starts with a convective cell with slowly moving precipitation band
2 | 1 Nov 2019 09:00 | 2 Nov 2019 09:00 | 289 | Starts with a narrow convective band later forming well spread rain
3 | 6 Nov 2019 05:00 | 8 Nov 2019 23:00 | 793 | Widespread rain
4 | 12 Nov 2019 00:00 | 12 Nov 2019 19:00 | 229 | Small patches of rain cells moving quickly
5 | 1 Dec 2019 00:00 | 2 Dec 2019 13:00 | 445 | Scattered light rain
6 | 4 Jan 2020 15:00 | 6 Jan 2020 02:00 | 421 | Rainfall occurred during the high level of smoke recorded in Melbourne
7 | 15 Jan 2020 01:00 | 15 Jan 2020 11:00 | 121 | Fast moving convective system from west to south east
8 | 19 Jan 2020 02:00 | 20 Jan 2020 12:00 | 409 | High intensity rain with the presence of large-hail stones up to 6 cm.
9 | 4 Mar 2020 00:00 | 5 Mar 2020 08:00 | 385 | Widespread rain
10 | 3 Apr 2020 04:00 | 4 Apr 2020 14:00 | 409 | South westerly cold front forming intermediate convective cells

Table A2: Rainfall events for Radar 19 (Cairns)

No | Start time | End time | Number of time steps | Description
1 | 21 Oct 2019 03:00 | 22 Oct 2019 16:00 | 445 | Scattered rain
2 | 5 Dec 2019 04:00 | 5 Dec 2019 14:00 | 121 | Small patches of convective cells
3 | 9 Dec 2019 03:00 | 11 Dec 2019 09:00 | 649 | Starts with a narrow band of convective cells and later forming scatter rain
4 | 2 Jan 2020 22:00 | 5 Jan 2020 00:00 | 601 | Fast moving scattered rain
5 | 8 Jan 2020 15:00 | 9 Jan 2020 16:00 | 301 | Localized high intensity rain
6 | 22 Jan 2020 10:00 | 24 Jan 2020 00:00 | 457 | Starts with high intensity scattered rain later forming widespread rain
7 | 26 Jan 2020 00:00 | 29 Jan 2020 09:00 | 973 | Widespread rain
8 | 8 Feb 2020 02:00 | 8 Feb 2020 20:00 | 217 | Fast moving convective cells
9 | 20 Feb 2020 14:00 | 25 Feb 2020 00:00 | 1273 | Southerly moving wind forming widespread rain
10 | 8 Mar 2020 23:00 | 9 Mar 2020 17:00 | 217 | Start with scattered rain later forming widespread rain

Table A3: Rainfall events for Radar 40 (Canberra)

No | Start time | End time | Number of time steps | Description
1 | 7 Oct 2019 17:00 | 8 Oct 2019 09:00 | 193 | Scattered rain in the beginning later covering about 80 % of the radar
2 | 15 Oct 2019 22:00 | 16 Oct 2019 15:00 | 205 | Starts with a localized rain later forming widespread rain
3 | 2 Nov 2019 05:00 | 3 Nov 2019 16:00 | 421 | Widespread rain
4 | 21 Dec 2019 00:00 | 21 Dec 2019 09:00 | 109 | Convective rain
5 | 29 Dec 2019 23:00 | 31 Dec 2019 13:00 | 457 | Precipitation starts out as small patches with some scattered convective cells and later evolves predominantly into convective precipitation
6 | 15 Jan 2020 02:00 | 15 Jan 2020 21:00 | 229 | Localized rain patches later forming widespread rain
7 | 18 Jan 2020 23:00 | 19 Jan 2020 15:00 | 193 | Fast moving convective cells
8 | 19 Jan 2020 23:00 | 20 Jan 2020 17:00 | 217 | High intensity rain with the presence of large hail stones up to 5 cm.
9 | 7 Feb 2020 00:00 | 11 Feb 2020 12:00 | 1309 | Widespread rain
10 | 3 Mar 2020 07:00 | 5 Mar 2020 07:00 | 577 | Cold front with widespread rain

Table A4: Rainfall events for Radar 63 (Darwin/Berrimah)

No | Start time | End time | Number of time steps | Description
1 | 27 Dec 2019 00:00 | 27 Dec 2019 09:00 | 109 | Fast evolving rain cells forming predominantly convective rain
2 | 5 Jan 2020 00:00 | 6 Jan 2020 16:00 | 481 | Widespread rain
3 | 8 Jan 2020 07:00 | 11 Jan 2020 13:00 | 937 | Starting with the narrow band later forming widespread rain
4 | 18 Jan 2020 20:00 | 23 Jan 2020 11:00 | 1333 | Mostly scattered rain with widespread rain in between
5 | 28 Jan 2020 18:00 | 29 Jan 2020 10:00 | 193 | Fast evolving rain cells
6 | 8 Feb 2020 04:00 | 9 Feb 2020 13:00 | 397 | Scattered rain
7 | 19 Feb 2020 06:00 | 20 Feb 2020 00:00 | 217 | Starts with the rainfall band later forming widespread rain
8 | 26 Feb 2020 11:00 | 29 Feb 2020 11:00 | 865 | Predominantly convective rain with scatter rain and narrow band
9 | 7 Mar 2020 02:00 | 8 Mar 2020 14:00 | 433 | Starts with cumulus cloud with fast evolving rain cells
10 | 23 Mar 2020 14:00 | 25 Mar 2020 19:00 | 637 | Mostly localised rain

Table A5: Rainfall events for Radar 64 (Adelaide/Buckland Park)

No | Start time | End time | Number of time steps | Description
1 | 12 Oct 2019 17:00 | 12 Oct 2019 21:00 | 49 | Fast moving convective cells approaching from the west
2 | 15 Oct 2019 02:00 | 15 Oct 2019 23:00 | 253 | Started with the scattered rain, later high rainfall intensity convective band observed close to Port Lincoln
3 | 1 Nov 2019 05:00 | 2 Nov 2019 00:00 | 229 | Started with series of convective bands later formed wide spread rain with localized high intensity rainfall.
4 | 28 Nov 2019 23:00 | 29 Nov 2019 11:00 | 145 | Narrow band of convection rain
5 | 27 Dec 2019 12:00 | 27 Dec 2019 21:00 | 109 | Medium intensity rain band moving from south west to north east.
6 | 4 Jan 2020 06:00 | 5 Jan 2020 19:00 | 445 | Started with a narrow rain band, intermittently forming a wide spread with low to medium intensity rainfall
7 | 9 Jan 2020 22:00 | 10 Jan 2020 09:00 | 133 | Started with the scattered rain later forming a narrow band of fast moving moderate intensity localized rain
8 | 30 Jan 2020 12:00 | 1 Feb 2020 07:00 | 517 | Scattered rain in the beginning coming from the North later forming convective rain
9 | 1 Mar 2020 03:00 | 1 Mar 2020 15:00 | 145 | Fast evolving rainfall mostly convective rain cells
10 | 3 Apr 2020 03:00 | 4 Apr 2020 11:00 | 385 | Started with a narrow band of convective rainfall later formed scatter rain.

Table A6: Rainfall events for Radar 66 (Brisbane/Mt. Stapylton)

No | Start time | End time | Number of time steps | Description
1 | 10 Oct 2019 11:00 | 12 Oct 2019 12:00 | 589 | Starts with a narrow convection band of rain followed with widespread rain
2 | 17 Oct 2019 04:00 | 17 Oct 2019 10:00 | 73 | Strong convective event
3 | 12 Dec 2019 16:00 | 13 Dec 2019 09:00 | 205 | Fast evolving convective system with high intensity localised rain
4 | 24 Dec 2019 04:00 | 25 Dec 2019 21:00 | 493 | Heavy precipitation event
5 | 3 Feb 2020 19:00 | 10 Feb 2020 19:00 | 2017 | Longest rainfall event considered for this verification
6 | 10 Feb 2020 20:00 | 14 Feb 2020 08:00 | 1009 | Starts with a wide band of convective rain followed by thunderstorm with some reported flooding cases around Brisbane
7 | 22 Feb 2020 11:00 | 25 Feb 2020 12:00 | 985 | Rain band passing over Brisbane
8 | 26 Feb 2020 00:00 | 27 Feb 2020 08:00 | 385 | Fast evolving convective rain cells
9 | 8 Mar 2020 02:00 | 10 Mar 2020 11:00 | 685 | Widespread rain event
10 | 30 Mar 2020 14:00 | 30 Mar 2020 18:00 | 49 | Localised rain event

Table A7: Rainfall events for Radar 70 (Perth/Serpentine)

No | Start time | End time | Number of time steps | Description
1 | 3 Oct 2019 06:00 | 4 Oct 2019 13:00 | 373 | Precipitation starts as widespread rain and later becomes scattered rain
2 | 11 Oct 2019 09:00 | 12 Oct 2019 00:00 | 181 | Scattered rain
3 | 30 Oct 2019 02:00 | 2 Nov 2019 01:00 | 853 | A narrow band of precipitation followed by scattered rain
4 | 16 Dec 2019 07:00 | 16 Dec 2019 18:00 | 133 | Light scattered rain moving towards the NE
5 | 10 Feb 2020 21:00 | 11 Feb 2020 01:00 | 49 | Light scattered rain
6 | 21 Feb 2020 05:00 | 22 Feb 2020 06:00 | 301 | Widespread rain
7 | 24 Feb 2020 00:00 | 24 Feb 2020 09:00 | 109 | Fast-evolving localised convective cells
8 | 25 Feb 2020 20:00 | 28 Feb 2020 10:00 | 745 | Convective rain approaching from the NW
9 | 14 Mar 2020 06:00 | 14 Mar 2020 18:00 | 145 | Scattered localized rain
10 | 16 Mar 2020 22:00 | 18 Mar 2020 00:00 | 313 | Rain band passing through radar

Table A8: Rainfall events for Radar 71 (Sydney/Terrey Hills)

No | Start time | End time | Number of time steps | Description
1 | 4 Oct 2019 11:00 | 5 Oct 2019 01:00 | 169 | Cluster of intense rain surrounded by light rain cells
2 | 10 Oct 2019 08:00 | 12 Oct 2019 13:00 | 634 | Continuously generated convective rain cells
3 | 3 Nov 2019 02:00 | 3 Nov 2019 17:00 | 162 | Starts with localized rain, later forming widespread rain
4 | 23 Nov 2019 02:00 | 23 Nov 2019 10:00 | 97 | Localized high-intensity rain
5 | 15 Jan 2020 14:00 | 19 Jan 2020 09:00 | 1091 | Cluster of intense rain
6 | 19 Jan 2020 22:00 | 20 Jan 2020 12:00 | 150 | Hail event
7 | 5 Feb 2020 12:00 | 9 Feb 2020 20:00 | 1249 | Widespread rain
8 | 3 Mar 2020 00:00 | 4 Mar 2020 08:00 | 385 | Scattered rain
9 | 24 Mar 2020 19:00 | 27 Mar 2020 02:00 | 661 | Band of convective cells moving towards the NE
10 | 29 Mar 2020 12:00 | 30 Mar 2020 05:00 | 205 | Started with localized high-intensity rain, later forming widespread rain

Table A9: Rainfall events for Radar 76 (Hobart/Mt. Koonya)

No | Start time | End time | Number of time steps | Description
1 | 1 Nov 2019 02:00 | 2 Nov 2019 00:00 | 265 | Started with high-intensity localized rain cells, becoming widespread later in the event
2 | 6 Nov 2019 02:00 | 9 Nov 2019 20:00 | 1081 | Mostly scattered rain
3 | 30 Dec 2019 02:00 | 30 Dec 2019 19:00 | 205 | Fast-evolving high-intensity localized rain with widespread rain in the later phase
4 | 9 Jan 2020 21:00 | 10 Jan 2020 22:00 | 301 | Mostly widespread rain
5 | 22 Jan 2020 04:00 | 24 Jan 2020 02:00 | 553 | Widespread rain
6 | 18 Feb 2020 08:00 | 19 Feb 2020 09:00 | 301 | Light rainfall band
7 | 20 Feb 2020 08:00 | 21 Feb 2020 03:00 | 229 | Mostly scattered localized rain cells
8 | 25 Feb 2020 15:00 | 27 Feb 2020 23:00 | 673 | Widespread rain moving from SW to NE
9 | 4 Mar 2020 14:00 | 5 Mar 2020 16:00 | 313 | Mostly widespread rain coming from the north
10 | 19 Mar 2020 01:00 | 19 Mar 2020 19:00 | 217 | Started with scattered rain, later becoming widespread with intermittent high-intensity rain bands

Table A10: Rainfall events for Radar 78 (Weipa airport)

No | Start time | End time | Number of time steps | Description
1 | 26 Dec 2019 04:00 | 26 Dec 2019 22:00 | 217 | Starts with high-intensity localized rain; later widespread rain approaches from the south
2 | 7 Jan 2020 14:00 | 10 Jan 2020 20:00 | 937 | Mostly scattered rain, also forming a narrow rain band
3 | 27 Jan 2020 20:00 | 29 Jan 2020 12:00 | 481 | Begins with high-intensity scattered rain, later forming widespread rain
4 | 29 Jan 2020 13:00 | 30 Jan 2020 17:00 | 337 | Convective cells moving from the NW, covering almost 80% of the area in the later phase
5 | 30 Jan 2020 21:00 | 4 Feb 2020 13:00 | 1333 | Intermittent rain event with localized high-intensity rainfall
6 | 10 Feb 2020 23:00 | 13 Feb 2020 19:00 | 817 | Intermittent localized convective cells
7 | 19 Feb 2020 13:00 | 24 Feb 2020 07:00 | 1356 | Begins with scattered light rain followed by some high-intensity narrow rain bands, later forming widespread rain
8 | 5 Mar 2020 17:00 | 7 Mar 2020 17:00 | 577 | Convective rain cells coming from the NE
9 | 10 Mar 2020 22:00 | 13 Mar 2020 08:00 | 682 | Mostly scattered rain, intermittently forming narrow rain bands
10 | 22 Mar 2020 03:00 | 24 Mar 2020 00:00 | 541 | Mostly scattered rain moving towards the west
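
The "Number of time steps" column can be cross-checked against the start and end times. The sketch below is illustrative only and is not part of the verification method. It assumes a 5-minute spacing between time steps and a count that includes both the start and end time; neither assumption is stated in the tables, and the function name `expected_steps` is hypothetical. The three rows are copied from Table A10.

```python
from datetime import datetime, timedelta

# Assumption (not stated in the tables): time steps are 5 minutes apart and
# the count includes both the start time and the end time.
STEP = timedelta(minutes=5)
FMT = "%d %b %Y %H:%M"  # e.g. "26 Dec 2019 04:00"; needs an English locale


def expected_steps(start: str, end: str, step: timedelta = STEP) -> int:
    """Number of time steps from start to end, counting both endpoints."""
    span = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return span // step + 1


# Rows copied from Table A10: (event number, start, end, listed steps)
rows = [
    (1, "26 Dec 2019 04:00", "26 Dec 2019 22:00", 217),
    (5, "30 Jan 2020 21:00", "4 Feb 2020 13:00", 1333),
    (8, "5 Mar 2020 17:00", "7 Mar 2020 17:00", 577),
]

for number, start, end, listed in rows:
    expected = expected_steps(start, end)
    status = "matches" if expected == listed else "differs"
    print(f"event {number}: listed {listed}, expected {expected} ({status})")
```

Under these assumptions, events 1 and 8 reproduce the listed counts, while event 5 gives 1345 against the listed 1333. The tables do not explain such differences, so the sketch only flags them.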