task 1 of pp interpretation

Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss

Task 1 of PP Interpretation

1.1 Further applications of boosting: this talk

1.2 Publication on boosting: paper by Oliver Marchand submitted, but not yet published

Thunderstorm Prediction with Boosting:

Verification and Implementation of a new Base Classifier

André Walser (MeteoSwiss)

Martin Kohli (ETH Zürich, Semester Thesis)

Overview

• Boosting Algorithm

• Impact of learn data

• Verification results

• Mapping to probability forecast

• New base classifier: decision tree

Supervised Learning

[Diagram: a learner derives rules from historic data; the resulting classifier applies them to new data and outputs yes/no]

Learn data

COSMO-7 assimilation cycle
• Data for 79 SYNOP stations in Switzerland
• At least one year, every hour
• e.g. SI, CAPE, W, date, time

Label data
• A thunderstorm "yes" if
  • an appropriate ww-code was reported in the SYNOP, or
  • at least 3 lightning discharges were registered within 13.5 km of the station
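The labelling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which ww-codes count as "appropriate", so the set `THUNDERSTORM_WW` (WMO present-weather codes 17, 29 and 91 to 99) is a guess, and the function names and the haversine distance are hypothetical choices. The list of lightning positions is assumed to be already restricted to the hour being labelled.

```python
import math

# Assumption: ww-codes treated as thunderstorm reports. The slides only say
# "an appropriate ww-code"; this set is illustrative.
THUNDERSTORM_WW = {17, 29} | set(range(91, 100))
RADIUS_KM = 13.5      # search radius around the station (from the slide)
MIN_LIGHTNING = 3     # minimum number of discharges (from the slide)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def thunderstorm_label(station, ww_code, discharges):
    """Return True if the hour counts as a thunderstorm event.

    station: (lat, lon) of the SYNOP station
    ww_code: reported present-weather code, or None if no SYNOP is available
    discharges: list of (lat, lon) lightning positions for that hour
    """
    if ww_code is not None and ww_code in THUNDERSTORM_WW:
        return True
    nearby = sum(
        1 for lat, lon in discharges
        if distance_km(station[0], station[1], lat, lon) <= RADIUS_KM
    )
    return nearby >= MIN_LIGHTNING
```

For example, `thunderstorm_label((47.0, 8.0), None, [(47.05, 8.0)] * 3)` counts three discharges about 5.6 km from the station and therefore yields an event.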

AdaBoost Algorithm

Input
• Weighted learn samples
• Number of base classifiers M

Iteration
1. Determine base classifier G
2. Calculate error, weights w
3. Adapt the weights of falsely classified samples
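The three iteration steps can be illustrated with a small pure-Python sketch. It follows the textbook AdaBoost.M1 update with threshold classifiers as base classifiers; it is not the MeteoSwiss implementation. All function names are hypothetical, and the normalised weighted vote in `certainty` is an assumption about how a value such as C_TSTORM in [0, 1] could be obtained.

```python
import math


def predict_threshold(row, j, t, polarity):
    """Threshold classifier: 1 if feature j exceeds t (inverted for polarity -1)."""
    above = row[j] > t
    return int(above) if polarity == 1 else int(not above)


def fit_threshold_classifier(X, y, w):
    """Step 1: exhaustive search for the threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi for row, yi, wi in zip(X, y, w)
                    if predict_threshold(row, j, t, polarity) != yi
                )
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost(X, y, M):
    """Train M base classifiers; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(M):
        err, j, t, pol = fit_threshold_classifier(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = math.log((1 - err) / err)       # step 2: classifier weight from its error
        ensemble.append((alpha, j, t, pol))
        # Step 3: raise the weights of falsely classified samples, then renormalise.
        w = [
            wi * math.exp(alpha) if predict_threshold(row, j, t, pol) != yi else wi
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def certainty(ensemble, row):
    """Weighted vote for class 1, normalised to [0, 1] (assumed form of the output)."""
    total = sum(alpha for alpha, _, _, _ in ensemble)
    if total == 0:
        return 0.5
    votes = sum(
        alpha for alpha, j, t, pol in ensemble
        if predict_threshold(row, j, t, pol) == 1
    )
    return votes / total
```

With `X = [[0.0], [1.0], [2.0], [3.0]]` and `y = [0, 0, 1, 1]`, `adaboost(X, y, 3)` returns an ensemble whose certainty is above 0.5 for `[2.5]` and below 0.5 for `[0.5]`.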

Output of the Learn process

• M base classifiers
• Threshold classifier:

AdaBoost Algorithm

Classifier:

Output of the Classifier: C_TSTORM

[Figure: classifier output C_TSTORM at 17, 18 and 19 UTC, annotated "Biased!"]

Reason: Inappropriate learn data…

• SYNOP messages contain events and non-events, but are only available every 3 hours (most messages for 6, 12, 18 UTC).

• Lightning data only contains events

New learn data sets

• B (biased): SYNOP messages; only events from lightning data

• F (full): SYNOP messages; all missing values are considered as non-events

• AL1 (at least 1): SYNOP messages; when lightning data show at least 1 event, all non-missing values are considered as non-events

Without bias…

[Figure: classifier output C_TSTORM at 17, 18 and 19 UTC]

Verification

• POD and FAR for different C_TSTORM values between 0.3 and 0.6

FAR = False Alarms / #Alarms

• Learn data:
  Model: COSMO-7 assimilation cycle, Jun 06 – May 07
  Obs: B / AL1 / F

• Verification data:
  Model: COSMO-7 forecasts, July 06 and May/June 07
  Obs: F
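As a small illustration of the two scores, the sketch below computes POD (hits divided by the number of observed events, the standard definition) and FAR (false alarms divided by the number of alarms, as on the slide) for a given C_TSTORM threshold. The function name and the convention that an alarm is issued when the classifier output is at or above the threshold are assumptions.

```python
def pod_far(certainties, observed, threshold):
    """Return (POD, FAR) for alarms issued where certainty >= threshold.

    certainties: classifier outputs (e.g. C_TSTORM values)
    observed: matching 0/1 (or False/True) thunderstorm observations
    """
    hits = misses = false_alarms = 0
    for c, obs in zip(certainties, observed):
        alarm = c >= threshold          # assumed alarm convention
        if alarm and obs:
            hits += 1
        elif alarm:
            false_alarms += 1
        elif obs:
            misses += 1
    events = hits + misses
    alarms = hits + false_alarms
    pod = hits / events if events else float("nan")
    far = false_alarms / alarms if alarms else float("nan")
    return pod, far


# Scan the threshold range used in the verification (0.3 to 0.6).
if __name__ == "__main__":
    c = [0.2, 0.35, 0.5, 0.7]
    o = [0, 1, 0, 1]
    for thr in (0.3, 0.4, 0.5, 0.6):
        print(thr, pod_far(c, o, thr))
```

Lowering the threshold generally raises POD at the cost of a higher FAR, which is why both scores are reported over a range of thresholds.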

Verification: earlier results

• Results reported last year for 2005:

POD = 72%, FAR = 34%

• Unfortunately not realistic: the verification was done with obs data B

July 2006

[Figure: verification results for July 2006 (~7% events), with a random forecast shown for comparison]

18 May – 24 June 2007

Comparison with other system

• DWD expert system
• Period April 2006 - September 2006: POD = 0.346, FAR = 0.740

Mapping to a probability forecast

Polynomial fit in a reliability diagram:

[Figure: reliability diagram with the fitted curve PC_TSTORM]

Mapping to a probability forecast

$$
\mathrm{PC\_TSTORM} =
\begin{cases}
0 & \text{if } x \le 0.4,\\
a x^2 + b x + c & \text{if } 0.4 < x < 0.6,\\
a \cdot 0.6^2 + b \cdot 0.6 + c & \text{if } x \ge 0.6.
\end{cases}
$$

Limited resolution: the system predicts probabilities only between 0 and ~40%.
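The piecewise mapping can be written as a short function. The coefficients a, b and c are not given in the slides; the values below are purely hypothetical, chosen so that the curve is 0 at x = 0.4 and reaches 0.4 (about 40%) at x = 0.6, in line with the stated range. Here x is assumed to be the classifier output C_TSTORM.

```python
# Hypothetical coefficients (NOT from the slides): chosen so that
# a*0.4**2 + b*0.4 + c = 0 and a*0.6**2 + b*0.6 + c = 0.4.
A, B, C = 5.0, -3.0, 0.4


def map_to_probability(x, a=A, b=B, c=C):
    """Map a classifier output x to a thunderstorm probability.

    Below 0.4 the probability is 0, between 0.4 and 0.6 a quadratic is used,
    and above 0.6 the value is capped at the quadratic evaluated at 0.6.
    """
    if x <= 0.4:
        return 0.0
    if x < 0.6:
        return a * x * x + b * x + c
    return a * 0.6 ** 2 + b * 0.6 + c
```

Capping at the value for x = 0.6 is what limits the forecast range: no input, however large, maps to a probability above that cap.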

New Base Classifier: Decision Tree

[Diagram: a single threshold classifier with outputs 1 and 0]

[Diagram: decision tree whose first split leads to the branches "class 1" and "class 0", each of which splits again into outputs 1 and 0]
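To make the difference between a threshold classifier and a small decision tree concrete, the following sketch fits a weighted two-level tree by greedy splitting. It is a generic illustration, not the base classifier implemented in the semester thesis; the function names, the greedy split criterion (weighted misclassification error) and the nested-dict representation are all assumptions.

```python
def weighted_majority(y, w):
    """Class (0 or 1) carrying the larger total weight; ties go to class 0."""
    w1 = sum(wi for yi, wi in zip(y, w) if yi == 1)
    w0 = sum(wi for yi, wi in zip(y, w) if yi == 0)
    return 1 if w1 > w0 else 0


def best_split(X, y, w):
    """Return (error, feature, threshold) of the best split, or None if none exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in values[:-1]:           # skip the largest value: right side would be empty
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            err = 0.0
            for side in (left, right):
                label = weighted_majority([y[i] for i in side], [w[i] for i in side])
                err += sum(w[i] for i in side if y[i] != label)
            if best is None or err < best[0]:
                best = (err, j, t)
    return best


def build_tree(X, y, w, depth=2):
    """Build a tree of at most `depth` levels; leaves are {'label': 0 or 1}."""
    split = best_split(X, y, w) if depth > 0 and len(set(y)) > 1 else None
    if split is None:
        return {"label": weighted_majority(y, w)}
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]

    def pick(idx, seq):
        return [seq[i] for i in idx]

    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(pick(left, X), pick(left, y), pick(left, w), depth - 1),
        "right": build_tree(pick(right, X), pick(right, y), pick(right, w), depth - 1),
    }


def predict_tree(tree, row):
    """Follow the splits until a leaf is reached and return its label."""
    while "label" not in tree:
        branch = "right" if row[tree["feature"]] > tree["threshold"] else "left"
        tree = tree[branch]
    return tree["label"]
```

On the pattern `X = [[0, 0], [0, 1], [1, 0], [1, 1]]`, `y = [0, 1, 1, 0]` with equal weights, the two-level tree classifies all four samples correctly, whereas no single threshold classifier can, since the label depends on the combination of both features.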

Decision Tree: Example

Conclusions & Outlook

• Boosting
  • is a simple, efficient and effective machine learning method for model post-processing
  • is completely general
  • can employ a number of redundant indicators
  • computes a certainty of the classification, which is mapped to a probability forecast

• First verification results promising, extended verification required

• Benefit of decision trees?
