reproducible machine learning in climate science · 2019-10-17 · open and reproducible science t...

54
Reproducible Machine Learning in Climate Science ESoWC 2019 T

Upload: others

Post on 15-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Reproducible Machine Learning in Climate Science

ESoWC 2019

T

Page 2: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We are a PhD Student and a Machine Learning

Engineer

Page 3: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Our aim was to build a Python toolbox for working

with ML using weather and climate data

ML is hard for me and you,

Weather is tough for others too.

Interpretability makes us all fearful,

So we wrote some code to make it cheerful!

T

Page 4: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We developed a modular and extensible pipeline to

apply machine learning to climate scienceData extensibility Experimental extensibility

G

Page 5: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

First results - using machine learning to predict

vegetation health

G

Page 6: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Our initial experiments showed some skill!

G

Pers

iste

nce

EA L

STM

Page 7: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We were strongly influenced by this talk

T

Reproducibility @ ICLR 2019

Page 8: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Reproducibility is central to the ideals of scientific

research

T

Page 9: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Open and Reproducible Science

T

"you can't stand on the shoulders of giants if they keep their shoulders

private"

Grus 2019

Page 10: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We wanted to utilise the tools from Software

Engineering

T

Page 11: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We wanted to allow other people (and future us) to

train, use and reproduce our models

T

Page 12: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Unit testing allows us to be confident our pipeline

does what we expect

G

Page 13: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We used type hints to better communicate what

functions did, and to leverage type checkers

G

Page 14: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We use json configurations to keep track of what

experiments are run (WIP)

G

Page 15: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Making scientific workflows fully reproducible is

hard...

T

● Documentation

● Instructions

● Unit Testing

● Experimental vs. Library code

● Source Control

● Parameters as Arguments

Page 16: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Lessons learned:

T

● Infrastructure communicates what code does.

● The initial time investment is worth investing.

● There is a tension between experimenting quickly, and maintaining

well-tested robust code.

Page 17: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Our Summer in Numbers● 131 commits to master branch

● 37 Github issues

● 87 Pull Requests

● +15,600 slack messages

● 18,544 lines of Python code

● 188 tests written

T

Page 18: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● Work with forecast data● Test for different problems● ml_climate.readthedocs.io

documentation! ● Improve our VCI predictions

To the future ...

T

Usability Performance Analysis

Page 19: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Let’s innovate together.

Page 20: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Appendix

Page 21: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

We focused on predicting an agricultural drought

index in Kenya

T

Page 22: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Lessons learned:

T

● The biggest benefit of all this infrastructure is easier communication of

what code is supposed to do

● A little overhead at the beginning of the project (setting up CI) reaps big

rewards later

● It is challenging to manage the tension between experimenting quickly,

but also keeping well-tested robust code

Page 23: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 24: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

T

Our data sources include satellite data and model

outputs

Page 25: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 26: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 27: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 28: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 29: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 30: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● Persistence (previous month)● Linear Regression● Linear (classical) Neural Network● Recurrent Neural Network - LSTM● Entity Aware LSTM (Hydrology Specific Architecture!)

G

Page 31: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 32: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Incorporate static variables and dynamic variables

G

Page 33: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Classic LSTM

Entity Aware LSTM

G

Page 34: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 35: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Initial Experiments

T

Page 36: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Initial Experiments

T

Page 37: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● We fit a number of different machine learning models.

● EALSTM seems to have performed significantly better than the other models.

Initial Results - EALSTM

Page 38: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Initial Results

T

Page 39: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Preliminary Results in April

T

Page 40: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Preliminary Results in May

T

Page 41: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

EA

LSTM

Per

sist

ence

T

Page 42: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Initial Results

T

Page 43: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Initial Results

G

https://github.com/esowc/ml_drought/blob/master/notebooks/draft/15_gt_ealstm.ipynb

Page 44: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Results - We do best in Cropland areas

Page 45: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

Appendix - Administrative Level performance

Page 46: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

T

Page 47: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

T

Page 48: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"
Page 49: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

G

Page 50: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● We were particularly interested in understanding the patterns being learnt by the model.

● We used the DeepSHAP implementation of DeepLIFT to interpret how an input data points affected a model’s final prediction

G

Model interpretability was a key component of our

pipeline

Page 51: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● Persistence (previous month)● Linear Regression● Linear (classical) Neural Network● Recurrent Neural Network - LSTM● Entity Aware LSTM (Hydrology Specific Architecture!)

G

Page 52: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● Model extras○ Surrounding Pixels○ One hot encoded month○ Spatial Climatology (for each month)○ Spatial Mean of input timesteps

In addition to the pixel wise climate values, we fed

the model a few additional inputs

G

Page 53: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● Climate Indices.● Identify ‘runs’ of drought events.● Aggregate results by region or landcover.● Diagnose feature contributions (SHAP).● Calculate performance metrics (RMSE, R^2).● Plotting functions.

Page 54: Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T "you can't stand on the shoulders of giants if they keep their shoulders private"

● We were particularly interested in understanding the patterns being learnt by the model.

● We used the DeepSHAP implementation of DeepLIFT to interpret how an input data points affected a model’s final prediction

G