April 25th 2018
Automatic Optimization of Predictive Bioactivity Models
Chi Chung Lam, Fabian Steinmetz, Paul Czodrowski
2
Multiple models trained for biological targets
Random Forests
Neural Networks
Gradient Boosted Trees
NNs and GBTs are very sensitive to hyperparameter changes
Automated ways needed to build models with the right hyperparameters
Predictive Models in Production
3
Millions of unique combinations possible
NN Architectures & Hyperparameters
NN-Architecture
• Layer-Type
• Number of Layers
• Neurons per Layer
• Activation-Functions
Training-Parameters
• Optimizer
• Learning-Rate
• Weight-Decay
• Batch-Size
• Loss-Function
• …
Hyperparameters
Guido Bolick: Automatic Generation of Neural Network Architectures Using a Genetic Algorithm | 27.09.2016
4
Genetic Algorithm for hyperparameter optimization
5
Genetic Algorithm Workflow
6
Comparing Global Models
| Model | Description |
| --- | --- |
| RF | Random Forest with fixed hyperparams |
| Leiden DNN | DNN with fixed hyperparams |
| GA DNN | DNN with GA-optimized hyperparams |
| Random DNN | DNN with grid-search-optimized hyperparams |
| Feature-Wise | Baseline model that takes the best fingerprint bit as the prediction |
| XGBoost | Gradient Boosted Trees with fixed hyperparams |
7
Assume that each fingerprint bit is a prediction, and select the best bit
Feature-Wise Baseline
| | Bit 0 | Bit 1 | Bit 2 | Bit 3 | Activity |
| --- | --- | --- | --- | --- | --- |
| Sample 1 | 1 | 0 | 0 | 1 | 0 |
| Sample 2 | 1 | 0 | 0 | 0 | 0 |
| Sample 3 | 1 | 1 | 1 | 1 | 1 |
| Sample 4 | 1 | 1 | 1 | 0 | 1 |
| Sample 5 | 0 | 0 | 1 | 1 | 0 |
| Kappa score | 0.41 | 1.00 | 0.67 | -0.17 | |
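The feature-wise baseline can be sketched in a few lines. This is a minimal illustration (not the production code): compute Cohen's kappa between each fingerprint bit and the activity column, then keep the best-scoring bit.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    # expected agreement of two random observers with the same label frequencies
    p1_true = sum(y_true) / n
    p1_pred = sum(y_pred) / n
    p_e = p1_true * p1_pred + (1 - p1_true) * (1 - p1_pred)
    return (p_o - p_e) / (1 - p_e)

def feature_wise_baseline(bits, activity):
    """bits: one column (list of 0/1) per fingerprint bit."""
    scores = [cohen_kappa(activity, column) for column in bits]
    best = max(range(len(bits)), key=lambda i: scores[i])
    return best, scores

# Columns Bit 0..Bit 3 and the activity labels from the table above
bits = [[1, 1, 1, 1, 0], [0, 0, 1, 1, 0], [0, 0, 1, 1, 1], [1, 0, 1, 0, 1]]
activity = [0, 0, 1, 1, 0]
best, scores = feature_wise_baseline(bits, activity)
# Bit 1 matches the activity column exactly, so its kappa is 1.0
```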
8
Global Model Performance
[Bar chart: Kappa score (y-axis, 0 to 0.8) per target (CACO, CLINT_H, CLINT_M, CLINT_R, HERG, SOL) for RF, Leiden DNN, GA DNN, Random DNN, Feature-Wise, XGBoost, and XGBoost Random]
9
GA vs Random Search Comparison
Mean kappa score increases as GA evolution occurs
However, a good solution is found too easily (already found among the initial 100 architectures)
A random search of the same search space finds a similar or better solution
10
Fingerprints hash a molecule’s substructures into a fixed-length bit vector
A small fingerprint size will cause “collisions”
A large fingerprint size will cause many redundant bits
Fingerprint Filtering: CLINT_R
| | FP size 1024 | FP size 4096 |
| --- | --- | --- |
| Avg. substructures per bit | 79.84 | 20.64 |
| Bits removed by 0.01 variance filter | 3 | 2388 |
| Substructures per bit after 0.01 var filter | 80.00 | 21.86 |
| True size after 0.01 var filter | 1021 | 1708 |
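Variance filtering of binary fingerprint bits can be sketched as follows (a minimal illustration; the function and variable names are ours, not from CREAM). For a binary bit where a fraction p of molecules set it, the variance is p(1 - p), so near-constant bits fall below the threshold:

```python
def variance_filter(fingerprints, threshold=0.01):
    """Return the indices of bits whose variance exceeds `threshold`.

    fingerprints: list of equal-length 0/1 bit vectors (one per molecule).
    """
    n = len(fingerprints)
    kept = []
    for bit in range(len(fingerprints[0])):
        p = sum(fp[bit] for fp in fingerprints) / n  # fraction of ones
        if p * (1 - p) > threshold:                  # Bernoulli variance
            kept.append(bit)
    return kept

# Toy example: bit 0 is constant (variance 0), bit 1 is balanced (variance 0.25),
# so only bit 1 survives the 0.01 threshold
fps = [[1, 0], [1, 1], [1, 0], [1, 1]]
kept_bits = variance_filter(fps)
```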
11
Feature-selection of fingerprints by variance
Control: unfiltered FP of same length as filtered FP
Problem: Arbitrary choice of threshold variance
Fingerprint Filtering: CLINT_R
[Bar chart: mean Kappa score (0 to 0.25) for DNN, RF, and XGB under 0.01-var filter, control, 0.0-var filter, control, and unfiltered fingerprints — CLINT_R, 1024 bits]
[Bar chart: the same comparison on a 0 to 0.30 scale — CLINT_R, 4096 bits]
Finding the optimal variance: CLINT_R
[Bar chart: mean Kappa score (0 to 0.25) for 0.01 var, 0.0 var, optimal var, and unfiltered fingerprints — CLINT_R Optimal Var Filtering]
Finding the optimal variance: HERG
[Bar chart: mean Kappa score (0 to 0.6) for 0.01 var, 0.0 var, optimal var, and unfiltered fingerprints — HERG Optimal Var Filtering]
Fingerprint Filtering: Problems
Variance of bits highly depends on sample size
Use threshold that is relative to sample size, instead of absolute value
Can we combine this filtering with the “feature-wise baseline” analysis?
Drop fingerprints that correlate poorly with dependent variable?
15
Nested Cluster Validation
16
The final models are used in production and served to chemists, etc.
Retraining occurs every 3 months
During these three months, models are “outdated”
Retraining more frequently is time-wise impractical
XGB and DNNs allow “On-line” updating
Fit new data during an additional training step of existing models
Can happen nearly real-time
Retraining only necessary when performance starts declining
On-line Updating of Models
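The on-line updating idea can be illustrated with a toy incremental learner. Everything in this sketch is a stand-in (a plain perceptron instead of the XGB/DNN update step, with invented names): new data is folded in through an additional update instead of retraining from scratch.

```python
class OnlinePerceptron:
    """Toy stand-in for a model that supports incremental ("on-line") updates."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def update(self, x, y):
        """Fit one new sample without touching the rest of the training data."""
        error = y - self.predict(x)
        if error:
            self.w = [wi + self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * error

model = OnlinePerceptron(n_features=2)
# Initial (batch) training pass...
for _ in range(20):
    for x, y in [([1, 0], 1), ([0, 1], 0)]:
        model.update(x, y)
# ...later, a new measurement arrives and is folded in near real-time:
model.update([1, 1], 1)
```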
Our in house environments: CREAM and MOCCA
CREAM (Classification REgression At Merck)
- Python environment and modelling tool
- Used for the majority of predictive models
- Holds versatile features, such as
- Multiple machine learning algorithms
- Different validation methods
- Interface to MOCCA
MOCCA is the Merck Online Computational Chemistry Analyzer, our
web-based in-house prediction tool
Global models
• Large Dataset
• Large Applicability Domain (AD)
• Endpoints, such as
• Physico-chemical Properties
• Pharmacokinetics
• Toxicity
• General Selectivity
Global vs. local models
Local models
• Smaller Dataset
• Smaller Applicability Domain
• Endpoints, such as
• Activity
• Selectivity
• Toxicity, Pharmacokinetics
Generally, global models are preferable due to greater in-house modelling experience and larger AD, but we are happy to support projects with local models if needed.
22
• Chi Chung Lam
• Wolf-Guido Bolick (Andreas Dominik)
• Fabian Steinmetz
• Kristina Preuer, Günter Klambauer (Sepp Hochreiter)
• Friedrich Rippmann
• Marcel Baltruschat
• Cornelius Kohl
• Samo Turk
• Jan Fiedler
• Christian Röder
Acknowledgement
23
back-up
24
| SET | Train | Test | Classes |
| --- | --- | --- | --- |
| CACO | 9637 | 523 | 3 |
| CLINT_H | 16264 | 797 | 3 |
| CLINT_M | 18313 | 981 | 3 |
| CLINT_R | 15910 | 760 | 3 |
| HERG | 6894 | 288 | 2 |
| SOL | 19615 | 667 | 3 |
Datasets
25
Millions of unique combinations possible
NN Architectures & Hyperparameters
NN-Architecture
• Layer-Type
• Number of Layers
• Neurons per Layer
• Activation-Functions
Training-Parameters
• Optimizer
• Learning-Rate
• Weight-Decay
• Batch-Size
• Loss-Function
• …
Hyperparameters
26
Optimization of Hyperparameters
Expert: hyperparameters derived from literature & experience
Lucky People: hyperparameter search within promising parameter areas
Everyone:
• Random-Search (Bergstra et al. 2012)
• Grid-Search (Larochelle et al. 2007)
• Probability-based algorithms (Brochu et al. 2010; Bergstra et al. 2011)
• Directed Random-Search (e.g. genetic algorithms)
27
What is a Genetic Algorithm?
28
Validation Strategies
• Use as much data as possible for training
• Being able to get a realistic glimpse of the
performance
• 5-fold cross-validation
• Every compound represented in 4/5 models
• Hyperparameter optimization to increase
performance of validation sets
• Resulting performance trustworthy ?!
• 5-fold nested cross-validation → 25 models
• Every compound represented in 16/25 models
• Increased computational requirements
• 5x Hyperparameter optimizations to increase
performances of validation sets
• Final performances evaluated using
corresponding outer loop test sets
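The 16/25 figure follows from the fold arithmetic: each compound sits in the outer training portion for 4 of 5 outer splits, and within each of those it sits in the inner training portion for 4 of 5 inner models, giving 4 × 4 = 16. A minimal sketch (round-robin fold assignment by index, purely illustrative):

```python
def nested_fold_training_count(n_samples, sample, n_outer=5, n_inner=5):
    """Count in how many of the n_outer * n_inner inner models `sample`
    appears in the training data (folds assigned round-robin by index)."""
    count = 0
    for outer in range(n_outer):
        # samples with index % n_outer == outer form the outer test fold
        if sample % n_outer == outer:
            continue  # sample is held out of this entire outer split
        outer_train = [i for i in range(n_samples) if i % n_outer != outer]
        pos = outer_train.index(sample)
        for inner in range(n_inner):
            # inner validation fold: every n_inner-th sample of outer_train
            if pos % n_inner != inner:
                count += 1  # sample is in this inner model's training set
    return count

# Every compound ends up in 16 of the 25 inner models:
counts = {nested_fold_training_count(100, s) for s in range(100)}
```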
29
Training of a NN
1. Get a job (a set of hyperparameters) from the jobserver
2. Repeat for all training/test sets:
2.1 Build a NN based on the hyperparameters
2.2 Train the NN using a training set; a Balanced-Batch-Generator maintains the same active/inactive ratio within each batch, and Early-Stopping triggers when the mean validation loss of a sliding window (15 epochs) does not improve for 100 epochs
2.3 Evaluate the best state (center of the best window) on the validation set, metric Cohen’s Kappa (agreement of labels vs. prediction, corrected for the agreement of 2 random observers)
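The sliding-window early-stopping rule described above can be sketched like this (a simplified illustration, using the window and patience values from the slide):

```python
def early_stop_epoch(val_losses, window=15, patience=100):
    """Return (stop_epoch, best_center): training stops once the mean
    validation loss of a `window`-epoch sliding window has not improved
    for `patience` epochs; the kept state is the center of the best window."""
    best_mean, best_end = float("inf"), None
    for end in range(window, len(val_losses) + 1):
        mean = sum(val_losses[end - window:end]) / window
        if mean < best_mean:
            best_mean, best_end = mean, end
        elif end - best_end >= patience:
            return end, best_end - window // 2 - 1  # center of best window
    return len(val_losses), best_end - window // 2 - 1

# Loss falls for 50 epochs, then plateaus: training stops `patience` epochs
# after the best window, and the state at the window's center is kept.
losses = [1.0 / (e + 1) for e in range(50)] + [0.02] * 200
stop, best_center = early_stop_epoch(losses)
```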
30
So many parameters..
Genetic Algorithm
• Population-Size: 100
• Workers: 10
• Fingerprint-Size: 1024
• Smarts-Patterns: 826
• Evolution-Strat.: Drop-Worst-50%
Mutation Settings
• Default:
• Mutation-Rate: 5%
• Mutation-Strength: 1
• Crossing-Over-Rate: 30%
• Increased:
• Mutation-Rate: 10%
• Mutation-Strength: 2
• Crossing-Over-Rate: 30%
Training
• Optimizer: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam
• Loss-Functions: mae, mse, msle
• Learning-Rate: 0.05, 0.1, 0.5, 1.0
• Weight-Decay: 0.0, 1E-7, 5E-7
• Momentum: 0.0, 0.1, …, 0.9
• Nesterov: 0, 1
• Batch-Size: 5%, 6%, …, 20%
Architecture
• Layers: 1-4
• Layer-Types: Dense, Dropout
• Neurons: 32, 64, …, 512
• Dropout-Ratio: 5%, 10%, …, 90%
• Activation-Functions: linear, sigmoid, hard-sigmoid, softmax, relu, tanh
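A genetic algorithm over a search space like this can be sketched as follows (a toy version with a stand-in fitness function and an abridged search space; the real GA evaluates each entity by training a network and scoring Cohen's kappa). The Drop-Worst-50% evolution strategy, 5% mutation rate, and 30% crossing-over rate match the defaults above:

```python
import random

# Abridged search space (a subset of the parameters listed above)
SPACE = {
    "layers": [1, 2, 3, 4],
    "neurons": list(range(32, 513, 32)),
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "optimizer": ["sgd", "rmsprop", "adagrad", "adadelta", "adam", "adamax", "nadam"],
}

def random_entity():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(entity, rate=0.05):
    """Resample each gene with probability `rate` (mutation rate 5%)."""
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in entity.items()}

def crossover(a, b, rate=0.30):
    """Take each gene from the other parent with probability `rate`."""
    return {k: (b[k] if random.random() < rate else a[k]) for k in a}

def evolve(fitness, pop_size=100, generations=10):
    population = [random_entity() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # Drop-Worst-50%
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

# Stand-in fitness (NOT the real objective): prefer small, slow-learning nets
toy_fitness = lambda e: -e["neurons"] / 512 - e["learning_rate"]
best = evolve(toy_fitness, pop_size=20, generations=15)
```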
31
Datasets
| Dataset | hERG | Micronucleus-Test |
| --- | --- | --- |
| Compounds | 6999 | 798 |
| Actives | 3205 (46%) | 263 (33%) |
| Inactives | 3794 (54%) | 535 (67%) |

Binary classification: inactive → 0, active → 1
32
Found NN-Hyperparameters
33
Found NN-Hyperparameters
34
Improvement of NNs while running the GA
Initial population starts with inner-kappa values of ~0.6 in all splits
GA is able to improve the performance of the best entities even more (red line)
Mutations can lead to bad-performing entities (blue line) until the last generation
35
Novelty of Architectures
Proportion of new entities in the population decreases during the runtime of the GA
A higher mutation rate (red line) increases the searchable space for the GA
36
Influence of Hyperparameters
Example label: 1_activation (344) — first hidden layer · activation function of this layer · (number of contributing pairs)
Contributing pairs only differ by the shown parameter
Boxplots are based on the absolute difference of the inner-kappa values of all contributing pairs
37
User-Interface
38
Implemented an algorithm to create a consensus-model using 5-fold nested cross-validation
Each compound is represented in 16 of 25 NNs
Calculation needs 8-14 hours (e.g. during a night) using a GTX-Cluster
GA improves already high kappa values of NNs even more
Kappa values of final NN-models are mostly larger than 0.5 (“moderate” according to Landis & Koch, 1977)
Further steps:
Possibility to use chemical descriptors and multiple fingerprints
Option to create multi-class models (more classes than just 0 and 1) and regression models
(Polishing up and writing a paper)
Conclusion
39
Implementation of the GA