artificial intelligence in drug designartificial intelligence in drug design ola engkvist, hit...

32
Artificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG Research & Innovation April 2 2019

Upload: others

Post on 22-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Artificial Intelligence in Drug Design

Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden

ELRIG Research & Innovation April 2 2019

Page 2: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Drug Design

What to make next? How to make it?

De novo design

Multi-parameter scoring function

Retrosynthesis

Page 3: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

What is different now?

3

Augmented design

Autonomousdesign

Automatic design

de novo molecular design

Synthesis prediction

Automation

Data generation

Page 4: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

It takes two to tango

4

Artificial Intelligence Chemistry Automation

Page 5: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Science @AZ

5

Page 6: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Neural Networks & Deep Learning

6

• Neural Networks known for decades

• Inputs, Hidden Layers, Outputs

• Single layer NNs have been used in QSAR

modelling for years

• Recent Applications use more complex

networks such as

• Multi-layer Feed-Forward NNs

• Convolutional NNs

• biological image processing

• Auto-encoder NNs

• Recurrent NNs

• Trained using Maximum Likelihood

Estimation to maximize the likelihood

of next character

Page 7: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Why? Generation of Novel Compounds in the 1060 Chemical Space!

7

Where´s the impact?

• Use for de novo Molecular Design

• Scaffold Hopping

• Novelty

• Virtual Screening

• Library Design

10601010-1012

Page 8: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Natural language generation and molecular structure generation

8

• Can we borrow concepts from natural language processing and

apply to SMILES description of molecular structures to generate

molecules?

• Conditional probability distributions given context

• 𝑃 𝑔𝑟𝑒𝑒𝑛 𝑖𝑠, 𝑔𝑟𝑎𝑠𝑠, 𝑇ℎ𝑒

• 𝑃 𝑂 =, 𝐶, 𝐶

The grass is ?

C C = ?

Page 9: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Recurrent Neural Network & Natural language generation

9

Page 10: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Tokenization of SMILES

10

• Tokenize combinations of characters like “Cl” or “[nH]”

• Represent the characters as one-hot vectors

Page 11: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

The generative process

11

Page 12: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Reinforcement learning

12

Learning from doing

Action Reward Update behaviour

Design molecule

Active?

Good DMPK?

Synthetically accessible?

Make more like this?

Make something else instead?

Agent

Page 13: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

AI live: Create Structures Similar to Celecoxib

13

• Key Message

• RNN generates

structures similar

to Celecoxib

• Rapid sampling!

• Average score

describes how

many learning

steps are required

to reach similar

compounds

Page 14: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Some misconceptions about de novo RNN generated molecules

14

“The molecules are not diverse”

“The molecules are not synthetic feasible”

Answer: The generated molecules follows the properties of the molecules used as prior

Segler et al ACS Central Sci. 2018, 4, 120-131 Ertl et al arXiv:1712.07449

Diversity Synthetic feasibility

Page 15: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

“Cambrian explosion” of different DL based molecular de novo generation methods

15

PyTorch + RDKit + ChEMBL => anyone with a computer can contribute =>

Benchmarking is urgently needed

Page 16: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Which benchmarks? What are the relevant questions?

Does the same algorithm work best for both

scaffold hopping and lead series optimization?

Which algorithm samples the underlying

chemical space most complete?

1

2

3

Which algorithm zooms most efficiently to the

most interesting regions of chemical space?4

Which is best way to describe molecules,

strings or graphs?

Page 17: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Benchmark published by the scientific community

• MOSES Polykovskiy et al

• https://arxiv.org/abs/1811.12823

• Diversity and quality of generated molecules

1

2

3

• Arus-Pous et al • https://chemrxiv.org/articles/Exploring_the_GDB13_Chemical_Space_Using_Deep_Generative_Models/7172849

• Complete sampling of the relevant chemical space4

• Klambauer et al

• J. Chem. Inf. Mod. 2018, 58, 1736

• Distribution between generated and real molecules

• GuacaMol Brown et al

• https://arxiv.org/abs/1811.09621

• Efficient optimisation of a specific property

Page 18: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

AI + Big Data Transforms Synthesis Prediction

18

Manual heuristic rule generationAutomatic rule generation

Using Machine Learning

Accelerated by

Monte-Carlo

Tree search

What is the impact?

• Improves synthesis success

• Future implementation in iLab

• Route suggestions for Medicinal Chemists

• Invention of “novel” reactions

Segler et al Nature 2018, 555, 604

Page 19: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Embedding Synthesis Prediction in Projects @AZ

• Build of Reaction Knowledge Base

Flat filesFlat filesFlat filesReaxys

Flat filesFlat filesFlat filesPatent

data DMTA

Make

Test

Analyse

Design

AZ Reaction Connect

~20M reactions

MedChem ELN

PharmSci ELN

AZ ChemistryConnect

iLAB

DMTA

cycles

Drug Discovery

Project

Page 20: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Synthesis Prediction from Discovery to Launch

20

Forward Synthesis Prediction

Synthetic Route Prediction

Synergy between ML and QM

Support high-throughput experimentation

Page 21: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Artificial Intelligence Guided Drug Design Platform

21

Generation of Novel

Chemical Space

Reaction & Synthesis

Prediction

iLAB

DMTA

Make

Test

Analyse

Design

Desirability

function

Σ IC50, LogP,

Novelty etc.

Iterations

Profiling

AI Design

Platform

Fully Automated

DMTA Cycle

Page 22: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

2018 Proof-of-Principle Pilot Study

1st iteration

Novelty

3rd iteration

Expansion2nd iteration

Novelty

4th iteration

iLab library

~2month~2month ~2month

Constant re-learning and training

RIA

• Novelty key goal

• Crowded IP space

• Lots of available data

• Selectivity

• New promising series

identified

Oncology

• Selectivity key goal

• Novelty

• Several promising

series identified

CVRM

• Optimising HI series

• Tool compound

Page 23: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

23

Lessons from pilot study

• It works!

• Novel scaffolds were identified in crowded chemical space

• Compound series could be efficiently optimised

• Affinity and ADME predictions are still bottlenecks

• Too many ideas might make prioritization for synthesis challenging

• Chemistry resources need to be frontloaded

• Optimisation under constraints might lead to molecules that is difficult to synthesize

Page 24: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

• Synergize with automation

• Better Machine Learning Models• Access to more data (for instance IMI2 Call 14 Topic 3)

• Experimental descriptors

• Graph convolution, include protein based information

• Multi-task modelling

• Matrix factorization with side information

• Free energy calculations• Progress in speed

• Combine with machine learning

• Confidence estimation• Conformal prediction

• Bayesian methods

• Benchmarking• Public Chemogenomics set available (Excape-DB, Pidgin)

• Blind competitions (SAMPL, D3R)

How can we improve affinity prediction?

24

Page 25: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Deep learning have struggled to improve bioactivity predictions significantly

25

Ramsundar J. Chem. Inf. Model. 2017, 57, 2068

Tong et al https://www.biorxiv.org/content/10.1101/473074v1

Still potential due to its flexibility

Page 26: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Experimental descriptors can improve predictions

26

HTS

Imaging

Transcriptomics

Simm et al Bioxriv later Cell Chem. Bio. 25, 611.

Page 27: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Uncertainty estimation is key

27

• Platt scaling

• Iso-tonic regression

• Venn-ABERS predictor

Calibrated probabilities

Page 28: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

We are still only in the beginning

Formalize Chemical Intuition

Learn from Diverse Data

Extend to new Modalities

Explore New Deep Learning Architectures

Protein Structure Prediction

Automated Machine Learning

Page 29: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

What is the vision?

Augmented

Design

• AI as idea generator

• AI generates and scores

molecules

• All decisions by chemist

based on AI

• Synthesis route design by

chemist

Autonomous

Design

• AI as designer

• Chemist defines,

monitors and updates

the desirability

function

Automated

Design

• AI as designer

• AI monitors and

updates desirability

function when

needed

Tool compounds on “tap”

Green, Engkvist & Pairaudeau Fut. Med. Chem. 2018

Page 30: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Will ML/AI revolutionize drug design?

My personal opinion(s)

30

• Only time will tell….

• The last commonly agreed revolution was the introduction of DMPK

departments in the 90s, so the bar is high

• ML/AI like other promising technologies (for instance PROTACS) warrants

further investments

• More data, automation and ability to learn makes ML/AI bound to have

larger impact on drug design in the future

• During my 19 years in industry it has never been as exciting to work with in

silico drug design

Page 31: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Acknowledgements

31

Discovery Sciences Molecular AI TeamThierry Kogej

Hongming Chen

Isabella Feierberg

Atanas Patronov

Esben Jannik Bjerrum

Preeti Iyer

Jiangming Sun (Postdoc 2015-2017)

Noe Sturm (Postdoc 2017-2018)

Philipp Buerger (Postdoc 2017-2020)

Jiazhen He (Postdoc 2019-2022)

Rocio Mercado (Postdoc 2018-2021)

Thomas Blaschke (PhD student 2017-2018)

Josep Arus Pous (PhD student 2018-2019)

Michael Withnall (PhD student 2018-2019)

Oliver Laufkötter (PhD student 2018-2019)

Laurent David (PhD student 2018-2019)

Ave Kuusk (PhD student 2016-2019)

Marcus Olivecrona (AZ GradProgram 2017)

Alexander Aivazidis (AZ GradProgram 2018)

Dhanushka Weerakoon (AZ GradProgram 2018-2019)

Panagiotis-Christos Kotsias (AZ AI GradProgram 2018-2019)

Edvard Lindelöf (Master Thesis Student 2018-2019)

Simon Johansson (Master Thesis Student 2019)

Oleksii Prykhodko (Master Thesis Student 2019)

Academic CollaboratorsMarwin Segler (Munster)

Juergen Bajorath (Bonn)

Jean-Louis Reymond (Bern)

Andreas Bender (Cambridge)

Sepp Hochreiter (Linz)

Gunther Klambauer (Linz)

Sami Kaski (Helsinki)

Discovery Sciences Garry Pairaudeau

Clive Green

Lars Carlsson

Nidhal Selmi

DSM AI TeamErnst Ahlberg

Suzanne Winiwarter

Ioana Oprisiu

Ruben Buendia (Postdoc 2018)

PharmSciPer-Ola Norrby

2018 PoP Pilot StudyWerngard Czechtizky

Ina Terstiege

Christian Tyrchan

Anders Johansson

Jonas Boström

Kun Song

Alex Hird

Neil Grimster

Richard Ward

Jeff Johannes

Page 32: Artificial Intelligence in Drug DesignArtificial Intelligence in Drug Design Ola Engkvist, Hit Discovery, Discovery Sciences, IMED Biotech Unit, AstraZeneca, Gothenburg, Sweden ELRIG

Confidentiality Notice

This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove

it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the

contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,

Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com

32