atomnet: a deep convolu0onal neural network for bioac0vity...

43
AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity Predic0on in Structure-based Drug Discovery Izhar Wallach, Michael Dzamba, Abraham Heifets Nithin Kannan CS 371 Presenta2on February 1, 2018

Upload: others

Post on 24-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

AtomNet: A Deep Convolu0onal

Neural Network for Bioac0vity

Predic0on in Structure-based

Drug Discovery

Izhar Wallach, Michael Dzamba, Abraham Heifets

NithinKannanCS371Presenta2on

February1,2018

Page 2: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Current Approaches to Drug Discovery

•  StructurebasedalgorithmslookatthestructureoftargetproteinstodesignligandsthatbindØ Structurebasedalgorithmshavetoomanyfalseposi2veexamplesØ Theseareexpensivetoexperimentallyverify

• MLalgorithmsaIempttolearnmo2fsandstructuralsimilari2esbetweenligandstopredictligandreceptorinterac2onsØ TrainingsetnotsufficientforaccurateligandbasedMLalgorithms

Page 3: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

AtomNet

• Convolu2onalNeuralNetworksrecognizethelocalstructuresandpaIernsthatmakesuccessfulligands•  Theneuralnetworkisabletopickuponaspectssuchashydrogenbondingandaroma2city

Leeetal.2009Avisualiza2onofincreasinglycomplexlayersofaCNN Wallachetal.2015Sulfonyldetec2on

Page 4: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

AtomNet Architecture

•  TheinputtoAtomNetisa3DinputconvolvedoverastackofhundredsoffiltersØ Theinputcanbethoughtofasanimageofthe3Dstructurewitheachgridcellassociatedwithastructurevector

•  Theoutputisaprobabilitydescribingwhethertheligandwillbindstronglytothetargetprotein

•  Themodelhasfourconvolu2onallayersandtwofullyconnectedhiddenlayerswith1024neuronseachØ 128*5↑3 ,256* 3↑3 , 256* 3↑3 ,256* 3↑3 (numberoffilters,filterdimension)

Page 5: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

The Directory of Useful Decoys Enhanced

(DUDE) Dataset

•  Thedatasetcontainsadiversesetofac2vemolecules(ligands)forcertaintargetsetsofproteinsØ Foreveryac2vemolecule,thereisasetofpropertymatcheddecoys,thatareinac2ve

•  Similarac2vemoleculesareremovedbyclusteringusingscaffoldsimilarityinordertoreduceanaloguebiaswhentraining/tes2ngAtomNetØ ScaffoldSimilarity:structuralsimilarity(numberofringandchainatoms)

• Usedfortrainingandtes2ngofAtomNet

Page 6: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

The ChEMBL-20 PMD Dataset

•  Thedatasetisasetofac2veligandscompiledbytheEuropeanMolecularBiologyLaboratory

• ConstructedinamanneranalogoustotheDUDEdataset

•  30PropertyMatchedDecoys(PMD)associatedwitheachac2vemolecule

• Bemis-Murckoscaffoldswereusedtoclusterthemoleculestoavoidanaloguebias

Page 7: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Experimentally Verified Inac0ves

•  TheproblemwithPMDdatasetsisthattheyrequiredecoystohaveasufficientlydifferent2Dfingerprint

•  Thisallowstheconstruc2onofmanydecoyswithoutexpensiveexperimentalvalida2on

•  However,thisinitselfcreatesabiasinwhatourmodelislearning

•  Therefore,theseexperimentallyverifiedinac2vesservedtoprovideamorerepresenta2vedatasetandforceourmodeltoproperlyclassifyac2vityonadversarialexamples

Page 8: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

AUC metric

•  Thiscurvelooksatthetrueposi2verateandfalseposi2verateatdifferentbindingthresholds

•  Ideally,wewouldliketohaveourmodelonlypredicttrueposi2ves,givingusanAUCof1

• AsimilarlyimportantmetricisthelogAUCcurve,whichplacesmoreemphasisonareasoftheAUCcurvewithlowerfalseposi2verates

Page 9: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

UNMC2010: VaryingAUCcurves

Page 10: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Results of the AtomNet Model

•  Thereceiveropera2ngcharacteris2cisagraphofthetrueposi2veratevsthefalseposi2verate.

• Discoveringtrueposi2veswithasfewfalseposi2vesaspossiblestreamlinesthedrugdiscoveryprocess

Wallachetal.2015AtomNet’sperformanceonvariousdatasets

Page 11: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Strengths

•  Theconvolu2onallayerspickuponlocalchemicalstructures,usinginterac2onsbetweenthesetono2ceincreasinglycomplexrela2onsinourmodel

•  Themodelisrobust,workingonavarietyofdifferentproteins

• DoeswellcomparedtootherstructurebasedmodelsontheDUDEdataset

Page 12: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Limita0ons

• NoaIemptmadetoalterhyperparametersofthemodel

•  Thecodeisproprietary,meaningotherscan’timprovethemodel

• Didnottrydifferentinputrepresenta2ons

•  Themodeldoesn’tusephysics

Page 13: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Learning Deep Architectures for Interaction Prediction in

Structure-based Virtual ScreeningCritique by Daniel Hsu

Page 14: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Virtual Screening

• Virtual screening - computational drug discovery

• Identifies candidates for ligands to bind proteins

• Structure-based: Use binding capacity and structure

Page 15: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Difficulties• Complexity of chemical space: 1060 [1]

• Commercially available compounds: 107 [2]

• High false positive rate of identified ligands [3]

• Limited datasets for structure-based virtual screening

[1] RS Bohacek, C McMartin, and WC Guida. The art and practice of structure-based drug design: A molecular modeling perspective. Medicinal research reviews, 16(1):3–50, 1996.[2] JJ Irwin, T Sterling, MM Mysinger, ES Bolstad, and RG Coleman. Zinc: a free tool to discover chemistry for biology. Journal of chemical information and modeling, 52(7):1757–1768, 2012.[3] Deng, N.; Forli, S.; He, P.; Perryman, A.; Wickstrom, L.; Vijayan, R. S.; Tiefenbrunn, T.; Stout, D.; Gallicchio, E.; Olson, A. J.; Levy, R. M. Distinguishing Binders from false positives by free energy calculations: fragment screening against the flap site of HIV protease. J. Phys. Chem. B 2015, 119, 976−988.

Page 16: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Approach

• Use deep learning for structure-based virtual screening

• Predict binding capacity of protein-molecule pair

• Propose new benchmark dataset more suitable for structure-based virtual screening

Page 17: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Model Pipeline• Process protein and small molecule separately into two

fixed-size descriptions (fingerprint vectors)

• Use neural nets to further transform fingerprints

Page 18: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Ligand Representation: Fingerprints

• Convert molecule into binary vector

• 1D representation of a molecule

Page 19: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

ECFP Fingerprints

• Hashing - hash concatenated features of neighborhood of atom

• Indexing - Set 1 into index of feature vector

• Sensitive to small perturbations in molecular structure

Page 20: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

ECFP Fingerprints

https://chemaxon.com/products/screen-suite

Page 21: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Atom Convolution Fingerprints

• Initialize vector representation of each with its element, connectivity, number of hydrogen bonds, etc

• Update vector with convolution of weight matrix on neighbor atoms

• Apply non-linearity to updated vector

Page 22: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

N1

N2

N3

H

Page 23: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Atom Convolution Fingerprints

• Obtain fingerprint of ligand by convolving another matrix with the final atom vectors to obtain combined sum

• Traditional ECFP fingerprint is binary vector, so approximate this with a softmax operation

• Softmax also makes this operation differentiable

Page 24: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

A1

A2

A3

V

FSoftmax

Page 25: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

DUD-E experiment

• Used model to classify active ligands vs decoys using ECFP only: 0.904 AUC

• Perhaps dataset mostly contains information on differences between ligands and decoys, instead of interactions between ligand and proteins

• Suspicious because model makes conclusions on binding of ligands and proteins without using protein information

Page 26: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

PDBBind + DUD-E

• Use PDBBind for training and DUD-E for testing

• No decoys, negative examples from random sampling

• Fingerprint with network achieved 0.714 AUC, better than fingerprint with ECFP and other algorithms.

Page 27: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Conclusion

• Deep learning architecture for predicting binding potential

• Form fingerprints using atom convolution instead of ECFP

• Propose new benchmark with PDBBind and DUD-E

• Criticism: Deep learning may not be “cheating” if it can make predictions on binding potential using only ligand information

Page 28: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Machine learning for structure-based virtual screening

Page 29: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

What are Neural Networks?

2

5

5 …

25

ℎ" = $(&'(")( + +")-.

(

…Input Layer

Hidden Layer

Output Layer

Page 30: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

What are Neural Networks?

3

5

5 …

25

ℎ" = $(&'(")( + +")-.

(

Page 31: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Convolutional Neural Networks

4

Local connectivity in 2D space. Similar to human vision.

Krizhevsky, A. et al. Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012)

Page 32: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Convolutional Neural Networks

5

Shares the same weights and biases. Identifies a certain pattern and is translationally independent.

Krizhevsky, A. et al. Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012)

Page 33: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Convolutional Neural Networks

6

Each feature map learns to identify a certain pattern in the image

Krizhevsky, A. et al. Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012)

Page 34: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Convolutional Neural Networks

7Krizhevsky, A. et al. Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012)

Page 35: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Convolutional Neural Networks

8

3D mesh of a molecule where each point in the grid contains information about the atom types.

Page 36: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Protein-Ligand Scoring with Convolutional Neural Networks

Ragoza M. et al. J. Chem. Inf. Model. 57 942−957 (2017)

Page 37: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Improvements from AtomNet

• More detailed methodology.• Includes descriptors which better describe the

binding site, includes aromaticity, protonation state etc.

• Improved visualization method enabling mutation analysis.

• Stringent model evaluation to check for overfitting.

10

Page 38: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Generating a training set

11

DUD-E is a dataset with 102 proteins, 200,00 ligands and over 1 million decoy molecules. However, not all co-crystal structures are available, but reference complexes are given.

The ligand on the reference receptor is removed and a box is drawn from a 8 Å around the ligand.

Ligands and decoys are docked using smina and the Autodock Vina scoring function.

Page 39: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

About Autodock Vina and smina

• Given a bounded box, smina will run a docking algorithm to generate protein-ligand conformations.

• Each structure is then given a score, where the score is constructed from pairwise interactions including sterics, hydrogen bonding etc. These parameters are optimized using the PDBbind dataset.

• Assumptions• Receptor is rigid.• Protonation state of atoms remain unchanged.

12Trott et al. J Comput Chem. 31 455-461 (2010)

Page 40: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Results

13Ragoza M. et al. J. Chem. Inf. Model. 57 942−957 (2017)

Outperforms Vina on 90% of the targets.

Page 41: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Independent Test Sets

• How much is our model overfitting our data?

14Ragoza M. et al. J. Chem. Inf. Model. 57 942−957 (2017)

Page 42: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Visualization

15Ragoza M. et al. J. Chem. Inf. Model. 57 942−957 (2017)

Page 43: AtomNet: A Deep Convolu0onal Neural Network for Bioac0vity ...cs371.stanford.edu/2018_slides/ml-screening.pdf · About AutodockVinaand smina •Given a bounded box, sminawill run

Limitations

• Ligand screening takes Autodock Vina prediction of ligand conformation as the “truth”.

• Assumption that receptors only have one binding site.• Co-crystal structures are not always readily available.• Entropy and enthalpy not fully encapsulated within the descriptors.• Feature maps are potentially interpretable.• Paper also carried out pose prediction, but the performance was around the

same as Vina.

16