approaches to improvements in quality of computational models in drug discovery. · 2016-04-19 ·...

Approaches to improvements in quality of computational models in drug discovery.

Dr. Michal Vieth

Indianapolis, In USA, November 2015

Summary of professional accomplishments – Attachment No. 1 (English version) to the application for the habilitation qualification

1. Name and surname

Michal Vieth

2. Held diplomas, scientific / arts degrees - with the name, place and year of acquisition, and the title of doctoral dissertation.

10.1995 – Doctor of Philosophy in Chemistry, specialization: theoretical chemistry. Department of Chemistry, The Scripps Research Institute, La Jolla, CA. Title of the doctoral thesis: “Theoretical Studies on Leucine Zippers. Folding and Multimeric Equilibria”, supervisor: Dr. Jeffrey Skolnick.

06.1991 – Master of Science degree in chemistry, Faculty of Chemistry, University of Warsaw.

3. Information on current and previous employment in scientific / art institutions.

1996 – 1998 The Scripps Research Institute, Postdoctoral Fellow

Docking algorithm development – utilization of various search algorithms and modification of docking functions, CHARMM implementation. Supervised by prof. Charles Brooks III.

1998 – 2000 Research Scientist, Chemistry Research and Technologies, Eli Lilly & Co.

2001 – 2004 Principal Research Scientist, Discovery Chemistry, Eli Lilly & Co.

2004 – 2009 Research Advisor, Discovery Chemistry, Eli Lilly & Co.

2009 – current Senior Research Advisor, Discovery Chemistry, Eli Lilly & Co.

Personal statement/position description. My research at Lilly focuses on the development and application of computational technologies to understand and utilize molecular interactions in biological systems, with particular focus on fragment approaches. I lead multiple technology initiative teams and collaborate with scientists from a variety of disciplines. I currently collaborate with multiple academic groups on computational technology development (the Meiler group at Vanderbilt, through Lilly Research Award funding, on the FolditDD drug discovery game; Roux’s group at the University of Chicago on force field development and free energy MD; Sali’s group at UCSF on computational protocols for EM model reconstruction), on tuberculosis (the Sacchettini group at TAMU, new TB target discovery) and on malaria (Lowe’s group at Purdue, kinase inhibitors in malaria). Over 18 years at Lilly I have contributed to the development and delivery of multiple computational applications supporting drug discovery efforts. I also successfully led a chemistry team that delivered a hit package which eventually resulted in a clinical candidate for a kinase project. I have supervised and mentored multiple Ph.D. scientists, including four postdoctoral scientists, currently supervise one postdoctoral fellow and two Ph.D. scientists, and have maintained a strong publication record while performing pharmaceutical research. I also advise on the NIH panel for the Illuminating the Druggable Genome initiative. I co-organized the IPK2014 international kinase meeting in Warsaw in September 2014, currently serve on the organizing committee for EUROQSAR 2016, to be held in Verona, Italy, and am organizing the IPK2016 conference, to be held in Poland in September 2016.

4. Indication of achievement under Art. 16 Paragraph 2 of the Act of 14 March 2003 on Academic Degrees and Title, and Degrees and Title in Art (Dz. U. No 65, item 595 with amendments):

a) the title of the scientific achievement: Approaches to improvements in quality of computational models in drug discovery.

Publications comprising academic achievement

* – corresponding author

H1. Vieth, Michal; Hirst, Jonathan D; Kolinski, Andrzej; Brooks, Charles L*, Assessing energy functions for flexible docking, Journal of Computational Chemistry, 19, 14, 1612-1622, 1998

H2. Wu, Guosheng; Robertson, Daniel H; Brooks, Charles L; Vieth, Michal*, Detailed analysis of grid-based molecular docking: A case study of CDOCKER - A CHARMm-based MD docking algorithm, Journal of Computational Chemistry, 24, 13, 1549-1562, 2003

H3. Vieth, Michal*; Cummins, David J, DoMCoSAR: a novel approach for establishing the docking mode that is consistent with the structure-activity relationship. Application to HIV-1 protease inhibitors and VEGF receptor tyrosine kinase inhibitors, Journal of Medicinal Chemistry, 43, 16, 3020-3032, 2000 (78 citations, first, lead, corresponding author, 2000 IF 4.134)

H4. Wu, Guosheng; Vieth, Michal*, SDOCKER: a method utilizing existing X-ray structures to improve docking accuracy, Journal of Medicinal Chemistry, 47, 12, 3142-3148, 2004

H5. Vieth, Michal*; Siegel, Miles G; Higgs, Richard E; Watson, Ian A; Robertson, Daniel H; Savin, Kenneth A; Durst, Gregory L; Hipskind, Philip A, Characteristic physical properties and structural fragments of marketed oral drugs, Journal of Medicinal Chemistry, 47, 1, 224-232, 2004

H6. Siegel, Miles G; Vieth, Michal*, Drugs in other drugs: a new look at drugs as fragments, Drug Discovery Today, 12, 1, 71-79, 2007

H7. Vieth, Michal*; Siegel, Miles, Structural fragments in marketed oral drugs, in Fragment based Approaches in Drug Discovery, 34, 2006, Methods and Principles in Medicinal Chemistry

H8. Sutherland, Jeffrey J; Higgs, Richard E; Watson, Ian; Vieth, Michal*, Chemical fragments as foundations for understanding target space and activity prediction, Journal of Medicinal Chemistry, 51, 9, 2689-2700, 2008

H9. Vieth, Michal*; Erickson, Jon; Wang, Jibo; Webster, Yue; Mader, Mary; Higgs, Richard; Watson, Ian, Kinase inhibitor data modeling and de novo inhibitor design with fragment approaches, Journal of Medicinal Chemistry, 52, 20, 6456-6466, 2009

H10. Gao, Cen; Cahya, Suntara; Nicolaou, Christos A; Wang, Jibo; Watson, Ian A; Cummins, David J; Iversen, Philip W; Vieth, Michal*, "Selectivity data: assessment, predictions, concordance, and implications", Journal of Medicinal Chemistry, 56, 17, 6991-7002, 2013

c) discussion of the scientific / artistic goals of the above publication / publications and the results achieved together with a discussion of their possible use

Introduction to the subject and the scientific goals of the publications comprising the academic achievement

Drug discovery is an extremely complex process: it requires understanding the differential role of the biomolecular target in the diseased and healthy state and its modulation by a drug molecule (1). In addition, the interactions of the drug molecule with other biomolecules in the body prior to its delivery to the site of action add to the complexity. Computational technologies aim to accelerate the discovery process by aiding all of its stages, from target selection, screening set design, analysis and follow-up, through improving the properties and potency of the molecules, to optimization of the in vivo parameters (2). A critical understanding of the limitations of these technologies makes it possible not only to use them properly in various settings but also to continually challenge the field and make improvements that broaden their contextual utilization. This dissertation focuses on critically analyzing and understanding the current state of the art in computer-aided drug design, on creating algorithms, tools and processes that improve the quality of models, and on demonstrating proofs of concept for the use of those improvements in small molecule discovery. For the purpose of this work, small molecules are defined as having a molecular weight of less than 1000 Daltons.

In particular, this dissertation presents the design and utilization of three aspects of computational technology: a small molecule docking algorithm (H1-H4) (3), its origins, design and continual improvements; in silico analysis of the building blocks (molecular fragments) of existing bioactive molecules (H5-H8); and the utilization of that analysis in the design of new chemical matter (H9). It also discusses the limitations and current state of the art in predicting the biological activity and selectivity of small molecules (H10), and touches on the ways biological data can be used to relate the biological targets of small molecules (H8). I hope to provide a perspective on future directions for improving the understanding and prediction of structural and energetic aspects of protein-ligand interactions and on how they can be used to improve medicinal chemistry designs. The drug discovery pipeline and the points of intervention of the techniques presented in the publications are highlighted in Figure 1.

Figure 1. Drug discovery process and sample utilization of the computational technologies discussed in this work throughout the process. All stages of drug discovery will be discussed, except for ADME (Absorption, Distribution, Metabolism, Excretion) predictive models.

Particular scientific goals and the presentation of the publications results

I. Small molecule to protein docking algorithm development and improvements

i) Analysis and process to select the docking energy function and proof of concept application in a small number of test cases (Publication H1)

ii) Improvements to the algorithm, docking pipeline creation, speedup, automated small molecule force field, large scale validation of the Molecular Dynamics docking algorithm CDOCKER (currently marketed by Accelrys/Dassault) (Publication H2)

iii) Utilization of information present in binding data to improve docking accuracy (Publication H3)

iv) Utilization of existing X-ray information to improve docking accuracy for specific targets (Publication H4)

II. Computational fragment approaches to analysis, design and binding affinity prediction

i) Analysis of building blocks (molecular fragments) of existing drugs, initial molecular fragmentation algorithm introduction (Publication H5)

ii) Detailed analysis of related drugs based on fully embedded fragments and implications for the historical trends in drug discovery (Publications H6, H7)

iii) Analysis of fragments of bioactive molecules and utilization of fragment statistics in relating target space, introduction to a new generation of fragmentation algorithm and the concept of de novo design (Publication H8)

iv) Utilization of fragment-based analysis of bioactive molecules and predictive models in de novo design of 7 kinase libraries (Publication H9)

v) Assessment of the state of the art of predictive models for selectivity prediction and comparison to variability in experimental data (Publication H10)

III. Future outlook on the improvements of quality of computational technologies in drug discovery

i) Protein ligand interaction, simulations, and free energy methods

ii) Small molecule FF improvement

iii) Statistical quality improvement

I. Small molecule to protein docking algorithm development and improvements

Most small molecule drugs interact specifically with a biomolecular target, usually a protein enzyme or receptor. The target is intended to be modulated in order to stop, reverse or manage the disease (4). In the early phase of drug discovery, during lead generation, knowledge of the protein-ligand complex structure, while not absolutely necessary (1), can greatly accelerate the discovery process (5, 6). It also provides a molecular-level understanding of the key interactions responsible for binding affinity, and enables optimization of potency, selectivity and, indirectly, ADME (7) (absorption, distribution, metabolism, excretion) properties in later phases. X-ray crystallography is the key technique for determining protein-ligand structures (8), through either soaking or co-crystallization experiments. As of October 2015, more than 59K protein-ligand complex structures (with 17K chemicals in the drug-like molecular weight range between 100 and 1000 Daltons) have been deposited in the Protein Data Bank (PDB), 90% of them from crystallography (9). This information is a great resource for studying and understanding protein-ligand interactions, validating predictive models and developing computational methodologies.

i) Analysis and process to select the docking energy function and proof of concept application in a small number of test cases

Molecular docking (10) is a companion computational technique which aims at predicting the ligand binding mode in the protein target. It requires detailed knowledge of the active site and, to make the problem tractable, in most applications uses a rigid protein structure taken from a known complex. In reality, ligand binding sometimes leads to smaller or larger changes in protein structure (11), which complicates the prediction of the binding mode and accounts for the vast majority of inaccurate pose predictions (12).

In most early docking benchmark studies, complex structures were separated, the ligand conformations and positions were randomized, and an algorithm then attempted to reposition the ligand in the active site of the protein (13). In most docking algorithms the process is stochastic in nature: it uses optimization algorithms and is repeated multiple times, resulting in several low energy, top scoring poses (13). The algorithm discussed in this work, CDOCKER (H1-H2) (3), uses all-atom Molecular Dynamics based simulated annealing optimization to produce candidate ligand poses. For a good docking function, the lowest energy poses correspond to the native or near-native minima (usually within 2Å of the benchmark X-ray structure). In the early work on docking functions, concepts from protein folding (14) energy optimization were applied to select docking functions that gave the largest separation of native-like poses from misdocked solutions (greater than 4Å) (H1, Figure 2).
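The 2Å and 4Å criteria above can be illustrated with a minimal sketch. It assumes that the pose and the X-ray reference list the same atoms in the same order and are already in the same protein frame, so no superposition or symmetry correction is applied; the function names and the "intermediate" label are illustrative and do not come from the publications.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation in Å between two poses.

    Each pose is a list of (x, y, z) tuples with identical atom ordering.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have matching atom counts")
    squared = sum((a - b) ** 2
                  for p, r in zip(pose, reference)
                  for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))

def classify_pose(pose, reference, native_cutoff=2.0, misdock_cutoff=4.0):
    """Label a pose with the cutoffs quoted in the text (2 Å and 4 Å)."""
    d = rmsd(pose, reference)
    if d <= native_cutoff:
        return "native-like"
    if d > misdock_cutoff:
        return "misdocked"
    return "intermediate"  # hypothetical label for the 2-4 Å gap

# Toy two-atom "ligand" shifted rigidly along x.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
far = [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)]
print(classify_pose(near, xray), classify_pose(far, xray))
```

Leaving a gap between the two cutoffs keeps borderline poses out of both classes, which is one plausible reason for separating native-like and misdocked populations with different thresholds.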

Figure 2. Left: a representation of a sample native-like docked solution (green) and a misdocked one (red). Right: the energy distribution for the optimal docking function parameters selected over the 6 test cases for multiple docking runs.

While the initial data set used to optimize the CHARMM (15) energy function for docking was small (6 proteins) (H1), and the protein structures were taken directly from the complexes, the energy parameters selected by this protocol proved to be widely applicable to large benchmark and prospective datasets (H2-H4). Moreover, the protocol introduced in (H2, (3)) was applicable to more complex and more realistic situations, where protein structures changed upon ligand binding (H3) (13, 16).

Figure 3. Sequence of docking events from left to right: initial structure in 2D, pose generation in random orientations, final solutions (green is the X-ray structure), and the best docking solution (magenta). The example is thrombin and one of its inhibitors.

Because the initial structures were positioned at random, many initial poses clashed with the protein, which led to failed Molecular Dynamics (MD) annealing and overall poor performance (Fig. 3 illustrates the docking process). The solution was to smooth the energy function by introducing a soft core repulsive potential and gradually hardening it during the docking (H1, (3)). This was accomplished by almost completely removing the hard core of the repulsive vdW 1/r^12 and electrostatic potentials in the initial stages of the docking runs, with a maximum repulsive value of around 10-20 kcal/mol, and then switching to the regular vdW potential in the final stages of the docking. The soft core repulsive potential was found to be absolutely essential for the effective use of MD and other search strategies for docking (H2) (3).
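The idea of a soft core that is hardened over the course of a run can be sketched as follows. This is only an illustration: it caps a standard 12-6 Lennard-Jones term at a maximum value, which is a simplification and not the functional form implemented in CHARMM or CDOCKER. The parameters `epsilon` and `r_min` and the schedule `CAPS` are hypothetical values chosen for the example.

```python
import math

def lennard_jones(r, epsilon=0.1, r_min=3.8):
    """Standard 12-6 potential in kcal/mol; r and r_min in Å."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)

def soft_core(r, e_max, epsilon=0.1, r_min=3.8):
    """12-6 potential whose repulsive wall is capped at e_max (kcal/mol)."""
    if r <= 0.0:
        return e_max
    return min(lennard_jones(r, epsilon, r_min), e_max)

# Hypothetical hardening schedule: the first cap is in the 10-20 kcal/mol
# range quoted in the text, the last stage (infinite cap) is the regular vdW.
CAPS = (15.0, 60.0, 240.0, math.inf)

def energies_over_stages(r):
    """Energy of one atom pair at separation r in each annealing stage."""
    return [soft_core(r, cap) for cap in CAPS]

# A severe clash at 1 Å costs only 15 kcal/mol in the first stage,
# while the attractive well near r_min is untouched by the cap.
print(energies_over_stages(1.0))
print(soft_core(3.8, 15.0))
```

With a capped wall, a randomly placed ligand that overlaps the protein experiences finite forces and can move through the clash instead of producing an unstable MD trajectory.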

One of the early challenges in proper accuracy assessment was the inherent bias generated by using the rigid protein structure taken from the same complex, a process that was later called self-docking (13, 16). Ligand conformational and torsional biases were also introduced if the ligand starting conformation was taken from the X-ray structure of the complex. These have been shown to favor the native solution through unrealistic shape complementarity, thus giving unrealistically high accuracy for some docking algorithms. To avoid the bias, the systems used in later performance benchmarks used a protein structure taken from one complex for a series of ligands and ligand 3D structures generated from a chemical sketch, supplemented by occasional prospective tests (13, 16).

ii) Improvements to the algorithm

The initial positive results from benchmarks led to the implementation of automated ligand parametrization routines. This made it possible to create a pipeline for target-specific docking in the Lilly environment. The docking pipelines were used in a number of discovery efforts (17-20). The need to improve performance and accuracy led to the introduction of pre-computed docking energy grids (both soft core and hard core vdW). In the final scoring of poses, grid scoring was significantly less accurate (74% of structures <2Å for full force field docking vs. 68% for grid docking on a set of 41 protein/ligand complexes in H2). Off-grid, all-atom scoring was therefore introduced in the final stages, balancing docking speed against the geometric accuracy of the docking energy function (H2).
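The trade-off between grid and off-grid scoring can be illustrated with a small sketch of a pre-computed energy grid read back by trilinear interpolation. The interpolation scheme, the grid spacing and the toy `probe_energy` function are assumptions made for the example; the source does not specify how the CDOCKER grids are stored or interpolated.

```python
import math

def build_grid(energy, origin, spacing, n):
    """Tabulate energy(x, y, z) on an n x n x n regular grid starting at origin."""
    ox, oy, oz = origin
    return [[[energy(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(n)] for j in range(n)] for i in range(n)]

def grid_energy(grid, origin, spacing, point):
    """Trilinear interpolation of the tabulated energy at a point inside the grid."""
    n = len(grid)
    cell, frac = [], []
    for coord, start in zip(point, origin):
        t = (coord - start) / spacing
        if t < 0 or t > n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(t)), n - 2)
        cell.append(i)
        frac.append(t - i)
    (i, j, k), (fx, fy, fz) = cell, frac
    total = 0.0
    for di, wx in ((0, 1 - fx), (1, fx)):
        for dj, wy in ((0, 1 - fy), (1, fy)):
            for dk, wz in ((0, 1 - fz), (1, fz)):
                total += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return total

def probe_energy(x, y, z):
    """Toy stand-in for a probe-atom energy: a steep 1/r^12 wall around (0, 0, 0)."""
    r2 = x * x + y * y + z * z
    return 1.0 / r2 ** 6

origin, spacing = (1.0, 1.0, 1.0), 0.5
grid = build_grid(probe_energy, origin, spacing, 5)
point = (1.25, 1.0, 1.0)  # halfway between two grid nodes
print(grid_energy(grid, origin, spacing, point), probe_energy(*point))
```

On grid nodes the lookup reproduces the tabulated value, but between nodes a steep repulsive term is overestimated by linear interpolation. Errors of this kind are one plausible reason why a final off-grid, all-atom rescoring recovers geometric accuracy while the grid is kept for the fast search stages.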

iii) Utilization of information present in binding data to improve docking accuracy

The realization that even small changes in protein structure upon binding can strongly affect the generation of native-like solutions led to the use of various kinds of prior knowledge about the specific system. The structure-activity relationship (SAR) was shown to distinguish the native docking mode of the common substructure from the non-native one. This was first demonstrated in publication H3, where activity data for multiple related compounds were used to select the native docking mode for HIV-1 protease and VEGF kinase inhibitors. For HIV protease, the complexed protein structure generated a similar number of native-like docking solutions for the common substructure core as the apo structure (30% vs. 32% for the apo), despite the fact that most of the whole-molecule solutions would not have been considered native. In this case, the apo active site differs by more than 2Å in backbone RMS from the complexed structure, due to the HIV flap movement. In the presented approach (DoMCoSAR, Docking mode Consistent with SAR, publication H3) the major clusters of common substructures were used as positional constraints to dock and evaluate the major binding modes. The docking scores of the native solutions had significantly higher correlation with experimental potency for this specific target (R² of 0.44 for the native solution vs. 0.14 for non-native poses) (Figure 4).

Figure 4. Depiction of the major docking modes for the HIV-1 series of ligands (in 3D, green is the X-ray structure, red is mode 2 (A), yellow is mode 1 (B)) and predictive R² distribution for 4-fold cross-validation. Taken from H3.

While the correlation of binding mode with potency did not hold for many later targets, the idea of using predicted or known common substructures as constraints for docking is now widely used in high accuracy docking protocols and in the docking of congeneric series libraries (12).

iv) Utilization of existing X-ray information to improve docking accuracy for specific targets

Another idea for improving the docking accuracy of new ligands against a protein with known complex structures was the introduction of a shape biasing potential based on the X-ray structures of prior ligands (H4). A combination of energy and shape grids was used in SDOCKER and showed improved cross-docking performance in 3 tested systems (Fig. 5).

Figure 5. Left: shape grid and 32 thrombin ligands. Middle: docking poses vs. the X-ray structure (green); yellow is the best scoring CDOCKER pose for 1ets, magenta are SDOCKER poses from 10 independent runs. Right: the number of docked ligands with RMS <2Å from the X-ray structure for different algorithms, showing that SDOCKER is superior to all tested methods (taken from publication H4).

A practical use of the shape docking bias is the filtering of final poses by shape complementarity to existing ligands. All these concepts are used in modern docking applications such as POSIT from OpenEye (21). We have automated the maximum common substructure based docking of congeneric series, initially in CDOCKER and more recently within the MOE modeling package (12). In the latter, the implementation of maximum common subgraph constraints improved docking accuracy almost 2-fold in a large benchmark of 8784 docking runs covering 1742 ligands and 125 protein targets, approaching 80% docking accuracy within 2Å of the known native solution. The majority of docking failures come from pocket rearrangements and cannot be overcome in a rigid docking approach. Refinement of the poses with explicit solvent MD simulations is a possible solution, and such studies are currently underway in our group. The drawback is that constraints can only be implemented for structures related to previously crystallized ligands, which limits the approach to the lead optimization phase of drug discovery. In our internal benchmark studies the protocol retrospectively covered 70% of crystallized ligands (12).

II. Analysis and Utilization of fragment based computational strategies in drug discovery

i) Analysis of drug databases and initial observation of privileged fragments

Continued evolution of synthetic organic chemistry allows medicinal chemists to design and create a growing diversity of bioactive molecules (22). For specific targets, certain chemical substructures appear to have special binding ability and are frequently used as key building blocks in new molecule design (23) (and H8). However, not all bioactive molecules can reach a sufficient and prolonged concentration in the target tissue of the diseased organism, and only certain chemical fragments appear in marketed drugs (publications H5-H7). The analogy between chemical fragments as the building blocks of drugs and the 20 amino acids that build all proteins in our body was introduced by our analysis of marketed drugs for each route of administration (H6). We used an automated algorithm, "Molecular Slicer" (H5), to recursively break the bonds of small molecules and statistically analyze the resulting chemical fragments. The original algorithm had 15 ordered rules for breaking bonds, e.g. a peptide bond or a connection between two aromatic rings; most, but not all, originated from retrosynthetic reactions (23). Molecular Slicer applied to a peptide or a protein would yield its individual amino acids. We distinguished terminal fragments (side chains with one broken bond) from more centrally positioned scaffolds/linkers with two or more Slicer bond breakages.
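The distinction between terminal fragments and scaffolds/linkers can be sketched on a toy molecular graph. The sketch assumes that the bonds to break have already been selected (the 15 ordered Slicer rules are not reproduced here) and cuts them all at once rather than recursively; the function names and the six-atom example are hypothetical.

```python
from collections import defaultdict

def slice_molecule(n_atoms, bonds, cleavable):
    """Cut every cleavable bond and return the resulting fragments.

    bonds and cleavable are lists of (atom_i, atom_j) index pairs.
    Returns a list of (sorted atom indices, number of broken bonds on the fragment).
    """
    cut = {frozenset(b) for b in cleavable}
    neighbors = defaultdict(set)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, atoms = [start], set()
        while stack:  # depth-first walk over the uncut bonds
            atom = stack.pop()
            if atom in atoms:
                continue
            atoms.add(atom)
            stack.extend(neighbors[atom] - atoms)
        seen |= atoms
        n_broken = sum(1 for bond in cut if bond & atoms)
        fragments.append((sorted(atoms), n_broken))
    return fragments

def fragment_class(n_broken):
    """One broken bond: terminal side chain; two or more: scaffold/linker."""
    if n_broken == 0:
        return "intact"
    return "terminal" if n_broken == 1 else "scaffold/linker"

# Hypothetical linear molecule 0-1-2-3-4-5 with two cleavable bonds.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
for atoms, n_broken in slice_molecule(6, bonds, [(1, 2), (3, 4)]):
    print(atoms, fragment_class(n_broken))
```

Counting the broken bonds that touch each fragment is enough to separate side chains from central scaffolds, which is the only part of the algorithm this sketch is meant to show.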

The results revealed certain statistical preferences for fragments present in both injectable and oral drugs. Phenyl and benzyl side chains were among the most frequent across all drugs. Some fragments, mostly scaffolds, appeared preferentially in oral drugs (Figures 6A, 7A), e.g. 4-substituted phenyls and piperazine and benzodiazepine linkers. Some fragments appeared almost exclusively in injectable drugs (proline, arginine and tyrosine linkers) (Figures 6B, 7B). The analysis provided insights into the chemical origins and statistical fragment preferences of drugs for each route of administration.

Figure 6. Slicer side chains, number of drugs (of 1376 oral) in which the fragment occurs is given. A) Oral drugs. B) Injectable drugs. From publication H5.

We concluded that, in addition to satisfying certain computed property limits (such as computed logP less than 6.3 and molecular weight less than 593 Daltons), most (defined as the 95th percentile) marketed oral drugs also contain privileged chemical fragments. These privileged fragments, in addition to their contribution to bioactivity, provide sufficient metabolic stability and suitability for oral delivery. The work inspired the creation of many new molecules containing the orally and bioactively privileged fragments, supporting and strengthening the initial hypothesis (205 citations of H5).

Figure 7. Oral (A) vs. injectable (B) scaffolds-linkers (from Publication H6)

ii) Detailed analysis of related drugs

Later work, which focused on gene family dependent properties and fragments of drug space, refined the concept of computed property boundaries. We concluded that the computed drug properties depend on historical drug target biases (24), while further expanding the understanding of synthetic biases in drug molecule creation. In particular, our observation (publication H6) that more than 15 percent of marketed oral drugs are full substructures of larger drugs, and that as many as 30% of oral drugs fully contain smaller drugs, added to the understanding of privileged fragments by indicating historical biases in chemistry and drug research. These biases continue to influence our evolving knowledge of drug space and caution the research community not to over-interpret highly contextual results of drug database analyses. We also provided a statistical perspective supporting the statement made by Sir James Black, the 1988 Nobel laureate in Physiology or Medicine, “...the most fruitful basis for the discovery of a new drug is to start with an old drug” (25), and set the foundations for our later design work (publication H9).

Table 1. Most frequently occurring DIODs (Drugs in other Drugs). The last column indicates the number of larger drugs in which the smaller drug (DIOD) is fully embedded.

iii) Next steps – using fragments of bioactive molecules to characterize target space and design new molecules

Molecular Dicer (publication H8) was introduced to simplify the initial Slicer algorithm and improve its performance on large datasets. The algorithm featured 5 recursive bond-breaking reactions (Fig. 8), sufficient to provide relevant fragments in large databases of millions of compounds. At the same time, rules for joining the resulting fragments were introduced in order to propose new compounds from existing fragments. The utility of fragments was first demonstrated in the analysis of gene family actives. We showed privileged fragments and their frequencies in specific targets and within each gene family. We also demonstrated that fragment frequencies can be used in predictive Naïve Bayes models, which outperformed standard fingerprints in retrospective tests of single point (SP) activity prediction for 6 kinases. In addition, in a large prospective validation set the hit rate of the fragment-based models was double that of historical similarity searches (of 1965 compounds predicted active in one kinase, 42% showed >70% inhibition at 20 μM in one or more kinases) (publication H8).
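How fragment occurrence in actives can feed a Naïve Bayes model is sketched below. This is a simplified, presence-only variant with Laplace smoothing that ignores absent fragments and class priors; it is an assumption about the general technique, not the model described in H8, and the fragment names are invented for the example.

```python
import math
from collections import Counter

def train(actives, inactives):
    """Return a log-odds weight per fragment.

    actives and inactives are lists of fragment sets, one set per molecule.
    """
    in_actives = Counter(f for mol in actives for f in set(mol))
    in_inactives = Counter(f for mol in inactives for f in set(mol))
    weights = {}
    for frag in set(in_actives) | set(in_inactives):
        # Laplace-smoothed probability of seeing the fragment in each class.
        p_active = (in_actives[frag] + 1) / (len(actives) + 2)
        p_inactive = (in_inactives[frag] + 1) / (len(inactives) + 2)
        weights[frag] = math.log(p_active / p_inactive)
    return weights

def score(weights, fragments):
    """Sum of log-odds over the fragments present; unseen fragments add 0."""
    return sum(weights.get(f, 0.0) for f in set(fragments))

# Hypothetical training data for one kinase.
actives = [{"aminopyrimidine", "phenyl"}, {"aminopyrimidine", "morpholine"}]
inactives = [{"phenyl", "ester"}, {"ester"}]
weights = train(actives, inactives)
print(score(weights, {"aminopyrimidine", "phenyl"}))
print(score(weights, {"ester", "phenyl"}))
```

Fragments that are equally common in both classes, such as the phenyl ring here, receive a weight of zero, so the ranking is driven by fragments enriched in actives.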

Figure 8. Exemplification of the Dicer cut points.


A proof of concept for using fragment frequencies to relate individual targets within and across multiple gene families was shown (Fig. 9). We used Tanimoto similarity computed from the fragment frequencies of individual targets or of each gene family. The fragment similarity Tanimoto is defined by the fragment frequencies in the actives of targets 1 and 2:

$$\mathrm{FragSim}_{1,2} = \frac{\sum_{k=1}^{N_{frags}} Fr_{1,k}\,Fr_{2,k}}{\sum_{k=1}^{N_{frags}} \left( Fr_{1,k}^{2} + Fr_{2,k}^{2} - Fr_{1,k}\,Fr_{2,k} \right)}$$

where $Fr_{1,k}$ and $Fr_{2,k}$ are the frequencies of fragment $k$ in the actives of targets 1 and 2, and the sums run over all $N_{frags}$ fragments.

The concept of using actives to relate targets was first introduced by us in 2004 (26) and later expanded by Shoichet and co-workers in their SEA approach (27). Fragment similarity offers another way to relate targets based on their ligands, even if the whole-ligand similarity is low.
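A minimal sketch of the fragment frequency similarity, assuming the standard continuous Tanimoto form (sum of frequency products divided by the sum of squared frequencies minus the sum of products). The function name, the dictionary representation and the example frequencies are illustrative choices, not taken from H8.

```python
def frag_sim(freq1, freq2):
    """Continuous Tanimoto similarity between two fragment frequency profiles.

    freq1 and freq2 map a fragment identifier to its frequency in the
    actives of target 1 and target 2; missing fragments count as 0.
    """
    fragments = set(freq1) | set(freq2)
    products = sum(freq1.get(f, 0.0) * freq2.get(f, 0.0) for f in fragments)
    squares = sum(freq1.get(f, 0.0) ** 2 + freq2.get(f, 0.0) ** 2 for f in fragments)
    denominator = squares - products
    return products / denominator if denominator else 0.0

# Hypothetical fragment frequencies for two targets.
target_a = {"phenyl": 0.5, "piperazine": 0.5}
target_b = {"phenyl": 0.5}
print(frag_sim(target_a, target_b))
```

Identical profiles give a similarity of 1 and profiles with no shared fragments give 0, so two targets can score as related through a few shared privileged fragments even when their ligands differ as whole molecules.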


Figure 9. Relating individual targets (top panel) and Gene Families (bottom panel) with fragment frequency Tanimoto similarity metric (from publication H8).


iv)Utilization of fragments of bioactive molecules and predictive models in de-novo design of 7 kinase libraries

While fragment frequencies in actives could be used as descriptors in predictive models, their predictive power was significantly and meaningfully inferior to the emerging Support Vector Machine Fingerprint machine learning (SVMFP, publication H9). For the same set of 6 kinases, SVMFP models utilizing short substructure based circular Extended Connectivity Merck Molecular Pair fingerprints (based on substructure based fingerprints) showed 84% hit rate (defined as % inhibition at 20μM) on 1895 prospectively tested compounds selected from Lilly corporate library (H9). These models were determined to be of sufficient quality to allow for their use in de-novo design. The models and bioactive fragments were used to design 7 kinase biased chemical libraries.

The general process used in the design is depicted in Fig. 10 and involved fragmentation of existing bioactive compounds and utilization of the resulting fragments into new molecules with top scoring ones (as judged by SVMFP models for one of the six targets) selected for synthesis. In order to make the designs chemically tractable specific connections to selected scaffolds were required, so the actual groups in the specific locations were introduced from the commercially available building blocks.

Figure 10. General process to design chemical libraries (a), definition of the pharmacophore topology for the design in 3D from superposition of X-ray structures (b), pharmacophoric points in 2D (c), sample recombination of the scaffolds (hinge binding) with hydrophobic and solubilizing groups (d). Taken from publication H9.
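The recombination and selection step can be sketched in a few lines. This is only an illustration under stated assumptions: the building block labels are hypothetical, molecules are represented as plain tuples rather than chemical structures, and `toy_score` is a stand-in for a trained SVMFP model, which is not reproduced here.

```python
from itertools import product


def enumerate_designs(scaffolds, hydrophobic_groups, solubilizing_groups):
    """Yield every scaffold / hydrophobic / solubilizing combination."""
    yield from product(scaffolds, hydrophobic_groups, solubilizing_groups)


def select_top(designs, score, n_top):
    """Rank designs by a scoring function and keep the best n_top."""
    return sorted(designs, key=score, reverse=True)[:n_top]


# Hypothetical per-fragment contributions; a real workflow would call a
# predictive model for one of the targets instead.
TOY_WEIGHTS = {"hinge_A": 0.5, "hyd_2": 0.3, "sol_1": 0.2}


def toy_score(design):
    """Stand-in score: sum of assumed fragment contributions."""
    return sum(TOY_WEIGHTS.get(part, 0.0) for part in design)


designs = list(
    enumerate_designs(
        scaffolds=["hinge_A", "hinge_B"],
        hydrophobic_groups=["hyd_1", "hyd_2"],
        solubilizing_groups=["sol_1", "sol_2"],
    )
)
selected = select_top(designs, toy_score, n_top=2)
```

Restricting the enumeration to fixed attachment positions on a scaffold mirrors the tractability constraint described above: only combinations that can be built from available building blocks are ever scored.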


The process led to the synthesis and biological evaluation of 179 compounds representing 7 chemical classes. Overall, 92% of the synthesized compounds showed activity (>70% at 20 μM) in at least one of the six kinases tested, with 90% of compounds active in Flt3 kinase (Fig. 11). Many compounds showed low nanomolar inhibition in one or more of the tested kinases. The compounds, while utilizing fragments from other kinase inhibitors, differed from known actives by an average of 25% (Fig. 11a). Compounds from one of the libraries and their activity profiles are exemplified in Table 2. One of the limitations of the models was the inaccurate prediction of compound selectivity (Fig. 11c). Most compounds showed inhibition in multiple kinases, which in practice limited their usage in future discovery. Nevertheless, the study (publication H10) set the benchmark for computational de-novo design.

Figure 11. a) Tanimoto nearest-neighbor (NN) similarity distribution of the 179 compounds to known actives, b) hit rates against different kinases at 20 μM, c) predicted vs. observed activity (from publication H9).


Table 2. Structures and data for selected compounds from library 6. Hydrophobic groups are drawn in red, solubilizing in green. Predicted values are shown in bottom row for each structure. Heat map color coding is used to denote the degree of predicted and observed inhibition. Dose response data for CHK2 and MET where available is shown. Empty cells signify missing data. Taken from H9.

Structure | MW | CHK2 FB IC50 (μM) | hMET FB IC50 (μM) | ABL1 % | MET % | FLT3 % | CHK2 % | P70 % | ROCK2 %

537 1.13 1.32 11 71 61 60 45 42

predicted 57 54 42 51 54 45

465 0.58 87 . 93 98 89 98

predicted 59 57 61 36 54 45

451 0.61 84 92 86 96 100 37

predicted 63 59 58 35 55 48

v) Assessment of state of the art of predictive models for selectivity prediction and comparison to variability in experimental data

Many researchers use published data, often from different sources, to build and validate predictive models. The importance of predicting and designing for selectivity focused our efforts on understanding data concordance between different experimental sources (28). We were interested in whether models built on data from one source can predict the values from another source, and how their errors compare with the differences between experimental values from the two sources. The concept of the computational Minimum Significant Ratio (cMSR) was introduced to quantify agreement between results from different assays, sources or predictive models (publication H10). It borrows from the Minimum Significant Ratio (MSR) parameter (29), which is a way to estimate assay variability. MSR indicates what activity ratio is significant for two compounds upon single testing. Good, reproducible assays would have an MSR of 3 or less.

Guided by cMSR and other standard comparison metrics (Table 3), we compared two experimental sources (internal Lilly data and published literature data by Metz (30)) with SVMFP predictive models on 32 compounds for which experimental data from both sources were available for the majority of 69 kinase targets. The predictive models used training sets with no compounds in common between the experimental sources. For all data (a set of 1466 kinase-inhibitor pairs), judged by cMSR and the other metrics (such as R2 and the percentage of values within 3-fold activity), the two experimental sources showed the highest concordance (cMSR of 12.6), with predictive models of Lilly data showing slightly worse results (cMSR of 16.7, Table 3). Predictive models built on data from the other source showed the worst performance.
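The four comparison metrics can be illustrated with a short sketch operating on paired log10 activity values. Two points are assumptions rather than the definitions used in H10: cMSR is taken to have the same form as the replicate-experiment MSR, 10^(2·SD of the paired log differences), and the mean fold difference carries a negative sign when source 1 is lower on average. The function name and the sample values are hypothetical.

```python
import math
import statistics


def concordance_metrics(log_src1, log_src2):
    """Compare paired log10 activities from two sources.

    Both lists must hold values for the same kinase-inhibitor pairs,
    in the same order.
    """
    diffs = [a - b for a, b in zip(log_src1, log_src2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)

    # Squared Pearson correlation between the two sources.
    mean1 = statistics.mean(log_src1)
    mean2 = statistics.mean(log_src2)
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(log_src1, log_src2))
    var1 = sum((a - mean1) ** 2 for a in log_src1)
    var2 = sum((b - mean2) ** 2 for b in log_src2)
    r2 = cov * cov / (var1 * var2)

    # Assumed sign convention: negative when source 1 is lower on average.
    fold = 10 ** abs(mean_diff)
    signed_fold = fold if mean_diff >= 0 else -fold

    # Assumed form, borrowed from the replicate-experiment MSR.
    cmsr = 10 ** (2 * sd_diff)

    within = sum(1 for d in diffs if abs(d) <= math.log10(3))
    pct_within_3_fold = 100.0 * within / len(diffs)

    return {
        "r2": r2,
        "mean_fold_difference": signed_fold,
        "cmsr": cmsr,
        "pct_within_3_fold": pct_within_3_fold,
    }


# Invented values for illustration only: source 2 is shifted by 0.5 log units.
metrics = concordance_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

In this toy case the two sources correlate perfectly and the spread of differences is zero, yet no value falls within 3-fold because of the constant bias. This is why a single metric such as R2 is not sufficient to judge concordance.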

Table 3. Concordance of all 1466 kinase-inhibitor pairs between different sources measured by four metrics. Only numerical values are used in the comparisons after prefixes are eliminated. Exp indicates experimental/measured values, Pred indicates predicted values.

Source1 | Source2 | R2, all data | Mean fold difference (a), all data | cMSR, all data | % within 3 fold, all data
Exp(LLY) | Exp(Metz) | 0.58 (a) | 1.0 (a) | 12.6 (a) | 73 (b)
Exp(LLY) | Pred(LLY) | 0.46 | -1.0 | 16.7 | 68
Exp(Metz) | Pred(Metz) | 0.34 | 1.3 | 16.4 | 62
Exp(Metz) | Pred(LLY) | 0.28 | 1.02 | 18.8 | 65
Exp(LLY) | Pred(Metz) | 0.29 | -1.3 | 26.1 | 51

a. Mean fold difference computed from common data comparison. Negative sign indicates source 1 has lower values than source 2.

b. The numbers correspond to Lilly-Metz bias corrected measured results from the Metz publication (30) (values from supplementary Table S2 in H10); the R2 for the bias uncorrected results is 0.57.

For active-only compounds (excluding data for compounds with activity >20 μM), the concordance worsened dramatically, especially for predictive models, which now showed a cMSR of 53 between Lilly data and predicted Lilly data (Table 4).

Table 4. Concordance of the 452 non-qualified kinase-inhibitor pairs measured by four metrics. Non-qualified data are activities reported without the qualifiers ‘<’ or ‘>’.

Source1 | Source2 | R2, non-qualified | Mean fold difference, non-qualified | cMSR, non-qualified | % within 3 fold, non-qualified
Exp(LLY) | Exp(Metz) | 0.46 | 1.31 | 23.3 | 57
Exp(LLY) | Pred(LLY) | 0.35 | -3.6 | 53.3 | 45
Exp(Metz) | Pred(Metz) | 0.23 | -2.3 | 42.6 | 50
Exp(Metz) | Pred(LLY) | 0.14 | 2.7 | 62.9 | 43
Exp(LLY) | Pred(Metz) | 0.17 | 3.1 | 73.1 | 41
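The difference between the all-data comparison (Table 3) and the non-qualified comparison (Table 4) comes down to how reported values with a ‘<’ or ‘>’ prefix are handled. The sketch below shows one possible way to separate the two cases. The record layout, the function names and the sample values are hypothetical and are not taken from H10.

```python
def parse_activity(text):
    """Split a reported activity such as '>20' into (qualifier, value)."""
    text = text.strip()
    qualifier = ""
    if text and text[0] in "<>":
        qualifier, text = text[0], text[1:]
    return qualifier, float(text)


def all_pairs(records):
    """Keep every pair, using the numerical value after dropping prefixes."""
    return [
        (pair_id, parse_activity(v1)[1], parse_activity(v2)[1])
        for pair_id, v1, v2 in records
    ]


def non_qualified_pairs(records):
    """Keep only pairs where both values were reported without '<' or '>'."""
    kept = []
    for pair_id, v1, v2 in records:
        q1, x1 = parse_activity(v1)
        q2, x2 = parse_activity(v2)
        if not q1 and not q2:
            kept.append((pair_id, x1, x2))
    return kept


# Invented records: (kinase-inhibitor pair, source 1 value, source 2 value).
records = [
    ("cmpd1/ABL1", "0.58", "0.61"),
    ("cmpd1/MET", ">20", "15"),
    ("cmpd2/FLT3", "<0.01", "<0.01"),
]
everything = all_pairs(records)
measured_only = non_qualified_pairs(records)
```

Requiring both values to be non-qualified is a design choice of this sketch. It removes the pairs where agreement is trivially easy (both sources report a compound as inactive), which is consistent with the lower concordance seen for the non-qualified subset.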


The results indicated that predictive models could be used for average selectivity assessment of sets of compounds. It is worth noting that many of the 32 compounds showed a low degree of selectivity, and that for selective compounds (active in one or a few kinases) the agreement between the two experimental sources was significantly better than that of any of the predictive methods (visual assessment in Fig. 12).

Figure 12. Heat maps for 29 of the 32 compounds tested in at least 25 targets. Empty values are shown in gray. Targets in columns are arranged in alphabetical order. Pred(LLY), Exp(LLY), and Exp(Metz) are shown in the left, middle, and right panels, respectively.

The study pointed out the potential applicability of computational models to predicting selectivity profiles of groups of compounds related to the training set, and highlighted the limitations of predicting activity profiles for selective compounds. It also cautioned researchers about using cross-source data for model creation and indicated the potentially large differences one might expect in published literature data for the same compounds.

III. Future outlook on the improvements of quality of computational technologies in drug discovery

The case studies presented in the previous sections illustrated how ideas from one field can be translated to another (folding to docking/H1, medicinal to computational/H5-H19, computational to medicinal/H9, chemoinformatics to bioinformatics/H8) for continued improvement of the practical application of computational technologies. Most of the ideas described in this dissertation have been implemented as processes with a user interface, initially through Web pages and later via our .net application. This includes the CDOCKER docking algorithm with all automatically generated constraints and pose processing filters (CDOCKER has been commercialized by Accelrys). Automated SVMFP predictive models are built weekly for more than 800 targets, providing an interface for chemists and biologists to understand selectivity profiles of new and existing compounds across multiple targets. The Dicer fragmentation algorithm and automated analysis of fragments for compound sets are routinely used in the analysis of activity data. The de-novo design algorithm gave rise to a reaction-based system for the Automated Synthesis Lab (ASL). Utilization of ASL and computational methodologies has been exemplified in the design of selective RIO2 inhibitors (31). The tools continue to contribute to hypothesis generation for multiple drug discovery applications and have become a part of the everyday toolkit for medicinal and computational chemists.

In each of the cases, limitations of the current approaches were pointed out to critically highlight intrinsic difficulties in the proper utilization of computational technologies. One of the biggest challenges in drug discovery is to accurately predict the structure of biomolecular complexes and to accurately estimate the stability of those complexes. The accuracy becomes increasingly critical when specific activities against multiple targets are needed for the desired profile. While structure prediction appears to be better understood and is being tackled with improvements in both large and small molecule molecular mechanics force fields, accurate affinity prediction appears to be lacking significant progress. The belief that free energy methods like Free Energy Perturbation (32) will, with increased computer power, be able to accurately predict binding affinities is seriously questioned now that computer power is no longer an issue. The reported predictions, even when the answer is known (33), are inferior to chemists' intuition or chemoinformatics machine learning methodologies (like SVMFPs, H11). Perhaps we are realizing that our small and large molecule force fields, while capable of adequately describing local geometries, are not adequate to accurately describe the energetics of complexes and the entire conformational space needed to estimate the free energy of binding. Ever increasing computer power will allow most of the current ideas to be tested and critically evaluated.

Encouraging developments are around. Molecular Dynamics is made simpler and more testable by the introduction of automation, interfaces and ease of use. Supercomputers like Anton, customized for MD code, allow brute-force simulations at 17K ns/day for solvated proteins (DHFR, 24K atoms) and of medium-size systems routinely on the μs timescale (34), with a record-setting 1 ms simulation of solvated native BPTI (35). Most methods originally targeted at expert users can be tested by more and more non-experts. Students can fold proteins and learn about molecular interactions using Xbox FoldIT (36). Quantum methods are streamlined to serve as a basis for fitting MM potentials, with the resulting molecules subjected to free energy simulations (37). Electron Microscopy coupled with hybrid modeling and other techniques pushes the limits of size and resolution and will soon become a standard toolbox for studying structures and dynamics of biomolecular assemblies (38).

The current and future scope of my collaborative research builds on the foundations outlined in this report and focuses on science popularization and crowdsourcing, force field improvement for small molecules and docking pose refinement through MD, hybrid modeling for reconstruction of the dynamics of macromolecular assemblies, and establishing statistical foundations for simulation data and analysis.


Looking back on the early work presented in this dissertation, I feel that much has been done but even more remains to be done (4).

Acknowledgements

I would like to extend my deepest gratitude to my long-time mentor and friend prof. Kolinski for his encouragement and valuable insights, to dr. hab. Kmiecik for his guidance, ever-present help with all aspects of the process and directions for improvements, to my sister for her pointers to improve the translation, and to the late prof. Shugar for being my mentor and motivator for the last 10 years.


References

1. Swinney, D. C., and Anthony, J. (2011) How were new medicines discovered?, Nat. Rev. Drug Disc. 10, 507-519.

2. Van Drie, J. H. (2007) Computer-aided drug design: the next 20 years, Journal of computer-aided molecular design 21, 591-601.

3. Vieth, M., Hirst, J. D., Dominy, B. N., Daigler, H., and Brooks, C. L., III. (1998) Assessing search strategies for flexible docking, Journal of Computational Chemistry 19, 1623-1631.

4. Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., and Yu, F. (2011) Too many roads not taken., Nature 470, 163-165.

5. Blundell, T. L. (1996) Structure-based drug design, Nature 384, 23-26.

6. Kuntz, I. D. (1992) Structure-based strategies for drug design and discovery, Science 257, 1078-1082.

7. Pellegatti, M. (2012) Preclinical in vivo ADME studies in drug development: a critical review, Expert opinion on drug metabolism & toxicology 8, 161-172.

8. Stevens, R. C., Yokoyama, S., and Wilson, I. A. (2001) Global efforts in structural genomics., Science (Washington, DC, United States) 294, 89-92.

9. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., and Bourne, P. E. (2000) The Protein Data Bank, Nuc. Acid Res. 28, 235-242.

10. Kuntz, I. D. (1992) Structure-based strategies for drug design and discovery., Science (Washington, DC, United States) 257, 1078-1082.

11. Gaudreault, F., Chartier, M., and Najmanovich, R. (2012) Side-chain rotamer changes upon ligand binding: common, crucial, correlate with entropy and rearrange hydrogen bonding, Bioinformatics 28, i423-i430.

12. Gao, C., Thorsteinson, N., Watson, I., Wang, J., and Vieth, M. (2015) Knowledge-Based Strategy to Improve Ligand Pose Prediction Accuracy for Lead Optimization, J Chem Inf Model 55, 1460-1468.

13. Erickson, J. A., Jalaie, M., Robertson, D. H., Lewis, R. A., and Vieth, M. (2004) Lessons in molecular recognition: the effects of ligand and protein flexibility on molecular docking accuracy, Journal of Medicinal Chemistry 47, 45-55.

14. Levitt, M. (1991) Protein folding, Current Opinion in Structural Biology, 224-229.

15. Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., and Karplus, M. (1983) CHARMM: A program for macromolecular energy, minimization and dynamics calculation, J. Comp. Chem. 4, 187-217.

16. Sutherland, J. J., Nandigam, R. K., Erickson, J., and Vieth, M. (2007) Lessons in molecular recognition. 2. Assessing and improving cross-docking accuracy., J. Chem. Inf. Model. 47, 2293-2302.

17. de Dios, A., Shih, C., Lopez de Uralde, B., Sanchez, C., del Prado, M., Cabrejas, L. M. M., Pleite, S., Blanco-Urgoiti, J., Lorite, M. J., Nevill, C. R., Jr., Bonjouklian, R., York, J., Vieth, M., Wang, Y., Magnus, N., Campbell, R. M., Anderson, B. D., McCann, D. J., Giera, D. D., Lee, P. A., Schultz, R. M., Li, L. C., Johnson, L. M., and Wolos, J. A. (2005) Design of Potent and Selective 2-Aminobenzimidazole-Based p38α MAP Kinase Inhibitors with Excellent in Vivo Efficacy, J. Med. Chem. 48, 2270-2273.

18. Jaramillo, C., de Diego, J., Hamdouchi, C., Collins, E., Keyser, H., Sánchez-Martínez, C., del Prado, M., Norman, B., Brooks, H. B., Watkins, S., Spencer, C. D., Dempsey, J. A., Anderson, B. D., Campbell, R. M., Leggett, T., Patel, B., Schultz, R. M., Espinosa, J., Vieth, M., Zhang, F., and Timm, D. E. (2004) Aminoimidazo[1,2-a]pyridines as a new structural class of cyclin-dependent kinase inhibitors. Part 1: Design, synthesis, and biological evaluation, Bioorg. Med. Chem. Let. 14, 6095-6099.

19. Hamdouchi, C., Keyser, H., Collins, E., Jaramillo, C., De Diego, J. E., Spencer, C. D., Dempsey, J. A., Anderson, B. D., Leggett, T., Stamm, N. B., Schultz, R. M., Watkins, S. A., Cocke, K., Lemke, S., Burke, T. F., Beckmann, R. P., Dixon, J. T., Gurganus, T. M., Rankl, N. B., Houck, K. A., Zhang, F., Vieth, M., Espinosa, J., Timm, D. E., Campbell, R. M., Patel, B. K. R., and Brooks, H. B. (2004) The discovery of a new structural class of cyclin-dependent kinase inhibitors, aminoimidazo[1,2-a]pyridines, Molecular Cancer Therapeutics 3, 1-9.

20. Sawyer, J. S., Anderson, B. D., Beight, D. W., Campbell, R. M., Jones, M. L., Herron, D. K., Lampe, J. W., McCowan, J. R., McMillen, W. T., Mort, N., Parsons, S., Smith, E. C. R., Vieth, M., Weir, L. C., Yan, L., Zhang, F., and Yingling, J. M. (2003) Synthesis and Activity of New Aryl- and Heteroaryl-Substituted Pyrazole Inhibitors of the Transforming Growth Factor-β Type I Receptor Kinase Domain, Journal of Medicinal Chemistry 46, 3953-3956.

21. Kelley, B. P., Brown, S. P., Warren, G. L., and Muchmore, S. W. (2015) POSIT: Flexible Shape-Guided Docking For Pose Prediction, J Chem Inf Model 55, 1771-1780.

22. Gorse, A. D. (2006) Diversity in medicinal chemistry space, Current topics in medicinal chemistry 6, 3-18.

23. Lewell, X. Q., Judd, D. B., Watson, S. P., and Hann, M. M. (1998) RECAP--retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry, J Chem Inf Comput Sci 38, 511-522.

24. Vieth, M., and Sutherland, J. (2006) Dependence of Molecular Properties on Proteomic Family for Marketed Oral Drugs., J. Med. Chem. 49, 3451-3453.

25. Raju, T. N. (2000) The Nobel chronicles., Lancet 355, 1022.

26. Vieth, M., Higgs, R. E., Robertson, D. H., Shapiro, M., Gragg, E. A., and Hemmerle, H. (2004) Kinomics-structural biology and chemogenomics of kinase inhibitors and targets, Biochimica et Biophysica Acta 1697, 243-257.

27. Keiser, M. J., Roth, B. L., Armbruster, B. N., Ernsberger, P., Irwin, J. J., and Shoichet, B. K. (2007) Relating protein pharmacology by ligand chemistry, Nat Biotechnol 25, 197-206.

28. Sutherland, J. J., Gao, C., Cahya, S., and Vieth, M. (2013) What general conclusions can we draw from kinase profiling data sets?, Biochim Biophys Acta 1834, 1425-1433.

29. Eastwood, B. J., Farmen, M. W., Iversen, P. W., Craft, T. J., Smallwood, J. K., Garbison, K. E., Delapp, N. W., and Smith, G. F. (2006) The minimum significant ratio: a statistical parameter to characterize the reproducibility of potency estimates from concentration-response assays and estimation by replicate-experiment studies, Journal of biomolecular screening 11, 253-261.

30. Metz, J. T., Johnson, E. F., Soni, N. B., Merta, P. J., Kifle, L., and Hajduk, P. J. (2011) Navigating the kinome, Nature chemical biology 7, 200-202.

31. Varin, T., Godfrey, A. G., Masquelin, T., Nicolaou, C. A., Evans, D. A., and Vieth, M. (2015) Discovery of selective RIO2 kinase small molecule ligand, Biochim Biophys Acta 1854, 1630-1636.

32. Jorgensen, W. L., and Thomas, L. L. (2008) Perspective on Free-Energy Perturbation Calculations for Chemical Equilibria, Journal of chemical theory and computation 4, 869-876.

33. Wang, L., Wu, Y., Deng, Y., Kim, B., Pierce, L., Krilov, G., Lupyan, D., Robinson, S., Dahlgren, M. K., Greenwood, J., Romero, D. L., Masse, C., Knight, J. L., Steinbrecher, T., Beuming, T., Damm, W., Harder, E., Sherman, W., Brewer, M., Wester, R., Murcko, M., Frye, L., Farid, R., Lin, T., Mobley, D. L., Jorgensen, W. L., Berne, B. J., Friesner, R. A., and Abel, R. (2015) Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field, Journal of the American Chemical Society 137, 2695-2703.
