molecular comparisons by maximum common sub...

83
MedChemica | 2016 Molecular comparisons by Maximum Common Sub-Structure (MCSS) and application to Matched Molecular Pair Analysis (MMPA) MedChemica BIGCHEM Lecture Oct 2016 SlideShare:

Upload: hoangmien

Post on 18-Mar-2018

231 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Molecular comparisons by Maximum Common Sub-Structure (MCSS) and application to Matched Molecular

Pair Analysis (MMPA)

MedChemica BIGCHEM Lecture Oct 2016

SlideShare:

Page 2: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Lecture: •  Brief introduction to MedChemica – MCSS and MMPA at the

heart of pharma knowledge sharing •  Comparing molecules:

–  Why do it? –  What methods, compare contrast? –  What is your experiment?

•  About MCSS: –  Definitions –  A simple use case – depicting the common structure of two molecules –  What tools are out there? –  A brief bit of code in python (with RDkit and OpenEye toolkits)

•  Matched Molecular Pair Analysis: –  What is it? –  MCSS is just the start of the problem….chemical encoding –  Processing the data – generating rules

•  What next? MMPA methodology can be extended to extract pharmacophores  

Page 3: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MedChemica – Enabling Knowledge sharing →Better medicinal chemistry

Full  Paper,  J.Med.Chem  submi.ed  

Page 4: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Recent Media

“Since  the  big  10  pharma  companies  spend  $70  billion  a  year  collec=vely  on  research  without  always  have  a  good  return,  collabora=ng  on  research  can  benefit  companies  from  all  the  money  spent  by  all  companies  in  the  industry.  This  collabora=on  will  benefit  both  AstraZeneca  and  Roche  without  divulging  confiden=al  informa=on”  –  NASP  –  26th  June  

The  ‘Market’  approves  of  pharma  collabora=ng    

Page 5: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Lecture: •  Brief introduction to MedChemica – MCSS and MMPA at the

heart of pharma knowledge sharing •  Comparing molecules:

–  Why do it? –  What methods, compare contrast? –  What is your experiment?

•  About MCSS: –  Definitions –  A simple use case – depicting the common structure of two molecules –  What tools are out there? –  A brief bit of code in python (with RDkit and OpenEye toolkits)

•  Matched Molecular Pair Analysis: –  What is it? –  MCSS is just the start of the problem….chemical encoding –  Processing the data – generating rules

•  What next? MMPA methodology can be extended to extract pharmacophores  

Page 6: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Comparing Molecules •  Chemical information processing is the science of representing molecules in

computers. Hence the fundamental “object” or data structure within a chemical information system is that of the molecule, its atoms (nodes) and its bonds (edge).

Why  Compare  molecules?  

Database  searching    

Chemical  representa=on  

and  interpreta=on  

Structure  Ac=vity  /  Property  Rela=onships  

Knowledge  extrac=on  

A word about experimental design: Always consider what question is being asked by the experiment, by considering this we can choose the right technique to use Compute tasks can take days to run, especially comparing molecules

Page 7: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Maximum Common Sub Structure (or Graph) Method

A/B 36/39 ha = 92% MCS overlap

•  Structures matched by overlapping of the matching atoms (nodes) and bonds (edges)

•  The compute process uses works by traversing atoms and bonds in a graph representation of the molecules in memory

•  The compute problem is NP-complete or NP-hard.

A - CHEMBL1784632 B - CHEMBL2316582

Non-matching heavy atom

Page 8: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Maximum Common Sub-Structure patterns versus Fingerprints

MCSS Process •  Convert each molecule into a

graph representation in memory. Traverse each molecule finding atoms (nodes) and bonds (edges) that are the same, loop many many times building the largest set of edge linked nodes.

•  Larger memory •  Slow (many algorithm set

a time-out) •  Result is concise atom/

bond structure, so difference in chemistry can be captured

Fingerprints Process •  Convert each molecule into a ‘bit

string (0s and 1s) representing parts of the molecule – thus each molecule is a ‘number’ in a computer – thus comparison between two molecules is easy.

•  Low memory •  Very Fast •  Result is a numerical

difference between the molecules – no exact chemical structural difference

Experimental design is important – how much time and compute resource do you have?

Maggiora,  G;  Vogt,  M.;  Stumpfe,  D.;  Barorath,  J;  Molecular  Similarity  in  Medicinal  Chemistry,  J.Med.Chem.2013,3186.  

Page 9: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Graphs Applied to Molecules h.ps://en.wikipedia.org/wiki/Graph_theory  

A Graph (G) is a set of Nodes and Vertices G = (N, V) Each Node has relationship between other nodes by the connections made by Vertices Each Node is connected to every other node by the Vertices and other Nodes…. ….Node 6 is 3 Vertices from Node 2 but there are two paths to get there, via 3 or 5….

Page 10: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Molecules are suited to Graphs

SMILES    Cc1ccc(cc1)C(c2ccccc2)c3ccccc3      ‘parse’  into  Molecular  Graph  For  example:    OpenEye  Toolkit  >>>  from  openeye.oechem  import  OEGraphMol,  OESmilesToMol  >>>  mol  =  OEGraphMol()  >>>  OESmilesToMol(mol,  "Cc1ccc(cc1)C(c2ccccc2)c3ccccc3")  >>>  for  atom  in  mol.GetAtoms():  ...          print  ('Atom  Idx  :{0}  Atomic  Num:  {1}'.format(atom.GetIdx(),  atom.GetAtomicNum())  Atom  Idx  :0  Atomic  Num:  6  Atom  Idx  :1  Atomic  Num:  6  Atom  Idx  :2  Atomic  Num:  6….  ...  for  bond  in  mol.GetBonds():          print  (bond.GetOrder())        

3  

2  1  0  

4  5  

7  

6  

8  

9  10  

11  

12  13  14  15  

16  

17  18  

19  

Nodes are atoms 109 atom ‘types’ – known bonding behaviour

Vertices are bonds single, double, triple……..

Molecular Graphs are the corner stone for chem-infomatics processing

WARNING    –  Do  Not  be  tempted  to  manipulate  SMILES  strings  with  text  searches  or  replace  func=ons  

Page 11: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Types of MCSS – lets look at some molecules MCES  –  Maximum  Common  Edge-­‐Induced  Substructure  –  as  many  matching  chemical  bonds  

cMCES  –  connected  MCES  The  largest  matching  fragment  of  both  molecules  is  itself  connected  structure  

dMCES  –  disconnected  MCES  Mul=ple  matching  parts  of  the  molecule  

SMILES    Cc1ccc(cc1)C(c2ccccc2)c3ccccc3    Cc1ccc(cc1)N(c2ccccc2)c3ccccc3  cMCES      [C][c]1[c][c][c]([c][c]1)  dMCES      [C][c]1[c][c][c]([c][c]1).[c]2[c][c][c][c][c]2.[c]3[c][c][c][c][c]3    

Page 12: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

The Travelling Sales Man Problem (and other NP-hard problems)

Comparing Graphs to find Maximum Common Sub-Graph is compute intensive Described as NP

- Non-deterministic polynomial time - not sure how many iterations over the graphs it will take - Much harder and longer the more complex the graph (lots of atoms and bonds)

h.ps://en.wikipedia.org/wiki/Travelling_Salesman_(2012_film)  

What  is  the  shortest  possible  route  for  a  traveling  salesman  seeking  to  visit  each  city  on  a  list  exactly  once  and  return  to  his  city  of  origin?  It  has  defied  solu=on  to  this  day  

Page 13: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCSS - References

•  Maggiora, G; Vogt, M.; Stumpfe, D.; Barorath, J; Molecular Similarity in Medicinal Chemistry, J.Med.Chem.2013,3186.

•  Cao, Y.; Jiang, T.; Girke, T. A Maximum common substructure-based algorithm for searching and predicting drug-like compounds. Bioinfomatics, 2008, 366.

•  Duesbury, E.; Holliday, J.; Willet, P. Maximum Common Substructure-Based Data Fusion in Similarity Searching, J.Chem.Inf.Model 2015, 222.

•  Hariharan, R.; Janakiraman, A.; Nilakantan, R.; Singh, B.; Varghese, S.; Landrum, G.; Schuffenhauer, A. MultiMCS: A Fast Algorithm for the Maximum Common Substructure Problem on Multiple Molecules, J.Chem.Inf.Model 2011, 786.

Page 14: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

A difficult connected MCSS – can you spot it?

CHEMBL1779022 CHEMBL2146793

14

Paired  by  MCSS  

Hard compute – 54 seconds for just this pair on one CPU MCSS is compute intensive – lets use this example 400 antibiotic macrocycles - ALL pair comparison to find Structure Activity Relationships (SAR) Roughly how long?

(400 x 400 x 54 secs) / 2 / 3600 / 24 = 50 days for 1 CPU ….we are going to need a 50 CPUs to run in a day.

 

Page 15: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

CHEMBL1779022 CHEMBL2146793

15

Paired  by  MCSS  

This is a useful case study: •  MCSS is useful as a sub-substructure that matches both molecules •  Using a depictions tool we can visualise the difference between them •  For compound design / optimisation this is very useful

A difficult MCSS – can you spot it?

Page 16: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCSS on Macrocycles

CHEMBL1779022 CHEMBL2146793

16

Paired  by  MCSS  Common  substructure  and  change  is  clear    

Hard  to  re-­‐draw  by  a  human  –  30  minutes!  

Page 17: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCSS on Macrocycles

CHEMBL1779022 CHEMBL2146793

17

Graph processing algorithm has worked as coded, but did not pick out the difference between cyclised and open chain.

So we need to discover new drugs; as molecules become more complex (anti-biotic macrocycles) using a computer to help analyse and design new molecules becomes hard

Page 18: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Molecule Fragment and Index Method

A - CHEMBL1784632 B - CHEMBL2316582

a   a  

Break all single rotatable bonds and group on common ‘core’ parts  

•  Semantically the same matched pair as MCSS but syntactically different in that ‘points of modification’ are not the same

Page 19: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Frag / Index single bond SMARTS From the Hussain and Rea publication Hussain, J., & Rea, C. (2010). "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets." Journal of chemical information and modeling, 50(3), 339-348.

A SMARTS pattern is required to identify the single bonds to break Original [*]!@!=[*]  (any atoms, any bond not in ring or double – what about triple bonds?) Amended in paper (see ref 24) [#6+0;!$(*=,#[!#6])]!@!=!#[*]    (carbon atom to any atom [excluding carbon that also has a double or triple] and no triple bond]) Effectively this removes amide bonds as disconnections MedChemica variation [#6+0;!$(*=,#[!#6])]!@!=!#[*;!$([H])]  Carbon atom to any atom excluding hydrogen – stop disconnection to H as part of a explicit chiral centre SMARTS pattern can be set in config file or from the command line for direct .smi file processing

Bonds broken by pattern

not broken by pattern

Page 20: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Frag / Index and MCSS pair finding Consider the following compounds

Strict heavy atom c-MCSS

FI and MCSS have found the ‘same’ pattern for the matched pairs via the same core. FI requires additional compute to find the true MCSS

Three ways of Frag / Index finding matches (dependent on settings) Num. of Trans = 3 x num of Environment atoms

The bond to H may not be broken but will be added by the H-additions step of the process

CHEMBL309689 CHEMBL2331793        

Page 21: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Frag And Index – useful for disconnected MCSS

A B

A and B are matched by the same context groups (for MCSS these are disconnected graphs so are not currently found) Context can be captured on both ends in the SMIRKS

Both acyclic to acyclic and acyclic to cyclic changes are captured

Subtle changes in ring isomers are found in the ‘centre’ of molecules

A – CHEMBL100461 B –CHEMBL103900

A – CHEMBL10085 B – CHEMBL10327

For large molecule whole side chain modification can be captured

A – CHEMBL111247   B – CHEMBL110034  

Molecule Fragment and Index Method: ‘Linker’ changes found by double cuts  

Page 22: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCSS – subtle differences and ring changes Maximum Common Sub Structure (or Graph) Method

For molecules A and B of the same size (by heavy atom count) that are smaller (<30) transformations are found for positional isomer switches (these can be missed by F/I)

Subtle changes to smaller groups around rings are captured for all examples.

A B

A – CHEMBL1173789 B – CHEMBL1173714  

A – CHEMBL117055   B – CHEMBL115519    

Acyclic to cyclic changes are captured well by MCSS method

A – CHEMBL156639   B - CHEMBL2387702      

Page 23: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Does the comparison method really matter?

Using only one technique will miss between 12% and 56% of pairings

23

Pairings Pairings number  of  compounds   common   FI  only   MCSS  only   total   FI  only  %   common  %   MCSS  only  %  

VEGF   4466   14631   17172   14823   46626   37   31   32  Dopamine  Transporter   1470   4480   8930   3497   16907   53   26   21  

GABAA   848   2500   1722   4205   8427   20   30   50  

D2  human   3873   12995   13811   13098   39904   35   33   33  

D2  rat   1807   5408   6595   7346   19349   34   28   38  Acetylcholine  esterase   383   536   725   1434   2695   27   20   53  Monoamine  oxidase   264   653   1156   246   2055   56   32   12  

min   20   20   12  

max   56   33   53  

FI MCSS

co

mm

on

Page 24: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCPairsv1.x Settings for FI and MCSS methods

A/B 36/39 ha ratio = 0.92 MCS overlap

A - CHEMBL1784632 B - CHEMBL2316582

Non-matching heavy atom

Set MCSSHAcutoff in configs smaller overlap <0.9 higher overlap >0.9

a   a  

modi / core ha ratio = 6/30 ratioFI = 0.2

modi / core ha ratio = 9/30 ratioFI = 0.3

Set FIratio in configs larger inclusion >0.3 lower inclusion <0.3

Page 25: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

OpenEye ToolKit MCSS #!/usr/bin/env  python  from  __future__  import  print_function  from  openeye.oechem  import  *    pattern  =  OEGraphMol()  target  =  OEGraphMol()  OESmilesToMol(pattern,  "c1cc(O)c(O)cc1CCN")  OESmilesToMol(target,  "c1c(O)c(O)c(Cl)cc1CCCBr")    atomexpr  =  OEExprOpts_DefaultAtoms  bondexpr  =  OEExprOpts_DefaultBonds  #  create  maximum  common  substructure  object  mcss  =  OEMCSSearch(pattern,  atomexpr,  bondexpr,  OEMCSType_Exhaustive)  #  set  scoring  function  mcss.SetMCSFunc(OEMCSMaxAtoms())  #  ignore  matches  smaller  than  6  atoms  mcss.SetMinAtoms(6)  unique  =  True  #  loop  over  matches  for  count,  match  in  enumerate(mcss.Match(target,  unique)):          print  ("Match  %d:"  %  (count  +  1))          print  ("pattern  atoms:",  end="  ")          for  ma  in  match.GetAtoms():                  print  (ma.pattern.GetIdx(),  end="  ")          print  ("\ntarget  atoms:  ",  end="  ")          for  ma  in  match.GetAtoms():                  print  (ma.target.GetIdx(),  end="  ")            #  create  match  subgraph          m  =  OEGraphMol()          OESubsetMol(m,  match,  True)            print  ("\nmatch  smiles  =",  OEMolToSmiles(m))  

h.p://docs.eyesopen.com/toolkits/python/oechemtk/pa.ernmatch.html  

Page 26: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Rdkit MCSS – Python >>> from rdkit.Chem import rdFMCS >>> mol1 = Chem.MolFromSmiles("O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C") >>> mol2 = Chem.MolFromSmiles("CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC") >>> mol3 = Chem.MolFromSmiles("c1(C=O)cc(OC)c(O)cc1") >>> mols = [mol1,mol2,mol3] >>> res=rdFMCS.FindMCS(mols) >>> res <rdkit.Chem.rdFMCS.MCSResult object at 0x...> >>> res.numAtoms 10 >>> res.numBonds 10 >>> res.smartsString '[#6]1(-[#6]):[#6]:[#6](-[#8]-[#6]):[#6](:[#6]:[#6]:1)-[#8]' >>> res.canceled False

h.p://www.rdkit.org/docs/GemngStartedInPython.html#maximum-­‐common-­‐substructure  

Page 27: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

About Clean Code

Compute code is written once and read a thousand times Do:

write Unit-Tests (especially with processing molecules) write functional tests write looooooong function, variable names write functions that DO ONE THING! write modular code – DRY (Do-not Repeat Yourself)

Clean  Code:  A  Handbook  of  Agile  So7ware  Cra7smanship  (Robert  C.  MarBn)    

Working  EffecBvely  with  Legacy  Code  (Micheal  C.  Feathers)    

Page 28: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Page 29: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Matched Molecular Pair Analysis

An example of using MCSS (and FI) to compare molecules and extract Medicinal chemistry knowledge – Example it to show that comparing

molecules it just the start of a process – What experiment are we doing? – How do we process the chemistry output

and the data? – What things can go wrong? – Chirality!

Page 30: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

•  Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation

•  Transformation with environment capture –MMPs can be recorded as transformations from A B

•  Environment is essential to understand chemistry

Statistical analysis •  Learn what effect the transformation has had on ADMET properties in

the past

Griffen,  E.  et  al.  Matched  Molecular  Pairs  as  a  Medicinal  Chemistry  Tool.  Journal  of  Medicinal  Chemistry.  2011,  54(22),  pp.7739-­‐7750.      

Matched Molecular Pair Analysis uses MCSS methods….

Δ Data A-B 1

2

2

3

3

3

4

4

4

1 2

2 3

3

3 4

4

4 A        B    

Page 31: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Identify and group matching SMIRKS

Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)

Is n ≥ 6?

Not enough data: ignore transformation

Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3?

Perform two-tailed binomial test on the transformation to determine the

significance of the up/ down frequency

transformation is classified as ‘neutral’

Transformation classified as ‘NED’ (No Effect Determined)

Transformation classified as ‘increase’ or ‘decrease’

depending on which direction the property is changing

passfail

yesno

yesno

Rule selection

0 +ve -ve

Median data difference

Neutral   Increase  Decrease  

NED  

Page 32: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Environment really matters HMe: •  Median Δlog(Solubility) •  225 different

environments

2.5log  

1.5log  

HMe:  •  Median Δlog(Clint)

Human microsomal clearance

•  278 different environments

Page 33: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

More environment = right detail HMe Solubility: •  225 different environments

Page 34: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

HF What effect on Clearance? •  Median Δlog(Clint) Human microsomal clearance •  37 different environments

2  fold  improvement   2  fold  worse  

Increase  clearance  

decrease  clearance  

Page 35: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Chiral Test Set example

Page 36: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Chirality MCPairs --chiralON

Page 37: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

How to ‘store’ chemical information?

MCSS MMPA SMIRKS Storing reactions as canonical SMIRKS – an example of storing chemical information

Page 38: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Canonicalised SMIRKS 1 – Its about Symmetry

4-­‐Atom  rule  

Both SMIRKS are VALID and would transform a molecule correctly However we want to consistently produce a single SMIRKS for all of these It is 50:50 how an unchecked algorithm will number around the ring Therefore we force the numbering in the reactant side by numbering 1,2,3,4..n Then use symmetry checking to change the numbers to a consistent pattern Thus SMIRKS_B is produced

4   3  2  

1  

4   3  2  

1  

SMIRKS_A  [C]([H])([H])([H])[O][c:1]1[c:2]([H])[c:3]([H])[c:4][c:5]([H])[c:6]1([H])  

     >>[c:5]1([H])[c:6]([H])[c:1]([c:2]([H])[c:3]([H])[c:4]1)[Br]  SMIRKS_B  [C]([H])([H])([H])[O][c:1]1[c:2]([H])[c:3]([H])[c:4][c:5]([H])[c:6]1([H])  

     >>[c:3]1([H])[c:2]([H])[c:1]([c:6]([H])[c:5]([H])[c:4]1)[Br]      

6  5  

6  5  

4  

3  2   1  

4  

3  2   1  

6  5  

6  5  

Page 39: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Canonicalised SMIRKS 2

CHEMBL309689 CHEMBL2331793        

1-­‐Atom  rule        [H][O:1]>>[C]([H])([H])([H])[C]([H])([H])[O:1]  

Highly specific explicit H

Key mapped atom With 1 atom rule this is simple – 1 becomes 1

2-­‐Atom  rule      [H][O:1][c:2]>>[c:2][O:1][C]([H])([H])[C]([H])([H])([H])  

Mapped atoms Absolutely key – note the ethyl is right most in SMIRKS

2  

1  3  

3  4  4  

Orange  –  atom  env  radius  Blue            -­‐  atom  map  index  

Page 40: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Canonicalised SMIRKS 3

CHEMBL309689 CHEMBL2331793        

3-­‐Atom  rule  [H][O:1][c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])  

Highly specific explicit H

Key mapped atom With 3 atom rule environment is more complex

4-­‐Atom  rule  [H][O:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])>>          [C]([H])([H])([H])[C]([H])([H])[O:1][c:2]1[c:3]([c:4][o:5][n:6]1)[C:7]([H])([H])  

Note Mapped atoms run left to right 1,2,3,4…n Absolutely key – note the ethyl is LEFT most in SMIRKS Rule change depending on rule size, environment and symmetry

2  

1  3  

3  4  4  

4   Orange  –  atom  env  radius  Blue            -­‐  atom  map  index  

4  

3   2  

1  

4  3  

2  1  

Page 41: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Canonicalised SMIRKS 4

CHEMBL309689 CHEMBL2331793        

3-­‐Atom  rule  [H][O:1][c:2]([c:3])[n:4]>>[c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])  

Key mapped atom Explicit Hydrogen are on the RHS of product side Thus the mappings become complex and NOT in order 1,2,3,n

While this SMIRKS is VALID and would transform a molecule correctly IF we search for the this SMIRKS there is NO MATCH

Reverse  SMIRKS  [c:3][c:2]([n:4])[O:1][C]([H])([H])[C]([H])([H])([H])>>[H][O:1][c:2]([c:3])[n:4]  

Actual  SMIRKS    [c:1][c:2]([n:3])[O:4][C]([H])([H])[C]([H])([H])([H])>>[H][O:4][c:2]([c:1])[n:3]]  

Due to Explicit H, Symmetry and Canonicalisation - Map Index must be treated with care – directly swapping the string may not match

NOTE – The same applies to fragment SMILES

4  3  

2  1  

4  3  

2  1   Orange  –  atom  env  radius  

Blue            -­‐  atom  map  index  

Page 42: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Take home messages •  Molecular Graphs are the corner stone of chem-infomatics

•  Allow traversal of atoms and bonds

•  Allow graph theory techniques to be applied to comparing molecules

•  Maximum Common Sub-Structure is found by NP-complete graph techniques and is compute intensive

•  The Fragment and Index method can find MCSS in many circumstances but not all

•  Further compute processes are required to deal with chirality and encoding of the common parts of molecules and the difference between molecules

•  Matched Molecular Pair analysis is a data analysis technique, based on MCSS, to extract chemical design knowledge

Page 43: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Appendix

Page 44: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

IP security

•  Contributing company structures are NOT shared •  Contributing company identifiers are NOT shared •  Each contributing company receives it’s OWN

custom GRD copy to enable drill back to OWN example data

•  Minimum of n=6 example pairs required for inclusion in the GRD

•  Enforced limits of shared substructure sizes •  Option for “time-blocking” sharing of data to

withhold data < 6 months old for IP submission

Page 45: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCPairs Platform

•  Extract rules using Advanced Matched Molecular Pair Analysis •  Knowledge is captured as transformations

–  divorced from structures => sharable

Measured Data

rule finder Exploitable

Knowledge

MC Expert Enumerator

System

Problem molecule

Solution molecules

Pharmacophores& toxophores

SMARTS matching

Alerts  Virtual  screening  Library  design  

Protect  the  IP  jewels  

Page 46: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Barriers Broken to Sharing Knowledge

Data Integrity and

curation Knowledge extraction algorithms

Consortium building to

share knowledge Into the minds of

chemists

✓  

Page 47: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Barriers Broken to Sharing Knowledge

Data Integrity and

curation Knowledge extraction algorithms

Consortium building to

share knowledge Into the minds of

chemists

✓  ✓  

Page 48: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Merging knowledge •  Use the transforms that

are robust in both companies to calibrate assays.

•  Once the assays are calibrated against each other the transform data can be combined to build support in poorly exemplified transforms

•  Methodology precedented in other fields

Calibrate Robust

Robust

Weak

Weak

Discover Novel

Pharma  1  

Pharma  2  

Page 49: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Merging Datasets •  Datasets are standardized by comparison of

transformations shared by contributing companies

•  Transformations are examined at the “pair example” level

•  Minimum of 6 transformations, each with a minimum of 6 pairs (42 compounds bare minimum) required to standardise

•  “calibration factors” extracted to standardize the datasets to a common value – mean of calibration factors 0.94, typical range 0.8-1.2.

•  Datasets with too few common transformations have standard compound measurements shared for calibration.

“Blinded”  source  of  transforma=ons  

Page 50: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Current Knowledge sets – GRDv3 Numbers of statistically valid transforms

Grouped Datasets Number of Rules

logD7.4 153449

Merged solubility 46655

In vitro microsomal clearance: Human, rat ,mouse, cyno, dog 88423

In vitro hepatocyte clearance : Human, rat ,mouse, cyno, dog 26627

MCDK permeability A-B / B – A efflux 1852

Cytochrome P450 inhibition: 2C9, 2D6 , 3A4 , 2C19 , 1A2 40605

Cardiac ion channels NaV 1.5 , hERG ion channel inhibition 15636

Glutathione Stability 116

plasma protein or albumin binding Human, rat ,mouse, cyno, dog 64622

Page 51: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Pharma 1 100k rules

Pharma 2 92k rules

Pharma 3 37k rules

5.8k rules in common (pre-merge) ~ 2%

New Rules 88k ~26% of total

Merge  

Combining  data  yields  brand  new  rules  Gains:    300  -­‐  900%  

Merging knowledge – GRDv1

Page 52: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Key findings:

•  Secure sharing of large scale ADMET knowledge between large Pharma is possible

•  The collaboration generated great synergy •  Many findings are highly significant. •  MMP is a great tool for idea generation.  •  The rules have been used in drug-discovery projects

and generated meaningful results •  MMPA methodology can be extended to extract

pharmacophores  

Page 53: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Barriers Broken to Sharing Knowledge

Data Integrity and

curation Knowledge extraction algorithms

Consortium building to

share knowledge Into the minds of

chemists

✓  ✓  

✓  

Page 54: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Integrating knowledge exploitation

..many possible connections

..high Stability and flexibility**

Rule Database

MCExpert

API server

** api.medchemica.com has delivered 18 months of uptime and drives SaltTraX (Elixir) Approx 50 chemists use the tool

RESTful API

Chemistry  Shape  and  electrostaHcs  

Page 55: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Matched Molecular Pair

data A data B

data C data D

data E data F

Chemical Transformations

Δ data A B

Δ data C D

Δ data E F

Chemical Transformations

Δ data A B

Δ data C D

Δ data E F

Δ data G H

Δ data I J

Δ data K L

Matched Molecular Pair Analysis (MMPA) enables SAR sharing

Without sharing underlying structures and data

Grand Rule

Database

Enumeration

Rate-My-Idea

GRD-Browser

ChEMBL Tox database Toxophores MC-

Biophore

Page 56: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Collaborators and Users - experience

Page 57: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

A Collaboration of the willing

Craig Bruce OE John Cumming Roche David Cosgrove C4XD Andy Grant★

Martin Harrison Elixir Huw Jones Base360 Al Rabow Consulting David Riley AZ Graeme Robb AZ Attilla Ting AZ Howard Tucker retired Dan Warner Myjar Steve St-Galley Syngenta David Wood JDR Lauren Reid MedChemica Shane Monague MedChemica Jessica Stacey MedChemica

Andy Barker Consulting Pat Barton AZ Andy Davis AZ Andrew Griffin Elixir Phil Jewsbury AZ Mike Snowden AZ Peter Sjo AZ Martin Packer AZ Manos Perros Entasis Therapeutics Nick Tomkinson AZ Martin Stahl Roche Jerome Hert Roche Martin Blapp Roche Torsten Schindler Roche Paula Petrone Roche Christian Kramer Roche Jeff Blaney Genentech Hao Zheng Genentech Slaton Lipscomb Genentech Alberto Gobbi Genentech

Page 58: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MedChemica | 2016

ACS Philadelphia 2016

- Fix hERG problem whilst maintaining potency

Waring et al, Med. Chem. Commun., (2011), 2, 775

Glucokinase Activators

MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5

n=33 n=32 n=22

MMPA ∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3

n=20 n=23 n=19

MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5

n=27 n=27 n=7

MedChemica | 2016

ACS Philadelphia 2016

A Less Simple Example Increase logD and gain solubility

Property NumberofObserva2ons

Direc2on MeanChange Probability

logD 8 Increase 1.2 100%

Log(Solubility) 14 Increase 1.4 92%

Whatistheeffectonlipophilicityandsolubility?Rochedataisinconclusive!(2pairsforlogD,1pairforsolubility)

logD=2.65KineMcsolubility=84µg/mlIC50SST5=0.8µM

logD=3.63KineMcsolubility=>452µg/mlIC50SST5=0.19µM

Ques2on:

AvailableSta2s2cs:

RocheExample:

MedChemica | 2016

ACS Philadelphia 2016

Solving a tBu metabolism issue

Benchmarkcompound

Predictedtooffermostimprovementinmicrosomalstability(inatleast1species/assay)

R2R1

tBu Me Et iPr

99392

1664

78410

53550

99288

78515

4135

98327

92372

24247

35128

2462

60395

39445

321

2027

5789

5489

•  Data shown are Clint for HLM and MLM (top and bottom, respectively)

R1 R2R1tBuRoger Butlin Rebecca Newton Allan Jordan

Page 59: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Cathepsin K – Di-methoxy surprise – Man and Machine

pIC50 7.95 LogD 0.67 HLM <2.0 Solubility 280μM DTM ~1.0 mg/kg UID Potent Too polar / Renal Cl

PDB  -­‐  97%  of  structures    Crawford,  J.J.;  Dosse.er,  A.G  J  Med  Chem.  2012,  55,  8827.  Dosse.er,  A.  G.  Bioorg.  Med.  Chem.  2010,  4405  Lewis  et  al,  J  Comput  Aided  Mol  Des,  2009,  23,  97–103      

pIC50 8.2 LogD 2.8 HLM <1.0 Solubility >1400μM DTM 0.01 mg/kg UID High F% / stability maximised

Increase in LogP, Properties improved

Solubility  ΔpIC50 - 0.1 ΔLogD +1.4 ΔpSol +1.2 ΔHLM + 0.25

No renal Cl low F%

ΔpIC50 +0.1 ΔLogD - 0.7 ΔpSol ~0.0 ΔHLM - 0.25

High F% rat/Dog Electrosta=c  poten=al  minima  between  oxygens  

Approx  like  N  from  5-­‐het,  new  compound  can  not  form  a  quinoline  

Incr.  selec=vity  

ΔpIC50 +0.1 ΔLogD - 0.7 ΔpSol ~0.0 ΔHLM - 0.25

High F% rat/Dog

Page 60: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

- Fix hERG problem whilst maintaining potency

Waring et al, Med. Chem. Commun., (2011), 2, 775

Glucokinase Activators

MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5

n=33 n=32 n=22

MMPA ∆pEC50: +0.3 ∆logD: +0.3 ∆hERG pIC50 :-0.3

n=20 n=23 n=19

MMPA ∆pEC50: -0.1 ∆logD: -0.6 ∆hERG pIC50 :-0.5

n=27 n=27 n=7

Page 61: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Knowledge Based Design – MPO –  Novel more efficient core required, improve hERG for CD –  CNS penetration, good potency and deliver tool for in vivo testing

McCoull, Dossetter et al, Med. Chem. Commun., (2013), 4, 456

ΔpIC50 -0.4 ΔlogD -1.8 ΔhERG pIC50 +0.4

Ghrelin Inverse agonists

~

MMPA Cores

pIC50 9.9 logD 5.0 hERG pIC5 5.0 LLE 4.9 very potent very lipophilic

ΔpIC50 +0.9 ΔlogD +0.2 ΔhERG pIC50 -0.3

pIC50 8.2 logD 1.3 hERG pIC50 4.4 LLE 6.9

ΔpIC50 -2.2 ΔlogD -2.2 ΔhERG pIC50 -0.7

100 compounds

made

LLE = lipophilic ligand efficiency: LLE=pIC50-logD

LLE 6.4

LLE 6.9

Page 62: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Early successes From GRDv1 May 2014

62

J.  Med.  Chem.,  2015,  58  (23),  pp  9309–9333  DOI:  10.1021/acs.jmedchem.5b01312  

Page 63: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Comparison of Merck in-house MMPA with SALTMinerTM

Structure:

ADMET Issue: hERG Lead A2A receptor antagonist compound in Merck Parkinson's project

138 suggestion molecules with predicted improvement in hERG

binding

How many match the results of Merck?

•  Also shows potent binding to the hERG ion channel

•  Deng et al performed in-house MMPA on hERG binding compound data and have published 18 resulting fluorobenzene transformations, which they have synthesized and tested for hERG activity

Deng  et  al,    Bioorg.  &  Med  Chem  Let  (2015),  doi:  h.p://dx.doi.org/10.1016/j.bmcl.2015.05.036    

Page 64: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

R  group:  

Measured  hERG  pIC50  change  

-­‐1.187   -­‐1.149   -­‐1.038   -­‐1.215   -­‐1.157   -­‐0.149   -­‐1.487   -­‐1.133  

GRD  median  historic  pIC50  change  

0   -­‐0.171   -­‐0.1   -­‐0.283   -­‐0.219   -­‐0.318   -­‐0.159   -­‐0.103  

Results: 8 out of the 18 fluorobenzene transformations produced by Merck were also suggested by MCExpert to decrease hERG binding:

Searching the GRD for transformations that increase hERG there were none that matched the remaining 10 of 18 transformations in the paper.

MCExpert also suggested an additional 50 fluorobenzene replacements to decrease hERG binding NOT mentioned in the publication.

Page 65: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

A Less Simple Example Increase logD and gain solubility

Property   Number  of  ObservaBons  

DirecBon   Mean  Change   Probability  

logD   8   Increase   1.2   100%  

Log(Solubility)   14   Increase   1.4   92%  

What  is  the  effect  on  lipophilicity  and  solubility?  Roche  data  is  inconclusive!  (2  pairs  for  logD,  1  pair  for  solubility)  

logD  =  2.65  Kine=c  solubility  =  84  µg/ml  IC50  SST5  =  0.8  µM  

logD  =  3.63  Kine=c  solubility  =  >452  µg/ml  IC50  SST5  =  0.19  µM  

QuesBon:  

Available  StaBsBcs:  

Roche  Example:  

Page 66: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Solving a tBu metabolism issue

Benchmark  compound  

Predicted  to  offer  most  improvement  in  microsomal  stability  (in  at  least  1  species  /  assay)  

                     R2    R1  

tBu   Me   Et   iPr  

99  392  

16  64  

78  410  

53  550  

99  288  

78  515  

41  35  

98  327  

92  372  

24  247  

35  128  

24  62  

60  395  

39  445  

3  21  

20  27  

57  89  

54  89  

•  Data shown are Clint for HLM and MLM (top and bottom, respectively)

R1   R2  R1  tBu  Roger Butlin Rebecca Newton Allan Jordan

Page 67: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

The application helped lead optimization in project

67

•  193 compounds •  Enumerated

Objective: improve metabolic stability

MMP Enumeration

Calculated Property Docking

8 compounds synthesized

Page 68: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Barriers Broken to Sharing Knowledge

Data Integrity and

curation Knowledge extraction algorithms

Consortium building to

share knowledge Into the minds of

chemists

Page 69: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Data Integrity and Curation

Structural •  Extensive standards for

inclusion of mixtures, chiral compounds, salt forms

•  Tautomer and charge state canonicalisation client side

•  Automated validation of structures run client side = “clean” comparable structures submitted to pair finding

Measured Data •  Assay protocols reviewed

prior to merging •  Precise documentation

on unit definitions and data reporting standards

•  Option to share standard compound measured values

•  Automated extensive data validation checks prior to merging data

“client  side”  =  behind  Pharma  firewall  

Page 70: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Calibrating Assays

•  Sets of transformations can be calibrated against each other as we are comparing Δ values in assays not absolute values

•  Assays are usually linearly displaced against each other •  Data analysis equivalent of FEP

Compound A Compound B Compound C Compound D

Transformation 1 Transformation 2

pIC50, log(Clint), pSol etc

Assay 1 Assay 2

ΔT1  ΔT2  

ΔT1’=  ΔT1  ΔT2’=  ΔT2      

ΔT1’  ΔT2’  

Assay 2 more sensitive than Assay 1

Assay 1 Δ

Assay 2 Δ

Assay 2 less sensitive than Assay 1

T1

T2

Page 71: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Pharmacophores and Toxophores by extended analysis from the MMPA

Pharmacophores BigData Stats Matched

Pairs Finding

Public and in-house potency

data

Page 72: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Target Number  of  compounds  

Number  of  compound  

pairs  

Number  of  Fragments  

Number  of  Pharmacophore  dyads  ayer  filtering  

R2   RMSEP   ROC   odds_ra=o  (geomean)  

Acetylcholine esterase - human 383   27755   44   10   0.43   1.57   0.80   4  β 1 adrenergic receptor 505   145447   276   313   0.64   0.70   0.96   833  Androgen receptor 1064   113163   186   46   0.47   0.77   0.86   140  CB1 canabinnoid receptor 1104   88091   165   90   0.61   1.02   0.87   96  CB2 canabinnoid receptor 1112   82130   194   158   0.19   0.85   0.64   5.7  Dopamine D2 receptor - human 3873   230962   483   602   0.42   0.88   0.69   110  Dopamine D2 receptor - rat 1807   118736   267   377   0.29   0.85   0.78   125  Dopamine Transporter 1470   106969   282   336   0.58   0.73   0.88   141  GABA A receptor 848   39494   106   167   0.70   0.76   0.97   560  hERG ion channel 4189   242261   392   76   0.61   0.96   0.92   55  5HT2a receptor 642   50870   197   267   0.61   0.59   0.83   600  Monoamine oxidase 264   15439   44   11   0.12   1.25   0.48   181  Muscarinic acetylcholine receptor M1 628   48200   97   510   0.62   0.94   0.89   48  

µ opioid receptor 1128   37184   33   11   0.69   1.30   0.87   81  

Critical safety target analysis

•  Build models using 10-fold cross validated PLS •  Assess using ROC / BEDROC, R2 vs 100 fold y-scrambled R2 and geomean odds ratio

72

Public Data

Find Matched Pairs

Pharmacophores Find

Pharmacophore dyads

Find Potent Fragments

J. Bowes, et al Nat. Rev. Drug Discov., vol. 11, no. 12, pp. 909–922, Nov. 2012

Page 73: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Novartis Predictions From Our Model Domain of Applicability….

Actual: 8.4[1] Predicted: 7.5

73

Actual: 7.6[1] Predicted: 7.5

1. J MedChem(2016), Bold et al. 2. MedChem Lett (2016), Mainolfi et al.

Actual: 7.7[2] Predicted: 7.1

Actual: 9.0[2] Predicted: Out of Domain

Page 74: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

MCBiophore GUI screenshot

Assay Image Mean_with Mean_without PLS_coeff Path SMARTS1 SMARTS2 n_examples odds_ratio2

VEGFR 8.3 6.4 0.71 [c]c[c][c] Cc1ccc[c]c1[n] [c]/C(=N/O)/C 18 259.83

VEGFR 8.2 6.4 0.17 [c] [c]c1cc(cc[c]1)/C(=N/OC)/CCc1ccc[c]c1[n] 17 257.60

VEGFR 8.1 6.4 -0.01 [CH3] [cH]c([cH])/C(=N/OC)/C[c]/C(=N/OCC)/C 8 20.21

VEGFR 8.1 6.4 -0.01 [CH3] [c]c1cc(cc[c]1)/C(=N/OC)/C[c]/C(=N/OCC)/C 8 151.2

Detailed results in

excel

Page 75: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Mining transform sets to find influential fragments

Identify the ‘Z’ fragments associated with a significant number of potency increasing changes – irrespective of what they are replaced with ‘Z’ is ‘worse than anything you replace it with’

Fragment A Fragment B  Change in binding

measurement

Public Data

Find Matched

Pairs

Find Potent Fragments

+2.7  

+3.2  

+0.6  

+0.6  

Identify the ‘A’ fragments associated with a significant number of potency decreasing changes – irrespective of what they are replaced with ‘A’ is ‘better than anything you replace it with’

A  

+2.1  +2.2  +1.4  

+0.4  

+1.8  

Z  

pKi/ pIC50

Compounds with destructive fragment

Compounds with constructive fragments

Generate  Pharmacophore  dyads  by  permuta=ng  all  the  fragments  with  the  shortest  path  between  them  

Page 76: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Toxophores - Detailed, specific & transparent

76

Dopamine D2 receptor human Actual: 9.5

Predicted: 9.1 Mean with: 8.0 Mean without: 6.6 Odds Ratio: 340

Dopamine Transporter Actual: 9.1

Predicted: 8.6 Mean with: 8.3 Mean without: 6.7 Odds Ratio: 407

GABA-A Actual: 9.0

Predicted: 8.7 Mean with: 8.0 Mean without: 6.8 Odds Ratio: 1506

β1 adrenergic receptor Actual: 7.8 Predicted: 7.7 Mean with: 6.5 Mean without: 5.7 Odds Ratio: 1501

Find Potent Fragments

Matched Pairs

Finding

Find Pharmacophore

Dyads

Public and in-house potency

data

Page 77: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Prediction of unseen new molecules The acid test…

•  Vascular endothelial growth factor receptor 2 tyrosine kinase (KDR)

•  Inhibitors have oncology and ophthalmic indications

•  Large dataset in CHEMBL

•  10 fold cross validated PLS model

•  Selected model by minimised RMSEP

77

Compounds 4466 Matched Pairs 288100 Fragments 678 Pharmacophore dyads 787 RMSEP 0.8 R2 0.64 Y-scrambled R2 0.0 ROC 0.95 Geomean odds ratio 80

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

● ●

●●

●●

● ●

●●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

● ●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

4

6

8

10

5 7 9pIC50_pred

pIC50

Page 78: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Future developments

Methodology •  “Metabolophore” extraction •  Rule partition by charge •  Enhanced statistical rule selection methods •  Inferred rule extraction (AB + B C = AC) / matched series / matched networks •  Meta rule identification (eg halogen>>alkyl) •  Rule partition by shape •  Fuzzier atom typing (eg matching indole NH with ArNHC(=O)Me)

Technology •  Transform searching and clustering •  Graph database •  Distributed compute (eg Apache Spark)

Science •  Explaining counter dogma transformations

•  Single crystal x-ray for solubility •  Route of metabolism studies •  Serum protein albumin co-crydtallisation for PPB

•  Cardiac and liver tox screening panel development

78

Bigger,  faster,  bigger  &  faster!  

Page 79: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Essentials of the collaboration •  Roche, Genentech and AZ all have ADMET data

processed inside their firewalls to generate transformations (matched pairs transformations with change data)

•  The transformations (fragments of molecules only) are shared with MedChemica

•  MedChemica combines the transformation data and returns the aggregated knowledge

•  Therefore NO party can drill back to anyone else’s structures or original data

•  There is no reach-through by any party •  MedChemica facilitates the science coordinating group

to make suggestions on improvements and enhancements to the data sets, methods for extraction, analysis and exploitation

Page 80: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

How Specific are Pharmacophore dyads?

•  How selective is the pharmacophore? •  What are the odds of it hitting a molecule in the test set vs CHEMBL?

•  Odds of finding in potency set = n(pharmacophore hits in potency set) n(in potency set)

•  Odds of finding in CHEMBL =

n(pharmacophore hits in CHEMBL not in potency set) n(in CHEMBL)

•  Odds ratio = selectivity =

Odds of finding in potency set_______ Odds of finding in CHEMBL(not potency set)

80

27 1470

62 1351211

27/1470 62/1351211

=407 (95% confidence limits: 259-642)

Odds of hitting a potent compound are 407 times greater than a random compound in CHEMBL

Path"

Fragment 1  

Fragment 2  [CH2]CN  

Page 81: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Benchmarking Specificity What does a bad odds ratio look like?

What is the odds ratio? Found in CHEMBL 565658/1352681 Found in CHEMBL240 – hERG where pIC50 >=5 1985/2451

OR = 1985/2451 = 0.81 565658/1352681 0.42

=1.94 (95% conf 1.83 – 2.05)

81

Lipophilic base, usually a tertiary amine X = 2-5 atom chain, may include rings, heteroatoms or polar groups

XN

R1

R2

e. g. sertindole: 14nM vs hERG

[$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~c),$([NX3;H2,H1,H0;!$(N[C,S]=[O,N])]~*~*~*~*~*~c)]  

Early simple hERG model

Ar-linker-base has only been found 1.9x more often in hERG inhibitors than at random in ChEMBL

Page 82: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Fast building block access from CRO collaboration

82

MCExpert suggests

improved building blocks

Specialist synthesis CROs access unique

chemistries

Rapid access to building blocks that address

metabolism and solubility issues

Mono & disubstituted chiral piperidines and pyrollidines

Chiral α methyl aryl amines and alcohols

Page 83: Molecular comparisons by Maximum Common Sub …bigchem.eu/sites/default/files/School1_Dossetter.pdf · Molecular comparisons by Maximum Common Sub-Structure ... Molecule Fragment

MedChemica | 2016

Better compounds designed from Data

Essentials Gains

Pains

•  Improved compounds quicker

•  Applicable ideas

•  Confident design decisions

•  Help when stuck

•  Clearly describable plans

•  Maximizing value from ADMET testing

•  Pursuing dead-end series

•  Pursuing dead-end projects

•  Running out of time or $