molecular design: how to and how not to?

Molecular design: How to and how not to? Peter W Kenny

Uploaded by: peter-kenny

Posted on 11-Jun-2015


DESCRIPTION

I visited Astex and the main focus of the talk was correlation inflation.

TRANSCRIPT

Page 1: Molecular design:  How to and how not to?

Molecular design: How to and how not to? Peter W Kenny

Page 2

Some things that are hurting Pharma

• Having to exploit targets that are poorly linked to human disease

• Inability to predict idiosyncratic toxicity

• Inability to measure free (unbound) physiological concentrations of drug for remote targets (e.g. intracellular targets or those on the far side of the blood-brain barrier)

Dans la merde ("in deep trouble"): http://fbdd-lit.blogspot.com/2011/09/dans-la-merde.html

Page 3

Molecular Design

• Control of behavior of compounds by manipulation of molecular properties

• Hypothesis-driven or prediction-driven

• Sampling of chemical space

– Does fragment-based screening allow better control of sampling resolution?

Page 4

Achtung! Spitfire! ("Watch out! Spitfire!")

Prediction-driven design: Ju 87 Stuka

Stuka on Wikipedia

Page 5

“Why can’t we pray for something good, like a tighter bombing pattern, for example? Couldn’t we pray for a tighter bombing pattern?” Heller, Catch-22, 1961

Hypothesis-driven design: B-52 Stratofortress

B-52 on Wikipedia

Page 6

Illustrating hypothesis-driven design: DNA Base Isosteres: Acceptor & Donor Definitions

[Figure: donor sites Do1, Do2 and acceptor site Ac1]

Kenny (2009) JCIM 49:1234-1244 DOI

Page 7

Watson-Crick Donor & Acceptor Electrostatic Potentials for Adenine Isosteres

[Plot axes: Vmin(Ac1) and Va(Do1)]

Kenny (2009) JCIM 49:1234-1244 DOI

Page 8

I prefer my food cooked and my data raw… (Eu prefiro minha comida cozida e meus dados brutos…)

Page 9

Correlation

• Strong correlation implies good predictivity

• Multivariate data analysis (e.g. PCA) usually involves transformation to an orthogonal basis of lower dimensionality

• Applying cutoffs (e.g. MW restriction) to data can distort correlations
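The last bullet can be illustrated with a small simulation. In this hypothetical sketch (the linear model, noise level and cutoff are assumptions, not taken from the talk), restricting the range of X, as a property cutoff does, lowers Pearson's R even though the underlying relationship between X and Y is unchanged.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient R."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y depends linearly on an MW-like descriptor X, plus noise.
rng = random.Random(1)
xs = [rng.uniform(200, 800) for _ in range(5000)]
ys = [0.01 * x + rng.gauss(0, 2) for x in xs]
r_all = pearson_r(xs, ys)

# Apply a cutoff (X <= 500, standing in for an MW restriction) and recompute R.
kept = [(x, y) for x, y in zip(xs, ys) if x <= 500]
r_cut = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"R, all data:     {r_all:.2f}")
print(f"R, after cutoff: {r_cut:.2f}")
```

With these settings R should fall from roughly 0.65 to roughly 0.4, although the slope is the same in both sets. Other kinds of cutoff can push R in the opposite direction.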

Page 10

Quantifying strengths of relationships between continuous variables

• Correlation measures

– Pearson product-moment correlation coefficient (R)

– Spearman's rank correlation coefficient (ρ)

– Kendall rank correlation coefficient (τ)

• Quality of fit measures

– Coefficient of determination (R²) is the fraction of the variance in Y that is explained by the model

– Root mean square error (RMSE)
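The measures listed above can be sketched without external libraries. This is a minimal illustration, assuming paired lists of X and Y values: Spearman's ρ is computed as Pearson's R on ranks (ties share the average rank), Kendall's τ uses the tau-b variant (which adjusts for ties), and R² and RMSE are computed for an ordinary least-squares straight line. In practice a library implementation (e.g. SciPy) would normally be used.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient (R)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's R computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, which corrects for tied values."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both counts
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


def least_squares_fit(xs, ys):
    """Fit Y = a + b*X; return (a, b, R^2, RMSE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot        # fraction of the variance in Y explained
    rmse = math.sqrt(ss_res / n)    # divides by n, not n - 2
    return a, b, r2, rmse


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson(xs, ys), spearman(xs, ys), kendall_tau_b(xs, ys))
print(least_squares_fit(xs, ys))
```

For a least-squares straight line, R² equals the square of Pearson's R (here 0.6 = 0.775²), while RMSE is expressed in the units of Y.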

Page 11

Size of effect for categorical X

Difference in mean values of Y for X = A and X = B:

• Scale by standard deviation: Cohen’s d (independent of sample size)

• Scale by standard error: Student’s t (depends on sample size)

R² can be seen as analogous to Cohen’s d
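The contrast between the two scalings can be made concrete. In this sketch (hypothetical data; equal-variance t with a pooled standard deviation is assumed), repeating the same values ten times keeps the difference in means and the spread of the values the same but increases N. Cohen's d changes only slightly (through the n - 1 in the sample variance), whereas t grows roughly with the square root of N.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled sample standard deviation of two groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def cohens_d(a, b):
    """Difference in means scaled by the pooled standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


def students_t(a, b):
    """Difference in means scaled by its standard error (equal-variance t)."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical values of Y for X = A and X = B.
group_a = [5.0, 6.0, 7.0]
group_b = [4.0, 5.0, 6.0]
# Same values repeated ten times: larger N, same means and range of values.
big_a, big_b = group_a * 10, group_b * 10

print(f"n = 3:  d = {cohens_d(group_a, group_b):.2f}, t = {students_t(group_a, group_b):.2f}")
print(f"n = 30: d = {cohens_d(big_a, big_b):.2f}, t = {students_t(big_a, big_b):.2f}")
```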

Page 12

Preparation of synthetic data sets. Kenny & Montanari (2013) JCAMD 27:1-13 DOI

Add Gaussian noise (SD = 10) to Y

Page 13

Correlation inflation by hiding variation. See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI; Leeson & Springthorpe (2007) NRDD 6:881-890 DOI

Data is naturally binned (X is an integer) and the mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation.

[Figure panels, first set: R = 0.34, R = 0.30, R = 0.31; second set: R = 0.67, R = 0.93, R = 0.996]
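The recipe on these slides (Gaussian noise with SD = 10 added to Y, then Y averaged at each integer value of X) can be mimicked in a few lines. The linear model, slope, range of X and sample size below are assumptions for illustration, not the data sets used in the talk.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient R."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
# Hypothetical data: integer X (0-9); Y linear in X plus Gaussian noise (SD = 10).
xs = [rng.randint(0, 9) for _ in range(2000)]
ys = [1.5 * x + rng.gauss(0, 10) for x in xs]
r_raw = pearson(xs, ys)

# "Naturally binned" analysis: replace the data by the mean of Y at each value of X.
groups = defaultdict(list)
for x, y in zip(xs, ys):
    groups[x].append(y)
bin_x = sorted(groups)
bin_mean_y = [sum(groups[x]) / len(groups[x]) for x in bin_x]
r_binned = pearson(bin_x, bin_mean_y)

print(f"R for individual data points: {r_raw:.2f}")
print(f"R for mean Y at each X:       {r_binned:.2f}")
```

R for the individual points should come out at roughly 0.4, while R for the ten means should be close to 1. Averaging removes the within-bin variation, and that variation is exactly what limits predictivity for individual compounds.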

Page 14

Correlation Inflation in Flatland. See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI

N = 1202: R = 0.247 (95% CI: 0.193 | 0.299); ρ = 0.215 (P < 0.0001); τ = 0.148 (P < 0.0001)

N = 8: R = 0.972 (95% CI: 0.846 | 0.995); ρ = 0.970 (P < 0.0001); τ = 0.909 (P = 0.0018)
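The slide does not say how its 95% confidence intervals were obtained. One standard approach is the Fisher z-transformation, sketched below. For the N = 1202 data set it gives the same interval to three decimal places (0.193 to 0.299).

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson's R (Fisher z-transformation)."""
    z = math.atanh(r)                   # transform R to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)         # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for r, n in [(0.247, 1202), (0.972, 8)]:
    lo, hi = pearson_ci(r, n)
    print(f"N = {n:5d}  R = {r:.3f}  95% CI: {lo:.3f} to {hi:.3f}")
```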

Page 15

Masking variation with standard error. See Gleeson (2008) JMC 51:817-834 DOI

Partition by value of X into 4 bins with equal numbers of data points and display the 95% confidence interval for the mean (green) and mean ± SD (blue) for each bin.

[Figure panels: R = 0.12, R = 0.29, R = 0.28]
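The binning scheme described above can be sketched as follows. The simulated relationship (Y = 0.1 X plus noise with SD = 1) is a hypothetical assumption, and the confidence interval uses a normal approximation, which is too narrow for very small bins. Going from N = 40 to N = 4000 leaves the SD in each bin much the same, while the 95% confidence interval for each bin mean shrinks by about a factor of ten.

```python
import math
import random
import statistics
from statistics import NormalDist


def equal_count_bins(xs, ys, n_bins=4):
    """Sort by X, split into bins with (nearly) equal numbers of points and
    return (mean of Y, SD of Y, half-width of 95% CI for the mean) per bin."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) / n_bins
    z = NormalDist().inv_cdf(0.975)  # normal approximation; a t quantile suits small bins better
    out = []
    for b in range(n_bins):
        ys_bin = [y for _, y in pairs[round(b * size):round((b + 1) * size)]]
        sd = statistics.stdev(ys_bin)
        out.append((statistics.mean(ys_bin), sd, z * sd / math.sqrt(len(ys_bin))))
    return out


def simulate(n, seed=0):
    """Hypothetical weak relationship: Y = 0.1 * X + Gaussian noise (SD = 1)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return xs, [0.1 * x + rng.gauss(0, 1) for x in xs]


for n in (40, 4000):
    print(f"N = {n}")
    for mean, sd, half_ci in equal_count_bins(*simulate(n)):
        print(f"  mean {mean:5.2f}   SD {sd:4.2f}   95% CI +/- {half_ci:4.2f}")
```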

Page 16

The error of standard error

ANOVA for binned data sets

N = 40, 4 bins: degrees of freedom = 3, F = 0.2596, P = 0.8540

N = 400, 4 bins: degrees of freedom = 3, F = 12.855, P < 0.0001

N = 4000, 4 bins: degrees of freedom = 3, F = 115.35, P < 0.0001

N = 4000, 2 bins: degrees of freedom = 1, F = 270.91, P < 0.0001

N = 4000, 8 bins: degrees of freedom = 7, F = 50.075, P < 0.0001

“In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters.” Gleeson (2008) JMC 51:817-834 DOI
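The ANOVA for binned data can be outlined in code. The sketch below uses hypothetical data, not the data behind the table, and computes the one-way ANOVA F statistic. As in the table, the between-groups degrees of freedom equal the number of bins minus one. Converting F to a P value requires the F-distribution survival function (for example `scipy.stats.f.sf(F, df_between, df_within)`), which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """One-way ANOVA: return (F, between-groups df, within-groups df)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical binned data: two bins of Y values.
bins = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(one_way_anova_f(bins))                    # (13.5, 1, 4)

# Same bin means and spread, ten times as many points per bin.
print(one_way_anova_f([b * 10 for b in bins]))  # much larger F, with 1 and 58 df
```

This is the point of the table: with the same bin means and spread, more data points give a much larger F and a smaller P. Neither statistic therefore measures the strength of the relationship on its own.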

Page 17

Know your data

• Assays are typically run in replicate, making it possible to estimate assay variance

• Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay

• Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad

• We need to be able to analyse in-range and out-of-range data within a single unified framework

– See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
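One simple way to use in-range and out-of-range results together is a censored-data likelihood. A measured value contributes its normal density, and a result reported only as '< c' or '> c' contributes the probability of falling below or above c. The sketch below is illustrative only and is not taken from Lind (2010). Its assumptions are: normally distributed values, a known and fixed assay SD, and a grid search for a single mean (a real analysis would fit a QSAR model instead).

```python
import math

SQRT2 = math.sqrt(2.0)


def log_pdf(y, mu, sigma):
    """Log of the normal density at a measured (in-range) value."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_p_below(c, mu, sigma):
    """log P(Y < c) for a result reported as '< c' (erfc keeps tail precision)."""
    return math.log(0.5 * math.erfc((mu - c) / (sigma * SQRT2)))


def log_p_above(c, mu, sigma):
    """log P(Y > c) for a result reported as '> c'."""
    return math.log(0.5 * math.erfc((c - mu) / (sigma * SQRT2)))


def log_likelihood(mu, sigma, measured, below=(), above=()):
    return (sum(log_pdf(y, mu, sigma) for y in measured)
            + sum(log_p_below(c, mu, sigma) for c in below)
            + sum(log_p_above(c, mu, sigma) for c in above))


def mle_mean(measured, below=(), above=(), sigma=0.5, lo=0.0, hi=12.0, steps=12001):
    """Grid-search maximum-likelihood mean with sigma held fixed (assumed assay SD)."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(grid, key=lambda mu: log_likelihood(mu, sigma, measured, below, above))


# Hypothetical pIC50 results: three in range, two reported as '< 5'.
measured = [6.0, 5.5, 5.2]
naive = sum(measured + [5.0, 5.0]) / 5  # treats '< 5' as if it were exactly 5
print(f"naive mean {naive:.2f}, censored-data estimate {mle_mean(measured, below=[5.0, 5.0]):.2f}")
```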

Page 18

Depicting variation with percentile plots

This graphical representation of data makes it easy to visualize variation and can be used with mixed in-range and out-of-range data. See Colclough et al (2008) BMCL 16:6611-6616 DOI

Page 19

Binning continuous data restricts your options for analysis and places the burden of proof on you to show that your conclusions are independent of the binning scheme. Think before you bin!

Averaging the binned data was your idea so don’t try blaming me this time!

Page 20

Some stuff to think about

• Model continuous data as continuous data

– RMSE is most relevant to prediction but you still need R²

– Fitted parameters may provide insight (e.g. solubility is more sensitive than potency to lipophilicity)

• When selecting training data think in terms of Design of Experiments (e.g. evenly spaced values of X)

• Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50)

• Never make statements about the strength of a relationship when you’ve hidden variation in the data (unless you want a starring role in Correlation Inflation 2)

• To be meaningful, a measure of the spread of a distribution must be independent of sample size

• Reviewers/editors: mercilessly purge manuscripts of statements like “A negative correlation was observed between X and Y” or “A and B are correlated/linked”
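The pIC50 suggestion can be illustrated briefly. IC50 values for a set of compounds often span several orders of magnitude, so averages and fits on the IC50 scale are dominated by the least potent compounds. Working with pIC50 = -log10(IC50 in mol/L) avoids this. A minimal sketch, assuming IC50 values in nM (the values are hypothetical):

```python
import math
import statistics


def pic50(ic50_nM):
    """pIC50 from an IC50 in nM; equivalent to -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nM)


# Hypothetical IC50 values spanning four orders of magnitude (nM).
ic50s = [1.0, 10.0, 100.0, 1000.0, 10000.0]
print(statistics.mean(ic50s))                     # dominated by the weakest compound
print(statistics.mean(pic50(v) for v in ic50s))   # mean pIC50 of 7, i.e. 100 nM
```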

Page 21

Choosing octanol was the first mistake...

Page 22

An alternative view of the Rule of 5

[Figure: axis label "Polarity"; Rule of 5 criteria shown: ClogP ≤ 5; Acc ≤ 10; Don ≤ 5]

Page 23

Does octanol/water ‘see’ hydrogen bond donors?

[Figure: structures annotated with values -0.06, -0.23, -0.24, -1.01, -0.66, -1.05]

Sangster lab database of octanol/water partition coefficients: http://logkow.cisti.nrc.ca/logkow/index.jsp

Page 24

Octanol/Water Alkane/Water

Octanol/water is not the only partitioning system

Page 25

Differences in octanol/water and alkane/water logP values reflect hydrogen bonding between solute and octanol

logPoct = 2.1; logPalk = 1.9; ΔlogP = 0.2

logPoct = 1.5; logPalk = -0.8; ΔlogP = 2.3

logPoct = 2.5; logPalk = -1.8; ΔlogP = 4.3

Toulmin et al (2008) J Med Chem 51:3720-3730 DOI

Page 26

Polar Surface Area is not predictive of hydrogen bond strength

ΔlogP = 0.5; PSA = 48 Å²

ΔlogP = 4.3; PSA = 22 Å²

Toulmin et al (2008) J Med Chem 51:3720-3730 DOI

Page 27

Prediction of contribution of acceptors to ΔlogP

[Plots: ΔlogP (corrected) vs Vmin/(Hartree/electron), one panel for N or ether O acceptors and one for carbonyl O acceptors]

$$\Delta\log P = \Delta\log P_0 \times \exp(-k V_{\min})$$

Toulmin et al (2008) J Med Chem 51:3720-3730 DOI
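The acceptor relationship on this slide can be written as a small function. The parameter values in this sketch are hypothetical: the slide implies separate fits for N/ether O and carbonyl O acceptors but does not give the fitted values. The fit shown is linear least squares on ln ΔlogP, which is one convenient choice and not necessarily how the published parameters were obtained. Because Vmin is negative for acceptors, a positive k makes the predicted contribution grow as Vmin becomes more negative.

```python
import math


def acceptor_dlogp(vmin, dlogp0, k):
    """Predicted acceptor contribution: DlogP = DlogP0 * exp(-k * Vmin).

    vmin is in Hartree/electron (negative for hydrogen bond acceptors).
    """
    return dlogp0 * math.exp(-k * vmin)


def fit_dlogp0_k(vmins, dlogps):
    """Fit ln(DlogP) = ln(DlogP0) - k * Vmin by linear least squares (DlogP > 0)."""
    n = len(vmins)
    logs = [math.log(d) for d in dlogps]
    mx, my = sum(vmins) / n, sum(logs) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(vmins, logs))
             / sum((x - mx) ** 2 for x in vmins))
    return math.exp(my - slope * mx), -slope


# Hypothetical parameters, used only to show the shape of the relationship.
for vmin in (-0.05, -0.07, -0.09):
    print(vmin, round(acceptor_dlogp(vmin, dlogp0=0.1, k=30.0), 2))
```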

Page 28

Basis for ClogPalk model

[Plot: logPalk vs MSA/Å²]

Kenny, Montanari, Prokopczyk et al (2013) JCAMD 27:389-402 DOI

Page 29

ClogPalk from perturbation of saturated hydrocarbon

$$\mathrm{ClogP_{alk}} = \log P_0 + s \times \mathrm{MSA} - \sum_i \Delta\log P_{\mathrm{FG},i} - \sum_j \Delta\log P_{\mathrm{Int},j}$$

• logP0 + s × MSA: logPalk predicted for saturated hydrocarbon

• Σi ΔlogPFG,i: perturbation by functional groups

• Σj ΔlogPInt,j: perturbation by interactions between functional groups

Kenny, Montanari, Prokopczyk et al (2013) JCAMD 27:389-402 DOI
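The additive structure of the ClogPalk equation can be expressed directly: a saturated-hydrocarbon baseline that depends on MSA, minus functional-group perturbations and interaction perturbations. All numerical values in this sketch are hypothetical placeholders. The fitted coefficients and the functional-group and interaction terms from the paper cited on the slide are not reproduced here.

```python
def clogp_alk(msa, fg_terms, interaction_terms, logp0, s):
    """ClogPalk = logP0 + s*MSA - sum(FG perturbations) - sum(interaction perturbations).

    msa is in Å^2 (as on the slide's axis); fg_terms and interaction_terms are
    lists of ΔlogP contributions. All parameter values passed in are assumptions.
    """
    hydrocarbon = logp0 + s * msa  # logPalk predicted for a saturated hydrocarbon
    return hydrocarbon - sum(fg_terms) - sum(interaction_terms)


# Hypothetical example: two functional groups and one interaction between them.
# A negative interaction term offsets part of the functional-group penalty.
print(clogp_alk(msa=300.0, fg_terms=[2.0, 1.5], interaction_terms=[-0.5], logp0=-1.0, s=0.02))
```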

Page 30

Performance of ClogPalk model

[Plot of logPalk and ClogPalk against their mean, (logPalk + ClogPalk)/2; labelled compounds: Hydrocortisone, Cortisone, Atropine, Propranolol, Papaverine]

Kenny, Montanari, Prokopczyk et al (2013) JCAMD 27:389-402 DOI

Page 31

Another way to look at SAR?

Page 32

(Descriptor-based) QSAR/QSPR: Some questions

• How valid is the methodology (especially for validation) when the distribution of compounds in training/test space is highly non-uniform?

• Are models predicting activity or locating neighbours?

• To what extent are ‘global’ models just ensembles of local models?

• How well do the methods handle ‘activity cliffs’?

• How should we account for sizes of descriptor pools when comparing model performance?

Page 33

Measures of Diversity & Coverage

A 2-dimensional representation of chemical space is used here to illustrate the concepts of diversity and coverage. Stars indicate compounds selected to sample this region of chemical space. In this representation, similar compounds are close together.

Page 34

Neighborhoods and library design

Page 35

Examples of relationships between structures

Tanimoto coefficient (foyfi) for the structures is 0.90

Ester is methylated acid; amides are ‘reversed’
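For binary fingerprints, the Tanimoto coefficient is the number of bits set in both structures divided by the number set in either. The sketch below represents fingerprints as sets of "on" bit positions. The bit positions are hypothetical, and the sketch does not reproduce the foyfi fingerprints used on the slide.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints: 9 of the 10 bits set in either structure are shared.
acid_bits = set(range(1, 11))
ester_bits = set(range(1, 10))
print(tanimoto(acid_bits, ester_bits))  # 0.9
```

A high coefficient like this says the structures share most fingerprint features. It says nothing about how far their properties differ, which is why the specific relationship (methylation, amide reversal) matters.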

Page 36

Leatherface molecular editor: From chain saw to Matched Molecular Pairs

c-[A;!R]  bnd 1 2

c-Br  cul 2

hyd 1 1

[nX2]1c([OH])cccc1  hyd 1 1  hyd 3 -1

bnd 2 3 2

Kenny & Sadowski (2005) Structure modification in chemical databases. In: Chemoinformatics in Drug Discovery, Methods and Principles in Medicinal Chemistry 23:271-285 DOI

Page 37

Effect of bioisosteric replacement on plasma protein binding

Date of analysis 2003: N = 7; ΔlogFu = -0.64 (SE 0.09; SD 0.23); %increase = 0

Date of analysis 2008: N = 12; ΔlogFu = -0.60 (SE 0.06; SD 0.20); %increase = 0

Mining the PPB database for carboxylate/tetrazole pairs suggested that bioisosteric replacement would lead to a decrease in Fu, so tetrazoles were not synthesised.

Birch et al (2009) BMCL 19:850-853 DOI
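The statistics in this table can be computed from a list of per-pair differences. The SE column is consistent with SE = SD/√N (e.g. 0.23/√7 ≈ 0.09 and 0.20/√12 ≈ 0.06). This sketch assumes, as the values suggest, that %increase is the percentage of pairs in which the property increased. The ΔlogFu values and their sign convention (tetrazole minus carboxylate) are hypothetical.

```python
import math
import statistics


def matched_pair_summary(deltas):
    """Summarise property differences (e.g. ΔlogFu) over matched molecular pairs.

    Returns N, mean difference, standard error, standard deviation and the
    percentage of pairs in which the property increased.
    """
    n = len(deltas)
    sd = statistics.stdev(deltas)
    return {
        "N": n,
        "mean": statistics.mean(deltas),
        "SE": sd / math.sqrt(n),
        "SD": sd,
        "%increase": 100.0 * sum(d > 0 for d in deltas) / n,
    }


# Hypothetical ΔlogFu values (tetrazole minus carboxylate) for four pairs.
print(matched_pair_summary([-0.5, -0.7, -0.6, -0.8]))
```

Reporting SD alongside SE matters here: SE shrinks as more pairs are found, while SD describes how consistently the replacement behaves from pair to pair.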

Page 38

Bioisosterism: Carboxylate & tetrazole

[Figure: structures annotated with values -0.316, -0.315, -0.296, -0.295, -0.262, -0.261, -0.268, -0.268]

Kenny (2009) JCIM 49:1234-1244 DOI

Page 39

Effect of amide N-methylation on aqueous solubility is dependent on substructural context

Acyclic (aliphatic amine): N = 109; ΔlogS = 0.59 (SE 0.07; SD 0.71); %increase = 76

Cyclic: N = 9; ΔlogS = 0.18 (SE 0.15; SD 0.47); %increase = 44

Benzanilides: N = 9; ΔlogS = 1.49 (SE 0.25; SD 0.76); %increase = 100

Birch et al (2009) BMCL 19:850-853 DOI

Page 40

Relationships between structures

• Discover new bioisosteres & scaffolds

• Prediction of activity & properties

– Direct prediction (e.g. look up substituent effects)

– Indirect prediction (e.g. apply correction to existing model)

• Recognise extreme data

– Bad measurement or interesting effect?

Page 41

MUDO Molecule Editor

• SMIRKS-based re-write of Leatherface using OEChem

• Can process 3D structures (e.g. form covalent bond between protein and ligand)

• Identification of matched molecular pairs is much easier than with Leatherface

• Published with source code in supplemental information

Kenny, Montanari, Prokopczyk, Sala, Rodrigues Sartori (2013) Automated molecule editing in molecular design. JCAMD 27 DOI

Page 42

More stuff to think about

• There is life beyond octanol/water (and atom-centered charges) if we choose to look for it

• Even molecules can have meaningful relationships

Page 43

Hydrogen bonding of esters

[Figure: structures annotated with values -0.054, -0.086, -0.091, -0.072, -0.104, -0.093]

Toulmin et al (2008) J Med Chem 51:3720-3730 DOI

Page 44

Glycogen Phosphorylase inhibitors: Series comparison

ΔpIC50 = 0.38 (0.06); ΔlogFu = -0.30 (0.06); ΔlogS = -0.29 (0.13)

ΔpIC50 = 0.21 (0.06); ΔlogFu = 0.13 (0.04); ΔlogS = 0.20 (0.09)

ΔpIC50 = 0.29 (0.07); ΔlogFu = -0.42 (0.08); ΔlogS = -0.62 (0.13)

Standard errors in mean values in parentheses; see Birch et al (2009) BMCL 19:850-853 DOI