forming focused libraries and discovering active molecules with iterative stochastic elimination

Optimizing Drug Design, Leiden, 20-23 July 2009

Forming focused libraries and discovering active molecules with Iterative Stochastic Elimination

Amiram Goldblum, Anwar Rayan and David Marcus

Dept. of Medicinal Chemistry, School of Pharmacy, Ein Kerem Campus

http://www.md.huji.ac.il/models



Iterative Stochastic Elimination (ISE)

Our generic tool for optimizing highly complex combinatorial problems

Problem type: Systems with many variables, each variable having many discrete values, the variables interacting with each other, and each state of the system can be evaluated and given a score (transportation, communication, electronic devices, life sciences)

Method: ISE finds optimal system states (global and local minima/optima) by iteratively eliminating values of variables that contribute to worst results. Elimination is based on careful statistics of randomly picked states of the system

Why: ISE has been compared to Genetic Algorithms, Monte Carlo, Simulated annealing, Support Vector Machines and other optimization methods – on specific problems and found to do as well or better


Iterative Stochastic Elimination publications

1. Glick, M. & Goldblum, A. A novel energy-based stochastic method for positioning polar protons in protein structures from X-rays. Proteins-Structure Function and Genetics 38, 273-287 (2000).

2. Glick, M., Rayan, A. & Goldblum, A. A stochastic algorithm for global optimization and for best populations: A test case of side chains in proteins. Proceedings of the National Academy of Sciences of the United States of America 99, 703-708 (2002).

3. Noy, E., Gorelik, B., Rayan, A. & Goldblum, A. Stochastic path to form ensembles and to quantify flexibility in proteins. Abstracts of Papers of the American Chemical Society 225, U781-U781 (2003).

4. Rayan, A., Barasch, D., Brinker, G., Cycowitz, A., Geva-Dotan, I., Scaiewicz, A. & Goldblum, A. New stochastic algorithm to determine drug-likeness. Abstracts of Papers of the American Chemical Society 226, U297-U297 (2003).

5. Rayan, A., Scaiewicz, A., Geva-Dotan, I., Barasch, D. & Goldblum, A. Screening molecules for their drug-like index. Abstracts of Papers of the American Chemical Society 228, U358-U358 (2004).

6. Rayan, A., Senderowitz, H. & Goldblum, A. Exploring the conformational space of cyclic peptides by a stochastic search method. Journal of Molecular Graphics & Modelling 22, 319-333 (2004).

7. Rayan, A., Noy, E., Chema, D., Levitzki, A. & Goldblum, A. Stochastic algorithm for kinase homology model construction. Current Medicinal Chemistry 11, 675-692 (2004).

8. Rayan, A., Scaiewitz, A., Geva-Dotan, I., Marcus D., Barasch, D. & Goldblum, A (2007). Determining the Drug Like character of molecules and prioritizing them by a drug like index, ACS presentations 2005-8.

9. Noy, E., Tabakman, T. & Goldblum, A. Constructing ensembles of flexible fragments by ISE is relevant to protein-protein interfaces. Proteins 68, 702-711 (2007).

10. Gorelik, B. & Goldblum, A. High quality binding modes in docking ligands to proteins. Proteins 71, 1373-1386 (2008).


General Model System

[Figure: a system of interacting variables A, B, C, D, E, each with several discrete values (e.g. A1, A2, ..., A7; B4, ..., B8; C4, ..., C7)]

• Variables
• Values
• Interactions

The number of combinations: 7(A) x 8(B) x 7(C) x n(D) x m(E) ... = a very large number

• An exhaustive calculation is not possible


(1) Randomly pick one value for each of the variables

[Figure: variables A to E with one value picked for each, e.g. A7, B4, C5]

This determines a single “conformation” or “configuration” of the system

(2) Employ the “cost function” to score the current configuration


(3) Repeat steps (1) and (2) for n conformations (n ~ 10^3 to 10^6), and calculate the total value of each

[Figure: samples 2 ... n, each with its computed value (2nd value ... nth value)]


(4) Construct a histogram of the distribution of values for all sampled conformations

[Figure: histogram of the distribution (0% to 10%) over function value columns 1 to 30, with the low values region at one end and the high values region at the other]


[Figure: zoom on the high values region of the histogram (distribution vs. column number). Three conformations from that region (220, 314 and 715) are shown. Their value sets are A3 B4 C6 D7 E8 F1, A3 B8 C6 D2 E8 F2 and A3 B4 C6 D6 E2 F9; all three share A3 and C6]

(5) Examine the frequency of each variable value in worst results, compare to expected


(6) Evict values that contribute more than expected to the worst scores and less than expected to the best

[Figure: conformations 220, 314 and 715 after evicting A3 and C6]

The total number of combinations is reduced

(7) Repeat the process iteratively until all remaining combinations can be evaluated exhaustively and sorted. This yields a sorted population of results
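Steps (1) to (7) can be sketched in a few lines of Python. This is a minimal toy version under stated assumptions, not the authors' implementation: the function name `ise`, its parameters, the 10% tails, and the rule of evicting one value per iteration (the one most over-represented in the worst tail relative to the best tail) are all simplifications chosen for illustration. Lower cost is taken to be better.

```python
import random
from itertools import product


def ise(domains, cost, n_samples=2000, tail=0.1, max_exhaustive=500, seed=0):
    """Toy ISE loop (lower cost is better); returns surviving states sorted by cost."""
    rng = random.Random(seed)
    domains = {var: list(vals) for var, vals in domains.items()}

    def n_states():
        total = 1
        for vals in domains.values():
            total *= len(vals)
        return total

    while n_states() > max_exhaustive:
        # Steps 1-3: sample random configurations and score each one.
        samples = []
        for _ in range(n_samples):
            state = {var: rng.choice(vals) for var, vals in domains.items()}
            samples.append((cost(state), state))
        samples.sort(key=lambda pair: pair[0])
        k = max(1, int(tail * n_samples))
        best, worst = samples[:k], samples[-k:]

        # Steps 4-6: count how often each value appears in the worst and best
        # tails, then evict the value most over-represented among the worst.
        top = None
        for var, vals in domains.items():
            if len(vals) < 2:
                continue  # never remove the last value of a variable
            for val in vals:
                in_worst = sum(1 for _, s in worst if s[var] == val)
                in_best = sum(1 for _, s in best if s[var] == val)
                excess = in_worst - in_best
                if top is None or excess > top[0]:
                    top = (excess, var, val)
        domains[top[1]].remove(top[2])

    # Step 7: few enough combinations remain to enumerate and sort them all.
    names = list(domains)
    population = []
    for combo in product(*(domains[v] for v in names)):
        state = dict(zip(names, combo))
        population.append((cost(state), state))
    population.sort(key=lambda pair: pair[0])
    return population


# Hypothetical toy problem: six variables with eight values each (8**6 states).
toy_domains = {name: range(8) for name in "ABCDEF"}
population = ise(toy_domains, cost=lambda s: sum(s.values()))
print(population[0])
```

The function returns the whole sorted population rather than a single optimum, which mirrors the point of step (7): the final exhaustive pass ranks every surviving combination.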


Acetylcholinesterase inhibitors with ISE

Inhibition measured by Marta Rosin (Novartis’ Exelon), Hebrew University School of Pharmacy

ISE “engine” (target specificity):
>2 million molecules
→ Molecular chemical properties
→ ISE docking and scoring
→ 9 molecules, 5 measured, 3 active


Bcr-Abl dimerization inhibition by peptides (64 aa)

Synthesized and measured by Martin Ruthardt, Goethe Univ. Frankfurt

ISE “engine” (target specificity):
~10^80 sequences
→ Properties of amino acids
→ ISE protein design
→ 10 peptides, 6 active


Distinguishing between actives and inactives on a specific target

Classification – Drugs vs. non-drugs, selectives vs. non-selectives

Huge combinatorial problem with more than 10^100 options

Optimization problem: find differences in molecular properties to distinguish between actives and inactives


Learning from known data

“Actives”: Molecules with activity < 100 nM

“Selectives”: Molecules with selectivity > 3:1

“Inactives”: MDDR (randomly picked), or less active molecules

Properties (“descriptors”, our variables) are produced by computer programs (MOE): molecular weight, number of H-bond donors and acceptors, partial charges, topological descriptors, polar surface, van der Waals, molar refraction, etc.


Optimization of property ranges by ISE to distinguish between the two databases

Each property is separated into two “sub properties”, shown here for molecular weight (MW):

Lower range: 0 to 800, ~80 values at intervals of 10
Upper range: 500 to 1200, ~70 values

Overall there are 80*70 = 5.6*10^3 combinations for ranges of this variable

Randomly picked range: 100 to 700

[Figure: distribution of Mol Weight values, percentage (0 to 60) vs. MW (0 to 1500)]
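The arithmetic for the molecular weight ranges can be checked with a short sketch. It assumes both sub properties are sampled on a grid of 10 and that both end points are included; the slide only says “~80” and “~70” values, so the exact counts here are an assumption, as is the rule of redrawing a pair whose lower bound is not below its upper bound.

```python
import random

# Candidate bounds for the MW range (grid of 10, end points included: assumption).
lower_bounds = list(range(0, 801, 10))     # 0, 10, ..., 800
upper_bounds = list(range(500, 1201, 10))  # 500, 510, ..., 1200
n_pairs = len(lower_bounds) * len(upper_bounds)
print(len(lower_bounds), len(upper_bounds), n_pairs)

# Randomly pick one range, as a single ISE sample would do for this property.
rng = random.Random(0)
low, high = rng.choice(lower_bounds), rng.choice(upper_bounds)
while low >= high:  # assumed rule: redraw pairs that give an empty range
    low, high = rng.choice(lower_bounds), rng.choice(upper_bounds)
print(low, high)
```

With end points included the grid gives 81 and 71 values, so the product is slightly above the rounded 80*70 quoted on the slide.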


Using properties to optimize the difference between actives (selectives) and inactives

If we construct a RANGE for each property, we get A FILTER:

2 < HD ≤ 6
-2 < logP ≤ 3
150 < M.W ≤ 775

Then we test each of the molecules in the actives and each in the inactives

Determine if TP, TN, FP, FN (P, N, Pf, Nf)

Compute the fraction of each category in the full DB

Use the Matthews Correlation to score
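A filter of this kind can be sketched as a set of ranges applied to each molecule. The ranges below are the HD, logP and M.W ranges from the slide; treating the upper bounds as inclusive is an assumption, and the molecules, the function names `passes` and `categories`, and the dictionary layout are hypothetical.

```python
# Ranges from the slide; treating the upper bounds as inclusive is an assumption.
FILTER = {"HD": (2, 6), "logP": (-2, 3), "MW": (150, 775)}


def passes(molecule, ranges):
    """True if every property lies inside its range (lower < value <= upper)."""
    return all(low < molecule[name] <= high for name, (low, high) in ranges.items())


def categories(actives, inactives, ranges):
    """Percentages of P and Nf (within actives) and of N and Pf (within inactives)."""
    p = 100.0 * sum(passes(m, ranges) for m in actives) / len(actives)
    n = 100.0 * sum(not passes(m, ranges) for m in inactives) / len(inactives)
    return {"P": p, "Nf": 100.0 - p, "N": n, "Pf": 100.0 - n}


# Hypothetical molecules, invented for illustration only.
actives = [{"HD": 3, "logP": 1.2, "MW": 420}, {"HD": 1, "logP": 0.5, "MW": 310}]
inactives = [{"HD": 8, "logP": 4.1, "MW": 910}, {"HD": 4, "logP": 2.0, "MW": 500}]
counts = categories(actives, inactives, FILTER)
print(counts)
```

An active that passes the filter counts as P and one that fails as Nf; an inactive that fails counts as N and one that passes as Pf.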


Scoring by the Matthews Correlation

Each given range is for ACTIVES, and actives can only be P or Nf

Databases:   actives   inactives
             P         Pf
             Nf        N

$$\mathrm{MCC} = \frac{P N - P_f N_f}{\sqrt{(N + N_f)(N + P_f)(P + N_f)(P + P_f)}}$$

For a fully correct prediction MCC = 1

For a completely erroneous prediction MCC = -1

For a random prediction MCC ~ 0.00
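The Matthews correlation for one filter can be computed directly from the four categories. The sketch below uses the slide's symbols (P, N, Pf, Nf) as argument names; returning 0.0 when the denominator vanishes is a convention assumed here, not something stated in the slides. The example values, 82% of actives passed and 67% of inactives rejected, are taken from the first row of the results slide later in the deck.

```python
from math import sqrt


def mcc(p, n, pf, nf):
    """Matthews correlation from true/false positives and negatives."""
    denom = sqrt((n + nf) * (n + pf) * (p + nf) * (p + pf))
    if denom == 0:
        return 0.0  # assumed convention for a degenerate table
    return (p * n - pf * nf) / denom


# 82% of actives pass (P), so 18% are Nf; 67% of inactives fail (N), so 33% are Pf.
score = mcc(p=82, n=67, pf=33, nf=18)
print(round(score, 2))
```

The result is close to 0.5, in line with the MCC of about 0.49 listed for that row.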


Applying ISE to discriminate between actives and inactives by optimizing descriptor ranges

1. Construct filter i: randomly pick a value for each of the variables, i.e., low range MW, high range MW, etc.
2. Pass all actives and inactives of the training set through filter i (P, N, Pf, Nf)
3. Get the MCC value for filter i
4. Repeat until i = 10^6

Then: Histogram, Elimination, Iteration, Exhaustive, Test


Results of exhaustive step, before clustering

MCC    MW      ClogP    HDon   Hacc   %actives (P)   %inactives (N)
0.49   282 <   -6 <     0 <    2      82             67
0.49   292 <   -2.5 <   0 <    2      78             71
0.49   292 <   < 9.5    0 <    2      80             69
0.49   301 <   -6 <     0 <    2      77             72
0.48   282 <   -6 <     0 <    1      85             63

[One row is marked “Best filter” on the slide]


Employing the “best sets of filters” to construct a Molecular Bioactivity Index

$$\mathrm{MBI} = \frac{1}{n}\sum_{i=1}^{n}\left(\delta_{active}\frac{P}{P_f} - \delta_{inactive}\frac{N}{N_f}\right)$$

With good data, the range of MBI is large and we get a good “resolution”

We have shown that we can use MBI to “fish” a few active molecules out of a “sea” of inactive ones

http://www.md.huji.ac.il/models (look for “test MBI”)
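One way to read the index is as an average over the n best filters: a molecule gains P/Pf for each filter it passes and loses N/Nf for each filter it fails. The sketch below implements that reading, which is an assumption about the slide's formula; the filter dictionaries, their percentages, and the function names are hypothetical. A filter with Pf = 0 or Nf = 0 would need special handling that is not shown.

```python
def passes(molecule, ranges):
    """True if every property lies inside its range (lower < value <= upper)."""
    return all(low < molecule[name] <= high for name, (low, high) in ranges.items())


def mbi(molecule, filters):
    """Average of +P/Pf over filters the molecule passes and -N/Nf over those it fails."""
    total = 0.0
    for f in filters:
        if passes(molecule, f["ranges"]):
            total += f["P"] / f["Pf"]
        else:
            total -= f["N"] / f["Nf"]
    return total / len(filters)


# Two hypothetical filters with invented training-set percentages.
filters = [
    {"ranges": {"MW": (150, 775)}, "P": 80, "Nf": 20, "N": 80, "Pf": 20},
    {"ranges": {"logP": (-2, 3)}, "P": 75, "Nf": 25, "N": 50, "Pf": 50},
]
inside = mbi({"MW": 400, "logP": 1.0}, filters)    # passes both filters
outside = mbi({"MW": 900, "logP": 5.0}, filters)   # fails both filters
mixed = mbi({"MW": 400, "logP": 5.0}, filters)     # passes the first only
print(inside, outside, mixed)
```

Filters that separate the two databases well have large P/Pf and N/Nf ratios, so they widen the spread between molecules that pass and molecules that fail.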



Employing the “best sets of filters” to construct a Drug Likeness Index (DLI)

$$\mathrm{DLI} = \frac{1}{n}\sum_{i=1}^{n}\left(\delta_{active}\frac{P}{P_f} - \delta_{inactive}\frac{N}{N_f}\right)$$

Drug Likeness is different from Lipinski’s Rule of Five (ROF)!


MBI and DLI can make a difference in:

• High Throughput Screening
• Combinatorial Synthesis
• Hit to lead development
• Lead optimization
• Construction of focused libraries
• Molecular scaffold optimization
• Selectivity optimization


Timeline for discovery, single processor

One target (enzyme, cells, organs…)

1. Model building: 2-3 days
2. ZINC scan: a few hours
3. Diversity, similarity, eliminate known actives: a few hours
4. SciFinder manual search: 4-5 days
5. Purchase/synthesize molecules
6. In vitro tests: 1-2 months


Input: VEGFR-2 (KDR) active inhibitors, < 100 nM

549 actives divided randomly into 412 training and 137 test set molecules

Inactives are from MDDR


Output: example of a filter with 6 descriptors

One of the best (high MCC); there are others with higher MCC but many descriptors

Number of descriptors: 6
MCC of test set: 0.79
TP: 98.9
TN: 78.6

Bcut_SMR_3   0.0 – 3.06
SMR_VSA4     0.1 – 100.6
Vsa_pol      0.1 – 102.4
Reactive     0.0 – 0.999
balabanJ     0.0 – 1.902
Q_RPC-       0.0 – 0.267


A 6-property filter

Bcut_SMR_3   Molar refraction
SMR_VSA4     VdW surface area
Vsa_pol      Approx. VdW polar surface
Reactive     Reactive fragments
balabanJ     Topological variable
Q_RPC-       Relative negative partial charge


MBI MODEL for VEGFR

[Figure: enrichment in the training set of VEGFR2, plotted against the MBI threshold (-18 to 22). Green: % true positives above threshold. Red: % true negatives below threshold. Blue: enrichment factor (second axis, 0 to 500)]


Initial focused library from ZINC (2.1 million)

ZINC library screening gave 7826 molecules with top MBI


Similarity of focused library from ZINC against known VEGFR active compounds

Similarity of highest MBI to training set (number of molecules per Tanimoto index bin):

Tanimoto index   Number of molecules
0.025            0
0.075            37
0.125            858
0.175            2678
0.225            2655
0.275            1071
0.325            344
0.375            112
0.425            54
0.475            10
0.525            2
0.575            4
0.625            1
0.675 to 0.975   0


BBB results

[Figure: BBB Index (-1 to 1) for molecules with negative BBB pass and positive BBB pass]


ER-MBI “moving ensemble” (normalized MBI values)

[Figure: logRBA vs. ER-MBI, with points grouped as High, Moderate and Low]


ER-MBI: combined high/low MBI

[Figure: logRBA (-5 to 3) vs. ER-MBI (-1 to 1), with points grouped as Low, Moderate and High; R² = 0.75]


Molecular bioactivity index


Molecular Bioactivity Index (MBI): Fishing actives from a “bath” of “non-actives”

Mix 10 in 100,000: find 9 in best 100, 5 in best 10

Enrichment of 900 in the best 100; enrichment of 5000 in the best 10
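The two enrichment values follow from the usual definition: the fraction of actives in the top-ranked subset divided by the fraction of actives in the whole mixture. A short check, with a hypothetical function name and argument names:

```python
def enrichment_factor(hits_in_top, top_size, total_actives, library_size):
    """(hits_in_top / top_size) divided by (total_actives / library_size)."""
    return (hits_in_top * library_size) / (top_size * total_actives)


# 10 actives mixed into 100,000 molecules, ranked by MBI.
ef_best_100 = enrichment_factor(hits_in_top=9, top_size=100,
                                total_actives=10, library_size=100_000)
ef_best_10 = enrichment_factor(hits_in_top=5, top_size=10,
                               total_actives=10, library_size=100_000)
print(ef_best_100, ef_best_10)
```

Finding 9 of the 10 actives in the best 100 gives 900, and finding 5 in the best 10 gives 5000.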


Polypharmacology – with our indexing method

• We use several MBI (or MBI and DLI) to map activity into multiple targets. This may be used to extract potential new poly-active compounds or selective compounds, depending on the behavior of the relevant disease

[Figure: molecules plotted by MBI target1 vs. MBI target2, divided into four regions: multitarget, target1 selective, target2 selective and non-actives]


Docking & Scoring

Requirement: 3D structure of the target (X-ray, NMR, homology model)

Do the molecules bind?

How strong is the binding affinity? (Score)

What does the complex look like? (Binding mode)


ISE-dock

• A new docking program from our lab that uses the ISE algorithm to produce large sets of optimal results for docking ligands to their targets


ISE-dock

• Better than AutoDock, the most cited docking program
• Much better in the main docking criteria than two other popular programs, Glide and GOLD
• Produces large near-optimal docking populations to study the nature of binding and to predict alternative binding modes
• Accounts for ligand and protein flexibility
• Correlation between ISE-dock populations and experimental multiple binding modes


Anti Alzheimer current main drug strategy


MBI MODEL for AChE inhibition

[Figure: % true and false positives plotted against the MBI threshold (-10 to 40). Green: % true positives above threshold. Red: % false positives above threshold. Blue: enrichment factor (second axis, 0 to 2000)]

Based on ~450 active molecules with IC50 < 10 micromolar

~8000 randomly picked molecules from ZINC assumed to be inactives


Docking with ISE-dock/AutoDock

We used the crystal structure of mouse AChE (1q84) for docking.

Compounds in protonated state were docked to AChE by AutoDock 3.0 and ISE-dock.

751 out of 755 compounds were docked in the active site by both methods


ISE-dock results

[Figure: 10 different conformations of one ligand in AChE. Each color represents a different pose]

[Fig. 2: AChE with ACh. The red color represents the negatively charged gorge due to many side chain aromatic rings]


10 compounds selected from the docking results (financial limitation)

The 10 compounds were picked by direct examination of each molecule in the active site, paying close attention to its conformation, H-bonds and other interactions.


Experimental Results

9 out of the 10 compounds were purchased

8 out of the 9 compounds reached our lab in sufficient quantity

5 out of the 8 compounds are soluble

3 out of the 5 compounds are active (IC50 = 3.25, 3.5, 3.75 µM)

Similarity to known active compounds is less than 0.35

The molecules are novel AChE inhibitors (not a single paper on any of them)


Conclusions

ISE is useful for solving extremely complex optimization problems

Provides large sets of graded results

Achieves high enrichments of “actives” vs. “inactives” by MBI, DLI, MSI etc.

Useful for developing multi-targeted drugs

Discovers new binders for known drug targets

Produces diverse sets of solutions


Molecular Modeling Group Partners

http://www.md.huji.ac.il/models

http://www.cancergrid.eu

Prof. Andrej Bohac, Comenius U., Bratislava: VEGFR2 (Angiokem)
DAC company, Milan: HDAC and HSP90 inhibition
Prof. Mart Saarma, U. Helsinki: RET kinase inhibition
Prof. Martin Ruthardt, U. Frankfurt: Bcr-Abl inhibition by peptides
Prof. Yousef Najajreh, Al Quds University: Bcr-Abl inhibitor synthesis
Prof. Yossi Schlessinger, Yale: FGFR inhibitors
Prof. David Varon, Hadassah, Jerusalem: ADAMTS-13 inhibition
Prof. Angelo Carotti, School of Pharmacy, Univ. of Bari: MMP inhibitors
Prof. Marta Rosin, HUJI: AChE inhibitors




Molecular Modeling Group, HUJI

http://www.md.huji.ac.il/models



