Upload: myrtle-wilkinson

Post on 18-Dec-2015

Chemical Diversity

Qualify and/or quantify the extent of variety within a set of compounds. Try to define the extent of chemical space.

In combinatorial chemistry, we are interested in the diversity of a library.

Example 1: Here we are looking at compounds that can possess up to 2 functional groups. How do we define libraries that have different numbers of these cells occupied? How do we quantify those that have duplicates within cells?

Chemical Diversity Based on Properties

Example 2: We can try to define diversity based on properties of the compounds. For example, we could take the naturally occurring amino acids and span the space defined by their pI. This gives a poor spread, so try pI and MW. We could go to higher dimensions by also looking at the number of H-bonds they make, the number of OH groups, their dipole moment, etc.

Why is Diversity Important?

• Similar Property Principle
  – Structurally similar compounds will exhibit similar physicochemical and biological properties
  – Test only representative compounds, eliminate redundancies
• For lead discovery, want a diverse space to locate all possible hits (actives) – called a diverse library
• For refining a lead into a drug (lead optimization), want to survey a range of similar compounds – called a focused library
• Diversity hypothesis
  – Diverse reactants will lead to diverse products
  – Potentially useful for library design
• Quantify whether a library can be supplemented by additions of other compounds, other libraries

Beno, Drug Discovery Today, 2001, 6, 251
Brown, JCICS, 1996, 36, 572
Gillet, JCICS, 1997, 37, 731

Types of Diversity

A library with members that sample chemical space evenly – an ideal situation for lead discovery

A library that covers the same chemical space but the compounds cluster and leave large holes.

A library with even sampling of space, but only with limited diversity – useful for modification of a lead.

From Rose, Drug Discovery Today, 2002, 7, 133.

Quantifying Diversity

• Need to define how similar (or dissimilar) two compounds are to each other
  – Similarity indices
• Then need to determine the spread of the compounds throughout space
  – Distance-based
  – Cell-based partitioning
  – Clustering

Agrafiotis, Mol. Diversity, 1999, 4, 1

Defining Similarity

• Descriptors
  – Property-based
  – Structure-based
    • 2D
    • 3D
  – Pharmacophore
• Structural keys
• Fingerprints
• Similarity/Distance Coefficients

Beno, Drug Discovery Today, 2001, 6, 251
Willett, Curr. Opin. Biotechnology, 2000, 11, 85
Willett, JCICS, 1998, 38, 983
Daylight, http://www.daylight.com/dayhtml/doc/theory/theory.finger.html

Structural Keys

• Boolean array expressing whether a pattern is present (TRUE) or not (FALSE) within a molecule
• This array is usually represented as a string of 1s (TRUE) or 0s (FALSE) – a bitmap
• So create a list of structural features and then set the corresponding bit to 1 if the feature is present

Martin, J. Med. Chem., 1995, 38, 1431
Flower, JCICS, 1998, 38, 379
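The bitmap construction described on this slide can be sketched in a few lines of Python. This is only an illustration: the key list and the way a molecule is described (a set of feature names) are assumptions standing in for a real key set and real substructure matching.

```python
# Hypothetical key list; real structural-key sets are far longer.
STRUCTURAL_KEYS = ["carbonyl", "hydroxyl", "amine", "aromatic ring", "halogen"]


def structural_key_bitmap(features_present, keys=STRUCTURAL_KEYS):
    """Return a string of 1s and 0s, one position per structural key."""
    return "".join("1" if key in features_present else "0" for key in keys)


# Toy molecule described only by the features it contains (an assumption;
# a real system would test each key against the structure itself).
phenol = {"hydroxyl", "aromatic ring"}
print(structural_key_bitmap(phenol))  # 01010
```

Each position in the bitmap has a fixed meaning, which is exactly what distinguishes structural keys from hashed fingerprints.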

Fingerprints

• Problems with structural keys
  – Lack of generality
    • Choice of structural keys is arbitrary and may not be appropriate for the search or question at hand
    • List of structural keys can be very long and unwieldy to generate and test
• Solution – Fingerprint
• Also a bitmap but NO assigned meaning to any particular bit!
  – Your fingerprint is characteristic of you, but there is no meaning to any particular fragment of it
• Generate patterns from the molecule itself, such as a pattern for
  – Each atom
  – Each atom with nearest neighbors
  – Each group of atoms and bonds connected by paths up to 2 bonds long
  – Continuing with paths up to 3, 4, 5, 6, and 7 bonds long (seven seems to be the longest typically employed)
• This list of patterns is exhaustive, meaning all are generated for every molecule

Fingerprints. II.

• Since the number of patterns is huge, it is not possible to assign a particular bit to each pattern
• Instead, each pattern is the input to a hash function that creates a small set of bits (typically 4-5 bits). These bits are then added (with logical OR) to the fingerprint.
• Note that bit sets for different patterns may have some bits in common
• This overlap is not a problem, since every bit generated by a pattern (substructure) present in the molecule will still be set in the molecule’s fingerprint
• Each pattern (substructure) generates its particular set of bits, and it is unlikely that another pattern will set those exact same bits. So a search for that substructure simply means looking to see if those bits have been set.
• Fingerprint advantages
  – No predefined set of patterns (structural keys)
  – Structural keys are usually quite sparse, fingerprints are much more dense
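A minimal sketch of the hashing scheme described on this slide, under stated assumptions: patterns are plain strings (a real system would enumerate atom and bond paths from the structure), the fingerprint is 64 bits long, and SHA-256 stands in for whatever hash function a real fingerprinting package uses. None of these choices come from the slides.

```python
import hashlib

N_BITS = 64           # assumed fingerprint length
BITS_PER_PATTERN = 4  # the slide cites 4-5 bits per pattern


def pattern_bits(pattern, n_bits=N_BITS, k=BITS_PER_PATTERN):
    """Hash a pattern string to a set of up to k bit positions."""
    digest = hashlib.sha256(pattern.encode("utf-8")).digest()
    # Take k two-byte slices of the digest as k pseudo-random positions.
    return {int.from_bytes(digest[2 * i:2 * i + 2], "big") % n_bits
            for i in range(k)}


def fingerprint(patterns, n_bits=N_BITS):
    """OR the bit set of every pattern into one integer bitmap."""
    fp = 0
    for pattern in patterns:
        for bit in pattern_bits(pattern, n_bits):
            fp |= 1 << bit
    return fp


def may_contain(fp, pattern, n_bits=N_BITS):
    """False rules the pattern out; True means it may be present."""
    return all((fp >> bit) & 1 for bit in pattern_bits(pattern, n_bits))


# Hypothetical path patterns for a small molecule.
fp = fingerprint(["C", "C-C", "C-C-O"])
print(may_contain(fp, "C-C"))  # True
```

Because different patterns can share bits, a `True` result is only a screen: a pattern that was never added can occasionally pass, whereas a `False` result is definitive.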

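Similarity Coefficients

Let $x_{jA}$ be the value of descriptor (or bit) $j$ for compound A. For bit strings:

$a = \sum_j x_{jA}$, the number of on bits in A
$b = \sum_j x_{jB}$, the number of on bits in B
$c = \sum_j x_{jA} x_{jB}$, the number of on bits in both A and B

D(A,B) is the similarity (or distance) of A and B using bits.
S(A,B) is the similarity (or distance) of A and B using continuous variables.

• Euclidean Distance

$D(A,B) = [a + b - 2c]^{1/2}$, range 0 to $\sqrt{n}$ for $n$-bit strings
$S(A,B) = \left[ \sum_j (x_{jA} - x_{jB})^2 \right]^{1/2}$, range 0 to infinity

• Tanimoto Coefficient

$D(A,B) = \dfrac{c}{a + b - c}$, range 0 to 1
$S(A,B) = \dfrac{\sum_j x_{jA} x_{jB}}{\sum_j x_{jA}^2 + \sum_j x_{jB}^2 - \sum_j x_{jA} x_{jB}}$, range -0.333 to 1

• Cosine Coefficient

$D(A,B) = \dfrac{c}{[ab]^{1/2}}$, range 0 to 1
$S(A,B) = \dfrac{\sum_j x_{jA} x_{jB}}{\left[ \sum_j x_{jA}^2 \sum_j x_{jB}^2 \right]^{1/2}}$, range -1 to 1

Willett, JCICS, 1998, 38, 983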

Example:

Bitmap for
2,2-dimethylbutane   1111011000000   a = 6
Ethylcyclobutane     1111110011100   b = 9
On bits in common: c = 5

Euclidean distance = $(6 + 9 - 10)^{1/2} = 2.24$
Tanimoto coefficient = $5/(6 + 9 - 5) = 0.5$
Cosine coefficient = $5/(6 \times 9)^{1/2} = 0.68$
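The worked example can be checked with a short Python sketch of the three bit-string coefficients. The function names are illustrative only; the two bitmaps are the ones given in the example.

```python
from math import sqrt


def bit_counts(bits_a, bits_b):
    """Return (a, b, c): on bits in A, on bits in B, on bits in both."""
    a = bits_a.count("1")
    b = bits_b.count("1")
    c = sum(x == "1" and y == "1" for x, y in zip(bits_a, bits_b))
    return a, b, c


def euclidean(a, b, c):
    return sqrt(a + b - 2 * c)


def tanimoto(a, b, c):
    return c / (a + b - c)


def cosine(a, b, c):
    return c / sqrt(a * b)


dimethylbutane = "1111011000000"
ethylcyclobutane = "1111110011100"
a, b, c = bit_counts(dimethylbutane, ethylcyclobutane)
print(a, b, c)                        # 6 9 5
print(round(euclidean(a, b, c), 2))   # 2.24
print(round(tanimoto(a, b, c), 2))    # 0.5
print(round(cosine(a, b, c), 2))      # 0.68
```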

Problems with Tanimoto and related similarity indices

Flower, JCICS, 1998, 38, 379

Quantifying Diversity

Rules for a diversity function

• Adding redundant molecules does not change the value of the diversity
• Adding non-redundant molecules always increases the value of the diversity
• Space-filling behavior should be preferred
• Perfect filling of space gives a finite value of the diversity
• As dissimilarity of a pair of compounds increases, the diversity should increase asymptotically

Waldman, J. Mol. Graph. Model., 2000, 18, 412

Diversity definition 1

$$D(A) = 1 - \frac{\sum_{J=1}^{N(A)} \sum_{K=1}^{N(A)} \mathrm{SIM}(J,K)}{N(A)^2}$$

where N(A) is the number of compounds in the set A and SIM(J,K) is some similarity measurement between compounds J and K.

This can be used to build a compound selection procedure for creating the sublibrary with maximal diversity:

a) Find the similarities of all compounds in the library
b) Select the compound that is most dissimilar from all others
c) Select a 2nd compound that is most dissimilar from the first
d) Select a 3rd compound that is most dissimilar from the first 2
e) Continue until you have selected as many compounds as you desire
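A minimal Python sketch of diversity definition 1 and of the selection steps, under stated assumptions. The similarity matrix is made-up toy data. The slide does not say how "most dissimilar from the first 2" is scored, so the sketch assumes a MaxMin-style rule: pick the candidate whose highest similarity to any already selected compound is lowest.

```python
def diversity(sim):
    """D(A) = 1 - (sum of SIM(J,K) over all J, K) / N(A)^2."""
    n = len(sim)
    return 1.0 - sum(sum(row) for row in sim) / n ** 2


def select_diverse(sim, n_select):
    """Greedy dissimilarity-based selection; returns compound indices."""
    n = len(sim)
    # Step (b): seed with the compound least similar to all others.
    first = min(range(n), key=lambda j: sum(sim[j]) - sim[j][j])
    chosen = [first]
    # Steps (c)-(e): add the candidate least similar to those chosen
    # (assumed criterion: lowest maximum similarity to the chosen set).
    while len(chosen) < min(n_select, n):
        remaining = [j for j in range(n) if j not in chosen]
        chosen.append(min(remaining,
                          key=lambda j: max(sim[j][k] for k in chosen)))
    return chosen


# Hypothetical pairwise similarities for four compounds.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
]
print(select_diverse(sim, 3))  # [3, 0, 2]
```

Compounds 0 and 1 are near-duplicates in this toy matrix, so only one of them is selected.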

Cell-based Partitioning

• Divide each dimension into a number of parts
• These divisions are called cells or bins
• Place each compound into the appropriate bin based on the values of its properties and/or descriptors
• Can now create a sublibrary by choosing one compound from each bin, usually the one nearest the center of the bin
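The binning procedure above can be sketched as follows. This is an illustration under assumptions: every dimension is split into the same number of equal-width bins, and the compound names and descriptor values are invented.

```python
from math import dist, floor


def cell_index(point, lows, highs, n_bins):
    """Return the tuple of bin indices (one per dimension) for a point."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        i = floor((x - lo) / (hi - lo) * n_bins)
        idx.append(min(max(i, 0), n_bins - 1))  # x == hi goes in the last bin
    return tuple(idx)


def cell_center(idx, lows, highs, n_bins):
    """Return the coordinates of the center of a cell."""
    return tuple(lo + (i + 0.5) * (hi - lo) / n_bins
                 for i, lo, hi in zip(idx, lows, highs))


def select_per_cell(points, lows, highs, n_bins):
    """Pick, for each occupied cell, the compound nearest its center."""
    best = {}
    for name, point in points.items():
        idx = cell_index(point, lows, highs, n_bins)
        d = dist(point, cell_center(idx, lows, highs, n_bins))
        if idx not in best or d < best[idx][0]:
            best[idx] = (d, name)
    return {idx: name for idx, (d, name) in best.items()}


# Hypothetical compounds with two descriptors scaled to the range 0-1.
compounds = {"c1": (0.1, 0.1), "c2": (0.2, 0.3),
             "c3": (0.9, 0.8), "c4": (0.6, 0.2)}
picked = select_per_cell(compounds, lows=(0.0, 0.0), highs=(1.0, 1.0), n_bins=2)
print(picked)  # {(0, 0): 'c2', (1, 1): 'c3', (1, 0): 'c4'}
```

Cell (0, 1) is empty in this toy set, which is the kind of hole in coverage that a cell-based view makes visible.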

Schematic representation of different sampling of diversity space

(a) Maximize Euclidean distance to create maximum diversity

(b) cell-based selection, choosing compound nearest center of each cell

From Rose, Drug Discovery Today, 2002, 7, 133

Diversity definitions 2 and 3

$$D_{c2}(A) = -\sum_{i=1}^{N_{cells}} \left( N_i - \frac{N_T}{N_{cells}} \right)^2$$

where $N_i$ is the number of compounds in cell $i$, $N_T$ is the total number of compounds, and $N_{cells}$ is the number of cells.

Suppose 10 molecules are divided into 2 cells.
Distribution 1: (5,5), $D_{c2} = 0$
Distribution 2: (7,3), $D_{c2} = -8$
So the more even distribution is scored as being more diverse.

But this may actually go too far: $D_{c2}(2,2,2) > D_{c2}(4,1,1) = D_{c2}(3,3,0)$

This makes the last two equivalent, but (4,1,1) appears intuitively to be more diverse.

$$D_{entropy} = -\sum_{i=1}^{N_{cells}} \frac{N_i}{N_T} \ln \frac{N_i}{N_T}$$

This entropy-like definition ranks the three sets $D_{entropy}(2,2,2) > D_{entropy}(4,1,1) > D_{entropy}(3,3,0)$

Waldman, J. Mol. Graph. Model., 2000, 18, 412
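The two cell-based scores can be checked numerically with a short sketch. The function names are illustrative, and empty cells are assumed to contribute zero to the entropy sum (the usual convention for 0 ln 0).

```python
from math import log


def d_c2(counts):
    """Negative sum of squared deviations from the mean cell occupancy."""
    mean = sum(counts) / len(counts)
    return sum(-(n - mean) ** 2 for n in counts)


def d_entropy(counts):
    """Entropy-like score; empty cells are skipped (assumed 0 ln 0 = 0)."""
    total = sum(counts)
    return -sum((n / total) * log(n / total) for n in counts if n > 0)


print(d_c2([7, 3]))                          # -8.0
print(d_c2([4, 1, 1]) == d_c2([3, 3, 0]))    # True
print(d_entropy([2, 2, 2]) > d_entropy([4, 1, 1]) > d_entropy([3, 3, 0]))  # True
```

The entropy score separates (4,1,1) from (3,3,0) because it rewards occupying a third cell at all, which the squared-deviation score does not.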

Clustering. I.

• Hierarchical clusters
  – Small clusters within larger clusters
  – Typically some relationship between clusters
  – Two procedures
    • Agglomerative: start with singletons and move upwards
      (a) Calculate the similarities of all pairs
      (b) Merge the two most similar into a cluster
      (c) Continue until only one cluster remains
    • Divisive: start with one cluster and break it into smaller clusters
      (a) Calculate the dissimilarities of all pairs
      (b) Take the pair of most dissimilar structures and assign all other structures to the least dissimilar of these initial cluster centers
      (c) Recursively select the cluster with the largest diameter and partition it into two such that the largest resulting cluster has the smallest diameter
      (d) Repeat step (c) a maximum of n-1 times

Brown, JCICS, 1996, 36, 572
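A minimal sketch of the agglomerative procedure, under stated assumptions. The slide does not say how the similarity between two multi-member clusters is defined, so single linkage (the highest similarity between any pair of members) is assumed here. The similarity matrix is made-up toy data, and the loop can stop early at a chosen number of clusters.

```python
def agglomerate(sim, n_clusters=1):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(sim))]  # start with singletons
    merges = []                                # record of the hierarchy
    while len(clusters) > max(n_clusters, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage (an assumption): best member-to-member pair.
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        merges.append((clusters[x], clusters[y], s))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical pairwise similarities for four compounds.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.2, 0.4, 1.0],
]
clusters, merges = agglomerate(sim, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

The `merges` list records the order and similarity of each merge, which is the information a dendrogram would display.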

Clustering. II.

• Nonhierarchical clusters
• No relation between clusters
• Jarvis-Patrick method
  – Calculate the similarities of all pairs
  – Record the top K most similar structures to each structure (nearest-neighbor list)
  – Assign compounds to clusters. A and B are in the same cluster if:
    • A is in the top K nearest-neighbor list of B
    • B is in the top K nearest-neighbor list of A
    • A and B have at least Kmin of their top K nearest neighbors in common
• Tends to produce lots of small clusters (singletons) under strict conditions or a few very large clusters under less strict conditions

Brown, JCICS, 1996, 36, 572
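The Jarvis-Patrick conditions can be sketched as below. The assumptions are labeled: pairs that satisfy all three conditions are linked, and clusters are taken as the connected groups of linked compounds (tracked with a small union-find); the toy similarities are derived from invented one-dimensional descriptor values.

```python
def jarvis_patrick(sim, k, k_min):
    """Cluster by shared nearest neighbors; returns sorted index lists."""
    n = len(sim)
    # Nearest-neighbor list: the k most similar other structures.
    nn = [set(sorted((j for j in range(n) if j != i),
                     key=lambda j: sim[i][j], reverse=True)[:k])
          for i in range(n)]
    parent = list(range(n))  # union-find forest (assumed linking scheme)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        for b in range(a + 1, n):
            if a in nn[b] and b in nn[a] and len(nn[a] & nn[b]) >= k_min:
                parent[find(a)] = find(b)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical descriptor values forming two tight groups.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
sim = [[1.0 / (1.0 + abs(x - y)) for y in values] for x in values]

print(jarvis_patrick(sim, k=2, k_min=1))  # [[0, 1, 2], [3, 4, 5]]
print(jarvis_patrick(sim, k=2, k_min=2))  # [[0], [1], [2], [3], [4], [5]]
```

Raising `k_min` from 1 to 2 turns two clean clusters into six singletons, which illustrates the sensitivity to strict conditions noted on the slide.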

Goals for Diversity Metrics

• Ensure that exploratory libraries are broad enough to locate active molecules

• Ensure that focused (directed) libraries are both broad enough to sample space and compact enough to maintain activity

• Need to keep libraries small enough to manage readily, so want to ensure that sublibraries separate actives from inactives

Other Diversity Comments

• Krchnak, Mol. Diversity, 1996, 1, 193 (http://www.5z.com/moldiv/publish/MD023/md_023.html)
  – General comments on combinatorial methods and diversity
• Good, JCICS, 1997, 40, 3926
  – Use of 3D pharmacophores demands selection of products, not reagents, since they are not additive
• Martin, J. Comb. Chem., 1999, 1, 32
  – Beyond diversity, library construction should include MW, lipophilicity, ease of synthesis, pharmacophore features, reagent cost, solubility, complementarity to other libraries
  – Distance measures assess redundancy; coverage of space is better assessed with maps or binning procedures
  – Diversity functions often overweight edges
• Oprea, J. Comb. Chem., 2001, 3, 157
  – Big numbers (lots of compounds) and serendipity are not enough
• Martin, J. Comb. Chem., 2001, 3, 231
  – Chemical similarity is not always a good predictor of bioproperties
  – Unlikely that a few thousand compounds can span all of chemical space
  – Just how much diversity is enough?