Chemoinformatics tools for lead discovery
Post on 13-Feb-2016
Virtual screening
The huge number of molecules available in public and in-house databases means that tools are required to rank compounds in order of decreasing probability of activity
A range of methods is available, varying in sophistication and in the amount of information they require
Structure-based methods can be used when an X-ray structure for the biological target is available
If this is not the case, use must be made of information about (potential) ligands
Ligand-based methods
Similarity searching
• Use when just a single bioactive reference structure is available
3D pharmacophore searching
• Use when it has been possible to carry out a pharmacophore mapping exercise
Machine learning
• Use when a fair number of both actives and inactives have been identified
Similarity searching: I
Use of a similarity measure to quantify the resemblance between an active target, or reference, structure and each database structure
The similar property principle means that high-ranked structures are likely to have similar activities to that of the target structure
Similarity searching hence provides an obvious way of following up on an initial active
Similarity searching: II
There are many ways in which the similarity between two molecules can be computed
A similarity measure has two components
• A structure representation
• A similarity coefficient to compare two representations
Most operational systems use similarity measures based on 2D fingerprints and the Tanimoto coefficient
Fragment bit-strings (fingerprints)
Originally developed for 2D substructure search
Similarity is based on the fragments common to two molecules
Widely used in both in-house and commercial chemoinformatics systems
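
As a rough illustration of how a fragment bit-string might be built, the sketch below hashes every linear atom path in a small molecular graph to a bit position. It is a toy under stated assumptions, not the fingerprint used in these slides: the function name `path_fingerprint`, the path-length limit, the 64-bit length and the CRC32 hash are all choices made for this example, and production fingerprints are generated differently.

```python
import zlib


def path_fingerprint(atoms, bonds, max_len=3, n_bits=64):
    """Hash each linear path of up to max_len atoms to a bit position (toy sketch)."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    bits = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same fragment, so keep one canonical label.
        canonical = min(forward, backward)
        bits.add(zlib.crc32(canonical.encode()) % n_bits)
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


# Heavy-atom graphs for ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = path_fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
```

Every fragment of ethanol also occurs in 1-propanol, so the bits set for ethanol are a subset of those set for 1-propanol; shared bits of this kind are what a similarity coefficient counts.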
Similarity coefficients
Tanimoto coefficient for binary bit strings
$$S_{TD} = \frac{C}{T + D - C}$$
• C bits set in common between the Target and the Database structure
• T bits set in the Target
• D bits set in the Database structure
Values lie between zero (no bits in common) and unity (identical fingerprints)
Many other, related similarity coefficients exist: Tversky, cosine, Euclidean distance, ...
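
The Tanimoto coefficient and its use in a similarity search can be sketched in a few lines. This is a minimal example that assumes a fingerprint is held as a Python set of "on" bit positions; the names `tanimoto` and `similarity_search` and the three-molecule database are hypothetical.

```python
def tanimoto(target_bits, database_bits):
    """Tanimoto coefficient C / (T + D - C) for two sets of 'on' bit positions."""
    c = len(target_bits & database_bits)  # C: bits set in common
    t = len(target_bits)                  # T: bits set in the target
    d = len(database_bits)                # D: bits set in the database structure
    denominator = t + d - c
    # Assumption: two empty fingerprints score 0.0 (conventions differ).
    return c / denominator if denominator else 0.0


def similarity_search(target_bits, database):
    """Rank database structures in order of decreasing similarity to the target."""
    scores = {name: tanimoto(target_bits, bits) for name, bits in database.items()}
    # Ties are broken alphabetically by name so the ranking is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


target = {1, 2, 3}
database = {"mol_a": {1, 2, 3}, "mol_b": {2, 3, 4}, "mol_c": {7, 8}}
ranking = similarity_search(target, database)
# ranking == [("mol_a", 1.0), ("mol_b", 0.5), ("mol_c", 0.0)]
```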
Combination of search techniques using data fusion: I
Tanimoto/fingerprint measures are the most common, but there are many other types, e.g.,
• Computed physicochemical properties
• 3D grid describing the molecular electrostatic potential
These reflect different molecular characteristics, so search performance may be enhanced by using more than one similarity measure
This is known as data fusion or consensus scoring
Combination of search techniques using data fusion: II
Combination of different rankings of the same set of molecules
Two basic approaches
• Generate rankings from the same reference molecule using different similarity measures (similarity fusion)
• Generate rankings from different reference molecules using the same similarity measure (group fusion)
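
Similarity fusion can be illustrated with a rank-sum rule. The slides do not say which fusion rule is used for similarity fusion, so the rule, the function name `rank_sum_fusion` and the three example rankings are all assumptions. Ranks are fused here rather than scores because different similarity measures need not share a scale.

```python
def rank_sum_fusion(rankings):
    """Fuse several rankings of the same molecules by summing rank positions.

    Assumption: every ranking contains exactly the same molecule identifiers.
    """
    totals = {}
    for ranking in rankings:
        for position, mol_id in enumerate(ranking, start=1):
            totals[mol_id] = totals.get(mol_id, 0) + position
    # A lower rank sum means a better consensus position; ties broken by identifier.
    return sorted(totals, key=lambda m: (totals[m], m))


# One reference molecule, three hypothetical similarity measures.
fingerprint_ranking = ["m1", "m2", "m3"]
property_ranking = ["m2", "m3", "m1"]
electrostatic_ranking = ["m2", "m1", "m3"]
fused = rank_sum_fusion([fingerprint_ranking, property_ranking, electrostatic_ranking])
# fused == ["m2", "m1", "m3"]  (rank sums: m2 = 4, m1 = 6, m3 = 8)
```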
Group fusion
[Figure: the ranked lists from Reference 1, Reference 2 and Reference 3 are truncated to the required rank and fused, and the fused list is truncated to give the final list. Figure labels: r = 2000, r = 1000. Legend: active found in earlier list; new active.]
Group fusion rules
Useful performance increases, even with just 10 actives, because multiple starting points give better coverage of structural space
Improvement is most obvious when searching for heterogeneous sets of active molecules
Best results are obtained by
• Fusing similarity coefficient values, rather than ranks
• Re-ranking using the maximum of the similarity values associated with each molecule
• Using the Tanimoto coefficient
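
The MAX rule on similarity values can be sketched as follows. The sketch assumes the Tanimoto scores for each reference structure have already been computed and are held in one dictionary per reference; the name `max_group_fusion` and the example scores are hypothetical.

```python
def max_group_fusion(score_lists, top_n=None):
    """Fuse per-reference similarity scores by keeping each molecule's maximum.

    score_lists: one dict per reference structure, mapping molecule id to score.
    """
    fused = {}
    for scores in score_lists:
        for mol_id, score in scores.items():
            if score > fused.get(mol_id, float("-inf")):
                fused[mol_id] = score
    # Re-rank on the fused (maximum) score; ties broken by identifier.
    ranking = sorted(fused, key=lambda m: (-fused[m], m))
    return ranking if top_n is None else ranking[:top_n]


scores_ref1 = {"m1": 0.9, "m2": 0.2, "m3": 0.4}
scores_ref2 = {"m1": 0.1, "m2": 0.8, "m3": 0.3}
fused_ranking = max_group_fusion([scores_ref1, scores_ref2])
# fused_ranking == ["m1", "m2", "m3"]  (fused scores 0.9, 0.8, 0.4)
```

A molecule that closely resembles any one reference is ranked highly under this rule, which fits the observation above that group fusion helps most when the actives are heterogeneous.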
Turbo similarity searching: I
Similar property principle: nearest neighbours are likely to exhibit the same activity as the reference structure
Group fusion improves the identification of active compounds
Potential for further enhancement by group fusion of the rankings from the reference structure and from its assumed-active nearest neighbours
Turbo similarity searching: II
[Figure: flow diagram with boxes REFERENCE STRUCTURE, NEAREST NEIGHBOURS and RANKED LIST]
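
A minimal sketch of the turbo similarity searching procedure, assuming set-based fingerprints, the Tanimoto coefficient and MAX fusion of similarity scores. The function names and the three-molecule database are hypothetical, and details such as how the reference and its neighbours are handled in the final list are not taken from the slides.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for two sets of 'on' bit positions."""
    common = len(a & b)
    denominator = len(a) + len(b) - common
    return common / denominator if denominator else 0.0


def turbo_similarity_search(reference, database, n_neighbours):
    """Similarity search, then MAX-fuse with searches from the nearest neighbours."""
    # Step 1: conventional similarity search from the reference structure.
    fused = {m: tanimoto(reference, fp) for m, fp in database.items()}
    first_pass = sorted(fused, key=lambda m: (-fused[m], m))
    # Step 2: assume the top-ranked nearest neighbours are also active.
    neighbours = first_pass[:n_neighbours]
    # Step 3: search from each neighbour and keep the maximum score per molecule.
    for nb in neighbours:
        for m, fp in database.items():
            fused[m] = max(fused[m], tanimoto(database[nb], fp))
    return sorted(fused, key=lambda m: (-fused[m], m))


reference = {1, 2, 3, 4}
database = {
    "a": {1, 2, 3, 5},
    "b": {5, 6, 7},
    "c": {4, 9, 10, 11, 12, 13},
}
conventional = turbo_similarity_search(reference, database, n_neighbours=0)
turbo = turbo_similarity_search(reference, database, n_neighbours=1)
# conventional == ["a", "c", "b"]; turbo == ["a", "b", "c"]
```

In this toy data, molecule `b` shares no bits with the reference but does resemble the nearest neighbour `a`, so it moves up once `a` is used as an additional starting point.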
Experimental details
MDL Drug Data Report (MDDR) dataset of 11 activity classes and 102K structures
• In all, 8294 actives in the 11 classes, with (turbo) similarity searches being carried out using each of these as the reference structure
• ECFP_4 fingerprints/Tanimoto coefficient
• MAX group fusion on similarity scores
• Increasing numbers of nearest neighbours
Numbers of nearest neighbours
[Figure: recall at 5% (%), axis range 36 to 46, for SS, TSS-5, TSS-10, TSS-15, TSS-20, TSS-30, TSS-40, TSS-50, TSS-100 and TSS-200]
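
The charts in this part of the deck report recall at 5%. One plausible reading, assumed here, is the percentage of a class's actives that appear in the top 5% of the ranked database. The function name `recall_at_percent`, the integer rounding of the cut-off and the 40-molecule example are choices made for this sketch.

```python
def recall_at_percent(ranking, actives, percent=5):
    """Percentage of the known actives found in the top `percent` of a ranking."""
    if not actives:
        raise ValueError("at least one known active is required")
    # Integer arithmetic avoids floating-point surprises; keep at least one molecule.
    cutoff = max(1, len(ranking) * percent // 100)
    found = sum(1 for mol_id in ranking[:cutoff] if mol_id in actives)
    return 100.0 * found / len(actives)


ranking = [f"m{i}" for i in range(40)]   # 5% of 40 molecules is the top 2
actives = {"m0", "m1", "m5", "m39"}
recall = recall_at_percent(ranking, actives)
# recall == 50.0  (2 of the 4 actives are in the top 2)
```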
Upper and lower bound experiments
[Figure: recall at 5%, axis range 30 to 70, for SS; TSS-100; upper bound (reference + active NNs among the 100 NNs); lower bound (inactive NNs among the 100 NNs); upper bound (reference + 100 active NNs); lower bound (100 inactive NNs)]
Rationale for upper bound results
The true actives in the set of assumed actives yield significant enhancements in performance
The true inactives in the set of assumed actives have little effect on performance
Taken together, the two groups of compounds yield the observed net enhancement
Use of machine-learning methods for similarity searching: I
Turbo similarity searching uses group fusion to enhance conventional similarity searching
Machine learning is a more powerful virtual screening tool than similarity searching
• But it requires a training set containing known actives and inactives
Given an active reference structure, a training set can be generated by
• Using the k nearest neighbours of the reference structure as the actives
• Using k randomly chosen, low-similarity compounds as the inactives
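
The training-set construction described above can be sketched as follows, assuming the similarity of every database compound to the reference has already been computed. The slides do not define "low-similarity", so the 0.2 threshold is an assumption, as are the function name `build_training_set` and the example scores.

```python
import random


def build_training_set(scores, k, low_similarity_threshold=0.2, seed=0):
    """Pick k assumed actives and up to k assumed inactives from similarity scores.

    scores: dict mapping molecule id to its similarity to the reference structure.
    """
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    # Assumed actives: the k nearest neighbours of the reference structure.
    assumed_actives = ranked[:k]
    # Assumed inactives: random picks among the low-similarity compounds.
    # The threshold is an assumption made for this sketch.
    pool = [m for m in ranked[k:] if scores[m] <= low_similarity_threshold]
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    assumed_inactives = rng.sample(pool, min(k, len(pool)))
    return assumed_actives, assumed_inactives


scores = {"m1": 0.9, "m2": 0.8, "m3": 0.5, "m4": 0.15, "m5": 0.1, "m6": 0.05}
assumed_actives, assumed_inactives = build_training_set(scores, k=2)
# assumed_actives == ["m1", "m2"]; assumed_inactives holds two of m4, m5, m6
```

The two lists could then be passed, with their fingerprints, to whichever machine-learning method is being used to re-rank the database.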
Use of machine-learning methods for similarity searching: II
[Figure: flow diagram with boxes REFERENCE STRUCTURE, SIMILARITY SEARCHING, NEAREST NEIGHBOURS, RANDOMLY SELECTED COMPOUNDS, TRAINING SET, MACHINE LEARNING and RANKED LIST]
Results: I
Experiments with the MDDR dataset show that group fusion is better than the machine-learning methods when averaged over all of the classes
However, group fusion is inferior for the most diverse datasets (as measured by the mean pair-wise similarities)
Additional searches were carried out using 10 MDDR activity classes that are as structurally diverse as possible
Results: II
[Figure: recall at 5%, axis range 18 to 28, for SS, TSS-100, TSS-100-SSA-R4, TSS-100-BKD-OPT and TSS-100-SVM]
Conclusions: I
Fingerprint-based similarity searching using a known reference structure is long-established in chemoinformatics
When small numbers of actives are available, group fusion will enhance performance when the sought actives are structurally heterogeneous
Conclusions: II
Conventional similarity searching can also be enhanced, even if there is just a single active, by assuming that the nearest neighbours are also active
This can be effected in two ways
• Use of group fusion to combine similarity rankings (overall best approach)
• Use of substructural analysis to compute fragment weights (best with highly heterogeneous sets of actives)
Questions for study
• Show the role of chemoinformatics in QSAR
• Describe the chemoinformatics data and analyses that are widely used in molecular docking
• The similarity index is widely used to obtain information about new compounds with high biological activity. Briefly explain how it works
• The discovery of new drugs that are more potent than known ones makes extensive use of chemoinformatics. Explain with several examples
• What are the differences in the use of chemoinformatics in QSAR, molecular docking and similarity searching?