Post on 20-Jan-2018
Know More Before You Score: An Analysis of Structure-Based Virtual
Screening Protocols
Structure-Based Virtual Screening (SBVS) is a proven technique for lead discovery
Still many areas for improvement
  Many efforts focussed on scoring function
  Often with little consideration of the assumptions underpinning SBVS
Here we consider a number of these processes in detail from the perspective of our primary SBVS tool (DOCK)
  Ligand conformational search protocols
  Varying site point definitions
  Alteration of sampling variables
Determine their impact on hit enrichment and search speed
Analyze implications for future research
Ligand Flexibility Studies: Strategy
SBVS is CPU intensive
  Conformational searching of ligand clearly important
  Sampling limited to allow search completion in reasonable time frame
Test required to compare different conformational sampling methods
  Ability to reproduce bioactive conformation tested
145 ligands from a 1995 analysis of PDB complexes (Gschwend, UCSF, unpublished)
  30 compound subset chosen for analysis - selection based on visual and numerical inspection of diversity in ligand flexibility and functionality
Relatively small sample of molecules used, many peptidic in nature
  Peptidic moieties are among the better parameterized systems, so this is in some ways a best case scenario
Ligand Flexibility Studies: Procedure
Multiple sampling techniques chosen:
  Catalyst-best / Catalyst-fast / Confort / Omega / DOCK
Variety of sampling levels
Starting from Concord structure, conformers generated and superimposed onto PDB ligand conformation
Conformation with lowest heavy atom RMS to the PDB ligand conformation used as quality measure
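The quality measure described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes the conformers have already been superimposed onto the reference, that hydrogens have been stripped, and that atoms are listed in the same order everywhere. The function names `heavy_atom_rmsd` and `best_conformer` are hypothetical.

```python
import math


def heavy_atom_rmsd(coords_a, coords_b):
    """RMS deviation between two equally ordered lists of (x, y, z) heavy-atom positions."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def best_conformer(conformers, reference):
    """Return (index, rmsd) of the conformer closest to the reference conformation."""
    rmsds = [heavy_atom_rmsd(conf, reference) for conf in conformers]
    best = min(range(len(rmsds)), key=rmsds.__getitem__)
    return best, rmsds[best]


# Toy two-atom example (made-up coordinates, for illustration only).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # shifted by 1 in z
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)],  # one atom displaced by 0.5
]
index, rmsd = best_conformer(conformers, reference)
```

Averaging the per-ligand minimum over the test set gives a summary of the kind reported as "average RMS deviation" for each search type.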
Ligand Flexibility Studies: Search Settings Employed
Dock - conformation_cutoff_factor=3/5/10, clash_overlap=0.7 times vdW radius for clash overlap, with customized rules for bond increment settings
Confort - Rough (0.10 kcal) convergence, diverse conformer selection, boat ring search on - sampling at 5/10 confs per single bond + 500 max
Catalyst Best/Fast - Default settings - sampling at 5/10 confs per single bond + 100 max
Omega - Defaults + RMS_CUTOFF=1.0, GP_ENERGY_WINDOW=5.0, sampling at 100 max
In addition, Concord generated and Sybyl minimized ligand xray structures also analyzed as "controls"
Ligand Flexibility Results: Overall Performance - RMS / Rank
[Figure: average internal rank (axis 0.00 to 14.00) and average RMS deviation (axis 0.00 to 1.80) for each search type.
Search types in plotted order: Min xray, CONFORT 500, FAST 100, CONFORT 10, BEST 100, FAST 5, DOCK 10, BEST 5, OMEGA 100, CONFORT 5, DOCK 5, DOCK 3, Concord.
Average RMS deviation labels in plotted order: 0.76, 0.81, 0.88, 0.92, 0.87, 0.97, 0.96, 0.99, 0.99, 1.00, 1.03, 1.13, 1.76.]
Ligand Flexibility Results: Performance vs Flexibility
[Figure: average RMS deviation (axis 0 to 2.5) for three flexibility classes: 3 to 5 single bonds (15), 6 to 8 single bonds (7), 9 to 14 single bonds (8).]
Ligand Flexibility Results: The Pain Gain Ratio
Does extra noise introduced to scoring functions outweigh this improvement? Is it worth the extra CPU?
[Figure: average conformations per molecule (axis 0 to 100, with one data label of 425) and average RMS deviation (axis 0.00 to 1.80) by search type.
Average RMS deviation labels in plotted order: 0.81, 0.87, 0.88, 0.92, 0.96, 0.97, 1.03, 1.125.]
Ligand Flexibility Results: Visual Analysis
Even at lower RMS, deviation in hydrogen positions an issue
As RMS rises (0.9) we begin to see more significant deviations in heavy atom positions - large enough to possibly prove troublesome to standard force fields
[Figure panels: RMS=0.65 / RMS=0.90]
Ligand Flexibility Results: Visual Analysis
As RMS rises further, hydrogen bond mapping begins to partially break down
Significant deviation begins to be seen although general shape complementarity is still reasonable
DOCKing tricky, pharmacophore searches possible with loose tolerances, although site point vector definitions (DISCO / Catalyst) a no-no
[Figure panels: RMS=2.19 / RMS=1.55]
Ligand Flexibility: Conclusions
At current sampling levels used in virtual screening
  Rough search techniques perform comparably to more exhaustive methods
  DOCK performs quite well, and Fast does slightly better than comparable Best run
Results highlight the need for "forgiving" scoring functions and pharmacophore constraint tolerances (especially for flexible molecules)
  Generating function directly from crystal structure data may not be optimum
  Use the conformation closest to the biologically relevant structure with chosen sampling technique
  May be better to ignore more flexible molecules when possible (more than ~8 bonds)
Analysis of more extensive data set might provide basis for determining if optimum sampling settings exist (Best/Omega/Confort)
  Coarseness of poling values for example
Structure-Based Search Protocols: An Analysis of DOCK
Working within current DOCK paradigm, which search protocols provide the optimum search criteria?
  Site point definitions
  Alteration of sampling variables
  Different scoring grids
Comparisons illustrated for 5 test systems with diverse active data sets
Analysis based on ranking within list that includes ~10000 "noise" compounds
  "Random" selection within bounds of size and flexibility distribution seen in in-house database
Structure-Based Search Protocols: DOCK variables
Contains many variables that affect performance
  Ligand sampling within the site being the primary variant
nodes 3/4
distance_tolerance 0.5/1.0
distance_minimum 3.0
bump_filter 4
conformation_cutoff_factor 5
clash_overlap 0.7
maximum_orientations 500/5000
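As an illustration of how the slashed values above span a set of search protocols, the sketch below enumerates every combination of the three varied parameters while holding the others fixed. The slide does not say whether every combination was actually run, so the full factorial grid is an assumption. The helper names `parameter_sets` and `render` are hypothetical, and the "name value" rendering simply mirrors the slide rather than a complete DOCK input file.

```python
from itertools import product

# Values taken from the slide; parameters with a single value are held fixed.
FIXED = {
    "distance_minimum": 3.0,
    "bump_filter": 4,
    "conformation_cutoff_factor": 5,
    "clash_overlap": 0.7,
}
VARIED = {
    "nodes": [3, 4],
    "distance_tolerance": [0.5, 1.0],
    "maximum_orientations": [500, 5000],
}


def parameter_sets(fixed, varied):
    """Yield one settings dict per combination of the varied parameters."""
    names = list(varied)
    for values in product(*(varied[name] for name in names)):
        settings = dict(fixed)
        settings.update(zip(names, values))
        yield settings


def render(settings):
    """Format settings as 'name value' lines, mirroring the slide layout."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


all_sets = list(parameter_sets(FIXED, VARIED))
first_block = render(all_sets[0])
```

With two values for each of three parameters this gives eight candidate protocols, which makes the cost of widening any one range easy to see before committing CPU time.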
Structure-Based Search Protocols: DOCK and pharmacophoric constraints
It is possible to assign fairly sophisticated pharmacophoric (henceforth also known as chemical) definitions
name acid
# deprotonated carboxyl
definition O.co2 ( C )
# tetrazole
definition N.pl3 ( H ) ( N.2 ( N.2 ( N.2 ( C.2 ) ) ) )
definition N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ( N.2 ) ) ) )
definition N.2 ( N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ( N.2 ) ) ) )
definition N.2 ( N.2 ( C.2 ( N.2 ( N.pl3 ( H ) ) ) ) )
definition N.2 ( N.pl3 ( H ) ( N.2 ( N.2 ( C.2 ) ) ) )
# acyl sulphonamide
definition N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )
definition O.2 ( C.2 ( N.am ( H ) ( S ( 2 O.2 ) ) ) )
definition O.2 ( S ( O.2 ) ( N.am ( H ) ( C.2 ( O.2 ) ) ) )
Current types:
heavy atom
donor
acceptor
hydrophobe
aromatic
aromatic_hydrophobic
acid
base
donor_and_acceptor
special (e.g. metal chelator)
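Nested definitions of this kind can be read as small trees: a central atom type followed by parenthesized groups of bonded atoms. The sketch below parses one definition string into such a tree and rejects unbalanced parentheses. The grammar is inferred only from the examples on the slide (space-separated tokens, an optional integer count such as `2 O.2`); it is not DOCK's own parser, and `parse_definition` is a hypothetical name.

```python
def parse_definition(text):
    """Parse e.g. 'N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )' into (count, label, children)."""
    tokens = text.split()
    pos = 0

    def parse_node(count):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected an atom type")
        label = tokens[pos]
        pos += 1
        children = []
        # Each '(' opens one group of atoms bonded to the current atom.
        while pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            child_count = 1
            if pos < len(tokens) and tokens[pos].isdigit():
                child_count = int(tokens[pos])
                pos += 1
            children.append(parse_node(child_count))
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
        return (count, label, children)

    tree = parse_node(1)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


carboxyl = parse_definition("O.co2 ( C )")
sulphonamide = parse_definition("N.am ( S ( 2 O.2 ) ) ( C.2 ( O.2 ) )")
```

A check like this is a cheap way to catch a dropped parenthesis when hand-editing long definitions such as the tetrazole tautomers.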
Structure-Based Search Protocols: Site Points Used in Kinase Search
[Figure labels]
Region 1 (+ 4): acceptor / donor
Region 2: hydrophobic + 2 donors
Region 3: hydrophobic / any heavy atom
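To make the idea of labelled regions concrete, the sketch below tests whether a docked pose places a suitably labelled ligand atom near at least one site point in every critical region. It is a simplified illustration and not DOCK's matching algorithm: the label compatibility rule, the single distance tolerance, the coordinates, and all function names are assumptions.

```python
import math


def label_matches(site_label, atom_label):
    """Assumed rule: 'heavy atom' accepts any atom; other labels must match exactly."""
    return site_label == "heavy atom" or site_label == atom_label


def region_satisfied(site_points, ligand_atoms, tolerance=1.0):
    """True if any compatible ligand atom lies within tolerance of any site point."""
    for site_label, site_xyz in site_points:
        for atom_label, atom_xyz in ligand_atoms:
            if label_matches(site_label, atom_label) and math.dist(site_xyz, atom_xyz) <= tolerance:
                return True
    return False


def pose_passes(critical_regions, ligand_atoms, tolerance=1.0):
    """A pose passes only if every critical region is satisfied."""
    return all(
        region_satisfied(points, ligand_atoms, tolerance)
        for points in critical_regions.values()
    )


# Made-up coordinates, for illustration only.
regions = {
    "region 1": [("acceptor", (0.0, 0.0, 0.0)), ("donor", (2.0, 0.0, 0.0))],
    "region 3": [("heavy atom", (5.0, 0.0, 0.0))],
}
good_pose = [("donor", (2.3, 0.0, 0.0)), ("hydrophobe", (5.5, 0.0, 0.0))]
bad_pose = [("hydrophobe", (5.5, 0.0, 0.0))]
```

Spreading several points across a region and using a loose tolerance is one way to stay forgiving of coarse conformational sampling.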
Structure-Based Search Protocols: Test Sets and Site Points Used
Sphgen used to generate site points for "generic" DOCK searches
Pharmacophore points derived from a mixture of non-data set bound ligands and in-house programs that process GRID maps and Connolly surfaces (plus plenty of human intervention)
Active data sets broken down into chemotypes to prevent the problem of common analogue bias - an under-appreciated issue in all validations

Target | Active Chemotype Definitions | Pharmacophore Points / Critical Regions
2 Serine proteases | P1 substituent / P1-P4 linker substituent | P1 (base / hydrophobe) + P4 (hydrophobe) pockets
2 Fatty acid binding proteins | Core linking acid moiety to remaining substituents | Acid binding pocket
Kinase | Moiety mimicking adenine / main core of molecules | Adenine binding pocket (donor/acceptor) [+ rear hydrophobic pocket]
Results - kinase
No. of hits after 50% of chemotypes located by at least one search (400 compounds processed from 96 actives / 18 chemotypes)
Search type key: a_b_c(_d), e.g. cc_f_c_3
  a: s = sphgen / c = critical / cc = chemical-critical
  b: s = single conf / f = flexi dock
  c: m = mm score / c = contact score
  d: nXcr(a.b) = n node search with X critical regions and a.b distance tolerance
NOTE: poor 1 crit performance - premature termination
[Figure: compounds (axis 0 to 25) and chemotypes (axis 0 to 10) found, by search type.]
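The a_b_c(_d) search type key is compact enough to decode mechanically. The sketch below maps the first three fields to the meanings given in the key and keeps any remaining field as an uninterpreted modifier string, since its meaning differs between result slides. The function name `decode_search_type` and the dictionary field names are hypothetical.

```python
SITE_POINTS = {"s": "sphgen", "c": "critical", "cc": "chemical-critical"}
LIGAND = {"s": "single conf", "f": "flexi dock"}
SCORE = {"m": "mm score", "c": "contact score"}


def decode_search_type(key):
    """Split a key such as 'cc_f_c_3' into its labelled parts."""
    parts = key.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least three fields in {key!r}")
    a, b, c = parts[:3]
    # Anything after the third field is kept verbatim as the optional 'd' modifier.
    modifier = "_".join(parts[3:]) or None
    return {
        "site_points": SITE_POINTS[a],
        "ligand": LIGAND[b],
        "score": SCORE[c],
        "modifier": modifier,
    }


example = decode_search_type("cc_f_c_3")
plain = decode_search_type("s_s_m")
```

Here `example` describes a chemical-critical, flexi dock, contact score search with modifier `3`, and an unknown code raises `KeyError` rather than being silently accepted.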
Results - fatty acid binding protein 2
No. of hits after 7 chemotypes located by at least one search (500 compounds processed from 28 actives / 8 chemotypes)
Search type key: a_b_c(_d), e.g. cc_f_c_3
  a: s = sphgen / c = critical / cc = chemical-critical
  b: s = single conf / f = flexi dock
  c: m = mm score / c = contact score
  d: 3 = 3 node search / 1.0 = 1.0 distance tolerance / 1.02crit / 32crit = 1.0 distance tolerance or 3 node search with 2nd critical region (hydrophobic binding pocket) / esp = electrostatic potential included in mm score / acid = all non acids removed from search lists
[Figure: compounds (axis 0 to 20) and chemotypes (axis 0 to 8) found, by search type.]
Missing chemotype a citrazinate - not covered in chemical definitions - easy to fix - another advantage over electrostatics
Results - Overall
Compounds processed for 50% Chemotype Coverage for All Systems
Search type key: a_b_c(_d), e.g. cc_f_c_3
  a: s = sphgen / c = critical / cc = chemical-critical
  b: s = single conf / f = flexi dock
  c: m = mm score / c = contact score
  d: 3 = 3 node search / 1.0 = 1.0 distance tolerance
[Figure: compounds processed (axis 0 to 1400) by search type, showing best, mean and worst hit rate.
Search types in plotted order: s_s_c, s_s_m, c_s_c, c_s_m, cc_s_c, cc_s_m, s_f_c, s_f_m, c_f_c, c_f_m, cc_f_c, cc_f_m, cc_f_c_3, cc_f_c_1.0.]
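The "compounds processed for 50% chemotype coverage" measure can be sketched as a walk down a ranked hit list. This is an illustrative reading of the metric for a single ranked list, not the authors' script. The function name, the input layout (a best-first list of compound ids plus a mapping from active id to chemotype), and the use of a ceiling for the required count are assumptions.

```python
import math


def compounds_for_coverage(ranked_ids, chemotype_of, fraction=0.5):
    """Number of compounds processed before `fraction` of chemotypes are found, or None."""
    all_chemotypes = set(chemotype_of.values())
    if not all_chemotypes:
        raise ValueError("no active chemotypes supplied")
    needed = math.ceil(fraction * len(all_chemotypes))
    found = set()
    for n_processed, compound_id in enumerate(ranked_ids, start=1):
        chemotype = chemotype_of.get(compound_id)  # None for "noise" compounds
        if chemotype is not None:
            found.add(chemotype)
            if len(found) >= needed:
                return n_processed
    return None


# Toy data: n* are noise compounds, a* are actives from four chemotypes.
ranked = ["n1", "a1", "n2", "a2", "a3", "n3", "a4"]
chemotypes = {"a1": "A", "a2": "A", "a3": "B", "a4": "C", "a5": "D"}
half = compounds_for_coverage(ranked, chemotypes, fraction=0.5)
full = compounds_for_coverage(ranked, chemotypes, fraction=1.0)
```

Counting chemotypes rather than individual actives means a second analogue of an already found series (a2 in the toy data) does not improve the score, which is the point of guarding against common analogue bias.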
Results Analysis: DOCK Scoring Functions - Shape
Contact generally a little more robust than vdW non bonded function
  More controllable bump penalty (no r^n repulsion)
  Better able to deal with docking inaccuracies
  More important in tight binding sites with pharmacophore constraints and flexible molecules
vdW non bonded function
  Controllable max. vdW repulsion value mitigates this somewhat
  Still useful with less flexible molecules for a more rigorous shape complementarity score
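The contrast between a bounded bump penalty and a steep repulsive wall can be shown with two toy pairwise terms. This is a schematic sketch and not DOCK's scoring code: the step-function contact term, the 12-6 form of the vdW term, and every numeric parameter (cutoffs, penalty, well depth, cap) are assumptions chosen for illustration.

```python
def contact_term(distance, contact_cutoff=4.5, clash_distance=2.5, clash_penalty=50.0):
    """Step-function contact score: fixed reward for a contact, fixed penalty for a bump."""
    if distance < clash_distance:
        return clash_penalty
    if distance <= contact_cutoff:
        return -1.0
    return 0.0


def vdw_term(distance, r_min=3.8, well_depth=0.1, max_repulsion=None):
    """12-6 Lennard-Jones style term, optionally capped at max_repulsion."""
    ratio = r_min / distance
    energy = well_depth * (ratio ** 12 - 2.0 * ratio ** 6)
    if max_repulsion is not None:
        energy = min(energy, max_repulsion)
    return energy


# Compare how each term reacts as an atom pair is pushed closer together.
distances = [4.0, 3.0, 2.0]
contact_values = [contact_term(d) for d in distances]
vdw_values = [vdw_term(d) for d in distances]
capped_values = [vdw_term(d, max_repulsion=10.0) for d in distances]
```

In this toy setup the contact penalty never exceeds its fixed value however close the atoms get, while the uncapped 12-6 term grows steeply with small positional errors, and capping the repulsion limits that growth.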
Results Analysis: DOCK Scoring Functions - H Bonding
Electrostatics
  Many intuitive reasons for caution in explicit treatment
  Poor charge models / coarse conformations / inability to control ionization states
Pharmacophore centers provide better vehicle for H bonding description
  Spread points to allow for search approximations / set critical regions based on biological and structural information / faster searches (30-100 times)
Conclusions
For maximum impact with current methodology, scoring functions should either
  Be designed/utilized with these limitations in mind
    Forgiving / targeted at less flexible molecules
  Improve results by such a high degree that additional sampling (and CPU) is warranted
In the meantime, utility of pharmacophoric hypotheses {critical region(s) with pharmacophoric constraints} is clear
  Better results faster / less sensitivity to model coarseness / allows constraints based on known biology
Acknowledgements
Thank you to my BMS CADD colleagues