BioSolve IT GmbH • An der Ziegelei 75 • 53757 Sankt Augustin • Germany • www.biosolveit.de
BioSolveIT: Biology Problems Solved using Information Technology
A Combinatorial Docking Approach for Dealing with
Protonation and Tautomer Ambiguities
Ingo Dramburg
26-Apr-04 • Ingo Dramburg • © BioSolve IT GmbH
Overview
Motivation, current state
Automatic protonation during docking
Benchmarking the method
Outlook: Isosteric replacement
Conclusions
A drug design workflow
Phase I: Building a receptor model
[Workflow diagram. Labels: X-ray structure (PDB); split into protein and ligand; preparation (both); DOCK; docked complex; min. RMSD; receptor model]
A drug design workflow
Phase II: Building a screening model
[Workflow diagram. Labels: compounds with activity data (active, inactive); preparation; DOCK with the receptor model; docked complexes; enrichment (maximize)]
A drug design workflow
Phase III: Virtual screening
[Workflow diagram. Labels: compound funnel (10^n compounds, 10^6 compounds, 10^5 compounds; filters: lead-like, similarity); preparation; DOCK with the screening model; docked complexes; hits; biol. assay; lead; optimize]
Ambiguities in X-ray structures
Gln, Asn, His are "flippable"
Glu-, Asp-, Lys+, His, (Arg+, Cys, Tyr) are "titratable"
[Figure: side-chain structures of Glu-, Asp-, Lys+, His, Arg+, Tyr, Cys]
A valid receptor model must take this into account
Ambiguities in compound libraries
Virtual compound libraries differ
  Sources (vendor, in-house, ...)
  File formats (MOL2, SDF, SMILES, PDB, ...)
  Conformations (stereoisomers, ...)
  Protonation, tautomers (neutral, charged)
Which protonation state is the right one?
[Figure: example molecule with NH and OH groups; the +/- charge patterns of its groups give 16 compounds]
Typical data preparation steps
1. Complex structure: separate protein and ligand
2. Protein preparation
   structure validity
   ions, water, cofactor handling
   protonation states for titratable sites
   atom types (interaction properties)
3. Ligand preparation
   structure validity
   protonation state
   atom types (interaction properties)
The BioSolveIT tool family
[Diagram of modules. Labels: Docking and Protein Ensembles (FlexX/E); Flexible Superposition (FlexS); Molecular Similarity (FTrees); Comb. Library (Clib); I/O; Cheminf.; Permute; SMARTS engine]
Protein preparation
FlexE: Ensemble of multiple conformations
  common structure
  H-orientation
  protonation
Claussen et al., J. Mol. Biol. (2001) 308, 377-395
Ligand preparation
[Diagram. Labels: input formats SYBYL MOL2, SDF, SMILES, PDB; Basic I/O; Postprocessing; SMARTS rules]
FlexX Molecule Structure
  Atoms (chemical element)
  Connectivity table
  Atomic coordinates (3D)
  Atom types (SYBYL, MOL2)
  Formal charges
  Hydrogens
SMILES and SMARTS
Line notation for molecule and subgraph description
Initially developed by Daylight Inc.
SMILES → Molecule (unique description of molecules)
  SMILES "CC(=O)O" (acetic acid)
SMARTS → Subgraph (complex description of substructures)
  SMARTS "[P,S,C](=[OD1])[OD1]" ("acidic" groups)
[Figure: structure of acetic acid; acidic groups with C, S and P centers]
Weininger D., J. Chem. Inf. Comput. Sci. (1988) 28, 31-36
The FlexX SMARTS engine
Subgraph matching
  Property assignment (aromaticity, atom types, charges, ...)
  Descriptor assignment (interaction types, torsion angles, ...)
  Structure checking (valences, bond types)
Structure transformation (similar to SMIRKS)
  Structure initialisation/correction (PDB setup, bond types, valences)
  Adjustment of ambiguities (mesomerism, tautomerism, ...)
  Chemical modifications, reactions
Combinatorial structure manipulation: Permutation
  Permutation of protonation states
  Generation of close analogues
Structure transformation
old-substructure >> new-substructure
Example molecule: NCCCCCC(=O)O
  formal charges:   C(=O)O >> C(:[O-0.5]):[O-0.5;H0]
  (de)protonation:  C(=O)O >> C(=O)[O-]
  bond types:       NCCC >> N=CC=C
  reduction:        C(=O)O >> C(-[OH]).O
  addition:         C(=O)O >> C(=O)OCC
  disconnect:       *C(=O)O >> *.C(-[OH])O
  ring closure:     NCCCC >> N1CCCC1
[Figure: structure drawings of the example molecule after each transformation]
Automatic complex preparation
FLEX/RECEPTOR> pdbinfo 1dwd
1dwd LIGAND MID-*-1-*, # (NAPAP – SEE REMARK 13. 27 H31 N5 O4 S1)
1dwd PEPTIDE *-*-*-I # (Residues: 11)
1dwd PEPTIDE *-*-*-H # (Residues: 258)
1dwd PEPTIDE *-*-*-L # (Residues: 29)

FLEX/LIGAND> frompdb 1dwd MID-*-1-*
  Extract residue(s) as ligand

FLEX/RECEPTOR> read 1dwd.pdb 6.5 y
  a) Read pdb-file as receptor with default settings (generic.rdf)
  b) Use FlexE module to find optimal protonation

FLEX/DOCKING> complex all
  Perform a complete docking run
Combinatorial structure manipulation
Group identification
[Figure. Labels: example molecule (16 compounds); scaffold decomposition into a core and R-groups R1, R2, R3 with attachment points X; transformation of R-groups; combinatorial library; combinatorial docking]
Simple transformation rules
Rule 1: C(=[OD1])[OD1] >> C(=O)O; C(=O)[O-]
Rule 2: [ND2] >> [NH]; [NH2+]
[Figure: both rules applied to the R-groups R1, R2, R3 of the core, giving a neutral and a charged variant of each group]
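Each rule offers two alternative forms for one group, so a ligand with several such groups expands into the product of its alternatives. The Python sketch below illustrates only this counting and enumeration step. It is not FlexX's implementation: the group names, the `RULES` dictionary and the example ligand (two acid groups, two secondary amines) are assumptions made for illustration.

```python
from itertools import product

# Alternative forms per group type, taken from Rule 1 and Rule 2 on the slide.
# The dictionary keys are hypothetical labels, not FlexX identifiers.
RULES = {
    "acid": ["C(=O)O", "C(=O)[O-]"],   # Rule 1: neutral / deprotonated
    "amine": ["[NH]", "[NH2+]"],       # Rule 2: neutral / protonated
}


def enumerate_states(groups):
    """Return every combination of alternative forms, one choice per group."""
    options = [RULES[group] for group in groups]
    return list(product(*options))


# Assumed example: a ligand with two acid groups and two secondary amines.
groups = ["acid", "acid", "amine", "amine"]
states = enumerate_states(groups)

print(len(states))   # 2 * 2 * 2 * 2 = 16 protonation states
print(states[0])     # all groups neutral
print(states[-1])    # all groups charged
```

With four two-state groups the enumeration yields 16 variants, which matches the 16 compounds shown for the example molecule.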
The complete picture
[Diagram. Labels: I/O; SMARTS engine with rules; Comb. Library module; Permute (16 compounds); FlexE module (ensemble); FlexX docking of the protein-ligand complex; solutions ranked 1 to n]
Rules:
  C(=O)O >> C(=O)OH; C(=O)[O-]
  [ND2] >> [NH2]; [NH3+]
  ....
Benchmarking: A Dataset XXL
Dataset from PDB: ~20000 complexes
Protein constraints
  X-ray structures from PDB
  Resolution < 3.5 Å
  No DNA/RNA
Ligand constraints
  MW 78-750
  Only C, H, N, P, S, F, Cl, Br, I
  No pure CH compounds
  Max. 10 rot. bonds, max. ring size 9
  Non-covalent
  Non-cofactor (242 HET groups, freq. > 10)
2300 selected complexes
Sadowski J., Buning C., Claussen H. (2004), in preparation
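The selection constraints above can be read as a single yes/no filter per complex. The Python sketch below shows one way to express them. The `Complex` record and its field names are hypothetical, and the element list is copied from the slide as transcribed, so extend it if your data needs other elements. How the authors actually implemented the selection is not stated.

```python
from dataclasses import dataclass

# Element list copied from the slide as transcribed.
ALLOWED_ELEMENTS = frozenset({"C", "H", "N", "P", "S", "F", "Cl", "Br", "I"})


@dataclass
class Complex:
    """Hypothetical record describing one PDB complex (illustrative fields)."""
    resolution: float            # in Angstrom
    has_nucleic_acid: bool       # DNA or RNA present
    ligand_mw: float             # molecular weight of the ligand
    ligand_elements: frozenset   # element symbols occurring in the ligand
    rotatable_bonds: int
    largest_ring: int            # size of the largest ring in the ligand
    covalently_bound: bool
    is_cofactor: bool


def passes_filter(c: Complex) -> bool:
    """Apply the protein and ligand constraints listed on the slide."""
    protein_ok = c.resolution < 3.5 and not c.has_nucleic_acid
    ligand_ok = (
        78 <= c.ligand_mw <= 750
        and c.ligand_elements <= ALLOWED_ELEMENTS      # only allowed elements
        and not c.ligand_elements <= {"C", "H"}        # no pure CH compounds
        and c.rotatable_bonds <= 10
        and c.largest_ring <= 9
        and not c.covalently_bound
        and not c.is_cofactor
    )
    return protein_ok and ligand_ok


accepted = Complex(2.0, False, 300.0, frozenset({"C", "H", "N", "S"}), 5, 6, False, False)
pure_ch = Complex(2.0, False, 78.1, frozenset({"C", "H"}), 0, 6, False, False)
print(passes_filter(accepted), passes_filter(pure_ch))
```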
Redocking comparison
Generic dataset (2300 complexes)
  Automatic complex setup
  Redocking with default settings
  (6.5 Å around ligand as site, no water or cofactors included, chemscore)
Ligand is used in two states
  neutral state
  default protonation
Protonation selected by unified protein model
Datasets prepared by hand:
  Flex200, Kramer et al. 1) (200 complexes)
  Astex, Nissink et al. 2) (305 complexes)
1) Kramer et al., Proteins (1999) 37:228-241
2) Nissink et al., Proteins (2002) 49:457-471
RMSD of docking solutions (any rank)
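[Bar chart: percentage of complexes (y-axis, 0-100 %) with a docking solution below RMSD thresholds of 1.00, 1.50, 2.00, 2.50 and 3.50 (x-axis); series: flex200, Astex305, 2300 neutral, 2300 default. Annotation: 1324 < 2.0]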
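For reference, the RMSD used to judge a redocked pose can be sketched in a few lines of Python. This is a plain coordinate RMSD that assumes both poses list the same atoms in the same order, with no superposition and no symmetry correction. Whether FlexX computes it exactly this way is not stated in the slides. The function names and coordinates are illustrative, and the 2.0 threshold is the cutoff the slides use for a successful solution.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for a, b in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))


def is_hit(pose, reference, threshold=2.0):
    """A pose counts as a hit when its RMSD to the reference is within the threshold."""
    return rmsd(pose, reference) <= threshold


# Made-up coordinates: the docked pose is the crystal pose shifted by 1.0 along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

print(rmsd(docked, crystal))    # 1.0
print(is_hit(docked, crystal))  # True
```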
RMSD of solutions on rank 1
[Bar chart: percentage of complexes (y-axis, 0-70 %) whose rank-1 solution is below RMSD thresholds of 1.00, 1.50, 2.00, 2.50 and 3.50 (x-axis); series: flex200, Astex305, 2300 neutral, 2300 default. Annotation: 676 < 2.0]
Permutation
Complexes with titratable ligand: 634
RMSD on rank 1 <= 2.0 is a hit

Without permutation
  dataset   lig. setup   #     hits   %
  PDB2300   neutral      634   144    22 %
  PDB2300   charged      634   176    27 %

With permutation
  dataset   lig. setup   #     hits   %
  PDB2300   neutral      634   179    28 %
  PDB2300   charged      634   210    33 %

Permutation of isomers increases the hit rate by ~5 percentage points, independent of initial settings
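The hit rates follow directly from the hit counts and the 634 complexes with a titratable ligand. The short Python sketch below recomputes them, with illustrative variable names. The exact values come out slightly above the whole-number percentages on the slide, which appear to be truncated rather than rounded.

```python
TOTAL = 634  # complexes with a titratable ligand

# Hit counts from the slide, without and with permutation.
HITS = {
    "neutral": {"without": 144, "with": 179},
    "charged": {"without": 176, "with": 210},
}


def hit_rate(hits, total=TOTAL):
    """Hit rate in percent."""
    return 100.0 * hits / total


gains = {}
for setup, counts in HITS.items():
    before = hit_rate(counts["without"])
    after = hit_rate(counts["with"])
    gains[setup] = after - before
    print(f"{setup}: {before:.1f} % -> {after:.1f} % (+{gains[setup]:.1f} points)")
# neutral: 22.7 % -> 28.2 % (+5.5 points)
# charged: 27.8 % -> 33.1 % (+5.4 points)
```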
Outlook: Isosteric replacement
Substitution ([CD1,$(cH)] >> X)
  F, Cl, Br, I, CF3, NO2, CH3, C2H5, isopropyl, t-butyl
  -OH, -SH, -NH2, -CH2OH, -OMe, N(Me)2
Insertions (*[CD2]* >> X)
  -CH2-, -NH-, -O-, -C2H4-, -C3H6-, ...
  -COCH2-, -CONH-, -COO-, ...
  >C=O, >C=S, >C=NH, >C=NOH, >C=NO-, ...
Rings:
  [Figure: nitrogen- and oxygen-containing ring structures]
Example: β-phenylethylamine family
[Figure: structures of the family members Ephedrine, Octopamine, Metoprolol, Atenolol, Fenoterol, Terbutaline, Salbutamol, Clenbuterol and Dopamine]
Conclusions
FlexX can perform automatic docking successfully
  Reasonable solution for ~3/4 of selected complexes in the PDB
Manual setup does not always beat automation
  Manual setup is impossible for screening of large datasets
Significant increase of top-ranked solutions with permutation of isomers (~5 %)
Combinatorial docking is fast (5-10x faster than sequential docking of all isomers)
Extension of the method allows generation and docking of close analogues on the fly