ICCS 9th - Do protein targets segregate?
TRANSCRIPT
Do Targets Segregate?
Andrea Zaliani
A. Zaliani 9thICCS 2
Aim
• Bioinformaticians have been able to segregate protein targets by several means, from 1D to 3D and 4D
• We have powerful means to perform the same analysis from the ligand standpoint:
o Fingerprints (e.g. 2D, 3D, interaction FP, etc.)
o Shape descriptors
o Grid
• Do we appreciate their peculiarities?
• Would our structural knowledge grow if we knew some frequent target-directing structural patterns?
Start – Method
• Plenty of recent work tries to link protein structures, functions and cavities to ligands (and vice versa) through similarity concepts
• Here I would stress not new methods but what we already have in our hands to boost ideas, with a couple of applications using freely available software (like KNIME, R)
• FP = Do we appreciate their peculiarities enough?
• Can we look into statistical models? If yes, do we?
Different Fingerprints (FP) for different scopes
• Can FP explain this to us?
FP Type                 Tan-Distance
MW, LogP, HA (CDK)…     0.000
Layered (RDKit)         0.082
AtomPairs (RDKit)       0.098
Indigo (GGA)            0.190
Morgan (RDKit)          0.302
FeatMorgan (RDKit)      0.348
ErG*                    0.375
Similarity ≈ 0.62-65
*N. Stiefl et al., JCIM (2006), 46(1), 208; N. Stiefl et al., JCIM (2006), 46(2), 587
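The Tan-Distance column can be read as one minus the Tanimoto similarity of two bit fingerprints, which is consistent with an ErG distance of 0.375 sitting next to a similarity of about 0.62. A minimal sketch, assuming fingerprints are given as sets of on-bit indices (the function name and the toy bit sets are illustrative, not taken from the slides):

```python
def tanimoto_distance(fp_a, fp_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Two empty fingerprints are treated as identical (a convention chosen here).
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy fingerprints (hypothetical bit indices): 4 shared bits, 6 bits in the union.
fp_1 = {1, 4, 7, 9, 12}
fp_2 = {1, 4, 7, 9, 15}

distance = tanimoto_distance(fp_1, fp_2)
similarity = 1.0 - distance
```

The same pair of molecules gives a different distance under each fingerprint type because each type switches on a different set of bits.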
ErG = pharmacophore fingerprint
Development of ErG (Extended reduced Graph), a 2D-pharmacophoric similarity tool for virtual screening
ErG is much less substructure-dependent, so that:
• It opens opportunities in library design (scaffold-hopping)
• Multiple-to-one correspondence of chemical substructures to ‘abstract’ pharmacophoric patterns
• Similarity searching & ‘scaffold-hopping’ are documented
• The FP is interpretable, as each bit corresponds to the count of pharmacophore pair distances in the graph
• Atom types [6] generate pairs [21] x max_distance [15] = 315 bits
[Figure: a molecule reduced to its ErG graph. Node labels: N, Ac, D+, Hf, Ar. Legend: Charge / H-Bonding; Hydrophobic endcaps; Abstract ring forms]
RDF vectorization
        AcAcd1, AcAcd2, …, AcDod4, …, ArHfd4, …, +-d15
Cpd_A,  0,      0,      …, 1,      …, 1,      …, 0
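The bit count and the vector layout can be sketched as follows. This is only an illustration: the six type labels and their ordering are an assumption inferred from column names such as AcAcd1, AcDod4 and ArHfd4, and `erg_column_names` and `vectorize` are hypothetical helper names.

```python
from itertools import combinations_with_replacement

# Assumed type labels and ordering; the slides only show example column names.
ATOM_TYPES = ["Ac", "Do", "Ar", "Hf", "+", "-"]
MAX_DISTANCE = 15


def erg_column_names(types=ATOM_TYPES, max_distance=MAX_DISTANCE):
    """One column per (type pair, graph distance); pairs include same-type pairs."""
    names = []
    for t1, t2 in combinations_with_replacement(types, 2):
        for d in range(1, max_distance + 1):
            names.append(f"{t1}{t2}d{d}")
    return names


def vectorize(pair_counts, names):
    """Lay out a dict of pair-distance counts in the fixed column order."""
    return [pair_counts.get(name, 0) for name in names]


names = erg_column_names()          # 21 pairs x 15 distances
cpd_a = vectorize({"AcDod4": 1, "ArHfd4": 1}, names)
```

Six types give 6 x 7 / 2 = 21 unordered pairs once same-type pairs are included, and 21 x 15 distances gives the 315 columns quoted on the slide.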
Experiment plan - Dataset
• From a literature database, select a relevant random subset (ca. 17K) of literature compounds showing at least one activity (pEx50>6) towards a precise target within class families like GPCR-A, Kinases, Proteases or NHR
• Data are high quality in terms of consistency
• Less than 5% of the entire Pharma Database of Evolvus
• To check homogeneity: all vs. all similarity evaluation with TanDistance under different FP…
Liceptor Database
Targets Annotated
• GPCRs
• Ion Channels
• CNS Transporters
• Kinases
• Proteases
• Phosphatases
Client Proprietary Targets
Small Molecule Ligand Database Features
Liceptor database can be customized with client specified additional fields and custom data annotation
• 3.2 Million Structures
• > 1000 Targets
• Global Patents
• Med. Chem. Journals
• Data annotated from 1967
• Multiple Target Data
• 2D Structures
• Molecular Descriptors
• IC50 and Unified Values
• Therapeutic Indications
Pharmacophore-based FP better
[Figure panels: RDKit FP; RDKit Feature Morgan FP]
Experiment plan - Dataset
Experiment plan – Classification Model
• Partition Tree model generated
• Platform (KIN, GPCRA, NHR, PROT) can be predicted with 15 ErG distances only
• If shuffled on Y, models are generated with average errors ranging 63-77% (100x)
• External predictions at 82.6%
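The shuffled-Y check can be sketched as below. This is not the Partition Tree model from the slides: a simple nearest-centroid classifier is swapped in to keep the example dependency-free, and the data are synthetic. All function names, features and cluster centers are made up for illustration.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], row)),
    )


def error_rate(X_train, y_train, X_test, y_test):
    centroids = fit_centroids(X_train, y_train)
    wrong = sum(predict(centroids, row) != truth for row, truth in zip(X_test, y_test))
    return wrong / len(y_test)


def shuffled_y_errors(X_train, y_train, X_test, y_test, rounds=100, seed=0):
    """Refit on permuted training labels, score against the true test labels."""
    rng = random.Random(seed)
    errors = []
    for _ in range(rounds):
        permuted = list(y_train)
        rng.shuffle(permuted)
        errors.append(error_rate(X_train, permuted, X_test, y_test))
    return errors


# Synthetic, well-separated toy data: four families, two made-up features.
data_rng = random.Random(42)
centers = {"KIN": (0, 0), "GPCRA": (10, 0), "NHR": (0, 10), "PROT": (10, 10)}


def sample(n_per_class):
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = sample(50)
X_test, y_test = sample(20)
true_error = error_rate(X_train, y_train, X_test, y_test)
shuffled = shuffled_y_errors(X_train, y_train, X_test, y_test)
mean_shuffled_error = mean(shuffled)
```

If the four classes were balanced, random label assignment would give about 75% error, which sits inside the 63-77% range quoted above. The slides do not state the class balance of the real dataset.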
Target Family Classification Model
Learn from misclassified
• 15 distances are enough to segregate 17K compounds into four classes
• From the model some insights can be extracted:
• Example KIN relevant features:
i. Presence of Ar-NH(OH) [DoArd1>0]
ii. Absence of α-amino acid signature AcDod3 =0
iii. Need of AcArd3 >0 if i. applies or =1
6H-Benzo[c]chromen-6-one derivatives as selective ERβ agonists, Bioorganic & Medicinal Chemistry Letters 16 (6), 2006, pages 1468-1472
Classification Model – What to learn
• 15 distances are enough to segregate 17K compounds into four classes
• From the model some insights can be extracted:
• PROTEASE target relevant features:
i. Presence of AA signature AcDod3
ii. Presence of AcArd3
iii. Absence/Presence of max 1 HfArd4
[Figure: Hf and Ar nodes]
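The three PROTEASE features above can be written as a simple filter over ErG pair-distance counts. This is only a sketch of one way to read the rules, assuming "presence" means a count above zero. The function name and the two toy ligands are hypothetical.

```python
def matches_protease_rules(erg_counts):
    """Apply the three slide rules to a dict of ErG pair-distance counts."""
    has_aa_signature = erg_counts.get("AcDod3", 0) > 0     # rule i
    has_acc_aromatic = erg_counts.get("AcArd3", 0) > 0     # rule ii
    at_most_one_hf_ar = erg_counts.get("HfArd4", 0) <= 1   # rule iii
    return has_aa_signature and has_acc_aromatic and at_most_one_hf_ar


# Toy count dictionaries (made-up values, not real compounds).
ligand_a = {"AcDod3": 2, "AcArd3": 1, "HfArd4": 1}
ligand_b = {"AcDod3": 0, "AcArd3": 3, "HfArd4": 0}

flags = [matches_protease_rules(x) for x in (ligand_a, ligand_b)]
```

Because each rule reads a single named column, a misclassified compound can be traced back to the specific pharmacophore pair that triggered or blocked the match.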
Classification Model – How do we use this
• We can try to use these as SMARTS queries against the PDB: http://www.pdb.org/pdb/search/advSearch.do
• PROTEASE target relevant features:
i. Presence of AA signature AcDod3
ii. Presence of AcArd3
iii. Presence of max 1 HfArd4
• Results of the query after removal of non-polypeptides, solvents and chain duplicates:
• 101 complexes, of which 53% are correct proteases
• If only i. & iii. were used, then 1141 hits are found, with 738 protease complexes (65%) retrieved
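The two query results can be compared with a short piece of arithmetic. The count of correct proteases for the three-rule query is an estimate, since the slide gives only the percentage. Variable names are illustrative.

```python
def hit_rate(correct, total):
    """Fraction of retrieved complexes that are proteases."""
    return correct / total


# Three-rule query: 101 complexes, 53% correct (percentage only on the slide).
three_rule_hits = 101
three_rule_rate = 0.53
approx_three_rule_correct = round(three_rule_hits * three_rule_rate)  # estimate

# Two-rule query (i. and iii. only): counts given on the slide.
two_rule_hits = 1141
two_rule_correct = 738
two_rule_percent = round(100 * hit_rate(two_rule_correct, two_rule_hits))
```

On these numbers, dropping rule ii. returns roughly eleven times as many complexes while the share of proteases rises from 53% to 65%.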
Single Family Classification Models
• Each target family could also be modeled through classification
• KNIME offers several functions for:
o Data preparation
o Training/test split with stratification on population
o Data reduction performed with an exhaustive retrograde selection
o Cross-validation with 100X leave-10%-out
o Shuffled-Y: 100 classification models built as a negative test
o Performance statistics given on a 25% external test set
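Outside KNIME, the stratified split and the repeated leave-10%-out step could be sketched in plain Python as below. This assumes "leave-10%-out" means a random 10% hold-out drawn afresh in each of the 100 rounds. The function names and class sizes are made up for illustration and are not the KNIME nodes themselves.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps about the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def leave_fraction_out(indices, fraction=0.10, rounds=100, seed=0):
    """Yield (fit, held_out) index lists; a new random hold-out in every round."""
    rng = random.Random(seed)
    n_out = max(1, round(len(indices) * fraction))
    for _ in range(rounds):
        held_out = set(rng.sample(indices, n_out))
        yield [i for i in indices if i not in held_out], sorted(held_out)


# Toy label vector with unequal family sizes (made-up counts).
labels = ["KIN"] * 40 + ["GPCRA"] * 40 + ["NHR"] * 20 + ["PROT"] * 20
train_idx, test_idx = stratified_split(labels)      # 25% external test set
folds = list(leave_fraction_out(train_idx))         # 100 rounds on the rest
```

Stratifying before the split keeps small families such as NHR represented in the external test set in the same proportion as in the training data.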
Classification Model – NHR
[Figure: node labels HX, Ar]
Classification Model – NHR
Ave. Distance Profiles
Classification Model – NHR
Classification Model – Kinase
Classification Model – Kinase
Ave. Distance Profiles
Classification Model – Kinase
Classification Model – GPCRA
Classification Model – GPCRA
Ave. Distance Profiles
Classification Model – GPCRA
Lessons learned here
• A QC-based database is essential
• The 2D pharmacophoric FP approach is enough, but has to be “understood”
• Making FP less cryptic helps in understanding their potential and limits
• Targets do segregate. Ligands help us realize this, the more so the more precise they are
• Pharmacophoric graph space is immensely less problematic than chemical space
• Provocation: how big is the graph space of IP?
Limitations
• Question: do you find what you already know?
• Question: does abstraction help us?
• Every FP method is OK, provided that it teaches us something
• Promiscuity reduction is not the only final aim (controlled promiscuity might be a need)
• Graph distances might be too general
• 2D pharmacophoric fingerprinting is to be improved
Future work
• 3D distances (3D triangles) could easily be implemented
• Combinations of ligand FP and cavity FP could really be a breakthrough to get a grip on multi-pharmacology
• FP weighting for atomic de-solvation contribution is, for me, KEY
• Agonist/antagonist split
• Will pEX50 >6 provide different pictures?
Acknowledgements
Prof. M. Berthold
Greg Landrum
Nik Stiefl
Aniket Ausekar, CEO
Vikram Palshikar
Rashmi Jain
Mike Bodkin
Appendix
Approach to Polypharmacology
• Pharmacophore target family mapping using Neural Networks (Kohonen)
• Cpds mapped together with annotated actives from different sources (MDDR, UBI, etc.)
• Clustering method to suggest pharmacophore similarity (Ext. Reduced Graphs fingerprint)
SOM Binary ErG on 9444 cpds with pIC50>8
[Figure: 8 x 8 SOM map pIC50_8_SOM8_8_1M_Z, both axes 0-7. Legend: Protease, GPCRa, Kinases, NHR, Transporter. Callout: Neuron 7,3: 775 cpds from different families. Example structures: 2425712, pIC50(PR)=8.79; 450207, pIC50(NPY_V)=8.79]