A novel approach to analysis of primary HTS data
Compound Set Enrichment
Thibault Varin, Ansgar Schuffenhauer
Gubler, H., Parker, C., Zhang, JH., Raman, P., Ertl, P.
Introduction
Active series identification: Can relevant SAR be extracted from primary HTS data?
Are activity data binary or continuous?
| Compound Set Enrichment | Thibault Varin | 10/07/14
Introduction: Active series identification
Hypothesis 1: Within primary HTS data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.
Introduction: Are the activity data binary or continuous?
[Figure: activity of the compounds belonging to Scaffold 1 and Scaffold 2; legend: active compound (binary), inactive compound (binary)]
Binary activity:
- 1 active / 5 inactives
- Scaffold 1 = Scaffold 2
Continuous activity: Scaffold 1 > Scaffold 2
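The contrast between the binary and the continuous view can be sketched with a few lines of Python. The activity values and the 50% cut-off below are hypothetical and chosen only to reproduce the situation on the slide (1 active and 5 inactives per scaffold); they are not taken from the screening data.

```python
from statistics import mean

# Hypothetical percent-inhibition values, six compounds per scaffold.
scaffold_1 = [85.0, 48.0, 45.0, 44.0, 42.0, 40.0]
scaffold_2 = [85.0, 5.0, 3.0, 2.0, 1.0, 0.0]
CUTOFF = 50.0  # hypothetical activity cut-off


def binary_counts(values, cutoff):
    """Return (n_active, n_inactive) after applying the cut-off."""
    n_active = sum(v >= cutoff for v in values)
    return n_active, len(values) - n_active


counts_1 = binary_counts(scaffold_1, CUTOFF)
counts_2 = binary_counts(scaffold_2, CUTOFF)
mean_1 = mean(scaffold_1)
mean_2 = mean(scaffold_2)

# Binary view: both scaffolds look identical.
print("binary:", counts_1, counts_2)
# Continuous view: Scaffold 1 is clearly more active on average.
print("continuous:", round(mean_1, 1), round(mean_2, 1))
```

With these numbers the binary counts are the same for both scaffolds, while the mean activity separates them.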
Introduction: Are the activity data binary or continuous?
[Figure: two panels, "Threshold 1" and "Threshold 2", each plotted against Activity; legend: active compound (binary), inactive compound (binary)]
The binary scaffold activity differs depending on the threshold.
Hypothesis 2: Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active series of compounds.
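A small sketch of the threshold dependence, again with hypothetical activity values and hypothetical cut-offs (30 and 50): the scaffold that looks better in the binary view changes when the cut-off moves, although the underlying measurements are the same.

```python
# Hypothetical percent-inhibition values for two scaffolds.
SCAFFOLDS = {
    "Scaffold 1": [62.0, 58.0, 55.0, 20.0],
    "Scaffold 2": [90.0, 35.0, 32.0, 31.0],
}


def active_fraction(values, cutoff):
    """Fraction of compounds called active at the given cut-off."""
    return sum(v >= cutoff for v in values) / len(values)


def best_scaffold(cutoff):
    """Name of the scaffold with the highest active fraction."""
    return max(SCAFFOLDS, key=lambda name: active_fraction(SCAFFOLDS[name], cutoff))


for cutoff in (30.0, 50.0):  # hypothetical Threshold 1 and Threshold 2
    fractions = {name: active_fraction(vals, cutoff) for name, vals in SCAFFOLDS.items()}
    print(cutoff, fractions, "->", best_scaffold(cutoff))
```

At the lower cut-off Scaffold 2 has the higher active fraction; at the higher cut-off Scaffold 1 does.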
Methods: The Scaffold Tree classification
A. Schuffenhauer, P. Ertl et al., "The Scaffold Tree – Visualization of the Scaffold Universe by Hierarchical Scaffold Classification", J. Chem. Inf. Model., 47, 47, 2007
Methods: Datasets
PubChem Annotation from CRC
Simulation of the primary screening data
Hypothesis 1
Methods: Single hypothesis test (summary procedure)
1. State the null and the alternative hypotheses
- H0: "the scaffold is inactive"
- H1: "the scaffold is active"
2. Specify a significance level: α = 0.01
3. Compute the test statistic and the p-value → p-value = probability, assuming the scaffold is inactive (H0), of observing data at least as extreme as those measured
4. Decision step:
- p-value ≥ α: H0 is not rejected
- p-value < α: H0 is rejected and H1 is accepted: "the scaffold is active"
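The decision step can be written down as a tiny helper. This is only a sketch of step 4 under the α = 0.01 level stated above; the function name `decide` and the example p-values are hypothetical.

```python
ALPHA = 0.01  # significance level from step 2


def decide(p_value, alpha=ALPHA):
    """Step 4: reject H0 only when the p-value falls below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p_value < alpha:
        return "H0 rejected: the scaffold is active"
    return "H0 not rejected: no evidence that the scaffold is active"


# Hypothetical p-values for two scaffolds.
print(decide(0.0004))
print(decide(0.2))
```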
Methods: The KS and the binomial hypothesis tests
[Figure labels: Bioassay, Scaffold; Actives, Inactives]
Continuous data: KS test
H0: there is no difference between the activity distribution of the compounds having the scaffold S3-2 and the background distribution.
Binary data: binomial test
H0: there is no difference between the proportion of active compounds among the compounds having the scaffold S3-2 and the proportion of active compounds in the full dataset.
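Both tests can be sketched with the Python standard library. The slides do not say whether the tests were one-sided or two-sided, so the choices here are assumptions: a two-sided two-sample KS test with the asymptotic Kolmogorov p-value, and a one-sided (upper-tail) binomial test. The function names and all data are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample, background):
    """Two-sided two-sample KS test; returns (D, asymptotic p-value)."""
    xs, ys = sorted(sample), sorted(background)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical CDFs; it can only
    # be attained at one of the observed values.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in xs + ys)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the Kolmogorov series is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): one-sided binomial p-value."""
    return sum(math.comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical continuous data: 200 background activities, 20 scaffold members.
background = [i * 0.5 for i in range(200)]
scaffold = [60.0 + 2.0 * i for i in range(20)]
d_stat, p_ks = ks_two_sample(scaffold, background)

# Hypothetical binary data: 6 actives among 20 scaffold members,
# with 5% actives in the full dataset.
p_binom = binomial_upper_tail(6, 20, 0.05)

print(d_stat, p_ks, p_binom)
```

The KS test uses every activity value of the scaffold, whereas the binomial test only sees the counts that remain after a cut-off has been applied.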
Methods: Multiple hypothesis tests and the Bonferroni correction
Problem of false positives
• α = probability of identifying an inactive scaffold as active (for each test performed)
• With 100 inactive scaffolds, the probability of identifying at least one "active" by chance is 63% (1 − 0.99^100)
The Bonferroni correction suggests testing each scaffold at a critical significance level of α = 0.01 / number of scaffolds
It makes the assumption that the individual tests are independent
Each level of the Scaffold Tree was treated separately
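The arithmetic behind the 63% figure and the corrected level can be checked directly. The sketch below assumes independent tests, as the slide does; the helper names are hypothetical.

```python
ALPHA = 0.01


def familywise_error(alpha, n_tests):
    """P(at least one false positive) across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


n_scaffolds = 100
fwer_uncorrected = familywise_error(ALPHA, n_scaffolds)
per_test_alpha = bonferroni_alpha(ALPHA, n_scaffolds)
fwer_corrected = familywise_error(per_test_alpha, n_scaffolds)

print(round(fwer_uncorrected, 3))  # about 0.634
print(per_test_alpha)
print(fwer_corrected < ALPHA)
```

Since each level of the Scaffold Tree is treated separately, `n_tests` would be the number of scaffolds at the level being tested rather than the number of scaffolds in the whole tree.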
Methods: Determining the activity of classes
[Workflow diagram; labels: Hypothesis 1, Hypothesis 2, scaffold activity evaluation, multiple hypothesis test correction (Bonferroni), comparison of results]
Results: Comparison of KSP and BTP predictions
                              Total               BPCA significantly actives    BPCA non significantly actives
Bioassay                      KSP   BTP   Δ       BPCA  KSP   BTP   Δ           KSP   BTP   Δ
Hydroxysteroid dehydrogenase  330   231   +99     199   183   168   +15         147   63    +84
Caspase-1                     331   114   +217    5     2     2     0           329   112   +217
PK                            12    4     +8      12    3     3     0           9     1     +8
Luciferase                    67    12    +55     15    13    11    +2          54    1     +53
Luciferase                    178   48    +130    41    32    35    -3          146   13    +133
CYP450 2C9                    58    33    +25     34    34    31    +3          24    2     +22
CYP450 3A4                    121   64    +57     60    60    53    +7          61    11    +50
With:
- KSP: KS Prediction
- BTP: Binomial Threshold Prediction
- Δ: KSP − BTP
- BPCA: Binomial PubChem Annotation
Both KSP and BTP retrieve the BPCA significantly active classes
Number of active classes: KSP > BTP
Most of the new KSP active classes are not BPCA significantly active
Results: KSP significantly active scaffolds found among the PubChem inactives
[Figure: scaffold structures with the PubChem annotation of their compounds; legend, compound activity (PubChem Annotation): Active, Inconclusive, Inactive; scaffolds labelled "WA"; annotation on the structures: "Inconclusives?"]
Results: Prioritize nodes instead of individual scaffolds
[Figure legend: scaffold activity (KS Prediction / Bonferroni): non significantly active, significantly active]
Conclusion: Compound Set Enrichment
Validation of initial hypotheses
A method to mine HTS data and identify active series of compounds
• Chemical classification: Scaffold Tree
• Statistical analysis: Kolmogorov-Smirnov hypothesis test
• Multiple hypothesis test correction: Bonferroni correction
Use all primary data
No activity cut-off
Identification of new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen
With many thanks to
Acknowledgments
Primary mentor:
- Ansgar Schuffenhauer
Scientific advisers:
- Christian Parker
- Hanspeter Gubler
- Ji-Hu Zhang
- Peter Ertl
- Edgar Jacoby
Help: MLI group
Fellowship: Education office
Discussions:
- Martin Beibel
- Sebastian Bergling
- Meir Glick
- Alain Dietrich
- Marie-Cecile Didiot