VS Explorer – Analyzing large scale docking experiments

ChemAxon 2005 User Group Meeting

Marc Zimmermann, Martin Hofmann

Page 2 (Marc Zimmermann, 2005, ChemAxon UGM05)

Selection of Potential Drugs

• 28 million compounds currently known
• Drug company biologists screen up to 1 million compounds against target using ultra-high throughput technology
• Chemists select 50-100 compounds for follow-up
• Chemists work on these compounds, developing new, more potent compounds
• Pharmacologists test compounds for pharmacokinetic and toxicological profiles
• 1-2 compounds are selected as potential drugs

Page 3

High Volume Screening Analysis – the Methods

[Diagram: workflow stages Screening (HTS; vHTS: similarity, docking), Clustering (active / inactive), Assembling, Filtering, Modeling]

Virtual Screening – computational or in silico analog of biological screening
o Score, rank, and/or filter a set of structures using one or more computational procedures
o Helps to decide:
   o Which compounds to screen
   o Which libraries to synthesize
   o Which compounds to purchase from an external source
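As a rough illustration of the score, rank, and filter step, the following Java sketch filters a list of pre-scored compounds by a cutoff and returns the best-ranked identifiers. The class and method names, the cutoff, and the convention that lower scores are better are assumptions made for this example; they are not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class VirtualScreen {
    /** Hypothetical entry: a compound identifier plus a precomputed score (lower = better). */
    static final class Scored {
        final String id;
        final double score;

        Scored(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Filter by a score cutoff, rank best-first, and keep at most n identifiers. */
    static List<String> select(List<Scored> input, double cutoff, int n) {
        List<Scored> kept = new ArrayList<>();
        for (Scored s : input) {
            if (s.score <= cutoff) {
                kept.add(s);                       // filter step
            }
        }
        kept.sort(Comparator.comparingDouble((Scored s) -> s.score)); // rank step
        List<String> ids = new ArrayList<>();
        for (Scored s : kept.subList(0, Math.min(n, kept.size()))) {
            ids.add(s.id);
        }
        return ids;
    }
}
```

In practice the score would come from one of the computational procedures named on the slide (similarity search, docking); here it is simply given.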


Page 4

High Volume Screening Analysis – the Tools at SCAI

[Diagram: stages Screening, Clustering, Assembling, Filtering, Modeling; tools HTSview, VS Explorer, DB Annotator, FTrees, FlexX, ProMiner, TopNet; GRID Layer]

Page 6

Computational Aspects of Drug Discovery: Virtual Screening

• Enable scientists to quickly and easily find compounds binding to a particular target protein
   o growth in the number of targets
   o growth in 3D structure determination (PDB database)
   o growth in computing power
   o growth in the prediction quality of protein-compound interactions
• Experimental screening is very expensive: not feasible for academic groups or small companies
• Aim: Active molecules / Tested molecules

Page 7

In silico drug discovery process (EGEE, Swissgrid, …)

[Diagram labels: Clermont-Ferrand; SCAI Fraunhofer; Swiss Biogrid consortium; local research centres in plagued areas]

The grid impact:
• Computing and storage resources for genomics research and in silico drug discovery
• Cross-organizational collaboration space to progress research work
• Federation of patient databases for clinical trials and epidemiology in developing countries

Grids for neglected diseases and diseases of the developing world

Support to local centres in plagued areas (genomics research, clinical trials and vector control)

Page 8

Structure-Based Virtual Screening

Protein-Ligand Docking
o Aims to predict 3D structures when a molecule “docks” to a protein
   o Need a way to explore the space of possible protein-ligand geometries (poses)
   o Need to score or rank the poses
o Problem: many degrees of freedom (rotation, conformation, solvent effects)


[Diagram: Ligand database and Target Protein -> Molecular docking -> Ligand docked into protein’s active site]

Page 9

Grid VS Results Browser

• Quick overview of very large log files
• Sorting and merging of files
• Storing and retrieval in databases
• Similarity searches and property predictions
• Interface to the R statistics toolbox
• Prototype is under construction

concat('ZINC', lpad(p.sub_id_fk,8,'0')) | target | ligand       | conformations | score  | time
ZINC00000057                            | 1cet   | ZINC00000057 | 172           | -7.45  | 3.25
ZINC00000061                            | 1cet   | ZINC00000061 | 203           | -18.37 | 3.84 s
ZINC00000066                            | 1cet   | ZINC00000066 | 241           | -25.58 | 39.92 s
ZINC00000122                            | 1cet   | ZINC00000122 | 399           | -14.14 | 7.41 s
ZINC00000197                            | 1cet   | ZINC00000197 | 272           | -8.60  | 2.44 s
ZINC00000290                            | 1cet   | ZINC00000290 | 259           | -15.00 | 20.40 s
ZINC00000349                            | 1cet   | ZINC00000349 | 82            | -10.81 | 22.20 s
ZINC00000453                            | 1cet   | ZINC00000453 | 256           | -14.61 | 3.76 s
ZINC00000484                            | 1cet   | ZINC00000484 | 447           | -18.33 | 35.53 s
ZINC00000607                            | 1cet   | ZINC00000607 | 418           | -15.77 | 7.43 s
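Result rows of the kind shown above can be parsed and sorted with a few lines of Java. The sketch below is illustrative only: the class name, the field order taken from the sample header, and the assumption that a more negative score is the better one are not confirmed by the slides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DockingLog {
    /** One result row: ligand id, target, number of conformations, docking score. */
    static final class Row {
        final String ligand;
        final String target;
        final int conformations;
        final double score;

        Row(String ligand, String target, int conformations, double score) {
            this.ligand = ligand;
            this.target = target;
            this.conformations = conformations;
            this.score = score;
        }
    }

    /** Parse a pipe-separated row; runs of '|' count as a single separator. */
    static Row parse(String line) {
        String[] f = line.trim().split("\\s*\\|+\\s*");
        // Columns: id | target | ligand | conformations | score | time
        return new Row(f[2], f[1], Integer.parseInt(f[3]), Double.parseDouble(f[4]));
    }

    /** Parse all rows and sort them by score, most negative first (assumed best). */
    static List<Row> sortedByScore(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(parse(line));
        }
        rows.sort(Comparator.comparingDouble((Row r) -> r.score));
        return rows;
    }
}
```

For log files too large to hold in memory, the same parsing would be combined with an on-disk index rather than a full in-memory list.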

"Smiles";"Data"
"c1(N2CCC(CC2)C(OCC)=O)sc3c(ccc(Cl)c3)n1";MAC-0000001;02;101.66;104.66
"C(=O)(Nc(cc1)ccc1Cl)N(CCCN2c(c(Cl)cc3C(F)(F)F)nc3)CC2";MAC-0000002;02;101.14;105.89
"n1(CC(CNCCNc2nccc(n2)C(F)(F)F)O)c3c(cc1)cccc3";MAC-0000003;02;101.64;97.32
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Cl)Cl)CC2)cn1";MAC-0000004;02;100.09;101.14
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Br)F)CC2)cn1";MAC-0000005;02;108.98;97.02
"C(F)(F)(F)c1ccnc(NCCNC(=O)c2ccco2)n1";MAC-0000006;02;110.19;106.15
"C(F)(F)(F)c1ccnc(NCCNC(c2ccccc2)=O)n1";MAC-0000007;02;107.42;98.46
"C(NCc1ccco1)(=S)Nc(cccn2)c2";MAC-0000008;02;103.86;97.98
"C(F)(F)(F)c1ccnc(NCCNC(=S)Nc(cccn2)c2)n1";MAC-0000009;02;107.77;98.6
"C(=O)(c1cccs1)N(CCCN2CC(O)COc(ccc3C(C)=O)cc3)CC2";MAC-0000010;02;107.41;104.92
"C(F)(F)(F)c1ccnc(NCC=C)n1";MAC-0000011;02;105.78;106.84
"N1(CCNc2ncccc2C(F)(F)F)C(=O)CC3(CCCC3)C1=O";MAC-0000012;02;105.26;103.38
"N1(CCCNc(c(Cl)cc2C(F)(F)F)nc2)C(=O)CC3(CCCC3)C1=O";MAC-0000013;02;102;106.84

M  END
> <Object Id>
MAC-0000100

> <Batch Ref>
03

> <Supplier Object Id>
6743501

> <ENZ_KINETIC_RES_ACT.RES_ACT>

Page 10

Rapid prototyping using ChemAxon Libraries

[Architecture diagram: GUI (Swing); File I/O; DB connect; Table Module; Chem Module]

• 100% pure Java (JRE)
   o Swing
   o JTable
• ChemAxon (MarvinBeans) for the chemistry functionality
• OJDBC for the database connection to Oracle

Page 11

Molecule Rendering

From spreadsheets to molecular spreadsheets

o Overloading cellRenderer with Marvin from

Switch between SMILES and structure display (on / off)
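One way to turn a JTable into a molecular spreadsheet is a custom cell renderer with a display toggle. The sketch below is not the VS Explorer code: `Depictor` is a hypothetical hook standing in for whatever Marvin component draws the structure, and no ChemAxon API is called.

```java
import java.awt.Component;
import javax.swing.JTable;
import javax.swing.table.DefaultTableCellRenderer;

class SmilesCellRenderer extends DefaultTableCellRenderer {
    /** Hypothetical hook: turns a SMILES string into a drawable component. */
    interface Depictor {
        Component depict(String smiles);
    }

    private final Depictor depictor;
    private boolean showStructure = true;

    SmilesCellRenderer(Depictor depictor) {
        this.depictor = depictor;
    }

    /** Toggle between the drawn structure (true) and the plain SMILES text (false). */
    void setShowStructure(boolean on) {
        showStructure = on;
    }

    boolean isShowStructure() {
        return showStructure;
    }

    @Override
    public Component getTableCellRendererComponent(JTable table, Object value,
            boolean isSelected, boolean hasFocus, int row, int column) {
        if (showStructure && value != null) {
            // Structure mode: delegate drawing to the depictor.
            return depictor.depict(value.toString());
        }
        // Text mode: fall back to the default label showing the SMILES string.
        return super.getTableCellRendererComponent(table, value, isSelected, hasFocus, row, column);
    }
}
```

Such a renderer would be installed on the SMILES column with `table.getColumnModel().getColumn(i).setCellRenderer(...)`, followed by a `table.repaint()` after each toggle.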

Page 12

File Import / Export

•Implemented as a thread

•Comma Separated Files

o CSV Parser

o Preview Window

o Tag missing Values

•SDF Molecular Files

o SDF Properties Names as Row-Keys

o Import Coordinates

o Based on MolImporter from
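The two import paths listed above can be illustrated with a small hand-rolled Java sketch: a separator-aware line parser that tags missing values, and a reader that collects SD file data items by property name. The tag text, class names, and parsing rules are assumptions for illustration; the actual tool is described as building on ChemAxon's MolImporter, which is not used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvLine {
    static final String MISSING = "<missing>"; // hypothetical tag for empty fields

    /** Split one line on sep, honouring double quotes, and tag empty fields. */
    static List<String> parse(String line, char sep) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                quoted = !quoted;              // quotes are dropped; escaped quotes are not handled
            } else if (c == sep && !quoted) {
                fields.add(tag(cur.toString()));
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(tag(cur.toString()));
        return fields;
    }

    private static String tag(String s) {
        String t = s.trim();
        return t.isEmpty() ? MISSING : t;
    }
}

class SdfData {
    /** Collect "> <Name>" data items of one SD record, keyed by property name. */
    static Map<String, String> properties(List<String> recordLines) {
        Map<String, String> props = new LinkedHashMap<>();
        String key = null;
        for (String line : recordLines) {
            if (line.startsWith(">")) {
                int open = line.indexOf('<');
                int close = line.indexOf('>', open + 1);
                key = (open >= 0 && close > open) ? line.substring(open + 1, close) : null;
                if (key != null) {
                    props.put(key, "");
                }
            } else if (line.trim().isEmpty() || line.startsWith("$$$$")) {
                key = null;                    // a blank line ends the current data item
            } else if (key != null) {
                String old = props.get(key);
                props.put(key, old.isEmpty() ? line : old + "\n" + line);
            }
        }
        return props;
    }
}
```

The property names returned by `SdfData.properties` could then serve as the keys under which the values are placed in the table.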

Preview

Page 13

Smart Indexing for large Collections

• Large index storing file pointers or database keys

• Java TableModel stores the full information only for a limited number of elements (cache)
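For the flat-file case, an index of file pointers can be as simple as one byte offset per record. The following Java sketch assumes one record per line and an ASCII file; the class name is hypothetical, and `RandomAccessFile.readLine` is used for brevity even though it is unbuffered and slow on very large files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineIndex {
    private final String path;
    private final List<Long> offsets = new ArrayList<>(); // byte offset of each record

    /** Scan the file once and remember where every line starts. */
    LineIndex(String path) {
        this.path = path;
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long pos = 0;
            while (f.readLine() != null) {
                offsets.add(pos);
                pos = f.getFilePointer();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return offsets.size();
    }

    /** Fetch a single record by seeking to its stored file pointer. */
    String record(int row) {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            f.seek(offsets.get(row));
            return f.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the offsets stay in memory, so the index grows by one `Long` per record regardless of how wide the records are.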

Index

FilePointer

Page 14

Interactive Focus on Data

• Large index storing file pointers or database keys

• Java TableModel stores the full information only for a limited number of elements

• Event handler for scrolling triggers a reload from external memory (e.g. a cursor for an RDB)

• Update of the TableModel
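A cache-backed TableModel along these lines might look as follows. This is a sketch under assumptions: the `loader` hook stands in for the file seek or database fetch, the least-recently-used eviction policy is a choice made here, and the reload is triggered lazily from `getValueAt` (which JTable calls for the rows scrolled into view) rather than from an explicit scroll event handler as on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;
import javax.swing.table.AbstractTableModel;

class CachedTableModel extends AbstractTableModel {
    private final int rowCount;
    private final int columnCount;
    private final IntFunction<Object[]> loader; // hypothetical hook: file seek or DB fetch
    private final Map<Integer, Object[]> cache;
    int loads = 0;                              // counts loader calls, for illustration only

    CachedTableModel(int rowCount, int columnCount, final int cacheSize,
                     IntFunction<Object[]> loader) {
        this.rowCount = rowCount;
        this.columnCount = columnCount;
        this.loader = loader;
        // Access-ordered map that evicts the least recently used row.
        this.cache = new LinkedHashMap<Integer, Object[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Object[]> eldest) {
                return size() > cacheSize;
            }
        };
    }

    @Override
    public int getRowCount() {
        return rowCount;
    }

    @Override
    public int getColumnCount() {
        return columnCount;
    }

    @Override
    public Object getValueAt(int row, int column) {
        Object[] cells = cache.get(row);
        if (cells == null) {                    // cache miss: reload from external memory
            cells = loader.apply(row);
            loads++;
            cache.put(row, cells);
        }
        return cells[column];
    }
}
```

The table reports the full row count to the JTable, so the scrollbar reflects the whole collection while only `cacheSize` rows are held in memory.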

Index

FilePointer

Page 15

Column Sorting

• Event handler starts a sorting thread

• Re-sorting of the index for flat files

• New database query: + ORDER BY columnLabel

• Coming next:
   o Implementation of efficient online sorting algorithms in order to reduce file access
   o Merging of two tables
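Both sorting paths can be sketched briefly in Java. For flat files only the index is reordered, so the records stay where they are on disk; for databases the sort is pushed into the query. The class and method names are hypothetical, and the sketch runs synchronously rather than in a separate sorting thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

class ColumnSort {
    /** Return row numbers reordered by a numeric column; the data itself is not moved. */
    static List<Integer> sortedIndex(int rowCount, IntToDoubleFunction columnValue) {
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            index.add(i);
        }
        index.sort(Comparator.comparingDouble((Integer i) -> columnValue.applyAsDouble(i)));
        return index;
    }

    /** Database case: append ORDER BY; columnLabel must be a trusted column name. */
    static String orderBy(String baseQuery, String columnLabel) {
        return baseQuery + " ORDER BY " + columnLabel;
    }
}
```

Because the comparator calls `columnValue` repeatedly, a flat-file implementation would want to read the sort column into memory once before sorting, which is in the spirit of the reduced file access mentioned under "Coming next".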

Index sort(List)

Object

FilePointer

Page 16

DB Annotator: Semantics for databases

Semantic annotation of relational data

o Linking databases and ontologies

o Using the VS Explorer as Plugin

Ontology browser

VS Explorer

Page 17

DHFR Assay for E. coli:
• Folate -> DHF -> THF -> synthesis of thymidine
• Important for cell growth
• DHFR inhibitor: Trimethoprim

[Structure images: DHF; Trimethoprim]

Zolli-Juran M, Cechetto JD, Hartlen R, Daigle DM, Brown ED. High throughput screening identifies novel inhibitors of Escherichia coli dihydrofolate reductase that are competitive with dihydrofolate. Bioorg Med Chem Lett. 2003 Aug 4; 13(15):2493-6.

http://hts.mcmaster.ca/HTSDataMiningCompetition.htm


Page 18

Docking with FlexX [1]

• PDB structure 1RA2
• Cocrystallized DHFR and NADP
• FlexX places water particles

[1] Rarey M, Kramer B, Lengauer T and Klebe G, J Mol Biol 1996, 261(3):470-89.

Zimmermann M, Tresch A, Maass A, Hofmann M. Drilling into a HTS data set of E. coli. Poster, 15th Symposium on QSAR 2004.


Page 19

In silico Screening Workflow:

[Workflow diagram labels: HTS; 2D Similarity Analysis; Fragment Analysis; Classification; MD Simulation; QSAR; Training Set; Test Set; Docking; Candidates; Activity; Region; active / inactive]

Page 20

1CET – Lactate Dehydrogenase of Plasmodium falciparum

Malaria Target:
o Chloroquine binds in the cofactor binding site of Plasmodium falciparum lactate dehydrogenase
o PDB structure: 1CET
o Ligand: Chloro-Quinolin
o Test Ligands: Ambinter data set from ZINC

Page 21

1CET vs. 50 000 Compounds on 200 Nodes: Global Statistics

• Done: 100%
• Rescheduled: 46
• Running on nodes: 2296 h (96 days)
   o Autodock.pl: 2288 h
   o Total transfer: 8 h
• Submission script: 36 h
• Time gain of: 64 (instead of 200)
• Ideal: 11.5 h
• Grid time: 205.5 h
   o Scheduled: 179 h
   o Ready: 78 min
   o Waiting: 78 min
   o Submitted: 24 h
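The figures above are consistent with a simple reading, sketched here in Java: the ideal time is the accumulated node time divided by the 200 nodes, and the time gain of 64 is the node time divided by the 36 h taken by the submission script. That second interpretation is an inference from the numbers, not something the slide states, and the class and method names are hypothetical.

```java
class GridStats {
    /** Speed-up = accumulated CPU time divided by elapsed wall-clock time. */
    static double speedup(double cpuHours, double wallHours) {
        return cpuHours / wallHours;
    }

    /** Ideal wall-clock time if the work were spread perfectly over all nodes. */
    static double idealHours(double cpuHours, int nodes) {
        return cpuHours / nodes;
    }

    public static void main(String[] args) {
        double cpu = 2288 + 8; // Autodock.pl plus transfer time, in hours
        System.out.printf("node time: %.0f h = %.1f days%n", cpu, cpu / 24);
        System.out.printf("ideal on 200 nodes: %.2f h%n", idealHours(cpu, 200));
        System.out.printf("speed-up against 36 h: %.1f%n", speedup(cpu, 36));
    }
}
```

Under this reading the run reached roughly a third of the ideal 200-fold gain, with the remainder lost to scheduling, queueing, and rescheduled jobs.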

Page 22

Planning Next Steps

• 2M compounds vs. 1 protein target
   o Input: 13 GB
   o Output: 2 TB (dlg), 0.5 TB (pdb)
   o 12 CPU-years
   o Ideal: 3 days with 1350 CPUs
   o Reality: a grid of clusters with users, queues, errors…
• Challenges for our application?
   o 100% of results obtained
   o Minimal processing time
   o Grid resource consumption (storage, CPU)
   o User interface for the application
   o …
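The estimate for the 2M-compound run can be checked with a little arithmetic, sketched here in Java. Reading the CPU figure on the slide as 12 CPU-years is an assumption, and the linear extrapolation from the 50 000-compound run (2296 node hours) is a cross-check added here, not a calculation shown in the talk.

```java
class ScaleUp {
    /** Wall-clock days if cpuYears of work are spread evenly over cpus processors. */
    static double idealDays(double cpuYears, int cpus) {
        return cpuYears * 365.0 / cpus;
    }

    /** Linear extrapolation of CPU hours from a smaller run to a larger library. */
    static double extrapolatedCpuHours(double cpuHours, long compounds, long targetCompounds) {
        return cpuHours * targetCompounds / compounds;
    }

    public static void main(String[] args) {
        System.out.printf("ideal: %.2f days%n", idealDays(12, 1350));
        double hours = extrapolatedCpuHours(2296, 50_000, 2_000_000);
        System.out.printf("extrapolated: %.0f CPU hours = %.1f CPU-years%n",
                hours, hours / (365.0 * 24));
    }
}
```

Both numbers land in the same range as the slide's figures (about 3 days on 1350 CPUs, and on the order of ten CPU-years), which supports the CPU-years reading without proving it.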