large-scale distributed in silico drug discovery using vsm-g · reference compound gw3965 has been...
TRANSCRIPT
Leo GHEMTIO : Phd Student ([email protected])Bernard MAIGRET : Advisor ([email protected])
INRIA LORRAINE, http://www.loria.fr
LargeLarge--scale distributed scale distributed in in silicosilico drug drug discovery using VSMdiscovery using VSM--GG
VVirtual
SScreening
MManager for
GGrids
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 11
Cost evolution of new drug development.Time scale of new drug development.
Nowadays it takes ~14 years and costs more than 1.2 billion US$.
Cost and development time of a new drugCost and development time of a new drug
ContextContext : : New drug development (1)New drug development (1)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 22
Time scale of new drug development.
Cost and development time of a new drugCost and development time of a new drug
ContextContext : : New drug development (2)New drug development (2)
New solutions :- Drug Design
- Virtual Screnning
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 33
Profusion of dataProfusion of data……
Computing power increase, fast networks, robots,…
Genome: 30,000 genesTranscriptome: 40-100,000 mRNAsProteome: 100-400,000 proteins Interactome: >1,000,000 interactions
……but also technology advances:but also technology advances:
ContextContext : : New drug development (3)New drug development (3)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 44
VSMVSM--G methodology (1)G methodology (1)
VSMVSM--G goal: G goal: ranks ranks ligandsligands in order to establish a limited set of hit compoundsin order to establish a limited set of hit compounds
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 55
VSMVSM--G methodology (2)G methodology (2)
Multi-step screening funnel
SHEFSHEF
SemiSemi--flexible dockingflexible docking
Molecular DynamicsMolecular Dynamics
VSM-G funnel:
• Combination of several methods in the funnel allows both computational efficiency and high detection rate
• decrease the number of molecules after each step.
Fast geometrical Fast geometrical matchingmatching
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 66
VSMVSM--G methodology (3)G methodology (3)
Screening input preparationScreening input preparation
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 77
VSMVSM--G methodology (4)G methodology (4)
Funnel interface Job supervision Filtering from resultsFunnel interface Job supervision Filtering from results
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 88
Protein: LXRProtein: LXRββ receptorreceptor
Set of representative target conformations
VSM-G Proof of concept (1)
LigandsLigands::
Starting database: ZINC
Substructure-based pre-filter:Phenyl-SO2-N motif.
Introduce a known active compound (GW3965) within the ligand database to check the validity of the approach
Starting data :11,907 structures; 103,535 conformers.
MD sampling Clustering
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 99
Reference compound Reference compound GW3965GW3965 has been extracted as has been extracted as a putative candidate with good rank at each level of a putative candidate with good rank at each level of the funnel:the funnel:
Top 10 after 1st filter (shape complementarities)
#1 after 2nd filter (semi-flexible docking)
Highest interaction energy after MD refinements
This rank would propose the This rank would propose the GW3965GW3965 molecule as the best molecule as the best candidate ligand for the candidate ligand for the LXRLXRββ protein, with (computed) protein, with (computed) affinity far above other top compounds.affinity far above other top compounds.
VSMVSM--G G ProofProof ofof concept (2)concept (2)
/SHEF/SHEF~2 s/mol
15 min/mol
36 h / 2.5 ns (64 nodes)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1010
VSMVSM--G High Throughput Screening (1)G High Throughput Screening (1)
France map with Grid’5000 sites.
Lille
Rennes Orsay Nancy
Lyon
SophiaBordeaux
Toulouse
Grenoble
Use of grid computing allows large DB screening Use of grid computing allows large DB screening within a reasonable time.within a reasonable time.
GridGrid’’5000 (5000 (www.grid5000.frwww.grid5000.fr) : ) : 5000 5000 CPUsCPUs for for researchresearch in Grid in Grid ComputingComputing, , eScienceeScience andand CyberCyber--infrastructuresinfrastructures..
We have used APST (A Parameter Sweep Tool) to schedule, distribute and manage calculations on this national-scale grid.
Current objective:Current objective: Screening databases of ~1,000,000 molecules against ~10 protein conformations (experimental and from MD sampling)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1111
VSMVSM--G High Throughput Screening (2)G High Throughput Screening (2)
1,000,000 moleculesFlexible docking
(e.g. GOLD)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1212
1,000,000 molecules
~ 1,000 years !
Flexible docking (e.g. GOLD)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1313
VSMVSM--G High Throughput Screening (2)G High Throughput Screening (2)
1,000,000 molecules
5 years
Screening funnel(e.g. MSSH/SHEF
then GOLD)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1414
VSMVSM--G High Throughput Screening (3)G High Throughput Screening (3)
1,000,000 molecules
2 days
Screening funnel(e.g. MSSH/SHEF
then GOLD)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1515
VSMVSM--G High Throughput Screening (4)G High Throughput Screening (4)
Distribution Distribution ofof job on job on GridGrid 5000 : 5000 :
Local host:
DB
APST
APSTRes
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1616
VSMVSM--G High Throughput Screening (5)G High Throughput Screening (5)
By using these results for thejobs that present the best timeswe can compute:
Average duration of each task : 29.4 s
Total time to transfer one task input : 1.50 s
Time to run program in one task: 27.9 s
After the MSSH step we have run a SHEF filter on all molecules, selecting 10,000 for each cavity for the next screening funnel step(GOLD)
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1717
VSMVSM--G High Throughput Screening (6)G High Throughput Screening (6)
0
5 0 0
1 0 0 0
1 5 0 0
2 0 0 0
2 5 0 0
0 1 0 0 2 0 0 3 0 0 4 0 0
Numbers of task carried out for each size of the input files according to time
t (min)
nb. jobs done
VSMVSM--G future developments:G future developments:run GOLD on grid’5000 platform (at the present time we can run 20,000 GOLD in
14 hours on 1,200 processors)
run NAMD on grid’5000 (10 NAMD jobs of 2.5 ns in one day on 32 processors)
add possibility of QSAR modules in the funnel process
replace commercial software by in-house or open-source ones
enhance SHEF scoring function with physico-chemical features
development of a relational database to store input data and VSM-G result
etc…
PerspectivesPerspectives
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1818
SomeSome VSMVSM--G application in G application in thethe pipelinepipeline
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1919
VHTS on TC80, a VHTS on TC80, a targettarget againstagainst T T CruziCruzi trypanozomatrypanozoma
InIn projectproject ::
CooperationCooperation withwith Mali (Modibo Coulibaly et Mali (Modibo Coulibaly et Seydou Seydou DoumbiaDoumbia) on Malaria kinases) on Malaria kinases
Beautrait A., Leroux V., Chavent M., Souchet M., Cai W., Shao X., Moreau G., BladonP., Devignes M.D., Smaïl-Tabone M., Yang G., Cavin X., Ray N., Elkhayar A., SuterF., Yao J., Liao Q., Yu F.
• Grid’5000 for providing access to grid computing resources.
• LORIA and Région Lorraine for financial support.
AcknowledgementsAcknowledgements
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 2020
1. Cai, W.; Shao, X.; Maigret B. (2002). Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening. J. Mol. Graph. Model., 20(4): p. 313-28.
2. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. (1997). Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol., 267: p. 727-748.
3. Phillips, J.C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R.D.; Kalé, L.; Schulten, K. (2005). Scalable molecular dynamics with NAMD. J. Comput. Chem., 26(16): p. 1781-1802.
4. Lala, D.S. (2005). The liver X receptors. Curr. Opin. Investig. Drugs., 6(9): p. 934-43.
5. Irwin, J.J.; Shoichet, B.K. (2005). ZINC – A Free Database of Commercially Available Compounds for Virtual Screening. J. Chem. Inf. Model., 45(1): p. 177–182.
ReferencesReferences
Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 2121
Ligands:Ligands:
Starting database: ZINC.- 3 million commercially available
compounds.
Pre-filter: only compounds showing Phenyl-SO2-N motif.
- 11,907 hits.
Introduce a known active compound (GW3965) within the ligand database to check the validity of the approach.
Conformational analysis: ~10 confs./coumpound.
- 103,535 conformers.
Bioinformatics for Africa/Nairobi 28 may - 2 june 2007
VSMVSM--G G ProofProof ofof concept (3)concept (3)
Protein:Protein:MD sampling:
- Duration: 6 ns, CHARMM27 force field, NPT ensemble, 300 K, 1 bar, explicit TIP3P water.
- Snapshot extraction each 0.5 ns.Snapshot clustering to limit the targets number.
Binding pocket at t=0.time
0
200
400
600
800
1000
1200 Surface (Ų)Volume (ų)
Binding pocket at t=6ns.Pocket evolution during MD sampling.
Bioinformatics for Africa/Nairobi 28 may - 2 june 2007
VSM-G Proof of concept (2)