large-scale distributed in silico drug discovery using vsm-g · reference compound gw3965 has been...

23
Leo GHEMTIO : Phd Student ([email protected]) Bernard MAIGRET : Advisor ([email protected]) INRIA LORRAINE, http://www.loria.fr Large Large - - scale distributed scale distributed in in silico silico drug drug discovery using VSM discovery using VSM - - G G V V irtual S S creening M M anager for G G rids Bioinformatics for Africa/Nairobi 28 may - 2 june 2007 [email protected] 1 1

Upload: others

Post on 21-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Leo GHEMTIO : Phd Student ([email protected])Bernard MAIGRET : Advisor ([email protected])

INRIA LORRAINE, http://www.loria.fr

LargeLarge--scale distributed scale distributed in in silicosilico drug drug discovery using VSMdiscovery using VSM--GG

VVirtual

SScreening

MManager for

GGrids

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 11

Page 2: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Cost evolution of new drug development.Time scale of new drug development.

Nowadays it takes ~14 years and costs more than 1.2 billion US$.

Cost and development time of a new drugCost and development time of a new drug

ContextContext : : New drug development (1)New drug development (1)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 22

Page 3: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Time scale of new drug development.

Cost and development time of a new drugCost and development time of a new drug

ContextContext : : New drug development (2)New drug development (2)

New solutions :- Drug Design

- Virtual Screnning

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 33

Page 4: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Profusion of dataProfusion of data……

Computing power increase, fast networks, robots,…

Genome: 30,000 genesTranscriptome: 40-100,000 mRNAsProteome: 100-400,000 proteins Interactome: >1,000,000 interactions

……but also technology advances:but also technology advances:

ContextContext : : New drug development (3)New drug development (3)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 44

Page 5: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G methodology (1)G methodology (1)

VSMVSM--G goal: G goal: ranks ranks ligandsligands in order to establish a limited set of hit compoundsin order to establish a limited set of hit compounds

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 55

Page 6: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G methodology (2)G methodology (2)

Multi-step screening funnel

SHEFSHEF

SemiSemi--flexible dockingflexible docking

Molecular DynamicsMolecular Dynamics

VSM-G funnel:

• Combination of several methods in the funnel allows both computational efficiency and high detection rate

• decrease the number of molecules after each step.

Fast geometrical Fast geometrical matchingmatching

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 66

Page 7: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G methodology (3)G methodology (3)

Screening input preparationScreening input preparation

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 77

Page 8: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G methodology (4)G methodology (4)

Funnel interface Job supervision Filtering from resultsFunnel interface Job supervision Filtering from results

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 88

Page 9: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Protein: LXRProtein: LXRββ receptorreceptor

Set of representative target conformations

VSM-G Proof of concept (1)

LigandsLigands::

Starting database: ZINC

Substructure-based pre-filter:Phenyl-SO2-N motif.

Introduce a known active compound (GW3965) within the ligand database to check the validity of the approach

Starting data :11,907 structures; 103,535 conformers.

MD sampling Clustering

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 99

Page 10: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Reference compound Reference compound GW3965GW3965 has been extracted as has been extracted as a putative candidate with good rank at each level of a putative candidate with good rank at each level of the funnel:the funnel:

Top 10 after 1st filter (shape complementarities)

#1 after 2nd filter (semi-flexible docking)

Highest interaction energy after MD refinements

This rank would propose the This rank would propose the GW3965GW3965 molecule as the best molecule as the best candidate ligand for the candidate ligand for the LXRLXRββ protein, with (computed) protein, with (computed) affinity far above other top compounds.affinity far above other top compounds.

VSMVSM--G G ProofProof ofof concept (2)concept (2)

/SHEF/SHEF~2 s/mol

15 min/mol

36 h / 2.5 ns (64 nodes)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1010

Page 11: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G High Throughput Screening (1)G High Throughput Screening (1)

France map with Grid’5000 sites.

Lille

Rennes Orsay Nancy

Lyon

SophiaBordeaux

Toulouse

Grenoble

Use of grid computing allows large DB screening Use of grid computing allows large DB screening within a reasonable time.within a reasonable time.

GridGrid’’5000 (5000 (www.grid5000.frwww.grid5000.fr) : ) : 5000 5000 CPUsCPUs for for researchresearch in Grid in Grid ComputingComputing, , eScienceeScience andand CyberCyber--infrastructuresinfrastructures..

We have used APST (A Parameter Sweep Tool) to schedule, distribute and manage calculations on this national-scale grid.

Current objective:Current objective: Screening databases of ~1,000,000 molecules against ~10 protein conformations (experimental and from MD sampling)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1111

Page 12: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G High Throughput Screening (2)G High Throughput Screening (2)

1,000,000 moleculesFlexible docking

(e.g. GOLD)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1212

Page 13: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

1,000,000 molecules

~ 1,000 years !

Flexible docking (e.g. GOLD)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1313

VSMVSM--G High Throughput Screening (2)G High Throughput Screening (2)

Page 14: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

1,000,000 molecules

5 years

Screening funnel(e.g. MSSH/SHEF

then GOLD)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1414

VSMVSM--G High Throughput Screening (3)G High Throughput Screening (3)

Page 15: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

1,000,000 molecules

2 days

Screening funnel(e.g. MSSH/SHEF

then GOLD)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1515

VSMVSM--G High Throughput Screening (4)G High Throughput Screening (4)

Page 16: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Distribution Distribution ofof job on job on GridGrid 5000 : 5000 :

Local host:

DB

APST

APSTRes

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1616

VSMVSM--G High Throughput Screening (5)G High Throughput Screening (5)

Page 17: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

By using these results for thejobs that present the best timeswe can compute:

Average duration of each task : 29.4 s

Total time to transfer one task input : 1.50 s

Time to run program in one task: 27.9 s

After the MSSH step we have run a SHEF filter on all molecules, selecting 10,000 for each cavity for the next screening funnel step(GOLD)

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1717

VSMVSM--G High Throughput Screening (6)G High Throughput Screening (6)

0

5 0 0

1 0 0 0

1 5 0 0

2 0 0 0

2 5 0 0

0 1 0 0 2 0 0 3 0 0 4 0 0

Numbers of task carried out for each size of the input files according to time

t (min)

nb. jobs done

Page 18: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

VSMVSM--G future developments:G future developments:run GOLD on grid’5000 platform (at the present time we can run 20,000 GOLD in

14 hours on 1,200 processors)

run NAMD on grid’5000 (10 NAMD jobs of 2.5 ns in one day on 32 processors)

add possibility of QSAR modules in the funnel process

replace commercial software by in-house or open-source ones

enhance SHEF scoring function with physico-chemical features

development of a relational database to store input data and VSM-G result

etc…

PerspectivesPerspectives

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1818

Page 19: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

SomeSome VSMVSM--G application in G application in thethe pipelinepipeline

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 1919

VHTS on TC80, a VHTS on TC80, a targettarget againstagainst T T CruziCruzi trypanozomatrypanozoma

InIn projectproject ::

CooperationCooperation withwith Mali (Modibo Coulibaly et Mali (Modibo Coulibaly et Seydou Seydou DoumbiaDoumbia) on Malaria kinases) on Malaria kinases

Page 20: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Beautrait A., Leroux V., Chavent M., Souchet M., Cai W., Shao X., Moreau G., BladonP., Devignes M.D., Smaïl-Tabone M., Yang G., Cavin X., Ray N., Elkhayar A., SuterF., Yao J., Liao Q., Yu F.

• Grid’5000 for providing access to grid computing resources.

• LORIA and Région Lorraine for financial support.

AcknowledgementsAcknowledgements

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 2020

Page 21: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

1. Cai, W.; Shao, X.; Maigret B. (2002). Protein-ligand recognition using spherical harmonic molecular surfaces: towards a fast and efficient filter for large virtual throughput screening. J. Mol. Graph. Model., 20(4): p. 313-28.

2. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. (1997). Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol., 267: p. 727-748.

3. Phillips, J.C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R.D.; Kalé, L.; Schulten, K. (2005). Scalable molecular dynamics with NAMD. J. Comput. Chem., 26(16): p. 1781-1802.

4. Lala, D.S. (2005). The liver X receptors. Curr. Opin. Investig. Drugs., 6(9): p. 934-43.

5. Irwin, J.J.; Shoichet, B.K. (2005). ZINC – A Free Database of Commercially Available Compounds for Virtual Screening. J. Chem. Inf. Model., 45(1): p. 177–182.

ReferencesReferences

Bioinformatics for Africa/Nairobi 28 may - 2 june [email protected] 2121

Page 22: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Ligands:Ligands:

Starting database: ZINC.- 3 million commercially available

compounds.

Pre-filter: only compounds showing Phenyl-SO2-N motif.

- 11,907 hits.

Introduce a known active compound (GW3965) within the ligand database to check the validity of the approach.

Conformational analysis: ~10 confs./coumpound.

- 103,535 conformers.

Bioinformatics for Africa/Nairobi 28 may - 2 june 2007

VSMVSM--G G ProofProof ofof concept (3)concept (3)

Page 23: Large-scale distributed in silico drug discovery using VSM-G · Reference compound GW3965 has been extracted as a putative candidate with good rank at each level of the funnel: ¾Top

Protein:Protein:MD sampling:

- Duration: 6 ns, CHARMM27 force field, NPT ensemble, 300 K, 1 bar, explicit TIP3P water.

- Snapshot extraction each 0.5 ns.Snapshot clustering to limit the targets number.

Binding pocket at t=0.time

0

200

400

600

800

1000

1200 Surface (Ų)Volume (ų)

Binding pocket at t=6ns.Pocket evolution during MD sampling.

Bioinformatics for Africa/Nairobi 28 may - 2 june 2007

VSM-G Proof of concept (2)