

Page 1

NFF. Leeds Institute of Molecular Medicine. University of Leeds. 2009

Using NGS to run a massive in silico drug screening against RAS

Dr. Narcis Fernandez-Fuentes
RCUK Academic Fellow
Computational Biology Group
Section of Experimental Therapeutics
University of Leeds

Page 2

Overview

1. Biological and therapeutic relevance of RAS family

2. How to run a large in silico screening using NGS infrastructure (my version)

• Making files available to several NGS cores: SRB

• Keeping NGS cores ‘busy’

• Copying back your stuff

3. Benefits for my research

Page 3

- RAS proteins are small GTPases that function as signaling switches that control normal cell growth and differentiation

Taken from Downward J, Nature Reviews Cancer 3, 11-22, 2003

[Figure: the RAS switch. Ras-GTP is the active state and Ras-GDP the inactive state; a second panel shows Ras-GTP and Ras-GDP* with one step marked X.]

Page 4

[Figure labels: PLC, PI3K, RAL, RAF, P120GAP]

In silico screening

RAS protein (RECEPTOR)
Library of chemical compounds (Drugs)

Software
1. Docking: Autodock, Glide, Gold, …
2. Scoring: x-Score, eHiTs, …

Page 5

Receptor
– RAS structure
– Protein rigid during docking
– Mg was kept during screening

Drug Libraries
– NCIDS set: 1,990 (~140,000)
– ZINC7 0.8 Lead-like set: 128,085
– ZINC7 0.8 Drug-like set: 83,331
– ZINC7 0.8 Fragment-like set: 62,175
– Timtec library ‘Actiprobe 25K’: 53,298
– Chembridge ‘EXPRESS pick’: 156,268

∑ = 487,147 (~ >3 million)

Autodock Parameters
– 25,000,000 energy evaluations
– 200 LGA runs
– 300 initial population
– Flexible ligand representation

A few numbers…

Time required per docking run: ~5 hours

2.5M hours, 102,000 days, 56 years

Files required per docking run:
– Receptor file (protein): ~500 KB
– Ligand file (drug): 100 KB-2 MB
– Parameters file: ~200 KB
– Grid file: ~105 KB
– N(*) interaction map files: ~700 KB
– It generates 1 output file

On average 12 files per run = 5.9M files

(*) Where N = number of different atom types
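The totals on this slide follow from simple multiplication. A minimal sketch of that arithmetic, using only figures stated above (487,147 compounds, about 5 hours and 12 files per run); the variable names are illustrative. Two caveats: the six library counts as transcribed add up to 485,147, which is 2,000 fewer than the stated total, and dividing the single-core days by 365 gives about 278 years rather than the 56 years shown, so that last figure presumably rests on an assumption the transcript does not state.

```shell
#!/bin/sh
# Back-of-the-envelope totals from the slide's own figures (integer arithmetic).
compounds=487147     # stated library total
hours_per_run=5      # approximate time per docking run
files_per_run=12     # average number of files per docking run

# Sum of the six library sizes exactly as listed on the slide.
listed=$(( 1990 + 128085 + 83331 + 62175 + 53298 + 156268 ))

total_hours=$(( compounds * hours_per_run ))
total_days=$(( total_hours / 24 ))
total_years=$(( total_days / 365 ))
total_files=$(( compounds * files_per_run ))

echo "listed compounds: $listed"
echo "single-core time: $total_hours hours = $total_days days = $total_years years"
echo "files to manage:  $total_files"
```

Either way the single-core cost is on the order of a hundred thousand days, which is the motivation for spreading the runs over several NGS clusters.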

Page 6

SRB - Leeds

ngs.leeds.ac.uk
ngs.rl.ac.uk
ngs.wmin.ac.uk
ngs.oerc.ox.ac.uk

I. Moving data from local machine to NGS

II. Submitting jobs: Keeping NGS computers ‘busy’

III. Retrieving results

Page 7

SRB - Leeds

I. Moving data from local machine to NGS

1. Generate all needed files locally

~/DL (LL, FL, NCIDS, TimTec, Chembridge)
~/DL/RECEPTOR (protein, parameter, grid, maps, etc)
~/DL/COMPOUNDS (compound files)
~/DL/list (text file)

comp1 param_file map_file1 map_file2 …
comp2 param_file map_file1 map_file2 …
…

2. Transfer files to SRB: accessible to any NGS cluster

– Need an SRB account
– Install the SRB client on your local machine
– Use the SRB command: # Srsync -r DL s:DL
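Step 1 above ends with a list file holding one line per compound. A minimal sketch of how such a file could be generated from the directory layout; the function name `build_list`, the `.pdbqt` and `.map` extensions, and the `<id>_1ras.dpf` parameter-file pattern are assumptions modelled on the file names in the deck's submission script, not the author's actual tooling.

```shell
#!/bin/sh
# Sketch: write "$root/list", one line per compound:
#   <compound file> <parameter file> <map file 1> <map file 2> ...
# Assumed (hypothetical) naming: COMPOUNDS/<id>.pdbqt, <id>_1ras.dpf, RECEPTOR/*.map
build_list() {
    root=$1
    maps=""
    for m in "$root"/RECEPTOR/*.map; do
        [ -e "$m" ] || continue              # glob matched nothing
        maps="$maps $(basename "$m")"        # map files are shared by all compounds
    done
    for compound in "$root"/COMPOUNDS/*.pdbqt; do
        [ -e "$compound" ] || continue
        id=$(basename "$compound" .pdbqt)
        echo "$id.pdbqt ${id}_1ras.dpf$maps"
    done > "$root/list"
}

# Demo on a throwaway directory tree standing in for ~/DL.
demo=$(mktemp -d)
mkdir -p "$demo/RECEPTOR" "$demo/COMPOUNDS"
touch "$demo/RECEPTOR/1ras.A.map" "$demo/RECEPTOR/1ras.C.map"
touch "$demo/COMPOUNDS/comp1.pdbqt" "$demo/COMPOUNDS/comp2.pdbqt"
build_list "$demo"
cat "$demo/list"
rm -rf "$demo"
# The upload itself is the slide's command, run by hand: Srsync -r DL s:DL
```

Splitting the resulting file into one chunk per cluster then gives each NGS site its own work queue.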

Page 8

SRB - Leeds

II. Submitting jobs: Keeping NGS computers ‘busy’

1. A different list file was copied to each NGS cluster
2. A Perl script was used to monitor the queue and submit new jobs (cron job: every 5 min):

    qstat
    if ‘Q’ state jobs
        quit
    else
        Sinit
        n_jobs_to_submit = (n_slots - n_jobs_R)
        open (LIST) # list
        while i < n_jobs_to_submit
            Smv comp(i)
            write submission script
            qsub
            i++
        update(LIST)
        close(LIST)
        Sexit
    exit

Important!

- Have a valid proxy certificate, otherwise SRB will fail: upload your own (myproxy)
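The decision at the heart of the pseudocode above (submit nothing while jobs are still queued, otherwise fill the free slots) can be sketched as small shell functions. This is an illustration under stated assumptions, not the author's Perl script: it assumes the default PBS `qstat` layout (two header lines, job state in the fifth column), the function names are invented here, and the SRB steps (Sinit, Smv, Sexit) and the `qsub` call are left out because they need a live grid session.

```shell
#!/bin/sh
# Sketch of the decision made on each cron run: how many jobs to submit.
# Assumes default PBS `qstat` output: two header lines, job state in column 5.

# Count jobs in state $1 (e.g. Q or R) in qstat text read from stdin.
count_state() {
    awk -v s="$1" 'NR > 2 && $5 == s { n++ } END { print n + 0 }'
}

# Print the number of free slots, or 0 while any job is still queued.
jobs_to_submit() {
    n_slots=$1
    qstat_text=$2
    queued=$(printf '%s\n' "$qstat_text" | count_state Q)
    running=$(printf '%s\n' "$qstat_text" | count_state R)
    if [ "$queued" -gt 0 ] || [ "$running" -ge "$n_slots" ]; then
        echo 0
    else
        echo $(( n_slots - running ))
    fi
}

# Print the first $2 lines of list file $1 and remove them from the file,
# mirroring the open / update / close steps of the pseudocode.
take_from_list() {
    head -n "$2" "$1"
    tail -n +"$(( $2 + 1 ))" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo with canned qstat text (job ids and queue name are made up);
# a real run would use: jobs_to_submit 4 "$(qstat)"
sample='Job id            Name     User     Time Use S Queue
----------------- -------- -------- -------- - -----
101.ngs           pradera  ngs0655  01:10:00 R workq
102.ngs           pradera  ngs0655  00:20:00 R workq'
echo "jobs to submit: $(jobs_to_submit 4 "$sample")"
```

Waiting until the queue is empty before submitting more keeps the number of pending jobs small, so the list file always reflects what is still left to dock.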


Page 9

Crontab

    ngs0655@ngs:~> crontab -l

    ## CHECK QUEUE AND SUBMIT NEW JOBS
    */5 * * * * cd /gpfs/scratch/ngs0655/DOCKINGS_5; perl check_queue_cron.pl list /gpfs/scratch/ngs0655/DOCKINGS_5 queue.log > /dev/null 2>&1

    ## CLEAN LOGS
    30 * * * * cd /gpfs/scratch/ngs0655/DOCKINGS_5; \ls *.o* > list.log; perl clean_logs.pl < list.log > /dev/null 2>&1

    ## COPY FINISHED JOBS TO MY LOCAL MACHINE
    10 * * * * perl /gpfs/scratch/ngs0655/DOCKINGS_5/clean_dir.pl > /dev/null 2>&1

Example of a submission script

    #!/bin/bash
    #
    #PBS -S /bin/bash
    #PBS -j oe
    #PBS -N pradera
    #PBS -l walltime=15:00:00
    scratchdir=/gpfs/scratch/ngs0655/DOCKINGS_3
    input=08074014_1ras.dpf
    output=08074014_1ras.dlg
    compress=08074014_1ras.dlg.gz
    drug=08074014.pdbqt
    echo STARTED `date`
    echo $scratchdir $input $output $compress
    cd $scratchdir
    /usr/ngs/AUTODOCK_4_0_1 -p $input -l $output
    gzip $output
    mv -f $compress ./FINISHED
    rm -f $drug $input
    echo FINISHED `date`
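Since one such script is needed per compound, the "write submission script" step lends itself to a small generator. A hedged sketch: `write_submission_script` is a name invented here, the PBS options and paths are copied from the example above, and the `|| exit 1` after `cd` is an added safeguard rather than part of the original script.

```shell
#!/bin/sh
# Sketch: print a PBS submission script for one compound id ($1) that will
# run in scratch directory $2. Options and paths are copied from the slide.
write_submission_script() {
    id=$1
    scratchdir=$2
    cat <<EOF
#!/bin/bash
#PBS -S /bin/bash
#PBS -j oe
#PBS -N pradera
#PBS -l walltime=15:00:00
scratchdir=$scratchdir
input=${id}_1ras.dpf
output=${id}_1ras.dlg
drug=$id.pdbqt
echo STARTED \$(date)
cd "\$scratchdir" || exit 1
/usr/ngs/AUTODOCK_4_0_1 -p "\$input" -l "\$output"
gzip "\$output"
mv -f "\$output.gz" ./FINISHED
rm -f "\$drug" "\$input"
echo FINISHED \$(date)
EOF
}

# Demo: print the script for the compound used on the slide.
write_submission_script 08074014 /gpfs/scratch/ngs0655/DOCKINGS_3
```

Redirecting the output to a file and passing that file to `qsub` would complete the step; the escaped `\$` keeps those variables unexpanded until the job itself runs.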

Page 10

III. Retrieving results

1. Create a public/private key pair with an empty passphrase between the NGS cores and my local machine: this allows scp transfers without having to type the password

2. A Perl script was used to monitor finished tasks and transfer files (cron job: 30 min):

    cd output_dir
    ls *.dlg.gz > output_files
    open (output_files)
        scp file_n to local_machine
    exit

    #!/usr/bin/perl -w
    # MONITOR OUTPUT DIRECTORY AND IF THERE ARE ANY FILES TRANSFER THEM TO MY
    # LOCAL MACHINE USING A PUBLIC/PRIVATE PASSPHRASELESS KEY
    use strict;

    my $scp = "/usr/bin/scp";
    # REMOTE DIR
    my $remotedir = "narcis\@imm-pc2171.leeds.ac.uk:/scratch/data/LEAD-LIKE";
    # LOCAL DIRECTORY
    my $localdir = "/gpfs/scratch/ngs0655/DOCKINGS_3/FINISHED";

    opendir(CONF, $localdir) or die "Cannot open $localdir: $!";   # READ OUTPUT DIR
    my @conffiles = readdir(CONF);
    closedir(CONF);

    foreach my $file (@conffiles) {
        if ($file =~ /\.dlg\.gz$/) {   # GOOD ONE
            my $file2 = $localdir . "/" . $file;
            # TRANSFER FILE, AND DELETE AFTERWARDS ONLY IF THE COPY SUCCEEDED
            if (system("$scp -rp -P 22 $file2 $remotedir") == 0) {
                unlink $file2;
            }
        }
    }
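Step 1 of this slide (the passphraseless key) is usually done with standard OpenSSH tools. A minimal sketch, assuming OpenSSH is installed on the NGS login node; the function name and the key file name `id_rsa_ngs` are hypothetical, and the commands that touch the network are shown only as comments.

```shell
#!/bin/sh
# Sketch: create an RSA key pair with an empty passphrase at path $1
# (writes $1 and $1.pub).
make_passphraseless_key() {
    ssh-keygen -q -t rsa -N "" -f "$1"
}

# Typical use, run once on each NGS login node (not executed here;
# the key file name is hypothetical):
#   make_passphraseless_key "$HOME/.ssh/id_rsa_ngs"
#   ssh-copy-id -i "$HOME/.ssh/id_rsa_ngs.pub" narcis@imm-pc2171.leeds.ac.uk
# After that, scp can run unattended from cron:
#   scp -i "$HOME/.ssh/id_rsa_ngs" -p file.dlg.gz narcis@imm-pc2171.leeds.ac.uk:/scratch/data/LEAD-LIKE
```

With the public key installed on the local machine, the transfer script can call scp from cron without any interactive prompt.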


Page 11

April 2008
- Got my NGS account.
- Initial tests, setting up scripts, etc.

May 2008 – December 2008
- In silico screening

January 2009 - onwards
- In silico screening done
- Score hits and select most promising binders
- Starting experimental validation

[Timeline axis: April, May, June, July, Aug, Sep, Oct, Nov, Dec, Jan, Feb, Mar, April]

As of today:

203 compounds have already been tested (SPR), resulting in 18 validated binders


Page 12

Leeds Institute of Molecular Medicine
Prof. Terry Rabbitts
Dr. Tomo Tanaka
Dr. David Perez
Dr. Donna Petch

School of Chemistry
Prof. Peter Johnson
Dr. Colin Fishwick
Jayakanth Kankanala (JK)

Faculty of Biological Science
Prof. Steven Homans
Richard Malham

ngs@leeds
Shiv Kaushal
Jason Lander

ngs@oxford
Matteo Turilli

ngs@westminster
Thierry Delaitre