NFF. Leeds Institute of Molecular Medicine. University of Leeds. 2009
Using NGS to run a massive in silico drug screening against RAS
Dr. Narcis Fernandez-Fuentes
RCUK Academic Fellow
Computational Biology Group
Section of Experimental Therapeutics
University of Leeds
Overview
1. Biological and therapeutic relevance of the RAS family
2. How to run a large in silico screening using NGS infrastructure (my version)
• Making files available to several NGS cores: SRB
• Keeping NGS cores ‘busy’
• Copying back your stuff
3. Benefits for my research
- RAS proteins are small GTPases that act as signaling switches controlling normal cell growth and differentiation
Taken from Downward J, Nature Reviews Cancer 3, 11-22, 2003
[Figure: Ras switch between Ras-GTP (active) and Ras-GDP (inactive); a second panel repeats the cycle with one step crossed out (X) and an asterisk (*) marker.]
[Figure labels: PLC, PI3K, RAL, RAF, P120GAP]
In silico screening
RAS protein (RECEPTOR)
Library of chemical compounds (Drugs)
Software
1. Docking: Autodock, Glide, Gold, …
2. Scoring: x-Score, eHiTs, …
Receptor
– RAS structure
– Protein rigid during docking
– Mg was kept during screening
Drug Libraries
– NCIDS set: 1,990 (~140,000)
– ZINC7 0.8 Lead-like set: 128,085
– ZINC7 0.8 Drug-like set: 83,331
– ZINC7 0.8 Fragment-like set: 62,175
– Timtec library ‘Actiprobe 25K’: 53,298
– Chembridge ‘EXPRESS pick’: 156,268
∑ = 487,147 (~ >3 million)
Autodock Parameters
– 25,000,000 energy evaluations
– 200 LGA runs
– 300 initial population
– Flexible ligand representation
A few numbers…
Time required per docking run: ~5 hours
2.5M hours, 102,000 days, 56 years
Files required per docking run:
– Receptor file (protein): ~500 KB
– Ligand file (drug): 100 KB to 2 MB
– Parameters file: ~200 KB
– Grid file: ~105 KB
– N(*) interaction map files: ~700 KB
– It generates 1 output file
On average 12 files per run = 5.9M files
(*) where N = number of different atom types
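The totals on this slide can be checked with a few lines of shell arithmetic. This is only a sketch: it assumes a single processor running all 487,147 dockings back to back at 5 hours each, with 12 files per run, and the function names are illustrative rather than part of the original scripts.

```shell
#!/bin/bash
# Back-of-the-envelope workload estimate (integer arithmetic only).
# Inputs come from the slides; function names are illustrative.

compounds=487147      # total compounds (slide total)
hours_per_run=5       # approximate time per docking run
files_per_run=12      # average number of files per docking run

total_hours()   { echo $(( $1 * $2 )); }
hours_to_days() { echo $(( $1 / 24 )); }
days_to_years() { echo $(( $1 / 365 )); }
total_files()   { echo $(( $1 * $2 )); }

hours=$(total_hours "$compounds" "$hours_per_run")
days=$(hours_to_days "$hours")
years=$(days_to_years "$days")
files=$(total_files "$compounds" "$files_per_run")

echo "CPU hours: $hours"
echo "CPU days:  $days"
echo "CPU years: $years (single processor)"
echo "Files:     $files"
```

On one processor this gives 2,435,735 hours, 101,488 days and 278 years, plus 5,845,764 files. The slide does not say how its 56-year figure was derived; it would match about five dockings running at once, which is an assumption. The six library sizes listed earlier add up to 485,147, so the sketch uses the slide's stated total of 487,147.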
SRB - Leeds
ngs.leeds.ac.uk, ngs.rl.ac.uk, ngs.wmin.ac.uk, ngs.oerc.ox.ac.uk
I. Moving data from local machine to NGS
II. Submitting jobs: Keeping NGS computers ‘busy’
III. Retrieving results
I. Moving data from local machine to NGS
1. Generate all needed files locally
• ~/DL (LL, FL, NCIDS, TimTec, Chembridge)
• ~/DL/RECEPTOR (protein, parameter, grid, maps, etc.)
• ~/DL/COMPOUNDS (compound files)
• ~/DL/list (text file), one line per compound:
    comp1 param_file map_file1 map_file2 …
    comp2 param_file map_file1 map_file2 …
    …
2. Transfer files to SRB: accessible to any NGS cluster
• Need an SRB account
• Install the SRB client on your local machine
• Use the SRB command: # Srsync -r DL s:DL
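The list file described in step 1 could be built along these lines. This is a sketch under assumptions not stated on the slides: compounds are *.pdbqt files, each has a parameter file named <id>_1ras.dpf (the pattern seen in the submission script), and every *.map file in the receptor directory is listed for every compound. The function name build_list is hypothetical.

```shell
#!/bin/bash
# Sketch: write one line per compound to a list file.
# Assumptions (not from the slides): compounds are *.pdbqt files, each has a
# parameter file named <id>_1ras.dpf, and every *.map file in the receptor
# directory is listed for every compound.

build_list() {
    local compounds_dir="$1" receptor_dir="$2" list_file="$3"
    local maps="" m f id
    # Collect the map file names once (glob expansion is sorted).
    for m in "$receptor_dir"/*.map; do
        [ -e "$m" ] || continue
        maps="$maps $(basename "$m")"
    done
    : > "$list_file"                      # start with an empty list
    for f in "$compounds_dir"/*.pdbqt; do
        [ -e "$f" ] || continue
        id=$(basename "$f" .pdbqt)
        echo "$id.pdbqt ${id}_1ras.dpf$maps" >> "$list_file"
    done
}

# Example: build_list ~/DL/COMPOUNDS ~/DL/RECEPTOR ~/DL/list
```

In practice a compound only needs the maps for the atom types it contains, so a real list would probably carry fewer map files per line than this sketch writes.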
II. Submitting jobs: Keeping NGS computers ‘busy’
1. A different list file was copied to each NGS cluster
2. A Perl script was used to monitor the queue and submit new jobs (cron job: 5 min):
    qstat
    if ‘Q’ state jobs
        quit
    else
        Sinit
        n_jobs_to_submit = (n_slots - n_jobs_R)
        open (LIST) # list
        while i < n_jobs_to_submit
            Smv comp(i)
            write submission script
            qsub
            i++
        update(LIST)
        close(LIST)
        Sexit
    exit
Important!
- Have a valid proxy certificate, otherwise SRB will fail: upload your own (myproxy)
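The decision logic of the queue monitor above can be sketched in shell. The original was a Perl script that is not shown in full, so this is an illustration only. It assumes `qstat` prints the default PBS/Torque table with the job state in the fifth column; the function names and the slot count in the example are hypothetical.

```shell
#!/bin/bash
# Sketch of the queue check: submit only when nothing is waiting.
# Assumption: `qstat` prints the default PBS/Torque table, with the job
# state (Q = queued, R = running) in the fifth column.

# Count jobs in a given state from qstat output on stdin.
count_state() {
    awk -v s="$1" '$5 == s { n++ } END { print n + 0 }'
}

# Print how many new jobs to submit: 0 if any job is queued,
# otherwise the number of free slots (n_slots - running jobs).
jobs_to_submit() {
    local n_slots="$1" qstat_output="$2" queued running free
    queued=$(printf '%s\n' "$qstat_output" | count_state Q)
    running=$(printf '%s\n' "$qstat_output" | count_state R)
    if [ "$queued" -gt 0 ]; then
        echo 0
        return
    fi
    free=$(( n_slots - running ))
    if [ "$free" -lt 0 ]; then free=0; fi
    echo "$free"
}

# Print the first N lines of the list file and remove them from it,
# leaving the remaining compounds for the next cron run.
pop_lines() {
    local n="$1" list="$2"
    [ "$n" -gt 0 ] || return 0
    head -n "$n" "$list"
    tail -n +"$(( n + 1 ))" "$list" > "$list.tmp" && mv "$list.tmp" "$list"
}

# Example (hypothetical slot count of 20):
#   n=$(jobs_to_submit 20 "$(qstat)")
#   pop_lines "$n" list    # each printed line describes one job to qsub
```

Submitting nothing while any job is still queued keeps the script from piling up work faster than the cluster can start it, which matches the early quit in the pseudocode.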
Crontab
ngs0655@ngs:~> crontab -l
## CHECK QUEUE AND SUBMIT NEW JOBS
*/5 * * * * cd /gpfs/scratch/ngs0655/DOCKINGS_5; perl check_queue_cron.pl list /gpfs/scratch/ngs0655/DOCKINGS_5 queue.log > /dev/null 2>&1
## CLEAN LOGS
30 * * * * cd /gpfs/scratch/ngs0655/DOCKINGS_5; \ls *.o* >list.log; perl clean_logs.pl < list.log > /dev/null 2>&1
## COPY FINISHED JOBS TO MY LOCAL MACHINE
10 * * * * perl /gpfs/scratch/ngs0655/DOCKINGS_5/clean_dir.pl > /dev/null 2>&1
Example of a submission script
    #!/bin/bash
    #
    #PBS -S /bin/bash
    #PBS -j oe
    #PBS -N pradera
    #PBS -l walltime=15:00:00
    scratchdir=/gpfs/scratch/ngs0655/DOCKINGS_3
    input=08074014_1ras.dpf
    output=08074014_1ras.dlg
    compress=08074014_1ras.dlg.gz
    drug=08074014.pdbqt
    echo STARTED `date`
    echo $scratchdir $input $output $compress
    cd $scratchdir
    /usr/ngs/AUTODOCK_4_0_1 -p $input -l $output
    gzip $output
    mv -f $compress ./FINISHED
    rm -f $drug $input
    echo FINISHED `date`
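The "write submission script" step of the queue monitor could generate a script like the example above from a compound ID. This is a sketch, not the original Perl code. It assumes every file name follows the pattern of the example (<id>.pdbqt, <id>_1ras.dpf, <id>_1ras.dlg), and the function name is illustrative.

```shell
#!/bin/bash
# Sketch: emit a PBS submission script for one compound ID.
# Assumption: file names follow the pattern of the example script
# (<id>.pdbqt, <id>_1ras.dpf, <id>_1ras.dlg); the function name is illustrative.

write_submission_script() {
    local id="$1" scratchdir="$2"
    # Unquoted heredoc: $scratchdir and ${id} are filled in now, while the
    # escaped \$ and \` are left for the job itself to expand at run time.
    cat <<EOF
#!/bin/bash
#PBS -S /bin/bash
#PBS -j oe
#PBS -N pradera
#PBS -l walltime=15:00:00
scratchdir=$scratchdir
input=${id}_1ras.dpf
output=${id}_1ras.dlg
compress=${id}_1ras.dlg.gz
drug=${id}.pdbqt
echo STARTED \`date\`
cd \$scratchdir || exit 1
/usr/ngs/AUTODOCK_4_0_1 -p \$input -l \$output
gzip \$output
mv -f \$compress ./FINISHED
rm -f \$drug \$input
echo FINISHED \`date\`
EOF
}

# Example:
#   write_submission_script 08074014 /gpfs/scratch/ngs0655/DOCKINGS_3 > job.sh
```

The only deliberate difference from the slide's example is `cd ... || exit 1`, which stops the job if the scratch directory is missing instead of running the remaining commands in the wrong place.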
III. Retrieving results
1. Create a passphrase-less public/private key pair between the NGS cores and my local machine: this allows scp to copy data without typing a password
2. A Perl script was used to monitor finished tasks and transfer files (cron job: 30 min):
    cd output_dir
    ls *.dlg.gz > output_files
    open (output_files)
        scp file_n to local_machine
    exit
    #!/usr/bin/perl -w
    # MONITOR OUTPUT DIRECTORY AND IF THERE ARE ANY FILES TRANSFER THEM TO MY
    # LOCAL MACHINE USING A PUBLIC/PRIVATE PASSPHRASELESS KEY
    use strict;

    my $scp = "/usr/bin/scp";
    my $remotedir = "narcis\@imm-pc2171.leeds.ac.uk:/scratch/data/LEAD-LIKE";  # REMOTE DIR
    my $localdir = "/gpfs/scratch/ngs0655/DOCKINGS_3/FINISHED";  # LOCAL DIRECTORY
    opendir(CONF, $localdir) or die "Cannot open $localdir: $!";  # READ OUTPUT DIR
    my @conffiles = readdir(CONF);
    closedir(CONF);
    foreach my $file (@conffiles) {
        if ($file =~ /\.dlg\.gz$/) {  # GOOD ONE
            my $file2 = $localdir . "/" . $file;
            # TRANSFER FILE AND DELETE AFTERWARDS, BUT ONLY IF scp SUCCEEDED
            if (system("$scp -rp -P 22 $file2 $remotedir") == 0) {
                unlink $file2;
            }
        }
    }
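Step 1 above, the passphrase-less key, might be set up as follows. This is a sketch that assumes the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id` are installed. The key path and the user@host argument are placeholders, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: passphrase-less key so cron jobs can scp without a password.
# Assumption: OpenSSH tools (ssh-keygen, ssh-copy-id) are available; the key
# path and the user@host argument are placeholders chosen by the caller.

setup_passwordless_key() {
    local key="$1" remote="$2"
    # -N "" sets an empty passphrase; -f names the private key file.
    ssh-keygen -t rsa -N "" -f "$key" || return 1
    # Append the public key to ~/.ssh/authorized_keys on the remote host
    # (this asks for the password one last time).
    ssh-copy-id -i "$key.pub" "$remote"
}

# Example: setup_passwordless_key ~/.ssh/id_rsa_ngs user@local-machine
```

The empty passphrase is what lets the transfer run unattended from cron. Anyone who can read the private key file can use it, so it is usually kept readable only by its owner.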
April 2008
- Got my NGS account.
- Initial tests, setting up scripts, etc.
May 2008 – December 2008
- In silico screening
January 2009 - onwards
- In silico screening done
- Score hits and select most promising binders
- Starting experimental validation
[Timeline figure: April 2008 through April 2009]
As of today:
203 compounds have already been tested (SPR), resulting in 18 validated binders
Leeds Institute of Molecular Medicine
Prof. Terry Rabbitts, Dr. Tomo Tanaka, Dr. David Perez, Dr. Donna Petch
School of Chemistry
Prof. Peter Johnson, Dr. Colin Fishwick, Jayakanth Kankanala (JK)
Faculty of Biological Science
Prof. Steven Homans, Richard Malham
ngs@leeds
Shiv Kaushal, Jason Lander
ngs@oxford
Matteo Turilli
ngs@westminster
Thierry Delaitre