Delivering Bioinformatics Training Using Cloud Computing Infrastructure
Nathan S. Watson-Haigh
Take-Home Message
• Cloud computing infrastructure
  – Solves some issues in delivering hands-on bioinformatics training
  – Has its own unique set of issues
• Code and materials (CC and Open Access)
  – NGS Workshop
  – Try rolling your own!
github.com/BPA-CSIRO-Workshops
Watson-Haigh, N.S., et al. (2013). Next-generation sequencing: a challenge to meet the increasing demand for training workshops in Australia. Brief Bioinform 14, 563–574. http://bib.oxfordjournals.org/content/14/5/563
ACKNOWLEDGEMENTS
Catherine Shang (Bioplatforms Australia)
Nathan Watson-Haigh (ACPFG)
Nandan Deshpande (Systems Biology Initiative, UNSW)
Paula Moolhuijzen (CCG, Murdoch University)
Sonika Tyagi (Australian Genome Research Facility)
Matthew Field (ANU)
Annette McGrath (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
Konsta Duesing (Food & Nutrition Flagship, CSIRO)
Xi (Sean) Li (CSIRO Bioinformatics Core, Digital Productivity and Services Flagship)
Sean McWilliam (CSIRO Agricultural Productivity Flagship)
Paul Greenfield (CSIRO Digital Productivity and Services Flagship)
Cath Brooksbank (EBI)
Vicky Schneider-Gricar (TGAC)
Matthias Haimel (University of Cambridge)
Myrto Kostadima (University of Cambridge)
Remco Loos (EBI)
Alex Mitchell (EBI)
Hubert Denise (EBI)
Jerico Revote (Monash e-Research Centre)
Simon Michnowicz (Monash e-Research Centre)
Steve Quenette (Monash University)
Mark Crowe (QFAB)
Peter Sterk (Oxford e-Research Centre)
THE SETUP
[Diagram labels: Office, Host, Workshop VM]
SETTING UP
[Diagram labels: Office, Host, Admin VM, Cloud API Tools]
DEALING WITH HICCUPS
[Diagram labels: Office, Host, Sysadmin Computer, Portable Apps, Admin VM]
Admin VM tools: parallel-ssh, parallel-scp, parallel-slurp
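The three parallel-* tools named on this slide run one command, push one file, or pull one file across every trainee VM at once. A minimal sketch of how an admin script might assemble those command lines is shown here. It assumes the Debian/Ubuntu binary names (other systems ship them as pssh, pscp and pslurp), and the hosts file, user name, paths and remote command are hypothetical placeholders, not taken from the workshop repository.

```python
import shlex

# Hypothetical inputs: a file with one VM address per line, and a login name.
HOSTS_FILE = "workshop_vms.txt"
USER = "trainee"


def pssh_argv(remote_command, hosts_file=HOSTS_FILE, user=USER, timeout=60):
    """Build argv that runs one command on every VM and prints output inline (-i)."""
    return ["parallel-ssh", "-h", hosts_file, "-l", user,
            "-t", str(timeout), "-i", remote_command]


def pscp_argv(local_path, remote_path, hosts_file=HOSTS_FILE, user=USER):
    """Build argv that copies one local file to the same path on every VM."""
    return ["parallel-scp", "-h", hosts_file, "-l", user, local_path, remote_path]


def pslurp_argv(remote_path, local_name, local_dir, hosts_file=HOSTS_FILE, user=USER):
    """Build argv that fetches one file from every VM into per-host folders under local_dir."""
    return ["parallel-slurp", "-h", hosts_file, "-l", user,
            "-L", local_dir, remote_path, local_name]


# Print the command lines instead of running them, so they can be reviewed first.
for argv in (
    pssh_argv("df -h /home"),
    pscp_argv("fix_paths.sh", "/tmp/fix_paths.sh"),
    pslurp_argv("/var/log/syslog", "syslog", "collected_logs"),
):
    print(shlex.join(argv))
```

Passing one of these lists to `subprocess.run` would execute it. Printing first is a deliberate choice here, since a mistyped command would otherwise hit every VM in the room at once.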
Drivers
• A need for bioinformatics training
  – Good bioinformaticians work at the command line and understand command-line tools
• Take the workshops to the trainees
  – Maximise participation
Goals
• Minimise maintenance of the training environment
  – No monolithic installs
• Minimise cognitive burden on trainees
  – The training environment should go unseen
• Make everything publicly accessible and as reusable as possible
NGS Workshop: Key Elements
• Knowledgeable, friendly trainers
  – Obviously
• Content
  – Tools, data, handout
• Mode of delivery
  – Dedicated training suite, BYO laptop, roadshow
• Training environment
  – Tailored to mode of delivery
THE REUSABLE RESOURCES: VM SETUP
[Diagram labels: Office, Vanilla Ubuntu VM, NGS Workshop VM]
• Gnome • FreeNX • Generic Tools
• NGS tools • NGS data • NGS handout
THE REUSABLE RESOURCES: HANDOUT
[Diagram labels: Office, Trainee Handout, Trainer Handout]
Rolling Your Own Handout
• Style file provided
  – Makes it easy(er) to write/edit LaTeX
• Trainee Handout
• Trainer Handout
https://github.com/BPA-CSIRO-Workshops/handout-template
Simplified Styling
\begin{information}
Information to be provided to the trainee.
\end{information}
Simplified Styling
\begin{questions}
First question.
\begin{answer}
Answer to first question.
\end{answer}
Second question.
\begin{answer}
Answer to second question.
\end{answer}
\end{questions}
\end{questions}
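The slides do not show how the template turns one source file into separate trainee and trainer handouts. One plausible reading is that the answer blocks appear only in the trainer version. The sketch below illustrates that idea outside LaTeX by stripping answer environments from the source text. It is an assumption made for illustration, not the mechanism used by the handout-template style file, and the function name is hypothetical.

```python
import re

# Matches one whole answer environment, including its trailing newline.
# Non-greedy so that each answer is removed separately.
ANSWER_BLOCK = re.compile(r"\\begin\{answer\}.*?\\end\{answer\}\n?", re.DOTALL)


def trainee_view(latex_source: str) -> str:
    """Return the handout source with every answer environment removed."""
    return ANSWER_BLOCK.sub("", latex_source)


trainer_source = (
    "\\begin{questions}\n"
    "First question.\n"
    "\\begin{answer}\n"
    "Answer to first question.\n"
    "\\end{answer}\n"
    "Second question.\n"
    "\\begin{answer}\n"
    "Answer to second question.\n"
    "\\end{answer}\n"
    "\\end{questions}\n"
)

print(trainee_view(trainer_source))
```

Keeping questions and answers side by side in one plain-text source means the two handouts cannot drift apart, whichever mechanism produces them.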
Simplified Styling
\begin{lstlisting}
# several lines of code
cd ~/
ls -l
# a long command that line wraps automatically
tophat --solexa-quals -g 2 --library-type fr-unstranded -j annotation/Danio_rerio.Zv9.66.spliceSites -o tophat/ZV9_2cells genome/ZV9 data/2cells_1.fastq data/2cells_2.fastq
\end{lstlisting}
Resources Refresher
• Plain text files (Bash, LaTeX) for
  – Generic tools install
  – Workshop-specific tool install
  – Workshop-specific data download/configuration
  – Handout document
• Why plain text?
  – Version control
  – Collaboration
  – Reuse
Cloud Pros and Cons
Pros
• Consistent training environment
• No “alien” OS on host network
• Minimal host network configuration and traffic
  – Firewall (port 22)
• Minimal local computer specification and configuration
  – NX Client plus session files
• Scalable resources
• Encourages reproducible work
Cons
• Remote vs local confusion
  – Hide this using NX
• How to analyse own data?
• Requires a computer suite
• Sysadmin skills required
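Because the only host-network requirement listed is an open port 22, a trainer could confirm before a workshop that the training room can reach the cloud VMs on that port. The following is a minimal sketch of such a check using only the Python standard library. The function name is hypothetical, and nothing like it is claimed to exist in the workshop repository.

```python
import socket


def port_is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `port_is_reachable(address)` for each VM address from a machine inside the training room would show whether the host firewall lets SSH traffic through. A successful TCP connection says nothing about whether an NX session will start, so this is only a first check.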
Workshops Using This System
[Timeline figure; entries are date: location or course (places). Axis labels Feb 2012 and Sep 2013 have no recoverable entry.]
Jul 2012: MEL (22), SYD (22)
Nov 2012: BNE (33), ADL (29)
Feb 2013: CAN (35)
Jun 2013: PER (38)
Jul 2013: MEL (60)
Jul 2013: MEL (38)
Nov 2013: SYD (38), BNE (30)
Feb 2014: SYD (38), MEL (34)
Jul 2014: SYD (37)
Jul 2014: CAN (60)
Jul 2014: CAN (15)
Dec 2012: BioInfoSummer (100)
Dec 2013: BioInfoSummer (100)
Nov 2013: ACAD (30)
Nov 2012: ACAD (30)
Jul 2014: R & rQTL (40)
Apr 2013: Linux & RNA-Seq (30)
Nov 2014: ACAD (30)
BPA/CSIRO Competitive Courses: ~650 applicants for ~400 places
EMBL Australia PhD Program: 120 places
Other workshops: 360 places
Total: ~900 places in 2.5 yrs
Future Directions
• Better “glue” to enable easier reuse on
  – Local VMs
  – NeCTAR Research Cloud
  – Amazon Web Services
• Better documentation - ugh!
  – Easier for others to contribute and roll their own
  – Tagging workshop versions
Good, Bad and Ugly: Trainee
(slide columns: Good, Bad, Ugly)
BYO laptop
• Familiar environment
• Accessible afterwards
• Permissions
• Poor hardware specification
• First hour (or 3) wasted by sorting out “issues”
Dedicated training room
• A dedicated facility
• Everything should just work
• Costs of residential courses
Remote server
• Access to more powerful hardware
• Usually CLI
• Remote vs local confusion
• Limited access
• I want a GUI
• Users competing over compute resources
Cloud virtualisation
• Accessible afterwards
• What’s a cloud!?
• Can I use this for my own data?
Good, Bad and Ugly: Trainer
(slide columns: Good, Bad, Ugly)
BYO laptop
• Just need a room
• Network access
• Multiple OSes
• We look like idiots
• We wasted so much time
Dedicated training room
• We know what works
• Local IT support
• Maintaining up-to-date hardware
• No access afterwards
Remote server
• Control over the OS
• A single OS to maintain
• Managing users
• Resource management
Cloud virtualisation
• Enables roadshows
• 1 VM per trainee
• New skills required
• Post-workshop access
Puppet
• Helps sysadmins automate many repetitive tasks
• Puppet config files
  – Plain text (version control)
  – Define the required state “B”; Puppet figures out how to get from “A” to “B”
• Workshops defined in terms of tools and data needed using plain text
  – Collaborate and share on workshops
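The "A to B" idea can be illustrated without Puppet itself: the admin declares the state that is wanted, and a planner works out which actions, if any, are needed. The sketch below is a toy model of that declarative approach in Python. It is not Puppet syntax and makes no claim about Puppet's internals, and the tool names and versions are hypothetical examples.

```python
ABSENT = "absent"


def plan_changes(current, desired):
    """Return the actions that move the declared tools from state A to state B.

    Both arguments map a tool name to a version string (or ABSENT).
    Tools that are not declared in `desired` are left alone.
    """
    actions = []
    for name in sorted(desired):
        want = desired[name]
        have = current.get(name, ABSENT)
        if want == have:
            continue  # already in the required state, nothing to do
        if want == ABSENT:
            actions.append(("remove", name))
        elif have == ABSENT:
            actions.append(("install", name, want))
        else:
            actions.append(("change", name, want))
    return actions


def apply_changes(current, actions):
    """Return the new state after applying the planned actions to a copy of `current`."""
    state = dict(current)
    for action in actions:
        if action[0] == "remove":
            state.pop(action[1], None)
        else:
            state[action[1]] = action[2]
    return state


# State "A": what is on the VM now. State "B": what the workshop needs.
state_a = {"gnome": "3", "bowtie": "1.0"}
state_b = {"gnome": "3", "bowtie": "2.0", "tophat": "2.0.9"}

plan = plan_changes(state_a, state_b)
print(plan)
print(plan_changes(apply_changes(state_a, plan), state_b))
```

Planning a second time against the updated state yields no actions. That property is what makes a plain-text declaration safe to re-run on every VM and easy to share between workshops.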