![Page 1: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/1.jpg)
Motivation Solution Implementation Demonstration
Developing distributed analysis pipelines with shared community
resources using CloudBioLinux and CloudMan
Brad Chapman
Bioinformatics Core
Harvard School of Public Health
22 September 2011
![Page 2: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/2.jpg)
Acknowledgements
CloudBioLinux – Ntino Krampis, Tim
Booth, Dawn Field, Pjotr Prins and
CloudBioLinux community
CloudMan – Enis Afgan, James Taylor
Exome pipeline – HSPH, MGH, Win Hide,
Oliver Hofmann
![Page 3: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/3.jpg)
Follow along
http://www.slideshare.net/chapmanb
![Page 4: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/4.jpg)
Cue the “lots of data” slide
ls -lh fastq/
24G 1_110907_AD08A5ACXX_1_fastq.txt
21G 1_110907_AD08A5ACXX_2_fastq.txt
24G 2_110907_AD08A5ACXX_1_fastq.txt
20G 2_110907_AD08A5ACXX_2_fastq.txt
![Page 5: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/5.jpg)
Rapidly changing tools
![Page 6: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/6.jpg)
Science – fundamental challenge
75% one-off experimental
25% reused code
![Page 7: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/7.jpg)
Unfortunate result
http://news.ycombinator.com/item?id=2735537
![Page 8: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/8.jpg)
Hard choices
Computation
Demands flexible, well-architected, scalable
code
Science
Requires rapid turnaround and
experimentation
![Page 9: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/9.jpg)
2 solutions (at least)
1 Improve your programming skills
2 Utilize community resources
![Page 10: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/10.jpg)
Become a better coder
http://software-carpentry.org/
![Page 11: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/11.jpg)
Community resources
Share painful parts
Base of well-written, scalable code
Start each problem from a higher level of
abstraction
![Page 12: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/12.jpg)
Community components
CloudBioLinux – install software
CloudMan – manage cluster
Exome analysis pipeline – do science
![Page 13: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/13.jpg)
CloudBioLinux
Amazon image with bioinformatics
software and libraries
Automated build framework
Community effort to maintain and
extend
http://cloudbiolinux.org
![Page 14: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/14.jpg)
CloudMan
SGE cluster plus automation
Web interface and monitoring
Persistence and sharing
Powers the Galaxy Cloud offering
http://wiki.g2.bx.psu.edu/Admin/Cloud
![Page 15: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/15.jpg)
Exome analysis pipeline
Existing algorithms
Aligners – Bowtie, BWA
Variation – GATK
Quality assessment – FastQC, Picard
Messaging system – AMQP
https://github.com/chapmanb/bcbb/tree/master/nextgen
![Page 16: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/16.jpg)
Fastq lane processing
![Page 17: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/17.jpg)
Sample processing
![Page 18: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/18.jpg)
Variant calling
![Page 19: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/19.jpg)
Parallelization
![Page 20: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/20.jpg)
![Page 21: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/21.jpg)
Amazon
Virtual machines
Share
Reproduce
Coordinate
Accessibility
![Page 22: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/22.jpg)
What are we going to do?
Use AWS console to boot
CloudBioLinux
Set up CloudMan in AWS console
Boot CloudMan instance with demo
data
![Page 23: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/23.jpg)
What are we going to do? (continued)
Manage cluster with CloudMan interface
Set up messaging queue
Run pipeline, examine results
Share cluster
![Page 24: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/24.jpg)
CloudBioLinux
Select and launch CloudBioLinux AMI
from AWS console
Connect
FreeNX graphical client
ssh
Full tutorial PDF: http://j.mp/nnh5TE
![Page 25: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/25.jpg)
Prep work
Sign up for an AWS account:
http://aws.amazon.com/
Create login key pair in AWS Console
Install NX client:
http://www.nomachine.com/select-package-client.php
![Page 27: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/27.jpg)
Select CloudBioLinux image from Community AMIs
![Page 28: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/28.jpg)
Enter NX password in user-data (freenxpass: secret)
![Page 29: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/29.jpg)
Launch CloudBioLinux server
![Page 30: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/30.jpg)
Get external hostname from Instances page
![Page 31: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/31.jpg)
Connect using NX client, with ubuntu user and secret password
![Page 32: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/32.jpg)
![Page 33: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/33.jpg)
Connect with ssh, using private ssh key-pair
![Page 34: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/34.jpg)
Terminate the server when finished
![Page 35: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/35.jpg)
Set up CloudMan in AWS console
Create a custom security group
Full tutorial:
http://wiki.g2.bx.psu.edu/Admin/Cloud
![Page 36: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/36.jpg)
Create security group rules following wiki instructions
![Page 37: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/37.jpg)
Final security group specifications
![Page 38: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/38.jpg)
Boot CloudMan instance with demo data
Start server
Pass in CloudMan user data
Load shared CloudMan image
![Page 39: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/39.jpg)
Follow same procedure as CloudBioLinux
![Page 40: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/40.jpg)
Create CloudMan user-data file
cluster_name: cbldemo
password: cbl
access_key: your_access_key
secret_key: your_long_AWS_secret_key
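The user-data file above is four `key: value` lines. As a minimal sketch, assuming you want to generate that text from a script rather than type it by hand, the hypothetical helper `render_user_data` below writes the same four fields in the same order. The helper name and its validation rule are illustrative assumptions, not part of CloudMan.

```python
def render_user_data(cluster_name, password, access_key, secret_key):
    """Return user-data text with the four fields shown on the slide.

    Hypothetical helper for illustration; CloudMan itself only needs the
    resulting text pasted or uploaded in the AWS console.
    """
    fields = [
        ("cluster_name", cluster_name),
        ("password", password),
        ("access_key", access_key),
        ("secret_key", secret_key),
    ]
    for key, value in fields:
        # Each value must fit on one line, or the key/value layout breaks.
        if not value or "\n" in value:
            raise ValueError(f"{key} must be a non-empty single-line string")
    return "".join(f"{key}: {value}\n" for key, value in fields)


if __name__ == "__main__":
    # Placeholder credentials, exactly as on the slide.
    print(
        render_user_data(
            "cbldemo", "cbl", "your_access_key", "your_long_AWS_secret_key"
        ),
        end="",
    )
```

Keep real access keys out of version control; the placeholders here are the ones from the slide.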
![Page 41: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/41.jpg)
Provide user-data from file
![Page 42: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/42.jpg)
Choose created security group
![Page 43: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/43.jpg)
Log in to instance with password from user-data
![Page 44: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/44.jpg)
CloudMan share-an-instance
Persist data in a CloudMan cluster
Easily sharable
For this demo
cm-b53c6f1223f966914df347687f6fc818/shared/2011-10-07–14-00
![Page 45: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/45.jpg)
Import shared instance with demo data
![Page 46: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/46.jpg)
Manage cluster with CloudMan
Web-based console
Monitor running processes
Add nodes to cluster as needed
![Page 47: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/47.jpg)
CloudMan console to interact with cluster
![Page 48: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/48.jpg)
Add node to cluster
![Page 49: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/49.jpg)
Set up messaging communication
Command line access to server
Adjust RabbitMQ configuration
Set up messaging queue
![Page 50: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/50.jpg)
Command line access to server
ssh -i ~/.ec2/id-kunkel.keypair
Follow approach used to connect to
CloudBioLinux cluster; can also connect via
NX
![Page 51: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/51.jpg)
Edit /export/data/galaxy/universe_wsgi.ini
configuration file to add internal host name.
[galaxy_amqp]
host = ip-10-125-10-182.ec2.internal
port = 5672
userid = biouser
password = tester
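The edit above changes one value, `host`, in the `[galaxy_amqp]` section. A minimal sketch of doing that programmatically with Python's standard `configparser` follows. The function name `set_amqp_host` and the sample hostname in the usage are assumptions for illustration. Note that `configparser` discards comments when it writes a file back, so for a heavily commented configuration a manual edit may be preferable.

```python
import configparser
import io

# The section exactly as shown on the slide.
EXAMPLE_INI = """\
[galaxy_amqp]
host = ip-10-125-10-182.ec2.internal
port = 5672
userid = biouser
password = tester
"""


def set_amqp_host(ini_text, internal_hostname):
    """Return ini_text with [galaxy_amqp] host set to internal_hostname."""
    # interpolation=None so literal '%' characters elsewhere are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("galaxy_amqp"):
        parser.add_section("galaxy_amqp")
    parser.set("galaxy_amqp", "host", internal_hostname)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical internal hostname; use the one reported by your instance.
    print(set_amqp_host(EXAMPLE_INI, "ip-10-0-0-1.ec2.internal"), end="")
```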
![Page 52: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/52.jpg)
Set up messaging queue
$ sudo rabbitmqctl add_user biouser tester
creating user 'biouser' ...
...done.
$ sudo rabbitmqctl add_vhost bionextgen
creating vhost 'bionextgen' ...
...done.
$ sudo rabbitmqctl set_permissions -p bionextgen biouser ".*" ".*" ".*"
setting permissions for user 'biouser' in vhost 'bionextgen' ...
...done.
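The three `rabbitmqctl` calls above follow a fixed pattern: create a user, create a virtual host, then grant that user configure, write and read permissions on the host. As a sketch, assuming you want to script the setup for a different user or vhost, the hypothetical helper below builds the same argument lists. It only constructs and prints the commands; running them (for example with `subprocess.run`) is left to the caller.

```python
import shlex


def rabbitmq_setup_commands(user, password, vhost):
    """Build the three rabbitmqctl invocations from the slide as argv lists.

    Hypothetical helper for illustration; it does not execute anything.
    """
    return [
        ["sudo", "rabbitmqctl", "add_user", user, password],
        ["sudo", "rabbitmqctl", "add_vhost", vhost],
        # ".*" three times: configure, write and read permissions on everything.
        ["sudo", "rabbitmqctl", "set_permissions", "-p", vhost, user,
         ".*", ".*", ".*"],
    ]


if __name__ == "__main__":
    # Values from the slide; shlex.join (Python 3.8+) quotes arguments safely.
    for argv in rabbitmq_setup_commands("biouser", "tester", "bionextgen"):
        print(shlex.join(argv))
```

The user, password and vhost must match the `userid` and `password` in the `[galaxy_amqp]` section and the `rabbitmq_vhost` in the system-level YAML configuration.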
![Page 53: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/53.jpg)
Run pipeline, examine results
Ready to run distributed pipeline
Demo data – two paired end fastq lanes
Variant calling workflow
![Page 54: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/54.jpg)
Input sequence data
$ ls -1 /export/data/exome_example/fastq/
7_100326_FC6107FAAXX_1-chr22.fastq
7_100326_FC6107FAAXX_2-chr22.fastq
8_100326_FC6107FAAXX_1-chr22.fastq
8_100326_FC6107FAAXX_2-chr22.fastq
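The demo file names appear to encode lane, run date, flowcell name and read number (`_1` or `_2` for the two ends of a pair). As a minimal sketch, assuming that naming convention holds, the code below groups the files into paired-end lanes. The regular expression and function name are assumptions inferred from these four names, not the pipeline's own parsing code.

```python
import re
from collections import defaultdict

# Assumed layout: <lane>_<date>_<flowcell>_<read><optional suffix>.fastq
FASTQ_NAME = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d{6})_(?P<flowcell>[A-Za-z0-9]+)"
    r"_(?P<read>[12])(?P<suffix>.*)\.fastq$"
)


def group_by_lane(filenames):
    """Map (lane, date, flowcell) to {read_number: filename}."""
    lanes = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized fastq name: {name}")
        key = (int(match["lane"]), match["date"], match["flowcell"])
        lanes[key][int(match["read"])] = name
    return dict(lanes)


if __name__ == "__main__":
    demo_files = [
        "7_100326_FC6107FAAXX_1-chr22.fastq",
        "7_100326_FC6107FAAXX_2-chr22.fastq",
        "8_100326_FC6107FAAXX_1-chr22.fastq",
        "8_100326_FC6107FAAXX_2-chr22.fastq",
    ]
    for (lane, date, flowcell), reads in sorted(group_by_lane(demo_files).items()):
        print(lane, date, flowcell, reads[1], reads[2])
```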
![Page 55: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/55.jpg)
Run level: YAML Configuration
$ cat /export/data/exome_example/config/run_info.yaml
---
fc_date: '100326'
fc_name: FC6107FAAXX
details:
  - files: [7_100326_FC6107FAAXX_1-chr22.fastq,
            7_100326_FC6107FAAXX_2-chr22.fastq]
    lane: 7
    description: Test replicate 1
    analysis: SNP calling
    genome_build: hg19
    algorithm:
      quality_format: Standard
      hybrid_bait: hybrid_selection/baits.bed
      hybrid_target: hybrid_selection/targets.bed
![Page 56: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/56.jpg)
System level: YAML Configuration
$ cat /export/data/galaxy/post_process.yaml
---
program:
  bowtie: bowtie
  bwa: bwa
  ucsc_bigwig: wigToBigWig
  picard: /usr/share/java/picard
  gatk: /usr/share/java/gatk
  snpEff: /usr/share/java/snpeff
  fastqc: fastqc
distributed:
  cluster_platform: sge
  platform_args: '-q all.q'
  cores_per_host: 1
  rabbitmq_vhost: bionextgen
![Page 57: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/57.jpg)
Run exome pipeline
$ cd /export/data/work
$ distributed_nextgen_pipeline.py
/export/data/galaxy/post_process.yaml
/export/data/exome_example/fastq
/export/data/exome_example/config/run_info.yaml
![Page 58: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/58.jpg)
What just happened?
![Page 59: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/59.jpg)
Monitoring: SGE queues
$ qstat
job-ID prior name state submit/start at queue
--------------------------------------------------------------
1 0.55500 nextgen_an r 18:16:32 [email protected]
0.55500 nextgen_an r 18:16:32 [email protected]
0.55500 automated_ r 18:16:47 [email protected]
![Page 60: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/60.jpg)
Monitoring: Analysis directory
$ cd /export/data/work
$ ls -lh
drwxr-xr-x 4.0 alignments
-rw-r--r-- 2.0K automated_initial_analysis.py.o11
drwxr-xr-x 33 log
-rw-r--r-- 15K nextgen_analysis_server.py.o10
-rw-r--r-- 15K nextgen_analysis_server.py.o9
drwxr-xr-x 102 tmp
![Page 61: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/61.jpg)
Monitoring: Log files
$ less nextgen_analysis_server.py.o10
INFO: nextgen_pipeline: Processing sample: Test replicate 2; lane 8; reference genome hg19; researcher ; analysis method SNP calling
INFO: nextgen_pipeline: Aligning lane 8_100326_FC6107FAAXX with bwa aligner
INFO: nextgen_pipeline: Combining and preparing wig file [u'', u'Test replicate 2']
INFO: nextgen_pipeline: Recalibrating [u'', u'Test replicate 2'] with GATK
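Each log entry above has the shape `LEVEL: logger: message`. As a minimal sketch, assuming that shape holds for the rest of the file, the code below splits log lines into those three parts so the sequence of pipeline steps can be listed or filtered. The pattern is inferred from the four entries shown and is an assumption, not a documented log format.

```python
import re

# Assumed shape: "<LEVEL>: <logger>: <message>"; the space after the
# second colon is optional.
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+): (?P<logger>\w+): ?(?P<message>.*)$")


def parse_log(lines):
    """Return (level, logger, message) tuples for lines that match."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            records.append((match["level"], match["logger"], match["message"]))
    return records


if __name__ == "__main__":
    sample = [
        "INFO: nextgen_pipeline: Aligning lane 8_100326_FC6107FAAXX with bwa aligner",
        "INFO: nextgen_pipeline: Recalibrating [u'', u'Test replicate 2'] with GATK",
    ]
    for level, logger, message in parse_log(sample):
        print(level, logger, message, sep=" | ")
```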
![Page 62: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/62.jpg)
Retrieve results: Copy files
$ upload_to_galaxy.py
/export/data/galaxy/post_process.yaml
/export/data/exome_example/fastq
/export/data/work
/export/data/exome_example/config/run_info.yaml
Final files copied into new directory; allows
cleanup of analysis directory
![Page 63: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/63.jpg)
Retrieve results: Output directory
$ ls -lh /export/data/galaxy/storage/100326_FC6107FAAXX/7
-rw-r--r-- 38M 7_100326_FC6107FAAXX.bam
-rw-r--r-- 22M 7_100326_FC6107FAAXX-coverage.bigwig
-rw-r--r-- 72M 7_100326_FC6107FAAXX-gatkrecal.bam
-rw-r--r-- 109K 7_100326_FC6107FAAXX-snp-effects.tsv
-rw-r--r-- 827K 7_100326_FC6107FAAXX-snp-filter.vcf
-rw-r--r-- 1.6M 7_100326_FC6107FAAXX-summary.pdf
![Page 64: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/64.jpg)
Share results
Share-an-instance
Uses CloudMan web interface
Reproducible research
CloudBioLinux AMI – software
CloudMan – data and configuration
![Page 65: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/65.jpg)
CloudMan console enables push button sharing
![Page 66: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/66.jpg)
Can make public or available to specific collaborators
![Page 67: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/67.jpg)
When finished, turn everything off through CloudMan
![Page 68: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/68.jpg)
Summary
CloudBioLinux
Shared machine image of biological
software
Boot from AWS console
Connect with NX graphical client and
ssh
![Page 69: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/69.jpg)
Summary
CloudMan
Cluster setup and management
Boot from share-an-instance
Manage cluster through web interface
Share final results
![Page 70: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/70.jpg)
Summary
Exome pipeline
Parallel framework for running analyses
Run using automated scripts
Extract alignments, variant calls and
summary information
![Page 71: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/71.jpg)
Future: interfaces make it easier
https://bitbucket.org/hbc/galaxy-central-hbc
![Page 72: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/72.jpg)
Future: Simplified file selection
![Page 73: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/73.jpg)
Future: Top level parameters
![Page 74: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/74.jpg)
Future: Galaxy data libraries
![Page 75: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/75.jpg)
Future: Galaxy analysis
![Page 76: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/76.jpg)
Future: External UCSC visualization
![Page 77: Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan](https://reader033.vdocuments.us/reader033/viewer/2022052823/5550137ab4c905af648b4a4d/html5/thumbnails/77.jpg)
Read more
Step-by-step instructions
http://j.mp/rp69nx
Approaches to parallelism
http://j.mp/nPQHcm
Future work
http://bcbio.wordpress.com