
Upload: ntino-krampis

Post on 16-Apr-2017


Cloud BioLinux: pre-configured and on-demand computing for genomics without institutional, geographic or economic boundaries

Ntino Krampis, PhD

JCVI-NIAID-UL workshop, S. Africa 2011

Low-cost sequencing technology

A new generation of small form-factor, bench-top sequencers

example: GS Junior by 454

sequencing becoming standard in biology and genetics research

besides whole genomes: RNA-seq, ChIP-seq, and metagenomics


Acquiring the sequence data is only the first step

downstream bioinformatic analysis is required for scientific discovery

Problem 1: sequence data analysis requires high-performance, expensive computing hardware

Problem 2: many commonly used bioinformatics tools are difficult to install and are usually available only as source code, so technical expertise is needed


Solving problem 1: computational capacity on the cloud

cloud computing: high-performance computers and data storage, remotely accessible through the Internet

we are all using the cloud: Gmail, Google Docs, Yahoo! Mail, Facebook; you store and access data on a remote computer

cloud computers are rented pay-as-you-go from service providers such as Amazon Elastic Compute Cloud (EC2)


Cloud computing with Amazon EC2

Additional services besides computing and storage: http://aws.amazon.com

a subsidiary company of Amazon.com, pay-as-you-go cloud computing

cloud computers cost $0.085 - $2 per hour (max 64 GB memory and 8 processors)

used by companies that need additional computers without investing in hardware

physical locations: US East / West regions, EU, Singapore, Japan

researchers work on the closest location, then distribute results world-wide

democratizes access to computing resources outside of institutional, economic or national boundaries

750 hours free for new users: http://aws.amazon.com/free/
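As a rough illustration of the pay-as-you-go pricing on this slide, the sketch below estimates the bill for an analysis run from the quoted 2011 hourly rates ($0.085 to $2 per hour). The function name, the rule that a partial hour is billed as a full hour, and the example workload are assumptions made for illustration; they are not taken from the slides or from AWS documentation.

```python
import math

# Hourly prices quoted on the slide (2011 figures, US dollars).
CHEAPEST_PER_HOUR = 0.085
LARGEST_PER_HOUR = 2.00


def estimate_compute_cost(hours, price_per_hour, instances=1):
    """Estimate the cost of running `instances` VMs for `hours` each.

    Assumption: a partial hour is billed as a full hour.
    """
    if hours < 0 or price_per_hour < 0 or instances < 0:
        raise ValueError("hours, price and instances must be non-negative")
    billed_hours = math.ceil(hours)
    return round(billed_hours * price_per_hour * instances, 2)


if __name__ == "__main__":
    # Hypothetical workload: a 10.5 hour analysis on one VM,
    # priced at the cheapest and at the largest machine type.
    print(estimate_compute_cost(10.5, CHEAPEST_PER_HOUR))
    print(estimate_compute_cost(10.5, LARGEST_PER_HOUR))
```

Current prices and billing granularity differ from the 2011 figures, so the constants should be replaced before using the estimate for planning.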



How does cloud computing work?

the operating system, bioinformatics tools and data are installed on a Virtual Machine (VM)

the VM is uploaded to the cloud and runs using on-demand computing capacity from the EC2 cloud service

it can be accessed world-wide through a desktop / laptop computer with Internet access

this removes the need for local computing infrastructure at each laboratory

[Diagram: local desktop computers connect through the Internet to VMs running on the remote Amazon EC2 cloud computing service]


Solving problem 2: Cloud BioLinux

bioinformatics tools are difficult to install

Cloud BioLinux offers a VM on the cloud with 100+ pre-installed and configured bioinformatics tools

sequence analysis, de novo assembly, annotation, phylogeny, molecular modeling, gene expression

a researcher can initiate a practically unlimited number of VMs for large-scale data analysis


sign in to the Amazon EC2 cloud control console: http://aws.amazon.com/console

Username: [email protected]

Password: SAcloud!


Starting our tutorial: using the cloud

Launch Cloud BioLinux through the EC2 cloud console

Click the Launch Instance button


Cloud BioLinux launch wizard: steps 1 & 2

1. go to the Community AMIs tab, specify the Cloud BioLinux identifier ami-6011e409, then click to continue

2. select computational capacity: Large - 2 CPU cores, 7.5 GB memory, then click to continue


Cloud BioLinux launch wizard: step 3

3. specify a password (workshop) for login to Cloud BioLinux in the User Data box, then click to continue


Cloud BioLinux launch wizard: steps 4 & 5

4. enter a value to uniquely identify your individual Cloud BioLinux VM, then click to continue

5. select Proceed without a Key Pair, then click to continue


Cloud BioLinux launch wizard: steps 6 & 7

6. choose the default security group, then click to continue

7. Are we all on the final screen? Click to finish the wizard
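For readers who prefer scripting to the console, the seven wizard steps above can be sketched with the boto3 library. This is a minimal sketch, not the workshop's method. The following are assumptions: that "Large" corresponds to the instance type m1.large, that the identifying value from step 4 is stored as a Name tag, that the region is us-east-1, and that the 2011 image ami-6011e409 is still available. The helper names are hypothetical.

```python
def build_launch_request(vm_name, password,
                         ami_id="ami-6011e409", instance_type="m1.large"):
    """Collect the choices from wizard steps 1 to 6 as keyword
    arguments for the boto3 EC2 client method run_instances()."""
    if not vm_name:
        raise ValueError("vm_name must not be empty")
    return {
        "ImageId": ami_id,              # step 1: Cloud BioLinux identifier
        "InstanceType": instance_type,  # step 2: assumed API name for "Large"
        "UserData": password,           # step 3: login password via User Data
        "TagSpecifications": [{         # step 4: assumed to be a Name tag
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": vm_name}],
        }],
        # step 5: no "KeyName" entry, i.e. proceed without a key pair
        "SecurityGroups": ["default"],  # step 6: default security group
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(vm_name, password, region="us-east-1"):
    """Step 7: send the request. Needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the sketch loads without it
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(vm_name, password))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Only prints the request; call launch() to actually start a VM.
    print(build_launch_request("my-name", "workshop"))
```

Keeping the request in a plain dictionary makes it possible to inspect the choices before any billable VM is started.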


Cloud BioLinux launch status

the wizard completes and we return to the console

the VM takes a few minutes to launch; until then it is shown in the pending (yellow) state


While waiting for Cloud BioLinux to boot up...


public datasets on Amazon EC2: http://aws.amazon.com/publicdatasets

GenBank and Ensembl databases, 1000 Genomes Project, influenza

data hosted for free, users pay only for the computing time used

community program: http://aws.amazon.com/datasets/submit

advantage: putting the data where computational capacity is available

Amazon EC2 education-research grants: http://aws.amazon.com/education/

Any questions before we get to the exercises?


final step

In the console click Instances

find your unique Cloud BioLinux VM using the name you specified in step 4

copy its Public DNS (server address / URL on the cloud)

Connecting remotely to Cloud BioLinux

click the NX client icon on your computer's desktop:

A. paste the DNS in the Host box

B. select Unix, Gnome, remote desktop size

C. ubuntu is the default user login; workshop is the password we set
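The console lookup in this final step (open Instances, find the VM by the name from step 4, copy its Public DNS) could also be scripted. The sketch below works on a dictionary shaped like the response of the boto3 EC2 client method describe_instances(). It assumes the identifying value was stored as a Name tag; the function name and the sample data are hypothetical.

```python
def find_public_dns(describe_response, vm_name):
    """Return the Public DNS of the running VM whose Name tag is vm_name.

    Returns None if no such VM exists or if it is not running yet
    (for example while it is still in the pending state).
    """
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            state = instance.get("State", {}).get("Name")
            if tags.get("Name") == vm_name and state == "running":
                return instance.get("PublicDnsName") or None
    return None


if __name__ == "__main__":
    # Hand-written sample in the describe_instances() response shape.
    sample = {"Reservations": [{"Instances": [{
        "State": {"Name": "running"},
        "PublicDnsName": "ec2-host.example.com",  # placeholder address
        "Tags": [{"Key": "Name", "Value": "my-name"}],
    }]}]}
    print(find_public_dns(sample, "my-name"))
```

With boto3 installed and credentials configured, the real response would come from `boto3.client("ec2").describe_instances()`; the returned address is what goes into the Host box of the NX client.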


two S. aureus strains and one S. carnosus species

drag & drop the .fna files on the Cloud BioLinux desktop


save and share the Virtual Machine (VM) containing your analysis results with a collaborator

storage costs: $0.10 / GB / month
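To put the quoted storage price in concrete terms, here is a small sketch that multiplies it out for a saved VM. The function name and the 20 GB example size are assumptions for illustration; the $0.10 per GB per month rate is the 2011 figure from the slide.

```python
STORAGE_PRICE_PER_GB_MONTH = 0.10  # US dollars, 2011 figure from the slide


def estimate_storage_cost(size_gb, months=1,
                          price=STORAGE_PRICE_PER_GB_MONTH):
    """Estimate the cost of keeping a saved VM of size_gb for `months`."""
    if size_gb < 0 or months < 0:
        raise ValueError("size_gb and months must be non-negative")
    return round(size_gb * months * price, 2)


if __name__ == "__main__":
    # Hypothetical example: a 20 GB VM image kept for one year.
    print(estimate_storage_cost(20, months=12))
```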


Cloud BioLinux: whole system snapshot exchange

authorize access to the VM: public or for certain users

other researchers can access the VM with all the software, data and analysis results directly on the cloud
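The two sharing options named on this slide, public or for certain users, map onto launch permissions of a saved machine image. The sketch below builds the arguments for the boto3 EC2 client method modify_image_attribute(). It assumes the VM has already been saved as an image; the helper name, the image identifier and the account number are hypothetical placeholders.

```python
def build_share_request(image_id, account_ids=None):
    """Keyword arguments for modify_image_attribute() that grant
    launch permission on a saved image.

    account_ids=None makes the image public; otherwise only the
    listed AWS account IDs may launch it.
    """
    if account_ids is None:
        grants = [{"Group": "all"}]
    else:
        if not account_ids:
            raise ValueError("account_ids must not be empty")
        grants = [{"UserId": str(account)} for account in account_ids]
    return {"ImageId": image_id, "LaunchPermission": {"Add": grants}}


if __name__ == "__main__":
    # Placeholder image ID and account number, for illustration only.
    print(build_share_request("ami-00000000"))
    print(build_share_request("ami-00000000", ["123456789012"]))
```

With boto3 and credentials in place, the request would be sent as `boto3.client("ec2").modify_image_attribute(**request)`.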


Acknowledgments & Credits

Brad Chapman, Tim Booth, Bela Tiwari, Dawn Field - Cloud BioLinux development

Deepak Singh and AWS - compute credits on EC2 supporting initial development

J. Craig Venter Inst. - sponsorship / time allowed to work on this project

D. Gomez, E. Navarro, J. Shao, I. Singh, D. Edwards, M. Stout - JCVI tech innovation

Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls

Thank you!