OBIWEE: An open source bioinformatics cloud environment (Jonathan Piat)

Page 1: F03-Cloud-Obiwee

OBIWEE - BOSC 2011, July 16, Vienna

OBIWEE : an open source bioinformatics cloud environment

OBIWEE : On Demand Bioinformatics Intensive Workflow Execution Environment

J. Piat, F. Moreews, O. Sallou

http://vapor.gforge.inria.fr/

Page 2: F03-Cloud-Obiwee

What is OBIWEE?

OBIWEE is an open source bioinformatics Intensive Computation Execution environment based on SLICEE.

Preconfigured on a scalable Linux virtual cluster with the Torque job scheduler, it can be deployed on a private cloud using OpenNebula, or on the EC2 public cloud.

S3 is used as a persistent storage layer (Eucalyptus Walrus or Amazon S3).

Based on Ubuntu/Debian Linux bioinformatics images/packages.

Page 3: F03-Cloud-Obiwee

What is OBIWEE?

1/ A workflow authoring tool +

2/ A virtual cluster (Torque) +

3/ A set of deployment scripts for

Private cloud (OpenNebula / KVM)

and/or

Public cloud (EC2)

Page 4: F03-Cloud-Obiwee

OBIWEE : components

1/ SLICEE : A workflow authoring tool

● Tool description is command-line based:

    - Write the command line in your workflow as you would locally; it is executed remotely

    - All installed tools are immediately available

    - Easy file referencing method

● Job scheduler front end (queue selection per job)

● Set of reference IDs : dataset reference mechanism for remote service invocation

● Access data via URI : multiple protocols (sftp,ftp,http,file,s3) + internal ref. ID URI.

● Standard authentication (ssh)

● Persistence and logs

● Automatic coarse-grain parallelism extraction:

    - Basic bioinformatics formats implemented

    - Easy extension with regular expressions/external scripts
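The coarse-grain parallelism bullet can be illustrated with a small sketch. It assumes FASTA input, where a regular expression (`>` at the start of a line) marks the beginning of each record, and it distributes the records round-robin into chunks that could each become one job. `RecordSplitter`, `splitRecords` and `chunk` are hypothetical names chosen for this illustration; they are not SLICEE's API, and SLICEE's actual splitting strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SLICEE: splits a text file into records
// using a regular expression that matches the start of each record.
final class RecordSplitter {

    // Returns one string per record. The lookahead is zero-width, so the
    // text that opens a record (e.g. the FASTA ">" header) stays with it.
    static List<String> splitRecords(String text, String recordStartRegex) {
        Pattern boundary = Pattern.compile("(?m)(?=^" + recordStartRegex + ")");
        List<String> records = new ArrayList<>();
        for (String part : boundary.split(text)) {
            if (!part.isEmpty()) {
                records.add(part);
            }
        }
        return records;
    }

    // Distributes records round-robin into chunkCount groups,
    // one group per job to submit.
    static List<List<String>> chunk(List<String> records, int chunkCount) {
        if (chunkCount <= 0) {
            throw new IllegalArgumentException("chunkCount must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            chunks.get(i % chunkCount).add(records.get(i));
        }
        return chunks;
    }
}
```

For example, `RecordSplitter.chunk(RecordSplitter.splitRecords(fastaText, ">"), 4)` would prepare four groups of sequences. Supporting another format under this sketch only requires a different record-start expression.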

Page 5: F03-Cloud-Obiwee

OBIWEE : components

2/ A virtual cluster

A scalable cluster using the Torque or SGE scheduler

Workflow jobs and parallelized jobs are submitted to the distributed resource manager (DRM). The DRM is easy to scale, which increases the workload capacity of the tool.
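As an illustration of submission to the DRM with a queue chosen per job, the sketch below assembles a `qsub` command line. The `-q` (queue) and `-N` (job name) options exist in both Torque and SGE. `QsubCommand`, the queue name and the script name are assumptions made for the example; the slides do not show how SLICEE itself talks to the scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SLICEE: builds the argument list
// for a qsub submission with an optional per-job queue.
final class QsubCommand {

    static List<String> build(String queue, String jobName, String scriptPath) {
        List<String> command = new ArrayList<>();
        command.add("qsub");
        // Queue selection per job; omitted when the scheduler default is wanted.
        if (queue != null && !queue.isEmpty()) {
            command.add("-q");
            command.add(queue);
        }
        command.add("-N");
        command.add(jobName);
        command.add(scriptPath);
        return command;
    }
}
```

On a host where `qsub` is installed, the list returned by `QsubCommand.build("batch", "blast_chunk_1", "job.sh")` could be passed to `new ProcessBuilder(command).start()`. Building the arguments as a list avoids shell quoting problems in job names and paths.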

Page 6: F03-Cloud-Obiwee

OBIWEE : components

3/ A set of deployment scripts for

Private cloud (OpenNebula / KVM)

and/or

Public cloud (EC2)

Page 7: F03-Cloud-Obiwee

OBIWEE : installation

[Diagram: installation steps linked by "needs" and "provides" arrows. Labels recovered from the slide:]

Steps: Bioinformatics software installation; Slicee install; Lyncee install; Virtual image creation; Image configuration; Cluster generation (NFS mount of working directory, node deployment)

Provides: Workflow management; Data parallelization; Data management; Job submission; Authorization

Page 8: F03-Cloud-Obiwee

OBIWEE : architecture

[Diagram: a client (Kepler/command line) and a virtual cluster made of a master node and worker nodes, hosted on Amazon EC2/OpenNebula. Arrow and component labels recovered from the slide: Run job; Add node; NFS share; S3; publish; Retrieve.]

Page 9: F03-Cloud-Obiwee

OBIWEE : clients

● API (job submission) : create your own submission/orchestration clients

● Command line (workflow execution)

● GUI (workflow execution and design): Kepler with SLICEE actors (workflow creation/execution)

java -cp $cp vapor.cli.VaporCmdClient -w workflow.xml -i input.xml -d auth.xml

CommonRestClient client = new CommonRestClient(serverUrl);
// upload data
client.upload(sessionId, inputDataUriPath);
// asynchronous execution
rdsid = client.getDSIDFromAsyncExe(xmlQuery, sessionId);
// wait (client.waitAndGetResult()), or do something else
// download / move results
client.move(vaporSession, uri, new URI(path));
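Since data locations are passed to the API as URIs, a custom client may want to check a location before uploading or moving it. The sketch below tests a location against the protocols listed on the components slide (sftp, ftp, http, file, s3) using only `java.net.URI`. `DataUriCheck` is a hypothetical helper and not part of the client library; the internal reference ID URI form is not described in the slides, so it is not handled here.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the SLICEE client library.
final class DataUriCheck {

    // Protocols named on the components slide.
    private static final Set<String> SCHEMES =
            new HashSet<>(Arrays.asList("sftp", "ftp", "http", "file", "s3"));

    // Returns true when the location parses as a URI whose scheme is listed above.
    static boolean isSupported(String location) {
        try {
            String scheme = URI.create(location).getScheme();
            return scheme != null && SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // URI.create rejects syntactically invalid input, e.g. text containing spaces.
            return false;
        }
    }
}
```

A relative or bare path such as `/tmp/reads.fasta` has no scheme and is rejected under this sketch; it would need a `file:` prefix.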

Page 10: F03-Cloud-Obiwee

OBIWEE : KEPLER client with SLICEE actors

miRNAs detection workflow

Page 11: F03-Cloud-Obiwee

OBIWEE : road map

● Monitoring, failover

● Custom full web client

● Integration in existing popular clients

● Data cleanup policies

THANK YOU !

More info on SLICEE and the OBIWEE EC2 deployment tutorial at

http://vapor.gforge.inria.fr/