OBIWEE: An open source bioinformatics cloud environment (Jonathan Piat)
OBIWEE - BOSC 2011, July 16, Vienna
OBIWEE : an open source bioinformatics cloud environment
OBIWEE : On Demand Bioinformatics Intensive Workflow Execution Environment
J. Piat, F. Moreews, O. Sallou
http://vapor.gforge.inria.fr/
What is OBIWEE?
OBIWEE is an open source execution environment for intensive bioinformatics computation, based on SLICEE.
Preconfigured on a scalable Linux virtual cluster with the Torque job scheduler, it can be deployed on a private cloud, using OpenNebula, or on the EC2 public cloud.
S3 is used as the persistent storage layer (Eucalyptus Walrus or Amazon S3).
Based on Ubuntu/Debian Linux bioinformatics images and packages.
What is OBIWEE?
1/ A workflow authoring tool +
2/ A virtual cluster (Torque) +
3/ A set of deployment scripts for
Private cloud (OpenNebula / KVM)
and/or
Public cloud (EC2)
OBIWEE : components
1/ SLICEE : A workflow authoring tool
● Tool description is command-line based:
  ○ Write the command line in your workflow as you would locally, execute it remotely
  ○ All installed tools immediately available
  ○ Easy file referencing method
● Job scheduler front end (queue selection per job)
● Set of reference IDs: dataset reference mechanism for remote service invocation
● Access data via URI: multiple protocols (sftp, ftp, http, file, s3) + internal reference ID URI
● Standard authentication (ssh)
● Persistence and logs
● Automatic coarse-grain parallelism extraction:
  ○ Basic bioinformatics formats implemented
  ○ Easy extension with regular expressions/external scripts
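To illustrate the coarse-grain parallelism idea, here is a minimal Java sketch of splitting a multi-record FASTA input into chunks with a regular expression, so that each chunk could be processed as a separate job. It is not SLICEE's actual code: the class name `FastaSplitter`, the method `split`, and the round-robin distribution are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SLICEE: splits FASTA text into record-aligned chunks.
class FastaSplitter {
    // Zero-width match at the start of every line that begins with '>'
    // (the FASTA record header marker).
    private static final Pattern RECORD_START = Pattern.compile("(?m)^(?=>)");

    // Returns at most maxChunks strings, each holding whole FASTA records.
    static List<String> split(String fasta, int maxChunks) {
        if (maxChunks < 1) {
            throw new IllegalArgumentException("maxChunks must be >= 1");
        }
        // Cut the input at record boundaries, dropping blank pieces.
        List<String> records = new ArrayList<>();
        for (String record : RECORD_START.split(fasta)) {
            if (!record.trim().isEmpty()) {
                records.add(record);
            }
        }
        // Distribute records round-robin so chunks hold similar record counts.
        int chunkCount = Math.min(maxChunks, records.size());
        List<StringBuilder> builders = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            builders.add(new StringBuilder());
        }
        for (int i = 0; i < records.size(); i++) {
            builders.get(i % chunkCount).append(records.get(i));
        }
        List<String> chunks = new ArrayList<>();
        for (StringBuilder builder : builders) {
            chunks.add(builder.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n";
        List<String> chunks = split(fasta, 2);
        System.out.println("chunks: " + chunks.size());
    }
}
```

Splitting on record boundaries rather than on byte offsets keeps every sequence intact, which is what makes each chunk usable as independent input for a tool.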
OBIWEE : components
2/ A virtual cluster
A scalable cluster using the Torque or SGE scheduler
Workflow jobs and parallelized jobs are submitted to the distributed resource manager (DRM). The DRM is easy to scale, which increases the workload capacity of the tool.
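As a rough illustration of per-job queue selection, the sketch below assembles a `qsub` command line for one job. The `-q` (destination queue) and `-N` (job name) options exist in both Torque and SGE; everything else is an assumption for illustration: the class `QsubCommand`, the queue name `batch`, the job name, and the script name are hypothetical, and this is not how SLICEE itself submits jobs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OBIWEE: builds a qsub argument list.
class QsubCommand {
    // queue may be null or empty, in which case the scheduler's default queue applies.
    static List<String> build(String queue, String jobName, String scriptPath) {
        List<String> command = new ArrayList<>();
        command.add("qsub");
        if (queue != null && !queue.isEmpty()) {
            command.add("-q");   // destination queue
            command.add(queue);
        }
        command.add("-N");       // job name
        command.add(jobName);
        command.add(scriptPath); // script executed on a cluster node
        return command;
    }

    public static void main(String[] args) {
        // Only prints the command; passing the list to ProcessBuilder
        // would run it on a host where qsub is installed.
        List<String> command = build("batch", "blast_chunk_0", "run_blast.sh");
        System.out.println(String.join(" ", command));
    }
}
```

Keeping the command as a list of arguments rather than one string avoids shell quoting problems when job names or paths contain spaces.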
OBIWEE : components
3/ A set of deployment scripts for
Private cloud (OpenNebula / KVM) and/or
Public cloud (EC2)
OBIWEE : installation
[Diagram: installation steps linked by "needs" and "provides" arrows. Steps shown: Virtual image creation; Image configuration; Bioinformatics software installation; Slicee install; Lyncee install; Cluster generation (NFS mount of working directory, node deployment). Services listed: workflow management, data parallelization, data management, job submission, authorization.]
OBIWEE : architecture
[Diagram: a client (Kepler/command line), a master node and worker nodes hosted on Amazon EC2/OpenNebula, an NFS share, and S3. Arrow labels: "Run job", "Add node", "publish", "Retrieve".]
OBIWEE : clients
● API (job submission): create your own submission/orchestration clients
● Command line (workflow execution)
● GUI (workflow execution and design): Kepler with SLICEE actors (workflow creation/execution)
java -cp $cp vapor.cli.VaporCmdClient -w workflow.xml -i input.xml -d auth.xml
CommonRestClient client = new CommonRestClient(serverUrl);
// upload data
client.upload(sessionId, inputDataUriPath);
// asynchronous execution
rdsid = client.getDSIDFromAsyncExe(xmlQuery, sessionId);
// wait (client.waitAndGetResult()), or do something else
// download / move results
client.move(vaporSession, uri, new URI(path));
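The "submit, then wait or do something else" pattern in the API snippet can be sketched with the standard `java.util.concurrent.CompletableFuture`. This is only an analogy, assuming nothing about the real client: the class `AsyncJobSketch`, the method `submit`, and the `result-of:` string are hypothetical stand-ins, and no SLICEE server is contacted.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a remote job API; it does not call SLICEE.
class AsyncJobSketch {
    // Starts the "job" on a background thread and returns immediately.
    static CompletableFuture<String> submit(String query) {
        return CompletableFuture.supplyAsync(() -> "result-of:" + query);
    }

    public static void main(String[] args) {
        // Submission returns a handle at once, like a dataset ID for a pending result.
        CompletableFuture<String> pending = submit("workflow.xml");

        // The client is free to do other work while the job runs.
        System.out.println("job submitted, doing other work");

        // Block only at the point where the result is actually needed.
        String result = pending.join();
        System.out.println(result);
    }
}
```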
OBIWEE : KEPLER client with SLICEE actors
miRNA detection workflow
OBIWEE : road map
● Monitoring, failover
● Custom full web client
● Integration in existing popular clients
● Data cleanup policies
THANK YOU !
More info on SLICEE and the OBIWEE EC2 deployment tutorial at
http://vapor.gforge.inria.fr/