Elastic-R A cloud platform for web computing, real-time collaboration, rapid applications development and reproducible modelling Karim Chine Cloud Era Ltd [email protected] BD 04 February 2011


Page 1

Elastic-R

A cloud platform for web computing, real-time collaboration, rapid applications development and reproducible modelling

Karim Chine

Cloud Era Ltd

[email protected]

BD

04 February 2011

Page 2

o Open-source (GPL) software environment for statistical computing and graphics

o Lingua franca of data analysis.

o Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate.

o R Website: http://www.r-project.org/

o CRAN Task View: http://cran.r-project.org/web/views/

o CRAN packages: http://cran.cnr.berkeley.edu/

o Bioconductor: http://www.bioconductor.org/

o Rmetrics: https://www.rmetrics.org/

Scientific Computing Environments

www.scilab.org

http://root.cern.ch

www.sagemath.org

www.sas.com

office.microsoft.com

www.mathworks.com

www.scipy.org

www.spss.com

www.wolfram.com

Page 3

From: John Fox, Aspects of the Social Organization and Trajectory of the R Project, R Journal-Feb 2009

R's Success Story

Page 4

"Give me a place to stand, and I shall move the earth with a lever"

Scientific/Statistical Computing Software, HPC and Usability

Page 5

Extract from the NetSolve/GridSolve Description Document

The emergence of Grid computing as the prototype of a next generation cyberinfrastructure for science has excited high expectations for its potential as an accelerator of discovery, but it has also raised questions about whether and how the broad population of research professionals, who must be the foundation of such productivity, can be motivated to adopt this new and more complex way of working.

The rise of the new era of scientific modeling and simulation has, after all, been precipitous, and many science and engineering professionals have only recently become comfortable with the relatively simple world of the uniprocessor workstations and desktop scientific computing tools. In that world, software packages such as Matlab and Mathematica represent general-purpose scientific computing environments (SCEs) that enable users — totaling more than a million worldwide — to solve a wide variety of problems through flexible user interfaces that can model in a natural way the mathematical aspects of many different problem domains.

Moreover, the ongoing, exponential increase in the computing resources supplied by the typical workstation makes these SCEs more and more powerful, and thereby tends to reduce the need for the kind of resource sharing that represents a major strength of Grid computing [1]. Certainly there are various forces now urging collaboration across disciplines and distances, and the burgeoning Grid community, which aims to facilitate such collaboration, has made significant progress in mitigating the well-known complexities of building, operating, and using distributed computing environments. But it is unrealistic to expect the transition of research professionals to the Grid to be anything but halting and slow if it means abandoning the SCEs that they rightfully view as a major source of their productivity. We therefore believe that Grid computing’s prospects for success will tend to rise and fall according to its ability to interface smoothly with the general purpose SCEs that are likely to continue to dominate the toolbox of its targeted user base.

Arnold, D., Agrawal, S., Blackford, S., Dongarra, J., Miller, M., Seymour, K., Sagi, K., Shi, Z., and Vadhiyar, S.

Page 6

Computational Components: R packages (CRAN, Bioconductor), wrapped C, C++, Fortran code, Scilab modules, Matlab toolkits, etc. Open source or commercial.

Computational Resources: hardware- and OS-agnostic computing engines (R, Scilab, ..). Clusters, grids, private or public clouds. Free (academic grids) or pay-per-use (EC2, Azure).

Computational User Interfaces: workbench within the browser. Built-in views / plugins / spreadsheets. Collaborative views. Open source or commercial.

Computational Scripts: R / Python / Groovy. On client side: interactivity.. On server side: data transfer ..

Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.

Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.

Computational Data Storage: local, NFS, FTP, Amazon S3, Amazon EBS. Free or commercial.

Elastic-R

Elastic-R is a ubiquitous plug-and-play platform for scientific and statistical computing

Page 7

Public Clouds

Private Cloud

Elastic-R portal: single facade to public and private clouds

Page 8

Elastic-R is a collaborative Virtual Research Environment. Users can share their machine instances, stateful remote engines, data, etc.

Page 9

Reproducible research: A scientist can snapshot her computational environment and her data. She can archive the snapshot or share it with others.

Elastic-R Amazon Machine Images

Elastic-R AMI 1: R 2.10 + BioC 2.5

Elastic-R AMI 2: R 2.9 + BioC 2.3

Elastic-R AMI 3: R 2.8 + BioC 2.0

Amazon Elastic Block Stores

Elastic-R EBS 1: Data Set XXX

Elastic-R EBS 2: Data Set YYY

Elastic-R EBS 3: Data Set ZZZ

Elastic-R EBS 4: Data Set VVV

Example pairing shown in the diagram: Elastic-R AMI 2 (R 2.9 + BioC 2.3) with Elastic-R EBS 4 (Data Set VVV)

Elastic-R.org
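The pairing of a machine image with a data volume can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Elastic-R or AWS code: the catalog dictionaries reuse the slide's labels, and `Snapshot` and `make_snapshot` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical catalogs that mirror the labels on the slide.
# The keys are display labels, not real AWS identifiers.
MACHINE_IMAGES = {
    "Elastic-R AMI 1": "R 2.10 + BioC 2.5",
    "Elastic-R AMI 2": "R 2.9 + BioC 2.3",
    "Elastic-R AMI 3": "R 2.8 + BioC 2.0",
}
DATA_VOLUMES = {
    "Elastic-R EBS 1": "Data Set XXX",
    "Elastic-R EBS 2": "Data Set YYY",
    "Elastic-R EBS 3": "Data Set ZZZ",
    "Elastic-R EBS 4": "Data Set VVV",
}


@dataclass(frozen=True)
class Snapshot:
    """A reproducible environment: one software image plus one data volume."""
    image: str
    volume: str

    def describe(self) -> str:
        return (f"{self.image} ({MACHINE_IMAGES[self.image]}) + "
                f"{self.volume} ({DATA_VOLUMES[self.volume]})")


def make_snapshot(image: str, volume: str) -> Snapshot:
    # Refuse labels that are not in the catalogs, so a snapshot always
    # points at a known software stack and a known data set.
    if image not in MACHINE_IMAGES:
        raise KeyError(f"unknown machine image: {image}")
    if volume not in DATA_VOLUMES:
        raise KeyError(f"unknown data volume: {volume}")
    return Snapshot(image, volume)


snapshot = make_snapshot("Elastic-R AMI 2", "Elastic-R EBS 4")
print(snapshot.describe())
```

Because the snapshot is an immutable pair of labels, two scientists who hold the same pair refer to the same software stack and the same data, which is the property the slide relies on for archiving and sharing.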

Page 10

Anatomy of an Elastic-R machine instance on Amazon EC2

Connections shown in the diagram:

HTTPS

SSH

RESTful WS over SSL

SOAP over SSL

Heartbeat RESTful WS over SSL

Page 11

The scientist can control any number of stateful R engines from within an R session on the cloud or on his machine. He can use them for parallel computing.
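A minimal sketch of this pattern, in Python rather than R and with no real Elastic-R calls: `StatefulEngine` is a hypothetical in-process stand-in for a remote engine that keeps named values between calls, and the work is spread over the engines with the standard `concurrent.futures` thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


class StatefulEngine:
    """Hypothetical stand-in for a remote engine: it keeps named values between calls."""

    def __init__(self, name):
        self.name = name
        self.workspace = {}

    def assign(self, symbol, value):
        # Store a value in this engine's workspace under the given name.
        self.workspace[symbol] = value

    def evaluate(self, func, symbol):
        # Apply func to a value that already lives in this engine's workspace.
        return func(self.workspace[symbol])


def parallel_sum_of_squares(engines, data):
    # Split the data round-robin, one chunk per engine, and push each chunk
    # into the workspace of its engine (the "stateful" part).
    chunks = [data[i::len(engines)] for i in range(len(engines))]
    for engine, chunk in zip(engines, chunks):
        engine.assign("chunk", chunk)

    def work(engine):
        return engine.evaluate(lambda xs: sum(x * x for x in xs), "chunk")

    # Run one task per engine concurrently and combine the partial results.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        partials = list(pool.map(work, engines))
    return sum(partials)


engines = [StatefulEngine(f"engine-{i}") for i in range(3)]
total = parallel_sum_of_squares(engines, list(range(10)))
print(total)  # 285
```

The controlling session only sends small commands and receives small results; the data chunks stay in each engine's workspace, which is what makes stateful engines convenient for this kind of split-apply-combine work.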

Page 12

Software+Services=Applications convergence + ubiquitous collaboration. The server-side toolkit: R + spreadsheet models + virtual GUI widgets.

Page 13

Elastic-R on Infrastructure-as-a-Service style Cloud

Page 14

Amazon Virtual Private Cloud

Subnet 1

Subnet 2

Subnet 3

The Elastic-R portal itself is an EC2 machine instance. Any number of portals can be run on EC2 for decentralized and private collaboration.

Page 15

Stateful generated Web Services: Elastic-R for workflow workbenches

Workflow diagram labels:

LogOn (Login, Pwd, Options): SessionID associated with a reserved Elastic-R Engine

T1, T2, T3

getData: Retrieve Data

logOff

ES, ESon1, ESon2, ESon3, f(ES)

Legend:

T1, T2, T3: generated stateful Web Services for R functions T1, T2 and T3

LogOn, getData: R-SOAP methods

ES: ExpressionSet

ESon1, ESon2, ESon3: ExpressionSet object names

f = T3 o T2 o T1

• remove ESonx

• "Clean" Elastic-R Engine

• Put Elastic-R Engine back in the Pool

• kill Elastic-R Engine
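The composition f = T3 o T2 o T1 over objects that stay on the engine can be sketched as follows. This is an in-memory simulation under stated assumptions, not the generated SOAP services: `EngineSession`, `make_service` and `put_object` are hypothetical names, the three transformations are arithmetic placeholders rather than real ExpressionSet operations, and `get_data` is assumed to retrieve an object by name.

```python
import uuid


class EngineSession:
    """Hypothetical stand-in for a reserved engine bound to a session ID."""

    def __init__(self):
        self.session_id = uuid.uuid4().hex
        self.objects = {}  # object name -> value held on the engine side


SESSIONS = {}


def log_on(login, pwd):
    # Credentials are not checked in this sketch; a session is simply reserved.
    session = EngineSession()
    SESSIONS[session.session_id] = session
    return session.session_id


def put_object(session_id, name, value):
    # Hypothetical helper: the slide does not show how ES reaches the engine.
    SESSIONS[session_id].objects[name] = value


def get_data(session_id, name):
    # Assumed behaviour: retrieve an object from the engine by name.
    return SESSIONS[session_id].objects[name]


def make_service(func):
    """Wrap a function as a stateful service that reads and writes objects by name."""
    def service(session_id, in_name, out_name):
        objects = SESSIONS[session_id].objects
        objects[out_name] = func(objects[in_name])
        return out_name  # only the name travels back to the caller
    return service


def log_off(session_id):
    # Drop the session and the objects it created.
    SESSIONS.pop(session_id).objects.clear()


# Placeholder transformations standing in for the R functions T1, T2, T3.
T1 = make_service(lambda es: [x + 1 for x in es])
T2 = make_service(lambda es: [x * 2 for x in es])
T3 = make_service(lambda es: [x - 3 for x in es])

sid = log_on("user", "secret")
put_object(sid, "ES", [1, 2, 3])
T1(sid, "ES", "ESon1")
T2(sid, "ESon1", "ESon2")
T3(sid, "ESon2", "ESon3")
result = get_data(sid, "ESon3")
print(result)  # [1, 3, 5]
log_off(sid)
```

Each service call passes object names rather than the objects themselves, so intermediate results never leave the engine; only the final `get_data` call moves data back to the client.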

Page 16

One Amazon account and many users: Elastic-R signed tokens

Token life cycle (diagram labels): Generate token, Deliver token, Activate token, Use token

Operations (diagram labels): Launch machine instance, Register machine instance, Use R console, Call R Engine

Token placeholder: XXYYZZ

AWS credentials + private key
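The slide does not say how the tokens are signed. As a sketch of the general idea only, the following Python uses an HMAC-SHA256 signature over a small JSON payload; this is a swapped-in, standard-library technique (a shared secret rather than the private key mentioned on the slide), and `PORTAL_SECRET`, `generate_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret known only to the holder of the Amazon account.
PORTAL_SECRET = b"demo-secret-held-by-the-account-owner"


def generate_token(user, expires_at):
    # Serialize the claims deterministically, then sign the exact bytes.
    payload = json.dumps({"user": user, "exp": expires_at}, sort_keys=True).encode()
    signature = hmac.new(PORTAL_SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def verify_token(token, now=None):
    """Return the claims if the signature is valid and the token is not expired, else None."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return None  # malformed token
    expected = hmac.new(PORTAL_SECRET, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison of the presented and recomputed signatures.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(payload)
    if claims["exp"] < (time.time() if now is None else now):
        return None  # expired
    return claims


token = generate_token("alice", expires_at=2_000_000_000)
claims = verify_token(token, now=1_700_000_000)
print(claims)
```

With a scheme of this kind, a user who receives a token can prove to the portal that the account owner authorized the request without ever seeing the AWS credentials, which matches the "one account, many users" goal on the slide.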

Page 17

Links

Elastic-R Portal: www.elastic-r.org

Articles about the project:

Chine K. (2010). Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing. In Handbook of Cloud Computing (Chapter 19). Springer US.

Chine K. (2010). Learning Math and Statistics on the Cloud, Towards an EC2-Based Google Docs-like Portal for Teaching / Learning Collaboratively with R and Scilab. In 2010 10th IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 752-753.

Chine K. (2009). Scientific Computing Environments in the age of virtualization, toward a universal platform for the Cloud. In 2009 IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), pp. 44-48.

Chine K. (2008). Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform. In 2008 Fourth IEEE International Conference on eScience, pp. 321-322.

LinkedIn Group: http://www.linkedin.com/groups?home=&gid=2345405

Page 18

Thank you!

Page 19

Elastic-R SOA platform

Nodes:

Node 1: Windows XP

Node 2: Mac OS

Node 3: 64 bits Server / Linux

Node 4: EC2 virtual machine 1

Node 5: EC2 virtual machine 2

Cloudbursting via Amazon Web Services

Components: Front-end host, Supervisor, Remote Objects Registry, Pool A, Pool B, Pool C

Access: R-HTTP, R-SOAP

Clients:

Perl scripts: logOn, Use R, logOff

.NET application: logOn, Use R, logOff

Parallel computing applications: Borrow Rs, Use Rs, Release Rs

Web application: Borrow R, Generate Graphics/Data, Release R
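The borrow / use / release cycle in the diagram is a classic resource-pool pattern. A minimal sketch in Python, assuming nothing about the real platform: `EnginePool` is a hypothetical class, and each "engine" is just a name held in a thread-safe queue.

```python
import queue
from contextlib import contextmanager


class EnginePool:
    """Hypothetical pool of engine handles; here each handle is just a name."""

    def __init__(self, names):
        self._free = queue.Queue()
        for name in names:
            self._free.put(name)

    def available(self):
        return self._free.qsize()

    @contextmanager
    def borrow(self, timeout=1.0):
        # Borrow: take a free engine, or raise queue.Empty if none frees up in time.
        engine = self._free.get(timeout=timeout)
        try:
            yield engine  # Use: the caller works with the engine here.
        finally:
            # Release: the engine goes back even if the caller's code failed.
            self._free.put(engine)


pool_a = EnginePool(["R-1", "R-2"])
with pool_a.borrow() as engine:
    in_use = pool_a.available()  # one engine is out, one is left
    label = f"graphics generated by {engine}"
after_release = pool_a.available()
print(label, in_use, after_release)
```

Wrapping the cycle in a context manager guarantees the release step, so a web application that fails while generating graphics does not leak an engine from the pool.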

Page 20

Elastic-R for clusters/grids