Elastic-R: A cloud platform for web computing,
real-time collaboration, rapid application development
and reproducible modelling
Karim Chine, Cloud Era Ltd
BD04 February 2011
o Open-source (GPL) software environment for statistical computing and graphics
o Lingua franca of data analysis.
o Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemometrics, etc. are growing at an exponential rate.
o R Website: http://www.r-project.org/
o CRAN Task View: http://cran.r-project.org/web/views/
o CRAN packages: http://cran.cnr.berkeley.edu/
o Bioconductor: http://www.bioconductor.org/
o Rmetrics: https://www.rmetrics.org/
Scientific Computing Environments
www.scilab.org
http://root.cern.ch
www.sagemath.org
www.sas.com
office.microsoft.com
www.mathworks.com
www.scipy.org
www.spss.com
www.wolfram.com
From: John Fox, Aspects of the Social Organization and Trajectory of the R Project, R Journal-Feb 2009
R's Success Story
"Give me a place to stand, and I shall move the earth
with a lever"
Scientific/Statistical Computing Software, HPC and Usability
Extract from the NetSolve/GridSolve Description Document
The emergence of Grid computing as the prototype of a next generation cyberinfrastructure for science has excited high expectations for its potential as an accelerator of discovery, but it has also raised questions about whether and how the broad population of research professionals, who must be the foundation of such productivity, can be motivated to adopt this new and more complex way of working.
The rise of the new era of scientific modeling and simulation has, after all, been precipitous, and many science and engineering professionals have only recently become comfortable with the relatively simple world of the uniprocessor workstations and desktop scientific computing tools. In that world, software packages such as Matlab and Mathematica represent general-purpose scientific computing environments (SCEs) that enable users — totaling more than a million worldwide — to solve a wide variety of problems through flexible user interfaces that can model in a natural way the mathematical aspects of many different problem domains.
Moreover, the ongoing, exponential increase in the computing resources supplied by the typical workstation makes these SCEs more and more powerful, and thereby tends to reduce the need for the kind of resource sharing that represents a major strength of Grid computing [1]. Certainly there are various forces now urging collaboration across disciplines and distances, and the burgeoning Grid community, which aims to facilitate such collaboration, has made significant progress in mitigating the well-known complexities of building, operating, and using distributed computing environments. But it is unrealistic to expect the transition of research professionals to the Grid to be anything but halting and slow if it means abandoning the SCEs that they rightfully view as a major source of their productivity. We therefore believe that Grid computing’s prospects for success will tend to rise and fall according to its ability to interface smoothly with the general purpose SCEs that are likely to continue to dominate the toolbox of its targeted user base.
Arnold, D. and Agrawal, S. and Blackford, S. and Dongarra, J. and Miller, M. and Seymour, K. and Sagi, K. and Shi, Z. and Vadhiyar, S.
Computational Components: R packages (CRAN, Bioconductor), wrapped C, C++ and Fortran code, Scilab modules, Matlab toolkits, etc. Open source or commercial.
Computational Resources: hardware- and OS-agnostic computing engines (R, Scilab,..) on clusters, grids, private or public clouds. Free (academic grids) or pay-per-use (EC2, Azure).
Computational User Interfaces: workbench within the browser, built-in views, plugins, spreadsheets, collaborative views. Open source or commercial.
Computational Scripts: R / Python / Groovy. On client side: interactivity.. On server side: data transfer ..
Computational Application Programming Interfaces: Java / SOAP / REST, stateless and stateful.
Computational Data Storage: local, NFS, FTP, Amazon S3, Amazon EBS. Free or commercial.
Generated Computational Web Services: stateful or stateless, automatic mapping of R data objects and functions.
Elastic-R
Elastic-R is a ubiquitous plug-and-play platform for scientific and statistical computing
Public Clouds
Private Cloud
Elastic-R portal: single facade to public and private clouds
Elastic-R is a collaborative Virtual Research Environment. Users can share their machine instances, stateful remote engines, data,..
Reproducible research: A scientist can snapshot her computational environment and her data. She can archive the snapshot or share it with others.
Elastic-R Amazon Machine Images
Elastic-R AMI 1: R 2.10 + BioC 2.5
Elastic-R AMI 2: R 2.9 + BioC 2.3
Elastic-R AMI 3: R 2.8 + BioC 2.0
Amazon Elastic Block Stores
Elastic-R EBS 1: Data Set XXX
Elastic-R EBS 2: Data Set YYY
Elastic-R EBS 3: Data Set ZZZ
Elastic-R EBS 4: Data Set VVV
Example snapshot: Elastic-R AMI 2 (R 2.9 + BioC 2.3) + Elastic-R EBS 4 (Data Set VVV)
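The pairing of one machine image with one data volume can be sketched as a small descriptor. This is only an illustration of the idea on the slide, not Elastic-R code: the class names MachineImage, DataVolume and Snapshot and the to_json method are hypothetical, and nothing here talks to Amazon.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class MachineImage:
    """A frozen software stack, e.g. one Elastic-R AMI from the slide."""
    name: str
    r_version: str
    bioc_version: str


@dataclass(frozen=True)
class DataVolume:
    """A frozen data set, e.g. one Elastic-R EBS volume from the slide."""
    name: str
    data_set: str


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical descriptor pairing one image with one data volume."""
    image: MachineImage
    volume: DataVolume

    def to_json(self) -> str:
        # Sorted keys give a stable text form that can be archived or shared.
        return json.dumps(asdict(self), sort_keys=True)


ami2 = MachineImage("Elastic-R AMI 2", r_version="2.9", bioc_version="2.3")
ebs4 = DataVolume("Elastic-R EBS 4", data_set="VVV")
snapshot = Snapshot(ami2, ebs4)
print(snapshot.to_json())
```

Because the descriptor is immutable and serialises deterministically, two scientists holding the same text are referring to the same software stack and the same data.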
Elastic-R.org
Anatomy of an Elastic-R machine instance on Amazon EC2
Access channels: HTTPS, SSH, RESTful WS over SSL, SOAP over SSL
Heartbeat: RESTful WS over SSL
The scientist can control any number of stateful R engines from within an R session on the cloud or on his machine. He can use them for parallel computing.
Software + Services = Applications: convergence + ubiquitous collaboration. The server-side toolkit: R + spreadsheet models + virtual GUI widgets.
Elastic-R on Infrastructure-as-a-Service style Cloud
Amazon Virtual Private Cloud
Subnet 2
Subnet 3
Subnet 1
The Elastic-R portal itself is an EC2 machine instance. Any number of portals can be run on EC2 for decentralized and private collaboration
LogOn (Login, Pwd, Options) returns a SessionID associated with a reserved Elastic-R Engine.
T1, T2 and T3 are then called in turn: ES -> ESon1 -> ESon2 -> ESon3 = f(ES).
getData retrieves the data; logOff ends the session.
T1, T2, T3: generated stateful web services for the R functions T1, T2 and T3
LogOn, getData: R-SOAP methods
ES: ExpressionSet
ESon1, ESon2, ESon3: ExpressionSet object names
f = T3 o T2 o T1
• remove ESonx
• « Clean » Elastic-R Engine
• Put Elastic-R Engine back in the Pool
• kill Elastic-R Engine
Stateful generated Web Services : Elastic-R for workflow workbenches
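A minimal sketch of the stateful workflow, assuming an in-memory stand-in for the engine pool. The class EnginePool, its method names and the toy functions t1, t2, t3 are hypothetical and are not the Elastic-R or R-SOAP API; a plain list plays the role of the ExpressionSet. The point it illustrates is that intermediate results stay on the reserved engine and only object names are passed between calls.

```python
import uuid


class EnginePool:
    """In-memory stand-in for a pool of stateful engines (not the Elastic-R API)."""

    def __init__(self):
        self._sessions = {}  # session id -> workspace (object name -> value)

    def log_on(self, login, pwd):
        # Reserve an engine and return the session id that identifies it.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        return session_id

    def put(self, session_id, name, value):
        self._sessions[session_id][name] = value

    def call(self, session_id, func, in_name, out_name):
        # Apply func to a named object and keep the result on the engine.
        workspace = self._sessions[session_id]
        workspace[out_name] = func(workspace[in_name])
        return out_name

    def get_data(self, session_id, name):
        return self._sessions[session_id][name]

    def log_off(self, session_id):
        # Drop the workspace so the engine could return to the pool clean.
        del self._sessions[session_id]


# Toy stand-ins for the R functions T1, T2 and T3.
def t1(es):
    return [x for x in es if x is not None]


def t2(es):
    return [x * 2 for x in es]


def t3(es):
    return sum(es)


pool = EnginePool()
sid = pool.log_on("alice", "secret")
pool.put(sid, "ES", [1, None, 2, 3])
pool.call(sid, t1, "ES", "ESon1")
pool.call(sid, t2, "ESon1", "ESon2")
pool.call(sid, t3, "ESon2", "ESon3")
result = pool.get_data(sid, "ESon3")  # f(ES) with f = T3 o T2 o T1
pool.log_off(sid)
print(result)  # 12
```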
Generate token
Deliver token
Use token
Activate token
Launch machine instance
Register machine instance
Use R console
Call R Engine
Token: XXYYZZ
AWS Credentials + Private Key
One Amazon account and many users : Elastic-R signed tokens
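The generate / deliver / activate cycle can be sketched with a signed token. The slide mentions a private key, which suggests an asymmetric signature; the sketch below swaps in an HMAC with a shared secret because that is available in the Python standard library. The key, the claim names (user, max_instances, expires) and both function names are assumptions made for illustration, not the Elastic-R token format.

```python
import base64
import hashlib
import hmac
import json
import time

PORTAL_KEY = b"demo-key-not-a-real-secret"  # hypothetical key held only by the portal


def generate_token(user, max_instances, ttl_seconds, now=None):
    """Return 'payload.signature', both URL-safe base64."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"user": user, "max_instances": max_instances, "expires": now + ttl_seconds},
        sort_keys=True,
    ).encode()
    signature = hmac.new(PORTAL_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def activate_token(token, now=None):
    """Return the claims if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(PORTAL_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was altered or signed with another key
    claims = json.loads(payload)
    if claims["expires"] < now:
        return None  # token has expired
    return claims


token = generate_token("alice", max_instances=2, ttl_seconds=3600, now=1000.0)
print(activate_token(token, now=1500.0))
```

The account owner's AWS credentials never appear in the token; the user only ever holds the signed claims.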
Elastic-R Portal :
www.elastic-r.org
Articles about the project:
Chine, K. (2010). Open Science in the Cloud: Towards a Universal Platform for Scientific and Statistical Computing. In Handbook of Cloud Computing (Chapter 19). Springer US.
Chine, K. (2010). Learning Math and Statistics on the Cloud, Towards an EC2-Based Google Docs-like Portal for Teaching / Learning Collaboratively with R and Scilab. In 10th IEEE International Conference on Advanced Learning Technologies (ICALT), pp. 752-753.
Chine, K. (2009). Scientific Computing Environments in the Age of Virtualization, Toward a Universal Platform for the Cloud. In IEEE International Workshop on Open Source Software for Scientific Computation (OSSC), pp. 44-48.
Chine, K. (2008). Biocep, Towards a Federative, Collaborative, User-Centric, Grid-Enabled and Cloud-Ready Computational Open Platform. In Fourth IEEE International Conference on eScience, pp. 321-322.
Linkedin Group:
http://www.linkedin.com/groups?home=&gid=2345405
Links
Thank you !
Front-end host
Supervisor
Remote Objects Registry
Node 1: Windows XP
Node 2: Mac OS
Node 3: 64-bit Server / Linux
Cloudbursting via Amazon Web Services
Node 4: EC2 virtual machine 1
Node 5: EC2 virtual machine 2
Perl scripts: logOn, use R, logOff
.NET application: logOn, use R, logOff
R-HTTP, R-SOAP
Parallel computing applications: borrow Rs, use Rs, release Rs
Web application: borrow R, generate graphics/data, release R
Pool A, Pool B, Pool C
Elastic-R SOA platform
Elastic-R for clusters/grids
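The borrow / use / release pattern shown for the pools can be sketched as follows. This is a toy model under stated assumptions: FakeEngine and Pool are hypothetical classes, the "engines" are local Python objects rather than remote R processes, and threads stand in for the parallel clients.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeEngine:
    """Stand-in for a remote R engine; it just evaluates a Python callable."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, func, arg):
        return func(arg)


class Pool:
    """A fixed set of engines that clients borrow and must give back."""

    def __init__(self, engines):
        self._free = queue.Queue()
        for engine in engines:
            self._free.put(engine)

    @contextmanager
    def borrow(self, timeout=5.0):
        engine = self._free.get(timeout=timeout)  # waits until an engine is free
        try:
            yield engine
        finally:
            self._free.put(engine)  # always release, even if the task failed

    def free_count(self):
        return self._free.qsize()


pool_a = Pool([FakeEngine(f"R-{i}") for i in range(3)])


def square_on_engine(x):
    with pool_a.borrow() as engine:
        return engine.evaluate(lambda v: v * v, x)


# Six concurrent clients share three engines.
with ThreadPoolExecutor(max_workers=6) as executor:
    results = list(executor.map(square_on_engine, range(10)))

print(results)
```

Releasing in a finally clause is the design choice that matters here: a client that fails mid-task still returns its engine, so the pool does not shrink over time.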