Immersive Teaching and Research in Data Sciences via Cloud Computing
Cloud Era Ltd, 13 June 2013
Karim Chine
Outline
  Introduction
  The Rise of Data Science
  What is missing on the Cloud for Research and Education?
  Elastic-R: The Next Generation Data Science Platform
  Elastic-R: Design and Technologies Overview
  Rethinking Virtual Research and Teaching
  Demo
  Conclusion
Introduction
Science and the 4th Paradigm
The promises of Grid Computing for e-Research
  The e-Science program (UK)
  The NSF-funded cyberinfrastructure (USA)
  e-Infrastructure and ICT Calls (EC)
Four paradigms: Experimental Science, Theoretical Science, Computational Science, e-Science / Data-intensive Science
Introduction
The Dancing Bear
The townspeople gather to see the wondrous sight as the massive, lumbering beast shambles and shuffles from paw to paw. The bear is really a terrible dancer, and the wonder isn't that the bear dances well but that the bear dances at all.
The tragic failure of the Grid paradigm
  No democratic access for all scientists: a tool for particle physicists and geeky scientists
  No sustainable infrastructures
  No SLA, best-effort support
  No flexibility, no isolation: only admins can install software, X.509 certificates, restrictive policies
  No interaction design, no user-centric approach
Introduction
The Data Deluge and the challenges of Big Data
* http://www.slideshare.net/AmazonWebServices/20120620-aws-summitberlinbigdataanalyticsonaws
The Rise of Data Science
* http://en.wikipedia.org/wiki/Data_science
Data Scientists Led by NASA Star Most Sought for Century: Jobs
By Aki Ito, Jun 11, 2013 12:01 AM ET (Bloomberg.com)
Harvard Business Review last year called this profession “the sexiest job of the 21st century”.
One measure of demand: hours billed for work in statistical analysis grew by 522 percent in the first quarter compared with the same period in 2011.
What is missing on the Cloud for Research and Education?
  Science-as-a-Service: Scientific Computing Environments (SCEs)-as-a-service, Models-as-a-service…
  Generalized real-time collaboration, including interaction with SCEs and with data
  One consistent platform for homogeneous access to Science Clouds, and new abstractions for stateful access to remote infrastructure and data
  Reproducibility and traceability of data analysis and computational research
  Building and publishing of Science Gateways for everyone
Elastic-R: The Next Generation Data Science Platform
Arduino / Raspberry Pi: Democratizing Electronics
Elastic-R: Democratizing Data Science
Elastic-R: The Next Generation Data Science Platform
Elastic-R is:
  A link between the data scientist, the educator or the student and the Infrastructure-as-a-Service
  A consistent set of concepts, architectures, frameworks and interaction design models that wrap existing public and private clouds and complete them with the missing capabilities, to radically improve data scientists' productivity
  « Super Glue » between the virtual IT capabilities, the software capabilities, the mathematical/statistical capabilities and the man-machine interaction components
  A Data Operating System enabling infinite combinations for analysing data and rapidly building Data Science-centric applications and services
Elastic-R: The Next Generation Data Science Platform
Elastic-R is:
  A hyper-Dropbox that makes it possible to run a large spectrum of capabilities next to centrally stored and ubiquitously accessible data
  A platform using the cloud to introduce traceability and reproducibility to the data science arena
  A federation paradigm for scientific clouds
  A framework for building user-defined software-as-a-service on the fly
Elastic-R: Design and Technologies Overview
Elastic-R: Design and Technologies Overview
Robot submarine dives to the deepest part of the ocean, controlled by a 7-mile cable as thin as a single human hair.
Elastic-R: Design and Technologies Overview
  Remote Java/R Processes
  Events-driven Remote Objects/Engines
  R, Python, Mathematica, Matlab, Scilab, ..
  Collaborative Spreadsheets
  Collaborative Scientific Graphics Canvas
  Collaborative Dashboard with collaborative widgets
Elastic-R: Design and Technologies Overview
Elastic-R: Design and Technologies Overview
[Diagram: Elastic-R machine images and data volumes on Amazon]
Elastic-R Amazon Machine Images
  Elastic-R AMI 1: R 2.10, BioC 2.5
  Elastic-R AMI 2: R 2.9, BioC 2.3
  Elastic-R AMI 3: R 2.8, BioC 2.0
Amazon Elastic Block Stores
  Elastic-R EBS 1: Data Set XXX
  Elastic-R EBS 2: Data Set YYY
  Elastic-R EBS 3: Data Set ZZZ
  Elastic-R EBS 4: Data Set VVV
Elastic-R.org: example selection shown is Elastic-R AMI 2 (R 2.9, BioC 2.3) together with Elastic-R EBS 4 (Data Set VVV)
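The pairing shown in the diagram, one machine image (a fixed R and BioC version) combined with one data volume, can be sketched as a small data model. This is only an illustrative sketch: the class names, the `launch` helper and the dictionary keys are hypothetical and are not part of Elastic-R or of the AWS API; only the image and volume labels come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineImage:
    """A software environment: one R version plus one BioC version."""
    name: str
    r_version: str
    bioc_version: str


@dataclass(frozen=True)
class DataVolume:
    """A block-storage volume holding one data set."""
    name: str
    dataset: str


@dataclass(frozen=True)
class Engine:
    """Hypothetical result of pairing an image with a data volume."""
    image: MachineImage
    volume: DataVolume

    def describe(self) -> str:
        return (f"{self.image.name} (R {self.image.r_version}, "
                f"BioC {self.image.bioc_version}) + "
                f"{self.volume.name} ({self.volume.dataset})")


# Catalogue entries copied from the slide; the short keys are made up here.
IMAGES = {
    "AMI 1": MachineImage("Elastic-R AMI 1", "2.10", "2.5"),
    "AMI 2": MachineImage("Elastic-R AMI 2", "2.9", "2.3"),
    "AMI 3": MachineImage("Elastic-R AMI 3", "2.8", "2.0"),
}
VOLUMES = {
    "EBS 1": DataVolume("Elastic-R EBS 1", "Data Set XXX"),
    "EBS 2": DataVolume("Elastic-R EBS 2", "Data Set YYY"),
    "EBS 3": DataVolume("Elastic-R EBS 3", "Data Set ZZZ"),
    "EBS 4": DataVolume("Elastic-R EBS 4", "Data Set VVV"),
}


def launch(image_key: str, volume_key: str) -> Engine:
    """Pair any catalogued image with any catalogued volume (no real cloud call)."""
    return Engine(IMAGES[image_key], VOLUMES[volume_key])


engine = launch("AMI 2", "EBS 4")
print(engine.describe())
# prints: Elastic-R AMI 2 (R 2.9, BioC 2.3) + Elastic-R EBS 4 (Data Set VVV)
```

Keeping the environment and the data as separate, freely combinable pieces is what lets the same data set be re-analysed under a different R/BioC version, which is one way to read the reproducibility claim made earlier in the talk.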
Modes of access
Individuals with AWS accounts
  Standard AMIs (Amazon Machine Images): paid per use to Amazon. For academic use and trial purposes
  Paid AMI: pay-per-use software model. For business users
Individuals without AWS accounts
  Trial tokens, purchased tokens, tokens granted by other users
  Resources (data science engines) shared by other users
  Individual subscriptions
Companies/Educational & Research Institutions
  Dedicated platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscription
Rethinking Virtual Research and Teaching
  Free researchers from the IT service dictatorship
  Give researchers self-service access to the IT resources they need
  Allow researchers to share without restrictions
  Make real-time collaboration a free and ubiquitous service
  Allow researchers to produce advanced applications/services and publish them to the web without recourse to developers/admins
Rethinking Virtual Research and Teaching
  Make the cloud an ecosystem for Open Science where all research artifacts can be produced, discovered and reused
  Allow researchers to seamlessly « sell » the software/models/algorithms/techniques they invent
  Bridge the gap between the different computational research tools: interconnect SCEs, workflow workbenches, document editors…
Rethinking Virtual Research and Teaching
Provide affordable and reliable tools for remote education
  High-quality voice and video chat for a large number of users
  Self-service collaboration tools: editors, whiteboards, IDEs, etc.
  Modules for traceability/reproducibility
Extend existing online course platforms to include capabilities such as:
  Companion software environments in SaaS mode
  Collaborative problem-solving tools
  Interactive courses
  Tokens for ready-to-run e-Learning applications
  Visual designers for e-Learning environments
Demo
  Register on the Elastic-R academic and trial portal ( www.elastic-r.org )
  Create Data Science Engines using trial tokens
  Work with R, Python and Scientific Spreadsheets in the browser
  Share a Data Science Engine and collaborate
  Use the Visual and Collaborative Scientific Applications designer to create an interactive dashboard and publish it to the web
  Connect to the remote Data Science Engine from within a local R session, push and pull data, execute commands and show the impact on the dashboard
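The last demo step (push data to an engine, execute commands there, pull results back) relies on the engine keeping its state between calls. The toy below illustrates only that stateful pattern, assuming nothing about the real system: `ToyEngine` is a hypothetical in-process stand-in written for this sketch, it is not the elasticR package or the Elastic-R SDK, and it runs Python statements locally instead of R commands on a remote machine.

```python
class ToyEngine:
    """Hypothetical stand-in for a remote engine: a workspace that persists across calls."""

    def __init__(self):
        # Named variables living "on the engine side".
        self._workspace = {}

    def push(self, name, value):
        """Copy a local value into the engine workspace under the given name."""
        self._workspace[name] = value

    def execute(self, statement):
        """Run a statement against the workspace; results stay on the engine.

        exec() is used only to keep the toy short; never run untrusted input this way.
        """
        exec(statement, self._workspace)

    def pull(self, name):
        """Copy a value from the engine workspace back to the caller."""
        return self._workspace[name]


engine = ToyEngine()
engine.push("x", [1, 2, 3, 4])           # push local data
engine.execute("mean = sum(x) / len(x)")  # compute next to the data
print(engine.pull("mean"))                # pull the result back
# prints: 2.5
```

Because the workspace outlives each call, a second client holding the same engine would see `x` and `mean` too, which is the property that shared engines and collaborative dashboards depend on.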
Conclusion
  Elastic-R unlocks the potential of the cloud for Data Scientists and Educators
  With Elastic-R, the cloud becomes a cyberspace for collaborative research and sharing, and an ecosystem suited for open science, open innovation and open education
  Elastic-R dramatically improves the productivity of Data Scientists: the entire Data Science factory chain, from resource acquisition to the publishing of services and applications, comes under their direct control
  Elastic-R provides an Analytics-as-a-Service platform that can extend any existing portal or application
What to do Next
  Register on Elastic-R and try the HTML5 Workbench and the collaboration features
  Download the R package elasticR and use it to access the cloud from local R sessions
  Download the Java SDK and try to create your first analytical application using AWS and the most advanced tools for programming with data
  Get in touch with me to explore potential partnerships and collaborations