cloud com foster december 2010
DESCRIPTION
We've all heard about how on-demand computing and storage will transform scientific practice. But by focusing on resources alone, we're missing the real benefit of the large-scale outsourcing and consequent economies of scale that cloud is about. The biggest IT challenge facing science today is not volume but complexity. Sure, terabytes demand new storage and computing solutions. But they're cheap. It is establishing and operating the processes required to collect, manage, analyze, share, archive, etc., that data that is taking all of our time and killing creativity. And that's where outsourcing can be transformative. An entrepreneur can run a small business from a coffee shop, outsourcing essentially every business function to a software-as-a-service provider--accounting, payroll, customer relationship management, the works. Why can't a young researcher run a research lab from a coffee shop? For that to happen, we need to make it easy for providers to develop "apps" that encapsulate useful capabilities and for researchers to discover, customize, and apply these "apps" in their work. The effect, I will argue, will be a dramatic acceleration of discovery.
TRANSCRIPT
What the cloud really
means for science
Ian Foster
Computation Institute
University of Chicago & Argonne National Laboratory
Science is merely an extremely powerful method
of winnowing what’s true from what feels good
— Carl Sagan
J.C.R. Licklider on thinking (1960)
About 85% of my “thinking” time was spent getting into a position
to think, to make a decision, to learn something I needed to know
“At one point, it was necessary to compare six experimental determinations of a function relating speech-intelligibility to speech-to-noise ratio. No two experimenters had used the same definition or measure of speech-to-noise ratio. Several hours of calculating were required to get the data into comparable form. When they were in comparable form, it took only a few seconds to determine what I needed to know.”
42%!!
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
Software-as-a-Service (SaaS)
Platform-as-a-Service (PaaS)
Infrastructure-as-a-Service (IaaS)
Time-consuming tasks in business
• Web presence
• Email (hosted Exchange)
• Calendar
• Telephony (hosted VOIP)
• Human resources and payroll
• Accounting
• Customer relationship mgmt
• Data analytics
• Content distribution
• …
SaaS
IaaS
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
Software-as-a-Service (SaaS)
Platform-as-a-Service (PaaS)
Infrastructure-as-a-Service (IaaS)
A B
SaaS defined (Gartner)
1. The application is owned, delivered, and managed remotely by one or more providers
2. The application is based on a single code base that is consumed in a one-to-many model by all contracted customers at any time
3. The application is licensed on pay-per-use or subscription basis
----
4. The application behind the service is properly web architected, not an existing application web enabled [D. Terrar]
Globus Toolkit
Build the Grid
Components for building custom grid solutions
globustoolkit.org
Globus Online
Use the Grid
Cloud-hosted file transfer service
globusonline.org
“CLI 2.0”
scp go#ep1:/share/godata/file1.txt go#ep2:~/myfile.txt
Command Endpoints
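The `scp` line above names each side of the transfer in the form `endpoint:path`. A minimal sketch of how such an argument splits apart follows. It assumes that everything before the first colon is the endpoint name and the rest is the path; the helper `split_endpoint_spec` is hypothetical and is not part of the Globus Online CLI.

```python
from typing import NamedTuple


class EndpointPath(NamedTuple):
    endpoint: str  # e.g. "go#ep1"
    path: str      # e.g. "/share/godata/file1.txt"


def split_endpoint_spec(spec: str) -> EndpointPath:
    """Split 'endpoint:path' at the first colon (an assumed convention)."""
    endpoint, sep, path = spec.partition(":")
    if not sep or not endpoint or not path:
        raise ValueError(f"expected 'endpoint:path', got {spec!r}")
    return EndpointPath(endpoint, path)


# The two arguments from the slide's scp example.
source = split_endpoint_spec("go#ep1:/share/godata/file1.txt")
destination = split_endpoint_spec("go#ep2:~/myfile.txt")
print(source.endpoint, "->", destination.endpoint)
```

Splitting at the first colon only keeps any later colons inside the path intact.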
cancel, details, endpoint-activate, endpoint-add, endpoint-deactivate, endpoint-list, endpoint-modify, endpoint-remove, endpoint-rename,
events, ls, profile, scp, status, transfer, versions, wait
28.6 Terabytes
31,000 files
56h 44m
No human intervention
Astrophysics simulation data generated in Tennessee, moved to Illinois for visualization (Enzo, UCSD; Futures Lab, Argonne)
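To put those figures in perspective, the sustained rate they imply can be worked out directly. This is a rough calculation, assuming decimal terabytes (10^12 bytes) and that the 56h 44m is wall-clock time for the whole transfer; the slide states neither.

```python
TERABYTE = 10**12  # assumption: decimal terabytes; the slide does not say

total_bytes = 28.6 * TERABYTE
file_count = 31_000
elapsed_seconds = 56 * 3600 + 44 * 60  # 56h 44m

bytes_per_second = total_bytes / elapsed_seconds
megabytes_per_second = bytes_per_second / 10**6
gigabits_per_second = bytes_per_second * 8 / 10**9
average_file_megabytes = total_bytes / file_count / 10**6

print(f"average rate: {megabytes_per_second:.0f} MB/s "
      f"({gigabits_per_second:.2f} Gbit/s)")
print(f"average file size: {average_file_megabytes:.0f} MB")
```

Under these assumptions the average works out to roughly 140 MB/s, a little over 1 Gbit/s, held for more than two days.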
A peek inside Globus Online
[Architecture diagram; component labels: Datastore, Profiles + state, Request collector, Consumers, Workers, Notification target, GridFTP]
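The slide gives only component names, so the wiring between them is not recoverable from this transcript. As a loose illustration of the general pattern those names suggest (requests collected into a queue, a pool of workers draining it, and state recorded in a shared store), here is a small sketch. Every name in it is hypothetical, and it is not a description of how Globus Online is actually implemented.

```python
import queue
import threading

request_queue = queue.Queue()  # stands in for the hand-off from a request collector
datastore = {}                 # stands in for a store of task state
datastore_lock = threading.Lock()


def worker() -> None:
    """Process transfer requests until a None sentinel arrives."""
    while True:
        request = request_queue.get()
        if request is None:
            request_queue.task_done()
            break
        task_id, source, destination = request
        # A real worker would drive a transfer here; this one only records state.
        with datastore_lock:
            datastore[task_id] = f"moved {source} to {destination}"
        request_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for thread in workers:
    thread.start()

# Submit five illustrative requests, then one sentinel per worker.
for task_id in range(5):
    request_queue.put((task_id, f"A:/data/file{task_id}", f"B:/data/file{task_id}"))
for _ in workers:
    request_queue.put(None)

request_queue.join()
for thread in workers:
    thread.join()
print(len(datastore), "transfers recorded")
```

Decoupling submission from execution in this way lets a service accept requests immediately and retry or reschedule work without the user staying connected.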
A B
32
11 x 125 files
200 MB each
11 users
12 sites
Coming soon
Lightweight transfer agent
• For firewalls, sites without GridFTP installed
Higher-level data management capabilities
• Group management
• Data publication, replication, etc.
• Workflow
Additional protocol support
• HTTP, SRM, …
Condor integration (version 7.6.0)
• Stage in and stage out
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
[Figure; labels: # Users, Application, Research App store, Distributed big data, Third-party Web apps, Students, Partners, Researcher, Resources]
Acknowledgements
Numerous people have contributed to the Globus Online work, including:
Bryce Allen, Joshua Boverhof, John Bresnahan, Lisa Childers, Paul Davé, Fred Dech, Ian Foster, Dan Gunter, Gopi Kandaswamy, Nick Karonis, Raj Kettimuthu, Jack Kordas, Lee Liming, Mike Link, Stu Martin, JP Navarro, Karl Pickett, Mei Hui Su, Steve Tuecke, Vas Vasiliadis
Many thanks to our funders: DOE, NSF, and the University of Chicago
Thank you!
Ian Foster
Computation Institute
University of Chicago & Argonne National Laboratory