SaaS and the Transformation of Research
DESCRIPTION
In the academic research community we've made much progress over the past decade toward effective distributed cyberinfrastructure. In big-science fields such as high energy physics, astronomy, and climate, thousands benefit daily from tools that enable the distributed management and analysis of large quantities of data. Exploding data volumes and powerful simulation tools mean that most researchers will soon require similar capabilities, but they often do not have the resources or expertise to build and maintain the necessary IT infrastructure. Faced with a similar problem in industry, companies have adopted the Software-as-a-Service (SaaS) model to "free" themselves from IT complexity. We see the same shift occurring in the academic research world over the next decade. Indeed, many of us use SaaS services such as Google Docs and Dropbox on a daily basis as an integral part of our research workflow. Here we describe a vision for the next generation of research cyberinfrastructure, and work that the University of Chicago has embarked on to further empower investigators and enable them to access new capabilities beyond the boundaries of their campus.
TRANSCRIPT
High energy physics
Molecular biology
Cosmology
Genomics
Metagenomics
Linguistics
Economics
Climate change
Visual arts
Urban Science
Thank you to our sponsors!
U.S. DEPARTMENT OF ENERGY
Higgs discovery “only possible because of the extraordinary achievements of …grid computing”
Rolf Heuer, CERN DG
25PB per year, 8,000 scientists worldwide
1PB in last experiment, 800 scientists worldwide
1.2PB of climate data, delivered to 23,000 users
We have exceptional infrastructure for the 1%
What about the 99%?
Most labs have limited resources
NSF grants in 2007
< $350,000: 80% of awards, 50% of grant $$
[Figure: NSF award sizes on a log scale from $1,000 to $1,000,000, with horizontal axis ticks from 2000 to 8000. Credit: Bryan Heidorn]
57.7%
2012 Faculty Burden Survey, National Academies
[Figure: Active Research Time (%) vs. Federal Funding Amount. Horizontal axis bins: < $50K, $50-99K, $100-199K, $200-299K, $300-499K, $500-999K, $1-3M, > $3M. Vertical axis: 40 to 65%.]
Potential economies of scale
Small laboratories
– PI, postdoc, technician, grad students
– Estimate 10,000 across US research community
– Average ill-spent/unmet need of 0.5 FTE/lab?
+ Medium-scale projects
– Multiple PIs, a few software engineers
– Estimate 1,000 across US research community
– Average ill-spent/unmet need of 3 FTE/project?
= Total 8,000 FTE: at ~$100K/FTE => $800M/yr (if we could even find 8,000 skilled people)
Plus computers, storage, opportunity costs, …
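The estimate above is plain arithmetic. A minimal sketch in Python that reproduces it, using only the slide's own rough figures (the variable names are illustrative and not from the talk):

```python
# Back-of-envelope reproduction of the slide's estimate.
# Every input is the slide's own rough guess, not measured data.
small_labs = 10_000            # estimated small laboratories in the US
fte_per_small_lab = 0.5        # ill-spent/unmet need per lab, in FTE
medium_projects = 1_000        # estimated medium-scale projects in the US
fte_per_medium_project = 3     # ill-spent/unmet need per project, in FTE
cost_per_fte = 100_000         # ~$100K per FTE per year

total_fte = (small_labs * fte_per_small_lab
             + medium_projects * fte_per_medium_project)
annual_cost = total_fte * cost_per_fte

print(f"Total FTE: {total_fte:,.0f}")
print(f"Annual cost: ${annual_cost / 1e6:,.0f}M")
```

Changing any one input shows how sensitive the $800M/yr figure is to the per-lab guesses, which the slide itself marks with question marks.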
Is there a better way to deliver research cyberinfrastructure?
Frictionless
Affordable
Sustainable
Commercial startups as “role models”
My Shiny New Startup
“Frictionless”
Great user experience + high-performance (but invisible) infrastructure
SaaS is transformational for…
Researchers
A simple problem
• “Transfers often take longer than expected based on available network capacities”
• “Lack of an easy to use interface to some of the high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to supervise large data transfers”
• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
Excerpts from ESnet reports
Exemplar: APS Beamline 2-BM
X-Ray imaging, tomography, ~few µm to 30nm resolution
Currently can generate >100TB per day
<1GB/s data rate; ~3-5GB/s in 5-10 years
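To relate the per-second rates to daily volumes, here is a rough conversion sketch. It assumes decimal units (1 TB = 1000 GB) and acquisition sustained around the clock; the slide states neither, so the results indicate order of magnitude only. The function name `tb_per_day` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day(rate_gb_per_s: float) -> float:
    """Daily volume in TB for a sustained rate in GB/s (decimal units assumed)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


# Rates taken from the slide: ~1 GB/s today, ~3-5 GB/s in 5-10 years.
for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained -> {tb_per_day(rate):.1f} TB/day")
```

Under these assumptions, a sustained rate near 1 GB/s corresponds to roughly 86 TB per day, and 3 to 5 GB/s to roughly 260 to 430 TB per day.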
Transforming data acquisition
Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal condition
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution
Transforming data acquisition
Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data
Facility data acquisition
Research Data Management as a Service
Globus transfer service
Reduced data
Analysis/Sharing
Globus sharing service
Globus data publication service*
* In development
730GB in 90 minutes
“…frees up my time to do more creative work rather than typing scp commands or devising scripts to initiate and monitor progress to move many files.”
Steven Gottlieb, Indiana University
San Diego to Miami
1 click
20 minutes
“Twenty minutes instead of sixty one hours. Globus makes OLAM global climate simulations manageable.”
Craig Mattocks, University of Miami
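The two testimonials above can be turned into rough rates. This is a sketch under assumptions the slides do not state: decimal units (1 GB = 1000 MB) and the quoted durations treated as exact wall-clock times. The helper names are illustrative.

```python
def throughput_mb_per_s(gigabytes: float, minutes: float) -> float:
    """Average throughput in MB/s (decimal units assumed)."""
    return gigabytes * 1000 / (minutes * 60)


def speedup(before_hours: float, after_minutes: float) -> float:
    """Ratio of the old duration to the new one."""
    return before_hours * 60 / after_minutes


# 730GB in 90 minutes (first testimonial)
rate = throughput_mb_per_s(730, 90)
print(f"Average rate: {rate:.0f} MB/s (~{rate * 8 / 1000:.2f} Gbit/s)")

# Sixty-one hours reduced to twenty minutes (second testimonial)
print(f"Speed-up: {speedup(61, 20):.0f}x")
```

On these assumptions the first transfer averaged about 135 MB/s, a little over 1 Gbit/s, and the second is a 183-fold reduction in elapsed time.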
Early adoption is encouraging
15,327 endpoints
182* daily users
*30-day average
41.8PB
2B files
Other innovative science SaaS projects
“Affordable”
Competitive TCO at modest scale
A time of disruptive change
Will data kill genomics?
“We are close to having a $1,000 genome sequence, but this may be accompanied by a $1 million interpretation.”
Bruce Korf, M.D., Past President, American College of Medical Genetics
Will data analysis kill genomics?
Globus Genomics
Flexible, scalable, affordable genomics analysis for all biologists
Data management SaaS + next-gen sequence analysis pipelines + scalable IaaS
Exome: $3 – $20
Whole Genome: $20 – $50
RNA-Seq: <$5
Alternatives are at 10-20x
Affordable scalability
350K core hours in last 6 months
Dobyns Lab
– Exome analysis
– 20x speed-up
– Next: 50x
Cox Lab
– Consensus variant calling
– 134 samples; 4 days
– <0.01% Mendel error rate
– Next: 13,000 samples
Another Example: DTI Pipelines
[Figure: Cost per Subject ($), axis 0 to 0.5, by instance type (m1.large, m1.xlarge, m3.xlarge, m3.2xlarge, m2.xlarge, m2.2xlarge, m2.4xlarge), for On-Demand, Spot (Low), and Spot (High) pricing]
SaaS is transformational for…
Researchers
Resource Providers
installers brokers
• Cede (some) control
• Evolve financial models
• Adapt institutional policies
• Become a lawyer!
developers integrators
GSI-OpenSSH
A platform for integration
administrators curators (of the user experience)
UX : Dev : Ops = 1 : 1 : 0
We are a non-profit service provider to the non-profit research community
Our challenge:
Sustainability
“Affordable” and “Sustainable”?
Either: high-priced commercial software (with generally higher levels of quality)
Or: free, open source software (with generally lower levels of quality)
Is there a happy medium?
Industry and economics themes
• Matlab: Commercial closed-source software. Sustainability achieved via license fees.
• Kitware: Commercial open source software. Sustainability achieved via services (mostly gov.?).
• DUNE: Community of university and lab people, with some commercial involvement.
• MVAPICH: Open source software. University team. Sustainability by continued fed. funding, some industry.
Globus: Subscriptions
Globus Provider plans (globus.org/provider-plans)
Globus Plus (globus.org/plus)
To provide more capability for more people at substantially lower cost by creatively aggregating (“cloud”) and federating (“grid”) resources
Our vision for a 21st century discovery infrastructure