SaaS and the Transformation of Research

Posted on 28-Aug-2014

Category: Data & Analytics

DESCRIPTION

In the academic research community we've made much progress over the past decade toward effective distributed cyberinfrastructure. In big-science fields such as high energy physics, astronomy, and climate, thousands benefit daily from tools that enable the distributed management and analysis of large quantities of data. Exploding data volumes and powerful simulation tools mean that most researchers will soon require similar capabilities, but they often do not have the resources or expertise to build and maintain the necessary IT infrastructure. Faced with a similar problem in industry, companies have adopted the Software-as-a-Service (SaaS) model to "free" themselves from IT complexity. We see the same shift occurring in the academic research world over the next decade - indeed, many of us use SaaS services such as Google Docs and Dropbox on a daily basis as an integral part of our research workflow. Here we describe a vision for the next generation of research cyberinfrastructure, and work that the University of Chicago has embarked on to further empower investigators and enable them to access new capabilities beyond the boundaries of their campus.

TRANSCRIPT

SaaS and the Transformation of Research

Vas Vasiliadis
vas@uchicago.edu

ci.uchicago.edu

High energy physics

Molecular biology

Cosmology

Genomics

Metagenomics

Linguistics

Economics

Climate change

Visual arts

Urban Science

Thank you to our sponsors!

U.S. DEPARTMENT OF ENERGY

Higgs discovery “only possible because of the extraordinary achievements of …grid computing”
Rolf Heuer, CERN DG

25 PB per year; 8,000 scientists worldwide

1 PB in last experiment; 800 scientists worldwide

1.2 PB of climate data delivered to 23,000 users

We have exceptional infrastructure for the 1%

What about the 99%?

Most labs have limited resources

NSF grants in 2007: awards under $350,000 made up 80% of awards and 50% of grant dollars.
(Chart: number of awards vs. award size, from $1,000 to $1,000,000; data from Bryan Heidorn)

57.7%

2012 Faculty Burden Survey, National Academies

(Chart: Active Research Time (%) vs. Federal Funding Amount, for award brackets from < $50K to > $3M; active research time spans roughly 40–65%.)

Potential economies of scale

Small laboratories
– PI, postdoc, technician, grad students
– Estimate 10,000 across US research community
– Average ill-spent/unmet need of 0.5 FTE/lab?

+ Medium-scale projects
– Multiple PIs, a few software engineers
– Estimate 1,000 across US research community
– Average ill-spent/unmet need of 3 FTE/project?

= Total 8,000 FTE: at ~$100K/FTE => $800M/yr
(If we could even find 8,000 skilled people)

Plus computers, storage, opportunity costs, … (the arithmetic is sketched below)
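A minimal Python sketch of that arithmetic, using only the assumptions stated above (10,000 small labs at 0.5 FTE each, 1,000 medium projects at 3 FTE each, ~$100K per FTE per year):

```python
# Back-of-envelope reproduction of the slide's estimate.
# All figures are the slide's stated assumptions, not measured data.
small_labs = 10_000        # small laboratories across the US research community
fte_per_lab = 0.5          # ill-spent / unmet IT effort per lab (FTE)

medium_projects = 1_000    # medium-scale, multi-PI projects
fte_per_project = 3        # ill-spent / unmet IT effort per project (FTE)

cost_per_fte = 100_000     # ~$100K per FTE per year

total_fte = small_labs * fte_per_lab + medium_projects * fte_per_project
annual_cost = total_fte * cost_per_fte

print(f"Total effort: {total_fte:,.0f} FTE")      # 8,000 FTE
print(f"Annual cost: ${annual_cost:,.0f}/yr")     # $800,000,000/yr
```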

Is there a better way to deliver research cyberinfrastructure?

Frictionless

Affordable

Sustainable

Commercial startups as “role models”

My Shiny New Startup

“Frictionless”

Great User Experience + High-performance (but invisible) infrastructure

SaaS is transformational for…

Researchers

A simple problem
• “Transfers often take longer than expected based on available network capacities”
• “Lack of an easy to use interface to some of the high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to supervise large data transfers”
• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”

Excerpts from ESnet reports

Exemplar: APS Beamline 2-BM

X-ray imaging and tomography at resolutions from a few µm down to 30 nm

Currently can generate >100 TB per day

<1 GB/s data rate today; ~3–5 GB/s expected in 5–10 years
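For scale, a sustained data rate converts to daily volume as sketched below; this is illustrative arithmetic only, and the beamline's actual duty cycle and burst behavior will differ.

```python
# Illustrative only: map a sustained data rate (GB/s) to daily volume (TB/day).
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_day(rate_gb_per_s: float) -> float:
    """Daily data volume in TB for a given sustained rate in GB/s."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000

for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained ~ {tb_per_day(rate):.0f} TB/day")
# 1 GB/s ~ 86 TB/day; 3 GB/s ~ 259 TB/day; 5 GB/s ~ 432 TB/day
```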

Transforming data acquisition

Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal conditions
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution

Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data

Facility data acquisition

Research Data Management as a Service

Globus transfer service

Reduced data

Analysis/Sharing: Globus sharing service

Globus data publication service*

* In development
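These services are exposed programmatically as well as through the web interface. As a hedged sketch of what this looks like from a script, here is a minimal transfer submission using the current Globus Python SDK (globus-sdk), which post-dates this deck; the client ID, endpoint UUIDs, and paths are placeholders.

```python
# Minimal sketch: submit a managed transfer with the Globus Python SDK (globus-sdk).
# CLIENT_ID, endpoint UUIDs, and paths below are placeholders, not real values.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"   # e.g. the facility's data transfer node
DST_ENDPOINT = "DEST-ENDPOINT-UUID"     # e.g. campus or lab storage

# Interactive login (native-app flow): prints a URL, then asks for the auth code.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(
    requested_scopes="urn:globus:auth:scope:transfer.api.globus.org:all"
)
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# Describe the transfer; the service handles retries, integrity checks, and
# notification, so there is no babysitting of scp sessions.
tdata = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT,
    label="Reduced beamline data", sync_level="checksum",
)
tdata.add_item("/data/reduced/", "/home/user/aps-2bm/", recursive=True)

task = tc.submit_transfer(tdata)
print("Submitted transfer task:", task["task_id"])
```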

730 GB in 90 minutes

“…frees up my time to do more creative work rather than typing scp commands or devising scripts to initiate and monitor progress to move many files.”
Steven Gottlieb, Indiana University

San Diego to Miami: 1 click, 20 minutes

“Twenty minutes instead of sixty-one hours. Globus makes OLAM global climate simulations manageable.”
Craig Mattocks, University of Miami
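For context, the figures in these two examples work out to roughly the following (plain unit conversion, no new data):

```python
# Implied effective throughput and speed-up from the figures quoted above.
gb, minutes = 730, 90
gbits_per_s = gb * 8 / (minutes * 60)
print(f"730 GB in 90 min ~ {gbits_per_s:.2f} Gb/s effective throughput")  # ~1.08 Gb/s

hours_before, minutes_after = 61, 20
print(f"61 hours vs. 20 minutes ~ {hours_before * 60 / minutes_after:.0f}x faster")  # ~183x
```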

Early adoption is encouraging

15,327 endpoints

182 daily users (30-day average)

41.8 PB

2B files

Other innovative science SaaS projects

“Affordable”

Competitive TCO at modest scale

A time of disruptive change

Will data kill genomics?

“We are close to having a $1,000 genome sequence, but this may be accompanied by a $1 million interpretation.”
Bruce Korf, M.D., Past President, American College of Medical Genetics

Will data kill genomics analysis?

globus genomics

Flexible, scalable, affordable genomics analysis for all biologists

Next-gen sequence analysis pipelines
+ Data management SaaS
+ Scalable IaaS

Exome: $3–$20
Whole genome: $20–$50
RNA-Seq: <$5

Alternatives are at 10–20x

Affordable scalability

350K core-hours in the last 6 months

Dobyns Lab: exome analysis; 20x speed-up; next: 50x

Cox Lab: consensus variant calling; 134 samples in 4 days; <0.01% Mendel error rate; next: 13,000 samples

Another Example: DTI Pipelines

(Chart: cost per subject ($) for DTI pipelines on EC2 instance types m1.large, m1.xlarge, m3.xlarge, m3.2xlarge, m2.xlarge, m2.2xlarge, and m2.4xlarge, under On-Demand, Spot (Low), and Spot (High) pricing; the y-axis spans $0 to $0.50 per subject.)
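The chart boils down to a simple cost model: per-subject cost is the instance time a subject consumes multiplied by the hourly price of the chosen instance and pricing tier. A minimal sketch of that model follows; the runtimes and hourly prices are entirely hypothetical placeholders, not the values behind the original chart.

```python
# Hypothetical sketch of the per-subject cost model behind a chart like this.
# Runtimes and hourly prices are made-up placeholders for illustration only.
def cost_per_subject(runtime_hours: float, price_per_hour: float,
                     subjects_per_run: int = 1) -> float:
    """Instance-hours times hourly price, amortized over subjects processed per run."""
    return runtime_hours * price_per_hour / subjects_per_run

# instance type -> (runtime per subject in hours, {pricing tier: $/hour})
scenarios = {
    "m1.xlarge":  (1.5, {"On-Demand": 0.35, "Spot (Low)": 0.05, "Spot (High)": 0.15}),
    "m2.4xlarge": (0.5, {"On-Demand": 0.98, "Spot (Low)": 0.12, "Spot (High)": 0.40}),
}

for instance, (hours, tiers) in scenarios.items():
    for tier, price in tiers.items():
        print(f"{instance:11s} {tier:12s} ${cost_per_subject(hours, price):.2f}/subject")
```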

SaaS is transformational for…

Researchers
Resource Providers

installers → brokers

Cede (some) control
Evolve financial models
Adapt institutional policies
Become a lawyer!

developers → integrators

GSI-OpenSSH

A platform for integration

administrators → curators (of the user experience)

UX : Dev : Ops = 1 : 1 : 0

We are a non-profit service provider to the non-profit research community

Our challenge:

Sustainability

“Affordable” and “Sustainable”?

Either: high-priced commercial software (with generally higher levels of quality)

Or: free, open source software (with generally lower levels of quality)

Is there a happy medium?

Industry and economics themes

• Matlab: Commercial closed-source software. Sustainability achieved via license fees.

• Kitware: Commercial open source software. Sustainability achieved via services (mostly gov.?).

• DUNE: Community of university and lab people, with some commercial involvement.

• MVAPICH: Open source software. University team. Sustainability by continued fed. funding, some industry.

Globus: Subscriptions

Globus Provider plans (globus.org/provider-plans)

Globus Plus (globus.org/plus)

To provide more capability for more people at substantially lower cost by creatively aggregating (“cloud”) and federating (“grid”) resources

Our vision for a 21st century discovery infrastructure
