Above the Clouds: A View from Academia


DESCRIPTION

The closing keynote by Armando Fox at the Eduserv Symposium 2011 - Virtualisation and the Cloud.

TRANSCRIPT


Above the Clouds: A View From Academia

Armando Fox, UC Berkeley
Eduserv Symposium, 12 May 2011

Presentation slides licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License.

Image: John Curley http://www.flickr.com/photos/jay_que/1834540/

Who Am I?

• Research: Internet-scale systems; productive parallel programming

• Teaching: software engineering
• Writing: co-author of the Above the Clouds tech report
• Disclaimer 1: I don’t speak for UC
• Disclaimer 2: relationship with Amazon

How We Got Into the Cloud: RAD Lab’s 5-year Mission

Enable one entrepreneur to prototype a great Web app over a 3-day weekend, then deploy it at scale

• Key technology: statistical machine learning
• Early critiques: “Demonstrate your ideas at scale!”
• Moved from Sun Blackbox to EC2 in mid-2008

• Feb. 2009: Above the Clouds tech report*
– Over 50K downloads; influenced high-profile IT companies


* abovetheclouds.cs.berkeley.edu, or CACM April 2010

Outline: Two Themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Assumption: familiar with cloud basics

– Public/pay-as-you-go

– Private/closed/“condo”

• Non-goal: regulatory thickets around cloudifying “sensitive” information

Public Cloud: CS Research

• Over $350,000 spent on AWS since 2008
– PhD student ~ US$75k/year => cloud ~ 1/3 student/month

• Experiments: 100-300 nodes common, 900 max
– large-scale storage, cloud programming, MapReduce
– results at scale now required for top-tier conferences
– most experiments last 0-4 hours
– “Small” experiments also run in the cloud, for convenience

• Comparison: Sun Blackbox at Berkeley
– $200k to acquire & install
– $300k+ in hardware donations
– staff: ≥ 0.5 FTE
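To make the comparison concrete, a back-of-the-envelope sketch; the $100k/yr loaded cost for the 0.5 FTE is my assumption, since the slides give only the other figures:

```python
# Back-of-the-envelope (my sketch): ~3 years of AWS spend vs. the
# Sun Blackbox. The 0.5-FTE loaded staff cost of $100k/yr is an
# assumption; the other figures come from the slides.

aws_total = 350_000                 # USD on AWS, 2008-2011 (slide)

blackbox_install = 200_000          # acquire & install (slide)
blackbox_hardware = 300_000         # donated hardware, at list value (slide)
blackbox_staff = 0.5 * 100_000 * 3  # assumed loaded cost, 0.5 FTE for 3 years

blackbox_total = blackbox_install + blackbox_hardware + blackbox_staff
print(f"AWS:      ${aws_total:,.0f}")       # $350,000
print(f"Blackbox: ${blackbox_total:,.0f}")  # $650,000, before power/cooling/space

# The AWS spend also bought elasticity: 900-node bursts with no
# queueing, which a fixed-size container cannot offer.
```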

Public Cloud: CS Education

• Great Ideas in Computer Architecture (reinvented Fall 2010): 190 students

• Software Engineering for Software-as-a-Service: 70 students

• Operating Systems: 70 students
• Intro to Data Science: 30 students
• Advanced topics in HCI: 20 students
• Natural language processing: 20 students
• Large-scale programming abstractions for the cloud: ~20 students (Fall 2011)

Administration, provisioning, and sizing are much easier on the public cloud than on UC instructional computing

Cloud Economics

• “Private should be cheaper if you have stable utilization”

[Figure: two graphs plotting demand against provisioned capacity over time, contrasting stable and bursty utilization]
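A minimal sketch of what those graphs imply, using a synthetic bursty demand trace and illustrative prices (my numbers, not the talk's):

```python
# Illustrative sketch (my numbers, not the talk's): pay for peak
# private capacity 24/7 vs. pay-as-you-go public cloud, on a
# synthetic bursty demand trace.

PRIVATE_PER_NODE_HOUR = 0.075  # amortized owned server (later slide)
PUBLIC_PER_NODE_HOUR = 0.08    # on-demand EC2-style price (later slide)

# Hourly demand in nodes over one day: mostly 10, one 4-hour spike to 100.
demand = [10] * 20 + [100] * 4

peak = max(demand)
private_cost = peak * PRIVATE_PER_NODE_HOUR * len(demand)  # provision for peak
public_cost = sum(demand) * PUBLIC_PER_NODE_HOUR           # pay only for use

print(f"private, provisioned for peak: ${private_cost:.2f}/day")  # $180.00
print(f"public, pay-as-you-go:         ${public_cost:.2f}/day")   # $48.00

# With flat demand (utilization near 100%) the inequality flips,
# which is exactly the "stable utilization" caveat on this slide.
```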

$Private < $Public?

• Capital: hardware, networking, and power are 5-7x cheaper at the scale of 100,000s of servers (Hamilton 2007)

• Operations: heavy automation => 1000s of machines per FTE admin

• R&D: cloud providers had to serve an internal business need anyway

• Services: “Scale makes availability affordable”: wide-area disaster recovery facilities

• Hidden/shared costs: power, cooling, staff, ....

Hard to Compete on Cost

• Zero-touch metering/billing infrastructure
• Optimized for low margin
– $0.08/hr: virtual CPU on EC2
– $0.02-0.08/hr: as-available “spot instances”
– Reserved instances (prepay 1-3 years; save if utilization > 25%)
– $0.00/hr: 1-year free usage tier for all services
– Private: ≥ $0.075/hr ($2000 private server amortized over 3 years with no indirect costs)

• “Moving to EC2 would cost about a factor of 2”
– Highly placed colleague at a major social site
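The arithmetic behind that ≥ $0.075/hr private figure, as a quick sanity check:

```python
# Sanity check of the slide's private-server figure (sketch): a $2000
# server amortized over 3 years, ignoring power, cooling, space, and
# admin staff.

server_cost = 2000.0   # USD purchase price, from the slide
hours = 3 * 365 * 24   # 3-year amortization window = 26,280 hours

print(f"${server_cost / hours:.4f}/hr")  # $0.0761/hr, i.e. the >= $0.075/hr above

# Even with all indirect costs ignored, this barely undercuts the
# $0.08/hr on-demand price and is beaten by $0.02/hr spot capacity.
```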

Try to smooth out peaks?

• Not waiting in queues accelerates research!
– Run several experiments simultaneously, each using 100s of machines for 1-2 hours, without queueing up

– Basic queueing theory: trade utilization vs. service time (see the sketch below)

– Better performance isolation than private cloud (!)

– N.B. for long jobs, some queueing may be OK

• Corollary 1: cost-associative billing encourages research spontaneity

• Corollary 2: incentive to stop using is important!

Effective metering & billing is key to on-demand usage model
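A minimal M/M/1 sketch of that utilization-vs-service-time tradeoff (my illustration; the talk gives no formula):

```python
# Minimal M/M/1 sketch (my illustration) of the utilization-vs-
# service-time tradeoff: mean time in system T = S / (1 - rho),
# where S is the service time and rho the utilization.

service_time_h = 2.0  # a typical 1-2 hour experiment, per the slides

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    t = service_time_h / (1 - rho)
    print(f"utilization {rho:4.0%}: {t:6.1f} h in system, "
          f"{t - service_time_h:6.1f} h of it waiting")

# Pushing a shared cluster toward high utilization makes waits explode;
# elastic pay-as-you-go capacity buys short waits by letting the
# provider worry about utilization instead.
```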

Example: wait times on UC Berkeley “Mako” cluster

Mako has 272 dual-socket (quad-core per socket) nodes with 24 GB RAM each

Source: ShaRCS (Shared Research Computing Services), presentation by the UC Office of the President at the UC Cloud Summit, April 2011

On the Other Hand... Big Data

Application                               Data generated per day
DNA sequencing (Illumina HiSeq machine)   1 TB
Large Synoptic Survey Telescope           30 TB (400 Mbps sustained data rate between Chile and NCSA)
Large Hadron Collider                     60 TB

* Simson L. Garfinkel, “An Evaluation of Amazon’s Grid Computing Services: EC2, S3 and SQS,” Technical Report TR-08-07, School of Engineering & Applied Sciences, Harvard University, 2008.
Source: Ed Lazowska, “eScience 2010,” Microsoft Cloud Futures Workshop, lazowska.cs.washington.edu/cloud2010.pdf

• Challenge: long-haul networking is the most expensive cloud resource, and the one improving most slowly

• Copy 8 TB to EC2 at ~20 Mbps*: ~35 days, ~$800
• Ship four 2 TB drives to Amazon: 1 day, ~$150
• Can private/shared networking resources be combined with the public cloud to get the best of both?
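Checking the slide's transfer arithmetic (a sketch; the ~20 Mbps sustained rate is Garfinkel's measured figure, while the ~$0.10/GB ingress price is my assumption about 2008-era AWS pricing):

```python
# Checking the slide's transfer arithmetic (sketch). The ~20 Mbps
# sustained rate is Garfinkel's measured figure; the ~$0.10/GB
# ingress price is an assumption about 2008-era AWS pricing.

data_bits = 8e12 * 8   # 8 TB in bits
bandwidth_bps = 20e6   # ~20 Mbps sustained

days = data_bits / bandwidth_bps / 86_400
print(f"transfer time: {days:.0f} days")      # ~37 days (slide says ~35)

print(f"network cost: ~${8_000 * 0.10:.0f}")  # 8000 GB * $0.10/GB ~= $800

# Shipping four 2 TB drives instead: ~1 day and ~$150. For big data,
# the courier beats the long-haul link on both time and cost.
```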

On the Other Hand... Cloud Provider

• Research that is hard to do on the public cloud:
– scheduling/provisioning research
– security: honeypots, malware containment, epidemic modeling
– energy efficiency and other physical monitoring
– experimenting with the networking fabric, multicast, etc.

• N.B.: cloud-provider research needs cloud users!
– Example: Microsoft Research Silicon Valley “Sherwood” cluster (~240 nodes)

Demanding customers drove cloud research

Nonprofit/Academic Clouds

• PlanetLab & Emulab
– highly successful from their customers’ point of view
– lots of great research, some of which might have been impossible on today’s public cloud

• Academic/research clusters
– Yahoo M45 cluster, Google/IBM cluster, TeraGrid: primarily application-level research
– OpenCirrus (HP/Intel/Yahoo/UIUC/IDA Singapore/Karlsruhe): bare-metal, federated, 1K+ cores/site

• Access model: write a proposal; closed community
• Saving money is a non-goal (in fact, a subsidized investment by universities & industrial partners)

OpenCirrus

• Infrastructure costs increase with the number of sites
• Claim: even at ~50% utilization, owning your infrastructure pays for itself in ~3 years
Source: R. Campbell et al., OpenCirrus..., Proc. Workshop on Hot Topics in Cloud Computing (HotCloud ’09), June 2009

Public & private clouds don’t see the same benefits

Benefit                                     Public   Private
“Infinite” resources on-demand              Yes      No
Instantaneous provisioning                  Yes      Varies
Better hardware                             Yes      No
Zero-commitment pay-as-you-go*              Yes      No
Reduced costs from economy of scale         Yes      No
Can do “cloud provider” research            No       Yes
Can trust co-tenants                        No       Yes
Better utilization through virtualization   Yes      Yes
Quickly & inexpensively move big data       No       Yes
Address data-custody regulatory issues      Varies   Yes

* Implies the ability to meter, and an incentive to release idle resources


So You Want to Build a Cloud...

• Single point of failure?

• Zero Touch?

• Hidden costs?

Single point of failure

• 30+ hour EBS outage on 21 April 2011
– triggered by human error (a network configuration change)

• Georedundant services (Netflix) largely unaffected
– At least, georedundancy was an available option!

• Non-redundant services had catastrophic outages

• Question: would “more” operational expertise have resolved the outage faster?

Metering and Billing

• Billing is policy; metering is mechanism.
– A pay-as-you-go policy allows cost associativity
– Any policy is only as flexible as its mechanism
– Amazon’s mechanism: “zero-touch” metering
– So, Virtual Private Cloud ≠ your private cloud

• Which of these need human intervention?
– Signing up? Provisioning? Deploying? Billing?
– Academic/nonprofit clouds don’t even try this
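One way to picture the policy/mechanism split, as a hypothetical sketch (illustrative structures, not Amazon's actual system): metering emits fine-grained usage records, and each billing policy is just a different function over the same records.

```python
# Hypothetical sketch of "billing is policy, metering is mechanism"
# (illustrative structures, not Amazon's actual system).

from dataclasses import dataclass

@dataclass
class UsageRecord:        # the mechanism: one metered slice of usage
    user: str
    node_hours: float

def pay_as_you_go(records, rate=0.08):
    """Policy 1: bill every metered node-hour at the on-demand rate."""
    return sum(r.node_hours for r in records) * rate

def condo_flat_fee(records, monthly_fee=500.0):
    """Policy 2: condo-style flat fee that ignores the meter entirely."""
    return monthly_fee

records = [UsageRecord("alice", 120.0), UsageRecord("alice", 30.0)]
print(pay_as_you_go(records))   # 12.0  -- usage-sensitive, rewards releasing nodes
print(condo_flat_fee(records))  # 500.0 -- no incentive to release idle resources

# A coarser meter could never support the finer policy: any billing
# policy is only as flexible as its metering mechanism.
```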

Hidden Costs

• Does a single billing scheme capture all costs, or must some costs be billed/accounted for separately?
– shared expenses: power, networking
– general employment benefits/overhead for staff

• Cost of keeping up with innovation
– On average, AWS has deployed one new service every 2 months since the EC2 beta launch*

• Competition from new providers will exacerbate this
– Microsoft Azure, VMware Cloud Foundry, ...

* 21 Web service APIs as of April 2011

Two Themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Capability
– The cloud accelerates and enables new research
– Scale that can’t be achieved any other way

• Cost
– Will a private cloud cost less? Is that the main goal?
– Have hidden costs been accounted for?
– Cost associativity allows bursty use and encourages spontaneity, but needs fine-grained metering

Two Themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Cloud-provider research may require a private cloud
– Security, energy, bare metal, cloud provisioning, ...

– But, still need cloud users (customers) to drive/validate

– Need public-cloud-level APIs, service reliability

• Cloud user
– Big data may keep some otherwise cloud-ready apps off the public cloud

– Exotic architectures (SSD, in-memory DB, ...)

– Regulatory issues....

Summary

• The public cloud shows how to “move the slider” between insourcing & outsourcing

• Unlikely to compete on cost with very large scale public clouds

• So, how much can/should you outsource...
...for technical reasons (types of research possible)?

...for regulatory reasons (data privacy, etc.)?

• Remember the non-obvious costs
– Metering & billing, especially for shared overheads
– Keeping up with the ecosystem

Thanks!

• UC Berkeley Reliable Adaptive Distributed Systems Lab & Affiliates

• UC Cloud Computing Task Force
• Andy Powell & Eduserv

[Photo: RAD Lab team in 2009]
