1 Yeah! That’s what I’d like to know. Indranil Gupta (Indy) Lecture 2 What(’s in) the Cloud? January 21, 2010 CS 525 Advanced Distributed Systems Spring 2010 All Slides © IG


Yeah! That’s what I’d like to know.

Indranil Gupta (Indy)
Lecture 2

What(’s in) the Cloud?
January 21, 2010

CS 525 Advanced Distributed Systems
Spring 2010

All Slides © IG


Clouds are Water Vapor

• Oracle has a Cloud Computing Center.

• And yet…

• Larry Ellison’s Rant on Cloud Computing


The Hype!

• Gartner - Cloud computing revenue will soar faster than expected and will exceed $150 billion within five years.

• Forrester - Cloud-Based Email Is Often Cheaper Than On-Premise Email

• Vivek Kundra, Federal CIO in the Obama administration: “Growing adoption of cloud computing could improve data sharing and promote collaboration among federal, state and local governments.” E.g., fedbizopps.gov

• Merrill Lynch: “By 2011 the volume of cloud computing market opportunity would amount to $160bn, including $95bn in business and productivity apps (email, office, CRM, etc.) and $65bn in online advertising.”

• IDC: “Spending on IT cloud services will triple in the next 5 years, reaching $42 billion and capturing 25% of IT spending growth in 2012.”

Sources: http://www.infosysblogs.com/cloudcomputing/2009/08/the_cloud_computing_quotes.htm and http://www.mytestbox.com


Ha ha hype! It’s a bunch of tripe, since no one is probably making or saving money.


$$$

• Ingo Elfering, Vice President of Information Technology Strategy, GlaxoSmithKline: “With Online Services, we are able to reduce our IT operational costs by roughly 30% of what we’re spending now and introduce a variable cost subscription model for these technologies that allows us to more rapidly scale or divest our investment as necessary as we undergo a transformational change in the pharmaceutical industry.”

• Jim Swartz, CIO, Sybase: “At Sybase, a private cloud of virtual servers inside its data centre has saved nearly $US2 million annually since 2006, Swartz says, because the company can share computing power and storage resources across servers.”

• Dave Powers, Associate Information Consultant at Eli Lilly and Company: “With AWS, Powers said, a new server can be up and running in three minutes (it used to take Eli Lilly seven and a half weeks to deploy a server internally) and a 64-node Linux cluster can be online in five minutes (compared with three months internally). The deployment time is really what impressed us. It’s just shy of instantaneous.”

Sources: http://www.infosysblogs.com/cloudcomputing/2009/08/the_cloud_computing_quotes.htm and http://www.mytestbox.com


Alright, alright. But for heaven’s sake, can someone tell me what is a cloud?


What is a Cloud?

• It’s a cluster! It’s a supercomputer! It’s a datastore!

• It’s superman!

• None of the above

• All of the above

• Cloud = Lots of storage + compute cycles nearby


What is a Cloud?

• A single-site cloud (aka “Datacenter”) consists of
– Compute nodes (split into racks)
– Switches, connecting the racks
– A network topology, e.g., hierarchical
– Storage (backend) nodes connected to the network
– Front-end for submitting jobs
– Services: physical resource set, software services

• A geographically distributed cloud consists of
– Multiple such sites
– Each site perhaps with a different structure and services


A Sample Cloud Topology

[Figure: sample cloud topology, with labels: Core Switch, Top of the Rack Switch, Rack, Servers]
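The pieces named on these two slides (servers grouped into racks, a top-of-the-rack switch per rack, a core switch joining the racks) can be sketched as a small data structure. This is only an illustration, not a model of any real datacenter: the class names, the strict two-level hierarchy, and the hop-counting rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rack:
    """A rack of servers behind one top-of-the-rack (ToR) switch."""
    name: str
    servers: List[str] = field(default_factory=list)


@dataclass
class SingleSiteCloud:
    """Hypothetical two-level hierarchy: core switch -> ToR switches -> servers."""
    racks: List[Rack] = field(default_factory=list)

    def rack_of(self, server: str) -> Rack:
        """Find the rack that holds a given server."""
        for rack in self.racks:
            if server in rack.servers:
                return rack
        raise KeyError(server)

    def switch_hops(self, a: str, b: str) -> int:
        """Number of switches a message crosses between two servers."""
        if a == b:
            return 0
        if self.rack_of(a) is self.rack_of(b):
            return 1  # only the shared ToR switch
        return 3      # ToR switch, core switch, ToR switch


cloud = SingleSiteCloud(racks=[
    Rack("rack-0", ["s0", "s1"]),
    Rack("rack-1", ["s2", "s3"]),
])
print(cloud.switch_hops("s0", "s1"))  # same rack
print(cloud.switch_hops("s0", "s2"))  # different racks
```

Under these assumptions, traffic inside a rack crosses one switch and traffic between racks crosses three, which is one reason placement of computation relative to data matters in a hierarchical topology.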


Scale of Industry Datacenters

• Microsoft [NYTimes, 2008]
– 150,000 machines
– Growth rate of 10,000 per month
– Largest datacenter: 48,000 machines
– 80,000 total running Bing

• Yahoo! [Hadoop Summit, 2009]
– 25,000 machines
– Split into clusters of 4000

• AWS EC2 (Oct 2009)
– 40,000 machines
– 8 cores/machine

• Google
– (Rumored) several hundreds of thousands of machines


OK, they are massive. But it is still called a “cluster”! And that’s not a new concept!


“A Cloudy History of Time” © IG 2010

[Figure: timeline from 1940 to 2010, with labels: The first datacenters!; Timesharing Companies & Data Processing Industry; PCs (not distributed!); Clusters; Grids; Peer to peer systems; Clouds and datacenters]


“A Cloudy History of Time” © IG 2010

First large datacenters: ENIAC, ORDVAC, ILLIAC
Many used vacuum tubes and mechanical relays

Timesharing Industry (1975):
• Market Share: Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10%
• Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108

Data Processing Industry - 1968: $70 M. 1978: $3.15 Billion.

Berkeley NOW Project
Supercomputers
Server Farms (e.g., Oceano)

Grids (1980s-2000s):
• GriPhyN (1970s-80s)
• Open Science Grid and Lambda Rail (2000s)
• Globus & other standards (1990s-2000s)

P2P Systems (90s-00s)
• Many Millions of users
• Many GB per day

Clouds


Why did all of this happen?


Trends: Technology

• Doubling Periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu speed: 18 mos

• Then and Now

Bandwidth
– 1985: mostly 56 Kbps links nationwide
– 2004: 155 Mbps links widespread

Disk capacity
– Today’s PCs have 100GBs, same as a 1990 supercomputer
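To see what the doubling periods on this slide imply, the growth over a fixed window can be worked out directly: a quantity that doubles every d months grows by a factor of 2^(m/d) after m months. The sketch below uses only the three periods quoted on the slide; the three-year window is an arbitrary choice for illustration.

```python
# Doubling periods quoted on the slide, in months.
DOUBLING_MONTHS = {"storage": 12, "bandwidth": 9, "cpu speed": 18}


def growth_factor(doubling_months: float, years: float) -> float:
    """Growth multiple after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (12 * years / doubling_months)


for resource, months in DOUBLING_MONTHS.items():
    print(f"{resource}: x{growth_factor(months, 3):.0f} in 3 years")
```

Over three years this gives 8x for storage, 16x for bandwidth, and 4x for CPU speed, so under these figures the network and disks outpace the processor.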


Trends: Users

• Then and Now

Biologists:
– 1990: were running small single-molecule simulations
– 2004: want to calculate structures of complex macromolecules, want to screen thousands of drug candidates, sequence very complex genomes

Physicists
– 2008 onwards: CERN’s Large Hadron Collider will produce 700 MB/s or 15 PB/year

• Trends in Technology and User Requirements: Independent or Symbiotic?
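The two LHC figures above can be cross-checked with a little arithmetic. The sketch assumes decimal units (1 PB = 10^9 MB) and a 365-day year; the "duty cycle" it prints is only an inference from the two quoted numbers, not something stated on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

rate_mb_per_s = 700        # from the slide
quoted_pb_per_year = 15    # from the slide

# Decimal units assumed: 1 PB = 10**9 MB.
continuous_pb_per_year = rate_mb_per_s * SECONDS_PER_YEAR / 10**9
implied_duty_cycle = quoted_pb_per_year / continuous_pb_per_year

print(f"continuous: {continuous_pb_per_year:.1f} PB/year")
print(f"implied duty cycle: {implied_duty_cycle:.0%}")
```

Running at 700 MB/s all year would give about 22 PB, so the quoted 15 PB/year is consistent with data being produced roughly two thirds of the time, if both figures are taken at face value.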


Prophecies

In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating “like a power company or water company”.

Plug your thin client into the computing Utility and Play your favorite Intensive Compute & Communicate Application

– [Have today’s clouds brought us closer to this reality?]


So, clouds have been around for decades! But aside from massive scale, what’s new about today’s cloud computing?!


What(’s new) in Today’s Clouds?

Three major features:

I. On-demand access: Pay-as-you-go, no upfront commitment.
– Anyone can access it (e.g., Washington Post – Hillary Clinton example)

II. Data-intensive Nature: What was MBs has now become TBs.
– Daily logs, forensics, Web data, etc.
– Do you know the size of the Wikipedia dump?

III. New Cloud Programming Paradigms: MapReduce/Hadoop, Pig Latin, DryadLINQ, Swift, and many others.
– High in accessibility and ease of programmability

Combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing.


I. On-demand access: *aaS Classification

On-demand: renting a cab vs (previously) renting a car, or buying one. E.g.:
– AWS Elastic Compute Cloud (EC2): $0.086-$1.16 per CPU hour
– AWS Simple Storage Service (S3): $0.055-$0.15 per GB-month

• HaaS: Hardware as a Service
– You get access to barebones hardware machines, do whatever you want with them
– Ex: Your own cluster, Emulab

• IaaS: Infrastructure as a Service
– You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this. Often said to subsume HaaS.
– Ex: Amazon Web Services (AWS: EC2 and S3), Eucalyptus, Rightscale.

• PaaS: Platform as a Service
– You get access to flexible computing and storage infrastructure, coupled with a software platform (often tightly)
– Ex: Google’s AppEngine

• SaaS: Software as a Service
– You get access to software services, when you need them. Often said to subsume SOA (Service Oriented Architectures).
– Ex: Microsoft’s LiveMesh, MS Office on demand
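The pay-as-you-go prices quoted at the top of this slide make it easy to estimate a bill. The sketch below is a back-of-the-envelope calculator using only those two price ranges; the function name and the example workload are hypothetical, and real bills of the time also included items (such as data transfer) that the slide does not list.

```python
# Price ranges quoted on the slide (USD, as of the 2010 lecture).
EC2_PER_CPU_HOUR = (0.086, 1.16)
S3_PER_GB_MONTH = (0.055, 0.15)


def monthly_cost_range(cpu_hours: float, stored_gb: float) -> tuple:
    """Return (low, high) pay-as-you-go cost for one month of usage."""
    low = cpu_hours * EC2_PER_CPU_HOUR[0] + stored_gb * S3_PER_GB_MONTH[0]
    high = cpu_hours * EC2_PER_CPU_HOUR[1] + stored_gb * S3_PER_GB_MONTH[1]
    return low, high


# Hypothetical workload: 1,000 CPU hours plus 500 GB stored for the month.
low, high = monthly_cost_range(cpu_hours=1000, stored_gb=500)
print(f"${low:.2f} to ${high:.2f}")
```

The point of the cab analogy shows up in the signature: cost is a function of usage alone, with no fixed term for buying or leasing machines.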


II. Data-intensive Computing

• Computation-Intensive Computing
– Example areas: MPI-based, High-performance computing, Grids
– Typically run on supercomputers (e.g., NCSA Blue Waters)

• Data-Intensive
– Typically store data at datacenters
– Use compute nodes nearby
– Compute nodes run computation services

• In data-intensive computing, the focus shifts from computation to the data: CPU utilization no longer the most important resource metric

• Problem areas include
– Distributed systems
– Middleware
– OS
– Storage
– Networking
– Security
– Others


III. New Cloud Programming Paradigms

Dataflow programming frameworks
• Google: MapReduce and Sawzall
• Yahoo: Hadoop and Pig Latin
• Microsoft: DryadLINQ
• Facebook: Hive
• Amazon: Elastic MapReduce service (pay-as-you-go)

• Google (MapReduce)
– Indexing: a chain of 24 MapReduce jobs
– ~200K jobs processing 50PB/month (in 2006)

• Yahoo! (Hadoop + Pig)
– WebMap: a chain of 100 MapReduce jobs
– 280 TB of data, 2500 nodes, 73 hours

• Facebook (Hadoop + Hive)
– ~300TB total, adding 2TB/day (in 2008)
– 3K jobs processing 55TB/day

• Similar numbers from other companies, e.g., Yieldex, eharmony.com, etc.
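For readers who have not yet seen the MapReduce model that these frameworks share, here is a toy word count written in its three-step shape (map, shuffle, reduce). It is a single-process sketch for intuition only: the function names are made up for this example and are not the Hadoop or Google API, and a real framework would spread the map and reduce tasks across many machines and handle failures.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, group by word, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the cloud is a cluster", "the cloud is a datastore"])
print(counts["cloud"], counts["cluster"])
```

A "chain" of jobs, as in the indexing and WebMap examples above, feeds the output of one such job in as the input records of the next.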


This is all confusing. Can you give me some examples of clouds?


Two Categories of Clouds

• Industrial Clouds
– Can be either a (i) public cloud, or (ii) private cloud
– Private clouds are accessible only to company employees
– Public clouds provide service to any paying customer:
  • Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per GB-month stored
  • Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images, pay per CPU hour used
  • Google AppEngine: develop applications within their appengine framework, upload data that will be imported into their format, and run

• Academic Clouds
– Allow researchers to innovate, deploy, and experiment
– Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop
– Cloud Computing Testbed (CCT @ UIUC): first cloud testbed to support systems research. Runs: (i) apps programmed atop Hadoop and Pig, (ii) systems-level research on this first generation of cloud computing models (~HaaS), and (iii) Eucalyptus services (~AWS EC2). http://cloud.cs.illinois.edu
– OpenCirrus: first federated cloud testbed. http://opencirrus.org


Academic Clouds

• CCT = Cloud Computing Testbed
– NSF infrastructure
– Used by 10+ NSF projects, including several non-UIUC projects
– Housed within Siebel Center (4th floor!)
– Accessible to students of CS525!

• Almost half of SP09 course used CCT for their projects

• OpenCirrus = Federated Cloud Testbed
– Contains CCT and other sites

• If you need a CCT account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525


Cloud Computing Testbed (CCT)


CCT Hardware in more Detail

• 128 compute nodes = 64+64

• 500 TB & 1000+ shared cores


Goal of CCT: Support both Systems Research and Applications Research in Data-intensive Distributed Computing


CCT Software Services

Accessing and Using CCT:

I. Systems Partition (64-8 nodes):
– CentOS machines
– Dedicated access to a subset of machines (~ Emulab), with sudo access
– User accounts
  • User requests # machines (<= 64) + storage quota (<= 30 TB)
  • Machine allocation survives for 4 weeks, storage survives for 6 months (both extendible)

II. Hadoop/Pig Partition and Service (64 nodes)

III. Eucalyptus Partition (8 nodes)


CCT Software Services

Accessing and Using CCT:

I. Systems Partition (64-8 nodes)

II. Hadoop/Pig Partition and Service (64 nodes):
– Looks like a regular shared Hadoop cluster service
  • Users share 64 nodes. Individual nodes not directly reachable.
  • 4 slots per machine
  • Several users report stable operation at 256 instances
  • During Spring 09, 10+ projects running simultaneously
– User accounts
  • User requests account + storage quota (<= 30 TB)
  • Storage survives for 6 months (extendible)

III. Eucalyptus Partition (8 nodes)
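The node and slot counts on these CCT slides fit together arithmetically. The check below rests on two readings that are assumptions, not statements from the slides: that "64-8" means 64 minus 8 (the Eucalyptus nodes being carved out of the systems half), and that one Hadoop "instance" occupies one slot.

```python
systems_nodes = 64 - 8    # "64-8 nodes" on the slide, read as a subtraction
hadoop_nodes = 64         # Hadoop/Pig partition
eucalyptus_nodes = 8      # Eucalyptus partition
slots_per_machine = 4     # Hadoop slots per machine

total_nodes = systems_nodes + hadoop_nodes + eucalyptus_nodes
hadoop_slots = hadoop_nodes * slots_per_machine

print(total_nodes, hadoop_slots)
```

Under those readings the three partitions add up to the 128 compute nodes listed for the CCT hardware, and the 256 instances reported as stable correspond to every slot in the Hadoop partition being in use.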


CCT Software Services

Accessing and Using CCT:

I. Systems Partition (64-8 nodes)

II. Hadoop/Pig Partition and Service (64 nodes)

III. Eucalyptus Partition (8 nodes):
• Based on open-source version of Eucalyptus from UCSB (Rich Wolski)
• Exports same interface as AWS EC2 and S3.


CCT Software Services

• Some Services running inside CCT
– ZFS: backend file system.
– Zenoss: Systems Monitoring. Shared with department’s other computing clusters
– Hadoop + HDFS
– Ability to make datasets publicly available

• How do users request an account: two-stage process (go to http://cloud.cs.illinois.edu)
1. User account request – requires background check
2. Allocation request


Open Cirrus Federation

Founding 6 sites


Open Cirrus Federation

First open federated cloud testbed

Shared: research, applications, infrastructure (9*1,000 cores), data sets

Global services: sign on, monitoring, store, etc. Federated clouds, meaning each is different

[Figure: map of sites, with labels: HP, UIUC, Intel, KIT (de), IDA (sg), Yahoo, CMU, RAS, ETRI, MIMOS]

Grown to 9 sites, with more to come


OK, so that’s what a cloud looks like today. Now, suppose I want to start my own company, Devils Inc. Should I buy a cloud and own it, or should I outsource to a public cloud?


Next Week

• We will continue discussion of cloud computing
– How MapReduce works
– What is PlanetLab and Emulab
– What is Grid computing

• Then we will start to discuss Basics of P2P systems

• Please read at least one paper from each session


Administrative Announcements

Student-led paper presentations (see instructions on website)

• Start from February 11th

• Groups of up to 2 students present each class, responsible for a set of 3 “Main Papers” on a topic
– 45 minute presentations (total) followed by discussion
– Set up appointment with me to show slides by 5 pm day prior to presentation
– Select your topic by Jan 31st

• List of papers is up on the website

• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one to two page review of any two of the main set of papers (summary, comments, criticisms and possible future directions)
– Email review and bring in hardcopy before class


Announcements (contd.)

Projects

• Groups of 2 (need not be same as presentation groups)

• We’ll start detailed discussions “soon” (a few classes into the student-led presentations)