google cloud computing on google developer 2008 day

Cloud Computing

Ping YehJune 14, 2008

Evolution of Computing with the Network

Cluster Computing

Network Computing

Grid Computing

Utility Computing

Network is computer (client - server)

Separation of Functionalities

Cluster and grid images are from Fermilab and CERN, respectively.


Cluster Computing

Network Computing

Grid Computing

Utility Computing


Tightly coupled computing resources:CPU, storage, data, etcUsually connected within a LANManaged as a single resource


Commodity, Open Source



Cluster Computing

Network Computing

Grid Computing

Utility Computing



Resource sharing across administrative domains Decentralized, open

standards, non-trivial service



Global Resource Sharing



Cluster Computing

Network Computing

Grid Computing

Utility Computing



Don't buy computers, lease computing power

Upload, run, download

Resource sharing across administrative domains Decentralized, open

standards, non-trivial service



Global Resource Sharing

Ownership Model


The Next Step: Cloud Computing

Services and data are in the cloud,accessible with any device

connected to the cloudwith a browser

The Next Step: Cloud Computing

Services and data are in the cloud,accessible with any device

connected to the cloudwith a browser

A key technical issue for developers:

Scalability

Applications on the Web

internet splat map: http://flickr.com/photos/jurvetson/916142/, CC-by 2.0baby picture: http://flickr.com/photos/cdharrison/280252512/, CC-by-sa 2.0

Your user


Your CoolestWeb Application

Applications on the Web


Your user


The Cloud


松下問童子

言師採藥去

只在此山中

雲深不知處

賈島《尋隱者不遇》

I asked the kid under the pine tree,"Where might your master be?""He is picking herbs in the mountain," he said,"the cloud is too deep to know where."

Jia Dao, "Didn't meet the master,"written around 800AD

The Cloud

How many users do you want to have?


Nov. '98: 10,000 queries on 25 computersApr. '99: 500,000 queries on 300 computersSep. '99: 3,000,000 queries on 2100 computers

Google Growth

Scalability matters

Counting the numbers

Client / Server

One : Many

Personal Computer

One : One

Counting the numbers

Client / Server

One : Many

Personal Computer

One : One

Cloud Computing

Many : Many

Developer transition

What Powers Cloud Computing?

Performance: single machine not interesting

Reliability:o Most reliable hardware will still fail:

fault-tolerant software neededo Fault-tolerant software enables use

of commodity components Standardization: use standardized

machinesto run all kinds of applications

Commodity Hardware

Infrastructure Software

Distributed storage:Google File System (GFS)

Distributed semi-structured datasystem: BigTable

Distributed data processingsystem: MapReduce

chunk ...chunk ...chunk ...chunk ...

/foo/bar

google.stanford.edu (circa 1997)

google.com (1999)

“cork boards"

Google Data Center (circa 2000)

google.com (new data center 2001)

google.com (3 days later)

Current Design

• In-house rack design• PC-class motherboards• Low-end storage and

networking hardware• Linux• + in-house software

How to develop a web application that scales?

Storage Database Serving

Google's solution/replacement

GoogleFile

SystemBigTableMapReduce Google

AppEngine

DataProcessing

How to develop a web application that scales?

Storage Database Serving

Google's solution/replacement

GoogleFile

SystemBigTableMapReduce Google

AppEngine

Published papers

Opened on2008/5/28

DataProcessing

hadoop: open source implementation

Google File System

GFS Client

ApplicationReplicas

Masters

GFS Master

GFS Master

C0 C1

C2C5

Chunkserver

C0

C2

C5

Chunkserver

C1

Chunkserver

…

File namespace

chunk 2ef7chunk ... chunk ... chunk ...

/foo/bar

GFS Client

Application

C5 C3

• Files broken into chunks (typically 64 MB)

• Chunks triplicated across three machines for safety (tunable)

• Master manages metadata

• Data transfers happen directly between clients and chunkservers

GFS Usage @ Google

• 200+ clusters• Filesystem clusters of

up to 5000+ machines• Pools of 10000+ clients• 5+ PB Filesystems

All in the presence of frequent HW failures

BigTable

“www.cnn.com”

“contents:”

Rows

Columns

Timestamps

t3t11

t17

“<html>…”

• Distributed multi-level sparse map: fault-tolerant, persistent

• Scalable:o Thousands of serverso Terabytes of in-memory data,

petabytes of disk-based data• Self-managing

o Servers can be added/removed dynamically

o Servers adjust to load imbalance

Data model:(row, column, timestamp) cell contents

Why not just use commercial DB?

• Scale is too large or cost is too high for most commercial databases

• Low-level storage optimizations help performance significantlyo Much harder to do when running on top of a database layero Also fun and challenging to build large-scale systems :)

System Structure

Lock service

Bigtable master

Bigtable tablet server Bigtable tablet serverBigtable tablet server

GFSCluster scheduling system

…

holds metadata,handles master-electionholds tablet data, logshandles failover, monitoring

performs metadata ops +load balancing

serves data serves dataserves data

BigTable Cell

System Structure

Lock service

Bigtable master

Bigtable tablet server Bigtable tablet serverBigtable tablet server

GFSCluster scheduling system

…

holds metadata,handles master-electionholds tablet data, logshandles failover, monitoring

performs metadata ops +load balancing

serves data serves dataserves data

Bigtable client

Bigtable clientlibrary

Open()read/write

metadata ops

BigTable Cell

BigTable Summary

• Data model applicable to broad range of clientso Actively deployed in many of Google’s services

• System provides high performance storage system on a large scaleo Self-managingo Thousands of serverso Millions of ops/secondo Multiple GB/s reading/writing

• Currently ~500 BigTable cells• Largest bigtable cell manages ~3PB of data spread over

several thousand machines (larger cells planned)

Distributed Data Processing

How do you process 1 month of apache logs to find the usage pattern numRequest[minuteOfTheWeek]?• Input files: N rotated logs• Size: O(TB) for popular sites – multiple physical disks• Processing phase 1: launch M processes

o input: N/M log fileso output: one file of numRequest[minuteOfTheWeek]

• Processing phase 2: merge M output files of step 1

Pseudo Codes for Phase 1 and 2

def findBucket(requestTime):# return minute of the week

numRequest = zeros(1440*7) # an array of 1440*7 zerosfor filename in sys.argv[2:]:for line in open(filename):minuteBucket = findBucket(findTime(line))numRequest[minuteBucket] += 1outFile = open(sys.argv[1], 'w')for i in range(1440*7):outFile.write("%d %d\n" % (i, numRequest[i]))outFile.close()

numRequest = zeros(1440*7) # an array of 1440*7 zerosfor filename in sys.argv[2:]:for line in open(filename):col = line.split()[i, count] = [int(col[0]), int(col[1])]numRequest[i] += count# write out numRequest[] like phase 1

Task Management

• Logistics:o Decide which computers to run phase 1, make sure the log

files are accessible (NFS-like or copy)o Similar for phase 2

• Execution:o Launch the phase 1 programs with appropriate command

line flags, re-launch failed tasks until phase 1 is doneo Similar for phase 2

• Automation: build task scripts on top of existing batch system (PBS, Condor, GridEngine, LoadLeveler, etc)

Technical Issues

• File management: where to store files?o Store all logs on the same file server Bottleneck!➔o Distributed file system: opportunity to run locally

• Granularity: how to decide N and M?o Performance when M until M == N if no I/O contention➚ ➚o Can M > N? Yes! Careful log splitting. Is it faster?

• Job allocation: assign which task to which node?o Prefer local job: knowledge of file system

• Fault-recovery: what if a node crashes?o Redundancy of data a musto Crash-detection and job re-allocation necessary Performance

Robustness

Reusability

MapReduce – A New Model and System

• Map: (in_key, in_value) { (➔ keyj, valuej) | j = 1, ..., K }• Reduce: (key, [value1, ... valueL]) (➔ key, f_value)

Two phases of data processing

MapReduce Programming Model

• Borrowed from functional programming

map(f, [x1, x2, ...]) = [f(x1), f(x2), ...]reduce(f, x0, [x1, x2, x3,...])= reduce(f, f(x0, x1), [x2,...])= ... (continue until the list is exausted)• Users implement two functions:

map (in_key, in_value) ➔(keyj, valuej) list

reduce [value1, ... valueL] ➔ f_value

MapReduce Version of Pseudo Code

def findBucket(requestTime):# return minute of the week

class LogMinuteCounter(MapReduction):def Map(key, value, output): # key is locationminuteBucket = findBucket(findTime(value))output.collect(str(minuteBucket), "1") def Reduce(key, iter, output):sum = 0while not iter.done():sum += 1output.collect(key, str(sum))

• Look! mom, no file I/O!• Only data processing logic...

... and gets much more than that!

MapReduce Framework

For certain classes of problems, the MapReduce framework provides:• Automatic & efficient parallelization/distribution• I/O scheduling: Run mapper close to input data (same

node or same rack when possible, with GFS)• Fault-tolerance: restart failed mapper or reducer tasks on

the same or different nodes• Robustness: tolerate even massive failures, e.g. large-

scale network maintenance: once lost 1800 out of 2000 machines

• Status/monitoring

Task Granularity And Pipelining

• Fine granularity tasks: many more map tasks than machineso Minimizes time for fault recoveryo Can pipeline shuffling with map executiono Better dynamic load balancing

• Often use 200,000 map/5000 reduce tasks w/ 2000 machines

MapReduce: Adoption at Google

MapReduce Programs in Google’s Source Tree

Summer intern effectNew MapReduce Programs Per Month

MapReduce: Uses at Google

• Typical configuration: 200,000 mappers, 500 reducers on 2,000 nodes

• Broad applicability has been a pleasant surpriseo Quality experiments, log analysis, machine translation, ad-

hoc data processing, …o Production indexing system: rewritten w/ MapReduce

~10 MapReductions, much simpler than old code

MapReduce Summary

• MapReduce has proven to be a useful abstraction• Greatly simplifies large-scale computations at Google• Fun to use: focus on problem, let library deal with messy

details• Published

A Data Playground

• Substantial fraction of internet available for processing• Easy-to-use teraflops/petabytes, quick turn-around• Cool problems, great colleagues

MapReduce + BigTable + GFS = Data playground

Query Frequency Over Time

Learning From Data

Searching for Britney Spears...

Open Source Cloud Software: Project Hadoop

• Google published papers on GFS ('03), MapReduce ('04) and BigTable ('06)

• Project Hadoopo An open source project with the Apache Software

Foundationo Implement Google's Cloud technologies in Javao HDFS ("GFS") and Hadoop MapReduce are available,

Hbase ("BigTable") is being developed.• Google is not directly involved in the development: avoid

conflict of interest.

Industrial interest in Hadoop

• Yahoo! hired core Hadoop developerso Announced that their Webmap is produced on a Hadoop

cluster with 2000 hosts (dual/quad cores) on Feb 19, 2008.• Amazon EC2 (Elastic Compute Cloud) supports Hadoop

o Write your mapper and reducer, upload your data and program, run and pay by resource utilisation

o Tiff-to-PDF conversion of 11 million scanned New York Times articles (1851-1922) done in 24 hours on Amazon S3/EC2 with Hadoop on 100 EC2 machines. http://open.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/

o Many silicon valley startups are using EC2 and starting to use Hadoop for their coolest ideas on internet-scale of data

• IBM announced "Blue Cloud," will include Hadoop among other software components

AppEngine

• Run your applications on Google infrastructure and data centerso Focus on your application, forget about machines, operating

systems, web server software, database setup / maintenance, load balancing, etc.

• Opened for public sign-up on 2008/5/28• Python API to Datastore (on top of BigTable) and Users• Free to start, pay as you expand• More details can be found in the AppEngine talks.• http://code.google.com/appengine/

Academic Cloud Computing Initiative

• Google works with top universities on teaching GFS, BigTable and MapReduce in courseso UW, MIT, Stanford, Berkeley, CMU, Marylando First wave in Taiwan: NTU, NCTU

"Parallel Programming" by Professor Pangfeng Liu (NTU) "Web Services and Applications" by Professors Wen-Chih

Peng and Jiun-Lung Huang (NCTU) o Google offers course materials, technical seminars and

student mentoring by Google engineers.• Google and IBM provides a data center for academic use

o Software stack: Linux + Hadoop + IBM's cluster management software

References

• “The Google File System,” Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003, pp. 20-43. http://research.google.com/archive/gfs-sosp2003.pdf

• “MapReduce: Simplified Data Processing on Large Clusters,” Jeffrey Dean, Sanjay Ghemawat, Communications of the ACM, vol. 51, no. 1 (2008), pp. 107-113. http://labs.google.com/papers/mapreduce-osdi04.pdf

• “Bigtable: A Distributed Storage System for Structured Data,” Fay Chang et al, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006, pp. 205-218. http://research.google.com/archive/bigtable-osdi06.pdf

• Distributed systems course materials (slides, videos): http://code.google.com/edu/parallel

Summary

• Cloud Computing is about scalable web applications (and data processing needed to make apps interesting)

• Lots of commodity PCs: good for scalability and cost• Build web applications to be scalable from the start

o AppEngine allows developers to use Google's scalable infrastructure and data centers

o Hadoop enables scalable data processing

The era of Cloud Computing is here!

Photo by mr.hero on panoramio (http://www.panoramio.com/photo/1127015)

news

peoplebook search

photo

product search

videomaps

e-mails

mobileblogs

groups

calendarscholar

Earth

Sky

web

desktop

translate

messages

google cloud computing on google developer 2008 day

Documents

evolution of computing

computing resources

cloud computing services

powers cloud computing

cloud computing ping

lease computing power

google data center

foobargfs client application