Hw09: Making Hadoop Easy on Amazon Web Services

Amazon Elastic MapReduce, Peter Sirota

Uploaded by: cloudera-inc

Posted on 20-Aug-2015


TRANSCRIPT

Page 1: Hw09   Making Hadoop Easy On Amazon Web Services

Amazon Elastic MapReduce

Peter Sirota

Page 2

Amazon Elastic MapReduce

• Enables customers to easily and cost-effectively process vast amounts of data.
• Utilizes a hosted Hadoop framework running on Amazon's web-scale infrastructure.
• Launched in the US in April 2009 and in the EU in July 2009.

Page 3

Amazon Elastic MapReduce

• Large-scale data processing involves a lot of "muck", and we want to remove it for our customers:
    • Compute clusters are hard to manage.
    • Hadoop is hard to tune.
    • Hadoop issues prevent smooth operation in the cloud.

Amazon.com Confidential

Page 4

Hadoop made simple and easy

Page 5

[Diagram: Amazon Elastic MapReduce workflow. Input data is uploaded to an input S3 bucket; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances, which read the input dataset from Amazon S3 and write output results to an output S3 bucket; when the job flow ends, the customer is notified and gets the results.]

Page 6

Amazon Elastic MapReduce Benefits

Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.

Easy to use: Get up and running quickly with an easy-to-use web console, robust command line clients, and sample jobs. No configuration necessary.

Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.

Cost effective: We monitor the progress of your jobs and turn off resources when the job flow is done.

Page 7

Problems customers solve with Elastic MapReduce

• Data mining (log processing, click stream analysis, similarities, etc.)
• Bioinformatics (genome analysis)
• Financial simulation (Monte Carlo simulation)
• File processing (resizing JPEGs)
• Web indexing


Page 8

Customer Feedback

• Pros:
    • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
    • Reliable platform for production data processing
• Challenges:
    • Simple tasks such as log processing require fluency in MapReduce.
    • Hadoop applications are difficult to develop.
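To illustrate the kind of MapReduce fluency the log-processing challenge refers to, here is a minimal local sketch in Python of the map, shuffle/sort, and reduce phases for counting requests per URL path. It assumes input lines in Apache Common Log Format. The function names (`mapper`, `shuffle`, `reducer`, `run_job`) are hypothetical, and the in-memory `shuffle` only stands in for the sort-and-group step Hadoop performs between the map and reduce phases. This is not an Elastic MapReduce API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (path, 1) for every well-formed access-log line."""
    for line in lines:
        fields = line.split()
        # Assumed Common Log Format, e.g.
        # 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326
        # After whitespace splitting, the request path is field 6.
        if len(fields) < 7:
            continue  # skip malformed lines instead of failing the task
        yield fields[6], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Local stand-in for Hadoop's sort/shuffle: group all values by key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(grouped: Iterable[Tuple[str, List[int]]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the counts emitted for each path."""
    for key, values in grouped:
        yield key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases the way a single MapReduce job would."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 512',
        '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
        "malformed line",
    ]
    print(run_job(sample))  # {'/about.html': 1, '/index.html': 2}
```

On a real cluster, the mapper and reducer would run as separate tasks (for example, through Hadoop Streaming, which exchanges tab-separated key/value lines over stdin and stdout). The sketch shows only the shape of the logic a developer has to write.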

Page 9

New Features

• Support for Apache Pig (August 2009)
    • Batch and interactive mode
    • Concurrent access to multiple file systems
    • Loading resources from Amazon S3
    • Additional Piggybank functions
    • Integration with the Elastic MapReduce client and web console

Page 10

New Features

• Support for Apache Hive 0.4 (available today)
    • Batch and interactive mode
    • Integration with the Elastic MapReduce client and web console
    • Additions to Hive:
        • Load table partitions automatically from Amazon S3
        • Specify an off-instance metadata store
        • Optimized data writes to Amazon S3
        • Reference resources on Amazon S3
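Loading table partitions automatically from Amazon S3 can be thought of as discovering the partition directories under a table's S3 location. The Python sketch below shows that discovery step over a list of object keys. It assumes Hive's convention of one `column=value` directory per partition level. The functions `partition_spec` and `discover_partitions` are hypothetical illustrations, not the Elastic MapReduce or Hive implementation, and the sketch leaves out listing keys from S3.

```python
from typing import Dict, Iterable, List, Optional


def partition_spec(key: str, table_prefix: str) -> Optional[Dict[str, str]]:
    """Return the partition columns encoded in an S3 object key, or None.

    Assumes Hive-style layout: <table_prefix>/col1=v1/col2=v2/<file>.
    table_prefix should end with "/" so that "logs/" does not match "logs2/".
    """
    if not key.startswith(table_prefix):
        return None  # object belongs to some other table
    relative = key[len(table_prefix):].lstrip("/")
    directories = relative.split("/")[:-1]  # last component is the file name
    spec: Dict[str, str] = {}
    for directory in directories:
        if "=" not in directory:
            return None  # not a partition directory (e.g. a temp folder)
        column, _, value = directory.partition("=")
        spec[column] = value
    return spec


def discover_partitions(keys: Iterable[str], table_prefix: str) -> List[Dict[str, str]]:
    """Collect the distinct partitions present among the given keys."""
    found = set()
    for key in keys:
        spec = partition_spec(key, table_prefix)
        if spec:  # skips None and files sitting directly under the table root
            found.add(tuple(spec.items()))
    return [dict(partition) for partition in sorted(found)]


if __name__ == "__main__":
    keys = [
        "logs/dt=2009-10-01/part-00000",
        "logs/dt=2009-10-01/part-00001",
        "logs/dt=2009-10-02/part-00000",
        "logs/_README",
        "other/dt=2009-10-03/part-00000",
    ]
    print(discover_partitions(keys, "logs/"))
    # [{'dt': '2009-10-01'}, {'dt': '2009-10-02'}]
```

Each discovered partition would then need to be registered in the table's metadata store so that queries can prune by partition column.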

Page 11

Amazon Elastic MapReduce Ecosystem

• Karmasphere Studio for Hadoop: NetBeans IDE for development, debugging, deployment, and management of Hadoop jobs
    • Deploy Hadoop jobs to Elastic MapReduce
    • Monitor progress of Elastic MapReduce job flows
    • Amazon S3 file browser
    • Elastic MapReduce HDFS browser

Page 12

Amazon Elastic MapReduce Ecosystem

• Support for Cloudera's Hadoop distribution (private beta)
    • Optionally use Cloudera's Hadoop while executing Elastic MapReduce job flows
    • Get support from Cloudera for Elastic MapReduce job flows

Page 13

Q&A