Hadoop in the Cloud with AWS' EMR

Post on 17-Jan-2015

Category:

Technology

DESCRIPTION

Quick intro to and walkthrough of the AWS Elastic Map Reduce (EMR) service. Part of a larger course at http://bit.ly/get-hadoop

TRANSCRIPT

Hadoop in the Cloud: AWS Elastic Map Reduce

• What is EMR?
• How does EMR compare to Hadoop?
• Use cases

EMR is an AWS Service

• AWS review helpful to understand
• Infiniteskills offers a course!
  – http://bit.ly/learn-aws

• AWS constantly changing and evolving

http://aws.amazon.com/documentation/elasticmapreduce/

EMR Overview

• Abstracts out cluster setup & management
  – Integrated provisioning, tooling, debug, monitoring
  – AWS constantly tuning and optimizing
  – Failed nodes automatically re-provisioned by AWS
• Reduced costs
  – Clusters shut down automatically by default
  – Excellent for sporadic MapReduce needs
• Integration to AWS
  – Leverage cost-effective EC2 instances for processing, S3 for storage
  – Monitoring done via CloudWatch
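The "shut down automatically" behaviour above can be sketched as a launch request. This is a minimal sketch, not the course's own code: it only builds the keyword arguments that the EMR RunJobFlow API accepts and sends nothing to AWS. The function name, the cluster name, the bucket, the instance type, the node count, and the release label are all assumptions chosen for illustration; the deck itself predates release labels.

```python
def transient_cluster_request(name, log_uri, steps, release_label="emr-5.36.0"):
    """Build RunJobFlow keyword arguments for a cluster that stops when idle.

    Nothing is sent to AWS here; the caller decides whether to submit it.
    The release label, instance type, and node count are illustrative
    assumptions, not values taken from the slides.
    """
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # 1 master + 2 workers (assumed)
            # False means the cluster terminates once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": list(steps),
        # Default role names created by the EMR console and CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request("nightly-logs", "s3://example-bucket/logs/", [])
print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

With boto3 installed and credentials configured, a dictionary like this would typically be submitted with `boto3.client("emr").run_job_flow(**request)`. Because the cluster is not kept alive, billing stops shortly after the last step completes, which is what makes the service attractive for sporadic jobs.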

EMR Architecture

[Diagram: Master Instance Group (EC2) connected to S3; Core Instance Group (EC2 instances with HDFS); Task Instance Group (EC2 instances)]

• Master group controls cluster
• Core group runs DataNode & TaskTracker daemons
• Task group runs tasks
• Can be added & removed
• S3 can be used for data input / output
• Master group coordinates core + task activities and manages cluster state
• Core + task instances read / write to / from S3
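The three instance groups on this slide map onto the `InstanceGroups` list of a RunJobFlow request. A minimal sketch, assuming a hypothetical helper name and an illustrative instance type; only the role names (`MASTER`, `CORE`, `TASK`) and field names come from the EMR API:

```python
def instance_groups(core_count, task_count, instance_type="m5.xlarge"):
    """Describe master, core, and (optionally) task groups for RunJobFlow.

    The instance type is an illustrative assumption. The task group is left
    out when task_count is 0, since task nodes are optional.
    """
    if core_count < 1:
        raise ValueError("a multi-node cluster needs at least one core node")

    groups = [
        # Master group: coordinates the cluster and tracks its state.
        {"Name": "master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core group: runs tasks and also stores HDFS data.
        {"Name": "core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task group: runs tasks only, holds no HDFS data.
        groups.append({"Name": "task", "InstanceRole": "TASK",
                       "InstanceType": instance_type,
                       "InstanceCount": task_count})
    return groups


for group in instance_groups(core_count=2, task_count=4):
    print(group["InstanceRole"], group["InstanceCount"])
```

The list would go under `Instances["InstanceGroups"]` in the launch request. Keeping the task group separate reflects the slide's point that task capacity can be added or removed without touching the nodes that hold HDFS data.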

EMR AWS Integration

• Datastore pull / push to
  – RDS
  – DynamoDB
  – S3
• Derived data can be stored in RedShift
  – Via AWS DataPipelines
  – Further post-processing
• Data can be pre-processed with Kinesis
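The S3 pull / push path can be sketched as a single Hadoop streaming step that reads its input from S3 and writes its output back. This is an assumption-laden sketch: the helper name, bucket paths, and the `cat` / `wc` commands are hypothetical, and the `command-runner.jar` form applies to newer EMR releases than the ones current when this deck was written.

```python
def streaming_step(name, mapper, reducer, input_uri, output_uri):
    """Build one EMR step definition for a Hadoop streaming job on S3 data.

    Only the dictionary is built; nothing is submitted. The S3 check is a
    convenience for this sketch, not an EMR requirement.
    """
    for uri in (input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError("expected an s3:// URI, got %r" % uri)

    return {
        "Name": name,
        # Shut the cluster down if the step fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }


step = streaming_step(
    "line-count",                      # hypothetical step name
    "cat", "wc",                       # hypothetical mapper and reducer commands
    "s3://example-bucket/input/",      # hypothetical input location
    "s3://example-bucket/output/",     # hypothetical output location
)
print(step["HadoopJarStep"]["Args"])
```

A step like this would be placed in the `Steps` list of a launch request, or added to a running cluster with boto3's `add_job_flow_steps`. Moving data to or from RDS, DynamoDB, or RedShift involves other tooling and is not covered by this sketch.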

What you give up with EMR

• Control
  – Always 2-3 months behind Hadoop releases
  – Cannot use CDH or HDP releases (although MapR is supported)
• Speed (if you’re not an AWS customer)
• Vendor lock-in

EMR Use Cases

• Already AWS customer
  – Lots of data in S3 / DynamoDB / RDS
• Sporadic MapReduce needs
• Proof-of-concepting Hadoop
• Ease of use
  – Seamless, near-infinite scale
  – Simple administration

Hadoop in the Cloud: AWS Elastic Map Reduce

• What is EMR?
• How does EMR compare to Hadoop?
• Benefits & downsides
• Use cases
