automated hadoop cluster construction on ec2

18
Automated Hadoop Clusters on EC2 Mark Kerzner SHMsoft

Upload: markkerzner

Post on 25-May-2015

1.561 views

Category:

Technology


2 download

DESCRIPTION

Presented at Houston Hadoop Meetup

TRANSCRIPT

Page 1: Automated Hadoop Cluster Construction on EC2

Automated Hadoop Clusters on EC2

Mark KerznerSHMsoft

Page 2: Automated Hadoop Cluster Construction on EC2

What is Hadoop? :) :) :)

Everybody knows that ... What is your definition?

Page 3: Automated Hadoop Cluster Construction on EC2

What is a cloud?

Everybody knows that, but 1. Elastic resources2. Internet delivery3. SAAS4. Virtualization5. Device-enabled6. Only (1) or all of the above

Page 4: Automated Hadoop Cluster Construction on EC2

You are the Hadoop programmer

... and you need tools What are your alternatives?● IDE● Local "cluster"● Pseudo-distributed cluster● EC2

Page 5: Automated Hadoop Cluster Construction on EC2

You are the Hadoop programmer

... and you need tools What are your alternatives?● IDE - compile and run the code● Local "cluster" - local file system● Pseudo-distributed cluster - test outside● EC2 - test on the cluster, test for scale

Page 6: Automated Hadoop Cluster Construction on EC2

What are your resources

● Tom White, "Hadoop, the Definitive Guide"● www.hadoopilluminated.com

Page 7: Automated Hadoop Cluster Construction on EC2

For real play, you need a cluster

Page 8: Automated Hadoop Cluster Construction on EC2

Hadoop+ (oh, by the way...)

HBase, Cassandra, MongoDB, NoSQL, Dynamo, BigTable, Dryad (MS), Azure (MS), MapReduce, MapR (EMC), Cloudera distribution, EMC distribution, IBM distribution...

Page 9: Automated Hadoop Cluster Construction on EC2

WhirrSetup export AWS_ACCESS_KEY_ID=... export AWS_SECRET_ACCESS_KEY=... Installcurl -O http://www.apache.org/dist/whirr/whirr-0.7.1/whirr-0.7.1.tar.gztar zxf whirr-0.7.1.tar.gz; cd whirr-0.7.1 Generate key sssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr Runbin/whirr launch-cluster --config recipes/zookeeper-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr

Page 10: Automated Hadoop Cluster Construction on EC2

Whirr limitations

● No EBS● All or nothing● Generates configuration artifacts● Takes over your computer, no more local

development - uses proxy● Hard to customize

Page 11: Automated Hadoop Cluster Construction on EC2

Amazon EMR

Page 12: Automated Hadoop Cluster Construction on EC2

EMR limitations

● No choice of image● Fixed architecture● Hard to debug● Hard to customize

Page 13: Automated Hadoop Cluster Construction on EC2

You do it

Repeat the manual procedure, only automate it PrepareAMI, Java, Hadoop On-the-flyStart AMI, login, configure, start services, verify, run test jobs

Page 14: Automated Hadoop Cluster Construction on EC2

You do it - advanced

On startup Under-provision, over-provision, progress On-the-fly Monitor, run test jobs, watch for cluster deterioration

Page 15: Automated Hadoop Cluster Construction on EC2

Cloudera Manager

Page 16: Automated Hadoop Cluster Construction on EC2

MapR Manager

Page 17: Automated Hadoop Cluster Construction on EC2

On the large scale

Hadoop 0.20 - up to 4,000 nodesHadoop 0.23 - up to 20,000GridGain - 100's of 1,000's

Page 18: Automated Hadoop Cluster Construction on EC2

Thank you

Questions?