DESCRIPTION

How to run your Hadoop clusters and HBase on EC2 without losing the data :)

TRANSCRIPT

Hadoop on EC2

Configuring and running Hadoop clusters using the Cloudera distribution

My farm

 

Start

 

Confirm

 

OK, it's running

 

Set /etc/hosts
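The slide itself is a screenshot. A minimal sketch of what the local /etc/hosts entries might look like, assuming the sh1..sh6 aliases used by the scripts below; the addresses are placeholders, not the real instance IPs:

# /etc/hosts on the local workstation - short aliases for the EC2 instances
203.0.113.11   sh1
203.0.113.12   sh2
203.0.113.13   sh3
203.0.113.14   sh4
203.0.113.15   sh5
203.0.113.16   sh6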

 

Logging into an EC2 machine

ec2_login.sh:

#!/bin/sh
ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1

For example,

ec2_login.sh sh1

Run a command on the cluster

run_on_cluster.sh:

#!/bin/bash
for i in {1..6}
do
  ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i $1
done

For example:

run_on_cluster.sh 'ifconfig | grep cast'

Result of running ifconfig

inet addr:10.220.141.227  Bcast:10.220.141.255
inet addr:10.95.31.140    Bcast:10.95.31.255
inet addr:10.220.214.15   Bcast:10.220.215.255
inet addr:10.94.245.56    Bcast:10.94.245.255
inet addr:10.127.17.143   Bcast:10.127.17.255
inet addr:10.125.79.225   Bcast:10.125.79.255

Edit local conf

gedit masters core-site.xml mapred-site.xml slaves

10.220.141.227  -  masters, core-site.xml
10.95.31.140    -  mapred-site.xml
10.220.214.15   -  slaves ...
10.94.245.56
10.127.17.143
10.125.79.225

masters

10.220.141.227

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.220.141.227</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.95.31.140:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/mapred</value>
  </property>
</configuration>

slaves

10.220.214.15
10.94.245.56
10.127.17.143
10.125.79.225

update-hadoop-cluster.sh

#!/bin/bash
for i in {1..6}
do
  scp -i ~/.ssh/hadoop.pem -r ~/projects//hadoop/conf ubuntu@hc$i:/home/ubuntu/
done

run-hadoop-cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'

Important gotchas

sudo chkconfig hadoop-0.20-namenode off

repeat for each installed service
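A possible way to handle all services in one pass instead of repeating the command by hand; a sketch that assumes the CDH init scripts are all named /etc/init.d/hadoop-* and that chkconfig is installed on the AMI, as in the command above:

#!/bin/sh
# disable automatic start on boot for every installed Hadoop/HBase/ZooKeeper service
for service in /etc/init.d/hadoop-*; do
  sudo chkconfig $(basename $service) off
done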

Important gotchas - 2

On EC2 in /etc/hosts you have

# Added by cloud-init
127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

Instead, use 127.0.0.1:

# Added by me - the developer
127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

# !!! for remote access, use the internal IP:
10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
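The 127.0.1.1 part of the fix can be pushed to every node with run_on_cluster.sh instead of editing each file by hand; a sketch that assumes the cloud-init line looks exactly like the one above (the per-node internal-IP line still has to be added individually):

run_on_cluster.sh 'sudo sed -i "s/^127\.0\.1\.1/127.0.0.1/" /etc/hosts'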

Now start Hadoop services

On each node 

for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done

Do it with a script:

run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'

Verify HDFS and MR

Verify!

hadoop fs -ls /
copy to
copy from
run MR
better yet...

w3m http://localhost:50070

w3m next screen
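The checks listed under "Verify!" can also be run from the command line; a minimal sketch, where the file names are placeholders and the examples jar path assumes a standard CDH3 install:

#!/bin/sh
# list the HDFS root
hadoop fs -ls /
# copy a file into HDFS and back out again
echo hello > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /tmp/hello.txt
hadoop fs -get /tmp/hello.txt /tmp/hello_copy.txt
# summary of live datanodes
hadoop dfsadmin -report
# run a small MapReduce job (pi estimator from the examples jar)
hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 10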

 

Start HBase

start-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master start
sudo /etc/init.d/hadoop-zookeeper-server start
sudo /etc/init.d/hadoop-hbase-regionserver start

w3m http://localhost:60010
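A quick smoke test of HBase from the shell, in addition to the web UI; the table and column family names here are made up for the example:

hbase shell <<'EOF'
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:greeting', 'hello'
scan 'test_table'
disable 'test_table'
drop 'test_table'
EOF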

Stop HBase

1. Run a major compaction (see the sketch after the script below)

2. stop-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master stop
sleep 5
sudo /etc/init.d/hadoop-zookeeper-server stop
sudo /etc/init.d/hadoop-hbase-regionserver stop
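For step 1, compactions can be triggered from the hbase shell before running the stop script; a sketch, with 'my_table' standing in for each table in the cluster:

hbase shell <<'EOF'
major_compact 'my_table'
EOF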

Amazon EMR

 

Whirr

 
