DESCRIPTION

How to run your Hadoop clusters and HBase on EC2 without losing the data :)

TRANSCRIPT

Hadoop on EC2

Configuring and running Hadoop clusters using the Cloudera distribution

My farm

 

Start

 

Confirm

 

OK, it's running

 

Set /etc/hosts
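The slide itself is a screenshot. A minimal sketch of what the local /etc/hosts entries might look like, assuming the sh1..sh6 aliases used by the scripts below; the addresses are placeholders, not the real instance IPs:

# /etc/hosts on the local workstation - short aliases for the EC2 instances
203.0.113.11   sh1
203.0.113.12   sh2
203.0.113.13   sh3
203.0.113.14   sh4
203.0.113.15   sh5
203.0.113.16   sh6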

 

Logging into an EC2 machine

ec2_login.sh:

#!/bin/sh
ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1

For example,

ec2_login.sh sh1

Run a command on the cluster

run_on_cluster.sh:

#!/bin/bash
for i in {1..6}
do
  ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i $1
done

For example:

run_on_cluster.sh 'ifconfig | grep cast'

Result of running ifconfig

inet addr:10.220.141.227  Bcast:10.220.141.255
inet addr:10.95.31.140    Bcast:10.95.31.255
inet addr:10.220.214.15   Bcast:10.220.215.255
inet addr:10.94.245.56    Bcast:10.94.245.255
inet addr:10.127.17.143   Bcast:10.127.17.255
inet addr:10.125.79.225   Bcast:10.125.79.255

Edit local conf

gedit masters core-site.xml mapred-site.xml slaves

10.220.141.227  -  masters, core-site.xml
10.95.31.140    -  mapred-site.xml
10.220.214.15   -  slaves ...
10.94.245.56
10.127.17.143
10.125.79.225

masters

10.220.141.227

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.220.141.227</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.95.31.140:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/mapred</value>
  </property>
</configuration>

slaves

10.220.214.15
10.94.245.56
10.127.17.143
10.125.79.225

update-hadoop-cluster.sh

#!/bin/bash
for i in {1..6}
do
  scp -i ~/.ssh/hadoop.pem -r ~/projects//hadoop/conf ubuntu@hc$i:/home/ubuntu/
done

run-hadoop-cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'

Important gotchas

sudo chkconfig hadoop-0.20-namenode off

repeat for each installed service
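A possible way to handle all services in one pass instead of repeating the command by hand; a sketch that assumes the CDH init scripts are all named /etc/init.d/hadoop-* and that chkconfig is installed on the AMI, as in the command above:

#!/bin/sh
# disable automatic start on boot for every installed Hadoop/HBase/ZooKeeper service
for service in /etc/init.d/hadoop-*; do
  sudo chkconfig $(basename $service) off
done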

Important gotchas - 2

On EC2 in /etc/hosts you have

# Added by cloud-init
127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

Instead, use 127.0.0.1:

# Added by me - the developer
127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

# !!! for remote access, use the internal IP:
10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
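The 127.0.1.1 part of the fix can be pushed to every node with run_on_cluster.sh instead of editing each file by hand; a sketch that assumes the cloud-init line looks exactly like the one above (the per-node internal-IP line still has to be added individually):

run_on_cluster.sh 'sudo sed -i "s/^127\.0\.1\.1/127.0.0.1/" /etc/hosts'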

Now start Hadoop services

On each node 

for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done

Do it with a script:

run_on_cluster.sh 'for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done'

Verify HDFS and MR

Verify!

hadoop fs -ls /
copy to
copy from
run MR
better yet...

w3m http://localhost:50070

w3m next screen
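The checks listed under "Verify!" can also be run from the command line; a minimal sketch, where the file names are placeholders and the examples jar path assumes a standard CDH3 install:

#!/bin/sh
# list the HDFS root
hadoop fs -ls /
# copy a file into HDFS and back out again
echo hello > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /tmp/hello.txt
hadoop fs -get /tmp/hello.txt /tmp/hello_copy.txt
# summary of live datanodes
hadoop dfsadmin -report
# run a small MapReduce job (pi estimator from the examples jar)
hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 10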

 

Start HBase

start-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master start
sudo /etc/init.d/hadoop-zookeeper-server start
sudo /etc/init.d/hadoop-hbase-regionserver start

w3m http://localhost:60010
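A quick smoke test of HBase from the shell, in addition to the web UI; the table and column family names here are made up for the example:

hbase shell <<'EOF'
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:greeting', 'hello'
scan 'test_table'
disable 'test_table'
drop 'test_table'
EOF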

Stop HBase

1. Run a major compaction (see the sketch after the script below)

2. stop-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master stop
sleep 5
sudo /etc/init.d/hadoop-zookeeper-server stop
sudo /etc/init.d/hadoop-hbase-regionserver stop
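For step 1, compactions can be triggered from the hbase shell before running the stop script; a sketch, with 'my_table' standing in for each table in the cluster:

hbase shell <<'EOF'
major_compact 'my_table'
EOF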

Amazon EMR

 

Whirr

 
