Hadoop on EC2 Configuring and running Hadoop clusters, using Cloudera distribution

Uploaded by markkerzner, posted on 29-Nov-2014


DESCRIPTION

How to run your Hadoop clusters and HBase on EC2, without losing the data :)

TRANSCRIPT

Page 1: Hadoop on ec2

Hadoop on EC2

Configuring and running Hadoop clusters, using Cloudera distribution

Page 2: Hadoop on ec2

My farm

 

Page 3: Hadoop on ec2

Start

 

Page 4: Hadoop on ec2

Confirm

 

Page 5: Hadoop on ec2

OK, it's running

 

Page 6: Hadoop on ec2

Set /etc/hosts

 

Page 7: Hadoop on ec2

Logging into an EC2 machine

ec2_login.sh:

#!/bin/sh
ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@$1

For example,

ec2_login.sh sh1

Page 8: Hadoop on ec2

Run command on the cluster

run_on_cluster.sh:

#!/bin/bash
for i in {1..6}
do
  ssh -i ~/.ssh/shmsoft_hadoop.pem ubuntu@sh$i $1
done

For example:

run_on_cluster.sh 'ifconfig | grep cast'

Page 9: Hadoop on ec2

Result of running ifconfig

inet addr:10.220.141.227  Bcast:10.220.141.255
inet addr:10.95.31.140    Bcast:10.95.31.255
inet addr:10.220.214.15   Bcast:10.220.215.255
inet addr:10.94.245.56    Bcast:10.94.245.255
inet addr:10.127.17.143   Bcast:10.127.17.255
inet addr:10.125.79.225   Bcast:10.125.79.255
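The addresses above can be pulled out in one pass, which is handy when filling in the conf files on the next slides; a small sketch (the sed expression is my addition, not part of the original deck):

```shell
# Extract just the internal IPs from the "inet addr:" lines
# produced by run_on_cluster.sh (the script from slide 8).
./run_on_cluster.sh 'ifconfig | grep cast' \
  | sed 's/.*inet addr:\([0-9.]*\).*/\1/'
```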

Page 10: Hadoop on ec2

Edit local conf

gedit masters core-site.xml mapred-site.xml slaves

10.220.141.227  -  masters, core-site.xml
10.95.31.140    -  mapred-site.xml
10.220.214.15   -  slaves
...
10.94.245.56
10.127.17.143
10.125.79.225

Page 11: Hadoop on ec2

masters

10.220.141.227

Page 12: Hadoop on ec2

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.220.141.227</value>
  </property>
</configuration>

Page 13: Hadoop on ec2

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>10.95.31.140:54311</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/mapred</value>
  </property>
</configuration>

Page 14: Hadoop on ec2

slaves

10.220.214.15
10.94.245.56
10.127.17.143
10.125.79.225

Page 15: Hadoop on ec2

update-hadoop-cluster.sh

#!/bin/bash
for i in {1..6}
do
  scp -i ~/.ssh/hadoop.pem -r ~/projects/hadoop/conf ubuntu@hc$i:/home/ubuntu/
done

run-hadoop-cluster.sh 'sudo cp /home/ubuntu/conf/* /etc/hadoop/conf/'

Page 16: Hadoop on ec2

Important gotchas

sudo chkconfig hadoop-0.20-namenode off

Repeat for each installed service.
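The repeat-for-each-service step can be scripted; a minimal sketch, assuming the CDH3-style /etc/init.d/hadoop-0.20-* service names used elsewhere in this deck:

```shell
#!/bin/sh
# Disable auto-start for every installed hadoop-0.20 service,
# so a reboot does not bring daemons up before the conf is in place.
for service in /etc/init.d/hadoop-0.20-*; do
  sudo chkconfig "$(basename "$service")" off
done
```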

Page 17: Hadoop on ec2

Important gotchas - 2

On EC2, cloud-init puts this into /etc/hosts:

# Added by cloud-init
127.0.1.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

Instead, map the hostname to 127.0.0.1:

# Added by me - the developer
127.0.0.1       domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53

# !!! for remote access, use internal ip:
10.220.169.157  domU-12-31-38-04-AA-53.compute-1.internal domU-12-31-38-04-AA-53
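The /etc/hosts fix can also be applied in place with sed instead of hand-editing; a hedged one-liner (my addition; assumes GNU sed and that the cloud-init line is the only 127.0.1.1 entry):

```shell
# Rewrite the cloud-init 127.0.1.1 mapping to 127.0.0.1 in /etc/hosts
sudo sed -i 's/^127\.0\.1\.1/127.0.0.1/' /etc/hosts
```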

Page 18: Hadoop on ec2

Now start Hadoop services

On each node 

for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done

Do it with a script:

run_my_cluster.sh ''

Page 19: Hadoop on ec2

Verify HDFS and MR

Verify!

hadoop fs -ls /
copy to
copy from
run MR
better yet...
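Spelled out, the checks above might look like this; a sketch only, and the examples jar path is my assumption about the CDH layout, not something from the deck:

```shell
hadoop fs -ls /                                  # HDFS answers
hadoop fs -put /etc/hosts /tmp/hosts-test        # copy a file to HDFS
hadoop fs -get /tmp/hosts-test /tmp/hosts-back   # copy it back out
hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 2 100   # run a small MR job
```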

Page 20: Hadoop on ec2

w3m http://localhost:50070

Page 21: Hadoop on ec2

w3m next screen

 

Page 22: Hadoop on ec2

Start HBase

start-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master start
sudo /etc/init.d/hadoop-zookeeper-server start
sudo /etc/init.d/hadoop-hbase-regionserver start

Page 23: Hadoop on ec2

w3m http://localhost:60010

Page 24: Hadoop on ec2

Stop HBase

1. Do compaction

2. stop-hbase.sh

#!/bin/sh
sudo /etc/init.d/hadoop-hbase-master stop
sleep 5
sudo /etc/init.d/hadoop-zookeeper-server stop
sudo /etc/init.d/hadoop-hbase-regionserver stop
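Step 1 above (compaction) can be kicked off from the HBase shell before running stop-hbase.sh; a sketch, with 'mytable' as a placeholder table name:

```shell
# Trigger a major compaction of one table from the HBase shell
echo "major_compact 'mytable'" | hbase shell
```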

Page 25: Hadoop on ec2

Amazon EMR

 

Page 26: Hadoop on ec2

Whirr