Hadoop Multi-node Cluster Installation
on Centos6.6
Created: 01-12-2015
Author: Hyun Kim
Last Updated: 01-12-2015
Version Number: 0.1
Contact info: [email protected]
Hadoop Multi Cluster Installation Guide with Centos 6
In this tutorial, we are using Centos 6.6 and we are going to
install a multi-node Hadoop cluster.
For this tutorial, we need at least two nodes. One of them is going
to be a master node and the other node is going to be a slave node.
I'm only using two nodes in this tutorial to keep this guide as
simple as possible. We will be installing the namenode and jobtracker
on the master node, and the datanode, tasktracker, and
secondarynamenode on the slave node. I'm using lbb01.example.com as
the hostname for my master node and lbb02.example.com for my slave node.
Simple enough? Let's get started.
Static IP Configuration
We want our servers to keep working even after an accidental
restart. Therefore, we will configure a static IP for each server.
Use the command below to open the Ethernet configuration.
Your connection might be eth0 instead of em1.
$ nano /etc/sysconfig/network-scripts/ifcfg-em1
Change BOOTPROTO="static" and add your IPADDR and NETMASK.
You can check your IP and netmask address by using the "ifconfig" command.
For example:
IPADDR="192.168.23.234"
NETMASK="255.255.255.0"
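Putting it together, a minimal ifcfg-em1 sketch might look like the
following. The IPADDR and NETMASK are the example values above; DEVICE
and ONBOOT are assumptions that may differ on your machine.

```shell
# Minimal static-IP sketch for /etc/sysconfig/network-scripts/ifcfg-em1.
# DEVICE and ONBOOT are assumed values; substitute your own interface name.
DEVICE=em1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.23.234
NETMASK=255.255.255.0
```

ONBOOT=yes makes the interface come up automatically at boot, which is
the point of this whole step.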
Configure Default Gateway
$ nano /etc/sysconfig/network
Now we are going to configure the network file. This may sound
complicated, but we simply add HOSTNAME and GATEWAY. If GATEWAY or
HOSTNAME already exists, simply edit it.
I'm using lbb01.example.com as my hostname.
Add your GATEWAY=XXX.XXX.XXX.X
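As a sketch, the resulting /etc/sysconfig/network might look like the
following. The hostname is the one used in this guide; the gateway
address is a placeholder assumption, so substitute your own.

```shell
# Minimal /etc/sysconfig/network sketch for the master node.
# GATEWAY below is a placeholder; use the gateway of your own network.
NETWORKING=yes
HOSTNAME=lbb01.example.com
GATEWAY=192.168.23.1
```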
Restart network
$ /etc/init.d/network restart
Configure DNS
$ nano /etc/resolv.conf
Add your primary and alternate nameservers.
For example,
nameserver xxx.xxx.xxx.x
nameserver xxx.xxx.xxx.x
$ yum update
to update all installed packages.
Download JDK
We need JDK to install Hadoop. I’m installing jdk-7u25 in this tutorial.
www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u25-oth-JPR
Download hadoop
We are installing hadoop-0.20.0 in this tutorial.
Hadoop-0.20.0 download link:
https://archive.apache.org/dist/hadoop/core/hadoop-0.20.0/
I saved the file under the root folder.
Ping localhost
Do everything we've done so far on the slave node as well, but change
the hostname to lbb02.example.com, NOT lbb01.example.com. Each node has
a different IPADDR (IP address), so use the "ifconfig" command to
adjust the settings accordingly.
edit /etc/hosts
On each node, edit the hosts file.
$ nano /etc/hosts
Add one line per node, with the IP address first and the hostname after it:
XXX.XXX.XXX.XXX lbb01.example.com (IP address and hostname of your master node)
XXX.XXX.XXX.XXX lbb02.example.com (IP address and hostname of your slave node)
Try to ping each host to see if they can communicate with each other.
You should be able to ping each host by hostname now.
On each node,
$ ping lbb01.example.com
$ ping lbb02.example.com
nslookup
$ nslookup lbb01.example.com
$ nslookup lbb02.example.com
If these commands output server, address, and name on each node, we have
successfully configured the network settings.
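The connectivity checks above can also be scripted. A minimal sketch, with
the two hostnames assumed throughout this guide:

```shell
# Ping each cluster node once and report the result.
check_host() {
  if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
    echo "$1: ping OK"
  else
    echo "$1: ping FAILED"
  fi
}

for h in lbb01.example.com lbb02.example.com; do
  check_host "$h"
done
```

Run this on both nodes; every line should end in "ping OK" before you
move on.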
Install hadoop
As you can see, I'm logged in as the root user. However, I'm not going
to extract hadoop as root. I will move the hadoop file to /home/lbbd/,
since that is where the user "lbbd" can write files.
Be aware that your user/account name will be different.
Giving lbbd permission
Although the hadoop file is extracted under /home/lbbd/, we need to give
lbbd permission over this folder. To do this, use the command
below.
$ chown -R lbbd:lbbd /home/lbbd/hadoop-0.20.0
Symlink hadoop-0.20.0 to hadoop
$ ln -s hadoop-0.20.0 hadoop
Why the symlink?
Whenever we need to edit something in the hadoop-0.20.0 folder, we no
longer have to type -0.20.0. We can simply get there with
$ cd /home/lbbd/hadoop. It's convenient.
Install JDK
I saved the jdk-7u25 file in /root/hadoop_packages. You don't have to do
the same: go to whichever folder you saved your JDK file in, and use the
command below to install the rpm.
$ rpm -ivh hadoop_packages/jdk-7u25-linux-x64.rpm
Edit hadoop-env.sh
$nano /home/lbbd/hadoop/conf/hadoop-env.sh
Now we need to edit hadoop-env.sh to let the Hadoop scripts know
where we extracted the JDK and Hadoop.
So I added the two lines below:
export JAVA_HOME=/usr/java/jdk1.7.0_25/
export HADOOP_HOME=/home/lbbd/hadoop
core-site.xml edit
$nano /home/lbbd/hadoop/conf/core-site.xml
Edit the file by adding
<property>
<name>fs.default.name</name>
<value>hdfs://(your hostname):9000</value>
</property>
hdfs-site.xml
$ nano /home/lbbd/hadoop/conf/hdfs-site.xml
Edit the file by adding
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/datastore</value>
<final>true</final>
</property>
Don't forget to give your account permission to /var/datastore. The
namenode cannot run without it.
So log in as root and create the folder shown above:
$ mkdir /var/datastore
Then give the user permission to access the folder:
$ chown -R lbbd:lbbd /var/datastore
Use the command below to see if the permission has been updated:
$ls -l /var/
mapred-site.xml
$ nano /home/lbbd/hadoop/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>(your hostname):9001</value>
</property>
edit .bash_profile
$ nano .bash_profile
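The exact lines to add are not shown above; a minimal sketch, assuming
the JDK and Hadoop locations used earlier in this guide:

```shell
# Assumed additions to ~/.bash_profile: export the JDK and Hadoop homes
# and put both bin directories on the PATH so java, hadoop, and jps resolve.
export JAVA_HOME=/usr/java/jdk1.7.0_25
export HADOOP_HOME=/home/lbbd/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```

Log out and back in (or source the file) so the new PATH takes effect.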
Run the commands below to see if everything is installed and on the
PATH correctly:
$java
$hadoop
$jps
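The same sanity check can be scripted. This sketch only confirms the
three binaries are reachable via the PATH; hadoop and jps will only
resolve once .bash_profile has taken effect.

```shell
# Report whether a command is reachable on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
  fi
}

for c in java hadoop jps; do
  check_cmd "$c"
done
```

If any line says "NOT found", re-check the .bash_profile edits before
formatting the namenode.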
Format Namenode
$ hadoop namenode -format
$ hadoop-daemon.sh start namenode
$ jps
Start jobtracker
$ hadoop-daemon.sh start jobtracker
$ jps
Do all of the steps above on your slave node as well. However, when
you edit the hdfs-site.xml file, use the properties below:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/data</value>
<final>true</final>
</property>
Then, as the root user, create the data folder and give your user
account permission to it, as we did with the /var/datastore folder:
$ mkdir /home/data
$ chown -R lbbd:lbbd /home/data