Hadoop Multi-node Cluster Installation
on Centos6.6
Created: 01-12-2015
Author: Hyun Kim
Last Updated: 01-12-2015
Version Number: 0.1
Contact info: [email protected]
Hadoop Multi Cluster Installation Guide with Centos 6
In this tutorial, we are using Centos 6.6 and we are going to
install a multi-node Hadoop cluster.
For this tutorial, we need at least two nodes. One of them is going
to be a master node and the other node is going to be a slave node.
I'm only using two nodes in this tutorial to keep this guide as
simple as possible. We will be installing the namenode and jobtracker
on the master node, and the datanode, tasktracker, and
secondarynamenode on the slave node. I'm using lbb01.example.com as
the hostname for my master node and lbb02.example.com for my slave node.
Simple enough? Let's get started.
Static IP Configuration
We want our servers to keep working even after an accidental
restart. Therefore, we will configure a static IP for each server.
Use the command below to open the Ethernet configuration.
Your connection might be eth0 instead of em1.
$ nano /etc/sysconfig/network-scripts/ifcfg-em1
Change BOOTPROTO="static" and add your IPADDR and NETMASK.
You can check your IP and netmask address by using the "ifconfig" command.
For example:
IPADDR="192.168.23.234"
NETMASK="255.255.255.0"
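Putting it together, a minimal ifcfg-em1 sketch might look like the
following. The IPADDR and NETMASK are the example values above; DEVICE
and ONBOOT are assumptions that may differ on your machine.

```shell
# Minimal static-IP sketch for /etc/sysconfig/network-scripts/ifcfg-em1.
# DEVICE and ONBOOT are assumed values; substitute your own interface name.
DEVICE=em1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.23.234
NETMASK=255.255.255.0
```

ONBOOT=yes makes the interface come up automatically at boot, which is
the point of this whole step.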
Configure Default Gateway
$ nano /etc/sysconfig/network
Now we are going to configure the network file. This may sound
complicated, but we simply add HOSTNAME and GATEWAY. If GATEWAY or
HOSTNAME already exists, simply edit it.
I'm using lbb01.example.com as my hostname.
Add your GATEWAY=XXX.XXX.XXX.X
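As a sketch, the resulting /etc/sysconfig/network might look like the
following. The hostname is the one used in this guide; the gateway
address is a placeholder assumption, so substitute your own.

```shell
# Minimal /etc/sysconfig/network sketch for the master node.
# GATEWAY below is a placeholder; use the gateway of your own network.
NETWORKING=yes
HOSTNAME=lbb01.example.com
GATEWAY=192.168.23.1
```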
Restart network
$ /etc/init.d/network restart
Configure DNS
$ nano /etc/resolv.conf
Add your primary and alternate nameservers.
For example,
nameserver xxx.xxx.xxx.x
nameserver xxx.xxx.xxx.x
$ yum update
to update all installed packages.
Download JDK
We need JDK to install Hadoop. I’m installing jdk-7u25 in this tutorial.
www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u25-oth-JPR
Download hadoop
We are installing hadoop-0.20.0 in this tutorial.
Hadoop-0.20.0 download link:
https://archive.apache.org/dist/hadoop/core/hadoop-0.20.0/
I saved the file under the root folder.
Ping localhost
Do everything we've done so far on the slave node as well, but change
the hostname to lbb02.example.com, NOT lbb01.example.com. Each node has
a different IPADDR (IP address), so use the "ifconfig" command to
adjust the settings accordingly.
edit /etc/hosts
On each node, edit the hosts file.
$ nano /etc/hosts
Add one line per node, with the IP address first and the hostname after it:
XXX.XXX.XXX.XXX lbb01.example.com (IP address and hostname of your master node)
XXX.XXX.XXX.XXX lbb02.example.com (IP address and hostname of your slave node)
Try to ping each host to see if they can communicate with each other.
You should be able to ping each host by hostname now.
On each node,
$ ping lbb01.example.com
$ ping lbb02.example.com
nslookup
$ nslookup lbb01.example.com
$ nslookup lbb02.example.com
If these commands output server, address, and name on each node, we have
successfully configured the network settings.
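The connectivity checks above can also be scripted. A minimal sketch, with
the two hostnames assumed throughout this guide:

```shell
# Ping each cluster node once and report the result.
check_host() {
  if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
    echo "$1: ping OK"
  else
    echo "$1: ping FAILED"
  fi
}

for h in lbb01.example.com lbb02.example.com; do
  check_host "$h"
done
```

Run this on both nodes; every line should end in "ping OK" before you
move on.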
Install hadoop
As you can see, I'm logged in as the root user. However, I'm not going
to extract hadoop as root. I will move the hadoop file to /home/lbbd/,
since that is where the user "lbbd" can write files.
Be aware that your user/account name will be different.
Giving lbbd permission
Although the hadoop file is extracted under /home/lbbd/, we need to give
lbbd permission over this folder. To do this, use the command
below.
$ chown -R lbbd:lbbd /home/lbbd/hadoop-0.20.0
Symlink hadoop-0.20.0 to hadoop
$ ln -s hadoop-0.20.0 hadoop
Why the symlink?
Whenever we need to edit something in the hadoop-0.20.0 folder, we no
longer have to type -0.20.0. We can simply get there with
$ cd /home/lbbd/hadoop. It's convenient.
Install JDK
I saved the jdk-7u25 file in /root/hadoop_packages. You don't have to do
the same: go to whichever folder you saved your JDK file in, and use the
command below to install the rpm.
$ rpm -ivh hadoop_packages/jdk-7u25-linux-x64.rpm
Edit hadoop-env.sh
$nano /home/lbbd/hadoop/conf/hadoop-env.sh
Now we need to edit hadoop-env.sh to let the Hadoop scripts know
where we extracted the JDK and Hadoop.
So I added the two lines below:
export JAVA_HOME=/usr/java/jdk1.7.0_25/
export HADOOP_HOME=/home/lbbd/hadoop
core-site.xml edit
$nano /home/lbbd/hadoop/conf/core-site.xml
Edit the file by adding
<property>
<name>fs.default.name</name>
<value>hdfs://(your hostname):9000</value>
</property>
hdfs-site.xml
$ nano /home/lbbd/hadoop/conf/hdfs-site.xml
Edit the file by adding
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/datastore</value>
<final>true</final>
</property>
Don't forget to give your account permission to /var/datastore. The
namenode cannot run without it.
So log in as root and create the folder shown above:
$ mkdir /var/datastore
Then give the user permission to access the folder:
$ chown -R lbbd:lbbd /var/datastore
Use the command below to see if the permission has been updated:
$ls -l /var/
mapred-site.xml
$ nano /home/lbbd/hadoop/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>(your hostname):9001</value>
</property>
edit .bash_profile
$ nano .bash_profile
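The exact lines to add are not shown above; a minimal sketch, assuming
the JDK and Hadoop locations used earlier in this guide:

```shell
# Assumed additions to ~/.bash_profile: export the JDK and Hadoop homes
# and put both bin directories on the PATH so java, hadoop, and jps resolve.
export JAVA_HOME=/usr/java/jdk1.7.0_25
export HADOOP_HOME=/home/lbbd/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```

Log out and back in (or source the file) so the new PATH takes effect.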
Run the commands below to see if everything is installed and on the
PATH correctly:
$java
$hadoop
$jps
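The same sanity check can be scripted. This sketch only confirms the
three binaries are reachable via the PATH; hadoop and jps will only
resolve once .bash_profile has taken effect.

```shell
# Report whether a command is reachable on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
  fi
}

for c in java hadoop jps; do
  check_cmd "$c"
done
```

If any line says "NOT found", re-check the .bash_profile edits before
formatting the namenode.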
Format Namenode
$ hadoop namenode -format
$ hadoop-daemon.sh start namenode
$ jps
Start jobtracker
$ hadoop-daemon.sh start jobtracker
$ jps
Do all of the steps above on your slave node as well. However, when
you edit the hdfs-site.xml file, use the properties below:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/data</value>
<final>true</final>
</property>
Then, as the root user, create the data folder and give your user
account permission to it, as we did with the /var/datastore folder:
$ mkdir /home/data
$ chown -R lbbd:lbbd /home/data