Copyright (c) Chirag Ahuja RESTRICTED CIRCULATION
Deploy Hadoop on Single Node Cluster
Install Hadoop in pseudo-distributed mode
This document explains how to set up Hadoop on a single-node cluster, where the single node acts as both master and slave. A single-node setup is useful for testing and learning purposes.
Contents
1. Recommended Platform
2. Prerequisites
3. Install Java (either Oracle JDK or OpenJDK)
   3.1 Update the source list
   3.2 Install OpenJDK
4. Configure SSH
   4.1 Install OpenSSH server and client
   4.2 Generate an SSH key pair
   4.3 Configure password-less SSH
   4.4 Check by SSH to localhost
5. Download Hadoop
   5.1 Download Hadoop
6. Install Hadoop
   6.1 Untar the tarball
   6.2 Go to HADOOP_HOME_DIR
7. Setup Configuration
   7.1 Edit conf/hadoop-env.sh and set JAVA_HOME
   7.2 Edit conf/core-site.xml and add the required entries
   7.3 Edit conf/hdfs-site.xml and add the required entries
   7.4 Edit conf/mapred-site.xml and add the required entries
8. Start The Cluster
   8.1 Format the NameNode
   8.2 Start Hadoop Services
   8.3 Check whether services have been started
9. Run Map-Reduce Jobs
   9.1 Run the classic Pi example
   9.2 Run the word count example
10. Stop The Cluster
1. Recommended Platform
OS: Ubuntu 12.04 (other distributions such as CentOS or Red Hat also work)
Hadoop: Cloudera's Distribution for Apache Hadoop, CDH3u5 (plain Apache Hadoop 0.20.x / 1.x can also be used)
2. Prerequisites
1. Java
2. OpenSSH
3. Hadoop Setup
3. Install Java (either Oracle JDK or OpenJDK)
3.1 Update the source list
$ sudo apt-get update
3.2 Install OpenJDK
$ sudo apt-get install openjdk-6-jdk
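Before moving on, it is worth confirming that a JDK is actually on the PATH. A minimal check, which works whether or not Java is installed yet:

```shell
# Report the installed Java version, or say so if no JDK is on the PATH.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH"
fi
```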
4. Configure SSH
4.1 Install OpenSSH server and client
$ sudo apt-get install openssh-server openssh-client
4.2 Generate an SSH key pair
$ ssh-keygen -t rsa -P ""
4.3 Configure password-less SSH
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
4.4 Check by SSH to localhost
$ ssh localhost
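If ssh localhost still prompts for a password, the most common cause is file permissions: sshd silently ignores authorized_keys when ~/.ssh or the key file is group- or world-writable. A sketch of the usual fix (idempotent, so it is safe to run even when the files already exist):

```shell
# Ensure ~/.ssh and authorized_keys exist with permissions sshd will accept.
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```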
5. Download Hadoop
5.1 Download Hadoop from the Cloudera archive:
$ wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u5.tar.gz
6. Install Hadoop
6.1 Untar the tarball
$ tar xzf hadoop-0.20.2-cdh3u5.tar.gz
6.2 Go to HADOOP_HOME_DIR
$ cd hadoop-0.20.2-cdh3u5/
7. Setup Configuration
7.1 Edit configuration file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java installation, e.g.:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
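If you are unsure where your JDK lives (for example after installing OpenJDK above rather than the Sun JDK in the example), the path can be derived from the java binary itself. This is an Ubuntu-oriented sketch: readlink -f follows the /etc/alternatives symlink chain to the real JDK directory.

```shell
# Derive JAVA_HOME from the resolved location of the java binary.
if command -v java >/dev/null 2>&1; then
  JAVA_HOME="$(readlink -f "$(command -v java)" | sed 's:/bin/java$::')"
  echo "export JAVA_HOME=$JAVA_HOME"
else
  echo "java not found; install a JDK first"
fi
```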
7.2 Edit configuration file conf/core-site.xml and add the following entries:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
  </property>
</configuration>
7.3 Edit configuration file conf/hdfs-site.xml and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
7.4 Edit configuration file conf/mapred-site.xml and add the following entries:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
8. Start The Cluster
8.1 Format the NameNode:
$ bin/hadoop namenode -format
Format the NameNode only once, when you first install Hadoop; running the format command on a working cluster deletes all your data in HDFS.
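Because reformatting is destructive, a small guard script can make this step safe to re-run. This is a sketch: the metadata path below assumes the hadoop.tmp.dir value configured in section 7.2, and the "current" subdirectory is what a formatted NameNode leaves behind.

```shell
# Refuse to reformat a NameNode directory that already holds metadata.
already_formatted() {
  [ -d "$1/current" ]
}
NAME_DIR="/home/hadoop_admin/hdata/hadoop-$USER/dfs/name"
if already_formatted "$NAME_DIR"; then
  echo "NameNode already formatted; refusing to run namenode -format"
else
  echo "No metadata found; safe to run: bin/hadoop namenode -format"
fi
```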
8.2 Start Hadoop Services
$ bin/start-all.sh
8.3 Check whether services have been started
$ jps
The output should list all five daemons:
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
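Checking the five daemons by eye is error-prone, so a sketch of a scripted check follows. The sample jps output below is hypothetical (the PIDs are made up); on the real node you would pass "$(jps)" to the function instead.

```shell
# Verify that each expected daemon appears in jps-style output,
# where each line has the form "<pid> <DaemonName>".
check_daemons() {
  out="$1"
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    if ! printf '%s\n' "$out" | grep -q "[0-9] $d\$"; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "all daemons running"
}

sample="2287 NameNode
2422 SecondaryNameNode
2350 DataNode
2510 JobTracker
2636 TaskTracker"
check_daemons "$sample"
```

On the cluster node itself, run check_daemons "$(jps)".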
9. Run Map-Reduce Jobs
9.1 Run the classic Pi example
$ bin/hadoop jar hadoop-*-examples.jar pi 10 100
9.2 Run the word count example:
$ bin/hadoop dfs -mkdir inputwords
$ bin/hadoop dfs -put conf inputwords
$ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords
$ bin/hadoop dfs -cat outputwords/*
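Each line of the word count output has the form word<TAB>count. A quick sanity check on the results is to sum the counts column with awk; the two-line input below is a made-up stand-in for the real output:

```shell
# Sum the counts column of tab-separated wordcount output.
printf 'alpha\t2\nbeta\t3\n' | awk -F'\t' '{s += $2} END {print s}'
# prints 5
```

On the cluster, pipe the real output instead: bin/hadoop dfs -cat outputwords/* | awk -F'\t' '{s += $2} END {print s}'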
10. Stop The Cluster
$ bin/stop-all.sh