Copyright (c) Chirag Ahuja RESTRICTED CIRCULATION
Deploy Hadoop on Single Node Cluster
Install Hadoop in pseudo-distributed mode
This document explains how to set up Hadoop on a single-node cluster, where the single node acts as both master and slave. A single-node setup is useful for testing and learning purposes.
Contents
1. Recommended Platform
2. Prerequisites
3. Install Java (either Oracle JDK or OpenJDK)
   3.1 Update the source list
   3.2 Install OpenJDK
4. Configure SSH
   4.1 Install OpenSSH server and client
   4.2 Generate an SSH key pair
   4.3 Configure password-less SSH
   4.4 Check by SSH to localhost
5. Download Hadoop
   5.1 Download Hadoop
6. Install Hadoop
   6.1 Untar the tarball
   6.2 Go to HADOOP_HOME_DIR
7. Setup Configuration
   7.1 Edit conf/hadoop-env.sh and set JAVA_HOME
   7.2 Edit conf/core-site.xml and add the required entries
   7.3 Edit conf/hdfs-site.xml and add the required entries
   7.4 Edit conf/mapred-site.xml and add the required entries
8. Start The Cluster
   8.1 Format the NameNode
   8.2 Start Hadoop Services
   8.3 Check whether services have been started
9. Run Map-Reduce Jobs
   9.1 Run the classic Pi example
   9.2 Run the word count example
10. Stop The Cluster
1. Recommended Platform
OS: Ubuntu 12.04 (other distributions such as CentOS or Red Hat also work)
Hadoop: Cloudera's Distribution for Apache Hadoop, CDH3u5 (plain Apache Hadoop 0.20.x / 1.x can also be used)
2. Prerequisites
1. Java
2. OpenSSH
3. Hadoop Setup
3. Install Java (either Oracle JDK or OpenJDK)
3.1 Update the source list
$ sudo apt-get update
3.2 Install OpenJDK
$ sudo apt-get install openjdk-6-jdk
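Before moving on, it is worth confirming that a JDK is actually on the PATH. A minimal check, which works whether or not Java is installed yet:

```shell
# Report the installed Java version, or say so if no JDK is on the PATH.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH"
fi
```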
4. Configure SSH
4.1 Install OpenSSH server and client
$ sudo apt-get install openssh-server openssh-client
4.2 Generate an SSH key pair
$ ssh-keygen -t rsa -P ""
4.3 Configure password-less SSH
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
4.4 Check by SSH to localhost
$ ssh localhost
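If ssh localhost still prompts for a password, the most common cause is file permissions: sshd silently ignores authorized_keys when ~/.ssh or the key file is group- or world-writable. A sketch of the usual fix (idempotent, so it is safe to run even when the files already exist):

```shell
# Ensure ~/.ssh and authorized_keys exist with permissions sshd will accept.
mkdir -p "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```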
5. Download Hadoop
5.1 Download Hadoop from the Cloudera archive:
$ wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u5.tar.gz
6. Install Hadoop
6.1 Untar the tarball
$ tar xzf hadoop-0.20.2-cdh3u5.tar.gz
6.2 Go to HADOOP_HOME_DIR
$ cd hadoop-0.20.2-cdh3u5/
7. Setup Configuration
7.1 Edit configuration file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java installation, e.g.:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
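If you are unsure where your JDK lives (for example after installing OpenJDK above rather than the Sun JDK in the example), the path can be derived from the java binary itself. This is an Ubuntu-oriented sketch: readlink -f follows the /etc/alternatives symlink chain to the real JDK directory.

```shell
# Derive JAVA_HOME from the resolved location of the java binary.
if command -v java >/dev/null 2>&1; then
  JAVA_HOME="$(readlink -f "$(command -v java)" | sed 's:/bin/java$::')"
  echo "export JAVA_HOME=$JAVA_HOME"
else
  echo "java not found; install a JDK first"
fi
```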
7.2 Edit configuration file conf/core-site.xml and add the following entries:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
  </property>
</configuration>
7.3 Edit configuration file conf/hdfs-site.xml and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
7.4 Edit configuration file conf/mapred-site.xml and add the following entries:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
8. Start The Cluster
8.1 Format the NameNode:
$ bin/hadoop namenode -format
Format the NameNode only once, when you first install Hadoop; running the format command on a working cluster deletes all your data in HDFS.
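Because reformatting is destructive, a small guard script can make this step safe to re-run. This is a sketch: the metadata path below assumes the hadoop.tmp.dir value configured in section 7.2, and the "current" subdirectory is what a formatted NameNode leaves behind.

```shell
# Refuse to reformat a NameNode directory that already holds metadata.
already_formatted() {
  [ -d "$1/current" ]
}
NAME_DIR="/home/hadoop_admin/hdata/hadoop-$USER/dfs/name"
if already_formatted "$NAME_DIR"; then
  echo "NameNode already formatted; refusing to run namenode -format"
else
  echo "No metadata found; safe to run: bin/hadoop namenode -format"
fi
```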
8.2 Start Hadoop Services
$ bin/start-all.sh
8.3 Check whether services have been started
$ jps
The output should list all five daemons:
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
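Checking the five daemons by eye is error-prone, so a sketch of a scripted check follows. The sample jps output below is hypothetical (the PIDs are made up); on the real node you would pass "$(jps)" to the function instead.

```shell
# Verify that each expected daemon appears in jps-style output,
# where each line has the form "<pid> <DaemonName>".
check_daemons() {
  out="$1"
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    if ! printf '%s\n' "$out" | grep -q "[0-9] $d\$"; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "all daemons running"
}

sample="2287 NameNode
2422 SecondaryNameNode
2350 DataNode
2510 JobTracker
2636 TaskTracker"
check_daemons "$sample"
```

On the cluster node itself, run check_daemons "$(jps)".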
9. Run Map-Reduce Jobs
9.1 Run the classic Pi example
$ bin/hadoop jar hadoop-*-examples.jar pi 10 100
9.2 Run the word count example:
$ bin/hadoop dfs -mkdir inputwords
$ bin/hadoop dfs -put conf inputwords
$ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords
$ bin/hadoop dfs -cat outputwords/*
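Each line of the word count output has the form word<TAB>count. A quick sanity check on the results is to sum the counts column with awk; the two-line input below is a made-up stand-in for the real output:

```shell
# Sum the counts column of tab-separated wordcount output.
printf 'alpha\t2\nbeta\t3\n' | awk -F'\t' '{s += $2} END {print s}'
# prints 5
```

On the cluster, pipe the real output instead: bin/hadoop dfs -cat outputwords/* | awk -F'\t' '{s += $2} END {print s}'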
10. Stop The Cluster
$ bin/stop-all.sh