Deploy Hadoop on a Single Node Cluster


Upload: krishna-mohan

Post on 20-Nov-2015


Copyright (c) Chirag Ahuja RESTRICTED CIRCULATION

Deploy Hadoop on Single Node Cluster

Install Hadoop in pseudo-distributed mode

This document explains how to set up Hadoop on a single-node cluster. The single node will act as both master and slave. A single-node setup is useful for testing and learning purposes.

Contents
1. Recommended Platform
2. Prerequisites
3. Install Java (either Oracle JDK or OpenJDK)
   3.1 Update the source list
   3.2 Install OpenJDK
4. Configure SSH
   4.1 Install OpenSSH server and client
   4.2 Generate a key pair
   4.3 Configure password-less SSH
   4.4 Check by SSH to localhost
5. Download Hadoop
   5.1 Download Hadoop
6. Install Hadoop
   6.1 Untar the tarball
   6.2 Go to HADOOP_HOME_DIR
7. Setup Configuration
   7.1 Edit conf/hadoop-env.sh and set JAVA_HOME
   7.2 Edit conf/core-site.xml and add the required entries
   7.3 Edit conf/hdfs-site.xml and add the required entries
   7.4 Edit conf/mapred-site.xml and add the required entries
8. Start the Cluster
   8.1 Format the NameNode
   8.2 Start Hadoop services
   8.3 Check whether the services have been started
9. Run Map-Reduce Jobs
   9.1 Run the classic Pi example
   9.2 Run the word count example
10. Stop the Cluster

1. Recommended Platform

OS: Ubuntu 12.04 (you can use another OS such as CentOS, Red Hat, etc.)
Hadoop: Cloudera's distribution for Apache Hadoop, CDH3U5 (you can also use Apache Hadoop 0.20.X / 1.X).

2. Prerequisites

1. Java
2. OpenSSH
3. Hadoop setup

3. Install Java (either Oracle JDK or OpenJDK)

3.1 Update the source list
$ sudo apt-get update

3.2 Install OpenJDK
$ sudo apt-get install openjdk-6-jdk

4. Configure SSH

4.1 Install OpenSSH server and client
$ sudo apt-get install openssh-server openssh-client

4.2 Generate a key pair
$ ssh-keygen -t rsa -P ""

4.3 Configure password-less SSH
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

4.4 Check by SSH to localhost
$ ssh localhost
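If ssh localhost still prompts for a password after the steps above, file permissions are a common cause: sshd refuses keys whose directory or authorized_keys file is group- or world-accessible. A minimal fix-up sketch, not part of the original guide:

```shell
# Tighten permissions on the SSH directory and key file; sshd's
# StrictModes check rejects an authorized_keys file others can write to.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, rerun $ ssh localhost; it should log in without asking for a password.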

5. Download Hadoop

5.1 Download Hadoop
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u5.tar.gz

6. Install Hadoop

6.1 Untar the tarball
$ tar xzf hadoop-0.20.2-cdh3u5.tar.gz

6.2 Go to HADOOP_HOME_DIR
$ cd hadoop-0.20.2-cdh3u5/
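The guide runs every later command from inside this directory (bin/hadoop ...). As an optional convenience, not mentioned in the original, you can export HADOOP_HOME and extend PATH so the commands work from any directory. The install path below is an assumption that the tarball was unpacked in your home directory; adjust it to where you actually untarred Hadoop.

```shell
# Assumed install location: the folder created by untarring in $HOME.
export HADOOP_HOME="$HOME/hadoop-0.20.2-cdh3u5"
# Put the hadoop launcher scripts on the PATH for this shell session.
export PATH="$HADOOP_HOME/bin:$PATH"
```

Add the two lines to ~/.bashrc if you want them to persist across sessions.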

7. Setup Configuration

7.1 Edit configuration file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java installation, e.g.:

export JAVA_HOME=/usr/lib/jvm/java-6-sun

7.2 Edit configuration file conf/core-site.xml and add the following entries (inside the <configuration> element):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_admin/hdata/hadoop-${user.name}</value>
</property>

7.3 Edit configuration file conf/hdfs-site.xml and add the following entry:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

7.4 Edit configuration file conf/mapred-site.xml and add the following entry:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
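One easy-to-miss detail: the local directory named in hadoop.tmp.dir must exist and be writable by the user running Hadoop, or the daemons will fail at startup. A minimal sketch, using $HOME/hdata as a stand-in for the /home/hadoop_admin/hdata path shown in the configuration; adjust it to match whatever you put in core-site.xml:

```shell
# Create the local directory HDFS and MapReduce will use for their data.
# The path must match the parent of the hadoop.tmp.dir value in core-site.xml.
mkdir -p "$HOME/hdata"
chmod 755 "$HOME/hdata"
```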

8. Start the Cluster

8.1 Format the NameNode:
$ bin/hadoop namenode -format
This should be done only once, when you first install Hadoop; formatting again will delete all your data from HDFS.

8.2 Start Hadoop services
$ bin/start-all.sh

8.3 Check whether the services have been started
$ jps
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
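The five daemons above can be checked in one pass instead of eyeballing the jps listing. A small sketch, not from the original guide: it assumes jps is on the PATH (it ships with the JDK), and reports every daemon as NOT running if jps is unavailable.

```shell
# List running JVMs once; each jps output line looks like "<pid> <ClassName>".
running="$(jps 2>/dev/null || true)"
for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # Match " <name>" at end of line so NameNode does not also match
    # the SecondaryNameNode line.
    if printf '%s\n' "$running" | grep -q " ${daemon}\$"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT running"
    fi
done
```

Once the daemons are up, the NameNode web UI (http://localhost:50070) and the JobTracker web UI (http://localhost:50030), the default ports for this Hadoop generation, give a quick view of cluster health.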

9. Run Map-Reduce Jobs

9.1 Run the classic Pi example
$ bin/hadoop jar hadoop-*-examples.jar pi 10 100
(The arguments are the number of map tasks and the number of samples per map.)

9.2 Run the word count example:
$ bin/hadoop dfs -mkdir inputwords
$ bin/hadoop dfs -put conf inputwords
$ bin/hadoop jar hadoop-*-examples.jar wordcount inputwords outputwords
$ bin/hadoop dfs -cat outputwords/*

10. Stop the Cluster
$ bin/stop-all.sh