TCS
Hadoop Components Setup
Ajay Vaidya
8/4/2012
Contents
Purpose
Interdependent Hadoop components
Before Starting Installation
Hadoop components
    Hadoop, hdfs and Mapreduce
        Download and Unpack
        Setting Parameters
        Format filesystem and start hadoop
        Test hadoop installation
    Hbase
        Download and Unpack
        Setting Parameters
        Start Hbase
        Test hbase
    Hive
        Download and Unpack
        Setting Parameters
        Start Hive
        Test hive
    Pig
        Download and Unpack
        Start Pig
Purpose
This document describes how to install the following Hadoop components in a single-machine environment. The installation procedure below was tested on openSUSE Linux running on VMware.
1) Hadoop, Hdfs and Mapreduce
2) Hbase
3) Hive
4) Pig
This document also describes how these components relate to each other in terms of interdependency and parameter configuration.
Please contact Ajay Vaidya (ajay.vaidya@tcs.com) for any queries about this document.
Interdependent Hadoop components
Hadoop provides a storage mechanism, HDFS, supported by the MapReduce framework. The other Hadoop family components are built on this storage mechanism.
Once you install all the components described in this document, your Linux filesystem will look like the following.
Before Starting Installation
Make sure that you have a 64-bit Linux environment with root access. Java must also be installed in the environment.
Hadoop components
Hadoop, hdfs and Mapreduce
Download and Unpack
Hadoop (which includes HDFS and MapReduce) can be downloaded from the Apache mirror site:
http://apache.techartifact.com/mirror/hadoop/common/stable/
(Note: always download the version from the stable reference.)
At the time of writing this document, the stable version was 1.0.3-1 and the file to be downloaded is hadoop-1.0.3-1.x86_64.rpm
(Note: x86_64 stands for the 64-bit version.)
Hadoop can be installed either in a Single Node Setup or in a clustered environment. Here we install Hadoop on a single machine in a Single Node Setup:
>rpm -ivh hadoop-1.0.3-1.x86_64.rpm
After executing this installation command, files are created at three different places:
A) Jar files stored at /usr/share/hadoop
B) Environment variable script and parameter XML files
C) Log file location, where logs are created when Hadoop services are started
Note that after a fresh installation this location contains no log files; they appear once the services are started.
Setting Parameters
Parameter XML files are stored at the /etc/hadoop location. Make changes to three XML files: 1) core-site.xml 2) hdfs-site.xml 3) mapred-site.xml
Examples of these three files follow. You can use the vi editor to edit these XML files, for example:
>vi core-site.xml
HDFS uses port 9000 to listen for incoming requests at the URL hdfs://localhost:9000.
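The original screenshot of this file is not reproduced here, so the following is a minimal core-site.xml sketch consistent with the port described above; treat the exact contents as an assumption.

```xml
<!-- Minimal core-site.xml sketch for a single-node setup (assumed layout).
     fs.default.name is the filesystem URI property used by Hadoop 1.0.x. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```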
You need to set only the dfs.replication property for a base Hadoop installation. The other properties are required for the HBase installation and its interworking with Hadoop.
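Since the original screenshots are not reproduced, here are minimal sketches of the other two files. The dfs.replication value follows the text above; the mapred.job.tracker value (port 9001) is an assumption typical of single-node Hadoop 1.x setups.

```xml
<!-- hdfs-site.xml sketch: a replication factor of 1 suits a single node. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

```xml
<!-- mapred-site.xml sketch: the port 9001 value is an assumption. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```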
Format filesystem and start hadoop
Format the filesystem using the following command. Note: the hadoop command is located at /usr/bin, which is typically in the system path. If it is not, you need to add the location of the hadoop command to the system path.
>hadoop namenode -format
By default, the namespace IDs for the name node and the data node should match. If they do not match, you need to overwrite the data node ID with the name node ID.
After a successful format, the namespace IDs for both are the same, e.g. namespaceID=233327041.
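If the IDs ever diverge (for example after re-formatting the name node), one common fix is sketched below; the directory paths are assumptions and depend on your dfs.name.dir and dfs.data.dir settings.

```
# Read the name node's namespaceID (path is an assumed example):
>grep namespaceID /var/lib/hadoop/dfs/name/current/VERSION
# Edit the data node's VERSION file so its namespaceID matches:
>vi /var/lib/hadoop/dfs/data/current/VERSION
```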
Start Hadoop by executing the start-all.sh command. By default, it is located in /usr/sbin, which is typically in the system path.
After issuing the start-all.sh command, it prompts for a password; enter the root password. It creates log files under the HADOOP_LOG_DIR location. Check the log files for any errors.
Test hadoop installation
Test whether you can use the hadoop command. For simplicity, try creating and listing one folder using the hadoop fs command. If you can list the test folder, your Hadoop installation is successful.
Note that you will not see input and output folders unless they have been created, but you should see the test folder since you created it using the hadoop fs command.
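A minimal smoke test along these lines (the folder name test is arbitrary):

```
>hadoop fs -mkdir /test
>hadoop fs -ls /
```

If /test appears in the listing, HDFS is accepting commands.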
Hbase
Download and Unpack
Hbase can be downloaded from following apache site. Always download the stable version.
http://apache.techartifact.com/mirror/hbase/stable/
At the time of writing this document, Hbase stable version was 0.92.1
Download the file hbase-0.92.1.tar.gz and copy it to the folder on Linux where you want to install HBase.
Run the command >tar xfz hbase-0.92.1.tar.gz
This unpacks the tar file and creates a folder named hbase-0.92.1 in the same directory.
Setting Parameters
The HBase parameters file hbase-site.xml is located in the conf directory. Update its parameters as shown.
You can edit hbase-site.xml using the vi editor with the command >vi hbase-site.xml
The parameter hbase.rootdir indicates the location where HBase stores its data on Hadoop HDFS.
Note that the port number, 9000, is the same one we set up for the name node while installing Hadoop.
Make sure that the /etc/hosts file has an entry mapping 127.0.0.1 to localhost.
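As the original screenshot is not reproduced, a minimal hbase-site.xml sketch consistent with the description would be the following; the /hbase path suffix is an assumption.

```xml
<!-- Minimal hbase-site.xml sketch; the /hbase suffix is an assumed choice. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```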
Start Hbase
The hbase command is located in the bin folder under the directory where the HBase files were unpacked. If this bin directory is not in the system path, you need to either specify the full path or change to the bin folder before executing hbase.
Start HBase using the start-hbase.sh command.
Check for any errors in the log files located under the hbase-0.92.1/logs directory.
Start the hbase shell using hbase shell command.
Test hbase
Create a sample table, mytable, and put/get sample data to test the HBase installation.
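A sample session along those lines might look like the following; the column family name cf and the row contents are arbitrary assumptions.

```
create 'mytable', 'cf'
put 'mytable', 'row1', 'cf:greeting', 'hello'
get 'mytable', 'row1'
disable 'mytable'
drop 'mytable'
```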
Hive
Download and Unpack
Hive can be downloaded from the following Apache mirror location. Always use the stable version.
At the time of writing this document, the stable version was 0.8.1
http://apache.techartifact.com/mirror/hive/stable/
Download hive-0.8.1.tar.gz and copy it onto the Linux system where you want to install Hive.
Run the command >tar xfz hive-0.8.1.tar.gz
This creates the folder hive-0.8.1.
Setting Parameters
Define the HIVE_HOME environment variable and also include the Hive bin folder in the system path.
One way to do this is to define them in a Linux login script, for instance in the .bash_profile file.
This file can be edited using the vi editor with the command >vi .bash_profile
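For example, the following lines could be added to .bash_profile; the install path /opt/hive-0.8.1 is an assumption and should match wherever you unpacked Hive.

```shell
# Assumed install location; adjust to your actual unpack directory.
export HIVE_HOME=/opt/hive-0.8.1
export PATH=$PATH:$HIVE_HOME/bin
```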
Create the /tmp and /user/hive/warehouse directories in HDFS using the following commands:
>hadoop fs -mkdir /tmp
>hadoop fs -mkdir /user/hive/warehouse
>hadoop fs -chmod g+w /tmp
>hadoop fs -chmod g+w /user/hive/warehouse
Once created, these folders can be listed with the hadoop fs -ls command.
Start Hive
Start the Hive shell by executing the hive command, which is located in the hive-0.8.1/bin folder.
Test hive
Test hive by creating sample table through the shell.
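For instance, a simple table can be created and listed from the Hive shell; the table and column names here are arbitrary.

```sql
-- Arbitrary example table for a smoke test:
CREATE TABLE pokes (foo INT, bar STRING);
SHOW TABLES;
DESCRIBE pokes;
DROP TABLE pokes;
```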
Pig
Download and Unpack
Pig can be downloaded from apache site from following location. Always use stable release.
http://apache.techartifact.com/mirror/pig/stable/
At the time of writing this document, Pig stable release was 0.10.0
Download the file pig-0.10.0.tar.gz and copy it to the location on the Linux file system where you want to install Pig.
Run command >tar xfz pig-0.10.0.tar.gz
This creates folder pig-0.10.0
Start Pig
The Pig command line can be started by executing the pig command, which is located in the pig-0.10.0/bin folder.
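Running pig with no arguments opens its interactive (Grunt) shell. A short session might look like this; the file path and the one-line script are assumptions for illustration.

```
grunt> fs -ls /
grunt> A = LOAD '/test/data.txt' AS (line:chararray);
grunt> DUMP A;
```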