Learn to Set Up a Hadoop Multi-Node Cluster
TRANSCRIPT
www.edureka.co/hadoop-admin
What will you learn today?
Let us have a quick poll: do you know the following topics?
® Hadoop Components and Configurations
® Modes of a Hadoop Cluster
® Hadoop Multi-Node Cluster
® Setting up a Cluster (Hands-On)
Hadoop Components and Configurations
Hadoop 2.x Core Components
® HDFS (Storage)
  » Master: NameNode, SecondaryNameNode
  » Slave: DataNode
® YARN (Processing)
  » Master: ResourceManager
  » Slave: NodeManager
HDFS Components
® NameNode
  » Master of the system
  » Maintains and manages the blocks present on the DataNodes
® DataNodes
  » Slaves deployed on each machine; they provide the actual storage
  » Responsible for serving read and write requests from clients

MapReduce Components
® Client
  » Submits a MapReduce job
® ResourceManager
  » Cluster-level resource manager
  » Long-lived; runs on high-quality hardware
® NodeManager
  » One per DataNode
  » Monitors resources on its DataNode
Hadoop Cluster: A Typical Use Case
DataNodes (slave machines):
  RAM: 16 GB; hard disk: 6 x 2 TB; processor: Xeon with 2 cores; Ethernet: 3 x 10 Gb/s; OS: 64-bit CentOS

Active NameNode and Standby NameNode:
  RAM: 64 GB; hard disk: 1 TB; processor: Xeon with 8 cores; Ethernet: 3 x 10 Gb/s; OS: 64-bit CentOS; power: redundant power supply

Secondary NameNode:
  RAM: 32 GB; hard disk: 1 TB; processor: Xeon with 4 cores; Ethernet: 3 x 10 Gb/s; OS: 64-bit CentOS; power: redundant power supply
Hadoop 2.x Configuration Files
Configuration File   Description
hadoop-env.sh        Environment variables that are used in the scripts that run Hadoop.
core-site.xml        Configuration settings for Hadoop Core, such as I/O settings common to HDFS and MapReduce.
hdfs-site.xml        Configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes.
mapred-site.xml      Configuration settings for MapReduce applications.
yarn-site.xml        Configuration settings for the ResourceManager and NodeManager.
masters              A list of machines (one per line) that each run a Secondary NameNode.
slaves               A list of machines (one per line) that each run a DataNode and a NodeManager.
Hadoop 2.x Configuration Files – Apache Hadoop
Core       → core-site.xml
HDFS       → hdfs-site.xml
YARN       → yarn-site.xml
MapReduce  → mapred-site.xml
core-site.xml
-------------------------------------------------core-site.xml-----------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
-------------------------------------------------core-site.xml-----------------------------------------------------
fs.defaultFS: the name of the default file system. The URI's authority is used to determine the host, port, etc. for the filesystem.
hdfs-site.xml
---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------

dfs.replication: the default block replication factor.
dfs.permissions: if "true", enable permission checking in HDFS; if "false", permission checking is turned off.
dfs.namenode.name.dir: determines where on the local filesystem the DFS NameNode should store the name table (fsimage).
dfs.datanode.data.dir: determines where on the local filesystem a DFS DataNode should store its blocks.
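The two *.dir properties point at local directories that must exist and be writable by the Hadoop user before the NameNode is formatted. A minimal sketch, assuming the base path from the sample hdfs-site.xml above (parameterized here so you can relocate it):

```shell
# Base path is an assumption matching the sample hdfs-site.xml; override as needed.
HADOOP_DATA="${HADOOP_DATA:-$HOME/hadoop-2.2.0/hadoop2_data}"

# Create the NameNode and DataNode storage directories
mkdir -p "$HADOOP_DATA/hdfs/namenode" "$HADOOP_DATA/hdfs/datanode"
chmod 750 "$HADOOP_DATA/hdfs/namenode" "$HADOOP_DATA/hdfs/datanode"

# Once the directories exist, format the NameNode a single time:
#   hdfs namenode -format
```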
mapred-site.xml
-----------------------------------------------mapred-site.xml---------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
-----------------------------------------------mapred-site.xml---------------------------------------------------
mapreduce.framework.name: the runtime framework for executing MapReduce jobs. Can be one of local, classic, or yarn.
yarn-site.xml
-----------------------------------------------yarn-site.xml---------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
-----------------------------------------------yarn-site.xml---------------------------------------------------
yarn.nodemanager.aux-services: the auxiliary service name.
yarn.nodemanager.aux-services.mapreduce.shuffle.class: the auxiliary service class to use.
Per-Process RunTime Environment
® hadoop-env.sh sets the per-process runtime environment; in particular, it is where you set the JAVA_HOME parameter for the JVM.
® This file also offers a way to provide custom parameters for each of the servers.
® hadoop-env.sh is sourced by all of the Hadoop Core scripts; it lives in the etc/hadoop directory of the Hadoop installation (e.g. hadoop-2.2.0/etc/hadoop).
® Examples of environment variables that you can specify:
export HADOOP_HEAPSIZE="512"
export HADOOP_DATANODE_HEAPSIZE="128"
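For instance, a minimal hadoop-env.sh fragment might look like the following (the JDK path is a hypothetical example; point JAVA_HOME at the actual Java installation on your node):

```shell
# Hypothetical JDK location; adjust to where Java is installed on your node.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Give each Hadoop daemon a 512 MB heap by default
export HADOOP_HEAPSIZE="512"
```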
® NameNode status: http://localhost:50070/dfshealth.jsp
® ResourceManager status: http://localhost:8088/cluster
® MapReduce JobHistory Server status: http://localhost:19888/jobhistory
Master & Slave nodes for Hadoop Multi Node Cluster
Slaves and Masters
® The ‘masters’ file on the master node contains the hostname and IP address of the Secondary NameNode server.
® The ‘masters’ file on the slave node is blank.
® The ‘slaves’ file on the master node contains a list of hosts that run a DataNode and a NodeManager.
® The ‘slaves’ file on the slave node contains its own IP address.
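As a concrete sketch, with hypothetical hostnames master, slave1, and slave2 (substitute your own), the two files on the master node could be created like this:

```shell
# Run on the master node, inside the Hadoop configuration directory
# (e.g. hadoop-2.2.0/etc/hadoop). Hostnames below are placeholders.

# 'masters' lists the host that runs the Secondary NameNode (one per line)
echo "master" > masters

# 'slaves' lists the hosts that run a DataNode and a NodeManager (one per line)
printf "slave1\nslave2\n" > slaves

cat masters slaves
```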
Modes of a Hadoop Cluster
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:

® Standalone (or Local) Mode
  » No daemons; everything runs in a single JVM.
  » Suitable for running MapReduce programs during development.
  » Has no DFS.
® Pseudo-Distributed Mode
  » Hadoop daemons run on the local machine.
® Fully-Distributed Mode
  » Hadoop daemons run on a cluster of machines.
Terminal Commands
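The original slides showed terminal screenshots here. A sketch of the commands, assuming Hadoop 2.x with $HADOOP_HOME/bin and $HADOOP_HOME/sbin on the PATH (a command sequence, not a definitive procedure):

```shell
# Format the NameNode once, before the first start (this destroys existing HDFS metadata!)
hdfs namenode -format

# Start the HDFS daemons (NameNode, DataNodes, Secondary NameNode)
start-dfs.sh

# Start the YARN daemons (ResourceManager, NodeManagers)
start-yarn.sh

# List the Java daemons running on this node
jps

# Check that HDFS is up and see which DataNodes have reported in
hdfs dfsadmin -report
```

start-dfs.sh and start-yarn.sh read the slaves file and SSH into each listed host to start its DataNode and NodeManager, which is why passwordless SSH from the master to every slave is a prerequisite.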
Setting Up a Hadoop Multi-Node Cluster
Course Details
Edureka's Hadoop Administration course:
• The Hadoop Cluster Administration training course is designed to provide the knowledge and skills to become a successful Hadoop architect. It starts with the fundamental concepts of Apache Hadoop and Hadoop clusters, and covers how to deploy, configure, manage, monitor, and secure a Hadoop cluster.
• Online live classes: 24 hours
• Assignments: 30 hours
• Project: 20 hours
• Lifetime access + 24 x 7 support
Go to www.edureka.co/hadoop-admin
The batch starts on 7 November (weekend batch).
Hadoop Administration Course