8/10/2019 Administration of Hadoop Summer 2014 Lab Guide v3.1
1/107
Administration of Hadoop Lab Guide
Summer 2014
Cluster Admin on Hadoop
This Certified Training Services Partner Program Guide (the Program Guide) is protected under
U.S. and international copyright laws, and is the exclusive property of MapR Technologies,
Inc. © 2014 MapR Technologies, Inc. All rights reserved.
PROPRIETARY AND CONFIDENTIAL INFORMATION ii
2014 MapR Technologies, Inc. All Rights Reserved.
Contents
Administration of Hadoop Lab Guide
Get Started
  Get Started 1: Set up a lab environment in Amazon Web Services (AWS)
  Get Started 2: Set up passwordless ssh access between nodes
  Get Started 3: Log into the class cluster
  Get Started 4: Explore the MapR Control System
Lesson 1: Pre-install
  Lab 1.1: Pre-install validation downloads, setup and clustershell
  Lab 1.2: Network, Memory and IO
Lesson 2: Install MapR software
Lesson 3: Post-install
  Lab 3.1: Run RWSpeedTest
  Lab 3.2: TeraGen/TeraSort
Lesson 4: Configure Cluster Storage Resources
  Lab 4.1: Configure Node Topology
  Lab 4.2: Create Volumes and Set Quotas
Get Started
Get Started 1: Set up a lab environment in Amazon Web
Services (AWS)
This set up pr…
AWS pr…
d. Select the "mapr…
b. US West (Oregon)…
b. Select "All TCP"… review… state and status checks to c…
Select the "AWS Management Console"…
b. Select the "mapr…
$ passwd mapr
then type the password…
To restart the instances, repeat these steps, and select "Start" in step 5. Remember, you…
Get Started 2: Set up passwordless ssh access between nodes
When testing hardware…
Get Started 3: Log into the class cluster
1. Log in as ec2-user, passw…
Get Started 4: Explore the MapR Control System
The MapR Control System…
Lab Procedure
Log on and explore different views of the cluster
1. Connect to the MCS…
Step 5. In the navigation view…
9. CPU utilization…
Lesson 1: Pre-install
Lab Overview
In this lesson you will learn where to download a collection of tools and scripts that we will use to prepare the cluster hardware for the parallel execution of tests, and then test and measure the performance of the hardware components to determine that they are functioning properly and within the specifications for a Hadoop installation. We will also identify the current firmware for each of the new hardware components in the cluster, and update these components to make sure that they have matching firmware.
Lab 1.1: Pre-install validation downloads, setup and clustershell
Lab 1.2: Network, Memory and IO
Lab Procedures
Lab 1.1: Pre-install validation
Note: One of the most common causes for a failure when installing Hadoop is that the hardware
is not within the necessary specifications. You can see a list of the current hardware and OS
specifications at: http://doc.mapr.com/display/MapR/Preparing+Each+Node
The Professional Services team at MapR has developed a collection of all of the tools and scripts
that we will need to validate our hardware and prepare it for installation.
1. Download the cluster-validation package onto your master node from:
https://github.com/jbenninghoff/cluster-validation/archive/master.zip
Extract master.zip and move the pre-install and post-install folders directly under /root for simplicity.
2. Here we will find two directories, pre-install and post-install. We will use the tools and scripts inside the pre-install directory to validate our new hardware prior to installing Hadoop. We will use the tools and scripts in the post-install directory later, to test our new cluster after we have completed our install.
Note: The tools and files in this collection are updated frequently, so we should always make
sure we download the latest package when preparing for a new Hadoop installation.
3. To prepare the cluster for these validation tests, choose one node on the cluster to be your setup master node. Generate ssh keys on this node, and make sure that it has passwordless ssh access to all other nodes on the cluster. You can find steps for how to do this at the end of this guide.
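The key setup in step 3 can be sketched as follows. This is a minimal sketch, not the lab's exact procedure: the node names are placeholders, the key is written to a temporary directory so the sketch is safe to run anywhere (use ~/.ssh on the real setup node), and the ssh-copy-id commands are echoed rather than executed.

```shell
# Sketch: create a passphrase-less keypair, then distribute the public
# key to each node. node1..node3 are placeholder hostnames; on a real
# cluster, run the ssh-copy-id commands instead of echoing them.
KEYDIR=$(mktemp -d)                     # stand-in for ~/.ssh
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"
for node in node1 node2 node3; do
    echo ssh-copy-id -i "$KEYDIR/id_rsa.pub" "root@$node"
done
```

Once the public key is on every node, clush can reach them all without password prompts.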
4. Inside the pre-install directory is a clustershell rpm. Install this rpm on the master node that has passwordless ssh access to the rest of the cluster. We will issue all further commands for this exercise from this master node, using clush to propagate those commands to the rest of our hardware.
5. Once clustershell is installed, update the file /etc/clustershell/groups to include an entry for all, followed by the host names of the nodes we will use, such as:
all: node[0-19]
6. Once we have our node names listed, type the following to copy the /root/pre-install directory to all of our nodes:
# clush -a --copy /root/pre-install
7. When that is complete, type the following to confirm that all of the nodes have a copy of the package:
# clush -Ba ls /root/pre-install
8. After we have a copy of the pre-install package on all nodes, we are ready to start our hardware validation tests. First, we will run an audit of our hardware to see exactly what we have on each node, and to verify that they all have a similar configuration. To run the cluster-audit.sh script, type:
/root/pre-install/cluster-audit.sh | tee cluster-audit.log
This will list hardware specifications from each of the new nodes.
We can examine the output log to look for hardware or software that does not match the
requirements to install Hadoop, or discrepancies in the hardware or software from one node to
the next.
Note: the audit output will give us deltas when looking at things like the RAM. It will tell us the total amount of RAM, the number of slots, and the types of DIMMs found, but it will not tell us which exact DIMMs are in which slots. Also, if only one DIMM type is listed, then all slots have the same DIMM type.
Lab 1.2: Network, Memory and IO
1. Evaluate the network interconnect bandwidth.
Inside the pre-install directory, update the network-test.sh file so that the half1 and half2 arrays contain the correct IP addresses for our hardware nodes. Next, delete the exit command and save the file.
2. When the file has been updated, type:
# /root/pre-install/network-test.sh | tee network-test.log
This runs an RPC test to validate our network bandwidth. The test should take about two minutes to run, maybe a little longer.
We should expect to see results of about 90% of our peak bandwidth. Thus, with a
1GbE network, we should expect to see results of about 115MB/sec, or with a 10GbE
network, look for results around 1100MB/sec. If we are not seeing results in this range,
then we need to check with our network administrators to verify the connections and
firmware.
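The expected figures above come from simple wire-rate arithmetic; this sketch just reproduces that calculation (the 90% efficiency factor is the rule of thumb stated above):

```shell
# 1 Gb/s = 125 MB/s of raw payload; applying ~90% efficiency gives the
# targets quoted above (~112-115 MB/sec for 1GbE, ~1125 MB/sec for 10GbE).
for gbps in 1 10; do
    awk -v g="$gbps" 'BEGIN { printf "%dGbE target: ~%.0f MB/sec\n", g, g * 125 * 0.9 }'
done
```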
3. Next, we will evaluate the raw memory performance. Type the following to run the stream59 utility:
# clush -Ba '/root/pre-install/memory-test.sh | grep Triad' | tee memory-test.log
This tests the memory performance of the cluster. The exact bandwidth of memory is
highly variable and is dependent on the speed of the DIMMs, the number of memory
channels and to a lesser degree, the CPU frequency.
4. Evaluate the raw disk performance. The disk-test.sh script will run IOzone on our hard drives to test their performance.
Note: This process is destructive to any existing data, so make sure the drives do not have any
needed data on them, and that you do not run this test after you have installed MapR Hadoop
on the cluster.
Type:
# clush -ab /root/pre-install/disk-test.sh
When you first run this script, it will list out the spindles to be tested. We need to verify
that this list is correct, and then edit the script to run the test.
The comments in the script will direct us to the edits that we need to make. When we are done,
we save the file and run the script again to perform the test.
If we have a large number of total drives, the summIOzone.sh script will provide us with a
summary of the disk-test.sh output.
We will keep the results of this test with the other benchmark tests for post installation
comparison.
Conclusion
Now that we have run all of our hardware tests, and compiled benchmarks for all of our
components, we have one final task to prepare our new hardware for installation.
The firmware for the new hardware must be up to date with vendor specifications and match
across each of the nodes of the same type. The BIOS versions and settings must also match for
similar nodes. In addition, the firmware for the management interfaces needs to be the same
on each of these nodes. Any other hardware components that we may have in our system, such
as NICs or onboard RAID controllers also need to have updated and matching firmware.
We will need to refer to the manual for each node vendor that we are including, and update the
firmware and BIOS according to their specifications. If there is a discrepancy in our BIOS or
firmware between nodes from the same vendor, then we can see inconsistent performance
across nodes.
Lesson 2: Install MapR software
Lab Overview
In this exercise you will install a MapR cluster. It is also important to consider how many instances of each service will be running on the entire cluster, to ensure that you have a robust cluster.
1. Log into the master node of your cluster as described above, or as described by your instructor.
2. Navigate to the /home/mapr directory:
$ cd /home/mapr
(& F-/"0-17 $%( 613*G'($.3 31+@1;(>$ wget http://package.mapr.com/releases/v.3.1.1//mapr-setup
)& F-/"0-17 $%( 3(6 @(, $- $%( 61'$(* "-7( &" ,-. +0.'$(*
5. Set the permissions on the mapr-setup file and pem key:
$ chmod 755 mapr-setup
$ chmod 600
6. Run the mapr-setup script. Note: this script will create the /opt/mapr-installer directory and additional subdirectories.
$ sudo ./mapr-setup
===============================================
Self Extracting Installer for MapR Installation
===============================================
Extracting installer.......
Copying setup files to "/opt/mapr-installer"......
Installed to "/opt/mapr-installer"
====================================
Run "/opt/mapr-installer/bin/install" as super user, to
begin install process
[root@ip-10-170-125-38 ec2-user]#
7. Copy the students07172012.pem key to the /opt/mapr-installer/bin directory:
$ mv /opt/mapr-installer/bin
8. If you are using a config file with the installer, edit the config.example file to specify the control node and data node information.
$ vi config.example
This information can also be entered when running the installer, if you are not using a config file.
Additional information can be specified in the config file as well, including:
o Disks used. Note: for Amazon, the disks are the following: /dev/xvdf, /dev/xvdg
o MySQL database information (see your instructor for the IP address)
o Repositories (can be local)
o Version
o Security
o M7
o Clustername
o Etc.
# Each Node section can specify nodes in the following format
# Node: disk1, disk2, disk3
# Specifying disks is optional. In which case the default disk information
# from the Default section will be picked up
[Control_Nodes]
ip-10-171-58-175: /dev/xvdf, /dev/xvdg
ip-10-171-35-199: /dev/xvdf, /dev/xvdg
ip-10-174-18-198: /dev/xvdf, /dev/xvdg
[Data_Nodes]
ip-10-171-23-229: /dev/xvdf, /dev/xvdg
ip-10-170-118-127: /dev/xvdf, /dev/xvdg
ip-10-174-23-41: /dev/xvdf, /dev/xvdg
[Client_Nodes]
#C1
#C2
[Options]
MapReduce = true
YARN = false

mapr-install.py [-h] [-s] [-U SUDO_USER] [-u REMOTE_USER]
[--private-key PRIVATE_KEY_FILE] [-k] [-K]
[--skip-checks] [--quiet] [--cfg CFG_LOCATION]
[--debug] [--password REMOTE_PASS]
[--sudo-password SUDO_PASS]
{new,add} ...

positional arguments:
{new,add}
new                    Start new installation
add                    Add to an existing installation

optional arguments:
--cfg CFG_LOCATION     config file to use
--debug                run installer in debug mode
--password REMOTE_PASS
                       remote ssh user password
--private-key PRIVATE_KEY_FILE
                       use this file to authenticate the connection
--quiet                run installer in non-interactive mode
--skip-checks          skip pre-checks (DANGEROUS)
--sudo-password SUDO_PASS
                       sudo user password
-K, --ask-sudo-pass    ask for sudo password
-U SUDO_USER, --sudo-user SUDO_USER
                       desired sudo user (default=root)
-h, --help             show this help message and exit
-k, --ask-pass         ask for SSH password
-s, --sudo             run operations with sudo (nopasswd)
-u REMOTE_USER, --user REMOTE_USER
9. A. If you are not using a config file, run the installer:
$ sudo /opt/mapr-installer/bin/install -K -s --private-key -u ec2-user -U root --debug new
and fill in the cluster details when prompted, as listed above.
B. If you are using a config file, run the installer to determine whether the parameters you have specified are correct.
$ sudo /opt/mapr-installer/bin/install -K -s --cfg config.example --private-key -u ec2-user -U root --debug new
10. In the summary response area, choose (a)bort after examining your parameters.
11. Rerun the installer with the --quiet argument for non-interactive mode, and a trailing & to background the installer in case the window is lost or the laptop goes into hibernate mode. This time, select (c) to continue with the install after reviewing the parameters.
A. $ sudo /opt/mapr-installer/bin/install -K -s --private-key -u ec2-user -U root --debug --quiet new &
OR
B. $ sudo /opt/mapr-installer/bin/install --cfg config.example --private-key students07172012.pem -u ec2-user -s -U root --debug --quiet new &
Note: View details about installing on an OS other than Red Hat at: www.mapr.com/doc/display/MapR/Installing+MapR+Software
The administrative user who should be given full permission is "mapr" and the user password is "mapr".
When registering your cluster, select an M7 Trial license. Also, be sure to apply your M7 license before you close the License Management dialog.
12. Watch the installation process and look for the various packages being installed. After the control nodes have been installed (usually 20-30 min), log into the MCS by pointing your browser to the IP address of one of the control nodes, at port 8443:
http://ControlNodeIP:8443/
13. Accept the MapR agreement, and select the licenses link in the upper right corner.
14. Apply the temporary M7 license received when registering for the course. If you do not have a temporary license, contact training@mapr.com or ask your instructor if you are taking a classroom or virtual training class.
15. After you have successfully applied a trial license, you may notice that some of the nodes in the cluster have orange icons in the heatmap, indicating that they have degraded service.
16. As the installer continues to install packages, and the warden service starts the services on each node, we will begin to see the nodes turn green. Eventually all of the nodes will be green, indicating that all nodes are active and healthy.
Conclusion
Plan your service layout prior to installing the MapR software:
o Make sure that you have identified where the key management services (CLDB, Zookeeper, JobTracker, Webserver) will be running in the cluster
o Ensure that you have enough instances of the management services to maintain the level of service that is appropriate for your organization
Follow the procedures outlined in the MapR documentation under the Installation Guide:
o http://mapr.com/doc/display/MapR/Installation+Guide
Use the MCS to verify that the cluster installation is complete and that the cluster is now active.
Discussion
1. Once you see that the cluster is active, try exploring the MCS by clicking on the different links in the Navigation pane and on the Dashboard. What will you be able to monitor once you begin to use your cluster?
2. What would your next step be after installing the cluster?
Lesson 3: Post-install
Lab Overview
If you remember, the package that we downloaded in our pre-install lesson contained a post-
install directory. That directory contains all of the tools and scripts that we need to run post
install benchmarks to make sure our new cluster is performing as expected.
First, we will test the drive throughput. As with our pre-install tests, we will use clush to push
this test to all of the nodes on our cluster.
Lab Procedures
3.1 Run RWSpeedTest
1. Log into the master node that we used for our pre-install tests and navigate to the directory /root/post-install. In here we will find the file runRWSpeedTest.sh.
2. Note: This script uses an HDFS API to stress test the I/O subsystem. The output provides an estimate of the maximum throughput the I/O subsystem can deliver. To begin the test, type:
# clush -Ba /root/post-install/runRWSpeedTest.sh | tee
RWSpeedTest.log
3. After we run RWSpeedTest, we can compare our results to our pre-install IOzone tests. We should expect to see similar results, within 10-15% of the pre-installation test.
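As a worked example of that 10-15% band (the 500 MB/s figure below is hypothetical, not from the lab):

```shell
# If IOzone measured 500 MB/s pre-install, RWSpeedTest results down to
# about 15% below that are still within expectations.
pre=500
low=$(( pre * 85 / 100 ))
echo "acceptable RWSpeedTest range: ${low}-${pre} MB/s"   # 425-500 MB/s
```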
3.2 TeraGen/TeraSort
TeraGen is a map/reduce program that will generate 1 GB of synthetic data, and TeraSort samples this data and uses map/reduce to sort it into a total order. These two tests together will challenge the upper limits of our cluster's performance.
1. Type:
# maprcli volume create -name data1 -replication 1 -mount 1 -path /root/data1
# mkdir data1/out1
# mkdir data1/out2
2. Verify that the new directories exist, then type:
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar teragen 10000000 /data1/out1
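The 10000000 argument is the row count: teragen writes 100-byte records, so the count for a target size is just bytes / 100. A quick check of the arithmetic:

```shell
# rows = target bytes / 100-byte records; 1 GB works out to the
# 10,000,000 rows used in the teragen command above.
target_bytes=$(( 1 * 1000 * 1000 * 1000 ))
rows=$(( target_bytes / 100 ))
echo "teragen rows for 1 GB: $rows"
```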
3. This will create 1 GB of synthetic data. Once teragen has finished, type the following to sort the newly created data:
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar terasort /data1/out1 /data1/out2
When we are running Terasort, we can use the MCS to watch the node usage. When we set the
heatmap to show Disk Usage, we can see the load on each node. We are looking for the load
to be spread evenly across our cluster. Hotspots suggest a problem with a hard drive or its
controller. We can change the view of our heatmap to look at the load of different resources of
our cluster as we run our tests.
In addition to the heatmap views, we can look at the services and jobs. Since we are using
synthetic code, we know that it functions properly. If we have a job or task failure, then we
have an issue with our hardware.
When Terasort is finished, we can compare the results with our RWSpeedTest results. We should expect our Terasort throughput to be between 50% and 70% of our RWSpeedTest throughput. Since we know the Terasort job code does not have any errors, if we see performance that doesn't match our expectations, we know we have a problem with the hardware in our cluster.
Lesson 4: Configure Cluster Storage
Resources
Lab Overview
The labs in this chapter cover all the basics of cluster storage resources, including:
Topology and Storage Architecture:
o the physical layer, including nodes, disks & storage pools
o the logical layer, including files, chunks, containers
Volumes, including with mirrors, snapshots and remote mirrors
These labs provide insight into how data is managed in a MapR cluster, and give hands-on experience configuring topologies, volumes and quotas. You have a great degree of control over your organization's MapR storage resources. Configuring a cluster with appropriate topologies and volumes has long-term impacts on performance, reliability and ease of management. This lab is broken into three separate exercises that build on each other.
Lab Procedures
Always set up node topology before deploying the cluster. Never leave nodes in /data/default-rack.
Key Tips:
Create volumes to contain different types of data on the cluster before deploying the cluster. (E.g., create one volume per user, one volume per project, distinct volumes for production work and development work, etc.) Don't let data accumulate at the root level of the cluster.
MapR separates the concepts of volume ownership and quota accounting. Project
members can have full ownership of files and folders for a project, while the collective
storage for the whole project is restricted by a quota independent of individual users.
Rack Layout
In this training lab environment, our physical rack layout is hypothetical. If you were configuring node topology in a physical cluster environment, you would coordinate with the team responsible for the physical setup of the cluster to build a diagram of the physical rack layout. For this lab, let's assume our cluster's nodes are contained in two racks.
Note: If applicable, you may need to coordinate your activities on your Team# cluster with the
other members of your team.
Lab 4.1: Configure Node Topology
The first step in getting a cluster ready for data storage is to set up the node topology. Node topology describes the logical organization of the cluster. Grouping nodes into proximity-based topologies, i.e. racks, helps to distribute data across physical failure domains, thus decreasing the probability of data loss. It is also important to define higher-level logical topologies, typically named /data and /decommissioned, which serve as staging areas for nodes when transitioning into and out of service.
/
  data/
    rack1/
      r1_node1
      r1_node2
      r1_nodeN
    rack2/
      r2_node1
      r2_node2
      r2_nodeN
  decommissioned/
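A hypothetical sketch of placing nodes into the rack topologies above from the CLI. The server IDs are placeholders (list the real ones with `maprcli node list`), and the commands are echoed rather than executed so the sketch is inert:

```shell
# Assumed form of the MapR CLI for moving nodes between topologies.
# Replace the placeholder IDs with values from: maprcli node list
for id in 1111111111111111111 2222222222222222222; do
    echo "maprcli node move -serverids $id -topology /data/rack1"
done
```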
5. Set the default physical topology using the CLI. You can change the default topology, such that any new node added to the cluster will appear in the specified topology. In this step, you are going to change the default topology to /data.
a. Open an SSH session with a node in the cluster.
b. Type the following command at a command line.
maprcli config load json | grep default
c. Notice the default topology.
d. To change it you would do the following:
maprcli config save -values
'{"cldb.default.volume.topology":"/data"}'
6. Verify that all nodes are assigned to a physical topology.
a. In the MCS Navigation pane under the Cluster group, click Nodes.
b. Look at the Topology pane and confirm that each node in the cluster appears in a specific rack, and that no nodes remain under /default-rack.
Lab 4.2: Create Volumes and Set Quotas
In this lab exercise you will learn how to manage a MapR cluster in a shared environment. Imagine that your cluster is going to be shared by up to 5 different groups, each with multiple users working on development and production projects. You need to manage the resources of the cluster so all of these groups can work simultaneously without consuming more than their share of storage and compute resources. You also need to make sure that development projects do not impinge upon production work.
In this exercise you will create independent volumes for each user and project, and then you will
impose quotas on those volumes.
Important!
Don't store data in the root volume (/).
If all data is in the root volume, you lose the ability to specify location, quota, or HA properties
for different types of data.
As soon as you set up your cluster, start creating volumes to organize data on the cluster. As this
lab will demonstrate, MapR recommends that you create at least the following volumes:
1. Create a separate volume for each user.
2. For active projects, create separate volumes for development work and production
activity.
Note: In order for a MapR cluster to function correctly, the user accounts and groups must be
set up identically across all nodes.
Lab 4.2 Overview
The diagram below illustrates the key concepts of this exercise. In this case user01 and user02 are in the Log Analysis Development group (loganalysis_dev). Each of these users has permission to read and write data to the project volume as well as their own user volume. The cumulative storage used by these volumes rolls up to a group referred to as an Accounting Entity. Each user, volume and Accounting Entity can have a separate disk quota, for flexible management of cluster disk usage.
Lab 4.2 Set-up
1. Set up the users and groups on all your cluster nodes. Note: they must all have the
same UID and GID on every node in the cluster. This is an opportunity to use the clush
utility if you wish.
# yum install clustershell
For example, run groupadd on every node in your cluster.
# groupadd -g 5000 loganalysis_dev
Add individual users on every node.
# useradd -u 5001 -g loganalysis_dev user17
Or
# clush -a groupadd -g 8000
# clush -a useradd -u 8001 -g 8000
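Since every group must exist with the same GID on every node, the per-group commands can be generated in a loop. The group names come from the table in this lab; the sequential GID numbering from 5000 is an assumption, extending the groupadd example above. The clush commands are echoed here rather than executed:

```shell
# Generate one clush groupadd command per dev/prod group. GIDs are
# assigned sequentially from 5000 (a hypothetical numbering scheme).
gid=5000
for grp in webcrawl_dev webcrawl_prod frauddetect_dev frauddetect_prod \
           loganalysis_dev loganalysis_prod; do
    echo "clush -a groupadd -g $gid $grp"
    gid=$(( gid + 1 ))
done
```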
2. Add the user to the MCS permissions popup.
Username/loginID      Groupname               Teamname/Clustername
user01 webcrawl_dev Team1
user02 webcrawl_dev Team1
user03 webcrawl_prod Team1
user04 webcrawl_prod Team1
user05 frauddetect_dev Team2
user06 frauddetect_dev Team2
user07 frauddetect_prod Team2
user08 frauddetect_prod Team2
user09 recommendations_dev Team3
user10 recommendations_dev Team3
user11 recommendations_prod Team3
user12 recommendations_prod Team3
user13 twittersentiment_dev Team4
user14 twittersentiment_dev Team4
user15 twittersentiment_prod Team4
user16 twittersentiment_prod Team4
user17 loganalysis_dev Team5
user18 loganalysis_dev Team5
user19 loganalysis_prod Team5
user20 loganalysis_prod Team5
Lab 4.2 Steps
Examine the volumes already on the cluster
1. Connect to the MCS for your cluster
2. Click Volumes under MapR-FS in the left navigation pane:
Notice how many volumes are listed. Do these include system volumes? Hint: notice whether or not the System check box is selected on the upper menu.
Display only the non-system volumes by de-selecting the System check box on
the upper menu.
Locate the New Volume button that lets you create a new volume.
What other volume actions are allowed in the Volume Actions menu?
Examine volume properties from the volumes list
1. From the list of volumes, choose a volume to examine.
Look across the columns to find whether the volume of interest contains data and, if
so, what the data size is.
What is the replication factor listed for the volume you are examining?
2. Find more details for this volume on the Volume Properties pane. Hint: Open the pane
by clicking the highlighted name of the volume.
What is the minimum replication factor for this volume?
Does the volume have a quota?
Practice Creating and Removing Volumes
3. Click the New Volume button.
4. Select Standard Volume for the Volume Type in the new pop-up window.
5. Enter a volume name using your name (or some other unique name) and designate
volume number 1 (e.g. name-vol1 where name is your name) in the Volume Name
field.
6. Type the mount path /name-vol1 in the Mount Path field.
Note: The MapR MCS will not create any parent directories above the mount point, so make them
beforehand if necessary with the mkdir command.
7. Verify /data is displayed in the Topology field (This is the default topology; we will
discuss topology in the next lecture).
8. Verify the default replication factor and minimum replication settings. Are they set to
what was recommended in the Volumes lecture?
9. At the bottom of the popup window, click OK to create the volume.
10.Verify that your new volume appears in the volumes list. Do you see the volumes
created by the other students in the class?
(Note: If not, you will need to go to the volume name filter at the top and remove the
filter by clicking the minus sign.)
Repeat the process above to create a user volume 2 for your name.
Verify that your new volume appears in the volumes list.
Once again remove the filter so that you can view the full list of non-system
volumes.
Remove one volume
1. Decide which of your own volumes you want to remove and select it by clicking the
check box by the volume name.
2. Select Remove on the Modify Volume menu. You will see this dialog box:
Make your choice for what style of removal you want and click the Remove Volume
button on lower right.
Verify in the volumes list that one of your volumes has disappeared.
Create a volume for each user
In this step, you will create a home volume for all project members, if applicable. On each user
volume:
Restrict the volume to the /data/rack2 topology, which prevents users from consuming
storage resources on /data/rack1.
Assign the Accounting Entity of the user volume to the appropriate group for that user.
Assigning this Accounting Entity prevents the members of the group from collectively
overshooting a storage quota for the project.
Set quotas for the user volume.
Note: user17 and loganalysis_dev are used as examples below. Be sure to substitute the
appropriate user name and group when you create the volumes for your team members.
1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.
2. In the Volumes tab click the New Volume button.
3. Following the example below, enter the volume settings for each user volume in the
New Standard Volume dialog box.
Volume Setup section
Volume Type: Standard Volume
Volume Name: user17-homedir
Mount Path: /mapr/<cluster name>/home/user17/vol
Topology: /data/rack2
User/Group (This specifies the Accounting Entity)
Group: loganalysis_dev
Note: the group must exist on all nodes in the cluster
Permissions: u:user17 fc
Usage Tracking
Quotas (This specifies the disk quota for the volume itself)
Volume Advisory Quota: 100G
Volume Hard Quota: 128G
4. Click OK.
Command Line
It is also possible to create a new volume at the command line. For example:
maprcli volume create -path /home/user17/vol \
-ae loganalysis_dev -aetype 1 -topology /data/rack2 \
-quota 128G -advisoryquota 100G \
-user user17:fc -name user17-homedir
Note: The maprcli volume create command requires specific ordering of
arguments. Make sure that the -name option comes last.
You can change quotas later at the command line. For example:
maprcli volume modify -quota 20G -advisoryquota 15G \
-name user17-homedir
5. Change ownership of the volume for the user. At a command line type:
chown user17 /mapr/<cluster name>/home/user17/
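Creating one home volume at a time gets tedious with twenty users. As a dry-run sketch (our own, not a lab requirement: it only prints the maprcli commands; remove the echo to execute them on a cluster node), the per-user commands can be generated in a loop:

```shell
#!/usr/bin/env bash
# Dry run: print a maprcli volume-create command for each member of one
# group, following the user17 example above. Remove "echo" to execute.
group=loganalysis_dev
for user in user17 user18; do
    echo maprcli volume create -path "/home/${user}/vol" \
        -ae "$group" -aetype 1 -topology /data/rack2 \
        -quota 128G -advisoryquota 100G \
        -user "${user}:fc" -name "${user}-homedir"
done
```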
Create a volume for your team project
In this step, you will create a volume for your team project, if applicable. Bear in mind the
following criteria for your project volume:
Restrict development volumes to the /data/rack2 topology, which prevents development
projects from consuming storage resources on /data/rack1.
Production volumes should be allowed to span the entire cluster, so they will have a
topology of /data
Set group permissions on each volume:
For development volumes, members of both prod and dev groups get full control
For production volumes, only members of the prod group get full control
Assign your group as the Accounting Entity
Set quotas for the project volume:
Development volumes: the Advisory Quota is 9T and the Hard Quota is 10T
Production volumes: the Advisory Quota is 19T and the Hard Quota is 20T
Note: loganalysis_dev is used in the examples below. Be sure to substitute the appropriate user
name and group when you create the volumes for your project.
1. Create the top-level project directory under /mapr/<cluster name>/home/, if it
doesn't exist. For example, at a command line type:
mkdir /mapr/<cluster name>/home/<group name>/
2. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.
3. Create the project volume. In the Volumes tab click the New Volume button.
4. Following the example below, enter the volume settings for the project volume in the
New Standard Volume dialog box.
Volume Setup section
Volume Type: Standard Volume
Volume Name: loganalysis-dev
Mount Path: /mapr/<cluster name>/home/loganalysis_dev/vol
Note: the example below is for a development group volume. If you are creating a
volume for a production group then the topology would be /data
Topology: /data/rack2
Permissions section
Note: the example below is for a development group volume. If you are creating a
volume for a production group, do not add permissions for the development group.
g:loganalysis_dev fc
g:loganalysis_prod fc
Usage Tracking
User/Group (This specifies the Accounting Entity)
Group loganalysis_dev
Quotas (This specifies disk quota for the volume itself)
Note: the examples below are for a development group volume. If you are creating a
volume for a production group the Advisory Quota is 19T and the Hard Quota is 20T.
Volume Advisory Quota: 9T
Volume Hard Quota: 10T
5. Click OK.
6. Change ownership and permissions of the project volume. At a command line type:
chgrp loganalysis_dev /mapr/<cluster name>/home/loganalysis_dev/vol
chmod g+rwx /mapr/<cluster name>/home/loganalysis_dev/vol
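If you want to see what g+rwx grants before touching the cluster volume, you can experiment on a throwaway local directory. This is our own sketch, run against a temp directory rather than cluster data, and it assumes the GNU form of stat (Linux):

```shell
#!/usr/bin/env bash
# Illustrate the group permission bits on a scratch directory.
# mktemp -d creates the directory with mode 700, so the group bits
# start empty; g+rwx then grants the group full access.
d=$(mktemp -d)
chmod g+rwx "$d"
stat -c %A "$d"   # drwxrwx---
```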
Verify that the volumes are set up correctly
1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes. The
Volumes view appears, listing all volumes in the cluster.
2. Confirm that all of the volumes you created are listed in the Volumes view. Other
volumes that are part of the default cluster configuration may also appear here. You can
use the Filter option to list, for example, only the volumes with a mount path matching
/home*, as shown below.
3. Navigate the volumes at the command line and verify that they have been mounted. For
example:
ls -al /mapr/<cluster name>/home/
ls -al /mapr/<cluster name>/home/loganalysis_dev/vol
You should see the volumes you just created in the previous steps mounted in these
locations.
Set disk usage quotas for your project Accounting Entity
By setting a quota on an Accounting Entity, we can make sure that all volumes assigned to the
Accounting Entity (including user volumes and project volumes) do not collectively overshoot a
project maximum.
1. In the MCS, in the Navigation pane under the MapR-FS group, click User Disk Usage.
The User Disk Usage panel displays all users and groups that have been assigned as an
Accounting Entity (e.g. loganalysis_dev).
2. Click on your project Accounting Entity. The Group Properties dialog box appears.
3. Following the example below, enter the quota settings for your project Accounting
Entity in the Usage Tracking section of the Group Properties dialog box.
For development projects:
Turn on User/Group Advisory Quota. Enter 9T
Turn on User/Group Hard Quota. Enter 10T
For production projects:
Turn on User/Group Advisory Quota. Enter 19T
Turn on User/Group Hard Quota. Enter 20T
Command Line
It is also possible to set the Accounting Entity quotas at the command line. For example:
maprcli entity modify -quota 10T -advisoryquota 9T \
-name loganalysis_dev -type 1
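Quota values like 100G and 10T are shorthand for binary multiples. A quick sanity-check helper (our own sketch, assuming binary K/M/G/T suffixes; maprcli itself parses these values) makes the magnitudes concrete:

```shell
#!/usr/bin/env bash
# Expand a quota string like "10T" or "128G" into bytes, assuming
# binary multiples (K/M/G/T = 2^10, 2^20, 2^30, 2^40).
quota_bytes() {
    local n=${1%[KMGT]} suffix=${1##*[0-9]}
    case "$suffix" in
        K) echo $(( n * 1024 ));;
        M) echo $(( n * 1024 ** 2 ));;
        G) echo $(( n * 1024 ** 3 ));;
        T) echo $(( n * 1024 ** 4 ));;
        *) echo "$n";;   # no suffix: already bytes
    esac
}

quota_bytes 128G   # 137438953472
```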
Conclusion
Before you begin adding data to your cluster or submitting jobs, make a decision about topology
(node/data placement) and implement this decision on your cluster.
Create volumes early and often. It is much easier to manage cluster data at a volume level than
to manage all of the data on the cluster as one enormous data set. Imagine trying to manage
petabytes of data!
Creating separate volumes provides flexibility of resource management by separating ownership
from accounting.
Do not use the / or /data/default-rack topology for data placement.
Lesson 5: Data Ingestion, Access &
Availability
Labs Overview
Lesson 5 labs cover the following topics:
Accessing the cluster using NFS
Snapshots
Mirrors
Multiple Clusters and Disaster Recovery
Lab 5.1: Get Data into the Cluster with NFS
Topics and tasks in this first lab will help you to
understand the significance of NFS in MapR
learn how to get data into a cluster using NFS
view and manipulate data directly on your cluster using standard Linux file commands via
NFS
Before you begin the lab steps, the cluster filesystem must be mounted on the data instance.
Create Input Directory for Data
Copy Data from Data Instance to Input Directory on Cluster
1. SSH to the data instance (the NFS node for this exercise, listed in your hosts file) and create the mount point:
mkdir /mapr
2. Mount your cluster on the NFS client node:
# mount -t nfs <NFS node>:/mapr /mapr
3. Copy the data from the /etc directory on the data instance to the input directory on
your project volume that you created in the previous step:
cp -v /etc/*.conf /mapr/<cluster name>/home/loganalysis_dev/input
4. Verify that the data is now in the input directory on your cluster volume
ls /mapr/<cluster name>/home/loganalysis_dev/input
You should see a collection of files that end in .conf
Run a MapReduce Job on Data
1. Run a MapReduce job on the data
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \
wordcount /home/loganalysis_dev/input /home/loganalysis_dev/output1
2. View the output of the MapReduce job
ls /mapr/<cluster name>/home/loganalysis_dev/output1
Modify Data and Run MapReduce again
1. From the /mapr/<cluster name>/home/loganalysis_dev/input directory, use sed to add some files to your input data directory:
for i in `ls *`; do cp $i `echo $i | sed "s/.conf/AA.conf/g"`; done
2. Re-run the same MapReduce job on the data sending the output to a new directory
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \
wordcount /home/loganalysis_dev/input /home/loganalysis_dev/output2
Compare Results from Both MapReduce Jobs
1. Compare the output from the MapReduce jobs
diff /mapr/my.cluster.com/home/loganalysis_dev/output1/part-r-00000 \
/mapr/my.cluster.com/home/loganalysis_dev/output2/part-r-00000
You should see the change you made in the previous step
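The copy-and-rename loop from step 1 above can be rehearsed safely in a scratch directory on any Linux machine before running it against cluster data over NFS. This sketch uses a temp directory and invented file names:

```shell
#!/usr/bin/env bash
# Rehearse the rename loop from step 1 in a throwaway directory.
set -e
dir=$(mktemp -d)
cd "$dir"
touch host.conf resolv.conf

# Same pattern as the lab step: copy each file, inserting "AA" before
# the .conf extension (the dot is escaped here for a stricter match).
for i in *.conf; do
    cp "$i" "$(echo "$i" | sed 's/\.conf/AA.conf/')"
done
```

Afterwards the directory holds host.conf, hostAA.conf, resolv.conf, and resolvAA.conf.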
Conclusion
In this lab you experienced copying data from an external data source to the cluster storage via
NFS. You were able to do so with standard Linux file commands that are familiar to system
administrators. This process would have been much more technically challenging and taken a
significantly longer time to perform without NFS.
Lab 5.2: Snapshots
Explore how snapshots work by creating snapshots, at various points in time, of a volume
containing changing data, to see that each snapshot shows data from a fixed point in time. Also
see that snapshot creation is almost instantaneous and that a snapshot can preserve data that
has since been deleted or changed. Learn to apply a schedule so that snapshots are
automatically created at fixed intervals. Schedules also allow snapshots to expire at a time you
designate. This lab has 4 exercises:
Create a snapshot in two ways
Show how snapshots capture frozen views of past state
Show snapshots preserve deleted data
Create a snapshot schedule from the MCS
Create snapshot in two ways
This exercise will create a snapshot of a volume at a particular point in time, using two different
methods for making a snapshot.
Preparation: Before starting this exercise, you should have created a volume for your
experiments and mounted it. If you haven't already created such a volume, do so now using the
MCS. Make sure that your volume is different from the volumes other students are using for
this exercise, to minimize confusion about who is doing what.
Note: The diff and vi commands you used above are standard Linux commands. Because the cluster
filesystem is mounted via NFS, any standard Linux programs that operate on text files (sed,
awk, grep, etc.) can be used with data on your cluster. This would not be possible without
NFS; you would need to copy the file out of the cluster before performing your task, and then copy the resultant file back into the cluster.
Put some sample data into your volume
1. Use ssh to log in to a node in your cluster. Use your own user id here.
$ ssh mapr@classnode-cluster
2. Change directory to your personal volume.
$ cd /mapr/<cluster name>/snapshot_lab_mnt_user01
3. Create a data file called STATIC in your personal user-volume containing whatever
data you choose.
$ cat /etc/hosts > STATIC
Create a volume snapshot of your volume using MCS
Use the MCS to create a snapshot, as shown here:
Select New Snapshot from the pull down menu under Modify Volume on top bar, provide a
name for your snapshot, and click OK to create a snapshot of the selected volume, in this case,
snapshot_lab_vol_user01. This will create a snapshot of the volume you have selected.
Cluster Admin on Hadoop
Create new data files in your volume by running a shell script
Run the following commands:
$ cd /mapr/<cluster name>/snapshot_lab_mnt_user01
$ while true; do
touch file-$(date +%T)
date >> log; sleep 13
done &
This creates a new file every 13 seconds while the script runs in the background. The file name of
each file will contain the time the file was created. The last command also logs the time each file is created. The log file will look something like this:
Thu Dec 13 17:15:44 PST 2012
Thu Dec 13 17:15:57 PST 2012
Thu Dec 13 17:16:10 PST 2012
Thu Dec 13 17:16:23 PST 2012
The files created will look something like this:
$ ls
file-17:15:44  file-17:16:23  log
file-17:15:57  file-17:16:10  STATIC
$
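The open-ended `while true` loop above runs until you kill the background job. For experimenting outside the cluster, a bounded variant (our own adaptation: three files instead of an endless stream) behaves the same way and stops on its own:

```shell
#!/usr/bin/env bash
# Bounded variant of the loop above: create three timestamped files and
# log each creation, then exit -- no background job to remember to kill.
dir=$(mktemp -d)
cd "$dir"
for i in 1 2 3; do
    touch "file-$(date +%T)"
    date >> log
    sleep 1
done
ls | wc -l   # 4 (three file-HH:MM:SS entries plus the log)
```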
Create a new snapshot, wait about 30 seconds, then create another snapshot
Note the time recorded in the log file when you create each snapshot; the command below
appends a "snapped" line to the log as each snapshot is taken.
$ maprcli volume snapshot create -volume snapshot_lab_vol_user01 \
-snapshotname snapshot3_user01; echo "snapped $(date)" >> log
Explore the snapshot directory from CLI
1. Change directory into the mount point of the volume you created the snapshots for
earlier
2. List all files and directories there using "ls -a". Note that you won't see the .snapshot
directory because it is hidden. You can see the contents of the .snapshot directory if
you explicitly give its name, but you won't see it otherwise.
Even though you don't see the .snapshot directory using ls in the volume mount point, it is still
there and you can look inside. Do this:
$ ls -alh .snapshot
total 2.5K
drwxr-xr-x. 5 root root 3 Jul 16 12:58 .
drwxr-xr-x. 2 root root 2 Jul 16 12:57 ..
drwxr-xr-x. 2 root root 1 Jul 16 12:24 snapshot2_user01
drwxr-xr-x. 2 root root 2 Jul 16 12:57 snapshot3_user01
drwxr-xr-x. 2 root root 1 Jul 16 12:24 SNP_of_lab_vol_user01---2013-07-16.12-31-44
You should see the snapshots that you created earlier.
Note: You can also see a list of snapshots in the MCS along with details like when they were created and when they will expire. You will not, however, be able to see the contents of the
snapshots from the MCS.
1. List the contents of each snapshot. You should see that more files appear in each
subsequent snapshot, like this:
$ ls .snapshot/*
.snapshot/snapshot1:
STATIC
.snapshot/snapshot2:
file-08:39:16  file-08:39:55  file-08:40:34  file-08:41:13
file-08:39:29  file-08:40:08  file-08:40:47  log
Cluster Admin on Hadoop
Schedule snapshots from the MCS
You will need a schedule for this next part of the lab. Schedules are independent of volumes,
snapshots or mirrors. A schedule simply expresses a policy in terms of frequency and retention
times.
Create a custom schedule
Using the MCS to create a schedule:
1. Click Schedules under MapR-FS in the navigation pane
2. Click the New Schedule button
3. Give the schedule a name (every_5_minutes) and a rule (say, every 5 minutes,
retain (expire) after 45 minutes)
4. Click the Save Schedule button
Note: the schedule is not currently applied to any volumes
Apply the schedule
Now use the MCS to apply the custom schedule as a snapshot schedule for one of your
volumes:
1. Click Volumes under MapR-FS in the Navigation pane
2. Click the name of one of your volumes
3. Scroll down to the Snapshot Scheduling section
Cluster Admin on Hadoop
Lab 5.3: Mirrors and schedules
Gain experience with making mirrors manually via the MCS and CLI. Also learn to apply a
schedule to update data from the source volume. This lab has three parts:
Create a mirror from the MapR Control System (MCS)
Apply a schedule to the mirror
Create a mirror from CLI and initiate a mirror sync
Create a mirror from the MCS
Create a local mirror based on a source volume
1. Connect to the cluster MCS
2. Choose one of the ways to set up a mirror volume. For instance, choose Volumes from
the left bar to display volumes of choice for the source volume. If possible, pick a volume
containing data for the source volume so you will be able to verify that it is copied to the
new mirror volume.
3. Select New Volume from the top menu and fill in the template to make a
local mirror (mounting is optional)
Now you have created the mirror volume, but no data has been copied to it.
4. Verify your new mirror volume exists by selecting Mirror Volumes on the left bar menu to
display the names of all mirrors.
Copy data to your new mirror volume
1. Use the MCS to start mirroring by selecting this option from the Modify Volume
button drop down menu.
2. Verify that data are copied to your mirror volume by watching the display of mirror
volumes. If there is a lot of data, you will see an indication that the copying is in
progress:
Apply a schedule to the mirror
1. Use the MCS to apply a schedule to update your mirror volume.
Create a mirror from CLI and initiate a mirror sync
Use the CLI to create a new local mirror volume of a different source volume.
1. Use the CLI to manually create a mirror volume
CLI example:
Determine the schedule IDs available:
# maprcli schedule list
# maprcli volume create -name <mirror volume name> \
-source <source volume>@<cluster name> -type 1 -schedule <schedule id>
2. Use CLI to initiate a mirror sync
# maprcli volume mirror start -name <mirror volume name>
OR
# maprcli volume mirror push -name <mirror volume name>
Cluster Admin on Hadoop
Set Up
1. Verify all nodes in the source cluster have a unique <cluster name> for the source cluster (line 1),
and configure all nodes to be aware of the <destination cluster> (line 2)
2. SSH to the node you are configuring on the source cluster
3. Verify in /opt/mapr/conf/mapr-clusters.conf that the <cluster name> is there
Team1
4. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the remote
cluster in the format:
cluster2 <CLDB node 1>:7222 <CLDB node 2>:7222
cluster2 is the name of the destination cluster,
and <CLDB node 1> and <CLDB node 2> are the CLDB nodes in the destination cluster
5. Restart the Warden on all your node(s)
service mapr-warden restart
Note: there is a small bug (in 3.0.2) that requires you to add more than one remote cluster to
mapr-clusters.conf for remote clusters to be visible in the GUI
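The mapr-clusters.conf entry format described above can be expressed as a tiny helper that assembles a line from a cluster name and its CLDB hosts. This is our own sketch; the host names below are placeholders, not real nodes:

```shell
#!/usr/bin/env bash
# Build a mapr-clusters.conf line: "<cluster> <cldb1>:7222 <cldb2>:7222 ..."
cluster_line() {
    local name=$1 line
    shift
    line=$name
    for host in "$@"; do
        line="$line ${host}:7222"
    done
    echo "$line"
}

cluster_line cluster2 cldb-a cldb-b
# cluster2 cldb-a:7222 cldb-b:7222
```

Appending such a line to /opt/mapr/conf/mapr-clusters.conf on each node (and restarting the Warden) is what the numbered steps above do by hand.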
Configure all nodes in the destination cluster
Configure all nodes in the destination cluster with a unique name for the destination cluster
and configure all nodes to be aware of the source cluster
1. SSH to the nodes you are configuring on the destination cluster
2. Edit /opt/mapr/conf/mapr-clusters.conf and verify your <cluster name> is on the first line
3. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the source
cluster in the format:
Teamname2 <CLDB node 1>:7222 <CLDB node 2>:7222
Teamname2 is the name of the source cluster,
and <CLDB node 1> and <CLDB node 2> are the CLDB nodes in the source cluster
4. Restart the Warden on your nodes
service mapr-warden restart
Note: the remainder of the steps should be completed by each team
Verify that each cluster has a unique name
Verify that each cluster has a unique name and is aware of the other cluster
1. Log on to the MCS of the source cluster
2. Verify that the cluster name is cluster1
3. Click the + symbol next to the cluster1
4. Verify that cluster2 is listed under Available Clusters
5. Log on to the MCS of the destination cluster
6. Verify that the cluster name is cluster2
7. Click the + symbol next to the cluster2
8. Verify that cluster1 is listed under Available Clusters
Create a remote mirror volume on the destination cluster
You should be logged into the MCS on the destination cluster
1. Select Volumes in the Navigation pane
2. Click the New Volume button
Volume Type: Remote Mirror Volume
Enter a unique name for your mirror volume
Enter the name of the source volume
Source Cluster Name: cluster1
Enter a unique mount path for the mirror volume
The parent directory must already exist
Ensure that the Mounted checkbox is checked
Topology: /data
3. Click the OK button
You should see confirmation at the top of the MCS indicating that the mirror volume was
created
Initiate mirroring to the destination cluster
1. If not already selected, click Volumes in the Navigation pane
2. Select the mirror volume you created in the previous section
3. Click Volume Actions
4. Select Start Mirroring
Verify data from source cluster was copied to destination cluster
1. SSH to any node on the destination cluster
2. List the contents of the destination mirror volume
hadoop fs -ls /<mirror volume mount path>
or, if the cluster filesystem is mounted via NFS
ls /mapr/Team2/<mirror volume mount path>
You should see the exact same contents in the mirror volume as you do in the original source
volume
Conclusion
In this lab you learned how to copy data from one cluster to another using remote mirroring. As you
learned earlier in this course, MapR volumes allow you a greater degree of control over how to
manage data in the cluster. Mirroring the volumes that contain your business-critical data to a
remote cluster can significantly reduce the amount of key data you would lose, and the time it would
take to resume productivity, in the event of a disaster.
Lab 5.5: Using the HBase shell
The objective of this lab is to get you started with HBase shell and perform operations to create
a table, put data into the table, retrieve data from the table and delete data from the table.
Start HBase shell
1. Get a help listing which demonstrates some basic commands.
a. Get help specifically on the "put" command.
2. Create a table called 'Blog' with the following schema: blog title, blog topic, author first
name, author last name. The blog title and topic must be grouped together as they will
be saved together and retrieved together. Author first and last name must also be
grouped together.
3. List the new table you created in its directory, to confirm it was created.
4. Insert the following data to the 'Blog' table.
Where Title and Topic are in column family info and First and Last are in column family author
ID  Title                                     Topic      First     Last
1   MapR M7 is Now Available on Amazon EMR    cloud      Diana     Truman
2   Enterprise Grade Solutions for HBase      highavail  Roopesh   Nair
3   A Comparison of NoSQL Database Platforms  nosql      Jonathan  Morgan
5. Count the number of rows. Make sure every row is printed to the screen as it is counted.
6. Retrieve the entire record with ID '2'.
7. Retrieve only the title and topic for record with ID '3'.
8. Change the last name of the author with title "A Comparison of NoSQL Database
Platforms".
Display the record to verify the change.
Display both the new and old value. Can you explain why both values are there?
9. Display all the records.
10.Display the title and last name of all the records.
11.Display the title and topic of the first two records.
12.Delete the record with title "Enterprise Grade Solutions for HBase".
Verify that the record was deleted by scanning all records, or
Try to select just that record.
13.Drop the table 'Blog'.
Create a Table using the MapR Control System (MCS)
1. Connect to the MCS from a browser using the notes from the instructor. Log in with your
account.
2. Create a table called 'Blogtest' with the following schema: blog title, blog topic, author
first name, author last name. The blog title and topic must be grouped together as they
will be saved together and retrieved together. Author first and last name must also be
grouped together.
3. List the new table you created in its directory, to confirm it was created.
4. Insert some data and test. Also change the number of versions of cells you can keep
and test.
MapR Tables - Solutions
1. Use cases fit for MapR tables:
A data store comprised of petabytes of semi-structured data.
A data store that will be accessed by large numbers of client requests, for example
thousands of reads per second.
2. Use cases not fit for MapR tables:
Access normalized relational data with SQL
Full text search
3. Columns may be created when data is inserted; they don't have to be defined up front.
MapR can scale up to very large numbers of columns per column family. However, the table
name and column families have to be defined before data is inserted.
4. In addition to using the list command in the HBase shell, you can use standard Linux
ls to list all tables (and files) stored in a particular directory.
HBase Shell Solution
You can find the commands below in the file Lab1_hbase_shell_commands.txt.
1. Start an HBase shell in your command window
user02@ip-10-196-89-226:~$ hbase shell
2. Use the HBase help command
hbase> help
hbase> help "put"
3. Create a table /user/user01/Blog with column families info and author
hbase> create '/user/user01/Blog', {NAME=>'info'},
{NAME=>'author'}
Since it was required that title and topic be grouped they will be stored as columns that
belong to the 'info' column family while 'first' and 'last' will belong to the 'author'
column family.
4. List the table:
hbase> list '/user/user01/'
5. Execute the following put statements to insert the records into Blog table:
hbase>put '/user/user01/Blog','1','info:title', 'MapR M7is Now Available on Amazon EMR'
hbase>put '/user/user01/Blog','1','info:topic','cloud'
hbase>put '/user/user01/Blog','1','author:first','Diana'
hbase>put '/user/user01/Blog','1','author:last','Truman'
hbase>put '/user/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'
hbase>put '/user/user01/Blog','2','info:topic','highavail'
hbase>put '/user/user01/Blog','2','author:first','Roopesh'
hbase>put '/user/user01/Blog','2','author:last','Nair'
hbase>put '/user/user01/Blog','3','info:title', 'A Comparison of NoSQL Database Platforms'
hbase>put '/user/user01/Blog','3','info:topic','nosql'
hbase>put '/user/user01/Blog','3','author:first','Jonathan'
hbase>put '/user/user01/Blog','3','author:last','Morgan'
6. Count the number of rows of data inserted
hbase> count '/user/user01/Blog',INTERVAL=>1
7. Retrieve the entire record with ID 2
hbase> get '/user/user01/Blog','2'
8. Retrieve only the title and topic for record with ID '3'.
hbase> get '/user/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}
9. The record with title "A Comparison of NoSQL Database Platforms" has ID 3. To update
its value, execute a put operation with that ID.
hbase>put '/user/user01/Blog', '3','author:last','Smith'
To verify the put worked, select the record:
hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last'}
To display both versions, specify the number of versions in a get operation:
hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}
The reason we see the old value is that cells keep up to three versions by default in MapR
tables.
10. Display all the records.
hbase> scan '/user/user01/Blog'
11. Display the title and last name of all the records.
hbase> scan '/user/user01/Blog',{COLUMNS=>['info:title','author:last']}
12. Display the title and topic of the first two records.
hbase> scan '/user/user01/Blog',
{COLUMNS=>['info:title','info:topic'],LIMIT=>2}
13. The record with title "Enterprise Grade Solutions for HBase" has record ID '2'; delete all
columns for the record with ID '2':
hbase> delete '/user/user01/Blog','2','info:title'
hbase> delete '/user/user01/Blog','2','info:topic'
hbase> delete '/user/user01/Blog','2','author:first'
hbase> delete '/user/user01/Blog','2','author:last'
14. To delete a table in HBase shell, the table must first be disabled, and then you can drop
it.
hbase> disable '/user/user01/Blog'
hbase> drop '/user/user01/Blog'
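The versioning behavior seen in step 9 can be modeled with a short Python sketch. This is a conceptual model only, not the MapR implementation: each cell keeps a bounded list of timestamped values, and a get with VERSIONS=>n returns the newest n.

```python
from collections import deque

MAX_VERSIONS = 3  # default for MapR tables, per the lab text

class VersionedCell:
    def __init__(self, max_versions=MAX_VERSIONS):
        self.versions = deque(maxlen=max_versions)  # oldest version falls off

    def put(self, value, timestamp):
        self.versions.append((timestamp, value))

    def get(self, n=1):
        # newest first, like VERSIONS=>n in the HBase shell
        return [v for _, v in list(self.versions)[-n:]][::-1]

cell = VersionedCell()
for ts, name in enumerate(["Morgan", "Smith"]):
    cell.put(name, ts)
print(cell.get(1))  # -> ['Smith']            (default: latest value only)
print(cell.get(3))  # -> ['Smith', 'Morgan']  (the old value is still visible)
```

This mirrors why the get with VERSIONS=>3 in step 9 still shows 'Morgan' after the put of 'Smith': the overwrite adds a new version rather than destroying the old one, until the version limit pushes it out.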
Troubleshooting
NameError: undefined local variable or method `interval' for #
Happens for hbase> count '/user/user02/Blog', interval=>1
Use uppercase INTERVAL. Example: hbase> count '/user/user02/Blog', INTERVAL=>1
HBase shell commands (optional)
The objective of this optional lab is to run scripts from your HBase shell. These commands
can be run individually in an HBase shell, or they can be pasted into a script and run. Example:
hbase> source "hbase_script.txt"
1. Open a vi session and insert the following into your script.
2. Adjust all references to home directory to the appropriate directory
3. Name your script
4. Run your script
Additional commands to experiment with
# NOTE: You can copy-paste multiple lines at a time
# into HBase shell. Or, you can source a script.
# Example: hbase> source "hbase_script.txt"
# Background information on HBase Shell at:
# http://wiki.apache.org/hadoop/Hbase/Shell
##########################################################
# Solution to Lab 1
# NOTE: Change the table paths to your own user directory
# so your actions don't conflict with other students.
# Example: create '/home/user12/atable', {NAME=>'cf1'}
##########################################################
help
help "put"
create '/home/user01/Blog', {NAME=>'info'}, {NAME=>'author'}
list '/home/user01/'
put '/home/user01/Blog','1','info:title','MapR M7 is Now Available on Amazon EMR'
put '/home/user01/Blog','1','info:topic','cloud'
put '/home/user01/Blog','1','author:first','Diana'
put '/home/user01/Blog','1','author:last','Truman'
put '/home/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'
put '/home/user01/Blog','2','info:topic','highavail'
put '/home/user01/Blog','2','author:first','Roopesh'
put '/home/user01/Blog','2','author:last','Nair'
put '/home/user01/Blog','3','info:title','A Comparison of NoSQL Database Platforms'
put '/home/user01/Blog','3','info:topic','nosql'
put '/home/user01/Blog','3','author:first','Jonathan'
put '/home/user01/Blog','3','author:last','Morgan'
count '/home/user01/Blog', INTERVAL=>1
get '/home/user01/Blog','2'
get '/home/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}
put '/home/user01/Blog','3','author:last','Smith'
get '/home/user01/Blog','3',{COLUMNS=>'author:last'}
get '/home/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}
scan '/home/user01/Blog'
scan '/home/user01/Blog',{COLUMNS=>['info:title','author:last']}
scan '/home/user01/Blog',{COLUMNS=>['info:title','info:topic'], LIMIT=>2}
delete '/home/user01/Blog','2','info:title'
delete '/home/user01/Blog','2','info:topic'
delete '/home/user01/Blog','2','author:first'
delete '/home/user01/Blog','2','author:last'
#disable '/home/user01/Blog'
#drop '/home/user01/Blog'
##########################################################
# Additional commands to experiment with
# NOTE: You can copy-paste multiple lines at a time
# into HBase shell. Or, you can source a script.
# Example: hbase> source "hbase_script.txt"
##########################################################
# add content column-family to table
alter '/home/user01/Blog', {NAME=>'content'}
# insert row 1
put '/home/user01/Blog','Diana-001','info:title','MapR M7 is Now Available on Amazon EMR'
put '/home/user01/Blog','Diana-001','info:author','Diana'
put '/home/user01/Blog','Diana-001','info:date','2013.05.06'
put '/home/user01/Blog','Diana-001','content:post','Lorem ipsum dolor sit amet, consectetur adipisicing elit'
# insert row 2
put '/home/user01/Blog','Diana-002','info:title','Implementing Timeouts with FutureTask'
put '/home/user01/Blog','Diana-002','info:author','Diana'
put '/home/user01/Blog','Diana-002','info:date','2011.02.14'
put '/home/user01/Blog','Diana-002','content:post','Sed ut perspiciatis unde omnis iste natus error sit'
# insert row 3
put '/home/user01/Blog','Roopesh-003','info:title','Enterprise Grade Solutions for HBase'
put '/home/user01/Blog','Roopesh-003','info:author','Roopesh'
put '/home/user01/Blog','Roopesh-003','info:date','2012.10.20'
put '/home/user01/Blog','Roopesh-003','content:post','At vero eos et accusamus et iusto odio dignissimos ducimus'
# insert row 4
put '/home/user01/Blog','Jonathan-004','info:title','A Comparison of NoSQL Database Platforms'
put '/home/user01/Blog','Jonathan-004','info:author','Jonathan'
put '/home/user01/Blog','Jonathan-004','info:date','2013.01.08'
put '/home/user01/Blog','Jonathan-004','content:post','Duis aute irure dolor in reprehenderit in voluptate velit'
# insert row 5
put '/home/user01/Blog','Sylvia-005','info:title','NetBeans IDE 7.3.1 Introduces Java EE 7 Support'
put '/home/user01/Blog','Sylvia-005','info:author','Sylvia'
put '/home/user01/Blog','Sylvia-005','info:date','2012.07.20'
put '/home/user01/Blog','Sylvia-005','content:post','Excepteur sint occaecat cupidatat non proident, sunt in culpa'
# count the data you inserted above; INTERVAL specifies how often counts are displayed
count '/home/user01/Blog', {INTERVAL=>2}
count '/home/user01/Blog', {INTERVAL=>1}
# this get won't return anything as the rowkey doesn't exist
get '/home/user01/Blog','unknownRowKey'
# retrieve ALL columns for the provided rowkey
get '/home/user01/Blog','Jonathan-004'
# retrieve specific columns for the provided rowkey
get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post']}
# retrieve data for specific columns and time-stamp
get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post'], TIMESTAMP=>1326061625690}
# exercise different scan options
scan '/home/user01/Blog'
scan '/home/user01/Blog', {STOPROW=>'Sylvia'}
scan '/home/user01/Blog', {COLUMNS=>'info:title', STARTROW=>'Sylvia', STOPROW=>'Jonathan'}
# update the record a few times and then retrieve back multiple versions
# only 3 versions are kept by default
put '/home/user01/Blog','Jonathan-004','info:date','2012.01.09'
put '/home/user01/Blog','Jonathan-004','info:date','2012.01.10'
put '/home/user01/Blog','Jonathan-004','info:date','2012.01.11'
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>2}
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>1}
# selects 1 by default
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date'}
# delete a record; delete all versions of the cell
get '/home/user01/Blog','Roopesh-003','info:date'
delete '/home/user01/Blog','Roopesh-003','info:date'
get '/home/user01/Blog','Roopesh-003','info:date'
# delete the versions before the provided timestamp
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
delete '/home/user01/Blog','Jonathan-004','info:date', 1326254739791
get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
# drop the table
list '/home/user01/'
disable '/home/user01/Blog'
drop '/home/user01/Blog'
list '/home/user01/'
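The STARTROW/STOPROW scans in the script above rely on row keys being stored in lexicographic order, so a scan is a range read: STARTROW inclusive, STOPROW exclusive. A minimal Python sketch of that half-open range scan (conceptual only, not HBase internals):

```python
import bisect

# Row keys in an HBase-style table are kept in sorted (lexicographic) order,
# so a scan is just a range read over the sorted key space.
row_keys = sorted(["Diana-001", "Diana-002", "Jonathan-004",
                   "Roopesh-003", "Sylvia-005"])

def scan(keys, startrow=None, stoprow=None):
    lo = bisect.bisect_left(keys, startrow) if startrow else 0
    hi = bisect.bisect_left(keys, stoprow) if stoprow else len(keys)
    return keys[lo:hi]

print(scan(row_keys, stoprow="Sylvia"))
# -> ['Diana-001', 'Diana-002', 'Jonathan-004', 'Roopesh-003']
```

This also explains why a scan whose STARTROW sorts after its STOPROW (as in the 'Sylvia' to 'Jonathan' scan in the script) returns no rows: the range is empty.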
Using importtsv and copytable
The objective of this lab is to get you started with the HBase shell and perform operations to create
a table, import flat tab-separated data into the table, retrieve data from the table, and delete
data from the table.
View Existing Table Using MCS
1. Log onto the MCS.
2. Select MapR-FS => MapR Tables
3. Click on /user/mapr/ under Recently opened tables
If /user/mapr/ is not displayed under Recently opened tables, enter
/user/mapr/ in the Go to table field and click the Go button
4. Look at the information available in the Regions tab
Each row represents one region of data
The columns (Start Key, End Key, Physical Size, Logical size, etc.) represent
meaningful data about the table regions
Note: Highlighted area is syntax for 3.1 permissions
Notice the first column is defined as HBASE_ROW_KEY; this takes the first field of data
(namely the numerical index field) and makes it the row key.
Important: also notice that the command above identifies each column in the data file as well as the
column family it belongs in. The column families used in the example are cf1, cf2, and cf3. If
the table you are importing into has different column family names, then you will need to
modify the command to match the correct column family names.
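The column mapping that importtsv performs can be pictured with a small Python sketch. This is illustrative only; the real tool is a MapReduce job, and the column spec here (cf1:name, cf2:city) is a hypothetical example, not the one from this lab's data file. The first TSV field becomes the row key, and each remaining field is written as a put under its fully qualified family:qualifier name:

```python
# Illustrative sketch of the importtsv column mapping (not the real tool).
# The column spec lists HBASE_ROW_KEY plus family:qualifier pairs,
# positionally matched against the tab-separated fields of each line.
columns = ["HBASE_ROW_KEY", "cf1:name", "cf2:city"]  # hypothetical spec

def map_tsv_line(line, columns):
    fields = line.rstrip("\n").split("\t")
    row_key, puts = None, []
    for spec, value in zip(columns, fields):
        if spec == "HBASE_ROW_KEY":
            row_key = value             # first field becomes the row key
        else:
            puts.append((spec, value))  # becomes a put under family:qualifier
    return row_key, puts

key, puts = map_tsv_line("42\tAlice\tOakland", columns)
print(key, puts)  # -> 42 [('cf1:name', 'Alice'), ('cf2:city', 'Oakland')]
```

This is why the column families named in the spec must match the families of the target table: each non-key field turns into a put against exactly the family:qualifier the spec gives it.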
6. While the import job is processing, look at the MCS to view changes to the table and
puts being processed on the node:
Click Nodes under Cluster
Click the Overview dropdown and change the value to Performance
If necessary, scroll to the right so you can see the Gets, Puts and Scans columns.
You should see a large number of puts across several nodes while your import is
processing
Click MapR Tables under MapR-FS
Click the name of the table you used for the import under Recently opened tables
Select the Regions tab
You should see that your table automatically split into a number of regions during
the import
7. In an HBase shell, examine the data that has been imported
[root@CentOS001 data2]# hbase shell
HBase Shell; enter 'help' for list of supported commands. Type "exit" to
leave the HBase Shell
Ve