

Deployment Guide: IBM® BigInsights™ with IBM® Spectrum Scale™ HDFS Transparency and Ambari™

July 8, 2016
Version 1.0

Written for:

Apache Ambari V2.1
IBM BigInsights V4.1
IBM Open Platform with Apache Hadoop V4.1.0.2
IBM Spectrum Scale V4.1.1 and V4.2
IBM HDFS Transparency V2.7.0.3


Contents

1. Purpose
2. Terminology
3. Limitations
   3.1 General list
   3.2 Installation
   3.3 Configuration
   3.4 Kerberos mode
   3.5 Short-circuit read
   3.6 Ambari GUI
   3.7 Node
      3.7.1 Adding a node
      3.7.2 Deleting and decommissioning a node
      3.7.3 Moving a NameNode
4. Planning
   4.1 Software packages
      4.1.1 Mirror repository server
      4.1.2 Base packages
      4.1.3 BigInsights Ambari package
      4.1.4 IBM Open Platform (IOP) for Hadoop
      4.1.5 IBM Spectrum Scale
   4.2 Software installation
      4.2.1 Different installation modes
      4.2.2 Node roles information
5. Preparing the environment
   5.1 Validating the network
   5.2 Setting up password-less for root
   5.3 Pre-tasks preparation for IOP
   5.4 Yum repositories setup
      5.4.1 Ambari and IOP repository
      5.4.2 Setting up the IBM Spectrum Scale repository
      5.4.3 Setting up the OS repository
   5.5 IBM Spectrum Scale deployment modes
      5.5.1 Deploy IOP over an existing IBM Spectrum Scale file system (FPO)
      5.5.2 Deploy IOP over an existing IBM Spectrum Scale file system (ESS)
      5.5.3 Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
6. Installation of a software stack
   6.1 Overview
   6.2 Ambari IOP installation
      6.2.1 Install the Ambari Server RPM
      6.2.2 Update the Ambari Configuration
      6.2.3 Setting up the Ambari server
      6.2.4 Starting the Ambari server
      6.2.5 Ambari Install Wizard
      6.2.6 Create an IOP cluster
   6.3 Setting up High Availability [HA]
   6.4 Adding additional software services
      6.4.1 IBM BigInsights value-add modules
      6.4.2 IBM Symphony
   6.5 Install the GPFS integration module into Ambari
      6.5.1 Stop Ambari services
      6.5.2 Installing the GPFS Ambari integration module
      6.5.3 Restart Ambari server
   6.6 Adding the IBM Spectrum Scale service to Ambari
      6.6.1 Add Service
      6.6.2 Choose Services
      6.6.3 Assign Masters
      6.6.4 Assign Slaves and Clients
      6.6.5 Customize Services
      6.6.6 Review
      6.6.7 Install, Start and Test
      6.6.8 Summary
      6.6.9 Ambari Cluster View
      6.6.10 Restart Ambari Server
      6.6.11 Start all services
7. Verify and Test Installation
8. IBM Spectrum Scale versus Native HDFS
   8.1 Function limitations
   8.2 Configuration that differs from native HDFS in IBM Spectrum Scale
   8.3 Short Circuit Read Configuration
Appendix
   A. Preparing a stanza file
      Simple NSD file
      Standard NSD file
      Policy file
   B. IBM Spectrum Scale-FPO Deployment
      Disk-partitioning algorithm
      Failure Group selection rules
      Rack Mapping File
      Partitioning Function Matrix in Automatic Deployment
   C. Dual-network deployment
   D. BigInsights value-add services on IBM Spectrum Scale
      Installation
      Troubleshooting value-add services
   E. Symphony Integration
   F. Upgrade GPFS Ambari integration module
   G. IBM Spectrum Scale Service Management
      Service Actions Dropdown list
      Running the service check
      Upgrading IBM Spectrum Scale
      Upgrading HDFS Transparency
      Integrating HDFS Transparency
      Unintegrating Transparency
   H. Ambari Node management
      Adding a node
      Deleting a node
      Moving a NameNode
   I. Collecting the snap data
   J. Uninstalling Ambari IOP stack
      Ambari Server node
      Ambari Agent nodes
   K. Resources
FAQ
Figures and Tables
Notices
Revisions


1. Purpose

The purpose of this document is to describe the installation and configuration of the IBM® BigInsights™ Open Platform with Apache Hadoop stack onto the IBM® Spectrum Scale™ file system by using the Apache Ambari framework.

IBM Open Platform with Apache Hadoop (IOP) supports the Hadoop Distributed File System (HDFS). IBM Spectrum Scale, formerly known as IBM General Parallel File System (IBM GPFS), can be offered to customers who require advanced capabilities such as a POSIX-compliant file system, information lifecycle management, incremental backups, high-performance replication, and FIPS-140 / NIST compliant encryption.

With Ambari, the system administrator can provision, manage, and monitor a Hadoop cluster. Ambari can also start and stop IBM Spectrum Scale services on all the nodes in the cluster and report basic status information through the Ambari web user interface (UI).

Prepare for system installation by obtaining all required software and reviewing all limitations and configurations described in this document before performing the installation.

The examples and commands in this document are based on a RHEL 7 system.

2. Terminology

The following terms are used in this document for the Ambari integration with IBM Spectrum Scale and HDFS

Transparency.

Native HDFS or HDFS: The distributed file system designed to run on low-cost commodity hardware. Shown as the HDFS service on the Ambari dashboard.

GPFS Ambari Integration Module: The package required to install the IBM Spectrum Scale service. This service supports IBM Spectrum Scale and HDFS Transparency from Ambari.

IBM Spectrum Scale service: The Ambari service used to monitor and configure IBM Spectrum Scale and HDFS Transparency. Shown as the IBM Spectrum Scale service on the Ambari dashboard.

IBM Spectrum Scale: Software-defined storage for high-performance and large-scale workloads.

GPFS Cluster or GPFS: An IBM Spectrum Scale cluster.

GPFS Master: The IBM Spectrum Scale cluster quorum node co-located with the Ambari server node.

GPFS Node: An Ambari node role in Assign Slaves and Clients; IBM Spectrum Scale packages are installed on these nodes, which become part of a GPFS cluster.

GPFS Transparency: In Ambari, GPFS Transparency means HDFS Transparency.

HDFS Transparency Node: An Ambari node role in Assign Slaves and Clients; the IBM Spectrum Scale HDFS Transparency package is installed on these nodes.

HDFS Transparency: IBM Spectrum Scale HDFS Transparency (also known as HDFS Protocol) offers a set of interfaces that allows applications to use an HDFS client to access IBM Spectrum Scale through HDFS RPC requests.

Federation: The ability to combine different Hadoop clusters so that Hadoop applications can run across them.

3. Limitations

This section describes the known limitations and workarounds for BigInsights Ambari IOP, IBM Spectrum Scale, and HDFS Transparency integration.

Note: This is an iterative document. For the latest version of this document, see the IBM Spectrum Scale wiki, Deploy BigInsights 4.1 Spectrum Scale HDFS Transparency with Ambari 2.1.

3.1 General list

General Limitations

Upgrading the version of Ambari is not supported.

GPFS Ambari Integration Module version 4.1-0 requires HDFS Transparency version 2.7.0.3.

HDFS Transparency version 2.7.0.3 requires Java OpenJDK version 1.8.

The Restart All, Restart GPFS Nodes, and Restart GPFS Transparency Nodes options from the Spectrum Scale dashboard > Service Actions cannot be used.


Do not follow the restart prompt from the Restart Required alert displayed in the Spectrum Scale dashboard when HDFS or Spectrum Scale configurations have changed. To restart Spectrum Scale, users must use Spectrum Scale dashboard > Service Actions > Stop and Start options. This limitation is caused by an issue in Ambari 2.1.0 (JIRA AMBARI-12472).

The Restart option from the host dashboard cannot be used to restart GPFS on individual nodes. To restart GPFS on a node, users must use the Host Actions > Stop GPFS Node and Start GPFS Node options on the host dashboard.

Federation is not supported in Spectrum Scale Ambari Integration Module version 4.1-0.

For CentOS, create the /etc/redhat-release file to simulate a Red Hat environment. Otherwise, the Ambari deployment fails. For example:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 (Maipo)
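On CentOS, a minimal sketch of creating this file, assuming the release string should mirror the equivalent RHEL 7.1 release:

# echo "Red Hat Enterprise Linux Server release 7.1 (Maipo)" > /etc/redhat-release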

3.2 Installation

Limitations

The GPFS Master node must be co-located with the Ambari server node. This implies that the Ambari server host is defined as an Ambari agent host in the Add Hosts UI panel while setting up the Hadoop cluster. If the nodes are not co-located, the IBM Spectrum Scale service fails to install. If the IBM Spectrum Scale service is not installed, perform the steps in Removing a Service (2.1.0) from the Confluence - Apache Ambari wiki, and then add the IBM Spectrum Scale service so that the GPFS Master node is co-located with the Ambari server node.

While deploying IOP over an existing IBM Spectrum Scale shared storage cluster (e.g., ESS), the IBM Spectrum Scale cluster on the shared storage system must be started and the file system must be mounted on all the nodes before starting the Ambari deployment.

If deploying Ambari IOP onto an existing HDFS Transparency cluster, make sure that the HDFS configuration provided through the Ambari UI is consistent with the existing HDFS configuration. For example, the existing HDFS NameNode and DataNode values must match the Ambari HDFS UI NameNode and DataNode values. Otherwise, the existing HDFS configurations will be overwritten by the default Ambari UI HDFS parameters after the Add Service Wizard is completed.

Ambari integration with HDFS Transparency does not support migrating from an existing Ambari IOP Hadoop/GPFS cluster that uses the first-generation Hadoop connector. Contact [email protected] for more information.


3.3 Configuration

Limitations

Any HDFS configuration change requires a restart of the GPFS Transparency nodes to take effect. It is important that the HDFS service be restarted before the GPFS Transparency nodes are restarted. From the dashboard, select HDFS > Restart Required - Restart alert > Restart All Affected. Then, from the dashboard, select Spectrum Scale > Service Actions > Stop and Start. This allows HDFS Transparency to synchronize with the HDFS configuration files. See Limitations - General list on Spectrum Scale restart.

After adding and removing nodes from Ambari, some aspects of the IBM Spectrum Scale configuration, such as the pagepool value shown by the mmlsconfig command, are not refreshed until the next restart of the IBM Spectrum Scale Ambari service. However, this does not impact functionality.

3.4 Kerberos mode

Limitations

For Kerberos support, the minimum version for HDFS Transparency is 2.7.0.0.

Kerberos is not supported in Ambari Integration with HDFS Transparency version 4.1-0. For manual configuration steps, contact [email protected] for more information.

3.5 Short-circuit read

Limitations

Short-circuit read is not enabled when the IBM Spectrum Scale service is installed because BigInsights 4.1 ships with Hadoop version 2.7.1; short-circuit read works only with Hadoop version 2.7.0.

3.6 Ambari GUI

Limitations

If any GPFS node other than the GPFS Master is stopped, the Spectrum Scale panel does not display any alert.

The NFS gateway is displayed on the HDFS dashboard but is not used by HDFS Transparency.


When the Spectrum Scale service is integrated with Ambari, the DataNodes field under the Summary tab on the HDFS GUI panel might display an incorrect value. The value shown there (for example, 4/4) reflects the status of the HDFS Transparency DataNodes that were also native HDFS DataNodes before the integration of the Spectrum Scale service. If an HDFS DataNode was not selected as an HDFS Transparency node on the Assign Slaves and Clients selection page, the DataNodes field displays a disparity in value. However, the next line, DataNodes Status, always displays accurate information for the HDFS Transparency DataNodes, irrespective of their status as HDFS DataNodes.

Example where the DataNodes and DataNodes Status values might differ:

The cluster contains three HDFS DataNodes and four HDFS Transparency Nodes. The value of DataNodes is 3/3 and the value of DataNodes Status is 4. DataNodes Status displays the actual status for the number of HDFS Transparency Nodes.

3.7 Node

3.7.1 Adding a node

Limitations

Ambari adds nodes and installs the IBM Spectrum Scale software onto the existing IBM Spectrum Scale cluster, such as an ESS configuration, but does not create any Network Shared Disks (NSDs) or add NSDs into the existing file system. See the Adding a node section for more information.

3.7.2 Deleting and decommissioning a node

Limitations

Decommissioning a DataNode is not supported in GPFS Ambari Integration Module version 4.1-0.

Deleting a node is not supported in GPFS Ambari Integration Module version 4.1-0. To manually delete a node, see Deleting a node.


3.7.3 Moving a NameNode

Limitations

Moving a NameNode from the Ambari HDFS UI when HDFS Transparency is integrated is not supported in GPFS Ambari Integration Module version 4.1-0. To manually move the NameNode, see Moving a NameNode.

4. Planning

4.1 Software packages

This section lists the dependencies for Ambari and IBM Spectrum Scale only. The dependencies for Hadoop are not listed in this section. The installer must be set up to access the OS (e.g., RHEL) repository from every node of the Hadoop cluster. Failure to execute the yum install <RPM> command causes the overall installation process to fail.

4.1.1 Mirror repository server

IBM Spectrum Scale requires a local repository. Therefore, select a server to act as the mirror repository server. This server requires the installation of the Apache HTTP server or a similar HTTP server. Every node in the Hadoop cluster must be able to access this repository server. The mirror server can be defined in the DNS, or you can add an entry for the mirror server in /etc/hosts on each node of the cluster.
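For example, a hypothetical /etc/hosts entry, assuming a mirror server named yumserver.example.com at 192.168.10.100:

# echo "192.168.10.100 yumserver.example.com yumserver" >> /etc/hosts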

a) Create an HTTP server on the mirror repository server, such as Apache httpd. If Apache httpd is not already installed, install it with the yum install httpd command. You can start Apache httpd by running one of the following commands:

apachectl start
service httpd start

b) [Optional]: Ensure that the HTTP server starts automatically on reboot by running the following command:

chkconfig httpd on

c) Ensure that the firewall settings allow inbound HTTP access from the cluster nodes to the mirror web server (see the example after this list).


d) On the mirror repository server, create a directory for your repositories, such as <document root>/repos. For Apache httpd with document root /var/www/html, type the following command:

mkdir -p /var/www/html/repos

e) Test your local repository by browsing to the web directory (see the example after this list):

http://<Yum-Server>/repos
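As a sketch of steps c) and e) on RHEL 7, assuming firewalld is in use and the hypothetical mirror hostname yumserver.example.com:

# firewall-cmd --permanent --add-service=http
# firewall-cmd --reload
# curl -s http://yumserver.example.com/repos/ | head

The curl output should list the repository directory contents; an error here usually points to a firewall or DNS problem.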

The following example uses RHEL 7.1

# rpm -qa | grep httpd
httpd-tools-2.4.6-31.el7.x86_64
httpd-2.4.6-31.el7.x86_64
# service httpd start
# service httpd status
Redirecting to /bin/systemctl status httpd.service
httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: active (running) since Thu 2015-10-29 21:26:07 EDT; 6 months 6 days ago
  Process: 7400 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
 Main PID: 8998 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─ 6963 /usr/sbin/httpd -DFOREGROUND
           ├─ 6964 /usr/sbin/httpd -DFOREGROUND
           ├─ 7028 /usr/sbin/httpd -DFOREGROUND
           ├─ 8998 /usr/sbin/httpd -DFOREGROUND
           ├─15377 /usr/sbin/httpd -DFOREGROUND
           ├─19914 /usr/sbin/httpd -DFOREGROUND
           ├─19915 /usr/sbin/httpd -DFOREGROUND
           ├─20097 /usr/sbin/httpd -DFOREGROUND
           ├─20100 /usr/sbin/httpd -DFOREGROUND
           ├─20101 /usr/sbin/httpd -DFOREGROUND
           └─21482 /usr/sbin/httpd -DFOREGROUND
…
# systemctl enable httpd

4.1.2 Base packages

The following packages are installed on the Ambari server node by the Ambari installer:


postgresql

postgresql-server

postgresql-libs

The following packages must be installed on all IBM Spectrum Scale nodes:

ksh

libstdc++

libstdc++-devel

compat-libstdc++ (x86_64 only; not needed for ppc64le)

kernel

kernel-devel

gcc-c++

imake (x86_64 only; not needed for ppc64le)

make

The following recommended packages can be downloaded to all nodes:

acl, libacl – to enable Hadoop ACL support

libattr – to enable Hadoop extended attributes

Some of these packages are installed by default while installing the operating system.
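A one-line sketch for installing the required and recommended packages on an IBM Spectrum Scale node, assuming the OS repository is already reachable (see Setting up the OS repository):

# yum -y install ksh libstdc++ libstdc++-devel kernel kernel-devel gcc-c++ make acl libacl libattr

On x86_64 systems, also install compat-libstdc++ and imake; the exact compat package name can vary by OS release.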

4.1.3 BigInsights Ambari package

Obtain the appropriate tarballs for the Ambari packages, based on the operating system of the cluster, by using wget. Only the operating systems and the hardware listed in the repository are supported.

Note: If you are using a Windows system to download the files, you can also open the URLs in a web browser and download the files. You can then transfer the files to the system that will host the mirror repository files.

Linux x86-64 (RHEL6)

Version: 4.1.0.2
Download URL: https://ibm-open-platform.ibm.com/repos/Ambari/rhel/6/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1211.el6.x86_64.tar.gz

Linux x86-64 (RHEL7)

Version: 4.1.0.2
Download URL: https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.x86_64.tar.gz

Linux x86-64 (SLES11)

Version: 4.1.0.2
Download URL: https://ibm-open-platform.ibm.com/repos/Ambari/sles/11/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1528.sles11.x86_64.tar.gz

Power Linux LE (RHEL 7)

Version: 4.1.0.2
Download URL: https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/ppc64le/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.ppc64le.tar.gz

TABLE 1 BIGINSIGHTS AMBARI PACKAGES
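For example, to stage the RHEL 7 x86-64 Ambari tarball on the mirror repository server (the target directory is the <document root>/repos directory used elsewhere in this guide):

# cd /var/www/html/repos
# wget https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.x86_64.tar.gz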

4.1.4 IBM Open Platform (IOP) for Hadoop

Obtain the appropriate tarballs for the IBM Open Platform repository packages, based on the operating system of the cluster, by using wget. Only the operating systems and the hardware listed in the repository are supported.

Note: If you are using a Windows system to download the files, you can also open the URLs in a web browser and download the files. You can then transfer the files to the system that will host the mirror repository files.

Note: IOP-Utils 1.1 is for the IOP 4.1 release. Do not use IOP-Utils 1.2 with the IOP 4.1 release.

Linux x86-64 (RHEL6)

IOP 4.1.0.2: https://ibm-open-platform.ibm.com/repos/IOP/rhel/6/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20151210_1028.el6.x86_64.tar.gz
IOP-Utils 1.1: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/6/x86_64/1.1/iop-utils-1.1.0.0.el6.x86_64.tar.gz

Linux x86-64 (RHEL7)

IOP 4.1.0.2: https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20151209_2001.el7.x86_64.tar.gz
IOP-Utils 1.1: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/iop-utils-1.1.0.0.el7.x86_64.tar.gz

Linux x86-64 (SLES 11)

IOP 4.1.0.2: https://ibm-open-platform.ibm.com/repos/IOP/sles/11/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20160105_1528.sles11.x86_64.tar.gz
IOP-Utils 1.1 (no tarball available): https://ibm-open-platform.ibm.com/repos/IOP-UTILS/sles/11/x86_64/1.1/

Power Linux LE (RHEL7)

IOP 4.1.0.2: https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/ppc64le/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20151210_1501.el7.ppc64le.tar.gz
IOP-Utils 1.1: https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/ppc64le/1.1/iop-utils-1.1.0.0.el7.ppc64le.tar.gz

TABLE 2 IOP PACKAGES
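Similarly, a sketch for staging the RHEL 7 x86-64 IOP and IOP-Utils tarballs on the mirror repository server:

# cd /var/www/html/repos
# wget https://ibm-open-platform.ibm.com/repos/IOP/rhel/7/x86_64/4.1.x/Updates/4.1.0.0_Spark-1.5.1/IOP-4.1-Spark-1.5.1-20151209_2001.el7.x86_64.tar.gz
# wget https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/iop-utils-1.1.0.0.el7.x86_64.tar.gz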

4.1.5 IBM Spectrum Scale

4.1.5.1 Base package

If you have purchased an IBM Spectrum Scale license, you can download the Spectrum Scale base installation package files from the IBM Passport Advantage web site.

For IBM Spectrum Scale version 4.1.1.7 and later, or version 4.2.0.1 and later, full images are available through Fix Central.

For internal IBM users, customer POCs, and trial licenses, follow the instructions on the Spectrum Scale Sales Wiki Software Evaluation - Spectrum Scale Trial license page.


To order IBM Spectrum Scale, see IBM Spectrum Scale Knowledge Center Question 1.1.

4.1.5.2 Kernel level

Check the installed kernel RPMs. Unlike HDFS, IBM Spectrum Scale is a kernel-level file system and as such integrates tightly with the operating system. This is a critical dependency. Ensure that the environment has matching kernel, kernel-devel, and kernel-headers packages.

The following example uses RHEL 7.1.

1) On all nodes, check the installed kernel RPMs by running the following command:

rpm -qa | grep kernel

2) On all nodes, confirm that the output includes the following:

kernel-headers

kernel-devel

kernel

If any of these kernel RPMs are missing, install them by running the following command:

yum -y install kernel kernel-headers kernel-devel

3) Validate that all of the kernel RPM versions match.

WARNING: Kernels might have been updated after the original operating system installation. Ensure that the active kernel version matches the installed versions of both kernel-devel and kernel-headers.

[root@c902f09x02 tmp]# uname -r
3.10.0-229.el7.x86_64                     <== Find kernel-devel and kernel-headers to match this
[root@c902f09x02 tmp]# rpm -qa | grep kernel
kernel-tools-3.10.0-229.el7.x86_64
kernel-devel-3.10.0-229.el7.x86_64        <== kernel-devel matches
kernel-tools-libs-3.10.0-229.el7.x86_64
kernel-headers-3.10.0-229.el7.x86_64      <== kernel-headers matches
kernel-3.10.0-229.el7.x86_64

If multiple kernels are installed, ensure that only one instance each of kernel, kernel-headers, and kernel-devel is installed. If older kernel packages are installed, remove them (see the sketch below).
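A minimal sketch of the cleanup, where <old-version> is a hypothetical stale kernel version string; never remove the packages that match the running kernel reported by uname -r:

# uname -r
# rpm -qa | grep kernel
# yum -y remove kernel-devel-<old-version> kernel-headers-<old-version> kernel-<old-version>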


4.1.5.3 HDFS Transparency package

IBM Spectrum Scale HDFS Transparency is installed independently of IBM Spectrum Scale and is provided as an RPM or DEB file. HDFS Transparency supports both local and shared storage modes.

Download IBM Spectrum Scale HDFS Transparency from the IBM Spectrum Scale wiki here:

HDFS Transparency Download

The module name is gpfs.hdfs-protocol-2.7.0-(version).

Save this module in the IBM Spectrum Scale repository.

Note: HDFS Transparency version 2.7.0.3 and later requires OpenJDK version 1.8. The GPFS Ambari Integration Module for HDFS Transparency requires HDFS Transparency version 2.7.0.3.

The Ambari-based installer attempts to detect pre-existing HDFS Transparency on each node. If HDFS Transparency is detected, the installer does not overwrite or re-deploy the connector. If IBM Spectrum Scale HDFS Transparency is not detected on any node, the installer deploys the IBM Spectrum Scale HDFS Transparency RPM file on all nodes from the IBM Spectrum Scale repository.

This document refers to the node on which the HDFS Transparency package is installed as the GPFS Transparency Node.

4.1.5.4 GPFS Ambari integration module

The GPFS Ambari integration module is independent of IBM Spectrum Scale and is provided as a separate bin file.

For traditional Hadoop clusters that use HDFS, an HDFS service appears in the Ambari console to provide a graphical management interface for the HDFS configuration (hdfs-site.xml) and the Hadoop cluster itself (core-site.xml). Through the Ambari HDFS service, you can start and stop the HDFS service, make configuration changes, and propagate those changes across the cluster.

When IBM Spectrum Scale replaces HDFS, the Ambari HDFS service is no longer used. The GPFS Ambari integration module, provided as a bin file, creates an Ambari IBM Spectrum Scale service to start, stop, and make configuration changes to IBM Spectrum Scale and HDFS Transparency.

Download the GPFS Ambari integration module from the IBM Spectrum Scale wiki here:

Download - GPFS Ambari Integration Module


The module name is gpfs.hdfs-protocol.ambari-iop_4.1-(version).

WARNING: Do not run this bin file in the /root directory because doing so can cause installation problems.

Note: The GPFS Ambari integration module for HDFS Transparency requires HDFS Transparency version 2.7.0.3.

4.1.5.5 Update package

The latest Spectrum Scale update package files can be obtained from Fix Central. From Fix Central, on the Find product tab, provide the following information for the packages.

For IBM Spectrum Scale 4.2.0:

Product selector: IBM Spectrum Scale
Installed Version: 4.2.0
Platform: Linux 64-bit, x86_64
URL: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.0&platform=Linux+64-bit,x86_64&function=all
Platform: Linux Power PC 64 Little Endian
URL: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.0&platform=Linux+PPC64LE&function=all

For IBM Spectrum Scale 4.1.1:

Product selector: IBM Spectrum Scale
Installed Version: 4.1.1
Platform: Linux 64-bit, x86_64
URL: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=Linux+64-bit,x86_64&function=all
Platform: Linux Power PC 64 Little Endian
URL: http://www-933.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=Linux+PPC64LE&function=all

TABLE 3 IBM SPECTRUM SCALE PACKAGES

4.2 Software installation

This section describes how to plan the installation of BigInsights Ambari IOP, the GPFS Ambari integration module, IBM Spectrum Scale, and HDFS Transparency on an FPO or shared storage (e.g., ESS) system, based on the current IBM Spectrum Scale configuration.

The GPFS Ambari integration module installation process attempts to detect an existing IBM Spectrum Scale file system. For IBM Spectrum Scale FPO, which is a multi-node, just-a-bunch-of-disks (JBOD) configuration, the installer can set up a new clustered file system if the hostnames of all the nodes and the disk devices available at each node are provided in a stanza file. The installer designates manager roles and quorum nodes and creates the NSDs and the file system. The best practices for the Hadoop configuration are automatically applied.

For installation on an existing file system, only the Hadoop integration components for IBM Spectrum Scale are deployed. There is no validation checking of the pre-existing IBM Spectrum Scale configuration.

You can view the current best practices for installation on the Big Data Best Practice wiki page.

In all cases, a local repository for IBM Spectrum Scale is required. Ambari reads from the repository to deploy IBM Spectrum Scale, if it is not already deployed. The following Hadoop integration components are required:

IBM Spectrum Scale HDFS Transparency: Provides an implementation of the Hadoop NameNode and DataNode RPCs, enabling Hadoop applications to use IBM Spectrum Scale as the distributed file system.

GPFS Ambari Integration Module: Enables basic administration of IBM Spectrum Scale and HDFS Transparency within the Ambari console. When installed, an IBM Spectrum Scale service is displayed on the Ambari interface.

TABLE 4 HDFS TRANSPARENCY AND AMBARI INTEGRATION MODULE

For a list of known limitations, see Limitations.

4.2.1 Different installation modes


This section lists the common environment scenarios to help guide users on how to deploy a big data cluster on IBM Spectrum Scale.

Important Notes:

The recommendation is to make all configuration changes through the Ambari GUI and not to set them manually in the HDFS or HDFS Transparency configuration files.

Back up the existing HDFS and HDFS Transparency configuration before deploying Ambari IOP, or the IBM Spectrum Scale service with Ambari, onto a system with HDFS Transparency already installed.

If deploying Ambari IOP onto an existing HDFS Transparency cluster, ensure that the HDFS configuration provided through the Ambari UI is consistent with the existing HDFS configuration; otherwise, the existing HDFS configuration might get overwritten. See the Limitations - Installation section for more information.

In your existing cluster, if the HDFS settings in the HDFS Transparency configuration files were manually changed (for example, settings in core-site, hdfs-site, or log4j.properties in /usr/lpp/mmfs/hadoop/etc/hadoop) and these changes were not propagated into your existing native HDFS configuration files, then during the deployment of Ambari IOP and the Spectrum Scale service the HDFS Transparency configuration will be replaced by the Ambari UI HDFS configuration. Therefore, save any changes that were made only to the HDFS Transparency configuration files so that these values can later be applied through the Ambari GUI.

After the HDFS configuration is modified from the Ambari GUI, always restart the HDFS service first and then restart the Spectrum Scale service. This keeps the HDFS and HDFS Transparency configuration settings in sync and updates the environment properly.

4.2.1.1 Deploy a new Ambari IOP and an HDFS Transparency cluster

Deploy and configure a new IBM Spectrum Scale File Placement Optimizer (FPO) cluster by using Ambari IOP and the IBM Spectrum Scale service. IOP is configured to use IBM Spectrum Scale instead of HDFS. This procedure is the basic recommended installation, based on best-practice policies for a big data cluster on IBM Spectrum Scale.

Installation step sequence

1) Ambari IOP installation

2) Setting up High Availability [HA] [Optional]

3) Adding additional software services [Optional]

4) Install the GPFS integration module into Ambari

5) Adding the IBM Spectrum Scale service to Ambari (installs IBM Spectrum Scale and HDFS Transparency)

Note: The best practice sequence is to install the optional HA and Value Add Modules before installing the GPFS Ambari Integration module and IBM Spectrum Scale Service.


4.2.1.2 Deploy Ambari IOP on an existing IBM Spectrum Scale cluster

Deploy and configure the Ambari IOP and IBM Spectrum Scale service on an existing IBM Spectrum Scale clus-

ter. The pre-created IBM Spectrum Scale filesystem is detected and only the Hadoop integration components

for IBM Spectrum Scale are deployed. The installer will install and configure Hadoop workload on top of the

existing IBM Spectrum Scale without any validation checking on the pre-existing IBM Spectrum Scale configura-

tion. This installation can be an existing FPO or shared storage, such as Elastic Storage Server (ESS) installation.

This installation procedure can be used by advanced users.

Note: This is the only supported option for share storage.

Review the Important Notes under the Different installation modes.

Installation step sequence

1) Ambari IOP installation

2) Install the GPFS integration module into Ambari

3) Adding the IBM Spectrum Scale service to Ambari

4.2.1.3 Add Spectrum Scale service to an existing Ambari IOP and an HDFS Transparency cluster

Deploy the IBM Spectrum Scale service into an existing Ambari IOP cluster that already has IBM Spectrum Scale and HDFS Transparency installed. This integrates IBM Spectrum Scale and HDFS Transparency to be managed by Ambari.

Review the Important Notes under the Different installation modes.

Installation step sequence

1) Install the GPFS integration module into Ambari

2) Adding the IBM Spectrum Scale service to Ambari

4.2.1.4 Configure HA on an existing Ambari IOP and an IBM Spectrum Scale cluster

Configure HA into an existing Ambari IOP and IBM Spectrum Scale cluster with HDFS Transparency.

Installation step sequence

1) On the Ambari dashboard, go to Actions > Stop All.

2) On the Spectrum Scale dashboard, go to Service Actions > Unintegrate_Transparency.

3) On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

4) Log back in to the Ambari GUI.


5) From the Ambari dashboard, go to Actions > Start All.

6) Configure HDFS HA by following Setting up High Availability [HA].

7) On the Ambari dashboard, go to Actions > Stop All.

8) On the Spectrum Scale dashboard, go to Service Actions > Integrate_Transparency.

9) On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

10) Log back in to the Ambari GUI.

11) On the Ambari dashboard, go to Actions > Start All.

4.2.2 Node roles information

Before starting software deployment, there are node role rules that must be taken into consideration:

Note:

The Ambari server must be part of the Ambari cluster and co-located with the GPFS Master.

If Ambari IOP was installed on top of an existing IBM Spectrum Scale and HDFS Transparency environment, the native HDFS NameNode and the HDFS Transparency NameNode must be configured to have the same hostname.

If Ambari IOP was installed on top of an existing IBM Spectrum Scale and HDFS Transparency environment, the GPFS Transparency nodes assigned in the Assign Slaves and Clients page in Ambari must contain the existing HDFS Transparency nodes. Otherwise, the Spectrum Scale service fails to install.

Ensure that all the hosts for the IBM Spectrum Scale cluster contain the same domain name while creating the cluster through Ambari.

Verify that the host names used are the data network addresses that IBM Spectrum Scale uses for its cluster setup. Otherwise, in an existing or shared file system, the IBM Spectrum Scale service fails during installation because of a wrong hostname.

For BigInsights value-add: the ResourceManager, Symphony Master, Spark History Server, and Spark Thrift Server must be configured to run on the same node.

See the Spectrum Scale HDFS Transparency Guide and the BigInsights documentation in the Knowledge Center for additional information on node roles and service layouts.

5. Preparing the environment


5.1 Validating the network

While using a private network for Hadoop data nodes, ensure that all nodes, including the management nodes, have hostnames bound to the faster internal network or the data network.

On all nodes, hostname -f must return the FQDN of the faster internal network. This network can be a bonded network. If a node does not return the FQDN, modify /etc/sysconfig/network and use the hostname command to change the FQDN of the node.

In the /etc/hosts file, the long (fully qualified) hostname must be listed before the short hostname. Otherwise, the HBase service check in Ambari might fail.
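For example, a quick check on a node, assuming the hypothetical data-network FQDN node1.example.com:

# hostname -f
node1.example.com
# grep node1 /etc/hosts
192.168.10.1   node1.example.com   node1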

If the nodes in your cluster have two network adapters, see Dual Network Deployment.

5.2 Setting up password-less for root

IBM Spectrum Scale Master is a role designated to the node on which Ambari is installed and which issues IBM Spectrum Scale commands. Password-less SSH for root from the IBM Spectrum Scale Master node to all other IBM Spectrum Scale nodes must be configured.

Set up password-less SSH access for root.

Before the installation, configure root password-less access from the IBM Spectrum Scale Master node to all other IBM Spectrum Scale nodes. This is required by IBM Spectrum Scale.

The following steps configure password-less access for root:

a. Define Node1 as the IBM Spectrum Scale master.

b. Log on to Node1 as the root user.

# cd /root/.ssh

c. Generate a pair of public authentication keys. Do not type a passphrase; accept the default at every prompt.

# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:

d. Add the public key to the authorized_keys file.

# cd /root/.ssh/; cat id_rsa.pub > authorized_keys


e. Copy the generated key files to each of the other nodes (nodeX).

# scp /root/.ssh/* root@nodeX:/root/.ssh

f. Ensure that the key file permissions are correct.

# ssh root@nodeX "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"

g. Check password-less access.

# ssh node2

[root@node1 ~]# ssh node2
The authenticity of host 'gpfstest9 (192.168.10.9)' can't be established.
RSA key fingerprint is 03:bc:35:34:8c:7f:bc:ed:90:33:1f:32:21:48:06:db.
Are you sure you want to continue connecting (yes/no)? yes

Note: You also need to run ssh node1 so that its key is added to /root/.ssh/known_hosts for password-less access.
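To verify password-less access from the GPFS Master to every node in one pass (a sketch, assuming hypothetical node names node1 through node3), BatchMode makes ssh fail instead of prompting when a key is missing:

# for h in node1 node2 node3; do ssh -o BatchMode=yes root@$h hostname -f; done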

5.3 Pre-tasks preparation for IOP

Before installing IBM Open Platform (IOP), pre-installation tasks must be performed on each node. These pre-installation tasks are the same whether Hadoop is installed with HDFS or with IBM Spectrum Scale.

1. Perform the steps listed in Preparing Your Environment: https://ibm.biz/BdHGfR

NOTE: While creating an Ambari IOP cluster, do not create a local partition file system to be used for HDFS if IBM Spectrum Scale will be used. Instead, set a dummy directory or mount point while setting up the HDFS cluster.

2. Pre-create the Hadoop service IDs and groups according to https://ibm.biz/BdHGfX.

If you are using LDAP, create the IDs and groups on the LDAP server and ensure that all nodes can authenticate the users.

If you are using local IDs, the IDs must be pre-created on all nodes with the same ID and group values across the nodes.

For example:

groupadd --gid 1000 hadoop
groupadd --gid 1016 rddcached    #optionally align rddcached GID with UID
useradd -g hadoop -u 1001 ams
useradd -g hadoop -u 1002 hive
useradd -g hadoop -u 1003 oozie
useradd -g hadoop -u 1004 ambari-qa
useradd -g hadoop -u 1005 flume
useradd -g hadoop -u 1006 hdfs
useradd -g hadoop -u 1007 solr
useradd -g hadoop -u 1008 knox
useradd -g hadoop -u 1009 spark
useradd -g hadoop -u 1010 mapred
useradd -g hadoop -u 1011 hbase
useradd -g hadoop -u 1012 zookeeper
useradd -g hadoop -u 1013 sqoop
useradd -g hadoop -u 1014 yarn
useradd -g hadoop -u 1015 hcat
useradd -g rddcached -u 1016 rddcached    #optionally align rddcached GID with UID
useradd -g hadoop -u 1017 kafka

NOTE: UIDs and GIDs are the common way for a Linux system to control access for users and groups. For example, if the user yarn with UID=100 on node1 generates data and the user yarn with UID=200 on node2 wants to read this data, the read fails because of permission issues. Keeping UIDs and GIDs consistent across all nodes is important to avoid unexpected issues. For the initial installation through Ambari, the UIDs and GIDs of users are consistent across all nodes. However, if you deploy the cluster a second time, the UIDs or GIDs of these users might become inconsistent across nodes, as per the AMBARI-10186 issue that was reported to the Ambari community. After deployment, check whether the UIDs are consistent across all nodes. If they are not, fix them by running the following commands on each node, for each user or group that must be fixed:

##### Change UID of one account
usermod -u <NEWUID> <USER>
##### Change GID of one group
groupmod -g <NEWGID> <GROUP>
##### Update all files with old UID to new UID
find / -user <OLDUID> -exec chown -h <NEWUID> {} \;
##### Update all files with old GID to new GID
find / -group <OLDGID> -exec chgrp -h <NEWGID> {} \;
##### Update GID of one account
usermod -g <NEWGID> <USER>
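A quick consistency check (a sketch, assuming password-less SSH and hypothetical node names node1 through node3); the uid and gid values printed for each node must match:

# for h in node1 node2 node3; do echo -n "$h: "; ssh root@$h id yarn; done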

5.4 Yum repositories setup


IBM Open Platform (IOP) and BigInsights support installation by reading from the IBM-hosted Yum repositories or from local mirror repositories. Reading from local mirror repositories is faster for multi-node clusters because each node performs its own download of repository code.

IBM Spectrum Scale supports installation only through a local repository.

5.4.1 Ambari and IOP repository

Log in to the mirror repository server as root and extract the Ambari and IOP repository tarballs into the repos directory under <document root>, for example /var/www/html/repos.

For each of the tarballs downloaded in the Software packages section, run the following commands:

# cd /var/www/html/repos
# tar xzvf <path to downloaded tarball>

The result should be three subdirectories under /var/www/html/repos, one for each extracted tarball.

[root@c902mnx11 repos]# pwd; ls -ltr
/var/www/html/repos
total 2717316
-rw-r--r-- 1 root root  560685051 Jun  8 14:29 BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.x86_64.tar.gz
-rw-r--r-- 1 root root 2101943123 Jun  8 14:30 IOP-4.1-Spark-1.5.1-20151209_2001.el7.x86_64.tar.gz
-rw-r--r-- 1 root root  119902006 Jun  8 14:30 iop-utils-1.2.0.0.el7.x86_64.tar.gz
[root@c902mnx11 repos]#

The following example uses RHEL 7.1.

IOP # cd /var/www/html/repos # tar xzvf IOP-4.1-Spark-1.5.1-20151209_2001.el7.x86_64.tar.gz IOP/RHEL7/x86_64/4.1-Spark-1.5.1/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/BI-GPG-KEY.public IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-jsvc/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-jsvc/x86_64/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-jsvc/x86_64/bigtop-jsvc-1.0.15-3.el7.x86_64.rpm IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-jsvc/x86_64/bigtop-jsvc-debuginfo-1.0.15-3.el7.x86_64.rpm IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-jsvc/bigtop-jsvc-1.0.15-3.el7.src.rpm IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-tomcat/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-tomcat/bigtop-tomcat-6.0.36-3.el6.noarch.rpm IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-utils/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-utils/noarch/ IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-utils/noarch/bigtop-utils-0.9.0-3.el7.noarch.rpm


IOP/RHEL7/x86_64/4.1-Spark-1.5.1/bigtop-utils/bigtop-utils-0.9.0-3.el7.src.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/flume/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/flume/noarch/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/flume/noarch/flume_4_1_0_0-1.5.2_IBM_7.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/flume/noarch/flume_4_1_0_0-agent-1.5.2_IBM_7.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/flume/flume_4_1_0_0-1.5.2_IBM_7.4.1.0.0-3.el7.src.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/hadoop/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/hadoop/x86_64/
…
…
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/sqoop/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/sqoop/noarch/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/sqoop/noarch/sqoop_4_1_0_0-1.4.6_IBM_20.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/sqoop/noarch/sqoop_4_1_0_0-metastore-1.4.6_IBM_20.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/sqoop/sqoop_4_1_0_0-1.4.6_IBM_20.4.1.0.0-3.el7.src.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/noarch/
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/noarch/zookeeper_4_1_0_0-3.4.6_IBM_3.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/noarch/zookeeper_4_1_0_0-server-3.4.6_IBM_3.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/noarch/zookeeper_4_1_0_0-rest-3.4.6_IBM_3.4.1.0.0-3.el7.noarch.rpm
IOP/RHEL7/x86_64/4.1-Spark-1.5.1/zookeeper/zookeeper_4_1_0_0-3.4.6_IBM_3.4.1.0.0-3.el7.src.rpm
#

IOP-UTILS

# cd /var/www/html/repos
# tar xzvf iop-utils-1.1.0.0.el7.x86_64.tar.gz
IOP-UTILS/
IOP-UTILS/rhel/
IOP-UTILS/rhel/7/
IOP-UTILS/rhel/7/x86_64/
IOP-UTILS/rhel/7/x86_64/1.1/
IOP-UTILS/rhel/7/x86_64/1.1/perl/
IOP-UTILS/rhel/7/x86_64/1.1/perl/perl-Net-SNMP-5.2.0-4.el6.noarch.rpm
IOP-UTILS/rhel/7/x86_64/1.1/perl/perl-Crypt-DES-2.05-9.el6.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/openjdk/
IOP-UTILS/rhel/7/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz
IOP-UTILS/rhel/7/x86_64/1.1/openjdk/jdk-1.7.0.tar.gz
IOP-UTILS/rhel/7/x86_64/1.1/repodata/
IOP-UTILS/rhel/7/x86_64/1.1/repodata/3846a102ac3f8434148da3a3b47c70d105e219e76711614a7a93da917ea1f71a-primary.xml.gz
IOP-UTILS/rhel/7/x86_64/1.1/repodata/4c9d17ac7e9f82ef0976b1da08206e3e903aa73145c89f5a50dceb12bd648a71-filelists.sqlite.bz2
IOP-UTILS/rhel/7/x86_64/1.1/repodata/e06ea6690d0d632751cd90f4b5c00e7e513588d65a097a0497df8dcb5baa6291-other.xml.gz


IOP-UTILS/rhel/7/x86_64/1.1/repodata/0282066f5534b5bd7e10881b7150f540150935db015a6a05ee6d7eeb67337c5a-filelists.xml.gz
IOP-UTILS/rhel/7/x86_64/1.1/repodata/460a70532a7371926dd429398b78219e991efb08f471dc6be4f81a7de3c91860-primary.sqlite.bz2
IOP-UTILS/rhel/7/x86_64/1.1/repodata/7ff301750a65c33e70bfed08ae695fbf52f5f83c3eca74b14fa6facfec5640ab-other.sqlite.bz2
IOP-UTILS/rhel/7/x86_64/1.1/repodata/repomd.xml
IOP-UTILS/rhel/7/x86_64/1.1/corresponding source.txt
IOP-UTILS/rhel/7/x86_64/1.1/libconfuse/
IOP-UTILS/rhel/7/x86_64/1.1/libconfuse/libconfuse-2.7-4.el6.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/rrdtool/
IOP-UTILS/rhel/7/x86_64/1.1/rrdtool/perl-rrdtool-1.4.5-1.el6.rfx.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/rrdtool/python-rrdtool-1.4.5-1.el6.rfx.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/rrdtool/rrdtool-1.4.5-1.el6.rfx.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/fping/
IOP-UTILS/rhel/7/x86_64/1.1/fping/fping-2.4b2-10.el6.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/hadoop-lzo/
IOP-UTILS/rhel/7/x86_64/1.1/hadoop-lzo/hadoop-lzo-native-0.5.1-1.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/hadoop-lzo/hadoop-lzo-0.5.1-1.x86_64.rpm
IOP-UTILS/rhel/7/x86_64/1.1/extjs/
IOP-UTILS/rhel/7/x86_64/1.1/extjs/extjs-2.2_IBM_1-1.noarch.rpm
#

Ambari

# cd /var/www/html/repos
# tar xzvf BI-AMBARI-2.1.0-Spark-1.5.1-20160105_1212.el7.x86_64.tar.gz
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-agent-2.1.0_IBM-5.x86_64.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-log4j-2.1.0_IBM_8.noarch.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-metrics-collector-2.1.0_IBM-5.x86_64.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-metrics-hadoop-sink-2.1.0_IBM-5.x86_64.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-metrics-monitor-2.1.0_IBM-5.x86_64.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari.repo
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/ambari-server-2.1.0_IBM-5.x86_64.rpm
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/BI-GPG-KEY.public
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/545405d415128b69143de2ba2ae6c8b1c0934781887ba1e8f9946d9f05074636-other.sqlite.bz2
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/785a52bdb2149dd090c69e7b3c19205e9c7a9b044ececbf516f505b9d64784ec-other.xml.gz
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/ad34e85955f3dac4182742e244fc34e03fbfc40c45c4e3b59bcea18cf5c1c20f-filelists.sqlite.bz2
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/d2d1c5801f00dc84ce56b5b7ec2bc8778d422873bf0603aea2891f9e624f58f4-filelists.xml.gz
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/4e64b640b3553bd26b660aa01c7f93993c596fbc77548ac061b803baf728a55c-primary.sqlite.bz2
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/a88bfd85720c0777feec38685610bb76116e134d52cb6a3330e077e3b837aaa4-primary.xml.gz
Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/repomd.xml
#


URLs for each Yum repository:

IOP:       http://<YUM-Server>/repos/IOP/RHEL7/x86_64/4.1-Spark-1.5.1
IOP-UTILS: http://<YUM-Server>/repos/IOP-UTILS/rhel/7/x86_64/1.1
Ambari:    http://<YUM-Server>/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1
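Before continuing, you can optionally confirm that each repository is being served by fetching its metadata; <YUM-Server> is your mirror host as above, and each request should return HTTP 200:

# Check that the repository metadata is reachable over HTTP
curl -sI http://<YUM-Server>/repos/IOP/RHEL7/x86_64/4.1-Spark-1.5.1/repodata/repomd.xml | head -1
curl -sI http://<YUM-Server>/repos/IOP-UTILS/rhel/7/x86_64/1.1/repodata/repomd.xml | head -1
curl -sI http://<YUM-Server>/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1/repodata/repomd.xml | head -1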

5.4.2 Setting up the IBM Spectrum Scale repository

Note: If you have already set up an IBM Spectrum Scale file system, you can skip this section.

Perform the following steps only if you are deploying IBM Open Platform (IOP) with IBM Spectrum Scale Advanced Edition. If you are using Ambari to install IBM Spectrum Scale, use the Standard or Advanced Edition of IBM Spectrum Scale.

IBM Spectrum Scale Express Edition can be used only if it is installed and configured manually before installing Ambari and IOP. The following list of RPM packages for IBM Spectrum Scale v4.1.1 and later can help verify the edition of IBM Spectrum Scale.

IBM Spectrum Scale Edition   RPM package list
Express Edition              gpfs.base, gpfs.gpl, gpfs.docs, gpfs.gskit, gpfs.msg.en_US, gpfs.platform
Standard Edition             <Express Edition rpm list> + gpfs.ext
Advanced Edition             <Standard Edition rpm list> + gpfs.crypto

For the IBM Spectrum Scale 4.2 release: add gpfs.adv to the lists above.

TABLE 5 IBM SPECTRUM SCALE EDITIONS

The following example uses IBM Spectrum Scale version 4.2.0.3.

1. On the repository web server, create a directory for your IBM Spectrum Scale repos, such as <document root>/repos/GPFS. For Apache httpd with document root /var/www/html, type the following command:


mkdir -p /var/www/html/repos/GPFS

2. Obtain the IBM Spectrum Scale software. If you have already installed IBM Spectrum Scale manually, skip this step. Download the IBM Spectrum Scale package (see Base package). In the following example, IBM Spectrum Scale 4.2.0.3 is downloaded from Fix Central and the self-extracting installer is run.

For example, as root or a user with sudo privileges, run the installer to extract the IBM Spectrum Scale packages into a user-specified directory via the --dir option:

chmod +x Spectrum_Scale_Advanced-4.2.0.3-x86_64-Linux-install

./Spectrum_Scale_Advanced-4.2.0.3-x86_64-Linux-install --dir /var/www/html/repos/GPFS --silent

Note: The --silent option accepts the software license agreement, and the --dir option places the IBM Spectrum Scale RPMs into the directory /var/www/html/repos/GPFS. Without the --dir option, the default location is /usr/lpp/mmfs/4.2.X.

3. If the packages are extracted into the IBM Spectrum Scale default directory, /usr/lpp/mmfs/4.2.X, copy all the IBM Spectrum Scale files that are required for your installation environment into the IBM Spectrum Scale repository path:

cd /usr/lpp/mmfs/4.2.X

cp gpfs*.rpm /var/www/html/repos/GPFS

4. Ensure that the directory does not contain any optional IBM Spectrum Scale packages that you do not want to be installed. See the IBM Spectrum Scale Installation Guide for more information on base and optional packages. Remove OS packages that are not relevant to your environment. Ambari requires only the following packages:

gpfs.base

gpfs.gpl

gpfs.docs

gpfs.gskit

gpfs.msg.en_US

gpfs.ext

gpfs.crypto (if Advanced edition is used)

gpfs.adv (if IBM Spectrum Scale 4.2 Advanced edition is used)

For example, in an x86-64 RHEL 7 environment, the repository must contain only packages that pertain to that OS, or noarch packages.


# pwd; ls
/var/www/html/repos/GPFS
gpfs.adv-4.2.0-3.x86_64.rpm
gpfs.base-4.2.0-3.x86_64.rpm
gpfs.callhome-4.2.0-1.001.noarch.rpm
gpfs.callhome-ecc-client-4.2.0-1.000.noarch.rpm
gpfs.callhome-jre-8.0-2.0.x86_64.rpm
gpfs.crypto-4.2.0-3.x86_64.rpm
gpfs.docs-4.2.0-3.noarch.rpm
gpfs.ext-4.2.0-3.x86_64.rpm
gpfs.gpl-4.2.0-3.noarch.rpm
gpfs.gskit-8.0.50-47.x86_64.rpm
gpfs.gss.pmcollector-4.2.0-3.el7.x86_64.rpm
gpfs.gss.pmsensors-4.2.0-3.el7.x86_64.rpm
gpfs.gui-4.2.0-3.el7.x86_64.rpm
gpfs.gui-4.2.0-3.sles12.x86_64.rpm
gpfs.msg.en_US-4.2.0-3.noarch.rpm
license
manifest
repodata
Spectrum_Scale_Advanced-4.2.0.3-x86_64-Linux-install

5. Remove the old connector gpfs.hadoop-connector file from the repository (e.g. /var/www/html/repos/GPFS) if it exists.

cd /var/www/html/repos/GPFS
rm gpfs.hadoop-connector*.rpm

6. Copy the HDFS Transparency package into the IBM Spectrum Scale repo path.

cp gpfs.hdfs-protocol-2.7.0-(version) /var/www/html/repos/GPFS

7. Check for IBM Spectrum Scale packages in the /root/ directory. If any packages exist, relocate them to a subdirectory. There are known issues with IBM Spectrum Scale packages in /root that cause the Ambari installation to fail.

8. Create the YUM repository:

createrepo /var/www/html/repos/GPFS/

# cd /var/www/html/repos/GPFS/
# createrepo .
Spawning worker 0 with 4 pkgs
Spawning worker 1 with 4 pkgs
Spawning worker 2 with 4 pkgs
Spawning worker 3 with 4 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
#

9. Access the repository at http://<YUM-Server>/repos/GPFS.
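As a quick sanity check (this test repo definition is an example for verification only, not part of the deployment), you can point a node at the new repository and list the GPFS packages:

# /etc/yum.repos.d/gpfs-test.repo (example definition)
[GPFS-TEST]
name=GPFS-TEST
baseurl=http://<YUM-Server>/repos/GPFS
enabled=1
gpgcheck=0

# The gpfs.* packages copied above should be listed
yum --disablerepo='*' --enablerepo=GPFS-TEST list available | grep gpfs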

Page 32: Deploying BigInsights 4.1 IBM Spectrum Scale HDFS Transparency

32/134

5.4.3 Setting up the OS repository

Because some of the IBM Spectrum Scale RPMs have operating system package dependencies that must be resolvable on all nodes, you must create the operating system repository.

1. Create the repository path:

mkdir /var/www/html/repos/<rhel_OSlevel>

2. Synchronize the local directory with the current Yum repository:

cd /var/www/html/repos/<rhel_OSlevel>

Run the following:

reposync --gpgcheck -l --repoid=rhel-7-server-rpms --download_path=/var/www/html/repos/<rhel_OSlevel>

3. Create the repository for this node:

createrepo -v /var/www/html/repos/<rhel_OSlevel>

4. Ensure that all the firewalls are disabled or that the httpd service port is open, because Yum uses HTTP to get the packages from the repository (see the example commands below).
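On RHEL 7, for example, you can either open the http service in firewalld or disable the firewall entirely, depending on your site policy (standard firewalld/systemd commands):

# Option 1: open the http service port permanently
firewall-cmd --permanent --add-service=http
firewall-cmd --reload

# Option 2: disable the firewall (only where site policy permits)
systemctl stop firewalld
systemctl disable firewalld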

5. On all nodes in the cluster that require the repositories, create a file in /etc/yum.repos.d called local_<rhel_OSlevel>.repo.

6. Copy this file to all nodes. The contents of this file must look like the following:

[local_rhel7.1]
name=local_rhel7.1
enabled=yes
baseurl=http://<internal IP that all nodes can reach>/repos/<rhel_OSlevel>
gpgcheck=no

7. Run yum repolist to verify the repository. The nodes can now run yum install for RPMs without an external connection.

5.5 IBM Spectrum Scale deployment modes

IBM Spectrum Scale can be deployed in three different modes. Follow the steps that pertain to your file system setup requirements.

Modes:

IOP over an existing IBM Spectrum Scale file system (FPO)

IOP over an existing IBM Spectrum Scale file system (ESS)

IOP over a new IBM Spectrum Scale cluster (FPO support only)

5.5.1 Deploy IOP over an existing IBM Spectrum Scale file system (FPO)

1. Ensure that IBM Spectrum Scale is set to automount on reboot by running the following command:

/usr/lpp/mmfs/bin/mmchfs <device> -A yes

2. In the console of any one node in the IBM Spectrum Scale cluster, start the IBM Spectrum Scale cluster by running the following command:

/usr/lpp/mmfs/bin/mmstartup -a

3. Mount the file system over all nodes by running the following command:

/usr/lpp/mmfs/bin/mmmount <fs-name> -a
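Before continuing, you can confirm the daemon state and the mounts; <fs-name> is the file system used above:

# Every node should report "active"
/usr/lpp/mmfs/bin/mmgetstate -a

# The file system should be listed as mounted on all nodes
/usr/lpp/mmfs/bin/mmlsmount <fs-name> -L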

4. Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

5. If you have not started the IBM Spectrum Scale cluster but are already on the Ambari Assign Slaves and Clients page, click the Previous button to go back to the Assign Masters page in Ambari. Then start the IBM Spectrum Scale cluster and mount the file system on all the nodes. Go back to the Ambari GUI to continue to the Assign Slaves and Clients page.

Ambari detects the mounted file system and reflects it in the Custom Service page for IBM Spectrum Scale.

For a pre-created IBM Spectrum Scale cluster:

An IBM Spectrum Scale NSD stanza file is not required because the filesystem already exists. Because Ambari does not allow a blank value, leave the default value of the IBM Spectrum Scale NSD stanza file parameter unchanged.

5.5.2 Deploy IOP over an existing IBM Spectrum Scale file system (ESS)

1. Start ESS and set up password-less SSH login from the Ambari server to one of the nodes in the ESS IBM Spectrum Scale cluster.

2. Ensure that IBM Spectrum Scale is set to automount on reboot by running the following command:

/usr/lpp/mmfs/bin/mmchfs <device> -A yes


3. Ensure that the IBM Spectrum Scale cluster is started on all nodes by running the following command:

/usr/lpp/mmfs/bin/mmstartup -a

4. Ensure that the IBM Spectrum Scale filesystem is mounted on all nodes by running the following command:

/usr/lpp/mmfs/bin/mmmount <fs-name> -a

5. Create a shared node information file for the ESS cluster, name it /var/lib/ambari-server/resources/shared_gpfs_node.cfg, and save it on the Ambari server. This file must contain only one hostname, which is the hostname of a node in the ESS cluster. Ambari uses this one node to join the ESS cluster. Password-less SSH must be configured from the Ambari server to this node.

# cat /var/lib/ambari-server/resources/shared_gpfs_node.cfg
smn-dat.ibm.com

6. Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

7. If you have not started the IBM Spectrum Scale cluster but are already on the Ambari Assign Slaves and Clients page, click the Previous button to go back to the Assign Masters page in Ambari. Then start the IBM Spectrum Scale cluster and mount the file system on all the nodes. Go back to the Ambari GUI to continue to the Assign Slaves and Clients page.

Ambari automatically detects the mounted file system and reflects it in the Custom Service page for IBM Spectrum Scale.

8. In this mode, Ambari can create local cache disks for Hadoop usage. Create the following file:

[root@compute000 GPFS]# cat /var/lib/ambari-server/resources/hadoop_disk
DISK|compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
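Because the DISK lines differ only in hostname, a small loop can generate this file; the hostnames, domain, and device list below are the example values from this cluster:

# Sketch: generate one DISK line per compute node (adjust names and devices)
DEVS=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
for h in compute001 compute002 compute003 compute005 compute006; do
    echo "DISK|${h}.private.dns.zone:${DEVS}"
done > /var/lib/ambari-server/resources/hadoop_disk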


9. Enter the file name in the Hadoop local cache disk stanza file field on the Custom Service page.

FIGURE 1 AMBARI IBM SPECTRUM SCALE HADOOP LOCAL CACHE FILE STANZA

Note: If you are not using shared storage, you do not need this configuration and you can leave this parameter unchanged in the Ambari GUI.

For a pre-created IBM Spectrum Scale cluster:

An IBM Spectrum Scale NSD stanza file is not required because the filesystem already exists. Because Ambari does not allow a blank value, leave the default value of the IBM Spectrum Scale NSD stanza file parameter unchanged.

5.5.3 Deploy IOP over new IBM Spectrum Scale file system (FPO support only)

To deploy IOP on a new IBM Spectrum Scale FPO cluster, complete the following setup steps:

1. Prepare an IBM Spectrum Scale NSD stanza file

Two types of NSD files are supported for file system auto-creation: the preferred simple format, and the standard IBM Spectrum Scale NSD file format for IBM Spectrum Scale experts.

If a simple NSD file is used, Ambari selects the proper metadata and data ratio for you. If possible, Ambari creates partitions on some disks for Hadoop intermediate data, which improves Hadoop performance.

If the standard IBM Spectrum Scale NSD file is used, administrators are responsible for the storage space arrangement. A policy file is also required when the standard IBM Spectrum Scale NSD file is used.


See Preparing a stanza File and add the gpfs_nsd file to /var/lib/ambari-server/resources/ on the Ambari server node.

2. Apply the partition algorithm

Apply the algorithm for system pool and usage.

3. Apply the failure group selection rule

Failure groups are created depending on the rack location of the node.

4. Define the Rack mapping file

Nodes can be defined to belong to racks.

5. Partition the function matrix

One disk is divided into two partitions so that one partition can be used for ext3/ext4 to store the map/reduce intermediate data, while the other partition is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned; metadata disks cannot be partitioned.

For more information on each of the setup points, see Preparing a stanza File and IBM Spectrum Scale-FPO Deployment.

6. Installation of a software stack

6.1 Overview

Before starting the software deployment, review the Planning, IBM Spectrum Scale Deployment Modes, and Limitations sections.

Software                             Version
IBM BigInsights Ambari package       4.1.0.2
IBM Open Platform (IOP) for Hadoop   IOP 4.1.0.2 (introduced SLES support), IOP-UTILS 1.1
IBM Spectrum Scale                   4.1.1.7 or 4.2.0.3
HDFS Transparency                    2.7.0-3
GPFS Ambari integration module       4.1-0

Note:

To install the IBM Spectrum Scale service, an existing HDFS cluster is required. This can be created by installing the BI Ambari IOP stack with native HDFS first.

HDFS Transparency version 2.7.0.3 and later requires OpenJDK version 1.8

The GPFS Ambari integration module for HDFS Transparency requires HDFS Transparency version 2.7.0.3

6.2 Ambari IOP installation

If you have an IBM Spectrum Scale and HDFS Transparency cluster installed, review the Important Notes under the Different installation modes.

Note: To configure High Availability [HA], set up Ambari IOP for HDFS before configuring HDFS HA and proceeding to integrate the IBM Spectrum Scale integration module. See Setting up High Availability [HA] for more information.

6.2.1 Install the Ambari Server RPM

1. Log on to the Ambari server and create the Ambari YUM repo file, ambari.repo. In this example, the Ambari server is c902f09x02.

In the ambari.repo file, replace the hostname with the Ambari server hostname and use the appropriate value for the base URL of the local repository that was configured previously.

WARNING: Verify that the repository can be reached over http:// or https:// before continuing.

Add the ambari.repo file on the Ambari-server only.

Note: If you plan to install the Symphony service later, read the Symphony Integration limitation section on how to set up the ambari.repo file before proceeding.

[root@c902f09x02 ~]# cat /etc/yum.repos.d/ambari.repo
[BI_AMBARI-2.1.0]
name=ambari-2.1.0
baseurl=http://<Yum-Server>/repos/Ambari/rhel/7/x86_64/2.1.x/Updates/2.1.0_Spark-1.5.1
enabled=1
gpgcheck=0
[root@c902f09x02 ~]# yum clean all; yum makecache


2. Use Yum to install the ambari-server rpm:

yum -y install ambari-server

[root@c902f09x02 ~]# yum -y install ambari-server
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package ambari-server.x86_64 0:2.1.0_IBM-5 will be installed
--> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server-2.1.0_IBM-5.x86_64
--> Running transaction check
---> Package postgresql-server.x86_64 0:9.2.7-1.el7 will be installed
--> Processing Dependency: postgresql(x86-64) = 9.2.7-1.el7 for package: postgresql-server-9.2.7-1.el7.x86_64
--> Processing Dependency: postgresql-libs(x86-64) = 9.2.7-1.el7 for package: postgresql-server-9.2.7-1.el7.x86_64
--> Processing Dependency: libpq.so.5()(64bit) for package: postgresql-server-9.2.7-1.el7.x86_64
--> Running transaction check
---> Package postgresql.x86_64 0:9.2.7-1.el7 will be installed
---> Package postgresql-libs.x86_64 0:9.2.7-1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================
 Package             Arch     Version       Repository            Size
============================================================================
Installing:
 ambari-server       x86_64   2.1.0_IBM-5   BI_AMBARI-2.1.0       381 M
Installing for dependencies:
 postgresql          x86_64   9.2.7-1.el7   xCAT-rhels7.1-path0   2.9 M
 postgresql-libs     x86_64   9.2.7-1.el7   xCAT-rhels7.1-path0   229 k
 postgresql-server   x86_64   9.2.7-1.el7   xCAT-rhels7.1-path0   3.8 M

Transaction Summary
============================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 387 M
Installed size: 439 M
Downloading packages:
(1/4): postgresql-9.2.7-1.el7.x86_64.rpm        | 2.9 MB 00:00:00
(2/4): postgresql-libs-9.2.7-1.el7.x86_64.rpm   | 229 kB 00:00:00
(3/4): postgresql-server-9.2.7-1.el7.x86_64.rpm | 3.8 MB 00:00:00
(4/4): ambari-server-2.1.0_IBM-5.x86_64.rpm     | 381 MB 00:00:06
----------------------------------------------------------------------------
Total                                  62 MB/s  | 387 MB 00:00:06
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction


  Installing : postgresql-libs-9.2.7-1.el7.x86_64     1/4
  Installing : postgresql-9.2.7-1.el7.x86_64          2/4
  Installing : postgresql-server-9.2.7-1.el7.x86_64   3/4
  Installing : ambari-server-2.1.0_IBM-5.x86_64       4/4
  Verifying  : postgresql-server-9.2.7-1.el7.x86_64   1/4
  Verifying  : postgresql-libs-9.2.7-1.el7.x86_64     2/4
  Verifying  : ambari-server-2.1.0_IBM-5.x86_64       3/4
  Verifying  : postgresql-9.2.7-1.el7.x86_64          4/4

Installed:
  ambari-server.x86_64 0:2.1.0_IBM-5

Dependency Installed:
  postgresql.x86_64 0:9.2.7-1.el7  postgresql-libs.x86_64 0:9.2.7-1.el7  postgresql-server.x86_64 0:9.2.7-1.el7

Complete!
[root@c902f09x02 ~]#

6.2.2 Update the Ambari Configuration

1. Update the Ambari configuration file to use the local repository.

If a cloned local Yum repository is used, the Ambari configuration file must be updated before setting up the Ambari server.

Update the value of openjdk1.8.url and openjdk1.7.url in /etc/ambari-server/conf/ambari.properties.

Specify the hostname of the local repository server. Also check the protocol type (http vs. https).

vi /etc/ambari-server/conf/ambari.properties

[root@c902f09x02 ~]# cat /etc/ambari-server/conf/ambari.properties | grep openjdk | grep .url
openjdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/openjdk/jdk-1.7.0.tar.gz
openjdk1.8.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz
[root@c902f09x02 ~]#

2. The size of the thread pools must be increased to match the number of CPUs on the node on which the Ambari server is running.

The thread pools that must be modified are:

server.execution.scheduler.maxThreads
client.threadpool.size.max
agent.threadpool.size.max

For example, if you have 16 CPUs, edit the /etc/ambari-server/conf/ambari.properties file to change the thread values:

Page 40: Deploying BigInsights 4.1 IBM Spectrum Scale HDFS Transparency

40/134

vi /etc/ambari-server/conf/ambari.properties

# Find the number of CPUs on the node
[root@c902f09x02 ~]# nproc
16

# Update the ambari.properties file thread values to the nproc value
[root@c902f09x02 ~]# grep -i thread /etc/ambari-server/conf/ambari.properties
server.execution.scheduler.maxThreads=16
# thread pool maximums
client.threadpool.size.max=16
agent.threadpool.size.max=16
[root@c902f09x02 ~]#
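As a sketch, all three values can be set in one pass with sed; this assumes the keys already exist in ambari.properties, so back the file up first:

# Set the three thread pool values to the CPU count in one pass
NPROC=$(nproc)
cp /etc/ambari-server/conf/ambari.properties /etc/ambari-server/conf/ambari.properties.bak
sed -i \
    -e "s/^server.execution.scheduler.maxThreads=.*/server.execution.scheduler.maxThreads=${NPROC}/" \
    -e "s/^client.threadpool.size.max=.*/client.threadpool.size.max=${NPROC}/" \
    -e "s/^agent.threadpool.size.max=.*/agent.threadpool.size.max=${NPROC}/" \
    /etc/ambari-server/conf/ambari.properties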

3. Update the Ambari repos file.

The repoinfo.xml file defines the locations of the software repositories that Ambari uses to install IOP on the cluster. This file must be updated by editing the baseurl for each repository before starting the Ambari server.

Update the correct IOP and IOP-UTILS URLs to be displayed on the Advanced Repository Options page under the Select Stack step:

vi /var/lib/ambari-server/resources/stacks/BigInsights/4.1/repos/repoinfo.xml

<?xml version="1.0"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<reposinfo>
  <mainrepoid>IOP-4.1</mainrepoid>
  <os family="redhat7">
    <repo>
      <baseurl>http://birepo-build.svl.ibm.com/repos/IOP/RHEL7/x86_64/4.1-Spark-1.5.1</baseurl>
      <repoid>IOP-4.1</repoid>
      <reponame>IOP</reponame>


    </repo>
    <repo>
      <baseurl>http://birepo-build.svl.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1</baseurl>
      <repoid>IOP-UTILS-1.1</repoid>
      <reponame>IOP-UTILS</reponame>
    </repo>
  </os>
</reposinfo>

4. Update services configuration files.

Update the Spark eventlog permission.

Update the Spark params.py file in the Ambari server stack definition so that the Spark history services can be started. The parameter spark_eventlog_dir_mode is 01777 by default, which causes permission issues when the Spark History Service is started. The workaround is to change the value to 0777. This change can be made at any time before or after the initial deployment.

vi /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/SPARK/package/scripts/params.py

70 spark_hdfs_user_dir = format("/user/{spark_user}")
71 spark_hdfs_user_mode = 0755
72 spark_eventlog_dir_mode = 0777
73 spark_jar_hdfs_dir = "/iop/apps/4.1.0.0/spark/jars"
74 spark_jar_hdfs_dir_mode = 0755
75 spark_jar_file_mode = 0444
76 spark_jar_src_dir = "/usr/iop/current/spark-client/lib"
77 spark_jar_src_file = "spark-assembly.jar"

Note: If you have already set up Ambari, restart the Ambari server after making this change.

Update the HIVE permission.

Update the hive.py file in the Ambari server stack definition to change the permission of Hive's data warehouse directory.

The data warehouse directory is specified by hive.metastore.warehouse.dir. The default directory is /apps/hive/warehouse. When the Hive service is started, the permission of this directory is reset to 770 (rwxrwx---) and the directory is owned by hive:hadoop. Therefore, users from other groups cannot access the directory and thus cannot create any Hive database or table under the warehouse. Make the following change to allow other users to create Hive databases and tables:

vi /var/lib/ambari-server/resources/stacks/BigInsights/4.0/services/HIVE/package/scripts/hive.py

171 params.HdfsResource(params.hive_apps_whs_dir,
172                     type="directory",
173                     action="create_on_execute",
174                     owner=params.hive_user,
175                     group=params.user_group,
176                     mode=0777
177 )

This change can also be made after the initial deployment; the Ambari server must then be restarted for it to take effect.

6.2.3 Setting up the Ambari server

Run the setup command to configure your Ambari Server, Database, JDK, LDAP, and other options:

ambari-server setup

[root@c902f09x02 ~]# ambari-server setup
Using python /usr/bin/python2.7
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)? n
Adjusting ambari-server permissions and ownership...
Checking firewall status...
Redirecting to /bin/systemctl status iptables.service
Checking JDK...
[1] OpenJDK 1.8.0
[2] OpenJDK 1.7.0 (deprecated)
[3] Custom JDK
==============================================================================
Enter choice (1): 1
Downloading JDK from http://c902mnx11.pok.stglabs.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1/openjdk/jdk-1.8.0.tar.gz to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
jdk-1.8.0.tar.gz... 100% (57.6 MB of 57.6 MB)
Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-1.8.0.tar.gz
Installing JDK to /usr/jdk64/
Successfully installed JDK to /usr/jdk64/
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? n
Configuring database...
Default properties detected. Using built-in database.
Configuring ambari database...
Checking PostgreSQL...
Running initdb: This may take upto a minute.
Initializing database ... OK


About to start PostgreSQL
Configuring local database...
Connecting to local database...done.
Configuring PostgreSQL...
Restarting PostgreSQL
Extracting system views...
ambari-admin-2.1.0_IBM_5.jar
.....
Adjusting ambari-server permissions and ownership...
Ambari Server 'setup' completed successfully.
[root@c902f09x02 ~]#

6.2.4 Starting the Ambari server

1. By default, the Ambari server uses port 8080. If another service is already using this port, a different port can be assigned to Ambari. To change the default port of Ambari, change or add the following line in /etc/ambari-server/conf/ambari.properties:

client.api.port=<port_number>

The port number can also be changed later: add the port you want to /etc/ambari-server/conf/ambari.properties, save the file, and run ambari-server restart.

2. PostgreSQL is used by the Ambari server to store the cluster configuration information. Optionally, ensure that it restarts after a reboot:

chkconfig postgresql on

3. Start the Ambari server:

ambari-server start

[root@c902f09x02 ~]# ambari-server start

Using python /usr/bin/python2.7

Starting ambari-server

Ambari Server running with administrator privileges.

Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid

Server out at: /var/log/ambari-server/ambari-server.out

Server log at: /var/log/ambari-server/ambari-server.log

Waiting for server start....................

Ambari Server 'start' completed successfully.


[root@c902f09x02 ~]#
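You can also confirm that the server answers REST requests, using the default admin:admin account noted in the next section:

# Should return a JSON document (an empty cluster list before any cluster exists)
curl -s -u admin:admin http://localhost:8080/api/v1/clusters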

6.2.5 Ambari Install Wizard

1. Open a browser (Firefox or Internet Explorer) and log on to the Ambari administrator console at http://<ambari-server-host-name>:8080.

The default Ambari account is admin:admin.

2. Click Sign in.

FIGURE 2 AMBARI IOP LOGIN

6.2.6 Create an IOP cluster

6.2.6.1 Welcome Screen

Click Launch Install Wizard to create a new Ambari cluster.


FIGURE 3 AMBARI IOP WELCOME PAGE

6.2.6.2 Cluster Name

Type in a name for the cluster (e.g. mycluster) and then click Next.

FIGURE 4 AMBARI IOP CLUSTER NAME


6.2.6.3 Select Stack

1. The BigInsights stack now appears on the Select Stack page.

Stack Name        Description
BigInsights 4.1   Installs BigInsights with HDFS

Note: Expand the Advanced Repository Options section to review the repository settings. Ensure that the configured local mirror repository is correct.

If multiple operating systems are listed, place a check only on the specific OS for your environment.

Validate the base URLs for all the local repositories: IOP and IOP-UTILS.

Note: If you want to use the public BigInsights IOP 4.1 repository, such as https://ibm-open-platform.ibm.com/repos/IOP/RHEL7/x86_64/4.1-Spark-1.5.1, and IOP-UTILS 1.1, such as https://ibm-open-platform.ibm.com/repos/IOP-UTILS/rhel/7/x86_64/1.1, ensure that all the nodes in the cluster can access the internet. In this mode, installation might take more time because all the RPM packages must be downloaded during installation.

2. After verifying the information, click Next.

FIGURE 5 AMBARI IOP SELECT STACK


6.2.6.4 Install Options

1. On the Install Options screen, in the Target Hosts section, specify the hosts information. Ambari requires a list of fully qualified domain names (FQDNs) of the nodes in the cluster.

a) For an existing filesystem (e.g. ESS), verify that the host names used in the Ambari Target Hosts section are the data network addresses that IBM Spectrum Scale uses for the cluster setup. Otherwise, adding the IBM Spectrum Scale service later fails with an incorrect hostname.

b) If this is an ESS, the ESS I/O servers must not be part of the Ambari cluster.

c) Make sure the Ambari server node is also an Ambari agent node and the GPFS Master node.

2. For SSH Private Key, upload or copy and paste the key from /root/.ssh/id_rsa from the Ambari server.

[root@c902f09x02]# cat /root/.ssh/id_rsa

-----BEGIN RSA PRIVATE KEY-----
OIIEpAIAAAKCAQEAuh/4pytncsHXShXRJFONbxJD6bsBkn8zm8x3ifCiS2VvSTBQ
ydI3BYyUYco2dc8vbXT2h6sQGnBIOTWMU12izVZFqqT29kfHhajX8pWLNVCn/Vcx
LkkI7V3b9uiWOMk/dYzrEYDaPqGmiFWFM7RL8cg8RAqCGvEujJX0H0dGFvSD8Acv
1tbKCBnSzxcTfNwkGwoaR9TWejiFEmL55DFshC3+xBfVCOxadItdN+3KgDDFGZEE
m2H7Og6PYmFxV3t6hY+ozFMVJ0loc847Ni4x8A4td0Fy9QjgEVtbxQETfIUXsaS4
cd6A7hDjqvuhj34xqcIX9dYQbvDWhZcwGVxxxxxxAQABAoIBACYeUiB6hS89f8gO
e8zCx96NkRcXU5UbNAiecYTwoWxrk8UbfhA3W0lppyH39fteuUnjgHH8mMmxYTlG
PDz+mk8Pcikmq+V4geZf1Ao8kkwS/rSl3M6r6oYiiOAidlGe9b4vZB0rlIbrOF4H
lcYnEL6t0ZnlxQsfsfasfsdfsfsfvvc7v/YLpQRIJgds8A+rSUhuid0IiBdo1z6IWsgAc8v
StBX8BNjH60WT86j5tV5ENotKhpoZ/j6lX1GWzXDaGo3wOjSqsxBO6OTgO/OvsSd
AMc7OgpnBaD2631K9q7naDhhNDoduYPtT0l0Noj4H/iYromCe62W8nE9958VCeiL
q1v7fYECgYEA4CdqdW+GbnZ5MswepnvgNgs/pIyEIWAMawnJ4Iqu75jnwHxQ9Qte
UnyeovuKt9OrsfwCI14UTZMpr1ww3u1HlKBCNv1852c5kR5OH/r5FkAV//xcF6w
319OCiM60cb1og0eWYgoDeHPydAwLjeIYGT3UNckTJSpEzXta394cyECgYEA1JFr
i39+Nk0IPfqtD/FDKDmuFY2WZKpz4BELpwg0n5Gbjmv6Wv6DlszcA2B5Hb6XqJVE
3xDaW7Kv111111111lkM3z8Tde470DTUWsmPmfdQqmYso8ojSPJ6vT8tWsjJfLTV
9YAEzVsXFX4CqjHBnFH6kxWvHdHgI3xqtsKxFMsCgYB/3PwVUTDWAi7Qky9IuJEF
QxViv+T/RNLQnBzUQUfY1NgeLMvhfEKpuvyi6+oNQmkjA+ukdU6UxOykkJuqZv7B
9T4114NNHDNMBvM/D3l4IGzZxd3rw5gRU0Qo9D+xOlUTw3f0in6OrKji7icNtg/N
LMsHCjJinMp1cuWj1GrZAQKBgQC4lH3aaB7kcAeUKjRU/57dMxjjRkstpoVwL9z6
cbmgRgPZrBF9KjyBmeoCVKdXGMUAcn888888881MxTv9bkP8Iuo31C6ut/db8Jm
A1hazdKR5J8mqsPw9/10j4OGrYuoai7QOxXwbASjeQQ/XW1NnDIFvnKNZk1HAyZ3
BKBX+cw34w05845adofBbYJ92f0nAXoU3iFVZZ5KIqg4KQCjp0qvObBq0xFG0zu4
+kRuROYGGhlOnQkA5t5fbLxDcbF9MVQRBSRTJl6BDVYsmhyfU9XMDL6IRUpaO1aA
Am5TtN5R+cJn4AeAd1U/EnNPISxLX5G/rjDEwafpW0iUkx23333XgVslA==
-----END RSA PRIVATE KEY-----


FIGURE 6 AMBARI IOP INSTALL OPTIONS – HOST LIST

3. After specifying all of the hosts information and the SSH private key, click Register and Confirm.

6.2.6.5 Confirm Hosts

On the Confirm Hosts screen, click Next. Ambari installs the agents on all the nodes and performs the basic verification checks. Fix any errors to ensure that the prerequisites have been met.


FIGURE 7 AMBARI IOP CONFIRM HOSTS

6.2.6.6 Choose Services

On the Choose Services screen, place a check next to the services that must be installed. Click Next to continue the setup process.

Note: Any service that is not checked is not selected for installation.


FIGURE 8 AMBARI IOP CHOOSE SERVICES

6.2.6.7 Assign Masters

1. On the Assign Masters screen, services that belong to a master or management node are presented. Ambari sets up the default configurations based on the number of requested nodes and services. If you want to change the configuration, select the location where you want to deploy the master node services.

2. After verifying the services to be placed onto the specific hosts, click Next.


FIGURE 9 AMBARI IOP ASSIGN MASTERS

6.2.6.8 Assign Slaves and Clients

1. On the Assign Slaves and Clients screen, select the client and slave components that are to be deployed across the hosts.

2. Click Next after reviewing all the components for each listed host.

Important:

If you anticipate adding the Big SQL service at a later time, you must include all clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS, Hive, HBase, Sqoop, HCat, and Oozie clients. See IBM BigInsights - Running the installation package for more information.

The /etc/hadoop/conf/slaves file is derived from the hosts where you choose to place the DataNode service.


FIGURE 10 AMBARI IOP ASSIGN SLAVES AND CLIENTS

6.2.6.9 Customize Services

1. If you have an IBM Spectrum Scale and HDFS Transparency cluster installed, review the Important Notes under the Different installation modes.

2. Ambari can select mounted paths other than / as default values for some directories that must be on a local file system. If Ambari selects a shared file system, some services do not function properly.

The following directories can be affected by this shared file system issue. Therefore, verify that each of the following configuration directories is on a local filesystem (see the example check after this list):

yarn.nodemanager.log-dirs (YARN Advanced)

yarn.nodemanager.local-dirs (YARN Advanced)

yarn.timeline-service.leveldb-timeline-store.path (YARN Advanced)

HBase local directory (HBase Advanced, under Advanced hbase-site)

Oozie Data Dir (Oozie)

ZooKeeper directory (ZooKeeper)

log.dirs (Kafka)
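One way to verify that a configured directory is on a local file system is to check its mount source; the path below is only an example value for yarn.nodemanager.local-dirs:

# The Type column should show a local file system (e.g. xfs or ext4), not gpfs or nfs
df -hT /hadoop/yarn/local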


FIGURE 11 AMBARI IOP CUSTOMIZE SERVICE IOP TABS

Note:

Check all the services marked with a red circle. The red circle denotes mandatory entries that must be completed before the service can be deployed.

NOTE: When creating the Ambari IOP cluster, do not create a local partition file system to be used for HDFS if IBM Spectrum Scale is used. Use a directory name that is not already hosting the Hadoop cluster.

3. After verifying all the configurations, click Next for deployment.

6.2.6.10 Review - Deployment

Review the information on the deployment page, and click Deploy to begin the installation.


FIGURE 12 AMBARI IOP DEPLOYMENT REVIEW

6.2.6.11 Install, Start and Test

After you click Deploy, Ambari installs all selected services, starts them, and runs service checks for each of the services. If all services are installed successfully, Ambari considers the operation a success, even if some services could not be started or some service checks failed. You can continue even if there were some failures, as long as everything is installed.


FIGURE 13 AMBARI IOP INSTALL, START AND TEST

6.2.6.12 Summary

The Summary page lists all the services applied onto the specific hosts. After reviewing the information, click Complete to view the Ambari dashboard.

FIGURE 14 AMBARI IOP SUMMARY

6.2.6.13 Ambari Cluster View


1. The Ambari cluster is now deployed with the IOP stack. The Ambari dashboard displays the metric information, and the deployed services are displayed on the left side of the panel. The services can now be managed by Ambari.

FIGURE 15 AMBARI IOP MAIN CLUSTER VIEW

2. In the event of any failure during the initial cluster deployment, it is a good practice to go through each service one by one and run its service check command. Ambari runs all of the service checks as part of the installation wizard, but if anything failed, Ambari might not have run all of the service checks. On the dashboard page for each service in the Ambari GUI, go to Service Actions > Run Service Check.

6.3 Setting up High Availability [HA]

Set up high availability to protect against planned and unplanned events. The process sets up a standby NameNode configuration so that failover can happen automatically.

To configure the High Availability option when the IBM Spectrum Scale service is not deployed on the cluster, perform the following steps:


1. Log into the Ambari GUI.

2. To stop all services, select dashboard → Actions → Stop All.

3. From the Ambari dashboard, click the HDFS service.

4. Select Service Actions > Enable NameNode HA and follow the steps.

See BigInsights 4.1 - Setting up NameNode high availability for more information on setting up NameNode HA.

If the IBM Spectrum Scale service is already deployed in your environment, see Configure HA onto an existing Ambari IOP and IBM Spectrum Scale cluster to configure the HA environment.

6.4 Adding additional software services

The recommended procedure is to add the additional services (e.g. BigR, BigSQL) and verify the software stack against native HDFS before adding the IBM Spectrum Scale integration module. This helps pinpoint whether an added service has an issue with native HDFS or with IBM Spectrum Scale.

6.4.1 IBM BigInsights value-add modules


Several value-add services from BigInsights can be installed by using the Ambari GUI. Any of these services can be optionally installed, and they do not explicitly depend on one another. The BigInsights Home service provides a web UI that serves as a launching pad for the web UIs of the Data Server Manager, BigSheets and Text Analytics services.

If you install Big SQL, BigSheets or Text Analytics, install the BigInsights Home service.

See BigInsights value-add services on IBM Spectrum Scale for more information.

6.4.2 IBM Symphony

IBM Platform Symphony provides better performance and efficiency, as well as superior management and monitoring, for your Hadoop workload environment.

See Symphony Integration for more information.

6.5 Install the GPFS integration module into Ambari

6.5.1 Stop Ambari services

On the Ambari dashboard, select Actions > Stop All to stop all services.


FIGURE 16 AMBARI IOP STOP ALL SERVICES

6.5.2 Installing the GPFS Ambari integration module

1. Install the gpfs.hdfs-protocol.ambari integration module on the Ambari server node:

chmod 755 gpfs.hdfs-protocol.ambari-iop_<version>.noarch.bin
./gpfs.hdfs-protocol.ambari-iop_<version>.noarch.bin

Note: Do not save the gpfs-ambari integration package in /root/.

[root@c902f09x02 tmp]# ./gpfs.hdfs-transparency.ambari-iop_4.1-0.noarch.bin
International License Agreement for Non-Warranted Programs
Part 1 - General Terms
BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON AN "ACCEPT" BUTTON, OR OTHERWISE USING THE PROGRAM, LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT


AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,
* DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN "ACCEPT" BUTTON, OR USE THE PROGRAM; AND
* PROMPTLY RETURN THE UNUSED MEDIA AND DOCUMENTATION TO THE PARTY FROM WHOM IT WAS OBTAINED FOR A REFUND OF THE AMOUNT PAID. IF THE PROGRAM WAS DOWNLOADED, DESTROY ALL COPIES OF THE PROGRAM.
1. Definitions
....
c. wasted management time or lost profits, business, revenue, goodwill, or anticipated savings.
Z125-5589-05 (07/2011)
Do you agree to the above license terms? [yes or no]
yes
Unpacking... Done
Installing...
Preparing...                          ################################# [100%]
Updating / installing...
   1:gpfs.hdfs-transparency.ambari-iop################################# [100%]
Done.
[root@c902f09x02 tmp]#

2. For a shared file system (e.g. an ESS cluster), create a file named /var/lib/ambari-server/resources/shared_gpfs_node.cfg and save it on the Ambari server.

This file must contain the Service Management Node (SMN) hostname, and the SMN must be in the ESS cluster. Ambari uses the SMN to join the ESS cluster.

[root@mn01-dat ~]# vi /var/lib/ambari-server/resources/shared_gpfs_node.cfg

[root@mn01-dat ~]# cat /var/lib/ambari-server/resources/shared_gpfs_node.cfg

smn-dat.ibm.com
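Password-less SSH from the Ambari server to the listed node can be verified non-interactively; the hostname is the example value used above:

# Should print the remote hostname without prompting for a password
ssh -o BatchMode=yes smn-dat.ibm.com hostname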

6.5.3 Restart Ambari server

After the GPFS Ambari Integration module is deployed, restart the Ambari Server.

# ambari-server restart

[root@c902f09x02 tmp]# ambari-server restart
Using python /usr/bin/python2.7
Restarting ambari-server
Using python /usr/bin/python2.7
Stopping ambari-server


Ambari Server stopped
Using python /usr/bin/python2.7
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
[root@c902f09x02 tmp]#

Log back in to the Ambari GUI after the ambari-server restart command completes.

6.6 Adding the IBM Spectrum Scale service to Ambari

Ensure that you review the Planning, IBM Spectrum Scale Deployment Modes, Preparing a stanza File, IBM Spectrum Scale-FPO Deployment and Limitations sections before starting deployment.

Check the IBM Spectrum Scale Master and consider its placement.

The IBM Spectrum Scale Master node designates the node from which Ambari issues commands affecting the entire cluster. For example, when IBM Spectrum Scale is first being installed and an FPO cluster is first being created, the commands are all executed on the IBM Spectrum Scale Master node. You must ensure that password-less SSH is set up on every node. As another example, if configuration changes are made after the cluster has been deployed, the IBM Spectrum Scale Master node executes the commands to reconfigure the cluster and, if necessary, restarts IBM Spectrum Scale on all nodes. The term Master is used to follow the convention used by the other Hadoop services. Other than being one of the quorum nodes, the IBM Spectrum Scale Master node has no special role in the IBM Spectrum Scale cluster.

Important: If the IBM Spectrum Scale cluster has already been created, a quorum node must be selected as the IBM Spectrum Scale Master node.

Important: Create the required files from the Preparing a stanza File section, e.g. the NSD file, gpfs_nsd, in /var/lib/ambari-server/resources.

6.6.1 Add Service

On the dashboard, select Actions > Add Service.


FIGURE 17 AMBARI ADD SERVICES

6.6.2 Choose Services

On the Add Service Wizard panel, select the Spectrum Scale package and click Next.


FIGURE 18 AMBARI IBM SPECTRUM SCALE SERVICE

6.6.3 Assign Masters

Note: Ambari Server and IBM Spectrum Scale GPFS master must be co-located on the same node.


FIGURE 19 AMBARI IBM SPECTRUM SCALE ASSIGN MASTERS

In this example, the Ambari Server is on c902f09x02.

GPFS Master must be set to co-locate on the same node as the Ambari Server.

After reviewing where the GPFS Master will be hosted, click Next.

6.6.4 Assign Slaves and Clients

Note: The native HDFS NameNode and the IBM Spectrum Scale HDFS Transparency NameNode are set to be on the same node by the GPFS Ambari integration module. The NameNode is configured as the hostname in fs.defaultFS in the core-site.xml file in Hadoop versions 2.4, 2.5, and 2.7.

The Secondary NameNode in native HDFS is not needed for HDFS Transparency because the HDFS Transparency NameNode is stateless and doesn’t maintain FSImage-like or EditLog information.


1. On the Assign Slaves and Clients screen, select the client and slave components to be deployed across the data nodes.

Note:

For client-only nodes where you do not want IBM Spectrum Scale, do not select the GPFS Transparency Node and GPFS Node options.

Select the management nodes and any DataNodes that are part of the IBM Spectrum Scale cluster by checking the boxes in the GPFS Transparency Node and GPFS Node columns. This runs the IBM Spectrum Scale node and IBM Spectrum Scale HDFS Transparency on every node in the IBM Spectrum Scale cluster. The HDFS Transparency nodes must be GPFS nodes because HDFS Transparency depends on the GPFS mount point.

FIGURE 20 AMBARI IBM SPECTRUM SCALE ASSIGN SLAVES AND CLIENTS

2. After reviewing the locations of GPFS Transparency Node and the GPFS Node, click Next.

6.6.5 Customize Services

1. If you have an IBM Spectrum Scale and HDFS Transparency cluster installed, review the Important Notes under the Different installation modes.

2. On the Customize Services screen, review the IBM Spectrum Scale page. There are two tabs under the IBM Spectrum Scale page: Standard and Advanced configuration.


If a new IBM Spectrum Scale cluster is being created, configuration fields on both tabs are populated with values taken from the Deploying a big data solution using IBM Spectrum Scale - Hadoop Best Practices White Paper.

In the Standard tab, adjust the parameters by using the slider bars and drop-down menus. The Advanced tab contains parameters that do not need to be changed frequently.

IMPORTANT: Read IBM Spectrum Scale deployment modes to learn more about the mode you are deploying IOP on, and see Table 6 IBM Spectrum Scale checklist parameters to know which parameters affect the system and the Standard and Advanced tabs in the Ambari wizard.


Here are the important IBM Spectrum Scale parameter checklists:

Standard tab                          Rule                                                                              Advanced tab                               Rule
Cluster Name                                                                                                            Advanced core-site: fs.defaultFS           Make sure hdfs://localhost:8020 is used
FileSystem Name                                                                                                         Advanced gpfs-advance: gpfs.quorum.nodes   The node number must be odd
FileSystem Mount Point
NSD stanza file                       See guide in Preparing a stanza File
Policy file                           See guide in Policy File
Hadoop local cache disk stanza file   See guide in Deploy IOP over an existing IBM Spectrum Scale file system (ESS)
Default Metadata Replicas             <= Max Metadata Replicas
Default Data Replicas                 <= Max Data Replicas
Max Metadata Replicas
Max Data Replicas

TABLE 6 IBM SPECTRUM SCALE CHECKLIST PARAMETERS

3. Verify the configuration for the IBM Spectrum Scale service.

If you have already created the IBM Spectrum Scale cluster and are using Ambari to deploy IOP and the Hadoop integration components for IBM Spectrum Scale, the fields are populated with values detected from the existing cluster.

For all setups, the parameters with a lock icon must not be changed after deployment. These include parameters like the cluster name, remote shell, filesystem name, and max data replicas. Therefore, verify all the parameters with the lock icon before proceeding to the next step. Further, while every attempt is made to detect the correct values from the cluster, verify that the parameters are imported properly and make corrections as needed.


FIGURE 21 AMBARI IBM SPECTRUM SCALE DATA AND METADATA REPLICAS

Review the parameters for Max Data Replicas and Max Metadata Replicas, as these values cannot be changed after the file system is created. If you decrease the values from the default of three, ensure that this is really what you want. Also, setting the value of Max Data Replicas, Max Metadata Replicas, Default Data Replicas, and Default Metadata Replicas to 3 requires at least three failure groups in the cluster (at least three nodes with disks); otherwise the file system creation will fail.

4. Review the Customize Services panel


FIGURE 22 AMBARI IBM SPECTRUM SCALE CUSTOMIZED SERVICES

5. Under the Standard tab, if you are creating an FPO cluster, set the GPFS NSD filename.


FIGURE 23 AMBARI IBM SPECTRUM SCALE STANDARD CONFIGURATIONS

6. Under the Advanced configuration tab, enter the Ambari admin userid and password.

7. Under the Advanced configuration tab, enter the GPFS repository URL where the IBM Spectrum Scale packages reside.


FIGURE 24 AMBARI IBM SPECTRUM SCALE ADVANCED CONFIGURATIONS

8. After verifying all the information in the standard and advanced configuration sections, click Next.

6.6.6 Review

Review the information before deploying the service, and click Next to install.


FIGURE 25 AMBARI IBM SPECTRUM SCALE REVIEW

6.6.7 Install, Start and Test

1. The Install, Start and Test screen displays the status of the deployment of IBM Spectrum Scale and HDFS Transparency on the DataNodes. The installation is skipped for clusters that already have IBM Spectrum Scale and HDFS Transparency installed.

2. If the installation is successful, click Next. Otherwise, click the failure bar to debug the error from the log information.


FIGURE 26 AMBARI IBM SPECTRUM SCALE INSTALL, START AND TEST

6.6.8 Summary

The Summary page lists all the services applied on the specific hosts. After reviewing the information, click Complete.

FIGURE 27 AMBARI IBM SPECTRUM SCALE SUMMARY

6.6.9 Ambari Cluster View


The IBM Spectrum Scale service is added to the Ambari Cluster. IBM Spectrum Scale can now be managed by

Ambari.

6.6.10 Restart Ambari Server

After incorporating the IBM Spectrum Scale service into Ambari, the Ambari server must be restarted to reflect all the changes.

Log on to the Ambari server node and run the following command:

ambari-server restart

[root@c902f09x02 ~]# ambari-server restart
Using python /usr/bin/python2.7
Restarting ambari-server
Using python /usr/bin/python2.7
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2.7
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................


Ambari Server 'start' completed successfully.
[root@c902f09x02 ~]#

6.6.11 Start all services

1. Log on to Ambari.

2. From the dashboard, select Actions > Start All to start all services.

Note:

If some of the services do not start, start them by going to the host dashboard and restarting each service individually, or rerun all services from the Ambari dashboard by selecting Actions > Start All. Run Restart all affected components with Stale Configs if the dashboard displays the request. For Spectrum Scale, restart from the Spectrum Scale dashboard > Service Actions > Stop and Start options.

If any HDFS configuration in hdfs-site or core-site is changed, a restart required alert is displayed for the IBM Spectrum Scale service. It is important to restart the HDFS service first and then restart the Spectrum Scale service; this is how HDFS Transparency is synchronized with the HDFS configuration files.


Note: To restart Spectrum Scale, select the Spectrum Scale dashboard > Service Actions > Stop > Start options. See Limitations - General list on Spectrum Scale restart.

7. Verify and Test Installation

1. For an initial installation through Ambari, the UID and GID of the users are consistent across all nodes. However, if you deploy a second time, or if some of the nodes already had users created with the same UID or GID, the UID and GID of these users might not be consistent across all nodes, as per the AMBARI-10186 issue from the Ambari community.

After deployment and during verification of the system, check by using

mmdsh -N all id <user-name>

to see whether the UID is consistent across all nodes.

2. After the Ambari deployment, check the IBM Spectrum Scale packages installed on all nodes by using rpm -qa | grep gpfs to verify that all base IBM Spectrum Scale packages have been installed.
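For example, both checks can be run across the cluster with mmdsh (a sketch; the spark user and the output shown are illustrative):

# /usr/lpp/mmfs/bin/mmdsh -N all "id spark"
c902f09x02: uid=1009(spark) gid=1000(hadoop) groups=1000(hadoop)
c902f09x03: uid=1009(spark) gid=1000(hadoop) groups=1000(hadoop)
# /usr/lpp/mmfs/bin/mmdsh -N all "rpm -qa | grep gpfs"

If any node reports a different UID or GID for the same user, or a different set of gpfs packages, correct it before continuing.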

3. Check the user IDs (for example, the spark user) and user access to the file system.

Default users:

[spark@c902f09x02 user]$ pwd; ls -ltr /bigpfs/user
total 0
drwxr-xr-x 2 hbase hadoop 4096 May 14 12:11 hbase
drwxr-xr-x 2 hcat  hadoop 4096 May 14 12:12 hcat
drwx------ 2 hive  hadoop 4096 May 14 12:12 hive
drwxrwxr-x 3 oozie hadoop 4096 May 14 12:12 oozie
drwxr-xr-x 4 spark hadoop 4096 May 15 12:45 spark
[spark@c902f09x02 user]$

Spark user information:

[spark@c902f09x02 spark]$ id
uid=1009(spark) gid=1000(hadoop) groups=1000(hadoop)
[spark@c902f09x02 spark]$

HDFS commands:

[spark@c902f09x02 spark]$ hadoop fs -ls /user
Found 5 items
drwxr-xr-x - hbase hadoop 0 2016-05-14 12:11 /user/hbase
drwxr-xr-x - hcat  hadoop 0 2016-05-14 12:12 /user/hcat
drwx------ - hive  hadoop 0 2016-05-14 12:12 /user/hive
drwxrwxr-x - oozie hadoop 0 2016-05-14 12:12 /user/oozie
drwxr-xr-x - spark hadoop 0 2016-05-15 12:38 /user/spark
[spark@c902f09x02 spark]$


POSIX commands:

[spark@c902f09x02 user]$ pwd; ls -ltr /bigpfs/user
total 0
drwxr-xr-x 2 hbase hadoop 4096 May 14 12:11 hbase
drwxr-xr-x 2 hcat  hadoop 4096 May 14 12:12 hcat
drwx------ 2 hive  hadoop 4096 May 14 12:12 hive
drwxrwxr-x 3 oozie hadoop 4096 May 14 12:12 oozie
drwxr-xr-x 4 spark hadoop 4096 May 15 12:45 spark
[spark@c902f09x02 user]$

[spark@c902f09x02 spark]$ pwd
/bigpfs/user/spark
[spark@c902f09x02 spark]$ hadoop fs -ls
Found 1 items
drwx------ - spark hadoop 0 2016-05-15 12:45 .staging
[spark@c902f09x02 spark]$
[spark@c902f09x02 spark]$ ls -ltr
total 0
[spark@c902f09x02 spark]$ echo "My test" > mytest
[spark@c902f09x02 spark]$ cat mytest
My test
[spark@c902f09x02 spark]$ hadoop fs -cat mytest
My test
[spark@c902f09x02 spark]$
[spark@c902f09x02 spark]$ rm mytest
[spark@c902f09x02 spark]$ ls -ltr
total 0
[spark@c902f09x02 spark]$

4. Run wordcount as user spark.

Copy the mycountfile file to be used as input to the wordcount program.

[spark@c902f09x02 spark]$ pwd
/bigpfs/user/spark
[spark@c902f09x02 spark]$ cp /etc/passwd mycountfile

Run the wordcount program:

[spark@c902f09x02 spark]$ yarn jar /usr/iop/4.1.0.0/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.1-IBM-11.jar wordcount mycountfile wc_output
16/07/08 13:50:36 INFO impl.TimelineClientImpl: Timeline service address: http://c902f09x03.pok.stglabs.ibm.com:8188/ws/v1/timeline/
16/07/08 13:50:36 INFO client.RMProxy: Connecting to ResourceManager at c902f09x03.pok.stglabs.ibm.com/172.16.0.67:8050
16/07/08 13:50:37 INFO input.FileInputFormat: Total input paths to process : 1
16/07/08 13:50:37 INFO mapreduce.JobSubmitter: number of splits:1


16/07/08 13:50:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467946633720_0001
16/07/08 13:50:37 INFO impl.YarnClientImpl: Submitted application application_1467946633720_0001
16/07/08 13:50:37 INFO mapreduce.Job: The url to track the job: http://c902f09x03.pok.stglabs.ibm.com:8088/proxy/application_1467946633720_0001/
16/07/08 13:50:37 INFO mapreduce.Job: Running job: job_1467946633720_0001
16/07/08 13:50:46 INFO mapreduce.Job: Job job_1467946633720_0001 running in uber mode : false
16/07/08 13:50:46 INFO mapreduce.Job: map 0% reduce 0%
16/07/08 13:50:56 INFO mapreduce.Job: map 100% reduce 0%
16/07/08 13:51:03 INFO mapreduce.Job: map 100% reduce 100%
16/07/08 13:51:03 INFO mapreduce.Job: Job job_1467946633720_0001 completed successfully
16/07/08 13:51:03 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=3256
        FILE: Number of bytes written=257887
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2920
        HDFS: Number of bytes written=2886
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=7712
        Total time spent by all reduces in occupied slots (ms)=5219
        Total time spent by all map tasks (ms)=7712
        Total time spent by all reduce tasks (ms)=5219
        Total vcore-seconds taken by all map tasks=7712
        Total vcore-seconds taken by all reduce tasks=5219
        Total megabyte-seconds taken by all map tasks=27639808
        Total megabyte-seconds taken by all reduce tasks=18704896
    Map-Reduce Framework
        Map input records=56
        Map output records=97
        Map output bytes=3178
        Map output materialized bytes=3256
        Input split bytes=130
        Combine input records=97
        Combine output records=91
        Reduce input groups=91
        Reduce shuffle bytes=3256
        Reduce input records=91
        Reduce output records=91
        Spilled Records=182
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=79
        CPU time spent (ms)=1460
        Physical memory (bytes) snapshot=1593815040
        Virtual memory (bytes) snapshot=10161987584
        Total committed heap usage (bytes)=1739587584
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0


        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2790
    File Output Format Counters
        Bytes Written=2886
[spark@c902f09x02 spark]$

Check the output directory:

[spark@c902f09x02 spark]$ hadoop fs -ls wc_output
Found 2 items
-rw-r--r-- 3 spark hadoop 0 2016-07-08 13:51 wc_output/_SUCCESS
-rw-r--r-- 3 spark hadoop 2886 2016-07-08 13:51 wc_output/part-r-00000
[spark@c902f09x02 spark]$
[spark@c902f09x02 spark]$ pwd; ls -ltr wc_output
/bigpfs/user/spark
total 0
-rw-r--r-- 1 spark hadoop 2886 Jul 8 13:51 part-r-00000
-rw-r--r-- 1 spark hadoop 0 Jul 8 13:51 _SUCCESS
[spark@c902f09x02 spark]$

5. Check the Hadoop GUI.


8. IBM Spectrum Scale versus Native HDFS

When the IBM Spectrum Scale service is added, native HDFS is no longer used. Hadoop applications interact with HDFS Transparency in the same way that they interact with native HDFS. An application can access HDFS by using the Hadoop FileSystem and DistributedFileSystem APIs. The application can have its own cluster that is larger than the HDFS protocol cluster. However, all the nodes within the application cluster must be able to connect to all the nodes in the HDFS protocol cluster by RPC.

Note:

The Secondary NameNode and Journal Nodes used in native HDFS are not needed for HDFS Transparency because the HDFS Transparency NameNode is stateless: the metadata is distributed, and the NameNode does not maintain FSImage-like or EditLog information.

8.1 Function limitations

The maximum number of extended attributes (EAs) is limited by IBM Spectrum Scale; the total size of an EA key and value must be less than the metadata block size in IBM Spectrum Scale.

EA operations on snapshots are not supported.

The raw namespace is not implemented because it is not used internally.

8.2 Configuration that differs from native HDFS in IBM Spectrum Scale

Property name: dfs.permissions.enabled
Value: true/false
Definition or limitation: For the HDFS protocol, the permission check is always done.

Property name: dfs.namenode.acls.enabled
Value: true/false
Definition or limitation: For native HDFS, the NameNode manages all metadata, including the ACL information, and HDFS can use this property to turn ACL checking on or off. However, for IBM Spectrum Scale, the HDFS protocol does not hold the metadata. When the property is on, the ACL is set and stored in the IBM Spectrum Scale file system. If the administrator turns it off later, the ACL entries already set and stored in IBM Spectrum Scale still take effect. This will be improved in the next release.

Property name: dfs.blocksize
Value: long
Definition or limitation: Must be a multiple of the IBM Spectrum Scale file system block size (mmlsfs -B); the maximal value is 1024 * file-system-data-block-size (mmlsfs -B).

Property name: gpfs.data.dir
Value: string
Definition or limitation: A user in Hadoop must have full access to this directory. If this configuration is omitted, a user in Hadoop must have full access to gpfs.mount.dir.

Property name: dfs.namenode.fs-limits.max-xattrs-per-inode
Value: int
Definition or limitation: Does not apply to the HDFS protocol.

Property name: dfs.namenode.fs-limits.max-xattr-size
Value: int
Definition or limitation: Does not apply to the HDFS protocol.

TABLE 7 NATIVE HDFS AND IBM SPECTRUM SCALE DIFFERENCES
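For example, to choose a valid dfs.blocksize, first query the file system block size (a quick check, assuming the file system is named bigpfs; the value shown is illustrative):

# /usr/lpp/mmfs/bin/mmlsfs bigpfs -B
flag value   description
---- ------- -----------
 -B  2097152 Block size

If -B reports 2097152 (2 MiB), dfs.blocksize must be a multiple of 2097152, such as 134217728 (128 MiB).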

8.3 Short Circuit Read Configuration

In HDFS, read requests go through the DataNode. When the client asks the DataNode to read a file, the DataNode reads the file off the disk and sends the data to the client over a TCP socket. A short-circuit read obtains the file descriptor from the DataNode, allowing the client to read the file directly.

This is possible only when the client is co-located with the data, and it is used in FPO mode.

Short-circuit reads provide a substantial performance boost to many applications.

Note: Short-circuit local reads can be enabled only on Hadoop 2.7.0. HDFS Transparency versions 2.7.0-0 and 2.7.0-1 do not support this feature over Hadoop versions 2.7.1 and 2.7.2. IBM BigInsights IOP 4.1 uses Hadoop version 2.7.1, which is not a supported version for short-circuit reads. For more information on how to enable short-circuit reads on other Hadoop versions, contact [email protected].

When IBM Spectrum Scale is integrated with Ambari to replace native HDFS, the short-circuit option in the HDFS advanced panel is unselected.
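You can confirm the effective client setting from the command line; a quick check using the stock Hadoop property name (the output shown is what is expected in this configuration):

[spark@c902f09x02 ~]$ hdfs getconf -confKey dfs.client.read.shortcircuit
false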


Appendix

A. Preparing a stanza File

The Ambari install process can install and configure a new IBM Spectrum Scale cluster file system and configure

it for Hadoop workloads. To support this task, the installer must know the disks available in the cluster and how

you want to use them. If you do not indicate preferences, intelligent defaults are used.

The sample files for the GPFS policy, NSD, Hadoop cache, rack configuration, and shared configuration are located in the /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/GPFS/package/templates directory.

# pwd
/var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/GPFS/package/templates
# ls
gpfs_fs.pol.sample  gpfs_nsd.sample  hadoop_cache.sample  racks.sample  shared_gpfs_node.cfg.sample

Copy the sample files to /var/lib/ambari-server/resources:

[root@c902f09x02 ~]# cp -p /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/GPFS/package/templates/* /var/lib/ambari-server/resources
[root@c902f09x02 resources]# pwd; ls *.sample
/var/lib/ambari-server/resources
gpfs_fs.pol.sample  gpfs_nsd.sample  hadoop_cache.sample  racks.sample  shared_gpfs_node.cfg.sample
[root@c902f09x02 resources]#

Note: Deploying a new IBM Spectrum Scale cluster through Ambari is only supported for FPO mode.

Two types of NSD files are supported for file system auto-creation. One is the preferred simple format, and the other is the standard IBM Spectrum Scale NSD format intended for experienced IBM Spectrum Scale administrators.

Preferred Simple Format:

o Ambari selects the correct metadata and data ratios.
o If possible, Ambari creates partitions on some disks for Hadoop intermediate data to enhance performance.
o One system pool and one data pool are created.
o The NSD file must be located at /var/lib/ambari-server/resources/ on the Ambari server.
o Only /dev/sdX and /dev/dx-X devices are supported.

Standard Format:

o The GPFS administrator is responsible for the storage arrangement and configuration.
o A policy file is also required.
o Storage pools and block sizes can be defined as needed.

Simple NSD File

Example of a preferred simple IBM Spectrum Scale NSD file. There are 7 nodes, each with 6 disk drives to be defined as NSDs. Each line must be continuous, with no extra spaces.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg


If you want to select disks such as SSD drives for metadata, add the label -meta to those disks.

In a simple NSD file, add the -meta label to the disks that you want to use as metadata disks, as shown in the following example. If -meta is used, the partitioning algorithm is ignored.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd

In this simple NSD file, /dev/sdb from compute001, compute002, compute003, and compute005 are specified as metadata disks in the IBM Spectrum Scale file system. The partitioning algorithm is also ignored if the nodes listed in the simple NSD file do not match the set of nodes that will be used for the NodeManager service: if nodes that are not NodeManagers are in the NSD file, or nodes that will be NodeManagers are not in the NSD file, no partitioning is done.

Standard NSD File

Example of a standard IBM Spectrum Scale NSD file:

%pool: pool=system blockSize=256K layoutMap=cluster allowWriteAffinity=no
%pool: pool=datapool blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=256
# gpfstest9
%nsd: nsd=node9_meta_sdb device=/dev/sdb servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_meta_sdc device=/dev/sdc servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_data_sde2 device=/dev/sde2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
%nsd: nsd=node9_data_sdf2 device=/dev/sdf2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
# gpfstest10
%nsd: nsd=node10_meta_sdb device=/dev/sdb servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_meta_sdc device=/dev/sdc servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_data_sde2 device=/dev/sde2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
%nsd: nsd=node10_data_sdf2 device=/dev/sdf2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
# gpfstest11
%nsd: nsd=node11_meta_sdb device=/dev/sdb servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_meta_sdc device=/dev/sdc servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_data_sde2 device=/dev/sde2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool
%nsd: nsd=node11_data_sdf2 device=/dev/sdf2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool


Type the /var/lib/ambari-server/resources/gpfs_nsd file name in the NSD stanza field. If you are using a standard NSD stanza file, a policy file is required.

Policy File

Example policy file (bigpfs.pol):

RULE 'default' SET POOL 'datapool'

Because of the limitations of the Ambari framework, the NSD file must be copied to the /var/lib/ambari-server/resources/ directory on the Ambari server. Ensure that the correct file name is specified on the IBM Spectrum Scale Customize Services page.
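For example, assuming your NSD file was prepared at /root/gpfs_nsd (a hypothetical path):

# cp -p /root/gpfs_nsd /var/lib/ambari-server/resources/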


FIGURE 28 AMBARI NSD STANZA

B. IBM Spectrum Scale-FPO Deployment

Disk-partitioning algorithm

If a simple NSD file is used without the -meta label, Ambari assigns metadata and data disks and partitions the disks according to the following rules:

1. If the number of nodes is less than four:


a. If the number of disks on each node is less than three, put all disks in the system pool with usage=metadataAndData. Partitioning is not done.

b. If the number of disks on each node is greater than four, assign metadataOnly and dataOnly disks in a 1:3 ratio on each node, with a maximum of four metadata disks per node. Partitioning is done if all NodeManager nodes are also NSD nodes and have the same number of NSD disks.

2. If the number of nodes is greater than five:

a. If the number of disks on each node is less than two, put all disks in the system pool with usage=metadataAndData. Partitioning is not done.

b. Set four nodes as metadata nodes, where the metadata disks are located. The others are data nodes.

c. Failure groups are created based on the failure group selection rules.

d. Assign metadata and data disks to the metadata nodes, and only data disks to the data nodes. The metadata-to-data ratio follows best practice, between 1:3 and 1:10.

e. If all NodeManager nodes have the same number of NSD disks, create a local partition on data disks for Hadoop intermediate data.

Failure Group selection rules

Failure groups are created based on the rack allocation of the nodes. One rack mapping file is supported (see Rack Mapping File). Ambari reads this file and assigns one failure group per rack. The number of racks must be three or greater. If a rack mapping file is not provided, virtual racks are created for data fault tolerance:

1. If the number of nodes is less than four, each node is placed on a different rack.

2. If the number of nodes is greater than five and not greater than ten, every two nodes are put in one virtual rack.

3. If the number of nodes is greater than ten and less than 21, every three nodes are put in one virtual rack.

4. If the number of nodes is 21 or more, every ten nodes are put in one virtual rack.

For example, with 12 nodes and no rack mapping file, every three nodes are placed in one virtual rack, yielding four virtual racks and therefore four failure groups.

Rack Mapping File

Nodes can be defined to belong to racks. For three or more racks, the failure groups of the NSDs will correspond to the rack each node is in. A sample file is available on the Ambari server at /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/GPFS/package/templates/racks.sample. To use it, copy the racks.sample file to the /var/lib/ambari-server/resources directory.

# cat /var/lib/ambari-server/resources/racks.sample
#Host/Rack map configuration file


#Format:
#[hostname]:/[rackname]
#Example:
#mn01:/rack1
#NOTE:
#The first character in rack name must be "/"
mn03:/rack1
mn04:/rack2
dn02:/rack3

FIGURE 29 AMBARI RACK MAPPING


Partitioning Function Matrix in Automatic Deployment

Each data disk is divided into two partitions: one partition is used for an ext4 file system to store the map or reduce intermediate data, while the other partition is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned; metadata disks cannot be partitioned. On the other hand, if a node is not selected as a NodeManager for YARN, no map or reduce tasks will run on that node. In this case, partitioning the disks of that node is not favorable because the local partition will not be used. The following table describes the partitioning function matrix:

TABLE 8 IBM SPECTRUM SCALE PARTITIONING FUNCTION MATRIX

#1: <node manager host list> == <IBM Spectrum Scale NSD server nodes>
The node manager host list is equal to the IBM Spectrum Scale NSD server nodes.

Standard NSD file: No partitioning; create the NSDs directly with the NSD file.

Simple NSD file without the -meta label: Partition and select metadata disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

#2: <node manager host list> > <IBM Spectrum Scale NSD server nodes>
Some node manager hosts are not IBM Spectrum Scale NSD server nodes, but all IBM Spectrum Scale NSD server nodes are in the node manager host list.

Standard NSD file: No partitioning. Create the NSDs directly with the specified NSD file.

Simple NSD file without the -meta label: No partitioning, but select metadata disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

#3: <node manager host list> < <IBM Spectrum Scale NSD server nodes>
Some IBM Spectrum Scale NSD server nodes are not in the node manager host list, but all node manager hosts are IBM Spectrum Scale NSD server nodes.

Standard NSD file: No partitioning. Create the NSDs directly with the specified NSD file.

Simple NSD file without the -meta label: No partitioning, but select metadata disks for the customer according to the Disk-partitioning algorithm and Failure Group selection rules.

Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.

For standard NSD files or simple NSD files with the -meta label, the IBM Spectrum Scale NSDs and file system are created directly. To specify which disks must be used for metadata and have the data disks partitioned, use the partition_disks_general.sh script, found in Attachments at the bottom of the IBM Open Platform with Apache Hadoop wiki page, to partition the disks first, and then specify the partitions that are used for GPFS NSDs in a simple NSD file. For example:

[root@compute000 ~]# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK|compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2

After deployment in this mode, manually update yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs to contain the directory list from the disk partitions that are used for map/reduce intermediate data.
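For example, assuming the ext4 partitions are mounted at /hadoop/local1 and /hadoop/local2 (hypothetical mount points), the YARN properties might look like the following sketch:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/local1/nm-local-dir,/hadoop/local2/nm-local-dir</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/local1/nm-logs,/hadoop/local2/nm-logs</value>
</property>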

C. Dual-network deployment

The following section is only applicable for IBM Spectrum Scale FPO (local storage) mode, and does not impact Hadoop clusters running over a shared storage configuration (e.g. a SAN-based cluster, or ESS).

If the FPO cluster has a dual 10Gb network, you have two configuration options. The first option is to bond

the two network interfaces and deploy the IBM Spectrum Scale cluster and the Hadoop cluster over the

bonded interface. The second option is to configure one network interface for the Hadoop services including

the HDFS transparency service and configure the other network interface for IBM Spectrum Scale to use for

data traffic. This configuration can minimize interference between disk I/O and application communication.

For the second option, perform the following steps to ensure that the Hadoop applications can exploit data locality for better performance:

1. Configure the first network interface with one subnet address (e.g. 192.168.1.0) and the second network interface with another subnet address (e.g. 192.168.2.0).


2. Create the IBM Spectrum Scale cluster and NSDs with the IPs or hostnames from the first network interface.

3. Install the Hadoop cluster and the HDFS Transparency services by using the IP addresses or hostnames from the first network interface.

4. Run mmchconfig subnets=192.168.2.0 -N all.

Note: 192.168.2.0 is the subnet used for IBM Spectrum Scale data traffic.
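To confirm the configuration afterward, you can list the configured subnets and inspect the connections that the daemons are using; a quick check, assuming the cluster is up:

# /usr/lpp/mmfs/bin/mmlsconfig subnets
# /usr/lpp/mmfs/bin/mmdiag --network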

For Hadoop map and reduce jobs, the YARN scheduler checks the block location. HDFS Transparency returns the hostname that was used to create the IBM Spectrum Scale cluster as the block location to YARN. YARN checks the hostname against the NodeManager host list. If YARN cannot find the hostname within the NodeManager list, it cannot schedule the tasks according to data locality. The suggested configuration ensures that the hostname for the block location can be found in the YARN NodeManager list, so YARN can schedule the tasks according to data locality.

For a Hadoop distribution like IBM BigInsights IOP, all Hadoop components are managed by Ambari™. In this

scenario, all Hadoop components, HDFS transparency and the IBM Spectrum Scale cluster must be created by

using one network interface. Use the second network interface for GPFS.

See Deploying a big data solution using IBM Spectrum Scale for more information.

D. BigInsights value-add services on IBM Spectrum Scale

Limitations

When adding the Big SQL service, the bigsql_user_password must be set to “bigsql”.

On SLES, the R service cannot be installed from the Ambari dashboard by using the Add Service wizard. See the SLES OS section under Troubleshooting value-add services for steps on adding the R service manually.

A workaround is required for the Big SQL and Big R service checks after unintegrating HDFS Transparency. See the General section under Troubleshooting value-add services for more information.

Installation

1. Perform the preparation steps for BigInsights value-adds:


IBM BigInsights 4.1 documentation - Preparing to install the BigInsights value-add services

2. Install the BigInsights value-add package as stated in the BigInsights Knowledge Center web page:

IBM BigInsights 4.1 documentation - Installing the BigInsights value-add packages

Note: Big SQL automatically determines the number of GPFS NSD servers, sets the number of worker threads to that number, and runs them on those NSD nodes. However, in the case of a shared storage system (e.g. ESS), Big SQL reports an error because there are only a limited number of NSD servers and usually they are not part of the Hadoop cluster.

To correctly configure the number of worker threads for Big SQL in a shared storage system or in a remote mounted environment, the following workaround must be set in the Big SQL bigsql-conf.xml configuration file:

<property>
  <name>scheduler.dataLocationCount</name>
  <value>max:8</value>
  <description>Set this to max:number-of-worker-nodes in gpfs-shared-disk environment</description>
</property>

This specifies the number of worker threads for Big SQL to use, and Big SQL will not force those worker threads to run only on the GPFS NSD server nodes.

Troubleshooting value-add services

General

1. The BigInsights Home webpage is blank.

Solution: Configure the Knox service and restart the BigInsights Home service to see the home page.

a) Enable the demo LDAP in Knox: log on to the Ambari GUI → Knox → Service Actions → Start Demo LDAP.
b) Go to the /usr/ibmpacks/bin/<version> directory of the BI installation.
c) Execute: ./knox_setup.sh -u admin -p admin -x 8080 (follow the prompts).
d) Restart the Knox service in the Ambari GUI.
e) Restart the BIGINSIGHTS_HOME service in the Ambari GUI.
f) Verify the BigInsights Home web page:

https://<bi_home_host>:8443/gateway/default/BigInsightsWeb/index.html

For more details, see Enable Knox for BigInsights value-add services in IBM Knowledge Center.


2. Big SQL and Big R service check limitation

After selecting Spectrum Scale > Service Actions > Unintegrate_Transparency on the Ambari dashboard, create the user directories for Big SQL and Big R. Otherwise, the service checks for those services will fail.

# su - hdfs -c " hadoop fs -mkdir /user/bigsql "

# su - hdfs -c " hadoop fs -chown bigsql:hadoop /user/bigsql "

# hadoop fs -ls /user

# su - hdfs -c " hadoop fs -mkdir /user/bigr "

# su - hdfs -c " hadoop fs -chown bigr:hadoop /user/bigr "

# su - hdfs -c " hadoop fs -chmod 777 /user/bigr "

# hadoop fs -ls /user

SLES OS

Installation steps for R on SLES

On SLES, the R service cannot be installed from the Ambari dashboard by using the Add Service wizard. Therefore, do not check the R install box in the Add Service wizard.

Two methods to install R on SLES 11:

1. Install R from a binary package

R is now part of the official openSUSE package set and can be searched for in its repository. To get the latest R binary package from the repository, go to the openSUSE software search page:

http://software.opensuse.org/search?baseproject=ALL

Type R-patched in the search text box, and the R packages for all the supported SuSE versions are displayed. Download the 64-bit package version.

A libgfortran package dependency is required to install R. From the same repository, type libgfortran43 (for version 4.3) in the search text box. Download the package and install it. Once the installation finishes, re-install the R-patched package by using YaST2.

2. Install R from a source package (tar.gz):

All versions of the R source codes can be downloaded from the CRAN link:

http://cran.us.r-project.org/src/base


Install the dependency packages:

$> yum install gcc-gfortran
$> yum install gcc-c++

In the directory where the R package was downloaded, untar the package (e.g. R-3.0.1.tar.gz):

$> tar zxvf R-3.0.1.tar.gz

Go to the untarred directory and configure R:

$> cd R-3.0.1

$> ./configure

The build might complain about the readline (GNU readline) and x (X server) packages if your system does not have them installed already. You can install these two packages or compile R without them, since Big R does not require them.

Compile without readline and x packages:

$> ./configure --with-readline=no --with-x=no

Run make to compile and install R on the system.

$> make

Verify that the R installation succeeded by running make check.

$> make check

Issues on SLES OS

Note: BigSQL service check fails in x86_64 SLES environment.

Problem: The mysql-connector-java package cannot be found while installing the IOP package.

Solution: The mysql-connector-java rpm is on the SLES SDK DVD, e.g. SLE-11-SP3-SDK-DVD-x86_64-GM-DVD1.iso. The SLES SDK DVD needs to be mounted to create a zypp repository under /etc/zypp/repos.d/.
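For example, one way to do this is to loop-mount the ISO and register it as a repository (a sketch; the ISO path and repository alias are illustrative):

# mkdir -p /mnt/sdk
# mount -o loop /root/SLE-11-SP3-SDK-DVD-x86_64-GM-DVD1.iso /mnt/sdk
# zypper addrepo --type yast2 /mnt/sdk SLES-SDK
# zypper install mysql-connector-java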

Problem: How to configure MySQL JDBC Driver for Ambari?

Solution: Set the correct JDBC driver to be available to the Ambari server.

Ensure that the mysql-connector-java is installed on the Ambari server:


# rpm -q mysql-connector-java
mysql-connector-java-5.1.6-1.27

Configure the Ambari server to point to the JDBC driver during the setup phase:

# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
Using python /usr/bin/python2.6
Setup ambari-server
Copying /usr/share/java/mysql-connector-java.jar to /var/lib/ambari-server/resources
JDBC driver was successfully initialized.
Ambari Server 'setup' completed successfully.

Problem: The Hive service does not start, and the system displays the following MySQL error: UnboundLocalError: local variable 'pid_file' referenced before assignment

Solution: The /etc/my.cnf file must be configured with the proper pid-file value on the node where MySQL is running.

Edit the /etc/my.cnf and add the following line under the [mysqld] section:

[mysqld]
…
pid-file = /var/lib/mysql/mysqld.pid
…

Restart the MySQL service:

# /etc/init.d/mysql restart

E. Symphony Integration

IBM Platform Symphony 7.1 provides a special build so that it can be integrated with Ambari as a service. The

normal Symphony package and image cannot be used.

The IBM Platform Symphony base package and the IOP Ambari integration package can be obtained from the Fix Central repository. See the Symphony Fix README on how to install the package.

From Fix Central, on the Find product tab, provide the following information.

In this field:        Select:
Product selector      Platform Symphony
Installed Version     7.1
Platform              Linux 64-bit,x86_64 or Linux PPC
Individual fix IDs    sym-7.1-build413186

*Note: When this document was published, the Fix ID sym-7.1-build413186 was not available in Fix Central. Check Fix Central again or contact [email protected] for information.

Limitations

There is a Symphony integration issue with the Ambari repository name.

If a local ambari.repo file is used and the repository does not use the default name that starts with "BI", Symphony does not integrate with IOP and displays the following error:

HDP installation is detected. Hadoop Jars were not found. Please check if HDP or BigInsights installed properly.

The following is an example of the correct ambari.repo naming convention “BI_AMBARI-2.1.0”:

# cat /etc/yum.repos.d/ambari.repo

[BI_AMBARI-2.1.0]

name=ambari-2.1.0

baseurl=https://ibm-open-platform.ibm.com/repos/Ambari/rhel/7/ppc64le/2.1.x/GA/2.1/

enabled=1

gpgcheck=0

See BigInsights 4.1 Symphony service stack on Ambari server for more information.

F. Upgrade GPFS Ambari integration module

You must plan a cluster maintenance window and prepare for cluster downtime when upgrading the GPFS Ambari Integration Module.

If a new version of the GPFS Ambari Integration Module is available and you want to upgrade to this new version (gpfs.hdfs-protocol.ambari-iop_4.1<version>.noarch.bin), perform the following steps:

1. Log in to the Ambari GUI.


2. From the dashboard, select Actions > Stop All.

3. From the dashboard, select Spectrum Scale > Service Actions > Unintegrate_Transparency.

4. On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

5. Log in to the Ambari server node and remove the existing gpfs.hdfs-transparency.ambari-iop_4.1<version>.noarch package, for example by running yum erase gpfs.hdfs-transparency.ambari-iop_4.1*.noarch.

6. On the Ambari server node, install the new gpfs.hdfs-transparency.ambari-iop_4.1<version>.noarch package. See the Install the gpfs.hdfs-protocol.ambari integration module step in Install the GPFS integration module into Ambari.

7. Log in to the Ambari GUI.

8. From the dashboard, select Spectrum Scale > Service Actions > Integrate_Transparency.

9. On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

10. Log back in to the Ambari GUI.

11. From the dashboard, select Actions > Start All to start all the services.

G. IBM Spectrum Scale Service Management

Manage IBM Spectrum Scale through the Spectrum Scale dashboard. Status and utilization information for IBM Spectrum Scale and HDFS Transparency can be viewed on this panel.

Service Actions dropdown list


FIGURE 30 IBM SPECTRUM SCALE SERVICE ACTIONS

Note: Do not use the Restart GPFS Transparency Nodes and Restart GPFS Nodes options under Service Actions. To restart Spectrum Scale, users must use the Service Actions > Stop and Start options. See Limitations - General list on Spectrum Scale restart.

Running the service check

To check the status and stability of the service, run a service check on the IBM Spectrum Scale dashboard by

clicking Run Service Check in the Service Actions dropdown menu.

Service Check output logs:

stderr: /var/lib/ambari-agent/data/errors-341.txt
None
stdout: /var/lib/ambari-agent/data/output-341.txt
2016-05-04 15:28:51,564 - ========== Getting GPFS configuration from Ambari ===========
2016-05-04 15:28:51,567 - Run command: mkdir -p /var/lib/ambari-agent/ ; rm -f /var/lib/ambari-agent//gpfs_nsd ; curl -kf -x "" --retry 10 http://c902f09x02.pok.stglabs.ibm.com:8080/resources//gpfs_nsd -o /var/lib/ambari-agent//gpfs_nsd
2016-05-04 15:28:51,726 - Status: 0, Output: % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 288 100 288 0 0 1899 0 --:--:-- --:--:-- --:--:-- 1907
2016-05-04 15:28:51,727 - Execute['curl -kf -x "" --retry 10 http://c902f09x02.pok.stglabs.ibm.com:8080/resources//scripts/configs.sh -o /tmp/configs.sh'] {'environment': {'no_proxy': u'c902f09x02.pok.stglabs.ibm.com'}, 'path': ['/bin', '/usr/bin/']}
2016-05-04 15:28:51,890 - Run command: grep "^DISK|" /var/lib/ambari-agent//gpfs_nsd
2016-05-04 15:28:51,893 - Status: 0, Output:
DISK|c902f09x02.pok.stglabs.ibm.com:/dev/sdd,/dev/sdi,/dev/sdj,/dev/sdk
DISK|c902f09x03.pok.stglabs.ibm.com:/dev/sdd,/dev/sdi,/dev/sdj,/dev/sdk
DISK|c902f09x04.pok.stglabs.ibm.com:/dev/sdd,/dev/sdi,/dev/sdj,/dev/sdk
DISK|c902f09x08.pok.stglabs.ibm.com:/dev/sdd,/dev/sdi,/dev/sdj,/dev/sdk
2016-05-04 15:28:51,893 - This is a Simple NSD file
2016-05-04 15:28:51,893 - Run command: cat /var/lib/ambari-agent//gpfs_nsd |grep "^DISK" |cut -d: -f1|cut -d"|" -f2
2016-05-04 15:28:51,896 - Status: 0, Output:
c902f09x02.pok.stglabs.ibm.com
c902f09x03.pok.stglabs.ibm.com
c902f09x04.pok.stglabs.ibm.com
c902f09x08.pok.stglabs.ibm.com
2016-05-04 15:28:51,897 - Run command: cat /var/lib/ambari-agent//gpfs_nsd |grep "^DISK|.*-meta"
2016-05-04 15:28:51,899 - Status: 256, Output:
2016-05-04 15:28:51,900 - Ambari Web PORT:8080
2016-05-04 15:28:51,900 - gpfs_conf_dict[nm_hosts]:[u'c902f09x08']
2016-05-04 15:28:51,900 - gpfs_conf_dict[gpfs.data]:['c902f09x02', 'c902f09x03', 'c902f09x04', 'c902f09x08']
2016-05-04 15:28:51,900 - Not all MapReduce nodes are NSD nodes. Skip Partition..
2016-05-04 15:28:51,901 - ========== Checking GPFS configuration ===========
2016-05-04 15:28:51,901 - Check gpfs.DataDiskLocalPercent: 0
2016-05-04 15:28:51,901 - Check gpfs.DefaultDataReplicas: 3
2016-05-04 15:28:51,901 - Check gpfs.MaxDataReplicas: 3
2016-05-04 15:28:51,901 - Check gpfs.cnfsReboot: no
2016-05-04 15:28:51,901 - Check gpfs.quorum.nodes: [u'c902f09x02', u'c902f09x08', u'c902f09x03']
2016-05-04 15:28:51,901 - Check gpfs.disableInodeUpdateOnFdatasync: yes
2016-05-04 15:28:51,901 - Check gpfs.forceLogWriteOnFdatasync: no
2016-05-04 15:28:51,901 - Check gpfs.restripeOnDiskFailure: yes
2016-05-04 15:28:51,902 - Check gpfs.unmountOnDiskFail: meta
2016-05-04 15:28:51,902 - Check gpfs.readReplicaPolicy: local
2016-05-04 15:28:51,902 - Check gpfs.Pagepool: 25
2016-05-04 15:28:51,902 - Run command: /usr/lpp/mmfs/bin/mmgetstate -N c902f09x03,c902f09x02,c902f09x04,c902f09x08


2016-05-04 15:28:53,493 - Status: 0, Output:
Node number Node name GPFS state
------------------------------------------
1 c902f09x02 active
2 c902f09x08 active
3 c902f09x03 active
4 c902f09x04 active
2016-05-04 15:28:53,493 - ========== All nodes are up. ===========
2016-05-04 15:28:53,493 - Service_Check Hostname c902f09x02
2016-05-04 15:28:53,493 - Run command: /usr/lpp/mmfs/bin/mmlsmount bigpfs -L
2016-05-04 15:28:54,006 - Status: 0, Output: File system bigpfs is mounted on 4 nodes:
172.16.0.66 c902f09x02
172.16.0.72 c902f09x08
172.16.0.67 c902f09x03
172.16.0.68 c902f09x04
2016-05-04 15:28:54,006 - ========== The filesystem is mounted on all nodes. ===========
2016-05-04 15:28:54,006 - Run command: /usr/lpp/mmfs/bin/mmdsh -N c902f09x03,c902f09x02,c902f09x04,c902f09x08 "/usr/lpp/mmfs/bin/mmhadoopctl connector getstate 2>/dev/null" 2>/dev/null | grep -c "running as process"
2016-05-04 15:28:55,158 - Status: 0, Output: 20
2016-05-04 15:28:55,168 - ExecuteHadoop['fs -mkdir /tmp'] {'bin_dir': '/usr/iop/current/hadoop-client/bin', 'conf_dir': '/usr/iop/current/hadoop-client/conf', 'logoutput': True, 'not_if': "ambari-sudo.sh su hdfs -l -s /bin/bash -c '/usr/iop/current/hadoop-client/bin/hadoop --config /usr/iop/current/hadoop-client/conf fs -test -e /tmp'", 'try_sleep': 3, 'tries': 5, 'user': 'hdfs'}
2016-05-04 15:28:56,830 - Skipping ExecuteHadoop['fs -mkdir /tmp'] due to not_if
2016-05-04 15:28:56,830 - ExecuteHadoop['fs -chmod 777 /tmp'] {'bin_dir': '/usr/iop/current/hadoop-client/bin', 'conf_dir': '/usr/iop/current/hadoop-client/conf', 'logoutput': True, 'try_sleep': 3, 'tries': 5, 'user': 'hdfs'}
2016-05-04 15:28:56,831 - Execute['hadoop --config /usr/iop/current/hadoop-client/conf fs -chmod 777 /tmp'] {'logoutput': True, 'try_sleep': 3, 'environment': {}, 'tries': 5, 'user': 'hdfs', 'path': ['/usr/iop/current/hadoop-client/bin']}
2016-05-04 15:28:58,450 - ExecuteHadoop['fs -rm /tmp/id10ac4200_date280416; hadoop --config /usr/iop/current/hadoop-client/conf fs -put /etc/passwd /tmp/id10ac4200_date280416'] {'bin_dir': '/usr/iop/current/hadoop-client/bin', 'conf_dir': '/usr/iop/current/hadoop-client/conf', 'logoutput': True, 'try_sleep': 3, 'tries': 5, 'user': 'hdfs'}
2016-05-04 15:28:58,450 - Execute['hadoop --config /usr/iop/current/hadoop-client/conf fs -rm /tmp/id10ac4200_date280416; hadoop --config /usr/iop/current/hadoop-client/conf fs -put /etc/passwd /tmp/id10ac4200_date280416'] {'logoutput': True, 'try_sleep': 3, 'environment': {}, 'tries': 5, 'user': 'hdfs', 'path': ['/usr/iop/current/hadoop-client/bin']}
rm: `/tmp/id10ac4200_date280416': No such file or directory
2016-05-04 15:29:01,826 - ExecuteHadoop['fs -test -e /tmp/id10ac4200_date280416'] {'bin_dir': '/usr/iop/current/hadoop-client/bin', 'conf_dir': '/usr/iop/current/hadoop-client/conf', 'logoutput': True, 'try_sleep': 3, 'tries': 5, 'user': 'hdfs'}
2016-05-04 15:29:01,827 - Execute['hadoop --config /usr/iop/current/hadoop-client/conf fs -test -e /tmp/id10ac4200_date280416'] {'logoutput': True, 'try_sleep': 3, 'environment': {}, 'tries': 5, 'user': 'hdfs', 'path': ['/usr/iop/current/hadoop-client/bin']}

Upgrading IBM Spectrum Scale

Plan a cluster maintenance window and prepare for cluster downtime while upgrading IBM Spectrum Scale.

You can update the IBM Spectrum Scale PTF package through the Ambari server. Cross-release upgrades are not supported. The IBM Spectrum Scale update package and the IBM Spectrum Scale HDFS Transparency package are upgraded separately. Get the update packages from IBM Fix Central and extract them as stated in the IBM Spectrum Scale documentation.

1. Put all the update packages (PTF) into a Yum repository. If the Yum repository is not the existing IBM Spectrum Scale Yum repository path specified in Ambari, add the Yum repository URL to the Ambari Spectrum Scale configuration. From the dashboard, select Spectrum Scale > Configs tab > Advanced tab > Advanced gpfs-ambari-server-env > GPFS_REPO_URL to update the Yum repository path.

2. Go to the IBM Spectrum Scale Yum directory and rebuild the Yum database by using the createrepo command.

# createrepo .
Spawning worker 0 with 4 pkgs
Spawning worker 1 with 4 pkgs


Spawning worker 2 with 4 pkgs
Spawning worker 3 with 4 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

3. From the dashboard, select Actions > Stop All to stop all services.

4. From the dashboard, select Spectrum Scale > Service Actions > Upgrade_SpectrumScale.

FIGURE 31 AMBARI UPGRADE IBM SPECTRUM SCALE

5. From the dashboard, select Actions > Start All. IBM Spectrum Scale starts with the latest PTF packages.

6. Verify that the selected IBM Spectrum Scale PTF packages are installed on the nodes and the HDFS Transparency NameNode and DataNodes are up.

# xdsh c902f09x02,c902f09x03,c902f09x04,c902f09x08 "rpm -qa | grep gpfs"
** Check that the gpfs versions installed on the nodes are the updated ones **

# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
[root@c902f09x02 ~]# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate


c902f09x02.pok.stglabs.ibm.com: namenode running as process 17668.
c902f09x02.pok.stglabs.ibm.com: datanode running as process 19326.
c902f09x08.pok.stglabs.ibm.com: datanode running as process 16280.
c902f09x04.pok.stglabs.ibm.com: datanode running as process 6002.
c902f09x03.pok.stglabs.ibm.com: datanode running as process 2600.

Upgrading HDFS Transparency

You can update HDFS Transparency by using the Ambari server. The IBM Spectrum Scale update package and HDFS Transparency are upgraded separately.

1. Save the new IBM Spectrum Scale HDFS Transparency packages into the existing IBM Spectrum Scale Yum repository.

2. Go to the IBM Spectrum Scale Yum directory and rebuild the Yum database by running the createrepo command.

# createrepo .
Spawning worker 0 with 2 pkgs
Spawning worker 1 with 2 pkgs
Spawning worker 2 with 2 pkgs
Spawning worker 3 with 2 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete

3. From the dashboard, select Actions > Stop All to stop all services.

4. From the dashboard, select Spectrum Scale > Service Actions > Upgrade_Transparency.


5. From the dashboard, select Actions > Start All.

6. Check that the correct version of HDFS Transparency is installed and that the connector NameNode and DataNodes are functioning.

# rpm -qa | grep hdfs-protocol
gpfs.hdfs-protocol-2.7.<version>.x86_64

# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
c902f09x02.pok.stglabs.ibm.com: namenode running as process 6716.
c902f09x08.pok.stglabs.ibm.com: datanode running as process 5432.
c902f09x02.pok.stglabs.ibm.com: datanode running as process 8699.
c902f09x03.pok.stglabs.ibm.com: datanode running as process 7642.
c902f09x04.pok.stglabs.ibm.com: datanode running as process 9096.
c902f09x13.pok.stglabs.ibm.com: datanode running as process 8315.
#

Integrating HDFS Transparency

You must plan a cluster maintenance window and prepare for cluster downtime when integrating the HDFS

Transparency from native HDFS.

To integrate HDFS Transparency (GPFS Transparency Node) from native HDFS:

1. From the dashboard, select Actions > Stop All to stop all services.


2. From the Spectrum Scale dashboard, select Service Actions > Integrate_Transparency.

FIGURE 32 IBM SPECTRUM SCALE INTEGRATE TRANSPARENCY

3. On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

4. Log back in to the Ambari GUI.

5. Start all services from the Ambari GUI. The Hadoop cluster starts using IBM Spectrum Scale and HDFS Transparency.

Check the Spectrum Scale panel for the available GPFS Transparency Nodes.


On the HDFS dashboard, the NameNode and DataNodes are shown as available because the HDFS Transparency NameNode and DataNodes are now in use.

Command verification


[root@c902f09x02 ~]# /usr/lpp/mmfs/bin/mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 c902f09x02 active
2 c902f09x08 active
3 c902f09x03 active
4 c902f09x04 active
[root@c902f09x02 ~]#
[root@c902f09x02 ~]# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
c902f09x02.pok.stglabs.ibm.com: namenode running as process 6452.
c902f09x08.pok.stglabs.ibm.com: datanode running as process 15442.
c902f09x02.pok.stglabs.ibm.com: datanode running as process 8163.
c902f09x03.pok.stglabs.ibm.com: datanode running as process 20566.
c902f09x04.pok.stglabs.ibm.com: datanode running as process 14866.
[root@c902f09x02 ~]#

Cluster environment

When the IBM Spectrum Scale service is deployed, IBM Spectrum Scale is used instead of HDFS. IBM Spectrum Scale inherits the native HDFS configuration and adds additional changes for IBM Spectrum Scale to function correctly.

After IBM Spectrum Scale is deployed, a new HDFS configuration set, V2, is created and is visible in the HDFS UI panel > Configs tab.

Unintegrating Transparency

You must plan a cluster maintenance window and prepare for cluster downtime when unintegrating the HDFS

Transparency back to native HDFS.

1. From the dashboard, select Actions > Stop All to stop all services.

2. Select Spectrum Scale > Service Actions > Unintegrate_Transparency.


FIGURE 33 IBM SPECTRUM SCALE UNINTEGRATE TRANSPARENCY

3. On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

4. Log back in to the Ambari GUI.

5. Start all services from the Ambari GUI. The Hadoop cluster starts using native HDFS. The IBM Spectrum Scale service is not removed from the Ambari panel and is displayed in GREEN. IBM Spectrum Scale will function, but HDFS Transparency will not function.

NOTE: When unintegrated back to native HDFS, the HDFS configuration remains the same as the HDFS configuration set used by IBM Spectrum Scale prior to unintegration. If you must revert to the original HDFS configuration, go to the HDFS dashboard and make the configuration changes under the Configs tab.

Check the Spectrum Scale panel and note that the GPFS Transparency is not available.


In the Hosts dashboard, the HDFS Transparency hosts show the GPFS Transparency nodes as down because native HDFS is now in effect.

The HDFS Transparency hosts display the GPFS Transparency component as STOPPED.


Command verification

[root@c902f09x02 ~]# /usr/lpp/mmfs/bin/mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 c902f09x02 active
2 c902f09x08 active
3 c902f09x03 active


4 c902f09x04 active
[root@c902f09x02 ~]#
[root@c902f09x02 ~]# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
c902f09x02.pok.stglabs.ibm.com: namenode is not running.
c902f09x08.pok.stglabs.ibm.com: datanode is not running.
c902f09x02.pok.stglabs.ibm.com: datanode is not running.
c902f09x04.pok.stglabs.ibm.com: datanode is not running.
c902f09x03.pok.stglabs.ibm.com: datanode is not running.
[root@c902f09x02 ~]#

Cluster environment

After the Spectrum Scale Unintegrate_Transparency action, native HDFS is in effect. The configuration from IBM Spectrum Scale before the unintegrate phase is still in effect; it does not affect the native HDFS functionality. If you must revert to the original native HDFS configuration, go to the HDFS dashboard and select the V1 configuration version under the Configs tab.

H. Ambari Node management

Adding a node

See Preparing the environment to prepare the new nodes.

Note: If you are adding new nodes to an existing cluster and the nodes being added already have IBM Spectrum Scale installed on them, ensure that the new nodes are at the same version of IBM Spectrum Scale as the existing cluster. Do not mix GPFS nodes with different versions of IBM Spectrum Scale software in a GPFS cluster. If you add a new node with an inconsistent IBM Spectrum Scale version, the installation on the new node will fail, although the failed node might still be displayed in the cluster list in Ambari. To delete the failed node from the cluster in Ambari, see Delete Node.
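A quick version-consistency check before adding a node (a sketch; the query is run on the existing cluster and then on the node to be added):

# /usr/lpp/mmfs/bin/mmdsh -N all "rpm -q gpfs.base"
# ssh c902f09x13 "rpm -q gpfs.base"

The reported gpfs.base versions must match before the new node is added.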

The new nodes can then be added to the Ambari cluster by using the Ambari web interface.

1. Adding New Hosts

From the dashboard, select Hosts > Actions > Add New Hosts


2. Specify the new node information, and click Registration and Confirm.

Note: The SSH Private Key is the key of the user on the Ambari Server.

3. Confirm Hosts panel

If the host check fails, identify the failure by clicking the link and follow the directions in the pop-up window.


4. Select the services that you want to install on the new node.

For example, you can check the options for DataNode, GPFS Transparency Node, and GPFS Node.

5. If several configuration groups are created, select one of them for the new node.


6. Review the information and start the deployment by clicking Deploy.

7. Install, Start and Test panel


8. After the Install, Start and Test wizard finishes, click Complete.

9. The new node is added to the Ambari cluster.

From the Hosts dashboard, the new node is added to the host list.


10. Restart IBM Spectrum Scale

Plan a cluster maintenance window and prepare for cluster downtime when restarting IBM Spectrum Scale.

a) From the dashboard, select Spectrum Scale > Service Actions > Stop.

b) Select Spectrum Scale > Service Actions > Start.

Restarting HDFS Transparency adds the new nodes into the slaves file, /usr/lpp/mmfs/hadoop/etc/hadoop/slaves:

c902f09x03.pok.stglabs.ibm.com

c902f09x13.pok.stglabs.ibm.com

c902f09x02.pok.stglabs.ibm.com

c902f09x04.pok.stglabs.ibm.com

c902f09x08.pok.stglabs.ibm.com

Note: Ambari does not create NSDs on the new nodes. To create IBM Spectrum Scale NSDs and add them to the file system, follow the ADD section in Deploying a big data solution using IBM Spectrum Scale. A sketch of the general shape of that procedure follows.
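For orientation only, a minimal sketch of what creating and adding an NSD for the new node involves; it is not the referenced ADD procedure. The device name (/dev/sdb), NSD name (gpfs17nsd), and failure group are assumptions, while the file system name bigpfs and the new host come from this example cluster:

# Describe the new disk in an NSD stanza file
cat > /tmp/newnode.nsd <<EOF
%nsd:
  device=/dev/sdb
  nsd=gpfs17nsd
  servers=c902f09x13.pok.stglabs.ibm.com
  usage=dataAndMetadata
  failureGroup=1
EOF

# Create the NSD, then add it to the existing file system
/usr/lpp/mmfs/bin/mmcrnsd -F /tmp/newnode.nsd
/usr/lpp/mmfs/bin/mmadddisk bigpfs -F /tmp/newnode.nsd

Consult the referenced document for pool, failure group, and replication considerations before running anything like this on a production cluster.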

Check the cluster information:

[root@c902f09x02 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         bigpfs.pok.stglabs.ibm.com
  GPFS cluster id:           13434414655446927443
  GPFS UID domain:           bigpfs.pok.stglabs.ibm.com
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name                IP address   Admin node name                 Designation
------------------------------------------------------------------------------------------------
   1   c902f09x02.pok.stglabs.ibm.com  172.16.0.66  c902f09x02.pok.stglabs.ibm.com  quorum
   2   c902f09x08.pok.stglabs.ibm.com  172.16.0.72  c902f09x08.pok.stglabs.ibm.com
   3   c902f09x03.pok.stglabs.ibm.com  172.16.0.67  c902f09x03.pok.stglabs.ibm.com  quorum
   4   c902f09x04.pok.stglabs.ibm.com  172.16.0.68  c902f09x04.pok.stglabs.ibm.com  quorum
   5   c902f09x13.pok.stglabs.ibm.com  172.16.0.77  c902f09x13.pok.stglabs.ibm.com
[root@c902f09x02 ~]#

[root@c902f09x02 ~]# /usr/lpp/mmfs/bin/mmgetstate -a

 Node number  Node name   GPFS state
------------------------------------------
       1      c902f09x02  active
       2      c902f09x08  active
       3      c902f09x03  active
       4      c902f09x04  active
       5      c902f09x13  active
[root@c902f09x02 ~]#

[root@c902f09x02 ~]# /usr/lpp/mmfs/bin/mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 bigpfs        gpfs1nsd     c902f09x03.pok.stglabs.ibm.com
 bigpfs        gpfs2nsd     c902f09x04.pok.stglabs.ibm.com
 bigpfs        gpfs3nsd     c902f09x08.pok.stglabs.ibm.com
 bigpfs        gpfs4nsd     c902f09x02.pok.stglabs.ibm.com
 bigpfs        gpfs5nsd     c902f09x02.pok.stglabs.ibm.com
 bigpfs        gpfs6nsd     c902f09x03.pok.stglabs.ibm.com
 bigpfs        gpfs7nsd     c902f09x08.pok.stglabs.ibm.com
 bigpfs        gpfs8nsd     c902f09x03.pok.stglabs.ibm.com
 bigpfs        gpfs9nsd     c902f09x02.pok.stglabs.ibm.com
 bigpfs        gpfs10nsd    c902f09x08.pok.stglabs.ibm.com
 bigpfs        gpfs11nsd    c902f09x03.pok.stglabs.ibm.com
 bigpfs        gpfs12nsd    c902f09x08.pok.stglabs.ibm.com
 bigpfs        gpfs13nsd    c902f09x02.pok.stglabs.ibm.com
 bigpfs        gpfs14nsd    c902f09x04.pok.stglabs.ibm.com
 bigpfs        gpfs15nsd    c902f09x04.pok.stglabs.ibm.com
 bigpfs        gpfs16nsd    c902f09x04.pok.stglabs.ibm.com
[root@c902f09x02 ~]#

[root@c902f09x13 ~]# mount | grep bigpfs
/dev/bigpfs on /bigpfs type gpfs (rw,relatime)
[root@c902f09x13 ~]#

[root@c902f09x02 ~]# /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
c902f09x02.pok.stglabs.ibm.com: namenode running as process 18463.
c902f09x13.pok.stglabs.ibm.com: datanode running as process 1994.
c902f09x08.pok.stglabs.ibm.com: datanode running as process 6745.
c902f09x02.pok.stglabs.ibm.com: datanode running as process 20183.
c902f09x03.pok.stglabs.ibm.com: datanode running as process 30854.
c902f09x04.pok.stglabs.ibm.com: datanode running as process 23493.
[root@c902f09x02 ~]#

Deleting a node

Decommissioning a DataNode and deleting the GPFS Master, GPFS nodes, and GPFS Transparency nodes are not supported through the Ambari GUI with the installed GPFS Ambari integration module version 4.1-0. Instead, from the Ambari GUI, you can stop all services for the node or put the node into maintenance mode.

However, the node can be deleted manually. See Removing a Host (2.1.0) in the Apache Ambari Confluence wiki.

Moving a NameNode

The IBM Spectrum Scale HDFS Transparency NameNode is stateless and does not maintain FSimage-like information.

The Move NameNode option is not supported by the Ambari HDFS GUI when HDFS Transparency is integrated with the installed GPFS Ambari integration module version 4.1-0.

To manually move the NameNode:

1. From the dashboard, select Actions > Stop All.

2. On the Ambari server host, run the following command:

python /var/lib/ambari-server/resources/MoveNameNodeTransparency.py

Follow the command prompts and type the required input.

3. After the command completes, a restart required icon is displayed next to the HDFS service. Restart the HDFS service.

4. From the dashboard, select Actions > Start All.

When HDFS Transparency is integrated, the Move NameNode option sets the new NameNode to be the same value for both the HDFS NameNode and the HDFS Transparency NameNode.

For example:

Environment:
HDFS Transparency = Integrated
HDFS NameNode = c902f09x02
HDFS Transparency NameNode = c902f09x02

Execute Move NameNode:

Current NameNode (c902f09x02) will be moved to a new NameNode (c902f09x03).

Environment:
HDFS Transparency = Integrated
HDFS NameNode = c902f09x03
HDFS Transparency NameNode = c902f09x03

If you unintegrate HDFS Transparency after the Move NameNode procedure was run, the NameNode value for both the HDFS NameNode and the HDFS Transparency NameNode remains the value set by the Move NameNode procedure.

The Move NameNode wizard in the HDFS page is displayed after HDFS Transparency is unintegrated. Do not use this wizard if the Move NameNode procedure was run while HDFS Transparency was integrated; instead, use the manual steps below to start HDFS.

Perform the manual steps to correctly start HDFS:

1. Unintegrate HDFS Transparency.

a) From the dashboard, select Actions > Stop All to stop all services.
b) From the dashboard, select Spectrum Scale > Service Actions > Unintegrate_Transparency.
c) On the Ambari server node, run the ambari-server restart command to restart the Ambari server.

2. Manually enable services based on the HA status.

If HA is not enabled, perform the following steps:

For example:
NameNode being moved = c902f09x02
New NameNode (set by running Move NameNode while HDFS Transparency was integrated) = c902f09x03

a) Copy the contents of /hadoop/hdfs/namenode from the NameNode being moved (c902f09x02) to /hadoop/hdfs/namenode on the new NameNode (c902f09x03).
b) On the new NameNode (c902f09x03), run the following commands:

chown -R hdfs:hadoop /hadoop/hdfs/namenode
mkdir -p /var/lib/hdfs/namenode/formatted

If HA is enabled, perform the following steps:

For example:
NameNode being moved = c902f09x02
New NameNode (set by running Move NameNode while HDFS Transparency was integrated) = c902f09x04
NameNode not moved = c902f09x03

a) Start the ZooKeeper Server from the Ambari GUI.
b) Start the NameNode that was not moved (c902f09x03) from the Hosts dashboard by selecting the NameNode that was not moved > Summary tab > Components > NameNode / HDFS (Active or Standby) > Start. This starts only the NameNode. Do not start any other services or hosts.
c) Format the ZKFC on the NameNode that was not moved (c902f09x03) by running the following command:

sudo su hdfs -l -c 'hdfs zkfc -formatZK'

d) On the new NameNode (c902f09x04), run the following command:

sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'

3. Log in to Ambari.

4. From the dashboard, select Actions > Start All. The Hadoop cluster will now use the native HDFS.

I. Collecting the snap data

You can collect the IBM Spectrum Scale snap data from the Ambari GUI. The command is run by the IBM Spectrum Scale Master, and the snap data is saved to /var/log/ambari.gpfs.snap.<timestamp> on the IBM Spectrum Scale Master node.


You can also override the default behavior of this snap by providing the arguments to be given to the gpfs.snap command in the file /var/lib/ambari-server/resources/gpfs.snap.args.

By default, the IBM Spectrum Scale Master runs the following command:

/usr/lpp/mmfs/bin/gpfs.snap -d /var/log/ambari.gpfs.snap.<timestamp> -N <all nodes> --check-space --timeout 600

Where <all nodes> is the list of nodes in the IBM Spectrum Scale cluster and in the Ambari cluster. The external nodes in a shared cluster, such as ESS servers, are not included.

To override the default arguments, specify the arguments to be passed to gpfs.snap in /var/lib/ambari-server/resources/gpfs.snap.args. For example, to write the snap data to a different location, collect snap data from all nodes in the cluster, and increase the timeout, provide a gpfs.snap.args file similar to the following example:

# cat /var/lib/ambari-server/resources/gpfs.snap.args
-d /root/gpfs.snap.out -a --timeout 1200

You can see the output from the snap command, including the directory to which the snap data was written, by looking at the output file from Ambari.

FIGURE 34 AMBARI COLLECT SNAP DATA


J. Uninstalling Ambari IOP stack

To uninstall the Ambari IOP stack and all its services, do the following:

From the dashboard, select Actions > Stop All to stop all services.

Ambari Server node

On the Ambari server node, run the following commands:

# Stop the server
ambari-server stop

#------------------------------------------------------------------#
# If this node is also the Ambari agent, then run the following commands:
# Stop the agent
ambari-agent stop

# Run the cleanup script, which is part of the ambari_agent python module
# python<versionNumber>, e.g. python2.6
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users -f /etc/ambari-agent/conf/HostCleanup.ini,/etc/ambari-agent/conf/HostCleanup_Custom_Actions.ini
#------------------------------------------------------------------#

# Remove all the Ambari packages and directories
yum erase -y ambari-*
rm -rf /var/lib/ambari-server
rm -rf /etc/ambari* /usr/lib/python2.6/site-packages/ambari* /usr/lib/python2.6/site-packages/resource-management

#------------------------------------------------------------------#
# If this node is also the Ambari agent, then run the following command:
rm -rf /var/lib/ambari-agent /usr/lib/python2.6/site-packages/ambari_agent
#------------------------------------------------------------------#

# Remove the database, if possible
yum erase -y postgresql-*
yum erase -y mysql mysql-devel mysql-server
rm -rf /var/lib/mysql/*
rm -rf /var/lib/pgsql*

# Remove all the service directories
rm -rf /usr/iop/ /hadoop /etc/hadoop/ /etc/hive /etc/hbase/ /etc/oozie/ /etc/zookeeper/ /tmp/spark

# Unlink all the packages
unlink /usr/lib/python2.6/site-packages/ambari_commons
unlink /usr/lib/python2.6/site-packages/resource_management
unlink /usr/lib/python2.6/site-packages/ambari_jinja2
unlink /usr/lib/python2.6/site-packages/ambari_simplejson

yum clean all; yum makecache

Ambari Agent nodes

On each of the Ambari Agent nodes, run the following commands:

## If this node is also the Ambari server node, then follow the Ambari server node instructions ##
## Otherwise, follow the instructions below to clean up the Ambari agent ##

# Stop the agent
ambari-agent stop

# Run the cleanup script, which is part of the ambari_agent python module
# python<versionNumber>, e.g. python2.6
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --skip=users -f /etc/ambari-agent/conf/HostCleanup.ini,/etc/ambari-agent/conf/HostCleanup_Custom_Actions.ini

# Remove all the Ambari packages and directories
yum erase -y ambari-*
rm -rf /var/lib/ambari-agent /usr/lib/python2.6/site-packages/ambari_agent /usr/lib/python2.6/site-packages/ambari* /usr/lib/python2.6/site-packages/resource-management

# Remove all the service directories
rm -rf /usr/iop/ /hadoop /etc/hadoop/ /etc/hive /etc/hbase/ /etc/oozie/ /etc/zookeeper/ /tmp/spark

# Unlink all the packages
unlink /usr/lib/python2.6/site-packages/ambari_commons
unlink /usr/lib/python2.6/site-packages/resource_management
unlink /usr/lib/python2.6/site-packages/ambari_jinja2
unlink /usr/lib/python2.6/site-packages/ambari_simplejson

# Remove the agent ambari.repo; a reinstall of Ambari IOP will copy it from the Ambari server to the agent nodes
rm -rf /etc/yum.repos.d/ambari.repo

yum clean all; yum makecache


Note: If the IBM Spectrum Scale integration module has been installed, the IBM Spectrum Scale packages and directories are not removed. If NSDs and partitions have been created, they are not removed. To clean up IBM Spectrum Scale, see the IBM Spectrum Scale documentation in the Knowledge Center.

K. Resources

Resource List

WIKI: IBM Open Platform with Apache Hadoop

WIKI: 2nd generation HDFS Transparency Protocol

WIKI: Troubleshooting HDFS Transparency

IBM Knowledge Center: InfoSphere BigInsights V4.1

IBM Knowledge Center: Platform Symphony v7.1

IBM Knowledge Center: IBM Spectrum Scale v4.1.1

IBM Knowledge Center: IBM Spectrum Scale v4.2


FAQ

1. What IBM Spectrum Scale edition is required for the Ambari deployment?

If you want to perform a new installation (including cluster creation, file system creation, and so on), you need to use the Standard or Advanced edition because the IBM Spectrum Scale file system policy is used by default. If you only have the Express Edition, select the Deploy the IOP over existing IBM Spectrum Scale cluster mode.

2. Why do I fail in registering the Ambari agent?

You can run ps -elf | grep ambari on the failing agent node to see what it is running. Usually, while the agent node is registering, there should be nothing under /etc/yum.repos.d/. If there is an additional repository that does not work because of an incorrect path or yum server address, the Ambari agent register operation will fail.

3. Which yum repository must be under /etc/yum.repos.d?

Before registering, on the Ambari server node, under /etc/yum.repos.d there is only the one Ambari repository file that you create in section 3.1. On the Ambari agents, there must be no repository files related to Ambari. After the Ambari agent has registered successfully, the Ambari server copies the Ambari repository to all Ambari agents. After that, the Ambari server creates the IOP and IOP-UTILS repositories on the Ambari server and agents, according to your specification in the Ambari GUI in section 4.3.

If you interrupt the Ambari deployment, you must clean up these files before starting Ambari the next time, especially when you specify a different IBM Spectrum Scale, IOP, or IOP-UTILS yum URL; a cleanup sketch follows.
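A minimal cleanup sketch for an agent node after an interrupted deployment; the exact IOP repository file names vary by release, so the patterns below are assumptions to adapt to what ls shows:

# Inspect which repository files are present
ls -l /etc/yum.repos.d/

# Remove the Ambari-related repository files (file names are assumptions)
rm -f /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/IOP*.repo

# Rebuild the yum metadata
yum clean all; yum makecache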

4. Must all nodes have the same root password?

No, this is not necessary. You only need to specify the SSH key file for root on the Ambari server.

5. Why did the MapReduce services fail?

Look for the ambari-qa folder in the DFS user directory. If it does not exist, create it. If this step is skipped, the MapReduce service check fails with the /user/ambari-qa path not found error.

As root:

mkdir <gpfs mount>/user/ambari-qa
chown ambari-qa.hadoop /gpfs/hadoopfs/user/ambari-qa

In this example, the GPFS mount point is /gpfs/hadoopfs.


6. How to check the superuser and the supergroup?

If you are using connector version hadoop-gpfs-2.7.0-3 or later, additional security controls are added to support multiple user groups. Normally, just one superuser, “hdfs”, and one supergroup, “hadoop”, are used. Control over the IDs that can access the distributed file system via HDFS is enforced by the permissions and ACLs defined on /var/run/ibm_bigpfs_gcd. To see the superuser and the supergroup:

ls -alt /var/run/ibm_bigpfs_gcd
srw-------. 1 hdfs hadoop 0 Dec 10 21:17 /var/run/ibm_bigpfs_gcd

7. How to set user permissions in the filesystem?

Create the following directories to support the new connector, if they do not already exist:

mkdir /var/mmfs/bi; chown hdfs:hadoop /var/mmfs/bi; chmod 660 /var/mmfs/bi

In this example, the HDFS superuser is hdfs and the super group is hadoop.

a) To allow a specific set of users to access the DFS via ACLs, perform the following on all nodes:

Note: For HDFS ACL support, install the following RPM packages on all nodes: acl and libacl to enable Hadoop ACL support, and libattr to enable Hadoop extended attributes.

Note: Using fine-grained control requires extensive testing for your applications. If a user ID is not authorized to see the DFS through the HDFS APIs, the error will be:

java.io.IOException: GPFSC00023E: Unable to establish communication with file system
at org.apache.hadoop.fs.gpfs.GeneralParallelFileSystem.lockNativeRootAction(GeneralParallelFileSystem.java:2786)
at org.apache.hadoop.fs.gpfs.GeneralParallelFileSystem.getFileStatus(GeneralParallelFileSystem.java:799)

On every node, run:

yum install -y acl libacl libattr

To see the currently set ACLs, run:

getfacl /var/run/ibm_bigpfs_gcd
# file: ibm_bigpfs_gcd
# owner: root
# group: root
user::rwx
group::---
other::---

b) To allow hdfs (the superuser in HDFS) to have full access to the DFS, run the following command on all nodes:

setfacl -m "u:hdfs:rwx" /var/run/ibm_bigpfs_gcd

c) To allow any service ID that is a member of the hadoop group (for example, the Hadoop service IDs) to have full access to the DFS, run the following command on all nodes:

setfacl -m "g:hadoop:rwx" /var/run/ibm_bigpfs_gcd

8. Why is the Ambari GUI displaying the Service down message when the service process is active on the target node?

a) Check whether the file /var/lib/ambari-agent/data/structured-out-status.json has a length of 0 bytes. If it does, remove the structured-out-status.json file.

b) Check the space usage of the file system where the json file resides. Free some space if the file system is full.

A sketch of both checks follows.
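A minimal sketch of both checks on an agent node, assuming a bash shell; the path comes from the answer above:

# Remove the status file only if it exists and is zero length
f=/var/lib/ambari-agent/data/structured-out-status.json
if [ -f "$f" ] && [ ! -s "$f" ]; then
    rm -f "$f"
fi

# Check free space on the file system holding the file
df -h /var/lib/ambari-agent/data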

9. Why am I unable to connect to the Ambari Server through the web browser?

If you cannot connect to the Ambari Server through the web browser, check whether the following message is displayed in the Ambari Server log, which is located in /var/log/ambari-server:

WARN [main] AbstractConnector:335 - insufficient threads configured for SelectChannelConnector@0.0.0.0:8080

The size of the thread pool can be increased to match the number of CPUs on the node where the Ambari Server is running.

For example, if you have 160 CPUs, add the following properties to /etc/ambari-server/conf/ambari.properties:

server.execution.scheduler.maxThreads=160
agent.threadpool.size.max=160
client.threadpool.size.max=160

A sketch that derives the value from the CPU count automatically follows.
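A small sketch that sizes the pools to the detected CPU count instead of hard-coding 160, assuming the standard ambari.properties path from the answer above; nproc is from GNU coreutils, and the Ambari server must be restarted afterward:

# Append thread pool sizes matching the CPU count
n=$(nproc)
cat >> /etc/ambari-server/conf/ambari.properties <<EOF
server.execution.scheduler.maxThreads=$n
agent.threadpool.size.max=$n
client.threadpool.size.max=$n
EOF

ambari-server restart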

10. If the Ambari GUI stops functioning, how do I fix it?

The Ambari GUI might stop functioning due to unresolved exception handling in app.js in Ambari version 2.1.0. A cause of this error can be a service returning an error that app.js cannot handle.

Perform one of the following actions to resolve this issue:

Restart the Ambari server from the Ambari server node:

# /usr/sbin/ambari-server restart

Use a different browser to log in to the Ambari server.

Restart Metrics by using the Ambari REST APIs from the Ambari server node.

Replace admin, $PASSWORD, AMBARI_SERVER_HOST, and CLUSTER_NAME with the corresponding values for the environment, where admin:$PASSWORD is the admin user ID and password.

To stop:

curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop Ambari Metrics via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/AMBARI_METRICS

To start:

curl -u admin:$PASSWORD -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Start Ambari Metrics via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/AMBARI_METRICS
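To confirm the result of either call, a query such as the following can be used; the fields filter is a standard Ambari REST API feature, and the placeholders are the same as above:

# Show the current state of the AMBARI_METRICS service (INSTALLED = stopped, STARTED = running)
curl -s -u admin:$PASSWORD -H 'X-Requested-By: ambari' 'http://AMBARI_SERVER_HOST:8080/api/v1/clusters/CLUSTER_NAME/services/AMBARI_METRICS?fields=ServiceInfo/state'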

11. Oozie fails to start after installation.

Error message:

resource_management.core.exceptions.Fail: Execution of 'cd /var/tmp/oozie && /usr/iop/current/oozie-server/bin/oozie-start.sh' returned 255. WARN: Use of this script is deprecated; use 'oozied.sh start' instead

Check that the IOP-UTILS 1.1 package is used with the IOP 4.1 package. Note that IOP-UTILS 1.2 is not compatible with IOP 4.1.

12. Hive service check fails.

Restart Hive and run the service check again.

13. Refer to the 2nd generation HDFS Transparency troubleshooting wiki page for additional known problem determination.


Figures and Tables

Figure 1 ambari IBM Spectrum Scale hadoop local cache file stanza ................................................................... 35

Figure 2 ambari iop login ....................................................................................................................................... 44

Figure 3 ambari iop welcome page ....................................................................................................................... 45

Figure 4 ambari iop cluster name .......................................................................................................................... 45

Figure 5 ambari iop select stack ............................................................................................................................ 46

Figure 6 ambari iop install options – host list........................................................................................................ 48

Figure 7 ambari iop confirm hosts ......................................................................................................................... 49

Figure 8 ambari iop choose services ..................................................................................................................... 50

Figure 9 ambari iop assign masters ....................................................................................................................... 51

Figure 10 ambari iop assign slaves and clients ...................................................................................................... 52

Figure 11 ambari iop customize service iop tabs .................................................................................................. 53

Figure 12 ambari iop deployment review ............................................................................................................. 54

Figure 13 ambari iop install, start and test ............................................................................................................ 55

Figure 14 ambari iop summary .............................................................................................................................. 55

Figure 15 ambari iop main cluster view ................................................................................................................ 56

Figure 16 ambari iop stop all services ................................................................................................................... 59

Figure 17 ambari add services ............................................................................................................................... 62

Figure 18 ambari IBM Spectrum Scale service ...................................................................................................... 63

Figure 19 ambari IBM Spectrum Scale assign masters .......................................................................................... 64

Figure 20 ambari IBM Spectrum Scale assign slaves and clients ........................................................................... 65

Figure 21 ambari IBM Spectrum Scale data and metadata replicas ..................................................................... 68

Figure 22 ambari IBM Spectrum Scale customized services ................................................................................. 69

Figure 23 ambari IBM Spectrum Scale standard configurations ........................................................................... 70

Figure 24 ambari IBM Spectrum Scale advanced configurations .......................................................................... 71

Figure 25 ambari IBM Spectrum Scale review ....................................................................................................... 72

Figure 26 ambari IBM Spectrum Scale install, start and test ................................................................................ 73

Figure 27 ambari IBM Spectrum scale summary ................................................................................................... 73

Figure 28 ambari nsd stanza .................................................................................................................................. 86

Figure 29 ambari rack mapping ............................................................................................................................. 88

Figure 30 IBM Spectrum Scale Service Actions ..................................................................................................... 98

Figure 31 ambari upgrade IBM Spectrum Scale .................................................................................................. 100

Figure 32 IBM Spectrum Scale Integrate Transparency ...................................................................................... 103

Figure 33 IBM Spectrum Scale Unintegrate Transparency .................................................................................. 106

Figure 34 ambari collect snap data ..................................................................................................................... 119

Table 1 BigInsights Ambari packages..................................................................................................................... 14

Table 2 IOP packages ............................................................................................................................................. 15

Table 3 IBM Spectrum Scale Packages .................................................................................................................. 19

Table 4 HDFS Transparency and Ambari integration module ............................................................................... 19


Table 5 IBM Spectrum Scale editions .................................................................................................................... 29

Table 6 IBM Spectrum Scale checklist parameters ............................................................................................... 67

Table 7 native hdfs and IBM Spectrum Scale differences ..................................................................................... 81

Table 8 IBM Spectrum Scale partitioning function matrix .................................................................................... 89


Notices

This information was developed for products and services that are offered in the USA.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.


IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.


Each copy or any portion of these sample programs or any derivative work must include a copyright notice as follows:

Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. 2016. All rights reserved.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" (www.ibm.com/legal/copytrade.shtml). Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Terms and conditions for product documentation

Permissions for the use of these publications are granted subject to the following terms and conditions.

Applicability
These terms and conditions are in addition to any terms of use for the IBM website.

Personal use
You may reproduce these publications for your personal, noncommercial use provided that all proprietary notices are preserved. You may not distribute, display or make derivative work of these publications, or any portion thereof, without the express consent of IBM.

Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that all proprietary notices are preserved. You may not make derivative works of these publications, or reproduce, distribute or display these publications or any portion thereof outside your enterprise, without the express consent of IBM.

Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either express or implied, to the publications or any information, data, software or other intellectual property contained therein.

IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the publications is detrimental to its interest or, as determined by IBM, the above instructions are not being properly followed.

You may not download, export or re-export this information except in full compliance with all applicable laws and regulations, including all United States export laws and regulations.


IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.

IBM Online Privacy Statement

IBM Software products, including software as a service solutions, (“Software Offerings”) may use cookies or other technologies to collect product usage information, to help improve the end user experience, to tailor interactions with the end user, or for other purposes. In many cases no personally identifiable information is collected by the Software Offerings. Some of our Software Offerings can help enable you to collect personally identifiable information. If this Software Offering uses cookies to collect personally identifiable information, specific information about this offering’s use of cookies is set forth below.

This Software Offering does not use cookies or other technologies to collect personally identifiable information.

If the configurations deployed for this Software Offering provide you as customer the ability to collect personally identifiable information from end users via cookies and other technologies, you should seek your own legal advice about any laws applicable to such data collection, including any requirements for notice and consent.

For more information about the use of various technologies, including cookies, for these purposes, see IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at http://www.ibm.com/privacy/details in the section entitled “Cookies, Web Beacons and Other Technologies”, and the “IBM Software Products and Software-as-a-Service Privacy Statement” at http://www.ibm.com/software/info/product-privacy.


Revisions

Number      Date                Comments
.01         5/01/16             Initial Raw Draft
.02 - .21   5/15/16 – 7/8/16    Comments from development, test and ID teams
1.0         7/8/16              GA version