
Kunpeng BoostKit for Big Data

Deployment Guide (Apache)

Issue 05

Date 2021-10-19

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Issue 05 (2021-10-19) Copyright © Huawei Technologies Co., Ltd. i


Contents

1 ZooKeeper Deployment Guide (CentOS 7.6 & openEuler 20.03)
1.1 Introduction
1.2 Environment Requirements
1.3 Configuring the Deployment Environment
1.4 Deploying ZooKeeper
1.4.1 Compiling and Decompressing ZooKeeper
1.4.2 Setting ZooKeeper Environment Variables
1.4.3 Modifying the ZooKeeper Configuration Files
1.4.4 Synchronizing the Configuration to Other Nodes
1.5 Running and Verifying ZooKeeper

2 Hadoop Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
2.1 Introduction
2.2 Environment Requirements
2.3 Configuring the Deployment Environment
2.4 Deploying ZooKeeper
2.4.1 Compiling and Decompressing ZooKeeper
2.4.2 Setting ZooKeeper Environment Variables
2.4.3 Modifying the ZooKeeper Configuration Files
2.4.4 Synchronizing the Configuration to Other Nodes
2.4.5 Running and Verifying ZooKeeper
2.5 Deploying Hadoop
2.5.1 Compiling and Decompressing Hadoop
2.5.2 Setting the Hadoop Environment Variables
2.5.3 Modifying the Hadoop Configuration File
2.5.4 Synchronizing the Configuration to Other Nodes
2.5.5 Starting the Hadoop Cluster
2.5.6 Verifying Hadoop
2.6 Troubleshooting

3 Flink Deployment Guide (CentOS 7.6 & openEuler 20.03)
3.1 Introduction
3.2 Environment Requirements
3.3 Configuring the Deployment Environment




3.4 Deploying ZooKeeper
3.4.1 Compiling and Decompressing ZooKeeper
3.4.2 Setting ZooKeeper Environment Variables
3.4.3 Modifying the ZooKeeper Configuration Files
3.4.4 Synchronizing the Configuration to Other Nodes
3.4.5 Running and Verifying ZooKeeper
3.5 Deploying Hadoop
3.5.1 Compiling and Decompressing Hadoop
3.5.2 Setting the Hadoop Environment Variables
3.5.3 Modifying the Hadoop Configuration File
3.5.4 Synchronizing the Configuration to Other Nodes
3.5.5 Starting the Hadoop Cluster
3.5.6 Verifying Hadoop
3.6 Deploying Flink (Flink on Yarn)
3.6.1 Obtaining Flink
3.6.2 Setting Flink Environment Variables
3.6.3 Modifying the Flink Configuration Files
3.6.4 Running and Verifying Flink
3.6.5 Stopping Flink

4 HBase Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
4.1 Introduction
4.2 Environment Requirements
4.3 Configuring the Deployment Environment
4.4 Deploying ZooKeeper
4.4.1 Compiling and Decompressing ZooKeeper
4.4.2 Setting ZooKeeper Environment Variables
4.4.3 Modifying the ZooKeeper Configuration Files
4.4.4 Synchronizing the Configuration to Other Nodes
4.4.5 Running and Verifying ZooKeeper
4.5 Deploying Hadoop
4.5.1 Compiling and Decompressing Hadoop
4.5.2 Setting the Hadoop Environment Variables
4.5.3 Modifying the Hadoop Configuration File
4.5.4 Synchronizing the Configuration to Other Nodes
4.5.5 Starting the Hadoop Cluster
4.5.6 Verifying Hadoop
4.6 Deploying HBase
4.6.1 Obtaining HBase
4.6.2 Setting HBase Environment Variables
4.6.3 Modifying the HBase Configuration Files
4.6.4 Synchronizing the Configuration to Other Nodes
4.6.5 Starting the HBase Cluster




4.6.6 (Optional) Stopping the HBase Cluster
4.6.7 Verifying HBase

5 Hive Deployment Guide (CentOS 7.6 & openEuler 20.03)
5.1 Introduction
5.2 Environment Requirements
5.3 Configuring the Deployment Environment
5.4 Deploying ZooKeeper
5.4.1 Compiling and Decompressing ZooKeeper
5.4.2 Setting ZooKeeper Environment Variables
5.4.3 Modifying the ZooKeeper Configuration Files
5.4.4 Synchronizing the Configuration to Other Nodes
5.4.5 Running and Verifying ZooKeeper
5.5 Deploying Hadoop
5.5.1 Compiling and Decompressing Hadoop
5.5.2 Setting the Hadoop Environment Variables
5.5.3 Modifying the Hadoop Configuration File
5.5.4 Synchronizing the Configuration to Other Nodes
5.5.5 Starting the Hadoop Cluster
5.5.6 Verifying Hadoop
5.6 Deploying Hive
5.6.1 Installing MariaDB
5.6.2 Obtaining Hive
5.6.3 Setting Hive Environment Variables
5.6.4 Modifying the Hive Configuration Files
5.6.5 Starting and Verifying Hive

6 Kafka Deployment Guide (CentOS 7.6 & openEuler 20.03)
6.1 Introduction
6.2 Environment Requirements
6.3 Configuring the Deployment Environment
6.4 Deploying ZooKeeper
6.4.1 Compiling and Decompressing ZooKeeper
6.4.2 Setting ZooKeeper Environment Variables
6.4.3 Modifying the ZooKeeper Configuration Files
6.4.4 Synchronizing the Configuration to Other Nodes
6.4.5 Running and Verifying ZooKeeper
6.5 Deploying Kafka
6.5.1 Obtaining Kafka
6.5.2 Setting Kafka Environment Variables
6.5.3 Modifying the Kafka Configuration Files
6.5.4 Verifying Kafka

7 Solr Deployment Guide (CentOS 7.6 & openEuler 20.03)




7.1 Introduction
7.2 Environment Requirements
7.3 Configuring the Deployment Environment
7.4 Deploying ZooKeeper
7.4.1 Compiling and Decompressing ZooKeeper
7.4.2 Setting ZooKeeper Environment Variables
7.4.3 Modifying the ZooKeeper Configuration Files
7.4.4 Synchronizing the Configuration to Other Nodes
7.4.5 Running and Verifying ZooKeeper
7.5 Deploying Solr
7.5.1 Obtaining Solr
7.5.2 Setting Solr Environment Variables
7.5.3 Copying the Solr Configuration
7.5.4 Modifying the Configuration
7.5.5 Synchronizing the Configuration to Other Nodes
7.5.6 Uploading the Configuration to the ZooKeeper Cluster
7.5.7 Running and Verifying Solr

8 Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
8.1 Introduction
8.2 Environment Requirements
8.3 Configuring the Deployment Environment
8.4 Deploying ZooKeeper
8.4.1 Compiling and Decompressing ZooKeeper
8.4.2 Setting ZooKeeper Environment Variables
8.4.3 Modifying the ZooKeeper Configuration Files
8.4.4 Synchronizing the Configuration to Other Nodes
8.4.5 Running and Verifying ZooKeeper
8.5 Deploying Hadoop
8.5.1 Compiling and Decompressing Hadoop
8.5.2 Setting the Hadoop Environment Variables
8.5.3 Modifying the Hadoop Configuration File
8.5.4 Synchronizing the Configuration to Other Nodes
8.5.5 Starting the Hadoop Cluster
8.5.6 Verifying Hadoop
8.6 Deploying Spark
8.6.1 Obtaining Spark
8.6.2 Setting Spark Environment Variables
8.6.3 Modifying the Spark Configuration Files
8.6.4 Running Spark (Standalone Mode)
8.6.4.1 Synchronizing the Configuration to Other Nodes
8.6.4.2 Starting the Spark Cluster
8.6.4.3 (Optional) Stopping the Spark Cluster




8.6.5 Running Spark (on Yarn Mode)
8.6.5.1 Installing Scala
8.6.5.2 Running in the Yarn-client Mode
8.6.5.3 Using HiBench to Verify the Functions

9 Storm Deployment Guide (CentOS 7.6 & openEuler 20.03)
9.1 Introduction
9.2 Environment Requirements
9.3 Configuring the Deployment Environment
9.4 Deploying ZooKeeper
9.4.1 Compiling and Decompressing ZooKeeper
9.4.2 Setting ZooKeeper Environment Variables
9.4.3 Modifying the ZooKeeper Configuration Files
9.4.4 Synchronizing the Configuration to Other Nodes
9.4.5 Running and Verifying ZooKeeper
9.5 Deploying Storm
9.5.1 Obtaining Storm
9.5.2 Setting Storm Environment Variables
9.5.3 Modifying the Storm Configuration File
9.5.4 Synchronizing the Configuration to Other Nodes
9.5.5 Running and Verifying Storm

A Change History




1 ZooKeeper Deployment Guide (CentOS 7.6 & openEuler 20.03)

1.1 Introduction

1.2 Environment Requirements

1.3 Configuring the Deployment Environment

1.4 Deploying ZooKeeper

1.5 Running and Verifying ZooKeeper

1.1 Introduction

ZooKeeper Overview

This document describes the ZooKeeper deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

1.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a ZooKeeper cluster.

Cluster Environment

For example, the cluster has nodes 1 to 4. Table 1-1 lists the data plan of each node.

Table 1-1 Cluster data plan

Node 1: IPaddress1
Node 2: IPaddress2
Node 3: IPaddress3
Node 4: IPaddress4
Drives (all nodes): system drive 1 x 4 TB HDD; data drives 12 x 4 TB HDD
OS & JDK (all nodes): CentOS 7.6 & OpenJDK jdk8u252-b09


Software Planning

Table 1-2 lists the software plan of each node in the cluster.

Table 1-2 Software plan

Node Services

Node 1 -

Node 2 QuorumPeerMain

Node 3 QuorumPeerMain

Node 4 QuorumPeerMain

1.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
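The same mapping can be appended in one shot with a heredoc. This is a sketch only: it writes to a scratch file standing in for /etc/hosts, and the IPaddressN values are placeholders from the cluster plan above.

```shell
# Sketch: HOSTS is a scratch file standing in for /etc/hosts.
# IPaddress1..IPaddress4 are placeholders; substitute the real node IPs.
HOSTS=$(mktemp)
cat >> "$HOSTS" <<'EOF'
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
EOF
cat "$HOSTS"
```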

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
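Because the key must be copied to every node, a loop saves repetition. The sketch below is a dry run: it only prints the ssh-copy-id commands (running them needs interactive passwords) and assumes the host names set in Step 2.

```shell
# Dry run: prints one ssh-copy-id command per node instead of executing it.
# Host names assume the /etc/hosts mapping configured in Step 2.
cmds=$(for host in server1 agent1 agent2 agent3; do
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@${host}"
done)
echo "$cmds"
```

Remove the `echo` in front of the command to actually run it on a prepared cluster.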

Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
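The ARM and x86 packages differ only in the file name, so the right one can be selected from `uname -m`. A sketch (shown as a dry run that echoes the wget command rather than downloading):

```shell
# Pick the OpenJDK package matching the local CPU architecture.
# Dry run: echoes the wget command instead of executing it.
case "$(uname -m)" in
    aarch64) pkg=OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz ;;
    x86_64)  pkg=OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz ;;
    *)       echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac
url=https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/$pkg
echo wget "$url"
```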

2. Add environment variables.
vim /etc/profile


export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether the OpenJDK is successfully installed.
java -version

The installation is successful if the installed OpenJDK version information is displayed.

----End

1.4 Deploying ZooKeeper

1.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

1.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

1.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg


Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
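Steps 3 to 5 can be scripted so the edits are repeatable. This sketch writes the same settings into a scratch directory standing in for /usr/local/zookeeper, so the result can be reviewed before applying it for real; the node ID 1 corresponds to agent1 (use 2 on agent2 and 3 on agent3).

```shell
# Sketch: $ZK is a scratch directory standing in for /usr/local/zookeeper.
ZK=$(mktemp -d)
mkdir -p "$ZK/tmp"

# Ensemble settings that Steps 3.1 and 3.2 add to zoo.cfg.
cat >> "$ZK/zoo.cfg" <<'EOF'
dataDir=/usr/local/zookeeper/tmp
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
EOF

# Step 5: this node's ID (1 on agent1, 2 on agent2, 3 on agent3).
echo 1 > "$ZK/tmp/myid"

cat "$ZK/zoo.cfg"
cat "$ZK/tmp/myid"
```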

----End

1.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:


cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

1.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
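On a healthy three-node ensemble, zkServer.sh status typically reports "Mode: leader" on one node and "Mode: follower" on the other two. A small check of that output can be scripted; the sketch below runs against a sample status string, since a live ensemble is assumed on a real node.

```shell
# Sample status text; on a real node use:
#   status=$(/usr/local/zookeeper/bin/zkServer.sh status 2>&1)
status="ZooKeeper JMX enabled by default
Mode: follower"

# A node is serving if it reports either leader or follower mode.
case "$status" in
    *"Mode: leader"*|*"Mode: follower"*) verdict="serving" ;;
    *)                                   verdict="not serving" ;;
esac
echo "ZooKeeper is $verdict"
```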

----End


2 Hadoop Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)

2.1 Introduction

2.2 Environment Requirements

2.3 Configuring the Deployment Environment

2.4 Deploying ZooKeeper

2.5 Deploying Hadoop

2.6 Troubleshooting

2.1 Introduction

Overview

This document describes the software deployment procedure and does not involve the software source code compilation procedure.

You can download all programs required in this document from their official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure remains the same regardless of the program compilation platform.


Recommended Software Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/

2.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This section uses CentOS 7.6 as an example to describe how to deploy a Hadoop cluster.

Cluster Environment Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 2-1 lists the data specifications of each node.

Table 2-1 Cluster environment plan

Node 1: IPaddress1
Node 2: IPaddress2
Node 3: IPaddress3
Node 4: IPaddress4
Drives (all nodes): system drive 1 x 4 TB HDD; data drives 12 x 4 TB HDD
OS and JDK (all nodes): CentOS 7.6 and OpenJDK jdk8u252-b09

Software Plan

Table 2-2 describes the software planning of each node in the cluster.

Table 2-2 Software plan

Machine Name   Service Name
Node 1         NameNode and ResourceManager
Node 2         QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 3         QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 4         QuorumPeerMain, DataNode, NodeManager, and JournalNode

2.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address


Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether the OpenJDK is successfully installed.
java -version

The installation is successful if the installed OpenJDK version information is displayed.

----End

2.4 Deploying ZooKeeper

2.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

2.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH


Step 3 Make the environment variables take effect.
source /etc/profile

----End

2.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End


2.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

2.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End

2.5 Deploying Hadoop

2.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop

----End

2.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:


vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

2.5.3 Modifying the Hadoop Configuration File

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to user root.

echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
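The four appends above can equally be written as one heredoc, which keeps the settings together and easier to review. A sketch, writing to a scratch file that stands in for $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

```shell
# Sketch: ENVFILE is a scratch file standing in for hadoop-env.sh.
ENVFILE=$(mktemp)
cat >> "$ENVFILE" <<'EOF'
export JAVA_HOME=/usr/local/jdk8u252-b09
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
EOF
grep -c '^export' "$ENVFILE"
```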

Modifying the yarn-env.sh File

Change the user to user root.

echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

NOTICE

Create a directory on server1.
mkdir /home/hadoop_tmp_dir

----End

Modifying the hdfs-site.xml File

Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
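The long comma-separated dfs.datanode.data.dir value is error-prone to type by hand; it can be generated instead. A sketch that builds the same 12-entry value:

```shell
# Build the 12-entry dfs.datanode.data.dir value programmatically.
# printf repeats its format string once per argument from seq.
dirs=$(printf '/data/data%d/hadoop/dn,' $(seq 1 12))
dirs=${dirs%,}   # strip the trailing comma
echo "$dirs"
```

The resulting string can be pasted into the `<value>` element above.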


NOTICE

Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop

----End

Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

----End

Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.

Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn

----End

Modifying the slaves or workers Files

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (taking Hadoop 3.1.1 as an example in this document).
vim workers

Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.


agent1
agent2
agent3

----End

2.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.

mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop

----End

2.5.5 Starting the Hadoop Cluster

NOTICE

Perform operations in this section in sequence.

Step 1 Start the ZooKeeper cluster.

Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

Step 2 Start JournalNode.

Start JournalNode on agent1, agent2, and agent3.

NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster next time.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode


Step 3 Format HDFS.

1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.

Step 4 Format ZKFC.

Format ZKFC on server1.

hdfs zkfc -formatZK

Step 5 Start the HDFS.

Start HDFS on server1.

cd /usr/local/hadoop/sbin
./start-dfs.sh

Step 6 Start Yarn.

Start Yarn on server1.

cd /usr/local/hadoop/sbin
./start-yarn.sh

Step 7 Check whether all processes are started properly.

NOTE

Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on other server nodes and agent nodes are similar.)

jps
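Per the software plan in Table 2-2, jps on server1 should show at least NameNode and ResourceManager, and on each agent DataNode, NodeManager, JournalNode, and QuorumPeerMain. The check can be scripted; the sketch below compares an expected set against a sample jps listing, since it assumes a live cluster to run against for real.

```shell
# Sample jps output for an agent node; on a real node use: listing=$(jps)
listing="12345 DataNode
12346 NodeManager
12347 JournalNode
12348 QuorumPeerMain
12349 Jps"

# Expected agent processes per the software plan (Table 2-2).
missing=""
for proc in DataNode NodeManager JournalNode QuorumPeerMain; do
    echo "$listing" | grep -qw "$proc" || missing="$missing $proc"
done
if [ -z "$missing" ]; then
    echo "all expected processes running"
else
    echo "missing:$missing" >&2
fi
```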

----End

2.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (the quantity is 3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
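The same live/dead count is also printed by `hdfs dfsadmin -report` on the command line. A sketch that extracts the counts; it runs here against sample report lines, and the exact wording assumes the Hadoop 3.x report format.

```shell
# Sample lines from `hdfs dfsadmin -report`; on the cluster use:
#   report=$(hdfs dfsadmin -report)
report="Live datanodes (3):
Dead datanodes (0):"

live=$(echo "$report" | sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p')
dead=$(echo "$report" | sed -n 's/^Dead datanodes (\([0-9]*\)).*/\1/p')
echo "live=$live dead=$dead"
```

A healthy cluster in this plan reports live=3 and dead=0, matching the web-page check above.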


2.6 Troubleshooting

Failed to Start ZooKeeper

Symptom:

After running the zkServer.sh command to check the ZooKeeper startup status when ZooKeeper is started, a message is displayed indicating a startup failure. After ZooKeeper is stopped and started again by running the zkServer.sh start-foreground command, the following error information is displayed.

Solution:

1. Change the IP address of the node where ZooKeeper failed to be started to 0.0.0.0 in the zoo.cfg file.

2. Modify the myid file in the directory specified by dataDir in the zoo.cfg file. Ensure that the number in $dataDir/myid is the same as that in zookeeper/tmp/myid.


3 Flink Deployment Guide (CentOS 7.6 & openEuler 20.03)

3.1 Introduction

3.2 Environment Requirements

3.3 Configuring the Deployment Environment

3.4 Deploying ZooKeeper

3.5 Deploying Hadoop

3.6 Deploying Flink (Flink on Yarn)

3.1 Introduction

Flink Overview

This document describes the Flink deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/

Software: Flink
Version: 1.7.0
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-bin-hadoop28-scala_2.11.tgz

3.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a Flink cluster.

Cluster Environment

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 3-1 lists the specifications of each node.


Table 3-1 Cluster data plan

Node 1: IPaddress1
Node 2: IPaddress2
Node 3: IPaddress3
Node 4: IPaddress4
Drives (all nodes): system drive 1 x 4 TB HDD; data drives 12 x 4 TB HDD
OS and JDK (all nodes): CentOS 7.6 and OpenJDK jdk8u252-b09

Software Planning

Table 3-2 lists the software plan of each node in the cluster.

Table 3-2 Software plan

Node     Services
Node 1   JobManager
Node 2   QuorumPeerMain and TaskManager
Node 3   QuorumPeerMain and TaskManager
Node 4   QuorumPeerMain and TaskManager

3.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter whenever a prompt appears.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address


Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the command output shows the OpenJDK version, for example "openjdk version 1.8.0_252".

----End

3.4 Deploying ZooKeeper

3.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

3.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH


Step 3 Make the environment variables take effect.
source /etc/profile

----End

3.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the ZooKeeper configuration directory.
cd /usr/local/zookeeper/conf

Step 2 Copy the sample configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to it.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End
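The myid written on each node must match the server.N index declared in zoo.cfg. A minimal sketch of how the two can be kept consistent; the myid_for_host helper is hypothetical and assumes the agent1-to-agent3 naming scheme used in this guide:

```shell
# Hypothetical helper (not part of the guide): derive the ZooKeeper myid
# from the agent host name, assuming agentN maps to server.N in zoo.cfg.
myid_for_host() {
  echo "${1#agent}"    # strip the "agent" prefix: agent2 -> 2
}

myid_for_host agent1
myid_for_host agent3
```

On each agent you could then run `myid_for_host "$(hostname)" > /usr/local/zookeeper/tmp/myid` instead of hard-coding the number per node.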


3.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

3.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

To stop ZooKeeper on agent1, agent2, and agent3:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

In a healthy ensemble, the output shows "Mode: leader" on one node and "Mode: follower" on the other two.

----End

3.5 Deploying Hadoop

3.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop

----End

3.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:


vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

3.5.3 Modifying the Hadoop Configuration Files

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to this directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying the yarn-env.sh File

Change the user to root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>


  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

NOTICE

Create the hadoop.tmp.dir directory on server1.
mkdir /home/hadoop_tmp_dir

----End

Modifying the hdfs-site.xml File

Step 1 Open the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>


NOTICE

Create the directories for dfs.datanode.data.dir on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop

----End
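The 12-entry dfs.datanode.data.dir value and the mkdir command above must list the same drives. As an illustration (not part of the guide), the comma-separated value can be generated from the drive count so the two cannot drift apart; build_dn_dirs is a hypothetical helper name:

```shell
# Build the dfs.datanode.data.dir value for N data drives, so the XML value
# and the mkdir commands are derived from one number.
build_dn_dirs() {
  local n="$1" dirs="" i
  for i in $(seq 1 "$n"); do
    dirs="${dirs:+$dirs,}/data/data${i}/hadoop/dn"
  done
  echo "$dirs"
}

build_dn_dirs 12
```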

Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>


  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

----End
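The map heap size (-Xmx5530m) sits slightly below the map container size (mapreduce.map.memory.mb = 6144), leaving headroom for non-heap JVM memory so Yarn does not kill the container. The 90% factor below is an assumption for illustration, not a ratio stated by this guide:

```shell
# Illustrative sizing check (assumed rule of thumb, not from the guide):
# give ~90% of the 6144 MB map container to the JVM heap, which lands
# close to the -Xmx5530m configured above.
container_mb=6144
heap_mb=$(( container_mb * 90 / 100 ))
echo "$heap_mb"
```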

Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>


<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create the parent directories for yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs on agent1, agent2, and agent3.

Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn

----End

Modifying the slaves or workers File

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (this document uses Hadoop 3.1.1 as an example).
vim workers

Step 3 In the workers file, delete all content except the IP addresses or host names of the agent nodes.


agent1
agent2
agent3

----End
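Instead of editing the workers file interactively, it can be written in one command. A sketch assuming the three agent host names planned in this guide:

```shell
# Overwrite the workers file with the planned agent nodes, one per line.
printf '%s\n' agent1 agent2 agent3 > workers
cat workers
```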

3.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop

----End

3.5.5 Starting the Hadoop Cluster

NOTICE

Perform the operations in this section in sequence.

Step 1 Start the ZooKeeper cluster.

Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

Step 2 Start JournalNode.

Start JournalNode on agent1, agent2, and agent3.

NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 the next time you start the cluster.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode


Step 3 Format HDFS.

1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.

Step 4 Format ZKFC.

Format ZKFC on server1.
hdfs zkfc -formatZK

Step 5 Start HDFS.

Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh

Step 6 Start Yarn.

Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh

Step 7 Check whether all processes are started properly.

NOTE

Perform this operation on each node. The processes on the other server and agent nodes are similar to those on server1 and agent1, respectively. (The original screenshots of the expected process lists are omitted here.)

jps

----End

3.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Replace server1 with the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.


3.6 Deploying Flink (Flink on Yarn)

3.6.1 Obtaining Flink

Step 1 Download the Flink installation package.
wget https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-bin-hadoop28-scala_2.11.tgz

Step 2 Place the flink-1.7.0-bin-hadoop28-scala_2.11.tgz package in the /usr/local directory of server1 and decompress it.
mv flink-1.7.0-bin-hadoop28-scala_2.11.tgz /usr/local
cd /usr/local
tar -zxvf flink-1.7.0-bin-hadoop28-scala_2.11.tgz

Step 3 Create a soft link for subsequent version update.
ln -s flink-1.7.0 flink

----End

3.6.2 Setting Flink Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add the Flink path to the environment variables.
export FLINK_HOME=/usr/local/flink
export PATH=$FLINK_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


3.6.3 Modifying the Flink Configuration Files

NOTE

All Flink configuration files are stored in the $FLINK_HOME/conf directory. Before modifying the configuration files, run the following command to switch to $FLINK_HOME/conf:
cd $FLINK_HOME/conf

Modify the flink-conf.yaml file as follows:
env.java.home: /usr/local/jdk8u252-b09
env.hadoop.conf.dir: /usr/local/hadoop/etc/hadoop/

3.6.4 Running and Verifying Flink

Step 1 Start ZooKeeper and Hadoop in sequence.

Step 2 Start the Flink cluster on server1:
/usr/local/flink/bin/yarn-session.sh -n 1 -s 1 -jm 768 -tm 1024 -qu default -nm flinkapp -d

NOTE

To stop the Flink cluster on server1:
/usr/local/flink/bin/stop-cluster.sh

Step 3 Enter the URL in the address box of a browser to access the Flink web UI. The URL format is as follows:
http://server1:8081

----End

3.6.5 Stopping Flink

Check the Yarn task process of Flink and kill the task to stop Flink.
yarn app -kill $(yarn app -list | grep flinkapp | awk '{print $1}')
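The pipeline inside $( ) selects the `yarn app -list` row whose name matches flinkapp and extracts its first column, the application ID. Against a sample output line (the application ID below is made up for illustration), the extraction behaves like this:

```shell
# Simulate one data row of `yarn app -list` output and pull out column 1,
# exactly as the grep/awk pipeline in the kill command does.
sample='application_1600000000000_0001   flinkapp   Apache Flink   root   default   RUNNING'
printf '%s\n' "$sample" | grep flinkapp | awk '{print $1}'
```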


4 HBase Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)

4.1 Introduction

4.2 Environment Requirements

4.3 Configuring the Deployment Environment

4.4 Deploying ZooKeeper

4.5 Deploying Hadoop

4.6 Deploying HBase

4.1 Introduction

HBase Overview

This document describes the HBase deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software    Version        How to Obtain
OpenJDK     jdk8u252-b09   ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
                           x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper   3.4.6          Download the software package of the required version from the official website: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Hadoop      3.1.1          Download the software package of the required version from the official website: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/
HBase       2.0.2          Download the software package of the required version from the official website: https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz

4.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy an HBase cluster.

Cluster Environment

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 4-1 lists the data plan of each node.


Table 4-1 Cluster data plan

Node     IP Address
Node 1   IPaddress1
Node 2   IPaddress2
Node 3   IPaddress3
Node 4   IPaddress4

All nodes use the same drive configuration (system drive: 1 x 4 TB HDD; data drives: 6 x 4 TB HDD) and run CentOS 7.6 with OpenJDK jdk8u252-b09.

Software Planning

Table 4-2 lists the software plan of each node in the cluster.

Table 4-2 Software plan

Node     Services
Node 1   NameNode, ResourceManager, and HMaster
Node 2   QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer
Node 3   QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer
Node 4   QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer

4.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter whenever a prompt appears.
ssh-keygen -t rsa


2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address

Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the command output shows the OpenJDK version, for example "openjdk version 1.8.0_252".

----End

4.4 Deploying ZooKeeper

4.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

4.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.


vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

4.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the ZooKeeper configuration directory.
cd /usr/local/zookeeper/conf

Step 2 Copy the sample configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to it.


touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End

4.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

4.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

To stop ZooKeeper on agent1, agent2, and agent3:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End

4.5 Deploying Hadoop

4.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop

----End


4.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:
vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

4.5.3 Modifying the Hadoop Configuration Files

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to this directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying the yarn-env.sh File

Change the user to root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>


  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

NOTICE

Create the hadoop.tmp.dir directory on server1.
mkdir /home/hadoop_tmp_dir

----End

Modifying the hdfs-site.xml File

Step 1 Open the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify the following parameters in the <configuration> section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>


  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

NOTICE

Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop

----End
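The mkdir command above and the dfs.datanode.data.dir value in hdfs-site.xml must name the same twelve directories. A minimal bash sketch that derives the comma-separated XML value from the same brace expansion, so the two lists cannot drift apart (the paths match this guide's layout; adjust the range for a different drive count):

```shell
# Generate the comma-separated dfs.datanode.data.dir value from the same
# brace expansion used by the mkdir command above.
dirs=$(printf '%s,' /data/data{1..12}/hadoop/dn)
dirs=${dirs%,}          # strip the trailing comma
echo "$dirs"
```

The printed string can be pasted directly into the <value> element of dfs.datanode.data.dir.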

Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>


  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

----End
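A note on how the memory values above relate: mapreduce.map.memory.mb is the YARN container size, while mapreduce.map.java.opts caps the JVM heap inside that container, and the heap is normally kept at roughly 80-90% of the container so that non-heap JVM memory still fits. A sketch of that calculation (the 90% factor is an illustrative assumption; the guide's own -Xmx5530m for the 6144 MB map container sits at about this ratio):

```shell
# Derive a JVM heap option from a YARN container size, leaving ~10% of the
# container for off-heap JVM memory (metaspace, thread stacks, buffers).
container_mb=6144                       # mapreduce.map.memory.mb above
heap_mb=$(( container_mb * 90 / 100 ))  # illustrative 90% rule of thumb
echo "-Xmx${heap_mb}m"                  # prints -Xmx5529m
```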

Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.

vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>


</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn

----End

Modifying the slaves or workers Files

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (taking Hadoop 3.1.1 as an example in this document).
vim workers


Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.
agent1
agent2
agent3

----End
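Rather than editing the workers file by hand, it can be regenerated from the node list, which avoids stray whitespace or leftover entries. A sketch (WORKERS_FILE is a temporary path for illustration; on the cluster the target would be $HADOOP_HOME/etc/hadoop/workers):

```shell
# Regenerate the workers file from the agent list so no stale entries
# (for example, the default "localhost") survive.
WORKERS_FILE=$(mktemp)
printf '%s\n' agent1 agent2 agent3 > "$WORKERS_FILE"
cat "$WORKERS_FILE"
```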

4.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.

mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop

----End
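Steps 2 and 3 run the same two commands once per agent, so a loop keeps the three copies identical. A dry-run sketch that only collects and prints the commands it would run (execute them directly once verified; running ln over ssh is an assumed variant of the guide's log-in-and-link step and relies on the password-free SSH set up earlier):

```shell
# Generate the per-agent distribution commands from one node list.
# This is a dry run: the commands are printed, not executed.
cmds=$(for node in agent1 agent2 agent3; do
    echo "scp -r /usr/local/hadoop-3.1.1 root@${node}:/usr/local"
    echo "ssh root@${node} ln -s /usr/local/hadoop-3.1.1 /usr/local/hadoop"
done)
echo "$cmds"
```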

4.5.5 Starting the Hadoop Cluster

NOTICE

Perform operations in this section in sequence.

Step 1 Start the ZooKeeper cluster.

Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

Step 2 Start JournalNode.

Start JournalNode on agent1, agent2, and agent3.

NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster next time.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode


Step 3 Format HDFS.

1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.

Step 4 Format ZKFC.

Format ZKFC on server1.

hdfs zkfc -formatZK

Step 5 Start HDFS.

Start HDFS on server1.

cd /usr/local/hadoop/sbin
./start-dfs.sh

Step 6 Start Yarn.

Start Yarn on server1.

cd /usr/local/hadoop/sbin
./start-yarn.sh

Step 7 Check whether all processes are started properly.

NOTE

Perform this operation on each node to check whether all processes are started properly. The processes to be started differ between server1 and the agent nodes; the processes on other server nodes and agent nodes are similar to those on server1 and agent1, respectively.

jps

----End
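The jps check can be scripted so that a missing daemon is reported explicitly instead of eyeballed. A sketch that compares jps output against the expected process names for a server node; a canned sample stands in for live output here (on a real node, set jps_output="$(jps)" and adjust the expected list per node role):

```shell
# Verify that every expected Hadoop daemon appears in jps output.
# jps_output is a canned sample for illustration; on a live node use:
#   jps_output="$(jps)"
jps_output="12345 NameNode
12346 ResourceManager
12347 Jps"
expected="NameNode ResourceManager"   # server-node process list
missing=""
for proc in $expected; do
    echo "$jps_output" | grep -qw "$proc" || missing="$missing $proc"
done
echo "missing processes:${missing:- none}"
```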

4.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (the quantity is 3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
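Beyond the web page, the live and dead DataNode counts are also exposed by the NameNode's JMX servlet on the same port. A sketch that extracts NumLiveDataNodes from such a response; a canned sample is parsed here (on a live cluster, fetch the JSON with curl -s 'http://server1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'):

```shell
# Extract the live-DataNode count from a NameNode JMX response.
# jmx_sample is a canned response for illustration; on a live cluster
# obtain it with curl from the /jmx servlet on port 50070.
jmx_sample='{"beans":[{"NumLiveDataNodes" : 3, "NumDeadDataNodes" : 0}]}'
live=$(printf '%s' "$jmx_sample" | grep -o '"NumLiveDataNodes" *: *[0-9]*' | grep -o '[0-9]*$')
echo "live datanodes: $live"
```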


4.6 Deploying HBase

4.6.1 Obtaining HBase

Step 1 Download the HBase package from the following website:

https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz

Step 2 Place hbase-2.0.2-bin.tar.gz in the /usr/local directory on server1 and decompress it.
mv hbase-2.0.2-bin.tar.gz /usr/local
cd /usr/local
tar -zxvf hbase-2.0.2-bin.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s hbase-2.0.2 hbase

----End

4.6.2 Setting HBase Environment Variables

Step 1 Open the /etc/profile file.

vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$HBASE_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


4.6.3 Modifying the HBase Configuration Files

NOTE

All HBase configuration files are stored in the HBASE_HOME/conf directory. Before modifying the configuration files, go to the HBASE_HOME/conf directory.
cd $HBASE_HOME/conf

Modifying the hbase-env.sh File

Step 1 Open the hbase-env.sh file.
vim hbase-env.sh

Step 2 Change the value of JAVA_HOME to an absolute path and HBASE_MANAGES_ZK to false.
export JAVA_HOME=/usr/local/jdk8u252-b09
export HBASE_MANAGES_ZK=false
export HBASE_LIBRARY_PATH=/usr/local/hadoop/lib/native

----End

Modifying the hbase-site.xml File

Step 1 Open the hbase-site.xml file.
vim hbase-site.xml

Step 2 Add or modify some parameters under the configuration section.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://server1:9000/HBase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/usr/local/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>agent1:2181,agent2:2181,agent3:2181</value>
  </property>
</configuration>

----End

Modifying the regionservers File

Step 1 Open the regionservers file.
vim regionservers


Step 2 Replace the content of the regionservers file with the IP addresses (or host names) of the agents.
agent1
agent2
agent3

----End

Copying hdfs-site.xml

Copy the hdfs-site.xml file of Hadoop to the hbase/conf/ directory. You can use a soft link or copy the file.
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/hbase/conf/hdfs-site.xml

4.6.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy hbase-2.0.2 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hbase-2.0.2 root@agent1:/usr/local
scp -r /usr/local/hbase-2.0.2 root@agent2:/usr/local
scp -r /usr/local/hbase-2.0.2 root@agent3:/usr/local

Step 2 Log in to the agent1, agent2, and agent3 nodes and create soft links for hbase-2.0.2.
cd /usr/local
ln -s hbase-2.0.2 hbase

----End

4.6.5 Starting the HBase Cluster

Step 1 Start ZooKeeper and Hadoop in sequence.

Step 2 Start the HBase cluster on the server1 node.
/usr/local/hbase/bin/start-hbase.sh

Step 3 Check whether all processes are started properly.
jps

server1:

ResourceManager
NameNode
HMaster

agent1:

NodeManager
DataNode
HRegionServer
JournalNode
QuorumPeerMain

NOTE

Observe the process startup status on all nodes. The processes that need to be started on the server1 node are those listed for server1 above, and the processes that need to be started on each agent node are those listed for agent1.

----End


4.6.6 (Optional) Stopping the HBase Cluster

Stop the HBase cluster on the server1 node.

/usr/local/hbase/bin/stop-hbase.sh

4.6.7 Verifying HBase

Open the browser and enter http://server1:16010 to access the HBase web page. In the URL, server1 indicates the IP address of the node where the HMaster process is located, and 16010 is the default port number of HBase 1.0 or later. You can change the value of hbase.master.info.port in the hbase-site.xml file to change the port number.

Check whether the number of servers in Region Servers is the same as the number of configured agents (there are 3 agents in this example) and whether the cluster is properly started.


5 Hive Deployment Guide (CentOS 7.6 & openEuler 20.03)

5.1 Introduction

5.2 Environment Requirements

5.3 Configuring the Deployment Environment

5.4 Deploying ZooKeeper

5.5 Deploying Hadoop

5.6 Deploying Hive

5.1 Introduction

Hive Overview

This document describes the Hive deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/

Software: Hive
Version: 3.1.0
How to Obtain: Download the software package of the required version from the official website:
https://archive.apache.org/dist/hive/hive-3.1.0/

5.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a Hive cluster.

Cluster Environment

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 5-1 lists the data plan of each node.


Table 5-1 Cluster data plan

Node    IP Address
Node 1  IPaddress1
Node 2  IPaddress2
Node 3  IPaddress3
Node 4  IPaddress4

All nodes use the same drive configuration (system drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD) and the same OS and JDK (CentOS 7.6 and OpenJDK jdk8u252-b09).

Software Planning

Table 5-2 lists the software plan of each node in the cluster.

Table 5-2 Software plan

Node    Services
Node 1  NameNode, ResourceManager, and Hive client
Node 2  QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 3  QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 4  QuorumPeerMain, DataNode, NodeManager, and JournalNode

5.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
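The hosts entries can be generated from two parallel lists so that every node receives an identical block. A sketch that writes to a temporary file (IPaddress1 to IPaddress4 are the document's placeholders for the real addresses; on each node, append the result to /etc/hosts):

```shell
# Build the /etc/hosts block from parallel IP/host-name lists.
# A temporary file is used here; append the result to /etc/hosts on each node.
hosts_file=$(mktemp)
ips="IPaddress1 IPaddress2 IPaddress3 IPaddress4"
set -- server1 agent1 agent2 agent3   # host names, in the same order
for ip in $ips; do
    echo "$ip $1" >> "$hosts_file"
    shift
done
cat "$hosts_file"
```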

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address


Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the command output contains the OpenJDK version information (for example, openjdk version "1.8.0_252").

----End

5.4 Deploying ZooKeeper

5.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in the ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

5.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH


Step 3 Make the environment variables take effect.
source /etc/profile

----End

5.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End
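The server.N lines in zoo.cfg and each node's myid must agree: myid holds exactly the N of the local node's server.N entry. A sketch that derives both from one ordered node list (this_node and the temporary directory are illustrative; on a real node, this_node is the local host name and the ID file goes to /usr/local/zookeeper/tmp/myid):

```shell
# Generate the server.N lines for zoo.cfg and the matching myid value for
# the local node from a single ordered node list.
zk_nodes="agent1 agent2 agent3"
this_node="agent1"            # set to the local host name on each node
tmp_dir=$(mktemp -d)          # stands in for /usr/local/zookeeper/tmp
id=1
for node in $zk_nodes; do
    echo "server.${id}=${node}:2888:3888" >> "$tmp_dir/servers.cfg"
    if [ "$node" = "$this_node" ]; then
        echo "$id" > "$tmp_dir/myid"   # myid matches this node's server.N
    fi
    id=$((id + 1))
done
cat "$tmp_dir/servers.cfg" "$tmp_dir/myid"
```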


5.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

5.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End
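A healthy three-node ensemble reports Mode: leader on exactly one node and Mode: follower on the others. A sketch that tallies the modes; canned sample lines stand in for the real ./zkServer.sh status output collected from the three agents:

```shell
# Tally ZooKeeper modes: a healthy ensemble has exactly one leader.
# status_outputs is a canned sample standing in for the "Mode:" lines
# from ./zkServer.sh status on agent1, agent2, and agent3.
status_outputs="Mode: follower
Mode: leader
Mode: follower"
leaders=$(printf '%s\n' "$status_outputs" | grep -c '^Mode: leader$')
followers=$(printf '%s\n' "$status_outputs" | grep -c '^Mode: follower$')
echo "leaders=$leaders followers=$followers"
```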

5.5 Deploying Hadoop

5.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to the Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop

----End

5.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:


vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

5.5.3 Modifying the Hadoop Configuration File

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to user root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying the yarn-env.sh File

Change the user to user root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>


  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

NOTICE

Create a directory on server1.
mkdir /home/hadoop_tmp_dir

----End

Modifying the hdfs-site.xml File

Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>


NOTICE

Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop

----End

Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>


  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

----End

Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.

vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>


<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn

----End

Modifying the slaves or workers Files

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (taking Hadoop 3.1.1 as an example in this document).
vim workers

Step 3 Modify the workers file and delete all content except the IP addresses or hostnames of all agent nodes.


agent1
agent2
agent3

----End

5.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.

mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop

----End

5.5.5 Starting the Hadoop Cluster

NOTICE

Perform operations in this section in sequence.

Step 1 Start the ZooKeeper cluster.

Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

Step 2 Start JournalNode.

Start JournalNode on agent1, agent2, and agent3.

NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster next time.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode


Step 3 Format HDFS.

1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp.

Step 4 Format ZKFC.

Format ZKFC on server1.

hdfs zkfc -formatZK

Step 5 Start HDFS.

Start HDFS on server1.

cd /usr/local/hadoop/sbin
./start-dfs.sh

Step 6 Start Yarn.

Start Yarn on server1.

cd /usr/local/hadoop/sbin
./start-yarn.sh

Step 7 Check whether all processes are started properly.

NOTE

Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on other server nodes and agent nodes are similar.)

jps

----End

5.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (the quantity is 3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
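The same check can be scripted instead of read off the web page. A hedged sketch: the NameNode also publishes NumLiveDataNodes through its JMX servlet (the /jmx endpoint on the same port 50070); the helper below only parses that JSON, and the curl line assumes the server1 host name from this guide:

```shell
# Parse NumLiveDataNodes out of the NameNode's /jmx JSON (read from stdin).
live_datanodes() {
  grep -o '"NumLiveDataNodes" *: *[0-9]*' | grep -o '[0-9]*$'
}

# Usage against a running cluster (server1 is this guide's NameNode host):
#   curl -s 'http://server1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState' | live_datanodes
# A printed value of 3 (the number of agent nodes) means all DataNodes are live.
```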


5.6 Deploying Hive

5.6.1 Installing MariaDB

Hive stores metadata in the database. Before installing Hive, you need to install the database software and configure database information in hive-site.xml. Common databases include Derby, MySQL, and MariaDB. This document uses MariaDB as an example. The deployment of other databases is similar.

Step 1 Install MariaDB on server1.

NOTE

Before installing MariaDB, ensure that the Yum source has been configured.
yum install mariadb*

Start the MariaDB service.

systemctl start mariadb.service

(Optional) Configure autostart upon power-on:

systemctl enable mariadb.service

Step 2 Configure the permissions and password.

1. Log in to the database and press Enter twice. No password is required for the first login.
mysql -uroot -p

2. Connect to the MySQL database.
mysql> use mysql;

3. Grant all permissions to the root user.
mysql> grant all on *.* to root@'server1' identified by 'root';


4. Update the permissions.
mysql> flush privileges;

5. Set the password.
mysql> set password for root@server1=password('123456');

Step 3 Set the UTF-8 character encoding.

1. Open the my.cnf configuration file.
vim /etc/my.cnf

2. Add the following content under the [mysqld] section:
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake

3. Open the client.cnf file.
vim /etc/my.cnf.d/client.cnf

4. Add the following content under the [client] section:
default-character-set=utf8

5. Open the mysql-clients.cnf file.
vim /etc/my.cnf.d/mysql-clients.cnf

6. Add the following content under the [mysql] section:
default-character-set=utf8

7. Restart MariaDB.
systemctl restart mariadb

----End

5.6.2 Obtaining Hive

Step 1 Download Hive from the following website:

https://archive.apache.org/dist/hive/hive-3.1.0/

Step 2 Place apache-hive-3.1.0-bin.tar.gz in the /usr/local directory on server1 and decompress it.
mv apache-hive-3.1.0-bin.tar.gz /usr/local
cd /usr/local
tar -zxvf apache-hive-3.1.0-bin.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s apache-hive-3.1.0-bin hive

----End

5.6.3 Setting Hive Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add the Hive path to the environment variables.
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


5.6.4 Modifying the Hive Configuration Files

NOTE

All Hive configuration files are stored in the $HIVE_HOME/conf directory. Before modifying the configuration files, go to the $HIVE_HOME/conf directory first.
cd $HIVE_HOME/conf

Step 1 Modify the hive-env.sh file.
cp hive-env.sh.template hive-env.sh

Add the following content to the end of the hive-env.sh file:

export JAVA_HOME=/usr/local/jdk8u252-b09
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf

Step 2 Modify the hive-site.xml file.
cp hive-default.xml.template hive-site.xml

Run the following command to replace for&# with for to prevent encoding problems during initialization:

sed -i 's/for&#/for/g' hive-site.xml

Change the values of related parameters in the hive-site.xml file as follows:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://server1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.mariadb.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/${hive.session.id}_resources</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive</value>
</property>
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/operation_logs</value>
</property>

----End

5.6.5 Starting and Verifying Hive

Step 1 Prepare for starting Hive.

1. Download JDBC.
Download the JDBC driver and save it to the /usr/local/hive/lib directory. In this example, mariadb-java-client-2.3.0.jar is used.

2. Create a directory in which the Hive data is stored.
/usr/local/hadoop/bin/hadoop fs -mkdir /tmp
/usr/local/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse


/usr/local/hadoop/bin/hadoop fs -chmod g+w /tmp
/usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse

3. Create a Hive log directory.
mkdir -p /usr/local/hive/log/
touch /usr/local/hive/log/hiveserver.log
touch /usr/local/hive/log/hiveserver.err

Step 2 Initialize Hive.
schematool -dbType mysql -initSchema

Step 3 Start the Hive metastore.
hive --service metastore -p 9083 &

Step 4 Start hiveserver2.

1. Start hiveserver2.
nohup hiveserver2 1>/usr/local/hive/log/hiveserver.log 2>/usr/local/hive/log/hiveserver.err &

2. View the progress.
tail -f /usr/local/hive/log/hiveserver.err

nohup: ignoring input
which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
2021-01-18 11:32:22: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
OK

3. Query the port.
netstat -anp|grep 10000

If the following information is displayed, the startup is successful:
tcp6 0 0 :::10000 :::* LISTEN 27800/java

4. Use beeline to connect to server1.
beeline -u jdbc:hive2://server1:10000

The command output is as follows:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://server1:10000
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0 by Apache Hive
0: jdbc:hive2://server1:10000>

Step 5 Check the created database.
show databases;

The verification is successful if the following information is displayed:


0: jdbc:hive2://server1:10000> show databases;
INFO : Compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.903 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.029 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.248 seconds)

Step 6 Quit Hive.
quit;

----End


6 Kafka Deployment Guide (CentOS 7.6 & openEuler 20.03)

6.1 Introduction

6.2 Environment Requirements

6.3 Configuring the Deployment Environment

6.4 Deploying ZooKeeper

6.5 Deploying Kafka

6.1 Introduction

Kafka Overview

This document describes the Kafka deployment procedure and does not include the software compilation procedure using source code.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software    Version         How to Obtain
OpenJDK     jdk8u252-b09    ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
                            x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper   3.4.6           https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Kafka       2.11-2.2.0      https://archive.apache.org/dist/kafka/2.2.0/

6.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This section uses CentOS 7.6 as an example to describe how to deploy a Kafka cluster.

Cluster Data Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 6-1 lists the data plan of each node.

Table 6-1 Cluster data plan

Node     IP Address
Node 1   IPaddress1
Node 2   IPaddress2
Node 3   IPaddress3
Node 4   IPaddress4

Each node uses one 4 TB HDD as the system drive and twelve 4 TB HDDs as data drives, and runs CentOS 7.6 with OpenJDK jdk8u252-b09.

Software Plan

Table 6-2 lists the software plan of each node in the cluster.

Table 6-2 Software plan

Node Service

Node 1 Kafka client

Node 2 QuorumPeerMain and Kafka

Node 3 QuorumPeerMain and Kafka

Node 4 QuorumPeerMain and Kafka

6.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address

Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:


wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if information similar to the following is displayed:

----End

6.4 Deploying ZooKeeper

6.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

6.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


6.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
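The myid value on each node must agree with the server.N lines added to zoo.cfg. The following is an illustrative helper (not part of the original procedure) that reads the ID for a given host straight out of zoo.cfg, so a mismatched number cannot be written by hand:

```shell
# Print the N from the matching "server.N=<host>:2888:3888" line in zoo.cfg.
myid_for() {
  # $1 = host name, $2 = path to zoo.cfg
  awk -F'[.=:]' -v h="$1" '$1 == "server" && $3 == h { print $2 }' "$2"
}

# Possible usage on each agent node (hypothetical):
#   myid_for "$(hostname)" /usr/local/zookeeper/conf/zoo.cfg > /usr/local/zookeeper/tmp/myid
```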

----End

6.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.


● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

6.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End

6.5 Deploying Kafka

6.5.1 Obtaining Kafka

Step 1 Download the Kafka installation package.

wget https://archive.apache.org/dist/kafka/2.2.0/kafka_2.11-2.2.0.tgz

Step 2 Save kafka_2.11-2.2.0.tgz to the /usr/local directory on agent1 and decompress it.
mv kafka_2.11-2.2.0.tgz /usr/local
cd /usr/local
tar -zxvf kafka_2.11-2.2.0.tgz

Step 3 Create a soft link for subsequent version update.
ln -s kafka_2.11-2.2.0 kafka

----End

6.5.2 Setting Kafka Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add Kafka to the environment variables.
export KAFKA_HOME=/usr/local/kafka
export PATH=$KAFKA_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


6.5.3 Modifying the Kafka Configuration Files

NOTE

All Kafka configuration files are stored in the $KAFKA_HOME/config directory. Before modifying the configuration files, go to the $KAFKA_HOME/config directory first.
cd $KAFKA_HOME/config

Step 1 Modify the server.properties file.
vim server.properties

The modified content is as follows:
broker.id=0
port=6667
host.name=agent1
log.dirs=/data/data1/kafka,/data/data2/kafka,/data/data3/kafka,/data/data4/kafka,/data/data5/kafka,/data/data6/kafka,/data/data7/kafka,/data/data8/kafka,/data/data9/kafka,/data/data10/kafka,/data/data11/kafka,/data/data12/kafka
zookeeper.connect=agent1:2181,agent2:2181,agent3:2181

NOTE

In the preceding configuration, set host.name to the IP address of agent1 and log.dirs to the actual data storage path.
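As with the multi-disk directory lists used for YARN earlier in this guide, the 12-entry log.dirs value can be generated rather than typed. A small sketch, assuming the /data/data1..12 mount layout from the cluster data plan:

```shell
# Build the comma-separated log.dirs value for server.properties.
logdirs=$(printf '/data/data%s/kafka\n' $(seq 1 12) | paste -s -d ',' -)
echo "log.dirs=$logdirs"
```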

Step 2 Synchronize the configuration to other nodes.

1. Copy kafka_2.11-2.2.0 to the /usr/local directory on each of agent2 and agent3.
scp -r /usr/local/kafka_2.11-2.2.0 root@agent2:/usr/local
scp -r /usr/local/kafka_2.11-2.2.0 root@agent3:/usr/local

2. Log in to agent2 and agent3 and separately create a soft link for kafka_2.11-2.2.0.
cd /usr/local
ln -s kafka_2.11-2.2.0 kafka

Step 3 Modify related node parameters.

1. Log in to agent2 and modify the server.properties file.
vim server.properties

The modified content is as follows:
broker.id=1
host.name=agent2 # Enter the corresponding IP address.

2. Log in to agent3 and modify the server.properties file.
vim server.properties

The modified content is as follows:
broker.id=2
host.name=agent3 # Enter the corresponding IP address.

----End

6.5.4 Verifying Kafka

Step 1 Run the following commands on agent1 to agent3 to start Kafka:

cd /usr/local/kafka/bin
./kafka-server-start.sh -daemon ../config/server.properties

Step 2 Run the following command on each node to check whether all processes are started properly.
jps


NOTE

Processes started on agent1 are the same as those on the other agent nodes. No process needs to be started on server nodes.
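The jps check can also be scripted. A hedged sketch (the helper only parses jps-style output on stdin; the process names Kafka and QuorumPeerMain come from the software plan in Table 6-2):

```shell
# Succeed only if every named process appears in jps-style output on stdin.
required_running() {
  procs=$(awk '{ print $2 }')
  for p in "$@"; do
    printf '%s\n' "$procs" | grep -qx "$p" || return 1
  done
}

# Possible usage on an agent node:
#   jps | required_running Kafka QuorumPeerMain && echo "all processes up"
```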

Step 3 Run the following commands on agent1 to agent3 to stop Kafka:
cd /usr/local/kafka/bin
./kafka-server-stop.sh

----End


7 Solr Deployment Guide (CentOS 7.6 & openEuler 20.03)

7.1 Introduction

7.2 Environment Requirements

7.3 Configuring the Deployment Environment

7.4 Deploying ZooKeeper

7.5 Deploying Solr

7.1 Introduction

Solr Overview

This document describes the Solr deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software    Version         How to Obtain
OpenJDK     jdk8u252-b09    ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
                            x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper   3.4.6           https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Solr        6.2.0           https://archive.apache.org/dist/lucene/solr/6.2.0/
Tomcat      8.5.28          https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.28/bin/

7.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a Solr cluster.

Cluster Data Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 7-1 lists the data plan of each node.


Table 7-1 Cluster data plan

Node     IP Address
Node 1   IPaddress1
Node 2   IPaddress2
Node 3   IPaddress3
Node 4   IPaddress4

Each node uses one 4 TB HDD as the system drive and twelve 4 TB HDDs as data drives, and runs CentOS 7.6 with OpenJDK jdk8u252-b09.

Software Plan

Table 7-2 lists the software plan of each node in the cluster.

Table 7-2 Software plan

Node Services

Node 1 -

Node 2 QuorumPeerMain, Bootstrap (Solr)

Node 3 QuorumPeerMain, Bootstrap (Solr)

Node 4 QuorumPeerMain, Bootstrap (Solr)

7.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address


Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if information similar to the following is displayed:

----End

7.4 Deploying ZooKeeper

7.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

7.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH


Step 3 Make the environment variables take effect.
source /etc/profile

----End

7.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End


7.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:

cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

7.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End

7.5 Deploying Solr

7.5.1 Obtaining Solr

Step 1 Place solr-6.2.0.tgz and apache-tomcat-8.5.28.tar.gz in the /usr/local/solrCloud directory of agent1.

NOTE

The /usr/local/solrCloud directory must be created in advance.

Step 2 Decompress the Solr software package.
tar -zxvf solr-6.2.0.tgz

Step 3 Create a soft link for subsequent version update.
ln -s solr-6.2.0 solr

Step 4 Decompress the Tomcat software package.
tar -zxvf apache-tomcat-8.5.28.tar.gz

Step 5 Create a soft link for subsequent version update.


ln -s apache-tomcat-8.5.28 tomcat

----End

7.5.2 Setting Solr Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add Solr to environment variables.
export SOLR_HOME=/usr/local/solrCloud
export PATH=$SOLR_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

7.5.3 Copying the Solr Configuration

NOTE

Solr installation directory: /usr/local/solrCloud/solr

Solr configuration file directory: /usr/local/solrCloud/solrConfig

Solr data file directory: /usr/local/solrCloud/solrCores

Tomcat installation directory: /usr/local/solrCloud/tomcat

Step 1 Create Solr configuration file and data file directories.
mkdir -p /usr/local/solrCloud/{solrConfig,solrCores}

Step 2 Copy the solr-webapp directory to Tomcat.
cp -r /usr/local/solrCloud/solr/server/solr-webapp/webapp /usr/local/solrCloud/tomcat/webapps/solr

Step 3 Copy the jar files from solr/server/lib/ext to tomcat/webapps/solr/WEB-INF/lib.
cp -r /usr/local/solrCloud/solr/server/lib/ext/*.jar /usr/local/solrCloud/tomcat/webapps/solr/WEB-INF/lib/

Step 4 Copy the configuration files.
cp -r /usr/local/solrCloud/solr/server/solr/configsets/basic_configs/conf/* /usr/local/solrCloud/solrConfig
cp -r /usr/local/solrCloud/solr/example/files/conf/velocity /usr/local/solrCloud/solrConfig
cp /usr/local/solrCloud/solr/server/solr/solr.xml /usr/local/solrCloud/solrCores

----End

7.5.4 Modifying the Configuration

Step 1 Open the solrCores/solr.xml file.

vim solrCores/solr.xml

Change the hostPort value so that the port is the same as the Tomcat port. In this document, the default Tomcat port 8080 is used.

<int name="hostPort">8080</int>
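If you later change the Tomcat port, the same value must be updated here. Below is a hedged sed sketch; it assumes a single `<int name="hostPort">` element as shown above, and is demonstrated on a local sample copy rather than the real solrCores/solr.xml:

```shell
# Sketch: update hostPort in a solr.xml to a new port (9090 here).
# Assumes exactly one <int name="hostPort">...</int> element, as shown above.
NEW_PORT=9090
CONF=solr.xml.sample      # stand-in for solrCores/solr.xml
printf '<int name="hostPort">8080</int>\n' > "$CONF"
sed -i "s|<int name=\"hostPort\">[0-9]*</int>|<int name=\"hostPort\">${NEW_PORT}</int>|" "$CONF"
grep 'hostPort' "$CONF"    # -> <int name="hostPort">9090</int>
```

Point CONF at the real file once the substitution looks right.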

Step 2 Run the following command to create a directory:
mkdir -p /usr/local/solrCloud/tomcat/conf/Catalina/localhost


Step 3 Create the solr.xml file in the /usr/local/solrCloud/tomcat/conf/Catalina/localhost/ directory.
vim /usr/local/solrCloud/tomcat/conf/Catalina/localhost/solr.xml

Add the following content:

<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/usr/local/solrCloud/tomcat/webapps/solr" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/usr/local/solrCloud/solrCores" override="true"/>
</Context>

Step 4 Modify the /usr/local/solrCloud/tomcat/bin/catalina.sh file.
vim /usr/local/solrCloud/tomcat/bin/catalina.sh

Add the following information:

JAVA_OPTS="-DzkHost=Datanode1:2181,Datanode2:2181,Datanode3:2181"

----End

7.5.5 Synchronizing the Configuration to Other Nodes

Step 1 Copy solrCloud to the /usr/local directory of agent2 and agent3.

scp -r /usr/local/solrCloud root@agent2:/usr/local
scp -r /usr/local/solrCloud root@agent3:/usr/local

Step 2 Log in to the agent2 and agent3 nodes and create soft links for Solr and Tomcat.

NOTE

You can skip this step because scp copies the soft links as actual directories, so the solr and tomcat directories already exist on the target nodes.

cd /usr/local/solrCloud
rm -rf solr tomcat
ln -s solr-6.2.0 solr
ln -s apache-tomcat-8.5.28 tomcat

----End

7.5.6 Uploading the Configuration to the ZooKeeper Cluster

Step 1 Start the ZooKeeper cluster.

Step 2 Upload the file to Datanode1.
java -classpath .:/usr/local/solrCloud/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost agent1:2181,agent2:2181,agent3:2181 -confdir /usr/local/solrCloud/solrConfig/ -confname solrconfig

Step 3 Check whether the configuration is successfully uploaded to ZooKeeper.
cd /usr/local/zookeeper/bin
./zkCli.sh -server agent1:2181
ls /configs/solrconfig


Step 4 Disconnect from ZooKeeper.
quit

----End

7.5.7 Running and Verifying Solr

Step 1 Run the following command on agent1 to agent3 to start Tomcat (Solr):

/usr/local/solrCloud/tomcat/bin/startup.sh

NOTE

● The default port 8080 is often in use. You can change the Tomcat port number in the ${tomcat}/conf/server.xml file, for example:
<Connector port="New port number" protocol="HTTP/1.1"
After the modification, change the value of hostPort in solrCores/solr.xml accordingly. For details, see Step 1.
● You can run the following command on agent1 to agent3 to stop Tomcat (Solr):
/usr/local/solrCloud/tomcat/bin/shutdown.sh

Step 2 Enter http://agent1:8080/solr/index.html in the address box of the browser and press Enter to access Solr.

Replace agent1 with the IP address of the node where the agent1 process resides.

----End


8 Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)

8.1 Introduction

8.2 Environment Requirements

8.3 Configuring the Deployment Environment

8.4 Deploying ZooKeeper

8.5 Deploying Hadoop

8.6 Deploying Spark

8.1 Introduction

Spark Overview

This document describes the Spark deployment procedure and does not include the procedure for compiling the software from source code.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.


Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/

Software: Spark
Version: 2.3.2
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/spark/spark-2.3.2/

Software: Scala
Version: 2.11.12
How to Obtain: Download the software package of the required version from the official website: https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz

Software: Hive
Version: 3.1.0
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/hive/hive-3.1.0/

8.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a Spark cluster.


Cluster Data Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 8-1 lists the data plan of each node.

Table 8-1 Cluster data plan

Node 1: IP address IPaddress1; system drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD; OS & JDK: CentOS 7.6 & OpenJDK jdk8u252-b09
Node 2: IP address IPaddress2; same drive and OS & JDK configuration
Node 3: IP address IPaddress3; same drive and OS & JDK configuration
Node 4: IP address IPaddress4; same drive and OS & JDK configuration

Software Planning

Table 8-2 lists the software plan of each node in the cluster.

Table 8-2 Software plan

Node 1: NameNode, ResourceManager, and Master
Node 2: QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker
Node 3: QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker
Node 4: QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker

8.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.


systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter when prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
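With four nodes, the command above repeats once per target on every host. A loop sketch, using the host names planned in this guide; DRY_RUN=echo prints the commands for inspection instead of executing them:

```shell
# Distribute the public key to every node, including the local one.
# Host names follow this guide's cluster plan; set DRY_RUN= (empty) to execute.
NODES="server1 agent1 agent2 agent3"
DRY_RUN=echo
for node in $NODES; do
    $DRY_RUN ssh-copy-id -i ~/.ssh/id_rsa.pub "root@${node}"
done
```

Run it on each node; with DRY_RUN cleared, each ssh-copy-id still prompts for the target's password once.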

Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the OpenJDK version information is displayed.

----End

8.4 Deploying ZooKeeper

8.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.


ln -s zookeeper-3.4.6 zookeeper

----End

8.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

8.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888


Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid

----End
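The server.N entries in zoo.cfg and the per-node myid values must stay consistent: the number after "server." is exactly the ID each node writes to its myid file. A sketch that derives both from a single node list (node names assumed from this guide's plan):

```shell
# Derive zoo.cfg server entries and per-node myid values from one list,
# so the two can never drift apart. Node names follow this guide's plan.
NODES="agent1 agent2 agent3"
id=1
for node in $NODES; do
    echo "server.${id}=${node}:2888:3888"   # append this line to zoo.cfg
    echo "${id}"                            # write this value to ${node}'s myid
    id=$((id + 1))
done
```

Redirect the first echo into zoo.cfg and the second into each node's myid file when applying it for real.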

8.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:

cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:

cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid

----End

8.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status

----End

8.5 Deploying Hadoop

8.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz


Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop

----End

8.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:

vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

8.5.3 Modifying the Hadoop Configuration File

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to user root.

echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying the yarn-env.sh File

Change the user to user root.

echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>


</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

NOTICE

Create a directory on server1.
mkdir /home/hadoop_tmp_dir

----End

Modifying the hdfs-site.xml File

Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>


</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

NOTICE

Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop

----End
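The comma-separated dfs.datanode.data.dir value for 12 drives is tedious and error-prone to type by hand. A sketch that generates it from the drive layout assumed in this guide's cluster plan:

```shell
# Build the comma-separated DataNode directory list for 12 data drives,
# matching the /data/dataN/hadoop/dn layout used in hdfs-site.xml above.
dirs=""
for i in $(seq 1 12); do
    dirs="${dirs:+$dirs,}/data/data${i}/hadoop/dn"   # prepend comma after the first entry
done
echo "$dirs"
```

The same loop, with dn swapped for yarn/local or yarn/log, produces the yarn-site.xml directory lists later in this chapter.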

Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>


</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>

----End

Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>


  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn

----End


Modifying the slaves or workers Files

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (taking Hadoop 3.1.1 as an example in this document).
vim workers

Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.
agent1
agent2
agent3

----End
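The steps above amount to a single rewrite of the workers file. A sketch (Hadoop 3.x file name; the agent host names are the ones planned in this guide, and the local file name here is a stand-in for the real path):

```shell
# Rewrite the Hadoop workers file with exactly the agent host names,
# discarding any previous content (e.g. the default "localhost").
WORKERS_FILE=workers            # in practice: $HADOOP_HOME/etc/hadoop/workers
printf '%s\n' agent1 agent2 agent3 > "$WORKERS_FILE"
cat "$WORKERS_FILE"
```

For Hadoop versions earlier than 3.x, point WORKERS_FILE at the slaves file instead.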

8.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.

mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop

----End

8.5.5 Starting the Hadoop Cluster

NOTICE

Perform operations in this section in sequence.

Step 1 Start the ZooKeeper cluster.

Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

Step 2 Start JournalNode.

Start JournalNode on agent1, agent2, and agent3.


NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster next time.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode

Step 3 Format HDFS.

1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.

Step 4 Format ZKFC.

Format ZKFC on server1.

hdfs zkfc -formatZK

Step 5 Start the HDFS.

Start HDFS on server1.

cd /usr/local/hadoop/sbin
./start-dfs.sh

Step 6 Start Yarn.

Start Yarn on server1.

cd /usr/local/hadoop/sbin
./start-yarn.sh

Step 7 Check whether all processes are started properly.

NOTE

Perform this operation on each node to check whether all processes are started properly. The processes that should be running on each node are listed in Table 8-2: server1 runs NameNode, ResourceManager, and Master, and each agent node runs QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker.

jps


----End
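Comparing jps output against the software plan by eye is easy to get wrong across four nodes. A sketch that checks a captured jps listing for a set of expected process names (the expected names come from Table 8-2; the sample jps text below is fabricated for illustration):

```shell
#!/bin/sh
# Check that a captured "jps" output contains every expected process name.
# Expected names come from Table 8-2 in this guide.
check_procs() {
    out="$1"; shift
    for proc in "$@"; do
        case "$out" in
            *"$proc"*) ;;
            *) echo "missing: $proc"; return 1 ;;
        esac
    done
    echo "all processes running"
}

# Example with a fabricated jps capture for server1:
check_procs "12345 NameNode
12346 ResourceManager
12347 Master
12348 Jps" NameNode ResourceManager Master    # prints "all processes running"
```

For the agent nodes, pass QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker as the expected names instead.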

8.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (the quantity is 3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.

8.6 Deploying Spark

8.6.1 Obtaining Spark

Step 1 Download the Spark package from the following website:

https://archive.apache.org/dist/spark/spark-2.3.2/

Step 2 Place spark-2.3.2-bin-hadoop2.7.tgz in the /usr/local directory on server1 and decompress it.
mv spark-2.3.2-bin-hadoop2.7.tgz /usr/local
cd /usr/local
tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz


Step 3 Create a soft link for subsequent version update.
ln -s spark-2.3.2-bin-hadoop2.7 spark

----End

8.6.2 Setting Spark Environment Variables

Step 1 Open the /etc/profile file.

vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End

8.6.3 Modifying the Spark Configuration Files

NOTE

All Spark configuration files are stored in the $SPARK_HOME/conf directory. Before modifying the configuration files, switch to the $SPARK_HOME/conf directory.
cd $SPARK_HOME/conf

Modifying the spark-env.sh File

Step 1 Use spark-env.sh.template as the template to copy a file and name it spark-env.sh.
cp spark-env.sh.template spark-env.sh

Step 2 Open the spark-env.sh file.
vim spark-env.sh

Change the value of the environment variable JAVA_HOME to an absolute path, and specify the Hadoop directory, IP address and port number of the Spark master node, and Spark directory.

export JAVA_HOME=/usr/local/jdk8u252-b09
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HDP_VERSION=3.1.0

----End

Modifying the spark-defaults.conf File

Modify the file.
echo "spark.master yarn" >> spark-defaults.conf
echo "spark.eventLog.enabled true" >> spark-defaults.conf
echo "spark.eventLog.dir hdfs://server1:9000/spark2-history" >> spark-defaults.conf
echo "spark.eventLog.compress true" >> spark-defaults.conf
echo "spark.history.fs.logDirectory hdfs://server1:9000/spark2-history" >> spark-defaults.conf
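Appending with echo works on a fresh file, but running the commands twice leaves duplicate keys. A hedged helper that replaces an existing key or appends it; the sample file name below is a stand-in for the real $SPARK_HOME/conf/spark-defaults.conf:

```shell
# Set "key value" in a properties-style file, replacing any existing line
# for that key instead of appending a duplicate.
set_conf() {
    key="$1"; value="$2"; file="$3"
    if grep -q "^${key}[[:space:]]" "$file" 2>/dev/null; then
        sed -i "s|^${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        echo "${key} ${value}" >> "$file"
    fi
}

CONF=spark-defaults.conf.sample      # stand-in for the real conf file
: > "$CONF"
set_conf spark.master yarn "$CONF"
set_conf spark.eventLog.enabled true "$CONF"
set_conf spark.eventLog.enabled false "$CONF"   # overwrites the line; no duplicate
cat "$CONF"
```

The same helper applies to the HiBench hadoop.conf and spark.conf edits later in this chapter, since they use the same "key value" layout.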


Synchronizing the core-site.xml and hdfs-site.xml Files of Hadoop

Synchronize the files.
cp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/spark/conf
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/spark/conf

Synchronizing the mariadb-java-client Package

NOTE

If the Hive database is used, synchronize the mariadb-java-client package.

Synchronize the package.

cp /usr/local/hive/lib/mariadb-java-client-2.3.0.jar /usr/local/spark/jars

8.6.4 Running Spark (Standalone Mode)

8.6.4.1 Synchronizing the Configuration to Other Nodes

Step 1 Copy spark-2.3.2-bin-hadoop2.7 to the /usr/local directory of agent1, agent2, and agent3.
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent1:/usr/local
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent2:/usr/local
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent3:/usr/local

Step 2 Log in to agent1, agent2, and agent3 and create soft links for spark-2.3.2-bin-hadoop2.7.
cd /usr/local
ln -s spark-2.3.2-bin-hadoop2.7 spark

----End

8.6.4.2 Starting the Spark Cluster

Start the Spark cluster on the server1 node.

cd /usr/local/spark/sbin
./start-all.sh

8.6.4.3 (Optional) Stopping the Spark Cluster

Stop the Spark cluster on the server1 node.

cd /usr/local/spark/sbin
./stop-all.sh

8.6.5 Running Spark (on Yarn Mode)

8.6.5.1 Installing Scala

Step 1 Place scala-2.11.12.tgz in the /usr/local directory on server1 and decompress it.
tar -zvxf scala-2.11.12.tgz

Step 2 Create a soft link.
ln -s scala-2.11.12 scala


Step 3 Edit the /etc/profile file.
vim /etc/profile

Step 4 Add the following environment variables to the end of the file:
export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH

Step 5 Make the environment variables take effect.
source /etc/profile

----End

8.6.5.2 Running in the Yarn-client Mode

Step 1 Submit the task to Yarn and enter the spark-shell mode.
spark-shell --master yarn --deploy-mode client

Step 2 Access the Yarn web page at http://server1:8088. The new task is displayed.

Change server1 to the IP address of the node where the NameNode process resides.

Step 3 Click Application for a task in the Tracking column. The details page is displayed.


Step 4 Click ApplicationMaster to access the Spark page (if the page cannot be directly displayed, manually change server1 to the actual IP address).

----End
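Beyond spark-shell, a quick way to exercise the yarn-client path is the SparkPi example bundled with Spark. A sketch that assembles the spark-submit command and prints it for inspection rather than executing it; the examples jar name below is an assumption matching the spark-2.3.2-bin-hadoop2.7 build and may differ for other builds:

```shell
# Assemble a spark-submit command for the bundled SparkPi example.
# The jar name/path varies by build; this one assumes spark-2.3.2-bin-hadoop2.7.
SPARK_HOME=/usr/local/spark
CMD="spark-submit --master yarn --deploy-mode client \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.3.2.jar 100"
echo "$CMD"        # inspect first; run it once the cluster is up
```

A successful run appears on the Yarn web page like any other application, so it doubles as a check of the steps above.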

8.6.5.3 Using HiBench to Verify the Functions

NOTE

The cluster name involved in the operations is specified by the fs.defaultFS parameter in the Hadoop configuration file core-site.xml.

Step 1 Upload HiBench-HiBench-7.0 to the /opt directory and go to the conf directory.
cd /opt/HiBench-HiBench-7.0/conf

Step 2 Modify the hadoop.conf file.
vim hadoop.conf

Change the value of hibench.hadoop.home to the location where Hadoop is stored and the value of hibench.hdfs.master to hdfs://cluster_name.
hibench.hadoop.home /usr/local/hadoop/
hibench.hdfs.master hdfs://ns1

Step 3 Modify the spark.conf file.
vim spark.conf

Change the value of hibench.spark.home to the current location where Spark is stored, the value of hibench.spark.master to yarn-client, and the value of spark.eventLog.dir to hdfs://cluster_name/spark2xJobHistory2x.
hibench.spark.home /usr/local/spark
hibench.spark.master yarn-client
spark.eventLog.dir = hdfs://ns1/spark2xJobHistory2x


Step 4 Create the spark2xJobHistory2x directory in HDFS and check whether the directory is created successfully.
hdfs dfs -mkdir /spark2xJobHistory2x
hdfs dfs -ls /

Step 5 Switch to the HiBench root directory and generate test data.
cd /opt/HiBench-HiBench-7.0
bin/workloads/ml/kmeans/prepare/prepare.sh

Step 6 Run the test script.
bin/workloads/ml/kmeans/spark/run.sh

Step 7 The application status of the tasks executed in steps 5 and 6 can be viewed on the Yarn web page at http://server1:8088.

Change server1 to the IP address of the node where the ResourceManager process resides.

Step 8 Check the test result in the report/hibench.report file.
vim report/hibench.report

----End


9 Storm Deployment Guide (CentOS 7.6 & openEuler 20.03)

9.1 Introduction

9.2 Environment Requirements

9.3 Configuring the Deployment Environment

9.4 Deploying ZooKeeper

9.5 Deploying Storm

9.1 Introduction

Storm Overview

This document describes the Storm deployment procedure. It does not include the software source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
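Because the prebuilt packages differ by platform, it helps to confirm the architecture before choosing between the ARM and x86 downloads listed below; a generic Linux check, not specific to this guide:

```shell
# Print the machine architecture: aarch64 indicates an ARM (TaiShan/Kunpeng)
# server, x86_64 indicates an x86 server.
uname -m
```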


Recommended Versions

Software    Version         How to Obtain
OpenJDK     jdk8u252-b09    ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
                            x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper   3.4.6           Download the software package of the required version from: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Storm       1.2.1           Download the software package of the required version from: https://archive.apache.org/dist/storm/apache-storm-1.2.1/

9.2 Environment Requirements

Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This section uses CentOS 7.6 as an example to describe how to deploy a Storm cluster.

Cluster Planning

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 9-1 lists the data plan of each node.

Table 9-1 Cluster data plan

Node      IP Address
Node 1    IPaddress1
Node 2    IPaddress2
Node 3    IPaddress3
Node 4    IPaddress4

All four nodes use the same drive configuration (system drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD) and the same OS & JDK (CentOS 7.6 & OpenJDK jdk8u252-b09).

Software Planning

Table 9-2 lists the software plan of each node in the cluster.

Table 9-2 Software plan

Node      Service
Node 1    Nimbus and UI
Node 2    QuorumPeerMain and Supervisor
Node 3    QuorumPeerMain and Supervisor
Node 4    QuorumPeerMain and Supervisor

9.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3, respectively.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.

Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
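A sketch of the finished mapping block, using documentation-range placeholder addresses (192.0.2.x) in place of IPaddress1 to IPaddress4; it is written to a temporary file here so the example is safe to run as-is before copying the lines into /etc/hosts:

```shell
# Build the mapping block in a temp file; the addresses are placeholders.
hosts_file=/tmp/hosts-example
cat > "$hosts_file" <<'EOF'
192.0.2.1 server1
192.0.2.2 agent1
192.0.2.3 agent2
192.0.2.4 agent3
EOF

# Confirm every planned host name appears in the file.
for h in server1 agent1 agent2 agent3; do
  grep -cw "$h" "$hosts_file"
done
```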

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.

1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
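The key distribution has to be repeated for every node; a sketch that loops over the host names planned in this guide, printed as a dry run so it is safe to inspect first (drop the echo to execute):

```shell
# Print one ssh-copy-id command per node, including the local node.
for node in server1 agent1 agent2 agent3; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$node"
done
```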

Step 5 Log in to each node and install OpenJDK.

1. Install OpenJDK.
ARM:


wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local

x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the OpenJDK version information is displayed, for example: openjdk version "1.8.0_252"

----End

9.4 Deploying ZooKeeper

9.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in the ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper

----End

9.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add ZooKeeper to environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


9.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.

cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
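The myid value written here must correspond to a server.N entry in zoo.cfg, or the node cannot join the ensemble. A minimal self-contained sketch of that consistency check; the sample files under /tmp are assumptions standing in for the real ones under /usr/local/zookeeper:

```shell
# Sample zoo.cfg fragment and myid file mirroring this guide's values.
cat > /tmp/zoo.cfg <<'EOF'
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
EOF
echo 1 > /tmp/myid

# A node is consistent only if its myid matches one server.N line.
id=$(cat /tmp/myid)
grep -q "^server\.$id=" /tmp/zoo.cfg && echo "myid $id matches zoo.cfg"
```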

----End

9.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.

scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.


● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
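With the passwordless SSH configured earlier, the per-node commands above can also be driven from a single host; a sketch printed as a dry run (remove the echo semantics by running the printed commands):

```shell
# Print the per-node command for agent2 and agent3; each node gets the next ID.
i=2
for node in agent2 agent3; do
  printf 'ssh root@%s "cd /usr/local && ln -s zookeeper-3.4.6 zookeeper && echo %d > /usr/local/zookeeper/tmp/myid"\n' "$node" "$i"
  i=$((i + 1))
done
```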

----End

9.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.

cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
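In a healthy three-node ensemble, one node reports Mode: leader and the other two report Mode: follower. To check every node's role from one host, the status command can be scripted; a sketch assuming the passwordless SSH configured earlier, printed as a dry run (drop the echo to execute):

```shell
# Print the status command for each ZooKeeper node.
for node in agent1 agent2 agent3; do
  echo ssh root@"$node" /usr/local/zookeeper/bin/zkServer.sh status
done
```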

----End

9.5 Deploying Storm

9.5.1 Obtaining Storm

Step 1 Download the Storm package.

wget https://archive.apache.org/dist/storm/apache-storm-1.2.1/apache-storm-1.2.1.tar.gz

Step 2 Place apache-storm-1.2.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv apache-storm-1.2.1.tar.gz /usr/local
cd /usr/local
tar -zxvf apache-storm-1.2.1.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s apache-storm-1.2.1 storm

----End

9.5.2 Setting Storm Environment Variables

Step 1 Open the configuration file.

vim /etc/profile

Step 2 Add Storm paths to environment variables.
export STORM_HOME=/usr/local/storm
export PATH=$STORM_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile

----End


9.5.3 Modifying the Storm Configuration File

NOTE

All Storm configuration files are stored in the $STORM_HOME/conf directory. Before modifying the configuration files, switch to that directory.
cd $STORM_HOME/conf

Modify the storm.yaml file.
vim storm.yaml

Content to be modified is as follows:
storm.zookeeper.servers:
 - "agent1"    # You can change it to the corresponding IP address.
 - "agent2"
 - "agent3"
storm.zookeeper.port: 2181
storm.local.dir: "/usr/local/storm/stormLocal"    # You need to manually create it.
nimbus.seeds: ["server1"]    # You can change it to the corresponding IP address.
supervisor.slots.ports:    # The number of slots depends on the actual situation.
 - 6700
 - 6701
 - 6702
 - 6703
storm.health.check.dir: "healthchecks"
storm.health.check.timeout.ms: 5000
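The storm.local.dir path referenced in the configuration must exist on every Storm node before startup. A minimal sketch that prepares it on all nodes, using the host names planned in this guide and printed as a dry run (drop the echo to execute):

```shell
# Print the command that would create the local state directory on each node.
for node in server1 agent1 agent2 agent3; do
  echo ssh root@"$node" mkdir -p /usr/local/storm/stormLocal
done
```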

9.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy apache-storm-1.2.1 to the /usr/local directory of agent1 to agent3.

scp -r /usr/local/apache-storm-1.2.1 root@agent1:/usr/local
scp -r /usr/local/apache-storm-1.2.1 root@agent2:/usr/local
scp -r /usr/local/apache-storm-1.2.1 root@agent3:/usr/local

Step 2 On each of agent1 to agent3, create a soft link for apache-storm-1.2.1.
cd /usr/local
ln -s apache-storm-1.2.1 storm

----End

9.5.5 Running and Verifying Storm

Step 1 Start a Storm cluster.

1. Start a ZooKeeper cluster. For details, see Running and Verifying ZooKeeper.
2. Start the following processes on server1:

/usr/local/storm/bin/storm nimbus &
/usr/local/storm/bin/storm ui &

3. Start the following process on agent1 to agent3:
/usr/local/storm/bin/storm supervisor &

4. Check whether all processes are started properly. The UI process appears in the jps output as core.
jps


NOTE

Observe the process startup status on all nodes: server1 should be running the Nimbus and core (UI) processes, and agent1 to agent3 should each be running the Supervisor process. After starting these processes, wait about 30 seconds for the Storm nodes to connect to one another before checking that Storm has started properly; otherwise, the check may report a startup failure. (You can restart Storm to rectify the fault.)

Step 2 Stop the Storm cluster.

1. Stop the following processes on server1:
jps | grep nimbus | grep -v grep | awk '{print $1}' | xargs kill -9
jps | grep core | grep -v grep | awk '{print $1}' | xargs kill -9

2. Stop the following process on agent1 to agent3:
jps | grep Supervisor | grep -v grep | awk '{print $1}' | xargs kill -9

3. Stop the ZooKeeper cluster. For details, see Running and Verifying ZooKeeper.

----End


A Change History

Date          Description

2021-07-13    This issue is the fourth official release.
              Added the adaptation to openEuler 20.03 in the deployment guides of Apache components.

2020-07-24    This issue is the third official release.
              Moved the Elasticsearch and Redis deployment guides to the Other category. For details, see Elasticsearch Deployment Guide (CentOS 7.6 & openEuler 20.03) and Redis Deployment Guide (CentOS 7.6 & openEuler 20.03).


2020-05-23    This issue is the second official release.
              ● Modified some descriptions in 1.4.3 Modifying the ZooKeeper Configuration Files in the ZooKeeper Deployment Guide (CentOS 7.6).
              ● Modified the descriptions in "Environment Requirements" and "Configuring the Deployment Environment" in the Elasticsearch Deployment Guide (CentOS 7.6), as well as the parameter descriptions in "Modifying the Elasticsearch Configuration File" and "Synchronizing the Configuration to Other Nodes."
              ● Modified the parameter description in 3.6.3 Modifying the Flink Configuration Files in the Flink Deployment Guide (CentOS 7.6).
              ● Modified some descriptions in 4.6.5 Starting the HBase Cluster in the HBase Cluster Deployment Guide (CentOS 7.6).
              ● Modified the parameter description in 6.5.3 Modifying the Kafka Configuration Files in the Kafka Deployment Guide (CentOS 7.6).
              ● Modified some descriptions in "Deploying a Cluster" in the Redis Deployment Guide (CentOS 7.6).
              ● Deleted "Troubleshooting" from the Spark Deployment Guide (CentOS 7.6).
              ● Modified some descriptions in 8.6.5.3 Using HiBench to Verify the Functions in the Spark Deployment Guide (CentOS 7.6).

2020-03-20    This issue is the first official release.
