Kunpeng BoostKit for Big Data
Deployment Guide (Apache)
Issue 05
Date 2021-10-19
HUAWEI TECHNOLOGIES CO., LTD.
Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.
Issue 05 (2021-10-19) Copyright © Huawei Technologies Co., Ltd. i
Contents

1 ZooKeeper Deployment Guide (CentOS 7.6 & openEuler 20.03)
1.1 Introduction
1.2 Environment Requirements
1.3 Configuring the Deployment Environment
1.4 Deploying ZooKeeper
1.4.1 Compiling and Decompressing ZooKeeper
1.4.2 Setting ZooKeeper Environment Variables
1.4.3 Modifying the ZooKeeper Configuration Files
1.4.4 Synchronizing the Configuration to Other Nodes
1.5 Running and Verifying ZooKeeper

2 Hadoop Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
2.1 Introduction
2.2 Environment Requirements
2.3 Configuring the Deployment Environment
2.4 Deploying ZooKeeper
2.4.1 Compiling and Decompressing ZooKeeper
2.4.2 Setting ZooKeeper Environment Variables
2.4.3 Modifying the ZooKeeper Configuration Files
2.4.4 Synchronizing the Configuration to Other Nodes
2.4.5 Running and Verifying ZooKeeper
2.5 Deploying Hadoop
2.5.1 Compiling and Decompressing Hadoop
2.5.2 Setting the Hadoop Environment Variables
2.5.3 Modifying the Hadoop Configuration File
2.5.4 Synchronizing the Configuration to Other Nodes
2.5.5 Starting the Hadoop Cluster
2.5.6 Verifying Hadoop
2.6 Troubleshooting

3 Flink Deployment Guide (CentOS 7.6 & openEuler 20.03)
3.1 Introduction
3.2 Environment Requirements
3.3 Configuring the Deployment Environment
3.4 Deploying ZooKeeper
3.4.1 Compiling and Decompressing ZooKeeper
3.4.2 Setting ZooKeeper Environment Variables
3.4.3 Modifying the ZooKeeper Configuration Files
3.4.4 Synchronizing the Configuration to Other Nodes
3.4.5 Running and Verifying ZooKeeper
3.5 Deploying Hadoop
3.5.1 Compiling and Decompressing Hadoop
3.5.2 Setting the Hadoop Environment Variables
3.5.3 Modifying the Hadoop Configuration File
3.5.4 Synchronizing the Configuration to Other Nodes
3.5.5 Starting the Hadoop Cluster
3.5.6 Verifying Hadoop
3.6 Deploying Flink (Flink on Yarn)
3.6.1 Obtaining Flink
3.6.2 Setting Flink Environment Variables
3.6.3 Modifying the Flink Configuration Files
3.6.4 Running and Verifying Flink
3.6.5 Stopping Flink

4 HBase Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
4.1 Introduction
4.2 Environment Requirements
4.3 Configuring the Deployment Environment
4.4 Deploying ZooKeeper
4.4.1 Compiling and Decompressing ZooKeeper
4.4.2 Setting ZooKeeper Environment Variables
4.4.3 Modifying the ZooKeeper Configuration Files
4.4.4 Synchronizing the Configuration to Other Nodes
4.4.5 Running and Verifying ZooKeeper
4.5 Deploying Hadoop
4.5.1 Compiling and Decompressing Hadoop
4.5.2 Setting the Hadoop Environment Variables
4.5.3 Modifying the Hadoop Configuration File
4.5.4 Synchronizing the Configuration to Other Nodes
4.5.5 Starting the Hadoop Cluster
4.5.6 Verifying Hadoop
4.6 Deploying HBase
4.6.1 Obtaining HBase
4.6.2 Setting HBase Environment Variables
4.6.3 Modifying the HBase Configuration Files
4.6.4 Synchronizing the Configuration to Other Nodes
4.6.5 Starting the HBase Cluster
4.6.6 (Optional) Stopping the HBase Cluster
4.6.7 Verifying HBase

5 Hive Deployment Guide (CentOS 7.6 & openEuler 20.03)
5.1 Introduction
5.2 Environment Requirements
5.3 Configuring the Deployment Environment
5.4 Deploying ZooKeeper
5.4.1 Compiling and Decompressing ZooKeeper
5.4.2 Setting ZooKeeper Environment Variables
5.4.3 Modifying the ZooKeeper Configuration Files
5.4.4 Synchronizing the Configuration to Other Nodes
5.4.5 Running and Verifying ZooKeeper
5.5 Deploying Hadoop
5.5.1 Compiling and Decompressing Hadoop
5.5.2 Setting the Hadoop Environment Variables
5.5.3 Modifying the Hadoop Configuration File
5.5.4 Synchronizing the Configuration to Other Nodes
5.5.5 Starting the Hadoop Cluster
5.5.6 Verifying Hadoop
5.6 Deploying Hive
5.6.1 Installing MariaDB
5.6.2 Obtaining Hive
5.6.3 Setting Hive Environment Variables
5.6.4 Modifying the Hive Configuration Files
5.6.5 Starting and Verifying Hive

6 Kafka Deployment Guide (CentOS 7.6 & openEuler 20.03)
6.1 Introduction
6.2 Environment Requirements
6.3 Configuring the Deployment Environment
6.4 Deploying ZooKeeper
6.4.1 Compiling and Decompressing ZooKeeper
6.4.2 Setting ZooKeeper Environment Variables
6.4.3 Modifying the ZooKeeper Configuration Files
6.4.4 Synchronizing the Configuration to Other Nodes
6.4.5 Running and Verifying ZooKeeper
6.5 Deploying Kafka
6.5.1 Obtaining Kafka
6.5.2 Setting Kafka Environment Variables
6.5.3 Modifying the Kafka Configuration Files
6.5.4 Verifying Kafka

7 Solr Deployment Guide (CentOS 7.6 & openEuler 20.03)
7.1 Introduction
7.2 Environment Requirements
7.3 Configuring the Deployment Environment
7.4 Deploying ZooKeeper
7.4.1 Compiling and Decompressing ZooKeeper
7.4.2 Setting ZooKeeper Environment Variables
7.4.3 Modifying the ZooKeeper Configuration Files
7.4.4 Synchronizing the Configuration to Other Nodes
7.4.5 Running and Verifying ZooKeeper
7.5 Deploying Solr
7.5.1 Obtaining Solr
7.5.2 Setting Solr Environment Variables
7.5.3 Copying the Solr Configuration
7.5.4 Modifying the Configuration
7.5.5 Synchronizing the Configuration to Other Nodes
7.5.6 Uploading the Configuration to the ZooKeeper Cluster
7.5.7 Running and Verifying Solr

8 Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
8.1 Introduction
8.2 Environment Requirements
8.3 Configuring the Deployment Environment
8.4 Deploying ZooKeeper
8.4.1 Compiling and Decompressing ZooKeeper
8.4.2 Setting ZooKeeper Environment Variables
8.4.3 Modifying the ZooKeeper Configuration Files
8.4.4 Synchronizing the Configuration to Other Nodes
8.4.5 Running and Verifying ZooKeeper
8.5 Deploying Hadoop
8.5.1 Compiling and Decompressing Hadoop
8.5.2 Setting the Hadoop Environment Variables
8.5.3 Modifying the Hadoop Configuration File
8.5.4 Synchronizing the Configuration to Other Nodes
8.5.5 Starting the Hadoop Cluster
8.5.6 Verifying Hadoop
8.6 Deploying Spark
8.6.1 Obtaining Spark
8.6.2 Setting Spark Environment Variables
8.6.3 Modifying the Spark Configuration Files
8.6.4 Running Spark (Standalone Mode)
8.6.4.1 Synchronizing the Configuration to Other Nodes
8.6.4.2 Starting the Spark Cluster
8.6.4.3 (Optional) Stopping the Spark Cluster
8.6.5 Running Spark (on Yarn Mode)
8.6.5.1 Installing Scala
8.6.5.2 Running in the Yarn-client Mode
8.6.5.3 Using HiBench to Verify the Functions

9 Storm Deployment Guide (CentOS 7.6 & openEuler 20.03)
9.1 Introduction
9.2 Environment Requirements
9.3 Configuring the Deployment Environment
9.4 Deploying ZooKeeper
9.4.1 Compiling and Decompressing ZooKeeper
9.4.2 Setting ZooKeeper Environment Variables
9.4.3 Modifying the ZooKeeper Configuration Files
9.4.4 Synchronizing the Configuration to Other Nodes
9.4.5 Running and Verifying ZooKeeper
9.5 Deploying Storm
9.5.1 Obtaining Storm
9.5.2 Setting Storm Environment Variables
9.5.3 Modifying the Storm Configuration File
9.5.4 Synchronizing the Configuration to Other Nodes
9.5.5 Running and Verifying Storm

A Change History
1 ZooKeeper Deployment Guide (CentOS 7.6 & openEuler 20.03)
1.1 Introduction
1.2 Environment Requirements
1.3 Configuring the Deployment Environment
1.4 Deploying ZooKeeper
1.5 Running and Verifying ZooKeeper
1.1 Introduction
ZooKeeper Overview

This document describes the ZooKeeper deployment procedure and does not include the source code compilation procedure.
All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
  ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
  x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
1.2 Environment Requirements
Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This document uses CentOS 7.6 as an example to describe how to deploy a ZooKeeper cluster.

Cluster Environment

For example, the cluster has nodes 1 to 4. Table 1-1 lists the data plan of each node.
Table 1-1 Cluster data plan

Node     IP Address    Number of Drives                                          OS & JDK
Node 1   IPaddress1    System drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD    CentOS 7.6 & OpenJDK jdk8u252-b09
Node 2   IPaddress2    (same for all nodes)                                      (same for all nodes)
Node 3   IPaddress3
Node 4   IPaddress4
Software Planning

Table 1-2 lists the software plan of each node in the cluster.
Table 1-2 Software plan
Node Services
Node 1 -
Node 2 QuorumPeerMain
Node 3 QuorumPeerMain
Node 4 QuorumPeerMain
1.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
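The mappings above can be staged with a short loop before they go into /etc/hosts. This is an optional sketch: the IPaddressN values are the placeholders from the data plan, and hosts.staged is a hypothetical staging file used so the result can be reviewed first.

```shell
# Stage the four hostname mappings, skipping any that are already present.
# On the real nodes, point HOSTS_FILE at /etc/hosts instead.
HOSTS_FILE=./hosts.staged
touch "$HOSTS_FILE"
while read -r ip name; do
  # add the line only when no entry for that host name exists yet
  grep -qw "$name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
done <<'EOF'
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
EOF
```

The `grep -qw` guard makes the loop idempotent, so re-running the setup does not duplicate entries.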
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key. Press Enter at each prompt.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
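Running ssh-copy-id once per node is easy to script. A sketch that writes the commands to a review file first (the host names are from Step 1; sshcopy.cmds is a hypothetical file name):

```shell
# Generate one ssh-copy-id command per node, including the local node,
# so the list can be reviewed and then executed with: sh sshcopy.cmds
for node in server1 agent1 agent2 agent3; do
  echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$node"
done > sshcopy.cmds
cat sshcopy.cmds
```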
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if information similar to the following is displayed:
----End
1.4 Deploying ZooKeeper
1.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
1.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
1.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
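The myid written in Step 5 must match the server.N index that zoo.cfg assigns to the local host, which is easy to get wrong when the step is repeated per node. A sketch that derives both from one ordered list (zk-stage is a hypothetical staging directory; on a real cluster the per-node myid files would then be copied out with scp):

```shell
# Generate the server.N lines and the matching per-node myid files from a
# single ordered host list, so the two can never disagree.
STAGE=./zk-stage
mkdir -p "$STAGE"
i=1
for node in agent1 agent2 agent3; do
  echo "server.$i=$node:2888:3888" >> "$STAGE/zoo.cfg.append"  # lines for zoo.cfg
  mkdir -p "$STAGE/$node"
  echo "$i" > "$STAGE/$node/myid"                              # becomes tmp/myid on $node
  i=$((i + 1))
done
```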
----End
1.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
1.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

To stop ZooKeeper on agent1, agent2, and agent3:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
2 Hadoop Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
2.1 Introduction
2.2 Environment Requirements
2.3 Configuring the Deployment Environment
2.4 Deploying ZooKeeper
2.5 Deploying Hadoop
2.6 Troubleshooting
2.1 Introduction
Overview

This document describes the software deployment procedure and does not involve the software source code compilation procedure.

You can download all programs required in this document from their official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure remains the same regardless of the program compilation platform.
Recommended Software Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
  ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
  x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/
2.2 Environment Requirements
Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity

The configuration depends on the actual application scenario.

OS

CentOS 7.4 to 7.6, openEuler 20.03

NOTE

This section uses CentOS 7.6 as an example to describe how to deploy a Hadoop cluster.

Cluster Environment Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 2-1 lists the data specifications of each node.
Table 2-1 Cluster environment plan

Machine Name   IP Address    Number of Drives                                          OS and JDK
Node 1         IPaddress1    System drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD    CentOS 7.6 and OpenJDK jdk8u252-b09
Node 2         IPaddress2    (same for all nodes)                                      (same for all nodes)
Node 3         IPaddress3
Node 4         IPaddress4
Software Plan
Table 2-2 describes the software planning of each node in the cluster.
Table 2-2 Software plan
Machine Name   Service Name
Node 1 NameNode and ResourceManager
Node 2 QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 3 QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 4 QuorumPeerMain, DataNode, NodeManager, and JournalNode
2.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key. Press Enter at each prompt.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH

3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if information similar to the following is displayed:
----End
2.4 Deploying ZooKeeper
2.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
2.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
2.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
2.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.

● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid

● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
2.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

To stop ZooKeeper on agent1, agent2, and agent3:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
2.5 Deploying Hadoop
2.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop
----End
2.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:
vim /etc/profile
Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
2.5.3 Modifying the Hadoop Configuration File

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop
Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to root.

echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
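The echo ... >> lines above append unconditionally, so running them a second time leaves duplicate exports in hadoop-env.sh. A sketch of an append-once variant (hadoop-env.staged is a hypothetical staging copy; on the node, ENV_FILE would be hadoop-env.sh):

```shell
# Append a line only if the file does not already contain it verbatim.
ENV_FILE=./hadoop-env.staged
touch "$ENV_FILE"
append_once() {
  grep -qxF "$1" "$ENV_FILE" || echo "$1" >> "$ENV_FILE"
}
append_once "export JAVA_HOME=/usr/local/jdk8u252-b09"
append_once "export HDFS_NAMENODE_USER=root"
append_once "export HDFS_NAMENODE_USER=root"   # repeated call is a no-op
append_once "export HDFS_SECONDARYNAMENODE_USER=root"
append_once "export HDFS_DATANODE_USER=root"
```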
Modifying the yarn-env.sh File

Change the user to root.

echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
Modifying the core-site.xml File
Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
NOTICE

Create a directory on server1.
mkdir /home/hadoop_tmp_dir
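When scripting changes to the *-site.xml files in this section, a small helper that emits a <property> block keeps the XML consistent. This is an illustrative sketch only: the prop function and the core-site.fragment file name are not part of Hadoop.

```shell
# Emit one Hadoop <property> block for a name/value pair.
prop() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Example: regenerate one of the core-site.xml entries shown above.
prop fs.defaultFS hdfs://server1:9000 > core-site.fragment
cat core-site.fragment
```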
----End
Modifying the hdfs-site.xml File
Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
NOTICE

Create the directories for dfs.datanode.data.dir on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
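The brace expansion in the mkdir above is a bash feature; under a plain POSIX sh the braces are taken literally and a single oddly named directory is created. A portable equivalent, shown against a scratch root instead of /data:

```shell
# Create data1..data12 under a scratch root; on the agents, ROOT would be
# /data and the dn subdirectories match dfs.datanode.data.dir.
ROOT=./data-demo
i=1
while [ "$i" -le 12 ]; do
  mkdir -p "$ROOT/data$i/hadoop/dn"
  i=$((i + 1))
done
ls -d "$ROOT"/data*/hadoop/dn | wc -l   # one line per directory, 12 in total
```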
----End
Modifying the mapred-site.xml File
Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
----End
Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>

NOTICE

Create the directories for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.

Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
----End
Modifying the slaves or workers File

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (taking Hadoop 3.1.1 as an example in this document).
vim workers

Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.
agent1
agent2
agent3
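Regenerating the workers file from the agent list, instead of hand-editing it, guarantees it contains nothing but the agent host names (a stale entry such as localhost would otherwise start a DataNode where none is planned). A sketch using a hypothetical staging file:

```shell
# Write exactly the three agent host names, one per line.
printf '%s\n' agent1 agent2 agent3 > workers.staged
cat workers.staged
```

On the node, the staged file would replace $HADOOP_HOME/etc/hadoop/workers.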
----End
2.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop
----End
2.5.5 Starting the Hadoop Cluster

NOTICE

Perform the operations in this section in sequence.
Step 1 Start the ZooKeeper cluster.
Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
Step 2 Start JournalNode.
Start JournalNode on agent1, agent2, and agent3.
NOTE

Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster next time.

cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode
Step 3 Format HDFS.
1. Format HDFS on server1.
hdfs namenode -format
2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.
Step 4 Format ZKFC.
Format ZKFC on server1.
hdfs zkfc -formatZK
Step 5 Start the HDFS.
Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh
Step 6 Start Yarn.
Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh
Step 7 Check whether all processes are started properly.
NOTE
Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on other server nodes and agent nodes are similar.)
jps
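The manual jps check can also be scripted against the service plan in Table 2-2. A sketch that compares a node's jps output with the services expected on an agent node; the here-document stands in for real output, so on a node you would replace it with jps > jps.out:

```shell
# Sample jps output for an agent node (PIDs are placeholders).
cat > jps.out <<'EOF'
2345 QuorumPeerMain
2456 DataNode
2567 NodeManager
2678 JournalNode
2789 Jps
EOF

# Flag any planned agent service that is missing from the output.
missing=""
for svc in QuorumPeerMain DataNode NodeManager JournalNode; do
  grep -qw "$svc" jps.out || missing="$missing $svc"
done
if [ -z "$missing" ]; then
  echo "all expected processes running"
else
  echo "missing:$missing"
fi
```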
----End
2.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
2.6 Troubleshooting
Failed to Start ZooKeeper

Symptom:

After running the zkServer.sh command to check the ZooKeeper startup status when ZooKeeper is started, a message is displayed indicating a startup failure. After ZooKeeper is stopped and started again by running the zkServer.sh start-foreground command, the following error information is displayed.
Solution:
1. Change the IP address of the node where ZooKeeper failed to start to 0.0.0.0 in the zoo.cfg file.

2. Modify the myid file in the directory specified by dataDir in the zoo.cfg file. Ensure that the number in $dataDir/myid is the same as that in zookeeper/tmp/myid.
3 Flink Deployment Guide (CentOS 7.6 & openEuler 20.03)
3.1 Introduction
3.2 Environment Requirements
3.3 Configuring the Deployment Environment
3.4 Deploying ZooKeeper
3.5 Deploying Hadoop
3.6 Deploying Flink (Flink on Yarn)
3.1 Introduction
Flink Overview

This document describes the Flink deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions

Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
  ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
  x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz

Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/

Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/

Software: Flink
Version: 1.7.0
How to Obtain: Download the software package of the required version from the official website:
  https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-bin-hadoop28-scala_2.11.tgz
3.2 Environment Requirements
Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity
The configuration depends on the actual application scenario.
OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03
NOTE
This document uses CentOS 7.6 as an example to describe how to deploy a Flink cluster.
Cluster Environment
In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 3-1 lists the specifications of each node.
Table 3-1 Cluster data plan

Node     IP Address    Number of Drives                                          OS and JDK
Node 1   IPaddress1    System drive: 1 x 4 TB HDD; data drives: 12 x 4 TB HDD    CentOS 7.6 and OpenJDK jdk8u252-b09
Node 2   IPaddress2    (same for all nodes)                                      (same for all nodes)
Node 3   IPaddress3
Node 4   IPaddress4
Software Planning

Table 3-2 lists the software plan of each node in the cluster.
Table 3-2 Software plan
Node Services
Node 1 JobManager
Node 2 QuorumPeerMain and TaskManager
Node 3 QuorumPeerMain and TaskManager
Node 4 QuorumPeerMain and TaskManager
3.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.

IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key. Press Enter at each prompt.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.ARM:wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gztar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gztar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
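The original screenshot is not reproduced here; for this AdoptOpenJDK 8u252 build the banner is similar to the following (illustrative only, the exact build strings may differ):

```
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.252-b09, mixed mode)
```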
----End
3.4 Deploying ZooKeeper
3.4.1 Compiling and Decompressing ZooKeeper
Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).
Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
3.4.2 Setting ZooKeeper Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
3.4.3 Modifying the ZooKeeper Configuration Files
Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf
Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp
Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
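The myid value on each node must match its server.N index in zoo.cfg. As a sketch (assuming the agents map to server.1 through server.3 in order), the per-node commands can be generated as follows; the loop prints them rather than running them remotely:

```shell
# Generate the myid assignment for each ZooKeeper node. The printed line for
# agentN is the command to run on that node.
id=1
for host in agent1 agent2 agent3; do
  echo "$host: echo $id > /usr/local/zookeeper/tmp/myid"
  id=$((id + 1))
done
```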
----End
3.4.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local
Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
3.4.5 Running and Verifying ZooKeeper
Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
NOTE
You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop
Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
3.5 Deploying Hadoop
3.5.1 Compiling and Decompressing Hadoop
Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).
Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz
Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop
----End
3.5.2 Setting the Hadoop Environment Variables
Step 1 Open the /etc/profile file:
vim /etc/profile
Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
3.5.3 Modifying the Hadoop Configuration Files
NOTE
All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Go to this directory before modifying the configuration files.
cd $HADOOP_HOME/etc/hadoop
Modifying the hadoop-env.sh File
Change the JAVA_HOME environment variable to an absolute path and set the user to root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
Modifying the yarn-env.sh File
Change the user to root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
Modifying the core-site.xml File
Step 1 Open the core-site.xml file.
vim core-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>fs.defaultFS</name> <value>hdfs://server1:9000</value> </property>
<property> <name>hadoop.tmp.dir</name> <value>/home/hadoop_tmp_dir</value> </property>
<property> <name>ipc.client.connect.max.retries</name> <value>100</value> </property>
<property> <name>ipc.client.connect.retry.interval</name> <value>10000</value> </property>
<property> <name>hadoop.proxyuser.root.hosts</name>
<value>*</value> </property>
<property> <name>hadoop.proxyuser.root.groups</name> <value>*</value> </property>
NOTICE
Create the directory on server1.
mkdir /home/hadoop_tmp_dir
----End
Modifying the hdfs-site.xml File
Step 1 Open the hdfs-site.xml file.
vim hdfs-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>dfs.replication</name> <value>1</value> </property>
<property> <name>dfs.namenode.name.dir</name> <value>/data/data1/hadoop/nn</value> </property>
<property> <name>dfs.datanode.data.dir</name> <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value> </property>
<property> <name>dfs.http.address</name> <value>server1:50070</value> </property>
<property> <name>dfs.namenode.http-bind-host</name> <value>0.0.0.0</value> </property>
<property> <name>dfs.datanode.handler.count</name> <value>600</value> </property>
<property> <name>dfs.namenode.handler.count</name> <value>600</value> </property>
<property> <name>dfs.namenode.service.handler.count</name> <value>600</value> </property>
<property> <name>ipc.server.handler.queue.size</name> <value>300</value> </property>
<property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property>
NOTICE
Create the directories for dfs.datanode.data.dir on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
----End
Modifying the mapred-site.xml File
Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>mapreduce.framework.name</name> <value>yarn</value> <final>true</final> <description>The runtime framework for executing MapReduce jobs</description> </property>
<property> <name>mapreduce.job.reduce.slowstart.completedmaps</name> <value>0.88</value> </property>
<property> <name>mapreduce.application.classpath</name> <value> /usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/* </value> </property>
<property> <name>mapreduce.map.memory.mb</name> <value>6144</value> </property>
<property> <name>mapreduce.reduce.memory.mb</name> <value>6144</value> </property>
<property> <name>mapreduce.map.java.opts</name> <value>-Xmx5530m</value> </property>
<property> <name>mapreduce.reduce.java.opts</name> <value>-Xmx2765m</value> </property>
<property> <name>mapred.child.java.opts</name> <value>-Xmx2048m -Xms2048m</value> </property>
<property> <name>mapred.reduce.parallel.copies</name> <value>20</value> </property>
<property> <name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
<property> <name>mapreduce.map.env</name> <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
<property> <name>mapreduce.reduce.env</name> <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
----End
Modifying the yarn-site.xml File
Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml
Step 2 Add or modify the following parameters in the configuration section (the original listed yarn.nodemanager.aux-services twice; the duplicate is removed here).
<property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> <final>true</final> </property>
<property> <name>yarn.resourcemanager.hostname</name> <value>server1</value> </property>
<property> <name>yarn.resourcemanager.bind-host</name> <value>0.0.0.0</value> </property>
<property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>65536</value> </property>
<property> <name>yarn.nodemanager.resource.memory-mb</name> <value>102400</value> </property>
<property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>48</value> </property>
<property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
<property> <name>yarn.client.nodemanager-connect.max-wait-ms</name> <value>300000</value> </property>
<property> <name>yarn.nodemanager.vmem-pmem-ratio</name> <value>7.1</value> </property>
<property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value> </property>
<property> <name>yarn.nodemanager.pmem-check-enabled</name> <value>false</value> </property>
<property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>3072</value> </property>
<property> <name>yarn.app.mapreduce.am.resource.mb</name> <value>3072</value> </property>
<property> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>48</value> </property>
<property> <name>yarn.application.classpath</name> <value> /usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/* </value> </property>
<property> <name>yarn.nodemanager.local-dirs</name> <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value> </property>
<property> <name>yarn.nodemanager.log-dirs</name> <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value> </property>
NOTICE
Create the directories for yarn.nodemanager.local-dirs on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
----End
Modifying the slaves or workers Files
Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.
Step 2 Edit the workers file (this document takes Hadoop 3.1.1 as an example).
vim workers
Step 3 Delete all content of the workers file except the IP addresses or host names of the agent nodes.
agent1
agent2
agent3
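Instead of editing the file by hand, the three host names can be written in one command. A minimal sketch, assuming the current directory is $HADOOP_HOME/etc/hadoop:

```shell
# Overwrite the workers file with exactly the three agent host names,
# one per line, then show the result.
printf '%s\n' agent1 agent2 agent3 > workers
cat workers
```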
----End
3.5.4 Synchronizing the Configuration to Other Nodes
Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata
Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local
Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop
----End
3.5.5 Starting the Hadoop Cluster
NOTICE
Perform the operations in this section in sequence.
Step 1 Start the ZooKeeper cluster.
Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
Step 2 Start JournalNode.
Start JournalNode on agent1, agent2, and agent3.
NOTE
Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 the next time you start the cluster.
cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode
Step 3 Format HDFS.
1. Format HDFS on server1.
hdfs namenode -format
2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.
Step 4 Format ZKFC.
Format ZKFC on server1.
hdfs zkfc -formatZK
Step 5 Start the HDFS.
Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh
Step 6 Start Yarn.
Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh
Step 7 Check whether all processes are started properly.
NOTE
Perform this operation on each node to check whether all processes are started properly. (The processes to be started on server1 and agent1 are representative; the processes on the other server and agent nodes are similar.)
jps
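The jps listing can be checked against the software plan. The sketch below compares a sample agent-node jps output (the sample text is an assumption standing in for real output) with the process set an agent node is expected to run:

```shell
# Processes an agent node should run in this deployment: ZooKeeper
# (QuorumPeerMain), HDFS (DataNode, JournalNode), and Yarn (NodeManager).
expected="QuorumPeerMain DataNode JournalNode NodeManager"
# Sample jps output; on a real node use: jps_out=$(jps)
jps_out="11001 QuorumPeerMain
11002 DataNode
11003 JournalNode
11004 NodeManager
11005 Jps"
for proc in $expected; do
  if echo "$jps_out" | grep -q "$proc"; then
    echo "$proc: running"
  else
    echo "$proc: MISSING"
  fi
done
```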
----End
3.5.6 Verifying Hadoop
Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.
Replace server1 with the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
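The same check can be made from the command line with hdfs dfsadmin -report. The sketch below parses the live-node count out of a sample report fragment (the sample text is an assumption standing in for real command output):

```shell
# On a live cluster, capture the real report instead:
#   report=$(hdfs dfsadmin -report)
report="Live datanodes (3):
Dead datanodes (0):"
# Pull the number out of the "Live datanodes (N):" line.
live=$(echo "$report" | sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p')
echo "live datanodes: $live"
```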
3.6 Deploying Flink (Flink on Yarn)
3.6.1 Obtaining Flink
Step 1 Download the Flink installation package.
wget https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-bin-hadoop28-scala_2.11.tgz
Step 2 Place the flink-1.7.0-bin-hadoop28-scala_2.11.tgz package in the /usr/local directory of server1 and decompress it.
mv flink-1.7.0-bin-hadoop28-scala_2.11.tgz /usr/local
cd /usr/local
tar -zxvf flink-1.7.0-bin-hadoop28-scala_2.11.tgz
Step 3 Create a soft link for subsequent version update.
ln -s flink-1.7.0 flink
----End
3.6.2 Setting Flink Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add the Flink path to the environment variables.
export FLINK_HOME=/usr/local/flink
export PATH=$FLINK_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
3.6.3 Modifying the Flink Configuration Files
NOTE
All Flink configuration files are stored in the $FLINK_HOME/conf directory. Before modifying the configuration files, run the following command to switch to $FLINK_HOME/conf:
cd $FLINK_HOME/conf
Modify the flink-conf.yaml file as follows:
env.java.home: /usr/local/jdk8u252-b09
env.hadoop.conf.dir: /usr/local/hadoop/etc/hadoop/
3.6.4 Running and Verifying Flink
Step 1 Start ZooKeeper and Hadoop in sequence.
Step 2 Start the Flink cluster on server1:
/usr/local/flink/bin/yarn-session.sh -n 1 -s 1 -jm 768 -tm 1024 -qu default -nm flinkapp -d
NOTE
You can stop the Flink cluster on server1.
/usr/local/flink/bin/stop-cluster.sh
Step 3 Enter the URL in the address box of a browser to access the Flink web UI. The URL format is as follows:
http://server1:8081
----End
3.6.5 Stopping Flink
Check the Yarn task process of Flink and kill the task to stop Flink.
yarn app -kill $(yarn app -list | grep flinkapp | awk '{print $1}')
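The $(...) substitution above selects the application ID from the yarn app -list output. The sketch below walks the same grep/awk pipeline over a sample listing line (the listing text is an assumption; real output has a header and more columns):

```shell
# Extract the first column (the application ID) of the row whose name matches
# "flinkapp". On a live cluster, replace the sample text with: yarn app -list
list="application_1600000000000_0001  flinkapp  Apache Flink  root  default  RUNNING"
appid=$(echo "$list" | grep flinkapp | awk '{print $1}')
echo "$appid"
```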
4 HBase Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
4.1 Introduction
4.2 Environment Requirements
4.3 Configuring the Deployment Environment
4.4 Deploying ZooKeeper
4.5 Deploying Hadoop
4.6 Deploying HBase
4.1 Introduction
HBase Overview
This document describes the HBase deployment procedure and does not include the source code compilation procedure.
All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Kunpeng BoostKit for Big DataDeployment Guide (Apache)
4 HBase Cluster Deployment Guide (CentOS 7.6 &openEuler 20.03)
Issue 05 (2021-10-19) Copyright © Huawei Technologies Co., Ltd. 36
Recommended Versions
Software: OpenJDK
Version: jdk8u252-b09
How to Obtain:
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
Software: ZooKeeper
Version: 3.4.6
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Software: Hadoop
Version: 3.1.1
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/
Software: HBase
Version: 2.0.2
How to Obtain: Download the software package of the required version from the official website: https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz
4.2 Environment Requirements
Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity
The configuration depends on the actual application scenario.
OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03
NOTE
This document uses CentOS 7.6 as an example to describe how to deploy an HBase cluster.
Cluster Environment
In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 4-1 lists the data plan of each node.
Table 4-1 Cluster data plan
Node IP Address Number of Drives OS & JDK
Node 1 IPaddress1 System drive: 1 x 4TB HDD
Data drive: 6 x 4TB HDD
CentOS 7.6 & OpenJDK jdk8u252-b09
Node 2 IPaddress2
Node 3 IPaddress3
Node 4 IPaddress4
Software Planning
Table 4-2 lists the software plan of each node in the cluster.
Table 4-2 Software plan
Node Services
Node 1 NameNode, ResourceManager, HMaster
Node 2 QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer
Node 3 QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer
Node 4 QuorumPeerMain, DataNode, NodeManager, JournalNode, and HRegionServer
4.3 Configuring the Deployment Environment
Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key. Press Enter at each prompt to accept the defaults.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
----End
4.4 Deploying ZooKeeper
4.4.1 Compiling and Decompressing ZooKeeper
Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).
Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
4.4.2 Setting ZooKeeper Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
4.4.3 Modifying the ZooKeeper Configuration Files
Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf
Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp
Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
4.4.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local
Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
4.4.5 Running and Verifying ZooKeeper
Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
NOTE
You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop
Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
4.5 Deploying Hadoop
4.5.1 Compiling and Decompressing Hadoop
Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).
Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz
Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop
----End
4.5.2 Setting the Hadoop Environment Variables
Step 1 Open the /etc/profile file:
vim /etc/profile
Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
4.5.3 Modifying the Hadoop Configuration Files
NOTE
All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Go to this directory before modifying the configuration files.
cd $HADOOP_HOME/etc/hadoop
Modifying the hadoop-env.sh File
Change the JAVA_HOME environment variable to an absolute path and set the user to root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
Modifying the yarn-env.sh File
Change the user to root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
Modifying the core-site.xml File
Step 1 Open the core-site.xml file.
vim core-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>fs.defaultFS</name> <value>hdfs://server1:9000</value> </property>
<property> <name>hadoop.tmp.dir</name> <value>/home/hadoop_tmp_dir</value> </property>
<property> <name>ipc.client.connect.max.retries</name> <value>100</value> </property>
<property>
<name>ipc.client.connect.retry.interval</name> <value>10000</value> </property>
<property> <name>hadoop.proxyuser.root.hosts</name> <value>*</value> </property>
<property> <name>hadoop.proxyuser.root.groups</name> <value>*</value> </property>
NOTICE
Create the directory on server1.
mkdir /home/hadoop_tmp_dir
----End
Modifying the hdfs-site.xml File
Step 1 Open the hdfs-site.xml file.
vim hdfs-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>dfs.replication</name> <value>1</value> </property>
<property> <name>dfs.namenode.name.dir</name> <value>/data/data1/hadoop/nn</value> </property>
<property> <name>dfs.datanode.data.dir</name> <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value> </property>
<property> <name>dfs.http.address</name> <value>server1:50070</value> </property>
<property> <name>dfs.namenode.http-bind-host</name> <value>0.0.0.0</value> </property>
<property> <name>dfs.datanode.handler.count</name> <value>600</value> </property>
<property> <name>dfs.namenode.handler.count</name> <value>600</value> </property>
<property> <name>dfs.namenode.service.handler.count</name> <value>600</value> </property>
<property> <name>ipc.server.handler.queue.size</name> <value>300</value> </property>
<property>
<name>dfs.webhdfs.enabled</name> <value>true</value> </property>
NOTICE
Create the directories for dfs.datanode.data.dir on agent1, agent2, and agent3. Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
----End
Modifying the mapred-site.xml File
Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml
Step 2 Add or modify the following parameters in the configuration section.
<property> <name>mapreduce.framework.name</name> <value>yarn</value> <final>true</final> <description>The runtime framework for executing MapReduce jobs</description> </property>
<property> <name>mapreduce.job.reduce.slowstart.completedmaps</name> <value>0.88</value> </property>
<property> <name>mapreduce.application.classpath</name> <value> /usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/* </value> </property>
<property> <name>mapreduce.map.memory.mb</name> <value>6144</value> </property>
<property> <name>mapreduce.reduce.memory.mb</name> <value>6144</value> </property>
<property> <name>mapreduce.map.java.opts</name> <value>-Xmx5530m</value> </property>
<property> <name>mapreduce.reduce.java.opts</name> <value>-Xmx2765m</value> </property>
<property> <name>mapred.child.java.opts</name> <value>-Xmx2048m -Xms2048m</value> </property>
<property>
<name>mapred.reduce.parallel.copies</name> <value>20</value> </property>
<property> <name>yarn.app.mapreduce.am.env</name> <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
<property> <name>mapreduce.map.env</name> <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
<property> <name>mapreduce.reduce.env</name> <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value> </property>
----End
Modifying the yarn-site.xml File
Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml
Step 2 Add or modify the following parameters in the configuration section (the original listed yarn.nodemanager.aux-services twice; the duplicate is removed here).
<property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> <final>true</final> </property>
<property> <name>yarn.resourcemanager.hostname</name> <value>server1</value> </property>
<property> <name>yarn.resourcemanager.bind-host</name> <value>0.0.0.0</value> </property>
<property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>65536</value> </property>
<property> <name>yarn.nodemanager.resource.memory-mb</name> <value>102400</value> </property>
<property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>48</value> </property>
<property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
<property> <name>yarn.client.nodemanager-connect.max-wait-ms</name> <value>300000</value> </property>
<property> <name>yarn.nodemanager.vmem-pmem-ratio</name> <value>7.1</value> </property>
<property> <name>yarn.nodemanager.vmem-check-enabled</name> <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>
NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
----End
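The mkdir command in the NOTICE above relies on shell brace expansion to create all twelve directory trees at once. A minimal sketch of the same expansion, using a scratch root (BASE is illustrative; on a real agent node you would create the paths under / directly):

```shell
# Create the 12 NodeManager local and log directories in one command
# using brace expansion. BASE stands in for the filesystem root so the
# sketch can run anywhere; on a cluster node you would use /data itself.
BASE=$(mktemp -d)
mkdir -p "$BASE"/data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn/{local,log}
ls -d "$BASE"/data/data*/hadoop/yarn/local | wc -l   # expect 12
```

The same pattern applies to the dfs.datanode.data.dir directories created for hdfs-site.xml.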
Modifying the slaves or workers Files

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (this document takes Hadoop 3.1.1 as an example).
vim workers
Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.
agent1
agent2
agent3
----End
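The workers file is simply one agent host name per line. A small sketch that writes it from the node list of this guide (OUT is a scratch path for illustration; on server1 the real target would be the workers file under the Hadoop configuration directory):

```shell
# Write the workers file from the agent list, one host name per line.
# OUT is an illustrative scratch file, not the real config path.
OUT=$(mktemp)
printf '%s\n' agent1 agent2 agent3 > "$OUT"
cat "$OUT"
```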
4.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop
----End
4.5.5 Starting the Hadoop Cluster
NOTICE
Perform operations in this section in sequence.
Step 1 Start the ZooKeeper cluster.
Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
Step 2 Start JournalNode.
Start JournalNode on agent1, agent2, and agent3.
NOTE
Perform Step 2 to Step 4 only when you format the cluster for the first time. After theformatting is complete, you only need to perform Step 1, Step 5, and Step 6 when youstart the cluster next time.
cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode
Step 3 Format HDFS.
1. Format HDFS on server1.
hdfs namenode -format
2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp.
Step 4 Format ZKFC.
Format ZKFC on server1.
hdfs zkfc -formatZK
Step 5 Start the HDFS.
Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh
Step 6 Start Yarn.
Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh
Step 7 Check whether all processes are started properly.
NOTE
Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on other server nodes and agent nodes are similar.)
jps
----End
4.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070. Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
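The same live-node count can also be read without a browser from the NameNode JMX endpoint (on a live cluster: curl -s 'http://server1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'). A sketch that extracts NumLiveDataNodes from such a response; the JSON below is a hand-written sample of the response shape, not output captured from a real cluster:

```shell
# Extract NumLiveDataNodes from a NameNode JMX response. The sample
# JSON mimics the FSNamesystemState bean; on a cluster you would feed
# the curl output into the same pipeline.
sample='{ "beans" : [ { "name" : "Hadoop:service=NameNode,name=FSNamesystemState", "NumLiveDataNodes" : 3, "NumDeadDataNodes" : 0 } ] }'
live=$(printf '%s' "$sample" | grep -o '"NumLiveDataNodes" : [0-9]*' | grep -o '[0-9]*$')
echo "$live"   # prints 3
```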
4.6 Deploying HBase
4.6.1 Obtaining HBase

Step 1 Download the HBase package from the following website:
https://archive.apache.org/dist/hbase/2.0.2/hbase-2.0.2-bin.tar.gz

Step 2 Place hbase-2.0.2-bin.tar.gz in the /usr/local directory on server1 and decompress it.
mv hbase-2.0.2-bin.tar.gz /usr/local
cd /usr/local
tar -zxvf hbase-2.0.2-bin.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s hbase-2.0.2 hbase
----End
4.6.2 Setting HBase Environment Variables

Step 1 Open the /etc/profile file.
vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HBASE_HOME=/usr/local/hbase
export PATH=$HBASE_HOME/bin:$HBASE_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
4.6.3 Modifying the HBase Configuration Files

NOTE

All HBase configuration files are stored in the $HBASE_HOME/conf directory. Before modifying the configuration files, go to the $HBASE_HOME/conf directory first.
cd $HBASE_HOME/conf
Modifying the hbase-env.sh File
Step 1 Open the hbase-env.sh file.
vim hbase-env.sh

Step 2 Change the value of JAVA_HOME to an absolute path, and change HBASE_MANAGES_ZK to false.
export JAVA_HOME=/usr/local/jdk8u252-b09
export HBASE_MANAGES_ZK=false
export HBASE_LIBRARY_PATH=/usr/local/hadoop/lib/native
----End
Modifying the hbase-site.xml File
Step 1 Open the hbase-site.xml file.
vim hbase-site.xml

Step 2 Add or modify parameters under the configuration section.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://server1:9000/HBase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/usr/local/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>agent1:2181,agent2:2181,agent3:2181</value>
  </property>
</configuration>
----End
Modifying the regionservers File
Step 1 Open the regionservers file.
vim regionservers
Step 2 Replace the content of the regionservers file with the IP addresses (or host names) of the agents.
agent1
agent2
agent3
----End
Copying hdfs-site.xml
Copy the hdfs-site.xml file of Hadoop to the hbase/conf/ directory. You can use a soft link or copy the file.
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/hbase/conf/hdfs-site.xml
4.6.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy hbase-2.0.2 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hbase-2.0.2 root@agent1:/usr/local
scp -r /usr/local/hbase-2.0.2 root@agent2:/usr/local
scp -r /usr/local/hbase-2.0.2 root@agent3:/usr/local

Step 2 Log in to the agent1, agent2, and agent3 nodes and create soft links for hbase-2.0.2.
cd /usr/local
ln -s hbase-2.0.2 hbase
----End
4.6.5 Starting the HBase Cluster

Step 1 Start ZooKeeper and Hadoop in sequence.

Step 2 Start the HBase cluster on the server1 node.
/usr/local/hbase/bin/start-hbase.sh

Step 3 Check whether all processes are started properly.
jps
server1:
ResourceManager
NameNode
HMaster
agent1:
NodeManager
DataNode
HRegionServer
JournalNode
QuorumPeerMain
NOTE

Observe the process startup status on all nodes. The processes listed for server1 must be running on the server1 node, and the processes listed for agent1 must be running on the agent1 node.
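The per-node check can be scripted instead of read by eye. A sketch that verifies a node's jps output contains the expected daemons; the sample output below is hand-written to mimic agent1, and on the node itself you would pipe the real jps output into the same loop:

```shell
# Check that a node's jps output lists every expected daemon.
# sample_jps is an illustrative stand-in for the real `jps` output.
sample_jps='12001 NodeManager
12002 DataNode
12003 HRegionServer
12004 JournalNode
12005 QuorumPeerMain
12006 Jps'
missing=0
for proc in NodeManager DataNode HRegionServer JournalNode QuorumPeerMain; do
  printf '%s\n' "$sample_jps" | grep -qw "$proc" || { echo "missing: $proc"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all agent daemons running"
```

For server1, the expected list would instead be ResourceManager, NameNode, and HMaster.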
----End
4.6.6 (Optional) Stopping the HBase Cluster

Stop the HBase cluster on the server1 node.
/usr/local/hbase/bin/stop-hbase.sh
4.6.7 Verifying HBase

Open the browser and enter http://server1:16010 to access the HBase web page. In the URL, server1 indicates the IP address of the node where the HMaster process is located, and 16010 is the default port number of HBase 1.0 or later. You can change the value of hbase.master.info.port in the hbase-site.xml file to change the port number.

Check whether the number of servers in Region Servers is the same as the number of configured agents (3 in this example). If yes, the cluster is started properly.
5 Hive Deployment Guide (CentOS 7.6 & openEuler 20.03)
5.1 Introduction
5.2 Environment Requirements
5.3 Configuring the Deployment Environment
5.4 Deploying ZooKeeper
5.5 Deploying Hadoop
5.6 Deploying Hive
5.1 Introduction
Hive Overview

This document describes the Hive deployment procedure and does not include the source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Version

Software  | Version      | How to Obtain
OpenJDK   | jdk8u252-b09 | ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
          |              | x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper | 3.4.6        | Download the software package of the required version from the official website: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Hadoop    | 3.1.1        | Download the software package of the required version from the official website: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/
Hive      | 3.1.0        | Download the software package of the required version from the official website: https://archive.apache.org/dist/hive/hive-3.1.0/
5.2 Environment Requirements
Hardware

Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity
The configuration depends on the actual application scenario.
OS Requirements

CentOS 7.4 to 7.6, openEuler 20.03
NOTE
This document uses CentOS 7.6 as an example to describe how to deploy a Hive cluster.
Cluster Environment

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 5-1 lists the data plan of each node.
Table 5-1 Cluster data plan

Node   | IP Address | Number of Drives                                      | OS & JDK
Node 1 | IPaddress1 | System drive: 1 x 4TB HDD; data drives: 12 x 4TB HDD  | CentOS 7.6 and OpenJDK jdk8u252-b09
Node 2 | IPaddress2 | Same as Node 1                                        | Same as Node 1
Node 3 | IPaddress3 | Same as Node 1                                        | Same as Node 1
Node 4 | IPaddress4 | Same as Node 1                                        | Same as Node 1
Software Planning

Table 5-2 lists the software plan of each node in the cluster.
Table 5-2 Software plan
Node Services
Node 1 NameNode, ResourceManager, and Hive client
Node 2 QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 3 QuorumPeerMain, DataNode, NodeManager, and JournalNode
Node 4 QuorumPeerMain, DataNode, NodeManager, and JournalNode
5.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to thehosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
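Appending the four mappings on every node can be done in one shot. A sketch that writes them to a hosts file; HOSTS is a scratch copy for illustration, and the IPaddressN placeholders stay as in the guide (substitute the real addresses on your cluster):

```shell
# Append the node mappings to a hosts file. HOSTS is an illustrative
# scratch file; on each cluster node this would be /etc/hosts.
HOSTS=$(mktemp)
cat >> "$HOSTS" <<'EOF'
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
EOF
grep -c ' agent' "$HOSTS"   # prints 3
```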
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa

2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local

2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile

4. Check whether OpenJDK is successfully installed.
java -version

The installation is successful if the OpenJDK version information is displayed.
----End
5.4 Deploying ZooKeeper
5.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in the ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
5.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
5.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg

1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp

2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888

Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
5.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
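Each ZooKeeper node must end up with a unique ID in its own myid file, matching the server.N entries in zoo.cfg. A sketch of that pattern, simulating the three hosts' filesystems side by side under a scratch root (ROOT is illustrative; on real nodes each agent only writes its own /usr/local/zookeeper/tmp/myid):

```shell
# Write the myid files for the three ZooKeeper nodes. ROOT simulates
# the three hosts' filesystems in one place for illustration.
ROOT=$(mktemp -d)
i=1
for node in agent1 agent2 agent3; do
  mkdir -p "$ROOT/$node/usr/local/zookeeper/tmp"
  echo "$i" > "$ROOT/$node/usr/local/zookeeper/tmp/myid"
  i=$((i + 1))
done
cat "$ROOT/agent2/usr/local/zookeeper/tmp/myid"   # prints 2
```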
5.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE

You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
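In a healthy three-node ensemble, zkServer.sh status reports Mode: leader on one node and Mode: follower on the other two. A sketch that pulls the mode out of that output; the sample text is hand-written to mimic a follower, and on a node you would pipe the real command output into the same pipeline:

```shell
# Extract the Mode line from `zkServer.sh status` output.
# sample_status is an illustrative stand-in for the real output.
sample_status='ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower'
mode=$(printf '%s\n' "$sample_status" | grep '^Mode:' | awk '{print $2}')
echo "$mode"   # prints follower
```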
----End
5.5 Deploying Hadoop
5.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to the Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz

Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop
----End
5.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:
vim /etc/profile
Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
5.5.3 Modifying the Hadoop Configuration Files

NOTE

All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop
Modifying the hadoop-env.sh File

Change the environment variable JAVA_HOME to an absolute path and set the user to root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh
Modifying the yarn-env.sh File

Change the user to root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh
Modifying the core-site.xml File
Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
NOTICE

Create a directory on server1.
mkdir /home/hadoop_tmp_dir
----End
Modifying the hdfs-site.xml File
Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
NOTICE

Create a directory for dfs.datanode.data.dir on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
----End
Modifying the mapred-site.xml File
Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
----End
Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>
NOTICE

Create a directory for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
----End
Modifying the slaves or workers Files
Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (this document takes Hadoop 3.1.1 as an example).
vim workers

Step 3 Modify the workers file and delete all content except the IP addresses or host names of all agent nodes.
agent1
agent2
agent3
----End
5.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop
----End
5.5.5 Starting the Hadoop Cluster
NOTICE
Perform operations in this section in sequence.
Step 1 Start the ZooKeeper cluster.
Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
Step 2 Start JournalNode.
Start JournalNode on agent1, agent2, and agent3.
NOTE
Perform Step 2 to Step 4 only when you format the cluster for the first time. After theformatting is complete, you only need to perform Step 1, Step 5, and Step 6 when youstart the cluster next time.
cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode
Step 3 Format HDFS.
1. Format HDFS on server1.
hdfs namenode -format

2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.
Step 4 Format ZKFC.
Format ZKFC on server1.
hdfs zkfc -formatZK
Step 5 Start the HDFS.
Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh
Step 6 Start Yarn.
Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh
Step 7 Check whether all processes are started properly.
NOTE
Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on other server nodes and agent nodes are similar.)
jps
----End
5.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070. Change server1 to the IP address of the node where the server process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
5.6 Deploying Hive
5.6.1 Installing MariaDB

Hive stores metadata in a database. Before installing Hive, you need to install the database software and configure the database information in hive-site.xml. Common databases include Derby, MySQL, and MariaDB. This document uses MariaDB as an example. The deployment of other databases is similar.
Step 1 Install MariaDB on server1.
NOTE

Before installing MariaDB, ensure that the Yum source has been configured.
yum install mariadb*
Start the MariaDB service.
systemctl start mariadb.service
(Optional) Configure autostart upon power-on:
systemctl enable mariadb.service
Step 2 Configure the permissions and password.
1. Log in to the database and press Enter twice. No password is required for the first login.
mysql -uroot -p

2. Connect to the MySQL database.
mysql> use mysql;
3. Grant all permissions to the root user.mysql> grant all on *.* to root@'server1' identified by 'root';
4. Update the permissions.
mysql> flush privileges;

5. Set the password.
mysql> set password for root@server1=password('123456');
Step 3 Set the UTF-8 character encoding.
1. Open the my.cnf configuration file.
vim /etc/my.cnf
2. Add the following content under the [mysqld] section:
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
3. Open the client.cnf file.
vim /etc/my.cnf.d/client.cnf
4. Add the following content under the [client] section:
default-character-set=utf8
5. Open the mysql-clients.cnf file.
vim /etc/my.cnf.d/mysql-clients.cnf
6. Add the following content under the [mysql] section:
default-character-set=utf8
7. Restart MariaDB.
systemctl restart mariadb
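The three configuration edits above follow one pattern: append UTF-8 settings under a section header. The sketch below reproduces them against copies in a temporary directory so it can run without root; on a real server the targets are /etc/my.cnf, /etc/my.cnf.d/client.cnf, and /etc/my.cnf.d/mysql-clients.cnf.

```shell
# Sketch: write the UTF-8 settings from the steps above into config copies.
# A temp directory stands in for /etc so no root access is needed.
CNF_DIR=$(mktemp -d)
mkdir -p "$CNF_DIR/my.cnf.d"

cat >> "$CNF_DIR/my.cnf" <<'EOF'
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
EOF

cat >> "$CNF_DIR/my.cnf.d/client.cnf" <<'EOF'
[client]
default-character-set=utf8
EOF

cat >> "$CNF_DIR/my.cnf.d/mysql-clients.cnf" <<'EOF'
[mysql]
default-character-set=utf8
EOF

grep -q 'character-set-server=utf8' "$CNF_DIR/my.cnf" && echo "my.cnf updated"
```

After adapting the paths, restart MariaDB as shown above for the settings to take effect.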
----End
5.6.2 Obtaining Hive
Step 1 Download Hive from the following website:
https://archive.apache.org/dist/hive/hive-3.1.0/
Step 2 Place apache-hive-3.1.0-bin.tar.gz in the /usr/local directory on server1 and decompress it.
mv apache-hive-3.1.0-bin.tar.gz /usr/local
cd /usr/local
tar -zxvf apache-hive-3.1.0-bin.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s apache-hive-3.1.0-bin hive
----End
5.6.3 Setting Hive Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add the Hive path to the environment variables.
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
5.6.4 Modifying the Hive Configuration Files
NOTE
All Hive configuration files are stored in the $HIVE_HOME/conf directory. Before modifying the configuration files, go to the $HIVE_HOME/conf directory first.
cd $HIVE_HOME/conf
Step 1 Modify the hive-env.sh file.
cp hive-env.sh.template hive-env.sh
Add the following content to the end of the hive-env.sh file:
export JAVA_HOME=/usr/local/jdk8u252-b09
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Step 2 Modify the hive-site.xml file.
cp hive-default.xml.template hive-site.xml
Run the following command to replace for&# with for to prevent encoding problems during initialization:
sed -i 's/for&#/for/g' hive-site.xml
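The effect of the sed substitution can be checked on a sample line first. In this sketch the temporary file stands in for hive-site.xml, and the description text is illustrative rather than taken from the real template.

```shell
# Sketch: reproduce the for&# encoding fix on a sample file.
# The sample text below is illustrative, not from the real template.
HS=$(mktemp)
cat > "$HS" <<'EOF'
<description>Setting to for&#8;a negative number is equivalent to no limit</description>
EOF
sed -i 's/for&#/for/g' "$HS"
grep -q 'for&#' "$HS" && echo "still broken" || echo "fixed"   # prints "fixed"
```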
Change the values of the related parameters in the hive-site.xml file as follows:
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://server1:3306/hive?createDatabaseIfNotExist=true</value>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.mariadb.jdbc.Driver</value>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/${hive.session.id}_resources</value>
<name>hive.querylog.location</name>
<value>/tmp/hive</value>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/hive/operation_logs</value>
----End
5.6.5 Starting and Verifying Hive
Step 1 Prepare for starting Hive.
1. Download the JDBC driver and save it to the /usr/local/hive/lib directory. In this example, mariadb-java-client-2.3.0.jar is used.
2. Create the directories in which the Hive data is stored.
/usr/local/hadoop/bin/hadoop fs -mkdir /tmp
/usr/local/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/usr/local/hadoop/bin/hadoop fs -chmod g+w /tmp
/usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
3. Create the Hive log directory and log files.
mkdir -p /usr/local/hive/log/
touch /usr/local/hive/log/hiveserver.log
touch /usr/local/hive/log/hiveserver.err
Step 2 Initialize Hive.
schematool -dbType mysql -initSchema
Step 3 Start the Hive metastore.
hive --service metastore -p 9083 &
Step 4 Start hiveserver2.
1. Start hiveserver2.
nohup hiveserver2 1>/usr/local/hive/log/hiveserver.log 2>/usr/local/hive/log/hiveserver.err &
2. View the startup log.
tail -f /usr/local/hive/log/hiveserver.err
nohup: ignoring input
which: no hbase in (/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/hive/bin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/zookeeper/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/local/jdk8u222-b10/bin:/usr/local/python3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
2021-01-18 11:32:22: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 824030a3-2afe-488c-a2fa-7d98cfc8f7bd
Hive Session ID = 1031e326-2088-4025-b2e2-c9bb1e81b03d
Hive Session ID = 32203873-49ad-44b7-987c-da1aae8b3375
Hive Session ID = d7be9389-11c6-46cb-90d6-a91a2d5199b8
OK
3. Query the port.
netstat -anp | grep 10000
If the following information is displayed, the startup is successful:
tcp6 0 0 :::10000 :::* LISTEN 27800/java
4. Use beeline to connect to server1.
beeline -u jdbc:hive2://server1:10000
The command output is as follows:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://server1:10000
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.0 by Apache Hive
0: jdbc:hive2://server1:10000>
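If beeline fails to connect, the port check in the previous substep is the first thing to repeat. A small helper that applies the same check to saved netstat output (so it can be exercised without a running HiveServer2) might look like the following; the sample line is the one shown above.

```shell
# Sketch: decide from saved netstat output whether a port is in LISTEN state.
# port_listening PORT FILE -> succeeds if a listener on PORT is found
port_listening() {
  grep -E ":$1[^0-9].*LISTEN" "$2" > /dev/null
}

# Sample line as captured in the verification step above.
NS=$(mktemp)
cat > "$NS" <<'EOF'
tcp6       0      0 :::10000                :::*                    LISTEN      27800/java
EOF
port_listening 10000 "$NS" && echo "hiveserver2 is listening"
```

On a live node you would feed it `netstat -anp` output instead of the sample file.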
Step 5 Check the created database.
show databases;
The verification is successful if the following information is displayed:
0: jdbc:hive2://server1:10000> show databases;
INFO : Compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.903 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=root_20210118113531_49c3505a-80e1-4aba-9761-c2f77a06ac5f); Time taken: 0.029 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.248 seconds)
Step 6 Quit Hive.
quit;
----End
6 Kafka Deployment Guide (CentOS 7.6 & openEuler 20.03)
6.1 Introduction
6.2 Environment Requirements
6.3 Configuring the Deployment Environment
6.4 Deploying ZooKeeper
6.5 Deploying Kafka
6.1 Introduction
Kafka Overview
This document describes the Kafka deployment procedure and does not include the software compilation procedure using source code.
All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions
Software Version How to Obtain
OpenJDK jdk8u252-b09
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper 3.4.6
Download the software package of the required version from:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Kafka 2.11-2.2.0
Download the software package of the required version from:
https://archive.apache.org/dist/kafka/2.2.0/
6.2 Environment Requirements
Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity
The configuration depends on the actual application scenario.
OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03
NOTE
This section uses CentOS 7.6 as an example to describe how to deploy a Kafka cluster.
Cluster Data Plan
In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 6-1 lists the data plan of each node.
Table 6-1 Cluster data plan
Node 1: IPaddress1
Node 2: IPaddress2
Node 3: IPaddress3
Node 4: IPaddress4
Drives on each node: system drive 1 x 4 TB HDD; data drives 12 x 4 TB HDD
OS & JDK on each node: CentOS 7.6 & OpenJDK jdk8u252-b09
Software Plan
Table 6-2 lists the software plan of each node in the cluster.
Table 6-2 Software plan
Node Service
Node 1 Kafka client
Node 2 QuorumPeerMain and Kafka
Node 3 QuorumPeerMain and Kafka
Node 4 QuorumPeerMain and Kafka
6.3 Configuring the Deployment Environment
Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
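The hosts-file additions can be generated rather than typed by hand. A minimal sketch, assuming the host names from this plan and keeping IPaddress1 to IPaddress4 as placeholders for the real addresses:

```shell
# Sketch: generate the hosts-file additions from the node name list.
# IPaddress1..IPaddress4 are placeholders; substitute real IP addresses.
HOSTS=$(mktemp)
i=1
for name in server1 agent1 agent2 agent3; do
  echo "IPaddress$i $name" >> "$HOSTS"
  i=$((i+1))
done
cat "$HOSTS"
```

After reviewing the generated lines, append them to /etc/hosts on every node.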
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
----End
6.4 Deploying ZooKeeper
6.4.1 Compiling and Decompressing ZooKeeper
Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in the ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).
Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
6.4.2 Setting ZooKeeper Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
6.4.3 Modifying the ZooKeeper Configuration Files
Step 1 Switch to the ZooKeeper configuration directory.
cd /usr/local/zookeeper/conf
Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp
Step 5 Create an empty file in the tmp directory and write an ID to it.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
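Steps 4 and 5, together with the per-node myid values written later (1 on agent1, 2 on agent2, 3 on agent3), can be rehearsed locally. In this sketch the zk1 to zk3 directories stand in for /usr/local/zookeeper/tmp on each agent:

```shell
# Sketch: create the data directory and myid file for each ZooKeeper node.
# zk1..zk3 are local stand-ins for /usr/local/zookeeper/tmp on agent1..agent3.
ZK_BASE=$(mktemp -d)
for id in 1 2 3; do
  mkdir -p "$ZK_BASE/zk$id"
  echo "$id" > "$ZK_BASE/zk$id/myid"   # must match server.$id in zoo.cfg
done
cat "$ZK_BASE"/zk*/myid
```

The key invariant is that each node's myid content equals the N of its server.N entry in zoo.cfg.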
----End
6.4.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local
Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
6.4.5 Running and Verifying ZooKeeper
Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
NOTE
You can stop ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh stop
Step 2 Check the ZooKeeper status.
./zkServer.sh status
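In a healthy three-node ensemble, the status command reports Mode: leader on exactly one node and Mode: follower on the other two. A sketch that tallies collected status outputs (the sample lines are illustrative):

```shell
# Sketch: tally the Mode lines from collected zkServer.sh status outputs.
# A healthy three-node ensemble has one leader and two followers.
STATUS=$(mktemp)
cat > "$STATUS" <<'EOF'
Mode: follower
Mode: leader
Mode: follower
EOF
leaders=$(grep -c '^Mode: leader$' "$STATUS")
followers=$(grep -c '^Mode: follower$' "$STATUS")
echo "leaders=$leaders followers=$followers"
```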
----End
6.5 Deploying Kafka
6.5.1 Obtaining Kafka
Step 1 Download the Kafka installation package.
wget https://archive.apache.org/dist/kafka/2.2.0/kafka_2.11-2.2.0.tgz
Step 2 Save kafka_2.11-2.2.0.tgz to the /usr/local directory on agent1 and decompress it.
mv kafka_2.11-2.2.0.tgz /usr/local
cd /usr/local
tar -zxvf kafka_2.11-2.2.0.tgz
Step 3 Create a soft link for subsequent version update.
ln -s kafka_2.11-2.2.0 kafka
----End
6.5.2 Setting Kafka Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add Kafka to the environment variables.
export KAFKA_HOME=/usr/local/kafka
export PATH=$KAFKA_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
6.5.3 Modifying the Kafka Configuration Files
NOTE
All Kafka configuration files are stored in the $KAFKA_HOME/config directory. Before modifying the configuration files, go to the $KAFKA_HOME/config directory first.
cd $KAFKA_HOME/config
Step 1 Modify the server.properties file.
vim server.properties
The modified content is as follows:
broker.id=0
port=6667
host.name=agent1
log.dirs=/data/data1/kafka,/data/data2/kafka,/data/data3/kafka,/data/data4/kafka,/data/data5/kafka,/data/data6/kafka,/data/data7/kafka,/data/data8/kafka,/data/data9/kafka,/data/data10/kafka,/data/data11/kafka,/data/data12/kafka
zookeeper.connect=agent1:2181,agent2:2181,agent3:2181
NOTE
In the preceding configuration, set host.name to the IP address of agent1 and log.dirs to the actual data storage paths.
Step 2 Synchronize the configuration to other nodes.
1. Copy kafka_2.11-2.2.0 to the /usr/local directory on each of agent2 and agent3.
scp -r /usr/local/kafka_2.11-2.2.0 root@agent2:/usr/local
scp -r /usr/local/kafka_2.11-2.2.0 root@agent3:/usr/local
2. Log in to agent2 and agent3 and separately create a soft link for kafka_2.11-2.2.0.
cd /usr/local
ln -s kafka_2.11-2.2.0 kafka
Step 3 Modify related node parameters.
1. Log in to agent2 and modify the server.properties file.
vim server.properties
The modified content is as follows:
broker.id=1
host.name=agent2 # Enter the corresponding IP address.
2. Log in to agent3 and modify the server.properties file.
vim server.properties
The modified content is as follows:
broker.id=2
host.name=agent3 # Enter the corresponding IP address.
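The per-node edits in Steps 1 to 3 differ only in broker.id and host.name. The sketch below generates one properties fragment per broker into a temporary directory; the host names follow this document's cluster plan, and the omitted settings (port, log.dirs) would be shared across nodes.

```shell
# Sketch: generate the per-broker settings for agent1..agent3.
# Only broker.id and host.name differ between nodes.
KCFG=$(mktemp -d)
id=0
for host in agent1 agent2 agent3; do
  cat > "$KCFG/server-$host.properties" <<EOF
broker.id=$id
host.name=$host
zookeeper.connect=agent1:2181,agent2:2181,agent3:2181
EOF
  id=$((id+1))
done
grep -H 'broker.id' "$KCFG"/server-*.properties
```

Each fragment would be merged into that node's $KAFKA_HOME/config/server.properties.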
----End
6.5.4 Verifying Kafka
Step 1 Run the following commands on agent1 to agent3 to start Kafka:
cd /usr/local/kafka/bin
./kafka-server-start.sh -daemon ../config/server.properties
Step 2 Run the following command on each node to check whether all processes are started properly.
jps
NOTE
Processes started on agent1 are the same as those on other agent nodes. No process needs to be started on server nodes.
Step 3 Run the following commands on agent1 to agent3 to stop Kafka:
cd /usr/local/kafka/bin
./kafka-server-stop.sh
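The jps check in Step 2 can be scripted: on an agent node, QuorumPeerMain (ZooKeeper) and Kafka should both appear. This sketch runs against saved sample output, so the PIDs shown are illustrative:

```shell
# Sketch: check saved jps output for the daemons expected on an agent node.
# The PIDs below are illustrative sample output.
JPS_OUT=$(mktemp)
cat > "$JPS_OUT" <<'EOF'
12345 QuorumPeerMain
12890 Kafka
13001 Jps
EOF
missing=0
for proc in QuorumPeerMain Kafka; do
  grep -q " $proc$" "$JPS_OUT" || { echo "missing: $proc"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all expected processes are running"
```

On a live node, replace the sample file with the real `jps` output.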
----End
7 Solr Deployment Guide (CentOS 7.6 & openEuler 20.03)
7.1 Introduction
7.2 Environment Requirements
7.3 Configuring the Deployment Environment
7.4 Deploying ZooKeeper
7.5 Deploying Solr
7.1 Introduction
Solr Overview
This document describes the Solr deployment procedure and does not include the source code compilation procedure.
All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions
Software Version How to Obtain
OpenJDK jdk8u252-b09
ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
ZooKeeper 3.4.6
Download the software package of the required version from:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
Solr 6.2.0
Download the software package of the required version from:
https://archive.apache.org/dist/lucene/solr/6.2.0/
Tomcat 8.5.28
Download the software package of the required version from:
https://archive.apache.org/dist/tomcat/tomcat-8/v8.5.28/bin/
7.2 Environment Requirements
Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity
The configuration depends on the actual application scenario.
OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03
NOTE
This document uses CentOS 7.6 as an example to describe how to deploy a Solr cluster.
Cluster Data Plan
In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 7-1 lists the data plan of each node.
Table 7-1 Cluster data plan
Node 1: IPaddress1
Node 2: IPaddress2
Node 3: IPaddress3
Node 4: IPaddress4
Drives on each node: system drive 1 x 4 TB HDD; data drives 12 x 4 TB HDD
OS & JDK on each node: CentOS 7.6 & OpenJDK jdk8u252-b09
Software Plan
Table 7-2 lists the software plan of each node in the cluster.
Table 7-2 Software plan
Node Services
Node 1 -
Node 2 QuorumPeerMain, Bootstrap (Solr)
Node 3 QuorumPeerMain, Bootstrap (Solr)
Node 4 QuorumPeerMain, Bootstrap (Solr)
7.3 Configuring the Deployment Environment
Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static
Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
----End
7.4 Deploying ZooKeeper
7.4.1 Compiling and Decompressing ZooKeeper
Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in the ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).
Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
7.4.2 Setting ZooKeeper Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
7.4.3 Modifying the ZooKeeper Configuration Files
Step 1 Switch to the ZooKeeper configuration directory.
cd /usr/local/zookeeper/conf
Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp
Step 5 Create an empty file in the tmp directory and write an ID to it.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
7.4.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local
Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
7.4.5 Running and Verifying ZooKeeper
Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
NOTE
You can stop ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh stop
Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
7.5 Deploying Solr
7.5.1 Obtaining Solr
Step 1 Place solr-6.2.0.tgz and apache-tomcat-8.5.28.tar.gz in the /usr/local/solrCloud directory of agent1.
NOTE
The /usr/local/solrCloud directory must be created in advance.
Step 2 Decompress the Solr software package.
tar -zxvf solr-6.2.0.tgz
Step 3 Create a soft link for subsequent version update.
ln -s solr-6.2.0 solr
Step 4 Decompress the Tomcat software package.
tar -zxvf apache-tomcat-8.5.28.tar.gz
Step 5 Create a soft link for subsequent version update.
ln -s apache-tomcat-8.5.28 tomcat
----End
7.5.2 Setting Solr Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add Solr to the environment variables.
export SOLR_HOME=/usr/local/solrCloud
export PATH=$SOLR_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
7.5.3 Copying the Solr Configuration
NOTE
Solr installation directory: /usr/local/solrCloud/solr
Solr configuration file directory: /usr/local/solrCloud/solrConfig
Solr data file directory: /usr/local/solrCloud/solrCores
Tomcat installation directory: /usr/local/solrCloud/tomcat
Step 1 Create the Solr configuration file and data file directories.
mkdir -p /usr/local/solrCloud/{solrConfig,solrCores}
Step 2 Copy solr-webapp.
cp -r /usr/local/solrCloud/solr/server/solr-webapp/webapp /usr/local/solrCloud/tomcat/webapps/solr
Step 3 Copy the jar files from solr/server/lib/ext to tomcat/webapps/solr/WEB-INF/lib.
cp -r /usr/local/solrCloud/solr/server/lib/ext/*.jar /usr/local/solrCloud/tomcat/webapps/solr/WEB-INF/lib/
Step 4 Copy the configuration files.
cp -r /usr/local/solrCloud/solr/server/solr/configsets/basic_configs/conf/* /usr/local/solrCloud/solrConfig
cp -r /usr/local/solrCloud/solr/example/files/conf/velocity /usr/local/solrCloud/solrConfig
cp /usr/local/solrCloud/solr/server/solr/solr.xml /usr/local/solrCloud/solrCores
----End
7.5.4 Modifying the Configuration
Step 1 Open the solrCores/solr.xml file.
vim solrCores/solr.xml
Modify the hostPort value so that it matches the Tomcat port. In this document, the default Tomcat port 8080 is used.
<int name="hostPort">8080</int>
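The hostPort change is a one-line sed edit. In this sketch the temporary file stands in for solrCores/solr.xml, and the starting port value is illustrative:

```shell
# Sketch: rewrite hostPort in a sample solr.xml to match the Tomcat port.
# The starting value 8983 is illustrative; the doc uses Tomcat's 8080.
SOLR_XML=$(mktemp)
cat > "$SOLR_XML" <<'EOF'
<int name="hostPort">8983</int>
EOF
sed -i 's|<int name="hostPort">[0-9]*</int>|<int name="hostPort">8080</int>|' "$SOLR_XML"
cat "$SOLR_XML"
```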
Step 2 Run the following command to create a directory:
mkdir -p /usr/local/solrCloud/tomcat/conf/Catalina/localhost
Step 3 Create the solr.xml file in the /usr/local/solrCloud/tomcat/conf/Catalina/localhost/ directory.
vim /usr/local/solrCloud/tomcat/conf/Catalina/localhost/solr.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<Context docBase="/usr/local/solrCloud/tomcat/webapps/solr" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/usr/local/solrCloud/solrCores" override="true"/>
</Context>
Step 4 Modify the /usr/local/solrCloud/tomcat/bin/catalina.sh file.
vim /usr/local/solrCloud/tomcat/bin/catalina.sh
Add the following information:
JAVA_OPTS="-DzkHost=Datanode1:2181,Datanode2:2181,Datanode3:2181"
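The -DzkHost value is a comma-separated list of the ZooKeeper nodes. Note that Datanode1 to Datanode3 above are placeholders; in this document's cluster plan the ZooKeeper nodes are agent1 to agent3. A sketch that assembles the option:

```shell
# Sketch: assemble the -DzkHost option from the ZooKeeper node list.
# Host names follow this document's cluster plan (agent1..agent3:2181);
# the Datanode names in the text above are placeholders for them.
zk_hosts=""
for host in agent1 agent2 agent3; do
  zk_hosts="$zk_hosts$host:2181,"
done
zk_hosts=${zk_hosts%,}          # drop the trailing comma
JAVA_OPTS="-DzkHost=$zk_hosts"
echo "$JAVA_OPTS"
```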
----End
7.5.5 Synchronizing the Configuration to Other Nodes
Step 1 Copy solrCloud to the /usr/local directory of agent2 and agent3.
scp -r /usr/local/solrCloud root@agent2:/usr/local
scp -r /usr/local/solrCloud root@agent3:/usr/local
Step 2 Log in to the agent2 and agent3 nodes and create soft links for Solr and Tomcat.
NOTE
You can skip this step: scp copies the soft links as actual directories, so the Solr and Tomcat directories already exist on the target nodes.
cd /usr/local/solrCloud
rm -rf solr tomcat
ln -s solr-6.2.0 solr
ln -s apache-tomcat-8.5.28 tomcat
----End
7.5.6 Uploading the Configuration to the ZooKeeper Cluster
Step 1 Start the ZooKeeper cluster.
Step 2 Upload the configuration to the ZooKeeper cluster.
java -classpath .:/usr/local/solrCloud/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost agent1:2181,agent2:2181,agent3:2181 -confdir /usr/local/solrCloud/solrConfig/ -confname solrconfig
Step 3 Check whether the configuration is successfully uploaded to ZooKeeper.
cd /usr/local/zookeeper/bin
./zkCli.sh -server agent1:2181
ls /configs/solrconfig
Step 4 Close the connection.
quit
----End
7.5.7 Running and Verifying Solr
Step 1 Run the following command on agent1 to agent3 to start Tomcat (Solr):
/usr/local/solrCloud/tomcat/bin/startup.sh
NOTE
● The default port 8080 is often in use. You can change the Tomcat port number in the ${tomcat}/conf/server.xml file:
<Connector port="New port number" protocol="HTTP/1.1"
After the modification, change the value of hostPort in solrCores/solr.xml accordingly. For details, see Step 1 in 7.5.4.
● You can run the following command on agent1 to agent3 to stop Tomcat (Solr):
/usr/local/solrCloud/tomcat/bin/shutdown.sh
Step 2 Enter http://agent1:8080/solr/index.html in the address box of the browser and press Enter to access Solr.
Replace agent1 with the IP address of the node where the agent1 process resides.
----End
8 Spark Cluster Deployment Guide (CentOS 7.6 & openEuler 20.03)
8.1 Introduction
8.2 Environment Requirements
8.3 Configuring the Deployment Environment
8.4 Deploying ZooKeeper
8.5 Deploying Hadoop
8.6 Deploying Spark
8.1 Introduction
Spark Overview
This document describes the Spark deployment procedure and does not include the software compilation procedure using source code.
All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions

● OpenJDK jdk8u252-b09
  ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
  x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
● ZooKeeper 3.4.6
  Download the software package of the required version from the official website: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
● Hadoop 3.1.1
  Download the software package of the required version from the official website: https://archive.apache.org/dist/hadoop/core/hadoop-3.1.1/
● Spark 2.3.2
  Download the software package of the required version from the official website: https://archive.apache.org/dist/spark/spark-2.3.2/
● Scala 2.11.12
  Download the software package of the required version from the official website: https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
● Hive 3.1.0
  Download the software package of the required version from the official website: https://archive.apache.org/dist/hive/hive-3.1.0/
8.2 Environment Requirements

Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.
The configuration depends on the actual application scenario.

OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03

NOTE
This document uses CentOS 7.6 as an example to describe how to deploy a Spark cluster.
Cluster Data Plan

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 8-1 lists the data plan of each node.

Table 8-1 Cluster data plan
● Node 1: IPaddress1
● Node 2: IPaddress2
● Node 3: IPaddress3
● Node 4: IPaddress4
All four nodes use one 4 TB HDD as the system drive and twelve 4 TB HDDs as data drives, and run CentOS 7.6 with OpenJDK jdk8u252-b09.
Software Planning
Table 8-2 lists the software plan of each node in the cluster.
Table 8-2 Software plan
Node Service
Node 1 NameNode, ResourceManager, and Master
Node 2 QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker
Node 3 QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker
Node 4 QuorumPeerMain, DataNode, NodeManager, JournalNode, and Worker
8.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3
Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service
Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
----End
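Rather than eyeballing the java -version banner on every node, the check in step 4 can be automated. A sketch that parses the first output line; the sample line below is illustrative rather than captured from a real node:

```shell
# Extract the "1.8" major.minor prefix from a java -version banner line.
# On a real node: ver_line=$(java -version 2>&1 | head -n 1)
ver_line='openjdk version "1.8.0_252"'
jdk_ver=$(echo "$ver_line" | sed -n 's/.*"\(1\.8\)\.[^"]*".*/\1/p')
if [ "$jdk_ver" = "1.8" ]; then
    echo "OpenJDK 8 detected"
fi
```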
8.4 Deploying ZooKeeper
8.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
8.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
8.4.3 Modifying the ZooKeeper Configuration Files

Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf

Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg

Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following lines to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp

Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
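Steps 3 to 5 above can be scripted once per ensemble. A dry-run sketch that writes the server list and a myid into a scratch directory instead of /usr/local/zookeeper (swap in the real paths on the nodes):

```shell
# Generate the server.N entries for zoo.cfg and the myid for one node.
ZK_NODES="agent1 agent2 agent3"
ZK_HOME=$(mktemp -d)                 # stand-in for /usr/local/zookeeper
mkdir -p "$ZK_HOME/conf" "$ZK_HOME/tmp"

id=1
for node in $ZK_NODES; do
    # Same quorum/election ports (2888:3888) as in the zoo.cfg above.
    echo "server.$id=$node:2888:3888" >> "$ZK_HOME/conf/zoo.cfg"
    id=$((id + 1))
done

# Each node's myid must equal its server.N index; agent1 gets 1.
echo 1 > "$ZK_HOME/tmp/myid"
```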
8.4.4 Synchronizing the Configuration to Other Nodes

Step 1 Copy the ZooKeeper configuration to the other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local

Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
8.4.5 Running and Verifying ZooKeeper

Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start

NOTE
You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop

Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
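On a healthy ensemble, `./zkServer.sh status` reports one leader and the rest followers. A small helper to pull the Mode field out of the status output; the sample text below is illustrative, not captured from a live cluster:

```shell
# Return the "Mode:" value (leader/follower) from zkServer.sh status output.
zk_mode() {
    echo "$1" | sed -n 's/^Mode: //p'
}

status_output='Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower'
mode=$(zk_mode "$status_output")
```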
8.5 Deploying Hadoop
8.5.1 Compiling and Decompressing Hadoop

Step 1 Compile the Hadoop software deployment package hadoop-3.1.1.tar.gz by referring to Hadoop 3.1.1 Porting Guide (CentOS 7.6).

Step 2 Place hadoop-3.1.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv hadoop-3.1.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-3.1.1.tar.gz
Step 3 Create a soft link for later version replacement.
ln -s hadoop-3.1.1 hadoop
----End
8.5.2 Setting the Hadoop Environment Variables

Step 1 Open the /etc/profile file:
vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
8.5.3 Modifying the Hadoop Configuration Files

NOTE
All Hadoop configuration files are stored in the $HADOOP_HOME/etc/hadoop directory. Before modifying the configuration files, go to the $HADOOP_HOME/etc/hadoop directory first.
cd $HADOOP_HOME/etc/hadoop

Modifying the hadoop-env.sh File
Change the environment variable JAVA_HOME to an absolute path and set the user to user root.
echo "export JAVA_HOME=/usr/local/jdk8u252-b09" >> hadoop-env.sh
echo "export HDFS_NAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> hadoop-env.sh
echo "export HDFS_DATANODE_USER=root" >> hadoop-env.sh

Modifying the yarn-env.sh File
Change the user to user root.
echo "export YARN_REGISTRYDNS_SECURE_USER=root" >> yarn-env.sh
echo "export YARN_RESOURCEMANAGER_USER=root" >> yarn-env.sh
echo "export YARN_NODEMANAGER_USER=root" >> yarn-env.sh

Modifying the core-site.xml File

Step 1 Open the core-site.xml file.
vim core-site.xml

Step 2 Add or modify the parameters under the configuration section.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop_tmp_dir</value>
</property>
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>100</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>10000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
NOTICE
Create the directory on server1.
mkdir /home/hadoop_tmp_dir
----End
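After editing core-site.xml it is easy to misplace a tag; a quick extraction of fs.defaultFS catches that. This naive grep/sed approach assumes the <name>/<value> pair sits on adjacent lines, as in the snippet above; an XML-aware tool is more robust. A scratch copy is used here so the sketch is safe to dry-run:

```shell
# Pull the fs.defaultFS value out of a core-site.xml-style file.
# Point core_site at $HADOOP_HOME/etc/hadoop/core-site.xml in practice.
core_site=$(mktemp)
cat > "$core_site" <<'EOF'
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://server1:9000</value>
</property>
</configuration>
EOF

default_fs=$(grep -A1 '<name>fs.defaultFS</name>' "$core_site" \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
```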
Modifying the hdfs-site.xml File

Step 1 Modify the hdfs-site.xml file.
vim hdfs-site.xml

Step 2 Add or modify the parameters under the configuration section.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/data1/hadoop/nn</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/data1/hadoop/dn,/data/data2/hadoop/dn,/data/data3/hadoop/dn,/data/data4/hadoop/dn,/data/data5/hadoop/dn,/data/data6/hadoop/dn,/data/data7/hadoop/dn,/data/data8/hadoop/dn,/data/data9/hadoop/dn,/data/data10/hadoop/dn,/data/data11/hadoop/dn,/data/data12/hadoop/dn</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>server1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>dfs.namenode.service.handler.count</name>
  <value>600</value>
</property>
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>300</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
NOTICE
Create the directories for dfs.datanode.data.dir on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop
----End
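The mkdir command in the notice relies on bash brace expansion to create all twelve data directory trees in one call. A dry run under a scratch root shows the effect (the real /data layout from the notice is unchanged; this sketch only demonstrates the expansion):

```shell
# Brace expansion turns one mkdir into twelve directory trees.
# bash -c guarantees a brace-expanding shell even when invoked from plain sh.
root=$(mktemp -d)
bash -c "mkdir -p $root/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/dn"

# Each dataN/hadoop/dn path should now exist.
dir_count=$(find "$root" -type d -name dn | wc -l)
```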
Modifying the mapred-site.xml File

Step 1 Edit the mapred-site.xml file.
vim mapred-site.xml

Step 2 Add or modify the parameters under the configuration section.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
  <description>The runtime framework for executing MapReduce jobs</description>
</property>
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.88</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>6144</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx5530m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2765m</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m -Xms2048m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
</property>
----End
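The -Xmx values above leave headroom inside each YARN container: for the map task, -Xmx5530m is roughly 90% of the 6144 MB set in mapreduce.map.memory.mb. The 0.9 factor is a common rule of thumb rather than a Hadoop requirement; a quick check of the arithmetic:

```shell
# Heap size as ~90% of the container allocation, rounded to the nearest MB.
container_mb=6144
heap_mb=$(awk -v c="$container_mb" 'BEGIN { printf "%.0f", c * 0.9 }')
```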
Modifying the yarn-site.xml File

Step 1 Edit the yarn-site.xml file.
vim yarn-site.xml

Step 2 Add or modify the parameters under the configuration section.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <final>true</final>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>server1</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>102400</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>300000</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>7.1</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>48</value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/data1/hadoop/yarn/local,/data/data2/hadoop/yarn/local,/data/data3/hadoop/yarn/local,/data/data4/hadoop/yarn/local,/data/data5/hadoop/yarn/local,/data/data6/hadoop/yarn/local,/data/data7/hadoop/yarn/local,/data/data8/hadoop/yarn/local,/data/data9/hadoop/yarn/local,/data/data10/hadoop/yarn/local,/data/data11/hadoop/yarn/local,/data/data12/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/data1/hadoop/yarn/log,/data/data2/hadoop/yarn/log,/data/data3/hadoop/yarn/log,/data/data4/hadoop/yarn/log,/data/data5/hadoop/yarn/log,/data/data6/hadoop/yarn/log,/data/data7/hadoop/yarn/log,/data/data8/hadoop/yarn/log,/data/data9/hadoop/yarn/log,/data/data10/hadoop/yarn/log,/data/data11/hadoop/yarn/log,/data/data12/hadoop/yarn/log</value>
</property>
NOTICE
Create the directories for yarn.nodemanager.local-dirs on agent1, agent2, and agent3.
Example:
mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/hadoop/yarn
----End
Modifying the slaves or workers File

Step 1 Check the Hadoop version. If the Hadoop version is earlier than 3.x, edit the slaves file. If the Hadoop version is 3.x or later, edit the workers file.

Step 2 Edit the workers file (this document takes Hadoop 3.1.1 as an example).
vim workers

Step 3 In the workers file, delete all content except the IP addresses or host names of the agent nodes.
agent1
agent2
agent3
----End
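Step 3 can be scripted so the workers file always matches the agent list used elsewhere in this guide. A sketch against a temporary file; point it at $HADOOP_HOME/etc/hadoop/workers on the real cluster:

```shell
# Rewrite the workers file with exactly the agent host names.
# The > redirection truncates the file first, matching the instruction to
# delete everything except the agent entries.
workers_file=$(mktemp)
printf '%s\n' agent1 agent2 agent3 > "$workers_file"
```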
8.5.4 Synchronizing the Configuration to Other Nodes

Step 1 Create a journaldata directory on each node in sequence.
mkdir -p /usr/local/hadoop-3.1.1/journaldata

Step 2 Copy hadoop-3.1.1 to the /usr/local directory on the agent1, agent2, and agent3 nodes.
scp -r /usr/local/hadoop-3.1.1 root@agent1:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent2:/usr/local
scp -r /usr/local/hadoop-3.1.1 root@agent3:/usr/local

Step 3 Log in to the agent1, agent2, and agent3 nodes and create soft links for hadoop-3.1.1.
cd /usr/local
ln -s hadoop-3.1.1 hadoop
----End
8.5.5 Starting the Hadoop Cluster
NOTICE
Perform the operations in this section in sequence.
Step 1 Start the ZooKeeper cluster.
Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
Step 2 Start JournalNode.
Start JournalNode on agent1, agent2, and agent3.
NOTE
Perform Step 2 to Step 4 only when you format the cluster for the first time. After the formatting is complete, you only need to perform Step 1, Step 5, and Step 6 when you start the cluster the next time.
cd /usr/local/hadoop/sbin
./hadoop-daemon.sh start journalnode
Step 3 Format HDFS.
1. Format HDFS on server1.
hdfs namenode -format
2. After the formatting, the cluster generates a directory based on the hadoop.tmp.dir parameter configured in the core-site.xml file. The directory configured in this example is /home/hadoop_tmp_dir.
Step 4 Format ZKFC.
Format ZKFC on server1.
hdfs zkfc -formatZK
Step 5 Start HDFS.
Start HDFS on server1.
cd /usr/local/hadoop/sbin
./start-dfs.sh
Step 6 Start Yarn.
Start Yarn on server1.
cd /usr/local/hadoop/sbin
./start-yarn.sh
Step 7 Check whether all processes are started properly.
NOTE
Perform this operation on each node to check whether all processes are started properly. (The following figures show the processes to be started on server1 and agent1, respectively. The processes to be started on the other server and agent nodes are similar.)
jps
----End
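Rather than reading jps output by eye on every node, the check in Step 7 can be scripted against the expected service list from Table 8-2. The jps sample below is illustrative (the PIDs are arbitrary); on a real node it would come from running jps:

```shell
# Report which of the expected JVM names are missing from jps output.
check_procs() {
    expected=$1; actual=$2; missing=""
    for p in $expected; do
        echo "$actual" | grep -qw "$p" || missing="$missing $p"
    done
    echo "$missing"
}

# Illustrative jps output for an agent node; in practice: agent_jps=$(jps)
agent_jps='2113 QuorumPeerMain
2245 DataNode
2398 NodeManager
2467 JournalNode
2533 Jps'

missing=$(check_procs "QuorumPeerMain DataNode NodeManager JournalNode" "$agent_jps")
```

An empty result means every expected process is up; on server1 the expected list would be NameNode and ResourceManager instead.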
8.5.6 Verifying Hadoop

Enter the URL in the address box of the browser to access the Hadoop web page. The URL format is http://server1:50070.

Replace server1 with the IP address of the node where the NameNode process resides. Check whether the number of live nodes is the same as the number of agent nodes (3 in this section) and whether the number of dead nodes is 0. If yes, the cluster is started properly.
8.6 Deploying Spark
8.6.1 Obtaining Spark

Step 1 Download the Spark package from the following website:
https://archive.apache.org/dist/spark/spark-2.3.2/

Step 2 Place spark-2.3.2-bin-hadoop2.7.tgz in the /usr/local directory on server1 and decompress it.
mv spark-2.3.2-bin-hadoop2.7.tgz /usr/local
cd /usr/local
tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz
Step 3 Create a soft link for subsequent version update.
ln -s spark-2.3.2-bin-hadoop2.7 spark
----End
8.6.2 Setting Spark Environment Variables

Step 1 Open the /etc/profile file.
vim /etc/profile

Step 2 Add the following environment variables to the end of the file:
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
8.6.3 Modifying the Spark Configuration Files

NOTE
All Spark configuration files are stored in the $SPARK_HOME/conf directory. Before modifying the configuration files, switch to the $SPARK_HOME/conf directory.
cd $SPARK_HOME/conf

Modifying the spark-env.sh File

Step 1 Use spark-env.sh.template as the template to copy a file and name it spark-env.sh.
cp spark-env.sh.template spark-env.sh

Step 2 Open the spark-env.sh file.
vim spark-env.sh
Change the value of the environment variable JAVA_HOME to an absolute path, and specify the Hadoop directory, the Scala directory, and the Hadoop configuration directory.
export JAVA_HOME=/usr/local/jdk8u252-b09
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/scala
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export HDP_VERSION=3.1.0
----End
Modifying the spark-defaults.conf File

Modify the file.
echo "spark.master yarn" >> spark-defaults.conf
echo "spark.eventLog.enabled true" >> spark-defaults.conf
echo "spark.eventLog.dir hdfs://server1:9000/spark2-history" >> spark-defaults.conf
echo "spark.eventLog.compress true" >> spark-defaults.conf
echo "spark.history.fs.logDirectory hdfs://server1:9000/spark2-history" >> spark-defaults.conf
Synchronizing the core-site.xml and hdfs-site.xml Files of Hadoop

Synchronize the files.
cp /usr/local/hadoop/etc/hadoop/core-site.xml /usr/local/spark/conf
cp /usr/local/hadoop/etc/hadoop/hdfs-site.xml /usr/local/spark/conf

Synchronizing the mariadb-java-client Package

NOTE
If the Hive database is used, synchronize the mariadb-java-client package.

Synchronize the package.
cp /usr/local/hive/lib/mariadb-java-client-2.3.0.jar /usr/local/spark/jars
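The echo-append commands used for spark-defaults.conf add duplicate lines if the section is run twice. A guarded variant that appends a setting only when its key is not yet present, shown against a temporary file (use $SPARK_HOME/conf/spark-defaults.conf for real):

```shell
# Append "key value" to the conf file unless the key already has a line.
conf=$(mktemp)
set_default() {
    key=$1; value=$2
    grep -q "^$key[[:space:]]" "$conf" || printf '%s %s\n' "$key" "$value" >> "$conf"
}

set_default spark.master yarn
set_default spark.eventLog.enabled true
set_default spark.master yarn        # repeated call is a no-op
```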
8.6.4 Running Spark (Standalone Mode)
8.6.4.1 Synchronizing the Configuration to Other Nodes
Step 1 Copy spark-2.3.2-bin-hadoop2.7 to the /usr/local directory on agent1, agent2, and agent3.
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent1:/usr/local
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent2:/usr/local
scp -r /usr/local/spark-2.3.2-bin-hadoop2.7 root@agent3:/usr/local

Step 2 Log in to agent1, agent2, and agent3 and create soft links for spark-2.3.2-bin-hadoop2.7.
cd /usr/local
ln -s spark-2.3.2-bin-hadoop2.7 spark
----End
8.6.4.2 Starting the Spark Cluster

Start the Spark cluster on the server1 node.
cd /usr/local/spark/sbin
./start-all.sh

8.6.4.3 (Optional) Stopping the Spark Cluster

Stop the Spark cluster on the server1 node.
cd /usr/local/spark/sbin
./stop-all.sh
8.6.5 Running Spark (on Yarn Mode)
8.6.5.1 Installing Scala
Step 1 Place scala-2.11.12.tgz in the /usr/local directory on server1 and decompress it.
tar -zxvf scala-2.11.12.tgz

Step 2 Create a soft link.
ln -s scala-2.11.12 scala
Step 3 Edit the /etc/profile file.
vim /etc/profile

Step 4 Add the following environment variables to the end of the file:
export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH

Step 5 Make the environment variables take effect.
source /etc/profile
----End
8.6.5.2 Running in the Yarn-client Mode
Step 1 Submit the task to Yarn and enter the spark-shell mode.
spark-shell --master yarn --deploy-mode client

Step 2 Access the Yarn web page at http://server1:8088. The new task is displayed.
Replace server1 with the IP address of the node where the NameNode process resides.

Step 3 Click the application of the task in the Tracking column. The details page is displayed.
Step 4 Click ApplicationMaster to access the Spark page. (If the page cannot be directly displayed, manually change server1 to the actual IP address.)
----End
8.6.5.3 Using HiBench to Verify the Functions

NOTE
The cluster name involved in the operations is specified by the fs.defaultFS parameter in the Hadoop configuration file core-site.xml.

Step 1 Upload HiBench-HiBench-7.0 to the /opt directory and go to the conf directory.
cd /opt/HiBench-HiBench-7.0/conf

Step 2 Modify the hadoop.conf file.
vim hadoop.conf
Change the value of hibench.hadoop.home to the location where Hadoop is stored and the value of hibench.hdfs.master to hdfs://cluster_name.
hibench.hadoop.home /usr/local/hadoop/
hibench.hdfs.master hdfs://ns1

Step 3 Modify the spark.conf file.
vim spark.conf
Change the value of hibench.spark.home to the current location where Spark is stored, the value of hibench.spark.master to yarn-client, and the value of spark.eventLog.dir to hdfs://cluster_name/spark2xJobHistory2x.
hibench.spark.home /usr/local/spark
hibench.spark.master yarn-client
spark.eventLog.dir = hdfs://ns1/spark2xJobHistory2x
Step 4 Create the spark2xJobHistory2x directory in HDFS and check whether the directory is created successfully.
hdfs dfs -mkdir /spark2xJobHistory2x
hdfs dfs -ls /

Step 5 Switch to the HiBench root directory and generate the test data.
cd /opt/HiBench-HiBench-7.0/
bin/workloads/ml/kmeans/prepare/prepare.sh

Step 6 Run the test script.
bin/workloads/ml/kmeans/spark/run.sh

Step 7 View the application status of the tasks executed in Step 5 and Step 6 on the Yarn web page at http://server1:8088.
Replace server1 with the IP address of the node where the server1 process resides.

Step 8 Check the test result in the report/hibench.report file.
vim report/hibench.report
----End
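hibench.report is a whitespace-separated table with one row appended per run; in HiBench 7.0 the columns are Type, Date, Time, Input_data_size, Duration(s), Throughput(bytes/s), and Throughput/node. A sketch that pulls the duration and throughput out of a row (the row below is fabricated for illustration, not a real result):

```shell
# Extract the 5th (duration) and 6th (throughput) fields from a report row.
report_row='ScalaSparkKmeans 2021-10-19 10:00:00 3096471563 98.3 31500219 10500073'
duration=$(echo "$report_row" | awk '{ print $5 }')
throughput=$(echo "$report_row" | awk '{ print $6 }')
```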
9 Storm Deployment Guide (CentOS 7.6 & openEuler 20.03)
9.1 Introduction
9.2 Environment Requirements
9.3 Configuring the Deployment Environment
9.4 Deploying ZooKeeper
9.5 Deploying Storm
9.1 Introduction
Storm Overview

This document describes the Storm deployment procedure. It does not include the software source code compilation procedure.

All programs required in this document are downloaded from the official websites. Most of these programs are compiled based on the x86 platform and may contain modules that are implemented in platform-dependent languages (such as C/C++). Therefore, incompatibility issues may occur if these programs are directly run on TaiShan servers. To resolve the problem, you need to download and compile the source code and then deploy the programs. The deployment procedure is the same regardless of the program compilation platform.
Recommended Versions

● OpenJDK jdk8u252-b09
  ARM: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
  x86: https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
● ZooKeeper 3.4.6
  Download the software package of the required version from: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/
● Storm 1.2.1
  Download the software package of the required version from: https://archive.apache.org/dist/storm/apache-storm-1.2.1/
9.2 Environment Requirements

Hardware
Minimum configuration: any CPU, one DIMM of any capacity, and one drive of any capacity.
The configuration depends on the actual application scenario.

OS Requirements
CentOS 7.4 to 7.6, openEuler 20.03

NOTE
This section uses CentOS 7.6 as an example to describe how to deploy a Storm cluster.
Cluster Planning

In this document, four hosts are used as nodes 1 to 4 in a cluster. Table 9-1 lists the data plan of each node.

Table 9-1 Cluster data plan
● Node 1: IPaddress1
● Node 2: IPaddress2
● Node 3: IPaddress3
● Node 4: IPaddress4
All four nodes use one 4 TB HDD as the system drive and twelve 4 TB HDDs as data drives, and run CentOS 7.6 with OpenJDK jdk8u252-b09.
Software Planning

Table 9-2 lists the software plan of each node in the cluster.
Table 9-2 Software plan
Node Service
Node 1 -
Node 2 Nimbus and UI
Node 3 QuorumPeerMain and Supervisor
Node 4 QuorumPeerMain and Supervisor
9.3 Configuring the Deployment Environment

Step 1 Log in to nodes 1 to 4 in sequence and change their host names to server1, agent1, agent2, and agent3.
hostnamectl set-hostname host_name --static

Step 2 Log in to each node and modify the /etc/hosts file.
Add the mapping between the IP addresses and host names of the nodes to the hosts file.
IPaddress1 server1
IPaddress2 agent1
IPaddress3 agent2
IPaddress4 agent3

Step 3 Log in to each node and disable the firewall.
systemctl stop firewalld.service
systemctl disable firewalld.service

Step 4 Log in to each node and enable password-free SSH login.
1. Generate a key and press Enter if any message is prompted.
ssh-keygen -t rsa
2. Enable password-free SSH login on each node (including password-free login for the local node):
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node_IP_address
Step 5 Log in to each node and install OpenJDK.
1. Install OpenJDK.
ARM:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_aarch64_linux_hotspot_8u252b09.tar.gz -C /usr/local
x86:
wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u252-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz
tar -zxf OpenJDK8U-jdk_x64_linux_hotspot_8u252b09.tar.gz -C /usr/local
2. Add environment variables.
vim /etc/profile
export JAVA_HOME=/usr/local/jdk8u252-b09
export PATH=$JAVA_HOME/bin:$PATH
3. Make the environment variables take effect.
source /etc/profile
4. Check whether OpenJDK is successfully installed.
java -version
The installation is successful if information similar to the following is displayed:
----End
9.4 Deploying ZooKeeper
9.4.1 Compiling and Decompressing ZooKeeper

Step 1 Compile the zookeeper-3.4.6.tar.gz deployment package by following the instructions in ZooKeeper 3.4.6 Porting Guide (CentOS 7.6).

Step 2 Place zookeeper-3.4.6.tar.gz in the /usr/local directory on agent1 and decompress it.
mv zookeeper-3.4.6.tar.gz /usr/local
cd /usr/local
tar -zxvf zookeeper-3.4.6.tar.gz

Step 3 Create a soft link for subsequent version update.
ln -s zookeeper-3.4.6 zookeeper
----End
9.4.2 Setting ZooKeeper Environment Variables

Step 1 Open the configuration file.
vim /etc/profile

Step 2 Add ZooKeeper to the environment variables.
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

Step 3 Make the environment variables take effect.
source /etc/profile
----End
9.4.3 Modifying the ZooKeeper Configuration Files
Step 1 Switch to the directory where ZooKeeper is located.
cd /usr/local/zookeeper/conf
Step 2 Copy the configuration file.
cp zoo_sample.cfg zoo.cfg
Step 3 Modify the configuration file.
vim zoo.cfg
1. Change the data directory.
dataDir=/usr/local/zookeeper/tmp
2. Add the following code to the end of the file. server.1 to server.3 are the nodes where ZooKeeper is deployed.
server.1=agent1:2888:3888
server.2=agent2:2888:3888
server.3=agent3:2888:3888
Step 4 Create the tmp directory as the data directory.
mkdir /usr/local/zookeeper/tmp
Step 5 Create an empty file in the tmp directory and write an ID to the file.
touch /usr/local/zookeeper/tmp/myid
echo 1 > /usr/local/zookeeper/tmp/myid
----End
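Each node's myid must match its server.N number in zoo.cfg (agent1 gets 1, agent2 gets 2, agent3 gets 3); a dry-run sketch of that mapping (hostnames are the guide's; the ssh command is shown only in a comment):

```shell
# Dry-run sketch: assign each ZooKeeper node the myid that matches
# its server.N entry in zoo.cfg.
NODES="agent1 agent2 agent3"
id=1
for node in $NODES; do
  # On a real cluster: ssh root@$node "echo $id > /usr/local/zookeeper/tmp/myid"
  echo "$node: myid=$id"
  id=$((id + 1))
done
```

A mismatched or duplicated myid is a common cause of ZooKeeper election failures, so keeping the mapping in one place avoids drift.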
9.4.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy the ZooKeeper configuration to other nodes.
scp -r /usr/local/zookeeper-3.4.6 root@agent2:/usr/local
scp -r /usr/local/zookeeper-3.4.6 root@agent3:/usr/local
Step 2 Create a soft link and modify myid on agent2 and agent3.
● agent2:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 2 > /usr/local/zookeeper/tmp/myid
● agent3:
cd /usr/local
ln -s zookeeper-3.4.6 zookeeper
echo 3 > /usr/local/zookeeper/tmp/myid
----End
9.4.5 Running and Verifying ZooKeeper
Step 1 Start ZooKeeper on agent1, agent2, and agent3.
cd /usr/local/zookeeper/bin
./zkServer.sh start
NOTE
You can stop ZooKeeper on agent1, agent2, and agent3 as follows:
cd /usr/local/zookeeper/bin
./zkServer.sh stop
Step 2 Check the ZooKeeper status.
./zkServer.sh status
----End
9.5 Deploying Storm
9.5.1 Obtaining Storm
Step 1 Download the Storm package.
wget https://archive.apache.org/dist/storm/apache-storm-1.2.1/apache-storm-1.2.1.tar.gz
Step 2 Place apache-storm-1.2.1.tar.gz in the /usr/local directory on server1 and decompress it.
mv apache-storm-1.2.1.tar.gz /usr/local
tar -zxvf apache-storm-1.2.1.tar.gz
Step 3 Create a soft link for subsequent version update.
ln -s apache-storm-1.2.1 storm
----End
9.5.2 Setting Storm Environment Variables
Step 1 Open the configuration file.
vim /etc/profile
Step 2 Add Storm paths to environment variables.
export STORM_HOME=/usr/local/storm
export PATH=$STORM_HOME/bin:$PATH
Step 3 Make the environment variables take effect.
source /etc/profile
----End
9.5.3 Modifying the Storm Configuration File
NOTE
All Storm configuration files are stored in the $STORM_HOME/conf directory. Before modifying the configuration files, switch to the $STORM_HOME/conf directory.
cd $STORM_HOME/conf
Modify the storm.yaml file.
vim storm.yaml
Content to be modified is as follows:
storm.zookeeper.servers:      # You can change the host names to the corresponding IP addresses.
- "agent1"
- "agent2"
- "agent3"
storm.zookeeper.port: 2181
storm.local.dir: "/usr/local/storm/stormLocal"      # You need to create this directory manually.
nimbus.seeds: ["server1"]      # You can change it to the corresponding IP address.
supervisor.slots.ports:      # The number of slots depends on the actual situation.
- 6700
- 6701
- 6702
- 6703
storm.health.check.dir: "healthchecks"
storm.health.check.timeout.ms: 5000
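The storm.local.dir path above must exist before Storm starts; a sketch that creates it (the path falls back to a /tmp location here so the sketch can run unprivileged; on the cluster it would be the guide's /usr/local/storm/stormLocal):

```shell
# Create the local state directory referenced by storm.local.dir.
# STORM_LOCAL_DIR defaults to a /tmp path here so this runs without root;
# on the cluster it would be /usr/local/storm/stormLocal.
STORM_LOCAL_DIR="${STORM_LOCAL_DIR:-/tmp/stormLocal}"
mkdir -p "$STORM_LOCAL_DIR"
ls -ld "$STORM_LOCAL_DIR"
```

`mkdir -p` is idempotent, so rerunning it on a node where the directory already exists is harmless.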
9.5.4 Synchronizing the Configuration to Other Nodes
Step 1 Copy apache-storm-1.2.1 to the /usr/local directory of agent1 to agent3.
scp -r /usr/local/apache-storm-1.2.1 root@agent1:/usr/local
scp -r /usr/local/apache-storm-1.2.1 root@agent2:/usr/local
scp -r /usr/local/apache-storm-1.2.1 root@agent3:/usr/local
Step 2 On each of agent1 to agent3, create a soft link for apache-storm-1.2.1.
cd /usr/local
ln -s apache-storm-1.2.1 storm
----End
9.5.5 Running and Verifying Storm
Step 1 Start a Storm cluster.
1. Start a ZooKeeper cluster. For details, see Running and Verifying ZooKeeper.
2. Start the following processes on server1:
/usr/local/storm/bin/storm nimbus &
/usr/local/storm/bin/storm ui &
3. Start the following process on agent1 to agent3:
/usr/local/storm/bin/storm supervisor &
4. Check whether all processes are started properly. The core process shown by jps is the Storm UI process.
jps
NOTE
Check the process startup status on all nodes: the nimbus and core (UI) processes must be running on server1, and the supervisor process must be running on agent1 to agent3. After starting these processes, wait about 30 seconds for the Storm node to connect to the other nodes before concluding that Storm has started properly; otherwise, the startup fails. (If it does, restart Storm to rectify the fault.)
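The jps check in step 4 can be scripted by grepping the output for the expected process names; a sketch (the helper `has_proc` is an assumption, and the sample output with its PIDs is illustrative):

```shell
# Hypothetical helper: check whether a named JVM appears in jps output.
has_proc() {
  echo "$1" | grep -qw "$2"
}

# Illustrative jps output from server1:
sample="10241 nimbus
10242 core
10243 Jps"
has_proc "$sample" nimbus && echo "nimbus is running"
has_proc "$sample" core   && echo "ui (core) is running"
```

On the agent nodes the same helper would be used to look for the Supervisor process instead.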
Step 2 Stop the Storm cluster.
1. Stop the following processes on server1:
jps | grep nimbus | grep -v grep | awk '{print $1}' | xargs kill -9
jps | grep core | grep -v grep | awk '{print $1}' | xargs kill -9
2. Stop the following process on agent1 to agent3:
jps | grep Supervisor | grep -v grep | awk '{print $1}' | xargs kill -9
3. Stop the ZooKeeper cluster. For details, see Running and Verifying ZooKeeper.
----End
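The kill pipelines above can be collected into one helper; a dry-run sketch (`stop_jvm` is an assumption, not part of the guide; with DRY_RUN=echo the kill commands are printed instead of executed):

```shell
# Dry-run sketch of the stop sequence: find each JVM by its jps name
# and kill it. Set DRY_RUN= (empty) to actually send the signals.
DRY_RUN=echo
stop_jvm() {
  for pid in $(jps 2>/dev/null | awk -v n="$1" '$2 == n {print $1}'); do
    $DRY_RUN kill -9 "$pid"
  done
}
stop_jvm nimbus       # on server1
stop_jvm core         # Storm UI, on server1
stop_jvm Supervisor   # on agent1 to agent3
```

Matching the exact jps name with `$2 == n` avoids the `grep -v grep` dance and cannot accidentally kill an unrelated process whose name merely contains the pattern.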
A Change History

Date: 2021-07-13
This issue is the fourth official release.
Added the adaptation to openEuler 20.03 in the deployment guides of Apache components.

Date: 2020-07-24
This issue is the third official release.
Moved the Elasticsearch and Redis deployment guides to the Other category. For details, see Elasticsearch Deployment Guide (CentOS 7.6 & openEuler 20.03) and Redis Deployment Guide (CentOS 7.6 & openEuler 20.03).
Date: 2020-05-23
This issue is the second official release.
● Modified some descriptions in section "1.4.3 Modifying the ZooKeeper Configuration Files" in the ZooKeeper Deployment Guide (CentOS 7.6).
● Modified the descriptions in "Environment Requirements" and "Configuring the Deployment Environment" in the Elasticsearch Deployment Guide (CentOS 7.6), as well as the parameter descriptions in "Modifying the Elasticsearch Configuration File" and "Synchronizing the Configuration to Other Nodes".
● Modified the parameter description in 3.6.3 Modifying the Flink Configuration Files in the Flink Deployment Guide (CentOS 7.6).
● Modified some descriptions in 4.6.5 Starting the HBase Cluster in the HBase Cluster Deployment Guide (CentOS 7.6).
● Modified the parameter description in 6.5.3 Modifying the Kafka Configuration Files in the Kafka Deployment Guide (CentOS 7.6).
● Modified some descriptions in "Deploying a Cluster" in the Redis Deployment Guide (CentOS 7.6).
● Deleted "Troubleshooting" from the Spark Deployment Guide (CentOS 7.6).
● Modified some descriptions in 8.6.5.3 Using HiBench to Verify the Functions in the Spark Deployment Guide (CentOS 7.6).

Date: 2020-03-20
This issue is the first official release.