SAP HANA Spark Controller Installation Guide
SAP HANA Spark Controller 2.0 SP02
Document Version: 1.0 – 2017-07-26
PUBLIC
Content

1 Getting Started with SAP HANA Spark Controller . . . . 5
1.1 Audience . . . . 6
1.2 Hardware and Operating System Support . . . . 6
1.3 SAP HANA Spark Controller Releases . . . . 7
1.4 Download SAP HANA Spark Controller . . . . 7
1.5 Compatibility . . . . 7
1.6 Overview of Startup and Configuration Files . . . . 8
2 Hadoop Integration . . . . 10
3 Installing SAP HANA Spark Controller . . . . 11
3.1 Ambari . . . . 11
    Installation Prerequisites (Ambari) . . . . 12
    Modify mapreduce.application.classpath . . . . 13
    Install SAP HANA Spark Controller Using Ambari . . . . 14
    Post Installation Checks and Troubleshooting (Ambari) . . . . 16
    Modify Configuration Properties (Ambari) . . . . 17
    Start or Stop SAP HANA Spark Controller . . . . 18
    SAP HANA Ambari Integration . . . . 18
    Uninstall from Ambari . . . . 20
3.2 Cloudera Manager . . . . 20
    Installation Prerequisites (Cloudera Manager) . . . . 21
    Install SAP HANA Spark Controller Using Cloudera Manager . . . . 22
    Post Installation Checks and Troubleshooting (Cloudera) . . . . 28
    Modify Configuration Properties (Cloudera Manager) . . . . 29
    Run the Diagnostic Utility . . . . 30
    Start or Stop the SAP HANA Spark Controller Service . . . . 32
    Uninstall from Cloudera Manager . . . . 33
3.3 Manual . . . . 34
    Installation Prerequisites (Manual) . . . . 35
    Manually Install SAP HANA Spark Controller . . . . 36
    Post Installation Checks and Troubleshooting (Manual) . . . . 38
    Start SAP HANA Spark Controller . . . . 39
    Uninstall from a Manual Installation . . . . 40
    Update SAP HANA Spark Controller . . . . 41
3.4 MapR . . . . 41
    Installation Prerequisites (MapR) . . . . 42
    Add Properties for YARN . . . . 43
    Install SAP HANA Spark Controller for MapR Distributions . . . . 43
4 Configuring SAP HANA Spark Controller . . . . 44
4.1 Port Configurations . . . . 44
4.2 Update Configuration Parameters when Upgrading . . . . 45
4.3 Environment Variables for hana_hadoop-env.sh . . . . 46
4.4 Spark DataNucleus JARs . . . . 47
    Configuring the DataNucleus JARs . . . . 48
4.5 Configuration Properties . . . . 49
    Resource Allocation . . . . 55
    Configuring Cloud Deployment Example . . . . 59
    Distribution Deployment Configuration Templates . . . . 59
4.6 Configure hanaes User Proxy Settings . . . . 60
    Ambari . . . . 62
    Cloudera Manager . . . . 63
4.7 Configuring a Proxy Server . . . . 64
4.8 Enabling Remote Caching . . . . 65
    Remote Caching Configuration Parameters . . . . 66
5 Setting Up Security . . . . 67
5.1 LDAP Authentication . . . . 67
5.2 Configure Auditing . . . . 68
5.3 Kerberos . . . . 69
    Configure Kerberos SSO on the SAP HANA Server . . . . 72
5.4 SSL . . . . 74
    Configure SSL Mode . . . . 76
    OpenSSL Command Syntax for SAP HANA Spark Controller . . . . 77
    Configure SSL Example . . . . 78
    SSL Mode Configure Parameters . . . . 80
6 Create a Remote Source . . . . 82
7 Create a Custom Spark Procedure . . . . 83
7.1 Privileges . . . . 84
7.2 Virtual Package System Built-Ins . . . . 85
8 Data Lifecycle Manager . . . . 87
9 Troubleshooting . . . . 88
9.1 Troubleshooting Diagnostic Utility . . . . 88
    Run the Diagnostic Tool . . . . 89
    Error Messages . . . . 91
9.2 SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x . . . . 94
9.3 SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2 . . . . 96
9.4 Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari . . . . 96
9.5 SAP Vora - SAP HANA Spark Controller Fails To Start . . . . 97
9.6 The TINYINT Datatype is not Supported When Accessing Apache Hive Tables . . . . 98
9.7 Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical" . . . . 98
9.8 Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly . . . . 99
9.9 Configure SAP HANA Spark Controller for Non-Proxy Server Environments . . . . 99
9.10 SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date Related Built-ins . . . . 100
9.11 Data Warehousing Support . . . . 100
1 Getting Started with SAP HANA Spark Controller
SAP HANA Spark controller enables SAP HANA in-memory access to data stored in HDFS data files on a Hadoop cluster.
Spark controller allows SAP HANA to access Hadoop data through an SQL interface. Working primarily with Spark SQL, Spark controller connects to an existing Hive metastore. The Spark SQL adapter is a plug-in for SAP HANA Smart Data Access (SDA) that provides access to Spark controller, and moderates query execution and data transfer.

Spark controller is assembled, installed, and configured on a Hadoop cluster. YARN and the Spark assembly JAR are used to connect to the HDFS system, with YARN as the resource management layer for the Hadoop ecosystem. If you are already running SDA scenarios and connecting to HiveServer through an ODBC driver, you can migrate to Spark controller with minimal configuration.
On the Hadoop side, Spark controller provides an SQL interface to the underlying Hive data using Spark SQL, and performs the following functions:

● Facilitates query execution and enables SAP HANA to fetch data in a compressed columnar format.
● Supports SAP HANA-specific query optimizations and secure communication.
● Facilitates data transfer between SAP HANA and executor nodes.
SAP HANA Spark Controller Architecture
You can configure Spark controller to use HDFS as extended storage for aging data from SAP HANA via Data Lifecycle Manager (DLM). See Data Lifecycle Manager [page 87].
Related Information
Audience [page 6]
Hardware and Operating System Support [page 6]
SAP HANA Spark Controller Releases [page 7]
Download SAP HANA Spark Controller [page 7]
Compatibility [page 7]
Overview of Startup and Configuration Files [page 8]
1.1 Audience
The information in this document is intended for technical users who:

● Want to install and configure Spark controller on an existing Hadoop cluster.
● Have prior knowledge of monitoring and troubleshooting Hadoop cluster operations.
● Are familiar with Hadoop and Spark, as an operator or developer.
1.2 Hardware and Operating System Support
SAP HANA Spark controller is supported on these hardware platforms and operating systems.

● Supported Hardware Platforms:
  ○ Intel-based hardware platforms
  ○ IBM Power Systems running Red Hat 7.2 for Hortonworks 2.6
● Supported Operating Systems for SAP HANA:
  ○ Red Hat Enterprise Linux for SAP Solutions
  ○ Red Hat Enterprise Linux for SAP HANA
  ○ SUSE Linux Enterprise Server for SAP Applications
  ○ SUSE Linux Enterprise Server
Note
For Debian systems, use the alien utility to convert the RPM package to the Debian package format. For more information, see Manually Install SAP HANA Spark Controller [page 36].
1.3 SAP HANA Spark Controller Releases
Spark controller is a component of the SAP HANA platform edition.

Spark controller SP releases are delivered on the same schedule as the SAP HANA platform; however, on occasion, Spark controller patch level (PL) releases are delivered independently of the SAP HANA platform edition schedule. Be sure to check for the latest patch level releases here:

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
1.4 Download SAP HANA Spark Controller
The SAP HANA Spark controller installation and upgrade packages are available on the SAP Software Download Center.
Procedure
1. To download the installation media for SAP HANA Spark controller, go to the Software Downloads Site.
2. Click SUPPORT PACKAGES & PATCHES.
3. Go to: By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
4. Choose the installation package.
1.5 Compatibility
Spark controller 2.0 SP02 is compatible with SAP HANA 2.0 versions, as well as SAP HANA 1.0 SPS 12.
Features introduced in later versions of Spark controller, such as remote caching, are not supported on SAP HANA 1.0 SPS12.
Earlier versions of Spark controller are not compatible with SAP HANA 2.0 versions.
For more information about version compatibility, distribution support, and SAP Vora compatibility, see the SAP HANA Spark Controller Compatibility Matrix.
1.6 Overview of Startup and Configuration Files
This document references a number of files that define environment variables and properties, or specify directory locations, for Hadoop distributions, YARN, Hive, Spark, and Spark controller.
Table 1: Supported Files
File Description and Default Location
hanaes_site.xml Lists properties for configuring Spark controller.
Properties that are set in this file for Spark (properties that start with spark) are also respected. You can use these standard Spark parameters to change the general behavior of Spark controller.
Location: /usr/sap/spark/controller/conf/hanaes_site.xml
hana_hadoop-env.sh Lists environment variables that specify Spark controller dependencies, such as the directory locations of components and libraries.
Location: /usr/sap/spark/controller/conf/hana_hadoop-env.sh
hive-site.xml Provides Spark-application configurations for Hive, such as where to connect to a remote Hive Metastore server. Set the HIVE_CONF_DIR environment variable to the location of this file.
Location: /etc/hive/conf/hive-site.xml
Spark assembly JAR file A JAR file that bundles all the required dependencies for running Spark.
Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /opt/cloudera/parcels/CDH-<version>/lib/spark/spark-assembly-<version>.jar
hana.spark.controller File required to start Spark controller.

Location: /var/run/hanaes/hana.spark.controller
hana_controller.log A log file for the hanaes user. The hanaes user is created when installing Spark controller.
Location: /var/log/hanaes/hana_controller.log
spark-defaults.conf Configuration file for setting the default environment for all Spark jobs submitted on the local host.
Location: /usr/sap/spark/controller/conf/spark-defaults.conf
yarn-site.xml Stores YARN configuration options.
Location: The location depends on your Hadoop distribution. An example of the location for MapR is: /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml
mapred-site.xml Lists configuration parameters that override the default values for MapReduce parameters.
Location: /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/mapred-site.xml
core-site.xml Lists the configuration parameters for Hadoop, such as I/O settings that are common to HDFS and MapReduce, and informs the Hadoop daemon of the location of the NameNode running on the cluster.
Location: The location depends on your Hadoop distribution. An exampleof the location for Cloudera is: /etc/hadoop/<service_name>/conf/core-site.xml
hdfs-site.xml List the configuration settings for HDFS daemons: NameNode, Secondary NameNode, and DataNodes.
Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /etc/hadoop/<service_name>/conf/hdfs-site.xml
hadoop-env.sh Lists Hadoop specific environment variables.
Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /etc/hadoop/<service_name>/conf/hadoop-env.sh
log4j.properties
.sparkStaging Spark and YARN staging directory under /user/hanaes on HDFS. To see its files, enter: hdfs dfs -ls /user/hanaes (on MapR, use: hadoop fs -ls /user/hanaes).
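A quick way to verify a node against this table is to check each default location. The following is a minimal shell sketch; the paths are the defaults listed above, so adjust them for your distribution:

```shell
#!/bin/sh
# Report whether a file exists at its expected location.
check_file() {
    if [ -e "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
    fi
}

# Default locations taken from the table above.
for f in \
    /usr/sap/spark/controller/conf/hanaes_site.xml \
    /usr/sap/spark/controller/conf/hana_hadoop-env.sh \
    /usr/sap/spark/controller/conf/spark-defaults.conf \
    /etc/hive/conf/hive-site.xml \
    /var/log/hanaes/hana_controller.log
do
    check_file "$f"
done
```

Files whose location depends on your Hadoop distribution (for example, the Spark assembly JAR or yarn-site.xml) can be appended to the list once you know their paths.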
2 Hadoop Integration
There are two methods available to enable communication between SAP HANA and your Hadoop system: SAP HANA Spark controller and the Hive ODBC driver.

This document describes installing and configuring SAP HANA Spark controller. For additional information about SAP HANA and Hadoop integration using the ODBC driver, see the SAP HANA documentation.
3 Installing SAP HANA Spark Controller
Depending on your distribution, different options are available for installing and configuring SAP HANA Spark controller.

● Cloudera – Use Cloudera Manager or manually install Spark controller.
● Hortonworks – Use Ambari or manually install Spark controller.
● MapR – Manually install Spark controller.
Related Information
Ambari [page 11]
Cloudera Manager [page 20]
Manual [page 34]
MapR [page 41]
3.1 Ambari
If you are using the Hortonworks distribution of Hadoop, set up SAP HANA Spark controller using the Ambari Web UI.

Non-root users do not have permission to install Spark controller using Ambari, and must install and configure Spark controller manually.
Related Information
Installation Prerequisites (Ambari) [page 12]
Modify mapreduce.application.classpath [page 13]
Install SAP HANA Spark Controller Using Ambari [page 14]
Post Installation Checks and Troubleshooting (Ambari) [page 16]
Modify Configuration Properties (Ambari) [page 17]
Start or Stop SAP HANA Spark Controller [page 18]
SAP HANA Ambari Integration [page 18]
Uninstall from Ambari [page 20]
3.1.1 Installation Prerequisites (Ambari)
This section provides a list of installation prerequisites when using Spark controller with Ambari.
Task Description
SAP HANA Install one of these Spark controller-compatible SAP HANA versions:

● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02
Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.
Use Apache Spark 1.5.2 or 1.6.x.
The Hortonworks distributions HDP 2.4, 2.5, and 2.6 have been tested. Additional Hortonworks versions are expected to be compatible with Spark controller, but have not been tested.
Download Spark controller
Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:
Hortonworks – $SPARK_HOME defaults to /usr/hdp/current/spark-client.
During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the spark assembly file.
If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .
Proxy User Spark controller impersonates the currently logged-in user while accessing Hadoop services. You can configure user proxy settings for the hanaes user in the core-site.xml file.
See Configure hanaes User Proxy Settings [page 60].
Specify the HDP version
Modify the mapreduce.application.classpath. See Modify mapreduce.application.classpath [page 13].
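Locating the Spark assembly JAR called out in the prerequisites above is a common stumbling point. The following shell sketch finds it; the $SPARK_HOME default reflects the Hortonworks layout described above, and find_spark_assembly is an illustrative helper name:

```shell
#!/bin/sh
# Locate the Spark assembly JAR under a Spark installation directory.
# Prints the first match, if any.
find_spark_assembly() {
    find "$1" -name 'spark-assembly-*.jar' 2>/dev/null | head -n 1
}

# On Hortonworks, $SPARK_HOME defaults to /usr/hdp/current/spark-client.
SPARK_HOME="${SPARK_HOME:-/usr/hdp/current/spark-client}"
echo "HANA_SPARK_ASSEMBLY_JAR=$(find_spark_assembly "$SPARK_HOME")"
```

The printed path is what you later assign to the HANA_SPARK_ASSEMBLY_JAR variable during installation.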
3.1.2 Modify mapreduce.application.classpath
(Hortonworks distribution only) Using Ambari, modify the mapreduce.application.classpath property to specify the HDP version.
Procedure
1. Find the HDP version you are using. The version is listed as a directory under /usr/hdp.

For example, if you are using HDP 2.4 and the directory is listed as /usr/hdp/2.4.2.0-258, the HDP version (or <hdp.version>) is 2.4.2.0-258.
2. From the menu on the left side of the Ambari Dashboard, click MapReduce2.
3. Click the Configs tab.
4. Click the Advanced tab, then expand Advanced mapred-site.
5. Update the mapreduce.application.classpath property by removing all entries containing ${<hdp.version>} and replacing them with the appropriate HDP value.
An example of this property is:
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${<hdp.version>}/hadoop/lib/hadoop-lzo-0.6.0.${<hdp.version>}.jar:/etc/hadoop/conf/secure
An example of the property with the HDP version is:
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.4.2.0-258/hadoop/lib/hadoop-lzo-0.6.0.2.4.2.0-258.jar:/etc/hadoop/conf/secure
6. Save your configuration changes, then restart the MapReduce2 service and related services such as YARN and Hive.
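The version lookup and placeholder substitution in the steps above can be sketched in shell. Ambari normally manages this property for you; detect_hdp_version and substitute_hdp_version are illustrative helper names, and ${hdp.version} is the placeholder form used in the mapred-site property:

```shell
#!/bin/sh
# Detect the installed HDP version from the versioned directory under
# /usr/hdp (step 1), then substitute it for the ${hdp.version}
# placeholders in a classpath string (step 5).
detect_hdp_version() {
    # $1 = root directory (normally /usr/hdp); skip the "current" symlink.
    ls "$1" 2>/dev/null | grep -v '^current$' | head -n 1
}

substitute_hdp_version() {
    # $1 = classpath containing ${hdp.version} placeholders, $2 = version
    printf '%s\n' "$1" | sed 's/[$]{hdp[.]version}/'"$2"'/g'
}

# Example (the version string is illustrative):
classpath='/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar'
substitute_hdp_version "$classpath" "2.4.2.0-258"
```

After substitution, no ${hdp.version} entries should remain in the property value.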
3.1.3 Install SAP HANA Spark Controller Using Ambari
Install SAP HANA Spark controller using the Ambari Web UI.
Prerequisites
See Installation Prerequisites (Ambari) [page 12].
You must have root permission.
Procedure
1. Move the 2.0 SP02 download package to your Ambari Server host. See Download SAP HANA Spark Controller [page 7].
2. Extract the download file, which is a TAR archive file that contains three types of installer binaries: RPM, Ambari, and Cloudera.
3. Copy the controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz file to your Ambari Server services folder and extract it there to create a new Spark controller directory. For example:

sudo cp controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz /var/lib/ambari-server/resources/stacks/HDP/<hdp_version>/services
cd /var/lib/ambari-server/resources/stacks/HDP/<hdp_version>/services
sudo tar -xvf controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz
4. Restart the Ambari Server by executing either of the following commands:
○ ambari-server restart
○ sudo /usr/sbin/ambari-server restart
5. Log in to Ambari and select Actions > Add Service.
6. In the Add Service Wizard, select Choose Services from the left pane, then check the box for SparkController service in the right pane.
7. From the Assign Masters menu, assign the SparkController service to one of the hosts on your cluster.
8. Select Customize Services, then select the SparkController tab:
a. Expand the Advanced hana_hadoop-env option. The hana_hadoop-env template includes a list of environment variables and paths. Replace #export HANA_SPARK_ASSEMBLY_JAR= with the path to the Spark assembly JAR file. The Spark JAR file must be located on the same node where Spark controller is installed. For example:
export HANA_SPARK_ASSEMBLY_JAR=$SPARK_HOME/lib/spark-assembly-1.5.2-hadoop2.6.0.jar
b. (Required for Spark versions 1.6.x) Add the path of the datanucleus-* libraries in to HADOOP_CLASSPATH. For example:
export HADOOP_CLASSPATH=/etc/hive/conf:/usr/hdp/<hdp_version>/spark/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/spark/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/spark/lib/datanucleus-rdbms-<version>.jar
c. Expand the Advanced hanaes-site option and set the sap.hana.es.server.port number to 7860. Specify the spark.executor.instances and spark.executor.memory values.
The properties set in this form are the default values defined in the hanaes_site.xml file included with the Spark controller.
Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].
d. Expand the Custom hanaes-site option. Click Add Property and add the following property:
Key: spark.sql.hive.metastore.sharedPrefixes
Value: com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop
9. Click Next, then Deploy to install and start Spark controller on the host that you indicated.
10. To verify that Spark controller started correctly without waiting for the Ambari Web UI to update the controller's status, log on to the host on which you installed Spark controller and check the log at /var/log/hanaes/hana_controller.log.
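For step 8.b, rather than listing each DataNucleus JAR by hand, a shell glob can assemble the classpath fragment. This is a sketch only; build_datanucleus_classpath is an illustrative helper name, and the Hortonworks lib path is the one described in the procedure:

```shell
#!/bin/sh
# Join all datanucleus-*.jar files in a directory into a single
# colon-separated classpath fragment.
build_datanucleus_classpath() {
    jars=""
    for j in "$1"/datanucleus-*.jar; do
        [ -e "$j" ] || continue          # glob matched nothing
        jars="${jars:+$jars:}$j"
    done
    printf '%s\n' "$jars"
}

# Usage (version and path are placeholders):
# export HADOOP_CLASSPATH=/etc/hive/conf:$(build_datanucleus_classpath \
#     /usr/hdp/<hdp_version>/spark/lib)
```

This keeps HADOOP_CLASSPATH correct even when the JAR version numbers change between HDP releases.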
Next Steps
● Post Installation Checks and Troubleshooting (Ambari) [page 16]
● Troubleshooting Diagnostic Utility [page 88]
● Create a Remote Source [page 82]
● Add the Ambari URL to SAP HANA Cockpit [page 19]
3.1.4 Post Installation Checks and Troubleshooting (Ambari)
This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.
Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Troubleshooting Diagnostic Utility [page 88].

The table below provides information about post installation configurations and troubleshooting.
Task Description
Environment variables
When installing Spark controller, you set the location for HANA_SPARK_ASSEMBLY_JAR. This path is set in the hana_hadoop-env.sh file. To confirm or change the path, see Environment Variables for hana_hadoop-env.sh [page 46].

Configure Spark controller dependencies in the hana_hadoop-env.sh file. Use the template for your distribution as a starting point. See Environment Variables for hana_hadoop-env.sh [page 46].
Task Description
Hive Metastore To allow Spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in Spark controller's classpath.

The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_PATH environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.
Configure hanaes When installing Spark controller, you set properties through the Advanced hanaes-site and Custom hanaes-site menus. These configuration properties are defined in the hanaes_site.xml file. Properties can be added or changed in this file, or using Ambari. See Modify Configuration Properties (Ambari) [page 17] and Configuration Properties [page 49].
These properties are required:
● sap.hana.es.server.port
● spark.sql.hive.metastore.sharedPrefixes
● spark.executor.memory
● spark.executor.instances
Cloud deployment Configure Spark controller for cloud deployment. See Configuring Cloud Deployment Example [page 59].

Upgrading When updating to a new version of Spark controller, be aware of new configuration and deprecated parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].
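The four required properties above are defined in hanaes_site.xml. The following is a minimal illustrative fragment: the port and sharedPrefixes values are taken from the installation steps, while the executor memory and instance values are placeholders, not recommendations:

```xml
<configuration>
  <property>
    <name>sap.hana.es.server.port</name>
    <value>7860</value>
  </property>
  <property>
    <name>spark.sql.hive.metastore.sharedPrefixes</name>
    <value>com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop</value>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>4g</value>
  </property>
  <property>
    <name>spark.executor.instances</name>
    <value>10</value>
  </property>
</configuration>
```

Size the executor values for your cluster; see Resource Allocation [page 55].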
3.1.5 Modify Configuration Properties (Ambari)
You can modify or view the default SAP HANA Spark controller properties in the Ambari Web UI.
Procedure
1. On the main Ambari page, click SparkController, then select the Configs tab.
Spark and Spark controller support the same properties. For more information about supported properties, see the Apache Spark documentation.
2. Add and enable additional properties for Spark controller by expanding Advanced hanaes-site, or Custom hanaes-site. See Configuration Properties [page 49].
3. Save and restart Spark controller from Ambari.
3.1.6 Start or Stop SAP HANA Spark Controller
You can start, stop, or restart SAP HANA Spark controller from the Ambari UI.
Procedure
1. In the Summary section, select SparkController:
2. From the SparkController drop-down menu, select Start, Stop, or Restart.
3.1.7 SAP HANA Ambari Integration
The Apache Ambari integration with SAP HANA cockpit allows you to enter the Ambari Web URL in the cockpit and access Hadoop cluster monitoring functionality using the Ambari Web UI.
After entering the Ambari Web URL, you can navigate to the Apache Ambari website and monitor Hadoop clusters. You can also use Ambari to set up Spark controller.
Related Information
Add the Ambari URL to SAP HANA Cockpit [page 19]
3.1.7.1 Add the Ambari URL to SAP HANA Cockpit
Add Ambari to the SAP HANA cockpit.
Context
After going to the Ambari Web URL, you can navigate to the Apache Ambari website and monitor Hadoop clusters.
Procedure
1. Import the cockpit delivery unit package (HANA_HADOOP_AMBR.tgz) into SAP HANA studio.
2. Using an account with the SAP HANA System Administrator role, assign these roles to all users requiring access to the web application site:
○ com.sap.hana.hadoop.cockpit.ambari.data::Administrator
○ sap.hana.uis.db::SITE_DESIGNER
○ sap.hana.uis.db::SITE_USER
3. In the Systems view, right-click the system name and select Configuration and Monitoring > Open SAP HANA Cockpit to launch the SAP HANA cockpit.
4. Log in to the cockpit using the SAP HANA username and password.
5. Select Hadoop Cluster on the home page.
If the Hadoop Cluster tile is not available, select Tile Catalog from the menu and add the Hadoop Cluster tile to a desired group.
6. For each cluster, provide a Hadoop cluster name and an Ambari URL (for example, http://my.ambari.server.url:8080).
7. Select a Hadoop cluster and click on Go to navigate to the Ambari website.
3.1.8 Uninstall from Ambari
Follow these steps to remove SAP HANA Spark controller.
Procedure
1. Log in to the Ambari Server terminal.
To check the service, set it to the INSTALLED state, and then delete it, execute the following commands:
curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-By: ambari" -X GET "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"
curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-By: ambari" -i -k -X PUT -d '{"ServiceInfo": {"state": "INSTALLED"}}' "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"
curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-By: ambari" -X DELETE "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"
To remove the SparkController directory from Ambari Server:
sudo rm -rf /var/lib/ambari-server/resources/stacks/HDP/<hdp_version>/services/SparkController/
2. To remove and uninstall Spark controller, log in to the Ambari Agent terminal:
To remove the SparkController directory from Ambari Agent, execute:
sudo rm -rf /var/lib/ambari-agent/cache/stacks/HDP/<hdp_version>/services/SparkController/
To uninstall Spark controller using Ambari, execute:
sudo su
rm -rf /usr/sap/spark/controller
userdel hanaes
rm -rf /var/log/hanaes/
3.2 Cloudera Manager
If you are using the Cloudera distribution of Hadoop, set up Spark controller using the Cloudera Manager.
Related Information
Installation Prerequisites (Cloudera Manager) [page 21]
Install SAP HANA Spark Controller Using Cloudera Manager [page 22]
Post Installation Checks and Troubleshooting (Cloudera) [page 28]
Modify Configuration Properties (Cloudera Manager) [page 29]
Run the Diagnostic Utility [page 30]
Start or Stop the SAP HANA Spark Controller Service [page 32]
Uninstall from Cloudera Manager [page 33]
3.2.1 Installation Prerequisites (Cloudera Manager)
This section provides a list of installation prerequisites when using Spark controller with Cloudera Manager.
Task Description
SAP HANA Install one of these Spark controller-compatible SAP HANA versions:
● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02
Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.
Use the Spark version distributed with Cloudera Manager.
The Hadoop Cloudera distributions CDH 5.10 and 5.11 have been tested. Additional Cloudera versions are expected to be compatible with Spark controller, but have not been tested.
Download Spark controller
Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
Note
For Cloudera Manager installations with Spark controller versions 2.0 SP01 and higher, use the parcels distribution format. You can use either parcels or packages if you manually deploy your Cloudera cluster. If you have already installed Cloudera using packages, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.
Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:
Cloudera – $SPARK_HOME defaults to /usr/lib/spark in package installations and /opt/cloudera/parcels/CDH/lib/spark in parcel installations.
During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the spark assembly file.
If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .
Proxy User Spark controller impersonates the currently logged-in user while accessing Hadoop services. You can configure user proxy settings for the hanaes user in the core-site.xml file.
See Configure hanaes User Proxy Settings [page 60].
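As a sketch, the proxy entries for the hanaes user in core-site.xml might look like the following. The * wildcards are a permissive illustrative assumption; restrict hosts and groups as appropriate, and see Configure hanaes User Proxy Settings [page 60] for the documented settings.

```xml
<!-- Illustrative core-site.xml fragment; wildcard values are example assumptions -->
<property>
  <name>hadoop.proxyuser.hanaes.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hanaes.groups</name>
  <value>*</value>
</property>
```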
3.2.2 Install SAP HANA Spark Controller Using Cloudera Manager
Install SAP HANA Spark controller using the Cloudera Manager.
Prerequisites
See Installation Prerequisites (Cloudera Manager) [page 21].
Procedure
1. Move the Spark controller 2.0 SP02 download package to the location where the cloudera-scm-server service is running on your Cloudera Manager host. See Download SAP HANA Spark Controller [page 7].
2. Extract the download file, which is a TAR archive file that contains three types of installer binaries: RPM, Ambari, and Cloudera.
After extracting the file, you can choose the operating system, version, and distribution for your system.
3. Extract the SAPHanaSparkController-<version>-cloudera-<distribution>.tar.gz file to the
Cloudera Manager directory. For example:
sudo tar -xvf SAPHanaSparkController-<version>-cloudera-<distribution>.tar.gz -C /opt/cloudera
You see the directory SAPHanaSparkController-<version>-cloudera-<distribution>. Within this directory are the csd and parcel-repo directories:
total 7524
SAPHanaSparkController-2.2.0-el7/
SAPHanaSparkController-2.2.0-el7/csd/
SAPHanaSparkController-2.2.0-el7/csd/SAPHanaSparkController-2.0.0.jar
SAPHanaSparkController-2.2.0-el7/parcel-repo/
SAPHanaSparkController-2.2.0-el7/parcel-repo/SAPHanaSparkController-2.0.0-sles11.parcel
SAPHanaSparkController-2.2.0-el7/parcel-repo/SAPHanaSparkController-2.0.0-sles11.parcel.sha
SAPHanaSparkController-2.2.0-el7/parcel-repo/manifest.json
4. When you install Cloudera Manager, you have the option to activate single user mode. In single user mode, the Cloudera Manager Agent and all the processes run by services managed by Cloudera Manager are started as the single configured user and group cloudera-scm. If you are in single user mode, change the owner permissions to cloudera-scm for both user and group. For example:
[root cloudera]# chown cloudera-scm:cloudera-scm -R SAPHanaSparkController-2.2.0-el7/
[root cloudera]# ll
total 20
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 June 18 00:02 csd
drwxr-xr-x 2 root root 4096 June 18 00:03 parcel-cache
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 June 18 00:04 parcel-repo
drwxr-xr-x 4 root root 4096 June 18 00:05 parcels
drwxr-xr-x 4 cloudera-scm cloudera-scm 4096 June 7 01:57 SAPHanaSparkController-2.2.0-el7
5. Copy the files from SAPHanaSparkController-<version>-<distribution> into their respective folders in /opt/cloudera/. For example:
sudo cp <extract_path>/SAPHanaSparkController-<version>-<distribution>/csd/* /opt/cloudera/csd/
sudo cp <extract_path>/SAPHanaSparkController-<version>-<distribution>/parcel-repo/* /opt/cloudera/parcel-repo/
Directory Files
/opt/cloudera/csd
○ SAPHanaSparkController-<version>.jar
/opt/cloudera/parcel-repo
○ manifest.json
○ SAPHanaSparkController-<version>-<distribution>.parcel
○ SAPHanaSparkController-<version>-<distribution>.parcel.sha
6. To make sure the Cloudera server service can access the command scripts to start and stop Spark controller, run the following command on the console of your Cloudera server host. For example, from the /opt/cloudera/parcel-repo directory:
sudo service cloudera-scm-server restart
7. Log in to your Cloudera Manager Web UI. The Cloudera Manager displays that changes have been made since the last restart, and indicates that you need to restart the Cloudera Management Service.
8. Restart Cloudera Manager. This pulls the scripts and the descriptor from the SAPHanaSparkController-<version>.jar file.
9. Refresh the browser. When the services restart and their indicators display green check marks, you have restarted successfully.
10. Download, distribute, and activate new parcels:
a. Select Hosts > Parcels to download a new parcel.
b. Select Distribute to distribute the Spark controller parcel to each host on your Cloudera cluster.
c. Select Activate to create the hanaes user, and the sapsys group on your host.
11. On the home page of the Cloudera Manager Web UI, choose Add a Service from the drop-down menu to the right of the cluster name.
12. From the Service Type menu, select SAP HANA Spark Controller. Click Continue.
13. In the Add SAP HANA Spark Controller Service to Cluster menu, choose the host on which to install your Spark controller service and click Continue. Although it is possible to install on more than one host, SAP recommends that you install on only one host because the Spark controller service uses YARN for memory.
14. In the Add SAP HANA Spark Controller Service to Cluster, Review Changes menu, confirm the configuration settings for the installation, adjusting for your cluster.
This form provides a subset of properties and environment variables that you can define. After installing Spark controller you can define additional configuration properties.
Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].
15. In the Path of Spark Assemble Jar field, add the absolute path of the Spark assembly JAR file in the Review Changes menu. For example:
/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar
Cloudera is compatible with the Spark assembly versions that are included with the Cloudera Manager version you installed.
16. The Extra Classpath for Spark Executors field is set with a default value. Make sure the classpath for Spark executors is correct. For example:
Key: spark.executor.extraClassPath
Value: /opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hive/lib/*
17. Click Continue to install and start Spark controller.
18. Click Continue and restart any dependent services.
19. When the installation scripts are complete and the HDFS Hanaes directory is created, click Finish.
You can confirm that Spark controller is started by viewing the log file. For example:
tail -f /var/log/hanaes/hana_controller.log
Next Steps
● Post Installation Checks and Troubleshooting (Cloudera) [page 28].
● Run the Diagnostic Utility [page 30].
● Create a Remote Source [page 82].
3.2.3 Post Installation Checks and Troubleshooting (Cloudera)
This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.
Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Run the Diagnostic Utility [page 30].
The table below provides information about post installation configurations and troubleshooting.
Task Description
Environment variables
When installing Spark controller, you set the location for Path of Spark Assemble Jar. This path is set in the hana_hadoop-env.sh file. To confirm or change the path, see Environment Variables for hana_hadoop-env.sh [page 46].
Hive Metastore To allow Spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in Spark controller's classpath.
The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_PATH environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.
Configure hanaes When installing Spark controller, you set properties such as Extra Classpath for Spark Executors in the Add SAP HANA Spark Controller Service to Cluster menu. These configuration properties are defined in the hanaes-site.xml file. Properties can be added or changed in this file, or by using Cloudera Manager. See Modify Configuration Properties (Cloudera Manager) [page 29] and Configuration Properties [page 49].
These properties are required:
● sap.hana.es.server.port
● spark.executor.extraClassPath
● spark.executor.memory
● spark.executor.instances
Cloud deployment Configure Spark controller for cloud deployment. See Configuring Cloud Deployment Example [page 59].
Upgrading When updating to a new version of Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].
3.2.4 Modify Configuration Properties (Cloudera Manager)
After you install SAP HANA Spark controller, you can modify configuration properties.
Prerequisites
You have installed Spark controller and Spark Controller Service is listed on your Cloudera Manager home page.
Procedure
1. On the Cloudera Manager SAP HANA Spark Controller page, click the Configuration tab.
You see the current property settings:
2. The SAP HANA Spark Controller Master Advanced Configuration Snippet (Safety Valve) for Conf/hanaes-site.xml field allows you to set properties for DLM scenarios, cache properties, SSL properties, and so on.
You can set properties in the Editor or as XML:
3. When you are finished setting properties, click Save Changes and restart Spark controller.
3.2.5 Run the Diagnostic Utility
Use the diagnostic utility from Cloudera Manager to check your Spark controller installation for errors.
Prerequisites
Spark controller is installed, configured, and is stopped.
Procedure
1. From the Cloudera Manager home page, click on SAP HANA Spark Controller.
2. On the SAP HANA Spark Controller Status page, select the Instances tab.
3. On the Instances page, click on the SAP HANA Spark Controller Master instance.
4. From the Actions pull-down, select Run DiagnosticUtility.
5. Confirm that you want to run the utility from the pop-up menu.
The Diagnostic Utility checks your Spark controller installation and provides information about installation and configuration errors, and recommendations. The information is displayed in three tabs: stdout, stderr, and Role Log.
6. Click on the stdout tab to view recommendations and errors. Click Full log file to see the entire output.
Each file provides error and debugging information:
○ stdout – Shows the error codes that are assigned to the different types of errors.○ stderr – Shows information for debugging purposes.○ Role Log – Shows the hana_controller.log file.
7. Use the Error Messages table to find information about the error codes listed in the output: Error Messages [page 91].
3.2.6 Start or Stop the SAP HANA Spark Controller Service
Start or stop SAP HANA Spark controller from the Cloudera Manager Web UI.
Prerequisites
You have installed Spark controller and Spark Controller Service is listed on your Cloudera Manager home page.
Procedure
● On the Cloudera Manager SAP HANA Spark Controller Status tab, select Start from the SAP HANA Spark Controller Actions drop-down menu.
You see that the SAP HANA Spark Controller Master is started and succeeded. On the Cloudera Manager Web UI home page the status of Spark Controller services is green.
From the same drop-down menu, select Stop to stop the service.
Note
Do not use Restart. Using Restart displays an error similar to Abruptly stop the remaining roles. Failed to execute command Stop on service Spark Controller, and the details view shows At least one role must be started. To restart the service, use Stop, then Start.
3.2.7 Uninstall from Cloudera Manager
Uninstall Spark controller using Cloudera Manager by stopping and deleting the Spark controller service, then removing Spark controller's installation files from the file system.
Context
Cloudera provides universal instructions for uninstalling managed software. For more information, see Uninstalling Cloudera Manager and Managed Software in Cloudera's Installation and Upgrade documentation.
Procedure
1. Stop and delete the Spark controller service from Cloudera Manager.a. Log on to the Cloudera Manager Web UI and locate the SAP HANA Spark Controller Actions menu for
your cluster.
b. From the pull-down menu, select Stop.
The Stop command shows the details and warning when shutting down the controller.
c. Once Spark controller has stopped, select Delete from the SAP HANA Spark Controller Actions menu to remove the service from your cluster.
d. From the Cloudera Manager navigation bar, select Hosts > Parcels.
e. Locate SAPHanaSparkController in the parcel list and choose Deactivate.
f. Choose Remove from host.
g. Choose Delete.
2. Remove the Spark controller installation files from the file system.
During the installation of the SAP HANA Spark Controller, you copied installer files to the Cloudera location. In this step, you remove them.
a. From the command line, remove the Spark controller installation files from the following directories:
/opt/cloudera/csd
/opt/cloudera/parcel-repo
/opt/cloudera/parcel-cache
b. Restart the Cloudera Server to apply all of the changes:
sudo service cloudera-scm-server restart
c. Log on to the Cloudera Manager Web UI and open the drop-down menu next to the Cloudera Management Service. Select Restart to restart the service.
This makes sure that any scripts that may still be cached are removed from the Cloudera Manager.
3.3 Manual
Install and configure SAP HANA Spark controller manually.
If you are using Hortonworks or Cloudera distributions, Spark controller can be installed using Ambari or Cloudera Manager, respectively.
The MapR distribution requires manual installation.
Related Information
Installation Prerequisites (Manual) [page 35]
Manually Install SAP HANA Spark Controller [page 36]
Post Installation Checks and Troubleshooting (Manual) [page 38]
Start SAP HANA Spark Controller [page 39]
Uninstall from a Manual Installation [page 40]
Update SAP HANA Spark Controller [page 41]
3.3.1 Installation Prerequisites (Manual)
This section provides a list of prerequisites when installing Spark controller manually.
Task Description
SAP HANA Install one of these Spark controller-compatible SAP HANA versions:
● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02
Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.
For more information, see the compatibility matrix in SAP HANA Spark Controller 2.0 SP02 Support for Hadoop Distributions.
Download Spark controller
Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
Note
For Cloudera Manager installations with Spark controller versions 2.0 SP01 and higher, use the parcels distribution format. You can use either parcels or packages if you manually deploy your Cloudera cluster. If you have already installed Cloudera using packages, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.
Update YARN properties
For MapR installations, add YARN properties to the yarn-site.xml file.
See Add Properties for YARN [page 43].
Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:
● Cloudera – $SPARK_HOME defaults to /usr/lib/spark in package installations and /opt/cloudera/parcels/CDH/lib/spark in parcel installations.
● Hortonworks – $SPARK_HOME defaults to /usr/hdp/current/spark-client.
● MapR – $SPARK_HOME defaults to /opt/mapr/spark/spark-<version>/lib/spark-assembly-<version>-mapr-<version>-hadoop<version>-mapr-<version>.jar
During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the spark assembly file.
If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .
Note
For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example).
Proxy User For Hortonworks and Cloudera only. MapR distributions do not require proxy settings.
Spark controller impersonates the currently logged-in user while accessing Hadoop services. You can configure user proxy settings for the hanaes user in the core-site.xml file.
See Configure hanaes User Proxy Settings [page 60].
3.3.2 Manually Install SAP HANA Spark Controller
Install SAP HANA Spark controller manually.
Prerequisites
See Installation Prerequisites (Manual) [page 35].
Context
Although you can deploy your Cloudera or Hortonworks cluster manually, SAP recommends that you use Cloudera Manager or Ambari for the installation. For more information, see Cloudera Manager [page 20] and Ambari [page 11].
Procedure
1. The Spark controller download file is a TAR archive file that contains three types of installer binaries: RPM, Ambari, and Cloudera. Extract the rpm file.
2. Install Spark controller on one of the Hadoop cluster nodes by executing one of the following.
○ Linux – sudo rpm -i sap.hana.spark.controller-<version>.noarch.rpm
○ Debian Linux – sudo alien -c -i sap.hana.spark.controller-<version>.noarch.rpm
NoteIf alien is not available, install it by executing:
sudo apt-get install alien
The installation path is predefined to: /usr/sap/spark. The hanaes account is created during the installation process and is the owner of the installed Spark controller directories and files. The default owning group is sapsys.
3. Log on as the hanaes user: sudo su - hanaes.
4. Confirm that the following folder structure is created, and is owned by user hanaes.
/usr/sap/spark/controller/conf
/usr/sap/spark/controller/bin
/usr/sap/spark/controller/lib
/usr/sap/spark/controller/utils
5. Spark and YARN create a staging directory under the /user/hanaes directory on HDFS. Make sure that user hanaes has full access to the directory by creating this directory manually and assigning the necessary permissions to the hanaes user. For example:
hdfs dfs -mkdir /user/hanaes; hdfs dfs -chown hanaes:hdfs /user/hanaes; hdfs dfs -chmod 744 /user/hanaes;
For MapR:
su mapr
hadoop fs -mkdir /user/hanaes
hadoop fs -chown hanaes:sapsys /user/hanaes
6. Configure Spark controller.
a. If you are not logged in as the hanaes user, execute: sudo su - hanaes.
b. Go to /usr/sap/spark/controller/conf, and review or edit the hana_hadoop-env.sh file.
The default file contains the following environment variables:
○ HANA_SPARK_ASSEMBLY_JAR – (Required) Enter the path of the Spark assembly JAR. This location depends on your Hadoop distribution.
○ HADOOP_CLASSPATH – (Optional) Enter the location of the Hadoop and Hive libraries.
○ HADOOP_CONF_DIR – (Required) Directory containing all *-site.xml files. The default location is: /etc/hadoop/conf.
○ HIVE_CONF_DIR – (Required) Directory containing the hive-site.xml file. The default location is: /etc/hive/conf.
Additional configuration properties and templates are listed here: Environment Variables for hana_hadoop-env.sh [page 46].
c. Go to /usr/sap/spark/controller/conf, and review or edit the hanaes-site.xml file.
The default file contains the following properties:
○ sap.hana.es.server.port – (Required) 7860 is the default listening port, which exchanges requests with SAP HANA. The listening port +1 is used to transmit data.
○ sap.hana.es.driver.host – (Optional) If the host on which Spark controller is installed has multiple network interfaces, or if your hosts file contains an ambiguous resolution, you can specify the name or IP address of the host where Spark controller is running.
○ sap.hana.executor.count – (Required) Number of YARN executors.
○ sap.hana.executor.memory – (Required) Allocates the amount of memory to use per YARN executor node.
○ sap.hana.hadoop.engine.facades – (Required) Defines the Hadoop processing engines.
○ sap.hana.es.warehouse.dir – (Required for DLM scenarios only) Defines the DLM warehouse directory location.
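Put together, a minimal hanaes-site.xml sketch based on the properties above might look like the following. The port is the documented default; the executor count, memory value, and facade value are illustrative placeholders — see Configuration Properties [page 49] for the actual values to use.

```xml
<!-- Illustrative hanaes-site.xml sketch; non-port values are placeholder assumptions -->
<configuration>
  <property>
    <name>sap.hana.es.server.port</name>
    <value>7860</value>
  </property>
  <property>
    <name>sap.hana.executor.count</name>
    <value>4</value>
  </property>
  <property>
    <name>sap.hana.executor.memory</name>
    <value>8g</value>
  </property>
  <property>
    <!-- Facade value is a placeholder; see Configuration Properties [page 49] -->
    <name>sap.hana.hadoop.engine.facades</name>
    <value>spark</value>
  </property>
</configuration>
```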
Additional configuration properties are listed here: Configuration Properties [page 49].
7. Start Spark controller:
cd /usr/sap/spark/controller/bin
./hanaes start
Next Steps
● Post Installation Checks and Troubleshooting (Manual) [page 38].
● Troubleshooting Diagnostic Utility [page 88].
● Create a Remote Source [page 82].
3.3.3 Post Installation Checks and Troubleshooting (Manual)
This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.
Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Troubleshooting Diagnostic Utility [page 88].
The table below provides information about post installation configurations and troubleshooting.
Task Description
Include the datanucleus path
For MapR installations, add the datanucleus path classes. See Spark DataNucleus JARs [page 47].
Hive Metastore To allow Spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in Spark controller's classpath.
The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_PATH environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.
Configure hanaes Configure Spark controller properties in the hanaes-site.xml file.
These properties are required:
● sap.hana.es.server.port
● spark.executor.extraClassPath (Cloudera)
● spark.sql.hive.metastore.sharedPrefixes (Hortonworks and MapR)
● spark.executor.memory
● spark.executor.instances
Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].
Upgrading When updating to a new version of Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].
3.3.4 Start SAP HANA Spark Controller
Once you have updated all configuration parameters, follow these steps to start SAP HANA Spark controller.
Procedure
1. Start Spark controller:
Option Description
Hortonworks sudo su - hanaes; cd /usr/sap/spark/controller/bin/;./hanaes start;
Cloudera
sudo su - hanaes
export HADOOP_HOME=/usr/lib/hadoop/
cd /usr/sap/spark/controller/bin/
./hanaes start
Note
For the exact location of HADOOP_HOME, see the Cloudera documentation. Some Cloudera Hadoop versions may also have a specific JAVA_HOME. You can either add these environment variables to /usr/sap/spark/controller/conf/hana_hadoop-env.sh, or export them each time.
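The environment variables mentioned in the note above can be added to hana_hadoop-env.sh instead of exporting them for each session. A minimal sketch — the HADOOP_HOME and JAVA_HOME paths shown are assumptions; confirm the correct locations for your Cloudera version:

```shell
# Hypothetical additions to /usr/sap/spark/controller/conf/hana_hadoop-env.sh.
# Both paths are example assumptions, not fixed locations.
export HADOOP_HOME=/usr/lib/hadoop
export JAVA_HOME=/usr/java/default
```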
MapR sudo su - hanaes; cd /usr/sap/spark/controller/bin/; ./hanaes start;
2. Check the /var/log/hanaes/hana_controller.log file to make sure that Spark controller is started.
Next Steps
You can now consume remote data using SAP HANA smart data access.
3.3.5 Uninstall from a Manual Installation
Remove the Spark controller package to uninstall Spark controller. Optionally, you can remove the hanaes user, and the configuration files.
Procedure
1. Check for the Spark controller package:
rpm -qa | grep sap.hana.spark.controller
2. Remove the Spark controller package:
rpm -e sap.hana.spark.controller-*
3. (Optional) Remove the following structure to remove the hanaes user and the configuration files:
/usr/sap/spark/controller/conf
/usr/sap/spark/controller/bin
/usr/sap/spark/controller/lib
/usr/sap/spark/controller/utils
3.3.6 Update SAP HANA Spark Controller
Manually update the SAP HANA Spark controller installation.
Prerequisites
● Spark controller must be running when executing the upgrade command.
● Check for changes to configuration parameters. See Update Configuration Parameters when Upgrading [page 45].
Procedure
Execute the following command to update the Spark controller installation. For example:
rpm -Uvh sap.hana.spark.controller-2.2.0-1.noarch.rpm
The rpm arguments -v and -h are optional:
○ -v: verbose
○ -h: print hash marks as the package archive is unpacked.
3.4 MapR
If you are using the MapR distribution of Hadoop, set up SAP HANA Spark controller manually.
Related Information
Installation Prerequisites (MapR) [page 42]
Add Properties for YARN [page 43]
Install SAP HANA Spark Controller for MapR Distributions [page 43]
3.4.1 Installation Prerequisites (MapR)
This section provides a list of prerequisites when installing SAP HANA Spark controller manually.
Task Description
SAP HANA Install one of these Spark controller-compatible SAP HANA versions:
● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02
Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.
For more information, see the compatibility matrix in SAP HANA Spark Controller 2.0 SP02 Support for Hadoop Distributions.
Download Spark controller
Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0
Update YARN properties
Add YARN properties to the yarn-site.xml file.
See Add Properties for YARN [page 43].
Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:
MapR – $SPARK_HOME defaults to /opt/mapr/spark/spark-<version>/lib/spark-assembly-<version>-mapr-<version>-hadoop<version>-mapr-<version>.jar
During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the spark assembly file.
Note
For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example).
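Because the MapR assembly file name embeds several version strings, locating it with a wildcard is less error-prone than typing it out. A sketch, assuming the default MapR layout shown above (the find_assembly_jar helper and the throwaway demonstration directory are ours):

```shell
#!/bin/sh
# Sketch: pick up the first MapR-provided Spark assembly JAR under
# $SPARK_HOME/lib so HANA_SPARK_ASSEMBLY_JAR can point at it.
find_assembly_jar() {
  find "$1/lib" -name 'spark-assembly-*mapr*.jar' 2>/dev/null | head -n1
}

# Demonstration against a throwaway directory layout
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/lib"
touch "$SPARK_HOME/lib/spark-assembly-1.6.1-mapr-1605-hadoop2.7.0-mapr-1602.jar"
HANA_SPARK_ASSEMBLY_JAR=$(find_assembly_jar "$SPARK_HOME")
echo "$HANA_SPARK_ASSEMBLY_JAR"
```

In real use, set SPARK_HOME to your actual MapR Spark directory and export the resulting value.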
3.4.2 Add Properties for YARN
Before you install SAP HANA Spark controller, follow these steps to configure MapR.
Context
For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example). This allows the Spark application to be launched on YARN. Also, make sure you installed the Spark service when deploying the cluster.
Procedure
1. Add the following properties to /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml:
<property>
  <name>yarn.application.classpath</name>
  <value>/opt/mapr/hadoop/hadoop-<version>/etc/hadoop/:/opt/mapr/hive/hive-<version>/conf:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/lib/*</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>102400</value>
</property>
Note
For reference, see the MapR documentation at http://doc.mapr.com/display/MapR/yarn-site.xml.
2. Restart the YARN NodeManager (per-node agent).
3.4.3 Install SAP HANA Spark Controller for MapR Distributions
For MapR distributions, you need to install SAP HANA Spark controller manually.
See Manual [page 34].
4 Configuring SAP HANA Spark Controller
Configure Spark controller environment variables and override property values.
This section describes the values and properties used to configure Spark controller.
Configuration File Configuration Type
hanaes_site.xml
● Configuration Properties [page 49]
● Limit Resource Allocations [page 56]
● Configuring Cloud Deployment Example [page 59]

hana_hadoop-env.sh
● Environment Variables for hana_hadoop-env.sh [page 46]
● Distribution Deployment Configuration Templates [page 59]

hive-site.xml
● LDAP Authentication [page 67]
Related Information
Port Configurations [page 44]
Update Configuration Parameters when Upgrading [page 45]
Environment Variables for hana_hadoop-env.sh [page 46]
Spark DataNucleus JARs [page 47]
Configuration Properties [page 49]
Configure hanaes User Proxy Settings [page 60]
Configuring a Proxy Server [page 64]
Enabling Remote Caching [page 65]
4.1 Port Configurations
SAP HANA Spark controller is configured to use ports 7860 and 7861 by default.
Spark controller uses port 7860 to exchange requests with SAP HANA, and port 7861 is used by SAP HANA to transmit, or “tunnel,” the data. When tunneling, the data is sent from the Hadoop cluster nodes (executors), through Spark controller, to SAP HANA. Tunneling does not require a proxy server; however, a proxy server can be configured. See Configuring a Proxy Server [page 64].
To confirm ports 7860 and 7861 are available, execute:
netstat -nlp | grep 7860
netstat -nlp | grep 7861
Note
For non-proxy server environments, tunneling is performed with ports 56000 – 58000 open. The data is sent from the Hadoop nodes (executors) directly to SAP HANA; it does not pass through Spark controller. For more information, see https://launchpad.support.sap.com/#/notes/2554388
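The two netstat checks above can be wrapped in a small helper that reports which of the ports are already bound. A sketch (the ports_in_use function is ours; in real use you would pipe netstat -nlp into it):

```shell
#!/bin/sh
# Sketch: read "netstat -nlp"-style output on stdin and print any of the
# given ports that already appear as local listening addresses (field 4).
ports_in_use() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " ") }
    { for (i = 1; i <= n; i++) if ($4 ~ (":" p[i] "$")) seen[p[i]] = 1 }
    END { for (i = 1; i <= n; i++) if (p[i] in seen) print p[i] }'
}

# Demonstration with canned output; real usage: netstat -nlp | ports_in_use 7860 7861
printf 'tcp 0 0 0.0.0.0:7860 0.0.0.0:* LISTEN 123/java\n' | ports_in_use 7860 7861
```

An empty result means both default ports are free for Spark controller.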
4.2 Update Configuration Parameters when Upgrading
When updating to a new version of SAP HANA Spark controller, be aware of new and deprecated configuration parameters, and of changes to distribution formats.
● 2.0 SP02
The sap.hana.hadoop.engine.facades parameter has been added in 2.0 SP02. Use this parameter to list the facades connecting to Hadoop processing engines. This parameter replaces sap.hana.hadoop.datastore.
● 2.0 SP01
Spark controller 2.0 SP01 and higher supports only the parcels distribution format for Cloudera Manager installations. When upgrading:
○ If you have already installed Cloudera using the packages distribution format, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.
○ If you are manually installing Cloudera using packages, be sure to maintain the correct paths in these files:
hanaes-site.xml:
<property>
  <name>spark.executor.extraClassPath</name>
  <value>/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>
hana_hadoop-env.sh:
#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HANAES_LOG_DIR=/var/log/hanaes
export HANA_SPARK_ASSEMBLY_JAR=/usr/lib/spark/lib/spark-assembly-<version>-cdh<version>-hadoop<version>-cdh<version>.jar
export HADOOP_CLASSPATH=/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hive/lib/*
#export HANA_SPARK_ADDITIONAL_JARS=
#export HANAES_CONF_DIR=/etc/hanaes/conf
● 2.0Support for the hadoop.proxyuser.hanaes.hosts and hadoop.proxyuser.hanaes.groups parameters is new for version 2.0. If you are upgrading from an earlier version, you can use these parameters to configure user proxy settings. See Configure hanaes User Proxy Settings [page 60].
Related Information
Configuration Properties [page 49]
4.3 Environment Variables for hana_hadoop-env.sh
Configure SAP HANA Spark controller dependencies using these environment variables.
Note
Starting with version 1.6 PL1, the Spark controller installation does not respect dependencies copied into either the HDFS or the local file system.
Use the following environment variables to configure Spark controller dependencies in the conf/hana_hadoop-env.sh file.
Variable Name Description
JAVA_HOME By default, Spark controller uses the Java available in the local path. This can be overridden by setting this variable. Default value: None.
HADOOP_HOME (Optional) Directory where all components and libraries are installed. Default value: None.
HADOOP_CONF_DIR Directory where all *-site.xml files are available. Refer to your distribution documentation to identify the location. Default value: /etc/hadoop/conf.
HIVE_CONF_DIR File system location where hive-site.xml is available. Default value for Hortonworks and Cloudera: /etc/hive/conf.
Note
The value can be changed for the different distributions of Hadoop. For example, use the value of /opt/mapr/hive/hive-1.2/conf for MapR.
HANAES_LOG_DIR Location to which all Spark controller logs are written. Default: /var/log/hanaes.
HANA_SPARK_ASSEMBLY_JAR Required variable that points to the path of Spark assembly JAR. This location depends on your Hadoop distribution. If you manually downloaded the Spark assembly from Apache, specify the location of the assembly JAR. Default value: None.
HADOOP_CLASSPATH (Optional) Location of the Hadoop and Hive libraries. Different Hadoop distributions follow different installation paths; use this variable to configure them. Spark controller tries to locate them automatically. When you are running Spark 1.6, the Joda-Time dependencies are required. Configure the respective locations using this environment variable. Default value: None.
Note
The dependencies provided are made available in the local classpath and are not copied to the running Spark context.
HANA_SPARK_ADDITIONAL_JARS
Add additional dependency JARs, separated by colons. Default value: None.
Note
These dependencies are made available in the running Spark context. When connecting to SAP Vora, you can specify the SAP Vora data source dependency with this variable.
HANAES_CONF_DIR (Optional) Use this to override the configuration directory for Spark controller. Default value: /usr/sap/spark/controller/conf.
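The variables above are plain shell exports, so a configuration can be sanity-checked before starting Spark controller. A sketch (the check_env helper is ours; the demonstration builds a throwaway env file) that verifies the required HANA_SPARK_ASSEMBLY_JAR actually points at an existing file:

```shell
#!/bin/sh
# Sketch: source a hana_hadoop-env.sh file and confirm the one required
# variable, HANA_SPARK_ASSEMBLY_JAR, is set and resolves to a real file.
check_env() {
  . "$1"
  [ -n "$HANA_SPARK_ASSEMBLY_JAR" ] || { echo "HANA_SPARK_ASSEMBLY_JAR not set"; return 1; }
  [ -f "$HANA_SPARK_ASSEMBLY_JAR" ] || { echo "assembly JAR missing: $HANA_SPARK_ASSEMBLY_JAR"; return 1; }
  echo "ok"
}

# Demonstration with a temporary fixture
tmp=$(mktemp -d)
touch "$tmp/spark-assembly.jar"
printf 'export HANA_SPARK_ASSEMBLY_JAR=%s\n' "$tmp/spark-assembly.jar" > "$tmp/env.sh"
check_env "$tmp/env.sh"
```

In real use, point check_env at /usr/sap/spark/controller/conf/hana_hadoop-env.sh (or your HANAES_CONF_DIR override).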
4.4 Spark DataNucleus JARs
SAP HANA Spark controller has a dependency on the Spark DataNucleus libraries.
These DataNucleus libraries are required:
● datanucleus-api-jdo.jar – Provides the DataNucleus implementation of the JDO API.
● datanucleus-core.jar – Provides a DataNucleus persistence mechanism.
● datanucleus-rdbms.jar – Provides persistence to the RDBMS datastore.
Depending on the distribution and version, these Spark libraries may be missing from your environment, or your Hadoop environment might include the wrong version of the DataNucleus files. Common issues are: the libraries are missing from your installation, the currently installed libraries are incompatible with your Hadoop distribution and version, or multiple JAR versions are specified in the classpath.
For example, Apache Spark 1.6 is integrated with Cloudera 5.7 and later. If your Hadoop distribution was configured with a different version of Spark, the DataNucleus JARs included with your Spark installation may not be compatible with your distribution, and this results in errors raised by Spark controller.
If you see errors in the /var/log/hanaes/hana_controller.log file stating that the datanucleus-* classes are not found, check to see if the JARs are missing, or if the incorrect version of the DataNucleus JARs are included in your Hadoop environment. See the documentation for your distribution for more information:
● Cloudera – Product Compatibility Matrix
● Hortonworks – Hortonworks Documentation
● MapR – Interoperability Matrix
Related Information
Configuring the DataNucleus JARs [page 48]
4.4.1 Configuring the DataNucleus JARs
Include the DataNucleus libraries in your hana_hadoop-env.sh configuration.
Note
If you see an error in the /var/log/hanaes/hana_controller.log file stating that the datanucleus-* classes are not found, include these libraries in the hana_hadoop-env.sh configuration.
● Installing Spark Controller Using Ambari
1. Go to SparkController > Configs > Advanced hana_hadoop_env and add the path of the datanucleus-* libraries to HANA_SPARK_ADDITIONAL_JARS. For example:
export HANA_SPARK_ADDITIONAL_JARS=/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-rdbms-<version>.jar
2. Save the configurations and restart Spark controller.
● Installing Spark Controller Using Cloudera Manager
1. If you installed Spark controller using Cloudera Manager, go to SAP HANA Spark Controller > Configuration > HANA_SPARK_ADDITIONAL_JARS and add the path of the datanucleus-* libraries. For example:
export HANA_SPARK_ADDITIONAL_JARS=/usr/lib/hive/lib/datanucleus-api-jdo-<version>.jar:/usr/lib/hive/lib/datanucleus-core-<version>.jar:/usr/lib/hive/lib/datanucleus-rdbms-<version>.jar
2. Save the configurations and restart Spark controller.
● Manual Installation on Hortonworks or Cloudera
1. If you installed Spark controller manually on Hortonworks or Cloudera, edit /usr/sap/spark/controller/conf/hana_hadoop-env.sh and add the path of the datanucleus-* libraries to HANA_SPARK_ADDITIONAL_JARS.
For example:
export HANA_SPARK_ADDITIONAL_JARS=/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-rdbms-<version>.jar
2. Save the configurations and restart Spark controller.
● Manually Installing Spark Controller on MapR
1. If you installed Spark controller manually on MapR, edit /usr/sap/spark/controller/conf/hana_hadoop-env.sh and add the path of the datanucleus-* libraries to HANA_SPARK_ADDITIONAL_JARS. For example:
export HANA_SPARK_ADDITIONAL_JARS=/opt/mapr/hive/<hive_version>/lib/datanucleus-api-jdo-<version>.jar:/opt/mapr/hive/<hive_version>/lib/datanucleus-core-<version>.jar:/opt/mapr/hive/<hive_version>/lib/datanucleus-rdbms-<version>.jar
2. Save the configurations and restart Spark controller.
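Since all four procedures above end with the same colon-separated export, the value can be assembled automatically from whatever datanucleus-*.jar files a directory contains. A sketch (the helper and the demonstration directory are ours; substitute your distribution's library path):

```shell
#!/bin/sh
# Sketch: join every datanucleus-*.jar under a directory into one
# colon-separated string suitable for HANA_SPARK_ADDITIONAL_JARS.
datanucleus_classpath() {
  find "$1" -name 'datanucleus-*.jar' | sort | tr '\n' ':' | sed 's/:$//'
}

# Demonstration with a temporary fixture
tmp=$(mktemp -d)
touch "$tmp/datanucleus-api-jdo-4.2.1.jar" "$tmp/datanucleus-core-4.1.6.jar" "$tmp/datanucleus-rdbms-4.1.7.jar"
HANA_SPARK_ADDITIONAL_JARS=$(datanucleus_classpath "$tmp")
echo "$HANA_SPARK_ADDITIONAL_JARS"
```

In real use, point the helper at your Hive or Spark library directory (for example, /usr/lib/hive/lib) and paste the result into hana_hadoop-env.sh.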
4.5 Configuration Properties
Define values in the hanaes_site.xml file to override default configuration properties for Spark and Spark controller.
The file is located in /usr/sap/spark/controller/conf.
Note
Spark controller respects all other Spark parameters that start with spark. Use these standard Spark parameters to change the general behavior of Spark controller. See https://spark.apache.org/docs/1.6.1/configuration.html
Spark Controller Ports and Hosts
Name Default Values
Description
sap.hana.es.server.port 7860 The Spark controller listening port number that exchanges requests with SAP HANA.
Port 7861 is used by SAP HANA to transmit data. This port is calculated as sap.hana.es.server.port + 1.
sap.hana.es.driver.host None IP address of the host where Spark controller is running. If the host on which Spark controller is installed has multiple network interfaces, or if your hosts file (/etc/hosts) contains ambiguous resolution, maintain the property with the IP address of your host.
sap.hana.dmz.proxy.host None IP address of the proxy server you wish to use for tunneling data through a proxy server.
See Port Configurations [page 44].
Spark Controller Class Path
Name Default Values
Description
spark.executor.extraClassPath (Cloudera)
spark.sql.hive.metastore.sharedPrefixes (Hortonworks and MapR)
None Defines location of shared libraries. Provides extra classpath entries to prepend to the classpath of executors.
See Distribution Deployment Configuration Templates [page 59].
Spark Controller HDFS Location
Name Default Values
Description
sap.hana.es.spark.yarn.jar None Location of the Spark Assembly JAR. Obtain the JAR from either Apache mirrors or respective Hadoop vendors.
sap.hana.es.lib.location None Location where all open source libraries are available.
Spark Controller Timeout
Name Default Values
Description
sap.hana.connection.timeout 120 Connection timeout in seconds (for all network traffic).
sap.hana.datatransfer.timeout 2 Connection timeout in minutes for SAP HANA to transfer data. The query is canceled when the time has elapsed.
Spark Controller Cache
Name Default Values
Description
sap.hana.es.cache.max.capacity
500 Maximum number of queries to be cached on disk.
sap.hana.es.enable.cache False Enables remote caching for Spark controller.
See Enabling Remote Caching [page 65].
Resource Allocations
Name Default Values
Description
spark.executor.instances None The number of YARN executors.
Do not specify both spark.executor.instances and spark.dynamicAllocation.enabled. If you do, this will result in an error.
spark.executor.memory None Allocates available memory for YARN executor nodes.
spark.yarn.queue None Allocates the percentage of resources on the dedicated queue for Spark controller on the YARN Resource Manager.
spark.dynamicAllocation.enabled
False Enables dynamic allocation of executors. Dynamic allocation also requires spark.shuffle.service.enabled to be set to true.
spark.shuffle.service.enabled False Enables the external shuffle service. The external shuffle service must be set on each worker node in the same cluster for this service to work properly.
spark.dynamicAllocation.minExecutors
None Minimum number of executors for Spark controller.
spark.dynamicAllocation.maxExecutors
None Maximum number of executors for Spark controller.
spark.dynamicAllocation.initialExecutors
None Initial number of executors for Spark controller.
See Resource Allocation [page 55].
Spark Connection to Hive (Ambari)
Name Default Values
Description
spark.sql.hive.metastore.sharedPrefixes
None Defines location of shared libraries. When using Ambari, use this property to configure the connection access from Spark to Hive Metastore.
See Install SAP HANA Spark Controller Using Ambari [page 14].
Compression
Name Default Values
Description
sap.hana.enable.compression False Enables or disables compression. Compression is only used for data exchange.
Hadoop Processing Engine Facades
Name Default Values
Description
sap.hana.hadoop.engine.facades
sparksql
Comma-separated list of facades connecting to Hadoop processing engines such as sparksql, hadoop (MapReduce), and vora.
Note
This property replaces sap.hana.hadoop.datastore.
Valid values are sparksql, hadoop, and vora.
Data Storage Format
Name Default Values
Description
sap.hana.es.data.format auto Specifies the data storage format when moving data from Spark controller to Hadoop.
Valid values are parquet, orc, or auto.
Cloud Deployment
Name Default Values
Description
sap.hana.ar.provider None Address of translation service. Useful for cloud scenarios.
See Configuring Cloud Deployment Example [page 59].
DLM Scenarios
Name Default Values
Description
sap.hana.es.warehouse.dir None When Spark controller is used for DLM scenarios, this property should point to a valid HDFS directory where you plan to store all aging data.
Security
Name Default Values
Description
sap.hadoop.kerberos.principal None Kerberos principal to be used for starting Spark controller.
Note
Replaces the spark.yarn.principal property.
sap.hadoop.kerberos.keytab None Kerberos key tab file to be used for the principal.
Note
Replaces the spark.yarn.keytab property.
sap.hana.es.ssl.enabled False Indicates whether to use secure communication.
sap.hana.es.ssl.keystore None Path to the PKCS keystore file.
sap.hana.es.ssl.keystore.password
None Password for the PKCS keystore file.
sap.hana.es.ssl.truststore None Path to the JKS truststore file. Set the value to Java Trust Store if explicit trust is warranted.
sap.hana.es.ssl.truststore.password
None Password for the trust store file.
sap.hana.es.ssl.verify.hostnames
False Indicates whether to check hostname against the certificate used for the SSL handshake.
sap.hana.es.ssl.clientauth.required
True Indicates whether client authentication is required.
sap.hana.auditing.enabled False If enabled, Spark controller logs all executed queries.
sap.hana.proc.security.disabled
False Disables security manager for procedure execution.
sap.hana.allow.nonkerberos.client
False Allows non-Kerberos clients to connect to a Kerberos-enabled Spark controller. Set this property to true only when running SAP HANA instances that do not support Kerberos SSO. Otherwise, this setting allows every logged-in SAP HANA user to connect to Spark controller.
See Setting Up Security [page 67].
Spark Controller Properties for Older Versions of Spark and SAP HANA
Name Default Values
Description
sap.hana.sql.tungsten.enabled None Effective only when using Spark 1.5.2. Improves Spark execution by optimizing Spark jobs for CPU and memory efficiency.
sap.hana.use.dot.separator False Earlier versions of SAP HANA and Spark controller used a dot (.) as the separator between the schema name and table name, such as SYSTEM.SALES_ORDER. This was changed in later versions. If you are running SAP HANA 1.0, up to SPS12, set this property to true. When running SAP HANA 2.0 and higher, this property is not required.
Related Information
Resource Allocation [page 55]
Configuring Cloud Deployment Example [page 59]
Distribution Deployment Configuration Templates [page 59]
4.5.1 Resource Allocation
Spark configurations for resource allocation are set in spark-defaults.conf or hanaes_site.xml.
You can limit the number of executors with the Spark property spark.executor.instances, or create a dedicated queue for Spark controller on YARN ResourceManager.
You can also enable dynamic allocation, which allows Spark to dynamically scale the cluster resources allocated to the Spark application. When dynamic allocation is enabled and there is a backlog of pending tasks for a Spark application, the application can request new executors. When the application becomes idle, its executors are released and can be acquired by other Spark applications.
Note
These two methods are not compatible. Do not specify both spark.executor.instances and spark.dynamicAllocation.enabled.
Related Information
Limit Resource Allocations [page 56]Enable Dynamic Allocation of Executors [page 57]
4.5.1.1 Limit Resource Allocations
Follow these steps to limit resource allocation for SAP HANA Spark controller.
Context
Spark controller runs on YARN in yarn-client mode. A Spark context requests executors to run the job. To prevent Spark from taking up all available resources (leaving no resources for any other application running on YARN ResourceManager), perform one of the following, listed in order of preference considering ease of configuration and stability:
Procedure
● Limit the number of executors by setting the <desired_number_of_executors> parameter in the hanaes-site.xml file:
<property>
  <name>spark.executor.instances</name>
  <value><desired_number_of_executors></value>
  <final>true</final>
</property>
● Create a dedicated queue for Spark controller on YARN ResourceManager and allocate a percentage of resources to that queue. After you create the queue, maintain the <newly_created_queue> parameter in hanaes-site.xml:
<property>
  <name>spark.yarn.queue</name>
  <value><newly_created_queue></value>
  <final>true</final>
</property>
See the YARN documentation for information about YARN queue creation.
4.5.1.2 Enable Dynamic Allocation of Executors
Enable dynamic allocation of executors by specifying a minimum, maximum, and initial number of executors for Spark. This ensures that the computing capacity is elastic and robust for scenarios where the Hadoop cluster is shared by various data processing applications.
Context
The following steps describe how to configure dynamic allocation for manual installations by editing property files.
Procedure
1. After installing Spark controller, add the following spark.* properties to the hanaes-site.xml file, which is typically located in the /usr/sap/spark/controller/conf directory:
These spark.* properties are not specific to Spark controller and are provided as an example of how to configure Spark dynamic allocation. See the Spark documentation for information about dynamic allocation properties and how to determine the appropriate values for your environment.
<property>
  <name>spark.shuffle.service.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.minExecutors</name>
  <value>4</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.maxExecutors</name>
  <value>8</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.initialExecutors</name>
  <value>4</value>
  <final>true</final>
</property>

2. Edit the yarn-site.xml file.
○ Cloudera Manager – Log in to Cloudera Manager, navigate to YARN > Configuration, and select Yarn Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.
○ Ambari – Log in to the Ambari Web UI and select YARN > Config.
○ Manual – Open the yarn-site.xml file in a text editor.
3. Add the following properties and values:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
4. Copy the spark-<version>-yarn-shuffle.jar file from Spark to the Hadoop YARN classpath on all the NodeManager hosts. This file is typically located in /usr/lib/hadoop-yarn/lib.
For Hortonworks, this folder is typically located in /usr/hdp/<hdp_version>/hadoop-yarn/lib.
5. Save the changes, then restart YARN and the node manager.
Hortonworks
Context
For some older Hortonworks versions, you may also need to perform the following steps.
Procedure
1. Locate and open the mapred-site.xml file, or in the Ambari Web UI, select MapReduce2 > Configs.
2. Update the property mapreduce.application.classpath by removing all entries containing <hdp_version>.
3. Restart the MapReduce job.
4.5.2 Configuring Cloud Deployment Example
Configure SAP HANA Spark controller for cloud deployment.
Context
Note
You do not need to set this property if both SAP HANA and the Hadoop cluster are hosted in the cloud.
If your Spark controller is running on a cluster that is hosted in the cloud (such as on Amazon Web Services), your host machines use different IP addresses for internal and external communication. If SAP HANA is running on premise and attempts a connection to Spark controller running in the cloud, the external IP addresses of the hosts are unavailable, and query executions fail because SAP HANA cannot reach the executor instances directly.
Procedure
Spark controller versions 1.5 PL 2 and higher offer a built-in translation service for Amazon cloud computing. If you are using AWS, enable proper hostname translation with the following configuration in the hanaes-site.xml file:
<property>
  <name>sap.hana.ar.provider</name>
  <value>com.sap.hana.spark.aws.extensions.AWSResolver</value>
  <final>true</final>
</property>
Cloud providers typically offer a service to translate internal host names to an external host name. You can also implement custom translators by implementing controller extension APIs.
4.5.3 Distribution Deployment Configuration Templates
Use these templates to configure SAP HANA Spark controller for your distribution.
Update the hanaes-site.xml file located in the /usr/sap/spark/controller/conf directory.
● Cloudera (Parcel)
Use for a parcel distribution format to deploy your Cloudera cluster. The path for deploying Cloudera differs when using parcels or packages. This path is not related to the file format when installing Spark controller.
<property>
  <name>spark.executor.extraClassPath</name>
  <value>/opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>
● Cloudera (Package)
Use for a package distribution format to deploy your Cloudera cluster. The path for deploying Cloudera differs when using parcels or packages. This path is not related to the file format when installing Spark controller.
<property>
  <name>spark.executor.extraClassPath</name>
  <value>/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>
● Hortonworks
<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>
● MapR
<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>org.apache.hadoop</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>
4.6 Configure hanaes User Proxy Settings
SAP HANA Spark controller impersonates the currently logged-in user while accessing Hadoop services. To allow this impersonation, maintain appropriate configuration parameters in the core-site.xml file.
Note
This section describes configuring user proxy settings for the SAP HANA hanaes user, and is not related to the proxy settings for tunneling with a proxy server.
Use the cluster management tool (Ambari or Cloudera Manager) to add these entries to the core-site.xml file. For more information about proxy user settings, see the Ambari or Cloudera Manager documentation.
The following examples show the different configurations you can set in the core-site.xml file.
This example shows the hosts from which the hanaes user is allowed to perform an impersonation. Ideally, this is the host where Spark controller is installed. Maintaining “*” allows the hanaes user to impersonate from any host.
The syntax is:
hadoop.proxyuser.<proxy_user>.hosts
<property>
  <name>hadoop.proxyuser.hanaes.hosts</name>
  <value>*</value>
</property>
This example shows how a super user can impersonate user1 and user2 from hosts in the range 10.222.0.0/16 and from 10.113.221.221. This property accepts comma-separated lists of host names, or of IP addresses and IP address ranges in CIDR format:
<property>
  <name>hadoop.proxyuser.super.hosts</name>
  <value>10.222.0.0/16,10.113.221.221</value>
</property>
<property>
  <name>hadoop.proxyuser.super.users</name>
  <value>user1,user2</value>
</property>
If you have a group consisting of users that are connecting from SAP HANA, you can maintain the group name and allow the proxy user to impersonate any user belonging to that group. This example shows how to allow hanaes to impersonate users from any group. The syntax is:
hadoop.proxyuser.<proxy_user>.groups
<property>
  <name>hadoop.proxyuser.hanaes.groups</name>
  <value>*</value>
</property>
This example shows how to allow user tom to impersonate a user belonging to group1 and group2:
<property>
  <name>hadoop.proxyuser.tom.groups</name>
  <value>group1,group2</value>
</property>
Related Information
Ambari [page 62]Cloudera Manager [page 63]
4.6.1 Ambari
Use the Ambari Web UI to add the proxy user entries to the core-site.xml file.
Procedure
1. From the menu on the left side of the Ambari Dashboard, click HDFS.
2. Click the Configs tab.
3. Click the Advanced tab, then expand Custom core-site.
4. Click Add Property.
5. In the Key field, enter hadoop.proxyuser.hanaes.hosts, and in the Value field, enter an asterisk.
6. Click Add Property.
7. In the Key field, enter hadoop.proxyuser.hanaes.groups, and in the Value field, enter an asterisk.
8. Save your configuration changes.
Results
By specifying an asterisk for the hosts and groups, the user named hanaes can impersonate any user belonging to any group from any host.
4.6.2 Cloudera Manager
Use the Cloudera Manager Web UI to add the proxy user entries to the core-site.xml file.
Procedure
1. In the "New layout" mode of Cloudera Manager, go to Cloudera Manager > HDFS > Configuration.
2. In the Search box, type cluster-wide to find the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml settings.
3. Click the plus sign (+), and add the hadoop.proxyuser.hanaes.hosts and hadoop.proxyuser.hanaes.groups properties, each with the value * (asterisk).
Results
By specifying an asterisk for the hosts and groups, the user named hanaes can impersonate any user belonging to any group from any host.
4.7 Configuring a Proxy Server
You can configure SAP HANA Spark controller to tunnel data in environments where networking landscapes require a proxy server to operate between SAP HANA and Spark controller.
Context
When tunneling data with a proxy server, the data is sent from the Hadoop cluster nodes (executors) through Spark controller, and then through the proxy server to SAP HANA.
Procedure
1. Edit the hanaes-site.xml file to include the sap.hana.dmz.proxy.host parameter with the IP address of your proxy server.
2. Configure your proxy server.
SAP HANA Studio uses the defined proxy server and port when creating the Spark controller remote source. The following is an example configuration through an nginx proxy server:
tcp {
    upstream sparkcontroller {
        server <controller_host_domain>:<configured_port>;
    }
    upstream sparkcontrollerdata {
        server <controller_host_domain>:<configured_port + 1>;
    }
    server {
        listen <listen_proxy_port>;
        proxy_pass sparkcontroller;
    }
    server {
        listen <configured_port + 1>;
        proxy_pass sparkcontrollerdata;
    }
}
3. Create a remote source in SAP HANA Studio:
CREATE REMOTE SOURCE "proxy_spark" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=<password>'
Related Information
Configuration Properties [page 49]
4.8 Enabling Remote Caching
You can enable remote caches in Spark for queries with complex calculations, which allows you to reuse materialized data for repeated executions of the same query.
Context
When SAP HANA dispatches a virtual table query to Spark, the resulting series of Spark computations may take from a few minutes to several hours, depending on the data size in Hadoop and the current cluster capacity. In most cases, the data in the Hadoop cluster is not frequently updated, so successive executions of the same query repeat the same map and reduce jobs. Using remote caching with Hadoop through the Spark interface allows you to use the cached remote data set rather than wait for the query to be executed again. The first time you run a statement, you see no performance improvement because of the time it takes to run the job and sort the data in the table. The next time you run the same query, the execution time is reduced because you are accessing materialized data.
Note
Remote caching is available when using SAP HANA 2.0 and Spark controller 2.0 versions and higher.
Use this feature for Hive tables or extended storage tables with low-velocity data (which are not frequently updated).
Procedure
● To enable remote caching, add the following configuration to hanaes-site.xml:
<property>
  <name>sap.hana.es.enable.cache</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>sap.hana.es.cache.max.capacity</name>
  <value>5000</value>
  <final>true</final>
</property>
This behavior is controlled by using a hint to instruct the optimizer to use remote caching. For example:
After you create a virtual table called spark_activity_log, fetch all erroneous entries for plant 001:
select * from spark_activity_log where incident_type = 'ERROR' and plant ='001' with hint (USE_REMOTE_CACHE, USE_REMOTE_CACHE_MAX_LAG(7200))
When you use the hint USE_REMOTE_CACHE, this result set is materialized in Spark, and subsequent queries are served from the materialized view.
Related Information
Remote Caching Configuration Parameters [page 66]
4.8.1 Remote Caching Configuration Parameters
Use the following configuration parameters for remote caching, which are stored in the indexserver.ini file in the smart_data_access section.
Parameter Description
enable_remote_cache ( 'true' | 'false' )
A global switch to enable or disable remote caching for federated queries. This parameter only supports Hive sources. The USE_REMOTE_CACHE hint parameter is ignored when this parameter is disabled.
remote_cache_validity = 3600 (seconds)
Defines how long the remote cache remains valid. By default, the cache is retained for an hour.
USE_REMOTE_CACHE_MAX_LAG()
Defines how long, in seconds, the remote cache remains valid for an individual query. By default, the cache is retained for an hour. This value overrides the value of remote_cache_validity.
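As an alternative to editing indexserver.ini directly, these parameters can also be set with SQL. A minimal sketch, using the parameter names from the table above (run as a user with the INIFILE ADMIN privilege):

```sql
-- Enable remote caching globally and keep cached results valid for one hour.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('smart_data_access', 'enable_remote_cache') = 'true',
      ('smart_data_access', 'remote_cache_validity') = '3600'
  WITH RECONFIGURE;
```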
5 Setting Up Security
This section describes how to enable audit logs for Spark controller, and gives an overview of how to set up authentication for your Hadoop cluster using Kerberos or SSL.
Related Information
LDAP Authentication [page 67]
Configure Auditing [page 68]
Kerberos [page 69]
SSL [page 74]
5.1 LDAP Authentication
Configure and activate user name and password authentication in SAP HANA Spark controller.
There are various LDAP (Lightweight Directory Access Protocol) implementations, such as OpenLDAP or Active Directory (Apache or Microsoft); therefore the configuration steps and syntax may differ depending on the implementation and the Hadoop distribution.
To configure HiveServer2 to use LDAP or configure HiveServer2 to use LDAP over SSL (LDAPS), you must set up an LDAP server and create a user account on the LDAP server. The following links provide details for the different Hadoop distributions.
Distributions Documentation
Hortonworks 2.6.0 Data Access
Cloudera 5.8.x Using LDAP Username/Password Authentication with HiveServer2
MapR 5.0 Authentication for HiveServer2
Property changes should be made to the /etc/hive/conf/hive-site.xml file. The location of this file is set by the HIVE_CONF_DIR environment variable in the hana_hadoop-env.sh file.
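The exact property names depend on the distribution and LDAP implementation (see the links above), but a typical hive-site.xml fragment for plain LDAP authentication looks roughly like the following; the URL and base DN values are placeholders:

```xml
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>ou=people,dc=example,dc=com</value>
</property>
```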
After you set the properties, restart spark controller. You can now create remote sources and virtual tables in SAP HANA using the LDAP server username and password for authentication. For example:
CREATE REMOTE SOURCE SPARK_SQL ADAPTER "sparksql" CONFIGURATION 'server=<YOUR_SPARK_CONTROLLER_HOST>;port=<SPARK_CONTROLLER_PORT>;ssl_mode=disabled;' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hana;password=<password>';
5.2 Configure Auditing
Enable writing to audit logs.
Spark controller supports writing to audit logs. Enable writing audit logs by setting the following property:
<property>
  <name>sap.hana.auditing.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
Audit events are emitted as a set of key-value pairs for the following keys:
Key Value
ugi <user>,<group>[,<group>]*
client <client ip address>
cmd (QUERY_EXECUTE|CREATE_EXTENDED|DROP_EXTENDED)
sql <sql query that was executed>
schema (<schema>|NULL)
table (<table>|NULL)
This is a sample line of the audit output:
2016-07-28 14:32:04,182 ugi=hanaes,sapsys,hdfs client=1.2.3.4 cmd=QUERY_EXECUTE sql="SELECT "ORDERS01_SPARK_TESTA"."O_ORDERKEY", "ORDERS01_SPARK_TESTA"."O_CUSTKEY", "ORDERS01_SPARK_TESTA"."O_ORDERSTATUS", "ORDERS01_SPARK_TESTA"."O_TOTALPRICE", "ORDERS01_SPARK_TESTA"."O_ORDERDATE", "ORDERS01_SPARK_TESTA"."O_ORDERPRIORITY", "ORDERS01_SPARK_TESTA"."O_CLERK", "ORDERS01_SPARK_TESTA"."O_SHIPPRIORITY", "ORDERS01_SPARK_TESTA"."O_COMMENT" FROM "SYSTEM"."ORDERS01" "ORDERS01_SPARK_TESTA"" schema=NULL table=NULL
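Because each audit event is a flat set of key=value pairs, the log is easy to post-process for monitoring. A small sketch (the sample line below is abbreviated from the output above):

```shell
# Extract the cmd= field from an audit line, e.g. for alerting on DROP_EXTENDED.
line='2016-07-28 14:32:04,182 ugi=hanaes,sapsys,hdfs client=1.2.3.4 cmd=QUERY_EXECUTE schema=NULL table=NULL'
echo "$line" | grep -o 'cmd=[A-Z_]*' | cut -d= -f2   # prints QUERY_EXECUTE
```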
5.3 Kerberos
Kerberos is a protocol for establishing mutual identity trust, or authentication, for a client and a server, via a trusted third-party.
These overview instructions assume you know how to install Kerberos, or that you already have a working Kerberos key distribution center (KDC) and realm setup.
Enabling Kerberos on Your Hadoop Cluster
You must install Kerberos client packages on all cluster hosts and on any hosts that will be used to access the cluster. Refer to the security documentation for your Hadoop distribution for information about setting up your Hadoop cluster for Kerberos.
Reference Information:
● Cloudera Manager – Enabling Kerberos Authentication Using the Wizard
● MapR – Configuring Kerberos User Authentication
● Ambari – Setting Up Kerberos for Use with Ambari
Configure Kerberos for SAP HANA Instance
Reference Information:
● SAP HANA Administration Guide – Managing Single Sign-On (SSO) with Kerberos
● SAP HANA Security Guide – Single Sign-On Using Kerberos
● "SAP HANA Smart Data Access Single Sign-On Guide", attached to SAP Note 2303807
● "Single Sign-On with SAP HANA Database using Kerberos and Microsoft Active Directory", attached to SAP Note 1837331
● "SAP HANA SSO/Kerberos: create keytab and validate configuration script", attached to SAP Note 1813724
Kerberos 5 is installed with SAP HANA. It contains the S4U (Service for User) extension needed for user impersonation and constrained delegation. Constrained delegation means that delegation can be done only to a predefined set of services. For the purposes of protocol transition, the computer on which the server is installed must be entrusted for delegation by Microsoft Windows Active Directory. The Kerberos protocol is used in SAP HANA for authentication only, not for session management.
These are the SAP HANA Kerberos file requirements:
● <sidadm_home>/etc/krb5_hdb.conf
● <sidadm_home>/etc/krb5_hdb.keytab
● <sidadm_home>/etc/krb5_host.keytab
If the files are present in the <sidadm_home>/etc folder, the configuration is automatically taken from there, otherwise the default OS configuration in /etc/krb5.conf and /etc/krb5.keytab are used instead.
For a custom setup of Kerberos, you can overwrite the following variables in /usr/sap/<SID>/home/.customer.sh: KRB5_CONFIG, KRB5_KTNAME, KRB5_CLIENT_KTNAME. For example:
Sample Code
export KRB5_CONFIG=<conf file>
export KRB5_KTNAME=<hdb keytab file>
export KRB5_CLIENT_KTNAME=<host keytab file>
You can connect to an SAP HANA remote source using single sign-on (SSO) with Kerberos. Declare the credential type either globally for the remote source, or individually for a given user. If a user with user-level credentials is defined and the remote source has global credentials defined, the global credentials are used and the user-level credentials are ignored on the remote source. Do one of the following:
To: Execute:
Create global credentials CREATE CREDENTIAL COMPONENT 'SAPHANAFEDERATION' PURPOSE <remote_source_name> TYPE 'KERBEROS';
Create user level credentials CREATE CREDENTIAL FOR USER <user_name> COMPONENT 'SAPHANAFEDERATION' PURPOSE <remote_source_name> TYPE 'KERBEROS';
On the source SAP HANA server, configure Kerberos to support constrained delegation.
1. Create the file $HOME/etc/krb5_hdb.conf and enable delegation by setting the forwardable parameter for Kerberos service tickets to true in the krb5_hdb.conf file. See the template here: SAP HANA Server Configuration [page 73].
2. On the Microsoft Windows Active Directory server, create a Windows Domain account for the SAP HANA server computer and map a host service principal name (SPN) to it. See WinAD Server Configuration [page 73].
3. Add a keytab entry for the hdb service. The keytab stores the keys needed by the SAP HANA server to take part in the authentication protocol. Map the hdb service of a remote SAP HANA server to a Microsoft Windows Active Directory account in order to be able to log in to the remote SAP HANA server using Kerberos. Enable constrained delegation and protocol transition for your remote SAP HANA server in the Active Directory Users and Computers application. See WinAD Server Configuration [page 73].
Configure Kerberos for SAP HANA Spark Controller
1. In the Active Directory, define hanaes/<hadoop_host_name.domain>.com as the Hadoop node host on which spark controller is running.
2. Delegate (forward tickets) from host/<hana_host_name.domain>.com to the remote SAP HANA server hanaes/<hadoop_host_name.domain>.com services, where <hana_host_name.domain>.com is the host on which the SAP HANA instance is running.
3. Create an SAP HANA user account that has a defined Kerberos external ID. The Kerberos external ID can be set in SAP HANA Studio when the user is edited. This user must have the rights to create a remote source.
4. Create keytab files for the Spark controller Kerberos setup. Contact your Kerberos administrator to request the generation of the keytabs. The hanaes user is assigned a Kerberos keytab. These are the Spark controller requirements:
○ hanaes.keytab
○ krb5.conf
5. Copy the .keytab file to the spark controller configuration directory at: /usr/sap/spark/controller/conf. Use the following parameters for Spark to connect to the cluster using your Kerberos principal.
<property>
  <name>sap.hadoop.kerberos.keytab</name>
  <value>/usr/sap/spark/controller/conf/hanaes.keytab</value>
</property>
<property>
  <name>sap.hadoop.kerberos.principal</name>
  <value>hanaes/<hadoop_host.domain>@<your_domain></value>
</property>
6. On the Hadoop cluster, add the following properties for Kerberos using Ambari for the Hortonworks Hadoop distribution, or using comparable administration tools provided by other distributions of Hadoop:
   1. Add the following rule to hadoop.security.auth_to_local in the HDFS Advanced core-site.xml:
      RULE:[2:$1@$0]([email protected])s/.*/hanaes/
   2. Add the proxy user parameters to the HDFS Custom core-site.xml:
      hadoop.proxyuser.hanaes.groups=*
      hadoop.proxyuser.hanaes.hosts=*
7. Create a remote source of type sparksql with the credential type KERBEROS. For example:
CREATE REMOTE SOURCE "spark_krb" ADAPTER "sparksql" CONFIGURATION 'server=Host_name.domain.com;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'KERBEROS'
Host_name.domain.com is the host where spark controller is running.
Related Information
Configure Kerberos SSO on the SAP HANA Server [page 72]
5.3.1 Configure Kerberos SSO on the SAP HANA Server
This section describes how to configure Kerberos in SAP HANA for smart data access (SDA) so as to use protocol transition and authenticate SAP HANA users automatically on a Windows Domain Active Directory without providing a password (SSO mode).
Prerequisites
Microsoft Windows Server, version 2003 or later.
Architecture Overview
The Kerberos platform architecture used in SSO authentication for connections to SAP HANA remote sources relies on protocol transition, which is provided by the Kerberos 5 S4U2Proxy extension.
Protocol transition is a capability of Kerberos used mainly on intermediary platform servers. The user can then be authenticated against Kerberos under their own name, as with SSO. To enable this, the computer on which the server is installed must be entrusted by Windows Active Directory for delegation. Protocol transition supports only constrained delegation, meaning that the delegation can be done only towards a predefined set of services.
SAP HANA Server Configuration
After installing SAP HANA, create the file $HOME/etc/krb5_hdb.conf using the following template. By default, HOME=/usr/sap/<SID>/home:
[libdefaults]
  default_realm = <PLATFORM.DOMAIN>
  clockskew = 300
  default_keytab_name = /usr/sap/<SID>/home/etc/krb5_hdb.keytab
  default_client_keytab_name = /usr/sap/<SID>/home/etc/krb5_host.keytab
  forwardable = true
[realms]
  <PLATFORM.DOMAIN> = {
    kdc = <server>.<server.DNS.domain>:88
    kpasswd_server = <server>.<server.DNS.domain>:464
  }
[domain_realm]
  .<localhost.DNS.domain> = <PLATFORM.DOMAIN>
  <localhost.DNS.domain> = <PLATFORM.DOMAIN>
[logging]
  kdc = FILE:/usr/sap/<SID>/home/log/krb5kdc.log
  admin_server = FILE:/usr/sap/<SID>/home/log/kadmind.log
  default = SYSLOG:NOTICE:DAEMON
where:
<PLATFORM.DOMAIN> – your WinAD NT domain name.
<SID> – your SAP HANA instance SID.
<server> – the WinAD server host.
<server.DNS.domain> – the full DNS domain of the WinAD server host.
<localhost.DNS.domain> – the full DNS domain of your SAP HANA server.
WinAD Server Configuration
On the WinAD server, create a new Windows Domain User account which will act as a UNIX computer account. For more information, see the Windows-Unix inter-operability setup documentation at: https://technet.microsoft.com/en-us/library/bb742433.aspx
1. Add a new user using the Active Directory Users and Computers management tool. The new user account should be specified with the user logon name host/<fully qualified host name of the SAP HANA server>. In Linux Kerberos terminology, the user logon name is also known as the UPN (user principal name).
2. Open a command window and set the SPN (service principal name) to host/hanaserver.sap.corp. For example:
ktpass /princ host/[email protected] /mapuser krbHana /pass Pass1234 /out krb5_host.keytab
It is mandatory that the SPN and UPN are the same and that the host service is set as host/hanaserver.sap.corp.
SAP HANA Spark Controller Installation GuideSetting Up Security P U B L I C 73
3. Copy the generated krb5_host.keytab file to your SAP HANA server account at /usr/sap/<SID>/home/etc/krb5_host.keytab and to /usr/sap/<SID>/home/etc/krb5_hdb.keytab.
4. Add to this account the hdb service to allow logging into the SAP HANA server using Kerberos. This allows SAP HANA users to validate their Kerberos IDs by first logging into SAP HANA with their Kerberos account before being authorized to use this ID for protocol transition.
5. In a windows administration console execute:
setspn -S hdb/hanaserver.sap.corp plat_security\hanaserver
You will need to use the NT domain name, which is case insensitive on Windows.
6. In the Active Directory Users and Computers application, open the created user, click the Delegation tab, and add the hdb service of your SAP HANA server account:
   1. Select Trust this user for delegation to specified services only, and choose Use any authentication protocol.
   2. Add the hdb service for your server account:
      ○ Service Type – hdb
      ○ User or Computer – hanaserver.sap.corp
7. In your Linux account, add the keytab entry for the hdb service. First, list the existing entries to determine the key version number (KVNO). For example:
klist -k /usr/sap/<SID>/home/etc/krb5_hdb.keytab -etK
8. Start the ktutil tool and execute the following at the command prompt:
ktutil: addent -password -p hdb/[email protected] -k 3 -e rc4-hmac
The KVNO number is typically 3.
9. Provide the account password and execute:
ktutil: wkt /usr/sap/<SID>/home/etc/krb5_hdb.keytab ktutil: q
If you re-execute the klist command, two entries should appear.
10. Copy or share the /usr/sap/<SID>/home/etc folder to each node of your cluster.
11. Restart your SAP HANA server instance.
5.4 SSL
SSL ensures authentication of the server using a certificate and corresponding key. Use the SSL protocol to perform mutual SSL authentication between SAP HANA and SAP HANA spark controller.
OpenSSL Tools
OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions.
Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available. See the OpenSSL documentation for more information: https://www.openssl.org/docs/manmaster/ .
Private and Public Keys
Two key pairs are used for the SSL/TLS protocol to authenticate, secure, and manage secure connections. In each pair, one key is a private key and the other is a public key. They are created together and work together during the SSL/TLS handshake process (using asymmetric encryption) to set up a secure session:
● Private key – This is a text file used to generate a Certificate Signing Request (CSR), which is a message sent from an applicant to a certificate authority, and later to secure and verify connections using the certificate created per that request.
● Public key – This is included as part of your SSL certificate, and works together with your private key to make sure that your data is encrypted and verified. Anyone with access to the public key can verify that a digital signature is authentic without having to know the secret private key.
Certificate Authorities (CA)
The CA is an entity that issues digital certificates that certify the ownership of a public key. There are two types of CAs: root CAs and intermediate CAs. A trusted SSL certificate must be issued by a CA that is included in the trusted store of the connecting device; the connecting device checks whether the certificate was issued by a trusted CA. To obtain an SSL certificate from a certificate authority (CA), you must generate a certificate signing request (CSR).
Certificate Signing Request (CSR)
A CSR is a block of encoded text that is given to a certificate authority when applying for an SSL certificate. It is usually generated on the server where the certificate will be installed and contains information that will be included in the certificate such as the organization name, domain name, locality, and country. It also contains the public key that will be included in the certificate. A private key is usually created at the same time that you create the CSR, making a key pair.
Certificate Chain
The certificate chain is an ordered list of certificates that contains an SSL certificate and certificate authority (CA). If the certificate was not issued by a trusted CA, the connecting device checks to see if the certificate of the issuing CA was issued by a trusted CA. The list of SSL certificates, from the root certificate to the end-user certificate, represents the SSL certificate chain:
● The root certificate is generally embedded in your connected device.
● Installing the intermediate SSL certificate depends on the environment. For example, Apache requires you to bundle the intermediate SSL certificates and assign the location of the bundle to the
SSLCertificateChainFile configuration. Conversely, NGINX requires you to package the intermediate SSL certificates in a single bundle with the end-user certificate.
Personal Security Environments (PSE)
Certificates can be stored in the database, or in trust and key stores located in the file system. These certificates are contained in personal security environments (PSE) files located in the file system. PSEs are referred to as certificate collections, and are used when the certificates are required to secure internal communication channels using the system public key infrastructure (system PKI), and HTTP client access using the SAP Web Dispatcher administration tool, or the SAPGENPSE tool, both of which are delivered with SAP HANA. If you are using OpenSSL, you can also use the tools provided with OpenSSL.
Java KeyStore (JKS)
JKS (PKCS12) is a repository for security certificates. This repository is for authorization certificates, or public key certificates, and corresponding private keys that are used for SSL encryption.
Related Information
Configure SSL Mode [page 76]
OpenSSL Command Syntax for SAP HANA Spark Controller [page 77]
Configure SSL Example [page 78]
SSL Mode Configure Parameters [page 80]
5.4.1 Configure SSL Mode
To create a remote source in SSL mode, perform mutual SSL authentication between SAP HANA and SAP HANA spark controller. For security purposes, enable SSL mode in production scenarios.
The following are the high-level steps you perform to configure SSL mode.
1. Create a private key, SSL certificate chain, and certificate signing request (CSR).
   The private key is a text file used to generate a CSR. The certificate authority (CA) issues a digital certificate that certifies the ownership of a public key, and the certificate chain is an ordered list of certificates that contains an SSL certificate and CAs. A public key is included as part of your SSL certificate.
   Use the OpenSSL tools to create the files, then copy them to the following locations:
   1. Add the private key and certificate chain to the SAP HANA keystore personal security environments (PSE) file located in the file system of the machine on which you installed SAP HANA.
   2. Add the root certificate that you created to the Spark controller JKS (PKCS12 file) trust store on the machine on which you installed Spark controller.
   Install both Spark controller and its associated JKS (PKCS12) file on the machine that is part of the Hadoop cluster.
2. Create another private key, SSL certificate chain, and CSR. This key can use the same or a different certificate authority.
   You can create this private key on any machine where the OpenSSL tool is located, then copy it to the machines where you installed Spark controller and SAP HANA:
   1. Add the private key and certificate chain to the Spark controller keystore JKS (PKCS12 file) on the machine on which you installed Spark controller.
   2. Add the root certificate to the SAP HANA PSE file, located in the file system of the machine on which you installed SAP HANA.
3. Make the appropriate changes in the hanaes-site.xml file. See SSL Mode Configure Parameters [page 80] for a list of configuration parameters that are specific to SSL.
4. Restart Spark controller.
5.4.2 OpenSSL Command Syntax for SAP HANA Spark Controller
This section describes the openssl options and arguments used in this document.
openssl is a command line tool for using the various cryptography functions of the OpenSSL cryptography library. For more information about OpenSSL commands, see https://www.openssl.org/docs/manmaster/ .
The following are openssl options and arguments which are used in the example for configuring mutual SSL authentication between SAP HANA and spark controller.
● CA – specifies the CA certificate used to sign a certificate request.
● CAkey – specifies the CA private key used to sign a certificate request.
● CAcreateserial – the first time you use your CA to sign a certificate, use the CAcreateserial option, which creates a ca.srl file containing a serial number. The next time you use your CA, use the CAserial option.
● certfile – file from which to read additional certificates.
● days – the number of days for which the certificate is valid.
● export – exports the PKCS12 file.
● extensions – specifies the extensions to be added when a certificate is issued.
● extfile – file containing certificate extensions to use. If not specified, no extensions are added to the certificate.
● in – input file name from which to read the certificate.
● inkey – file from which to read the private key.
● keyout – stores the private key in the specified file name.
● new – specifies that this is a new request.
● newkey – specifies that this generates a new private key.
● nodes – meaning "no DES". This option specifies that the private key is not encrypted.
● out – stores the certificate request in the specified file name.
● pkcs12 – specified to create and parse a PKCS12 file.
● req – generates a certificate signing request.
● rsa:2048 – specifies the bit length of the private key. For smaller keys, you can use 1024 or 512. The strength of the key should match the type of service your certificate authority is providing to you.
● sha1 – specifies Secure Hash Algorithm 1 (the 160-bit cryptographic hash function).
● x509 – certificate utility. x509 can be used to display certificate information, convert certificates, sign certificate requests, or edit certificate trust settings.
5.4.3 Configure SSL Example
This example describes configuring SSL authentication between SAP HANA and SAP HANA spark controller.
Context
You can create a CA (Certificate Authority) and use the CA to sign a certificate, or use a self-signed certificate. This example shows how to use a self-signed certificate.
The /etc/ssl/openssl.cnf file is the general configuration file for the OpenSSL program. You can configure the expiration date of your keys, the name of your organization, the address, and so on. This file is also referred to as <openssl conf>.
NoteSee OpenSSL Command Syntax for SAP HANA Spark Controller [page 77] for a list of command options used in this example and their descriptions.
Procedure
1. Log in to your SAP HANA system as the <sid>adm user.
2. Create a personal security environments (PSE) file for SAP HANA. This syntax creates an unencrypted 2048-bit RSA private key and the associated self-signed certificate in the privacy-enhanced mail (PEM) format:
openssl req -new -x509 -newkey rsa:2048 -days 365 -sha1 -keyout CA_Key.pem -out CA_Cert.pem -extensions v3_ca
The output displays the certificate request. Additional information is required.
3. When prompted, provide the appropriate information in the identification form. For example:
○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []:BIG DATA
○ Email Address []:<xxxx>@gmail.com
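Steps 2 and 3 can also be run non-interactively by passing the identification fields with -subj. This sketch additionally uses -nodes (unencrypted key) and -sha256, and omits -extensions v3_ca, so adjust it to match your policy; the subject values are the examples from above:

```shell
# Create the CA key and self-signed certificate without interactive prompts.
openssl req -new -x509 -newkey rsa:2048 -days 365 -sha256 -nodes \
  -subj "/C=US/ST=California/L=San Jose/O=MyCompany/OU=Hadoop/CN=BIG DATA" \
  -keyout CA_Key.pem -out CA_Cert.pem
# Show the subject that was embedded in the certificate.
openssl x509 -in CA_Cert.pem -noout -subject
```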
4. Create a certificate signing request (CSR). This file is used to send the public key information that identifies your company and domain name for a signing request:
openssl req -newkey rsa:2048 -days 365 -sha1 -keyout Hana_Key.pem -out Hana_Req.pem -nodes
The output displays the certificate request. Additional information is required.
5. When prompted, provide the appropriate information in the identification form. For example:
○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []:<HANA_hostname>
○ Email Address []:[email protected]
○ A challenge password []:myPass
○ An optional company name []:MyCompany2
6. Use the self-signed certificate to sign the request:
openssl x509 -req -days 365 -in Hana_Req.pem -sha1 -extfile <openssl conf> -extensions usr_cert -CA CA_Cert.pem -CAkey CA_Key.pem -CAcreateserial -out Hana_Cert.pem
7. Export the private key and certificate chain to the PKCS12 store:
openssl pkcs12 -export -out hana.pkcs12 -in Hana_Cert.pem -inkey Hana_Key.pem -certfile CA_Cert.pem
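Step 7 can be checked end-to-end. This self-contained sketch creates a throwaway key and certificate (stand-ins for Hana_Key.pem and Hana_Cert.pem), exports them to a PKCS12 store with the export password supplied non-interactively via -passout (example value), and then lists the store contents:

```shell
# Throwaway key and self-signed certificate for illustration only.
openssl req -new -x509 -newkey rsa:2048 -days 365 -sha256 -nodes \
  -subj "/CN=demo" -keyout demo_Key.pem -out demo_Cert.pem
# Export to a PKCS12 store; -passout avoids the interactive password prompt.
openssl pkcs12 -export -out demo.pkcs12 -in demo_Cert.pem \
  -inkey demo_Key.pem -passout pass:Secret123
# Inspect the store (certificates only; private keys are not printed).
openssl pkcs12 -info -in demo.pkcs12 -passin pass:Secret123 -nokeys
```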
8. Use the sapgenpse utility to create a personal security environment (PSE) file from the PKCS12 store created above.
sapgenpse import_p8 -p sparksql_ks.pse ./hana.pkcs12
where:
○ import_p8 – creates a PSE from an OpenSSL key file.
○ -p – path and file name for the server PSE.
9. Create a new certificate signing request (CSR) for spark controller:
openssl req -newkey rsa:2048 -days 365 -sha1 -keyout Controller_Key.pem -out Controller_Req.pem -nodes
The output displays the certificate request. Additional information is required.
10. When prompted, provide the appropriate information in the identification form. For example:
○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []: <Spark_controller_hostname>
○ Email Address []:[email protected]
11. Use the self-signed certificate to sign the request:
openssl x509 -req -days 365 -in Controller_Req.pem -sha1 -extfile <openssl conf> -extensions usr_cert -CA CA_Cert.pem -CAkey CA_Key.pem -out Controller_Cert.pem
12. Export the private key and certificate chain to the PKCS12 store:
openssl pkcs12 -export -out controller_ks.p12 -in Controller_Cert.pem -inkey Controller_Key.pem -certfile CA_Cert.pem
13. Use scp to copy the CA_Cert.pem and controller_ks.p12 files to the same location as the spark controller installation.
14. Use the Java keytool key and certificate management utility to import the CA certificate into the spark controller truststore file, controller_ts.jks:
keytool -import -file <Path to CA_Cert.pem>/CA_Cert.pem -keystore ./controller_ts.jks
The controller_ks.p12 and controller_ts.jks files are now in place on the host where you installed spark controller.
15. Set the appropriate parameters in the hanaes-site.xml file. See SSL Mode Configure Parameters [page 80].
16. If you use the spark controller host name to create the CSR, set the value of the sap.hana.es.driver.host property to your spark controller’s host name to avoid any ambiguous host resolution.
17. Restart Spark controller.
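The certificate steps above can be consolidated into a single non-interactive script. This is a sketch only: the subject fields and host names are placeholder assumptions, -subj replaces the interactive identification form, -sha256 is used in place of the -sha1 shown in the steps, and the -extensions/-extfile options are omitted for brevity.

```shell
#!/bin/sh
# Sketch of steps 1-12 above (hypothetical subject fields and host names).
set -e
SUBJ="/C=US/ST=California/L=San Jose/O=MyCompany/OU=Hadoop"

# Steps 1-2: CA key and self-signed CA certificate (-nodes = unencrypted key)
openssl req -new -x509 -newkey rsa:2048 -days 365 -sha256 -nodes \
  -keyout CA_Key.pem -out CA_Cert.pem -subj "$SUBJ/CN=BIG DATA"

# Steps 4-5: SAP HANA key and certificate signing request (CN = HANA host)
openssl req -newkey rsa:2048 -sha256 -nodes \
  -keyout Hana_Key.pem -out Hana_Req.pem -subj "$SUBJ/CN=hana.example.com"

# Step 6: sign the HANA request with the CA
openssl x509 -req -days 365 -sha256 -in Hana_Req.pem \
  -CA CA_Cert.pem -CAkey CA_Key.pem -CAcreateserial -out Hana_Cert.pem

# Step 7: export the HANA key and certificate chain to a PKCS12 store
openssl pkcs12 -export -passout pass: -out hana.pkcs12 \
  -in Hana_Cert.pem -inkey Hana_Key.pem -certfile CA_Cert.pem

# Steps 9-12: the same flow for spark controller (CN = controller host)
openssl req -newkey rsa:2048 -sha256 -nodes \
  -keyout Controller_Key.pem -out Controller_Req.pem \
  -subj "$SUBJ/CN=controller.example.com"
openssl x509 -req -days 365 -sha256 -in Controller_Req.pem \
  -CA CA_Cert.pem -CAkey CA_Key.pem -CAcreateserial -out Controller_Cert.pem
openssl pkcs12 -export -passout pass: -out controller_ks.p12 \
  -in Controller_Cert.pem -inkey Controller_Key.pem -certfile CA_Cert.pem
```

Steps 8 and 13 through 17 (the sapgenpse import, copying the files, the keytool import, and the hanaes-site.xml settings) then proceed exactly as described above.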
5.4.4 SSL Mode Configure Parameters
The spark_configuration parameter group in the global.ini file includes SSL parameters.
sslKeyStore (default value: sparksql_ks.pse)
Path to the keystore file in PSE format. sslKeyStore holds a single key and certificate chain for SAP HANA to communicate with all Spark controllers.

sslTrustStore (default value: sparksql_ts.pse)
Path to the trust store file in PSE format. The SAP HANA trust store for Spark holds certificates for all Spark controllers to which it connects.

sslValidateCertificate (default value: true)
If set to true, the host's certificate is validated.

sslValidateHostNameInCertificate (default value: true)
If set to true, the host name is validated against the certificate used for the SSL handshake. This parameter is only used when sslValidateCertificate is set to true.
Make sure that sslValidateCertificate and sslValidateHostNameInCertificate are set to true.
On Spark controller, the hanaes-site.xml file includes additional security parameters. See Configuration Properties [page 49]. Make sure that sap.hana.es.ssl.enabled is set to true.
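On the SAP HANA side, these parameters live in the spark_configuration section of global.ini. A sketch using the default values listed above (the PSE file names are the documented defaults; adjust the paths for your installation):

```ini
[spark_configuration]
sslKeyStore = sparksql_ks.pse
sslTrustStore = sparksql_ts.pse
sslValidateCertificate = true
sslValidateHostNameInCertificate = true
```

On the spark controller side, the corresponding switch is the sap.hana.es.ssl.enabled property in hanaes-site.xml.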
6 Create a Remote Source
Connect to your Hadoop cluster from SAP HANA by creating a remote source.
Prerequisites
Spark controller must be running.
Procedure
Run the CREATE REMOTE SOURCE SQL statement in the SQL console of SAP HANA Studio.
○ This example creates a remote source of type sparksql:
CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes';
○ This example creates a remote source of type sparksql with the credential type Kerberos:
CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'KERBEROS'
The remote source appears under Provisioning > Remote Sources.
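For an SSL-protected connection, the configuration string changes accordingly. This variant is a sketch: the server name and password are placeholders, and it assumes ssl_mode accepts the value enabled as the counterpart of the disabled shown above, with the SSL parameters from Setting Up Security configured on both sides:

```sql
CREATE REMOTE SOURCE "spark_ssl" ADAPTER "sparksql"
CONFIGURATION 'server=<controller_FQDN>;port=7860;ssl_mode=enabled'
WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=<password>';
```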
Related Information
Configuring a Proxy Server [page 64]
7 Create a Custom Spark Procedure
Custom Spark procedures are virtual procedures used to access a Spark remote source.
Prerequisites
A remote source exists.
Context
Create a custom Spark procedure in SAP HANA to perform compilation and execution on a Hadoop cluster and consume the results back in SAP HANA. You can easily access Spark libraries from SAP HANA and then push the procedure to spark controller for compilation and execution. An example of this is accessing the machine learning libraries on a Hadoop cluster and bringing the model back to SAP HANA for prediction.
The body of the CREATE VIRTUAL PROCEDURE statement defines the source code for the virtual procedure. You can run complex algorithms on both structured (such as tables) and unstructured (such as log files) data using the Scala programming language.
Procedure
● The following example is adapted from a sample hosted on the Apache Spark Web site: https://spark.apache.org/docs/1.6.2/ml-features.html#n-gram .
Sample Code
CREATE VIRTUAL PROCEDURE SYSTEM.FINDNGRAMS(
    IN N INT,
    OUT NGRAMS TABLE(STR TEXT))
LANGUAGE SCALASPARK
AT SPARK_OAKLAS
BEGIN
import sqlContext.implicits._
import scala.collection.mutable.WrappedArray
import org.apache.spark.ml.feature.NGram
// $example on$
val wordDataFrame = sqlContext.createDataFrame(Seq(
  (0, Array("Hi", "I", "heard", "about", "Spark")),
  (1, Array("I", "wish", "Java", "could", "use", "case", "classes")),
  (2, Array("Logistic", "regression", "models", "are", "neat"))
)).toDF("id", "words")
val ngram = new NGram().setN(N).setInputCol("words").setOutputCol("ngrams")
val ngramDataFrame = ngram.transform(wordDataFrame)
ngramDataFrame.select("ngrams").show(false)
NGRAMS = ngramDataFrame.select("ngrams").
  map(y => y(0).asInstanceOf[WrappedArray[_]].
  mkString(",")).toDF
END;

CALL FINDNGRAMS(6, ?);
Related Information
Privileges [page 84]
Virtual Package System Built-Ins [page 85]
7.1 Privileges
Use these privileges to give users permission to create a virtual procedure and virtual package.
To create a virtual procedure on a remote source, the CREATE VIRTUAL PROCEDURE object privilege is required on the remote source. The syntax is:
GRANT CREATE VIRTUAL PROCEDURE ON REMOTE SOURCE <source_name> TO <user>
Caution
Grant this privilege only to trusted database users. Even though procedure execution is done in a restricted sandbox, use extreme caution when granting this privilege.
The CREATE VIRTUAL PACKAGE privilege provides access to create a new virtual package:
GRANT CREATE VIRTUAL PACKAGE ON <schema_name> TO <user> WITH GRANT OPTION
7.2 Virtual Package System Built-Ins
Create, alter, or drop virtual packages.
A virtual package is an archive (zip) file containing Java libraries and resource files, which can be referenced in virtual procedures and functions. Typically, reusable Java libraries (JARs) are packaged into a virtual package and shared across multiple virtual procedures.
The schema-level privilege CREATE VIRTUAL PACKAGE allows permission to add a new virtual package.
Table 2: VIRTUAL_PACKAGE_CREATE

Parameter      Type (INPUT/OUTPUT)   SQL Data Type   Length   Description
SCHEMA_NAME    INPUT                 NVARCHAR        256      Schema name
PACKAGE_NAME   INPUT                 NVARCHAR        256      Package name
ADAPTER_NAME   INPUT                 NVARCHAR        256      Name of the remote source adapter
CONTENT        INPUT                 BLOB                     Package file content
Table 3: VIRTUAL_PACKAGE_ALTER

Parameter      Type (INPUT/OUTPUT)   SQL Data Type   Length   Description
SCHEMA_NAME    INPUT                 NVARCHAR        256      Schema name
PACKAGE_NAME   INPUT                 NVARCHAR        256      Package name
ADAPTER_NAME   INPUT                 NVARCHAR        256      Name of the remote source adapter
CONTENT        INPUT                 BLOB                     Package file content
Table 4: VIRTUAL_PACKAGE_DROP

Parameter      Type (INPUT/OUTPUT)   SQL Data Type   Length   Description
SCHEMA_NAME    INPUT                 NVARCHAR        256      Schema name
PACKAGE_NAME   INPUT                 NVARCHAR        256      Package name
ADAPTER_NAME   INPUT                 NVARCHAR        256      Name of the remote source adapter
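As a hedged illustration of the parameter tables above, a call to the create built-in might look as follows. The schema and package names are hypothetical, and the mechanism for loading the zip file content into the BLOB parameter depends on your client tooling:

```sql
-- Hypothetical call; :package_blob is assumed to hold the zip file
-- content as a BLOB loaded beforehand by the client.
CALL VIRTUAL_PACKAGE_CREATE('MYSCHEMA', 'SPARK_LIBS', 'sparksql', :package_blob);
```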
8 Data Lifecycle Manager
The Data Lifecycle Manager (DLM) can be used to relocate data to Hadoop with SAP HANA spark controller for data access.
See these sections for more information about using spark controller for DLM scenarios:
● Set the sap.hana.es.warehouse.dir property to the location for aged data. See Configuration Properties [page 49].
● To create a remote source to SAP HANA, see Create a Remote Source [page 82].
See SAP HANA Data Warehousing Foundation for more information.
9 Troubleshooting
This section contains troubleshooting procedures for problems that you may encounter when using SAP HANA spark controller.
Find solutions to known issues for HANA 2.0 SPS 02 using SAP ONE Support Launchpad .
Related Information
Troubleshooting Diagnostic Utility [page 88]
SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x [page 94]
SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2 [page 96]
Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari [page 96]
SAP Vora - SAP HANA Spark Controller Fails To Start [page 97]
The TINYINT Datatype is not Supported When Accessing Apache Hive Tables [page 98]
Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical" [page 98]
Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly [page 99]
Configure SAP HANA Spark Controller for Non-Proxy Server Environments [page 99]
SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date Related Built-ins [page 100]
Data Warehousing Support [page 100]
9.1 Troubleshooting Diagnostic Utility
SAP HANA spark controller includes a diagnostic tool to ensure that the required properties, environment variables, and component versions are correctly configured so that spark controller can start.
The diagnostic tool does not provide information about the installation or configuration of your Hadoop cluster. It only provides troubleshooting information relevant to starting spark controller; invoking it after spark controller has started is therefore not recommended.
For Ambari and manual installations, the diagnostic tool is located in the /usr/sap/spark/controller/utils directory. For Cloudera Manager installations, the tool is located in the /opt/cloudera/parcels/SAPHanaSparkController-<spark_controller_version>/lib/sap/spark/controller/utils directory.
For information about running diagnostics through the Cloudera Manager Web UI, see Run the Diagnostic Utility [page 30].
Related Information
Run the Diagnostic Tool [page 89]
Error Messages [page 91]
9.1.1 Run the Diagnostic Tool
Use the diagnostic tool to check your Spark controller installation for errors.
Prerequisites
● Ensure the HANA_SPARK_ASSEMBLY_JAR environment variable is set. This is the full path of the spark-assembly jar file used with Spark controller. See Environment Variables for hana_hadoop-env.sh [page 46].
● You must have root user permissions.
NoteIf you have installed Spark controller using Cloudera Manager, see Run the Diagnostic Utility [page 30] for information about running diagnostics through the Cloudera Manager Web UI.
Procedure
1. The diagnostic tool uses the default class paths for Hadoop libraries. If those libraries are installed in a different location, specify them for the tool by running the following:
java -cp ./controller.util-<spark_controller_version>.jar:<classpath> com/sap/hana/spark/DiagnosticUtil
2. Go to the directory in which the diagnostic tool is located.
○ For Ambari and manual installations, go to the /usr/sap/spark/controller/utils directory.
○ For Cloudera installations, go to the /opt/cloudera/parcels/SAPHanaSparkController-<spark_controller_version>/lib/sap/spark/controller/utils directory.
3. Enter the following:
sudo ./diagnose
The diagnostic tool will determine if Spark controller was installed correctly. If Spark controller was not installed correctly, the tool will provide a list of error messages briefly detailing the error, and a reference code. This list is compressed into the output.tar file in the same directory in which the tool was run. This compressed file consists of output.txt, which contains the error logs, as well as unmodified copies of hana_controller.log, hanaes-site.xml, and hana_hadoop-env.sh for convenience.
Here is an example of the tool running and not detecting an error.
Here is an example of the tool detecting the error A.15.
4. Use the error codes from the output to find more information about the errors. See Error Messages [page 91].
9.1.2 Error Messages
Identify and troubleshoot errors in your SAP HANA spark controller installation after running the diagnostic tool.
The error codes are displayed in the following format: <Hadoop Distribution>.<Installation Type>.<Error Number>.
The first letter of the code indicates the Hadoop distribution:
● A – All● M – MapR● C – Cloudera● H – Hortonworks
Terminal (or manual) installations are indicated with T. If there is no middle letter, the error could affect either manual or Web UI installations.
Table 5: Error Messages
Error Code Description
A.01 Cause: The file hana_hadoop-env.sh must contain the entry export HANA_SPARK_ASSEMBLY_JAR=<path> in order for the spark-assembly jar file to be accessible to the system environment. This error occurs when no such entry exists, when this entry has been commented out, or when no <path> has been provided.
Solution: Ensure the line export HANA_SPARK_ASSEMBLY_JAR=<path> exists in the hana_hadoop-env.sh file.
See Environment Variables for hana_hadoop-env.sh [page 46].
A.02 Cause: The hana_hadoop-env.sh file contains the entry export HANA_SPARK_ASSEMBLY_JAR=<path>, and a <path> has been provided, but it does not point to a valid file system location.

Solution: Ensure that the <path> in the export HANA_SPARK_ASSEMBLY_JAR=<path> line of the hana_hadoop-env.sh file points to the spark-assembly jar file.
See Environment Variables for hana_hadoop-env.sh [page 46].
A.04 Cause: The Hadoop distribution and spark-assembly distribution files do not match or could not be determined.
Solution: Ensure you are using the correct spark-assembly jar file.
See the Installation Prerequisites for your installation type to find the location of the spark-assembly jar file for your configuration.
A.05 Cause: The directory /user/hanaes does not exist in HDFS.
Solution: The directory /user/hanaes must exist in HDFS, and the hanaes user must exist in HDFS. To create them manually, see Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].
A.06 Cause: The directory /user/hanaes exists in HDFS but is not owned by the user group hanaes:hdfs for Hortonworks and Cloudera distributions, or hanaes:sapsys for MapR distributions.
Solution: The directory /user/hanaes must be owned by hanaes:hdfs (or hanaes:sapsys on MapR distributions). To change ownership, see Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].
H.09 Cause: The mapred-site.xml file was not found.
Solution: Ensure the environment variables are set correctly.
See Environment Variables for hana_hadoop-env.sh [page 46].
A.10 Cause: The properties in the hanaes-site.xml file are not set, or they are configured incorrectly.
Solution: Configure hanaes-site.xml:
● For Ambari, see the Ambari installation step: Go to Spark Controller > Configs > Custom hanaes-site and add the following properties [page 15].
● For Cloudera Manager, see Modify Configuration Properties (Cloudera Manager) [page 29].● For manual installation, see Configuration Properties [page 49].
M.T.13 Cause: The yarn-site.xml file was not found.
Solution: The configuration file yarn-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.
A.14 Cause: The file hana_hadoop-env.sh must contain the entry export HIVE_CONF_DIR=<path>. This error occurs when no such entry exists, when this entry has been commented out, or when no <path> has been provided.
Solution: Ensure the line export HIVE_CONF_DIR=<path> exists in the hana_hadoop-env.sh file.
See Environment Variables for hana_hadoop-env.sh [page 46].
A.15 Cause: The file hana_hadoop-env.sh must contain the entry export HIVE_CONF_DIR=<path>. This error occurs when a <path> has been provided but does not point to a valid file system location.
Solution: Ensure the line export HIVE_CONF_DIR=<path> exists in the hana_hadoop-env.sh file.
See Environment Variables for hana_hadoop-env.sh [page 46].
A.16 Cause: The Hive metastore is not running.
Solution: Check your Hadoop cluster Web UI or contact your Hadoop administrator.
A.18 Cause: The core-site.xml file was not found.
Solution: The configuration file core-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.
See Configure hanaes User Proxy Settings [page 60].
A.19 Cause: The hadoop.proxyuser.hanaes.hosts parameter does not exist, or is not set.
Solution: Set the appropriate proxy configuration parameters in the core-site.xml file.
See Configure hanaes User Proxy Settings [page 60].
A.20 Cause: The hadoop.proxyuser.hanaes.group parameter does not exist, or is not set.
Solution: Set the appropriate proxy configuration parameters in the core-site.xml file.
See Configure hanaes User Proxy Settings [page 60].
A.21 Cause: The hdfs-site.xml file could not be found.
Solution: The configuration file hdfs-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.
A.T.07 Cause: The spark controller folder structure has not been created.
Solution: During manual installation of spark controller, the directories /usr/sap/spark/controller/conf, /usr/sap/spark/controller/bin, /usr/sap/spark/controller/lib, and /usr/sap/spark/controller/utils should have been created.

If this error occurs, spark controller may not have been installed correctly from the tar.gz or rpm file, and may need to be downloaded again.
See the manual installation step: Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].
C.17 Cause: The directory /var/run/cloudera-scm-agent/process does not exist.
Solution: The directory /var/run/cloudera-scm-agent/process should have been created during the Cloudera installation process. However, this directory could not be found. Contact your Hadoop administrator.
H.08 Cause: The mapred-site.xml file is missing or not correctly configured.
Solution: The mapred path must be set in mapred-site.xml, otherwise Hadoop will not be able to run Map Reduce jobs.
See Modify mapreduce.application.classpath [page 13].
NoteThe framework/hadoop/share/hadoop/tools/lib/* needs to be set correctly, or not set at all.
M.T.03 Cause: The installed Spark assembly distribution uses Apache, which is not supported by MapR.
Solution: Apache's spark-assembly jar distribution is not supported by MapR. Replace the jar file with the MapR spark-assembly jar distribution.
See Add Properties for YARN [page 43].
M.T.11 Cause: The yarn-site.xml file does not contain yarn.application.classpath or it was not correctly configured.
Solution: Ensure the property yarn.application.classpath exists in the yarn-site.xml file.
See Configure YARN Properties for MapR [page 43].
M.T.12 Cause: The yarn-site.xml file does not contain yarn.scheduler.maximum-allocation-mb or it was not correctly configured.
Solution: Ensure the property yarn.scheduler.maximum-allocation-mb exists in the yarn-site.xml file.
See Configure YARN Properties for MapR [page 43].
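Several of the "A"-series errors above (A.01, A.02, A.14, A.15) reduce to missing or invalid export lines in hana_hadoop-env.sh. A minimal sketch of the relevant entries follows; the paths are examples only, and the correct locations depend on your Hadoop distribution:

```shell
# hana_hadoop-env.sh (example paths; substitute your distribution's locations)
export HANA_SPARK_ASSEMBLY_JAR=/usr/hdp/current/spark-client/lib/spark-assembly.jar
export HIVE_CONF_DIR=/etc/hive/conf
```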
9.2 SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x
Description: A memory leak occurs with Spark versions 1.5.2, 1.6.0, 1.6.1, and 1.6.2. This leak is attributed to Tungsten (an Apache open source project) which is a part of the distributed computational framework. The Spark containers/executors shut down and Spark becomes unresponsive.
If there is a memory leak, the executors will fail and the spark controller log will show that your container is missing, and a new container has started.
The following is an example of the error stack:
Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_e07_1476226555327_0121_02_000025
Exit code: 52
Stack trace: ExitCodeException exitCode=52:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at ...
Solution: If the issue is a memory leak, spark controller logs Error 52 and you will see the following error in the Apache Spark container log:
16/20/28 20:11:02 WARN memory.TaskMemoryManager: leak 64.0 MB memory from org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@1b00.ffa8
To access the Spark container log:
1. Find the application ID for spark controller. The Ambari Resource Manager UI can be used for HDP distributions of Hadoop and comparable administration tools can be used for other Hadoop distributions.
2. Using PuTTY, log in to the Linux machine where spark controller is running and execute:

sudo su - hanaes
3. Run the following using the application ID and redirect the log file to the tmp directory:
yarn logs -applicationId application_1477939831078_0008 >> /tmp/containers.log

Make sure that the /tmp folder has enough disk space for the log information. Note that application_1477939831078_0008 is an example application ID.
4. Search for the word “leak” in /tmp/containers.log.
To resolve the issue:
Spark 1.5.2
Tungsten is enabled by default and needs to be disabled. Disable Tungsten by adding the following property in the spark controller hanaes-site.xml file:
<property>
  <name>spark.sql.tungsten.enabled</name>
  <value>false</value>
  <final>true</final>
</property>
Using Spark 1.5.2 with spark controller 2.0 requires the following property in the spark controller hanaes-site.xml file in addition to the property mentioned above:
<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop</value>
</property>
Spark 1.6.0, 1.6.1, and 1.6.2
Tungsten cannot be disabled on Spark versions 1.6.0, 1.6.1, and 1.6.2 and the memory leak cannot be avoided. If you experience memory leak issues, use Spark 1.5.2 and set the properties in the hanaes-site.xml file as described above.
Reference: https://launchpad.support.sap.com/#/notes/2385144
9.3 SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2
Description: Spark controller does not support the following features and data types when you are running Spark 1.5.2 with spark controller on SAP HANA platform 1.0 SPS 12.
Features:
● Nominal key● Clash Strategy
Datatypes:
● TEXT● SHORTTEXT● BINTEXT● BLOB● CLOB● TIME● NCLOB● ALPHANUM● ST_POINT● ST_GEOMETRY● BOOLEAN● ARRAY● SECONDDATE
The following has limited support:
● Packeting - Packeting is available only when moving data from SAP HANA to Hadoop.● CHAR - this datatype is not supported on Hive tables when using Spark as the execution engine. To
workaround this issue, change the datatype from CHAR to VARCHAR or STRING type.
Reference: https://launchpad.support.sap.com/#/notes/2315404
9.4 Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari
Description: When you install spark controller using Ambari, you may encounter an issue where you cannot execute operations using Service Actions (Start, Stop, Restart All, or Turn On/Off Maintenance Mode), and the following message is displayed in /var/log/ambari-server/ambari-server.log: Cannot determine request operation level. Operation level property should be specified for this request. The message is not available from the Web UI.
Additionally, if you run Restart All from Service Actions, and then check the Turn On Maintenance Mode for SparkController option from the confirmation pop-up window, the Service Level Maintenance Mode will be turned on and cannot be turned off.
This issue is due to an Ambari front-end to back-end communication problem; spark controller is not able to grant the permission to operate at the Service Level correctly.
Solution: Always execute operations at the Component Level. To access the spark controller Component Level drop-down menu, use either:
● Spark Controller > SparkController > Components
● Hosts > <the host where you installed Spark controller> > Components
If, based on the above symptoms, you inadvertently turned on Service Level Maintenance Mode and cannot turn it off, run the following command from a console window. This RESTful API call indicates the correct operation level from the front-end to the back-end so that maintenance mode can be turned off:
curl -u <user>:<password> -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Remove SparkController from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' http://<hostname>:8080/api/v1/clusters/<stackname>/services/SparkController
Reference: https://launchpad.support.sap.com/#/notes/2315432
9.5 SAP Vora - SAP HANA Spark Controller Fails To Start
Description: Spark controller fails to start, or running a SQL statement against SAP HANA virtual tables created from an SAP Vora data source fails. The spark controller log, /var/log/hanaes/hana_controller.log, shows the following error messages:
ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
The environment is:
● SAP HANA● SAP HANA Vora 1.2 or Hadoop HIVE● Spark controller 1.5 Patch 5 or spark controller 1.6 Patch 1
To reproduce the issue:

Starting spark controller using the Ambari UI, or using the command ./hanaes start, fails; spark controller cannot be started.
Or
1. Start spark controller via the Ambari UI or the command ./hanaes start.
2. Create an SAP Vora remote source in SAP HANA studio.
3. Add the SAP Vora tables as virtual tables in SAP HANA studio.
4. Run a query against the virtual tables created from the SAP Vora data source.
5. Spark controller stops with an error.
Solution:
● Make sure to create the SAP Vora remote source using the FQDN (Fully Qualified Domain Name).
● Make sure ports 56000-58000 are open on spark controller nodes.
● Maintain appropriate entries in the /etc/hosts file on the SAP HANA server, so that it contains the correct hostname, FQDN, and IP address of the spark controller node.
Reference: https://launchpad.support.sap.com/#/notes/2396015
9.6 The TINYINT Datatype is not Supported When Accessing Apache Hive Tables
Description: The TINYINT datatype is not supported when accessing Apache Hive tables using spark controller. No error message is raised to indicate that there are incompatible datatype definitions.
● Apache Hive defines the TINYINT datatype as a signed integer with a range of -128 to 127.
● SAP HANA defines the TINYINT datatype as an unsigned integer with a range of 0 to 255.
Solution: To access Apache Hive tables using spark controller, convert the TINYINT datatype in your table schema to a compatible datatype, such as SMALLINT, INTEGER, or BIGINT, before exchanging data, to ensure consistency across the two databases.
Reference: https://launchpad.support.sap.com/#/notes/2542953
9.7 Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical"
Description: When installing spark controller 2.0 SP02 PL0, it may not start correctly during the final step of the installation. The spark controller log shows the exception "URI is not hierarchical".
Solution: Do one of the following:
● For Cloudera Manager, this issue is resolved by updating to version 2.0 SP02 PL1.
● For manual installations, upgrade to version 2.0 SP02 PL1; alternatively, you can change the order of the classpath in the hanaes script as a workaround. For example:

Original:
# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"
CLASSPATH="${CLASSPATH}:${DEFAULT_ESCONF_DIR}:${HADOOP_CONF_DIR}:${HIVE_CONF_DIR}:$bin/../*:$bin/../lib/*:${HADOOP_HOME}/*:${HADOOP_HOME}/lib/*:${HADOOP_HDFS_HOME}/*:${HADOOP_HDFS_HOME}/lib/*"
Revised:
# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${DEFAULT_ESCONF_DIR}:${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"
CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}:${HIVE_CONF_DIR}:$bin/../*:$bin/../lib/*:${HADOOP_HOME}/*:${HADOOP_HOME}/lib/*:${HADOOP_HDFS_HOME}/*:${HADOOP_HDFS_HOME}/lib/*"
Reference: https://launchpad.support.sap.com/#/notes/2516409
9.8 Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly
Description: Spark controller is configured to use ports 7860 and 7861 by default. Port 7860 is used to exchange requests or messages with SAP HANA, and port 7861 is used by SAP HANA to fetch the data (which is referred to as tunneling). In this scenario, the data is sent from the Hadoop cluster nodes (executors) through spark controller to SAP HANA. An alternative to tunneling is peer-to-peer parallel data transfer, whereby SAP HANA connects to, and fetches data from, each Spark executor node in the network directly.
Peer-to-peer parallel data transfer is used when there is an increased amount of data transfers, and when having multiple ports open is not a security concern (such as in an internal network).
To support peer-to-peer setting, ports 56000 to 58000 must be open on the Spark executor nodes and the sap.hana.p2p.transfer.enabled configuration parameter must be set to true.
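The configuration side of the peer-to-peer setting is a single property in the spark controller hanaes-site.xml file, sketched here:

```xml
<property>
  <name>sap.hana.p2p.transfer.enabled</name>
  <value>true</value>
</property>
```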
Reference: https://launchpad.support.sap.com/#/notes/2554425
9.9 Configure SAP HANA Spark Controller for Non-Proxy Server Environments
Description: In a non-proxy server environment, you may see an error such as, "Result set was not fetched by connected Client" if the ports are not reachable.
(Version 2.0 SP01 PL01 only) Spark controller uses ports 7860 and 7861 by default. However, spark controller supports an SAP HANA connection to the Spark executor nodes through the port range 56000 to 58000 for non-proxy server environments. If ports 56000 to 58000 are not available in this scenario, you may see an error such as: Result set was not fetched by connected Client.
Solution: For Spark controller installations configured for a non-proxy server environment, ensure that each Spark executor node is reachable and that port connections in the 56000–58000 range are open.
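Port reachability can be spot-checked from the SAP HANA server with a short script. This is only a sketch: the executor hostname below is a placeholder, and it relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Spark executor node accepts TCP connections on
# a port in the 56000-58000 range. Hostname is a placeholder.

check_port() {
  local host=$1 port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "port ${port} on ${host} is open"
  else
    echo "port ${port} on ${host} is unreachable"
  fi
}

# Sample a few ports across the range (replace with a real executor host):
for port in 56000 57000 58000; do
  check_port executor1.example.com "$port"
done
```

A fully exhaustive scan of all 2001 ports is rarely necessary; checking the range boundaries plus a few ports in between usually surfaces firewall misconfigurations.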
Also, maintain these entries in the /etc/hosts file on the SAP HANA server:
● Hostname
● FQDN (Fully Qualified Domain Name)
● IP address of the Spark executor node
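For illustration, an /etc/hosts entry covering all three items might look like the following (the IP address and names are placeholders):

```
# <IP address>   <FQDN>                  <hostname>
10.0.0.21        executor1.example.com   executor1
```

One such line is needed for each Spark executor node that SAP HANA connects to directly.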
Reference: https://launchpad.support.sap.com/#/notes/2554388
9.10 SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date-Related Built-ins
Description: When the filter condition for moving data from Hadoop to SAP HANA involves a date-related built-in, for example ADD_MONTHS(), an incorrect number of records may be moved, which can lead to lost records in some situations. No error may be reported when this happens.
This situation can arise when the input date passed to these built-ins is not specified in the YYYY-MM-DD format.
Solution: To work around this scenario, supply the input date in the YYYY-MM-DD format.
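As an illustration (the table and column names are hypothetical), a filter that supplies the input date in the required format:

```sql
-- Hypothetical filter on a Hadoop-backed table; the literal passed to
-- ADD_MONTHS() uses the required YYYY-MM-DD format.
SELECT order_id, order_date
FROM   sales_orders
WHERE  order_date < ADD_MONTHS('2017-01-01', 3);
```

A literal such as '01/01/2017' or '2017-JAN-01' in the same position is the pattern that can silently move the wrong number of records.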
Reference: https://launchpad.support.sap.com/#/notes/2443093
9.11 Data Warehousing Support
See the following SAP Notes:
● https://launchpad.support.sap.com/#/notes/2456468
● https://launchpad.support.sap.com/#/notes/2290922
SAP HANA Spark Controller Installation Guide: Troubleshooting
Important Disclaimers and Legal Information
Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence.
Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If, when referring to members of both sexes, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This ensures that the documentation remains comprehensible.
Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for transparency (see: https://help.sap.com/viewer/disclaimer).
go.sap.com/registration/contact.html
© 2018 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.
Please see https://www.sap.com/corporate/en/legal/copyright.html for additional trademark information and notices.