
SAP HANA Spark Controller Installation Guide
SAP HANA Spark Controller 2.0 SP02
Document Version: 1.0 – 2017-07-26
PUBLIC


Content

1 Getting Started with SAP HANA Spark Controller . . . . . 5
1.1 Audience . . . . . 6
1.2 Hardware and Operating System Support . . . . . 6
1.3 SAP HANA Spark Controller Releases . . . . . 7
1.4 Download SAP HANA Spark Controller . . . . . 7
1.5 Compatibility . . . . . 7
1.6 Overview of Startup and Configuration Files . . . . . 8

2 Hadoop Integration . . . . . 10

3 Installing SAP HANA Spark Controller . . . . . 11
3.1 Ambari . . . . . 11
    Installation Prerequisites (Ambari) . . . . . 12
    Modify mapreduce.application.classpath . . . . . 13
    Install SAP HANA Spark Controller Using Ambari . . . . . 14
    Post Installation Checks and Troubleshooting (Ambari) . . . . . 16
    Modify Configuration Properties (Ambari) . . . . . 17
    Start or Stop SAP HANA Spark Controller . . . . . 18
    SAP HANA Ambari Integration . . . . . 18
    Uninstall from Ambari . . . . . 20
3.2 Cloudera Manager . . . . . 20
    Installation Prerequisites (Cloudera Manager) . . . . . 21
    Install SAP HANA Spark Controller Using Cloudera Manager . . . . . 22
    Post Installation Checks and Troubleshooting (Cloudera) . . . . . 28
    Modify Configuration Properties (Cloudera Manager) . . . . . 29
    Run the Diagnostic Utility . . . . . 30
    Start or Stop the SAP HANA Spark Controller Service . . . . . 32
    Uninstall from Cloudera Manager . . . . . 33
3.3 Manual . . . . . 34
    Installation Prerequisites (Manual) . . . . . 35
    Manually Install SAP HANA Spark Controller . . . . . 36
    Post Installation Checks and Troubleshooting (Manual) . . . . . 38
    Start SAP HANA Spark Controller . . . . . 39
    Uninstall from a Manual Installation . . . . . 40
    Update SAP HANA Spark Controller . . . . . 41
3.4 MapR . . . . . 41
    Installation Prerequisites (MapR) . . . . . 42
    Add Properties for YARN . . . . . 43
    Install SAP HANA Spark Controller for MapR Distributions . . . . . 43

4 Configuring SAP HANA Spark Controller . . . . . 44
4.1 Port Configurations . . . . . 44
4.2 Update Configuration Parameters when Upgrading . . . . . 45
4.3 Environment Variables for hana_hadoop-env.sh . . . . . 46
4.4 Spark DataNucleus JARs . . . . . 47
    Configuring the DataNucleus JARs . . . . . 48
4.5 Configuration Properties . . . . . 49
    Resource Allocation . . . . . 55
    Configuring Cloud Deployment Example . . . . . 59
    Distribution Deployment Configuration Templates . . . . . 59
4.6 Configure hanaes User Proxy Settings . . . . . 60
    Ambari . . . . . 62
    Cloudera Manager . . . . . 63
4.7 Configuring a Proxy Server . . . . . 64
4.8 Enabling Remote Caching . . . . . 65
    Remote Caching Configuration Parameters . . . . . 66

5 Setting Up Security . . . . . 67
5.1 LDAP Authentication . . . . . 67
5.2 Configure Auditing . . . . . 68
5.3 Kerberos . . . . . 69
    Configure Kerberos SSO on the SAP HANA Server . . . . . 72
5.4 SSL . . . . . 74
    Configure SSL Mode . . . . . 76
    OpenSSL Command Syntax for SAP HANA Spark Controller . . . . . 77
    Configure SSL Example . . . . . 78
    SSL Mode Configure Parameters . . . . . 80

6 Create a Remote Source . . . . . 82

7 Create a Custom Spark Procedure . . . . . 83
7.1 Privileges . . . . . 84
7.2 Virtual Package System Built-Ins . . . . . 85

8 Data Lifecycle Manager . . . . . 87

9 Troubleshooting . . . . . 88
9.1 Troubleshooting Diagnostic Utility . . . . . 88
    Run the Diagnostic Tool . . . . . 89
    Error Messages . . . . . 91
9.2 SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x . . . . . 94
9.3 SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2 . . . . . 96
9.4 Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari . . . . . 96
9.5 SAP Vora - SAP HANA Spark Controller Fails To Start . . . . . 97
9.6 The TINYINT Datatype is not Supported When Accessing Apache Hive Tables . . . . . 98
9.7 Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical" . . . . . 98
9.8 Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly . . . . . 99
9.9 Configure SAP HANA Spark Controller for Non-Proxy Server Environments . . . . . 99
9.10 SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date Related Built-ins . . . . . 100
9.11 Data Warehousing Support . . . . . 100


1 Getting Started with SAP HANA Spark Controller

SAP HANA Spark controller enables SAP HANA in-memory access to data stored in HDFS files on a Hadoop cluster.

Spark controller allows SAP HANA to access Hadoop data through an SQL interface. Primarily working with Spark SQL, spark controller connects to an existing Hive metastore. The Spark SQL adapter is a plug-in for SAP HANA Smart Data Access (SDA) that provides access to spark controller, and moderates query execution and data transfer.

Spark controller is assembled, installed, and configured on a Hadoop cluster. YARN and Spark Assembly JAR are used to connect to the HDFS system, with YARN as the resource management layer for the Hadoop ecosystem. If you are already running SDA scenarios and connecting to HiveServer through an ODBC driver, you can migrate to Spark controller with minimal configuration.

On the Hadoop side, Spark controller provides an SQL interface to the underlying Hive data using Spark SQL, and performs the following functions:

● Facilitates query execution and enables SAP HANA to fetch data in a compressed columnar format.
● Supports SAP HANA-specific query optimizations and secure communication.
● Facilitates data transfer between SAP HANA and executor nodes.

SAP HANA Spark Controller Architecture

* You can configure Spark controller to use HDFS as extended storage for aging data from SAP HANA via Data Lifecycle Manager (DLM). See Data Lifecycle Manager [page 87].


Related Information

Audience [page 6]
Hardware and Operating System Support [page 6]
SAP HANA Spark Controller Releases [page 7]
Download SAP HANA Spark Controller [page 7]
Compatibility [page 7]
Overview of Startup and Configuration Files [page 8]

1.1 Audience

The information in this document is intended for technical users who:

● Want to install and configure Spark controller on an existing Hadoop cluster.
● Have prior knowledge of monitoring and troubleshooting Hadoop cluster operations.
● Are familiar with Hadoop and Spark, as an operator or developer.

1.2 Hardware and Operating System Support

SAP HANA spark controller is supported on these hardware platforms and operating systems.

● Supported Hardware Platforms:
○ Intel-based hardware platforms
○ IBM Power Systems running Red Hat 7.2 for Hortonworks 2.6

● Supported Operating Systems for SAP HANA:
○ Red Hat Enterprise Linux for SAP Solutions
○ Red Hat Enterprise Linux for SAP HANA
○ SUSE Linux Enterprise Server for SAP Applications
○ SUSE Linux Enterprise Server

Note
For Debian systems, use the alien program to convert the Linux RPM package format to the Debian package format. For more information, see Manually Install SAP HANA Spark Controller [page 36].
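For example, the conversion and installation on a Debian host could look like the following sketch (the package file name is a placeholder; alien and dpkg must be available on the host):

sudo alien --to-deb --scripts <spark_controller_package>.rpm
sudo dpkg -i <spark_controller_package>.deb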


1.3 SAP HANA Spark Controller Releases

Spark controller is a component of the SAP HANA platform edition.

Spark controller SP releases are delivered on the same schedule as the SAP HANA platform; however, on occasion, Spark controller is updated with patch level (PL) releases outside of the SAP HANA platform edition schedule. Be sure to check for the latest patch level releases here:

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0

1.4 Download SAP HANA Spark Controller

The SAP HANA Spark controller installation and upgrade packages are available on the SAP Software Download Center.

Procedure

1. To download the installation media for SAP HANA Spark controller, go to the Software Downloads Site.
2. Click SUPPORT PACKAGES & PATCHES.
3. Go to: By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0.
4. Choose the installation package.

1.5 Compatibility

Spark controller 2.0 SP02 is compatible with SAP HANA 2.0 versions, as well as SAP HANA 1.0 SPS 12.

Features introduced in later versions of Spark controller, such as remote caching, are not supported on SAP HANA 1.0 SPS12.

Earlier versions of Spark controller are not compatible with SAP HANA 2.0 versions.

For more information about version compatibility, distribution support, and SAP Vora compatibility, see the SAP HANA Spark Controller Compatibility Matrix.


1.6 Overview of Startup and Configuration Files

This document references a number of files that define environment variables and properties, or that specify directory locations, for Hadoop distributions, YARN, Hive, Spark, and Spark controller.

Table 1: Supported Files

File Description and Default Location

hanaes_site.xml Lists properties for configuring Spark controller.

Properties that are set in this file for Spark (properties that start with spark) are also respected. You can use these standard Spark parameters to change the general behavior of Spark controller. A minimal example of this file appears after this table.

Location: /usr/sap/spark/controller/conf/hanaes_site.xml

hana_hadoop-env.sh Lists environment variables that specify Spark controller dependencies, such as the directory locations of components and libraries.

Location: /usr/sap/spark/controller/conf/hana_hadoop-env.sh

hive-site.xml Provides Spark-application configurations for Hive, such as where to connect to a remote Hive Metastore server. Set the HIVE_CONF_DIR environment variable to the location of this file.

Location: /etc/hive/conf/hive-site.xml

Spark assembly JAR file A JAR file that bundles all the required dependencies for running Spark.

Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /opt/cloudera/parcels/CDH-<version>/lib/spark/spark-assembly-<version>.jar

hana.spark.controller

File required to start Spark controller.

Location: /var/run/hanaes/hana.spark.controller

hana_controller.log A log file for the hanaes user. The hanaes user is created when installing Spark controller.

Location: /var/log/hanaes/hana_controller.log

spark-defaults.conf Configuration file for setting the default environment for all Spark jobs submitted on the local host.

Location: /usr/sap/spark/controller/conf/spark-defaults.conf

yarn-site.xml Stores YARN configuration options.

Location: The location depends on your Hadoop distribution. An example of the location for MapR is: /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml


File Description and Default Location

mapred-site.xml Lists configuration parameters that override the default values for MapReduce parameters.

Location: /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/mapred-site.xml

core-site.xml Lists the configuration parameters for Hadoop, such as I/O settings that are common to HDFS and MapReduce, and informs the Hadoop daemon of the location of the NameNode running on the cluster.

Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /etc/hadoop/<service_name>/conf/core-site.xml

hdfs-site.xml Lists the configuration settings for HDFS daemons: NameNode, Secondary NameNode, and DataNodes.

Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /etc/hadoop/<service_name>/conf/hdfs-site.xml

hadoop-env.sh Lists Hadoop specific environment variables.

Location: The location depends on your Hadoop distribution. An example of the location for Cloudera is: /etc/hadoop/<service_name>/conf/hadoop-env.sh

log4j.properties

.sparkStaging Spark and YARN staging files under the /user/hanaes directory on HDFS. To see the files, enter: hdfs dfs -ls /user/hanaes (on MapR, use: hadoop fs -ls /user/hanaes).
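For orientation, hanaes_site.xml uses the standard Hadoop-style configuration XML format. A minimal sketch might look like the following (property names are taken from this guide; the values are illustrative only):

<configuration>
  <property>
    <name>sap.hana.es.server.port</name>
    <value>7860</value>
  </property>
  <property>
    <name>spark.executor.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>4g</value>
  </property>
</configuration>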


2 Hadoop Integration

There are two methods available to enable communication between SAP HANA and your Hadoop system: SAP HANA spark controller and the Hive ODBC driver.

This document describes installing and configuring SAP HANA spark controller. For additional information about SAP HANA and Hadoop integration using the ODBC driver, see the SAP HANA Spark Controller Installation Guide.


3 Installing SAP HANA Spark Controller

Depending on your distribution, different options are available for installing and configuring SAP HANA spark controller.

● Cloudera – Use Cloudera Manager or manually install Spark controller.
● Hortonworks – Use Ambari or manually install Spark controller.
● MapR – Manually install Spark controller.

Related Information

Ambari [page 11]
Cloudera Manager [page 20]
Manual [page 34]
MapR [page 41]

3.1 Ambari

If you are using the Hortonworks distribution of Hadoop, set up SAP HANA spark controller using the Ambari Web UI.

Non-root users do not have permission to install spark controller using Ambari, and must install and configure spark controller manually.

Related Information

Installation Prerequisites (Ambari) [page 12]
Modify mapreduce.application.classpath [page 13]
Install SAP HANA Spark Controller Using Ambari [page 14]
Post Installation Checks and Troubleshooting (Ambari) [page 16]
Modify Configuration Properties (Ambari) [page 17]
Start or Stop SAP HANA Spark Controller [page 18]
SAP HANA Ambari Integration [page 18]
Uninstall from Ambari [page 20]


3.1.1 Installation Prerequisites (Ambari)

This section provides a list of installation prerequisites when using Spark controller with Ambari.

Task Description

SAP HANA Install one of these spark controller compatible SAP HANA versions:

● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02

Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.

Use Apache Spark 1.5.2 or 1.6.x.

The Hortonworks distributions HDP 2.4, 2.5, and 2.6 have been tested. Additional Hortonworks versions are expected to be compatible with Spark controller, but have not been tested.

Download Spark controller

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0

Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:

Hortonworks – $SPARK_HOME defaults to /usr/hdp/current/spark-client.

During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the Spark assembly file.

If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .

Proxy User Spark controller impersonates the currently logged-in user while accessing Hadoop services. You can configure user proxy settings for the SAP HANA hanaes user in the core-site.xml file (a core-site.xml sketch follows this table).

See Configure hanaes User Proxy Settings [page 60].

Specify the HDP version
Modify the mapreduce.application.classpath property. See Modify mapreduce.application.classpath [page 13].
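As a sketch of the proxy-user entries referenced above, the hanaes user is typically granted impersonation rights through standard Hadoop proxy-user properties in core-site.xml (the wildcard values below are permissive examples only; restrict hosts and groups as appropriate for your cluster):

<property>
  <name>hadoop.proxyuser.hanaes.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hanaes.groups</name>
  <value>*</value>
</property>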


3.1.2 Modify mapreduce.application.classpath

(Hortonworks distribution only) Using Ambari, modify the mapreduce.application.classpath property to specify the HDP version.

Procedure

1. Find the Hadoop version you are using. The version is listed as a directory under /usr/hdp.

For example, if you are using Hadoop 2.4 and the directory is listed as /usr/hdp/2.4.2.0-258, the Hadoop version (or <hdp.version>) is 2.4.2.0-258.

2. From the menu on the left side of the Ambari Dashboard, click MapReduce2.
3. Click the Configs tab.
4. Click the Advanced tab, then expand Advanced mapred-site.
5. Update the mapreduce.application.classpath property by removing all entries containing ${<hdp.version>} and replacing them with the appropriate HDP value.

An example of this property is:

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${<hdp.version>}/hadoop/lib/hadoop-lzo-0.6.0.${<hdp.version>}.jar:/etc/hadoop/conf/secure


An example of the property with the HDP version is:

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.4.2.0-169/hadoop/lib/hadoop-lzo-0.6.0.2.4.0.0-258.jar:/etc/hadoop/conf/secure

6. Save your configuration changes and restart the MapReduce2 service and related services such as YARN and Hive.

3.1.3 Install SAP HANA Spark Controller Using Ambari

Install SAP HANA Spark controller using the Ambari Web UI.

Prerequisites

See Installation Prerequisites (Ambari) [page 12].

You must have root permission.

Procedure

1. Move the 2.0 SP02 download package to your Ambari Server host. See Download SAP HANA Spark Controller [page 7].

2. Extract the download file, which is a TAR archive file that contains three types of installer binaries: RPM, Ambari, and Cloudera.

3. Copy and extract the controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz file to your Ambari Server services folder to create a new Spark controller directory. For example:

sudo cp controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz /var/lib/ambari-server/resources/stacks/HDP/<hdp_version>/services
sudo tar -xvf controller.distribution-<spark_controller_version>-Ambari-Archive.tar.gz

4. Restart the Ambari Server by executing either of the following commands:

○ ambari-server restart
○ sudo /usr/sbin/ambari-server restart

5. Log in to Ambari and select Actions > Add Service:


6. In the Add Service Wizard, select Choose Services from the left pane, then check the box for SparkController service in the right pane.

7. From the Assign Masters menu, assign the SparkController service to one of the hosts on your cluster.
8. Select Customize Services, then select the SparkController tab:

a. Expand the Advanced hana_hadoop-env option. The hana_hadoop-env template includes a list of environment variables and paths. Replace #export HANA_SPARK_ASSEMBLY_JAR= with the path to the Spark assembly JAR file. The Spark assembly JAR file must be located on the same node where Spark controller is installed. For example:

export HANA_SPARK_ASSEMBLY_JAR=$SPARK_HOME/lib/spark-assembly-1.5.2-hadoop2.6.0.jar

b. (Required for Spark versions 1.6.x) Add the paths of the datanucleus-* libraries to HADOOP_CLASSPATH. For example:

export HADOOP_CLASSPATH=/etc/hive/conf:/usr/hdp/<hdp_version>/spark/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/spark/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/spark/lib/datanucleus-rdbms-<version>.jar

c. Expand the Advanced hanaes-site option and set the sap.hana.es.server.port number to 7860. Specify the spark.executor.instances and spark.executor.memory values.

The properties set in this form are the default values defined in the hanaes_site.xml file included with the Spark controller.

Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].

d. Expand the Custom hanaes-site option. Click Add Property and add the following property:

Key: spark.sql.hive.metastore.sharedPrefixes


Value: com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop

9. Click Next, then Deploy to install and start Spark controller on the host that you indicated.
10. To verify that Spark controller started correctly without waiting for the Ambari Web UI to update the controller's status, log on to the host on which you installed Spark controller and check the log at /var/log/hanaes/hana_controller.log.
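For example, you can follow the log from a shell on that host (same path as above):

tail -f /var/log/hanaes/hana_controller.log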

Next Steps

● Post Installation Checks and Troubleshooting (Ambari) [page 16]
● Troubleshooting Diagnostic Utility [page 88]
● Create a Remote Source [page 82]
● Add the Ambari URL to SAP HANA Cockpit [page 19]

3.1.4 Post Installation Checks and Troubleshooting (Ambari)

This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.

Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Troubleshooting Diagnostic Utility [page 88].

The table below provides information about post installation configurations and troubleshooting.

Task Description

Environment variables When installing Spark controller, you set the location for HANA_SPARK_ASSEMBLY_JAR. This path is set in the hana_hadoop-env.sh file. To confirm or change the path, see Environment Variables for hana_hadoop-env.sh [page 46]. A brief sketch of these settings appears after this table.

Configure spark controller dependencies in the hana_hadoop-env.sh file. Use the template for your distribution as a starting point. See Environment Variables for hana_hadoop-env.sh [page 46].


Task Description

Hive Metastore To allow spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in spark controller’s classpath.

The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_PATH environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.

Configure hanaes When installing Spark controller, you set properties through the Advanced hanaes-site and Custom hanaes-site menus. These configuration properties are defined in the hanaes_site.xml file. Properties can be added or changed in this file, or using Ambari. See Modify Configuration Properties (Ambari) [page 17] and Configuration Properties [page 49].

These properties are required:

● sap.hana.es.server.port
● spark.sql.hive.metastore.sharedPrefixes
● spark.executor.memory
● spark.executor.instances

Cloud deployment Configure spark controller for cloud deployment. See Configuring Cloud Deployment Example [page 59].

Upgrading When updating to a new version of Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].
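As an illustration of the environment variable settings referenced above, the relevant lines in hana_hadoop-env.sh might look like the following sketch (variable names as quoted in this table; the assembly JAR path is an example and depends on your distribution and Spark version):

export HANA_SPARK_ASSEMBLY_JAR=$SPARK_HOME/lib/spark-assembly-<version>.jar
export HIVE_CONF_PATH=/etc/hive/conf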

3.1.5 Modify Configuration Properties (Ambari)

You can modify or view the default SAP HANA Spark controller properties in the Ambari Web UI.

Procedure

1. On the main Ambari page, click SparkController, then select the Configs tab.

Spark and Spark controller support the same properties. For more information about supported properties, see the Apache Spark documentation.

2. Add and enable additional properties for Spark controller by expanding Advanced hanaes-site, or Custom hanaes-site. See Configuration Properties [page 49].

3. Save and restart Spark controller from Ambari.


3.1.6 Start or Stop SAP HANA Spark Controller

You can start, stop, or restart SAP HANA Spark controller from the Ambari UI.

Procedure

1. In the Summary section, select SparkController:

2. From the SparkController drop-down menu, select Start, Stop, or Restart.

3.1.7 SAP HANA Ambari Integration

The Apache Ambari integration with SAP HANA cockpit allows you to enter the Ambari Web URL in the cockpit and access Hadoop cluster monitoring functionality using Ambari Web UI.

After entering the Ambari Web URL, you can navigate to the Apache Ambari website and monitor Hadoop clusters. You can also use Ambari to set up Spark controller.


Related Information

Add the Ambari URL to SAP HANA Cockpit [page 19]

3.1.7.1 Add the Ambari URL to SAP HANA Cockpit

Add Ambari to the SAP HANA cockpit.

Context

After going to the Ambari Web URL, you can navigate to the Apache Ambari website and monitor Hadoop clusters.

Procedure

1. Import the cockpit delivery unit package (HANA_HADOOP_AMBR.tgz) into SAP HANA studio.

2. Using an account with the SAP HANA System Administrator role, assign these roles to all users requiring access to the web application site:

○ com.sap.hana.hadoop.cockpit.ambari.data::Administrator
○ sap.hana.uis.db::SITE_DESIGNER
○ sap.hana.uis.db::SITE_USER

3. In the Systems view, right-click the system name and select Configuration and Monitoring > Open SAP HANA Cockpit to launch the SAP HANA cockpit.

4. Log in to the cockpit using the SAP HANA username and password.
5. Select Hadoop Cluster on the home page.

If the Hadoop Cluster tile is not available, select Tile Catalog from the menu and add the Hadoop Cluster tile to a desired group.

6. For each cluster, provide a Hadoop cluster name and an Ambari URL (for example, http://my.ambari.server.url:8080).

7. Select a Hadoop cluster and click on Go to navigate to the Ambari website.


3.1.8 Uninstall from Ambari

Follow these steps to remove SAP HANA Spark controller.

Procedure

1. Log in to the Ambari Server terminal.

To put the service in installed status and then delete it from the Ambari Web UI, execute:

curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-By: ambari" -X GET "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"

curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-by:ambari" -i -k -X PUT -d '{"ServiceInfo": {"state": "INSTALLED"}}' "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"

curl -u <Ambari_User_Name>:<Ambari_password> -H "X-Requested-By: ambari" -X DELETE "http://<host_name>:8080/api/v1/clusters/<cluster_name>/services/SparkController"

To remove the SparkController directory from Ambari Server:

sudo rm -rf /var/lib/ambari-server/resources/stacks/HDP/<hdp_version>/services/SparkController/

2. To remove and uninstall Spark controller, log in to the Ambari Agent terminal:

To remove the SparkController directory from Ambari Agent, execute:

sudo rm -rf /var/lib/ambari-agent/cache/stacks/HDP/<hdp_version>/services/SparkController/

To uninstall Spark controller using Ambari, execute:

sudo su
rm -rf /usr/sap/spark/controller
userdel hanaes
rm -rf /var/log/hanaes/

3.2 Cloudera Manager

If you are using the Cloudera distribution of Hadoop, set up Spark controller using the Cloudera Manager.


Related Information

Installation Prerequisites (Cloudera Manager) [page 21]
Install SAP HANA Spark Controller Using Cloudera Manager [page 22]
Post Installation Checks and Troubleshooting (Cloudera) [page 28]
Modify Configuration Properties (Cloudera Manager) [page 29]
Run the Diagnostic Utility [page 30]
Start or Stop the SAP HANA Spark Controller Service [page 32]
Uninstall from Cloudera Manager [page 33]

3.2.1 Installation Prerequisites (Cloudera Manager)

This section provides a list of installation prerequisites when using Spark controller with Cloudera Manager.

Task Description

SAP HANA Install one of these spark controller compatible SAP HANA versions:

● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02

Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.

Use the Spark version distributed with Cloudera Manager.

The Hadoop Cloudera distributions CDH 5.10 and 5.11 have been tested. Additional Cloudera versions are expected to be compatible with Spark controller, but have not been tested.

Download Spark controller

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0

Note
For Cloudera Manager installations with Spark controller versions 2.0 SP01 and higher, use the parcels distribution format. You can use either parcels or packages if you manually deploy your Cloudera cluster. If you have already installed Cloudera using packages, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.


Task Description

Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:

Cloudera – $SPARK_HOME defaults to /usr/lib/spark in package installations and /opt/cloudera/parcels/CDH/lib/spark in parcel installations.

During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the Spark assembly file (see the listing example after this table).

If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .

Proxy User Spark controller impersonates the currently logged-in user while accessing Hadoop services. You can configure user proxy settings for the SAP HANA hanaes user in the core-site.xml file.

See Configure hanaes User Proxy Settings [page 60].
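One way to locate the assembly file on a parcel-based Cloudera installation is a simple listing (a sketch; the exact parcel directory name depends on your CDH version):

ls /opt/cloudera/parcels/CDH*/lib/spark/lib/spark-assembly-*.jar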

3.2.2 Install SAP HANA Spark Controller Using Cloudera Manager

Install SAP HANA Spark controller using the Cloudera Manager.

Prerequisites

See Installation Prerequisites (Cloudera Manager) [page 21].

Procedure

1. Move the Spark controller 2.0 SP02 download package to the location where the cloudera-scm-server service is running on your Cloudera Manager host. See Download SAP HANA Spark Controller [page 7].

2. Extract the download file, which is a TAR archive file that contains three types of installer binaries: RPM, Ambari, and Cloudera.

After extracting the file, you can choose the operating system, version, and distribution for your system.
3. Extract the SAPHanaSparkController-<version>-cloudera-<distribution>.tar.gz file to the Cloudera Manager directory. For example:

sudo tar -xvf SAPHanaSparkController-<version>-cloudera-<distribution>.tar.gz -C /opt/cloudera


You see the directory SAPHanaSparkController-<version>-cloudera-<distribution>. Within this directory are the csd and parcel-repo directories:

total 7524
SAPHanaSparkController-2.2.0-el7/
SAPHanaSparkController-2.2.0-el7/csd/
SAPHanaSparkController-2.2.0-el7/csd/SAPHanaSparkController-2.0.0.jar
SAPHanaSparkController-2.2.0-el7/parcel-repo/
SAPHanaSparkController-2.2.0-el7/parcel-repo/SAPHanaSparkController-2.0.0-sles11.parcel
SAPHanaSparkController-2.2.0-el7/parcel-repo/SAPHanaSparkController-2.0.0-sles11.parcel.sha
SAPHanaSparkController-2.2.0-el7/parcel-repo/manifest.json

4. When you install Cloudera Manager, you have the option to activate single user mode. In single user mode, the Cloudera Manager Agent and all the processes run by services managed by Cloudera Manager are started as the single configured user and group cloudera-scm. If you are in single user mode, change the owner permissions to cloudera-scm for both user and group. For example:

[root cloudera]# chown cloudera-scm:cloudera-scm -R SAPHanaSparkController-2.2.0-el7/
[root cloudera]# ll
total 20
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 June 18 00:02 csd
drwxr-xr-x 2 root root 4096 June 18 00:03 parcel-cache
drwxr-xr-x 2 cloudera-scm cloudera-scm 4096 June 18 00:04 parcel-repo
drwxr-xr-x 4 root root 4096 June 18 00:05 parcels
drwxr-xr-x 4 cloudera-scm cloudera-scm 4096 June 7 01:57 SAPHanaSparkController-2.2.0-el7

5. Copy the files from SAPHanaSparkController-<version>-<distribution> into their respective folders in /opt/cloudera/. For example:

sudo cp <extract_path>/SAPHanaSparkController-<version>-<distribution>/csd/* /opt/cloudera/csd/
sudo cp <extract_path>/SAPHanaSparkController-<version>-<distribution>/parcel-repo/* /opt/cloudera/parcel-repo/

Directory Files

/opt/cloudera/csd
○ SAPHanaSparkController-<version>.jar

/opt/cloudera/parcel-repo
○ manifest.json
○ SAPHanaSparkController-<version>-<distribution>.parcel
○ SAPHanaSparkController-<version>-<distribution>.parcel.sha

6. To make sure the Cloudera server service can access the command scripts to start and stop Spark controller, run the following command on the console of your Cloudera server host. For example, from the /opt/cloudera/parcel-repo directory:

sudo service cloudera-scm-server restart

7. Log in to your Cloudera Manager Web UI. The Cloudera Manager displays that changes have been made since the last restart, and indicates that you need to restart the Cloudera Management Service.


8. Restart Cloudera Manager. This pulls the scripts and the descriptor from the SAPHanaSparkController-<version>.jar:

9. Refresh the browser. When the services restart and their indicators display green check marks, you have restarted successfully.


10. Download, distribute, and activate new parcels:

a. Select Hosts > Parcels to download a new parcel.
b. Select Distribute to distribute the Spark controller parcel to each host on your Cloudera cluster:

c. Select Activate to create the hanaes user, and the sapsys group on your host.

11. On the home page of Cloudera Manager Web UI, choose Add a Service from the drop-down menu to the right of the cluster name:

12. From the Service Type menu, select SAP HANA Spark Controller. Click Continue.

13. In the Add SAP HANA Spark Controller Service to Cluster menu, choose the host on which to install your Spark controller service and click Continue. Although it is possible to install on more than one host, SAP recommends that you install on only one host because the Spark controller service uses YARN for memory.


14. In the Add SAP HANA Spark Controller Service to Cluster, Review Changes menu, confirm the configuration settings for the installation, adjusting for your cluster.

This form provides a subset of properties and environment variables that you can define. After installing Spark controller you can define additional configuration properties.

Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].

15. In the Path of Spark Assembly Jar field, add the absolute path of the Spark assembly JAR file in the Review Changes menu. For example:

/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar


Cloudera is compatible with the Spark assembly versions that are included with the Cloudera Manager version you installed.

16. The Extra Classpath for Spark Executors field is set with a default value. Make sure the classpath for Spark executors is correct. For example:

Key: spark.executor.extraClassPath

Value: /opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hive/lib/*

17. Click Continue to install and start Spark controller.

18. Click Continue and restart any dependent services.
19. When the installation scripts are complete and the HDFS hanaes directory is created, click Finish.

You can confirm that Spark controller is started by viewing the log file. For example:

tail -f /var/log/hanaes/hana_controller.log

Next Steps

● Post Installation Checks and Troubleshooting (Cloudera) [page 28]
● Run the Diagnostic Utility [page 30]
● Create a Remote Source [page 82]


3.2.3 Post Installation Checks and Troubleshooting (Cloudera)

This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.

Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Run the Diagnostic Utility [page 30].

The table below provides information about post installation configurations and troubleshooting.

Task Description

Environment variables When installing Spark controller, you set the location for Path of Spark Assembly Jar. This path is set in the hana_hadoop-env.sh file. To confirm or change the path, see Environment Variables for hana_hadoop-env.sh [page 46].

Hive Metastore To allow spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in spark controller’s classpath.

The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_PATH environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.

Configure hanaes When installing Spark controller, you set the properties for Extra Classpath for Spark Executors and Add SAP HANA Spark Controller Service to Cluster. These configuration properties are defined in the hanaes_site.xml file. Properties can be added or changed in this file, or using Cloudera Manager. See Modify Configuration Properties (Cloudera Manager) [page 29] and Configuration Properties [page 49].

These properties are required:

● sap.hana.es.server.port
● spark.executor.extraClassPath
● spark.executor.memory
● spark.executor.instances

Cloud deployment Configure spark controller for cloud deployment. See Configuring Cloud Deployment Example [page 59].

Upgrading When updating to a new version of Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].


3.2.4 Modify Configuration Properties (Cloudera Manager)

After you install SAP HANA Spark controller, you can modify configuration properties.

Prerequisites

You have installed Spark controller and Spark Controller Service is listed on your Cloudera Manager home page.

Procedure

1. On the Cloudera Manager SAP HANA Spark Controller page, click the Configuration tab.

You see the current property settings:

2. The SAP HANA Spark Controller Master Advanced Configuration Snippet (Safety Valve) for Conf/hanaes-site.xml field allows you to set properties for DLM scenarios, cache properties, SSL properties, and so on.

You can set properties in the Editor or as XML:
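For example, a property added as XML through the safety valve uses the same format as a hanaes_site.xml entry (the property and value below are illustrative only; substitute the settings your scenario requires):

<property>
  <name>spark.executor.memory</name>
  <value>8g</value>
</property>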


3. When you are finished setting properties, click Save Changes and restart Spark controller.

3.2.5 Run the Diagnostic Utility

Use the diagnostic utility from Cloudera Manager to check your Spark controller installation for errors.

Prerequisites

Spark controller is installed, configured, and is stopped.

Procedure

1. From the Cloudera Manager home page, click on SAP HANA Spark Controller.

2. On the SAP HANA Spark Controller Status page, select the Instances tab.


3. On the Instances page, click on the SAP HANA Spark Controller Master instance.

4. From the Actions pull-down, select Run DiagnosticUtility.

5. Confirm that you want to run the utility from the pop-up menu.

The Diagnostic Utility checks your Spark controller installation and provides information about installation and configuration errors, and recommendations. The information is displayed in three tabs: stdout, stderr, and Role Log.

6. Click on the stdout tab to view recommendations and errors. Click Full log file to see the entire output.

Each file provides error and debugging information:


○ stdout – Shows the error codes that are assigned to the different types of errors.○ stderr – Shows information for debugging purposes.○ Role Log – Shows the hana_controller.log file.

7. Use the Error Messages table to find information about the error codes listed in the output: Error Messages [page 91].

3.2.6 Start or Stop the SAP HANA Spark Controller Service

Start or stop SAP HANA Spark controller from the Cloudera Manager Web UI.

Prerequisites

You have installed Spark controller and Spark Controller Service is listed on your Cloudera Manager home page.

Procedure

● On the Cloudera Manager SAP HANA Spark Controller Status tab, select Start from the SAP HANA Spark Controller Actions drop-down menu.

You see that the SAP HANA Spark Controller Master is started and succeeded. On the Cloudera Manager Web UI home page the status of Spark Controller services is green.

From the same drop-down menu, select Stop to stop the service.

Note
Do not use Restart. Using Restart displays an error similar to "Abruptly stop the remaining roles. Failed to execute command Stop on service Spark Controller", and the details view shows "At least one role must be started". To restart the service, use Stop, then Start.


3.2.7 Uninstall from Cloudera Manager

Uninstall Spark controller using Cloudera Manager by stopping and deleting the Spark controller service, then cleaning the Spark controller installation files from the file system.

Context

Cloudera provides universal instructions for uninstalling managed software. For more information, see the Uninstalling Cloudera Manager and Managed Software in Cloudera's Installation and Upgrade documentation.

Procedure

1. Stop and delete the Spark controller service from Cloudera Manager.

a. Log on to the Cloudera Manager Web UI and locate the SAP HANA Spark Controller Actions menu for your cluster.

b. From the pull-down menu, select Stop.

The Stop command shows the details and warning when shutting down the controller.

c. Once Spark controller has stopped, select Delete from the SAP HANA Spark Controller Actions menu to remove the service from your cluster.

d. From the Cloudera Manager navigation bar, select Host > Parcels.
e. Locate SAPHanaSparkController in the parcel list and choose Deactivate.
f. Choose Remove from host.


g. Choose Delete.

2. Remove the Spark controller installation files from the file system.

During the installation of the SAP HANA Spark Controller, you copied installer files to the Cloudera location. In this step, you will remove them.

a. From the command line, remove all of the Spark controller installation files from the following Cloudera directories:

/opt/cloudera/csd
/opt/cloudera/parcel-repo
/opt/cloudera/parcel-cache
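As a sketch only, and assuming the installer artifacts carry the SAPHanaSparkController naming shown in the parcel list above (verify the actual file names in your environment before deleting anything), the cleanup could look like this:

# remove the CSD, the downloaded parcel, and any cached copies
sudo rm -f /opt/cloudera/csd/SAPHanaSparkController*
sudo rm -f /opt/cloudera/parcel-repo/SAPHANASPARKCONTROLLER*
sudo rm -rf /opt/cloudera/parcel-cache/SAPHANASPARKCONTROLLER*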

b. Restart the Cloudera Server to apply all of the changes:

sudo service cloudera-scm-server restart

c. Log on to the Cloudera Manager Web UI and open the drop-down menu next to the Cloudera Management Service. Select Restart to restart the service.

This ensures that any scripts that may still be cached are removed from Cloudera Manager.

3.3 Manual

Install and configure SAP HANA spark controller manually.

If you are using Hortonworks or Cloudera distributions, spark controller can be installed using Ambari or Cloudera Manager, respectively.

The MapR distribution requires manual installation.

Related Information

Installation Prerequisites (Manual) [page 35]
Manually Install SAP HANA Spark Controller [page 36]
Post Installation Checks and Troubleshooting (Manual) [page 38]
Start SAP HANA Spark Controller [page 39]
Uninstall from a Manual Installation [page 40]
Update SAP HANA Spark Controller [page 41]


3.3.1 Installation Prerequisites (Manual)

This section provides a list of prerequisites when installing Spark controller manually.

Task Description

SAP HANA Install one of these spark controller compatible SAP HANA versions:

● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02

Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.

For more information, see the compatibility matrix: SAP HANA Spark Controller 2.0 SP02 Support for Hadoop Distributions.

Download Spark controller

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0

Note
For Cloudera Manager installations with Spark controller versions 2.0 SP01 and higher, use the parcels distribution format. You can use either parcels or packages if you manually deploy your Cloudera cluster. If you have already installed Cloudera using packages, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.

Update YARN properties

For MapR installations, add YARN properties to the yarn-site.xml file.

See Add Properties for YARN [page 43].

Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:

● Cloudera – $SPARK_HOME defaults to /usr/lib/spark in package installations and /opt/cloudera/parcels/CDH/lib/spark in parcel installations.

● Hortonworks – $SPARK_HOME defaults to /usr/hdp/current/spark-client.

● MapR – $SPARK_HOME defaults to /opt/mapr/spark/spark-<version>; the assembly file is lib/spark-assembly-<version>-mapr-<version>-hadoop<version>-mapr-<version>.jar

During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the Spark assembly file.

If you do not have Spark installed on your cluster, download it from the Apache Mirror website at https://spark.apache.org/downloads.html .

Note
For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example).

Proxy User For Hortonworks and Cloudera only. MapR distributions do not require proxy settings.

Spark controller impersonates the currently logged-in user while accessing Hadoop services. You configure user proxy settings for the SAP HANA hanaes user in the core-site.xml file.

See Configure hanaes User Proxy Settings [page 60].

3.3.2 Manually Install SAP HANA Spark Controller

Install SAP HANA Spark controller manually.

Prerequisites

See Installation Prerequisites (Manual) [page 35].

Context

Although you can deploy your Cloudera or Hortonworks cluster manually, SAP recommends that you use Cloudera Manager or Ambari for the installation. For more information, see Cloudera Manager [page 20] and Ambari [page 11].


Procedure

1. The Spark controller download file is a TAR archive that contains three types of installer binaries: RPM, Ambari, and Cloudera. Extract the RPM file.

2. Install Spark controller on one of the Hadoop cluster nodes by executing one of the following.

○ Linux – sudo rpm -i sap.hana.spark.controller-<version>.noarch.rpm
○ Debian Linux – sudo alien -c -i sap.hana.spark.controller-<version>.noarch.rpm

Note
If alien is not available, install it by executing:

sudo apt-get install alien

The installation path is predefined to: /usr/sap/spark. The hanaes account is created during the installation process and is the owner of the installed Spark controller directories and files. The default owning group is sapsys.

3. Log on as the hanaes user: sudo su - hanaes.

4. Confirm that the following folder structure is created, and is owned by user hanaes.

/usr/sap/spark/controller/conf
/usr/sap/spark/controller/bin
/usr/sap/spark/controller/lib
/usr/sap/spark/controller/utils

5. Spark and YARN create a staging directory under the /user/hanaes directory on HDFS. Make sure that user hanaes has full access to the directory by creating this directory manually and assigning the necessary permissions to the hanaes user. For example:

hdfs dfs -mkdir /user/hanaes; hdfs dfs -chown hanaes:hdfs /user/hanaes; hdfs dfs -chmod 744 /user/hanaes;

For MapR:

su mapr
hadoop fs -mkdir /user/hanaes
hadoop fs -chown hanaes:sapsys /user/hanaes

6. Configure Spark controller.

a. If you are not logged in as the hanaes user, execute: sudo su - hanaes
b. Go to /usr/sap/spark/controller/conf, and review or edit the hana_hadoop-env.sh file.

The default file contains the following environment variables:
○ HANA_SPARK_ASSEMBLY_JAR – (Required) Enter the path of the Spark assembly JAR. This location depends on your Hadoop distribution.
○ HADOOP_CLASSPATH – (Optional) Enter the location of the Hadoop and Hive libraries.
○ HADOOP_CONF_DIR – (Required) Directory containing all *-site.xml files. The default location is /etc/hadoop/conf.
○ HIVE_CONF_DIR – (Required) Directory containing the hive-site.xml file. The default location is /etc/hive/conf.


Additional configuration properties and templates are listed here: Environment Variables for hana_hadoop-env.sh [page 46].

c. Go to /usr/sap/spark/controller/conf, and review or edit the hanaes-site.xml file.

The default file contains the following properties:
○ sap.hana.es.server.port – (Required) 7860 is the default listening port, which exchanges requests with SAP HANA. The listening port +1 is used to transmit data.
○ sap.hana.es.driver.host – (Optional) If the host on which Spark controller is installed has multiple network interfaces, or if your hosts file contains an ambiguous resolution, you can specify the name or IP address of the host where Spark controller is running.
○ sap.hana.executor.count – (Required) Number of YARN executors.
○ sap.hana.executor.memory – (Required) Allocates the amount of memory to use per YARN executor node.
○ sap.hana.hadoop.engine.facades – (Required) Defines the Hadoop processing engines.
○ sap.hana.es.warehouse.dir – (Required for DLM scenarios only) Defines the DLM warehouse directory location.

Additional configuration properties are listed here: Configuration Properties [page 49].

7. Start Spark controller:

cd /usr/sap/spark/controller/bin
./hanaes start

Next Steps

● Post Installation Checks and Troubleshooting (Manual) [page 38]
● Troubleshooting Diagnostic Utility [page 88]
● Create a Remote Source [page 82]

3.3.3 Post Installation Checks and Troubleshooting (Manual)

This section provides an overview of the configuration that is performed during the installation. Depending on your environment, additional configuration may be required.

Note
You can run the diagnostic utility to check your Spark controller installation and configuration. See Troubleshooting Diagnostic Utility [page 88].

The table below provides information about post installation configurations and troubleshooting.


Task Description

Include the datanucleus path

For MapR installations, add the DataNucleus classes to the classpath. See Spark DataNucleus JARs [page 47].

Hive Metastore To allow spark controller to connect to the Hive metastore, ensure that Hive is running and available, and that hive-site.xml is available in spark controller’s classpath.

The default Hive configuration path is /etc/hive/conf, and should be available in Spark controller's classpath. If your path differs from the default, update the HIVE_CONF_DIR environment variable in the hana_hadoop-env.sh file, located in the conf directory: /usr/sap/spark/controller/conf.

Configure hanaes Configure Spark controller properties in the hanaes-site.xml file.

These properties are required:

● sap.hana.es.server.port
● spark.executor.extraClassPath (Cloudera)
● spark.sql.hive.metastore.sharedPrefixes (Hortonworks and MapR)
● spark.executor.memory
● spark.executor.instances

Depending on your environment, additional properties may be required. For more information, see Configuration Properties [page 49].

Upgrading When updating to a new version of Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats. See Update Configuration Parameters when Upgrading [page 45].

3.3.4 Start SAP HANA Spark Controller

Once you have updated all configuration parameters, follow these steps to start SAP HANA Spark controller.

Procedure

1. Start Spark controller:

Option Description

Hortonworks sudo su - hanaes; cd /usr/sap/spark/controller/bin/; ./hanaes start;

Cloudera sudo su - hanaes; export HADOOP_HOME=/usr/lib/hadoop/; cd /usr/sap/spark/controller/bin/; ./hanaes start;

Note
For the exact location of HADOOP_HOME, see the Cloudera documentation. Some Cloudera Hadoop versions may also have a specific JAVA_HOME. You can either add these environment variables to /usr/sap/spark/controller/conf/hana_hadoop-env.sh, or export them each time.

MapR sudo su - hanaes; cd /usr/sap/spark/controller/bin/; ./hanaes start;

2. Check the /var/log/hanaes/hana_controller.log file to make sure that Spark controller is started.
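For example, you can watch the log while the controller starts; the exact startup messages vary by version, so simply check that no errors are reported:

tail -f /var/log/hanaes/hana_controller.log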

Next Steps

You can now consume remote data using SAP HANA smart data access.

3.3.5 Uninstall from a Manual Installation

Remove the Spark controller package to uninstall Spark controller. Optionally, you can remove the hanaes user, and the configuration files.

Procedure

1. Check for the Spark controller package:

rpm -qa | grep sap.hana.spark.controller

2. Remove the Spark controller package:

rpm -e sap.hana.spark.controller-*

3. (Optional) Remove the following structure to remove the hanaes user and the configuration files:

/usr/sap/spark/controller/conf
/usr/sap/spark/controller/bin
/usr/sap/spark/controller/lib
/usr/sap/spark/controller/utils

3.3.6 Update SAP HANA Spark Controller

Manually update the SAP HANA Spark controller installation.

Prerequisites

● Spark controller must be running when executing the upgrade command.
● Check for changes to configuration parameters. See Update Configuration Parameters when Upgrading [page 45].

Procedure

Execute the following command to update the Spark controller installation. For example:

rpm -Uvh sap.hana.spark.controller-2.2.0-1.noarch.rpm

The rpm arguments -v and -h are optional:
○ -v: verbose
○ -h: print hash marks as the package archive is unpacked
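To confirm the installed version afterwards, you can query the package database as shown earlier in this guide:

rpm -qa | grep sap.hana.spark.controller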

3.4 MapR

If you are using the MapR distribution of Hadoop, set up SAP HANA spark controller manually.

Related Information

Installation Prerequisites (MapR) [page 42]
Add Properties for YARN [page 43]
Install SAP HANA Spark Controller for MapR Distributions [page 43]


3.4.1 Installation Prerequisites (MapR)

This section provides a list of prerequisites when installing SAP HANA spark controller manually.

Task Description

SAP HANA Install one of these spark controller compatible SAP HANA versions:

● 1.0 SPS12
● 2.0 SPS00
● 2.0 SPS01
● 2.0 SPS02

Hadoop cluster Your Hadoop cluster requires Hive metastore, YARN, and Spark. The core-site.xml and hdfs-site.xml files must exist with the appropriate configurations for your Hadoop cluster.

For more information, see the compatibility matrix: SAP HANA Spark Controller 2.0 SP02 Support for Hadoop Distributions.

Download Spark controller

Software Downloads Site > By Alphabetical Index (A-Z) > H > SAP HANA PLATFORM EDITION > SAP HANA PLATFORM EDITION 2.0 > HANA SPARK CONTROLLER 2.0

Update YARN properties

Add YARN properties to the yarn-site.xml file.

See Add Properties for YARN [page 43].

Spark assembly file If you have Spark installed on your cluster, the assembly file is located here:

MapR – $SPARK_HOME defaults to /opt/mapr/spark/spark-<version>; the assembly file is lib/spark-assembly-<version>-mapr-<version>-hadoop<version>-mapr-<version>.jar

During the installation process, you will set the HANA_SPARK_ASSEMBLY_JAR variable to the location of the Spark assembly file.

Note
For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example).


3.4.2 Add Properties for YARN

Before you install SAP HANA Spark controller, follow these steps to configure MapR.

Context

For MapR distributions, you can only use the Spark assembly JAR file provided by MapR (not Apache, for example). This allows the application to be generated on YARN. Also, make sure you have installed the Spark service when deploying the cluster.

Procedure

1. Add the following properties to /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml:

<property>
  <name>yarn.application.classpath</name>
  <value>/opt/mapr/hadoop/hadoop-<version>/etc/hadoop/:/opt/mapr/hive/hive-<version>/conf:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-<version>/share/hadoop/common/lib/*</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>102400</value>
</property>

Note
For reference, see the MapR documentation at http://doc.mapr.com/display/MapR/yarn-site.xml.

2. Restart the YARN NodeManagers (the per-node agents).

3.4.3 Install SAP HANA Spark Controller for MapR Distributions

For MapR distributions, you need to install SAP HANA spark controller manually.

See Manual [page 34].


4 Configuring SAP HANA Spark Controller

Configure Spark controller environment variables and override property values.

This section describes the values and properties used to configure Spark controller.

Configuration File Configuration Type

hanaes-site.xml
● Configuration Properties [page 49]
● Limit Resource Allocations [page 56]
● Configuring Cloud Deployment Example [page 59]

hana_hadoop-env.sh

● Environment Variables for hana_hadoop-env.sh [page 46]
● Distribution Deployment Configuration Templates [page 59]

hive-site.xml ● LDAP Authentication [page 67]

Related Information

Port Configurations [page 44]
Update Configuration Parameters when Upgrading [page 45]
Environment Variables for hana_hadoop-env.sh [page 46]
Spark DataNucleus JARs [page 47]
Configuration Properties [page 49]
Configure hanaes User Proxy Settings [page 60]
Configuring a Proxy Server [page 64]
Enabling Remote Caching [page 65]

4.1 Port Configurations

SAP HANA spark controller is configured to use ports 7860 and 7861 by default.

Spark controller uses port 7860 to exchange requests with SAP HANA, and port 7861 is used by SAP HANA to transmit, or “tunnel”, data. When tunneling the data, the data is sent from the Hadoop cluster nodes (executors), through Spark controller, to SAP HANA. Tunneling does not require a proxy server; however, a proxy server can be configured. See Configuring a Proxy Server [page 64].

To confirm ports 7860 and 7861 are available, execute:

netstat -nlp | grep 7860
netstat -nlp | grep 7861


Note
For non-proxy server environments, tunneling data is performed with ports 56000 – 58000 open. The data is sent from the Hadoop node (executor) directly to SAP HANA, and does not go through Spark controller. For more information, see https://launchpad.support.sap.com/#/notes/2554388

4.2 Update Configuration Parameters when Upgrading

When updating to a new version of SAP HANA Spark controller, be aware of new and deprecated configuration parameters, and changes to distribution formats.

● 2.0 SP02
The sap.hana.hadoop.engine.facades parameter has been added in 2.0 SP02. Use this parameter to list the facades connecting to Hadoop processing engines. This parameter replaces sap.hana.hadoop.datastore.

● 2.0 SP01
Spark controller 2.0 SP01 and higher supports only the parcels distribution format for Cloudera Manager installations.
When upgrading:
○ If you have already installed Cloudera using the packages distribution format, install Spark controller manually and use the RPM distribution format. See Manual [page 34]. For more information about Cloudera Manager parcels, see Cloudera's documentation.
○ If you are manually installing Cloudera using packages, be sure to maintain the correct paths in these files:

hanaes-site.xml:

<property>
  <name>spark.executor.extraClassPath</name>
  <value>/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>

hana_hadoop-env.sh.

#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HANAES_LOG_DIR=/var/log/hanaes
export HANA_SPARK_ASSEMBLY_JAR=/usr/lib/spark/lib/spark-assembly-<version>-cdh<version>-hadoop<version>-cdh<version>.jar
export HADOOP_CLASSPATH=/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hive/lib/*
#export HANA_SPARK_ADDITIONAL_JARS=
#export HANAES_CONF_DIR=/etc/hanaes/conf


● 2.0
Support for the hadoop.proxyuser.hanaes.hosts and hadoop.proxyuser.hanaes.groups parameters is new for version 2.0. If you are upgrading from an earlier version, you can use these parameters to configure user proxy settings. See Configure hanaes User Proxy Settings [page 60].

Related Information

Configuration Properties [page 49]

4.3 Environment Variables for hana_hadoop-env.sh

Configure SAP HANA Spark controller dependencies using these environment variables.

Note
Starting with version 1.6 PL1, the Spark controller installation does not respect dependencies copied into either the HDFS or the local file system.

Use the following environment variables to configure Spark controller dependencies in the conf/hana_hadoop-env.sh file.

Variable Name Description

JAVA_HOME By default, Spark controller uses the Java available in the local path. This can be overridden or configured by setting this variable. Default value: None.

HADOOP_HOME (Optional) Directory where all components and libraries are installed. Default value: None.

HADOOP_CONF_DIR Directory where all *-site.xml files are available. Refer to your distribution docu­mentation to identify the location. Default value: /etc/hadoop/conf.

HIVE_CONF_DIR File system location where hive-site.xml is available. Default value for Horton­works and Cloudera: /etc/hive/conf.

Note
The value can be changed for the different distributions of Hadoop. For example, use the value /opt/mapr/hive/hive-1.2/conf for MapR.

HANAES_LOG_DIR Location to which all Spark controller logs are written. Default: /var/log/hanaes.

HANA_SPARK_ASSEMBLY_JAR Required variable that points to the path of Spark assembly JAR. This location depends on your Hadoop distribution. If you manually downloaded the Spark assembly from Apache, specify the location of the assembly JAR. Default value: None.

HADOOP_CLASSPATH (Optional) Location of the Hadoop and Hive libraries. Different Hadoop distributions follow different installation paths; use this variable to configure them. Spark controller tries to locate them automatically. When you are running Spark 1.6, the Joda-Time dependencies are required. Configure the respective locations using this environment variable. Default value: None.

Note
The dependencies provided are made available in the local classpath and are not copied to the running Spark context.

HANA_SPARK_ADDITIONAL_JARS Add additional dependency JARs, separated by colons. Default value: None.

Note
These dependencies are made available in the running Spark context. When connecting to SAP Vora, you can specify the SAP Vora data source dependency with this variable.

HANAES_CONF_DIR (Optional) Use this to override the configuration directory for Spark controller. Default value: /usr/sap/spark/controller/conf.
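As a point of reference, a minimal hana_hadoop-env.sh for a Hortonworks cluster might look like the following; the HDP paths and the assembly JAR file name are assumptions based on the defaults mentioned in this guide and must be adjusted to your installation:

#!/bin/bash
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HANAES_LOG_DIR=/var/log/hanaes
export HANA_SPARK_ASSEMBLY_JAR=/usr/hdp/current/spark-client/lib/spark-assembly-<version>-hadoop<version>.jar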

4.4 Spark DataNucleus JARs

SAP HANA spark controller has a dependency on Spark DataNucleus libraries.

These DataNucleus libraries are required:

● datanucleus-api-jdo.jar – Provides the DataNucleus implementation of the JDO API.
● datanucleus-core.jar – Provides a DataNucleus persistence mechanism.
● datanucleus-rdbms.jar – Provides persistence to the RDBMS datastore.

Depending on the distribution and version, these Spark libraries may be missing from your environment, or your Hadoop environment might include the wrong version of DataNucleus files. Some of the issues are: the libraries are missing from your installation, the currently installed libraries are incompatible with your Hadoop distribution and version, or multiple JAR versions are specified in the classpath.

For example, Apache Spark 1.6 is integrated with Cloudera 5.7 and later. If your Hadoop distribution was configured with a different version of Spark, the DataNucleus libraries included with your Spark JAR files may not be compatible with your distribution, and this will result in errors raised by Spark controller.


If you see errors in the /var/log/hanaes/hana_controller.log file stating that the datanucleus-* classes are not found, check to see if the JARs are missing, or if the incorrect version of the DataNucleus JARs are included in your Hadoop environment. See the documentation for your distribution for more information:

● Cloudera – Product Compatibility Matrix
● Hortonworks – Hortonworks Documentation
● MapR – Interoperability Matrix

Related Information

Configuring the DataNucleus JARs [page 48]

4.4.1 Configuring the DataNucleus JARs

Include the DataNucleus libraries in your hana_hadoop-env.sh configuration.

Note
If you see an error in the /var/log/hanaes/hana_controller.log file stating that the datanucleus-* classes are not found, include these libraries in the hana_hadoop-env.sh configuration.

● Installing Spark Controller Using Ambari

1. Go to SparkController > Configs > Advanced hana_hadoop_env and add the path of the datanucleus-* libraries to HANA_SPARK_ADDITIONAL_JARS. For example:

export HANA_SPARK_ADDITIONAL_JARS=/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/spark-client/lib/datanucleus-rdbms-<version>.jar

2. Save the configurations and restart Spark controller.
● Installing Spark Controller Using Cloudera Manager

1. If you installed Spark controller using Cloudera Manager, go to SAP HANA Spark Controller > Configuration > HANA_SPARK_ADDITIONAL_JARS and add the path of the datanucleus-* libraries. For example:

export HANA_SPARK_ADDITIONAL_JARS=/usr/lib/hive/lib/datanucleus-api-jdo-<version>.jar:/usr/lib/hive/lib/datanucleus-core-<version>.jar:/usr/lib/hive/lib/datanucleus-rdbms-<version>.jar

2. Save the configurations and restart Spark controller.
● Manual Installation on Hortonworks or Cloudera

1. If you installed Spark controller manually on Hortonworks or Cloudera, edit /usr/sap/spark/controller/conf/hana_hadoop-env.sh and add the path of datanucleus-* libraries into HANA_SPARK_ADDITIONAL_JARS.


For example:

export HANA_SPARK_ADDITIONAL_JARS=/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-api-jdo-<version>.jar:/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-core-<version>.jar:/usr/hdp/<hdp_version>/hive-metastore/lib/datanucleus-rdbms-<version>.jar

2. Save the configurations and restart Spark controller.
● Manually Installing Spark Controller on MapR

1. If you installed Spark controller manually on MapR, edit /usr/sap/spark/controller/conf/hana_hadoop-env.sh and add the path of the datanucleus-* libraries to HANA_SPARK_ADDITIONAL_JARS:

export HANA_SPARK_ADDITIONAL_JARS=/opt/mapr/hive/<hive_version>/lib/datanucleus-api-jdo-<version>.jar:/opt/mapr/hive/<hive_version>/lib/datanucleus-core-<version>.jar:/opt/mapr/hive/<hive_version>/lib/datanucleus-rdbms-<version>.jar

2. Save the configurations and restart spark controller.
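If you are not sure which DataNucleus JARs your cluster provides, a quick file system check such as the following can help; the paths mirror the examples above and may differ in your environment:

ls /usr/hdp/<hdp_version>/spark-client/lib/datanucleus-*.jar
ls /usr/lib/hive/lib/datanucleus-*.jar
ls /opt/mapr/hive/<hive_version>/lib/datanucleus-*.jar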

4.5 Configuration Properties

Define values in the hanaes-site.xml file to override default configuration properties for Spark and Spark controller.

The file is located in /usr/sap/spark/controller/conf.

Note
Spark controller respects all other Spark parameters that start with spark. Use these standard Spark parameters to change the general behavior of Spark controller. See https://spark.apache.org/docs/1.6.1/configuration.html.
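For example, a standard Spark parameter such as spark.driver.memory could be overridden in the same file using the usual property format (the value shown is purely illustrative):

<property>
  <name>spark.driver.memory</name>
  <value>4g</value>
  <final>true</final>
</property>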

Spark Controller Ports and Hosts

Name Default Values

Description

sap.hana.es.server.port 7860 The Spark controller listening port number that exchanges requests with SAP HANA.

7861 is used by SAP HANA to transmit data. This port is calculated as sap.hana.es.server.port + 1.

sap.hana.es.driver.host None IP address of the host where Spark controller is running. If the host on which Spark controller is installed has multiple network interfaces, or if your hosts file (/etc/hosts) contains ambiguous resolution, maintain the property with the IP address of your host.

sap.hana.dmz.proxy.host None IP address of the proxy server you wish to use for tunneling data through a proxy server.

See Port Configurations [page 44].

Spark Controller Class Path

Name Default Values

Description

spark.executor.extraClassPath (Cloudera)

spark.sql.hive.metastore.sharedPrefixes (Hortonworks and MapR)

None Defines the location of shared libraries. Provides extra classpath entries to prepend to the classpath of executors.

See Distribution Deployment Configuration Templates [page 59].

Spark Controller HDFS Location

Name Default Values

Description

sap.hana.es.spark.yarn.jar None Location of the Spark assembly JAR. Obtain the JAR from either Apache mirrors or the respective Hadoop vendors.

sap.hana.es.lib.location None Location where all open source libraries are available.


Spark Controller Timeout

Name Default Values

Description

sap.hana.connection.timeout 120 Connection timeout in seconds (for all network traffic).

sap.hana.datatransfer.timeout 2 Connection timeout in minutes for SAP HANA to transfer data. The query is canceled when the time has elapsed.

Spark Controller Cache

Name Default Values

Description

sap.hana.es.cache.max.capacity 500 Maximum number of queries to be cached on disk.

sap.hana.es.enable.cache False Enables remote caching for Spark controller.

See Enabling Remote Caching [page 65].

Resource Allocations

Name Default Values

Description

spark.executor.instances None The number of YARN executors.

Do not specify both spark.executor.instances and spark.dynamicAllocation.enabled. If you do, this will result in an error.

spark.executor.memory None Allocates available memory for YARN executor nodes.

spark.yarn.queue None Allocates the percentage of resources on the dedicated queue for Spark controller on the YARN Resource Manager.

spark.dynamicAllocation.enabled False Enables dynamic allocation of executors. Dynamic allocation also requires spark.shuffle.service.enabled to be set to true.

spark.shuffle.service.enabled False Enables the external shuffle service. The external shuffle service must be set on each worker node in the same cluster for this service to work properly.

spark.dynamicAllocation.minExecutors None Minimum number of executors for Spark controller.

spark.dynamicAllocation.maxExecutors None Maximum number of executors for Spark controller.

spark.dynamicAllocation.initialExecutors None Initial number of executors for Spark controller.

See Resource Allocation [page 55].

Spark Connection to Hive (Ambari)

Name Default Values

Description

spark.sql.hive.metastore.sharedPrefixes None Defines the location of shared libraries. When using Ambari, use this property to configure the connection access from Spark to the Hive metastore.

See Install SAP HANA Spark Controller Using Ambari [page 14].

Compression

Name Default Values

Description

sap.hana.enable.compression False Enables or disables compression. Compression is only used for data exchange.


Hadoop Processing Engine Facades

Name Default Values

Description

sap.hana.hadoop.engine.facades sparksql Comma-separated list of facades connecting to Hadoop processing engines, such as sparksql, hadoop (MapReduce), and vora.

Note
This property replaces sap.hana.hadoop.datastore.

Valid values are sparksql, hadoop, and vora.

Data Storage Format

Name Default Values

Description

sap.hana.es.data.format auto Specifies the data storage format when moving data from Spark controller to Hadoop.

Valid values are parquet, orc, or auto.

Cloud Deployment

Name Default Values

Description

sap.hana.ar.provider None Address of translation service. Useful for cloud scenarios.

See Configuring Cloud Deployment Example [page 59].


DLM Scenarios

Name Default Values

Description

sap.hana.es.warehouse.dir None When Spark controller is used for DLM scenarios, this property should point to a valid HDFS directory where you plan to store all aging data.

Security

Name Default Values

Description

sap.hadoop.kerberos.principal None Kerberos principal to be used for starting Spark controller.

Note
Replaces the spark.yarn.principal property.

sap.hadoop.kerberos.keytab None Kerberos key tab file to be used for the principal.

Note
Replaces the spark.yarn.keytab property.

sap.hana.es.ssl.enabled False Indicates whether to use secure communication.

sap.hana.es.ssl.keystore None Path to the PKCS keystore file.

sap.hana.es.ssl.keystore.password None Password for the PKCS keystore file.

sap.hana.es.ssl.truststore None Path to the JKS truststore file. Set the value to Java Trust Store if explicit trust is warranted.

sap.hana.es.ssl.truststore.password None Password for the truststore file.

sap.hana.es.ssl.verify.hostnames False Indicates whether to check the hostname against the certificate used for the SSL handshake.

sap.hana.es.ssl.clientauth.required True Indicates whether client authentication is required.

sap.hana.auditing.enabled False If enabled, controller will log all the queries executed.

sap.hana.proc.security.disabled False Disables the security manager for procedure execution.

sap.hana.allow.nonkerberos.client False Allows non-Kerberos clients to connect to a Kerberos-enabled Spark controller. Set this property to true only when running SAP HANA instances that do not support Kerberos SSO. Otherwise, this setting allows every logged-in SAP HANA user to connect to Spark controller.

See Setting Up Security [page 67].

Spark Controller Properties for Older Versions of Spark and SAP HANA

Name Default Values

Description

sap.hana.sql.tungsten.enabled None Effective only when using Spark 1.5.2. Improves Spark execu­tion by optimizing Spark jobs for CPU and memory efficiency.

sap.hana.use.dot.separator False Earlier versions of SAP HANA and Spark controller used a dot (.) as the separator between the schema name and table name, such as SYSTEM.SALES_ORDER. This was changed in later versions. If you are running an SAP HANA 1.0 version, up to SPS12, set this property to true. When running SAP HANA 2.0 and higher, this property is not required.

Related Information

Resource Allocation [page 55]
Configuring Cloud Deployment Example [page 59]
Distribution Deployment Configuration Templates [page 59]

4.5.1 Resource Allocation

Spark configurations for resource allocation are set in spark-defaults.conf or hanaes-site.xml.

You can limit the number of executors with the Spark property spark.executor.instances, or create a dedicated queue for Spark controller on YARN ResourceManager.


You can also specify dynamic allocation, which allows Spark to dynamically scale the cluster resources allocated for the Spark application. When dynamic allocation is enabled and there is a backlog of pending tasks for a Spark application, the application can request new executors. When the application becomes idle, its executors are released and can be acquired by other Spark applications.

Note
These two methods are not compatible. Do not specify both spark.executor.instances and spark.dynamicAllocation.enabled.

Related Information

Limit Resource Allocations [page 56]
Enable Dynamic Allocation of Executors [page 57]

4.5.1.1 Limit Resource Allocations

Follow these steps to limit resource allocation for SAP HANA spark controller.

Context

Spark controller runs on YARN in yarn-client mode. A Spark context requests executors to run the job. To prevent Spark from taking up all available resources — thus leaving no resource for any other application running on YARN ResourceManager — perform one of the following (listed in order of preference considering ease of configuration and stability):

Procedure

● Limit the number of executors by setting the <desired_number_of_executors> parameter in the hanaes-site.xml file:

<property>
  <name>spark.executor.instances</name>
  <value><desired_number_of_executors></value>
  <final>true</final>
</property>

● Create a dedicated queue for spark controller on YARN ResourceManager and allocate a percentage of resources to that queue. After you create the queue, maintain the <newly_created_queue> parameter in hanaes-site.xml:

<property>
  <name>spark.yarn.queue</name>
  <value><newly_created_queue></value>
  <final>true</final>
</property>

See the YARN documentation for information about YARN queue creation.
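As an illustration only, assuming the YARN CapacityScheduler is in use, a dedicated queue named hanaes with 30% of the cluster capacity could be defined in capacity-scheduler.xml roughly as follows; consult the YARN documentation for the authoritative syntax and for other schedulers:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,hanaes</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.hanaes.capacity</name>
  <value>30</value>
</property>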

4.5.1.2 Enable Dynamic Allocation of Executors

Enable dynamic allocation of executors by specifying a minimum, maximum, and initial number of executors for Spark. This ensures that the computing capacity is elastic and robust for scenarios where the Hadoop cluster is shared by various data processing applications.

Context

The following steps describe how to configure dynamic allocation for manual installations by editing property files.

Procedure

1. After installing spark controller, add the following spark. properties to the hanaes-site.xml file, which is typically located in the /usr/sap/spark/controller/conf directory:

These spark. properties are not specific to spark controller and are provided as an example of how to configure Spark dynamic allocation. See the Spark documentation for information about dynamic allocation properties and how to determine the appropriate values for your environment.

<property>
  <name>spark.shuffle.service.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.enabled</name>
  <value>true</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.minExecutors</name>
  <value>4</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.maxExecutors</name>
  <value>8</value>
  <final>true</final>
</property>
<property>
  <name>spark.dynamicAllocation.initialExecutors</name>
  <value>4</value>
  <final>true</final>
</property>

2. Edit the yarn-site.xml file.

○ Cloudera Manager – Log in to Cloudera Manager, navigate to YARN > Configuration, and select YARN Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml.
○ Ambari – Log in to the Ambari Web UI and select YARN > Config.
○ Manual – Open the yarn-site.xml file in a text editor.

3. Add the following properties and values:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

4. Copy the spark-<version>-yarn-shuffle.jar file from Spark to the Hadoop YARN classpath on all the NodeManager hosts. The YARN classpath directory is typically /usr/lib/hadoop-yarn/lib.

For Hortonworks, this folder is typically located in /usr/hdp/<hdp_version>/hadoop-yarn/lib.
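For example, on a Hortonworks node the copy might look like the following; the source path is an assumption and the version placeholders must be replaced with the values used in your cluster:

cp /usr/hdp/current/spark-client/lib/spark-<version>-yarn-shuffle.jar /usr/hdp/<hdp_version>/hadoop-yarn/lib/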

5. Save the changes, then restart YARN and the node manager.

Hortonworks

Context

For some older Hortonworks versions, you may also need to perform the following steps.

Procedure

1. Locate and open the mapred-site.xml file, or in the Ambari Web UI, select MapReduce2 > Configs.

2. Update the property mapreduce.application.classpath by removing all entries containing <hdp_version>.

3. Restart the MapReduce job.


4.5.2 Configuring Cloud Deployment Example

Configure SAP HANA Spark controller for cloud deployment.

Context

Note
You do not need to set this property if both SAP HANA and the Hadoop cluster are hosted in the cloud.

If your Spark controller is running on a cluster that is hosted in the cloud (such as on Amazon Web Services), your host machines use different IP addresses for internal and external communication. If your SAP HANA is running on premises and a connection is attempted to Spark controller running in the cloud, the external IP address of the hosts is unavailable, causing query executions to fail because SAP HANA cannot reach the executor instances directly.

Procedure

Spark controller versions 1.5 PL 2 and higher offer a built-in translation service for Amazon cloud computing. If you are using AWS, enable proper hostname translation with the following configuration in the hanaes-site.xml file:

<property>
  <name>sap.hana.ar.provider</name>
  <value>com.sap.hana.spark.aws.extensions.AWSResolver</value>
  <final>true</final>
</property>

Cloud providers typically offer a service to translate internal host names to an external host name. You can also implement custom translators by implementing controller extension APIs.

4.5.3 Distribution Deployment Configuration Templates

Use these templates to configure SAP HANA spark controller for your distribution.

Update the hanaes-site.xml file located in the /usr/sap/spark/controller/conf directory.

● Cloudera (Parcel)
Use for a parcel distribution format to deploy your Cloudera cluster. The path for deploying Cloudera differs when using parcels or packages. This path is not related to the file format when installing Spark controller.

<property>
  <name>spark.executor.extraClassPath</name>
  <value>/opt/cloudera/parcels/CDH/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>

● Cloudera (Package)
Use for a package distribution format to deploy your Cloudera cluster. The path for deploying Cloudera differs when using parcels or packages. This path is not related to the file format when installing Spark controller.

<property>
  <name>spark.executor.extraClassPath</name>
  <value>/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>

● Hortonworks

<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>

● MapR

<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>org.apache.hadoop</value>
  <final>true</final>
  <description>Shared libraries to be loaded once</description>
</property>

4.6 Configure hanaes User Proxy Settings

SAP HANA Spark controller impersonates the currently logged-in user while accessing Hadoop services. To allow this impersonation, maintain appropriate configuration parameters in the core-site.xml file.

Note
This section describes configuring user proxy settings for the SAP HANA hanaes user, and is not related to the proxy settings for tunneling with a proxy server.

Use the cluster management tool (Ambari or Cloudera Manager) to add these entries to the core-site.xml file. For more information about proxy user settings, see the Ambari or Cloudera Manager documentation.

The following examples show the different configurations you can set in the core-site.xml file.


This example shows the hosts from which the hanaes user is allowed to perform an impersonation. Ideally, this is the host where Spark controller is installed. Maintaining "*" allows the hanaes user to impersonate from any host.

The syntax is:

hadoop.proxyuser.<proxy_user>.hosts

<property>
  <name>hadoop.proxyuser.hanaes.hosts</name>
  <value>*</value>
</property>

This example shows how a proxy user can be allowed to impersonate user1 and user2 from hosts in the range 10.222.0.0/16 and from 10.113.221.221. This property accepts comma-separated lists of host names, or of IP addresses and IP address ranges in CIDR format:

<property>
  <name>hadoop.proxyuser.super.hosts</name>
  <value>10.222.0.0/16,10.113.221.221</value>
</property>
<property>
  <name>hadoop.proxyuser.super.users</name>
  <value>user1,user2</value>
</property>

If you have a group consisting of users that are connecting from SAP HANA, you can maintain a group name and allow users to impersonate a user belonging to the group. This example shows how to allow hanaes to impersonate users from any group. The syntax is:

hadoop.proxyuser.<proxy_user>.groups

<property>
  <name>hadoop.proxyuser.hanaes.groups</name>
  <value>*</value>
</property>

This example shows how to allow user tom to impersonate a user belonging to group1 and group2:

<property>
  <name>hadoop.proxyuser.tom.groups</name>
  <value>group1,group2</value>
</property>

Related Information

Ambari [page 62]
Cloudera Manager [page 63]


4.6.1 Ambari

Use the Ambari Web UI to add the proxy user entries to the core-site.xml file.

Procedure

1. From the menu on the left side of the Ambari Dashboard, click HDFS.
2. Click the Configs tab.
3. Click the Advanced tab, then expand Custom core-site.
4. Click Add Property.
5. In the Key field, enter hadoop.proxyuser.hanaes.hosts, and in the Value field, enter an asterisk.
6. Click Add Property.
7. In the Key field, enter hadoop.proxyuser.hanaes.groups, and in the Value field, enter an asterisk.

8. Save your configuration changes.


Results

By specifying an asterisk for the hosts and groups, the user named hanaes can impersonate any user belonging to any group from any host.

4.6.2 Cloudera Manager

Use the Cloudera Manager Web UI to add the proxy user entries to the core-site.xml file.

Procedure

1. In the "New layout" mode on Cloudera Manager, go to Cloudera Manager > HDFS > Configuration.
2. In the Search box, type cluster-wide to find Cluster-wide Advanced Configuration Snippet (Safety Valve) and the core-site.xml settings.
3. Click the plus sign (+), and add the properties as follows:
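Following the same proxy user settings described in Configure hanaes User Proxy Settings, the two entries would typically be (an asterisk allows the hanaes user to impersonate from any host and any group):

Name: hadoop.proxyuser.hanaes.hosts   Value: *
Name: hadoop.proxyuser.hanaes.groups  Value: *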

Results

By specifying an asterisk for the hosts and groups, the user named hanaes can impersonate any user belonging to any group from any host.


4.7 Configuring a Proxy Server

You can configure SAP HANA Spark controller to tunnel data in environments where networking landscapes require a proxy server to operate between SAP HANA and Spark controller.

Context

When tunneling data with a proxy server, the data is sent from the Hadoop cluster nodes (executors) through Spark controller, and then through the proxy server to SAP HANA.

Procedure

1. Edit the hanaes-site.xml file to include the sap.hana.dmz.proxy.host parameter with the IP address of your proxy server.

2. Configure your proxy server.

SAP HANA Studio uses the defined proxy server and port when creating the Spark controller remote source. The following is an example configuration through an nginx proxy server:

tcp {
    upstream sparkcontroller {
        server <controller_host_domain>:<configured_port>;
    }
    upstream sparkcontrollerdata {
        server <controller_host_domain>:<configured_port + 1>;
    }
    server {
        listen <listen_proxy_port>;
        proxy_pass sparkcontroller;
    }
    server {
        listen <configured_port + 1>;
        proxy_pass sparkcontrollerdata;
    }
}

3. Create a remote source in SAP HANA Studio:

CREATE REMOTE SOURCE "proxy_spark" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=<password>'

Related Information

Configuration Properties [page 49]


4.8 Enabling Remote Caching

You can enable remote caches in Spark for queries with complex calculations, which allows you to use materialized data for repeated executions of the same query.

Context

SAP HANA dispatching a virtual table query to Spark involves a series of Spark computations that may take from a few minutes to hours to complete a query, depending on the data size in Hadoop and the current cluster capacity. In most cases, the data in the Hadoop cluster is not frequently updated, and successive execution of map and reduce jobs might result in the same queries. Using remote caching with Hadoop through the Spark interface allows you to use the cached remote data set rather than wait for queries to be executed again. The first time you run a statement, you see no performance improvement because of the time it takes to run the job and sort the data in the table. The execution time for the job is reduced the next time you run the same query, because you are accessing materialized data.

Note
Remote caching is available when using SAP HANA 2.0 and Spark controller 2.0 versions and higher.

Use this feature for Hive tables or extended storage tables with low-velocity data (which are not frequently updated).

Procedure

● To enable remote caching, add the following configuration to hanaes-site.xml:

<property>
 <name>sap.hana.es.enable.cache</name>
 <value>true</value>
 <final>true</final>
</property>
<property>
 <name>sap.hana.es.cache.max.capacity</name>
 <value>5000</value>
 <final>true</final>
</property>

This behavior is controlled by using a hint to instruct the optimizer to use remote caching. For example:

After you create a virtual table called spark_activity_log, fetch all erroneous entries for plant 001:

select * from spark_activity_log where incident_type = 'ERROR' and plant ='001' with hint (USE_REMOTE_CACHE, USE_REMOTE_CACHE_MAX_LAG(7200))

When you use the hint USE_REMOTE_CACHE, this result set is materialized in Spark, and subsequent queries are served from the materialized view.


Related Information

Remote Caching Configuration Parameters [page 66]

4.8.1 Remote Caching Configuration Parameters

Use the following configuration parameters for remote caching, which are stored in the indexserver.ini file in the smart_data_access section.

enable_remote_cache ( 'true' | 'false' ) – A global switch to enable or disable remote caching for federated queries. This parameter only supports Hive sources. The USE_REMOTE_CACHE hint is ignored when this parameter is disabled.

remote_cache_validity = 3600 (seconds) – Defines how long the remote cache remains valid. By default, the cache is retained for an hour.

USE_REMOTE_CACHE_MAX_LAG() – Defines how long, in seconds, the remote cache remains valid for an individual query. By default, the cache is retained for an hour. This value overwrites the value of remote_cache_validity.
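Because these are indexserver.ini parameters, they can be changed with standard SAP HANA configuration SQL. A minimal sketch that enables the global switch at the SYSTEM layer (adjust the layer to your landscape):

ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('smart_data_access', 'enable_remote_cache') = 'true'
WITH RECONFIGURE;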


5 Setting Up Security

This section describes how to enable audit logs for Spark controller and provides an overview of how to set up authentication for your Hadoop cluster using Kerberos or SSL.

Related Information

LDAP Authentication [page 67]
Configure Auditing [page 68]
Kerberos [page 69]
SSL [page 74]

5.1 LDAP Authentication

Configure and activate user name and password authentication in SAP HANA spark controller.

There are various LDAP (Lightweight Directory Access Protocol) implementations, such as OpenLDAP or Active Directory (Apache or Microsoft); therefore the configuration steps and syntax may differ depending on the implementation and the Hadoop distribution.

To configure HiveServer2 to use LDAP or configure HiveServer2 to use LDAP over SSL (LDAPS), you must set up an LDAP server and create a user account on the LDAP server. The following links provide details for the different Hadoop distributions.

Distributions Documentation

Hortonworks 2.6.0 Data Access

Cloudera 5.8.x Using LDAP Username/Password Authentication with HiveServer2

MapR 5.0 Authentication for HiveServer2

Property changes should be made to the /etc/hive/conf/hive-site.xml file. The location of this file is set by the HIVE_CONF_DIR environment variable in the hana_hadoop-env.sh file.
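As an illustration, a representative set of HiveServer2 LDAP properties is shown below; the exact property names and values depend on your Hadoop distribution and LDAP implementation, and the URL and base DN are placeholders:

<property>
 <name>hive.server2.authentication</name>
 <value>LDAP</value>
</property>
<property>
 <name>hive.server2.authentication.ldap.url</name>
 <value>ldap://<ldap_host>:389</value>
</property>
<property>
 <name>hive.server2.authentication.ldap.baseDN</name>
 <value>ou=Users,dc=example,dc=com</value>
</property>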

After you set the properties, restart spark controller. You can now create remote sources and virtual tables in SAP HANA using the LDAP server username and password for authentication. For example:

CREATE REMOTE SOURCE SPARK_SQL ADAPTER "sparksql" CONFIGURATION 'server=<YOUR_SPARK_CONTROLLER_HOST>;port=<SPARK_CONTROLLER_PORT>;ssl_mode=disabled;' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hana;password=<password>';


5.2 Configure Auditing

Enable writing to audit logs.

Spark controller supports writing to audit logs. Enable writing audit logs by setting the following property:

<property>
 <name>sap.hana.auditing.enabled</name>
 <value>true</value>
 <final>true</final>
</property>

Audit events are emitted as a set of key-value pairs for the following keys:

Key Value

ugi <user>,<group>[,<group>]*

client <client ip address>

cmd (QUERY_EXECUTE|CREATE_EXTENDED|DROP_EXTENDED)

sql <sql query that was executed>

schema (<schema>|NULL)

table (<table>|NULL)

This is a sample line of the audit output:

2016-07-28 14:32:04,182 ugi=hanaes,sapsys,hdfs client=1.2.3.4 cmd=QUERY_EXECUTE sql="SELECT "ORDERS01_SPARK_TESTA"."O_ORDERKEY", "ORDERS01_SPARK_TESTA"."O_CUSTKEY", "ORDERS01_SPARK_TESTA"."O_ORDERSTATUS", "ORDERS01_SPARK_TESTA"."O_TOTALPRICE", "ORDERS01_SPARK_TESTA"."O_ORDERDATE", "ORDERS01_SPARK_TESTA"."O_ORDERPRIORITY", "ORDERS01_SPARK_TESTA"."O_CLERK", "ORDERS01_SPARK_TESTA"."O_SHIPPRIORITY", "ORDERS01_SPARK_TESTA"."O_COMMENT" FROM "SYSTEM"."ORDERS01" "ORDERS01_SPARK_TESTA"" schema=NULL table=NULL


5.3 Kerberos

Kerberos is a protocol for establishing mutual identity trust, or authentication, for a client and a server, via a trusted third-party.

These overview instructions assume you know how to install Kerberos, or that you already have a working Kerberos key distribution center (KDC) and realm setup.

Enabling Kerberos on Your Hadoop Cluster

You must install Kerberos client packages on all cluster hosts and hosts that will be used to access the cluster. Refer to the security documentation for your Hadoop distribution for information about setting up your Hadoop cluster for Kerberos.

Reference Information:

● Cloudera Manager – Enabling Kerberos Authentication Using the Wizard
● MapR – Configuring Kerberos User Authentication
● Ambari – Setting Up Kerberos for Use with Ambari

Configure Kerberos for SAP HANA Instance

Reference Information:

● SAP HANA Administration Guide – Managing Single Sign-On (SSO) with Kerberos
● SAP HANA Security Guide – Single Sign-On Using Kerberos
● "SAP HANA Smart Data Access Single Sign-On Guide", attached to SAP Note 2303807
● "Single Sign-On with SAP HANA Database using Kerberos and Microsoft Active Directory", attached to SAP Note 1837331
● "SAP HANA SSO/Kerberos: create keytab and validate configuration script", attached to SAP Note 1813724

Kerberos 5 is installed with SAP HANA. It contains the S4U (Service for User) extension needed for user impersonation and constrained delegation. Constrained delegation means that delegation can be done only to a predefined set of services. For the purposes of protocol transition, the computer on which the server is installed needs to be entrusted by the Microsoft Windows Active Directory for delegation. Kerberos protocol is used in SAP HANA for authentication only and not for session management.

These are the SAP HANA .keytab requirements:

● <sidadm_home>/etc/krb5_hdb.conf
● <sidadm_home>/etc/krb5_hdb.keytab
● <sidadm_home>/etc/krb5_host.keytab

If the files are present in the <sidadm_home>/etc folder, the configuration is automatically taken from there; otherwise, the default OS configuration in /etc/krb5.conf and /etc/krb5.keytab is used instead.


For a custom setup of Kerberos, you can overwrite the following variables in /usr/sap/<SID>/home/.customer.sh: KRB5_CONFIG, KRB5_KTNAME, KRB5_CLIENT_KTNAME. For example:

Sample Code

export KRB5_CONFIG=<conf file>
export KRB5_KTNAME=<hdb keytab file>
export KRB5_CLIENT_KTNAME=<host keytab file>

You can connect to an SAP HANA remote source using single sign-on (SSO) with Kerberos. Declare the credential type either globally for the remote source, or individually for a given user. If a user with user level credentials is defined and the remote source has global credentials defined, the global credentials are used; the user level credentials are ignored on the remote source. Do one of the following:

To create global credentials, execute:

CREATE CREDENTIAL COMPONENT 'SAPHANAFEDERATION' PURPOSE <remote_source_name> TYPE 'KERBEROS';

To create user level credentials, execute:

CREATE CREDENTIAL FOR USER <user_name> COMPONENT 'SAPHANAFEDERATION' PURPOSE <remote_source_name> TYPE 'KERBEROS';

On the source SAP HANA server, configure Kerberos to support constrained delegation.

1. Create the file $HOME/etc/krb5_hdb.conf and enable delegation by setting the forwardable parameter for Kerberos service tickets to true in the krb5_hdb.conf file. See the template here: SAP HANA Server Configuration [page 73].

2. On the Microsoft Windows Active Directory server, create a Windows Domain account for the SAP HANA server computer and map a host service principal name (SPN) to it. See WinAD Server Configuration [page 73].

3. Add a keytab entry for the hdb service. The keytab stores the keys needed by the SAP HANA server to take part in the authentication protocol. Map the hdb service of a remote SAP HANA server to a Microsoft Windows Active Directory account in order to be able to log in to the remote SAP HANA server using Kerberos. Enable constrained delegation and protocol transition for your remote SAP HANA server in the Active Directory Users and Computers application. See WinAD Server Configuration [page 73].

Configure Kerberos for SAP HANA Spark Controller

1. In the Active Directory, define hanaes/<hadoop_host_name.domain>.com as the Hadoop node host on which spark controller is running.

2. Delegate (forward tickets) from host/<hana_host_name.domain>.com to the remote SAP HANA server hanaes/<hadoop_host_name.domain>.com services, where <hana_host_name.domain>.com is the host on which the SAP HANA instance is running.

3. Create an SAP HANA user account which has a defined Kerberos external ID. The Kerberos external ID can be set in SAP HANA Studio when the user is edited. This user must have the privileges required to create a remote source.
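For example, a minimal grant for such a user; SPARK_USER is a placeholder, and CREATE REMOTE SOURCE is the SAP HANA system privilege that allows creating remote sources:

GRANT CREATE REMOTE SOURCE TO SPARK_USER;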


4. Create .keytabs for the spark controller Kerberos setup. You should contact your Kerberos administrator to request the generation of .keytabs. The hanaes user is assigned a Kerberos keytab. These are the spark controller .keytab requirements:
○ hanaes.keytab
○ krb5.conf

5. Copy the .keytab file to the spark controller configuration directory at: /usr/sap/spark/controller/conf. Use the following parameters for Spark to connect to the cluster using your Kerberos principal.

<property>
 <name>sap.hadoop.kerberos.keytab</name>
 <value>/usr/sap/spark/controller/conf/hanaes.keytab</value>
</property>
<property>
 <name>sap.hadoop.kerberos.principal</name>
 <value>hanaes/<hadoop_host.domain>@<your_domain></value>
</property>

6. On the Hadoop cluster, add the following properties for Kerberos using Ambari for the Hortonworks Hadoop distribution, or using comparable administration tools provided by other distributions of Hadoop:
1. Add the following rule to hadoop.security.auth_to_local in the HDFS Advanced core-site.xml:

RULE:[2:$1@$0]([email protected])s/.*/hanaes/

2. Add the proxy user parameters to the HDFS Custom core-site.xml:

hadoop.proxyuser.hanaes.groups=*
hadoop.proxyuser.hanaes.hosts=*
7. Create a remote source of type sparksql with the credential type KERBEROS. For example:

CREATE REMOTE SOURCE "spark_krb" ADAPTER "sparksql" CONFIGURATION 'server=Host_name.domain.com;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'KERBEROS'

Host_name.domain.com is the host where spark controller is running.

Related Information

Configure Kerberos SSO on the SAP HANA Server [page 72]


5.3.1 Configure Kerberos SSO on the SAP HANA Server

This section describes how to configure Kerberos in SAP HANA for SDA to use protocol transition and authenticate SAP HANA users automatically on a Windows Domain Active Directory without providing a password (SSO mode).

Prerequisites

Microsoft Windows Server, version 2003 or later.

Architecture Overview

The Kerberos platform architecture used in SSO authentication for connections to SAP HANA remote sources relies on protocol transition, which is assured by Kerberos 5's S4U2Proxy extension.

Protocol transition is a capability of Kerberos used mainly on intermediary platform servers. The user can then be authenticated to Kerberos under their own name, as with SSO. To enable this, the computer on which the server is installed should be entrusted by Windows Active Directory for delegation. Protocol transition supports only constrained delegation, meaning that the delegation can be done only towards a predefined set of services.


SAP HANA Server Configuration

After installing SAP HANA, create the file $HOME/etc/krb5_hdb.conf using the following template. By default, HOME=/usr/sap/<SID>/home:

[libdefaults]
    default_realm = <PLATFORM.DOMAIN>
    clockskew = 300
    default_keytab_name = /usr/sap/<SID>/home/etc/krb5_hdb.keytab
    default_client_keytab_name = /usr/sap/<SID>/home/etc/krb5_host.keytab
    forwardable = true

[realms]
    <PLATFORM.DOMAIN> = {
        kdc = <server>.<server.DNS.domain>:88
        kpasswd_server = <server>.<server.DNS.domain>:464
    }

[domain_realm]
    .<localhost.DNS.domain> = <PLATFORM.DOMAIN>
    <localhost.DNS.domain> = <PLATFORM.DOMAIN>

[logging]
    kdc = FILE:/usr/sap/<SID>/home/log/krb5kdc.log
    admin_server = FILE:/usr/sap/<SID>/home/log/kadmind.log
    default = SYSLOG:NOTICE:DAEMON

where:

<PLATFORM.DOMAIN> – your WinAD NT domain name.

<SID> – your SAP HANA instance SID.

<server> – the WinAD server host.

<server.DNS.domain> – the full DNS domain of the WinAD server host.

<localhost.DNS.domain> – the full DNS domain of your Hana server.

WinAD Server Configuration

On the WinAD server, create a new Windows Domain User account which will act as a UNIX computer account. For more information, see the Windows-Unix inter-operability setup documentation at: https://technet.microsoft.com/en-us/library/bb742433.aspx

1. Add a new user using the Active Directory Users and Computers management tool. The new user account should be specified as user logon name <host>/<fully qualified host name of HANA server>. In Linux Kerberos terminology, the user logon name is also known as UPN (user principal name).

2. Open a command window and set the SPN (service principal name) to <host>/hanaserver.sap.corp. For example:

ktpass -princ host/[email protected] -mapuser krbHana -pass Pass1234 -out krb5_host.keytab

It is mandatory that the SPN and UPN are the same and the host service is set as <host>/hanaserver.sap.corp.


3. Copy the generated krb5_host.keytab file to your SAP HANA server account at /usr/sap/<SID>/home/etc/krb5_host.keytab and to /usr/sap/<SID>/home/etc/krb5_hdb.keytab.

4. Add to this account the hdb service to allow logging into the SAP HANA server using Kerberos. This allows SAP HANA users to validate their Kerberos IDs by first logging into SAP HANA with their Kerberos account before being authorized to use this ID for protocol transition.

5. In a Windows administration console execute:

setspn -S hdb/hanaserver.sap.corp plat_security\hanaserver

You will need to use the NT Domain name, which is case insensitive on Windows.
6. In the Active Directory Users and Computers application, open the created user, click the Delegation tab, and add the hdb service of your SAP HANA server account:
1. Select Trust this user for delegation to specified services only, and choose Use any authentication protocol.
2. Add the hdb service for your server account:

○ Service Type – hdb
○ User or Computer – hanaserver.sap.corp

7. In your Linux account, you will need to add the keytab entry for the hdb service. For example:

klist -k /usr/sap/<SID>/home/etc/krb5_hdb.keytab -etK

8. Start the ktutil tool and execute the following at the command prompt:

ktutil: addent -password -p hdb/[email protected] -k 3 -e rc4-hmac

The KVNO number is typically 3.
9. Provide the account password and execute:

ktutil: wkt /usr/sap/<SID>/home/etc/krb5_hdb.keytab
ktutil: q

If you re-execute the command, 2 entries should appear.
10. Copy or share the /usr/sap/<SID>/home/etc folder to each node of your cluster.
11. Restart your SAP HANA server instance.

5.4 SSL

SSL ensures authentication of the server using a certificate and corresponding key. Use the SSL protocol to perform mutual SSL authentication between SAP HANA and SAP HANA spark controller.

OpenSSL Tools

OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions.


Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available. See the OpenSSL documentation for more information: https://www.openssl.org/docs/manmaster/ .

Private and Public Keys

Two key pairs are used for the SSL/TLS protocol to authenticate, secure, and manage secure connections. One key is a private key, and the other is a public key. They are created together and work together as a pair during the SSL/TLS handshake process (using asymmetric encryption) to set up a secure session:

● Private key – This is a text file used to generate a Certificate Signing Request (CSR), which is a message sent from an applicant to a certificate authority, and is later used to secure and verify connections using the certificate created per that request.

● Public key – This is included as part of your SSL certificate, and works together with your private key to make sure that your data is encrypted and verified. Anyone with access to the public key can verify that the digital signature is authentic without having to know the secret private key.

Certificate Authorities (CA)

The CA is an entity that issues digital certificates that certifies the ownership of a public key. There are two types of CAs: a root CA and an intermediate CA. A trusted SSL certificate must be issued by a CA that is included in the trusted store of the connecting device. The connecting device checks to see if the certificate was issued by a trusted CA. To obtain an SSL certificate from a certificate authority (CA), you must generate a certificate signing request (CSR).

Certificate Signing Request (CSR)

A CSR is a block of encoded text that is given to a certificate authority when applying for an SSL certificate. It is usually generated on the server where the certificate will be installed and contains information that will be included in the certificate such as the organization name, domain name, locality, and country. It also contains the public key that will be included in the certificate. A private key is usually created at the same time that you create the CSR, making a key pair.

Certificate Chain

The certificate chain is an ordered list of certificates that contains an SSL certificate and certificate authority (CA). If the certificate was not issued by a trusted CA, the connecting device checks to see if the certificate of the issuing CA was issued by a trusted CA. The list of SSL certificates, from the root certificate to the end-user certificate, represents the SSL certificate chain:

● The root certificate is generally embedded in your connected device.
● Installing the intermediate SSL certificate depends on the environment. For example, Apache requires you to bundle the intermediate SSL certificates and assign the location of the bundle to the SSLCertificateChainFile configuration. Conversely, NGINX requires you to package the intermediate SSL certificates in a single bundle with the end-user certificate.

Personal Security Environments (PSE)

Certificates can be stored in the database, or in trust and key stores located in the file system. These certificates are contained in personal security environments (PSE) files located in the file system. PSEs are referred to as certificate collections, and are used when the certificates are required to secure internal communication channels using the system public key infrastructure (system PKI), and HTTP client access using the SAP Web Dispatcher administration tool, or the SAPGENPSE tool, both of which are delivered with SAP HANA. If you are using OpenSSL, you can also use the tools provided with OpenSSL.

Java KeyStore (JKS)

JKS (PKCS12) is a repository for security certificates. This repository is for authorization certificates, or public key certificates, and corresponding private keys that are used for SSL encryption.

Related Information

Configure SSL Mode [page 76]
OpenSSL Command Syntax for SAP HANA Spark Controller [page 77]
Configure SSL Example [page 78]
SSL Mode Configure Parameters [page 80]

5.4.1 Configure SSL Mode

To create a remote source in SSL mode, perform mutual SSL authentication between SAP HANA and SAP HANA spark controller. For security purposes, enable SSL mode in production scenarios.

The following are the high-level steps you perform to configure SSL mode.

1. Create a private key, SSL certificate chain, and certificate signing request (CSR).
The private key is a text file used to generate a CSR. The certificate authority (CA) issues a digital certificate that certifies the ownership of a public key, and the certificate chain is an ordered list of certificates that contain an SSL certificate and CAs. A public key is included as part of your SSL certificate.
Use the OpenSSL tools to create the files, then copy them to the following locations:
1. Add the private key and certificate chain to the SAP HANA keystore personal security environments (PSE) file located in the file system of the machine on which you installed SAP HANA.
2. Add the root certificate that you created to the spark controller JKS (PKCS12 file) trust store on the machine on which you installed spark controller.


Install both spark controller and its associated JKS (PKCS12) file on the machine that is part of the Hadoop cluster.

2. Create another private key, SSL certificate chain, and CSR. This key can use the same or a different certificate authority.
You can create this private key on any machine where the OpenSSL tool is located, then copy it to the machines where you installed spark controller and SAP HANA:
1. Add the private key and certificate chain to the spark controller keystore JKS (PKCS12 file) on the machine on which you installed spark controller.
2. Add the root certificate to the SAP HANA PSE file, located in the file system of the machine on which you installed SAP HANA.
3. Make the appropriate changes in the hanaes-site.xml file. See SSL Mode Configure Parameters [page 80] for a list of configuration parameters that are specific to SSL.
4. Restart spark controller.

5.4.2 OpenSSL Command Syntax for SAP HANA Spark Controller

This section describes the openssl options and arguments used in this document.

openssl is a command line tool for using the various cryptography functions of the OpenSSL cryptography library. For more information about OpenSSL commands, see https://www.openssl.org/docs/manmaster/ .

The following are openssl options and arguments which are used in the example for configuring mutual SSL authentication between SAP HANA and spark controller.

● CA – specifies to sign the certificate request from a user.
● CAKey – specifies to sign the certificate key request from a user.
● CAcreateserial – the first time you use your CA to sign a certificate you can use the CAcreateserial option, which creates a ca.srl file containing a serial number. The next time you use your CA, use the CAserial option.
● certfile – file from which to read additional certificates.
● days – the number of days for which the certificate is valid.
● export – exports the PKCS12 file.
● extensions – specifies the extensions to be added when a certificate is issued.
● extfile – file containing certificate extensions to use. If not specified, no extensions are added to the certificate.
● in – input filename from which to read the certificate.
● inkey – file from which to read the private key.
● keyout – stores the private key in the specified file name.
● new – specifies that this is a new request.
● newkey – specifies that this is for generating a new private key.
● nodes – meaning "no DES". This option specifies that you do not want the private key in a PKCS12 file encrypted.
● out – stores the certificate request in the specified file name.
● pkcs12 – specified to create and parse a PKCS12 file.
● req – generates a certificate signing request.
● rsa:2048 – specifies the bit length of the private key. For smaller keys you can use 1024 or 512. The strength of the key should match the type of service your certificate authority is providing to you.
● sha1 – specifies to use Secure Hash Algorithm 1 (the 160-bit cryptographic hash function).
● x509 – certificate utility. x509 can be used to display certificate information, convert certificates, sign certificate requests, or edit certificate trust settings.

5.4.3 Configure SSL Example

This example describes configuring SSL authentication between SAP HANA and SAP HANA spark controller.

Context

You can create a CA (Certificate Authority) and use the CA to sign a certificate, or use a self-signed certificate. This example shows how to use a self-signed certificate.

The /etc/ssl/openssl.cnf file is the general configuration file for the OpenSSL program. You can configure the expiration date of your keys, the name of your organization, the address, and so on. This file is also referred to as <openssl conf>.

Note
See OpenSSL Command Syntax for SAP HANA Spark Controller [page 77] for a list of command options used in this example and their descriptions.

Procedure

1. Log in to your SAP HANA system as the <sid>adm user.

2. Create a personal security environments (PSE) file for SAP HANA. This syntax creates an unencrypted 2048-bit RSA private key and the associated self-signed certificate in the privacy-enhanced mail (PEM) format:

openssl req -new -x509 -newkey rsa:2048 -days 365 -sha1 -keyout CA_Key.pem -out CA_Cert.pem -extensions v3_ca

The output displays the certificate request. Additional information is required.
3. When prompted, provide the appropriate information in the identification form. For example:

○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []: BIG DATA
○ Email Address []:<xxxx>@gmail.com

4. Create a certificate signing request (CSR). This file is used to send the public key information that identifies your company and domain name for a signing request:

openssl req -newkey rsa:2048 -days 365 -sha1 -keyout Hana_Key.pem -out Hana_Req.pem -nodes

The output displays the certificate request. Additional information is required.
5. When prompted, provide the appropriate information in the identification form. For example:

○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []: <HANA_hostname>
○ Email Address []:[email protected]
○ A challenge password []: myPass
○ An optional company name []: MyCompany2

6. Use the self-signed certificate to sign the request:

openssl x509 -req -days 365 -in Hana_Req.pem -sha1 -extfile <openssl conf> -extensions usr_cert -CA CA_Cert.pem -CAkey CA_Key.pem -CAcreateserial -out Hana_Cert.pem

7. Export the private key and certificate chain to the PKCS12 store:

openssl pkcs12 -export -out hana.pkcs12 -in Hana_Cert.pem -inkey Hana_Key.pem -certfile CA_Cert.pem

8. Use the sapgenpse.exe utility to create a personal security environments (PSE) file from the PKCS12 store created above.

sapgenpse import_p8 -p sparksql_ks.pse ./hana.pkcs12

where:
○ import_p8 – creates a PSE from an OpenSSL keyfile.
○ p – path and file name for the server PSE.

9. Create a new certificate signing request (CSR) for spark controller:

openssl req -newkey rsa:2048 -days 365 -sha1 -keyout Controller_Key.pem -out Controller_Req.pem -nodes

The output displays the certificate request. Additional information is required.
10. When prompted, provide the appropriate information in the identification form. For example:

○ Country Name (2 letter code) [AU]:US
○ State or Province Name (full name) [Some-State]:California
○ Locality Name (eg, city) []:San Jose
○ Organization Name (eg, company) [Internet Widgits Pty Ltd]:MyCompany
○ Organizational Unit Name (eg, section) []:Hadoop
○ Common Name (e.g. server FQDN or YOUR name) []: <Spark_controller_hostname>
○ Email Address []:[email protected]

11. Use the self-signed certificate to sign the request:

openssl x509 -req -days 365 -in Controller_Req.pem -sha1 -extfile <openssl conf> -extensions usr_cert -CA CA_Cert.pem -CAkey CA_Key.pem -out Controller_Cert.pem

12. Export the private key and certificate chain to the PKCS12 store:

openssl pkcs12 -export -out controller_ks.p12 -in Controller_Cert.pem -inkey Controller_Key.pem -certfile CA_Cert.pem

13. Use scp to copy the CA_Cert.pem and controller_ks.p12 files to the same location as the spark controller installation.

14. Use the Java Keytool key and certificate management utility to import the CA certificate into the spark controller controller_ts.jks trust store:

keytool -import -file <Path to CA_Cert.pem>/CA_Cert.pem -keystore ./controller_ts.jks

You now have the controller_ks.p12 and controller_ts.jks files imported where you installed spark controller.
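Optionally, you can verify the imported certificate with the Java Keytool; this is a convenience check and not part of the original procedure:

keytool -list -keystore ./controller_ts.jks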

15. Set the appropriate parameters in the hanaes-site.xml file. See SSL Mode Configure Parameters [page 80].

16. If you use the spark controller host name to create the CSR, set the value of the sap.hana.es.driver.host property to your spark controller’s host name to avoid any ambiguous host resolution.
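For example, a minimal sketch of the corresponding hanaes-site.xml entries; the host name is a placeholder, and your installation may require additional keystore and trust store properties, as listed in Configuration Properties [page 49]:

<property>
 <name>sap.hana.es.ssl.enabled</name>
 <value>true</value>
</property>
<property>
 <name>sap.hana.es.driver.host</name>
 <value><spark_controller_hostname></value>
</property>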

17. Restart Spark controller.

5.4.4 SSL Mode Configure Parameters

The spark_configuration parameter group in the global.ini file includes SSL parameters.

Parameter (default value) – Description

sslKeyStore (sparksql_ks.pse) – Path to the keystore file in PSE format. sslKeyStore holds a single key and certificate chain for SAP HANA to communicate with all Spark controllers.

sslTrustStore (sparksql_ts.pse) – Path to the trust store file in PSE format. The SAP HANA trust store for Spark holds certificates for all Spark controllers to which it connects.

sslValidateCertificate (true) – If set to true, the host's certificate is validated.

sslValidateHostNameInCertificate (true) – If set to true, the hostname is validated against the certificate used for the SSL handshake. This parameter is only used when sslValidateCertificate is set to true.

Make sure that sslValidateCertificate and sslValidateHostNameInCertificate are set to true.

On Spark controller, the hanaes-site.xml file includes additional security parameters. See Configuration Properties [page 49]. Make sure that sap.hana.es.ssl.enabled is set to true.


6 Create a Remote Source

Connect to your Hadoop cluster from SAP HANA by creating a remote source.

Prerequisites

Spark controller must be running.

Procedure

Run the CREATE REMOTE SOURCE SQL statement in the SQL console of SAP HANA Studio.

○ This example creates a remote source of type sparksql:

CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes';

○ This example creates a remote source of type sparksql with the credential type Kerberos:

CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql" CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled' WITH CREDENTIAL TYPE 'KERBEROS'

The remote source appears under Provisioning Remote Source .

Related Information

Configuring a Proxy Server [page 64]


7 Create a Custom Spark Procedure

Custom Spark procedures are virtual procedures used to access a Spark remote source.

Prerequisites

A remote source exists.

Context

Create a custom Spark procedure in SAP HANA to perform compilation and execution on a Hadoop cluster and consume the results back in SAP HANA. You can easily access Spark libraries from SAP HANA and then push the procedure to spark controller for compilation and execution. An example of this is accessing the machine learning libraries on a Hadoop cluster and bringing the model back to SAP HANA for prediction.

The body of the CREATE VIRTUAL PROCEDURE statement defines the source code for the virtual procedure. You can run complex algorithms on both structured (such as tables) and non-structured (such as log files) data using the Scala programming language.


Procedure

● The following example is adapted from a sample hosted on the Apache Spark Web site: https://spark.apache.org/docs/1.6.2/ml-features.html#n-gram .

Sample Code

CREATE VIRTUAL PROCEDURE SYSTEM.FINDNGRAMS(
    IN N INT,
    OUT NGRAMS TABLE(STR TEXT))
LANGUAGE SCALASPARK
AT SPARK_OAKLAS
BEGIN
import sqlContext.implicits._
import scala.collection.mutable.WrappedArray
import org.apache.spark.ml.feature.NGram
// $example on$
val wordDataFrame = sqlContext.createDataFrame(Seq(
    (0, Array("Hi", "I", "heard", "about", "Spark")),
    (1, Array("I", "wish", "Java", "could", "use", "case", "classes")),
    (2, Array("Logistic", "regression", "models", "are", "neat"))
)).toDF("id", "words")
val ngram = new NGram().setN(N).setInputCol("words").setOutputCol("ngrams")
val ngramDataFrame = ngram.transform(wordDataFrame)
ngramDataFrame.select("ngrams").show(false)
NGRAMS = ngramDataFrame.select("ngrams").
    map(y => y(0).asInstanceOf[WrappedArray[_]].
    mkString(",")).toDF
END;

CALL FINDNGRAMS(6, ?);

Related Information

Privileges [page 84]
Virtual Package System Built-Ins [page 85]

7.1 Privileges

Use these privileges to give users permission to create a virtual procedure and virtual package.

To create a virtual procedure on a remote source, the CREATE VIRTUAL PROCEDURE object privilege is required on the remote source. The syntax is:

GRANT CREATE VIRTUAL PROCEDURE ON REMOTE SOURCE <source_name> TO <user>


Caution
Grant this privilege only to trusted database users. Even though procedure execution is done in a restricted sandbox, use extreme caution when granting this privilege.

The CREATE VIRTUAL PACKAGE privilege provides access to create a new virtual package:

GRANT CREATE VIRTUAL PACKAGE ON <schema_name> TO <user> WITH GRANT OPTION

7.2 Virtual Package System Built-Ins

Create, alter, or drop virtual packages.

A virtual package is an archive (zip) file containing Java libraries and resource files. These can be referenced in virtual procedures and functions. Typically, reusable Java libraries (JARs) are packaged into a virtual package and shared across multiple virtual procedures.

The schema-level privilege CREATE VIRTUAL PACKAGE allows permission to add a new virtual package.

Table 2: VIRTUAL_PACKAGE_CREATE

Parameter Type (INPUT/OUTPUT) SQL Data Type Length Description

SCHEMA_NAME INPUT NVARCHAR 256 Schema name

PACKAGE_NAME INPUT NVARCHAR 256 Package name

ADAPTER_NAME INPUT NVARCHAR 256 Name of the remote source adapter

CONTENT INPUT BLOB Package file content

Table 3: VIRTUAL_PACKAGE_ALTER

Parameter Type (INPUT/OUTPUT) SQL Data Type Length Description

SCHEMA_NAME INPUT NVARCHAR 256 Schema name

PACKAGE_NAME INPUT NVARCHAR 256 Package name

ADAPTER_NAME INPUT NVARCHAR 256 Name of the remote source adapter

CONTENT INPUT BLOB Package file content


Table 4: VIRTUAL_PACKAGE_DROP

Parameter Type (INPUT/OUTPUT) SQL Data Type Length Description

SCHEMA_NAME INPUT NVARCHAR 256 Schema name

PACKAGE_NAME INPUT NVARCHAR 256 Package name

ADAPTER_NAME INPUT NVARCHAR 256 Name of the remote source adapter


8 Data Lifecycle Manager

The Data Lifecycle Manager (DLM) can be used to relocate data to Hadoop with SAP HANA spark controller for data access.

See these sections for more information about using spark controller for DLM scenarios:

● Set the sap.hana.es.warehouse.dir property to the location for aged data. See Configuration Properties [page 49].

● To create a remote source to SAP HANA, see Create a Remote Source [page 82].

See SAP HANA Data Warehousing Foundation for more information.


9 Troubleshooting

This section contains troubleshooting procedures for problems that you may encounter when using SAP HANA spark controller.

Find solutions to known issues for HANA 2.0 SPS 02 using SAP ONE Support Launchpad .

Related Information

Troubleshooting Diagnostic Utility [page 88]
SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x [page 94]
SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2 [page 96]
Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari [page 96]
SAP Vora - SAP HANA Spark Controller Fails To Start [page 97]
The TINYINT Datatype is not Supported When Accessing Apache Hive Tables [page 98]
Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical" [page 98]
Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly [page 99]
Configure SAP HANA Spark Controller for Non-Proxy Server Environments [page 99]
SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date Related Built-ins [page 100]
Data Warehousing Support [page 100]

9.1 Troubleshooting Diagnostic Utility

SAP HANA spark controller includes a diagnostic tool to ensure that the required properties, environment variables, and component versions are correctly configured so that spark controller can start.

The diagnostic tool does not provide information regarding the installation or configuration of your Hadoop cluster. The diagnostic tool only provides troubleshooting information relevant to starting spark controller, therefore invoking the diagnostic tool after starting spark controller is not recommended.

For Ambari and manual installations, the diagnostic tool is located in the /usr/sap/spark/controller/utils directory. For Cloudera Manager installations, the tool is located in the /opt/cloudera/parcels/SAPHanaSparkController-<spark_controller_version>/lib/sap/spark/controller/utils directory.

For information about running diagnostics through the Cloudera Manager Web UI, see Run the Diagnostic Utility [page 30].


Related Information

Run the Diagnostic Tool [page 89]
Error Messages [page 91]

9.1.1 Run the Diagnostic Tool

Use the diagnostic tool to check your Spark controller installation for errors.

Prerequisites

● Ensure the HANA_SPARK_ASSEMBLY_JAR environment variable is set. This is the full path of the spark-assembly jar file used with Spark controller. See Environment Variables for hana_hadoop-env.sh [page 46].

● You must have root user permissions.

Note
If you have installed Spark controller using Cloudera Manager, see Run the Diagnostic Utility [page 30] for information about running diagnostics through the Cloudera Manager Web UI.

Procedure

1. The diagnostic tool uses the default class paths for Hadoop libraries. If those libraries are installed in a different location, you will need to locate them for the tool. Run the following:

java -cp ./controller.util-<spark_controller_version>.jar:<classpath> com/sap/hana/spark/DiagnosticUtil

2. Go to the directory in which the diagnostic tool is located.
○ For Ambari and manual installations, go to the /usr/sap/spark/controller/utils directory.
○ For Cloudera installations, go to the /opt/cloudera/parcels/SAPHanaSparkController-<spark_controller_version>/lib/sap/spark/controller/utils directory.

3. Enter the following:

sudo ./diagnose

The diagnostic tool will determine if Spark controller was installed correctly. If Spark controller was not installed correctly, the tool will provide a list of error messages briefly detailing the error, and a reference code. This list is compressed into the output.tar file in the same directory in which the tool was run. This compressed file consists of output.txt, which contains the error logs, as well as unmodified copies of hana_controller.log, hanaes-site.xml, and hana_hadoop-env.sh for convenience.
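For example, to unpack and review the collected diagnostics (a convenience step, not required by the tool):

tar -xf output.tar
less output.txt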


Here is an example of the tool running and not detecting an error.

Here is an example of the tool detecting the error A.15.

4. Use the error codes from the output to find more information about the errors. See Error Messages [page 91].


9.1.2 Error Messages

Identify and troubleshoot errors in your SAP HANA spark controller installation after running the diagnostic tool.

The error codes are displayed in the following format: <Hadoop Distribution>.<Installation Type>.<Error Number>.

The first letter of the code indicates the Hadoop distribution and corresponds to the following:

● A – All
● M – MapR
● C – Cloudera
● H – Hortonworks

Terminal (or manual) installations are indicated with T. If there is no middle letter, the error could affect either manual or Web UI installations.

Table 5: Error Messages

Error Code Description

A.01 Cause: The file hana_hadoop-env.sh must contain the entry export HANA_SPARK_ASSEMBLY_JAR=<path> in order for the spark-assembly jar file to be accessible to the system environment. This error occurs when no such entry exists, when this entry has been commented out, or when no <path> has been provided.

Solution: Ensure the line export HANA_SPARK_ASSEMBLY_JAR=<path> exists in the hana_hadoop-env.sh file.

See Environment Variables for hana_hadoop-env.sh [page 46].

A.02 Cause: The file hana_hadoop-env.sh contains the entry export HANA_SPARK_ASSEMBLY_JAR=<path> and a <path> has been provided, but it does not point to a valid filesystem location.

Solution: Ensure the line export HANA_SPARK_ASSEMBLY_JAR=<path> exists in the hana_hadoop-env.sh file.

See Environment Variables for hana_hadoop-env.sh [page 46].

A.04 Cause: The Hadoop distribution and spark-assembly distribution files do not match or could not be determined.

Solution: Ensure you are using the correct spark-assembly jar file.

See the Installation Prerequisites for your installation type to find the location of the spark-assembly jar file for your configuration.

A.05 Cause: The directory /user/hanaes does not exist in HDFS.

Solution: The directory /user/hanaes does not exist in HDFS. The hanaes user must exist in HDFS. To create this user manually, see Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].


Error Code Description

A.06 Cause: The directory /user/hanaes exists in HDFS but is not owned by the user group hanaes:hdfs for Hortonworks and Cloudera distributions, or hanaes:sapsys for MapR distributions.

Solution: The directory /user/hanaes must be owned by the hanaes:hdfs user and group (hanaes:sapsys for MapR). To change ownership, see Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].

H.09 Cause: The mapred-site.xml file was not found.

Solution: Ensure the environment variables are set correctly.

See Environment Variables for hana_hadoop-env.sh [page 46].

A.10 Cause: The properties in the hanaes-site.xml file are not set, or they are configured incorrectly.

Solution: Configure hanaes-site.xml:

● For Ambari, see the Ambari installation step: Go to Spark Controller Configs Custom hanaes-site and add the following properties [page 15].

● For Cloudera Manager, see Modify Configuration Properties (Cloudera Manager) [page 29].
● For manual installation, see Configuration Properties [page 49].

M.T.13 Cause: The yarn-site.xml file was not found.

Solution: The configuration file yarn-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.

A.14 Cause: The file hana_hadoop-env.sh must contain the entry export HIVE_CONF_DIR=<path>. This error occurs when no such entry exists, when this entry has been commented out, or when no <path> has been provided.

Solution: Ensure the line export HIVE_CONF_DIR=<path> exists in the hana_hadoop-env.sh file.

See Environment Variables for hana_hadoop-env.sh [page 46].

A.15 Cause: The file hana_hadoop-env.sh must contain the entry export HIVE_CONF_DIR=<path>. This error occurs when a <path> has been provided but does not point to a valid file system location.

Solution: Ensure the line export HIVE_CONF_DIR=<path> exists in the hana_hadoop-env.sh file.

See Environment Variables for hana_hadoop-env.sh [page 46].

A.16 Cause: The Hive metastore is not running.

Solution: Check your Hadoop cluster Web UI or contact your Hadoop administrator.


Error Code Description

A.18 Cause: The core-site.xml file was not found.

Solution: The configuration file core-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.

See Configure hanaes User Proxy Settings [page 60].

A.19 Cause: The hadoop.proxyuser.hanaes.hosts parameter does not exist, or is not set.

Solution: Set the appropriate proxy configuration parameters in the core-site.xml file.

See Configure hanaes User Proxy Settings [page 60].

A.20 Cause: The hadoop.proxyuser.hanaes.groups parameter does not exist, or is not set.

Solution: Set the appropriate proxy configuration parameters in the core-site.xml file.

See Configure hanaes User Proxy Settings [page 60].

A.21 Cause: The hdfs-site.xml file could not be found.

Solution: The configuration file hdfs-site.xml should exist under the directory listed in the diagnostic tool's output. However, this file could not be found. Contact your Hadoop administrator.

A.T.07 Cause: The spark controller folder structure has not been created.

Solution: During manual installation of spark controller, the directories /usr/sap/spark/controller/conf, /usr/sap/spark/controller/bin, /usr/sap/spark/controller/lib, and /usr/sap/spark/controller/utils should have been created.

If this error occurs, spark controller may not have been installed correctly from the tar.gz or rpm file, and may need to be downloaded again.

See the manual installation step: Confirm that the following folder structure is created, and is owned by the user hanaes [page 37].

C.17 Cause: The directory /var/run/cloudera-scm-agent/process does not exist.

Solution: The directory /var/run/cloudera-scm-agent/process should have been created during the Cloudera installation process. However, this directory could not be found. Contact your Hadoop administrator.

H.08 Cause: The mapred-site.xml file is missing or not correctly configured.

Solution: The mapred path must be set in mapred-site.xml, otherwise Hadoop will not be able to run Map Reduce jobs.

See Modify mapreduce.application.classpath [page 13].

Note
The framework/hadoop/share/hadoop/tools/lib/* needs to be set correctly, or not set at all.


Error Code Description

M.T.03 Cause: The installed Spark assembly distribution uses Apache, which is not supported by MapR.

Solution: Apache's spark-assembly jar distribution is not supported by MapR. Replace the jar file with the MapR spark-assembly jar distribution.

See Add Properties for YARN [page 43].

M.T.11 Cause: The yarn-site.xml file does not contain yarn.application.classpath or it was not correctly configured.

Solution: Ensure the property yarn.application.classpath exists in the yarn-site.xml file.

See Configure YARN Properties for MapR [page 43].

M.T.12 Cause: The yarn-site.xml file does not contain yarn.scheduler.maximum-allocation-mb or it was not correctly configured.

Solution: Ensure the property yarn.scheduler.maximum-allocation-mb exists in the yarn-site.xml file.

See Configure YARN Properties for MapR [page 43].

9.2 SAP HANA Hadoop Integration Memory Leak for Spark Versions 1.5.2 and 1.6.x

Description: A memory leak occurs with Spark versions 1.5.2, 1.6.0, 1.6.1, and 1.6.2. This leak is attributed to Tungsten (an Apache open source project) which is a part of the distributed computational framework. The Spark containers/executors shut down and Spark becomes unresponsive.

If there is a memory leak, the executors will fail and the spark controller log will show that your container is missing, and a new container has started.

The following is an example of the error stack:

Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_e07_1476226555327_0121_02_000025
Exit code: 52
Stack trace: ExitCodeException exitCode=52:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at ...


Solution: If the issue is a memory leak, spark controller logs Error 52 and you will see the following error in the Apache Spark container log:

16/20/28 20:11:02 WARN memory.TaskMemoryManager: leak 64.0 MB memory from org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@1b00.ffa8

To access the Spark container log:

1. Find the application ID for spark controller. The Ambari Resource Manager UI can be used for HDP distributions of Hadoop and comparable administration tools can be used for other Hadoop distributions.

2. Using PuTTY, log in to the Linux machine where spark controller is running and execute:

sudo su - hanaes

3. Run the following using the application ID and redirect the log file to the tmp directory:

yarn logs -applicationId application_1477939831078_0008 >> /tmp/containers.log

Make sure that the /tmp folder has enough disk space for the log information. Note that application_1477939831078_0008 is an example of the application ID.

4. Search for the word “leak” in /tmp/containers.log, for example with grep as shown below.
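
A minimal sketch of this search, assuming the aggregated log was written to /tmp/containers.log as in the previous step:

grep -n "leak" /tmp/containers.log   # list memory-leak warnings with their line numbers
grep -c "leak" /tmp/containers.log   # count how many leak warnings were logged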

To resolve the issue:

Spark 1.5.2

Tungsten is enabled by default and needs to be disabled. Disable Tungsten by adding the following property to the spark controller hanaes-site.xml file:

<property>
  <name>spark.sql.tungsten.enabled</name>
  <value>false</value>
  <final>true</final>
</property>

Using Spark 1.5.2 with spark controller 2.0 requires the following property in the spark controller hanaes-site.xml file in addition to the property mentioned above:

<property>
  <name>spark.sql.hive.metastore.sharedPrefixes</name>
  <value>com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,org.apache.hadoop</value>
</property>

Spark 1.6.0, 1.6.1, and 1.6.2

Tungsten cannot be disabled on Spark versions 1.6.0, 1.6.1, and 1.6.2, so the memory leak cannot be avoided. If you experience memory leak issues, use Spark 1.5.2 and set the properties in the hanaes-site.xml file as described above.

Reference: https://launchpad.support.sap.com/#/notes/2385144


9.3 SAP HANA Spark Controller Unsupported Features and Datatypes for Spark 1.5.2

Description: Spark controller does not support the following features and data types when you are running Spark 1.5.2 with spark controller on SAP HANA platform 1.0 SPS 12.

Features:

● Nominal key
● Clash Strategy

Datatypes:

● TEXT
● SHORTTEXT
● BINTEXT
● BLOB
● CLOB
● TIME
● NCLOB
● ALPHANUM
● ST_POINT
● ST_GEOMETRY
● BOOLEAN
● ARRAY
● SECONDDATE

The following have limited support:

● Packeting - Packeting is available only when moving data from SAP HANA to Hadoop.
● CHAR - This datatype is not supported on Hive tables when using Spark as the execution engine. To work around this issue, change the datatype from CHAR to VARCHAR or STRING.

Reference: https://launchpad.support.sap.com/#/notes/2315404

9.4 Cannot Execute Service Actions or Turn Off Service Level Maintenance Mode on Ambari

Description: When you install spark controller using Ambari, you may encounter an issue where you cannot execute operations using Service Actions (Start, Stop, Restart All, or Turn On/Off Maintenance Mode), and the following message is displayed in /var/log/ambari-server/ambari-server.log: Cannot determine request operation level. Operation level property should be specified for this request. The message is not displayed in the Web UI.


Additionally, if you run Restart All from Service Actions, and then check the Turn On Maintenance Mode for SparkController option from the confirmation pop-up window, the Service Level Maintenance Mode will be turned on and cannot be turned off.

This issue is due to an Ambari front-end to back-end issue; spark controller is not able to grant the permission to operate on the Service Level correctly.

Solution: Always execute operations at the Component Level. To access the spark controller Component Level drop-down menu, use either:

● Spark Controller > SparkController > Components

● Hosts > host (the one on which you installed spark controller) > Components

If, based on the above symptoms, you inadvertently turned on the Service Level Maintenance Mode and cannot turn it off, run the following command from a console window. The RESTful API call indicates the correct operation level from the front-end to the back-end so that the maintenance mode can be turned off:

curl -u <user>:<password> -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo": {"context": "Remove SparkController from maintenance mode"}, "Body": {"ServiceInfo": {"maintenance_state": "OFF"}}}' \
  http://<hostname>:8080/api/v1/clusters/<stackname>/services/SparkController

Reference: https://launchpad.support.sap.com/#/notes/2315432

9.5 SAP Vora - SAP HANA Spark Controller Fails To Start

Description: Spark controller fails to start, or running an SQL statement against SAP HANA virtual tables created from an SAP Vora data source fails. The spark controller log, /var/log/hanaes/hana_controller.log, shows the following error messages:

ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)

The environment is:

● SAP HANA
● SAP HANA Vora 1.2 or Hadoop HIVE
● Spark controller 1.5 Patch 5 or spark controller 1.6 Patch 1

To reproduce the issue:

Starting spark controller using the Ambari UI, or using the command ./hanaes start, fails and spark controller cannot be started.

Or

1. Start spark controller via the Ambari UI or with the command ./hanaes start.
2. Create an SAP Vora remote source in SAP HANA studio.


3. Add the SAP Vora tables as virtual tables in SAP HANA studio.
4. Run a query against the virtual tables created from the SAP Vora data source.
5. Spark controller stops with an error.

Solution:

● Make sure to create the SAP Vora remote source using the FQDN (Fully Qualified Domain Name).
● Make sure ports 56000-58000 are open on the spark controller nodes.
● Maintain appropriate entries in the /etc/hosts file on the SAP HANA server, so that it contains the correct hostname, FQDN, and IP address of the spark controller node.

Reference: https://launchpad.support.sap.com/#/notes/2396015

9.6 The TINYINT Datatype is not Supported When Accessing Apache Hive Tables

Description: The TINYINT datatype is not supported when accessing Apache Hive tables using spark controller. No error message is raised to indicate that there are incompatible datatype definitions.

The datatype definitions in the two systems are incompatible:

● Apache Hive defines the TINYINT datatype as a signed integer with a range of -128 to 127.
● SAP HANA defines the TINYINT datatype as an unsigned integer with a range of 0 to 255.

Solution: To access Apache Hive tables using spark controller, convert the TINYINT datatype in your table schema to a compatible datatype, such as SMALLINT, INTEGER, or BIGINT, before exchanging data, to ensure consistency across the two databases.
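
For example, in Hive the column type can be changed with an ALTER TABLE statement. The following is a minimal sketch with a hypothetical table sales and TINYINT column qty; the statement changes the table metadata only, so verify that the stored values remain valid afterwards:

-- Hypothetical Hive table and column names
ALTER TABLE sales CHANGE COLUMN qty qty SMALLINT;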

Reference: https://launchpad.support.sap.com/#/notes/2542953

9.7 Fixing Classpath Order - Error Logs Shows the Exception "URI is not hierarchical"

Description: When installing spark controller 2.0 SP02 PL0, it may not start correctly during the final step of the installation. The spark controller log shows the exception "URI is not hierarchical".

Solution: Do one of the following:

● For Cloudera Manager, this issue is resolved by updating to version 2.0 SP02 PL1.
● For manual installations, upgrade to version 2.0 SP02 PL1; alternatively, you can change the order of the classpath in the hanaes script as a workaround. For example:

Original:

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"


CLASSPATH="${CLASSPATH}:${DEFAULT_ESCONF_DIR}:${HADOOP_CONF_DIR}:${HIVE_CONF_DIR}:$bin/../*:$bin/../lib/*:${HADOOP_HOME}/*:${HADOOP_HOME}/lib/*:${HADOOP_HDFS_HOME}/*:${HADOOP_HDFS_HOME}/lib/*"

Revised:

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${DEFAULT_ESCONF_DIR}:${HANA_SPARK_ASSEMBLY_JAR}:${HANA_SPARK_ADDITIONAL_JARS}:${HADOOP_CLASSPATH}"
CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}:${HIVE_CONF_DIR}:$bin/../*:$bin/../lib/*:${HADOOP_HOME}/*:${HADOOP_HOME}/lib/*:${HADOOP_HDFS_HOME}/*:${HADOOP_HDFS_HOME}/lib/*"

Reference: https://launchpad.support.sap.com/#/notes/2516409

9.8 Enable SAP HANA Spark Controller to Fetch Data From Each Spark Executor Node in the Network Directly

Description: Spark controller is configured to use ports 7860 and 7861 by default. Port 7860 is used to exchange requests and messages with SAP HANA, and port 7861 is used by SAP HANA to fetch the data (which is referred to as tunneling). In this scenario, the data is sent from the Hadoop cluster nodes (executors) through spark controller to SAP HANA. An alternative to tunneling is peer-to-peer parallel data transfer, whereby SAP HANA connects to, and fetches data from, each Spark executor node in the network directly.

Peer-to-peer parallel data transfer is used when larger amounts of data are transferred and when having multiple ports open is not a security concern (such as in an internal network).

To support the peer-to-peer setting, ports 56000 to 58000 must be open on the Spark executor nodes, and the sap.hana.p2p.transfer.enabled configuration parameter must be set to true.
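
As a minimal sketch, the parameter can be added to the spark controller hanaes-site.xml file in the same format as the other properties shown in this guide; you typically need to restart spark controller for the change to take effect:

<property>
  <name>sap.hana.p2p.transfer.enabled</name>
  <value>true</value>
</property>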

Reference: https://launchpad.support.sap.com/#/notes/2554425

9.9 Configure SAP HANA Spark Controller for Non-Proxy Server Environments

Description: In a non-proxy server environment, you may see an error such as "Result set was not fetched by connected Client" if the ports are not reachable.

(Version 2.0 SP01 PL01 only) Spark controller uses ports 7860 and 7861 by default. However, spark controller supports an SAP HANA connection to the Spark executor nodes through the port range 56000 to 58000 for non-proxy server environments. If ports 56000 to 58000 are not available in this scenario, you may see an error such as: Result set was not fetched by connected Client.

Solution: For spark controller installations that are configured for a non-proxy server environment, ensure that the Spark executor node is reachable and the port connections in the 56000–58000 range are open.

Also, maintain these entries in the /etc/hosts file on the SAP HANA server:


● Hostname
● FQDN (Fully Qualified Domain Name)
● IP address of the Spark executor node
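
The following is a hypothetical example of such an entry; replace the IP address, FQDN, and short hostname with the values of your Spark executor node:

10.20.30.40   sparkexec01.example.com   sparkexec01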

Reference: https://launchpad.support.sap.com/#/notes/2554388

9.10 SAP HANA Spark Controller Moves Incorrect Number of Records When Using Date Related Built-ins

Description: When the filter condition for moving data from Hadoop to SAP HANA involves any date-related built-ins, for example ADD_MONTHS(), an incorrect number of records may be moved, which could lead to records being lost in some situations. You may not see any errors when this happens.

This situation could arise when the input date to these built-ins is not specified in the YYYY-MM-DD format.

Solution: To work around this scenario, supply the input date in the YYYY-MM-DD format.
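
For example, when filtering with ADD_MONTHS(), supply the date literal in YYYY-MM-DD format. The following is a sketch that uses hypothetical virtual table and column names:

-- Supported: input date supplied as YYYY-MM-DD
SELECT * FROM "VT_SALES_ORDERS" WHERE "ORDER_DATE" < ADD_MONTHS('2017-01-01', 6);

-- Risky: an input date in another format, such as '01.01.2017', may cause an incorrect number of records to be moved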

Reference: https://launchpad.support.sap.com/#/notes/2443093

9.11 Data Warehousing Support

See the following SAP Notes:

● https://launchpad.support.sap.com/#/notes/2456468
● https://launchpad.support.sap.com/#/notes/2290922


Important Disclaimers and Legal Information

Coding Samples
Any software coding and/or code lines / strings ("Code") included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, unless damages were caused by SAP intentionally or by SAP's gross negligence.

Gender-Neutral Language
As far as possible, SAP documentation is gender neutral. Depending on the context, the reader is addressed directly with "you", or a gender-neutral noun (such as "sales person" or "working days") is used. If when referring to members of both sexes, however, the third-person singular cannot be avoided or a gender-neutral noun does not exist, SAP reserves the right to use the masculine form of the noun and pronoun. This is to ensure that the documentation remains comprehensible.

Internet Hyperlinks
The SAP documentation may contain hyperlinks to the Internet. These hyperlinks are intended to serve as a hint about where to find related information. SAP does not warrant the availability and correctness of this related information or the ability of this information to serve a particular purpose. SAP shall not be liable for any damages caused by the use of related information unless damages have been caused by SAP's gross negligence or willful misconduct. All links are categorized for transparency (see: https://help.sap.com/viewer/disclaimer).


go.sap.com/registration/contact.html

© 2018 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/corporate/en/legal/copyright.html for additional trademark information and notices.