Hadoop Integration with SAP HANA

Posted on 03-Jul-2015

DESCRIPTION

This guide was written in the January-February 2014 timeframe. With SAP HANA Smart Data Access (SDA), it is possible to access remote data without first replicating it to the SAP HANA database. The following sources are supported (as of 2013):

- Teradata database
- SAP Sybase ASE
- SAP Sybase IQ
- Intel Distribution for Apache Hadoop
- SAP HANA

SAP HANA handles the data like local tables on the database. Automatic data type conversion makes it possible to map data types from databases connected via SAP HANA Smart Data Access to SAP HANA data types.

This guide explains the step-by-step approach to SAP HANA SDA for Hadoop data, which includes the following:

- Hadoop installation
- Data load in the Hadoop system
- Activities on unstructured data in the Hadoop system
- ODBC driver installation and configuration on the HANA server for Hadoop data access
- Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

Setup used for this guide:

1) Hadoop: HDP 1.3 for Windows (Hortonworks Data Platform), standalone, on a Dell laptop running Windows 7 64-bit with 8 GB RAM
2) SAP HANA server: running on a VM, 24 GB standalone HANA 1.0 SPS 7, SLES 11 SP1

TRANSCRIPT

SAP HANA Smart Data Access using Hadoop/Hive

Prepared by Debajit Banerjee Page 1

SAP HANA Smart Data Access using Hadoop/Hive
============================================

By

Debajit Banerjee

Table of Contents

Introduction to SAP HANA Smart Data Access
I. HDP 1.3 for Windows Installation Pre-requisite
II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation
III. Validation of HDP 1.3 for Windows - Standalone Installation
IV. Data Load in Hadoop System : eBook Upload
V. Unstructured Data Transformation into Table/View in Hadoop System
VI. ODBC Driver Installation & Configuration on SAP HANA Server
VII. Smart Data Access (Hadoop Data) in SAP HANA

SAP HANA Smart Data Access

With SAP HANA Smart Data Access, it is possible to access remote data without first replicating it to the SAP HANA database. The following sources are supported (as of 2013):

- Teradata database
- SAP Sybase ASE
- SAP Sybase IQ
- Intel Distribution for Apache Hadoop
- SAP HANA

SAP HANA handles the data like local tables on the database. Automatic data type conversion makes it possible to map data types from databases connected via SAP HANA Smart Data Access to SAP HANA data types.

Steps/Procedure :

- Hadoop installation
- Data load in the Hadoop system
- Activities on unstructured data in the Hadoop system
- ODBC driver installation & configuration on the HANA server for Hadoop system data access
- Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

Assumption – SAP HANA System is already up & running.

Scenario / Lab Setup Details :

1) Hadoop Installation Pre-requisite : HDP 1.3 for Windows (Hortonworks Data Platform) - Standalone

2) Hadoop Installation : HDP 1.3 for Windows (Hortonworks Data Platform) - Standalone, on a Dell laptop (OS Win7 64bit, 8GB)

3) SAP HANA Server Installation (lab server running on a VM, 24GB standalone HANA 1.0 SPS 7) - SLES 11 SP1

4) Validation of Hadoop Installation

5) Data Load in Hadoop system : eBook Upload

6) Unstructured Data transformation into table/views, so that HANA Server can understand Hadoop data.

7) ODBC Driver installation & configuration on HANA Server

8) Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

I. HDP 1.3 for Windows Installation Pre-requisite

- On HANA Server - Simba Apache Hive ODBC Driver (Linux 64-bit)
- On Hadoop System - Microsoft Visual C++ 2010 Redistributable Package (64-bit)
- On Hadoop System - Microsoft .NET Framework 4.0
- On Hadoop System - Java JDK 1.6/1.7, with the PATH and JAVA_HOME environment variables set
- On Hadoop System - Python 2.7, with the PATH environment variable set

In Linux

In Windows

MS Visual C++ 2010

MS .NET Framework 4

The installer offers only a Repair option, which indicates that .NET Framework 4 is already installed, so the setup is cancelled here.

Oracle JDK

i. Open the Control Panel -> System pane and click on Advanced system settings.
ii. Click on the Advanced tab.
iii. Click the Environment Variables button.
iv. Under System variables, click New.
v. Enter the Variable Name as JAVA_HOME.
vi. Enter the Variable Value as the installation path for the Java Development Kit. For example, if your JDK is installed at C:\Java\jdk1.6.0_31, then you must provide this path as the Variable Value.
vii. Click OK.
viii. Click OK to close the Environment Variables dialog box.
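
Once the variables are saved, the result can be checked from a new command window. The following is a minimal sketch, not part of the original guide: the function name check_java_home is made up for illustration, and it assumes that the JDK executables live in the bin subdirectory of JAVA_HOME and that this bin directory is the entry expected on PATH.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of problems found with JAVA_HOME and PATH (empty if none)."""
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems

    # Assumption: the JDK keeps its executables in the bin subdirectory.
    bin_dir = Path(java_home) / "bin"
    if not bin_dir.is_dir():
        problems.append(f"{bin_dir} does not exist")

    # Compare case-insensitively and ignore trailing slashes, as Windows does.
    entries = [
        p.rstrip("\\/").lower()
        for p in env.get("PATH", "").split(os.pathsep)
        if p
    ]
    if str(bin_dir).rstrip("\\/").lower() not in entries:
        problems.append(f"{bin_dir} is not on PATH")

    return problems


if __name__ == "__main__":
    for line in check_java_home() or ["JAVA_HOME and PATH look consistent"]:
        print(line)
```

Run it in a freshly opened command window, since windows that were already open do not pick up the new system variables.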

Python

As with the Oracle JDK above, C:\Python27 also has to be added to the PATH variable.

II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation

Now update C:\hdp-1.3.0.0-GA\clusterproperties.txt accordingly:

In a Command Prompt window (with administrator privileges):

msiexec /i "C:\hdp-1.3.0.0-GA\hdp-1.3.0.0.winpkg.msi" /lv "C:\DEBAJIT\HD\hdp13\hdp.log" HDP_LAYOUT="C:\hdp-1.3.0.0-GA\clusterproperties.txt" HDP_DIR="C:\hdp\hadoop" DESTROY_DATA="Yes"

The installation creates three shortcuts on the desktop.

III. Validation of HDP 1.3 for Windows - Standalone Installation

Now we have to start Hadoop.

The services did not start because the master and regionserver .xml files were 0 bytes.

The rest, thrift and thrift2 .xml files were also 0 bytes.

1) Navigate to the hbase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin
2) Open hbase.cmd in a text editor
3) Look for the line that says: set PATH=%PATH%;%HADOOP_HOME%\bin
4) Delete it or comment it out with @rem

Now open a command prompt and navigate to the hbase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin

Rebuild the .xml files:

hbase.cmd --service master start > master.xml
hbase.cmd --service regionserver start > regionserver.xml
hbase.cmd --service rest > rest.xml
hbase.cmd --service thrift > thrift.xml
hbase.cmd --service thrift2 > thrift2.xml
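
The zero-byte condition described above can also be checked with a short script before and after the rebuild. This is a minimal sketch rather than part of the original procedure; the function name find_empty_service_files is hypothetical, and the list of file names is taken from the five redirect targets in the commands above.

```python
from pathlib import Path

# File names taken from the redirect targets of the hbase.cmd commands.
SERVICE_NAMES = ("master", "regionserver", "rest", "thrift", "thrift2")


def find_empty_service_files(bin_dir, names=SERVICE_NAMES):
    """Return the service .xml files in bin_dir that are missing or 0 bytes."""
    bad = []
    for name in names:
        xml_file = Path(bin_dir) / f"{name}.xml"
        if not xml_file.is_file() or xml_file.stat().st_size == 0:
            bad.append(xml_file.name)
    return bad


if __name__ == "__main__":
    # Directory from this guide's setup; adjust it for other installations.
    hbase_bin = r"C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin"
    print(find_empty_service_files(hbase_bin) or "all service files have content")
```

An empty result means every service file exists and has content, which is the state expected after the rebuild.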

All of the above .xml files now have content.

Stop and start Hadoop again. All services now start without failures.

Hadoop Smoketest

IV. Data Load in Hadoop System : eBook Upload

Now check whether Hadoop can read the uploaded file.

It can.

After refresh

From the Namenode server, click on “Browse the filesystem”

Click on “user”

Click on the .txt file to see the book.

Click on the .out file to see the part file.

V. Unstructured Data Transformation into Table/View in Hadoop System

Now we have to convert those files into a table format that HANA can read. For that we will use Hive.

A table called “debajit_wc” was created for the wordcount part file. At this point it is still empty.

Now load the data.
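
The two Hive steps above (create the table, then load the part file) can be sketched as HiveQL statements. The original statements are not preserved in this transcript, so everything here is an assumption for illustration: the column names word and cnt, the tab delimiter (the default separator of Hadoop's text output), and the HDFS path used in the example call. The helper function itself is hypothetical.

```python
import re


def wordcount_table_statements(table, hdfs_path):
    """Build HiveQL to create a two-column table and load a wordcount part file."""
    # Keep the sketch safe: accept only plain identifiers and unquoted paths.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError("table must be a plain identifier")
    if "'" in hdfs_path:
        raise ValueError("hdfs_path must not contain single quotes")

    # Assumed layout: one "word<TAB>count" pair per line.
    create = (
        f"CREATE TABLE {table} (word STRING, cnt INT) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
    )
    # LOAD DATA INPATH moves the HDFS file into the table's storage location.
    load = f"LOAD DATA INPATH '{hdfs_path}' OVERWRITE INTO TABLE {table}"
    return [create, load]


if __name__ == "__main__":
    # The path below is a placeholder, not the path used in the original guide.
    for statement in wordcount_table_statements("debajit_wc", "/user/example/book.out/part-r-00000"):
        print(statement + ";")
```

The printed statements can be pasted into the Hive command line. Only the table name debajit_wc comes from the guide.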

Configuration change required in hive-site.xml file.

The only change is the server mode, which was switched from http to thrift.

Then restart Hadoop.

Now we can test whether SAP HANA can connect to Hadoop.

The license file received by email was downloaded and deployed, which solved the problem.

VI. ODBC Driver Installation & Configuration on SAP HANA Server

The renaming was done in WinSCP.

Stopping HANA System

SIMBA Driver

Changed items are as follows:

UNIXODBC

unixODBC has to be upgraded because of a compatibility issue with the Simba driver.

ODBC.INI - DSN purpose
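
The original odbc.ini contents are not preserved in this transcript. As a rough sketch of what a DSN entry for a Hive ODBC driver can look like, the following helper writes one section in odbc.ini syntax. The DSN name, driver library path and host are placeholders, and the key names Driver, HOST and PORT are assumptions that should be checked against the installed driver's documentation. Port 10000 is the usual default for the Hive Thrift service.

```python
import configparser
import io


def build_odbc_ini(dsn, driver_path, host, port=10000):
    """Return the text of an odbc.ini file containing a single DSN section."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case exactly as written
    config[dsn] = {
        "Driver": driver_path,  # full path to the driver's shared library
        "HOST": host,           # machine running the Hive service
        "PORT": str(port),      # Hive Thrift port
    }
    buffer = io.StringIO()
    # odbc.ini entries are conventionally written as KEY=value without spaces.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    print(build_odbc_ini("hive1", "/path/to/hive/odbc/driver.so", "hadoop-host"))
```

The DSN name chosen here is what the remote source definition in SAP HANA later refers to.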

Next, the ODBC information was added to customer.sh.

So, now the connection is working between HANA Server and Hadoop system from OS level.

VII. Smart Data Access (Hadoop Data) in SAP HANA

SAP HANA Studio

So, now the connection is working between HANA Server and Hadoop system from SAP HANA Studio.

Creating a schema in HP7
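
The guide performs these steps in SAP HANA Studio. The same remote source and a virtual table can also be defined in SQL, and the sketch below only assembles such statements as text. It is an illustration under assumptions, not the author's method: the adapter name hiveodbc, the four-part remote object name, and every value in the example call are assumed or hypothetical and should be verified against the SAP HANA SQL reference for the installed revision.

```python
def build_sda_statements(source, dsn, user, password, schema,
                         virtual_table, remote_database, remote_schema, remote_table):
    """Build SQL text for a Hive remote source and one virtual table on it."""
    # This sketch does no escaping, so reject characters that would need it.
    for value in (dsn, user, password):
        if "'" in value or ";" in value:
            raise ValueError("quotes and semicolons are not handled in this sketch")
    for value in (source, schema, virtual_table, remote_database, remote_schema, remote_table):
        if '"' in value:
            raise ValueError("double quotes are not handled in this sketch")

    # The remote source points at the DSN defined in odbc.ini on the HANA server.
    create_source = (
        f'CREATE REMOTE SOURCE "{source}" ADAPTER "hiveodbc" '
        f"CONFIGURATION 'DSN={dsn}' "
        f"WITH CREDENTIAL TYPE 'PASSWORD' USING 'user={user};password={password}'"
    )
    # The virtual table exposes one remote Hive table inside a local schema.
    create_table = (
        f'CREATE VIRTUAL TABLE "{schema}"."{virtual_table}" '
        f'AT "{source}"."{remote_database}"."{remote_schema}"."{remote_table}"'
    )
    return create_source, create_table


if __name__ == "__main__":
    # Every value except the Hive table name debajit_wc is a placeholder.
    for sql in build_sda_statements("HADOOP_SRC", "hive1", "hive_user", "secret",
                                    "MY_SCHEMA", "V_DEBAJIT_WC",
                                    "HIVE", "default", "debajit_wc"):
        print(sql + ";")
```

Once the virtual table exists, it can be queried like a local table, which matches the behaviour described in the introduction.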

Query and connection monitoring is available by clicking on “Smart Data Access” under “Provisioning”.

That’s all.

**** END OF DOCUMENT ****
