Hadoop integration with SAP HANA

Upload: debajit-banerjee

Post on 03-Jul-2015

DESCRIPTION

This guide was written in January-February 2014. With SAP HANA Smart Data Access (SDA), remote data can be accessed without first replicating it to the SAP HANA database. The following sources were supported as of 2013:

- Teradata database
- SAP Sybase ASE
- SAP Sybase IQ
- Intel Distribution for Apache Hadoop
- SAP HANA

SAP HANA handles the remote data like local tables in the database. Automatic data type conversion maps the data types of databases connected through SAP HANA Smart Data Access to SAP HANA data types.

This guide explains, step by step, how to use SAP HANA SDA with Hadoop data. It covers:

- Hadoop installation
- Data load in the Hadoop system
- Activities on unstructured data in the Hadoop system
- ODBC driver installation and configuration on the HANA server for access to Hadoop data
- Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

Setup used for this guide:

1) Hadoop: HDP 1.3 for Windows (Hortonworks Data Platform), standalone, on a Dell laptop running Windows 7 64-bit with 8 GB RAM
2) SAP HANA server: standalone HANA 1.0 SPS 7 on SLES 11 SP1, running on a VM with 24 GB

TRANSCRIPT

Page 1: Hadoop integration with SAP HANA

SAP HANA Smart Data Access using Hadoop/Hive

Prepared by Debajit Banerjee Page 1

SAP HANA Smart Data Access using Hadoop/Hive =================================================================================================

By

Debajit Banerjee

Table of Contents

Introduction about SAP HANA Smart Data Access
I. HDP 1.3 for Windows Installation Pre-requisite
II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation
III. Validation of HDP 1.3 for Windows - Standalone Installation
IV. Data Load in Hadoop System : eBook Upload
V. Unstructured Data Transformation into Table/View in Hadoop System
VI. ODBC Driver Installation & Configuration on SAP HANA Server
VII. Smart Data Access (Hadoop Data) in SAP HANA

SAP HANA Smart Data Access

With SAP HANA Smart Data Access, remote data can be accessed without first replicating it to the SAP HANA database. The following sources were supported as of 2013:

- Teradata database
- SAP Sybase ASE
- SAP Sybase IQ
- Intel Distribution for Apache Hadoop
- SAP HANA

SAP HANA handles the remote data like local tables in the database. Automatic data type conversion maps the data types of databases connected through SAP HANA Smart Data Access to SAP HANA data types.

Steps/Procedure:

- Hadoop installation
- Data load in the Hadoop system
- Activities on unstructured data in the Hadoop system
- ODBC driver installation and configuration on the HANA server for access to Hadoop data
- Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

Assumption – SAP HANA System is already up & running.

Scenario / Lab Setup Details:

1) Hadoop installation prerequisites: HDP 1.3 for Windows (Hortonworks Data Platform), standalone
2) Hadoop installation: HDP 1.3 for Windows (Hortonworks Data Platform), standalone, on a Dell laptop (Windows 7 64-bit, 8 GB)
3) SAP HANA server installation: lab server running on a VM (24 GB, standalone HANA 1.0 SPS 7) on SLES 11 SP1
4) Validation of the Hadoop installation
5) Data load in the Hadoop system: eBook upload
6) Transformation of unstructured data into tables/views, so that the HANA server can read the Hadoop data
7) ODBC driver installation and configuration on the HANA server
8) Smart Data Access in SAP HANA (through SAP HANA Studio), using Hadoop as a remote data source

I. HDP 1.3 for Windows Installation Pre-requisite

- On the HANA server: Simba Apache Hive ODBC Driver (Linux 64-bit)
- On the Hadoop system: Microsoft Visual C++ 2010 Redistributable Package (64-bit)
- On the Hadoop system: Microsoft .NET Framework 4.0
- On the Hadoop system: Java JDK 1.6/1.7, with the PATH and JAVA_HOME environment variables set
- On the Hadoop system: Python 2.7, with the PATH environment variable set

In Linux

In Windows

MS Visual C++ 2010

MS .NET Framework 4

The installer was cancelled because it offered the Repair option, which indicates that .NET Framework 4 was already installed.

Oracle JDK

i. Open the Control Panel -> System pane and click Advanced system settings.
ii. Click the Advanced tab.
iii. Click the Environment Variables button.
iv. Under System variables, click New.
v. Enter JAVA_HOME as the Variable Name.
vi. Enter the installation path of the Java Development Kit as the Variable Value. For example, if your JDK is installed at C:\Java\jdk1.6.0_31, enter that path as the Variable Value.
vii. Click OK.
viii. Click OK to close the Environment Variables dialog box.
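
The result of the steps above can be checked with a small script. The following is a minimal Python sketch and not part of the original guide: check_java_home is a hypothetical helper that only tests whether JAVA_HOME is set and whether a java executable exists under its bin directory.

```python
import os


def check_java_home(env=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks fine."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    if not os.path.isdir(java_home):
        return ["JAVA_HOME does not point to a directory: " + java_home]
    bin_dir = os.path.join(java_home, "bin")
    # A JDK keeps the java launcher in its bin directory (java.exe on Windows).
    has_java = any(
        os.path.isfile(os.path.join(bin_dir, name)) for name in ("java.exe", "java")
    )
    if not has_java:
        return ["no java executable found under " + bin_dir]
    return []


if __name__ == "__main__":
    for message in check_java_home() or ["JAVA_HOME looks fine"]:
        print(message)
```

Run it from a newly opened command window, because windows that were already open keep the old environment.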

Python

As with the Oracle JDK above, add C:\Python27 to the PATH variable.

II. HDP 1.3 for Windows (Hortonworks Data Platform) Standalone Installation

Now update C:\hdp-1.3.0.0-GA\clusterproperties.txt as follows:

In a Command Prompt window (with administrator privileges):

msiexec /i "C:\hdp-1.3.0.0-GA\hdp-1.3.0.0.winpkg.msi" /lv "C:\DEBAJIT\HD\hdp13\hdp.log" HDP_LAYOUT="C:\hdp-1.3.0.0-GA\clusterproperties.txt" HDP_DIR="C:\hdp\hadoop" DESTROY_DATA="Yes"

The installation creates three shortcuts on the desktop.

III. Validation of HDP 1.3 for Windows - Standalone Installation

Now we have to start Hadoop.

The services did not start because the service .xml files (master.xml and regionserver.xml) were 0 bytes in size.

rest.xml, thrift.xml, and thrift2.xml were also zero bytes.

1) Navigate to the HBase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin
2) Open hbase.cmd in a text editor.
3) Look for the line that says: set PATH=%PATH%;%HADOOP_HOME%\bin
4) Delete it or comment it out with @rem.

Now open a command prompt and navigate to the HBase install directory: C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin

Rebuild the .xml files:

    hbase.cmd --service master start > master.xml
    hbase.cmd --service regionserver start > regionserver.xml
    hbase.cmd --service rest > rest.xml
    hbase.cmd --service thrift > thrift.xml
    hbase.cmd --service thrift2 > thrift2.xml
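
After the rebuild, a quick check can confirm that none of the five files is still empty. This is a minimal Python sketch, not part of the original guide; find_empty_service_files is a hypothetical helper, and the file names are the ones listed above.

```python
import os

# File names taken from the rebuild commands above.
SERVICE_FILES = ["master.xml", "regionserver.xml", "rest.xml", "thrift.xml", "thrift2.xml"]


def find_empty_service_files(bin_dir, names=SERVICE_FILES):
    """Return the names that are missing or zero bytes long in bin_dir."""
    bad = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(name)
    return bad
```

Calling find_empty_service_files(r"C:\hdp\hadoop\hbase-0.94.6.1.3.0.0-0380\bin") should return an empty list once all files have been rebuilt.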

Now all of the above .xml files have contents.

Stop and start Hadoop. All services now start, with no more failures.

Hadoop Smoketest

IV. Data Load in Hadoop System : eBook Upload

Now check whether Hadoop can read the uploaded file.

It can.

After refresh

From the Namenode server, click on “Browse the filesystem”

Click on “user”

Click on the .txt file to see the book.

Click on .out to see the part file.
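
The part file holds the result of a word count over the uploaded book. The following is a minimal Python sketch of what such a job computes, not the job's actual code. It assumes whitespace-separated tokens and one tab-separated "word, count" line per distinct word, which is the usual text output of a Hadoop word count.

```python
from collections import Counter


def word_count_lines(text):
    """Count whitespace-separated tokens and format them like part-file lines."""
    counts = Counter(text.split())
    # One "word<TAB>count" line per distinct token, sorted by token.
    return ["{}\t{}".format(word, n) for word, n in sorted(counts.items())]


if __name__ == "__main__":
    for line in word_count_lines("to be or not to be"):
        print(line)
```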

V. Unstructured Data Transformation into Table/View in Hadoop System

Next, those files must be converted into a table format that HANA can read. Hive is used for this.

A table called “debajit_wc” is created for the wordcount part file. At this point it is still empty.

Now load the data.
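
The two Hive steps (create the table, then load the part file) can be sketched as HiveQL statements. The following Python sketch only builds the statement text and is not taken from the original guide. The column names word and cnt, the tab delimiter, and any HDFS path passed in are assumptions for illustration; only the table name debajit_wc comes from the text above.

```python
def hive_statements(table, hdfs_path):
    """Build HiveQL to create a two-column table and load a part file into it."""
    # Column names and types are hypothetical; adjust them to the real part file.
    create = (
        "CREATE TABLE {t} (word STRING, cnt INT) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
    ).format(t=table)
    # LOAD DATA INPATH moves the HDFS file into the table's storage directory.
    load = "LOAD DATA INPATH '{p}' INTO TABLE {t}".format(p=hdfs_path, t=table)
    return [create, load]
```

For example, hive_statements("debajit_wc", "/user/someuser/book.out/part-r-00000") returns the two statements for a hypothetical part-file path; they could then be run in the Hive command line.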

A configuration change is required in the hive-site.xml file.

The only change is the server mode, from http to thrift.

Then restart Hadoop.

Now we can test whether SAP HANA can connect to Hadoop.

The license file received by email was downloaded and deployed, which solved the problem.

VI. ODBC Driver Installation & Configuration on SAP HANA Server

The renaming was done in WinSCP.

Stopping HANA System

SIMBA Driver

Changed items are as follows:

UNIXODBC

unixODBC has to be upgraded because of a compatibility issue with the Simba driver.

ODBC.INI (defines the DSN)
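
A DSN entry in odbc.ini can be sketched as follows. This is a minimal Python 3 sketch, not taken from the original guide: build_odbc_ini is a hypothetical helper, the DSN name, driver path, and host are placeholders, and the key names Driver, HOST, and PORT are assumptions that should be checked against the Simba driver documentation. Port 10000 is the usual default of the Hive Thrift server.

```python
import configparser
import io


def build_odbc_ini(dsn, driver_path, host, port=10000):
    """Render an odbc.ini section for one DSN as text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key names exactly as written
    parser[dsn] = {
        "Driver": driver_path,  # path to the ODBC driver library (placeholder)
        "HOST": host,           # host name of the Hadoop/Hive system (placeholder)
        "PORT": str(port),
    }
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

For example, build_odbc_ini("hive1", "/path/to/simba/driver.so", "hadoop-host") returns a section headed [hive1] with one key=value line per setting.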

Next, the ODBC information was added to customer.sh.

The connection between the HANA server and the Hadoop system now works at the OS level.

VII. Smart Data Access (Hadoop Data) in SAP HANA

SAP HANA Studio

The connection between the HANA server and the Hadoop system now also works from SAP HANA Studio.

Creating a schema in HP7
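
The guide carries out these steps through the SAP HANA Studio dialogs. As a rough SQL counterpart, the following Python sketch builds the text of a CREATE REMOTE SOURCE and a CREATE VIRTUAL TABLE statement. It is an illustration under assumptions, not the author's method: the remote source name, DSN, credentials, schema, virtual table name, and remote object path are all hypothetical, and "hiveodbc" is assumed here to be the SDA adapter name for Hive.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sda_statements(source, dsn, user, password, schema, virtual_table, remote_path):
    """Build SQL text for a remote source and one virtual table on top of it."""
    create_source = (
        "CREATE REMOTE SOURCE {src} ADAPTER \"hiveodbc\" "
        "CONFIGURATION 'DSN={dsn}' "
        "WITH CREDENTIAL TYPE 'PASSWORD' USING 'user={user};password={pw}'"
    ).format(src=quote_ident(source), dsn=dsn, user=user, pw=password)
    # The remote object is addressed as source name followed by the remote path.
    remote = ".".join(quote_ident(part) for part in [source] + list(remote_path))
    create_table = "CREATE VIRTUAL TABLE {sch}.{vt} AT {remote}".format(
        sch=quote_ident(schema), vt=quote_ident(virtual_table), remote=remote
    )
    return [create_source, create_table]
```

A call such as sda_statements("HADOOP", "hive1", "user", "secret", "MYSCHEMA", "VT_WC", ["HIVE", "default", "debajit_wc"]) uses placeholder values throughout; the path elements in particular should be read from the remote source tree shown in SAP HANA Studio.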

Query and connection monitoring are available by clicking “Smart Data Access” under “Provisioning”.

That’s all.

**** END OF DOCUMENT ****