predictive analysis library manual

8/2/2019 Predictive Analysis Library Manual

1/46


2/46

SAP AG 2011 2

Table of Contents

Overview .......................................................................................................................................... 4PAL Common Interface .................................................................................................................. 5

Input Data Table .................................................................................................................... 5Parameter Table .................................................................................................................... 6

Specifying the ID Column ............................................................................................. 6Output Data Table ................................................................................................................. 7

List of PAL Algorithms ................................................................................................................... 8PAL Algorithm Descriptions.......................................................................................................... 9

K-means ................................................................................................................................ 9Prerequisites ................................................................................................................. 9Interface (kmeans) ........................................................................................................ 9Interface (validateKmeans)......................................................................................... 11Example ...................................................................................................................... 12

C4.5 Decision Tree .............................................................................................................. 16Prerequisites ............................................................................................................... 16Interface (createDT) ................................................................................................... 16Interface (predictWithDT) ........................................................................................... 17Example ...................................................................................................................... 18

KNN .................................................................................................................................... 22

Prerequisites ............................................................................................................... 22Interface (knn) ............................................................................................................ 22Example ...................................................................................................................... 23

Multiple Linear Regression .................................................................................................. 26Prerequisites ............................................................................................................... 26Interface (linearRegression) ....................................................................................... 26Interface (forecastWithLR).......................................................................................... 27Example ...................................................................................................................... 28

Apriori .................................................................................................................................. 32Prerequisite ................................................................................................................ 32Interface (aprioriRule) ................................................................................................. 32Example ...................................................................................................................... 33

ABC Classification ............................................................................................................... 36Prerequisite ................................................................................................................ 36Interface (abcAnalysis) ............................................................................................... 36Example ...................................................................................................................... 37

Weighted Score Table ......................................................................................................... 39Prerequisites ............................................................................................................... 39Interface (weightedTable) ........................................................................................... 39


3/46

SAP AG 2011 3

Example ...................................................................................................................... 40Log and Trace ............................................................................................................................... 43Copyrights ..................................................................................................................................... 46


4/46

Predictive Analysis Library Reference Manual

SAP AG 2011 4

OverviewSince the SAP HANA 1.0 GA, SQL Script v2 can be used to express application logic within the

database that exceeds the capabilities of pure SQL. With enhanced control flow capabilities, SQL

Script v2 is more suitable for pushing complex application logic to the SAP HANA database.

When designing HANA applications, procedures are the main programmable containers. However, it is

difficult and often impossible to describe predictive analysis logic with procedures. For example, an

application may need to perform a cluster analysis in a huge customer table with a terra byte of data. It

is impossible to implement the analysis in a procedure even with the simple classic K-means algorithm.

Obviously, it is not a wise decision to copy a huge table to the application server to perform the K-

means calculation, because data-copying will be slow and is not necessary.

SAP provides the Predictive Analysis Library (PAL) to offer you the flexibility and efficiency to develop

HANA applications requiring predictive functionality. In the above case, using the PAL is the best

choice. PAL algorithms can be called directly in L wrapper within SQL Script v2. The inputs and

outputs are all tables. Currently, PAL includes seven well known predictive analysis algorithms in threedata mining algorithm categories:

Cluster analysis

Classification analysis

Association analysis

The functions in PAL are predefined. More functions will be supported in future releases. The seven

algorithms included in PAL were carefully selected based on the following criteria:

1. These algorithms are required for SAP HANA applications.

2. Market surveys (e.g. Rexer Analytics and KDnuggets polls) show that these algorithms are most

commonly used.

3. These are the most common algorithms available in database products from other vendors in

the marketplace, such as Microsoft SQL Server, Oracle, and IBM DB2.


5/46


SAP AG 2011 5

PAL Common InterfaceThe Predictive Analysis Library uses tables as the data interface. Three types of tables are used for

each PAL function:

Input data table Supplies input data to PAL functions. Each function can have one or more

input data tables.

Parameter table Supplies parameters to PAL functions. Parameters are special inputs that

control what the functions do. Each function has only one parameter table.

Output data table Holds the output results calculated by PAL functions. Each function can

have one or more output data tables.

The structures of the above tables are pre-defined and cannot be changed. Table names and column

names can be specified by the users.

Input Data Table

Input data tables contain the input data required to perform predictive analysis, sometimes described

as the historical and training data used to build predictive models.

For some cluster and classification functions, primary keys/IDs are required in the input data table, so

that the output data table can contain the same primary key/ID column. Usually the ID column is the

first column of the input data table. For some PAL functions, you can specify which column should be

the ID column. SeeSpecifying the ID Columnfor details.

A typical input data table looks like the following:

ID X1 X2 X3

0 0.1 0.1 0.1

1 0.1 0 0.1

2 10.1 10.1 0.1

3 10.1 9.9 0.1

4 10.1 10.1 0.1

5 1000.1 1000.1 0.1

6 999.1 1000.1 0.1

7 0.1 10000.1 0.18 0.1 9999.1 0.1

9 11.1 10.1 0.1

10 0.001 0.001 0.1


6/46


SAP AG 2011 6

Parameter Table

PAL functions use parameter tables to transfer parameter values. Each PAL function has its own

parameter table. To avoid the conflict of table names when several users call PAL functions at thesame time, the parameter table must be created as a local temporary column table, so that each

parameter table has its unique scope per session. The table structure is defined as below. Each row

contains only one parameter value.

Column Name Data Type Description

Name VARCHAR/CHAR Parameter name

intArgs Integer Integer parameter value

doubleArgs Double Double parameter value

stringArgs VARCHAR/CHAR String parameter value

The following shows an example of a parameter table with three parameters. The first parameter,

THREAD_NUMBER, is an integer parameter. Thus, in the THREAD_NUMBER row, you should fill the

parameter value in the intArgs column, and leave the doubleArgs and stringArgs columns blank.

Name intArgs doubleArgs stringArgs

THREAD_NUMBER 1

SUPPORT 0.2

VAR_NAME hello

Specifying the ID Column

Some cluster and classification functions include an ID column in the input data table. Usually the ID

column is the first column of the input data table. For some PAL functions, you can specify a column

number for the ID_COLUMN parameter in the parameter table to tell the function which column should

be the ID column. Column number starts from 0, so the first column is column 0, the second column 1,

and so on. For example, if you want to specify the first column as the ID column, the parameter table

should contain the following row:

Name intArgs doubleArgs stringArgs

ID_COLUMN 0


7/46


SAP AG 2011 7

Output Data Table

Output tables contain predicted results, such as classification types and fitted values. A typical output

table looks like the following:ID CENTER_ASSIGN ENERGY

0 0 0.3

1 1 0.2

2 4 20.3

3 4 20.1

4 4 20.3

5 4 2000.3

6 4 1999.3

7 2 10000.3

8 2 9999.3

9 3 21.3

10 1 0.12


8/46


SAP AG 2011 8

List of PAL AlgorithmsThe following table lists all available algorithms and functions in the Predictive Analysis Library. The

algorithms are grouped by categories.

To learn more about an algorithm, click the algorithm name.

To learn more about function interface, click the function name.

Category Algorithm Function Name Description

Cluster Analysis K-means kmeans Clusters data using the k-means algorithm.

validateKmeans Validates or measures the quality of

clustered result.

ClassificationAnalysis C4.5 Decision Tree

createDT Creates tree models using the C4.5algorithm. The model is represented in

JSON.

predictWithDT Uses the tree model to perform prediction.

KNN knn K-nearest neighbor algorithm function.

Multiple Linear

Regression

linearRegression Multiple linear regression algorithm

function.

forecastWithLR Makes forecasts using the regression

equation.

AssociationAnalysis Apriori

aprioriRule Creates association rules using the Apriorialgorithm.

Other ABC Classification abcAnalysis Performs ABC classification analysis. For

example, when 20% items contribute 80%

of the total revenue, these 20% items can

be put into class A.

Weighted Score

Table

weightedTable Performs weighted table calculation.


9/46


SAP AG 2011 9

PAL Algorithm DescriptionsThis section contains detailed descriptions of all available algorithms in the Predictive Analysis Library.

The following information is provided for each algorithm.

Algorithm description

Prerequisites

Interface description (function name; L function signature; input, parameter, and output tables)

Example

K-means

In predictive analysis, k-means clustering is a method of cluster analysis which aims to partition n

observations or records into kclusters in which each observation belongs to the cluster with the

nearest mean. In marketing and customer-relationship management areas, it uses customer data to

track customer behavior and create strategic business initiatives. Organizations can use this data to

divide customers into segments based on variants such as demography, customer behavior, customer

profitability, measure of risk, and lifetime value of a customer or retention probability.

Clustering works to group records together according to an algorithm or mathematical formula that

attempts to find centroids, or centers, around which similar records gravitate. The most common

algorithm uses an iterative refinement technique. It is also referred to as Lloyd's algorithm:

Given an initial set of kmeans m1,...,mk, the algorithm proceeds by alternating between two steps:

Assignment step: Assign each observation to the cluster with the closest mean.

Update step: Calculate the new means to be the centroid of the observations in the cluster.

The algorithm repeats until the assignments no longer change.

For more information, refer tohttp://en.wikipedia.org/wiki/K-means_clustering.

The k-means implementation in PAL supports multi-thread, data normalization, different distance level

measurement, and cluster quality measure (Silhouette).The implementation doesn t support

categorical data, however this can be managed through data transformation. The first K and random K

starting methods are supported.

Prerequisites The input data should contain an ID column and the other columns should be integer or double

data type.

Input data does not contain null value. The function will issue errors when encountering null

values.

Interface (kmeans)

Function: pal::kmeans

This is a clustering function using the k-means algorithm.
http://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clustering


10/46


SAP AG 2011 10

L Function Signature

pal::kmeans(Table dataset, Table args, Table result)

Input Table

Table Column Column Data Type Description Constraint

Data 1st column Integer ID

Other columns Integer or double Attribute data

Parameter Table

Name Data Type Description

GROUP_NUMBER Integer Number of groups (k).

DISTANCE_LEVEL Integer Computes the distance between item and

cluster center.

2 = Euclidean distance

MAX_ITERATION Integer Maximum number of iterations.

START_COLUMN Integer The index number of the first data column

(column index starts from zero).

COLUMN_NUM Integer Number of data columns.

START_ROW Integer The index number of the first data row (row

index starts from zero).

ROW_NUM Integer Number of data rows.

INIT_TYPE Integer Center initialization type:

1 = first K

2 = weighted random with replacement

3 = random without replacement

NORMALIZATION Integer Normalization type:

0 = no

1 = yes, for each point X(x1,x2,...xn),normalized value will be X'

(|x1|/S,|x2|/S...|xn|/S) where S =

|x1|+|x2|+...|xn|

THREAD_NUMBER Integer Number of threads.

EXIT_THRESHOLD Double Threshold (actual value) for exiting the

iterations.


11/46


SAP AG 2011 11

Output Table


Result 1st column Integer ID

2nd column Integer or double Clustered item assigned to

class number

3th column Integer or double Sum of item's attribute, which

is used in normalization.

Interface (validateKmeans)

Function: pal::validateKmeans

This is a quality measurement function for k-means clustering.


pal::validateKmeans (Table dataset, Table args, Table

result)

Input Table



Other

columns

Integer or double Attribute data

Type Data/

Class Data

1st column Integer ID

2nd column Integer Class type

Parameter Table


VARIABLE_NUM Integer Number of variables

THREAD_NUMBER Integer Number of threads

Output Table


Result 1st column VARCHAR/CHAR Name

2nd column Double Measure result


12/46


SAP AG 2011 12

Example

;#KMEANS.SQL

; # create type for kmeans resultALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";DROPTYPE T_KMEANS_RESULT_ASSIGN_TAB;CREATETYPE T_KMEANS_RESULT_ASSIGN_TABASTABLE("ID" INT,"CENTER_ASSIGN" INT,"ENERGY"DOUBLE);

; # create type for double argumentsDROPTYPE T_SINGLE_COLUMN_DOUBLE_TAB ;CREATETYPE T_SINGLE_COLUMN_DOUBLE_TABASTABLE("VALUE"DOUBLE) ;

; # create type for integer arguments

DROPTYPE T_SINGLE_COLUMN_INT_TAB ;CREATETYPE T_SINGLE_COLUMN_INT_TABASTABLE("VALUE" INT) ;

DROPTYPE KMEANS_DOUBLE_INPUT;CREATETYPE KMEANS_DOUBLE_INPUTASTABLE("ID" INT,"V000"DOUBLE, "V001"DOUBLE,

primarykey("ID"));

DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

"Name"VARCHAR(50),"intArgs"INTEGER,

"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

DROPTYPE CONTROL_T;CREATETYPE CONTROL_TASTABLE ( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

DROPPROCEDURE KMEANS_WITH_NEWDB;

CREATEPROCEDUREKMEANS_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN control CONTROL_T,

OUT cluster_assignment T_KMEANS_RESULT_ASSIGN_TAB)LANGUAGELLANGASBEGINexport Void main(Table "dataset" datasetTab,Table "control" argsTab,Table"cluster_assignment" & resultTab){pal::kmeans(datasetTab, argsTab, resultTab);}END;


13/46


14/46


SAP AG 2011 14

;#EXPECTED OUTPUT:

#validateKmeans.sqlDROPVIEWKMEANS_TYPE_ASSIGN_TAB_PAL;

CREATEVIEWKMEANS_TYPE_ASSIGN_TAB_PALASSELECT"ID", "CENTER_ASSIGN"AS"TYPE_ASSIGN"FROMKMEANS_RESULT_ASSIGN_TAB_PAL;

DROPTYPE T_KMEANS_TYPE_ASSIGN_TAB_S;CREATETYPE T_KMEANS_TYPE_ASSIGN_TAB_SASTABLE("ID"INTEGER,"TYPE_ASSIGN"INTEGER);

DROPTYPE T_KMEANS_RESULT_SVALUE_TAB;CREATETYPE T_KMEANS_RESULT_SVALUE_TABASTABLE(

"NAME"VARCHAR(50),"S"DOUBLE);

DROPPROCEDURE KMEANSVALIDATE_WITH_NEWDB;

CREATEPROCEDUREKMEANSVALIDATE_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN typeset T_KMEANS_TYPE_ASSIGN_TAB_S,IN control CONTROL_T,OUT sValue T_KMEANS_RESULT_SVALUE_TAB)LANGUAGELLANGASBEGIN

export Void main(Table "dataset" datasetTab,


15/46


16/46


SAP AG 2011 16

C4.5 Decision Tree

A decision tree is used as a classifier for determining an appropriate action or decision among a

predetermined set of actions for a given case. A decision tree helps to effectively identify the factors to

consider and how each factor has historically been associated with different outcomes of the decision.A decision tree is a classifier that uses a tree-like structure of conditions and their possible

consequences. Each node of a decision tree can be a leaf node or a decision node.

Leaf node: identifies the value of the dependent (target) variable.

Decision node: contains one condition that specifies some test on an attribute value. The

outcome of the condition is further divided into branches with subtrees or leaf nodes.

C4.5 is an algorithm used to generate a decision tree. C4.5 builds decision trees from a set of training

data , using the concept of information entropy. The training data is a set S = s1,s2,... of already

classified samples. Each sample si = x1,x2,... is a vector where x1,x2,... represent attributes or

features of the sample. The training data is augmented with a vector C = c1,c2,... where c1,c2,...

represent the class to which each sample belongs. At each node of the tree, C4.5 chooses oneattribute of the data that most effectively splits its set of samples into subsets enriched in one class or

the other. Its criterion is the normalized information gain (difference in entropy) that results from

choosing an attribute for splitting the data. The attribute with the highest normalized information gain is

chosen to make the decision. The C4.5 algorithm then recursively works on the smaller sublists. For

more information, refer tohttp://en.wikipedia.org/wiki/C4.5_algorithm.

The C4.5 decision tree functions implemented in PAL supports both discrete and continuous values. A

continuous attribute is discretised by defining fixed intervals provided by the user. For example, if the

salary ranges from $100 to $20,000, then we can form intervals such as $0 $8,000, $8,000

$18,000, and $18,000 $20,000. An attribute value will fall into any one of these intervals. In the PAL

implementation, Reduced Error Pruning (REP) Algorithm is used as pruning method.

Prerequisites The column order and column number of predicted data should be the same as the order and

number used in tree model building.

The last column of training data is used as a predicted field and its type should be discrete type.

Predicted data set should have an ID column.

Input data does not contain null value, otherwise exceptions will be thrown.

The table used to store the tree model should be a column table.

Interface (createDT)Function: pal::createDT

The createDT function creates a decision tree from input training data.


pal::createDT(Table training, Table args, Table result)
http://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithm


17/46


SAP AG 2011 17

Input Table


Training /

Historical Data

Columns VARCHAR/CHAR,

integer, or double

Table used to

build the predictivetree model

Discrete value: integer and

stringContinuous value: integer and

double

Parameter Table


START_COLUMN Integer The first attribute/column used to make

prediction in the training data set


END_COLUMN Integer The last attribute/column used to make

prediction in the training data set .This

attribute/column must contain class

information (column index starts from

zero).

PERCENTAGE Double The percentage to be applied to

determine the input training data set.


CONTINUOUS_COL (Integer,

Double)

(optional)

Defines which column needs

discretization and the interval provided

by the user (column index starts from

zero).

The integer value specifies the column

position.

The double value specifies the interval.

Output Table


Result (treemodel)

1st column CLOB Tree model savedas a JSON string

in the 1st column.

The table should be a columntable; otherwise the CLOB

type is not supported.

Interface (predictWithDT)

Function: pal::predictWithDT

The predictWithDT function is used to perform prediction by using decision trees.


18/46


SAP AG 2011 18


pal::predictWithDT(Table predictive, Table args, Table model,

Table result)

Input Table


Predicted Data Columns VARCHAR/CHAR or

integer/Double

Data to be classified

(predicted)

An ID column is

mandatory. Its

data type should

be integer.

Predictive Model 1st column CLOB Serialized tree model

Parameter Table


START_COLUMN Integer The first attribute/column used to make

prediction in the input data set (column index

starts from zero).

END_COLUMN Integer The last attribute/column used to make

prediction in the input data set (column index

starts from zero).

ID_COLUMN Integer The column of predicted data used as primary

key/ID. Column index starts from zero (columnindex starts from zero).


Output Table



2nd column VARCHAR/CHAR Predictive result

Example;# DECISION TREE

DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE( "ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT,"CLASS"VARCHAR(50));

DROPTYPE MODEL_T;

CREATETYPE MODEL_TASTABLE("Model"CLOB);


19/46


20/46


SAP AG 2011 20

INSERTINTO TESTDT_TABVALUES (4, 'West', 'Spring', 5000, 'Poor');

INSERTINTO TESTDT_TABVALUES (5, 'East', 'Spring', 200000, 'Good');

INSERTINTO TESTDT_TABVALUES (6, 'South', 'Summer', 25000, 'Poor');

INSERTINTO TESTDT_TABVALUES (7, 'South', 'Spring', 10000, 'Average');

INSERTINTO TESTDT_TABVALUES (8, 'North', 'Winter', 50000, 'Average');

DROPTABLE PREDICTIVE_TAB;

CREATECOLUMNTABLE PREDICTIVE_TAB ("ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT);

INSERTINTO PREDICTIVE_TABVALUES (0,'South', 'Autumn', 60000);

INSERTINTO PREDICTIVE_TABVALUES (1,'North', 'Spring', 30000);

INSERTINTO PREDICTIVE_TABVALUES (2,'South', 'Summer', 25000);

INSERTINTO PREDICTIVE_TABVALUES (3,'West', 'Winter', 5000);

DROPTABLE #CONTROL_TAB;

CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));

INSERTINTO #CONTROL_TABVALUES ('START_COLUMN',1,null,null);

INSERTINTO #CONTROL_TABVALUES ('END_COLUMN',4,null,null);

INSERTINTO #CONTROL_TABVALUES ('PERCENTAGE',null,0.71,null);

INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',1,null,null);

INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,25000,null);

INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,60000,null);

DROPTABLE #PRE_CONTROL_TAB;

CREATELOCALTEMPORARYCOLUMNTABLE #PRE_CONTROL_TAB ("Name"VARCHAR(100), "intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));

INSERTINTO #PRE_CONTROL_TABVALUES ('START_COLUMN',1,null,null);

INSERTINTO #PRE_CONTROL_TABVALUES ('END_COLUMN',3,null,null);

INSERTINTO #PRE_CONTROL_TABVALUES ('THREAD_NUMBER',2,null,null);

INSERTINTO #PRE_CONTROL_TABVALUES ('ID_COLUMN',0,null,null);

DROPTABLE RESULTS_TAB;

CREATECOLUMNTABLE RESULTS_TAB ("ID" INT, "CLASS"VARCHAR(50));

DROPTABLE MODEL_TAB;CREATECOLUMNTABLE MODEL_TAB ("Model"CLOB);

CALL palDT1(TESTDT_TAB, "#CONTROL_TAB", MODEL_TAB)with overview;

CALL palDT2(PREDICTIVE_TAB, "#PRE_CONTROL_TAB", MODEL_TAB, RESULTS_TAB)with overview;

SELECT * FROMRESULTS_TAB;

;#EXPECTED OUTPUT:


21/46


SAP AG 2011 21


22/46


SAP AG 2011 22

KNN

The k-nearest neighbor algorithm (KNN) is a method for classifying objects based on closest training

examples in the feature space. KNN is amongst the simplest of all machine learning algorithms: an

object is classified by a majority vote of its neighbors, with the object being assigned to the class mostcommon amongst its knearest neighbors (kis a positive integer, typically small). If k = 1, then the

object is simply assigned to the class of its nearest neighbor. The neighbors are taken from a set of

objects for which the correct classification is known.

The training examples are vectors in a multidimensional feature space, each with a class label. The

training phase of the algorithm consists only of storing the feature vectors and class labels of the

training samples. In the classification phase, k is a user-defined constant, and an unlabelled vector (a

query or test point) is classified by assigning the label which is most frequent among the ktraining

samples nearest to that query point. Usually Euclidean distance is used as the distance metric. For

more information, refer tohttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.

The PAL implementation of KNN supports multi-thread and different voting type.

Prerequisites The first column of training data and input data should be ID column. The second column of

training data should be class type. The class type column is of integer type. Other data columns

are of integer or double type.

Input data does not contain null value.

Interface (knn)Function: pal::knn

This is a classification function using the KNN algorithm.


pal::knn(Table value, Table classvalue, Table args,

Table result)

Input Table


Training Data 1st column Integer ID

2nd column Integer Class type


Class Data 1st column Integer ID


Parameter Table


K_NEAREST_NEIGHBOURS Integer Number of nearest neighbors (k)
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm


23/46


SAP AG 2011 23


ATTRIBUTE_NUM Integer Number of attributes

VOTING_TYPE Integer Voting type:

0 = majority voting

1 = distance-weighted voting


Output Table



2nd column Integer or double class type

Example

;# knn.sql

ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";

DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);

DROPTYPE CLASSDATA_T;

CREATETYPE CLASSDATA_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

DROPTYPE RESULT_T;

CREATETYPE RESULT_TASTABLE("ID" INT,"Type" INT);

DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

"Name"VARCHAR(50),

"intArgs"INTEGER,

"doubleArgs"DOUBLE,

"stringArgs"VARCHAR(100));

DROPTYPE CONTROL_T;

CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

DROPPROCEDURE palKNN;


24/46


SAP AG 2011 24

CREATEPROCEDURE palKNN( IN data DATA_T, IN classdata CLASSDATA_T, INcontrol CONTROL_T, OUT results RESULT_T )

LANGUAGELLANGAS

BEGIN

export Void main(Table"data" dataTab,

Table "classdata" classdataTab,

Table "control" argsTab,

Table "results" & resultsTab) {

pal::knn(dataTab, classdataTab, argsTab, resultsTab);

}

END;

DROPTABLE DATA_TAB;

CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);

INSERTINTO DATA_TABVALUES (0,2,1,1);










DROPTABLE CLASSDATA_TAB;

CREATECOLUMNTABLE CLASSDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

INSERTINTO CLASSDATA_TABVALUES (0,2,1);






INSERTINTO CLASSDATA_TABVALUES (6,500,999);INSERTINTO CLASSDATA_TABVALUES (7,199,999);

TRUNCATETABLE #CONTROL_TAB;

INSERTINTO #CONTROL_TABVALUES ('K_NEAREST_NEIGHBOURS',3,null,null);

INSERTINTO #CONTROL_TABVALUES ('ATTRIBUTE_NUM',2,null,null);

INSERTINTO #CONTROL_TABVALUES ('VOTING_TYPE',0,null,null);



CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Type" INT);


25/46


SAP AG 2011 25

CALL palKNN(DATA_TAB, CLASSDATA_TAB, "#CONTROL_TAB", RESULTS_TAB)withoverview;


;#EXPECTED OUTPUT:


26/46


SAP AG 2011 26

Multiple Linear Regression

Linear regression is an approach to model the relationship between a scalar variable yand one or

more variables denoted X. In linear regression, data are modeled using linear functions, and unknown

model parameters are estimated from the data. Such models are called linear models.For more information, refer tohttp://en.wikipedia.org/wiki/Linear_regression.

In PAL, the implementation of linear regression is to solve the equation:

Ax=Y

Where A is MxN matrix, x is Nx1 matrix, and Y is Mx1 matrix.

Then, x= reverse(A)*Y

And it can be transformed into

x= reverse(transpose(A)*A) *transpose(A) *Y

The implementation also supports calculating the F value and R^2 determining statistical significance.

Prerequisites No missing or null data in inputs.

Data is numeric, not categorical.

Given the structure as Y and X1...Xn, there must be more than n+1 records available for

analysis.

Interface (linearRegression)

Function: pal::linearRegression

This is a multiple linear regression function.


pal::linearRegression( Table data, Table args, Table result,

Table fitted, Table significance)

Input Table



2nd column Integer or double Variable y

Other columns Integer or double Variable Xn

Parameter Table


VARIABLE_NUM Integer Number of variable X

http://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regression


27/46


SAP AG 2011 27

Output Table



2nd column Integer or double Value Ai (A0 is

the intercept; A1 is

the beta

coefficient for X1,

A2 is the beta

coefficient for X2

etc )

Fitted Data 1st column Integer ID

2nd column Integer or double Value Yi

Significance 1st column VARCHAR/CHAR Name (R^2 / F)

2nd column Double Value

Interface (forecastWithLR)

Function: pal::forecastWithLR

This function is used to perform predication with linear regression result.


pal::forecastWithLR( Table predictdata, Table coefficient,

Table args, Table result)

Input Table


Predictive Data 1st column Integer ID

Other columns Integer or double Variable Xn

Coefficient 1st column Integer ID

2nd column Integer or double Value Ai

Parameter Table


VARIABLE_NUM Integer Number of variable X



28/46


SAP AG 2011 28

Output Table


Fitted Result 1st column Integer ID

2nd column Integer or double Value Yi

Example

;# linearRegression


DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);

DROPTYPE RESULT_T;

CREATETYPE RESULT_TASTABLE("ID" INT,"Ai"DOUBLE);

DROPTYPE FITTED_T;

CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);

DROPTYPE SIGNIFICANCE_T;

CREATETYPE SIGNIFICANCE_TASTABLE("NAME"varchar(50),"VALUE"DOUBLE);

DROPTYPE CONTROL_T;


DROPPROCEDURE palLR;

CREATEPROCEDURE palLR( IN data DATA_T, IN control CONTROL_T, OUTresults RESULT_T, OUT fittedValue FITTED_T,OUT significanceSIGNIFICANCE_T )

LANGUAGELLANG

AS

BEGIN

export Void main(Table"data" dataTab,


Table "results" & resultsTab,

Table "fittedValue" & fittedTab,

Table "significance" &significanceTab) {

pal::linearRegression(dataTab, argsTab, resultsTab,fittedTab,significanceTab);

}

END;


29/46


SAP AG 2011 29

DROPTABLE DATA_TAB;

CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);

INSERTINTO DATA_TABVALUES (0,0.5,0.13,0.33);

INSERTINTO DATA_TABVALUES (1,0.15,0.14,0.34);INSERTINTO DATA_TABVALUES (2,0.25,0.15,0.36);









CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ( "Name"VARCHAR(50),

"intArgs"INTEGER,

"doubleArgs"DOUBLE,


INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);



CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Ai"DOUBLE);

DROPTABLE FITTED_TAB;

CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);

DROPTABLE SIGNIFICANCE_TAB;

CREATECOLUMNTABLE SIGNIFICANCE_TAB ("NAME"varchar(50),"VALUE"DOUBLE);

CALL palLR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB,SIGNIFICANCE_TAB)with overview;


SELECT * FROMFITTED_TAB;

SELECT * FROMSIGNIFICANCE_TAB;

;#EXPECTED OUTPUT:RESULTS_TAB:

FITTED_TAB:


30/46


SAP AG 2011 30

SIGNIFICANCE_TAB:

;# forecastWithLR


DROPTYPE PREDICT_T;

CREATETYPE PREDICT_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

DROPTYPE COEFFICIENT_T;CREATETYPE COEFFICIENT_TASTABLE("ID" INT,"Ai"DOUBLE);

DROPTYPE FITTED_T;

CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);

DROPTYPE CONTROL_T;


DROPPROCEDURE palForecastWithLR;

CREATEPROCEDURE palForecastWithLR( IN predictData PREDICT_T, INcoefficient COEFFICIENT_T, IN control CONTROL_T, OUT fittedValueFITTED_T )

LANGUAGELLANG

AS

BEGIN

export Void main(

Table "predictData"predictDataTab,

Table "coefficient" coefficientTab,



31/46


SAP AG 2011 31

Table "fittedValue" & fittedTab) {

pal::forecastWithLR(predictDataTab, coefficientTab, argsTab,fittedTab);

}

END;

DROPTABLE PREDICTDATA_TAB;

CREATECOLUMNTABLE PREDICTDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

INSERTINTO PREDICTDATA_TABVALUES (0,0.5,0.3);

INSERTINTO PREDICTDATA_TABVALUES (1,4,0.4);

INSERTINTO PREDICTDATA_TABVALUES (2,0,1.6);




CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

"Name"VARCHAR(50),

"intArgs"INTEGER,

"doubleArgs"DOUBLE,


INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);


DROPTABLE COEEFICIENT_TAB;CREATECOLUMNTABLE COEEFICIENT_TAB ("ID" INT,"Ai"DOUBLE);

INSERTINTO COEEFICIENT_TABVALUES (0,1.7120914258645001);

INSERTINTO COEEFICIENT_TABVALUES (1,0.2652771198483208);

INSERTINTO COEEFICIENT_TABVALUES (2,-3.471103742302148);

DROPTABLE FITTED_TAB;

CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);

CALL palForecastWithLR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB",FITTED_TAB)with overview;

SELECT * FROMFITTED_TAB;

;#EXPECTED OUTPUT:


32/46


SAP AG 2011 32

Apriori

Apriori is a classic predictive analysis algorithm for finding association rules in association analysis.

Association analysis uncovers the hidden patterns, correlations or casual structures among a set ofitems or objects. For example, association analysis is used to understand what products and services

customers tend to purchase at the same time. By analyzing the purchasing trends of customers with

association analysis, then prediction of their future behavior may be made.

Apriori is designed to operate on databases containing transactions. As is common in association rule

mining, given a set of items, the algorithm attempts to find subsets which are common to at least a

minimum number of the item sets. Apriori uses a bottom up approach, where frequent subsets are

extended one item at a time, a step known as candidate generation, and groups of candidates are

tested against the data. The algorithm terminates when no further successful extensions are found.

Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It

generates candidate item sets of length kfrom item sets of length k-1. Then it prunes the candidates

which have an infrequent sub pattern. The candidate set contains all frequent k-length item sets. Afterthat, it scans the transaction database to determine frequent item sets among the candidates. For

more information, refer tohttp://en.wikipedia.org/wiki/Apriori_algorithm.

The Apriori function in PAL uses vertical data format to store the transaction data in memory. The

function can take string or integer transaction ID and item ID as input. It supports the output of

confidence, support, and lift value, but does not limit the number of output rules. However, you can

use SQL Script to select the number of output rules, for example:

SELECT TOP 2000 FROM RULE_RESULTS where lift >0.5

PrerequisiteInput data does not contain null value.

Interface (aprioriRule)Function: pal::aprioriRule

This function reads input transactions data and generates association rules by the Apriori algorithm.


pal::aprioriRule( Table dataset, Table args, Table result)

Input Table


Dataset/Historical Data Transaction ID

column

Integer or

VARCHAR/CHAR

Transaction ID

Item column Integer or

VARCHAR/CHAR

Item ID
http://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithm


33/46


SAP AG 2011 33

Parameter Table


MIN_SUPPORT Double User-specified minimum support (actual

value).

MIN_CONFIDENCE Double User-specified minimum confidence (actual

value).

TID_COLUMN Integer Indicates which column stores the

transaction ID (column index starts from

zero).

ITEM_COLUMN Integer Indicates which column stores the items ID



Output Table


Result 1st column VARCHAR/CHAR Leading items

2nd column VARCHAR/CHAR Dependent items

3rd column Double Support value

4th column Double Confidence value

5th column Double Lift value

Example;#aprioriRule

DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE( "ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));

DROPTYPE RESULT_T;

CREATETYPE RESULT_TASTABLE("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500),"SUPPORT"DOUBLE, "CONFIDENCE"DOUBLE,"LIFT"DOUBLE);


CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB(

"Name"VARCHAR(50),

"intArgs"INTEGER,

"doubleArgs"DOUBLE,


DROPTYPE CONTROL_T;


34/46


SAP AG 2011 34

CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

DROPPROCEDURE palapprioriRule;

CREATEPROCEDURE palapprioriRule( IN data DATA_T, IN control CONTROL_T,OUT results RESULT_T )

LANGUAGELLANG

AS

BEGIN

export Void main(Table "data"dataTab,



pal::aprioriRule(dataTab, argsTab, resultsTab);

}

END;

DROPTABLE TESTASSOCIATION_TAB;

CREATECOLUMNTABLE TESTASSOCIATION_TAB("ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));

INSERTINTO TESTASSOCIATION_TABVALUES (0, 0, 'item1');


INSERTINTO TESTASSOCIATION_TABVALUES (2, 0, 'item5'); #transacion T0:I1,I2,I5

INSERTINTO TESTASSOCIATION_TABVALUES (3, 1, 'item2');INSERTINTO TESTASSOCIATION_TABVALUES (4, 1, 'item4'); #transacion T1:I2,I4






INSERTINTO TESTASSOCIATION_TABVALUES (10, 4,'item1');

INSERTINTO TESTASSOCIATION_TABVALUES (11, 4,'item3');














35/46


SAP AG 2011 35

CREATECOLUMNTABLE RESULTS_TAB ("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500), "SUPPORT"Double, "CONFIDENCE"Double,"LIFT"DOUBLE);

TRUNCATETABLE #CONTROL_TAB;

INSERTINTO #CONTROL_TABVALUES ('TID_COLUMN',1,null,null);INSERTINTO #CONTROL_TABVALUES ('ITEM_COLUMN',2,null,null);


INSERTINTO #CONTROL_TABVALUES ('MIN_SUPPORT',null,0.2,null);

INSERTINTO #CONTROL_TABVALUES ('MIN_CONFIDENCE',null,0.2,null);

TRUNCATETABLE RESULTS_TAB;

CALL palapprioriRule(TESTASSOCIATION_TAB, "#CONTROL_TAB", RESULTS_TAB)with overview;


;#EXPECTED OUTPUT:


36/46


SAP AG 2011 36

ABC Classification

ABC Classification is used to classify objects, such as customers, employees, or products, based on a

particular measure, such as revenue or profit. ABC analysis suggests that inventories of anorganization are not of equal value. Thus, the inventories are grouped into three categories (A, B, and

C) by their estimated importance. A items are very important for an organization. B items are

important, but less important than A items and more important than C items. Therefore, B items

are of medium importance, and C items are marginally important.

An example of ABC classification is as follows:

A items 20% of the items account for 70% of the annual consumption value of all items.

B items 30% of the items account for 25% of the annual consumption value of all items.

C items 50% of the items account for 5% of the annual consumption value of all items.

For more information, refer tohttp://en.wikipedia.org/wiki/ABC_analysis.

Prerequisite

Input data cannot contain null value.

The item names in the Input table must be of string data type and be unique.

Interface (abcAnalysis)

Function: pal::abcAnalysis

This function performs the ABC analysis algorithm.


pal::abcAnalysis ( Table target, Table args, Table result)

Input Table


Target Data 1st column VARCHAR/CHAR Item name

2nd column Double Value

Parameter Table


START_COLUMN Integer The first column used to do the

classification (column index starts from

zero).

END_COLUMN Integer The last column used to do the classification


THREAD_NUMBER Integer Number of threadsPERCENT_A Double Interval for A class
http://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysis


37/46


SAP AG 2011 37


PERCENT_B Double Interval for B class

PERCENT_C Double Interval for C class

Output Table


Result 1st column VARCHAR/CHAR ABC class

2nd column VARCHAR/CHAR Items

Example

;# ABC

DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE("ITEM"VARCHAR(100),"VALUE"DOUBLE);

DROPTYPE CONTROL_T;

CREATETYPE CONTROL_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE,"strArgs"VARCHAR(100));

DROPTYPE RESULT_T;

CREATETYPE RESULT_TASTABLE("ABC"VARCHAR(10),"ITEM"VARCHAR(100));

DROPPROCEDURE palAbcAnalysis;

CREATEPROCEDURE palAbcAnalysis( IN target DATA_T, IN control CONTROL_T,OUT results RESULT_T )

LANGUAGELLANG

AS

BEGIN

export Void main(Table "target" targetTab,

Table "control" controlTab,

Table "results" & resultsTab){

pal::abcAnalysis(targetTab, controlTab, resultsTab);

}

END;

DROPTABLE #CONTROL_TBL;

CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE,"strArgs"VARCHAR(100));

INSERTINTO #CONTROL_TBLVALUES ('START_COLUMN',0,null,NULL);

INSERTINTO #CONTROL_TBLVALUES ('END_COLUMN',1,null,null);

INSERTINTO #CONTROL_TBLVALUES ('THREAD_NUMBER',2,null,null);

INSERTINTO #CONTROL_TBLVALUES ('PERCENT_A',null,0.7,null);


38/46


SAP AG 2011 38

INSERTINTO #CONTROL_TBLVALUES ('PERCENT_B',null,0.2,null);

INSERTINTO #CONTROL_TBLVALUES ('PERCENT_C',null,0.1,null);

DROPTABLE TESTABCTAB;

CREATECOLUMNTABLE TESTABCTAB("ITEM"VARCHAR(100),"VALUE"DOUBLE);INSERTINTO TESTABCTABVALUES ('item1', 15.4);

INSERTINTO TESTABCTABVALUES ('item2', 200.4);


INSERTINTO TESTABCTABVALUES ('item4', 100.9);#100.9






INSERTINTO

TESTABCTABVALUES

('item10', 9.4);

DROPTABLE RESULT_TBL;

CREATECOLUMNTABLE RESULT_TBL("ABC"VARCHAR(10),"ITEM"VARCHAR(100));

CALL palAbcAnalysis(TESTABCTAB, "#CONTROL_TBL", RESULT_TBL)withoverview;

SELECT * FROMRESULT_TBL;

;#EXPECTED OUTPUT:


39/46


SAP AG 2011 39

Weighted Score Table

A weighted score table is a method of evaluating alternatives when the importance of each criterion

differs. In a weighted score table, each alternative is given a score for each criterion. These scores are

then weighted by the importance of each criterion. All of an alternative's weighted scores are thenadded together to calculate that alternative's total weighted score. The alternative with the highest total

score should be the best alternative. Weighted score tables can be used to make predictions about

future customer behavior. A model based on historical data in a data mining application may be

applied to new data to make prediction. The prediction, that is, the output of the model, is also called a

Score. A single score for customers can be calculated by taking into account different dimensions.

A function defined by weighted score tables is a linear combination of functions of a variable.

f(x1,,xn) = w1f1(x1) + + wnfn(xn)

Prerequisites

Input data cannot contain null value.

The column of the Map Function table should be sorted by the attribute order of the Input Data

table.

Interface (weightedTable)

Function: pal::weightedTable

This function performs weighted table calculation. It is similar to the Volume Driver function in the

Business Function Library (BFL). Volume Driver calculates only one column, but weightedTable

calculates multiple columns at the same time.


pal::weightedTable ( Table target, Table mapfun, Table

control, Table args, Table result)

Input Table


Target/Input

Data

Columns VARCHAR/CHAR,

integer, or double

Specifies which

will be used to

calculate the

scores

Discrete value: integer, string,

double;

Continuous value: integer,

double;

An ID column is mandatory. Its

data type should be integer.

Map Function Columns VARCHAR/CHAR,

integer, or double

Creates the map

function

Every attribute (except ID) in the

Input Data table maps to two

columns in the Map Function

table: Key column and Value

column. The Value column must

be double type.


40/46


SAP AG 2011 40


Control Columns Integer or double This table has three columns.

When the Input Data table has n

attributes (except ID), the WeightTable will have nrows.

Output Table



2nd column Double Result value

Parameter Table


START_COLUMN Integer The first column used to do the calculation


END_COLUMN Integer The last column used to do the calculation



ID_COLUMN Integer Specifies the ID column (column index

starts from zero).

Example

;#weightedTable

DROPTYPE DATA_T;

CREATETYPE DATA_TASTABLE( "ID" INT,"GENDER"VARCHAR(10),"INCOME"INT,"HEIGHT"DOUBLE);

DROPTYPE MAP_FUN_T;

CREATETYPE MAP_FUN_TASTABLE("GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);

DROPTYPE CONTROL_T;

CREATETYPE CONTROL_TASTABLE( "WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);

DROPTYPE PARAMETERS_T;

CREATETYPE PARAMETERS_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE);


41/46


SAP AG 2011 41

DROPTYPE RESULT_T;

CREATETYPE RESULT_TASTABLE("ID" INT,"result"DOUBLE);

DROPPROCEDURE palWeightTable;

CREATEPROCEDURE palWeightTable( IN target DATA_T, IN mapfun MAP_FUN_T,IN control CONTROL_T, IN parameters PARAMETERS_T,OUT results RESULT_T )

LANGUAGELLANG

AS

BEGIN

export Void main(Table "target" targetTab,

Table "mapfun" mapfunTab,

Table"control" controlTab,

Table"parameters" parametersTab,


pal::weightedTable(targetTab, mapfunTab, controlTab, parametersTab,resultsTab);

}

END;

DROPTABLE TESTTARGET_TBL;

CREATECOLUMNTABLE TESTTARGET_TBL ("ID" INT,"GENDER"VARCHAR(10),"INCOME" INT,"HEIGHT"DOUBLE);

INSERTINTO TESTTARGET_TBLVALUES (0,'male',5000,1.73);


INSERTINTO TESTTARGET_TBLVALUES (2,'female',6000,1.55);






INSERTINTO TESTTARGET_TBLVALUES (8,'female',5500,1.85);#5500


DROPTABLE MAP_FUN_TBL;

CREATECOLUMNTABLE MAP_FUN_TBL ( "GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);

INSERTINTO MAP_FUN_TBLVALUES ('male',2.0, 0,0.0, 1.5,0.0);

INSERTINTO MAP_FUN_TBLVALUES ('female',1.5, 5500,1.0, 1.6,1.0);

INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 9000,2.0, 1.71,2.0);

INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 12000,3.0, 1.80,3.0);

DROPTABLE CONTROL_TBL;


42/46


SAP AG 2011 42

CREATECOLUMNTABLE CONTROL_TBL ("WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);

INSERTINTO CONTROL_TBLVALUES (0.5,1,2);

INSERTINTO CONTROL_TBLVALUES (2.0,-1,4);

INSERTINTO CONTROL_TBLVALUES (1.0,-1,4);

DROPTABLE #PARAMETERS_TBL;

CREATELOCALTEMPORARYCOLUMNTABLE #PARAMETERS_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE);

INSERTINTO #PARAMETERS_TBLVALUES ('ID_COLUMN',0,null);

INSERTINTO #PARAMETERS_TBLVALUES ('START_COLUMN',1,null);

INSERTINTO #PARAMETERS_TBLVALUES ('END_COLUMN',3,null);

INSERTINTO #PARAMETERS_TBLVALUES ('THREAD_NUMBER',2,null);

DROPTABLE RESULT_TBL;

CREATECOLUMNTABLE RESULT_TBL("ID" INT,"result"DOUBLE);

CALL palWeightTable(TESTTARGET_TBL, MAP_FUN_TBL, CONTROL_TBL,"#PARAMETERS_TBL", RESULT_TBL)with overview;

SELECT * FROMRESULT_TBL;;#EXPECTED OUTPUT:


43/46


SAP AG 2011 43

Log and TraceTo learn about the details of PAL function implementation or trace a problem occurred during function

runtime, you can check the log information in SAP HANA Database Administration.

Procedure

To open a log file:

1. In SAP HANA Database Administration, choose TraceLevels. Then click Add Componentin

the Trace Levelsdialog box, enter the PAL function name, and click OK. The name should beentered as PAL_, for example, PAL_APRIORIRULE.


44/46


SAP AG 2011 44

2. Select a log type for the newly added PAL component and click OK, as shown below.

3. Right-click on indexserver_ and select Show File.


45/46


SAP AG 2011 45

Result

The log file for the specified PAL function is displayed. You can use the Findbutton to search for the

log information that you need.

predictive analysis library manual

Documents