predictive analysis library manual
TRANSCRIPT
-
8/2/2019 Predictive Analysis Library Manual
1/46
-
8/2/2019 Predictive Analysis Library Manual
2/46
SAP AG 2011 2
Table of Contents
Overview .......................................................................................................................................... 4PAL Common Interface .................................................................................................................. 5
Input Data Table .................................................................................................................... 5Parameter Table .................................................................................................................... 6
Specifying the ID Column ............................................................................................. 6Output Data Table ................................................................................................................. 7
List of PAL Algorithms ................................................................................................................... 8PAL Algorithm Descriptions.......................................................................................................... 9
K-means ................................................................................................................................ 9Prerequisites ................................................................................................................. 9Interface (kmeans) ........................................................................................................ 9Interface (validateKmeans)......................................................................................... 11Example ...................................................................................................................... 12
C4.5 Decision Tree .............................................................................................................. 16Prerequisites ............................................................................................................... 16Interface (createDT) ................................................................................................... 16Interface (predictWithDT) ........................................................................................... 17Example ...................................................................................................................... 18
KNN .................................................................................................................................... 22
Prerequisites ............................................................................................................... 22Interface (knn) ............................................................................................................ 22Example ...................................................................................................................... 23
Multiple Linear Regression .................................................................................................. 26Prerequisites ............................................................................................................... 26Interface (linearRegression) ....................................................................................... 26Interface (forecastWithLR).......................................................................................... 27Example ...................................................................................................................... 28
Apriori .................................................................................................................................. 32Prerequisite ................................................................................................................ 32Interface (aprioriRule) ................................................................................................. 32Example ...................................................................................................................... 33
ABC Classification ............................................................................................................... 36Prerequisite ................................................................................................................ 36Interface (abcAnalysis) ............................................................................................... 36Example ...................................................................................................................... 37
Weighted Score Table ......................................................................................................... 39Prerequisites ............................................................................................................... 39Interface (weightedTable) ........................................................................................... 39
-
8/2/2019 Predictive Analysis Library Manual
3/46
SAP AG 2011 3
Example ...................................................................................................................... 40Log and Trace ............................................................................................................................... 43Copyrights ..................................................................................................................................... 46
-
8/2/2019 Predictive Analysis Library Manual
4/46
Predictive Analysis Library Reference Manual
SAP AG 2011 4
OverviewSince the SAP HANA 1.0 GA, SQL Script v2 can be used to express application logic within the
database that exceeds the capabilities of pure SQL. With enhanced control flow capabilities, SQL
Script v2 is more suitable for pushing complex application logic to the SAP HANA database.
When designing HANA applications, procedures are the main programmable containers. However, it is
difficult and often impossible to describe predictive analysis logic with procedures. For example, an
application may need to perform a cluster analysis in a huge customer table with a terra byte of data. It
is impossible to implement the analysis in a procedure even with the simple classic K-means algorithm.
Obviously, it is not a wise decision to copy a huge table to the application server to perform the K-
means calculation, because data-copying will be slow and is not necessary.
SAP provides the Predictive Analysis Library (PAL) to offer you the flexibility and efficiency to develop
HANA applications requiring predictive functionality. In the above case, using the PAL is the best
choice. PAL algorithms can be called directly in L wrapper within SQL Script v2. The inputs and
outputs are all tables. Currently, PAL includes seven well known predictive analysis algorithms in threedata mining algorithm categories:
Cluster analysis
Classification analysis
Association analysis
The functions in PAL are predefined. More functions will be supported in future releases. The seven
algorithms included in PAL were carefully selected based on the following criteria:
1. These algorithms are required for SAP HANA applications.
2. Market surveys (e.g. Rexer Analytics and KDnuggets polls) show that these algorithms are most
commonly used.
3. These are the most common algorithms available in database products from other vendors in
the marketplace, such as Microsoft SQL Server, Oracle, and IBM DB2.
-
8/2/2019 Predictive Analysis Library Manual
5/46
Predictive Analysis Library Reference Manual
SAP AG 2011 5
PAL Common InterfaceThe Predictive Analysis Library uses tables as the data interface. Three types of tables are used for
each PAL function:
Input data table Supplies input data to PAL functions. Each function can have one or more
input data tables.
Parameter table Supplies parameters to PAL functions. Parameters are special inputs that
control what the functions do. Each function has only one parameter table.
Output data table Holds the output results calculated by PAL functions. Each function can
have one or more output data tables.
The structures of the above tables are pre-defined and cannot be changed. Table names and column
names can be specified by the users.
Input Data Table
Input data tables contain the input data required to perform predictive analysis, sometimes described
as the historical and training data used to build predictive models.
For some cluster and classification functions, primary keys/IDs are required in the input data table, so
that the output data table can contain the same primary key/ID column. Usually the ID column is the
first column of the input data table. For some PAL functions, you can specify which column should be
the ID column. SeeSpecifying the ID Columnfor details.
A typical input data table looks like the following:
ID X1 X2 X3
0 0.1 0.1 0.1
1 0.1 0 0.1
2 10.1 10.1 0.1
3 10.1 9.9 0.1
4 10.1 10.1 0.1
5 1000.1 1000.1 0.1
6 999.1 1000.1 0.1
7 0.1 10000.1 0.18 0.1 9999.1 0.1
9 11.1 10.1 0.1
10 0.001 0.001 0.1
-
8/2/2019 Predictive Analysis Library Manual
6/46
Predictive Analysis Library Reference Manual
SAP AG 2011 6
Parameter Table
PAL functions use parameter tables to transfer parameter values. Each PAL function has its own
parameter table. To avoid the conflict of table names when several users call PAL functions at thesame time, the parameter table must be created as a local temporary column table, so that each
parameter table has its unique scope per session. The table structure is defined as below. Each row
contains only one parameter value.
Column Name Data Type Description
Name VARCHAR/CHAR Parameter name
intArgs Integer Integer parameter value
doubleArgs Double Double parameter value
stringArgs VARCHAR/CHAR String parameter value
The following shows an example of a parameter table with three parameters. The first parameter,
THREAD_NUMBER, is an integer parameter. Thus, in the THREAD_NUMBER row, you should fill the
parameter value in the intArgs column, and leave the doubleArgs and stringArgs columns blank.
Name intArgs doubleArgs stringArgs
THREAD_NUMBER 1
SUPPORT 0.2
VAR_NAME hello
Specifying the ID Column
Some cluster and classification functions include an ID column in the input data table. Usually the ID
column is the first column of the input data table. For some PAL functions, you can specify a column
number for the ID_COLUMN parameter in the parameter table to tell the function which column should
be the ID column. Column number starts from 0, so the first column is column 0, the second column 1,
and so on. For example, if you want to specify the first column as the ID column, the parameter table
should contain the following row:
Name intArgs doubleArgs stringArgs
ID_COLUMN 0
-
8/2/2019 Predictive Analysis Library Manual
7/46
Predictive Analysis Library Reference Manual
SAP AG 2011 7
Output Data Table
Output tables contain predicted results, such as classification types and fitted values. A typical output
table looks like the following:ID CENTER_ASSIGN ENERGY
0 0 0.3
1 1 0.2
2 4 20.3
3 4 20.1
4 4 20.3
5 4 2000.3
6 4 1999.3
7 2 10000.3
8 2 9999.3
9 3 21.3
10 1 0.12
-
8/2/2019 Predictive Analysis Library Manual
8/46
Predictive Analysis Library Reference Manual
SAP AG 2011 8
List of PAL AlgorithmsThe following table lists all available algorithms and functions in the Predictive Analysis Library. The
algorithms are grouped by categories.
To learn more about an algorithm, click the algorithm name.
To learn more about function interface, click the function name.
Category Algorithm Function Name Description
Cluster Analysis K-means kmeans Clusters data using the k-means algorithm.
validateKmeans Validates or measures the quality of
clustered result.
ClassificationAnalysis C4.5 Decision Tree
createDT Creates tree models using the C4.5algorithm. The model is represented in
JSON.
predictWithDT Uses the tree model to perform prediction.
KNN knn K-nearest neighbor algorithm function.
Multiple Linear
Regression
linearRegression Multiple linear regression algorithm
function.
forecastWithLR Makes forecasts using the regression
equation.
AssociationAnalysis Apriori
aprioriRule Creates association rules using the Apriorialgorithm.
Other ABC Classification abcAnalysis Performs ABC classification analysis. For
example, when 20% items contribute 80%
of the total revenue, these 20% items can
be put into class A.
Weighted Score
Table
weightedTable Performs weighted table calculation.
-
8/2/2019 Predictive Analysis Library Manual
9/46
Predictive Analysis Library Reference Manual
SAP AG 2011 9
PAL Algorithm DescriptionsThis section contains detailed descriptions of all available algorithms in the Predictive Analysis Library.
The following information is provided for each algorithm.
Algorithm description
Prerequisites
Interface description (function name; L function signature; input, parameter, and output tables)
Example
K-means
In predictive analysis, k-means clustering is a method of cluster analysis which aims to partition n
observations or records into kclusters in which each observation belongs to the cluster with the
nearest mean. In marketing and customer-relationship management areas, it uses customer data to
track customer behavior and create strategic business initiatives. Organizations can use this data to
divide customers into segments based on variants such as demography, customer behavior, customer
profitability, measure of risk, and lifetime value of a customer or retention probability.
Clustering works to group records together according to an algorithm or mathematical formula that
attempts to find centroids, or centers, around which similar records gravitate. The most common
algorithm uses an iterative refinement technique. It is also referred to as Lloyd's algorithm:
Given an initial set of kmeans m1,...,mk, the algorithm proceeds by alternating between two steps:
Assignment step: Assign each observation to the cluster with the closest mean.
Update step: Calculate the new means to be the centroid of the observations in the cluster.
The algorithm repeats until the assignments no longer change.
For more information, refer tohttp://en.wikipedia.org/wiki/K-means_clustering.
The k-means implementation in PAL supports multi-thread, data normalization, different distance level
measurement, and cluster quality measure (Silhouette).The implementation doesn t support
categorical data, however this can be managed through data transformation. The first K and random K
starting methods are supported.
Prerequisites The input data should contain an ID column and the other columns should be integer or double
data type.
Input data does not contain null value. The function will issue errors when encountering null
values.
Interface (kmeans)
Function: pal::kmeans
This is a clustering function using the k-means algorithm.
http://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clustering -
8/2/2019 Predictive Analysis Library Manual
10/46
Predictive Analysis Library Reference Manual
SAP AG 2011 10
L Function Signature
pal::kmeans(Table dataset, Table args, Table result)
Input Table
Table Column Column Data Type Description Constraint
Data 1st column Integer ID
Other columns Integer or double Attribute data
Parameter Table
Name Data Type Description
GROUP_NUMBER Integer Number of groups (k).
DISTANCE_LEVEL Integer Computes the distance between item and
cluster center.
2 = Euclidean distance
MAX_ITERATION Integer Maximum number of iterations.
START_COLUMN Integer The index number of the first data column
(column index starts from zero).
COLUMN_NUM Integer Number of data columns.
START_ROW Integer The index number of the first data row (row
index starts from zero).
ROW_NUM Integer Number of data rows.
INIT_TYPE Integer Center initialization type:
1 = first K
2 = weighted random with replacement
3 = random without replacement
NORMALIZATION Integer Normalization type:
0 = no
1 = yes, for each point X(x1,x2,...xn),normalized value will be X'
(|x1|/S,|x2|/S...|xn|/S) where S =
|x1|+|x2|+...|xn|
THREAD_NUMBER Integer Number of threads.
EXIT_THRESHOLD Double Threshold (actual value) for exiting the
iterations.
-
8/2/2019 Predictive Analysis Library Manual
11/46
Predictive Analysis Library Reference Manual
SAP AG 2011 11
Output Table
Table Column Column Data Type Description Constraint
Result 1st column Integer ID
2nd column Integer or double Clustered item assigned to
class number
3th column Integer or double Sum of item's attribute, which
is used in normalization.
Interface (validateKmeans)
Function: pal::validateKmeans
This is a quality measurement function for k-means clustering.
L Function Signature
pal::validateKmeans (Table dataset, Table args, Table
result)
Input Table
Table Column Column Data Type Description Constraint
Data 1st column Integer ID
Other
columns
Integer or double Attribute data
Type Data/
Class Data
1st column Integer ID
2nd column Integer Class type
Parameter Table
Name Data Type Description
VARIABLE_NUM Integer Number of variables
THREAD_NUMBER Integer Number of threads
Output Table
Table Column Column Data Type Description Constraint
Result 1st column VARCHAR/CHAR Name
2nd column Double Measure result
-
8/2/2019 Predictive Analysis Library Manual
12/46
Predictive Analysis Library Reference Manual
SAP AG 2011 12
Example
;#KMEANS.SQL
; # create type for kmeans resultALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";DROPTYPE T_KMEANS_RESULT_ASSIGN_TAB;CREATETYPE T_KMEANS_RESULT_ASSIGN_TABASTABLE("ID" INT,"CENTER_ASSIGN" INT,"ENERGY"DOUBLE);
; # create type for double argumentsDROPTYPE T_SINGLE_COLUMN_DOUBLE_TAB ;CREATETYPE T_SINGLE_COLUMN_DOUBLE_TABASTABLE("VALUE"DOUBLE) ;
; # create type for integer arguments
DROPTYPE T_SINGLE_COLUMN_INT_TAB ;CREATETYPE T_SINGLE_COLUMN_INT_TABASTABLE("VALUE" INT) ;
DROPTYPE KMEANS_DOUBLE_INPUT;CREATETYPE KMEANS_DOUBLE_INPUTASTABLE("ID" INT,"V000"DOUBLE, "V001"DOUBLE,
primarykey("ID"));
DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (
"Name"VARCHAR(50),"intArgs"INTEGER,
"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPTYPE CONTROL_T;CREATETYPE CONTROL_TASTABLE ( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPPROCEDURE KMEANS_WITH_NEWDB;
CREATEPROCEDUREKMEANS_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN control CONTROL_T,
OUT cluster_assignment T_KMEANS_RESULT_ASSIGN_TAB)LANGUAGELLANGASBEGINexport Void main(Table "dataset" datasetTab,Table "control" argsTab,Table"cluster_assignment" & resultTab){pal::kmeans(datasetTab, argsTab, resultTab);}END;
-
8/2/2019 Predictive Analysis Library Manual
13/46
-
8/2/2019 Predictive Analysis Library Manual
14/46
Predictive Analysis Library Reference Manual
SAP AG 2011 14
;#EXPECTED OUTPUT:
#validateKmeans.sqlDROPVIEWKMEANS_TYPE_ASSIGN_TAB_PAL;
CREATEVIEWKMEANS_TYPE_ASSIGN_TAB_PALASSELECT"ID", "CENTER_ASSIGN"AS"TYPE_ASSIGN"FROMKMEANS_RESULT_ASSIGN_TAB_PAL;
DROPTYPE T_KMEANS_TYPE_ASSIGN_TAB_S;CREATETYPE T_KMEANS_TYPE_ASSIGN_TAB_SASTABLE("ID"INTEGER,"TYPE_ASSIGN"INTEGER);
DROPTYPE T_KMEANS_RESULT_SVALUE_TAB;CREATETYPE T_KMEANS_RESULT_SVALUE_TABASTABLE(
"NAME"VARCHAR(50),"S"DOUBLE);
DROPPROCEDURE KMEANSVALIDATE_WITH_NEWDB;
CREATEPROCEDUREKMEANSVALIDATE_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN typeset T_KMEANS_TYPE_ASSIGN_TAB_S,IN control CONTROL_T,OUT sValue T_KMEANS_RESULT_SVALUE_TAB)LANGUAGELLANGASBEGIN
export Void main(Table "dataset" datasetTab,
-
8/2/2019 Predictive Analysis Library Manual
15/46
-
8/2/2019 Predictive Analysis Library Manual
16/46
Predictive Analysis Library Reference Manual
SAP AG 2011 16
C4.5 Decision Tree
A decision tree is used as a classifier for determining an appropriate action or decision among a
predetermined set of actions for a given case. A decision tree helps to effectively identify the factors to
consider and how each factor has historically been associated with different outcomes of the decision.A decision tree is a classifier that uses a tree-like structure of conditions and their possible
consequences. Each node of a decision tree can be a leaf node or a decision node.
Leaf node: identifies the value of the dependent (target) variable.
Decision node: contains one condition that specifies some test on an attribute value. The
outcome of the condition is further divided into branches with subtrees or leaf nodes.
C4.5 is an algorithm used to generate a decision tree. C4.5 builds decision trees from a set of training
data , using the concept of information entropy. The training data is a set S = s1,s2,... of already
classified samples. Each sample si = x1,x2,... is a vector where x1,x2,... represent attributes or
features of the sample. The training data is augmented with a vector C = c1,c2,... where c1,c2,...
represent the class to which each sample belongs. At each node of the tree, C4.5 chooses oneattribute of the data that most effectively splits its set of samples into subsets enriched in one class or
the other. Its criterion is the normalized information gain (difference in entropy) that results from
choosing an attribute for splitting the data. The attribute with the highest normalized information gain is
chosen to make the decision. The C4.5 algorithm then recursively works on the smaller sublists. For
more information, refer tohttp://en.wikipedia.org/wiki/C4.5_algorithm.
The C4.5 decision tree functions implemented in PAL supports both discrete and continuous values. A
continuous attribute is discretised by defining fixed intervals provided by the user. For example, if the
salary ranges from $100 to $20,000, then we can form intervals such as $0 $8,000, $8,000
$18,000, and $18,000 $20,000. An attribute value will fall into any one of these intervals. In the PAL
implementation, Reduced Error Pruning (REP) Algorithm is used as pruning method.
Prerequisites The column order and column number of predicted data should be the same as the order and
number used in tree model building.
The last column of training data is used as a predicted field and its type should be discrete type.
Predicted data set should have an ID column.
Input data does not contain null value, otherwise exceptions will be thrown.
The table used to store the tree model should be a column table.
Interface (createDT)Function: pal::createDT
The createDT function creates a decision tree from input training data.
L Function Signature
pal::createDT(Table training, Table args, Table result)
http://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithm -
8/2/2019 Predictive Analysis Library Manual
17/46
Predictive Analysis Library Reference Manual
SAP AG 2011 17
Input Table
Table Column Column Data Type Description Constraint
Training /
Historical Data
Columns VARCHAR/CHAR,
integer, or double
Table used to
build the predictivetree model
Discrete value: integer and
stringContinuous value: integer and
double
Parameter Table
Name Data Type Description
START_COLUMN Integer The first attribute/column used to make
prediction in the training data set
(column index starts from zero).
END_COLUMN Integer The last attribute/column used to make
prediction in the training data set .This
attribute/column must contain class
information (column index starts from
zero).
PERCENTAGE Double The percentage to be applied to
determine the input training data set.
THREAD_NUMBER Integer Number of threads.
CONTINUOUS_COL (Integer,
Double)
(optional)
Defines which column needs
discretization and the interval provided
by the user (column index starts from
zero).
The integer value specifies the column
position.
The double value specifies the interval.
Output Table
Table Column Column Data Type Description Constraint
Result (treemodel)
1st column CLOB Tree model savedas a JSON string
in the 1st column.
The table should be a columntable; otherwise the CLOB
type is not supported.
Interface (predictWithDT)
Function: pal::predictWithDT
The predictWithDT function is used to perform prediction by using decision trees.
-
8/2/2019 Predictive Analysis Library Manual
18/46
Predictive Analysis Library Reference Manual
SAP AG 2011 18
L Function Signature
pal::predictWithDT(Table predictive, Table args, Table model,
Table result)
Input Table
Table Column Column Data Type Description Constraint
Predicted Data Columns VARCHAR/CHAR or
integer/Double
Data to be classified
(predicted)
An ID column is
mandatory. Its
data type should
be integer.
Predictive Model 1st column CLOB Serialized tree model
Parameter Table
Name Data Type Description
START_COLUMN Integer The first attribute/column used to make
prediction in the input data set (column index
starts from zero).
END_COLUMN Integer The last attribute/column used to make
prediction in the input data set (column index
starts from zero).
ID_COLUMN Integer The column of predicted data used as primary
key/ID. Column index starts from zero (columnindex starts from zero).
THREAD_NUMBER Integer Number of threads
Output Table
Table Column Column Data Type Description Constraint
Result 1st column Integer ID
2nd column VARCHAR/CHAR Predictive result
Example;# DECISION TREE
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE( "ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT,"CLASS"VARCHAR(50));
DROPTYPE MODEL_T;
CREATETYPE MODEL_TASTABLE("Model"CLOB);
-
8/2/2019 Predictive Analysis Library Manual
19/46
-
8/2/2019 Predictive Analysis Library Manual
20/46
Predictive Analysis Library Reference Manual
SAP AG 2011 20
INSERTINTO TESTDT_TABVALUES (4, 'West', 'Spring', 5000, 'Poor');
INSERTINTO TESTDT_TABVALUES (5, 'East', 'Spring', 200000, 'Good');
INSERTINTO TESTDT_TABVALUES (6, 'South', 'Summer', 25000, 'Poor');
INSERTINTO TESTDT_TABVALUES (7, 'South', 'Spring', 10000, 'Average');
INSERTINTO TESTDT_TABVALUES (8, 'North', 'Winter', 50000, 'Average');
DROPTABLE PREDICTIVE_TAB;
CREATECOLUMNTABLE PREDICTIVE_TAB ("ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT);
INSERTINTO PREDICTIVE_TABVALUES (0,'South', 'Autumn', 60000);
INSERTINTO PREDICTIVE_TABVALUES (1,'North', 'Spring', 30000);
INSERTINTO PREDICTIVE_TABVALUES (2,'South', 'Summer', 25000);
INSERTINTO PREDICTIVE_TABVALUES (3,'West', 'Winter', 5000);
DROPTABLE #CONTROL_TAB;
CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));
INSERTINTO #CONTROL_TABVALUES ('START_COLUMN',1,null,null);
INSERTINTO #CONTROL_TABVALUES ('END_COLUMN',4,null,null);
INSERTINTO #CONTROL_TABVALUES ('PERCENTAGE',null,0.71,null);
INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',1,null,null);
INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,25000,null);
INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,60000,null);
DROPTABLE #PRE_CONTROL_TAB;
CREATELOCALTEMPORARYCOLUMNTABLE #PRE_CONTROL_TAB ("Name"VARCHAR(100), "intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));
INSERTINTO #PRE_CONTROL_TABVALUES ('START_COLUMN',1,null,null);
INSERTINTO #PRE_CONTROL_TABVALUES ('END_COLUMN',3,null,null);
INSERTINTO #PRE_CONTROL_TABVALUES ('THREAD_NUMBER',2,null,null);
INSERTINTO #PRE_CONTROL_TABVALUES ('ID_COLUMN',0,null,null);
DROPTABLE RESULTS_TAB;
CREATECOLUMNTABLE RESULTS_TAB ("ID" INT, "CLASS"VARCHAR(50));
DROPTABLE MODEL_TAB;CREATECOLUMNTABLE MODEL_TAB ("Model"CLOB);
CALL palDT1(TESTDT_TAB, "#CONTROL_TAB", MODEL_TAB)with overview;
CALL palDT2(PREDICTIVE_TAB, "#PRE_CONTROL_TAB", MODEL_TAB, RESULTS_TAB)with overview;
SELECT * FROMRESULTS_TAB;
;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
21/46
Predictive Analysis Library Reference Manual
SAP AG 2011 21
-
8/2/2019 Predictive Analysis Library Manual
22/46
Predictive Analysis Library Reference Manual
SAP AG 2011 22
KNN
The k-nearest neighbor algorithm (KNN) is a method for classifying objects based on closest training
examples in the feature space. KNN is amongst the simplest of all machine learning algorithms: an
object is classified by a majority vote of its neighbors, with the object being assigned to the class mostcommon amongst its knearest neighbors (kis a positive integer, typically small). If k = 1, then the
object is simply assigned to the class of its nearest neighbor. The neighbors are taken from a set of
objects for which the correct classification is known.
The training examples are vectors in a multidimensional feature space, each with a class label. The
training phase of the algorithm consists only of storing the feature vectors and class labels of the
training samples. In the classification phase, k is a user-defined constant, and an unlabelled vector (a
query or test point) is classified by assigning the label which is most frequent among the ktraining
samples nearest to that query point. Usually Euclidean distance is used as the distance metric. For
more information, refer tohttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.
The PAL implementation of KNN supports multi-thread and different voting type.
Prerequisites The first column of training data and input data should be ID column. The second column of
training data should be class type. The class type column is of integer type. Other data columns
are of integer or double type.
Input data does not contain null value.
Interface (knn)Function: pal::knn
This is a classification function using the KNN algorithm.
L Function Signature
pal::knn(Table value, Table classvalue, Table args,
Table result)
Input Table
Table Column Column Data Type Description Constraint
Training Data 1st column Integer ID
2nd column Integer Class type
Other columns Integer or double Attribute data
Class Data 1st column Integer ID
Other columns Integer or double Attribute data
Parameter Table
Name Data Type Description
K_NEAREST_NEIGHBOURS Integer Number of nearest neighbors (k)
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm -
8/2/2019 Predictive Analysis Library Manual
23/46
Predictive Analysis Library Reference Manual
SAP AG 2011 23
Name Data Type Description
ATTRIBUTE_NUM Integer Number of attributes
VOTING_TYPE Integer Voting type:
0 = majority voting
1 = distance-weighted voting
THREAD_NUMBER Integer Number of threads
Output Table
Table Column Column Data Type Description Constraint
Result 1st column Integer ID
2nd column Integer or double class type
Example
;# knn.sql
ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);
DROPTYPE CLASSDATA_T;
CREATETYPE CLASSDATA_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);
DROPTYPE RESULT_T;
CREATETYPE RESULT_TASTABLE("ID" INT,"Type" INT);
DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (
"Name"VARCHAR(50),
"intArgs"INTEGER,
"doubleArgs"DOUBLE,
"stringArgs"VARCHAR(100));
DROPTYPE CONTROL_T;
CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPPROCEDURE palKNN;
-
8/2/2019 Predictive Analysis Library Manual
24/46
Predictive Analysis Library Reference Manual
SAP AG 2011 24
CREATEPROCEDURE palKNN( IN data DATA_T, IN classdata CLASSDATA_T, INcontrol CONTROL_T, OUT results RESULT_T )
LANGUAGELLANGAS
BEGIN
export Void main(Table"data" dataTab,
Table "classdata" classdataTab,
Table "control" argsTab,
Table "results" & resultsTab) {
pal::knn(dataTab, classdataTab, argsTab, resultsTab);
}
END;
DROPTABLE DATA_TAB;
CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);
INSERTINTO DATA_TABVALUES (0,2,1,1);
INSERTINTO DATA_TABVALUES (1,3,10,10);
INSERTINTO DATA_TABVALUES (2,3,10,11);
INSERTINTO DATA_TABVALUES (3,3,10,10);
INSERTINTO DATA_TABVALUES (4,1,1000,1000);
INSERTINTO DATA_TABVALUES (5,1,1000,1001);
INSERTINTO DATA_TABVALUES (6,1,1000,999);
INSERTINTO DATA_TABVALUES (7,1,999,999);
INSERTINTO DATA_TABVALUES (8,1,999,1000);
INSERTINTO DATA_TABVALUES (9,1,1000,1000);
DROPTABLE CLASSDATA_TAB;
CREATECOLUMNTABLE CLASSDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);
INSERTINTO CLASSDATA_TABVALUES (0,2,1);
INSERTINTO CLASSDATA_TABVALUES (1,9,10);
INSERTINTO CLASSDATA_TABVALUES (2,9,11);
INSERTINTO CLASSDATA_TABVALUES (3,15000,15000);
INSERTINTO CLASSDATA_TABVALUES (4,1000,1000);
INSERTINTO CLASSDATA_TABVALUES (5,500,1001);
INSERTINTO CLASSDATA_TABVALUES (6,500,999);INSERTINTO CLASSDATA_TABVALUES (7,199,999);
TRUNCATETABLE #CONTROL_TAB;
INSERTINTO #CONTROL_TABVALUES ('K_NEAREST_NEIGHBOURS',3,null,null);
INSERTINTO #CONTROL_TABVALUES ('ATTRIBUTE_NUM',2,null,null);
INSERTINTO #CONTROL_TABVALUES ('VOTING_TYPE',0,null,null);
INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);
DROPTABLE RESULTS_TAB;
CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Type" INT);
-
8/2/2019 Predictive Analysis Library Manual
25/46
Predictive Analysis Library Reference Manual
SAP AG 2011 25
CALL palKNN(DATA_TAB, CLASSDATA_TAB, "#CONTROL_TAB", RESULTS_TAB)withoverview;
SELECT * FROMRESULTS_TAB;
;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
26/46
Predictive Analysis Library Reference Manual
SAP AG 2011 26
Multiple Linear Regression
Linear regression is an approach to model the relationship between a scalar variable yand one or
more variables denoted X. In linear regression, data are modeled using linear functions, and unknown
model parameters are estimated from the data. Such models are called linear models.For more information, refer tohttp://en.wikipedia.org/wiki/Linear_regression.
In PAL, the implementation of linear regression is to solve the equation:
Ax=Y
Where A is MxN matrix, x is Nx1 matrix, and Y is Mx1 matrix.
Then, x= reverse(A)*Y
And it can be transformed into
x= reverse(transpose(A)*A) *transpose(A) *Y
The implementation also supports calculating the F value and R^2 determining statistical significance.
Prerequisites No missing or null data in inputs.
Data is numeric, not categorical.
Given the structure as Y and X1...Xn, there must be more than n+1 records available for
analysis.
Interface (linearRegression)
Function: pal::linearRegression
This is a multiple linear regression function.
L Function Signature
pal::linearRegression( Table data, Table args, Table result,
Table fitted, Table significance)
Input Table
Table Column Column Data Type Description Constraint
Data 1st column Integer ID
2nd column Integer or double Variable y
Other columns Integer or double Variable Xn
Parameter Table
Name Data Type Description
VARIABLE_NUM Integer Number of variable X
THREAD_NUMBER Integer Number of threads
http://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regression -
8/2/2019 Predictive Analysis Library Manual
27/46
Predictive Analysis Library Reference Manual
SAP AG 2011 27
Output Table
Table Column Column Data Type Description Constraint
Result 1st column Integer ID
2nd column Integer or double Value Ai (A0 is
the intercept; A1 is
the beta
coefficient for X1,
A2 is the beta
coefficient for X2
etc )
Fitted Data 1st column Integer ID
2nd column Integer or double Value Yi
Significance 1st column VARCHAR/CHAR Name (R^2 / F)
2nd column Double Value
Interface (forecastWithLR)
Function: pal::forecastWithLR
This function is used to perform predication with linear regression result.
L Function Signature
pal::forecastWithLR( Table predictdata, Table coefficient,
Table args, Table result)
Input Table
Table Column Column Data Type Description Constraint
Predictive Data 1st column Integer ID
Other columns Integer or double Variable Xn
Coefficient 1st column Integer ID
2nd column Integer or double Value Ai
Parameter Table
Name Data Type Description
VARIABLE_NUM Integer Number of variable X
THREAD_NUMBER Integer Number of threads
-
8/2/2019 Predictive Analysis Library Manual
28/46
Predictive Analysis Library Reference Manual
SAP AG 2011 28
Output Table
Table Column Column Data Type Description Constraint
Fitted Result 1st column Integer ID
2nd column Integer or double Value Yi
Example
;# linearRegression
ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);
DROPTYPE RESULT_T;
CREATETYPE RESULT_TASTABLE("ID" INT,"Ai"DOUBLE);
DROPTYPE FITTED_T;
CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);
DROPTYPE SIGNIFICANCE_T;
CREATETYPE SIGNIFICANCE_TASTABLE("NAME"varchar(50),"VALUE"DOUBLE);
DROPTYPE CONTROL_T;
CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPPROCEDURE palLR;
CREATEPROCEDURE palLR( IN data DATA_T, IN control CONTROL_T, OUTresults RESULT_T, OUT fittedValue FITTED_T,OUT significanceSIGNIFICANCE_T )
LANGUAGELLANG
AS
BEGIN
export Void main(Table"data" dataTab,
Table "control" argsTab,
Table "results" & resultsTab,
Table "fittedValue" & fittedTab,
Table "significance" &significanceTab) {
pal::linearRegression(dataTab, argsTab, resultsTab,fittedTab,significanceTab);
}
END;
-
8/2/2019 Predictive Analysis Library Manual
29/46
Predictive Analysis Library Reference Manual
SAP AG 2011 29
DROPTABLE DATA_TAB;
CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);
INSERTINTO DATA_TABVALUES (0,0.5,0.13,0.33);
INSERTINTO DATA_TABVALUES (1,0.15,0.14,0.34);INSERTINTO DATA_TABVALUES (2,0.25,0.15,0.36);
INSERTINTO DATA_TABVALUES (3,0.35,0.16,0.35);
INSERTINTO DATA_TABVALUES (4,0.45,0.17,0.37);
INSERTINTO DATA_TABVALUES (5,0.55,0.18,0.38);
INSERTINTO DATA_TABVALUES (6,0.65,0.19,0.39);
INSERTINTO DATA_TABVALUES (7,0.75,0.19,0.31);
INSERTINTO DATA_TABVALUES (8,0.85,0.11,0.32);
INSERTINTO DATA_TABVALUES (9,0.95,0.12,0.33);
DROPTABLE #CONTROL_TAB;
CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ( "Name"VARCHAR(50),
"intArgs"INTEGER,
"doubleArgs"DOUBLE,
"stringArgs"VARCHAR(100));
INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);
INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);
DROPTABLE RESULTS_TAB;
CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Ai"DOUBLE);
DROPTABLE FITTED_TAB;
CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);
DROPTABLE SIGNIFICANCE_TAB;
CREATECOLUMNTABLE SIGNIFICANCE_TAB ("NAME"varchar(50),"VALUE"DOUBLE);
CALL palLR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB,SIGNIFICANCE_TAB)with overview;
SELECT * FROMRESULTS_TAB;
SELECT * FROMFITTED_TAB;
SELECT * FROMSIGNIFICANCE_TAB;
;#EXPECTED OUTPUT:RESULTS_TAB:
FITTED_TAB:
-
8/2/2019 Predictive Analysis Library Manual
30/46
Predictive Analysis Library Reference Manual
SAP AG 2011 30
SIGNIFICANCE_TAB:
;# forecastWithLR
ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";
DROPTYPE PREDICT_T;
CREATETYPE PREDICT_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);
DROPTYPE COEFFICIENT_T;CREATETYPE COEFFICIENT_TASTABLE("ID" INT,"Ai"DOUBLE);
DROPTYPE FITTED_T;
CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);
DROPTYPE CONTROL_T;
CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPPROCEDURE palForecastWithLR;
CREATEPROCEDURE palForecastWithLR( IN predictData PREDICT_T, INcoefficient COEFFICIENT_T, IN control CONTROL_T, OUT fittedValueFITTED_T )
LANGUAGELLANG
AS
BEGIN
export Void main(
Table "predictData"predictDataTab,
Table "coefficient" coefficientTab,
Table "control" argsTab,
-
8/2/2019 Predictive Analysis Library Manual
31/46
Predictive Analysis Library Reference Manual
SAP AG 2011 31
Table "fittedValue" & fittedTab) {
pal::forecastWithLR(predictDataTab, coefficientTab, argsTab,fittedTab);
}
END;
DROPTABLE PREDICTDATA_TAB;
CREATECOLUMNTABLE PREDICTDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);
INSERTINTO PREDICTDATA_TABVALUES (0,0.5,0.3);
INSERTINTO PREDICTDATA_TABVALUES (1,4,0.4);
INSERTINTO PREDICTDATA_TABVALUES (2,0,1.6);
INSERTINTO PREDICTDATA_TABVALUES (3,0.3,0.45);
INSERTINTO PREDICTDATA_TABVALUES (4,0.4,1.7);
DROPTABLE #CONTROL_TAB;
CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (
"Name"VARCHAR(50),
"intArgs"INTEGER,
"doubleArgs"DOUBLE,
"stringArgs"VARCHAR(100));
INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);
INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);
DROPTABLE COEEFICIENT_TAB;CREATECOLUMNTABLE COEEFICIENT_TAB ("ID" INT,"Ai"DOUBLE);
INSERTINTO COEEFICIENT_TABVALUES (0,1.7120914258645001);
INSERTINTO COEEFICIENT_TABVALUES (1,0.2652771198483208);
INSERTINTO COEEFICIENT_TABVALUES (2,-3.471103742302148);
DROPTABLE FITTED_TAB;
CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);
CALL palForecastWithLR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB",FITTED_TAB)with overview;
SELECT * FROMFITTED_TAB;
;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
32/46
Predictive Analysis Library Reference Manual
SAP AG 2011 32
Apriori
Apriori is a classic predictive analysis algorithm for finding association rules in association analysis.
Association analysis uncovers the hidden patterns, correlations or casual structures among a set ofitems or objects. For example, association analysis is used to understand what products and services
customers tend to purchase at the same time. By analyzing the purchasing trends of customers with
association analysis, then prediction of their future behavior may be made.
Apriori is designed to operate on databases containing transactions. As is common in association rule
mining, given a set of items, the algorithm attempts to find subsets which are common to at least a
minimum number of the item sets. Apriori uses a bottom up approach, where frequent subsets are
extended one item at a time, a step known as candidate generation, and groups of candidates are
tested against the data. The algorithm terminates when no further successful extensions are found.
Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It
generates candidate item sets of length kfrom item sets of length k-1. Then it prunes the candidates
which have an infrequent sub pattern. The candidate set contains all frequent k-length item sets. Afterthat, it scans the transaction database to determine frequent item sets among the candidates. For
more information, refer tohttp://en.wikipedia.org/wiki/Apriori_algorithm.
The Apriori function in PAL uses vertical data format to store the transaction data in memory. The
function can take string or integer transaction ID and item ID as input. It supports the output of
confidence, support, and lift value, but does not limit the number of output rules. However, you can
use SQL Script to select the number of output rules, for example:
SELECT TOP 2000 FROM RULE_RESULTS where lift >0.5
PrerequisiteInput data does not contain null value.
Interface (aprioriRule)Function: pal::aprioriRule
This function reads input transactions data and generates association rules by the Apriori algorithm.
L Function Signature
pal::aprioriRule( Table dataset, Table args, Table result)
Input Table
Table Column Column Data Type Description Constraint
Dataset/Historical Data Transaction ID
column
Integer or
VARCHAR/CHAR
Transaction ID
Item column Integer or
VARCHAR/CHAR
Item ID
http://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithm -
8/2/2019 Predictive Analysis Library Manual
33/46
Predictive Analysis Library Reference Manual
SAP AG 2011 33
Parameter Table
Name Data Type Description
MIN_SUPPORT Double User-specified minimum support (actual
value).
MIN_CONFIDENCE Double User-specified minimum confidence (actual
value).
TID_COLUMN Integer Indicates which column stores the
transaction ID (column index starts from
zero).
ITEM_COLUMN Integer Indicates which column stores the items ID
(column index starts from zero).
THREAD_NUMBER Integer Number of threads.
Output Table
Table Column Column Data Type Description Constraint
Result 1st column VARCHAR/CHAR Leading items
2nd column VARCHAR/CHAR Dependent items
3rd column Double Support value
4th column Double Confidence value
5th column Double Lift value
Example;#aprioriRule
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE( "ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));
DROPTYPE RESULT_T;
CREATETYPE RESULT_TASTABLE("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500),"SUPPORT"DOUBLE, "CONFIDENCE"DOUBLE,"LIFT"DOUBLE);
DROPTABLE #CONTROL_TAB;
CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB(
"Name"VARCHAR(50),
"intArgs"INTEGER,
"doubleArgs"DOUBLE,
"stringArgs"VARCHAR(100));
DROPTYPE CONTROL_T;
-
8/2/2019 Predictive Analysis Library Manual
34/46
Predictive Analysis Library Reference Manual
SAP AG 2011 34
CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));
DROPPROCEDURE palapprioriRule;
CREATEPROCEDURE palapprioriRule( IN data DATA_T, IN control CONTROL_T,OUT results RESULT_T )
LANGUAGELLANG
AS
BEGIN
export Void main(Table "data"dataTab,
Table "control" argsTab,
Table "results" & resultsTab) {
pal::aprioriRule(dataTab, argsTab, resultsTab);
}
END;
DROPTABLE TESTASSOCIATION_TAB;
CREATECOLUMNTABLE TESTASSOCIATION_TAB("ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));
INSERTINTO TESTASSOCIATION_TABVALUES (0, 0, 'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (1, 0, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (2, 0, 'item5'); #transacion T0:I1,I2,I5
INSERTINTO TESTASSOCIATION_TABVALUES (3, 1, 'item2');INSERTINTO TESTASSOCIATION_TABVALUES (4, 1, 'item4'); #transacion T1:I2,I4
INSERTINTO TESTASSOCIATION_TABVALUES (5, 2, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (6, 2, 'item3');
INSERTINTO TESTASSOCIATION_TABVALUES (7, 3, 'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (8, 3, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (9, 3, 'item4');
INSERTINTO TESTASSOCIATION_TABVALUES (10, 4,'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (11, 4,'item3');
INSERTINTO TESTASSOCIATION_TABVALUES (12, 5, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (13, 5, 'item3');
INSERTINTO TESTASSOCIATION_TABVALUES (14, 6, 'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (15, 6, 'item3');
INSERTINTO TESTASSOCIATION_TABVALUES (16, 7, 'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (18, 7, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (19, 7, 'item3');
INSERTINTO TESTASSOCIATION_TABVALUES (20, 7, 'item5');
INSERTINTO TESTASSOCIATION_TABVALUES (21, 8, 'item1');
INSERTINTO TESTASSOCIATION_TABVALUES (22, 8, 'item2');
INSERTINTO TESTASSOCIATION_TABVALUES (23, 8, 'item3');
DROPTABLE RESULTS_TAB;
-
8/2/2019 Predictive Analysis Library Manual
35/46
Predictive Analysis Library Reference Manual
SAP AG 2011 35
CREATECOLUMNTABLE RESULTS_TAB ("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500), "SUPPORT"Double, "CONFIDENCE"Double,"LIFT"DOUBLE);
TRUNCATETABLE #CONTROL_TAB;
INSERTINTO #CONTROL_TABVALUES ('TID_COLUMN',1,null,null);INSERTINTO #CONTROL_TABVALUES ('ITEM_COLUMN',2,null,null);
INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',4,null,null);
INSERTINTO #CONTROL_TABVALUES ('MIN_SUPPORT',null,0.2,null);
INSERTINTO #CONTROL_TABVALUES ('MIN_CONFIDENCE',null,0.2,null);
TRUNCATETABLE RESULTS_TAB;
CALL palapprioriRule(TESTASSOCIATION_TAB, "#CONTROL_TAB", RESULTS_TAB)with overview;
SELECT * FROMRESULTS_TAB;
;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
36/46
Predictive Analysis Library Reference Manual
SAP AG 2011 36
ABC Classification
ABC Classification is used to classify objects, such as customers, employees, or products, based on a
particular measure, such as revenue or profit. ABC analysis suggests that inventories of anorganization are not of equal value. Thus, the inventories are grouped into three categories (A, B, and
C) by their estimated importance. A items are very important for an organization. B items are
important, but less important than A items and more important than C items. Therefore, B items
are of medium importance, and C items are marginally important.
An example of ABC classification is as follows:
A items 20% of the items account for 70% of the annual consumption value of all items.
B items 30% of the items account for 25% of the annual consumption value of all items.
C items 50% of the items account for 5% of the annual consumption value of all items.
For more information, refer tohttp://en.wikipedia.org/wiki/ABC_analysis.
Prerequisite
Input data cannot contain null value.
The item names in the Input table must be of string data type and be unique.
Interface (abcAnalysis)
Function: pal::abcAnalysis
This function performs the ABC analysis algorithm.
L Function Signature
pal::abcAnalysis ( Table target, Table args, Table result)
Input Table
Table Column Column Data Type Description Constraint
Target Data 1st column VARCHAR/CHAR Item name
2nd column Double Value
Parameter Table
Name Data Type Description
START_COLUMN Integer The first column used to do the
classification (column index starts from
zero).
END_COLUMN Integer The last column used to do the classification
(column index starts from zero).
THREAD_NUMBER Integer Number of threadsPERCENT_A Double Interval for A class
http://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysis -
8/2/2019 Predictive Analysis Library Manual
37/46
Predictive Analysis Library Reference Manual
SAP AG 2011 37
Name Data Type Description
PERCENT_B Double Interval for B class
PERCENT_C Double Interval for C class
Output Table
Table Column Column Data Type Description Constraint
Result 1st column VARCHAR/CHAR ABC class
2nd column VARCHAR/CHAR Items
Example
;# ABC
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE("ITEM"VARCHAR(100),"VALUE"DOUBLE);
DROPTYPE CONTROL_T;
CREATETYPE CONTROL_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE,"strArgs"VARCHAR(100));
DROPTYPE RESULT_T;
CREATETYPE RESULT_TASTABLE("ABC"VARCHAR(10),"ITEM"VARCHAR(100));
DROPPROCEDURE palAbcAnalysis;
CREATEPROCEDURE palAbcAnalysis( IN target DATA_T, IN control CONTROL_T,OUT results RESULT_T )
LANGUAGELLANG
AS
BEGIN
export Void main(Table "target" targetTab,
Table "control" controlTab,
Table "results" & resultsTab){
pal::abcAnalysis(targetTab, controlTab, resultsTab);
}
END;
DROPTABLE #CONTROL_TBL;
CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE,"strArgs"VARCHAR(100));
INSERTINTO #CONTROL_TBLVALUES ('START_COLUMN',0,null,NULL);
INSERTINTO #CONTROL_TBLVALUES ('END_COLUMN',1,null,null);
INSERTINTO #CONTROL_TBLVALUES ('THREAD_NUMBER',2,null,null);
INSERTINTO #CONTROL_TBLVALUES ('PERCENT_A',null,0.7,null);
-
8/2/2019 Predictive Analysis Library Manual
38/46
Predictive Analysis Library Reference Manual
SAP AG 2011 38
INSERTINTO #CONTROL_TBLVALUES ('PERCENT_B',null,0.2,null);
INSERTINTO #CONTROL_TBLVALUES ('PERCENT_C',null,0.1,null);
DROPTABLE TESTABCTAB;
CREATECOLUMNTABLE TESTABCTAB("ITEM"VARCHAR(100),"VALUE"DOUBLE);INSERTINTO TESTABCTABVALUES ('item1', 15.4);
INSERTINTO TESTABCTABVALUES ('item2', 200.4);
INSERTINTO TESTABCTABVALUES ('item3', 280.4);
INSERTINTO TESTABCTABVALUES ('item4', 100.9);#100.9
INSERTINTO TESTABCTABVALUES ('item5', 40.4);
INSERTINTO TESTABCTABVALUES ('item6', 25.6);
INSERTINTO TESTABCTABVALUES ('item7', 18.4);
INSERTINTO TESTABCTABVALUES ('item8', 10.5);
INSERTINTO TESTABCTABVALUES ('item9', 96.15);
INSERTINTO
TESTABCTABVALUES
('item10', 9.4);
DROPTABLE RESULT_TBL;
CREATECOLUMNTABLE RESULT_TBL("ABC"VARCHAR(10),"ITEM"VARCHAR(100));
CALL palAbcAnalysis(TESTABCTAB, "#CONTROL_TBL", RESULT_TBL)withoverview;
SELECT * FROMRESULT_TBL;
;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
39/46
Predictive Analysis Library Reference Manual
SAP AG 2011 39
Weighted Score Table
A weighted score table is a method of evaluating alternatives when the importance of each criterion
differs. In a weighted score table, each alternative is given a score for each criterion. These scores are
then weighted by the importance of each criterion. All of an alternative's weighted scores are thenadded together to calculate that alternative's total weighted score. The alternative with the highest total
score should be the best alternative. Weighted score tables can be used to make predictions about
future customer behavior. A model based on historical data in a data mining application may be
applied to new data to make prediction. The prediction, that is, the output of the model, is also called a
Score. A single score for customers can be calculated by taking into account different dimensions.
A function defined by weighted score tables is a linear combination of functions of a variable.
f(x1,,xn) = w1f1(x1) + + wnfn(xn)
Prerequisites
Input data cannot contain null value.
The column of the Map Function table should be sorted by the attribute order of the Input Data
table.
Interface (weightedTable)
Function: pal::weightedTable
This function performs weighted table calculation. It is similar to the Volume Driver function in the
Business Function Library (BFL). Volume Driver calculates only one column, but weightedTable
calculates multiple columns at the same time.
L Function Signature
pal::weightedTable ( Table target, Table mapfun, Table
control, Table args, Table result)
Input Table
Table Column Column Data Type Description Constraint
Target/Input
Data
Columns VARCHAR/CHAR,
integer, or double
Specifies which
will be used to
calculate the
scores
Discrete value: integer, string,
double;
Continuous value: integer,
double;
An ID column is mandatory. Its
data type should be integer.
Map Function Columns VARCHAR/CHAR,
integer, or double
Creates the map
function
Every attribute (except ID) in the
Input Data table maps to two
columns in the Map Function
table: Key column and Value
column. The Value column must
be double type.
-
8/2/2019 Predictive Analysis Library Manual
40/46
Predictive Analysis Library Reference Manual
SAP AG 2011 40
Table Column Column Data Type Description Constraint
Control Columns Integer or double This table has three columns.
When the Input Data table has n
attributes (except ID), the WeightTable will have nrows.
Output Table
Table Column Column Data Type Description Constraint
Result 1st column Integer ID
2nd column Double Result value
Parameter Table
Name Data Type Description
START_COLUMN Integer The first column used to do the calculation
(column index starts from zero).
END_COLUMN Integer The last column used to do the calculation
(column index starts from zero).
THREAD_NUMBER Integer Number of threads
ID_COLUMN Integer Specifies the ID column (column index
starts from zero).
Example
;#weightedTable
DROPTYPE DATA_T;
CREATETYPE DATA_TASTABLE( "ID" INT,"GENDER"VARCHAR(10),"INCOME"INT,"HEIGHT"DOUBLE);
DROPTYPE MAP_FUN_T;
CREATETYPE MAP_FUN_TASTABLE("GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);
DROPTYPE CONTROL_T;
CREATETYPE CONTROL_TASTABLE( "WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);
DROPTYPE PARAMETERS_T;
CREATETYPE PARAMETERS_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE);
-
8/2/2019 Predictive Analysis Library Manual
41/46
Predictive Analysis Library Reference Manual
SAP AG 2011 41
DROPTYPE RESULT_T;
CREATETYPE RESULT_TASTABLE("ID" INT,"result"DOUBLE);
DROPPROCEDURE palWeightTable;
CREATEPROCEDURE palWeightTable( IN target DATA_T, IN mapfun MAP_FUN_T,IN control CONTROL_T, IN parameters PARAMETERS_T,OUT results RESULT_T )
LANGUAGELLANG
AS
BEGIN
export Void main(Table "target" targetTab,
Table "mapfun" mapfunTab,
Table"control" controlTab,
Table"parameters" parametersTab,
Table "results" & resultsTab) {
pal::weightedTable(targetTab, mapfunTab, controlTab, parametersTab,resultsTab);
}
END;
DROPTABLE TESTTARGET_TBL;
CREATECOLUMNTABLE TESTTARGET_TBL ("ID" INT,"GENDER"VARCHAR(10),"INCOME" INT,"HEIGHT"DOUBLE);
INSERTINTO TESTTARGET_TBLVALUES (0,'male',5000,1.73);
INSERTINTO TESTTARGET_TBLVALUES (1,'male',9000,1.80);
INSERTINTO TESTTARGET_TBLVALUES (2,'female',6000,1.55);
INSERTINTO TESTTARGET_TBLVALUES (3,'male',15000,1.65);
INSERTINTO TESTTARGET_TBLVALUES (4,'female',2000,1.70);
INSERTINTO TESTTARGET_TBLVALUES (5,'female',12000,1.65);
INSERTINTO TESTTARGET_TBLVALUES (6,'male',1000,1.65);
INSERTINTO TESTTARGET_TBLVALUES (7,'male',8000,1.60);
INSERTINTO TESTTARGET_TBLVALUES (8,'female',5500,1.85);#5500
INSERTINTO TESTTARGET_TBLVALUES (9,'female',9500,1.85);
DROPTABLE MAP_FUN_TBL;
CREATECOLUMNTABLE MAP_FUN_TBL ( "GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);
INSERTINTO MAP_FUN_TBLVALUES ('male',2.0, 0,0.0, 1.5,0.0);
INSERTINTO MAP_FUN_TBLVALUES ('female',1.5, 5500,1.0, 1.6,1.0);
INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 9000,2.0, 1.71,2.0);
INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 12000,3.0, 1.80,3.0);
DROPTABLE CONTROL_TBL;
-
8/2/2019 Predictive Analysis Library Manual
42/46
Predictive Analysis Library Reference Manual
SAP AG 2011 42
CREATECOLUMNTABLE CONTROL_TBL ("WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);
INSERTINTO CONTROL_TBLVALUES (0.5,1,2);
INSERTINTO CONTROL_TBLVALUES (2.0,-1,4);
INSERTINTO CONTROL_TBLVALUES (1.0,-1,4);
DROPTABLE #PARAMETERS_TBL;
CREATELOCALTEMPORARYCOLUMNTABLE #PARAMETERS_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE);
INSERTINTO #PARAMETERS_TBLVALUES ('ID_COLUMN',0,null);
INSERTINTO #PARAMETERS_TBLVALUES ('START_COLUMN',1,null);
INSERTINTO #PARAMETERS_TBLVALUES ('END_COLUMN',3,null);
INSERTINTO #PARAMETERS_TBLVALUES ('THREAD_NUMBER',2,null);
DROPTABLE RESULT_TBL;
CREATECOLUMNTABLE RESULT_TBL("ID" INT,"result"DOUBLE);
CALL palWeightTable(TESTTARGET_TBL, MAP_FUN_TBL, CONTROL_TBL,"#PARAMETERS_TBL", RESULT_TBL)with overview;
SELECT * FROMRESULT_TBL;;#EXPECTED OUTPUT:
-
8/2/2019 Predictive Analysis Library Manual
43/46
Predictive Analysis Library Reference Manual
SAP AG 2011 43
Log and TraceTo learn about the details of PAL function implementation or trace a problem occurred during function
runtime, you can check the log information in SAP HANA Database Administration.
Procedure
To open a log file:
1. In SAP HANA Database Administration, choose TraceLevels. Then click Add Componentin
the Trace Levelsdialog box, enter the PAL function name, and click OK. The name should beentered as PAL_, for example, PAL_APRIORIRULE.
-
8/2/2019 Predictive Analysis Library Manual
44/46
Predictive Analysis Library Reference Manual
SAP AG 2011 44
2. Select a log type for the newly added PAL component and click OK, as shown below.
3. Right-click on indexserver_ and select Show File.
-
8/2/2019 Predictive Analysis Library Manual
45/46
Predictive Analysis Library Reference Manual
SAP AG 2011 45
Result
The log file for the specified PAL function is displayed. You can use the Findbutton to search for the
log information that you need.
-
8/2/2019 Predictive Analysis Library Manual
46/46
Predictive Analysis Library Reference Manual
Copyrights Copyright 2011 SAP AG. All rights reserved.