predictive analysis library manual

Upload: abhilashcs

Post on 05-Apr-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 Predictive Analysis Library Manual

    1/46

  • 8/2/2019 Predictive Analysis Library Manual

    2/46

    SAP AG 2011 2

    Table of Contents

    Overview .......................................................................................................................................... 4PAL Common Interface .................................................................................................................. 5

    Input Data Table .................................................................................................................... 5Parameter Table .................................................................................................................... 6

    Specifying the ID Column ............................................................................................. 6Output Data Table ................................................................................................................. 7

    List of PAL Algorithms ................................................................................................................... 8PAL Algorithm Descriptions.......................................................................................................... 9

    K-means ................................................................................................................................ 9Prerequisites ................................................................................................................. 9Interface (kmeans) ........................................................................................................ 9Interface (validateKmeans)......................................................................................... 11Example ...................................................................................................................... 12

    C4.5 Decision Tree .............................................................................................................. 16Prerequisites ............................................................................................................... 16Interface (createDT) ................................................................................................... 16Interface (predictWithDT) ........................................................................................... 17Example ...................................................................................................................... 18

    KNN .................................................................................................................................... 22

    Prerequisites ............................................................................................................... 22Interface (knn) ............................................................................................................ 22Example ...................................................................................................................... 23

    Multiple Linear Regression .................................................................................................. 26Prerequisites ............................................................................................................... 26Interface (linearRegression) ....................................................................................... 26Interface (forecastWithLR).......................................................................................... 27Example ...................................................................................................................... 28

    Apriori .................................................................................................................................. 32Prerequisite ................................................................................................................ 32Interface (aprioriRule) ................................................................................................. 32Example ...................................................................................................................... 33

    ABC Classification ............................................................................................................... 36Prerequisite ................................................................................................................ 36Interface (abcAnalysis) ............................................................................................... 36Example ...................................................................................................................... 37

    Weighted Score Table ......................................................................................................... 39Prerequisites ............................................................................................................... 39Interface (weightedTable) ........................................................................................... 39

  • 8/2/2019 Predictive Analysis Library Manual

    3/46

    SAP AG 2011 3

    Example ...................................................................................................................... 40Log and Trace ............................................................................................................................... 43Copyrights ..................................................................................................................................... 46

  • 8/2/2019 Predictive Analysis Library Manual

    4/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 4

    OverviewSince the SAP HANA 1.0 GA, SQL Script v2 can be used to express application logic within the

    database that exceeds the capabilities of pure SQL. With enhanced control flow capabilities, SQL

    Script v2 is more suitable for pushing complex application logic to the SAP HANA database.

    When designing HANA applications, procedures are the main programmable containers. However, it is

    difficult and often impossible to describe predictive analysis logic with procedures. For example, an

    application may need to perform a cluster analysis in a huge customer table with a terra byte of data. It

    is impossible to implement the analysis in a procedure even with the simple classic K-means algorithm.

    Obviously, it is not a wise decision to copy a huge table to the application server to perform the K-

    means calculation, because data-copying will be slow and is not necessary.

    SAP provides the Predictive Analysis Library (PAL) to offer you the flexibility and efficiency to develop

    HANA applications requiring predictive functionality. In the above case, using the PAL is the best

    choice. PAL algorithms can be called directly in L wrapper within SQL Script v2. The inputs and

    outputs are all tables. Currently, PAL includes seven well known predictive analysis algorithms in threedata mining algorithm categories:

    Cluster analysis

    Classification analysis

    Association analysis

    The functions in PAL are predefined. More functions will be supported in future releases. The seven

    algorithms included in PAL were carefully selected based on the following criteria:

    1. These algorithms are required for SAP HANA applications.

    2. Market surveys (e.g. Rexer Analytics and KDnuggets polls) show that these algorithms are most

    commonly used.

    3. These are the most common algorithms available in database products from other vendors in

    the marketplace, such as Microsoft SQL Server, Oracle, and IBM DB2.

  • 8/2/2019 Predictive Analysis Library Manual

    5/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 5

    PAL Common InterfaceThe Predictive Analysis Library uses tables as the data interface. Three types of tables are used for

    each PAL function:

    Input data table Supplies input data to PAL functions. Each function can have one or more

    input data tables.

    Parameter table Supplies parameters to PAL functions. Parameters are special inputs that

    control what the functions do. Each function has only one parameter table.

    Output data table Holds the output results calculated by PAL functions. Each function can

    have one or more output data tables.

    The structures of the above tables are pre-defined and cannot be changed. Table names and column

    names can be specified by the users.

    Input Data Table

    Input data tables contain the input data required to perform predictive analysis, sometimes described

    as the historical and training data used to build predictive models.

    For some cluster and classification functions, primary keys/IDs are required in the input data table, so

    that the output data table can contain the same primary key/ID column. Usually the ID column is the

    first column of the input data table. For some PAL functions, you can specify which column should be

    the ID column. SeeSpecifying the ID Columnfor details.

    A typical input data table looks like the following:

    ID X1 X2 X3

    0 0.1 0.1 0.1

    1 0.1 0 0.1

    2 10.1 10.1 0.1

    3 10.1 9.9 0.1

    4 10.1 10.1 0.1

    5 1000.1 1000.1 0.1

    6 999.1 1000.1 0.1

    7 0.1 10000.1 0.18 0.1 9999.1 0.1

    9 11.1 10.1 0.1

    10 0.001 0.001 0.1

  • 8/2/2019 Predictive Analysis Library Manual

    6/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 6

    Parameter Table

    PAL functions use parameter tables to transfer parameter values. Each PAL function has its own

    parameter table. To avoid the conflict of table names when several users call PAL functions at thesame time, the parameter table must be created as a local temporary column table, so that each

    parameter table has its unique scope per session. The table structure is defined as below. Each row

    contains only one parameter value.

    Column Name Data Type Description

    Name VARCHAR/CHAR Parameter name

    intArgs Integer Integer parameter value

    doubleArgs Double Double parameter value

    stringArgs VARCHAR/CHAR String parameter value

    The following shows an example of a parameter table with three parameters. The first parameter,

    THREAD_NUMBER, is an integer parameter. Thus, in the THREAD_NUMBER row, you should fill the

    parameter value in the intArgs column, and leave the doubleArgs and stringArgs columns blank.

    Name intArgs doubleArgs stringArgs

    THREAD_NUMBER 1

    SUPPORT 0.2

    VAR_NAME hello

    Specifying the ID Column

    Some cluster and classification functions include an ID column in the input data table. Usually the ID

    column is the first column of the input data table. For some PAL functions, you can specify a column

    number for the ID_COLUMN parameter in the parameter table to tell the function which column should

    be the ID column. Column number starts from 0, so the first column is column 0, the second column 1,

    and so on. For example, if you want to specify the first column as the ID column, the parameter table

    should contain the following row:

    Name intArgs doubleArgs stringArgs

    ID_COLUMN 0

  • 8/2/2019 Predictive Analysis Library Manual

    7/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 7

    Output Data Table

    Output tables contain predicted results, such as classification types and fitted values. A typical output

    table looks like the following:ID CENTER_ASSIGN ENERGY

    0 0 0.3

    1 1 0.2

    2 4 20.3

    3 4 20.1

    4 4 20.3

    5 4 2000.3

    6 4 1999.3

    7 2 10000.3

    8 2 9999.3

    9 3 21.3

    10 1 0.12

  • 8/2/2019 Predictive Analysis Library Manual

    8/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 8

    List of PAL AlgorithmsThe following table lists all available algorithms and functions in the Predictive Analysis Library. The

    algorithms are grouped by categories.

    To learn more about an algorithm, click the algorithm name.

    To learn more about function interface, click the function name.

    Category Algorithm Function Name Description

    Cluster Analysis K-means kmeans Clusters data using the k-means algorithm.

    validateKmeans Validates or measures the quality of

    clustered result.

    ClassificationAnalysis C4.5 Decision Tree

    createDT Creates tree models using the C4.5algorithm. The model is represented in

    JSON.

    predictWithDT Uses the tree model to perform prediction.

    KNN knn K-nearest neighbor algorithm function.

    Multiple Linear

    Regression

    linearRegression Multiple linear regression algorithm

    function.

    forecastWithLR Makes forecasts using the regression

    equation.

    AssociationAnalysis Apriori

    aprioriRule Creates association rules using the Apriorialgorithm.

    Other ABC Classification abcAnalysis Performs ABC classification analysis. For

    example, when 20% items contribute 80%

    of the total revenue, these 20% items can

    be put into class A.

    Weighted Score

    Table

    weightedTable Performs weighted table calculation.

  • 8/2/2019 Predictive Analysis Library Manual

    9/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 9

    PAL Algorithm DescriptionsThis section contains detailed descriptions of all available algorithms in the Predictive Analysis Library.

    The following information is provided for each algorithm.

    Algorithm description

    Prerequisites

    Interface description (function name; L function signature; input, parameter, and output tables)

    Example

    K-means

    In predictive analysis, k-means clustering is a method of cluster analysis which aims to partition n

    observations or records into kclusters in which each observation belongs to the cluster with the

    nearest mean. In marketing and customer-relationship management areas, it uses customer data to

    track customer behavior and create strategic business initiatives. Organizations can use this data to

    divide customers into segments based on variants such as demography, customer behavior, customer

    profitability, measure of risk, and lifetime value of a customer or retention probability.

    Clustering works to group records together according to an algorithm or mathematical formula that

    attempts to find centroids, or centers, around which similar records gravitate. The most common

    algorithm uses an iterative refinement technique. It is also referred to as Lloyd's algorithm:

    Given an initial set of kmeans m1,...,mk, the algorithm proceeds by alternating between two steps:

    Assignment step: Assign each observation to the cluster with the closest mean.

    Update step: Calculate the new means to be the centroid of the observations in the cluster.

    The algorithm repeats until the assignments no longer change.

    For more information, refer tohttp://en.wikipedia.org/wiki/K-means_clustering.

    The k-means implementation in PAL supports multi-thread, data normalization, different distance level

    measurement, and cluster quality measure (Silhouette).The implementation doesn t support

    categorical data, however this can be managed through data transformation. The first K and random K

    starting methods are supported.

    Prerequisites The input data should contain an ID column and the other columns should be integer or double

    data type.

    Input data does not contain null value. The function will issue errors when encountering null

    values.

    Interface (kmeans)

    Function: pal::kmeans

    This is a clustering function using the k-means algorithm.

    http://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clustering
  • 8/2/2019 Predictive Analysis Library Manual

    10/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 10

    L Function Signature

    pal::kmeans(Table dataset, Table args, Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Data 1st column Integer ID

    Other columns Integer or double Attribute data

    Parameter Table

    Name Data Type Description

    GROUP_NUMBER Integer Number of groups (k).

    DISTANCE_LEVEL Integer Computes the distance between item and

    cluster center.

    2 = Euclidean distance

    MAX_ITERATION Integer Maximum number of iterations.

    START_COLUMN Integer The index number of the first data column

    (column index starts from zero).

    COLUMN_NUM Integer Number of data columns.

    START_ROW Integer The index number of the first data row (row

    index starts from zero).

    ROW_NUM Integer Number of data rows.

    INIT_TYPE Integer Center initialization type:

    1 = first K

    2 = weighted random with replacement

    3 = random without replacement

    NORMALIZATION Integer Normalization type:

    0 = no

    1 = yes, for each point X(x1,x2,...xn),normalized value will be X'

    (|x1|/S,|x2|/S...|xn|/S) where S =

    |x1|+|x2|+...|xn|

    THREAD_NUMBER Integer Number of threads.

    EXIT_THRESHOLD Double Threshold (actual value) for exiting the

    iterations.

  • 8/2/2019 Predictive Analysis Library Manual

    11/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 11

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column Integer ID

    2nd column Integer or double Clustered item assigned to

    class number

    3th column Integer or double Sum of item's attribute, which

    is used in normalization.

    Interface (validateKmeans)

    Function: pal::validateKmeans

    This is a quality measurement function for k-means clustering.

    L Function Signature

    pal::validateKmeans (Table dataset, Table args, Table

    result)

    Input Table

    Table Column Column Data Type Description Constraint

    Data 1st column Integer ID

    Other

    columns

    Integer or double Attribute data

    Type Data/

    Class Data

    1st column Integer ID

    2nd column Integer Class type

    Parameter Table

    Name Data Type Description

    VARIABLE_NUM Integer Number of variables

    THREAD_NUMBER Integer Number of threads

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column VARCHAR/CHAR Name

    2nd column Double Measure result

  • 8/2/2019 Predictive Analysis Library Manual

    12/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 12

    Example

    ;#KMEANS.SQL

    ; # create type for kmeans resultALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";DROPTYPE T_KMEANS_RESULT_ASSIGN_TAB;CREATETYPE T_KMEANS_RESULT_ASSIGN_TABASTABLE("ID" INT,"CENTER_ASSIGN" INT,"ENERGY"DOUBLE);

    ; # create type for double argumentsDROPTYPE T_SINGLE_COLUMN_DOUBLE_TAB ;CREATETYPE T_SINGLE_COLUMN_DOUBLE_TABASTABLE("VALUE"DOUBLE) ;

    ; # create type for integer arguments

    DROPTYPE T_SINGLE_COLUMN_INT_TAB ;CREATETYPE T_SINGLE_COLUMN_INT_TABASTABLE("VALUE" INT) ;

    DROPTYPE KMEANS_DOUBLE_INPUT;CREATETYPE KMEANS_DOUBLE_INPUTASTABLE("ID" INT,"V000"DOUBLE, "V001"DOUBLE,

    primarykey("ID"));

    DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

    "Name"VARCHAR(50),"intArgs"INTEGER,

    "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPTYPE CONTROL_T;CREATETYPE CONTROL_TASTABLE ( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPPROCEDURE KMEANS_WITH_NEWDB;

    CREATEPROCEDUREKMEANS_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN control CONTROL_T,

    OUT cluster_assignment T_KMEANS_RESULT_ASSIGN_TAB)LANGUAGELLANGASBEGINexport Void main(Table "dataset" datasetTab,Table "control" argsTab,Table"cluster_assignment" & resultTab){pal::kmeans(datasetTab, argsTab, resultTab);}END;

  • 8/2/2019 Predictive Analysis Library Manual

    13/46

  • 8/2/2019 Predictive Analysis Library Manual

    14/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 14

    ;#EXPECTED OUTPUT:

    #validateKmeans.sqlDROPVIEWKMEANS_TYPE_ASSIGN_TAB_PAL;

    CREATEVIEWKMEANS_TYPE_ASSIGN_TAB_PALASSELECT"ID", "CENTER_ASSIGN"AS"TYPE_ASSIGN"FROMKMEANS_RESULT_ASSIGN_TAB_PAL;

    DROPTYPE T_KMEANS_TYPE_ASSIGN_TAB_S;CREATETYPE T_KMEANS_TYPE_ASSIGN_TAB_SASTABLE("ID"INTEGER,"TYPE_ASSIGN"INTEGER);

    DROPTYPE T_KMEANS_RESULT_SVALUE_TAB;CREATETYPE T_KMEANS_RESULT_SVALUE_TABASTABLE(

    "NAME"VARCHAR(50),"S"DOUBLE);

    DROPPROCEDURE KMEANSVALIDATE_WITH_NEWDB;

    CREATEPROCEDUREKMEANSVALIDATE_WITH_NEWDB(IN dataset KMEANS_DOUBLE_INPUT,IN typeset T_KMEANS_TYPE_ASSIGN_TAB_S,IN control CONTROL_T,OUT sValue T_KMEANS_RESULT_SVALUE_TAB)LANGUAGELLANGASBEGIN

    export Void main(Table "dataset" datasetTab,

  • 8/2/2019 Predictive Analysis Library Manual

    15/46

  • 8/2/2019 Predictive Analysis Library Manual

    16/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 16

    C4.5 Decision Tree

    A decision tree is used as a classifier for determining an appropriate action or decision among a

    predetermined set of actions for a given case. A decision tree helps to effectively identify the factors to

    consider and how each factor has historically been associated with different outcomes of the decision.A decision tree is a classifier that uses a tree-like structure of conditions and their possible

    consequences. Each node of a decision tree can be a leaf node or a decision node.

    Leaf node: identifies the value of the dependent (target) variable.

    Decision node: contains one condition that specifies some test on an attribute value. The

    outcome of the condition is further divided into branches with subtrees or leaf nodes.

    C4.5 is an algorithm used to generate a decision tree. C4.5 builds decision trees from a set of training

    data , using the concept of information entropy. The training data is a set S = s1,s2,... of already

    classified samples. Each sample si = x1,x2,... is a vector where x1,x2,... represent attributes or

    features of the sample. The training data is augmented with a vector C = c1,c2,... where c1,c2,...

    represent the class to which each sample belongs. At each node of the tree, C4.5 chooses oneattribute of the data that most effectively splits its set of samples into subsets enriched in one class or

    the other. Its criterion is the normalized information gain (difference in entropy) that results from

    choosing an attribute for splitting the data. The attribute with the highest normalized information gain is

    chosen to make the decision. The C4.5 algorithm then recursively works on the smaller sublists. For

    more information, refer tohttp://en.wikipedia.org/wiki/C4.5_algorithm.

    The C4.5 decision tree functions implemented in PAL supports both discrete and continuous values. A

    continuous attribute is discretised by defining fixed intervals provided by the user. For example, if the

    salary ranges from $100 to $20,000, then we can form intervals such as $0 $8,000, $8,000

    $18,000, and $18,000 $20,000. An attribute value will fall into any one of these intervals. In the PAL

    implementation, Reduced Error Pruning (REP) Algorithm is used as pruning method.

    Prerequisites The column order and column number of predicted data should be the same as the order and

    number used in tree model building.

    The last column of training data is used as a predicted field and its type should be discrete type.

    Predicted data set should have an ID column.

    Input data does not contain null value, otherwise exceptions will be thrown.

    The table used to store the tree model should be a column table.

    Interface (createDT)Function: pal::createDT

    The createDT function creates a decision tree from input training data.

    L Function Signature

    pal::createDT(Table training, Table args, Table result)

    http://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithmhttp://en.wikipedia.org/wiki/C4.5_algorithm
  • 8/2/2019 Predictive Analysis Library Manual

    17/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 17

    Input Table

    Table Column Column Data Type Description Constraint

    Training /

    Historical Data

    Columns VARCHAR/CHAR,

    integer, or double

    Table used to

    build the predictivetree model

    Discrete value: integer and

    stringContinuous value: integer and

    double

    Parameter Table

    Name Data Type Description

    START_COLUMN Integer The first attribute/column used to make

    prediction in the training data set

    (column index starts from zero).

    END_COLUMN Integer The last attribute/column used to make

    prediction in the training data set .This

    attribute/column must contain class

    information (column index starts from

    zero).

    PERCENTAGE Double The percentage to be applied to

    determine the input training data set.

    THREAD_NUMBER Integer Number of threads.

    CONTINUOUS_COL (Integer,

    Double)

    (optional)

    Defines which column needs

    discretization and the interval provided

    by the user (column index starts from

    zero).

    The integer value specifies the column

    position.

    The double value specifies the interval.

    Output Table

    Table Column Column Data Type Description Constraint

    Result (treemodel)

    1st column CLOB Tree model savedas a JSON string

    in the 1st column.

    The table should be a columntable; otherwise the CLOB

    type is not supported.

    Interface (predictWithDT)

    Function: pal::predictWithDT

    The predictWithDT function is used to perform prediction by using decision trees.

  • 8/2/2019 Predictive Analysis Library Manual

    18/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 18

    L Function Signature

    pal::predictWithDT(Table predictive, Table args, Table model,

    Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Predicted Data Columns VARCHAR/CHAR or

    integer/Double

    Data to be classified

    (predicted)

    An ID column is

    mandatory. Its

    data type should

    be integer.

    Predictive Model 1st column CLOB Serialized tree model

    Parameter Table

    Name Data Type Description

    START_COLUMN Integer The first attribute/column used to make

    prediction in the input data set (column index

    starts from zero).

    END_COLUMN Integer The last attribute/column used to make

    prediction in the input data set (column index

    starts from zero).

    ID_COLUMN Integer The column of predicted data used as primary

    key/ID. Column index starts from zero (columnindex starts from zero).

    THREAD_NUMBER Integer Number of threads

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column Integer ID

    2nd column VARCHAR/CHAR Predictive result

    Example;# DECISION TREE

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE( "ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT,"CLASS"VARCHAR(50));

    DROPTYPE MODEL_T;

    CREATETYPE MODEL_TASTABLE("Model"CLOB);

  • 8/2/2019 Predictive Analysis Library Manual

    19/46

  • 8/2/2019 Predictive Analysis Library Manual

    20/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 20

    INSERTINTO TESTDT_TABVALUES (4, 'West', 'Spring', 5000, 'Poor');

    INSERTINTO TESTDT_TABVALUES (5, 'East', 'Spring', 200000, 'Good');

    INSERTINTO TESTDT_TABVALUES (6, 'South', 'Summer', 25000, 'Poor');

    INSERTINTO TESTDT_TABVALUES (7, 'South', 'Spring', 10000, 'Average');

    INSERTINTO TESTDT_TABVALUES (8, 'North', 'Winter', 50000, 'Average');

    DROPTABLE PREDICTIVE_TAB;

    CREATECOLUMNTABLE PREDICTIVE_TAB ("ID" INT,"Region"VARCHAR(50),"SalesPeriod"VARCHAR(50),"Revenue" INT);

    INSERTINTO PREDICTIVE_TABVALUES (0,'South', 'Autumn', 60000);

    INSERTINTO PREDICTIVE_TABVALUES (1,'North', 'Spring', 30000);

    INSERTINTO PREDICTIVE_TABVALUES (2,'South', 'Summer', 25000);

    INSERTINTO PREDICTIVE_TABVALUES (3,'West', 'Winter', 5000);

    DROPTABLE #CONTROL_TAB;

    CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));

    INSERTINTO #CONTROL_TABVALUES ('START_COLUMN',1,null,null);

    INSERTINTO #CONTROL_TABVALUES ('END_COLUMN',4,null,null);

    INSERTINTO #CONTROL_TABVALUES ('PERCENTAGE',null,0.71,null);

    INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',1,null,null);

    INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,25000,null);

    INSERTINTO #CONTROL_TABVALUES ('CONTINUOUS_COL',3,60000,null);

    DROPTABLE #PRE_CONTROL_TAB;

    CREATELOCALTEMPORARYCOLUMNTABLE #PRE_CONTROL_TAB ("Name"VARCHAR(100), "intArgs" INT, "doubleArgs"DOUBLE, "stringArgs"VARCHAR(100));

    INSERTINTO #PRE_CONTROL_TABVALUES ('START_COLUMN',1,null,null);

    INSERTINTO #PRE_CONTROL_TABVALUES ('END_COLUMN',3,null,null);

    INSERTINTO #PRE_CONTROL_TABVALUES ('THREAD_NUMBER',2,null,null);

    INSERTINTO #PRE_CONTROL_TABVALUES ('ID_COLUMN',0,null,null);

    DROPTABLE RESULTS_TAB;

    CREATECOLUMNTABLE RESULTS_TAB ("ID" INT, "CLASS"VARCHAR(50));

    DROPTABLE MODEL_TAB;CREATECOLUMNTABLE MODEL_TAB ("Model"CLOB);

    CALL palDT1(TESTDT_TAB, "#CONTROL_TAB", MODEL_TAB)with overview;

    CALL palDT2(PREDICTIVE_TAB, "#PRE_CONTROL_TAB", MODEL_TAB, RESULTS_TAB)with overview;

    SELECT * FROMRESULTS_TAB;

    ;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    21/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 21

  • 8/2/2019 Predictive Analysis Library Manual

    22/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 22

    KNN

    The k-nearest neighbor algorithm (KNN) is a method for classifying objects based on closest training

    examples in the feature space. KNN is amongst the simplest of all machine learning algorithms: an

    object is classified by a majority vote of its neighbors, with the object being assigned to the class mostcommon amongst its knearest neighbors (kis a positive integer, typically small). If k = 1, then the

    object is simply assigned to the class of its nearest neighbor. The neighbors are taken from a set of

    objects for which the correct classification is known.

    The training examples are vectors in a multidimensional feature space, each with a class label. The

    training phase of the algorithm consists only of storing the feature vectors and class labels of the

    training samples. In the classification phase, k is a user-defined constant, and an unlabelled vector (a

    query or test point) is classified by assigning the label which is most frequent among the ktraining

    samples nearest to that query point. Usually Euclidean distance is used as the distance metric. For

    more information, refer tohttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm.

    The PAL implementation of KNN supports multi-thread and different voting type.

    Prerequisites The first column of training data and input data should be ID column. The second column of

    training data should be class type. The class type column is of integer type. Other data columns

    are of integer or double type.

    Input data does not contain null value.

    Interface (knn)Function: pal::knn

    This is a classification function using the KNN algorithm.

    L Function Signature

    pal::knn(Table value, Table classvalue, Table args,

    Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Training Data 1st column Integer ID

    2nd column Integer Class type

    Other columns Integer or double Attribute data

    Class Data 1st column Integer ID

    Other columns Integer or double Attribute data

    Parameter Table

    Name Data Type Description

    K_NEAREST_NEIGHBOURS Integer Number of nearest neighbors (k)

    http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithmhttp://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
  • 8/2/2019 Predictive Analysis Library Manual

    23/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 23

    Name Data Type Description

    ATTRIBUTE_NUM Integer Number of attributes

    VOTING_TYPE Integer Voting type:

    0 = majority voting

    1 = distance-weighted voting

    THREAD_NUMBER Integer Number of threads

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column Integer ID

    2nd column Integer or double class type

    Example

    ;# knn.sql

    ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);

    DROPTYPE CLASSDATA_T;

    CREATETYPE CLASSDATA_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

    DROPTYPE RESULT_T;

    CREATETYPE RESULT_TASTABLE("ID" INT,"Type" INT);

    DROPTABLE #CONTROL_TAB;CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

    "Name"VARCHAR(50),

    "intArgs"INTEGER,

    "doubleArgs"DOUBLE,

    "stringArgs"VARCHAR(100));

    DROPTYPE CONTROL_T;

    CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPPROCEDURE palKNN;

  • 8/2/2019 Predictive Analysis Library Manual

    24/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 24

    CREATEPROCEDURE palKNN( IN data DATA_T, IN classdata CLASSDATA_T, INcontrol CONTROL_T, OUT results RESULT_T )

    LANGUAGELLANGAS

    BEGIN

    export Void main(Table"data" dataTab,

    Table "classdata" classdataTab,

    Table "control" argsTab,

    Table "results" & resultsTab) {

    pal::knn(dataTab, classdataTab, argsTab, resultsTab);

    }

    END;

    DROPTABLE DATA_TAB;

    CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"TYPE" INT,"X1"DOUBLE, "X2"DOUBLE);

    INSERTINTO DATA_TABVALUES (0,2,1,1);

    INSERTINTO DATA_TABVALUES (1,3,10,10);

    INSERTINTO DATA_TABVALUES (2,3,10,11);

    INSERTINTO DATA_TABVALUES (3,3,10,10);

    INSERTINTO DATA_TABVALUES (4,1,1000,1000);

    INSERTINTO DATA_TABVALUES (5,1,1000,1001);

    INSERTINTO DATA_TABVALUES (6,1,1000,999);

    INSERTINTO DATA_TABVALUES (7,1,999,999);

    INSERTINTO DATA_TABVALUES (8,1,999,1000);

    INSERTINTO DATA_TABVALUES (9,1,1000,1000);

    DROPTABLE CLASSDATA_TAB;

    CREATECOLUMNTABLE CLASSDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

    INSERTINTO CLASSDATA_TABVALUES (0,2,1);

    INSERTINTO CLASSDATA_TABVALUES (1,9,10);

    INSERTINTO CLASSDATA_TABVALUES (2,9,11);

    INSERTINTO CLASSDATA_TABVALUES (3,15000,15000);

    INSERTINTO CLASSDATA_TABVALUES (4,1000,1000);

    INSERTINTO CLASSDATA_TABVALUES (5,500,1001);

    INSERTINTO CLASSDATA_TABVALUES (6,500,999);INSERTINTO CLASSDATA_TABVALUES (7,199,999);

    TRUNCATETABLE #CONTROL_TAB;

    INSERTINTO #CONTROL_TABVALUES ('K_NEAREST_NEIGHBOURS',3,null,null);

    INSERTINTO #CONTROL_TABVALUES ('ATTRIBUTE_NUM',2,null,null);

    INSERTINTO #CONTROL_TABVALUES ('VOTING_TYPE',0,null,null);

    INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);

    DROPTABLE RESULTS_TAB;

    CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Type" INT);

  • 8/2/2019 Predictive Analysis Library Manual

    25/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 25

    CALL palKNN(DATA_TAB, CLASSDATA_TAB, "#CONTROL_TAB", RESULTS_TAB)withoverview;

    SELECT * FROMRESULTS_TAB;

    ;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    26/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 26

    Multiple Linear Regression

    Linear regression is an approach to model the relationship between a scalar variable yand one or

    more variables denoted X. In linear regression, data are modeled using linear functions, and unknown

    model parameters are estimated from the data. Such models are called linear models.For more information, refer tohttp://en.wikipedia.org/wiki/Linear_regression.

    In PAL, the implementation of linear regression is to solve the equation:

    Ax=Y

    Where A is MxN matrix, x is Nx1 matrix, and Y is Mx1 matrix.

    Then, x= reverse(A)*Y

    And it can be transformed into

    x= reverse(transpose(A)*A) *transpose(A) *Y

    The implementation also supports calculating the F value and R^2 determining statistical significance.

    Prerequisites No missing or null data in inputs.

    Data is numeric, not categorical.

    Given the structure as Y and X1...Xn, there must be more than n+1 records available for

    analysis.

    Interface (linearRegression)

    Function: pal::linearRegression

    This is a multiple linear regression function.

    L Function Signature

    pal::linearRegression( Table data, Table args, Table result,

    Table fitted, Table significance)

    Input Table

    Table Column Column Data Type Description Constraint

    Data 1st column Integer ID

    2nd column Integer or double Variable y

    Other columns Integer or double Variable Xn

    Parameter Table

    Name Data Type Description

    VARIABLE_NUM Integer Number of variable X

    THREAD_NUMBER Integer Number of threads

    http://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regressionhttp://en.wikipedia.org/wiki/Linear_regression
  • 8/2/2019 Predictive Analysis Library Manual

    27/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 27

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column Integer ID

    2nd column Integer or double Value Ai (A0 is

    the intercept; A1 is

    the beta

    coefficient for X1,

    A2 is the beta

    coefficient for X2

    etc )

    Fitted Data 1st column Integer ID

    2nd column Integer or double Value Yi

    Significance 1st column VARCHAR/CHAR Name (R^2 / F)

    2nd column Double Value

    Interface (forecastWithLR)

    Function: pal::forecastWithLR

    This function is used to perform predication with linear regression result.

    L Function Signature

    pal::forecastWithLR( Table predictdata, Table coefficient,

    Table args, Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Predictive Data 1st column Integer ID

    Other columns Integer or double Variable Xn

    Coefficient 1st column Integer ID

    2nd column Integer or double Value Ai

    Parameter Table

    Name Data Type Description

    VARIABLE_NUM Integer Number of variable X

    THREAD_NUMBER Integer Number of threads

  • 8/2/2019 Predictive Analysis Library Manual

    28/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 28

    Output Table

    Table Column Column Data Type Description Constraint

    Fitted Result 1st column Integer ID

    2nd column Integer or double Value Yi

    Example

    ;# linearRegression

    ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);

    DROPTYPE RESULT_T;

    CREATETYPE RESULT_TASTABLE("ID" INT,"Ai"DOUBLE);

    DROPTYPE FITTED_T;

    CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);

    DROPTYPE SIGNIFICANCE_T;

    CREATETYPE SIGNIFICANCE_TASTABLE("NAME"varchar(50),"VALUE"DOUBLE);

    DROPTYPE CONTROL_T;

    CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPPROCEDURE palLR;

    CREATEPROCEDURE palLR( IN data DATA_T, IN control CONTROL_T, OUTresults RESULT_T, OUT fittedValue FITTED_T,OUT significanceSIGNIFICANCE_T )

    LANGUAGELLANG

    AS

    BEGIN

    export Void main(Table"data" dataTab,

    Table "control" argsTab,

    Table "results" & resultsTab,

    Table "fittedValue" & fittedTab,

    Table "significance" &significanceTab) {

    pal::linearRegression(dataTab, argsTab, resultsTab,fittedTab,significanceTab);

    }

    END;

  • 8/2/2019 Predictive Analysis Library Manual

    29/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 29

    DROPTABLE DATA_TAB;

    CREATECOLUMNTABLE DATA_TAB ( "ID" INT,"Y"DOUBLE,"X1"DOUBLE, "X2"DOUBLE);

    INSERTINTO DATA_TABVALUES (0,0.5,0.13,0.33);

    INSERTINTO DATA_TABVALUES (1,0.15,0.14,0.34);INSERTINTO DATA_TABVALUES (2,0.25,0.15,0.36);

    INSERTINTO DATA_TABVALUES (3,0.35,0.16,0.35);

    INSERTINTO DATA_TABVALUES (4,0.45,0.17,0.37);

    INSERTINTO DATA_TABVALUES (5,0.55,0.18,0.38);

    INSERTINTO DATA_TABVALUES (6,0.65,0.19,0.39);

    INSERTINTO DATA_TABVALUES (7,0.75,0.19,0.31);

    INSERTINTO DATA_TABVALUES (8,0.85,0.11,0.32);

    INSERTINTO DATA_TABVALUES (9,0.95,0.12,0.33);

    DROPTABLE #CONTROL_TAB;

    CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB ( "Name"VARCHAR(50),

    "intArgs"INTEGER,

    "doubleArgs"DOUBLE,

    "stringArgs"VARCHAR(100));

    INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);

    INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);

    DROPTABLE RESULTS_TAB;

    CREATECOLUMNTABLE RESULTS_TAB ("ID" INT,"Ai"DOUBLE);

    DROPTABLE FITTED_TAB;

    CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);

    DROPTABLE SIGNIFICANCE_TAB;

    CREATECOLUMNTABLE SIGNIFICANCE_TAB ("NAME"varchar(50),"VALUE"DOUBLE);

    CALL palLR(DATA_TAB, "#CONTROL_TAB", RESULTS_TAB, FITTED_TAB,SIGNIFICANCE_TAB)with overview;

    SELECT * FROMRESULTS_TAB;

    SELECT * FROMFITTED_TAB;

    SELECT * FROMSIGNIFICANCE_TAB;

    ;#EXPECTED OUTPUT:RESULTS_TAB:

    FITTED_TAB:

  • 8/2/2019 Predictive Analysis Library Manual

    30/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 30

    SIGNIFICANCE_TAB:

    ;# forecastWithLR

    ALTERSESSION SET CURRENT_SCHEMA = "DM_PAL";

    DROPTYPE PREDICT_T;

    CREATETYPE PREDICT_TASTABLE( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

    DROPTYPE COEFFICIENT_T;CREATETYPE COEFFICIENT_TASTABLE("ID" INT,"Ai"DOUBLE);

    DROPTYPE FITTED_T;

    CREATETYPE FITTED_TASTABLE("ID" INT,"Fitted"DOUBLE);

    DROPTYPE CONTROL_T;

    CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER, "doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPPROCEDURE palForecastWithLR;

    CREATEPROCEDURE palForecastWithLR( IN predictData PREDICT_T, INcoefficient COEFFICIENT_T, IN control CONTROL_T, OUT fittedValueFITTED_T )

    LANGUAGELLANG

    AS

    BEGIN

    export Void main(

    Table "predictData"predictDataTab,

    Table "coefficient" coefficientTab,

    Table "control" argsTab,

  • 8/2/2019 Predictive Analysis Library Manual

    31/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 31

    Table "fittedValue" & fittedTab) {

    pal::forecastWithLR(predictDataTab, coefficientTab, argsTab,fittedTab);

    }

    END;

    DROPTABLE PREDICTDATA_TAB;

    CREATECOLUMNTABLE PREDICTDATA_TAB ( "ID" INT,"X1"DOUBLE, "X2"DOUBLE);

    INSERTINTO PREDICTDATA_TABVALUES (0,0.5,0.3);

    INSERTINTO PREDICTDATA_TABVALUES (1,4,0.4);

    INSERTINTO PREDICTDATA_TABVALUES (2,0,1.6);

    INSERTINTO PREDICTDATA_TABVALUES (3,0.3,0.45);

    INSERTINTO PREDICTDATA_TABVALUES (4,0.4,1.7);

    DROPTABLE #CONTROL_TAB;

    CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB (

    "Name"VARCHAR(50),

    "intArgs"INTEGER,

    "doubleArgs"DOUBLE,

    "stringArgs"VARCHAR(100));

    INSERTINTO #CONTROL_TABVALUES ('VARIABLE_NUM',2,null,null);

    INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',8,null,null);

    DROPTABLE COEEFICIENT_TAB;CREATECOLUMNTABLE COEEFICIENT_TAB ("ID" INT,"Ai"DOUBLE);

    INSERTINTO COEEFICIENT_TABVALUES (0,1.7120914258645001);

    INSERTINTO COEEFICIENT_TABVALUES (1,0.2652771198483208);

    INSERTINTO COEEFICIENT_TABVALUES (2,-3.471103742302148);

    DROPTABLE FITTED_TAB;

    CREATECOLUMNTABLE FITTED_TAB ("ID" INT,"Fitted"DOUBLE);

    CALL palForecastWithLR(PREDICTDATA_TAB, COEEFICIENT_TAB, "#CONTROL_TAB",FITTED_TAB)with overview;

    SELECT * FROMFITTED_TAB;

    ;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    32/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 32

    Apriori

    Apriori is a classic predictive analysis algorithm for finding association rules in association analysis.

    Association analysis uncovers the hidden patterns, correlations or casual structures among a set ofitems or objects. For example, association analysis is used to understand what products and services

    customers tend to purchase at the same time. By analyzing the purchasing trends of customers with

    association analysis, then prediction of their future behavior may be made.

    Apriori is designed to operate on databases containing transactions. As is common in association rule

    mining, given a set of items, the algorithm attempts to find subsets which are common to at least a

    minimum number of the item sets. Apriori uses a bottom up approach, where frequent subsets are

    extended one item at a time, a step known as candidate generation, and groups of candidates are

    tested against the data. The algorithm terminates when no further successful extensions are found.

    Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It

    generates candidate item sets of length kfrom item sets of length k-1. Then it prunes the candidates

    which have an infrequent sub pattern. The candidate set contains all frequent k-length item sets. Afterthat, it scans the transaction database to determine frequent item sets among the candidates. For

    more information, refer tohttp://en.wikipedia.org/wiki/Apriori_algorithm.

    The Apriori function in PAL uses vertical data format to store the transaction data in memory. The

    function can take string or integer transaction ID and item ID as input. It supports the output of

    confidence, support, and lift value, but does not limit the number of output rules. However, you can

    use SQL Script to select the number of output rules, for example:

    SELECT TOP 2000 FROM RULE_RESULTS where lift >0.5

    PrerequisiteInput data does not contain null value.

    Interface (aprioriRule)Function: pal::aprioriRule

    This function reads input transactions data and generates association rules by the Apriori algorithm.

    L Function Signature

    pal::aprioriRule( Table dataset, Table args, Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Dataset/Historical Data Transaction ID

    column

    Integer or

    VARCHAR/CHAR

    Transaction ID

    Item column Integer or

    VARCHAR/CHAR

    Item ID

    http://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithmhttp://en.wikipedia.org/wiki/Apriori_algorithm
  • 8/2/2019 Predictive Analysis Library Manual

    33/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 33

    Parameter Table

    Name Data Type Description

    MIN_SUPPORT Double User-specified minimum support (actual

    value).

    MIN_CONFIDENCE Double User-specified minimum confidence (actual

    value).

    TID_COLUMN Integer Indicates which column stores the

    transaction ID (column index starts from

    zero).

    ITEM_COLUMN Integer Indicates which column stores the items ID

    (column index starts from zero).

    THREAD_NUMBER Integer Number of threads.

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column VARCHAR/CHAR Leading items

    2nd column VARCHAR/CHAR Dependent items

    3rd column Double Support value

    4th column Double Confidence value

    5th column Double Lift value

    Example;#aprioriRule

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE( "ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));

    DROPTYPE RESULT_T;

    CREATETYPE RESULT_TASTABLE("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500),"SUPPORT"DOUBLE, "CONFIDENCE"DOUBLE,"LIFT"DOUBLE);

    DROPTABLE #CONTROL_TAB;

    CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TAB(

    "Name"VARCHAR(50),

    "intArgs"INTEGER,

    "doubleArgs"DOUBLE,

    "stringArgs"VARCHAR(100));

    DROPTYPE CONTROL_T;

  • 8/2/2019 Predictive Analysis Library Manual

    34/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 34

    CREATETYPE CONTROL_TASTABLE( "Name"VARCHAR(50),"intArgs"INTEGER,"doubleArgs"DOUBLE,"stringArgs"VARCHAR(100));

    DROPPROCEDURE palapprioriRule;

    CREATEPROCEDURE palapprioriRule( IN data DATA_T, IN control CONTROL_T,OUT results RESULT_T )

    LANGUAGELLANG

    AS

    BEGIN

    export Void main(Table "data"dataTab,

    Table "control" argsTab,

    Table "results" & resultsTab) {

    pal::aprioriRule(dataTab, argsTab, resultsTab);

    }

    END;

    DROPTABLE TESTASSOCIATION_TAB;

    CREATECOLUMNTABLE TESTASSOCIATION_TAB("ID" INT,"CUSTOMER" INT,"ITEM"VARCHAR(20));

    INSERTINTO TESTASSOCIATION_TABVALUES (0, 0, 'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (1, 0, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (2, 0, 'item5'); #transacion T0:I1,I2,I5

    INSERTINTO TESTASSOCIATION_TABVALUES (3, 1, 'item2');INSERTINTO TESTASSOCIATION_TABVALUES (4, 1, 'item4'); #transacion T1:I2,I4

    INSERTINTO TESTASSOCIATION_TABVALUES (5, 2, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (6, 2, 'item3');

    INSERTINTO TESTASSOCIATION_TABVALUES (7, 3, 'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (8, 3, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (9, 3, 'item4');

    INSERTINTO TESTASSOCIATION_TABVALUES (10, 4,'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (11, 4,'item3');

    INSERTINTO TESTASSOCIATION_TABVALUES (12, 5, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (13, 5, 'item3');

    INSERTINTO TESTASSOCIATION_TABVALUES (14, 6, 'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (15, 6, 'item3');

    INSERTINTO TESTASSOCIATION_TABVALUES (16, 7, 'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (18, 7, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (19, 7, 'item3');

    INSERTINTO TESTASSOCIATION_TABVALUES (20, 7, 'item5');

    INSERTINTO TESTASSOCIATION_TABVALUES (21, 8, 'item1');

    INSERTINTO TESTASSOCIATION_TABVALUES (22, 8, 'item2');

    INSERTINTO TESTASSOCIATION_TABVALUES (23, 8, 'item3');

    DROPTABLE RESULTS_TAB;

  • 8/2/2019 Predictive Analysis Library Manual

    35/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 35

    CREATECOLUMNTABLE RESULTS_TAB ("PRERULE"VARCHAR(500),"POSTRULE"VARCHAR(500), "SUPPORT"Double, "CONFIDENCE"Double,"LIFT"DOUBLE);

    TRUNCATETABLE #CONTROL_TAB;

    INSERTINTO #CONTROL_TABVALUES ('TID_COLUMN',1,null,null);INSERTINTO #CONTROL_TABVALUES ('ITEM_COLUMN',2,null,null);

    INSERTINTO #CONTROL_TABVALUES ('THREAD_NUMBER',4,null,null);

    INSERTINTO #CONTROL_TABVALUES ('MIN_SUPPORT',null,0.2,null);

    INSERTINTO #CONTROL_TABVALUES ('MIN_CONFIDENCE',null,0.2,null);

    TRUNCATETABLE RESULTS_TAB;

    CALL palapprioriRule(TESTASSOCIATION_TAB, "#CONTROL_TAB", RESULTS_TAB)with overview;

    SELECT * FROMRESULTS_TAB;

    ;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    36/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 36

    ABC Classification

    ABC Classification is used to classify objects, such as customers, employees, or products, based on a

    particular measure, such as revenue or profit. ABC analysis suggests that inventories of anorganization are not of equal value. Thus, the inventories are grouped into three categories (A, B, and

    C) by their estimated importance. A items are very important for an organization. B items are

    important, but less important than A items and more important than C items. Therefore, B items

    are of medium importance, and C items are marginally important.

    An example of ABC classification is as follows:

    A items 20% of the items account for 70% of the annual consumption value of all items.

    B items 30% of the items account for 25% of the annual consumption value of all items.

    C items 50% of the items account for 5% of the annual consumption value of all items.

    For more information, refer tohttp://en.wikipedia.org/wiki/ABC_analysis.

    Prerequisite

    Input data cannot contain null value.

    The item names in the Input table must be of string data type and be unique.

    Interface (abcAnalysis)

    Function: pal::abcAnalysis

    This function performs the ABC analysis algorithm.

    L Function Signature

    pal::abcAnalysis ( Table target, Table args, Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Target Data 1st column VARCHAR/CHAR Item name

    2nd column Double Value

    Parameter Table

    Name Data Type Description

    START_COLUMN Integer The first column used to do the

    classification (column index starts from

    zero).

    END_COLUMN Integer The last column used to do the classification

    (column index starts from zero).

    THREAD_NUMBER Integer Number of threadsPERCENT_A Double Interval for A class

    http://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysishttp://en.wikipedia.org/wiki/ABC_analysis
  • 8/2/2019 Predictive Analysis Library Manual

    37/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 37

    Name Data Type Description

    PERCENT_B Double Interval for B class

    PERCENT_C Double Interval for C class

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column VARCHAR/CHAR ABC class

    2nd column VARCHAR/CHAR Items

    Example

    ;# ABC

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE("ITEM"VARCHAR(100),"VALUE"DOUBLE);

    DROPTYPE CONTROL_T;

    CREATETYPE CONTROL_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE,"strArgs"VARCHAR(100));

    DROPTYPE RESULT_T;

    CREATETYPE RESULT_TASTABLE("ABC"VARCHAR(10),"ITEM"VARCHAR(100));

    DROPPROCEDURE palAbcAnalysis;

    CREATEPROCEDURE palAbcAnalysis( IN target DATA_T, IN control CONTROL_T,OUT results RESULT_T )

    LANGUAGELLANG

    AS

    BEGIN

    export Void main(Table "target" targetTab,

    Table "control" controlTab,

    Table "results" & resultsTab){

    pal::abcAnalysis(targetTab, controlTab, resultsTab);

    }

    END;

    DROPTABLE #CONTROL_TBL;

    CREATELOCALTEMPORARYCOLUMNTABLE #CONTROL_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE,"strArgs"VARCHAR(100));

    INSERTINTO #CONTROL_TBLVALUES ('START_COLUMN',0,null,NULL);

    INSERTINTO #CONTROL_TBLVALUES ('END_COLUMN',1,null,null);

    INSERTINTO #CONTROL_TBLVALUES ('THREAD_NUMBER',2,null,null);

    INSERTINTO #CONTROL_TBLVALUES ('PERCENT_A',null,0.7,null);

  • 8/2/2019 Predictive Analysis Library Manual

    38/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 38

    INSERTINTO #CONTROL_TBLVALUES ('PERCENT_B',null,0.2,null);

    INSERTINTO #CONTROL_TBLVALUES ('PERCENT_C',null,0.1,null);

    DROPTABLE TESTABCTAB;

    CREATECOLUMNTABLE TESTABCTAB("ITEM"VARCHAR(100),"VALUE"DOUBLE);INSERTINTO TESTABCTABVALUES ('item1', 15.4);

    INSERTINTO TESTABCTABVALUES ('item2', 200.4);

    INSERTINTO TESTABCTABVALUES ('item3', 280.4);

    INSERTINTO TESTABCTABVALUES ('item4', 100.9);#100.9

    INSERTINTO TESTABCTABVALUES ('item5', 40.4);

    INSERTINTO TESTABCTABVALUES ('item6', 25.6);

    INSERTINTO TESTABCTABVALUES ('item7', 18.4);

    INSERTINTO TESTABCTABVALUES ('item8', 10.5);

    INSERTINTO TESTABCTABVALUES ('item9', 96.15);

    INSERTINTO

    TESTABCTABVALUES

    ('item10', 9.4);

    DROPTABLE RESULT_TBL;

    CREATECOLUMNTABLE RESULT_TBL("ABC"VARCHAR(10),"ITEM"VARCHAR(100));

    CALL palAbcAnalysis(TESTABCTAB, "#CONTROL_TBL", RESULT_TBL)withoverview;

    SELECT * FROMRESULT_TBL;

    ;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    39/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 39

    Weighted Score Table

    A weighted score table is a method of evaluating alternatives when the importance of each criterion

    differs. In a weighted score table, each alternative is given a score for each criterion. These scores are

    then weighted by the importance of each criterion. All of an alternative's weighted scores are thenadded together to calculate that alternative's total weighted score. The alternative with the highest total

    score should be the best alternative. Weighted score tables can be used to make predictions about

    future customer behavior. A model based on historical data in a data mining application may be

    applied to new data to make prediction. The prediction, that is, the output of the model, is also called a

    Score. A single score for customers can be calculated by taking into account different dimensions.

    A function defined by weighted score tables is a linear combination of functions of a variable.

    f(x1,,xn) = w1f1(x1) + + wnfn(xn)

    Prerequisites

    Input data cannot contain null value.

    The column of the Map Function table should be sorted by the attribute order of the Input Data

    table.

    Interface (weightedTable)

    Function: pal::weightedTable

    This function performs weighted table calculation. It is similar to the Volume Driver function in the

    Business Function Library (BFL). Volume Driver calculates only one column, but weightedTable

    calculates multiple columns at the same time.

    L Function Signature

    pal::weightedTable ( Table target, Table mapfun, Table

    control, Table args, Table result)

    Input Table

    Table Column Column Data Type Description Constraint

    Target/Input

    Data

    Columns VARCHAR/CHAR,

    integer, or double

    Specifies which

    will be used to

    calculate the

    scores

    Discrete value: integer, string,

    double;

    Continuous value: integer,

    double;

    An ID column is mandatory. Its

    data type should be integer.

    Map Function Columns VARCHAR/CHAR,

    integer, or double

    Creates the map

    function

    Every attribute (except ID) in the

    Input Data table maps to two

    columns in the Map Function

    table: Key column and Value

    column. The Value column must

    be double type.

  • 8/2/2019 Predictive Analysis Library Manual

    40/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 40

    Table Column Column Data Type Description Constraint

    Control Columns Integer or double This table has three columns.

    When the Input Data table has n

    attributes (except ID), the WeightTable will have nrows.

    Output Table

    Table Column Column Data Type Description Constraint

    Result 1st column Integer ID

    2nd column Double Result value

    Parameter Table

    Name Data Type Description

    START_COLUMN Integer The first column used to do the calculation

    (column index starts from zero).

    END_COLUMN Integer The last column used to do the calculation

    (column index starts from zero).

    THREAD_NUMBER Integer Number of threads

    ID_COLUMN Integer Specifies the ID column (column index

    starts from zero).

    Example

    ;#weightedTable

    DROPTYPE DATA_T;

    CREATETYPE DATA_TASTABLE( "ID" INT,"GENDER"VARCHAR(10),"INCOME"INT,"HEIGHT"DOUBLE);

    DROPTYPE MAP_FUN_T;

    CREATETYPE MAP_FUN_TASTABLE("GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);

    DROPTYPE CONTROL_T;

    CREATETYPE CONTROL_TASTABLE( "WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);

    DROPTYPE PARAMETERS_T;

    CREATETYPE PARAMETERS_TASTABLE("Name"VARCHAR(100), "intArgs" INT,"doubleArgs"DOUBLE);

  • 8/2/2019 Predictive Analysis Library Manual

    41/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 41

    DROPTYPE RESULT_T;

    CREATETYPE RESULT_TASTABLE("ID" INT,"result"DOUBLE);

    DROPPROCEDURE palWeightTable;

    CREATEPROCEDURE palWeightTable( IN target DATA_T, IN mapfun MAP_FUN_T,IN control CONTROL_T, IN parameters PARAMETERS_T,OUT results RESULT_T )

    LANGUAGELLANG

    AS

    BEGIN

    export Void main(Table "target" targetTab,

    Table "mapfun" mapfunTab,

    Table"control" controlTab,

    Table"parameters" parametersTab,

    Table "results" & resultsTab) {

    pal::weightedTable(targetTab, mapfunTab, controlTab, parametersTab,resultsTab);

    }

    END;

    DROPTABLE TESTTARGET_TBL;

    CREATECOLUMNTABLE TESTTARGET_TBL ("ID" INT,"GENDER"VARCHAR(10),"INCOME" INT,"HEIGHT"DOUBLE);

    INSERTINTO TESTTARGET_TBLVALUES (0,'male',5000,1.73);

    INSERTINTO TESTTARGET_TBLVALUES (1,'male',9000,1.80);

    INSERTINTO TESTTARGET_TBLVALUES (2,'female',6000,1.55);

    INSERTINTO TESTTARGET_TBLVALUES (3,'male',15000,1.65);

    INSERTINTO TESTTARGET_TBLVALUES (4,'female',2000,1.70);

    INSERTINTO TESTTARGET_TBLVALUES (5,'female',12000,1.65);

    INSERTINTO TESTTARGET_TBLVALUES (6,'male',1000,1.65);

    INSERTINTO TESTTARGET_TBLVALUES (7,'male',8000,1.60);

    INSERTINTO TESTTARGET_TBLVALUES (8,'female',5500,1.85);#5500

    INSERTINTO TESTTARGET_TBLVALUES (9,'female',9500,1.85);

    DROPTABLE MAP_FUN_TBL;

    CREATECOLUMNTABLE MAP_FUN_TBL ( "GENDER"VARCHAR(10), "VAL1"DOUBLE,"INCOME" INT, "VAL2"DOUBLE, "HEIGHT"DOUBLE, "VAL3"DOUBLE);

    INSERTINTO MAP_FUN_TBLVALUES ('male',2.0, 0,0.0, 1.5,0.0);

    INSERTINTO MAP_FUN_TBLVALUES ('female',1.5, 5500,1.0, 1.6,1.0);

    INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 9000,2.0, 1.71,2.0);

    INSERTINTO MAP_FUN_TBLVALUES ('null',0.0, 12000,3.0, 1.80,3.0);

    DROPTABLE CONTROL_TBL;

  • 8/2/2019 Predictive Analysis Library Manual

    42/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 42

    CREATECOLUMNTABLE CONTROL_TBL ("WEIGHT"DOUBLE, "ISDIS" INT, "ROWNUM"INT);

    INSERTINTO CONTROL_TBLVALUES (0.5,1,2);

    INSERTINTO CONTROL_TBLVALUES (2.0,-1,4);

    INSERTINTO CONTROL_TBLVALUES (1.0,-1,4);

    DROPTABLE #PARAMETERS_TBL;

    CREATELOCALTEMPORARYCOLUMNTABLE #PARAMETERS_TBL ("Name"VARCHAR(100),"intArgs" INT, "doubleArgs"DOUBLE);

    INSERTINTO #PARAMETERS_TBLVALUES ('ID_COLUMN',0,null);

    INSERTINTO #PARAMETERS_TBLVALUES ('START_COLUMN',1,null);

    INSERTINTO #PARAMETERS_TBLVALUES ('END_COLUMN',3,null);

    INSERTINTO #PARAMETERS_TBLVALUES ('THREAD_NUMBER',2,null);

    DROPTABLE RESULT_TBL;

    CREATECOLUMNTABLE RESULT_TBL("ID" INT,"result"DOUBLE);

    CALL palWeightTable(TESTTARGET_TBL, MAP_FUN_TBL, CONTROL_TBL,"#PARAMETERS_TBL", RESULT_TBL)with overview;

    SELECT * FROMRESULT_TBL;;#EXPECTED OUTPUT:

  • 8/2/2019 Predictive Analysis Library Manual

    43/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 43

    Log and TraceTo learn about the details of PAL function implementation or trace a problem occurred during function

    runtime, you can check the log information in SAP HANA Database Administration.

    Procedure

    To open a log file:

    1. In SAP HANA Database Administration, choose TraceLevels. Then click Add Componentin

    the Trace Levelsdialog box, enter the PAL function name, and click OK. The name should beentered as PAL_, for example, PAL_APRIORIRULE.

  • 8/2/2019 Predictive Analysis Library Manual

    44/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 44

    2. Select a log type for the newly added PAL component and click OK, as shown below.

    3. Right-click on indexserver_ and select Show File.

  • 8/2/2019 Predictive Analysis Library Manual

    45/46

    Predictive Analysis Library Reference Manual

    SAP AG 2011 45

    Result

    The log file for the specified PAL function is displayed. You can use the Findbutton to search for the

    log information that you need.

  • 8/2/2019 Predictive Analysis Library Manual

    46/46

    Predictive Analysis Library Reference Manual

    Copyrights Copyright 2011 SAP AG. All rights reserved.