datapro4j The data processing library for Java

The programmer’s guide

Revision: 1

Please, cite this document as: J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j

Knowledge Discovery and Intelligent Systems University of Córdoba, Spain http://www.uco.es/grupos/kdis July 2012

datapro4j    The programmer’s guide Rev 1 (July 2012)   

More @ http://www.jrromero.net/en  KDIS Research Group  [1]   University of Córdoba, Spain   

CONTACT INFO

José Raúl Romero, PhD
Dept. Computer Science and Numerical Analysis
University of Córdoba, Spain

Web: http://www.jrromero.net/en

PARTICIPANTS (BY ALPHABETICAL ORDER)

• de la Torre López, José. BSc. [JTL]
• Luna, José María. MSc. [JML]
• Orozco Borrego, Mario. BSc. [MOB]
• Ramírez Quesada, Aurora. MSc. [ARQ]

PROJECT HISTORY

Version  Date               Description                                                                  Participants
0.1      July 2011          Initial version. Intruder algorithms.                                        ARQ, JTL, JML, JRR
0.2      September 2011     Strategies and columns                                                       MOB, JML, JRR
0.3      April 2012         Refactoring, performance improvements and testing                            ARQ, JML, JRR
0.4      Under development  Weka wrappers for preprocessing, association, clustering and classification  JRR
0.5      Under development  New dataset sources from relational databases and noSQL databases            JRR

DOCUMENT HISTORY

Revision  Date           Description                       Author
1         July 17, 2012  Initial version of this document  JRR


TABLE OF CONTENTS

TABLE OF FIGURES

INTRODUCTION
    PURPOSE
    SCOPE
    LICENSE
    OVERVIEW
    TO-DO LIST

PACKAGE ES::UCO::KDIS::DATAPRO

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::BASE
    CLASS DATASETSTRATEGY

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::INTRUDER
    CLASS AVERAGEATTACK
    CLASS BANDWAGONATTACK
    CLASS DATASETSTATISTICS
    CLASS INTRUDERATTACK
    CLASS LOVEHATEATTACK
    CLASS RANDOMATTACK
    CLASS REVERSEBANDWAGONATTACK
    CLASS SEGMENTATTACK

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING::DISCRETIZATION
    CLASS EQUALFREQUENCYDISCRETIZATION
    CLASS EQUALWIDTHDISCRETIZATION
    CLASS MDLPDISCRETIZE

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING::INSTANCE
    CLASS REMOVEDUPLICATES
    CLASS REMOVEPERCENTAGE

PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::VALIDATION
    CLASS KFOLDS

PACKAGE ES::UCO::KDIS::DATAPRO::DATASET
    CLASS DATASET
    CLASS FILEDATASET
    CLASS INSTANCEITERATOR
    INTERFACE IITERATOR

PACKAGE ES::UCO::KDIS::DATAPRO::DATASET::COLUMN
    CLASS COLUMNABSTRACTION
    CLASS COLUMNIMPL
    ENUMERATION COLUMNTYPE
    CLASS BINARYCOLUMN
    CLASS BINARYCOLUMNIMPL
    CLASS CATEGORICALCOLUMNIMPL
    CLASS DATECOLUMN
    CLASS DATECOLUMNIMPL
    CLASS INTEGERCOLUMN
    CLASS INTEGERCOLUMNIMPL
    CLASS NOMINALCOLUMN
    CLASS NOMINALCOLUMNIMPL
    CLASS NUMERICALCOLUMN
    CLASS NUMERICALCOLUMNIMPL
    CLASS RANGECOLUMN
    CLASS RANGECOLUMNIMPL

PACKAGE ES::UCO::KDIS::DATAPRO::DATASET::SOURCE
    CLASS ARFFDATASET
    CLASS CSVDATASET
    CLASS EXCELDATASET
    CLASS KEELDATASET

PACKAGE ES::UCO::KDIS::DATAPRO::DATATYPES
    CLASS INVALIDVALUE
    CLASS EMPTYVALUE
    CLASS MISSINGVALUE
    CLASS NULLVALUE
    CLASS RANGE
    CLASS DOUBLERANGE

PACKAGE ES::UCO::KDIS::DATAPRO::EXCEPTION
    CLASS ILLEGALFORMATSPECIFICATIONEXCEPTION
    CLASS NOSUCHCATEGORYEXCEPTION
    CLASS NOTADDEDVALUEEXCEPTION

APPENDIX A: UML DIAGRAMS
    PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
    PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
    PACKAGE ES.UCO.KDIS.DATAPRO.DATASET COLUMNS
    PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE

APPENDIX B: EXTENDING THE LIBRARY
    PROJECT STRUCTURE
    CODE DOCUMENTATION
    CODING RECOMMENDATIONS


TABLE OF FIGURES

PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
CLASS DATASETSTRATEGY
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.INTRUDER
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING.DISCRETIZATION
CLASS EQUALFREQUENCYDISCRETIZATION
CLASS EQUALWIDTHDISCRETIZATION
CLASS MDLPDISCRETIZE
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING.INSTANCE
CLASS REMOVEDUPLICATES
CLASS REMOVEPERCENTAGE
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.VALIDATION
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET
CLASS DATASET
CLASS FILEDATASET
CLASS INSTANCEITERATOR
INTERFACE IITERATOR
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.COLUMN
ABSTRACT CLASS COLUMNABSTRACTION
ABSTRACT CLASS COLUMNIMPL
ENUMERATION COLUMNTYPE
CLASS BINARYCOLUMN
CLASS BINARYCOLUMNIMPL
CLASS CATEGORICALCOLUMN
CLASS CATEGORICALCOLUMNIMPL
CLASS DATECOLUMN
CLASS DATECOLUMNIMPL
CLASS INTEGERCOLUMN
CLASS INTEGERCOLUMNIMPL
CLASS NOMINALCOLUMN
CLASS NOMINALCOLUMNIMPL
CLASS NUMERICALCOLUMN
CLASS NUMERICALCOLUMNIMPL
CLASS RANGECOLUMN
CLASS RANGECOLUMNIMPL
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE
CLASS ARFFDATASET
CLASS CSVDATASET
CLASS EXCELDATASET
CLASS KEELDATASET
PACKAGE ES.UCO.KDIS.DATAPRO.DATATYPES
CLASS INVALIDVALUE
CLASS EMPTYVALUE
CLASS MISSINGVALUE
CLASS NULLVALUE
CLASS RANGE
CLASS DOUBLERANGE
PACKAGE ES.UCO.KDIS.DATAPRO.EXCEPTION
CLASS ILLEGALFORMATSPECIFICATIONEXCEPTION
CLASS NOSUCHCATEGORYEXCEPTION
CLASS NOTADDEDVALUEEXCEPTION
CLASS DIAGRAM: PACKAGE OVERVIEW
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.COLUMN
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.DATATYPES
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.EXCEPTION


Introduction

Purpose

This document provides the class, interface, and enumeration specifications for the datapro4j library. The specifications give the details of the types modeled within the system.

The datapro4j library is conceived to fully support the efficient handling of data sets that come from different sources and declare different kinds of data types. This task often takes the Java programmer too long, especially in certain domains such as data analysis or data mining. Note that this library is not tied to a given application domain; it is intended for any application that requires handling structured data in Java from diverse data sources.

Therefore, datapro4j can be used in data mining for handling inputs or preprocessing data, using either internal strategies (e.g. algorithms on instances, discretization, etc.) or external tools (e.g. Weka or any other application). It can also be used for handling outputs: for example, migrating data to other formats, rearranging results from external tools or algorithms, executing statistical tests, etc.

It is worth mentioning that datapro4j is conceived to be extended by adding new algorithms, data formats, column types, etc. All these aspects are independent of each other, so algorithms can be executed regardless of the format in which the data is supplied (stored in a noSQL database, as an ARFF file, or in any other form).

Scope

This document is intended to define the class specification for the datapro4j library.

License

Copyright © 2012 University of Cordoba, Spain.

This software was developed by members of the Knowledge Discovery and Intelligent Systems at the University of Córdoba, Spain. For further information on the library and modifications, please visit the URL http://www.uco.es/grupos/kdis/datapro4j

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

Redistribution and use of binary forms, with or without modification, are permitted if the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer above.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

All advertising materials or publication mentioning features or use of this software must display the following acknowledgement: “This product includes software developed by the KDIS Research Group at the University of Córdoba (Spain) and its contributors.” or cite the following reference:

J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j


Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

Commercial use of this software or part of it is not allowed without specific prior written permission. Licensing and conditions are subject to change without notice.

Note: At the moment this software is provided in binary form as a Java library. Source code is not provided (we plan to release the Java source code in the near future).

Overview

This document provides a list of all packages with a summary for each. Each package has a section that contains a list of its classes, interfaces and enumeration types, with a summary for each. Each class and interface section contains a description, summary tables, detailed member descriptions, and a relation table.

Private properties are omitted. Protected properties are shown when useful for external programmers.

To-do list

In the near future, this library will be updated with the following features (not necessarily in this order):

• Listeners in strategies.
• Graphical UI (some minor support is already provided).
• Generation of synthetic datasets under precise constraints.
• Multipart datasets: datasets that cannot be fully stored in memory, so they need to be split and partially retrieved.
• Different data mining support.
• Wrappers for different datasets and tools.
  o A wrapper for Weka is under development.
• Access to different databases.
  o Access through JDBC to RDBMS engines (e.g. MySQL, Oracle) is under development.
  o Access to noSQL engines (e.g. Cassandra) is under development.
• More dataset formats:
  o Currently, the following formats are supported: ARFF, KEEL, CSV, Excel.
  o The following formats are under development: XRFF.


Package es::uco::kdis::datapro

The library base package. The software is mainly divided into three different components:

• Dataset and columns. The logical abstract representation of a dataset and its attributes.
• Dataset and sources. The physical representation of a dataset, serialized in files, stored in databases or any other device.
• Dataset and strategies. Any algorithm running on a single dataset, set of datasets or column.

Name: datapro
Qualified Name: es::uco::kdis::datapro


Package es::uco::kdis::datapro::algorithm

Only the public strategies are described here. Developers can easily provide their own strategies.

Figure 1. Package es.uco.kdis.datapro.algorithm

Name: algorithm
Qualified Name: es::uco::kdis::datapro::algorithm


Package es::uco::kdis::datapro::algorithm::base

Figure 2. Package es.uco.kdis.datapro.algorithm.base

Name: base
Qualified Name: es::uco::kdis::datapro::algorithm::base

Class DatasetStrategy

This class represents a generic strategy.

Strategy is a well-known design pattern in which algorithms are encapsulated into classes. Strategies can be executed either as a sequential or as a step-by-step process. In general, every strategy is executed according to the following sequence:

• Creation: the strategy constructor should collect all the parameters required by the algorithm to be initialized and executed for the first time. Build as many constructors as required.
• Initialization: the method initialize() implements any preprocessing step that the algorithm requires before it can be executed. This preprocessing is not part of the algorithm itself, but it has to be executed before the algorithm is invoked for the first time.
• Execution: the method execute() runs the algorithm once, using the parameters passed to the constructor and the state set up during initialization. If the algorithm has finished and cannot be invoked any more, it should call setExecutable(false). Otherwise, execution is allowed until the stop criteria are fulfilled.
• Stop criteria: the method isExecutable() returns true if the algorithm can be executed once more over the dataset; false otherwise.
• Post-execution: any post-processing step has to be implemented by the method postexec().
• Result collection: final results are collected from the dataset, if changed, and returned by the method getResult().
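The sequence above can be illustrated with a minimal, self-contained sketch. It does not use the library itself: SketchStrategy is a hypothetical stand-in that only mirrors the operations documented for DatasetStrategy, a plain list of numbers replaces the Dataset type, and RunningSumStrategy and the run() driver are invented for illustration. The way the caller loops on isExecutable() is an assumption about typical usage.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the operations documented for DatasetStrategy.
// It is NOT the library class: a List<Double> replaces the library's Dataset type.
abstract class SketchStrategy {
    protected boolean bExecutable = true;  // execution flag, true by default
    protected List<Double> oDataset;       // stand-in for the dataset attribute

    protected SketchStrategy(List<Double> data) {
        setDataset(data);
    }

    public abstract void initialize();   // preprocessing before the first run
    public abstract void execute();      // one run (or one step) of the algorithm
    public abstract void postexec();     // optional post-processing
    public abstract Object getResult();  // result collection

    public boolean isExecutable() { return bExecutable; }
    protected void setExecutable(boolean bExecutable) { this.bExecutable = bExecutable; }
    protected List<Double> getDataset() { return oDataset; }
    protected void setDataset(List<Double> data) { this.oDataset = data; }
}

// Invented example strategy: every call to execute() adds one value to a running sum.
class RunningSumStrategy extends SketchStrategy {
    private int iNext;
    private double dSum;

    RunningSumStrategy(List<Double> data) {
        super(data);
    }

    @Override
    public void initialize() {
        iNext = 0;
        dSum = 0.0;
        // Nothing to execute on an empty dataset
        setExecutable(!getDataset().isEmpty());
    }

    @Override
    public void execute() {
        dSum += getDataset().get(iNext++);
        // Stop criterion: all values have been consumed
        if (iNext >= getDataset().size()) {
            setExecutable(false);
        }
    }

    @Override
    public void postexec() {
        // No post-processing is needed in this toy example
    }

    @Override
    public Object getResult() {
        return dSum;
    }
}

class StrategyLifecycleSketch {
    // Drives a strategy through the documented sequence:
    // initialization, execution until the stop criteria hold, post-execution, result
    static Object run(SketchStrategy strategy) {
        strategy.initialize();
        while (strategy.isExecutable()) {
            strategy.execute();
        }
        strategy.postexec();
        return strategy.getResult();
    }

    public static void main(String[] args) {
        Object result = run(new RunningSumStrategy(Arrays.asList(1.0, 2.0, 3.5)));
        System.out.println(result);
    }
}
```

A sequential strategy would do all of its work in a single execute() call and then call setExecutable(false); the step-by-step variant shown here keeps the flag set until its last step.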

Figure 3. Class DatasetStrategy

Name: DatasetStrategy
Qualified Name: es::uco::kdis::datapro::algorithm::base::DatasetStrategy
Visibility: public
Abstract: true
Base Classifier: -
Realized Interface: -

Attribute Detail

bExecutable

Execution flag. This is protected only for inheritance purposes, and should never be modified directly.

Type: boolean
Default Value: true
Visibility: protected
Multiplicity: -

oDataset

Dataset used by the strategy.

Type: Dataset
Default Value: -
Visibility: protected
Multiplicity: -

Operation Detail

execute

This method is invoked to execute the strategy.

Type: void
Visibility: public
Is Abstract: true
Parameter: -

getDataset

Getter method for the dataset attribute.

Type: Dataset
Visibility: protected
Is Abstract: false
Parameter: -

getResult

This method returns an object holding the result of the process.

Type: Object
Visibility: public
Is Abstract: true
Parameter: -

initialize

This method runs the initialization process of the strategy.

Type: void
Visibility: public
Is Abstract: true
Parameter: -

isExecutable

This method returns true if the strategy is in an executable state.

Type: boolean
Visibility: public
Is Abstract: false
Parameter: -

postexec

This method should be invoked, if required, after the strategy execution.

Type: void
Visibility: public
Is Abstract: true
Parameter: -

setDataset

This method sets the dataset to be used by the strategy.

Type: void
Visibility: protected
Is Abstract: false
Parameter:
• inout data : Dataset

setExecutable

This method sets the current executable state of the strategy.

Type: void
Visibility: protected
Is Abstract: false
Parameter:
• in bExecutable : boolean

Relation Detail

Generalization

Related Element:
• EqualFrequencyDiscretization
• EqualWidthDiscretization
• MDLPDiscretize
• RemoveDuplicates
• IntruderAttack
• KFolds
• RemovePercentage
• DatasetStatistics


Package es::uco::kdis::datapro::algorithm::intruder

Figure 4. Package es.uco.kdis.datapro.algorithm.intruder

Name: intruder
Qualified Name: es::uco::kdis::datapro::algorithm::intruder

Class AverageAttack

This class implements the Average Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly, and their values are also drawn randomly from a Normal distribution, using the mean and standard deviation of each item.

For a further description see the following paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name: AverageAttack
Qualified Name: es::uco::kdis::datapro::algorithm::intruder::AverageAttack
Visibility: public
Abstract: false
Base Classifier: IntruderAttack
Realized Interface: -

Operation Detail

AverageAttack

Parameterized constructor.

• oDataset: The original dataset
• iNumAttacks: The number of attack instances
• bPush: The attack type (true, push; false, nuke)
• iTarget: The target item (the column attribute/item index)
• iNumFillers: The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand: The probability of choosing an item as a selected/filler item
• iSeed: The random seed

Type: -
Visibility: public
Is Abstract: false
Parameter:
• in bPush : boolean
• in dXRand : double
• in iNumAttacks : int
• in iNumFillers : int
• in iSeed : int
• in iTarget : int
• inout oDataset : Dataset

chooseSelectedItems

The Average Attack does not use the selected item set.

Type: void
Visibility: protected
Is Abstract: false
Parameter: -

initialize

Initialization method.

Type: void
Visibility: public
Is Abstract: false
Parameter: -

setFillerValues

In the Average Attack, the values for filler items must be randomly generated from a Normal distribution, using the mean and standard deviation of each item.

Type: void
Visibility: protected
Is Abstract: false
Parameter: -

setSelectedValues

The Average Attack does not use the selected item set.

Type: void
Visibility: protected
Is Abstract: false
Parameter: -

Relation Detail

Generalization

Related Element:
• IntruderAttack
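As an illustration of the Average Attack described above, the following sketch builds a single attack profile. It is not the library implementation: the class and method names are hypothetical, unrated items are represented as Double.NaN, only a fixed number of filler items is supported, and clamping the generated values to the rating range is an assumption that this guide does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of an Average Attack profile; not the library class.
class AverageAttackSketch {

    /**
     * Builds one attack profile (one row of ratings).
     * rgdMean and rgdSD hold the mean and standard deviation of each item;
     * dMin and dMax are the bounds of the rating scale.
     */
    static double[] buildProfile(double[] rgdMean, double[] rgdSD, double dMin, double dMax,
                                 boolean bPush, int iTarget, int iNumFillers, Random oRandom) {
        int n = rgdMean.length;
        double[] profile = new double[n];
        Arrays.fill(profile, Double.NaN); // NaN marks an unrated item

        // Target item: maximum rating for a push attack, minimum for a nuke attack
        profile[iTarget] = bPush ? dMax : dMin;

        // Filler items: chosen at random among the non-target items
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i != iTarget) {
                candidates.add(i);
            }
        }
        Collections.shuffle(candidates, oRandom);

        int count = Math.min(iNumFillers, candidates.size());
        for (int k = 0; k < count; k++) {
            int item = candidates.get(k);
            // Value drawn from a Normal distribution with the item's own mean and SD
            double value = rgdMean[item] + rgdSD[item] * oRandom.nextGaussian();
            // Assumption: keep the value inside the rating scale
            profile[item] = Math.max(dMin, Math.min(dMax, value));
        }
        return profile;
    }

    public static void main(String[] args) {
        double[] mean = {3.0, 4.0, 2.0, 3.5, 1.5};
        double[] sd = {1.0, 0.5, 1.0, 0.2, 0.3};
        double[] profile = buildProfile(mean, sd, 1.0, 5.0, true, 2, 3, new Random(7));
        System.out.println(Arrays.toString(profile));
    }
}
```

Passing the same seed to Random reproduces the same profile, which matches the role of the iSeed constructor parameter.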


Class BandwagonAttack This class implements the Bandwagon Attack. This attack strategy sets the maximum value (push attack) to the target item. Then, a set of items, named selected items, are chosen between the most visibility items.

The visibility items are those having a high mean and high evaluation density. For a further description see the following paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name BandwagonAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::BandwagonAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface

Attribute Detail

dDensity

The density threshold, i.e. the minimum number of values in the column.

Type double Default Value Visibility protected Multiplicity

dVisibility

The visibility threshold, i.e., the possibility of choose an item to act as selected item.

Type double Default Value Visibility protected Multiplicity

rgdMeanSD

It stores the mean and standard deviation of the overall dataset.

Type Double Default Value new ArrayList<Double>()

Visibility protected Multiplicity 0..*

rgoVisibilityColumns

The array of columns whose visibility exceed the thresholds dXVisibility and dXDensity.

Type Integer Default Value new ArrayList<Integer>()

Visibility package Multiplicity 0..*

Page 20: datapro4j Programmer's guide...datapro4j The data processing library for Java The programmer’s guide Revision: 1 Please, cite this document as: J.R. Romero, J.M. Luna, S. Ventura

datapro4j    The programmer’s guide Rev 1 (July 2012)   

More @ http://www.jrromero.net/en  KDIS Research Group  [19]   University of Córdoba, Spain   

rgoVisibilityMeans

The array of mean columns whose visibility exceed the thresholds dXVisibility and dXDensity.

Type Double Default Value new ArrayList<Double>() Visibility package Multiplicity 0..*

Operation Detail

BandwagonAttack

Parameterized Constructor:

• oDataset The original dataset • iNumAttacks The number of attack instances • iTarget The target item (The column attribute/item index) • iNumFillers The size of filler item set: -1 for randomly size, >0 for fixed size • iNumSelected The size of selected item set • dVisibility The visibility threshold (absolute value of column mean). • dDensity The density threshold (absolute value of instances without counting null, empty

or missing values in the column) • dXRand The possibility of choose an item as filler item • iSeed The random seed

Type Visibility public Is Abstract false Parameter • in dDensity : double

• in dVisibility : double • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

chooseSelectedItems

Create the set of selected items. The size is prefixed by iNumSelected property.

Type void Visibility protected Is Abstract false Parameter

initialize

Initialization method for the strategy.

Page 21: datapro4j Programmer's guide...datapro4j The data processing library for Java The programmer’s guide Revision: 1 Please, cite this document as: J.R. Romero, J.M. Luna, S. Ventura

datapro4j    The programmer’s guide Rev 1 (July 2012)   

More @ http://www.jrromero.net/en  KDIS Research Group  [20]   University of Córdoba, Spain   

Type void Visibility public Is Abstract false Parameter

orderArray

Order the columns using their mean as the comparison metric. This method implements the QuickSort algorithm.

• iInit The initial position in the array

• iEnd The end position in the array

Type void Visibility protected Is Abstract false Parameter • in iEnd : int

• in iInit : int
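To illustrate the idea of ordering columns by their mean with QuickSort between two positions, the following is a minimal sketch. It sorts an array of column indices against a parallel array of means; the class name MeanQuickSort and the array-based representation are assumptions made for the example, not the library's internal data structures.

```java
/** Hypothetical helper: sorts column indices by ascending column mean (QuickSort). */
final class MeanQuickSort {

    /** Sorts cols[iInit..iEnd] (inclusive) so that means[cols[i]] is ascending. */
    static void orderArray(int[] cols, double[] means, int iInit, int iEnd) {
        if (iInit >= iEnd) {
            return; // zero or one element: nothing to sort
        }
        // Pivot is the mean of the column in the middle of the range.
        double pivot = means[cols[iInit + (iEnd - iInit) / 2]];
        int i = iInit;
        int j = iEnd;
        while (i <= j) {
            while (means[cols[i]] < pivot) {
                i++;
            }
            while (means[cols[j]] > pivot) {
                j--;
            }
            if (i <= j) {
                int tmp = cols[i];
                cols[i] = cols[j];
                cols[j] = tmp;
                i++;
                j--;
            }
        }
        // Recurse on the two partitions.
        orderArray(cols, means, iInit, j);
        orderArray(cols, means, i, iEnd);
    }
}
```

Calling orderArray(cols, means, 0, cols.length - 1) sorts the whole index array in place.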

setFillerValues

In the Bandwagon Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the overall dataset.

Type void Visibility protected Is Abstract false Parameter
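The generation of filler values from a normal distribution can be sketched as follows. This is only an illustration: the class NormalFillerSketch, its parameters, and the clamping of each value to the valid range [min, max] are assumptions, since the guide does not say how out-of-range values are handled.

```java
import java.util.Random;

/** Hypothetical helper: draws filler values from a normal distribution. */
final class NormalFillerSketch {

    /**
     * Draws n values from a normal distribution with the given mean and
     * standard deviation. Clamping to [min, max] is an assumption.
     */
    static double[] draw(int n, double mean, double sd, double min, double max, long seed) {
        Random rand = new Random(seed); // a fixed seed makes the attack reproducible
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            double v = mean + sd * rand.nextGaussian();
            values[i] = Math.max(min, Math.min(max, v));
        }
        return values;
    }
}
```

Using the same seed twice yields the same filler values, which mirrors the role of the iSeed constructor parameter.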

setSelectedValues

Set the values of selected items. In the Bandwagon Attack, each selected item has the maximum value.

Type void Visibility protected Is Abstract false Parameter

setVisibilityColumns

Select the columns that exceed the visibility and density thresholds.

Type void Visibility protected Is Abstract false Parameter
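A possible reading of this selection step is sketched below. It assumes the dataset is a plain matrix in which NaN marks a missing value, that visibility is the column mean, and that density is the count of non-missing values; the class VisibilityFilterSketch and the strict comparisons are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps the columns whose mean and density exceed two thresholds. */
final class VisibilityFilterSketch {

    /** Returns the indices of the columns with mean > dVisibility and non-missing count > dDensity. */
    static List<Integer> visibleColumns(double[][] data, double dVisibility, double dDensity) {
        List<Integer> result = new ArrayList<>();
        if (data.length == 0) {
            return result;
        }
        int numCols = data[0].length;
        for (int c = 0; c < numCols; c++) {
            double sum = 0.0;
            int count = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[c])) { // NaN stands for a missing value (assumption)
                    sum += row[c];
                    count++;
                }
            }
            if (count > dDensity && sum / count > dVisibility) {
                result.add(c);
            }
        }
        return result;
    }
}
```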

Relation Detail

Generalization

Name Related Element • ReverseBandwagonAttack

Name Related Element • IntruderAttack

Class DatasetStatistics

Name DatasetStatistics Qualified Name es::uco::kdis::datapro::algorithm::intruder::DatasetStatistics Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

DatasetStatistics

Constructor. A parameter is required:

• data Dataset over which the statistical strategy will be executed.

Type Visibility public Is Abstract false Parameter • inout data : Dataset

execute

It executes the algorithm.

Type void Visibility public Is Abstract false Parameter

getResult

It returns the mean and SD in form of an ArrayList of Double values.

Type ArrayList<Double> Visibility public Is Abstract false Parameter
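As an illustration of a result holding the mean and the standard deviation in an ArrayList of Double values, here is a minimal sketch. It assumes a matrix representation with NaN for missing cells and uses the population standard deviation; the guide does not say whether the library uses the population or the sample formula, so that choice and the class name MeanAndSdSketch are assumptions.

```java
import java.util.ArrayList;

/** Hypothetical helper: global mean and standard deviation of a dataset. */
final class MeanAndSdSketch {

    /** Returns [mean, standard deviation] over all non-missing (non-NaN) cells. */
    static ArrayList<Double> compute(double[][] data) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    sum += v;
                    n++;
                }
            }
        }
        double mean = n == 0 ? Double.NaN : sum / n;

        // Second pass: sum of squared deviations from the mean.
        double sq = 0.0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    sq += (v - mean) * (v - mean);
                }
            }
        }
        double sd = n == 0 ? Double.NaN : Math.sqrt(sq / n); // population SD (assumption)

        ArrayList<Double> result = new ArrayList<>();
        result.add(mean);
        result.add(sd);
        return result;
    }
}
```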

initialize

Initialization/pre-processing method for the strategy.

Type void Visibility public Is Abstract false Parameter

postexec

Type void Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • DatasetStrategy

Class IntruderAttack

IntruderAttack is the abstract base class for all the intruder attack algorithms. This class represents a generic attack used to alter the content of a dataset. It extends DatasetStrategy, whose methods are implemented and adapted to a general intruder strategy.

For a further description see the paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name IntruderAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::IntruderAttack Visibility public Abstract true Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

bPush

bPush represents the version of the algorithm (true, for push attack; false for nuke attack).

Type boolean Default Value Visibility protected Multiplicity

dXRand

dXRand represents the probability of choosing an item (attribute) as a filler item.

Type double Default Value Visibility protected Multiplicity

iActualInstance

iActualInstance represents the dataset instance modified by the attack.

Type int Default Value Visibility protected Multiplicity

iNumAttacks

iNumAttacks represents the number of attack instances that will be generated.

Type int Default Value Visibility protected Multiplicity

iNumFillers

iNumFillers is the number of filler items, -1 if the filler item set size is randomly chosen.

Type int Default Value Visibility protected Multiplicity

iNumSelected

iNumSelected is the number of selected items, -1 if the selected item set size is randomly chosen.

Type int Default Value Visibility protected Multiplicity

iSeed

iSeed is the seed for the oRand object.

Type int Default Value Visibility protected Multiplicity

iTarget

iTarget is the target attribute of the attack.

Type int Default Value Visibility protected Multiplicity

oInjection

oInjection stores the attack instances.

Type Dataset Default Value Visibility protected Multiplicity

oRand

oRand represents a random object.

Type Random Default Value Visibility protected Multiplicity

rgoFillers

rgoFillers is the set of filler items.

Type ColumnAbstraction Default Value new ArrayList<ColumnAbstraction>() Visibility protected Multiplicity 0..*

rgoSelected

rgoSelected is the set of selected items.

Type ColumnAbstraction Default Value new ArrayList<ColumnAbstraction>() Visibility protected Multiplicity 0..*

Operation Detail

addAttack

Add a new instance (with all items set to the missing value) to the injection.

Type void Visibility protected Is Abstract false Parameter

chooseFillerItems

Select the set of filler items. This set is common for all the intruder attack algorithms.

Type void Visibility protected Is Abstract false Parameter

chooseSelectedItems

Select the set of selected items. The selection process is part of a specific intruder attack algorithm.

Type void Visibility protected Is Abstract true Parameter

createRandomSetOfFiller

Select a random set of columns to act as filler items. The set size is also randomly selected. It returns the array of dataset columns that will act as filler items.

Type ArrayList<ColumnAbstraction> Visibility protected Is Abstract false Parameter

createSetOfFiller

Select a random set of columns to act as filler items. The set size is fixed by the iNumFillers property. It returns the array of dataset columns that will act as filler items.

Type ArrayList<ColumnAbstraction> Visibility protected Is Abstract false Parameter
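The two filler selection variants (random size and fixed size) can be sketched as below. The sketch works on column indices rather than ColumnAbstraction objects; excluding the target column, using a shuffle for the fixed-size case, and treating dXRand as a per-column inclusion probability are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: chooses which columns act as filler items. */
final class FillerSelectionSketch {

    /** Fixed-size variant: picks numFillers distinct columns, never the target. */
    static List<Integer> createSetOfFiller(int numColumns, int target, int numFillers, Random rand) {
        List<Integer> candidates = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            if (c != target) {
                candidates.add(c);
            }
        }
        Collections.shuffle(candidates, rand);
        int size = Math.min(numFillers, candidates.size());
        return new ArrayList<>(candidates.subList(0, size));
    }

    /** Random-size variant: each non-target column is kept with probability xRand. */
    static List<Integer> createRandomSetOfFiller(int numColumns, int target, double xRand, Random rand) {
        List<Integer> fillers = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            if (c != target && rand.nextDouble() < xRand) {
                fillers.add(c);
            }
        }
        return fillers;
    }
}
```

A caller would pick the variant from iNumFillers: the random-size one for -1 and the fixed-size one for any positive value.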

execute

Implements the strategy of attack algorithms.

Type void Visibility public Is Abstract false Parameter

getMeanAndSD

Calculate the mean and standard deviation of the overall dataset. It returns an array with two elements, mean and standard deviation.

Type ArrayList<Double> Visibility protected Is Abstract false Parameter

getResult

Return the dataset injection created. It returns the object comprising the injection after the attack.

Type Object Visibility public Is Abstract false Parameter

initialize

Initialize the algorithm to prepare the execution.

Type void Visibility public Is Abstract false Parameter

isSelectedColumn

Returns true if rgoSelected contains a column with the name given in sName, false otherwise.

• sName The name of the column to search for

Type boolean Visibility protected Is Abstract false Parameter • inout sName : String

postexec

Post-processing after the execute method.

Type void Visibility public Is Abstract false Parameter

setFillerValues

This method assigns the correct value for each filler item. It depends on the intruder attack algorithm.

Type void Visibility protected Is Abstract true Parameter

setMaximumValue

Assign the maximum value to the target item.

Type void Visibility protected Is Abstract false Parameter

setMinimumValue

Assign the minimum value to the target item.

Type void Visibility protected Is Abstract false Parameter

setSelectedValues

Generates the values of the selected items. It also depends on the specific intruder attack algorithm.

Type void Visibility protected Is Abstract true Parameter
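The split between concrete and abstract operations listed above suggests a template method design: the base class fixes the overall flow and each attack supplies the abstract steps. The sketch below illustrates that pattern only. The class names, the logging of steps, and in particular the order of the calls inside execute are assumptions, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class illustrating the template method pattern. */
abstract class AttackSketch {
    protected final boolean bPush;
    private final List<String> steps = new ArrayList<>();

    AttackSketch(boolean bPush) {
        this.bPush = bPush;
    }

    protected void record(String step) {
        steps.add(step);
    }

    public List<String> steps() {
        return steps;
    }

    // Steps shared by every attack (concrete in the base class).
    protected void chooseFillerItems() { record("chooseFillerItems"); }
    protected void setMaximumValue() { record("target=max"); }
    protected void setMinimumValue() { record("target=min"); }

    // Steps that each concrete attack must provide.
    protected abstract void chooseSelectedItems();
    protected abstract void setFillerValues();
    protected abstract void setSelectedValues();

    /** Template method; the order of the steps is an assumption. */
    public final void execute() {
        chooseFillerItems();
        chooseSelectedItems();
        setFillerValues();
        setSelectedValues();
        if (bPush) {
            setMaximumValue(); // push attack: promote the target item
        } else {
            setMinimumValue(); // nuke attack: demote the target item
        }
    }
}

/** Hypothetical concrete attack that does not use selected items. */
final class NoSelectedAttackSketch extends AttackSketch {
    NoSelectedAttackSketch(boolean bPush) {
        super(bPush);
    }

    @Override protected void chooseSelectedItems() { /* not used by this attack */ }
    @Override protected void setFillerValues() { record("setFillerValues"); }
    @Override protected void setSelectedValues() { /* not used by this attack */ }
}
```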

Relation Detail

Generalization

Name Related Element • AverageAttack

Name Related Element • DatasetStrategy

Name Related Element • RandomAttack

Name Related Element • LoveHateAttack

Name Related Element • BandwagonAttack

Name Related Element • SegmentAttack

Class LoveHateAttack

This class implements the Love/Hate Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are assigned in the opposite sense of the target item.

For a further description see the paper:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name LoveHateAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::LoveHateAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface

Operation Detail

chooseSelectedItems

The Love/Hate Attack does not use the selected items.

Type void Visibility protected Is Abstract false Parameter

initialize

Initialization method.

Type void Visibility public Is Abstract false Parameter

LoveHateAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• bPush The attack type (true, push; false, nuke)

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type Visibility public Is Abstract false Parameter • in bPush : boolean

• in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int

• inout oDataset : Dataset

setFillerValues

In the Love/Hate Attack, the values for filler items must be assigned in the opposite sense of the type of attack. If it is a push attack, all the filler items will be set to minimum value; if it is a nuke attack, all the filler items will be set to maximum value.

Type void Visibility protected Is Abstract false Parameter
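The rule above can be illustrated with a small sketch that builds a single attack instance as an array of values. The class LoveHateProfileSketch, the use of NaN for items that are neither target nor filler, and the explicit min/max arguments are assumptions for the example.

```java
import java.util.Arrays;

/** Hypothetical helper: builds one Love/Hate attack instance. */
final class LoveHateProfileSketch {

    /**
     * Push attack: target = max, fillers = min.
     * Nuke attack: target = min, fillers = max.
     * All other items stay missing (NaN, by assumption).
     */
    static double[] build(int numColumns, int target, int[] fillers,
                          boolean bPush, double min, double max) {
        double[] row = new double[numColumns];
        Arrays.fill(row, Double.NaN);
        for (int f : fillers) {
            row[f] = bPush ? min : max; // opposite sense of the target
        }
        row[target] = bPush ? max : min;
        return row;
    }
}
```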

setSelectedValues

The Love/Hate Attack does not use the selected items.

Type void Visibility protected Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • IntruderAttack

Class RandomAttack

This class implements the Random Attack. This attack strategy sets the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly and their values are also chosen with a Normal Distribution, using the global dataset mean and standard deviation.

For a further description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name RandomAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::RandomAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

chooseSelectedItems

The Random Attack does not use the selected items.

Type void Visibility protected Is Abstract false Parameter

initialize

Initialization method.

Type void Visibility public Is Abstract false Parameter

RandomAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• bPush The attack type (true, push; false, nuke)

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type Visibility public Is Abstract false Parameter • in bPush : boolean

• in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

setFillerValues

In the Random Attack, the values for filler items must be randomly generated by a Normal Distribution, using the mean and standard deviation of the dataset.

Type void Visibility protected Is Abstract false Parameter

setSelectedValues

The Random Attack does not use the selected items.

Type void Visibility protected Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • IntruderAttack

Class ReverseBandwagonAttack

This class implements the Reverse Bandwagon Attack. This attack strategy sets the minimum value (nuke attack) to the target item. Then, a set of items, named selected items, is chosen among the least visible items, that is, those having a low mean and a high evaluation density.

For a better description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.

Name ReverseBandwagonAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::ReverseBandwagonAttack Visibility public Abstract false Base Classifier • BandwagonAttack Realized Interface

Operation Detail

chooseSelectedItems

Create the set of selected items. Its size is fixed by the iNumSelected property.

Type void Visibility protected Is Abstract false Parameter

initialize

Initialization method.

Type void Visibility public Is Abstract false Parameter

ReverseBandwagonAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• iNumSelected The size of the selected item set: -1 for a random size, >0 for a fixed size

• dXVisibility The visibility threshold

• dXDensity The density threshold

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type Visibility public Is Abstract false Parameter • in dXDensity : double

• in dXRand : double • in dXVisibility : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset

setSelectedValues

Set the values of selected items. In the Reverse Bandwagon Attack, each selected item has the minimum value.

Type void Visibility protected Is Abstract false Parameter

setVisibilityColumns

Select the columns that exceed the visibility and density thresholds.

Type void Visibility protected Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • BandwagonAttack

Class SegmentAttack

This class implements the Segment Attack. This attack strategy sets the maximum value (push attack) to the target item. Then, a set of selected items (the segment) is set to the maximum value. Finally, a set of filler items is randomly chosen and set to the minimum value.

For a better description read the article:

B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. 7(4):1-23, 2007.

Name SegmentAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::SegmentAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface

Attribute Detail

rgdMeanSD

rgdMeanSD stores the mean and standard deviation of the overall dataset.

Type Double Default Value new ArrayList<Double>() Visibility protected Multiplicity 0..*

Operation Detail

chooseSelectedItems

Create the segment, that is, the set of selected items, with the information given in rgsNamesOfSelected. It returns the array of dataset columns that will act as selected items.

Type void Visibility protected Is Abstract false Parameter

initialize

Initialization method.

Type void Visibility public Is Abstract false Parameter

SegmentAttack

Parameterized Constructor:

• oDataset The original dataset

• iNumAttacks The number of attack instances

• iTarget The target item (The column attribute/item index)

• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size

• rgsNamesOfSelected The array with the names of the columns that will act as selected items (the segment)

• dXRand The probability of choosing an item as a selected/filler item

• iSeed The random seed

Type Visibility public Is Abstract false Parameter • in dXRand : double

• in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset • inout rgsNamesOfSelected : ArrayList<String>

setFillerValues

Set the value for filler items. In the Segment Attack, the minimum value is assigned.

Type void Visibility protected Is Abstract false Parameter

setSelectedValues

Set the values for the selected items. In the Segment Attack, the maximum value is assigned.

Type void Visibility protected Is Abstract false Parameter
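Putting the Segment Attack operations together, the following sketch resolves the segment from a list of column names and then fills one attack instance. It is an illustration under assumptions: columns are identified by index, unknown names are silently skipped, and NaN marks items that are left missing; none of these details come from the guide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds one Segment Attack instance. */
final class SegmentProfileSketch {

    /** Maps the segment column names to column indices; unknown names are skipped (assumption). */
    static List<Integer> resolve(List<String> columnNames, List<String> namesOfSelected) {
        List<Integer> selected = new ArrayList<>();
        for (String name : namesOfSelected) {
            int idx = columnNames.indexOf(name);
            if (idx >= 0) {
                selected.add(idx);
            }
        }
        return selected;
    }

    /** Target and selected items get the maximum value, filler items the minimum. */
    static double[] build(int numColumns, int target, List<Integer> selected,
                          List<Integer> fillers, double min, double max) {
        double[] row = new double[numColumns];
        Arrays.fill(row, Double.NaN); // untouched items stay missing
        for (int f : fillers) {
            row[f] = min;
        }
        for (int s : selected) {
            row[s] = max;
        }
        row[target] = max;
        return row;
    }
}
```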

Relation Detail

Generalization

Name Related Element • IntruderAttack

Package es::uco::kdis::datapro::algorithm::preprocessing

Figure 5. Package es.uco.kdis.datapro.algorithm.preprocessing

Name preprocessing Qualified Name es::uco::kdis::datapro::algorithm::preprocessing

Package es::uco::kdis::datapro::algorithm::preprocessing::discretization

Figure 6. Package es.uco.kdis.datapro.algorithm.preprocessing.discretization

Name discretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization

Class EqualWidthDiscretization

Equal-width discretization of a given numerical/integer column of the dataset. A RangeColumn is returned. Notice that EqualFrequencyDiscretization inherits from this class.

Figure 7. Class EqualWidthDiscretization

Name EqualWidthDiscretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualWidthDiscretization Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

iBins

iBins is the number of bins.

Type int Default Value Visibility protected Multiplicity

oCol

The column to be discretized.

Type NumericalColumn Default Value Visibility protected Multiplicity

oRangeColumn

The column returned as result.

Type RangeColumn Default Value Visibility protected Multiplicity

sColName

The name of the column to be discretized.

Type String Default Value Visibility protected Multiplicity

sResName

The name of the resulting column.

Type String Default Value Visibility protected Multiplicity

Operation Detail

calculateDRangeColumn

This (protected) method creates a new RangeColumn taking both the intervals given as parameter and the values comprised by the original numerical column.

• aoRanges Array of intervals

• sName Name of the new column

It returns the resulting RangeColumn.

Type RangeColumn Visibility protected Is Abstract false Parameter • inout aoRanges : DoubleRange • inout sName : String

EqualWidthDiscretization

Parameterized Constructor:

• oDataset The dataset to be processed.

• iBins The number of bins.

• sColName The name of the column to be processed.

• sResName The name of the resulting column.

Type Visibility public Is Abstract false Parameter • in iBins : int

• inout oDataset : Dataset • inout sColName : String • inout sResName : String

execute

This method runs the discretization process. Firstly, it calculates the cut-points and sets the range intervals.

Type void Visibility public Is Abstract false Parameter
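The cut-point calculation for equal-width binning can be sketched as follows. The sketch works on a plain array of values instead of a NumericalColumn, and the choice of half-open intervals (a value equal to a cut-point goes to the upper bin) is an assumption; the class name EqualWidthSketch is hypothetical.

```java
/** Hypothetical helper: equal-width cut-points and bin lookup. */
final class EqualWidthSketch {

    /** Returns the iBins - 1 interior cut-points that split [min, max] into equal-width bins. */
    static double[] cutPoints(double[] values, int iBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / iBins;
        double[] cuts = new double[iBins - 1];
        for (int i = 1; i < iBins; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    /** Bin index of v: the number of cut-points that are less than or equal to v. */
    static int binOf(double v, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && v >= cuts[bin]) {
            bin++;
        }
        return bin;
    }
}
```

For values ranging from 0 to 10 and four bins, the bin width is 2.5 and the cut-points are 2.5, 5.0 and 7.5.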

getResult

The discretized RangeColumn is returned.

Type Object Visibility public Is Abstract false Parameter

initialize

The initialization method. Types of the column and its values are checked.

Type void Visibility public Is Abstract false Parameter

postexec

Not required.

Type void Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • EqualFrequencyDiscretization

Name Related Element • DatasetStrategy

Class EqualFrequencyDiscretization

Equal-frequency discretization of a given numerical/integer column of the dataset. A RangeColumn is returned.

Figure 8. Class EqualFrequencyDiscretization

Name EqualFrequencyDiscretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualFrequencyDiscretization Visibility public Abstract false Base Classifier • DatasetStrategy • EqualWidthDiscretization Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

Notice that this class is inherited from EqualWidthDiscretization.

EqualFrequencyDiscretization

Parametrized constructor.

Parameters:

• iBins Number of bins to be created

• oDataset Source dataset containing the column to be discretized

• sColName Name of the source column

• sResName Name of the resulting Range column

Type Visibility public Is Abstract false Parameter • in iBins : int

• inout oDataset : Dataset • inout sColName : String • inout sResName : String

execute

This method performs the equal-frequency discretization of the column passed as a parameter.

Type void Visibility public Is Abstract false Parameter
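Equal-frequency binning places roughly the same number of values in each bin. A minimal sketch of one way to derive the cut-points is given below. Placing each cut-point midway between the two neighbouring sorted values and ignoring ties are assumptions, and the class name EqualFrequencySketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: equal-frequency cut-points. */
final class EqualFrequencySketch {

    /** Cut-points chosen so that each bin receives about values.length / iBins values. */
    static double[] cutPoints(double[] values, int iBins) {
        if (iBins < 1 || values.length < iBins) {
            throw new IllegalArgumentException("need at least one value per bin");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] cuts = new double[iBins - 1];
        for (int i = 1; i < iBins; i++) {
            int pos = i * sorted.length / iBins; // index of the first value of bin i
            // Midpoint between the last value of bin i-1 and the first value of bin i.
            cuts[i - 1] = (sorted[pos - 1] + sorted[pos]) / 2.0;
        }
        return cuts;
    }
}
```

Unlike equal-width binning, the resulting intervals adapt to the data: outliers such as 100 and 200 in an otherwise small-valued column end up sharing a bin instead of stretching every interval.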

Relation Detail

Generalization

Name Related Element • DatasetStrategy

Name Related Element • EqualWidthDiscretization

Class MDLPDiscretize

Figure 9. Class MDLPDiscretize

Name MDLPDiscretize Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::MDLPDiscretize Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

execute

This method runs the discretization process following the MDLP algorithm.

Type void Visibility public Is Abstract false Parameter
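The guide does not detail the algorithm. Assuming that MDLP refers to the entropy-based discretization with the minimum description length stopping criterion of Fayyad and Irani, the test that decides whether a candidate cut-point is accepted can be sketched as follows. The class MdlpSketch and the representation of each side of the cut as an array of per-class counts are assumptions for illustration.

```java
/** Hypothetical helper: MDLP acceptance test for a candidate cut-point. */
final class MdlpSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Shannon entropy (in bits) of a vector of per-class counts. */
    static double entropy(int[] counts) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of classes that actually occur in the counts. */
    static int classes(int[] counts) {
        int k = 0;
        for (int c : counts) {
            if (c > 0) {
                k++;
            }
        }
        return k;
    }

    /** Returns true if the information gain of the split exceeds the MDLP threshold. */
    static boolean acceptSplit(int[] left, int[] right) {
        int[] all = new int[left.length];
        int n1 = 0;
        int n2 = 0;
        for (int i = 0; i < left.length; i++) {
            all[i] = left[i] + right[i];
            n1 += left[i];
            n2 += right[i];
        }
        int n = n1 + n2;
        double ent = entropy(all);
        double ent1 = entropy(left);
        double ent2 = entropy(right);
        double gain = ent - ((double) n1 / n) * ent1 - ((double) n2 / n) * ent2;
        int k = classes(all);
        int k1 = classes(left);
        int k2 = classes(right);
        double delta = log2(Math.pow(3, k) - 2) - (k * ent - k1 * ent1 - k2 * ent2);
        return gain > (log2(n - 1) + delta) / n;
    }
}
```

A cut that separates two classes perfectly is accepted, while a cut that leaves both sides as mixed as the whole set is rejected, so the recursion stops there.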

getResult

It returns the discretized dataset.

Type Object Visibility public Is Abstract false Parameter

initialize

The initialize() strategy method. It takes the whole dataset and stores each column in a LinkedList of double arrays, where the first value of each array is the value in the column and the second is its associated class label.

Type void Visibility public Is Abstract false Parameter

MDLPDiscretize

Constructor with parameters:

• oDataset The source dataset

Note: class labels are supposed to be in the last column of the dataset.

Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset

postexec

The postexec() strategy method

Type void Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • DatasetStrategy

Package es::uco::kdis::datapro::algorithm::preprocessing::instance

Figure 10. Package es.uco.kdis.datapro.algorithm.preprocessing.instance

Name instance Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance

Class RemoveDuplicates

This class modifies the content of a Dataset by removing duplicate instances from this dataset.

Figure 11. Class RemoveDuplicates

Name RemoveDuplicates Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance::RemoveDuplicates Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

execute

Execution method.

Type void Visibility public Is Abstract false Parameter

getResult

It returns the clean dataset.

Type Object Visibility public Is Abstract false Parameter

initialize

Initialize the algorithm to prepare the execution.

Type void Visibility public Is Abstract false Parameter

postexec

Post-processing.

Type void Visibility public Is Abstract false Parameter

RemoveDuplicates

Parameterized Constructor:

• oDataset The source dataset to work with.

Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset
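The effect of removing duplicate instances can be illustrated with standard Java collections. The sketch below treats an instance as a list of values and keeps the first occurrence of each one; the class RemoveDuplicatesSketch and this representation are assumptions and do not reflect how the Dataset class stores its rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: drops repeated instances while preserving order. */
final class RemoveDuplicatesSketch {

    /** Returns the instances without duplicates, keeping the first occurrence of each. */
    static List<List<Object>> removeDuplicates(List<List<Object>> instances) {
        // List.equals and List.hashCode compare element by element,
        // so two instances with the same values count as duplicates.
        Set<List<Object>> seen = new LinkedHashSet<>(instances);
        return new ArrayList<>(seen);
    }
}
```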

Relation Detail

Generalization

Name Related Element • DatasetStrategy

Class RemovePercentage

This class modifies the content of a dataset by removing a percentage of its instances.

Figure 12. Class RemovePercentage

Name RemovePercentage Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance::RemovePercentage Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail

RANDOM

RANDOM mode, when instances to be removed are randomly selected.

Type int Default Value 0 Visibility public Multiplicity

FROMINIT

FROMINIT mode, when instances to be removed are taken from the beginning of the column.

Type int Default Value 1 Visibility public Multiplicity

FROMEND

FROMEND mode, when instances to be removed are taken from the end of the column.

Type int Default Value 2 Visibility public Multiplicity

oRnd

oRnd is the random generator object.

Type Random Default Value new Random() Visibility public Multiplicity

Operation Detail

execute

Execute method.

Type void Visibility public Is Abstract false Parameter

getResult

Return the resulting dataset from the strategy process.

Type Object Visibility public Is Abstract false Parameter

initialize

Initialize the algorithm to prepare the execution.

Type void Visibility public Is Abstract false Parameter

postexec

Post-processing method.

Type void Visibility public Is Abstract false Parameter

RemovePercentage

Parameterized Constructor:

• oDataset The source dataset

• iMode The mode of removal

• dPercentage The percentage of instances (in [0,1]) to remove from the dataset

Type Visibility public Is Abstract false Parameter • in dPercentage : double

• in iMode : int • inout oDataset : Dataset

Relation Detail

Generalization

Name Related Element • DatasetStrategy
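
The three removal modes can be sketched as follows. This is an illustration only, not the library's implementation: rows are modelled as a generic list, the class RemovePercentageSketch and its method remove are hypothetical, and rounding the number of removed instances down is an assumption. Only the mode values 0, 1 and 2 and the [0,1] range of the percentage come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the three removal modes; rows are modelled as a generic list.
final class RemovePercentageSketch {

    // Mode constants mirror the values documented for RemovePercentage.
    static final int RANDOM = 0;
    static final int FROMINIT = 1;
    static final int FROMEND = 2;

    static <T> List<T> remove(List<T> rows, int mode, double percentage, Random rnd) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be in [0,1]");
        }
        // Assumption: the number of instances to remove is rounded down.
        int toRemove = (int) Math.floor(rows.size() * percentage);
        List<T> result = new ArrayList<>(rows);
        switch (mode) {
            case FROMINIT:
                // Drop the first toRemove rows.
                return new ArrayList<>(result.subList(toRemove, result.size()));
            case FROMEND:
                // Drop the last toRemove rows.
                return new ArrayList<>(result.subList(0, result.size() - toRemove));
            case RANDOM:
                // Drop toRemove rows at randomly chosen positions.
                for (int i = 0; i < toRemove; i++) {
                    result.remove(rnd.nextInt(result.size()));
                }
                return result;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

Passing a seeded Random makes the RANDOM mode repeatable, which is convenient in tests.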

Package es::uco::kdis::datapro::algorithm::validation

Figure 13. Package es.uco.kdis.datapro.algorithm.validation

Name validation Qualified Name es::uco::kdis::datapro::algorithm::validation

Class KFolds This class implements the strategy that calculates the different partitions of the dataset using the KFolds algorithm.

Figure 14. Class es.uco.kdis.datapro.algorithm.validation.KFolds

Name KFolds Qualified Name es::uco::kdis::datapro::algorithm::validation::KFolds Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface

Attribute Detail All attributes are private.

Operation Detail

execute

It runs the KFolds algorithm. After the execution, the algorithm is not executable anymore.

Type void Visibility public Is Abstract false Parameter

getResult

This method returns the list containing the resulting dataset partitions.

Type List<Object> Visibility public Is Abstract false Parameter

initialize

This method initializes the algorithm. The instances are grouped by category in a HashMap.

Type void Visibility public Is Abstract false Parameter

KFolds

Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:

• oDataset Source dataset • iNumberOfPartitions Number of partitions to be built

Type Visibility public Is Abstract false Parameter • in iNumberOfPartitions : int

• inout oDataset : Dataset

KFolds

Parameterized constructor. Notice that the class column is supposed to be the last column in the dataset:

• oDataset Source dataset • iNumberOfPartitions Number of partitions to be built • iSeed Seed given to the partitioning process. Supplying the seed used in a previous run reproduces that partition. Otherwise, the seed is randomly selected.

Type Visibility public Is Abstract false Parameter • in iNumberOfPartitions : int

• inout oDataset : Dataset

postexec

Not required.

Type void Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • DatasetStrategy
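
A minimal sketch of class-aware partitioning, under stated assumptions, may help to picture what the strategy produces. It is not the datapro4j implementation: rows are plain String arrays whose last element is the class label (as the constructor notes require), the class KFoldsSketch and its method partition are hypothetical, and the round-robin assignment of shuffled rows to folds is one possible scheme chosen for the illustration. A LinkedHashMap is used instead of a HashMap so that the result depends only on the input order and the seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: rows are String arrays whose last element is the class label.
final class KFoldsSketch {

    static List<List<String[]>> partition(List<String[]> rows, int numberOfPartitions, long seed) {
        if (numberOfPartitions < 1) {
            throw new IllegalArgumentException("numberOfPartitions must be at least 1");
        }
        // Group the instances by class label (the last column).
        Map<String, List<String[]>> byClass = new LinkedHashMap<>();
        for (String[] row : rows) {
            String label = row[row.length - 1];
            byClass.computeIfAbsent(label, key -> new ArrayList<>()).add(row);
        }
        List<List<String[]>> folds = new ArrayList<>();
        for (int i = 0; i < numberOfPartitions; i++) {
            folds.add(new ArrayList<>());
        }
        // The same seed and the same input always yield the same partitions.
        Random rnd = new Random(seed);
        int next = 0;
        for (List<String[]> group : byClass.values()) {
            Collections.shuffle(group, rnd);
            // Deal the rows of each class to the folds in turn (assumed scheme).
            for (String[] row : group) {
                folds.get(next % numberOfPartitions).add(row);
                next++;
            }
        }
        return folds;
    }
}
```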

Package es::uco::kdis::datapro::dataset

Figure 15. Package es.uco.kdis.datapro.dataset

Name dataset Qualified Name es::uco::kdis::datapro::dataset

Class Dataset Dataset is the abstract base class for all the different types of dataset sources. This class fills the gap between the physical dataset (stored in a file, database, etc.) and its logical handling, where the access to attributes/columns and processing methods is provided.

Figure 16. Class Dataset

Name Dataset Qualified Name es::uco::kdis::datapro::dataset::Dataset Visibility public Abstract true Base Classifier Realized Interface

Attribute Detail

iCursor

iCursor refers to the row of the dataset currently pointed to by the InstanceIterator.

Type int Default Value Visibility protected Multiplicity

rgoColumns

rgoColumns is the list of columns that comprise the dataset.

Type ColumnAbstraction Default Value Visibility protected Multiplicity 0..*

rgoValidBinaryFalseValues

For binary columns, it contains the list of values that will be interpreted as False when reading from the physical dataset. Writing will be performed using the first element in the list.

Type String Default Value Visibility protected Multiplicity 0..*

rgoValidBinaryTrueValues

For binary columns, it contains the list of values that will be interpreted as True when reading from the physical dataset. Writing will be performed using the first element in the list.

Type String Default Value Visibility protected Multiplicity 0..*

sOpenRangeDelimiter

For range columns, sOpenRangeDelimiter stores the symbol(s) that open the numerical range, right before the minimum value: e.g., '[' for [2,3]. This is used during the reading and writing of the physical dataset.

Type String Default Value Visibility protected Multiplicity

sSeparationRangeDelimiter

For range columns, sSeparationRangeDelimiter stores the symbol(s) that separate the minimum and maximum values in a numerical range: e.g., ',' for [2,3]. This value is only used during the reading and writing of the physical dataset.

Type String Default Value Visibility protected Multiplicity

sCloseRangeDelimiter

For range columns, sCloseRangeDelimiter stores the symbol(s) that close the numerical range, right after the maximum value: e.g., ']' for [2,3]. This is only used during the reading and writing of the physical dataset.

Type String Default Value Visibility protected Multiplicity

sEmptyValue

sEmptyValue stores the string that will represent an empty value in the dataset file.

Type String Default Value Visibility protected Multiplicity

sMissingValue

sMissingValue stores the string that will represent a missing value in the dataset file.

Type String Default Value Visibility protected Multiplicity

sNullValue

sNullValue stores the string that will represent a null value in the dataset file.

Type String Default Value Visibility protected Multiplicity

sName

The name of the dataset.

Type String Default Value Visibility protected Multiplicity

Operation Detail

addAllValues

A set of column values is inserted into the dataset structure. Notice that instance duplication is not checked.

Parameters:

• sColumnFormat String that specifies the types of the columns to be added. Types depend on the specific dataset.

Exceptions:

• IOException • IllegalFormatSpecificationException • NotAddedValueException • IndexOutOfBoundsException

Type void Visibility protected Is Abstract true Parameter • inout sColumnFormat : String

addColumn

Insert the column abstraction passed as parameter in the last position of the list of columns of the dataset.

Parameter:

• oColumn: Column abstraction to be added

Type void Visibility public Is Abstract false Parameter • inout oColumn : ColumnAbstraction

addColumn

Insert a column abstraction in a given position of the list of dataset columns.

Parameters:

• oColumn: Column abstraction to be inserted
• iIndex: Position index where the column element is added in the list. The rest of the column items will be shifted one position to the right.

Exceptions:

• UnsupportedOperationException • ClassCastException • NullPointerException • IllegalArgumentException • IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter • inout iIndex : int

• inout oColumn : ColumnAbstraction
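
The shifting behaviour described for the indexed insertion matches that of java.util.List.add(int, E), which also declares the same set of exceptions. The following sketch shows it on a plain list of column names. The class ColumnListSketch and its method insertAt are hypothetical, and whether datapro4j delegates to a java.util.List internally is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: column abstractions are represented only by their names.
final class ColumnListSketch {

    // Returns a copy of the column list with the new column inserted at the given index.
    static List<String> insertAt(List<String> columns, int index, String column) {
        List<String> copy = new ArrayList<>(columns);
        // Columns at positions >= index move one position to the right.
        // Throws IndexOutOfBoundsException if index < 0 or index > copy.size().
        copy.add(index, column);
        return copy;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("age", "income", "class");
        // "income" moves from index 1 to index 2 after the insertion.
        System.out.println(insertAt(columns, 1, "city"));
    }
}
```

As the getColumn notes point out, any index obtained before such an insertion may no longer refer to the same column afterwards.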

clone

Create a new dataset with exactly the same metadata and column structure. However, only the structure is copied, since instances from the original dataset are not added to the new one.

It returns the empty cloned dataset.

Type Dataset Visibility public Is Abstract false Parameter

close

Abstract method that serves to close the physical dataset source.

Exceptions:

• IOException

Type void Visibility protected Is Abstract true Parameter

copy

This method creates a new dataset with exactly the same metadata, column structure and data as the original dataset. In this case, instances from the original dataset are also copied to the new one.

A copy of the dataset is returned.

Type Dataset Visibility public Is Abstract false Parameter
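
The difference between clone and copy can be sketched with a tiny stand-in class. TinyDataset, cloneStructure and the field names below are hypothetical and only mimic the documented behaviour: the first operation duplicates the structure, the second one also duplicates the instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a dataset: a name, a list of column names and a list of rows.
final class TinyDataset {
    final String name;
    final List<String> columnNames;
    final List<List<String>> rows = new ArrayList<>();

    TinyDataset(String name, List<String> columnNames) {
        this.name = name;
        this.columnNames = new ArrayList<>(columnNames);
    }

    // Counterpart of clone: same metadata and column structure, no instances.
    TinyDataset cloneStructure() {
        return new TinyDataset(name, columnNames);
    }

    // Counterpart of copy: the structure plus an independent copy of every instance.
    TinyDataset copy() {
        TinyDataset result = cloneStructure();
        for (List<String> row : rows) {
            result.rows.add(new ArrayList<>(row));
        }
        return result;
    }
}
```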

Dataset

This is the default constructor of this class. By default, it sets the following parameters to their default values:

• sMissingValue: "?" • sNullValue: "?" • sEmptyValue: "?" • sOpenRangeDelimiter: "[" • sSeparationRangeDelimiter: "," • sCloseRangeDelimiter: "]"

Notice that using these symbols is not mandatory for reading/writing, as their applicability depends on the specific implementation of each source dataset.

Type Visibility public Is Abstract false Parameter

getColumn

This method looks for a column abstraction by its index in the column list. Notice that indexes can change when one column is added or removed to/from intermediate positions.

Parameter:

• iIndex: Index of the queried column.

It returns a reference to the queried column abstraction.

Type ColumnAbstraction Visibility public Is Abstract false Parameter • in iIndex : int

getColumnByName

This method returns the first column found whose name matches the one passed as parameter.

Parameter:

• sName: The name of the queried column (not case-sensitive).

It returns the column abstraction that gives access to the column with the requested name.

Type ColumnAbstraction Visibility public Is Abstract false Parameter • inout sName : String

getColumns

Getter method for the private property rgoColumns, which comprises the array of column abstractions in the dataset.

Type List<ColumnAbstraction> Visibility public Is Abstract false Parameter

getEmptyValue

Getter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

Type String Visibility public Is Abstract false Parameter

getIndexOfColumn

Given a column abstraction, it searches for the index that this column occupies in the array of column abstractions in the dataset.

Parameter:

• oCol: Column to be located.

It returns the index of the column abstraction passed as parameter, or -1 if the column is not found.

Type int Visibility public Is Abstract false Parameter • inout oCol : ColumnAbstraction

getMissingValue

Getter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can use, or not, this property accordingly.

Type String Visibility public Is Abstract false Parameter

getName

Getter method for the private property sName, which represents the name given to the dataset.

Type String Visibility public Is Abstract false Parameter

getNullValue

Getter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can use or not this property accordingly.

Type String Visibility public Is Abstract false Parameter

getNumberOfDecimals

Getter method for the private property iNumberOfDecimals, which indicates the number of decimal digits used when writing numerical columns in dataset sources. Notice that this value can be used accordingly by each specific dataset source.

Type int Visibility public Is Abstract false Parameter

getRangeDelimiters

This method gets a list of the three values used to demarcate a range, comprising the sOpenRangeDelimiter, sSeparationRangeDelimiter and sCloseRangeDelimiter. Notice that each specific dataset source could make use of these values accordingly.

Type ArrayList<String> Visibility public Is Abstract false Parameter

getValidBinaryFalseValues

Getter method for the private property rgoValidBinaryFalseValues: the list of strings that are interpreted as false when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

Type ArrayList<String> Visibility public Is Abstract false Parameter

getValidBinaryTrueValues

Getter method for the private property rgoValidBinaryTrueValues: the list of strings that are interpreted as true when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.

Type ArrayList<String> Visibility public Is Abstract false Parameter

merge

This method merges two datasets by adding the dataset passed as parameter to the current one.

Parameters:

• oDSInjected: The dataset to be added. Notice that this dataset must contain the same number and type of columns as the current dataset (this).

Type void Visibility public Is Abstract false Parameter • inout oDSInjected : Dataset

merge

This method merges two datasets by adding the dataset passed as parameter to the dataset object this.

Parameters:

• oDataset: The dataset to be added.
• sColumnFormat: Sometimes the target dataset contains more columns than the source dataset. For those cases, the columns to be added can be explicitly specified. This parameter is a String that indicates the columns to be added. Each character in the String matches a column in the target dataset. The String may comprise the following characters:

o x: Include this column.
o %: Skip this column.

Type void Visibility public Is Abstract false Parameter • inout oDataset : Dataset

• inout sColumnFormat : String
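
How such a column format string can be read is shown in the sketch below, which turns the format into the list of positions marked with x. The class MergeFormatSketch and the method includedColumns are hypothetical, and rejecting any character other than x and % is an assumption. How datapro4j fills the skipped columns during the merge is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that interprets a merge column format such as "x%x".
final class MergeFormatSketch {

    // Returns the zero-based positions whose token is 'x' (include); '%' positions are skipped.
    static List<Integer> includedColumns(String columnFormat) {
        List<Integer> included = new ArrayList<>();
        for (int i = 0; i < columnFormat.length(); i++) {
            char token = columnFormat.charAt(i);
            if (token == 'x') {
                included.add(i);
            } else if (token != '%') {
                // Assumption: any other character is treated as a format error.
                throw new IllegalArgumentException(
                        "Unexpected token '" + token + "' at position " + i);
            }
        }
        return included;
    }
}
```

For the format "x%xx" the sketch selects the columns at positions 0, 2 and 3 and leaves position 1 out.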

open

Abstract protected method. This method just opens the source dataset and initializes the row cursor to the first row of data. However, each specific dataset class is responsible for its implementation, and thus defining its real scope, according to its specific properties.

Notice that each type of dataset will provide specific methods to process the full dataset. For example, file datasets provide the method readDataset.

Exceptions:

• FileNotFoundException • IOException • IllegalFormatSpecificationException

Type void Visibility protected Is Abstract true Parameter

removeColumn

This method removes a column from the dataset. Notice that column indexes can be modified (decreased) for the rest of columns. The column removed is returned.

Parameter:

• iIndex: Position index where the column to be removed is located.

Exceptions:

• UnsupportedOperationException • IndexOutOfBoundsException

Type ColumnAbstraction Visibility public Is Abstract false Parameter • in iIndex : int

setColumns

Setter method for the property rgoColumns. Even when it is a public method, notice that it should be used very carefully, mainly for those cases when the replacement of the entire set of columns is mandatory. To add or remove a single column, or just a set of them, use instead the methods addColumn and removeColumn.

Parameter:

• rgoCols: The entire list of columns in the dataset.

Type void Visibility public Is Abstract false Parameter • inout rgoCols : List<ColumnAbstraction>

setEmptyValue

Setter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sEmptyValue The symbol/string representing an empty value in the dataset

Type void Visibility public Is Abstract false Parameter • inout sEmptyValue : String

setMissingValue

Setter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sMissingValue The symbol/string representing a missing value in the dataset

Type void Visibility public Is Abstract false Parameter • inout sMissingValue : String

setName

Setter method for the private property sName, which represents the name of the dataset. Parameter:

• sName: The name of the dataset.

Type void Visibility public Is Abstract false Parameter • inout sName : String

setNullValue

Setter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.

Parameters:

• sNullValue The symbol/string representing a null value in the dataset

Type void Visibility public Is Abstract false Parameter • inout sNull : String

setNumberOfDecimals

Setter method for the private property iNumberOfDecimals, which represents the number of decimals that the programmer wants to set for numerical values. Notice that the specific applicability of this attribute directly depends on the specific implementation of the dataset source.

Parameter:

• iNum: The number of decimal digits that will be considered when saving numerical values.

Type void Visibility public Is Abstract false Parameter • in iNum : int

setRangeDelimiters

This method sets the symbols that will serve as range delimiter. Notice that the specific applicability of these attributes directly depends on the specific implementation of the dataset source.

Parameters:

• sInitial: The symbol/string that represents the starting delimiter. • sSeparator: The symbol/string that represents the value separator. • sEnding: The symbol/string that represents the ending delimiter.

Type void Visibility public Is Abstract false Parameter • inout sEnding : String

• inout sInitial : String • inout sSeparator : String
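
The role of the three delimiters can be pictured with a minimal sketch that parses and writes a range such as [2,3]. The class RangeFormatSketch and its methods are hypothetical and independent of the library. Trimming whitespace and representing the bounds as doubles are assumptions.

```java
// Hypothetical sketch of reading and writing a range value with configurable delimiters.
final class RangeFormatSketch {
    private final String open;
    private final String separator;
    private final String close;

    RangeFormatSketch(String open, String separator, String close) {
        this.open = open;
        this.separator = separator;
        this.close = close;
    }

    // Parses e.g. "[2,3]" into {2.0, 3.0}.
    double[] parse(String text) {
        String t = text.trim();
        if (t.length() < open.length() + close.length()
                || !t.startsWith(open) || !t.endsWith(close)) {
            throw new IllegalArgumentException("Not a range: " + text);
        }
        String inner = t.substring(open.length(), t.length() - close.length());
        int pos = inner.indexOf(separator);
        if (pos < 0) {
            throw new IllegalArgumentException("Missing separator in: " + text);
        }
        double min = Double.parseDouble(inner.substring(0, pos).trim());
        double max = Double.parseDouble(inner.substring(pos + separator.length()).trim());
        return new double[] {min, max};
    }

    // Writes the range using the configured delimiters.
    String format(double min, double max) {
        return open + min + separator + max + close;
    }
}
```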

setValidBinaryFalseValues

Setter method of the list rgoValidBinaryFalseValues, which contains the set of strings that represent a False boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

Parameter:

• rgoValidBinaryFalseValues: The list of values that will be interpreted as False.

Type void Visibility public Is Abstract false Parameter • inout rgoValidBinaryFalseValues : ArrayList<String>

setValidBinaryTrueValues

Setter method of the list rgoValidBinaryTrueValues, which contains the set of strings that represent a True boolean value in the dataset. Notice that only the first value of the list should be used for serialization (reading or writing). In any case, it depends on the specific implementation of each dataset source.

Parameter:

• rgoValidBinaryTrueValues: The list of values that will be interpreted as True.

Type void Visibility public Is Abstract false Parameter • inout rgoValidBinaryTrueValues : ArrayList<String>

setValidBinaryValues

This method sets both the list of strings that will represent a True boolean value and the list of strings that will represent a False boolean value in the dataset. The same result can be obtained by invoking the two specific setter methods, setValidBinaryTrueValues and setValidBinaryFalseValues.

Parameters:

• rgoFalseList: A list with the valid False symbols/strings

• rgoTrueList: A list with the valid True symbols/strings

Type void Visibility public Is Abstract false Parameter • inout rgoFalseList : ArrayList<String>

• inout rgoTrueList : ArrayList<String>
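
The way the two lists are meant to be used can be sketched as follows: any listed symbol is accepted when reading, and the first element of each list is used when writing. The class BinaryValuesSketch is hypothetical. Exact, case-sensitive matching and rejecting unknown symbols are assumptions, since the actual behaviour depends on each dataset source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of interpreting and writing binary values with configurable symbols.
final class BinaryValuesSketch {
    private final List<String> trueValues;
    private final List<String> falseValues;

    BinaryValuesSketch(List<String> trueValues, List<String> falseValues) {
        this.trueValues = new ArrayList<>(trueValues);
        this.falseValues = new ArrayList<>(falseValues);
    }

    // Reading: any symbol in the corresponding list is accepted (exact match assumed).
    boolean parse(String token) {
        if (trueValues.contains(token)) {
            return true;
        }
        if (falseValues.contains(token)) {
            return false;
        }
        throw new IllegalArgumentException("Not a valid binary symbol: " + token);
    }

    // Writing: the first element of the corresponding list is used.
    String write(boolean value) {
        return value ? trueValues.get(0) : falseValues.get(0);
    }
}
```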

swapColumns

This method swaps two columns in the list of columns of the dataset. It searches for both columns and swaps their positions, and thus both structure and data.

Parameters:

• oColumn1: The first column to swap. • oColumn2: The second column to swap.

Exceptions:

• ColumnAbstraction • UnsupportedOperationException • ClassCastException • NullPointerException • IllegalArgumentException • IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter • inout oColumn1 : ColumnAbstraction

• inout oColumn2 : ColumnAbstraction

Relation Detail

Association

Name rgoColumns Related Element • ColumnAbstraction

Dependency

Name Related Element • InstanceIterator

Generalization

Name Related Element • FileDataset

Class FileDataset This abstract class represents a dataset whose source is a file. It includes the specific methods required to handle datasets stored as files.

Figure 17. Class FileDataset

Name FileDataset Qualified Name es::uco::kdis::datapro::dataset::FileDataset Visibility public Abstract true Base Classifier • Dataset Realized Interface

Attribute Detail

oBufferedReader

oBufferedReader is the buffer used to read the file.

Type BufferedReader Default Value Visibility protected Multiplicity

sCommentValue

sCommentValue stores the string that marks the beginning of a comment line in the dataset file, that is, a line that has to be omitted from the processing.

Type String Default Value Visibility protected Multiplicity

sFileName

sFileName is the name of the file source that contains the dataset.

Type String Default Value Visibility protected Multiplicity

sSeparationSymbol

sSeparationSymbol stores the symbol/string that indicates the separator between values of the same instance-row (e.g., a comma, a line of the dataset file, etc.).

Type String Default Value Visibility protected Multiplicity

Operation Detail

clone

This method creates a new dataset with exactly the same type and column structure as the original. Instances from the original dataset are not copied. It returns a new Dataset instance.

Type Dataset Visibility public Is Abstract false Parameter

copy

This method clones the dataset and fills the clone with the instances extracted from the original, creating a new dataset with exactly the same type, column structure and data. It returns the copied Dataset instance.

Type Dataset Visibility public Is Abstract false Parameter

FileDataset

Default constructor. Notice that the following symbols are used by default:

• sCommentValue: "%" • sSeparationSymbol: ","

Type Visibility public Is Abstract false Parameter

FileDataset

This constructor receives the name of the file as parameter. The following symbols are used as default:

• sCommentValue: "%" • sSeparationSymbol: ","

Parameter:

• sFileName: The filename of the dataset source.

Type Visibility public Is Abstract false Parameter • inout sFileName : String

getCommentValue

Getter method of the property sCommentValue.

Type String Visibility public Is Abstract false Parameter

getFileName

Getter method of the filename of the dataset source.

Type String Visibility public Is Abstract false Parameter

getSeparationSymbol

Getter method of the property sSeparationSymbol.

Type String Visibility public Is Abstract false Parameter

readDataset

Implementations of this abstract method will read the dataset from the file specified by the constructor.

Parameters:

• sContentFormat: String that specifies the reading format of the dataset file. Construct the string using a sequence of control tokens:

o % to omit a line (only one line).
o %name to read the name of columns (only one line).
o %col to read data (zero, one or more lines).

Example: the string “%%%col%%name” indicates that the first two lines must be omitted, then data is read and, finally, the last line will contain the column names.

• sColumnFormat: A String that contains an ordered sequence of tokens that determine the data type of each column to be read. Use the following tokens:

o s: Nominal column
o f: Real column
o c: Categorical column
o b: Binary column
o i: Integer column
o %: Skip this column (the column skipped is not processed)

Additionally, notice that other tokens can be considered depending on the specific dataset source (e.g., d for columns of type date).

Exceptions:

• FileNotFoundException • IOException • IllegalFormatSpecificationException • NotAddedValueException • IndexOutOfBoundsException

Type void Visibility public Is Abstract true

Parameter • inout sColumnFormat : String • inout sContentFormat : String
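
The column format tokens can be illustrated by applying them to a single line of a delimited file. The sketch below is not the reader used by datapro4j: the class ColumnFormatSketch and the method parseLine are hypothetical, values are returned as plain Java objects instead of column abstractions, and binary fields are read with Boolean.parseBoolean as a simplification of the configurable true/false lists described for Dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: applies a column format such as "sf%bi" to one line of a delimited file.
final class ColumnFormatSketch {

    static List<Object> parseLine(String line, String columnFormat, String separator) {
        // Pattern.quote makes the separator literal; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columnFormat.length()) {
            throw new IllegalArgumentException(
                    "Expected " + columnFormat.length() + " fields but found " + fields.length);
        }
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            switch (columnFormat.charAt(i)) {
                case 's': // nominal
                case 'c': // categorical
                    values.add(field);
                    break;
                case 'f': // real
                    values.add(Double.parseDouble(field));
                    break;
                case 'i': // integer
                    values.add(Integer.parseInt(field));
                    break;
                case 'b': // binary (simplified: only "true" maps to true)
                    values.add(Boolean.parseBoolean(field));
                    break;
                case '%': // skipped column: not processed
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unknown token '" + columnFormat.charAt(i) + "'");
            }
        }
        return values;
    }
}
```

With the format "sf%bi", a line with five fields yields four values, because the third field is skipped.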

setCommentValue

Setter method of the property sCommentValue.

Parameter:

• sComment: The token/string indicating the symbol that represents a comment line in the dataset file.

Type void Visibility protected Is Abstract false Parameter • inout sComment : String

setFileName

Setter method of the property sFileName. Parameter:

• sFileName: The filename of the dataset source.

Type void Visibility public Is Abstract false Parameter • inout sFileName : String

setSeparationSymbol

Setter method of the property sSeparationSymbol. Parameter:

• sSeparationSymbol: The token used to separate the values within the same line of the dataset source.

Type void Visibility protected Is Abstract false Parameter • inout sSeparator : String

writeDataset

This abstract method defines the signature of the write method for every file dataset. Implementations of this method deal with the serialization (writing) of the current column structure into each specific file format.

Parameter:

• sOutputFile: The path where the dataset file will be saved. Exception:

• IOException

Type void Visibility public Is Abstract true Parameter • inout sOutputFile : String

Relation Detail

Generalization

Name Related Element • CsvDataset

Name Related Element • ExcelDataset

Name Related Element • ArffDataset

Name Related Element • Dataset

Class InstanceIterator

InstanceIterator is the class that implements the interface IIterator for traversing the instances of a dataset. This class therefore represents an iterator that accesses each row/instance in a dataset. It provides methods to traverse the whole set of instances and keeps a reference to the dataset being iterated.

Figure 18. Class InstanceIterator

Name InstanceIterator Qualified Name es::uco::kdis::datapro::dataset::InstanceIterator Visibility public Abstract false Base Classifier Realized Interface • IIterator

Attribute Detail

All attributes are private.

Operation Detail

currentInstance

This method returns the list of objects that form the instance currently pointed to in the dataset.

Type List<Object> Visibility public Is Abstract false Parameter

first

This method returns the list of objects that form the first instance in the dataset and sets the pointer to the first instance.

Type List<Object> Visibility public Is Abstract false Parameter

InstanceIterator

Default iterator constructor.

Parameter:

• oDataset: The dataset to be traversed by the iterator.

Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset

isDone

This method returns true if the dataset has no more instances to be iterated. False, otherwise.

Type boolean Visibility public Is Abstract false Parameter

next

This method increases the instance pointer by one, i.e. sets the pointer to the next instance in the dataset.

Type void Visibility public Is Abstract false Parameter

Relation Detail

Interface Realization

Name Related Element • IIterator

Interface IIterator

IIterator is the interface that any instance iterator has to implement, as InstanceIterator does.

Figure 19. Interface IIterator

Name IIterator Qualified Name es::uco::kdis::datapro::dataset::IIterator Visibility public Base Classifier

Operation Detail

currentInstance

The implementation of this method has to return the instance currently pointed to in the dataset as a List of Object instances.

Type List<Object> Visibility public Is Abstract true Parameter

first

An implementation of this method returns the first instance of the dataset. From then on, the current instance pointed to by the iterator should be this first one.

Type List<Object> Visibility public Is Abstract true Parameter

isDone

This method should be implemented to return True if the iterator points to the last instance of the dataset. It returns False otherwise.

Type boolean Visibility public Is Abstract true Parameter

next

The implementation of this method advances the iterator to the next instance in the dataset.

Type void Visibility public Is Abstract true Parameter

Relation Detail

Interface Realization

Name Related Element • InstanceIterator
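The four operations of IIterator suggest the usual traversal loop: first, then isDone, then next. The following sketch mirrors the documented signatures in a local interface and uses a hypothetical list-backed class, RowIterator, as a stand-in for InstanceIterator, whose internals are not described here. It assumes that isDone returns true once no instances remain, as stated for InstanceIterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IteratorSketch {

    // Local copy of the four operations documented for IIterator.
    interface IIterator {
        List<Object> first();
        void next();
        boolean isDone();
        List<Object> currentInstance();
    }

    // Hypothetical list-backed iterator, standing in for InstanceIterator.
    static class RowIterator implements IIterator {
        private final List<List<Object>> rows;
        private int pos = 0;

        RowIterator(List<List<Object>> rows) { this.rows = rows; }

        public List<Object> first() { pos = 0; return rows.isEmpty() ? null : rows.get(0); }
        public void next() { pos++; }
        public boolean isDone() { return pos >= rows.size(); }
        public List<Object> currentInstance() { return rows.get(pos); }
    }

    /** Visits every instance once, in order, and returns what was seen. */
    static List<List<Object>> collect(IIterator it) {
        List<List<Object>> seen = new ArrayList<>();
        for (it.first(); !it.isDone(); it.next()) {
            seen.add(it.currentInstance());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 2));
        System.out.println(collect(new RowIterator(rows)));
    }
}
```

Because the loop depends only on the interface, any other implementation of IIterator could be passed to collect without changing it.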

Package es::uco::kdis::datapro::dataset::Column

This package contains the classes related to the different types of columns supported by the library. At the moment, datapro4j provides an implementation for the following types:

• Binary column, for positive or negative values.
• Categorical column, for predefined string values, considered as an enumeration of categories.
• Date column.
• Integer column, for numerical integer values.
• Nominal column, for free-valued strings.
• Numerical column, for numerical real values.
• Range column, for those values that represent a numerical interval (minimum, maximum), where both open and closed ranges can be considered.

Columns are coded following the philosophy of the bridge design pattern, where an abstraction is decoupled from its implementation. In this way, the programmer can add new implementations of some of the provided columns to the library, e.g. for performance reasons, without altering the way in which the rest of the library, including algorithms, interacts with these columns.

Therefore, every column type is implemented by at least two different classes: its abstraction, where the accessor methods to its functionalities exist, and its implementation, where these functionalities are coded, and invoked from the abstraction.

Using columns properly demands considering the following rules:

• Any code from the library (i.e. from other columns, datasets or strategies) should always invoke methods of the abstraction. Never invoke the column implementation directly; only its own abstraction should do so.

• Altering current abstractions may cause unexpected failures. Use generalization or provide conversion methods to build your own abstractions instead.

• Abstractions and implementations must be subclasses of ColumnAbstraction and ColumnImpl, respectively.

• datapro4j only supports one implementation class per abstraction. If the programmer wants to have more than one implementation, then more than one abstraction should be provided, or a factory pattern should be coded.

• If new abstractions (i.e. types of columns) are provided, modify the enumeration ColumnType accordingly.
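The rules above can be pictured with a reduced sketch of the bridge structure. All class names below (Impl, Abstraction, IntegerImpl, IntegerColumnSketch) are hypothetical stand-ins and not the library's own classes; only the idea of an abstraction holding a reference oImpl and delegating data access to it is taken from this guide.

```java
import java.util.ArrayList;
import java.util.List;

class BridgeSketch {

    // Implementation side: owns the data (plays the role of ColumnImpl).
    static abstract class Impl {
        abstract int addValue(Object oValue);
        abstract int getSize();
    }

    // Abstraction side: owns the metainformation and delegates data access.
    static abstract class Abstraction {
        protected final String sName;
        protected Impl oImpl;   // assigned by the subclass constructor

        Abstraction(String sName) { this.sName = sName; }

        String getName() { return sName; }                        // no data access needed
        int addValue(Object oValue) { return oImpl.addValue(oValue); }
        int getSize() { return oImpl.getSize(); }
    }

    // Hypothetical implementation that stores integers only.
    static class IntegerImpl extends Impl {
        private final List<Integer> values = new ArrayList<>();

        int addValue(Object oValue) {
            if (!(oValue instanceof Integer)) return 0;  // reject values of the wrong type
            values.add((Integer) oValue);
            return 1;
        }

        int getSize() { return values.size(); }
    }

    // Hypothetical abstraction: its constructor creates the implementation object.
    static class IntegerColumnSketch extends Abstraction {
        IntegerColumnSketch(String sName) {
            super(sName);
            this.oImpl = new IntegerImpl();
        }
    }

    public static void main(String[] args) {
        Abstraction col = new IntegerColumnSketch("age");
        int added = col.addValue(42) + col.addValue("not a number");
        System.out.println(col.getName() + ": added=" + added + ", size=" + col.getSize());
    }
}
```

Client code in this sketch only ever holds an Abstraction reference, so swapping IntegerImpl for a different storage strategy would not affect callers.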

Figure 20. Package es.uco.kdis.datapro.dataset.Column

Name Column Qualified Name es::uco::kdis::datapro::dataset::Column

Class ColumnAbstraction

This abstract class implements the functionality common to every column in the dataset. It also defines the methods that are not coded by the implementation class because they refer to the column metainformation (e.g. name, type, etc.). These methods are implemented directly by the abstraction, since they do not require any access to data.

Figure 21. Abstract class ColumnAbstraction

Name ColumnAbstraction Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnAbstraction Visibility public Abstract true Base Classifier Realized Interface

Attribute Detail

ctColumnType

The column type, as represented by the enumeration defined by the class ColumnType.

Type ColumnType Default Value Visibility protected Multiplicity 1

oImpl

A reference to the implementation object.

Type ColumnImpl Default Value Visibility protected Multiplicity 1

sName

The name of the column.

Type String Default Value Visibility protected Multiplicity

Operation Detail

addAllValues

This method calls the implementation to add a list of values at the end of the column.

Parameter:

• rgoCol The list of values to be added. The objects it contains must satisfy the type required by the column.

Type void Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>

addValue

This method calls the implementation to add a single value at the end of the column.

Parameter:

• oValue The value to be added. It must satisfy the type required by the column.

The method returns the number of items successfully added to the column.

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

This method calls the implementation to add a single value at the end of the column.

Parameters:

• oValue The value to be added. It must satisfy the type required by the column.
• bForce Indicates that the value must be added, independently of the constraints and addition policies defined by the column type.

The method returns the number of items successfully added to the column.

Type int Visibility public Is Abstract false Parameter • in bForce : boolean

• inout oValue : Object

addValue

This method calls the implementation to add a single value at a given position in the column.

Parameters:

• iIndex The element position where the item has to be added.
• oValue The value to be added. It must satisfy the type required by the column.

The method returns the number of items successfully added to the column.

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

ColumnAbstraction

Default constructor with parameters. Subclasses may define their own constructors.

This constructor only assigns the parameter values to their respective variables. The constructor in the subclass should create the implementation object and assign it to the variable oImpl.

Parameters:

• ctColumnType The column type.
• sName The name of the column to be created.

Type Visibility public Is Abstract false Parameter • inout ctColumnType : ColumnType

• inout sName : String

countEmptyValues

This method calls the implementation to return the number of empty values in the column set.

Type int Visibility public Is Abstract false Parameter

countInvalidValues

This method calls the implementation to return the number of invalid values (i.e. empty, null and missing values) in the column set.

Type int Visibility public Is Abstract false Parameter

countMissingValues

This method calls the implementation to return the number of missing values in the column set.

Type int Visibility public Is Abstract false Parameter

countNullValues

This method calls the implementation to return the number of null values in the column set.

Type int Visibility public Is Abstract false Parameter

getElement

This method calls the implementation to return the element at the given position.

Parameter:

• iPos Position of the element queried.

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getEmptyValue

This method calls the implementation to return the column-specific empty value. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values; rather, the developer can define its use (e.g., as the symbol associated with the empty value).

Type Object Visibility public Is Abstract false Parameter

getMissingValue

This method calls the implementation to return the column-specific missing value. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values; rather, the developer can define its use (e.g., as the symbol associated with a missing value).

Type Object Visibility public Is Abstract false Parameter

getName

This method returns the name of the column.

Type String Visibility public Is Abstract false Parameter

getNullValue

This method calls the implementation to return the column-specific null value. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values; rather, it lets the developer define a custom null object.

Type Object Visibility public Is Abstract false Parameter

getSize

This method calls the implementation to return the size of the column.

Type int Visibility public Is Abstract false Parameter

getType

This method returns the type of the column as a value of ColumnType.

Type ColumnType Visibility public Is Abstract false Parameter

getValues

This method calls the implementation to return the list of items (as instances of Object) contained in the column.

Type List<Object> Visibility public Is Abstract false Parameter

removeValue

It calls the implementation to remove an element in the column at a given position. Parameter:

• iIndex The index of the element to be removed.

Type void Visibility public Is Abstract false Parameter • in iIndex : int

setEmptyValue

This method calls the implementation to set the column-specific empty value, if required. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values; the developer has to define how it is used in the code of the relevant strategies.

Parameter:

• oEmptyValue The empty value to be set.

Type void Visibility public Is Abstract false Parameter • inout oEmptyValue : Object

setMissingValue

This method calls the implementation to set the column-specific missing value, if required. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values; the developer has to define how it is used in the code of the relevant strategies.

Parameter:

• oMissingValue The missing value to be set.

Type void Visibility public Is Abstract false Parameter • inout oMissingValue : Object

setName

This method sets the name of the column.

Parameter:

• sName The new name for the column.

Type void Visibility public Is Abstract false Parameter • inout sName : String

setNullValue

This method calls the implementation to set the column-specific null value, if required. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values; the developer has to define how it is used in the code of the relevant strategies.

Parameter:

• oNullValue The null value to be set.

Type void Visibility public Is Abstract false Parameter • inout oNullValue : Object

setValue

This method calls the implementation to set the value of an element in the column at a given position.

Parameters:

• oValue The value to be set.
• iIndex The element position in the column.

It returns the number of elements correctly set.

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

Relation Detail

Association

Name Related Element • ColumnImpl

Name Related Element • ColumnType

Name rgoColumns Related Element • Dataset

Generalization

Name Related Element • CategoricalColumn

Name Related Element • NumericalColumn

Name Related Element • DateColumn

Name Related Element • BinaryColumn

Name Related Element • NominalColumn

Name Related Element • RangeColumn

Class ColumnImpl

This abstract class serves as a base for column implementation classes. These classes comprise the real code accessing data in the column. Only metainformation is managed by its abstraction.

Note: Its methods should be invoked only from its specific abstraction, never directly from other code. Thus, for a given column type, the abstraction remains unaltered, whereas the implementation can be adapted by the programmer.

Figure 22. Abstract class ColumnImpl

Name ColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnImpl Visibility public Abstract true Base Classifier Realized Interface

Attribute Detail

oEmptyValue

This object represents a column-specific empty value. Notice that this is not the standard empty value object, as used by datapro4j strategies and datasets.

Type Object Default Value null Visibility protected Multiplicity

oMissingValue

This object represents a column-specific missing value. Notice that this is not the standard missing value object, as used by datapro4j strategies and datasets.

Type Object Default Value null Visibility protected Multiplicity

oNullValue

This object represents a column-specific null value. Notice that this is not the standard null value object, as used by datapro4j strategies and datasets.

Type Object Default Value null Visibility protected Multiplicity

Operation Detail

The following methods code the implementation for their corresponding abstraction methods.

addAllValues

This method implements the method addAllValues of the column abstraction, returning the number of objects successfully added.

Parameter:

• rgoCol The list of item objects to be added to the column.

Type int Visibility public Is Abstract true Parameter • inout rgoCol : List<Object>

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameter:

• oValue The value to be added.

Type int Visibility public Is Abstract true Parameter • inout oValue : Object

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameters:

• oValue The value to be added.
• bForce If true, the implementation must force its addition.

Note: By default, bForce is ignored. If a specific column needs to honor it, the subclass implementing that column should explicitly override this method.

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

• in bForce : boolean

addValue

This method implements the method addValue of the column abstraction, returning the number of objects successfully added.

Parameters:

• oValue The value to be added.
• iIndex The position in the column to add the value.

Type int Visibility public Is Abstract true Parameter • inout oValue : Object

• in iIndex : int

countEmptyValues

This method implements the method countEmptyValues of the column abstraction, returning the number of empty values contained in the column values. -1 is returned if this value could not be calculated.

Type int Visibility public Is Abstract false Parameter

countInvalidValues

This method implements the method countInvalidValues of the column abstraction, returning the number of invalid values (null, empty and missing values) contained in the column values. -1 is returned if this value could not be calculated.

Type int Visibility public Is Abstract false Parameter

countMissingValues

This method implements the method countMissingValues of the column abstraction, returning the number of missing values contained in the column values. -1 is returned if this value could not be calculated.

Type int Visibility public Is Abstract false Parameter

countNullValues

This method implements the method countNullValues of the column abstraction, returning the number of null values contained in the column values. -1 is returned if this value could not be calculated.

Type int Visibility public Is Abstract false Parameter
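The relationship between the four counting methods can be sketched as follows. The marker objects EMPTY, MISSING and NULL are hypothetical placeholders for the library's EmptyValue, MissingValue and NullValue classes, whose API is not shown in this section, and returning -1 for an unavailable list is only one assumed reading of "could not be calculated".

```java
import java.util.Arrays;
import java.util.List;

class InvalidCountSketch {

    // Hypothetical markers; datapro4j has its own EmptyValue, MissingValue and NullValue classes.
    static final Object EMPTY = new Object();
    static final Object MISSING = new Object();
    static final Object NULL = new Object();

    /** Counts occurrences of the given marker, or returns -1 if the values are unavailable. */
    static int count(List<Object> values, Object marker) {
        if (values == null) return -1;
        int n = 0;
        for (Object v : values) {
            if (v == marker) n++;   // markers are compared by identity
        }
        return n;
    }

    /** Invalid values are the empty, missing and null ones taken together. */
    static int countInvalid(List<Object> values) {
        if (values == null) return -1;
        return count(values, EMPTY) + count(values, MISSING) + count(values, NULL);
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.<Object>asList(true, EMPTY, false, NULL, MISSING, EMPTY);
        System.out.println(countInvalid(values));
    }
}
```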

getElement

This method implements the method getElement of the column abstraction, returning the element at the given position.

Parameter:

• iPos The position of the element to be returned.

Type Object Visibility public Is Abstract true Parameter • in iPos : int

getEmptyValue

This method implements the method getEmptyValue of the column abstraction, returning the element representing the column-specific empty value.

Type Object Visibility public Is Abstract false Parameter

getMissingValue

This method implements the method getMissingValue of the column abstraction, returning the element representing the column-specific missing value.

Type Object Visibility public Is Abstract false Parameter

getNullValue

This method implements the method getNullValue of the column abstraction, returning the element representing the column-specific null value.

Type Object Visibility public Is Abstract false Parameter

getSize

This method implements the method getSize of the column abstraction, returning the number of elements contained in the column.

Type int Visibility public Is Abstract true Parameter

getValues

This method implements the method getValues of the column abstraction, returning the list of elements (as instances of Object) contained in the column.

Type List<Object> Visibility public Is Abstract true Parameter

removeValue

This method implements the method removeValue of the column abstraction.

Parameter:

• iIndex The position in the column of the value to be removed.

Type void Visibility public Is Abstract true Parameter • in iIndex : int

setEmptyValue

This method implements the method setEmptyValue of the column abstraction, setting the element representing the column-specific empty value.

Parameter:

• oEmptyValue The object representing a specific empty value in this column.

Type void Visibility public Is Abstract false Parameter • inout oEmptyValue : Object

setMissingValue

This method implements the method setMissingValue of the column abstraction, setting the element representing the column-specific missing value.

Parameter:

• oMissingValue The object representing a specific missing value in this column.

Type void Visibility public Is Abstract false Parameter • inout oMissingValue : Object

setNullValue

This method implements the method setNullValue of the column abstraction, setting the element representing the column-specific null value.

Parameter:

• oNullValue The object representing a specific null value in this column.

Type void Visibility public Is Abstract false Parameter • inout oNullValue : Object

setValue

This method implements the method setValue of the column abstraction, setting the element value at the given position.

Parameters:

• oValue The object value to set.
• iIndex The position index in the column.

Type int Visibility public Is Abstract true Parameter • in iIndex : int

• inout oValue : Object

Relation Detail

Association

Name Related Element • ColumnAbstraction

Generalization

Name Related Element • RangeColumnImpl

Name Related Element • NominalColumnImpl

Name Related Element • NumericalColumnImpl

Name Related Element • DateColumnImpl

Name Related Element • CategoricalColumnImpl

Name Related Element • BinaryColumnImpl

Enumeration ColumnType

This enumeration contains the different types of columns supported by datapro4j. The following types are currently supported:

• Binary
• Categorical
• Date
• Integer
• Nominal
• Numerical
• Range

Note: If the programmer wants to check the column type, the following code should be used (e.g. for binary columns):

    ColumnAbstraction oCol;
    if (oCol.getType().equals(ColumnType.Binary)) {
        …
    }

Figure 23. Enumeration ColumnType

Name ColumnType Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnType Visibility public Abstract false Base Classifier Realized Interface

Attribute Detail

Binary

Boolean attribute

Type Default Value Visibility public Multiplicity

Categorical

Categorical attribute

Type Default Value Visibility public Multiplicity

Date

Date attribute

Type Default Value Visibility public Multiplicity

Integer

Integer attribute

Type Default Value Visibility public Multiplicity

Nominal

Nominal attribute

Type Default Value Visibility public Multiplicity

Numerical

Numerical attribute

Type Default Value Visibility public Multiplicity

Range

Range attribute

Type Default Value Visibility public Multiplicity

Relation Detail

Association

Name Related Element • ColumnAbstraction

Class BinaryColumn

This class represents the abstraction of a binary column. It defines the methods that provide operations specific to binary data.

Figure 24. Class BinaryColumn

Name BinaryColumn Qualified Name es::uco::kdis::datapro::dataset::Column::BinaryColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Operation Detail

BinaryColumn

Default constructor. The implementation BinaryColumnImpl is invoked.

Type Visibility public Is Abstract false Parameter

BinaryColumn

Constructor with the name of the column as a parameter. The implementation BinaryColumnImpl is invoked.

Parameter:

• sName The name of the column.

Type Visibility public Is Abstract false Parameter • inout sName : String

toCategorical

This method calls the implementation to return a categorical column generated from the binary column. The resulting categorical column defines two categories, one for each binary value (false, true).

Parameters:

• sFalseCategory The category representing the false binary value.
• sTrueCategory The category representing the true binary value.

Notes:

• If the value is an empty or a missing value, then a false value is considered.
• If the value is a null value, then a null value is considered.

Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sFalseCategory : String

• inout sTrueCategory : String
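The conversion rules stated in the notes for toCategorical can be sketched on plain lists. This is not the library's implementation: the class ToCategoricalSketch and the marker objects EMPTY, MISSING and NULL are hypothetical, and the result is a List<Object> rather than a CategoricalColumn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToCategoricalSketch {

    // Hypothetical markers standing in for the library's empty, missing and null values.
    static final Object EMPTY = new Object();
    static final Object MISSING = new Object();
    static final Object NULL = new Object();

    /** Maps each binary value to a category, following the notes for toCategorical. */
    static List<Object> toCategorical(List<Object> binary, String sFalseCategory, String sTrueCategory) {
        List<Object> out = new ArrayList<>();
        for (Object v : binary) {
            if (v == NULL) {
                out.add(NULL);              // a null value stays a null value
            } else if (Boolean.TRUE.equals(v)) {
                out.add(sTrueCategory);
            } else {
                out.add(sFalseCategory);    // false, empty and missing map to the false category
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> binary = Arrays.<Object>asList(true, false, EMPTY, MISSING);
        System.out.println(toCategorical(binary, "no", "yes"));
    }
}
```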

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class BinaryColumnImpl

This class provides the implementation code accessing real data in a binary column. Binary values are stored as objects of class Boolean.

Note: None of its methods should be directly invoked, but only from its specific abstraction.
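
The split between a column abstraction and its implementation can be pictured with the following sketch. It is only an illustration of the delegation idea under stated assumptions: BridgeSketch, Abstraction and Impl are hypothetical names, and the return codes 1 and 0 for addValue are assumed, since this guide only states that the method returns an int.

```java
import java.util.ArrayList;
import java.util.List;

class BridgeSketch {

    // Implementation side: holds the real data (stand-in for a class such as BinaryColumnImpl).
    static class Impl {
        private final List<Boolean> values = new ArrayList<>();

        int addValue(Object oValue) {
            if (!(oValue instanceof Boolean)) {
                return 0;                     // assumption: 0 signals "not added"
            }
            values.add((Boolean) oValue);
            return 1;                         // assumption: 1 signals "added"
        }

        int getSize() {
            return values.size();
        }
    }

    // Abstraction side: the only class that client code is meant to use.
    static class Abstraction {
        private final String name;
        private final Impl impl;

        Abstraction(String sName) {
            this.name = sName;
            this.impl = new Impl();           // the constructor creates the implementation
        }

        String getName() { return name; }
        int addValue(Object oValue) { return impl.addValue(oValue); }
        int getSize() { return impl.getSize(); }
    }

    public static void main(String[] args) {
        Abstraction column = new Abstraction("flag");
        column.addValue(Boolean.TRUE);
        System.out.println(column.getName() + " holds " + column.getSize() + " value(s)");
    }
}
```

Client code only talks to the abstraction, so the storage details can change without affecting callers.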

Figure 25. Class BinaryColumnImpl

Name BinaryColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::BinaryColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

BinaryColumnImpl

Default constructor.

Type Visibility public Is Abstract false Parameter

countEmptyValues

Type int Visibility public Is Abstract false Parameter

countInvalidValues

Type int Visibility public Is Abstract false Parameter

countMissingValues

Type int Visibility public Is Abstract false Parameter

countNullValues

Type int Visibility public Is Abstract false Parameter

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

removeValue

Type void Visibility public Is Abstract false Parameter • inout iIndex : int

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

toCategorical

This method implements the method toCategorical of the binary column abstraction, converting the binary column into a categorical column.

Parameters:

• sName The name of the column. By default this property is set by the abstraction to the current name of the binary column.

• sFalseCategory The category representing the false binary value.
• sTrueCategory The category representing the true binary value.

Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String

• inout sFalseCategory : String • inout sTrueCategory : String

Relation Detail

Generalization

Name Related Element • ColumnImpl

Class CategoricalColumn

This class defines the abstraction of a categorical column, where every value belongs to a predefined category. Here the methods that provide specific operations on categorical data are defined.

Figure 26. Class CategoricalColumn

Name CategoricalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::CategoricalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Operation Detail

addCategory

This method calls the implementation to add a new category to the set of allowable values. Categories are included as objects of class String.

Parameter:

• szCategory The new category in the column

Type void Visibility public Is Abstract false Parameter • inout szCategory : String

CategoricalColumn

Constructor with the name of the column as a parameter. The implementation CategoricalColumnImpl is invoked.

Parameter:

• sName The name of the column

Type Visibility public Is Abstract false Parameter • inout sName : String

CategoricalColumn

Default constructor. The implementation CategoricalColumnImpl is invoked.

Type Visibility public Is Abstract false Parameter

getCategoryIndex

This method calls the implementation to return the index in the list of categories of a given string. The value -1 is returned if the value is not found.

Parameter:

• szCategory The string representing the category to be searched in the list of categories

Type int Visibility public Is Abstract false Parameter • inout szCategory : String

getCategoryList

This method calls the implementation to return the list of categories in the column.

Type List<Object> Visibility public Is Abstract false Parameter

getCategoryName

This method calls the implementation to return the category string stored in a given position of the list of categories. null is returned if the index given is not valid.

Parameter:

• iIndex The index of the wanted category

Type String Visibility public Is Abstract false Parameter • inout iIndex : Integer
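
The lookup contract of getCategoryIndex (-1 when not found) and getCategoryName (null when the index is not valid) can be sketched as follows. This is a simplified stand-in, not the library code: CategoryLookupSketch is a hypothetical class that keeps the categories in a plain list, so its indexes are list positions.

```java
import java.util.ArrayList;
import java.util.List;

class CategoryLookupSketch {
    private final List<String> categories = new ArrayList<>();

    // Adds the category unless it is already present.
    void addCategory(String szCategory) {
        if (!categories.contains(szCategory)) {
            categories.add(szCategory);
        }
    }

    // Returns the index of the category, or -1 if it is not found.
    int getCategoryIndex(String szCategory) {
        return categories.indexOf(szCategory);
    }

    // Returns the category stored at the index, or null if the index is not valid.
    String getCategoryName(Integer iIndex) {
        if (iIndex == null || iIndex < 0 || iIndex >= categories.size()) {
            return null;
        }
        return categories.get(iIndex);
    }

    public static void main(String[] args) {
        CategoryLookupSketch column = new CategoryLookupSketch();
        column.addCategory("red");
        column.addCategory("green");
        System.out.println(column.getCategoryIndex("green"));
        System.out.println(column.getCategoryName(7));
    }
}
```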

getElementIndex

This method calls the implementation to return the element stored in a given position in the column. The category index is returned, whereas the default method getElement (inherited from ColumnAbstraction) returns the category by name. If the value is invalid, -1 is returned.

Parameter:

• iPos The index of the item in the column

Exceptions:

• IndexOutOfBoundsException

Type Integer Visibility public Is Abstract false Parameter • in iPos : int

replaceCategory

This method calls the implementation to replace a given category with a new one.

Parameters:

• szOldCategory The category string to be replaced
• szNewCategory The new category string to be set
• bJoinCategory If the new category string already exists, this parameter determines whether the values of the old category are merged with the values that already belong to the existing category

1 is returned if the category is successfully replaced, or 0 otherwise.

Type int Visibility public Is Abstract false Parameter • in bJoinCategory : boolean

• inout szNewCategory : String • inout szOldCategory : String
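
One possible reading of replaceCategory is sketched below. ReplaceCategorySketch is a hypothetical class that stores values directly as strings. Two points are assumptions that this guide does not confirm: the replacement is refused (0 is returned) when the new category already exists and bJoinCategory is false, and replacing a category with itself is also refused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplaceCategorySketch {
    final List<String> categories = new ArrayList<>();
    final List<String> values = new ArrayList<>();

    // Returns 1 if the category is replaced, or 0 otherwise.
    int replaceCategory(String szOldCategory, String szNewCategory, boolean bJoinCategory) {
        if (!categories.contains(szOldCategory) || szOldCategory.equals(szNewCategory)) {
            return 0;                                  // nothing to replace (assumed behaviour)
        }
        boolean newExists = categories.contains(szNewCategory);
        if (newExists && !bJoinCategory) {
            return 0;                                  // assumption: no join means no change
        }
        if (newExists) {
            categories.remove(szOldCategory);          // join: only the existing category remains
        } else {
            categories.set(categories.indexOf(szOldCategory), szNewCategory);
        }
        // Relabel every value that used the old category.
        values.replaceAll(v -> szOldCategory.equals(v) ? szNewCategory : v);
        return 1;
    }

    public static void main(String[] args) {
        ReplaceCategorySketch column = new ReplaceCategorySketch();
        column.categories.addAll(Arrays.asList("cat", "dog", "feline"));
        column.values.addAll(Arrays.asList("cat", "dog", "feline", "cat"));
        System.out.println(column.replaceCategory("cat", "feline", true));
        System.out.println(column.values);
    }
}
```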

toBinary

This method calls the implementation to return a binary column generated from the categorical column. Invalid values remain unaltered.

Parameter:

• aReferenceTrueValues The list of category strings to be treated as true values

Type BinaryColumn Visibility public Is Abstract false Parameter • inout aReferenceTrueValues : List<String>
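
The conversion rule of toBinary can be sketched in plain Java as follows. CategoricalToBinarySketch and its INVALID marker are hypothetical; how the library actually flags invalid values is not described here, so the marker only serves to show that such values pass through unaltered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CategoricalToBinarySketch {

    // Hypothetical marker standing in for an invalid cell.
    static final Object INVALID = new Object();

    // Categories listed in aReferenceTrueValues become true, all others false.
    static List<Object> toBinary(List<Object> values, List<String> aReferenceTrueValues) {
        List<Object> result = new ArrayList<>();
        for (Object value : values) {
            if (value == INVALID) {
                result.add(INVALID);                               // left unaltered
            } else {
                result.add(aReferenceTrueValues.contains(value));  // boxed to Boolean
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.asList("yes", "no", "maybe", INVALID);
        List<Object> binary = toBinary(values, Arrays.asList("yes", "maybe"));
        System.out.println(binary.subList(0, 3));
    }
}
```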

toNominal

This method calls the implementation to return a nominal column generated from the strings stored in the categorical column. Nominal values are extracted from the strings representing each category.

Type NominalColumn Visibility public Is Abstract false Parameter

toNumerical

This method calls the implementation to return an integer column generated from the index values assigned to the categories in the source column.

Type IntegerColumn Visibility public Is Abstract false Parameter
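
The difference between toNominal (category strings) and toNumerical (category indexes) can be sketched as follows. CategoricalConversionSketch is a hypothetical class; it assumes that each stored value is a category index and that a map relates category strings to indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CategoricalConversionSketch {
    final Map<String, Integer> categoryIndexes = new LinkedHashMap<>();
    final List<Integer> data = new ArrayList<>();

    // Integer view: the stored category indexes themselves.
    List<Integer> toNumerical() {
        return new ArrayList<>(data);
    }

    // Nominal view: the category string behind each stored index.
    List<String> toNominal() {
        Map<Integer, String> names = new HashMap<>();
        for (Map.Entry<String, Integer> entry : categoryIndexes.entrySet()) {
            names.put(entry.getValue(), entry.getKey());
        }
        List<String> result = new ArrayList<>();
        for (Integer index : data) {
            result.add(names.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        CategoricalConversionSketch column = new CategoricalConversionSketch();
        column.categoryIndexes.put("low", 0);
        column.categoryIndexes.put("high", 3);   // indexes need not be consecutive
        column.data.add(0);
        column.data.add(3);
        System.out.println(column.toNominal());
        System.out.println(column.toNumerical());
    }
}
```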

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class CategoricalColumnImpl

This class provides the implementation code accessing real data in a categorical column. Categories are stored as a HashMap relating each category String to an Integer index. Internally, the data are stored as an ArrayList of Integer indexes, whereas their correspondences to categories are saved as String.

This class should never be directly invoked, apart from those invocations coming from its abstraction.

Figure 27. Class CategoricalColumnImpl

Name CategoricalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::CategoricalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specification above. Notice that values can be added either as a String (the category identifier) or as an Integer (the category index); see methods addValue and addAllValues. In both cases, only elements belonging to valid categories are added to the set of values in the column.

addCategory

This method implements the functionality of addCategory in the categorical column abstraction, adding a new category to the column. The category must not already exist. It returns the index of the new category if it is successfully created, or -1 if the category cannot be added.

Parameter:

• sCat The identifier of the new category

Type int Visibility public Is Abstract false Parameter • inout sCat : String
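
The storage scheme and the behaviour of addCategory and addValue can be sketched as follows. CategoricalStorageSketch is a hypothetical class, not the library code. The counter used to hand out indexes and the return codes 1 and 0 of addValue are assumptions; this guide only states that addCategory returns the new index or -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CategoricalStorageSketch {
    private final Map<String, Integer> categories = new HashMap<>();
    private final List<Integer> data = new ArrayList<>();
    private int nextIndex = 0;            // assumption: indexes come from a counter

    // Returns the index of the new category, or -1 if it cannot be added.
    int addCategory(String sCat) {
        if (sCat == null || categories.containsKey(sCat)) {
            return -1;
        }
        categories.put(sCat, nextIndex);
        return nextIndex++;
    }

    // Accepts a String identifier or an Integer index of a valid category.
    int addValue(Object oValue) {
        Integer index = null;
        if (oValue instanceof String) {
            index = categories.get(oValue);
        } else if (oValue instanceof Integer && categories.containsValue(oValue)) {
            index = (Integer) oValue;
        }
        if (index == null) {
            return 0;                     // assumption: 0 signals "not added"
        }
        data.add(index);
        return 1;                         // assumption: 1 signals "added"
    }

    // Returns the category index stored at the position.
    Integer getElementIndex(int iPos) {
        return data.get(iPos);            // throws IndexOutOfBoundsException if out of range
    }

    // Returns the category name stored at the position.
    Object getElement(int iPos) {
        Integer index = data.get(iPos);
        for (Map.Entry<String, Integer> entry : categories.entrySet()) {
            if (entry.getValue().equals(index)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CategoricalStorageSketch column = new CategoricalStorageSketch();
        column.addCategory("small");
        column.addCategory("large");
        column.addValue("large");
        column.addValue(0);
        System.out.println(column.getElement(0) + " / " + column.getElementIndex(0));
    }
}
```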

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in bForce : boolean

• inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

CategoricalColumnImpl

Default constructor.

Type Visibility public Is Abstract false Parameter

countEmptyValues

Type int Visibility public Is Abstract false Parameter

countInvalidValues

Type int Visibility public Is Abstract false Parameter

countMissingValues

Type int Visibility public Is Abstract false Parameter

countNullValues

Type int Visibility public Is Abstract false Parameter

getCategoryIndex

This method implements the functionality of getCategoryIndex in the column abstraction, returning the index of the category passed as String, or -1 if the category does not exist in the list of categories of the column.

Parameter:

• sCategory The category identifier

Type int Visibility public Is Abstract false Parameter • inout sCategory : String

getCategoryList

This method implements the functionality of getCategoryList in the column abstraction, returning the list of category identifiers comprised by the category list. The resulting list is not sorted.

Type List<Object> Visibility public Is Abstract false Parameter

getCategoryName

This method implements the functionality of getCategoryName in the column abstraction, returning the identifier of the category whose index is passed as parameter. If the category does not exist, then null is returned.

Parameter:

• iIndex The category index

Type String Visibility public Is Abstract false Parameter • inout iIndex : Integer

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getElementIndex

This method implements the functionality of getElementIndex in the column abstraction, returning the category index stored at a given position. Notice that indexes in the category list need not be sorted or sequential, since categories may be successively created and deleted, causing gaps in the index sequence. Always treat category indexes as numerical identifiers, never as sequential positions.

This method returns -1 if the position given is invalid.

Parameter:

• iPos The position given in the category list.

Exceptions:

• IndexOutOfBoundsException

Type Integer Visibility public Is Abstract false Parameter • in iPos : int

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

removeValue

Type void Visibility public Is Abstract false Parameter • in iIndex : int

replaceCategory

This method implements the functionality of replaceCategory in the column abstraction, updating both the category list and replacing the values in the column. 1 is returned if done; 0, otherwise.

Parameters:

• sOldCategory The old category identifier to be replaced
• sNewCategory The new category
• bJoinCategory If true and the new category identifier already exists in the column, the values with the old category identifier are joined to the existing identifier, leaving only one category as a result

Type int Visibility public Is Abstract false Parameter • in bJoinCategory : boolean

• inout sNewCategory : String • inout sOldCategory : String

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

toBinary

This method implements the functionality of toBinary in the column abstraction, returning a binary column constructed from the data contained in the categorical column. The list of category identifiers considered as true values in the binary column is passed as a parameter. Category identifiers not included in the list are considered as false values. Note that invalid values are preserved, as stated for toBinary in the abstraction.

Parameters:

• aReferenceTrueValues The list of categories representing true values
• sName The name of the new binary column

Type BinaryColumn Visibility public Is Abstract false Parameter • inout aReferenceTrueValues : List<String>

• inout sName : String

toNominal

This method implements the functionality of toNominal in the column abstraction, returning a nominal column constructed from the data contained in the categorical column. Strings for the nominal column are constructed from the category identifiers.

Parameter:

• sName The name of the new nominal column

Type NominalColumn Visibility public Is Abstract false Parameter • inout sName : String

toNumerical

This method implements the functionality of toNumerical in the column abstraction, returning an integer column constructed from the data contained in the categorical column. Numbers of the integer column are extracted from the category indexes.

Parameter:

• sName The name of the new integer column

Type IntegerColumn Visibility public Is Abstract false Parameter • inout sName : String

Relation Detail

Generalization

Name Related Element • RangeColumnImpl

Name Related Element • ColumnImpl

Class DateColumn

This class represents the abstraction of a date datatype column. This type of column is specifically required by ARFF datasets.

Figure 28. Class DateColumn

Name DateColumn Qualified Name es::uco::kdis::datapro::dataset::Column::DateColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Operation Detail

addDateSpecification

This method calls the implementation to set the date format specification of the values in the column.

Parameter:

• oDate The format specification of the values in the date column

Type void Visibility public Is Abstract false Parameter • inout oDate : SimpleDateFormat
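
The role of the date format specification can be sketched with the standard java.text.SimpleDateFormat class. DateColumnSketch is a hypothetical stand-in for the date column; its addValue(String) method, the format helper and the return codes 1 and 0 are assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class DateColumnSketch {
    private SimpleDateFormat dateSpecification;
    private final List<Date> values = new ArrayList<>();

    // Sets the format used to read and write the values of the column.
    void addDateSpecification(SimpleDateFormat oDate) {
        this.dateSpecification = oDate;
    }

    SimpleDateFormat getDateSpecification() {
        return dateSpecification;
    }

    // Parses the text with the current format; 1 if stored, 0 otherwise (assumed codes).
    int addValue(String text) {
        if (dateSpecification == null) {
            return 0;
        }
        try {
            values.add(dateSpecification.parse(text));
            return 1;
        } catch (ParseException e) {
            return 0;
        }
    }

    // Formats the stored value back to text using the same specification.
    String format(int iPos) {
        return dateSpecification.format(values.get(iPos));
    }

    public static void main(String[] args) {
        DateColumnSketch column = new DateColumnSketch();
        column.addDateSpecification(new SimpleDateFormat("yyyy-MM-dd"));
        column.addValue("2012-07-15");
        System.out.println(column.format(0));
    }
}
```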

DateColumn

Default constructor with no parameters. The implementation DateColumnImpl is invoked.

Type Visibility public Is Abstract false Parameter

DateColumn

Constructor with the name of the column as a parameter. The implementation DateColumnImpl is invoked.

Parameter:

• sName The name of the column

Type Visibility public Is Abstract false Parameter • inout sName : String

getDateSpecification

This method calls the implementation to get the date format specification of the values in the column.

Type SimpleDateFormat Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class DateColumnImpl

This class provides the implementation code accessing real data in a date column. Values are stored as Date objects according to the format specified by a given SimpleDateFormat object. This class should not be invoked directly, only by the column abstraction.

Figure 29. Class DateColumnImpl

Name DateColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::DateColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>

addDateSpecification

This method implements the method addDateSpecification of the date column abstraction, setting the date format specification of the values in the column.

Parameter:

• oDate The format specification of the values in the date column

Type void Visibility public Is Abstract false Parameter • inout oDate : SimpleDateFormat

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in bForce : boolean

• inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

DateColumnImpl

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

getDateSpecification

This method implements the method getDateSpecification of the column abstraction, returning the date format specification of the values in the column.

Type SimpleDateFormat Visibility public Is Abstract false Parameter

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

removeValue

Type void Visibility public Is Abstract false Parameter • in iIndex : int

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

Relation Detail

Generalization

Name Related Element • ColumnImpl

Class IntegerColumn

This class represents the abstraction of an integer column. Integer columns are a specialization of numerical (real) columns.

Figure 30. Class IntegerColumn

Name IntegerColumn Qualified Name es::uco::kdis::datapro::dataset::Column::IntegerColumn Visibility public Abstract false Base Classifier • NumericalColumn Realized Interface

Operation Detail

Many methods are specializations of their respective methods in the numerical column (NumericalColumn), adapted to the domain of integer values.

getiMaxInterval

Analogously to getdMaxInterval in the NumericalColumn abstraction class, this method gets the maximum integer value allowed for this column.

Type Integer Visibility public Is Abstract false Parameter

getiMinInterval

Analogously to getdMinInterval in the NumericalColumn abstraction class, this method gets the minimum integer value allowed for this column.

Type Integer Visibility public Is Abstract false Parameter

getMaxValue

See getMaxValue in the specification of the NumericalColumn abstraction class.

Type double Visibility public Is Abstract false Parameter

getMinValue

For further information, see getMinValue in the specification of the NumericalColumn abstraction class.

Type double Visibility public Is Abstract false Parameter

IntegerColumn

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

IntegerColumn

Constructor with the name of the resulting column as a parameter.

Parameter:

• sName The name of the column

Type Visibility public Is Abstract false Parameter • inout sName : String

mean

For further information, see mean in the specification of the NumericalColumn abstraction class.

Type double Visibility public Is Abstract false Parameter

setiMaxInterval

Analogously to setdMaxInterval in the NumericalColumn abstraction class, this method sets the maximum integer value allowed for this column.

Parameter:

• iMaxInterval The maximum value allowed in the column

Exceptions:

• IllegalAccessException if the value cannot be set.

Type void Visibility public Is Abstract false Parameter • inout iMaxInterval : Integer

setiMinInterval

Analogously to setdMinInterval in the NumericalColumn abstraction class, this method sets the minimum integer value allowed for this column.

Parameter:

• iMinInterval The minimum value allowed in the column

Exceptions:

• IllegalAccessException if the value cannot be set.

Type void Visibility public Is Abstract false Parameter • inout iMinInterval : Integer
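
The interval setters can be pictured with the sketch below. IntegerIntervalSketch is a hypothetical class. This guide only says that IllegalAccessException is thrown if the value cannot be set, so the condition used here (a minimum above the current maximum, or a maximum below the current minimum) and the accepts helper are assumptions.

```java
class IntegerIntervalSketch {
    private Integer iMinInterval;     // null means that no lower bound is set
    private Integer iMaxInterval;     // null means that no upper bound is set

    Integer getiMinInterval() { return iMinInterval; }
    Integer getiMaxInterval() { return iMaxInterval; }

    void setiMinInterval(Integer iMin) throws IllegalAccessException {
        // Assumed rule: the minimum may not exceed the current maximum.
        if (iMin != null && iMaxInterval != null && iMin > iMaxInterval) {
            throw new IllegalAccessException("minimum exceeds current maximum");
        }
        iMinInterval = iMin;
    }

    void setiMaxInterval(Integer iMax) throws IllegalAccessException {
        // Assumed rule: the maximum may not fall below the current minimum.
        if (iMax != null && iMinInterval != null && iMax < iMinInterval) {
            throw new IllegalAccessException("maximum is below current minimum");
        }
        iMaxInterval = iMax;
    }

    // Hypothetical helper: checks a value against the bounds that are set.
    boolean accepts(int value) {
        return (iMinInterval == null || value >= iMinInterval)
            && (iMaxInterval == null || value <= iMaxInterval);
    }

    public static void main(String[] args) throws IllegalAccessException {
        IntegerIntervalSketch column = new IntegerIntervalSketch();
        column.setiMinInterval(0);
        column.setiMaxInterval(10);
        System.out.println(column.accepts(5));
        System.out.println(column.accepts(11));
    }
}
```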

standardDeviation

For further information, see standardDeviation in the specification of the NumericalColumn abstraction class.

Type double Visibility public Is Abstract false Parameter

toCategorical

This method calls the implementation to return a categorical column using the values contained in the integer column, where each different value constitutes a different category.

Type CategoricalColumn Visibility public Is Abstract false Parameter

toNumerical

This method calls the implementation to return a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

Type NumericalColumn Visibility public Is Abstract false Parameter
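
Both integer conversions can be sketched on plain lists. IntegerConversionSketch is a hypothetical class; representing each category as the decimal string of the integer, keeping first-seen order and skipping null values are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IntegerConversionSketch {

    // toCategorical idea: each distinct integer value yields one category.
    static List<String> categoriesOf(List<Integer> values) {
        Set<String> categories = new LinkedHashSet<>();
        for (Integer value : values) {
            if (value != null) {
                categories.add(String.valueOf(value));
            }
        }
        return new ArrayList<>(categories);
    }

    // toNumerical idea: each integer value is converted to a double value.
    static List<Double> toNumerical(List<Integer> values) {
        List<Double> result = new ArrayList<>();
        for (Integer value : values) {
            result.add(value == null ? null : Double.valueOf(value.doubleValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 3, 2);
        System.out.println(categoriesOf(values));
        System.out.println(toNumerical(values));
    }
}
```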

Relation Detail

Generalization

Name Related Element • NumericalColumn

Class IntegerColumnImpl

This class provides the implementation code accessing real data in an integer column. This class is a specialization of the numerical column implementation (NumericalColumnImpl). Integer values are stored as objects of class Integer. This class and its methods should not be invoked directly.

Figure 31. Class IntegerColumnImpl

Name IntegerColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::IntegerColumnImpl Visibility public Abstract false Base Classifier • NumericalColumnImpl Realized Interface

Operation Detail

For further information, see the complete specification of these methods in NumericalColumnImpl and ColumnImpl.

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

getMaxValue

Type double Visibility public Is Abstract false Parameter

getMinValue

Type double Visibility public Is Abstract false Parameter

IntegerColumnImpl

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

mean

Type double Visibility public Is Abstract false Parameter

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

standardDeviation

Type double Visibility public Is Abstract false Parameter

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column using the values contained in the integer column, where each different value constitutes a different category.

Parameter:

• sName The name of the resulting column

Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column using the values contained in the integer column, where each integer value is cast to a double value.

Parameter:

• sName The name of the resulting column

Type NumericalColumn Visibility public Is Abstract false Parameter • inout sName : String

Relation Detail

Generalization

Name Related Element • NumericalColumnImpl

Class NominalColumn

This class represents the abstraction of a nominal column containing free-style strings as values. It defines the methods that provide operations specific to nominal values.

Figure 32. Class NominalColumn

Name NominalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::NominalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Operation Detail

NominalColumn

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

NominalColumn

Constructor with the name of the column as parameter.

Parameter:

• sName Name of the column

Type Visibility public Is Abstract false Parameter • inout sName : String

toCategorical

This method calls the implementation to return a categorical column, where each different string is a category (no repetition).

Type CategoricalColumn Visibility public Is Abstract false Parameter

toNumerical

This method calls the implementation to return a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

Type NumericalColumn Visibility public Is Abstract false Parameter
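
The two nominal conversions can be sketched on plain lists. NominalConversionSketch is a hypothetical class. Using null as a stand-in for the empty value, keeping categories in first-seen order and parsing with Double.valueOf are assumptions of the example; this guide does not describe the library's parsing rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class NominalConversionSketch {

    // toCategorical idea: each different string is one category (no repetition).
    static List<String> categoriesOf(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // toNumerical idea: parse each string; unparseable strings yield an empty value.
    static List<Double> toNumerical(List<String> values) {
        List<Double> result = new ArrayList<>();
        for (String value : values) {
            Double parsed = null;             // null stands in for the empty value
            if (value != null) {
                try {
                    parsed = Double.valueOf(value);
                } catch (NumberFormatException e) {
                    parsed = null;            // keep the empty stand-in
                }
            }
            result.add(parsed);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(categoriesOf(Arrays.asList("a", "b", "a")));
        System.out.println(toNumerical(Arrays.asList("1.5", "abc", "42")));
    }
}
```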

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class NominalColumnImpl

This class provides the implementation code accessing real data in the nominal column. Nominal values are stored as String objects. Note that these methods should not be invoked directly.

Figure 33. Class NominalColumnImpl

Name NominalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::NominalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

For a more detailed specification of the methods inherited from ColumnImpl, see its specification above.

addAllValues

Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

countEmptyValues

Type int Visibility public Is Abstract false Parameter

countInvalidValues

Type int Visibility public Is Abstract false Parameter

countMissingValues

Type int Visibility public Is Abstract false Parameter

countNullValues

Type int Visibility public Is Abstract false Parameter

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

NominalColumnImpl

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

removeValue

Type void Visibility public Is Abstract false Parameter • in iIndex : int

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column, where each different string is a category (no repetition).

Parameter:

• sName The name of the column to be created

Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column, where each string is parsed to a numerical value. If the string could not be parsed, then an empty value is added instead. Notice that upper and lower interval values are not set for the numerical column returned.

Parameter:

• sName The name of the column

Type NumericalColumn Visibility public Is Abstract false Parameter • inout sName : String

Relation Detail

Generalization

Name Related Element • ColumnImpl

Class NumericalColumn This class represents the abstraction of a numerical (real) column.

Figure 34. Class NumericalColumn

Name NumericalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::NumericalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Attribute Detail

dMaxInterval

This attribute indicates the maximum value allowed in the column. This property should be accessed using getter/setter methods.

Type Double Default Value Double.MAX_VALUE Visibility protected Multiplicity

dMinInterval

This attribute indicates the minimum value allowed in the column. This property should be accessed using getter/setter methods.

Type Double Default Value Double.MIN_VALUE Visibility protected Multiplicity
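
A note on the documented default: in Java, Double.MIN_VALUE is the smallest positive double (about 4.9e-324), not the most negative one, which is -Double.MAX_VALUE. How datapro4j interprets the default bounds internally is not stated in this guide, so the following plain-Java sketch only illustrates the language behaviour. The helper isWithin is hypothetical.

```java
// Plain Java; no datapro4j classes are involved.
final class IntervalDefaultsSketch {

    // Hypothetical helper: true when value lies inside [min, max].
    static boolean isWithin(double value, double min, double max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        // Double.MIN_VALUE is positive, so this comparison holds.
        System.out.println(Double.MIN_VALUE > 0.0);
        // With the documented defaults used as bounds, a negative value falls outside.
        System.out.println(isWithin(-5.0, Double.MIN_VALUE, Double.MAX_VALUE));
        // -Double.MAX_VALUE is the most negative finite double.
        System.out.println(isWithin(-5.0, -Double.MAX_VALUE, Double.MAX_VALUE));
    }
}
```

If a column may hold negative values, it may be safer to set the minimum interval explicitly through setdMinInterval rather than rely on the default.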

Operation Detail

getdMaxInterval

This method returns the maximum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

Type Double Visibility public Is Abstract false Parameter

getdMinInterval

This method returns the minimum real value allowed in the column. The implementation is not invoked, since this information is part of metadata.

Type Double Visibility public Is Abstract false Parameter

getMaxValue

This method calls the implementation to get the maximum existing value in the column data.

Type double Visibility public Is Abstract false Parameter

getMinValue

This method calls the implementation to get the minimum existing value in the column data.

Type double Visibility public Is Abstract false Parameter

mean

This method calls the implementation to get the mean value of the column data.

Type double Visibility public Is Abstract false Parameter

normalize

This method calls the implementation to normalize the set of values in the numerical column.

Type void Visibility public Is Abstract false Parameter
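
This guide does not state which formula normalize applies. A common choice is min-max rescaling to the interval [0, 1], and the sketch below assumes exactly that. The class MinMaxNormalizeSketch is hypothetical and works on a plain array rather than on a datapro4j column.

```java
import java.util.Arrays;

// Hypothetical illustration; assumes min-max rescaling to [0, 1].
final class MinMaxNormalizeSketch {

    static double[] normalize(double[] values) {
        if (values.length == 0) {
            return new double[0];
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double span = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has no spread; map every value to 0.
            out[i] = (span == 0.0) ? 0.0 : (values[i] - min) / span;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {2.0, 4.0, 6.0})));
    }
}
```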

NumericalColumn

Default constructor with no parameters. The implementation NumericalColumnImpl is invoked.

Type Visibility public Is Abstract false Parameter

NumericalColumn

Constructor with the name of the column as a parameter. The implementation NumericalColumnImpl is invoked.

Parameter:

• sName The name of the column

Type Visibility public Is Abstract false Parameter • inout sName : String

setdMaxInterval

This method sets the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

• dMaxInterval The maximum value allowed

Exceptions:

• IllegalAccessException if the value cannot be set

Type void Visibility public Is Abstract false Parameter • inout dMaxInterval : Double

setdMinInterval

This method sets the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.

Parameter:

• dMinInterval The minimum value allowed

Exceptions:

• IllegalAccessException if the value cannot be set

Type void Visibility public Is Abstract false Parameter • inout dMinInterval : Double

standardDeviation

This method calls the implementation to return the standard deviation calculated from the set of values in the numerical column.

Type double Visibility public Is Abstract false Parameter

standarize

This method calls the implementation to standardize the set of values in the numerical column.

Parameters:

• dMean Value of the mean used to standardize the set of values of the column
• dVariance Value of the variance used for the standardization

Type void Visibility public Is Abstract false Parameter • in dMean : double

• in dVariance : double
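
The operation receives a mean and a variance, which suggests the usual z-score transformation, z = (x - mean) / sqrt(variance). The guide does not spell the formula out, so the sketch below is an assumption. The class StandardizeSketch is hypothetical and operates on a plain array.

```java
import java.util.Arrays;

// Hypothetical illustration; assumes the z-score formula.
final class StandardizeSketch {

    static double[] standardize(double[] values, double mean, double variance) {
        if (variance <= 0.0) {
            throw new IllegalArgumentException("variance must be positive: " + variance);
        }
        double stdDev = Math.sqrt(variance);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - mean) / stdDev;
        }
        return out;
    }

    public static void main(String[] args) {
        // Mean 2 and variance 4 are supplied by the caller, as in the operation above.
        System.out.println(Arrays.toString(standardize(new double[] {1.0, 2.0, 3.0}, 2.0, 4.0)));
    }
}
```

Passing the mean and variance explicitly lets a caller reuse statistics computed elsewhere, for instance on a training set, instead of those of the column being transformed.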

toInteger

This method calls the implementation to return an integer column containing values extracted from the numerical column. It returns an IntegerColumn object.

Parameter:

• bRoundedValue if false, values are truncated; if true, values are rounded.

Type IntegerColumn Visibility public Is Abstract false Parameter • in bRoundedValue : boolean
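
The difference between the two settings of bRoundedValue can be shown with plain Java arithmetic. This sketch assumes truncation toward zero (a cast to int) and rounding with Math.round. Whether the library uses these exact operations is not stated here, and the class ToIntegerSketch is hypothetical.

```java
// Hypothetical illustration of truncation versus rounding.
final class ToIntegerSketch {

    static int toInt(double value, boolean roundedValue) {
        // Math.round rounds to the nearest integer; the cast drops the fraction.
        return roundedValue ? (int) Math.round(value) : (int) value;
    }

    public static void main(String[] args) {
        System.out.println(toInt(2.7, false));
        System.out.println(toInt(2.7, true));
        System.out.println(toInt(-2.7, false));
        System.out.println(toInt(-2.7, true));
    }
}
```

Under these assumptions, 2.7 becomes 2 when truncated and 3 when rounded, while -2.7 becomes -2 when truncated and -3 when rounded.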

toNominal

This method calls the implementation to return a nominal column, where strings are constructed from real values.

Type NominalColumn Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class NumericalColumnImpl This class provides the implementation code accessing real data in a numerical column. Values are stored as objects of the class Double. Notice that this class should not be instantiated directly, except by its abstraction.

Figure 35. Class NumericalColumnImpl

Name NumericalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::NumericalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail All the attributes are either private or protected.

Operation Detail

addAllValues

Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

countEmptyValues

Type int Visibility public Is Abstract false Parameter

countInvalidValues

Type int Visibility public Is Abstract false Parameter

countMissingValues

Type int Visibility public Is Abstract false Parameter

countNullValues

Type int Visibility public Is Abstract false Parameter

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getMaxValue

This method implements the method getMaxValue of the abstraction class, returning the maximum existing value in the column.

Type double Visibility public Is Abstract false Parameter

getMinValue

This method implements the method getMinValue of the abstraction class, returning the minimum existing value in the column.

Type double Visibility public Is Abstract false Parameter

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

mean

This method implements the method mean of the abstraction class, returning the mean value of the column.

Type double Visibility public Is Abstract false Parameter

normalize

This method implements the method normalize of the abstraction class, normalizing the set of values contained in the column.

Type void Visibility public Is Abstract false Parameter

NumericalColumnImpl

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

removeValue

Type void Visibility public Is Abstract false Parameter • in iIndex : int

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

standardDeviation

This method implements the method standardDeviation of the abstraction class, returning the standard deviation value of the set of values contained in the numerical column.

Type double Visibility public Is Abstract false Parameter

standarize

This method implements the method standarize of the abstraction class, standardizing the values in the column according to the mean and variance passed as parameter.

Parameters:

• dMean Mean value considered for the standardization
• dVariance Variance value considered for the standardization

Type void Visibility public Is Abstract false Parameter • in dMean : double

• in dVariance : double

toInteger

This method implements the method toInteger of the abstraction class, returning an integer column calculated from the numerical column.

Parameters:

• sName The name of the resulting new column
• bRoundedValue If false, values are truncated; if true, values are rounded

Type IntegerColumn Visibility public Is Abstract false Parameter • in bRoundedValue : boolean

• inout sName : String

toNominal

This method implements the method toNominal of the abstraction class, returning a nominal column whose strings are constructed from the numerical values in the column.

Parameter:

• sName The name of the resulting new column

Type NominalColumn Visibility public Is Abstract false Parameter • inout sName : String

Relation Detail

Generalization

Name Related Element • ColumnImpl

Class RangeColumn This class represents the abstraction of a range column, whose values are intervals with a minimum and a maximum value in the range.

Figure 36. Class RangeColumn

Name RangeColumn Qualified Name es::uco::kdis::datapro::dataset::Column::RangeColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface

Operation Detail

RangeColumn

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

RangeColumn

Constructor with the name of the column as a parameter.

Parameter:

• sName The name of the column.

Type Visibility public Is Abstract false Parameter • inout sName : String

toCategorical

This method calls the implementation to return a categorical column extracted from the range data contained in the column. The method returns a CategoricalColumn object.

Exceptions:

• NotAddedValueException

Type CategoricalColumn Visibility public Is Abstract false Parameter

toNumerical

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

It returns the resulting NumericalColumn object.

Parameter:

• iMode An integer between 0 and 3 indicating the conversion mode, as described above.

Exceptions:

• NotAddedValueException

Type NumericalColumn Visibility public Is Abstract false Parameter • inout iMode : int
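
The four conversion modes can be sketched for a single [min, max] range as follows. The class RangeToNumberSketch and its pick method are hypothetical and independent of datapro4j. Rejecting an unknown mode with IllegalArgumentException, and drawing the random value uniformly, are assumptions of this sketch rather than documented library behaviour.

```java
import java.util.Random;

// Hypothetical illustration of the four conversion modes.
final class RangeToNumberSketch {

    static double pick(double min, double max, int mode, Random random) {
        switch (mode) {
            case 0:
                return min;                   // minimum of the range
            case 1:
                return max;                   // maximum of the range
            case 2:
                return (min + max) / 2.0;     // mean of min and max
            case 3:
                // Uniform draw in [min, max); the distribution is an assumption.
                return min + random.nextDouble() * (max - min);
            default:
                throw new IllegalArgumentException("mode must be between 0 and 3: " + mode);
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int mode = 0; mode <= 3; mode++) {
            System.out.println(mode + " -> " + pick(2.0, 6.0, mode, random));
        }
    }
}
```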

toNumericalByGaussian

This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to the Gauss distribution.

Parameters:

• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution

It returns the resulting NumericalColumn object.

Exceptions:

• NotAddedValueException

Type NumericalColumn Visibility public Is Abstract false Parameter • in dMean : double

• in dStdDev : double

Relation Detail

Generalization

Name Related Element • ColumnAbstraction

Class RangeColumnImpl This class provides the implementation code accessing real data in a range column (i.e. a representation of a [min, max] interval). Programmers should use the abstraction RangeColumn instead, since it hides the actual implementation of the column: even when the implementation changes, the abstraction must remain unaltered.

Figure 37. Class RangeColumnImpl

Name RangeColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::RangeColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface

Attribute Detail All attributes are private.

Operation Detail For a detailed specification of the methods inherited from ColumnImpl, see its specifications above.

addAllValues

Type int Visibility public Is Abstract false Parameter • inout rgoValues : List<Object>

addValue

Type int Visibility public Is Abstract false Parameter • inout oValue : Object

addValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

countEmptyValues

Type int Visibility public Is Abstract false Parameter

countInvalidValues

Type int Visibility public Is Abstract false Parameter

countMissingValues

Type int Visibility public Is Abstract false Parameter

countNullValues

Type int Visibility public Is Abstract false Parameter

getElement

Type Object Visibility public Is Abstract false Parameter • in iPos : int

getSize

Type int Visibility public Is Abstract false Parameter

getValues

Type List<Object> Visibility public Is Abstract false Parameter

RangeColumnImpl

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

RangeColumnImpl

Constructor with the name of the column as a parameter.

Parameter:

• sName The name of the column.

Type Visibility public Is Abstract false Parameter • inout sName : String

removeValue

Type void Visibility public Is Abstract false Parameter • in iIndex : int

setValue

Type int Visibility public Is Abstract false Parameter • in iIndex : int

• inout oValue : Object

toCategorical

This method implements the method toCategorical of the abstraction, returning a categorical column extracted from the range data contained in the column. The method returns the resulting CategoricalColumn object.

Exceptions:

• NotAddedValueException

Type CategoricalColumn Visibility public Is Abstract false Parameter

toNumerical

This method implements the method toNumerical of the abstraction, returning a numerical column extracted from the range values contained in the column, according to one of the following modes:

0: The minimum value of each range is selected.

1: The maximum value of each range is selected.

2: The mean value between min and max is selected.

3: A random value in the range is selected.

It returns the resulting NumericalColumn object.

Parameter:

• iMode An integer between 0 and 3 indicating the conversion mode, as described above.

Exceptions:

• NotAddedValueException

Type NumericalColumn Visibility public Is Abstract false Parameter • in iMode : int

toNumericalByGaussian

This method implements the method toNumericalByGaussian of the abstraction, returning a numerical column extracted from the range values contained in the column, according to the Gauss distribution.

Parameters:

• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution

It returns the resulting NumericalColumn object.

Exceptions:

• NotAddedValueException

Type NumericalColumn Visibility public Is Abstract false Parameter • in dMean : double

• in dStdDev : double

Relation Detail

Generalization

Name Related Element • ColumnImpl

Package es::uco::kdis::datapro::dataset::Source

Figure 38. Package es.uco.kdis.datapro.dataset.Source

Name Source Qualified Name es::uco::kdis::datapro::dataset::Source

Class ArffDataset ArffDataset implements the ARFF (Attribute-Relation File Format) dataset file specification, as used by Weka. This is a subclass of FileDataset.

ARFF files are ASCII text files that describe a list of instances sharing a set of attributes. After a few heading lines, where the metainformation is presented, one instance per line is dumped, until the end of the file is reached.

Types of attribute in ARFF dataset files:

• @ATTRIBUTE name numeric (As numerical columns)
• @ATTRIBUTE name {value1, value2, ...} (As categorical columns)
• @ATTRIBUTE name string (As nominal columns)
• @ATTRIBUTE name date "yyyy-MM-dd HH:mm:ss" (As date columns)

For a further description, visit the web site http://www.cs.waikato.ac.nz/ml/weka/arff.html (Nov. 1st, 2008).
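
The mapping from attribute declarations to column types listed above can be sketched as a small classifier. This is not the ArffDataset parser: the class ArffAttributeSketch and its method columnKindOf are hypothetical, only the four declaration forms listed above are recognised, and attribute names are assumed to be unquoted and free of spaces.

```java
import java.util.Locale;

// Hypothetical illustration; not the ArffDataset implementation.
final class ArffAttributeSketch {

    // Returns "numerical", "categorical", "nominal" or "date".
    static String columnKindOf(String declaration) {
        String line = declaration.trim();
        String keyword = "@attribute";
        // ARFF keywords are case-insensitive.
        if (!line.toLowerCase(Locale.ROOT).startsWith(keyword)) {
            throw new IllegalArgumentException("Not an attribute declaration: " + declaration);
        }
        // Simplification: the name is unquoted and contains no whitespace.
        String[] nameAndType = line.substring(keyword.length()).trim().split("\\s+", 2);
        if (nameAndType.length < 2) {
            throw new IllegalArgumentException("Missing attribute type: " + declaration);
        }
        String type = nameAndType[1].trim().toLowerCase(Locale.ROOT);
        if (type.startsWith("{")) {
            return "categorical";
        }
        if (type.startsWith("numeric")) {
            return "numerical";
        }
        if (type.startsWith("string")) {
            return "nominal";
        }
        if (type.startsWith("date")) {
            return "date";
        }
        throw new IllegalArgumentException("Unsupported attribute type: " + nameAndType[1]);
    }

    public static void main(String[] args) {
        System.out.println(columnKindOf("@ATTRIBUTE temperature numeric"));
        System.out.println(columnKindOf("@ATTRIBUTE outlook {sunny, overcast, rainy}"));
    }
}
```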

Figure 39. Class ArffDataset

Name ArffDataset Qualified Name es::uco::kdis::datapro::dataset::Source::ArffDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface

Attribute Detail Some attributes are protected to allow reusability by inheritance.

ATTRIBUTE

ATTRIBUTE is the static constant string for the ARFF keyword '@attribute'.

Type String Default Value "@attribute" Visibility protected Multiplicity

DATA

DATA is the static constant string for the ARFF keyword '@data'. It defines the beginning of the data block in the ARFF file.

Type String Default Value "@data" Visibility protected Multiplicity

RELATION

RELATION is the static constant with the ARFF keyword '@relation'. It represents the beginning of the ARFF dataset definition.

Type String Default Value "@relation" Visibility protected Multiplicity

Operation Detail

addAllValues

This method reads the DATA block in the dataset and adds the values in the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)

For example, the string “cbbf%%d” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column, two binary columns, and a numerical column. The following two attributes are omitted. Finally, the date attribute is copied.

Exceptions:

• IndexOutOfBoundsException
• IOException
• NotAddedValueException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
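
To make the column format string concrete, the following sketch decodes a string such as "cbbf%%d" into the sequence of column kinds it denotes. The class ColumnFormatSketch is hypothetical and independent of ArffDataset. Rejecting unknown symbols with IllegalArgumentException is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the ARFF column format symbols.
final class ColumnFormatSketch {

    static List<String> decode(String columnFormat) {
        List<String> kinds = new ArrayList<>();
        for (char symbol : columnFormat.toCharArray()) {
            switch (symbol) {
                case 's': kinds.add("nominal"); break;
                case 'f': kinds.add("numerical"); break;
                case 'c': kinds.add("categorical"); break;
                case 'b': kinds.add("binary"); break;
                case 'd': kinds.add("date"); break;
                case '%': kinds.add("skip"); break;
                default:
                    throw new IllegalArgumentException("Unknown column symbol: " + symbol);
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(decode("cbbf%%d"));
    }
}
```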

ArffDataset

Default constructor with no parameters. No dataset filename is specified using this constructor.

Type Visibility public Is Abstract false Parameter

ArffDataset

Constructor with the filename of the dataset as a parameter.

Parameter:

• sFileName The filename of the dataset

Type Visibility public Is Abstract false Parameter • inout sFileName : String

close

This method closes the ARFF file.

Exception:

• IOException

Type void Visibility protected Is Abstract false Parameter

obtainMetadata

This method reads the metadata of an ARFF file. Each attribute specification is interpreted and, if required, the column structure is created in the dataset.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)

For example, the code "bbf%c" indicates that two binary columns and a numerical (real) column will be read. Then, the fourth attribute will be skipped and, finally, a categorical column will be read.

Exceptions:

• IOException
• InputMismatchException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String

open

This method opens the dataset file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type void Visibility protected Is Abstract false Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameters:

• sContentFormat Not considered for ARFF datasets
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column

Exceptions:

• NotAddedValueException
• IOException
• IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String

• inout sContentFormat : String

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column

Exceptions:

• NotAddedValueException
• IOException
• IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. The value of the column format string is null.

Exceptions:

• NotAddedValueException
• IOException
• IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter

writeDataset

This method opens the dataset file, writes metadata and instances, and closes the file. The column types accepted (otherwise, an InputMismatchException exception is thrown) are the following:

• Numerical
• Date
• Nominal
• Categorical
• Boolean (binary values are saved as categorical values)

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• InputMismatchException
• IOException

Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String

Relation Detail

Generalization

Name Related Element • FileDataset

Class CsvDataset CsvDataset implements the CSV (Comma-Separated Values) dataset file specification, as prescribed by the IETF specification, available from http://tools.ietf.org/html/rfc4180 (October, 2005).

Figure 40. Class CsvDataset

Name CsvDataset Qualified Name es::uco::kdis::datapro::dataset::Source::CsvDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface
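
As a reminder of what RFC 4180 requires from a reader, the sketch below splits a single CSV record into fields, honouring double-quoted fields, embedded commas and doubled quotes. It is a simplified, hypothetical helper (CsvRecordSketch.splitRecord), not the CsvDataset implementation, and it does not handle line breaks inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of RFC 4180 field splitting for one record.
final class CsvRecordSketch {

    static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');   // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;               // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());        // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("a,\"b,c\",\"say \"\"hi\"\"\""));
    }
}
```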

Operation Detail

addAllValues

This method adds all the values in the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)

For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the nominal attribute is copied.

Exceptions:

• IndexOutOfBoundsException
• IOException
• NotAddedValueException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
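
The effect of the skip symbol on one CSV record can be sketched as follows: fields whose position carries '%' in the column format are dropped and the rest are kept in order. The class CsvColumnSelectionSketch is hypothetical. Requiring the format to have exactly one symbol per field is an assumption of the sketch, not a documented rule of CsvDataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of applying a CSV column format to one record.
final class CsvColumnSelectionSketch {

    static List<String> keptFields(List<String> fields, String columnFormat) {
        // Assumption: one format symbol per field in the record.
        if (fields.size() != columnFormat.length()) {
            throw new IllegalArgumentException("Format length does not match the number of fields");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            char symbol = columnFormat.charAt(i);
            if ("sfic%".indexOf(symbol) < 0) {
                throw new IllegalArgumentException("Unknown column symbol: " + symbol);
            }
            if (symbol != '%') {
                kept.add(fields.get(i));   // '%' means: skip this column
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keptFields(Arrays.asList("sunny", "21.5", "x", "y", "note"), "cf%%s"));
    }
}
```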

close

This method closes the CSV file.

Exception:

• IOException

Type void Visibility protected Is Abstract false Parameter

CsvDataset

The default constructor of the CSV dataset with no parameters.

Type Visibility public Is Abstract false Parameter

CsvDataset

Constructor of the CSV dataset with its filename as a parameter.

Parameter:

• sFileName The filename of the CSV dataset

Type Visibility public Is Abstract false Parameter • inout sFileName : String

obtainMetadata

This method reads the metadata of the CSV file. Notice that any metainformation in CSV files is optional.

Parameter:

• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o n: Indicates that a line with the attribute names is read
o v: Indicates that the block containing the instance values is read
o %: Skip one row in the file

• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column

Exceptions:

• IOException • IllegalFormatSpecificationException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String • inout sContentFormat : String

open

This method opens the dataset CSV file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type void Visibility protected Is Abstract false Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:

o n: Indicates that a line with the attribute names is read
o v: Indicates that the block containing the instance values is read
o %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column

Exceptions:

• NotAddedValueException • IOException • IndexOutOfBoundsException • IllegalFormatSpecificationException

Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String • inout sContentFormat : String
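The content format string drives the reading row by row. The sketch below illustrates this with in-memory lines and is not the implementation of readDataset: ContentFormatSketch and read are hypothetical names, fields are split on plain commas, and the symbol v is assumed to consume every remaining row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration (not part of datapro4j) of a content format such as "%n%%v". */
final class ContentFormatSketch {

    final List<String> names = new ArrayList<>();
    final List<String[]> instances = new ArrayList<>();

    static ContentFormatSketch read(List<String> lines, String contentFormat) {
        ContentFormatSketch result = new ContentFormatSketch();
        int row = 0; // index of the next row to consume
        for (char symbol : contentFormat.toCharArray()) {
            switch (symbol) {
                case '%': // skip one row
                    row++;
                    break;
                case 'n': // one row holding the attribute names
                    for (String name : lines.get(row++).split(",")) {
                        result.names.add(name.trim());
                    }
                    break;
                case 'v': // assumption: the value block runs to the end of the input
                    while (row < lines.size()) {
                        result.instances.add(lines.get(row++).split(","));
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown content symbol: " + symbol);
            }
        }
        return result;
    }
}
```

For a six-line input read with “%n%%v”, the second line provides the attribute names and the last two lines provide the instances.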

readDataset

This method opens the dataset, reads metainformation and instances and, finally, closes the dataset file. This method assumes the following file format: one first line with the attribute names (metadata), followed by the instances.

Parameter:

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column

Exceptions:

• NotAddedValueException • IOException • IndexOutOfBoundsException

Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String

writeDataset

This method writes a new CSV dataset file. The column types allowed for writing are the following:

• Numerical • Integer • Nominal • Categorical • Binary (binary values are saved as categorical values)

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• IOException

Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String

Relation Detail

Generalization

Name Related Element • FileDataset

Class ExcelDataset

ExcelDataset is a class that represents a dataset conformant to the Microsoft Excel file format. This type of file has the basic features of all spreadsheets, using a grid of cells arranged in numbered rows and letter-named columns.

Note: This class depends on the external Java library Apache POI.

Figure 41. Class ExcelDataset

Name ExcelDataset Qualified Name es::uco::kdis::datapro::dataset::Source::ExcelDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface

Attribute Detail

All attributes are private.

Operation Detail

addAllValues

This method adds all the values in the DATA block of the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)

For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the nominal attribute is copied.

Exceptions:

• IndexOutOfBoundsException • IOException • NotAddedValueException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String

close

This method closes the Excel file.

Exception:

• IOException

Type void Visibility protected Is Abstract false Parameter

ExcelDataset

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

ExcelDataset

Constructor with the filename as parameter.

Parameter:

• sFileName The filename of the Excel dataset

Type Visibility public Is Abstract false Parameter • inout sFileName : String

obtainMetadata

This method reads the metadata of the Excel file.

Parameter:

• sContentFormat String that specifies the data structure in the Excel file. The following symbols are used:

o n: Indicates that a line with the attribute names is read
o v: Indicates that the block containing the instance values is read
o %: Skip one row in the file

• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column

Exceptions:

• IOException • IllegalFormatSpecificationException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String • inout sContentFormat : String

open

This method opens the Excel file using the name passed as a parameter to the constructor.

Exceptions:

• FileNotFoundException

Type void Visibility protected Is Abstract false Parameter

readDataset

This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.

Parameter:

• sContentFormat String that specifies the structure of the Excel file. The following symbols are used:

o n: Indicates that a line with the attribute names is read
o v: Indicates that the block containing the instance values is read
o %: Skip one row in the file

For example, “%n%%v” omits the first line, then reads the column names, omits the next two lines and, finally, reads the dataset instances.

• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:

o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column

Exceptions:

• NotAddedValueException • IOException • IndexOutOfBoundsException • IllegalFormatSpecificationException

Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String • inout sContentFormat : String

writeDataset

This method writes the dataset to a new Excel file. The column types supported for writing are the following:

• Numerical • Integer • Nominal • Categorical • Binary (binary values are saved as categorical values)

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• IOException

Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String

Relation Detail

Generalization

Name Related Element • FileDataset

Class KeelDataset

KeelDataset is the class representing a dataset conformant to the KEEL (Knowledge Extraction based on Evolutionary Learning) standard specification. KeelDataset is a subclass of ArffDataset.

KEEL files are a specific subtype of ARFF files with the following kinds of attributes:

• @ATTRIBUTE name real [value1, value2] for real data
• @ATTRIBUTE name integer [value1, value2] for integer data
• @ATTRIBUTE name {value1, value2, ...} for categorical data
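The three attribute forms listed above can be told apart with two regular expressions. The following sketch is only an illustration under stated assumptions: KeelAttributeSketch and describe are hypothetical names, the summary strings are an arbitrary output format, and KeelDataset itself may parse these lines differently.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration (not part of datapro4j): classifies one KEEL @ATTRIBUTE line. */
final class KeelAttributeSketch {

    // @ATTRIBUTE name real|integer [min, max]
    private static final Pattern RANGED = Pattern.compile(
            "(?i)@attribute\\s+(\\S+)\\s+(real|integer)\\s*\\[\\s*([^,\\s]+)\\s*,\\s*([^\\]\\s]+)\\s*\\]");

    // @ATTRIBUTE name {value1, value2, ...}
    private static final Pattern CATEGORICAL = Pattern.compile(
            "(?i)@attribute\\s+(\\S+)\\s*\\{(.*)\\}");

    /** Returns a short textual summary of the attribute declared in the line. */
    static String describe(String line) {
        String trimmed = line.trim();
        Matcher ranged = RANGED.matcher(trimmed);
        if (ranged.matches()) {
            String type = ranged.group(2).toLowerCase(Locale.ROOT);
            return ranged.group(1) + " " + type + " " + ranged.group(3) + ".." + ranged.group(4);
        }
        Matcher categorical = CATEGORICAL.matcher(trimmed);
        if (categorical.matches()) {
            String[] values = categorical.group(2).trim().split("\\s*,\\s*");
            return categorical.group(1) + " categorical " + String.join("|", values);
        }
        throw new IllegalArgumentException("Not a supported attribute line: " + line);
    }
}
```

The keyword is matched case-insensitively, so @ATTRIBUTE and @attribute are treated alike.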

For a more detailed description of this specification, the reader can consult the following reference:

J. Alcalá-Fdez et al. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft Computing 17:2-3 (2011) 255-287.

Also, for further information, visit the website http://www.keel.es.

Figure 42. Class KeelDataset

Name KeelDataset Qualified Name es::uco::kdis::datapro::dataset::Source::KeelDataset Visibility public Abstract false Base Classifier • ArffDataset Realized Interface

Attribute Detail

INPUTS

Constant for the keyword @inputs

Type String Default Value "@inputs" Visibility protected Multiplicity

OUTPUTS

Constant for the keyword @outputs

Type String Default Value "@outputs" Visibility protected Multiplicity

Operation Detail

addAllValues

This method adds all the values in the @DATA block of the file to the corresponding column structure.

Parameter:

• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.

o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o b: Binary column
o %: Skip this column (do not dump its values to any column)

For example, the string “cf%%b” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the binary attribute is copied.

Exceptions:

• IndexOutOfBoundsException • IOException • NotAddedValueException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String

KeelDataset

Default constructor with no parameters.

Type Visibility public Is Abstract false Parameter

KeelDataset

Constructor with the filename of the dataset as a parameter.

Parameter:

• sFileName The filename containing the dataset

Type Visibility public Is Abstract false Parameter • inout sFileName : String

obtainMetadata

This method reads the metadata of the KEEL file.

Parameter:

• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:

o b: Binary column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column

Exceptions:

• IOException

• IllegalFormatSpecificationException

Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String

writeDataset

This method writes the dataset to a new KEEL file. Only the following types of column are supported for writing:

• Numerical (real) • Integer • Categorical

Parameter:

• sOutputFile The filename of the dataset

Exceptions:

• IOException

Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String

Relation Detail

Generalization

Name Related Element • ArffDataset

Package es::uco::kdis::datapro::datatypes

Figure 43. Package es.uco.kdis.datapro.datatypes

Name datatypes Qualified Name es::uco::kdis::datapro::datatypes

Class InvalidValue

This abstract class represents any invalid value in a column. This is the base class of the following types of invalid values:

• Missing values. • Null values. • Empty values.

For a more detailed description, see the following reference:

Pyle, D. Data preparation for data mining. Morgan Kaufmann, 1999. ISBN: 1-55860-529-0.

Note: Columns may define their own invalid values. However, such values are not processed by the library; they are intended only for serialization and for specific algorithms. In general, the invalid-value objects described here are sufficient for regular use. Furthermore, these objects are notation-independent and are used only for data processing.

Figure 44. Class InvalidValue

Name InvalidValue Qualified Name es::uco::kdis::datapro::datatypes::InvalidValue Visibility public Abstract true Base Classifier Realized Interface

Relation Detail

Generalization

Name Related Element • MissingValue

Name Related Element • EmptyValue

Name Related Element • NullValue

Class EmptyValue

This class represents an empty value in a variable, i.e., a value for which no real-world value can be supposed.

This class is implemented as a singleton, so only one instance exists at any time. The instance is obtained using the method getEmptyValue. Therefore, empty values can be compared using the operator ‘==’.

Figure 45. Class EmptyValue

Name EmptyValue Qualified Name es::uco::kdis::datapro::datatypes::EmptyValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface

Attribute Detail All attributes are private.

Operation Detail

getEmptyValue

Singleton constructor for the object representing an empty value.

Type EmptyValue Visibility public Is Abstract false Parameter
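The singleton behaviour described for EmptyValue can be sketched as follows. The classes below are stand-ins written for this example (InvalidValueSketch, EmptyValueSketch and EmptyCheckSketch are hypothetical names), and the eager initialization is an assumption; the library classes may be implemented differently.

```java
/** Stand-in for the abstract base class of invalid values (hypothetical, not the library class). */
abstract class InvalidValueSketch { }

/** Stand-in for a singleton invalid value: the private constructor prevents further instances. */
final class EmptyValueSketch extends InvalidValueSketch {

    // Assumption: the single instance is created eagerly when the class is loaded.
    private static final EmptyValueSketch INSTANCE = new EmptyValueSketch();

    private EmptyValueSketch() { }

    /** Singleton accessor, mirroring the documented method getEmptyValue. */
    static EmptyValueSketch getEmptyValue() {
        return INSTANCE;
    }

    @Override
    public String toString() {
        return "<empty>";
    }
}

final class EmptyCheckSketch {

    /** Reference comparison is sufficient because only one instance can exist. */
    static boolean isEmpty(Object cell) {
        return cell == EmptyValueSketch.getEmptyValue();
    }
}
```

Because every call to the accessor returns the same reference, a cell can be tested with ‘==’ without overriding equals.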

Relation Detail

Generalization

Name Related Element • InvalidValue

Class MissingValue

This class represents a missing value in a variable, i.e., a value that has not been entered into the dataset, but for which an actual value exists in the real world in which the measurements were made.

This class is implemented as a singleton, so only one instance exists at any time. The instance is obtained using the method getMissingValue. Therefore, missing values can be compared using the operator ‘==’.

Figure 46. Class MissingValue

Name MissingValue Qualified Name es::uco::kdis::datapro::datatypes::MissingValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface

Attribute Detail All attributes are private.

Operation Detail

getMissingValue

Singleton constructor for the object representing a missing value.

Type MissingValue Visibility public Is Abstract false Parameter

Relation Detail

Generalization

Name Related Element • InvalidValue

Class NullValue

This class represents an explicit null value in a variable.

This class is implemented as a singleton, so only one instance exists at any time. The instance is obtained using the method getNullValue. Therefore, null values can be compared using the operator ‘==’. Its use allows the programmer to replace null references with comparable object instances (e.g. in collections, comparisons, etc.).

Figure 47. Class NullValue

Name NullValue Qualified Name es::uco::kdis::datapro::datatypes::NullValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface

Attribute Detail All attributes are private.

Operation Detail

getNullValue

Singleton constructor for the object representing a null value.

Type NullValue Visibility public Is Abstract false Parameter
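One situation in which an object is preferable to a null reference is a collection that rejects null. The sketch below uses java.util.concurrent.ConcurrentHashMap, which does not accept null values, together with a stand-in marker class. NullMarkerSketch and NullAwareCellsSketch are hypothetical names and are not the library classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for a singleton null marker (hypothetical, not the library class). */
final class NullMarkerSketch {

    private static final NullMarkerSketch INSTANCE = new NullMarkerSketch();

    private NullMarkerSketch() { }

    static NullMarkerSketch getNullValue() {
        return INSTANCE;
    }
}

/** Stores cell values by row in a map that cannot hold null references. */
final class NullAwareCellsSketch {

    private final Map<Integer, Object> cells = new ConcurrentHashMap<>();

    /** Replaces a null reference by the marker before storing it. */
    void put(int row, Object value) {
        cells.put(row, value == null ? NullMarkerSketch.getNullValue() : value);
    }

    /** True only if the row was stored with an explicit null. */
    boolean isNull(int row) {
        return cells.get(row) == NullMarkerSketch.getNullValue();
    }

    /** Counts the rows that hold an explicit null. */
    long countNulls() {
        return cells.values().stream()
                .filter(value -> value == NullMarkerSketch.getNullValue())
                .count();
    }
}
```

A row that was never stored is distinguishable from a row stored as null, since only the latter holds the marker.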

Relation Detail

Generalization

Name Related Element • InvalidValue

Class Range

This class is a template to represent any kind of interval consisting of a minimum and a maximum limit. Each boundary can be open or closed, indicating that the boundary value is excluded from or included in the range, respectively. The template parameter C is the class of the objects contained in the range.

Figure 48. Class Range

Name Range Qualified Name es::uco::kdis::datapro::datatypes::Range Visibility public Abstract true Base Classifier Realized Interface

Attribute Detail

Protected attributes with accessors (getter/setter) are omitted.

Operation Detail

getMaxValue

This method returns the upper interval boundary value, i.e. the maximum value in the interval (the programmer has to check whether the interval is open or closed).

Type C Visibility public Is Abstract false Parameter

getMinValue

This method returns the lower interval boundary value, i.e. the minimum value in the interval (the programmer has to check whether the interval is open or closed).

Type C Visibility public Is Abstract false Parameter

isOpenMax

This method returns a boolean value indicating whether the upper interval boundary is open, i.e. the maximum value is excluded from the range.

Type boolean Visibility public Is Abstract false Parameter

isOpenMin

This method returns a boolean value indicating whether the lower interval boundary is open, i.e. the minimum value is excluded from the range.

Type boolean Visibility public Is Abstract false Parameter

setMaxValue

This method sets the upper interval boundary.

Parameter:

• oMax The new maximum value

Type void Visibility public Is Abstract false Parameter • inout oMax : C

setMinValue

This method sets the lower interval boundary.

Parameter:

• oMin The new minimum value

Type void Visibility public Is Abstract false Parameter • inout oMin : C

setOpenMax

This method sets the upper interval boundary to open or closed.

Parameter:

• bOpenMax True if open; false if closed.

Type void Visibility public Is Abstract false Parameter • inout bOpenMax : boolean

setOpenMin

This method sets the lower interval boundary to open or closed.

Parameter:

• bOpenMin True if open; false if closed.

Type void Visibility public Is Abstract false Parameter • inout bOpenMin : boolean

Relation Detail

Dependency

Name Related Element • Range<Double>

Class DoubleRange

This class is a specialization of the template Range, where the template parameter is of type Double.

Figure 49. Class DoubleRange

Name DoubleRange Qualified Name es::uco::kdis::datapro::datatypes::DoubleRange Visibility public Abstract false Base Classifier • Range<Double> Realized Interface

Operation Detail

DoubleRange

Default constructor with no parameters. By default, the lower and upper interval boundaries are set to negative and positive infinity, respectively.

Type Visibility public Is Abstract false Parameter

DoubleRange

Constructor with parameters.

Parameters:

• dMin The minimum value of the range, i.e. the lower interval boundary. • dMax The maximum value of the range, i.e. the upper interval boundary.

Type Visibility public Is Abstract false Parameter • in dMax : double • in dMin : double

hasValue

This method returns true if the value passed as a parameter is a valid value in the interval.

Parameter:

• dValue The value to be checked.

Type boolean Visibility public Is Abstract false Parameter • in dValue : double

toString

This method returns the interval in a String format. The output format is as follows:

‘[’ | ‘(’ <min> ‘,’ <max> ‘)’ | ‘]’

where a square bracket indicates a closed boundary and a parenthesis indicates an open boundary.

Type String Visibility public Is Abstract false Parameter
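The relation between the template and its Double specialization, together with the behaviour of hasValue and toString, can be sketched as follows. RangeSketch and DoubleRangeSketch are stand-ins written for this example, not the library classes. The argument order (minimum first) and the closed boundaries used by default are assumptions, since the guide does not state them.

```java
/** Stand-in for the generic interval template (hypothetical, not the library class). */
abstract class RangeSketch<C> {

    protected C minValue;
    protected C maxValue;
    protected boolean openMin; // true: the minimum is excluded from the range
    protected boolean openMax; // true: the maximum is excluded from the range

    public C getMinValue() { return minValue; }
    public C getMaxValue() { return maxValue; }
    public boolean isOpenMin() { return openMin; }
    public boolean isOpenMax() { return openMax; }
    public void setOpenMin(boolean open) { openMin = open; }
    public void setOpenMax(boolean open) { openMax = open; }
}

/** Stand-in for the specialization of the template for Double values. */
final class DoubleRangeSketch extends RangeSketch<Double> {

    /** Unbounded range, from negative to positive infinity. */
    DoubleRangeSketch() {
        this(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }

    /** Assumption: boundaries are closed unless a setter says otherwise. */
    DoubleRangeSketch(double min, double max) {
        minValue = min;
        maxValue = max;
    }

    /** Checks the value against both boundaries, honouring open and closed limits. */
    public boolean hasValue(double value) {
        boolean aboveMin = openMin ? value > minValue : value >= minValue;
        boolean belowMax = openMax ? value < maxValue : value <= maxValue;
        return aboveMin && belowMax;
    }

    /** Square bracket for a closed boundary, parenthesis for an open one. */
    @Override
    public String toString() {
        return (openMin ? "(" : "[") + minValue + "," + maxValue + (openMax ? ")" : "]");
    }
}
```

After opening the upper boundary of a range from 0.0 to 1.0, the value 1.0 no longer belongs to the range, while 0.0 still does.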

Relation Detail

Generalization

Name Related Element • Range<Double>

Package es::uco::kdis::datapro::exception

Figure 50. Package es.uco.kdis.datapro.exception

Name exception Qualified Name es::uco::kdis::datapro::exception

Class IllegalFormatSpecificationException

This exception indicates that the file format under consideration does not conform to the expected specification.

Figure 51. Class IllegalFormatSpecificationException

Name IllegalFormatSpecificationException Qualified Name es::uco::kdis::datapro::exception::IllegalFormatSpecificationException Visibility public Abstract false Base Classifier • Exception Realized Interface

Attribute Detail All attributes are private.

Operation Detail

IllegalFormatSpecificationException

Constructor with the error message as a parameter.

Parameter:

• string Error message

Type Visibility public Is Abstract false Parameter • inout string : String
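A checked exception whose only constructor takes the error message can be declared and raised as in the sketch below. The names FormatSpecificationExceptionSketch and FormatValidatorSketch are hypothetical, and the validation rule (only the symbols n, v and % are accepted) is an assumption made for the example, not the documented behaviour of the library.

```java
/** Stand-in for a checked exception carrying an error message (hypothetical class). */
class FormatSpecificationExceptionSketch extends Exception {

    private static final long serialVersionUID = 1L;

    FormatSpecificationExceptionSketch(String message) {
        super(message);
    }
}

final class FormatValidatorSketch {

    /** Rejects any content format containing a symbol other than n, v or %. */
    static void validateContentFormat(String contentFormat)
            throws FormatSpecificationExceptionSketch {
        for (char symbol : contentFormat.toCharArray()) {
            if ("nv%".indexOf(symbol) < 0) {
                throw new FormatSpecificationExceptionSketch(
                        "Unknown content symbol '" + symbol + "' in \"" + contentFormat + "\"");
            }
        }
    }
}
```

Since the exception is checked, callers must either catch it or declare it in their own throws clause.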

Relation Detail

Generalization

Name Related Element • Exception

Class NoSuchCategoryException

This class is the exception indicating that a certain element does not belong to the specified category, or that a category is not found.

Figure 52. Class NoSuchCategoryException

Name NoSuchCategoryException Qualified Name es::uco::kdis::datapro::exception::NoSuchCategoryException Visibility public Abstract false Base Classifier • Exception Realized Interface

Attribute Detail All attributes are private.

Operation Detail

NoSuchCategoryException

Constructor with the error message as a parameter.

Parameter:

• string Error message

Type Visibility public Is Abstract false Parameter • inout string : String

Relation Detail

Generalization

Name Related Element • Exception

Class NotAddedValueException

This class is the exception indicating that a value was not successfully added to the dataset.

Figure 53. Class NotAddedValueException

Name NotAddedValueException Qualified Name es::uco::kdis::datapro::exception::NotAddedValueException Visibility public Abstract false Base Classifier • Exception Realized Interface

Attribute Detail All attributes are private.

Operation Detail

NotAddedValueException

Constructor with the error message as a parameter.

Parameter:

• string Error message

Type Visibility public Is Abstract false Parameter • inout string : String

Relation Detail

Generalization

Name Related Element • Exception

Appendix A: UML diagrams

This appendix shows the class diagrams that represent the structure of datapro4j. This is the general package overview. The different packages are shown next.

Figure 54. Class diagram: package overview

Package es.uco.kdis.datapro.algorithm.base

Figure 55. Class diagram: package es.uco.kdis.datapro.algorithm.base

Package es.uco.kdis.datapro.algorithm.preprocessing

Figure 56. Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing

Package es.uco.kdis.datapro.dataset columns

Figure 57. Class diagram: Package es.uco.kdis.datapro.dataset.Column

Package es.uco.kdis.datapro.dataset.Source

Figure 58. Package es.uco.kdis.datapro.dataset.Source

Package es.uco.kdis.datapro.datatypes

Figure 59. Class diagram: Package es.uco.kdis.datapro.datatypes

Package es.uco.kdis.datapro.exception

Figure 60. Class diagram: Package es.uco.kdis.datapro.exception

Appendix B: Extending the library

Project structure

This project is structured in three different parts:

1. Column structure.
2. Datasets hierarchy.
3. Strategies.

Column structure

A programmer who wants to develop a new column, or adapt an existing one to specific requirements, should keep in mind the strict separation between abstraction and implementation. The abstraction implements the methods that manage the column metainformation, and delegates any processing, handling or query related to the actual column values to its implementation. For further information, see the Bridge design pattern (http://en.wikipedia.org/wiki/Bridge_pattern).

We recommend the following guidelines for the development of new columns:

• Column classes should be located in the package es.uco.kdis.datapro.dataset.Column
• For a given type of column, namely X, the abstraction class will be named XColumn, and its implementation class, XColumnImpl.
• The new column X has to be added to the enumeration ColumnType. This value is returned by the column as its type.
• Column implementations should not be directly accessed from any class other than their abstraction.
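The guidelines above can be sketched as follows for a hypothetical column type named Flag. Only the naming scheme (XColumn, XColumnImpl) and the existence of the enumeration ColumnType come from this guide. The method names, the reduced enumeration and its single value are assumptions made for illustration and do not reproduce the actual datapro4j API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the library's ColumnType enumeration (the value is hypothetical).
enum ColumnType { FLAG }

// Implementation side of the bridge: stores and handles the actual values.
// It is package-private, so only classes in its own package can reach it.
class FlagColumnImpl {
    private final List<Boolean> values = new ArrayList<>();

    void add(boolean value) { values.add(value); }
    boolean get(int index) { return values.get(index); }
    int size() { return values.size(); }
}

// Abstraction side of the bridge: manages metainformation, delegates the rest.
class FlagColumn {
    private final String name;                                  // metainformation
    private final FlagColumnImpl impl = new FlagColumnImpl();

    FlagColumn(String name) { this.name = name; }

    String getName() { return name; }
    ColumnType getType() { return ColumnType.FLAG; }            // value added to the enumeration

    // Any handling of the actual values is delegated to the implementation.
    void addValue(boolean value) { impl.add(value); }
    boolean getValue(int index) { return impl.get(index); }
    int getSize() { return impl.size(); }
}
```

In this sketch, client code only ever holds a FlagColumn, so the way values are stored can change without affecting callers.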

Datasets hierarchy

The library provides a finite (and growing) number of dataset implementations (ARFF, Keel, CSV, MySql, ...), but its architecture permits the programmer to extend this part in order to make other datasets of interest available. Dataset classes rarely inherit directly from the top-level abstract class Dataset. For design reasons, it is advisable to create, use and maintain a proper class hierarchy in which common properties (both structural and behavioural) are defined. For example, ARFF and CSV datasets inherit from the common file-based dataset, i.e. the abstract class FileDataset. Their respective classes only define those properties that are specific to each kind of file, whereas the properties shared by all file-based datasets are defined by intermediate abstract classes. Dataset is always the root of this hierarchy, since this class links the physical dataset to the logical column structure.

Some guidelines to be considered:

• Dataset abstract classes for defining common properties are located in the package es.uco.kdis.datapro.dataset
• Dataset concrete classes are located in the package es.uco.kdis.datapro.dataset.Source
• Dataset classes should be named with the suffix "Dataset", e.g. CsvDataset.

Apart from the constructor (with or without parameters), the main methods to pay attention to are inherited from the abstract class Dataset:

• readDataset, which allows the programmer to configure the type of columns to be filled, as well as the dataset structure.
• writeDataset, which permits the programmer to save the current dataset values into the specific format.

These methods should fulfill the following assumptions:

• When reading, the format can vary or contain errors (invalid values, missing or wrong structure, etc.).
• When reading, the original structure (meta-data) of the dataset should be recalled somehow.


• When writing, the dataset may have been read from a dataset of the same type, or not:
  o If the source dataset is of the same format, the programmer may want to overwrite it or generate a new dataset. In both cases, the resulting dataset should maintain the same structure (e.g. column types and meta-data) as the source dataset.
  o If the dataset to be written is of a different type than the source dataset (or of the same type with a different structure), the programmer may want to specify the type of columns to be declared in the resulting dataset.
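The hierarchy and the reading and writing assumptions listed above can be sketched as follows. The classes Dataset and FileDataset below are simplified stand-ins, not the library classes: the signatures of readDataset and writeDataset, the fields, and the concrete class PipeDataset (a hypothetical pipe-separated file format) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the root of the hierarchy; signatures are assumptions.
abstract class Dataset {
    protected final List<String> columnNames = new ArrayList<>();   // logical structure
    protected final List<String[]> rows = new ArrayList<>();        // values

    abstract void readDataset() throws IOException;
    abstract void writeDataset(Path target) throws IOException;
}

// Intermediate abstract class: properties shared by all file-based datasets.
abstract class FileDataset extends Dataset {
    protected final Path file;

    FileDataset(Path file) { this.file = file; }
}

// Hypothetical concrete dataset for pipe-separated files.
class PipeDataset extends FileDataset {
    PipeDataset(Path file) { super(file); }

    @Override
    void readDataset() throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            throw new IOException("Missing header line");
        }
        // The header line is kept so that the structure can be recalled later.
        for (String name : lines.get(0).split("\\|", -1)) {
            columnNames.add(name);
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split("\\|", -1);
            // Tolerate wrong structure: skip rows whose width differs from the header.
            if (cells.length == columnNames.size()) {
                rows.add(cells);
            }
        }
    }

    @Override
    void writeDataset(Path target) throws IOException {
        List<String> out = new ArrayList<>();
        out.add(String.join("|", columnNames));   // same structure as the source
        for (String[] row : rows) {
            out.add(String.join("|", row));
        }
        Files.write(target, out, StandardCharsets.UTF_8);
    }
}
```

Skipping malformed rows is only one possible policy. A real implementation might instead record them as invalid values.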

Strategies

Strategies are the core and the most scalable element of the library. A strategy implements an algorithm on data. Strategies are independent of any specific dataset, so a single strategy can make use of more than one dataset. See DatasetStrategy in this guide for more information on the methods that should be implemented.

To implement your own algorithms, the following guidelines should be considered:

• Every algorithm should be a subclass of DatasetStrategy.
• Algorithms are grouped in packages under es.uco.kdis.datapro.algorithm
• Only the package es.uco.kdis.datapro.algorithm.base is required by the library. The remaining packages under es.uco.kdis.datapro.algorithm could be excluded from the programmer’s distribution. Notice that each specific algorithm package may have its own external dependencies.
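A minimal sketch of these guidelines is given below. The class DatasetStrategy shown here is a reduced stand-in, not the library class: the methods addDataset and execute are assumptions, and each dataset is reduced to a single array of numeric values. The concrete class MeanStrategy is hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced stand-in; the real methods are described in the DatasetStrategy section.
abstract class DatasetStrategy {
    // A strategy is not tied to one dataset, so it keeps a list of them.
    // Each dataset is represented here by one numeric column (an assumption).
    protected final List<double[]> datasets = new ArrayList<>();

    void addDataset(double[] column) { datasets.add(column); }

    abstract void execute();   // hypothetical entry point
}

// Hypothetical algorithm: arithmetic mean over the values of all registered datasets.
class MeanStrategy extends DatasetStrategy {
    private double mean = Double.NaN;

    @Override
    void execute() {
        double sum = 0.0;
        int count = 0;
        for (double[] column : datasets) {
            for (double value : column) {
                sum += value;
                count++;
            }
        }
        mean = (count == 0) ? Double.NaN : sum / count;
    }

    double getMean() { return mean; }
}
```

Because the strategy receives its datasets from outside, the same algorithm can be applied to data loaded from any dataset type.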

Other packages

Apart from the specific packages for columns, datasets and strategies, there are some other relevant packages to consider that may be extended as well:

• es.uco.kdis.datapro.datatypes: this package implements the auxiliary classes and datatypes used by datapro4j, for example the classes declaring invalid values, ranges, etc.
• es.uco.kdis.datapro.exception: this package implements the exception classes. Before implementing a new exception class, the programmer should look for a suitable standard Java exception, in order to avoid cluttering the library with unnecessary classes.
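As an illustration of the last recommendation, the sketch below validates a value against an interval using only the standard IllegalArgumentException. The class RangeCheck and its method are hypothetical and are not part of datapro4j.

```java
// Hypothetical helper: reports invalid input with a standard Java exception
// instead of declaring a new exception class.
final class RangeCheck {
    private RangeCheck() { }

    static double requireInRange(double value, double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        if (Double.isNaN(value) || value < min || value > max) {
            throw new IllegalArgumentException(
                "value " + value + " is outside [" + min + ", " + max + "]");
        }
        return value;
    }
}
```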

Code documentation

Class headings are documented according to the following structure: class description, contact info and history.

/**
 * CLASS DESCRIPTION
 *
 * <p>
 * CONTACT INFO:
 * <ul>
 * <li>Jose Raul Romero, PhD [[email protected]]
 * <p>{@link http://www.jrromero.net}
 * <p><p>
 * Knowledge Discovery and Intelligent Systems Research Group (KDIS)<p>
 * {@link http://www.uco.es/grupos/kdis}
 * </ul>
 * <p>
 * HISTORY:
 * <ul>
 * <li>INCLUDE HERE THE LIST OF CHANGES TO THIS SPECIFIC FILE
 * </ul>
 * <p>
 * @author Jose Raul Romero (JRR, 0.2, 0.3)   EXAMPLE OF AUTHORS, INITIALS, VERSIONS
 * @author Jose Maria Luna (JML, 0.1)
 * @version 0.3
 *
 **/

Each parameter and method should follow the Javadoc notation for documenting the code.
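A possible method-level comment in Javadoc notation is sketched below. The class ValueCounter and its method are hypothetical and serve only to show the standard @param, @return and @throws tags.

```java
import java.util.Objects;

/**
 * Hypothetical utility used only to illustrate method-level Javadoc.
 */
final class ValueCounter {
    private ValueCounter() { }

    /**
     * Counts how many elements of an array are equal to a given value.
     *
     * @param values the array to inspect; must not be null
     * @param target the value to look for
     * @return the number of occurrences of {@code target} in {@code values}
     * @throws NullPointerException if {@code values} is null
     */
    static int countOccurrences(int[] values, int target) {
        Objects.requireNonNull(values, "values");
        int count = 0;
        for (int v : values) {
            if (v == target) {
                count++;
            }
        }
        return count;
    }
}
```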

Further, remember to include the file license.txt in every distribution that includes the library or part of it.

Coding recommendations

1. Code should be implemented following the Hungarian notation.
2. Code and comments should be written in English.
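This guide does not list the exact prefixes adopted by the project. The sketch below therefore assumes a common set (i for int, d for double, b for boolean, s for String, a for arrays), and the class HungarianSample is hypothetical.

```java
// Hypothetical class: each prefix encodes the type of the variable it names.
final class HungarianSample {
    private HungarianSample() { }

    static String describe(String sName, int[] aiValues) {
        int iCount = aiValues.length;             // i: int
        double dSum = 0.0;                        // d: double
        for (int iValue : aiValues) {
            dSum += iValue;
        }
        boolean bEmpty = (iCount == 0);           // b: boolean
        double dMean = bEmpty ? 0.0 : dSum / iCount;
        return sName + ": " + iCount + " values, mean " + dMean;   // s: String
    }
}
```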