datapro4j The data processing library for Java
The programmer’s guide
Revision: 1
Please, cite this document as: J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j
Knowledge Discovery and Intelligent Systems University of Córdoba, Spain http://www.uco.es/grupos/kdis July 2012
datapro4j The programmer’s guide Rev 1 (July 2012)
More @ http://www.jrromero.net/en KDIS Research Group [1] University of Córdoba, Spain
CONTACT INFO
José Raúl Romero, PhD. Dept. of Computer Science and Numerical Analysis, University of Córdoba, Spain
Email: [email protected] Web: http://www.jrromero.net/en
PARTICIPANTS (BY ALPHABETICAL ORDER)
• de la Torre López, José. BSc. [JTL]
• Luna, José María. MSc. [JML]
• Orozco Borrego, Mario. BSc. [MOB]
• Ramírez Quesada, Aurora. MSc. [ARQ]
PROJECT HISTORY
| Version | Date | Description | Participants |
|---|---|---|---|
| 0.1 | July 2011 | Initial version. Intruder algorithms. | ARQ, JTL, JML, JRR |
| 0.2 | September 2011 | Strategies and columns | MOB, JML, JRR |
| 0.3 | April 2012 | Refactoring, performance improvements and testing | ARQ, JML, JRR |
| 0.4 | Under development | Weka wrappers for preprocessing, association, clustering and classification | JRR |
| 0.5 | Under development | New dataset sources from relational databases and noSQL databases | JRR |
DOCUMENT HISTORY
| Revision | Date | Description | Author |
|---|---|---|---|
| 1 | July 17, 2012 | Initial version of this document | JRR |
TABLE OF CONTENTS
TABLE OF FIGURES
INTRODUCTION
• PURPOSE
• SCOPE
• LICENSE
• OVERVIEW
• TO-DO LIST
PACKAGE ES::UCO::KDIS::DATAPRO
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::BASE
• CLASS DATASETSTRATEGY
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::INTRUDER
• CLASS AVERAGEATTACK
• CLASS BANDWAGONATTACK
• CLASS DATASETSTATISTICS
• CLASS INTRUDERATTACK
• CLASS LOVEHATEATTACK
• CLASS RANDOMATTACK
• CLASS REVERSEBANDWAGONATTACK
• CLASS SEGMENTATTACK
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING::DISCRETIZATION
• CLASS EQUALFREQUENCYDISCRETIZATION
• CLASS EQUALWIDTHDISCRETIZATION
• CLASS MDLPDISCRETIZE
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::PREPROCESSING::INSTANCE
• CLASS REMOVEDUPLICATES
• CLASS REMOVEPERCENTAGE
PACKAGE ES::UCO::KDIS::DATAPRO::ALGORITHM::VALIDATION
• CLASS KFOLDS
PACKAGE ES::UCO::KDIS::DATAPRO::DATASET
• CLASS DATASET
• CLASS FILEDATASET
• CLASS INSTANCEITERATOR
• INTERFACE IITERATOR
PACKAGE ES::UCO::KDIS::DATAPRO::DATASET::COLUMN
• CLASS COLUMNABSTRACTION
• CLASS COLUMNIMPL
• ENUMERATION COLUMNTYPE
• CLASS BINARYCOLUMN
• CLASS BINARYCOLUMNIMPL
• CLASS CATEGORICALCOLUMN
• CLASS CATEGORICALCOLUMNIMPL
• CLASS DATECOLUMN
• CLASS DATECOLUMNIMPL
• CLASS INTEGERCOLUMN
• CLASS INTEGERCOLUMNIMPL
• CLASS NOMINALCOLUMN
• CLASS NOMINALCOLUMNIMPL
• CLASS NUMERICALCOLUMN
• CLASS NUMERICALCOLUMNIMPL
• CLASS RANGECOLUMN
• CLASS RANGECOLUMNIMPL
PACKAGE ES::UCO::KDIS::DATAPRO::DATASET::SOURCE
• CLASS ARFFDATASET
• CLASS CSVDATASET
• CLASS EXCELDATASET
• CLASS KEELDATASET
PACKAGE ES::UCO::KDIS::DATAPRO::DATATYPES
• CLASS INVALIDVALUE
• CLASS EMPTYVALUE
• CLASS MISSINGVALUE
• CLASS NULLVALUE
• CLASS RANGE
• CLASS DOUBLERANGE
PACKAGE ES::UCO::KDIS::DATAPRO::EXCEPTION
• CLASS ILLEGALFORMATSPECIFICATIONEXCEPTION
• CLASS NOSUCHCATEGORYEXCEPTION
• CLASS NOTADDEDVALUEEXCEPTION
APPENDIX A: UML DIAGRAMS
• PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
• PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
• PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.COLUMN
• PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE
APPENDIX B: EXTENDING THE LIBRARY
• PROJECT STRUCTURE
• CODE DOCUMENTATION
• CODING RECOMMENDATIONS
TABLE OF FIGURES
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
CLASS DATASETSTRATEGY
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.INTRUDER
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING.DISCRETIZATION
CLASS EQUALFREQUENCYDISCRETIZATION
CLASS EQUALWIDTHDISCRETIZATION
CLASS MDLPDISCRETIZE
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING.INSTANCE
CLASS REMOVEDUPLICATES
CLASS REMOVEPERCENTAGE
PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.VALIDATION
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET
CLASS DATASET
CLASS FILEDATASET
CLASS INSTANCEITERATOR
INTERFACE IITERATOR
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.COLUMN
ABSTRACT CLASS COLUMNABSTRACTION
ABSTRACT CLASS COLUMNIMPL
ENUMERATION COLUMNTYPE
CLASS BINARYCOLUMN
CLASS BINARYCOLUMNIMPL
CLASS CATEGORICALCOLUMN
CLASS CATEGORICALCOLUMNIMPL
CLASS DATECOLUMN
CLASS DATECOLUMNIMPL
CLASS INTEGERCOLUMN
CLASS INTEGERCOLUMNIMPL
CLASS NOMINALCOLUMN
CLASS NOMINALCOLUMNIMPL
CLASS NUMERICALCOLUMN
CLASS NUMERICALCOLUMNIMPL
CLASS RANGECOLUMN
CLASS RANGECOLUMNIMPL
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE
CLASS ARFFDATASET
CLASS CSVDATASET
CLASS EXCELDATASET
CLASS KEELDATASET
PACKAGE ES.UCO.KDIS.DATAPRO.DATATYPES
CLASS INVALIDVALUE
CLASS EMPTYVALUE
CLASS MISSINGVALUE
CLASS NULLVALUE
CLASS RANGE
CLASS DOUBLERANGE
PACKAGE ES.UCO.KDIS.DATAPRO.EXCEPTION
CLASS ILLEGALFORMATSPECIFICATIONEXCEPTION
CLASS NOSUCHCATEGORYEXCEPTION
CLASS NOTADDEDVALUEEXCEPTION
CLASS DIAGRAM: PACKAGE OVERVIEW
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.BASE
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.ALGORITHM.PREPROCESSING
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.COLUMN
PACKAGE ES.UCO.KDIS.DATAPRO.DATASET.SOURCE
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.DATATYPES
CLASS DIAGRAM: PACKAGE ES.UCO.KDIS.DATAPRO.EXCEPTION
Introduction
Purpose
This document provides the class, interface, and enumeration specifications for the datapro4j library, detailing the types modeled within the system.
The datapro4j library is designed to provide full support for the efficient handling of datasets from different sources and with different kinds of data types. This task is often time-consuming for Java programmers, especially in domains such as data analysis or data mining. Note that the library is not tied to a particular application domain: it targets any application that needs to handle structured data from diverse data sources in Java.
For example, datapro4j can be used in data mining to handle inputs or to preprocess data, using either internal strategies (e.g., algorithms on instances, discretization) or external tools (e.g., Weka or any other application). It can also be used to handle outputs: for example, migrating data to other formats, rearranging results from external tools or algorithms, or running statistical tests.
datapro4j is designed to be extended with new algorithms, data formats, column types, and so on. These aspects are independent of each other, so an algorithm can run on a dataset regardless of its source format (a noSQL database, an ARFF file, or any other).
Scope
This document is intended to define the class specification for the datapro4j library.
License
Copyright © 2012 University of Cordoba, Spain.
This software was developed by members of the Knowledge Discovery and Intelligent Systems group at the University of Córdoba, Spain. For further information on the library and modifications, please visit http://www.uco.es/grupos/kdis/datapro4j
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
Redistribution and use of binary forms, with or without modification, are permitted if the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer above.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
All advertising materials or publication mentioning features or use of this software must display the following acknowledgement: “This product includes software developed by the KDIS Research Group at the University of Córdoba (Spain) and its contributors.” or cite the following reference:
J.R. Romero, J.M. Luna, S. Ventura (2012). datapro4j: the data processing library for Java. Dept. of Computer Science and Numerical Analysis, University of Córdoba (Spain). Available for download from http://www.uco.es/grupos/kdis/datapro4j
Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
Commercial use of this software or part of it is not allowed without specific prior written permission. Licensing and conditions are subject to change without notice.
Note: At the moment, this software is provided in binary form as a Java library. Source code is not provided (we plan to release the Java source code in the near future).
Overview
This document provides a list of all packages, with a summary of each. Each package has a section that lists its classes, interfaces and enumeration types, each with a summary. Each class and interface section contains a description, summary tables, detailed member descriptions, and a relation table.
Private properties are omitted. Protected properties are shown when useful for external programmers.
To-do list
In the near future, this library will be updated with the following features (not necessarily in this order):
• Listeners in strategies.
• Graphical UI (some minor support is already provided).
• Generation of synthetic datasets under precise constraints.
• Multipart datasets: datasets that cannot be fully stored in memory, so they need to be split and partially retrieved.
• Support for different data mining tasks.
• Wrappers for different datasets and tools.
  o A wrapper for Weka is under development.
• Access to different databases.
  o Access through JDBC to RDBMS engines (e.g., MySQL, Oracle) is under development.
  o Access to noSQL engines (e.g., Cassandra) is under development.
• More dataset formats:
  o Currently, the following formats are supported: ARFF, KEEL, CSV, Excel.
  o The following formats are under development: XRFF.
Package es::uco::kdis::datapro
The library base package. The software is mainly divided into three different components:
• Dataset and columns. The logical, abstract representation of a dataset and its attributes.
• Dataset and sources. The physical representation of a dataset, serialized in files, stored in databases or on any other device.
• Dataset and strategies. Any algorithm running on a single dataset, a set of datasets or a column.
| Name | Qualified Name |
|---|---|
| datapro | es::uco::kdis::datapro |
Package es::uco::kdis::datapro::algorithm
Only public strategies are described here. Developers can easily provide their own strategies.
Figure 1. Package es.uco.kdis.datapro.algorithm
| Name | Qualified Name |
|---|---|
| algorithm | es::uco::kdis::datapro::algorithm |
Package es::uco::kdis::datapro::algorithm::base
Figure 2. Package es.uco.kdis.datapro.algorithm.base
| Name | Qualified Name |
|---|---|
| base | es::uco::kdis::datapro::algorithm::base |
Class DatasetStrategy
This class represents a generic strategy.
Strategy is a well-known design pattern in which algorithms are encapsulated into classes. Strategies can be executed either sequentially or step by step. In general, every strategy is executed according to the following sequence:
Creation: the strategy constructor should collect all the parameters required by the algorithm to be initialized and executed for the first time. Build as many constructors as required.
Initialization: the method initialize() implements any preprocessing step required before the algorithm is executed. This preprocessing is not part of the algorithm itself, but it must be run before the algorithm is invoked for the first time.
Execution: the method execute() runs the algorithm once, using the parameters passed to the constructor and prepared during initialization. If the algorithm has finished and cannot be invoked any more, the method setExecutable(false) should be called; otherwise, execution is allowed until the stop criteria are met.
Stop criteria: the method isExecutable() returns true if the algorithm can be executed once more over the dataset, and false otherwise.
Post-execution: any post-processing step has to be implemented by the method postexec().
Result collection: final results are collected from the dataset, if changed, and returned by the method getResult().
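The lifecycle above can be sketched in Java as follows. The library source is not distributed, so the abstract class below only re-declares the operations documented in this section (initialize, execute, isExecutable, setExecutable, postexec, getResult, getDataset, setDataset). This lets the example compile on its own. Dataset is a minimal hypothetical stand-in for the library's class, and ColumnSumStrategy and StrategyDemo are illustrative names that are not part of datapro4j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for datapro4j's Dataset: just a list of numeric rows.
class Dataset {
    final List<double[]> rows = new ArrayList<>();
}

// Sketch mirroring the documented DatasetStrategy contract (not the library source).
abstract class DatasetStrategy {
    protected boolean bExecutable = true; // execution flag, true by default
    protected Dataset oDataset;           // dataset used by the strategy

    public abstract void initialize();
    public abstract void execute();
    public abstract void postexec();
    public abstract Object getResult();

    public boolean isExecutable() { return bExecutable; }
    protected void setExecutable(boolean bExecutable) { this.bExecutable = bExecutable; }
    protected Dataset getDataset() { return oDataset; }
    protected void setDataset(Dataset data) { this.oDataset = data; }
}

// Hypothetical step-by-step strategy: sums the first column, one row per execute() call.
class ColumnSumStrategy extends DatasetStrategy {
    private int next;      // index of the next row to process
    private double sum;    // running total
    private Double result; // filled in by postexec()

    ColumnSumStrategy(Dataset data) { setDataset(data); }         // Creation

    @Override public void initialize() { next = 0; sum = 0.0; }   // Initialization

    @Override public void execute() {                             // Execution (one step)
        List<double[]> rows = getDataset().rows;
        if (next < rows.size()) {
            sum += rows.get(next++)[0];
        }
        if (next >= rows.size()) {
            setExecutable(false);                                 // stop criterion reached
        }
    }

    @Override public void postexec() { result = sum; }            // Post-execution
    @Override public Object getResult() { return result; }        // Result collection
}

public class StrategyDemo {
    // Drives any strategy through the documented sequence.
    static Object run(DatasetStrategy s) {
        s.initialize();
        while (s.isExecutable()) {
            s.execute();
        }
        s.postexec();
        return s.getResult();
    }

    public static void main(String[] args) {
        Dataset d = new Dataset();
        d.rows.add(new double[] {1.5});
        d.rows.add(new double[] {2.5});
        System.out.println(run(new ColumnSumStrategy(d))); // prints 4.0
    }
}
```

Processing one row per execute() call shows how a strategy can be driven step by step. A strategy that does all its work in a single pass would simply call setExecutable(false) at the end of its first execute().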
Figure 3. Class DatasetStrategy
| Name | Qualified Name | Visibility | Abstract | Base Classifier | Realized Interface |
|---|---|---|---|---|---|
| DatasetStrategy | es::uco::kdis::datapro::algorithm::base::DatasetStrategy | public | true | | |
Attribute Detail
bExecutable
Execution flag. This is protected only for inheritance purposes and should never be modified directly.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| boolean | true | protected | |
oDataset
Dataset used by the strategy.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| Dataset | | protected | |
Operation Detail
execute
This method is invoked to execute the strategy.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | public | true | |
getDataset
Getter method for the dataset attribute.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| Dataset | protected | false | |
getResult
This method returns an object containing the result of the process.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| Object | public | true | |
initialize
This method runs the initialization process of the strategy.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | public | true | |
isExecutable
This method returns true if the strategy is in an executable state.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| boolean | public | false | |
postexec
This method should be invoked, if required, after the strategy execution.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | public | true | |
setDataset
This method sets the dataset to be used by the strategy.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | protected | false | inout data : Dataset |
setExecutable
This method sets the current executable state of the strategy.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | protected | false | in bExecutable : boolean |
Relation Detail
Generalization
| Name | Related Element |
|---|---|
| | EqualFrequencyDiscretization |
| | EqualWidthDiscretization |
| | MDLPDiscretize |
| | RemoveDuplicates |
| | IntruderAttack |
| | KFolds |
| | RemovePercentage |
| | DatasetStatistics |
Package es::uco::kdis::datapro::algorithm::intruder
Figure 4. Package es.uco.kdis.datapro.algorithm.intruder
| Name | Qualified Name |
|---|---|
| intruder | es::uco::kdis::datapro::algorithm::intruder |
Class AverageAttack
This class implements the Average Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are selected randomly, and their values are drawn at random from a normal distribution with the mean and standard deviation of the corresponding item.
For a further description see the following paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
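The following is a minimal sketch of how one Average Attack profile could be assembled from the description above. It is not the library's implementation, and several details are assumptions:
• the 1 to 5 rating scale;
• rounding and clamping of filler values;
• the use of NaN for unrated items;
• how iNumFillers and dXRand combine (a fixed count when positive; otherwise each item is drawn independently with probability dXRand).

AverageAttackSketch and buildProfile are hypothetical names, and per-item means and standard deviations are assumed to be computed beforehand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class AverageAttackSketch {
    // Assumed rating scale; the guide does not state one.
    static final double MIN_RATING = 1.0;
    static final double MAX_RATING = 5.0;

    // Builds one attack profile; itemMean/itemSd are precomputed per-item statistics.
    static double[] buildProfile(double[] itemMean, double[] itemSd, int target,
                                 boolean push, int numFillers, double xRand, Random rnd) {
        int n = itemMean.length;
        double[] profile = new double[n];
        Arrays.fill(profile, Double.NaN); // NaN = item left unrated

        // Every item except the target is a filler candidate.
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i != target) candidates.add(i);
        }

        List<Integer> fillers = new ArrayList<>();
        if (numFillers > 0) {
            // Fixed-size filler set: take numFillers random candidates.
            Collections.shuffle(candidates, rnd);
            fillers.addAll(candidates.subList(0, Math.min(numFillers, candidates.size())));
        } else {
            // -1: random size, each candidate joins with probability xRand.
            for (int i : candidates) {
                if (rnd.nextDouble() < xRand) fillers.add(i);
            }
        }

        // Filler values ~ Normal(mean_i, sd_i), rounded and clamped to the scale.
        for (int i : fillers) {
            double v = itemMean[i] + itemSd[i] * rnd.nextGaussian();
            profile[i] = Math.max(MIN_RATING, Math.min(MAX_RATING, Math.round(v)));
        }

        // Push attack: maximum rating; nuke attack: minimum rating.
        profile[target] = push ? MAX_RATING : MIN_RATING;
        return profile;
    }

    public static void main(String[] args) {
        double[] mean = {3.2, 4.1, 2.5, 3.8};
        double[] sd = {0.9, 0.6, 1.1, 0.7};
        // Push attack on item 2 with two fillers and a fixed seed.
        System.out.println(Arrays.toString(buildProfile(mean, sd, 2, true, 2, 0.0, new Random(42))));
    }
}
```

Repeating buildProfile iNumAttacks times with one shared Random seeded from iSeed would give a reproducible set of attack instances.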
| Name | Qualified Name | Visibility | Abstract | Base Classifier | Realized Interface |
|---|---|---|---|---|---|
| AverageAttack | es::uco::kdis::datapro::algorithm::intruder::AverageAttack | public | false | IntruderAttack | |
Operation Detail
AverageAttack
Parameterized Constructor.
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (the column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| | public | false | in bPush : boolean; in dXRand : double; in iNumAttacks : int; in iNumFillers : int; in iSeed : int; in iTarget : int; inout oDataset : Dataset |
chooseSelectedItems
The Average Attack does not use the selected item set.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | protected | false | |
initialize
Initialization method.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | public | false | |
setFillerValues
In the Average Attack, the values of filler items are drawn at random from a normal distribution, using the mean and standard deviation of each item.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | protected | false | |
setSelectedValues
The Average Attack does not use the selected item set.
| Type | Visibility | Is Abstract | Parameter |
|---|---|---|---|
| void | protected | false | |
Relation Detail
Generalization
| Name | Related Element |
|---|---|
| | IntruderAttack |
Class BandwagonAttack
This class implements the Bandwagon Attack. This attack strategy assigns the maximum value to the target item (push attack). Then, a set of items, called selected items, is chosen among the most visible items.
The most visible items are those with a high mean and a high evaluation density. For a further description, see the following paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol. vol. 7, no. 4, pp. 23, 2007.
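The following sketch shows how the most visible items could be chosen. It rests on several assumptions:
• ratings are held in a plain matrix, with NaN for missing values;
• an item counts as visible when its rating count reaches dDensity and its mean reaches dVisibility;
• the items with the highest means are taken first.

BandwagonSelectionSketch and chooseSelected are hypothetical names. According to the operations described below, the chosen items would then receive the maximum value (setSelectedValues), and filler values would be drawn from the mean and standard deviation of the overall dataset (setFillerValues).

```java
import java.util.ArrayList;
import java.util.List;

public class BandwagonSelectionSketch {
    // ratings[u][i] is user u's rating for item i; NaN means "not rated" (assumption).
    static List<Integer> chooseSelected(double[][] ratings, double visibility,
                                        double density, int numSelected, int target) {
        int numItems = ratings[0].length;
        double[] mean = new double[numItems];
        List<Integer> visible = new ArrayList<>();

        for (int i = 0; i < numItems; i++) {
            if (i == target) continue; // the target item is handled separately
            double sum = 0.0;
            int count = 0;             // density: number of actual ratings
            for (double[] row : ratings) {
                if (!Double.isNaN(row[i])) {
                    sum += row[i];
                    count++;
                }
            }
            mean[i] = count > 0 ? sum / count : 0.0;
            // Keep items that meet both the density and the visibility thresholds.
            if (count >= density && mean[i] >= visibility) {
                visible.add(i);
            }
        }

        // Most visible first: highest mean first, then keep numSelected items.
        visible.sort((a, b) -> Double.compare(mean[b], mean[a]));
        return new ArrayList<>(visible.subList(0, Math.min(numSelected, visible.size())));
    }

    public static void main(String[] args) {
        double n = Double.NaN; // unrated
        double[][] ratings = {
            {5, 4, 1, n},
            {4, 5, 2, 3},
            {5, n, 1, 3},
        };
        System.out.println(chooseSelected(ratings, 4.0, 2, 2, 3)); // prints [0, 1]
    }
}
```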
| Name | Qualified Name | Visibility | Abstract | Base Classifier | Realized Interface |
|---|---|---|---|---|---|
| BandwagonAttack | es::uco::kdis::datapro::algorithm::intruder::BandwagonAttack | public | false | IntruderAttack | |
Attribute Detail
dDensity
The density threshold, i.e. the minimum number of values in the column.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| double | | protected | |
dVisibility
The visibility threshold, i.e., the probability of choosing an item to act as a selected item.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| double | | protected | |
rgdMeanSD
It stores the mean and standard deviation of the overall dataset.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| Double | `new ArrayList<Double>()` | protected | 0..* |
rgoVisibilityColumns
The list of columns whose visibility exceeds the thresholds dVisibility and dDensity.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| Integer | `new ArrayList<Integer>()` | package | 0..* |
rgoVisibilityMeans
The list of means of the columns whose visibility exceeds the thresholds dVisibility and dDensity.
| Type | Default Value | Visibility | Multiplicity |
|---|---|---|---|
| Double | `new ArrayList<Double>()` | package | 0..* |
Operation Detail
BandwagonAttack
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (the column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• iNumSelected The size of the selected item set
• dVisibility The visibility threshold (absolute value of the column mean)
• dDensity The density threshold (absolute number of instances in the column, not counting null, empty or missing values)
• dXRand The probability of choosing an item as a filler item
• iSeed The random seed
Type Visibility public Is Abstract false Parameter • in dDensity : double
• in dVisibility : double • in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset
chooseSelectedItems
Create the set of selected items. The set size is given by the iNumSelected property.
Type void Visibility protected Is Abstract false Parameter
initialize
Initialization method for the strategy.
Type void Visibility public Is Abstract false Parameter
orderArray
Sort the columns using their mean as the sort key. This method implements the QuickSort algorithm.
• iInit The initial position of the array
• iEnd The end position in the array
Type void Visibility protected Is Abstract false Parameter • in iEnd : int
• in iInit : int
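The guide only states that orderArray is a QuickSort keyed on column means over the range iInit..iEnd. As a minimal sketch, assuming the columns are represented by an array of indices with a parallel array of precomputed means (the class name MeanQuickSort and the ascending order are assumptions, not datapro4j code), a recursive Hoare-partition QuickSort could look like this:

```java
import java.util.Arrays;

/** Hypothetical sketch: QuickSort of column indices by column mean (not the datapro4j source). */
class MeanQuickSort {

    /** Sorts idx[iInit..iEnd] (inclusive) so that means[idx[k]] is ascending. */
    static void orderArray(int[] idx, double[] means, int iInit, int iEnd) {
        if (iInit >= iEnd) {
            return;
        }
        double pivot = means[idx[(iInit + iEnd) >>> 1]];
        int i = iInit;
        int j = iEnd;
        // Hoare-style partition around the mean of the middle element
        while (i <= j) {
            while (means[idx[i]] < pivot) i++;
            while (means[idx[j]] > pivot) j--;
            if (i <= j) {
                int tmp = idx[i];
                idx[i] = idx[j];
                idx[j] = tmp;
                i++;
                j--;
            }
        }
        orderArray(idx, means, iInit, j);
        orderArray(idx, means, i, iEnd);
    }

    public static void main(String[] args) {
        double[] means = {3.2, 4.8, 1.5, 4.1};
        int[] idx = {0, 1, 2, 3};
        orderArray(idx, means, 0, idx.length - 1);
        System.out.println(Arrays.toString(idx)); // [2, 0, 3, 1]
    }
}
```

Sorting indices rather than the columns themselves avoids moving column data; a descending order only requires swapping the two comparisons.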
setFillerValues
In the Bandwagon Attack, the values for filler items are randomly drawn from a normal distribution with the mean and standard deviation of the overall dataset.
Type void Visibility protected Is Abstract false Parameter
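A minimal sketch of drawing filler values from a normal distribution with the dataset's mean and standard deviation, using java.util.Random#nextGaussian. The double[] row with Double.NaN for missing values and the class GaussianFiller are hypothetical stand-ins for datapro4j's own types; the guide does not say whether the drawn values are rounded or clipped to the rating scale, so this sketch does neither.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of Gaussian filler values (names are illustrative, not datapro4j API). */
class GaussianFiller {

    /** Writes mean + sd * N(0, 1) into each filler column of the attack row. */
    static void setFillerValues(double[] attackRow, int[] fillerCols,
                                double mean, double sd, Random rand) {
        for (int col : fillerCols) {
            attackRow[col] = mean + sd * rand.nextGaussian();
        }
    }

    public static void main(String[] args) {
        double[] row = new double[5];
        Arrays.fill(row, Double.NaN); // NaN marks "missing" in this sketch
        setFillerValues(row, new int[]{1, 3}, 3.6, 1.1, new Random(42));
        System.out.println(Arrays.toString(row));
    }
}
```

Passing the Random instance in (seeded from iSeed in the guide's design) keeps the generated attack reproducible.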
setSelectedValues
Set the values of selected items. In the Bandwagon Attack, each selected item has the maximum value.
Type void Visibility protected Is Abstract false Parameter
setVisibilityColumns
Select the columns that exceed the visibility and density thresholds.
Type void Visibility protected Is Abstract false Parameter
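The column filter can be sketched as follows, assuming "exceed" means strictly greater than both thresholds, that the visibility of a column is its mean, and that its density is the count of non-missing values (Double.NaN here). VisibilityFilter and the double[][] row-by-column layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a visibility/density column filter (not the datapro4j source). */
class VisibilityFilter {

    /** data[row][col]; Double.NaN stands in for null, empty or missing values. */
    static List<Integer> setVisibilityColumns(double[][] data, double dVisibility, double dDensity) {
        List<Integer> visible = new ArrayList<>();
        int nCols = data.length == 0 ? 0 : data[0].length;
        for (int c = 0; c < nCols; c++) {
            double sum = 0;
            int count = 0; // density: number of non-missing values in column c
            for (double[] row : data) {
                if (!Double.isNaN(row[c])) {
                    sum += row[c];
                    count++;
                }
            }
            double mean = count == 0 ? Double.NaN : sum / count; // NaN never passes the test below
            if (count > dDensity && mean > dVisibility) {
                visible.add(c);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] data = {
            {5, 1, nan},
            {4, 2, 5},
            {5, nan, nan},
        };
        System.out.println(setVisibilityColumns(data, 3.5, 1)); // [0]
    }
}
```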
Relation Detail
Generalization
Name Related Element • ReverseBandwagonAttack
Name Related Element • IntruderAttack
Class DatasetStatistics
Name DatasetStatistics Qualified Name es::uco::kdis::datapro::algorithm::intruder::DatasetStatistics Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail All attributes are private.
Operation Detail
DatasetStatistics
Constructor. A parameter is required:
• data Dataset over which the statistical strategy will be executed.
Type Visibility public Is Abstract false Parameter • inout data : Dataset
execute
It executes the algorithm.
Type void Visibility public Is Abstract false Parameter
getResult
It returns the mean and standard deviation (SD) as an ArrayList of Double values.
Type ArrayList<Double> Visibility public Is Abstract false Parameter
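getResult is documented as returning the mean and SD as an ArrayList&lt;Double&gt;. A minimal sketch of that computation, assuming a double[][] dataset with Double.NaN marking missing values and the population standard deviation (the guide does not say which estimator is used); OverallStats is a hypothetical name:

```java
import java.util.ArrayList;

/** Hypothetical sketch of an overall mean/SD computation (not the datapro4j source). */
class OverallStats {

    /** Returns [mean, SD] over every non-missing cell; Double.NaN marks missing values. */
    static ArrayList<Double> meanAndSD(double[][] data) {
        double sum = 0;
        long n = 0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    sum += v;
                    n++;
                }
            }
        }
        double mean = sum / n; // NaN if the dataset has no values at all
        double squares = 0;
        for (double[] row : data) {
            for (double v : row) {
                if (!Double.isNaN(v)) {
                    squares += (v - mean) * (v - mean);
                }
            }
        }
        ArrayList<Double> result = new ArrayList<>();
        result.add(mean);
        result.add(Math.sqrt(squares / n)); // population SD (divides by n): an assumption
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{2, 4, Double.NaN}, {4, 4}, {5, 5}, {7, 9}};
        System.out.println(meanAndSD(data)); // [5.0, 2.0]
    }
}
```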
Initialize
Initialization/pre-processing method for the strategy.
Type void Visibility public Is Abstract false Parameter
postexec
Type void Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Class IntruderAttack IntruderAttack is the abstract base class for all the intruder attack algorithms. This class represents a generic attack used to alter the content of a dataset. It extends DatasetStrategy, whose methods are implemented and adapted to a general intruder strategy.
For further details, see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol., vol. 7, no. 4, article 23, 2007.
Name IntruderAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::IntruderAttack Visibility public Abstract true Base Classifier • DatasetStrategy Realized Interface
Attribute Detail
bPush
bPush represents the version of the algorithm (true for a push attack; false for a nuke attack).
Type boolean Default Value Visibility protected Multiplicity
dXRand
dXRand represents the probability of choosing an item (attribute) as a filler item.
Type double Default Value Visibility protected Multiplicity
iActualInstance
iActualInstance represents the dataset instance modified by the attack.
Type int Default Value Visibility protected Multiplicity
iNumAttacks
iNumAttacks represents the number of attack instances that will be generated.
Type int Default Value Visibility protected Multiplicity
iNumFillers
iNumFillers is the number of filler items, -1 if the filler item set size is randomly chosen.
Type int Default Value Visibility protected Multiplicity
iNumSelected
iNumSelected is the number of selected items, -1 if the selected item set size is randomly chosen.
Type int Default Value Visibility protected Multiplicity
iSeed
iSeed is the seed for the oRand object.
Type int Default Value Visibility protected Multiplicity
iTarget
iTarget is the target attribute of the attack.
Type int Default Value Visibility protected Multiplicity
oInjection
oInjection stores the attack instances.
Type Dataset Default Value Visibility protected Multiplicity
oRand
oRand is the random number generator, seeded with iSeed.
Type Random Default Value Visibility protected Multiplicity
rgoFillers
rgoFillers is the set of filler items.
Type ColumnAbstraction Default Value new ArrayList<ColumnAbstraction>()
Visibility protected Multiplicity 0..*
rgoSelected
rgoSelected is the set of selected items.
Type ColumnAbstraction Default Value new ArrayList<ColumnAbstraction>()
Visibility protected Multiplicity 0..*
Operation Detail
addAttack
Add a new instance (with all items set to a missing value) to the injection.
Type void Visibility protected Is Abstract false Parameter
chooseFillerItems
Select the set of filler items. This selection process is common to all the intruder attack algorithms.
Type void Visibility protected Is Abstract false Parameter
chooseSelectedItems
Select the set of selected items. The selection process is part of a specific intruder attack algorithm.
Type void Visibility protected Is Abstract true Parameter
createRandomSetOfFiller
Select a random set of columns to act as filler items. The set size is also randomly selected. It returns the array of dataset columns that will act as filler items.
Type ArrayList<ColumnAbstraction> Visibility protected Is Abstract false Parameter
createSetOfFiller
Select a random set of columns to act as filler items. The set size is given by the iNumFillers property. It returns the array of dataset columns that will act as filler items.
Type ArrayList<ColumnAbstraction> Visibility protected Is Abstract false Parameter
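The two filler-selection variants can be sketched with column indices standing in for ColumnAbstraction objects. Assumptions (not stated in the guide): the target column is never a filler, the fixed-size set is a uniform sample without replacement, and the random-size set keeps each candidate independently with probability dXRand. FillerSelection is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of filler selection (not the datapro4j source). */
class FillerSelection {

    /** Candidate columns: every column except the target item (an assumption). */
    static List<Integer> candidates(int nCols, int iTarget) {
        List<Integer> c = new ArrayList<>();
        for (int i = 0; i < nCols; i++) {
            if (i != iTarget) {
                c.add(i);
            }
        }
        return c;
    }

    /** Fixed-size filler set (iNumFillers > 0): random sample without replacement. */
    static List<Integer> createSetOfFiller(int nCols, int iTarget, int iNumFillers, Random rand) {
        List<Integer> c = candidates(nCols, iTarget);
        Collections.shuffle(c, rand);
        return new ArrayList<>(c.subList(0, Math.min(iNumFillers, c.size())));
    }

    /** Random-size filler set (iNumFillers == -1): keep each candidate with probability dXRand. */
    static List<Integer> createRandomSetOfFiller(int nCols, int iTarget, double dXRand, Random rand) {
        List<Integer> fillers = new ArrayList<>();
        for (int col : candidates(nCols, iTarget)) {
            if (rand.nextDouble() < dXRand) {
                fillers.add(col);
            }
        }
        return fillers;
    }

    public static void main(String[] args) {
        Random rand = new Random(7);
        System.out.println(createSetOfFiller(10, 4, 3, rand));
        System.out.println(createRandomSetOfFiller(10, 4, 0.3, rand));
    }
}
```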
execute
Implements the strategy of attack algorithms.
Type void Visibility public Is Abstract false Parameter
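IntruderAttack follows a template-method design: execute is concrete, while chooseSelectedItems, setSelectedValues and setFillerValues are abstract hooks implemented by each attack. The sketch below illustrates that pattern under loudly stated assumptions: the hook order inside execute, building one instance per attack, and the rating-scale bounds minValue/maxValue are guesses; rows are double[] with NaN as missing; and getResult returns a typed List&lt;double[]&gt; instead of Object. AttackTemplate and TargetOnlyAttack are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical template-method sketch of IntruderAttack.execute(); not the datapro4j source. */
abstract class AttackTemplate {
    protected final int nCols;
    protected final int iNumAttacks;
    protected final int iTarget;
    protected final boolean bPush;
    protected final double minValue; // lowest rating-scale value (assumed known)
    protected final double maxValue; // highest rating-scale value (assumed known)
    protected final List<double[]> oInjection = new ArrayList<>();
    protected double[] current; // attack instance being built (cf. iActualInstance)

    AttackTemplate(int nCols, int iNumAttacks, int iTarget, boolean bPush,
                   double minValue, double maxValue) {
        this.nCols = nCols;
        this.iNumAttacks = iNumAttacks;
        this.iTarget = iTarget;
        this.bPush = bPush;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    /** Builds one attack instance per iteration; the hook order is an assumption. */
    public void execute() {
        for (int k = 0; k < iNumAttacks; k++) {
            addAttack();
            chooseSelectedItems();
            chooseFillerItems();
            setSelectedValues();
            setFillerValues();
            if (bPush) {
                setMaximumValue();
            } else {
                setMinimumValue();
            }
        }
    }

    /** Appends a new instance with every item set to missing (NaN). */
    protected void addAttack() {
        current = new double[nCols];
        Arrays.fill(current, Double.NaN);
        oInjection.add(current);
    }

    protected void setMaximumValue() { current[iTarget] = maxValue; }
    protected void setMinimumValue() { current[iTarget] = minValue; }

    /** Shared by all attacks in the guide's design; a no-op placeholder in this sketch. */
    protected void chooseFillerItems() { }

    // Algorithm-specific hooks, mirroring the guide's abstract operations
    protected abstract void chooseSelectedItems();
    protected abstract void setSelectedValues();
    protected abstract void setFillerValues();

    public List<double[]> getResult() { return oInjection; }
}

/** Minimal concrete attack used only to exercise the template: target item only. */
class TargetOnlyAttack extends AttackTemplate {
    TargetOnlyAttack(int nCols, int iNumAttacks, int iTarget, boolean bPush) {
        super(nCols, iNumAttacks, iTarget, bPush, 1.0, 5.0);
    }

    @Override protected void chooseSelectedItems() { }
    @Override protected void setSelectedValues() { }
    @Override protected void setFillerValues() { }

    public static void main(String[] args) {
        TargetOnlyAttack attack = new TargetOnlyAttack(4, 2, 1, true);
        attack.execute();
        for (double[] row : attack.getResult()) {
            System.out.println(Arrays.toString(row)); // [NaN, 5.0, NaN, NaN] (printed twice)
        }
    }
}
```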
getMeanAndSD
Calculate the mean and standard deviation of the overall dataset. It returns an array with two elements, mean and standard deviation.
Type ArrayList<Double> Visibility protected Is Abstract false Parameter
getResult
Return the injection dataset created by the attack, i.e., the object comprising the generated attack instances.
Type Object Visibility public Is Abstract false Parameter
initialize
Initialize the algorithm to prepare the execution.
Type void Visibility public Is Abstract false Parameter
isSelectedColumn
This method returns true if rgoSelected contains a column whose name is given by the sName parameter, and false otherwise.
• sName The name of the column to be searched.
Type boolean Visibility protected Is Abstract false Parameter inout sName: String
postexec
Post-processing after the execute method.
Type void Visibility public Is Abstract false Parameter
setFillerValues
This method assigns the correct value for each filler item. It depends on the intruder attack algorithm.
Type void Visibility protected Is Abstract true Parameter
setMaximumValue
Assign the maximum value to the target item.
Type void Visibility protected Is Abstract false Parameter
setMinimumValue
Assign the minimum value to the target item.
Type void Visibility protected Is Abstract false Parameter
setSelectedValues
The value generation process for the selected items. It also depends on the specific intruder attack algorithm.
Type void Visibility protected Is Abstract true Parameter
Relation Detail
Generalization
Name Related Element • AverageAttack
Name Related Element • DatasetStrategy
Name Related Element • RandomAttack
Name Related Element • LoveHateAttack
Name Related Element • BandwagonAttack
Name Related Element • SegmentAttack
Class LoveHateAttack This class implements the Love/Hate Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are randomly selected and receive the opposite extreme value to that of the target item.
For further details, see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol., vol. 7, no. 4, article 23, 2007.
Name LoveHateAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::LoveHateAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface
Operation Detail
chooseSelectedItems
The Love/Hate Attack does not use the selected items.
Type void Visibility protected Is Abstract false Parameter
initialize
Initialization method.
Type void Visibility public Is Abstract false Parameter
LoveHateAttack
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type Visibility public Is Abstract false Parameter • in bPush : boolean
• in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int
• inout oDataset : Dataset
setFillerValues
In the Love/Hate Attack, the values for filler items are assigned in the opposite sense of the type of attack. If it is a push attack, all the filler items are set to the minimum value; if it is a nuke attack, all the filler items are set to the maximum value.
Type void Visibility protected Is Abstract false Parameter
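One Love/Hate profile can be sketched as a plain double[] row, with Double.NaN for items the attacker leaves unrated. The filler indices are taken as given, and minValue/maxValue (the rating-scale bounds) are assumptions; LoveHateProfile is a hypothetical name, not the library's class.

```java
import java.util.Arrays;

/** Hypothetical sketch of one Love/Hate attack profile (not the datapro4j source). */
class LoveHateProfile {

    /**
     * Push attack: target = maxValue, fillers = minValue.
     * Nuke attack: target = minValue, fillers = maxValue.
     * Items that are neither target nor filler stay missing (NaN).
     */
    static double[] build(int nCols, int iTarget, int[] fillers, boolean bPush,
                          double minValue, double maxValue) {
        double[] row = new double[nCols];
        Arrays.fill(row, Double.NaN);
        double fillerValue = bPush ? minValue : maxValue; // opposite sense of the attack
        for (int col : fillers) {
            row[col] = fillerValue;
        }
        row[iTarget] = bPush ? maxValue : minValue;
        return row;
    }

    public static void main(String[] args) {
        double[] push = build(5, 0, new int[]{2, 4}, true, 1.0, 5.0);
        System.out.println(Arrays.toString(push)); // [5.0, NaN, 1.0, NaN, 1.0]
    }
}
```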
setSelectedValues
The Love/Hate Attack does not use the selected items.
Type void Visibility protected Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • IntruderAttack
Class RandomAttack This class implements the Random Attack. This attack strategy assigns the maximum value (push attack) or the minimum value (nuke attack) to the target item. The filler items are randomly selected, and their values are drawn from a normal distribution with the global mean and standard deviation of the dataset.
For further details, see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol., vol. 7, no. 4, article 23, 2007.
Name RandomAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::RandomAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface
Attribute Detail All attributes are private.
Operation Detail
chooseSelectedItems
The Random Attack does not use the selected items.
Type void Visibility protected Is Abstract false Parameter
initialize
Initialization method.
Type void Visibility public Is Abstract false Parameter
RandomAttack
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• bPush The attack type (true, push; false, nuke)
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type Visibility public Is Abstract false Parameter • in bPush : boolean
• in dXRand : double • in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset
setFillerValues
In the Random Attack, the values for filler items are randomly drawn from a normal distribution with the mean and standard deviation of the dataset.
Type void Visibility protected Is Abstract false Parameter
setSelectedValues
The Random Attack does not use the selected items.
Type void Visibility protected Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • IntruderAttack
Class ReverseBandwagonAttack This class implements the Reverse Bandwagon Attack. This attack strategy assigns the minimum value (nuke attack) to the target item. Then, a set of items, called selected items, is chosen among the least visible items. In this attack, the visibility items are those having a low mean and a high evaluation density.
For further details, see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol., vol. 7, no. 4, article 23, 2007.
Name ReverseBandwagonAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::ReverseBandwagonAttack Visibility public Abstract false Base Classifier • BandwagonAttack Realized Interface
Operation Detail
chooseSelectedItems
Create the set of selected items. The set size is given by the iNumSelected property.
Type void Visibility protected Is Abstract false Parameter
initialize
Initialization method.
Type void Visibility public Is Abstract false Parameter
ReverseBandwagonAttack
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• iNumSelected The size of the selected item set: -1 for a random size, >0 for a fixed size
• dXVisibility The visibility threshold
• dXDensity The density threshold
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type Visibility public Is Abstract false Parameter • in dXDensity : double
• in dXRand : double • in dXVisibility : double • in iNumAttacks : int • in iNumFillers : int • in iNumSelected : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset
setSelectedValues
Set the values of selected items. In the Reverse Bandwagon Attack, each selected item has the minimum value.
Type void Visibility protected Is Abstract false Parameter
setVisibilityColumns
Select the columns that exceed the visibility and density thresholds.
Type void Visibility protected Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • BandwagonAttack
Class SegmentAttack This class implements the Segment Attack. This attack strategy assigns the maximum value (push attack) to the target item. Then, a set of selected items (the segment) is set to the maximum value. Finally, a set of filler items is randomly chosen and assigned the minimum value.
For further details, see the paper:
B. Mobasher, R. Burke, R. Bhaumik, and C. Williams. Toward Trustworthy Recommender Systems: An Analysis of Attack Models and Algorithm Robustness. ACM Trans. Internet Technol., vol. 7, no. 4, article 23, 2007.
Name SegmentAttack Qualified Name es::uco::kdis::datapro::algorithm::intruder::SegmentAttack Visibility public Abstract false Base Classifier • IntruderAttack Realized Interface
Attribute Detail
rgdMeanSD
rgdMeanSD stores the mean and standard deviation of the overall dataset.
Type Double Default Value new ArrayList<Double>()
Visibility protected Multiplicity 0..*
Operation Detail
chooseSelectedItems
Create the segment (the set of selected items) from the column names given in rgsNamesOfSelected, i.e., the dataset columns that will act as selected items.
Type void Visibility protected Is Abstract false Parameter
initialize
Initialization method.
Type void Visibility public Is Abstract false Parameter
SegmentAttack
Parameterized Constructor:
• oDataset The original dataset
• iNumAttacks The number of attack instances
• iTarget The target item (The column attribute/item index)
• iNumFillers The size of the filler item set: -1 for a random size, >0 for a fixed size
• rgsNamesOfSelected The array with the names of the columns that will act as selected items (the segment)
• dXRand The probability of choosing an item as a selected/filler item
• iSeed The random seed
Type Visibility public Is Abstract false Parameter • in dXRand : double
• in iNumAttacks : int • in iNumFillers : int • in iSeed : int • in iTarget : int • inout oDataset : Dataset • inout rgsNamesOfSelected : ArrayList<String>
setFillerValues
Set the value for filler items. In the Segment Attack, the minimum value is assigned.
Type void Visibility protected Is Abstract false Parameter
setSelectedValues
Set the values for the selected items. In the Segment Attack, the maximum value is assigned.
Type void Visibility protected Is Abstract false Parameter
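A Segment Attack profile combines three assignments: segment items and target at the maximum, fillers at the minimum. The sketch below also resolves the segment names (as passed in rgsNamesOfSelected) to column indices. It assumes a plain list of column names, double[] rows with NaN for missing values, and known rating-scale bounds; SegmentProfile and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a Segment Attack profile (not the datapro4j source). */
class SegmentProfile {

    /** Maps segment item names to column indices; unknown names are rejected. */
    static int[] resolveSegment(List<String> columnNames, List<String> rgsNamesOfSelected) {
        int[] idx = new int[rgsNamesOfSelected.size()];
        for (int k = 0; k < idx.length; k++) {
            int i = columnNames.indexOf(rgsNamesOfSelected.get(k));
            if (i < 0) {
                throw new IllegalArgumentException("Unknown column: " + rgsNamesOfSelected.get(k));
            }
            idx[k] = i;
        }
        return idx;
    }

    /** Push profile: segment and target at maxValue, fillers at minValue, the rest missing. */
    static double[] build(int nCols, int iTarget, int[] segment, int[] fillers,
                          double minValue, double maxValue) {
        double[] row = new double[nCols];
        Arrays.fill(row, Double.NaN);
        for (int c : fillers) {
            row[c] = minValue;
        }
        for (int c : segment) {
            row[c] = maxValue;
        }
        row[iTarget] = maxValue;
        return row;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("A", "B", "C", "D", "E");
        int[] segment = resolveSegment(names, Arrays.asList("B", "D"));
        double[] row = build(5, 0, segment, new int[]{4}, 1.0, 5.0);
        System.out.println(Arrays.toString(row)); // [5.0, 5.0, NaN, 5.0, 1.0]
    }
}
```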
Relation Detail
Generalization
Name Related Element • IntruderAttack
Package es::uco::kdis::datapro::algorithm::preprocessing
Figure 5. Package es.uco.kdis.datapro.algorithm.preprocessing
Name preprocessing Qualified Name es::uco::kdis::datapro::algorithm::preprocessing
Package es::uco::kdis::datapro::algorithm::preprocessing::discretization
Figure 6. Package es.uco.kdis.datapro.algorithm.preprocessing.discretization
Name discretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization
Class EqualWidthDiscretization Equal-width discretization of a given numerical/integer column of the dataset. A RangeColumn is returned. Notice that EqualFrequencyDiscretization inherits from this class.
Figure 7. Class EqualWidthDiscretization
Name EqualWidthDiscretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualWidthDiscretization Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail
iBins
iBins is the number of bins.
Type int Default Value Visibility protected Multiplicity
oCol
The column to be discretized.
Type NumericalColumn Default Value Visibility protected Multiplicity
oRangeColumn
The column returned as result.
Type RangeColumn Default Value Visibility protected Multiplicity
sColName
The name of the column to be discretized.
Type String Default Value Visibility protected Multiplicity
sResName
The name of the resulting column.
Type String Default Value Visibility protected Multiplicity
Operation Detail
calculateDRangeColumn
This protected method creates a new RangeColumn from the intervals given as a parameter and the values of the original numerical column.
• aoRanges Array of intervals
• sName Name of the new column
It returns the resulting RangeColumn.
Type RangeColumn Visibility protected Is Abstract false Parameter • inout aoRanges : DoubleRange • inout sName : String
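calculateDRangeColumn maps each numerical value onto one of the intervals in aoRanges. Assuming the intervals are described by their sorted inner cut points and are half-open, (-∞, c0), [c0, c1), ..., [c_last, +∞) (the boundary convention is an assumption), the lookup is a binary search with java.util.Arrays#binarySearch. The class IntervalAssigner and the use of bin indices in place of DoubleRange objects are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: map values to interval indices given sorted inner cut points. */
class IntervalAssigner {

    /** Intervals are (-inf, c0), [c0, c1), ..., [c_last, +inf); returns -1 for missing (NaN). */
    static int intervalOf(double v, double[] cuts) {
        if (Double.isNaN(v)) {
            return -1;
        }
        int r = Arrays.binarySearch(cuts, v);
        // Exact hit on cut r: the value opens interval r + 1; otherwise use the insertion point
        return r >= 0 ? r + 1 : -r - 1;
    }

    static int[] assign(double[] column, double[] cuts) {
        int[] bins = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            bins[i] = intervalOf(column[i], cuts);
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] cuts = {2, 4, 6, 8};
        double[] column = {0, 2, 3.9, 8, 10, Double.NaN};
        System.out.println(Arrays.toString(assign(column, cuts))); // [0, 1, 1, 4, 4, -1]
    }
}
```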
EqualWidthDiscretization
Parameterized Constructor:
• oDataset The dataset to be processed.
• iBins The number of bins.
• sColName The name of the column to be processed.
• sResName The name of the resulting column.
Type Visibility public Is Abstract false Parameter • in iBins : int
• inout oDataset : Dataset • inout sColName : String • inout sResName : String
execute
This method runs the discretization process. It first calculates the cut points and then sets the range intervals.
Type void Visibility public Is Abstract false Parameter
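For equal-width discretization the cut points follow directly from the column range: with iBins bins, the width is (max - min) / iBins and the inner cut points are min + i · width for i = 1, ..., iBins - 1. A minimal sketch, assuming missing values are encoded as Double.NaN and ignored; EqualWidthCuts is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch of equal-width cut points (not the datapro4j source). */
class EqualWidthCuts {

    /** Returns the iBins - 1 inner cut points min + i * (max - min) / iBins, ignoring NaN. */
    static double[] cutPoints(double[] column, int iBins) {
        if (iBins < 1) {
            throw new IllegalArgumentException("iBins must be at least 1");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        if (min > max) {
            throw new IllegalArgumentException("column has no non-missing values");
        }
        double width = (max - min) / iBins;
        double[] cuts = new double[iBins - 1];
        for (int i = 1; i < iBins; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cutPoints(new double[]{0, 10, 5, Double.NaN}, 5)));
        // [2.0, 4.0, 6.0, 8.0]
    }
}
```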
getResult
The discretized RangeColumn is returned.
Type Object Visibility public Is Abstract false Parameter
initialize
The initialization method. Types of the column and its values are checked.
Type void Visibility public Is Abstract false Parameter
postexec
Not required.
Type void Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • EqualFrequencyDiscretization
Name Related Element • DatasetStrategy
Class EqualFrequencyDiscretization Equal-frequency discretization of a given numerical/integer column of the dataset. A RangeColumn is returned.
Figure 8. Class EqualFrequencyDiscretization
Name EqualFrequencyDiscretization Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::EqualFrequencyDiscretization Visibility public Abstract false Base Classifier • DatasetStrategy • EqualWidthDiscretization Realized Interface
Attribute Detail All attributes are private.
Operation Detail
Notice that this class is inherited from EqualWidthDiscretization.
EqualFrequencyDiscretization
Parameterized constructor. Parameters:
• iBins Number of bins to be created
• oDataset Source dataset containing the column to be discretized
• sColName Name of the source column
• sResName Name of the resulting range column
Type Visibility public Is Abstract false Parameter • in iBins : int
• inout oDataset : Dataset • inout sColName : String • inout sResName : String
execute
This method performs the equal-frequency discretization of the column specified in the constructor (sColName).
Type void Visibility public Is Abstract false Parameter
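Equal-frequency discretization places the cut points so that each bin receives roughly the same number of values. A minimal sketch, assuming missing values are Double.NaN and that each cut point lies midway between the neighbouring sorted values at positions i · n / iBins (other quantile conventions exist, and the guide does not specify one); EqualFrequencyCuts is a hypothetical name. Heavily repeated values can yield duplicate cut points, which a real implementation would need to merge.

```java
import java.util.Arrays;

/** Hypothetical sketch of equal-frequency cut points (not the datapro4j source). */
class EqualFrequencyCuts {

    /** Splits the sorted non-missing values into iBins groups of (nearly) equal size. */
    static double[] cutPoints(double[] column, int iBins) {
        double[] v = Arrays.stream(column).filter(x -> !Double.isNaN(x)).sorted().toArray();
        int n = v.length;
        if (iBins < 1 || n < iBins) {
            throw new IllegalArgumentException("need 1 <= iBins <= number of values");
        }
        double[] cuts = new double[iBins - 1];
        for (int i = 1; i < iBins; i++) {
            int p = (int) ((long) i * n / iBins); // index of the first value in bin i
            cuts[i - 1] = (v[p - 1] + v[p]) / 2.0; // midpoint between neighbouring values
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] column = {9, 1, 8, 2, 7, 3, 6, 4, 5, Double.NaN};
        System.out.println(Arrays.toString(cutPoints(column, 3))); // [3.5, 6.5]
    }
}
```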
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Name Related Element • EqualWidthDiscretization
Class MDLPDiscretize
Figure 9. Class MDLPDiscretize
Name MDLPDiscretize Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::discretization::MDLPDiscretize Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail All attributes are private.
Operation Detail
execute
This method runs the discretization process following the MDLP algorithm.
Type void Visibility public Is Abstract false Parameter
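The guide does not spell out the MDLP criterion. Assuming it is the standard Fayyad and Irani minimum description length test, a candidate cut T that splits a set S of N examples into S1 and S2 is accepted when Gain(T; S) > (log2(N - 1) + Δ) / N, with Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)], where k, k1 and k2 are the numbers of classes present in S, S1 and S2. The sketch below evaluates that test for a single cut over class labels already ordered by attribute value; MdlpCriterion is hypothetical, and the recursive search for the best cut is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Fayyad-Irani MDLP acceptance test for one cut. */
class MdlpCriterion {

    private static final double LN2 = Math.log(2);

    /** Class entropy, in bits, of labels[from..to). */
    static double entropy(int[] labels, int from, int to) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = from; i < to; i++) {
            counts.merge(labels[i], 1, Integer::sum);
        }
        double n = to - from;
        double h = 0;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p) / LN2;
        }
        return h;
    }

    /** Number of distinct class labels in labels[from..to). */
    static int classes(int[] labels, int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) {
            s.add(labels[i]);
        }
        return s.size();
    }

    /** labels are sorted by attribute value; the cut puts [0, cut) in S1 and [cut, n) in S2. */
    static boolean acceptSplit(int[] labels, int cut) {
        int n = labels.length;
        double ent = entropy(labels, 0, n);
        double ent1 = entropy(labels, 0, cut);
        double ent2 = entropy(labels, cut, n);
        double gain = ent - (cut * ent1 + (n - cut) * ent2) / n;
        int k = classes(labels, 0, n);
        int k1 = classes(labels, 0, cut);
        int k2 = classes(labels, cut, n);
        double delta = Math.log(Math.pow(3, k) - 2) / LN2 - (k * ent - k1 * ent1 - k2 * ent2);
        return gain > (Math.log(n - 1) / LN2 + delta) / n;
    }

    public static void main(String[] args) {
        System.out.println(acceptSplit(new int[]{0, 0, 0, 0, 1, 1, 1, 1}, 4)); // true
        System.out.println(acceptSplit(new int[]{0, 1, 0, 1, 0, 1, 0, 1}, 4)); // false
    }
}
```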
getResult
It returns the discretized dataset.
Type Object Visibility public Is Abstract false Parameter
initialize
The initialize() strategy method. It takes the whole dataset and distributes each column into a LinkedList of double arrays, where the first element of each array is the value in that column and the second is the associated class label.
Type void Visibility public Is Abstract false Parameter
MDLPDiscretize
Constructor with parameters:
• oDataset Source dataset
Note: class labels are assumed to be in the last column of the dataset.
Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset
postexec
The postexec() strategy method.
Type void Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Package es::uco::kdis::datapro::algorithm::preprocessing::instance
Figure 10. Package es.uco.kdis.datapro.algorithm.preprocessing.instance
Name instance Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance
Class RemoveDuplicates This class modifies the content of a Dataset by removing duplicate instances from this dataset.
Figure 11. Class RemoveDuplicates
Name RemoveDuplicates Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance::RemoveDuplicates Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail All attributes are private.
Operation Detail
execute
Execution method. It removes the duplicate instances from the dataset.
Type void Visibility public Is Abstract false Parameter
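The guide does not state how duplicates are detected. Assuming two instances are duplicates when all their values are equal position by position, and that the first occurrence is kept, the idea can be sketched with a LinkedHashSet. Rows are modeled as List&lt;String&gt;, and DedupSketch is a hypothetical name rather than part of datapro4j.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of duplicate-instance removal; not the datapro4j implementation. */
class DedupSketch {

    /** Returns the rows in their original order, keeping only the first occurrence of each. */
    static List<List<String>> removeDuplicates(List<List<String>> rows) {
        // LinkedHashSet preserves insertion order and relies on List.equals/hashCode,
        // so two rows are duplicates when all their values are equal, position by position.
        Set<List<String>> seen = new LinkedHashSet<>(rows);
        return new ArrayList<>(seen);
    }
}
```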
getResult
It returns the clean dataset.
Type Object Visibility public Is Abstract false Parameter
initialize
Initializes the algorithm in preparation for its execution.
Type void Visibility public Is Abstract false Parameter
postexec
Post-processing.
Type void Visibility public Is Abstract false Parameter
RemoveDuplicates
Parameterized Constructor:
• oDataset The source dataset to work with.
Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Class RemovePercentage This class modifies the content of a dataset by removing a percentage of its instances.
Figure 12. Class RemovePercentage
Name RemovePercentage Qualified Name es::uco::kdis::datapro::algorithm::preprocessing::instance::RemovePercentage Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail
RANDOM
RANDOM mode, in which the instances to be removed are selected randomly.
Type int Default Value 0 Visibility public Multiplicity
FROMINIT
FROMINIT mode, in which the instances to be removed are taken from the beginning of the column.
Type int Default Value 1 Visibility public Multiplicity
FROMEND
FROMEND mode, in which the instances to be removed are taken from the end of the column.
Type int Default Value 2 Visibility public Multiplicity
oRnd
oRnd is the random generator object.
Type Random Default Value new Random() Visibility public Multiplicity
Operation Detail
execute
Execution method.
Type void Visibility public Is Abstract false Parameter
getResult
Returns the dataset resulting from the strategy process.
Type Object Visibility public Is Abstract false Parameter
initialize
Initializes the algorithm in preparation for its execution.
Type void Visibility public Is Abstract false Parameter
postexec
Post-processing method.
Type void Visibility public Is Abstract false Parameter
RemovePercentage
Parameterized Constructor:
• oDataset The source dataset
• iMode The mode of removal
• dPercentage The percentage of instances (in [0,1]) to remove from the dataset
Type Visibility public Is Abstract false Parameter • in dPercentage : double
• in iMode : int • inout oDataset : Dataset
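The three removal modes can be illustrated by computing which instance indices would be removed. This is a sketch under stated assumptions: the number of removed instances is rounded to the nearest integer (the guide does not specify the rounding), and RemovalSketch and indicesToRemove are hypothetical names. The mode constants mirror the documented values RANDOM = 0, FROMINIT = 1 and FROMEND = 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of RemovePercentage's index selection; not the datapro4j code. */
class RemovalSketch {
    static final int RANDOM = 0, FROMINIT = 1, FROMEND = 2; // same values as documented

    /** Returns the sorted indices of the instances to remove from a dataset of n rows. */
    static List<Integer> indicesToRemove(int n, int mode, double percentage, Random rnd) {
        if (percentage < 0 || percentage > 1) {
            throw new IllegalArgumentException("percentage must be in [0,1]");
        }
        int k = (int) Math.round(percentage * n); // rounding rule is an assumption
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < n; i++) all.add(i);
        List<Integer> chosen;
        switch (mode) {
            case FROMINIT: chosen = all.subList(0, k); break;
            case FROMEND:  chosen = all.subList(n - k, n); break;
            case RANDOM:
                Collections.shuffle(all, rnd); // the seeded generator plays the role of oRnd
                chosen = all.subList(0, k);
                break;
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
        List<Integer> result = new ArrayList<>(chosen);
        Collections.sort(result);
        return result;
    }
}
```

For a 10-row dataset and dPercentage = 0.3, FROMINIT selects rows 0 to 2 and FROMEND selects rows 7 to 9.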
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Package es::uco::kdis::datapro::algorithm::validation
Figure 13. Package es.uco.kdis.datapro.algorithm.validation
Name validation Qualified Name es::uco::kdis::datapro::algorithm::validation
Class KFolds This class implements the strategy that calculates the different partitions of the dataset using the KFolds algorithm.
Figure 14. Class es.uco.kdis.datapro.algorithm.validation.KFolds
Name KFolds Qualified Name es::uco::kdis::datapro::algorithm::validation::KFolds Visibility public Abstract false Base Classifier • DatasetStrategy Realized Interface
Attribute Detail All attributes are private.
Operation Detail
execute
It runs the KFolds algorithm. After the execution, the algorithm is not executable anymore.
Type void Visibility public Is Abstract false Parameter
getResult
This method returns the list containing the resulting dataset partitions.
Type List<Object> Visibility public Is Abstract false Parameter
initialize
This method initializes the algorithm. The instances are grouped by category (class) in a HashMap.
Type void Visibility public Is Abstract false Parameter
KFolds
Parameterized constructor. Notice that the class column is assumed to be the last column in the dataset:
• oDataset Source dataset • iNumberOfPartitions Number of partitions to be built
Type Visibility public Is Abstract false Parameter • in iNumberOfPartitions : int
• inout oDataset : Dataset
KFolds
Parameterized constructor. Notice that the class column is assumed to be the last column in the dataset:
• oDataset Source dataset • iNumberOfPartitions Number of partitions to be built • iSeed Seed for the partitioning process. To reproduce a previous partition, the programmer can supply the seed used then; otherwise, the seed is selected randomly.
Type Visibility public Is Abstract false Parameter • in iNumberOfPartitions : int
• inout oDataset : Dataset
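initialize() is described as grouping instances by category in a HashMap. One common way to turn such groups into k folds is a stratified round-robin assignment after a seeded shuffle; the sketch below assumes that strategy (the guide does not confirm it), models each instance only by its class label, and uses the hypothetical name FoldsSketch. A TreeMap replaces the HashMap so that the fold assignment for a given seed is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical sketch of stratified k-fold partitioning; not the datapro4j KFolds code. */
class FoldsSketch {

    /** Splits row indices into k folds, keeping class proportions roughly equal per fold. */
    static List<List<Integer>> partition(List<String> labels, int k, long seed) {
        // Group row indices by class label (the guide's initialize() groups by category).
        Map<String, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byClass.computeIfAbsent(labels.get(i), key -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        Random rnd = new Random(seed); // same seed -> same partition
        int next = 0;
        for (List<Integer> group : byClass.values()) {
            Collections.shuffle(group, rnd);
            for (int idx : group) {
                folds.get(next % k).add(idx); // deal instances round-robin across folds
                next++;
            }
        }
        return folds;
    }
}
```

Passing the same seed twice reproduces the same folds, which is the behavior the iSeed parameter is meant to provide.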
postexec
Not required.
Type void Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • DatasetStrategy
Package es::uco::kdis::datapro::dataset
Figure 15. Package es.uco.kdis.datapro.dataset
Name dataset Qualified Name es::uco::kdis::datapro::dataset
Class Dataset Dataset is the abstract base class for all the different types of dataset sources. This class bridges the gap between the physical dataset (stored in a file, database, etc.) and its logical handling, providing access to attributes/columns and processing methods.
Figure 16. Class Dataset
Name Dataset Qualified Name es::uco::kdis::datapro::dataset::Dataset Visibility public Abstract true Base Classifier Realized Interface
Attribute Detail
iCursor
iCursor refers to the row of the dataset currently pointed to by the InstanceIterator.
Type int Default Value Visibility protected Multiplicity
rgoColumns
rgoColumns is the list of columns that comprise the dataset.
Type ColumnAbstraction Default Value Visibility protected Multiplicity 0..*
rgoValidBinaryFalseValues
For binary columns, it contains the list of values that will be interpreted as False when reading from the physical dataset. Writing will be performed using the first element in the list.
Type String Default Value Visibility protected Multiplicity 0..*
rgoValidBinaryTrueValues
For binary columns, it contains the list of values that will be interpreted as True when reading from the physical dataset. Writing will be performed using the first element in the list.
Type String Default Value Visibility protected Multiplicity 0..*
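The convention described for binary columns (any listed value is accepted when reading, the first element of the list is used when writing) can be sketched as follows. BinaryCodecSketch is a hypothetical helper, not datapro4j API, and the case-sensitive comparison is an assumption.

```java
import java.util.List;

/** Hypothetical sketch of the binary-value convention; not part of datapro4j. */
class BinaryCodecSketch {

    /** Reading: any string in the True or False list is accepted (case-sensitive here). */
    static boolean read(String token, List<String> trueValues, List<String> falseValues) {
        if (trueValues.contains(token)) return true;
        if (falseValues.contains(token)) return false;
        throw new IllegalArgumentException("Not a valid binary value: " + token);
    }

    /** Writing: only the first element of the corresponding list is used. */
    static String write(boolean value, List<String> trueValues, List<String> falseValues) {
        return value ? trueValues.get(0) : falseValues.get(0);
    }
}
```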
sOpenRangeDelimiter
For range columns, sOpenRangeDelimiter stores the symbol(s) that open the numerical range, right before the minimum value: e.g., '[' for [2,3]. This is used during the reading and writing of the physical dataset.
Type String Default Value Visibility protected Multiplicity
sSeparationRangeDelimiter
For range columns, sSeparationRangeDelimiter stores the symbol(s) that separate the minimum and maximum values in a numerical range: e.g., ',' for [2,3]. This value is only used during the reading and writing of the physical dataset.
Type String Default Value Visibility protected Multiplicity
sCloseRangeDelimiter
For range columns, sCloseRangeDelimiter stores the symbol(s) that close the numerical range, right after the maximum value: e.g., ']' for [2,3]. This is only used during the reading and writing of the physical dataset.
Type String Default Value Visibility protected Multiplicity
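How the three range delimiters could combine when a range value such as [2,3] is read or written can be sketched as below. RangeSketch, parse and format are hypothetical names, and trimming whitespace around the bounds is an assumption.

```java
/** Hypothetical sketch of reading/writing a range value with configurable delimiters. */
class RangeSketch {

    /** Parses e.g. "[2,3]" into {2.0, 3.0} using the open, separation and close delimiters. */
    static double[] parse(String text, String open, String separator, String close) {
        String s = text.trim();
        if (s.length() < open.length() + close.length()
                || !s.startsWith(open) || !s.endsWith(close)) {
            throw new IllegalArgumentException("Not a range: " + text);
        }
        String inner = s.substring(open.length(), s.length() - close.length());
        int sep = inner.indexOf(separator);
        if (sep < 0) throw new IllegalArgumentException("Missing separator: " + text);
        double min = Double.parseDouble(inner.substring(0, sep).trim());
        double max = Double.parseDouble(inner.substring(sep + separator.length()).trim());
        return new double[] { min, max };
    }

    /** Writes the range back using the same delimiters. */
    static String format(double min, double max, String open, String separator, String close) {
        return open + min + separator + max + close;
    }
}
```

With the default delimiters "[", "," and "]", parse("[2,3]", "[", ",", "]") returns {2.0, 3.0}.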
sEmptyValue
sEmptyValue stores the string that will represent an empty value in the dataset file.
Type String Default Value Visibility protected Multiplicity
sMissingValue
sMissingValue stores the string that will represent a missing value in the dataset file.
Type String Default Value Visibility protected Multiplicity
sNullValue
sNullValue stores the string that will represent a null value in the dataset file.
Type String Default Value Visibility protected Multiplicity
sName
The name of the dataset.
Type String Default Value Visibility protected Multiplicity
Operation Detail
addAllValues
A set of column values is inserted into the dataset structure. Notice that instance duplication is not checked.
Parameters:
• sColumnFormat String that specifies the types of the columns to be added. Types depend on the specific dataset.
Exceptions:
• IOException • IllegalFormatSpecificationException • NotAddedValueException • IndexOutOfBoundsException
Type void Visibility protected Is Abstract true Parameter • inout sColumnFormat : String
addColumn
Inserts the column abstraction given as parameter at the last position of the list of columns of the dataset.
Parameter:
• oColumn: Column abstraction to be added
Type void Visibility public Is Abstract false Parameter • inout oColumn : ColumnAbstraction
addColumn
Inserts a column abstraction at a given position of the list of dataset columns.
Parameters:
• oColumn: Column abstraction to be inserted • iIndex: Position index where the column is inserted in the list. The remaining columns are shifted one position to the right.
Exceptions:
• UnsupportedOperationException • ClassCastException • NullPointerException • IllegalArgumentException • IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter • inout iIndex : int
• inout oColumn : ColumnAbstraction
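The exceptions listed for this method are essentially those documented for java.util.List.add(int, E), which suggests, though the guide does not confirm it, that the column list is a java.util.List and the insertion delegates to it. The sketch below models columns by their names instead of ColumnAbstraction objects to show the right shift; ColumnListSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of index-based column insertion on a java.util.List. */
class ColumnListSketch {

    static List<String> insertAt(List<String> columns, int index, String column) {
        List<String> result = new ArrayList<>(columns);
        // Elements from 'index' onwards are shifted one position to the right;
        // an index outside [0, size] throws IndexOutOfBoundsException.
        result.add(index, column);
        return result;
    }
}
```

Inserting "height" at index 1 of [age, income, class] yields [age, height, income, class], so the former indexes of income and class both grow by one.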
clone
Creates a new dataset with exactly the same metadata and column structure. However, only the structure is copied: instances from the original dataset are not added to the new one.
It returns the empty cloned dataset.
Type Dataset Visibility public Is Abstract false Parameter
close
Abstract method that serves to close the physical dataset source.
Exceptions:
• IOException
Type void Visibility protected Is Abstract true Parameter
copy
This method creates a new dataset with exactly the same metadata, column structure and data as the original dataset. In this case, instances from the original dataset are also copied to the new one.
A copy of the dataset is returned.
Type Dataset Visibility public Is Abstract false Parameter
Dataset
This is the default constructor of this class. By default, it sets the following parameters to their default values:
• sMissingValue: "?" • sNullValue: "?" • sEmptyValue: "?" • sOpenRangeDelimiter: "[" • sSeparationRangeDelimiter: "," • sCloseRangeDelimiter: "]"
Notice that using these symbols is not mandatory for reading/writing, as their applicability depends on the specific implementation of each source dataset.
Type Visibility public Is Abstract false Parameter
getColumn
This method looks for a column abstraction by its index in the column list. Notice that indexes can change when a column is added to or removed from an intermediate position.
Parameter:
• iIndex: Index of the queried column. It returns a reference to the column abstraction queried.
Type ColumnAbstraction Visibility public Is Abstract false Parameter • in iIndex : int
getColumnByName
This method returns the first column found whose name matches the one given as parameter. Parameter:
• sName: The name of the queried column (case-insensitive). It returns the column abstraction that gives access to the column with the required name.
Type ColumnAbstraction Visibility public Is Abstract false Parameter • inout sName : String
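A case-insensitive, first-match lookup like the one described can be sketched as a linear scan. To keep the sketch self-contained, columns are modeled by their names and the method returns the matching position (or -1) rather than the ColumnAbstraction itself; NamedColumnSketch and findByName are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of a case-insensitive, first-match column lookup. */
class NamedColumnSketch {

    /** Returns the position of the first column whose name matches, ignoring case; -1 if none. */
    static int findByName(List<String> columnNames, String name) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(name)) {
                return i; // first match wins, later columns with the same name are ignored
            }
        }
        return -1;
    }
}
```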
getColumns
Getter method for the private property rgoColumns, which comprises the array of column abstractions in the dataset.
Type List<ColumnAbstraction> Visibility public Is Abstract false Parameter
getEmptyValue
Getter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation may or may not use this property.
Type String Visibility public Is Abstract false Parameter
getIndexOfColumn
Given a column abstraction, it searches for the index that this column occupies in the array of column abstractions in the dataset.
Parameter:
• oCol: Column to be located.
It returns the index of the column abstraction passed as parameter, or -1 if the column is not found.
Type int Visibility public Is Abstract false Parameter • inout oCol : ColumnAbstraction
getMissingValue
Getter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation may or may not use this property.
Type String Visibility public Is Abstract false Parameter
getName
Getter method for the private property sName, which represents the name given to the dataset.
Type String Visibility public Is Abstract false Parameter
getNullValue
Getter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation may or may not use this property.
Type String Visibility public Is Abstract false Parameter
getNumberOfDecimals
Getter method for the private property iNumberOfDecimals, which indicates the number of decimal digits used when writing numerical columns to dataset sources. Notice that each specific dataset source may use this value accordingly.
Type int Visibility public Is Abstract false Parameter
getRangeDelimiters
This method returns a list of the three values used to demarcate a range: sOpenRangeDelimiter, sSeparationRangeDelimiter and sCloseRangeDelimiter. Notice that each specific dataset source can make use of these values accordingly.
Type ArrayList<String> Visibility public Is Abstract false Parameter
getValidBinaryFalseValues
Getter method for the private property rgoValidBinaryFalseValues: the list of strings that are interpreted as false when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.
Type ArrayList<String> Visibility public Is Abstract false Parameter
getValidBinaryTrueValues
Getter method for the private property rgoValidBinaryTrueValues: the list of strings that are interpreted as true when reading and writing Boolean values in a dataset source. Notice that its specific use depends on the implementation made for each dataset source.
Type ArrayList<String> Visibility public Is Abstract false Parameter
merge
This method merges two datasets by adding the dataset passed as parameter to the current one. Parameters:
• oDSInjected: The dataset to be added. Notice that this dataset must contain the same number and types of columns as the current dataset (this).
Type void Visibility public Is Abstract false Parameter • inout oDSInjected : Dataset
merge
This method merges two datasets by adding the dataset passed as parameter to the dataset object this.
Parameters:
• oDataset: The dataset to be added.
• sColumnFormat: Sometimes the target dataset contains more columns than the source dataset. In those cases, the columns to be added can be explicitly specified by means of this String, where each character corresponds to a column in the target dataset. The String may comprise the following characters:
o x: Include this column.
o %: Skip this column.
Type void Visibility public Is Abstract false Parameter • inout oDataset : Dataset
• inout sColumnFormat : String
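A sketch of how a column-format string such as "x%x%" can be interpreted: it collects the indices of the target columns marked with 'x' and rejects any other character. Mapping the i-th source column onto the i-th included target column is an assumption made for illustration only; the guide just states that 'x' includes a column and '%' skips it. MergeFormatSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of interpreting the sColumnFormat string of merge(). */
class MergeFormatSketch {

    /** Returns the indices of target columns that receive values ('x'); '%' columns are skipped. */
    static List<Integer> includedColumns(String columnFormat) {
        List<Integer> included = new ArrayList<>();
        for (int i = 0; i < columnFormat.length(); i++) {
            char c = columnFormat.charAt(i);
            if (c == 'x') {
                included.add(i);
            } else if (c != '%') {
                throw new IllegalArgumentException("Unexpected format character '" + c + "' at " + i);
            }
        }
        return included;
    }
}
```

Under the assumed mapping, the format "x%x%" would send the two columns of the source dataset to target columns 0 and 2.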
open
Abstract protected method. This method opens the source dataset and initializes the row cursor to the first row of data. However, each specific dataset class is responsible for its implementation, and thus for defining its actual scope according to its specific properties.
Notice that each type of dataset provides specific methods to process the full dataset. For example, file datasets provide the method readDataset.
Exceptions:
• FileNotFoundException • IOException • IllegalFormatSpecificationException
Type void Visibility protected Is Abstract true Parameter
removeColumn
This method removes a column from the dataset and returns the removed column. Notice that the indexes of the columns located after it are decreased.
Parameter:
• iIndex: Position index where the column to be removed is located. Exceptions:
• UnsupportedOperationException • IndexOutOfBoundsException
Type ColumnAbstraction Visibility public Is Abstract false Parameter • in iIndex : int
setColumns
Setter method for the property rgoColumns. Although it is a public method, it should be used very carefully, mainly in those cases where the entire set of columns must be replaced. To add or remove a single column, or just a few of them, use the methods addColumn and removeColumn instead.
Parameter:
• rgoCols: The entire list of columns in the dataset.
Type void Visibility public Is Abstract false Parameter • inout rgoCols : List<ColumnAbstraction>
setEmptyValue
Setter method for the private property sEmptyValue, which comprises the String that represents the symbol for the empty value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sEmptyValue The symbol/string representing an empty value in the dataset
Type void Visibility public Is Abstract false Parameter • inout sEmptyValue : String
setMissingValue
Setter method for the private property sMissingValue, which comprises the String that represents the symbol for the missing value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sMissingValue The symbol/string representing a missing value in the dataset
Type void Visibility public Is Abstract false Parameter • inout sMissingValue : String
setName
Setter method for the private property sName, which represents the name of the dataset. Parameter:
• sName: The name of the dataset.
Type void Visibility public Is Abstract false Parameter • inout sName : String
setNullValue
Setter method for the private property sNullValue, which comprises the String that represents the symbol for the null value in the dataset source. Notice that each specific dataset implementation can make its own use of this property accordingly.
Parameters:
• sNullValue The symbol/string representing a null value in the dataset
Type void Visibility public Is Abstract false Parameter • inout sNull : String
setNumberOfDecimals
Setter method for the private property iNumberOfDecimals, which represents the number of decimals that the programmer wants to set for numerical values. Notice that the specific applicability of this attribute directly depends on the specific implementation of the dataset source.
Parameter:
• iNum: The number of decimal digits that will be considered when saving numerical values.
Type void Visibility public Is Abstract false Parameter • in iNum : int
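A minimal sketch of how a dataset source might apply iNumberOfDecimals when saving a numerical value. DecimalsSketch is a hypothetical helper, and half-up rounding through java.math.BigDecimal is an assumption; each dataset source decides how this setting is actually applied.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical sketch of writing a numerical value with a fixed number of decimals. */
class DecimalsSketch {

    static String write(double value, int numberOfDecimals) {
        // BigDecimal.valueOf uses the shortest decimal representation of the double,
        // and setScale rounds it half-up to the requested number of decimal digits.
        return BigDecimal.valueOf(value)
                .setScale(numberOfDecimals, RoundingMode.HALF_UP)
                .toPlainString();
    }
}
```

For instance, write(3.14159, 2) gives "3.14" and write(1.0, 3) gives "1.000".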
setRangeDelimiters
This method sets the symbols that serve as range delimiters. Notice that the specific applicability of these attributes depends directly on the specific implementation of the dataset source.
Parameters:
• sInitial: The symbol/string that represents the starting delimiter. • sSeparator: The symbol/string that represents the value separator. • sEnding: The symbol/string that represents the ending delimiter.
Type void Visibility public Is Abstract false Parameter • inout sEnding : String
• inout sInitial : String • inout sSeparator : String
setValidBinaryFalseValues
Setter method of the list rgoValidBinaryFalseValues, which contains the set of strings that represent a False boolean value in the dataset. Notice that any value in the list is accepted when reading, but only the first value of the list should be used when writing. In any case, this depends on the specific implementation of each dataset source.
Parameter:
• rgoValidBinaryFalseValues: The list of values that will be interpreted as False.
Type void Visibility public Is Abstract false Parameter • inout rgoValidBinaryFalseValues : ArrayList<String>
setValidBinaryTrueValues
Setter method of the list rgoValidBinaryTrueValues, which contains the set of strings that represent a True boolean value in the dataset. Notice that any value in the list is accepted when reading, but only the first value of the list should be used when writing. In any case, this depends on the specific implementation of each dataset source.
Parameter:
• rgoValidBinaryTrueValues: The list of values that will be interpreted as True.
Type void Visibility public Is Abstract false Parameter • inout rgoValidBinaryTrueValues : ArrayList<String>
setValidBinaryValues
This method sets both the list of strings that represent a True boolean value and the list of strings that represent a False boolean value in the dataset. The same result can be obtained by invoking setValidBinaryTrueValues and setValidBinaryFalseValues separately.
Parameters:
• rgoFalseList: A list with the valid False symbols/strings
• rgoTrueList: A list with the valid True symbols/strings
Type void Visibility public Is Abstract false Parameter • inout rgoFalseList : ArrayList<String>
• inout rgoTrueList : ArrayList<String>
swapColumns
This method swaps two columns in the list of columns of the dataset. It locates both columns and swaps their positions, so both their structure and their data are exchanged.
Parameters:
• oColumn1: The first column to swap. • oColumn2: The second column to swap.
Exceptions:
• ColumnAbstraction • UnsupportedOperationException • ClassCastException • NullPointerException • IllegalArgumentException • IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter • inout oColumn1 : ColumnAbstraction
• inout oColumn2 : ColumnAbstraction
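Assuming the column list is a java.util.List (as the exception list above suggests), swapping two columns by reference can be sketched with Collections.swap. SwapSketch is a hypothetical name, columns are modeled as strings, and throwing IllegalArgumentException when a column does not belong to the list is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of swapping two columns inside the dataset's column list. */
class SwapSketch {

    static List<String> swapColumns(List<String> columns, String first, String second) {
        List<String> result = new ArrayList<>(columns);
        int i = result.indexOf(first);
        int j = result.indexOf(second);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("Both columns must belong to the dataset");
        }
        // Moving the column objects moves their structure and data together.
        Collections.swap(result, i, j);
        return result;
    }
}
```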
Relation Detail
Association
Name rgoColumns Related Element • ColumnAbstraction
Dependency
Name Related Element • InstanceIterator
Generalization
Name Related Element • FileDataset
Class FileDataset This abstract class represents a dataset whose source is a file. It includes the specific methods required to handle datasets stored as files.
Figure 17. Class FileDataset
Name FileDataset Qualified Name es::uco::kdis::datapro::dataset::FileDataset Visibility public Abstract true Base Classifier • Dataset Realized Interface
Attribute Detail
oBufferedReader
oBufferedReader is the buffer used to read the file.
Type BufferedReader Default Value Visibility protected Multiplicity
sCommentValue
sCommentValue stores the string that marks the beginning of a comment line in the dataset file, i.e. a line that has to be omitted from the processing.
Type String Default Value Visibility protected Multiplicity
sFileName
sFileName is the name of the source file that contains the dataset.
Type String Default Value Visibility protected Multiplicity
sSeparationSymbol
sSeparationSymbol stores the symbol/string that separates the values of the same instance (row) within a line of the dataset file (e.g., a comma).
Type String Default Value Visibility protected Multiplicity
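The following Java sketch shows one way the comment and separator symbols could be applied to a single line read from the file. LineSplitter and splitLine are hypothetical names; the sketch assumes that a comment line is any line starting with sCommentValue and that the separator is matched literally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch (not the datapro4j implementation) of how sCommentValue
// and sSeparationSymbol could be applied to one line of a dataset file.
final class LineSplitter {
    private LineSplitter() {}

    // Returns null for comment lines, otherwise the values of the instance in the line.
    static List<String> splitLine(String sLine, String sCommentValue, String sSeparationSymbol) {
        if (sLine.startsWith(sCommentValue)) {
            return null; // comment line: omitted from the processing
        }
        // Pattern.quote makes the separator literal (e.g. "|" or "."), and the
        // limit -1 keeps trailing empty values instead of discarding them.
        String[] rgsValues = sLine.split(Pattern.quote(sSeparationSymbol), -1);
        return Arrays.asList(rgsValues);
    }
}
```

Keeping trailing empty strings matters here, because an empty field at the end of a line is still a value of the instance.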
Operation Detail
clone
This method creates a new dataset with exactly the same type and column structure as the original. Instances from the original dataset are not copied. It returns the new Dataset instance.
Type Dataset Visibility public Is Abstract false Parameter
copy
This method clones the dataset and fills the clone with the instances of the original, thus creating a new dataset with exactly the same type, column structure and data. It returns the copied Dataset instance.
Type Dataset Visibility public Is Abstract false Parameter
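To make the difference between clone and copy concrete, the following sketch uses a hypothetical, simplified TinyDataset class (not datapro4j's Dataset) that stores column names and row data. The dataset type is not modelled; only the structure-versus-data distinction described above is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified dataset used only to contrast clone() and copy().
class TinyDataset {
    final List<String> rgsColumnNames = new ArrayList<>();
    final List<List<Object>> rgoRows = new ArrayList<>();

    // Same column structure, no instances.
    @Override
    public TinyDataset clone() {
        TinyDataset oNew = new TinyDataset();
        oNew.rgsColumnNames.addAll(rgsColumnNames);
        return oNew;
    }

    // Clone the structure first, then fill it with copies of the original instances.
    public TinyDataset copy() {
        TinyDataset oNew = clone();
        for (List<Object> rgoRow : rgoRows) {
            oNew.rgoRows.add(new ArrayList<>(rgoRow));
        }
        return oNew;
    }
}
```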
FileDataset
Default constructor. Notice that the following symbols are used by default:
• sCommentValue: "%"
• sSeparationSymbol: ","
Type Visibility public Is Abstract false Parameter
FileDataset
This constructor receives the name of the file as a parameter. The following symbols are used by default:
• sCommentValue: "%"
• sSeparationSymbol: ","
Parameter:
• sFileName: The filename of the dataset source.
Type Visibility public Is Abstract false Parameter • inout sFileName : String
getCommentValue
Getter method of the property sCommentValue.
Type String Visibility public Is Abstract false Parameter
getFileName
Getter method of the filename of the dataset source.
Type String Visibility public Is Abstract false Parameter
getSeparationSymbol
Getter method of the property sSeparationSymbol.
Type String Visibility public Is Abstract false Parameter
readDataset
Implementations of this abstract method will read the dataset from the file specified by the constructor.
Parameters:
• sContentFormat: String that specifies the reading format of the dataset file. Construct the string using a sequence of control tokens:
o % to omit a line (only one line).
o %name to read the names of the columns (only one line).
o %col to read data (zero, one or more lines).
Example: the string “%%%col%%name” indicates that the first two lines must be omitted, then data is read, one more line is omitted and, finally, the last line contains the column names.
• sColumnFormat: A string containing an ordered sequence of tokens that determine the data type of each column to be read. Use the following tokens:
o s: Nominal column
o f: Real column
o c: Categorical column
o b: Binary column
o i: Integer column
o %: Skip this column (the skipped column is not processed)
Additionally, other tokens may be supported depending on the specific dataset source (e.g., d for columns of type date).
Exceptions:
• FileNotFoundException
• IOException
• IllegalFormatSpecificationException
• NotAddedValueException
• IndexOutOfBoundsException
Type void Visibility public Is Abstract true Parameter • inout sColumnFormat : String
• inout sContentFormat : String
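The following Java sketch parses both format strings as described above. FormatParser, LineToken and the column kind names are hypothetical. The sketch throws IllegalArgumentException for unknown tokens, whereas the documented method declares IllegalFormatSpecificationException, and it omits source-specific tokens such as d.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the sContentFormat and sColumnFormat strings of readDataset.
final class FormatParser {
    enum LineToken { SKIP_LINE, COLUMN_NAMES, DATA }

    private FormatParser() {}

    // Splits a content format such as "%%%col%%name" into line tokens.
    static List<LineToken> parseContentFormat(String sContentFormat) {
        List<LineToken> rgTokens = new ArrayList<>();
        int i = 0;
        while (i < sContentFormat.length()) {
            // The longer tokens are tried first, so "%" also works as their prefix.
            if (sContentFormat.startsWith("%name", i)) {
                rgTokens.add(LineToken.COLUMN_NAMES);
                i += "%name".length();
            } else if (sContentFormat.startsWith("%col", i)) {
                rgTokens.add(LineToken.DATA);
                i += "%col".length();
            } else if (sContentFormat.charAt(i) == '%') {
                rgTokens.add(LineToken.SKIP_LINE);
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character at position " + i);
            }
        }
        return rgTokens;
    }

    // Maps each character of a column format such as "sf%b" to a column kind;
    // null marks a skipped column ('%').
    static List<String> parseColumnFormat(String sColumnFormat) {
        List<String> rgKinds = new ArrayList<>();
        for (char c : sColumnFormat.toCharArray()) {
            switch (c) {
                case 's': rgKinds.add("Nominal"); break;
                case 'f': rgKinds.add("Real"); break;
                case 'c': rgKinds.add("Categorical"); break;
                case 'b': rgKinds.add("Binary"); break;
                case 'i': rgKinds.add("Integer"); break;
                case '%': rgKinds.add(null); break;
                default: throw new IllegalArgumentException("Unknown column token: " + c);
            }
        }
        return rgKinds;
    }
}
```

Parsing the example string “%%%col%%name” with this sketch yields SKIP_LINE, SKIP_LINE, DATA, SKIP_LINE, COLUMN_NAMES.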
setCommentValue
Setter method of the property sCommentValue.
Parameter:
• sComment: The token/string indicating the symbol that represents a comment line in the dataset file.
Type void Visibility protected Is Abstract false Parameter • inout sComment : String
setFileName
Setter method of the property sFileName.
Parameter:
• sFileName: The filename of the dataset source.
Type void Visibility public Is Abstract false Parameter • inout sFileName : String
setSeparationSymbol
Setter method of the property sSeparationSymbol.
Parameter:
• sSeparator: The token used to separate the values of an instance within the same line of the dataset source.
Type void Visibility protected Is Abstract false Parameter • inout sSeparator : String
writeDataset
This abstract method defines the signature of the write method for every file dataset. Implementations of this method deal with the serialization (writing) of the current columns, including their values, into each specific file format.
Parameter:
• sOutputFile: The path where the dataset file will be saved.
Exception:
• IOException
Type void Visibility public Is Abstract true Parameter • inout sOutputFile : String
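A possible shape for the serialization performed by a CSV-like implementation is sketched below. CsvLikeWriter is hypothetical; it writes to a java.io.Writer rather than to the sOutputFile path so that it can be exercised in memory, and it assumes a header line with the column names followed by one line per instance.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical sketch of a CSV-like serialization. Data are kept column by column,
// so each output line is assembled by reading position i from every column.
final class CsvLikeWriter {
    private CsvLikeWriter() {}

    static void write(List<String> rgsNames, List<List<Object>> rgoColumns,
                      String sSeparator, Writer oOut) throws IOException {
        // Header line with the column names.
        oOut.write(String.join(sSeparator, rgsNames));
        oOut.write("\n");
        // Assumption: all columns have the same number of elements.
        int iRows = rgoColumns.isEmpty() ? 0 : rgoColumns.get(0).size();
        for (int i = 0; i < iRows; i++) {
            StringBuilder sbLine = new StringBuilder();
            for (int j = 0; j < rgoColumns.size(); j++) {
                if (j > 0) {
                    sbLine.append(sSeparator);
                }
                sbLine.append(rgoColumns.get(j).get(i)); // null is written as "null"
            }
            oOut.write(sbLine.toString());
            oOut.write("\n");
        }
    }
}
```

To write to sOutputFile, the Writer could be obtained with java.nio.file.Files.newBufferedWriter inside a try-with-resources statement.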
Relation Detail
Generalization
Name Related Element • CsvDataset
Name Related Element • ExcelDataset
Name Related Element • ArffDataset
Name Related Element • Dataset
Class InstanceIterator
InstanceIterator is the class that implements the interface IIterator to traverse the instances of a dataset. Thus, this class represents an iterator that gives access to each row/instance in a dataset. The instance iterator provides methods to traverse the whole set of instances and keeps a reference to the dataset being iterated.
Figure 18. Class InstanceIterator
Name InstanceIterator Qualified Name es::uco::kdis::datapro::dataset::InstanceIterator Visibility public Abstract false Base Classifier Realized Interface • IIterator
Attribute Detail
All attributes are private.
Operation Detail
currentInstance
This method returns the list of objects that form the currently pointed instance in the dataset.
Type List<Object> Visibility public Is Abstract false Parameter
first
This method returns the list of objects that form the first instance in the dataset and sets the pointer to the first instance.
Type List<Object> Visibility public Is Abstract false Parameter
InstanceIterator
Default iterator constructor.
Parameter:
• oDataset: The dataset to be covered by the iterator.
Type Visibility public Is Abstract false Parameter • inout oDataset : Dataset
isDone
This method returns true if the dataset has no more instances to be iterated, and false otherwise.
Type boolean Visibility public Is Abstract false Parameter
next
This method increases the instance pointer by one, i.e. sets the pointer to the next instance in the dataset.
Type void Visibility public Is Abstract false Parameter
Relation Detail
Interface Realization
Name Related Element • IIterator
Interface IIterator
IIterator is the interface that any instance iterator has to implement, as InstanceIterator does.
Figure 19. Interface IIterator
Name IIterator Qualified Name es::uco::kdis::datapro::dataset::IIterator Visibility public Base Classifier
Operation Detail
currentInstance
The implementation of this method has to return the currently pointed instance of the dataset as a List of Object instances.
Type List<Object> Visibility public Is Abstract true Parameter
first
An implementation of this method returns the first instance of the dataset. From then on, the instance pointed to by the iterator should be this first one.
Type List<Object> Visibility public Is Abstract true Parameter
isDone
This method should be implemented to return True if there are no more instances of the dataset to be iterated, i.e. once the iterator has moved past the last instance. It returns False otherwise.
Type boolean Visibility public Is Abstract true Parameter
next
The implementation of this method advances the iterator to the next instance in the dataset.
Type void Visibility public Is Abstract true Parameter
Relation Detail
Interface Realization
Name Related Element • InstanceIterator
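The four operations above could be combined in a loop of the form for (it.first(); !it.isDone(); it.next()). The following sketch implements those operations over column-stored data, assuming that isDone becomes true once the pointer has moved past the last instance. RowIterator and ColumnRowIterator are hypothetical stand-ins for IIterator and InstanceIterator, not the library types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the four IIterator operations (not the datapro4j interface).
interface RowIterator {
    List<Object> first();
    void next();
    boolean isDone();
    List<Object> currentInstance();
}

// Hypothetical iterator over column-stored data: each instance (row) is assembled
// from the element at the current position of every column.
class ColumnRowIterator implements RowIterator {
    private final List<List<Object>> rgoColumns; // reference to the data being iterated
    private int iCurrent = 0;                     // instance pointer

    ColumnRowIterator(List<List<Object>> rgoColumns) {
        this.rgoColumns = rgoColumns;
    }

    // Number of instances (assumption: all columns have the same size).
    private int size() {
        return rgoColumns.isEmpty() ? 0 : rgoColumns.get(0).size();
    }

    @Override
    public List<Object> first() {
        iCurrent = 0;
        return currentInstance();
    }

    @Override
    public void next() {
        iCurrent++;
    }

    // True once the pointer has moved past the last instance.
    @Override
    public boolean isDone() {
        return iCurrent >= size();
    }

    // Returns null when there is no current instance (assumption).
    @Override
    public List<Object> currentInstance() {
        if (isDone()) {
            return null;
        }
        List<Object> rgoInstance = new ArrayList<>();
        for (List<Object> rgoColumn : rgoColumns) {
            rgoInstance.add(rgoColumn.get(iCurrent));
        }
        return rgoInstance;
    }
}
```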
Package es::uco::kdis::datapro::dataset::Column
This package contains the classes related to the different types of columns supported by the library. At the moment, datapro4j provides implementations for the following types:
• Binary column, for positive or negative values.
• Categorical column, for predefined string values, considered as an enumeration of categories.
• Date column.
• Integer column, for numerical integer values.
• Nominal column, for free-valued strings.
• Numerical column, for numerical real values.
• Range column, for values that represent a numerical interval (minimum, maximum), where both open and closed ranges can be considered.
Columns are coded following the philosophy of the bridge design pattern, where an abstraction is decoupled from its implementation. In this way, the programmer can add new implementations of the provided columns to the library (e.g., for performance reasons) without altering the way in which the rest of the library, including the algorithms, interacts with these columns.
Therefore, every column type is implemented by at least two different classes: its abstraction, which provides the accessor methods to its functionality, and its implementation, which codes this functionality and is invoked from the abstraction.
Using columns properly requires following these rules:
• Any code from the library (i.e., from other columns, datasets or strategies) should always invoke methods of the abstraction. Never invoke the column implementation directly (only its own abstraction should do so).
• Altering current abstractions may cause unexpected failures. Use generalization or provide conversion methods to build your own abstractions instead.
• Abstractions and implementations must be subclasses of ColumnAbstraction and ColumnImplementation, respectively.
• Datapro4j only supports one implementation class per abstraction. If the programmer wants to have more than one implementation, then more than one abstraction should be provided, or a factory pattern should be coded.
• If new abstractions (i.e. type of columns) are provided, modify the enumeration ColumnType accordingly.
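A minimal Java sketch of this bridge structure follows. All class names are hypothetical; only the roles follow the rules above: an abstraction that keeps the metainformation and forwards data access to its implementation through oImpl, and a concrete column whose constructor creates that implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation base class: the code that actually touches the data.
abstract class SimpleColumnImpl {
    abstract int addValue(Object oValue);
    abstract Object getElement(int iPos);
    abstract int getSize();
}

// One possible implementation: an ArrayList-backed store of strings.
class ListNominalImpl extends SimpleColumnImpl {
    private final List<String> rgsValues = new ArrayList<>();

    @Override
    int addValue(Object oValue) {
        if (!(oValue instanceof String)) {
            return 0; // rejected: wrong type for a nominal column
        }
        rgsValues.add((String) oValue);
        return 1;
    }

    @Override
    Object getElement(int iPos) { return rgsValues.get(iPos); }

    @Override
    int getSize() { return rgsValues.size(); }
}

// Hypothetical abstraction: keeps the metainformation and forwards data access to oImpl.
abstract class SimpleColumnAbstraction {
    protected String sName;
    protected SimpleColumnImpl oImpl;

    protected SimpleColumnAbstraction(String sName) { this.sName = sName; }

    String getName() { return sName; } // metainformation: no call to the implementation
    int addValue(Object oValue) { return oImpl.addValue(oValue); }
    Object getElement(int iPos) { return oImpl.getElement(iPos); }
    int getSize() { return oImpl.getSize(); }
}

// Concrete column: its constructor creates the implementation and assigns it to oImpl.
class SimpleNominalColumn extends SimpleColumnAbstraction {
    SimpleNominalColumn(String sName) {
        super(sName);
        this.oImpl = new ListNominalImpl();
    }
}
```

Replacing ListNominalImpl with a different implementation would only require changing the SimpleNominalColumn constructor, which is the decoupling the bridge pattern aims for.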
Figure 20. Package es.uco.kdis.datapro.dataset.Column
Name Column Qualified Name es::uco::kdis::datapro::dataset::Column
Class ColumnAbstraction
This abstract class implements the functionality common to every column in the dataset. It also defines the methods that are not coded by the implementation class because they refer to the column metainformation (e.g., name, type, etc.). The latter methods are implemented directly by the abstractions, since they do not require any access to data.
Figure 21. Abstract class ColumnAbstraction
Name ColumnAbstraction Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnAbstraction Visibility public Abstract true Base Classifier Realized Interface
Attribute Detail
ctColumnType
The column type, as represented by the enumeration defined by the class ColumnType.
Type ColumnType Default Value Visibility protected Multiplicity 1
oImpl
A reference to the implementation object.
Type ColumnImpl Default Value Visibility protected Multiplicity 1
sName
The name of the column.
Type String Default Value Visibility protected Multiplicity
Operation Detail
addAllValues
This method calls the implementation to add a list of values at the end of the column.
Parameter:
• rgoCol The list of values to be added. The objects it contains must satisfy the type required by the column.
Type void Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>
addValue
This method calls the implementation to add a single value at the end of the column.
Parameter:
• oValue The value to be added. It must satisfy the type required by the column.
The method returns the number of items successfully added to the column.
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
This method calls the implementation to add a single value at the end of the column.
Parameters:
• oValue The value to be added. It must satisfy the type required by the column.
• bForce Indicates that the value must be added regardless of the constraints and addition policies defined by the column type.
The method returns the number of items successfully added to the column.
Type int Visibility public Is Abstract false Parameter • in bForce : boolean
• inout oValue : Object
addValue
This method calls the implementation to add a single value at a given position in the column.
Parameters:
• iIndex indicates the element position where the item has to be added.
• oValue The value to be added. It must satisfy the type required by the column.
The method returns the number of items successfully added to the column.
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
ColumnAbstraction
Default constructor with parameters. Subclasses may invoke this constructor or define new constructors.
This constructor only assigns the parameter values to their respective attributes. The constructor of the subclass should create the implementation object and assign it to the attribute oImpl.
Parameters:
• ctColumnType The column type.
• sName The name of the column to be created.
Type Visibility public Is Abstract false Parameter • inout ctColumnType : ColumnType
• inout sName : String
countEmptyValues
This method calls the implementation to return the number of empty values in the column.
Type int Visibility public Is Abstract false Parameter
countInvalidValues
This method calls the implementation to return the number of invalid values (i.e., empty, null and missing values) in the column.
Type int Visibility public Is Abstract false Parameter
countMissingValues
This method calls the implementation to return the number of missing values in the column.
Type int Visibility public Is Abstract false Parameter
countNullValues
This method calls the implementation to return the number of null values in the column.
Type int Visibility public Is Abstract false Parameter
getElement
This method calls the implementation to return the element at the given position.
Parameter:
• iPos Position of the element queried.
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getEmptyValue
This method calls the implementation to return the column-specific empty value. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values; instead, it allows the developer to define a custom one for their own purposes (e.g., the symbol associated with the empty value).
Type Object Visibility public Is Abstract false Parameter
getMissingValue
This method calls the implementation to return the column-specific missing value. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values; instead, it allows the developer to define a custom one for their own purposes (e.g., the symbol associated with a missing value).
Type Object Visibility public Is Abstract false Parameter
getName
This method returns the name of the column.
Type String Visibility public Is Abstract false Parameter
getNullValue
This method calls the implementation to return the column-specific null value. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values; instead, it allows the developer to define their own null object.
Type Object Visibility public Is Abstract false Parameter
getSize
This method calls the implementation to return the size of the column.
Type int Visibility public Is Abstract false Parameter
getType
This method returns the type of the column as a value of ColumnType.
Type ColumnType Visibility public Is Abstract false Parameter
getValues
This method calls the implementation to return the list of items (as instances of Object) contained in the column.
Type List<Object> Visibility public Is Abstract false Parameter
removeValue
This method calls the implementation to remove the element at a given position in the column.
Parameter:
• iIndex The index of the element to be removed.
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setEmptyValue
This method calls the implementation to set the column-specific empty value, if required. This is not the default empty value used by datapro4j (Class EmptyValue) for reading, writing or internally checking empty values; the developer has to define its use in the code of the corresponding strategies.
Parameter:
• oEmptyValue The empty value to be set.
Type void Visibility public Is Abstract false Parameter • inout oEmptyValue : Object
setMissingValue
This method calls the implementation to set the column-specific missing value, if required. This is not the default missing value used by datapro4j (Class MissingValue) for reading, writing or internally checking missing values; the developer has to define its use in the code of the corresponding strategies.
Parameter:
• oMissingValue The missing value to be set.
Type void Visibility public Is Abstract false Parameter • inout oMissingValue : Object
setName
This method sets the name of the column.
Parameter:
• sName The new name for the column.
Type void Visibility public Is Abstract false Parameter • inout sName : String
setNullValue
This method calls the implementation to set the column-specific null value, if required. This is not the default null value used by datapro4j (Class NullValue) for reading, writing or internally checking null values; the developer has to define its use in the code of the corresponding strategies.
Parameter:
• oNullValue The null value to be set.
Type void Visibility public Is Abstract false Parameter • inout oNullValue : Object
setValue
This method calls the implementation to set the value of an element in the column at a given position.
Parameters:
• oValue The value to be set.
• iIndex The element position in the column.
It returns the number of elements successfully set.
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
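As an example of how a strategy might use the column-specific values, the following sketch replaces every occurrence of a column's missing symbol with a fill value, using only getMissingValue, getSize, getElement and setValue. MissingAwareColumn, ListColumn and ReplaceColumnMissing are hypothetical; the interface merely mirrors the relevant ColumnAbstraction operations so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical subset of the ColumnAbstraction operations used by the strategy.
interface MissingAwareColumn {
    Object getMissingValue();
    void setMissingValue(Object oMissingValue);
    int getSize();
    Object getElement(int iPos);
    int setValue(int iIndex, Object oValue);
}

// Simple list-backed column used only to run the sketch.
class ListColumn implements MissingAwareColumn {
    private final List<Object> rgoValues = new ArrayList<>();
    private Object oMissingValue;

    ListColumn(List<Object> rgoInitial) { rgoValues.addAll(rgoInitial); }

    @Override public Object getMissingValue() { return oMissingValue; }
    @Override public void setMissingValue(Object oMissingValue) { this.oMissingValue = oMissingValue; }
    @Override public int getSize() { return rgoValues.size(); }
    @Override public Object getElement(int iPos) { return rgoValues.get(iPos); }
    @Override public int setValue(int iIndex, Object oValue) { rgoValues.set(iIndex, oValue); return 1; }
}

// Hypothetical strategy: replaces the column-specific missing symbol with oFill
// and returns the number of replaced elements.
final class ReplaceColumnMissing {
    private ReplaceColumnMissing() {}

    static int apply(MissingAwareColumn oColumn, Object oFill) {
        Object oMissing = oColumn.getMissingValue();
        if (oMissing == null) {
            return 0; // no column-specific missing value defined
        }
        int iReplaced = 0;
        for (int i = 0; i < oColumn.getSize(); i++) {
            if (Objects.equals(oColumn.getElement(i), oMissing)) {
                iReplaced += oColumn.setValue(i, oFill);
            }
        }
        return iReplaced;
    }
}
```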
Relation Detail
Association
Name Related Element • ColumnImpl
Name Related Element • ColumnType
Name rgoColumns Related Element • Dataset
Generalization
Name Related Element • CategoricalColumn
Name Related Element • NumericalColumn
Name Related Element • DateColumn
Name Related Element • BinaryColumn
Name Related Element • NominalColumn
Name Related Element • RangeColumn
Class ColumnImpl
This abstract class serves as a base for column implementation classes. These classes contain the actual code that accesses the data in the column, whereas only metainformation is managed by the corresponding abstraction.
Note: None of its methods should be invoked directly by any class other than its specific abstraction. Thus, for a given column type, the abstraction is unalterable, whereas the implementation can be adapted by the programmer.
Figure 22. Abstract class ColumnImpl
Name ColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnImpl Visibility public Abstract true Base Classifier Realized Interface
Attribute Detail
oEmptyValue
This object represents a column-specific empty value. Notice that this is not the standard empty value object, as used by datapro4j strategies and datasets.
Type Object Default Value null Visibility protected Multiplicity
oMissingValue
This object represents a column-specific missing value. Notice that this is not the standard missing value object, as used by datapro4j strategies and datasets.
Type Object Default Value null Visibility protected Multiplicity
oNullValue
This object represents a column-specific null value. Notice that this is not the standard null value object, as used by datapro4j strategies and datasets.
Type Object Default Value null Visibility protected Multiplicity
Operation Detail
The following methods implement their corresponding abstraction methods.
addAllValues
This method implements the method addAllValues of the column abstraction, returning the number of objects successfully added.
Parameter:
• rgoCol The list of item objects to be added to the column.
Type int Visibility public Is Abstract true Parameter • inout rgoCol : List<Object>
addValue
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameter:
• oValue The value to be added.
Type int Visibility public Is Abstract true Parameter • inout oValue : Object
addValue
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameters:
• oValue The value to be added.
• bForce If true, the implementation must force its addition.
Note: By default, bForce is ignored. A subclass implementing a specific column that needs to take bForce into account should explicitly override this method.
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
• in bForce : boolean
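The note above can be sketched as follows: the two-argument addValue is concrete in the base class and delegates to the one-argument version, and a subclass that needs bForce overrides it. BaseColumnImpl and CategoricalImplSketch are hypothetical, and the policy of rejecting unknown categories unless bForce is true (and then extending the category set) is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the note: addValue(oValue, bForce) is concrete
// and, by default, ignores bForce.
abstract class BaseColumnImpl {
    abstract int addValue(Object oValue);

    int addValue(Object oValue, boolean bForce) {
        return addValue(oValue); // default: bForce is not considered
    }
}

// Hypothetical categorical implementation that overrides the method to honour bForce.
class CategoricalImplSketch extends BaseColumnImpl {
    private final List<String> rgsCategories = new ArrayList<>();
    private final List<String> rgsValues = new ArrayList<>();

    CategoricalImplSketch(List<String> rgsCategories) {
        this.rgsCategories.addAll(rgsCategories);
    }

    @Override
    int addValue(Object oValue) {
        return addValue(oValue, false);
    }

    @Override
    int addValue(Object oValue, boolean bForce) {
        String sValue = String.valueOf(oValue);
        if (!rgsCategories.contains(sValue)) {
            if (!bForce) {
                return 0; // rejected by the (assumed) addition policy
            }
            rgsCategories.add(sValue); // forced: extend the category set
        }
        rgsValues.add(sValue);
        return 1;
    }

    int getSize() { return rgsValues.size(); }
}
```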
addValue
This method implements the method addValue of the column abstraction, returning the number of objects successfully added.
Parameters:
• oValue The value to be added.
• iIndex The position in the column where the value is added.
Type int Visibility public Is Abstract true Parameter • inout oValue : Object
• in iIndex : int
countEmptyValues
This method implements the method countEmptyValues of the column abstraction, returning the number of empty values contained in the column. It returns -1 if this value cannot be calculated.
Type int Visibility public Is Abstract false Parameter
countInvalidValues
This method implements the method countInvalidValues of the column abstraction, returning the number of invalid values (null, empty and missing values) contained in the column. It returns -1 if this value cannot be calculated.
Type int Visibility public Is Abstract false Parameter
countMissingValues
This method implements the method countMissingValues of the column abstraction, returning the number of missing values contained in the column. It returns -1 if this value cannot be calculated.
Type int Visibility public Is Abstract false Parameter
countNullValues
This method implements the method countNullValues of the column abstraction, returning the number of null values contained in the column. It returns -1 if this value cannot be calculated.
Type int Visibility public Is Abstract false Parameter
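The counting methods could share logic like the following sketch. InvalidCounter is hypothetical. It assumes that a count cannot be calculated (and -1 is returned) when no values are available or the relevant column-specific value is undefined, and it computes the invalid count as the sum of the empty, null and missing counts, which only holds when the three sentinel objects differ.

```java
import java.util.List;

// Hypothetical counting logic for a column implementation. The sentinel objects
// stand in for the column-specific empty, null and missing values.
final class InvalidCounter {
    private InvalidCounter() {}

    // Counts occurrences of oSentinel; returns -1 when the count cannot be computed
    // (assumption: no values available or no sentinel defined).
    static int count(List<Object> rgoValues, Object oSentinel) {
        if (rgoValues == null || oSentinel == null) {
            return -1;
        }
        int iCount = 0;
        for (Object oValue : rgoValues) {
            if (oSentinel.equals(oValue)) {
                iCount++;
            }
        }
        return iCount;
    }

    // Invalid values = empty + null + missing; if any partial count is unavailable,
    // the total is reported as unavailable as well.
    static int countInvalid(List<Object> rgoValues, Object oEmpty, Object oNull, Object oMissing) {
        int iEmpty = count(rgoValues, oEmpty);
        int iNull = count(rgoValues, oNull);
        int iMissing = count(rgoValues, oMissing);
        if (iEmpty < 0 || iNull < 0 || iMissing < 0) {
            return -1;
        }
        return iEmpty + iNull + iMissing;
    }
}
```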
getElement
This method implements the method getElement of the column abstraction, returning the element at the given position.
Parameter:
• iPos The position of the element to be returned.
Type Object Visibility public Is Abstract true Parameter • in iPos : int
getEmptyValue
This method implements the method getEmptyValue of the column abstraction, returning the element representing the column-specific empty value.
Type Object Visibility public Is Abstract false Parameter
getMissingValue
This method implements the method getMissingValue of the column abstraction, returning the element representing the column-specific missing value.
Type Object Visibility public Is Abstract false Parameter
getNullValue
This method implements the method getNullValue of the column abstraction, returning the element representing the column-specific null value.
Type Object Visibility public Is Abstract false Parameter
getSize
This method implements the method getSize of the column abstraction, returning the number of elements contained in the column.
Type int Visibility public Is Abstract true Parameter
getValues
This method implements the method getValues of the column abstraction, returning the list of elements (as instances of Object) contained in the column.
Type List<Object> Visibility public Is Abstract true Parameter
removeValue
This method implements the method removeValue of the column abstraction.
Parameter:
• iIndex The position of the element to be removed.
Type void Visibility public Is Abstract true Parameter • in iIndex : int
setEmptyValue
This method implements the method setEmptyValue of the column abstraction, setting the element representing the column-specific empty value.
Parameter:
• oEmptyValue The object representing a specific empty value in this column.
Type void Visibility public Is Abstract false Parameter • inout oEmptyValue : Object
setMissingValue
This method implements the method setMissingValue of the column abstraction, setting the element representing the column-specific missing value.
Parameter:
• oMissingValue The object representing a specific missing value in this column.
Type void Visibility public Is Abstract false Parameter • inout oMissingValue : Object
setNullValue
This method implements the method setNullValue of the column abstraction, setting the element representing the column-specific null value.
Parameter:
• oNullValue The object representing a specific null value in this column.
Type void Visibility public Is Abstract false Parameter • inout oNullValue : Object
setValue
This method implements the method setValue of the column abstraction, setting the element value at the given position.
Parameters:
• oValue The object value to set. • iIndex The position index in the column.
Type int Visibility public Is Abstract true Parameter • in iIndex : int
• inout oValue : Object
Relation Detail
Association
Name Related Element • ColumnAbstraction
Generalization
Name Related Element • RangeColumnImpl
Name Related Element • NominalColumnImpl
Name Related Element • NumericalColumnImpl
Name Related Element • DateColumnImpl
Name Related Element • CategoricalColumnImpl
Name Related Element • BinaryColumnImpl
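The empty, missing and null values handled by the methods above are column-specific elements that can be configured through the corresponding setters. The following minimal sketch models that idea with a plain list so that the counting methods listed for the implementation classes can be illustrated. It is a hypothetical model, not datapro4j code: the class name, the default markers ("" for empty, "?" for missing, Java null for null) and the counting rule (equality with the marker) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical model of a column with column-specific marker values.
// Names and default markers are illustrative assumptions, not library code.
class ColumnMarkersSketch {
    final List<Object> values = new ArrayList<>();

    // Assumed defaults; the setters below replace them.
    Object oEmptyValue = "";
    Object oMissingValue = "?";
    Object oNullValue = null;

    void setEmptyValue(Object o)   { oEmptyValue = o; }
    void setMissingValue(Object o) { oMissingValue = o; }
    void setNullValue(Object o)    { oNullValue = o; }

    // Counts the elements equal to a marker; Objects.equals also handles null markers.
    private int count(Object marker) {
        int n = 0;
        for (Object v : values) {
            if (Objects.equals(v, marker)) n++;
        }
        return n;
    }

    int countEmptyValues()   { return count(oEmptyValue); }
    int countMissingValues() { return count(oMissingValue); }
    int countNullValues()    { return count(oNullValue); }
}
```

Because the markers are plain fields, changing one (for example, setMissingValue("NA")) immediately changes what the corresponding count reports.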
Enumeration ColumnType This enumeration contains the different types of columns supported by datapro4j. The following types are currently supported:
• Binary • Categorical • Date • Integer • Nominal • Numerical • Range
Note: If the programmer wants to check the column type, the following code should be used (e.g. for binary columns):
    ColumnAbstraction oCol;
    …
    if (oCol.getType().equals(ColumnType.Binary)) {
        …
    }
Figure 23. Enumeration ColumnType
Name ColumnType Qualified Name es::uco::kdis::datapro::dataset::Column::ColumnType Visibility public Abstract false Base Classifier Realized Interface
Attribute Detail
Binary
Boolean attribute
Type Default Value Visibility public Multiplicity
Categorical
Categorical attribute
Type Default Value Visibility public Multiplicity
Date
Date attribute
Type Default Value Visibility public Multiplicity
Integer
Integer attribute
Type Default Value Visibility public Multiplicity
Nominal
Nominal attribute
Type Default Value Visibility public Multiplicity
Numerical
Numerical attribute
Type Default Value Visibility public Multiplicity
Range
Range attribute
Type Default Value Visibility public Multiplicity
Relation Detail
Association
Name Related Element • ColumnAbstraction
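Since ColumnType is a Java enumeration, the type check given in the note on ColumnType can also be written with == or dispatched with a switch. In the sketch below, ColumnType is a local stand-in declared only so that the example compiles without datapro4j on the classpath; the class ColumnTypeCheckSketch and its descriptions are hypothetical, summarizing the storage notes given for the implementation classes in this guide.

```java
// Local stand-in for datapro4j's ColumnType, declared only so this sketch
// compiles on its own; the constants mirror the list given in the guide.
enum ColumnType { Binary, Categorical, Date, Integer, Nominal, Numerical, Range }

class ColumnTypeCheckSketch {
    // Enum constants are singletons, so == is equivalent to equals() here and,
    // unlike type.equals(...), it does not throw when type is null.
    static boolean isBinary(ColumnType type) {
        return type == ColumnType.Binary;
    }

    // A switch keeps per-type handling in one place (descriptions are illustrative).
    static String describe(ColumnType type) {
        switch (type) {
            case Binary:      return "Boolean values";
            case Categorical: return "category indexes";
            case Date:        return "Date values";
            default:          return "other values";
        }
    }
}
```

The == form is a design choice rather than a requirement: it behaves like equals() for enum constants but avoids a NullPointerException if getType() ever returns null.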
Class BinaryColumn This class represents the abstraction of a binary column. It defines the methods that provide operations specific to binary data.
Figure 24. Class BinaryColumn
Name BinaryColumn Qualified Name es::uco::kdis::datapro::dataset::Column::BinaryColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Operation Detail
BinaryColumn
Default constructor. The implementation BinaryColumnImpl is invoked.
Type Visibility public Is Abstract false Parameter
BinaryColumn
Constructor with the name of the column as a parameter. The implementation BinaryColumnImpl is invoked.
Parameter:
• sName The name of the column.
Type Visibility public Is Abstract false Parameter • inout sName : String
toCategorical
This method calls the implementation to return a categorical column generated from the binary column. The resulting categorical column defines two categories, one for each binary value (false, true).
Parameters:
• sFalseCategory The category representing the false binary value. • sTrueCategory The category representing the true binary value.
Notes:
• Empty and missing values are treated as false values. • Null values remain null values.
Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sFalseCategory : String
• inout sTrueCategory : String
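The conversion rules in the notes of toCategorical can be sketched as a plain function over a list of values. This is an illustrative model, not the library's code: the class and method names are hypothetical, the emptyValue, missingValue and nullValue parameters stand in for the column-specific values returned by getEmptyValue, getMissingValue and getNullValue, and mapping any other non-true value to the false category is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class BinaryToCategoricalSketch {
    // Hypothetical helper mirroring the notes of BinaryColumn.toCategorical:
    // empty and missing values map to the false category, null values stay null.
    static List<String> toCategories(List<Object> values,
                                     Object emptyValue, Object missingValue, Object nullValue,
                                     String sFalseCategory, String sTrueCategory) {
        List<String> result = new ArrayList<>(values.size());
        for (Object v : values) {
            if (v == null || Objects.equals(v, nullValue)) {
                result.add(null);                        // null stays null
            } else if (Objects.equals(v, emptyValue) || Objects.equals(v, missingValue)) {
                result.add(sFalseCategory);              // empty/missing count as false
            } else {
                // Boolean.TRUE maps to the true category; anything else (assumption) to false.
                result.add(Boolean.TRUE.equals(v) ? sTrueCategory : sFalseCategory);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> column = Arrays.asList(true, false, "?", null);
        // Prints [yes, no, no, null]
        System.out.println(toCategories(column, "", "?", null, "no", "yes"));
    }
}
```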
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class BinaryColumnImpl
This class provides the implementation code that accesses the actual data in a binary column. Binary values are stored as objects of class Boolean.
Note: None of its methods should be invoked directly; they should only be invoked from its abstraction (BinaryColumn).
Figure 25. Class BinaryColumnImpl
Name BinaryColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::BinaryColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail All attributes are private.
Operation Detail
For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.
addAllValues
Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
BinaryColumnImpl
Default constructor.
Type Visibility public Is Abstract false Parameter
countEmptyValues
Type int Visibility public Is Abstract false Parameter
countInvalidValues
Type int Visibility public Is Abstract false Parameter
countMissingValues
Type int Visibility public Is Abstract false Parameter
countNullValues
Type int Visibility public Is Abstract false Parameter
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
toCategorical
This method implements the method toCategorical of the binary column abstraction, converting the binary column into a categorical column.
Parameters:
• sName The name of the resulting categorical column. By default, the abstraction sets it to the current name of the binary column.
• sFalseCategory The category representing the false binary value. • sTrueCategory The category representing the true binary value.
Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String
• inout sFalseCategory : String • inout sTrueCategory : String
Relation Detail
Generalization
Name Related Element • ColumnImpl
Class CategoricalColumn
This class defines the abstraction of a categorical column, where every value belongs to a predefined category. It defines the methods that provide operations specific to categorical data.
Figure 26. Class CategoricalColumn
Name CategoricalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::CategoricalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Operation Detail
addCategory
This method calls the implementation to add a new category to the set of allowable values. Categories are stored as String objects.
Parameter:
• szCategory The new category to be added to the column
Type void Visibility public Is Abstract false Parameter • inout szCategory : String
CategoricalColumn
Constructor with the name of the column as a parameter. The implementation CategoricalColumnImpl is invoked.
Parameter:
• sName The name of the column
Type Visibility public Is Abstract false Parameter • inout sName : String
CategoricalColumn
Default constructor. The implementation CategoricalColumnImpl is invoked.
Type Visibility public Is Abstract false Parameter
getCategoryIndex
This method calls the implementation to return the index of a given category string in the list of categories. If the category is not found, -1 is returned.
Parameter:
• szCategory The string representing the category to be searched in the list of categories
Type int Visibility public Is Abstract false Parameter • inout szCategory : String
getCategoryList
This method calls the implementation to return the list of categories in the column.
Type List<Object> Visibility public Is Abstract false Parameter
getCategoryName
This method calls the implementation to return the category string stored at a given position in the list of categories. If the given index is not valid, null is returned.
Parameter:
• iIndex The index of the requested category
Type String Visibility public Is Abstract false Parameter • inout iIndex : Integer
getElementIndex
This method calls the implementation to return the category index of the element stored at a given position in the column. Whereas the default method getElement (inherited from ColumnAbstraction) returns the category by name, this method returns its index. If the value is invalid, -1 is returned.
Parameter:
• iPos The index of the item in the column
Exceptions:
• IndexOutOfBoundsException
Type Integer Visibility public Is Abstract false Parameter • in iPos : int
replaceCategory
This method calls the implementation to replace a given category with a new one.
Parameters:
• szOldCategory The category string to be replaced • szNewCategory The new category string to be set • bJoinCategory If the new category string already exists, this parameter determines whether the values of the old category are merged with the values of the existing category
The method returns 1 if the category is successfully replaced, or 0 otherwise.
Type int Visibility public Is Abstract false Parameter • in bJoinCategory : boolean
• inout szNewCategory : String • inout szOldCategory : String
toBinary
This method calls the implementation to return a binary column generated from the categorical column. Invalid values remain unaltered.
Parameter:
• aReferenceTrueValues The list of category strings to be considered true values
Type BinaryColumn Visibility public Is Abstract false Parameter • inout aReferenceTrueValues : List<String>
toNominal
This method calls the implementation to return a nominal column generated from the strings stored in the categorical column. Nominal values are extracted from the strings representing each category.
Type NominalColumn Visibility public Is Abstract false Parameter
toNumerical
This method calls the implementation to return an integer column generated from the index values assigned to the categories in the source column.
Type IntegerColumn Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class CategoricalColumnImpl This class provides the implementation code that accesses the actual data in a categorical column. Categories are stored in a HashMap associating each category identifier (String) with an index (Integer). Thus, internally, the column data are stored as an ArrayList of Integer indexes, whereas the corresponding categories are kept as String identifiers.
This class should never be invoked directly, except through its abstraction (CategoricalColumn).
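The storage scheme described above can be modeled in a few lines. The sketch below is a simplified, hypothetical model rather than the class's actual source: the counter used to assign indexes, the boolean result of addValue and the linear search in getCategoryName are assumptions, while the -1 and null return conventions follow the method descriptions in this section.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a categorical column: identifiers are associated with
// indexes, and the column data are stored as indexes only.
class CategoricalStorageSketch {
    private final Map<String, Integer> categories = new HashMap<>();
    private final List<Integer> values = new ArrayList<>();
    private int nextIndex = 0; // assumption: indexes come from a counter, never reused

    // Returns the new category's index, or -1 if it already exists.
    int addCategory(String sCat) {
        if (sCat == null || categories.containsKey(sCat)) return -1;
        categories.put(sCat, nextIndex);
        return nextIndex++;
    }

    // Returns the index of a category identifier, or -1 if it is unknown.
    int getCategoryIndex(String sCategory) {
        Integer idx = categories.get(sCategory);
        return idx == null ? -1 : idx;
    }

    // Returns the identifier for an index, or null if no category has it.
    String getCategoryName(int iIndex) {
        for (Map.Entry<String, Integer> e : categories.entrySet()) {
            if (e.getValue() == iIndex) return e.getKey();
        }
        return null;
    }

    // Only values of existing categories are stored (as their index).
    boolean addValue(String sCategory) {
        int idx = getCategoryIndex(sCategory);
        if (idx < 0) return false;
        values.add(idx);
        return true;
    }

    // getElement returns the category by name, getElementIndex by index;
    // both throw IndexOutOfBoundsException for an invalid position.
    String getElement(int iPos)       { return getCategoryName(values.get(iPos)); }
    Integer getElementIndex(int iPos) { return values.get(iPos); }
}
```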
Figure 27. Class CategoricalColumnImpl
Name CategoricalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::CategoricalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail All attributes are private.
Operation Detail For a more complete specification of the methods inherited from ColumnImpl, see its specification above. Note that values can be added either as a String (the category identifier) or as an Integer (the category index); see methods addValue and addAllValues. In both cases, only elements belonging to valid categories are added to the set of values in the column.
addCategory
This method implements the functionality of addCategory in the categorical column abstraction, adding a new category to the column. The category must not already exist. It returns the index of the new category if it is successfully created, or -1 if the category cannot be added.
Parameter:
• sCat The identifier of the new category
Type int Visibility public Is Abstract false Parameter • inout sCat : String
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in bForce : boolean
• inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
CategoricalColumnImpl
Default constructor.
Type Visibility public Is Abstract false Parameter
countEmptyValues
Type int Visibility public Is Abstract false Parameter
countInvalidValues
Type int Visibility public Is Abstract false Parameter
countMissingValues
Type int Visibility public Is Abstract false Parameter
countNullValues
Type int Visibility public Is Abstract false Parameter
getCategoryIndex
This method implements the functionality of getCategoryIndex in the column abstraction, returning the index of the category passed as String, or -1 if the category does not exist in the list of categories of the column.
Parameter:
• sCategory The category identifier
Type int Visibility public Is Abstract false Parameter • inout sCategory : String
getCategoryList
This method implements the functionality of getCategoryList in the column abstraction, returning the list of category identifiers defined in the column. The resulting list is not sorted.
Type List<Object> Visibility public Is Abstract false Parameter
getCategoryName
This method implements the functionality of getCategoryName in the column abstraction, returning the identifier of the category whose index is passed as parameter. If the category does not exist, then null is returned.
Parameter:
• iIndex The category index
Type String Visibility public Is Abstract false Parameter • inout iIndex : Integer
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getElementIndex
This method implements the functionality of getElementIndex in the column abstraction, returning the category index stored at a given position. Note that category indexes are not necessarily sorted or consecutive, since categories may be successively created and deleted, causing gaps in the index sequence. Always treat category indexes as numerical identifiers, never as sequential positions.
This method returns -1 if the position given is invalid.
Parameter:
• iPos The position of the element in the column.
Exceptions:
• IndexOutOfBoundsException
Type Integer Visibility public Is Abstract false Parameter • in iPos : int
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
replaceCategory
This method implements the functionality of replaceCategory in the column abstraction, updating both the category list and the values in the column. It returns 1 if the replacement is done, or 0 otherwise.
Parameters:
• sOldCategory The old category identifier to be replaced • sNewCategory The new category identifier • bJoinCategory If true and the new category identifier already exists in the column, the values with the old category identifier are joined to the existing category, so that only one category remains
Type int Visibility public Is Abstract false Parameter • in bJoinCategory : boolean
• inout sNewCategory : String • inout sOldCategory : String
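The effect of bJoinCategory, and the index gaps mentioned under getElementIndex, can be seen in a small model. This is a hypothetical sketch that operates directly on a category map and an index list. Two behaviors are assumptions not stated in the guide: a plain rename keeps the old index, and when the new identifier already exists and bJoinCategory is false, nothing changes and 0 is returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of replaceCategory over a category map and an index list.
class ReplaceCategorySketch {
    final Map<String, Integer> categories = new HashMap<>();
    final List<Integer> values = new ArrayList<>();

    // Returns 1 if the category was replaced, 0 otherwise.
    int replaceCategory(String sOldCategory, String sNewCategory, boolean bJoinCategory) {
        Integer oldIdx = categories.get(sOldCategory);
        if (oldIdx == null) return 0;                 // nothing to replace
        Integer newIdx = categories.get(sNewCategory);
        if (newIdx == null) {                         // plain rename: the index is kept
            categories.remove(sOldCategory);
            categories.put(sNewCategory, oldIdx);
            return 1;
        }
        if (newIdx.equals(oldIdx)) return 1;          // same category: nothing to change
        if (!bJoinCategory) return 0;                 // assumption: refuse to merge
        // Join: relabel the old category's values, then drop the old entry.
        // Its index disappears, leaving a gap in the index sequence.
        for (int i = 0; i < values.size(); i++) {
            if (oldIdx.equals(values.get(i))) values.set(i, newIdx);
        }
        categories.remove(sOldCategory);
        return 1;
    }
}
```

After a join, the surviving indexes are no longer consecutive, which is why category indexes should be treated as identifiers rather than positions.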
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
toBinary
This method implements the functionality of toBinary in the column abstraction, returning a binary column constructed from the data contained in the categorical column. The list of category identifiers to be considered true values in the binary column is passed as a parameter; the remaining category identifiers are considered false values. Note that invalid values are preserved unaltered.
Parameters:
• aReferenceTrueValues The list of categories representing true values • sName The name of the new binary column
Type BinaryColumn Visibility public Is Abstract false Parameter • inout aReferenceTrueValues : List<String>
• inout sName : String
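The toBinary rule (listed categories become true, other valid categories become false, invalid values are left unaltered) can be sketched over a list of category identifiers. The class, the method and the validCategories parameter are hypothetical; in the library, the set of valid categories comes from the column itself, and treating null as an invalid value is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CategoricalToBinarySketch {
    // Hypothetical helper: converts category identifiers to Boolean values.
    static List<Object> toBinary(List<String> elements, Set<String> validCategories,
                                 List<String> aReferenceTrueValues) {
        List<Object> out = new ArrayList<>(elements.size());
        for (String e : elements) {
            if (e == null || !validCategories.contains(e)) {
                out.add(e);                                // invalid value: left unaltered
            } else {
                out.add(aReferenceTrueValues.contains(e)); // autoboxed to Boolean
            }
        }
        return out;
    }
}
```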
toNominal
This method implements the functionality of toNominal in the column abstraction, returning a nominal column constructed from the data contained in the categorical column. Strings for the nominal column are constructed from the category identifiers.
Parameter:
• sName The name of the new nominal column
Type NominalColumn Visibility public Is Abstract false Parameter • inout sName : String
toNumerical
This method implements the functionality of toNumerical in the column abstraction, returning an integer column constructed from the data contained in the categorical column. Numbers of the integer column are extracted from the category indexes.
Parameter:
• sName The name of the new integer column
Type IntegerColumn Visibility public Is Abstract false Parameter • inout sName : String
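Because the data of a categorical column are stored as category indexes, toNumerical and toNominal amount to reading either the index itself or the identifier associated with it. A minimal sketch, assuming an index-to-identifier map and a list of indexes as inputs (both hypothetical), and assuming that a null entry marks an invalid value and is kept as null:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CategoricalConversionsSketch {
    // toNumerical: the integer column holds the category indexes themselves.
    static List<Integer> toNumerical(List<Integer> indexes) {
        return new ArrayList<>(indexes);
    }

    // toNominal: each index is replaced by its category identifier.
    static List<String> toNominal(List<Integer> indexes, Map<Integer, String> names) {
        List<String> out = new ArrayList<>(indexes.size());
        for (Integer idx : indexes) {
            out.add(idx == null ? null : names.get(idx));
        }
        return out;
    }
}
```

Note that the resulting integer values may be non-consecutive (for instance 0 and 2), reflecting the gaps that category indexes can have.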
Relation Detail
Generalization
Name Related Element • RangeColumnImpl
Name Related Element • ColumnImpl
Class DateColumn This class represents the abstraction of a column of date values. This type of column is specifically required to support ARFF datasets.
Figure 28. Class DateColumn
Name DateColumn Qualified Name es::uco::kdis::datapro::dataset::Column::DateColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Operation Detail
addDateSpecification
This method calls the implementation to set the date format specification of the values in the column.
Parameter:
• oDate The format specification of the values in the date column
Type void Visibility public Is Abstract false Parameter • inout oDate : SimpleDateFormat
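A java.text.SimpleDateFormat object both parses text into Date objects and formats them back, so it can carry the date specification passed as oDate. The sketch below shows how such a specification might be applied to raw strings. The class and the helper are hypothetical, and the policy of mapping unparseable text to null is an assumption, not behavior documented for DateColumn.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class DateSpecificationSketch {
    // Hypothetical helper: parses each string with the given date specification.
    static List<Date> parseAll(List<String> raw, SimpleDateFormat oDate) {
        // Work on a copy so that the caller's format is not modified.
        SimpleDateFormat fmt = (SimpleDateFormat) oDate.clone();
        fmt.setLenient(false); // reject out-of-range fields such as month 13
        List<Date> out = new ArrayList<>(raw.size());
        for (String s : raw) {
            try {
                out.add(fmt.parse(s));
            } catch (ParseException e) {
                out.add(null);  // assumption: unparseable text becomes null
            }
        }
        return out;
    }
}
```

Disabling leniency matters here: a lenient SimpleDateFormat would silently roll "2012-13-01" over into January 2013 instead of rejecting it.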
DateColumn
Default constructor with no parameters. The implementation DateColumnImpl is invoked.
Type Visibility public Is Abstract false Parameter
DateColumn
Constructor with the name of the column as a parameter. The implementation DateColumnImpl is invoked.
Parameter:
• sName The name of the column
Type Visibility public Is Abstract false Parameter • inout sName : String
getDateSpecification
This method calls the implementation to get the date format specification of the values in the column.
Type SimpleDateFormat Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class DateColumnImpl This class provides the implementation code that accesses the actual data in a date column. Values are stored as Date objects according to the format specified by a given SimpleDateFormat object. This class should not be invoked directly, only through the column abstraction.
Figure 29. Class DateColumnImpl
Name DateColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::DateColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail
All attributes are private.
Operation Detail For a more complete specification of the methods inherited from ColumnImpl, see its specifications above.
addAllValues
Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>
addDateSpecification
This method implements the method addDateSpecification of the date column abstraction, setting the date format specification of the values in the column.
Parameter:
• oDate The format specification of the values in the date column
Type void Visibility public Is Abstract false Parameter • inout oDate : SimpleDateFormat
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in bForce : boolean
• inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
DateColumnImpl
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
getDateSpecification
This method implements the method getDateSpecification of the date column abstraction, returning the date format specification of the values in the column.
Type SimpleDateFormat Visibility public Is Abstract false Parameter
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
Relation Detail
Generalization
Name Related Element • ColumnImpl
Class IntegerColumn This class represents the abstraction of an integer column. Integer columns are a specialization of numerical (real) columns.
Figure 30. Class IntegerColumn
Name IntegerColumn Qualified Name es::uco::kdis::datapro::dataset::Column::IntegerColumn Visibility public Abstract false Base Classifier • NumericalColumn Realized Interface
Operation Detail Many methods are specializations of their respective methods in the numerical column (NumericalColumn), adapted to the domain of integer values.
getiMaxInterval
Analogously to getdMaxInterval in the NumericalColumn abstraction class, this method gets the maximum integer value allowed for this column.
Type Integer Visibility public Is Abstract false Parameter
getiMinInterval
Analogously to getdMinInterval in the NumericalColumn abstraction class, this method gets the minimum integer value allowed for this column.
Type Integer Visibility public Is Abstract false Parameter
getMaxValue
For further information, see getMaxValue in the specification of the NumericalColumn abstraction class.
Type double Visibility public Is Abstract false Parameter
getMinValue
For further information, see getMinValue in the specification of the NumericalColumn abstraction class.
Type double Visibility public Is Abstract false Parameter
IntegerColumn
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
IntegerColumn
Constructor with the name of the column as a parameter.
Parameter:
• sName The name of the column
Type Visibility public Is Abstract false Parameter • inout sName : String
mean
For further information, see mean in the specification of the NumericalColumn abstraction class.
Type double Visibility public Is Abstract false Parameter
setiMaxInterval
Analogously to setdMaxInterval in the NumericalColumn abstraction class, this method sets the maximum integer value allowed for this column.
Parameter:
• iMaxInterval The maximum value allowed in the column
Exceptions:
• IllegalAccessException if the value cannot be set.
Type void Visibility public Is Abstract false Parameter • inout iMaxInterval : Integer
setiMinInterval
Analogously to setdMinInterval in the NumericalColumn abstraction class, this method sets the minimum integer value allowed for this column.
Parameter:
• iMinInterval The minimum value allowed in the column
Exceptions:
• IllegalAccessException if the value cannot be set.
Type void Visibility public Is Abstract false Parameter • inout iMinInterval : Integer
standardDeviation
For further information, see standardDeviation in the specification of the NumericalColumn abstraction class.
Type double Visibility public Is Abstract false Parameter
toCategorical
This method calls the implementation to return a categorical column using the values contained in the integer column, where each different value constitutes a different category.
Type CategoricalColumn Visibility public Is Abstract false Parameter
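The category derivation performed by toCategorical can be pictured with the plain-Java sketch below. CategorySketch and distinctCategories are hypothetical names, not datapro4j API. The sketch assumes that missing values are represented as null and skipped, and that categories keep the order in which they first appear. This guide states neither.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of datapro4j: derives the category set of a column.
final class CategorySketch {

    // Returns each distinct non-null value once, in order of first appearance.
    // Skipping nulls (treated here as missing values) is an assumption of this sketch.
    static <T> List<T> distinctCategories(List<T> values) {
        Set<T> seen = new LinkedHashSet<>();
        for (T v : values) {
            if (v != null) {
                seen.add(v);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Integer> column = List.of(3, 1, 3, 2, 1);
        System.out.println(distinctCategories(column)); // [3, 1, 2]
    }
}
```

Because the helper is generic, the same idea covers the string categories that NominalColumn.toCategorical produces.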
toNumerical
This method calls the implementation to return a numerical column using the values contained in the integer column, where each integer value is cast to a double value.
Type NumericalColumn Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • NumericalColumn
Class IntegerColumnImpl This class provides the implementation code accessing real data in an integer column. This class is a specialization of the numerical column implementation (NumericalColumnImpl). Integer values are stored as objects of class Integer. This class and its methods should not be used directly.
Figure 31. Class IntegerColumnImpl
Name IntegerColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::IntegerColumnImpl Visibility public Abstract false Base Classifier • NumericalColumnImpl Realized Interface
Operation Detail For further information, see a complete specification of these methods in NumericalColumnImpl and ColumnImpl.
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
getMaxValue
Type double Visibility public Is Abstract false Parameter
getMinValue
Type double Visibility public Is Abstract false Parameter
IntegerColumnImpl
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
mean
Type double Visibility public Is Abstract false Parameter
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
standardDeviation
Type double Visibility public Is Abstract false Parameter
toCategorical
This method implements the method toCategorical of the abstraction, returning a categorical column using the values contained in the integer column, where each different value constitutes a different category.
Parameter:
• sName The name of the resulting column
Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String
toNumerical
This method implements the method toNumerical of the abstraction, returning a numerical column using the values contained in the integer column, where each integer value is cast to a double value.
Parameter:
• sName The name of the resulting column
Type NumericalColumn Visibility public Is Abstract false Parameter • inout sName : String
Relation Detail
Generalization
Name Related Element • NumericalColumnImpl
Class NominalColumn This class represents the abstraction of a nominal column containing free-form strings as values. It defines the methods that provide operations specific to nominal values.
Figure 32. Class NominalColumn
Name NominalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::NominalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Operation Detail
NominalColumn
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
NominalColumn
Constructor with the name of the column as parameter.
Parameter:
• sName Name of the column
Type Visibility public Is Abstract false Parameter • inout sName : String
toCategorical
This method calls the implementation to return a categorical column, where each different string is a category (no repetition).
Type CategoricalColumn Visibility public Is Abstract false Parameter
toNumerical
This method calls the implementation to return a numerical column, where each string is parsed to a numerical value. If a string cannot be parsed, an empty value is added instead. Note that the lower and upper interval bounds are not set for the returned numerical column.
Type NumericalColumn Visibility public Is Abstract false Parameter
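A minimal sketch of the parsing step described for toNumerical follows. It assumes that null stands in for the "empty value", because this guide does not say how empty values are represented. NominalToNumericalSketch and parseAll are hypothetical names, not part of datapro4j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the datapro4j implementation.
final class NominalToNumericalSketch {

    // Parses each string to a double. Strings that cannot be parsed, and null
    // entries, become null, which stands in for an "empty value" in this sketch.
    static List<Double> parseAll(List<String> values) {
        List<Double> result = new ArrayList<>(values.size());
        for (String s : values) {
            Double parsed = null;
            if (s != null) {
                try {
                    // Double.parseDouble ignores leading and trailing whitespace.
                    parsed = Double.parseDouble(s);
                } catch (NumberFormatException e) {
                    // Leave as null: the value could not be parsed.
                }
            }
            result.add(parsed);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of("1.5", "abc", " 42 "))); // [1.5, null, 42.0]
    }
}
```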
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class NominalColumnImpl This class provides the implementation code accessing real data in the nominal column. Nominal values are stored as String objects. Note that these methods should not be invoked directly.
Figure 33. Class NominalColumnImpl
Name NominalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::NominalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail All attributes are private.
Operation Detail For a more detailed specification of the methods inherited from ColumnImpl, see its specification above.
addAllValues
Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
countEmptyValues
Type int Visibility public Is Abstract false Parameter
countInvalidValues
Type int Visibility public Is Abstract false Parameter
countMissingValues
Type int Visibility public Is Abstract false Parameter
countNullValues
Type int Visibility public Is Abstract false Parameter
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
NominalColumnImpl
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
toCategorical
This method implements the method toCategorical of the abstraction, returning a categorical column, where each different string is a category (no repetition).
Parameter:
• sName The name of the column to be created
Type CategoricalColumn Visibility public Is Abstract false Parameter • inout sName : String
toNumerical
This method implements the method toNumerical of the abstraction, returning a numerical column, where each string is parsed to a numerical value. If a string cannot be parsed, an empty value is added instead. Note that the lower and upper interval bounds are not set for the returned numerical column.
Parameter:
• sName The name of the column
Type NumericalColumn Visibility public Is Abstract false Parameter • inout sName : String
Relation Detail
Generalization
Name Related Element • ColumnImpl
Class NumericalColumn This class represents the abstraction of a numerical (real) column.
Figure 34. Class NumericalColumn
Name NumericalColumn Qualified Name es::uco::kdis::datapro::dataset::Column::NumericalColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Attribute Detail
dMaxInterval
This attribute indicates the maximum value allowed in the column. This property should be accessed using getter/setter methods.
Type Double Default Value Double.MAX_VALUE Visibility protected Multiplicity
dMinInterval
This attribute indicates the minimum value allowed in the column. This property should be accessed using getter/setter methods.
Type Double Default Value Double.MIN_VALUE Visibility protected Multiplicity
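These defaults are easy to misread. In Java, Double.MIN_VALUE is the smallest positive double (about 4.9E-324), not the most negative one. The check below shows what the documented defaults imply, assuming that a value x is accepted when dMinInterval <= x <= dMaxInterval. This guide does not state how the bounds are enforced, so treat that rule as an assumption. IntervalDefaultsSketch and inInterval are hypothetical names.

```java
// Hypothetical check, not part of datapro4j.
final class IntervalDefaultsSketch {

    // Documented defaults of NumericalColumn.dMinInterval and dMaxInterval.
    static final double DEFAULT_MIN = Double.MIN_VALUE; // smallest POSITIVE double, about 4.9E-324
    static final double DEFAULT_MAX = Double.MAX_VALUE;

    // Assumed semantics: closed interval [min, max].
    static boolean inInterval(double x, double min, double max) {
        return x >= min && x <= max;
    }

    public static void main(String[] args) {
        System.out.println(inInterval(1.0, DEFAULT_MIN, DEFAULT_MAX));  // true
        System.out.println(inInterval(0.0, DEFAULT_MIN, DEFAULT_MAX));  // false
        System.out.println(inInterval(-5.0, DEFAULT_MIN, DEFAULT_MAX)); // false
        // A lower bound covering every finite double would be -Double.MAX_VALUE.
        System.out.println(inInterval(-5.0, -Double.MAX_VALUE, DEFAULT_MAX)); // true
    }
}
```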
Operation Detail
getdMaxInterval
This method returns the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Type Double Visibility public Is Abstract false Parameter
getdMinInterval
This method returns the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Type Double Visibility public Is Abstract false Parameter
getMaxValue
This method calls the implementation to get the maximum existing value in the column data.
Type double Visibility public Is Abstract false Parameter
getMinValue
This method calls the implementation to get the minimum existing value in the column data.
Type double Visibility public Is Abstract false Parameter
mean
This method calls the implementation to get the mean value of the column data.
Type double Visibility public Is Abstract false Parameter
normalize
This method calls the implementation to normalize the set of values in the numerical column.
Type void Visibility public Is Abstract false Parameter
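This guide does not state which normalization normalize applies. The sketch below assumes min-max normalization to [0, 1], which is one common choice. MinMaxSketch is a hypothetical name. Mapping a constant column to 0.0 is also an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch assuming min-max normalization; not the datapro4j implementation.
final class MinMaxSketch {

    // Rescales values linearly to [0, 1] using (x - min) / (max - min).
    // If all values are equal (max == min), they are mapped to 0.0 to avoid dividing by zero.
    static double[] normalize(double[] values) {
        if (values.length == 0) {
            return new double[0];
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {2.0, 4.0, 6.0}))); // [0.0, 0.5, 1.0]
    }
}
```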
NumericalColumn
Default constructor with no parameters. The implementation NumericalColumnImpl is invoked.
Type Visibility public Is Abstract false Parameter
NumericalColumn
Constructor with the name of the column as a parameter. The implementation NumericalColumnImpl is invoked.
Parameter:
• sName The name of the column
Type Visibility public Is Abstract false Parameter • inout sName : String
setdMaxInterval
This method sets the maximum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Parameter:
• dMaxInterval The maximum value allowed
Exceptions:
• IllegalAccessException if the value cannot be set
Type void Visibility public Is Abstract false Parameter • inout dMaxInterval : Double
setdMinInterval
This method sets the minimum real value allowed in the column. The implementation is not invoked, since this information is part of the column metadata.
Parameter:
• dMinInterval The minimum value allowed
Exceptions:
• IllegalAccessException if the value cannot be set
Type void Visibility public Is Abstract false Parameter • inout dMinInterval : Double
standardDeviation
This method calls the implementation to return the standard deviation calculated from the set of values in the numerical column.
Type double Visibility public Is Abstract false Parameter
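The statistics returned by mean and standardDeviation can be sketched as follows. This guide does not document whether datapro4j divides by n (population) or by n - 1 (sample). This sketch assumes the population form and does not handle missing values. StatsSketch is a hypothetical name.

```java
// Hypothetical sketch, not the datapro4j implementation.
final class StatsSketch {

    // Arithmetic mean; returns NaN for an empty array (0.0 / 0).
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Population standard deviation: sqrt( sum((x - mean)^2) / n ).
    static double standardDeviation(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / values.length);
    }

    public static void main(String[] args) {
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(mean(values));              // 5.0
        System.out.println(standardDeviation(values)); // 2.0
    }
}
```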
standarize
This method calls the implementation to standardize the set of values in the numerical column.
Parameters:
• dMean Value of the mean used to standardize the set of values of the column
• dVariance Value of the variance used for the standardization
Type void Visibility public Is Abstract false Parameter • in dMean : double
• in dVariance : double
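Given dMean and dVariance, standardization is usually computed as z = (x - dMean) / sqrt(dVariance). The sketch below assumes that formula, which this guide does not spell out. StandardizeSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch of z-score standardization; not the datapro4j implementation.
final class StandardizeSketch {

    // z = (x - mean) / sqrt(variance). Rejects non-positive (or NaN) variance.
    static double[] standardize(double[] values, double mean, double variance) {
        if (!(variance > 0.0)) {
            throw new IllegalArgumentException("variance must be positive: " + variance);
        }
        double sd = Math.sqrt(variance);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] z = standardize(new double[] {1.0, 5.0, 9.0}, 5.0, 16.0);
        System.out.println(Arrays.toString(z)); // [-1.0, 0.0, 1.0]
    }
}
```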
toInteger
This method calls the implementation to return an integer column containing values extracted from the numerical column. It returns an IntegerColumn object.
Parameter:
• bRoundedValue if false, values are truncated; if true, values are rounded.
Type IntegerColumn Visibility public Is Abstract false Parameter • in bRoundedValue : boolean
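Java's own conversions show the difference between truncating and rounding in toInteger. This guide does not specify the rounding rule. The sketch assumes Math.round, which sends ties toward positive infinity. ToIntegerSketch is a hypothetical name.

```java
// Hypothetical sketch, not the datapro4j implementation.
final class ToIntegerSketch {

    // bRoundedValue == false: truncate toward zero with an (int) cast.
    // bRoundedValue == true: round to nearest with Math.round (ties toward positive infinity).
    // Assumes d lies within the int range; out-of-range values are not handled here.
    static int toInt(double d, boolean bRoundedValue) {
        return bRoundedValue ? (int) Math.round(d) : (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(2.7, false));  // 2
        System.out.println(toInt(2.7, true));   // 3
        System.out.println(toInt(-2.7, false)); // -2
        System.out.println(toInt(-2.5, true));  // -2
    }
}
```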
toNominal
This method calls the implementation to return a nominal column, where strings are constructed from real values.
Type NominalColumn Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class NumericalColumnImpl This class provides the implementation code accessing real data in a numerical column. Values are stored as objects of the class Double. Note that this class should not be instantiated directly; only its abstraction (NumericalColumn) should create instances of it.
Figure 35. Class NumericalColumnImpl
Name NumericalColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::NumericalColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail All the attributes are either private or protected.
Operation Detail
addAllValues
Type int Visibility public Is Abstract false Parameter • inout rgoCol : List<Object>
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
countEmptyValues
Type int Visibility public Is Abstract false Parameter
countInvalidValues
Type int Visibility public Is Abstract false Parameter
countMissingValues
Type int Visibility public Is Abstract false Parameter
countNullValues
Type int Visibility public Is Abstract false Parameter
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getMaxValue
This method implements the method getMaxValue of the abstraction class, returning the maximum existing value in the column.
Type double Visibility public Is Abstract false Parameter
getMinValue
This method implements the method getMinValue of the abstraction class, returning the minimum existing value in the column.
Type double Visibility public Is Abstract false Parameter
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
mean
This method implements the method mean of the abstraction class, returning the mean value of the column.
Type double Visibility public Is Abstract false Parameter
normalize
This method implements the method normalize of the abstraction class, normalizing the set of values contained in the column.
Type void Visibility public Is Abstract false Parameter
NumericalColumnImpl
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
standardDeviation
This method implements the method standardDeviation of the abstraction class, returning the standard deviation value of the set of values contained in the numerical column.
Type double Visibility public Is Abstract false Parameter
standarize
This method implements the method standarize of the abstraction class, standardizing the values in the column according to the mean and variance passed as parameters.
Parameters:
• dMean Mean value considered for the standardization
• dVariance Variance value considered for the standardization
Type void Visibility public Is Abstract false Parameter • in dMean : double
• in dVariance : double
toInteger
This method implements the method toInteger of the abstraction class, returning an integer column calculated from the numerical column.
Parameters:
• sName The name of the resulting new column
• bRoundedValue If false, values are truncated; if true, values are rounded
Type IntegerColumn Visibility public Is Abstract false Parameter • in bRoundedValue : boolean
• inout sName : String
toNominal
This method implements the method toNominal of the abstraction class, returning a nominal column whose strings are constructed from the numerical values in the column.
Parameter:
• sName The name of the resulting new column
Type NominalColumn Visibility public Is Abstract false Parameter • inout sName : String
Relation Detail
Generalization
Name Related Element • ColumnImpl
Class RangeColumn This class represents the abstraction of a range column, whose values are intervals defined by a minimum and a maximum value.
Figure 36. Class RangeColumn
Name RangeColumn Qualified Name es::uco::kdis::datapro::dataset::Column::RangeColumn Visibility public Abstract false Base Classifier • ColumnAbstraction Realized Interface
Operation Detail
RangeColumn
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
RangeColumn
Constructor with the name of the column as a parameter.
Parameter:
• sName The name of the column.
Type Visibility public Is Abstract false Parameter • inout sName : String
toCategorical
This method calls the implementation to return a categorical column extracted from the range data contained in the column. The method returns a CategoricalColumn object.
Exceptions:
• NotAddedValueException
Type CategoricalColumn Visibility public Is Abstract false Parameter
toNumerical
This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to one of the following modes:
0: The minimum value of each range is selected.
1: The maximum value of each range is selected.
2: The mean value between min and max is selected.
3: A random value in the range is selected.
It returns the resulting NumericalColumn object.
Parameter:
• iMode An integer between 0 and 3 indicating the conversion mode, as described above.
Exceptions:
• NotAddedValueException
Type NumericalColumn Visibility public Is Abstract false Parameter • inout iMode : int
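The sketch below applies the four conversion modes of toNumerical to a single [min, max] range. RangeToNumericalSketch and convert are hypothetical names. Two details are assumptions: mode 3 draws from a uniform distribution, and an invalid mode throws IllegalArgumentException. The documented method declares NotAddedValueException instead.

```java
import java.util.Random;

// Hypothetical sketch, not the datapro4j implementation.
final class RangeToNumericalSketch {

    // Converts one [min, max] range to a single double following the documented modes:
    // 0 = minimum, 1 = maximum, 2 = mean of min and max, 3 = random value in the range.
    // Mode 3 uses Random.nextDouble(), so the result lies in [min, max).
    static double convert(double min, double max, int iMode, Random rnd) {
        switch (iMode) {
            case 0:
                return min;
            case 1:
                return max;
            case 2:
                return (min + max) / 2.0;
            case 3:
                return min + rnd.nextDouble() * (max - min);
            default:
                throw new IllegalArgumentException("iMode must be between 0 and 3: " + iMode);
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int mode = 0; mode <= 3; mode++) {
            System.out.println("mode " + mode + ": " + convert(2.0, 6.0, mode, rnd));
        }
    }
}
```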
![Page 126: datapro4j Programmer's guide...datapro4j The data processing library for Java The programmer’s guide Revision: 1 Please, cite this document as: J.R. Romero, J.M. Luna, S. Ventura](https://reader033.vdocuments.us/reader033/viewer/2022050523/5fa6b5352dfa5c69de613f18/html5/thumbnails/126.jpg)
datapro4j The programmer’s guide Rev 1 (July 2012)
More @ http://www.jrromero.net/en KDIS Research Group [125] University of Córdoba, Spain
toNumericalByGaussian
This method calls the implementation to return a numerical column extracted from the range values contained in the column, according to a Gaussian distribution.
Parameters:
• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution
It returns the resulting NumericalColumn object.
Exceptions:
• NotAddedValueException
Type NumericalColumn Visibility public Is Abstract false Parameter • in dMean : double
• in dStdDev : double
Relation Detail
Generalization
Name Related Element • ColumnAbstraction
Class RangeColumnImpl This class provides the implementation code accessing real data in a range column, whose values represent [min, max] intervals. Programmers should use the abstraction RangeColumn instead, since it hides the actual implementation of the column: even when the implementation changes, the abstraction must remain unaltered.
Figure 37. Class RangeColumnImpl
Name RangeColumnImpl Qualified Name es::uco::kdis::datapro::dataset::Column::RangeColumnImpl Visibility public Abstract false Base Classifier • ColumnImpl Realized Interface
Attribute Detail All attributes are private.
Operation Detail For a detailed specification of the methods inherited from ColumnImpl, see its specification above.
addAllValues
Type int Visibility public Is Abstract false Parameter • inout rgoValues : List<Object>
addValue
Type int Visibility public Is Abstract false Parameter • inout oValue : Object
addValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
countEmptyValues
Type int Visibility public Is Abstract false Parameter
countInvalidValues
Type int Visibility public Is Abstract false Parameter
countMissingValues
Type int Visibility public Is Abstract false Parameter
countNullValues
Type int Visibility public Is Abstract false Parameter
getElement
Type Object Visibility public Is Abstract false Parameter • in iPos : int
getSize
Type int Visibility public Is Abstract false Parameter
getValues
Type List<Object> Visibility public Is Abstract false Parameter
RangeColumnImpl
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
RangeColumnImpl
Constructor with the name of the column as a parameter.
Parameter:
• sName The name of the column.
Type Visibility public Is Abstract false Parameter • inout sName : String
removeValue
Type void Visibility public Is Abstract false Parameter • in iIndex : int
setValue
Type int Visibility public Is Abstract false Parameter • in iIndex : int
• inout oValue : Object
toCategorical
This method implements the method toCategorical of the abstraction, returning a categorical column extracted from the range data contained in the column. The method returns the resulting CategoricalColumn object.
Exceptions:
• NotAddedValueException
Type CategoricalColumn Visibility public Is Abstract false Parameter
toNumerical
This method implements the method toNumerical of the abstraction, returning a numerical column extracted from the range values contained in the column, according to one of the following modes:
0: The minimum value of each range is selected.
1: The maximum value of each range is selected.
2: The mean value between min and max is selected.
3: A random value in the range is selected.
It returns the resulting NumericalColumn object.
Parameter:
• iMode An integer between 0 and 3 indicating the conversion mode, as described above.
Exceptions:
• NotAddedValueException
Type NumericalColumn Visibility public Is Abstract false Parameter • in iMode : int
toNumericalByGaussian
This method implements the method toNumericalByGaussian of the abstraction, returning a numerical column extracted from the range values contained in the column, according to a Gaussian distribution.
Parameters:
• dMean The arithmetic mean for the distribution
• dStdDev The standard deviation for the distribution
It returns the resulting NumericalColumn object.
Exceptions:
• NotAddedValueException
Type NumericalColumn Visibility public Is Abstract false Parameter • in dMean : double
• in dStdDev : double
Relation Detail
Generalization
Name Related Element • ColumnImpl
Package es::uco::kdis::datapro::dataset::Source
Figure 38. Package es.uco.kdis.datapro.dataset.Source
Name Source Qualified Name es::uco::kdis::datapro::dataset::Source
Class ArffDataset ArffDataset implements the ARFF (Attribute-Relation File Format) dataset file specification, as used by Weka. This is a subclass of FileDataset.
ARFF files are ASCII text files that describe a list of instances sharing a set of attributes. A header containing the metadata comes first; after it, the file lists one instance per line until the end of the file.
Types of attributes in ARFF dataset files:
• @ATTRIBUTE name numeric (mapped to numerical columns)
• @ATTRIBUTE name {value1, value2, ...} (mapped to categorical columns)
• @ATTRIBUTE name string (mapped to nominal columns)
• @ATTRIBUTE name date "yyyy-MM-dd HH:mm:ss" (mapped to date columns)
For a more detailed description, visit the web site http://www.cs.waikato.ac.nz/ml/weka/arff.html (Nov. 1st, 2008).
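The correspondence between ARFF attribute declarations and column types listed above can be expressed as a small mapping onto the column-format symbols used by readDataset (f, c, s, d). The sketch below is hypothetical (ArffHeaderSketch is not a datapro4j class) and assumes attribute names without spaces or quotes.

```java
import java.util.List;
import java.util.Locale;

final class ArffHeaderSketch {
    // Maps one @ATTRIBUTE declaration to a column-format symbol:
    // numeric -> 'f', {...} -> 'c', string -> 's', date -> 'd'.
    static char symbolFor(String declaration) {
        String rest = declaration.trim().substring("@attribute".length()).trim();
        // Assumes the attribute name contains no spaces or quotes.
        String type = rest.substring(rest.indexOf(' ') + 1).trim().toLowerCase(Locale.ROOT);
        if (type.startsWith("{")) return 'c';
        if (type.startsWith("numeric")) return 'f';
        if (type.startsWith("string")) return 's';
        if (type.startsWith("date")) return 'd';
        throw new IllegalArgumentException("Unsupported attribute type: " + type);
    }

    // Builds a column-format string from the header lines, ignoring @relation and @data.
    static String columnFormat(List<String> header) {
        StringBuilder sb = new StringBuilder();
        for (String line : header) {
            if (line.trim().toLowerCase(Locale.ROOT).startsWith("@attribute")) {
                sb.append(symbolFor(line));
            }
        }
        return sb.toString();
    }
}
```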
Figure 39. Class ArffDataset
Name ArffDataset Qualified Name es::uco::kdis::datapro::dataset::Source::ArffDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface
Attribute Detail
Some attributes are protected so that subclasses can reuse them.
ATTRIBUTE
ATTRIBUTE is the static constant string for the ARFF keyword '@attribute'.
Type String Default Value "@attribute" Visibility protected Multiplicity
DATA
DATA is the static constant string for the ARFF keyword '@data'. It defines the beginning of the data block in the ARFF file.
Type String Default Value "@data" Visibility protected Multiplicity
RELATION
RELATION is the static constant string for the ARFF keyword '@relation'. It marks the beginning of the ARFF dataset definition.
Type String Default Value "@relation" Visibility protected Multiplicity
Operation Detail
addAllValues
This method reads the DATA block in the dataset and adds the values in the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)
For example, the string “cbbf%%d” indicates the sequence of attributes that are read from the dataset and loaded into the column structure of the dataset: a categorical column, two binary columns and a numerical column. The following two attributes are skipped. Finally, the date attribute is loaded.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
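The effect of a column-format string on a single instance can be sketched as follows. The hypothetical helper keptValues only filters and tags the values; the real method also converts each value into the corresponding column type. Reporting a length mismatch as an IndexOutOfBoundsException is an assumption, chosen because it is one of the exceptions listed above.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnFormatSketch {
    // Keeps only the values whose position in sColumnFormat is not '%',
    // tagging each kept value with its column symbol (hypothetical helper).
    static List<String> keptValues(String sColumnFormat, String[] instance) {
        if (sColumnFormat.length() != instance.length) {
            throw new IndexOutOfBoundsException("format length " + sColumnFormat.length()
                    + " does not match " + instance.length + " attributes");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < instance.length; i++) {
            char symbol = sColumnFormat.charAt(i);
            if (symbol == '%') continue;                  // skip this attribute
            if ("sfcbd".indexOf(symbol) < 0) {
                throw new IllegalArgumentException("Unknown column symbol: " + symbol);
            }
            kept.add(symbol + ":" + instance[i]);
        }
        return kept;
    }
}
```

Applied to the “cbbf%%d” example, the fifth and sixth values of the instance are discarded.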
ArffDataset
Default constructor with no parameters. No dataset filename is specified using this constructor.
Type Visibility public Is Abstract false Parameter
ArffDataset
Constructor with the filename of the dataset as a parameter.
Parameter:
• sFileName The filename of the dataset
Type Visibility public Is Abstract false Parameter • inout sFileName : String
close
This method closes the ARFF file.
Exception:
• IOException
Type void Visibility protected Is Abstract false Parameter
obtainMetadata
This method reads the metadata block of an ARFF file. Each attribute specification is interpreted and, if required, the corresponding column structure is created in the dataset.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column (do not dump its values to any column)
For example, the string “bbf%c” indicates that two binary columns and a numerical (real) column will be read. Then, the fourth attribute will be skipped and, finally, a categorical column will be read.
Exceptions:
• IOException
• InputMismatchException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
open
This method opens the dataset file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type void Visibility protected Is Abstract false Parameter
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat Not considered for ARFF datasets
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String
• inout sContentFormat : String
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameter:
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o c: Categorical column
o b: Binary column
o d: Date column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. In this overload, the column format string is null.
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter
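Taken together, the descriptions of readDataset, open, obtainMetadata, addAllValues and close suggest a fixed sequence of calls. The sketch below models that sequence with a hypothetical abstract class (SketchFileDataset, not the datapro4j FileDataset). The use of try/finally, so that the file is closed even when reading fails, is an assumption rather than documented behaviour.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class illustrating the open -> obtainMetadata -> addAllValues -> close sequence.
abstract class SketchFileDataset {
    protected abstract void open() throws IOException;
    protected abstract void obtainMetadata(String sColumnFormat) throws IOException;
    protected abstract void addAllValues(String sColumnFormat) throws IOException;
    protected abstract void close() throws IOException;

    public void readDataset(String sColumnFormat) throws IOException {
        open();
        try {
            obtainMetadata(sColumnFormat);
            addAllValues(sColumnFormat);
        } finally {
            close();   // assumption: the file is closed even if reading fails
        }
    }
}

// Records each step so that the call order can be inspected.
final class TracingDataset extends SketchFileDataset {
    final List<String> calls = new ArrayList<>();
    protected void open() { calls.add("open"); }
    protected void obtainMetadata(String f) { calls.add("obtainMetadata:" + f); }
    protected void addAllValues(String f) { calls.add("addAllValues:" + f); }
    protected void close() { calls.add("close"); }
}
```

This is the template-method pattern: subclasses such as the ARFF, CSV, Excel or KEEL readers would supply only the four format-specific steps.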
writeDataset
This method opens the dataset file, writes the metadata and instances, and closes the file. Only the following column types are accepted; any other type causes an InputMismatchException to be thrown:
• Numerical
• Date
• Nominal
• Categorical
• Boolean (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• InputMismatchException
• IOException
Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String
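The header that writeDataset produces is not shown in this guide. The following sketch assumes a conventional ARFF layout (@relation, one @attribute line per column, then @data) and maps each accepted column type to an ARFF type. The type labels, the date pattern and the {true,false} categories used for binary columns are assumptions for illustration only.

```java
import java.util.InputMismatchException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ArffWriterSketch {
    // Builds an ARFF header from column names and hypothetical type labels.
    // Map iteration order determines attribute order, so callers should pass a LinkedHashMap.
    static String header(String relation, Map<String, String> kinds, Map<String, List<String>> categories) {
        StringBuilder sb = new StringBuilder("@relation ").append(relation).append('\n');
        for (Map.Entry<String, String> e : kinds.entrySet()) {
            sb.append("@attribute ").append(e.getKey()).append(' ');
            switch (e.getValue()) {
                case "numerical":   sb.append("numeric"); break;
                case "nominal":     sb.append("string"); break;
                case "date":        sb.append("date \"yyyy-MM-dd HH:mm:ss\""); break;
                case "categorical": sb.append('{').append(String.join(",", categories.get(e.getKey()))).append('}'); break;
                case "binary":      sb.append("{true,false}"); break;   // binary saved as categorical
                default: throw new InputMismatchException("Unsupported column type: " + e.getValue());
            }
            sb.append('\n');
        }
        return sb.append("@data\n").toString();
    }
}
```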
Relation Detail
Generalization
Name Related Element • FileDataset
Class CsvDataset CsvDataset implements the CSV (Comma-Separated Values) dataset file specification, as prescribed by the IETF specification, available from http://tools.ietf.org/html/rfc4180 (October, 2005).
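RFC 4180 allows fields to be enclosed in double quotes, in which case they may contain commas, and represents a double quote inside such a field as two double quotes. A minimal Java sketch of splitting one record under those rules follows; Rfc4180Sketch is a hypothetical helper, and this guide does not state whether CsvDataset handles quoting the same way.

```java
import java.util.ArrayList;
import java.util.List;

final class Rfc4180Sketch {
    // Splits one CSV record into fields following RFC 4180 quoting rules.
    // Simplification: records spanning several lines (quoted line breaks) are not handled.
    static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // escaped quote ("")
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(ch);        // commas inside quotes are literal
                }
            } else if (ch == '"') {
                inQuotes = true;
            } else if (ch == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        fields.add(cur.toString());        // last field (possibly empty)
        return fields;
    }
}
```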
Figure 40. Class CsvDataset
Name CsvDataset Qualified Name es::uco::kdis::datapro::dataset::Source::CsvDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface
Operation Detail
addAllValues
This method adds all the values in the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and loaded into the column structure of the dataset: a categorical column and a numerical column. The following two attributes are skipped and, finally, the nominal attribute is loaded.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
close
This method closes the CSV file.
Exception:
• IOException
Type void Visibility protected Is Abstract false Parameter
CsvDataset
The default constructor of the CSV dataset with no parameters.
Type Visibility public Is Abstract false Parameter
CsvDataset
Constructor of the CSV dataset with its filename as a parameter.
Parameter:
• sFileName The filename of the CSV dataset
Type Visibility public Is Abstract false Parameter • inout sFileName : String
obtainMetadata
This method reads the metadata of the CSV file. Note that metadata is optional in CSV files.
Parameters:
• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:
o n: A line with the attribute names is read
o v: The block containing the instance values is read
o %: Skip one row in the file
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
• inout sContentFormat : String
open
This method opens the CSV dataset file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type void Visibility protected Is Abstract false Parameter
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat String that specifies the structure of the CSV file. The following symbols are used:
o n: A line with the attribute names is read
o v: The block containing the instance values is read
o %: Skip one row in the file
For example, “%n%%v” skips the first line, then reads the column names, skips the next two lines and, finally, reads the dataset instances.
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
• IllegalFormatSpecificationException
Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String
• inout sContentFormat : String
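The content-format string can be read as a small program over the rows of the file. The sketch below interprets n, v and % on a list of lines and reproduces the “%n%%v” example. It is hypothetical: it assumes that v consumes every remaining line, and it throws IllegalArgumentException where the library documents IllegalFormatSpecificationException.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentFormatSketch {
    final List<String> names = new ArrayList<>();
    final List<String> instances = new ArrayList<>();

    // Walks the file lines following sContentFormat: '%' skips one line,
    // 'n' reads the attribute-name line, and 'v' reads all remaining lines as instances.
    static ContentFormatSketch apply(String sContentFormat, List<String> lines) {
        ContentFormatSketch result = new ContentFormatSketch();
        int row = 0;
        for (char symbol : sContentFormat.toCharArray()) {
            switch (symbol) {
                case '%':
                    row++;
                    break;
                case 'n':
                    for (String name : lines.get(row++).split(",")) {
                        result.names.add(name.trim());
                    }
                    break;
                case 'v':
                    while (row < lines.size()) {
                        result.instances.add(lines.get(row++));
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown content symbol: " + symbol);
            }
        }
        return result;
    }
}
```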
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file. It assumes the following file format: a first line containing the attribute names (metadata), followed by the instances.
Parameter:
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String
writeDataset
This method writes the dataset to a new CSV file. The column types allowed for writing are the following:
• Numerical
• Integer
• Nominal
• Categorical
• Binary (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String
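On the writing side, RFC 4180 states that fields containing commas, double quotes or line breaks should be enclosed in double quotes, with embedded double quotes doubled, and it uses CRLF as the record terminator. The following sketch applies those rules to one instance. CsvWriteSketch is hypothetical; this guide does not say whether writeDataset quotes fields or which line terminator it uses.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvWriteSketch {
    // Quotes a field only when it contains a comma, double quote, CR or LF,
    // doubling any embedded double quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Joins the fields of one instance and terminates the record with CRLF.
    static String record(List<String> fields) {
        return fields.stream().map(CsvWriteSketch::escape).collect(Collectors.joining(",")) + "\r\n";
    }
}
```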
Relation Detail
Generalization
Name Related Element • FileDataset
Class ExcelDataset ExcelDataset represents a dataset stored in a Microsoft Excel file. Like any spreadsheet, an Excel file holds a grid of cells arranged in numbered rows and letter-named columns.
Note: This class depends on the external Java library Apache POI.
Figure 41. Class ExcelDataset
Name ExcelDataset Qualified Name es::uco::kdis::datapro::dataset::Source::ExcelDataset Visibility public Abstract false Base Classifier • FileDataset Realized Interface
Attribute Detail
All attributes are private.
Operation Detail
addAllValues
This method adds all the instance values in the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o s: Nominal column
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%s” indicates the sequence of attributes that are read from the dataset and loaded into the column structure of the dataset: a categorical column and a numerical column. The following two attributes are skipped and, finally, the nominal attribute is loaded.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
close
This method closes the Excel file.
Exceptions:
• IOException
Type void Visibility protected Is Abstract false Parameter
ExcelDataset
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
ExcelDataset
Constructor with the filename as parameter.
Parameter:
• sFileName The filename of the Excel dataset
Type Visibility public Is Abstract false Parameter • inout sFileName : String
obtainMetadata
This method reads the metadata of the Excel file.
Parameters:
• sContentFormat String that specifies the data structure in the Excel file. The following symbols are used:
o n: A line with the attribute names is read
o v: The block containing the instance values is read
o %: Skip one row in the file
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
• inout sContentFormat : String
open
This method opens the Excel file using the name passed as a parameter to the constructor.
Exceptions:
• FileNotFoundException
Type void Visibility protected Is Abstract false Parameter
readDataset
This method opens the dataset, reads the metadata and instances and, finally, closes the dataset file.
Parameters:
• sContentFormat String that specifies the structure of the Excel file. The following symbols are used:
o n: A line with the attribute names is read
o v: The block containing the instance values is read
o %: Skip one row in the file
For example, “%n%%v” skips the first row, then reads the column names, skips the next two rows and, finally, reads the dataset instances.
• sColumnFormat String that specifies the types of columns to be read. Each column type is represented by one of the following symbols:
o s: Nominal column
o f: Numerical column
o i: Integer column
o c: Categorical column
o %: Skip this column
Exceptions:
• NotAddedValueException
• IOException
• IndexOutOfBoundsException
• IllegalFormatSpecificationException
Type void Visibility public Is Abstract false Parameter • inout sColumnFormat : String
• inout sContentFormat : String
writeDataset
This method writes the dataset to a new Excel file. The column types supported for writing are the following:
• Numerical
• Integer
• Nominal
• Categorical
• Binary (binary values are saved as categorical values)
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String
Relation Detail
Generalization
Name Related Element • FileDataset
Class KeelDataset KeelDataset is the class representing a dataset conformant to the KEEL (Knowledge Extraction based on Evolutionary Learning) standard specification. KeelDataset is a subclass of ArffDataset.
KEEL files are a specific subtype of ARFF files with the following kinds of attributes (for real and integer data, value1 and value2 are the lower and upper bounds of the domain):
• @ATTRIBUTE name real [value1, value2] for real data
• @ATTRIBUTE name integer [value1, value2] for integer data
• @ATTRIBUTE name {value1, value2, ...} for categorical data
For a more detailed description of this specification, the reader can consult the following reference:
J. Alcalá-Fdez et al. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. Journal of Multiple-Valued Logic and Soft Computing 17:2-3 (2011) 255-287.
Also, for further information, visit the website http://www.keel.es.
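The numeric KEEL declarations listed above carry their domain in square brackets. The sketch below extracts the name, type and bounds from such a line. KeelAttributeSketch is a hypothetical helper that ignores categorical declarations and assumes whitespace between the type and the bracketed bounds.

```java
import java.util.Locale;

final class KeelAttributeSketch {
    final String name;
    final String type;   // "real" or "integer"
    final double min;
    final double max;

    private KeelAttributeSketch(String name, String type, double min, double max) {
        this.name = name;
        this.type = type;
        this.min = min;
        this.max = max;
    }

    // Parses "@ATTRIBUTE name real [min, max]" or "@ATTRIBUTE name integer [min, max]".
    static KeelAttributeSketch parse(String line) {
        String[] parts = line.trim().split("\\s+", 4);   // "@attribute", name, type, "[min, max]"
        if (parts.length < 4 || !parts[0].equalsIgnoreCase("@attribute")) {
            throw new IllegalArgumentException("Not a numeric KEEL attribute: " + line);
        }
        String type = parts[2].toLowerCase(Locale.ROOT);
        if (!type.equals("real") && !type.equals("integer")) {
            throw new IllegalArgumentException("Not a numeric KEEL attribute: " + line);
        }
        String bounds = parts[3].trim();
        String[] limits = bounds.substring(1, bounds.length() - 1).split(",");
        return new KeelAttributeSketch(parts[1], type,
                Double.parseDouble(limits[0].trim()), Double.parseDouble(limits[1].trim()));
    }
}
```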
Figure 42. Class KeelDataset
Name KeelDataset Qualified Name es::uco::kdis::datapro::dataset::Source::KeelDataset Visibility public Abstract false Base Classifier • ArffDataset Realized Interface
Attribute Detail
INPUTS
Constant for the keyword @inputs
Type String Default Value "@inputs" Visibility protected Multiplicity
OUTPUTS
Constant for the keyword @outputs
Type String Default Value "@outputs" Visibility protected Multiplicity
Operation Detail
addAllValues
This method adds all the values in the @DATA block of the file to the corresponding column structure.
Parameter:
• sColumnFormat String indicating the sequence of column types that corresponds to the attribute order of each instance in the dataset.
o f: Numerical (real) column
o i: Integer column
o c: Categorical column
o b: Binary column
o %: Skip this column (do not dump its values to any column)
For example, the string “cf%%b” indicates the sequence of attributes that are read from the dataset and copied in memory to the column structure of the dataset: a categorical column and a numerical column. The following two attributes are omitted and, finally, the binary attribute is copied.
Exceptions:
• IndexOutOfBoundsException
• IOException
• NotAddedValueException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
KeelDataset
Default constructor with no parameters.
Type Visibility public Is Abstract false Parameter
KeelDataset
Constructor with the filename of the dataset as a parameter.
Parameter:
• sFileName The filename containing the dataset
Type Visibility public Is Abstract false Parameter • inout sFileName : String
obtainMetadata
This method reads the metadata of the KEEL file.
Parameter:
• sColumnFormat String that specifies the different column types to be read from the file. Each column type is represented by one of the following symbols:
o b: Binary column
o f: Numerical (real) column
o c: Categorical column
o i: Integer column
o %: Skip this column
Exceptions:
• IOException
• IllegalFormatSpecificationException
Type void Visibility protected Is Abstract false Parameter • inout sColumnFormat : String
writeDataset
This method writes the dataset to a new KEEL file. Only the following types of column are supported for writing:
• Numerical (real)
• Integer
• Categorical
Parameter:
• sOutputFile The filename of the dataset
Exceptions:
• IOException
Type void Visibility public Is Abstract false Parameter • inout sOutputFile : String
Relation Detail
Generalization
Name Related Element • ArffDataset
Package es::uco::kdis::datapro::datatypes
Figure 43. Package es.uco.kdis.datapro.datatypes
Name datatypes Qualified Name es::uco::kdis::datapro::datatypes
Class InvalidValue This abstract class represents any invalid value in a column. This is the base class of the following types of invalid values:
• Missing values.
• Null values.
• Empty values.
For a more detailed description, see the following reference:
Pyle, D. Data preparation for data mining. Morgan Kaufmann, 1999. ISBN: 1-55869-529-0.
Note. Columns may define their own invalid values. However, the library does not process such values; they are only intended for serialization and for specific algorithms. For regular use, the invalid-value objects described here are generally sufficient. Moreover, these objects are notation-independent and are used only for data processing.
Figure 44. Class InvalidValue
Name InvalidValue Qualified Name es::uco::kdis::datapro::datatypes::InvalidValue Visibility public Abstract true Base Classifier Realized Interface
Relation Detail
Generalization
Name Related Element • MissingValue
Name Related Element • EmptyValue
Name Related Element • NullValue
Class EmptyValue This class represents an empty value in a variable, i.e., a value for which no real-world counterpart can be assumed to exist.
This class implements the singleton pattern, so only one instance exists at any time. The instance is obtained through the method getEmptyValue; therefore, empty values can be compared using the operator ‘==’.
Figure 45. Class EmptyValue
Name EmptyValue Qualified Name es::uco::kdis::datapro::datatypes::EmptyValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface
Attribute Detail All attributes are private.
Operation Detail
getEmptyValue
Singleton accessor that returns the unique object representing an empty value.
Type EmptyValue Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • InvalidValue
Class MissingValue This class represents a missing value in a variable, i.e., a value that was not entered into the dataset even though an actual value exists in the real world where the measurements were made.
This class implements the singleton pattern, so only one instance exists at any time. The instance is obtained through the method getMissingValue; therefore, missing values can be compared using the operator ‘==’.
Figure 46. Class MissingValue
Name MissingValue Qualified Name es::uco::kdis::datapro::datatypes::MissingValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface
Attribute Detail All attributes are private.
Operation Detail
getMissingValue
Singleton accessor that returns the unique object representing a missing value.
Type MissingValue Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • InvalidValue
Class NullValue This class represents an explicit null value in a variable.
This class implements the singleton pattern, so only one instance exists at any time. The instance is obtained through the method getNullValue; therefore, null values can be compared using the operator ‘==’. It allows the programmer to replace null references with comparable object instances (e.g., in collections and comparisons).
Figure 47. Class NullValue
Name NullValue Qualified Name es::uco::kdis::datapro::datatypes::NullValue Visibility public Abstract false Base Classifier • InvalidValue Realized Interface
Attribute Detail All attributes are private.
Operation Detail
getNullValue
Singleton accessor that returns the unique object representing a null value.
Type NullValue Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • InvalidValue
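The singleton design described for EmptyValue, MissingValue and NullValue can be sketched as follows, using two of the three classes. This is a minimal illustration of the pattern, not the datapro4j source: the private constructors, the static INSTANCE fields and the helper class ValueCleaner are assumptions. Because each accessor always returns the same object, reference comparison with == is enough to detect an invalid value.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the singleton hierarchy; not the datapro4j source.
abstract class InvalidValue { }

final class NullValue extends InvalidValue {
    private static final NullValue INSTANCE = new NullValue();
    private NullValue() { }                         // prevents outside instantiation
    static NullValue getNullValue() { return INSTANCE; }
}

final class MissingValue extends InvalidValue {
    private static final MissingValue INSTANCE = new MissingValue();
    private MissingValue() { }
    static MissingValue getMissingValue() { return INSTANCE; }
}

// Hypothetical helper showing why a shared null object is convenient.
final class ValueCleaner {
    // Replaces null references with the NullValue singleton.
    static List<Object> replaceNulls(List<Object> raw) {
        List<Object> out = new ArrayList<>();
        for (Object v : raw) {
            out.add(v == null ? NullValue.getNullValue() : v);
        }
        return out;
    }

    // Counts null values by reference comparison with the singleton.
    static int countNulls(List<Object> values) {
        int n = 0;
        for (Object v : values) {
            if (v == NullValue.getNullValue()) n++;
        }
        return n;
    }
}
```

Once nulls are replaced, collection operations such as indexOf and contains work without special-casing null.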
Class Range This class is a generic template that represents any kind of interval bounded by a minimum and a maximum limit. Each boundary can be open or closed, indicating whether the boundary value is excluded from or included in the range. The type parameter C is the class of the objects involved in the range.
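The role of open and closed boundaries is easiest to see in a membership test. The sketch below is a hypothetical, immutable stand-in for Range: it assumes that C implements Comparable, and its contains method is not part of the documented Range interface.

```java
// Minimal sketch of a generic interval with open/closed bounds. The accessors mirror
// the documented getters, but contains() is a hypothetical helper.
final class SketchRange<C extends Comparable<C>> {
    private final C minValue;
    private final C maxValue;
    private final boolean openMin;   // true: minValue itself is excluded
    private final boolean openMax;   // true: maxValue itself is excluded

    SketchRange(C minValue, C maxValue, boolean openMin, boolean openMax) {
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.openMin = openMin;
        this.openMax = openMax;
    }

    C getMinValue() { return minValue; }
    C getMaxValue() { return maxValue; }
    boolean isOpenMin() { return openMin; }
    boolean isOpenMax() { return openMax; }

    // Checks membership, honouring whether each boundary is open or closed.
    boolean contains(C value) {
        int lo = value.compareTo(minValue);
        int hi = value.compareTo(maxValue);
        boolean aboveMin = openMin ? lo > 0 : lo >= 0;
        boolean belowMax = openMax ? hi < 0 : hi <= 0;
        return aboveMin && belowMax;
    }
}
```

For example, a range built with openMin false and openMax true represents the half-open interval [min, max).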
Figure 48. Class Range
Name Range Qualified Name es::uco::kdis::datapro::datatypes::Range Visibility public Abstract true Base Classifier Realized Interface
Attribute Detail
Protected attributes with accessors (getter/setter) are omitted.
Operation Detail
getMaxValue
This method returns the upper interval boundary value, i.e. the maximum value in the interval (the programmer has to check whether this boundary is open or closed).
Type C Visibility public Is Abstract false Parameter
getMinValue
This method returns the lower interval boundary value, i.e. the minimum value in the interval (the programmer has to check whether this boundary is open or closed).
Type C Visibility public Is Abstract false Parameter
isOpenMax
This method returns a boolean value indicating whether the upper interval boundary is open, i.e. the maximum value is excluded from the range.
Type boolean Visibility public Is Abstract false Parameter
isOpenMin
This method returns a boolean value indicating whether the lower interval boundary is open, i.e. the minimum value is excluded from the range.
Type boolean Visibility public Is Abstract false Parameter
setMaxValue
This method sets the upper interval boundary.
Parameter:
• oMax The new maximum value
Type void Visibility public Is Abstract false Parameter • inout oMax : C
setMinValue
This method sets the lower interval boundary.
Parameter:
• oMin The new minimum value
Type void Visibility public Is Abstract false Parameter • inout oMin : C
setOpenMax
This method sets the upper interval boundary to open or closed.
Parameter:
• bOpenMax True if open; false if closed.
Type void Visibility public Is Abstract false Parameter • inout bOpenMax : boolean
setOpenMin
This method sets the lower interval boundary to open or closed.
Parameter:
• bOpenMin True if open; false if closed.
Type void Visibility public Is Abstract false Parameter • inout bOpenMin : boolean
Relation Detail
Dependency
Name Related Element • Range<Double>
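A minimal sketch of a generic range exposing the accessors listed above. It assumes the boundaries and their openness are stored in protected fields (the guide only says that protected attributes with accessors are omitted), so the field names are hypothetical, and both boundaries start out closed because the default openness is not stated here. The class is declared abstract, as the guide indicates for Range, so the demo uses an anonymous subclass.

```java
// Demo entry point first, so the file can be run with `java RangeDemo.java`.
class RangeDemo {
    public static void main(String[] args) {
        // Range is abstract, so the demo uses an anonymous concrete subclass over Integer.
        Range<Integer> r = new Range<Integer>() { };
        r.setMinValue(1);
        r.setMaxValue(10);
        r.setOpenMax(true); // 10 is excluded: the interval is [1, 10)
        System.out.println(r.getMinValue() + " " + r.getMaxValue()
                + " openMin=" + r.isOpenMin() + " openMax=" + r.isOpenMax());
        // prints "1 10 openMin=false openMax=true"
    }
}

// Sketch of a generic interval over objects of class C (not the library's actual code).
abstract class Range<C> {
    // Boundaries and their openness; field names are hypothetical.
    protected C oMin;
    protected C oMax;
    protected boolean bOpenMin; // true: the minimum is excluded
    protected boolean bOpenMax; // true: the maximum is excluded

    public C getMinValue() { return oMin; }
    public C getMaxValue() { return oMax; }
    public boolean isOpenMin() { return bOpenMin; }
    public boolean isOpenMax() { return bOpenMax; }

    public void setMinValue(C oMin) { this.oMin = oMin; }
    public void setMaxValue(C oMax) { this.oMax = oMax; }
    public void setOpenMin(boolean bOpenMin) { this.bOpenMin = bOpenMin; }
    public void setOpenMax(boolean bOpenMax) { this.bOpenMax = bOpenMax; }
}
```

Note that the getters return only the boundary value; as the operation descriptions warn, callers must query isOpenMin or isOpenMax to know whether that value actually belongs to the range.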
Class DoubleRange
This class is a specialization of the template Range, where the template parameter is of type Double.
Figure 49. Class DoubleRange
Name DoubleRange Qualified Name es::uco::kdis::datapro::datatypes::DoubleRange Visibility public Abstract false Base Classifier • Range<Double> Realized Interface
Operation Detail
DoubleRange
Default constructor with no parameters. By default, the lower and upper boundaries are set to negative and positive infinity, respectively.
Type Visibility public Is Abstract false Parameter
DoubleRange
Constructor with parameters.
Parameters:
• dMin The minimum value of the range, i.e. the lower interval boundary.
• dMax The maximum value of the range, i.e. the upper interval boundary.
Type Visibility public Is Abstract false Parameter • in dMax : double
• in dMin : double
hasValue
This method returns true if the value passed as a parameter lies within the interval.
Parameter:
• dValue The value to be checked.
Type boolean Visibility public Is Abstract false Parameter • in dValue : double
toString
This method returns the interval as a String with the following format:
('[' | '(') <min> ',' <max> (')' | ']')
where square brackets denote a closed boundary and parentheses denote an open one.
Type String Visibility public Is Abstract false Parameter
Relation Detail
Generalization
Name Related Element • Range<Double>
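The behaviour of hasValue and toString can be sketched as follows. This is a standalone illustration, not the library's implementation, and it does not reproduce the Range superclass. It assumes that hasValue uses strict comparisons for open boundaries and inclusive ones for closed boundaries, that the two-argument constructor takes (dMin, dMax) in that order and creates closed boundaries, and that toString inserts no spaces around the comma. The class name DoubleRangeSketch is hypothetical.

```java
// Demo entry point first, so the file can be run with `java DoubleRangeDemo.java`.
class DoubleRangeDemo {
    public static void main(String[] args) {
        DoubleRangeSketch r = new DoubleRangeSketch(0.0, 1.0);
        r.setOpenMax(true);                          // exclude the upper boundary
        System.out.println(r);                       // prints "[0.0,1.0)"
        System.out.println(r.hasValue(0.0));         // prints "true"  (closed lower boundary)
        System.out.println(r.hasValue(1.0));         // prints "false" (open upper boundary)
        System.out.println(new DoubleRangeSketch()); // prints "[-Infinity,Infinity]"
    }
}

// Hypothetical stand-in for DoubleRange (not the library's actual code).
class DoubleRangeSketch {
    private double dMin;
    private double dMax;
    private boolean bOpenMin;
    private boolean bOpenMax;

    // Default constructor: the range spans from negative to positive infinity.
    DoubleRangeSketch() {
        this(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }

    // Constructor with explicit boundaries; both are assumed to be closed initially.
    DoubleRangeSketch(double dMin, double dMax) {
        this.dMin = dMin;
        this.dMax = dMax;
    }

    void setOpenMin(boolean bOpenMin) { this.bOpenMin = bOpenMin; }
    void setOpenMax(boolean bOpenMax) { this.bOpenMax = bOpenMax; }

    // True if dValue lies inside the interval, honouring open and closed boundaries.
    boolean hasValue(double dValue) {
        boolean aboveMin = bOpenMin ? dValue > dMin : dValue >= dMin;
        boolean belowMax = bOpenMax ? dValue < dMax : dValue <= dMax;
        return aboveMin && belowMax;
    }

    // Formats the interval as '[' or '(', min, ',', max, then ')' or ']'.
    @Override
    public String toString() {
        return (bOpenMin ? "(" : "[") + dMin + "," + dMax + (bOpenMax ? ")" : "]");
    }
}
```

Since all comparisons involving NaN are false, this sketch reports hasValue(Double.NaN) as false for every range.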
Package es::uco::kdis::datapro::exception
Figure 50. Package es.uco.kdis.datapro.exception
Name exception Qualified Name es::uco::kdis::datapro::exception
Class IllegalFormatSpecificationException
This class is the exception indicating that the file format under consideration does not conform to the expected specification for that format.
Figure 51. Class IllegalFormatSpecificationException
Name IllegalFormatSpecificationException Qualified Name es::uco::kdis::datapro::exception::IllegalFormatSpecificationException Visibility public Abstract false Base Classifier • Exception Realized Interface
Attribute Detail All attributes are private.
Operation Detail
IllegalFormatSpecificationException
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type Visibility public Is Abstract false Parameter • inout string : String
Relation Detail
Generalization
Name Related Element • Exception
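A sketch of how a reader might report a malformed file header with this exception. The exception class reproduces only the documented message constructor; the FormatChecker helper and the rule it checks (the first line that is neither blank nor a `%` comment must be an ARFF `@relation` declaration) are illustrative assumptions, not library code.

```java
import java.util.List;

// Demo entry point first, so the file can be run with `java FormatCheckDemo.java`.
class FormatCheckDemo {
    public static void main(String[] args) {
        List<String> bad = List.of("% comment", "@attribute x numeric");
        try {
            FormatChecker.checkArffRelation(bad);
        } catch (IllegalFormatSpecificationException e) {
            // prints "Rejected: Expected @relation but found: @attribute x numeric"
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}

// Checked exception with the documented message constructor (sketch).
class IllegalFormatSpecificationException extends Exception {
    public IllegalFormatSpecificationException(String string) {
        super(string);
    }
}

// Hypothetical helper illustrating where the exception would be thrown.
class FormatChecker {
    // Requires the first line that is neither blank nor a '%' comment to be an @relation declaration.
    static void checkArffRelation(List<String> lines) throws IllegalFormatSpecificationException {
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            if (s.toLowerCase().startsWith("@relation")) {
                return; // header is well formed
            }
            throw new IllegalFormatSpecificationException("Expected @relation but found: " + s);
        }
        throw new IllegalFormatSpecificationException("No @relation declaration found");
    }
}
```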
Class NoSuchCategoryException
This class is the exception indicating that a certain element does not belong to the specified category, or that a category is not found.
Figure 52. Class NoSuchCategoryException
Name NoSuchCategoryException Qualified Name es::uco::kdis::datapro::exception::NoSuchCategoryException Visibility public Abstract false Base Classifier • Exception Realized Interface
Attribute Detail All attributes are private.
Operation Detail
NoSuchCategoryException
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type Visibility public Is Abstract false Parameter • inout string : String
Relation Detail
Generalization
Name Related Element • Exception
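A sketch of a category lookup that signals an unknown category with this exception. The CategoryIndex class, its indexOf method and the sample categories are hypothetical; only the exception name and its message constructor come from this guide.

```java
import java.util.List;

// Demo entry point first, so the file can be run with `java CategoryDemo.java`.
class CategoryDemo {
    public static void main(String[] args) {
        CategoryIndex colors = new CategoryIndex(List.of("red", "green", "blue"));
        try {
            System.out.println(colors.indexOf("green")); // prints "1"
            System.out.println(colors.indexOf("pink"));  // throws NoSuchCategoryException
        } catch (NoSuchCategoryException e) {
            System.out.println("Error: " + e.getMessage()); // prints "Error: Category not found: pink"
        }
    }
}

// Checked exception with the documented message constructor (sketch).
class NoSuchCategoryException extends Exception {
    public NoSuchCategoryException(String string) {
        super(string);
    }
}

// Hypothetical list of the categories allowed in a nominal column.
class CategoryIndex {
    private final List<String> categories;

    CategoryIndex(List<String> categories) {
        this.categories = List.copyOf(categories); // defensive, unmodifiable copy
    }

    // Returns the position of the category, or throws if it is not a known category.
    int indexOf(String category) throws NoSuchCategoryException {
        int i = categories.indexOf(category);
        if (i < 0) {
            throw new NoSuchCategoryException("Category not found: " + category);
        }
        return i;
    }
}
```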
Class NotAddedValueException
This class is the exception indicating that a value was not successfully added to the dataset.
Figure 53. Class NotAddedValueException
Name NotAddedValueException Qualified Name es::uco::kdis::datapro::exception::NotAddedValueException Visibility public Abstract false Base Classifier • Exception Realized Interface
Attribute Detail All attributes are private.
Operation Detail
NotAddedValueException
Constructor with the error message as a parameter.
Parameter:
• string Error message
Type Visibility public Is Abstract false Parameter • inout string : String
Relation Detail
Generalization
Name Related Element • Exception
Appendix A: UML diagrams
This appendix shows the class diagrams that represent the structure of datapro4j. Figure 54 is the general package overview; the diagrams of the different packages are shown next.
Figure 54. Class diagram: package overview
Package es.uco.kdis.datapro.algorithm.base
Figure 55. Class diagram: package es.uco.kdis.datapro.algorithm.base
Package es.uco.kdis.datapro.algorithm.preprocessing
Figure 56. Class diagram: Package es.uco.kdis.datapro.algorithm.preprocessing
Package es.uco.kdis.datapro.dataset columns
Figure 57. Class diagram: Package es.uco.kdis.datapro.dataset.Column
Package es.uco.kdis.datapro.dataset.Source
Figure 58. Package es.uco.kdis.datapro.dataset.Source
Package es.uco.kdis.datapro.datatypes
Figure 59. Class diagram: Package es.uco.kdis.datapro.datatypes
Package es.uco.kdis.datapro.exception
Figure 60. Class diagram: Package es.uco.kdis.datapro.exception
Appendix B: Extending the library
Project structure
This project is structured in three different parts:
1. Column structure.
2. Datasets hierarchy.
3. Strategies.
Column structure
To develop new columns, or to adapt an existing one to specific requirements, the programmer should keep in mind the strict separation between abstraction and implementation. The abstraction implements the methods devoted to managing the column meta-information, and delegates any processing, handling or querying of the column's actual values to its implementation. For further information, see the Bridge design pattern (http://en.wikipedia.org/wiki/Bridge_pattern).
We recommend the following guidelines for the development of new columns:
• Column classes should be located in the package es.uco.kdis.datapro.dataset.Column.
• For a given type of column, namely X, the abstraction class will be named XColumn, and its implementation class, XColumnImpl.
• The new column X has to be added to the enumeration ColumnType. This value is returned by the column as its type.
• Column implementations should not be accessed directly by any class other than their abstraction.
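The abstraction/implementation split can be sketched for a hypothetical new column type X = Date, named DateColumn and DateColumnImpl following the naming guideline. Everything else in this sketch is an assumption for illustration: the ColumnValues interface, the method names and the fields do not reproduce the library's actual column API, and the ColumnType enumeration shown here omits the library's existing values.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Demo entry point first, so the file can be run with `java BridgeDemo.java`.
class BridgeDemo {
    public static void main(String[] args) {
        DateColumn col = new DateColumn("birthDate");
        col.addValue(LocalDate.of(2012, 7, 1));
        col.addValue(LocalDate.of(2011, 9, 15));
        System.out.println(col.getName() + " " + col.getType() + " " + col.getSize());
        // prints "birthDate DATE 2"
    }
}

// Hypothetical excerpt of ColumnType: existing values omitted, DATE is the value added for X.
enum ColumnType { DATE }

// Implementor interface: operations on the column's actual values (hypothetical).
interface ColumnValues<T> {
    void addValue(T value);
    T getValue(int index);
    int getSize();
}

// Implementation class (XColumnImpl): stores and handles the values.
class DateColumnImpl implements ColumnValues<LocalDate> {
    private final List<LocalDate> values = new ArrayList<>();

    @Override public void addValue(LocalDate value) { values.add(value); }
    @Override public LocalDate getValue(int index) { return values.get(index); }
    @Override public int getSize() { return values.size(); }
}

// Abstraction class (XColumn): manages meta-information and delegates value handling.
class DateColumn {
    private final String sName;
    private final ColumnValues<LocalDate> impl; // accessed only from this abstraction

    DateColumn(String sName) {
        this.sName = sName;
        this.impl = new DateColumnImpl();
    }

    String getName() { return sName; }
    ColumnType getType() { return ColumnType.DATE; }

    // Value-related operations are forwarded to the implementation.
    void addValue(LocalDate value) { impl.addValue(value); }
    LocalDate getValue(int index) { return impl.getValue(index); }
    int getSize() { return impl.getSize(); }
}
```

Because DateColumn only talks to the ColumnValues interface, the storage strategy could be replaced without touching the meta-information code, which is the point of the Bridge pattern.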
Datasets hierarchy
The library provides a finite number of dataset implementations (ARFF, Keel, CSV, MySQL, ..., and the list keeps growing), but its architecture permits the programmer to extend this part to make other datasets of interest available. Dataset classes should rarely inherit directly from the top-level abstract class Dataset; for design reasons, it is advisable to create, use and maintain a proper class hierarchy in which common (both structural and behavioural) properties are defined. For example, ARFF and CSV datasets inherit from the common file-based dataset, i.e. the abstract class FileDataset. Their respective classes only define those properties that are specific to each kind of file, whereas the properties shared by all file-based datasets are defined by the intermediate abstract class. Dataset is always the root of this hierarchy, since this class links the physical dataset to the logical column structure.
Some guidelines to be considered:
• Dataset abstract classes for defining common properties are located in the package es.uco.kdis.datapro.dataset
• Dataset concrete classes are located in the package es.uco.kdis.datapro.dataset.Source.
• Dataset classes should be named with the suffix “Dataset”, e.g. CsvDataset.
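The recommended layering can be outlined with skeleton classes. Only the class names Dataset, FileDataset and CsvDataset and the method names readDataset and writeDataset come from this guide; the fields, the constructors, the describe method and the parameterless method signatures are simplified assumptions for illustration, and the read/write bodies are left empty.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point first, so the file can be run with `java DatasetHierarchyDemo.java`.
class DatasetHierarchyDemo {
    public static void main(String[] args) {
        Dataset ds = new CsvDataset("iris.csv", ',');
        System.out.println(ds.describe()); // prints "CSV file iris.csv (separator ',')"
    }
}

// Root of the hierarchy: links the physical dataset to the logical column structure.
abstract class Dataset {
    // Stand-in for the logical column structure (hypothetical).
    protected final List<String> columnNames = new ArrayList<>();

    // Reads the physical dataset into the column structure (simplified signature).
    public abstract void readDataset() throws Exception;

    // Writes the current values in this dataset's format (simplified signature).
    public abstract void writeDataset() throws Exception;

    // Hypothetical helper used only by the demo.
    public abstract String describe();
}

// Intermediate abstract class: properties shared by every file-based dataset.
abstract class FileDataset extends Dataset {
    protected final String sFileName;

    protected FileDataset(String sFileName) {
        this.sFileName = sFileName;
    }
}

// Concrete class: only CSV-specific properties are defined here.
class CsvDataset extends FileDataset {
    private final char cSeparator;

    CsvDataset(String sFileName, char cSeparator) {
        super(sFileName);
        this.cSeparator = cSeparator;
    }

    @Override public void readDataset() { /* parse sFileName using cSeparator */ }
    @Override public void writeDataset() { /* write the values using cSeparator */ }

    @Override
    public String describe() {
        return "CSV file " + sFileName + " (separator '" + cSeparator + "')";
    }
}
```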
Apart from the constructor (with or without parameters), the main methods to pay attention to are those inherited from the abstract class Dataset:
• readDataset, which allows the programmer to configure the type of columns to be filled, as well as the dataset structure.
• writeDataset, which permits the programmer to save the current dataset values in the specific format.
These methods should be implemented under the following assumptions:
• When reading, the format may vary or contain errors (invalid values, missing or wrong structure, etc.).
• When reading, the original structure (meta-data) of the dataset should be retained somehow.
• When writing, the dataset may or may not have been read from a dataset of the same type:
o If the source dataset has the same format, the programmer may want to overwrite it or to generate a new dataset. In both cases, the resulting dataset should maintain the same structure (e.g. column types and meta-data) as the source dataset.
o If the dataset to be written is of a different type than the source dataset (or of the same type with a different structure), the programmer may want to specify the type of columns to be declared in the resulting dataset.
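The reading and writing assumptions above can be illustrated with a tolerant reader/writer for numeric CSV lines. This is a hypothetical sketch, not library code: it records the header as the dataset meta-data, turns malformed cells and short rows into missing values instead of failing, and writes the values back with the same header and column order. Missing values are written as "?", an arbitrary choice for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demo entry point first, so the file can be run with `java TolerantCsvDemo.java`.
class TolerantCsvDemo {
    public static void main(String[] args) {
        List<String> input = List.of("sepal,petal", "5.1,1.4", "4.9,abc", "6.2");
        TolerantCsv t = new TolerantCsv();
        t.read(input);
        System.out.println(t.write());
        // prints:
        // sepal,petal
        // 5.1,1.4
        // 4.9,?
        // 6.2,?
    }
}

// Hypothetical reader/writer illustrating the assumptions (not library code).
class TolerantCsv {
    private List<String> header = new ArrayList<>();       // retained meta-data
    private final List<Double[]> rows = new ArrayList<>(); // null marks a missing or invalid value

    // Reads numeric CSV lines; malformed cells and short rows become missing values.
    void read(List<String> lines) {
        header = Arrays.asList(lines.get(0).split(","));
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",");
            Double[] row = new Double[header.size()]; // cells not present stay null
            for (int i = 0; i < row.length && i < cells.length; i++) {
                try {
                    row[i] = Double.parseDouble(cells[i].trim());
                } catch (NumberFormatException e) {
                    row[i] = null; // invalid value: keep the row, mark the cell as missing
                }
            }
            rows.add(row);
        }
    }

    // Writes the values back with the same structure (header and column order) as the source.
    String write() {
        StringBuilder sb = new StringBuilder(String.join(",", header));
        for (Double[] row : rows) {
            sb.append('\n');
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(row[i] == null ? "?" : row[i].toString());
            }
        }
        return sb.toString();
    }
}
```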
Strategies
Strategies are the core and the most extensible element of the library: they implement algorithms on data. Strategies are independent of any specific dataset, so a single strategy can make use of more than one dataset. See DatasetStrategy in this guide for more information on the methods that should be implemented.
To implement your own algorithms, the following guidelines should be considered:
• Every algorithm should be a subclass of DatasetStrategy.
• Algorithms are grouped in packages under es.uco.kdis.datapro.algorithm.
• Only the package es.uco.kdis.datapro.algorithm.base is required by the library. The other packages under es.uco.kdis.datapro.algorithm can be excluded from the programmer’s distribution. Note that each specific algorithm package may have its own external dependencies.
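A skeleton of a user-defined algorithm. The real DatasetStrategy class and the methods it requires are documented elsewhere in this guide; the abstract class below is a hypothetical stand-in (DatasetStrategySketch, with an assumed execute/getResult pair) that only shows the structure: the algorithm is a subclass of the strategy base class and works on more than one dataset, modelled here simply as arrays of numeric values.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point first, so the file can be run with `java StrategyDemo.java`.
class StrategyDemo {
    public static void main(String[] args) {
        List<double[]> datasets = List.of(new double[] {1, 2, 3}, new double[] {10, 20});
        DatasetStrategySketch s = new MeanStrategy();
        s.execute(datasets);
        System.out.println(s.getResult()); // prints "[2.0, 15.0]"
    }
}

// Hypothetical stand-in for DatasetStrategy: an algorithm decoupled from any specific dataset.
abstract class DatasetStrategySketch {
    protected Object result;

    // Runs the algorithm on one or more datasets (modelled here as arrays of values).
    abstract void execute(List<double[]> datasets);

    Object getResult() { return result; }
}

// Example algorithm computing the mean of each dataset; in a real project it would live
// in a package under es.uco.kdis.datapro.algorithm.
class MeanStrategy extends DatasetStrategySketch {
    @Override
    void execute(List<double[]> datasets) {
        List<Double> means = new ArrayList<>();
        for (double[] values : datasets) {
            double sum = 0;
            for (double v : values) {
                sum += v;
            }
            means.add(values.length == 0 ? Double.NaN : sum / values.length);
        }
        result = means;
    }
}
```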
Other packages
Apart from the specific packages for columns, datasets and strategies, there are some other relevant packages that may be extended as well:
• es.uco.kdis.datapro.datatypes: this package implements the auxiliary classes and data types used by datapro4j, for example the classes representing invalid values, ranges, etc.
• es.uco.kdis.datapro.exception: this package implements the exception classes. Before implementing a new exception class, the programmer should check whether a standard Java exception is suitable, so as not to clutter the library with unnecessary classes.
Code documentation
Class headings are documented according to the following structure: class description, contact info and history.
/**
 * CLASS DESCRIPTION
 *
 * <p>
 * CONTACT INFO:
 * <ul>
 * <li>Jose Raul Romero, PhD [[email protected]]
 * <p>{@link http://www.jrromero.net}
 * <p><p>
 * Knowledge Discovery and Intelligent Systems Research Group (KDIS)<p>
 * {@link http://www.uco.es/grupos/kdis}
 * </ul>
 * <p>
 * HISTORY:
 * <ul>
 * <li>INCLUDE HERE THE LIST OF CHANGES TO THIS SPECIFIC FILE
 * </ul>
 * <p>
 * @author Jose Raul Romero (JRR, 0.2, 0.3)    EXAMPLE OF AUTHORS, INITIALS, VERSIONS
 * @author Jose Maria Luna (JML, 0.1)
 * @version 0.3
 */
Every method and its parameters should be documented using Javadoc notation.
Also, remember to include the file license.txt in every distribution that includes the library or part of it.
Coding recommendations
1. Code should be implemented following the Hungarian notation.
2. Code and comments should be written in English.
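As an illustration, the parameters documented in this guide carry type prefixes such as d for double (dMin, dValue), b for boolean (bOpenMax) and o for objects (oMax). The hypothetical class below follows that style; the i and s prefixes for int and String are assumptions extrapolated from the same convention.

```java
// Hypothetical example following the Hungarian-style prefixes used in this guide.
class HungarianExample {
    public static void main(String[] args) {
        String sMessage = "Values above threshold: ";
        int iCount = countAbove(new double[] {0.5, 2.0, 3.5}, 1.0, false);
        System.out.println(sMessage + iCount); // prints "Values above threshold: 2"
    }

    // Counts the values greater than dThreshold (or equal to it, when bInclusive is true).
    static int countAbove(double[] dValues, double dThreshold, boolean bInclusive) {
        int iCount = 0;
        for (double dValue : dValues) {
            if (dValue > dThreshold || (bInclusive && dValue == dThreshold)) {
                iCount++;
            }
        }
        return iCount;
    }
}
```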