BioHEL System | Our approach | Results | Summary
Post-processing Operators for Decision Lists
María A. Franco
Supervisor: Jaume Bacardit
University of Nottingham, UK
ICOS Research Group, School of Computer Science
June 12, 2012
María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 1 / 29
BioHEL SystemOur approach
ResultsSummary
Motivation
Goal of my PhD project
To enhance evolutionary learning systems based on Iterative Rule Learning (IRL), such as BioHEL, so that they work better with large-scale datasets.
How have we been doing this?
Analysing the weaknesses of the system in different domains [Franco et al., 2012a]
Improving the execution time by means of GPGPUs [Franco et al., 2010]
Developing theoretical models that allow us to adapt parameters within the system [Franco et al., 2011]
Improving the quality of the final solutions by means of local search (memetic operators) [Franco et al., 2012b]
Goal of this work
To improve the quality of decision lists by means of local search (memetic operators).
Decision lists are a widespread paradigm in rule learning, guided local search and supervised learning.
Examples:
Pittsburgh Learning Classifier Systems
Rule induction systems in mainstream machine learning (PART, CN2, JRip)
Outline
1 BioHEL
  Attribute List Knowledge Representation
  Structure of the solutions
  What is the problem?
2 Our approach: Post-processing the rules
  Swapping
  Pruning
  Cleaning
3 Results
4 Summary
  Where to go from here?
Introduction to the BioHEL System
BIOinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) [Bacardit et al., 2009]
BioHEL is an evolutionary learning system that employs the Iterative Rule Learning (IRL) paradigm.
BioHEL was specifically designed to cope with large-scale datasets.
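In IRL, rules are learned one at a time: each new rule is learned from the examples not yet covered, and the examples it covers are then removed (a separate-and-conquer loop). The sketch below illustrates that loop under stated assumptions: BioHEL learns each rule with a genetic algorithm, whereas here a hypothetical `best_candidate` learner simply picks, from a fixed pool of candidate rules, the one with the highest accuracy (then coverage) on the remaining examples. Rules are (conditions, class) pairs over discrete attributes; all names are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Instance = Tuple[int, ...]
Example = Tuple[Instance, int]
Rule = Tuple[Dict[int, int], int]  # (attribute index -> required value, predicted class)


def covers(rule: Rule, x: Instance) -> bool:
    return all(x[a] == v for a, v in rule[0].items())


def iterative_rule_learning(data: Sequence[Example],
                            learn_rule: Callable[[List[Example]], Optional[Rule]]) -> List[Rule]:
    """Separate-and-conquer: learn a rule, remove the examples it covers, repeat."""
    remaining = list(data)
    decision_list: List[Rule] = []
    while remaining:
        rule = learn_rule(remaining)
        if rule is None or not any(covers(rule, x) for x, _ in remaining):
            break  # nothing left that the learner can cover
        decision_list.append(rule)  # the list keeps the order in which rules were learned
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return decision_list


def best_candidate(candidates: Sequence[Rule]) -> Callable[[List[Example]], Optional[Rule]]:
    """Hypothetical stand-in for BioHEL's GA: pick the candidate with the best (accuracy, coverage)."""
    def learn(remaining: List[Example]) -> Optional[Rule]:
        best, best_score = None, (0.0, 0)
        for rule in candidates:
            hits = [y for x, y in remaining if covers(rule, x)]
            if not hits:
                continue
            score = (sum(y == rule[1] for y in hits) / len(hits), len(hits))
            if score > best_score:  # strict: ties keep the earlier candidate
                best, best_score = rule, score
        return best
    return learn


# Toy problem: class 1 iff x1 = 1 and x3 = 0 (attributes indexed 0..2).
DATA = [((a, b, c), int(a == 1 and c == 0)) for a, b, c in product((0, 1), repeat=3)]
CANDIDATES = [({0: 0}, 0), ({2: 1}, 0), ({0: 1, 2: 0}, 1)]
print(iterative_rule_learning(DATA, best_candidate(CANDIDATES)))
# [({0: 0}, 0), ({2: 1}, 0), ({0: 1, 2: 0}, 1)]
```

Because each rule is learned only on what earlier rules left uncovered, the output is naturally an ordered list rather than an unordered rule set.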
Attribute List Knowledge Representation
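A meta-representation designed to handle large numbers of discrete and continuous attributes quickly [Bacardit and Krasnogor, 2009]: each rule stores only the attributes it actually expresses.
ALKR classifier example
[Figure: an ALKR classifier, with the fields numAtt, whichAtt, offsetPred, predicates and class]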
Discrete attributes: GABIL representation
F1 ∈ {A, B, C}: 1 0 0    F2 ∈ {D, E}: 0 1    F3 ∈ {F, G, H, I}: 1 1 0 1
F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I)
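The GABIL encoding above keeps one bit per value, and a rule matches an instance when the bit of the instance's value is set for every attribute. A minimal sketch of that check: the dictionaries and `gabil_match` are illustrative names (not BioHEL's code), and instances are assumed to be given as attribute-to-value maps.

```python
from typing import Dict, List, Mapping, Sequence

# Domains of the three discrete attributes in the example above.
DOMAINS: Dict[str, Sequence[str]] = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}

# One bit per value: 1 means the value is allowed by the rule.
RULE: Dict[str, List[int]] = {"F1": [1, 0, 0], "F2": [0, 1], "F3": [1, 1, 0, 1]}


def gabil_match(rule: Mapping[str, List[int]],
                domains: Mapping[str, Sequence[str]],
                x: Mapping[str, str]) -> bool:
    # The rule matches if, for every attribute, the bit of the instance's value is 1.
    return all(bits[domains[att].index(x[att])] == 1 for att, bits in rule.items())


print(gabil_match(RULE, DOMAINS, {"F1": "A", "F2": "E", "F3": "I"}))  # True
print(gabil_match(RULE, DOMAINS, {"F1": "A", "F2": "E", "F3": "H"}))  # False
```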
Continuous attributes: hyper-rectangle representation
C1 = [0.1,0.3] ∧ C2 = [0.7,0.9]
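Combining the ALKR idea with hyper-rectangle predicates, a rule only needs the indices of its expressed attributes (whichAtt, numAtt of them) and one interval per expressed attribute. A minimal sketch, assuming continuous attributes only: `AlkrRule` is a hypothetical class whose fields mirror the figure labels, and offsetPred (used to locate variable-length predicates such as GABIL masks) is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class AlkrRule:
    which_att: List[int]                   # indices of the expressed attributes only
    predicates: List[Tuple[float, float]]  # one [lower, upper] interval per expressed attribute
    cls: int                               # predicted class

    @property
    def num_att(self) -> int:
        # numAtt: how many attributes the rule expresses
        return len(self.which_att)

    def matches(self, x: Sequence[float]) -> bool:
        # Only the expressed attributes are tested; the others are implicitly "don't care",
        # so matching cost depends on num_att, not on the total number of attributes.
        return all(lo <= x[a] <= hi for a, (lo, hi) in zip(self.which_att, self.predicates))


# C1 = [0.1, 0.3] ∧ C2 = [0.7, 0.9], with C1 and C2 stored at attribute positions 0 and 2.
rule = AlkrRule(which_att=[0, 2], predicates=[(0.1, 0.3), (0.7, 0.9)], cls=1)
print(rule.num_att)                    # 2
print(rule.matches([0.2, 5.0, 0.8]))   # True
print(rule.matches([0.2, 5.0, 0.95]))  # False
```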
Solutions generated by the BioHEL system
Since BioHEL uses IRL [Venturini, 1993], its solutions are hierarchical (ordered) sets of rules, i.e. decision lists: each instance is classified by the first rule that matches it.
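Because the first matching rule decides, the same rules can give different predictions in a different order. A minimal sketch, assuming rules are (conditions, class) pairs over discrete attributes and a hypothetical default class for instances that no rule covers:

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[Dict[int, int], int]  # (attribute index -> required value, predicted class)


def classify(decision_list: List[Rule], default_class: int, x: Sequence[int]) -> int:
    # Rules are tried top to bottom; the first rule whose conditions all hold decides.
    for conditions, cls in decision_list:
        if all(x[a] == v for a, v in conditions.items()):
            return cls
    # No rule matched: fall back to the (hypothetical) default class.
    return default_class


rules = [({0: 1}, 1), ({2: 1}, 0)]
print(classify(rules, 0, (1, 0, 1)))                  # 1 (first rule fires)
print(classify(list(reversed(rules)), 0, (1, 0, 1)))  # 0 (the other rule fires first)
```

Both rules cover (1, 0, 1), so their relative order alone determines its class.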
How can the rules be improved further?
We encountered the following problems:
First, the rules were learned in the wrong order, which leads to larger rulesets.
Second, the rules did not have the correct specificity: the number of expressed attributes was rather high.
Example problem: the target concept is x1 = 1 ∧ x3 = 0
Truth table (x1 x2 x3 → class): 000 → 0, 001 → 0, 010 → 0, 011 → 0, 100 → 1, 101 → 0, 110 → 1, 111 → 0
Good (one rule): x1 = 1 ∧ x3 = 0
Over-specific (two rules): x1 = 1 ∧ x2 = 1 ∧ x3 = 0 and x1 = 1 ∧ x2 = 0 ∧ x3 = 0
Our approach: Post-processing the rules
Ruleset-wise operators: Rule swapping
Rule-wise operators: Pruning, Cleaning
Rule Swapping
Swapping consists of changing the order of the rules in the final ruleset. Which rules should we swap? ⇒ Those that are similar.
Measure of similarity
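$$ S(i,j) = \frac{\mathit{Dis}}{\mathit{NA}} \, \frac{\sum_{k}^{\mathit{Dis}} S_k(i,j)}{\sum_{k}^{\mathit{Dis}} \mathrm{numVals}(k)} + \frac{\mathit{Real}}{\mathit{NA}} \sum_{k}^{\mathit{Real}} S_k(i,j) + \frac{\mathit{Mi}}{\mathit{NA}} $$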
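It measures the overlap between two rules. S_k(i,j) is the overlap of rules i and j on attribute k, the sums run over the discrete (Dis) and continuous (Real) attributes, and NA presumably denotes the total number of attributes.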
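The similarity formula can be evaluated directly once the per-attribute overlaps are fixed. This is a sketch under loudly stated assumptions: the slide does not define S_k or Mi, so here S_k is assumed to be the number of values allowed by both rules for a discrete attribute (a bitwise AND of GABIL masks, which fits the numVals normalisation) and the intersection length of the two intervals for a continuous attribute normalised to [0, 1]; Mi is simply passed in. `rule_similarity` and its arguments are hypothetical names.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def discrete_overlap(bits_i: Sequence[int], bits_j: Sequence[int]) -> int:
    # Assumed S_k for a discrete attribute: number of values allowed by both rules.
    return sum(a & b for a, b in zip(bits_i, bits_j))


def interval_overlap(iv_i: Interval, iv_j: Interval) -> float:
    # Assumed S_k for a [0, 1]-normalised continuous attribute: intersection length.
    return max(0.0, min(iv_i[1], iv_j[1]) - max(iv_i[0], iv_j[0]))


def rule_similarity(disc_i: Sequence[Sequence[int]], disc_j: Sequence[Sequence[int]],
                    real_i: Sequence[Interval], real_j: Sequence[Interval],
                    n_att: int, mi: int) -> float:
    dis, real = len(disc_i), len(real_i)
    num_vals = sum(len(bits) for bits in disc_i)  # sum of numVals(k) over discrete attributes
    s_dis = sum(discrete_overlap(a, b) for a, b in zip(disc_i, disc_j))
    s_real = sum(interval_overlap(a, b) for a, b in zip(real_i, real_j))
    disc_term = (dis / n_att) * (s_dis / num_vals) if num_vals else 0.0
    return disc_term + (real / n_att) * s_real + mi / n_att


# Two discrete attributes (3 and 2 values) and one continuous attribute.
s = rule_similarity([[1, 0, 0], [0, 1]], [[1, 1, 0], [0, 1]],
                    [(0.0, 0.5)], [(0.25, 1.0)], n_att=3, mi=0)
print(round(s, 4))  # 0.35
```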
How does it work?
Swapping helps to erase unnecessary rules.
It does not ensure that the final ruleset is minimal.
It has to re-evaluate the rules in the new order in each iteration.
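The swapping loop can be sketched as follows, under assumptions not taken from the slides: rules are tried in adjacent pairs rather than ranked by the similarity S(i,j); a swap is kept only if, after removing rules that are no longer the first match for any training instance, the list is shorter and training accuracy has not dropped. `swap_rules`, `drop_unused` and the default class are hypothetical names.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[int, ...]
Rule = Tuple[Dict[int, int], int]  # (attribute index -> required value, predicted class)


def matches(rule: Rule, x: Instance) -> bool:
    return all(x[a] == v for a, v in rule[0].items())


def predict(rules: List[Rule], default: int, x: Instance) -> int:
    for rule in rules:
        if matches(rule, x):
            return rule[1]
    return default


def accuracy(rules: List[Rule], default: int, data: Sequence[Tuple[Instance, int]]) -> float:
    return sum(predict(rules, default, x) == y for x, y in data) / len(data)


def drop_unused(rules: List[Rule], data: Sequence[Tuple[Instance, int]]) -> List[Rule]:
    # Keep only rules that are the first match for at least one training instance.
    used = set()
    for x, _ in data:
        for i, rule in enumerate(rules):
            if matches(rule, x):
                used.add(i)
                break
    return [r for i, r in enumerate(rules) if i in used]


def swap_rules(rules: List[Rule], default: int, data: Sequence[Tuple[Instance, int]]) -> List[Rule]:
    current = list(rules)
    improved = True
    while improved:
        improved = False
        base = accuracy(current, default, data)
        for i in range(len(current) - 1):
            swapped = current[:i] + [current[i + 1], current[i]] + current[i + 2:]
            candidate = drop_unused(swapped, data)  # re-evaluate the list in its new order
            if len(candidate) < len(current) and accuracy(candidate, default, data) >= base:
                current, improved = candidate, True
                break
    return current


DATA = [((a, b, c), int(a == 1 and c == 0)) for a, b, c in product((0, 1), repeat=3)]
over_specific_first = [({0: 1, 1: 1, 2: 0}, 1), ({0: 1, 2: 0}, 1)]
print(swap_rules(over_specific_first, 0, DATA))  # [({0: 1, 2: 0}, 1)]
```

Requiring the list to shrink guarantees termination, but a greedy pass like this can still stop at a non-minimal ruleset.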
Rule pruning
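Pruning drops attributes that do not affect the accuracy of the rule. In the x1 = 1 ∧ x3 = 0 example, dropping x2 from either over-specific rule yields the good rule x1 = 1 ∧ x3 = 0.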
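Pruning can be sketched as a greedy pass that tries to drop each condition in turn and keeps the drop if the rule's training accuracy does not decrease. A minimal sketch, assuming discrete rules stored as attribute-to-value maps and rule accuracy measured as the fraction of covered training examples that belong to the rule's class; `prune_rule` and `rule_accuracy` are hypothetical names.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Instance = Tuple[int, ...]
Conditions = Dict[int, int]  # attribute index -> required value


def rule_accuracy(conds: Conditions, cls: int, data: Sequence[Tuple[Instance, int]]) -> float:
    covered = [y for x, y in data if all(x[a] == v for a, v in conds.items())]
    if not covered:
        return 0.0
    return sum(y == cls for y in covered) / len(covered)


def prune_rule(conds: Conditions, cls: int, data: Sequence[Tuple[Instance, int]]) -> Conditions:
    current = dict(conds)
    base = rule_accuracy(current, cls, data)
    for att in sorted(conds):
        trial = {a: v for a, v in current.items() if a != att}  # rule without this attribute
        acc = rule_accuracy(trial, cls, data)
        if acc >= base:  # the attribute did not help: drop it
            current, base = trial, acc
    return current


DATA = [((a, b, c), int(a == 1 and c == 0)) for a, b, c in product((0, 1), repeat=3)]
print(prune_rule({0: 1, 1: 1, 2: 0}, 1, DATA))  # {0: 1, 2: 0}, i.e. x1 = 1 ∧ x3 = 0
```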
Pruning alone is not enough: it does not work if the other attributes are not correctly specified.
This motivates the third rule-wise operator: cleaning.
Rule cleaning
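In χ-ary (multi-valued nominal) domains it is not always possible to drop attributes if the relevant attributes are misaligned, i.e. do not have exactly the right values.
Example problem: x1 nominal {a, b, c, d, e}; x2 nominal {w, y, z}; x3 nominal {m, n}
Rule 1: x1 = (a ∨ b) ∧ x2 = w
Generated rule: x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m
We need to deactivate literals within the attributes (here, value c of x1); otherwise the extra attribute x3 may not be prunable.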
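How does it work?
Cleaning approaches:
CL: focus on the positives
CL2: do not infer
Continuous attributes: the original interval (OLD) covers positive (+) and negative (-) examples; the nested brackets are, from outer to inner, OLD, CL2 and CL:
(- - - - ( (+ - + + + + - + - +) ) - - -)
CL shrinks the interval to the span of the positive examples, while CL2 shrinks it less, as the figure suggests only far enough to exclude the negatives outside that span.
Discrete attributes (values a to f):
OLD: 1 1 1 0 1 1
Values covered by positive examples: a, b, c
Values covered by negative examples: c, e
CL: 1 1 1 0 0 0 (keeps only the values seen in positive examples)
CL2: 1 1 1 0 0 1 (removes only e, the value seen exclusively in negative examples; f, which no example covers, is kept)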
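For discrete attributes, both cleaning policies reduce to a per-value filter over the rule's bit mask. A minimal sketch that reproduces the CL and CL2 masks of the example, assuming the sets of values seen in covered positive and negative examples are already known; the continuous case is shown only for CL (shrink the interval to the span of the positives), because the exact CL2 boundary is not specified on the slide. `clean_discrete` and `clean_interval_cl` are hypothetical names.

```python
from typing import List, Sequence, Set, Tuple


def clean_discrete(mask: Sequence[int], values: Sequence[str],
                   pos_values: Set[str], neg_values: Set[str], mode: str) -> List[int]:
    cleaned = []
    for bit, v in zip(mask, values):
        if not bit:
            cleaned.append(0)  # inactive values stay inactive
        elif mode == "CL":
            # Focus on the positives: keep only values seen in positive examples.
            cleaned.append(1 if v in pos_values else 0)
        elif mode == "CL2":
            # Do not infer: drop only values seen exclusively in negative examples.
            cleaned.append(0 if v in neg_values and v not in pos_values else 1)
        else:
            raise ValueError(f"unknown cleaning mode: {mode}")
    return cleaned


def clean_interval_cl(positives: Sequence[float]) -> Tuple[float, float]:
    # CL on a continuous attribute: tightest interval around the covered positives.
    return min(positives), max(positives)


OLD = [1, 1, 1, 0, 1, 1]
print(clean_discrete(OLD, "abcdef", {"a", "b", "c"}, {"c", "e"}, "CL"))   # [1, 1, 1, 0, 0, 0]
print(clean_discrete(OLD, "abcdef", {"a", "b", "c"}, {"c", "e"}, "CL2"))  # [1, 1, 1, 0, 0, 1]
```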
Experimental design
We analysed the operators on the final rulesets generated for 35 real-world problems, in 3 stages of experiments:
Each operator applied independently
Combinations of CL and PR
Combinations with the SW operator
Questions
Where are the most significant improvements?
Are the results significant?
What about the computational time?
Results of the operators independently
[Figure: % of variation in the number of expressed attributes (Atts), number of rules (Rules), test accuracy (Test_acc) and ensemble test accuracy (Test_ensemble) for each of the 35 datasets, for the operators CL, CL2, PR and SW applied independently]
Results of combining CL and PR
[Figure: % of variation in expressed attributes (Atts), test accuracy (Test_acc) and ensemble test accuracy (Test_ensemble) for each dataset, in one panel group for CL and one for CL2; combinations CL-PR, PR-CL, PR-CL-PR, CL2-PR, PR-CL2 and PR-CL2-PR]
Results of combining CL, PR and SW
[Figure: % of variation in Atts, Rules, Test_acc and Test_ensemble for each dataset; combinations CL-SW, CL2-SW, PR-SW and PR-CL2-PR-SW]
Are the results significant?
Table: Average rankings from the Friedman statistical tests (lower rank is better). * indicates that the algorithm is significantly better (Holm test with 99% confidence).

Algorithm      Test acc  Test ensemble  # Rules  # Atts
P-value        0.708     0.962          8.9e-09  2.2e-16
Base           7.80      7.07           3.73     10.84
CL             7.73      7.86           –        10.84
CL2            7.64      7.84           –        10.84
PR             7.57      7.21           –        5.53 *
SW             7.51      6.60           2.59 *   11.30
CL-PR          6.37      7.29           –        3.97 *
PR-CL          6.67      7.31           –        5.53 *
PR-CL-PR       5.87      6.79           –        1.51 *
CL2-PR         6.59      6.79           –        5.81 *
PR-CL2         6.89      7.16           –        5.71 *
PR-CL2-PR      6.36      6.91           –        2.29 *
CL-SW          7.14      6.51           2.07 *   11.23
CL2-SW         7.46      6.83           2.40 *   11.17
PR-SW          6.94      6.29           2.14 *   5.94 *
PR-CL2-PR-SW   6.46      6.54           2.07 *   2.47 *
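The Friedman test works on per-dataset ranks, and the table reports each algorithm's rank averaged over the datasets. A minimal standard-library sketch of how such average ranks are computed, with tied values sharing the mean of their positions (the usual convention); the Friedman statistic itself and the Holm post-hoc step are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def rank_row(values: Sequence[float], higher_is_better: bool) -> List[float]:
    """Rank one dataset's results (1 = best); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def average_ranks(table: Sequence[Sequence[float]], higher_is_better: bool) -> List[float]:
    """table[d][a] is the score of algorithm a on dataset d."""
    per_dataset = [rank_row(row, higher_is_better) for row in table]
    return [sum(r[a] for r in per_dataset) / len(per_dataset) for a in range(len(table[0]))]


# Two datasets, three algorithms, scored by number of attributes (lower is better).
print(average_ranks([[10, 8, 8], [12, 9, 7]], higher_is_better=False))  # [3.0, 1.75, 1.25]
```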
How long does the post-processing take?
Table: Execution time of each operator applied independently (Prob: problem; Ins: number of instances; times in seconds)

Prob    Ins     Rules            Atts          CL2 (s)          PR (s)          SW (s)
CN-bin  493788  38.20 ± 1.85     7.12 ± 0.73   17.44 ± 0.76     20.52 ± 0.82    157.51 ± 76.42
Adult   43960   194.24 ± 10.26   10.18 ± 2.80  49.87 ± 3.85     69.60 ± 10.22   5855.04 ± 874.14
CN      234638  253.34 ± 12.48   10.09 ± 2.78  314.02 ± 26.01   631.68 ± 70.09  43097.44 ± 5429.48
KDD     444619  188.84 ± 13.52   4.25 ± 2.99   213.95 ± 18.25   375.85 ± 59.00  23791.21 ± 5041.45
C-4     60803   316.14 ± 19.10   9.96 ± 3.23   96.49 ± 8.33     192.21 ± 24.76  18763.03 ± 2614.41
ParMX   235929  394.34 ± 19.39   9.00 ± 0.01   405.77 ± 37.05   619.20 ± 82.02  106343.70 ± 13094.78
SS1     75583   773.26 ± 30.42   11.49 ± 3.40  293.70 ± 23.26   649.51 ± 85.94  133415.03 ± 19160.27

Swapping is very slow: its cost depends on the number of instances and the number of rules generated (e.g. about 37 hours for SS1, with 773 rules on average, versus under 3 minutes for CN-bin, with 38 rules).
Summary and next steps
Summary
The operators reduce the number of rules and of expressed attributes by around 30% in some cases.
Next steps
Apply the CL and PR operators during the learning process
Investigate other similarity measures among rules
Apply these operators to other systems and different representations
CUDA-accelerated operators?
References
Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.
Bacardit, J. and Krasnogor, N. (2009). A mixed discrete-continuous attribute list representation for large scale classification domains. In GECCO ’09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 1155–1162. ACM Press, New York, NY, USA.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2012a). Analysing BioHEL using challenging boolean functions. Evolutionary Intelligence, 5:87–102. doi:10.1007/s12065-012-0080-9.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2010). Speeding up the evaluation of evolutionary learning systems using GPGPUs. In GECCO ’10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1039–1046. ACM, New York, NY, USA.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2011). Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding. In GECCO ’11: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1291–1298. ACM, New York, NY, USA.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b). Postprocessing operators for decision lists. In GECCO ’12: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, to appear. ACM Press, New York, NY, USA.
Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93, Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.
Questions or comments?