Analysing BioHEL Using Challenging Boolean Functions


Page 1: Analysing BioHEL Using Challenging Boolean Functions

BioHEL GBML System | k-Disjunctive Normal functions | Experiments | Conclusions and Further Work

Analysing BioHEL Using Challenging Boolean Functions

María A. Franco, Natalio Krasnogor and Jaume Bacardit

University of Nottingham, UK, ASAP Research Group, School of Computer Science
{mxf,nxk,jqb}@cs.nott.ac.uk

July 8, 2010

M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham Analysing BioHEL Using Boolean Functions 1 / 27


1 BioHEL GBML System
  Characteristics of the system
  BioHEL fitness function
  Open questions for BioHEL

2 k-Disjunctive Normal functions

3 Experiments
  Experiment Setup
  Iterations and execution time
  Learning and overgeneralisation

4 Conclusions and Further Work


The BioHEL GBML System

- BIOinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) [Bacardit et al., 2009]
- BioHEL was designed to handle large-scale bioinformatics datasets [Stout et al., 2008]
- BioHEL is a GBML system that employs the Iterative Rule Learning (IRL) paradigm
  - First used in EC in Venturini's SIA system [Venturini, 1993]
  - Widely used for both fuzzy and non-fuzzy evolutionary learning
- BioHEL inherits most of its components from GAssist [Bacardit, 2004], a Pittsburgh GBML system


Iterative Rule Learning

IRL has been used for many years in the ML community under the name separate-and-conquer.

Algorithm 1.1: ITERATIVERULELEARNING(Examples)

    Theory ← ∅
    while Examples ≠ ∅ do
        Rule ← FindBestRule(Examples)
        Covered ← Cover(Rule, Examples)
        if RuleStoppingCriterion(Rule, Theory, Examples)
            then exit
        Examples ← Examples − Covered
        Theory ← Theory ∪ {Rule}
    return (Theory)
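The loop of Algorithm 1.1 can be sketched in Python as below. This is a minimal illustration, not BioHEL's implementation: `find_best_rule` is a hypothetical stand-in that searches short conjunctions exhaustively, where BioHEL would run a genetic algorithm, and the sketch assumes a default class of 0, so it stops once no positive examples remain rather than when the example set is empty.

```python
from itertools import combinations, product


def matches(rule, x):
    """A rule maps attribute index -> required bit; it matches x if every test holds."""
    return all(x[i] == v for i, v in rule.items())


def find_best_rule(examples, d, max_len):
    """Hypothetical stand-in for the evolutionary search (not BioHEL's GA).

    Returns the conjunction of at most max_len attributes that covers the
    most examples while covering only positive ones, or None if there is none.
    """
    best, best_cover = None, 0
    for n in range(1, max_len + 1):
        for attrs in combinations(range(d), n):
            for vals in product((0, 1), repeat=n):
                rule = dict(zip(attrs, vals))
                labels = [y for x, y in examples if matches(rule, x)]
                if labels and all(labels) and len(labels) > best_cover:
                    best, best_cover = rule, len(labels)
    return best


def iterative_rule_learning(examples, d, max_len=2):
    """Separate-and-conquer: learn one rule, remove what it covers, repeat."""
    theory = []
    remaining = list(examples)
    # Assumption: class 0 is the default class, so only positives need rules.
    while any(y for _, y in remaining):
        rule = find_best_rule(remaining, d, max_len)
        if rule is None:  # stopping criterion: no acceptable rule was found
            break
        remaining = [(x, y) for x, y in remaining if not matches(rule, x)]
        theory.append(rule)
    return theory
```

Each pass removes the examples covered by the new rule, so later rules are learned only on what earlier rules left unexplained.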


Characteristics of BioHEL

- A fitness function based on the Minimum Description Length (MDL) principle (Rissanen, 1978) that tries to
  - Evolve accurate rules
  - Evolve high-coverage rules
  - Evolve rules with low complexity, as general as possible
- The ILAS windowing scheme
  - Efficiency enhancement method: not all training points are used for each fitness computation
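The windowing idea can be sketched as follows, assuming that the training set is split into disjoint strata and that each iteration evaluates fitness on a single stratum, chosen in round-robin order. The function names, the `num_strata` parameter and the plain shuffle are illustrative assumptions, not BioHEL's code; a faithful version would also be expected to preserve class proportions within each stratum.

```python
import random


def make_strata(examples, num_strata, seed=0):
    """Shuffle the training set and deal it into num_strata disjoint subsets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    # Slicing with a stride deals the examples out like cards.
    return [shuffled[i::num_strata] for i in range(num_strata)]


def stratum_for_iteration(strata, iteration):
    """Pick the stratum used for fitness computation at a given iteration."""
    return strata[iteration % len(strata)]
```

With `num_strata` strata, each fitness evaluation touches roughly `1 / num_strata` of the training points, which is where the efficiency gain comes from.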


Characteristics of BioHEL

- The Attribute List Knowledge representation
  - Representation designed to handle high-dimensionality domains
- An explicit default rule mechanism
  - Generates more compact rule sets
- Ensembles for consensus prediction
  - A simple way to boost robustness
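The default rule and the consensus prediction can be illustrated with a small sketch. It assumes a rule set is an ordered list of `(condition, class)` pairs in which the first matching rule wins, and that the ensemble combines rule sets by a simple majority vote; both choices, and all names below, are assumptions made for illustration rather than details taken from the slides.

```python
from collections import Counter


def predict(rule_set, default_class, x):
    """Return the class of the first matching rule, else the default class."""
    for condition, label in rule_set:
        if all(x[i] == v for i, v in condition.items()):
            return label
    # No explicit rule matched: the default rule covers the example.
    return default_class


def ensemble_predict(rule_sets, default_class, x):
    """Majority vote over the predictions of several independently learned rule sets."""
    votes = Counter(predict(rs, default_class, x) for rs in rule_sets)
    return votes.most_common(1)[0][0]
```

Because the default class needs no explicit rules, the rule set only has to describe the other class or classes, which is why it tends to be more compact.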


BioHEL fitness function

- The coverage term penalises rules that do not cover a minimum percentage of examples
- Choosing the coverage break changes the behaviour and performance of the entire system
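The slide does not give the formula for the coverage term. One plausible shape, assumed here purely for illustration, is piecewise linear: the reward rises steeply until the rule's coverage reaches the coverage break, then much more slowly up to full coverage. The parameter `initial_reward` (the reward granted exactly at the break) and its default of 0.9 are hypothetical.

```python
def coverage_term(coverage, coverage_break, initial_reward=0.9):
    """Piecewise-linear coverage reward in [0, 1] (illustrative assumption).

    coverage:       fraction of training examples the rule matches
    coverage_break: coverage at which most of the reward has been earned
    initial_reward: reward at the break (hypothetical parameter)
    """
    if not 0.0 < coverage_break < 1.0:
        raise ValueError("coverage_break must lie strictly between 0 and 1")
    if coverage < coverage_break:
        # Below the break: reward grows quickly, so tiny rules are penalised.
        return initial_reward * coverage / coverage_break
    # Above the break: the remaining reward is spread over the rest of the range.
    return initial_reward + (1.0 - initial_reward) * (
        (coverage - coverage_break) / (1.0 - coverage_break)
    )
```

Under this shape a small coverage break gives almost full reward to very specific rules, while a large one pushes the search towards general rules.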


Open questions for BioHEL

- Does a single coverage break work for the same family of problems?
- How difficult is it to hand-tune the coverage break?
- What is the performance impact of the coverage break when it is not properly adjusted?


Motivation of the paper
The motivation of the paper is to answer these questions. We used k-DNF problems to test the system exhaustively on problems of varying difficulty.


k-Disjunctive Normal functions

- r disjunctive terms
- d possible attributes
- k attributes represented in each term

Example: d = 10, k = 3, r = 3

(¬x1 ∧ x5 ∧ x7) ∨ (x1 ∧ ¬x2 ∧ x8) ∨ (x4 ∧ ¬x5 ∧ ¬x9)
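A k-DNF function of this kind is easy to represent and evaluate in code. The sketch below encodes each term as a mapping from a 1-based attribute index to the truth value it requires, writes out the example above as `EXAMPLE`, and adds a random generator. The generator's sampling choices (distinct attributes per term, each negated with probability 0.5) are assumptions, since the slides do not say how the benchmark functions were drawn.

```python
import random

# The example from the slide: d = 10, k = 3, r = 3.
EXAMPLE = [
    {1: False, 5: True, 7: True},   # (not x1 and x5 and x7)
    {1: True, 2: False, 8: True},   # (x1 and not x2 and x8)
    {4: True, 5: False, 9: False},  # (x4 and not x5 and not x9)
]


def evaluate(terms, x):
    """x maps attribute index (1-based) to bool; true if any term is satisfied."""
    return any(all(x[a] == v for a, v in term.items()) for term in terms)


def random_kdnf(d, k, r, seed=0):
    """Draw r terms, each over k distinct attributes out of d (assumed sampling)."""
    rng = random.Random(seed)
    terms = []
    for _ in range(r):
        attrs = rng.sample(range(1, d + 1), k)
        terms.append({a: rng.random() < 0.5 for a in attrs})
    return terms
```

For instance, an input with only x5 and x7 set satisfies the first term of `EXAMPLE`, so the function is true for it, while the all-false input satisfies no term.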


k-DNF class imbalance

[Figure: probability of having a negative example, (1 - 2^{-k})^r, plotted against k (attributes expressed, 2 to 10) and r (number of terms, 5 to 50); the probability axis runs from 0 to 1.]
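The expression (1 - 2^{-k})^r follows from a short argument: for an input drawn uniformly at random, a single term with k literals is satisfied with probability 2^{-k}, and the example is negative only if all r terms fail. Treating the terms as independent is an approximation when terms share attributes. A minimal sketch of the calculation, with a hypothetical function name:

```python
def p_negative(k, r):
    """Probability that a uniformly random input is a negative example.

    Assumes the r terms fail independently, each with probability 1 - 2**-k.
    """
    return (1.0 - 2.0 ** (-k)) ** r
```

With small k and large r almost every example is positive, and with large k and small r almost every example is negative, which is the class imbalance the slide refers to.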


Experimental setup

- 90 different k-DNF scenarios
  - d = 20
  - k ranging between 2 and 10
  - r ranging between 5 and 50
- 5 different coverage breaks
- We show results in terms of:
  - Iterations to learn an optimal k-DNF term
  - Number of cases where the system learned and where it overgeneralised
- Using a fixed default class and the majority policy
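The count of 90 scenarios is consistent with 9 values of k and 10 values of r. The sketch below enumerates that grid, assuming r advances in steps of 5 (as the axis ticks on the result slides suggest) and taking the five coverage breaks from the values that appear on the learning-map slides; the variable names are illustrative.

```python
from itertools import product

D = 20                                            # number of attributes
K_VALUES = range(2, 11)                           # k = 2, ..., 10
R_VALUES = range(5, 51, 5)                        # r = 5, 10, ..., 50 (step assumed)
COVERAGE_BREAKS = (0.0001, 0.001, 0.01, 0.1, 0.5)

# One scenario per (k, r) pair; every scenario is run with each coverage break.
scenarios = [(D, k, r) for k, r in product(K_VALUES, R_VALUES)]
configurations = [(s, cb) for s in scenarios for cb in COVERAGE_BREAKS]
```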


Iterations to learn an optimal k-DNF term

[Figure: number of iterations to find a good rule, plotted against k (number of terms in the rule, 2 to 10) and r (number of rules, 5 to 50), one series per coverage break: 0.0001, 0.001, 0.01, 0.1.]

Model: z = a*k + b*r + c*r^2 + d, with a > b > c > d


Number of iterations to learn a good rule

Columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Rows with fewer than ten values had empty cells on the slide; the transcript does not record which columns those were, so their values are listed in order.

Coverage break 0.0001

k = 2:   0.62
k = 3:   1.64  1.83  1.64
k = 4:   3.25  3.55  3.65  3.71  3.89  3.94  3.92
k = 5:   4.33  4.92  5.48  5.63  5.93  5.96  6.04  6.13  6.07  6.26
k = 6:   5.60  6.39  7.06  7.38  7.41  7.67  7.71  7.95  7.96  8.07
k = 7:   6.55  7.68  8.19  8.40  8.84  9.05  9.22  9.47  9.57  9.63
k = 8:   7.58  8.76  9.37  9.80  9.94 10.21 10.45 10.67 10.82 10.95
k = 9:   9.02 10.22 10.72 11.09 11.45 11.64 11.77 12.03 12.10 12.24
k = 10: 10.65 11.57 12.15 12.64 12.76 12.87 13.11 13.12 13.30 13.42

Coverage break 0.001

k = 2:   0.65
k = 3:   1.64  1.83  1.53
k = 4:   3.16  3.51  3.60  3.65  3.83  3.91  3.91
k = 5:   4.31  4.79  5.31  5.53  5.87  5.91  5.92  6.01  5.95  6.18
k = 6:   5.27  6.07  6.73  7.12  7.11  7.35  7.49  7.70  7.73  7.81
k = 7:   5.96  7.10  7.58  7.80  8.30  8.45  8.69  8.95  9.09  9.19
k = 8:   7.04  8.07  8.65  8.97  9.12  9.41  9.67  9.90 10.02 10.21
k = 9:   9.18 10.10 10.39 10.70 11.00 11.11 11.19 11.43 11.51 11.64
k = 10: 10.11 11.22 11.71


Number of iterations to learn a good rule

Columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50. Rows with fewer than ten values had empty cells on the slide; the transcript does not record which columns those were, so their values are listed in order. Rows that are absent had no values.

Coverage break 0.01

k = 2:   0.61
k = 3:   1.48  1.68  1.54
k = 4:   2.73  3.09  3.19  3.23  3.48  3.59  3.53
k = 5:   3.64  4.11  4.55  4.75  5.20  5.29  5.34  5.52  5.52  5.75
k = 6:   4.95  5.40  5.96  6.25  6.30  6.68  6.76  6.96  7.04  7.30
k = 7:   7.53  7.88  7.88  8.02  8.35  8.46  8.56  8.78  8.88  9.07

Coverage break 0.1

k = 2:   0.50
k = 3:   1.29  1.45  1.39
k = 4:   3.21


Which one is the best configuration?

Minimum values over all coverage breaks (columns: r = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50; rows with fewer than ten values had empty cells on the slide, and their values are listed in order)

k = 2:   0.50
k = 3:   1.29  1.45  1.39
k = 4:   2.73  3.09  3.19  3.23  3.48  3.59  3.53
k = 5:   3.64  4.11  4.55  4.75  5.20  5.29  5.34  5.52  5.52  5.75
k = 6:   4.95  5.40  5.96  6.25  6.30  6.68  6.76  6.96  7.04  7.30
k = 7:   5.96  7.10  7.58  7.80  8.30  8.45  8.56  8.78  8.88  9.07
k = 8:   7.04  8.07  8.65  8.97  9.12  9.41  9.67  9.90 10.02 10.21
k = 9:   9.02 10.10 10.39 10.70 11.00 11.11 11.19 11.43 11.51 11.64
k = 10: 10.11 11.22 12.15 11.71 12.76 12.87 13.11 13.12 13.30 13.42

The adequate coverage break depends on the characteristics of the problem.


Execution time to learn the problem

[Figure: average execution time (s) to learn the problem, plotted against k (number of terms in the rule, 2 to 10) and r (number of rules, 5 to 50), one series per coverage break: 0.0001, 0.001, 0.01, 0.1. Vertical axis: 0 to 14000 s.]


Execution time to learn the problem - Majority policy

[Figure: average execution time (s) to learn the problem under the majority policy, plotted against k (number of terms in the rule, 2 to 10) and r (number of rules, 5 to 50), one series per coverage break: 0.0001, 0.001, 0.01, 0.1. Vertical axis: 0 to 60000 s.]


Summary

- The execution time and the iterations are proportional to:
  - Number of rules r
  - Number of specified attributes k
- Learning with the minority policy is more similar to a real-life scenario.
- Choosing the wrong default class might lead to learning a more difficult problem.


Learning and overgeneralisation

Learning maps
Show different colours depending on the percentage of runs that learned correctly, overgeneralised, or did not learn the correct set of rules.

- Blue: total learning ⇒ all the runs learned the right set of rules
- Cyan: between learning and overgeneralisation
- Purple: overgeneralisation ⇒ all the runs learned a set of rules with less than 100% accuracy
- Orange: between overgeneralisation and no learning
- Red: no learning ⇒ all the runs used the default rule to cover all the examples; no rules were generated
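The colour legend can be read as a classification of the set of outcomes observed across the runs of one (k, r) cell. The sketch below assumes each run is labelled `"learned"`, `"overgeneralised"` or `"none"`, labels chosen here for illustration. The actual maps may blend colours continuously according to the percentages, and the legend does not say how a mix of learned and no-learning runs is coloured, so that case is reported as unspecified.

```python
def map_colour(outcomes):
    """Map the per-run outcomes of one (k, r) cell to a legend colour."""
    kinds = set(outcomes)
    if not kinds:
        raise ValueError("at least one run is required")
    if kinds == {"learned"}:
        return "blue"      # total learning
    if kinds == {"overgeneralised"}:
        return "purple"    # every run ended below 100% accuracy
    if kinds == {"none"}:
        return "red"       # only the default rule was used
    if kinds == {"learned", "overgeneralised"}:
        return "cyan"      # between learning and overgeneralisation
    if kinds == {"overgeneralised", "none"}:
        return "orange"    # between overgeneralisation and no learning
    return "unspecified"   # combination not covered by the legend
```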


Learning and overgeneralisation - Default class 0

[Figure: maps of cases with default class 0, plotting r (number of terms or rules) against k (attributes expressed), one panel per coverage break: (a) 0.0001, (b) 0.001, (c) 0.01, (d) 0.1, (e) 0.5.]

Blue: total learning, Cyan: between learning and overgeneralisation, Purple: overgeneralisation, Orange: between overgeneralisation and no learning, Red: no learning


Learning and overgeneralisation - Majority policy

[Figure: maps of cases with the majority class as default, plotting r (number of terms or rules) against k (attributes expressed), one panel per coverage break: (f) 0.0001, (g) 0.001, (h) 0.01, (i) 0.1, (j) 0.5.]

Blue: total learning, Cyan: between learning and overgeneralisation, Purple: overgeneralisation, Orange: between overgeneralisation and no learning, Red: no learning


Summary

- The coverage break should be large enough to introduce generalisation pressure over the system but low enough to avoid overgeneral rules.
- The adequate coverage break depends on both k and r.
- Problems in which the rules cover wider areas are more difficult to learn, even with the right coverage break.
- The difficulty of a k-DNF problem depends on the class imbalance and the rule overlapping.


Conclusions

- There is no coverage break that works with all types of problems ⇒ No Free Lunch
- The adequate coverage break facilitates learning, while the wrong coverage break makes it harder or even impossible.

Open questions
Would it be possible to adapt the coverage break automatically and reduce the cost of hand-tuning the parameters?


Further Work

- Incorporate a heuristic inside BioHEL to determine a good coverage break for the problem and readapt this coverage break during the learning process
- Analyse the learning map of other evolutionary learning systems to determine strengths and weaknesses of the systems
- Encourage the usage of the k-DNF family of problems as a common benchmark in the LCS community


Bacardit, J. (2004). Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Spain.

Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.

Stout, M., Bacardit, J., Hirst, J. D., and Krasnogor, N. (2008). Prediction of recursive convex hull class assignments for protein residues. Bioinformatics, 24(7):916–923.

Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.


Questions or comments?
