GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Bounding XCS’s Parameters for Unbalanced Datasets
Albert Orriols-Puig, Ester Bernadó-Mansilla
Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain


Page 1: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Bounding XCS’s Parameters for Unbalanced Datasets

Albert Orriols-Puig, Ester Bernadó-Mansilla

Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle

Ramon Llull University, Barcelona, Spain

Page 2: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Framework

[Figure: a dataset, consisting of examples and counter-examples of the concept, is given to a learner, which extracts knowledge based on experience and builds a model; the model maps new instances to a predicted output.]

In real-world domains it is typically more costly to obtain examples of the concept to be learnt, so the distribution of examples in the training dataset is usually unbalanced.

Applications: fraud detection, rare medical diagnosis, detection of oil spills in satellite images.

Page 3: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Framework

Do learners suffer from class imbalances?

The learner is trained to minimize the global error over the training set:

    error = (num errors c1 + num errors c2) / (number of examples)

This criterion is biased towards the overwhelming (majority) class: its accuracy is maximized to the detriment of the minority class.


Page 4: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Aim

Analyze the performance of XCS when learning from imbalanced datasets

Analyze the contribution of the different components

Propose approaches that facilitate learning the minority class regions


Page 5: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Outline

1. Description of XCS

2. Description of the Domain

3. Experimentation

4. XCS and Class Imbalances

5. Guidelines for Parameter Tuning

6. Online Adaptation

7. Conclusions


Page 6: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

1. Description of XCS

In single-step tasks:

[Figure: XCS interaction cycle in a single-step task. The environment supplies a problem instance; the classifiers in the population [P] whose condition matches it form the match set [M]; a prediction array is computed over the actions advocated in [M]; the selected action (a random one during exploration) defines the action set [A]; the environment returns a reward, which is used to update the classifier parameters (prediction P, error ε, fitness F, numerosity num, action set size as, time stamp ts, experience exp); a genetic algorithm (selection, reproduction, mutation) is applied to [A], and deletion acts on [P].]
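As an illustration, this single-step explore cycle can be sketched in a few lines of Python. It is a simplified sketch, not the authors' implementation: covering, the fitness update, the genetic algorithm and deletion are left out, and the default field values are arbitrary.

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str    # ternary string over {0, 1, #}; '#' matches either bit
    action: int
    p: float = 10.0   # payoff prediction
    eps: float = 0.0  # prediction error
    F: float = 0.01   # fitness
    num: int = 1      # numerosity
    exp: int = 0      # experience

    def matches(self, instance: str) -> bool:
        return all(c in ('#', x) for c, x in zip(self.condition, instance))

def explore_step(population, instance, reward_fn, beta=0.2):
    """One exploration trial of the cycle shown in the figure above.
    Covering (for an empty [M]), the fitness update, the GA and deletion are omitted."""
    M = [cl for cl in population if cl.matches(instance)]       # match set [M]
    actions = sorted({cl.action for cl in M})
    prediction_array = {
        a: sum(cl.p * cl.F for cl in M if cl.action == a) /
           sum(cl.F for cl in M if cl.action == a)
        for a in actions
    }
    action = random.choice(actions)                             # random action while exploring
    A = [cl for cl in M if cl.action == action]                 # action set [A]
    R = reward_fn(instance, action)                             # Rmax if correct, 0 otherwise
    for cl in A:                                                # classifier parameter update
        cl.exp += 1
        cl.eps += beta * (abs(R - cl.p) - cl.eps)
        cl.p += beta * (R - cl.p)
    return prediction_array, action, R
```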


Page 7: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

1. Description of XCS

Learning domain

[Figure: XCS interacts with the environment: a set of rules is evolved by a genetic algorithm and evaluated by reinforcement learning; for each input the system emits a prediction and receives a reward.]

Ratio between classes: 525:75, i.e., 1 minority class example for every 7 majority class examples.


Page 8: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

2. Description of the Domain

Imbalanced (11-bit) Multiplexer

Each input is formed by selection bits and position bits; the selection bits address the position bit whose value gives the class. Example: 000 10010100 : class 1 (the selection bits 000 address position bit 0, whose value is 1). The complexity of the problem is related to the number of selection bits.

The original multiplexer is completely balanced. We under-sampled class 1:
– ir: proportion between majority and minority class instances
– i: imbalance level (i = log2 ir)

XCS should evolve the complete set of maximally general classifiers:

000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1


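For concreteness, such an imbalanced training stream can be generated by rejection sampling: every class-0 instance is kept and class-1 instances are accepted with probability 1/ir, giving a ratio of ir:1. A minimal sketch (function names are ours, not part of the original work):

```python
import random

def mux11(bits):
    """Class of an 11-bit multiplexer input: the 3 selection bits address
    one of the 8 position bits, whose value is the class."""
    address = int("".join(map(str, bits[:3])), 2)
    return bits[3 + address]

def sample_imbalanced_mux(ir, rng=random):
    """Draw one training instance: every class-0 instance is kept, while
    class-1 instances are accepted with probability 1/ir, giving a ratio ir:1."""
    while True:
        bits = [rng.randint(0, 1) for _ in range(11)]
        label = mux11(bits)
        if label == 0 or rng.random() < 1.0 / ir:
            return bits, label

# e.g. ir = 128 yields roughly one class-1 instance per 128 class-0 instances
```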

Page 9: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

3. Experimentation

We ran XCS with the following standard configuration, from i = 0 (ir = 1:1) to i = 9 (ir = 512:1):

N = 800, α = 0.1, ν = 5, Rmax = 1000, ε0 = 1, θGA = 25, β = 0.2, χ = 0.8, μ = 0.4, θdel = 20, δ = 0.1, θsub = 200, P# = 0.6

selection = rws, mutation = niched, GAsub = true, [A]sub = false
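For reference, the same configuration written as a plain dictionary (a sketch; the keys simply spell out the parameter symbols listed above):

```python
standard_config = {
    "N": 800, "alpha": 0.1, "nu": 5, "Rmax": 1000, "eps0": 1,
    "theta_GA": 25, "beta": 0.2, "chi": 0.8, "mu": 0.4,
    "theta_del": 20, "delta": 0.1, "theta_sub": 200, "P_hash": 0.6,
    "selection": "rws", "mutation": "niched",
    "GA_subsumption": True, "AS_subsumption": False,
}
```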


Page 10: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

3. Experimentation

[Figure: true negative (TN) rate and true positive (TP) rate for ir = 16:1, 32:1, and 64:1 with the standard configuration.]


Page 11: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

3. Experimentation

Most numerous rules for ir = 128:1:

Condition : Action    P           Error   F      Num
###########:0         1000        0.120   0.98   385
###########:1         1.2·10^-4   0.074   0.98   366

The estimated parameters are far from their theoretical values; in particular, the error is greatly underestimated. Theoretically:

P:0 = 992.24,  P:1 = 7.75,  ε:0 = ε:1 = 15.38

As a consequence, these overgeneral classifiers look accurate and overtake the population (they represent 94% of the population).
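As a sanity check, the theoretical prediction and error of the two fully general classifiers follow directly from the imbalance ratio, assuming a reward of Rmax for a correct prediction and 0 otherwise (a small sketch, not the authors' code):

```python
RMAX = 1000.0

def overgeneral_estimates(ir):
    """Expected prediction and error of ###########:0 and ###########:1
    for imbalance ratio ir."""
    rho0 = ir / (ir + 1.0)        # probability that the :0 rule is correct
    rho1 = 1.0 / (ir + 1.0)       # probability that the :1 rule is correct
    p0, p1 = RMAX * rho0, RMAX * rho1
    eps = 2.0 * RMAX * rho0 * rho1   # expected |reward - prediction|, same for both
    return p0, p1, eps

print(overgeneral_estimates(128))    # -> approx. (992.25, 7.75, 15.38)
```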


Page 12: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances

We analyze the following factors:

Classifiers’ Error

Stability of Prediction and Error Estimates

Occurrence-based Reproduction


Page 13: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.1. Classifiers’ Error

How does the imbalance ratio influence a classifier’s error?

XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction), and considers a classifier accurate if:

    ε_cl < ε0

XCS computes each classifier’s prediction (p) and error (ε) as windowed averages:
– Prediction:  p_{t+1} = p_t + β·(R − p_t)
– Error:       ε_{t+1} = ε_t + β·(|R − p_t| − ε_t)
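In code form, the two updates are plain exponentially weighted moving averages (a sketch; the MAM averaging XCS applies to inexperienced classifiers is omitted):

```python
def update_estimates(p, eps, R, beta=0.2):
    """One Widrow-Hoff update of a classifier's prediction p and error eps
    after receiving reward R (Rmax if the prediction was correct, 0 otherwise)."""
    eps += beta * (abs(R - p) - eps)   # error tracks |R - p|
    p += beta * (R - p)                # prediction tracks the reward
    return p, eps
```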


Page 14: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.1. Classifiers’ Error

Up to which class imbalance will XCS detect overgeneral classifiers?

– Bound for an inaccurate classifier:  ε_cl ≥ ε0
– Given the expected prediction and error of an overgeneral classifier (with Rmin = 0):

    P(cl) = P(c|cl)·Rmax + (1 − P(c|cl))·Rmin
    ε(cl) = P(c|cl)·|Rmax − P(cl)| + (1 − P(c|cl))·|Rmin − P(cl)|

– We derive, writing p = P(cl):

    2·p·(Rmax − p) / Rmax ≥ ε0

– With P(c|cl) = ir / (1 + ir), this is equivalent to:

    ir^2 − (2·Rmax/ε0 − 2)·ir + 1 ≤ 0

– For Rmax = 1000 and ε0 = 1 the roots are approximately 1/1998 and 1998, so overgeneral classifiers are detected for 1/1998 < ir < 1998 and not detected outside this range.
– We get the maximum imbalance ratio:  ir_max = 1998, i.e., i_max = 10.


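The bound can be checked numerically by solving the quadratic above for the given Rmax and ε0 (a small sketch):

```python
import math

def max_imbalance_ratio(Rmax=1000.0, eps0=1.0):
    """Largest ir for which the overgeneral classifier's expected error,
    2*Rmax*ir/(1+ir)**2, still reaches eps0 (so the classifier is detected)."""
    b = 2.0 * Rmax / eps0 - 2.0                    # ir**2 - b*ir + 1 <= 0
    ir_max = (b + math.sqrt(b * b - 4.0)) / 2.0    # larger root of the quadratic
    return ir_max, math.floor(math.log2(ir_max))

print(max_imbalance_ratio())   # -> (~1998, 10), the ir_max and i_max quoted above
```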

Page 15: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.1. Classifiers’ Error

XCS computes classifiers’ error (ε) and prediction (p) as windowed averages:
– Prediction:  p_{t+1} = p_t + β·(R − p_t)
– Error:       ε_{t+1} = ε_t + β·(|R − p_t| − ε_t)

β determines the size of the window.

[Figure: influence of a reward on the estimates over time for β = 0.2, β = 0.1, and β = 0.05; with β = 0.2 the effect of previous rewards is quickly forgotten.]

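The "window" reading follows from unrolling the update: a reward received k steps ago contributes with weight β(1 − β)^k to the current estimate, so a larger β forgets faster. A tiny sketch:

```python
def reward_weight(beta, k):
    """Weight of a reward received k steps ago in the current estimate
    (obtained by unrolling p <- p + beta * (R - p))."""
    return beta * (1.0 - beta) ** k

for beta in (0.2, 0.1, 0.05):
    print(beta, [round(reward_weight(beta, k), 3) for k in range(8)])
# larger beta: the influence of older rewards vanishes faster
```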

Page 16: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.2. Stability of Prediction and Error Estimates

Stability of prediction and error for ir = 128:1

[Figure: density of the prediction and error estimates of the overgeneral classifiers for ir = 128:1, with β = 0.2 (top) and β = 0.002 (bottom); the theoretical predictions are 992.24 and 7.75.]

As ir increases, β should be decreased to stabilize the prediction and error estimates.
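This effect can be reproduced with a short simulation of the majority-predicting overgeneral classifier under a Bernoulli reward stream, reusing the Widrow-Hoff updates above (a sketch with assumed step count and seed):

```python
import random

def simulate_overgeneral(ir=128, beta=0.2, steps=200000, Rmax=1000.0, seed=0):
    """Prediction/error estimates of ###########:0 when a fraction 1/(1+ir)
    of its activations are minority-class instances (reward 0)."""
    rng = random.Random(seed)
    p, eps = Rmax, 0.0
    for _ in range(steps):
        R = Rmax if rng.random() < ir / (ir + 1.0) else 0.0
        eps += beta * (abs(R - p) - eps)   # same Widrow-Hoff updates as above
        p += beta * (R - p)
    return p, eps

print(simulate_overgeneral(beta=0.2))     # jumps around: the last few rewards dominate
print(simulate_overgeneral(beta=0.002))   # settles near the theoretical 992.2 and 15.4
```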

Page 17: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.3. Occurrence-based Reproduction

To receive a GA event, a classifier has to belong to [A].

Frequency of occurrence in [A] during exploration (11-Mux, l = 3 selection bits; values for ir = 128):

Classifier           p_occ                          ir = 128
000 0#######:0       ir / (2^(l+1) · (1 + ir))      0.062
000 1#######:1       1 / (2^(l+1) · (1 + ir))       0.000484
### ########:0       1/2                            0.5
### ########:1       1/2                            0.5

[Figure: p_occ as a function of ir (0 to 500) for the three types of classifiers above.]

Classifiers that occur more frequently:
– Have better estimates
– Tend to have more genetic opportunities… depending on θGA
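The uneven niche occurrence is easy to tabulate from the expressions above (a sketch with l = 3 selection bits, as in the 11-bit multiplexer):

```python
def occurrence_probabilities(ir, l=3):
    """p_occ during exploration for the three kinds of classifiers above,
    in the imbalanced multiplexer with l selection bits (l = 3 for 11-Mux)."""
    frac = 1.0 / 2 ** (l + 1)   # 1/2**l of its class's instances, action picked with prob. 1/2
    p_majority_niche = frac * ir / (1.0 + ir)    # e.g. 000 0#######:0
    p_minority_niche = frac * 1.0 / (1.0 + ir)   # e.g. 000 1#######:1
    p_overgeneral = 0.5                          # ### ########:0 or ### ########:1
    return p_majority_niche, p_minority_niche, p_overgeneral

for ir in (1, 16, 128, 512):
    print(ir, occurrence_probabilities(ir))      # ir = 128 gives (0.062, 0.000484, 0.5)
```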


Page 18: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

4. XCS and Class Imbalances – 4.3. Occurrence-based Reproduction

Genetic opportunities
– A classifier goes through a genetic event when:
  • It occurs in [A]
  • The average time since the last GA application in [A] is greater than θGA

[Figure: timeline of GA events. The overgeneral niche ###########:0/1 occurs almost every time step and receives a GA event roughly every θGA time steps (25, 50, 75, 100, …); the minority niche 000 1#######:1 occurs far less often (T_occ >> θGA) and receives far fewer genetic events.]

Set θGA = T_occ of the most infrequent niche, to balance the genetic opportunities that the different niches receive.
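The niche-GA trigger can be written directly from this rule (a sketch; ts is the classifier's GA time stamp and num its numerosity, both listed among the classifier parameters in the XCS figure):

```python
def ga_should_run(action_set, t_now, theta_GA):
    """Trigger a genetic event on [A] when the (numerosity-weighted) average
    time since the last GA application in this niche exceeds theta_GA."""
    num_sum = sum(cl.num for cl in action_set)
    avg_ts = sum(cl.ts * cl.num for cl in action_set) / num_sum
    return t_now - avg_ts > theta_GA
```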


Page 19: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

5. Guidelines for Parameter Tuning

From the analysis we can extract the following guidelines:

– Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.

– β represents the reward forgetfulness rate. We want this rate to take the under-sampled instances into account:

    β = 1 / (k1 · f_maj / f_min)

– θGA is the GA rate when T_occ < θGA. If we want all niches to receive the same number of genetic opportunities:

    θGA = k2 / f_min

where f_min and f_maj are the occurrence frequencies of the minority and majority niches, and k1, k2 are constants.

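A small helper turns the guidelines into concrete values (a sketch; the constants k1 and k2 are assumptions that must be tuned, and in the imbalanced multiplexer f_maj/f_min = ir with f_min = 1/(1+ir)):

```python
def tuned_parameters(f_min, f_maj, k1=1.0, k2=1.0):
    """beta and theta_GA suggested by the guidelines above (k1, k2 assumed)."""
    beta = 1.0 / (k1 * f_maj / f_min)   # forget slowly enough to account for minority rewards
    theta_GA = k2 / f_min               # give the most infrequent niche its GA opportunities
    return beta, theta_GA

# In the imbalanced multiplexer f_maj / f_min = ir; e.g. for ir = 128:
ir = 128
print(tuned_parameters(f_min=1.0 / (1 + ir), f_maj=ir / (1.0 + ir)))
# -> (0.0078..., 129.0) with k1 = k2 = 1; the constants are problem-dependent
```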

Page 20: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

5. Guidelines for Parameter Tuning

We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}.

[Figure: results with the standard configuration for ir = 16:1, 32:1, 64:1 (left) and with the configuration following the guidelines for ir = 64:1, 128:1, 256:1 (right).]


Page 21: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

6. Online Adaptation

Problem: how can we estimate the niche frequency?

– In the multiplexer:  f_min = f_maj / ir
– In a real-world problem, niche frequencies may not be related to the imbalance ratio (small disjuncts).

[Figure: two example datasets, both with ir = 5, whose minority class regions (small disjuncts) have different niche frequencies.]


Page 22: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

6. Online Adaptation

Our approach: Let XCS discover small disjuncts.

We search for regions that promote overgeneral classifiers

We estimate ir_cl based on those regions.

We use ir_cl to adapt β and θGA.

[Figure: an overgeneral classifier covering a region with ir_cl = 14:1.]


Page 23: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

6. Online Adaptation

The Algorithm:
– Check whether the classifier’s prediction oscillates.
– Estimate the imbalance ratio of the region it covers.
– Require a minimum of experience and numerosity before adapting the parameters.
– Adapt the parameters (β and θGA) following the guidelines and the estimated imbalance ratio ir_cl.
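A rough sketch of this per-classifier adaptation (field names, thresholds and the oscillation test are illustrative assumptions, not the authors' algorithm):

```python
def maybe_adapt(cl, Rmax=1000.0, eps0=1.0, k1=1.0, k2=1.0,
                min_exp=50, min_num=5):
    """Per-classifier adaptation mirroring the four steps listed above.
    Field names, thresholds and constants are assumptions for illustration."""
    # require a minimum of experience and numerosity first
    if cl.exp < min_exp or cl.num < min_num:
        return
    # check whether the prediction oscillates between ~Rmax and ~0,
    # i.e. the classifier covers an imbalanced region (error above eps0)
    if cl.eps <= eps0:
        return
    # estimate the imbalance ratio of the covered region by inverting
    # the expected prediction P = Rmax * ir / (1 + ir)
    high, low = max(cl.p, Rmax - cl.p), min(cl.p, Rmax - cl.p)
    ir_cl = high / max(low, 1e-9)
    f_min, f_maj = 1.0 / (1.0 + ir_cl), ir_cl / (1.0 + ir_cl)
    # adapt beta and theta_GA following the guidelines
    cl.beta = 1.0 / (k1 * f_maj / f_min)
    cl.theta_GA = k2 / f_min
```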


Page 24: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

6. Online Adaptation

[Figure: results with the standard configuration (ir = 16:1, 32:1, 64:1), with the configuration following the guidelines (ir = 64:1, 128:1, 256:1), and with the online adaptation (ir = 64:1, 128:1, 256:1).]


Page 25: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

7. Conclusions

We studied the behavior of XCS when the training set is unbalanced

With the standard configuration, XCS can only solve the multiplexer up to an imbalance ratio of ir = 16.

The theoretical analysis shows that XCS is highly robust to class imbalances if:

– Classifier estimates are accurate

– The number of genetic opportunities that the niches receive is balanced

We define guidelines to adapt XCS’s parameters:

– XCS could then solve the multiplexer up to an imbalance ratio of ir = 256


Page 26: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

7. Conclusions

As an advantage over other learners, XCS can automatically discover small disjuncts:

Self-adaptation of parameters


Page 27: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

7. Further Work

What about the convergence time?
– An increase of θGA implies a decrease in the search for promising rules.

Cluster-based resampling methods… unfortunately, there is no direct relation between clusters and niches.

What about niche-based resampling?

[Figure: a niche with ir_niche = 14:1. Idea: resample these instances according to 1/ir_niche.]


Page 28: GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Bounding XCS’s Parameters for Unbalanced Datasets

Albert Orriols-Puig, Ester Bernadó-Mansilla

Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle

Ramon Llull University, Barcelona, Spain