I Workshop on Knowledge Extraction based on Evolutionary Learning. Granada, May 16th, 2008. Low Quality Data. Luciano Sánchez, Universidad de Oviedo.


Page 1

I Workshop on Knowledge Extraction based on Evolutionary Learning

Granada, May 16th, 2008

Low Quality Data

Luciano Sánchez

Universidad de Oviedo

Page 2

Part I

Introduction to Low Quality Data

Page 3

Summary

Part I: Introduction to Low Quality Data

Future directions

Novel types of data (issues with the representation, fuzzy random variables)

Preprocessing

Learning (Inference, Fitness, Genetic optimization)

Validation

Part II: Some results on the use of Evolutionary Algorithms for knowledge extraction from low quality data

Page 4

Future directions in 2005

Trade-off interpretability vs. precision. Use of MOGAs

FRBS for high dimensional problems

GFSs in Data Mining and Knowledge Discovery

Learning genetic models based on vague data

Herrera, F. Genetic Fuzzy Systems: Status, Critical Considerations and Future Directions. International Journal of Computational Intelligence Research. 1 (1). 2005. 59-67

Page 5

Future directions in 2008

Multiobjective genetic learning of FRBSs: interpretability-precision

GA-based techniques for mining fuzzy association rules and data mining

Learning genetic models based on low quality data

Genetic learning of fuzzy partitions and context adaptation

Genetic adaptation of inference engine components

Revisiting Michigan style GFSs

Herrera, F. Genetic Fuzzy Systems: Taxonomy, Current Research Trends and Prospects. Evolutionary Intelligence 1 (2008) 27-46

Page 6

Rationale behind low quality data

Crisp data-based GFSs are standard statistical classifiers and models:

Genetic fuzzy classifiers minimize a biased estimation of the classification error (the training error)

Genetic fuzzy models minimize an estimate of the squared error of a model

There might be theoretical differences if artificial imprecision is added to crisp data and a fuzzy fitness-based GFS is used

[4] Sánchez, L., Couso, I., Advocating the use of imprecisely observed data in Genetic Fuzzy Systems. IEEE Trans Fuzzy Sys 15(4). 2007. 551-562

Page 7

Addition of imprecision to crisp data (I)

There might be a relation between fuzzy fitness-based GF classifiers and SVM.

Page 8

Addition of imprecision to crisp data (II)

The same happens with models - the more regular the model, the more specific its fuzzy fitness is, pointing again to relations between regularized models and fuzzy fitness-based GF models.

Page 9

Further uses of low quality data

Imprecise / low quality

Imprecisely measured data

Coarsely discretized data, censored data

Missing values

Novel representations

Crisp data + tolerance

Aggregated values, lists, conflicting data

Addition of imprecision to crisp data

Page 10

Fuzzy Random Variables

Classical model

Second order model

First order, imprecise probabilities-based model

[8] Couso, I., Sánchez, L., Higher order models for fuzzy random variables. Fuzzy Sets and Systems 159, 3, 237-258, 2008

[Figure omitted: examples of the classical, second-order and first-order models of a fuzzy random variable on a three-point sample space w1, w2, w3; e.g. a crisp image {1} versus a fuzzy image {1|0 + 0.1|1}.]

Page 11

Representation

The first order model has a possibilistic interpretation, coherent with the view of a fuzzy set as a family of confidence intervals

A density f(x|\theta) versus a possibility distribution \pi(x|\theta):

\bar{P}(A) = \max_{x \in A} \pi(x|\theta) \qquad P(A) = \int_A f(x|\theta)\,dx
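The contrast above can be checked numerically. A minimal sketch (the Gaussian density and possibility distribution are illustrative choices, not from the talk): the upper probability of an event A is the maximum of the possibility distribution over A, while the ordinary probability integrates the density over A.

```python
# Sketch: probability of an event A under a density versus its upper
# probability under a possibility distribution, on a discretized domain.
import numpy as np

xs = np.linspace(-4.0, 4.0, 2001)
dx = xs[1] - xs[0]

f = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)  # density f(x|theta)
pi = np.exp(-xs**2 / 2)                      # possibility distribution pi(x|theta)

A = (xs >= 1.0) & (xs <= 2.0)                # event A = [1, 2]

prob_A = np.sum(f[A]) * dx                   # P(A): integral of f over A
upper_A = np.max(pi[A])                      # upper P(A): max of pi over A

print(prob_A, upper_A)
```

Here upper_A is roughly exp(-0.5) ≈ 0.61 while prob_A ≈ 0.14; the possibilistic bound dominates the probability, consistent with the interpretation as a family of confidence intervals.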

Page 12

Critical considerations

EAs used in GFS

Simple GAs

The use of novel EAs

Experimental study

Benchmark problems and reproducibility

Lack of experimental statistical analysis

Comparison with the state of the art

[Diagram: Graphical Data Analysis → Pre-processing → Learning / Model Design → Validation]

Herrera, F. Genetic Fuzzy Systems: Status, Critical Considerations and Future Directions. International Journal of Computational Intelligence Research. 1 (1). 2005. 59-67

Page 13

Graphical Analysis

PCA, ICA, MDS, for interval and fuzzy data

[Diagram: Graphical Data Analysis → Pre-processing → Learning / Model Design → Validation]

[9] Sánchez, L., Palacios, A., Suárez, M. R., Couso, I. Graphical exploratory analysis of vague data in the early diagnosis of dyslexia. IPMU 08. Málaga.

Page 14

Preprocessing

Feature selection (mutual information between frv, and other wrapper / filter methods)

Instance selection

Transformations of sets variables

[Diagram: Graphical Data Analysis → Pre-processing → Learning / Model Design → Validation]

[7] Sanchez, L., Suarez, M. R., Villar, J. R., Couso, I. Some results about mutual information based feature selection and fuzzy discretization of vague data. Intl. Conf. Fuzzy Systems FUZZ-IEEE 2007, pp. 1-6. 2007.

Page 15

Fuzzy fitness-based GFS

New inference mechanisms, that are coherent with the representation of interval or fuzzy data

New measures of fitness (variance of an frv, error of a classifier on imprecise data, etc.)

[Diagram: Graphical Data Analysis → Pre-processing → Learning / Model Design → Validation]

Sanchez, L., Couso, I., Casillas, J., Genetic Learning of Fuzzy Rules based on Low Quality Data. Submitted to Fuzzy Sets and Systems.
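One of the fitness measures mentioned above, the error of a classifier on imprecise data, can be sketched as follows (an illustrative sketch, not the paper's exact definition): when an example's interval-valued attributes make the predicted class ambiguous, the misclassification count is itself an interval, bounded below by the surely misclassified examples and above by the possibly misclassified ones.

```python
# Sketch: interval-valued classification error for imprecise data.
# Each prediction is the SET of classes the rule base can output when the
# example's attributes range over their intervals (names are illustrative).
def interval_error(predictions, labels):
    lo = hi = 0                          # surely / possibly misclassified
    for possible, y in zip(predictions, labels):
        if y not in possible:            # wrong for every compatible value
            lo += 1
            hi += 1
        elif len(possible) > 1:          # right for some values, wrong for others
            hi += 1
    return lo, hi

# Example: the second example is ambiguous, the third is surely wrong.
print(interval_error([{0}, {0, 1}, {1}], [0, 0, 0]))  # (1, 2)
```

The GA then has to rank individuals by these interval-valued errors, which is exactly where the precedence operators of the following slides come in.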

Page 16

Inference

The reasoning method must be coherent with the possibilistic representation of the data.
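A minimal illustration of what this coherence entails (the helper names and the grid evaluation are my assumptions; exact bounds are straightforward for piecewise-linear memberships): when the input is an interval, each membership degree becomes an interval as well.

```python
# Sketch: evaluating a triangular membership function on an interval input
# yields an interval of membership degrees.
def tri(a, b, c):
    """Triangular membership function with support [a, c] and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def mu_interval(mu, lo, hi, n=200):
    """Min and max membership over the input interval [lo, hi] (grid scan)."""
    vals = [mu(lo + (hi - lo) * i / n) for i in range(n + 1)]
    return min(vals), max(vals)

print(mu_interval(tri(0.0, 1.0, 2.0), 0.5, 1.5))  # (0.5, 1.0)
```

The firing strength of a rule on an interval example is then an interval, and every subsequent aggregation step must propagate it.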

Page 17

Fitness Function

[Figure omitted: different definitions of the variance of a fuzzy random variable — VarCl(X), VarKr(X) and VarIm1(X) — compared on a two-point interval-valued sample, yielding values such as K² and K²/3.]

[5] Couso, I., Dubois, D., Montes, S., Sanchez, L., On various definitions of the variance of a fuzzy random variable. 5th International Symposium on Imprecise Probability: Theories and Applications. ISIPTA’07. 16-19 July 2007

The 1st order imprecise model has an interval-valued (not fuzzy) fitness

Page 18

Optimization of interval-valued fitness functions

GAs and metaheuristics should minimize interval-valued functions

Optimizing an interval-valued function is analogous to solving a multicriteria problem: the same algorithms (e.g. NSGA-II, SPEA) can be adapted.
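As a concrete illustration (assumptions: fitness is minimized and intervals are stored as (lo, hi) tuples; the uniform-prior precedence degree is one possible choice, not necessarily the talk's exact operator), two of the orderings such an adaptation needs:

```python
# Sketch: comparing interval-valued fitness values, for minimization.
import random

def strongly_dominates(a, b):
    """Every value compatible with a is below every value compatible with b."""
    return a[1] < b[0]

def prob_precedes(a, b, n=20000, seed=0):
    """Monte Carlo estimate of P(A < B) when A and B are independent and
    uniformly distributed on their intervals (crude but short; an exact
    closed form exists, with several overlap cases)."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(*a) < rng.uniform(*b) for _ in range(n))
    return hits / n
```

For disjoint intervals both criteria agree; for fully overlapping intervals, strong dominance gives no decision while the uniform prior returns a degree close to 0.5.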

Page 19

NSGA-II for interval-valued fitness

Dominance relation (non-dominated sorting)

Crowding distance

[2] Sánchez, L. Couso, I. Casillas, J. Modelling Vague Data with Genetic Fuzzy Systems under a Combination of Crisp and Imprecise Criteria, in Proc. of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM'2007), pp. 30--37, Honolulu, Hawaii, USA, April 2007

Page 20

Dominance

Strong dominance

Uniform prior

Imprecise probabilities-based prior

[Figure omitted: comparison of two interval fitness values under the three precedence operators.]

Page 21

Crowding

The Hausdorff distance lies between the minimum and the maximum distance between the individuals

[Figure omitted: crowding distances between interval-valued individuals I1–I6.]
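For closed intervals the Hausdorff distance has a simple closed form; a small sketch (the helper names are illustrative) checking that it indeed lies between the minimum and maximum point-to-point distances:

```python
# Sketch: Hausdorff distance between two closed intervals a=(lo,hi), b=(lo,hi).
def min_dist(a, b):
    """Smallest distance between points of the two intervals:
    zero when they overlap, otherwise the gap between them."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

def max_dist(a, b):
    """Largest distance between points of the two intervals."""
    return max(abs(a[1] - b[0]), abs(b[1] - a[0]))

def hausdorff(a, b):
    """Hausdorff distance: largest deviation between matching endpoints."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

a, b = (0.0, 2.0), (1.0, 5.0)
assert min_dist(a, b) <= hausdorff(a, b) <= max_dist(a, b)  # 0 <= 3 <= 5
```

Replacing the crisp distance in NSGA-II's crowding computation with this one is all the crowding step needs for interval-valued objectives.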

Page 22

Validation

Benchmarks with interval and fuzzy data

Statistical tests for comparing samples of fuzzy random variables

[Diagram: Graphical Data Analysis → Pre-processing → Learning / Model Design → Validation]

Couso, I., Sanchez, L., Mark-recapture techniques in statistical tests for imprecise data. International Journal of Approximate Reasoning (submitted)

[10] Couso, I., Sanchez, L., Defuzzification of fuzzy p-values. Fourth International Workshop on Soft methods in probability and statistics SPMS’08 (admitted)

Page 23

Summary of Part I

Lots of open problems. Some to mention:

Theoretical study of the relations with SVM

Graphical analysis tools for coarse data (MDS, ICA, PCA, etc.)

Feature selection, instance selection, transformations

New inference mechanisms that match the representation of data

New measures of fitness for classifiers and models

Theoretical and practical studies relating MOEAs and the optimization of interval and fuzzy-valued fitness functions; new MOEAs and MO metaheuristics for solving problems with mixtures of objectives

Statistical tests (bootstrap, t-test for imprecise data, etc.)

Benchmarks for comparing the new algorithms

Page 24

Part II

Some results on the use of Evolutionary Algorithms for extracting knowledge from low quality data

Page 25

Examples of successful applications

1. Artificial addition of imprecision to crisp data, learning with IRL and GCCL (synthetic datasets)

2. Aggregates of conflicting data, learning with Pittsburgh-style GFS (marketing models)

3. Crisp data + tolerance, evolutionary filtering (GPS trajectories)

4. Interval-valued data, Pittsburgh-style GFS (diagnosis of dyslexia)

5. Preprocessing and evolutionary graphical analysis

Page 26

1. IRL (Boosting) - Extended Data

The addition of fuzzy imprecision to crisp data, followed by a fuzzy fitness-based GFS, improves the generalization

[Boxplot omitted: distribution of test errors for BFT and BMO.]

[1] L. Sánchez, J. Otero, and J. R. Villar, “Boosting of fuzzy models for high-dimensional imprecise datasets,” in Proc. IPMU 2006, Paris, France, 2006.
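The kind of fuzzification meant here can be sketched as follows (the symmetric triangular shape and the spread delta are illustrative assumptions, not the exact procedure of [1]): each crisp value is widened into a triangular fuzzy number whose alpha-cuts shrink from the support back to the crisp point.

```python
# Sketch: replacing a crisp sample x with a symmetric triangular fuzzy
# number of half-width delta, represented by its alpha-cut function.
def fuzzify(x, delta):
    """Return the alpha-cut function of a triangular fuzzy number around x."""
    def alpha_cut(alpha):              # alpha in [0, 1]
        half = delta * (1.0 - alpha)   # cuts narrow as alpha grows
        return (x - half, x + half)
    return alpha_cut

cut = fuzzify(5.0, 0.5)
assert cut(0.0) == (4.5, 5.5)          # support of the fuzzy number
assert cut(1.0) == (5.0, 5.0)          # core collapses to the crisp value
```

A fuzzy fitness then evaluates each candidate rule base against these alpha-cuts rather than against the original points.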

Page 27

1. GCCL - Extended data

GCCL is better than IRL for low quality data, in complexity and accuracy


        WM1    WM2    WM3    CH1    CH2    CH3    NIT    LIN     QUA    NEU    TSK    BFT    BMO    GCCL
f1      5.65   5.73   5.57   5.82   8.90   6.93   5.63   130.5   0.00   0.35   0.095  0.327  0.30   0.14
f1-10   6.89   7.19   6.54   6.84   10.15  8.20   7.16   133.91  1.40   1.78   1.62   1.86   1.71   1.65
f1-20   11.07  10.99  11.06  11.33  13.45  12.42  10.63  135.6   5.29   6.42   5.90   6.04   5.98   5.89
f1-50   51.78  46.40  47.80  53.48  48.94  48.16  39.65  166.64  33.53  41.18  36.76  39.62  38.66  38.66
f2      0.41   0.48   0.45   0.40   0.59   0.45   0.43   1.54    1.61   1.48   0.15   0.24   0.26   0.25
f2-10   0.64   0.68   0.68   0.59   0.68   0.60   0.58   1.71    1.75   1.81   0.29   0.42   0.41   0.38
f2-20   1.27   1.16   1.17   1.29   1.15   1.17   0.97   2.04    2.09   0.90   0.76   0.87   0.87   0.90
f2-50   4.34   3.98   3.94   4.47   3.90   3.97   3.59   4.67    4.78   3.76   3.62   3.67   3.72   3.70

Table 5. Comparative test error of FGCCL and the synthetic datasets f1 and f2 (reproduced from reference [28]). The algorithms include: heuristics (WM, CH), statistical regression (LIN, QUA), neural networks (NEU), rules with a real-valued consequent (NIT), TSK rules (TSK), genetic backfitting (BFT) and MOSA-based backfitting (BMO). BFT, BMO and GCCL runs were limited to 25 fuzzy rules. The best of WM, CH, NIT, BFT, BMO and GCCL, plus the best overall model, were highlighted for every dataset.

              Statistical     Fuzzy Rule Learning
              LIN    NEU     WM     WLS-TSK  BFT    FGCCL
elec ·10⁻³    419    616     832    485      444    399
machine-CPU   6204   8955    18057  7533     17857  9536
daily-elec    0.171  0.195   0.305  0.179    0.224  0.196
Friedman      7.33   1.22    7.11   1.80     2.28   1.61
building ·10³ 4.77   2.76    4.44   2.68     3.15   2.98

Table 6. Performance of a selection of statistical models, heuristic rule learning and GFS in some benchmarks. The most accurate fuzzy rule learning algorithm is Weighted Least Squares TSK, followed by FGCCL.

              Labels  WM    WLS-TSK  BFT  FGCCL
elec          3       7     8        10   4
machine-CPU   3       20    91       25   4
daily-elec    3       64    427      25   5
Friedman      3       192   242      25   10
building      3/2*    789   896*     30   20

Table 7. The number of fuzzy rules generated by FGCCL is much lower than those produced by either example-guided or grid-guided learning methods. FGCCL improves grid-based learning algorithms by one order of magnitude in the number of parameters. We have used three linguistic labels in all the variables (but the discrete ones). In the dataset “building”, the grid-based algorithm depends on partitions of size 2. Otherwise, the number of rules is too high for practical purposes.

6.4 Performance of GCCL over machine learning databases

The fourth benchmark uses standard Machine Learning benchmarks to assess the accuracy and the compactness of the FGCCL algorithm, in problems with different sizes and numbers of inputs. In order to keep the discussion simple, we have chosen a selection of the algorithms in the preceding section. CH1, CH2 and CH3 were discarded because they produced a much higher number of rules than WM, without a noticeable increment of the accuracy. Also, since we are measuring the linguistic quality by the number of rules, without taking into account the complexities of the consequent part of the rules, we discard NIT, because it is less accurate than weighted least squares-based TSK, for the same number of fuzzy rules. The quadratic regression is also removed, because most of the time the corresponding polynomial has more terms than the multilayer perceptron needed for obtaining an equivalent accuracy.

Summarizing, linear regression and the neural network are left for reference purposes. Other than this, we have compared: one example-guided heuristic method (the Wang and Mendel algorithm), one TSK grid-based algorithm (we fitted with weighted least squares a plane to each set of elements covered by one cell of the fuzzy grid; the weights are the memberships of the elements to the cell), an IRL method (genetic backfitting) and Fuzzy GCCL. The results are given in Tables 6, 7, and Figure 5.

As expected, the example-guided heuristic uses a moderately high number of rules, and it is not very precise (equivalent or worse than linear regression). The weighted least squares fitting of TSK rules is the most precise method, similar to that of the neural network. Backfitting and FGCCL both perform well and offer a good compromise between compactness and accuracy. FGCCL was significantly better than backfitting


            0%      1%      2%      3%
f1          0.13    0.22    1.34    5.30
f1-10       1.59    1.67    2.75    6.69
f1-20       5.98    6.08    7.17    11.05
f1-30       14.37   14.53   15.76   19.92
f1-50       39.22   39.30   40.31   44.16
f2          0.22    0.23    0.34    0.76
f2-10       0.35    0.37    0.50    0.93
f2-20       0.86    0.88    1.00    1.43
f2-30       1.40    1.42    1.55    1.97
f2-50       3.69    3.71    3.83    4.25
elec ·10⁻³  416     416     416     890

Table 3. Test error when the input data has tolerances ±1%, ±2% and ±3%. When the observation error is combined with a high slope in the function (function f1 and the “elec” problem), the learning algorithm overtrains. The abnormal values are shown in boldface.

        1%             5%             10%
        BFT    FGCCL   BFT    FGCCL   BFT    FGCCL
f1      0.89   0.35    6.64   6.25    24.82  24.77
f1-10   2.66   1.86    9.39   8.31    29.03  28.60
f2      0.52   0.23    0.60   0.47    1.41   1.23
f2-10   0.56   0.37    0.97   0.68    1.67   1.70
elec    440    421     581    558     1003   988

Table 4. Comparison of FGCCL and Backfitting in datasets with 1%, 5% and 10% of interval-valued imprecision in the outputs, and tested with crisp data with non-zero mean observation error. FGCCL improved the results in all the tests for which the observation error was the most relevant source of noise.

The results are shown in Table 4. As expected, GCCL improved the results in the tests for which the observation error was the most relevant source of noise, almost uniformly.

6.3 Influence of the stochastic noise

The third benchmark checks whether FGCCL can be applied to crisp problems. We have compared the accuracy of FGCCL with the selection of learning algorithms in reference [28]. We have intentionally dropped from that list the real-world datasets¹, and focused on the synthetic datasets, whose degree of contamination with stochastic noise is known. It is emphasized that, in this case, we have assumed that the observation error is zero, since none of the algorithms to which we will compare ours can use vague data.

The fuzzy rule learning algorithms in reference [28] are: Wang and Mendel with importance degrees ‘maximum’, ‘mean’ and ‘product of maximum and mean’ (WM1, WM2 and WM3, respectively) [36], the same three versions of Cordon and Herrera’s method (CH1, CH2, CH3) [3], Nozaki, Ishibuchi and Tanaka’s (NIT) fuzzy rule learning [22], TSK rules [34] optimized with Weighted Least Squares and Genetic Backfitting [27], and MOSA-based [32] backfitting (BMO) [28]. The same reference includes the linear regression (LIN), the Quadratic Regression (QUA) and a Conjugate-Gradient trained Multilayer Perceptron (NEU). These 13 algorithms have been compared to the Genetic Cooperative-Competitive FGCCL.

¹ The results of FGCCL on the two dropped problems, building and elec, can be found in Table 6. In both cases, FGCCL is better than all GFSs in reference [28], but the difference is not relevant in this context.

The best overall result and the most accurate fuzzy rule base are boldfaced in Table 5. We have also included a selection of boxplots in Figure 4. These provide a graphical insight into the relevance of the differences between the median, the mean and the variance of the results. Observe that the performances of the heuristic algorithms are always worse than those of the GFS. There are not, in general, significant differences in accuracy between GFS, statistical nonlinear regression and neural networks, although there seems to be a small improvement in accuracy of FGCCL over the remaining GFS. The significance of the difference is never statistically sound (95% level). Observe that, if we had used the results in the first column of Table 3 (which were obtained after a different set of ten independent runs of the algorithm), the conclusions are the same.

However, the most important gain, if FGCCL is to be used in crisp datasets, is not the improved accuracy but the compactness of the rule base, as we show in the next section.

[6] Sanchez, L., Otero, J. Learning fuzzy linguistic models from low quality data by genetic algorithms FUZZ-IEEE 2007, London. pp 1-6, 2007

Page 28

2. Pittsburgh - Aggregated data (I)

The fuzzy fitness-based GFS improved the accuracy of the crisp version.

Page 29

2. Pittsburgh - Aggregated data (II)

The precedence operator based on imprecise probabilities is more efficient in the later generations of the GA

[2] Sánchez, L. Couso, I. Casillas, J. Modelling Vague Data with Genetic Fuzzy Systems under a Combination of Crisp and Imprecise Criteria, in Proc. of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM'2007), pp. 30--37, Honolulu, Hawaii, USA, April 2007

Page 30

3. Composite data (GPS)

Population-based SA and NSGA-II were used to find the lowest upper bound of the length of a trajectory

[Mis-encoded figures and text omitted: plots of simulated GPS fixes along the true trajectory, the experimental setup and conclusions, and a table of trajectory-length results for MOSA and NSGA-II.]

[3] Sanchez, L., Couso, I., Casillas, J. A Multiobjective Genetic Fuzzy System with Imprecise Probability Fitness for Vague Data. Int. Symp. on Evolving Fuzzy Systems (EFS 2006), pp. 131-136, 2006.
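The interval being optimized can be sketched like this (an illustrative simplification: each GPS fix is a disc of known radius around the reported position, and consecutive steps are bounded independently; the actual formulation in [3] is more involved):

```python
# Sketch: lower and upper bounds on the length of a trajectory observed
# as a sequence of GPS fixes (x, y, radius).
import math

def length_bounds(fixes):
    lo = hi = 0.0
    for (x0, y0, r0), (x1, y1, r1) in zip(fixes, fixes[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        lo += max(0.0, d - r0 - r1)   # overlapping discs: step can be zero
        hi += d + r0 + r1             # worst case: move across both discs
    return lo, hi

print(length_bounds([(0.0, 0.0, 1.0), (10.0, 0.0, 1.0)]))  # (8.0, 12.0)
```

The metaheuristics then search over trajectories compatible with the discs for the lowest such upper bound, which is the quantity reported to the inspection test.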

Page 31

4. Feature Selection + Pittsburgh

Dyslexia diagnosis with interval data. The sets of variables selected by fuzzy techniques are uniformly better than those found by crisp techniques

[Boxplot omitted: test error for the candidate variable sets FPMIN, FPMAX, FSMIN, FSMAX, FIMIN, FIMAX, CPMIN, CPMAX, CSMIN, CSMAX, CIMIN, CIMAX.]

Page 32

5. MDS, Interval & Missing data

MDS analysis for different granularities of the linguistic partition

[Scatter plots omitted: MDS projections of 65 cases.]

Figure 5: Left: MDS in input space. Right: MDS in activation space. The use of the activation space instead of the input space gains more insight into the structure of the data, and does not require scaling. The meaning of the colors is: Red: 0. Green: lower than 1. Blue: 1. Cyan: 2. Magenta: 3. Yellow: 4.

[Scatter plots omitted: MDS projections for the three variable subsets.]

Figure 6: Three subsets of 8 variables have been examined. Left: Fuzzy Mutual Information [15]. Center: Analysis of Variance of linear models, with crisp data. Right: Selection of the most relevant tests, according to the human expert. The crisp selection did not take into account the vagueness. The set of variables chosen by the expert does not produce a clear distinction between children with and without dyslexia. The use of fuzzy IM produced reasonable results.

[Scatter plots omitted: interval MDS projections for granularities 3, 5 and 7.]

Figure 7: Interval MDS in activation space, with granularities 3, 5 and 7, and the feature set obtained by IM. The projection of the set of granularity 7 does not significantly improve the separability of the data, and 3 labels are not enough for separating the cases, thus the best granularity is 5.

[9] L. Sánchez, J. Otero, and J. R. Villar, “Graphical exploratory analysis of vague data in the early diagnosis of dyslexia,” in Proc. IPMU 2008, Málaga, Spain, 2008.

Page 33

5. Feature selection, fuzzified data

The ranking of the features depends on the linguistic partition of the input variables

[Histograms omitted: test-error distributions under RELIEF, SSGA, MIFS and FMIFS for the WINE, ION, SONAR, PIMA and GERMAN datasets.]

Fig. 5. Histogram of the test error, 1000 runs of HEUR2 (each sample is the mean test error in a 10-fold cross validation, which was repeated 100 times, with random Ruspini partitions of the input variables). From top to bottom: WINE, ION, SONAR, PIMA and GERMAN datasets. Observe that there exist datasets (e.g. ION) for which the density function of the test error in FMIFS is skewed, showing a correlation between the ranking of the features and the membership functions of the input variables.

[7] Sanchez, L., Suarez, M. R., Villar, J. R.; Couso, I. Some Results about Mutual Information-based Feature Selection and Fuzzy Discretization of Vague Data. FUZZ-IEEE 2007, London. pp 1-6, 2007.