I Workshop on Knowledge Extraction based on Evolutionary Learning
Granada, May 16th, 2008
Low Quality Data
Luciano Sánchez
Universidad de Oviedo
Part I
Introduction to Low Quality Data
Summary
Part I: Introduction to Low Quality Data
Future directions
Novel types of data (issues with the representation, fuzzy random variables)
Preprocessing
Learning (Inference, Fitness, Genetic optimization)
Validation
Part II: Some results on the use of Evolutionary Algorithms for knowledge extraction from low quality data
Future directions in 2005
Trade-off interpretability vs. precision. Use of MOGAs
FRBS for high dimensional problems
GFSs in Data Mining and Knowledge Discovery
Learning genetic models based on vague data
Herrera, F. Genetic Fuzzy Systems: Status, Critical Considerations and Future Directions. International Journal of Computational Intelligence Research. 1 (1). 2005. 59-67
Future directions in 2008
Multiobjective genetic learning of FRBSs: interpretability-precision
GA-based techniques for mining fuzzy association rules and novel data mining approaches
Learning genetic models based on low quality data
Genetic learning of fuzzy partitions and context adaptation
Genetic adaptation of inference engine components
Revisiting Michigan style GFSs
Herrera, F. Genetic Fuzzy Systems: Taxonomy, Current Research Trends and Prospects. Evolutionary Intelligence 1 (2008) 27-46
Rationale behind low quality data
Crisp-data-based GFSs are standard statistical classifiers and models:
Genetic fuzzy classifiers minimize a biased estimation of the classification error (the training error)
Genetic fuzzy models minimize an estimate of the squared error of a model
There might be theoretical differences if artificial imprecision is added to crisp data and a fuzzy fitness-based GFS is used
[4] Sánchez, L., Couso, I., Advocating the use of imprecisely observed data in Genetic Fuzzy Systems. IEEE Trans Fuzzy Sys 15(4). 2007. 551-562
Addition of imprecision to crisp data (I)
There might be a relation between fuzzy fitness-based GF classifiers and SVM.
Addition of imprecision to crisp data (II)
The same happens with models: the more regular the model, the more specific its fuzzy fitness, pointing again to a relation between regularized models and fuzzy fitness-based GF models.
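This regularization argument admits a quick numerical sketch. Everything in the toy code below is our own illustration, not the construction of [4]: the two models, the ±0.1 widening of the inputs, and the endpoint-sampling shortcut are all assumptions. Crisp inputs are widened into intervals and the resulting interval-valued MSE is computed; the steeper, less regular model gets a wider (less specific) fitness interval.

```python
def interval_sq_error(model, x_lo, x_hi, y):
    """Bracket the squared error of `model` over an interval input by
    sampling the endpoints (exact bounds would need interval arithmetic,
    since a non-monotone error can peak inside the interval)."""
    vals = [(model(x) - y) ** 2 for x in (x_lo, x_hi)]
    return min(vals), max(vals)

def interval_mse(model, data, eps):
    """Interval-valued MSE over crisp pairs (x, y) widened to [x-eps, x+eps]."""
    lo = hi = 0.0
    for x, y in data:
        e_lo, e_hi = interval_sq_error(model, x - eps, x + eps, y)
        lo += e_lo
        hi += e_hi
    return lo / len(data), hi / len(data)

data = [(x / 10, 2 * x / 10) for x in range(11)]  # crisp sample of y = 2x
flat = lambda x: 1.0            # very regular model: constant
steep = lambda x: 20 * x - 9    # less regular model: large slope

w_flat = interval_mse(flat, data, 0.1)
w_steep = interval_mse(steep, data, 0.1)
# The steep model's fitness interval is wider: the imprecision added to x
# is amplified by the slope, so its fitness is less specific.
```

Repeating the same computation at each α-cut of a fuzzy widening would yield a fuzzy-valued fitness instead of an interval.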
Further uses of low quality data
- Imprecise / low quality data: imprecisely measured data; coarsely discretized data; censored data; missing values
- Novel representations: crisp data + tolerance; aggregated values, lists, conflicting data
- Addition of imprecision to crisp data
Fuzzy Random Variables
Three models of a fuzzy random variable:
- Classical model
- Second-order model
- First-order, imprecise-probabilities-based model
[8] Couso, I., Sánchez, L., Higher order models for fuzzy random variables. Fuzzy Sets and Systems 159, 3, 237-258, 2008
[Diagram: an example fuzzy random variable on the sample space {w1, w2, w3}, mapping outcomes to fuzzy subsets of [0, 1] such as the crisp set {1} and the fuzzy set 1/0 + 0.1/1.]
Representation
The first order model has a possibilistic interpretation, coherent with the view of a fuzzy set as a family of confidence intervals
For an event A, the classical model uses a density f(x|w), while the first-order model uses a possibility distribution pi(x|w):

P(A) = integral over A of f(x|w) dx        P(A) <= max over x in A of pi(x|w)
Critical considerations
EAs used in GFS
Simple GAs
The use of novel EAs
Experimental study
Benchmark problems and reproducibility
Lack of experimental statistical analysis
Comparison with the state of the art
(Pipeline: Graphical Data Analysis → Preprocessing → Learning / Model Design → Validation)
Herrera, F. Genetic Fuzzy Systems: Status, Critical Considerations and Future Directions. International Journal of Computational Intelligence Research. 1 (1). 2005. 59-67
Graphical Analysis
PCA, ICA, MDS, for interval and fuzzy data
(Pipeline: Graphical Data Analysis → Preprocessing → Learning / Model Design → Validation)
[9] Sánchez, L., Palacios, A., Suárez, M. R., Couso, I. Graphical exploratory analysis of vague data in the early diagnosis of dyslexia. IPMU 08. Málaga.
Preprocessing
Feature selection (mutual information between frv, and other wrapper / filter methods)
Instance selection
Transformations of set-valued variables
(Pipeline: Graphical Data Analysis → Preprocessing → Learning / Model Design → Validation)
[7] Sanchez, L., Suarez, M. R., Villar, J. R., Couso, I. Some results about mutual information based feature selection and fuzzy discretization of vague data. Intl. Conf. Fuzzy Systems FUZZ-IEEE 2007, pp. 1-6. 2007.
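A minimal sketch of the idea behind mutual-information-based feature ranking for interval data follows. This is our own crude simplification, not the estimator of [7]: each feature gets a bracket on its MI with the class, here obtained by evaluating only the two extreme endpoint discretizations.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI (in nats) between two equally long discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mi_bracket(lo, hi, ys):
    """Crude [min, max] bracket of the MI of an interval-valued feature,
    using only its all-lower and all-upper discretizations; tight bounds
    would range over every per-example endpoint choice."""
    vals = (mutual_information(lo, ys), mutual_information(hi, ys))
    return min(vals), max(vals)

ys = [0, 0, 1, 1]                                      # class labels
stable = mi_bracket([0, 0, 1, 1], [0, 0, 1, 1], ys)    # discretization agrees
shaky = mi_bracket([0, 0, 1, 1], [0, 1, 0, 1], ys)     # endpoints disagree
# `stable` is a degenerate interval; `shaky` is wide, flagging a feature
# whose ranking depends on how the imprecision is resolved.
```

Ranking features by such interval scores leads naturally to the interval-dominance machinery discussed later in the talk.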
Fuzzy fitness-based GFS
New inference mechanisms, that are coherent with the representation of interval or fuzzy data
New measures of fitness (variance of an frv, error of a classifier on imprecise data, etc.)
(Pipeline: Graphical Data Analysis → Preprocessing → Learning / Model Design → Validation)
Sanchez, L., Couso, I., Casillas, J., Genetic Learning of Fuzzy Rules based on Low Quality Data. Submitted to Fuzzy Sets and Systems.
Inference
The reasoning method must be coherent with the possibilistic representation of the data.
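As a hedged sketch of what "coherent" means operationally, the code below uses a generic interval-propagation scheme, not the exact inference of the cited work: an interval-valued observation is pushed through the rule base by ranging over the values inside the interval, so the output is itself an interval. The toy rule base and its parameters are assumptions for illustration.

```python
def infer_interval(system, x_lo, x_hi, steps=100):
    """Approximate the image of [x_lo, x_hi] under `system` with a grid
    search (exact for monotone systems, since the endpoints are included)."""
    xs = [x_lo + (x_hi - x_lo) * i / steps for i in range(steps + 1)]
    outs = [system(x) for x in xs]
    return min(outs), max(outs)

# Toy zero-order TSK rule base with two triangular antecedents on [0, 1]:
# IF x is LOW THEN y = 0;  IF x is HIGH THEN y = 10.
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def tsk(x):
    w_low, w_high = tri(x, -1, 0, 1), tri(x, 0, 1, 2)
    return (w_low * 0.0 + w_high * 10.0) / (w_low + w_high)

lo, hi = infer_interval(tsk, 0.2, 0.4)  # interval input -> interval output
```

Since this toy system is linear on [0, 1], the output interval is simply the image of the input interval, [2, 4].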
Fitness Function
[Example: a fuzzy random variable X with two fuzzy values X(x1), X(x2); the competing definitions VarCl(X), VarKr(X) and VarIm1(X) assign it different variances (e.g. K^2 vs. K^2/3).]
[5] Couso, I., Dubois, D., Montes, S., Sanchez, L., On various definitions of the variance of a fuzzy random variable. 5th International Symposium on Imprecise Probability: Theories and Applications. ISIPTA’07. 16-19 July 2007
The 1st order imprecise model has an interval-valued (not fuzzy) fitness
Optimization of interval-valued fitness functions
GAs and metaheuristics must be able to minimize interval-valued functions.
Optimizing an interval-valued function is analogous to solving a multicriteria problem: the same algorithms (e.g., NSGA-II, SPEA) can be adapted.
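The simplest such adaptation can be sketched as follows. This is our own minimal illustration, covering only the "strong dominance" operator from the next slides, for a single interval-valued objective to be minimized:

```python
def strongly_dominates(a, b):
    """Interval fitness a = (lo, hi) strongly dominates b when every value
    compatible with a is better (lower) than every value compatible with b."""
    return a[1] < b[0]

def nondominated_front(pop):
    """Individuals not strongly dominated by any other: the set kept by a
    non-dominated sorting pass, here over interval fitnesses."""
    return [p for p in pop
            if not any(strongly_dominates(q, p) for q in pop if q is not p)]

pop = [(0.1, 0.3), (0.2, 0.5), (0.6, 0.9)]
front = nondominated_front(pop)
# (0.6, 0.9) is strongly dominated by (0.1, 0.3); the two overlapping
# intervals are incomparable, so both stay in the front.
```

Replacing the scalar comparison of NSGA-II with this relation is what makes the multiobjective machinery applicable to interval fitness.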
NSGA-II for interval-valued fitness
Two components must be generalized: the dominance relation (non-dominated sorting) and the crowding distance.
[2] Sánchez, L. Couso, I. Casillas, J. Modelling Vague Data with Genetic Fuzzy Systems under a Combination of Crisp and Imprecise Criteria, in Proc. of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM'2007), pp. 30--37, Honolulu, Hawaii, USA, April 2007
Dominance
Three precedence operators: strong dominance, a uniform prior, and an imprecise-probabilities-based prior.
Crowding
The Hausdorff distance lies between the minimum and the maximum distance between the individuals.
Validation
Benchmarks with interval and fuzzy data
Statistical tests for comparing samples of fuzzy random variables
(Pipeline: Graphical Data Analysis → Preprocessing → Learning / Model Design → Validation)
Couso, I., Sanchez, L., Mark-recapture techniques in statistical tests for imprecise data. International Journal of Approximate Reasoning (submitted)
[10] Couso, I., Sánchez, L., Defuzzification of fuzzy p-values. Fourth International Workshop on Soft Methods in Probability and Statistics, SMPS'08 (accepted)
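The kind of test involved can be sketched as follows. This is our own crude illustration, not the mark-recapture procedure or the fuzzy p-values of the cited papers: a permutation test run on the lower and on the upper endpoints of two interval-valued samples brackets the p-value of the difference of means.

```python
import random

def perm_test(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference of means: the fraction
    of random relabelings at least as extreme as the observed difference."""
    rng = random.Random(seed)
    obs = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(xs)], pooled[len(xs):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= obs:
            hits += 1
    return hits / n_perm

def interval_p(sample_a, sample_b):
    """[p_lo, p_hi] bracket from the endpoint datasets of two interval
    samples, each given as a list of (lo, hi) pairs."""
    ps = [perm_test([v[k] for v in sample_a], [v[k] for v in sample_b])
          for k in (0, 1)]
    return min(ps), max(ps)

far = interval_p([(0.0, 1.0)] * 5, [(10.0, 11.0)] * 5)   # clearly different
same = interval_p([(5.0, 6.0)] * 5, [(5.0, 6.0)] * 5)    # identical samples
```

A narrow bracket far from the significance level allows a crisp decision; a bracket straddling the level is exactly the situation the defuzzification of fuzzy p-values in [10] addresses.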
Summary of Part I
Lots of open problems. Some to mention:
- Theoretical study of the relations with SVM
- Graphical analysis tools for coarse data (MDS, ICA, PCA, etc.)
- Feature selection, instance selection, transformations
- New inference mechanisms that match the representation of the data
- New measures of fitness for classifiers and models
- Theoretical and practical studies relating MOEAs and the optimization of interval- and fuzzy-valued fitness functions; new MOEAs and MO metaheuristics for solving problems with mixtures of objectives
- Statistical tests (bootstrap, t-test for imprecise data, etc.)
- Benchmarks for comparing the new algorithms
Part II
Some results on the use of Evolutionary Algorithms for extracting knowledge from low quality data
Examples of successful applications
1. Artificial addition of imprecision to crisp data, learning with IRL and GCCL (synthetic datasets)
2. Aggregates of conflicting data, learning with Pittsburgh-style GFS (marketing models)
3. Crisp data + tolerance, evolutionary filtering (GPS trajectories)
4. Interval-valued data, Pittsburgh-style GFS (diagnosis of dyslexia)
5. Preprocessing and evolutionary graphical analysis
1. IRL (Boosting) - Extended Data
The addition of fuzzy imprecision to crisp data, followed by a fuzzy fitness-based GFS, improves the generalization
[Boxplot comparing the test errors of BFT and BMO, on a scale from 0.1 to 0.5.]
[1] L. Sánchez, J. Otero, and J. R. Villar, “Boosting of fuzzy models for high-dimensional imprecise datasets,” in Proc. IPMU 2006, Paris, France, 2006.
1. GCCL - Extended data
GCCL outperforms IRL on low quality data, both in complexity and in accuracy
| Dataset | WM1 | WM2 | WM3 | CH1 | CH2 | CH3 | NIT | LIN | QUA | NEU | TSK | BFT | BMO | GCCL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f1 | 5.65 | 5.73 | 5.57 | 5.82 | 8.90 | 6.93 | 5.63 | 130.5 | 0.00 | 0.35 | 0.095 | 0.327 | 0.30 | 0.14 |
| f1-10 | 6.89 | 7.19 | 6.54 | 6.84 | 10.15 | 8.20 | 7.16 | 133.91 | 1.40 | 1.78 | 1.62 | 1.86 | 1.71 | 1.65 |
| f1-20 | 11.07 | 10.99 | 11.06 | 11.33 | 13.45 | 12.42 | 10.63 | 135.6 | 5.29 | 6.42 | 5.90 | 6.04 | 5.98 | 5.89 |
| f1-50 | 51.78 | 46.40 | 47.80 | 53.48 | 48.94 | 48.16 | 39.65 | 166.64 | 33.53 | 41.18 | 36.76 | 39.62 | 38.66 | 38.66 |
| f2 | 0.41 | 0.48 | 0.45 | 0.40 | 0.59 | 0.45 | 0.43 | 1.54 | 1.61 | 1.48 | 0.15 | 0.24 | 0.26 | 0.25 |
| f2-10 | 0.64 | 0.68 | 0.68 | 0.59 | 0.68 | 0.60 | 0.58 | 1.71 | 1.75 | 1.81 | 0.29 | 0.42 | 0.41 | 0.38 |
| f2-20 | 1.27 | 1.16 | 1.17 | 1.29 | 1.15 | 1.17 | 0.97 | 2.04 | 2.09 | 0.90 | 0.76 | 0.87 | 0.87 | 0.90 |
| f2-50 | 4.34 | 3.98 | 3.94 | 4.47 | 3.90 | 3.97 | 3.59 | 4.67 | 4.78 | 3.76 | 3.62 | 3.67 | 3.72 | 3.70 |

Table 5. Comparative test error of FGCCL on the synthetic datasets f1 and f2 (reproduced from reference [28]). The algorithms include heuristics (WM, CH), statistical regression (LIN, QUA), neural networks (NEU), rules with a real-valued consequent (NIT), TSK rules (TSK), genetic backfitting (BFT) and MOSA-based backfitting (BMO). BFT, BMO and GCCL runs were limited to 25 fuzzy rules. The best of WM, CH, NIT, BFT, BMO and GCCL, plus the best overall model, were highlighted for every dataset.
| Dataset | LIN | NEU | WM | WLS-TSK | BFT | FGCCL |
|---|---|---|---|---|---|---|
| elec (·10^-3) | 419 | 616 | 832 | 485 | 444 | 399 |
| machine-CPU | 6204 | 8955 | 18057 | 7533 | 17857 | 9536 |
| daily-elec | 0.171 | 0.195 | 0.305 | 0.179 | 0.224 | 0.196 |
| Friedman | 7.33 | 1.22 | 7.11 | 1.80 | 2.28 | 1.61 |
| building (·10^3) | 4.77 | 2.76 | 4.44 | 2.68 | 3.15 | 2.98 |

Table 6. Performance of a selection of statistical models (LIN, NEU), heuristic rule learning and GFSs on some benchmarks. The most accurate fuzzy rule learning algorithm is weighted least squares TSK, followed by FGCCL.
| Dataset | Labels | WM | WLS-TSK | BFT | FGCCL |
|---|---|---|---|---|---|
| elec | 3 | 7 | 8 | 10 | 4 |
| machine-CPU | 3 | 20 | 91 | 25 | 4 |
| daily-elec | 3 | 64 | 427 | 25 | 5 |
| Friedman | 3 | 192 | 242 | 25 | 10 |
| building | 3/2* | 789 | 896* | 30 | 20 |

Table 7. The number of fuzzy rules generated by FGCCL is much lower than that produced by either example-guided or grid-guided learning methods; FGCCL improves grid-based learning algorithms by one order of magnitude in the number of parameters. We have used three linguistic labels for all the variables (except the discrete ones). (*) In the dataset "building", the grid-based algorithm depends on partitions of size 2; otherwise, the number of rules is too high for practical purposes.
6.4 Performance of GCCL over machine learning databases

The fourth benchmark uses standard machine learning benchmarks to assess the accuracy and the compactness of the FGCCL algorithm in problems with different sizes and numbers of inputs. To keep the discussion simple, we have chosen a selection of the algorithms in the preceding section. CH1, CH2 and CH3 were discarded because they produced a much higher number of rules than WM, without a noticeable increase in accuracy. Also, since we measure linguistic quality by the number of rules, without taking into account the complexity of the consequent part of the rules, we discard NIT, because it is less accurate than weighted-least-squares-based TSK for the same number of fuzzy rules. Quadratic regression is also removed, because most of the time the corresponding polynomial has more terms than the multilayer perceptron needed to obtain an equivalent accuracy.

Summarizing, linear regression and the neural network are left for reference purposes. Beyond these, we have compared: one example-guided heuristic method (the Wang and Mendel algorithm), one TSK grid-based algorithm (we fitted, with weighted least squares, a plane to each set of elements covered by one cell of the fuzzy grid; the weights are the memberships of the elements to the cell), an IRL method (genetic backfitting) and Fuzzy GCCL. The results are given in Tables 6 and 7 and Figure 5.

As expected, the example-guided heuristic uses a moderately high number of rules and is not very precise (equivalent to or worse than linear regression). The weighted-least-squares fitting of TSK rules is the most precise method, similar to the neural network. Backfitting and FGCCL both perform well and offer a good compromise between compactness and accuracy. FGCCL was significantly better than backfitting
| Dataset | 0% | 1% | 2% | 3% |
|---|---|---|---|---|
| f1 | 0.13 | 0.22 | 1.34 | 5.30 |
| f1-10 | 1.59 | 1.67 | 2.75 | 6.69 |
| f1-20 | 5.98 | 6.08 | 7.17 | 11.05 |
| f1-30 | 14.37 | 14.53 | 15.76 | 19.92 |
| f1-50 | 39.22 | 39.30 | 40.31 | 44.16 |
| f2 | 0.22 | 0.23 | 0.34 | 0.76 |
| f2-10 | 0.35 | 0.37 | 0.50 | 0.93 |
| f2-20 | 0.86 | 0.88 | 1.00 | 1.43 |
| f2-30 | 1.40 | 1.42 | 1.55 | 1.97 |
| f2-50 | 3.69 | 3.71 | 3.83 | 4.25 |
| elec (·10^-3) | 416 | 416 | 416 | 890 |

Table 3. Test error when the input data has tolerances ±1%, ±2% and ±3%. When the observation error is combined with a high slope in the function (function f1 and the "elec" problem), the learning algorithm overtrains; the abnormal values appear in the rightmost columns of those rows.
| Dataset | BFT 1% | FGCCL 1% | BFT 5% | FGCCL 5% | BFT 10% | FGCCL 10% |
|---|---|---|---|---|---|---|
| f1 | 0.89 | 0.35 | 6.64 | 6.25 | 24.82 | 24.77 |
| f1-10 | 2.66 | 1.86 | 9.39 | 8.31 | 29.03 | 28.60 |
| f2 | 0.52 | 0.23 | 0.60 | 0.47 | 1.41 | 1.23 |
| f2-10 | 0.56 | 0.37 | 0.97 | 0.68 | 1.67 | 1.70 |
| elec | 440 | 421 | 581 | 558 | 1003 | 988 |

Table 4. Comparison of FGCCL and backfitting on datasets with 1%, 5% and 10% of interval-valued imprecision in the outputs, tested with crisp data with non-zero-mean observation error. FGCCL improved the results in all the tests for which the observation error was the most relevant source of noise.
The results are shown in Table 4. As expected, GCCL improved the results, almost uniformly, in the tests for which the observation error was the most relevant source of noise.
6.3 Influence of the stochastic noise

The third benchmark checks whether FGCCL can be applied to crisp problems. We have compared the accuracy of FGCCL with the selection of learning algorithms in reference [28]. We have intentionally dropped from that list the real-world datasets (1) and focused on the synthetic datasets, whose degree of contamination with stochastic noise is known. We emphasize that, in this case, we have assumed that the observation error is zero, since none of the algorithms to which we compare ours can use vague data.

The fuzzy rule learning algorithms in reference [28] are: Wang and Mendel with importance degrees 'maximum', 'mean' and 'product of maximum and mean' (WM1, WM2 and WM3, respectively) [36]; the same three versions of Cordón and Herrera's method (CH1, CH2, CH3) [3]; Nozaki, Ishibuchi and Tanaka's (NIT) fuzzy rule learning [22]; TSK rules [34] optimized with weighted least squares and genetic backfitting [27]; and MOSA-based [32] backfitting (BMO) [28]. The same reference includes linear regression (LIN), quadratic regression (QUA) and a conjugate-gradient-trained multilayer perceptron (NEU). These 13 algorithms have been compared with the genetic cooperative-competitive FGCCL.

The best overall result and the most accurate fuzzy rule base are boldfaced in Table 5. We have also included a selection of boxplots in Figure 4, which provide graphical insight into the relevance of the differences between the median, the mean and the variance of the results. Observe that the performance of the heuristic algorithms is always worse than that of the GFSs. There are, in general, no significant differences in accuracy between GFSs, statistical nonlinear regression and neural networks, although there seems to be a small improvement in accuracy of FGCCL over the remaining GFSs; the difference is never statistically significant (95% level). Observe that, had we used the results in the first column of Table 3 (which were obtained from a different set of ten independent runs of the algorithm), the conclusions would be the same.

However, the most important gain, if FGCCL is to be used on crisp datasets, is not the improved accuracy but the compactness of the rule base, as we show in the next section.

(1) The results of FGCCL on the two dropped problems, building and elec, can be found in Table 6. In both cases, FGCCL is better than all the GFSs in reference [28], but the difference is not relevant in this context.
[6] Sanchez, L., Otero, J. Learning fuzzy linguistic models from low quality data by genetic algorithms FUZZ-IEEE 2007, London. pp 1-6, 2007
2. Pittsburgh - Aggregated data (I)
The fuzzy fitness-based GFS improved the accuracy of the crisp version.
2. Pittsburgh - Aggregated data (II)
The precedence operator based on imprecise probabilities is more efficient in the later generations of the GA
[2] Sánchez, L. Couso, I. Casillas, J. Modelling Vague Data with Genetic Fuzzy Systems under a Combination of Crisp and Imprecise Criteria, in Proc. of the 2007 IEEE Symposium on Computational Intelligence in Multicriteria Decision Making (MCDM'2007), pp. 30--37, Honolulu, Hawaii, USA, April 2007
3. Composite data (GPS)
Population-based SA and NSGA-II were used to find the lowest upper bound of the length of a trajectory
[Figures of [3], garbled in this copy: the trajectories used in the experiments, obtained by simulating GPS behavior, and an example of GPS-generated data plotted along with the true trajectory. Some fixes' CEP does not intersect the true trajectory; most of the points of the trajectory are inside the CEP at 95%.]

VII. Experiments and Results

It was decided to evaluate our algorithm on a realistic path that covers the situations usually found when the VTSS test of a taxi is carried out. This includes several turns, accelerations and decelerations, and changes in the number of available satellites and, thus, changes in GDOP and CEP. GPS longitude/latitude coordinates were translated to Universal Transverse Mercator northing/easting coordinates in order to make distance calculations between GPS fixes easier. With this system, points on the Earth's surface are projected onto an equally spaced planar metric grid, therefore the distance between fixes is the usual Euclidean one. The trajectory is sampled each second; at each location, a random number is taken as the number

[Boxplots, garbled in this copy: results from 10 runs of MOSA and NSGA-II without dynamic analysis, on the first and on the second section of the trajectory.]

VIII. Conclusions

During the development of this application we found that, if we report directly the data obtained with GPS equipment, there are legal issues about the uncertainty of the measurements: taxi owners could easily win in court any reclamation where the uncertainty of the GPS measurements was revealed. Nowadays the only system that we can use to certify a taximeter in a VTSS is a GPS device, so we have to modify the point of view of the VTSS test procedure. Because of this, we chose to give as result the upper bound of the trajectory length compatible with the GPS data; in this way there is no doubt that, if the taximeter-reported length is 10% above this measure, then it should be rejected. Additionally, this alternative is less restrictive with the real data, given the biased error detected in the taximeters.

[Table V, garbled in this copy: true length of each trajectory section vs. best and mean results from 10 runs of MOSA and NSGA-II using the dynamic analysis, with 2000 generations and 15 individuals in one population.]
[3] Sanchez, L., Couso, I., Casillas, J. A Multiobjective Genetic Fuzzy System with Imprecise Probability Fitness for Vague Data. Int. Symp. on Evolving Fuzzy Systems (EFS 2006), pp. 131-136, 2006.
4. Feature Selection + Pittsburgh
Dyslexia diagnosis with interval data. The sets of variables selected by fuzzy techniques are uniformly better than those found by crisp techniques
[Boxplot: test errors (0.6 to 1.0) of the feature sets FPMIN, FPMAX, FSMIN, FSMAX, FIMIN, FIMAX, CPMIN, CPMAX, CSMIN, CSMAX, CIMIN and CIMAX.]
5. MDS, Interval & Missing data
MDS analysis for different granularities of the linguistic partition
Figure 5: Left: MDS in input space. Right: MDS in activation space. The use of the activation space instead of the input space gives more insight into the structure of the data, and does not require scaling. The meaning of the colors is: red: 0; green: lower than 1; blue: 1; cyan: 2; magenta: 3; yellow: 4.
Figure 6: Three subsets of 8 variables have been examined. Left: fuzzy mutual information [15]. Center: analysis of variance of linear models, with crisp data. Right: selection of the most relevant tests, according to the human expert. The crisp selection did not take the vagueness into account. The set of variables chosen by the expert does not produce a clear distinction between children with and without dyslexia. The use of fuzzy MI produced reasonable results.
Figure 7: Interval MDS in activation space, with granularities 3, 5 and 7, and the feature set obtained by MI. The projection of the set of granularity 7 does not significantly improve the separability of the data, and 3 labels are not enough for separating the cases; thus the best granularity is 5.

[9] L. Sánchez, J. Otero, and J. R. Villar, "Graphical exploratory analysis of vague data in the early diagnosis of dyslexia," in Proc. IPMU 2008, Málaga, Spain, 2008.
5. Feature selection, fuzzified data
The ranking of the features depends on the linguistic partition of the input variables
Fig. 5. Histograms of the test error over 1000 runs of HEUR2, comparing RELIEF, SSGA, MIFS and FMIFS (each sample is the mean test error in a 10-fold cross validation, repeated 100 times, with random Ruspini partitions of the input variables). From top to bottom: WINE, ION, SONAR, PIMA and GERMAN datasets. Observe that there exist datasets (e.g. ION) for which the density function of the test error in FMIFS is skewed, showing a correlation between the ranking of the features and the membership functions of the input variables.
[7] Sanchez, L., Suarez, M. R., Villar, J. R.; Couso, I. Some Results about Mutual Information-based Feature Selection and Fuzzy Discretization of Vague Data. FUZZ-IEEE 2007, London. pp 1-6, 2007.