Fuzzy Rule-Based Classifier Design
Hisao Ishibuchi, Osaka Prefecture University, Japan

Posted 10-Nov-2015

TRANSCRIPT

  • Fuzzy Rule-Based Classifier Design

    Hisao Ishibuchi, Osaka Prefecture University, Japan

  • Plan of This Presentation

    1. Introduction
       - Focus of the presentation
       - Brief introduction to evolutionary computation
       - Brief introduction to fuzzy rule-based classifiers

    2. Review of Fuzzy Rule-Based System Research
       - Fuzzy rules from human experts
       - Accuracy maximization
       - Interpretability improvement
       - Multiobjective approach

    3. Evolutionary Fuzzy Classifier Design
       - Evolutionary fuzzy rule selection
       - Fuzzy genetics-based machine learning

    4. Current Hot Issues and Future Directions


  • Focus of This Presentation

    Fuzzy Rule-Based Systems
    - Modeling
    - Time Series
    - Control
    - Classification

  • Plan of This Presentation (agenda slide shown again)

  • Basic Idea of Evolutionary Computation

    Environment
    Population
    Individual

  • Basic Idea of Evolutionary Computation (continued)

    Individual / Good Individual (Relatively Good)

    (1) Natural selection in a tough environment.

  • Basic Idea of Evolutionary Computation (continued)

    Individual / Good Individual (Relatively Good) / New Individual

    (1) Natural selection in a tough environment.
    (2) Reproduction of new individuals by crossover and mutation.

  • Basic Idea of Evolutionary Computation (continued)

    (1) Natural selection in a tough environment.
    (2) Reproduction of new individuals by crossover and mutation.
    (3) A large number of iterations of the generation update.

  • Applications of Evolutionary Computation: Design of High-Speed Trains

    Environment / Population / Individual = Design

  • Applications of Evolutionary Computation: Design of Stock Trading Algorithms

    Environment / Population / Individual = Trading Algorithm

  • Design of Rule-Based Systems

    Environment / Population
    Individual = Rule-Based System (a set of if-then rules)

  • Design of Decision Trees

    Environment / Population
    Individual = Decision Tree

  • Design of Neural Networks

    Environment / Population
    Individual = Neural Network

  • Plan of This Presentation (agenda slide shown again)

  • What is a Fuzzy Rule-Based Classifier?

    Q. What is a fuzzy rule-based classifier?
    A. It is a classifier with fuzzy classification rules.

  • What is a Fuzzy Rule-Based Classifier? (continued)

    Q. What is a fuzzy classification rule?
    A. It is an if-then rule with fuzzy conditions in the if-part and
    a consequent class in the then-part.

  • What is a Fuzzy Rule-Based Classifier? (continued)

    Example: High School Students
    Rule 1: If Math Score is high and English Score is high then Class 1
    Rule 2: If Math Score is high and English Score is low then Class 2
    Rule 3: If Math Score is low and English Score is high then Class 2
    Rule 4: If Math Score is low and English Score is low then Class 3

  • What is a Fuzzy Rule-Based Classifier? (continued)

    Class 1: Students with high scores (Good Students)
    Class 2: Students with unbalanced scores (Potentially Good Students)
    Class 3: Students with low scores (Not Good Students)
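    Read as a classifier, the four rules above already form a complete single-winner scheme. A minimal Python sketch (the `high`/`low` membership shapes and the 0-100 score scale are illustrative assumptions, not from the slides):

    ```python
    def low(score):
        # Membership in "low" over a 0-100 score; linear and illustrative.
        return max(0.0, min(1.0, (60.0 - score) / 40.0))

    def high(score):
        # Membership in "high"; complement of "low" here for simplicity.
        return 1.0 - low(score)

    # (math condition, english condition, consequent class), one per rule.
    RULES = [
        (high, high, 1),
        (high, low, 2),
        (low, high, 2),
        (low, low, 3),
    ]

    def classify(math_score, english_score):
        # Single-winner scheme: the rule with the highest compatibility
        # (product of antecedent memberships) decides the class.
        best = max(RULES, key=lambda r: r[0](math_score) * r[1](english_score))
        return best[2]

    print(classify(90, 85))  # both scores high -> Class 1
    print(classify(90, 30))  # unbalanced scores -> Class 2
    print(classify(25, 30))  # both scores low -> Class 3
    ```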

  • What is a Fuzzy Rule-Based Classifier? (continued)

    Q. Why do we use fuzzy rules?
    A. Because linguistic values (e.g., small, large, short, tall,
    bad, good) are frequently used in our everyday life.

  • What is a Fuzzy Rule-Based Classifier? (continued)

    It seems to me that the use of fuzzy rules is a natural
    way to represent our linguistic knowledge.

  • Plan of This Presentation (agenda slide shown again)

  • Brief Review of Fuzzy Systems Research: Short Literature Review using Google Scholar

    Google Scholar search was performed on September 25, 2012.

  • Brief Review of Fuzzy Systems Research: Zadeh's Paper on Fuzzy Sets (1965)

    First paper on fuzzy sets:
    L. A. Zadeh: Fuzzy sets. Information and Control (1965) 37,529 citations

    [Figure: crisp set "Good Students" containing students A, B, C; students D, E, F outside the set]

  • Brief Review of Fuzzy Systems Research: Zadeh's Paper on Fuzzy Sets (1965) (continued)

    [Figure: fuzzy set "Good Students" over students A-F; each student has a
    membership grade between 0.0 and 1.0 (e.g., 1.0, 0.7, 0.5, 0.2, 0.0)]
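    The contrast between a crisp set and a fuzzy set can be sketched directly; the specific membership grades below are illustrative, mirroring the slide's example:

    ```python
    # Crisp set: a student either is or is not a "Good Student".
    crisp_good = {"A", "B", "C"}

    # Fuzzy set (Zadeh, 1965): membership is a degree in [0, 1].
    fuzzy_good = {"A": 1.0, "B": 1.0, "C": 0.7, "D": 0.5, "E": 0.2, "F": 0.0}

    def membership(student):
        """Membership grade of a student in the fuzzy set 'Good Students'."""
        return fuzzy_good.get(student, 0.0)

    # Student C is fully "in" the crisp set but only mostly in the fuzzy one.
    print("C" in crisp_good, membership("C"))  # True 0.7
    ```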

  • Brief Review of Fuzzy Systems Research: Fuzzy Clustering in 1970s

    E. H. Ruspini: Numerical methods for fuzzy clustering. Information Sciences (1970) 389 citations
    J. C. Bezdek: Cluster validity with fuzzy sets. Journal of Cybernetics (1973) 673 citations
    J. C. Bezdek: Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer (1981) 10,565 citations

    [Figure: non-fuzzy clustering vs. fuzzy clustering]


  • Brief Review of Fuzzy Systems Research: Proposal of the Concept of Linguistic Variables (1975)

    L. A. Zadeh: The concept of a linguistic variable and its application to
    approximate reasoning - I. Information Sciences (1975) 6,894 citations

    "I am not tall." (Linguistic variable: Height; linguistic value: Tall)
    "But I am heavy." (Linguistic variable: Weight; linguistic value: Heavy)

    [Figure: Short/Medium/Tall membership functions over Height (cm) with breakpoints at 150, 165, 180]
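    The linguistic variable above can be sketched with triangular membership functions. The 150/165/180 cm breakpoints follow the slide's axis; the exact triangular shape is an assumption:

    ```python
    def triangular(a, b, c):
        """Triangular membership function with peak at b and support [a, c]."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    # Linguistic variable "Height" with three linguistic values.
    height = {
        "short":  triangular(135, 150, 165),
        "medium": triangular(150, 165, 180),
        "tall":   triangular(165, 180, 195),
    }

    # A 172.5 cm person is "medium" and "tall" to equal degrees (0.5 each).
    for label, mu in height.items():
        print(label, mu(172.5))
    ```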

  • Brief Review of Fuzzy Systems Research: Mamdani's Fuzzy Control (1970s)

    E. H. Mamdani, S. Assilian: An experiment in linguistic synthesis with a
    fuzzy logic controller. International Journal of Man-Machine Studies
    (1975) 3,323 citations

    The basic idea is to utilize linguistic if-then rules of human experts:
    If your height is short and your weight is heavy,
    then you should not eat too much.

    [Figure: Light/Medium/Heavy membership functions over Weight (kg) with breakpoints at 50, 65, 80]

  • Brief Review of Fuzzy Systems Research: Fuzzy Boom in Japan (Late 1980s)

    Real-world applications of fuzzy control:
    subway trains, car engines, steel plants, microwaves, container cranes,
    ovens, vacuum cleaners, air conditioners, video cameras, drying machines,
    fan heaters, electric carpets, washing machines, rice cookers.

    More than 200 real-world applications in Japan.

  • Fuzzy Systems in 1980s: Simple and Interpretable Fuzzy Systems

    Fuzzy systems: easy to understand, easy to implement, easy to modify.
    ==> Many Applications

    [Figure: error vs. complexity plane; the interpretable fuzzy systems of
    the 1980s are simple with large error, while accurate fuzzy systems are
    complicated with small error]

  • Plan of This Presentation (agenda slide shown again)

  • Around 1990 in Fuzzy System Research: Fuzzy System Design from Numerical Data

    Fuzzy rule generation from numerical data: moving from the simple,
    interpretable fuzzy systems of the 1980s toward accurate fuzzy systems.

    [Figure: error vs. complexity plane; arrow from the interpretable 1980s
    fuzzy system toward the accurate fuzzy system]

  • Brief Review of Fuzzy Systems Research: Takagi-Sugeno Model (TSK Model, 1985)

    Fuzzy rule generation from numerical data.
    T. Takagi, M. Sugeno: Fuzzy identification of systems and its applications
    to modeling and control. IEEE Trans. on Systems, Man, and Cybernetics
    (1985) 10,278 citations

    Takagi-Sugeno:
    If x1 is small and x2 is small then y = a0 + a1 x1 + a2 x2

    Mamdani:
    If x1 is small and x2 is small then y is large.
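    The difference between the two rule types is the then-part: Mamdani rules conclude with a fuzzy value, while a Takagi-Sugeno rule outputs a crisp linear function of the inputs. A sketch of one TSK rule; the membership shape and the coefficients a0, a1, a2 are illustrative assumptions:

    ```python
    def small(x):
        # Membership in "small" over [0, 1]; linear and illustrative.
        return max(0.0, 1.0 - x)

    def tsk_rule(x1, x2, a0=0.1, a1=0.5, a2=0.3):
        """One Takagi-Sugeno rule:
        If x1 is small and x2 is small then y = a0 + a1*x1 + a2*x2."""
        weight = min(small(x1), small(x2))   # antecedent compatibility
        output = a0 + a1 * x1 + a2 * x2      # crisp linear consequent
        return weight, output

    w, y = tsk_rule(0.2, 0.4)
    print(w, y)  # approximately 0.6 and 0.32
    ```

    In a full TSK system, the overall output is the compatibility-weighted average of the individual rule outputs.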

  • Brief Review of Fuzzy Systems Research: Wang-Mendel's Rule Generation Method (1992)

    Fuzzy rule generation from numerical data.
    L. X. Wang, J. M. Mendel: Generating fuzzy rules by learning from
    examples. IEEE Trans. on Systems, Man and Cybernetics (1992) 1,972 citations

    A single Mamdani-style fuzzy rule is generated from a single data point.
    Data: (x1, x2, y) = (0.4, 0.9, 0.1)
    Generated rule: If x1 is medium and x2 is large then y is small.

    [Figure: small/medium/large fuzzy partitions on input variables x1 and x2 over [0.0, 1.0]]
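    A minimal sketch of the Wang-Mendel idea: for each variable of the data point, pick the linguistic term with the highest membership. A uniform small/medium/large partition of [0, 1] is assumed here; the exact membership shapes are illustrative:

    ```python
    def triangular(a, b, c):
        """Triangular membership function with peak at b and support [a, c]."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    # Uniform three-term partition of [0, 1] (illustrative shapes).
    TERMS = {
        "small":  triangular(-0.5, 0.0, 0.5),
        "medium": triangular(0.0, 0.5, 1.0),
        "large":  triangular(0.5, 1.0, 1.5),
    }

    def best_term(x):
        # Linguistic term with the highest membership at x.
        return max(TERMS, key=lambda t: TERMS[t](x))

    def wang_mendel_rule(x1, x2, y):
        # One Mamdani-style rule per data point, as on the slide.
        return (best_term(x1), best_term(x2), best_term(y))

    print(wang_mendel_rule(0.4, 0.9, 0.1))  # ('medium', 'large', 'small')
    ```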

  • Brief Review of Fuzzy Systems Research: Fuzzy Classification Rule with a Rule Weight (1992)

    Fuzzy rule generation from numerical data.
    H. Ishibuchi, K. Nozaki, H. Tanaka: Distributed representation of fuzzy
    rules and its application to pattern classification. Fuzzy Sets and
    Systems (1992) 351 citations

    Type of fuzzy rules:
    If x1 is small and x2 is small then Class 2 with 0.8

  • Fuzzy Rule Generation

    Consequent class can be specified by training patterns.

    Basic Form:
    If x1 is small and x2 is small then Class 2

    [Figure: Class 1/2/3 training patterns on (x1, x2) in [0,1]^2 with
    small/medium/large fuzzy partitions on each axis]

  • Fuzzy Rule Generation (continued)

    If x1 is small and x2 is medium then Class 2

  • Fuzzy Rule Generation (continued)

    If x1 is small and x2 is large then Class 1

  • Fuzzy Rule Generation (continued)

    . . .
    If x1 is large and x2 is large then Class 3

    High Interpretability: Easy to Understand!

  • Fuzzy Rule Generation (continued)

    Winner Rule: Highest Compatibility

  • Disadvantage: Low Accuracy

    Many patterns are misclassified.

    Winner Rule: Highest Compatibility
    High Interpretability: Easy to Understand!

    [Figure: same fuzzy grid classifier; several training patterns are misclassified]

  • How to Improve the Accuracy?

    Winner Rule: Highest Compatibility
    High Interpretability, Low Accuracy!

    [Figure: same fuzzy grid classifier]

  • Accuracy Improvement: Use of Rule Weight (Certainty Factor)

    Basic Form:
    If x1 is small and x2 is medium then Class 2

    Rule Weight Version:
    If x1 is small and x2 is medium then Class 2 with 0.158

    H. Ishibuchi et al. (1992) Distributed representation of fuzzy rules and
    its application to pattern classification, Fuzzy Sets and Systems.

  • Accuracy Improvement: Use of Rule Weight (Certainty Factor) (continued)

    Rule weight specification:
    - Heuristic manner
    - Learning (NN)
    - Optimization (GA)
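    The slides do not give the heuristic formula, but a common confidence-style certainty factor (an assumption here, not necessarily the paper's exact definition) takes the share of a rule's total compatibility that comes from training patterns of its consequent class:

    ```python
    def rule_weight(compatibilities, labels, target_class):
        """Certainty-factor-style rule weight: the fraction of the rule's
        total compatibility contributed by patterns of the consequent class."""
        total = sum(compatibilities)
        if total == 0.0:
            return 0.0
        on_target = sum(c for c, y in zip(compatibilities, labels)
                        if y == target_class)
        return on_target / total

    # Compatibility of four training patterns with one rule, and their classes.
    mu = [0.5, 0.25, 0.125, 0.125]
    ys = [2, 2, 1, 3]
    print(rule_weight(mu, ys, target_class=2))  # (0.5 + 0.25) / 1.0 = 0.75
    ```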

  • Classification Boundary: Fuzzy Rule-Based Classifier with Rule Weights

    Rule Weight Version:
    If x1 is small and x2 is medium then Class 2 with 0.158

    Winner Rule: Compatibility x Rule Weight

    [Figure: classification boundary produced by the weighted rules]
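    The weighted single-winner scheme can be sketched as follows; the two one-dimensional rules and their membership functions are illustrative:

    ```python
    def classify(rules, x):
        """Single-winner fuzzy classification: the rule with the largest
        (compatibility x rule weight) product decides the class."""
        def score(rule):
            antecedent, weight, _cls = rule
            return antecedent(x) * weight
        return max(rules, key=score)[2]

    # Two competing rules over one input: (membership fn, weight, class).
    rules = [
        (lambda x: max(0.0, 1.0 - abs(x - 0.3) / 0.3), 0.5, 1),  # peak at 0.3
        (lambda x: max(0.0, 1.0 - abs(x - 0.6) / 0.3), 0.9, 2),  # peak at 0.6
    ]

    # At x = 0.45 both rules are equally compatible (0.5 each), so the
    # larger rule weight decides: Class 2 wins.
    print(classify(rules, 0.45))
    ```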

  • Brief Review of Fuzzy Systems Research: Clustering-Based Approach to Classification Problems

    P. K. Simpson: Fuzzy min-max neural networks - I: Classification.
    IEEE Trans. on Neural Networks (1992) 701 citations
    S. Abe, M. S. Lan: A method for fuzzy rules extraction directly from
    numerical data and its application to pattern classification.
    IEEE Trans. on Fuzzy Systems (1995) 270 citations

    Type of fuzzy rules:
    If x1 is A1 and x2 is A2 then Class 2
    Each fuzzy rule has its own antecedent fuzzy sets.

  • Accuracy Improvement: Independent Membership Functions

    High Flexibility: each rule can be generated and adjusted independently.
    ==> High Accuracy

    [Figure: Class 1/2/3 patterns on (x1, x2) with rule-specific membership functions]

  • Potential Difficulty: Overlapping of Membership Functions

    High Flexibility: each rule can be generated and adjusted independently.
    ==> High Accuracy

    Membership functions can be heavily overlapping.
    ==> Poor Interpretability

  • Main Stream in 1990s: Learning of Fuzzy Rule-Based Systems

    Various ideas related to learning and optimization were proposed for
    fuzzy system design in the 1990s.

    Accuracy Maximization:
    - Fuzzy neural learning
    - Genetic fuzzy optimization

    [Figure: error vs. complexity plane; interpretable fuzzy systems (1980s,
    simple, large error) vs. accurate fuzzy systems (1990s, complicated,
    small error)]

  • Brief Review of Fuzzy Systems Research: Fuzzy & Neural Network Hybrid (1990s)

    Learning of fuzzy rule-based systems (1990s): applications of neural
    network learning methods.
    J. S. R. Jang: ANFIS: Adaptive-network-based fuzzy inference system.
    IEEE Trans. on Systems, Man and Cybernetics (1993) 6,510 citations
    C. T. Lin, C. S. G. Lee: Neural-network-based fuzzy logic control and
    decision system. IEEE Trans. on Computers (1991) 1,118 citations


  • Brief Review of Fuzzy Systems Research: Fuzzy & Genetic Algorithm Hybrid (1990s)

    Learning of fuzzy rule-based systems (1990s): applications of genetic algorithms.
    C. L. Karr, E. J. Gentry: Fuzzy control of pH using genetic algorithms.
    IEEE Trans. on Fuzzy Systems (1993) 546 citations

  • Brief Review of Fuzzy Systems Research: Neuro-Fuzzy & Genetic Fuzzy for Pattern Classification

    Learning of fuzzy classification rules (1990s):
    D. Nauck, R. Kruse: A neuro-fuzzy method to learn fuzzy classification
    rules from data. Fuzzy Sets and Systems (1997) 288 citations
    H. Ishibuchi et al.: Performance evaluation of fuzzy classifier systems
    for multidimensional pattern classification problems. IEEE Trans. on
    Systems, Man, and Cybernetics - Part B (1999) 374 citations

  • Accuracy Improvement: Adjustment of Membership Functions

    Adjustment:
    - Learning algorithms (NN)
    - Optimization (GA)
    ==> High Accuracy

    [Figure: Class 1/2/3 patterns on (x1, x2) with adjusted membership functions]

  • Potential Difficulty: How to Interpret Adjusted Membership Functions?

    Adjustment:
    - Learning algorithms (NN)
    - Optimization (GA)
    ==> High Accuracy

    After adjustment:
    ==> Poor Interpretability

    [Figure: membership functions after adjustment]

  • Difficulty in the Learning of Fuzzy Systems

    Various ideas related to learning and optimization were proposed for
    fuzzy system design in the 1990s.

    Difficulty: some researchers noticed that the interpretability was
    degraded while the accuracy was improved.

    [Figure: error vs. complexity plane; interpretable fuzzy systems (1980s)
    vs. accurate fuzzy systems (1990s)]

  • Attempts around 2000: Accuracy Maximization and Complexity Minimization

    Accuracy maximization combined with interpretability improvement.

    [Figure: error vs. complexity plane; accuracy maximization moves from the
    interpretable 1980s systems toward the accurate 1990s systems, while
    interpretability improvement moves back toward simpler systems]

  • Plan of This Presentation (agenda slide shown again)

  • Complexity Minimization: Merging Similar Membership Functions

    Similar Membership Functions

    [Figure: Class 1/2/3 patterns on (x1, x2) with two similar membership
    functions on one axis]

  • Complexity Minimization: Merging Similar Membership Functions (continued)

    Similar membership functions ==> one membership function
    - Interpretability is improved.
    - Accuracy is degraded.
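    One common way to decide "similar" (an assumption here; the slides do not specify the measure) is a Jaccard-style overlap of the two membership functions, merging by averaging their parameters:

    ```python
    def tri(a, b, c):
        """Triangular membership function with peak at b and support [a, c]."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    def similarity(p, q, n=1000):
        """Jaccard-style similarity of two triangular MFs given as (a, b, c)
        tuples, estimated on a sampled grid over [0, 1]."""
        mu1, mu2 = tri(*p), tri(*q)
        xs = [i / n for i in range(n + 1)]
        inter = sum(min(mu1(x), mu2(x)) for x in xs)
        union = sum(max(mu1(x), mu2(x)) for x in xs)
        return inter / union if union else 0.0

    def merge_if_similar(p, q, threshold=0.6):
        # Merge two MFs into one by parameter averaging, or keep them apart.
        if similarity(p, q) < threshold:
            return None
        return tuple((u + v) / 2 for u, v in zip(p, q))

    print(merge_if_similar((0.0, 0.5, 1.0), (0.05, 0.55, 1.05)))  # merged
    print(merge_if_similar((0.0, 0.3, 0.6), (0.4, 0.7, 1.0)))     # None
    ```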

  • Brief Review of Fuzzy Systems Research: Single-Objective Formulation of Fuzzy System Design

    Accuracy maximization & complexity minimization (around 2000):
    M. Setnes, R. Babuska, H. B. Verbruggen: Rule-based modeling: Precision
    and transparency. IEEE Trans. on Systems, Man and Cybernetics - Part C
    (1998) 249 citations
    M. Setnes, H. Roubos: GA-fuzzy modeling and classification: Complexity
    and performance. IEEE Trans. on Fuzzy Systems (2000) 396 citations

    Simplification of tuned membership functions.

  • Weighted Sum of Accuracy and Complexity

    Fitness(S) = w1 Accuracy(S) - w2 Complexity(S)

    Accuracy(S): the number of correctly classified training patterns
    Complexity(S): the number of fuzzy rules

    1st term: accuracy maximization. 2nd term: complexity minimization.

    H. Ishibuchi et al.: Selecting fuzzy if-then rules for classification
    problems using genetic algorithms. IEEE Trans. on Fuzzy Systems (1995)
    552 citations
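    The fitness above can be written down directly. A sketch with a toy single-winner classifier; the weight values w1, w2 and the one-dimensional rules and data are illustrative:

    ```python
    def classify(rules, x):
        # Single-winner: the rule with the highest compatibility decides.
        return max(rules, key=lambda r: r[0](x))[1]

    def fitness(rules, data, w1=10.0, w2=1.0):
        """Fitness(S) = w1 * Accuracy(S) - w2 * Complexity(S), where
        Accuracy(S) is the number of correctly classified training patterns
        and Complexity(S) is the number of rules in S."""
        correct = sum(1 for x, y in data if classify(rules, x) == y)
        return w1 * correct - w2 * len(rules)

    # Toy 1-D rule set: "small" -> Class 1, "large" -> Class 2.
    rules = [
        (lambda x: max(0.0, 1.0 - x), 1),
        (lambda x: max(0.0, x), 2),
    ]
    data = [(0.1, 1), (0.2, 1), (0.8, 2), (0.9, 2), (0.45, 2)]

    # 4 of 5 patterns are correct: 10.0 * 4 - 1.0 * 2 = 38.0
    print(fitness(rules, data))
    ```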

  • Difficulty in Weighted Sum Approach: Sensitivity to Weight Specifications

    Minimize w1 Error + w2 Complexity

    When the weight for the complexity minimization is large,
    a simple system is obtained.

    [Figure: training and test error curves over complexity, with the best
    complexity marked S*]

  • Difficulty in Weighted Sum Approach: Sensitivity to Weight Specifications (continued)

    When the weight for the error minimization is large,
    a complicated system is obtained.

  • Difficulty in Weighted Sum Approach: Sensitivity to Weight Specifications (continued)

    When the two weights are appropriately specified, a good system is
    obtained. But the best complexity is not always found.

  • Multiobjective Fuzzy System Design

    Basic Idea:
    To search for a number of non-dominated fuzzy systems with respect to
    the accuracy maximization and the interpretability maximization
    (instead of searching for a single fuzzy system).

    Aggregation Approach:
    f(S) = w1 f_Error(S) + w2 f_Complexity(S)

    Multiobjective Approach:
    Minimize { f_Error(S), f_Complexity(S) }
    Search for Pareto-optimal fuzzy rule-based systems.
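    The multiobjective alternative replaces the scalar fitness with Pareto dominance. A minimal sketch of non-dominated filtering over (error, complexity) pairs; the candidate values are toy data:

    ```python
    def dominates(a, b):
        """a dominates b if a is no worse in every objective and strictly
        better in at least one (both objectives are minimized)."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_front(candidates):
        # Keep only the non-dominated (error, complexity) pairs.
        return [c for c in candidates
                if not any(dominates(o, c) for o in candidates if o != c)]

    # (error, number_of_rules) for five candidate fuzzy classifiers.
    systems = [(0.30, 3), (0.20, 5), (0.25, 5), (0.10, 12), (0.35, 4)]
    print(pareto_front(systems))  # [(0.3, 3), (0.2, 5), (0.1, 12)]
    ```

    A multiobjective search (e.g., an NSGA-II-style evolutionary algorithm) then returns this whole trade-off front rather than one weighted-sum optimum.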

  • Multiobjective Approach

    Minimize (Error, Complexity)

    [Figure: multiobjective search finds trade-off solutions along the
    training and test error curves over complexity]

  • Plan of This Presentation (agenda slide shown again)

  • Multiobjective Approach (Pittsburgh): Various Fuzzy Systems along the Tradeoff Front

    A number of non-dominated fuzzy systems were found along the tradeoff surface.

    [Figure: error vs. complexity plane; the non-dominated systems found
    around 2000 lie on a good tradeoff between the interpretable 1980s
    systems and the accurate 1990s systems]

  • Multi-Objective Formulations

    Two-Objective Formulation (1997)
        f1(S): To minimize the error (accuracy maximization)
        f2(S): To minimize the number of fuzzy rules (complexity minimization)
    H. Ishibuchi et al. (1997) Single-objective and two-objective genetic
    algorithms for selecting linguistic rules for pattern classification
    problems, Fuzzy Sets and Systems.

    Three-Objective Formulation (2001)
        f3(S): To minimize the total number of antecedent conditions
               (i.e., to minimize the total rule length)
    H. Ishibuchi et al. (2001) Three-objective genetics-based machine
    learning for linguistic rule extraction, Information Sciences.
    H. Ishibuchi and Y. Nojima (2007) Analysis of interpretability-accuracy
    tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine
    learning, International Journal of Approximate Reasoning.
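A small sketch of how the three objectives of the 2001 formulation could be computed for a rule set S, assuming rules are stored as (antecedent, class) pairs with None marking a "don't care" condition; the rules, predictions, and labels below are hypothetical.

```python
def objectives(rule_set, predictions, labels):
    """Three objectives of the three-objective formulation:
    f1(S): error rate of the rule set's predictions (minimize),
    f2(S): number of fuzzy rules (minimize),
    f3(S): total rule length, i.e. the total number of antecedent
           conditions that are not "don't care" (minimize)."""
    f1 = sum(p != y for p, y in zip(predictions, labels)) / len(labels)
    f2 = len(rule_set)
    f3 = sum(sum(a is not None for a in antecedent)
             for antecedent, _cls in rule_set)
    return f1, f2, f3

# Two hypothetical rules over four inputs; None = "don't care".
rules = [((3, None, None, 7), 1), ((None, None, 5, None), 2)]
print(objectives(rules, predictions=[1, 2, 1, 1], labels=[1, 2, 2, 1]))
# (0.25, 2, 3)
```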

  • Brief Review of Fuzzy Systems Research: Multi-Objective Formulation of Fuzzy System Design

    Two-Objective Formulation
    H. Ishibuchi, T. Murata, I. B. Türkşen: Single-objective and
    two-objective genetic algorithms for selecting linguistic rules for
    pattern classification problems. Fuzzy Sets and Systems (1997).
    257 citations.

  • Brief Review of Fuzzy Systems Research: Multi-Objective Formulation of Fuzzy System Design

    Three-Objective Formulation
    H. Ishibuchi, T. Nakashima, T. Murata: Three-objective genetics-based
    machine learning for linguistic rule extraction. Information Sciences
    (2001). 210 citations.
    H. Ishibuchi, T. Yamamoto: Fuzzy rule selection by multi-objective
    genetic local search algorithms and rule evaluation measures in data
    mining. Fuzzy Sets and Systems (2004). 250 citations.
    H. Ishibuchi, Y. Nojima: Analysis of interpretability-accuracy tradeoff
    of fuzzy systems by multiobjective fuzzy genetics-based machine
    learning. International Journal of Approximate Reasoning (2007).
    177 citations.

  • Experimental Results: Obtained Rule Sets

    Multiobjective Rule Selection for the Glass Data Set

    Minimize f1(S), f2(S) and f3(S):
        f1(S): error rate, f2(S): number of rules, f3(S): total rule length

    [Figure: two plots of error rate (20-60%) against the number of rules
    (2-12), one for training data accuracy and one for testing data
    accuracy.]

  • Experimental Results: Obtained Rule Sets

    Multiobjective Rule Selection for the Glass Data Set

    [Figure: the same training and testing data accuracy plots as above.]

    The obtained rule sets help us to find the optimal complexity of fuzzy
    systems (rule sets with five rules may be good).

  • A Rule Set with High Interpretability

    A very simple rule set with two rules:
        R1: conditions on x1, x3, x4 (others "don't care") => Class 1 (0.44)
        R2: all conditions "don't care"                    => Class 2 (0.24)

    [Figure: the position of this rule set on the testing data accuracy plot
    (error rate 20-60% vs. number of rules 2-12).]

  • A Rule Set with High Generalization Ability

    The rule set with the highest test data accuracy: five rules R1-R5, each
    with a few conditions on x1-x8 and many "don't care" antecedents.
        Consequents: Class 1 (0.44), Class 2 (0.36), Class 2 (0.43),
        Class 4 (0.37), Class 6 (0.74).

    [Figure: the position of this rule set on the testing data accuracy plot.]

  • A Rule Set with High Complexity

    The rule set with the highest training data accuracy: eleven rules
    R1-R11 with many antecedent conditions on x1-x9.
        Consequents: Class 1 (0.42), Class 1 (0.26), Class 2 (0.85),
        Class 2 (0.67), Class 2 (0.36), Class 2 (0.24), Class 4 (0.93),
        Class 4 (0.37), Class 5 (0.27), Class 6 (1.00), Class 6 (0.88).

    [Figure: the position of this rule set on the training data accuracy plot.]

  • Plan of This Presentation

    1. Introduction
       - Focus of the presentation
       - Brief introduction to evolutionary computation
       - Brief introduction to fuzzy rule-based classifiers
    2. Review of Fuzzy Rule-Based System Research
       - Fuzzy rules from human experts
       - Accuracy maximization
       - Interpretability improvement
       - Multiobjective approach
    3. Evolutionary Fuzzy Classifier Design
       - Evolutionary fuzzy rule selection
       - Fuzzy genetics-based machine learning
    4. Current Hot Issues and Future Directions

  • General Case: Fuzzy Rules with n Inputs

    Rule Rq: If x1 is Aq1 and ... and xn is Aqn
             then Class Cq with CFq

    Antecedent Fuzzy Sets: 15 fuzzy sets (including "don't care")
    - "don't care" (fully compatible with any input value), and
    - 14 triangular fuzzy sets: uniform partitions of the unit interval
      into 2 (sets 1-2), 3 (sets 3-5), 4 (sets 6-9) and 5 (sets a-e)
      fuzzy sets.
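The 14 triangular antecedent fuzzy sets can be sketched in Python, assuming the standard uniform partitions of [0, 1] shown on the slide ("don't care" would simply return a membership of 1 everywhere); this is an illustrative sketch, not code from the presentation.

```python
def membership(x, granularity, index):
    """Membership of x in the index-th fuzzy set (0-based) of a uniform
    triangular partition of [0, 1] into `granularity` fuzzy sets.

    Peaks are evenly spaced; each triangle reaches zero at the peaks of
    its neighbors, so adjacent memberships sum to 1 inside [0, 1]."""
    peak = index / (granularity - 1)       # position of the triangle's peak
    width = 1.0 / (granularity - 1)        # distance between adjacent peaks
    return max(0.0, 1.0 - abs(x - peak) / width)

# The 14 antecedent fuzzy sets: granularities 2, 3, 4 and 5 give
# 2 + 3 + 4 + 5 = 14 fuzzy sets (plus "don't care" makes 15 per input).
partitions = [(g, i) for g in (2, 3, 4, 5) for i in range(g)]
print(len(partitions))        # 14
print(membership(0.5, 2, 0))  # 0.5: halfway between the two coarse sets
```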

  • Possible Fuzzy Rules with n Inputs

    Total number of possible fuzzy rules: 15^n

        If x1 is (one of 15 sets, incl. "don't care") and ...
           and xn is (one of 15 sets) then ...
        => 15 x 15 x ... x 15 = 15^n rules

  • Design of Fuzzy Rule-Based Classifier

    Possible fuzzy rules: 15^n rules
    ==> Combinatorial optimization
    ==> Evolutionary computation

    Our Approaches
    - Genetic rule selection approach: a small number of fuzzy rules
      (an accurate and interpretable fuzzy system)

  • Genetic Rule Selection: Two-Step Approach

    Possible fuzzy rules: 15^n rules

    Step 1: Rule Extraction (Data Mining)
        ==> candidate fuzzy rules (e.g., 1,000 rules)
    Step 2: Rule Selection (Optimization)
        ==> a small number of fuzzy rules (e.g., 10 rules)

  • Genetic Rule Selection: Step 1: Rule Extraction (Data Mining)

    Candidate rules ("If x is ... then Class c") are extracted from
    numerical data using a data mining technique based on support and
    confidence.

    H. Ishibuchi, T. Yamamoto (2004) Fuzzy rule selection by
    multi-objective genetic local search algorithms and rule evaluation
    measures in data mining, Fuzzy Sets and Systems.
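A sketch of fuzzy support and confidence for a candidate rule, using a common fuzzy generalization in which compatibility grades replace crisp counts; the grades below are hypothetical, and the exact definitions in the cited paper may differ in detail.

```python
def support_confidence(compat, labels, target_class):
    """Fuzzy support and confidence of the rule "Aq => target_class".

    compat[i] is the compatibility grade of training pattern i with the
    antecedent Aq (e.g., the product of its antecedent membership values).
    Support: compatible mass of the target class over all patterns.
    Confidence: that mass over the total compatible mass of all classes."""
    total = sum(compat)
    hit = sum(c for c, y in zip(compat, labels) if y == target_class)
    support = hit / len(labels)
    confidence = hit / total if total > 0 else 0.0
    return support, confidence

# Hypothetical compatibility grades of four training patterns.
s, c = support_confidence([0.8, 0.2, 0.5, 0.0], [1, 2, 1, 2], target_class=1)
print(s, c)  # 0.325 0.8666...
```

Rules whose support and confidence exceed given thresholds would become the candidate pool for Step 2.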

  • Genetic Rule Selection: Step 2: Rule Selection (Optimization)

    A subset of the candidate rules is coded as a binary string
    (phenotype = rule subset, genotype = binary string):
        1: inclusion of the rule
        0: exclusion of the rule
    The string length is the same as the number of candidate rules.
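The genotype-to-phenotype mapping can be sketched as follows (the rule labels are placeholders for actual candidate fuzzy rules):

```python
def decode(bits, candidate_rules):
    """Map a binary string to the subset of candidate rules it selects:
    1 = inclusion of the rule, 0 = exclusion of the rule.
    The string length equals the number of candidate rules."""
    return [rule for bit, rule in zip(bits, candidate_rules) if bit == 1]

# Hypothetical pool of five candidate rules (here just labels).
pool = ["R1", "R2", "R3", "R4", "R5"]
print(decode([1, 0, 0, 1, 1], pool))  # ['R1', 'R4', 'R5']
```

A genetic algorithm then evolves such binary strings, evaluating each decoded rule subset against the chosen objectives.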

  • Plan of This Presentation

    1. Introduction
       - Focus of the presentation
       - Brief introduction to evolutionary computation
       - Brief introduction to fuzzy rule-based classifiers
    2. Review of Fuzzy Rule-Based System Research
       - Fuzzy rules from human experts
       - Accuracy maximization
       - Interpretability improvement
       - Multiobjective approach
    3. Evolutionary Fuzzy Classifier Design
       - Evolutionary fuzzy rule selection
       - Fuzzy genetics-based machine learning
    4. Current Hot Issues and Future Directions

  • Genetics-Based Machine Learning (for a classification problem with n inputs)

    A different approach from rule selection: a genetic algorithm for rule
    generation.

    Rule Rq: If x1 is Aq1 and ... and xn is Aqn
             then Class Cq with CFq

    Coding of the antecedent part of each rule:
        Aq1 Aq2 ... Aqn  (an integer string of length n)

  • Genetics-Based Machine Learning (for a classification problem with n inputs)

    Coding of each rule: Aq1 Aq2 ... Aqn
    Each position takes one of 15 values ("don't care" or one of the 14
    fuzzy sets), so 15 x 15 x ... x 15 = 15^n rules are possible.

  • Genetics-Based Machine Learning (for a classification problem with n inputs)

    Rule Rq: If x1 is Aq1 and ... and xn is Aqn
             then Class Cq with CFq

    Coding of each rule: an integer string of length n, where each symbol
    is 0 ("don't care") or one of the 14 fuzzy sets labeled 1-9 and a-e
    (granularity 2: sets 1-2, granularity 3: sets 3-5, granularity 4:
    sets 6-9, granularity 5: sets a-e).

  • Genetics-Based Machine Learning (for a classification problem with n inputs)

    Rule Rq: If x1 is Aq1 and ... and xn is Aqn
             then Class Cq with CFq

    Single rule: Aq1 Aq2 ... Aqn
    Rule set: { A11 A12 ... A1n, A21 A22 ... A2n, ..., AN1 AN2 ... ANn }

    Two Genetics-Based Machine Learning Approaches
        Pittsburgh approach: individual = rule set
        Michigan approach: individual = rule

  • Crossover in Pittsburgh Approach

    Individual = rule set (seven-input case: each rule is an integer string
    of length seven).

    An offspring rule set is generated by randomly selecting rules from the
    two parent rule sets.

    [Figure: Parent 1 and Parent 2, each a set of integer-string rules; the
    offspring contains rules randomly chosen from both parents.]

  • Mutation in Pittsburgh Approach

    Individual = rule set (seven-input case).

    Mutation: random change of a part of each string (a few antecedent
    symbols of the offspring's rules are replaced by random symbols).

    [Figure: an offspring rule set before and after mutation of several
    antecedent symbols.]
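The two Pittsburgh-style operators can be sketched as follows, assuming each rule is a length-7 string over the 15 antecedent symbols of the slides; the parent rule sets and the mutation rate are hypothetical.

```python
import random

ALLELES = "0123456789abcde"  # 0 = "don't care", 1-9 and a-e = 14 fuzzy sets

def crossover(parent1, parent2, rng):
    """Pittsburgh-style crossover: build an offspring rule set by randomly
    selecting rules from the two parent rule sets."""
    pool = parent1 + parent2
    size = rng.randint(min(len(parent1), len(parent2)),
                       max(len(parent1), len(parent2)))
    return rng.sample(pool, size)

def mutate(rule_set, rng, rate=0.1):
    """Random change of a part of each string: every antecedent symbol is
    replaced by a random symbol with probability `rate`."""
    return ["".join(rng.choice(ALLELES) if rng.random() < rate else symbol
                    for symbol in rule) for rule in rule_set]

rng = random.Random(0)
parent1 = ["0008000", "900d000"]
parent2 = ["0030000", "00b0005", "c000009"]
child = mutate(crossover(parent1, parent2, rng), rng)
```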

  • Michigan Approach (Seven-Input Case)

    Individual = rule (population = rule set). Each individual is a single
    fuzzy rule; the current population forms the rule set.

    H. Ishibuchi et al. (1999) Performance evaluation of fuzzy classifier
    systems for multi-dimensional pattern classification problems, IEEE
    Trans. on Systems, Man, and Cybernetics - Part B.
    H. Ishibuchi et al. (2005) Hybridization of fuzzy GBML approaches for
    pattern classification problems, IEEE Trans. on Systems, Man, and
    Cybernetics - Part B: Cybernetics.

  • Michigan Approach (Seven-Input Case)

    Individual = rule (population = rule set). The next population is
    generated as follows:
    - A number of fuzzy rules in the current population are used in the
      next population with no changes (elitism).
    - Some rules are generated from misclassified patterns by heuristic
      rule generation.
    - The other rules are generated by genetic operators.

    [Figure: a current population of integer-string rules and the next
    population produced by these three mechanisms.]
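The generation update can be sketched in Python; the fitness function, the misclassified-pattern list, and the `make_rule` helper are all hypothetical stand-ins (a real implementation would build a heuristic rule from the pattern's attribute values and apply the crossover/mutation of the previous slides).

```python
import random

def next_population(rules, fitness, misclassified, make_rule, rng,
                    n_elite=2, n_heur=1):
    """One Michigan-style generation step (a sketch):
    - keep the n_elite fittest rules unchanged (elitism),
    - create n_heur rules from misclassified patterns (heuristic
      rule generation),
    - fill the rest with rules produced by genetic operators (here a
      uniform crossover of two randomly chosen parent rules)."""
    ranked = sorted(rules, key=fitness, reverse=True)
    new = ranked[:n_elite]
    new += [make_rule(p) for p in rng.sample(misclassified, n_heur)]
    while len(new) < len(rules):
        a, b = rng.sample(rules, 2)
        new.append("".join(rng.choice(pair) for pair in zip(a, b)))
    return new

rng = random.Random(1)
pop = ["0008000", "900d000", "00b0005", "c000009"]
new = next_population(pop,
                      fitness=lambda r: r.count("0"),   # hypothetical
                      misclassified=[(0.2, 0.9)],       # hypothetical
                      make_rule=lambda p: "0e00900",    # hypothetical
                      rng=rng)
```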

  • Plan of This Presentation

    1. Introduction
       - Focus of the presentation
       - Brief introduction to evolutionary computation
       - Brief introduction to fuzzy rule-based classifiers
    2. Review of Fuzzy Rule-Based System Research
       - Fuzzy rules from human experts
       - Accuracy maximization
       - Interpretability improvement
       - Multiobjective approach
    3. Evolutionary Fuzzy Classifier Design
       - Evolutionary fuzzy rule selection
       - Fuzzy genetics-based machine learning
    4. Current Hot Issues and Future Directions

  • Interpretability of Fuzzy Systems

    Interpretability maximization = complexity minimization (?)
    - Minimization of the number of fuzzy rules
    - Minimization of the number of antecedent conditions

    Many other factors are related to the interpretability:
    - Granularity of membership functions
    - Shape of membership functions
    - Overlap of membership functions, and many others.

    H. Ishibuchi et al. (2011) Design of linguistically interpretable fuzzy
    rule-based classifiers: A short review and open questions, Journal of
    Multiple-Valued Logic and Soft Computing.

  • Current Hot Issues and Future Directions: Fuzzy Classifiers on Various Problems

    We have a lot of different types of classification problems where fuzzy
    rule-based classifiers have not often been used but have a large
    potential usefulness:
    1. Large data sets (e.g., 1,000,000 patterns)
    2. Imbalanced data sets
       (e.g., Class 1: 100,000 patterns; Class 2: 100 patterns)
    3. Semi-supervised learning
       (e.g., data set = 100 labeled patterns + 9,900 unlabeled)
    4. On-line learning (e.g., new training patterns come every minute)
    5. ...

  • Current Hot Issues and Future Directions: Fuzzy Data Mining (Handling of Large Data)

  • Current Hot Issues and Future Directions: Fuzzy Classifiers on Various Problems

    1. Large data sets ==> difficult for GA-based approaches
       (e.g., 1,000,000 patterns)
    (2.-5. as listed before: imbalanced data sets, semi-supervised
    learning, on-line learning, ...)

  • Design of Rule-Based Systems

    [Figure: a population of individuals, each a complete rule-based system
    (a set of "If ... Then ..." rules), evolving in an environment.]

  • Difficulty in Applications to Large Data: Computation Load for Fitness Evaluation

    Fitness evaluation: each individual (a rule-based system) is evaluated
    using the given training data. With large training data, this becomes
    the dominant computation load.

  • Difficulty in Applications to Large Data: Standard Non-Parallel Model

    [Figure: in the standard model, each individual in the population is
    evaluated in turn against the entire training data set.]

  • Parallel Fitness Evaluation

    The population is divided among multiple CPUs for fitness evaluation.
    If we use n CPUs, the computation load for each CPU can be 1/n in
    comparison with the case of a single CPU (e.g., 25% with four CPUs).
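The population split can be sketched with Python's standard executors; a process pool would be the usual choice for CPU-bound fitness functions, but a thread pool keeps this sketch self-contained and directly runnable. The fitness function below is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(population, fitness, n_workers=4):
    """Evaluate all individuals concurrently: with n workers, each one
    handles roughly 1/n of the population (25% each for four workers)."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))

# Hypothetical fitness: here simply the number of "don't care" symbols.
population = ["0008000", "900d000", "00b0005", "c000009"]
print(evaluate_population(population, lambda rule: rule.count("0")))
# [6, 5, 5, 5]
```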

  • Training Data Reduction

    Fitness is evaluated on a subset of the training data.
    If we use x% of the training data, the computation load can be reduced
    to x% in comparison with the use of all the training data.

  • Difficulty

    How to choose a training data subset: the population will overfit to
    the selected training data subset.

  • Idea of Windowing

    J. Bacardit et al.: Speeding-up Pittsburgh learning classifier systems:
    Modeling time and accuracy. PPSN 2004.

    Training data are divided into data subsets.

  • Idea of Windowing

    At each generation, a different data subset (i.e., a different window)
    is used for fitness evaluation: the 1st generation uses one subset, the
    2nd generation another, and so on.
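A minimal sketch of windowing, assuming a simple round-robin schedule over disjoint subsets (the cited paper also studies other windowing schedules); the data below are hypothetical.

```python
def windows(training_data, n_subsets):
    """Divide the training data into n_subsets disjoint windows
    (here by striding, so class proportions stay roughly balanced)."""
    return [training_data[i::n_subsets] for i in range(n_subsets)]

def window_for_generation(subsets, generation):
    """At each generation a different window is used for fitness
    evaluation, cycling so that every pattern is seen regularly."""
    return subsets[generation % len(subsets)]

data = list(range(10))           # ten hypothetical training patterns
subs = windows(data, 3)
print(subs)                      # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(window_for_generation(subs, 4))  # [1, 4, 7]
```

Each generation's fitness is thus computed on only one window, cutting the evaluation cost while the moving window discourages overfitting to any single subset.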

  • Visual Image of Windowing
    A population of rule sets is moving around in the training data.
    Training Data = Environment

    After enough evolution with a moving window:
    - The population does not overfit to any particular training data subset.
    - The population may have high generalization ability.
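The windowing idea above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation; `evaluate` and `vary` (selection, crossover, mutation) are hypothetical placeholders for the usual Pittsburgh-style GBML operators.

```python
import random

def evolve_with_windowing(population, training_data, n_windows,
                          n_generations, evaluate, vary):
    """Pittsburgh-style GBML loop with windowing: the fitness of each
    rule set is computed on a different data subset at each generation."""
    data = list(training_data)
    random.shuffle(data)
    # Split the training data into n_windows roughly equal subsets.
    windows = [data[i::n_windows] for i in range(n_windows)]
    for gen in range(n_generations):
        # Rotate the window: a different subset is used at each generation,
        # so the population cannot overfit any single subset.
        window = windows[gen % n_windows]
        fitness = [evaluate(rule_set, window) for rule_set in population]
        population = vary(population, fitness)  # selection, crossover, mutation
    return population
```

Because each generation touches only 1/n_windows of the data, fitness evaluation is correspondingly cheaper per generation.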

  • Our Idea: Parallel Distributed Implementation
    H. Ishibuchi et al.: Parallel Distributed Hybrid Fuzzy GBML Models with Rule Set Migration and Training Data Rotation. IEEE Trans. on Fuzzy Systems (in press).

    Non-parallel non-distributed model: a single CPU holds the whole population and the whole training data.

    Our parallel distributed model (multiple CPUs):
    (1) A population is divided into multiple subpopulations (as in an island model).
    (2) Training data are also divided into multiple subsets (as in the windowing method).
    (3) An evolutionary algorithm is locally performed at each CPU.
    (4) Training data subsets are periodically rotated (e.g., every 100 generations).
    (5) Rule set migration is also periodically performed.
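As an illustration, the five steps above can be sketched as follows. The islands are run sequentially here for simplicity (in the actual model each subpopulation sits on its own CPU), and `evaluate`, `vary`, and `migrate` are hypothetical placeholders for the local evolutionary operators.

```python
def parallel_distributed_gbml(population, training_data, n_islands,
                              n_generations, interval,
                              evaluate, vary, migrate):
    # (1) Divide the population into subpopulations (island model).
    size = len(population) // n_islands
    islands = [population[i * size:(i + 1) * size] for i in range(n_islands)]
    # (2) Divide the training data into the same number of subsets.
    subsets = [training_data[i::n_islands] for i in range(n_islands)]
    for gen in range(1, n_generations + 1):
        # (3) One generation of a local evolutionary algorithm per island,
        # evaluated only on that island's current data subset.
        for i in range(n_islands):
            fitness = [evaluate(rs, subsets[i]) for rs in islands[i]]
            islands[i] = vary(islands[i], fitness)
        if gen % interval == 0:
            # (4) Rotate the training data subsets among the islands.
            subsets = subsets[-1:] + subsets[:-1]
            # (5) Migrate rule sets between subpopulations.
            islands = migrate(islands)
    return [rs for island in islands for rs in island]
```

The rotation in step (4) means every island eventually sees every data subset, which is what keeps the subpopulations from overfitting their local data.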

  • Computational Experiments
    The following two models are compared.
    Standard Model: evaluation of all rule sets using all the training data.
    Our Model with Seven CPUs: evaluation of 1/7 of the rule sets using one of the seven data subsets, with rule set migration and training data rotation.

  • Standard Non-Parallel Non-Distributed Model with a Single Population and a Single Data Set
    Single CPU
    Single population of size 210
    Whole training data set
    Termination condition: 50,000 generations
    Computation load: 210 x 50,000 = 10,500,000 evaluations (more than ten million evaluations)

  • Our Model in Computational Experiments with Seven Subpopulations and Seven Data Subsets
    Seven CPUs
    Seven subpopulations of size 30 (total population size = 30 x 7 = 210)
    Seven training data subsets
    Rule set migration and training data rotation between CPUs

  • Comparison of Computation Load
    Computation load on a single CPU per generation:
    Standard Model: evaluation of 210 rule sets using all the training data.
    Parallel Distributed Model: evaluation of 30 rule sets using one of the seven data subsets.
    Computation load ==> 1/7 x 1/7 = 1/49 (about 2%)
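The 1/49 figure follows from quick arithmetic with the numbers on these slides (population 210, seven subpopulations of 30, seven data subsets, 50,000 generations):

```python
# Numbers taken from the slides.
population_size = 210
n_islands = 7
subpop_size = population_size // n_islands   # 30 rule sets per CPU
generations = 50_000

# Standard model: every generation evaluates all 210 rule sets
# on the whole training data.
total_evaluations = population_size * generations
assert total_evaluations == 10_500_000       # more than ten million

# Parallel distributed model, per CPU per generation: 1/7 of the
# rule sets, each evaluated on 1/7 of the training data.
load_ratio = (subpop_size / population_size) / n_islands
assert abs(load_ratio - 1 / 49) < 1e-12      # 1/7 x 1/7 = 1/49
print(round(load_ratio * 100, 1))            # -> 2.0 (percent)
```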

  • Data Sets in Computational Experiments: Nine Pattern Classification Problems

    Data Set     Patterns  Attributes  Classes
    Segment         2,310          19        7
    Phoneme         5,404           5        2
    Page-blocks     5,472          10        5
    Texture         5,500          40       11
    Satimage        6,435          36        6
    Twonorm         7,400          20        2
    Ring            7,400          20        2
    PenBased       10,992          16       10
    Magic          19,020          10        2

  • Computation Time for 50,000 Generations
    Computation time was decreased to about 2%.

    Data Set     Standard: A (min)  Our Model: B (min)  B/A (%)
    Segment                 203.66                4.69    2.30%
    Phoneme                 439.18               13.19    3.00%
    Page-blocks             204.63                4.74    2.32%
    Texture                 766.61               15.72    2.05%
    Satimage                658.89               15.38    2.33%
    Twonorm                 856.58                7.84    0.92%
    Ring                   1015.04               22.52    2.22%
    PenBased               1520.54               35.56    2.34%
    Magic                   771.05               22.58    2.93%

    Why about 2%? Because both the population and the training data were divided into seven subsets: 1/7 x 1/7 = 1/49 (about 2%).

  • Test Data Error Rates (Results of 3x10CV)
    Test data accuracy was improved for six of the nine data sets.

    Data Set     Standard: A (%)  Our Model: B (%)  Improvement: (A - B)%
    Segment                 5.99              5.90                   0.09
    Phoneme                15.43             15.96                 - 0.53
    Page-blocks             3.81              3.62                   0.19
    Texture                 4.64              4.77                 - 0.13
    Satimage               15.54             12.96                   2.58
    Twonorm                 7.36              3.39                   3.97
    Ring                    6.73              5.25                   1.48
    PenBased                3.07              3.30                 - 0.23
    Magic                  15.42             14.89                   0.53


  • Q. Why did our model improve the test data accuracy?
    A. Because our model improved the search ability.

    Satimage: Standard 15.54%, Our Model 12.96% (improvement: 2.58%)

    [Figure: Training data error rate (%) vs. number of generations (0-50,000) on Satimage. Our parallel distributed model reaches lower error rates than the non-parallel non-distributed model.]

  • Q. Why did our model improve the search ability?
    A. Because our model maintained the diversity.

    Satimage: Standard 15.54%, Our Model 12.96% (improvement: 2.58%)

    [Figure: Zoomed view of the training data error rate (%) over generations 20,000-30,000 on Satimage.]

  • Q. Why did our model improve the search ability?
    A. Because our model maintained the diversity.

    The best and worst error rates in a particular subpopulation at each generation in a single run (generations 30,001-30,500):
    - Parallel Distributed Model: training data rotation every 100 generations; rule set migration every 100 generations. The best and worst error rates differ, so diversity is maintained.
    - Non-Parallel Non-Distributed Model: Best = Worst (no diversity).

    [Figure: Best and worst training data error rates (%) vs. number of generations.]

  • Current Hot Issues and Future Directions: Type-2 Fuzzy Sets and Type-2 Fuzzy Systems
    N. N. Karnik et al.: Type-2 Fuzzy Logic Systems. IEEE Trans. on Fuzzy Systems (1999). 556 citations.
    Q. Liang, J. M. Mendel: Interval Type-2 Fuzzy Logic Systems: Theory and Design. IEEE Trans. on Fuzzy Systems (2000). 579 citations.
    J. M. Mendel, R. I. John: Type-2 Fuzzy Sets Made Simple. IEEE Trans. on Fuzzy Systems (2002). 782 citations.
    J. M. Mendel, R. I. John, F. Liu: Interval Type-2 Fuzzy Logic Systems Made Simple. IEEE Trans. on Fuzzy Systems (2006). 335 citations.

  • Current Hot Issues and Future Directions: Type-2 Fuzzy Sets and Type-2 Fuzzy Systems
    Example: the fuzzy set "Good Students" over students A-F.
    Ordinary (type-1) fuzzy set: each student has a real-number membership value (e.g., 1.0, 1.0, 0.7, 0.5, 0.2, 0.0).
    Type-2 fuzzy set: a membership value of a type-2 fuzzy set is itself fuzzy. For example, Student F has the fuzzy membership value "small" (not the real number 0.2); other students have fuzzy values such as "medium" and "large".

  • Current Hot Issues and Future Directions: Type-2 Fuzzy Sets and Type-2 Fuzzy Systems
    Membership function of an ordinary fuzzy set: the membership value is a real number.
    Membership function of a type-2 fuzzy set: the membership value is a fuzzy number.
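A common computational simplification, used in the interval type-2 literature cited above, is the interval type-2 fuzzy set: each input maps to an interval of membership values bounded by a lower and an upper type-1 membership function. A minimal sketch with hypothetical triangular parameters:

```python
def triangular(x, a, b, c):
    """Ordinary (type-1) triangular membership function:
    the membership value is a single real number in [0, 1]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def interval_type2_membership(x, lower_mf, upper_mf):
    """Interval type-2 membership: the 'value' is an interval
    [lower, upper] bounded by two type-1 membership functions
    (the footprint of uncertainty)."""
    lo = triangular(x, *lower_mf)
    hi = triangular(x, *upper_mf)
    return min(lo, hi), max(lo, hi)

# A type-1 set returns one number; a type-2 set returns an interval.
print(triangular(0.25, 0.0, 0.5, 1.0))                       # -> 0.5
print(interval_type2_membership(0.25, (0.2, 0.5, 0.8),
                                (0.0, 0.5, 1.0)))
```

Here a fuzzy membership value like "small" is approximated by the interval it spans; a general type-2 set attaches a whole fuzzy number instead of an interval.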

  • Plan of This Presentation
    1. Introduction
    - Focus of the presentation
    - Brief introduction to evolutionary computation
    - Brief introduction to fuzzy rule-based classifiers
    2. Review of Fuzzy Rule-Based System Research
    - Fuzzy rules from human experts
    - Accuracy maximization
    - Interpretability improvement
    - Multiobjective approach
    3. Evolutionary Fuzzy Classifier Design
    - Evolutionary fuzzy rule selection
    - Fuzzy genetics-based machine learning
    4. Current Hot Issues and Future Directions

  • Conclusion

    Thank you very much!