neuro-fyzzy methods for modeling and identification

Neuro-Fyzzy Methods

for

Modeling and Identification

Presented by:Ali Maleki

Presentation Agenda

Introduction Fuzzy Systems Artificial Neural Networks Neuro-Fuzzy Modeling Simulation Examples

Introduction Control Systems

CompetionEnvironment requirementsEnergy and material costsDemand for robust, fault-tolerant systems

Extra needs for Effective process modeling techniques

Conventional modeling?

lack precise, formal knowledg about the systemStrongly nonlinear behavior,High degree of uncertainty,Time varying characteristics

Introduction Solution: Neuro-fuzzy modeling

A powerful tool which can facilitate the effective development of models by combining information from different source:

Empirical modelsHeuristicsData

Neuro-fuzzy modelsDescribe systems by means of fuzzy if-then rulesRepresented in a network structureApply algorithms from the area of Neural Networks

IntroductionNeuro-fuzzy modeling

- neural networks- fuzzy systems

Both are motivated by imitating human reasoning process

Relationships:In neural networks :

implicitly, coded in the network and its parameters

In fuzzy systems:explicitly, in the form of if–then rules

Neuro–fuzzy systems combine the semantic transparency of rule-based fuzzy systems with the learning capability of neural networks

Nonlinear system identification

NARX (nonlinear autoregressive with exogenous input) model

Regressor vector:

Dynamic order of the system: Represented by the number of lags nu and ny

Task of nonlinear system identification:Infer unknown function f from available data sequences

Nonlinear system identification

Multivariable systems:

Nonlinear state-space discription

Task of nonlinear system identification:Infer unknown functions g and h from available data sequences

Neural NetworksNeuro-fuzzy systemsSplinesInterpolated look-up tablesAccurate PredictorAccurate predictor +model that can be used to learn something about system +Analyse system properties

Fuzzy Models

Fuzzy Models

Definition: A mathematical model which in some way uses

fuzzy setsIn system identification:

rule-based fuzzy modelsExample: IF heating is high THEN temperature increase is fast

Linguistic terms

To make such a model operational: Linguistic terms must be defined more precisely using fuzzy setsFuzzy sets defined through their membership functions

Fuzzy Models

Types of fuzzy models: (depending on the structure of if-then rules)

Mamdani Model

IF D1 is low and D2 is high THEN D is medium

Takagi-Sugeno Model

IF D1 is low and D2 is high THEN D=k (zero-order)

IF D1 is low and D2 is high THEN D=0.7D1+0.2D2+0.1 (first-order)

Mamdani Model

Linguistic terms Number of rules

Linguistic fuzzy model is useful for representing qualitative knowledge

Example – Mamdani Model

Linguistic terms is defined by membership function

Gas burner Heating powerOxygen flow rate

Example – Mamdani Model

Membership functions can be defined by the model developer based on prior

knowledgeor (automatically) by using data (constructed - adjusted)

Gas burner Heating powerOxygen flow rate

Takagi-Sugeno Model

Mamdani model is typically used in knowledge-based (expert) systemsIn data driven identification, Takagi_Sageno model has becom popular

Consequent parameter vector Scalar offset Number of rules

Takagi-Sugeno Model

The output y is computed by taking the weighted average of the individual rules contribution:

Degree of fulfilment of the ith rule

In a special case:

Takagi-Sugeno Model (zero-order)Rules:

Input-output equation:

This model is a special case of the Mamdani system, in whichConsequent fuzzy sets degenerate to singletons (real numbers)

Takagi-Sugeno Model

usually,antecedent fuzzy sets are usually defined to describe distinct, partly overlapping regions in the input space

Then,The parameters ai are approximate local linear models of the considered nonlinear system

TS model:Piece-wise linear approximation of a nonlinear function

Example: Takagi-Sugeno Model

static characteristic of an actuator withdead zone and anon-symmetrical response for positive and negative inputs

Fuzzy Logic Operators

In fuzzy systems with multiple inputs, the antecedent proposition is usually represented as a combination of terms,by using logic operators

‘and’ (conjunction)‘or’ (disjunction) and‘not’ (complement)

In fuzzy set theory,several families of operators have been introduced for these logical connectives

Example: Fuzzy Logic Operators

conjunctive form of the antecedent

degree of fulfillment:

minimum conjunction operator

product conjunction operator

Dynamic Fuzzy Models

TS NARX model

Regressor vector

Dynamic Fuzzy Models

TS state-space model

Advantages of the state-space modeling approach:• structure of the model can easily be related to the

physical structure of the real system (model parameters are physically relevant)

• This is not necessarily the case with input-output models.

• Dimension of the regression problem in state-space modeling is often smaller than with input–output models

Artificial Neural Networks

ANNs:• Inspired by the functionality of biological neural

networks• Can generalizing from a limited amount of training

data• black-box models of nonlinear, multivariable static

and dynamic systems• can be trained by using input–output data

ANNs consist of:• Neurons• Interconnection among them• Weights assigned to these interconnections

Multi-Layer Neural Network

• One input Layer• One output layer• A number of hidden layers

Activation function: Linear neurons• Tangent hyperbolic

• Threshold function• Sigmoidal function

Multi-Layer Neural Network - Training

Training definition:• adaptation of weights in a multi-layer network

such that the error between the desired output and the network output is minimized

Training steps:• Feedforward computation• Weight adaptation

Gradient-descent optimizationError backpropagation

Multi-Layer Neural Network - Structure

A network with one hidden layer is sufficient for most approximation tasks

More layers:• Can give a better fit• But the training takes longer

number of neurons in the hidden layer:• Too few neurons give a poor fit• Too many neurons result in overtraining of

the net (poor generalization to unseen data)

Dynamic Neural Networks

static feedforward network combined with an externalfeedback connection

First-order NARX model

Dynamic Neural Networks

Recurrent Networks

Feedback:• Internally in the neurons (Elman network)• Internally to other neurons in the same layer• Internally to neurons in preceding layer (Hopfield Network)

Hopfield Network

Elman Network:

Error Backpropagation

Input:

Desired output:

Error:

Cost function:

Adjusting the weights: minimization of the cost function


network’s output y is nonlinear in the weightsTherefore,The training of a MNN is thus a nonlinear optimization

problem

Methods:• Error backpropagation (first-order gradient)• Newton, Levenberg-Marquardt methods (second-

order gradient)• Genetic algorithms and many others techniques


First-order gradient

Update rule:

Weight vector in iteration nLearning rate

Jacobian of the network

nonlinear optimization problem is thus solved by using the first term of its Taylor series expansion


Second-order gradient:

Second-order gradient method make use of the second term

Hessian

Update rule:


Difference between first-order and Second-order gradient methods:

Size and direction of gradient-decent step

Second order methods are usually more effective than first-order ones


For output layer

Update law for output weights:

For hidden layer

Update law for hidden layer weights:

Backpropagation error

Radial Basis Function Network

RBF network is a twolayer network as figure below

Ususal choise for basis function is Gaussian function

Radial Basis Function Network

Adjustable weights are only present in the output layer

Free parameters of RBF nets are• Output weights• Parameters of the basis functions (centers and

radii)

Output is linear in the weights, and these weights can be estimated by least-squares methods

adaptation of the RBF parameters (center and radial) is a nonlinear optimization problem that can be solved by the gradient-descent techniques

Neuro-Fuzzy Modeling

Fuzzy system as a layered structure (network), similar to ANNs of the RBF type

gradientdescent training algorithms for parameter optimization

This approach is usually referred to as Neuro-Fuzzy Modeling

• Zero-order TS fuzzy model• First-order TS fuzzy model


Zero-order TS fuzzy model

Typical membership function

Input-output equation


First-order TS fuzzy model

Input-output equation

Neuro-Fuzzy Networks - Constructing

• Prior knowledge (can be of a rather approximate nature)

• Process data

Integration of knowledge and data:• Expert knowledge as a collection of if–then rules

(Initial model creation – fine tune using process data)• Fuzzy rules are constructed from scratch by using

numerical data(expert can confront the information stored in the rule base with his own knowledge)(can modify the rules)(supply additional rules to extend the validity of the model)

Comparision the second method with Truly black-box structures

possibility to interpret the obtained results

Neuro-Fuzzy Networks – Structure and parameters

System identification steps:• Structure identification• Parameter estimation

• choice of the model’s structure determines the flexibility of the model in the approximation of (unknown) systems

• model with a rich structure can approximate more complicated functions, but, will have worse generalization properties

• Good generalization means that a model fitted to one data set will also perform well on another data set from the same process.


Structure selection process involves:

• Selection of input variables

• Number and type of membership functions, number of rules(These two structural parameters are mutually related)


Selection of input variables• Physical inputs• Dynamic regressors (defined by the input and output

lags)

Typical sources of information:• Prior knowledge• Insight in the process behavior• Purpose of the modeling exercise

Automatic data-driven selection can then be used to compare different structures in terms of some specified performance criteria.


Number and type of membership functions, number of rules

• Determine the level of detail (granularity) of the model

Typical criteria• Purpose of modeling• Amount of available information (knowledge and

data)

Automated methods can be used to add or remove membership functions and rules.

Neuro-Fuzzy Networks – Gradient-based learning

Zero-order ANFIS model

Consequent parameters

Jacobian

Update law

Centers and spreads of the Gaussian membership functions

Neuro-Fuzzy Networks – Hybrid Learning

Output-layer parameters in RBF networks can be estimated by linear least-squares (LS) techniques

LS methods are more effective than the gradient-based update rule

Hybrid methods:• One-shot least-squares estimation of the consequent

parameters• Iterative gradient-based optimization of the membership

functions

Choice of LS estimation method:• In terms of error minimization is not crucial• If consequent parameters are to be interpreted as local

models great care must be taken

PROBLEM: over-parameterizationnumerical problems - over-fitting - meaningless parameter

estimates

Example – Hybrid Learning

Approximation of second-order polynomial by a first-order ANFIS model

Membership function for t1<u<t2

Model:

Output of TS model

• Model has four free parameters, while three are sufficient to fit the polynomial

Neuro-Fuzzy Networks – Hybrid Learning

To avoid over-parameterization, the basic least-squares criterion can be combined with additional criteria for local fit, or with constraints on the parameter values

• Local Least-Squares Estimation• Constrained Estimation• Multi-Objective Optimization

Hybrid Learning - Global LS Estimation

Hybrid Learning - Local LS Estimation

While the global solution gives the minimal prediction error, it may bias the estimates of the consequents as parameters of local models.

If locally relevant model parameters are required, a weighted LS approach applied per rule should be used.

• The consequent parameters of the individual rules are estimated independently (result is not influenced by the interactions of the rules)

• Larger prediction error is obtained than with global least squares

Example – Global and Local LS Estimation

Approximation of second-order polynomial by a first-order ANFIS model

Model:

Local LS estimation Global LS estimation

Hybrid Learning - Constrained Estimation

Knowledge about the dynamic system such as its stability, minimal or maximal static gain, or its settling time can be translated into convex constraints on the consequent parameters

Hybrid Learning - Multi-Objective Optimization

Minimize the weighted sum of the global and local identification criteria

Initialization of Antecedent Membership Functions

For a successful application of gradient-descent learning to the membership function parameters, good initialization is important

Initialization methods:• Template-Based Membership Functions• Discrete Search Methods• Fuzzy Clustering

Initialization - Template-Based Membership Functions

domains of the antecedent variables are a priori partitioned by a number of membership functions (usually evenly spaced and shaped)

severe drawback of this approach is that the number of rules in the model grows exponentially

• Complexity of the system’s behavior is typically not uniform.• Some operating regions can be well approximated by a local

linear model, while other regions require a rather fine partitioning.

• In order to obtain an efficient representation with as few rules as possible, the membership functions must be placed such that they capture the non-uniform behavior of the system.

Initialization - Discrete Search Methods

Iterative tree-search algorithms can be applied to decompose the antecedent space into hyper-rectangles by axis-orthogonal splits.

Advantage: Its effectiveness for high-dimensional data and the transparency of the obtained partition.

Drawback: Tree building procedure is sub-optimal (greedy) and hence the number of rules obtained can be quite large

Initialization – Fuzzy Clutering

Based on the similarity, data vectors are clustered such that the data within a cluster are as similar as possible, and data from different clusters are as dissimilar as possible.

The number of clusters in the data can either be determined a priori or sought automatically by using cluster validity measures and merging techniques

Simulation Example

1. A simple fitting problem of a univariate static function

• It demonstrates the typical construction procedure of a neuro-fuzzy model. Numerical results show that an improvement in performance is achieved at the expense of obtaining if-then rules that are not completely relevant as local descriptions of the system.

2. Modeling of a nonlinear dynamic system

• Illustrates that the performance of a neuro-fuzzy model does not necessarily improve after training. This is due to overfitting which in the case of dynamic systems can easily occur when the data only sparsely cover the domains

Simulation Example - Static Function

ANFIS model with linear consequent functionNumber of rules: five rulesConstruction of initial model:

Gustafson-Kessel algorithm

Fit of the function with initial model – local models - membership functions


this initial model can easily be interpreted in terms of the local behavior

It is reasonably accurate (RMS= 0.0258)

ANFIS method, 100 learning epochsanfis function of the MATLAB Fuzzy Logic Toolbox

Fit of the function with fine-tuned model, local models, membership functions


RMS error is about 23 times better than the initial model

Initial model Fine-tuned modelRMS error = 0.0258 RMS error =

0.0011


after learning, the local models are much further from the true local description of the function

Initial model Fine-tuned model

Fine-tuned model are thus less accurate in describing the system locally

Simulation Example – pH Neutralization Process

Neutralization tank Effluent stream

Acid Buffer Base

Influent streams

Neutralization tank pH in the tank

Acid flowrate = cteBuffer flowrate = cteBase stream flowrate

Simulation Example - pH Neutralization Process

Identification and validation data sets:• Simulating the model by Hall and Seborg for random change of

the influent base stream flow rate• N = 499 samples with the sampling time of 15 s.

The process is approximated as a first–order discrete-time NARX model


Membership functions

Befor training After training


Rules: Initial Rules:

Fine Tuned Rules: after 1000 epochs of hybrid learning using the ANFIS function of the MATLAB Fuzzy Logic Toolbox


Overtraining Problem: • Comparision of RMS ERROR befor and after training

• Prediction befor and after training

References

[1] Robert Babuska, “Neuro-Fuzzy Methods for Modeling and Identification”, Recent Advances in Intelligent Paradigms and Application, Springer-Verilag, 2002

[2] Robert Babuska, “Fuzzy Modeling and Identification Toolbox User’s Guide - For Use with MATLAB”, 1998.

[3] A. Trabelsi, F. Lafont, M. Kamoun and G. Enea, “Identification of Nonlinear Systems by Adaptive Fuzzy Takagi-Sugeno Model”, International Journal of Computational Cognition, Volume 2, Number 3, Pages 137–153, September 2004.

[4] Jan Jantzen, “Neurofuzzy Modelling”, Technical University of Denmark, Department of Automation,1998.

THANK YOU VERY MUCH

For your

Attention

Presented by:Ali Maleki

Titration Curve

Neutralization tank pH in the tank

Acid flowrate = cteBuffer flowrate = cteBase stream flowrate

neuro-fyzzy methods for modeling and identification

Documents

parametersin fuzzy systems

fuzzy modelsexample

sageno model

mathematical model

model operational

model developer

fuzzy setsfuzzy sets

mamdani modelif d1