
ADVANCED PROCESS IDENTIFICATION AND CONTROL


CONTROL ENGINEERING

A Series of Reference Books and Textbooks

Editor

NEIL MUNRO, PH.D., D.Sc.

Professor, Applied Control Engineering
University of Manchester Institute of Science and Technology, Manchester, United Kingdom

1. Nonlinear Control of Electric Machinery, Darren M. Dawson, Jun Hu, and Timothy C. Burg
2. Computational Intelligence in Control Engineering, Robert E. King
3. Quantitative Feedback Theory: Fundamentals and Applications, Constantine H. Houpis and Steven J. Rasmussen
4. Self-Learning Control of Finite Markov Chains, A. S. Poznyak, K. Najim, and E. Gómez-Ramírez
5. Robust Control and Filtering for Time-Delay Systems, Magdi S. Mahmoud
6. Classical Feedback Control: With MATLAB, Boris J. Lurie and Paul J. Enright
7. Optimal Control of Singularly Perturbed Linear Systems and Applications: High-Accuracy Techniques, Zoran Gajić and Myo-Taeg Lim
8. Engineering System Dynamics: A Unified Graph-Centered Approach, Forbes T. Brown
9. Advanced Process Identification and Control, Enso Ikonen and Kaddour Najim
10. Modern Control Engineering, P. N. Paraskevopoulos

Additional Volumes in Preparation

Sliding Mode Control in Engineering, Wilfrid Perruquetti and Jean-Pierre Barbot

Actuator Saturation Control, edited by Vikram Kapila and Karolos Grigoriadis


ADVANCED PROCESS IDENTIFICATION AND CONTROL

Enso Ikonen
University of Oulu
Oulu, Finland

Kaddour Najim
Institut National Polytechnique de Toulouse
Toulouse, France

MARCEL DEKKER, INC.   NEW YORK · BASEL


ISBN: 0-8247-0648-X

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web: http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA


Series Introduction

Many textbooks have been written on control engineering, describing new techniques for controlling systems, or new and better ways of mathematically formulating existing methods to solve the increasingly complex problems faced by practicing engineers. However, few of these books fully address the applications aspects of control engineering. It is the intention of this new series to redress this situation.

The series will stress applications issues, and not just the mathematics of control engineering. It will provide texts that present not only new and well-established techniques, but also detailed examples of the application of these methods to the solution of real-world problems. The authors will be drawn from both the academic world and the relevant applications sectors.

There are already many exciting examples of the application of control techniques in the established fields of electrical, mechanical (including aerospace), and chemical engineering. We have only to look around in today's highly automated society to see the use of advanced robotics techniques in the manufacturing industries; the use of automated control and navigation systems in air and surface transport systems; the increasing use of intelligent control systems in the many artifacts available to the domestic consumer market; and the reliable supply of water, gas, and electrical power to the domestic consumer and to industry. However, there are currently many challenging problems that could benefit from wider exposure to the applicability of control methodologies, and the systematic systems-oriented basis inherent in the application of control techniques.

This series presents books that draw on expertise from both the academic world and the applications domains, and will be useful not only as academically recommended course texts but also as handbooks for practitioners in many applications domains. Advanced Process Identification and Control is another outstanding entry to Dekker's Control Engineering series.

Neil Munro


Preface

The study of control systems has gained momentum in both theory and applications. Identification and control techniques have emerged as powerful techniques to analyze, understand and improve the performance of industrial processes. The application of modeling, identification and control techniques is an extremely wide field. Process identification and control methods play an increasingly important role in the solution of many engineering problems.

There is extensive literature concerning the field of systems identification and control. Far too often, an engineer faced with the identification and control of a given process cannot find what he needs in this vast literature, which looks like the cavern of Ali Baba. This book will introduce the basic concepts of advanced identification, prediction and control for engineers. We have selected recent ideas and results in areas of growing importance in systems identification, parameter estimation, prediction and process control. This book is intended for advanced undergraduate students of process engineering (chemical, mechanical, electrical, etc.), or can serve as a textbook of an introductory course for postgraduate students. Practicing engineers will find this book especially useful. The level of mathematical competence expected of the reader is that covered by most basic control courses.

This book consists of nine chapters, two appendices, a bibliography and an index. A detailed table of contents provides a general idea of the scope of the book. The main techniques detailed in this book are given in the form of algorithms, in order to emphasize the main tools and facilitate their implementation. In most books it is important to read all chapters in consecutive order. This is not necessarily the only way to read this book.

Modeling is an essential part of advanced control methods. Models are extensively used in the design of advanced controllers, and the success of the methods relies on the accurate modeling of relevant features of the process to be controlled. Therefore the first part (Chapters 1-6) of the book is dedicated to process identification--the experimental approach to process modeling.


Linear models, considered in Chapters 1-3, are by far the most common in industrial practice. They are simple to identify and allow analytical solutions for many problems in identification and control. For many real-world problems, however, sufficient accuracy can be obtained only by using non-linear system descriptions. In Chapter 4, a number of structures for the identification of non-linear systems are considered: power series, neural networks, fuzzy systems, and so on. Dynamic non-linear structures are considered in Chapter 5, with a special focus on Wiener and Hammerstein systems. These systems consist of a combination of linear dynamic and non-linear static structures. Practical methods of parameter estimation in non-linear and constrained systems are briefly introduced in Chapter 6, including both gradient-based and random search techniques.

Chapters 7-9 constitute the second part of the book. This part focuses on advanced control methods, the predictive control methods in particular. The basic ideas behind the predictive control technique, as well as the generalized predictive controller (GPC), are presented in Chapter 7, together with an application example.

Chapter 8 is devoted to the control of multivariable systems. The control of MIMO systems can be handled by two approaches, i.e., the implementation of either global multi-input-multi-output controllers or distributed controllers (a set of SISO controllers for the considered MIMO system). To achieve the design of a distributed controller it is necessary to select the best input-output pairing. We present a well-known and efficient technique, the relative gain array method. As an example of decoupling methods, a multivariable PI-controller based on decoupling at both low and high frequencies is presented. The design of a multivariable GPC based on a state-space representation ends this chapter.

Finally, in order to solve complex problems faced by practicing engineers, Chapter 9 deals with the development of predictive controllers for non-linear systems (adaptive control, Hammerstein and Wiener control, neural control, etc.). Predictive controllers can be used to design both fixed parameter and adaptive strategies, and to solve unconstrained and constrained control problems.

Applications of the control techniques presented in this book are illustrated by several examples: fluidized-bed combustor, valve, binary distillation column, two-tank system, pH neutralization, fermenter, tubular chemical reactor. The techniques presented are general and can easily be applied to many processes. Because the example concerning


fluidized bed combustion (FBC) is repeatedly used in several sections of the book, an appendix is included on the modeling of the FBC process. An ample bibliography is given at the end of the book to allow readers to pursue their interests further.

Any book on advanced methods is predetermined to be incomplete. We have selected a set of methods and approaches based on our own preferences, reflected by our experience--and, undoubtedly, lack of experience--with many of the modern approaches. In particular, we concentrate on the discrete time approaches, largely omitting the issues related to sampling, such as multi-rate sampling, handling of missing data, etc. In parameter estimation, sub-space methods have drawn much interest during the past years. We strongly suggest that the reader pursue a solid understanding of the bias-variance dilemma and its implications in the estimation of non-linear functions. Concerning the identification of non-linear dynamic systems, we only scratch the surface of Wiener and Hammerstein systems, not to mention the multiplicity of the other paradigms available. Process control can hardly be considered a mere numerical optimization problem, yet we have largely omitted all frequency domain considerations so invaluable for any designer of automatic feedback control. Many of our colleagues would certainly have preferred to include robust control in a cookbook of advanced methods. Many issues in adaptive and learning control would have deserved inspection, such as identification in closed-loop, input-output linearization, or iterative control. Despite all this, we believe we have put together a solid package of material on the relevant methods of advanced process control, valuable to students in process, mechanical, or electrical engineering, as well as to engineers solving control problems in the real world.

We would like to thank Professor M. M'Saad, Professor U. Kortela, and M.Sc. H. Aaltonen for providing valuable comments on the manuscript. Financial support from the Academy of Finland (Projects 45925 and 48545) is gratefully acknowledged.

Enso Ikonen

Kaddour Najim


Contents

Series Introduction
Preface

I Identification

1 Introduction to Identification
  1.1 Where are models needed?
  1.2 What kinds of models are there?
      1.2.1 Identification vs. first-principle modeling
  1.3 Steps of identification
  1.4 Outline of the book

2 Linear Regression
  2.1 Linear systems
  2.2 Method of least squares
      2.2.1 Derivation
      2.2.2 Algorithm
      2.2.3 Matrix representation
      2.2.4 Properties
  2.3 Recursive LS method
      2.3.1 Derivation
      2.3.2 Algorithm
      2.3.3 A posteriori prediction error
  2.4 RLS with exponential forgetting
      2.4.1 Derivation
      2.4.2 Algorithm
  2.5 Kalman filter
      2.5.1 Derivation
      2.5.2 Algorithm
      2.5.3 Kalman filter in parameter estimation

3 Linear Dynamic Systems
  3.1 Transfer function
      3.1.1 Finite impulse response
      3.1.2 Transfer function
  3.2 Deterministic disturbances
  3.3 Stochastic disturbances
      3.3.1 Offset in noise
      3.3.2 Box-Jenkins
      3.3.3 Autoregressive exogenous
      3.3.4 Output error
      3.3.5 Other structures
      3.3.6 Diophantine equation
      3.3.7 i-step-ahead predictions
      3.3.8 Remarks

4 Non-linear Systems
  4.1 Basis function networks
      4.1.1 Generalized basis function network
      4.1.2 Basis functions
      4.1.3 Function approximation
  4.2 Non-linear black-box structures
      4.2.1 Power series
      4.2.2 Sigmoid neural networks
      4.2.3 Nearest neighbor methods
      4.2.4 Fuzzy inference systems

5 Non-linear Dynamic Structures
  5.1 Non-linear time-series models
      5.1.1 Gradients of non-linear time-series models
  5.2 Linear dynamics and static non-linearities
      5.2.1 Wiener systems
      5.2.2 Hammerstein systems
  5.3 Linear dynamics and steady-state models
      5.3.1 Transfer function with unit steady-state gain
      5.3.2 Wiener and Hammerstein predictors
      5.3.3 Gradients of the Wiener and Hammerstein predictors
  5.4 Remarks
      5.4.1 Inverse of Hammerstein and Wiener systems
      5.4.2 ARX dynamics

6 Estimation of Parameters
  6.1 Prediction error methods
      6.1.1 First-order methods
      6.1.2 Second-order methods
      6.1.3 Step size
      6.1.4 Levenberg-Marquardt algorithm
  6.2 Optimization under constraints
      6.2.1 Equality constraints
      6.2.2 Inequality constraints
  6.3 Guided random search methods
      6.3.1 Stochastic learning automaton
  6.4 Simulation examples
      Pneumatic valve: identification of a Wiener system
      Binary distillation column: identification of a Hammerstein model under constraints
      Two-tank system: Wiener modeling under constraints
      Conclusions

II Control

7 Predictive Control
  7.1 Introduction to model-based control
  7.2 The basic idea
  7.3 Linear quadratic predictive control
      7.3.1 Plant and model
      7.3.2 i-step-ahead predictions
      7.3.3 Cost function
      7.3.4 Remarks
      7.3.5 Closed-loop behavior
  7.4 Generalized predictive control
      7.4.1 ARMAX/ARIMAX model
      7.4.2 i-step-ahead predictions
      7.4.3 Cost function
      7.4.4 Remarks
      7.4.5 Closed-loop behavior
  7.5 Simulation example

8 Multivariable Systems
  8.1 Relative gain array method
      8.1.1 The basic idea
      8.1.2 Algorithm
  8.2 Decoupling of interactions
      8.2.1 Multivariable PI-controller
  8.3 Multivariable predictive control
      8.3.1 State-space model
      8.3.2 i-step-ahead predictions
      8.3.3 Cost function
      8.3.4 Remarks
      8.3.5 Simulation example

9 Time-varying and Non-linear Systems
  9.1 Adaptive control
      9.1.1 Types of adaptive control
      9.1.2 Simulation example
  9.2 Control of Hammerstein and Wiener systems
      9.2.1 Simulation example
      9.2.2 Second order Hammerstein systems
  9.3 Control of non-linear systems
      9.3.1 Predictive control
      9.3.2 Sigmoid neural networks
      9.3.3 Stochastic approximation
      9.3.4 Control of a fermenter
      9.3.5 Control of a tubular reactor

III Appendices

A State-Space Representation
  A.1 State-space description
      A.1.1 Control and observer canonical forms
  A.2 Controllability and observability
      A.2.1 Pole placement
      A.2.2 Observers

B Fluidized Bed Combustion
  B.1 Model of a bubbling fluidized bed
      B.1.1 Bed
      B.1.2 Freeboard
      B.1.3 Power
      B.1.4 Steady-state
  B.2 Tuning of the model
      B.2.1 Initial values
      B.2.2 Steady-state behavior
      B.2.3 Dynamics
      B.2.4 Performance of the model
  B.3 Linearization of the model

Bibliography

Index

Part I

Identification


Chapter 1

Introduction to Identification

Identification is the experimental approach to process modeling [5]. In the following chapters, an introductory overview of some important topics in process modeling is given. The emphasis is on methods based on the use of measurements from the process. In general, these types of methods do not require detailed knowledge of the underlying process; the chemical and physical phenomena need not be fully understood. Instead, good measurements of the plant behavior need to be available.

In this chapter, the role of identification in process engineering is discussed, and the steps of identification are briefly outlined. Various methods, techniques and algorithms are considered in detail in the chapters to follow.

1.1 Where are models needed?

An engineer who is faced with the characterization or the prediction of the plant behavior has to model the considered process. A modeling effort always reflects the intended use of the model. The needs for process models arise from various requirements:

• In process design, one wants to formalize the knowledge of the chemical and physical phenomena taking place in the process, in order to understand and develop the process. Because of safety and/or financial reasons, it might be difficult or even impossible to perform experiments on the real process. If a proper model is available, experimenting can be conducted using the model instead. Process models can also help to scale-up the process, or integrate a given system in a larger production scheme.

• In process control, the short-term behavior and dynamics of the process


may need to be predicted. The better one is able to predict the output of a system, the better one is able to control it. A poor control system may lead to a loss of production time and valuable raw materials.

• In plant optimization, an optimal process operating strategy is sought. This can be accomplished by using a model of the plant for simulating the process behavior under different conditions, or using the model as a part of a numerical optimization procedure. The models can also be used in an operator decision support system, or in training the plant personnel.

• In fault detection, anomalies in different parts of the process are monitored by comparing models of known behavior with the measured behavior. In process monitoring, we are interested in physical states (concentrations, temperatures, etc.) which must be monitored but that are not directly (or reliably) available through measurements. Therefore, we try to deduce their values by using a model. Intelligent sensors are used, e.g., for inferring process outputs that are subject to long measurement delays, by using other measurements which may be available more rapidly.

1.2 What kinds of models are there?

Several approaches and techniques are available for deriving the desired process model. Standard modeling approaches include two main streams:

• the first-principle (white-box) approach, and

• the identification of a parameterized black-box model.

The first-principle approach (white-box models) denotes models based on the physical laws and relationships (mass and energy balances, etc.) that are supposed to govern the system's behavior. In these models, the structure reflects all physical insight about the process, and all the variables and parameters have direct physical interpretations (heat transfer coefficients, chemical reaction constants, etc.).

Example 1 (Conservation principle) A typical first-principle law is the general conservation principle:

Accumulation = Input - Output + Internal production (1.1)

The fundamental quantities that are being conserved in all cases are either mass, momentum, or energy, or combinations thereof.


Example 2 (Bioreactor) Many biotechnological processes consist of fermentation, oxidation and/or reduction of feedstuff (substrate) by microorganisms such as yeasts and bacteria. Let us consider a continuous-flow fermentation process. Mass balance considerations lead to the following model:

\[ \frac{dx}{dt} = (\mu - u)\,x \qquad (1.2) \]

\[ \frac{ds}{dt} = -\frac{1}{R}\,\mu x + u\,(s_{\mathrm{in}} - s) \qquad (1.3) \]

where x is the biomass concentration, s is the substrate concentration, u is the dilution rate, s_in is the influent substrate concentration, R is the yield coefficient and μ is the specific growth rate.

The specific growth rate μ is known to be a complex function of several parameters (concentrations of biomass, x, and substrate, s, pH, etc.). Many analytical formulae for the specific growth rate have been proposed in the literature [1] [60]. The Monod equation is frequently used as the kinetic description for growth of micro-organisms and the formation of metabolic products:

\[ \mu = \mu_{\max} \frac{s}{K_M + s} \qquad (1.4) \]

where μ_max is the maximum growth rate and K_M is the Michaelis-Menten parameter.
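As a minimal illustration of how a first-principle model of this kind can be exercised, the following sketch integrates equations (1.2)-(1.4) with simple Euler steps. All numerical values (μ_max, K_M, R, the dilution rate and the feed concentration) are illustrative assumptions, not data from the book.

```python
# Minimal sketch: simulating the continuous-flow fermenter (1.2)-(1.4)
# by forward-Euler integration. All numerical values are illustrative
# assumptions, not parameters taken from the book.
import numpy as np

mu_max, K_M = 0.4, 0.12    # assumed Monod parameters
R = 0.5                    # assumed yield coefficient
u, s_in = 0.1, 5.0         # assumed dilution rate and influent substrate concentration

def fermenter_step(x, s, dt):
    """One Euler step of the biomass and substrate balances (1.2)-(1.3)."""
    mu = mu_max * s / (K_M + s)                 # Monod growth rate (1.4)
    dx = (mu - u) * x                           # biomass balance (1.2)
    ds = -(1.0 / R) * mu * x + u * (s_in - s)   # substrate balance (1.3)
    return x + dt * dx, s + dt * ds

x, s, dt = 0.1, 1.0, 0.01
for k in range(int(24.0 / dt)):                 # simulate 24 time units
    x, s = fermenter_step(x, s, dt)
print("final biomass %.3f, substrate %.3f" % (x, s))
```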

Often, such a direct modeling may not be possible. One may say that:

The physical models are as different from the world as a geographic map is from the surface of the earth (Brillouin).

The reason may be that the

• knowledge of the system's mechanisms is incomplete, or the

• properties exhibited by the system may change in an unpredictable manner. Furthermore,

• modeling may be time-consuming and

• may lead to models that are unnecessarily complex.


In such cases, variables characterizing the behavior of the considered system can be measured and used to construct a model. This procedure is usually called identification [55]. Identification covers many types of methods. The models used in identification are referred to as black-box models (or experimental models), since the parameters are obtained through identification from experimental data.

Between the two extremes of white-box and black-box models lie the semiphysical grey-box models. They utilize physical insight about the underlying process, but not to the extent that a formal first-principle model is constructed.

Example 3 (Heating system) If we are dealing with the modeling of an electric heating system, it is preferable to use the electric power, V², as a control variable, rather than the voltage, V. In fact, the heater power, rather than the voltage, causes the temperature to change. Even if the heating system is non-linear, a linear relationship between the power and the temperature will lead to a good representation of the behavior of this system.

Example 4 (Tank outflow) Let us consider a laboratory-scale tank system [53]. The purpose is to model how the water level y(t) changes with the inflow that is generated by the voltage u(t) applied to the pump. Several experiments were carried out, and they showed that the best linear black-box model is the following:

\[ y(t) = a_1 y(t-1) + a_2 u(t-1) \qquad (1.5) \]

Simulated outputs from this model were compared to real tank measurements. They showed that the fit was not bad, yet the model output was physically impossible since the tank level was negative at certain time intervals. As a matter of fact, all linear models tested showed this kind of behavior.

Observe that the outflow can be approximated by Bernoulli's law, which states that the outflow is proportional to the square root of the level y(t). Combining these facts, it is straightforward to arrive at the following non-linear model structure:

\[ y(t) = a_1 y(t-1) + a_2 u(t-1) + a_3 \sqrt{y(t-1)} \qquad (1.6) \]

This is a grey-box model. The simulation behavior of this model was found to be better than that of the previous one (with the linear black-box model), as the constraint on the origin of the output (level) was no longer violated.
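Since the grey-box structure (1.6) is still linear in its parameters, the least squares machinery of Chapter 2 applies directly once the square root of y(t-1) is added as an extra regressor. The sketch below indicates how the two structures (1.5) and (1.6) could be fitted and compared; the "measurements" are simulated, as the laboratory data of [53] are not reproduced here.

```python
# Sketch: fitting the linear model (1.5) and the grey-box model (1.6)
# to input-output data by least squares. The data are simulated, since
# the laboratory measurements of [53] are not available here.
import numpy as np

rng = np.random.default_rng(0)
N = 200
u = rng.uniform(0.0, 1.0, N)                      # pump voltage (assumed scale)
y = np.zeros(N)
for t in range(1, N):                             # assumed "true" tank behavior
    y[t] = max(0.0, 0.9 * y[t-1] + 0.3 * u[t-1] - 0.25 * np.sqrt(y[t-1])
               + 0.01 * rng.standard_normal())

Y = y[1:]                                         # regressands y(t)
Phi_lin = np.column_stack([y[:-1], u[:-1]])       # regressors of (1.5)
Phi_grey = np.column_stack([y[:-1], u[:-1], np.sqrt(y[:-1])])  # regressors of (1.6)

theta_lin, *_ = np.linalg.lstsq(Phi_lin, Y, rcond=None)
theta_grey, *_ = np.linalg.lstsq(Phi_grey, Y, rcond=None)
print("linear model   a1, a2     =", theta_lin)
print("grey-box model a1, a2, a3 =", theta_grey)
```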


Modeling always involves approximations since all real systems are, to some extent, non-linear, time-varying, and distributed. Thus it is highly improbable that any set of models will contain the 'true' system structure. All that can be hoped for is a model which provides an acceptable level of approximation, as measured by the use to which the model will be dedicated.

Another problem is that we are striving to build models not just for the fun of it, but to use the model for analysis, whose outcome will affect our decisions in the future. Therefore we are always faced with the problem of having a model that is 'accurate enough,' i.e., one reflecting enough of the important aspects of the problem. The question of what is 'accurate enough' can only, eventually, be settled by real-world experiments.

In this book, emphasis will be on discrete time approaches. Most processes encountered in process engineering are continuous time in nature. However, the development of discrete-time models arises frequently in practical situations where system measurements (observations) are made, and control policies are implemented, at discrete time instants on computer systems. Discrete time systems (discrete event systems) also exist, such as those found in manufacturing systems and assembly lines, for example. In general, for a digital controller it is convenient to use discrete time models. Several techniques are also available to transform continuous time models to a discrete time form.

1.2.1 Identification vs. first-principle modeling

Provided that adequate theoretical knowledge is available, it may seem obvious that the first-principle modeling approach should be preferred. The model is justified by the underlying laws and principles, and can be easily transferred and used in any other context bearing similar assumptions.

However, these assumptions may become very limiting. This can be due to the complexity of the process itself, which forces the designer to use strong simplifications and/or to fix the model components too tightly. Also, advances in process design together with different local conditions often mean that no two plants are identical.

Example 5 (Power plant constructions) Power plant constructions are usually strongly tailored to match the local conditions of each individual site. The construction depends on factors such as the local fuels available, the ratio and amount of thermal and electrical power required, new technological innovations towards better thermal efficiency and emission control, etc. To make the existing models suit a new construction, a considerable amount of redesign and tuning is required.


Solving the model equations might also pose problems with highly detailed first-principle models. Either the cleverness of a mathematician is required from the engineer developing the model, or time-consuming iterative computations need to be performed.

In addition to the technical point of view, first-principle models can be criticized for their costs. The more complex and a priori unknown the various chemical/physical phenomena are to the model developer, or to the scientific community as a whole, the more time and effort the building of these models requires. Although the new information adds to the general knowledge of the considered process, this might not be the target of the model development project. Instead, as in projects concerning plant control and optimization, the final target is in improving the plant behavior and productivity. Just as plants are built and run in order to fabricate a product with a competitive price, the associated development projects are normally assessed against this criterion.

The description of the process phenomena given by the model might also be incomprehensible to users other than the developer, and the obtained knowledge of the underlying phenomena may be wasted. It might turn out to be difficult to train the process operators to use a highly detailed theoretical model, not to mention teaching them to understand the model equations. Furthermore, the intermediate results, describing the sub-phenomena of the process, are more difficult to put to use in a process automation system. Even an advanced modern controller, such as a predictive controller, typically requires only estimates of the future behavior of the controlled variable.

Having accepted these points of view, a semi- or fully parameterized approach seems much more meaningful. This is mainly due to the saved design time, although the collecting of valid input-output observations from a process might be time consuming. Note, however, that it is very difficult to outperform the first-principle approach in the case where few measurements are available, or when a good understanding of the plant behavior has already been gained. In process design, for example, there are no full-scale measurement data at all (as the plant has not been built yet) and the basic phenomena are (usually) understood. In many cases, however, parameterized experimental models can be justified by the reduced time and effort required in building the models, and their flexibility in real-world modeling problems.

1.3 Steps of identification

Identification is the experimental approach to process modeling [5]. Identification is an iterative process consisting of the following components:


• experimental planning (data acquisition),

• selection of the model structure,

• parameter estimation, and

• model validation.


The basis for the identification procedure is experimental planning, where process experiments are designed and conducted so that suitable data for the following three steps are obtained. The purpose is to maximize the information content in the data, within the limits imposed by the process.

In modeling of dynamic systems, the sampling period¹ must be small enough so that significant process information is not lost. A peculiar effect called aliasing may also occur if the sampled signal contains frequencies that are higher than half of the sampling frequency: in general, if a process measurement is sampled with a sampling frequency ω_s, high frequency components of the process variable with a frequency greater than ω_s/2 appear as low-frequency components in the sampled signal, and may cause problems if they appear in the same frequency range as the normal process variations. The sampling frequency should be, if at all possible, ten times the maximum system bandwidth. For low signal-to-noise ratios, a filter should be considered. In some cases, a time-varying sampling period may be useful (related, e.g., to the throughflow of a process).

The signal must also be persistently exciting, such as a pseudo random (binary) sequence, PRBS, which exhibits spectral properties similar to those of white noise.
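A common way of producing such an excitation is a shift-register (PRBS) generator. The sketch below uses a linear feedback shift register; the register length and feedback taps are one standard maximal-length choice, not a prescription from the book.

```python
# Sketch: generating a +/-1 PRBS with a linear feedback shift register.
# With n_bits=7 and taps (6, 5) the recurrence is b(k) = b(k-7) XOR b(k-6),
# a maximal-length configuration with period 2**7 - 1 = 127 (nonzero seed).
import numpy as np

def prbs(n_bits=7, length=127, taps=(6, 5), seed=1):
    state = [(seed >> i) & 1 for i in range(n_bits)]
    out = []
    for _ in range(length):
        out.append(state[-1])              # output the oldest bit
        feedback = 0
        for t in taps:                     # XOR of the tapped positions
            feedback ^= state[t]
        state = [feedback] + state[:-1]    # shift in the new bit
    return 2 * np.array(out) - 1           # map {0, 1} -> {-1, +1}

excitation = prbs()
print(excitation[:20])
```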

Selection of the model structure is referred to as structure estimation, where the model input-output signals and the internal components of the model are determined. In general, the model structure is derived using prior knowledge.

¹ When a digital computer is used for data acquisition, real-valued continuous signals are converted into digital form. The time interval between successive samples is referred to as the sampling period (sampling rate). In recursive identification the length of the time interval between two successive measurements can be different from the sampling rate associated with data acquisition (for more details, see e.g. [5]).


Most of the suggested criteria can be seen as a minimization of a loss function (prediction error, Akaike Information Criterion, etc.). In dynamic systems, the choice of the order of the model is a nontrivial problem. The choice of the model order is a compromise between reducing the unmodelled dynamics and increasing the complexity of the model, which can lead to model stabilizability difficulties. In many practical cases, a second order (or even a first order) model is adequate.

Various model structures will be discussed in detail in the following chapters. In general, conditioning of data is necessary: scaling and normalization of data (to bring the variables to approximately the same scale), and filtering (to remove noise from the measurements).

Scaling is commonly used in several areas of applied physics (heat transfer, fluid mechanics, etc.). This process leads to dimensionless parameters (the Reynolds number of fluid mechanics, etc.) which are used as an aid to understanding similitude and scaling. In [9] a theory of scaling for linear systems using methods from Lie theory is described. The scaling of the input and output units has very significant effects for multivariable systems [16]. It affects interaction, design aims, weighting functions, model order reduction, etc.

The unmodeled dynamics result from the use of input-output models to represent complex systems: parts of the process dynamics are neglected, and these introduce extra modeling errors which are not necessarily bounded. It is therefore advisable to perform normalization of the input-output data before they are processed by the identification procedure. A normalization procedure based on the norm of the regressor is commonly used [62].

Data filtering permits the parameter estimator to be focused on an appropriate bandwidth. There are two aspects, namely high-pass filtering to eliminate offsets, load disturbances, etc., and low-pass filtering to eliminate irrelevant high frequency components including noise and system response. The rule of thumb governing the design of the filter is that the upper frequency should be about twice the desired system bandwidth and the lower frequency should be about one-tenth of the desired bandwidth.
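Assuming a discrete-time implementation, the rule of thumb above translates into a band-pass prefilter such as the one sketched below; the sampling frequency, desired bandwidth and filter order are arbitrary illustrative values.

```python
# Sketch: band-pass prefiltering of identification data following the rule
# of thumb in the text (pass band from about one-tenth to about twice the
# desired bandwidth). All numerical values are illustrative assumptions.
import numpy as np
from scipy import signal

fs = 10.0                          # sampling frequency [Hz] (assumed)
f_bw = 0.5                         # desired system bandwidth [Hz] (assumed)
low, high = 0.1 * f_bw, 2.0 * f_bw

b, a = signal.butter(2, [low, high], btype="bandpass", fs=fs)

t = np.arange(0, 60.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 0.3 * t) + 0.5 + 0.2 * np.random.randn(t.size)
filtered = signal.filtfilt(b, a, raw)   # zero-phase filtering of the data record
```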

In parameter estimation, the values of the unknown parameters of a parameterized model structure are estimated. The choice of the parameter estimation method depends on the structure of the model, as well as the


properties of the data. Parameter estimation techniques will be discussed in detail in the following chapters.

In validation, the goodness of the identified model is assessed. The validation methods depend on the properties that are desired from the model. Usually, accuracy and good generalization (interpolation/extrapolation) abilities are desired; transparency and computational efficiency may also be of interest. Simulations provide a useful tool for model validation. Accuracy and generalization can be tested by cross-validation techniques, where the model is tested on a test data set, previously unseen by the model. Also statistical tests on the prediction error may prove useful. With dynamic systems, stability, zeros and poles, and the effect of the variation of the poles, are of interest.

• Most model validation tests are based simply on the difference between the simulated and measured output. Model validation is really about model falsification. The validation problem deals with demonstrating the confidence in the model. Often prior knowledge concerning the process to be modeled and statistical tests involving confidence limits are used to validate a model.

1.4 Outline of the book

In the remaining chapters, various model structures, parameter estimation techniques, and predictive control of different kinds of systems (linear, non-linear, SISO and MIMO) are discussed. In the second chapter, linear regression models and methods for estimating model parameters are presented. The method of least squares (LS) is a very commonly used batch method. It can be written in a recursive form, so that the components of the recursive least squares (RLS) algorithm can be updated with new information as soon as it becomes available. Also the Kalman filter, commonly used both for state estimation as well as for parameter estimation, is presented in Chapter 2. Chapter 3 considers linear dynamic systems. The polynomial time-series representation and stochastic disturbance models are introduced. An i-step-ahead predictor for a general linear dynamic system is derived.

Structures for capturing the behavior of non-linear systems are discussed in Chapter 4. A general framework of generalized basis function networks is introduced. As special cases of the basis function network, commonly used non-linear structures such as power series, sigmoid neural networks and Sugeno fuzzy models are obtained. Chapter 5 extends this to non-linear dynamical systems. The general non-linear time-series approaches are briefly reviewed.


A detailed presentation of Wiener and Hammerstein systems, consisting of linear dynamics coupled with non-linear static systems, is given.

To conclude the chapters on identification, parameter estimation techniques are presented in Chapter 6. Discussion is limited to prediction error methods, as they are sufficient for most practical problems encountered in process engineering. An extension to optimization under constraints is made, to emphasize the practical aspects of identification of industrial processes. A brief introduction to learning automata, and guided random search methods in general, is also given.

The basic ideas behind predictive control are presented in Chapter 7. First, a simple predictive controller is considered. This is followed by an extension including a noise model: the generalized predictive controller (GPC). A state space representation is used, and various practical features are illustrated. Appendix A gives some background on state space systems.

Chapter 8 is devoted to the control of multiple-input-multiple-output (MIMO) systems. There are two main approaches to handle the control of MIMO systems: the implementation of a global MIMO controller, or the implementation of a distributed controller (a set of SISO controllers for the considered MIMO system). To achieve the design of a distributed controller it is necessary to be able to select the best input-output pairing. In this chapter we present a well-known and efficient technique, the relative gain array (RGA) method. As an example of decoupling methods, a multivariable PI-controller, based on decoupling at both low and high frequencies, is presented. Finally, the design of a multivariable GPC based on a state space representation is considered.

In order to solve increasingly complex problems faced by practicing engineers, Chapter 9 deals with the development of predictive controllers for non-linear systems. Various approaches (adaptive control, control based on Hammerstein and Wiener models, or neural networks) are considered to deal with the time-varying and non-linear behavior of systems. Detailed descriptions of the predictive control algorithms are provided. Using the inverse model of the non-linear part of both Hammerstein and Wiener models, we show that any linear control strategy can be easily implemented in order to achieve the desired performance for non-linear systems.

The applications of the different control techniques presented in this book are illustrated by several examples including: fluidized-bed combustor, valve, binary distillation column, two-tank system, pH neutralization, fermenter, tubular chemical reactor, etc. The example concerning fluidized bed combustion is repeatedly used in several sections of the book. This book ends with Appendix B concerning the description and modeling of a fluidized bed combustion process.


Chapter 2

Linear Regression

A major decision in identification is how to parameterize the characteristics and properties of a system using a model of a suitable structure. Linear models usually provide a good starting point in the structure selection of the identification procedure. In general, linear structures are simpler than the non-linear ones and analytical solutions may be found. In this chapter, linear structures and parameter estimation in such structures are considered.

2.1 Linear systems

The dominating distinction between linear and non-linear systems is the principle of superposition [19].

Definition 1 (Principle of superposition) The following holds only if a is linearly dependent on b:

If a₁ is the output due to b₁ and a₂ is the output due to b₂, then αa₁ + βa₂ is the output due to αb₁ + βb₂. (2.1)

In the above, α and β are constant parameters, and aᵢ and bᵢ (i = 1, 2) are some values assumed by the variables a and b.

The characterization of linear time-invariant dynamic systems, in general, is virtually complete because the principle of superposition applies to all such systems. As a consequence, a large body of knowledge concerning the analysis and design of linear time-invariant systems exists. By contrast, the state of non-linear systems analysis is not nearly complete.


With parameterized structures f(φ, θ), two types of linearity are of importance: linearity of the model output with respect to the model inputs φ, and linearity of the model output with respect to the model parameters θ. The former concerns the mapping capabilities of the model, while the latter affects the estimation of the model parameters. If at least one parameter appears non-linearly, the models are referred to as non-linear regression models [78].

In this chapter, linear regression models are considered. Consider the following model of the relation between the inputs and output of a system [55]:

\[ y(k) = \theta^T \varphi(k) + \xi(k) \qquad (2.2) \]

where

\[ \theta = \left[ \theta_1, \theta_2, \ldots, \theta_I \right]^T \qquad (2.3) \]

and

\[ \varphi(k) = \left[ \varphi_1(k), \varphi_2(k), \ldots, \varphi_I(k) \right]^T \qquad (2.4) \]

The model describes the observed variable y(k) as an unknown linear combination of the observed vector φ(k) plus noise ξ(k). Such a model is called a linear regression model, and is a very common type of model in control and systems engineering. φ(k) is commonly referred to as the regression vector; θ is a vector of constants containing the parameters of the system; k is the sample index. Often, one of the inputs is chosen to be a constant, φ_i(k) ≡ 1, which enables the modeling of bias.

If the statistical characteristics of the disturbance term are not known, we can think of

\[ \hat{y}(k) = \theta^T \varphi(k) \qquad (2.5) \]


as a natural prediction of what y(k) will be. The expression (2.5) becomes a prediction in an exact statistical (mean squares) sense, if {ξ(k)} is a sequence of independent random variables, independent of the observations φ, with zero mean and finite variance¹.

In many practical cases, the parameters θ are not known, and need to be estimated. Let θ̂ be the estimate of θ:

\[ \hat{y}(k) = \hat{\theta}^T \varphi(k) \qquad (2.6) \]

Note that the output ŷ(k) is linearly dependent on both θ̂ and φ(k).

Example 6 (Static system) The structure (2.2) can be used to describe many kinds of systems. Consider a noiseless static system with input variables u₁, u₂ and u₃ and output y:

\[ y = a_1 u_1 + a_2 u_2 + a_3 u_3 + a_4 \qquad (2.7) \]

where aᵢ (i = 1, 2, 3, 4) are constants. It can be presented in the form of (2.2)

¹ We are looking for a predictor ŷ(k) which minimizes the mean square error criterion E{(y(k) - ŷ)²}. Replacing y(k) by its expression θᵀφ(k) + ξ(k), it follows:

\[ E\left\{ \left( y(k) - \hat{y} \right)^2 \right\} = E\left\{ \left( \theta^T\varphi(k) - \hat{y} \right)^2 \right\} + 2\,E\left\{ \xi(k) \left( \theta^T\varphi(k) - \hat{y} \right) \right\} + E\left\{ \xi^2(k) \right\} \]

If the sequence {ξ(k)} is independent of the observations φ(k), the cross-expectation factorizes. In view of the fact that {ξ(k)} is a sequence of independent random variables with zero mean value, it follows that E{ξ(k)(θᵀφ(k) - ŷ)} = 0. As a consequence,

\[ E\left\{ \left( y(k) - \hat{y} \right)^2 \right\} = E\left\{ \left( \theta^T\varphi(k) - \hat{y} \right)^2 \right\} + E\left\{ \xi^2(k) \right\} \]

and the minimum is obtained for (2.5). The minimum value of the criterion is equal to E{(ξ(k))²}, the variance of the noise.


by choosing

\[ \theta = \left[ a_1, a_2, a_3, a_4 \right]^T \qquad (2.8) \]

\[ \varphi(k) = \left[ u_1(k), u_2(k), u_3(k), 1 \right]^T \qquad (2.9) \]

and we have

\[ y(k) = \theta^T \varphi(k) \qquad (2.10) \]

Example 7 (Dynamic system) Consider a dynamic system with input signals {u(k)} and output signals {y(k)}, sampled at discrete time instants² k = 1, 2, 3, .... If the values are related through a linear difference equation

\[ y(k) + a_1 y(k-1) + \cdots + a_{n_A} y(k - n_A) = b_0 u(k-d) + \cdots + b_{n_B} u(k - d - n_B) + \xi(k) \qquad (2.11) \]

where a_i (i = 1, ..., n_A) and b_i (i = 0, ..., n_B) are constants and d is the time delay, we can introduce a parameter vector θ

\[ \theta = \left[ -a_1, \ldots, -a_{n_A}, b_0, \ldots, b_{n_B} \right]^T \qquad (2.12) \]

and a vector of lagged input-output data φ(k)

\[ \varphi(k) = \left[ y(k-1), \ldots, y(k-n_A), u(k-d), \ldots, u(k-d-n_B) \right]^T \qquad (2.13) \]

² Observed at sampling instant k (k = 1, 2, ...) at time t = kT, where T is referred to as the sampling interval, or sampling period. Two related terms are used: the sampling frequency f = 1/T, and the angular sampling frequency ω = 2π/T.


and represent the system in the form of (2.2):

\[ y(k) = \theta^T \varphi(k) + \xi(k) \qquad (2.14) \]

The backward shift d is a convenient way to deal with process time delays. Often, there is a noticeable delay between the instant when a change in the process input is implemented and the instant when the effect can be observed from the process output. When a process involves mass or energy transport, a transportation lag (time delay) is associated with the movement. This time delay is equal to the ratio L/V, where L represents the length of the process (a furnace, for example) and V is the velocity (e.g., of the raw material).
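In practice the pairs (φ(k), y(k)) of (2.13) are assembled from recorded input-output sequences. A sketch of such a construction is given below; the orders n_A, n_B and the delay d are arbitrary example values.

```python
# Sketch: assembling the regressors phi(k) of (2.13) and the corresponding
# outputs y(k) from recorded input-output data, for given orders and delay.
import numpy as np

def build_regressors(u, y, n_a, n_b, d):
    """Stack phi(k) = [y(k-1),...,y(k-n_a), u(k-d),...,u(k-d-n_b)] row by row."""
    k_start = max(n_a, d + n_b)                  # first k with all lags available
    Phi, Y = [], []
    for k in range(k_start, len(y)):
        past_y = [y[k - i] for i in range(1, n_a + 1)]
        past_u = [u[k - d - i] for i in range(0, n_b + 1)]
        Phi.append(past_y + past_u)
        Y.append(y[k])
    return np.array(Phi), np.array(Y)

# Illustrative use with random data and arbitrary orders n_a=2, n_b=1, d=1:
rng = np.random.default_rng(1)
u = rng.standard_normal(50)
y = rng.standard_normal(50)
Phi, Y = build_regressors(u, y, n_a=2, n_b=1, d=1)
print(Phi.shape, Y.shape)    # (48, 4): regressor dimension n_a + n_b + 1 = 4
```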

In system identification, both the structure and the true parameters θ of a system may be a priori unknown. Linear structures are a very useful starting point in black-box identification, and in most cases provide predictions that are accurate enough. Since the structure is simple, it is also simple to validate the performance of the model. The selection of a model structure is largely based on experience and the information that is available about the process.

Similarly, parameter estimates θ̂ may be based on the available a priori information concerning the process (physical laws, phenomenological models, etc.). If these are not available, efficient techniques exist for estimating some or all of the unknown parameters using sampled data from the process. In what follows, we shall be concerned with some methods related to the estimation of the parameters in linear systems. These methods assume that a set of input-output data pairs is available, either off-line or on-line, giving examples of the system behavior.

2.2 Method of least squares

The method of least squares³ is essential in systems and control engineering. It provides a simple tool for estimating the parameters of a linear system.

In this section, we deal with linear regression models. Consider the model (2.2):

\[ y(k) = \theta^T \varphi(k) + \xi(k) \qquad (2.15) \]

where θ is a column vector of parameters to be estimated from observations y(k), φ(k), k = 1, 2, ..., K, and where the regressor φ(k) is independent of the parameters θ (linear regression)⁴.

³ The least squares method was developed by Karl Gauss. He was interested in the estimation of six parameters characterizing the motions of planets and comets, using telescopic measurements.


K is the number of observations. This type of model is commonly used by engineers to develop correlations between physical quantities. Notice that φ(k) may correspond to a priori known functions (log, exp, etc.) of a measured quantity.

The goal of parameter estimation is to obtain an estimate of the parameters of the model, so that the model fit becomes 'good' in the sense of some criterion. A commonly accepted method for a 'good' fit is to calculate the values of the parameter vector that minimize the sum of the squared residuals. Let us consider the following estimation criterion:

\[ J(\theta) = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T\varphi(k) \right]^2 \qquad (2.16) \]

This quadratic cost function (to be minimized with respect to θ) expresses the average of the weighted squared errors between the K observed outputs, y(k), and the predictions provided by the model, θᵀφ(k). The scalar coefficients α_k allow the weighting of different observations.

The important benefit of having a quadratic cost function is that it can be minimized analytically. Remember that a quadratic function has the shape of a parabola, and thus possesses a single optimum point. The optimum (minimum or maximum) can be solved analytically by setting the derivative to zero, and an examination of the second derivative shows whether a minimum or a maximum is in question.

2.2.1 Derivation

Let us minimize the cost function J with respect to the parameters θ:

\[ \hat{\theta} = \arg \min_{\theta} J \qquad (2.17) \]

where J is given by (2.16):

\[ J = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T\varphi(k) \right]^2 \qquad (2.18) \]

⁴ Note that this poses restrictions on the choice of φ(k).


Assuming that φ(k) is not a function of θᵢ, the partial derivative for the i'th parameter can be calculated, which gives

\[ \frac{\partial J}{\partial \theta_i} = \frac{\partial}{\partial \theta_i} \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T\varphi(k) \right]^2 \qquad (2.19) \]

\[ = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \, 2 \left[ y(k) - \theta^T\varphi(k) \right] \left[ -\varphi_i(k) \right] \]

For the second derivative we have

\[ \frac{\partial^2 J}{\partial \theta_i \, \partial \theta_j} = \frac{2}{K} \sum_{k=1}^{K} \alpha_k \, \varphi_i(k) \, \varphi_j(k) \]

Collecting the terms for all i, the first derivatives can be written as a row vector:

\[ \frac{\partial J}{\partial \theta} = \frac{2}{K} \left[ \theta^T \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) - \sum_{k=1}^{K} \alpha_k y(k)\varphi^T(k) \right] \qquad (2.24) \]

Taking the transpose gives

\[ \left( \frac{\partial J}{\partial \theta} \right)^T = \frac{2}{K} \left[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \, \theta - \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \right] \qquad (2.29) \]


The optimum of a quadratic function is found by setting all partial derivatives to zero:

\[ \frac{\partial J}{\partial \theta} = 0 \qquad (2.30) \]

\[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \, \theta - \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) = 0 \qquad (2.31) \]

\[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \, \theta = \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \qquad (2.32) \]

The second derivative can be collected in a matrix:

\[ \frac{\partial^2 J}{\partial\theta\,\partial\theta^T} = \left[ \frac{\partial^2 J}{\partial\theta_i\,\partial\theta_j} \right]_{i,j} \qquad (2.33) \]

\[ = \left[ \frac{2}{K} \sum_{k=1}^{K} \alpha_k \varphi_i(k)\varphi_j(k) \right]_{i,j} \qquad (2.34) \]

\[ = \frac{2}{K} \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \qquad (2.35) \]

For the optimum to be a minimum, we require that the matrix is positive definite⁵.

Finally, the parameter vector θ̂ minimizing the cost function J is given by (if the inverse of the matrix exists):

\[ \hat{\theta} = \left[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \right]^{-1} \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \qquad (2.36) \]

The optimum is a minimum if the second derivative is positive, i.e., the matrix \(\sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k)\) is positive definite.

2.2.2 Algorithm

Let us represent the celebrated least squares parameter estimate as an algorithm.

⁵ Matrix A is positive definite if xᵀAx > 0 for x ≠ 0.


Algorithm 1 (Least squares method for a fixed data set) Let a system be given by

\[ y(k) = \theta^T \varphi(k) + \xi(k) \qquad (2.37) \]

where y(k) is the scalar output of the system; θ is the true parameter vector of the system of size I × 1; φ(k) is the regression vector of size I × 1; and ξ(k) is system noise. The least squares parameter estimate θ̂ of θ that minimizes the cost function

\[ J = \frac{1}{K} \sum_{k=1}^{K} \alpha_k \left[ y(k) - \theta^T\varphi(k) \right]^2 \qquad (2.38) \]

where α_k are scalar weighting factors, is given by

\[ \hat{\theta} = \left[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \right]^{-1} \sum_{k=1}^{K} \alpha_k \varphi(k) y(k) \qquad (2.39) \]

If

\[ \sum_{k=1}^{K} \alpha_k \varphi(k)\varphi^T(k) \qquad (2.40) \]

is invertible, then there is a unique solution. The inverse exists if the matrix is positive definite.

Hence, a linear regression model

\[ \hat{y}(k) = \hat{\theta}^T \varphi(k) \qquad (2.41) \]

was identified using sampled measurements of the plant behavior, where ŷ(k) is the output of the model (the predicted output of the system) and θ̂ is a parameter estimate (based on K samples).
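A direct transcription of Algorithm 1, accumulating the weighted sums of (2.39) over the K observations, might look as follows; a small simulated data set is used since no measurements are attached to the algorithm itself.

```python
# Sketch: the weighted least squares estimate (2.39) of Algorithm 1,
# computed by accumulating the weighted sums over K observations.
import numpy as np

def weighted_ls(Phi, y, alpha=None):
    """Return theta_hat of (2.39); Phi is K x I, y has K elements, alpha holds K weights."""
    K, I = Phi.shape
    if alpha is None:
        alpha = np.ones(K)
    A = np.zeros((I, I))            # sum_k alpha_k phi(k) phi(k)^T
    b = np.zeros(I)                 # sum_k alpha_k phi(k) y(k)
    for k in range(K):
        phi = Phi[k]
        A += alpha[k] * np.outer(phi, phi)
        b += alpha[k] * phi * y[k]
    return np.linalg.solve(A, b)    # unique solution if A is positive definite

# Illustrative data: y(k) = 2*phi_1(k) - phi_2(k) + noise
rng = np.random.default_rng(2)
Phi = rng.standard_normal((100, 2))
y = Phi @ np.array([2.0, -1.0]) + 0.05 * rng.standard_normal(100)
print(weighted_ls(Phi, y))          # should be close to [2, -1]
```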

2.2.3 Matrix representation

Often, it is more convenient to calculate the least squares estimate from a compact matrix form. Let us collect the observations at the input of the model into a K × I matrix

\[ \Phi = \begin{bmatrix} \varphi^T(1) \\ \varphi^T(2) \\ \vdots \\ \varphi^T(K) \end{bmatrix} = \begin{bmatrix} \varphi_1(1) & \varphi_2(1) & \cdots & \varphi_I(1) \\ \varphi_1(2) & \varphi_2(2) & \cdots & \varphi_I(2) \\ \vdots & \vdots & & \vdots \\ \varphi_1(K) & \varphi_2(K) & \cdots & \varphi_I(K) \end{bmatrix} \qquad (2.42) \]


and the observations at the output into a K × 1 vector

\[ \mathbf{y} = \begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(K) \end{bmatrix} \qquad (2.43) \]

The K equations can be represented by a matrix equation

\[ \mathbf{y} = \Phi\theta + \Xi \qquad (2.44) \]

where Ξ is a K × 1 column vector of modeling errors. Now the least squares algorithm (assuming α_k = 1 for all k) that minimizes

\[ J = \frac{1}{K} \left( \mathbf{y} - \Phi\theta \right)^T \left( \mathbf{y} - \Phi\theta \right) \qquad (2.45) \]

can be represented in a more compact form by

\[ \hat{\theta} = \left[ \Phi^T\Phi \right]^{-1} \Phi^T \mathbf{y} \qquad (2.46) \]

where

\[ \frac{\partial^2 J}{\partial\theta\,\partial\theta^T} = \frac{2}{K}\,\Phi^T\Phi \qquad (2.47) \]

must be positive definite.

Consider Example 7 (dynamic system). If the input signal is constant, say ū, the right side of equation (2.11) may be written as follows:

\[ \bar{u} \sum_{i=0}^{n_B} b_i + \xi(k) \qquad (2.48) \]

It is clear that we cannot identify the parameters b_i (i = 0, ..., n_B) separately. Mathematically, the matrix ΦᵀΦ is singular. From the point of view of process operation, the constant input fails to excite all the dynamics of the system. In order to be able to identify all the model parameters, the input signal must fluctuate enough, i.e., it has to be persistently exciting.

Let us illustrate singularity by considering the following matrix:

\[ A = \begin{bmatrix} 1 & \varepsilon \\ \varepsilon^{-1} & 1 \end{bmatrix} \qquad (2.49) \]


which is singular for all ε. However, if ε is very small we can neglect the term a₁,₂ = ε, and obtain

\[ A_1 = \begin{bmatrix} 1 & 0 \\ \varepsilon^{-1} & 1 \end{bmatrix} \qquad (2.50) \]

The determinant of A₁ is equal to 1. Thus, the determinant provides no information on the closeness of a matrix to singularity. Recall that the determinant of a matrix is equal to the product of its eigenvalues. We might therefore think that the eigenvalues contain more information. The eigenvalues of the matrix A₁ are both equal to 1, and thus the eigenvalues give no additional information. The singular values (the positive square roots of the eigenvalues of the matrix AᵀA) of a matrix represent a good quantitative measure of the near singularity of a matrix. The ratio of the largest to the smallest singular value is called the condition number of the considered matrix. It provides a measure of the closeness of a given matrix to being singular. Observe that the condition number associated with the matrix A₁ tends to infinity as ε → 0.
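The effect is easy to verify numerically: the determinant and the eigenvalues of A₁ stay at 1 while the condition number grows without bound as ε shrinks.

```python
# Sketch: determinant, eigenvalues and condition number of A1 in (2.50)
# for shrinking epsilon.
import numpy as np

for eps in [1e-1, 1e-3, 1e-6]:
    A1 = np.array([[1.0, 0.0],
                   [1.0 / eps, 1.0]])
    print("eps=%g  det=%g  eig=%s  cond=%.3g"
          % (eps, np.linalg.det(A1), np.linalg.eigvals(A1), np.linalg.cond(A1)))
```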

Let us illustrate the least squares method with two examples.

Example 8 (Effective heat value) Let us cousider a simple applicationof the least squares method. The following steady state data (Table 2.1) wasmeasured from an FBC plant (see Appendix B). In steady state, the powerP is related to the fuel feed by

P = gQc + ho (2.51)

where H is the effective heat value MJ[-~-~ ] and h0 is due to losses. Based on thedata, let us determine the least squares estimate of the effective heat valueof the fuel.

TLFeBOOK

Page 37: Advanced Process Identification & Control

24 CHAPTER 2. LINEAR REGRESSION

4O

30

20

10

0

-100 l 2 3

Qc [kg/s]4

Figure 2.1: Least squares estimate of the heat value.

Substituting t9 ~- [H, ho]T, cb ~ [Qc, 1], y ,- P we have

2.2 12.3 1: ¯

3.0 1

19.119.3

(2.52)

Using (2.46), or Algorithm 1, we obtain

~ = [dPT(b]-1 0Ty (2.53)

=[8.7997]-0.6453

Thus, H = 8.7997 i8 the least square8 estimate of the effective heat valueof the fuel. Fig. 2.1 8how8 the data point8 (dots) and the estimated linearrelation (8olid line).

Example 9 (02 dynamics) From an FBC plant (see Appendix B), [Nm~]fuel feed Qc [~] and flue gas oxygen content CF t~--’~ 1 were measured with a

sampling interval of 4 seconds. The data set consisted of 91 noisy measure-ment patterns from step experiments around a steady-state operating point:fuel feed ~c = 2.6 [~], primary air ~1 = 3.6 ,[Nm3]s j, secondary air ~2 = 8.4[g’~3] Based on the measurements, let us determine the parameters a, b,and c of the following difference equation:

[CF (k) - ~F] = --a [CF (k - 1) - ~F] + b [Qc (k - 6) - ~c] + c

TLFeBOOK

Page 38: Advanced Process Identification & Control

2.2. METHOD OF LEAST SQUARES 25

Let us construct the input matrix

Qc(1)-~c 1Qc(2)-~c 1

CF (90) -

and the vector of measured outputs

Qc(85)-~c

(2.55)

CF (7)c (8)

CF (91)

(2.56)

The least squares estimate of I-a, b, c]T is then calculated by (2.46), resultingin:

0.648-0.0172-0.0000

(2.57)

Thus, the dynamics of the 02 content from fuel feed are described by

(2.58)= 0.648[CF(k- 1)--~F] -0.0172[Qc(k-6)-~c]

or, equivalently using the backward shift operator, x (k - 1) q-ix (k

(1 - 0.648q-~) [CF (k) - ~F] = --0.0172q-6 [Qc (k) - ~c] (2.59)

The data (dots) and a simulation with the estimated model (solid lines) illustrated in Fig. 2.2.

2.2.4 Properties

Nex~ we will be concerned with the properties of the least squares estima-tor 0. Owing to the fact that the measurements are disturbed, the vectorparameter estimation ~ is random. An estimator is said to be unbiased ifthe mathematical expectation of the parameter estimation is equal to thetrue parameters O. The least squares estimation is unbiased if the noise Ehas zero mean and if the noise and the data (I) are statistically independent.Notice, that the statistical independence of the observations and a zero mean

TLFeBOOK

Page 39: Advanced Process Identification & Control

26 CHAPTER 2. LINEAR REGRESSION

0.04

~’ 0.03~

~’~ 0o0

O.G

~2.~

~2.f

2.5

I I 0I I I

_ ,~r~i~ 0 000

.

1 2 3 4 5 6

0 1 2 3 4 5 6t [min]

Figure 2.2: Prediction by the estimated model. Upper plot shows the pre-dicted (solid line) and measured (circles) flue gas oxygen content. The lowerplot shows the model input, fuel feed.

noise is sufficient but not necessary for carrying out unbiased estimation ofthe vector parameters [62].

The estimation error is given by

~=0-~ (2.60)The mathematical expectation is given by

_- {o-io o1-1~-- E{O- [*T~]-I~T[~o-..~-E]} (2.62)

since [oTo] -~ oTO = I, and E and ~e statistically independent. It followsthat if E h~ zero mean, the LS ~timator is unbi~ed, i.e.

E{~} = 0 and E{~} = O (2.64)

Let ~ now co~ider the covariance matr~ of the estimation error whichrepr~ents the dispersion of~ about its mean value. The cov~i~ce matrix~

~The co.fiance of a r~dom ~riable x is defined by c~(x)

E {[~ - E {~}1 [~ - E {~}1~}. If x is zero mean, E {x} = 0, then coy(x) =

TLFeBOOK

Page 40: Advanced Process Identification & Control

2.2. METHOD OF LEAST SQUARES 27

of the estimation error is given by

=since E h~ zero mean and v~i~ce a~ (and its components are identicMlydistributed), and E and ¢ are statistically independent. It is a me~e ofhow well we can estimate the u~nown 0. In the le~t squ~ approach weoperate on given data, ¯ is known. This results in

P = [oTo]-ia~ (2.71)

The squ~e root of the diagonal elements of P, ~, repr~ents the

standard e~ors of each element ~ of the estimate ~. The v~iance can be~timated ~ing the sum of squ~ed errors divided by de~ees of freedom

where I is the number of p~ameters go ~timate.

Nx~ple 10 (Nffec~ive hea~ ~lue~ continued) Co~ider Exhale We have K = 10 data points and two p~ameters, I = 2. Using (2.72) obtain ~ = o.a6ag, a stand~d error of 0.a~82 for the ~timate of H, and0.8927 for the bi~ h0.

Nem~k 1 (Co~anee matrix) ~or ~ = 1 we obtain

Therefore, in ~he framework of p~ame~er ~timation, the ma~rN P = [~r~] -~is called the error cov~iance matrN.

TLFeBOOK

Page 41: Advanced Process Identification & Control

28 CHAPTER 2. LINEAR REGRESSION

2.3 Recursive LS method

The least squares method provides an estimate for the model parameters,based on a set of observations. Consider the situation when the observationpairs are obtained one-by-one from the process, and that we would like toupdate the parameter estimate whenever new information becomes available.This can be done by adding the new observation to the previous set of ob-servations and recomputing (2.39). In what follows, a recursive formulationis derived, however [55]. Instead of recomputing the estimates with all avail-able data, the previous parameter estimates are updated with the new datasample. In order to do this, the least squares estimation formula is writtenin the form of a recursive algorithm.

Definition 2 (Recursive algorithm) A recursive algorithm has the form

new old= (2.74)

estimate estimate

predictioncorrection new

- with oldfactor observation estimate

2.3.1 Derivation

The least squares estimate at sample instant k - 1 is given by (2.39)

~(k- 1)-- ai cp(i)~ T (i ai~(i)y(i) (2.75)Li=I ":

At sample instant k, new information is obtained and the least squares esti-mate is given by

k--1

] --1

: (2.76)

x(~a~(i)y(i)+a~(k)y(k))~ i=l

Define

k

a(a) = ~.,~ (i) ~ (i) (2.77)i=l

TLFeBOOK

Page 42: Advanced Process Identification & Control

2.3. RECURSIVE LS METHOD 29

which leads to the following recursive formula for R (k)

R(k) = R(k- 1)+ ~k~ (k)~,~ (k) (2.78)Using (2.77), the least squares estimate (2.76) can be rewritten

~(k)=R-l(k)I~°~(i)y(i)+°~kcP(k)y(k)]~. (2.79)

Based on (2.77), the estimate at iteration k - 1, (2.75), can be rewritten follows:

k-1

~ (k - 1) = -1 (k -1)E c~i~p (i)y (i) (2.80)i-----1

which gives

k-1

E o~icp(i)y(i)= R(k- 1)~(k- (2.81)i=1

Substituting this equation into (2.79), we find

~(~) = a-1 (~) [a(~- 1)~ (~- 1) +,~v (k) y (~)] From (2.78), we have a recursive formula for which is substituted in (2.82)

~(~)= -1 ( ~) [ [a ( k)- ~ (~)~ (k )] ~ (k - ~) + ~

Reorganizing give:

~(k)=~(k-1)+R-~(k)a~(k)[y(k)-~T(k)~(k-1)] (2.84)

which, together with (2.78), is a rec~sive formul~ for the le~t squ~ ~ti-mate.

In the algorithm given by (2.84), the matrix R (k) needs to be invertedat each time step. In order to avoid this, introduce

P (k) = -~ (k) (2.85)

The recision of R (k), (2.78), now becom~

P-~ (k) = P-~ (~- 1) + ,~ (k) ~ (k) (2.86)The t~get is to be able to update P (k) directly, without needing to matrix inversion. This can be done by ~ing the matrix inversion lemma.

TLFeBOOK

Page 43: Advanced Process Identification & Control

30 CHAPTER 2. LINEAR REGRESSION

Lemma 1 (Matrix inversion lemma) Let A, B, C and D be matrices compatible dimensions so that A ÷ BCD exists. Then

[A + BCD]-’ = A-’ - A-ZB [DA-’B + c-l] -1DA-1 (2.87)

The verification of the lemma can be obtained by multiplying the right-hand side by A + BCD from the right, which gives unit matrix (for proof,see [64], p. 64).

Making the following substitutions

A ~- p-1 (k- I) (2.88)

B ~- ~(k) (2.89)

C *- ak (2.90)D ~- ~T(k) (2.91)

and applying Lemma 1 to (2.86) gives

P (k) = [P-1 (k- 1) ~(k)a~:~ T (k )] -~ (2.92)

= P(k-1)-P(k-1)cp(k)cpT(k)P(k-1) (2.93)l+~T(k)P(k_ 1)~(k)

Thus the inversion of a square matrix of size dim 0 is replaced by the inversionof a scalar.

The algorithm can be more conveniently expressed by defining a gainvector L (k)

L(k) = P(k- 1) ~ (k) = (~P (k) (2.94)± + (k) e (k - 1)÷ ~k

where the second equality can be verified by substituting (2.93) for P (k) reorganizing.

The recursive algorithm needs some initial values to be started up. In theabsence of prior knowledge, one possibility to obtain initial values is to usethe least squares method on the first k0> dim 0 samples. Another commonchoice is to set the initial parameter vector to zero

~(k0) =0 (2.95)

and let the initial error covariance matrix to be a multiple of identity matrix

P (k0) = (2.96)

where C is some large positive constant. A large value implies that theconfidence in 0 (k0) is poor and ensures a high initial degree of correction(adaptation). Notice that this makes the updating direction coincide withthe negative gradient of the least squares criterion.

TLFeBOOK

Page 44: Advanced Process Identification & Control

2.3. RECURSIVE LS METHOD 31

2.3.2 Algorithm

The recursive least squares algorithm can now be given, using (2.94), (2.84)-(2.85), and (2.93)

Algorithm 2 (Recursive least squares method) The recursive leastsquares algorithm is given by

L(k) P(k- 1) ~ (k)

(2.97)± (k) P (/c - 1) ÷ ~k

"~(k)=’~(k- 1) ÷L (k) [y (k) _ ~T (k _ 1)~o(k)] (2.98)

P (k) = P (k - 1) - L (k) ~T (k)P (2.99)

where k = ko + 1, ko + 2, ko + 3, ... The initial values ~ (ko) and P (/Co) obtained by using the LS on the first/co > dim 0 samples

-1

~ (k0) = P (k0) ~ ~ (i)y (2.101)

The ~S method is one of the most widely ~ed rec~sive p~ameter~timation techniques, due to its rob~tn~s and e~iness of implementation.

Example 11 (02 dynamics: continued) Let us consider the same prob-lem as in Example 9 where the parameters of the following model were to beestimated:

[1 "~- aq -1] Iv F (k) - ~F] ~- bq-6 [Qc (k) - ~c] c

Using the recursive LS-method with the initial values k0 -= 7:

~(7)= 0 ;P(7)= ~ 00 0 0 109

and substituting for k -- 7, 8, ..., 91

(2.102)

(2.103)

(2.10a)

TLFeBOOK

Page 45: Advanced Process Identification & Control

32 CHAPTER 2. LINEAR REGRESSION

O.Od

0.03-~

0.03

0.0%

oI I I0 00 0

i 2 3 4 5 6

2.8

~2.7

22.6

2.5

-0.5

I I I

3 4t [minl

5 6

-10~ ~ ~ ~ ~1 2 3 4 5 6

Figure 2.3: On-line prediction by the estimated model. Upper plot showsthe predicted (solid line) and measured (circles) flue gas oxygen content; middle plot shows the model input, fuel feed. The evolution of the values ofthe estimated parameters is shown at the bottom of the figure.

we have the following parameters at k = 91:

~(91) 0.646-0.0172-0.0000

(2.1o

which are the same (up to two digits) as in Example 9. Fig. 2.3 illustratesthe evolution of the parameters a, b and c, as well as the on-line predictionby the model.

Remark 2 (Factorization) The covariance matrix must remain positivedefinite. However, even if the initial matrix P (0) satisfies the second ordercondition of optimality (least squares optimization problem), the positivedefiniteness of P (k) can be lost, owing the numerical round-off errors in

TLFeBOOK

Page 46: Advanced Process Identification & Control

2.3. RECURSIVE LS METHOD 33

long term behavior (adaptive context, etc.). In order to maintain numericalaccuracy it is more advisable to update the estimator in a factorized formwhich guarantees that P (k) remains positive definite and that the round-off errors, unavoidable in computer applications, do not affect the solutionsignificantly. One of the most popular methods is the UD factorization whichis based on the decomposition of P (k)

P (k) = V (k) D UT (k)

where the factors U (k) and D (k) are, respectively, a unitary upper triangu-lax matrix and a diagonal matrix.

2.3.3 A posteriori prediction error

In the previous developments, the RLS was derived using the a priori pre-diction error

e(klk- 1)=y(k)_~T(k_ 1)~(k) (2.106)

In some cases, the a posteriori version may be preferred[51]

e(klk ) =y(k)-~~’(k)~(k) (2.107)

The connection between these can be obtained using (2.106) and (2.107)

e(k[k) y( k)-’~T(k-1)~(k) (2.108)

= e(klk- 1)-~ T(k) [~(k)-~(k- (2.109)

From (2.98) we derive

[~(k)-~(k- 1)] L(k)e(klk- 1) (2.110)

Substituting (2.97) into this equation leads

[ ]P(k-1)~(k)

(klk 1) (2.111)~(k)-~(k-1) ~+~T(k)P(k_l)cp(k)e -

Thus, substituting (2.111) into (2.109) gives

~T (k) P (k- 1)~ (k) (klk- 1)(2.112)= e(klk- 1)- ~-~-~i~-~ - 1)~(k).e

e(klk- 1)= 1 +ak~r(k)P(k - 1)~(k)

(2.113)

TLFeBOOK

Page 47: Advanced Process Identification & Control

34 CHAPTER 2. LINEAR REGRESSION

which is the relation between a priori and a posteriori prediction errors. Themodified RLS algorithm is then given by (2.97),

B(k) _~T (k- 1) ~ ~(ele- 1)

~(klk-1) (2.114)

~(klk) (2.115)

~(k) (2.116)It can be observed that e (k[k) can tend to zero if ~a (k) becomes

1 + c~aT (k) P (k - 1)~a = ~(k- 1)+L(k)e(klk)

and (2.99).unbounded, even if e (k]k - 1) doesn’t.

2.4 RLS with exponential forgetting

The criterion (2.16) gives an estimate based on the average behavior of thesystem, as expressed by the samples used in the identification. This resultedin the Algorithms 1 and 2. However, if we believe that the system is time-varying, we need an estimate that is representative of the current propertiesof the system. This can be accomplished by putting more weight on newersamples, i.e. by forgetting old information. These types of algorithms arereferred to as adaptive algorithms.

In the time-varying case, it is necessary to infer the model at the sametime as the data is collected. The model is then updated at each time instantwhen some new data becomes available. The need to cope with time-varyingprocesses is not the only motivatipn for adaptive algorithms. Adaptive iden-tification may need to be considered, e.g., for processes that are non-linearto the extent that one set of model parameters may not adequately describethe process over its operating region [85].

In order to obtain an estimate that is representative for the current prop-erties of the system at sample instant k, consider a criterion where oldermeasurements are discounted ([55], pp. 56-59):

k

J~ (8) = ~ ~-~f~ (k,i) [y(i) - oT~ (2.117)

where ~ (k, i) is increasing in i for a given k. The criterion is still quadraticin ~ and the minimizing off-line estimate is given by

0(k)= Z(k,i)i----1

-1 k

~(i)~T (i) Z/~(k,i)~(i)y(i) (2.118)i~1

TLFeBOOK

Page 48: Advanced Process Identification & Control

2.4. RLS WITH EXPONENTIAL FORGETTING 35

~(k,i)

0.5

~L--1H=Inf ....... "" /’

/

~,=0.95 H=20 ........ "/0

0 50 k=100

Figure 2.4: The effect of A (ai = 1 for all i).

Consider the following structure for/~ (k, i):

/3(k,i)= A(k)/3(k-

where 1 < i < k - 1 and A (k) is a scalar. This can also be written

(2.119)

’3 (k’ i)= 1~I=~+1A (j)] a (2.120)

where

/3 (i,i) --- (2.121)

/~ (k, i) = Ak-~ai (2.122)

If A (i) is a constant A, we get

which gives an exponential forgetting profile in the criterion (2.117). In sucha case, the coefficient A is referred to as the forgetting factor. Figure 2.4illustrates the weighting obtained using a constant A.

The effect of A can be illustrated by computing the equivalent memory1horizon H = T=X (a~ = 1 for all i). A common choice of A is 0.95 - 0.99.

When A is close to 1, the time constant for the exponential decay is approx-imately H. Thus choosing A in the range 0.95 - 0.99 corresponds, roughly,to remembering the 20 - 100 most recent data.

TLFeBOOK

Page 49: Advanced Process Identification & Control

36 CHAPTER 2. LINEAR REGRESSION

2.4.1 Derivation

We are now ready to derive a recursive form for the previous equa.tious. Letus introduce the following notation (see (2.77))

k

a(k)-- ~Z (k,i)v ~ (i) (2.123)

Separating the old and the new information

k-1

R(k) -= y~fl(k,i)cp(i)cp T (i) + fl(k,k)cp(k)cp T (k)i=1

(2.124)

and substituting (2.119) and (2.122) into this equation leads

k-1

a(~) y:~ ~Z(~- 1,i) v (i~ (i) +.~ (~)~ (i=1

(2.125)

Using (2.123) for R (k - 1), we have a recursive formula for R

a(k) = ha(k- 1)+ ~,~ (k) ~ (~) (2.126)

In a similar way to the RLS, we can write a recursive formula for thepara~neter update

~ (k) = ~ (k - 1) q-a -1 (k)O~k~O (~) [y (k) - ~T (k - 1) (2.127)

This is exactly the same as (2.84). Again, we can denote P (k) -~(k)and use the matrix inversion lemma (Lemma 1) to avoid matrix inversion (2.127) (select A ~- £p-1 (k- 1) and B ~- ~o (k); G ~- a~ ; T (k)

2.4.2 Algorithm

Now the recursive least squares algorithm with exponential forgetting can begiven.

Algorithm 3 (RLS with exponential forgetting) The recursive leastsquares algorithm with exponential forgetting is given by

P(k- 1) ~ (k) (2.128)L(k)= ~+~V(k) P(k_l)~(k)~k

TLFeBOOK

Page 50: Advanced Process Identification & Control

2.5. KALMAN FILTER 37

~(k)=~(k- 1) +L (k) [y (k) T(k- 1)~(k)] (2.129)

1(2.130)

where 0 < A _< 1, and A = 1 gives the RLS algorithm with no forgetting.

The effect of the forgetting factor £ is that the P (k) and hence the gainL (k) are kept larger. With ~ < 1, the P (k) will not tend to zero and algorithm will always be alert to changes in 0.

Example 12 (Or dynamics: continued) Let us illustrate the perfor-mance of the RLS with exponential forgetting. Consider the identificationproblem in an FBC plant in Example 9, and let an unmeasured 20% decreasein the char feed occur (e.g., due to an increase in the fuel moisture). Fig 2.5illustrates the prediction and the on-line estimated parameters when using aforgetting factor A -- 0.97.

The change occurs at t = 8 min. The algorithm is able to follow thechanges in the process.

There exists a large number of other forgetting schemes. Many (if notmost) of them are inspired by the robustness of the Kalman filter, discussedin the next section.

2.5 Kalman filter

In the Bayesian approach to the parameter estimation problem, the param-eter itself is thought of as a random variable. Based on the observations ofother random variables that are correlated with the parameter, we may inferinformation about its value. The Kalman filter is developed in such a frame-work. The unobservable state vector is assumed to be correlated with theoutput of a system. So, based on the observations of the output, the valueof the state vector can be estimated. In what follows, the Kalman filter isfirst introduced for state estimation. This is followed by an application tothe parameter estimation problem.

Assume that a stationary stochastic vector signal {x (k)} can be describedby the following Markov process

x(k ÷ 1)-- A(k)x(k)+v(k) (2.131)

TLFeBOOK

Page 51: Advanced Process Identification & Control

38 CHAPTER 2. LINEAR REGB:ESSION

0.0~

~0.0(~

0.0:

2.8

5 10 15

~2.7

5 10t [min]

15

0 5 I0 15

Figure 2.5: On-line prediction by the estimated model. Upper plot shows

the predicted (solid line) and measured (circles) flue gas oxygen content

clarity, only every third measurement is shown). The middle plot showsthe model input, fuel feed. The evolution of the values of the estimated

parameters is shown at the bottom of the figure.

TLFeBOOK

Page 52: Advanced Process Identification & Control

2.5. KALMAN FILTER 39

with measurement equation

y(k) = C (k)x(k) (2.132)

where x (k) is an S × 1 dimensional column state vector, v (k) is a S dimensional column vector containing the system noise; and y (k) and e (k)are O × 1 dimensional column vectors of measurable outputs and the outputnoise. A (k) is an S × S dimensional system state transition matrix describingthe internal dynamics of the system (Markov process). C (k) is the O output matrix, describing the relation between states and the measurableoutputs. In state estimation, a stationary system is often assumed, A (k) A, C(~) =

The objective is to estimate the state vector x (k) based on measurementsof the outputs y (k), contaminated by noise e (k). The system model sample instant k is assumed to be known:

A(k),C(k) (2.133)

and the processes .{v (k)} and {e (k)} are zero mean, independent Ganssianprocesses with known mean values and covariances:

E {v (k)} = 0; E {v (k) T (j) } = V (k) 6k (2.134)

E{e(k)} = 0;E {e(k)e T (j)} = Y (k)6kj (2.135)

E {e(k)v T (j)} = (2.136)

where 5~i is the Kronecker delta function7. v (k) and e (k) have covariancesV (k) and Y (k), respectively, which are non-negative and symmetric. It assumed that {y (k)} is available to measurement, but {x (k)} is not. It desirable to predict {x (k)} from the measurements of {y (k)}.

The Kalman filter can be derived in a number of ways. In what follows,the mean square error approach for the Kalman predictor is considered [41].We then proceed by giving the algorithm for the Kalman filter (the proof forthe filter case is omitted as it is lengthy).

7Kronecker delta function is given by

ifi=jotherwise

TLFeBOOK

Page 53: Advanced Process Identification & Control

40 CHAPTER 2. LINEAR REGRESSION

2.5.1 Derivation

Let us introduce the following predictor for the state x at instant .k + 1

~.(k+l)=A(k)~(k)+K(k)[y(k)-C(k)~(k)] (2.137)

which consists of two terms: a prediction based on the system model andthe previous estimate, and a correction term from the difference between themeasured output and the output predicted using the system model. The gainmatrix K (k) needs to be chosen.

Let us consider the following cost function to be minimized

g(k+ 1)-- E {~(k + 1)~T (k + 1)} (2.138)

where ~ is the prediction error

~(k+ 1) = ~(k + 1) - x(k (2.139)

The optimal solution is given by

K (k) = A (k) P CT ( k) Iv ( k) -~-C (k) P(k) T (k)] -1 (2.140)

where

P (k + 1) = A (k) P (k) T (k) +V (k) - K (k) C (k) P T (k) (2.141)

Proof. Substituting (2.137) into (2.139) we have

= A(k)~(k)+K(k)[y(k)-C(k)~(k)]-x(k+l) (2.142)

= [A(k)-K(k)C(k)]~(k)+K(k)y(k)-x(k+l) (2.143)

and substituting (2.131)-(2.132) we

~(k+l) -- [A(k)-K(k)C(k)]~(k)+K(k)C(k)x(k) (2.144)+K (k)e (k)- A (k)x (k)-

Reorganizing and using (2.139), we have the following prediction error dy-namics

~(k+l)=[A(k)-K(k)C(k)]~(k)+K(k)e(k)-v(k) (2.145)

TLFeBOOK

Page 54: Advanced Process Identification & Control

2.5. KALMAN FILTER 41

The cost function (2.138) can now be expressed

J(k+l)E{[[A (k) -K (k) C (k)] ~, (k) + K (k)e (k) [[A (k) -K (k) C (k)] ~ (k) + K (k)e (k) T}[A (k) -K (k) C (k)] E {~ (k)~T x [A (k) -K (k) C T+V (k) + K (k) Y gT (k)

since e (k),[A(k)- K(k) C(k)] are known.

Let us use the following notation

{2.146)

(2.147)

v (k), and ~ (k) are statistically independents and K (k) and

P(k) E{ ~(k)~T( k)} (2.148)

Q (k) = Y (k) +C (k) P T (k (2.149)

where P (k) is the covariance matrix of the estimation error. Rewrite (2.147)

P(k+l) A( k)P(k)AT(k)-K(k)C(k)P(k)AT(k) (2 .150)-A (k) P (k) T (k) KT (k)

+V (k) + g (k) Q gT (k)

By completing squares of terms containing K (k) we find

P(k+l) A( lc)P(k)AT(k)+V(k) (2.151)

-A (k) P (k) T (k) Q-’ ( k) C(k) P T (k)

+ [K (k)-A (k) P CT (k) Q-1 (k)] Q (k)

x [g (k)-A (k) P (k) T (k) Q-1 ( T

Now only the last term of the sum depends on K (k), and minimization of can be done by choosing K (k) such that the last term disappears:

K (k) = A (k) P (k) T (k) [ Y (k) +C (k) PT (k)] -1 (2.152)

SBy assumption, v (k) and e (k) are statistically independent. ~ (k) is given by ~. ~ (k) - x (k). The prediction ~ (k) depends on the past measurement y (k - 1), hence dependent on e (k - 1). The state x (k) is dependent on noise v (k - 1) disturbingthe state. Thus the prediction error ~(k) depends on e(k - 1) and v(k - 1), but e (k) or v (k). Thus, v (k), e (k), and ~ (k) are statistically independent.

TLFeBOOK

Page 55: Advanced Process Identification & Control

42 CHAPTER 2. LINEAR REGRESSION

Since the last term disappears, we have

P(k+I)=A(k)P(k)AT(k)+V(k)-K(k)C(k)P(k)AT(k)

¯Collecting the results, we have the following algorithm for an optimal

estimate (in the mean square error sense) of the next state x (k + 1), basedon information up to k:

K(k) ~(k+ 1)

P(k+ 1)

A (k) ~ (k) IT (k) [V(k)4-(J(k) P(k) (jT (kA(k)~(k)-g(k)[C(k)~(k)-y(k)] (2.155)n (k) P (k) T (k) +V (k (2.156)-K (k) C (k) P (k) T (k)

If the disturbances {e (k) } and {v (k) } as well as the initial state x (0) Gaussian (with mean values 0, 0, and x0 and covariances V (k), Y (k) P (0), respectively), the estimate ~ (k ÷ 1) is the mean of the conditionaldistribution of x(k+ 1), ~(k + 1) = E{x(k + 1) ly(0),y(1),--. P (k + 1) is the covariance of the conditional distribution of x (k +

2.5.2 Algorithm

Let us denote the estimate (2.155) based on information up to time k ~ (k + l[k). A Kalman filter can also be derived for estimating the statex (k + 1), assuming now that the measurement y (k + 1) has become avail-able, i.e.

~(k + llk + 1)= E{x(k + 1)ly (0),y (1),... ,y(~ (2.157)

Consider now a filter of the form

~(k + llk + 1) = ~(k + l[k) + K (k + 1)[y (k + 1) - C (2.138)

The following algorithm can be derived. (Note that an extended state spacemodel is used with an additional deterministic input u (k) and a noise tran-sition matrix G (k).)

Algorithm 4 (Kalman filter) Estimate the state vectors x (k) of a systemdescribed by the following equations

x(k + 1) = A(k)x(k) +B(k)u(k) (2.159)

TLFeBOOK

Page 56: Advanced Process Identification & Control

2.5. KALMAN FILTER 43

y (k) = C (k)x (k) ÷ (2.160)

where x (k) is an S× 1 state vector: u (k) and v (k) are I× 1 vectors containingthe system inputs and Gaussian noise; y (k) and e (k) are O × 1 vectors measurable outputs and the output Gaussian noise, respectively. A (k) is S × S system state transition matrix; B (k) and G (k) are S × I and S system input and noise transition matrices; C (k) is the O × S output matrix.The following are known for k = k0, k0 + 1, ko + 2, ..., j _< k:

A (k),B (k), C (k), (2.161)

E{v(k))=O;E(v(k)vT(j))=V(k)~ (2.162)

S(e(k)}=O;S(e(k)eT(j)~-=Y(k)~ (2.163)

E (e (k) T (j)} = (2.164)

x (ko) -~ Xko; cov (x (ko)} (2.165)

1. Set k = ko. Initialize ~ (kolko) x~o and P (kolko) = P~o.

2. Time update:

Compute the state estimate at k ÷ 1, given data up to k:

~(k+llk)=A(k)~(klk)+B(k)u(k) (2.166)

and update the covariance matrix of the error in

P(k+llk)=A(k)P(k]k)AT(k)+C~(k)V(k)CIT(k) (2.167)

3. Measurement update:

Observe the new measurement y (k ÷ 1), at time t -- kT.

Compute the Kalman filter gain matrix:

g (k + 1) = P (k +llk ) CT (k + 1) (2.168)

× [Y (k + 1) ÷ C(k + 1)P (k + llk) T ( k ÷1) -1

Correct the state estimate at k + 1, given data up to k ÷ 1:

~(k +llk+ 1) = ~(k+ llk) (2.169)÷K(k + 1)[y (k + 1)-

TLFeBOOK

Page 57: Advanced Process Identification & Control

44 CHAPTER 2. LINEAR REGE~SSION

and update the new error covariance matrixg:

P(k-~llk-t-1 ) = [I- K(k-t-1)C(k ÷ l)]P(k-t- )× [I- K(k-t- 1)C(k-t- T

+K (k + 1) Y (k + KT (k -t- 1)

(2.170)

4. Increase sample index k = k + 1 and return to step 2.

2.5.3 Kalman filter in parameter estimation

Suppose that the data is generated according to

y(k) = ~ (k)0 (2.171)

where e (k) is a sequence of independent Gaussian variables with zero meanand variance a2 (k). Suppose also that the prior distribution of 0 is Gaussianwith mean ~0 and covariance P0. The model, (2.171), can be seen as a linearstate-space model:

O(k-t- 1) = 0(k) (2.172)y(k) = ¢pT (k)O(k) (2.173)

Comparing with (2.159)-(2.165) shows that these equations are identicalwhen making the following substitutions:

A(k) ~- t;x(k)+-0(~) (2.174)B(k) ~- 0;u(k)+-0 (2.175)(;(k) ~ 0; v (k) ,- 0; V (k) (2.176)

C(k) +- ¢pT (k);y(k) ~- y(k); (2.177)

e(k) +--- e(k);Y(k),:-a2(k) (2.178)

~ (010) +- ~0; P (010) ~-- P0

9This is a numerically better form of

P(k + l{k+ 1) = P(k+ llk) - K(k+ 1) C(k+ 1) )

(see [39], p. 270).

(2.179)

TLFeBOOK

Page 58: Advanced Process Identification & Control

2.5. KALMAN FILTER 45

The Kalman filter algorithm, (2.159)-(2.170), is now given by (note P (k) ~- P (k+ llk) -- (klk);O(k); O ~-- ~( llk)

K(k + 1)= a2(k)+cpT(k + 1) P (k) (k (2.18o)

~(k+ 1)=~(k)+ K(k+ 1)(y(k + 1)-~ T(k)9~(k + 1)) (2.181)

P(k+l)=P(k)-K(k+l)cpT(k+l)P(k) (2.182)

Comparing with the RLS (Algorithm 2) shows that the Kalman filter holdsthe RLS as its special case. Not~ that now the initial conditions of theILLS have a clear interpretation: 0 (0) is the prior mean and P (0) is theprior covariance of the parameters 0. Furthermore, the p_.osterior distributionof 0 at sample instant k is also Ganssian with mean 0 (k) and covarianceP (k) (see [55], pp. 33-36). The Kalman filter approach also shows that optimal weighting factor ak in the least squares criterion is the inverse of thevariance of the noise term, a~ = 1/cr2 (k), at least when the noise is whiteand Gaussian.

If the dynamics of the system are changing with time, i.e. the modelparameters are time-varying, we can assume that the parameter vector variesaccording to

0 (k + 1) = 0 (k) + v (2.183)

Now V ~ 0 and the covariance update becomes (see (2.167)):

P(k+l)=P(k)-g(k+l)~oT(k+l)P(k)+V (2.184)

This prevents the covariance matrix from tending to zero. In fact P (k) --~ when the number of iterations increases, and the algoritlmi remains alert tochanges in model parameters. For example, in [23] the addition (regulariza-tion) of a constant scaled identity matrix at each sample interval is suggested,V ~- ~I. The bounded information algorithm [70] ensures both lower andupper bounds ami n and a~,= for P (k)

p (klk) = amax - aminp (k[k - 1) + aminI (2.185)amax

An advantage of the Kalman filter approach, compared to the least squaresalgorithm with exponential forgetting, is that the nature of the parameterchanges can be easily incorporated, and interpreted as the covariance matrix

V.

TLFeBOOK

Page 59: Advanced Process Identification & Control

Chapter 3

Linear Dynamic Systems

In this chapter, our attention is focused on the discrete-time black-box mod-eling of linear dynamic systems. This type of model is commonly used inprocess identification, and is essential in digital process control. From thepoint of view of control, the simplicity of black-box models has establishedthem as a fundamentM means for obtaining input-output representations ofprocesses.

The transfer function approach provides a basic tool for representing dy-namic systems. Stochastic disturbance models provide a tool for character-izing (unmeasured) disturbances, present in all real systems.

3.1 Transfer function

Let us first consider two commonly used transfer functionI representationsof process dynamics.

3.1.1 Finite impulse response

A finite impulse response (FIR) system is given

(3.1)

where

1In order to avoid unnecessary complexity in notation and terminology, the backwardshift operator, q-l, notation will be used, x (k - i) -- q-~x (k). Strictly speaking, a divisionof two polynomials in q-1 is not meaningful (whereas the division of two functions in -1

is). However, the reader should consider this transfer operation as a symbolic notation(or as an equivalent z transform). With this loose terminology, we allow ourselves to usethe term ’transfer function’ for descriptions that use polynomials in the backward shiftoperator.

47

TLFeBOOK

Page 60: Advanced Process Identification & Control

48 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

¯ {y (k)} is a sequence of system outputs, and

¯ {u (k)} is a sequence of system inputs,

sampled from the process at instants k = 1, 2,... which are usually equidistanttime intervals:

kT=t (3.2)

where t is the time and T is the sampling interval (e.g., in seconds). Theprocess is characterized by

B (q-l) = bo + blq-~ + ... + b~q (3.3)

which is a polynomial in the backward shift operator q-1

(3.4)

d is the time delay (in sampling instants) between process input and output.The system behavior is determined by its coefficients or parameters b~, n =0, 1, 2, ..., riB, bn E ~.

FIR structures are among the simplest used for describing dynamic pro-cesses. They involve:

¯ no complex calculations, and

¯ no assumption on the process order is required.

The parameters can be obtained directly from the elements of the impulseresponse of the system. The choice of nB and d is less critical, if chosen largeenough and small enough, respectively.

The disadvantages of FIR are that:

¯ unstable processes can not be modelled,

¯ a large number of parameters need to be estimated (e,specially for pro-cesses containing slow modes, i.e. slow dynamics).

TLFeBOOK

Page 61: Advanced Process Identification & Control

3.1. TRANSFER FUNCTION 49

Residence time

Process engineers are often confronted with the calculation of residence timein continuous flow systems (reactors, columns, etc.) [62]. The residence timeis the time needed for the fluid to travel from one end of the process tothe other. The residence time is a convenient time base for normalization(usually, the states variables are made dimensionless and scaled to take thevalue of unity at their target value). The residence time is also directlyrelated to the efficiency and productivity of a given chemical process.

Tracer tests (isotopic, etc.) are commonly used in chemical engineering fordetermining the residence time. An amount of tracer is fed into the processas quickly as possible (impulse input). The output is then measured andinterpreted as the process impulse response. For linear systems, the residencetime is directly calculated from the impulse response or from the parametersof their transfer function [97].

A linear system can be defined by its continuous-time impulse responseg(t). Its output equation is given by:

y(t) = / g(t- -r)u(~)d~ (3.5)

~’~0

where y(t) and u(t) represent respectively the output and the input. Theresidence time [97] is given by:

~=0 (36)Tr¢s

f g(t)dt

In continuous flow system, the residence time can be interpreted as the ex-pected time it takes for a molecule to pass trough the flow system.

The residence time can also be connected to the input-output signalswithout using a phenomenological model description of the considered pro-cess. Based on the concept of impulse response function, the residence timecan be calculated as follows:

¯ Continuous-time systems:

TF’(O)(3.7)~’res- TF(O)

where TF(s) is the Laplace transform of the impulse response g(t) andTF’(.) is the derivative of TF(.) with respect to s.

TLFeBOOK

Page 62: Advanced Process Identification & Control

50 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

¯ Discrete-time systems:

T.~e 8 ~

~ kbkk=O

~, bkk----O

B(z) represents the discrete impulse response function defined as

B(z) = bkz-kk=0

It is easy to verify that in the discrete case the residence time is

B’(1)T,,e~ -- B(1)

where B’(.) is the derivative of B(.) with respect to

(3.8)

(3.9)

(3.10)

The results concerning the calculation of the residence time for linear sys-tems can also be extended to multidimensional continuous flow in non linearsystems [65].

3.1.2 Transfer function

A more general structure is the transfer function (TF) structure. It holdsthe FIR structure as its special case.

Definition 3 (Transfer function) Transfer function is given

B(q-~)y(k) = A(q-’) u(k-d) (3.11)

where A is a monic polynomial of degree nA

A (q-l) = 1 + alq-1 nt- ... -}- a,~Aq-’~’~ (3.12)

and B is a polynomial of degree nB

B (q-l) = bo + b~q-1 + ... + b,~Bq-’ ~" (3.13)

where an E ~, n = 1, 2, ..., nA and bn E ~, n = 0, 1, ..., nt~.

The main advantages of the TF model are that:

TLFeBOOK

Page 63: Advanced Process Identification & Control

3.1. TRANSFER FUNCTION 51

¯ a minimal number of parameters is required,

¯ both stable and unstable processes can be described.

Disadvantages include that

¯ an assumption of process order, hA, is needed (in addition to nB andd),

¯ the prediction is more complex to compute.

Poles and zeros give a convenient way of characterizing the behaviorof systems described using transfer functions.transform gives

U(z-1)

Note, that switching into z-

= z-’~B (z-l) (3.14)A(z-1)

= z-~b° q- blZ-1 q- "’" q- b’~’z-n~ (3.15)1 + alz-1 + ... + anaz-’ ~a

Multiplying the numerator and the denominator by znB+n-4+d gives

Y (z__~) = TM (bozTM + b,z~,-~ + ... + b~,) = B (z~ (3.16)U (z) ~.+~ ( TM + al z~-~ + .. . + a~) A (z

The roots of the polynomials give the poles (roots of A (z) = 0) and the zeros(roots of B (z) = 0) of the system.

Definition 4 (Poles and zeros) For a tra~fer function (Deflation 3) nB zeros of the system are obtained from the roots of

B (z) = boz~ + b~znB-1 + ... + b~. = 0 (3.17)

The nA pol~ of the syste~n are obtained from the roots of

A (z) = ~A +alzha-1 + .. . + a~a = 0 (3.18)

A (z) can be represented nR nC

A (z) = ~ (z - p~). ~ + a~z + Z~) (3.19)n=l

where p~ are the nn real poles and z2 + anZ + ~ contain the nc complexpMrs of pol~ of the system. In a simil~ way, B (z) can be represented

nR nc

B (z) = (z + + &)n=l

where r~ are the na real zeros and z2 + a~z + ~ contain the nc complexpairs of zeros of the system.

TLFeBOOK

Page 64: Advanced Process Identification & Control

52 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

The steady-state gain is obtained when z --+ 1

Y (z)lim ~ (3.21)z’-~l U (z)

From (3.16) it is simple to derive the following result.

Algorithm 5 (Steady-state gain) The steady-state gain of a system de-scribed using a transfer function (Definition 3) is given

nB

K~ = n=0 (3.22)1+

where K,~ E ~ denotes the steady-state gain of the system.

Example 13 (Pole and steady-state gain) Consider the following first-order system:

y(k) = ay(k 1)+ u(k- 1) (3.23)

The system can be written as

y(z-1)_=B(z-i) Z-1

-- - (3.24)U (z -1) A (z -1) 1 - az-1

and

Y(z)g(z)

This system has one pole in z = a.The steady-state gain is given by

B(z) A(z) z-a

(3.25)

K,~,~ = ~ (3.26)1-a

In general, a system is stable2 if all its poles are located inside the unitcircle. If at least one pole is on or outside the unit circle, the system is notstable.

Example 14 (Stability) Consider the system in Example 13 with initialcondition y (0) = y0 and control input u (k) = 0.

The future values of the system for k = 1, 2, ... are given by

y(k) = akyo (3.27)

If ]aI < 1, then y (k) tends to zero and the system is stable.

2BIBO stability: A system is BIBO stable, if for every bounded input, we have a

bounded output.

TLFeBOOK

Page 65: Advanced Process Identification & Control

3.2. DETERMINISTIC DISTURBANCES 53

3.2 Deterministic disturbances

In general, a real process is always subjected to disturbances. The effectsof the system environment and approximation errors are modelled as distur-bance processes. Models of disturbance processes should capture the essentialcharacteristics of the disturbances. In control, the disturbances that affectthe control performance without making the resulting controller implemen-tation uneconomical, are of interest.

Consider a TF structure with a disturbance:

y(k)= B(q-1) (k-d)+~(k) (3.28)A (q-l)

where ~ (k) represents the totality of all disturbances at the output of theprocess. It is the sum of both deterministic and stochastic disturbances.

In some cases, deterministic disturbances are exactly predictable. Asstmlethat the disturbances are described by the following model

C~ (q-l) ~ (k) (3.29)

Typical exactly predictable deterministic disturbances include

¯ a constant

C~ (q-l) = 1 -- q-1 (3.30)

¯ a sinusoid

C~ (q-l) = 1 -- -1 cos (w Ts) + q- (3.31)

Example 15 (Constant deterministic disturbance) A constant distur-bance gives

~ (k) = ~ (k - (3.32)

Thus, the effect of a disturbance at sampling instant k - 1 remains also atinstant k.

3.3 Stochastic disturbances

The most serious difficulty in applying identification and control techniquesto industrial processes is the lack of good models. The effect of the environ-ment of the process to be modeled, and approximation errors, are modeled as

TLFeBOOK

Page 66: Advanced Process Identification & Control

54 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

disturbance processes. These disturbances are classified into two categories:measured (e.g., ambient temperature) and unmeasured (e.g., particle sizedistributions, or composition of raw materials).

Usually, random disturbances are assumed to be stationary. Let us recallthe definition of stationary processes.

Definition 5 (Stationary process) A process {x (k), k E T} is said be stationary if, for any {kl, k2, ..., kg}, i.e. any finite subset of T, thejoint distribution (x (kl + T), Z (k2 + T),..., X (kN + T)) of X (k + T) does notdepend upon T.

The modeling of unmeasured perturbation is based on a sequence {e (k)} independent random variables with zero mean, E {e (k)} = 0, and variance0"2-- i.

These assumptions are not restrictive. In fact, a random sequence {bsuch that E {b (k)} = and E {b(k)2} = 0"2 can be expressed as a functionof e (k) as follows

b(k)=ae(k)+m (3.33)

Remark 3 (Gaussian stationary processes) The usual argument givenin favor of Gaussian stationary processes hinges upon the central limit the-orem. Roughly, a large number of small independent fluctuations, when av-eraged, give a Gaussian random variable. Notice also that linear operationsupon Gaussian process leave it Gaussian. Physically independent sources(linear systems or linear regime) of small disturbances produce Gaussianprocesses.

Example 16 (Fluidized bed) Consider a bubbling fluidized bed [20]. The-oretically it is possible to understand and predict the mechanism and coales-cence for two or three isolated bubbles in a deterministic manner. However,we are unable to extend the deterministic model to accurately predict thebehavior of a large swarm of bubbles, since we do not have exact and com-plete knowledge about the initial conditions (start-up of a fluidized bed) andexternal forces acting on the system (particle size distributions, etc.). Sucha process appears to us to be stochastic, and we speak of the random coa-lescence and movement of the bubbles, which leads to pressure and densityfluctuations.

TLFeBOOK

Page 67: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 55

where A is the difference operator

3.3.1 Offset in noise

The following model

~ (k) = C (q-l) {~(k) (3.34)

where C is a polynomial in the backward shift operator q-l, can be used todescribe the noise affecting the plant under consideration. The model consistsof a zero mean random noise sequence, (e (k)}, colored by the polynomial The offset is not modeled by (3.34).

To take the offset into account, the following solution has been proposed

~ (k) = C (q-l) e(k) (3.35)

where d is a constant depending on the plant operating point. However, it hasbeen shown that even if d is a constant, or slowly varying, there are inherentproblems in estimating its value (appearance of 1 in the regressor, whichis not a persistently exciting signal). Thus, the parameter d is inherentlydifferent from the other parameters of the model.

A better solution, which does not involve the estimation of the offset, isto assume that the perturbation process has stationary increments, i.e.

C(q-1)

A (-~e (k) (3.36)

i (q-l) = 1 __q-1 (3.37)

This disturbance model is more realistic. It can be interpreted as randomstep disturbances occurring at random intervals (e.g., sudden change of loador variation in the quality of feed flow).

The model described in (3.36) corresponds to the inherent inclusion an integrator in the closed loop system. In general, the perturbation isdescribed by -cD where D is a polynomial in q-1. The choice of D = AD*allows the incorporation of an explicit integral action into the design, whereD* is a polynomial in q-1. In particular, the choice D = AA is common.Various system structures with stochastic disturbances will be considered inthe following.

3.3.2 Box-Jenkins

The representation of process dynamics is usually achieved with a disturbedlinear discrete-time model. Practically all linear black-box SISO model struc-tures can be seen as special cases of the Box-Jenkins (B J) model structure.

TLFeBOOK

Page 68: Advanced Process Identification & Control

56 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

Definition 6 (Box-Jenkins) Box-Jenkins (B J) structure is given

y(k)=A (q_l) uB (q-1----~) (k - d) + D--~) e (k)C (3.38)

where { y (k) } is a sequence of process outputs, {u (k) } is a sequence of processinputs, and {e (k)} is a discrete white noise sequence (zero mean with finitevariance (r 2) sampled from the process at instants k = 1, 2, ...;

A (q-~) = 1 + alq-1 + ... + a,~aq-’~A (3.39)B (q-l) = bo + blq-~ + ... + ~,~,q-~" (3.40)C (q-~) = 1 + c~q-~ + ... + c,~cq-’~c (3.41)n (q-~) 1 + diq-1 + ... + dnDq-nn (3.42)

are polynomials in the backward shift operator q-~, q-ix (k) = x (k 1)

Basically, this type of black-box system is used for four main purposes:

1. characterizing (understanding) the input-output behavior of a process,

2. predicting future responses of a plant,

3. developing control systems and tuning controllers, and

4. filtering and smoothing of signals.

Items 1-2 are related to process modeling (monitoring, fault detection, etc.)and items 2-3 to process control (controller design, especially model-basedcontrol). The fourth topic concerns signal processing (handling of measure-ments in process engineering).

d is the time delay (in sampling instants) between the process input andthe output:

In process modeling d >_ 1 assures causality: process output, can notchange before (or exactly at the same time) when a change in processinput occurs.

d _< 0 is used in filtering (smoothing) signals, d -- 0 can be used on-line filtering to remove measurement noise; d < 0 can be appliedonly in off-line filtering (to compute the filtered signal, future values ofthe signal are required).

TLFeBOOK

Page 69: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 57

In what follows, interest is focused on process modeling, d _> 1, where d is nota design parameter, but depends on the time delay observed in the processto be modeled.

Assume that the current sample instant is k, and that the following in-formation is available:

current and past process inputs u (k), u (k - 1), ..., u (k

process outputs y (k), y (k - 1), ..., y (k - hA).

Let us denote by ~(k + 1) the prediction of y (k + 1) obtained using model. Let us assume further that

¯ the predictions ~(k),~(k- 1),...,~’(k- max(nA, nc)).

are available as well.In practice, an exact mathematical description of the dynamic response

of an industrial process may be either impossible or impractical. The use oflinear models involves a loss of information (approximation errors, neglecteddynamics). When selecting a structure for a stochastic process model, anassumption on the effect of noise is made. In the following, some com-monly used transfer function models (input-output model of the process)with stochastic noise models (effect of unmeasured noise to the process out-put) are discussed.

3.3.3 Autoregressive exogenous

A variety of real-world processes can be described well by the autoregressive(AR) model. The AR process can be viewed as a result of passing of thewhite noise through a linear all-pole filter. In the acronym ARX, the Xdenotes the presence of an exogenous variable.

Definition 7 (ARX structure) ARX (autoregressive exogenous) structureis obtained by setting C = 1, D = A in the general structure (Definition 6):

B(q-1) 1y (k) = A (q_l) u (k - d) +)A (q-"------~e (3.43)

Let us rewrite the ARX system for k + 1 and multiply by A:

A(q-1) y(k+l)=-B(q-1)u(k-d+l)q-e(k-bl) (3.44)

TLFeBOOK

Page 70: Advanced Process Identification & Control

58 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

For the system output at k + 1 we get

y(k+l)=B(q-1)u(k-d+l)-Al(q-1)y(k)+e(k+l) (3.45)

where A = 1 + q-IA~. Noticing that the first two terms on the right sidecan be calculated exactly from the available data up to time k, and the noiseterm e (k + 1) will act on the process in the future, we have that

~(k+l)=B(q-~)u(k-d+l)-A~(q-~)y(k) (3.46)

which minimizes the expected squared prediction error3.

Algorithm 6 (ARX predictor) Predictor for an ARX system (Definition7) is given by:

~(k + 1) = B (q-*) u(k - d 1)- A, (q-*) y(k (3.48)

where AI (q-*) = a, + ... + a,~Aq-(’~A-1). The prediction is a function of theprocess measurements.

3The objective is to find a linear predictor depending on the information available upto and including k which minimizes the expectation of the squared prediction error, i.e.

(3.47)

where E {.} represents the conditional expectation (on the available data). Introducing(3.45) in (3.47), we

E{[y(k+I)-~]2}

E { [B (q-1)u(k_d 1)- A,(q- ’) y(k ) + e (k 1)-

E{[(B(q-1)u(l~-dq - 1)- 1 ( q-1)y(lg)_~) q -e(k n t- 1 )]2 }

E { [B (q-’)u (k - d + 1) 1 (q- -l) y ( ~) -- ~

+~E {(~ (q-l)~(~-~ + ~)-a, (q-’)~ (a)- ~) +E {¢~ (~ + ~)}

Since e(k+l) is independent with respect to u(k-d),u(k-d-1),.., y (k), y (k- 1),... , and a linear combination of these variables generating ~, the ond term will be zero. The third term does not depend on the choice of ~ and the criterionwill be minimized if the first term becomes null. This leads to (3.46).

TLFeBOOK

Page 71: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 59

The prediction given by the ARX structure can further be written out asscalar computations:

~(k+l) bou(k-d+l) (3.49)

+blu (k - d) + ...

+b,~Bu (k - d + 1 - nt~)-aly(k)

-a2y ( k - 1) - ...

-a,~Ay (k - nA + 1)

Note, that the predictor can be written as a linear regression

~(k + 1)= ~ (k + (3.50)

where r=...it, and the LS method can be used for estimating the parameters.

In general, the predictor can be written as

~’(k + 1)= f(u(k-d + 1), ...,y (k) (3.51)

where f is a linear function of the process inputs and past process outputs. Iff is a non-linear function, these models are referred to as NARX models. Theprediction is a function of the process inputs and the past (real, measured)process outputs. This avoids the model to drift far from the true process inthe case of modeling errors.

3.3.4 Output error

Definition 8 (OF, structure) Output error (OE) structure is obtained setting C = D = 1 in the general system (Definition 6):

B(q-~)y(k) = A (q-~) u(k - d) (3.52)

In the OE system, the process output is disturbed by white noise only.

Let us calculate the output of such a system at the future sampling instantk + 1 (one-step-ahead prediction); assume that A and B are known. We canrewrite the OE structure for k + 1

y(k + 1)= B(q-1)A (q-~) u (k - d + 1) + e (k (3.53)

TLFeBOOK

Page 72: Advanced Process Identification & Control

60 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

The first term on the right side is a deterministic process, with u (t; - d + 1),u (k- d- 2),... available. The second term is unknown (not available instant k) but {e (k)} is assumed to have zero mean. Thus, we

~(k + 1)=~(q-X~,u(k_dB + 1) (3.54)A(q-x)

where the hat in ~ indicates that a prediction of y is considered. It is easy toshow that this predictor minimizes the expected squared prediction error4.

Algorithm 7 (OE 1-step-ahead predictor) Predictor for an OE system(Definition 8) is given

B(q-1)~(k + 1)= A(q_x)u(k-d+ 1) (3.55)

The predictor operates ’in parallel’ with the process. Only a sequence ofsystem inputs is required and the measured process outputs y (k) are notneeded.

The predictor can be written in a more explicit way as

~(k+l)=B(q-1)u(k-d+l)-Ax(q-1)~(k) (3.56)

where At (q-~) = ax + ... + a,.~q-(’~A-x) (containing the model coefficientscorresponding to past predictions), i.e. given by

A (q-X) = 1 + q-lA1 (q-l) (3.57)

4Let us minimize the expected squared prediction error,

~(k + 1) =argn~nE {[y(k 1) -y~2}

Substituting (3.52) to the above, we get

+E +

The second term wiB be zero. The t~rd term do~ not depend on the choi~ of ~. Thecriterion wiB be ~d if the ~st term becomes n~l. T~s le~ to (3.54) [51].

TLFeBOOK

Page 73: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 61

The prediction given by the OE-structure can further be written out as scalarcomputations:

~(k+l) bo u(k-d+l) (3.58)

+blu (k - d) + ...

+bn~u (k - d + 1 - riB)

1)- ...-a,~4~(k - nA + 1)

Note, that the prediction has the form

~(k+l)=f(u(k-d+l), ,~(k),...) (3.59)

where f is a linear function (superposition) of the process inputs and pastpredictions. Nonlinear models are referred to as NOE models (non-linearoutput error). The prediction is a function of the past predicted outputs.Notice that the output measurement noise does not affect the prediction.Notice also that we can write the predictor as ~(k + 1) = ~T~ (k); however,

the ~s in the regression vector are functions of the parameters 0 (see Section3.3.8 how this affects the parameter estimation).

3.3.5 Other structures

A third important system structure is the ARMAX (autoregressive movingaverage exogenous) structure.

Definition 9 (ARMAX structure) The ARMAX structure is obtainedby setting D = A in the general structure (Definition 6):

A(q-1) y(k) = B(q-1) u(k-d)+v(q-1)e(k) (3.60)

Let us again rewrite the system for k + 1

A(q-1) y(k+l)=B(q-i)u(k-d+l)+C(q-1)e(k+l) (3.61)

Defining C1 (C = 1 + q-lC1) and A1 (C and A are monic), we can write

y(k+l) -A ~(q-~)y(k)+B(q-i)u(k-d+l) (3 .62)

+e(k + 1)+C~ (q-~) e(k)

TLFeBOOK

Page 74: Advanced Process Identification & Control

62 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

Taking into account that the random variable e(k + 1) will act on the process(system) in the future, we obtain an expression of the ARMAX predictor

ff(k+l)=-Al(q-1)y(k)+B(q-1)u(k-d+l)+Cl(q-~)e(k)

which is the prediction minimizing the expected squared error5.

In view of (3.62), it follows that the prediction error is equal

e (k) = y (k) - (3.64)

The past noise terms can be calculated from data. Alternatively, we canobtain the expression from (3.60) for computing past noise terms

[e(k) = ~=Z ~(k) ~-~u

Algorithm 8 (ARMAX predictor) Predictor for an ARMAX structure(Definition 9) is given by:

~(k + 1)=-A, (q-~)y(k)+ B(q-’)u(k-d+ 1)+C1 (q--l) ¢(k) (3.66)

where e (k) = y (k) - ~(k). It is a function of three terms: the measurements, system inputs, and known errors.

Substituting (3.64) to (3.63) and reorganizing, we can see that the diction can be written as

~(k+l) = (Cl(q-~)-A~(q-1))~(k)

+~ (q-1)u(~- ~+ (3.67)-cl (q-~) ~(~)

5Let us minimize the expected squared prediction error. Let the ARMAX system begiven by (3.62). We have that

= ¢{[-~1 (q-’)~(~)+ S(q-~)u(~-~+ ~)+c~ (q-~) e (~) + e(~ + ~]:}

Reorganizing gives

J ~--- E{[-A1 (q-1) y(k)q-J~(q-1)~z(k-dq- 1)-be1 (q-1) e(k)-~]2)

+2E {e (k + 1)

[-A~ (~-~) ~ (~) + ~ (~-’) ~ (~ - ~ + ~) + c, (~-~) ~ (~) where the l~t term ~nishes since e (k + 1) is independent of all previous obse~tions.The minimum of J is obtained at (3.63).

TLFeBOOK

Page 75: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 63

which has the form

~(k + 1) = f(u(k-d+ 1),...,y(k),...,~(k),...) (3.68)

where f is a linear function of the process inputs, past process outputs, andpast predictions; non-linear models are referred to as NARMAX models.

Another important form is obtained by rewriting the noise term

(q ~ (k + 1) = C A (----~e (~ + (3.69)

Using definitions for C1, A1

~ (k + 1) + A~ (k)~ (k) e(k + 1)+ C~e(k) (3.70)

and ~ (k)= ~e(k), from (3.69), we have

{(k + 1)=e(k 1)+ [C ~ (q -~)- A1 (q- l) C(q A(q_~)je(k) (3.71)

From (3.65) we get an expression for the past noise terms. Substituting (3.65)to (3.71) and reorganizing we have for the noise term

~(k+l) = e(k+l) (3.72)

C(~_-i~y(k) A(q_,)

Substituting (3.72) for the noise term we obtain another expression for theARMAX predictor.

Algorithm 9 (ARMAX predictor: continued) Predictor for an AR-MAX structure (Definition 9) is given by:

B(q-1)~(k+l) m( q_l)u(k+l-d) (3.73)

_~ Cl (q-~) - At (q-~) B (q-~)]C (q-l) Y(k)-A---~u(k-d)

Thus the ARMAX predictor can be seen as consisting of an OE predictorand a correction term. ~

Example 17 (ARMA) Let us consider the following stochastic process [2]

y(k) + ay(~ - 1) = e(k) + ~(k (3.74)

TLFeBOOK

Page 76: Advanced Process Identification & Control

64 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

where {e (k)} is a sequence of equally distributed normal random variableswith zero mean.

The process can be written as

1 +cq_~e(k) (3.75)y(k)= l +aq--

Consider the situation at sampling instant k when y (k),y (k- 1),... observed and we want to determine y (k + 1). (3.75) gives

y(k+ 1)1 + cq-1

= e(k ÷ 1) (3.76)1 +aq-~

= e(k+ 1) e(k) (3.77)1 + aq-1

The term e (k + 1) is independent of all observations. The last term is linear combination of e (k), e (k - 1) ,... to be computed from the availabledata:

e(k)-1 +aq-ly(k)1 +cq-~ (3.78)

Eliminating e (k) from (3.77), we obtain

y(k+l)=e(k+l)+c-a

1 + cq-~y( k) (3.79)

The problem now is to find the prediction ~(k + 1) of y (k + 1), based the available data at instant k, such that the criterion

J= E{~(k + 1)} (3.s0)

is minimized, where s (k + 1) is the prediction error

e(k+ 1)=y(k+ 1)-~(k+ (3.81)

Equations (3.79)-(3.81) lead

E{e=(k+l)}

+E 1 + cq-~ y (k)

+2E e(k+l) l+cq-~

(3.82)

TLFeBOOK

Page 77: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 65

As e (k + 1) is independent of the observations available at instant k, it fol-lows that the last term vanishes. Hence, we can write

J=E{e2(k+l)}>_E{e2(k+l)}

where the equality is obtained for

~ = ~(k + l)

(3.83)

The prediction error is given by z (k + 1) = e (k +

Example 18 (ARMA: continued) Let us obtain the same result usingAlgorithm 9. From the system given by (3.75) we get

C (q-l) _- 1 ~- cq-1 (3.85)

B (q-’) = (3.86)A (q-l) = 1 +aq-~ (3.87)

Using Algorithm 8 we get

C~ (q-i)- A~ -1)~(k+X) y(k)c(q-1)

Substituting C, C~ = c and A1 = a gives

~(k + 1)= +cq-~y(k) (3.88)

Definition 10 (ARIMAX structure) ARIMAX (autoregressive integralmoving average exogenous) structure is obtained by setting C = 1, D = AAin the general structure (Definition 6):

C(q-I)y(k) -- B (q-~) u(k_ (3.89)

A(q-’) AA(q-~)e(k)

where A = 1 - q-~.

Multiplying (3.89) by AA, reorganizing, and shifting to k + 1 gives

AA(q-~)y(k+l)=B(q-~)Au(k-d+l)+C(q-1)e(k+l) (3.90)

The ARIMAX system can be seen as an ARMAX process, where A (q-~) ~--AA (q-l) and u (k) ~-- Au (k). Then, using Algorithm 8, we have predictor.

c--a

1 + cq-’ y (k) (3.84)

TLFeBOOK

Page 78: Advanced Process Identification & Control

66 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

Algorithm 10 (ARIMAX predictor) The predictor for an ARIMAX sys-tem is given by

~(k + 1) = -[AA], (q-l)y(k)+ B (q-l)Au(k- d-~- 1)d-C1 (q-l) e (]g)

(3.91)

where e(k) = y(k) - ~(k) and [AA] = 1 + q-1 [AA]I. In the ARIMAXprocess, the noise (filtered by C) is integrated to the process output, whichmakes it possible to model disturbances of random walk type.

The ARIMAX model (also referred to as the CARIMA model) is used the Generalized Predictive Control (GPC). Due to the integral term presentin the noise model, an additional integral-of-error term is not needed in thecontroller.

3.3.6 Diophantine equation

Prediction is intimately related to the separation procedure of available andunavailable data. This separation procedure is performed by Diophantineequation which will be presented next.

The Diophantine equation

q_i Fi (q-1) (3.92)

is used for separating a transfer operator into future and known parts (avail-able and unavailable information). The solution to this equation will beneeded in the next sections. Equation (3.92) can be solved in a recursivefashion, so that polynomials Ei+l and F/+I are obtained given the values ofEi and Fi. In the following, this recursive solution will be derived.

Let us assume that Y is monic. Hence, the polynomials are given by

v (q-l) = 1 + y,q-’ + ... + y~yq-~- (3.93)x (q-I) = x0 + ~lq-1 + ... + x.~q-~x (3.94)Ei (q-l) = ei,o + ei,lq-1 + ... + ei,,,~,q-’ ~E’ (3.95)

F, (q-~) = £,o + £,1q-1 + ". + £,n~iq- nF’ (3.96)

Consider two Diophantine equations

x (q-i) = y (q-,) E,÷I (q-~) + q-(’÷l)F,+~ (3.97)X (q-l) _- y (q-i) Ei (q-i) -4- q-iFi (q-i) (3.98)

TLFeBOOK

Page 79: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 67

Subtracting (3.97) from (3.98) yields

0 = Y (q-I) [Ei+~ (q-l) Ei(q- l)] (3.99)+q-i [q-’F~+~ (q-~) - Fi (q-l)]

The polynomial E~+I - E~ can be split into two parts (by simply taking outone element)

Ei+I (q-l) Ei(q- l) --- -- ~ ( q- l) _~_ ei+l ,iq-i (3.100)

Substituting (3.100) into (3.99) gives

e -io = ~ (q-~) [~ (q-l) + ,÷l,,q (~.~o~)+q-’ [q-lF,÷l (q-~) - F~ (q-’)]

= y (q-l) N (q-l) (3.1o2)

Hence, it follows that

and

/~ (q-l) = (3.103)

q-~Fi+l(q-~)-Fi(q-1)+ y(q-1)ei+Li=O (3.104)

In order to derive the coefficients of the polynomial q-~F~+~, let us rewritethis equation into the following form:

q--l[fi+l,Oq-fi+l,lq--l+...q-fi+l,nFi+lq -nFi+l] (3.105)

+ [1 + y~q-~ + ... + y, wq-’w] ei+l,i= 0

Finally, we obtain

ei+l,i ~ fi,o

f~+~,o= ~,~ - yae~+~,~/i+1,1 --= /i,2 -- Y2ei+l,i

fi+l,j -=" fi,j+l -- yj+iei+l,i

:

(3.~06)

(3.107)

TLFeBOOK

Page 80: Advanced Process Identification & Control

68 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

where (3.107) is for j = 1, 2, ... Thus, a recursive formula for computingwas obtained. Using (3.100) and (3.103), we also obtain a recursive formulafor

Ei+x (q-a) = Ei (q-a) + e~+x,iq-i (3.108)

Now all that is needed are the initial values Ex and Fx for the recursiveformula. Setting i = 1 in (3.92) gives

X (q-X) = E, (q-a) + q-lFx (q-l)(3.109)

y(q-a)

X(q -1) -~ El(q-i) y(q-1)Wq-iFl(q-1) (3.110)

Since Y is monic, we get

E1 (q-l) _- (3.111)

and substituting (3.111) into (3.11o) gives

F1 (q-l) = q IX (q-l) xoY (q-l)] (3.112)

The Diophantine equation (3.92) for (3.93)-(3.96) can thus be solved ing from the initial values E1 and Fa given by (3.111) and (3.112). The lutions E~ and F~, i = 2, 3, ... , are then obtained recursively using (3.106),(3.107), and (3.108) using i = 1,2,3,

Algorithm 11 (Solution of the Diophantine equation) The solutionof the Diophantine equation

where

(q-l)-- = Ei (q-a) + q-i~ (q-I)1

(3.113)

ny > 0, can be computed recursively using

E, (q-a) = (3.118)

F1 (q-l) = q IX (q-l) _ Xo]Z (q-l)] (3.119)

y (q-a) _~ 1 4-yaq-1 + ... 4- ynvq-nY (3.114)

X (q-l) = Xo 4- xaq-1 4- ... 4- Xnxq-nX (3.115)

(q-X) = e ,o + e ,aq-a + ... + q (3.116)F~ (q-X) = .5,o + f~,aq-a + ... + I~,,~F, q-’~F’ (3.117)

TLFeBOOK

Page 81: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES

and for i = 1, 2, ... and j = 0, 1, ..., max (nx - i, ny - 1) -

69

ei+l,i = fi,0 (3.120)

fi+l,j = fi,j+l - yj+le~+l,~ (3.121)

E/+I (q-i) = E, (q-i) + ei+l,,q-~ (3.122)

The degrees of the polynomials are given by

n~,, = i- 1 (3.123)

nF~ = max(nx - i, ny - 1) (3.124)

3.3.7 /-step-ahead predictions

Let us consider a Box-Jenkins structure (Definition 6)

C_.(q-1)y(k)= B(q-1)u(k-d)+ (k) (3.125)

A(q-1) D~-~e

where the disturbance is given by

C(q-1)~ (k) = n(~e(k) (3.126)

and let us calculate a ’one-step’ algorithm for obtaining/-step ahead predic-tions (see [88]). Thus, we wish to have a prediction ~(k + i) for the plant put y (k + i), provided with information up to instant k: y (k), y (k - 1), u(k),u(k- 1),... and ~(k),~(k- 1), ....values

Observe that the future output

y(k +i)=B(q-1)A(q_l) u(k +i-d)+ D(q_l)e(k +i)C(q-1) (3.127)

can only be predicted with uncertainty since the future noise terms e (k + 1),e (k + 2), ..., e (k + i) are unknown. The minimization of such an uncertaintyis the objective of the predictor design problem. This is a crucial issue in thepredictive control, to be discussed in later chapters.

Separation of disturbance

Let us start by separating unknown terms (future) and known terms introducing the Diophantine equation for the disturbance process

C= E, (q-~) + q-i-~ (q-i) (3.128)

n(q-1) (q-i)

TLFeBOOK

Page 82: Advanced Process Identification & Control

70 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

where

degEi(q-X) = { i-1 if riD>0(3.129)min (i -- 1, nc) otherwise

degFi (q -x) = max(nc- i, no-1) (3.130)

The disturbance at k + i can be decomposed into unknown (future) andknown (current and past) parts

~ (k + i) = Ei (q-l) e (]~ d- d-D(q-~)--e(k) (3.131)

The polynomials Ei and F~ are usually solved recursively (see 3.3.6). As-

sume that the solutions E~ and F/ are available. The second term on theright side can be compUted by multiplying (3.125) by

~ (q-l) Fi (q-~) B (q-*), (k - d) F~(q- *) C (q-l) ,,,c (q-l) v(k) - c (q-,) A ~ C (q-l) -d(-~e~’~) (3.132)

and rearranging

Fi (q-l) Fi (q-l)

BA (q-~)(q-~) u (k - d)](3.133)

The process output/-steps ahead then becomes

y(k+i) = -dB(q-~)X~-~=~u (~ + i)

(3.134)

C(q-1) A(q-i) u --

+E, (q-l) e(~

The third term depends on future noise terms e (k + i), which are unknown.However, {e (k)} was assumed to have zero mean, and we can take the condi-tional expectation of y (k + i), given all data up to k and the future processinputs. The best/-step ahead predictor (in the sense that the variance of theprediction error is minimal) then becomes

4B (q-~)u (k + (3.135)~(k-t-i) =- q- ~(q-1)

.~Fi(q-~) B(q-a)u(k_d)]C(q-’) y(k) A(q-’)

TLFeBOOK

Page 83: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 71

Notice, that (3.135) represents the/-step-ahead prediction as a function system inputs and prediction errors.

The prediction error for the i’th predictor is given by

~(~ + ~)=y(~ +i)-~(k +~)= E,(q-i)~(~ (3.136)

which consists of future noise only (white noise with zero mean and variancea2). The variance is given by

j--0

where ei,j is the j’th element of Ei. Thus, the variance of the prediction erroris minimal.

Let us continue a bit further and write (3.135) strictly as a function system inputs and past outputs. Multiplying both sides of the Diophantine(3.128) with BD/AC we obtain

B(q-1) B(q-~)D(q-1)Ei(q-1) -~B(q-1)F~(q-1)(3.138)A(q_~) - A(q_~)C(q_~) + A(q-~)C(q-~)

which with (3.135) yields:

~(k+i) q_d [B(q-1)D(q-~)E{(q -1) q_iB(q-1)F~(q-1)1A(q-1)C(q-,) + A(q_l)-~:-~ju(k+i)

-tFi(q-1) B(q-1) (k-d)] (3.139)C (q_l) y(k)

Simple algebraic calculations lead to the following/-step-ahead predictor.

Algorithm 12 (/-step-ahead BJ predictor) The/-step-ahead predictorfor a Box-Jenkins system (Definition 6) is given

~(k + i) = -dB ( q-l) D(q-i) E~(q- l) ?~ (k -{- i) ~ --A(q-1)C(q-1)

Fi (q-1)C(q_l)y(k) (3.140)

where E~ and F~ are obtained from the Diophantine equation

D(q_i)(3.141)

TLFeBOOK

Page 84: Advanced Process Identification & Control

72 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

Example 19 (1-step-ahead OE predictor) Let us derive the one-step-ahead predictor for an OE system (Definition 8).

The Diophantine equation becomes

1--- E1 (q-l) -t-q-iF1 (q-l) (3.142)

for which the solution is

E1 (q-i) = (3.143)

F1 (q-l) _-- (3.144)

The predictor becomes

~’(k + 1) B(q-1)u(k-d+ 1) (3.145)A (q-i) -

If A is a factor of D, numerical problems may occur (notice that in theARX and ARMAX structures D = A, in the ARIMAX D - AA.) To avoidthese problems, let us rewrite the algorithm for this particular case.

Algorithm 13 (/-step-ahead BJ predictor: continued) If A is a factorof D, denote

D (q-l) = D1 (q-i) A (q-l) (3.146)

The/-step-ahead predictor for a Box-Jenkins system (Definition 6) is thengiven by

Fi(q-1)~(k+i)=q-dB(q-1)nl(q-1)Ei(q-i)u(k+i)+ (k) (3.147)C(q-1) C(q-1)Y

where Ei and F~ are obtained from the Diophantine equation

C (q-l__) iFi (q-i)D (q_l) = Ei (q-i) + q- -~(q-i) (3.148)

Example 20 (1-step-ahead ARX predictor) Let us derive a one-step-ahead predictor for an ARX system (Definition 7).

Since A = D, D1 = 1. The Diophantine equation becomes

1 .-i F1 (q-i) (3.149)A(q_l) = E1 (q-i) +t/ ~(q_-~)

The solution for the Diophantine is given by

E1 (q-I) _-- (3.150)

F1 (q-l) = q [1 - A (q-l)] = -A1 (q-I) (3.151)

The predictor becomes

~(k+l)=B(q-1)u(k-d+i)-Al(q-1)y(k) (3.152)

TLFeBOOK

Page 85: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 73

Separation of inputs

In control, the future process inputs are of interest (they are to be determinedby the controller). The future and known signals in (3.140) can be furtherseparated into future and known parts using a Diophantine equation:

B (q-l) D (q-l) Z~ (q-’) Hi (q-l)= Gi (q-l) + q-i+a (3.153)

A (q-l) C (q-l) A (q-l) C (q-i)

which gives the algorithm for the/-step ahem prediction.

Algorithm 14 (/-step-ahead BJ predictor: continued) Using a model with separated available and unavailable information, the/-step aheadprediction is given by

+ i) (q-l) _ d + (3.154)Hi(q-1)

~ A(q_l)C(q_l)u(k)

where Ei and F/are obtained from the Diophantine equation

C= E~ (q-l) + q-i~ (q-l) (3.155)

D(q-1) (q-l)

and Hi and Gi are obtained from the Diophantine equation

B (q-l) D (q-l) Ei (q-l) Hi (q-l)= Gi (q-l) + q-,+d (3.156)

A (q-l)C (q-l) A (q-l)C (q-l)

Finally, let us give the corresponding algorithm for the case of having Aas a factor of D.

Algorithm 15 (/-step-ahead BJ predictor: continued) Consider a model with separated available and unavailable information and where A isa factor of D. Denote

D (q-l) -_ D1 (q-l) A (q-l) (3.157)

The/-step-ahead predictor for a Box-Jenkins system (Definition 6) is thengiven by

~(k + i) = ai (q-1)u(k-d (3.158)

-+ H,C(q-1)

TLFeBOOK

Page 86: Advanced Process Identification & Control

74 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

where Ei and F~ are obtained from the Diophantine equation

and Hi and G~ are obtained from the Diophantine equation

B (q-~) D1 (q-’) E~ (q-i)= G~ (q-i) + q-~+dH~ (q-i)

C(q-1)

(3.159)

(3.160)

3.3.8 Remarks

Let us conclude this chapter by making a few remarks concerning the prac-tical use of the stochastic time-series models.

Incremental estimation

In practice, differencing of data is often preferred, i.e. working with signalsAy (k) and Au (k), where A = 1 -q-~. However, differencing data with highfrequency noise components degrades the signal-to-noise ratio. It is possible(simple solution) to overcome this with appropriate signal filtering.

Gradients

The estimation of the parameters in the polynomials A and B of the processmodel is usually based on gradient-based techniques. For the ARX structure,the predictor is given by

~(k + 1) = B (q-~) u(k - 1) - A1 (q-~) y(k (3.161)

Since the inputs are independent of the predictor parameters, LS, RLS, etc.(see Chapter 2) can be used.

In the OE structure (as well as ARMAX, etc.), the predictor outputdepends on the past predictions and the regression vector is thus a function ofthe parameters themselves. In order to estimate the parameters, alternativemethods must be used. Following chapters will present the prediction errormethods (non-linear LS methods), for which the gradients of the predictoroutput with respect to the parameters are required.

The OE predictor is given by

~(k+l)=B(q-~)u(k-d+l)-Al(q-~)~(k) (3.162)

TLFeBOOK

Page 87: Advanced Process Identification & Control

3.3. STOCHASTIC DISTURBANCES 75

The gradients with respect to the parameters in the feed forward part B aregiven by

O~(k) =u(k-d- n)- am(3.163)

m~l

where n = 0, 1, ..., riB; and with respect to parameters in the feedback partA

O~(k)’~’~ O~(k-m)

(3.164)

where n = 1, 2, ..., nA.6

aThe gradients with respect to the parameters in the feedforward part B are given by

o~(~) _~_oB ~Oan = Oa,~ (q-’) u (k - d) - A, (q-’) ~(k 1)

The first term on the right hand side does not depend on an. For the second term, sinceA1 = al + a2q-~ + ... + an.~q-In-~-~), we can write

O~(k)Oan

which can be written as (3.164). Similarly, the gradient with respect to parameters in are given by

o~(k) oObn = "~n B (q--l) u(k - d) - "~nA1 (q-l) ~(k -

The first term on the right hand side gives

0 0 0-~-~B (q-’) u (k - d) = -~-~nbou (k - d) + ... + ~b,u (k - d - n)

o+~bn,~u (k - d -

Ob,= u(k-d-n)

and the second term gives

o ,~A O~(k - .~)-Ob---~ (q-~) ~(~ - 1) = - ~_, a..

m=l

Combining these, we have (3.16g).

TLFeBOOK

Page 88: Advanced Process Identification & Control

76 CHAPTER 3. LINEAR DYNAMIC SYSTEMS

Assuming that the parameters change slowly during the estimation7, thepast gradients can be stored and the computations performed in a recursivefashion. Let us collect the results in this more convenient form.

Algorithm 16 (Gradients of the OE predictor) The derivatives of theoutput of the OE predictor with respect to the parameters in A and B axegiven by

(k) % (k); (3.165)

~A

¯ b,,(k) = u(k-d-n)-Ea,~(k-m) (3.166)rn=l

nA

¯ ~(k) = -~(k-n)-Ea,~(k-m) (3.167)

where n = 0, 1, ..., nB and n = 1, 2, ..., nA, respectively.

Notice that the system needs to be stable, since otherwise the gradients willgrow (unbounded).

Estimation of noise polynomials

In the system model

y(k)= B(q-1)u(k-d)+--A(q-1) e(k) (3.16s)D(q-1)

only the process dynamics, B and A, are usually identified. D is a designparameter, the selection of which results in the OE structure, ARX structure,etc. Estimating C is generally difficult in practice, because of the nature ofC and the fact that e (k) is never available and must be approximated by priori or a posteriori prediction errors, thus reducing the convergence rateof the parameters.

For estimating C, a simple solution is to filter the data (using the priorinformation about the process noise) with a low pass filter, F, thus removinghigh frequency components of the signals. It is then possible to use a fixedestimate of C (often denoted by T), representing prior knowledge about theprocess noise. One interpretation of T is that of a fixed observer. In theestimation, e.g., the RLS can be used.

7The assumptions are that ~ (k - m)10=0q¢) " "~ ~ (k - m) IO=O(k--m) andOb~°---L (k - rn) IO=O(k) ~ O_L (k - m)Io=o(k-m), where 0 contains the time-varying compo-

Oa~ Oannents of the model.

TLFeBOOK

Page 89: Advanced Process Identification & Control

Chapter 4

Non-linear Systems

Identification can be justified by the reduced time and effort required inbuilding the models, and the flexibility of parameterized experimental modelsin real-world modeling problems. For simple input-output relations, linearmodels are a relatively robust alternative. Linear models are simple andefficient also when extending to the identification of adaptive and/or dynamicmodels, and readily available control design methods can be found from theliterature. However, most industrial processes are non-linear.

If the non-linear characteristics of the process areknown, a seemingly non-linear identification problem may often be converted to a linear identificationproblem. Using the available a priori knowledge of the non-linearities, themodel input-output data can be pre-processed, or the model re-parameterized.This is in fact what is often done in gray-box modeling. As the processes be-come more complex, a sufficiently accurate non-linear input-output behavioris more difficult to obtain using linear descriptions. If more detailed modelsare required, then the engineer needs to turn to methods of identification ofnon-linear systems.

Many types of model structures have been considered for the identificationof non-linear systems. Traditionally, model structures with constrained non-linearities have been considered (see, e.g., [78]). Lately, a number of newstructures have been proposed (see, e.g., [86]) and shown to be useful inapplications. Particular interest has been focused on fields such as neuralcomputation [29][27] and fuzzy systems [47] [73]. These fields, among manyother topics, are a part of the field of artificial intelligence.

In this chapter, a brief introduction to some basic topics in the identifi-cation of non-linear systems is given. The target of this chapter is to providethe reader with a basic understanding and overview of some common param-eterized (black-box) structures used for approximating non-linear functions.In particular, the basis function networks are introduced. They provide a

77

TLFeBOOK

Page 90: Advanced Process Identification & Control

78 CHAPTER 4. NON-LINEAR SYSTEMS

general framework for most non-linear model structures, which sb~ould helpthe reader in understanding and clustering the multitude of differelat specificparadigms, structures and methods available. The power series, one-hidden-layer sigmoid neural networks and 0-order Sugeno fuzzy models are consid-ered in detail, including linearization of the mappings and the computationof gradients.

4.1 Basis function networks

In this section, the basis function networks [86] are introduced. They providea general framework for most non-linear model structures.

4.1.1 Generalized basis function network

Most non-linear model structures can be presented as decomposed into twoparts:

¯ a mapping ~ from the input space to regressors; and

¯ a mapping f from regressors to model output.

The selection of regressors ~o is mainly based on utilizing physical insightto the problem. Obviously, all the necessary input signals should be in-cluded. Some transformation (pre-processing, filtering) of the raw measure-ments could also be used in order to facilitate the estimation of the parame-ters. In dynamic time-series modeling, the ’orders’ of the system (number ofpast inputs, outputs and predictions) need to be chosen. Such semi-physicalregressors are formed in view of what is known about the system. In theremaining sections, we will be interested in the mapping f.

The non-linear mapping f can be viewed as function expansions [86]. Ina generalized basis function network [31], the mapping f is formed by

H

~(k) = f(~o(k),.) = Ehh (~(k),.)gh h----1

(4.1)

where gh are the basis functions and hh are weighting functions, h = 1, 2, ..., H.~ denotes the model output1. The dot indicates that there may be some pa-rameters associated with these functions. The output of each basis function

1 Usually the models are to be used as predictors. We will use this notation throughout

the remaining chapters.

TLFeBOOK

Page 91: Advanced Process Identification & Control

4.1. BASIS FUNCTION NETWORKS 79

is multiplied by the weighting function and these values are summed to formthe function output. With constants as weighting functions, the structureis referred to as a standard basis .function network. The k’s in (4.1) refer the fact that these models will be used for sampled systems. The mapping f,however, is not dependent on the sampling, just as the operations of multi-plication and summing in linear systems are not dependent on the sampling.In the remainder of this chapter, simplified notation will be used

H

~= f(~, ") = :~-~ hh (~, ") gh (qa, (4.2)h----1

An important interpretation of the basis function network is that of localmodels [31]: ~n (4.2), ...

... each function hh can be viewed as a local model, validity ofwhich is defined by the activation value ofgh. Hence gh’S partitionthe input space into operating regions on each of which a localmodel is defined. The network smoothly joins these local modelstogether through interpolation to form an overall global model f.

4.1.2 Basis functions

Usually the basis functions are obtained by parameterizing a single-variablemother basis .function, ~, and repeating it a large number of times in theexpansion. Single-variable basis functions can be classified into local andglobal basis functions. Local basis .functions have a gradient with a boundedsupport (at least in a practical sense), whereas global basis functions havean infinitely spreading gradient. This means, roughly, that with local basisfunctions there axe large areas in the input space where a change in the inputvariable causes no change in the function output; a change in the input of aglobal basis function always causes a change in the function output. Differentkinds of single-variable basis functions are illustrated in Fig. 4.1.

In the multi-variable case, the basis functions can be classified into threemain groups [86]: tensor products, radial constructions and ridge construc-tions.

¯ The tensor product construction is the product of single-variable func-tions

I

gn (~o) = g (~i, ") (4.3)

where the subscript i indexes the elements of the regression vector.

TLFeBOOK

Page 92: Advanced Process Identification & Control

80 CHAPTER 4. NON-LINEAR SYSTEMS

0.8~

0.~

OA

0.~

0

sine

0 0.2 0.4 0.6 0.8 1

1

0.~

0.~

0.4

0.2

0

0 0.2 0.4 0’.6 0’.8

1

0.~

O.f

0.4

0.~

0

semi-circl 1

o.~

0.~

O.d

0.2

0

sigmoid

0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

Figure 4.1: Examples of single-variable basis functions, ~. A global basis

function (sine) has a gradient with an infinite support. Local basis functions

(semi-circle) have a bounded support, at least in a practical sense (Gaussian

and sigmoid functions).

TLFeBOOK

Page 93: Advanced Process Identification & Control

4.1. BASIS FUNCTION NETWORKS 81

Radial construction is based on taking some norm on the space of theregression vector and passing the result through a single-variable func-tion

In ridge constructions, a linear combination of the regression vector ispassed through a single-variable function

gh (~) ---- ~ (f~’~+~/h) (4.5)

The parameters ~/h and f~h are typically related to the scale and position of

gh.

4.1.3 Function approximation

The powerful function approximation capabilities of some basis function net-works are a major reason for their popularity in the identification of non-linear systems.

Let us call by a universal approximator something that can uniformlyapproximate continuous functions to any degree of accuracy on compact sets[12]. Proofs of universal approximation for basis function networks have beenpublished by several authors. Hornik [30] showed that the multi-layer feed-forward networks with one hidden layer using arbitrary squashing functions(e.g., sigmoid neural networks) are capable of approximating any measurablefunction from finite dimensional space to another. This can be done to anydesired degree of accuracy, provided that sufficiently many basis functionsare available. The function approximation capability can be explained in thefollowing intuitive way [29]:

Any reasonable function f{x} can be represented by a linearcombination of localized bumps that are each non-zero only in asmall region of the domain {x}. Such bumps can be constructedwith local basis functions and the associated weighting functions.

Not surprisingly, universal function approximation capability can be provedfor many types of networks. All the proofs are existence proofs, showing thatapproximations are possible: There exists a set of basis functions with a setof parameters that produces a mapping with given accuracy.

Unfortunately, less can be said about how to find this mapping: Howto find the correct parameters from data, or what is a (smallest) sufficient

TLFeBOOK

Page 94: Advanced Process Identification & Control

82 CHAPTER 4. NON-LINEAR SYSTEMS

number of basis functions for a particular problem. A typical framework isto approximate an unknown function F

y=F(~)÷e (4.6)

based on sampled data ~ (k), y (k), k = 1, 2, ..., K, where the observed puts are corrupted by zero mean noise e (k) with finite variance. Notice, thatin a standard basis function network

H

~= f(~) " E ahgh (~,Dh,~/~) (4.7)

the parameters ah appear linearly. If only ah are of interest, these can beestimated from data, e.g., using the least squares (the regressor containingthe evaluated basis functions). If there are parameters in the basis functionsto be estimated (Dh,’~h) they typically appear non-linearly. In some cases,these types of parameters are commonly estimated using iterative gradient-based methods (see Chapter 6).

The structure selection problem (roughly, the selection of H) can alsobe guided by data (see, e.g., [18][26]). The main obstacle in structure se-lection is the fundamental trade-off between bias (due to insufficient modelstructure) and variance (due to noise in a finite data set), the bias-variancedilemma. With increased network size the bias decreases but the varianceincreases, and vice versa. In practice the performance of data driven struc-ture selection (smoothing) algorithms can be computationally expensive andsometimes questionable, however, and it is more common to experiment withseveral fixed network sizes H. The ’optimal’ network size is then found as thesmallest network which gives sufficient accuracy both on the data and on in-dependent test data (roughly, cross-validation). The bias-variance dilemmacan also be tackled in parameter estimation by posing constraints on thefunctional form of the mapping (see Chapter 6).

4.2 Non-linear black-box structures

Non-linear system identification can be difficult because a non-linear systemcan be non-linear in so many different ways. Traditionally only model struc-tures with constrained non-linearities have had success in practice. Lately, anumber of new model structures have been proposed and shown to be usefulin applications (see, e.g., [34]). Most interest has been focused on artificialneural networks (such as sigmoid neural networks and radial basis functionnetworks), and fuzzy systems.

TLFeBOOK

Page 95: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 83

To start with, recall the structure of the generalized basis function net-work (4.2)

H

~= f(~o, .) = ~-~hh (~, ")gh (~o, (4.8)h=l

The overall mapping is obtained by taking a weighted sum of the activationof the H basis functions. In what follows, some commonly used structures arepresented and shown to fit to the above generalized basis function networkscheme.

4.2.1 Power series

When global basis functions are used, each weighting function hu has aneffect on the model outcome at every operating region. Typical examplesinclude the linear and multi-linear models, special cases of power series, orpolynomial developments. In power series, the powers of the regressor gener-ate the basis functions; in multi-linear systems only first order terms of eachregressor component are used. The static mapping can be seen as a specialcase of the identification of non-linear dynamic systems using Volterra series(see Chapter 5). Other common structures include the Fourier series, for ex-ample. These belong to the class of series estimators, an extension of linearregression where the components of the regression vector represent the basisfunctions. A convenient feature of these structures is that all the parametersappear linearly, and can be estimated, e.g., using the least squares method.

Linear regression

A linear regression model uses global basis functions

^ (4 9)y-- ~o .

where y is the model output, ~ [~1, ~2, ..., ~, ~,+1 -- 1] T= are the I inputs^ r ....

to the model with bias, and ~ = [~, ~, ..., ~, 0r+l] are the correspond-ing parameters. A linear model can be presented in the framework of the

TLFeBOOK

Page 96: Advanced Process Identification & Control

84 CHAPTER 4. NON-LINEAR SYSTEMS

generalized basis function network by assigning

(4.10)

Quite obviously, only linear functions can be mapped using the above modelstructure.

Alternatively, we can also consider using the observed data points as basisfunctions. Assume that a linear model is based on K available data points(T (k) ,y (k)), k = 1, 2, ..., K. Let a linear model be given by (4.9) ~= [~T~]-I CTy (see Section 2.2.3)¯ Then

~= ~oTZy (4¯11)

where Z = [~T¢] -1 cT. Denote the kt|’ column of Z by Z~. The presentationin the framework of the generalized basis function network is obtained by

y=y H=Kg, (~o, .) ~- ~oTz, hi (~,’) ~-- y(1): :

gh (~O,’)~-- 9~Tzk hn (~a,.)~-- y(k)

gg (~,’) *-- ~orZ~c h, (~o,.) ~- y(g)

(4.12)

assigning

This type of formulation is important in smoothing ([26] [25]). The smoothedvalues for each observed data point are given by

~ = ~ [~T~]-~ ~Ty (4.13)

where S = ¯ [¢T¢] -~ cT (a g × K smoother matrix), and its rows are re-ferred to as equivalent kernels of a linear smoother.

Multi-linear systems

In many practical cases, multi-linear developments are sufficient. A functiong (~o), ~o = IT1, ..., ~, ..., ~a~]T, is multi-linear if it is linear in each component

TLFeBOOK

Page 97: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 85

q~i, when all other components 9~j, j ~ i , are fixed. A general form is givenby

{i1=1,-.. ,il_l }=l;il ~i2<...<ii_1

+01,2,"" ,I~01 (4.15)V

owhere E(a,b,.--,cI:l;a<b<.-.<c denotes multiple summations ~aD__1 EL1 "" EcD---1under the conditions a < b < ... < c.

Example 21 (Multilinear system) For a two-input system (I = 2), multi-linear development is given by

(4.16)

Example 22 (Fluidized bed combustion) In an FBC (see Appendix the steady-state relation between the fuel feed rate, Qc, and the combustionpower is given by

P = gc (1 - V)Qc + gvVQc (4.17)Let us now consider Qc and V (fraction of volatiles in fuel) as non-constantinputs to the system. Using the following input transformations

(4.18)

the equation can be written as

P = Hc~ 1 q- (g V - gc)~1~:~2

which is a multi-linear mapping from ~ to P.

(4.19)

Exeunple 23(I = 3) a multi-linear development is given

(Multilinear system: continued) For a three-input system

~ = ~0 + ~,~, + ~ + ~a~oa + (4.20)

nt-~1,2~91~P2 nt- ~1,3(~1~P3 -Jr" ~2,3(~2(/93

÷01,2,3~o~ ~o2~o3

TLFeBOOK

Page 98: Advanced Process Identification & Control

86 CHAPTER 4. NON-LINEAR SYSTEMS

Power series

In power series, the basis functions are formed by taking tensor products ofinteger-valued powers (j~ = 0, 1, 2, ...; i = 1, 2, ..., I) of the input variables

I

gh (90, .) = H (4.21)i=1

up to a given order j

I

j = Zji (4.22)i=1

The model is then produced by taking a weighted sum of the activations ofthe basis functions and the associated parameters.

Example 24 (Power series) In many practical situations, a second orderdevelopment (j = 2) is sufficient. For a two-input system (I = 2), a polyno-mial development model would be as follows

The corresponding presentation in the framework of the generMized basisfunction network is obtained by substituting

~’-=~" H=6g~ (90,-) = 1 h~ (90,.) ~ g~ (90, .) ~ qo~ hu (90, .) ~ g3 (T, ") ~ ha (~, ") ~ g~ (~,.) ~ ~ (~,.) gs(~,’)~x~2 hs(~,’)~g~ (~,.) ~ ~ (~,-) ~ ~,~

(4.24)

In multi-linear and polynomia..1 developments, the model output ~ is linearwith respect to the parameters 0, yet non-linear with respect to inputs ~.From parameter estimation point of view, the methods could also be seen asa method of pre-processing the input data, and then applying simple linearregression. However, non-linear functions can be mapped using the abovemodel constructions.

TLFeBOOK

Page 99: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 87

Inverse power series

The inverse of a function is commonly needed in applications of control, forexample. In practice it is often simplest to choose the structure of the inversemodel (of the power series expansion) and to estimate their coefficients. a side note, however, we discuss in the following a less known approach forfinding the inverse of a power series.

The Bttrman-Lagrange series constitutes a generalization of Taylor series.These series appear when we expand an analytical function f(~o) into a seriesof ascending powers of another analytical function w(~o):

f(~o) = E ¢x~w~ (~o) (4.25)h=O

For n >_ 1, it follows:

a,~ = . lim~_~a d~,~_l f’ (~o) ’~ (~) j

where

(4.26)

f’ (~o) = df(~o) (4.27)d~o

Example 25 (Function approximation) Consider a function f(x) 2

to be approximated with w(x) = x around x = 0. Then f (x) = 2x and using(4.26) we have for the coefficients:

11. d { (x-a)2}=lim~_,oda2 = ~ ,m~__,a~xx 2x x2 ~xxx = 1 (4.28)

ah = 0 for h = O,h = 2, 3, 4,... (4.29)

Let us apply this result for inverting a power series. Consider the followingpower series:

w (~o) = c1~ + ~ + .. . + c,~o’~ + .. . (4.30)c~ # 0, which is convergent in the neighborhood of the point ~o = 0. Findthe expansion of the function ~o (w) with respect to the ascending powers W

~o(w) = s0 + cqw + ... + c~nw" + ... (4.31)

TLFeBOOK

Page 100: Advanced Process Identification & Control

88 CHAPTER 4. NON-LINEAR SYSTEMS

This particular problem can be solved using the Bttrman-Lagrange series.

1 d n-1 (~)’~c~= ~ lim~_~0 d--~ ,(n= 1,2,...) (4.32)

because in our case

f (~o) = ~ and t’ (~o) = (4.33)

Example 26 (Numerical example) Consider the following power seriesmodel

y = ~o (x) = clx + c2~2 (4.34)For an approximation of its inverse in the neighborhood of x -- 0

x = v(y) = a0 +aly+ a2y2 +""

the following coefficients are obtained using (4.32)

~c~ca2 c4233 = -c15’~ = --52-7’35c~ = 14~1,-..

(4.35)

(4.36)

(4.37)

(4.39)

which leads to:

oo (an),~-~w,~. (4.42)

Figure 4.2 illustrates the approximation with c~ = 1 and c2 = 10.

Example 27’ (Exponential function) Consider the following seriesoo (_a). z,~+ a (4.40)

~ (~) = ~-o~ = ~!n~-0

Using (4.32) we derive

c~,~ = lim~-~0 ~exp (anx) (- an)’-~ (4.41). n~

TLFeBOOK

Page 101: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 89

0.1.*

0.1

>~ 0.05

\". .~\

\\\

/

1

/

05 0 0.05 0.1 0.15X

Figure 4.2: A power series (solid line) and its inverses (dotted lines) approx-imated around x = 0 using n -- 1, 2, 3, 4 and 5 first terms of the series.

4.2.2 Sigmoid neural networks

Neural networks consist of multiple techniques related loosely to each otherby the background of the algorithms: the neural circuitry in a living brain.There are three basic perspectives to neural networks. On can consider themas a form of artificial intelligence, as a means for enabling computers toperform intelligent tasks. On the other hand, neural networks can be seenfrom a biological point of view, as a way of modeling the neural circuitryobserved in living creatures. The approach taken in engineering, the morepractical view, considers neural networks from a purely technical perspectiveof data classification, filtering and identification of non-linear systems.

For a large part the research in neural computation overlaps with thefields of statistical analysis and optimization. In general, neural networksare modeling structures characterized by:

¯ A large number of simple interconnected elements (units, nodes, neu-rons); and

¯ A learning mechanism for adjusting the connections (weights, param-eters) between the nodes, based on observed patterns of the systembehavior.

There are several alternative ways to categorize neural network models andtechniques. From the pragmatic point of view, we can categorize themroughly into two classes:

TLFeBOOK

Page 102: Advanced Process Identification & Control

90 CHAPTER 4. NON-LINEAR SYSTEMS

¯ Multi-layer perceptron networks, such as sigmoid neural networks (SNN),deal with function approximation. Perhaps the most important resultbrought by the neural research has been to show that any reasonablefunction can be approximated to any degree of accuracy; and to providemodel structures that are also viable in practice.

¯ Self-organizing maps (SOM) consider the problems of clustering andquantization. Among the main new contributions is the introductionof an internal topology into the clustering process.

In what follows, we will focus on function approximation tasks. The SOMwill be briefly discussed in connection of nearest neighbor methods (section4.2.3).

Sigmoid neural networks are probably the most common neural networkstructure used for non-linear function approximation. They are also com-monly referred to as multi-layer perceptrons (MLPs), backpropagation net-works, or feed-forward artificial neural nets. These names come from thedifferent properties of the standard sigmoid neural network.

Sigmoid neural networks use (practically) local basis functions. However,due to the use of the ridge construction (4.5), the interpretation as local rood-els is not very useful as this type of structure estimates non-linear hypersur-faces, rather than local models for various operating regions. In practice, thefact that hypersurfaces are estimated provides advantages in interpolation.This type of structure is often referred to as semi-global. Other examples ofsimilar structures include perceptrons or hinging hyperplanes, for example.

In sigmoid neural networks, the basis functions have a sigmoidal shape.The network units are organized as layers. A typical structure is that of alayered2 feed-forward3 sigmoid neural network. In the neural network ter-minology, the model inputs reside at the input layer. The input layer thenfeeds the hidden layer units. The network units, at the hidden layer, com-pure a linear combination of the input variables and pass this sum througha sigmoid function

1gh (~,/~h, ~) -- 1 + e-~-~ (4.43)

The outputs of the multiple units at the hidden layer are then. fed to theoutput unit(s). The output unit computes a weighted sum of the activations

2In the simplest class of layered networks, every unit feeds signals only to the unitslocated at the next layer (and receives signals only from the units located at the precedinglayer). Hence there are no connections leading from a unit to units in preceeding layers,nor to other units in the same layer, nor to units more than one layer ahea~t.

3A network topology is feed-forward, if it does not contain any closed loops (feedback).

TLFeBOOK

Page 103: Advanced Process Identification & Control

4.2. NON-LINE~AR BLACK-BOX STRUCTURES 91

Figure 4.3: Two network constructions. Both models compute a function,~ =f(~). Left: linear model consisting of a single summing node. Pdght:standard sigmoid neural network with five hidden nodes (consisting of summing element followed by a non-linear sigmoid element) at the hiddenlayer, and a single summing output node.

of the hidden layer units. The value of the weighted sum is then the outputof the model.

Figure 4.3 illustrates a standard sigmoid neural network. Also structureswith multiple hidden layers can be constructed, where the basic mappingsare further convolved with each other by treating basis function outputs asnew regressors.

One-hidden-layer sigmoid neural net

For most practical purposes in process engineering, a single-hidden-layer net-work topology is sufficient. Let us consider a standard one-hidden-layer sig-mold neural net with H hidden units (see Fig. 4.3) including bias parameters.The network computes a function f from an I dimensional column vector ~

TLFeBOOK

Page 104: Advanced Process Identification & Control

92 CHAPTER 4. NON-LINEAR SYSTEMS

of model inputs:

H

~-- f(~o, ~, f~) = ahgh (~O,[~h) + aH (4.44)h=l

where

1= i (4.45)1-)-exp(-y]flh’’~ai-~h’’+a)i=a

The H + 1 + H (I + 1) network parameters are contained in an H + 1 dimen-sional colunm vector c~:

(4.46)

and a matrix f~:

(4.47)

Note that, for convenience, the bias parameters ~/in (4.43) are now integratedinto the structure of the matrix f~. It is common to include bias constantsin the linear summing.

Usually, the parameters in the sigrnoid neural network are estimated usingsome gradient-based method. To do this, the derivatives with respect to theparameters need to be computed. Let us derive the required gradients.

For the parameters c~ at the output node we have

~f = -- Z ahgh (~,f}h) = gh (~,f~h) (4.48)Oah O0~h h=l

0~f = 1 (4.49)00~H+I

For the parameters f} at the hidden layer nodes, the derivative cm~ be written

(4.50)

TLFeBOOK

Page 105: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 93

The derivative ~gh (~o,/3h) of a sigmoid function n and ridge constructionn (¢) can be rewritten using the chain rule

0 0 00Zh,,,g.(~,/~h) = 0Zh,,~(¢(~,/~)) = 0¢---7~(¢.) 0--~,, ~ (~0,~)

For ~ as the sigmoid function

1n(¢h) = 1 + exp(--¢h) (4.52)

the derivative can be expressed in terms of the function output

0_-a-7-~ ~ (¢h) = ~ (¢u)[1 - a (¢u)] (4.53)

For the linear sum ¢~

I

¢~ = ~/~h,~qo~ + ~h,,+l (4.54)i=1

we have that

0--¢h = ~o~ (4.55)0Z.,~0~¢~ = 1 (4.56)

Using (4.50), (4.51), (4.53) and (4.55) the derivative o/~h.~ can be written as

0--f = o~hgh (~, Dh)[1 -- gh (OR, ~h)] qO, (4.57)

where gh is given by (4.45). For the bias parameters we have, using (4.50),(4.51), (4.53) and (4.56)

0--f= o~hgh (~o,/3h) [1 - gh (~o,/3~)] (4.58)

In a similar way, it is possible to linearize the one-hidden layer sigmoidneural net in the neighborhood of its operating point ~ using the Taylorseries approximation:

"~(~O, ~) : ~1~01 c . ., a t- ai~O! - 3t- ~I+1 (4.59)

TLFeBOOK

Page 106: Advanced Process Identification & Control

94 CHAPTER 4. NON-LINEAR SYSTEMS

where 5~ = b-~,f(~, (~,/3). To do this, the derivatives with respect to thesystem inputs need to be calculated:

ai = a~i a~fl~ an~igu (~,/3h)(4.60)

Again, the chain rule can be applied. The sigmoid and its derivative havealready been given in (4.52)-(4.53). For the linear sum, we have

0~-~¢h =/3h,i (4.61)

Substituting these into (4.60) and evaluating at the point of linearization the linearized parameters are obtained from

H

5i = E ahgh (~O,/3h) [1 --h gh (~,/3h)] ,3h,i (4.62)h=l

For zero error at the operating point ~o, the bias can be taken as

a~+~ = f (~, (~,/3) - E 5i~ (4.63)i----1

Let us collect the results.

Algorithm 17 (One-hidden-layer sigmoid neural network) The out-put of a one-hidden-layer sigmoid neural network with H hidden nodes isgiven by

H

~’= f(~") = ahgh (~O,/3h) + aH (4.64)h=l

where

1gh (~o,/3h) , (4.65)

1 +exp (- E/3h’’~’ -/~’,+1)i- ---1

and ~ is the I dimensional input vector. The parameters of the networkare contained in a H + 1 dimensional vector c~ and H × (I ÷ 1) dimensional

TLFeBOOK

Page 107: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 95

matrix/~. The gradients with respect to the parameters are given by

_~0f

0 .fO0~H+I

0~f

0f

= gh (~o,/3h) (4.66)

----- 1 (4.67)

---- ahgh(tP,~h)[1--gh(CP,~h)]~i (4.68)

= ahgh (¢p,~Oh)[1 -- gh (~O,/3h)] (4.69)

where h = 1, 2, ..., H and i = 1, 2, ..., I. A linearized approximation in theneighborhood of an operating point ~ is given by

f(~, a) = ~1 (~)~1 + ... + 5, (~)~i + ai+~ (4.70)

where the linearized parameters are given by

H

~i(~) = Eahgh(~,~Oh)[1--gh(~,Oh)]~h,, (4.71)

I

~I+l ---~ f(~,O:,f~) -- E~i~i (4.72)

i=1

i = 1, 2, ..., I.

4.2.3 Nearest neighbor methods

There exists a large variety of paradigms using local basis functions. Prob-ably the simplest paradigm is the nearest nei9hbor method. Let us considera pool of data of K input-output measurement pairs: 4) = (~o (1), y (1)),(~o (2), y (2)), ..., (~o (K), y (K)). In order to obtain an estimate put given an input pattern ~o, ~o is compared with all the input patterns~o (1), ..., ~o (K) in the data pool. A pattern in the pool is found that is clos-est (e.g., in the Euclidean sense) to the given input (competition betweenthe patterns). Denote this closest pattern by ~o (c), c E {1, 2, ..., K}. estimate of the output for the given ~o is then ~ = y (c).

Let us choose ~ as the indicator function in (4.4):

1 if k e r (~o)(4.73)g} (~) = { 0 otherwise

TLFeBOOK

Page 108: Advanced Process Identification & Control

96 CHAPTER 4. NON-LINEAR SYSTEMS

where P (~o) -- arg mink=l,2,...,K

]1~o - ~o (k)]]2; and in (4.2):

K

~= ~-~ y (k) g~ (T) (4.74)k=l

the standard basis function network has as many basis functions as thereare data points, H ~- K, the local models are given by the observed data,hh ~- y (k). As can be easily seen, this type of approach belongs to the radialconstructions [86].

In order to cope with noise and storage limits, an estimate can be com-puted based on prototypes representing average local behavior, instead ofdirect observations. In this case, H < K. The problem then is to find aset of suitable centers ~h of the basis functions. There are several ways to

look for centers. The ~3h,iS can be spread on the domain of each i using, e.g.,equidistant intervals, thus forming a grid of points. In the Kohonen networks(learning vector quantization, self-organizing map (SOM) [49], see also [32][80]), the distribution of the basis function centers resembles the probabilitydistribution of the data patterns k = 1, 2, ..., K.

All the above methods find a single winner among the basis functions;there is competition between the basis functions (non-overlapping partitions).One can also consider multiple winners at the same time (overlapping par-titions). For example, in the k-nearest neighbors estimate, the ~ is takento be the average of those A observed y (k)’s that are associated with the inputs ~o (k) closest to the given ~. The k-nearest neighbors method can represented by using (4.2) with:

if k e r (4.75)g~ (~o) = { 0 otherwise

where F (~o) contains the indexes for the A-nearest neighbors of W. Noticethat the nearest neighbors estimators have no parameters to be estimated,and that no a priori assumption on the shape of the function is made. There-fore this type of methods are typical examples of non-parametric regressionmethods [18][25][26]. The A is seen as a smoothing parameter concerningstructure selection. Proper selection of A is important: notice that as A in-creases the bias increases and the variance decreases, and vice versa. Thelinear smoother is given by ~ = Sy, where S is, roughly, a A-banded matrix(provided a suitable ordering). For A = 1, the smoother matrix is given bythe identity matrix, S = I). For A > 1, the equivalent kernel is a rectangularone; note that many other types of kernels can be considered.

When each basis function is associated with a constant output at themodel output space, piecewise constant functions can be constructed. With

TLFeBOOK

Page 109: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 97

overlapping partitioning the resolution of the mapping is enhanced, since themodel outcome is an average of the activated constant values. However, theresult is still a piecewise constant function. Often, this is an undesirableproperty for a model. Consider applications of control, for example: Basedon a piecewise constant type of model, the gain of the system is either zero orundefined! There are many ways to arrive at smoother models. It is commonto use prototypes (instead of direct data points), multiple winners (insteadof a single winner) and Gaussian kernels (instead of rectangular kernels), in the radial basis function networks to be considered in the next subsection.

Radial basis function networks

In radial-basis function (RBF) networks, the input space is partitioned us-ing radial basis functions. The center and width of each basis function isadjustable. Each basis function is associated with an adjustable weight andthe output estimate is produced by computing a weighted sum at the out-put unit. Hence, the output of the network is a linear superposition of theactivities of all the radial functions in the network.

The RBF network is often given in the normalized form:

H

f(~,/~, ~’) = ah~U (~,¢~, 7U) = ~= (4.76)

h=l

The normalization Inakes the total sum of all basis functions ~h unity, inthe whole operating region. Gaussian functions are typically used with RBFnetworks:

(4.77)

The use of the RBF networks requires that a suitable partitioning of the inputspace can be obtained. In simple methods, the locations of the Gaussians aretaken from arbitrary data points or from a uniform lattice in the input space.Alternatively, clustering techniques can be used to choose the centers, suchas the SOM. In the orthogonal least-squares method, locations are selectedfrom data points one by one, maximizing the increment of the explainedvariance of the desired output.

Fig. 4.4 illustrates function approximation with non-overlapping par-titioning of the input space (left), and overlapping partition (right).

TLFeBOOK

Page 110: Advanced Process Identification & Control

98 CHAPTER 4. NON-LINEAR SYSTEMS

1

0.8

~_o.~

0.2

0

o

o

o

0.6 0.7 0.8 0.9

SOM

[0.67]

Normalized Gaussians

Figure 4.4: Examples of identification using local basis functions.

lower pictures illustrate the partitioning found by SOM (left), and normal-ized Gaussians (right) placed at the same centers as well as the associatedvalues of the LS weighting constants. The upper figures depict the data pat-terns (dots) and the mapping obtained at the output of the structure (solidline).

4.2.4 Fuzzy inference systems

Fuzzy modeling [73] [50] stems from advances in logic and cybernetics. Orig-inally, fuzzy systems were developed in the 1960s as an outcome of fuzzyset theory. The fuzzy sets are a mathematical means to represent vague in-formation. In applications to process modeling and control, the, uncertaintyhandling aspects have, however, received less interest. Instead, the focus hasbeen in extending the interpolation capabilities of rule-based expert systems.In what follows, we will focus on fuzzy rule-based systems. For more general

TLFeBOOK

Page 111: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 99

fuzzy systems, please see remarks at the end of this section.In rule-based inference systems, the universe is partitioned using con-

cepts, modeled via sets. Reasoning is then based on expressions of logicalrelationships between the concepts: if-then rules. In expert systems, binary-valued logic is applied; fuzzy systems belong to the class of multi-valued logicsystems.

Expert systems are rule-based systems. The knowledge about the processis represented using rules, such as

if premise then consequent (4.78)

Traditional expert systems use crisp rules based on two-state logic, whereelements either belong or do not belong to a given class. The propositions(premise and consequent) represented by the rule can be either true or false,

In real life, the classes are ill-defined, overlapping, or fuzzy, and a patternmay belong to more than one class. Such nuances can be described with thehelp of fuzzy sets. In a fuzzy context, a pattern may be assigned a degree ofmembership value, which represents its degree of membership in a fuzzy set,~ e [0,1].

Example 28 (Fuzzy and crisp sets) For example, it might be difficultto classify the speed of a car as ’fast’ or ’not fast’ because human reasoningrecognizes different shades of meaning between the two concepts [50]. Fig.4.5 illustrates crisp and fuzzy concepts of ’fast’.

The domain knowledge is expressed as if-then rules, which relate theinput fuzzy sets with the model outcome (if speed is fast then move away).Note, how the use of the adjective ’fast’ to characterize the speed of anapproaching car is entirely sufficient to signal the necessity to move away;the precise velocity of the car at this moment is not important.

The if-then rule structure of fuzzy inference systems is convenient in thatit is easily comprehensible as being close to one of the ways humans storeknowledge. It also provides explanations for the model outcome since it isalways possible to find out the exact rules that were fired, and to convert theseinto semantically meaningful sentences. In this sense, the fuzzy inferencesystems also provide insight and understanding of the considered process,and support ’what if’ type of analysis.

Fuzzy systems in process modeling

From the process modeling point of view, the applications have shown twomain contributions due to the use of fuzzy systems:

TLFeBOOK

Page 112: Advanced Process Identification & Control

100 CHAPTER 4. NON-LINEAR SYSTEMS

Fuzzy and crisp sets ’FAST’

Crisp ’FAST’

60 80 100 120

Figure 4.5: Examples of a crisp set and a fuzzy set describing the ’fast’ speedof a car. A crisp set has sharp boundaries and its membership functionasssumes binary values in {0,1}. A fuzzy set has vague boundaries and itsmembership function takes values in [0,1].

A reduction of the complexity of systems, based on the use of fuzzysets; and

¯ A transparent form of reasoning (similar to the conscious reasoning byhumans).

Neural networks have been shown to be very efficient in their fi~nction ap-proximation capabilities (see e.g. [27][29]), that is in mimicking the observedinput-output behavior. Unfortunately, neural networks appear as black-boxmodels to the developer and end-user. The disadvantage of black-box modelsis that, although they seem to provide the correct functional mappings, theydo not easily give any additional explanation on what this mapping is com-posed of, or make it easier to understand the nature of the relation betweenthe function inputs and outputs. This lack of transparency might lead todifficulties if human intervention or man-machine interaction is required orexpected. This is often the case when models are utilized for optimizationor monitoring purposes. Obviously, some transparency would also help themodel developer to evaluate the validity of the model and to locate unsatis-factory behavior when further model development is needed. The need fortransparency has motivated the use of fuzzy systems.

In process modeling, fairly simple fuzzy models have been applied. Ingeneral, fuzzy modeling can be an efficient way to quickly build a model or

TLFeBOOK

Page 113: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 101

a controller for a process, when only rough information is available. Alsonon-linear systems can be considered without extra effort. In a ’standard’learning approach:

1. fuzzy sets and rules are stated by the experts (plant operators, engi-neers),

2. the system structure is established, and

3. the membership functions and/or output constants are fine-tuned usingdata.

This allows us to build a model of a system based on experimental humanknowledge. Alternatively, one may start from a nominal model, in whichcase the motivation for using the fuzzy approach comes from the easiness ofvalidation and the possibility to tune the system manually.

Sugeno and Mamdani fuzzy models

Fuzzy models can also be seen as based on local basis functions [86]. In afuzzy system, the input partitioning is given by the premises of rules. InMamdani fuzzy models, both the premise and the consequent of a fuzzy ruleare specified using fuzzy sets:

if {(~1 is A~,I) and ... and (~o, is Au,~)} then (~ is (4.79)

where Ah,i and Bh are fuzzy sets specifying the h’th rule and I is the inputdimension. In order to get a crisp output from the fuzzy inference, defuzzi-fication is needed to convert the inferred fuzzy output into a crisp singleton.

In Sugeno fuzzy models, the consequent of a rule is a parameterized func-tion of the input variables. Hence the rules assume the form:

if {(~o~ is Ah,1) and ... and (~o, is A~,I)} then (~= fh (~,’))

Typically, the functions fh are constants (0-order Sugeno model) or linearpolynomials. With Sugeno models, the consequences of multiple rules arecombined by summing, weighting the rules with the normalized activationlevel of each rule. (}-order Sugeno models can be viewed as a special case ofMamdani models, in which each rule’s consequent is specified by a constant(a singleton fuzzy set).

In order to compute the fuzzy rules, the operations on fuzzy sets (is, and)need to be specified, as well as the inference (if premise then consequent).Common choice is to implement the ’~i is Ah,i’ by evaluating a triangular

TLFeBOOK

Page 114: Advanced Process Identification & Control

102 CHAPTER 4. NON-LINEAR SYSTEMS

membership function (or Gaussian, or bell-shaped function), and the ’p andq’ operation as a product (or minimum). The if-then inference (implication)is usually seen as a binary-valued relation, true for the sets contai~.~ed in therule, zero elsewhere.

Fuzzy inference systems

Let us have a brief look at the logic background concerning fuzzy sets andreasoning in fuzzy systems. For more information, see e.g., [40].

A .fuzzy set A of X is expressed by its membership function #A from theuniverse of discourse to the unit interval

,A: x [0,1] (4.81)

#A (~) expresses the extent to which ~ fulfills the category specified by #A,where X is the universe of discourse (domain) of

Fuzzy inference systems consist of five blocks

¯ fuzzification

¯ data base

¯ rule base

¯ decision logic

¯ defuzzification

The .fuzzification block converts the system input ~ --- ~0 E ~ into a fuzzyset A~ on X. Its membership function #A’ (~) is usually defined by the pointfuzzification

1 if ~ = ~0(4.82)#A’ (~) ---- { 0 otherwise

Alternative fuzzifications can be used if information about the uncertaintyof the measurement ~ -- ~0 is available, or if the measurement itself is notcrisp.

The data base contains information about the fuzzy sets #A~ (~)’s (fuzzi-fication), #A, (~)’s and B (y)’s (r ules), and the associated linguistic teA~’s and B’s (rules). The rule base is a set of linguistic statements: rules.The rules assume the form

if (~1 is A1) and ... and (~ is A~) then (~ (4.83)

TLFeBOOK

Page 115: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES ~ 103

where A and B are linguistic terms defined by fuzzy sets in the data base.This can be translated into a simpler form using fuzzy and4

#A (qO)= T (#A, (~1),’",#A, (4.84)

A rule can be seen as a fuzzy implication function I

I(~ZA (¢P),~B (~)) (4.85)

which is often modeled using a t-norm.The decision logic processes the input fuzzy signals using linguistic rules.

Let us derive the modus ponens inference assuming a point fuzzification.

if (~ is A) then (~ is ~ is A’ (4.86)==~ ~" is B’

where

ttB, (~) = sup (T {#A’ (q~), A (~O),B (~’)]}) (4.87)

Since the input is a point qa0 (we assume point fuzzification here), then

1 if qa = ~oo#A’ (~0) ---- { 0 otherwise (4.ss)

and the result can be expressed as

#B’ (if) ---~ T {1,I[pt A (qOo),/z B (if)I} (4.89)

-~ I[/z A (qOo),/zB(ff)] (4.90)

In general, the inference of the h ’th rule, h = 1, 2, ..., H , for input qao canbe expressed as

(~) = [ if #Ah (tPo)=

(4.91)I [~Ah (qO0),~tBh (ff)] otherwise

4Basic operations (intersection =¢. fuzzy and, union =~ fuzzy or) on fuzzy sets canbe defined using t-norms and s-norms. T-norms are monotonic non-decreasing: a <b =:> T(a, c) < T(b, c); commutative: T(a, b) = T(b, a); associative: T(a, T (b, T(T(a,b) ,c); and have 1 as unit element T(!,a) = a. Any t-norm is related to its s-norm (t-conorm) by the deMorgan law S(a, b) = 1- T(1 - a, 1 - b). The conmaonly t-norms include product: T(a, b) abandmini mum: T(a,b) = rai n (a, b ). T he r elats-norms are probabilistic sum: S(a, b) = a + b - ab and maximum: S(a, b) = max (a,

TLFeBOOK

Page 116: Advanced Process Identification & Control

104 - CHAPTER 4. NON-LINEAR SYSTEMS

which in the case of t-norm implications can be further simplified to

#B’~ (9) = T [PAh (~00),IZBh (9)] (4.92)

The combination of all fuzzy inferences is made by means of an s-norm

~, (9) S ~;, (9) ~,=~,~,...,-The defuzzification determines (converts) the fuzzy output of the decision

logic into a crisp output value. A common choice is the center of area method

f~y 9#B’ (9) 49 (4.94)

In the case of 0-order Sugeno fuzzy models, if (~o is Ah) then (~ = Yh), can think of the output sets in (4.92) as given by singleton sets: #Bh (~) 1if ~ = ~h , zero elsewhere. The defuzzification can then be replaced by aweighted average

H

E ~h~. (~0)9= h=, (4.95)

H

h=l

O-order Sugeno fuzzy model

O-order Sugeno fuzzy models are common in many process engineering ap-plications. They represent a simple case of the more general fuzzy inferencesystems. Very often, the following choices are made:

¯ system inputs are crisp,

¯ product is chosen for the fuzzy and, and

¯ weighted sum is chosen for the defuzzification.

With these choices we arrive to the following O-order Sugeno fuzzy model.

Definition 11 (O-order Sugeno fuzzy model) A O-order Sngeno modelwith H rules is a function f from an I dimensional column vector of modelinputs

H

E ~,,g,, (~, ")9= f(~,a, .) = "=~

E g. (’¢, ")h=l

(4.96)

TLFeBOOK

Page 117: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 105

Figure 4.6: Add-one partition of the domain of the i’th input ~ (i 1, 2, ..., I) using triangular fuzzy sets. The centers of the sets are given by

~,p (p = 1, 2, ..., P~). The bold line shows the membership function ~,3 (~).

where

I

gn (~’ ") = H #h,i (~i, (4.97)

where #h,~ (~, ") 6 [0, 1] is the degree of membership of the i’th inputin the premise of the h’th rule.

Let us change slightly the notation. Assume that an add-one partition-ing is used, see Fig. 4.6, where the domain of each input ~i is partitionedseparately such that

~ ~,p~ (~, .)= 1 for all (4.98)pi=l

where P~ is the number of fuzzy sets used for partitioning the domain of thei’th input. Notice that it is usually simple to set the membership functionssuch that an add-one partitioning is obtained. The tilde emphasizes thatdifference in the notation (strong fuzzy partition). The following result canbe derived:

Theorem 1 (Add-one partition) Assume that each input domain i partitioned such that

~ ~,p, (~, .) = (4.99)p~=l

TLFeBOOK

Page 118: Advanced Process Identification & Control

106 CHAPTER 4. NON-LINEAR SYSTEMS

where ~,p~ E [0, 1] are the Pi membership functions used for partitioning thedomain of the i’th input ~ E ~, i = 1, 2, ..., I. In addition, suppose that therule-base is complete and that the product t-norm is used. Then, tlhe sum ofbasis functions in a 0-order Sugeno model is given by

H P1 P~ I

h----1 pl----1 pl=l i-~l

where h = 1, 2,..., H = YIi~l P~ are the H rules. The sum of basis functionsis equal to one

H

~-~gh (~,’) = (4.101)h=l

for all ~i-

A typical add-one partition is obtained using triangular membership func-tions

:~h,i

where ~,p~_l < ~i,~. Observe, how any crisp input ~,i can have non-zerodegrees of membership only in at most two fuzzy sets ~i,p~. Hence, we havethe following simpler result.

Algorithm 18 (0-order add-1 Sugeno fuzzy model) Assume crisp sys-tem inputs, an add-one partition, product t-norm, and weighted averagedefu~gification. Then, a 0-order Sugeno model is given by

H I

= (4.104 h=l i=1

where ~,i (~i, ") is the membersNp f~ction ~sociated with the h’gh rNeand ~he i’th input. Equivalently, we can ~ite

P1 P2 PI

~ ~ ~ ~*’" ~ ~P~,~,’",PI~I,p~ (~l~ ") ~2,~ (~2~ ")’’" ~I,pl (~I~ ") (4.105)

pl~l ~1 pl=l

where ~,p, (~, .) is the membership function ~sociated with tee p~’th setp~titioning the i’th input (p~ = 1, 2,... , ~).

TLFeBOOK

Page 119: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 107

Notice, that at each point in the input space which is a center of sometriangular fuzzy set, i.e. ~i = ~i,p,Vi, the output of the system is given by~pl,p2 ~"" ~PI"

Example 29 (Fuzzy PI controller) A PI controller is given

= Kpe K, (4.106)

and can be rewritten in an incremental form

= K ae (k) (4.107)

where the control applied to the plant is given by u (k) -- u (k - 1) + Au Let us develop a fuzzy PI-type controller. Clearly, the system has two

inputs: the error e (k) and the change-of-error Ae (k), I = 2. For simplicity,let us choose P1 = 5 with linguistic labels negative big (NB), negative small(NS), zero (Z), positive small (PS), positive large (PL) defined by the of triangular add-1 fuzzy sets f~l = [fl1,1," ¯ " , fll,5], and P2 = 3 (negative (N),zero (Z), positive (P) set by f12 = [fl2,1,fl2,~,fl~,a]).

This can be written as a Sugeno model

5 3

pl----11o2=1

(4.108)

where ~1,~, ~1,2,-.. represent the degrees of membership for the propositions(fuzzy predicates)

error is negative bigerror is negative small (4.109)

etc. Similarly, the products ~1,1~2,1,~1,1~2,2,’’" can be interpreted as thetruth values of the propositions

error is negative big and change-of-error is negative

error is negative big and change-of-error is zero (4.110)

etc. The entire rule base can be collected in a table format, showing the

TLFeBOOK

Page 120: Advanced Process Identification & Control

108 CHAPTER 4. NON-LINEAR SYSTEMS

values of Au (k)

NBNSZPSPB

N Z P

(4.111)

Often, linguistic labels are also assigned for the output singletons C%,p=, inorder to further enhance the transparency of the controller.

Next, let us consider the derivatives of a 0-order add-1 Sugeno model.The model is given by

f(~o,.) = ~’~ .. . ~ ~l pl,P2,...~p ’ H ~i ,pi (~ 0,, ") (4. 112)

pl:l p2:l pl-=-I i----1

The derivatives with respect to the parameters ~m,~,... ,~ are simple to cal-culate. It follows

I0f(w,-) = l-I~,,~, (~,’) (4.113)

0~,~,~,... ,~,, i=~

If only ~s are of interest, notice that these parameters appear lineaxly and,e.g., least squares can be used for their estimation. Also the gradients withrespect to parameters ~ can be calculated. However, the tuning of fuzzysets using data is more complicated due to various reasons (especially thetransparency of the model may easily be lost). Therefore, this is omittedhere.

In order to get a linearized approximation of the Sugeno model in theneighborhood of its operating point ~

(4.114)

the derivatives with respect to the inputs need to be computed:

5i =O~°i f(~o, .) = 0~o---7. ~ ~-~ ... ~,,,~,...,~,, H ~ti,p, (~°i’ ")pl=l p2----1 p1=l i=1

(4.115)

TLFeBOOK

Page 121: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 109

Separating terms not depending on ~i and moving the derivation operatorinside the summation gives

pl=l p~=l pI=l j=l;j~i

(4.11~)

where the derivative of the triangular membership function, (4.103), is givenby

otherwise(4.117)

Notice that the gradient (4.117) is a piecewise constant. Thus, the consid-ered Sugeno model can be seen as a piecewise multi-linear system and theinterpolation properties of the system are particularly well defined.

Example 30 (Fuzzy PI controller: continued) Assume that at presenta plant operates under a linear PI controller (or that this has been designedusing, e.g., the Ziegler-Nichols rules), and that this PI controller is to beimproved by designing a fuzzy PI controller. Thus, the parameters Kp andKI of a linear PI controller Au (k) KpAe (k) + Kie (k) ar e a priori known.In order to use the nominal system as a starting point in the design of a fuzzyPI control, the equivalent fuzzy representation is needed.

First, the input space needs to be partitioned. Assume that reasonablebounds Jerkin, emax] and [Aemin, Aema~] can be set. Initially, we can place thecenters of add-1 triangular fuzzy sets, e.g., at equidistant intervals (usingPi =5, P2 = 3):

~1 = [emin, emin + Ze, ernin q- 2Ze, emin + 3ze, (4.118)

~2 = [Aemin, Aemin + ZAe, Aemax] (4.119)

( A’eraax --/kernin )where z~ = (~-~"~")a and zA~ = 2 The C~p~,m remain to bespecified. As the nominal system is given, we can set the ~u, ~,x,"" , ~.~

to correspond to the system output at points (~1,1,~,~, (/~,2,/~2,1),\ / \ /

TLFeBOOK

Page 122: Advanced Process Identification & Control

110 CHAPTER 4. NON-LINEAR SYSTEMS

~2,3) by assigning

Since this type of Sugeno model is (piecewise multi-)linear, the resultingfuzzy PI now produces exactly the same function as the nominal linear PI,as long as the inputs are within the given ranges, i.e., e E [emin, emax] andAe E [Aemin, Aemax] .

Let us conclude this section by making a few remarks.

Remark 4 (Extension principle) The extension principle, or composi-tional rule of inference, is a means for extending any mapping of a fuzzy setfrom one ~pace to another. Let A be a fuzzy set defined in X, and f be amapping from X to Y, f:X -~ Y. Then a mapping of A via f is a fuzzy set

~B (A) defined in Y. The membership function is computed according to:

[#B (A)] (y) sup [#A (x)] (4.121)all xeX for which y=f(x)

assuming that sup 0 = 0 (when no element of X is mapped to y). In theMISO case, we have [#t~ (A)] (y) = sup~ ,,ex, for which y=ff,,)[#A (x)]

#~ (x) =T[#A, (Xl)," "" ,#A~-(X/)].

Example 31 (Extension principle) Figure 4.7 illustrates the extensionprinciple for mapping a fuzzy set A (characterized by #A) through a functionf. The result is a fuzzy set B (characterized by #B).

Example 32 (0-order Sugeno fuzzy system) Figure 4.8 shows an illus-tration of the extension principle for a fuzzy input A’ and a function givenby sampled data points {y} =f{x}. The output is a fuzzy set on a discretedomain Y. Note that the fuzzy input A’ may be a result of fuzzificationof a non-fuzzy input x0. The ’fuzziness’ in A’ together with a defuzzifica-tion method then determines the interpolation/smoothing properties of thesystem.

Example 33 (0-order Sugeno fuzzy system: continued) Figure 4.9shows an illustration of the extension principle for a crisp input x0 and a

TLFeBOOK

Page 123: Advanced Process Identification & Control

4.2. NON-LINEAR BLACK-BOX STRUCTURES 111

Y

Figure 4.7: Mapping a fuzzy set A through a function f.

!

X

Figure 4.8: Mapping a fuzzy input A through a ’function’ given by sampleddata points.

TLFeBOOK

Page 124: Advanced Process Identification & Control

112 CHAPTER 4. NON-LINEAR SYSTEMS

Y

B

! o

ixo

Figure 4.9: Mapping a crisp input through a ’function’ given by fuzzy rules.

function given by ’fuzzy rules’ (constant local models). The outputs arefuzzy singletons on a discrete domain Y. Note that the input is crisp (orpoint fuzzification is used). The fuzziness in rule antecedents, as well asthe defuzzification method, determines the interpolation properties of thesystem.

Remark 5 (Fuzzy neural networks) During the past few years, the closeconnections between fuzzy and neural systems have been recognized (see,e.g., [86][34]). Fuzzy neural networks try to benefit from the advantages ofboth neural and fuzzy approaches. Functional equivalence of some neuraland fuzzy paradigms has been established, and common frameworks, such asthe (generalized) basis function network, have been introduced. The links the ’old’ methods of parameter estimation have become apparent, which hasenabled the application of efficient parameter estimation methods.

Fuzzy neural networks emphasize that the model contents can be pre-sented as linguistic rules or as numerical parameters. The former allowsthe use of human experimental knowledge in initializing model parameters,complementing missing data, and validating the identified model. The latterenables, e.g., the application of efficient optimization methods for parameterestimation. In most fuzzy neural network approaches found in the literature,the learning abilities of neural networks are applied to structures sharing thetransparent logical interpretability of fuzzy systems.

TLFeBOOK

Page 125: Advanced Process Identification & Control

Chapter 5

Non-linear Dynamic Structures

The best approach for describing non-linear dynamic systems is to considerthe a priori physical information about the system to be characterized. Inmany cases, suitable information is not available and the designer needs toturn into semi-empirical or black-box methods. In nonlinear dynamic sys-tems, the output of the system depends, often in a complex way, on the pastoutputs, inputs or internal components of the system. The main problemsin system identification are in structure selection, whereas efficient gr~lient-based or guided random search methods are available for solving the asso-ciated parameter estimation problems, even if the model is not linear withrespect to the parameters.

A direct extension of linear dynamic models is the Volterra series rep-resentation [92]. The Volterra representation is very general. In practice,however, a finite truncation of the series must be used, and a discrete ap-proximation of the series made. For a SISO system, a Volterra model can begiven by

where y (k) and u (k) are the system output and input at discrete instant k. The system parameters are given by w0, w,~, -.., w~l,,~,...,,~P(n,n~ =1,2, ...,N; i = 1,2, ...,P). N and P are the orders of the system,respectively. The order N is related to the length of the time window (zerosof the polynomials), and the order P is related to the non-linearity of themapping. The model output is linear with respect to its parameters, which

113

TLFeBOOK

Page 126: Advanced Process Identification & Control

114 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

makes the parameter estimation simple. Extension into the MISO case isstraightforward. However, there are theoretical and practical drawbacks as-sociated with the Volterra series [92]. In particular, the system ml~y containa large amount of parameters and suffer from the curse of dimensionality.Due to this, practical applications of Volterra series are often limited to firstand second order terms.

The static non-linearity in the Volterra models can be approximated byalternative structures, providing more convenient means for

¯ including a priori knowledge,

¯ handling of incomplete and noisy data sets,

¯ more efficient parameterization,

¯ increasing the transparency of the model,

¯ improved data compression, etc.

In what follows, two types of nonlinear black-box dynamic structures areconsidered:

¯ Non-linear time-series, and

¯ Wiener and Hammerstein models.

In both structures, the non-linear function is a static one. The capability tocharacterize dynamical process behavior is obtained using delayed inputs andexternal feedback (non-linear time-series), or internal feedback using lineardynamic filters (Wiener and Hammerstein models).

5.1 Non-linear time-series models

There are a large number of different black-box approaches for describingnon-linear dynamic systems. In process identification, non-linear dynamicblack-box time-series structures are common. The ability to characterizedynamical process behavior is obtained by using delayed inputs and exter-nal feedback. For most practical purposes in process identification, MISOnon-linear dynamic systems can be described with sufficient accuracy usingthe NAI~X, NOE and NARMAX time-series structures. The structure de-termines the inputs to the model, where only externally recurrent feedbackconnections are allowed. For modelling very complex non-linear dynamicsystems, fully recurrent systems can also be considered.

TLFeBOOK

Page 127: Advanced Process Identification & Control

5.1. NON-LINEAR TIME-SERIES MODELS 115

Denote a non-linear static function by f, a function of some parameters w.In the NOE time-series structure the predictor input consists of past inputsof the process and the past predictions of the process output:

~(k)=f(u(k-d),...,u(k-d-nB),~(k-1),...,~(k-nA),w) (5.2)

In the NARX structure the input consists of past inputs and outputs of theprocess:

~(k)=f(u(k-d),...,u(k-d-nB),y(k- 1),...,y(k--nA),W) (5.3)

In the NARMAX structure the input consists of past inputs and outputs ofthe process, as well as past predictions:

~(k) = f(u(k-d),...,u(k-d-nB), (5.4)

y(k-1),...,y(k-nA),

~’(k- 1) ,...,~(k - nv)

The NARMAX structure is shown in Fig. 5.1. Notice, that the NOE andNARX structures can be seen as special cases of the NARMAX structure.

The structure of the mapping f between the inputs and the output is notdetermined. If no a priori information about the structure of the processis available, it is common to choose some black-box structure: power series,sigmoid neural networks, or 0-order Sugeno fuzzy system (among many oth-ers, see Chapter 4). In practice, process modelling using NOE, NARX andNARMAX structures can give accurate predictions on a fixed data set. Mostimportantly, it is possible to model a wide class of non-linearities. If somenon-linear black-box structure is chosen for the static function f, practicallyall reasonable dynamic functions can be approximated (provided that theinput data windows are long enough, and the size (number of parameters)of the black-box model is sufficiently large). The approach is simple, as itextends the linear dynamic time-series structures to non-linear combinationsof the inputs. If the mapping f is a linear one, ARX, OE and ARMAXstructures result (see Chapter 3).

The main problem with these structures concerns the identification ofthe static non-linear function f. The complexity (degrees of freedom) the mapping depends on the structure chosen for the non-linear (parame-terized) function. In nonlinear black-box structures, the degree of freedomis usually large since the restriction of linearity of the mapping is removed.Technically, it is simple to apply some gradient-based optimization methodwith the NARX structure; with NOE and NARMAX the need to take intoaccount the dynamics when computing the gradients increases slightly the

TLFeBOOK

Page 128: Advanced Process Identification & Control

116 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

~,(~)

y(k-1)

~ .(~-d)

~ ~(~-d-~)

~ y(k-2)

::

~(~)

Figure 5.1: NARMAX time-series predictor.

TLFeBOOK

Page 129: Advanced Process Identification & Control

5.1. NON-LINEAR TIME-SERIES MODELS 117

need of computations. However, too many degrees of freedom in f makethe parameters w sensitive to noise in data, and poor interpolation can beexpected if the data set does not contain enough information (covering thewhole operating range and all dynamic situations of interest). In general, theextrapolation properties of non-linear time-series models are always poor.

These problems can be tackled in parameter estimation by using opti-mization under constraints, where constraints can be posed on the structure(regularization), based on a priori known properties of the process, or, e.g.,deviation from a nominal model [63]. Alternatively, the degrees of freedomin the mapping can be reduced.

Let us next consider the gradients of the general nonlinear black-box time-series models. These are required by gradient-based parameter estimationtechniques (Chapter 6).

5.1.1 Gradients of non-linear time-series models

For simplicity of notation, let us restrict to SISO systems (extending toMISO is straightforward.) Consider a non-linear time-series NARMAX pre-dictor (5.4), see also Fig. 5.1. Let us calculate the gradient O~/Ow~ of thesystem output ~ with respect to its parameters w~, w = [wl, ...,w~, ...wj]T,

j = 1.2,..., J.For simplicity of notation, denote f(u (k - d), y (k - 1), ~ (k - 1),

f(k, w). Let us linearize the function f (5.4) around an operating point ~u, y, ~, W~. Using Taylor series, wehavethat

f(k,w) 7(k, w)d)

÷ ~x=~

+c(k,w)

where c is a constant (c (k, w) = [f (k, w)]~,=~) and the tilded variables the deviation from the point of linearization (u -- ~ + ~, etc.). The notation[’]x=~ indicates that the expression is evaluated at the point of linearization.

TLFeBOOK

Page 130: Advanced Process Identification & Control

118 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

We then ha~e that

0~

+ ~ N(k’w) x=~0

+3-~7~ c (k,w)

(5.7)

since the ~ and ~ do not depend on wj, j = 1, 2, ..., J, whereas f and ~ do.For the third term on the right hand side we have that

(5.8)

Substituting (5.8) to (5.7) and reorganizing gives

(5.9)

(5.10)

since ~ (k- 1, w) =~.2~ yO A (k- 1, w). Thus, the gradient is composed two terms: the static gradient (first term on the right) and the dynamic effectof the gradient (second term).

Let us summarize the results by writing the above in a more convenientform.

TLFeBOOK

Page 131: Advanced Process Identification & Control

5.1. NON-LINEAR TIME-SERIES MODELS 119

Algorithm 19 (Gradients of NARMAX predictor) The gradients fora NARMAX time-series model

~(k) = f(u(k-d),...,u(k-d-nB),

y(k-1),...,y(k-nA), (5.11)

~(k- 1),...,~(k- nc),W)

with parameters w = [wl, ..., wj, ...wj] T are obtained from

noof (k, w) + ~ ¢~(k_m) (k,w) % (k (5.1~.)

where ~j denotes the ~adient of the model output with respect to its pmr amet ers:

Ow~ (k) ~ V~ (k, w) (5.13)

~ (k, w) (j = 1, 2, J) are the static ~ients of the non-line~ function

with r~pect to its parameters. The second term giv~ the dyna~c effect ofthe feedback in the network to the ~ients, a correction by the line~izedgain:

o~(a, ~) (~.~a)

Ex~ple 34 (ARMAX structure) Let ~ ill~trate the above ~ing simple line~ dynamic system

~(k) = a~(k) +~(~- 1) + ~(~- (5.1~)

The system h~ t~ p~ameters, w = [a, b, c] T, the ~nction f is linear. The~adients of the static (line~) function ~re given by:

[y(k)Of (k,w)= ~(~- Ow~ ~(k - 1,w)

(5.16)

of(k,w) =c (5.17)O~(k- 1)

TLFeBOOK

Page 132: Advanced Process Identification & Control

120 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

linearz

11

nonlineardynamic ~ static

part part

Y

Figure 5.2: A Wiener system. The system input u is put through a linearfilter, and a nonlinear mapping of the intermediate signal z gives the systemoutput y.

nonlinear zstaticpart

lineardynamic

part

Y

Figure 5.3: A Hammerstein system. A nonlinear mapping of the input signalu gives the intermediate signal z. The system output y is the output of alinear filter.

The gradient of the system output with respect to its parameters is given by

v(k)= = u(k- 1)if(k- 1,w)

+ c~, (k - 1) (5.18)

Notice that although f is linear, the system output is not linear since thegradients depend also on past data. A similar result was derived in section3.3.8.

5.2 Linear dynamics and static non-linearities

In many cases, dynamics of the non-linear process can be approximated usinglinear transfer functions for describing the system dynamics. Wiener andHammerstein structures are typical examples of such structures. A restrictedclass of Wiener and Hammerstein systems will be considered next.

Wiener and Hammerstein structures consist of a linear dynamic part anda non-linear static part. In a Wiener structure (see Fig. 5.2), the lineardynamic part is followed by the non-linear part. In a Hammerstein structure(see Fig. 5.3), the non-linear part precedes the linear dynamic part.

TLFeBOOK

Page 133: Advanced Process Identification & Control

5.2. LINEAR DYNAMICS AND STATIC NON-LINEARITIE, S

5.2.1 Wiener systems

Assume a SISO Wiener system given by

121

(5.19)

where

z(k) = g(q-~) ~(k-d) (5.20)

y (k) is the output of the Wiener system, f is a non-linear static SISO function,z (k) is an intermediate variable, u (k) is the input to the system and the time delay. A (q-Z) and B (q-l) are polynomials in the backward shiftoperator q-1 :

A (q-’) = 1 + a~q-~ + ... + a~q-~ (5.21)

B (q-~) = bo + b~q-~ + ... + bn, q-" ~ (5.22)

Obviously, a predictor for the above deterministic system is given by

~(~) = ~(~(k)) (5.~3)

where A (q-a) ~(k) = B (q-~) u (k - d). But this is also the predictor O~system. This leads to consider the following stoch~tic proc~s:

~ (~1 = f [~(q-~)~ A (q-~)~ (~ - ~1 + ~ (~1] (~.~41

Let us rewrite the noise term:

where the prediction error now appears at the output of the Wiener system:

Although this may seem nice, the tradition from e (k) to e~ (k) is critical.~om a statistical poim of view, in line~ syste~ the prope~i~ of {e (k)}convey to ( ey (k) } (e.g., if e (k) h~ Gaussian distribution, then ey (~) Ga~sian, too). For nonlinear system, this is not the c~e, and even ~ an

TLFeBOOK

Page 134: Advanced Process Identification & Control

122 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

approximation it is valid only locally around an operating point providedthat the function f is smooth enoughI .

Let us consider the following stochastic Wiener system

f By(k)= (~(--~u(k-d)) (5.29)

where {e (k)} is a sequence of independent random variables with zero meanand finite variance. The predictor for such a system is given by

~(k) = f(~-~(q_l) u(q-’) (k - d)) (5.30)

and minimizes the expectation of the squared prediction error~. In general,the non-linear system is a function of some parameters w. Hence we havethe expression for a SISO Wiener predictor:

Bw)~(k)=f(~u(k-d), (5.31)

It is straightforward to extend these results for MISO systems with multi-ple linear dynamic systems (one for each input). Let us first define a Wienersystem.

1Under some conditions related to the non-linear mapping f and its inverse f-1 (con-tinuity, differentiability, etc.), the density function of the output of the nonlinear systemcan be expressed as a function of the density function of the input (see [39], p. 34, see also[711).

~Let us find

~(k)=argmj~nE {[y(k)- 2}

:

+E {ewhere the second term is zero (due to the independence of e (k) with respect ~o ~ (k ~d ~ and that E {e (k)} = 0). If the ~iance is finite, the criterion is mi~mi~ed (5.30).

Substituting (5.29) we have that

=

TLFeBOOK

Page 135: Advanced Process Identification & Control

5.2. LINEAR DYNAMICS AND STATIC NON-LINEAPdTIES 123

Definition 12 (Wiener system) Define a MISO Wiener system

y(k) = f(z (k)) (5.32)

where z (k) = [zl (k), z2 (k), ..., z~ (k), ..., T are given by

B~(q-1) (k d~)(5.33)= A (q-x)

y (k) is the output of the Wiener system, f is a non-linear static MISO func-tion, z~ (k) are intermediate variables, u~ (k) are the inputs to the system d~ are the time delays. A~ (q-~) and B~ (q-~) are polynomials in the backwardshift operator q-1 :

Ai(q -x) = 1 +ai,~q -1 +... +a,,,a~q-~a’ (5.34)

Bi (q-X) ___ bi,o+biAq-1 +... +bi,n,,q-,,~ (5.35)

and i = 1, 2, ..., I, where I is the number of inputs to the system.

Note, that a general Wiener system may have a single MIMO linear dynamicpart. Here we restrict to the case of multiple SISO linear dynamic parts.The MISO predictor can be derived in a way similar to the SISO ease.

Algorithm 20 (Wiener predictor) A predictor for a MISO Wiener sys-tem is given by

~(k) = f(~(k) (5.36)

where

Biz"~ (k) = A~ (q-a) u~ (k (5.37)

~(k) is the predicted output of the Wiener system, f is the non-linear staticMISO function of parameters w, u~ (k) and ~ (k) are the input to the systemand intermediate variables, respectively, and di are the time delays. Ai (q-X)and B~ (q-a) are polynomials in the backward shift operator q-1

Ai (q-X) = 1 + a,,lq -x + ... + ai,,~a,q-" a, (5.38)

Bi (q-~) = b,,o + b~,~q-x + ... + bi,,,,q-"’, (5.39)

and i = 1, 2, ..., I, where I is the number of inputs to the system.

Let us next consider Hammerstein systems.

TLFeBOOK

Page 136: Advanced Process Identification & Control

124 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

5.2.2 Hammerstein systems

Definition 13 (Hammerstein system) Define a MISO Hammerstein sys-tem by

y(k) --=B(q-1) (u(k)w)~-e(k) (5.40)A(q-1)f ,

where u(k)= [u1(k),u2(k),...,ui(k),...,u~(k)] T, i= 1,2,...,I where I isthe number of inputs to the system, y (k) is the output of the HammersteinSystem, and f is a non-linear static MISO function of parameters w. A (q-l)and B (q-l) are polynomials in the backward shift operator q-1

A (q-l) = 1 alq-1 .- b .. . T a,~Aq-’~’~ (5.41)B (q-~) -= boq-d + blq-’-d + ... + b,,Bq-’ *"-d (5,42)

d is the time delay.

Since the multiple-input non-linearity appears at the input and the systemcontains just one linear dynamic filter, the prediction is simple to derive.

Algorithm 21 (Hammerstein predictor) A predictor for a MISO Ham-merstein system is given by

B(q-~)’" w)(5.43)~(k) = ~(-~(u(k),

where u(k) = [u~ (k) , u2 (k) , ..., u~ (k) , ..., u, T, i = 1,2,...,I wher e I isthe number of inputs to the system. ~(k) is the predicted output of theHammerstein system, f is a non-linear static MISO function of parametersw. A (q-~) and B (q-l) are polynomials in the backward shift operator q-~

A (q-~) = 1 + a~q-’ + ... + a,~,~q-’*’* (5.44)

B (q-~) = boq-~ + blq-~-~ + ... + ~,,,q-"’-" (~.45)

d is the time delay.

In the SISO case, the input u (k) is a scalar u (k).

TLFeBOOK

Page 137: Advanced Process Identification & Control

5.3. LINEAR DYNAMICS AND STEADY-STATE MODELS 125

5.3 Linear dynamics and steady-state models

The Wiener and Hammerstein systems consist of two parts: the linear dy-namic (transfer function) and the static (non-linear) part. In the practice industrial process engineering, the steady-state characteristics of a processare of main interest, and the dynamic behavior is often poorly known. Infact, often only the control engineers seem to be interested in the modelingof the dynamics of a process, while the system designers and production en-gineers largely ignore the dynamics. In order to provide models that bothparties can understand and in order to employ already existing (steady-state)models of the process -among other reasons (such as increased simplicity inidentification and control design)- it is reasonable to consider the case wherethe non-linear static part in Wiener and Hammerstein models represents thesteady-state behavior of a process.

Hence, we assume that the static (non-linear) function is given by thesteady-state function of the process

y.~8 = f (u.~8) (5.46)

where the subscript ss denotes steady-state.The Wiener system is given by

y(k) = ~ (,. (k) ,w) (5.47)

where z(k) = [zl (k),z2(k),...,zi(k),...,zi(k)] T are given by

Bi(q-1)z, (k) = Ai (q-l) u~ (k (5.48)

In order to preserve the steady-state function, the steady-state gain ofthe linear dynamic part has to be equal to one,

B,(z)lim- --- 1 (5.49)z-~l A~ (z)

i.e., in steady-state

(5.50)

for all i = 1,2, ..., I.Similarly, for the Hammerstein system:

y(k)= B(q-~)f~.~(z(k) w)+e(k)A(q-~) , (5.51)

TLFeBOOK

Page 138: Advanced Process Identification & Control

126 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

we must have

B(z)lim ~ = 1 (5.52)¯ -~ A (z)

in order to preserve the steady-state function

y~s = f.~ (u~,w) (5.53)

for all i = 1, 2, ..., I.

5.3.1 Transfer function with unit steady-state gain

There are several ways to fulfill the requirements (5.49) and (5.52). Let consider the following constraint on the coefficient of the transfer function,where a substitution for b* fulfills the requirement.

Algorithm 22 (TF with unit steady-state gain) Let the transfer poly-nomial, nB _> 0 , be given by

B*(q-1)

A(q-1)¯ .. b* ~--~Bbo + blq -I + + bnB-lq -(n~-l) + nB~

1 + alq-~ + ... + an, q-n~

A unit steady-state gain is ensured by

(5.54)

nA nB - 1

b*ns = 1 + Z an- ~_~ bn (5.55)n=l n=O

Proof.letting z -~ 1 gives

Substituting (5.55) to the z-transform equivalent of (5.54)

B* (z) b0 A- bl + A- b* na ~na-1 bnlim-- = ... n.-1 + 1 + En=l an - A~n=0 :---- 1 (5.56)z-.~ A (z) 1 + a~ +... +

which shows that the steady-state gain of the dynamic part is one;, as desired.

5.3.2 Wiener and Hammerstein predictors

Combining the result in section 5.3.1 with the steady-state Wiener structure,we get the following predictor.

TLFeBOOK

Page 139: Advanced Process Identification & Control

5.3. LINEAR DYNAMICS AND STEADY-STATE MODELS 127

Algorithm 23 (Wiener predictor: continued) A predictor for a MISOWiener system with a steady-state non-linear function is given by

where

~(k)--fss(~(k),w) (5.57)

~i (k) -~ B~ (q-l) (]g d~) (5.58)Ai(q-t)us

~(k) is the predicted output of the Wiener system, fs~ is the non-linear staticsteady-state MISO function of parameters w, ui (k) and ~ (k) are the inputto the system and intermediate variables, respectively, and di are the timedelays. Ai (q-l) and B~ (q-l) are polynomials in the backward shift operatorq-t :

A~ (q-l) = l + a~,tq-~ + ... + a~,~.~q-na~ (5.59)

B; (q-t) = bi,o + bi,lq -1 q-... q- bi,ns,-lq -(nsi-1) q- bi*,nthq-nB’ (5.60)

nBi --1

b~*,n" = l+Ea’,~- E bi,,~n=l

where

and i = 1, 2, ..., I, where I is the number of inputs to the system.

For the Hammerstein system we get a similar result.

Algorithm 24 (Hammerstein predictor: continued) A predictor for MISO Hammerstein system with a steady-state non-linear function is givenby

B* (q-t) fs~ (u (k) (5.62)if(k)= A(q_t) ,

where ~(k) is the predicted output of the Hammerstein system, and fss the non-linear static steady-state MISO function of parameters w. u (k) Jut (k), ..., ui (k), ..., u~ T, i = 1,2,... , I , where I is t henumber of inputsto the system. A (q-~) and B* (q-t) are polynonfials in the backward shiftoperator q-t :

A (q-l) = 1 al q-1 q- ... q- anaq-n’~ (5.63)

B* (q-~) = boq-d + b~q-~-’~ + ... (5.64)

b* fl-nB -d+bn,_tq-(’~-~)-’~ +

TLFeBOOK

Page 140: Advanced Process Identification & Control

128 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

where

and d is the time delay.

nA riB--1

b’n. = 1 + E a,~- E b. (5.65)n=l n=O

5.3.3 Gradients of the Wiener and Hammerstein pre-dictors

If the parameters of the Wiener or Hammerstein model are unknown, theyneed to be estimated. Often, the most convenient way is to estimate theparameters from input~)utput data observed from the process. If gradient-based techniques are used, the gradients with respect to the parameters needto be computed. Assume that all parameters are unknown:

¯ parameters of the static (steady-state) mapping, w; and

¯ parameters of the linear transfer function(s), i.e., coefficients of thepolynomials A and B, as well as delay(s)

An estimate for the delay(s), is usually obtained by simply looking at theprocess behavior (step response, etc.), whereas parameters w, A and B areestimated by minimizing the prediction error. Sometimes the estimationof d may require several iterative rounds, where a value for d is suggested,parameters in A, B and w are estimated, and if the model is unsatisfactory,new value(s) for d are suggested.

Let us compute the gradients for parameters in A, B, and w in theWiener predictor (23). For the static part, denote gradient with respect parameters by

0F %(k) (5.66)Owj

and the gradients with respect to inputs by

where w = [~/)1, ..., toj,..., ’~Oj]T and J is the number of parameters in the non-linear part. I is the number of inputs to the system, i = 1, 2, ..., I. Theseparameters depend on the structure chosen for the static part.

TLFeBOOK

Page 141: Advanced Process Identification & Control

5.3. LINEAR DYNAMICS AND STEADY-STATE MODELS 129

The chain rule can be applied for calculating the gradients with respectto parameters of the dynamic part:

and

0~

0~0b~,--: (~) = ¢’ (~) ~ (~) (5.~9)

Hence, in order to compute the gradients, only the gradients

0~ ors0b,,--: (k) ~d ~ (~) (~.~0)

~e f~ther needed, where n = 1, 2, ..., hA, and n = 0, 1, ..., n~, - 1, r~pec-tively, i = 1, 2,..., I.

For simplicity, omit the input index i for a moment. The output of theline~ part can be ~itten ~

* --nB--d~(~) = [60q-~+...+6~._~q-(~--1)-~+6~.q ]~(e)- [(a~ + ~q-~ + ... + ~,~q-(~-l))] ~(~ ~)

nA ~nB -- 1where b* = 1 + ~=~ a~ - b~. It is now simple to compute thenB ~n~O

derivatives with r~pect to the parameters:

Oa-~(k)=u(k-n,-d)-~ am (k-m)-~(k-n) (5.73)

where n = 0, 1, ..., nB -- 1 and n = 1, 2, ..., hA, respectively. Assuming that theparameters nA change slowly, past gradients of o~ o~

~ and~ can be stored andthe equations computed recursively, thus avoiding excessive computations.

Let us collect the previous results in what follows.

Algorithm 25 (Gradients of the Wiener predictor) The gradients the Wiener predictor with steady-state non-linear part are given by

-- (k) ~- ~ (k) (5.Va)Owj

TLFeBOOK

Page 142: Advanced Process Identification & Control

130 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

where

(k) ¢,(k)~_---(~) (5.75)Oa,,----’-~ ua~,,~

09(k) = (Ih (k) _~-7-- (5.76)

Ob~,--’~

0~~ (k) ~ (I)i (5.77)

Ob~,----~ (k) = u~ (k - n di) - ui(k - nB~- d~) - ~, I,~,,,,-O-KZ~ (~ -

Jrn-=l L

(5.78)

Oai,--~OK

,~A, ra o~i

(k)=ui(k-n.,-di)-E. ~,m-a’--- (k-m) -~(k-n) m=l L Oai,n

where i = 1, 2, ..., I (system inputs), j = 1, 2, ..., J (parameters of the staticsteady-state part) and n = 0, 1, ...,nB~- 1 and n = 1,2, ..., nA~, respectively(orders of the polynomials associated with each input).

Owing to the recursion in the computation of the gradients, the polyno-mial A~ needs to be stable. When instability is encountered, the parametersneed to be projected toward a stable region. A simple method consists ofmultiplying A~ with a constant 3’, 0 << 3’ < 1:

o

A~ = 1 + ~fai,lq -1 + ... + "7~A’ai,,~A,q-~A~ (5.80)

until all roots of the polynomial lay inside the unit circle.Let us next give the gradients for the Hammerstein predictor. The pre-

dictor is given by

B* (q-’) r~ (u (k) w)(5.81)9(k) = A (q-l) ’

in which the linear dynamic subsystem at the output can be written out

9(~) = [~o~-~ + ~,~-,-~ + ... + ~_,~-¢~-’/-~]

n=l n=O

-- [(a I na a2q-1 nt- ...-b anAq--(na--1))] 9(k -- 1)

TLFeBOOK

Page 143: Advanced Process Identification & Control

5.3. LINEAR DYNAMICS AND STEADY-STATE MODELS 131

where the output of the static subsystem is given by

~(k) = r,~ (~, (~) (5.83)

It is simple to calculate the derivatives with respect to the parameters:

a~n~ L ~

]~ la~ (~- ~) (~.84)

m=l

cga--(k)=’d(k-nB-d)-E am (k-m)-~(k-n) (5.85)

where n = 0, 1, ..., nB -- 1 and n = 1, 2, ..., hA, respectively.The gradient of the output of the Hammerstein predictor with respect to

the parameters of the static part, denoted by

o~0w-~. (~) ~- ~j (~) (5.86)

(j = 1, 2, ..., J) is still required. Denote the gradient of the output of thestatic part with respect to its parameters by

0~o~--~. (~) ~- % (k) (5.s7)

where w = [w~,..., w~,..., wj]T contMns the parameters of the static mapping.The ~adient of the linear subsystem (5.82) is given

o~o~ (~) = (~°~-~ + ~(’-~ + ’ + ~-~(~-~)-~) (~

+ 1 + Ea"- E b.{ q- "- (k) (5.88)~=o ]

_[(~,+~q-l+ +~q-(~-l))] o~ (~_~)

which is more conveniently expr~sed ~

n=l n=0~A

- ~ ~s~ (~ - n)

TLFeBOOK

Page 144: Advanced Process Identification & Control

132 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

Assuming that the parameters change slowly, past gradients Ej and koj canbe stored and the equations computed recursively, thus avoiding excessivecomputations.

Let us collect the results for the Hammerstein system.

Algorithm 26 (Gradients of the Hammerstein predictor) The gra-dients of the Hammerstein predictor with steady-state non-line~c part aregiven by

,~A [ O~ (k m)](5.90)Oh--: (~) = ~(t: - ~ .....

/=1

Oa---~(k)=~(k-nB-d)-Z am (k-m)rn: l

(5.91)

where

=_j(a)

Owj (k) ~-- .=.~ (k) (5.92)

(5.93)

0~"-- (k) ~- ~ (k) (5.94)Owy

where j = 1, 2, ..., J (parameters of the static steady-state part) and n 0, 1, ...,n~ - 1 and n -- 1,2, ...,hA, respectively (orders of the polynomialsassociated with the output).

5.4 Remarks

Let us conclude this chapter by making a few remarks on the practical useof Wiener and Hammerstein systems. For discussion on the Wiener, Ham-merstein, and related structures see, e.g., [72].

TLFeBOOK

Page 145: Advanced Process Identification & Control

5.4. REMARKS 133

5.4.1 Inverse of Hammerstein and Wiener systems

Wiener and Hammerstein models are counterparts: the inverse of a Ham-merstein structure is a Wiener structure, and vice versa. This is importantin applications to control.

To show this, let us assume that a system is described by a Hammersteinmodel

~B (q-l) f(u (k))(5.95)y(k) =q- ~(q-1)

d is the time delay of the dynamic system, thus we can require that bo -fi 0.The linear dynamic part is given by

dB(q-~)y(k)=q- ~(q_l)Z(k) (5.96)

which can be written out as a difference equation:

y (k) = -a~y (k - 1) - ... - a~ (~ (5.97)

+boz (k - d) + ... + b,~,z (k - d - riB)

Its inverse is given by

dA(q-’) .z(k)=q (k) (5.9s)

Writing out, we have

1 a1

~z(k) -=- -~oY(k+d)+~oY(k+d-1)+...+ y(k+d-nA)

b~ (k - 1) - bnB z

-b-~Z...- ~ (k - 1 - he) (5.99)

The non-linear static part is given by

z (~) = r(u(~))

Let us assume that the inverse(s) of the non-linear static part exists

~ (k) = ~ (z (k))

Then, the inverse(s) of a Hammerstein system can be expressed

ui (k) = ~ (z (k)) -(q-~)(q_l) y (k +d)

(5.1oo)

(5.101)

(5.102)

which is a Wiener system with input sequence (y (k + d)} filtered by B andscaled through ~1, to produce an output sequence (u~ (k)}.

TLFeBOOK

Page 146: Advanced Process Identification & Control

134 CHAPTER 5. NON-LINEAR DYNAMIC STRUCTURES

Example 35 (Hammerstein inverse) Let a SISO system be given by Hammerstein model

y(k) 0.7z(k-1)+0.3y(k-1) (5.103)z(k) = 2 (k) (5.104)

with u E ~+ (positive real). Thus

A(q-1) = 1-0.3q-~;B(q -:~) =0.7;d= 1

The inverse is given by

z(~)~(k)

(5.105)

0.31 (k÷ 1) -~y(k) (5.106)-- 0--2y -

= v~ (5.i07)which has the structure of a Wiener system. Notice, that for u E N theinverse does not exist (not unique).

5.4.2 ARX dynamics

So far, only OE-type of dynamics have been considered. Let us next considerbriefly systems with ARX dynamics. An ARX-Hammerstein predictor isgiven by

~(k) = -A~ (q-i)y(k)+ B (q-i)z(k) (5.108)

where A = 1 + q-~A1, and z (k) =f(u (k)). Since {y} is known (y is able), a one-step ahead ARX-Hammerstein predictor can be directly imple-mented. In general, AP~-type of dynamics can not be recommended for theidentification of processes where measurements are strongly contaminated bynoise.

The gradients are simple to calculate

where ~j (k) = o_~_~

TLFeBOOK

Page 147: Advanced Process Identification & Control

5.4. REMARKS 135

For Wiener systems the intermediate variable z (k) is not available. Thusan ARX implementation is not straightforward. However, if the inverses ofthe non-linear part exist and are known, we can obtain the intermediatevariables using the inverses

z~ (k) = ~’ (y (k), (5.112)

Then the gradients of the system can be given as

0~--~. (~) = % (~) (5.113)

09 (k) = ,~(k)u(k-n) (5.114)Ob~,,~

~(k) = ~,(k)z,(~-n) (5.115)Oai,n

where ~ = ~. The identification of Wiener systems with ARX type ofdynamics requires the identification of the inverse(s)

The derivation of the corresponding equations for a system with unitsteady-state gain linear dynamics is straightforward and omitted here.

TLFeBOOK

Page 148: Advanced Process Identification & Control

Chapter 6

Estimation of Parameters

This chapter considers parameter estimation techniques. These techniquesare essential in system identification, as they provide the means for deter-mining (off-line) or adjusting (on-line) the parameters of a chosen modelstructure, using sampled data (measurements). The least squares methodcan be applied when the system output is linear with respect to its parame-ters. This is true for linear static mappings (linear regression models), as wellas for some linear dynamic structures (such as ARX structures), and somenon-linear systems (e.g., power series and multi-linear systems). However,usually the least squares method can not be directly applied in non-linear ordynamic systems, since these types of systems are, in general, non-linear withrespect to their parameters. In the previous chapters we have pointed outstructures such as the OE (Chapter 3), the sigmoid neural networks (Chapter4), or the Wiener structure (Chapter 5), with which the least squares methodcan not be directly applied.

In this chapter, the least squares method will be extended to the case ofnon-linear systems. The parameter estimation problem is seen as an opti-mization problem, minimizing a cost function consisting of a sum of squaredprediction errors.

In general, the non-linear parameter estimation techniques are iterative:A fixed set of data is used repeatedly; at each iteration the parameters areadjusted towards a smaller cost. This is because usually the criterion willnot be quadratic in the parameters, as in the least squares method, andtherefore an analytical solution is not available. Note the difference betweenrecursive algorithms, such as recursive least squares (RLS). In RLS, newdata patterns are added one-by-one, possibly forgetting the older ones (e.g.,exponential forgetting), but not used repeatedly. The methods discussed inthis chapter are batch methods, such as the least squares method, where afixed data set is used. Also recursive forms can be derived. Unfortunately,

137

TLFeBOOK

Page 149: Advanced Process Identification & Control

138 CHAPTER 6. ESTIMATION OF PARAMETERS

on-line identification using non-linear models is less robust, due to severalreasons, and can not be recommended.

In practice, gradient-based methods are dominating. Their main draw-back is that they can get stuck with the local minima in the cost function.However, these methods have shown to be efficient in practice. For complexsystems, suitable methods can be found, e.g., from random.search and prob-abilistics. Unfortunately, the practical implementations are often inefficient.

First, a general overview to prediction error methods is given, and thealgorithm for the Levenberg-Marquardt method is given. This is followed bythe presentation of the Lagrange multipliers approach for the case of opti-mization under constraints. The guided random search methods are consid-ered in the next section, with special emphasis on the learning automata. Inorder to illustrate the feasibility and performance of the methods presentedin this chapter, a number of simulation examples on process identification ispresented at the end of this chapter.

6.1 Prediction error methods

The family of methods that minimize the error between the predicted and theobserved values of the output are called prediction error methods1. Manyother parameter estimation techniques can be regarded as special cases ofthese methods.

Iterative prediction error minimization methods are based on the follow-ing principle [87]: given a function J(0) to be minimized

K 1J(t9) = ~-~ -~ (y (k) - ~(k))2 (6.1)

and an initial state for ~(0), find a minimization direction u (l) and step ~ (1), and update the parameters

~ (l + 1) = ~ (1) + y (l) (6.2).

The prediction ~(k) -- f(O, k) is a function of the parameter vector [21, . . " .~p]T, which is computed using the current parameter estimates: ~ =

~ (/); y (k) is the corresponding target observation (measurement.); K number of data patterns in the training set.

1 Principle of least squares prediction error [3]: Postulate a model that gives the pre-

diction in terms of past data and parameters. Given the observations of past data, adjustthe parameters in such a way that the sum of the squares of the prediction errors is assmall as possible.

TLFeBOOK

Page 150: Advanced Process Identification & Control

6.1. PREDICTION ERROR METHODS 139

The task of the minimization is to find optimal values for the directionand the step size, when only local information of the function is available.Repeated application of (6.2), each time with the optimal direction and stepsize, will bring J(0) to a minimum. As a result of the search, a parameterestimate is obtained which minimizes the cost function

0 ---- argm~n J (0) (6.3)

Note, however, that the globality of the minimum can not be guaranteed.The optimization techniques are not necessarily restricted to cost functionsof the specific form given by (6.1), as long as the derivatives are computedaccordingly.

6.1.1 First-order methods

Let us now focus on finding the minimization direction. Let 0 be somefixed parameter vector. The cost function (6.1) can be written as a Taylorexpansion around 0:

P

[ ] 0~0p.+... (6.4)p=l 0=~ p=l;p.=l

where ~ is the deviation from ~, 0 = ~ + ~. In the first order methods, onlythe first non-constant term of the Taylor expansion is used:

p1j (o) +

" (6.5)p=l

The derivative of J(0), approximated by (6.5), is given

A na~al ~ed point 0 is the c~rent parameter ~timate 0 (1). To com-pute the new ~gimate, the ~nimi~ation direction is given by the negative~adient. The leaning rule then becom~

The ste~ size ~ (l) is often re~laced by a ~ed co,rant, due to lower com~u-ga~ionM cos~. These typ~ of methods ~e o~en Nso referred to ~ st~p~tdescent, ~adient d~cent, le~t mean squ~es, or error b~kpropagation gech-~qu~.

TLFeBOOK

Page 151: Advanced Process Identification & Control

140 CHAPTER 6. ESTIMATION OF PARAMETERS

6.1.2 Second-order methods

If the second non-constant term from the Taylor expansion is also considered,the cost function becomes

[0 J .] o o% (6.8) roJ1 ap=l p=l;p*----1

and can be written as

a (8) = J ~ - bT~ + ~THT~

where

(6.9)

oa](6.10)b=- ~--~ a=a

and the elements of the Hessian H are given by

(6.11)

Minimum of (6.8) is found by setting the derivative to zero, and is locatedat

H8 - b = 0 (6.12)

from where the optimal 8 can be obtained. It is given by

0 = H-lb (6.13)

Thus, the optimization reduces to matrix inversion.Unfortunately, the calculation of the Hessian H is computationally pro-

hibitive in practice (analytical solutions are rare (see, e.g., [10]) and ap-proximation methods must be applied. Alternatives include quasi-Newtonmethods such as BFGS and DFP, or the conjugate gradient methods (see,e.g., [71187][84])).

Levenberg-Marquardt method

A commonly used second-order method is the Levenberg-Marquardt method(see, e.g., [7]).

Define a vector R whose K components rk are the residuals

rk=~(k)-y(k) (6.14)

TLFeBOOK

Page 152: Advanced Process Identification & Control

6.1. PREDICTION ERROR METHODS 141

k = 1, 2, ..., K, where K is the number of data samples. The cost functionand its derivatives can now be expressed as

2(6.1~)

oJ (0) = ~ rk~-6 _- G (01~ R (0) (6.16)O0

=002(6.17)

where G (19) is the Jacobian matrix, whose elements g~,p are given by

or~g~’P =

(note that ~0~ = ~ (k)) and 8(t9)is the part of the Hessian matrix contains the second derivatives of rk.

The Newton iteration is given by

~(l+ 1)=~(/)- [G(19)TG(19)+S(19)]-~G(19)TR(19) (6.19)

where the Jacobian G (19) is easy to calculate, while S (19) is not. In Gauss-Newton method, the S (19) is simply neglected. In the Levenberg-Marquardt method, the step is defined as

(6.20)

where # (l) is increased whenever the step would result to an increased valueof J. VChen a step reduces J, # (1) is reduced.

6.1.3 Step size

When the minimization direction is available, the problem is to decide howfar to go along this line before a new direction is chosen. The step sizemay be constant or time-varying. Often, the step size parameter is chosensuch that it is a decaying function of time, such as ~/(k) = -~, for example.This choice is made due to theoretical requirements (to ensure infinite search

TLFeBOOK

Page 153: Advanced Process Identification & Control

142 CHAPTER 6. ESTIMATION OF PARAMETERS

range: Y]~I ~ (l) -- ~x~ and convergence of the estimates:Heuristically, the step size should be large when far away from the optimum(at the beginning of the search), and tend towards zero in the neighborhoodof the optimum point (at the end of search). In practice, however, it may difficult to find an efficient step size coefficient fulfilling these requirements.

The following simple procedure was suggested for # in the Levenberg-Marquardt method in [24]: Whenever a step would result in an increasedvalue of J, # (l) is increased by multiplying it by some factor larger than when a step reduces the value of J, it is divided by some factor larger than1. Note, that when # is large the algorithm becomes steepest descent withstep size equal to ~, while for small # the algorithm becomes Gauss-Newton.

Another common method is the three point method: Given there are threevalues 0a < 0b < 0c such that the function at 0b, J(0b), is the lowest, thesign of the derivative at 0b indicates whether a minimum is located in [0a, 0hior in [0b, 0el. This section is then linearly interpolated from its endpoints,and the procedure is repeated.

6.1.4 Levenberg-Marquardt algorithm

Let us summarize the Levenberg-Marquardt method in the following algorithm.

Algorithm 27 (Levenberg-Marquardt) Given a function J(θ) of a sum of prediction errors,

J(θ) = (1/2) Σ_{k=1}^{K} [y(k) − ŷ(k)]² (6.21)

the minimizing parameters

θ̂ = arg min_θ J(θ) (6.22)

can be found by the following algorithm.

1. Initialize:

Set iteration index l = 1.

Initialize θ̂(1) and μ(1) and specify η.

2. Evaluate the model and the residuals:

Evaluate ŷ(k) and ∂ŷ(k)/∂θ_p for all patterns k and parameters p.

Compose the residual vector R,

r_k = ŷ(k) − y(k) (6.23)

and compute J(θ̂(l)),

J(θ̂(l)) = (1/2) RᵀR (6.24)

Compose the Jacobian matrix G,

g_{k,p} = ∂ŷ(k)/∂θ_p (6.25)

3. Solve the parameter update

Δθ̂(l) = −[GᵀG + μ(l) I]⁻¹ GᵀR (6.26)

4. Repeat Step 2 using θ̂(l) + Δθ̂(l), i.e., compute J(θ̂(l) + Δθ̂(l)).

If J(θ̂(l) + Δθ̂(l)) < J(θ̂(l)) then increase the step size,

μ(l + 1) = μ(l)/η (6.27)

and update the parameters,

θ̂(l + 1) = θ̂(l) + Δθ̂(l) (6.28)

otherwise reduce the step size,

μ(l + 1) = η μ(l) (6.29)

5. Set l = l + 1 and return to Step 2, or quit.
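To make the iteration concrete, the following Python sketch implements Algorithm 27 for a generic residual function. The function names (residuals, jacobian), the toy fitting problem and the stopping rule are illustrative choices, not part of the original algorithm.

    import numpy as np

    def levenberg_marquardt(residuals, jacobian, theta, mu=0.01, eta=1.2,
                            max_iter=300, tol=1e-3):
        # residuals(theta): K-vector of r_k = yhat(k) - y(k), Eq. (6.23)
        # jacobian(theta):  K x P matrix with g_kp = d yhat(k) / d theta_p, Eq. (6.25)
        R = residuals(theta)
        J = 0.5 * R @ R
        for l in range(max_iter):
            G = jacobian(theta)
            if np.linalg.norm(G.T @ R) < tol:                # stopping criterion
                break
            P = len(theta)
            step = -np.linalg.solve(G.T @ G + mu * np.eye(P), G.T @ R)   # Eq. (6.26)
            R_new = residuals(theta + step)
            J_new = 0.5 * R_new @ R_new
            if J_new < J:                                    # successful step
                theta, R, J = theta + step, R_new, J_new
                mu = mu / eta                                # Eq. (6.27)
            else:
                mu = mu * eta                                # Eq. (6.29)
        return theta

    # toy usage: fit y = exp(a*x) to noiseless data
    x = np.linspace(0, 1, 20)
    y = np.exp(0.7 * x)
    res = lambda th: np.exp(th[0] * x) - y
    jac = lambda th: (x * np.exp(th[0] * x)).reshape(-1, 1)
    print(levenberg_marquardt(res, jac, np.array([0.0])))    # approaches 0.7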

Example 36 (pH neutralization) Let us consider a simple simulated example related to pH neutralization. The process model is given in detail in Chapter 8. Here, we will concentrate on the problem of identifying a SISO steady state titration curve: the effect of the base stream on the effluent pH. The data was obtained by solving the steady state model for randomly selected inputs, scaling both variables to [0, 1], and adding Gaussian noise N(0, 0.05²) to the output measurement.


Figure 6.1: Titration curve data. Plot a) shows the observed data. Training data is indicated by circles, test data by crosses. Plot b) shows the estimated linear model, f(u) = 1.183u + 0.001.

Let us now examine the identification of a sigmoid neural network model for the process, estimating the parameters with the Levenberg-Marquardt (LM) method.

Data. Assume that two sets of data from the process have been measured, both containing 75 input-output observations of the plant behavior. The first set will be used for parameter estimation (training data), while the second is reserved for model validation (test set). In addition, in this simulated example, we can evaluate the true noiseless function. This gives us a third set (500 observations). The data are shown in Fig. 6.1a, where the training data is indicated by circles and the test data by crosses.

Model structure. Given the data, the next thing to do is to select the model structure. First, a linear regression model was estimated. The prediction is shown in Fig. 6.1b. Clearly, the plot indicates that the process may possess some non-linearities. Since we are not aware of any data transformation for a titration curve that would convert the parameter estimation into a linear problem, we consider a sigmoid neural network (SNN) black-box structure. Let us stick to the simple one-hidden layer network. With SNN the number of nodes H still needs to be set. The non-linearities seem mild, so let us experiment with several moderate network sizes, say, 3, 5, and 10.

Parameter estimation. For the LM method, the initial parameter vector θ(1), initial step size μ(1) and its adjustment factor η need to be specified. For SNN, a reasonable starting point is obtained by initializing θ with small random values, say θ_p ∈ N(0, 0.01) ∀p. Let us set the initial step size to μ(1) = 0.01 and a relatively moderate adjustment factor η = 1.2.


Figure 6.2: Evolution of the criteria J = (1/K) Σ_{k=1}^{K} (ŷ(k) − y(k))² during parameter estimation as a function of the LM iterations l. y(k) are the output data in the noiseless (J), test (J_test) and training (J_train) data sets and ŷ are the corresponding predictions by the model. H = 5.

With these values set, we can proceed in estimating the parameters. Let us start with the medium sized network (H = 5). The evolution of several criteria is shown in Fig. 6.2. The LM algorithm performs the task of minimizing J_train, which is the average sum of residual errors between observations in the training data and the corresponding model predictions. As seen from Fig. 6.2 (the curve with circles), the J_train drops rapidly during the first 50 iteration rounds, and for l > 50 the evolution is slower. With the stopping criterion ||GᵀR|| < 10⁻³ the model shown in Fig. 6.3a is obtained (predictions connected with a solid line).

Validation. At first sight, this seems to be a reasonable fit to the training data (circles in Fig. 6.3a). To have a better view of the model performance, we can compute the linearized parameters. These are shown in Fig. 6.3b. These show peaks at the areas where the prediction grows rapidly. Also, the gain is positive throughout the operating area, indicating that the model is monotone. Thus, the model seems to coincide with our expectations of what a titration curve should look like.

It is interesting to look at the estimated basis functions, see Fig. 6.3c, showing the adjusted sigmoid functions. The mapping seems to be composed of a practically linear term (with coefficient 3.3), a slowly increasing term (coefficient 0.4) and two sharper correction terms (coefficients -0.3 and -0.2). The model output is obtained by multiplying the sigmoids with the associated coefficients, and adding up the results (see Fig. 6.3d).

If the network has too many degrees of freedom, the noise in the finite


Figure 6.3: Performance of the model (H = 5). Plot a) shows the training data (circles), the prediction by the model (solid curve) and the noiseless function (dotted curve); the performance criteria at the end of training were J_train(330) = 0.0018, J(330) = 0.0005 and J_test(330) = 0.0032 (see the caption of Fig. 6.2). Plot b) shows the linearized parameters, i.e., the gradients with respect to the system input (solid curve) and the constant (dotted curve). Plot c) shows the adjusted basis functions and the associated coefficients α_h. Plot d) shows the basis functions multiplied by their constants (solid curves) and the final mapping (dotted curve) obtained by summing the components and adding the bias term. All plots are shown as a function of the system input.


Figure 6.4: Evolution of the criteria as a function of the LM iterations l. H = 10 (see the legend of Fig. 6.2).

data set will be captured by the model parameters. A rough cross-validation method is to spare a set of observations for model validation. Computing the minimization criterion for the test data can reveal whether this is the case. Fig. 6.2 (the curve with crosses) shows the evolution of J_test during the estimation. It can be seen to drop rapidly during the first 50 iteration rounds and then grow, with a larger increase at l = 200. This indicates that our model may have captured some noise in it. Indeed, looking at the criterion J (the dashed curve in Fig. 6.2), computed using the model prediction and the noiseless function (notice that this is not available in practical problems), shows that the minimum occurs at l = 180 and J then starts to grow. This effect is even more pronounced for the largest network experimented with (H = 10), as shown in Fig. 6.4. Figure 6.5 shows the prediction by the largest network, which indeed contains some spurious wiggles at the lower end of the input range, [0, 0.15]. During the parameter estimation for the smallest network (H = 3) such phenomena could not be observed, and the prediction is the smoothest among the SNNs experimented with (see Fig. 6.6).

Comparing the final values of J_test (see Figs. 6.3a, 6.5 and 6.6) would suggest the smallest network (H = 3) as the 'optimal' one. For this simulated example, we can also compute the deviation of the model from the noiseless function (J in Figs. 6.3a, 6.5 and 6.6), and we find the medium network (H = 5) to have the smallest J. This gives a small taste of the difficulties associated with automatic data-driven structure selection methods. Taking into account that, in general, different values for the initial θ, initial μ(1), η and the stopping criterion result in different parameter estimates, it is easy to see


Figure 6.5: Prediction by the largest SNN model (H = 10, solid curve), training data (circles) and the noiseless function (dotted curve). The performance criteria at the end of training were J_train(462) = 0.0016, J(462) = 0.0007 and J_test(462) = 0.0035 (see the legend of Fig. 6.2).

Figure 6.6: Prediction by the smallest SNN model (H = 3, solid curve), training data (circles) and the noiseless function (dotted curve). The performance criteria at the end of training were J_train(66) = 0.0019, J(66) = 0.0006 and J_test(66) = 0.0031 (see the legend of Fig. 6.2).


that common sense engineering and process knowledge help a lot in assessingthe validity of an identified model.

6.2 Optimization under constraints

Parameter estimation rarely consists only of minimizing the prediction erroron a fixed data set. Instead, more or less vague conditions are imposedon the model, either implicitly or explicitly. One way to include a prioriknowledge explicitly into the identification is to consider optimization underconstraints. A common technique for solving this kind of problem is theLagrange multipliers approach (see, e.g., [15]).

6.2.1 Equality constraints

Consider a function J(θ) of a parameter vector θ = [θ_1, ..., θ_P]ᵀ of P parameters to be minimized,

min_θ J(θ) (6.30)

subject to C equality constraints

h_1(θ) = 0
⋮
h_C(θ) = 0 (6.31)

Construct a Lagrange function

L(θ, λ) = J(θ) + Σ_{c=1}^{C} λ_c h_c(θ) (6.32)

which is to be minimized with respect to θ and maximized with respect to λ. λ = [λ_1, ..., λ_C] are the Lagrange multipliers, also referred to as the Kuhn-Tucker parameters.

The Taylor expansion of this Lagrange function is given by

L(θ, λ) = L(θ̄, λ̄) + Σ_{p=1}^{P} [∂L(θ, λ)/∂θ_p]_{θ=θ̄; λ=λ̄} Δθ_p (6.33)

+ Σ_{c=1}^{C} [∂L(θ, λ)/∂λ_c]_{θ=θ̄; λ=λ̄} Δλ_c + ... (6.34)


The optimality conditions are given by

∂L(θ, λ)/∂θ = 0 (6.35)

∂L(θ, λ)/∂λ = 0 (6.36)

Example 37 (Parameter estimation algorithm) Consider a linear regression model

ŷ(k) = θᵀ(k) φ(k − 1) (6.37)

and a criterion

J = (1/2) ||θ(k) − θ(k − 1)||² (6.38)

Determine an identification algorithm for θ(k) which minimizes J. Consider the following Lagrange function:

L = (1/2) Σ_{p=1}^{P} (θ_p(k) − θ_p(k − 1))² + λ [y(k) − θᵀ(k) φ(k − 1)] (6.39)

From the optimality conditions it follows that

θ(k) − θ(k − 1) − λ φ(k − 1) = 0 (6.40)

y(k) − θᵀ(k) φ(k − 1) = 0 (6.41)

Solving for θ(k) we derive

y(k) = φᵀ(k − 1) [θ(k − 1) + λ φ(k − 1)] (6.42)

which gives for λ

λ = [y(k) − φᵀ(k − 1) θ(k − 1)] / [φᵀ(k − 1) φ(k − 1)] (6.43)

The identification algorithm can be summarized as follows:

θ(k) = θ(k − 1) + [φ(k − 1) / (φᵀ(k − 1) φ(k − 1))] [y(k) − φᵀ(k − 1) θ(k − 1)] (6.44)
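A minimal Python sketch of the resulting recursive update (6.44) is given below. The data-generating parameters and the small regularization term added to the denominator (to avoid division by zero) are illustrative assumptions.

    import numpy as np

    def projection_update(theta, phi, y, eps=1e-8):
        # Eq. (6.44): theta(k) = theta(k-1) + phi/(phi'phi) * (y - phi'theta(k-1))
        e = y - phi @ theta
        return theta + phi * e / (phi @ phi + eps)

    # toy usage: track the parameters of y(k) = [2, -1] . phi(k-1)
    rng = np.random.default_rng(0)
    theta_true = np.array([2.0, -1.0])
    theta = np.zeros(2)
    for k in range(50):
        phi = rng.normal(size=2)        # regressor phi(k-1)
        y = theta_true @ phi            # noiseless observation y(k)
        theta = projection_update(theta, phi, y)
    print(theta)                        # approaches [2, -1]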


6.2.2 Inequality constraints

Let us extend the previous results in order to deal with inequality constraints. Assume that a function J(θ) of a parameter vector θ = [θ_1, ..., θ_P] of P parameters is to be minimized,

min_θ J(θ) (6.45)

subject to C inequality constraints

q_1(θ) ≤ 0
⋮
q_C(θ) ≤ 0 (6.46)

Note that an equality constraint can be constructed using two inequality constraints.

Again, a Lagrange function L(θ, λ) can be constructed,

L(θ, λ) = J(θ) + Σ_{c=1}^{C} λ_c q_c(θ) (6.47)

where now λ_c ≥ 0 (for more sophisticated approaches see, e.g., [15], pp. 302-319, pp. 334-342). The Lagrange function is simultaneously minimized with respect to the parameter vector θ, and maximized with respect to the multipliers λ.

A simple recursive algorithm solving this problem can be written as follows:

θ(l + 1) = θ(l) − η(l) ∂L(θ, λ)/∂θ (6.48)

λ(l + 1) = max(0, λ(l) + η(l) ∂L(θ, λ)/∂λ) (6.49)

where η(l) is the learning rate (step size). The derivatives with respect to the Lagrange multipliers are given directly by the constraints,

∂L(θ, λ)/∂λ_c (l) = q_c(θ(l)) (6.50)

where c = 1, 2, ..., C. In order to calculate the gradients with respect to the parameters,

∂L(θ, λ)/∂θ_p (l) = ∂J(θ(l))/∂θ_p + Σ_{c=1}^{C} λ_c ∂q_c(θ(l))/∂θ_p (6.51)


(p = 1, 2, ..., P), the gradients

∂J(θ(l))/∂θ_p and ∂q_c(θ(l))/∂θ_p (6.52)

need to be available. In general, they can be approximated by finite differences,

∂J(θ(l))/∂θ_p ≈ [J(θ + e_p Δθ_p) − J(θ − e_p Δθ_p)] / (2 Δθ_p) (6.53)

∂q_c(θ(l))/∂θ_p ≈ [q_c(θ + e_p Δθ_p) − q_c(θ − e_p Δθ_p)] / (2 Δθ_p) (6.54)

where Δθ_p, p = 1, 2, ..., P, are some small variations of the parameters; e_p is a column vector with 1 as the p'th element, zeros elsewhere.
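As an illustration, the central differences (6.53)-(6.54) can be coded generically for any scalar function of the parameters; the step size delta below is an assumed tuning value.

    import numpy as np

    def finite_difference_gradient(f, theta, delta=1e-5):
        # central-difference approximation of df/dtheta_p, Eqs. (6.53)-(6.54)
        grad = np.zeros_like(theta, dtype=float)
        for p in range(len(theta)):
            e_p = np.zeros_like(theta, dtype=float)
            e_p[p] = 1.0
            grad[p] = (f(theta + e_p * delta) - f(theta - e_p * delta)) / (2 * delta)
        return grad

    # usage: gradient of J(theta) = theta_1^2 + 3*theta_2 at [1, 2]
    print(finite_difference_gradient(lambda t: t[0]**2 + 3*t[1], np.array([1.0, 2.0])))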

In some cases, an analytical form can be given. In the case of parameter estimation, the gradients ∂J(θ)/∂θ_p (p = 1, 2, ..., P) are usually available. In prediction error methods, the cost function is given by

J(θ) = (1/2) Σ_{k=1}^{K} (y(k) − ŷ(k))² (6.55)

where y is the target output and ŷ is the output predicted by the model. Thus,

∂J(θ)/∂θ_p = −Σ_{k=1}^{K} (y(k) − ŷ(k)) ∂ŷ(k)/∂θ_p (6.56)

In black-box models, the gradient ∂ŷ(k)/∂θ_p is required by (unconstrained) parameter estimation methods, and is usually easily available.

The availability of the gradients of the constraints, ∂q_c(θ)/∂θ_p, depends on the type of constraints. For simple constraints (upper and lower bounds on the output or parameters, fixed points), analytic forms of the gradients are easy to obtain. For other typical constraints, such as constraints on gains, poles, deviation from a nominal model etc., these may be difficult to obtain.

Let us collect the results to an algorithm.

Algorithm 28 (Lagrange multipliers) Problem: minimize J(θ) subject to q_c(θ) ≤ 0, c = 1, 2, ..., C, with respect to θ = [θ_1, ..., θ_P].

1. Initialize:

Set iteration index l = 1.

Initialize θ(1) and λ(1).

2. Evaluate model and constraints:

Evaluate ŷ(k) and ∂ŷ(k)/∂θ for all patterns k.

Evaluate q_c and ∂q_c/∂θ for all constraints c.

Evaluate ∂J/∂θ_p for all parameters p.

3. Compose the gradients of the Lagrangian, ∂L/∂θ and ∂L/∂λ:

∂L/∂λ_c (l) = q_c(l) (6.57)

∂L/∂θ_p (l) = ∂J/∂θ_p (l) + Σ_{c=1}^{C} λ_c(l) ∂q_c/∂θ_p (l) (6.58)

4. Update the parameters and Lagrange multipliers:

θ(l + 1) = θ(l) − η(l) ∂L/∂θ (l) (6.59)

λ(l + 1) = max(0, λ(l) + η(l) ∂L/∂λ (l)) (6.60)

5. Quit, or set l = l + 1 and return to Step 2.
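The following Python sketch runs the iteration of Algorithm 28 on a small illustrative problem (a quadratic cost with one inequality constraint). The fixed learning rate and iteration count are assumptions made for this example only.

    import numpy as np

    def constrained_minimize(dJ, q_list, dq_list, theta, lam, eta=0.05, iters=2000):
        # gradient descent-ascent on the Lagrangian, Eqs. (6.57)-(6.60)
        for _ in range(iters):
            dL_dtheta = dJ(theta) + sum(l * dq(theta) for l, dq in zip(lam, dq_list))
            dL_dlam = np.array([q(theta) for q in q_list])
            theta = theta - eta * dL_dtheta              # Eq. (6.59)
            lam = np.maximum(0.0, lam + eta * dL_dlam)   # Eq. (6.60)
        return theta, lam

    # usage: minimize (theta - 3)^2 subject to theta - 1 <= 0
    theta, lam = constrained_minimize(
        dJ=lambda t: 2 * (t - 3.0),
        q_list=[lambda t: t - 1.0],
        dq_list=[lambda t: np.ones_like(t)],
        theta=np.array([0.0]), lam=np.array([0.0]))
    print(theta)   # close to the constrained optimum theta = 1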

6.3 Guided random search methods

For solving optimization tasks, such as in parameter estimation, the mostpopular approaches are gradient-based. However, a cost function may haveseveral local optima, and it is well known that gradient-based estimationroutines may converge to a local optimum instead of a global one.

A common solution is to repeat the gradient-based search several times,starting at different (random) initial locations. An alternative is to use some(guided) random search method instead of a gradient-based method, or useboth as in hybrid methods.

With random search methods, the computation of the gradients at eachiteration (often the most time-consuming phase in the implementation and


computation of gradient-based methods) can be avoided. As well, constraints can be easily included. Most importantly, random search methods perform a global search in the search space, and are thus not easily fooled by the local optima. As this has, however, not been shown to be a severe problem in many of the practical applications, various gradient methods are commonly used due to their efficacy:

Matyas random optimization method [6] uses the idea of 'contaminating' the current solution in order to explore the search space around the current solution. At each iteration, a Gaussian random vector is added to the current solution. If the new solution improves the model performance, it will be used in the next iteration. If no improvement occurs, the old solution is kept and a new Gaussian random vector is generated. In simulated annealing [46], occasional upward steps in the criterion are allowed. The acceptance of upward steps is treated probabilistically, so that as the optimization proceeds, the probability is decreased until the system 'freezes'.

A different approach was taken by Luus and Jaakola [56], whose methodis based on direct search and systematic search region reduction. Ineach iteration, a number of random vectors belonging to the searchspace are generated and evaluated against the criterion. In order toimprove the solution, instead of concentrating on the space close to thebest solution, the direct search is performed in a much larger space.In each iteration, the search space is, however, slightly reduced until itbecomes so small that a desired accuracy has been obtained.

Where simulated annealing finds its background in statistical mechanics and evolutionary processes, genetic algorithms [22] are motivated by the mechanisms of natural selection and genetics. From an initial population of solutions, a genetic algorithm chooses the most fitted solutions (in the sense of a given criterion) using a selection operator. In order to generate new solutions, operators such as crossover and mutation are used. They are based on a specific form of coding of the solutions as strings, which allows recombination operators similar to those observed with chromosomes. On the new population of solutions, selection and recombination operators are used repeatedly until the population converges, giving a population of fittest solutions.

Optimization techniques based on learning automata [75] also belong to the class of random search techniques. The concept of learning automata was initially introduced in connection with the modeling of biological systems. They


have been widely used to solve problems for which an analytical solutionis not possible, or which are mathematically intractable. They have alsoattracted interest due to their potential usefulness in engineering problemsof optimization and control characterized by non-linearity and high level ofuncertainty. In general, the learning automata are very simple machines, andhave few and transparent tuning parameters. As an example of the randomsearch paradigms, we will next consider the stochastic learning automata inmore detail.

6.3.1 Stochastic learning automaton

Optimization techniques based on learning automata (LA) belong to the classof (guided) random search techniques. In general, random search methodshave attained fairly little interest in optimization, although they have somevery appealing features. Learning automata can be applied to a large class ofoptimization problems, since there are only few assumptions concerning thefunction to be optimized. They are simple, transparent and easy to apply,even for complexly structured or constrained systems. The main advantages,if compared to the more popular gradient-based algorithms, are that thegradients need not be computed and that the search for the global minimumis not easily fooled by the local minima.

A learning automaton is a sequential machine characterized by:

¯ a set of actions,

¯ an action probability distribution and

¯ a reinforcement learning scheme.

It is connected in a feedback loop (see Fig. 6.7) to the random environment(the function to be optimized, the process to be controlled, etc.). At everysampling instant, the automaton chooses randomly an action from a finiteset of actions on the basis of a probability distribution. The selected actioncauses a reaction of the environment, which in turn is the input signal forthe automaton. With a reinforcement scheme, the learning automaton recur-sively updates its probability distribution, and should be capable of changingits structure and/or parameters to achieve the desired goal or optimal per-formance in the sense of a given criterion.

To describe an automaton, introduce the following [75]:

1. U denotes the set {u_1, u_2, ..., u_A} of the A actions of the automaton, A ∈ [2, ∞).


Figure 6.7: A learning automaton with a normalization procedure connected in a feedback loop with the environment. The automaton produces an action, u(k), based on the probabilities of the actions. The environment response, ξ(k), is normalized and fed back to the automaton. The automaton adjusts its action probabilities and produces a new action.

2. {u(k)} is a sequence of automaton outputs (actions), u(k) ∈ U.

3. p(k) = [p_1(k), ..., p_A(k)]ᵀ is a vector of action probabilities at iteration k, for which

Σ_{a=1}^{A} p_a(k) = 1, ∀k (6.61)

4. {ξ(k)} is a sequence of automaton inputs (environment responses). Automaton inputs are provided by the environment either in a binary (P-model environment) or continuous (S-model environment) form.

5. T represents the reinforcement scheme which changes the probability vector:

p(k + 1) = p(k) + η(k) T(p(k), {ξ(κ)}, u(κ)), κ = 1, 2, ..., k (6.62)

p_a(1) > 0 ∀a (6.63)

where η(k) is a scalar learning rate that may be time-varying. The vector T = [T_1(·), ..., T_A(·)]ᵀ satisfies the following conditions for preserving the probability measure:

Σ_{a=1}^{A} T_a(·) = 0, ∀k (6.64)


p_a(k) + η(k) T_a(·) ∈ [0, 1], ∀k, ∀a. (6.65)

The operation of a learning automaton can be summarized as follows (seeFig. 6.7)

1. Select randomly an action u(k) from the action set U according to theprobability distribution p(k).

2. Calculate the normalized environment response ~ (k).

3. Adjust the probability vector p(k).

4. Return to Step 1, or quit.

A practical method for choosing an action according to a probability distribution (Step 1) is to generate a uniformly distributed random variable z ~ U(0, 1). The a'th action u(k) = u_a is then chosen such that a is equal to the least value of i satisfying the following constraint:

Σ_{j=1}^{i} p_j(k) ≥ z (6.66)

In the S-model environment, the continuous environment responses (Step 2) need to be in the range ξ(k) ∈ [0, 1]. To achieve this, a normalization procedure can be applied, e.g.,

ξ(k) = [s_a(k) − min_{i=1,...,A} s_i(k)] / [max_{i=1,...,A} s_i(k) − min_{i=1,...,A} s_i(k)] (6.67)

where u_a is the chosen action, s_a is the expectation of the environment response for action a, and ξ(k) denotes the normalized environment response, ξ(k) ∈ [0, 1].

A number of reinforcement schemes (Step 3) have been described in the literature [75]. A general non-linear reinforcement scheme is of the form [67]:

• if u(k) = u_a:

p_a(k + 1) = p_a(k) + (1 − ξ(k)) Σ_{j=1, j≠a}^{A} g_j(p(k)) − ξ(k) Σ_{j=1, j≠a}^{A} h_j(p(k)) (6.68)


• if u(k) ≠ u_a:

p_a(k + 1) = p_a(k) − (1 − ξ(k)) g_a(p(k)) + ξ(k) h_a(p(k)) (6.69)

where the functions g_a and h_a are associated with reward and penalty, respectively.

A simple reinforcement scheme, the linear reward-penalty (L_R-P) scheme [11], for an automaton of A actions operating in an S-model environment is obtained by selecting

g_a(p(k)) = η p_a(k) (6.70)

h_a(p(k)) = η [1/(A − 1) − p_a(k)] (6.71)

Substituting the preceding equations into (6.68)-(6.69) we have the following L_R-P learning scheme:

• if u(k) = u_a:

p_a(k + 1) = p_a(k) + η (1 − p_a(k) − ξ(k)) (6.72)

• if u(k) ≠ u_a:

p_a(k + 1) = p_a(k) − η (p_a(k) − ξ(k)/(A − 1)) (6.73)
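A compact Python sketch of an S-model automaton with the L_R-P scheme (6.72)-(6.73) follows, in the spirit of the multimodal optimization setting of Fig. 6.8. The quantified search interval, the noisy loss function, and the use of running response averages in the normalization (6.67) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    actions = np.linspace(-2.0, 2.0, 9)     # fixed points u_a of the quantified region
    A = len(actions)
    p = np.full(A, 1.0 / A)                 # action probabilities, Eq. (6.63)
    s = np.zeros(A)                         # running averages of the responses
    n = np.zeros(A)
    eta = 0.05

    def loss(u):                            # multimodal function plus observation noise
        return np.cos(3 * u) + 0.1 * u**2 + 0.1 * rng.normal()

    for k in range(3000):
        a = min(np.searchsorted(np.cumsum(p), rng.uniform()), A - 1)   # Eq. (6.66)
        n[a] += 1
        s[a] += (loss(actions[a]) - s[a]) / n[a]             # update response average
        span = s.max() - s.min()
        xi = (s[a] - s.min()) / span if span > 0 else 0.0    # normalization, Eq. (6.67)
        for j in range(A):
            if j == a:
                p[j] += eta * (1 - p[j] - xi)                # Eq. (6.72)
            else:
                p[j] -= eta * (p[j] - xi / (A - 1))          # Eq. (6.73)

    print(actions[np.argmax(p)])            # action with the highest probability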

Learning automata can be applied to a large variety of complex optimization problems, since there are only few assumptions concerning the function to be optimized. Typical applications include multimodal function optimization problems, see Fig. 6.8. What makes the automata approach particularly interesting is the existence of theoretical proofs (ε-accuracy, ε-optimality, convergence with probability one, convergence in the mean square sense, rate of convergence) (see, e.g., [75]). There are almost no conditions concerning the function to be optimized (continuity, unimodality, differentiability, convexity, etc.). Learning automata perform a global search on the search space (action space), and they are not easily fooled by the local minima. In general, learning automata are simple machines, and have few and transparent tuning parameters. The main drawback is the lack of efficiency (slow convergence rates), in particular for large action spaces.


Figure 6.8: Multimodal function optimization using a learning automaton. The loss function is formed from the optimized function and observation noise in the environment. The search region is quantified using X_a, X_b ⊂ X; X_a ∩ X_b = ∅ (a ≠ b); ∪_a X_a = X, where u(k) ∈ {u_1, ..., u_A}; and u_a ∈ X_a are fixed points.

6.4 Simulation examples

This section will concern three applications related to process engineering. First, a SISO Wiener model for a simulated pneumatic valve is identified. Both grey-box and black-box approaches for modeling the static part are considered. The results are compared with those from the literature [95], and the estimated parameters are examined. The second example considers the estimation of the parameters in a MISO Hammerstein model. The data is drawn from a binary distillation column model, considered also in [17]. Parameter estimation under constraints posed on the properties of the static part is illustrated. The third example illustrates identification of a two-tank system under constraints using a Wiener model.

All simulations were performed on a Pentium PC (450 MHz) and Matlab™ 5.2. In the parameter estimation, the functions leastsq.m (Levenberg-Marquardt method with a mixed quadratic and cubic line search) and constr.m (mechanization of the Lagrange multipliers approach) from the Matlab Optimization Toolbox were used. The differential equations were solved using the ode23.m function, inverse problems with fsolve.m.


6.4.1 Pneumatic valve: identification of a Wiener system

Let us first consider a simple example on the identification of a Wiener system. A simple model for a pneumatic valve for fluid flow control is given in [95], where also a Wiener model for the system is identified. Some of the simulation results can also be found in [36].

Process and data

Pneumatic as well as electrical valves are commonly used for fluid flow control. The static characteristics of a valve for fluid flow control vary with operating conditions. The input of the model represents the pneumatic control signal applied to the stem, while the internal variable represents the stem position, or equivalently, the position of the valve plug. Linear dynamics describe the dynamic balance between the control signal and the stem position:

z(k)/u(k) = (0.1044 q⁻¹ + 0.0883 q⁻²) / (1 − 1.4138 q⁻¹ + 0.6065 q⁻²) (6.74)

The flow through the valve is given by a non-linear function of the stem position:

y(k) = z(k) / √(c_1 + c_2 z²(k)); c_1 = 0.1; c_2 = 0.9 (6.75)

Based on the above model, data sets of 1000 input-output pairs were generated in the same fashion as in [95] (see Fig. 6.9): a pseudo random sequence (PRS) was used as input. In practice it is impossible to obtain perfect measurements; to make the simulations more realistic in the case of noisy observations a Gaussian noise was introduced at the output measurement. For reference purposes, also a noiseless data set was generated, as well as two test sets of 200 input-output patterns.
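A Python sketch of this data-generation step, under the stated input and noise assumptions, could look as follows; the PRS is approximated here by a random binary signal held for seven samples, which is an assumption consistent with the clock period mentioned in Fig. 6.9.

    import numpy as np

    rng = np.random.default_rng(2)
    K = 1000
    # PRS input: random binary levels held for a basic clock period of 7 samples
    u = np.repeat(rng.integers(0, 2, size=K // 7 + 1).astype(float), 7)[:K]

    # linear dynamics (6.74):
    # z(k) = 1.4138 z(k-1) - 0.6065 z(k-2) + 0.1044 u(k-1) + 0.0883 u(k-2)
    z = np.zeros(K)
    for k in range(2, K):
        z[k] = 1.4138 * z[k-1] - 0.6065 * z[k-2] + 0.1044 * u[k-1] + 0.0883 * u[k-2]

    # static non-linearity (6.75) and measurement noise N(0, 0.05^2)
    y = z / np.sqrt(0.10 + 0.90 * z**2) + 0.05 * rng.normal(size=K)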

Model structure and parameter estimation

The structure of the linear dynamic SISO model was assumed to be known (n_B = 1, n_A = 2, d = 1),

z(k) = (b_0 + b_1 q⁻¹) / (1 + a_1 q⁻¹ + a_2 q⁻²) u(k − 1) (6.76)


Figure 6.9: Training data for the pneumatic valve model. The system input (upper plot) was a PRS signal with a basic clock period of seven samples. The system output (lower plot) was corrupted by Gaussian noise N(0, 0.05²).


Thus the linear dynamic part contained three parameters to be estimated from data (remember that b_1 is fixed, see Algorithm 23).

Two model structures were considered for the static part. In the first case, the SQRT structure of (6.75), with the two parameters c_1 and c_2 to be estimated, was considered. The required derivatives are simple to compute, resulting in

∂y/∂z = c_1 / (c_1 + c_2 z²)^(3/2) (6.77)

∂y/∂c_1 = −z / (2 (c_1 + c_2 z²)^(3/2)) (6.78)

∂y/∂c_2 = −z³ / (2 (c_1 + c_2 z²)^(3/2)) (6.79)

In the second case, a 0-order Sugeno fuzzy model was used (see Algorithm 18). Let us assume crisp system input(s), triangular fuzzy sets, add-one partition, product t-norm, and weighted average defuzzification. Then, a 0-order Sugeno model can be expressed as

ŷ = Σ_{p_1=1}^{P_1} Σ_{p_2=1}^{P_2} ··· Σ_{p_I=1}^{P_I} θ_{p_1,p_2,...,p_I} Π_{i=1}^{I} μ_{i,p_i}(z_i, ·) (6.80)

where the input degrees of membership are given by

μ_{i,p_i}(z_i, β_i) = max( min( (z_i − β_{i,p_i−1}) / (β_{i,p_i} − β_{i,p_i−1}), (β_{i,p_i+1} − z_i) / (β_{i,p_i+1} − β_{i,p_i}) ), 0 ) (6.81)

where μ_{i,p_i}(z_i, ·) is the membership function associated with the p_i'th fuzzy set partitioning the i'th input (p_i = 1, 2, ..., P_i). The input partition is given by β_{i,p_i−1} < β_{i,p_i}. The derivatives with respect to the inputs are given by

∂ŷ/∂z_i = Σ_{p_1=1}^{P_1} Σ_{p_2=1}^{P_2} ··· Σ_{p_I=1}^{P_I} θ_{p_1,p_2,...,p_I} [∂μ_{i,p_i}(z_i, ·)/∂z_i] Π_{j=1; j≠i}^{I} μ_{j,p_j}(z_j, ·) (6.82)

where

∂μ_{i,p_i}(z_i, ·)/∂z_i = 1/(β_{i,p_i} − β_{i,p_i−1}) if β_{i,p_i−1} ≤ z_i < β_{i,p_i};
−1/(β_{i,p_i+1} − β_{i,p_i}) if β_{i,p_i} ≤ z_i < β_{i,p_i+1};
0 otherwise (6.83)


and with respect to the consequent parameters by

∂ŷ/∂θ_{p_1,p_2,...,p_I} = Π_{i=1}^{I} μ_{i,p_i}(z_i, ·) (6.84)

In the SISO problem (I = 1) considered here, five triangular membership functions were used (P_1 = 5). This results in a piece-wise linear structure which is functionally equivalent to that used in [95]. The antecedent parameters (knots) were set to β_1 = [0, 0.2, 0.4, 0.7, 1] as in [95], and the consequent parameters were to be estimated from data.
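For the SISO case used here, the following Python sketch evaluates the 0-order Sugeno model (6.80)-(6.81) with the knots given above; the consequent parameter values in the sketch are placeholders standing in for the values to be estimated.

    import numpy as np

    knots = np.array([0.0, 0.2, 0.4, 0.7, 1.0])     # antecedent parameters (fixed)
    theta = np.array([0.0, 0.3, 0.6, 0.9, 1.0])     # consequent parameters (placeholders)

    def membership(z, p):
        # triangular membership of the p'th fuzzy set, Eq. (6.81); the first and
        # last sets have one-sided (shoulder) membership
        left = (z - knots[p-1]) / (knots[p] - knots[p-1]) if p > 0 else 1.0
        right = (knots[p+1] - z) / (knots[p+1] - knots[p]) if p < len(knots) - 1 else 1.0
        return max(min(left, right), 0.0)

    def sugeno0(z):
        # weighted sum of consequents, Eq. (6.80); add-one partition => weights sum to 1
        return sum(theta[p] * membership(z, p) for p in range(len(knots)))

    print(sugeno0(0.3))   # interpolates between theta[1] and theta[2]

With one input and triangular sets, the output is simply a piece-wise linear interpolation between the consequent values at the knots, which is the equivalence to [95] mentioned above.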

The parameters of the two model structures were estimated using the Levenberg-Marquardt method, using both noiseless and noisy data, resulting in four simulations. In addition, the results were compared with those reported in [95].

Analysis

Table 6.1 shows the number of iterations required in the four cases. Thetraining was fast, completed in a few minutes. Note, that the recursiveprediction error method used in [95] is not a batch method, and thus theresults are not directly comparable with [95] (bottom line of Table 6.1).

The root-mean-squared errors (RMSE),

RMSE = √( (1/K) Σ_{k=1}^{K} (y(k) − ŷ(k))² ) (6.85)

on the corresponding training set and on noiseless test data are given inTable 6.2. When the model structure was exactly known (the SQRT case),the match was perfect; the RMSE on training and test set were zero up tofour digits in the noiseless case, and close to the standard deviation of thenoise in the case of noisy data. Also in the case of a more general black-boxmodel (0-Sugeno) the accuracy of all the identified models was good.

Tables 6.3, 6.4 and 6.5 show the true values of the parameters, the estimated values, and the parameter values estimated in [95]. Table 6.3 shows the coefficients of the polynomials A and B, i.e., the linear dynamics. The maximum deviation from the true zero at q = -0.8458 was less than 1% in the case of the SQRT model, as well as in the case of the 0-Sugeno model identified from noiseless data. For the 0-Sugeno models identified from noisy data, however, the deviation was more important, yet less than 20%. In all cases, deviation from the true poles located at q = 0.7069 ± 0.3268i was less than 2%.


                          training epochs
SQRT - no noise           47
SQRT - noise              64
0-Sugeno - no noise       37
0-Sugeno - noise          37
0-Sugeno - noise [95]     (1)

Table 6.1: Number of iterations required by the Levenberg-Marquardt method.

Table 6.4 gives the parameters of the SQRT model, and Table 6.5 the consequent parameters of the 0-Sugeno model. Note that the redundancy in gains is removed by fixing the gain of the dynamic part; thus parameter b_1 is not independently estimated. In [95], the problem of redundancy was solved by fixing the gain of the static part on an interval u ∈ [0.2, 0.4] to 1.5; thus only the bias for the interval was identified (0.2402). The 'non-identified' parameters are indicated by the parentheses in the tables. However, as the steady-state gain of the transfer function of the true system (6.74)-(6.75) is one, the parameters are directly comparable. Table 6.4 shows that the parameters were correctly estimated up to two digits in the case where the correct form of the non-linearity was known. Table 6.5 indicates that the difference between the consequent parameters estimated here and the knot parameters estimated in [95] was small; the largest difference appeared in the first parameter (|Δw_1| < 0.08), for which region the training set contained only a few data points.

The performance of the identified 0-Sugeno model on test set data (previously unseen to the model) is illustrated in Fig. 6.10. The upper part of the figure shows the steps in the system input u, and the filtered intermediate variable z. Note that the steady-state gain of the linear dynamic filter is one. The lower part in Fig. 6.10 shows the output of the true system y and the prediction by the model ŷ, which is obtained by putting the intermediate variable z through the static (non-linear) part. The fit between the desired and obtained signals is close, although some deviation can be seen at lower values of y. This mismatch is due to the noise in the few data samples from that operating area.

Results

In this example we found that the correct parameter values were estimated using the Wiener structure and the Levenberg-Marquardt method in parameter estimation.


Figure 6.10: Prediction by the 0-Sugeno model identified on noisy data. The intermediate signal z (upper plot - dashed line) is obtained by filtering the input signal u (upper plot - solid line). The model output (lower plot - dashed line) is computed by putting the intermediate signal through the non-linear static function. The true output is shown by a solid line.

RMSE                      training data    test data
true model - no noise     0                0
SQRT - no noise           0.0000           0.0000
SQRT - noise              0.0488           0.0014
0-Sugeno - no noise       0.0297           0.0315
0-Sugeno - noise          0.0525           0.0126
0-Sugeno - noise [95]     0.0554           0.0111

Table 6.2: Root-mean-squared error on training and test data.


Linear dynamics            b_0      b_1        a_1       a_2
true parameters - no noise 0.1044   0.0883     -1.4138   0.6065
SQRT - no noise            0.1044   (0.0883)   -1.4138   0.6065
SQRT - noise               0.1046   (0.0902)   -1.4104   0.6052
0-Sugeno - no noise        0.1058   (0.0884)   -1.3983   0.5925
0-Sugeno - noise           0.0997   (0.0991)   -1.3926   0.5914
0-Sugeno - noise [95]      0.0980   0.0984     -1.4026   0.5990

Table 6.3: Estimated parameters of the linear dynamic block.

SQRT parameters            c_1       c_2
true parameters - no noise 0.1       0.9
SQRT - no noise            0.1000    0.9000
SQRT - noise               0.0992    0.8999

Table 6.4: Estimated parameters of the static SQRT-block.

0-Sugeno parameters        w_1       w_2        w_3        w_4      w_5
0-Sugeno - no noise        -0.0545   0.5743     0.8253     0.9626   1.0021
0-Sugeno - noise           -0.0397   0.5770     0.8252     0.9608   1.0030
0-Sugeno - noise [95]      0.0221    (0.5402)   (0.8402)   0.9607   0.9852

Table 6.5: Estimated consequent parameters of the static 0-Sugeno block.


Both noisy and noiseless cases were examined. As expected, the performance of the approach was similar to that in [95], when comparable. However, the approach suggested here is not restricted to any particular form of the static mapping. As illustrated, the suggested identification procedure does not restrict the type of the non-linear static model, as long as the gradients can be computed (or approximated). In the example, it was shown how a grey-box SQRT-model with a structure justified by physical background could also be used, as well as a fuzzy black-box model.

In the next example, we will consider the identification of a more complicated MISO system.

6.4.2 Binary distillation column: identification of Hammerstein model under constraints

In a second example, identification of a binary distillation column was studied. Hammerstein modeling of this process was considered in [17], see also [37]. In what follows, special emphasis is focused on the role of process identification under constraints.

Process and data

Distillation is a complex chemical operation for separation of componentsof liquid mixtures and purification of products. It is widely used in thepetroleum, chemical and pharmaceutical industries. In a typical distillationcolumn, the feed enters near the center of the column and flows down. Vaporsthat are released by heating are condensed and can be removed as overheadproduct or distillate. Any liquid that is returned to the column is calledreflux. The reflux flows down the column and joins the feed stream. Thereflux rate has a major influence on the separation process. Too much refluxmakes the product excessively pure, but wastes energy because more refluxliquid has to be vaporized, while too little reflux causes an impure product.

A simple model for a binary distillation column was given in [90] (pp. 70-74). The model is described by a set of differential equations. The composition dynamics at the bottom, 1st tray, feed tray, top tray and condenser are given by

dx_b/dt = (1/M_b) (L_1 x_1 − B x_b − V y_b) (6.86)

dx_1/dt = (1/M_1) (V (y_b − y_1) + L_2 x_2 − L_1 x_1) (6.87)

dx_{n_f}/dt = (1/M_{n_f}) (V (y_{n_f−1} − y_{n_f}) + L_{n_f+1} x_{n_f+1} − L_{n_f} x_{n_f} + F z_f) (6.88)

dx_N/dt = (1/M_N) (V (y_{N−1} − y_N) + R x_d − L_N x_N) (6.89)

dx_d/dt = (1/M_d) (V y_N − R x_d − D x_d) (6.90)

respectively. At other trays, the composition dynamics are given by

dx_n/dt = (1/M_n) (V (y_{n−1} − y_n) + L_{n+1} x_{n+1} − L_n x_n) (6.91)

The vapor compositions at trays n = 1, 2, ..., N and at the bottom are computed using a constant relative volatility:

y_n = α x_n / (1 + (α − 1) x_n) (6.92)

y_b = α x_b / (1 + (α − 1) x_b) (6.93)

The relations between the changes in flows are assumed to be immediate:

B = F − V + R (6.94)

L_n = R + F if n ≤ n_f; L_n = R if n > n_f (6.95)

D = V − R (6.96)

where B, D, and L_n are flows of the bottom, distillate and liquid at tray n, respectively. The steady-state operating parameters of the distillation column model are given in Table 6.6. The model is distinguished by an open-book like steady-state non-linearity between reflux and top composition and a strong variation of the apparent time constant with change in reflux.

Using the model, a data set of 900 data patterns was generated by varying the values of the reflux flow R and the vapour boilup V (PRS with a maximum amplitude of 5%, sampling time 10 min). The top composition x_d was observed and a model for it was to be identified.

Model structure and parameter estimation

In [17], a Hammerstein model for the process was considered: u_1 ← R, u_2 ← V, y ← x_d, with first order dynamics, n_B = 0, n_A = 1, d = 1. The same setting for the dynamic part was used here.


Parameter                          Value
reflux R                           1.477
vapour boilup V                    1.977
feed flow F                        1
feed composition z_f               0.5
number of trays N                  25
feed tray n_f                      12
relative volatility α              2
holdups M_b = M_n = M_d            0.5
bottom composition x_b             0.005
top composition x_d                0.995

Table 6.6: Steady-state operating parameters of the binary distillation column model.

The steady state mapping was identified using a sigmoid neural network (SNN) structure. The output of a one-hidden-layer SNN with H hidden nodes (see Algorithm 17) is given by

ŷ = Σ_{h=1}^{H} α_h g_h(u, β_h) + α_{H+1} (6.97)

g_h(u, β_h) = 1 / (1 + exp(−[Σ_{i=1}^{I} β_{h,i} u_i + β_{h,I+1}])) (6.98)

where u is the I dimensional input vector. The parameters of the network are contained in an H + 1 dimensional vector α and an H × (I + 1) dimensional matrix β. The gradients with respect to the inputs and the parameters are given by

∂ŷ/∂u_i = Σ_{h=1}^{H} α_h g_h(u, β_h) [1 − g_h(u, β_h)] β_{h,i} (6.99)

∂ŷ/∂α_h = g_h(u, β_h), ∂ŷ/∂α_{H+1} = 1 (6.100)

∂ŷ/∂β_{h,i} = α_h g_h(u, β_h) [1 − g_h(u, β_h)] u_i (6.101)

∂ŷ/∂β_{h,I+1} = α_h g_h(u, β_h) [1 − g_h(u, β_h)] (6.102)


where h = 1, 2, ..., H and i = 1, 2, ..., I. In the distillation column example, H = 6 was used.
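The network output (6.97)-(6.98) and the gradients (6.100)-(6.102) can be collected into a small Python routine; the small random initialization shown below is an assumption, in the spirit of the earlier pH example.

    import numpy as np

    def snn(u, alpha, beta):
        # one-hidden-layer sigmoid network, Eqs. (6.97)-(6.98)
        # u: (I,) input, alpha: (H+1,), beta: (H, I+1) with the bias in the last column
        a = beta[:, :-1] @ u + beta[:, -1]
        g = 1.0 / (1.0 + np.exp(-a))
        yhat = alpha[:-1] @ g + alpha[-1]
        # gradients with respect to the parameters, Eqs. (6.100)-(6.102)
        d_alpha = np.append(g, 1.0)
        d_beta = (alpha[:-1] * g * (1 - g))[:, None] * np.append(u, 1.0)[None, :]
        return yhat, d_alpha, d_beta

    H, I = 6, 2
    rng = np.random.default_rng(3)
    alpha = 0.1 * rng.normal(size=H + 1)
    beta = 0.1 * rng.normal(size=(H, I + 1))
    print(snn(np.array([1.477, 1.977]), alpha, beta)[0])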

In the parameter estimation, optimization under constraints was considered. Constraints were posed on the gain of the static mapping so that

∂ŷ/∂u_1 > 0 (6.103)

∂ŷ/∂u_2 < 0 (6.104)

was required. The constraints were evaluated at 625 points, forming a grid with regular intervals on the input space spanned by R and V: u_1 ∈ {0.95 u_1^ss, ..., 1.05 u_1^ss}, u_2 ∈ {0.95 u_2^ss, ..., 1.05 u_2^ss}, u_1^ss = 1.477, u_2^ss = 1.977. This results in 1250 constraint evaluations at each iteration. The sum of squared errors on the training set was then to be minimized under these constraints. The parameters were estimated using the Lagrange multipliers approach.

Analysis

The training data and the prediction of the identified model on the training data are illustrated in Fig. 6.11. The RMSE on training data was 0.6731. For reference purposes, the parameters were also estimated using the Levenberg-Marquardt method (no constraints). This resulted in an RMSE of 0.2048 on training data. Hence, a more accurate description of the training data points was obtained using the Levenberg-Marquardt method.

However, the examination of the static mapping shows a significant problem with the unconstrained model. Fig. 6.12 shows the mappings obtained in the constrained and unconstrained cases. In the unconstrained case, the static mapping is non-monotonic. This is due to the small amount of data and the mismatch in the structure of the plant and the model. The constrained case corresponds better to the a priori knowledge of the process behavior (monotonically increasing with respect to R, monotonically decreasing with respect to V).

Compared with [17], visual inspection of model predictions reveals that more accurate descriptions of the process were identified with the approach suggested here. This can be attributed mainly to the more flexible structure used for the static part (a power series was used in [17]). As pointed out in [17], however, a logarithmic transformation of the output measurement would provide a more reasonable resolution for real applications.


Figure 6.11: Prediction by the Hammerstein model identified under constraints on static gains. The upper plot shows the model inputs; the lower plot shows the plant response (solid line) and model responses: the intermediate variable (dotted line) and the prediction by the model (dash-dot line).


Figure 6.12: Static mapping in the constrained (solid lines) and unconstrained (dashed lines) cases. The mappings are shown as functions of V for three values of the reflux, R = 1.440, 1.477 and 1.514.

Results

In this example, a Hammerstein model for a MISO process was identified. Parameters were estimated under constraints, where the constraints were posed on the static mapping. The suggested approach enables posing constraints directly based on the a priori knowledge of the steady-state behavior. Typically, information such as minimum and maximum bounds of the plant output, knowledge on the sign or bounds of the gains, fixed equilibrium points, etc., is available. With linear dynamics, it is simple to pose constraints also on the dynamical part, such as bounds on the location of poles and zeros. This applies both for the Hammerstein and Wiener approaches. For purposes such as process control, clearly the constrained model can be expected to give better performance.

6.4.3 Two-tank system: Wiener modeling under constraints

As a final example, let us illustrate the identification of a two-tank process under constraints, using a Wiener structure.

Process

Consider a two-tank system [64], see Fig. 6.13. Mass-balance considerations lead to the following non-linear model:

dY_1(t)/dt = (1/A_1) (Q(t) − k_1 √(Y_1(t))) (6.105)


Figure 6.13: A two-tank system.

dY_2(t)/dt = (1/A_2) (k_1 √(Y_1(t)) − k_2 √(Y_2(t))) (6.106)

where Y_1 and Y_2 are the levels, A_1 and A_2 are the cross-surfaces, and k_1 and k_2 are the coefficients involved in the modeling of the restrictions of the two tanks, respectively. The following values were adopted: A_1 = A_2 = 1, k_1 = 1, k_2 = 0.9.
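With the adopted values, the model (6.105)-(6.106) can be simulated directly. In the Python sketch below, the Euler integration step and the constant inflow are illustrative assumptions.

    import numpy as np

    A1 = A2 = 1.0
    k1, k2 = 1.0, 0.9

    def two_tank_step(Y1, Y2, Q, dt=0.1):
        # one Euler step of Eqs. (6.105)-(6.106)
        dY1 = (Q - k1 * np.sqrt(max(Y1, 0.0))) / A1
        dY2 = (k1 * np.sqrt(max(Y1, 0.0)) - k2 * np.sqrt(max(Y2, 0.0))) / A2
        return Y1 + dt * dY1, Y2 + dt * dY2

    Y1 = Y2 = 0.0
    for _ in range(1000):
        Y1, Y2 = two_tank_step(Y1, Y2, Q=0.5)
    print(Y1, Y2)   # steady state: Y1 = (Q/k1)^2, Y2 = (Q/k2)^2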

Experiment design

The system was simulated using an input consisting of a pseudo random sequence. The output measurement was corrupted with a normally distributed random sequence with a variance equal to 0.04. From the simulations, a set of 398 measurements describing the behavior of the system was sampled using T = 1.


Model structure

A SISO, I = 1, Wiener model was constructed from the input flow, Q(t), to the level of the second tank, Y_2(t). Second order linear dynamics, N = M = 2, with a delay of one sample, d = 1, were considered. A sigmoid neural network with six hidden nodes, H = 6, was used to model the non-linear static behavior of the system.

Parameter estimation (under constraints)

A number of constraints were considered for the static part. These constraints were evaluated at C = 56 points Q_c ∈ {0.0, 0.02, 0.04, ..., 1.1}, c = 1, 2, ..., C.

Constraints on the output:

J_c(θ) = f(Q_c, θ) − Y_max; Y_max = 1.203 (6.107)

J_{C+c}(θ) = Y_min − f(Q_c, θ) (6.108)

Constraints on the static gain:

J_{2C+c}(θ) = ∂f(Q_c, θ)/∂Q_c − K_max; K_max = 2.5 (6.109)

J_{3C+c}(θ) = K_min − ∂f(Q_c, θ)/∂Q_c; K_min = 0 (6.110)

Fixed point in the origin, f(0, θ) = 0:

J_{4C+1}(θ) = f(0, θ) (6.111)

J_{4C+2}(θ) = −f(0, θ) (6.112)

In addition, the poles p_1 and p_2 were restricted to belong to the circle centered at the origin with radius p_s:

J_{4C+3}(θ) = |p_1(θ)| − p_s; p_s = 0.95 (6.113)

J_{4C+4}(θ) = |p_2(θ)| − p_s; p_s = 0.95 (6.114)


Figure 6.14: Performance on training data. Upper plot shows the input flow Q(t). Lower plot shows the level of the second tank, Y_2(t). Dots indicate the points contained in the training set. Solid lines show the corresponding predictions given by the constrained and unconstrained Wiener models.

Hence, a total number of 228 constraints were posed to the model.

Using the training data, the parameters were estimated under the constraints given by Eqs. (6.107)-(6.114). For comparison, the same data set was used for training a Wiener model with the same structure using the Levenberg-Marquardt method (without constraints). Figure 6.14 shows the performance of the Wiener models after training (8000 iterations). The results indicate that the information contained in the training set was well captured in both cases.

Figure 6.15 shows the static mappings provided by the two models. Inboth cases, the static mapping is accurate on the operating area for whichmeasurements were provided in the training set. However, extrapolationoutside the operating area contained in the training set gives poor resultswith the unconstrained model.

It is simple to include additional a priori information using the constraints. Figure 6.15 shows that the constraints posed on the output of the model, on the gain, and on the fixed point are satisfied by the Wiener model identified under constraints. At the same time, the prediction error on measured data is minimized. For the dynamic part, the following linear model


Figure 6.15: Non-linear static mappings identified by the Wiener models. Solid line shows the response of the static part of the Wiener model in the unconstrained case. Dotted line shows the behavior of the model identified under constraints. The circle indicates the equality constraint.

was identified:

ẑ(k) = 0.14 Q(k) + 1.30 ẑ(k − 1) − 0.44 ẑ(k − 2) (6.115)

with poles p_1 = 0.6476 + i0.143, p_2 = 0.6476 − i0.143, |p_1| = |p_2| = 0.6632. Thus the constraints on the dynamic part were fulfilled, too.

Figure 6.16 depicts the performance of the Wiener models on test set data. Note that in the test data the input varies in a wider range than in the training data. The performance of the unconstrained Wiener model is poor, whereas for the constrained Wiener model the performance is much better. All the prior information was captured by the constrained Wiener model. Note that the static output of the Wiener model was constrained to be always less than Y_max (the height of the second tank), which was not taken into account in the simulation of the plant, Eqs. (6.105)-(6.106), as shown in Fig. 6.16.

6.4.4 Conclusions

The application of Wiener and Hammerstein structures in the identificationof industrial processes was considered. Structures and associated parameterestimation methods were proposed, which resulted in a non-linear steady-state description of the process with dynamics identified as linear OE-typefilters.

In many cases, the dynamics of a non-linear process can be approximatedusing linear transfer functions, and the system non-linearities can be pre-


Figure 6.16: Performance on test data. Upper plot shows the input flow Q(t). Middle and lower plots show the level of the second tank, Y_2(t). Solid lines show the corresponding predictions given by the constrained Wiener model (middle plot) and unconstrained Wiener model (lower plot).


sented by a non-linear gain only. This provides many benefits in the form of robustness in dealing with the bias-variance dilemma, availability of the well-developed tools for handling both linear dynamic and non-linear static systems, and increased transparency of the plant description. In this section, examples of identifying a steady-state static plant model were presented, thus emphasizing the transparency aspects.

In industrial practice, it is common that the steady-state behavior of aprocess is much better known than its dynamic characteristics. With theapproach considered in the examples, it is simple to use this knowledge inthe initialization and validation of a black-box model. If a reliable steady-state model is available, it can be used as a white-box or grey-box staticmapping in the Wiener or Hammerstein structure. Furthermore, there werefew restrictions posed on the form of the static mapping; no specific propertiesof a certain paradigm were used. This enables a non-linear structure to bechosen depending on the application requirements (good transparency-fuzzysystems, high accuracy-neural networks, efficiency and speed-power series,expectable interpolation-piecewise linear systems, etc.). These properties areimportant from the practical point of view of process modeling. In addition,the identification of OE-type of linear dynamics was considered. This type ofmodel is more robust towards noisy measurements, and particularly suitablefor long-range simulation purposes.


Part II

Control


Chapter 7

Predictive Control

7.1 Introduction to model-based control

Models are a basic tool in modern process control. Explicit models are required by many of the modern control methods, or models are needed during control design. In the control of non-linear processes the role of models is even more emphasized. In the model-based approaches, the controller can be seen as an algorithm operating on a model of the process (subject to disturbances), and optimized in order to reach given control design objectives.

In modeling, the choice of both the model structure and the associated parameter estimation techniques is constrained by the function approximation and interpolation capabilities (e.g., linear approximations, smoothness of non-linearities, a priori information). From the control design point of view, the need for convenient ways to characterize a desired closed-loop performance gives additional restrictions (e.g., existence of derivatives and analytic solutions). In addition, many other properties may be of importance (handling of uncertainties, non-ideal sampling, data fusion, tuning, transparency, etc.). Clearly, the choice of a modeling method is of essential importance, and therefore a large part of this book has been devoted to explaining the various approaches.

In some cases, the behavior of the process operator is modeled (common,e.g., in fuzzy control), or a model of a control-oriented cost function is di-rectly desired (e.g., in some passivity-based control approaches). Usually,however, the characterization of the input-output behavior of the process(or the closed-loop control relevant characteristics) is the target of modeling(on/off-line, in open/closed-loop, etc.).

The theory of modeling and control of linear systems is well-developed. Inthe control of non-linear systems, a common approach has been to consider



a non-linear model, to linearize it around an operating point, and to design a controller based on the linear description. This is simple and efficient, fits well to most regulation problems, and can be seen as gain scheduling or indirect adaptive control. In particular, linear approaches are difficult to beat in the analysis of dynamical systems. For servo problems, fully non-linear approaches have been considered, based on the properties of known non-linearities or on the exploitation of raw computing power (e.g., non-linear predictive control).

Predictive control is a model-based control approach that uses explicitlya process model in order to determine the control actions. In this chapter,the predictive control approach will be discussed for the case of linear SISOmodels.

7.2 The basic idea

Predictive controllers are based on a guess, a prediction, of the future behavior of the process, forecasted using a model of the process. There exists a multitude of predictive control schemes, which all have four major features in common:

1. A model of the process to be controlled. The model is used to predictthe process output, with given inputs, over the prediction horizon.

2. A criterion function (usually quadratic) that is minimized in orderto obtain the optimal controller output sequence over the predictedhorizon.

3. A reference trajectory for the process output, i.e. a sequence of desiredfuture outputs.

4. A minimization procedure.

The basic concept of predictive control is simple. A predictive controller calculates such a future controller output sequence that the predicted output of the process is close to the desired process output. Predictive controllers use the receding horizon principle: only the first element of the controller output sequence is applied to control the process, and the whole procedure is repeated at the next sample.

Any model that describes the relationship between the input and theoutput of the process can be used, including disturbance models, non-linear


models, or constrained models. The approach can also be extended for mul-tivariable control. Calculation of the controller output sequence is an opti-mization (minimization) problem. In general, solving requires an iterativeprocedure. Although many types of models can be considered, a majorproblem in deriving predictive controllers for non-linear process models isthe non-linear optimization problem that must be solved at every sample.The way this problem is solved depends on the type of non-linearity of theprocess model. However, if:

¯ the criterion is quadratic,

¯ the model is linear, and

¯ there are no constraints,

then an analytical solution is available. The resulting controller is linear andtime-invariant if the model is time-invariant. This appealing case will beconsidered in the following sections.

Example 38 (Car driver) Consider the process of driving a car. Thisprocess can be assimilated to a SISO system where the input is the variationof the position of the steering wheel towards a given fixed point of dashboard. The output is the position of the car with respect to the direction ofthe road ahead. At each sampling instant the driver of the car calculates thevariation of the control variable and implements it, based on his observationsof the road and the traffic ahead (to see further than the end of one’s nose)and his prediction of the behavior of the car. This procedure is repeated ateach sampling period which depends on the driver.

7.3 Linear quadratic predictive control

In this section, the state space formulation is adopted (see, e.g., [69] [83] [96]).Remember, that a transfer function model can always be converted into astate space form; in fact, for each transfer function, there is an infinite numberof state space representations (see Appendix A for a brief recap on statespace models). First, the state space model and the principle of certaintyequivalence control are introduced. The/-step-ahead predictors for the modelin state space form will be derived. A simple quadratic cost function is thenformulated and the optimal solution minimizing the cost function is derived.Finally, the issues of control horizon, integral control action, state estimationand closed-loop behavior are briefly discussed.

TLFeBOOK

Page 194: Advanced Process Identification & Control

184 CHAPTER 7. PREDICTIVE CONTROL

7.3.1 Plant and model

Let a SISO system (plant, process) be described by a state-space model

x(k+ 1) = Ax(k) ÷Bu(k) (7.1)y(k) = Cx(k) (7.2)

where

x is the state vector (n × 1),

u is the system input (controller output) (1 x

y is the system output (measured) (1 x

A is the state transition matrix (n x n)

B is the input transition vector (n x 1)

C is the state observer vector (1 x n)

Let us assume that a model (approximation) for the system is :known andgiven by _~, ~ and ~, and that the states x and output y are measurable.

In the certainty equivalence control, the uncertainty in the parameters isnot consider~.ed; the estimated parameters are used as if they were the trueones (A ~-A, B ~-~, C ~-~). Thus, in what follows, we allow ourselves simplify the notation by dropping out the ’hats’.

The target is to find the control input u (k) so that the desired controlobjectives are fulfilled. The objectives concern the future behavior of theprocess, from the next-to-current state up to the prediction horizon, H~,. Theprediction horizon is generally chosen to be at least equal to the equivalenttime delay (the maximum time delay augmented by the number of unstablezeros). Let the cost function (to be minimized) be given

J=~-~,(w(k+i)-~(k+i))2+ru(k+i-1)2 (7.3)i=1

where w (k + i) is the desired system output at instant k + i. r is a scalarwhich can be used for balancing the relative importance of the two squaredterms in (7.3). The minimization

{u(k),...,u(k + H~,- 1)} = arg min (7.4)u(k),...,u(kWHp -1)

gives a sequence of future controls {u (k), u (k + 1), ..., u (k + H~, -- 1) first value of the sequence (u (k)) is applied to control the system, at control instant the optimization is repeated (receding horizon control).

TLFeBOOK

Page 195: Advanced Process Identification & Control

7.3. LINEAR QUADRATIC PREDICTIVE CONTROL 185

7.3.2 /-step ahead predictions

Let us consider the/-step ahead predictions. At instant k, the measuredstate vector x (k) is available. For future values of x, the model has to used. The prediction for y (k + 1), based on information at k, is given

ff(k + 1) = C lax (k) + Bu (7.5)

For y (k + 2) we have

ff(k + 2) = C [Ax (k + 1) Bu(k + 1) (7.6)

where the estimate for x (k + 1) can be obtained using the model, x (k + 1) Ax (k) + Bu (k). Substituting this gives

ff(k + 2) = C[A[Ax(k) +Bu(k)] + 1)1 (7.7)

= CA2x(k) + CABu(k) + CBu(k+ (7.8)

In a similar way we have that

ff(k + 3) = CAax(k) + CA~Bu(k) + CABu(k+ 1) + CBu(k +

and, by induction, for the/-step ahead predictioni

+ i) = CA’x (k) + Ch - Bu + j - (7.10)j=l

Let us use a more compact matrix notation. Collect the predicted systemoutputs, the system inputs, and the desired future outputs at instant k intovectors of size (H~, x 1):

ff(k + 1) = [~’(k + 1),... ,~’(k + r (7.11)

u(k) = [u(k),..-,u(k+Hv-l)] T (7.12)

w(k+l) [w (k+l),...,w(k+g,,)]T (7.13)

The future predictions can be calculated from

~(k + 1)= Zchx (k)+ ZchBu

where

KCA

KCAB

CA

CAH,,

CB

CAH,,-1B

(7.14)

(7.15)

".. " (7.16)¯ .- CB

TLFeBOOK

Page 196: Advanced Process Identification & Control

186 CHAPTER 7. PREDICTIVE CONTROL

7.3.3 Cost function

The cost function (7.3) can be expressed in a vector form

J = (w(k+l)-~(k÷l))T(w(k+l)-~(k+l)) (7.17)

where R = rI. The solution for u minimizing J is given by

T KcTA~ (w (k + 1) - KcAx (k)) u(k) [R~- Kch, KCAB]-1

Proof. Let us simplify the notations by dropping out the sample indexesk related to time. Minimization can be done analytically by setting thederivative o~ = 0. The derivative is given by

OJ 0 (w- y)

° (w - y) (w-

For the partial derivatives we get

0 (w- ~) = 0 --~uuy = --KcAn

(7.19)

(7.20)

0 uTRu = R; ~ = I (7.21)

Thus, the derivative (7.19) can be written

0__~J = _2K~AB (W- p) + 2RTu(7.22)

0U

Setting the derivative to zero and substituting the vector of future predictionsfrom (7.14) we have

(7.23)K~AB (W -- KCAX -- KCABU) ~- RTu

Solving for u gives the optimal control sequence (7.18).Let us introduce a gain matrix K:

T -IK = [R + KcABKCAB] K~An (7.24)

TLFeBOOK

Page 197: Advanced Process Identification & Control

7.3. LINEAR QUADRATIC PREDICTIVE CONTROL 187

Denote the first row of K by K1. Since only the first element of the opti-mal sequence is applied to the process, the on-line control computations arereduced to

u(k) = K1 (w(k+ 1)- KcAx(k)) (7.25)

If the system parameters, A, B and C, are constant, the gain matrices K1and KCA can be computed beforehand.

7.3.4 Remarks

In many cases, it is useful to consider an additional parameter in the tuning ofthe predictive controller, the control horizon. The control horizon H,, specifiesthe allowed number of changes in the control signal during optimization, i.e.

Au(k+i) = fo r i >_H~ (7.26)where A ---- 1 - q-~.

A simple way to implement the control horizon is to. modify the KCABmatrix. Let us decompose the matrix in two parts. The first part, K"CAB~contains the first Hc - 1 columns from the left of the KCAB matrix. Thesecond part, vector Kb

CAB, sunIs row-wise the remaining elements of the KCAB

matrix, i.e.

(7.27)

where k~b and ki,j are the elements (i th row and jth column) of the K~AB andKCAB matrices. The new KCAB matrix is then formed by

K AB = (7.28)

In practice, it is useful to introduce also a minimum horizon, whichspecifies the beginning of the horizon to be used in the cost function, ioe.J = Y’]~=H~, (’) in (7.3). A simple implementation can be done by removingthe first H,~ - 1 rows from KCA and KCAB in (7.15) and (7.16), respectively.

Notice, that there is no integral action present. Thus, in the case ofmodeling errors, a steady state error may occur. A simple.way to include anintegral term to the controller is to use an augmented state space model, withan additional state constructed of the integral-of-error, xI (k) = I (k - 1)y (k) - ~ (k). This state then has a gain kz from the augmented state x~ the controller output u.

TLFeBOOK

Page 198: Advanced Process Identification & Control

188 CHAPTER 7. PREDICTIVE CONTROL

In general, the states x are not directly measurable. When noise is notpresent an observer is used for state "recovering". In the presence of noise,a Kalman filter can be used to estimate the states (see Chapter 3). Providedthat the covariances of the input and output noises are available or can beestimated, a state estimate minimizing the variance of the state ~timationerror can then be constructed. The Kalman filter uses both the system model(A, B, C) and system input-output measurements u, y in order to providean optimal state estimate.

The behavior of this dynamic system under the feedback, that is simplya function which maps the state space into the space of control variables, isanalyzed in the next subsection.

7.3.5 Closed-loop behavior

In order to analyze the behavior of the closed-loop system, let t~s derive itscharacteristic function. Taking into account the control strategy (7.25), fromthe state-space model (7.1)-(7.2) we derive the relation between the outputy(k) and the desired system output w(k) [ 1 ... = w (k): Substitute(7.25) to x(k + 1) in (7.1) with k ~-

x(k) -- Ax(k- 1) + BK1 (w(k) - KcAx(k- (7.29)

Reorganizing gives

x(k) = [I- q-1 (A- BK1KcA)]-1BKl w(k)

Substituting to (7.2) gives the relation between y (k) and w

(7.30)

y(k) = C [I- q-1 (A- BK1KcA)]-11: w (k)i

(7.31)

and the characteristic polynomial

det [I - q-~ (A - BKIKcA)] (7.32)

Example 39 (Characteristic polynomial) Let a process be described the following transfer function

0.1989q-3y (k) = 1 - 0.9732q-~ u (k) (7.33)

TLFeBOOK

Page 199: Advanced Process Identification & Control

7. 4. GENERALIZED PREDICTIVE CONTROL 189

(this example is discussed in more detail at the end of this chapter). Theequivalent control canonical state-space presentation is given by

0.97 0 0] 1A-- 1 0 0] ;B= 0 ;C= [0 0 0.1989] (7.34)

0 1 0 0

Let us design a predictive controller using Hpa gain vector

-- 5 and r = 1. This results to

K1-[0 0 0.1799 0.1623 0.1514] (7.35)

and

0 0.1989 0 10.1989 0 0

KCA = 0.1929 0 0 (7.36)0.1871 0 00.1815 0 0

The matrix A - BKIKcA is given by

0.8774 0 01 0 0 (7.37)0 1 0

and the characteristic polynomial will be 1 - 0.8774q-~. For r = 0.01, whichpenalizes less the control actions, the characteristic polynomial will be 1 -0.1692q-~’, a much faster response.

Note that the control strategy (7.18) associated with the cost function(7.17) is linear towards the system input, output and the desired output. can be easily expressed in the R-S-T-form:

R (q-’)u(k) = S (q-~)y(k) + T (q-~) (7.38)In the next section, the approach of generalized predictive control is con-

sidered, where a disturbance model is included in the plant description.

7.4 Generalized predictive control

An appealing formulation called generalized predictive control (GPC) of long-range predictive control was derived by Clarke and co-workers [13]. It rep-resents a unification of many long-range predictive control algorithms (ID-COM [79], DMC [14]) and a Computationally simple approach. In the GPC,

TLFeBOOK

Page 200: Advanced Process Identification & Control

190 CHAPTER 7. PREDICTIVE CONTROL

an ARMAX/ARIMAX representation of the plant is used. In what follows,/-step-ahead predictors for the ARMAX/ARIMAX model in state space formwill be derived, a cost function formulated and the optimal solution mini-mizing the cost function derived. In the next section, a simulation exampleillustrates the performance and tuning of the GPC controller.

7.4.1 ARMAX/ARIMAX model

Recall the ARMAX and ARIMAX structures from Chapter 3. An AR-MAX/ARIMAX model in the polynomial form is given by:

F(q-1) y(k)= B(q-~)v(k)+C(q-~)e(k) (7.39)

where fj, bj and ci are the coefficients of the polynomials.F (q-l), B (q-l) C (q-l), j = 1, 2, ..., n. For notational convenience, without loss of generality,we assume that the polynomials are all of order n; F (q-~) and C (q-l) monic, and b0 = 0. Substituting v(k) u(k) and F(-~) ÷-- A(q- 1)

in (7.39) gives the ARMAX model, and substituting v (k) ~ Au(k) F (q-l) ~__ AA (q-I) gives the ARIMAX model structure. In w:hat follows,we denote the controller output by v (k). In the ARIMAX case, the finalcontroller output to be applied to the plant will be u (k) = u (k - 1)+Au

The ARMAX/ARIMAX model can be represented in the state-spaceform~ as

and

x(k+l) = hx(k)+n,(k)+Ge(~) (7.40)y(k) = Cx(~)+e(k) (7.41)

The relation between the state-space description and input-output description is given

B (q__) _- cT [qI - -~ B"C (q)F (q) ’ F (q)-- -- CT[qI- A] -~ G ÷ 1

F(q) -- det[qI-A] ;B(q)=CTadj[qI-AIB

C(q) = CTadj[qI-A](~+det[qI-A]

Note that the polynomials are given in terms of the feedforward operator q.

TLFeBOOK

Page 201: Advanced Process Identification & Control

7.4. GENERALIZED PREDICTIVE CONTROL 191

where

-f~ 10.-. 0-f2 0 1 0

-fn-~ 0 0 I-fn 0 0 ..-0

(7.42)

B= [bl b= .-. bn-1 bn]T (7.43)

c = [ 1 0 ... 0 ] (7.45)

If the coefficients of the polynomials F (q-l) and B (q-l) are unknown, can be obtained through identification (see previous chapters). An estimateof C (q-l) may also be identified. On can also consider estimating the ma-trices A, B and C (and G) directly from input-output data using subspacemethods [4S11541.

7.4.2 /-step-ahead predictions

The prediction is simple to derive. Let us consider a 1-step-ahead prediction

y(k+l) = Cx(k + 1) +e(k + (7.46)= C[Ax(k)+Bv(k)+Ge(k)]+e(k+l) (7.47)= C(A-GC)x(k)+CBv(k)+CGy(k) (7.48)

+~(k + 1)

where the last equality is obtained by substituting e (k) = y (k) - Cx from (7.41) and future noise is not known but assumed zero mean. The1-step-ahead predictor becomes2

~(k + I)= C(A-GC)x(k)+ CBv(k)

2The task is to find ~(k + I)

(7.49)

TLFeBOOK

Page 202: Advanced Process Identification & Control

192 CHAPTER 7. PREDICTIVE CONTROL

Similarly, for the 2-step-ahead prediction, we have

v(~+2) = Cx (k + 2) + e (k + (7.50)= c [Ax(k + 1) Bv(k + 1)+ Ge(k + 1)1 (7.51)

+e(~+2)= C[A[Ax(k)+Bv(k)+Ge(k)]+Bv(k+].)] (7.52)

+CGe(k + 1)+ e(k +2)and the 2-step ahead predictor becomes~

~(k + 2) = CA[A- GC]x(k) + CABv (k) + CBv(k + 1)

+ChGy(~)By induction, we have the following formula for an/-step-ahead prediction

~(k + i) = CA~-~Bv (k + j - 1) (7.54)k j=l

+c~~-~ [~- ~cl x+CA~-~G~ (~)

E{[y(k+ 1)-~]2}

= E{[C(A-GC)x(k)+CBv(k)+CGy(k)+e(k+I)-~2}

= E { [C (A - GC) x (k) + CBv (k) + CGy (k)

+2 [C (A - GC) x (k) + CBv (k) + CGy (k) - y~ e

+e2 (k + 1)}

= E{[C(A-(]C)x(k)+CBv(k)+CGy(k)-~] 2} +E{e2(k+ 1)}

since e (k + 1) does not correlate with x (k), v (k), y (k) or ~. The minimum is when the first term is zero, i.e. (7.49).

3Proceeding in the same way as with the 1-step ahead predictor, we hmze

2}= E {[CA (A - GC) x (k) + CABv (k) + CBv (k + 1) + CAGy (k)

since e (k + 1) and e (k + 2) do not correlate with x (k), v (k), y (k), ~ or with each The ~i~ce is is ~nimized when (7.53) holds.

TLFeBOOK

Page 203: Advanced Process Identification & Control

7.4. GENERALIZED PREDICTIVE CONTROL 193

Let us use a more compact matrix notation. Collect the predicted syste~noutputs, the system inputs, and the desired future outputs at instant k intovectors of size (Hp x 1):

~(k+l) = [~(k+l),-..,~(k+Hp)]T (7.55)

v(k) = Iv(k),... ,v(k+H,- T (7.56)

w(k+l) -- [w(k+l),...,w(k+gp)]T (7.57)

The future predictions can be calculated from

~ (k + 1) = KcnGcX (k) + Kcn~v (k) + KcnGy

where

KCAGC

KCAB

KCAG

C[A- GC]:

CA"~-~ [A - GC]CB ... 0

CAHp-IB ... CB

TCG:

CAHp-IG

(7.58)

(7.59)

(7.60)

(7.61)

7.4.3 Cost function

Let us minimize the following cost function, expressed in a vector form

J -- (w(k+ 1)- ~(k + TQ(w(k + 1) - ~(k+ 1)) (7.6-~-VT (k) RV (k)

where Q = diag[ql,..-,qgp] and R = diag[rl,... ,rHp]. Notice that ifv (k) ~-- Au (k), the control costs are taken on the increments of the controlaction, whereas if v (k) ~-- u (k), the costs are on the absolute values of control, as in (7.17). The introduction of diagonal weighting matrices Q andR enables the weighting of the terms in the cost function also with respectto their appearance in time.

The optimal sequence is given by

v (k) = [R + K~ABQKcnB]-1 KcTABQX

(w (k + 1) - KCA~CX (k) KcnGY (k))

TLFeBOOK

Page 204: Advanced Process Identification & Control

194 CHAPTER 7. PREDICTIVE CONTROL

Proof. Let us simplify the notations by dropping out the sample indexesk. Minimization can be done analytically by setting the derivative oJThe derivative is given by

o~ = (w- p)~q (w-

- (w-~)

+ v 0-~-~uRV + ~vvTM Rv

(7.64)

For the partial derivatives we get

0 (w- p) = 0 -~vvy ---- --KcAB

(7.65)

0 vTRv = R; ~v = I (7.66)

Thus, the derivative can be written as

OJ0--~ ’= --2K~ABQ (w -- ~) + 2RTv

(7.67)

Setting the derivative to zero and substituting the vector of future predictionsfrom (7.58), we have

TKcABQ (W -- KCAGCX -- KCABV -- KCAGY) = RTv (7.68)

Solving for v gives the optimal control sequence (7.63). Let us introduce a gain matrix K:

K = [R + KcTABQKcAB] -1 K~A~Q (7.69)

and denote the first row of K by K~. Since only the first element of theoptimal sequence is applied to the process, the on-line control computationsare reduced to

v(k) = K1 [w(k + 1) Kcn~cx(k) -- KcAGy(k)] (7.70)

If the system parameters, A, B, G, and C, are constant, the gain matricesK~, Kch~c and KCA~, can be computed beforehand.

TLFeBOOK

Page 205: Advanced Process Identification & Control

7. 4. GENERALIZED PREDICTIVE CONTROL 195

7.4.4 Remarks

The disturbance model in the ARIMAX structure

c(q-1) (7.71)d)+ A(q_l)A(q_l)e(k)

allows a versatile design of disturbance control in predictive control. In par-ticular:

¯ with C (q-l) __ A (q-l) (q-l), the approach reduces tothat of section7.3, with no integral action;

¯ with C (q-l) = A (q-l), a pure integral control of disturbances is tained (noise characteristics -~);

¯ with C (q-l) = C~ (q-~), an ARIMAX model with noise characteristicsc_a_ is obtained;AA

¯ with C (q-~) A (q-~) C~(q-~), an ARMAX model wit h noise chateristics Ca is obtained;A

¯ with C (q-~) A (q-l) A (q -l) C1 (q-l), an arbitrary FIRfilt er can designed for the noise (no integral action); etc.

Since the controller is operating on Au, the control horizon is simple toimplement. A control horizon H,: is obtained when only the first Hc columnsof the matrix KCAB in (7.63) are used. Accordingly, the control weightingmatrix R, associated with future vs, has to be adjusted by specifying onlythe first Hc rows and columns. The future control increments: v (k + He),v (k + H~ + 1), ... are then assumed to be equal to zero. H~ -- 1 resultsin mean-level control, where the optimization seeks for a constant controlinput (only one change in u allowed), which minimizes the difference betweentargets w and predictions ~ in the given horizon. With large Hp, the plantis driven to a constant reference trajectory (in the absence of disturbances)with the same dynamics as the open-loop plant.

A minimum horizon specifies the beginning of the horizon to be used inthe cost function. If the plant model has a dead time of d [assuming that b0is nonzero in (7.71)], then only the predicted outputs at k + d, k + d + 1, ...are affected by a change in u (k). Thus, the calculation of earlier predictionswould be unnecessary. If d is not known, or is variable, H~,~ can be set to 1. Asimple implementation can be done by removing the first H~,, - 1 rows fromKcAcc, KCAB and KCAG in (7.59)- (7.61). The corresponding (first H~n rows and columns of the weighting matrix Q need to be removed, too. With

TLFeBOOK

Page 206: Advanced Process Identification & Control

196 CHAPTER 7. PREDICTIVE CONTROL

Hc = nA -b 1, Hp = nA q- nB+ 1, H,n =nB + 1 a dead-beat control [8] results,where the output of the process is driven to a constant reference trajectoryin nB -b 1 samples, nA -b 1 controller outputs are required to do so. The GPCrepresents an unification of many long-range predictive control algorithms,as well as a computationally simple approach. For example the generalizedminimum variance controller corresponds to the GPC in which both the Hlnand Hp are set equal to time delay and only one control signal is weighted..

In some cases it is more relevant to consider a cost function with weightson the non-incremental control input

J =+u (k)T (k)

(7.72)

The above equations are still valid with substitutions F (q-l) ~__ A (q-l)and v (k) ~-- u (k) (ARMAX structure). This is a good choice, e.g., if theprocess already includes an integrator in itself. Note, that the control horizonis then implemented as by (7.27) and (7.28).

The ARMAX/ARIMAX model can be seen as a fixed gain state observer.For the noise, we always have e (k) = y (k) - Cx (k). In general, the states are not known (not measured, or there is noise in the measurements). Usingthe state-space model (7.40)-(7.41), a prediction ~ (k) of the state x (k), y and u up to and including instant k - 1, can be written as

~(k) = [A-GC]~(k- 1)+ By(k- 1)+ Gy(k- 1) (7.73)

or, equivalently,

~(k)=A~.(k-1)÷Bv(k-1)+(~[y(k-1)-t2~(k-1)] (7.74)

The prediction ~ (k) is then used for x (k) in the GPC equations.The above observer is also called an asymptotic state estimate [69], an

estimate where the optimal estimate tends to when time tends to infinity.An optimal estimate of the state can be obtained from a Kalman filter:

~(k) = [A-GC]~(k-1)+Bv(k-1)+Gy(k-1)d-K(k)[y(k)-~(k)]

(7.75).

where

~(k)=C(A-(~C)~(k-1).+CBv(k-1)+CGy(k-1) (7.76)

TLFeBOOK

Page 207: Advanced Process Identification & Control

7.5. SIMULATION EXAMPLE 197

and the Kalman filter gain vector is obtained from the following recursiveequations

K(k) (A - GC) P (k - 1)(A T × (7.77)

[Y + C (A - GC)e (k - 1)(A T cT]-I C

P(k) = (A-GC)P(k-I)(A-GC)T

-K (k) C (A- GC)P (k- 1) (A- T(7.78)

where the initial condition is P (0), the covariance matrix of the initial

state estimation error: P (0) = E {(x(0) - ~(0))(x(0) - ~(0))T~ and is the variance of e (k). The asymptotic estimate is obtained when limk-~K (k + 1) = 0, which is true if the eigenvalues of the matrix (A - GC) less than one.

7".4.5 Closed-loop behavior

The GPC control strategy is a linear combination of the system input, outputand the desired output. It can be expressed in the R-S-T-form. As forthe linear quadratic predictive controller, the characteristic function can bederived, proceeding in a similar way as in section 7.3.5. The controller is givenby (7.70). Substituting (7.41) for y (k) in (7.70), substituting the result (7.40), regrouping and solving for x (k) and using (7.41) again, we

y(k) = {I-q -1 [A- BK1 (KcAGC ~- KCAGC)]} -1 (7.79)

× BK~ : w(k)+(BK~KcAG+G)e(k-1)+e(k)

and the characteristic polynomial is given by

Get{I-q-I[A-BKI(KcAac+KcAGC)]} (7.80)

The next subsection is dedicated to a control problem originating froman industrial process.

7.5 Simulation example

Let us consider an example of the control of a fluidized-bed combustor (seeAppendix B).

TLFeBOOK

Page 208: Advanced Process Identification & Control

198 CHAPTER 7. PREDICTIVE CONTROL

Consider a nominal steady-state point given by Qc = 2.6 k~ (fuel feedrate), F1 -- 3 1N’’’3 (primary air flow) and F2 Nm3 (secondary airflow). The following linearized and discretized description between combus-tion power and fuel feed is obtained from the plant model using a samplingtime of 10 seconds:

0.1989q-3P (k) = 1 - 0.9732q-1Vc (k) (7.81)

Assuming an ARIMAX-model structure with C (q-l) = A (q-l) (integratingoutput measurement noise) we have the following state-space model for thesystem

x(k+l) Ax (k)+BAu(k)+Ge(k) (7.82)

y(k) -- Cx(k)+e(k) (7.83)

where y ~ P, u ,:-- Qc, and the matrices are given by

I1.9732 1 0-0.9732 0 10.0000 0 0

c [1 o 0],a--

0.0000, B = 0.0000

0.1989

1 3-0.9732 to.oooo j

(7.84)

(7.85)

Let us first design a mean-level controller: Hc = 1 (control horizon),Hp = 360 (large prediction horizon corresponding to 1 hour of operation).The gain matrices are then given by

KCAB

KCAGC

000.19890.3925:

7.4212

0.97321.92032.84213.7393:

36.3114

,Kcha ---- 1 I:

i1

1 01.9732 12.9203 1.97323.8421 2.9203: :

37.3113 37.3113

(7.86)

(7.87)

TLFeBOOK

Page 209: Advanced Process Identification & Control

7.5. SIMULATION EXAMPLE 199

26

24

18

10 20 30 40 50 60 7O

3.2

3

2.8

2.6

(:Y 2.4

2.2

2

1.~10 20 30 40 50 60 70

t [mini

Figure 7.1: Mean-level control. Hc = 1, H,,l = 3, Hp = 360, R = 0, Q = I.The upper plot shows the combustion power, P [MW], controlled by the fuelfeed rate Qc ~

Hm can be given as equal to the time delay, H,~ = 3.

The ’ideal’ mean-level control result (using weighting matrices Q = and R = 0) is shown in Fig. 7.1, where the linear model (7.81) is used the process to be controlled. In mean-level control, the plant has open loopdynamics, the closed loop characteristic polynomial, (7.80), is 1 - 0.97q-1.

A tighter control can be obtained by reducing the length of the predictionhorizon (Hp = 30 in Fig. 7.2,) and/or increasing the control horizon (Ho 30, Hc = 2 in Fig. 7.3). The characteristic polynomials are given by 1 0.93q-1 and 1, respectively. Notice, however, that in the latter simulationthe control signal is bounded, whereas the computation of the characteristicpolynomial was based on an (unconstrained) linear model.

Figure 7.4 shows a more realistic simulation, where the differential equa-tion model was used for simulating the plant. Measurement noise with a

TLFeBOOK

Page 210: Advanced Process Identification & Control

200 CHAPTER 7. PREDICTIVE CONTROL

26

24

22

20

18

10 20 30 40 50I

60 70

T I"

3.5

3

2.5

2

1.5

1

0"50 I 0 20

"[ T

.I I.

30 40 50 60’ 70t [minl

Figure 7.2: A typical GPC setting. Hp = 30, see Fig. 7.1 for other details.

TLFeBOOK

Page 211: Advanced Process Identification & Control

7.5. SIMULATION EXAMPLE 201

26

24

22

20

18

16~ 10 20 30 50 60 70

5

4

2

1

10 30 40 50 60 70t [mini

Figure 7.3: Dead-beat type of setting. H~, = 30, H,: = 2 (see Fig. 7.1 for

other details). Note that the input was constrained on the range [0.5, 5].

TLFeBOOK

Page 212: Advanced Process Identification & Control

202 CHAPTER 7. PREDICT1-VE CONTROL

26

24

22

20

18

10 20 30 40 50 60 70

3.5

3

~2.5

2

1.5

10 10 20 30 40 50 60 70

t [mini

Figure 7.4: GPC control. Hc = 1, H,,l = 3, Hp = 30, R = 100I, Q = I. Theupper plot shows the combustion power, P [MW], controlled by the fuel feedrate Qc[~]. The plant was simulated using the differential equation model,with output noise N(0, 0.21). An unmeasured 25% heat value loss affects theprocess at t = 55 rain.

standard deviation of 1% of the nominal value was added to the: output. Inaddition, an unmeasured disturbance (25% step-wise drop in fuel power)affects the simulated process at t = 55 rain. An ARIMAX model withC (q-l) = 1 - 0.9q-1 was designed for disturbance rejection. In addition,a nonzero control weight was used, R = 100I to reduce jitter in the con-troller output.

TLFeBOOK

Page 213: Advanced Process Identification & Control

Chapter 8

Multivariable Systems

In this chapter, the control of linear multivariable systems is considered.First, the design of a MIMO control system is reduced to several SISO de-sign problems. The relative gain array (RGA) method aims at helping choose suitable pairs of control and controlled variables. If the interactionsbetween the variables are strong, the system may not be satisfactorily con-trolled by SISO controllers only. In this case the interactions can be activelyreduced by decouplers and the control of the decoupled system can then bedesigned using SISO methods. Decoupling is considered in the second sec-tion, and a simple multivariable PI controller (MPI) based on decoupling both low and high frequencies is presented. The third approach considered inthis section is a ’true’ multivariable control approach. The design of a multi-variable generalized predictive controller (MGPC) is considered, which solvesthe MIMO control design problem by minimizing a quadratic cost function.Simulation examples conclude this chapter.

All methods are based on models of the system. However, only steady-state gains are required by the RGA method; steady-state and high frequencygains by the MPI approach. These can be determined experimentally byusing relatively simple plant experiments. The MGPC approach requires adynamic model of the MIMO system, the identification of which may be amore laborious task and require more extensive experimenting with the plant.

For MIMO systems, the state-space fortnulation is simpler than, e.g., thatof polynomial matrices. Therefore, state-space models are assumed in whatfollows. In the case of MGPC, the conversion of a system model from apolynomial matrix form to a state-sp~ce form is also considered.

203

TLFeBOOK

Page 214: Advanced Process Identification & Control

204 CHAPTER 8. MULTIVARIABLE SYSTEMS

8.1 Relative gain array method

For processes with N controlled outputs and N manipulated variables, thereare N! different ways to select input-output pairs for SISO control loops. Oneway to select the ’best’ possible SISO controllers among the configurations, isto consider all the N! loops and select those input-output pairs thai; minimizethe amount of interaction between the SISO controllers. This is the relativegain array (RGA) method, also known as Bristol’s method (see, e.g., [90],pp. 494-503).

The RGA method tries to minimize the interactions between SISO loops,by selecting an appropriate pairing. It does not eliminate the interactions,it merely tries to minimize the effect. It only relies upon steady-state infor-mation. If dynamic interactions are more important than those occurring atsteady-state, then clearly RGA is not a good method for such systems.

8.1.1 The basic idea

Consider a stable N-input N-output process. Let us define a relative gainbetween an output Yo (o = 1, 2, ..., O) and a manipulated variable u~ (i 1, 2, ..., I) (O = I = N)

au, j ~ co,,~,a,,t v~#,(8.1)

L AUi ] Y~ constant Vk~o

where the notation ’u~ co~tant Vk ~ i’ denot~ that the valu~ of the ma-nip~ated v~iabl~ other th~ u{ are kept co~tant. Sillily ’y~ co~tantVk # o’ denot~ that M1 outputs except the o’th one ~e kept constant bysome control loops. The, the numerator in (8.1) is the open-loop ste~y-state gain of the system (the difference betw~n i~tial and final steady-stat~in output o, divided by the amplitude of the step change in input i). Thedeno~nator in (8.1) is the closed-loop steady-state g~n, where all other out-puts except the o’th one are controlled using a controller w~ch eli~nat~steady-state error (e.g., a PI-controller). The ratio of the two g~ns defin~the relative gain Ao,~.

The value of Ao,i is a ~ef~ me~e of interaction. In partic~ar (s~[521):

1. If Ao,i = 1, the output Yo is completely decoupled from ~l other inputsth~ the i’th one. T~s p~ of v~iabl~ is a perfect choice for SISOcontrol.

TLFeBOOK

Page 215: Advanced Process Identification & Control

8.1. RELATIVE GAIN ARRAY METHOD 205

If 0 < Ao,~ < 1, there is interaction between the output yo and inputvariables other than ui. The smaller the Ao,~, the smaller the interactionbetween output yo and input u~.

If ~o,~ = 0, then output Yo does not respond to changes in input u~.Consequently, the input u~ can not be used to control the o’th output.

If ~o# < 0, then the gains of the open- and the closed-loop systems havedifferent signs. This is dangerous, as the system is only conditionallystable1.

5. If Ao# > 1, the open-loop gain is greater than closed-loop gain. Thiscase is also undesirable~.

A N × N matrix of relative gains (Bristol’s matrix) collects all the relativegains into a matrix form.

A~,t A~,~ ... AI,N

AN,1 AN,2 "’" AN, N

The sum of each row and column of the matrix is equal to one.The RGA method recommends the following way to pair the controlled

outputs with the manipulated variables:

Proposition 1 (BristoPs method) Select the control loops in such a waythat the relative gains Ao,i are positive and as close to unity as possible.

In other words, those pairs of input and output variables are selected thatminimize the amount of interaction among the resulting loops.

1 Assume, for example, that the system is in open loop, and that the gain between Yo

and ui is positive. This would then fix the gain(s) of the controller (e.g., positive gainsin PI-control Aui ----- kpAeo q- kteo (eo = Wo - Yo)). If the other loops are then put toautomatic mode (controlled), the sign of the gain between yo and ui changes sign (sinceAo,i < 0). Consequently, the gain of the controller designed for the open loop system hasgain with a wrong sign, which results in instability.

2In most instances the Yo - ui controller will be tuned with the other control loops inmanual mode. When the other control loops are then put into automatic mode, the gainbetween yo and ui will reduce (since Ao,i > 1) and the control performance for yo willprobably degrade. If the Yo - ui controller is then re-tuned with a higher gain, a potentialproblem may arise: If the other loops are put back in manual mode, the gain between Yoand ui would increase. Coupled with the new high gain controller, instability could result.The greater Ao,i is, the more pronounced this effect is.

TLFeBOOK

Page 216: Advanced Process Identification & Control

206 CHAPTER 8. MULTIVARIABLE SYSTEMS

8.1.2 Algorithm

When a model of the system is available, the Bristol’s method is simple tocompute. Consider a static model of an N-input N-output process:

y = K~u (8.3)

Without loss of generality we can assume for a linear system that the initialstate is at y = 0, u = 0. The open loop gains for a unit step are given bythe coefficients of the gain matrix [K~]o,~ = k~ o,~:

= (8.4)constant

In order to solve the closed-loop gains let us compute the inw~rse of thesystem

-1u = K~ y = My (8.5)

and denote the inverse matrix by M, [M]o,i = mo,~. In closed loop, allthe other outputs axe controlled so that the steady-state remains the same,except for the o’th one (Ayj = O, Vj ~ o, Ayo = Aye). We can then write thefollowing steady-state relation between the i’th input and the o’th output:

= MAy

"0

0=MAy;

0

0

mi,o iYo

II

L mN, o J

(8.6)

(8.7)

Taking the i’th row of the above system of equations gives

AU~ = m~,oAy~ (8.8)

and

yk constant Vk¢o

1(8.9)

TLFeBOOK

Page 217: Advanced Process Identification & Control

8.1. RELATIVE GAIN ARRAY METHOD 207

where mi,o is the (i, o)’th element of the inverse of the process’ steady stategain matrix. Thus, the elements of the Bristol’s matrix are given by

"~o,i = kss o,imi,o (8.10)

Let us give an algorithm for computing the Bristol’s matrix, when a linearmodel for the system is available.

Algorithm 29 (Bristol’s method) Given a steady-state process model

y = K.~,~u (8.11)

the Bristol matrix is given by

A = K~ ® (g~1) T (8.12)

where ® denotes the element-wise multiplication.

Example 40 (Brlstol’s method) Consider a 2 x 2 system

¯ Let the following steady-state information be available

where

0.15 -0.2

This results in the following matrix of relative gains

A= 0 1

The Bristol’s method then suggests to select SISO controllers for pairsyl - ul and y2 - u2, which is intuitively clear since the input u~ has noeffect on the output y~.

Let the system be given by

0.15 0.2

This results in the following matrix of relative gains

0.6 0A

The Bristol’s method then suggests toy~ - u2 and y~ - ul.

(8.16)

(8.17)

select SISO controllers for pairs

TLFeBOOK

Page 218: Advanced Process Identification & Control

208 CHAPTER 8. MULTIVARIABLE SYSTEMS

Let the system be given by

[1Kss= 0.15 0.2 (8.18)

This results in the following matrix of relative gains

-2 3 ](8.19)A= 3 -2

The Bristol’s method then suggests to select SISO controllers for pairsyl - u~ and ye - ul. There may be problems in switching betweenautomatic and manual modes, but at least the gains in open and closedloop will have same signs.

Example 41 (Fluidized bed combustion) A steady-state model for FBC plant (see Appendix B) in the neighborhood of an operating point given by

-0.0688212.29162.728.103

0.0155 0.0155-93.73 0-5.87 -18.290 0

Qc

(8.20)

¯ Let us first consider that the outputs CF (flue gas O~), TB (bed temper-atures) and P (combustion power) are controlled by the three inputs(fuel feed, primary and secondary airs). The Bristol’s matrix becomes

0 0 10 1 01 0 0

(8.21)

Thus the suggestion is to control oxygen with secondary air, powerwith fuel feed, and bed temperatures with primary air. :For the firsttwo, this is common practice in reality; the bed temperatures are notusually under automatic control.

¯ Let us consider controlling the freeboard temperatures ~.~, instead ofbed temperatures. The Bristol’s matrix is given by

I0 1.4734 -0.47340 -0.4734 1.473410 0

(8.22)

TLFeBOOK

Page 219: Advanced Process Identification & Control

8.2. DECOUPLING OF INTERACTIONS 209

The suggestion is still to control the power by fuel feed (note that this issimple to reason using physical arguments, too). For the temperaturesand air flows the situation is more complicated. The suggestion is nowto use primary air for 02 control and secondary air for the freeboardtemperatures; if chosen otherwise the open- and closed-loop gains willhave different signsa. In practice, freeboard temperatures are not underautomatic control.

If the number of input and output variables is not the same, then severalBristol’s matrices need to be formed. Assume that there are O output vari-ables and I (0 <_ I) possible manipulated variables. Then an O × 0 matrix relative gains can be formed for all different combinations of O manipulatedvariables. All the matrices need to be examined before selecting the O loopswith minimal interaction. The rule for the selection of control loops remainsthe same, i.e. the control loops that have relative gains positive and as closeto unity as possible are recommended.

The RGA-method indicates how the inputs should be coupled (paired)with the outputs to form loops with the smallest amount of interaction.However, this interaction may not be small enough, even if it is the smallestpossible. In this case, decouplers can be applied. These will be consideredin the next section.

8.2 Decoupling of interactions

The purpose of decouplers is to cancel the interaction between the loops.The remaining system can then be considered (and designed) as having interactions at all. Hence, a multivariable control design problem is con-verted into a set of SISO control design problems, by introducing artificialdecoupling compensations (see, e.g., [90], pp.504-509).

The interactions can be perfectly decoupled only if the process is perfectlyknown. In practice, a perfect model is rarely available. Thus only a partialdecoupling can be obtained, with some (weak) interactions persisting. may also be that the decouplers are not realizable, or that the degree ofdecouplers would be too high for a practical implementation. In this case,some realizable form of the decoupler can be considered. For a stable process,

3The gains from F~ and F2 to CF are equal, but the gain from F~ to TF is significantlysmaller than that from F2. If F2 is used for 02 control, and F~ for temperature controlthen each action taken by the F2 would need to be compensated by a (larger) counter-action in F~. Thus the open- and closed-loop gains would have different signs dependingon wheather TF -- F1 controller is on or off.

TLFeBOOK

Page 220: Advanced Process Identification & Control

210 CHAPTER 8. MULTIVARIABLE SYSTEMS

a steady-state decoupler is always realizable. Remember that for a severelyinteracting system, static decoupling is better than no decoupling at all.

There are a number of different approaches, the most famous being per-haps the Rosenbrock’s (inverse) Nyquist array method (see, e.g., [57]), whichis a frequency response method seeking to reduce the interaction by using acompensator to first make the system diagonally dominant. In what follows,a simple scheme for designing a discrete-time multivariable PI controller ispresented.

8.2.1 Multivariable PI-controller

In [74] a multivariable PI controller was suggested. The main idea is todecouple the system both at low and high frequencies. The original deriva-tion was based on a continuous-time state-space model, in what follows thediscrete-time case is considered.

Consider a linear time-invariant stable multivariable plant described bythe following discrete-time equations

x(k+ 1) -- Ax(k) +Bu(k) (8.23)= Cx( (S.24)

and controlled by a multivariable PI-controller

Au (k) = KpO~vAe (k) + K~cqe (8.25)

where

e(k)=w(k)-y(k) (8.26)

and ap and at are tuning variables (diagonal matrices) and w (k) containsthe set points. The idea is that the P-part decouples the system at highfrequencies, while the I-part decouples the system at low frequencies (steady-state). Let us first consider the P- and I-controllers separately, and thencombine them together.

P-controller

Let us first assume that the system is controlled by a P-controller and thatthe aim is to drive the error e to zero as fast as possible. For the highfrequencies we can write (Ay is the component with the highest frequency

TLFeBOOK

Page 221: Advanced Process Identification & Control

8.2. DECOUPLING OF INTERACTIONS 211

that can be described by the discrete-time model):

Ay(k+ 1) = C~x(k + 1) (S.27)= C[x(k+ 1)- x(k)] (8.28)= C[Ax(k) +Bu(k)- x(k)] (8.29)= C(A- I)x(k) + CBu(k) (8.30)= C(A- I)x(k) + Khi~,u(k) (8.31)

where

Khigh ---- CB (8.32)

Consider that, at sample instant k, x is initially in a desired steady-stateand a step change in the reference signal, Aw (k + 1), occurs at k + 1. then have

e(k+ 1) = Aw(k+ (8.33)

= w(k + 1) - w(k) (8.34)

= w(k + 1)- y(k) (8.35)

In order to drive the error w (k + 1) -y (k + 1) to zero in one control sample(if possible), we need to have

y(k+l) = w(k+l) (8.36)Ay(k+ 1) = Aw(k+ (8.37)

C(A- I)x(k)+KhighU(k) = e(k+ (8.38)

and we can solve for the manipulated variables

u(k)= Khig,,-1 [e (k _~v1) _. C (A _ i) (8.39)

Au(k) -1= KhighAe(k + 1) (8.40)

where the last equality is obtained using Ax (k) = 0 since x (k) was a steady--1state. Thus, Kp --- Khigh in (8.25), if the inverse exists.

I-controller

Let us now consider the case where the system is controlled by an I-controller.From the system model we obtain the following relationship for a steady-state(by setting x(k + 1) = x(k) =

x~ = Axe8 ÷ Bu~8 (8.41)

Xss ----- (I -- A)-1 Bu~ (8.42)

TLFeBOOK

Page 222: Advanced Process Identification & Control

212 CHAPTER 8. MULTIVARIABLE SYSTEMS

and

which gives

y~.~ = Cx,~ (8.43)

y~ C (I -’= - A) Bu~ (8.44)

y~ = K~u~.~ (8.45)

and

where

e(k) -- w(k) - (8.53)

Ae(k) = e(k)-e(k-1) (8.54)

Kp = [CB]-1 (8.55)

SI = [C (I - A)-1B ]- I (8.56)

(8.52)

I controllers. The controller was given by (8.25)

Au(k) = Kp(~pAe (k) + Kic~,e

where

K~ = C (I - A)-1B (8.46)

In order to drive the steady-state error (for a step change in the referencesignal) to zero, we need to have

y~,,,~,w = w (8.47)

Ksstlss,lmW = iw + y (k) (8.48)

Ks~ (u~,o~d + Au~,,,,,w) = Aw + Y~,o~d (8.49)

K~Au~,,,ew = Aw (8.50)

The required change at the controller output at k + 1 is then

Au(k)- i~le (k + (8.51)

Thus, KI = K~1 in (8.25), if the inverse exists.

PI-controller

The PI controller can now be constructed by combining the tuning for P and

TLFeBOOK

Page 223: Advanced Process Identification & Control

8.3. MULTIVARIABLE PREDICTIVE CONTROL 213

Kp and K~ provide decoupling at high and low frequencies. The tuning ofthe controller is conducted by adjusting the ~p and c~, starting with smallpositive values (0 < ap,~ << 1, 0 < a~,. << 1):

~pO~P,2

0

0

O~P,N

0

(s.57)

(8.58)

O~I,N

results in an I-controller only; similarly c~,~ -- 0 results inSetting Otp,npure P-control. With ap,~ = 1, an aggressive P-control is obtained, whichtries to drive the error to zero in one sample. With a~,~ -- 1, the controlleroutput at k + 1 is set to the value which provides the new steady-state (mean-level control). In the presence of noisy measurements and modelling errors,these can provide instability to the closed loop system, and unrealizablecontrol signals. Therefore, smoother control is usually desired, at the cost ofclosed-loop performance.

8.3 Multivariable predictive control

In Chapter 7, the generalized predictive control (GPC) for SISO systems wasconsidered. In this section, we will extend the concepts of SISO GPC to thecontrol of MIMO processes (see [45][96]).

8.3.1 State-space model

A MIMO system can be conveniently described by a state-space model. Letus consider a multivariable input--output polynomial model of the form

r (q-l) y (k) = B (q-’) v (k) + C (q-l)

where the polynomial matrices A, B and C are given by

F (q-l) = I + Flq-1 +... + FNq-N (8.60)

B (q-l) Blq-’ +" " + Bgq-g (8.61)

TLFeBOOK

Page 224: Advanced Process Identification & Control

214 CHAPTER 8. MULTIVAPJABLE SYSTEMS

C (q-~) = I+ C~q-~ +... + CHq-~ (8.62)

The output and noise vectors y and e, respectively, are of size O x 1:

y(k) = [Yl ,yo(k)]T (8.63)e(k) T (8.64)

Consequently, matrices F (q-i) and C (q-~) are of size O x O, (as well matrices F,~ and C,~). Input vector v is of size I x 1:

v(k) = [,)1 (k),v2(k),"" ,vi(k)]T (8.65)

matrix B (q-~) is of size O x I (as well as matrices B~). Without loss generality we assume that all polynomials are of order N.

The above polynomial model can be represented in a (canonical observ-able) state-space form:

x(k+ 1) = AX(k) + By(k) + (8.66)y(k) -- Cx(k)+e(k) (8.67)

Please note that A, B, C and G are matrices in the state-space model,whereas F (q-l), B (q-l) and C (q-l) are polynomial matrices. The matricesA, B, G and C are given by

A

00

--FN-1

= [ B~ B~

[ C1 - F1

= [I 0 ...

0 00 0 ...

¯¯ ¯ BN~I

C2 - F~

O]

(8.68)I0

B B.u ]T (8.69)

G "" CN-i-FN-i CN--FN ]T(8.70)c (8.71)

The matrices will now have the following sizes: [A] = NO × NOI [B] =NO x I, [G] = NO x O and [C] -- O × NO.

Example 42 (Representation of a 2 × 2 system) For a 2 × 2 P-canonical

TLFeBOOK

Page 225: Advanced Process Identification & Control

8.3, MULTIVARIABLE PREDICTIVE CONTROL 215

system with common denominators we have (8.59)

1 + fl,l,lq -1 + ... + fN, l,lq-N

0

_ [ bl,~,lq-~ + ... + bg3,1q-N-- [ bt,23q-1 + ... + bN,2,iq-g

o1 + fl,2,2q -1 + ... + fN,2,2q-N ×

(8.72)

bl,2,2q -1 n~ ... q- bN,2,2q-N v2 (k)

0 c~,2,2q-1 -t- ... q- CN,2,2q-N e2

With this notation, the elements (i, j) of the matrices Fn consist then of thescalar coefficients f,~,~,j, matrices B and C are constructed in a similar way.We then have a state-space representation (8.66)-(8.67) with matrices A, C and G given by:

--fl,l,1 0

[0 -fl,m ]:

0 --fN-1,2,2 ]

-f~,l,1 00 -f~,~,~

10]0 00 00 00 0

0 0

..

[1

00 10 00 0

(8.7a)

bN-l,l,1 bN-l,l,2 ]

bg-l,2,1 bg-l,2,2bg,~,t bN, l,~ ]

bN,2,1 bN,2,2

C1,1,1 -- /1,1,1 0 ]

0 Cl,2,2 -- /1,2,2

CN-1,1,1 -- fN-l,l,1 0 10 c~v-1,~,2 - f~v-~,~,~JCN, I,1 -- fN, l,1 0 1

0 c~,m - f~r,~,~

(8.7a)

(8.75)

TLFeBOOK

Page 226: Advanced Process Identification & Control

216 CHAPTER 8. MULTIVARIABLE SYSTEMS

where all the elements of the matrices are scalars.

8.3.2 /-step ahead predictions

The optimal predictions at sample instant k + i will be

~(k+i)i

y~ CAi-JBv (k + j - 1)j=l

+CAi-~ [A - GC]x (k)

+CAi-IGy (k)

(8.77)

Let us use the following condensed notation:

^T (k + gp)]T_- [~T(k+ 1),... v~ (~ + H, 1)]~= [v~(~),..., -

that is

(s.78)(8.79)

~ (k + H~,)

~1 (k+ 1)~ (k + 1):

~o (k + 1)~1 (k+2)~(k+2):

~o (k +2)

~ (k + g,) ~(k+H,)

~o (~ + g~)

(s.so)

TLFeBOOK

Page 227: Advanced Process Identification & Control

8.3. MULTIVAPJABLE PREDICTIVE CONTROL 217

V(k)=v(k)

v(k+H,,- 1)

~1 (k)~2 (k)

v~ (k)v~ (k + 1) v2 (k + 1)

vl (/¢ + 1)

vl (k + H~,- 1)v2 (k + Hp - 1):

v~(k + Hp- 1)

(8.81)

We have then a global predictive model for the Hp future time iustants k + 1,k+2, ..., k+Hp:

~ (k + 1) = KCAGCX (k) + KCABV (k) + KCAGY (8.82)

where

KCAGC

KCAB

KCAG

C [A - GCI,.-., CAH"-’ [A - GC]]TCB ... 0

CAH"-IB "’" CB

CG,... , CAHP-1G] T

(8.83)

(8.84)

(8.88)

8.3.3 Cost function

Let us consider .the following cost function

J = [W(k+l)-~(k+l)]Q[W(k+l)-~(k÷l)]v

+v~ (~¢) RV (~)

where

(8.86)

W (k + 1) = T (k + 1), ...,w T (k + Hp)T (8.87)

TLFeBOOK

Page 228: Advanced Process Identification & Control

218 CHAPTER 8. MULTIVARIABLE SYSTEMS

that is

w(~+l)=w(~+l)

w(~+ H,,)

wl (k + 1)w2 (k + 1)

wo (k + 1)wl (k + 2)w2 (k + 2)

wo (k + 2)

Wl (k_~ Hp) -i

~2(~+H~,) !

wo (k + Hp)

The optimal control sequence is given by

V(k) = [K~AuQKcA, +R]-~

XK~ABQ [W (k + 1)- KCAGCX (k)- KCAGY

(8.88)

(8.89)

Comparing with the SISO case, we see that the equations have the sameform. Only the sizes of the vectors and matrices are different since an O-output/-input system is considered instead of a SISO system. Instead of ascalar prediction, the predictions ~, (8.77), and targets w are given by an O 1 vector¯ Likewise, instead of a scalar input and noise, the system inputs arenow given by~.a I x 1 vector v as well as the noise e. The collection of all futurepredictions Y as well as future targets W are now long vectors of lengthOHm,, system inputs V are contained in a vector of length I~. Similarly,the elements of the gain matrices (8.83)-(8.85) have been composed by pilingthe future predictions on top of each other. However, from a technical pointof view, the future predictions, controller outputs and targets are still givenin column vectors just as in the SISO case. Therefore, the solution also hasa similar form and we will give no proof for it.

8.3.4 Remarks

The implementation of the minimum horizons H,~, prediction horizons H,and the control horizons H~ can be done in the same manner as in theSIS0 case, by ’cutting’ the matrices. For simplicity, in the above derivation

TLFeBOOK

Page 229: Advanced Process Identification & Control

8.3. MULTIVARIABLE PREDICTIVE CONTROL 219

same horizons were assumed for all inputs and outputs. Note, however,that when different horizons are used, H,n,o ~ H,n,j; Hp,o ~ Hp,j and/orHc,~ ~ H,~.,j for input i # j and/or output o # j, the ’removed’ elements inthe matrices need to be replaced by zeros. If R, (8.86), is a nonzero matrix,this results in numerical problems. To avoid this, all the rows and columnsof KCAB containing only zero elements need to be removed, as well as thecorresponding rows and columns of Q and R, KCAGC, KCAG and W.

In the same way as in the SISO case, a fixed gain observer is given by

E(k) = [h- GC]~(k- 1) +Bv(k- 1) + Gy(k- (8.90)

This is the estimate where the Kalman filter tends to, when time approachesinfinity, i.e. the asymptotic estimate obtained under the condition that theeigenvalues of the matrix (A - GC) are less than one.

8.3.5 Simulation example

Let us illustrate the multivariable predictive control on the FBC process.From Appendix B we obtain a model for the relations between combustionpower P and flue gas oxygen CF (controlled variables) and fuel feed and secondary air F2 (manipulated variables), using a sampling time of seconds. Linearizing, discretizing, and converting to a state-space form, wehave a state-space description in the form

x(k+ 1) = hx(k) +BAu(k) (8.91)y(k) = Cx(k) (8.92)

where

y~- p ,u~- F2

Let us design a multivariable GPC controller with the following setting:H(~. -- [ 3 3 ], and H~, = [ 90 90 ] corresponding to a 15 min predictionhorizon. The weighting matrix Q was determined such that the varying in-terval (different scales) of the corresponding input and output variables wastaken into account, resulting in ql = 278 and q2 = 0.01 where ql are the diag-onal elements of Q corresponding to CF and q2 the elements correspondingto P. The control weighting was set to zeros. An integral noise model wasassumed for both outputs ( C~,~ (q-~) = A~,~ (q-l) i 1,2; C~,~ (q- l) _~ fori#j).

The plant was simulated using the differential equation model (AppendixB). In addition, an unmeasured disturbance (25% drop in fuel power) effectedthe process at t = 55 min. Simulation results are shown in Fig. 8.1.

TLFeBOOK

Page 230: Advanced Process Identification & Control

220 CHAPTER 8. MULTIVARIABLE SYSTEMS

20[

150

4

1

I I

10 20 30 40 50

10 20 30

60 70J J J

I . I I

40 50 60 70

015

~0 20 30 40I I ,

50 60 70

60 70!0 20 30 40 50t[min]

Figure 8.1: Multivariable GPC of an FBC. The upper plots show the process

[N’~3]and combustion power P [MW]. Theoutputs: flue gas oxygen C~ [~-~-~ j

dashed line indicates the targets. The lower plots show the manipulated

variables: fuel feed Qc [~]and secondary air F2 [-~[. At t:= 55 min an

unmeasured disturbance affects the process.

TLFeBOOK

Page 231: Advanced Process Identification & Control

8.3. MULTIVARIABLE PREDICTIVE CONTROL 221

0"0"0 50 i O0 150 200 250 3003O

t~ 20~

10150 200 25050 100 300

50 100I I

150 200 250 300

50 100 150 200t [mini

250 300

Figure 8.2: Multivariable GPC of an FBC. See legend of Fig. 8.1 for notation.Measured steady states (see Appendix B) were used as target values.

In a second simulation, the steady-states measured from the true plant(see Appendix B) were used as reference targets. Fig. 8.2 illustrates thesesimulations.

TLFeBOOK

Page 232: Advanced Process Identification & Control

Chapter 9

Time-varying and Non-linearSystems

In this chapter, some aspects of the control of time-varying and non-linearsystems are considered. The field of control of non-linear systems is wide.The aim of this chapter is to give the interested reader, with the help ofillustrative examples, a flavor of the problems encountered and the solutionsavailable. First, a brief introduction to adaptive control is given and thetwo main approaches of gain scheduling and indirect adaptive control arepresented. In non-linear control, the Wiener and Hammerstein systems areof particular interest, as the non-linear control problem can be reformulatedsuch that linear control design methods can be applied. A general approachfor Wiener and Hammerstein systems, based on the availability of an inverseof the static part, is introduced, and illustrated via a simulation exampleusing Wiener GPC for the control of a pH neutralization process. Then aspecial case of second order Hammerstein systems is considered. This chapteris concluded by a presentation of a ’pure’ non-linear predictive control ap-proach, using SNN and optimization under constraints. The control methodis illustrated using two examples concerned with a fermentor and a tubularreactor.

9.1 Adaptive control

Let us use the following definition for adaptive control systems [38], p. 362,as a starting point:

Definition 14 (Adaptive control) Adaptive control systems adapt (ad-just) their behavior to the changing properties of controlled processes andtheir signals.

223

TLFeBOOK

Page 233: Advanced Process Identification & Control

224 CHAPTER 9. TIME-VARYING AND NON-LINEAR SYSTEMS

Basically, there are two main motivations for adaptive control:

¯ non-linear processes, and

¯ time-varying processes.

In real life, all industrial processes exhibit non-linear time-varying behavior.Adaptive control may need to be considered for a process that changes withtime, or when the process is non-linear to the extent that one set of controlsystem parameters is not suj~ficient to adequately describe the process overits operating region [85].

A major assumption in conventional control design is that the underlyingprocesses are linear time-invariant (LTI) dynamical systems. Control designis almost always based on linear descriptions of the process to be controlled,or on the assumption of linearity of the process in its operating region. Thisis due to the relative easiness of the identification of linear models, as well asto the availability of analytical results in the derivation of control laws basedon linear process descriptions. A compete non-linear theory does not exist,and linear design methods work rather well even when applied to non-linearprocesses.

Non-linear processes

All industrial processes, however, are inherently non-linear. Non-linearitiesmay be due to constraints, saturations or hysteresis in the process variables(such as upper and lower bounds of the position of an actuator, or an overflowin a tank.) Typically these non-linearities occur at the boundaries of theoperation areas of the process. Non-linearities may also be present duringthe normal operation of the process due to the non-linearity of the processphenomena, for example transport phenomena (transfer heat by conductionor by radiation, etc.). These non-linearities are typically smooth and close-to-linear, which justifies the use of linear approximations.

A certain linear model (say, model A) may be valid in the neighborhood of its operating point (a), or in a part of the operating area, and a controller may be designed based on the model description. When the operating point is changed, the model may no longer match the process. Instead, another linear model (model B) describes well the behavior of the process at the new operating point (point b). Consequently, the controller needs to be redesigned (using model B) in order to maintain satisfactory behavior of the controlled process.


Time-varying processes

A major part of conventional system theory is based upon the assumption that the systems have constant coefficients. This assumption of time-invariance is fundamental to conventional design procedures. In real life, however, all industrial processes exhibit time-varying behavior. The properties of the process and/or its signals change with time due to component wearing, changes in process instrumentation, updates in process equipment, failures, etc. When changes in the process are significant, the controller needs to be re-designed in order to maintain satisfactory behavior of the controlled process.

With time-varying processes, the model parameters need to be updated on-line. On-line identification may be performed continuously, so that the model is updated at each sample instant. Alternatively, the model may be updated at certain times 'when necessary'. The necessity for identification may be indicated from outside of the system (e.g., by a process operator), or sought out by the adaptive control system itself (e.g., triggered by passing a certain value of an index of performance). (Note how linear control based on on-line identified linear models can be applied to a wide variety of processes, including non-linear time-varying plants.)

An adaptive system is able to adjust to the current environment: to gain information about the current environment, and to use it. An adaptive system is memoryless in the sense that it is not able to store this information for later use; all new information replaces the old one and only the current information is available. A learning system, instead, is able to recognize the current environment and to recall previously learned associated information from its memory. A learning system is an adaptive system with memory. Learning then means that the system adapts to its current environment and stores this information to be recalled later. Thus, one may expect that a learning system improves its behavior with time; an adaptive system merely adjusts to its current environment.

9.1.1 Types of adaptive control

Methods of adaptive control are commonly categorized into three classes:

• gain scheduling,

• indirect adaptive control, and

• direct adaptive control.


Figure 9.1: Gain scheduling.

Gain scheduling

In gain scheduling, Fig. 9.1, the controller parameters are computed beforehand for each operating region. The computation of controller parameters may be based on a known non-linear model linearized at each operating point, or on linear models identified for each operating region. The model parameters (or, rather, the pre-computed controller parameters) are then tabulated. A scheduling variable is used to select which parameter values to use, and the tabulated information is then applied for control. Since there is no feedback from the closed loop signals to the controller, gain scheduling is feedforward adaptation.

In gain scheduling, the process operating conditions are monitored, possibly using some auxiliary process variables. Based on the observed operating conditions, pre-computed controller parameters are selected using the 'gain schedule', and then used in process control. The controller is switched between the pre-computed settings as the operating conditions vary. The name 'gain scheduling' has a historical background, since the scheme was originally used to accommodate changes in the process gain only.
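
As a minimal illustration of this mechanism (not taken from the book), the sketch below tabulates PI gains at a few operating points and selects/interpolates them on-line with a scheduling variable; all gain values and function names are invented for the example.

```python
# A minimal gain-scheduling sketch: PI gains are pre-computed for a few
# operating points and interpolated on-line using a scheduling variable
# (e.g. a production rate). All numbers are made up for illustration.
import numpy as np

# gain schedule: scheduling variable -> (Kp, Ki)
SCHEDULE = {0.2: (2.0, 0.10), 0.5: (1.2, 0.05), 0.9: (0.6, 0.02)}

def scheduled_gains(v):
    """Interpolate the tabulated controller gains at scheduling variable v."""
    pts = sorted(SCHEDULE)
    kp = np.interp(v, pts, [SCHEDULE[p][0] for p in pts])
    ki = np.interp(v, pts, [SCHEDULE[p][1] for p in pts])
    return kp, ki

def pi_control(error, integral, v, dt=1.0):
    """One step of a PI law with gains taken from the gain schedule."""
    kp, ki = scheduled_gains(v)
    integral += error * dt
    return kp * error + ki * integral, integral

u, integ = pi_control(error=0.3, integral=0.0, v=0.7)
```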

The design of gain scheduling can be seen as consisting of two steps:

• finding suitable scheduling variables, and

• designing the controller at a number of operating conditions.

The main task in the design is to find suitable scheduling variables. This is normally done based on physical knowledge of the system. In process control, the production rate can often be chosen as a scheduling variable, since time constants and time delays are often inversely proportional to the production rate. When the scheduling variables have been found, the controller is designed at a number of operating conditions, and the controller parameters are stored for each specific operating region. The stability and performance of the system are typically evaluated by simulation, with particular attention given to the transition between different operating conditions.

The application of gain scheduling is usually a straightforward process. An advantage of gain scheduling is that the controller parameters can be changed very quickly in response to process changes, since there is no estimation involved. The lack of estimation also brings about the main drawback of the approach: there is no feedback to compensate for an incorrect schedule. Gain scheduling is feedforward adaptation (or open-loop adaptation): there is no feedback from the performance of the closed loop to the controller parameters.

Indirect adaptive control

Indirect adaptive controllers [5][38] try to attain an optimal control performance, subject to the design criterion of the controller and to the obtainable information on the process. The indirect adaptive control scheme is illustrated in Fig. 9.2. Three stages can be distinguished:

• the identification of the process (in closed loop);

• the controller design; and

• the adjustment of the controller.

Conceptually, an indirect adaptive control scheme is simple to develop. A control design procedure is taken that is based on the use of a process model; the chosen controller design procedure is automated; and the procedure is applied every time a new process model has been identified. Thus there exists a large number of different indirect adaptive (self-tuning) controllers, based on different identification procedures and control laws. The model parameters are estimated in real time, the estimates are then used as if they were equal to the true ones (certainty equivalence principle), and the uncertainties of the estimates are, in general, not taken into account.
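
The certainty-equivalence cycle can be sketched in a few lines. The code below is an illustrative skeleton only, assuming a first-order ARX plant and a one-step-ahead control law; it is not the book's algorithm, but it shows the identify-design-adjust loop.

```python
# Schematic indirect (self-tuning) adaptive loop for a first-order ARX plant
# y(k+1) = a*y(k) + b*u(k) + e(k); certainty equivalence is used throughout.
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5                  # unknown "true" plant
theta = np.zeros(2)                        # estimates [a_hat, b_hat]
P = 1000.0 * np.eye(2)                     # RLS covariance
y, u, w = 0.0, 0.0, 1.0                    # output, input, set point

for k in range(200):
    y_new = a_true * y + b_true * u + 0.01 * rng.standard_normal()

    # 1) identification: recursive least squares on phi = [y(k), u(k)]
    phi = np.array([y, u])
    K = P @ phi / (1.0 + phi @ P @ phi)
    theta += K * (y_new - phi @ theta)
    P -= np.outer(K, phi @ P)

    # 2) controller design and 3) adjustment, using the current estimates
    a_hat, b_hat = theta
    y = y_new
    u = (w - a_hat * y) / b_hat if abs(b_hat) > 1e-3 else 0.0
```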

Indirect adaptive control has been shown to yield good results in practice. Unfortunately, analysis of adaptive control systems is difficult due to the interaction between the controller design and the parameter estimation. This interaction can, however, play a key role in determining the convergence and stability of the adaptive control. Often, this problem is handled by looking at the states of the adaptive control as separated into two categories which change at different rates. This introduces the idea of two time scales: the fast scale is the ordinary feedback, and the slower one is for updating the controller parameters.

Figure 9.2: Indirect adaptive self-tuning control.

In direct adaptive control methods, the controller parameters are directly identified based on data (without first identifying a model of the process).

9.1.2 Simulation example

Let us look at a simulation example of the performance of an adaptive version of the generalized predictive control. In GPC, an explicit process model is required, i.e. a model structure needs to be selected [delay d and orders of the model polynomials A(q^{-1}) and B(q^{-1})]. In addition, the coefficients of A and B need to be determined. In an adaptive version of GPC, the model parameters are updated using on-line measurement information. The updated process model is then instantly used in minimizing the GPC cost function. Thus, a typical indirect adaptive controller is obtained.


Process

Consider a linear process (from [13]) described by its Laplace transform:

y(s) / u(s) = 1 / (1 + 10s + 40s^2)    (9.1)

The process can be written as a discrete-time model:

y(k) = [(0.0114 + 0.0106 q^{-1}) / ((1 - q^{-1})(1 - 1.7567 q^{-1} + 0.7788 q^{-2}))] Δu(k-1)    (9.2)

In the following simulations, a sudden change in the process at sampling instant k = 100 is considered, so that the process gain is reduced to one-fifth (B(q^{-1}) = 0.0023 + 0.0021 q^{-1}) of the initial gain.

GPC with a fixed process model

The initial process model was assumed to be known (correct structure and parameters). The parameters of the GPC were set to H_m = 1 (minimum output horizon), H_p = 10 (prediction horizon), H_c = 1 (control horizon) and r = 0 (control weighting). Fig. 9.3 depicts the resulting mean-level control with fixed parameters. Clearly, after k = 100, the mean-level type of control performance is not obtained anymore. Instead, a significant overshoot appears and the settling time increases.

Adaptive GPC

In a second simulation, the parameters of the process model were updated using RLS with exponential forgetting (forgetting factor λ = 0.99). The evolution of the parameters in B(q^{-1}) during estimation is illustrated in Fig. 9.4. The updated model was then used in the GPC computations. The result of the control is shown in Fig. 9.5. The original design specifications are fulfilled even if the process changes with time.
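
A recursive least-squares update with exponential forgetting, of the kind used for tracking the B coefficients, can be sketched as follows; the variable names and initial values are illustrative assumptions, not the book's code.

```python
# One recursive least-squares (RLS) update with exponential forgetting.
import numpy as np

def rls_step(theta, P, phi, y, lam=0.99):
    """One RLS update with forgetting factor lam (here lam = 0.99)."""
    K = P @ phi / (lam + phi @ P @ phi)          # estimator gain
    theta = theta + K * (y - phi @ theta)        # parameter update
    P = (P - np.outer(K, phi @ P)) / lam         # covariance update
    return theta, P

# equivalent memory horizon is roughly 1/(1 - lam) = 100 samples
theta, P = np.zeros(4), 1e4 * np.eye(4)
theta, P = rls_step(theta, P, phi=np.array([1.0, 0.5, -0.2, 0.1]), y=0.3)
```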


Figure 9.3: GPC using a process model with constant parameters. At k = 100, the process gain is decreased abruptly to one-fifth of the original. The performance of the designed mean-level GPC deteriorates at sampling instants k > 100 with large overshoots and a longer settling time.


Figure 9.4: The coefficients of the B polynomial are correctly estimated. A relatively slow estimator was designed, with an equivalent memory horizon equal to 100 samples.

Figure 9.5: With adaptive GPC, the performance of the GPC remains as designed for the original process, even when the process gain changes (at k = 100).


Figure 9.6: Schematic drawing of the control of a Wiener system based on the inverse of the model of the static non-linear part.

9.2 Control of Hammerstein and Wiener systems

For a given process, the predictive control problem can be tackled in many ways. This section provides a very simple and interesting approach for the design of control strategies on the basis of Hammerstein and Wiener models.

The control based on non-linear models such as the Hammerstein or Wiener models can be simplified and reduced to the design of controllers for linear systems. Figures 9.6-9.7 illustrate the control structures for Wiener and Hammerstein systems. Provided that the inverse of the non-linear static part exists and is available [71], the remaining control problem is linear. Thus any linear control design method (such as the GPC, for example) can be applied in a straightforward way.

What is required is the model of the inverse static system. In many cases, it is simplest to identify the inverse model directly from input-output data (process inverse). Alternatively, the static (forward) non-linearity can be identified and its inverse mapping then identified (model inverse). If the forward static model is available, the inverse may also be solved 'on-line' by iterating (on-line solution of the model inverse). In some cases it may also be possible to obtain the inverse model from other sources, such as first principle-based process models, etc.
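
A rough sketch of the idea, under the assumption that the static non-linearity and its inverse are both known, is given below; the tanh non-linearity, the PI controller and the first-order linear dynamics are stand-ins chosen only for illustration.

```python
# Wiener control sketch: map the measured output y and the target w through
# the inverse of the static non-linearity, then let any linear controller act
# on the intermediate signal z. The static map g and the linear part are toys.
import numpy as np

g = np.tanh                                  # static non-linearity, y = g(z)
g_inv = np.arctanh                           # its inverse (assumed available)

def linear_pi(e, state, kp=0.8, ki=0.2):
    state += e
    return kp * e + ki * state, state

z, integ = 0.0, 0.0
w = 0.6                                      # target for the plant output y
for k in range(50):
    y = g(z)                                 # plant output (static part)
    e = g_inv(w) - g_inv(y)                  # error in the linear (z) domain
    u, integ = linear_pi(e, integ)
    z = 0.9 * z + 0.1 * u                    # first-order linear dynamics
```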

We will next illustrate this large-scale linearization approach with a simulated Wiener GPC example.


Figure 9.7: Schematic drawing of the control of a Hammerstein system based on the inverse of the model of the static non-linear part.

9.2.1 Simulation example

A MISO Wiener model for the pH neutralization process is identified, and generalized predictive control (GPC) applied for the control of the process. The control of this process has been studied in a number of papers, e.g., [61][68][28].

Process and data

Acid (q_1), buffer (q_2) and base (q_3) streams are mixed in a tank and the effluent pH is measured. The process model [61] consists of three non-linear ordinary differential equations and a non-linear output equation for the pH (pH_4):

dh/dt = (1/A_1) (q_1 + q_2 + q_3 - C_v (h + z)^n)    (9.3)

dW_a4/dt = (1/(A_1 h)) [(W_a1 - W_a4) q_1 + (W_a2 - W_a4) q_2 + (W_a3 - W_a4) q_3]    (9.4)

dW_b4/dt = (1/(A_1 h)) [(W_b1 - W_b4) q_1 + (W_b2 - W_b4) q_2 + (W_b3 - W_b4) q_3]    (9.5)

0 = W_a4 + 10^{pH_4 - 14} - 10^{-pH_4} + W_b4 (1 + 2 × 10^{pH_4 - pK_2}) / (1 + 10^{pK_1 - pH_4} + 10^{pH_4 - pK_2})    (9.6)

where h is the liquid level in the tank, and W_a4 and W_b4 are the reaction invariants of the effluent stream. Table 9.1 gives the nominal values used in the simulations.


tank area                                    A_1      207 cm²
valve coefficient                            C_v      8.75
log of equilibrium constant                  pK_1     6.35
log of equilibrium constant                  pK_2     10.25
reaction invariant                           W_a1     3 × 10^{-3} M
reaction invariant                           W_a2     -3 × 10^{-2} M
reaction invariant                           W_a3     -3.05 × 10^{-3} M
reaction invariant                           W_b1     0 M
reaction invariant                           W_b2     3 × 10^{-2} M
reaction invariant                           W_b3     5 × 10^{-5} M
time delay                                   θ        0 min
acid flowrate                                q_1      16.6 ml/s
buffer flowrate                              q_2      0.55 ml/s
base flowrate                                q_3      15.6 ml/s
liquid level in tank                         h        14 cm
effluent pH                                  pH_4     7.00
vertical distance between outlet and tank bottom   z        (cm)
valve exponent                               n        0.5
reaction invariant                           W_a4     -4.32 × 10^{-4} M
reaction invariant                           W_b4     5.28 × 10^{-4} M

Table 9.1: Nominal values used in the simulations.


             b_0       b_1         a_1       a_2
B_1/A_1     0.2763   (-0.0757)   -0.6737   -0.1257
B_2/A_2     0.2230    (0.2799)    0.0509   -0.5480
B_3/A_3     0.2329   (-0.0653)   -0.8900   -0.0193

Table 9.2: Identified parameters of the linear dynamic transfer polynomials.

Training data of 500 samples was generated by simulating the model (9.3)-(9.6). The input signal consisted of pseudo-random sequences for each input, with a maximum amplitude of 50% of the nominal value. A test set of 500 data patterns was generated in a similar way.

Model structure and parameter estimation

A Wiener model was identified using a SNN of 5 hidden nodes for the static part; the linear dynamics were identified using n_B1 = n_B2 = n_B3 = 1, n_A1 = n_A2 = n_A3 = 2, and d_1 = d_2 = d_3 = 1. Model inputs consisted of the three input flows to the tank, q_1, q_2 and q_3; the system output was the pH at the outlet. For reference purposes, a Wiener model with a linear static part was also identified (y = K^T z). Parameters were estimated using the Levenberg-Marquardt method.

Results on identification

The performance of the identified model is illustrated in Figs. 9.8-9.10. Fig. 9.8 shows the performance on the training set. The Wiener model output follows closely the output of the true plant. Simulation on the test set, Fig. 9.9, reveals that the mapping is not perfect. This is also indicated by the root-mean-squared errors on the training set (RMSE = 0.1192) and the test set (RMSE = 0.2576). However, a reasonably accurate description of the pH process was obtained. The static mapping is illustrated in Fig. 9.10, showing the titration curves for each input flow, when the other flows have their nominal values. Table 9.2 shows the estimated parameters of the transfer polynomials.

Control design

The objective was to control the pH (pH_4) in the tank by manipulating the base flow rate (q_3). In order to fulfill the objective, a GPC controller was designed for the plant.


Figure 9.8: Training data for the pH neutralization model. The system inputs are shown in the three upper plots. The system output is the lower plot (solid line). The predicted output (dashed line) follows closely the training data; dashed lines on upper plots show the corresponding intermediate variables.

Figure 9.9: Performance on test data.


Figure 9.10: True (solid line) and identified (dashed line) static mappings.

For Wiener and Hammerstein systems, a linear control design can be accomplished if the inverse of the non-linear static part is available. In the Wiener system, when the process output and the target are mapped through an inverse non-linear mapping of the static part, the remaining system is a linear one (see Fig. 9.6). In the simulations, the inverse problem was solved on-line: the process model and a Gauss-Newton search were used in order to find a z_3 (SNN input) such that ŷ(k) (SNN output) equals y(k) (measured output)¹. The control problem was then based on the error between the desired output z_w(k) and the intermediate signal z_3, so that the desired performance characteristics were fulfilled.
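
The on-line inverse can be sketched as a one-dimensional Newton (Gauss-Newton) search on the forward static model; the function f below is a stand-in for the identified SNN, and all numbers are invented for illustration.

```python
# On-line inverse sketch: given a forward static model f(z) and a measured
# output y, search for z such that f(z) = y using a Newton iteration with a
# numerical derivative. The initial value determines which local solution is
# found (cf. the footnote on the global inverse).
import numpy as np

def f(z):                      # placeholder for the SNN static mapping
    return 7.0 + 3.0 * np.tanh(0.5 * z)

def invert(y, z0=0.0, iters=20, eps=1e-6):
    z = z0
    for _ in range(iters):
        r = f(z) - y                                 # residual
        J = (f(z + eps) - f(z - eps)) / (2 * eps)    # numerical derivative
        if abs(J) < 1e-12:
            break
        z -= r / J                                   # (Gauss-)Newton step
    return z

z3 = invert(y=8.2)             # intermediate signal corresponding to pH 8.2
```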

The GPC cost function is of the form

J = Σ_{i=H_m}^{H_p} [z_w(k+i) - ẑ(k+i)]² + r Σ_{i=0}^{H_c-1} [Δu(k+i)]²    (9.7)

where z_w(k+i) are the desired future responses. In the case of Wiener systems, z_w and z can be obtained from the desired and measured process outputs w and y, by solving the inverse of the static mapping.

Control simulations

First, a dead-beat controller was designed for the system: H_m = deg B + d + 1 = 3, H_c = deg A + 1 = 3, H_p ≥ deg A + deg B + 1 (H_p = 8) and r = 0. The ideal response is shown in Fig. 9.11, obtained using the identified Wiener model both in the controller and as the simulated plant. The upper part of Fig. 9.11 shows the control signal u and the intermediate signal z. The lower part shows the target w and the plant output y. The desired target sequence consisted of four step changes to the process (at t = {375, 750, 1125, 1500} seconds). At t = {1875, 2250, 2625} seconds three disturbances affect the process (unmeasured ±10% changes in q_1). As shown by the simulation responses, the ideal dead-beat response is fast and accurate. Note that the steady-state gain of the u-z controller is one. However, the magnitude and rate of the control input signal make it unrealistic for many real process control applications. (Note how in the simulation the input q_3 was restricted to be non-negative, which affects the control at t = 375, Fig. 9.11.) A correct estimate of the I/O behavior of the process also needs to be available.

¹Note that in this case the (global) inverse does not need to exist, as a local solution (depending on the initial value of the search) was found. For the considered process, the global inverse would exist and could be identified.

In order to get an implementable control signal, it is common in GPC to decrease the control horizon H_c (often H_c = 1). In mean-level control (H_c = 1, H_p large) the closed loop will have open loop plant dynamics; with a smaller H_p, a tighter control is obtained. Alternatively (not excluding), a realizable controller can be obtained by introducing a non-zero value for the parameter r. Fig. 9.12 shows the simulation when using the true process, (9.3)-(9.6), dead-beat control settings, and r = 0.1. This parameter setting results in a relatively smooth control input, and a small overshoot. The small deviations from unit static gain are due to the process-model mismatch. Due to the integral action in GPC, there is no steady-state error.

For comparison, a GPC controller based on the linear pH model was experimented with. The simulations are shown in Fig. 9.13. Let us assume that the desired performance was a small overshoot and smooth control actions, as in Fig. 9.12. Then the system response at pH = 7 is as desired. At pH = 9, the system is overdamped, however, and at pH = 5 the system is underdamped. Clearly, the changes in the gain of the system are reflected in the control, and the closed-loop performance changes depending on the operating point. These design difficulties were avoided in the Wiener control approach.

Results

In the example, identification and control of a pH neutralization process were considered. First, a MISO Wiener model was identified for the process. Using the identified model, a GPC controller was designed. Simulations showed good results.

Simplicity of non-linear process control is one of the main motivations for Wiener and Hammerstein systems. Provided that the inverse of the static part is available, linear control design methods (e.g., pole placement) can be directly applied for the control of the linear subsystem. In the example we showed that an explicit inverse model is not always necessary. (Note that in some cases a global inverse may not exist.) Although fixed parameter models were applied in the example, adaptive control applications are straightforward to conduct, provided that robustness of the closed-loop learning system can be guaranteed. From this point of view, the Wiener and Hammerstein approaches do not pose any additional difficulties.

Figure 9.11: Ideal dead-beat control. Upper part: controller output u (dashed line) and 'measured' intermediate signal z. Lower part: plant output y (solid line) and target w (dotted line).

Figure 9.12: Simulation of Wiener-GPC control of the pH neutralization process.

Figure 9.13: Simulation of linear GPC control of the pH neutralization process.

9.2.2 Second order Hammerstein systems

In this sub-section, we will consider the special case of predictive control for a second order Hammerstein system.

Second order Hammerstein model

In order for a control system to function properly, it should not be unduly sensitive to inaccuracies in the process model. We shall be concerned with a class of SISO discrete-time Hammerstein models

A(q^{-1}) y(k) = Σ_{p=1}^{P} B_p(q^{-1}) u^p(k-1) + ξ(k)/Δ(q^{-1})    (9.8)

where B_p(q^{-1}) is a polynomial of degree n_Bp. This model belongs to the following class

A(q^{-1}) y(k) = B(q^{-1}) f(u(k))    (9.9)

where f(·) is a non-linear function. Let us introduce the following auxiliary (pseudo) input [42] [43]

x(k) = Σ_{p=1}^{P} B_p(q^{-1}) u^p(k)    (9.10)

The process model (9.8) will be rewritten as follows:

A(q^{-1}) y(k) = q^{-1} x(k) + ξ(k)/Δ(q^{-1})    (9.11)

Let us derive a predictive controller for the special case P = 2:

y(k) = (1/A(q^{-1})) [B_1(q^{-1}) u(k-1) + B_2(q^{-1}) u²(k-1)] + ξ(k)/(Δ(q^{-1}) A(q^{-1}))    (9.12)


Prediction

In order to separate the available information from the unavailable (separate past and future noise), let us consider the following polynomial identity (see Chapter 3, Section 3.3.6)

1/(Δ(q^{-1}) A(q^{-1})) = E_j(q^{-1}) + q^{-j} F_j(q^{-1})/(Δ(q^{-1}) A(q^{-1}))    (9.13)

from which we have

E_j(q^{-1}) Δ(q^{-1}) A(q^{-1}) = 1 - q^{-j} F_j(q^{-1})    (9.14)

Multiplying both sides of the model (9.12) by q^j E_j(q^{-1}) Δ(q^{-1}) A(q^{-1}) and substituting (9.14) leads to

y(k+j) = G_{1,j}(q^{-1}) Δu(k+j-1) + G_{2,j}(q^{-1}) Δu²(k+j-1) + F_j(q^{-1}) y(k) + E_j(q^{-1}) ξ(k+j)    (9.15)

where

G_{i,j}(q^{-1}) = E_j(q^{-1}) B_i(q^{-1}) = g^i_{j,0} + g^i_{j,1} q^{-1} + ... + g^i_{j,n_Bi+j-1} q^{-(n_Bi+j-1)}    (9.16)

Since the degree of the polynomial E_j(q^{-1}) is j - 1, the noise components E_j(q^{-1}) ξ(k+j) are all in the future and, since ξ(k) is assumed to be zero-mean, the prediction ŷ(k+j) of y(k+j) in the mean-squares sense is given by

ŷ(k+j) = G_{1,j}(q^{-1}) Δu(k+j-1) + G_{2,j}(q^{-1}) Δu²(k+j-1) + F_j(q^{-1}) y(k)    (9.17)

Notice that the prediction ŷ(k+j) depends upon: i) past and present measured outputs; ii) past known control increments; and iii) present and future control increments yet to be determined.

Let us denote by f(k+j) the component of the prediction ŷ(k+j) composed of signals known (available) at sampling instant k. For example, the expressions of f(k+1) and f(k+m) are respectively given by

f(k+1) = [G_{1,1}(q^{-1}) - g^1_{1,0}] Δu(k) + [G_{2,1}(q^{-1}) - g^2_{1,0}] Δu²(k) + F_1(q^{-1}) y(k)    (9.18)


f(k+m) = [G_{1,m}(q^{-1}) - g^1_{m,0} - g^1_{m,1} q^{-1} - ... - g^1_{m,m-1} q^{-(m-1)}] Δu(k+m-1)
       + [G_{2,m}(q^{-1}) - g^2_{m,0} - g^2_{m,1} q^{-1} - ... - g^2_{m,m-1} q^{-(m-1)}] Δu²(k+m-1)
       + F_m(q^{-1}) y(k)    (9.19)

Let us rewrite equation (9.17) for j = 1, ..., H_p in the following matrix form

Ŷ = G_1 U + G_2 U² + F    (9.20)

where

Ŷ = [ŷ(k+1)  ŷ(k+2)  ...  ŷ(k+H_p)]^T    (9.21)

U = [Δu(k)  Δu(k+1)  ...  Δu(k+H_p-1)]^T    (9.22)

U² = [Δu²(k)  Δu²(k+1)  ...  Δu²(k+H_p-1)]^T    (9.23)

F = [f(k+1)  f(k+2)  ...  f(k+H_p)]^T    (9.24)

G_i = [ g^i_{1,0}          0              ...   0
        g^i_{2,1}          g^i_{2,0}      ...   0
        ...
        g^i_{H_p,H_p-1}    g^i_{H_p,H_p-2} ...  g^i_{H_p,0} ]    (9.25)

for i = 1, 2. We denote by g^i_k the k'th column of the matrices G_i (i = 1, 2):

G_i = [ g^i_0  g^i_1  ...  g^i_{H_p-1} ]    (9.26)


Cost function

In what follows, we shall be concerned with the minimization of the following control objective

J = E{ Σ_{j=1}^{H_p} [ŷ(k+j) - w(k+j)]² + Σ_{j=1}^{H_c} r [Δu(k+j-1)]² | k }    (9.27)

which, with (9.20), yields

J = E{ (G_1 U + G_2 U² + F - W)^T (G_1 U + G_2 U² + F - W) + r U^T U }    (9.28)

where

W = [w(k+1)  w(k+2)  ...  w(k+H_p)]^T    (9.29)

Let us set

V = G_1 U + G_2 U² + F - W    (9.30)

Then (9.28) is equivalent to

J = E{ V^T V + r U^T U }    (9.31)

Minimization of cost function

To minimize this quadratic cost function, we have to calculate the gradient of J with respect to the control increments u_i and their squares u_i² (i = 0, ..., H_p - 1):

dJ/du_i = 2 V^T dV/du_i + 2 r u_i    (9.32)

From (9.30), it follows that the gradient is given by

dV/du_i = g^1_i + 2 u_i g^2_i    (9.33)

which leads to

dJ/du_i = 2 V^T g^1_i + 4 u_i V^T g^2_i + 2 r u_i    (9.34)


The partial derivative of the criterion J with respect to U,

∂J/∂U = [ ∂J/∂u_0  ∂J/∂u_1  ...  ∂J/∂u_{H_p-1} ]    (9.35)

may be written as

∂J/∂U = [ 2V^T g^1_0  ...  2V^T g^1_{H_p-1} ] + [ 4u_0 V^T g^2_0  ...  4u_{H_p-1} V^T g^2_{H_p-1} ] + [ 2r u_0  ...  2r u_{H_p-1} ]    (9.36)

This gradient can be rewritten in a more compact form as

∂J/∂U = 2 V^T G_1 + 4 V^T G_2 diag(U) + 2 r U^T    (9.37)

The current control law is given as the first element of the vector U, obtained by equating the gradient ∂J/∂U to zero, i.e.,

∂J/∂U = 2 V^T G_1 + 4 V^T G_2 diag(U) + 2 r U^T = 0    (9.38)

The complexity of this expression makes it intractable to calculate the optimal control increment. It is not easy to derive the analytical expression of the current control. In the sequel, we shall present Newton's method for the computation of this current control. For this purpose, let us calculate the elements of the Hessian

∂²J/∂U² = [ ∂²J/∂u_i ∂u_j ],  i, j = 0, ..., H_p - 1    (9.39)

From (9.32), we derive

∂²J/∂u_i ∂u_j = ∂/∂u_i [ 2 V^T ∂V/∂u_j + 2 r u_j ]    (9.40)

By (9.33), we obtain

∂²J/∂u_i ∂u_j = ∂/∂u_i [ 2 V^T g^1_j + 4 u_j V^T g^2_j ] + 2 r δ_ij    (9.41)
              = 2 (∂V/∂u_i)^T [ g^1_j + 2 u_j g^2_j ] + 4 δ_ij V^T g^2_j + 2 r δ_ij    (9.42)

where δ_ij represents the Kronecker symbol.


Again, taking (9.33) into account, we deduce that

∂²J/∂u_i ∂u_j = 2 (g^1_i + 2 u_i g^2_i)^T (g^1_j + 2 u_j g^2_j) + 4 δ_ij V^T g^2_j + 2 r δ_ij

Finally, the second partial derivative of the criterion J with respect to U is given by

∂²J/∂U² = 2 G_1^T G_1 + 4 diag(U) [G_2^T G_1] + 4 [G_1^T G_2] diag(U) + 8 diag(U) [G_2^T G_2 diag(U)] + 4 diag(V^T G_2) + 2 r I    (9.47)

The optimal control increment can be obtained iteratively on the basis of the Hessian (9.47).
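
A sketch of such a Newton iteration, using the gradient (9.37) and the Hessian (9.47), is given below; the matrices G_1, G_2, F and W are random stand-ins here, whereas in the controller they come from the prediction equations (9.20)-(9.26).

```python
# Newton iteration sketch for the second-order Hammerstein GPC:
# minimize J(U) = ||G1 U + G2 (U*U) + F - W||^2 + r U^T U.
import numpy as np

def newton_control(G1, G2, F, W, r=0.1, iters=20):
    Hp = G1.shape[1]
    U = np.zeros(Hp)
    for _ in range(iters):
        D = np.diag(U)
        V = G1 @ U + G2 @ (U * U) + F - W
        grad = 2 * G1.T @ V + 4 * D @ (G2.T @ V) + 2 * r * U
        hess = (2 * G1.T @ G1 + 4 * G1.T @ G2 @ D + 4 * D @ G2.T @ G1
                + 8 * D @ G2.T @ G2 @ D + 4 * np.diag(G2.T @ V)
                + 2 * r * np.eye(Hp))
        U = U - np.linalg.solve(hess, grad)        # Newton step
    return U[0]                                    # current control increment

rng = np.random.default_rng(1)
Hp = 5
du = newton_control(0.3 * rng.normal(size=(Hp, Hp)) + np.eye(Hp),
                    0.1 * rng.normal(size=(Hp, Hp)),
                    rng.normal(size=Hp), np.ones(Hp))
```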

Even if the considered model (second order Hammerstein model) is relatively simple, the development of the long-range predictive control strategy requires many calculations which alter considerably the inherent robustness of long-range predictive controllers.

The next section is dedicated to the development of predictive control strategies for both unconstrained and constrained systems. This development is based on stochastic approximation techniques and sigmoid neural networks. Any non-linear basis function approach can also be used.

9.3 Control of non-linear systems

Several studies on the use of SNN (sigmoid neural networks) as the basis for model predictive controllers (with finite prediction horizons) have been published [82][77][91][81]. To avoid the use of locally linearized models of the process to be controlled (the classical GPC approach), and complex optimization techniques, we present a simple solution for deriving long-range predictive control based on SNN [66]². The design of this solution is based on the training of two dynamic neural networks. The NOE and the NARX neural networks are respectively used as a multi-step predictor and for calculating the control signal (neural controller). The multi-layer feedforward SNN is trained so as to achieve the control objective. The main idea presented in this study concerns the use of stochastic recursive approximation techniques as a learning tool for the design of neural network controllers to solve both unconstrained and constrained predictive control problems (minimization of a long-range quadratic cost function and preventing violations of process constraints). The control approach described below is general and does not depend on the structure of the control objective and the constraints.

²This section is based on K. Najim, A. Rusnak, A. Meszaros and M. Fikar. Constrained Long-Range Predictive Control Based on Artificial Neural Networks. International Journal of Systems Science, 28(12): 1211-1226, 1997. Reproduced with permission from Taylor & Francis Ltd.

9.3.1 Predictive control

The formulation is based on a NARIMAX model, and on the minimization of the conditional expectation of a quadratic function measuring the control effort and the distance between the predicted system output and some predicted reference sequence over the receding horizon, i.e.

J = E{ Σ_{j=H_m}^{H_p} (w(k+j) - y(k+j))² + r Σ_{j=1}^{H_p} (Δu(k+j-1))² | k }    (9.48)

where y, Δu, and w are the controlled variable, future control increments, and set point, respectively. H_m and H_p are, respectively, the minimum and the maximum prediction horizon. The weighting factor r serves for penalization of the future control increments Δu.

In what follows, the use of neural networks (see Chapter 5) for prediction and control is considered.

9.3.2 Sigmoid neural networks

Consider the problem concerning the design of an algorithm which at time k predicts simultaneously the outcomes of the process {y(k)} at times k+1, k+2, ..., k+H_p, where H_p is the prediction horizon. NARX SNNs, using a one-step-ahead structure, generally perform poorly over a trajectory (prediction horizon) because errors are amplified when inaccurate network outputs are recycled to the input layer. Recurrent SNNs are more appropriate for application in model predictive control [94], and in this study we have used a NOE SNN as a multi-step predictor. Unlike NARX SNNs, where information flows only from the input layer to the outputs, recurrent SNNs include delayed information flow back to preceding layers. An NOE SNN predictor is depicted in Fig. 9.14. The SNN inputs consisted of the previous and current values of process inputs and predicted plant outputs. The values of process outputs come into the NOE SNN only indirectly in the process of training, when the future predicted output is compared with the actual process output. The training of this SNN predictor was carried out using a backpropagation through time algorithm [94].

A multilayer feedforward SNN was used as a controller. The proposed structure of this SNN controller is depicted in Fig. 9.15. The controller inputs consisted of the plant predictions, which are provided by the predictor, and the desired value of the plant output. The outputs correspond to the present and future increments of the control signal. The weights of this SNN controller were updated directly using a stochastic approximation algorithm, which minimizes the control objective (9.48) subject to constraints (any control objective can be considered in this control approach). These weights are considered as the controller parameters. Since the control action is based on the prediction of the plant behavior, offset can occur due to disturbances and model mismatch when the SNN is used as a dynamic model of the controlled process. Therefore the plant output is predicted at each sampling time as follows:

y(k) = f(U, θ) + d(k)    (9.49)

where U is the SNN input vector, θ is a vector of the weights to be optimized, and d is a disturbance. This disturbance (correction) of the prediction is computed by the following equation:

d(k+i) = d(k) = y(k) - ŷ(k)    (9.50)

where y is the current value of the plant output and ŷ is the prediction of y generated by the SNN predictor. The disturbance is assumed to be constant over the prediction horizon.

The general schematic diagram of the predictive controller is depicted in Fig. 9.16. At each sampling period, the following signals are fed into the SNN predictor:

• past and present plant outputs,

• past values of control actions applied to the process, and

• the calculated future control sequence from the last sampling period.

The predictor calculates predictions of the plant outputs over the relevant horizon. These are corrected with the calculated deviation between the actual process output and the predicted process output at time k. Next the predictions are fed, together with the set point value (or sequence of future set point values for programmed set points), into the SNN controller. This minimizes the criterion (9.48), constructs the future control increments, and closes the inner loop. This procedure is repeated until the future control increments converge. The algorithm used for training this neural network controller is described in the following section.

Figure 9.14: Structure of a sigmoid neural network (SNN) predictor.

Figure 9.15: Structure of a SNN controller.

Figure 9.16: Structure of a predictive control system using sigmoid neural networks.
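
The receding-horizon loop can be sketched as follows; the predictor and controller below are simple stand-in functions (not trained SNNs), a single pass of the inner loop is made, and the offset correction follows (9.49)-(9.50).

```python
# Schematic receding-horizon loop: predict over the horizon, correct with a
# constant disturbance estimate d(k), map predictions + set point to control
# increments, apply only the first increment.
import numpy as np

def predictor(u_seq, y_hist):                  # stand-in for the NOE SNN
    y, out = y_hist[-1], []
    for u in u_seq:
        y = 0.8 * y + 0.2 * np.tanh(u)         # toy recurrent prediction
        out.append(y)
    return np.array(out)

def controller(pred, w):                       # stand-in for the SNN controller
    return 0.5 * (w - pred)                    # future control increments

Hp, w = 5, 0.4
u_seq, u_prev = np.zeros(Hp), 0.0
y_meas, y_hat_now = 0.0, 0.0
for k in range(30):
    d = y_meas - y_hat_now                          # offset estimate (9.50)
    y_hat = predictor(u_prev + np.cumsum(u_seq), [y_meas]) + d
    u_seq = controller(y_hat, w)                    # future control increments
    u_prev += u_seq[0]                              # apply only the first one
    y_hat_now = predictor([u_prev], [y_meas])[0]    # model estimate of y(k+1)
    y_meas = 0.8 * y_meas + 0.2 * np.tanh(u_prev) + 0.02   # "plant" with bias
```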

9.3.3 Stochastic approximation

In this section our main emphasis will be on stochastic approximation techniques [44]. The synthesis of the neural network controller is formulated as the determination of its associated weights which minimize the unconstrained (constrained) control objective function J, i.e.,

θ = arg min_θ J(θ)    (9.51)

where θ is the weight vector. The optimization problem (9.51) can be solved using stochastic approximation techniques.

A key feature of many practical control problems is the presence of constraints on both controlled and manipulated variables. Inequality constraints arise commonly in process control problems due to physical limitations of plant equipment. For example, the control objective may be to minimize some quadratic cost function while satisfying constraints on product quality and avoiding undesirable operating regimes (flooding in a liquid-liquid extraction column, etc.).


Let us consider the following constrained optimization problem:

min_θ J_0(θ)    (9.52)

under the constraints

J_i(θ) ≤ 0,  (i = 1, ..., m)    (9.53)

J_0(θ) is associated with the control objective defined by (9.48). The constraints J_i(θ) ≤ 0 (i = 1, ..., m) are usually associated with physical limitations of the process (valves, reactor volume, etc.).

Let us introduce the Lagrange function [93] defined by

L(θ, Φ) = J_0(θ) + Σ_{j=1}^{m} φ_j J_j(θ)    (9.54)

where Φ = [φ_1, ..., φ_m]^T is the Kuhn-Tucker vector. To solve the optimization problem (9.52)-(9.53), an iterative algorithm based on stochastic approximation techniques was proposed by Walk [93]. This algorithm simultaneously maximizes L(θ, Φ) with respect to Φ and minimizes it with respect to θ. It is presented below.

Let the estimates (θ_k, Φ_k) be available at time k, where θ_k is a P-dimensional random vector and Φ_k = (φ_{k,i}), i = 1, ..., m, is an m-dimensional random vector with φ_{k,i} ≥ 0 (k ∈ N) on a probability space (Ω, A, P). Let a_k, c_k (k ∈ N) be real positive sequences tending to zero and satisfying:

Σ_k a_k² c_k^{-2} < ∞,   Σ_k a_k = ∞,   Σ_k a_k c_k < ∞    (9.55)

The observation noise (the contamination of function values) is modeled by square-integrable real random variables ε^i_{k,l} (i = 0, ..., m; l = 1, ..., P; k ∈ N). The optimization algorithm is given by [93]:

Φ_{k+1} = Φ_k + a_k D_Φ L(θ_k, Φ_k)    (9.56)

where

(D_Φ L(θ_k, Φ_k))_i = max{ J_i(θ_k), -φ_{k,i}/a_k },  (i = 1, ..., m)    (9.57)

and

θ_{k+1} = θ_k - a_k D_θ L(θ_k, Φ_k)    (9.58)


where

(D_θ L(θ_k, Φ_k))_l = (2c_k)^{-1} [ J_0(θ_k + c_k e_l) - J_0(θ_k - c_k e_l) + ε^0_{k,l} ] + Σ_{i=1}^{m} φ_{k,i} (2c_k)^{-1} [ J_i(θ_k + c_k e_l) - J_i(θ_k - c_k e_l) + ε^i_{k,l} ]    (9.59)

e_l is a P-dimensional null vector with 1 as its l'th coordinate (l = 1, ..., P). With the σ-algebra F_k defined as follows

F_k = σ( θ_1, Φ_1, ..., θ_{k-1}, Φ_{k-1}, ε^i_{1,l}, ..., ε^i_{k-1,l} (i = 0, ..., m; l = 1, ..., P) )    (9.60)

it is assumed that, conditionally on F_k, the noise variables ε^i_{k,l} have zero mean and bounded second moments (conditions (9.61)-(9.63)). Under these assumptions, it has been shown [93] that this algorithm converges almost surely to the optimal solution.

In this algorithm, the components of the gradients of the Lagrange function with respect to the weights θ and the Kuhn-Tucker parameters Φ are estimated by finite differences. The convergence of this algorithm has been proved by Walk [93], as well as a central limit theorem with the convergence order that is also achieved by the Kiefer-Wolfowitz method, to which the considered algorithm reduces if there are no constraints.
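
A rough sketch of the resulting procedure, with two-sided finite-difference estimates of the Lagrangian gradient, a descent step in the weights and a projected ascent step in the multiplier, is given below; the objective and constraint are toy functions, not the neural-network cost, and the step-size sequences are illustrative choices only.

```python
# Constrained stochastic approximation sketch (Kiefer-Wolfowitz type gradient
# estimates of the Lagrangian, with a non-negative multiplier update).
import numpy as np

def J0(th):  return float((th[0] - 2)**2 + (th[1] + 1)**2)   # objective
def J1(th):  return float(th[0] + th[1] - 1.0)               # constraint <= 0

theta, phi = np.zeros(2), 0.0
for k in range(1, 2001):
    a_k, c_k = 0.3 / k**0.8, 0.01 / k**0.25      # step sizes tending to zero
    L = lambda th: J0(th) + phi * J1(th)         # Lagrange function (9.54)
    grad = np.zeros_like(theta)
    for l in range(theta.size):                  # two-sided finite differences
        e = np.zeros_like(theta)
        e[l] = c_k
        grad[l] = (L(theta + e) - L(theta - e)) / (2 * c_k)
    theta = theta - a_k * grad                   # descent in the weights
    phi = max(0.0, phi + a_k * J1(theta))        # ascent, kept non-negative
```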

To demonstrate the performance and the feasibility of this approach, we applied it to control a continuous-flow, stirred biochemical reactor and a fixed-bed tubular chemical reactor.

9.3.4 Control of a fermenter

Control problems in biotechnological processes have gained increasing interest because of the great number of applications, mainly in the pharmaceutical industry and biological depollution [62]. We considered a model of a continuous-flow, stirred biochemical reactor (fermenter). This model, which describes the growth of Saccharomyces cerevisiae on glucose with continuous feeding, was adopted from [58]. It is based on a hypothesis in [89]: a limited oxidation capacity leading to formation of ethanol under conditions of oxygen limitation or an excessive glucose concentration.


Process

The dynamic model is derived from mass balance considerations. It is described by the following equations. Cell mass concentration:

dc_x/dt = (μ - D_l) c_x    (9.64)

glucose (substrate) concentration:

dc_s/dt = D_l(c_{s,in} - c_s) - Q_s c_x    (9.65)

ethanol (product) concentration:

dc_e/dt = D_l(c_{e,in} - c_e) + (Q_{e,pr} - Q_e) c_x    (9.66)

carbon dioxide concentration:

dc_c/dt = D_g(c_{c,in} - c_c) + Q_c c_x    (9.67)

dissolved oxygen concentration:

dc_o/dt = D_l(c_{o,in} - c_o) + N_a - Q_o c_x    (9.68)

gas phase oxygen concentration:

dc_g/dt = D_g(c_{g,in} - c_g) - N_a    (9.69)

where

D_l = q_l / V_l   and   D_g = q_g / V_g    (9.70)

The mathematical description of the kinetic model mechanisms is given in Table 9.3. The model parameters are given in Table 9.4. The initial conditions for equations (9.64)-(9.69) are given in Table 9.5.

The fermenter model was simulated using the Runge-Kutta method. The bioprocess static behavior is depicted in Fig. 9.17. One can get information about the non-linear behavior of the bioprocess by looking at the variation of the steady-state gain depicted in Fig. 9.17.
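
A fixed-step fourth-order Runge-Kutta integrator of the kind used for such simulations can be written in a few lines; the right-hand side below is a toy two-state stand-in, not the model (9.64)-(9.69).

```python
# Minimal fixed-step Runge-Kutta (RK4) integrator with a toy right-hand side.
import numpy as np

def rk4_step(rhs, x, t, dt, u):
    k1 = rhs(t, x, u)
    k2 = rhs(t + dt / 2, x + dt / 2 * k1, u)
    k3 = rhs(t + dt / 2, x + dt / 2 * k2, u)
    k4 = rhs(t + dt, x + dt * k3, u)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rhs(t, x, u):
    # toy 2-state "bioreactor": dilution u washes the states toward the inlet
    return np.array([u * (1.0 - x[0]) - 0.5 * x[0] * x[1],
                     u * (0.0 - x[1]) + 0.3 * x[0] * x[1]])

x = np.array([0.5, 0.1])
for k in range(1000):
    x = rk4_step(rhs, x, t=k * 0.01, dt=0.01, u=0.8)
```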

The experiments described here illustrate the use of the predictive control algorithm when applied to controlling the dissolved oxygen concentration, c_o(t), using the dilution rate, D_g(t), as the manipulated variable.


Table 9.3: The kinetic model mechanism description (rate expressions for glucose uptake, oxidation capacity, oxidative and reductive glucose metabolism, ethanol uptake, oxidative ethanol metabolism, ethanol production, growth, carbon dioxide production, oxygen consumption and oxygen transfer, together with the maximum consumption rates and the induction/repression factors).

Table 9.4: Model parameters (saturation constants, maximum consumption rates, yield coefficients, time constants and nominal operating values). 1 C-mol of biomass has the composition CH_{1.83}N_{0.17}O_{0.56}.


Table 9.5: Bioprocess initial conditions (initial values of the six concentrations in equations (9.64)-(9.69)).

Figure 9.17: Static behavior of the dissolved oxygen c_o [mol/l] (left) and the static gain (right) as a function of the dilution rate D_g.

The structure of the SNN predictor used was [6, 5, 1]: six neurons in the input layer with inputs [y(k-1), y(k-2), y(k-3), u(k), u(k-1), u(k-3)], five neurons in the hidden layer and one neuron in the output layer. The sampling period was set equal to 0.5 hours. The training set contained 600 input-output pairs. The structure of the SNN controller used was [14, 8, 4], and the inputs consisted of the predictions of the process behavior obtained using the SNN predictor.

Apart from the predictor and controller parametrization (i.e. choice of the number of nodes), there still remain a few design parameters that must be specified a priori, i.e. the prediction horizon, the control weighting factor, and the values of the parameters involved in the stochastic approximation algorithm. The following choices were made:

• the horizons related to the control objective were set equal to H_m = 1, H_p = 13.

• the weighting factor r in the control objective was fixed to r = 0.1.

• the values of the parameters involved in the stochastic approximation algorithm were fixed to a_k = 0.3, c_k = 0.01.


Notice that a_k must decrease in order to remove the influence of disturbances, according to (9.55). In the noise-free case, a_k can be either constant or a decreasing sequence that converges to a constant value.

Control simulations

For the first set of tests, the future reference was considered to be a known square wave, as shown in the upper graph of Fig. 9.18. Figure 9.19 gives an enlargement of Fig. 9.18 for the first 200 hours of the simulation run. The lower graph of Fig. 9.18 represents the control signal u(k) derived from the predictive control calculation. Figure 9.18 and Figure 9.19 show the performance of the control. It can be seen that both steady-state and transient behavior are satisfactory. The variation of the control variable is very smooth. Notice that the set point changes led to the variation of the bioprocess dynamics (steady-state gain change, etc.).

The second experiment considers level constraints on the input. The following constraints on the manipulated variable, the dilution rate D_g, were used: D_{g,min} ≤ D_g(k) ≤ D_{g,max}, with D_{g,min} = 0.4, D_{g,max} = 0.85. Figure 9.20 shows the evolution of the dissolved oxygen concentration, c_o(k), and the dilution rate, D_g(k). The evolution of the bioprocess output as well as the control variable for the first 200 hours of operation are depicted in Fig. 9.21. These simulation results show that the bioprocess operates well under the constrained control.

In the third experiment, a rate constraint on the input was considered, |ΔD_g(k)| ≤ 0.1. Figure 9.22 shows the evolution of the dissolved oxygen concentration c_o(k) and the dilution rate D_g(k). Figure 9.23 gives an enlargement of Fig. 9.22 for the first 200 hours of this simulation run. There are a large number of set point changes. Some of them occur randomly (see Fig. 9.22 for k ≥ 200 hours). Due to the fulfillment of the constraint, the control signal has no large variations; thus it corresponds to industrial requirements.

In practice, it is impossible to obtain perfect measurements and uniform dissolved oxygen concentrations. We therefore introduced measurement noise ξ(k) with zero mean and variance equal to 0.02. Figure 9.24 shows the dissolved oxygen concentration and the dilution rate. The control action is smooth. These results demonstrate that the presented control algorithm has good regulation and tracking properties. Next, we will consider the control of a chemical reactor.


Figure 9.18: Predictive control of a fermenter: unconstrained case. Top: the measured and desired dissolved oxygen, c_o, as a function of time [h]. Bottom: the manipulated variable, D_g.


Figure 9.19: Enlargement of Fig. 9.18 for the first 200 hours of operation.


Figure 9.20: Predictive control of a fermenter: level constraint on the manipulated variable, 0.4 ≤ D_g ≤ 0.85. Top: the measured and desired dissolved oxygen, c_o, as a function of time [h]. Bottom: the manipulated variable, D_g.


Figure 9.21: Enlargement of Fig. 9.20 for the first 200 hours of operation.


Figure 9.22: Predictive control of a fermenter: rate constraint on the manipulated variable, |ΔD_g| ≤ 0.1. Top: the measured and desired dissolved oxygen, c_o, as a function of time [h]. Bottom: the manipulated variable, D_g.


Figure 9.23: Enlargement of Fig. 9.22 for the first 200 hours of operation.


Figure 9.24: Predictive control of a fermenter: noisy measurements. Top: the measured and desired dissolved oxygen, c_o, as a function of time [h]. Bottom: the manipulated variable, D_g.


9.3.5 Control of a tubular reactor

In a second example we consider the temperature control in a tubular chemical reactor. A tubular reactor is a significant and widely used piece of equipment in chemical technology. The object of our study is such a reactor with a fixed-bed catalyst and cooling. Efficient control of this type of process is often hampered by its highly non-linear behavior and hazardous operating conditions.

Assuming that j_1 reversible exothermic first-order reactions take place in the reactor and some specified simplifying circumstances hold, a structured non-linear mathematical model of the process can be developed [59]. The model was proposed on the basis of both mass and heat balances and its final form is given by a set of non-linear hyperbolic partial differential equations, as follows. Mass balance for the i'th component:

∂c_i/∂t + u ∂c_i/∂z = - Σ_{j=1}^{j_1} r_{ij}(c_i, T_k)    (9.71)

Energetic balance of the reactant mixture:

∂T_g/∂t + u ∂T_g/∂z = A_1(T_k - T_g) - A_2(T_g - T_w)    (9.72)

Energetic balance of the catalyst:

∂T_k/∂t = B_1 [ Σ_{j=1}^{j_1} (-ΔH_j) r_j - B_2(T_k - T_g) - B_3(T_k - T_w) ]    (9.73)

Energetic balance of the reactor's wall:

∂T_w/∂t = C_1 [ C_2(T_k - T_w) + C_3(T_g - T_w) - C_4(T_w - T_c) ]    (9.74)

The initial and boundary conditions are:

c_i(z, 0) = c_is(z);   T_g(z, 0) = T_gs(z);   T_k(z, 0) = T_ks(z)    (9.75)

T_w(z, 0) = T_ws(z);   c_i(0, t) = c_i0(t);   T_g(0, t) = T_g0(t)    (9.76)

Symbols used stand for the following physical quantities: c_i - concentration of the i'th component, t - dimensionless time, z - dimensionless spatial variable, r_{ij} - rate of chemical reactions (i'th component in j'th reaction), T_g - reactant mixture temperature, T_w - wall temperature, T_k - catalyst temperature, T_c - coolant temperature, and the coefficients A_1, A_2, B_1-B_3, C_1-C_4 include the technological parameters of the reactor.


For simulation, two reactions of the following rates were considered:

r_1(c, T_k) = 8.7 × 10³ exp(-20000 / (1.98 T_k)) c    (9.77)

r_2(c, T_k) = 4.57 × 10⁵ exp(-19800 / (1.98 T_k)) c    (9.78)

This case corresponds to ethylene oxide production in an industrial scale reactor. The parameters involved take the following values:

A_1 = 51.356307;   A_2 = 23.796894;   B_1 = 0.000614;    (9.79)

B_1 B_2 = 2.301454;   B_1 B_3 = 0.266606;   C_1 C_2 = 0.080613;    (9.80)

C_1 C_3 = 0.322451;   C_1 C_4 = 1.048619    (9.81)

For simulation, the following values of the initial and boundary conditions were considered:

c_is(z) = 0.015037 kmol m⁻³;   T_gs(z) = 522.266948 K    (9.82)

T_ks(z) = 526.844683 K;   T_ws(z) = 514.216915 K    (9.83)

c_i0(t) = 0.015600 kmol m⁻³;   T_g0(t) = 499.579969 K    (9.84)

The coolant temperature was chosen as the control variable. The goal of the control was to maintain a desired profile of the gas mixture average temperature in the reactor. The partial differential equations of the model were solved by dividing the reactor into 10 segments according to the spatial variable.
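
This "divide the reactor into segments" idea (a method-of-lines discretization) can be sketched as follows: the convection term is replaced by a first-order upwind difference over N = 10 segments and the resulting ordinary differential equations are integrated with an explicit Euler step. The reaction rate below is a toy expression, not (9.77)-(9.78).

```python
# Method-of-lines sketch: upwind discretization of u * dc/dz over 10 segments.
import numpy as np

N, L, u, dt = 10, 1.0, 0.8, 0.001
dz = L / N
c = np.full(N, 0.015)                       # concentration in each segment
T = np.full(N, 520.0)                       # gas temperature in each segment

def rate(c, T):
    return 1.0e3 * np.exp(-5000.0 / T) * c  # toy first-order Arrhenius rate

for step in range(5000):
    c_in = 0.0156                           # boundary condition at z = 0
    dcdz = (c - np.concatenate(([c_in], c[:-1]))) / dz   # upwind difference
    c = c + dt * (-u * dcdz - rate(c, T))
```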

Experiments included both unconstrained and constrained cases for the control. After some 'pre-experimental' runs we found an appropriate structure for the SNN predictor and the SNN controller. The structure of the SNN predictor used was [4, 6, 1] with inputs [y(k), y(k-1), u(k), u(k-1)] and output [y(k+1)]. The sampling period was set equal to 0.5 min. The training data contained 700 input-output pairs. The structure of the SNN controller used was [14, 8, 4]. At the input of the SNN controller, the future predictions of the process behavior were applied and the controller generated the future control actions. The experiments reported here were carried out with the following choices:

• the horizons relating to the control objective were set equal to H_m = 4, H_p = 13.

• the weighting factor r in the control objective was fixed to r = 0.1.


• the values of the parameters involved in the stochastic approximation algorithm were fixed to a_k = 0.3, c_k = 0.01.

Figure 9.25: Static gain of the chemical reactor [66].

Figure 9.25 shows the non-linear steady-state (static) gain. The behavior of the controlled reactor in the unconstrained case is depicted in Fig. 9.26. The upper graph shows the reactor output, the middle graph shows the control signal. In the lower graph, the control action increments are depicted.

The behavior of the chemical reactor in the constrained case is illustrated in Fig. 9.27. In this experiment, a constraint of |ΔT_c(k)| ≤ 15 K on the control variable was considered. The description of this figure is similar to that of Fig. 9.26. From the lower graph of Fig. 9.26, it can be seen that the increments of the control signal lie in the interval [0, 100] K. On the other hand, in the constrained case, the control variable increments vary within the interval [0, 15] K and the control signal is smooth.

In this section, we presented some experiments concerning the implementation of the constrained predictive control algorithm based on neural networks. There are a number of other potential applications, since many industrial plants (chemical, mineral, etc.) are characterized by non-linear and time-varying behavior, and are subject to several kinds of disturbances.


Figure 9.26: The output, desired output and the manipulated variable for predictive control [66].


Figure 9.27: The output, desired output and the manipulated variable for the constrained predictive control with a constraint on the manipulated variable of the chemical reactor (|ΔT_c(k)| ≤ 15 K) [66].


Part III

Appendices


Appendix A

State-Space Representation

The primary purpose of this appendix is to introduce a number of concepts which are fundamental in the state-space representation and analysis of dynamic systems. We formally define and illustrate the concepts of controllability and observability.

A.1 State-space description

Let a SISO system (plant, process) be described by a state-space model

x(k+1) = Ax(k) + Bu(k)                     (A.1)
y(k) = Cx(k)                               (A.2)

where

x is the state vector (n x 1),

u is the system input (controller output) (1 x 1)

y is the system output (measured) (1 x 1)

A is the state transition matrix (n × n)

B is the input transition vector (n x 1)

C is the state observer vector (1 x n)
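As a minimal illustration of this notation (the matrices below are arbitrary, not taken from any example in this book), the following Python/NumPy sketch simulates (A.1)-(A.2) for a given input sequence.

```python
import numpy as np

def simulate(A, B, C, x0, u_seq):
    """Simulate x(k+1) = A x(k) + B u(k), y(k) = C x(k)."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for u in u_seq:
        ys.append(float(C @ x))        # y(k) = C x(k)
        x = A @ x + (B * u).ravel()    # x(k+1) = A x(k) + B u(k)
    return ys

# Arbitrary second-order SISO example (n = 2)
A = np.array([[1.5, -0.7],
              [1.0,  0.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.1, 0.05]])

print(simulate(A, B, C, x0=[0.0, 0.0], u_seq=[1.0] * 10))  # step response
```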

Remark 6 (Characteristic equation) The characteristic equation is given by

det (zI - A) = 0


Remark 7 (Number of representations) For a given system there exists no unique state-space representation. In fact, from any state representation we can obtain a new one by using any linear transformation, i.e., x̃(k) = Tx(k), where T is a non-singular matrix.

Let us next introduce two state representations, namely, the control and observer canonical forms.

A.1.1 Control and observer canonical forms

Consider a system given by a transfer polynomial

y(k) = [B(q^-1) / A(q^-1)] u(k)                     (A.3)

where B(q^-1) = b1 q^-1 + ... + bn q^-n and A(q^-1) = 1 + a1 q^-1 + ... + an q^-n.

For notational convenience, without loss of generality, we assume that the polynomials are all of order n. The control canonical form

xc(k+1) = Ac xc(k) + Bc u(k)                     (A.4)
y(k) = Cc xc(k)                                  (A.5)

is obtained by substituting

Ac = [ -a1  -a2  ...  -a_{n-1}  -a_n ]
     [  1    0   ...    0        0   ]
     [  0    1   ...    0        0   ]
     [  :    :          :        :   ]
     [  0    0   ...    1        0   ]            (A.6)

Bc = [ 1  0  ...  0 ]^T                           (A.7)

Cc = [ b1  b2  ...  bn ]                          (A.8)

The key idea is that the elements of the first row of Ac are exactly the coefficients of the characteristic polynomial of the system (the matrix Ac is


known as the companion matrix of the polynomial z^n + a1 z^{n-1} + ... + an; see [64]).

Similarly, the observer canonical form

xo(k+1) = Ao xo(k) + Bo u(k)                     (A.9)
y(k) = Co xo(k)                                  (A.10)

is obtained by substituting

Ao = [ -a1       1  0  ...  0 ]
     [ -a2       0  1  ...  0 ]
     [  :        :  :       : ]
     [ -a_{n-1}  0  0  ...  1 ]
     [ -a_n      0  0  ...  0 ]                   (A.11)

Bo = [ b1  b2  ...  bn ]^T                        (A.12)

Co = [ 1  0  ...  0 ]                             (A.13)

A.2 Controllability and observability

We will now consider two fundamental notions concerning dynamic systems that are represented in state-space form. The first is whether it is possible to transfer (drive, force) a system from a given initial state to any other arbitrary final state. The second is how we can observe (determine) the state of a given system if the only available information consists of input and output measurements. These concepts were introduced by Kalman as controllability and observability.

Definition 15 (Controllability) The system (A.1) is said to be completely state controllable (reachable), or simply controllable, if it is possible to find a control sequence which steers it from any initial state x(ki) at any instant ki to any arbitrary final state x(kf) at any instant kf > ki ≥ 0. Otherwise, the system is said to be uncontrollable.


Definition 16 (Observability) The system (A.1) is said to be completely state observable, or simply observable, if and only if the complete state of the system can be determined over any finite time interval [ki, kf] from the available input and output measurements over the time interval [ki, kf] with kf > ki ≥ 0. Otherwise, the system is said to be unobservable.

The following theorems state the conditions under which a given system is controllable or observable.

Theorem 2 (Controllability) The system (A.1)-(A.2) is controllable if and only if the controllability matrix defined by

[ B  AB  A^2 B  ...  A^{n-1} B ]                     (A.14)

has rank n.

Theorem 3 (Observability) The system (A.1)-(A.2) is observable if and only if the observability matrix defined by

[ C        ]
[ CA       ]
[ CA^2     ]
[  :       ]
[ CA^{n-1} ]                                          (A.15)

has rank n.
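Numerically, the rank conditions of Theorems 2 and 3 can be checked as in the following sketch (Python/NumPy; the example system is arbitrary).

```python
import numpy as np

def controllability_matrix(A, B):
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])

def observability_matrix(A, C):
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

def is_controllable(A, B):
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]

def is_observable(A, C):
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]

# Arbitrary example
A = np.array([[1.5, -0.7], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.1, 0.05]])
print(is_controllable(A, B), is_observable(A, C))   # True True
```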

In order to illustrate the usefulness of the controllability notion and the state transformation, we shall consider two applications: a feedback controller based on the pole placement control approach, and the reconstruction of the state variables.

A.2.1 Pole placement

The design of state-space feedback controllers can be seen as consisting of two independent steps: the design of the control law, and the design of an observer [21]. The final control algorithm will consist of a combination of the control law and the estimator, with control-law calculations based on the estimated states.

The state feedback control law is simply a linear combination of the components of the state vector

u(k) = -Kx(k)                                     (A.16)


where K = [k1, ..., kn]. Substituting this into the state equation (A.1) gives

x(k+1) = Ax(k) - BKx(k)                           (A.17)

and its z-transform is given by

(zI - A + BK) x(z) = 0                            (A.18)

with a characteristic equation

det (zI - A + BK) = 0                             (A.19)

In pole placement, the control-law design consists of finding the elements of K so that the roots of (A.19), i.e. the poles of the closed-loop system, are in the desired locations. The desired characteristic equation is given by

P(z) = z^n + α_{n-1} z^{n-1} + ... + α1 z + α0 = 0            (A.20)

In order to do this, let us consider a linear transformation defined by a matrix M

x = M x̃                                           (A.21)

The system equations then become

x̃(k+1) = M^-1 A M x̃(k) + M^-1 B u(k)              (A.22)
y(k) = C M x̃(k)
u(k) = -K M x̃(k)                                  (A.23)

Let us introduce the following notations [64]:

Ã = M^-1 A M                                      (A.24)
B̃ = M^-1 B                                        (A.25)
K̃ = K M                                           (A.26)

where Ã and B̃ have the following form

Ã = [  0    1    0   ...   0         ]          [ 0 ]
    [  0    0    1   ...   0         ]          [ 0 ]
    [  :    :              :         ]    B̃ =   [ : ]          (A.27)
    [  0    0    0   ...   1         ]          [ 0 ]
    [ -a0  -a1  -a2  ...  -a_{n-1}   ]          [ 1 ]


The matrix Ã is a companion matrix. The characteristic polynomial is given by

det (zI - Ã) = z^n + a_{n-1} z^{n-1} + ... + a1 z + a0            (A.28)

Let us denote the matrix M as follows:

M = [ m1, m2, ..., mn ]                           (A.29)

where mi (i = 1, ..., n) represents the columns of the matrix M. We obtain from (A.24)

M Ã = A M                                         (A.30)

that is,

m_{n-1} - a_{n-1} m_n = A m_n
m_{n-2} - a_{n-2} m_n = A m_{n-1}
   :
m_1 - a_1 m_n = A m_2
-a_0 m_n = A m_1                                                  (A.31)

from which

m_{n-1} = (A + a_{n-1} I) m_n
m_{n-2} = (A^2 + a_{n-1} A + a_{n-2} I) m_n
   :
m_1 = (A^{n-1} + a_{n-1} A^{n-2} + ... + a_2 A + a_1 I) m_n
0 = (A^n + a_{n-1} A^{n-1} + ... + a_1 A + a_0 I) m_n             (A.32)

The equation (A.25) leads to

B = M B̃                                                           (A.33)

Hence,

m_n = B                                                           (A.34)

From (A.32), we derive

m_{n-1} = (A + a_{n-1} I) B
m_{n-2} = (A^2 + a_{n-1} A + a_{n-2} I) B
   :
m_1 = (A^{n-1} + a_{n-1} A^{n-2} + ... + a_2 A + a_1 I) B         (A.35)


The inverse of the matrix M exists if and only if

rank (B, AB, ..., A^{n-1} B) = n                                  (A.36)

This corresponds to the controllability condition. Based on the desired characteristic equation (A.20), we derive

det (zI - Ã + B̃K̃) = z^n + α_{n-1} z^{n-1} + ... + α1 z + α0       (A.37)

We have

Ã - B̃K̃ = [  0    1    0   ...   0         ]
          [  0    0    1   ...   0         ]
          [  :    :              :         ]                      (A.38)
          [  0    0    0   ...   1         ]
          [ -α0  -α1  -α2  ...  -α_{n-1}   ]

We also have

B̃K̃ = [  0         0         ...   0                 ]
      [  :         :               :                 ]
      [  0         0         ...   0                 ]            (A.39)
      [ α0 - a0   α1 - a1    ...  α_{n-1} - a_{n-1}  ]

K̃ = [ k̃0  k̃1  ...  k̃_{n-1} ]                                      (A.40)

It then follows in view of (A.39) and (A.40)

k̃_i = α_i - a_i                                                   (A.41)

(i = 0, ..., n - 1) and K = K̃ M^-1.
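The derivation translates directly into a small algorithm (essentially Ackermann's formula). The sketch below (Python/NumPy; an illustration with an arbitrary second-order system, not code from the book) computes K following (A.29)-(A.41) and verifies the closed-loop poles.

```python
import numpy as np

def place_poles_siso(A, B, desired_poles):
    """State-feedback gain K such that eig(A - B K) = desired_poles."""
    n = A.shape[0]
    B = B.reshape(n, 1)
    # a = [a0, a1, ..., a_{n-1}] of det(zI - A) = z^n + a_{n-1} z^{n-1} + ... + a0
    a = np.poly(A)[1:][::-1]
    alpha = np.poly(desired_poles)[1:][::-1]
    # Columns of M: m_n = B, then m_{j-1} = A m_j + a_{j-1} B  (from (A.31))
    M = np.zeros((n, n))
    M[:, n - 1] = B.ravel()
    for j in range(n, 1, -1):
        M[:, j - 2] = A @ M[:, j - 1] + a[j - 1] * B.ravel()
    K_tilde = alpha - a                  # (A.41): ktilde_i = alpha_i - a_i
    return K_tilde @ np.linalg.inv(M)    # K = Ktilde M^-1

A = np.array([[1.5, -0.7], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
K = place_poles_siso(A, B, desired_poles=[0.5, 0.4])
print(np.linalg.eigvals(A - B @ K.reshape(1, -1)))   # -> 0.5, 0.4
```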

The ideas behind this control design can also be used in the context of state observation, which will be considered next.


A.2.2 Observers

The problem of determining the states of a given system arises in many contexts, e.g., for purposes of control, soft sensor development, diagnosis, etc. The state vector can be directly calculated from the available measurements (input-output data). From (A.1)-(A.2), we obtain

y(k-n+1) = C x(k-n+1)
y(k-n+2) = CA x(k-n+1) + CB u(k-n+1)
   :
y(k) = CA^{n-1} x(k-n+1) + CA^{n-2} B u(k-n+1) + ... + CB u(k-1)            (A.42)

These equations can be conveniently arranged in matrix form as, say,

[ y(k-n+1)  y(k-n+2)  ...  y(k) ]^T = Q x(k-n+1) + T [ u(k-n+1)  u(k-n+2)  ...  u(k-1) ]^T      (A.43)

in which T is a known matrix collecting the terms CB, CAB, ..., CA^{n-2}B of (A.42),

where

Q = [ C        ]
    [ CA       ]
    [ CA^2     ]
    [  :       ]
    [ CA^{n-1} ]                                                  (A.44)

This matrix is nothing else than the observability matrix. It is clear that in order to calculate the state x(k-n+1), this matrix must not be singular (these developments represent the proof of Theorem 3). In other words, the non-singularity of the matrix Q is crucial to the problem of observing the states. This state reconstruction approach has the drawback that it may be sensitive to disturbances [5].

Another approach for state reconstruction is based on the use of a dynamic system. Let us consider the following observer for the system states

x̂(k+1) = A x̂(k) + B u(k) + L [y(k) - C x̂(k)]                     (A.45)

based on the system model and a correction term with gain L = [l1, l2, ..., ln]^T. The estimation error is given by

x̃(k+1) = x(k+1) - x̂(k+1) = [A - LC] x̃(k)                         (A.46)


Thus, if the matrix [A - LC] represents an asymptotically stable system, x̃(·) will converge to zero for any initial error x̃(0).

For the design of the gain L, we can use the same approach as for the design of the state-feedback control law: the characteristic equation associated with the system governing the dynamics of the estimator error is

det (zI - A + LC) = 0                                             (A.47)

and should be identical to the desired estimator characteristic equation. Notice that this characteristic equation is similar to (A.37), and, therefore, the mathematical tools used for solving the state reconstruction problem are similar to those employed in the pole placement control design.
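A minimal simulation of the observer (A.45) is sketched below (Python/NumPy; the system, the gain L and the input are arbitrary illustrations, not from the book). Since the eigenvalues of A - LC lie inside the unit circle, the estimation error decays.

```python
import numpy as np

A = np.array([[1.5, -0.7], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.1, 0.05]])

# Hand-picked observer gain; in practice L is designed so that the
# roots of det(zI - A + LC) = 0 lie at desired (stable) locations.
L = np.array([[8.0], [6.0]])
print(np.abs(np.linalg.eigvals(A - L @ C)))   # all < 1 -> stable error dynamics

x = np.array([1.0, -1.0])       # true (unknown) initial state
x_hat = np.zeros(2)             # observer initial state
for k in range(20):
    u = 1.0                      # arbitrary input
    y = float(C @ x)             # measurement y(k)
    # Observer update (A.45): x_hat(k+1) = A x_hat + B u + L [y - C x_hat]
    x_hat = A @ x_hat + (B * u).ravel() + (L * (y - float(C @ x_hat))).ravel()
    x = A @ x + (B * u).ravel()  # true system (A.1)
    print(k, np.linalg.norm(x - x_hat))
```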


Appendix B

Fluidized Bed Combustion

In fluidized bed combustion (FBC), the combustion chamber contains a quantity of finely divided particles such as sand or ash. The combustion air entering from below lifts these particles until they form a turbulent bed, which behaves like a boiling fluid. The fuel is added to the bed and the mixed material is kept in constant movement by the combustion air. The heat released as the material burns maintains the bed temperature, and the turbulence keeps the temperature uniform throughout the bed.

The main purpose of an FBC plant is to generate power (energy flux, [W] = [J/s]). Several powers can be distinguished: the fuel power is the power in the fuel (heat value times feed); the combustion power is the power released in combustion (dependent on the completeness of the combustion and the furnace dynamics). Boiler power depends further on the efficiency of the heat exchangers as well as their dynamics. Often, a part of the heat is used for generating electricity in the turbines, while the remaining heat is used for the generation of steam and hot water. We can then distinguish electrical power and thermal power. Depending on the plant construction, the electrical and thermal powers are roughly of the order 40% and 0% (electrical plant), 30% and 55% (co-generating plant), or 0% and 80% (thermal plant) of the fuel power. In what follows a simplified model of a thermal plant is considered. For a more realistic modeling of the thermal power (steam mass flow), including the drum pressure, see [4].

B.1 Model of a bubbling fluidized bed

A rough model for a bubbling fluidized bed combustor can be formulated [76] based on mass and energy balances (see also [33], [35]). The model divides the furnace into two parts: the bed and the freeboard, see Fig. B.1. Combustion takes place in both: oxygen is consumed and heat is released and removed.


[Schematic: fuel feed and primary air flow enter the bed from below; secondary air flow is inserted above the bed at the throat; flue gases pass through the freeboard and the heat exchangers to the stack. Measured temperatures: bed, throat and freeboard.]

Figure B.1: A schematic drawing of a typical FBC plant. A mixture of inert/sorbent bed material and fuel is fluidized by the primary air. Complete combustion is ensured by the secondary air flow inserted from above the bed. The heat released in combustion is captured by heat exchangers and used for the generation of electricity, steam, or both.


The control inputs of the system are the fuel feed Qc [kg/s] and the primary and secondary air flows F1 and F2 [Nm^3/s]. Measurable system outputs are the flue gas O2 content CF [Nm^3/Nm^3], and the bed and freeboard temperatures TB and TF [K].

B.1.1 Bed

The solids (char) in the fuel combust in the bed. When fed to the combustor, the solids are stored in the fuel inventory (the amount of unburned char in the bed). The combustion rate QB [kg/s] depends on the availability of oxygen in the bed as well as the fuel properties:

QB(t) = [Wc(t) / tc] [CB(t) / C1]                                 (B.1)

where Wc is the fuel inventory [kg], and tc is the (average) char combustion time [s]. CB and C1 are the oxygen contents [Nm^3/Nm^3] in the bed and in the primary air, respectively.

The dynamics of the fuel inventory are given by the difference between the fraction of the fuel feed rate Qc [kg/s] that combusts in the bed and the combustion rate QB(t)

dWc(t)/dt = (1 - V) Qc(t) - QB(t)                                 (B.2)

where V is the fraction of volatiles in the fuel [kg/kg]. Combustion in the bed consumes oxygen. O2 comes into the bed in the primary air flow F1(t) [Nm^3/s], which is provided by the environment and has an oxygen content C1 = 0.21 [Nm^3/Nm^3]. O2 is consumed in the

combustion and transported to the freeboard:

dCB(t)/dt = (1/VB) [C1 F1(t) - Xc QB(t) - CB(t) F1(t)]            (B.3)

where Xc [Nm^3/kg] is the coefficient describing the amount of O2 consumed by the fuel, and VB [m^3] is the volume of the bed. As a result of combustion, heat is released. The amount of released heat depends on the heat value Hc [J/kg] of the solids in the fuel. Heat is removed from the bed by cooling water tubes. The energy balance for the bed temperature TB [K] is given by

dTB(t)/dt = (1/(cI WI)) { Hc QB(t) - aBt ABt [TB(t) - TBt] + c1 F1(t) T1 - cF F1(t) TB(t) }            (B.4)


where cI and WI are the specific heat [J kg^-1 K^-1] and mass [kg] of the bed material (inert sand), aBt and ABt are the heat transfer coefficient [W m^-2 K^-1] and surface [m^2] of the cooling tubes, and TBt is the temperature [K] of the cooling water. The incoming primary air at temperature T1 [K], with specific heat c1 [J Nm^-3 K^-1], conveys some heat into the system. The remaining air, heated to the bed temperature, is transported into the freeboard, where cF [J Nm^-3 K^-1] is the specific heat of the flue gases.

B.1.2 Freeboard

The gaseous components (volatiles) in the fuel are released and transported by the fluidizing air to the freeboard, where immediate combustion occurs. The combustion of the volatile fraction of the fuel consumes oxygen in the freeboard. Oxygen comes to the freeboard from the bed and with the secondary air flow F2 [Nm^3/s], which has the O2 content C2 [Nm^3/Nm^3]. The dynamics of the freeboard oxygen content CF [Nm^3/Nm^3] (flue gas oxygen) are given by

dCF(t)/dt = (1/VF) { CB(t) F1(t) + C2 F2(t) - Xv V Qc(t) - CF(t) [F1(t) + F2(t)] }            (B.5)

where Xv [Nm^3/kg] is the coefficient describing the amount of O2 consumed by the volatiles, and VF [m^3] is the freeboard volume. The volatiles release energy when combusted. Heat is removed from the freeboard by cooling water tubes located at the walls of the furnace. The energy balance for the freeboard temperature TF [K] is given by

dTF(t)/dt = (1/(cF VF)) { Hv V Qc(t) - aFt AFt [TF(t) - TFt] + cF F1(t) TB(t) + c2 F2(t) T2 - cF [F1(t) + F2(t)] TF(t) }            (B.6)

where aFt and AFt are the heat transfer coefficient [W m^-2 K^-1] and surface [m^2] of the cooling tubes, TFt is the temperature [K] of the cooling water, c2 is the specific heat [J Nm^-3 K^-1] of the secondary air, at temperature T2, and Hv is the heat value of the volatiles [J/kg].

B.1.3 Power

The combustion power Pc [W] is the rate of energy released in combustion

Pc(t) = Hc QB(t) + Hv V Qc(t)                                     (B.7)


and simple first order dynamics were assumed for the thermal power P [W]

dP(t)/dt = (1/Tmix) [Pc(t) - P(t)]                                (B.8)

where Tmix [s] is a time constant.
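As a sketch of how the balances (B.1)-(B.8) can be simulated (Python/SciPy; an illustration, not the authors' code), the equations are collected into a state-derivative function and integrated for constant inputs. The constants are the nominal values of Table B.1 in Section B.2; the tuning discussed there is not applied, and the initial state is arbitrary.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Nominal constants (Table B.1)
cI, WI, VB, aBt, ABt, TBt = 800.0, 25000.0, 26.3, 210.0, 26.8, 573.0
VF, aFt, AFt, TFt = 128.1, 210.0, 130.7, 573.0
C1, c1, T1, C2, c2, T2, cF = 0.21, 1305.0, 328.0, 0.21, 1305.0, 328.0, 1305.0
Xc, Hc, tc, V, Xv, Hv, Tmix = 1.886, 30e6, 50.0, 0.75, 1.225, 50e6, 300.0

def fbc_rhs(t, x, Qc, F1, F2):
    """State derivatives of the bubbling bed model (B.1)-(B.8).
    x = [Wc, CB, CF, TB, TF, P]."""
    Wc, CB, CFg, TB, TF, P = x
    QB = Wc * CB / (tc * C1)                                        # (B.1)
    dWc = (1.0 - V) * Qc - QB                                       # (B.2)
    dCB = (C1 * F1 - Xc * QB - CB * F1) / VB                        # (B.3)
    dTB = (Hc * QB - aBt * ABt * (TB - TBt)
           + c1 * F1 * T1 - cF * F1 * TB) / (cI * WI)               # (B.4)
    dCF = (CB * F1 + C2 * F2 - Xv * V * Qc
           - CFg * (F1 + F2)) / VF                                  # (B.5)
    dTF = (Hv * V * Qc - aFt * AFt * (TF - TFt) + cF * F1 * TB
           + c2 * F2 * T2 - cF * (F1 + F2) * TF) / (cF * VF)        # (B.6)
    Pc = Hc * QB + Hv * V * Qc                                      # (B.7)
    dP = (Pc - P) / Tmix                                            # (B.8)
    return [dWc, dCB, dCF, dTB, dTF, dP]

# Integrate for constant inputs from an arbitrary initial state
x0 = [150.0, 0.05, 0.04, 1000.0, 900.0, 20e6]
sol = solve_ivp(fbc_rhs, (0.0, 600.0), x0, args=(2.6, 3.1, 8.4), max_step=1.0)
print(sol.y[:, -1])
```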

B.1.4 Steady-state

The equations can be solved in steady state. Bed fuel inventory and bed oxygen content are functions of the fuel feed and the primary air flow:

Wc = C1 tc (1 - V) F1 Qc / [C1 F1 - Xc (1 - V) Qc]                (B.9)

CB = C1 - Xc (1 - V) Qc / F1                                      (B.10)

Let us solve the equations eliminating variables other than Qc, F1 and F2. Bed temperatures depend on the fuel power:

TB = [Hc (1 - V) Qc + c1 F1 T1 + aBt ABt TBt] / [aBt ABt + cF F1]            (B.11)

Flue gas oxygen is influenced mainly by the secondary air flow and the fuel feed:

CF = [C1 F1 + C2 F2 - Xc (1 - V) Qc - Xv V Qc] / [F1 + F2]        (B.12)

Freeboard temperatures depend on the heat released by the volatiles:

TF = [Hv V Qc + cF F1 TB + c2 F2 T2 + aFt AFt TFt] / [aFt AFt + cF (F1 + F2)]            (B.13)

where TB is given by (B.11); eliminating TB yields the common denominator [aBt ABt + cF F1][aFt AFt + cF (F1 + F2)].

Power depends entirely on the fuel feed and its heat value:

P = Hc (1 - V) Qc + Hv V Qc                                       (B.14)
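A corresponding sketch of the steady-state calculation (B.9)-(B.14) is given below (Python; again an illustration). Here the parameter scaling estimated in Section B.2 (p = [0.2701, 0.9956, 0.4238]) is applied to Hc, V and Xc, with Hv = Hc and Xv = Xc.

```python
# Constants from Table B.1, with the scaling of Section B.2 applied
C1, C2, c1, c2, cF = 0.21, 0.21, 1305.0, 1305.0, 1305.0
T1, T2, TBt, TFt = 328.0, 328.0, 573.0, 573.0
aBt, ABt, aFt, AFt = 210.0, 26.8, 210.0, 130.7
tc = 50.0
Hc = 0.2701 * 30e6          # effective heat value of char
V = 0.9956 * 0.75           # fraction of volatiles
Xc = 0.4238 * 1.886         # O2 consumption coefficient
Hv, Xv = Hc, Xc             # volatiles tied to char parameters

def fbc_steady_state(Qc, F1, F2):
    """Steady-state values from the inputs (Qc, F1, F2), eqs. (B.9)-(B.14)."""
    Wc = C1 * tc * (1 - V) * F1 * Qc / (C1 * F1 - Xc * (1 - V) * Qc)   # (B.9)
    CB = C1 - Xc * (1 - V) * Qc / F1                                   # (B.10)
    TB = (Hc * (1 - V) * Qc + c1 * F1 * T1
          + aBt * ABt * TBt) / (aBt * ABt + cF * F1)                   # (B.11)
    CF = (C1 * F1 + C2 * F2 - Xc * (1 - V) * Qc
          - Xv * V * Qc) / (F1 + F2)                                   # (B.12)
    TF = (Hv * V * Qc + cF * F1 * TB + c2 * F2 * T2
          + aFt * AFt * TFt) / (aFt * AFt + cF * (F1 + F2))            # (B.13)
    P = Hc * (1 - V) * Qc + Hv * V * Qc                                # (B.14)
    return Wc, CB, CF, TB, TF, P

print(fbc_steady_state(Qc=2.6, F1=3.1, F2=8.4))
```

With these parameter values the example call reproduces approximately the operating point used later in Example 43 (Wc ≈ 165 kg, CB ≈ 0.042, TB ≈ 749 °C, TF ≈ 650 °C, P ≈ 21 MW).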


B.2 Tuning of the model

The above model is very simple. Its weakness is that it describes only a few of the phenomena involved in a complex process such as FBC combustion. For example, the assumptions on fluidization, combustion and heat transfer are very elementary. Therefore, the model needs to be tuned in order to match plant measurements. Its advantage is that a model with a simple structure is also simple to tune using standard methods.

Much more detailed models have been constructed for the FBC process. From a practical point of view, the calculation times and the lack of accurate measurements restrict the use of these models. It is common in practice that a simple mass or energy balance cannot be closed based on plant measurements, due to systematic measurement errors. Many of the internal parameters related to combustion, fluidization, and heat transfer can be accurately measured only in laboratory conditions, and are not applicable to a real plant. The advanced models can be extremely useful in helping to develop and understand the process. In automatic process control, however, their significance is less.

B.2.1 Initial values

The tuning of the model was divided into three phases. The initial values were found from the literature (heat values and heat transfer coefficients, O2 consumption, plant geometry). These are given in Table B.1.

B.2.2 Steady-state behavior

The steady-state behavior of the model was adjusted first. The following tuning knobs were used

Hc ← p1 Hc,   V ← p2 V,   Xc ← p3 Xc                              (B.15)

taking further that Hv = Hc and Xv = Xc. A cost function was formulated as a weighted sum of squared errors between the measured steady-state values and the predicted ones:

J(p) = Σ_j Σ_i w_i [y_i(j) - ŷ_i(j, p)]^2                         (B.16)

where j indexes the steady-state data points and i the measured outputs.

The weights w_i were chosen according to the scales of the variables. Using a data set of 11 steady-state points (Table B.2), the values of p_i were estimated to p = [0.2701,


Bed:
  bed material specific heat          cI = 800 [J kg^-1 K^-1]
  bed inert material                  WI = 25000 [kg]
  volume                              VB = 26.3 [m^3]
  heat transfer coefficient           aBt = 210 [W m^-2 K^-1]
  heat exchange surface               ABt = 26.8 [m^2]
  cooling water temperature           TBt = 573 [K]

Freeboard:
  volume                              VF = 128.1 [m^3]
  heat transfer coefficient           aFt = 210 [W m^-2 K^-1]
  heat exchange surface               AFt = 130.7 [m^2]
  cooling water temperature           TFt = 573 [K]

Air flows:
  primary air O2 content              C1 = 0.21 [Nm^3/Nm^3]
  primary air specific heat           c1 = 1305 [J Nm^-3 K^-1]
  primary air temperature             T1 = 328 [K]
  secondary air O2 content            C2 = 0.21 [Nm^3/Nm^3]
  secondary air specific heat         c2 = 1305 [J Nm^-3 K^-1]
  secondary air temperature           T2 = 328 [K]
  flue gas specific heat              cF = 1305 [J Nm^-3 K^-1]

Fuel feed:
  O2 consumed in combustion (char)    Xc = 1.886 [Nm^3/kg]
  heat value of char                  Hc = 30 x 10^6 [J/kg]
  mean char combustion time           tc = 50 [s]
  fraction of volatiles               V = 0.75
  O2 consumed in combustion (volatiles) Xv = 1.225 [Nm^3/kg]
  heat value of volatiles             Hv = 50 x 10^6 [J/kg]

Other:
  time constant                       Tmix = 300 [s]

Table B.1: Constants for the FBC model.


Qc [kg/s]  F1 [Nm^3/s]  F2 [Nm^3/s]  CF [%-vol]  TB [C]  TF [C]  P [MW]
  2.2        3.5          7.9          5.1        696     556     19.1
  2.3        3.5          6.5          3.0        662     607     19.3
  2.3        3.5          9.8          6.9        696     550     19.2
  2.3        3.5          8.0          5.1        686     572     19.1
  1.6        2.5          4.4          2.9        650     581     13.1
  1.7        2.8          5.2          4.0        668     569     15.1
  1.7        2.8          6.4          5.4        696     530     14.3
  3.1        3.7         10.2          3.9        691     646     26.0
  3.0        3.7          8.6          2.5        681     628     27.0
  3.0        3.7         11.0          5.0        659     599     25.6

Table B.2: Steady-state data from a FBC plant.

0.9956, 0.4238]^T using the Levenberg-Marquardt method. The effective heat value was found to be only about 30% of the heat value of dry fuel. As the fuel feed was taken as the measured kg-input flux to the furnace, and moisture was not taken into account, this is acceptable. For the volatiles, the 3/4 assumption was reasonable. The O2 consumption coefficient reflects the fact that less than half of the input feed consists of combustible components.
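A sketch of such a fit is given below (Python/SciPy, using scipy.optimize.least_squares with the Levenberg-Marquardt method). It is a simplified illustration, not the implementation used for the book: only CF, TB and P are fitted, only three rows of Table B.2 are used, and the weights are merely indicative of the scales of the variables.

```python
import numpy as np
from scipy.optimize import least_squares

# Nominal (untuned) constants from Table B.1
C1, C2, c1, c2, cF = 0.21, 0.21, 1305.0, 1305.0, 1305.0
T1, TBt = 328.0, 573.0
aBt, ABt = 210.0, 26.8
Xc0, Hc0, V0 = 1.886, 30e6, 0.75

# A few steady-state points (Qc, F1, F2, CF[%], TB[C], P[MW]) from Table B.2
data = np.array([[2.2, 3.5, 7.9, 5.1, 696.0, 19.1],
                 [1.6, 2.5, 4.4, 2.9, 650.0, 13.1],
                 [3.0, 3.7, 8.6, 2.5, 681.0, 27.0]])

def predict(p, Qc, F1, F2):
    """Steady-state CF, TB and P with the tuning knobs (B.15) applied
    and Hv = Hc, Xv = Xc; a reduced set of (B.11), (B.12), (B.14)."""
    Hc, V, Xc = p[0] * Hc0, p[1] * V0, p[2] * Xc0
    Hv, Xv = Hc, Xc
    CF = (C1 * F1 + C2 * F2 - Xc * (1 - V) * Qc - Xv * V * Qc) / (F1 + F2)
    TB = (Hc * (1 - V) * Qc + c1 * F1 * T1 + aBt * ABt * TBt) \
         / (aBt * ABt + cF * F1)
    P = Hc * (1 - V) * Qc + Hv * V * Qc
    return 100.0 * CF, TB - 273.0, P / 1e6        # [%-vol], [C], [MW]

def residuals(p):
    w = np.array([1.0, 0.01, 0.1])                # weights ~ 1/scale of outputs
    r = []
    for Qc, F1, F2, CFm, TBm, Pm in data:
        CFp, TBp, Pp = predict(p, Qc, F1, F2)
        r += [w[0] * (CFm - CFp), w[1] * (TBm - TBp), w[2] * (Pm - Pp)]
    return r

p_hat = least_squares(residuals, x0=[1.0, 1.0, 1.0], method='lm').x
print(p_hat)
```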

B.2.3 Dynamics

The dynamics were tuned by hand by comparing measurements from step-response experiments with corresponding simulations. First, the delays were examined and found negligible (equal to zero) from the air flows F1 and F2, and equal to 20 seconds from the fuel feed Qc. This was judged reasonable, as there is a transport delay from a change in fuel belt conveyor speed to the introduction of a change in the flow to the furnace. In addition, some delay is due to ignition of the fuel, which was not taken into account in the model. Thus we have

Qc(t) = Qc,actuator(t - 20)                                       (B.17)

where Qc,actuator is the measured fuel flow. The transport delay in the air flows was insignificant.

The time constants for CF were found adequate (bed and freeboard volumes). The time constants for the temperatures were adjusted by altering the mass of the bed inert material WI and the freeboard time constant cF VF (in the dTF/dt equation (B.6) only); for power, these were adjusted by setting Tmix. For WI a value of 480 kg was found reasonable; the cF VF for TF was multiplied by 35 in order to have reasonable responses.



Figure B.2: Response of the tuned FBC model (solid lines) against measurements (dotted lines) for a 25 MW FBC plant.

B.2.4 Performance of the model

Figures B.2-B.3 illustrate the performance of the model with respect to data measured from a 25 MW semi-circulated FBC plant, for step-like changes in Qc, F1 and F2. Figure B.2 illustrates the steady-state performance. Note that the data used for tuning the steady states of the model was not taken from these experiments. The main characteristics of the O2 in flue gases as well as the freeboard temperatures are captured by the model. For bed temperatures the response is poor.

Figure B.3 shows a smaller section of the same simulation. For O2 the dynamic response is good. For freeboard temperatures the response seems like an acceptable first order approximation of a second order process. Again, the predictability for bed temperatures is poor.



Figure B.3: Response of the tuned FBC model (solid lines) against measurements (dotted lines) for a 25 MW FBC plant.


B.3 Linearization of the model


The differential equation model (B.1)-(B.6) is a non-linear one, containing bilinear terms such as Wc CB in (B.1) and CB(t) F1(t) in (B.3). It can be linearized and discretized around an operating point.

Let the operating point be given by:

{Q̄c, F̄1, F̄2, W̄c, C̄B, C̄F, T̄B, T̄F, P̄}                             (B.18)

If this is a steady-state point, then {W̄c, C̄B, C̄F, T̄B, T̄F, P̄} can be determined by specifying {Q̄c, F̄1, F̄2} and using the steady-state equations (B.9)-(B.14). Using a Taylor-series expansion, we obtain the following linearized continuous-time model:

dWc(t)/dt = (1 - V)[Qc(t) - Q̄c] - (C̄B/(tc C1))[Wc(t) - W̄c] - (W̄c/(tc C1))[CB(t) - C̄B]
            + (1 - V)Q̄c - W̄c C̄B/(tc C1)                                              (B.19)

dCB(t)/dt = (1/VB){ (C1 - C̄B)[F1(t) - F̄1] - (Xc C̄B/(tc C1))[Wc(t) - W̄c]
            - (Xc W̄c/(tc C1) + F̄1)[CB(t) - C̄B]
            + C1 F̄1 - Xc W̄c C̄B/(tc C1) - C̄B F̄1 }                                     (B.20)

dCF(t)/dt = (1/VF){ F̄1[CB(t) - C̄B] + (C̄B - C̄F)[F1(t) - F̄1] + (C2 - C̄F)[F2(t) - F̄2]
            - Xv V [Qc(t) - Q̄c] - (F̄1 + F̄2)[CF(t) - C̄F]
            + C̄B F̄1 + C2 F̄2 - Xv V Q̄c - C̄F(F̄1 + F̄2) }                               (B.21)

dTB(t)/dt = (1/(cI WI)){ (Hc C̄B/(tc C1))[Wc(t) - W̄c] + (Hc W̄c/(tc C1))[CB(t) - C̄B]
            + (c1 T1 - cF T̄B)[F1(t) - F̄1] - (aBt ABt + cF F̄1)[TB(t) - T̄B]
            + Hc W̄c C̄B/(tc C1) - aBt ABt(T̄B - TBt) + c1 F̄1 T1 - cF F̄1 T̄B }           (B.22)

dTF(t)/dt = (1/(cF VF)){ Hv V [Qc(t) - Q̄c] + cF(T̄B - T̄F)[F1(t) - F̄1] + (c2 T2 - cF T̄F)[F2(t) - F̄2]
            + cF F̄1 [TB(t) - T̄B] - (aFt AFt + cF(F̄1 + F̄2))[TF(t) - T̄F]
            + Hv V Q̄c - cF(F̄1 + F̄2)T̄F - aFt AFt(T̄F - TFt) + c2 F̄2 T2 + cF F̄1 T̄B }   (B.23)

dP(t)/dt = (1/Tmix){ (Hc C̄B/(tc C1))[Wc(t) - W̄c] + (Hc W̄c/(tc C1))[CB(t) - C̄B]
            + Hv V [Qc(t) - Q̄c] - [P(t) - P̄]
            + Hc W̄c C̄B/(tc C1) + Hv V Q̄c - P̄ }                                       (B.24)

Equations (B.19)-(B.24) can be expressed as a linear state-space model around an operating point

dx(t)/dt = Ac x(t) + Bc u(t)                                      (B.25)
y(t) = Cc x(t)                                                    (B.26)

where the vectors x, u and y are given by deviations from the point of


linearization

x(t) = [ Wc(t) - W̄c ]              [ Qc(t) - Q̄c ]
       [ CB(t) - C̄B ]              [ F1(t) - F̄1 ]
       [ CF(t) - C̄F ]    ;  u(t) = [ F2(t) - F̄2 ]                 (B.27)
       [ TB(t) - T̄B ]
       [ TF(t) - T̄F ]
       [ P(t)  - P̄  ]

The coefficients of the matrices Ac and Bc are obtained from (B.19)-(B.24), e.g. a11 = -C̄B/(tc C1), b11 = (1 - V), etc. Usually the flue gas O2, bed and freeboard temperatures and power outtake are measured:

y(t) = [ CF(t) - C̄F ]            [ 0 0 1 0 0 0 ]
       [ TB(t) - T̄B ]   and Cc = [ 0 0 0 1 0 0 ]                  (B.28)
       [ TF(t) - T̄F ]            [ 0 0 0 0 1 0 ]
       [ P(t)  - P̄  ]            [ 0 0 0 0 0 1 ]

For many practical purposes, the model needs to be discretized. The approximation of a continuous-time state-space model by a discrete-time model is straightforward (see, e.g., [64]), and results in

x(k+1) = Ax(k) + Bu(k)                                            (B.29)
y(k) = Cx(k)                                                      (B.30)

where t = kTs and Ts is the sampling time.
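A sketch of the discretization step (Python/SciPy, zero-order hold; an illustration using the continuous-time matrices of Example 43 below):

```python
import numpy as np
from scipy.signal import cont2discrete

# Continuous-time matrices of the linearized FBC model (Example 43, B.34-B.36)
Ac = np.array([[-0.0040, -15.6819,  0.0,     0.0,     0.0,     0.0],
               [-0.0001,  -0.5908,  0.0,     0.0,     0.0,     0.0],
               [ 0.0,       0.0242, -0.0898, 0.0,     0.0,     0.0],
               [ 0.0027,   10.5892,  0.0,   -0.0008,  0.0,     0.0],
               [ 0.0,       0.0,     0.0,    0.0005, -0.0051,  0.0],
               [ 0.0001,    0.4236,  0.0,    0.0,     0.0,    -0.0033]])
Bc = np.array([[ 0.2533,  0.0,     0.0],
               [ 0.0,     0.0064,  0.0],
               [-0.0046,  0.0001,  0.0014],
               [ 0.0,    -0.755,   0.0],
               [ 0.7238,  0.0155, -0.0929],
               [ 0.0202,  0.0,     0.0]])
Cc = np.hstack([np.zeros((4, 2)), np.eye(4)])   # outputs CF, TB, TF, P
Dc = np.zeros((4, 3))

Ts = 4.0                                        # sampling time [s]
A, B, C, D, _ = cont2discrete((Ac, Bc, Cc, Dc), Ts, method='zoh')
print(np.linalg.eigvals(A))                     # discrete-time poles
```

The resulting discrete-time poles largely correspond to the denominator factors appearing in the transfer polynomials (B.37)-(B.48) below.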

Example 43 (Linearization) Let us linearize the model in a steady state given by

Q̄c = 2.6 kg/s,   F̄1 = 3.1 Nm^3/s,   F̄2 = 8.4 Nm^3/s              (B.31)

The remaining states of the operating point are then given by

W̄c = 165 kg,   C̄B = 0.042,   C̄F = 0.031                          (B.32)
T̄B = 749 °C,   T̄F = 650 °C,   P̄ = 21.1 MW                        (B.33)

For a continuous-time state-space model we obtain

Ac = [ -0.0040  -15.6819    0         0         0         0      ]
     [ -0.0001   -0.5908    0         0         0         0      ]
     [  0          0.0242  -0.0898    0         0         0      ]
     [  0.0027    10.5892    0        -0.0008   0         0      ]         (B.34)
     [  0          0         0         0.0005  -0.0051    0      ]
     [  0.0001     0.4236    0         0         0        -0.0033 ]


Bc = [  0.2533    0        0      ]
     [  0         0.0064   0      ]
     [ -0.0046    0.0001   0.0014 ]
     [  0        -0.755    0      ]                               (B.35)
     [  0.7238    0.0155  -0.0929 ]
     [  0.0202    0        0      ]

Cc = [ 0 0 1 0 0 0 ]
     [ 0 0 0 1 0 0 ]
     [ 0 0 0 0 1 0 ]                                              (B.36)
     [ 0 0 0 0 0 1 ]

Using Ts = 4 s, a linearized discrete-time state-space model is obtained (this is simple to accomplish numerically using a suitable software like Matlab, for example). For convenience, the model is given in a transfer polynomial form. We have from the fuel feed:

CF(q^-1)/Qc(q^-1) = -0.01549 q^-6 (1 - 0.9957q^-1)(1 - 0.09308q^-1) / (...)                               (B.37)

TB(q^-1)/Qc(q^-1) = 0.003362 q^-5 (1 - 0.6233q^-1)(1 + 0.5543q^-1) / [(1 - 0.9968q^-1)^2 (1 - 0.09293q^-1)]   (B.38)

TF(q^-1)/Qc(q^-1) = 2.866 q^-6 (1 - 1.994q^-1 + 0.9936q^-2) / [(1 - 0.9799q^-1)(1 - 0.9968q^-1)^2]            (B.39)

P(q^-1)/Qc(q^-1) = 0.08027 q^-6 (1 - 0.0923q^-1)(1 - 0.9958q^-1) / (...)                                  (B.40)

From primary air flow:

CF(q^-1)/F1(q^-1) = 0.00084931 q^-1 (1 - 0.9872q^-1)(1 + 0.2433q^-1) / [(1 - 0.6983q^-1)(1 - 0.9968q^-1)(1 - 0.09293q^-1)]            (B.41)

TB(q^-1)/F1(q^-1) = -0.020027 q^-1 (1 - 1.006q^-1)(1 - 7.9q^-1) / [(1 - 0.9968q^-1)^2 (1 - 0.09293q^-1)]                              (B.42)

TF(q^-1)/F1(q^-1) = 0.06135 q^-1 (1 - 0.095q^-1)(1 - 0.0872q^-1)(1 - 1.002q^-1) / [(1 - 0.9799q^-1)(1 - 0.9968q^-1)^2 (1 - 0.09293q^-1)]   (B.43)

P(q^-1)/F1(q^-1) = 0.011219 q^-1 (1 - q^-1)(1 + 0.4645q^-1) / [(1 - 0.9868q^-1)(1 - 0.9968q^-1)(1 - 0.09293q^-1)]                     (B.44)


[Step-response plots: the columns correspond to steps in Q̄c, F̄1 and F̄2.]

Figure B.4: Step responses of the linearized model, linearized at steady state Qc = 2.6 [kg/s], F1 = 3.1 [Nm^3/s], F2 = 8.4 [Nm^3/s].

and from the secondary air flow:

CF(q^-1)/F2(q^-1) = 0.0046901 q^-1 / (1 - 0.6983q^-1)             (B.45)

TB(q^-1)/F2(q^-1) = 0                                             (B.46)

TF(q^-1)/F2(q^-1) = -0.36785 q^-1 / (1 - 0.9799q^-1)              (B.47)

P(q^-1)/F2(q^-1) = 0                                              (B.48)

The performance of the linearized model is depicted in Fig. B.4.


Bibliography

[1] J Andrews. A mathematical model for the continuous culture of micro-organisms using inhibitory substrate. Biotechnology and Bioengineering, 10:707-723, 1968.

[2] K Åström. Introduction to Stochastic Control Theory. Academic Press, New York, 1970.

[3] K Åström. Maximum likelihood and prediction error methods. Automatica, 16:551-574, 1980.

[4] K Åström and R Bell. Drum-boiler dynamics. Automatica, 36:363-378, 2000.

[5] K Åström and B Wittenmark. Computer Controlled Systems: Theory and Design. Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1990.

[6] N Baba. A new approach for finding the global minimum of error function of neural networks. Neural Networks, 2:367-373, 1989.

[7] R Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4:141-166, 1992.

[8] R Bitmead, M Gevers, and V Wertz. Adaptive Optimal Control - The Thinking Man's GPC. Prentice-Hall, New York, 1990.

[9] R Brockett and P Krishnaprasad. A scaling theory for linear systems. IEEE Transactions on Automatic Control, 25(2):197-207, 1980.

[10] W Buntine and A Weigend. Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks, 5:480-488, 1994.

[11] R Bush and F Mosteller. Stochastic Models for Learning. John Wiley and Sons, New York, 1958.


[12] J Castro and M Delgado. Fuzzy systems with defuzzification are universal approximators. IEEE Transactions on Systems, Man and Cybernetics, 26:149-152, 1996.

[13] D Clarke, C Mohtadi, and P Tuffs. Generalized predictive control - Part I. Automatica, 23(2):137-148, 1987.

[14] C Cutler and B Ramaker. Dynamic matrix control: A computer control algorithm. In Proceedings of the Joint Automatic Control Conference (JACC), 1980.

[15] T Edgar and D Himmelblau. Optimization of Chemical Processes. McGraw-Hill, New York, 1989.

[16] J Edmunds. Input and output scaling and reordering for diagonal dominance and block diagonal dominance. IEE Proceedings - Control Theory and Applications, 145(6):523-530, 1998.

[17] E Eskinat, S Johnson, and W Luyben. Use of Hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255-268, 1991.

[18] R L Eubank. Nonparametric Regression and Spline Smoothing. Marcel Dekker, New York, 1999.

[19] P Eykhoff. System Identification: Parameter and State Estimation. John Wiley and Sons, New York, 1974.

[20] R Fox and L Fan. Stochastic modeling of chemical process systems: Parts I-III. Chemical Engineering Education, XXIV:56-59, 88-92, 164-167, 1990.

[21] G Franklin, J Powell, and M Workman. Digital Control of Dynamic Systems. Addison Wesley Longman Inc., Menlo Park, U.S.A., 1998.

[22] D Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989.

[23] G Goodwin and K Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, New Jersey, 1984.

[24] M Hagan and M Menhaj. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5:989-993, 1994.


[25] W Härdle. Applied Nonparametric Regression. Cambridge University Press, New York, 1990.

[26] T Hastie and R Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.

[27] S Haykin. Neural Networks: A Comprehensive Foundation. MacMillan, New York, 1994.

[28] M Henson and D Seborg. Adaptive nonlinear control of a pH neutralization process. IEEE Transactions on Control Systems Technology, 2(3):169-182, 1994.

[29] J Hertz, A Krogh, and R Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.

[30] K Hornik, M Stinchcombe, and H White. Multilayer feedforward neural networks are universal approximators. Neural Networks, 2:359-366, 1989.

[31] K Hunt, R Haas, and R Murray-Smith. Extending the functional equivalence of radial basis function networks and fuzzy inference systems. IEEE Transactions on Neural Networks, 7:776-781, 1996.

[32] H Hyötyniemi. Self-Organizing Artificial Neural Networks in Dynamic Systems Modeling and Control. PhD thesis, Helsinki University of Technology, 1994.

[33] E Ikonen. Pedin polttoainekertymän mallintaminen leijukerrospoltossa (in Finnish). Master's thesis, University of Oulu, 1991.

[34] E Ikonen. Algorithms for Process Modelling Using Fuzzy Neural Networks: A Distributed Logic Processor Approach. PhD thesis, University of Oulu, 1996.

[35] E Ikonen and U Kortela. Dynamic model of a fluidized bed coal combustor. Control Engineering Practice, 2(6):1001-1006, 1994.

[36] E Ikonen and K Najim. Non-linear process modelling based on a Wiener approach. Journal of Systems and Control Engineering - Proceedings of the Institution of Mechanical Engineers Part I, to appear.

[37] E Ikonen, K Najim, and U Kortela. Process identification using Hammerstein systems. In IASTED International Conference on Modelling, Identification and Control (MIC 2000), Innsbruck, Austria, 2000. IASTED.


[38] R Isermann. Digital Control Systems. Springer-Verlag, Heidelberg, 1981.

[39] A Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.

[40] M Johansson. A Primer on Fuzzy Control (Report). Lund Institute of Technology, 1996.

[41] R Johansson. System Modeling and Identification. Prentice Hall, New Jersey, 1993.

[42] E Katende and A Jutan. Nonlinear predictive control of complex processes. Industrial Engineering Chemistry Research, 35:3539-3546, 1996.

[43] E Katende, A Jutan, and R Corless. Quadratic nonlinear predictive control. Industrial Engineering Chemistry Research, 37(7):2721-2728, 1998.

[44] J Kiefer and J Wolfowitz. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics, 23:462-466, 1952.

[45] M Kinnaert. Adaptive generalized predictive controller for MIMO systems. International Journal of Control, 50(1):161-172, 1989.

[46] S Kirkpatrick, C Gelatt, and M Vecchi. Optimization by simulated annealing. Science, 220:671-680, 1983.

[47] G Klir and T Folger. Fuzzy Sets, Uncertainty and Information. Prentice-Hall, 1988.

[48] T Knudsen. Consistency analysis of subspace identification methods based on a linear regression approach. Automatica, 37(1):81-89, 2001.

[49] T Kohonen. The self-organizing map. Proceedings of the IEEE, 78:1464-1486, 1990.

[50] R Kruse, J Gebhardt, and F Klawonn. Foundations of Fuzzy Systems. John Wiley and Sons, Chichester, 1994.

[51] I Landau, R Lozano, and M M'Saad. Adaptive Control. Springer Verlag, London, 1997.

[52] P Lee, R Newell, and I Cameron. Process Management and Control. http://wwweng2.murdoch.edu.au/m288/resources/textbook/title.htm, 2000.


[53] P Lindskog. Methods, Algorithms and Tools for System Identification Based on Prior Knowledge. PhD thesis, Linköping University, 1996.

[54] L Ljung and T McKelvey. A least squares interpretation of subspace methods for system identification. In Proceedings of the 35th Conference on Decision and Control, pages 335-342. IEEE, 1996.

[55] L Ljung and T Söderström. Theory and Practice of Recursive Identification. MIT Press, Cambridge, Massachusetts, 1983.

[56] R Luus and T Jaakola. Optimization by direct search and systematic reduction of the size of the search region. AIChE Journal, 19:760-766, 1973.

[57] J Maciejowski. Multivariable Feedback Design. Addison-Wesley, Wokingham, England, 1989.

[58] A Mészáros, M Brdys, P Tatjewski, and P Lednicky. Multilayer adaptive control of continuous bioprocesses using optimising control techniques. Case study: Baker's yeast culture. Bioprocess Engineering, 12:1-9, 1995.

[59] A Mészáros, P Dostal, and J Mikles. Development of tubular chemical reactor models for control purposes. Chemical Papers, 48:69-72, 1994.

[60] J Monod. Recherches sur la Croissance des Cultures Bactériennes. Hermann, Paris, 1942.

[61] E Nahas, M Henson, and D Seborg. Nonlinear internal model control strategy for neural network models. Computers and Chemical Engineering, 29(4):1039-1057, 1992.

[62] K Najim. Process Modeling and Control in Chemical Engineering. Marcel Dekker Inc., New York, 1988.

[63] K Najim and E Ikonen. Distributed logic processors trained under constraints using stochastic approximation techniques. IEEE Transactions on Systems, Man, and Cybernetics - A, 29:421-426, 1999.

[64] K Najim and E Ikonen. Outils Mathématiques pour le Génie des Procédés - Cours et Exercices Corrigés. Dunod, Paris, 1999.

[65] K Najim, A Poznyak, and E Ikonen. Calculation of residence time for non linear systems. International Journal of Systems Science, 27:661-667, 1996.


[66] K Najim, A Rusnak, A Mészáros, and M Fikar. Constrained long-range predictive control based on artificial neural networks. International Journal of Systems Science, 28:1211-1226, 1997.

[67] K Narendra and M A L Thathachar. Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs, New Jersey, 1989.

[68] S Norquay, A Palazoglu, and J Romagnoli. Application of Wiener model predictive control (WMPC) to a pH neutralization experiment. IEEE Transactions on Control Systems Technology, 7(4):437-445, 1999.

[69] A Ordys and D Clarke. A state-space description for GPC controllers. International Journal of Systems Science, 24(9):1727-1744, 1993.

[70] J Parkum, J Poulsen, and J Holst. Recursive forgetting algorithms. International Journal of Control, 55(1):109-128, 1990.

[71] T Parthasarathy. On Global Univalence Theorems. Springer-Verlag, Berlin, 1983.

[72] R Pearson. Discrete-Time Dynamic Models. Oxford University Press, Oxford, 1999.

[73] W Pedrycz. Fuzzy Control and Fuzzy Systems. John Wiley and Sons, New York, 1989.

[74] J Penttinen and H Koivo. Multivariable tuning regulators for unknown systems. Automatica, 16:393-398, 1980.

[75] A Poznyak and K Najim. Learning Automata and Stochastic Optimization. Springer, Berlin, 1997.

[76] M Pyykkö. Leijupetikattilan tulipesän säätöjen simulointi (in Finnish). Tampere University of Technology, Finland, 1989.

[77] J M Quero, E F Camacho, and L G Franquelo. Neural network for constrained predictive control. IEEE Transactions on Circuits and Systems - I: Fundamental Theory and Applications, 40:621-626, 1993.

[78] D Ratkowsky. Nonlinear Regression Modeling: A Unified Practical Approach. Marcel Dekker Inc., New York, 1983.

[79] J Richalet, A Rault, J Testud, and J Papon. Model predictive heuristic control: Applications to industrial processes. Automatica, 14:413-428, 1978.


[80] V Ruoppila, T Sorsa, and H Koivo. Recursive least-squares approach to self-organizing maps. In IEEE International Conference on Neural Networks, San Francisco, 1993.

[81] A Rusnak, M Fikar, K Najim, and A Mészáros. Generalized predictive control based on neural networks. Neural Processing Letters, 4:107-112, 1996.

[82] J Saint-Donat, N Bhat, and T McAvoy. Neural net based model predictive control. International Journal of Control, 54:1453-1468, 1991.

[83] D Sbarbaro, N Filatov, and H Unbehauen. Adaptive predictive controllers based on orthonormal series representation. International Journal of Adaptive Control and Signal Processing, 13:621-631, 1999.

[84] R Setiono and L Hui. Use of quasi-Newton method in a feedforward neural network construction algorithm. IEEE Transactions on Neural Networks, 6:273-277, 1995.

[85] S Shah and W Cluett. Recursive least squares based estimation schemes for self-tuning control. Canadian Journal of Chemical Engineering, 69:89-96, 1991.

[86] J Sjöberg, Q Zhang, L Ljung, A Benveniste, B Delyon, P Glorennec, H Hjalmarsson, and A Juditsky. Nonlinear black-box modelling in system identification: A unified overview. Automatica, 31:1691-1724, 1995.

[87] P van der Smagt. Minimisation methods for training feedforward neural networks. Neural Networks, 7:1-11, 1994.

[88] R Soeterboek. Predictive Control: A Unified Approach. Prentice Hall International, London, 1992.

[89] B Sonnleitner and O Käppeli. Growth of Saccharomyces cerevisiae is controlled by its limited respiratory capacity: Formulation and verification of a hypothesis. Biotechnology and Bioengineering, 28:81-88, 1986.

[90] G Stephanopoulos. Chemical Process Control: An Introduction to Theory and Practice. Prentice-Hall, New York, 1984.

[91] K O Temeng, P D Schnelle, and T J McAvoy. Model predictive control of an industrial packed bed reactor using neural networks. Journal of Process Control, 5:19-27, 1995.


[92] A Visala. Modeling of Nonlinear Processes Using Wiener-NN Representation and Multiple Models. PhD thesis, Helsinki University of Technology, 1997.

[93] H Walk. Stochastic iteration for a constrained optimization problem. Communications in Statistics, Sequential Analysis, 2:369-385, 1983-1984.

[94] P Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78:1550-1560, 1990.

[95] T Wigren. Recursive prediction error identification using the nonlinear Wiener model. Automatica, 29(4):1011-1025, 1993.

[96] C Yue and W Qinglin. A multivariable unified predictive control (UPC) algorithm based on the state space model. In K Seki, editor, 38th SICE Conference '99, pages 949-952, Morioka, Japan, 1999. The Society of Instrument and Control Engineers.

[97] K Zenger. Analysis and Control Design of a Class of Time-Varying Systems. Report 88, Helsinki University of Technology, Control Engineering Laboratory, 1992.


Index

adaptive control, 223
  direct, 228
  gain scheduling, 226
  indirect, 227
adaptive systems, 225
add-one partitioning, 105
ARMAX, 61
ARX, 57
automata, 155
autoregressive exogenous, 57

basis function networks, 79
basis functions
  global, 79
  local, 79
  mother, 79
  multi-variable, 79
  single-variable, 79
batch methods, 137
bias, 96
bias-variance dilemma, 82
Box-Jenkins, 56
Bristol's matrix, 205
Bristol's method, 204, 207
Burman-Lagrange series, 87
Bush-Mosteller scheme, 158

CARIMA, 66
certainty equivalence, 184
characteristic equation, 188, 197, 273, 277, 281
chemical reactor model, 266
companion matrix, 275
condition number, 23
constraints, 152, 253
control canonical form, 274
control horizon, 187, 195, 218
controllability, 275
covariance, 26

dead-beat control, 196
decision logic, 103
decouplers, 209
defuzzification, 104
Diophantine equation, 68
distillation column model, 167
disturbance model, 195

equivalent kernel, 84, 96
equivalent memory horizon, 35
expert systems, 99
extension principle, 110

factorization, 32
FBC, 283
fermenter model, 254
finite impulse response, 47
FIR, 47
fixed gain observer, 196, 219
fluidized bed combustion, 197, 208, 219, 283
fluidized bed combustion model, 283
forgetting factor, 35
fuzzification, 102
  point fuzzification, 103
fuzzy
  implication function, 103
  Mamdani models, 101


  operations on sets, 103
  PI-controller, 107
  set, 102
  Sugeno models, 101, 104, 106
    examples, 162
fuzzy neural networks, 112

gain scheduling, 226
Gauss-Newton method, 141
generalized basis function networks, 79
generalized predictive control, 189, 213
  adaptive, 228
  examples, 197, 219, 228, 235
genetic algorithms, 154
global basis functions, 79
GPC, 189

Hammerstein systems, 124, 127, 132
  and Wiener systems, 133
  control, 232, 242
  examples, 167
Hessian matrix, 140
hinging hyperplanes, 90

i-step-ahead predictors, 71-73
identification, 8
indirect adaptive control, 227
integral action, 187, 195, 211
inverse Nyquist array, 210
inverses, 133, 232
iterative methods, 137

Jacobian matrix, 141

k-nearest neighbours, 96
Kalman filter, 42, 188, 196, 219
Kuhn-Tucker parameters, 149

Lagrange function, 149, 151, 253
Lagrange multipliers, 149, 152
  examples, 172, 254
large scale linearization, 232
learning automata, 155
learning systems, 225
least squares method, 17, 20, 31, 36, 45
  non-linear, 137
Levenberg-Marquardt, 140, 142
  examples, 143, 163, 290
linear systems, 83
local basis functions, 79
local models, 79
LTI systems, 224

Markov process, 37
matrix inversion lemma, 29
Matyas random optimization, 154
mean-level control, 195, 213, 229
MGPC, 213
minimum horizon, 187, 195, 218
modus ponens, 103
multilinear systems, 84
multivariable
  GPC, 213
  PI-controller, 212
  predictive controller, 218

nearest neighbors, 95
neural networks, 89
  Kohonen's, 96
  one-hidden-layer sigmoid, 94
  sigmoid function, 90
  sigmoid neural networks
    examples, 169, 174, 235, 257, 267
Newton's method, 141
non-parametric regression, 96
normalization, 10

observability, 275
observer canonical form, 275
OE, 59
output error, 59


persistently exciting, 22
pH neutralization, 143
pH neutralization model, 233
PI-controller
  fuzzy, 107
  multivariable, 212
pneumatic valve model, 160
poles of a system, 51
power series, 86
prediction error
  a posteriori, 33
  a priori, 33
prediction error methods, 138
  iterative, 138
prediction horizon, 218
predictive control, 182
  constrained
    examples, 267
  examples, 255
principle of superposition, 13
projection algorithm, 130

radial basis function networks, 97
radial construction, 81
receding horizon principle, 182
recursive algorithms, 137
recursive least squares method, 31, 36
reinforcement schemes, 157
relative gain, 204
relative gain array, 207
  examples, 208
reward-penalty scheme, 158
RGA method, 204, 207
ridge construction, 81
RMSE, 163
root-mean-squared error, 163
rule-based systems, 99

s-norm, 103
scaling, 10
scheduling variable, 226
search region reduction, 154
semi-global models, 90
series estimator, 83
sigmoid function, 90
sigmoid neural networks
  examples, 143
sigmoid neural networks (SNN), 90, 94
simulated annealing, 154
singular matrix, 22
singular values, 23
smoother matrix, 84, 96
stability, 130
state-space feedback, 276
state-space model, 184, 213, 273
stationary process, 54
steady state gain, 52, 126
stochastic approximation, 253
subspace methods, 191
Sugeno fuzzy models, 101, 104, 106

t-norm, 103
tensor product construction, 79
three point method, 142
time series
  ARIMAX, 65, 190, 195
  ARMAX, 61, 190
  ARX, 57
  Box-Jenkins, 56, 73
  CARIMA, 66
  NARIMAX, 248
  NARMAX, 115, 119
  NARX, 115
  NOE, 115
  nonlinear, 114
  OE, 59, 76
  state-space form, 190
transfer function, 50
transparency, 100
two-tank system model, 172


UD factorization, 32
universal approximator, 81

variance, 96
Volterra systems, 113

Wiener systems, 123, 127, 129
  and Hammerstein systems, 133
  control, 232
  examples, 160, 172, 233

zeros of a system, 51
