

Operating Regime based

Process Modeling and Identification

Dr. Ing. Thesis

Tor Arne Johansen

Department of Engineering Cybernetics

The Norwegian Institute of Technology - University of Trondheim

��

Report ��W, Department of Engineering Cybernetics, Norwegian Institute of Technology, N�� Trondheim, Norway

Preface

This thesis is submitted in partial fulfilment of the requirements for the degree Dr. Ing. at the Norwegian Institute of Technology - NTH, Trondheim.

This work was supported by The Research Council of Norway under doctoral scholarship grant no. ST. �� and supervised by Prof. Bjarne Foss at the Department of Engineering Cybernetics, who has been of great inspiration and support. Thanks.

Moreover, I would like to thank Prof. Petros Ioannou at the University of Southern California for hosting my six-month visit at USC. My interactions with him and his students improved my mathematical precision and resulted in some adaptive control results that are partially reported in this thesis. Two chapters in this thesis are based on manuscripts that are co-authored with Aage V. Sørensen at the Norwegian Institute of Technology, and Dr. Erik Weyer at The University of Queensland, Brisbane, Australia. Several parts of this thesis have benefited from numerous discussions with Dr. Rod Murray-Smith at Daimler-Benz AG, Berlin and Dr. Tom Kavli at SINTEF-SI. Among others, Mike Thompson at Massachusetts Institute of Technology, Dr. Magne Hillestad at STATOIL, Prof. Don Wiberg at the University of California in Los Angeles, and Olav Slupphaug at the Norwegian Institute of Technology have contributed with comments on parts of this work.

Finally, I would like to thank Trude for her support.

Trondheim, November ��

Tor Arne Johansen


Summary

The development of mathematical models is a major bottleneck for the application of advanced model-based techniques for control, optimization, scheduling, automatic fault detection and diagnosis in the process industries. Hence, there is a potential for improved product quality and profitability, as well as improved production flexibility and safety, if the cost of model development can be reduced.

In this thesis we address some aspects of non-linear modeling and identification using a combination of empirical process data and prior knowledge.

The major part of this thesis is concerned with a modeling framework based on an operating regime decomposition of the system's operating range. Within each operating regime, the system is modeled with a simple local model. The local models are patched together using a smooth interpolation technique. This framework supports the development of transparent semi-empirical or semi-mechanistic models, and is flexible with respect to the prior knowledge and empirical data required. We argue that, in some cases, the cost of model development can be low, yet the quality of the model high, through the application of this modeling framework. These cases are characterized by a limited amount of process knowledge, and the availability of a reasonable amount of process data. Identification of model structure and parameters on the basis of process data in this framework is discussed in detail. The properties of the modeling framework are illustrated with some semi-realistic examples, both simulated and experimental. In addition, the applicability of the modeling framework for model based control is investigated both through simulation examples and analysis.

A minor part of this thesis is an optimization formulation of the modeling and identification problem, where the idea is to minimize a criterion that penalizes mismatch between model behavior and empirical data, and inconsistency with the prior knowledge. The motivation is that in some cases, applying prior knowledge in this way may be simpler and more transparent than through the direct specification of a parameterized model structure, which is the traditional approach.

In addition, this thesis contains some theoretical results of more general interest. Some of these results are related to properties of some model structure identification criteria, while others are related to the stability and robustness of adaptive control loops based on non-linear models.


Contents

1 Introduction
  1.1 Modeling Paradigms
  1.2 Modeling and Identification
  1.3 An Operating Regime based Modeling Framework
  1.4 Outline of Thesis
  1.5 Literature overview
    1.5.1 Hybrid Modeling - In Chemical Engineering
    1.5.2 Local Models
  1.6 Contributions

2 Input-Output Modeling
  2.1 Model Representation
    2.1.1 Optimal combination of Local Models
    2.1.2 The Model Set
    2.1.3 Approximation Properties
    2.1.4 Operating Regimes
    2.1.5 Some Comparisons
  2.2 Modeling
  2.3 Identification
    2.3.1 Identifying Local Model Parameters using Local Criteria
    2.3.2 Identifying Local Model Parameters using a Global Criterion
    2.3.3 Identifying Model Validity Function Parameters
  2.4 Experimental Results
    2.4.1 Identification of ARX Model
    2.4.2 Identification of Operating Regime based NARX Model
    2.4.3 Identification of a Semi-mechanistic Model
    2.4.4 Discussion of Identification Results
  2.5 Discussion

3 State-space Modeling
  3.1 State-Space Model Representation
    3.1.1 Local Models of Different Structure
    3.1.2 Local Linear Models
  3.2 Incorporating Process Knowledge
  3.3 Examples
    3.3.1 Simulation Example: Population Dynamics
    3.3.2 Simulation Example: A Batch Fermentation Reactor
  3.4 Discussion

4 Identification using Incomplete Data
  4.1 Bootstrap Estimation
  4.2 Discussion

5 Identification using Validation Data
  5.1 Main Assumptions and Preliminary Results
  5.2 Structural and Parametric Identification
  5.3 Discussion and Concluding Remarks

6 Identification of Operating Regimes
  6.1 Problem Formulation
    6.1.1 A Generalized Framework
    6.1.2 Model Representation
    6.1.3 Model Structure Identification Criteria
  6.2 System Identification
    6.2.1 The Set of Model Structure Candidates
    6.2.2 Basic Search Algorithm
    6.2.3 Heuristic Search Algorithm
    6.2.4 Parameter Identification
    6.2.5 User Choices
  6.3 Statistical Properties
    6.3.1 Asymptotic Properties
    6.3.2 Finite Sample Accuracy
  6.4 Examples
    6.4.1 Simulation Example: A Batch Fermentation Reactor (Revisited)
    6.4.2 Simulation Example: A pH-neutralization Tank
    6.4.3 Experimental Results: Hydraulic Manipulator
    6.4.4 Discussion of Examples
  6.5 Discussion
    6.5.1 A Priori Knowledge: What is required, and what can be incorporated?
    6.5.2 A Posteriori Knowledge: What can be extracted from the model?
    6.5.3 Related Work
    6.5.4 Limitations and Possible Improvements
    6.5.5 The Fundamental Assumptions

7 Identification - An Optimization Approach
  7.1 Problem Formulation
  7.2 Optimization - Function Space Methods
  7.3 Optimization - Parameterized Approximation
  7.4 Tuning the Criterion
  7.5 Simulation Example: pH-neutralization
  7.6 Discussion

8 Operating Regime based MPC
  8.1 Batch Processes and Dynamic Optimization
  8.2 Model Predictive Control
  8.3 Simulation Example: Batch Fermentation Reactor
    8.3.1 System Description
    8.3.2 Modeling and Identification
    8.3.3 Model Predictive Control
    8.3.4 Results
    8.3.5 Discussion

9 Operating Regime based Adaptive Control
  9.1 Preliminaries and Notation
  9.2 Model Representation
  9.3 Non-linear Decoupling Control
  9.4 Parameter Estimation
  9.5 Adaptive Control
  9.6 Simulation Example: A �� CSTR�
  9.7 Discussion

10 Conclusions
  10.1 Operating Regime based Modeling
    10.1.1 Transparency
    10.1.2 Hybrid Modeling
    10.1.3 Incremental and Iterative Modeling
    10.1.4 Computer Aided Modeling
    10.1.5 Applicability
    10.1.6 Open questions
  10.2 Empirical Modeling, System Identification, and Regularization
  10.3 Adaptive Control

A Robust Adaptive Non-linear Control
  A.1 Preliminaries
  A.2 Model Representation
  A.3 Adaptive Control Structure
  A.4 Parameter Estimation
  A.5 Closed Loop Stability
  A.6 Discussion and Concluding Remarks

Chapter 1

Introduction

In this thesis we develop a modeling framework that supports the development of non-linear dynamic models that are required to be valid under a wide range of operating conditions. Such models may be useful for e.g. analysis, optimization, control, or simulation of complex processes, e.g. chemical, biological, or metallurgical processes. The goal of this doctoral study has been a modeling framework that is flexible with respect to utilization of the prior knowledge and empirical data available.

1.1 Modeling Paradigms

In science and engineering, there are two fundamentally different philosophies that form the basis of modeling, namely the mechanistic and empirical approaches. A mechanistic model structure is developed on the basis of a detailed understanding of the generic underlying mechanisms, or laws, that govern the system behavior. Its parameters may be identified using empirical data. An empirical model, on the other hand, is derived on the basis of the specific observed behavior of the system. Its structure is often a generic black box that cannot be directly interpreted in terms of the system mechanisms. However, it may also be developed on the basis of empirical knowledge, including measured process data, and the experience of process operators and engineers. Often, neither of these two approaches is attractive. If the system mechanisms are only partially understood, the development of a mechanistic model structure may not be feasible. This may, however, also be the case for an empirical model, because of a lack of process data and difficulties in incorporating the available system knowledge in this approach. This is because engineering knowledge is incompatible with many empirical model representations, like black boxes. In practice, most models are based on an unbalanced combination of mechanistic and empirical knowledge. For example, in a mainly mechanistic approach certain aspects of the system that are not sufficiently well understood may be described by empirical correlations in the model. Also, in a dominantly empirical approach, some mechanistic understanding is often useful to make certain structural choices, like the model order, or non-linearities. Moreover, at early stages in the model development, an empirical model will often be useful as a starting point for gaining more knowledge and eventually designing a mechanistic model.

Figure 1.1: The figure illustrates different modeling paradigms. (Axes: "Mechanistic Knowledge" and "Process Data and Empirical Knowledge", each ranging from poor to good. Marked regions and paths: initial state of modeling, empirical modeling, hybrid modeling, mechanistic modeling.)

Different modeling paradigms are visualized in Fig. 1.1, projected onto the "mechanistic" and "empirical" axes. A typical situation is illustrated, where the initial state of the model development is characterized by a lack of both empirical and mechanistic knowledge. With the empirical approach, one will collect more data, and end up in a state with more data and perhaps some improved mechanistic knowledge. With the mechanistic approach, on the other hand, one will end up with significantly improved mechanistic knowledge and perhaps some more data. The "in between" region and path corresponding to hybrid models (semi-empirical models, semi-mechanistic models, or models with a balanced utilization of empirical and mechanistic knowledge) is often approached in an ad hoc manner, or avoided. A major reason for this may be that powerful frameworks and software tools for such problems are lacking. The main objective of this thesis is to develop and study a framework that we believe may be useful for solving such hybrid modeling problems. It is our assumption that resources in many cases can be saved by choosing the hybrid approach. This will be justified in this thesis. We also think the framework is useful during a transition phase through the "in between" region towards a mechanistic model, if that is the ultimate goal.

In general, mechanistic models contribute to a larger extent to our scientific understanding of the system under study, and are the ultimate goal of all scientific studies. For engineering purposes, on the other hand, a deep understanding of the system mechanisms is not always required, and a model that contains a significant amount of empiricism can sometimes be accepted. Both the mechanistic and empirical modeling paradigms have serious drawbacks. The mechanistic approach is often resource demanding and sometimes infeasible if the system is not well understood. Empirical modeling, on the other hand, suffers from a perhaps even more serious drawback, namely that it is based on the principle of induction, which suggests that it is possible to generalize from a sufficiently large number of consistent observations. It is widely accepted that induction has serious limitations, since a finite number of observations is not sufficient to cover the usually infinite number of conditions the model must be valid under. This, and other aspects, make the empirical approach highly criticized as a scientific approach, e.g. (Chalmers �). However, for engineering purposes, the model is usually required to be valid only for a specific system that is operating in a significantly smaller operating range compared to scientific laws and models that are required to be valid for all possible systems and under a wide range of conditions. Often, one may be able to establish some continuity or smoothness properties of the system, so that limited interpolation and extrapolation of the empirical model can be allowed. It seems fair to state that mechanistic models are often reliable but expensive, while empirical models are often unreliable but inexpensive, where the reliability is with respect to extrapolation, and the expenses are development costs. Moreover, an empirical model is often more accurate than a mechanistic model when the system is operating under similar conditions as when the empirical data was collected. A hybrid model, based on a combination of both empirical and mechanistic knowledge and data, may be inexpensive, accurate, and reliable simultaneously, and should therefore be of potential interest for practical applications. It must be emphasized that these properties of hybrid models are by no means inherent, but require a minimum of prior knowledge, empirical data, and engineering skills.

In general, it is desirable that the model representation is transparent and compatible with the engineer's understanding of the system, since this simplifies interpretation, analysis, and application of the model, and allows incorporation of prior knowledge.

The intended application of a model, together with the available knowledge and data, will restrict its required and attainable accuracy. For example, a model used for process design or optimization may need to be more accurate than a model used for control design. On the other hand, such a model may only need to be a steady-state model, while a control design will in general be based on a dynamic model that should be most accurate around the desired crossover frequency. Also, a model used for control during startup, shutdown or exceptional operating conditions may only need to roughly predict the major causal influences between variables, since this may be sufficient to bring the system quickly and reliably into the desired operating mode. There are often different requirements regarding accuracy of the model under different operating conditions.

The intended application of a model will also impose some restrictions on its structure and complexity. If a particular control algorithm is chosen a priori, it may be a restriction that the model must be a linear transfer-function model, for example. Furthermore, a model used for analysis needs to be in a mathematically tractable form, contrary to a model used for numerical calculations only. Obviously, since physical states and parameters are not part of a black-box model, the use of a black-box model for estimation of such unmeasured physical variables is not straightforward. Mechanistic models are generally applicable for a wider class of problems than empirical models. The structure of hybrid models will in some cases restrict the applicability to the same as an empirical model, while in other cases, the form will be closer to a mechanistic model.

Knowledge is not static, but will usually improve during plant operation, both as historic process data, and improved understanding of the plant during operation. It is therefore important that the model representation supports incremental modeling in the sense that falsified or incomplete knowledge and models can be substituted with improved knowledge and models in a simple way.

1.2 Modeling and Identification

Modeling and identification deal with the mapping of prior knowledge and empirical data to a model. More precisely, we formulate the modeling and identification problems as follows:

The modeling problem. Given a set of empirical data and some a priori knowledge about the system, find a set of models that is likely to contain a model that describes the desired aspects of the system adequately.

The identification problem. Given a model set, a set of empirical data, and an objective criterion that measures the mismatch between the prediction of a given model and the empirical data, find a model within the model set that minimizes the criterion. It is common practice to decompose the identification problem into at least two subproblems: model structure identification, and parameter identification.
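In symbols, the identification problem can be sketched as below. The notation is assumed here for illustration and is not taken from the thesis: $\mathcal{M}$ denotes the model set, $D$ the empirical data with $N$ samples, $J$ the criterion, and $\hat{y}(t \mid M)$ the prediction of model $M$. The mean squared prediction error is shown only as one common choice of criterion.

```latex
\hat{M} = \arg\min_{M \in \mathcal{M}} J(M, D),
\qquad
J(M, D) = \frac{1}{N} \sum_{t=1}^{N} \bigl( y(t) - \hat{y}(t \mid M) \bigr)^{2}
```

With this notation, structure identification is the search over the discrete or function-valued part of $M$, and parameter identification is the search over its real-valued parameters for a fixed structure.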

The modeling process is typically iterative and can be formalized as in Fig. 1.2:

1. The first step is the formalization and analysis of the prior knowledge, design of experiments, collecting and analyzing data. Sometimes the data or prior knowledge are given, and parts of this task may be omitted. This step is perhaps the most important in the whole modeling process, since the success of the modeling and identification will strongly depend on the information in the data, and the amount and reliability of the prior knowledge.

2. The choice of the model set is what we have defined as modeling, and is usually based on a priori knowledge and/or some preliminary analysis of the data. With good a priori knowledge, the model set will typically contain mechanistic models. On the other hand, black-box model sets are useful since they are generic, i.e. there is a reasonable chance that there is a model in the black-box model set that can describe any system within a large class of systems. Black-box model sets are typically very large, but their size can be reduced considerably by the application of some prior knowledge. This is important, since while it is obviously important that the model set is rich, it should at the same time be as restricted as possible to reduce the cost of searching through the model set. Clearly, one may put a lot of resources into the modeling part to reduce the complexity of the identification, and vice versa. In practice, time, knowledge, operational, and economic constraints will influence the choice of model set.

Figure 1.2: The iterative process of modeling and identification. (Flowchart with three nested loops. Starting from a priori knowledge and empirical data: select a promising model set; select a promising model structure; select a promising parameter vector. Parameter identification loop: "Does the parameter vector appear to be the best possible?" Structure identification loop: "Does the model structure appear to be the best in the model structure set?" Modeling loop: "Is the model adequate?" Each test loops back on NO and proceeds on YES.)

3. The model set is typically large, and parameterized by elements of function spaces, discrete spaces, and Euclidean spaces. Model structure identification deals with search and optimization in function spaces and discrete spaces. It is typically an iterative process, and the model structure from the previous step in the iteration is usually used as a basis for the selection of a more promising model structure, either refined or simplified. Often, the structure identification problem is a combinatorial one. Hence, good heuristics are important to avoid too many iterations in this loop.

4. In this thesis, parameter identification is defined as estimation of parameters belonging to a Euclidean space. This problem is much studied in the literature.

5. After the apparently best model in the model set has been found, it must be validated. A validation test may include prediction tests using a separate validation data set. If the model does not pass the validation test, a more general model set may be tried. Alternatively, the prior knowledge must be revised and improved, or more informative data must be collected.

Observe that each step in this procedure can be done either manually, or automatically by a computer. The inner loops perform well defined tasks based on quantitative information and data, and are most suited for computer implementation, while the outer loops are based on qualitative knowledge, and are therefore more difficult to formalize and implement on a computer.

Notice that the procedure outlined above is not intended to capture all facets of practical modeling and identification work. It is only meant to serve as a framework for later use in this thesis.
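The nested loops of the procedure can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not an algorithm from the thesis: polynomial degree stands in for the model structure, ordinary least squares for parameter identification, and a mean squared error threshold on separate validation data for the adequacy test. For brevity the same validation error is used both to rank structures and to judge adequacy. The names `fit_parameters`, `mse`, `identify`, `model_sets`, and `tolerance` are hypothetical.

```python
import numpy as np

def fit_parameters(x, y, degree):
    """Parameter identification: least-squares polynomial coefficients."""
    X = np.vander(x, degree + 1)                 # columns x^degree, ..., x^0
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def mse(theta, x, y):
    """Mean squared prediction error of the polynomial with coefficients theta."""
    return float(np.mean((np.polyval(theta, x) - y) ** 2))

def identify(x_id, y_id, x_val, y_val, model_sets, tolerance):
    """Return (validation error, degree, theta) of the first adequate model, else None."""
    for degrees in model_sets:                   # modeling loop: try the next model set
        best = None
        for degree in degrees:                   # structure identification loop
            theta = fit_parameters(x_id, y_id, degree)   # parameter identification
            err = mse(theta, x_val, y_val)       # prediction test on validation data
            if best is None or err < best[0]:
                best = (err, degree, theta)
        if best is not None and best[0] <= tolerance:    # is the model adequate?
            return best
    return None   # no adequate model: revise prior knowledge or collect more data

# Hypothetical system used only to generate example data.
def system(x):
    return 1.0 + 2.0 * x - 3.0 * x ** 2

rng = np.random.default_rng(0)
x_id = rng.uniform(-1.0, 1.0, 200)
y_id = system(x_id) + 0.05 * rng.standard_normal(200)
x_val = rng.uniform(-1.0, 1.0, 200)
y_val = system(x_val) + 0.05 * rng.standard_normal(200)

# First model set: constant and linear models; second: quadratic and cubic.
model_sets = [(0, 1), (2, 3)]
result = identify(x_id, y_id, x_val, y_val, model_sets, tolerance=0.01)
```

In this example the first model set is too restricted to pass the adequacy test, so the outer loop moves on to the richer second set, mirroring the "more general model set may be tried" branch of step 5.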

1.3 An Operating Regime based Modeling Framework

Any model will have a limited range of validity. This may be restricted by the modeling assumptions for a mechanistic model, or by the experimental conditions under which the data was logged for an empirical model. To emphasize this, a model that has a range of validity less than the desired range of validity will be called a local model, as opposed to a global model that is valid in the full range of operation. In this thesis we will be concerned with a modeling framework that is based on combining a number of local models, where it is of particular importance to describe the region in which each local model is valid. We call such a region an operating regime.

Figure 1.3: The set of two-dimensional operating points is decomposed into four regimes (Regime 1 to Regime 4, drawn against the axes z_1(t) and z_2(t)). The vector z(t) = (z_1(t), z_2(t)) is the operating point.

The framework can be conceptually illustrated as in Fig. 1.3. The system's full range of operation is completely covered by a number of possibly overlapping operating regimes. In each operating regime the system is modeled by a local model, and the local models can be combined into a global model using an interpolation technique. One motivation behind this framework is that global modeling is complicated because one will need to describe the interactions between a large number of phenomena that appear globally. Local modeling, on the other hand, may be considerably simpler because locally there may be a smaller number of phenomena that are relevant, and their interactions are simpler.
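The interpolation of local models can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction developed later in the thesis: the Gaussian-shaped validity functions, the affine local models, and all names (`validity`, `global_model`, `centers`, `widths`, `A`, `b`) are hypothetical choices made for the example.

```python
import numpy as np

def validity(z, centers, widths):
    """Unnormalized validity of each regime at operating point z.

    Assumption: one Gaussian-shaped validity function per regime.
    z has shape (d,); centers and widths have shape (N, d).
    """
    d = (z - centers) / widths
    return np.exp(-0.5 * np.sum(d * d, axis=1))

def global_model(x, z, centers, widths, A, b):
    """Interpolate N local affine models y_i = A[i] @ x + b[i].

    The weights are the validity values normalized to sum to one, so the
    global prediction is a smooth blend of the local predictions.
    Returns (prediction, weights).
    """
    rho = validity(z, centers, widths)
    w = rho / rho.sum()          # assumes z is not extremely far from every regime
    local = A @ x + b            # local predictions, shape (N,)
    return float(w @ local), w

# Two overlapping regimes on a one-dimensional operating range.
centers = np.array([[0.0], [1.0]])
widths = np.array([[0.3], [0.3]])
A = np.array([[1.0], [3.0]])     # local slopes
b = np.array([0.0, -1.0])        # local offsets

# Here the operating point z is taken to be the model input x itself.
x_mid = np.array([0.5])
y_mid, w_mid = global_model(x_mid, x_mid, centers, widths, A, b)
x_low = np.array([0.0])
y_low, w_low = global_model(x_low, x_low, centers, widths, A, b)
```

Halfway between the two regime centers both local models receive equal weight, while close to the first center the first local model dominates. Changing one local model therefore only affects the global model where that regime has significant weight.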

Already at this stage, we can to some extent justify the following attractive features of the framework:

• The decomposition of the modeling problem using operating regimes is a direct application of the divide-and-conquer strategy, which is a general approach to problem solving.

• Both the concept of operating regimes, and the model structure, are easy to understand. This is important, since the model structure can be interpreted both qualitatively in terms of the operating regimes, and quantitatively in terms of the individual local models. Moreover, the operating regime concept offers a framework for interdisciplinary work because of its close relation to common engineering thinking.

• The framework is independent of the underlying model representation, in the sense that it works for input-output or state-space models, distributed or lumped models, and discrete-time, continuous-time, static, or discrete-event models.

• An immediate implication of the points above is that the framework forms a basis for hybrid modeling. For example, in some regimes the system can be described by an empirical input-output model, while in other regimes it can be described by a mechanistic state-space model. In other words, the framework is flexible with respect to the type of prior knowledge that can be applied. Moreover, the framework supports incremental modeling and simple model maintenance to some extent, because the modification of a single local model or operating regime has a predictable effect on the global model. In addition, we may use local models with different levels of accuracy according to what is required in different operating regimes.

• The local nature of the model representation has advantages when the model is applied for model based control, and in particular adaptive control. One reason is that only a few model parameters will be relevant at a given time instant. Hence, one will reduce drift phenomena in an on-line parameter estimator caused by lack of global persistence of excitation.

These properties will be further discussed and supported in the remainder of this thesis.

1.4 Outline of Thesis

First a few words on style. This thesis is based on a number of manuscripts that are published separately, or submitted for publication. We have attempted to reduce the redundancy somewhat, and more or less rewritten each manuscript. The notation within each chapter will be consistent, but there may be some minor inconsistencies in notation between the various chapters. Instead of providing the reader with a complete nomenclature list, each symbol is defined when first applied in each chapter. With a few standard exceptions, abbreviations are avoided. If not explicitly stated, the mathematical notation follows the textbook of Kreyszig �. The mathematical precision varies somewhat throughout this thesis. We have tried to apply the most natural precision level in the given context. For example, in theorems, no compromise on the mathematical precision is made, while in the definitions of some mathematical models it may be quite sloppy. In particular, we do not always distinguish between a function and its value, or a random variable and its realization.

The 2nd and 3rd chapters describe the core of the operating regime based modeling framework, and illustrate the basic approach with some examples.

Structure identification is a major issue in this thesis. In Chapters 4 and 5, we describe some new results. These results are of general interest, and are in particular applied in Chapter 6, which describes an algorithm for structure identification in the operating regime based modeling framework. Furthermore, identification is also the topic of Chapter 7.

Model predictive control and adaptive control using operating regime based models are discussed in Chapters 8 and 9, respectively. A theoretical contribution in the field of robust adaptive control can be found in Appendix A.


This organization of the thesis is chosen because it allows linear reading without jumping back and forth.

Chapter 2: Operating Regime based Input-Output Modeling. Some basic properties of the model representation based on interpolating a number of local models are elaborated upon. First, we prove that the applied interpolation procedure is optimal, in a certain sense. Next, some approximation theoretical bounds on the modeling error are developed. The developments are constructive and provide fundamental insight into the design trade-offs. Moreover, a procedure for the development of NARMAX models is described, and the procedure is illustrated with some experimental results from a laboratory heat transfer process. The optimality result can be found in

Tor A. Johansen, "On the Optimality of the Takagi-Sugeno-Kang Fuzzy Inference Mechanism", �th IEEE Int. Conference on Fuzzy Systems, Yokohama, Japan, �� (accepted).

The approximation theoretical results and the modeling procedure are published in

Tor A. Johansen and Bjarne A. Foss, "Constructing NARMAX Models using ARMAX Models", Int. J. Control, Vol. ��, pp. ��.

The experimental results are from

Tor A� Johansen and Bjarne A� Foss� �Empirical Modeling of a Heat TransferProcess using Local Models and Interpolation�� American Control Conference�Seattle� Wa�� submitted��

Chapter 3: Operating Regime based State-space Modeling. The modeling procedure in Chapter 2 is extended to state-space modeling, and the additional difficulties are discussed. Simulation examples from the areas of population dynamics and biochemical engineering are used to illustrate the ideas. The core of this chapter was presented in

Tor A. Johansen and Bjarne A. Foss, "State-space Modeling using Operating Regime Decomposition and Local Models", Preprints �th IFAC World Congress, Sydney, Australia, Vol. �, pp. �.

Chapter 4: Model Structure Identification using Empirical Data from a Limited Operating Range. A new structure identification criterion based on bootstrap estimation is developed. The purpose is to give reliable identification of model structure (in the sense of a model with good extrapolation properties) in cases when the model is required to work under a wider range of operating conditions than reflected in the identification data sequence. This work can be found in

Tor A. Johansen and Bjarne A. Foss, "A Dynamic Modeling Framework based on Local Models and Interpolation - Combining Empirical and Mechanistic Knowledge and Data", Computers and Chemical Engineering (submitted).


Chapter 5: Model Structure Identification using Separate Validation Data - Asymptotic Properties. Here we prove that under reasonable assumptions, the identification of model structure on the basis of a separate validation data sequence will asymptotically give the best possible model in the model set. This result applies to the structure identification algorithm in Chapter 6. This chapter is contained in

Tor A. Johansen and Erik Weyer, "Model Structure Identification using Separate Validation Data - Asymptotic Properties", European Control Conference, Rome, � (submitted).

Chapter 6: Identification of Operating Regimes. We propose an algorithm that automatically finds a decomposition into regimes and local models on the basis of empirical data. In addition to a sequence of empirical data, the input to the algorithm is a set of alternative local model structures. Some statistical properties of this algorithm are briefly discussed. The algorithm is applied to the same simulated biochemical modeling problem as in Chapter 3, a simulated pH neutralization tank, and an experimental hydraulic manipulator modeling problem. In this last example, the results are compared to three other state-of-the-art empirical modeling algorithms from the literature. A short version of this chapter is

Tor A. Johansen and Bjarne A. Foss, "Identification of Non-linear System Structure and Parameters using Regime Decomposition", Preprints IFAC Symposium on System Identification, Copenhagen, Vol. �, pp. �. To appear in Automatica, March � (accepted).

Some of the examples are taken from the following manuscript

Tor A. Johansen and Bjarne A. Foss, "A Dynamic Modeling Framework based on Local Models and Interpolation - Combining Empirical and Mechanistic Knowledge and Data", Computers and Chemical Engineering (submitted).

Chapter 7: Identification of Non-linear Systems using Prior Knowledge and Empirical Data - An Optimization Approach. In this chapter we illustrate how various kinds of prior knowledge can be formulated as constraints or penalties in a prediction error identification criterion. The purpose of this is to elevate the modeling problem from the specification of a model structure in terms of e.g. a set of basis-functions, to a higher level. It is shown how different kinds of prior knowledge lead to different optimal model structures in some simple cases, and a practical (almost optimal) numerical procedure is suggested for the general case. The choice of optimization criterion corresponds to modeling in this approach, and a procedure that supports the choice of criterion is suggested. The method is illustrated with the same simulated pH neutralization tank as in Chapter 6. This chapter is based on

Tor A. Johansen, "Identification of Non-linear Systems using Prior Knowledge and Empirical Data - An Optimization Approach", Automatica (submitted).


Chapter 8: Operating Regime based Model Predictive Control. With a simulation example, we illustrate the application of the operating regime based modeling framework for model predictive control of a batch fermentation reactor. The material in this chapter is taken from

Bjarne A. Foss, Tor A. Johansen and Aage V. Sørensen, "Nonlinear Predictive Control using Local Models - Applied to a Batch Process", Preprints IFAC Symposium on Advanced Control of Chemical Processes ADCHEM, Kyoto, Japan, �, pp. �. Also in Control Engineering Practice, summer � (accepted).

Chapter 9: Operating Regime based Adaptive Control. We illustrate how the operating regime based modeling framework may give model structures that are attractive for the control of non-linear systems. We restrict our attention to a discrete-time non-linear decoupling controller based on a MIMO NARX model representation. First, we prove closed loop stability and robustness of the non-linear control system. Next, we formulate a certainty equivalence adaptive controller based on the same model representation, and develop a modified parameter estimation algorithm that only updates the parameters of the most relevant local models at each time instant. We show that this algorithm has basically the same convergence properties as the standard projection algorithm. Finally, we prove robust stability of this adaptive control system. The method and design trade-offs are illustrated with a simulation example, where a �� Continuous Stirred Tank Reactor (CSTR) is controlled. This work was presented in

Tor A. Johansen, "Adaptive Control of MIMO Non-linear Systems using Local ARX Models and Interpolation", Preprints IFAC Symposium on Advanced Control of Chemical Processes ADCHEM, Kyoto, Japan, �, pp. �.

and published in a fuzzy disguise in

Tor A. Johansen, "Fuzzy Model Based Control: Stability, Robustness, and Performance Issues", IEEE Transactions on Fuzzy Systems, Vol. �, pp. �.

Chapter 10: Conclusions. This chapter summarizes our main conclusions and contains some suggestions for future work.

Appendix A: Robust Adaptive Control of Slowly Time-varying Non-linear Systems. Here we prove global stability and robustness of a SISO discrete-time adaptive non-linear decoupling controller for a quite general class of slowly time-varying non-linear systems. This result is a generalization of the related results in Chapter 9 and is based on

Tor A. Johansen, "Weighted l�-norms for Analysis of an Adaptive Control Loop based on a Non-linear Model", IEEE Transactions on Automatic Control (submitted).

Some results in the same spirit are given in

Tor A. Johansen and Petros A. Ioannou, "Robust Adaptive Control of Minimum Phase Non-linear Systems", Int. J. Adaptive Control and Signal Processing (submitted).


�� Literature overview

First we will give a brief overview of some recent approaches to hybrid modeling from the chemical engineering literature. Then an overview of some local modeling approaches will be presented.

�� Hybrid Modeling � In Chemical Engineering

Recently, the hybrid modeling problem has attracted considerable attention. One reason for this may be the interest in non-linear empirical modeling and neural networks over the last decade, combined with the increasing demand for non-linear models to be applied in intelligent controllers, diagnosis systems, and optimization based systems for design, scheduling, and quality or profit maximization. A very brief overview of some of these approaches follows.

The interactions between empirical and mechanistic knowledge are widely discussed in the engineering and scientific literature. Through a series of works, e.g. (Box and Youle �, Box and Hunter �, Box and Draper �), the interplay between model purpose, experimentation, empirical models, and mechanistic models has been discussed in great detail. Referring to Fig. �, the tools developed by Box and co-workers will support the transition from a state with little system knowledge to either an empirical or a mechanistic model, rather than to a hybrid model. A more direct approach to the hybrid modeling problem is the grey-box approach of Bohlin (�), see also (Bohlin and Graebe �). Grey-box models are characterized by a structure that is mainly mechanistic, but augmented with stochastic elements, empirical relations and parameters that need not have a direct physical interpretation. Physically based constraints that ensure the model is consistent with certain aspects of the physical reality, and prior knowledge about the parameters, can also be utilized (Tulleken �). In another approach, Lindskog and Ljung (�) illustrate how simple mechanistic knowledge can be used to find adequate non-linearities in a semi-empirical model. Kramer, Thompson and Phagat (�) have proposed an approach where an a priori given mechanistic model is used as a default model, and an empirical model based on a radial-basis function series expansion compensates for the possible mismatch between the prediction of the more or less inaccurate mechanistic model and the observed data, see also (Thompson and Kramer �, Su, Bhat and McAvoy �). In addition, the parameters of the empirical model part may be constrained. A related, but somewhat different approach was taken by Jian (�) in his model of a blast furnace. A linear empirical (ARMAX) model with an on-line estimator was used as a rough model of the system. The difference between the predictions given by the ARMAX model and the system output was modeled with a set of empirical qualitative rules from a knowledge-base that was designed with the help of experienced operators. This hybrid model was reported to give better predictions than either the ARMAX model or the predictions of the operators alone. Another approach to the modeling problem is described in (Psichogios and Ungar �) and (Aoyama and Venkatasubramanian �). A mechanistic model structure containing some unknown internal variables is supposed to be known. The unknown internal variables will typically be reaction rates or thermodynamic properties that depend in a complex and often poorly known manner on a number of other model variables. The idea is to use a neural network as an empirical model of these dependences. Yet another approach will be the main topic of this thesis.

�� Local Models

The basic idea is simple, and several scientists and engineers have coined similar ideas. It is particularly interesting to observe that when we have presented these ideas to engineers practising in industry, "everybody" seems to recognize the basic concept. However, the industrial applications have in general been ad hoc, in the sense that no formalized framework or theoretical foundation has been applied. Unfortunately, the industrial results are typically not published, and the following literature overview will therefore mainly consist of contributions from academia.

The earliest model we have found that is directly based on local models is from the field of mathematical biology, and dates back to Kolmogoroff (�). The model is an alternative to the classical Lotka-Volterra model, and is based on a decomposition of the system into different regimes (called zones) with qualitatively different behavior. This model will be discussed in some detail in section �.

Piecewise linear models clearly fall into this framework. Early contributions describing modeling procedures based on piecewise linear models and an optimization formulation of the regime decomposition problem are found in (Bellman �b, Opoitsev �, Dorofeyuk, Kasavin and Torgovitsky �, Kasavin �, Rajbman, Dorofeyuk and Kasavin �). Various other procedures are described in (Breiman and Meisel �, Cyrot-Normand and Mien �, Tong and Lim �, Haber, Vajk and Keviczky �, Omohundro �, Farmer and Sidorowich �, Billings and Voon �, Billings and Chen �, Kavli �, Strömberg, Gustafsson and Ljung �, Skeppstedt, Ljung and Millnert �, Hilhorst �, Söderman, Top and Strömberg �).

A particularly interesting piecewise modeling approach from the biochemical engineering literature is described in (Zhang, Visala, Halme and Linko �), where the application of prior knowledge for the decomposition into regimes (called functional states) is highlighted.

The use of more complex local models than linear, like a piecewise combination of several local neural network models (Sørheim �) and local polynomial models (Pottmann, Unbehauen and Seborg �), is also described in the literature. Clearly, piecewise polynomials are closely related to the spline-based modeling approach, e.g. (Wahba �). The main difference lies in the way splines handle multi-dimensionality. While piecewise polynomials are multi-dimensional by nature, splines are not, and multi-dimensionality is typically introduced by means of tensor products.

In our view, the most interesting approach in the local modeling literature is the one of Takagi and Sugeno (�), where each operating regime is represented as a fuzzy set (which may roughly be viewed as a set with soft boundaries). This representation is appealing, since many systems change behavior smoothly as a function of the operating point, and the soft transition between the regimes introduced by the fuzzy set representation captures this feature in an elegant fashion. Algorithms for the identification of a decomposition into regimes on the basis of data are described by Sugeno and Kang (�), and several experimental and simulated examples are given (Takagi and Sugeno �, Sugeno and Kang �, Sugeno and Kang �, Nakamori, Suzuki and Yamanaka �). Related identification algorithms based on cluster analysis are described in (Bezdek, Coray, Gunderson and Watson �a, Bezdek, Coray, Gunderson and Watson �b, Hathaway and Bezdek �, Yoshinari, Pedrycz and Hirota �, Nakamori and Ryoke �), and similar ideas are applied in chemometrics by Næs and Isaksson (�) and Næs (�).

Essentially the same ideas have evolved in the neural network community as well, where Stokbro, Hertz and Umberger (�) and Jones and co-workers (�) describe approaches to modeling based on local linear models as generalizations of neural networks with localized receptive fields and radial basis-functions, e.g. (Moody and Darken �). Simultaneously, Jacobs, Jordan, Nowlan and Hinton (�) developed a framework based on "local experts", and an effective identification algorithm was later developed by Jordan and Jacobs (�). Other identification algorithms were developed by Murray-Smith (�) and Murray-Smith and Gollee (�).

A standard approach to non-linear control design is gain scheduling, which is essentially a collection of local controllers. Different controllers are activated (scheduled) under different operating conditions. The operating regime based modeling approach has obvious similarities (it may be described as model scheduling). Notice that the design of a non-linear controller based on a combination of local models is more sophisticated than the use of gain scheduling; see also (Shamma and Athans �, Rugh �), who analyze the shortcomings of gain scheduling.

The works of Stokbro et al. (�) and Jones and co-workers (�) were the initial inspiration for the present thesis, and our initial ideas were published in (Johansen and Foss �b, Johansen and Foss �a, Johansen and Foss �c, Foss and Johansen �).

It is interesting to observe that most of this literature was found more or less by luck, i.e. browsing volumes of proceedings and journals. Essentially nothing was found by database searches. The reason for this is the interdisciplinary nature of these ideas, which have their origin in very different communities, including biology, fuzzy sets, physics, mathematics, statistics, neural nets, chemical engineering, control and identification theory, to mention some. The terminology varies considerably, which makes database search almost impossible. We expect there are significant contributions in perhaps completely different areas that we have failed to discover.


�� Contributions

Briefly, the main contributions in this thesis are

• The design procedure for operating regime based model structures based on a combination of qualitative, quantitative, empirical, and mechanistic system knowledge outlined in Chapters 2 and 3.

• The structure identification algorithm in Chapter 6.

• The bootstrap structure identification criteria described in Chapter 4.

• The asymptotic analysis of the structure identification criterion based on separate validation data, Chapter 5.

• The formulation of the modeling and identification problem as an optimization problem in Chapter 7.

• The proof of optimality of the local model interpolation method in section �.

• The constructive approximation theoretical results in section �.

• The procedure for combining local models with different state-spaces, section �.

• The modified on-line identification algorithm in section �.

• The proof of stability in Chapter 9.

• The application of weighted l�-norms in the stability proof in Appendix A.

Chapter 2

Operating Regime based Input-Output Modeling

For some applications, one wants a model that only describes the input-output behavior of the system. The ARMAX model representation is a well known linear input-output model representation, while the NARMAX (Nonlinear ARMAX) model representation is an extension that represents the model as a nonlinear mapping of past inputs, outputs and noise terms to future outputs. In this chapter we discuss how NARMAX models can be represented in the operating regime based modeling framework, and discuss in detail how a NARMAX model can be constructed from a set of local ARMAX models.

This chapter is organized as follows. First, in section 2.1, we present a model representation based on local models, and build a theoretical foundation for this representation. The operating regime concept is introduced and we present a general result guiding the choice of variables used to characterize the operating regimes. Thereafter, we discuss some practical aspects of modeling using local ARMAX models in section 2.2, and some aspects of system identification in section 2.3. In section 2.4, the procedure is illustrated with some experimental results, and section 2.5 contains some discussion.

2.1 Model Representation

The NARMAX model representation

$$ y(t) = f\big(y(t-1), \ldots, y(t-n_y),\; u(t-1), \ldots, u(t-n_u),\; e(t-1), \ldots, e(t-n_e)\big) + e(t) $$

is shown by Leontaritis and Billings (�) to be able to represent the observable and controllable modes of a large class of discrete-time non-linear state-space models. Here $y(t) \in Y \subset \mathbb{R}^m$ is the output vector, $u(t) \in U \subset \mathbb{R}^r$ is the input vector, and $e(t) \in E \subset \mathbb{R}^m$ is noise. Throughout this chapter we will assume $n_y$, $n_u$ and $n_e$ are known, and focus on the problem of representing and constructing the non-linear function $f : \Psi \to \mathbb{R}^m$. We introduce the $(m(n_y + n_e) + r n_u)$-dimensional information vector

$$ \psi(t-1) = \begin{pmatrix} y(t-1) \\ \vdots \\ y(t-n_y) \\ u(t-1) \\ \vdots \\ u(t-n_u) \\ e(t-1) \\ \vdots \\ e(t-n_e) \end{pmatrix} $$

belonging to the set $\Psi = Y^{n_y} \times U^{n_u} \times E^{n_e}$. This enables us to write the NARMAX model above in the form

$$ y(t) = f(\psi(t-1)) + e(t) $$
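As a minimal sketch of how the information vector can be assembled from recorded sequences, the following Python function stacks the lagged outputs, inputs and noise terms in the order used above. The function name, the array layout (one row per time step) and the argument names are assumptions made for illustration only.

```python
import numpy as np

def information_vector(y, u, e, t, ny, nu, ne):
    """Stack past outputs, inputs and noise terms into psi(t-1).

    y and e have shape (T, m), u has shape (T, r); row k holds the
    sample at time k. Requires t >= max(ny, nu, ne).
    """
    if t < max(ny, nu, ne):
        raise ValueError("not enough past samples before time t")
    past_y = [y[t - k] for k in range(1, ny + 1)]  # y(t-1), ..., y(t-ny)
    past_u = [u[t - k] for k in range(1, nu + 1)]  # u(t-1), ..., u(t-nu)
    past_e = [e[t - k] for k in range(1, ne + 1)]  # e(t-1), ..., e(t-ne)
    return np.concatenate(past_y + past_u + past_e)
```

The length of the returned vector is m(ny + ne) + r nu, in agreement with the dimension stated in the text.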

Provided necessary smoothness conditions on $f$ are satisfied, a general way of approximating $f$ is by series expansions. A first order Taylor-series expansion about an equilibrium point yields an ARMAX model. Second- and third-order Taylor-expansions are possible, while higher-order Taylor-expansions are not very useful in practice because the number of parameters in the model increases rapidly with the expansion order, and because of the poor extrapolation and interpolation capabilities of higher-order polynomials. Splines offer one possible solution to this problem, e.g. (Wahba �), where the idea is to patch together low-order polynomials. A representation that is closely related to splines in spirit, but still very different for multi-dimensional modeling problems, is based on patching together local models. In its basic form, it was introduced by Takagi and Sugeno (�) and rediscovered independently by Stokbro et al. (�), Jones and co-workers (�) and (Jacobs et al. �). In its present form, it was introduced in (Johansen and Foss �b, Johansen and Foss �a). The material in this chapter is mainly based on (Johansen and Foss �a).

2.1.1 Optimal combination of Local Models

Suppose $N$ local models (indexed by $i \in \{1, \ldots, N\}$)

$$ y(t) = \hat f_i(\psi(t-1)) + e(t) $$

are available, and the different local models are accurate under different operating conditions. Hence, under some operating conditions there may be several local models that are accurate, while no local model may be accurate under other conditions. Suppose the relative validity (or relevance) of each local model is indicated by the functions $\tilde\rho_1, \tilde\rho_2, \ldots, \tilde\rho_N : \Psi \to [0, 1]$. If at a given $\psi \in \Psi$ the local model indexed with $i$ is accurate, then $\tilde\rho_i(\psi)$ will be close to one, while $\tilde\rho_i(\psi)$ is close to zero for all $\psi \in \Psi$ where local model $i$ is inaccurate.


This immediately raises the question of how to predict the system output for those $\psi$ where it is not the case that exactly one local model is accurate. We essentially seek a global model

$$ y(t) = \hat f(\psi(t-1)) + e(t) $$

based on a combination of the local models. From the definition of $\tilde\rho_i$, it is natural to require that $\hat f(\psi)$ should be close to $\hat f_i(\psi)$ at those $\psi \in \Psi$ where $\tilde\rho_i(\psi)$ is large. The subset of $\Psi$ where "$\tilde\rho_i(\psi)$ is large"¹ is denoted $\Psi_i$. This suggests that $\hat f$ should minimize a criterion such as

$$ M(\hat f) = \sum_{i=1}^{N} \int_{\Psi} \big\| \hat f(\psi) - \hat f_i(\psi) \big\|_2^2 \, \tilde\rho_i(\psi) \, d\psi $$

where $\|\cdot\|_2$ is the Euclidean norm. We now have the following result (Johansen �).

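Theorem 1. Suppose

1. the functions $\hat f_1, \hat f_2, \ldots, \hat f_N$ belong to $C^m(\Psi)$, the set of all continuous $m$-dimensional functions defined on $\Psi$,

2. $\sum_{i=1}^{N} \tilde\rho_i(\psi) > 0$ for all $\psi \in \Psi$.

Then the function $\hat f$ defined by

$$ \hat f(\psi) = \sum_{i=1}^{N} \hat f_i(\psi)\, \tilde w_i(\psi), \qquad \tilde w_i(\psi) = \frac{\tilde\rho_i(\psi)}{\sum_{j=1}^{N} \tilde\rho_j(\psi)} $$

minimizes $M$ on $C^m(\Psi)$.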

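Proof. The variation of $M$ with respect to any perturbation $\delta f \in C^m(\Psi)$ is

$$ \delta M(\hat f; \delta f) = 2 \sum_{i=1}^{N} \int_{\Psi} \big( \hat f(\psi) - \hat f_i(\psi) \big)^T \tilde\rho_i(\psi)\, \delta f(\psi) \, d\psi $$

Notice that $M$ is strictly convex. A necessary and sufficient condition for global optimality of $\hat f$ is now (Luenberger �)

$$ \sum_{i=1}^{N} \big( \hat f(\psi) - \hat f_i(\psi) \big) \tilde\rho_i(\psi) = 0 \quad \text{for all } \psi \in \Psi $$

From the second assumption, it is evident that $\hat f$ is well defined by this equation.

□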
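The last step of the proof can be spelled out pointwise. The derivation below is a sketch written in LaTeX; it only assumes that the validity functions sum to a strictly positive value at every point, as in the second assumption of the theorem, and writes $\tilde\rho_i$ for the validity functions and $\psi$ for the information vector.

```latex
\begin{align*}
\sum_{i=1}^{N} \bigl(\hat f(\psi) - \hat f_i(\psi)\bigr)\,\tilde\rho_i(\psi) &= 0 \\
\hat f(\psi) \sum_{j=1}^{N} \tilde\rho_j(\psi) &= \sum_{i=1}^{N} \hat f_i(\psi)\,\tilde\rho_i(\psi) \\
\hat f(\psi) &= \sum_{i=1}^{N} \hat f_i(\psi)\,
  \frac{\tilde\rho_i(\psi)}{\sum_{j=1}^{N} \tilde\rho_j(\psi)}
\end{align*}
```

The division in the last line is permitted precisely because the denominator is positive for every $\psi$, which is why the second assumption is needed.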

¹ We will later illustrate how the expression "is large" can be made mathematically precise by the introduction of fuzzy set theory. For the time being, let us just accept this imprecise definition of $\Psi_i$.


The first assumption is quite reasonable, since we are allowed to choose the local models ourselves in most applications. The second assumption simply means that at least one local model should be relevant under every possible operating condition. This is also reasonable, since we do not need an accurate local model, just a relevant one.

We call the functions in the set $\{\tilde w_i\}_{i=1}^{N}$ interpolation functions because they are used to interpolate² between local models $\hat f_i$, which are supposed to be accurate descriptions of "the true $f$" locally. The relative weight given to each local model in the interpolation is determined by the relative validity of the local models. We can interpret $\tilde w_i$ as a function that has its largest value in the parts of $\Psi$ where the function $\hat f_i$ is the best approximation to $f$, and is close to zero elsewhere (notice that best does not imply good). By the definition of $\tilde w_i$ we know that $\sum_{i=1}^{N} \tilde w_i(\psi) = 1$ for all $\psi \in \Psi$. Hence, the resulting global model is a convex combination of the local models. In the remainder of this thesis we will consistently apply this method for combining local models.

2.1.2 The Model Set

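The set of all functions of the interpolated form above, with local models of polynomial order $p$, is denoted

$$ \tilde F_p(\tilde K) = \Big\{ \hat f : \Psi \to \mathbb{R}^m \;\Big|\; \hat f(\psi) = \sum_{i=1}^{N} \hat f_i(\psi)\, \tilde w_i(\psi) \Big\} $$

It is assumed that the functions $\tilde\rho_i$ belong to a given set $\tilde K$ of smooth kernel functions that go sufficiently fast to zero, for example the set of all Gaussians,

$$ \tilde K = \Big\{ k : \Psi \to [0, 1] \;\Big|\; k(\psi) = \exp\big( -(\psi - \psi_i)^T \Sigma_i^{-1} (\psi - \psi_i) \big) \Big\} $$

where $\Sigma_i$ is a positive definite scaling matrix. The choice of $\tilde K$ will be elaborated upon later.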
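To make the combination of local models concrete, here is a minimal Python sketch of Gaussian validity functions, the normalized interpolation weights, and the resulting global model. The function names are hypothetical, and the exact form of the exponent (in particular the absence of a factor one half) is an assumption; the sketch only relies on the weights being the validity functions divided by their sum.

```python
import numpy as np

def gaussian_validity(psi, center, scale):
    """Gaussian validity function exp(-(psi - c)^T S^{-1} (psi - c))."""
    d = np.asarray(psi, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-d @ np.linalg.solve(scale, d)))

def interpolation_weights(psi, centers, scales):
    """Normalize the validity functions so that the weights sum to one."""
    rho = np.array([gaussian_validity(psi, c, s)
                    for c, s in zip(centers, scales)])
    total = rho.sum()
    if total <= 0.0:
        # Mirrors the assumption that some local model is relevant everywhere.
        raise ValueError("no local model is relevant at this point")
    return rho / total

def global_model(psi, local_models, centers, scales):
    """Convex combination of the local model predictions at psi."""
    weights = interpolation_weights(psi, centers, scales)
    return sum(w * np.asarray(f(psi), dtype=float)
               for w, f in zip(weights, local_models))
```

With two constant local models placed symmetrically around a point and equal scaling matrices, the weights at that point are equal, so the global prediction is the average of the two local predictions.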

Next, let us briefly discuss the choice of local model structures. At the extreme, a zeroth order Taylor-expansion of $f$ about a point $\psi_i \in \Psi_i$ may be used to define the local model

$$ \hat f_i(\psi) = f(\psi_i) $$

or the parameterized local model structure

$$ \hat f_i(\psi) = \theta_i $$

where $\theta_i$ is an unknown parameter vector.

² Strictly speaking, it is not an interpolation, rather a combination. However, we will still use the word interpolation throughout this thesis.


A first order Taylor-expansion of $f$ about $\psi_i$ provides better extrapolation and interpolation than the zeroth order expansion. Assuming the first derivative of $f$ at $\psi_i$ exists, the local models are given by

$$ \hat f_i(\psi) = f(\psi_i) + \nabla f(\psi_i)(\psi - \psi_i) $$

or a parameterized local model structure may be defined as

$$ \hat f_i(\psi) = \theta_i + \Theta_i (\psi - \psi_i) $$

where $\theta_i$ is an unknown parameter vector and $\Theta_i$ is an unknown parameter matrix. Observe that this local model is actually an ARMAX model resulting from a linearization of the NARMAX model about $\psi_i$. Both the Weighted Linear Maps of Stokbro et al. (�), Stokbro and Umberger (�) and the Connectionist Normalized Linear Spline Networks of Jones and co-workers (�) and Jones, Lee, Barnes, Flake, Lee, Lewis and Qian (�) use a first order expansion locally. This representation makes it possible to construct a NARMAX model by interpolating a number of ARMAX models.
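When the function is available for evaluation (for instance in a simulation study), a first order local model of this kind can be obtained numerically. The sketch below uses central finite differences to approximate the derivative; the helper names and the step size are assumptions for illustration, and in an identification setting the parameters would instead be estimated from data.

```python
import numpy as np

def linearize(f, psi_i, h=1e-6):
    """Return (theta_i, Theta_i) with f(psi) ~ theta_i + Theta_i (psi - psi_i)."""
    psi_i = np.asarray(psi_i, dtype=float)
    theta = np.atleast_1d(np.asarray(f(psi_i), dtype=float))
    Theta = np.empty((theta.size, psi_i.size))
    for k in range(psi_i.size):
        step = np.zeros_like(psi_i)
        step[k] = h
        # Central difference approximation of column k of the Jacobian.
        forward = np.atleast_1d(f(psi_i + step))
        backward = np.atleast_1d(f(psi_i - step))
        Theta[:, k] = (forward - backward) / (2.0 * h)
    return theta, Theta

def local_model(theta, Theta, psi_i):
    """Build the first order local model around the point psi_i."""
    psi_i = np.asarray(psi_i, dtype=float)
    return lambda psi: theta + Theta @ (np.asarray(psi, dtype=float) - psi_i)
```

A set of such local models, one per operating point, can then be combined with normalized validity functions to give a global non-linear model.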

Higher order and non-polynomial local models can also be used. Furthermore, there is no requirement that all the local models must have the same structure. In particular, the local models may have different dynamic or polynomial order.

2.1.3 Approximation Properties

It is intuitive that any smooth function can be approximated arbitrarily well with this representation by making the decomposition of $\Psi$ into subsets $\Psi_i$ sufficiently fine. This is indeed the case, as illustrated in the following. We use the following norm to measure the approximation accuracy

$$ \| f - \hat f \|_\infty = \sup_{\psi \in \Psi} \| f(\psi) - \hat f(\psi) \|_2 $$

The $(p+1)$-th derivative of the vector function $f$ at the point $\psi$ is denoted $\nabla^{p+1} f(\psi)$. Assume $f$ is continuously differentiable $p+1$ times, and $\hat f_i$ is a local model equal to the first $p+1$ terms of the Taylor-series expansion of $f$ about $\psi_i$. For any $\psi \in \Psi$, we have

$$ f(\psi) - \hat f(\psi) = \sum_{i=1}^{N} \big( f(\psi) - \hat f_i(\psi) \big) \tilde w_i(\psi) $$

If we assume $\| \nabla^{p+1} f(\psi) \| \le M$ for all $\psi \in \Psi$, where $\|\cdot\|$ denotes the induced norm, we obtain by Taylor's theorem

$$ \| f(\psi) - \hat f(\psi) \|_2 \le \sum_{i=1}^{N} \frac{M}{(p+1)!} \| \psi - \psi_i \|_2^{p+1} \, \tilde w_i(\psi) $$

In order to ensure that the right-hand side is no larger than an arbitrary $\varepsilon > 0$, we must ensure that for any $\psi \in \Psi$ the following condition holds

$$ \sum_{i=1}^{N} \| \psi - \psi_i \|_2^{p+1} \, \tilde\rho_i(\psi) \le \varepsilon \frac{(p+1)!}{M} \sum_{i=1}^{N} \tilde\rho_i(\psi) $$


[Figure: plot of the functions $g_i$ and the validity functions $\tilde\rho_i$ against $\psi$.]

Figure 2.1: A geometric interpretation of the constraints on $\tilde\rho_i$

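Defining the function $g_i : \Psi \to \mathbb{R}$ by

$$ g_i(\psi) = \| \psi - \psi_i \|_2^{p+1} - \varepsilon \frac{(p+1)!}{M} $$

and rewriting the condition above gives the following condition that must hold for any $\psi \in \Psi$

$$ \sum_{i=1}^{N} g_i(\psi)\, \tilde\rho_i(\psi) \le 0 $$

or equivalently

$$ \sum_{i=1}^{N} g_i(\psi)\, \tilde w_i(\psi) \le 0 $$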

The problem is now to find conditions on $N$ and the set of functions $\{\tilde\rho_i\}_{i=1}^{N} \subset \tilde K$ to ensure that the condition $\sum_{i=1}^{N} g_i(\psi)\tilde\rho_i(\psi) \le 0$ holds for any given $\varepsilon > 0$. A geometric interpretation of this condition is given in Figure 2.1. Certainly, the condition holds if the negative contribution of one term $g_i(\psi)\tilde\rho_i(\psi)$ in the sum dominates the (possibly positive) contribution of all other terms.

Notice that the shape of the $g_i$-functions is fixed and given by the specifications. We are, however, free to choose the location and number of local models. Let us choose the set $\{\psi_i\}_{i=1}^{N}$ so large and "sufficiently dense in $\Psi$" that at least one of the functions $\{g_i\}_{i=1}^{N}$ will be negative at any $\psi \in \Psi$. Then the functions $\{g_i\}_{i=1}^{N}$ are fixed, and we must choose the $\tilde\rho_i$-functions such that $\sum_{i=1}^{N} g_i(\psi)\tilde\rho_i(\psi) \le 0$ holds. This can be accomplished in several ways. In the limit when the width of the $\tilde\rho_i$-functions goes to zero, the interpolation functions $\tilde w_i$ will approach step-functions as shown in Figure 2.2. The model will then approach a piecewise constant model if $p = 0$, a piecewise linear model if $p = 1$, etc.

[Figure: plot of the functions $g_i$, the narrow validity functions $\tilde\rho_i$ and the resulting step-like interpolation functions $\tilde w_i$ against $\psi$.]

Figure 2.2: Situation when the width of $\tilde\rho_i$ goes to zero.

In this limit, at any $\psi \in \Psi$ there will exist a $j$ such that

$$ \tilde w_i(\psi) = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \ne j \end{cases} $$

By the choice of $\{\psi_i\}_{i=1}^{N}$ we know that $g_j(\psi) < 0$, and since $\tilde w_i(\psi) = 0$ for $i \ne j$, the condition $\sum_{i=1}^{N} g_i(\psi)\tilde w_i(\psi) \le 0$ will hold. We can now provide a result for the case when $\Psi$ is a compact set.
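The role of the width of the validity functions can be checked numerically in the scalar case. The sketch below evaluates the weighted sum of the g-functions on a grid and returns its largest value, so the condition discussed above holds on the grid when the returned value is not positive. The function name, the scalar Gaussian form with a single common width, and the numbers in the usage note are all assumptions chosen for illustration.

```python
import math
import numpy as np

def worst_case_margin(centers, width, p, M, eps, grid):
    """Largest value over the grid of sum_i g_i(psi) * w_i(psi), scalar psi."""
    centers = np.asarray(centers, dtype=float)
    grid = np.asarray(grid, dtype=float)
    threshold = eps * math.factorial(p + 1) / M
    diff = grid[:, None] - centers[None, :]
    g = np.abs(diff) ** (p + 1) - threshold      # the g_i functions
    rho = np.exp(-(diff / width) ** 2)           # Gaussian validity functions
    w = rho / rho.sum(axis=1, keepdims=True)     # normalized weights
    return float(np.max(np.sum(g * w, axis=1)))
```

For example, with two local models at 0.25 and 0.75 on the unit interval, p = 1, M = 2 and a tolerance of 0.07, narrow validity functions (width 0.05) satisfy the condition on the grid, while very wide ones (width 1.0) do not, because distant local models then receive too much weight.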

Theorem 2. Suppose any integer $p \ge 0$ is given, and $f$ has continuous $(p+1)$-th derivative. If $\Psi$ is compact, then for any $\varepsilon > 0$ there is an $\hat f \in \tilde F_p(\tilde K)$ such that

$$ \| f - \hat f \|_\infty \le \varepsilon $$

Proof. Since $\nabla^{p+1} f(\psi)$ is continuous, its norm is bounded (by $M < \infty$) on the compact set $\Psi$. Moreover, since $\Psi$ is compact, a finite $N$ is sufficient to ensure that at least one $g_i$-function is negative at any point in $\Psi$. Since $N$ is finite, it is not necessary to go to the limit and make $\{\tilde w_i\}_{i=1}^{N}$ step-functions; instead we can choose $\{\tilde\rho_i\}_{i=1}^{N}$ as smooth kernel functions that are sufficiently narrow for the condition $\sum_{i=1}^{N} g_i(\psi)\tilde\rho_i(\psi) \le 0$ to hold. This is possible because the kernel functions go to zero sufficiently fast.

□

This is an existence result. However, the proof is constructive and gives indications on how to construct the approximation. In order to use this proof to formulate an upper bound on the approximation error, we introduce the following definition of distance between sets, similar to the Hausdorff metric.

Definition 1. Assume $A$ and $B$ are non-empty subsets of $\mathbb{R}^n$. Then the distance between the sets is defined as

$$ D(A, B) = \sup_{b \in B} \, \inf_{a \in A} \| a - b \|_2 $$

□


The crux in the proof of Theorem 2 is that at any point $\psi \in \Psi$ one of the $g_i$-functions is negative and that the $\tilde\rho_i$-functions are chosen such that the negative term $g_i(\psi)\tilde\rho_i(\psi)$ will dominate the sum $\sum_{i=1}^{N} g_i(\psi)\tilde\rho_i(\psi)$. At least one $g_i$-function will be negative at any $\psi \in \Psi$ if the following condition holds

$$ D(\{\psi_i\}, \Psi) \le \left( \varepsilon \frac{(p+1)!}{M} \right)^{\frac{1}{p+1}} $$

If the set $\{\psi_i\}_{i=1}^{N}$ is dense in $\Psi$, this distance will be zero. The term "sufficiently dense", used informally above, means that the set $\{\psi_i\}_{i=1}^{N}$ should be chosen such that this condition holds for the given $\varepsilon$.
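The density condition can be evaluated on a sampled version of the operating space. The following Python sketch computes the distance as the largest distance from any sampled point to its nearest local model location, which is the reading under which a dense set of locations gives distance zero; that reading, the sampling of the domain, and the function names are assumptions made for illustration.

```python
import math
import numpy as np

def set_distance(centers, domain_points):
    """Largest distance from any sampled domain point to its nearest center.

    centers has shape (N, n) and domain_points has shape (K, n).
    """
    centers = np.atleast_2d(np.asarray(centers, dtype=float))
    points = np.atleast_2d(np.asarray(domain_points, dtype=float))
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float(dist.min(axis=1).max())

def required_distance(eps, p, M):
    """Largest admissible distance for tolerance eps, order p and bound M."""
    return (eps * math.factorial(p + 1) / M) ** (1.0 / (p + 1))
```

A candidate set of local model locations is then sufficiently dense for a given tolerance when `set_distance` does not exceed `required_distance`.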

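Theorem 3. Suppose an integer $p \ge 0$ is given. If $\Psi$ is compact, and $f$ has bounded $(p+1)$-th derivative, i.e. $\| \nabla^{p+1} f(\psi) \| \le M$ for all $\psi \in \Psi$, then there exist polynomials $\hat f_1, \ldots, \hat f_N$ of order $p$ and smooth kernel functions $\tilde\rho_1, \ldots, \tilde\rho_N \in \tilde K$ such that the approximation error is bounded by

$$ \| f - \hat f \|_\infty \le \frac{M}{(p+1)!} \big( D(\{\psi_i\}, \Psi) \big)^{p+1} $$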

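Proof. The density condition on $D(\{\psi_i\}, \Psi)$ will hold for $\varepsilon$ equal to the right-hand side of the bound in the theorem. From the previous discussion, it is evident that $\sum_{i=1}^{N} g_i(\psi)\tilde w_i(\psi) \le 0$ holds for any $\psi \in \Psi$. Hence, $\| f - \hat f \|_\infty$ will be bounded by $\varepsilon$ and the result follows.

□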

The result above holds under the condition that the $\tilde\rho_i$-functions are chosen sufficiently narrow, and the local models are truncated Taylor-series expansions of order $p$. Hence, the bound in Theorem 3 may be conservative, since better local models with different parameters than truncated Taylor-series may exist. Moreover, for $\tilde\rho_i$-functions that are not too narrow and not too wide, one may expect better accuracy. From the derivations above it is evident that choosing $\tilde\rho_i$ from the set of Gaussians is only one possibility. The Gaussian goes to zero faster than $g_i$ goes to infinity, which is clearly a desirable property. In general, a smooth kernel function set $\tilde K$ can be generated by the application of wavelet functions (Benveniste, Juditsky, Delyon, Zhang and Glorennec �). Then $\tilde K$ consists of all possible translations and scalings of a given wavelet function that goes fast to zero. The Gaussian set $\tilde K$ defined earlier is such a set, where translation is parameterized by $\psi_i$, and scaling by $\Sigma_i$.

Obviously, if $f$ does not satisfy the smoothness conditions in Theorem 2, the proof does not hold. However, if $f$ is such that it can be approximated arbitrarily well by a smooth function, then it is trivial to show that $f$ can be approximated arbitrarily well by smoothly interpolating local models. In particular we have:

Corollary 1. The conclusion in Theorem 2 also holds if the smoothness assumption on $f$ is relaxed to assuming only continuity. In other words, the set $\tilde F_p(\tilde K)$ is dense in $C^m(\Psi)$.

Proof� By the Weierstrass approximation theorem� e�g� �Kreyszig � �� for any � � there exists a polynomial #f such that jjf� #f jj� � �� By Theorem �� there

exists an "f � #Fp� #K� such that jj #f � "f jj� � ��

�


Example �

Assume p � �� i�e� the local models are ARMAX models� Then �� can bewritten

jjf � "f jj� � M

��D�f�ig�!��

where M is a bound on the Hessian of f � If f is linear� then M � � and one locallinear model is su�cient to make an arbitrary good global model �of course�� Mis an indication of the nonlinearity or complexity of the function� and we expect to increase with increasing M � i�e� increasing nonlinearity� which is indeed thecase� cf� �� However� using the upper bound M gives a conservative boundsince f may be more complex �less linear� in some regions than others� Hence� wedo not need high density of local models in parts of ! where the system does nothave strongly nonlinear behavior�

�

Example �

With a simple example we illustrate an application of Theorem �� Consider thefunction f � $�� % � R given by f�� Assume that we have twolocal linear models �truncated Taylor series expansions� located at � � �� and�� Then D�f�� g� $��%� � �� p � � and M � �� Theorem � predictsthe bound � �� on the approximation accuracy� As shown by Figure ��this bound is not conservative when using in�nitely narrow functions #�i� i�e� a

piecewise linear approximation� The reason for this is that M � d�fd�� for

all � � $�� %� hence there are no parts of ! where f is �less nonlinear� or �lesscomplex�� As we shall see in Example �� better approximations can be achievedusing well�chosen #�i�functions� From Figure �� it is also evident that if the locallinear models are not chosen as a �rst order Taylor�expansion� but chosen on thebasis of e�g� a least squares regression� better accuracy can be achieved�

�
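The bound can be checked numerically on a toy case. The sketch below assumes a quadratic $f(\psi) = \psi^2$ on $[0, 2]$ with first-order Taylor models located at 0.5 and 1.5, and reads $D$ as the largest distance from any point of the domain to its nearest local model. These values and this reading of $D$ are illustrative assumptions, not quantities taken from the text. It compares the worst-case error of the piecewise linear approximation with $\frac{M}{(p+1)!} D^{p+1}$.

```python
import math

def f(x):
    return x * x          # assumed test function

def df(x):
    return 2.0 * x        # its first derivative

def piecewise_taylor(x, centers):
    # Use the first-order Taylor model of the nearest center
    # (infinitely narrow validity functions).
    c = min(centers, key=lambda ci: abs(x - ci))
    return f(c) + df(c) * (x - c)

def taylor_bound(M, p, D):
    # M / (p+1)! * D^(p+1)
    return M / math.factorial(p + 1) * D ** (p + 1)

centers = [0.5, 1.5]                              # assumed model locations
grid = [2.0 * k / 2000 for k in range(2001)]      # samples of [0, 2]

# Largest distance from a grid point to its nearest center.
D = max(min(abs(x - c) for c in centers) for x in grid)
max_err = max(abs(f(x) - piecewise_taylor(x, centers)) for x in grid)
bound = taylor_bound(M=2.0, p=1, D=D)             # M = |f''| = 2 for x^2

print(max_err, bound)
```

For this particular choice the error of each Taylor model is $(\psi - \psi_i)^2$, so the worst-case error meets the bound exactly, which is consistent with the remark that the bound is not conservative for a piecewise linear approximation.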

Since the system function f can be approximated arbitrarily well� we are able tomake arbitrarily good predictions on a �nite horizon provided there is no noise�the initial values are correct� and the model predicts outputs that give vectors� that remain in ! �Polycarpou� Ioannou and Ahmed�Zaid � �� However� itis well known that the solution to some di�erence equations are sensitive withrespect to incorrect initial values or noise� Examples of such systems are chaoticand unstable systems�

�� Operating Regimes

In the representation above, the interpolation functions $\tilde w_i$ are defined on the set $\Psi$, which is a subset of the information space. If the information space has high dimension, the curse of dimensionality (Bellman ��a) will restrict the applicability of all local modeling approaches. The core of this problem is that the


[Figure: Approximation of $f(\psi)$ using two local linear models. The plot shows $f(\psi)$ and $\hat f(\psi)$ against $\psi$.]

number of regions needed to uniformly partition a subset of a vector space increases exponentially with the dimension of the space. Uniform partitioning is often not necessary, but the problem is still significant. However, in some cases the interpolation functions may be defined on a space of lower dimension, which will reduce the curse of dimensionality considerably. This is our motivation for introducing the terms operating regime and operating point. First, we define $Z$ to be the set of operating points. An operating point $z \in Z$ is a vector of variables that characterizes the system's different modes of behavior. It should be mentioned that this definition of operating point differs from its definition in system theory, where it is an equilibrium point in the model's state space.

It is convenient to define an operating regime $Z_i$ as a subset of $Z$ where the system behavior can be adequately described with a given local model structure. Hence, in the case of a local linear model structure, within an operating regime the system behavior is "approximately linear". A local model validity function $\rho_i : Z \to [0, 1]$ belongs to a set of smooth kernel functions $K$, satisfies $\rho_i(z) \approx 1$ for $z \in Z_i$, and goes quickly to zero outside $Z_i$. The interpolation functions $w_i : Z \to [0, 1]$ are now defined as

$$w_i(z) = \frac{\rho_i(z)}{\sum_{j=1}^{N} \rho_j(z)}$$

assuming that at every operating point $z \in Z$, not all local model validity functions vanish.
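The normalization above can be sketched in a few lines. The example assumes a scalar operating point and Gaussian validity functions of the form $\exp(-\frac{1}{2}((z - z_i)/\sigma_i)^2)$. The function names, centers and widths are hypothetical choices made for illustration only.

```python
import math

def gaussian_validity(z, center, width):
    # Assumed Gaussian local model validity function rho_i(z).
    return math.exp(-0.5 * ((z - center) / width) ** 2)

def interpolation_weights(z, centers, widths):
    # w_i(z) = rho_i(z) / sum_j rho_j(z)
    rho = [gaussian_validity(z, c, s) for c, s in zip(centers, widths)]
    total = sum(rho)
    if total == 0.0:
        # The definition requires that not all validity functions vanish.
        raise ValueError("all local model validity functions vanish at z")
    return [r / total for r in rho]

# Two regimes with hypothetical centers and widths.
w = interpolation_weights(0.8, centers=[0.5, 1.5], widths=[0.5, 0.5])
w_mid = interpolation_weights(1.0, centers=[0.5, 1.5], widths=[0.5, 0.5])
print(w, w_mid)
```

By construction the weights are non-negative and sum to one at every operating point, so the global model is a convex combination of the local models.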

In many cases there will exist a function $H : \Psi \to Z$ such that $z(t) = H(\psi(t))$ for all $t$. $H$ will typically be a projection, i.e. $Z$ will be in a subspace or sub-manifold of the information space. When the operating point is calculated on the basis of filtered or estimated quantities, the relationship between $\psi(t)$ and $z(t)$ is more complex, and must be described by an operator $H$ mapping past data to the operating point. Although of some importance, this complicates the analysis considerably, and we will not consider this case in this section.

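To summarize, the model representation we investigate at this stage is

$$y(t) = \sum_{i=1}^{N} \hat f_i(\psi(t-1))\, w_i(z(t-1)) + e(t)$$

Next, we define the set of functions

$$F_p(K) = \left\{ \hat f : \Psi \to R^m \;\middle|\; \hat f(\psi) = \sum_{i=1}^{N} \hat f_i(\psi)\, w_i(H(\psi)) \right\}$$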

where p is the polynomial order of "fi� and �i � K� Now we want to state some gen�eral results regarding the function H from the information space to the operatingspace� In general� f can always be written as an a�ne function of the form

f�� f��L� �N � � f��N � f��N ��L ��

where �L � !L� �N � !N � and !L and !N are sub�sets of ! with the property!L �!N � !� Furthermore� f� � !N � Rm and f� � !N � Rm�dim��L� are non�linear vector� and matrix�valued functions� respectively� A trivial proof of ��is the not very useful choice !N � ! and !L � � Our principal result guidingthe choice of Z and H is the following� which indicates that with local linearmodels� the operating point must be chosen such that it captures the system�snon�linearities�

Theorem � Assume f given by �� is continuous� and ! is compact� Then for

any � � there exists an "f � F��K� with z � H�� N such that jjf� "f jj� � �

Proof� Fix an arbitrary � � ! and let �L and �N be the corresponding elementsin !L and !N � respectively� Then

f�� "f �� f��N � �L��NXi��

"fi��N � �L�wi��N �

�NXi��

�f��N � f��N ��L � "fi��N � �L��wi��N �

�NXi��

�f��N �� "fNi��N � f��N ��L � "fLi��L��wi��N �

In the last line we split the linear function "fi � !� Rm into two linear functions"fNi � !N � Rm and "fLi � !L � Rm� We represent "fLi as "fLi��L� � �i�L� where�i is a not yet speci�ed constant parameter matrix� Then we have

jjf�� "f ��jj� ��NXi��

�f��N � � "fNi��N � �f��N �� i��L�wi��N �

��


��NXi��

�f��N � � "fNi��N ��wi��N �

��

��NXi��

�f��N � � �i��Lwi��N �

��

��f��N ��

NXi��

"fNi��N �wi��N �

��

jj�Ljj� ��f��N ��

NXi��

�iwi��N �

��

��

The �rst term in this equation can be made arbitrarily small� cf� Corollary � withp � � since "fNi is linear and !N is compact� The second term can be madearbitrarily small by the same corollary with p � � through the choice of �i� since!L is compact� Hence� for any � � we can make jjf�� "f ��jj� � � Since �

is arbitrary� we get jjf � "f jj� � �

�

It is clear that the curse of dimensionality is not avoided, but the reduction in dimension reduces this problem considerably for a significant number of modeling problems, as we shall illustrate with some examples in later chapters.

Example �

If the system is of the a�ne form

y�t� � f�y�t � �� u�t� �� e�t�

� f��y�t � �� f��y�t � ��u�t� �� e�t�

the choice z�t� � y�t� does not restrict the attainable prediction accuracy usinglocal ARX models�

�

This result can be generalized to local expansions of �polynomial� order p� We canalways decompose the set ! into two subsets !L and !H such that !L�!H � !�and

f�� f��L� �H� � fH��H � fH��H �fL��L� ��

where fL � !L � Rn is of polynomial order no larger than p� and fH� � !H � Rm

and fH� � !H � Rm�n may be of higher order�

Theorem Suppose f given by �� is continuous� and ! is compact� Then for

any � � there exists an "f � Fp�K� with z � H�� H such that jjf� "f jj� � �

Proof� The proof follows the same idea as the proof of Theorem �� but requiressome tedious notation� and is therefore omitted�

�


�� Some Comparisons

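Using local linear model structures we can write the model representation above as

$$\hat f(\psi(t-1)) = \sum_{i=1}^{N} \left( \theta_i + \Theta_i \psi(t-1) \right) w_i(z(t-1))$$

$$= \sum_{i=1}^{N} \theta_i w_i(z(t-1)) + \sum_{i=1}^{N} \Theta_i w_i(z(t-1))\, \psi(t-1)$$

$$= \theta(z(t-1)) + \Theta(z(t-1))\, \psi(t-1)$$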

This means that the non�linear model structure can be written as a quasi�linearmodel structure� where the parameters are dependent on the operating point�Priestley �� introduced State�dependent models which are of the form

y�t� � ��x�t � �� )�x�t� ��t � �� e�t� ��

where x is the state�vector� � is a state�dependent vector� and ) is a state�dependent matrix� In general� the canonical state�space x � � was suggested�but it was also observed that this might be redundant� so a simpler vector maybe used to describe the parameter dependence� The present approach with x � zhas obvious similarities� and o�ers one particular representation for the state�dependent parameters� Billings and Voon �� discuss the use of models withsignal�dependent parameters� which are similar to �� this x � �� where ��t� isthe auxiliary signal� Polynomials are used to de�ne the dependence of the param�eters on the auxiliary signal� i�e� ��t�� and )��t�� are polynomials in ��t�� Ourapproach is also similar� but prior knowledge is applied to design the �i�functions�which again de�nes ��z�t�� and )�z�t�� The Threshold AR Model �TAR� of Tongand Lim �� can also be written in the form �� with x�t�� y�t�� and

$$\theta(y(t-1)) = \begin{cases} \theta_1 & \text{if } y(t-1) \in Y_1 \\ \theta_2 & \text{if } y(t-1) \in Y_2 \end{cases} \qquad \Theta(y(t-1)) = \begin{cases} \Theta_1 & \text{if } y(t-1) \in Y_1 \\ \Theta_2 & \text{if } y(t-1) \in Y_2 \end{cases}$$

where $Y = Y_1 \cup Y_2$ and $Y_1 \cap Y_2 = \emptyset$. The parameter values are chosen from a set of two possible parameter values, where the decision is based on the value of $y(t-1)$. The resulting model is a piecewise linear model and is related to our approach if $z(t-1) = y(t-1)$ and the interpolation functions are step functions. While the functional forms above may appear very similar, we would like to stress that the operating regime based design of the $\rho_i$-functions appears to be considerably more transparent and attractive from an engineering point of view than the other approaches.

Takagi and Sugeno �� suggested a fuzzy set �Zadeh � �� based technique forcombining in a smooth fashion a set of linear models into a non�linear model� It


turns out that if the operating regimes Zi are viewed as fuzzy sets with member�ship functions Zi�z� equal to the local model validity functions �i�z�� then fuzzyinference on a rule�base of the form

IF z�t� �� IS Zi THEN y�t� � "fi��t � �� e�t�

results in a model of the form �� This suggests the use of fuzzy sets andrules as a means of designing the operating regimes� This is appealing since itis natural to interpret operating regimes as sets with soft boundaries� The fuzzyset theory provides a powerful representation of operating regimes through thefollowing de�nitions of union� intersection� and complement

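$$\mu_{Z_a \cup Z_b}(z) = \max\left( \mu_{Z_a}(z), \mu_{Z_b}(z) \right)$$

$$\mu_{Z_a \cap Z_b}(z) = \min\left( \mu_{Z_a}(z), \mu_{Z_b}(z) \right)$$

$$\mu_{Z_a^C}(z) = 1 - \mu_{Z_a}(z)$$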

where Za and Zb are fuzzy sets� For example� a fuzzy set Zi de�ned by

z IS Zi � �z� IS LARGE�AND �z IS NOT INCREASING�

has the membership function

Zi�z� � min

� z

��large�z�� z

��incr� +z�

where z

��large�z�� is the mono�variable membership function for the fuzzy sub�

set where �z� is large� and z��incr

� +z� is the mono�variable membership func�

tion that represent the fuzzy subset where �z is increasing�� In particularly forcomplex high�dimensional modeling problems� this representation may be usefulfor structuring the operating regime knowledge since multi�variable membership�functions are constructed from mono�variable ones� However� in order to keep thepresentation as simple as possible� we have chosen not to focus on the fuzzy setinterpretation in this thesis�
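The way multi-variable membership functions are built from mono-variable ones can be sketched as follows. The ramp-shaped membership functions, their breakpoints, and the two-component operating point (a level and its rate of change) are hypothetical and serve only to illustrate the max, min and complement definitions.

```python
def ramp(x, lo, hi):
    # Hypothetical mono-variable membership shape: 0 below lo, 1 above hi.
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

def fuzzy_union(mu_a, mu_b):
    return lambda z: max(mu_a(z), mu_b(z))

def fuzzy_intersection(mu_a, mu_b):
    return lambda z: min(mu_a(z), mu_b(z))

def fuzzy_complement(mu_a):
    return lambda z: 1.0 - mu_a(z)

# z = (level, rate); both membership functions are assumptions.
mu_large = lambda z: ramp(z[0], 2.0, 4.0)        # "level IS LARGE"
mu_increasing = lambda z: ramp(z[1], -0.1, 0.1)  # "level IS INCREASING"

# Regime: (level IS LARGE) AND (level IS NOT INCREASING)
mu_regime = fuzzy_intersection(mu_large, fuzzy_complement(mu_increasing))

print(mu_regime((4.0, -0.1)), mu_regime((3.0, 0.0)), mu_regime((1.0, 0.0)))
```

The resulting `mu_regime` can be used directly as a local model validity function, since it maps every operating point to a value in $[0, 1]$.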

We have also chosen to suppress the neural network interpretation of the modelrepresentation �Stokbro et al� � �� Jones and co�workers � �� Jacobs et al� � ��The reason is simply that such an interpretation does not appear to contributewith any advantages that counterweight the possible confusion it may introduce�in our view�

A constant local model structure is closely related to an interpolating memory�Albus � �� Lane� Handelman and Gelfand � �� Cybenko� Saarinen� Gray� Wuand Khrabrov � �� Since just the constant parameter value �i is extrapolatedlocally� each local model will have a small range of validity� and a large number ofoperating regimes is needed� Moreover� the curse of dimensionality will be moreapparent� since the dimension of the operating point cannot be reduced� and mustequal the dimension of the information vector� The constant local model case isfunctionally identical to neural networks with localized and normalized receptive�elds� e�g� �Moody and Darken � �� and some fuzzy models� e�g� �Wang and


Mendel � �� Considering f #wigNi�� as a set of basis�functions� the method is alsosimilar to radial basis�function �RBF� expansions �Powell � � Broomhead andLowe � � Dyn � �� Using RBF�s� a non�linear function is modeled with theseries expansion

"f�� NXi��

�iri�jj�� ijj��

where ri � R� � R is typically chosen as a thin plate spline� multi�quadric�

or Gaussian function� If the functions #�i are chosen as local radial functions�the normalized function #wi de�ned by �� will not be radial in general� but itwill qualitatively have much the same shape and features as #�i� except near theboundary of !� The e�ect of normalization in discussed in �Shorten and Murray�Smith � �� Another class of related model representations can be found in thestatistical smoothing literature� e�g� �Hastie and Tibshirani � ��

Example �

We consider again the very simple function in Example �� and the following fourmodeling approaches�

�� Two �rst order truncated Taylor�expansions at �� and �� Theseare combined into a piecewise linear one� as in Example �� This may also beinterpreted as a Thresholded AR model�

�� We choose Gaussian model validity functions

�i�z� � exp

��

�z � zi�i

��

��

with z� � �� z� � �� i � �� and use two linear local modelstructures�

�� Five constant local model structures centered at �� and Gaussian model validity functions with �i � ��

�� A radial basis�function expansion with �ve Gaussian basis�functions centeredat �� and �i � ��

Linear regression is used to estimate the model parameters on the basis of a uni�formly distributed data set containing �� samples� The results are shown inFigure �� Comparing Figure ��a with ��b� it appears that interpolating locallinear models with estimated parameters and well chosen model validity functionscan improve the accuracy compared to the piecewise linear model with truncatedTaylor�expansions locally�

Notice that f in this example is de�ned on the interval $�� %� while data from theinterval $�� % is used for parameter estimation� The extrapolation capabilities canthus be evaluated� and we see that the model based on local linear models gives�rst order extrapolation� as would be expected� while the constant local models


give zeroth order extrapolation. The RBF approach does not give any sensible extrapolation at all, since all basis functions go to zero. It is interesting to observe that while all methods give roughly comparable results in the region where there are data available, there are fundamental differences concerning the extrapolation capabilities.

[Figure: a) Approximation of $f(\psi)$ using a piecewise linear model with two local linear models. b) Model based on two local linear models and Gaussian model validity functions. c) Model based on five constant local models. d) Model based on a radial basis-function expansion with five Gaussian basis-functions. Each panel shows $f(\psi)$ and $\hat f(\psi)$ against $\psi$.]

�
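The different extrapolation behavior of the approaches in this example can be reproduced with a small sketch. It assumes $f(\psi) = \psi^2$, Gaussian functions of width 0.5, and hand-set local parameters (Taylor coefficients and function values at the centers) instead of parameters estimated by linear regression. All of these are simplifying assumptions made to keep the example short.

```python
import math

def gauss(x, c, s):
    return math.exp(-0.5 * ((x - c) / s) ** 2)

def f(x):
    return x * x  # assumed test function

def local_linear(x, centers=(0.5, 1.5), s=0.5):
    # Normalized blend of first-order Taylor models.
    rho = [gauss(x, c, s) for c in centers]
    models = [f(c) + 2.0 * c * (x - c) for c in centers]
    return sum(r * m for r, m in zip(rho, models)) / sum(rho)

CENTERS5 = (0.0, 0.5, 1.0, 1.5, 2.0)

def local_constant(x, s=0.5):
    # Normalized blend of constant local models.
    rho = [gauss(x, c, s) for c in CENTERS5]
    return sum(r * f(c) for r, c in zip(rho, CENTERS5)) / sum(rho)

def rbf(x, s=0.5):
    # Unnormalized radial basis-function expansion.
    return sum(f(c) * gauss(x, c, s) for c in CENTERS5)

for x in (1.0, 3.0, 4.0):
    print(x, f(x), local_linear(x), local_constant(x), rbf(x))
```

Outside the interval covered by the centers, the local linear blend continues along the outermost linear model, the constant blend saturates near the outermost constant, and the unnormalized expansion decays towards zero.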

�� Modeling

Within the framework described here, the development of a model typically consists of the following steps:

1. Design experiments and collect data.

2. Choose an operating point set $Z$, including a choice of which variables to characterize the system's operating conditions with.

3. Decompose $Z$ into regimes and find an adequate local model structure and a local model validity function for each regime.

4. Identify the unknown local model parameters (and any unknown local model validity function parameters).

Prior knowledge can be applied in all these steps, and is particularly important in the second and third steps. Hence, this procedure should by no means be looked upon as a black-box approach, because the model representation is transparent in the sense that it allows easy interpretation and incorporation of prior knowledge. The transparency is related to the possibility of relating the operating regime decomposition to the system mechanisms or high-level behavior. Moreover, it is related to the transparency of each local model, which is typically good, because the local model structures are chosen to be simple.

Selection of the operating point set includes a specification of which variables the operating regimes are characterized with. As we have seen above, with local linear model structures it is important that the operating point contains variables that capture the non-linearities. To get an accurate and transparent model, it is important that prior knowledge is applied to define the operating point using as few and as interpretable variables as possible.

The decomposition into regimes is typically accomplished with the help of empirical or mechanistic prior knowledge about the behavior of the system under different operating conditions.

The choice of local model structures and local model validity functions is closely related to the decomposition into operating regimes, as the operating regimes must be designed with a particular local model structure in mind, or vice versa. It is evident that there is a trade-off between low dimension of the operating point vector and the simplicity of the local models. At one extreme, one may have only one complex local model, which will also be the global model. At the other extreme, one may need a large number of very simple local models, like constant ones, and a high-dimensional operating point vector. This gives a significant amount of flexibility with this approach. As with ARMAX models, one must specify the local model structure order parameters, and analyze the structure of the noise and disturbances in order to select the MA part of the local model structures.

It must be emphasized that there does not exist a unique best model structure with this approach, and that there is significant flexibility regarding the different structural choices, like decomposition into regimes, which variables to characterize the operating regimes with, and which local model structures to choose. Together with its simplicity and transparency, this flexibility is a major reason for the power of this procedure.

�� Identi�cation

First we consider identification of the local model parameters based on local criteria for each local model. Then we discuss identification on the basis of a global criterion, before we briefly discuss identification of local model validity function parameters.


�� Identifying Local Model Parameters using Local Criteria

The prediction error at time $t$ using the local model $\hat f_i$ is

$$\varepsilon_i(t) = y(t) - \hat f_i\big( y(t-1), \ldots, y(t-n_y),\; u(t-1), \ldots, u(t-n_u),\; \varepsilon_i(t-1), \ldots, \varepsilon_i(t-n_e) \big)$$

The rationale behind this predictor is that if the model matches the true system�then �i�t� � e�t� as t � � provided the system �� is asymptotically stable�and the initial values are within the region of attraction �Granger and Andersen� �� A local criterion Ji associated with each local model can be written as

$$J_i = \frac{1}{l} \sum_{t=1}^{l} \varepsilon_i^T(t)\, \Lambda_i(t)\, \varepsilon_i(t)$$

where l is the number of observations available� and ,i�t� is a weight�matrix thatin addition to relative scaling of the prediction errors has the purpose of weightingonly the observations that are relevant for local model i� A typical choice is

,i�t� � ,�i�z�t� ��where , is a constant scaling�matrix �Murray�Smith � �b�� Consider the localARMAX model structure �� The local model structure is parameterized bya vector �i and a matrix )i� If all local model structures are linear functions ofthe parameters� like �� the global model structure is linear in the parameters�and standard identi�cation methods can be applied� e�g� �S�oderstr�om and Stoica� ��

Assume first that the information vector $\psi(t-1)$ does not contain any noise terms $e(t-1), \ldots, e(t-n_e)$. The predictor can now be written on the linear regression form

$$\hat f_i(\psi(t-1)) = \varphi_i^T(t-1)\, \theta_i$$

where $\theta_i$ is a parameter vector and $\varphi_i(t-1)$ is a regression matrix. The parameters can be estimated using the least squares (LS) method:

$$\hat\theta_i = \left( \frac{1}{l} \sum_{t=1}^{l} \varphi_i(t-1)\, \Lambda_i(t)\, \varphi_i^T(t-1) \right)^{-1} \frac{1}{l} \sum_{t=1}^{l} \varphi_i(t-1)\, \Lambda_i(t)\, y(t)$$
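The weighted least squares estimate for one local model can be sketched as below, assuming a scalar output so that the weight $\Lambda_i(t)$ reduces to a scalar. The helper names, the small linear solver and the test data are illustrative assumptions. The factors $1/l$ cancel and are omitted.

```python
import math

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting (small systems only).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                fac = M[r][col] / M[col][col]
                M[r] = [a - fac * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def local_weighted_ls(phis, ys, weights):
    # Solves (sum w*phi*phi^T) theta = sum w*phi*y for theta.
    n = len(phis[0])
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for phi, y, w in zip(phis, ys, weights):
        for i in range(n):
            b[i] += w * phi[i] * y
            for j in range(n):
                A[i][j] += w * phi[i] * phi[j]
    return solve(A, b)

xs = [0.1 * k for k in range(21)]
ys = [1.0 + 2.0 * x for x in xs]                       # exactly linear data
# Weights from a hypothetical Gaussian validity function centered at 0.5.
weights = [math.exp(-0.5 * ((x - 0.5) / 0.5) ** 2) for x in xs]
theta = local_weighted_ls([[1.0, x] for x in xs], ys, weights)
print(theta)
```

With exactly linear data the estimate does not depend on the weights. With non-linear data the weights decide which observations dominate the local fit.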

In the general case when delayed noise, $e(t-1), \ldots, e(t-n_e)$, is included in $\psi(t-1)$, we use a prediction error (PE) method. Since these terms are replaced with the corresponding $\varepsilon_i$-terms, the prediction error is no longer a linear function of the parameters. Hence it is not possible to find an explicit solution like the least squares estimate above. The local criterion $J_i$ must be minimized numerically using e.g. the Newton-Raphson algorithm

$$\hat\theta_i^{(k+1)} = \hat\theta_i^{(k)} - \mu_k \left( \nabla^2 J_i(\hat\theta_i^{(k)}) \right)^{-1} \nabla J_i(\hat\theta_i^{(k)})$$


�� Identifying Local Model Parameters using a Global Criterion

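We define the global prediction error to be

$$\varepsilon(t) = y(t) - \hat f\big( y(t-1), \ldots, y(t-n_y),\; u(t-1), \ldots, u(t-n_u),\; \varepsilon(t-1), \ldots, \varepsilon(t-n_e) \big)$$

and a global identification criterion is

$$J = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^T(t)\, \Lambda\, \varepsilon(t)$$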

The LS� and PE�methods can be formulated in the same manner as above� Ourexperience is that in most cases the use of a global criterion is superior to theuse of local criteria� in terms of prediction performance� However� an exception iswhen the model structure is overparameterized� Then it can be shown that theuse of local criteria has a regularizing e�ect �Murray�Smith � �b� that reducesthe e�ective number of parameters somewhat� and therefore gives better predictionperformance� The amount of regularization can be tuned with the overlap betweenthe �i�functions�

�� Identifying Model Validity Function Parameters

In general� the local model validity function parameters will enter the equationsfor the prediction error non�linearly� In particular� if these parameters are to beidenti�ed simultaneously with the local model parameters� we get a complex non�linear programming problem� We will not discuss this problem here� but referto the vast literature on general�purpose non�linear programming algorithms� e�g��Gill� Murray and Wright � ��

We would like to point out that our experiments indicate that a rough choice of the location and scaling parameters in the model validity functions will in most cases be almost as accurate and at least as robust as the identification of local model validity function parameters when a global identification criterion is applied. We therefore recommend that the local model validity function parameters are not optimized using empirical data, but chosen with the aid of prior knowledge, whenever available. With local identification criteria, the exact choice of local model validity functions becomes more critical, and a rough empirical choice may not be sufficient.

�� Experimental Results

In this section we will present the results of some rather simple experiments thatillustrate the modeling procedure� �Johansen and Foss � ��


[Figure: A heat transfer process. Air is pulled into the �� cm tube through a valve by a fan. The air is heated by a resistor, and its temperature is measured at the outlet. Diagram labels: valve angle $v_t$, air inlet, resistor voltage $u_t$, fan, temperature sensor $y_t$.]

Consider the experimental setup illustrated in Fig� �� The output of this systemis the air temperature measurement y�t�� the input is the voltage over the resistoru�t� � $�� %� In addition� the valve opening angle v�t� � $�� % is an indepen�dent measured variable that can be manipulated by hand only� On this systemwe have performed some experiments� First� consider the responses to a �� voltstep input� for v� � ��

�� v� � �� and v � ��

� plotted in Fig� ��a� From thesecurves� we make two observations�

1. First, there seem to be two dominating time constants, at least for small valve opening angles $v$. The fast mode (time constant about � second) is related to the heat capacity of the air in the tube, while the slower one (time constant equal to a few minutes) is related to the heat capacity of the tube itself and the rest of the equipment. In the subsequent modeling and identification experiments, we will only attempt to find a model with good prediction performance on the horizon of the fastest of these modes, since this is the one that is interesting for control purposes.

2. Second, the steady-state gain seems to be a function of the valve opening angle $v$. On a shorter time-scale, cf. Fig. ��b, we see that the time constant and time-delay are also functions of $v$. Similar experiments with fixed $v$ but varying input step amplitude show that the steady-state gain depends on the input signal level, too, cf. the steady-state response in Fig. ��.

For the purpose of identi�cation� we use the data sequence in Fig �� The samplinginterval is &t � �� s� and the sequence contains about �� samples� Theinput u�t� is a random signal exciting the dynamics locally about a sequence ofrandom operating points that covers the full range of operation� The valve openingv�t� varies over the full range of operation in a pseudo�random manner� For thepurpose of model validation� we use another data sequence with somewhat di�erent

[Figure: a) Response to a �� volt step input, and b) the same response on a shorter time-scale. Both panels show $y(t)$ and $u(t)$ against time [s] for three valve opening angles $v$.]

excitation signals� see Fig� �� The input varies in a qualitatively similar manner�while the valve opening angle varies more or less systematically to cover a widerange of frequencies and amplitude levels�

�� Identification of ARX Model

Using the identi�cation data sequence and the least squares algorithm� we identifythe following �rst�order ARX model

y�t� � �� y�t� �� u�t� �� v�t � �� e�t� ��


[Figure: Steady-state response, showing $y$ as a function of $u$ and $v$.]

The order and time�delays are determined from the step�responses� cf� Fig� ��b�

�� Identification of Operating Regime based NARX Model

From the discussion above� it is clear that the coe�cients of the ARX model ��should be functions of at least u and v� On the basis of the step�responses and thesteady�state gain characteristic� cf� Figs� �� and �� we therefore try to combinefour local ARX model structures into an NARX model structure� as outlined insection �� The operating point is z�t� �� u�t� �� v�t� ��T � The four localARX models have local model validity functions

��u� v� � exp

��

��u� ��

�

�v � ��

��

��u� v� � exp

��

��u� �

�

�v � ��

��

��u� v� � exp

��

��u� ��

�

�v � ��

��

��u� v� � exp

��

��u� �

�

�v � ��

��

[Figure: Data sequence used for identification. The panels show $u$, $y$ and $v$ against time [s].]

The interpolation functions are illustrated in Fig. ��. The local model validity functions are chosen to give a reasonable amount of overlap. The motivation behind this particular decomposition is that we believe a linear ARX model structure gives an adequate description of the system within each regime. Using the least squares algorithm based on the criterion that penalizes mismatch between the global model and the data gives the model

y�t� � �� y�t� �� u�t� �� v�t � ��w��u�t� �� v�t � ��

�� y�t� �� u�t� �� v�t � ��w��u�t� �� v�t � ��

�� y�t� �� u�t� �� v�t � ��w�u�t� �� v�t � ��

�� y�t� �� u�t� �� v�t � ��w��u�t� �� v�t � �� e�t� ��

which we will call "the 4-local model". The time-delays are chosen to be smaller at large valve opening angle, cf. Fig. ��b. We see that the model has captured the facts that the dynamics are faster at large valve opening, and that the gain is larger at high input voltages.
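A one-step predictor of this kind, with four local ARX models blended by Gaussian validity functions in $(u, v)$, can be sketched as follows. The regime centers, widths and ARX coefficients are hypothetical placeholders rather than the identified values, and all time-delays are set to one sample for simplicity.

```python
import math

def validity(u, v, center, width):
    # Assumed Gaussian validity function over the operating point (u, v).
    du = (u - center[0]) / width[0]
    dv = (v - center[1]) / width[1]
    return math.exp(-0.5 * (du * du + dv * dv))

def predict(y_prev, u_prev, v_prev, regimes):
    # Weighted sum of local ARX predictions; the weights sum to one.
    rho = [validity(u_prev, v_prev, r["center"], r["width"]) for r in regimes]
    total = sum(rho)
    y_hat = 0.0
    for r, rho_i in zip(regimes, rho):
        a, b, c, d = r["theta"]  # local ARX: a*y + b*u + c*v + d
        y_hat += (rho_i / total) * (a * y_prev + b * u_prev + c * v_prev + d)
    return y_hat

# Four hypothetical regimes: low/high input voltage x small/large valve angle.
regimes = [
    {"center": (2.0, 30.0),  "width": (3.0, 50.0), "theta": (0.9, 0.05, -0.001, 0.1)},
    {"center": (8.0, 30.0),  "width": (3.0, 50.0), "theta": (0.9, 0.10, -0.002, 0.1)},
    {"center": (2.0, 120.0), "width": (3.0, 50.0), "theta": (0.7, 0.05, -0.001, 0.3)},
    {"center": (8.0, 120.0), "width": (3.0, 50.0), "theta": (0.7, 0.10, -0.002, 0.3)},
]

print(predict(y_prev=3.0, u_prev=5.0, v_prev=75.0, regimes=regimes))
```

If all four local models share the same coefficients, the blend collapses to a single global ARX model, which is a convenient sanity check.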


[Figure: Data sequence used for validation. The panels show $u$, $y$ and $v$ against time [s].]

[Figure: Interpolation functions $w_i(u, v)$ for the local models, plotted over $u$ and $v$.]


Symbol Variable Value Unitv Valve opening �

� Mass density for air �� g�lV Volume of tube �� lcp Speci�c heat capacity for air �� J�g Kq Volumetric air �ow rate l�sk �Fan coe�cient� l�sG Conductance in resistor -��

U E�ective heat transfer coe�cient J�K sQ Heat added through fan WT Air temperature in tube �CT Air temperature in environment �� CT� Temperature in equipment � �C�v Time�delay� valve opening to output �� s�u Time�delay� input to output �� s

Table �� Symbols� constants and variables used in the mechanistic model�

�� Identification of a Semi-mechanistic Model

A simple energy balance for this system is

�V cpd

dtT �t� � �cpq�v�t � �v��T � T �t�� U �T� � T �t��

Gu��t � �u� Q ��

where the symbols are de�ned in Table �� The �rst term in �� is the energylost because the outlet temperature is higher than the inlet temperature� Thesecond term is the energy lost through the tube walls� while the third and fourthterms are heat added by the power source and fan� respectively� The volumetricair �ow rate q is a function of the valve opening angle v and the �uid dynamicsin the fan housing� We choose the empirical correlation q � kv� where k is anunknown empirical constant� since a higher order correlation does not appear togive a model with signi�cantly better prediction performance� In addition� weneed the calibrated measurement equation

y�t� � ��T �t�� where y�t� is the output voltage� Again� we identify the unknown physical param�eters using the least squares algorithm� which gives

"G � �� -��

"U � �� JKs

"k � �� ls

"Q � �� W

Notice that this model is most correctly interpreted as a semi�mechanistic model�For example� an error in T or T� will strongly in�uence the estimate "Q� which


therefore has lost its physical interpretation as the heat added by the fan� Dis�cretizing the model �� we get

T �t� � T �t� �� &t�kv�t� ��

V�T � T �t� ��

U

�V cp�T� � T �t� �� G

�V cpu��t� �� Q

�V cp

We may re�parameterize this model to a semi�mechanistic NARX model on theform

y�t� � �� y�t� �� v�t� �� y�t � ��v�t � �� u��t� �� e�t�

Lindskog and Ljung �� have discussed the design of semi�mechanistic modelsof this form�
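The discretization step can be illustrated with an explicit Euler update of the energy balance. The sketch ignores the time-delays, and every parameter value and name below (including `T_in` for the inlet air temperature and `T_eq` for the equipment temperature) is a hypothetical placeholder rather than an identified or tabulated value.

```python
def euler_step(T, u, v, dt, p):
    """One explicit Euler step of the air temperature energy balance."""
    q = p["k"] * v                                   # empirical correlation q = k*v
    heat_capacity = p["rho"] * p["V"] * p["cp"]
    dT = (p["rho"] * p["cp"] * q * (p["T_in"] - T)   # heat carried by the air flow
          + p["U"] * (p["T_eq"] - T)                 # exchange with tube and equipment
          + p["G"] * u ** 2                          # power dissipated in the resistor
          + p["Q"]) / heat_capacity                  # heat added by the fan
    return T + dt * dT

# Hypothetical parameter values, chosen only so that the step is stable.
params = {"rho": 1.2, "V": 0.5, "cp": 1.0, "k": 0.05,
          "U": 0.1, "G": 0.02, "Q": 0.0, "T_in": 20.0, "T_eq": 20.0}

T = 20.0
for _ in range(200):
    T = euler_step(T, u=5.0, v=50.0, dt=0.05, p=params)
print(T)
```

Because the right-hand side contains the product of $v$ and $T$ as well as $u^2$, the discretized model is linear in its parameters but non-linear in the signals, which is what makes the re-parameterization to a NARX form possible.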

�� Discussion of Identification Results

The prediction performance of the three identified models is compared by simulating the models' responses to the validation data input sequence, cf. Fig. ��. It is evident that both the empirical 4-local and semi-mechanistic models perform better than the ARX model, as one should expect. The semi-mechanistic model predicts slightly better than the 4-local model, but the difference is not significant. Notice that because the validation data were logged on a day with higher environment temperature, there was a significant, but constant, offset between the output of the simulations and the measured output. As this is easily compensated for with integral action in a controller, this offset was removed in Fig. �� to simplify the comparison.

Model ��step�ahead ��steps�aheadprediction error prediction error

ARX �� local �� Semi�mechanistic ��

Table �� Root average squared prediction errors estimated with the validationdata sequence�
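A root average squared k-steps-ahead prediction error of the kind reported in the table can be computed as sketched below. The sketch assumes a first-order model that is restarted from the measured output at every time instant and then iterated k times on the recorded inputs. The function names and the toy system are hypothetical.

```python
import math

def k_step_rms(step, y, u, k):
    """Root average squared k-steps-ahead prediction error.

    step(y_prev, u_prev) returns the one-step prediction of a first-order model.
    """
    squared = []
    for t in range(len(y) - k):
        y_hat = y[t]                       # start from the measured output
        for j in range(k):
            y_hat = step(y_hat, u[t + j])  # iterate the model k times
        squared.append((y[t + k] - y_hat) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Toy data from a known first-order system (an assumption for illustration).
u = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(100)]
y = [0.0]
for t in range(99):
    y.append(0.5 * y[t] + u[t])

exact = lambda y_prev, u_prev: 0.5 * y_prev + u_prev
biased = lambda y_prev, u_prev: 0.6 * y_prev + u_prev

print(k_step_rms(exact, y, u, 1), k_step_rms(biased, y, u, 1), k_step_rms(biased, y, u, 5))
```

For a model with a parameter error, the error typically grows with the prediction horizon, which is why both short and long horizons are informative.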

The average ��step�ahead and ��steps�ahead prediction errors are estimated usingthe validation data sequence for the three models� The results are shown in Table�� Again� we see that the semi�mechanistic model performs somewhat betterthan the ��local model� The ��steps�ahead prediction errors are illustrated inFig� �� They reveal that there is a signi�cant amount of unmodeled dynamicsleft� This may be due to the slow time�constant we have neglected� and othere�ects that are not adequately modeled� In particular� there seem to be someslowly time�varying bias that might be adequately modeled as integrated white


Figure ��: Simulation of the identified models' responses to the validation data input sequence: a) ARX model, b) �-local model, and c) semi-mechanistic model. Each panel shows y(t) and ŷ(t) against time [s].

noise. It is interesting to observe that the prediction error with the �-local model consists of more bias but less variance than with both the ARX and the mechanistic models. Further experiments have shown that the introduction of a noise model


Figure ��: �-steps-ahead prediction errors for the � models' responses to the validation data input sequence, plotted against time [s]: a) with the ARX model, b) with the �-local model, and c) with the semi-mechanistic model.


that tries to incorporate the slowly varying bias in all three model alternatives gives the most significant improvements for the �-local model. However, the reason why the �-local model has less variance is unclear, but we find it unlikely that this is a feature of the operating regime based modeling framework.

The prior knowledge applied to define the structure of the �-local model is entirely empirical and gained by examining process data. Hence, the �-local model is entirely empirical, but certainly not a black box model. The regimes can be linked to qualitatively different behaviors of the system. For example, in Regime �, the gain is small and the dynamics are slow, while in Regime �, the gain is higher, and the dynamics are faster.

We have chosen local ARX models. Clearly, this is not the only possible choice. We might, for example, have expected a $u^2$-term, since it is obvious that it is the power and not the voltage that is important in the energy balance. This would give local NARX model structures, and would have allowed us to reduce the dimension of the operating point and the number of local models.

�� Discussion

To summarize, we have investigated how non-linear systems can be modeled using NARMAX models based on local ARMAX models. We have described a practical modeling procedure, which is supported by some theoretical results and experiments.

The main theoretical results are

1. The interpolation method used for combining a number of local models is optimal, in a certain sense. While this is a nice result, one should not put too much emphasis on it, since the optimality is obviously a result of the particular choice of objective criterion. On the other hand, the chosen objective criterion captures the desired properties of the interpolation method.

2. The system function can be uniformly approximated to an arbitrary accuracy on any compact operating range by making the decomposition into operating regimes sufficiently fine. The proof is constructive and gives detail on how the regimes, local models, and local model validity functions may be chosen. Moreover, we have given an upper bound on the approximation error when the model is constructed using this procedure. However, the constructive procedure is sub-optimal and is not recommended in practice. It is important to remember that these results are approximation results that assume the "true function" f is known. In practice, the model is estimated on the basis of empirical data, and noise as well as the quality and amount of the empirical data available will limit the attainable accuracy. Hence, the approximation results are not directly applicable, but nevertheless provide a useful basis for understanding the various design trade-offs that must be made in the practical modeling and identification procedure. In the literature, there exists a wide range of results similar to Corollary � for other model representations, e.g. (Cybenko ��, Barron ��, Parzen ��, Park and Sandberg ��). Related results on the relations between the complexity of the system, the approximation bound, and the complexity of a piecewise linear model are given in (Omohundro ��).

3. With local linear model structures, the operating regimes should be characterized using variables that capture the system's non-linearities. This is a somewhat trivial result, but is still perhaps our best guide for the choice of operating point vector.

The experimental results have the purpose of illustrating the operating regime based modeling procedure. We have compared the results of this procedure with standard approaches, namely an ARX model and a simple mechanistic model. The operating regime based modeling approach gave a transparent model that was clearly superior to the ARX model and comparable to the semi-mechanistic model.

The problem of choosing the local model order parameters $n_y$, $n_u$, and $n_e$ has not been discussed. The reason is certainly not that this is a trivial problem. The general problem of order selection is widely discussed in the system identification literature, and there exist several statistical criteria and general guidelines, e.g. (Box and Jenkins ��, Ljung ��), see also Chapter �. In the operating regime based modeling framework, there may be a need for different orders of the different local models. Sbarbaro (��) has suggested representing continuous-time local linear models with Laguerre networks in order to reduce the sensitivity with respect to the choice of order.

The ARMAX model representation is perhaps the most widely used linear empirical dynamic model representation. For the NARMAX model representation, there exists a wide range of generic representations for the function f. Neural networks have over the past few years gained considerable popularity, at least in academia. The most popular feed-forward type networks can be used to build black box NARMAX models; typical examples can be found in (Chen, Billings and Grant ��a, Nguyen and Widrow ��, Sjöberg, Hjalmarsson and Ljung ��). Common to most types of neural networks is that the internal representation is very complex and far from transparent, which may explain why so few industrial applications are reported. Moreover, from an identification point of view, the non-linear parameterization of many neural network based model representations is a serious drawback, in particular as long as there exist powerful linearly parameterized alternative model structures. Recently, there has been some interest in the application of prior knowledge for structuring neural nets (Mavrovouniotis and Chang ��), and for initialization of neural network parameters and interpretation of the resulting model through linearizations (Scott and Ray ��). A further step was taken in (Kramer et al. ��, Thompson and Kramer ��, Su et al. ��, Psichogios and Ungar ��, Aoyama and Venkatasubramanian ��, Brown, Ruchti and Feng ��), where certain combinations of neural network structures and mechanistic model structures were suggested. However, we believe these frameworks still do not provide the flexibility and elegance of the operating regime based modeling approach.

Other related empirical model structures are splines (Wahba ��, Friedman ��, Kavli ��, Lane et al. ��), radial basis function series expansions (Moody and Darken ��, Chen, Billings, Cowan and Grant ��b), some fuzzy models (Wang and Mendel ��), and wavelets (Benveniste et al. ��). All these approaches lead to local basis-functions, which for low-dimensional problems automatically give a transparent representation, since each basis-function and its coefficient can be uniquely related to a particular part of the information space. A fundamental problem with all local modeling methods is the curse of dimensionality, which may lead to lack of transparency of the resulting model, and exponentially increasing memory demand and computational complexity as a function of the key dimension. With the present approach, we have shown that this problem may sometimes be reduced considerably, since the operating point may be of lower dimension than the information vector and each local model contains more information than a basis-function. Moreover, the problem can be further reduced through the use of prior knowledge for regime decomposition. One reason is that for most practical problems, one does not need a model with uniform accuracy over the operating range, because there will be operating conditions that are physically infeasible for the dynamic system to operate under. However, for very high dimensional and complex empirical modeling problems, one may prefer projection based methods (De Veaux, Psichogios and Ungar ��), such as sigmoidal neural nets, or projection pursuit regression (Friedman and Stuetzle ��).

The title of this chapter is "Operating Regime based Input-Output Modeling". To keep the presentation simple, we have focused entirely on the NARMAX model representation. We do not think that the extension to other input-output model representations, like FIR model representations or differential equation based model representations, will reveal any surprises.

In summary, decomposing the system's operating range into operating regimes and using local ARMAX models to describe the system behavior in each operating regime is appealing for the following reasons:

• ARMAX models are well understood and widely used in industry, and are therefore a convenient basis for building NARMAX models. If a given process is currently modeled with an ARMAX model, a non-linear model may quite naturally be introduced by extending this model to a NARMAX model based on two or three local ARMAX models.

• The class of (mathematical) systems that can be accurately represented is shown to be large, and the model structure is linearly parameterized.

• The concept and modeling procedure are straightforward, and the model structure is transparent. This is important, since the model structure can be easily interpreted, analyzed, and validated, and prior knowledge can be incorporated.

• Describing a system by means of operating regimes is common practice in engineering.

Chapter �

Operating Regime based State-space Modeling

In this chapter we will discuss the design of non-linear state-space models in the operating regime based modeling framework. Local state-space models introduce some additional features and difficulties compared to the local input-output models that were discussed in Chapter �.

This chapter is organized as follows. First, the model representation is introduced in section ��. In section ��, we discuss how process knowledge can be applied to develop a model structure in terms of operating regimes and local model structures. In section ��, we present some examples to illustrate the procedures, see also (Johansen and Foss ��b), and in section �� there is a discussion.

�� State-Space Model Representation

Here we address the problem of representing a model of the form

\[ \dot{x} = f(x, u, v) \]  (��)

\[ y = g(x) + w \]  (��)

where

$x \in X \subset \mathbb{R}^n$: State vector
$u \in U \subset \mathbb{R}^r$: Control vector
$v \in V \subset \mathbb{R}^n$: Disturbance vector
$y \in Y \subset \mathbb{R}^m$: Measurement vector
$w \in W \subset \mathbb{R}^m$: Measurement noise vector

A state-space model of the system consists of knowledge of $f : X \times U \times V \to \mathbb{R}^n$ and $g : X \to \mathbb{R}^m$, and some structural or statistical knowledge of the signals v and w. In this chapter we will be concerned with the representation and design of the functions f and g.


Within a sufficiently small operating regime, a simple (for example linear) local model

\[ \dot{x} = f_i(x, u, v) \]  (��)

\[ y = g_i(x) + w \]  (��)

will give an adequate description of the system, provided the system satisfies some smoothness condition. This model will be valid within this particular operating regime, and more or less invalid outside this regime. We introduce the operating point z, which is a vector of variables that characterizes the system's different modes of operation, and the set of operating points Z. An operating regime is defined as a subset $Z_i \subset Z$. The operating point is typically a function of the state, inputs, and disturbances

\[ z = H(x, u, v) \]

Next, we assume that for each local model there exists a smooth local model validity function $\rho_i : Z \to [0, 1]$ that is constructed such that its value is close to one for operating points where the local model is an accurate description of the system, and close to zero elsewhere. If there are N operating regimes with local models and validity functions for each regime, one may apply the following interpolation to get a global model

\[ \dot{x} = \sum_{i=1}^{N} f_i(x, u, v)\, w_i(z) \]  (��)

\[ y = \sum_{i=1}^{N} g_i(x)\, w_i(z) + w \]  (��)

\[ w_i(z) = \frac{\rho_i(z)}{\sum_{j=1}^{N} \rho_j(z)} \]  (��)

The interpolation function $w_i : Z \to [0, 1]$ is a normalization of the model validity function $\rho_i$, which has the property that $\sum_{i=1}^{N} w_i(z) = 1$ for all $z \in Z$. To get a complete global model, it must be assumed that at any operating point $z \in Z$, not all local model validity functions vanish.
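The interpolation can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the Gaussian shape of the validity functions, the scalar state, the choice z = x, and all names and numbers are hypothetical and are not prescribed by the framework.

```python
import math


def gaussian_validity(center, width):
    """Return a validity function rho(z) with values in (0, 1], close to 1 near `center`."""
    def rho(z):
        return math.exp(-0.5 * ((z - center) / width) ** 2)
    return rho


def interpolation_weights(validities, z):
    """Normalize the validity values so that the weights w_i(z) sum to one."""
    values = [rho(z) for rho in validities]
    total = sum(values)
    if total == 0.0:
        # The framework assumes that not all validity functions vanish at z.
        raise ValueError("all local model validity functions vanish at z")
    return [value / total for value in values]


def global_model(local_models, validities, H):
    """Blend the local right-hand sides f_i(x, u, v) with the weights w_i(z), z = H(x, u, v)."""
    def f(x, u, v):
        weights = interpolation_weights(validities, H(x, u, v))
        return sum(w * f_i(x, u, v) for w, f_i in zip(weights, local_models))
    return f


# Two hypothetical scalar local models, valid around z = 0 and z = 2.
local_models = [
    lambda x, u, v: -1.0 * x + u,
    lambda x, u, v: -3.0 * x + 2.0 * u,
]
validities = [gaussian_validity(0.0, 0.5), gaussian_validity(2.0, 0.5)]
f = global_model(local_models, validities, H=lambda x, u, v: x)
```

Midway between the two regime centers both local models contribute equally, while close to one center the corresponding local model dominates the global right-hand side.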

�� Local Models of Different Structure

In general, only the input and output vectors need to be common for two different local models of the same system, as these are the only variables that are directly related to the system. States, disturbances, noise, parameters, and internal variables are directly related to the model. We have chosen to use the same superset of variables in all local models. Of course, each local model will only use the relevant subset of the variables. The motivation is the simplified notation, and the fact that dealing with all the variables simultaneously is required in the interpolation (��) anyway.


Consider the following two local models

+x� � f��x�� u� v�

+x� � f��x�� u� v�

where x� � �� T and x� � ��

T � Typical reasons for di�erent state�spacesmay be that � is not a state in the �rst local model either because it is constant� orirrelevant in that operating regime� Again� the reason for this may be of physicalnature� like

• The temperature of boiling water is constant, while it may vary when it is not boiling.

• In a certain range of flow-rates there may be gas bubbles in e.g. a pipeline or reactor. The phase defined by the gas bubbles may be described by states that are not relevant at other flow-rates, simply because the gas bubbles will not exist under these flow conditions.

Alternatively, the reason may be linked to the model representation, for example

• The local model structure in one regime is a black-box model structure, while it is a mechanistic model structure in another regime. Clearly, the state-spaces may be completely different.

• A simple model of low order is sufficient in one regime, but a high order model is required in another regime.

In any case, when the model is applied for prediction during a transition between two regimes with different state-spaces, one needs to answer the question: "What is the (initial) state of the local model associated with the regime just being entered?" A simple answer exists in cases when there exists a one-to-one mapping between the state-spaces, or from the input-output space to the state-space, which will typically be the case when the model is in some canonical form. In the following, we will describe a more general solution that mainly assumes observability of the states associated with each local model in a sufficiently large region containing the operating regime.

If � is constant in the �rst operating regime� we have +� � � in that regime� and� enters as a constant parameter in the function f�� Let us assume in additionthat �� is constant in the second operating regime� De�ning an augmented statevector x � ��

T � we get the local models

+x �

�� +��+��+�

�A �

�� f�� u� v�f�� u� v�

�

�A+x �

�� +��+��+�

�A �

�� f�� u� v�f�� u� v�

�AA global model is now�� +��

+��+�

�A �� f�� u� v� �w��z� � �w��z�f�� u� v� �w��z� f�� u� v� �w��z�� w��z� f�� u� v� �w��z�

�A ��


The other possibility is that � is irrelevant in the �rst operating regime� Thismeans that f� is not dependent on �� but it does not mean that � is constant�Instead of augmenting with an equation +� � �� we model � as a random walkprocess +� � � in this operating regime� Similarly� we model �� as a random walkprocess in the second operating regime� Now we get a global model�� +��

+��+�

�A �

�� f�� u� v� �w��z� �� w��z�f�� u� v� �w��z� f�� u� v� � w��z�� w��z� f�� u� v� � w��z�

The variables �� and � are unknown, but when using an extended Kalman filter to estimate the irrelevant process states, it is sufficient to assume these variables to be random white noise processes with known variance. If the model is applied on-line, the measured output can be used in the Kalman filter. Otherwise, the global model output is used.

If the irrelevant variable � is not observable in the first regime, it is impossible to update an estimate of it using an extended Kalman filter, since there is no information available about � in that regime. The best thing to do is to turn off (parts of) the estimator until the process enters a regime where � is observable.

Another potential problem, which arises in particular when one tries to transfer state or output information from an empirical local model to a mechanistic local model, is that the empirical model may suggest state values in the mechanistic model that are non-physical. If this is a potential problem, it must be handled in an ad-hoc manner by e.g. projecting the variables to a physically feasible region, since such a mismatch between physical reality and model prediction is a fundamental problem with empirical models.

�� Local Linear Models

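An interesting special case is when the local models are all linear with equal structure

\[ \dot{x} = a_i + A_i \Delta x_i + B_i \Delta u_i + C_i \Delta v_i \]

\[ y = d_i + D_i \Delta x_i + w \]

Here we linearize around points $x_i$, $u_i$, $v_i$, and define $\Delta x_i = x - x_i$, $\Delta u_i = u - u_i$, $\Delta v_i = v - v_i$, $a_i = f(x_i, u_i, v_i)$, $d_i = g(x_i)$, $A_i = \frac{\partial f}{\partial x}(x_i, u_i, v_i)$, $B_i = \frac{\partial f}{\partial u}(x_i, u_i, v_i)$, $C_i = \frac{\partial f}{\partial v}(x_i, u_i, v_i)$, and $D_i = \frac{\partial g}{\partial x}(x_i)$. Then (��) becomes

\[ \dot{x} = \sum_{i=1}^{N} \left( a_i + A_i \Delta x_i + B_i \Delta u_i + C_i \Delta v_i \right) w_i(z) \]  (��)

\[ y = \sum_{i=1}^{N} \left( d_i + D_i \Delta x_i \right) w_i(z) + w \]  (��)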


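The model representation (��) with local linear models can be viewed as a quasi-linear model representation where the matrices are functions of the operating point z

\[
\begin{aligned}
\dot{x} &= \sum_{i=1}^{N} \left( a_i + A_i (x - x_i) + B_i (u - u_i) + C_i (v - v_i) \right) w_i(z) \\
&= \sum_{i=1}^{N} \left( a_i - A_i x_i - B_i u_i - C_i v_i \right) w_i(z)
 + \left( \sum_{i=1}^{N} A_i w_i(z) \right) x
 + \left( \sum_{i=1}^{N} B_i w_i(z) \right) u
 + \left( \sum_{i=1}^{N} C_i w_i(z) \right) v \\
&= a(z) + A(z) x + B(z) u + C(z) v
\end{aligned}
\]  (��)

Of course, the measurement equation can be rewritten in the same manner

\[ y = d(z) + D(z) x + w \]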

It is easy to see that any f and g can be written in the form

\[ \dot{x} = f(x, u, v) = \tilde{a}(x, u, v) + \tilde{A}(x, u, v) x + \tilde{B}(x, u, v) u + \tilde{C}(x, u, v) v \]  (��)

\[ y = g(x) + w = \tilde{d}(x) + \tilde{D}(x) x + w \]  (��)

This offers a guide for the choice of Z and the mapping H, which is essentially the same as the theoretical result in Chapter �, but perhaps more intuitive. Comparing (��) with (��), it is clear that $z = H(x, u, v)$ can be chosen such that

\[ \tilde{A}(x, u, v) = A(z), \quad \tilde{B}(x, u, v) = B(z), \quad \text{etc.} \]

In other words, such that z captures the non-linearities. Notice that the representation (��) is non-unique. Hence, there are several possible choices for H.
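The rewriting of the interpolated local linear models into operating-point-dependent coefficients can be checked numerically. The sketch below does this for a scalar system without disturbances; the system f(x, u) = -x^3 + u, the linearization points, the fixed weights, and the names `LocalLinearModel`, `linearize` and `quasi_linear_coefficients` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocalLinearModel:
    x_i: float  # linearization point, state
    u_i: float  # linearization point, input
    a_i: float  # f(x_i, u_i)
    A_i: float  # df/dx at the linearization point
    B_i: float  # df/du at the linearization point

    def rhs(self, x, u):
        """Local linear right-hand side a_i + A_i (x - x_i) + B_i (u - u_i)."""
        return self.a_i + self.A_i * (x - self.x_i) + self.B_i * (u - self.u_i)


def linearize(f, x_i, u_i, h=1e-6):
    """Build a local linear model of f around (x_i, u_i) by central differences."""
    A_i = (f(x_i + h, u_i) - f(x_i - h, u_i)) / (2.0 * h)
    B_i = (f(x_i, u_i + h) - f(x_i, u_i - h)) / (2.0 * h)
    return LocalLinearModel(x_i, u_i, f(x_i, u_i), A_i, B_i)


def quasi_linear_coefficients(models, weights):
    """Return a(z), A(z), B(z) for given interpolation weights w_i(z)."""
    a = sum(w * (m.a_i - m.A_i * m.x_i - m.B_i * m.u_i) for m, w in zip(models, weights))
    A = sum(w * m.A_i for m, w in zip(models, weights))
    B = sum(w * m.B_i for m, w in zip(models, weights))
    return a, A, B


def f(x, u):
    """Hypothetical non-linear system used only for this illustration."""
    return -x ** 3 + u


models = [linearize(f, 0.5, 0.0), linearize(f, 1.5, 0.0)]
weights = [0.3, 0.7]  # stands in for w_1(z), w_2(z) at some operating point z
x, u = 1.2, 0.4

blended = sum(w * m.rhs(x, u) for m, w in zip(models, weights))
a, A, B = quasi_linear_coefficients(models, weights)
quasi_linear = a + A * x + B * u
```

The two expressions agree up to rounding, because the quasi-linear form is only a regrouping of the interpolated local models; the non-linearity enters solely through the dependence of the weights on z.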

�� Incorporating Process Knowledge

In section ��, we stated that a major motivation for the operating regime based modeling approach is that local modeling is simpler than global modeling, because locally there are fewer relevant phenomena, and their interactions are simpler. Let us illustrate this fact with an almost trivial example. Consider a chemical reactor where two globally relevant phenomena are reaction and flow pattern. Clearly, these are interacting, since the flow pattern may influence the reaction kinetics, and the composition may influence the flow pattern. Developing a model that is valid for a wide range of flow patterns and compositions will be a laborious task because of the large number of mass- and heat-transfer phenomena that must be taken into account simultaneously. A local model, on the other hand, need only be valid for one particular flow pattern and a small range of compositions, and is typically significantly simpler to develop because the interactions between reaction and flow pattern are simpler. Of course, one should compare the cost of developing the required number of local models to the cost of developing one global model directly. When the number of operating regimes is large, this may appear to be a major drawback of the operating regime based modeling approach. However, it is not necessarily so. The reason is that, having developed one local model structure, the remaining ones often follow more easily. For example, it will often make sense to let a number of the local model structures be equal. A very simple, but generally applicable, approach is to use the same local linear model structure in all regimes.

The basic modeling procedure outlined in section �� can also be applied to state-space models. A major task is to identify and characterize the system's operating regimes, and to select local model structures. Typically, this is based on one of the following two approaches:

1. Different mechanisms or physical phenomena are dominant in different operating regimes, giving rise to different local model structures.

2. Different local model parameters in different operating regimes, but the same local model structures.

Choosing a priori a linear model structure for each operating regime leads to the second approach. With this approach, an operating regime must be defined to be a region in which the system behaves approximately linearly. Hence, the decomposition into operating regimes is motivated and given by the system's non-linearities, not by the different physical phenomena. Compared to the first approach, this is a more empirical and less knowledge-demanding approach. There may be some cases that fall more or less into both these categories, for example when the parametric variations in the local model structure are explicitly related to physical phenomena or mechanisms.

Often, there will be a number of model variables that are candidates for being used to characterize operating regimes (i.e. included in the vector z). For example, consider a set of linear differential equations describing the mass and energy balances of a chemical reactor. Such a model is clearly a local model, since chemical reactors usually exhibit strongly non-linear behavior. Different regimes may be characterized using variables like temperature and flow-rate, since the parameters of the linear local model structure will be different at high and low temperature, because the reaction rate is usually highly temperature dependent. Likewise, the flow-rate will influence the holdup-time, mixing, and flow patterns, and thereby the linear model parameters. In addition, there may be different operating regimes because a different set of chemical reactions may dominate under different conditions, often characterized by temperature, composition, or catalyst properties. Although this example is simple, it illustrates two important points. First, the choices of operating regimes and local model structures are closely interconnected. For example, if the local linear models in the example are replaced by non-linear local models where the reaction rate constants are replaced by Arrhenius-type terms, then each local model would be valid over a much wider, possibly global, temperature range. Hence, the dimension of the operating point space can usually be reduced at the cost of increasing the complexity of the local model structures. Second, the example illustrates that the process knowledge used for decomposition into operating regimes may be quite elementary, like knowledge of the conditions under which different chemical reactions dominate, and the knowledge that the reaction rates depend strongly on temperature. Supplementary knowledge, like "The reaction rate will be approximately doubled if the temperature increases by �� K", will be sufficient to decide roughly on how to design regimes along the temperature axis. Knowledge about how accurate the model needs to be under different conditions can be used for the same purpose. For example, for on-line dynamic optimization purposes, a detailed model may be needed to describe the reactor under normal operating conditions in order to get close to optimal operation. Under other operating conditions, like startup, shutdown, product shifts, and abnormal operation, a rough model may be sufficient for the control system to bring the system safely and reliably into the desired operating mode.
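As a rough illustration of how a rate-doubling rule of thumb can translate into a regime spacing along the temperature axis, the sketch below places regime centers so that an Arrhenius-type rate doubles from one center to the next. The Arrhenius form exp(-E/(R T)) is an assumption, and the activation energy, the temperature range and the function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant in J/(mol K)


def arrhenius_factor(T, activation_energy):
    """Temperature-dependent factor exp(-E / (R T)) of an Arrhenius-type rate."""
    return math.exp(-activation_energy / (R * T))


def doubling_temperature(T, activation_energy):
    """Temperature at which the Arrhenius factor is twice its value at T."""
    inverse = 1.0 / T - R * math.log(2.0) / activation_energy
    if inverse <= 0.0:
        raise ValueError("the rate cannot double from this temperature")
    return 1.0 / inverse


def regime_centers(T_min, T_max, activation_energy):
    """Place regime centers so that the rate doubles between neighbours."""
    centers = [T_min]
    while True:
        candidate = doubling_temperature(centers[-1], activation_energy)
        if candidate > T_max:
            break
        centers.append(candidate)
    return centers


# Hypothetical reactor: activation energy 50 kJ/mol, operating range 300 K to 350 K.
centers = regime_centers(300.0, 350.0, 50.0e3)
```

The spacing between centers grows slowly with temperature, so a uniform grid with a step close to the first spacing would give a comparable rough decomposition over a moderate temperature range.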

It is not difficult to imagine that the problem may not always be as transparent as in the example above. If the number of model variables is much larger, there are significant dependences among these variables, and the system behavior is not well understood, then one may need to take a different approach to the choice of operating space and regimes. One will typically need to analyse the correlation structure and collinearities in the empirical data as proposed by Sugeno and Kang (��), see also (Nakamori et al. ��). Projection pursuit tools (Huber ��) like principal component analysis will guide the search for a small number of variables, or combinations of variables, that contain sufficient information to characterize the system behavior in terms of operating regimes in relation to the local models. Such a projection pursuit algorithm may search for a projection within a class of projection operators, e.g. linear ones. A suitable optimization criterion will penalize large prediction errors and other undesirable properties of the model. For example, an operating space of low dimension is always desirable, since this will make the model structure more transparent, and the curse of dimensionality will be avoided. However, transparency also requires interpretability of the variables characterizing the operating regimes. Unfortunately, this is sometimes in conflict with the requirement of low dimension, so a tradeoff must be made.

When a rough decomposition into regimes is designed, one must choose how much the local model validity functions should overlap. The main motivation for using a smooth interpolation of the local models is that usually the system has some smoothness properties, i.e. the behavior changes smoothly and the phenomena appear or disappear gradually as the operating point changes. However, one may occasionally come across systems that are non-smooth, in the sense that they exhibit abrupt changes in behavior or mechanisms (jump phenomena, catastrophes, bifurcations). Reasons for this may be diverse, like in a combustion system that certainly behaves differently depending on the presence of an ignition source, or in a flow system where a unit may be suddenly by-passed. There also exist phenomena that appear gradually, but rapidly compared to the interesting dynamics, as the operating point changes. Such systems may also be looked upon as non-smooth in some cases. This may include phenomena related to phase transitions and some changes in fluid flow patterns. In such cases it may be of interest to use discrete variables to describe the operating point, or to design non-smooth local model validity functions to capture these phenomena in the best possible way, see also (Hilhorst ��, Söderman et al. ��). However, if it is believed that the system has sufficient smoothness properties, then smoothly overlapping local model validity functions seem reasonable. If we use the model's ability to approximate the system behavior as a criterion for the choice of overlap, our experience indicates that there may exist a well defined optimum, see also Chapter �. Unfortunately, the optimal overlap will depend on the modeling problem. On the other hand, we have also experienced that a rough choice will often give a close-to-optimal result. An important factor that may influence the choice of overlap is that a large overlap will lead to decreased interpretability of the model if the parameters are identified using a global criterion (Murray-Smith ��b). The reason for this is essentially that an increased overlap will increase the dependences and interactions between the local models. If the local model parameters are fitted using empirical data, a large overlap may imply that a given data point will be relevant for a large number of regimes. This may lead to local models that cannot be interpreted as local approximations to the system, since the parameters of some local models may be highly influenced by remote data points. Often, one local model will have a behavior that partially compensates for the behavior of another local model in the overlapping region. Hence, the global model may have a behavior that closely approximates the system behavior, but none of the local models approximate the behavior of the system locally. Clearly, this makes the interpretation somewhat more difficult. If the local model parameters are identified using local criteria, then the choice of overlap becomes more critical, since a large overlap will have a regularizing effect (Murray-Smith ��a).

�� Examples

�� Simulation Example: Population Dynamics

A classical non-linear modeling problem is a predator-prey system. Here we will apply the local modeling framework to such a system. Let $x_1$ be the concentration of prey, and $x_2$ the concentration of predators. The basic mechanistic knowledge and assumptions one has about this system can be qualitatively expressed as the set of rules

IF $x_1$ is large THEN $x_2$ increases
IF $x_1$ is small THEN $x_2$ decreases
IF $x_2$ is large THEN $x_1$ decreases
IF $x_2$ is small THEN $x_1$ increases

As illustrated in Fig. ��, this gives four regions of the state space where the system behaves qualitatively differently. Within each region, the system behavior is qualitatively simple. Adding some more detail, we can write the model as

IF $x_1$ is larger than $\bar{x}_1$ THEN $\dot{x}_2 = a_1 x_2$
IF $x_1$ is smaller than $\bar{x}_1$ THEN $\dot{x}_2 = -a_2 x_2$
IF $x_2$ is larger than $\bar{x}_2$ THEN $\dot{x}_1 = -a_3 x_1$
IF $x_2$ is smaller than $\bar{x}_2$ THEN $\dot{x}_1 = a_4 x_1$


Figure ��: A high-level description of the predator-prey model. The state space (axes: prey and predators) is divided into four regions, each labeled by whether the two concentrations increase or decrease.

where $\bar{x}_1$ and $\bar{x}_2$ are the average concentrations of prey and predators over a long period of time, respectively. Since there is no reason to believe that the behavior changes abruptly, the "smaller than" and "larger than" statements should be interpreted in a "soft" way. This gives four local operating regimes as illustrated in Fig. ��a, with local model validity functions as in Fig. ��b, and interpolation functions as in Fig. ��c. With $\bar{x}_1 = \bar{x}_2 =$ �, $a_1 =$ �, $a_2 =$ �, $a_3 =$ �, $a_4 =$ �, this model has trajectories as illustrated in Fig. ��a, where each closed cycle corresponds to different initial conditions. We see that the behavior of this model is qualitatively similar to the classical Lotka-Volterra model (Volterra ��)

\[ \dot{x}_1 = b_1 x_1 - b_2 x_1 x_2 \]

\[ \dot{x}_2 = -b_3 x_2 + b_4 x_1 x_2 \]

which has trajectories as illustrated in Fig. ��b, with $b_1 =$ �, $b_2 =$ �, $b_3 =$ �, $b_4 =$ �.

Now, these two models have similar behavior, but the underlying model representations are quite different, even though the same basic knowledge about the system mechanisms is applied during the modeling processes. Clearly, the Lotka-Volterra model is the more compact. On the other hand, the operating regime based model is definitely a mechanistic model, and the representation is perhaps closer to the way most people (including scientists and engineers) understand the behavior of this system. The modeling assumptions are more explicit and the model parameters are perhaps more transparent than in the Lotka-Volterra model. Hence, the operating regime based modeling framework has elevated the modeling process to a more transparent level than the equation-based modeling approach.
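The comparison between the two model representations can be reproduced qualitatively with a short simulation. The sketch below is an illustration under stated assumptions: the sigmoid form of the soft "larger than" statement, its width, the unit parameter values, the initial condition and the Runge-Kutta integrator are all hypothetical choices, not the settings used for the figures.

```python
import math

X1_BAR, X2_BAR = 1.0, 1.0   # hypothetical average prey / predator concentrations
A1 = A2 = A3 = A4 = 1.0     # hypothetical local growth and decay rates
SOFTNESS = 0.25             # hypothetical width of the soft regime boundaries


def larger_than(x, x_bar):
    """Soft truth value in [0, 1] of the statement 'x is larger than x_bar'."""
    return 1.0 / (1.0 + math.exp(-(x - x_bar) / SOFTNESS))


def regime_model(x1, x2):
    """Interpolate four local linear models; x1 is prey, x2 is predators."""
    p1 = larger_than(x1, X1_BAR)  # prey is large
    p2 = larger_than(x2, X2_BAR)  # predators are large
    # (validity, local dx1/dt, local dx2/dt) for each of the four regimes
    regimes = [
        (p1 * p2, -A3 * x1, A1 * x2),
        (p1 * (1.0 - p2), A4 * x1, A1 * x2),
        ((1.0 - p1) * p2, -A3 * x1, -A2 * x2),
        ((1.0 - p1) * (1.0 - p2), A4 * x1, -A2 * x2),
    ]
    total = sum(rho for rho, _, _ in regimes)
    dx1 = sum(rho * d1 for rho, d1, _ in regimes) / total
    dx2 = sum(rho * d2 for rho, _, d2 in regimes) / total
    return dx1, dx2


def lotka_volterra(x1, x2, b=(1.0, 1.0, 1.0, 1.0)):
    """Classical Lotka-Volterra right-hand side with hypothetical parameters b."""
    b1, b2, b3, b4 = b
    return b1 * x1 - b2 * x1 * x2, -b3 * x2 + b4 * x1 * x2


def simulate(model, x1, x2, dt=0.01, steps=2000):
    """Integrate with the classical fourth-order Runge-Kutta method."""
    trajectory = [(x1, x2)]
    for _ in range(steps):
        k1 = model(x1, x2)
        k2 = model(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1])
        k3 = model(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1])
        k4 = model(x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        x2 += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
        trajectory.append((x1, x2))
    return trajectory


regime_trajectory = simulate(regime_model, 1.5, 1.0)
lotka_volterra_trajectory = simulate(lotka_volterra, 1.5, 1.0)
```

Plotting the predator concentration against the prey concentration for both trajectories gives cycles around the point of average concentrations, which is the qualitative similarity discussed above.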

It is interesting to observe that a very similar operating regime based alternative to the Lotka-Volterra model was proposed by Kolmogoroff (��). This much less known work is also based on four regimes (called zones) with qualitatively different


Figure ��: a) The operating regimes and local models, b) local model validity functions, and c) interpolation functions.

Figure ��: a) Trajectories with the operating regime based model, and b) trajectories with the Lotka-Volterra model, both in the $(x_1, x_2)$ plane.


behavior of the system, similar to Fig. ��. Rescigno and Richardson (��) conclude that this is "a general qualitative theory for the interactions between two species", and they show how the modeling principle can be applied to somewhat more complex systems. From (Rescigno and Richardson ��), it appears that such model representations have had some impact on the understanding and analysis of such systems in the area of mathematical biology.

�� Simulation Example: A Batch Fermentation Reactor

Batch fermentation processes are examples of processes where an operating regime based model appears to be a quite natural approach. A typical batch consists of the following sequence of phases: an initial lag phase where the cells adapt to their new environment, a growth phase where the cell culture grows, a production phase where the enzymes in the cells catalyse various chemical reactions, and termination when the reactions have stopped (Bailey and Ollis ��). The production phase can in many cases be further decomposed into several more specific regimes. This will be the case if there is a sequence of chemical reaction steps, or if the cells form different enzymes under different environmental conditions and therefore catalyse different chemical reactions. Moreover, there may also be different regimes associated with various faulty modes of operation. It is natural to let the regimes overlap. For example, the chemical reactions in the production regimes may start immediately, but may only be significant after the cell culture has grown for some period. The cell growth may continue through the production regime. Often, there will be complex relations between the different regimes, since some chemical or biological components may inhibit some chemical reactions, cell growth or enzyme formation, while stimulating others. A more detailed discussion of bioreactors and local models can be found in (Zhang et al. ��).

Building global mechanistic models of such processes may be a resource demanding task, e.g. (Bailey and Ollis ��). Here we present a semi-realistic simulation example that illustrates how operating regimes and local models can be used to simplify the modeling problem. We will study the fermentation of glucose to gluconic acid by the micro-organism Pseudomonas ovalis in a well stirred batch reactor. The main overall reaction mechanism can be described by

$$\text{Cells} + \text{Glucose} + \mathrm{O_2} \rightarrow \text{More Cells}$$

$$\text{Glucose} + \mathrm{O_2} \xrightarrow{\text{Cells}} \text{Gluconolactone}$$

$$\text{Gluconolactone} + \mathrm{H_2O} \rightarrow \text{Gluconic Acid}$$

The first reaction is the reproduction of cells, using glucose and oxygen. The second reaction is the production of the intermediate product gluconolactone, again using glucose and oxygen. This reaction is enzyme-catalysed by the cells, while the last reaction forms the final product, gluconic acid. In the simulation study, we use the following state-space model to simulate the "true system" (Ghose and Ghosh ��):

$$\dot{\chi} = \mu_m \frac{sc}{k_s c + k_o s + sc}\,\chi$$

$$\dot{p} = k_p l$$

$$\dot{l} = v_l \frac{s}{k_l + s}\,\chi - \text{��}\,k_p l$$

$$\dot{s} = -\frac{1}{Y_s}\,\mu_m \frac{sc}{k_s c + k_o s + sc}\,\chi - \text{��}\,v_l \frac{s}{k_l + s}\,\chi$$

$$\dot{c} = k_l a\,(c^* - c) - \frac{1}{Y_o}\,\mu_m \frac{sc}{k_s c + k_o s + sc}\,\chi - \text{��}\,v_l \frac{s}{k_l + s}\,\chi$$

Symbol    Description                          Unit
χ         Cell concentration                   UOD/ml
p         Gluconic acid concentration          g/l
l         Gluconolactone concentration         g/l
s         Glucose concentration                g/l
c         Dissolved oxygen concentration       g/l
OUR       Oxygen uptake rate                   g/(l h)

Constant  Value
μ_m       �� h^-1
k_s       �� g/l
k_o       �� g/l
k_p       �� h^-1
v_l       �� mg UOD^-1 h^-1
k_l       �� g/l
k_l a     �� h^-1
Y_s       �� UOD/mg
Y_o       �� UOD/mg
c*        �� g/l

Table ��: Symbols and constants in state-space model of the fermenter.

with initial conditions $\chi(0)$, $p(0)$, $l(0)$, $s(0)$ and $c(0)$. The symbols are defined in Table ��. It is assumed that the volume is constant, and that glucose and oxygen are rate-limiting. Simulating �� h batches using this model, "measuring" all states at �� h intervals, and adding random "measurement noise" with a signal-to-noise ratio of �� dB to the "measured" states, two sets of data are collected by varying $\chi(0)$, $s(0)$ and the agitation speed (affecting $k_l a$). The first set contains data from �� batches and is used for estimation, while the second, independent set, containing �� batches and shown in Fig. ��, is used for model validation only. We define the following normalized variables:

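$$\chi_n = \frac{\chi}{\text{� UOD/ml}}, \quad p_n = \frac{p}{\text{�� g/l}}, \quad l_n = \frac{l}{\text{�� g/l}}, \quad s_n = \frac{s}{\text{�� g/l}}, \quad c_n = \frac{c}{\text{�� g/l}}, \quad \mathrm{OUR}_n = \frac{\mathrm{OUR}}{\text{g/(l h)}}$$

where OUR is the oxygen uptake rate, and the normalized state-vector is defined by

$$x = (\chi_n,\ p_n,\ l_n,\ s_n,\ c_n)^T$$

Using knowledge of the overall reaction mechanism, we define three operating regimes for this process, which are entered in the following sequence: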

1. Cell growth regime. Qualitatively, this regime is characterized by cell reproduction and some limited chemical production.

2. Cell growth and production. All three reactions in the reaction scheme are significant.

3. Final production. This regime is characterized by shortage of glucose. Hence, only the last chemical reaction takes place.

In the following, we will describe two possible decompositions of this process' operating range into regimes, based on two different operating point vectors.

Characterizing Operating Regimes using $\chi$ and OUR

The operating point is defined by

$$z(t) = \begin{pmatrix} \mathrm{OUR}_n(t) \\ \chi_n(t) \end{pmatrix}$$

The three operating regimes can be characterized in the following way:

1. Small cell population $\chi$ and a small OUR. Since the two first reactions both consume oxygen, OUR is a measure of the total rate of those reactions.

2. Since all reactions progress at a high rate, the OUR is high.

3. Since only the last reaction takes place, and no oxygen is consumed in this reaction, the regime is characterized by a small OUR. In addition, $\chi$ tends to be large.

We use the following local linear model structure

$$\dot{x} = a_i + A_i x$$


Figure ��: Data collected from simulated batch runs of fermenter. The panels show $p_n$, $\chi_n$, $l_n$, $s_n$, $c_n$ and $\mathrm{OUR}_n$ against time [h].


where $a_i$ is a $5 \times 1$ vector, and $A_i$ a $5 \times 5$ matrix. Identification of the unknown local model parameters, using a standard least squares algorithm based on a global criterion, gives the following local model parameters

$$a_i = (\text{��}), \qquad A_i = (\text{��}), \qquad i = 1, 2, 3$$

where the zeros are structural and follow immediately from inspecting the overall reaction mechanism. These local models may be viewed as approximate linearizations of a mass balance for this process. Notice that the local models may be viewed as semi-empirical, since their structure is based on some limited mechanistic knowledge.

Using Gaussian local model validity functions, with an ad hoc choice of scaling matrices and location vectors to represent the regimes, we get interpolation functions as shown in figure ��. The process' transition between the operating regimes is illustrated in figure ��. The application of fuzzy sets to characterize the three regimes gives similar results (Johansen and Foss ��b).
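The interpolation of local models and their identification by least squares on a global criterion can be sketched for a scalar state. This is a minimal illustration under stated assumptions, not the fermenter identification itself: it uses two regimes instead of three, the operating point is taken to be the state itself, and the regime centres, the Gaussian width and the "true" parameters are hypothetical. The point it shows is that $\dot{x} = \sum_i w_i(z)(a_i + A_i x)$ is linear in $(a_i, A_i)$, so all local parameters can be estimated in one linear least-squares problem.

```python
import math

def weights(z, centres=(0.5, 1.5), width=0.5):
    """Normalized Gaussian validity functions, i.e. interpolation weights."""
    rho = [math.exp(-0.5 * ((z - c) / width) ** 2) for c in centres]
    total = sum(rho)
    return [r / total for r in rho]

def regressor(x):
    """Regressor of dx/dt = sum_i w_i(x) * (a_i + A_i * x); order a_1, A_1, a_2, A_2."""
    phi = []
    for w in weights(x):
        phi.extend([w, w * x])
    return phi

def solve(A, b):
    """Solve A p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (M[r][n] - sum(M[r][c] * p[c] for c in range(r + 1, n))) / M[r][r]
    return p

def identify(xs, dxs):
    """Least squares over all data at once (one global criterion for all regimes)."""
    Phi = [regressor(x) for x in xs]
    n = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * d for row, d in zip(Phi, dxs)) for i in range(n)]
    return solve(A, b)

true_params = [1.0, -0.5, 0.2, -1.0]   # hypothetical a_1, A_1, a_2, A_2
xs = [0.05 * k for k in range(41)]     # operating points covering [0, 2]
dxs = [sum(p * f for p, f in zip(true_params, regressor(x))) for x in xs]
estimate = identify(xs, dxs)           # noise-free data, so this recovers true_params
```

With noisy data the same normal equations apply. Structural zeros, as in the fermenter models, would correspond to dropping the matching columns of the regressor.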

Characterizing Operating Regimes using c and s

As discussed in section ��, a general method for finding a suitable operating point vector is to use those state-variables that capture the non-linearities. Of course, some prior knowledge is required here. In the fermenter example, we suggest

$$z(t) = \begin{pmatrix} c_n(t) \\ s_n(t) \end{pmatrix}$$

as an operating point satisfying this condition. Since c and s are the concentrations of the rate-limiting substrates, this can be expected without knowing the equations


Figure ��: Interpolation functions $w_1$, $w_2$ and $w_3$ for the three local models of the fermenter, using OUR and $\chi$ to characterize the operating regimes.

Figure ��: The trajectories show the evolution of the operating point (OUR, $\chi$) for some batches. The trajectories start at a low $\chi$ and OUR and end up at high $\chi$. The level curves for the interpolation functions indicate the region of transition between the operating regimes.


for the "true system". It follows directly from a simple mass balance based on the reaction mechanisms and the assumption that the reaction rates depend only on c and s. We use four local models with a rough choice of Gaussian local model validity functions. These model validity functions give interpolation functions as shown in figure ��, and transitions between the regimes as illustrated in figure ��. The local model parameters are identified using the same procedure as before:

$$a_i = (\text{��}), \qquad A_i = (\text{��}), \qquad i = 1, \ldots, 4$$

Notice that there is a close relationship between the two candidate operating point vectors (OUR, $\chi$) and (c, s). In fact, there is a one-to-one relationship between OUR and c, and $\chi$ will be increasing at about the same rate as s is decreasing, because the process is operating in batch mode. The use of four operating regimes is motivated by the observation that this improves the prediction capabilities of the global model somewhat, compared to three operating regimes. It is in particular when the glucose concentration is very low that the model benefits from the additional regime. The reason for this is that the production regime can naturally be decomposed into two regimes. Early in the production regime, it is the oxygen uptake rate that limits the rate of the reactions, until eventually glucose becomes rate limiting.

Figure ��: Interpolation functions $w_1$, $w_2$, $w_3$ and $w_4$ for the four local models of the fermenter, using c and s to characterize the operating regimes.

Figure ��: Evolution of the operating point (c, s) for some batches. The trajectories start at high s and c and move down to a low s and high c through the different regimes. The level curves for the interpolation functions indicate the regions of transition between the operating regimes.


Results

The models are validated by simulation on the validation data. The results of simulations of a typical batch using the identified models are shown in figures ��, together with the evolution of the "true system" and a simulation using an identified global linear model $\dot{x} = a + Ax$,

$$a = (\text{��}), \qquad A = (\text{��})$$

In addition, the root average squared prediction errors estimated on the validation data are given in Tab. ��. We see that the two non-linear models always outperform the linear model, and the non-linear predictions are close to the evolution of the true system.

Model                                                         Average Error
Three local linear models, (OUR, χ) as operating point        ��
Four local linear models, (c, s) as operating point           ��
Global linear model                                           ��

Table ��: Summary of simulation results: root average squared prediction error with the different models.

The process knowledge applied is primarily the knowledge of the overall reaction mechanism. Using this knowledge, operating regimes and local model structures are designed. The parameters of the local models are identified using a large data set that contains measurements of all states. Notice that we have used very limited knowledge about mass-transport, kinetics, and stoichiometry. It is therefore fair to say that the model structure is found on the basis of considerably less, and more qualitative, process knowledge compared to what would have been required to develop a mechanistic model.

�� Discussion

In this chapter we have focused on practical procedures for development of state-space models based on local models. The main results are

1. Combining local models with different state-spaces is less straightforward than when the state-spaces are the same. One must necessarily transfer state information between the different state spaces. For the more complicated cases, we suggest the application of a state-estimator as a general tool.

2. The operating regime based modeling framework typically gives transparent model structures that allow interpretation, analysis, and the incorporation of prior knowledge. The regimes can often be explicitly related to a particular set of phenomena, or different system behavior.


Figure ��: Top: Prediction using OUR and $\chi$ as operating point. Trajectories with circles are generated by the "true system", while the other trajectories are simulations using the model. Bottom: The relative weight of the three local models in the interpolation.

3. We have illustrated by means of simulation examples and philosophical arguments that the process knowledge required to design operating regimes and select local model structures may often be rather elementary and of a quite qualitative nature.

It is interesting to compare this framework with the mechanistic modeling approach. The demand for process knowledge may be significantly less, and more qualitative and empirical knowledge can be incorporated directly.


Figure ��: Top: Prediction using c and s as operating point. Trajectories with circles are generated by the "true system", while the other trajectories are simulations using the model. Bottom: The relative weight of the four local models in the interpolation.

Figure ��: Prediction using global linear model. Trajectories with circles are generated by the "true system", while the other trajectories are simulations using the model.

Chapter ��

Model Structure Identification using Empirical Data from a Limited Operating Range

A fundamental assumption underlying most empirical modeling and identification methods is that the identification data sequence is complete, i.e. contains a sufficient amount of data from all relevant operating conditions (Ljung ��). However, in many industries it is impossible to satisfy this condition, due to economic, engineering, or safety constraints on the operating conditions one is allowed to perform experiments under. Using standard system identification methods for the identification of model structure, without paying attention to the fact that a fundamental assumption is violated, may in such cases lead to an unreliable model, in the sense that its extrapolation capability can be poor. This problem is related to the robustness of the model structure parameterization, and identifiability. It is typical that mechanistic model structures are more robustly parameterized than empirical model structures.

In this chapter, we describe some criteria for model structure identification that, with the help of some prior knowledge about the system's future operating range, are able to partially take this aspect into account. The application of such a criterion will improve the robustness of the structure identification algorithm and result in a more reliable model with better extrapolation capabilities. This procedure is also outlined in (Johansen and Foss ��a).

�� Estimating Expected Prediction Performance with Bootstrapping

Let a candidate model structure S be given. This model structure is essentially a set of equations (a combination of differential, difference and algebraic). The equations are parameterized by a parameter vector $\theta$ that is estimated using a standard prediction error estimator $\hat{\theta} = \hat{\theta}(D_l)$, e.g. (Söderström and Stoica ��), on the basis of a data sequence

$$D_l = ((u(1), y(1)), (u(2), y(2)), \ldots, (u(l), y(l)))$$

The problem we address is to estimate, on the basis of the data sequence $D_l$, the expected (future) prediction performance of the model defined by $(S, \hat{\theta}(D_l))$.

Let an unknown future data sequence be denoted $D^*_t$. The one-step-ahead prediction $\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l))$ is computed by solving the model equations with the parameter estimate $\hat{\theta}(D_l)$, either from a known or estimated state at time $t - 1$. The state estimate may come from an extended Kalman-filter (Gelb ��). Let us assume that the future output $y^*(t)$ is made up of a deterministic component $\bar{y}^*(D^*_{t-1})$ and an unpredictable white noise component $e(t)$:

$$y^*(t) = \bar{y}^*(D^*_{t-1}) + e(t)$$

Clearly, the best we can hope for is that the model can predict the deterministic component well. We define the expected squared prediction error using the model structure S as

$$\Sigma(S) = E\left[\left(y^*(t) - \hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l))\right)\left(y^*(t) - \hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l))\right)^T\right]$$

$$= E\left[\left(\bar{y}^*(D^*_{t-1}) - \hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) + e(t)\right)\left(\bar{y}^*(D^*_{t-1}) - \hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) + e(t)\right)^T\right]$$

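where E denotes expectation with respect to the joint probability distribution of the identification data sequence $D_l$ and the future data sequence $D^*_t$ the model is to be applied to. Both $D_l$ and $D^*_t$ are viewed as stochastic variables, so $\Sigma(S)$ is the ensemble average over all possible identification data sequences of length l and future data sequences. Provided the future data $D^*_t$ is uncorrelated with the identification data $D_l$, we get

$$\Sigma(S) = E\left[\left(\bar{y}^*(D^*_{t-1}) - E\left[\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) \mid D^*_{t-1}\right]\right)\left(\bar{y}^*(D^*_{t-1}) - E\left[\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) \mid D^*_{t-1}\right]\right)^T\right]$$

$$+\ E\left[\left(\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) - E\left[\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) \mid D^*_{t-1}\right]\right)\left(\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) - E\left[\hat{y}(D^*_{t-1}; S, \hat{\theta}(D_l)) \mid D^*_{t-1}\right]\right)^T\right]$$

$$+\ E\left[e(t)e^T(t)\right] \qquad (\text{��})$$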

Eq. (��) is a bias-variance decomposition of the expected squared prediction error. The first term is the systematic error (squared bias), which is non-zero if the model structure S is not sufficiently rich to exactly describe the deterministic component of the system output. The second term is the random error (variance), which arises because the optimal parameters cannot be exactly determined on the basis of the finite data sequence $D_l$. The third term is due to the noise, which is independent of the model and impossible to predict. Typically, if the length of the data sequence $D_l$ is kept fixed and the complexity of the model structure S increases, the bias will decrease while the variance will increase, see Fig. ��. The decrease in bias is a direct consequence of the increased degrees of freedom in the model structure, while the increase in variance is caused by the growing number of parameters in the model structure that must be estimated. If the model structure is over-parameterized, the parameter estimate will be highly sensitive to the particular realization of the identification data sequence, i.e. another data sequence of the same length logged under the same experimental conditions may lead to a very different parameter estimate. Hence, there is a tradeoff between bias and variance, which suggests that there exists a model with an optimal complexity within the set of model structures, cf. Fig. ��.

Figure ��: Typical relationship between bias, variance and model structure complexity, when it is assumed that model structure complexity can be measured by a real number. The plot shows expected error, bias and variance against model complexity.

Let us first consider the case when $D^*_t$ and $D_l$ have the same underlying probability distribution. Then $\Sigma(S)$ will reflect the expected prediction performance of the model and would be suited as the basis for an identification criterion. Unfortunately, $\Sigma(S)$ cannot be computed, since we assume the underlying probability distribution is unknown. However, $\Sigma(S)$ can be estimated on the basis of the data sequence $D_l$ using bootstrapping, e.g. (Efron and Tibshirani ��, Carlstein ��).

The bootstrap strategy is generally applicable, and the idea is simply to approximate the unknown probability distribution with a probability distribution that is estimated on the basis of the available empirical data. Suppose we want to estimate a statistic $T = T(p)$ that depends on an unknown probability distribution $p$. The bootstrap estimate is $\widehat{T(p)} = T(\hat{p})$, where $\hat{p}$ is a parametric or non-parametric estimate of $p$. In our case, a bootstrap estimate of the expected squared prediction error is

$$\hat{\Sigma}_{BOOT}(S) = \frac{1}{l}\sum_{t=1}^{l}\frac{1}{B}\sum_{b=1}^{B}\left(y(t) - \hat{y}(D_{t-1}; S, \hat{\theta}^b)\right)\left(y(t) - \hat{y}(D_{t-1}; S, \hat{\theta}^b)\right)^T$$

In this estimate, the underlying probability distribution is approximated by the empirical distribution of the residuals (Carlstein ��). The expectation in the expression for $\Sigma(S)$ is approximated using Monte Carlo integration and the estimated probability distribution; B is the number of Monte Carlo runs. In more detail, suppose $\hat{\theta}(D_l)$ is defined by

$$\hat{\theta}(D_l) = \arg\min_{\theta}\frac{1}{l}\sum_{t=1}^{l}\left(y(t) - \hat{y}(D_{t-1}; S, \theta)\right)^T\left(y(t) - \hat{y}(D_{t-1}; S, \theta)\right)$$

In order to approximately mimic the random nature of the identification data, $\hat{\theta}^b$ is defined by

$$\hat{\theta}^b = \arg\min_{\theta}\frac{1}{l}\sum_{k=1}^{l}\left(y(t^b_k) - \hat{y}(D_{t^b_k - 1}; S, \theta)\right)^T\left(y(t^b_k) - \hat{y}(D_{t^b_k - 1}; S, \theta)\right)$$

where $t^b_1, t^b_2, \ldots, t^b_l$ are independent random samples with probability

$$P(t^b_k = i) = \begin{cases} 1/l, & i \in \{1, \ldots, l\} \\ 0, & \text{otherwise} \end{cases}$$

In other words, $\hat{\theta}^b$ minimizes a criterion that is based on random resamples of the original residuals. The criterion

$$J_{BOOT}(S) = \mathrm{trace}\left(\hat{\Sigma}_{BOOT}(S)\right)$$

is well suited for model structure identification when the future data sequence $D^*_t$ has the same underlying probability distribution as the identification data sequence $D_l$. An alternative bootstrap estimate can be based on the bias-variance decomposition (��):

$$\hat{\Sigma}_{BOOTBV}(S) = \frac{1}{l}\sum_{t=1}^{l}\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)^T$$

$$+\ \frac{1}{l}\sum_{t=1}^{l}\frac{1}{B_2}\sum_{b_2=1}^{B_2}\left(\hat{y}(D_{t-1}; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)\left(\hat{y}(D_{t-1}; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)^T$$

where $B_1$ and $B_2$ are the numbers of Monte Carlo runs, and $\hat{\theta}^{b_1}$ and $\hat{\theta}^{b_2}$ are defined in the same way as $\hat{\theta}^b$. This estimate is not recommended in practice, but it will be a useful starting point in the following, where we develop two modified criteria. Notice that the variance (second term) can be computed exactly in the special case when the predictor is linearly parameterized.
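The resampling scheme and the two bootstrap estimates can be sketched for a scalar example. This is a minimal illustration under assumptions that are not from the thesis: a static predictor $\hat{y} = \theta u$ stands in for the dynamic one-step-ahead predictor, the data-generating system and noise level are invented, and the same bootstrap replicates are used for both sums ($B_1 = B_2$), in which case the bias-variance form adds up exactly to the direct estimate.

```python
import random

def fit(us, ys):
    """Least-squares estimate of theta in the predictor yhat = theta * u."""
    return sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)

def bootstrap_estimates(us, ys, B, rng):
    """B parameter estimates, each from l indices drawn uniformly with replacement."""
    l = len(us)
    thetas = []
    for _ in range(B):
        idx = [rng.randrange(l) for _ in range(l)]
        thetas.append(fit([us[i] for i in idx], [ys[i] for i in idx]))
    return thetas

def boot_error(us, ys, thetas):
    """Squared prediction error averaged over data points and bootstrap models."""
    l, B = len(us), len(thetas)
    return sum((y - th * u) ** 2 for u, y in zip(us, ys) for th in thetas) / (l * B)

def boot_bias_variance(us, ys, thetas):
    """First term (error of the mean predictor) and second term (spread around it)."""
    l, B = len(us), len(thetas)
    mean_theta = sum(thetas) / B  # predictor is linear in theta
    bias_term = sum((y - mean_theta * u) ** 2 for u, y in zip(us, ys)) / l
    var_term = sum(((th - mean_theta) * u) ** 2 for u in us for th in thetas) / (l * B)
    return bias_term, var_term

rng = random.Random(0)
us = [0.1 * k for k in range(1, 31)]
ys = [2.0 * u + rng.gauss(0.0, 0.1) for u in us]  # assumed "true" system plus noise
thetas = bootstrap_estimates(us, ys, B=200, rng=rng)
total = boot_error(us, ys, thetas)
bias_term, var_term = boot_bias_variance(us, ys, thetas)
```

Note that the first term is computed against the measured $y(t)$, as in the estimate above, so it also contains the noise contribution.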

Let us now return to the general problem when the future data has a different underlying probability distribution than the identification data. A typical situation is when there are identification data available from a limited region of the full operating range only. It appears to be impossible to accurately estimate the bias of a model structure in some operating regime without any data or prior knowledge in that operating regime. However, we may indeed estimate the variance of the model structure over the entire range of operating conditions the model structure is defined for. Hence, we modify the estimate $\hat{\Sigma}_{BOOTBV}$ such that the variance over the interesting range of operation is weighted according to the future probability distribution $p_{future}(D^*_t)$:

$$\tilde{\Sigma}_{BOOT}(S) = \frac{1}{l}\sum_{t=1}^{l}\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)^T$$

$$+\ \int_{D^*_t}\frac{1}{B_2}\sum_{b_2=1}^{B_2}\left(\hat{y}(D^*_t; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D^*_t; S, \hat{\theta}^{b_1})\right)\left(\hat{y}(D^*_t; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D^*_t; S, \hat{\theta}^{b_1})\right)^T p_{future}(D^*_t)\,dD^*_t$$

In practice, this integral is also approximated using Monte Carlo integration. Compared to $\hat{\Sigma}_{BOOTBV}$, the estimate of the term containing the bias is unchanged, but ideally we would like to weight this term according to $p_{future}$, too. This is feasible only if an estimate $p_{past}$ of the probability distribution of the identification data is available:

$$\bar{\Sigma}_{BOOT}(S) = \frac{1}{l}\sum_{t=1}^{l}\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)\left(y(t) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D_{t-1}; S, \hat{\theta}^{b_1})\right)^T\frac{p_{future}(D_{t-1})}{p_{past}(D_{t-1})}$$

$$+\ \int_{D^*_t}\frac{1}{B_2}\sum_{b_2=1}^{B_2}\left(\hat{y}(D^*_t; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D^*_t; S, \hat{\theta}^{b_1})\right)\left(\hat{y}(D^*_t; S, \hat{\theta}^{b_2}) - \frac{1}{B_1}\sum_{b_1=1}^{B_1}\hat{y}(D^*_t; S, \hat{\theta}^{b_1})\right)^T p_{future}(D^*_t)\,dD^*_t$$

�� Discussion

The above estimates provide at least a partial solution to the problem, and criteria like

$$\tilde{J}_{BOOT}(S) = \mathrm{trace}\left(\tilde{\Sigma}_{BOOT}(S)\right)$$

or

$$\bar{J}_{BOOT}(S) = \mathrm{trace}\left(\bar{\Sigma}_{BOOT}(S)\right)$$

will favor model structures that have low variance everywhere, and low bias in at least the region from which we have data available.
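How the variance term penalizes structures that extrapolate poorly can be sketched as follows. This is an illustration under assumptions, not the thesis procedure: the candidate structures are static polynomials of degree 1 and 3, the identification data cover $u \in [0, 1]$ only, $p_{future}$ is taken as uniform on $[0, 2]$, and the integral over future operating points is replaced by a Monte Carlo average.

```python
import random

def solve(A, b):
    """Solve A p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (M[r][n] - sum(M[r][c] * p[c] for c in range(r + 1, n))) / M[r][r]
    return p

def fit_poly(us, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    A = [[sum(u ** (i + j) for u in us) for j in range(n)] for i in range(n)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(n)]
    return solve(A, b)

def predict(theta, u):
    return sum(c * u ** i for i, c in enumerate(theta))

def future_variance(us, ys, degree, future_us, B, rng):
    """Bootstrap variance of the prediction, averaged over future operating points."""
    l = len(us)
    models = []
    for _ in range(B):
        idx = [rng.randrange(l) for _ in range(l)]
        models.append(fit_poly([us[i] for i in idx], [ys[i] for i in idx], degree))
    total = 0.0
    for u in future_us:
        preds = [predict(th, u) for th in models]
        mean = sum(preds) / B
        total += sum((p - mean) ** 2 for p in preds) / B
    return total / len(future_us)

rng = random.Random(1)
us = [k / 29.0 for k in range(30)]                     # identification data: u in [0, 1]
ys = [u + rng.gauss(0.0, 0.05) for u in us]            # assumed system plus noise
future_us = [rng.uniform(0.0, 2.0) for _ in range(200)]  # samples from assumed p_future
var_linear = future_variance(us, ys, 1, future_us, B=100, rng=rng)
var_cubic = future_variance(us, ys, 3, future_us, B=100, rng=rng)
```

In this setting the cubic structure has a much larger weighted variance than the linear one, because its parameters are poorly determined for the purpose of predicting beyond $u = 1$. The bias term, by contrast, carries no information about $[1, 2]$.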

Since the criteria do not take into account the bias of the model structure in regimes from which there is no data available, it is important that these criteria are applied with care, and that a subjective assessment of this bias is taken into consideration when the model structure is selected.

Bootstrapping with sequential structure in the data may be complicated, see (Carlstein ��) for a survey of available techniques that more or less preserve the sequential structure in the resample. It is shown that bootstrapping using the residuals, as applied here, at least gives a consistent estimate for AR processes (Freedman ��). However, a strong theoretical justification of the application of bootstrapping in cases with non-stationary and sequentially correlated stochastic processes is not available, to our knowledge. It is the general applicability and simple computer implementation that has made bootstrapping a very popular statistical technique.

The main "user's choice" is the future data probability distribution $p_{future}(D^*_t)$. For theoretical developments we are usually concerned with identification criteria that weight the error according to the data distribution, because it allows us to derive nice results. While such weighting may be reasonable in many applications, it is not difficult to imagine cases where this is far from what we want. A typical example is a safety critical process, where on one hand there may not be much data available from certain critical operating conditions, while these are the conditions under which the model should be most reliable. The distribution $p_{future}(D^*_t)$ is a convenient tool for at least partially taking such considerations into account, when it is interpreted as a weight-function for desired model accuracy.

Unfortunately, $p_{future}(D^*_t)$ is usually not known, and only a very rough approximation may be available. This is the major drawback of this approach, in our view. However, on the few examples we have tested this procedure on, it is clear that improvements are achieved with a rough choice of $p_{future}(D^*_t)$ compared to neglecting the fact that the future operating range is larger than reflected in the identification data.

Parametric and non-parametric estimation of density functions like $p_{past}(D^*_t)$ is a topic that is treated in detail in the literature, see e.g. (Silverman ��). We will therefore not dwell on this problem here.

Chapter ��

Model Structure Identification using Separate Validation Data - Asymptotic Properties

The decomposition of a model set into a parametric and a structural level is common. The parameters are typically identified using a prediction error criterion, e.g. (Ljung ��), while there are several possible criteria for structure identification. These include FPE (Akaike ��), AIC (Akaike ��), MDL (Rissanen ��), Bayesian criteria (Kashyap ��), cross-validation (Stoica, Eykhoff, Janssen and Söderström ��), and the unbiasedness criterion (Ivakhnenko and Yurachkovsky ��). Common to all these criteria is that they try to estimate the expected prediction performance when the identified models of different structures are used for prediction. However, the simplest and perhaps most popular such estimate is the use of separate validation data to assess the prediction performance of the different identified models. The major drawback of this criterion, compared to the above mentioned ones, is that a longer data sequence may be needed.

Suppose a bounded sequence of $2l$ input-output observations from a dynamical system is available,

$$Z^{2l} = (z_1, z_2, \ldots, z_{2l})$$

where $z_t = (u_t, y_t)$, $u_t \in R^r$ is the input vector, and $y_t \in R^m$ is the output vector, at time t. The problem we address is the one of identifying a model of the system on the basis of these observations, and in particular to study the model as l approaches infinity. The algorithm we will study is based on hierarchical optimization, since the model set is decomposed into two levels, namely model structure and model parameters. Referring to Fig. ��, we will in this chapter assume that a model set is given. We will present a novel convergence result for

the hierarchical identification procedure based on separate validation data, under similar assumptions as in the asymptotic analysis of the parameter identification problem given by Ljung (��). Indeed, such a result should be expected, since the use of separate validation data should give a good estimate of the expected prediction performance. It should be noted that some asymptotic results are given for related criteria based on separate validation data in (Stepashko ��, Aksenova ��), in a significantly less general context than this chapter. In particular, these results only hold for a finite number of model structures, and the observations are assumed to be independent.

For easy reference, the relations between the assumptions and propositions in this chapter are as illustrated in Fig. ��. The material in this chapter is from (Johansen and Weyer ��).

Figure ��: How the assumptions and propositions are related. The diagram connects the assumptions S1, M1, M2, A1 and A2 with Lemmas 1-4, Theorems 1-3 and Corollaries 1-2.


�� Main Assumptions and Preliminary Results

Let $\mathcal{S}$ be a set of model structures. A model structure $S \in \mathcal{S}$ is a set of possibly non-linear equations, which may be algebraic, difference, differential, or a combination. The equations may contain some unknown parameters $\theta$ that are known to belong to a parameter set $\Theta_S$, which is a subset of a finite-dimensional Euclidean vector space, by assumption. We will not explicitly require $\Theta_S$ to be compact, but as we shall see, this is typically needed to satisfy the assumptions. This defines a model set

$$\mathcal{M} = \{ M = (S, \theta) \mid S \in \mathcal{S},\ \theta \in \Theta_S \}$$

Suppose that for each model $M \in \mathcal{M}$ we are able to make a prediction $\hat{y}_t(M, Z^{t-1})$ of the output $y_t$ on the basis of the data sub-sequence $Z^{t-1}$. Notice that the dependence of the predictor on the model M may be arbitrary. For example, the predictor may be based on a suboptimal state-estimator. The one-step-ahead prediction error using the model M is defined by $\varepsilon_t(M, Z^t) = y_t - \hat{y}_t(M, Z^{t-1})$. In the following, we may let the dependence on $Z^{t-1}$ be implicit when it is more convenient, and use the notation $\hat{y}_t(M)$ and $\varepsilon_t(M)$.

Before we proceed, we need to introduce some notation. Let the $p$-norm of a vector $x \in R^n$ be defined by $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ for all $p \ge 1$, and let the Euclidean norm be denoted by $\|x\| = \|x\|_2$. The following assumption states the key technical property of the model set. It essentially limits the complexity of the model set, in the sense that there must exist a finite number of good approximations to all models in the model set.

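Assumption M1. The model set $\mathcal{M}$ can be covered by a finite $\epsilon$-net with $N(\epsilon)$ elements, i.e. for any $\epsilon > 0$ there exists a finite set of models $\mathcal{M}_\epsilon = \{M_1, M_2, \ldots, M_{N(\epsilon)}\}$ (not necessarily a subset of $\mathcal{M}$) such that for any $M \in \mathcal{M}$ there exists an $M_{i(M)} \in \mathcal{M}_\epsilon$ with the property

$$\sup_{t, Z^{t-1}} \|\hat{y}_t(M, Z^{t-1}) - \hat{y}_t(M_{i(M)}, Z^{t-1})\| \le \epsilon$$

(We apologize for possible confusion caused by using $\varepsilon$ and $\epsilon$ in two different meanings.)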

Before we proceed� let us discuss the limitations imposed by assumption M� onthe model set�

Lemma 1. Suppose

1. $\mathcal{S}$ contains only a finite number of model structures.

2. For every $S \in \mathcal{S}$ the parameter set $\Theta_S$ is compact.

3. For every $S \in \mathcal{S}$ the predictor $\hat y_t(S, \theta, Z_{t-1})$ is uniformly continuous in $\Theta_S$.

Then $\mathcal{M}$ can be covered by a finite $\epsilon$-net.


Proof. Let $S \in \mathcal{S}$ be arbitrary. From the uniform continuity there exists for every $\epsilon > 0$ a $\delta > 0$ that depends only on $\epsilon$ with the property that

$$\|\theta - \tilde\theta\| \le \delta \quad \text{implies} \quad \sup_{t,\, Z_{t-1}} \| \hat y_t(S, \theta, Z_{t-1}) - \hat y_t(S, \tilde\theta, Z_{t-1}) \| \le \epsilon$$

uniformly in $\Theta_S$. Since $\Theta_S$ is compact, it can be covered by a finite number of $\delta$-balls. The parameter vectors at the centers of these $\delta$-balls define an $\epsilon$-net for the model set $\mathcal{M}_S = \{M = (S, \theta) \mid \theta \in \Theta_S\}$. Since $\mathcal{S}$ contains only a finite number of model structures, the result follows.

�
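The construction in the proof can be sketched numerically for a scalar parameter. The example below is only an illustration under stated assumptions: the parameter set is a hypothetical interval, the predictor is a hypothetical saturated first-order predictor, and uniform continuity is replaced by the stronger assumption of a known Lipschitz constant in $\theta$, so that $\delta = \epsilon / L$. None of the names come from the thesis.

```python
import math

def epsilon_net(theta_min, theta_max, lipschitz, eps):
    """Centres of intervals with half-width <= delta = eps / lipschitz
    that together cover the compact interval [theta_min, theta_max]."""
    delta = eps / lipschitz
    n = max(1, math.ceil((theta_max - theta_min) / (2.0 * delta)))
    width = (theta_max - theta_min) / n
    return [theta_min + (i + 0.5) * width for i in range(n)]

def predictor(theta, y_prev):
    """Hypothetical predictor theta * sat(y_prev).
    The saturation keeps it Lipschitz in theta with constant 1."""
    return theta * max(-1.0, min(1.0, y_prev))

def nearest(theta, centres):
    """Net element closest to theta."""
    return min(centres, key=lambda c: abs(c - theta))

# Assumed parameter set [-2, 2] and accuracy eps = 0.1.
centres = epsilon_net(-2.0, 2.0, lipschitz=1.0, eps=0.1)
```

Every parameter in the interval then has a net element whose prediction differs by at most `eps`, whatever the past output is. Without the saturation the Lipschitz constant would grow with the size of the output, which is the difficulty discussed next.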

Uniform continuity may exclude certain types of models and systems. Consider for example the simple model (predictor) $\hat y_t(S, \theta, Z_{t-1}) = \theta y_{t-1}$. For $\|\theta - \tilde\theta\| \le \delta$ we get

$$\sup_{t,\, Z_{t-1}} \| \hat y_t(S, \theta, Z_{t-1}) - \hat y_t(S, \tilde\theta, Z_{t-1}) \| \le \delta \sup_t \| y_{t-1} \|$$

with equality when $\|\theta - \tilde\theta\| = \delta$. Since $\delta$ cannot depend on $t$, the system output must be bounded in order for uniform continuity to hold. However, we are free to choose the predictors ourselves, and may include for example saturation effects for very large outputs in the predictor to enforce this.

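Lemma 2. Suppose

1. For all $t$, $z_t \in \mathcal{Z}$, where $\mathcal{Z}$ is compact.

2. There exists a constant $K$ such that

$$\sup_{t,\, Z_{t-1},\, M} \| \hat y_t(M, Z_{t-1}) \| \le K$$

3. For every $k \ge 1$ there exists a uniformly continuous function $f_k : \mathcal{M} \times \mathcal{Z}^k \to \mathbb{R}^m$ with the property

$$\sup_{t,\, Z_{t-1}} \| \hat y_t(M, z_{t-1}, \ldots, z_1) - f_k(M, z_{t-1}, \ldots, z_{t-k}) \| \le C \lambda^k$$

for some constants $C$ and $\lambda < 1$.

Then $\mathcal{M}$ can be covered by a finite $\epsilon$-net.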

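Proof. Let $M \in \mathcal{M}$ and $\epsilon > 0$ be arbitrary. Choose $k$ so large that

$$\sup_{t,\, Z_{t-1}} \| \hat y_t(M, z_{t-1}, \ldots, z_1) - f_k(M, z_{t-1}, \ldots, z_{t-k}) \| \le \frac{\epsilon}{2}$$

Provided the set of functions $F_k = \{ f_k(M, \cdot) : \mathcal{Z}^k \to \mathbb{R}^m \mid M \in \mathcal{M} \}$ can be covered with a finite $\epsilon/2$-net, this $\epsilon/2$-net will also be an $\epsilon$-net for $\mathcal{M}$.

From the uniform continuity of $f_k$ there exists a $\delta > 0$ such that

$$\| Z_1 - Z_2 \| \le \delta \quad \text{implies} \quad \sup_{M \in \mathcal{M}} \| f_k(M, Z_1) - f_k(M, Z_2) \| \le \frac{\epsilon}{4}$$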


Cover $\mathcal{Z}^k$ with a finite number of boxes $\mathcal{Z}_1, \mathcal{Z}_2, \ldots, \mathcal{Z}_N$, where each box is contained within a $\delta$-ball, and

$$\mathcal{Z}^k = \bigcup_i \mathcal{Z}_i \quad \text{and} \quad \mathcal{Z}_i \cap \mathcal{Z}_j = \emptyset \ \text{for } i \ne j$$

An approximation $\hat f_k(M, \cdot)$ to $f_k(M, \cdot)$ is now

$$\hat f_k(M, Z) = \sum_{i=1}^{N} \theta_i(M)\, \chi_i(Z)$$

where $\chi_i$ is the characteristic function for $\mathcal{Z}_i$. From the uniform continuity bound above and assumption 2, it is clear that a finite number of values for the parameter vector $\theta_i$ is sufficient to achieve a uniform approximation accuracy of $\epsilon/2$. Hence, this defines a finite $\epsilon/2$-net for $F_k$.

Remarks. It will later be required that the $\epsilon$-net for $\mathcal{M}$ must consist of predictors that are smooth functions of past data, cf. M2. This may be achieved by replacing the $\chi_i$-functions by smoothed versions. It should be mentioned that the use of a piecewise constant approximation in the proof is only one possibility, and several other approximators are expected to work as well.

�

Assumption 3 requires that the dependence of the predictor on past data must decay exponentially. Roughly speaking, this means that the predictor must be exponentially stable. Moreover, the predictor must be time-invariant, which is a quite restrictive assumption that need not hold when the predictor contains a state estimator, for example. On the other hand, boundedness of the data and prediction (cf. assumptions 1 and 2) are quite reasonable assumptions.

We would also like to stress that Lemmas 1 and 2 are only examples of sufficient conditions for the existence of $\epsilon$-nets that cover $\mathcal{M}$, and weaker conditions may very well exist. It must also be emphasized that it is sufficient to prove the existence of an $\epsilon$-net; the net itself need not be constructed. However, as we have seen, the proofs are typically constructive.

The asymptotic analysis will be based on a law of large numbers that holds uniformly in the model set. In the next paragraphs we state a set of assumptions that together with M1 ensure that such a law holds. We introduce a stochastic framework that allows us to deal with non-stationary stochastic processes that may contain deterministic components and sequential correlations. In the following, let $E$ denote expectation with respect to input-output observations, and following Ljung �� we define the operator $\bar E$ by

$$\bar E x_t = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E x_t$$

for a vector sequence $x = (x_t)$. Notice that it is implicitly assumed that the limit exists in all subsequent applications of this operator.


Denition Ljung ��Hjalmarsson �� A sequence �xt� is said to beexponentially forgetting of order � if there exists a doubly indexed sequence �xst�where xst is independent of the subsequences �x� x�� xs� and �x

pp� x

pp�� x

ps�

for all p � s� xtt � �� and for every t s

E jjxt � xst jj�� C�t�s ��

for some constants C and � � ��

This means that it must be possible to approximate the sequence $(x_t)$ with a sequence $(x_t^s)$ that is independent of the remote past. Moreover, the approximation must be so good that the "remote past is forgotten at an exponential rate". Ljung �� uses this as a definition of exponential stability of stochastic dynamical systems, and has shown that in many cases it contains the classical definition of exponential stability. Clearly, this definition also has similarities to assumption 3 in Lemma 2. The definition is made in a stochastic framework using the expectation operator, while in Lemma 2 the assumption is in a deterministic framework using the supremum norm.

Assumption S1. The sequences $(u_t)$ and $(y_t)$ are exponentially forgetting of order �.

�

On the model set $\mathcal{M}$ and the $\epsilon$-net $\mathcal{M}_\epsilon$ we need to make an additional assumption to ensure that the squared prediction error is an exponentially forgetting sequence when S1 holds.

Assumption M�� De�ne the sequence h�M � � �jj�t�M �jj�� and suppose�� The function "yt�M�Zt�� is di�erentiable with respect to Zt�� for all M �M�M��

�� For all t � and k �� there exists a convex function gt�k such that for allmodels M �M�M��

�zt�kht�M� zt�� z��

�� gt�k�zt�� z��

�� There exist constants C and � � �� such that for all t � the set offunctions fgt�k j k �g has the property

Ejjztjj� �� for all t implies E�gt�k�zt�� z�� C�k

�� Moreover� �ht�M� �� is a bounded sequence for and allM � M�M��

�

First, assumption M2 means that the sensitivity of the squared prediction error with respect to the past inputs and outputs $z_{t-k}$ must be bounded by a function that decays exponentially with $k$, which basically means that the predictor must be exponentially stable. In addition, M2 imposes some growth conditions on the predictor. It is clearly sufficient that the derivative of the predictor is bounded. Furthermore, the prediction must depend smoothly on the past data. These are reasonable conditions that are likely to be fulfilled for many model sets, see also (Hjalmarsson ��, p. �).

�� STRUCTURAL AND PARAMETRIC IDENTIFICATION �

Lemma 3 (Hjalmarsson ��, pp. �). Suppose Assumptions S1 and M2 hold. Then for all $M \in \mathcal{M} \cup \mathcal{M}_\epsilon$ the sequence $(\|\varepsilon_t(M)\|^2)$ is exponentially forgetting of order �. Moreover, the sequence $(\sup_{M \in \mathcal{M}} \|\varepsilon_t(M)\|^2)$ is also exponentially forgetting of order �.

�

Exponential forgetting of order � is sufficient for the strong law of large numbers to hold.

Theorem 1 (Law of Large Numbers). Suppose a scalar sequence $(h_t)$ is exponentially forgetting of order �. Then

$$\left| \bar E h_t - \frac{1}{l} \sum_{t=1}^{l} h_t \right| \to 0 \quad \text{w. p. 1 as } l \to \infty$$

Proof. Follows from a somewhat stronger result in (Hjalmarsson ��, p. �).

�

This law is a prerequisite for a uniform law of large numbers, which is the main tool in the asymptotic analysis.

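Theorem 2 (Uniform Law of Large Numbers). Suppose $H$ is a set of scalar sequences that are exponentially forgetting of order �, and for each $\epsilon > 0$ there exists a finite set of exponentially forgetting sequences $H_\epsilon$ containing lower and upper approximations $h^{\epsilon,L}$ and $h^{\epsilon,U}$ to each $h \in H$ such that

$$h_t^{\epsilon,L} \le h_t \le h_t^{\epsilon,U} \quad \text{and} \quad \bar E \left( h_t^{\epsilon,U} - h_t^{\epsilon,L} \right) \le \epsilon$$

Then

$$\sup_{h \in H} \left| \bar E h_t - \frac{1}{l} \sum_{t=1}^{l} h_t \right| \to 0 \quad \text{w. p. 1 as } l \to \infty$$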

Proof. Follows from a theorem in (Pollard ��, p. �) and Theorem 1.

�

�� Structural and Parametric Identification

We define the expected squared prediction error

$$V(M) = \bar E \| \varepsilon_t(M) \|^2$$

The ultimate goal is to find a model in $\mathcal{M}$ that makes $V$ as small as possible. To better understand the nature of this problem, consider

$$E_Z V(\hat M_{2l}) = E_Z \bar E \| \varepsilon_t(\hat M_{2l}) \|^2$$

where $\hat M_{2l} = (\hat S_{2l}, \hat\theta_{2l}) \in \mathcal{M}$ is a model estimated on the basis of the data $Z_{2l}$ using an arbitrary method, and the expectation $E_Z$ is with respect to the data sequence $Z_{2l}$ used for identification, that is, the average over the ensemble of possible data sequences $Z_{2l}$ of fixed length $2l$. The composite operator $E_Z \bar E$ will in addition average over the ensemble of possible "future" data in order to assess the expected squared prediction error. In other words, $E_Z V(\hat M_{2l})$ will be the expected squared prediction error of the model $\hat M_{2l}$ when both the fixed length data sequence $Z_{2l}$ used for identification and the future data are viewed as random variables. We have the following bound on the expected squared prediction error:

EZV � "M�l� � E� jjyt � "yt�M��jj�

EZE��"yt�S� � �� "yt� "S�l� ��l��

EZE��"yt� "S�l� ��l�� "yt� "S�l� "��l��

where $M^* = (S^*, \theta^*)$ is the (not necessarily unique) best model in $\mathcal{M}$, in the sense that it minimizes $V(M)$ over $\mathcal{M}$. Likewise, $\theta^*_{2l}$ is the (not necessarily unique) best parameter vector for the structure $\hat S_{2l}$, in the sense that it minimizes $V(M)$ over the set of models

$$\mathcal{M}_{\hat S_{2l}} = \{ M = (\hat S_{2l}, \theta) \mid \theta \in \Theta_{\hat S_{2l}} \}$$

induced by the model structure $\hat S_{2l}$. The first term in this bias-variance decomposition is the variance of the residuals that are impossible to predict using any model in $\mathcal{M}$, both due to unpredictable random error and because the model set $\mathcal{M}$ may not be sufficiently rich to exactly reproduce the deterministic component of the system output. The second term is the error caused by a possibly too simple model structure $\hat S_{2l}$ (bias), while the third term is variance due to parametric error. In general, model structures that are too complex will give large parametric errors. In particular, models with more parameters than the number of observations used to fit the model may give unbounded parametric error. Clearly, it is desirable to have both small structural and small parametric errors, but for a finite $l$ these are mutually exclusive, cf. Fig. �. A small structural error will in general require a complex model structure, while a small parametric error will require a simple model structure with few parameters compared to $l$. This is the well known bias-variance trade-off. However, asymptotically (as $l \to \infty$) it is possible to simultaneously reduce both these terms. This requires that the model is chosen to give a delicate balance between structural and parametric error.

Minimization of the expected squared prediction error $V(M)$ will in many cases reflect the desired properties of the model, and is therefore suited as a criterion for model identification. Unfortunately, the underlying probability measure is assumed to be unknown, and the expectation cannot be computed. Consider instead the empirical average prediction error

$$V_l(S, \theta) = \frac{1}{l} \sum_{t=1}^{l} \| \varepsilon_t(S, \theta) \|^2$$

estimated on the basis of the first half of the data sequence $Z_{2l}$. Within the hierarchical identification framework in Fig. �, we minimize for each model structure $S \in \mathcal{S}$ the criterion $V_l(S, \theta)$ with respect to $\theta$ in order to find a parameter estimate.

Assumption A1. For every $l$ and $S \in \mathcal{S}$ the following global minimum exists with probability one

$$\hat\theta_{S,l} = \arg\min_{\theta \in \Theta_S} V_l(S, \theta)$$

�

Notice that we do not assume that the minimum is unique, but use the convention that if more than one value of $\theta$ minimizes $V_l(S, \theta)$, then $\hat\theta_{S,l}$ is an arbitrary minimizer. Non-uniqueness should indeed be expected, because $\mathcal{M}$ may contain model structures of high complexity, possibly with an unbounded number of parameters. If this is the case, then for $l$ smaller than the number of parameters in the model structure, the parameter identification problem is under-determined and there may be several parameter vectors that give an exact match to the data. Notice that A1 automatically holds if $\Theta_S$ is compact and $V_l$ is a continuous function of $\theta$.

We define $\hat M_{S,l} = (S, \hat\theta_{S,l})$. This is the (not necessarily unique) model with structure $S$ that on the basis of the $l$ observations is believed to be the best. Next, we need to choose among the models in the set $\{ \hat M_{S,l} \mid S \in \mathcal{S} \}$. We define a new criterion on the basis of the second half of the data sequence $Z_{2l}$

$$\bar V_l(S, \hat\theta_{S,l}) = \frac{1}{l - k} \sum_{t=l+k+1}^{2l} \| \varepsilon_t(S, \hat\theta_{S,l}) \|^2$$

throwing away the first $k$ samples to ensure that the remaining samples are not too strongly correlated with $\hat\theta_{S,l}$, cf. the definition of exponentially forgetting sequences.

Assumption A2. For every $l$, the global minimum

$$\hat S_{2l} = \arg\min_{S \in \mathcal{S}} \bar V_l(S, \hat\theta_{S,l})$$

exists with probability one.

�

Again, this minimum need not be unique, and we use the convention that in the case of non-uniqueness, $\hat S_{2l}$ is any minimizer. In summary, the model estimate $\hat M_{2l}$ is defined by the following hierarchical optimization procedure

Parameter Identification: $\hat\theta_{S,l} = \arg\min_{\theta \in \Theta_S} V_l(S, \theta)$ for all $S \in \mathcal{S}$

Structure Identification: $\hat S_{2l} = \arg\min_{S \in \mathcal{S}} \bar V_l(S, \hat\theta_{S,l})$

Identified Model: $\hat M_{2l} = (\hat S_{2l}, \hat\theta_{\hat S_{2l},l})$
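The two-level procedure can be sketched in a few lines. The example is an illustration only, under assumptions that are not part of the thesis: three hypothetical static model structures with one scalar parameter each (so that the inner minimization has a closed form), a simulated linear system, and a gap of `k` discarded samples between the estimation and validation halves.

```python
import random

# Hypothetical model structures: each maps (theta, u) to a prediction
# that is linear in the scalar parameter theta.
STRUCTURES = {
    "constant": lambda theta, u: theta,
    "linear": lambda theta, u: theta * u,
    "quadratic": lambda theta, u: theta * u * u,
}

def fit(structure, data):
    """Parameter identification: least-squares theta on the estimation data."""
    phi = [STRUCTURES[structure](1.0, u) for u, _ in data]  # regressors
    num = sum(p * y for p, (_, y) in zip(phi, data))
    den = sum(p * p for p in phi)
    return num / den

def mean_sq_error(structure, theta, data):
    """Empirical average squared prediction error on a data sequence."""
    return sum((y - STRUCTURES[structure](theta, u)) ** 2 for u, y in data) / len(data)

def identify(data, k=0):
    """Fit every structure on the first half, then pick the structure with
    the smallest error on the second half after discarding k samples."""
    l = len(data) // 2
    estimation, validation = data[:l], data[l + k:2 * l]
    estimates = {s: fit(s, estimation) for s in STRUCTURES}
    best = min(STRUCTURES, key=lambda s: mean_sq_error(s, estimates[s], validation))
    return best, estimates[best]

# Simulated data from an assumed system y = 2 u + noise.
rng = random.Random(0)
data = []
for _ in range(200):
    u = rng.uniform(-1.0, 1.0)
    data.append((u, 2.0 * u + rng.gauss(0.0, 0.1)))

structure, theta = identify(data, k=5)
```

For a static system the gap `k` is not needed; it is kept only to mirror the criterion on the second half of the data.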

We are now in position to prove the main result�

CHAPTER �� IDENTIFICATION USING VALIDATION DATA

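Theorem 3. Suppose Assumptions A1 and A2 hold, and

$$\sup_{M \in \mathcal{M}} \left| V_l(M) - V(M) \right| \to 0 \quad \text{w. p. 1 as } l \to \infty$$

$$\sup_{S \in \mathcal{S}} \left| \bar V_l(S, \hat\theta_{S,l}) - V(S, \hat\theta_{S,l}) \right| \to 0 \quad \text{w. p. 1 as } l \to \infty$$

Then

$$V(\hat M_{2l}) \to \inf_{M \in \mathcal{M}} V(M) \quad \text{w. p. 1 as } l \to \infty$$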

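Proof. Successive applications of the triangle inequality give

$$
\begin{aligned}
\left| V(\hat M_{2l}) - \inf_{M \in \mathcal{M}} V(M) \right|
&= \left| V(\hat M_{2l}) - \inf_{S \in \mathcal{S}} \inf_{\theta \in \Theta_S} V(S, \theta) \right| \\
&\le \left| V(\hat M_{2l}) - \inf_{S \in \mathcal{S}} V(S, \hat\theta_{S,l}) \right|
 + \left| \inf_{S \in \mathcal{S}} V(S, \hat\theta_{S,l}) - \inf_{S \in \mathcal{S}} \inf_{\theta \in \Theta_S} V(S, \theta) \right| \\
&\le \left| V(\hat M_{2l}) - \bar V_l(\hat M_{2l}) \right|
 + \left| \bar V_l(\hat M_{2l}) - \inf_{S \in \mathcal{S}} \bar V_l(S, \hat\theta_{S,l}) \right| \\
&\quad + \left| \inf_{S \in \mathcal{S}} \bar V_l(S, \hat\theta_{S,l}) - \inf_{S \in \mathcal{S}} V(S, \hat\theta_{S,l}) \right|
 + \left| \inf_{S \in \mathcal{S}} V(S, \hat\theta_{S,l}) - \inf_{S \in \mathcal{S}} V_l(S, \hat\theta_{S,l}) \right| \\
&\quad + \left| \inf_{S \in \mathcal{S}} V_l(S, \hat\theta_{S,l}) - \inf_{S \in \mathcal{S}} \inf_{\theta \in \Theta_S} V_l(S, \theta) \right|
 + \left| \inf_{S \in \mathcal{S}} \inf_{\theta \in \Theta_S} V_l(S, \theta) - \inf_{S \in \mathcal{S}} \inf_{\theta \in \Theta_S} V(S, \theta) \right|
\end{aligned}
$$

The second and fifth terms in the last bound are zero with probability one, cf. the definitions of the structure estimate $\hat S_{2l}$ and the parameter estimate $\hat\theta_{S,l}$. It is straightforward to prove that for any set $X$ and functions $f, g : X \to \mathbb{R}$ that are bounded from below,

$$\left| \inf_{x \in X} f(x) - \inf_{x \in X} g(x) \right| \le \sup_{x \in X} \left| f(x) - g(x) \right|$$

Hence, with probability one,

$$\left| V(\hat M_{2l}) - \inf_{M \in \mathcal{M}} V(M) \right| \le 2 \sup_{S \in \mathcal{S}} \left| \bar V_l(S, \hat\theta_{S,l}) - V(S, \hat\theta_{S,l}) \right| + 2 \sup_{S \in \mathcal{S}} \sup_{\theta \in \Theta_S} \left| V_l(S, \theta) - V(S, \theta) \right|$$

and the right-hand side tends to zero with probability one by the assumptions of the theorem.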

�
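The inequality between infima used in the proof can be verified in a few lines. The derivation below is a sketch added for completeness; it assumes only that $f$ and $g$ are bounded from below and that $\sup |f - g|$ is finite (otherwise the bound is trivial).

```latex
% For every x in X:
f(x) \le g(x) + \sup_{x' \in X} |f(x') - g(x')| .
% The left-hand side is bounded below by inf f, so taking the infimum
% over x on the right-hand side gives
\inf_{x \in X} f(x) \le \inf_{x \in X} g(x) + \sup_{x' \in X} |f(x') - g(x')| .
% Exchanging the roles of f and g and combining the two bounds:
\left| \inf_{x \in X} f(x) - \inf_{x \in X} g(x) \right|
  \le \sup_{x \in X} |f(x) - g(x)| .
```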

The result tells us that the expected squared prediction error when the model $\hat M_{2l}$ is used for prediction will tend towards the smallest possible with probability one as $l \to \infty$. In particular,


Corollary 1. Suppose the assumptions in Theorem 3 hold, and in addition

1. There exists a unique $M^* \in \mathcal{M}$ such that $M^* = \arg\min_{M \in \mathcal{M}} V(M)$.

2. The parameterizations of the model structures are such that a metric on $\mathcal{M}$ can be defined, and $\mathcal{M}$ is compact.

Then $\hat M_{2l} \to M^*$ with probability one as $l \to \infty$.

�

This result is commonly referred to as consistency of the model estimate. The conditions are related to identifiability of the model set.

Next, we discuss assumptions under which the two uniform convergence conditions of Theorem 3 hold.

Lemma 4. Suppose S1, M1, M2 and A1 hold. Then the two uniform convergence conditions of Theorem 3 hold.

Proof of the first condition. The proof is inspired by a result in (Weyer, Williamson and Mareels ��). Recall the definition of the sequence $h(M) = (\|\varepsilon_t(M)\|^2)$ and define the set of sequences $H = \{ h(M) \mid M \in \mathcal{M} \}$. The main tool is Theorem 2, and the proof basically consists of showing that $h(M)$ satisfies the assumptions of this theorem.

Pick an arbitrary � � and M � M� and let Mi�M � � M�� denote the modelin the �net M�� that is closest to M �in the sense of �� where is yetunspeci�ed� Now

ht�Mi�M �� pht�Mi�M �� ht�M � �

pht�Mi�M ��

��

and sincepx � � x� lower and upper approximations to the sequence h�M � are

given by

h��Lt � ht�Mi�M �� ht�Mi�M �� ht�M �

� ht�Mi�M �� ht�Mi�M ��

�� h��Ut ��

Taking the expectation� we get

E h��Ut � h��Lt

�� V �Mi�M ��

Now

V �Mi�M �� supM�M

V �Mi�M �� K ��

since it follows directly from Lemma � that �supM�M ht�M �� is exponentiallyforgetting of order �� and it therefore exists a constant K such that

supM�M

V �M � � K� ��

Choosing su�ciently small ensures

E h��Lt � h��Ut

��


Hence, the conditions of Theorem 2 are fulfilled, and the first condition follows.

Proof of the second condition. The terms in $\bar V_l$ make up a sequence $(\|\varepsilon_t(S, \hat\theta_{S,l})\|^2)$ that is exponentially forgetting of order �, and since the model subset $\{ M = (S, \hat\theta_{S,l}) \mid S \in \mathcal{S} \} \subset \mathcal{M}$ can be covered by a finite $\epsilon$-net, the rest of the proof is similar to that of the first condition.

�

The first condition of Theorem 3 says that the empirical average $V_l$ converges with probability one to its expectation $V$ simultaneously for all possible models in the model set. This is often referred to as uniform convergence, or as a strong uniform law of large numbers. Its main implication is that $V_l$ will be a close approximation to $V$ for all models in $\mathcal{M}$, for large $l$. Hence, minimization of $V_l$ and minimization of $V$ are approximately equivalent. The result of Ljung ��, which applies to a fixed model structure, is a special case.

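Corollary 2. Suppose S1 and M2 hold. Consider a fixed model structure $S \in \mathcal{S}$, and suppose $\Theta_S$ is compact and $\hat y_t(S, \theta)$ is differentiable with respect to $\theta$ for all $t$ and $\theta \in \Theta_S$. Then

$$\sup_{\theta \in \Theta_S} \left| V_l(S, \theta) - V(S, \theta) \right| \to 0 \quad \text{w. p. 1 as } l \to \infty$$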

�

In addition to the use of a separate validation data sequence, it is for example known that the MDL criterion (Rissanen ��) and the Bayesian criterion (Kashyap ��) lead to consistent model structure estimates. On the other hand, it is known that AIC and FPE are biased (Shibata ��), and therefore do not lead to consistent model structure estimates.

Theorem 3 still holds if we introduce several structure levels in a hierarchy, using separate data sequences to formulate a criterion at each level in the hierarchy. We may also add a regularization term (Tikhonov and Arsenin ��) in the criterion:

$$V_l^{\gamma}(M) = V_l(M) + \gamma(l)\, \Omega(M)$$

where $\gamma(l) \ge 0$ is a regularization parameter, and $\Omega$ is a stabilizer that has the purpose of making the problem better conditioned. To asymptotically reduce the bias introduced by this term, we let $\gamma(l) \to 0$ as $l \to \infty$. Optimization of the regularization parameter $\gamma(l)$ using yet another independent data sequence does not pose any difficulties in the analysis.
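As a small illustration of the regularized criterion, the sketch below uses a scalar linear predictor and a quadratic stabilizer. Both choices, as well as the schedule $1/\sqrt{l}$ for the regularization parameter, are assumptions made for the example and are not taken from the thesis.

```python
import math

def regularized_estimate(data, gamma):
    """Minimise (1/l) * sum (y - theta*u)^2 + gamma * theta^2 over scalar theta.
    Setting the derivative to zero gives theta = s_uy / (s_uu + gamma)."""
    l = len(data)
    s_uy = sum(u * y for u, y in data) / l
    s_uu = sum(u * u for u, _ in data) / l
    return s_uy / (s_uu + gamma)

def gamma_schedule(l):
    """Hypothetical schedule with gamma(l) -> 0 as l -> infinity."""
    return 1.0 / math.sqrt(l)

# Two noise-free observations of the assumed system y = 2 u.
sample = [(1.0, 2.0), (2.0, 4.0)]
unregularized = regularized_estimate(sample, 0.0)  # least-squares solution
shrunk = regularized_estimate(sample, 2.5)         # pulled towards zero
```

The stabilizer shrinks the estimate towards zero, which is the bias mentioned in the text; letting `gamma_schedule(l)` vanish removes it asymptotically.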

The data sequence is split into two sequences of approximately equal length. The analysis holds as long as the lengths of both sequences go to infinity as $l$ goes to infinity. In order to find a good trade-off, one should consider the size of the model structure set relative to the parameter sets, and take into account the convergence rate, cf. the final bound in the proof of Theorem 3. Unfortunately, accurate estimates of the convergence rates are not easily attainable. There are some convergence results available in (Vapnik ��, Pollard ��), but they suffer from two major drawbacks: they are potentially very conservative, and they are applicable to independent stochastic processes only. It is not clear how to extend these results to exponentially forgetting processes.

�� DISCUSSION AND CONCLUDING REMARKS �

�� Discussion and Concluding Remarks

A model set is decomposed into a structural and a parametric level. For each model structure, the parameters are estimated using a prediction error method. The model structures are compared by estimating their expected prediction performance on a separate validation data sequence, using the estimated parameters. Here we show that under reasonable assumptions, selecting the model structure with the best performance on the validation data asymptotically leads to a model with the best possible expected prediction performance within the given model set.

We have shown that this procedure essentially has the same asymptotic convergence properties as the parameter identification procedure of Ljung ��. We have introduced one additional assumption, namely the existence of a finite $\epsilon$-net that covers the model set, but also illustrated that this assumption holds under some reasonable conditions on the model set, cf. Lemmas 1 and 2.

Chapter �

Identification of Operating Regimes

In Chapters � and � we focused on regime decomposition and choice of local model structure on the basis of system knowledge. The aim of this chapter is to report on an algorithm that automatically identifies a decomposition into operating regimes and local models on the basis of empirical data. Only weak a priori knowledge, such as one or more alternative local model structures, is required.

This chapter is organized as follows. First, in section ��, the problem is formulated. Then, in section ��, a heuristic search algorithm is presented. Statistical properties of the algorithm are discussed in section ��, and the algorithm is applied to simulated and experimental data sets in section ��. Next, in section ��, the knowledge requirements and possibilities for incorporation of a priori knowledge in the model are considered, before it is discussed how qualitative system knowledge can be extracted from the identified model. An overview of related work and a discussion of the limitations and possible improvements of the approach follow. A short version of the chapter can be found in (Johansen and Foss ��b).

�� Problem Formulation

We address the problem of inducing a model of an unknown non-linear system on the basis of a sequence of $l$ input-output observations

$$D_l = \big( (u(1), y(1)), (u(2), y(2)), \ldots, (u(l), y(l)) \big)$$

where $u(t) \in \mathbb{R}^r$ and $y(t) \in \mathbb{R}^m$ are the input and output vectors of the system, respectively. We denote by $D_t$ the subsequence of $D_l$ containing data up to and including time $t \le l$.

First, consider static models

$$y(t) = f(u(t)) + e(t)$$


� CHAPTER �� IDENTIFICATION OF OPERATING REGIMES

where $e(t) \in \mathbb{R}^m$ is zero-mean noise, and $f$ is an unknown function to be estimated.

An approximation $\hat f$ to $f$ suggests the predictor

$$\hat y(t \mid u(t)) = \hat f(u(t))$$

which gives the prediction error

$$\varepsilon(t) = y(t) - \hat y(t \mid u(t)) = f(u(t)) - \hat f(u(t)) + e(t)$$

Next, consider a stable dynamic system represented by a NARMAX (non-linear ARMAX) model (Chen and Billings ��)

$$y(t) = f\big( y(t-1), \ldots, y(t-n_y),\ u(t-1), \ldots, u(t-n_u),\ e(t-1), \ldots, e(t-n_e) \big) + e(t)$$

where $e(t) \in \mathbb{R}^m$ is zero-mean noise, and $n_y$, $n_u$, and $n_e$ are non-negative integers.

Given an approximation $\hat f$ to the function $f$, a one-step-ahead predictor $\hat y(t \mid D_{t-1})$ can be formulated:

$$\hat y(t \mid D_{t-1}) = \hat f\big( y(t-1), \ldots, y(t-n_y),\ u(t-1), \ldots, u(t-n_u),\ \varepsilon(t-1), \ldots, \varepsilon(t-n_e) \big)$$

$$\varepsilon(t) = y(t) - \hat y(t \mid D_{t-1})$$

The motivation behind this predictor is that while the noise sequence $e$ is unknown, $\varepsilon(t) \to e(t)$ as $t \to \infty$ if $\hat f = f$, the noise model is invertible, and the initial values are within the region of attraction.
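The convergence of the residual to the noise can be illustrated with a first-order example. The sketch below assumes a hypothetical linear special case of the NARMAX model with $n_y = n_u = n_e = 1$ and arbitrarily chosen coefficients; the predictor uses the true function, but starts from a wrong initial residual.

```python
import random

def f_hat(y_prev, u_prev, eps_prev, a=0.7, b=1.0, c=0.5):
    """Hypothetical first-order model; |c| < 1 makes the noise model invertible."""
    return a * y_prev + b * u_prev + c * eps_prev

def residuals(u, y):
    """One-step-ahead prediction errors with the unknown noise replaced by
    past residuals. The initial residual is set to zero."""
    eps = [0.0]
    for t in range(1, len(y)):
        y_hat = f_hat(y[t - 1], u[t - 1], eps[t - 1])
        eps.append(y[t] - y_hat)
    return eps

# Simulate the assumed system y(t) = f(y(t-1), u(t-1), e(t-1)) + e(t).
rng = random.Random(1)
n = 60
u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
e = [rng.gauss(0.0, 0.1) for _ in range(n)]
y = [e[0]]
for t in range(1, n):
    y.append(f_hat(y[t - 1], u[t - 1], e[t - 1]) + e[t])

eps = residuals(u, y)
```

In this linear case the mismatch obeys `eps[t] - e[t] = -c * (eps[t-1] - e[t-1])`, so the effect of the wrong initial value decays geometrically when `|c| < 1`.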

Finally, we consider state-space models

$$x(t+1) = g(x(t), u(t)) + v(t)$$

$$y(t) = h(x(t)) + w(t)$$

where $x(t)$ is a state vector, and $v(t)$ and $w(t)$ are zero-mean disturbance and noise vectors of appropriate dimensions. In this case, the model is defined by the functions $g$ and $h$. Again, using approximations $\hat g$ and $\hat h$, it is possible to construct a one-step-ahead predictor $\hat y(t \mid D_{t-1})$ using for example the extended Kalman filter approach, e.g. (Ljung ��)

$$\hat x(t \mid D_{t-1}) = \hat g\big( \hat x(t-1 \mid D_{t-2}),\ u(t-1) \big) + K(t-1)\, \varepsilon(t-1)$$

$$\hat y(t \mid D_{t-1}) = \hat h\big( \hat x(t \mid D_{t-1}) \big)$$

$$\varepsilon(t) = y(t) - \hat y(t \mid D_{t-1})$$

where $K(t)$ is the Kalman filter gain matrix. This matrix will depend explicitly on the time, the functions $\hat g$ and $\hat h$, and the covariance matrices of the disturbance and noise sequences.

�� PROBLEM FORMULATION �

�� A Generalized Framework

In all these cases, we can write the model equations in the form

��t� � f��t�� e�t� ��

where the left-hand side is a generalized output vector in $\mathbb{R}^m$, $\psi(t) \in \mathbb{R}^r$ is a generalized input vector, and $e(t) \in \mathbb{R}^m$ is zero-mean noise. We call the space $\mathbb{R}^r$ the input space. In the static model case, the input and output vectors equal the generalized input and output vectors. In the NARMAX case, the generalized input vector contains delayed input and output vectors in addition to delayed noise vectors, while the generalized output equals the system output. If noise terms $e(t-1), \ldots, e(t-n_e)$ are present, the generalized input vector is partially unknown and cannot be found exactly from the data $D_t$. For state-space models, neither the generalized input nor the generalized output vectors can be found exactly, because they contain the unmeasured state vector. The purpose of this formulation with generalized input and output vectors is to write the model in a generic form with one unknown function $f$. The problem we address is to estimate this function, and since this function immediately gives the model equations, this also solves the system identification problem. Notice that the fact that the generalized inputs and outputs may not be exactly known is not too difficult a problem, since the model parameters can still be estimated from the input-output data using a prediction error approach with the predictors described above, e.g. (Söderström and Stoica ��).


We define the system's operating point at time $t$ as $z(t) = (z_1(t), \ldots, z_d(t))^T \in Z \subset \mathbb{R}^d$, where typically $d \le r$ and the operating space $Z$ is a subspace or sub-manifold of the input space. It is assumed that $\psi$ and $z$ are related by a known bounded mapping $H$ such that $z = H(\psi)$. Typically, $Z$ and $H$ are designed such that the operating point $z(t)$ characterizes different modes of behavior of the system under different operating conditions, as we have discussed in Chapters � and �. Suppose $Z$ is decomposed into $N$ disjoint sets $\{Z_i\}_{i \in I_N}$ (regimes) so that

$$Z = \bigcup_{i \in I_N} Z_i$$

for some index set $I_N = \{i_1, \ldots, i_N\}$ with $N$ elements. Assume that for each regime $Z_i$ we have a local model structure defined by the function $\hat f_i(\psi; \theta_i)$, parameterized by the vector $\theta_i$, and a local model validity function $\rho_i(z) \ge 0$, which indicates the relative validity of the local model structure as a function of $z$. In addition to being smooth, $\rho_i$ is designed to have the property that $\rho_i(z)$ is close to zero if $z \notin Z_i$. Furthermore, it is assumed that for all $z \in Z$ there exists an $i \in I_N$ such that $\rho_i(z) > 0$, to ensure completeness of the model. A global model can be formed as

$$\hat f(\psi) = \sum_{i \in I_N} \hat f_i(\psi; \theta_i)\, w_i(z)$$

$$w_i(z) = \rho_i(z) \Big/ \sum_{j \in I_N} \rho_j(z)$$

where the functions in the set $\{w_i\}_{i \in I_N}$ are called interpolation functions.

A model structure based on a decomposition into $N$ regimes is written

$$S_N = \left\{ \left( Z_i, \rho_i, \hat f_i \right) \right\}_{i \in I_N}$$

This is somewhat redundant, since there is a close (but not necessarily one-to-one) relationship between $Z_i$ and $\rho_i$.
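The interpolation of local models can be sketched as follows. The example is a minimal illustration under assumptions not taken from the thesis: a scalar generalized input, the identity as the mapping `H`, Gaussian validity functions described by a centre and a width, and local linear models.

```python
import math

def validity(z, centre, width):
    """Hypothetical Gaussian local model validity function rho_i(z)."""
    return math.exp(-0.5 * ((z - centre) / width) ** 2)

def weights(z, regimes):
    """Interpolation functions w_i(z): validity functions normalised to sum to one."""
    rho = [validity(z, centre, width) for centre, width in regimes]
    total = sum(rho)
    return [r / total for r in rho]

def global_model(psi, regimes, local_params, H=lambda psi: psi):
    """Weighted sum of local linear models a_i * psi + b_i."""
    z = H(psi)
    w = weights(z, regimes)
    return sum(wi * (a * psi + b) for wi, (a, b) in zip(w, local_params))

# Two assumed regimes centred at -1 and +1, each with its own local model.
regimes = [(-1.0, 0.3), (1.0, 0.3)]
local_params = [(1.0, 0.0), (3.0, -1.0)]

inside_first = global_model(-1.0, regimes, local_params)  # close to local model 1
between = global_model(0.0, regimes, local_params)        # equal blend of both
```

Gaussian validity functions are strictly positive, which satisfies the completeness requirement in exact arithmetic; very far from every centre the sum can underflow to zero in floating point, so a practical implementation would need a guard.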

With this representation, the modeling problem consists of the following subproblems:

1. Choose an operating space $Z$ and mapping $H$.

2. Decompose $Z$ into regimes, and choose local model structures.

3. Identify the local model parameters.

In Chapters � and � it is demonstrated with examples that in some cases, some coarse qualitative process knowledge is sufficient to carry out this procedure. In the following sections we propose an algorithm that requires significantly less system knowledge to decompose $Z$, choose local model structures, and construct interpolation functions.

�� Model Structure Identification Criteria

Let a model structure $S$ of the form defined above be given. Notice that in a model structure, the model parameters $\theta^T = (\theta_{i_1}^T, \ldots, \theta_{i_N}^T)$ are considered unknown. The model structure $S$ together with the admissible parameter set $\Theta_S$ induces a model set

$$\mathcal{M}_S = \{ M = (S, \theta);\ \theta \in \Theta_S \}$$

In this section, we will discuss how different model structures can be compared using a sequence of empirical data to estimate their expected prediction performance. We introduce the notation

$$y(t) = y^*(D_{t-1}) + e(t)$$

$$\varepsilon(t \mid S, \theta) = y(t) - \hat y(t \mid D_{t-1}, S, \theta)$$

where $y^*(D_{t-1})$ is the deterministic (predictable) component of the system output, $e(t)$ is the stochastic (unpredictable) component, and $\varepsilon(t \mid S, \theta)$ is the residual. Let $\hat\theta_S$ be a parameter estimate that minimizes the prediction error criterion

$$J_S(\theta) = \frac{1}{l} \sum_{t=1}^{l} \mathrm{trace}\left( \varepsilon(t \mid S, \theta)\, \varepsilon^T(t \mid S, \theta) \right)$$


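Let an unknown future data sequence be denoted $D_t^*$, and assume $D_t^*$ and $D_l$ are uncorrelated. Moreover, let $E_D$ and $E_{D^*}$ denote expectations with respect to $D_l$ and $D_t^*$, respectively. The prediction error is given by

$$\varepsilon(t \mid S, \hat\theta_S) = y^*(D_{t-1}^*) - \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) + e(t)$$

where the dependence of $\hat\theta_S$ on $D_l$ has been written explicitly. The expected squared prediction error is defined by

$$\Sigma(S) = E_{D^*} E_D\, \varepsilon(t \mid S, \hat\theta_S(D_l))\, \varepsilon^T(t \mid S, \hat\theta_S(D_l))$$

Assuming $e(t)$ is white noise that is uncorrelated with $D_{t-1}^*$ and $D_l$, we get the following bias-variance decomposition of this expected squared prediction error

$$
\begin{aligned}
\Sigma(S) ={}& E_{D^*} \Big[ \big( y^*(D_{t-1}^*) - E_D \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) \big) \big( y^*(D_{t-1}^*) - E_D \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) \big)^T \Big] \\
&+ E_{D^*} E_D \Big[ \big( \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) - E_D \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) \big) \big( \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) - E_D \hat y(t \mid D_{t-1}^*, S, \hat\theta_S(D_l)) \big)^T \Big] \\
&+ E_{D^*}\, e(t)\, e^T(t)
\end{aligned}
$$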

The first term is the squared systematic error (squared bias) caused by a too simple model structure. The second term is the random error (variance) that is present because the best model in the model set $\mathcal{M}_S$ cannot in general be identified on the basis of the finite data sequence $D_l$. Finally, the third term is the unpredictable component of the system output. Notice that the first term does not depend on the data $D_l$, while the third term depends on neither the data $D_l$ nor the model structure.

We apply a set of approximators that can approximate any smooth function uniformly on a compact subset of the input space, cf. Theorem � in Chapter �. This is obviously a desirable property of the model set, but also a cause for some problems. The richness implies that there will exist models in the model set that make the bias arbitrarily small. However, the finite amount of data will give large variance for such models. Such a model will be fitted not only to represent the system, but also the particular realization of the noise. In other words, the model may give very good prediction of $D_l$, but poor prediction capability when applied to $D_t^*$. This is known as over-fitting, and is caused by too many degrees of freedom in the model. It is therefore important that a model with the correct number of degrees of freedom is found, in the sense that it balances bias and variance. A general guideline is the parsimony principle, e.g. (Ljung ��), which states that the best model structure is the one with the fewest degrees of freedom that adequately describes the behavior of the system. We will base the model structure identification algorithm on statistical criteria that reflect this principle.

The mean square error (MSE) criterion is defined by

$$J_{MSE}(S) = \mathrm{trace}\left( \Sigma(S) \right)$$

Minimizing $J_{MSE}$ will lead to a parsimonious model structure, but with a finite sequence of data $D_l$, the problem is ill-posed. The reason is simply that $J_{MSE}$ cannot be computed since the probability distribution for the prediction error is unknown. An alternative would be to minimize the average squared prediction error (PE) criterion with respect to the model structure

$$J_{PE}(S) = \mathrm{trace}\left( \frac{1}{l} \sum_{t=1}^{l} \varepsilon(t \mid S, \hat\theta_S)\, \varepsilon^T(t \mid S, \hat\theta_S) \right)$$

For finite $l$, $J_{PE}$ may be a strongly biased estimate of $J_{MSE}$, since the prediction performance is measured using the same data the parameters are fitted to, and the law of large numbers is not valid because of the strong dependences between the terms. Hence, the use of $J_{PE}$ for structure identification will not lead to a parsimonious model. We will in the following present several criteria that are far better estimates of $J_{MSE}$ than $J_{PE}$.

If a separate data sequence $D_l^*$ (independent of $D_l$) is known, an unbiased estimate of $J_{MSE}$ can be found by computing the empirical average squared prediction error that results when the model fitted to the data $D_l$ is used to predict the data $D_l^*$. This is the simplest and perhaps most reliable procedure, but suffers from the drawback that a significantly larger amount of data is required. Experiments and collection of data are a major cost for many modeling problems. We therefore proceed with some alternatives that allow the data $D_l$ to be reused to find good estimates of $J_{MSE}$.

First we consider the final prediction error criterion (FPE) (Akaike ��), given by

$$J_{FPE}(S) = \frac{1 + p(S)/l}{1 - p(S)/l}\, J_{PE}(S)$$

where $p(S)$ is the effective number of parameters (degrees of freedom) in the model structure. $J_{FPE}$ is an estimate of $J_{MSE}$, and penalizes model complexity relative to the length of the available data sequence through the term $p(S)/l$. It is assumed that the predictor is linearly parameterized. A non-linear generalization is given in (Larsen ��). Closely related criteria are Mallows' $C_p$ statistic (Mallows ��) and the Akaike Information Criterion (AIC) (Akaike ��), which is appropriate in the context of maximum likelihood estimation.

An alternative criterion can be formulated using cross-validation (Stone ��). The basic principle is extended to dynamic systems by Stoica et al. �� and Janssen, Stoica, Söderström and Eykhoff ��. Let $v \le l$ be a positive integer, and let $k = \lfloor l/v \rfloor$, i.e. the largest integer not greater than $l/v$. We define the index sets

$$I_j = \{(j-1)k+1,\ (j-1)k+2,\ \ldots,\ kj\}, \qquad j = 1, \ldots, v-1$$
$$I_v = \{(v-1)k+1,\ (v-1)k+2,\ \ldots,\ l\}$$

which form a partition of the time interval from 1 to $l$ into $v$ approximately equal-sized intervals. The $v$-fold cross-validation criterion is defined as

$$J_{vCV}(S) = \mathrm{trace}\left(\frac{1}{l}\sum_{j=1}^{v}\sum_{t \in I_j} \varepsilon\big(t \mid S, \hat\theta^{-I_j}_S\big)\,\varepsilon^T\big(t \mid S, \hat\theta^{-I_j}_S\big)\right)$$

where the parameter estimate $\hat\theta^{-I_j}_S$ minimizes the following criterion

$$J^{-I_j}_{PE}(\theta) = \frac{1}{l - \#I_j}\sum_{t \in I^-_j} \varepsilon(t \mid D_{t-1}, S, \theta)\,\varepsilon^T(t \mid D_{t-1}, S, \theta)$$

where $I^-_j = \{1, \ldots, l\} \setminus I_j$, and $\#I_j$ is the number of elements in the set $I_j$. Cross-validation may give a reasonable approximation to the use of independent data for selecting the model structure, at the cost of extra computations. The computational complexity can be considerably reduced when the prediction error is a linear function of the parameters, or by using the approximate cross-validation criteria of Stoica et al. and Janssen et al. It is shown that the approximate criteria are asymptotically equivalent to FPE as $l \to \infty$. Yet another approximation to cross-validation is the Generalized Cross Validation (GCV) criterion (Craven and Wahba)

$$J_{GCV}(S) = \frac{1}{\left(1 - p(S)/l\right)^2}\, J_{PE}(S)$$

which is easily seen to be asymptotically equivalent to FPE, and assumes linear parameterization of the predictor.
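As a small numerical illustration of the two penalized criteria, the following Python sketch evaluates FPE and GCV from a given value of $J_{PE}$. The function names `fpe` and `gcv` and the sample numbers are our own assumptions, chosen only to show how the penalty grows with $p/l$.

```python
def fpe(j_pe: float, p: float, l: int) -> float:
    """Final prediction error: J_PE scaled by (1 + p/l) / (1 - p/l)."""
    if not 0 <= p < l:
        raise ValueError("need 0 <= p < l")
    r = p / l
    return (1.0 + r) / (1.0 - r) * j_pe


def gcv(j_pe: float, p: float, l: int) -> float:
    """Generalized cross-validation: J_PE scaled by 1 / (1 - p/l)^2."""
    if not 0 <= p < l:
        raise ValueError("need 0 <= p < l")
    r = p / l
    return j_pe / (1.0 - r) ** 2


if __name__ == "__main__":
    # Hypothetical numbers: same empirical fit, increasing model complexity.
    j_pe, l = 0.10, 100
    for p in (2, 10, 40):
        print(p, round(fpe(j_pe, p, l), 4), round(gcv(j_pe, p, l), 4))
```

For $p \ll l$ the two factors agree to first order in $p/l$, which is the asymptotic equivalence mentioned above. GCV penalizes slightly harder as $p/l$ grows.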

Bootstrapping (Efron and Tibshirani) is a general procedure for the estimation of statistical properties. In Chapter �, we introduced a bootstrap criterion that estimates the expected squared prediction error, and two modified bootstrap criteria that give improved estimates of the expected squared prediction error when the system operates under a wider range of operating conditions than reflected in the empirical data $D_l$.

Any one of these criteria can be applied with the structure identification algorithm presented in the next section.
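The $v$-fold cross-validation criterion defined earlier in this section can be sketched as follows for the special case of a scalar-output linear regression. The helper names `vfold_index_sets` and `vfold_cv` are hypothetical, indices are 0-based (the text counts time from 1), and ordinary least squares stands in for the prediction error estimator.

```python
import numpy as np


def vfold_index_sets(l: int, v: int) -> list:
    """Index sets I_1..I_v: the first v-1 sets hold k = l // v indices each,
    the last set takes whatever remains up to l."""
    if not 1 <= v <= l:
        raise ValueError("need 1 <= v <= l")
    k = l // v
    sets = [np.arange((j - 1) * k, j * k) for j in range(1, v)]
    sets.append(np.arange((v - 1) * k, l))
    return sets


def vfold_cv(X: np.ndarray, y: np.ndarray, v: int) -> float:
    """Average squared error when each block I_j is predicted by a model
    fitted to the remaining data (the complement of I_j)."""
    l = len(y)
    if v < 2:
        raise ValueError("need at least two folds")
    total = 0.0
    for idx in vfold_index_sets(l, v):
        keep = np.ones(l, dtype=bool)
        keep[idx] = False                      # complement of I_j
        theta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = y[idx] - X[idx] @ theta
        total += float(resid @ resid)
    return total / l
```

Each fold requires a new fit, which is the extra computation referred to above. For dynamic systems the regressors would have to be built from past data before the blocks are removed, which this sketch does not address.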

�� System Identification

Let a set of candidate local model structures $L = \{f_1, f_2, \ldots, f_{N_L}\}$ be given, where $f_i$ is a parameterized function that determines a local model structure, cf. (��).

�� The Set of Model Structure Candidates

Assume the input and output observations in $D_l$ are bounded. Then the system's operating range $Z$ can be approximated by the $d$-dimensional box

$$Z_1 = \left[z^{min}_{1,1},\ z^{max}_{1,1}\right] \times \cdots \times \left[z^{min}_{1,d},\ z^{max}_{1,d}\right]$$

where $z(t) \in Z_1$ for all $t \in \{1, \ldots, l\}$, since $H$ is a bounded mapping. Notice that the resulting model will extrapolate and can be applied for operating points outside $Z_1$. Next, we consider the problem of decomposing $Z_1$ into regimes.

�� CHAPTER �� IDENTIFICATION OF OPERATING REGIMES

Consider the possible decompositions of the set $Z_1$ into two disjoint subsets $Z_{11}$ and $Z_{12}$ with the property $Z_1 = Z_{11} \cup Z_{12}$. We restrict the possibilities by the constraint that the splitting boundary is a hyperplane orthogonal to one of the natural basis vectors of $R^d$, i.e.

$$Z_{11} = \{z \in Z_1 \mid z_{d_1} \ge \zeta_1\}, \qquad Z_{12} = \{z \in Z_1 \mid z_{d_1} < \zeta_1\}$$

for some dimension index $d_1 \in \{1, \ldots, d\}$ of $R^d$ and splitting point $\zeta_1 \in \left[z^{min}_{1,d_1},\ z^{max}_{1,d_1}\right]$.

Local model validity functions for the two regimes are defined by the recursion

$$\rho_{11}(z) = \rho_1(z)\, b(z_{d_1} - \bar z_{11,d_1};\ \sigma_{11})$$
$$\rho_{12}(z) = \rho_1(z)\, b(z_{d_1} - \bar z_{12,d_1};\ \sigma_{12})$$

where $\bar z_{i,d_1} = 0.5\,(z^{min}_{i,d_1} + z^{max}_{i,d_1})$ for $i \in \{11, 12\}$ is the center point of $Z_i$ in the $d_1$-direction. The function $b(r; \sigma)$ is a scalar basis function with scaling parameter $\sigma$, and the local model validity function associated with the regime $Z_1$ is $\rho_1(z) = 1$. The scaling parameters are chosen by considering the overlap between the local

model validity functions� For i � f�� g� we choose �i � �� zmaxi�d�

� zmini�d�

�where � is a design parameter that typically takes a value between �� and ��There will be almost no overlap when � � �� and large overlap when � � ��

For each dimension index d� � f�� dgwe represent the intervalhzmin��d�

� zmax��d�

iby

a �nite number of N� points uniformly covering the interval� Now d�� f��v� andf��w de�nes a new model structure� where the regime Z� is decomposed accordingto the dimension index d� at the point �� and the two local model structures aref��v and f��w� Formally� the set of candidate model structures Sn with n regimesis given by

S� � ff�Z�� fj�g' j � f�� NLggS� �

��Zi��

i�� fj

��Zi��

i�� fk

��' i � f�� dN�g� j� k � f�� NLg

�S �

��Zi��

i�� fj

�� Zm

�� m�� fk� � �Z

m��

m�� fn�

�'

i�m � f�� dN�g� j� k� n � f�� NLgg��Zm

�� m�� fk� � �Z

m��

m�� fn� �

�Zi��

i�� fj

��'

i�m � f�� dN�g� j� k� n � f�� NLggS� � � � �The model structure set is now

$$\mathcal{S} = \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3 \cup \cdots$$

which is illustrated as a search tree in Fig. ��. Strictly speaking, the model structure set is not a tree, since sometimes different sequences of decompositions lead to the same model structure. However, we choose to represent this as a tree, for simplicity. Now the structure identification problem can be looked upon as a multi-step decomposition process, where at each step one regime from the


previous step is decomposed into two sub-regimes. Such an approach will lead to a sequence of model structures $S_1, S_2, \ldots, S_n$ where the model structure $S_{i+1}$ has more degrees of freedom than $S_i$. Due to the normalization of the local model validity functions, the model set is usually not strictly hierarchical, in the sense that $S_i$ cannot be exactly represented using $S_{i+1}$. However, the increasing degrees of freedom define a hierarchical structure.

[Figure: model structure search tree, with the single regime at the root and the candidate decompositions at each level below.]

Figure ��: Model structure search tree illustrating possible decompositions into regimes and choice of local model structures. Each level in the tree corresponds to the possible decompositions into one more regime than at the previous level, i.e. the model structure sets $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ etc. The subset of model structures at each "super-node" in the tree corresponds to a fixed decomposition into regimes, but different combinations of local model structures.

�� Basic Search Algorithm

The problem is now to search the set $\mathcal{S}$ for the best possible model structure. The estimate of the parameters in model structure $S$ is defined by

$$\hat\theta = \arg\min_{\theta} J_S(\theta)$$

where it has been assumed that a unique minimum exists. This can be ensured by restricting the parameters to a compact set. Now, the chosen structure identification criterion is written $J(S)$. We define for a given $n$

$$S_n = \arg\min_{S \in \mathcal{S}_n} J(S)$$

where it again has been assumed that a unique minimum exists. Consider the following extended horizon search algorithm, where the integer $n^* \ge 1$ is called the search horizon.

Search Algorithm.

1. Start with the regime $Z_1$. Let $n = 1$.

2. At each step $n \ge 1$, find a sequence of decompositions and local model structures $S_n, S_{n+1}, \ldots, S_{n+n^*}$ that solves the optimization problem

$$\min_{S \in \mathcal{S}_{n+n^*}} J(S)$$

3. Restrict the search tree by keeping the decomposition that leads to $S_{n+1}$ fixed for the future.

4. If

$$J(S_n) > \min_{k \in \{1, \ldots, n^*\}} J(S_{n+k})$$

then increment $n$ and go to 2. Otherwise, the model structure $S_n$ is chosen.

Referring to Fig. ��, this algorithm will search the tree starting at the top (corresponding to one local model covering the whole operating space), and selecting a decomposition at each level through a sequence of "locally exhaustive" searches of depth $n^*$. In other words, this algorithm will make an $n^*$-step-ahead optimal decomposition at each step, in the sense that the decomposition is optimal if there is going to be exactly $n^*$ more decompositions. If $n^* = 1$, this is a local search algorithm.
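The local search case ($n^* = 1$) can be sketched in a few lines of Python for a scalar operating point. This is a simplified illustration under our own assumptions, not the implementation used in this work: regimes are intervals, local models are affine, the validity functions are Gaussians $\exp(-\tfrac{1}{2}(r/\sigma)^2)$ with $\sigma$ equal to `gamma` times the interval width, the criterion is FPE with $p$ taken as the number of linear parameters, and `n_grid` and `max_regimes` are hypothetical tuning knobs.

```python
import numpy as np


def weights(u, regimes, gamma=1.0):
    """Normalized Gaussian validity functions, one column per interval regime."""
    cols = []
    for lo, hi in regimes:
        centre, sigma = 0.5 * (lo + hi), gamma * (hi - lo)
        cols.append(np.exp(-0.5 * ((u - centre) / sigma) ** 2))
    rho = np.column_stack(cols)
    return rho / rho.sum(axis=1, keepdims=True)


def criterion(u, y, regimes):
    """Fit all local affine models by one least squares problem; return FPE."""
    w = weights(u, regimes)
    X = np.hstack([w, w * u[:, None]])          # columns for offsets and slopes
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    j_pe = np.mean((y - X @ theta) ** 2)
    p, l = X.shape[1], len(y)
    return (1 + p / l) / (1 - p / l) * j_pe


def greedy_search(u, y, n_grid=9, max_regimes=6):
    """Split one regime at a time, keeping the best split, until the
    criterion stops decreasing (search horizon of one step)."""
    regimes = [(u.min(), u.max())]
    best = criterion(u, y, regimes)
    while len(regimes) < max_regimes:
        candidates = []
        for i, (lo, hi) in enumerate(regimes):
            for split in np.linspace(lo, hi, n_grid + 2)[1:-1]:
                trial = regimes[:i] + [(lo, split), (split, hi)] + regimes[i + 1:]
                candidates.append((criterion(u, y, trial), trial))
        value, trial = min(candidates, key=lambda c: c[0])
        if value >= best:                        # no improvement: stop
            break
        best, regimes = value, trial
    return regimes, best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.uniform(-3.0, 3.0, 200)
    y = np.abs(u) + 0.05 * rng.standard_normal(200)
    print(greedy_search(u, y))
```

A longer horizon would replace the single loop over candidate splits by a nested search of depth $n^*$, which is where the combinatorial cost discussed in the next section comes from.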

�� Heuristic Search Algorithm

Clearly� the performance of the algorithm is expected to improve as n� increases�but the computational complexity makes n� � � not feasible for any practicalproblem with � � desktop computer technology� even if the local model structureset L contains as few as � or � possibilities�Example� Consider the problem of identifying a state�space model of the formx�t �� f�x�t�� where dim�x� � �� and we apply local models of the form�BBB�

x��t ��x��t ��

��x��t ��

�CCCA �

�BBB�a�a��a�

�CCCA �BBB�a�� a�� a��a�� a�� a��

� � ��

a�� a�� a��

�CCCA�BBB�x��t�x��t��

x��t�

�CCCA


Each of the �� parameters can be replaced by a structural zero� which gives a setof local linear model structure with � � �� elements� On the other hand� evenif there is only one possible local model structure to choose among� the number ofpossible decompositions into no more than �ve regimes is

.S� �� .S� � � dN� ��dN�� *�dN��

�*�dN��

For d � � and N� � �� this is approximately � � �� candidate decompositions�With n� � �� the model structure set is considerably reduced� In particular�n� � � gives about �� decompositions to search among� n� � � gives about ��while n� � � gives � candidate decompositions�

�

Because of the combinatorial nature of the model structure set, it is clearly of interest to implement some heuristics that cut down on the computational complexity without sacrificing too much of the optimality of the algorithm. As we have seen, the number of candidate decompositions at each step in the search may be large. To reduce the number of candidates, we suggest applying the following heuristics in the "locally exhaustive" search at the second step in the search algorithm.

Heuristic 1. At each level in the search tree, proceed with only the most promising candidates.

The best candidates are of course not known a priori, so there is always a possibility that this may lead to a sub-optimal model. We suggest proceeding with the best decomposition for each of the possible splitting dimensions. Instead of trying to find the best candidates, one can often more easily single out the least promising candidate decompositions.

Heuristic 2. Discard the candidate decompositions that give an increase in the criterion from one level in the search tree to the next.

The number of remaining candidates will typically be larger than when using Heuristic 1, but the chance of discarding the optimal decomposition may be smaller. Some candidate decompositions may give rise to regimes where no substantial amount of data is available, and may therefore be classified a priori as not feasible.

Heuristic 3. Discard candidate decompositions that lead to regimes with fewer data points relevant to the regime than the number of degrees of freedom in the corresponding local model structure and local model validity function.

Counting the number of relevant data points associated with each regime is controversial, since the interpolation functions overlap. We use the heuristic count

$$l_i = \sum_{t=1}^{l} w_i(z(t))$$

which has the attractive property

$$\sum_{i \in I_N} l_i = l.$$
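The heuristic count is easy to check numerically. The sketch below assumes Gaussian validity functions and a scalar operating point. The centres, widths and the helper name `interpolation_weights` are illustrative choices of ours.

```python
import numpy as np


def interpolation_weights(z, centres, sigmas):
    """w_i(z) = rho_i(z) / sum_j rho_j(z) with Gaussian rho_i (assumed form)."""
    z = np.asarray(z, dtype=float)[:, None]
    rho = np.exp(-0.5 * ((z - centres) / sigmas) ** 2)
    return rho / rho.sum(axis=1, keepdims=True)


# Hypothetical data: 50 operating points and two overlapping regimes.
z = np.linspace(0.0, 1.0, 50)
w = interpolation_weights(z, centres=np.array([0.2, 0.8]),
                          sigmas=np.array([0.2, 0.2]))

counts = w.sum(axis=0)          # l_i, the soft number of points per regime
print(counts, counts.sum())     # the counts add up to l = 50
```

Because the weights sum to one at every $z(t)$, the counts always add up to $l$ regardless of how much the regimes overlap. Heuristic 3 then amounts to comparing each $l_i$ with the degrees of freedom of the corresponding local model.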

Heuristic 4. Use a backward or forward stepwise regression procedure to handle local model structure sets $L$ of combinatorial nature, as implemented by Sugeno and Kang.

Related to the example above� one should start with no structural zeroes� andthen add one structural zero at a time� choosing the one that gives most signi�cantimprovement in the prediction performance� This should give less than �� candidate model structures� which is quite di�erent from ��


�� Parameter Identification

For each model structure $S$, the prediction error criterion $J_S(\theta)$ is used for parameter identification. Let us briefly concentrate on parameter identification algorithms for models with a fixed structure, i.e. solving the problem (��).

First, consider the simplest possible case, when the generalized output vector $\psi(t)$ and generalized input vector $\varphi(t)$ are explicitly given by the data. In addition, we assume that all the local model structures are linearly parameterized, i.e. can be written in the linear regression form

$$\hat f_i(\varphi(t); \theta_i) = \Phi_i^T(\varphi(t))\,\theta_i$$

for some matrix $\Phi_i(\varphi(t))$. Using (��), the global model is

$$\psi(t) = \sum_{i \in I_N} \Phi_i^T(\varphi(t))\, w_i(z(t))\,\theta_i + e(t)$$

which is also in the linear regression form. A standard off-line least squares algorithm, e.g. (Ljung), is applied to estimate the parameters.
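To make the linear regression form concrete, the sketch below stacks the weighted local regressors into one design matrix and solves a single least squares problem. It assumes a scalar input $u$ that also serves as the operating point, affine local models $a_i + b_i u$, and Gaussian validity functions with a common width. The function name `fit_local_linear` and these choices are ours.

```python
import numpy as np


def fit_local_linear(u, y, centres, sigma):
    """Estimate (a_i, b_i) in y ~ sum_i w_i(u) (a_i + b_i u) by least squares."""
    rho = np.exp(-0.5 * ((u[:, None] - centres) / sigma) ** 2)
    w = rho / rho.sum(axis=1, keepdims=True)     # interpolation functions w_i
    X = np.hstack([w, w * u[:, None]])           # weighted local regressors
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n = len(centres)
    return theta[:n], theta[n:], X @ theta       # offsets, slopes, fitted output


if __name__ == "__main__":
    u = np.linspace(-2.0, 2.0, 80)
    a, b, y_hat = fit_local_linear(u, np.tanh(u), np.array([-1.0, 1.0]), 1.0)
    print(a, b, float(np.max(np.abs(y_hat - np.tanh(u)))))
```

The weights depend only on the data and the fixed regime decomposition, so they enter the design matrix as known numbers. That is why the parameters of all local models can be estimated jointly in one linear step.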

If any of the assumptions above is violated, a numerical search algorithm is applied to find the parameter estimate. This is simply because the predictor will be non-linearly parameterized. We use the Fletcher-Reeves-Powell conjugate gradient algorithm with a line search (Press, Flannery, Teukolsky and Vetterling), which requires computation of the gradient

$$\frac{\partial}{\partial\theta} J_S(\theta)$$

In the case of a NARMAX model based on local linear ARMAX models, this gradient can be effectively computed using a filter similar to the ARMAX case, e.g. Ljung. An explicit expression for the gradient can also be found for some other special cases. Otherwise, an approximation to (��) is found by numerical differentiation.

�� User Choices

The basis function $b(r; \sigma)$ with scaling parameter $\sigma$ has the purpose of providing a smooth interpolation between the local models. The basis function is assumed to have the property $b(r; \sigma) \ge 0$ for all $r \in R$ and $b(r; \sigma) \to 0$ as $|r| \to \infty$. Typical choices are bell-shaped kernel functions like the unnormalized Gaussian exp��r��. It may appear that the choice of this function has significant impact on the model. However, it is our experience that both the algorithm and the model's prediction performance are quite insensitive with respect to this choice, and the specification of this function does not require any prior knowledge about the system. What is more important is the choice of $\sigma$, which is controlled by the user-specified parameter $\gamma$.

�� STATISTICAL PROPERTIES ��

To compute the criterion $J_{GCV}$ or $J_{FPE}$, the effective number of parameters $p$ in the model structure must be known. If the choice of model structure is not based on the data $D_l$, then the effective number of parameters is

$$p = \sum_{i \in I_N} \dim(\theta_i)$$

in the case of linear regression. However, the proposed algorithm for model structure identification makes some use of the data $D_l$ for restricting the model structure set during the search. Hence, the $p$-value given by (��) will be too small. Counting the effective number of parameters in this case is controversial. We apply the heuristic

$$p = \kappa (N - 1) + \sum_{i \in I_N} \dim(\theta_i)$$

where $\kappa \ge 0$ is a heuristic constant, which can also be interpreted as a smoothing parameter, since a large $\kappa$ will put a large penalty on model complexity, and will therefore give a smooth model. A typical choice of $\kappa$ is between � and � (Friedman).

�� Statistical Properties

Consider the bias-variance decomposition (��). It is evident that a small bias in general requires a complex model structure with a large number of operating regimes, cf. Theorem � in Chapter �. On the other hand, a small variance requires a model structure that is simple, with few parameters compared to the number of observations $l$. "The perfect model" is characterized by both a small bias and variance, which appears to be impossible to achieve simultaneously for a small $l$. This is known as the bias-variance dilemma, cf. Fig. ��. However, as $l \to \infty$, it is possible to make the bias and variance vanish simultaneously. This will be discussed in section ��. For finite $l$, it is important to make the correct trade-off between bias and variance to find the best model structure, as we will discuss in section ��.

�� Asymptotic Properties

It is shown in Theorem � in Chapter � that both the bias and the variance will tend towards their smallest possible values, with probability one, provided

1. The prediction error criterion converges to its expectation, see (Ljung) for conditions under which this holds.

2. The estimate $J$ of the expected squared prediction error used for model structure identification converges to its expectation.

3. Global minima of the parameter and structure optimization problems are found with probability one.

4. The model set

$$\mathcal{M} = \{M = (S, \theta);\ S \in \mathcal{S},\ \theta \in \Theta_S\}$$

can be covered by a finite $\epsilon$-net.

It is shown in Chapter � that the use of a separate validation data sequence for model structure identification gives a $J$ that satisfies the second requirement. It is also known that FPE and AIC are slightly biased.

However, the main problem may be the third requirement, since neither the parameter optimization nor the structure optimization algorithms need result in global minima. An attractive feature of the model structure tree is that it appears to have not only a large number of multiple global minima, but also many close-to-optimal local minima. It is easy to see that the restriction of the search to any sub-tree of the model structure tree does not exclude any possible decomposition. The worst thing that can happen is that the number of decompositions may be somewhat larger than necessary, or alternatively that the partition may not be as fine as desired. This leads to sub-optimality for a finite amount of data, but not necessarily so asymptotically.

The fourth condition is somewhat technical, but does in general impose a restriction on the complexity of the model set. In practice, this is not a serious restriction, as discussed in Chapter �.

�� Finite Sample Accuracy

For practical problems� the �nite sample accuracy is obviously more interestingthan the asymptotic properties� Consider for example the problem of modelingthe unknown static system

y � �e�u�

sin�u� e ��

where e � N �� Using some unknown probability density for u� �� datasequences D�

�� D�� are generated� each containing l � �� realizations of the

mapping �� Running the identi�cation algorithm on each of these �� datasequences gives �� di�erent models� Some have the same structure� but havedi�erent parameters� while others have both di�erent structure and parameters�For example� using local linear models� the search algorithmwith n� � �� the GCVcriterion with � � �� the least squares algorithm� and Gaussian basis�function with� � � gives �� models as shown in Fig �� The number of regimes is typicallythree to �ve� but we observe from Fig� �� that the variability is fairly small�except in the regimes where data is sparse� Tests indicate that the variability maydecrease slightly as n� increases�

Usually� one has only one data sequence available� In one particular case� supposeonly D�

� is available� It is clearly of interest to compute some measure of theuncertainty of the model identi�ed on the basis of this data set� Since no moreindependent data is available� the best we can do is to reuse the data D�

�� Onee�ective data resampling technique is the bootstrap� e�g� �Efron and Tibshirani


[Figure: fitted models and observations plotted as $y$ (vertical axis, -0.5 to 1) against $u$ (horizontal axis, -3 to 5).]

Figure ��: The curves are �� models fitted using �� different data sets generated by the same system, each containing �� samples. The dots are some typical observations from the unknown static system $y = f(u) + e$.

[Figure: observations, fitted model, true system and confidence sets plotted as $y$ (vertical axis, -1 to 1.5) against $u$ (horizontal axis, -3 to 5).]

Figure ��: The �� dots are observations from the unknown static system. The solid curve is a model $\hat f(u)$ fitted using the �� observations, while the dashed curve is the "true system" $f(u)$. Approximate confidence sets estimated using bootstrapping with $B =$ �� are dotted.


��). The idea is to generate a large number $B$ of new data sequences of length $l$ by randomly choosing data from the available data sequence with replacement. This gives a set of $B$ models, and using this set of models, one may for example estimate the bias, variance, and confidence sets using Monte Carlo simulation. Fig. �� shows $f(u)$, $\hat f(u)$ and approximate confidence sets corresponding to

$$\hat f(u) - \widehat{\mathrm{bias}}\,\hat f(u) \pm \sqrt{\widehat{\mathrm{var}}\,\hat f(u)}$$

The use of bootstrapping for the estimation of statistics is somewhat more complicated for dynamic systems because one needs to preserve the time structure in the resamples, see the survey by Carlstein and also Chapter �.

A theoretical analysis of the finite sample properties is outside the scope of this work, although a modification of the results of Kavli and Weyer would be feasible. These results make use of the theory of Vapnik, which makes weak assumptions on the model set and underlying probability measure, but is based on the strong assumption that the observations are independent. Essentially, this theory gives potentially very conservative confidence sets for the prediction error, parameter estimates, and model structure.

�� Examples

�� Simulation Example: A Batch Fermentation Reactor (Revisited)

Consider the fermentation of glucose to gluconic acid by the micro-organism Pseudomonas ovalis in a well stirred batch reactor. The main overall reaction mechanism is described by




The first reaction is the reproduction of cells, consuming the substrate glucose and oxygen. The second reaction is the production of gluconolactone, again consuming glucose and oxygen. This reaction is enzyme-catalysed by the cells, while the last reaction forms the final product, gluconic acid. The model used to simulate the "true system" is the same as in section ��. The data sets used for identification and validation are also the same as in section ��.

Alternative 1: Semi-empirical Local Model Structure

We have specified only one possible local linear discrete-time state-space model structure

$$x(t+1) = a_i + A_i x(t)$$

with �� structural zeros in the $A_i$ matrices. These zeros follow directly from the reaction mechanism, using the assumption that glucose and oxygen are rate-limiting. We choose the operating point $z = (s_n, c_n)^T$, which captures these non-linearities and characterizes the operating conditions of the process with respect to local linear models, see also section ��.

Running the search algorithm with $n^* =$ �, the FPE criterion with $\kappa =$ �, the least squares estimator, and Gaussian basis function with $\gamma =$ �, results in a model with five operating regimes, and root average squared one-step-ahead prediction error (PE) on the validation data PE = ��. Restricting the number of operating regimes to three gives PE = ��, while an identified global linear model gives PE = ��. This clearly indicates that there exist significant non-linearities that have been captured by the two more complex models and not by the linear model. The five regimes are illustrated in Fig. ��. Perhaps the most interesting and

[Figure: operating space with axes $c_n$ and $s_n$ (ranges 0 to 0.7 and 0 to 1), regimes labelled Reg. 1 to Reg. 5, and a system trajectory annotated with the direction of time.]

Figure ��: The decomposition into five regimes using the simulated fermenter data, with a typical simulated system trajectory projected onto the $(c_n, s_n)$-plane.

attractive feature of the method is that the identified model can be interpreted in a natural way. The five regimes correspond to the following phases in the batch:

1. Initial phase.

2. Growth phase, where only the amount of micro-organisms is limiting the rate of the reactions.

3. Oxygen supply is rate-limiting.

4. Glucose is rate-limiting.

5. No glucose left, termination.

This gives a high-level qualitative description of the reactor. More low-level quantitative details on e.g. reaction kinetics can be added by examining the parameters of the local models. The model can be represented as the model tree shown in Fig. ��, see also (Strömberg et al.; Omohundro; Breiman, Friedman, Olshen and Stone). A simulation of a typical batch from the test set is shown in Fig.


[Figure: model tree branching on oxygen concentration and glucose concentration (SMALL, MEDIUM, LARGE), with a local model of the form $x^+ = a_i + A_i x$ at each of the five leaves.]

Figure ��: Model tree representation of fermenter model.

�� using the model with five local models, and the identified global linear model for comparison. Clearly, the results favor the non-linear model. Comparison with similar models with three or four regimes, designed by hand on the basis of system knowledge (Johansen and Foss), see also section ��, indicates that their performance is comparable. Results of simulations using various heuristics on the problem, when $N =$ � is fixed a priori, are shown in Fig. ��. The heuristics give models that are only marginally suboptimal. Moreover, $n^* =$ � gives about �% improvement over $n^* =$ �. Even if the variability of the models is small as measured by the chosen criterion, the structural difference appears to be large from the qualitatively different regimes in Fig. ��. However, from the typical trajectory projected onto the operating space, it is evident that due to the evolution of the batch, all three models are similar, since the three regimes correspond to the initial, intermediate, and final phases of the batch in all three cases.

Basically, the applied a priori knowledge is the overall reaction mechanism, used for structuring the $A_i$ matrices and for selection of the two variables used to characterize the operating regimes. It must be noted that the algorithm has also been applied without this knowledge, i.e. with full $A_i$ matrices and operating point $z = x$, resulting in only a slight decrease in the prediction accuracy of the identified model. In this case, it is interesting to observe that among the five components of the operating point, the algorithm chooses only �n and $c_n$ for decomposition into regimes. As noted in section ��, due to the batch nature of the process operation, the information in the state �n is highly redundant with the information in $s_n$. This is also evident from Fig. ��, which clearly shows the collinearity between these variables. Hence, the algorithm is forced to make a somewhat arbitrary choice about which variable to use for decomposition, a fact that makes the interpretation of the model more difficult. This problem is also commented upon in view of the MARS algorithm by De Veaux et al.


[Figure: three panels plotted against time [h] (0 to 10). The top and bottom panels show the state trajectories, with curves labelled $p_n$, �n, $l_n$, $c_n$ and $s_n$; the middle panel shows the five interpolation weights $w_i$.]

Figure ��: Top: Trajectories with circles are generated by the "true system", while the other trajectories are simulations of the model based on five local linear models. Middle: The relative weight of the five local linear models in the interpolation. Bottom: Simulation of an identified global linear model.


[Figure: three panels in the $(c_n, s_n)$-plane (axis ranges 0 to 0.7 and 0 to 1), titled "Heuristic 1", "No Heuristics" and "Heuristic 2", each annotated with a value of $n^*$. The prediction errors printed in the panels are PE = 0.014734, PE = 0.014009 and PE = 0.013989.]

Figure ��: The figure shows the decompositions into three regimes found using different heuristics on the simulated fermenter data.

This only emphasizes the important fact that the success of empirical modeling is heavily dependent on the information in the empirical data, and that data deficiencies should to the highest possible extent be compensated for using prior knowledge.

Alternative 2: Mechanistic local model structure

A set of mass balances for this reactor can be found using knowledge of the reaction mechanism, stoichiometry, and oxygen uptake mechanism (Ghose and Ghosh).

+� � �+l � ��

Yljpkpl

+p � kpl+s � � �

Ysj� ��

Ylj��

+c � kla�c� � c��

Y� � � �

Y��

��


The internal variables � � and kp are reaction rates for the three reactions� c� isthe maximum dissolved oxygen concentration� and kla is a mass transfer coe��cient that describes the uptake of oxygen� The yield coe�cients Y�� Y�� Yljp� Ylj�and Ysj are determined by the stoichiometry� and assumed to be known� Themodel structure is not complete unless expressions for the abovementioned inter�nal variables and parameters are known� Without any more prior knowledge� onemay use empirical relations of the form

� �c� s�kp � kp�c� s�� c� s�c� � constantkla � constant

��

The dependence on variables like temperature� pH� and agitation speed is ignoredin this example� but can be included in the same way� In summary� the modelstructure consists of the incomplete mechanistic model structure �� and someunknown empirical functions �� Let us now consider how this modeling prob�lem can be solved within the operating regime based modeling framework� On thebasis of �� and �� we construct a local semi�mechanistic model structuresby substituting local empirical approximations

i�c� s� � �i�� i��c �i�s

kpi�c� s� � �i��

�i�c� s� � �i�� i��c �i��s

where the unknown local parameters �i�� i�� correspond to local model i� Thevariable kp is approximated locally by a constant� while � and are approximatedlocally by linear functions� The reason for this di�erence is that it is expected thatkp will show considerably less variations than the other two variables� In addition�the model consists of two unknown global parameters

c� � ��

kla � ��

The model is found with the aid of the structure identification algorithm based on the FPE criterion with $n^* =$ �, and the parameters are estimated using the Fletcher-Reeves-Powell conjugate gradient algorithm with a line search (Press et al.). The model equations are integrated numerically using an adaptive Runge-Kutta method (Press et al.). We stop the algorithm when four regimes are identified, cf. Fig. ��. The prediction performance of the model on an independent, but typical, batch is illustrated in Fig. ��, from which we see that the model has captured the major non-linear effects. Since the reaction rates appear linearly in the model equations, we are able to study the identified kinetic model directly, because


[Figure: operating space with axes $s$ and $c$ (ranges 0 to 50 and 0 to 6 × 10⁻³), regimes labelled Reg. 1 to Reg. 4, and a batch trajectory annotated with the direction of time.]

Figure ��: The identified decomposition into four regimes in the fermenter simulation example. The trajectory corresponds to a typical batch, projected on the operating space.

" �c� s� ��Xi��

" i�c� s�wi�c� s�

"kp�c� s� ��Xi��

"kpi�c� s�wi�c� s�

"��c� s� ��Xi��

"�i�c� s�wi�c� s�

These functions are illustrated in Fig. ��, which also contains the "true kinetic model" used in the simulator. Comparing the identified functions with the "true" ones clearly shows that the identified model is a close approximation to the "true model" in the parts of the operating space that are densely populated with empirical data, while there may be some mismatch elsewhere, cf. Fig. ��. Moreover, the identified kinetic model suggests a possible interpretation of the four regimes. It is evident that neither oxygen nor glucose is rate-limiting in the initial phase of the batch corresponding to Regimes � and �, cf. Fig. ��. In these regimes it is only the biomass concentration that limits the reaction rates. On the other hand, oxygen is rate-limiting through Regime �, and at the end of this regime and through Regime �, glucose is rate-limiting. It should be mentioned that a similar approach was taken in (Psichogios and Ungar; Aoyama and Venkatasubramanian), where the kinetic model was represented by a neural network. The general properties of these models are quite similar to the one presented here.


[Figure: three panels plotted against time [h] (0 to 10, vertical range 0 to 1). The top and bottom panels show the state trajectories, with curves labelled $s_n$, �n, $c_n$, $p_n$ and $l_n$; the middle panel shows the interpolation weights $w_i$ of the four local models.]

Figure ��: Top: Trajectories with circles are generated by the "true system", while the other trajectories are simulations of the identified model with four operating regimes and mechanistic local models. Middle: The relative weight of the various local models in the interpolation. Bottom: A simulation with an identified model with only one operating regime (for the purpose of comparison).


[Figure: two rows of three surface plots over the operating space spanned by $c$ and $s$ (axis ranges 0 to 5 × 10⁻³ and 0 to 50), one plot per kinetic function, including $k_p$.]

Figure ��: Top: The true kinetic model used in simulator. Bottom: Identified kinetic model.

�� Simulation Example: A pH-neutralization Tank

Consider a pH-neutralization tank, where there are three influent streams and one effluent stream:

• Influent acid stream $Q_1$ (HNO$_3$)

• Influent buffer stream $Q_2$ (NaHCO$_3$)

• Influent base stream $Q_3$ (NaOH and traces of NaHCO$_3$)

• Effluent stream $Q_4$

For our simulation study we use the model of Hall and Seborg to simulate the "true system". It is based on the assumptions of perfect mixing, constant density, fast reactions and completely soluble ions. Only the following chemical reactions are modeled

$$\mathrm{H_2O} \rightleftharpoons \mathrm{OH^-} + \mathrm{H^+}$$
$$\mathrm{H_2CO_3} \rightleftharpoons \mathrm{HCO_3^-} + \mathrm{H^+}$$
$$\mathrm{HCO_3^-} \rightleftharpoons \mathrm{CO_3^{2-}} + \mathrm{H^+}$$

because HNO$_3$ is a strong acid and NaOH is a strong base. Chemical reaction invariants for the process are (Gustafsson and Waller)

$$W_a = [\mathrm{H^+}] - [\mathrm{OH^-}] - [\mathrm{HCO_3^-}] - 2[\mathrm{CO_3^{2-}}]$$
$$W_b = [\mathrm{H_2CO_3}] + [\mathrm{HCO_3^-}] + [\mathrm{CO_3^{2-}}]$$


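Using the equilibrium equations

$$K_{a1} = \frac{[\mathrm{HCO_3^-}][\mathrm{H^+}]}{[\mathrm{H_2CO_3}]}, \qquad K_{a2} = \frac{[\mathrm{CO_3^{2-}}][\mathrm{H^+}]}{[\mathrm{HCO_3^-}]}, \qquad K_w = [\mathrm{H^+}][\mathrm{OH^-}]$$

an implicit equation for $[\mathrm{H^+}]$ is found

$$W_a = [\mathrm{H^+}] - \frac{K_w}{[\mathrm{H^+}]} - W_b \, \frac{K_{a1}/[\mathrm{H^+}] + 2K_{a1}K_{a2}/[\mathrm{H^+}]^2}{1 + K_{a1}/[\mathrm{H^+}] + K_{a1}K_{a2}/[\mathrm{H^+}]^2}$$

Solving this equation for $[\mathrm{H^+}]$, we can find $\mathrm{pH} = -\log_{10}[\mathrm{H^+}]$. A total mass balance for the tank gives

$$A\dot{h} = Q_1 + Q_2 + Q_3 - c\sqrt{h - h_0}$$

where $c$ is a valve constant, $A$ is the tank cross-section area, $h$ is the freely varying tank level, and $h_0$ is the vertical distance from the bottom of the tank to the outlet. Component balances give

$$hA\dot{W}_a = Q_1(W_{a1} - W_a) + Q_2(W_{a2} - W_a) + Q_3(W_{a3} - W_a)$$

$$hA\dot{W}_b = Q_1(W_{b1} - W_b) + Q_2(W_{b2} - W_b) + Q_3(W_{b3} - W_b)$$

where $W_{ai}$ and $W_{bi}$ are chemical reaction invariants of the $i$-th stream. The variables are defined in Table ��.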
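The implicit equation for the hydrogen ion concentration can be solved numerically. The following is a minimal sketch, not the simulator used in the thesis: it uses bisection on pH, relying on the fact that the residual of the implicit equation is monotone in $[\mathrm{H^+}]$ for $W_b \geq 0$. The function names and the default constants (textbook values for the carbonate system, pKa1 = 6.35, pKa2 = 10.25, pKw = 14) are assumptions for illustration and are not taken from the table of nominal values.

```python
def residual(h_plus, wa, wb, ka1, ka2, kw):
    """Right-hand side minus left-hand side of the implicit equation for [H+]."""
    # Average negative charge carried per mole of total carbonate (between 0 and 2).
    charge = (ka1 / h_plus + 2.0 * ka1 * ka2 / h_plus**2) / (
        1.0 + ka1 / h_plus + ka1 * ka2 / h_plus**2
    )
    return h_plus - kw / h_plus - wb * charge - wa


def solve_ph(wa, wb, pka1=6.35, pka2=10.25, pkw=14.0, tol=1e-10):
    """Return the pH consistent with the reaction invariants wa and wb (mol/l).

    The default equilibrium constants are assumed textbook values.
    """
    ka1, ka2, kw = 10.0 ** (-pka1), 10.0 ** (-pka2), 10.0 ** (-pkw)
    lo, hi = -2.0, 16.0  # generous pH bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The residual increases with [H+], i.e. decreases with pH.
        if residual(10.0 ** (-mid), wa, wb, ka1, ka2, kw) > 0.0:
            lo = mid  # [H+] too large, so the true pH is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_ph(0.0, 0.0)` corresponds to pure water, and adding buffer (`wb > 0`) at a fixed negative `wa` pulls the pH down from the strongly basic value obtained without buffer.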

The streams $Q_1$ and $Q_2$ are fixed, while $Q_3$ is controlled. The data used for model identification is shown in Fig. ��a, while the data sequence used for validating the model is shown in Fig. ��b. Both data sequences are noise-free, and the sampling interval is �� s. Clearly, the validation data covers a significantly wider operating range than the identification data, which is typical for many applications.

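Alternative 1: A Simple Mechanistic Model

Let us first consider a very simple model of the tank, based on neglecting the buffer stream in the model. Hence, the streams are

• Influent acid stream $Q_1$ (HNO$_3$)

• Influent base stream $Q_3$ (NaOH)

• Effluent stream $Q_4$

The only reaction considered in this model is

$$\mathrm{H_2O} \rightleftharpoons \mathrm{OH^-} + \mathrm{H^+}$$

A reaction invariant is $W = [\mathrm{H^+}] - [\mathrm{OH^-}]$, and an implicit equation for $[\mathrm{H^+}]$ is

$$W = [\mathrm{H^+}] - \frac{K_w}{[\mathrm{H^+}]}$$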


Table ��: Symbols, constants and variables used in models.

Symbol        Variable                                 Nominal value
A             Tank area                                �� cm^2
h             Tank level                               �� cm
h_0           Tank outlet level                        � cm
Q_1           Acid flow-rate                           �� ml/s
Q_2           Buffer flow-rate                         �� ml/s
Q_3           Base flow-rate                           �� ml/s
[HNO3]_1      Acid concentration in acid stream        �� mol/l
[NaHCO3]_3    Buffer concentration in base stream      �� mol/l
[NaOH]_3      Base concentration in base stream        �� mol/l
[NaHCO3]_2    Buffer concentration in buffer stream    �� mol/l
c             Valve constant                           ml/(s sqrt(cm))
pK_a1         -log10 K_a1                              ��
pK_a2         -log10 K_a2                              ��
pK_w          -log10 K_w                               ��
W_a1          [HNO3]_1                                 �� mol/l
W_a2          -[NaHCO3]_2                              �� mol/l
W_a3          -[NaHCO3]_3 - [NaOH]_3                   �� mol/l
W_b1                                                   �
W_b2          [NaHCO3]_2                               �� mol/l
W_b3          [NaHCO3]_3                               �� mol/l
W_1           [HNO3]_1                                 �� mol/l
W_2                                                    �
W_3           -[NaOH]_3                                �� mol/l

The mass balances are

$$A\dot{h} = Q_1 + Q_3 - c\sqrt{h - h_0}$$

$$hA\dot{W} = Q_1(W_1 - W) + Q_3(W_3 - W)$$

This model is validated against the validation data by a simulation of the response to the validation data input sequence in Fig. ��a. We observe that the model is accurate for high pH values and low pH values, but inaccurate for intermediate values of pH. In fact, this model is a reasonable model of almost any neutralization process at high and low pH values. Of course, what is high and low in this context will depend on the particular process.
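The simplified model is small enough to simulate in a few lines. The sketch below assumes the two mass balances above, with the outlet level written as `h0`, and solves the implicit equation $W = [\mathrm{H^+}] - K_w/[\mathrm{H^+}]$ in closed form as the positive root of a quadratic. The explicit Euler step, the function names and all numerical values in the usage note are illustrative assumptions, not the thesis implementation or its nominal parameters.

```python
import math


def ph_from_invariant(w, kw=1e-14):
    """pH from W = [H+] - Kw/[H+], i.e. the positive root of H^2 - W*H - Kw = 0."""
    root = math.sqrt(w * w + 4.0 * kw)
    if w >= 0.0:
        h_plus = 0.5 * (w + root)
    else:
        # Algebraically equivalent form that avoids cancellation for negative W.
        h_plus = 2.0 * kw / (root - w)
    return -math.log10(h_plus)


def euler_step(h, w, q1, q3, w1, w3, dt, area, c, h0):
    """One explicit Euler step of the level h and the reaction invariant W."""
    outflow = c * math.sqrt(max(h - h0, 0.0))
    dh = (q1 + q3 - outflow) / area
    dw = (q1 * (w1 - w) + q3 * (w3 - w)) / (h * area)
    return h + dt * dh, w + dt * dw
```

With hypothetical values such as `area=200.0, c=8.0, h0=5.0`, equal acid and base flows `q1 = q3 = 12.0` and opposite invariants `w1 = 0.003, w3 = -0.003`, the state `h = 14.0, w = 0.0` is a steady state and `ph_from_invariant(0.0)` gives a neutral pH.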

Alternative 2: An ARX Model

Next, the ARX model

pH�t �� pH�t�� Q�t� � ��

is identified using the least squares method and the identification data. A simulation of the model's response to the validation data input sequence is shown in Fig. ��b. The prediction error is clearly smallest for intermediate pH values, while the predictions are completely wrong for large pH values. A NARX model based on a number of local ARX models, identified using the structure identification algorithm, gives a better fit for intermediate pH values, but even less convincing predictions when extrapolated to regimes with large pH values.

Figure ��: Simulated data sequences. a) Data sequence used for model identification. b) Data sequence used for model validation. Each panel shows the base flow-rate and pH against time [min].
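As an illustration of the least squares step, the sketch below fits a first-order ARX model with a constant term, pH(t+1) = a pH(t) + b Q(t) + c, by solving the normal equations. This particular model order and the parameter names `a`, `b`, `c` are assumptions made for the example, since the coefficients of the thesis model are not legible here; the helper names are likewise hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx(ph, q):
    """Least squares estimate of (a, b, c) in pH(t+1) = a*pH(t) + b*q(t) + c."""
    rows = [(ph[t], q[t], 1.0) for t in range(len(ph) - 1)]
    targets = ph[1:]
    # Normal equations: (X^T X) theta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(xtx, xty)
```

On noise-free data generated by such a model, `fit_arx` recovers the generating parameters up to rounding, provided the input sequence is sufficiently varied. Being linear in pH, a model of this form cannot reproduce the strongly non-linear titration curve, which is consistent with the poor extrapolation reported above.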


Figure ��: Simulations of the three models' responses to the validation data input sequences. Solid curves are the "true system", while dashed-dotted curves are model simulations. a) Simplified mechanistic model. b) Empirical ARX model. c) Hybrid model (ARX + simplified mechanistic). Each panel shows pH against time [min].

Alternative 3: A Hybrid Model

Finally, consider the case when the structure identification algorithm is allowed to choose between the local ARX model structure and the simplified local mechanistic model structure. The hybrid model is a discrete-time model, and the operating point is chosen as $z(t) = \mathrm{pH}(t)$. The continuous-time mechanistic sub-model is integrated numerically using the explicit Euler method over the �� s sampling interval (unit time-step of the ARX model). The problems discussed in section �� regarding transition between different state-spaces in the different regimes arise here. It is possible to compute the model state $W$ directly from (��) and the pH value, i.e. there exists a one-to-one mapping from the output to this state variable. The state $h$ appears to be somewhat more difficult to deal with, since there exists no such mapping for this state. Fortunately, $h$ is globally observable from the inlet and outlet flow-rates, and by continuing to integrate (��) even when the mechanistic model is not relevant at the current operating point, the problem is resolved.
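The one-to-one mapping from the output to the state $W$ of the simplified mechanistic model follows from the invariant $W = [\mathrm{H^+}] - K_w/[\mathrm{H^+}]$ with $[\mathrm{H^+}] = 10^{-\mathrm{pH}}$. A minimal sketch, with a hypothetical function name and an assumed default of pKw = 14:

```python
def invariant_from_ph(ph, pkw=14.0):
    """Reaction invariant W = [H+] - Kw/[H+] evaluated at a given pH."""
    h_plus = 10.0 ** (-ph)
    return h_plus - 10.0 ** (-pkw) / h_plus
```

Since $[\mathrm{H^+}]$ decreases and $K_w/[\mathrm{H^+}]$ increases with pH, the returned value is strictly decreasing in pH, so the state can be re-initialized from a measured or predicted pH whenever the model enters a regime where the mechanistic local model is active.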

We use the structure identification algorithm based on $J_{BOOT}$, where $p_{future}(\mathrm{pH})$ is chosen to reflect a uniform distribution of pH values between � and �� and $p_{past}(\mathrm{pH})$ is a normal distribution estimated using the identification data, cf. Chapter �. Notice that we have introduced a simplification by restricting the probability distribution to depend on the operating point only. The algorithm identifies a hybrid model that consists of three local models. The first is the simple mechanistic model that corresponds to an operating regime with high pH value, i.e. above pH equal to �. The second local model is an empirical ARX model that is valid at intermediate pH values, i.e. between �� and �. The third local model is the mechanistic model that corresponds to an operating regime with low pH value. The relative weight in the interpolation is shown in Fig. ��, and a simulation of the model's response to the validation data input sequence is shown in Fig. ��c. We see that the prediction accuracy has improved significantly for most operating conditions, but we also see that there is room for improvement by tuning the regimes and local model validity functions.

Figure ��: Interpolation functions for the hybrid model. The three interpolation functions are plotted against pH and are labeled Mechanistic Model, Empirical Model and Mechanistic Model.
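The interpolation between local models can be sketched as follows. The example assumes Gaussian local model validity functions of the operating point, normalized so that the weights sum to one; the regime centers and widths in the usage note are hypothetical and are not the values identified in the thesis.

```python
import math


def interpolation_weights(ph, centers, widths):
    """Normalized weights of the local models at operating point ph."""
    validity = [
        math.exp(-0.5 * ((ph - c) / s) ** 2) for c, s in zip(centers, widths)
    ]
    total = sum(validity)  # assumed non-zero, i.e. ph is not far from every regime
    return [v / total for v in validity]


def hybrid_prediction(ph, local_predictions, centers, widths):
    """Weighted combination of the predictions of the local models."""
    weights = interpolation_weights(ph, centers, widths)
    return sum(w * y for w, y in zip(weights, local_predictions))
```

With, say, `centers=(3.0, 7.0, 11.0)` and `widths=(1.5, 1.5, 1.5)`, the middle (empirical) local model dominates near pH 7, while the two outer (mechanistic) local models take over at low and high pH.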


�� Experimental Results: Hydraulic Manipulator

A data sequence with �� samples, logged from a hydraulic TR�� robot (Kavli ��) from ABB Trallfa Robotics A/S, was used to find a model describing the inverse dynamics

$$\tau(t) = f(q(t), \dot{q}(t), \ddot{q}(t)) + e(t)$$

of a joint of this robot, where $\tau(t)$ is the control signal to the servo valve, $q(t)$ is the joint position, and $e(t)$ is the equation error. The joint position was logged at a sampling rate of �� Hz while the robot was moving along a randomly generated trajectory. The joint velocity and acceleration were estimated by low-pass filtering and numerical differentiation. The predictions of an estimated linear model were subtracted from the data to emphasize the non-linearities. According to Kavli (��), the non-linearities are mainly due to variations in the momentum arm of the hydraulic cylinder, non-linear damping, and non-linear pressure gain characteristics due to varying flow-rates in the servo valve. In addition, �� independent samples were used for validating the model. A number of models were identified, based on the structure identification algorithm, a least squares parameter estimation algorithm, local linear model structure, and a Gaussian basis-function with � = ��. The results are summarized in Table ��, where they are compared to the spline models of Kavli (��), the algorithm for local model identification of Murray-Smith (��a), the MARS algorithm (Friedman ��), and some neural network models reported by Carlin, Kavli and Lillekjendlie (��). The table shows that the structure identification algorithm is able to find an adequate model with a small number of parameters while maintaining the high accuracy of the models found by the other empirical modeling algorithms. In all cases, only the parameters corresponding to local model parameters or basis-function coefficients are counted in the table. Notice that according to the FPE criterion with � = �, the data sequence allows more degrees of freedom to be added to the identified model structure. This was not pursued due to the computational complexity. The operating point was chosen as $z = (q, \dot{q}, \ddot{q})$, although complementary identification experiments showed that $z = (\dot{q}, \ddot{q})$ was sufficient to capture most of the non-linearities.
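The models in the comparison table are ranked by the NRMSE criterion, which the table caption defines as the square root of the ratio of the average squared one-step-ahead prediction error to the variance of the output on independent test data. A minimal sketch of that definition is given below; the function name is hypothetical, and the use of the population variance (division by the number of samples) is an assumption.

```python
import math


def nrmse(outputs, predictions):
    """Normalized root mean squared one-step-ahead prediction error."""
    n = len(outputs)
    mean_sq_error = sum((y - p) ** 2 for y, p in zip(outputs, predictions)) / n
    mean = sum(outputs) / n
    variance = sum((y - mean) ** 2 for y in outputs) / n
    return math.sqrt(mean_sq_error / variance)
```

With this normalization, a model that always predicts the mean of the test output scores 1, and a perfect predictor scores 0, so the criterion measures the fraction of the output variation left unexplained.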

�� Discussion of Examples

In the fermenter example, we have identified operating regime based models using 1) a semi-empirical local linear model structure with some structural zeros, based on a linearized mass balance, and 2) a mechanistic local model structure based on a mass balance with simplified empirical equations for the unknown reaction rates. The results were qualitatively similar, with roughly the same prediction performance and interpretation of the identified regimes. With the mechanistic local model structure it was possible to find a global approximation to the reaction rates directly from the identified local reaction rate equations. A study of these equations gives significant understanding of the reaction mechanisms and can guide the development of a mechanistic kinetic model. In other words, the hybrid model may be a useful intermediate step in the search for a mechanistic model. Notice


Model     Comments                                                      Num. Param.   NRMSE
ASMOD‡    Quadratic Spline Basis                                        �� 0
MARS                                                                    �� 0
Local     Local linear models, � � �� n� � �� Heuristics �� and �        � � 0
Local     Local linear models, � � �� n� � �� Heuristic �                � � 0
Local     Local linear models                                           NA � 0
MARS                                                                    �� 0
Local     Local linear models, � � �� n� � �� Heuristics �� and �        �� 0
Local     Local linear models, � � �� n� � �� Heuristic �                �� 0
RBF†      Gaussian radial basis-functions                               �� 0
MARS                                                                    �� 0
ASMOD‡    Quadratic Spline Basis                                        � �� 0
Local     Local linear models, � � �� n� � �� Heuristics �� and �        �� 0
NN†       Sigmoidal Neural Network                                      �� 0
Local     Local linear models, � � �� n� � �� Heuristic �                �� 0
NN†       Sigmoidal Neural Network                                      �� 0

Table ��: Results of applying various identification algorithms on the hydraulic manipulator joint data. The result marked with � is obtained with the identification algorithm of Murray-Smith (��a). Results marked with † and ‡ are from (Carlin, Kavli and Lillekjendlie ��) and (Kavli ��), respectively. The NRMSE criterion is defined as the square root of the ratio of the average squared one-step-ahead prediction error to the variance of the output, using the independent test data.

that in this example, the form of the models is equivalent to that of a mechanistic model, and therefore does not impose any serious restrictions on the applicability. The underlying modeling principle is quite general. First one derives a set of balance equations with a number of internal variables that are hard to model. Such variables are typically related to reaction kinetics, thermodynamics, fluid flow, and mass- and heat-transfer. Since such variables must often be modeled empirically anyway, we suggest either simple local approximations to these relations, or simple local approximations of the balance equations themselves. If a sufficiently large amount of empirical data is available, this procedure may give a transparent, reliable and inexpensive model.

The pH neutralization example shows that the proposed model structure identification algorithm is able to identify a sensible hybrid model structure when allowed to choose between an oversimplified local mechanistic model structure and a local linear empirical model structure. The algorithm utilizes the a priori information that the model will be applied under a significantly wider range of operating conditions than reflected in the data used for model identification. The identified model consists of local mechanistic models in the regimes where there are no data, and local empirical models in regimes that contain sufficiently large amounts of data. The hybrid model is therefore able to fit the data well, like an empirical model, and to provide reliable extrapolation, like a mechanistic model. This example clearly illustrates the power of the operating regime based modeling framework and the structure identification algorithm. In addition to a data sequence, only rather elementary process knowledge is applied. Constructing a complete mechanistic model that takes into consideration all possible chemical components in the tank can be a resource-demanding task in examples more complicated than this one. On the other hand, a good empirical model would require a larger amount of empirical data covering a significantly wider range of operating conditions. In some cases, providing such additional data could require extra resources. Clearly, the structure identification algorithm is not necessary for a problem as simple as this one, and regimes chosen "by hand" lead to a similar model (Foss and Johansen ��).

In the hydraulic manipulator example, the large number of regimes makes the interpretation of the empirical model more difficult than in the previous example. The model should be viewed as a black box. Hence, this example mainly serves as a benchmark showing that the accuracy of the operating regime based modeling approach is comparable to that of some of the most popular empirical modeling algorithms from the literature.

There are more examples available in the literature (both simulated and experimental) that illustrate the use of various system identification algorithms and local empirical models, typically local linear regression or local ARX models. These include river flow modeling (Sugeno and Kang ��), environmental modeling (Nakamori and Ryoke ��), multi-layer incinerator modeling (Sugeno and Kang ��, Nakamori et al. ��), a converter in a steel-making process (Takagi and Sugeno ��), an aluminum roll mill (Murray-Smith ��a), and chemometrics (Næs and Isaksson ��, Næs ��). In most of these examples it is shown that the model performs better than simpler empirical models, but the models are generally not compared to other complex empirical or mechanistic models. Of course, the same kind of argument can be used against some of the examples in this thesis.

�� Discussion

�� A Priori Knowledge: What is required, and what can be incorporated?

The system knowledge required with the proposed approach is quite reasonable. First of all, an operating point space Z is required. In many cases, it is possible to choose Z equal to a subspace or sub-manifold of the input space, as discussed in Chapters � and �. The design of Z need not be based solely on a priori knowledge, but can in addition consider the distribution of the data $D_l$. Quite often, there are collinearities or correlations in the data, so that $D_l$ can be embedded in a subspace or sub-manifold of considerably lower dimension than the input space. In that case, z need not be of higher dimension than this embedding. As we have seen, some system knowledge will often make it possible to reduce the dimension of z considerably. This is important, since it may reduce the complexity of the model, improve its transparency, and also considerably reduce the computational complexity of the structure identification algorithm.

A set of local model structure candidates must be specified. If no a priori knowledge exists to support one choice over another, one will typically choose local linear model structures of various orders, possibly with structural zeros, as the default, since linear models are well understood and possible to interpret. Moreover, a linear model will always be a sufficiently good approximation locally, provided the system is smooth and the regimes are small enough. On the other hand, if there is substantial a priori knowledge available in terms of mechanistic local model structures, these can be included. Such local model structures may for example be based on simplified mass- and energy-balances.

�� A Posteriori Knowledge: What can be extracted from the model?

The purpose of a model can be diverse, e.g. system analysis, design, optimization, prediction, control, or diagnosis. In many applications, it is important that the model can be easily interpreted and understood in terms of the system mechanisms. With empirical models, which are often based on black-box model representations, this is often a hard or impossible task. However, the approach presented here gives transparent empirical models because

• local models are simple enough to be interpreted,

• the operating regimes are a qualitative high-level description of the system that is close to engineers' and operators' knowledge representation.

Finally, suppose the local model structures are all linear. We can examine the resulting operating regimes, and observe which variables cause non-linear behavior. Furthermore, by examining the parameter variations between local models corresponding to neighboring regimes, or examining the relative size of the regimes, we may get an indication of how strong the non-linearities are in the different regimes.

�� Related Work

The proposed identification algorithm has a number of relatives. Probably the most well-known algorithm is the MARS algorithm of Friedman (��). This is a local search algorithm (n� = �) that searches for a set of natural spline basis-functions. The MARS algorithm also contains a model reduction part that reduces the model structure to an optimal size. A similar algorithm is the ASMOD algorithm of Kavli (��) with tensor product B-spline basis-functions. The structure identification part of this algorithm is based on stepwise modification of the basis-functions using one out of three competing strategies. A fundamental difference from our approach is that the result of one modification may be more than one additional basis-function, or a reduction in the number of basis-functions. The CART algorithm of Breiman et al. (��) searches for a decomposition into regimes and builds a piecewise constant model. The structure identification part first builds a too complex model, which is subsequently reduced. Similar basis-function trees are constructed by the algorithms of Sanger (��) and Omohundro (��). Common to all these algorithms is that they consider non-linear regression problems, and try to describe the data using as few dimensions of the input space as possible, increasing the dimension of the domain of the basis function only when necessary. This is motivated by the curse of dimensionality. The same problem also applies to the present approach, but to a smaller extent, since the local models are not constants, but functions of the generalized input. This may allow the operating point space to be of a smaller dimension than the input space, which reduces the curse of dimensionality.

Local linear models are also applied by Jones and co-workers (��) and by Stokbro et al. (��), together with a clustering algorithm to determine the location of the local models. Jacobs et al. (��) and Jordan and Jacobs (��) use a parameterized regime description and a hierarchical estimator to estimate the regime parameters simultaneously with the local model parameters. An algorithm based on local linear models and decomposition of regimes where the system appears to be more complex than the model has been suggested by Murray-Smith and Gollee (��). Pottmann et al. (��) have proposed a model representation based on local polynomial models and smooth interpolation. Their structure identification algorithm is based on an orthogonal stepwise regression procedure (Kortmann and Unbehauen ��) that sequentially adds or discards terms in the local polynomial model according to their significance.

When the interpolation functions are chosen as the characteristic functions of the regime-sets $Z_i$, a piecewise linear model results. The resulting model will not be smooth, and may not even be continuous, whereas smoothness or continuity may be a requirement in some applications. Also, we have experienced that smooth interpolation between local linear models usually gives better model fit compared to a local constant or piecewise linear model with the same number of parameters. The local linear modeling approach combined with a fuzzy set representation of the regimes also leads to a model representation with interpolation between the local linear models (Takagi and Sugeno ��). In that case, it is the fuzzy inference mechanism that implicitly gives an interpolation. Structure identification algorithms based on clustering (Bezdek et al. ��a, Bezdek et al. ��b, Yoshinari et al. ��, Yager and Filev ��, Nakamori and Ryoke ��) and a heuristic local search algorithm (Sugeno and Kang ��) have been proposed in this context. In this case, the b� and �i-functions are interpreted as membership functions for fuzzy sets. A statistical pattern recognition approach with multiple models leads to a similar representation based on a piecewise linear model and discriminant functions to represent the regime boundaries (Skeppstedt et al. ��). Finally, Sørheim (��) has suggested a model representation with neural nets as local models and a structure identification algorithm based on pattern recognition. The pattern recognition algorithm will detect parts of the input space in which the model fit is inadequate, in some sense, and refine the model locally.

With this large body of literature in mind, one may ask: What are the contributions and improvements represented by the present approach? We have attempted to take the most attractive features from the algorithms in the literature and combine these into one algorithm. The algorithm of Sugeno and Kang (��) is the closest relative to the present algorithm. The main difference is the extra flexibility and effort applied to find a closer to optimal model with the present algorithm. In addition, we have emphasized interpretability of the resulting model, flexibility with respect to incorporation of prior knowledge, and a transparent modeling and identification process that is close to engineering practice. The price we have to pay is a computer-intensive algorithm. Some may also argue that the algorithm is too flexible and not completely automatic, and as a result it may be difficult to apply for inexperienced users. However, it is our view that real world applications require perhaps even more flexibility and a less automated approach.

�� Limitations and Possible Improvements

One major limitation of the proposed algorithm is the restriction that the regimes must be d-dimensional boxes with orthogonal edges. This implies that regimes that could more naturally have been described with a more complex regime boundary must be represented as two or more separate regimes, with separate local models. Hence, more local models than necessary are required. On the other hand, the introduction of complex descriptions of the regime boundaries will increase the number of parameters needed to represent these boundaries or local model validity functions, and lead to a more complex identification problem. However, the work of e.g. Jordan and Jacobs (��) shows that this is feasible. What the "optimal" regime representation is, is currently not clear to the authors. We believe, however, that the proposed representation is a fair tradeoff between flexibility and simplicity. Its major strength is that the simple regime description combined with simple local models gives a transparent model. An optimization of the �i- and �i-parameters after the model structure and local model parameters have been fixed has been implemented. For the test-cases, the optimization gave only marginal improvements over the heuristic choices. One may conclude that the algorithm is robust with respect to the location of the splitting hyper-planes and the ad-hoc choice of local model validity function parameters.

In the presented strategies and heuristics, no consideration has been given to which model structure candidates to evaluate first. In particular when n� is large, a best-first search (Pearl ��) is important, since one cannot hope to evaluate all possible candidates in most practical cases. First, locally optimal models will be found, and subsequent improvements will be found until eventually the globally optimal model is found. The power of a best-first algorithm is that it can be interrupted at any time with a high probability of returning a good result, which is not generally true if the algorithm does not examine the most promising candidates first. Again, which candidates are the best is not known a priori, so some heuristic must be applied to find the most promising candidates at each step.
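The interruptible character of a best-first search can be illustrated with a small generic sketch. It is not the search procedure of the thesis: candidates are ordered in a priority queue by a cheap heuristic (assumed to be supplied by the user), the expensive identification criterion is only evaluated when a candidate is taken from the queue, and the search stops when an evaluation budget is spent. All names are hypothetical.

```python
import heapq


def best_first_search(start, expand, heuristic, criterion, budget):
    """Evaluate at most `budget` candidates, most promising first.

    expand(c) returns the successors of candidate c, heuristic(c) is a cheap
    estimate (lower is more promising), and criterion(c) is the expensive
    quantity to be minimized. Returns the best candidate found and its value.
    """
    frontier = [(heuristic(start), 0, start)]
    seen = {start}
    counter = 1  # tie-breaker, so candidates themselves are never compared
    best, best_value = None, float("inf")
    evaluations = 0
    while frontier and evaluations < budget:
        _, _, candidate = heapq.heappop(frontier)
        value = criterion(candidate)
        evaluations += 1
        if value < best_value:
            best, best_value = candidate, value
        for child in expand(candidate):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (heuristic(child), counter, child))
                counter += 1
    return best, best_value
```

Whatever the budget, the function returns the best candidate evaluated so far, which is the property emphasized above. The quality of an early result depends entirely on how well the heuristic ranks the candidates.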


An alternative strategy could be to first construct a model by stepwise decomposition, and then to reduce the model using stepwise merging. The idea is that some of the locally optimal, but globally suboptimal, decomposition steps can be undone during the reduction part of the algorithm. The efficiency of this modification is strongly restricted by the fact that the regimes must be merged pairwise, and the selected regime representation will seriously restrict which regime pairs can be merged. Furthermore, since the forward step leads to a parsimonious model, the backward part of the algorithm will often find that very little or no model reduction is possible, and the strategy gives no improvement over the proposed algorithm. This problem can be overcome by allowing the forward part of the algorithm to generate a too complex model. However, one cannot guarantee that such a strategy is an improvement compared to the proposed algorithm. Alternatively, one can at each step let decomposition and merging compete when deciding what the next model structure shall be (Kavli ��).

Another class of feasible algorithms for the structure optimization problem is stochastic optimization algorithms like simulated annealing (Kirkpatrick, Gelatt Jr. and Vecchi ��) and genetic algorithms (Goldberg ��). Such algorithms can handle both integer and real variables, and will under weak conditions converge to a global minimum. It is the stochastic nature of the algorithms that allows escape from local minima. Unfortunately, these algorithms converge very slowly. Yet another class of algorithms is dynamic programming (Bellman and Dreyfus ��, Bellman ��b). The advantage of dynamic programming is that it exploits the fact that the model structure set is not a tree, which allows the computational complexity to be reduced. However, dynamic programming suffers from an unreasonable demand for computer memory.

�� The Fundamental Assumptions

In general, the fundamental assumptions behind empirical modeling are

• the empirical data is not too contaminated by noise and other unpredictable phenomena, and

• the data set is complete in the sense that it contains a sufficient amount of observations from all interesting operating conditions and system variables.

Unfortunately, these assumptions are often not met in practical applications. The proposed algorithm should therefore be applied with care, and as a part of a computer aided modeling environment that allows flexible incorporation of prior knowledge, and not as an automatic modeling algorithm. Moreover, one should undertake a detailed study of the robustness of the algorithm with respect to contaminated, sparse, and incomplete data, in particular for high dimensional and otherwise complex modeling problems.

Chapter �

Identification of Non-linear Systems using Prior Knowledge and Empirical Data - An Optimization Approach

Identifying a model from a finite sample of observations without any prior knowledge about the system is an ill-posed problem, in the sense that a unique model may not exist, or it may not depend continuously on the observations (Tikhonov and Arsenin ��). Indeed, without any prior knowledge except the observations, nothing is known about the system behavior in states between the observations. Hence, even if the data are not corrupted by noise, the model can behave arbitrarily between the observations, and it should be clear that such a model will not be unique, nor will it depend continuously on the data. The problem is essentially that the finite amount of data does not constrain the model set sufficiently. Fortunately, a minimum of prior knowledge will in general provide the necessary constraints, and there exists a theory that will aid reformulation of the problem into a well-posed one, namely regularization theory (Tikhonov and Arsenin ��). The prior knowledge or assumption that is often used is that the system has certain smoothness properties. This will constrain the system behavior in a neighborhood of each observation from changing abruptly, and some interpolation and extrapolation of the observations can be justified. Such an assumption is reasonable for a large class of real world systems, but certainly not for all. The smoothness assumption can be incorporated explicitly by the use of a possibly over-parameterized or nonparametric model, and a penalty on non-smoothness in the identification criterion (Bertero, De Mol and Pike ��, O'Sullivan ��, Dyn ��, Madych and Nelson ��, Wahba ��). The penalty will reduce the effective number of parameters in the model set. However, the smoothness assumption is usually made implicitly through the choice of a parameterized model structure that is simple in the sense that it has few unknown parameters compared to the number of observations. In this case the dimension of the model space is reduced a priori such that the data set can determine the unique model in the model set.
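The effect of a non-smoothness penalty can be seen in a small discrete example. The sketch below is an illustration only and is not the criterion developed in this chapter: the unknown function is represented by its values on a grid (an over-parameterized model), a few grid points are observed, and a penalty on squared second differences, weighted by an assumed parameter `lam`, makes the otherwise ill-posed problem uniquely solvable. All names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def smooth_fit(n, observations, lam):
    """Minimize sum_k (f[i_k] - y_k)^2 + lam * sum_i (f[i-1] - 2 f[i] + f[i+1])^2.

    `observations` is a list of (grid index, value) pairs. At least two
    distinct grid points must be observed for the minimizer to be unique.
    """
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, y in observations:  # data-fit term
        a[i][i] += 1.0
        b[i] += y
    stencil = (1.0, -2.0, 1.0)
    for i in range(1, n - 1):  # smoothness penalty term
        idx = (i - 1, i, i + 1)
        for p, sp in zip(idx, stencil):
            for q, sq in zip(idx, stencil):
                a[p][q] += lam * sp * sq
    return solve_linear(a, b)
```

With only the two end points of a five-point grid observed, the data alone leave the three interior values completely undetermined, while the penalized problem has a unique solution, the straight line between the observations.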

It is usually desirable to effectively employ all the available prior knowledge and empirical data. Increased amounts of prior knowledge will make the identification problem better conditioned, since more constraints on the model set are imposed. If the prior knowledge is correct, this will in general lead to a better model that is robust against a deficient or incomplete data set. Herein lies the motivation for our work, and we will show that the regularization framework is a convenient tool for incorporation of prior knowledge into semi-empirical models. In particular, we will allow prior knowledge, empirical data, and desired properties of the model in the form of

• Smoothness of the model behavior.

• Partially or completely known models.

• Constraints on model structure, variables and behavior, like stability and linearity.

• Empirical data (steady-state, time series).

The various pieces of knowledge or properties can be locally valid under certain operating conditions only, or globally valid. Moreover, varying levels of accuracy, completeness, and reliability of the knowledge can be handled.

The unifying framework we propose here is an optimization formulation of the modeling and identification problem, where the different pieces of knowledge will enter as penalty terms or constraints in the optimization formulation. This formulation of the problem has found its inspiration in (Thompson and Kramer ��), where it is discussed how different kinds of knowledge can be used to structure the model and optimization criterion, but in a significantly less general framework than here. Tulleken (��) has also suggested the use of constraints derived from system knowledge to improve the model, in the context of linear system identification. Another source of inspiration is (Girosi, Jones and Poggio ��), which analyzes how different non-smoothness penalties lead to different basis-functions in a series expansion of the model. The present work can be viewed as a unification and extension of these approaches. Contrary to the static model formulation in (Girosi et al. ��, Thompson and Kramer ��), we formulate the problem for dynamical systems described by MISO NARMAX (non-linear ARMAX) models.

The chapter is a reprint of (Johansen c), and is organized as follows. First, the problem is formulated mathematically and some simple motivating examples are given. Next, the exact solution to a simplified modeling problem is found explicitly using function space methods, and this solution is discussed in detail, before numerical procedures that solve the general problem are discussed. We then discuss how to tune the parameters in the optimization criterion and present alternative criteria and procedures for this. The simulation example section contains a fairly complete semi-realistic simulation example that illustrates the power of the approach, and also the effect of different kinds of prior knowledge and empirical data on the model. The chapter ends with a discussion and some conclusions.


Problem Formulation

In this work we study the identification of NARMAX models (Leontaritis and Billings) of the form

$$ y(t) = f(\psi(t-1)) + e(t) $$

where $\psi(t-1) = [y(t-1), \ldots, y(t-n_y), u^T(t-1), \ldots, u^T(t-n_u), e(t-1), \ldots, e(t-n_e)]^T$ is the information vector. For simplicity we assume the system output $y(t)$ is a scalar, while the system input $u(t) \in R^r$ may be r-dimensional. Let $\psi(t) \in \Psi \subset R^n$, where $\Psi$ is a compact subset in which we are interested in modeling the system's input-output behavior. Furthermore, let $F$ be an inner product space containing smooth functions on $\Psi$

$$ F = \{ f \in L_2(\Psi) \mid f \text{ is smooth} \} $$

The smoothness in the definition of $F$ should be interpreted as existence and continuity of sufficiently high order derivatives of the functions in $F$. For the purpose of model identification, a sequence of $l$ input-output observations

$$ ((u(1), y(1)), (u(2), y(2)), \ldots, (u(l), y(l))) $$

may be available. Moreover, we allow a possibly inaccurate default model $M_a$ to be available a priori, together with soft equality constraints $Qf = q$, where $Q : F \to \mathcal{Q}$ is an operator, $\mathcal{Q}$ is an inner product space, and $q \in \mathcal{Q}$. Furthermore, we allow hard constraints $Hf \le 0$ and $Pf = 0$, where $H : F \to \mathcal{H}$ and $P : F \to \mathcal{P}$ are operators, and $\mathcal{H}$ and $\mathcal{P}$ are inner product spaces. In addition, an operator $S : F \to \mathcal{S}$ may be given, where $\mathcal{S}$ is an inner product space and $S$ indicates the non-smoothness of $f$. For example, $S$ may be a differential operator.

We are now in a position to informally state the problem addressed in this work, namely to find the function $f \in F$ that defines the NARMAX model that is most consistent with the a priori knowledge and empirical data, in a well defined sense. For simplicity, we will assume the integers $n_y$, $n_u$ and $n_e$ are given.

Finding the best $f \in F$ can be formulated as an optimization problem in the inner product space $F$. The prior knowledge and empirical data are used to formulate an optimization criterion that penalizes

1. Mismatch between model prediction and the empirical data.

2. Non-smoothness of the model.

3. Mismatch between model and the default model.

4. Violation of the soft constraints.

In addition, we allow hard equality and inequality constraints in the optimization problem. We choose the criterion

$$ J(f) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; f) + \lambda \|Sf\|^2 + \mu D^2(f, M_a) + \gamma \|Qf - q\|^2 $$

subject to the constraints

$$ Hf \le 0 \quad \text{and} \quad Pf = 0 $$

where $\lambda$, $\mu$ and $\gamma$ are non-negative constants, and the norms are induced by possibly weighted inner products on the various spaces. $D(f, M_a)$ measures the distance between the model defined by $f$ and the default model $M_a$, and $\varepsilon(t; f)$ can for example be the one-step-ahead prediction error defined by

$$ \varepsilon(t; f) = y(t) - f(y(t-1), \ldots, y(t-n_y), u(t-1), \ldots, u(t-n_u), \varepsilon(t-1; f), \ldots, \varepsilon(t-n_e; f)) $$

or a multi-step-ahead prediction error. It must be assumed that this model on innovation form is asymptotically stable. Obviously, the prior knowledge and the choice of $\lambda$, $\mu$ and $\gamma$ will have great influence on the model. In particular, the problem of choosing these constants is discussed in detail in the section on tuning the criterion.
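The recursion behind the one-step-ahead prediction error can be sketched in a few lines. This is only an illustration under stated assumptions: the input is scalar, the function names `prediction_errors` and `data_term` are hypothetical, and residuals before the first computable time instant are set to zero.

```python
def prediction_errors(f, y, u, n_y=1, n_u=1, n_e=1):
    """One-step-ahead prediction errors for a scalar-input NARMAX model.

    f(past_y, past_u, past_eps) returns the predicted output, where each
    argument is a list ordered from lag 1 upwards.
    """
    start = max(n_y, n_u, n_e)
    eps = [0.0] * len(y)  # assumption: initial residuals are zero
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        past_eps = [eps[t - k] for k in range(1, n_e + 1)]
        eps[t] = y[t] - f(past_y, past_u, past_eps)
    return eps[start:]


def data_term(f, y, u, n_y=1, n_u=1, n_e=1):
    """Mean squared prediction error, i.e. the data-mismatch part of the criterion."""
    eps = prediction_errors(f, y, u, n_y, n_u, n_e)
    return sum(e * e for e in eps) / len(eps)
```

Because past residuals are fed back into the model, the errors must be computed sequentially in time; this feedback is also why stability of the model on innovation form matters.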

Before we proceed with the solution to this optimization problem, it will be illustrative to motivate this optimization formulation with examples of some prior knowledge and desired properties of the model that can be specified within this framework. A more complete example is provided in the simulation example section.

Example 1: Default model.

Consider a default model of the form

$$ M_a: \quad \dot{x} = f_a(x, u) + v, \qquad y = g_a(x) + w $$

The purpose of the default model is to provide a basic model that at least can be used during operating conditions when there are no empirical data or other more relevant prior knowledge available. The model $M_a$ can be used for one-step-ahead predictions, and a predictor can be formulated using an extended Kalman-filter, e.g. (Söderström and Stoica)

$$ \hat{x}(t|t) = \hat{x}(t|t-1) + K(t) \left( y(t) - g_a(\hat{x}(t|t-1)) \right) $$

$$ \hat{x}(t+1|t) = \hat{x}(t|t) + \int_{\tau=t}^{t+1} f_a(\hat{x}(\tau|t), u(\tau)) \, d\tau $$

$$ \hat{y}(t+1|M_a, t) = g_a(\hat{x}(t+1|t)) $$

where the Kalman-filter gain $K(t)$ is computed using the assumption that $v(t)$ and $w(t)$ are zero-mean white noise processes with known variance. A multi-step-ahead predictor can be formulated in a similar manner.

The distance $D(f, M_a)$ can be defined in several ways. Since the two models are based on fundamentally different representations (NARMAX and continuous-time state-space), there is no direct way of comparing these two models in this case. The natural approach is therefore to compare them indirectly, by comparing their prediction performance when both are used to predict the response of the system to different inputs from different initial states. We suggest

$$ D^2(f, M_a) = \int_{x(t)} \int_{u(t)} \left( \hat{y}(t+1|f, t) - \hat{y}(t+1|M_a, t) \right)^2 \rho(t) \, dx(t) \, du(t) $$

where $\rho(t)$ is a weight-function that penalizes mismatch between the two models as a function of the operating condition at each time instant $t$.

□

Example 2: Linear noise model.

Suppose we want to restrict the NARMAX model to the set of models of the form

$$ y(t) = f(y(t-1), \ldots, y(t-n_y), u(t-1), \ldots, u(t-n_u)) + c_1 e(t-1) + \cdots + c_{n_e} e(t-n_e) + e(t) $$

where $c_1, c_2, \ldots, c_{n_e}$ are unknown constants. This gives a linear noise model that may be desirable because it simplifies estimation, analysis and application of the model. This restriction can be represented as a constraint

$$ \frac{\partial^2 f}{\partial e(t-1)^2} = 0, \; \ldots, \; \frac{\partial^2 f}{\partial e(t-n_e)^2} = 0 $$

which can be written as a linear operator equation $Pf = 0$, by defining $\mathcal{P} = L_2^{n_e}(R^n)$ and

$$ P = \begin{pmatrix} \partial^2 / \partial e(t-1)^2 \\ \vdots \\ \partial^2 / \partial e(t-n_e)^2 \end{pmatrix} $$

□

Example 3: Known linear model.

Next, suppose the origin is an equilibrium point for the system. A linearized model is given by $y(t) = \nabla^T f(0) \, \psi(t-1) + e(t)$. Now, suppose a linear (ARMAX) model

$$ A(z^{-1}) y(t) = B(z^{-1}) u(t) + C(z^{-1}) e(t) $$

of the system behavior near the origin is known. A penalty on deviations of the linearization of the non-linear model from this linear model can be formulated as a soft constraint of the form $Qf = q$. The distance between the two linear models can be measured using different metrics, for example in the frequency domain, time domain, or some model space. Perhaps the simplest is to measure their difference in the parameter space. Then we choose $\mathcal{Q} = R^n$, $Qf = \nabla f(0)$, and $q$ is the Riesz representation of the ARMAX model, written on the form $y(t) = q^T \psi(t-1) + e(t)$. A more robust model space metric would involve comparison of poles, zeros, and gain.

Notice that this formulation is very different from a default model $f_a(\psi(t-1)) = q^T \psi(t-1)$, where deviations from the default model will in general be penalized globally over the full range of operating conditions, while in this example the penalty is defined only in a point. Of course, the non-smoothness penalty will indirectly make this penalty effective in a neighborhood of this point.


If parts of a linear model are known, like an approximate pole valid in some operating regime, this knowledge can be incorporated in the same way.

□
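The parameter-space version of this soft constraint can be sketched numerically. The sketch below assumes the model function takes the information vector as a plain list, estimates the gradient at the origin by central differences, and uses the Euclidean norm; the names `gradient_at` and `linearization_penalty` are hypothetical.

```python
def gradient_at(f, point, h=1e-5):
    """Central-difference estimate of the gradient of f at `point`."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def linearization_penalty(f, q):
    """Squared distance between the gradient of f at the origin and q."""
    g = gradient_at(f, [0.0] * len(q))
    return sum((gi - qi) ** 2 for gi, qi in zip(g, q))
```

A model whose linearization at the origin matches the known linear model gives a penalty close to zero, regardless of how the model behaves away from the origin.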

Example 4: Steady-state mass balance.

Consider a system which has two input flows with rates $u_1(t)$ and $u_2(t)$, and one output flow with rate $y(t)$. If we keep the input flow rates fixed at some values $u_1(t) = u_1$ and $u_2(t) = u_2$, and the system reaches a steady state $y$, then we may conclude that the following steady-state mass balance holds:

$$ u_1 + u_2 - y = 0 $$

Clearly, it may be desirable that the model has the same property. Let $Pf$ be the steady-state output of the system defined by $f$; then $Pf \in \mathcal{P} = L_2(R^2)$ is a function of the steady-state inputs $u_1$ and $u_2$. $P$ will in general be a non-linear operator that can be implemented by solving the difference equation exactly, or approximated by a recursion that is stopped when sufficiently close to convergence.

□
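The recursion mentioned in this example can be sketched as a fixed-point iteration. The sketch assumes a first-order model of the hypothetical form `f(y, u1, u2)` with a stable steady state; the stopping tolerance and function names are illustrative choices.

```python
def steady_state_output(f, u1, u2, y0=0.0, tol=1e-10, max_iter=10000):
    """Iterate y <- f(y, u1, u2) with fixed inputs until the output settles."""
    y = y0
    for _ in range(max_iter):
        y_next = f(y, u1, u2)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("recursion did not converge")


def mass_balance_residual(f, u1, u2):
    """Residual u1 + u2 - y of the steady-state mass balance for model f."""
    return u1 + u2 - steady_state_output(f, u1, u2)
```

Evaluating the residual over a grid of steady-state inputs gives a finite-dimensional approximation of the constraint that could be penalized or enforced during identification.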

Example 5: Stability.

Another useful kind of prior knowledge is that the system is open-loop stable, or more precisely, that certain equilibrium points are asymptotically stable. If we seek a model of the form

$$ y(t) = f(y(t-1), u(t-1)) + e(t) $$

then local asymptotic stability of an equilibrium point $(y, u)$ is guaranteed provided the following inequality constraints are satisfied

$$ -1 < \frac{\partial f}{\partial y(t-1)}(y, u) < 1 $$

□
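For a first-order model, this stability condition can be checked numerically at a given equilibrium point. The following is a minimal sketch that estimates the partial derivative by a central difference; the function name `is_locally_stable` and the step size are assumptions made for illustration.

```python
def is_locally_stable(f, y_eq, u_eq, h=1e-6):
    """Check -1 < df/dy < 1 at the equilibrium point (y_eq, u_eq)."""
    dfdy = (f(y_eq + h, u_eq) - f(y_eq - h, u_eq)) / (2.0 * h)
    return -1.0 < dfdy < 1.0
```

In an identification setting the same derivative would enter as an inequality constraint on the model parameters rather than as an after-the-fact check.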

The choice of inner product on the different inner product spaces is of great importance. For example, the region of validity or relevance of a default model $M_a$ of the NARMAX form defined by a function $f_a \in L_2(\Psi)$ can be represented by a weighted $L_2$ inner product

$$ \langle f_1, f_2 \rangle = \int_{\psi \in \Psi} f_1(\psi) f_2(\psi) \rho(\psi) \, d\psi $$

where $\rho$ is a strictly positive weight function that integrates to one, and can be interpreted as our a priori knowledge about the validity of the default model as a function of the operating point, since we may define

$$ D^2(f, M_a) = \|f - f_a\|^2 = \int_{\psi \in \Psi} (f(\psi) - f_a(\psi))^2 \rho(\psi) \, d\psi $$
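To illustrate how the weight function localizes the default-model penalty, the weighted distance can be approximated by quadrature in the one-dimensional case. This sketch assumes a scalar operating point on an interval and uses the trapezoidal rule; `weighted_distance_sq` is a hypothetical name.

```python
def weighted_distance_sq(f, f_a, rho, lo, hi, n=2001):
    """Trapezoidal approximation of the weighted squared distance
    between f and f_a on the interval [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        p = lo + k * h
        w = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += w * rho(p) * (f(p) - f_a(p)) ** 2
    return h * total
```

With a weight that is large where the default model is trusted and small elsewhere, the same mismatch costs much more in the trusted region, which is the intended interpretation of the weight function.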


A similar weighting may also be useful for the constraints, since they may be more or less relevant under different operating conditions. One may also represent knowledge about the smoothness of the system behavior as a function of the operating point by a weighted inner product. Alternatively, the weight can be interpreted as the relative desired accuracy of the model as a function of the operating point, if the weights are similar for all inner products. A useful class of function spaces are Sobolev spaces (Adams), which are subspaces of $L_2(\Psi)$ with inner products like

$$ \langle f_1, f_2 \rangle = \int_{\psi \in \Psi} f_1(\psi) f_2(\psi) \rho(\psi) \, d\psi + \int_{\psi \in \Psi} \langle \nabla f_1(\psi), \nabla f_2(\psi) \rangle \rho(\psi) \, d\psi $$

if an objective is to make the Jacobian of the model a close approximation to the Jacobian of the system. This may be of particular interest if the model is used for control system design, where it is often the accuracy of the Jacobian of $f$ that is most important.

Optimization - Function Space Methods

In this section we will study the solution to the problem of minimizing the functional $J(f)$ in the function space $F$, using calculus of variations. For simplicity, we will assume that there are no soft or hard constraints, the inner products are not weighted, $n_e = 0$ (we want to find an NARX model), and the default model is of the same form as the model we are seeking (i.e. an NARX model). Hence, the default model is defined by a function $f_a \in L_2(\Psi)$, which allows the simple distance measure $D^2(f, M_a) = \|f - f_a\|^2$. In other words, we consider a criterion

$$ J(f) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; f) + \lambda \|Sf\|^2 + \mu \|f - f_a\|^2 $$

Like Girosi et al., we choose to represent the penalty on non-smoothness in the spatial frequency domain using the Fourier transform

$$ \tilde{f}(\omega) = \int_{\psi \in \Psi} f(\psi) e^{j \omega^T \psi} \, d\psi $$

where $\tilde{f}$ is the Fourier transform of $f \in F$, and $\omega$ is an n-dimensional vector that is interpreted as spatial frequency (frequency is here the inverse of distance in the information space, not time). Since $f$ is continuous and $\Psi$ is compact, this transform is well defined, and the inverse transform

$$ f(\psi) = \frac{1}{(2\pi)^n} \int_{\omega \in R^n} \tilde{f}(\omega) e^{-j \omega^T \psi} \, d\omega $$

will be unique. On $L_2(\Psi)$ we apply the standard inner product

$$ \langle f_1, f_2 \rangle = \int_{\omega \in R^n} \tilde{f}_1(\omega) \tilde{f}_2^*(\omega) \, d\omega $$


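where the superscript in $\tilde{f}_2^*$ denotes complex conjugate. We restrict our attention to operators $S$ of the form (Girosi et al.)

$$ (Sf)(\psi) = \frac{1}{(2\pi)^n} \int_{\omega \in R^n} \frac{1}{\tilde{G}_1(\omega)} \tilde{f}(\omega) e^{-j \omega^T \psi} \, d\omega $$

In other words, $\tilde{f}$ is filtered by $1/\tilde{G}_1(\omega)$, which will be chosen as a symmetric filter that amplifies high-frequency energy, since this gives a penalty on high-frequency energy in $f$, which is an intuitive measure of non-smoothness. The induced norm is

$$ \|Sf\|^2 = \int_{\omega \in R^n} \frac{|\tilde{f}(\omega)|^2}{\tilde{G}(\omega)} \, d\omega $$

where $\tilde{G}(\omega) = |\tilde{G}_1(\omega)|^2$.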

The case when $\mu = 0$ is analyzed in detail by Girosi et al. Applying the mathematical tools in (Wahba; Madych and Nelson; Dyn), one can show that the global minimum of $J$ can be represented as

$$ f(\psi) = \sum_{t=1}^{l} c_t G(\psi - \psi(t-1)) + \sum_{j=1}^{d_G} b_j p_j^G(\psi) $$

where $c_1, \ldots, c_l$ and $b_1, \ldots, b_{d_G}$ are real constants, and the set of functions $\{p_1^G, p_2^G, \ldots, p_{d_G}^G\}$ satisfy

$$ \frac{1}{\tilde{G}(\omega)} \tilde{p}_j^G(\omega) = 0 $$

The constant parameters $c_1, \ldots, c_l$ and $b_1, \ldots, b_{d_G}$ can be found explicitly by substituting this representation back into the functional $J$ and minimizing this as a function of this finite number of parameters, cf. (Wahba).

Observe that the $p_j^G$-terms in the model appear because certain functions will be "invisible" to the non-smoothness penalty, and such terms can be added to the solution without imposing any cost. For example, if $S$ filters $\tilde{f}$ by $(j\omega)^2$, then $1/\tilde{G}(\omega) = (j\omega)^4$, and $1/\tilde{G}(\omega)$ is an operator that differentiates four times. Hence, $\{p_1^G, p_2^G, \ldots, p_{d_G}^G\}$ is a basis for the space of polynomials with order no greater than three. As a curiosity, the resulting basis functions are in this case cubic splines (Wahba).

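Next, we show that the case $\mu > 0$ can be treated using the same technique. First, we rewrite the criterion $J(f)$ in terms of $\tilde{f}$ as

$$ J(f) = \frac{1}{l} \sum_{t=1}^{l} \left( y(t) - \frac{1}{(2\pi)^n} \int_{\omega \in R^n} \tilde{f}(\omega) e^{-j \omega^T \psi(t-1)} \, d\omega \right)^2 + \lambda \int_{\omega \in R^n} \frac{\tilde{f}(\omega) \tilde{f}^*(\omega)}{\tilde{G}(\omega)} \, d\omega + \mu \int_{\omega \in R^n} \left( \tilde{f}(\omega) - \tilde{f}_a(\omega) \right) \left( \tilde{f}^*(\omega) - \tilde{f}_a^*(\omega) \right) d\omega $$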


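$J$ is a strictly convex functional that must have a unique global minimum. Using calculus of variations, e.g. (Luenberger), we first find the Gateaux variation

$$ \delta J(f; \delta f) = -\frac{2}{l} \sum_{t=1}^{l} \left( y(t) - f(\psi(t-1)) \right) \frac{1}{(2\pi)^n} \int_{\omega \in R^n} \delta\tilde{f}(\omega) e^{-j \omega^T \psi(t-1)} \, d\omega + \lambda \int_{\omega \in R^n} \frac{\tilde{f}(\omega) \delta\tilde{f}^*(\omega) + \tilde{f}^*(\omega) \delta\tilde{f}(\omega)}{\tilde{G}(\omega)} \, d\omega + \mu \int_{\omega \in R^n} \left( \delta\tilde{f}(\omega) \left( \tilde{f}^*(\omega) - \tilde{f}_a^*(\omega) \right) + \delta\tilde{f}^*(\omega) \left( \tilde{f}(\omega) - \tilde{f}_a(\omega) \right) \right) d\omega $$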

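for an arbitrary perturbation $\delta f \in F$. It is straightforward to show that the two last terms are real, hence

$$ \delta J(f; \delta f) = 2 \int_{\omega \in R^n} \delta\tilde{f}(\omega) \left( -\frac{1}{(2\pi)^n l} \sum_{t=1}^{l} \left( y(t) - f(\psi(t-1)) \right) e^{-j \omega^T \psi(t-1)} + \lambda \frac{\tilde{f}^*(\omega)}{\tilde{G}(\omega)} + \mu \left( \tilde{f}^*(\omega) - \tilde{f}_a^*(\omega) \right) \right) d\omega $$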

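It follows that a sufficient condition for the Gateaux variation to be zero for all $\delta f \in F$ is that $f$ satisfy the Euler-Lagrange equation

$$ -\frac{1}{(2\pi)^n l} \sum_{t=1}^{l} \left( y(t) - f(\psi(t-1)) \right) e^{-j \omega^T \psi(t-1)} + \lambda \frac{\tilde{f}^*(\omega)}{\tilde{G}(\omega)} + \mu \left( \tilde{f}^*(\omega) - \tilde{f}_a^*(\omega) \right) = 0 $$

for all $\omega \in R^n$. Defining $c_t = \left( y(t) - f(\psi(t-1)) \right) / \left( (2\pi)^n l \lambda \right)$ and

$$ \tilde{K}(\omega) = \frac{\tilde{G}(\omega)}{1 + \frac{\mu}{\lambda} \tilde{G}(\omega)} $$

we get, after complex conjugation,

$$ \tilde{f}(\omega) = \frac{\mu}{\lambda} \tilde{K}(\omega) \tilde{f}_a(\omega) + \sum_{t=1}^{l} c_t \tilde{K}(\omega) e^{j \omega^T \psi(t-1)} $$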

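If the set of functions $\{p_1^K, p_2^K, \ldots, p_{d_K}^K\}$ satisfy

$$ \frac{1}{\tilde{K}(\omega)} \tilde{p}_j^K(\omega) = 0 $$

we may add to the solution $f$ any term in $\mathrm{span}\{p_1^K, p_2^K, \ldots, p_{d_K}^K\}$ without violating the Euler-Lagrange equation. Hence, by transforming back into the original coordinates, we get the representation

$$ f(\psi) = \frac{\mu}{\lambda} (K * f_a)(\psi) + \sum_{t=1}^{l} c_t K(\psi - \psi(t-1)) + \sum_{j=1}^{d_K} b_j p_j^K(\psi) $$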


In the next paragraph, we will argue that $\tilde{K}(\omega)$ will always be a low-pass filter. Hence, $K * f_a$ is a smoothed variant of $f_a$. The fundamental differences between the cases $\mu = 0$ and $\mu > 0$ are interesting. When $\mu > 0$, the model is the smoothed default model $f_a$ in addition to some smooth terms that compensate for mismatch between the smoothed default model and the observed data. Another major difference is that with $\mu = 0$, the basis-functions are translations of $G(\psi)$, while they are translations of $K(\psi)$ for $\mu > 0$. As one should expect, $K \to G$ as $\mu \to 0$. We will in the following illustrate this difference with some examples.

Let us look at some typical choices for $\tilde{G}(\omega)$ and the corresponding basis-functions $G(\psi)$ and $K(\psi)$ in the one-dimensional case ($n = 1$). As discussed, we must choose $\tilde{G}(\omega)$ such that $1/\tilde{G}(\omega)$ is a symmetric filter that amplifies high-frequency energy.

1. Let us first choose $\tilde{G}(\omega) = 1/\omega^2$:

$$ \tilde{G}(\omega) = \frac{1}{\omega^2} \;\Rightarrow\; G(\psi) = -\frac{|\psi|}{2} $$

$$ \tilde{K}(\omega) = \frac{1}{\omega^2 + \mu/\lambda} \;\Rightarrow\; K(\psi) = \frac{\sigma}{2} e^{-|\psi|/\sigma} $$

where $K(\psi)$ is characterized by the parameter $\sigma = \sqrt{\lambda/\mu}$. A large penalty on non-smoothness (large $\lambda$) will give a soft basis-function $K$ that will extrapolate the data points widely, while a small $\lambda$ will give basis-functions that are sharp spikes. It is interesting to observe that while $K(\psi)$ is a local basis-function (a bump centered at the origin), $G(\psi)$ is a global basis-function. The reason for this will be discussed below. A model representation that utilizes the basis-function $G$ is the hinging hyper-plane model structure of Breiman. Also notice that $S$ is the differential operator, and $1/\tilde{G}(\omega) = \omega^2$ corresponds to differentiating twice. Then $d_G = 2$, and the basis $p_1^G(\psi) = 1$ and $p_2^G(\psi) = \psi$ is a possible choice.

2. Next, we impose infinite penalty on high frequencies

$$ \tilde{G}(\omega) = \begin{cases} 1, & |\omega| \le \pi \\ 0, & |\omega| > \pi \end{cases} \;\Rightarrow\; G(\psi) = \mathrm{sinc}(\psi) $$

$$ \tilde{K}(\omega) = \begin{cases} \frac{1}{1 + \mu/\lambda}, & |\omega| \le \pi \\ 0, & |\omega| > \pi \end{cases} \;\Rightarrow\; K(\psi) = \frac{1}{1 + \mu/\lambda} \, \mathrm{sinc}(\psi) $$

which gives basis-functions $K(\psi)$ and $G(\psi)$ which are equal, except for a factor, and are local oscillating bumps centered at the origin. Notice that the null-space conditions on $p_j^G$ and $p_j^K$ are satisfied for all functions that have Fourier transform equal to zero for frequencies above $\pi$. This is an infinite-dimensional space of band-limited functions, and $d_K$ and $d_G$ are infinite. The basis-function $G$ is a reproducing kernel for the Hilbert space of band-limited functions, and such kernels have been suggested as basis-functions in empirical modeling by Mosca.
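The transform pair in the first example can be checked numerically. The sketch below is an illustration only: it assumes the exponential bump $K(\psi) = (\sigma/2) e^{-|\psi|/\sigma}$ with $\sigma = \sqrt{\lambda/\mu}$ and the closed form $1/(\omega^2 + \mu/\lambda)$ for its transform, and compares the two using a trapezoidal rule on a truncated interval. The function names are hypothetical.

```python
import math


def K(psi, sigma):
    """Local basis-function: an exponential bump of width sigma."""
    return 0.5 * sigma * math.exp(-abs(psi) / sigma)


def K_tilde_numeric(omega, sigma, n=20000):
    """Fourier transform of K by the trapezoidal rule.

    K is even, so the transform equals 2 * integral_0^inf K(p) cos(omega p) dp;
    the integral is truncated at 40 * sigma, where K is negligible.
    """
    upper = 40.0 * sigma
    h = upper / n
    total = 0.5 * (K(0.0, sigma) + K(upper, sigma) * math.cos(omega * upper))
    for k in range(1, n):
        p = k * h
        total += K(p, sigma) * math.cos(omega * p)
    return 2.0 * h * total


def K_tilde_closed(omega, lam, mu):
    """Closed-form low-pass filter 1 / (omega^2 + mu / lam)."""
    return 1.0 / (omega ** 2 + mu / lam)
```

Increasing the smoothness weight relative to the default-model weight widens the bump in the information space and narrows the pass-band of the filter, in line with the discussion of soft versus spiky basis-functions.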


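From the definition of $\tilde{K}(\omega)$ we immediately get the approximation

$$ \tilde{K}(\omega) \approx \begin{cases} \tilde{G}(\omega) & \text{if } \tilde{G}(\omega) \ll \lambda/\mu \\ \lambda/\mu & \text{if } \tilde{G}(\omega) \gg \lambda/\mu \end{cases} $$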

The first example illustrates the case when $\tilde{G}(\omega) \to \infty$ as $\omega \to 0$ and $\tilde{G}(\omega) \to 0$ as $\omega \to \infty$. While the filter $\tilde{G}(\omega)$ amplifies low frequency energy and suppresses high-frequency energy, $\tilde{K}(\omega)$ will let low frequency energy pass while it suppresses high frequency energy. The amplification of low frequency energy is essentially why the impulse response $G(\psi)$ is a global basis-function, while $K(\psi)$ is a local basis-function.

In the second example, $\tilde{G}(\omega) \to 1$ as $\omega \to 0$, while $\tilde{G}(\omega) \to 0$ as $\omega \to \infty$. This leads to $\tilde{K}(\omega) \approx \tilde{G}(\omega)$ for all $\omega$, and since $\tilde{G}(\omega)$ does not amplify low frequency energy, both $K(\psi)$ and $G(\psi)$ will be local basis-functions. We conclude that it is the low-frequency characteristic of $\tilde{G}(\omega)$ that determines whether $G(\psi)$ is local or global. From the approximation of $\tilde{K}(\omega)$ it follows that as long as $1/\tilde{G}(\omega)$ does not amplify low frequency energy, $K(\psi)$ will be a local basis-function. For all sensible choices of $\tilde{G}(\omega)$ this will be the case. This emphasizes the interpretation that the data-dependent part of the model will compensate for the mismatch between the observed data and the smoothed default model, and this data-dependent part will only have influence on the total model locally in a neighborhood of each observation. Intuitively, this makes sense, and a similar hybrid parametric model structure has been suggested on the basis of engineering considerations by Kramer et al., see also (Thompson and Kramer). The main difference is that in their structure the default model is not smoothed, and the basis functions are chosen in an ad hoc manner.

Optimization - Parameterized Approximation

Solving the modeling problem directly in the function space gives a mathematically very elegant solution. However, a simple solution can only be found for rather simple problems not involving complicated constraints, non-linear operators, a complicated default model, complicated inner products, or a complicated multidimensional geometry of $\Psi$. For the purpose of solving practical modeling problems we will therefore study solutions based on approximating the optimal model closely with a finite dimensional approximation. With a suitable parameterization of $f$, the problem of directly minimizing $J$ with respect to these parameters will be computationally feasible in many cases. First, we must choose a rich representation for $f(\psi; \theta)$. This representation can be almost arbitrary since it is of minor importance with the augmented optimization criterion approach. What is required is only that the optimal model can be approximated closely. As we shall see later, the effective number of parameters in this model will be controlled through the choice of $\lambda$, $\mu$ and $\gamma$. Hence, the number of parameters in the representation of $f$ can be arbitrarily large. Of course, if prior knowledge is available and suggests a particular parameterized structure for $f$, this should indeed be used. On the other hand, it is tempting to choose a linearly parameterized representation $f(\psi; \theta) = \theta^T \phi(\psi)$, like a look-up table with interpolation. The reason is simply that this will make the criterion quadratic for a significant number of problems. However, in cases when the dimension of $\psi$ is high, such representations may not be well suited because the required number of parameters will typically grow exponentially with the dimension $n$ if the accuracy of the approximation is required to be uniform. This is known as the curse of dimensionality. Hence, non-linearly parameterized representations based on low-dimensional projections, like neural nets, may be more effective in these cases.

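With a parameterized model structure it is straightforward to reformulate the infinite-dimensional optimization problem into the finite dimensional problem

$$ \hat{J}(\theta) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; f(\cdot; \theta)) + \lambda \|Sf(\cdot; \theta)\|^2 + \mu D^2(f(\cdot; \theta), M_a) + \gamma \|Qf(\cdot; \theta) - q\|^2 $$

subject to

$$ Hf(\cdot; \theta) \le 0, \qquad Pf(\cdot; \theta) = 0 $$

If all operators operating on $f$ are redefined as operators operating on the parameter vector $\theta$, we get the optimization problem

$$ \hat{J}(\theta) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; \theta) + \lambda \|\hat{S}(\theta)\|^2 + \mu D^2(f(\cdot; \theta), M_a) + \gamma \|\hat{Q}(\theta) - q\|^2 $$

subject to

$$ \hat{H}(\theta) \le 0, \qquad \hat{P}(\theta) = 0 $$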

Notice that the ranges of the operators $\hat{H}$, $\hat{P}$, $\hat{S}$ and $\hat{Q}$ are still of possibly infinite dimension. A taxonomy for such problems, together with possible solution methods, is

• Quadratic criterion, and no constraints. This gives a set of linear algebraic equations that can easily be solved.

• Quadratic criterion, linear finite-dimensional constraints. Solved using quadratic programming.

• Non-quadratic criterion, no constraints. This is a non-linear programming problem that can be solved iteratively.

• Non-quadratic criterion, finite-dimensional constraints. Constrained non-linear programming problem.

• Infinite-dimensional constraints give a semi-infinite programming problem.


An in-depth discussion of various practical optimization algorithms that take advantage of the particular structure of the problem is outside the scope of this work. Instead, we refer to the vast literature on general-purpose algorithms, e.g. (Luenberger; Gill et al.; Tanaka, Fukushima and Ibaraki), and software tools like the MATLAB Optimization Toolbox (Grace), and only discuss in depth the simple case when $f$ is linearly parameterized, i.e. $f \in \mathrm{span}\{\phi_n\}_{n=1}^{N}$, where $\{\phi_n\}_{n=1}^{N}$ is a set of linearly independent smooth basis-functions defined on the domain $\Psi$. Furthermore, we assume there are no hard constraints, and the default model is an NARMAX model defined by the function $f_a \in L_2(\Psi)$. We use the distance measure $D^2(f, M_a) = \|f - f_a\|^2$. While this is perhaps the simplest possible special case, it contains enough complexity to support a discussion of the various problems involved in solving the general optimization problem.

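In this case we can easily compute the solution to the optimization problem explicitly. We define the vectors and matrices

$$ Y = (Y_t), \quad Y_t = y(t), \quad t = 1, \ldots, l $$

$$ \Phi = (\Phi_{ti}), \quad \Phi_{ti} = \phi_i(\psi(t-1)), \quad t = 1, \ldots, l, \; i = 1, \ldots, N $$

$$ V = (V_i), \quad V_i = \langle \phi_i, f_a \rangle, \quad i = 1, \ldots, N $$

$$ W = (W_i), \quad W_i = \langle Q\phi_i, q \rangle, \quad i = 1, \ldots, N $$

$$ S = (S_{ij}), \quad S_{ij} = \langle S\phi_i, S\phi_j \rangle, \quad i = 1, \ldots, N, \; j = 1, \ldots, N $$

$$ R = (R_{ij}), \quad R_{ij} = \langle \phi_i, \phi_j \rangle, \quad i = 1, \ldots, N, \; j = 1, \ldots, N $$

$$ Q = (Q_{ij}), \quad Q_{ij} = \langle Q\phi_i, Q\phi_j \rangle, \quad i = 1, \ldots, N, \; j = 1, \ldots, N $$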

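Suppose $S$ and $Q$ are linear operators. Then we can reformulate the criterion as

$$ \hat{J}(\theta) = \frac{1}{l} (\Phi\theta - Y)^T (\Phi\theta - Y) + \lambda \left\langle \sum_{n=1}^{N} \theta_n S\phi_n, \sum_{n=1}^{N} \theta_n S\phi_n \right\rangle + \mu \left\langle \sum_{n=1}^{N} \theta_n \phi_n - f_a, \sum_{n=1}^{N} \theta_n \phi_n - f_a \right\rangle + \gamma \left\langle \sum_{n=1}^{N} \theta_n Q\phi_n - q, \sum_{n=1}^{N} \theta_n Q\phi_n - q \right\rangle $$

$$ = \frac{1}{l} \theta^T \Phi^T \Phi \theta - \frac{2}{l} \theta^T \Phi^T Y + \frac{1}{l} Y^T Y + \lambda \theta^T S \theta + \mu \left( \theta^T R \theta - 2 \theta^T V + \langle f_a, f_a \rangle \right) + \gamma \left( \theta^T Q \theta - 2 \theta^T W + \langle q, q \rangle \right) $$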

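It follows that $\theta = (\theta_1, \ldots, \theta_N)^T$ is a global minimum of $\hat{J}$, provided

$$ \left( \frac{1}{l} \Phi^T \Phi + \lambda S + \mu R + \gamma Q \right) \theta = \frac{1}{l} \Phi^T Y + \mu V + \gamma W $$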

This set of equations will never be over-determined, so a solution will always exist. Moreover, it will be unique under very reasonable conditions, since all terms on the left-hand side of the equations are positive semi-definite, and uniqueness follows if at least one of them is positive definite. One possibility is that $\mathrm{rank}(\Phi^T \Phi) = N$, which is related to the classical condition that one needs at least as many observations as unknown parameters. However, the introduction of prior knowledge allows us to relax this condition and use an over-parameterized approximate representation of the model. For example, a default model will always make the problem well posed, since the matrix $R$ is positive definite when the basis-functions are linearly independent. In addition, there should be some restrictions on the choice of basis-functions. These restrictions are related to the fact that the operators $S$ and $Q$ may have null-spaces that may make certain functions "disappear" from the equations that determine the parameters, giving a singular set of equations. A typical example is a differential operator that will map polynomials of sufficiently low order to the zero function.
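The linear equations for the parameter vector can be assembled and solved directly in a small one-dimensional case. The following sketch makes several assumptions that are not part of the text: the basis is a look-up table with linear interpolation (piecewise-linear "hat" functions on uniformly spaced knots), the non-smoothness matrix is approximated by squared second differences of neighbouring table values, the inner products are computed with a simple quadrature rule, and there are no soft constraints. All function names are hypothetical.

```python
def hat(i, x, knots):
    """Piecewise-linear basis function centred on knots[i] (uniform spacing)."""
    h = knots[1] - knots[0]
    return max(0.0, 1.0 - abs(x - knots[i]) / h)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def fit(psi, y, f_a, knots, lam, mu, n_quad=201):
    """Minimize data mismatch + lam * non-smoothness + mu * default-model mismatch."""
    N, l = len(knots), len(y)
    lo, hi = knots[0], knots[-1]
    grid = [lo + (hi - lo) * k / (n_quad - 1) for k in range(n_quad)]
    w = (hi - lo) / (n_quad - 1)  # quadrature weight per grid point
    Phi = [[hat(i, p, knots) for i in range(N)] for p in psi]
    B = [[hat(i, g, knots) for i in range(N)] for g in grid]
    # Gram matrix R and default-model vector V by quadrature
    R = [[w * sum(row[i] * row[j] for row in B) for j in range(N)] for i in range(N)]
    V = [w * sum(row[i] * f_a(g) for row, g in zip(B, grid)) for i in range(N)]
    # Non-smoothness matrix from second differences (an approximation)
    S = [[0.0] * N for _ in range(N)]
    for k in range(1, N - 1):
        stencil = {k - 1: 1.0, k: -2.0, k + 1: 1.0}
        for i, di in stencil.items():
            for j, dj in stencil.items():
                S[i][j] += di * dj
    A = [[sum(Phi[t][i] * Phi[t][j] for t in range(l)) / l
          + lam * S[i][j] + mu * R[i][j] for j in range(N)] for i in range(N)]
    b = [sum(Phi[t][i] * y[t] for t in range(l)) / l + mu * V[i] for i in range(N)]
    return solve(A, b)
```

Note that the system is solvable here even though there are fewer observations than parameters, because the default-model term contributes a positive definite matrix. With more than a handful of parameters, a library linear solver would be the natural replacement for the hand-written elimination.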

It is evident that the computational complexity may be a considerable problem even for solving this set of linear equations. The major computational cost involved is computing the inner products. In general, the basis-functions or operators may not always be so nice that this can be done analytically, and numerical integration suffers from the curse of dimensionality. Hence, the computational complexity may grow exponentially with the dimension of the information space. Of course, with non-linear parameterization, operators, or constraints, the problem is even more apparent, and the computational complexity will reduce the applicability of the approach.

Tuning the Criterion

In the problem formulation section we formulated the semi-empirical modeling and identification problem as an optimization problem, and in the two preceding sections we discussed the solution to this problem. However, it is the choice of criterion that is the most challenging and interesting aspect in this optimization approach, and tuning of the criterion parameters $\lambda$, $\mu$ and $\gamma$ is the topic of this section. Viewed in another way, the problem we discuss in this section is basically to decide how much we trust the empirical data and the various pieces of prior knowledge relative to each other. Obviously, a lot can be said about this general problem, but we do not have space to give it the in-depth discussion it deserves. We will focus on the case when we always require a smooth model, and give precedence to the empirical data relative to the prior knowledge for operating conditions where there are conflicts between the empirical data and the prior knowledge. Moreover, we always rely on the prior knowledge under operating conditions where data is lacking.

If the available empirical data sequence is viewed as the realization of some stochastic process, and a model is identified on the basis of this data, then the expected squared prediction error can be decomposed into a systematic component (bias) and a random component (variance). The systematic component is due to the fact that the prior knowledge and empirical data set can be incorrect or incomplete. The random component is due to unpredictable noise, but most importantly caused by the fact that a finite number of observations is in general not sufficient to identify the best model in the model set. The best model has a complexity that gives the best balance between bias and variance. It is clear that the model complexity should depend on the information contents in the empirical data sequence, and the prior knowledge available. We will later see that the complexity of a model in some cases can be explicitly related to the effective number of parameters in the model, a quantity that depends on $\lambda$, $\mu$ and $\gamma$. Hence, it makes sense to let $\lambda$, $\mu$ and $\gamma$ depend on the empirical data. Assume that we want to find criterion parameters $\lambda$, $\mu$ and $\gamma$ such that the model minimizing $J$ will be the one that has the best expected squared prediction error. It is easy to see that the performance of the model will indeed depend on these parameters. For example, a too small $\lambda$, $\mu$ or $\gamma$ may lead to over-fitting and poor performance when extrapolating, since too much weight will be given to the data. Likewise, a too large $\lambda$, $\mu$ or $\gamma$ may give too little weight on the empirical data, and a model that is biased under operating conditions where the prior knowledge is incorrect or incomplete.

This suggests a hierarchical optimization approach where the criterion parameters $\lambda$, $\mu$ and $\gamma$ are optimized in an outer loop using a criterion $V(\lambda, \mu, \gamma)$ that reflects the expected squared prediction error on future data. The main problem is to find such a criterion $V(\lambda, \mu, \gamma)$. In a stochastic framework, exact computation of this criterion requires the joint probability distribution for all stochastic variables in the model to be known. Clearly, this is unrealistic, and we therefore consider some practical alternatives that are based on empirical data and mainly assume ergodicity of the stochastic processes.

Suppose another independent data sequence

$$ ((u(l+1), y(l+1)), (u(l+2), y(l+2)), \ldots, (u(l+L), y(l+L))) $$

with $L$ input-output pairs is available. Then an estimate of $V(\lambda, \mu, \gamma)$ is given by

$$ V_S(\lambda, \mu, \gamma) = \frac{1}{L} \sum_{t=l+1}^{l+L} \varepsilon^2(t; \hat{\theta}(\lambda, \mu, \gamma)) $$

where the prediction error $\varepsilon(t; \hat{\theta}(\lambda, \mu, \gamma))$ is computed with a parameter vector $\hat{\theta}(\lambda, \mu, \gamma)$ that solves the finite-dimensional optimization problem. In other words, an independent data sequence is used to test the prediction performance of the identified model, which is reasonable if this data sequence reflects the underlying distribution of the future data. This method is simple, but suffers from the major drawback that a larger amount of empirical data is required.

To overcome this drawback, one can re-use the original data sequence using cross-validation. The cross-validation criterion is an estimate of $V$ given by

$$ V_{CV}(\lambda, \mu, \gamma) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; \hat{\theta}_t(\lambda, \mu, \gamma)) $$

where $\hat{\theta}_t(\lambda, \mu, \gamma)$ is the parameter vector that minimizes the criterion $\hat{J}$ modified such that there is no weight on the residual at time $t$. This means that $l$ models are fitted by removing one residual at a time from the criterion, and the prediction performance of each model is tested on the single observation that was removed from the criterion. More details can be found in e.g. (Wahba; Stoica et al.).
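The outer loop of the hierarchical approach can be illustrated on a deliberately tiny problem. This sketch is an assumption-laden toy, not the general procedure: the model is a single constant, the only penalty is the mismatch to a constant default model with weight `mu`, and the outer loop is a search over a fixed grid using leave-one-out cross-validation. All names are hypothetical.

```python
def fit_constant(ys, f_a, mu, l):
    """Minimize (1/l) * sum((y - theta)^2) + mu * (theta - f_a)^2 over theta."""
    return (sum(ys) / l + mu * f_a) / (len(ys) / l + mu)


def cv_score(y, f_a, mu):
    """Leave-one-out cross-validation estimate of the squared prediction error."""
    l = len(y)
    total = 0.0
    for t in range(l):
        rest = y[:t] + y[t + 1:]  # no weight on the residual at time t
        theta_t = fit_constant(rest, f_a, mu, l)
        total += (y[t] - theta_t) ** 2
    return total / l


def select_mu(y, f_a, grid):
    """Outer loop: pick the default-model weight with the best CV score."""
    return min(grid, key=lambda mu: cv_score(y, f_a, mu))
```

When the default model agrees with the data, the search favours a large weight; when the default model is badly wrong in the region covered by the data, the search drives the weight towards zero.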


[Figure: flowchart of the hierarchical optimization. Inner loop: "Find a θ that reduces Ĵ(θ)", followed by the test "Is the best θ found?" (YES/NO). Outer loop: "Find λ, μ, γ that reduces V(λ, μ, γ)", followed by the test "Is the best λ, μ, γ found?" (YES/NO).]

Figure: Hierarchical optimization approach. The inner loop can be viewed as parameter identification, while the outer loop can be viewed as a kind of structure identification as it estimates the optimal model complexity.


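When $\varepsilon(t; \theta)$ depends linearly on $\theta$, a suitable criterion is a modified version of the Final Prediction Error criterion of Akaike

$$ V_{FPE}(\lambda, \mu, \gamma) = \frac{1 + d(\lambda, \mu, \gamma)/l}{1 - d(\lambda, \mu, \gamma)/l} \, V(\hat{\theta}(\lambda, \mu, \gamma)) $$

where $d(\lambda, \mu, \gamma) = \mathrm{trace}(A(\lambda, \mu, \gamma))$ is interpreted as the effective number of parameters in the model (degrees of freedom), and

$$ A(\lambda, \mu, \gamma) = \Phi^T \Phi \left( \Phi^T \Phi + \lambda l S + \mu l R + \gamma l Q \right)^{-1} $$

$$ V(\theta) = \frac{1}{l} \sum_{t=1}^{l} \varepsilon^2(t; \theta) $$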
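As a small worked illustration of the effective number of parameters (a special case chosen for this example, not taken from the text), suppose the model has a single parameter, $N = 1$, and only the default-model penalty is active, with weight $\mu$, while the smoothness and soft-constraint weights are zero. Writing $a = \Phi^T \Phi = \sum_{t=1}^{l} \phi_1^2(\psi(t-1))$ and $r = R_{11} = \langle \phi_1, \phi_1 \rangle$, the trace reduces to a scalar:

$$ d(\mu) = \frac{a}{a + \mu l r} $$

so that $d = 1$ for $\mu = 0$ (one free parameter) and $d \to 0$ as $\mu \to \infty$ (the parameter is fully determined by the default model). If, for instance, $a = l r$, then $d = 1/(1 + \mu)$ and

$$ V_{FPE} = \frac{l + d}{l - d} \, V(\hat{\theta}) = \frac{l(1 + \mu) + 1}{l(1 + \mu) - 1} \, V(\hat{\theta}) $$

which shows how a heavier weight on prior knowledge reduces the complexity penalty applied to the empirical prediction error.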

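When the prediction error depends non-linearly on $\theta$, the FPE criterion for regularized models of Larsen can be applied

$$ V_{FPER}(\lambda, \mu, \gamma) = \frac{1 + d_1(\lambda, \mu, \gamma)/l}{1 - 2 d_1(\lambda, \mu, \gamma)/l + d_2(\lambda, \mu, \gamma)/l} \, V(\hat{\theta}(\lambda, \mu, \gamma)) $$

where

$$ d_1(\lambda, \mu, \gamma) = \mathrm{trace}(A(\lambda, \mu, \gamma)), \qquad d_2(\lambda, \mu, \gamma) = \mathrm{trace}(A(\lambda, \mu, \gamma) A(\lambda, \mu, \gamma)) $$

$$ A(\lambda, \mu, \gamma) = \frac{\partial^2 V(\hat{\theta})}{\partial \theta^2} \left( \frac{\partial^2 V(\hat{\theta})}{\partial \theta^2} + \frac{\partial^2 \hat{\Omega}(\hat{\theta})}{\partial \theta^2} \right)^{-1} $$

$$ \hat{\Omega}(\theta) = \lambda \|\hat{S}(\theta)\|^2 + \mu D^2(f(\cdot; \theta), M_a) + \gamma \|\hat{Q}(\theta) - q\|^2 $$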

Next, consider the case when the parameters are constrained. Let the parameter estimate that solves the constrained optimization problem be a regular point denoted $\hat{\theta}$, and let $c$ be the number of linearly independent equality and active inequality constraints at $\hat{\theta}$. Let us decompose $\theta$ into a c-dimensional sub-vector $\theta_1$ and an $(N - c)$-dimensional sub-vector $\theta_2$ such that $\theta = (\theta_1^T, \theta_2^T)^T$, and in a neighborhood of $\hat{\theta}$ there exists a function $g$ such that the constraints are satisfied on the manifold defined by $\theta_1 = g(\theta_2)$. The existence of such a decomposition and function $g$ is guaranteed by the Implicit Function Theorem, e.g. (Luenberger). In other words, in a neighborhood of $\hat{\theta}$, the optimization problem can be reformulated as an unconstrained problem on an $(N - c)$-dimensional subspace of the parameter space. When regularization is not applied, it follows directly that the effective number of parameters equals $N - c$. However, in the general case, one must explicitly know the "implicit function" $g$ in order to substitute $\theta = (g(\theta_2)^T, \theta_2^T)^T$ into $\hat{J}(\theta)$. It is easy to see that a linear approximation to $g$ that is exact in the point $\hat{\theta}$ will be sufficient. Fortunately, such a linearization can usually be found from the equations for the constraints.

Clearly, there will be situations when the automatic procedure for choosing $\lambda$, $\mu$ and $\gamma$ suggested in this section will lead to spurious results. A trivial example is when the given default model is significantly wrong over the whole range of operating conditions where there are empirical data available, but adequate for other operating conditions. Then it is optimal to choose $\mu = 0$, unless the inner product is chosen to give zero weight on the model near all the data points, in which case $\mu$ can be arbitrary. Hence, one reason why the algorithm may fail is in general that the future data distribution will be different from the distribution of the empirical data used for identification. As discussed before, the automatic choice of $\lambda$, $\mu$ and $\gamma$ is based on the assumption that the empirical data are more reliable than the prior knowledge. Such an assumption must be made in order to resolve conflicts between the different pieces of knowledge and data, and is controversial both in the philosophy of science and in engineering practice. Still, even if the algorithm may fail when searching for $\mu$ or $\gamma$, the choice of $\lambda$ may still be sensible, since smoothness may be a uniform property of the system, i.e. uniformly true for all operating conditions, while the default model and constraints may be incorrect under some operating conditions. In summary, we do not recommend a completely automated choice of tuning parameters, in particular not of $\mu$ and $\gamma$.

The estimated relative weights on the different terms in the criterion may be interpreted as indicators of the relative correctness or relevance of the corresponding prior knowledge or empirical data. As discussed above, such interpretations should be made carefully, since knowledge that is incorrect or irrelevant (or both) for all the operating conditions covered by the empirical data may still be adequate for other operating conditions. One must always keep the information content of the data set in mind. However, if large amounts of data are available, and the data cover all operating conditions, such interpretations may be valuable.

Finally, it should be mentioned that it is possible to include the order parameters ny, nu, and ne together with the criterion tuning parameters in the search.

�� Simulation Example� pH�neutralization

The purpose of this simulation example is to illustrate the effect of imprecise, incomplete, and incorrect prior knowledge and empirical data on the identified model, and how this framework can handle the different kinds of prior knowledge and empirical data.

The system we will study is a pH neutralization tank. The system has one input (the base flow rate u(t)), one output (the measured pH y(t)), and no disturbances or noise. There are three influent streams and one effluent stream:


� In�uent bu�er stream Q� �NaHCO�

� In�uent base stream Q �NaOH and traces of NaHCO�


We use a model developed by Hall and Seborg �� to simulate the �true system��see section ��

The available prior knowledge and empirical data is supposed to be as follows�


1. Data sequence from a limited region of operation, cf. Fig. ��a. The sampling interval is �� s, and the sequence contains �� observations.

2. Steady-state data from five operating points, pH � f �� g.

3. Simplified mass-balance model where the buffer stream is neglected, cf. section ��.

4. pH range, in other words the fact that � � pH � ��.

5. Smoothness of the system behavior.

6. Open-loop stability of the system.

We will study the problem of identifying a model of the NARX form

$$ y(t) = f(y(t-1), u(t-1)) + e(t) $$

The interesting operating range is de�ned by � � y � �� and � � u � �� ml�s�Hence� ! � $�� %� $�� %� Notice that the available data sequence only covers asmall part of !� cf� Fig� ��a� while the model will be partially validated using adata sequence that covers a much larger part of !� cf� Figs� ��b and ��b� Theunknown non�linear function f is represented by a ��dimensional lookup�table ��ij�with linear B�spline interpolation on this domain� The choice of representation isof minor importance� and the lookup�table is chosen because of its simplicity� The� � entries in the lookup�table are considered as unknown parameters�
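To make the lookup-table representation concrete, the following is a minimal sketch of a two-dimensional table with linear (bilinear) interpolation. The grid values, the table size, and the function names `interp_weights` and `lookup` are illustrative assumptions and are not taken from the thesis. The point of the sketch is that the interpolated output is linear in the table entries, which is what makes the entries convenient as unknown parameters.

```python
from bisect import bisect_right

def interp_weights(grid, value):
    """Return (i, w) such that value = (1 - w) * grid[i] + w * grid[i + 1].

    Values outside the grid are clamped to its end points.
    """
    value = min(max(value, grid[0]), grid[-1])
    i = min(bisect_right(grid, value) - 1, len(grid) - 2)
    w = (value - grid[i]) / (grid[i + 1] - grid[i])
    return i, w

def lookup(theta, y_grid, u_grid, y, u):
    """Bilinear interpolation in the table theta[i][j] over (y_grid[i], u_grid[j])."""
    i, wy = interp_weights(y_grid, y)
    j, wu = interp_weights(u_grid, u)
    return ((1 - wy) * (1 - wu) * theta[i][j]
            + wy * (1 - wu) * theta[i + 1][j]
            + (1 - wy) * wu * theta[i][j + 1]
            + wy * wu * theta[i + 1][j + 1])

# Hypothetical 3 x 3 table; grids and entries are illustrative only.
y_grid = [2.0, 7.0, 12.0]
u_grid = [0.0, 15.0, 30.0]
theta = [[2.0, 3.0, 4.0],
         [6.0, 7.0, 8.0],
         [10.0, 11.0, 12.0]]

at_node = lookup(theta, y_grid, u_grid, 7.0, 15.0)      # exactly a table entry
between = lookup(theta, y_grid, u_grid, 4.5, 7.5)       # average of four entries
```

A one-step prediction of the NARX model would then be `lookup(theta, y_grid, u_grid, y_prev, u_prev)`, with the previous output and input as arguments.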

We choose a non�smoothness penalty of the form

jjSf jj� � �TS� �Z��

�� f��

�� d�which is based on an easy�to�compute �nite�di�erence approximation of the Hes�sian

��f��ij�

�� i��j��i�j��i��j

��

�i��j��i�j��i��j��i�j��

�i��j��i�j��i��j��i�j��

��

�i�j��i�j��i�j��

��

�

where f��ij� � �ij� Furthermore� &� � �� and &� � �� are the uniform dis�cretization intervals in the lookup�table� One motivation behind this choice is thatpenalty on the Hessian can be interpreted as penalty on deviations from a linearmodel structure� This is reasonable� since a linear model structure is perhapsthe simplest possible that may be adequate� Another motivation is that smallcurvature intuitively is equivalent to smoothness�
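The idea of penalizing a finite-difference approximation of the Hessian can be sketched as follows. The exact stencil and weighting used in the thesis cannot be recovered from the text, so this sketch assumes standard central differences on interior table nodes and sums the squared Frobenius norm of the approximate Hessian; the function name `roughness` and the arguments `dy`, `du` (the two discretization intervals) are hypothetical.

```python
def roughness(theta, dy, du):
    """Sum over interior nodes of the squared finite-difference Hessian.

    theta[i][j] is the table entry at grid node (i, j); dy and du are the
    uniform discretization intervals along the two table axes.
    """
    total = 0.0
    for i in range(1, len(theta) - 1):
        for j in range(1, len(theta[0]) - 1):
            f_yy = (theta[i + 1][j] - 2 * theta[i][j] + theta[i - 1][j]) / dy ** 2
            f_uu = (theta[i][j + 1] - 2 * theta[i][j] + theta[i][j - 1]) / du ** 2
            f_yu = (theta[i + 1][j + 1] - theta[i + 1][j - 1]
                    - theta[i - 1][j + 1] + theta[i - 1][j - 1]) / (4 * dy * du)
            total += f_yy ** 2 + 2 * f_yu ** 2 + f_uu ** 2
    return total

# A table sampled from a plane (a linear model) has zero penalty.
plane = [[1.0 + 2.0 * i + 3.0 * j for j in range(4)] for i in range(4)]
# Adding a single bump introduces curvature and therefore a positive penalty.
bump = [row[:] for row in plane]
bump[1][2] += 1.0
```

Because every term is a squared linear combination of table entries, the penalty is a quadratic form in the parameter vector, consistent with writing it as a matrix-weighted norm of the parameters.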

In the following we will identify five models based on different combinations of the available knowledge and data, and as an alternative model structure we try a neural network for comparison.


[Figure: Simulated data sequences. a) Data sequence used for model identification; b) data sequence used for model validation. Each panel shows the base flow-rate and the pH versus time [min].]

Model 1. Knowledge: Data sequence, smoothness, stability

Open loop stability is represented as a constraint in the problem

"J��

l

lXt��

�y�t� � ��t � ��T �� TS�


[Figure: Empirical distribution of a) identification data, and b) validation data. Axes: pH and base flow rate Qb.]

subject to

j�i�j � �i��jj � &�� for all i� j

This optimization problem is solved using quadratic programming (MATLAB function qp). Notice that the use of a lookup-table with linear interpolation allows us to reduce the stability constraint to a finite number of constraints on the parameters. The generalized FPE criterion, with the effective number of parameters corrected for the active constraints, is minimized using a line search algorithm (MATLAB function fmin) to find the smoothing parameter � � ��, corresponding to d� � �� effective parameters. Notice that the prior knowledge about open loop stability is of great importance, since otherwise the model will be unstable for a wide range of � values.
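The reduction of the stability requirement to finitely many parameter constraints, and the counting of active constraints, can be illustrated with a small sketch. It assumes that the constraint bounds the difference between neighboring table entries along the output axis by the discretization interval, i.e. `|theta[i+1][j] - theta[i][j]| <= dy`; the exact index convention and inequality in the thesis are not recoverable from the text, and the function name `slope_constraints` is hypothetical.

```python
def slope_constraints(theta, dy, tol=1e-9):
    """Classify the constraints |theta[i+1][j] - theta[i][j]| <= dy.

    Returns (violated, active): the number of constraints that are broken,
    and the number that hold with equality (within the tolerance tol).
    """
    violated = 0
    active = 0
    for i in range(len(theta) - 1):
        for j in range(len(theta[0])):
            gap = abs(theta[i + 1][j] - theta[i][j]) - dy
            if gap > tol:
                violated += 1
            elif gap >= -tol:
                active += 1
    return violated, active

# Illustrative 3 x 2 table with dy = 1.0.
theta = [[0.0, 0.0],
         [1.0, 0.5],
         [3.0, 1.0]]
violated, active = slope_constraints(theta, dy=1.0)

# Without regularization, the effective number of parameters is N - c,
# where c counts the active constraints at a feasible estimate.
n_params = len(theta) * len(theta[0])
effective_without_regularization = n_params - active
```

In this toy table one constraint is active and one is violated, so the table would not be a feasible solution; the count of active constraints is only meaningful for a feasible estimate.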

Model 2. Knowledge: Data sequence, pH range, smoothness

Notice that a bounded pH range automatically implies BIBO-stability of the model. Here we solve the minimization problem

"J��

l

lXt��

�y�t� � ��t � ��T �� TS�

subject to

� � �ij � �� for all i� j

This quadratic programming problem is �nite�dimensional� and solved using theMATLAB function qp� Similar to Case �� a minimization of the generalized FPEcriterion gives � � �� which gives d� � �� e�ective parameters�


Model 3. Knowledge: Simplified mass balance

No identi�cation is needed� cf� the model in section �� Notice that this modelcontains very limited knowledge about the chemistry in the tank� since the bu�erstream is neglected� Hence� the prior knowledge assumed is signi�cantly less thanthe �true system�� and the model is considerably simpler�

Model 4. Knowledge: Data sequence, simplified mass balance, smoothness

The default model is stable, so the stability constraint is not required. The criterion is now

"J��

l

lXt��

�y�t� � ��t � ��T �� TS� �D�f��' ��Ma�

where

D�f��' ��Ma� ��

��

�Xt��

�ya�t�� Ta �t � ��t�

we have de�ned �a�t� � �ya�t�� ua�t��T � and the data sequence

��ua�� ya�� ua�� ya�� ua�� ya��

is generated by simulating the model Ma� The input sequence is chosen to coveras much of the operating range of the process as possible� The weighting functionis de�ned as ��t� � �� In this case a search for � and � using the generalizedFPE criterion fails� The reason is simply that the default model is signi�cantlywrong for all the operating conditions captured by the sequence of empirical dataavailable� Hence� the optimization gives � � �� A more intelligent choice ofweighting function ��t� that applies the prior knowledge that the default model ispoor for intermediate pH values� does not resolve the problem� Instead� we havechosen � � �� and � � �� which gives d� � �� This choice gives a smoothtransition to the default model outside the range there are data available from�

Model 5. Knowledge: Steady-state data, simplified mass balance, smoothness

Here we solve the minimization problem

"J�� TS� �D�f��' ��Ma� �

�Xi��

�ysteady�state�ui�� yi��

where the default-model mismatch term D is the same as above. Since there is no data sequence available, we make the somewhat arbitrary choice � � ��. The steady-state model output is computed by simulating the model until a steady state is reached. The model's steady-state output is a non-linear function of the steady-state input, and the steady-state data are represented as five soft constraints. This optimization problem is solved using the MATLAB function fminu.
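Computing the steady-state output by simulation, and turning steady-state data into soft constraints, can be sketched as below. The toy model `toy`, the initial value, the tolerance, and the function names are assumptions made for illustration; the thesis model is the lookup-table NARX model, not this linear example.

```python
def steady_state(f, u, y0=7.0, tol=1e-9, max_steps=10_000):
    """Iterate y(t) = f(y(t - 1), u) with constant input u until y settles."""
    y = y0
    for _ in range(max_steps):
        y_next = f(y, u)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("no steady state reached")

def soft_constraint_penalty(f, steady_data):
    """Sum of squared mismatches between simulated and measured steady states."""
    return sum((steady_state(f, u_i) - y_i) ** 2 for u_i, y_i in steady_data)

def toy(y, u):
    # Hypothetical stable first-order model with steady state y = 0.2 * u.
    return 0.5 * y + 0.1 * u

y_ss = steady_state(toy, 10.0)
penalty = soft_constraint_penalty(toy, [(10.0, 2.0), (20.0, 4.0)])
```

Since the steady state is obtained by iterating the model, the penalty is in general a non-linear function of the model parameters even when the one-step prediction is linear in them, which is presumably why a general non-linear optimizer is needed for this case.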


Models 6 and 7. Sigmoidal Neural Network

For comparison, we have also identified a feed-forward neural network model of the form ��. Compared to the optimization approach, this corresponds to the a priori choice of a particular parameterized model structure. The parameters are identified using the MATLAB Neural Network Toolbox (Demuth and Beale ��), i.e. the back-propagation algorithm with momentum term, using the identification data sequence and a least squares criterion. Depending on the number of hidden nodes, the identification algorithm tuning parameters, and the initial parameter estimates, the results vary considerably. We have chosen to present two identified models with the same structure (�� hidden nodes with sigmoidal non-linearities, corresponding to �� parameters) and the same identification algorithm tuning parameters, but different (small random) initial parameter estimates.

Results

The prediction performance of the models is illustrated by simulations using thevalidation data input sequence in Fig� �� Moreover� the major non�linearity inthis process is the steady�state response� or titration curve� which is shown for thedi�erent models in Fig� ��

These curves clearly show that different aspects of the system behavior are captured, depending on the prior knowledge applied. The performance of Model 1 is poor at high and low pH values, since no empirical data and only weak prior knowledge, like smoothness and stability, are relevant under these conditions. Restricting the pH range gives some improvement, cf. Model 2. Likewise, the simplified mass balance that makes up Model 3 gives reasonable predictions only at high and low pH values. Using this model as a default model combined with the empirical data sequence from the intermediate pH range gives a model with reasonable predictions over the full pH range, cf. Model 4. It is also interesting to observe that the default model combined with a small amount of steady-state data from the intermediate pH range, but no dynamic data, does not give poorer prediction, cf. Model 5.

A comparison with the neural network models (Models 6 and 7) is also quite interesting. The two neural network models perform equally well in predicting the identification data, but perform considerably differently on the validation data and have quite different titration curves. It should be mentioned that these differences are typical, and more pronounced differences (on the validation data) appear when the model structure and identification algorithm tuning parameters vary as well. Notice that it is most fair to compare Models 6 and 7 with Model 1, since they are all based on weak prior knowledge.

It seems reasonable to conclude that this example illustrates that the incorporation of increasing amounts of prior knowledge may improve the quality of the identified model. Furthermore, the optimization approach seems more adequate than the application of a black-box model structure like a neural network in this application, since the empirical data do not cover the operating range sufficiently. This supports our initial claim that the choice of model structure in empirical modeling is a difficult and critical problem.

[Figure: Simulation of the models when the validation data input sequence is applied to the models. Panels: Model 1, Model 2, Model 3, Model 4, Model 5, and Models 6 (dashed) and 7 (dashed-dotted); pH versus time [h]. The solid curves are the validation data pH, while the dashed curves are simulated pH using the models. For Models 6 and 7 the dashed and dashed-dotted curves are with different initial parameter estimates.]

This example also shows that the automatic choice of criterion tuning parameters is feasible only if the available data cover the interesting operating conditions sufficiently well. In particular, the algorithm gives reasonable choices for Models 1 and 2, but fails for Model 4.


[Figure: Titration curves (steady-state response) of the models. Panels: Model 1, Model 2, Model 3, Model 4, Model 5, and Models 6 (dashed) and 7 (dashed-dotted); pH versus base flow rate Qb. The solid curves are the system's true titration curve, and the dashed curves are the models. For Models 6 and 7 the dashed and dashed-dotted curves are with different initial parameter estimates.]

� Discussion

We have formulated the semi-empirical modeling problem as an optimization problem on a function space. In addition to the standard penalty on mismatch between model prediction and the empirical data, the criterion also penalizes mismatch between the model and various pieces of prior knowledge. Furthermore, prior knowledge can enter as constraints in the optimization problem. We have solved a simplified optimization problem in the function space, and discussed finite-dimensional approximations to the general solution. The approach is interpreted in the context of regularization, and it is shown how different pieces of prior knowledge will reduce the effective number of parameters in the model. The problem of weighting the various pieces of knowledge and data is approached similarly to the problem of choosing a regularization parameter, and some criteria are suggested.

Our work provides a unified framework for incorporation of prior knowledge in non-linear system identification. It is primarily based on (Thompson and Kramer ��, Girosi et al. ��). In particular, the framework provides an attractive alternative to the direct specification of a parametric model structure, since the formulation of an optimization criterion is clearly a more high-level and transparent procedure that requires less guesswork.

However, there are at least two major drawbacks with this approach:

1. The computational complexity may limit its applicability. This problem will in particular be apparent when the dimension of the information space is high, or when non-linear operators or parameterizations are involved. The problem may be reduced with the aid of tailored optimization algorithms, by solving the problem analytically as far as possible, and by trying to simplify the various elements involved. A library of flexible and computationally efficient building blocks for representing various pieces of prior knowledge and information would be useful.

2. The resulting model need not be transparent, in the sense that it cannot easily be interpreted and analyzed, since it is non-parametric. The only possibility is to interpret the model in terms of the prior knowledge and empirical data applied. However, this may be of limited help for many applications. We would like to stress that all black-box model structures suffer from the same drawback, perhaps to an even larger extent, since the prior knowledge is often applied in a less transparent manner.

The major advantage of the approach is its flexibility with respect to incorporation of prior knowledge in various forms. This flexibility can possibly be extended and improved in several directions:

1. The extension to the MIMO case is conceptually straightforward.

2. We believe that extensions to model representations other than NARMAX models (like state-space models) are possible.

3. A more accurate model (and better interpretation of estimated criterion tuning parameters) may be found if the terms in the criterion are split into different terms that are valid under different operating conditions.

4. As mentioned before, a library of building blocks would be useful. Such a library might contain various model space metrics, non-smoothness operators, operators for representing steady-state data, stability etc.

5. More work is needed to develop practical engineering procedures for the coding of prior knowledge into the criterion and constraints. One can imagine a high-level expert system based user interface that automatically codes prior knowledge like "We have good confidence in the default model when the pH is either high or low, but no confidence in it for intermediate pH values" into the optimization criterion. This elevates the modeling problem to an even higher level where the optimization formulation is hidden from the user.


It is commonly stated that the main benefit of "modern" non-linear empirical modeling approaches like radial basis-functions, wavelets, and neural networks is that essentially no prior knowledge is required. This is to some extent true, but one must remember that the price to pay may be uncertainty about what set of basis-functions to apply, and a more or less ill-conditioned identification problem, which will typically lead to poor extrapolation capabilities of the model, cf. the simulation example in section ��. Unfortunately, it has also been common practice that the lack of transparency of such black-box approaches has led to neglect of the available prior knowledge, and the resulting ill-conditioning has been handled implicitly using regularization methods like stopped training (Sjöberg and Ljung ��, Ljung, Sjöberg and McKelvey ��). However, simple explicit regularization methods like default models (Thompson and Kramer ��, Su et al. ��, Kramer et al. ��, Johansen and Foss ��c), constraints (Joerding and Meador ��, Thompson and Kramer ��), penalty on parameter magnitude (Weigend, Huberman and Rumelhart ��), and smoothness regularization (Bishop ��, Girosi et al. ��) have also been suggested. The optimization framework presented here will be useful for regularizing such complex parameter identification problems, and is useful as a complementary technique applied together with other approaches such as neural networks, in order to reduce the sensitivity with respect to the a priori choice of model structure.

Chapter

Operating Regime based Model Predictive Control

A major application area for non-linear dynamic models is dynamic optimization, or model predictive control. In this chapter we will present a semi-realistic simulation example, where the operating regime based modeling framework is used to develop a model that is used in a model predictive controller applied to a batch fermentation reactor. The performance with this model is compared to that of two other models.

The remainder of this chapter is structured as follows. First, we present a brief survey of the batch process control and dynamic optimization literature, with emphasis on the intersection of these fields. Second, the predictive control problem is defined. This is, essentially, nonlinear state-space MPC utilizing the operating regime based modeling framework. The method is applied to a simulated batch fermentation process. Model development is emphasized. MPC based on local modeling is compared to the use of a conventional nonlinear state-space model and a linear model. A discussion concludes the chapter, which is based on (Foss, Johansen and Sørensen ��).

�� Batch Processes and Dynamic Optimization

Dynamic optimization has for decades been a basis for control. During the ��s and ��s, closed-form solutions to the optimization problem were the driving force for much research activity. The linear-quadratic controller seemed to provide a solution to many multi-variable control problems. Experience has shown that this approach has serious shortcomings. First, the solution assumes no constraints on the states and control inputs. Second, the linear-quadratic controller need not be robust when estimator dynamics are included for state estimation, as shown by Doyle ��.


Within the domain of optimization-based process control research, the interest and the successful industrial applications have over the last decade focused on model-based predictive control (MPC); see (Garcia, Prett and Morari ��) and (Rawlings, Meadows and Muske ��) for comprehensive surveys of this field. The idea is to solve the optimization problem at a given time instant by utilizing the most recent process measurements. The optimization problem is defined on some horizon, and a control trajectory is computed on this horizon. Only the first part of the control trajectory is applied to the process, and the entire optimization is repeated at the next sampling instant, again utilizing process measurements up until this new time instant. This methodology was first presented by Cutler and Ramaker ��, who minimize a quadratic criterion weighting the control errors and changes in the control inputs. They use a linear moving-average model for prediction. More importantly, this optimization-based controller can handle constraints on both the control inputs and the controlled variables.

The large majority of MPC applications are based on linear models. An important reason for this is that linear MPC, like other linear controllers, can handle processes with weak non-linearities. Although the use of MPC within the process industries has been extensive, it has been mainly limited to continuous processes. Such processes are often characterized by small variations in operating conditions. This is not the case for batch and fed-batch processes, which are also widely used in industry. The difference between batch and fed-batch processes is that no feed is added and no product is removed in a batch process, while this is done in a fed-batch process. Batch processes will typically exhibit large variations in the operating conditions during a batch. Moreover, the product specifications may differ among batches, thereby changing the operating conditions significantly between batches. Johnson �� compares control of continuous processes and batch-type processes by stating that the optimization problem of batch processes is a dynamic problem involving highly nonlinear process models. In contrast, continuous processes can often be optimized by a static formulation.

Control of a batch reactor is usually carried out by a two-step procedure. First, time-varying trajectories for the important variables are derived. This is done either in a heuristic manner based on process insight and experience from earlier operation, or by open-loop optimization based on a model of the batch reactor. Second, the tracking of the variables is accomplished by set-point controllers. A review of the batch reactor control literature, emphasizing fermentation reactors, shows that four questions are in focus: the generation of optimal trajectories, controller design for setpoint control, the computation of on-line estimates of reactor states, and the issue of reactor modeling (Rippin ��, Johnson �� and Jörgensen and Jensen ��).

The work on optimal trajectories is usually based on some non-linear mechanistic model of the process in question. The cost criterion typically includes productivity and input costs, and the optimization problem is solved off-line. Examples of this can be found in (Impe, Nicola, Vanrolleghem, Spriet, Moor and Vandewalle ��) and (Sargantanis, Karim, Murphy and Ryoo ��). There are, however, also examples of the use of on-line optimization, i.e. use of MPC (Lim and Lee ��).

There is a quite extensive literature on setpoint control of batch-type reactors, in particular on the use of adaptive control and feedback linearization. Bastin and Dochain �� and Pomerleau, Perrier and Dochain �� base their controller design on mechanistic models, while recent work by Proll and Karim �� and Keulers �� uses empirical models for nonlinear control.

A major problem when implementing advanced control in biotechnical processes is the lack of good measurements. Hence, research has also focused on on-line estimation of reactor states, particularly substrate and product concentrations. Examples of this are found in (Hengjie, Jianzhong, Shuqing and Jicheng ��, Hilaly, Karim and Linden ��, Keulers ��).

Reactor models for the purpose of control span from linear and nonlinear mechanistic models to linear and nonlinear empirical models. Of particular interest to our work is (Zhang et al. ��). They utilize the fact that different phenomena dominate during different parts of a batch cycle, and construct a set of local models that are valid during different parts of a batch cycle. In addition, they specify a method to select the appropriate model at a given time. The advantage of this concept is that the individual local models are simpler than a global model that can represent the whole batch cycle. A similar approach is proposed by Konstantinov and Yoshida ��.

This work investigates the use of MPC on batch processes using a non-linear model in the controller. There are some reports on this in the literature. Lim and Lee �� describe the use of MPC with on-line parameter estimation. The control trajectory is computed by simultaneous parameter estimation and re-optimization. Garcia �� extends the method introduced by Cutler and Ramaker �� by using a nonlinear model for output prediction. This controller is tested on a polymerization reactor model. A similar approach is presented by Peterson, Hernandez, Arkun and Schork ��.

�� Model Predictive Control

The present work rests on two assumptions. The first assumption is that the performance of MPC depends critically on the predictive capabilities of the underlying process model. The wide operating range of a batch makes the use of a non-linear prediction model particularly interesting. The second assumption is that nonlinear model building is a cumbersome task. Hence, semi-empirical modeling techniques are interesting. This is also motivated by the observation that practically all predictive control loops implemented in industry are based on empirical models.

In this chapter we apply a state-space formulation of the model, cf. Chapter ��. This leads to a nonlinear state-space MPC problem, cf. (Balchen, Ljungquist and Strand ��). For batch and semi-batch processes, this nonlinear state-space MPC problem can be formulated as

maxu�U

�m�x�Ts��

Z Ts

t

l�x�� y�� u�� d�

�


subject to

+x � f�x� u� v�� x�t� given

y � g�x�w�

h�x� y� u� � �

where typically $T_s = \min(t + T, T_f)$. The optimization is defined on some horizon $T$, starting at the present time $t$. Time $t = 0$ defines the start of a batch and $t = T_f$ defines the end of a batch. The end time $T_f$ need not be fixed, and this variable is often optimized, too.

Both equality and inequality constraints can be defined. Soft constraints may be defined as an integrated part of the optimization criterion. Measurements are explicitly mentioned in the formulation to emphasize the fact that it is sometimes natural to optimize with respect to these variables.

To reduce the complexity of the optimization problem, the set of possible control input trajectories U is restricted to a finite-dimensional space. The control input is here parameterized as a piecewise constant function,

u��

�� $t� t &T � �� $t &T� t �&T ��

where $\Delta T$ is the sampling interval. The optimization problem is solved by the use of a non-linear programming algorithm at time instants $t \in \{0, \Delta T, 2\Delta T, \ldots, T_f - \Delta T\}$ using the most recent process measurements. Only the first part of the optimal trajectory is applied as the control input.
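The receding-horizon principle with a piecewise constant input and a horizon that shrinks towards the end of the batch can be sketched as follows. Everything specific here is an assumption made for illustration: the scalar prediction model, the setpoint-tracking cost (minimized here, whereas the formulation above maximizes a criterion), the discrete set of input levels, and the exhaustive search that stands in for a non-linear programming algorithm.

```python
from itertools import product

SETPOINT = 1.0

def model(x, u):
    """Hypothetical one-step prediction model over one sampling interval."""
    return 0.9 * x + 0.5 * u

def predicted_cost(x, moves):
    """Accumulated squared tracking error for a piecewise constant input."""
    cost = 0.0
    for u in moves:
        x = model(x, u)
        cost += (x - SETPOINT) ** 2
    return cost

def mpc_move(x, levels, horizon):
    """Search all input sequences on the horizon; return only the first move."""
    best = min(product(levels, repeat=horizon),
               key=lambda moves: predicted_cost(x, moves))
    return best[0]

def run_batch(x0, n_steps=20, max_horizon=3):
    levels = [k / 10 for k in range(-10, 11)]      # admissible input levels
    x = x0
    for k in range(n_steps):
        horizon = min(max_horizon, n_steps - k)    # never look past batch end
        u = mpc_move(x, levels, horizon)
        x = model(x, u)                            # plant equals model here
    return x

x_final = run_batch(0.0)
```

In this sketch the plant and the prediction model coincide and the state is measured exactly, so it corresponds to the idealized case; with model mismatch or estimated states the same loop applies, but the prediction used in `mpc_move` would differ from the plant update.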

A major problem with the above formulation is its dependence on the initial states x(t). In practice, these are not readily available. Hence, some estimate of the states must be computed. This may be accomplished by state estimation or an observer.

Since most batch-type processes are highly non-linear, there are two potential advantages in applying nonlinear MPC for batch processes, compared to linear MPC. First, the predictive capability on the optimization horizon may improve by utilizing a nonlinear as opposed to a linear model. Second, the states x(t) may be estimated with improved accuracy by the use of a nonlinear model.

�� Simulation Example: Batch Fermentation Reactor

A semi-realistic simulation study of a batch fermentation process illustrates the ideas presented in this chapter. In this study, five controllers are formulated, based on the above formulation. All controllers utilize the same performance criterion and constraints, equal control input parameterization, and identical optimization algorithms. The controllers differ in the following way:

1. The 1st MPC uses an ideal process model, i.e. the model and the "true system" are identical. Provided the initial values x(0) are correct, this controller gives an upper limit to the performance of MPC.

2. The 2nd MPC uses a nonlinear operating regime based state-space model for both prediction and state estimation.

3. The 3rd MPC uses a global linear state-space model for both prediction and state estimation.

4. The 4th controller is an open-loop optimal controller (OLOC) using the same non-linear model as the 2nd MPC.

5. The 5th controller is also an open-loop optimal controller (OLOC), using the global linear model for prediction.

�� System Description

The simulated �true system� model is adapted from �Ghose and Ghosh � ��and �Rai and Constantindes � �� and describes the fermentation of glucose togluconic acid by the micro�organism Pseudomonas ovalis in a well�stirred batchreactor� The main overall reaction mechanism can be described by




The first reaction is the reproduction of cells, using the substrate glucose and oxygen. The second reaction is the production of gluconolactone, again using glucose and oxygen. This reaction is enzyme-catalyzed by the cells, while the final product, gluconic acid, is formed by the last reaction. The following state-space model is used to simulate the "true system":

+x� � mx�x�x�

Ksx� Kx� x�x�

+x� � vLx�x�

KL x�� Kpx�

+x � Kpx�

+x� � � �

Ys m

x�x�x�Ksx� Kx� x�x�

� ��vL x�x�KL x�

+x� � kla�x� � x�� vL x�x�

KL x��

Y m

x�x�x�Ksx� Kx� x�x�

where x� is the cell concentration� x� is gluconolactone concentration� x is glu�conic acid concentration� x� is glucose concentration and x� is dissolved oxygenconcentration� The parameters m� KL� vL� and Kp depend on temperature andpH� This dependency is given by an interpolated lookup table based on the exper�imental data in �Rai and Constantindes � �� The remaining parameters can be


found in �Rai and Constantindes � �� and �Ghose and Ghosh � �� Initial valuesfor the batch are x�� x�� x�� x�� x�� x�� and x�� x��

The setpoints to the temperature and pH basis-control loops are used as control inputs by the predictive controller. The basis-control loops are assumed to be perfect, which is realistic, since the system dynamics are slow compared to the typical bandwidth of these loops.

Three perfect on-line measurements are available at �� h intervals during the batch: dissolved oxygen concentration, biomass concentration and gluconic acid concentration. There is no noise and there are no disturbances in the simulations.

�� Modeling and Identification

All the local models are chosen to have the same linear structure

$$x(t+1) = a_i + A_i x(t) + B_i u(t) + v(t) \tag{��}$$

where $x = (x_1, \ldots, x_5)^T$, $u = (\mathrm{pH}, \mathrm{temp})^T$, $a_i$ is a vector of unknown parameters, $B_i$ is a $5 \times 2$ matrix of unknown parameters, and $A_i$ has the structure

Ai �

�BBBB�Ai�� Ai

�� Ai��

Ai�� Ai

�� Ai��

� Ai� � � �

Ai�� Ai

�� Ai��

Ai�� Ai

�� Ai��

�CCCCA

The structural zeros follow from a simple mass balance based on the reaction mechanism and the assumption that the reaction rates depend only on x� and x�, in addition to u. This is a natural assumption to make, since these are the rate-limiting components.

By examining the main reaction mechanisms, four operating regimes can be identified, see also section ��. At the beginning of the batch, the production of gluconolactone is small due to the small concentration of cells. Hence, the production of gluconic acid is small due to the low concentration of gluconolactone. This regime is characterized by a relatively high concentration of both dissolved oxygen and glucose. In the intermediate stages of the batch, the production of cells and gluconolactone proceeds at a high rate, and some gluconic acid is produced. There is a relatively low concentration of dissolved oxygen, and the concentration of glucose is decreasing. Depending on whether the dissolved oxygen concentration is so low that the transfer of oxygen to the cells is rate-limiting or not, the dynamic behavior of the process is different. This gives two regimes that are characterized by a medium concentration of glucose, and either low or medium concentration of oxygen. During the final stages of the batch, the production of cells and gluconolactone is reduced due to shortage of glucose. The only significant reaction is the production of gluconic acid from gluconolactone. This regime is characterized by low substrate concentration, and high dissolved oxygen concentration.

These four regimes can all be characterized by the concentration of dissolved oxygen and glucose, and these two variables are chosen to be the variables that define the operating point. The four regimes were chosen on the basis of the discussion above, and their interpolation functions are shown in Fig. ��. Since the dependencies on pH and temperature are highly nonlinear, the local model within each of these four regimes should depend non-linearly on temperature and pH. The chosen local model structure (��) does not, so each of these four regimes is further decomposed into four new regimes along the temperature and pH axes, as shown in Fig. ��. Hence, the model is based on a total of 16 local models. The model validity functions $\rho_i$ were chosen to be Gaussian functions, with some suitable overlap.
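As a minimal sketch of how overlapping Gaussian validity functions can be turned into interpolation functions, the Python fragment below normalizes four Gaussians over a scaled (glucose, oxygen) operating point so that the resulting weights sum to one. The regime centers, the widths, the diagonal width matrix and all function names are illustrative assumptions, not the values used in the simulations.

```python
import math

def gaussian_validity(z, center, widths):
    """Unnormalized Gaussian validity function with a diagonal width matrix (an assumption)."""
    q = sum(((zk - ck) / wk) ** 2 for zk, ck, wk in zip(z, center, widths))
    return math.exp(-0.5 * q)

def interpolation_weights(z, centers, widths):
    """Normalize the validity functions so that the weights sum to one."""
    rho = [gaussian_validity(z, c, w) for c, w in zip(centers, widths)]
    total = sum(rho)
    return [r / total for r in rho]

# Hypothetical regime centers in the scaled (glucose, oxygen) plane.
centers = [(0.9, 0.9), (0.5, 0.5), (0.5, 0.1), (0.1, 0.9)]
widths = [(0.25, 0.25)] * 4

# Operating point near the first center: high glucose, high oxygen.
weights = interpolation_weights((0.85, 0.9), centers, widths)
```

Because the Gaussians are strictly positive, every regime receives a non-zero weight at every operating point; the overlap is controlled by the widths.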

[Figure: axes Glucose and Oxygen, each scaled to the range 0 to 1; vertical axis from 0 to 1]

Figure ��: Interpolation functions for the four regimes in the plane spanned by oxygen concentration and glucose concentration. Notice that the axes are scaled.

The �� unknown model parameters are estimated using the least-squares method, and simulated data from �� batches, each run for �� h, and all states "measured" every �� h. For every batch, the initial states x� and x� were randomly chosen from the intervals [�, �] and [�, �], respectively. The control input trajectories were designed by randomly selecting between � and � step changes, within the allowable ranges of both temperature and pH, during the batch. A global linear model was also found using the same estimation method and identification data.
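Since the interpolation functions contain no unknown parameters, all local model parameters can be estimated together in a single linear least-squares problem. The sketch below illustrates this for a hypothetical scalar example with two regimes and local models of the form $a_i + b_i x$; the weights, the data and the helper names are assumptions made for illustration, and the normal equations are solved with a small Gaussian elimination routine to keep the example self-contained.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def weights(z):
    """Hypothetical interpolation functions for two regimes on 0 <= z <= 1."""
    return [1.0 - z, z]

def regressor(x, z):
    """Stack the weighted local regressors [w_i, w_i * x] for every regime."""
    phi = []
    for w in weights(z):
        phi += [w, w * x]
    return phi

def least_squares(samples):
    """Estimate all local parameters at once from (x, z, y) samples via the normal equations."""
    n = len(regressor(0.0, 0.0))
    gram = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for x, z, y in samples:
        phi = regressor(x, z)
        for i in range(n):
            rhs[i] += phi[i] * y
            for j in range(n):
                gram[i][j] += phi[i] * phi[j]
    return solve_linear_system(gram, rhs)

true_theta = [1.0, 2.0, -1.0, 0.5]  # [a_1, b_1, a_2, b_2], illustrative values
samples = []
for z in (0.0, 0.25, 0.5, 0.75, 1.0):
    for x in (-1.0, 0.0, 1.0, 2.0):
        y = sum(p * t for p, t in zip(regressor(x, z), true_theta))
        samples.append((x, z, y))

theta_hat = least_squares(samples)
```

With noise-free data that visit both regimes, the estimate reproduces the illustrative parameters; data confined to one regime would leave the other regime's parameters poorly determined.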

Both models were visually "validated" on a number of independent batches not used for identification. In these batches, the pH and temperature were randomly changed every �� h. A typical ballistic prediction is shown in Fig. ��, and indicates that the prediction accuracy of the non-linear model is satisfactory, while the linear model has poorer prediction capabilities over the full batch length.

Due to the inaccuracy in the models used for MPC, and because not all states are measured, state estimators are implemented, using a time-varying extended Kalman filter for the non-linear model, and a time-varying Kalman filter for the linear model. The initial state estimates of the filters equal the initial states of the "true system". The covariance matrices were tuned to make the estimator loop fast compared to the system dynamics.

[Figure: axes pH (5.5 to 7) and Temperature (25 to 35); vertical axis from 0 to 1]

Figure ��: Interpolation functions for the four regimes in the plane spanned by temperature and pH.

�� Model Predictive Control

The objective of the MPC is to maximize the average production rate of gluconic acid, neglecting the costs of substrate, cells, and separation. The time $T_c$ from finishing one batch to starting the next, due to emptying, cleaning and initializing the reactor, is $T_c$ = � h.

This optimization problem is formulated as

$$\max_{(u,\,T_f)\,\in\,U \times T} \; \frac{x_3(T_f)}{T_f + T_c} \tag{��}$$

subject to the model equations, and the restrictions �� u� � �� u� � �� and x�� x� � at all times. The trajectories are optimized from time $t$ to the batch end time $T_f$. The batch end time is restricted to $T = \{t, t + \Delta T, t + 2\Delta T, \ldots\}$, and the sampling interval is $\Delta T$ = �� h. In the optimization, the current state is estimated using the extended Kalman filter, and the model is used to compute ballistic predictions from this initial value. The criterion (��) is maximized using a sequential quadratic programming algorithm with line search (MATLAB function constr (Grace � �)). The initial values to the search algorithm are constant input trajectories corresponding to pH = �� and temp = ��.
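The role of the batch end time in the criterion can be illustrated with a small Python sketch. It only shows the selection of the end time for one fixed, already predicted gluconic acid trajectory; it does not implement the sequential quadratic programming search over the input trajectories. The prediction values, the sampling interval and the turnaround time are hypothetical.

```python
def best_end_time(x3_prediction, sample_interval, turnaround_time):
    """Return (T_f, rate) maximizing x3(T_f) / (T_f + T_c) over the sampled end times."""
    best = None
    for k, x3 in enumerate(x3_prediction, start=1):
        end_time = k * sample_interval
        rate = x3 / (end_time + turnaround_time)
        if best is None or rate > best[1]:
            best = (end_time, rate)
    return best

# Hypothetical predicted gluconic acid concentrations at t = 1, 2, ..., 6 h.
prediction = [2.0, 10.0, 25.0, 36.0, 40.0, 41.0]
end_time, rate = best_end_time(prediction, sample_interval=1.0, turnaround_time=2.0)
```

In this illustration the best end time lies before the concentration levels off, because the turnaround time penalizes long batches less than the slow final conversion does.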


[Figure: two panels showing the five scaled state trajectories against Time [h], from 0 to 7 h]

Figure ��: Simulation (ballistic prediction) from correct initial values for a typical batch, marked with circles (�), and "true system" trajectories. The upper part is with the non-linear model based on 16 local models, while the lower part is with the linear model. Notice that the variables are scaled.

�� Results

The results of simulations using the five controllers described at the beginning of this section are summarized in Table ��. The results are averages computed over seven representative initial states. The temperature and pH trajectories for one typical initial state for these five cases are shown in Fig. ��. The corresponding state trajectories for the three MPC simulations are shown in Fig. ��.


Table ��: Summary of results.

Controller              Average prod. rate ṗ [g/l h]    Average end time T_f [h]
MPC, Ideal model        �                               �
MPC, Local modeling     �                               �
MPC, Linear model       �                               �
OLOC, Local modeling    �                               �
OLOC, Linear model      �                               �

[Figure: two panels against Time [h], from 0 to 6 h: Temperature (31 to 36) in the upper panel and pH (6.1 to 7) in the lower panel]

Figure ��: Optimal temperature (upper part) and pH trajectories (lower part) computed by the five controllers, for a typical initial state. MPC, ideal model: dashed-dotted line; MPC, local modeling: solid line; MPC, linear model: dotted line; OLOC, local modeling: dashed line; and OLOC, linear model: solid line with circles. Notice that the different trajectories have different end times.


[Figure: the five scaled state trajectories against Time [h], from 0 to 6 h]

Figure ��: System trajectories for the three MPC simulations and a typical initial state. Notice that the variables are scaled. Lines marked � are with the ideal model, lines marked � are with operating regime based modeling, and lines marked � are with the linear model.

�� Discussion

The results show, as might be expected, significantly improved performance by moving from a linear to a non-linear model as the basis for MPC for this type of process. The improvement is somewhat limited by the fact that the control inputs for all five controllers are limited by hard upper constraints during significant parts of the batch. Furthermore, the results show that re-optimization during a batch by MPC may be advantageous compared to open-loop optimization. However, this is not true for the linear model, since the poor prediction capabilities make the optimization unreliable.

The experience from the modeling and identification suggests that with the operating regime based modeling method, it is both sufficient and necessary to have some rather elementary process knowledge to develop the model structure, and a set of informative empirical data for parameter estimation. In particular, the decomposition into operating regimes is a critical part of the modeling, where it is important to use process knowledge to get a sound model structure. However, the applied process knowledge is significantly less than what would be needed for developing a mechanistic model based on mass balances. On the other hand, the amount of data is significantly larger than what would be needed for the identification of a mechanistic model. It should be mentioned that less empirical data may be sufficient to identify an accurate model. While this aspect is obviously important from a practical point of view, we have not investigated it here.

Chapter

Operating Regime based Adaptive Control

The application of operating regime based models for adaptive control may be attractive because the local nature of the model representation will automatically focus the attention of the on-line parameter identification algorithm on the parameters of only the local model(s) that are relevant at the current operating conditions. Hence, drift phenomena related to poor global excitation of certain parameters will be avoided, simply because we do not update the parameters to which the model is not sensitive. Notice that drift and bursting related to poor local excitation may still be present.

In this chapter, we will first present some preliminaries and briefly review the model representation, in sections �� and ��, respectively. Then, in section ��, we describe and analyze a MIMO non-linear decoupler based on local models. Then we will analyze the properties of an on-line parameter identification algorithm that is modified according to the guidelines above, in section ��. In section ��, closed loop stability and robustness of an adaptive non-linear decoupler are analyzed. The theory is illustrated with a semi-realistic simulation example, in section ��, and the chapter ends with a discussion.

The material in the chapter is from (Johansen � �a, Johansen � �b).

�� Preliminaries and Notation

The norm of a vector $x \in R^n$ is $\|x\| = \sqrt{x^T x}$. The norm of a discrete-time transfer function $H(q^{-1})$ is defined as $\|H(q^{-1})\|_\infty = \operatorname{ess\,sup}_{\omega} |H(e^{-j\omega})|$, where $q^{-1}$ is the delay operator. We will use the mixed time/frequency-domain notation $y(t) = H(q^{-1})u(t)$ instead of the convolution $y(t) = h(t) * u(t)$, where $h(t)$ is the impulse response of $H(q^{-1})$. We will in this chapter neglect those initial values that do not affect the stability arguments. The analysis is based on input-output stability theory. Background material can be found in (Desoer and Vidyasagar � �, Kreisselmeier and Anderson � �, Datta � �), and we will make use of the following lemmas frequently.


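Lemma � Let $y(t) = H(q^{-1})u(t)$, where $H(q^{-1})$ is a causal transfer function that has all roots inside the unit circle. Suppose $\sum_{\tau=0}^{t} u^2(\tau) < \infty$ for all $t \ge 0$. Then for all $t \ge 0$

$$\sum_{\tau=0}^{t} y^2(\tau) \le \|H(q^{-1})\|_\infty^2 \sum_{\tau=0}^{t} u^2(\tau)$$

Proof. See e.g. (Datta � �).

$\Box$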

Lemma � Let s� be a positive sequence� i�e� s��t� � for all t� Let the transferfunction H�q�� be de�ned by H�q�� K�� q�� for some K � and� � $�� Suppose the sequence s� satis�es

s��t� � H�q�� s��t� ��s��t� �� ns��t� n��

for some non�negative integer n and constants �� n �� Then�

s��t� � H�q��

�� n��n�s��t�

except for an exponentially decaying term due to possibly non�zero initial condi�tions�

Proof�

s��t� � K�ts�� K

tX��

�t�� s�� ns�� n��

Clearly� for any k � f�� ng

tX��

�t��ks�� k� �t�kX��k

�t��k�ks��

� ��kt�kX��

�t��ks�� exp decaying term

� ��ktX

��

�t��ks�� exp decaying term

where the last inequality follows because s� and �k are positive� Substituting into� �� gives the desired result�

�



The multi�input� multi�output NARX model based on local ARX models

y�t d� � A�iy�t� �� Any�iy�t � ny�

B�iu�t� �� Bnu�iu�t� nu� Ci ��t d�

and operating regimes introduced in Chapter � can be written in the form �seealso �Priestley � ��

y�t d� � A�z�t��y�t� �� Any �z�t��y�t � ny�

B�z�t��u�t� �� Bnu�z�t��u�t� nu� C�z�t�� t d� � ��

The input� and output�vectors are de�ned by y�t� � �y��t�� ym�t��T and u�t� ��u��t�� ur�t��T � and m and r are the number of outputs and inputs� respectively�It is assumed that r m� The vector of positive integers d � �d�� dm�

T is thesystem�s time�delay� Alternatively� d can be viewed as the system�s relative degree�Monaco and Normand�Cyrot � �� For later use� we de�ne d� � max�d�� dm��and for convenience� we introduce the notation y�t d� � �y��t d�� ym�t dm��T and ��t d� � ��t d�� m�t dm��T � The m�m�matrices A� �� Any�the m� r�matrices B� �� Bnu and the m�vector C depend on the operating pointz�t� � Z in the following way

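$$\begin{aligned}
A_0(z(t)) &= \sum_{i=1}^{N} A_{0,i}\, w_i(z(t)) \\
&\;\;\vdots \\
A_{n_y}(z(t)) &= \sum_{i=1}^{N} A_{n_y,i}\, w_i(z(t)) \\
B_0(z(t)) &= \sum_{i=1}^{N} B_{0,i}\, w_i(z(t)) \\
&\;\;\vdots \\
B_{n_u}(z(t)) &= \sum_{i=1}^{N} B_{n_u,i}\, w_i(z(t)) \\
C(z(t)) &= \sum_{i=1}^{N} C_i\, w_i(z(t))
\end{aligned} \tag{� �}$$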

where the $w_i$-functions are interpolation functions, cf. Chapter �. The operating set $Z$ is a Euclidean space, or a subset of one. The sequence � contains unknown, unstructured uncertainty, such as measurement noise, disturbances, unmodeled dynamics, modeling error, effects caused by sampling a continuous-time system etc. The "true system" may for example be an infinite-dimensional continuous-time system, approximated by the above finite-dimensional discrete-time model representation, and in addition to external phenomena like noise and disturbances, the approximation error contributes to the unstructured uncertainty. The information vector is defined by �(t) = $(y^T(t), \ldots, y^T(t-n_y), u^T(t), \ldots, u^T(t-n_u))^T$.
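The equivalence between interpolating local ARX models and using one ARX model with operating-point dependent parameters can be checked numerically. The following Python sketch does this for a hypothetical scalar case with $n_y = n_u = 0$ and unit delay; the parameter values, the weights and the function names are assumptions chosen for illustration only.

```python
def interpolate(local_values, weights):
    """Blend one quantity over all regimes: p(z) = sum_i p_i * w_i(z)."""
    return sum(p * w for p, w in zip(local_values, weights))

def predict_with_blended_parameters(y, u, local_models, weights):
    """First interpolate the parameters, then evaluate a single ARX model."""
    a = interpolate([m["a"] for m in local_models], weights)
    b = interpolate([m["b"] for m in local_models], weights)
    c = interpolate([m["c"] for m in local_models], weights)
    return a * y + b * u + c

def predict_with_blended_outputs(y, u, local_models, weights):
    """Evaluate every local ARX model, then interpolate their predictions."""
    outputs = [m["a"] * y + m["b"] * u + m["c"] for m in local_models]
    return interpolate(outputs, weights)

# Hypothetical scalar local models and interpolation weights that sum to one.
local_models = [
    {"a": 0.9, "b": 0.5, "c": 0.0},
    {"a": 0.6, "b": 1.2, "c": 0.3},
    {"a": 0.3, "b": 2.0, "c": -0.1},
]
weights = [0.2, 0.7, 0.1]

y_next_parameters = predict_with_blended_parameters(1.5, -0.4, local_models, weights)
y_next_outputs = predict_with_blended_outputs(1.5, -0.4, local_models, weights)
```

Both routes give the same prediction because each local model is linear in its parameters, which is also what makes the global model linear in the parameters.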

�� Non�linear Decoupling Control

The control structure we will study in this chapter is a discrete-time non-linear decoupler (Monaco and Normand-Cyrot � �). Such controllers are often called feedback linearizing controllers (Isidori � �) because a non-linear feedback and a state-transform are designed in such a way that the nominal system, with a new set of input variables, is rendered linear. With the NARX model representation, the model is already in the appropriate controllable canonical form, so the state-transform is not needed. We will choose the feedback such that the nominal system is not only linearized, but also decoupled.

The control objective is to track a given, bounded reference sequence $y^*(t) = (y_1^*(t), \ldots, y_m^*(t))^T$ while rejecting the impact of disturbances and noise (contained in �). Consider the feedback

$$v(t) = A_0(z(t))\,y(t) + \cdots + A_{n_y}(z(t))\,y(t-n_y) + B_0(z(t))\,u(t) + \cdots + B_{n_u}(z(t))\,u(t-n_u) + C(z(t))$$

where $v(t) = (v_1(t), \ldots, v_m(t))^T$ is an external input. Eq. (� �) is implicit in $u(t)$, but can easily be solved if $z(t)$ does not explicitly depend on $u(t)$, since $B_0(z(t))$ is non-singular by the definition of the relative degree $d$ (Monaco and Normand-Cyrot � �). If, however, $z(t)$ is a function of $u(t)$, this function must be invertible, in the sense that (� �) has at least one solution for $u(t)$ at any time $t \ge 0$. Assuming a solution $u(t)$ can be found at all $t \ge 0$, the feedback (� �) gives the closed loop input-output behavior

yj�t dj� � vj�t� �j�t dj� � ��

for $j \in \{1, \ldots, m\}$. The nominal system seen from $v$ to $y$ is transformed into a known linear and decoupled system. The problem is reduced to one of controlling the simple system (� �) using $v(t)$ as the new control input. However, the unmodeled dynamics introduce couplings. We must therefore design a controller for (� �) that is robust with respect to unmodeled dynamics, and able to reject the disturbances while tracking the reference trajectory. If the dead-times $d_1, \ldots, d_m$ are small, then $m$ decentralized PI or I controllers tuned with some suitable stability margins are a good choice. However, a large dead-time will significantly limit the performance attainable with a PI or I controller, and more advanced strategies like a Smith-predictor may be considered. In the following, suppose the external inputs are chosen as

$$v_j(t) = y_j^*(t+d_j) - G_j(q^{-1})\,\tilde{y}_j(t) \tag{� �}$$

where $G_j(q^{-1}) = P_j(q^{-1})/Q_j(q^{-1})$ is a discrete-time transfer function, and the tracking error is defined as $\tilde{y}(t) = (\tilde{y}_1(t), \ldots, \tilde{y}_m(t))^T = y(t) - y^*(t)$. Combining the model equation (� �) with the linearizing feedback (� �) and the feedback (� �), the closed loop satisfies

#yj�t� � Mj�q��j�t� � ��

for $j \in \{1, \ldots, m\}$, where

$$M_j(q^{-1}) = \frac{1}{1 + q^{-d_j} G_j(q^{-1})} = \frac{Q_j(q^{-1})}{Q_j(q^{-1}) + q^{-d_j} P_j(q^{-1})}$$


Design techniques for robust decentralized control may be applied for designing decentralized controllers $G_j(q^{-1})$ such that the closed loop has the desired disturbance rejection and tracking properties.
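The linearizing feedback can be illustrated for a first-order single-input, single-output case, where solving the feedback equation for the control input is a scalar division. The Python sketch below assumes a nominal model without uncertainty, unit delay, an operating point equal to the clipped output, and no outer-loop correction (the external input is simply the reference one step ahead); the parameter functions and all names are hypothetical.

```python
def model_parameters(z):
    """Hypothetical operating-point dependent parameters a(z), b(z), c(z) with b(z) != 0."""
    w1, w2 = 1.0 - z, z  # two interpolation functions on 0 <= z <= 1
    a = 0.9 * w1 + 0.5 * w2
    b = 0.4 * w1 + 1.5 * w2
    c = 0.1 * w1 - 0.2 * w2
    return a, b, c

def linearizing_input(v, y, z):
    """Solve v = a(z) y + b(z) u + c(z) for the control input u."""
    a, b, c = model_parameters(z)
    return (v - a * y - c) / b

def plant_step(y, u, z):
    """Nominal plant y(t+1) = a(z) y(t) + b(z) u(t) + c(z), with no uncertainty."""
    a, b, c = model_parameters(z)
    return a * y + b * u + c

y = 0.2
outputs = []
references = [1.0, 1.0, 0.5, 0.5, 0.8]
for reference in references:
    z = min(max(y, 0.0), 1.0)               # operating point: the clipped output
    u = linearizing_input(reference, y, z)  # external input v(t) set to the reference
    y = plant_step(y, u, z)
    outputs.append(y)
```

Without uncertainty the output reaches the reference after one step at every operating point. With uncertainty present the output deviates from the reference, which is why an outer controller with suitable robustness margins is needed.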

Suppose the model (� �) can be inverted, i.e. there exists a globally defined function $f$ such that

u�t� � f�u�t � �� u�t� nu�� y�t d�� y�t�� y�t � �� y�t� ny�� t d��

This system should be viewed as the inverse of (� �) with output u and inputs y and �. Boundedness of the control input requires a restriction of the class of nominal models to ones where the inverse system (� �) is globally exponentially stable. Such systems are often called minimum phase systems (Byrnes and Isidori � �, Monaco and Normand-Cyrot � �).

For the purpose of analyzing the closed loop� we introduce a relative bound on theunstructured uncertainty�

n�t� � �n�t� �� jj��t�jj � � � �

j�j�t�j � Vjn�t� dj� � ��

for j � f�� mg� where n�� and Vj � is a small constant� Byconstruction� n�t� will bound all other signals in the closed loop� so � �� requiresthe unstructured uncertainty to be relatively small compared to the inputs andoutputs� Notice that the uncertainty formulation is su�ciently general to includestructural modeling error and unmodeled dynamics �Kreisselmeier and Anderson� �� and observe in particular that non�minimum phase e�ects can be hiddenin this uncertainty� Of course� the requirement that the uncertainty must berelatively small will require that these non�minimum phase e�ects are small�

Let � � �� and K � be de�ned such that the impulse response coe�cients ofM��q

�� Mm�q�� are all bounded by K �

t for all t �� Exponential stabilityof the transfer functions M��q

�� Mm�q�� will ensure that this is possible�

Theorem � Let the system �� be controlled by �� and �� and suppose

�� The reference sequence y� is bounded and known d� steps ahead in time� Thebound is denoted K� � supt� jjy��t�jj�

�� The model�s relative degree d � �d�� dm�T � and the order parameters nu

and ny are known�

�� The inverse system �� is globally exponentially stable� i�e� there existconstants Ku � and � � $�� such that for all t �

jju�t�jj � Ku

tX��

�jjy�� d�jj jj�� d�jj ��

�� The transfer functions M��q�� Mm�q�� are proper and exponentiallystable�

�� Eq� �� has at least one solution for u�t� at all t ��


�� The unstructured uncertainty satis�es ��

Let V � max�V�� Vm�� and

� �mV��

��K�K

��

�K�Ku

�� d

�

K�Ku

��

where the constants K� and K� are de�ned in the proof of the theorem� If � � ��then for arbitrary initial conditions� all variables in the closed loop are boundedand the average squared tracking error satis�es for all t � �

�

t

tX��

jj#y�� jj� � K� exp decaying term � ��

where

K� �

�K

��

��K

��

� mXj��

V�j

and K is de�ned in the proof of the theorem�

Proof� From the de�nition of the information vector� we get

jj��t�jj � jju�t�jj �� jju�t� nu�jj jjy�t�jj �� jjy�t � ny�jj

Together with the de�nition of the normalizing signal� and Lemma �� this yieldsthe inequality

n�t� � �

�� q��K�jju�t�jj K�jjy�t�jj ��

where K� � � �� nu and K� � � �� ny � Using the exponentialstability of the inverse system� we get

n�t� � �

�� q��

�K�Ku

�� q��jjy�t d�jj jj��t d�jj �� K�jjy�t�jj �

� H��q

��jjy�t d�jj H��q��jj��t d�jj K

where

H��q��

K�Ku

�� q�� q��

K��d�

�� q��

H��q��

K�Ku

�� q�� q��

K ��

�� jjH��q

��jj�

This gives

n�t� � H��q��jj#y�t d�jj H��q

��jj��t d�jj K


where K � K K�jjH��q��jj�� Next� from � �� we get

n�t� �mXj��

�H��q

��Mj�q�� H��q

�� j�j�t dj�j K

� VR�q��n�t� K

where

R�q�� H��q��

mXj��

Mj�q�� mH��q

��

We de�ne the linear time�invariant system

n�t� � VR�q��n�t� K

The initial condition is chosen as n�� n�� It follows that n�t� n�t� for allt �� By the Small Gain Theorem �Zames � �� it follows that n is a boundedsequence� since VjjR�q��jj� � � � �� Hence� n is also a bounded sequence� andit follows from the de�nition of n that y and u are bounded sequences� We get fort �

n�t� � K

�� VR�q�� K

��

From � �� and Lemma � we get

tX��

#y�j �� jjMj�q��jj��

tX��

��j ��

�

t

tX��

jj#y�� jj� ��

K

��

��K

��

� mXj��

V�j

Remarks. Eq. (� �) for the average tracking error indicates that the tracking error scales with the unstructured uncertainty, and in particular that $\|\tilde{y}(t)\| \to 0$ exponentially when $t \to \infty$ if $V = 0$. The exponentially decaying term will be of the form K��^t, where K� � is some constant, and is due to non-zero initial conditions that have been neglected in the analysis, as these do not affect the stability of the system. The expression for � is rather complex, and several of the constants involved (such as V, � and K_u) are usually not known. The result may therefore best be interpreted in a qualitative manner. The most important observation is that � scales with V, and that stability requires the unstructured uncertainty to be small.

The proof is based on input-output stability theory (Desoer and Vidyasagar � �), which is well known to give very conservative results in many cases. This also suggests that one should emphasize the qualitative interpretation of the results. If one wants to emphasize the quantitative bounds, it must be observed that appropriate scaling of the various signals is important in order to make the results as non-conservative as possible.

�


�� Parameter Estimation

The interpolation functions are assumed not to contain any unknown parameters, and all the local model structures are linear in the parameters. Hence, the global representation (� �) is also linear in the parameters. It can be reformulated into a linear regression form

yj�t dj� � �T �t��j �j�t dj� � ��

well suited for parameter estimation� for j � f�� mg� The regressor vector isgiven by

��t� �

�BBBBB�w��z�t��

��t�w��z�t��

wN �z�t��t�wN �z�t��

�CCCCCAand the �true parameter� vector ��j is de�ned as

��j �

�B� ��j��

��j�N

�CA � ��j�i �

�BBBBBBBBBB�

rowj�ci�rowj�A�i�

��rowj�Any�i�rowj�B�i�

��rowj�Bnu�i�

�CCCCCCCCCCAfor j � f�� mg� The vector rowj�A� is the j�th row of the matrix A� It isassumed that a closed and convex sub�set )j of the parameter space is known�and that ��j � )j �In the following� we will investigate the properties of recursive parameter esti�mation algorithms for the linearly parameterized model � �� The parameter

estimate at time t is denoted "�j�t�� and the parameter error vector is de�ned by#�j�t� � "�j�t�� j � We de�ne the prediction error

ej�t� � "yj�tjt� �� yj�t� � �T �t � dj�#�j�t � �� j�t� � ��

For convenience� the normalizing signal �Praly � �� is rede�ned as

n�t� � �n�t� �� jj��t�jj

for some constant � � �� and initial condition n�� Again� assume thereexists a small constant Vj � such that

j�j�t�j � Vjn�t� dj� � ��


for j � f�� mg� The normalized prediction error is de�ned by

�j�t� � ej�t�n�t � dj�

Consider the projection algorithm with relative dead�zone �Kreisselmeier and An�derson � �� and parameter projection�

"�j�t� � "�j�t� �� ,j�t� ��t� dj�

�T �t� dj��t � dj�n�t � dj�D��j�t��

"�j�t� � P�j

"�j�t�

�D��j �t��

��j�t� d

j if �j�t� � �dj

� if j�j�t�j � dj

�j�t�� dj if �j�t� � d

j

An initial estimate "�j�� )j is assumed to be given� Observe that jj��t� dj�jjis bounded away from zero for all t � because PN

i��wi�z�t�� for all z�t� �Z� The continuous projection operator P�j

projects "�j�t� to the closest point in)j �using Euclidean norm�� The existence and uniqueness of a closest point isguaranteed by the closedness and convexity of )j � The continuous function Dis the dead�zone function with dead�zone d

j � The purpose of the time�varying

matrix ,j�t� is to select which local models to update� in addition to being a gain�matrix� Traditionally� this matrix is chosen as ,j�t� � �I� where � � �� is thescalar gain �Goodwin� Ramadge and Caines � �� Using local models� it may bedesirable to choose ,j�t� � diag ��j��t�I� �� j�N�t�I�� where

�j�i�t� �

�� for i � Ij�t�� otherwise

� ��

where Ij�t� � f�� Ng is a time�varying and operating�point dependent indexset� To see the motivation behind this� suppose we use Gaussian local modelvalidity functions

�i�z� � exp

��z � zi�

T (��i �z � zi�

The constant vector zi is of the same dimension as z� and (i is a positive de�nitematrix of appropriate dimension� Remember that the regressor vector ��t� hassub�vectors ��t�wi�z�t�� Using the projection algorithm with ,j�t� � �I willinvolve the updating of the parameter estimates of all the local models� sincewi�z� � � for all z � Z and i � f�� Ng� This may be undesirable� since itmeans that local models corresponding to irrelevant operating regimes will havetheir parameter estimates slightly updated� If the system operates within a singleoperating regime for a long time� this may cause drift of the parameter estimateswhich do not correspond to this regime� This problem can be handled in at leasttwo ways�

1. The local model validity functions can be chosen such that they are exactly zero at distant operating points, instead of going asymptotically to zero as the Gaussian. This will give zeros in the regressor vector and automatically prevent the updating of the parameters which correspond to irrelevant operating regimes. If the dimension of z(t) is high, designing such local model validity functions is sometimes difficult, since we do not always know an a priori bound for z(t), and because for any z(t), at least one of the local model validity functions must be non-zero to ensure that the global model is complete. It may also be desirable that the local model validity functions are smooth, which may cause an additional design problem.

2. It may sometimes be simpler to use smooth, strictly positive local model validity functions, and ensure by modifying the estimation algorithm that only the most relevant local model(s) are updated at a given time.

Using the second approach, we need to design the gain-matrix ,j(t) such that at a given time t, only the most relevant local model parameter estimates are updated, and the stability properties of the parameter estimation algorithm are preserved.
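A minimal Python sketch of the second approach is given below, for a single output. It combines a continuous dead-zone with a normalized projection step in which only selected parameters are updated. The parameter projection onto the known convex set is omitted, and the selection rule (a fixed threshold on the interpolation weights) is a simplifying assumption rather than the bound derived in the analysis; all names and numerical values are hypothetical.

```python
def dead_zone(error, width):
    """Continuous dead-zone: zero inside [-width, width], shifted towards zero outside."""
    if error > width:
        return error - width
    if error < -width:
        return error + width
    return 0.0

def update_estimate(theta, phi, y, normalizer, width, gain, update_mask):
    """One normalized projection step; update_mask[k] is False for parameters left untouched."""
    prediction_error = sum(p * t for p, t in zip(phi, theta)) - y
    normalized_error = prediction_error / normalizer
    step = normalizer * dead_zone(normalized_error, width) / sum(p * p for p in phi)
    return [
        t - gain * p * step if selected else t
        for t, p, selected in zip(theta, phi, update_mask)
    ]

# Two hypothetical local models with parameters [a_1, b_1, a_2, b_2].
theta = [0.0, 0.0, 1.0, 1.0]
weights = [0.95, 0.05]
x = 2.0
phi = [weights[0], weights[0] * x, weights[1], weights[1] * x]

# Update only the local models whose weight exceeds an assumed threshold of 0.1.
mask = [w > 0.1 for w in weights for _ in range(2)]

selective = update_estimate(theta, phi, 3.0, 4.0, 0.0, 1.0, mask)
full = update_estimate(theta, phi, 3.0, 4.0, 0.0, 1.0, [True] * 4)
frozen = update_estimate(theta, phi, 3.0, 4.0, 1.0, 1.0, [True] * 4)
```

With unit gain, no dead-zone and all parameters selected, the updated estimate reproduces the measured output exactly. With the mask, the parameters of the second local model are left unchanged, and with a dead-zone wider than the normalized error no update takes place.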

Theorem � Suppose the data are generated by �� )j is a known compactand convex sub�set of the parameter space with the property ��j � )j � and theunstructured uncertainty satis�es �� Consider the projection algorithm with

dead�zone dj � Vj and gain �� where

Ij�t� ��i � f�� Ng

�� wi�z�t�� jD��j�t��jn�t� dj�

�N �� jj��t� dj�jj�2j�i

and 2j�i sup�j�i��j�i

jj�j�i � ��j�ijj for i � f�� Ng� If � � �� max�� where�max � �� this algorithm has the properties

jj#�j�t ��jj � jj#�j�t�jj � jj#�j��jj � ��tX

��

D��j��

��max � ��jj#�j��jj� � ��

j�j�t�j � Vj jD��j�t��j � ��tX

��

jj&#�j�� jj� � �

�max � �jj#�j��jj� � ��

for all t ��Proof� First observe that the existence of 2j�i is guaranteed by the boundedness

of )j�i� Boundedness of jj#�j�t�jj follows directly from the boundedness of )j andthe parameter projection� The normalized prediction error can be written

�j�t� ��T �t� dj�

n�t� dj�#�j�t� �� j�t�

n�t � dj�

From the de�nition of the relative dead�zone and � �� there exists a sequence�j such that � � �j�t� � � and

D��j �t�� j�t��T �t� dj�

n�t� dj�#�j�t � ��


Hence�

#�j�t� �

�I � �j�t�,j�t�

��t � dj��T �t� dj�

�T �t� dj��t� dj�

#�j�t� ��

We de�ne the Lyapunov�function Vj�t� � jj#�j�t�jj�� Using the projection operatorproperty jj#�j�t�jj � jj#�j�t�jj� we get

Vj�t�� Vj�t � �� j�t�#�Tj �t� ��,j�t��t� dj��T �t � dj�

�T �t � dj��t � dj�#�j�t� ��

��j �t�#�Tj �t� ��

��t� dj��T �t � dj�,

Tj �t�,j�t��t � dj��

T �t� dj�

��T �t� dj��t � dj��#�j�t� ��

If �j�t� � � and #�Tj �t � ��t� dj� � �� then

Vj�t�� Vj�t� �� j�t��n��t � dj�D��j�t��

�T �t � dj��t � dj�

� ��

where

�j�t� � �#�Tj �t� ��,j�t��t � dj�

�j�t�#�Tj �t� ��t � dj�� T �t � dj�,Tj �t�,j�t��t � dj�

�T �t � dj��t � dj��

By the de�nition of Ij�t�� we get#�Tj �t� ��,j�t��t � dj�

n�t� dj�D��j�t�� #�Tj �t� ��t� dj�� #�Tj �t � �� I � ,j�t��t � dj�

n�t � dj�D��j �t��

�j�t��

Xi�� Ij�t�

�jj#�j�i�t � ��jj � �� jj��t� dj�jj�wi�z�t � dj��

n�t � dj�D��j �t��

�

��

�N �.Ij�t�

�N

�

��

From � �� we get �j�t� �� If �j�t� � � or #�Tj �t � ��t � dj� � ��then it follows from � �� that Vj�t� � Vj�t�� and we may conclude from � ��that Vj is a non�increasing sequence� and � �� follows� Furthermore� from � ��

D��j�t��

��

��T �t� dj��t� dj�

n��t� dj�

�Vj�t � �� Vj�t��

if �j�t� � �� and from � �� it follows that D��j�t�� if �j�t� � �� HencetX

��

D��j�t��

�� jj#�j��jj�


and � �� follows� Directly from the de�nition of the dead�zone� it follows thatj�j�t�j � Vj jD��j�t��j� Finally�

tX��

jj&#�j�� jj� �tX

��

jj#�j�� #�j�� jj�

�tX

��

��j�� ,j��

�� dj��T �� dj�

�T �� dj�� dj�#�j��

T��

�j�� ,j�� dj��T �� dj�

�T �� dj�� dj�#�j��

� �

��

tX��

D��j��

Remarks. No assumptions on the data, such as boundedness or persistence of excitation, are made in this theorem.

Similar results can be found if the dead-zone modification is replaced by one of several other possible robustness modifications, including �-modification and �-modification (Ioannou and Datta � �, Narendra and Annaswamy � �).

�

The assumption that )j is a compact set is made because explicit bounds 2j,i on the parameter error must be known to compute $I_j(t)$. If we use the parameter estimation algorithm without thresholding, i.e. $I_j(t) = \{1, \ldots, N\}$, then )j may be unbounded, and the same conclusions as in Theorem � with �max = � can be proved.

The threshold suggested in the definition of $I_j(t)$ may be too close to zero for practical purposes, because of the conservative bounds applied in the derivation of this threshold. We therefore suggest that a somewhat larger threshold is applied, typically in the range �� to ��.

�� Adaptive Control

Any unknown parameters in (� �) can be replaced by their estimates, which gives the certainty equivalence feedback

$$v(t) = \hat{A}_0(z(t),t)\,y(t) + \cdots + \hat{A}_{n_y}(z(t),t)\,y(t-n_y) + \hat{B}_0(z(t),t)\,u(t) + \cdots + \hat{B}_{n_u}(z(t),t)\,u(t-n_u) + \hat{C}(z(t),t) \tag{� �}$$

It is assumed that this implicit equation has at least one solution for $u(t)$ at every time $t \ge 0$, and that a solution can be computed. Now, the closed loop is described by

y�t d� � v�t� A�z�t�� "A�z�t�� t�

�y�t�

�� Any�z�t�� "Any �z�t�� t�

�y�t � ny�

B�z�t�� "B�z�t�� t�

�u�t�

�� Bnu�z�t�� "Bnu�z�t�� t�

�u�t� nu�

C�z�t�� "C�z�t�� t�

� ��t d� � ��

Notice that in this case both the unmodeled dynamics and the errors in the parameter estimates introduce couplings. Using the same arguments as in section ��, the external inputs can be chosen according to (� �). An overview of the control structure is shown in Fig. ��. Combining the model equation (� �) with the certainty equivalence feedback (� �) and the feedback (� �), the closed loop satisfies

#yj�t� � Mj�q�� T �t� dj�#�j�t� dj� �j�t�

��

Theorem � Suppose system �� is controlled by �� and �� and the pro�

jection algorithm with dead�zone dj is applied to estimate the unknown parameter

vectors ��j � for j � f�� mg� Furthermore� suppose assumptions �� of Theorem� hold� and in addition

�� The sets )j � for j � f�� mg� are closed and convex� If thresholding is used�)j must also be bounded� Furthermore� these sets are such that ��j � )j �and equation �� has at least one solution u�t� at all t ��

�� The parameter estimators have gain � � �� max�� where �max � �� ifthresholding is used� and �max � � otherwise�

�� The unstructured uncertainty satis�es �� and the dead�zone is chosen

as dj � Vj � for j � f�� mg�

Let V � max�V�� Vm�� and let � be as in Theorem �� If � � �� then forarbitrary initial conditions� all variables in the closed loop are bounded and theaverage squared tracking error satis�es for all t � �

�

t

tX��

jj#y�� jj� � �K� �

tK� exp decaying term � ��

where

K� �

�K

��

��K

��

� mXj��

V�j

K� � �

�K

��

��K

��

� mXj��

��

��max � �� dj � ��max � �

jj#�j��jj�

and K is de�ned in the proof of the theorem�


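[Figure: block diagram of the control structure, with the blocks G, $M^{-1}$, S, M, and Adaptation, and the signals $y^*$, $\tilde{y}$, $v$, $u$, $y$, $\hat{y}$, and $\hat{\theta}$]

Figure �. An overview of the control structure. S is the system, M is the model, $M^{-1}$ is the inverse model, where it is understood that the non-minimum-phase effects are not inverted, $\hat{\theta}$ is the model parameter estimate, and G is the outer level controller.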

Proof� We de�ne

#Yj�t� � �ej�t�� T �t� dj� "�j�t� dj�� "�j�t� ��

��

and with this notation� from � �� we get the following equation for the trackingerror

#yj�t� � Mj�q�� #Yj�t� � ��

Using the same arguments and notation as in the proof of Theorem �� we get

n�t� � H��q��jj#y�t d�jj H��q

��jj��t d�jj K

�mXj��

H��q

��Mj�q��j#Yj�t dj�j H��q

��j�j�t dj�j� K

From � �� we get

#Yj�t�

n�t� dj�� j�t� � �T �t� dj�

n�t � dj�

"�j�t � dj� � "�j�t � ��

��

From Theorem � it is evident that since V � Vj � dj � there exists a Tj � such

that for all t Tj

dj jD��j�t��j jj#�j�t� dj�� #�j�t� ��jj � V

because both jD��j�t��j � � and jj#�j�t � dj� � #�j�t � ��jj � � as t � �� Hence�for t Tj it follows from � �� that j#Yj�t�j � Vn�t � dj�� Together with � ��


this gives for t T � max�T�� Tm�

n�t� �mXj��

�H��q

��Mj�q��Vn�t� H��q

��Vn�t�� K

� VR�q��n�t� K

Using the same argument as in the proof of Theorem �� we conclude that since

VjjR�q��jj� � � � �

all signals in the closed loop are bounded� and for t T we have n�t� � K��

�� Since T is �nite and the system does not have �nite escape time� there existsa constant K

K such that n�t� � K

�� for all t �� The equation forthe tracking error � �� together with � �� gives

#yj�t� � Mj�q��

��j�t�

�T �t� dj�

n�t� dj�

#�j�t� dj�� #�j�t � ��

�n�t � dj�

From Lemma �� we get

tX��

#y�j �� jjMj�q��jj��

tX��

��j �� jj#�j�� dj�� #�j�� jj�

�n�� dj�

and

tX��

jj#y�� jj� � �

�K

��

��K

��

� mXj��

tX��

��j �� jj#�j�� dj�� #�j�t� ��jj�

�Using Theorem �� the conclusion of the theorem is proved�

Remarks. The bound on the average squared tracking error (�) consists of three terms. The first term, involving $K_1$, is due to the unstructured uncertainty, and will vanish as $V \to 0$. The second term, $K_2/t$, is due to parametric error, and will asymptotically go to zero as $t \to \infty$, since the estimator will asymptotically eliminate the effect of parametric uncertainty. The third term is due to exponentially decaying terms that have been neglected in the analysis. While $K_1$ bounds the controller's asymptotic performance, $K_2$ is a bound on the controller's transient performance. A major drawback of the analysis is that an explicit expression for K is not found. However, more explicit bounds may be found using the Bellman-Gronwall lemma (Desoer and Vidyasagar �), cf. Appendix A.

The result clearly shows that the initial parameter error is the most important cause for poor transient behavior, and while the result is global stability, it is clear from the discussion above that $\|\tilde{\theta}(0)\|$ must be small to avoid large transients. Persistence of excitation has not been assumed, nor has it been proved. However, we expect the transient performance to improve if the input is persistently exciting, since the effect of $\|\tilde{\theta}(0)\|$ will decay exponentially. Notice that a necessary condition for global persistence of excitation is that every operating regime is regularly visited, a condition that is unlikely to be fulfilled for many systems.


While the result is that the average tracking error is small, it is not possible to show that the tracking error is small at a certain instant in time. Hence, bursting phenomena may be present if the input is not locally persistently exciting.
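To make the role of the dead-zone in the parameter estimator more concrete, the following is a minimal sketch of a generic textbook normalized projection update with a dead-zone. It is not the exact estimator analyzed in this chapter (which also involves local model validity weighting and optional thresholding); the names `dead_zone`, `projection_update`, `gamma`, `d`, and `c`, as well as all numerical values, are assumptions made for illustration only.

```python
import numpy as np

def dead_zone(e, d):
    """Shrink the error towards zero by d; errors with |e| <= d are ignored."""
    if e > d:
        return e - d
    if e < -d:
        return e + d
    return 0.0

def projection_update(theta, phi, y, gamma=0.5, d=0.1, c=1.0):
    """One step of a normalized projection estimator with dead-zone.

    theta : current parameter estimate
    phi   : regression vector
    y     : measured output
    gamma : estimator gain (hypothetical value, chosen in (0, 2))
    d     : dead-zone size (hypothetical value)
    c     : positive normalization constant (hypothetical value)
    """
    e = y - phi @ theta                      # prediction error
    return theta + gamma * phi * dead_zone(e, d) / (c + phi @ phi)

# Illustration: the disturbance is bounded by 0.05, i.e. smaller than d = 0.1,
# so the estimate is only updated when the error cannot be explained by noise.
rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.3])
theta_hat = np.zeros(2)
initial_error = np.linalg.norm(theta_hat - theta_true)
for _ in range(2000):
    phi = rng.normal(size=2)
    y = phi @ theta_true + rng.uniform(-0.05, 0.05)
    theta_hat = projection_update(theta_hat, phi, y)
final_error = np.linalg.norm(theta_hat - theta_true)
```

In this sketch the parameter error does not grow as long as the disturbance is smaller than the dead-zone, which mirrors the requirement in the theorem that the dead-zone is chosen according to the bound on the unstructured uncertainty.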

�

An alternative controller for the decoupled and linearized system may be

\[ v_j(t) = -G_j(q^{-1})\,\tilde{y}_j(t) \]

which does not require the future reference trajectory to be known. The analysis will be identical, except for a term $y_j^*(t)$ that will appear in the equation for the tracking error. If the reference sequence is slowly time-varying and $G_j(q^{-1})$ contains integral action, we get similar performance results in this case.

Of practical importance are methods for on-line solution of the implicit equation (�) for $u(t)$. The simplest case is when $z(t)$ does not explicitly depend on $u(t)$. Provided $\hat{B}_0(z(t),t)$ is non-singular, an expression for $u(t)$ can be found explicitly. Non-singularity can be enforced either through the specification of the convex parameter sets )��)�� )m, or by varying the gain � � �min � ��t� � �max at each discrete time step $t$ such that non-singularity is ensured (Goodwin et al. �). An alternative to directly inverting $\hat{B}_0(z(t),t)$ when computing $u(t)$ is to use the Levenberg-Marquardt regularized inverse

"B�� z�t�� t� � �I "BT

�z�t�� t� "B�z�t�� t��

"BT �z�t�� t�

where � � � is a small regularization constant. Using this modification, it is not necessary to restrict the parameters or modify the parameter estimation algorithm to avoid singularity, at the cost of an approximate solution. In the event of $z(t)$ explicitly depending on $u(t)$, a numerical equation solver will usually be required. Should it be impossible to ensure a priori that there exists an exact solution of (�), an optimization approach can be taken. In that case, one can also include constraints such as actuator saturation, component failures, or constraints on the outputs in the optimization problem. It is also possible to include some weight on the control input and to handle input redundancy when $r > m$.
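As an illustration of the regularized inverse discussed above, the sketch below computes a control input from an estimated input matrix, assuming that $z(t)$ does not depend on $u(t)$ so that the implicit equation is linear in $u(t)$. The function names, the symbol `delta` for the regularization constant, and the vector `rhs` (the desired value of the term that multiplies $u(t)$, i.e. $v(t)$ minus all terms that do not depend on $u(t)$) are assumptions introduced for this example.

```python
import numpy as np

def regularized_inverse(B, delta=1e-6):
    """Levenberg-Marquardt style inverse (delta*I + B^T B)^{-1} B^T."""
    n = B.shape[1]
    return np.linalg.solve(delta * np.eye(n) + B.T @ B, B.T)

def control_input(B_hat, rhs, delta=1e-6):
    """Approximate solution u of B_hat @ u = rhs; defined even if B_hat is singular."""
    return regularized_inverse(B_hat, delta) @ rhs

# A well-conditioned estimate gives (almost) the exact solution ...
B_good = np.array([[2.0, 0.0], [0.0, 4.0]])
u_good = control_input(B_good, np.array([1.0, 1.0]))

# ... while a singular estimate still gives a finite, approximate one.
B_singular = np.array([[1.0, 1.0], [1.0, 1.0]])
u_singular = control_input(B_singular, np.array([1.0, 1.0]))
```

A larger `delta` makes the computation less sensitive to a nearly singular estimate, at the price of a larger deviation from the exact solution.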

Usually, the operating regime based modeling approach gives a model structure that contains a significant number of unknown parameters. In general, a large number of parameters may lead to drift phenomena and bursting, since the model may be insensitive to some parameters, if the excitation is not persistent. However, this will not represent a serious problem for the proposed method. The reason is simply that the local model representation will naturally focus the attention on the few relevant parameters at any time, namely the parameters of the relevant local model(s) at the current operating point. The parameter estimation algorithm with thresholding will emphasize this focus. On the other hand, the number of unknown parameters can be reduced by using prior knowledge in terms of known local models, if available. It is a straightforward exercise to modify the algorithms presented here in such a way that local models with different structures are allowed. Also, the parameters of some local models may be fixed. This flexibility may give robust control structures with high performance, since one may choose


simple but robust models in regimes where robustness is more important than performance, and more complex model structures with more parameters in regimes where performance is more important. Of course, one must take into account the available prior knowledge, and the excitation of the system.

�� Simulation Example� A � � � CSTR�

The theoretical analysis in the previous sections has shown that the model based control structure may give a stable and robust control system under certain assumptions. The most fundamental assumption is that the unstructured uncertainty must be small, i.e. neither the modeling error nor the disturbances must be too large. With a semi-realistic simulation example, we will illustrate the model development and justify our claim that the operating regime based modeling approach may lead to models that describe the behavior of the system sufficiently well for model based control applications.

The adaptive control structure is applied to a simulated exothermic continuous stirred tank reactor (CSTR) where a first order chemical reaction A → B takes place. The simulated system is described by the mass- and energy-balances

\[ V \frac{d}{dt} c_A = c_{Ai} q_i - c_A q_o - V r_A \]

\[ \rho c_v V \frac{d}{dt} T = \rho c_v T_i q_i - \rho c_v T q_o + Q - \Delta H_r V r_A \]

\[ r_A = k c_A \exp\left( -\frac{E_A}{R} \left( \frac{1}{T} - \frac{1}{T_R} \right) \right) \]

where the symbols are described in Table �. In addition, the model contains dead-time caused by transportation delay and stirring dynamics.
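The balances can be turned into a small right-hand-side function for numerical experiments. The sketch below is only an illustration: apart from the gas constant, the parameter values are hypothetical placeholders and are not the values used in the thesis, consistent units are assumed throughout, and the dead-time is ignored.

```python
import math

# Hypothetical parameter values, chosen only to make the sketch runnable.
# R is the universal gas constant in J/(K mol); everything else is a placeholder.
P = dict(V=100.0, rho=1.0, cv=4200.0, EA=7.0e4, R=8.314, k=0.1,
         TR=350.0, dHr=-5.0e4, Ti=300.0)

def reaction_rate(cA, T, p=P):
    """First order reaction rate r_A = k c_A exp(-(E_A/R)(1/T - 1/T_R))."""
    return p["k"] * cA * math.exp(-p["EA"] / p["R"] * (1.0 / T - 1.0 / p["TR"]))

def cstr_rhs(cA, T, qi, qo, Q, cAi, p=P):
    """Time derivatives (dcA/dt, dT/dt) from the mass and energy balances."""
    rA = reaction_rate(cA, T, p)
    dcA = (cAi * qi - cA * qo - p["V"] * rA) / p["V"]
    dT = (p["rho"] * p["cv"] * (p["Ti"] * qi - T * qo) + Q
          - p["dHr"] * p["V"] * rA) / (p["rho"] * p["cv"] * p["V"])
    return dcA, dT

# With no flow and no heat exchange, the reaction consumes A and releases heat.
dcA_batch, dT_batch = cstr_rhs(cA=1.0, T=350.0, qi=0.0, qo=0.0, Q=0.0, cAi=1.0)
```

Because the reaction energy is negative (exothermic reaction), the last term of the energy balance is positive and grows with temperature, which is the mechanism behind the open loop instability mentioned in the simulation study.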

The control objective is to track reference trajectories for the composition $c_A$ and temperature $T$, while rejecting disturbances which are due to variations in feed composition $c_{Ai}$. The temperature set-point $T^*$ will typically vary between � and � K, while the composition set-point $c_A^*$ is constant at � mol/l. We assume that $T$ and $c_A$ are measured on-line, and use the heat flow-rate $Q$ and feed flow-rate $q_i$ as control variables. The two-dimensional vectors $y(t)$ and $u(t)$ are appropriately scaled outputs and inputs. The system output is sampled at one-minute intervals, and the control input may be changed at the same rate. Furthermore, we assume ideal control of the volume, such that $V$ and $\rho$ are kept constant.

The operating point vector $z(t) = (T(t), q_i(t))^T$ is two-dimensional and contains the temperature and feed flow-rate. This choice is motivated by the well known facts that the reaction rate is non-linearly dependent on temperature, and the hold-up time depends on the flow-rate. Hence, the major non-linear phenomena are expected to be captured by this choice of operating point. The local model validity functions are illustrated in Fig. �. It is a rule of thumb that an increase in temperature of about � K will lead to an approximately doubled reaction rate for many chemical reactions (including this one) (Vogler �). On this basis, we choose the distance between each local model to be � K, since we expect this to give an approximation of the non-linearities that is sufficiently good for control purposes. The flow-rate $q_i$ and the hold-up time are inversely proportional. Hence, by considering the range the flow-rate is expected to vary within, it seems reasonable to distinguish between low and high flow-rates only. In other words, the operating regimes are designed using fuzzy sets representation of "high" and "low" feed flow-rate, and "high", "medium", and "low" temperature, cf. Fig. �.

Symbol   Value   Unit        Description
T                K           Reactor temperature
T_i      �       K           Feed temperature
T_R      �       K           Reference
c_A              mol/l       Concentration of A in reactor
c_Ai             mol/l       Concentration of A in feed
E_A      �       J/mol       Activation energy
R        �       J/(K mol)   Gas constant
k        �       min^-1      Frequency factor
ρ        �       kg/l        Ave. density in reactor
c_v      �       J/(K kg)    Ave. heat capacity in reactor
V        �       l           Reactor volume
ΔH_r     �       J/mol       Reaction energy
q_i              l/min       Feed flow-rate
q_o              l/min       Outlet flow-rate
Q                MW          Heat flow-rate (heat exchanger)
�        �       min         Dead-time
T*       �       K           Temperature set-point
c*_A     �       mol/l       Concentration set-point

Table �. Symbols.
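The regime decomposition described above, with three temperature regimes and two flow-rate regimes, can be sketched with normalized validity functions. Gaussian-shaped functions are used here as one common choice; the function shape, the centers, and the widths below are hypothetical values picked for illustration and are not taken from the thesis.

```python
import numpy as np

# Hypothetical regime centers and widths (illustration only).
T_CENTERS = np.array([350.0, 370.0, 390.0])   # "low", "medium", "high" temperature [K]
Q_CENTERS = np.array([150.0, 450.0])          # "low", "high" feed flow-rate [l/min]
T_WIDTH, Q_WIDTH = 10.0, 150.0

def validity_weights(T, qi):
    """Normalized validity of the six local models at operating point (T, qi).

    The weights are ordered with temperature as the slow index:
    (low T, low q), (low T, high q), (medium T, low q), ...
    """
    gT = np.exp(-0.5 * ((T - T_CENTERS) / T_WIDTH) ** 2)
    gq = np.exp(-0.5 * ((qi - Q_CENTERS) / Q_WIDTH) ** 2)
    rho = np.outer(gT, gq).ravel()    # six unnormalized validity values
    return rho / rho.sum()            # normalize so the weights sum to one

w_low = validity_weights(350.0, 150.0)    # dominated by the (low T, low q) regime
w_high = validity_weights(390.0, 450.0)   # dominated by the (high T, high q) regime
```

The normalization makes the weights sum to one at every operating point, so the weights can be used directly to interpolate between local models.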

The model structure is based on six local ARX model structures of the form

"y�t �jt� � A�iy�t� B�iu�t� B��iu�t� �� Ci

Clearly, since the model structure is different from the simulated system, there will be modeling error. In principle, the modeling error can be reduced by a finer decomposition of the operating regimes, but this will lead to an increase in the number of parameters, which will introduce uncertainty in itself, cf. the bias-variance dilemma discussed in Chapter �. However, we will see that although the chosen decomposition is quite rough, it is sufficient for our purpose, and it provides a good trade-off between model bias and variance. Observe that rather elementary process knowledge has been applied in order to find the model structure.
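A model built from local ARX structures of this kind can be evaluated by weighting the local predictions with the local model validity functions. The sketch below assumes that the validity weights have already been computed and normalized, and that each local model uses the current output, the current input, and the previous input; the array layout and the name `predict_output` are assumptions made for this example.

```python
import numpy as np

def predict_output(weights, A, B0, B1, C, y, u, u_prev):
    """Interpolate the predictions of the local ARX models.

    weights : (n_models,) normalized validity of each local model
    A, B0, B1 : (n_models, n_out, n_out or n_in) local parameter matrices
    C       : (n_models, n_out) local offset terms
    y, u, u_prev : current output, current input, previous input
    """
    local = A @ y + B0 @ u + B1 @ u_prev + C   # (n_models, n_out) local predictions
    return weights @ local                     # validity-weighted combination

# Six toy local models whose A matrices are 0*I, 1*I, ..., 5*I.
A = np.stack([i * np.eye(2) for i in range(6)])
Z = np.zeros((6, 2, 2))
C = np.zeros((6, 2))
y = np.array([1.0, 2.0])
u = np.zeros(2)

one_hot = np.zeros(6)
one_hot[2] = 1.0
y_single = predict_output(one_hot, A, Z, Z, C, y, u, u)             # only model 2 is active
y_blend = predict_output(np.full(6, 1.0 / 6.0), A, Z, Z, C, y, u, u)  # equal blend
```

When a single regime is fully valid the prediction reduces to that of the corresponding local model, and between regimes the prediction is a smooth blend, which is the interpolation idea behind the operating regime based model.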

The implicit control equation is non-linear with respect to $u(t)$, and a numerical equation solver is used to solve (�) at each discrete time step. The control input from the previous time-step is used as the initial value in this algorithm. No convergence problems were observed. Since the system is open loop unstable ($\Delta H_r < 0$), it is important to have initial parameter estimates which give a


[Figure: plot of the local model validity functions over the axes Temperature and Feed Flow Rate]

Figure �. Local model validity functions used in simulation example.

controller that stabilizes the system. Initial estimates of the � unknown parameters in the model are found off-line using a � time-steps data sequence and a least squares estimator. There are no restrictions on the parameter estimates, and we use the on-line estimator described in section �. Since there is no problem with local model validity functions that asymptotically go to zero, cf. Fig. �, no thresholding is applied. The gain is low, � � ��, assuming there is no need for fast adaptation. On the basis of the linearized and decoupled nominal system yi�t� � q��vi�t�, we choose the outer level controller as simple integrators Gj�q�� q��, which can be proved to give a fair trade-off between high bandwidth and robustness for this nominal system.

A simulation sequence with changing temperature set-point and disturbances in feed composition is shown in Fig. �. For comparison, we also show results using two SISO PI-controllers

\[ Q(t) = K_Q \frac{1 + \tau_i s}{\tau_i s} \left( T^*(t) - T(t) \right) \]

\[ q_i(t) = K_q \frac{1 + \tau_i s}{\tau_i s} \left( c_A^*(t) - c_A(t) \right) \]

where $\tau_i$ = � min, $K_Q$ = � MW/K, $K_q$ = � l²/(mol min), and $s$ is the Laplace operator. These controllers are well tuned in the sense that we have made a significant effort to find PI-parameters that simultaneously give fast response and the smallest possible overshoot. Despite this effort, the response with the PI controller is slower than the response with the model based controller, and the


[Figure: five time series plotted against Time [hours]: Temperature T [K], Composition c_A [mol/l], Feed composition c_Ai [mol/l], Heat flow-rate Q [MW], and Feed flow-rate q_i [l/min]]

Figure �. Simulation sequence. Dotted curves are set-points, solid curves are simulation with adaptive controller, while dashed curves are simulation with PI controller.


PI controller gives significant overshoot after set-point changes. Notice that the large overshoot is due to the combination of dead-time and system couplings, and cannot be avoided by re-tuning the PI controller. The model based controller is able to decouple the system, and therefore avoids overshoot. Both controllers are able to reject the disturbances in feed concentration, the adaptive controller being the somewhat better of the two. The adaptive model based controller is able to do this through the integral action in $G_j(q^{-1})$ and by adapting the parameter estimates. Since the estimator is slow compared to the control loop, the first effect is the most significant, but the second one is also important since the disturbance directly influences the dynamics of the system. The control authority of the two controllers is comparable, but the PI controller uses a little more control effort.
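For readers who want to reproduce a comparison of this kind, a PI controller of the form used as the reference here can be approximated in discrete time as sketched below. The forward Euler approximation of the integral, the class name, and the numerical values are assumptions made for illustration; they are not the tuning used in the simulation study.

```python
class PIController:
    """Discrete-time approximation of u = K (1 + 1/(tau_i s)) (setpoint - measurement)."""

    def __init__(self, gain, tau_i, dt):
        self.gain = gain        # proportional gain K
        self.tau_i = tau_i      # integral time
        self.dt = dt            # sampling interval
        self.integral = 0.0     # running integral of the control error

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt            # forward Euler integration
        return self.gain * (error + self.integral / self.tau_i)

# Hypothetical tuning: gain 2, integral time 10 min, one-minute sampling.
pi = PIController(gain=2.0, tau_i=10.0, dt=1.0)
u1 = pi.step(setpoint=1.0, measurement=0.0)
u2 = pi.step(setpoint=1.0, measurement=0.0)
```

With a constant control error the output grows linearly through the integral term, which is the integral action that removes steady-state offset after a set-point change or a constant disturbance.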

An alternative control structure using an operating regime based model applied to the same simulated system is reported in (Foss and Johansen �).

�� Discussion

In this chapter we have proposed an adaptive control structure based on local ARX models. Theoretical properties, like stability and robustness, are investigated, and practical design trade-offs are briefly discussed. The theoretical analysis is based on reasonably weak assumptions on the system and model. The main assumptions are

• Invertibility of the model, and global exponential stability of the nominal inverse model. This excludes strong non-minimum-phase effects, and was made necessary by the particular control structure.

• The unstructured uncertainty is relatively small. This is a fundamental assumption.

• The nominal system is time-invariant. This assumption is made for convenience only, and can be relaxed to include slowly time-varying non-linear nominal systems, see Appendix A.

Unfortunately, the results are of a qualitative nature. This is due to both the generality of the setup, and the application of input-output stability theory. However, we believe that more quantitative, but somewhat weaker results can be found using Lyapunov theory, similar to Chen and Khalil (�) and Polycarpou and Ioannou (�).

The simulation example illustrates two points. First, the prior knowledge required for developing the model structure in this example is quite elementary and of a more qualitative character than what would be needed for the design of a mechanistic model structure. This is consistent with the results in Chapters � and �. Second, the accuracy of the model is sufficient to give good performance when used in the model based controller.

The application of local models in the context of adaptive control is attractive because only local persistence of excitation is needed to avoid drift and bursting phenomena, rather than global persistence of excitation that may be needed with a non-local model representation.


Chapter ��

Conclusions

�� Operating Regime based Modeling

Some specific conclusions have been made in the discussion sections concluding each chapter. The purpose of this section is to summarize and make some overall conclusions. This will lead to some advice regarding the applicability of the operating regime based modeling framework, and suggestions for future work.

�� Transparency

In this thesis we have analyzed a modeling framework based on the concept of operating regimes. This is appealing, since this is a concept that is well understood and applied by process engineers, and some operators and managers. Hence, the framework may support effective interdisciplinary communication about the system. The focus will be forced towards representation of knowledge related to the overall operation of the system in terms of operating regimes. Such knowledge is often qualitative and vague, and sometimes difficult to incorporate in other approaches. A major benefit of the operating regime concept is that it elevates the modeling problem to a more abstract level compared to purely equation based modeling. Of course, frameworks based on geometry-phenomena and some object oriented approaches, e.g. (Telnes �, Marquardt �), have this feature, too.

Transparency of the model is important because it simplifies both model development and application. In particular, validation, analysis, interpretation, and incorporation of prior knowledge benefit strongly from a transparent model. For example, we believe lack of transparency is one major reason why complex black-box model representations like neural nets have only had limited applicability to model-based control. The operating regime based modeling framework gives models that are reasonably transparent, and supports both the empirical and mechanistic modeling paradigms. The transparency is related to the possibility of interpreting the operating regimes in terms of either physical phenomena or


different behaviors of the system. Moreover, it is related to our ability to interpret the local models independently, because the local models are typically simple enough to allow interpretation. This last point is not so important, as it is possible to linearize most models about interesting operating points to study their local behavior. The operating regime interpretation, however, is an inherent feature of the particular model representation.

�� Hybrid Modeling

The operating regime based modeling framework is flexible and supports hybrid modeling, in the sense that the different operating regimes or local models may have different

• levels of accuracy and detail,
• model representations,
• amounts and quality of relevant knowledge and data available,
• levels of transparency,
• computational algorithms associated with them, like estimators.

For large scale modeling problems, this flexibility may be useful. In particular, an essential part of hybrid modeling is the balanced combination of empirical data and prior knowledge in various forms.

The operating regime based modeling approach has found inspiration in, and applies methods from, a number of different fields. At least superficially, the operating regime based modeling approach provides a unification of related modeling approaches that have arisen in different fields, cf. Fig. �.

[Figure: diagram with the labels EMPIRICAL MODELING, MECHANISTIC MODELING, and HYBRID MODELING, and the fields Fuzzy, Neural, Statistics, System Identification, Physics, Chemistry, and Biology]

Figure �. Elements of hybrid modeling.


�� Incremental and Iterative Modeling

Typically, process knowledge will improve with time, as experience increases, new phenomena are discovered, and more process data become available. Sometimes, such new knowledge is relevant only to particular operating conditions. If this is the case, the operating regime based modeling framework supports simple maintenance of the model, since one may only need to change one local model, or one may choose to decompose one operating regime into two or more sub-regimes. The remaining local models and operating regimes can be left unchanged. In a sense, the operating regime based modeling approach can be thought of as modular, where the modules are the different operating regimes with local models.

Since some knowledge about the validity of each local model is explicit in the model representation, it is often possible to deduce operating conditions under which the system is not sufficiently well understood. Hence, experiment design support is directly available in the form of hints about which operating conditions experiments should be conducted under in order to improve the model. In addition, standard experiment design tools can be applied (Fedorov �, Box and Draper �).

The operating regime based modeling framework supports iterative model development. Consider for example the batch fermenter modeling example. One may start with a semi-empirical local linear model structure, cf. section �, and gradually add more structure to both the local models and operating regimes, until one gets a semi-mechanistic model structure, cf. section �. The identified local models provide new insight into the reaction kinetics, which may lead to a mechanistic kinetic model, and eventually a completely mechanistic process model.

�� Computer Aided Modeling

With examples and philosophical arguments we have explored some of the flexibility of the operating regime based modeling framework. The high flexibility is reflected by the fact that there are a large number of decisions to be made before a complete model is developed. The tasks are well defined and not too strongly interconnected. A typical modeling session would include iterations of the tasks

• Choice of model representation, e.g. state-space, input-output, distributed.
• Choice of which variables to apply to characterize the operating regimes.
• Design of experiments.
• Decomposition of the operating range into operating regimes.
• Development and identification of local model structures.
• Parameter identification.
• Model reduction.
• Model validation.


It is clear that some of these tasks are usually better performed by computers than engineers (e.g. parameter identification), and vice versa (e.g. choice of model representation). Other tasks like structure identification (choice of regimes and local model structures), experiment design, model reduction, and model validation are more suitably solved by cooperation between engineer and computer, where the emphasis on man or computer may differ considerably among different modeling problems. We therefore suggest that the operating regime based modeling framework described here should be implemented as a computer aided modeling environment. In particular, the computer should support the user with the computation of statistical properties and visualization of the data, knowledge, model structure and parameters in a flexible, interactive modeling environment, and leave it to the user to pursue the more interesting alternatives and make the important decisions.

�� Applicability

The operating regime based modeling framework is intended to support the design of non-linear models. While linear modeling is sufficient for solving many of today's control or optimization problems, non-linear modeling may sometimes be of interest when the system is operating within a wide range of operating conditions, for example batch processes, continuous processes during startup and shutdown, product shifts, maintenance and other exceptional operating conditions.

[Figure: chart with the axes "Mechanistic Knowledge" and "Process Data and Empirical Knowledge", each ranging from Poor to Good, with the regions Global empirical model, Global mechanistic model, Local mechanistic models, Local empirical models, and Combined local mechanistic and empirical models]

Figure �. The figure illustrates in which cases the operating regime based modeling framework is useful, as a function of the available mechanistic and empirical knowledge and data.

Fig. � illustrates our view on when the operating regime based modeling framework is useful, cf. Fig. �. The conditions are characterized by moderate amounts


of both prior knowledge and empirical data, as opposed to the mechanistic modeling approach which requires very good prior knowledge, and the empirical modeling approach that requires large amounts of informative process data. The full range of knowledge levels in the gap between the empirical and mechanistic approaches can be addressed with the operating regime based modeling approach, as illustrated by the examples:

• Local empirical models: Fermenter in sections � and �, heat transfer process in section �, hydraulic manipulator in section �, and CSTR in section �.

• Mixed local empirical and mechanistic models: pH neutralization example in section �.

• Local mechanistic models: Fermenter in section �.

Its flexibility combined with its transparency is the major advantage of this approach. For example, compared to a neural network based modeling approach, we have demonstrated that the operating regime based modeling approach has the same ability to fit empirical data, but due to its transparency we find it significantly more appealing. On the other hand, with a mechanistic local model structure where the unmodeled terms are modeled with simple empirical functions, the approach is very close to practical mechanistic modeling. It may be that the operating regime based modeling framework is not "the very best approach" for any given problem, but its flexibility makes it useful for a wide range of problems.

Recently, some industries have shown great interest in, and accepted, certain model predictive control algorithms (Cutler and Ramaker �, Garcia et al. �). In general, these algorithms are based on linear empirical input-output models. In the future, the demand for improved plant operation may require controllers based on non-linear models to become standard, too. We believe that modeling and identification will be the major bottleneck in such control systems, and the success of empirical models for linear model predictive control makes it not unlikely that certain industries will insist on applying non-linear input-output models with a significant amount of empiricism. The operating regime based modeling approach discussed in this thesis is in our view an interesting approach for this purpose, cf. also Chapters � and �.

�� Open questions

Although the results reported in this thesis are promising, several open questions remain.

• How does the operating regime based modeling approach perform on complex and high-dimensional problems? It may be that a significant amount of transparency is lost. Furthermore, prior knowledge may be available in forms that are not possible to utilize easily with this approach. These aspects are not covered by the simple examples in this thesis, nor are they convincingly demonstrated by the more complex examples in the literature, e.g. (Sugeno and Kang �), which are too incompletely documented to allow us to draw any conclusions.


• Is this framework really easy to apply? We have made a large effort to develop and understand this framework. This makes us expert users. The framework may possibly be less transparent and more difficult to apply for inexperienced users. We have observed that inexperienced users (through student projects and diploma theses) intuitively understand the idea and find it appealing. However, the interplay between the choice of local model structures and operating regimes seems to require some experience to understand completely.

• Are the procedures and algorithms robust with respect to deficient and incomplete prior knowledge and data sets? The data sets we have applied here are either simulated or experimental data under rather ideal experimental conditions.

• How should the structure of a computer aided modeling environment be? We have only implemented some core algorithms. A software prototype has never been a partial goal for this thesis, and there remains a considerable effort before the structure of a computer aided modeling environment can be seen, in particular related to man-machine interaction. It is important that the software environment does not reduce the transparency of the approach. Moreover, the environment must be flexible, yet so focused that it is easy to use and understand.

A complete validation of the framework, and an answer to the above questions, can only be accomplished through one or more tests on real modeling problems of industrial complexity. This is not within the scope of this thesis.

�� Empirical Modeling� System Identi�cation�

and Regularization

Even though the material in Chapter � is somewhat preliminary and not treated in as much depth as one might desire, we feel that it is an important contribution, and deserves some conclusions.

We have shown that the non-linear semi-empirical modeling problem can be addressed at a higher and more transparent level than the direct specification of a parameterized model structure. Prior knowledge in different forms can be applied to constrain the model. The method can be viewed in the context of regularization.

Regularization is a quite new tool in the empirical modeling tool-box, but has found wide applicability in solving inverse problems in a number of other fields in science and engineering (Tikhonov and Arsenin �, Nashed �, Sabatler �). With regularization, the number of parameters and the parameterization of the model are of minor importance, as it is the model's effective number of parameters and the constraints on the parameter space introduced by the prior knowledge that are important. Hence, the underlying model structure is less important than the prior knowledge used in the regularization to constrain the parameter space. Of course, regularization can also be applied when a good parameterized model structure


is known� in order to robustify the parameter identi�cation problem� We havealso seen that the modeling and identi�cation problem can be approached in anin�nite�dimensional function space rather than in a �nite�dimensional parameterspace� Although of limited practical applicability� this provides some fundamentalinsight into the interplay between prior knowledge and model structure�

It is interesting to observe that regularization can be implicit� like the singu�lar value decomposition approach to matrix inversion� stopping an iterative opti�mization �Sj�oberg and Ljung � �� Ljung et al� � �� and in local learning andsmoothing algorithms �Hastie and Tibshirani � �� Murray�Smith � �a�� Notunderstanding the e�ect of implicit regularization have led some researchers tocriticize researchers in the neural network �eld for applying over�parameterizedmodels� At least sometimes� this criticism has been unfair�

The practical limitations of regularization are mainly related to the additionalcomputational complexity introduced� Various heuristics and short�cuts are nec�essary to approach high�dimensional problems with regularization� Another po�tential problem is the reduced transparency� as concepts like �e�ective number ofparameters� and the e�ect of the constraints on the parameter space introducedby the prior knowledge may be di�cult to grasp� However� we believe regulariza�tion has the potential to become a key tool in the non�linear system identi�cationand semi�empirical modeling �elds�

Structure identification has traditionally been given less attention than parameter identification. However, with more complex parametric model sets and more computer power available, the structure identification problem has become more important, but certainly not simpler. Compared to today's practice, we feel that one should attempt to incorporate some more prior knowledge in both the structure and parameter identification criteria, in order to reduce the demand for empirical data and to improve the robustness with respect to data set deficiencies. The application of regularization to robustify the parameter identification problem is one possibility, and the bootstrap structure identification criteria introduced in an earlier chapter are a complementary method. However, several practical and theoretical issues remain unsolved.

�� Adaptive Control

This thesis contains some results on the stability and robustness of an adaptive control algorithm which applies to quite a large class of non-linear systems.

In an earlier chapter we apply a set of elementary analysis tools from input-output stability theory (some norm inequalities and the small gain theorem) to prove stability of an operating regime based adaptive controller. In Appendix A, a more general class of models is studied, using weighted $l_{2\delta}$-norms and the Bellman-Gronwall lemma. The set of assumptions made is perhaps one of the least restrictive in the adaptive control literature. A continuous-time variant based on a non-linear state-space model representation is presented in (Johansen and Ioannou). It is interesting to observe that the robust stability proofs are essentially the same as in the case of linear models. This is a feature of the input-output stability theory. Unfortunately, the results are even more qualitative, because the various bounds involved are potentially very conservative in the non-linear case. This is another feature of the input-output stability theory. Hence, these results have no major practical significance. They mainly serve as an illustration of the fundamental limitations and general design trade-offs.

Appendix A

Robust Adaptive Control of Slowly Time-varying Non-linear Systems

The application of input-output stability theory (Desoer and Vidyasagar), and in particular weighted $L_{2\delta}$-norms in the continuous-time case (Ioannou and Datta; Tsakalis) and weighted $l_{2\delta}$-norms in the discrete-time case (Datta), has turned out to be attractive for the analysis of stability and robustness of adaptive control loops based on linear models.

The purpose of this appendix is to illustrate the application of weighted $l_{2\delta}$-norms for analysis of adaptive controllers based on non-linear models. The control structure we have chosen is a simple feedback linearizing controller (Monaco and Normand-Cyrot; Sastry and Isidori; Nijmeijer and van der Schaft), based on a non-linear discrete-time input-output model with slowly time-varying parameters. The parameter estimation algorithm applies normalization (Praly) and a relative dead-zone (Kreisselmeier and Anderson) to ensure robustness.

This appendix is organized as follows. First, we review the essentials of weighted $l_{2\delta}$-norms. Then we present the model representation in section A.2 and the adaptive control structure in section A.3. The parameter estimation algorithm and the closed loop stability are analyzed in sections A.4 and A.5, respectively. The appendix ends with some discussion of the assumptions made, and some concluding remarks.

The material in this appendix is taken from (Johansen d).

A.1 Preliminaries

The Euclidean norm of a vector $x \in R^n$ is $\|x\| = \sqrt{x^T x}$. The difference sequence $\Delta s$ of a sequence $s = \{s(t)\}$ is defined by $\Delta s(t) = s(t) - s(t-1)$ for all $t > 0$. The truncation of a sequence $s$ at time $t$ is denoted $s_t$. The exponentially weighted $l_{2\delta}$-norm of a sequence is defined by

$$\|s_t\|_{2\delta} = \sqrt{\sum_{\tau=0}^{t} \delta^{t-\tau} \|s(\tau)\|^2}$$

for any $\delta \in [0, 1]$ and $t \ge 0$. The norm of a discrete-time transfer function $H(q)$, where $q$ is the one-step prediction operator and $H(q)$ has all poles strictly inside the unit disc, is defined by

$$\|H(q)\|_{\infty} = \sup_{\omega} |H(e^{j\omega})|$$

If $H(q)$ is proper and has all poles on or inside the disc $|c| < \sqrt{\delta}$, then the exponentially weighted norm is defined by $\|H(q)\|_{\infty\delta} = \|H(\sqrt{\delta}\,q)\|_{\infty}$. The mixed notation $s_2(t) = H(q^{-1}) s_1(t)$, where $H(q^{-1})$ is an exponentially stable transfer function, means the convolution $s_2(t) = h(t) * s_1(t)$, where $h(t)$ is the inverse z-transform of the transfer function $H(q^{-1})$. The exponentially decaying term, which may appear as a consequence of non-zero initial conditions, is consistently neglected in this work, as it does not affect the stability analysis. The following lemmas will be used frequently.

Lemma � Let s��t� � H�q��s��t�� where H�q�� is a proper causal transferfunction that has all poles in the disc jcj � p

�� where � � �� %� and supposePt�� s

�� Then jj�s��tjj�� jjH�q��jj��jj�s��tjj�� for all t �

Proof� Given in �Datta � ��

�
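The weighted norm and the first lemma can be checked numerically on a simple case. The sketch below assumes that the weighted norm of a scalar sequence truncated at time t is the square root of the sum of delta^(t - tau) * s(tau)^2, and that the lemma bounds the weighted norm of a filter output by the weighted gain of the filter times the weighted norm of its input. The first-order filter, the random input, and all names are hypothetical choices made only for illustration.

```python
import math
import random


def weighted_norm(s, delta):
    """Exponentially weighted l2 norm of the list s, truncated at its last sample."""
    t = len(s) - 1
    return math.sqrt(sum(delta ** (t - tau) * s[tau] ** 2 for tau in range(t + 1)))


def first_order_filter(s1, a):
    """s2(t) = a*s2(t-1) + s1(t), i.e. H(q^-1) = 1/(1 - a*q^-1), zero initial state."""
    s2, state = [], 0.0
    for x in s1:
        state = a * state + x
        s2.append(state)
    return s2


delta, a = 0.9, 0.5   # the pole a lies inside the disc of radius sqrt(delta)

# Weighted gain of this filter: sup over the unit circle of |1/(1 - (a/sqrt(delta))*z^-1)|.
gain = 1.0 / (1.0 - abs(a) / math.sqrt(delta))

random.seed(0)
s1 = [random.uniform(-1.0, 1.0) for _ in range(200)]
s2 = first_order_filter(s1, a)

# Check the inequality for every truncation time.
bound_holds = all(
    weighted_norm(s2[:k], delta) <= gain * weighted_norm(s1[:k], delta) + 1e-12
    for k in range(1, len(s1) + 1)
)
print(bound_holds)
```

The zero initial filter state mirrors the convention of neglecting exponentially decaying initial-condition terms.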

Lemma � Let s� be a positive sequence� i�e� s��t� � for all t� Let the transferfunction H�q�� be de�ned by H�q�� K�� q�� for some K � and� � $�� Suppose the sequence s� satisfy

s��t� � H�q�� s��t� ��s��t� �� ns��t� n��

for some non�negative integer n and constants �� n �� Then�s��t� � H�q��

��

�� n��n�s��t�

Proof� Simple�

�

A.2 Model Representation

We consider linearly parameterized non-linear discrete-time input-output models, written in the predictor form

$$y(t+d) = \phi^T(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r))\,\theta(t) + \eta(t+d) \qquad (A.1)$$

with scalar input and output sequences $u$ and $y$, and $m$ and $r$ are non-negative integers. The integer $d \ge 1$ is the system's time-delay. Alternatively, it can be interpreted as the system's relative degree (Monaco and Normand-Cyrot). The sequence $\eta$ is unstructured uncertainty, which contains unmodeled dynamics (modeling error and reduced order model), disturbances, noise, effects due to sampling of continuous-time signals etc. The function $\phi$ is possibly non-linear, and the parameter sequence $\theta$ may be slowly time-varying.

AP. There exists a constant $\mu \ge 0$ such that $\|\Delta\theta(t)\| \le \mu$ for all $t > 0$. In addition, we know a convex and compact set $\Theta$ such that $\theta(t) \in \Theta$ for all $t \ge 0$.

This means that the parametric variation from time $t-1$ to $t$ is bounded in norm by the small constant $\mu$. Contrary to the time-invariant case, we need to assume that $\Theta$ is compact, in order to show boundedness of the parameter estimate. As a consequence, there exists a $\kappa > 0$ such that $\theta \in \Theta$ implies $\|\theta\| \le \kappa$.

AB. The sign of the high-frequency gain is uniformly constant and known. Without loss of generality, we assume here that it is positive.

$$\frac{\partial}{\partial u(t)}\,\phi^T(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r))\,\theta \ge \beta > 0 \qquad (A.2)$$

holds uniformly for all $\theta \in \Theta$, input sequences $u$, and output sequences $y$. It is assumed that $\phi$ is a continuously differentiable function of $u(t)$. Moreover, $\phi$ is bounded by an affine function, i.e. there exists a constant $C > 0$ such that

$$\|\phi\| \le C\left(1 + |y(t)| + \cdots + |y(t-m)| + |u(t)| + \cdots + |u(t-r)|\right)$$

□

AB contains a strong global controllability assumption. In the usual global controllability definition, it is required that any state can be reached from any initial state within an unspecified, but finite time (Nijmeijer and van der Schaft). Our definition requires this time to be $d$, which is the shortest possible. On the other hand, we are only concerned about the output, not the full state.

AR. Let $y^*$ be a bounded reference sequence that is known $d$ steps ahead in time. The bound is denoted $K^* = \sup_{t \ge 0} |y^*(t)|$. □

The problem we address here is the one of asymptotically tracking this reference. In other words, if the tracking error is defined as $\tilde{y} = y^* - y$, then the control objective is to make $|\tilde{y}(t)|$ as small as possible as $t \to \infty$.

A.3 Adaptive Control Structure

With the above model representation, the adaptive certainty equivalence feedback linearizing controller is defined by the implicit equation

$$v(t) = \phi^T(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r))\,\hat\theta(t) \qquad (A.3)$$

where $\hat\theta(t)$ is an estimate of $\theta(t)$. From (A.2) it follows that there always exists a control input $u(t)$ that satisfies (A.3), and it is unique. With the shorthand $\phi(t) = \phi(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r))$, the closed loop behavior is now described by

$$y(t+d) = v(t) + \phi^T(t)\left(\theta(t) - \hat\theta(t)\right) + \eta(t+d)$$

The following linear control algorithm is applied to control this nominally linear system

$$v(t) = y^*(t+d) + G(q^{-1})\left(y^*(t) - y(t)\right) \qquad (A.4)$$

where $G(q^{-1}) = P(q^{-1})/Q(q^{-1})$. Now the closed loop satisfies

$$\tilde{y}(t) = M(q^{-1})\left(\phi^T(t-d)\left(\hat\theta(t-d) - \theta(t-d)\right) - \eta(t)\right) \qquad (A.5)$$

where

$$M(q^{-1}) = \frac{1}{1 + G(q^{-1})\,q^{-d}} = \frac{Q(q^{-1})}{Q(q^{-1}) + q^{-d}P(q^{-1})}$$

The effect of the parametric and unstructured uncertainty in (A.5) will be analyzed in the remainder of this paper.
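Because the certainty equivalence control law is only defined implicitly, the control input has to be computed numerically at each sample. The sketch below assumes a hypothetical scalar predictor that is strictly increasing in the input, so that the positivity assumption on the high-frequency gain holds, and finds the unique input by bracketing and bisection. The regressor, the parameter values and the function names are illustrative assumptions, not the controller implementation used in this work.

```python
import math


def predictor(u, y, theta):
    """Hypothetical phi(y, u)^T theta; its derivative w.r.t. u is at least theta[1] > 0."""
    return theta[0] * y + theta[1] * u + theta[2] * math.tanh(u)


def solve_control(v, y, theta, tol=1e-10):
    """Find the unique u with predictor(u, y, theta) == v by bracketing and bisection."""
    lo, hi = -1.0, 1.0
    # Expand the bracket until the monotone predictor straddles the target v.
    while predictor(lo, y, theta) > v:
        lo *= 2.0
    while predictor(hi, y, theta) < v:
        hi *= 2.0
    # Bisection: the sign of predictor(mid) - v tells which half contains the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictor(mid, y, theta) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_hat = [0.8, 0.5, 1.0]          # hypothetical parameter estimate
u = solve_control(v=2.0, y=1.0, theta=theta_hat)
print(abs(predictor(u, 1.0, theta_hat) - 2.0) < 1e-8)
```

Strict monotonicity in the input is what guarantees that the bracket expansion terminates and that the root is unique. If the model contained actuator saturation, the bracket search could fail to terminate for some values of v.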

A.4 Parameter Estimation

The estimate "��t� of ��t� is based on the predictor "y�tjt� �� T �t� d�"��t� ��which gives the prediction error

e�t� � "y�tjt� �� y�t� � �T �t � d�"��t� �� T �t� d��t� d�� t� �A��

and normalized prediction error ��t� � e�t�n�t�� where the scalar normalizingsequence n is de�ned by

n��t� � �n��t� �� jj��t� d�jj�

with n�� and � � �� given� The normalization is introduced to ensure thatthe normalized unstructured uncertainty is bounded �Praly � ��

AU� The unstructured uncertainty satis�es j��t�j � Vn�t�� where V �� We apply a parameter estimation algorithm with normalization and relative dead�zone �Kreisselmeier and Anderson � ��

"��t� � "��t� �� t� d�

� �T �t � d��t� d�n�t�D ��t�� A��

"��t� � P� "��t�

��A��

D��t�� t� d if ��t� � �d� if j��t�j � d��t� � d if ��t� � d

�A� �


where $P_\Theta$ is a continuous parameter projection that projects its argument to the closest point (using the Euclidean norm) in $\Theta$. The continuous function $D$ is referred to as a dead-zone function, the constant $d_0 > 0$ is the dead-zone, and $\gamma > 0$ is the estimator gain. For convenience, the parameter error sequence is defined by $\tilde\theta(t) = \hat\theta(t) - \theta(t)$. Next, we examine the properties of this algorithm.
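The structure of such an estimator (normalization, relative dead-zone, projection) can be sketched in a few lines. The code below is a minimal sketch under stated assumptions, not the exact algorithm of this section: it assumes a normalized-gradient update with the constant 1 in the denominator, a normalizing signal of the form n^2(t) = delta*n^2(t-1) + 1 + ||phi||^2, and a box as the convex compact parameter set. The regressor, noise, and all numerical values are hypothetical.

```python
import math


def dead_zone(eps, d0):
    """Relative dead-zone: zero inside [-d0, d0], shifted identity outside."""
    if eps > d0:
        return eps - d0
    if eps < -d0:
        return eps + d0
    return 0.0


def project_box(theta, lower, upper):
    """Euclidean projection onto a box, one simple convex and compact set."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(theta, lower, upper)]


def estimator_step(theta_hat, phi, y, n_prev, gamma=0.5, d0=0.05, delta=0.9,
                   lower=(-2.0, -2.0), upper=(2.0, 2.0)):
    """One update; returns the new estimate and the new normalizing signal."""
    phi_sq = sum(p * p for p in phi)
    n = math.sqrt(delta * n_prev ** 2 + 1.0 + phi_sq)        # assumed normalization
    e = sum(p * th for p, th in zip(phi, theta_hat)) - y     # prediction error
    step = gamma * n * dead_zone(e / n, d0) / (1.0 + phi_sq)  # assumed update form
    theta_bar = [th - step * p for th, p in zip(theta_hat, phi)]
    return project_box(theta_bar, lower, upper), n


theta_true = [1.0, -0.5]             # hypothetical true parameters inside the box
theta_hat, n = [0.0, 0.0], 1.0
for t in range(200):
    phi = [math.sin(0.3 * t), math.cos(0.7 * t)]
    noise = 0.01 * math.sin(5.0 * t)  # bounded disturbance below the dead-zone level
    y = sum(p * th for p, th in zip(phi, theta_true)) + noise
    theta_hat, n = estimator_step(theta_hat, phi, y, n)

print([round(x, 2) for x in theta_hat])
```

The dead-zone switches adaptation off once the normalized prediction error is no larger than the assumed uncertainty level, so the estimate stops moving instead of drifting with the disturbance.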

Lemma � Suppose the outputs are generated by

y�t d� � �T �t��t� ��t d�

and consider the parameter estimation algorithm A��A�� with initial estimate"�� )� If assumptions AP and AU hold� the gain satis�es � � �� and thedead�zone is chosen such that d � V� then the algorithm has the properties

tX��

D��

�

��

��d�� d��

� ��d� ��

t

�

��

�A��

j��t�j � V jD��t��j �A��tX

��

jj&"�� jj� ��

�

��

��d�� d��

� �d� �� d� ��

t

�

��

�A��

for all t ��Proof� Rewriting the equation for the prediction error �A�� gives

��t� ��T �t � d�#��t� ��

n�t�� T �t � d�

n�t��t� d�� t� �� t�

n�t�

Since d � V there exists a sequence � with the properties � � ��t� � � and

D��t�� t��T �t � d�#��t� ��

n�t�� t�

�T �t � d�

n�t��t� d�� t� ��

From �A�� we �nd

#��t� �

�I � ��t�

��t� d��T �t� d�

� �T �t� d��t � d�

#��t� �� t� �A��

where

��t� � �&��t� ��t��t� d��T �t� d�

� �T �t� d��t� d��t� d�� t� ��

De�ning the function V �t� � jj#��t�jj�� we �nd V �t� � jj#��t�jj� from the convexityof )� Hence�

V �t� � V �t� �� T �t� #�T �t� ��

�I � ��t�

��t � d��T �t� d�

� �T �t� d��t � d�

��

I � ��t��t � d��T �t � d�

� �T �t � d��t � d�

#��t� �� t�

� #�T �t� ��#��t� ��


Assume ��t� � �� then

V �t� � V �t� �� T �t��t� ��T �t�

�I � ��t�

��t � d��T �t� d�

� �T �t � d��t � d�

#��t� ��

��

��t��

�n��t�

� �T �t� d��t� d�

��D��t�� t�

�T �t� d�

n�t��t � d�� t � ��

�

or �D��t�� t�

�T �t� d�

n�t��t� d�� t� ��

��

��t�

�� t��

jj��t�jj� �� jj��t�jj � jj#��t� ��jj �&V �t�

��A��

If ��t� � �� then V �t� � V �t� �� and D��t�� Hence� from the properties of� it follows that

D��t��

��

jj��t�jj� �� jj��t�jj � jj#��t� ��jj �&V �t�

� �jj��t � d�� t � ��jj�

Using the fact that jj#��t�jj � �� which implies V �t� � ��for all t �� together

with the fact that AP implies jj��t�jj � d�� we get �A�� By the de�nition of thedead�zone function� �A�� follows� Finally� we get from �A��

tX��

jj&"�� jj� �tX

��

jj"�� "�� jj�

�tX

��

�� &�� T �� &��

�� &�� T�� d��T �� d�

� �T �� d�� d�#��

�� #�T �� d��T �� d�� d��T �� d�

�� T �� d�� d��#��

�

tX��

jj�� &�� jj� ��jj�� &�� jj � jj#�� jj

��n��

� �T �� d�� d�

��D��

�T �� d�

n�� d��

��

Now� �A�� follows from �A��


�

The relative dead-zone is a modification that turns the parameter estimator off when the normalized prediction error becomes small, and therefore avoids drift phenomena that otherwise might be excited by the unstructured uncertainty. There are, however, several other modifications available that lead to parameter estimation algorithms with similar properties and a robust adaptive control system. These include different variations of $\sigma$-modification, $e_1$-modification, and parameter projection (Narendra and Annaswamy; Ioannou and Datta; Datta; Ydstie).

A.5 Closed Loop Stability

With the feedback linearizing control structure, it is well known that certain states may be unobservable (Byrnes and Isidori; Monaco and Normand-Cyrot). In order to ensure boundedness of these states, we must study the behavior of the inverse system. By the Implicit Function Theorem, e.g. (Nijmeijer and van der Schaft), and assumption AB, it is evident that the inverse system is globally defined by a function $g$:

$$u(t) = g(u(t-1), \ldots, u(t-r), y(t+d), y(t), \ldots, y(t-m), \eta(t+d), \theta(t)) \qquad (A.15)$$

where $y$ and $\eta$ are viewed as inputs, and $u$ as the output.

AI. The inverse system (A.15) is globally uniformly exponentially stable, in the sense that its impulse response coefficients are bounded by an exponentially decaying sequence. □

Under this assumption, we can prove the following lemma, which essentially shows that the input is bounded by the output and the unstructured uncertainty.

Lemma � Suppose AB and AI hold� then there exist constants � � $�� andK� � such that for any � � ��

jj�q�du�tjj�� K�

��p��jjytjj�� jj�tjj��

Proof� It follows directly from AI that for some K� �

ju�t�j � K�

tX��

p�t��

�jy�� d�j j�� d�j�

the result follows from Lemma� together with jj��p�q��jj�� p��

�

We are now in a position to prove the main result.


Theorem � Suppose the system A�� is controlled by A�� and A�� and thealgorithm A��A�� is applied to estimate the parameter sequence �� AssumeAB� AR� AP� AU and AI hold� and let � � $�� be a constant such that the impulseresponse coe�cients of M �q�� decays faster than

p�t� Suppose the dead�zone is

chosen as d � V� Let � be arbitrary in the open interval max�� If � and V are both su�ciently small� then for arbitrary initial conditions and"�� )� all variables in the closed loop are bounded� and the average squaredtracking error satis�es

�

t

tX��

#y�� c� �

tc� �A��

for all t � �� where c� is proportional to �� and c� � � when � � � and V � ��

Proof� Let us de�ne

#Y �t� � �T �t � d�#��t� d�� t� �A��

Combining �A�� and �A�� we get

#Y �t�

n�t�� t�

�T �t� d�

n�t�

"��t� d�� "��t� ��

��A��

From the de�nition of the normalizing signal� assumption AB and Lemma �� it isevident that there exists a constant C such that

n��t� � �

�� C�

tX��

�t�� jy�� d�j ju�� d�j��

Using Lemma � and the triangle inequality� we �nd

n��t� � � �C�

�� C�

��q�dy�t

��

��

�C�K��

��p��

�jj�jyj j�j�tjj��because � � �� Using Lemma �� we get

n��t� � � �C�

�� Ky jjytjj�� K� jj�tjj��

where Ky �p�C��d��

p�K��

p�� and K� �

p�CK��

p��

By the triangle inequality� the bound on y�� and �A�� we get

n��t� � K Ky��

p��jj#Ytjj��

�� K� jj�tjj��

where K � �� C� �C�K�y �K

�� This can be written

n��t� � K K�y

�

�p��p��

tX��

�t�� #Y �� K��

tX��

�t��


After some straightforward manipulations and using �A��

n��t� � K K�y

�

�p��p��

tX��

�t��s�� n�� K��

tX��

�t��

where we have de�ned

s��t� �

��t�

�T �t� d�

n�t�

"��t� d�� "��t� ��

��

�A��

It is straightforward to show that there exists a �nite constant B � such that

n��t ��

n��t�� B

for all t �� Hence

n��t� � K t��X��

�t��w�� n�� A��

where w��t� � B��K��V� BK�

y �p��p��s��t�� By the Gronwall lemma� e�g�

�Desoer and Vidyasagar � �� Datta � �� and the relation between arithmeticand geometric means� e�g� �Rudin � �� we get from �A��

n��t� � K K

t��X��

�t��w��

��

��

t� � � �t��X

i��

w��i�

�t��

From Lemma �� it follows after some manipulation that

tX��

w�� t � �A��

where

�

�B��K�

� �BK�

y

�p��p��

�V� �Bd�K�

y

�p��p��

��

��

��

�BdK�y

�p��p��

��

��

�� A��

� ��BK�

y

�p� �p��

��

��

��

Now�

n��t� � K K

t��X��

�t��w�� t��

��

�� t � � � ��

t��


Since �� x��x � e for all x � �� it follows that

n��t� � K K�� exp

� �

�

t��X��

�� t��w��

Taking � and V su�ciently small� it follows from �A�� that � � �� Hence��cf� Lemma �� in �Datta � ��

n��t� � K K� �� exp

��

��

��

� n�

and it is proved that n is a bounded sequence� From �A�� Lemma �� and �A�� it follows that

tX��

#y��

��p��tX

��

#Y ��

��p��tX

��

s�� n��

� �n�

��p��tX

��

�� d� ��

��&"�� From Lemma �� it is evident that �A�� holds� and c� and c� have the statedproperties�

�

A.6 Discussion and Concluding Remarks

The contribution of this paper is a demonstration of the use of weighted $l_{2\delta}$-norms for stability and robustness analysis of adaptive controllers based on non-linear models. Sufficient conditions for robust stability of the adaptive control loop are provided by the theorem in section A.5. Stability requires the unstructured uncertainty $V$ and parametric variations $\mu$ both to be sufficiently small. Because the various bounds involved are potentially very conservative, the performance bound in the theorem is conservative and should be interpreted in a qualitative way. The average squared tracking error will be bounded by a sum of two terms. The first term is the asymptotic performance bound that scales with the uncertainty and parameter variations. The second term is a bound on the transient performance that scales with the bound $2\kappa$ on the parameter error, and vanishes asymptotically at the rate $1/t$. In addition, we have left out a third exponentially decaying term that is due to non-zero initial conditions.

It is clear that the model representation (A.1) contains two major limitations:

1. Assumption AB requires $y(t+d)$ to depend strictly monotonically on $u(t)$. Moreover, the model cannot contain actuator saturation. Typically, if this is violated, a solution to (A.3) will only exist for $v(t)$ (or $y^*(t)$) restricted to some subset of $R$. Then it may be necessary to modify the controller for $v(t)$ to ensure that $v(t)$ remains within its prescribed subset, and analyze the effects of these modifications on the closed loop. Since these problems are of a more general nature and not related to the use of weighted $l_{2\delta}$-norms, we have chosen to apply the restrictive assumption AB in order to simplify the presentation.

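2. The predictor is linearly parameterized. This assumption can be relaxed to include non-linearly parameterized models

$$y(t+d) = f(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r), \theta(t)) + \eta(t+d) \qquad (A.23)$$

where $f$ is a non-linear function that is twice continuously differentiable with respect to $\theta$. With the certainty equivalence controller

$$v(t) = f(y(t), \ldots, y(t-m), u(t), \ldots, u(t-r), \hat\theta(t))$$

we get using Taylor's theorem

$$\tilde{y}(t) = M(q^{-1})\left(\frac{\partial f}{\partial\theta}(\hat\theta(t-d))\,\tilde\theta(t-d) - \frac{1}{2}\,\tilde\theta^T(t-d)\,\frac{\partial^2 f}{\partial\theta^2}(\bar\theta(t-d))\,\tilde\theta(t-d) - \eta(t)\right)$$

where $\bar\theta(t) = \rho(t)\theta(t) + (1 - \rho(t))\hat\theta(t)$ for some $\rho(t) \in [0, 1]$. Defining

$$\phi^T(t) = \frac{\partial f}{\partial\theta}(\hat\theta(t)), \qquad \bar\eta(t) = \eta(t) + \frac{1}{2}\,\tilde\theta^T(t-d)\,\frac{\partial^2 f}{\partial\theta^2}(\bar\theta(t-d))\,\tilde\theta(t-d)$$

we get

$$\tilde{y}(t) = M(q^{-1})\left(\phi^T(t-d)\,\tilde\theta(t-d) - \bar\eta(t)\right)$$

Under similar conditions as before, it is clear that there exists a constant $K > 0$ such that $|\bar\eta(t)| \le \bar{V} n(t)$ where

$$\bar{V} = V + K\kappa^2$$

If $\mu$, $\kappa$ and $V$ are all sufficiently small, then the theorem in section A.5 still holds. Hence, the price to pay for the relaxed assumption is that the parametric uncertainty must be sufficiently small.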

Assumptions AP, AU, and AR are standard in the certainty equivalence adaptive control literature. The invertibility condition AI is imposed by the particular control algorithm, while the bound on $\phi$, cf. AB, is directly related to the use of input-output global stability theory arguments; see also (Chen and Khalil) for a related local Lyapunov approach. While the affine boundedness appears to strongly restrict the class of mathematical systems (Sastry and Isidori; Schwartz and Mareels), it is clear from a more practical point of view that systems where the states do not diverge at a rate faster than exponential will satisfy this assumption. We cannot think of any man-made physical system that does not satisfy this assumption.

We have seen that the application of weighted $l_{2\delta}$-norms gives a quite simple proof under rather weak and transparent assumptions. However, while the results are qualitatively appealing and transparent, their quantitative significance is doubtful. This is related to the general conservativeness of global input-output stability theory applied to non-linear systems.

Bibliography

Adams� R� A� �� Sobolev spaces� Adademic Press� New York�

Akaike� H� �� 3Fitting autoregressive models for prediction�� Ann� Inst� Stat�Math� ��

Akaike� H� �� 3A new look at the statistical model identi�cation�� IEEE Trans�Automatic Control ��

Aksenova� T� I� �� 3Su�cient covergence conditions for external criteria formodel selection�� Soviet J� Automation and Information Sciences ��

Albus� J� �� Theoretical and experimental aspects of a cerebellar model� PhDthesis� University of Maryland�

Aoyama� A� and Venkatasubramanian� V� �� Integrating neural networks with�rst�principles knowledge for bioreactor modeling and control� Paper ��i�� Annual AIChE Meeting� November� St� Louis�

Bailey� J� E� and Ollis� D� F� �� Biochemical Engineering Fundamentals�McGraw�Hill� Singapore�

Balchen� J� G�� Ljungquist� D� and Strand� S� �� 3State�space predictive con�trol�� Chemical Engineering Science ��

Barron� A� R� �� 3Universal approximation bounds for superpositions of asigmoidal function�� IEEE Trans� Information Theory ��

Bastin� G� and Dochain� D� �� Non linear adaptive control algorithms forfermentation processes� in 3Proc� American Control Conference� Atlanta��pp� ��

Bellman� R� and Dreyfus� S� �� Applied Dynamic Programming� PrincetonUniversity Press�

Bellman� R� E� �� a�� Adaptive Control Processes� Princeton Univ� Press�

Bellman� R� E� �� b�� 3On the approximation of curves by line segment usingdynamic programming�� Comm� Assoc� for Comp�

Benveniste� A�� Juditsky� A�� Delyon� B�� Zhang� Q� and Glorennec� P��Y� �� Wavelets in identi�cation� in 3Preprints ��th IFAC Symp� System Identi�ca�tion� Copenhagen�� Vol� �� pp� ��


Bertero� M�� De Mol� C� and Pike� E� R� �� 3Linear inverse problems withdiscrete data� I� General formulation and singular system analysis�� InverseProblems ��

Bezdek� J� C�� Coray� C�� Gunderson� R� and Watson� J� �� a�� 3Detection andcharacterization of cluster substructure� II� fuzzy c�varities and complex com�binations thereof�� SIAM J� Applied Mathematics � � ��

Bezdek� J� C�� Coray� C�� Gunderson� R� and Watson� J� �� b�� 3Detection andcharacterization of cluster substructure� I� Linear structure� Fuzzy c�lines��SIAM J� Applied Mathematics � � ��

Billings� S� A� and Chen� S� �� 3Extended model set� global data and thresholdmodel identi�cation of severly non�linear systems�� Int� J� Control � � ��

Billings� S� A� and Voon� W� S� G� �� 3Piecewise linear identi�cation of non�linear systems�� Int� J� Control ��

Bishop� C� �� 3Improving the generalization properties of radial basis functionneural networks�� Neural Computation ��

Bohlin� T� �� The fundamentals of modelling and identi�cation� TechnicalReport TRITA�REG� �� Dept� Automatic Control� Royal Institute ofTechnology� Stockholm�

Bohlin� T� and Graebe� S� F� �� Issue in nonlinear stochastic grey�box iden�ti�cation� in 3Preprints IFAC Symposium on System Identi�cation� Copen�hagen�� Vol� �� pp� ��

Box� G� E� P� and Draper� N� R� �� Empirical Model�Building and ResponseSurfaces� John Wiley 4 Sons� New York�

Box� G� E� P� and Hunter� W� G� �� 3The experimental study of physicalmechanismns�� Technometrics ��

Box� G� E� P� and Jenkins� G� M� �� Time series analysis� Forecasting andcontrol� Holden�Day� San Francisco� Ca�

Box� G� E� P� and Youle� P� V� �� 3The exploration and exploitation andresponse surfaces� An example of the link between the �tted surface and thebasic mechansims of the system�� Biometrics ��

Breiman� L� �� 3Hinging hyperplanes for regression� classi�cation� and func�tion approximation�� IEEE Trans� Information Theory ��

Breiman� L� and Meisel� W� S� �� 3General estimates of the intrinsic variabilityof data in nonlinear regression models�� J� American Statistical Association��

Breiman� L�� Friedman� J� H�� Olshen� R� A� and Stone� C� J� �� Classi�cationand Regression Trees� Wadsworths 4 Brooks� Monterey� Ca�

Broomhead� D� S� and Lowe� D� �� 3Multivariable functional interpolationand adaptive networks�� Complex Systems ��


Brown� R� H�� Ruchti� T� L� and Feng� X� �� Arti�cial neural network identi��cation of partially known dynamic nonlinear systems� in 3Proc� ��nd IEEEConf� Decision and Control� San Antonio� TX�� pp� ��

Byrnes� C� I� and Isidori� A� �� A frequency domain philosophy for nonlinearsystems� in 3Proceedings of the � � IEEE Conf� Decision and Control� LasVegas� Nevada�� pp� ��

Carlin� M�� Kavli� T� and Lillekjendlie� B� �� 3A comparison of four meth�ods for non�linear data modeling�� Chemometrics and Intelligent LaboratorySystems ��

Carlstein� E� �� Resampling techniques for stationary time�series� Some re�cent developments� in D� Brillinger et al�� ed�� 3New Direction in Time SeriesAnalysis� Part I�� Springer�Verlag� New York� NY� pp� ��

Chalmers� A� F� �� What is this thing called science� �nd Ed�� Open Univer�sity Press� Buckingham� UK�

Chen� F��C� and Khalil� H� K� �� Adaptive control of nonlinear systems usingneural networks � a dead�zone approach� in 3Proc� American Control Confer�ence� Boston�� pp� ��

Chen� S� and Billings� S� A� �� 3Representation of non�linear systems� TheNARMAX model�� Int� J� Control ��

Chen� S�� Billings� S� A� and Grant� P� M� �� a�� 3Non�linear system identi�cationusing neural networks�� Int� J� Control ��

Chen� S�� Billings� S� A�� Cowan� C� F� N� and Grant� P� �� b�� 3Practical iden�ti�cation of NARMAX models using radial basis functions�� Int� J� Control��

Craven� P� and Wahba� G� �� 3Smoothing noisy data with spline functions�Estimating the correct degree of smoothing by the method of generalizedcross�validation�� Numerical Math� ��

Cutler� C� R� and Ramaker� B� L� �� Dynamic matrix control � a computercontrol algorithm� in 3AIChE �th National Meeting� Houston��

Cybenko� G� �� 3Approximations by superpositions of a sigmoidal function��Mathematics of Control� Signals and Systems ��

Cybenko� G�� Saarinen� S�� Gray� R�� Wu� Y� and Khrabrov� A� �� On theunreasonable e�ectiveness of memory�based methods� Preprint�

Cyrot�Normand� D� and Mien� H� D� V� �� Non�linear state�a�ne identi�ca�tion methods� Application to electrical power plants� in 3Proc� IFAC Sympo�sium on Automatic Control in Power Generation� Distribution and Protec�tion�� pp� ��

Datta� A� �� 3Robustness of discrete�time adaptive controllers� An input�output approach�� IEEE Trans� Automatic Control ��

De Veaux� R� D�� Psichogios� D� C� and Ungar� L� H� �� A tale of two non�parametric estimation schemes� MARS and neural networks� in 3Proc� �thInt� Conf� Arti�cial Intelligence and Statistics��


Demuth� H� and Beale� M� �� Neural Network Toolbox User�s Guide MAT�LAB� The MathWorks� Inc�

Desoer� C� A� and Vidyasagar� M� �� Feedback Systems� Input�OutputProperies� Academic Press� New York�

Dorofeyuk� A� A�� Kasavin� A� D� and Torgovitsky� I� S� �� Applicationof automatic classi�cation methods to process identi�cation in industry� in3Preprints �nd IFAC Symposium on Identi�cation and Process ParameterIdenti�cation� Prague� Czechoslovakia�� p� ��

Doyle� J� �� 3Guaranteed margins for LQG regulators�� IEEE Trans� Auto�matic Control�

Dyn� N� �� Interpolation and approximation by radial and related functions�in C� K� Chui� L� L� Schumaker and J� D� Ward� eds� 3Approximation TheoryIV�� Vol� �� Academic Press� Inc�� pp� ��

Efron� B� and Tibshirani� R� �� 3Bootstrap methods for standard error� con��dence intervals� and other measures of statistical accuracy�� Statistical Science��

Farmer� J� D� and Sidorowich� J� J� �� 3Predicting chaotic time series�� PhysicalReview Letters ��

Fedorov� V� V� �� Theory of Optimal Experiments� Academic Press� NewYork�

Foss� B� A� and Johansen� T� A� �� Parallel nonlinear decoupling for processcontrol � A NARMAX approach� in 3Preprints IFAC Symposium on AI inReal�Time Control� Delft� Holland�� pp� ��

Foss� B� A� and Johansen� T� A� �� On local and fuzzy modeling� in 3Proceed�ings of the �rd Int� Conf� Industrial Fuzzy Control and Intelligent Systems�Houston� TX�� pp� ��

Foss� B� A�� Johansen� T� A� and S�rensen� A� V� �� Nonlinear predictivecontrol using local models � applied to a batch process� in 3Preprints IFACSymp� Advanced Control of Chemical Processes �ADCHEM�� Kyoto� Japan�also accepted for Control Engineering Practise�� pp� ��

Freedman� D� �� 3On bootstrapping two�stage least�squares estimates in timeseries analysis�� The Annals of Statistics ��

Friedman� J� H� �� 3Multivariable adaptive regression splines �with discus�sion�� The Annals of Statistics ��

Friedman� J� H� and Stuetzle� W� �� 3Projection pursuit regression�� J� Amer�ican Statistical Association ��

Garcia� C� �� Quadratic�dynamic matrix control of nonlinear processes� in3AIChE Annual Meeting� San Francisco��

Garcia� C� E�� Prett� D� M� and Morari� M� �� 3Model predictive control�Theory and practice � A survey�� Automatica ��

Gelb� A�� ed� �� Applied Optimal Estimation� MIT Press� Cambridge�


Ghose� T� K� and Ghosh� P� �� 3Kinetic analysis of gluconic acid productionby pseudomonas ovalis�� J� Applied Chemical Biotechnology ��

Gill� P�� Murray� W� and Wright� M� �� Practical optimization� AcademicPress� Inc�

Girosi� F�� Jones� M� and Poggio� T� �� Priors� stabilizers and basis functions�From regularization to radial� tensor and additive splines� Technical ReportAI Memo �� MIT� Cambridge�

Goldberg� D� E� �� Genetic algorithms in search� optimization and machinelearning� Addison�Wesley�

Goodwin� G� C�� Ramadge� P� J� and Caines� P� E� �� 3Discrete�time multi�variable adaptive control�� IEEE Trans� Automatic Control ��

Grace� A� �� Optimization Toolbox User�s Guide MATLAB� The Math�Works� Inc�

Granger� C� W� J� and Andersen� A� P� �� 3On the invertibility of time seriesmodels�� Stochastic Processes and their Application ��

Gustafsson� T� K� and Waller� K� V� �� 3Dynamic modeling and reactioninvariant control of pH�� Chemical Engineering Science ��

Haber� R�� Vajk� I� and Keviczky� L� �� Nonlinear system identi�cation by�linear� systems having signal�dependent parameters� in 3Preprints �th IFACSymp� on Identi�cation and System Parameter Identi�cation� WashingtonD�C�� pp� ��

Hall� R� C� and Seborg� D� E� �� Modelling and self�tuning control of a multi�variable pH neutralization process� Part I� Modelling and multiloop control�in 3Proc� American Control Conference� Pittsburgh�� Vol� �� pp� ��

Hastie� T� J� and Tibshirani� R� J� �� Generalized Additive Models� Chapman4 Hall� London�

Hathaway� R� J� and Bezdek� J� C� �� 3Switching regression models and fuzzyclustering�� IEEE Trans� Fuzzy Systems ��

Hengjie� Z�� Jianzhong� L�� Shuqing� W� and Jicheng� W� �� Nonlinear feed�back control of a fed�batch spriamycin fermentation process� in 3Proc� DY�CORD � � Maastricht�� pp� ��

Hilaly� A�� Karim� M� and Linden� J� �� 3Use of an extended Kalman �lterand development of an automated system for xylose fermentation by recom�binantescherichia coli�� J� Industrial Microbiology ��

Hilhorst� R� A� �� Supervisory Control of Mode�Switch Processes� PhD thesis�University of Twente � Electrical Engineering Department�

Hjalmarsson� H� �� Aspects on Incomplete Modeling in System Identi�cation�PhD thesis� University of Link�oping�

Huber� P� J� �� 3Projection pursuit �with discussion�� The Annals of Statistics��


Impe� J� V�� Nicola� B�� Vanrolleghem� P�� Spriet� J�� Moor� B� D� and Vande�walle� J� �� 3Optimal control of the Pencillin G fed�batch fermentation��Optimal control appl� and meth� ��

Ioannou, P. and Datta, A. �� 'Robust adaptive control: A unified approach', Proceedings of the IEEE �� Also in Foundations of Adaptive Control, P. V. Kokotović, Ed., Springer-Verlag, Berlin, � ��

Isidori� A� �� Nonlinear Control Systems� �nd Ed�� Springer Verlag� Berlin�

Ivakhnenko, A. G. and Yurachkovsky, Y. P. �� System structure identification by sets of observation data on the base of unbiasedness principles, in 'Preprints IFAC Symposium on Identification and System Parameter Estimation, Beijing', pp. ��

Jacobs� R� A�� Jordan� M� I�� Nowlan� S� J� and Hinton� G� E� �� 3Adaptivemixtures of local experts�� Neural Computation ��

Janssen� P�� Stoica� P�� S�oderstr�om� T� and Eykho�� P� �� Cross�validationideas in model structure selection for multivariate systems� in 3Preprints thIFAC�IFORS symposium on Identi�cation and System Parameter Estima�tion� Beijing� Aug� �� pp� ��

Jian� C� �� 3A predicting system based on combining an adaptive predictorand a knowledge base as applied to a blast furnace�� Journal of Forecasting��

Joerding, W. H. and Meador, J. L. �� 'Encoding a priori information in feedforward networks', Neural Networks ��

Johansen� T� A� �� a�� Adaptive control of MIMO nonlinear systems using localARX models and interpolation� in 3Preprints IFAC Symp� Advanced Controlof Chemical Processes �ADCHEM�� Kyoto� Japan�� pp� ��

Johansen� T� A� �� b�� 3Fuzzy model based control� Stability� robustness� andperformance issues�� IEEE Trans� Fuzzy Systems ��

Johansen� T� A� �� c�� Identi�cation of non�linear systems using empirical dataand prior knowledge � an optimization approach� Submitted to Automatica�

Johansen� T� A� �� d�� Weighted l��norms for analysis of an adaptive control loopbased on a non�linear model� Submitted to IEEE Trans� Automatic Control�

Johansen� T� A� �� On the optimality of the Takagi�Sugeno�Kang fuzzy infer�ence mechanism� Accepted for the �th IEEE Conf� on Fuzzy Systems� Yoko�hama� Japan�

Johansen� T� A� and Foss� B� A� �� a�� 3A NARMAX model representation foradaptive control based on local models��Modeling� Identi�cation� and Control��

Johansen� T� A� and Foss� B� A� �� b�� Nonlinear local model representationfor adaptive systems� in 3Proc� IEEE Int� Conf� on Intelligent Control andInstrumentation� Singapore�� Vol� �� pp� ��

Johansen� T� A� and Foss� B� A� �� c�� Representing and learning unmodeleddynamics with neural network memories� in 3Proceedings of the AmericanControl Conference� Chicago� Il�� pp� ��

Johansen� T� A� and Foss� B� A� �� a�� 3Constructing NARMAX models usingARMAX models�� Int� J� Control ��

Johansen, T. A. and Foss, B. A. �� b�� State-space modeling using operating regime decomposition and local models, in 'Preprints ��th IFAC World Congress, Sydney, Australia, � �� July. Extended paper in Technical Report ��W, Department of Engineering Cybernetics, Norwegian Institute of Technology, Trondheim', Vol. �, pp. ��

Johansen� T� A� and Foss� B� A� �� a�� A dynamic modeling framework based onlocal models and interpolation � combining empirical and mechanistic knowl�edge and data� Submitted to Computers and Chemical Engineering�

Johansen� T� A� and Foss� B� A� �� b�� Identi�cation of non�linear system struc�ture and parameters using regime decomposition� in 3Preprints IFAC Sympo�sium on System Identi�cation� Copenhagen �also accepted for Automatica��Vol� �� pp� ��

Johansen� T� A� and Foss� B� A� �� Empirical modeling of a heat transferprocess using local models and interpolation� Submitted to the � � AmericanControl Conference� Seattle� Wa�

Johansen� T� A� and Ioannou� P� A� �� Robust adaptive control of minimumphase nonlinear systems� Submitted to Int� J� Adaptive Control and SignalProcessing�

Johansen� T� A� and Weyer� E� �� Model structure identi�cation using sep�arate validation data � asymptotic properties� Submitted to the EuropeanControl Conference� Rome�

Johnson� A� �� 3The control of fed�batch fermentation processes � A survey��Automatica ��

Jones� R� D� and co�workers �� Nonlinear adaptive networks� A little theory�a few applications� Technical Report �� Los Alamos National Lab�� NM�

Jones� R� D�� Lee� Y� C�� Barnes� C� W�� Flake� G� W�� Lee� K�� Lewis� P� S�and Qian� S� �� Function approximation and time series prediction withneural networks� Technical Report �� Los Alamos National Lab�� NM�

Jordan� M� I� and Jacobs� R� A� �� Hierarchical mixtures of experts andthe EM algorithm� Technical Report �� MIT Computational CognitiveScience�

J�orgensen� S� B� and Jensen� N� �� Dynamics and control of chemical reactors� selectively surveyed� in 3Preprints DYCORD � Maastricht�� pp� ��

Kasavin� A� D� �� 3Adaptive piecewise approximation algorithms in the iden�ti�cation problem�� Automation and Remote Control ��

Kashyap, R. L. �� 'A Bayesian comparison of different classes of dynamic models using empirical data', IEEE Trans. Automatic Control � ��

Kavli� T� �� Nonuniformly partitioned piecewise linear representation of con�tinuous learned mappings� in 3Proceedings of IEEE Int� Workshop on Intelli�gent Motion Control� Istanbul�� pp� ��

Kavli� T� �� 3ASMOD �An algorithm for adaptive spline modelling of obser�vation data�� Int� J� Control ��

Kavli, T. and Weyer, E. �� ASMOD - Some theoretical and experimental results, in K. Hunt, G. Irwin and K. Warwick, eds, 'Advances in Neural Networks for Control Systems', Springer-Verlag, Berlin.

Keulers� M� �� Identi�cation and Control of a Fed�Batch Process� Dr� thesis�Technical University� Eindhoven�

Kirkpatrick� S�� Gelatt Jr�� C� D� and Vecchi� M� P� �� 3Optimization bysimulated annealing�� Science ��

Kolmogoroff, A. �� 'Sulla teoria di Volterra della lotta per l'esistenza', G. Istit. Ital. Degli Attuari ��

Konstantinov� K� and Yoshida� T� �� 3Physiological state control of fermen�tation processes�� Biotechnology and Bioengineering ��

Kortmann� M� and Unbehauen� H� �� Two algorithms for model structuredetermination of nonlinear dynamic systems with applications to industrialprocesses� in 3Preprints th IFAC�IFORS symposium on Identi�cation andSystem Parameter Estimation� Beijing� Aug� �� pp� � � ��

Kramer, M. A., Thompson, M. L. and Bhagat, P. M. �� Embedding theoretical models in neural networks, in 'Proc. American Control Conference, Chicago, Il.', pp. ��

Kreisselmeier� G� and Anderson� B� D� O� �� 3Robust model reference adaptivecontrol�� IEEE Trans� Automatic Control ��

Kreyszig� E� �� Introductory Functional Analysis with Applications� Krieger�Malabar� FL�

Lane, S. H., Handelman, D. A. and Gelfand, J. J. �� 'Theory and development of higher-order CMAC neural networks', IEEE Control Systems Magazine ��

Larsen� J� �� A generalization error estimate for nonlinear systems� in 3Proc�IEEE Workshop on Neural Networks for Signal Processing� Piscataway� NJ��pp� � ��

Larsen� J� �� Generalization performance of regularized neural network mod�els� in 3Proc� IEEE Workshop on Neural Networks for Signal Processing�Ermioni� Greece��

Leontaritis� I� J� and Billings� S� A� �� 3Input�output parametric models fornon�linear systems�� Int� J� Control ��

Lim� H� and Lee� K� �� Control of bioreactor systems� in H��J� Rehm andG� Reed� eds� 3Biotechnology� Measuring� Modeling� and Control�� Vol� ��VCH� Weinheim�

Lindskog, P. and Ljung, L. �� Tools for semi-physical modeling, in 'Preprints IFAC Symposium on System Identification, Copenhagen', Vol. �, pp. ��

Ljung, L. �� 'Convergence analysis of parametric identification methods', IEEE Trans. Automatic Control ��

Ljung� L� �� System Identi�cation� Theory for the User� Prentice�Hall� Inc��Englewood Cli�s� NJ�

Ljung� L� �� 3System identi�cation in a Modeling� Identi�cation� and Controlperspective�� Modeling� Identi�cation� and Control ��

Ljung� L�� Sj�oberg� J� and McKelvey� T� �� On the use of regularization insystem identi�cation� in 3Preprints ��th IFAC World Congress� Sydney��

Luenberger� D� G� �� Optimization by Vector Space Methods� John Wiley�

Luenberger� D� G� �� Introduction to Linear and Nonlinear Programming��nd Ed�� Addison�Wesley� Inc�� Reading� MA�

Madych� W� R� and Nelson� S� A� �� 3Multivariate interpolation and condi�tionally positive de�nite functions� II��Mathematics of Computation ��

Mallows� C� L� �� 3Some comments on cp�� Technometrics ��

Marquardt� W� �� Trends in computer�aided process modeling� in 3Proc� Pro�cess Systems Engineering� Seoul� South Korea�� pp� ��

Mavrovouniotis� M� L� and Chang� S� �� 3Hierarchical neural networks�� Comp�Chem� Engr� ��

N�s� T� �� 3Multivariate calibration when data are split into subsets�� J�Chemometrics � ��

N�s� T� and Isaksson� T� �� 3Splitting of calibration data by clustering anal�ysis�� J� Chemometrics � � ��

Monaco� S� and Normand�Cyrot� D� �� Minimum�phase nonlinear discrete�time systems and feedback stabilization� in 3Proc� IEEE Conf� Decision andControl� Los Angeles� Ca�� pp� � ��

Moody� J� and Darken� C� J� �� 3Fast learning in networks of locally�tunedprocessing units�� Neural Computation ��

Mosca, E. �� System identification by reproducing kernel Hilbert space methods, in 'Preprints �nd IFAC Symposium on Identification and Process Parameter Estimation, Prague', p. Paper ��

Murray-Smith, R. �� A fractal basis function neural net for modeling, in 'Proceedings Int. Conference on Automation, Robotics, and Computer Vision, Singapore', pp. NW�� NW��

Murray�Smith� R� �� a�� A Local Model Network Approach to Nonlinear Mod�elling� PhD thesis� The University of Strathclyde� Glasgow�

Murray-Smith, R. �� b�� Local model networks and local learning, in 'Fuzzy Duisburg', pp. ��

Murray-Smith, R. and Gollee, H. �� A constructive learning algorithm for local model networks, in 'Proceedings of the IEEE Workshop on Computer-Intensive Methods in Control and Signal Processing, Prague, Czech Republic', pp. ��

Nakamori� Y� and Ryoke� M� �� 3Identi�cation of fuzzy prediction modelsthrough hyperellipsoidal clustering�� IEEE Trans� Systems� Man� and Cyber�netics ��

Nakamori, Y., Suzuki, K. and Yamanaka, T. �� A new design of a fuzzy model predictive control system for nonlinear processes, in T. Terano, M. Sugeno, M. Mukaidono and K. Shigemasu, eds, 'Fuzzy Engineering Toward Human Friendly Systems', IOS Press, Amsterdam, pp. � �

Narendra� K� S� and Annaswamy� A� M� �� Stable Adaptive Systems� Prentice�Hall� Englewood Cli�s� NJ�

Nashed� M� Z�� ed� �� Generalized Inverses and Applications� Academic Press�New York�

Nguyen� D� H� and Widrow� B� �� 3Neural networks for self�learning controlsystems�� IEEE Control Systems Magazine � � ��

Nijmeijer� H� and van der Schaft� A� J� �� Nonlinear Dynamical Control Sys�tems� Springer�Verlag� New York�

Omohundro� S� M� �� 3E�cient algorithms with neural network behavior�� J�Complex Systems ��

Opoitsev� V� I� �� 3Identi�cation of static plants by means of piecewise linearfunctions�� Automation and Remote Control ��

O�Sullivan� F� �� 3A statistical perspective on ill�posed inverse problems��Statistical Science ��

Park� J� and Sandberg� I� W� �� 3Approximation and radial�basis�functionnetworks�� Neural Computation � ��

Parzen, E. �� 'On estimation of a probability density function and mode', Ann. Math. Stat. ��

Pearl� J� �� Heuristics� Intelligent Search Strategies for Computer ProblemSolving� Addison�Wesley�

Peterson, T., Hernandez, E., Arkun, Y. and Schork, F. �� Nonlinear predictive control of a semi batch polymerization reactor by extended DMC, in 'Proc. American Control Conference, Pittsburgh', pp. ��

Pollard� D� �� Convergence of Stochastic Processes� Springer�Verlag� NewYork�

Polycarpou, M. M. and Ioannou, P. A. �� Identification and control of non-linear systems using neural network models: Design and stability analysis, Technical Report ��, Dept. Electrical Engineering - Systems, University of Southern California, Los Angeles.

Polycarpou� M� M�� Ioannou� P� A� and Ahmed�Zaid� F� �� Neural networksand on�line approximators for discrete�time nonlinear system identi�cation�Preprint�

Pomerleau� Y�� Perrier� M� and Dochain� D� �� Adaptive nonlinear control ofthe bakers� yeast fed�batch fermentation� in 3Proc� American Control Confer�ence� Pittsburgh�� pp� ��

Pottmann� M�� Unbehauen� H� and Seborg� D� E� �� 3Application of a generalmulti�model approach for identi�cation of highly non�linear processes � A casestudy�� Int� J� Control ��

Powell, M. J. D. �� Radial basis function approximations to polynomials, in '��th Biennial Numerical Analysis Conference, Dundee', pp. ��

Praly� L� �� Robust model reference adaptive controllers� Part I� Stabilityanalysis� in 3Proc� IEEE Conf� Decision and Control� Las Vegas� Nevada��pp� ��

Press� W� H�� Flannery� B� P�� Teukolsky� S� A� and Vetterling� W� T� �� Nu�merical Recipes in C� The Art of Scienti�c Computing� Cambridge UniversityPress�

Priestley� M� B� �� Non�linear and Non�stationary Time Series Analysis� Aca�demic Press� London�

Proll� T� and Karim� M� �� 3Real�time design of an adaptive nonlinear pre�dictive controller�� Int� J� Control ��

Psichogios� D� C� and Ungar� L� H� �� 3A hybrid neural network � �rst prin�ciples approach to process modeling�� AIChE J� ��

Rai, V. R. and Constantinides, A. �� 'Mathematical modeling and optimization of the gluconic acid fermentation', AIChE Symposium Series ��

Rajbman, N. S., Dorofeyuk, A. A. and Kasavin, A. D. �� Identification of nonlinear processes by piecewise approximation, in P. Eykhoff, ed., 'Trends and Progress in System Identification', Pergamon Press, Oxford, pp. ��

Rawlings, J. B., Meadows, E. S. and Muske, K. R. �� Nonlinear model predictive control: A tutorial and survey, in 'Preprints IFAC Symposium ADCHEM, Kyoto, Japan', pp. ��

Rescigno� A� and Richardson� I� W� �� The deterministic theory of populationdynamics� in R� Rosen� ed�� 3Foundations of Mathematical Biology�� Vol� ��Academic Press� New York� NY� pp� ��

Rippin, D. W. T. �� Control of batch processes, in 'Proceedings DYCORD � August, Maastricht, The Netherlands', pp. ��

Rissanen� J� �� 3Modeling by shortest data description�� Automatica ��

Rissanen, J. �� Consistent order-estimates of autoregressive processes by shortest description of data, in O. L. R. Jacobs, M. H. A. Davis, M. A. H. Dempster, C. J. Harris and P. C. Parks, eds, 'Analysis and Optimization of Stochastic Systems', Academic Press, London, pp. ��

Rudin� W� �� Real and Complex Analysis� McGraw�Hill� New York�

Rugh� W� J� �� 3Analytical framework for gain scheduling�� IEEE ControlSystems Magazine ��

Sabatier, P. C., ed. �� Inverse Problems: An Interdisciplinary Study, Academic Press, London.

Sanger� T� D� �� 3A tree�structured adaptive network for function approxi�mation in high�dimensional spaces�� IEEE Transactions on Neural Networks��

Sargantanis� J�� Karim� M�� Murphy� V� and Ryoo� D� �� 3E�ect of operatingconditions on solid substrate fermentation�� Biotech� and Bioengr� ��

Sastry� S� S� and Isidori� A� �� 3Adaptive control of linearizable systems��IEEE Trans� Automatic Control ��

Sbarbaro� D� �� Context sensitive networks for modelling nonlinear dynamicsystems� Preprint�

Schwartz� C� A� and Mareels� I� M� Y� �� 3Comments on �Adaptive control oflinearizable systems�� IEEE Trans� Automatic Control ��

Scott� G� M� and Ray� W� H� �� 3Creating e�cient nonlinear neural networkprocess models that allow model interpretation�� J� Process Control ��

Shamma� J� S� and Athans� M� �� 3Analysis of gain scheduled control fornonlinear plants�� IEEE Trans� Automatic Control ��

Shibata, R. �� 'Selection of the order of an autoregressive model by Akaike's information criterion', Biometrika ��

Shorten� R� and Murray�Smith� R� �� On normalising radial basis functionnetworks� in 3Proc� Irish Neural Network Conference� Dublin��

Silverman� B� W� �� Density Estimation for statistics and data analysis�Chapman 4 Hall� London�

Sj�oberg� J� and Ljung� L� �� Overtraining� regularization� and searching forminimum in neural networks� in 3Preprints IFAC Symposium on AdaptiveSystems in Control and Signal Processing� Grenoble� France�� pp� ��

Sj�oberg� J�� Hjalmarsson� H� and Ljung� L� �� Neural networks in systemidenti�cation� in 3Preprints ��th IFAC Symp� System Identi�cation� Copen�hagen�� Vol� �� pp� � ��

Skeppstedt� A�� Ljung� L� and Millnert� M� �� 3Construction of compositemodels from observed data�� Int� J� Control � ��

S�rheim� E� �� 3A combined network architecture using ART� and back prop�agation for adaptive estimation of dynamical processes�� Modeling� Identi��cation and Control ��

S�oderman� U�� Top� J� and Str�omberg� J��E� �� The conceptual side of modeswitching� in 3Proc� IEEE Conf� Systems� Man� and Cybernetics� Le Touquet�France�� pp� ��

S�oderstr�om� T� and Stoica� P� �� System Identi�cation� Prentice Hall� Engle�wood Cli�s� NJ�

Stepashko� V� S� �� 3Asymptotic properties of external criteria for modelselection�� Soviet J� Automation and Information Sciences ��

Stoica� P�� Eykho�� P�� Janssen� P� and S�oderstr�om� T� �� 3Model�structureselection by cross�validation�� Int� J� Control ��

Stokbro� K� and Umberger� D� K� �� Forecasting with weighted maps� in3Proc� � � Workshop on Nonlinear Modeling and Forecasting� Santa Fe In�stitute��

Stokbro� K�� Hertz� J� A� and Umberger� D� K� �� 3Exploiting neurons withlocalized receptive �elds to learn chaos�� J� Complex Systems ��

Stone� M� �� 3Cross�validatory choice and assessment of statistical predic�tions�� J� Royal Statistical Soc� B ��

Str�omberg� J��E�� Gustafsson� F� and Ljung� L� �� Trees as black�box modelstructures for dynamical systems� in 3Proc� European Control Conference�Grenoble�� pp� ��

Su� H��T�� Bhat� N� and McAvoy� T� J� �� Integrated neural networks with�rst principles models for dynamic modeling� In Preprints IFAC DYCORD � �� College Park� Maryland�

Sugeno� M� and Kang� G� T� �� 3Fuzzy modelling and control of multilayerincinerator�� Fuzzy Sets and Systems ��

Sugeno� M� and Kang� G� T� �� 3Structure identi�cation of fuzzy model��Fuzzy Sets and Systems ��

Takagi� T� and Sugeno� M� �� 3Fuzzy identi�cation of systems and its applica�tion to modeling and control�� IEEE Trans� Systems� Man� and Cybernetics��

Tanaka� Y�� Fukushima� M� and Ibaraki� T� �� 3A comparative study of sev�eral semi�in�nite nonlinear programming algorithms�� European Journal ofOperational Research ��

Telnes� K� �� Computer Aided Modeling of Dynamic Processes based onElementary Physics� PhD thesis� The Norwegian Institute of Technology�

Thompson� M� L� and Kramer� M� A� �� 3Modeling chemical processes usingprior knowledge and neural networks�� AIChE J� � � ��

Tikhonov� A� N� and Arsenin� V� Y� �� Solutions of Ill�posed problems� Win�ston� Washington DC�

Tong� H� and Lim� K� S� �� 3Threshold autoregression� limit cycles and cyclicaldata�� J� Royal Stat� Soc� B ��

Tsakalis� K� S� �� 3Robustness of model reference adaptive controllers� Input�Output properties�� IEEE Trans� Automatic Control ��

Tulleken, H. J. A. F. �� 'Grey-box modelling and identification using physical knowledge and Bayesian techniques', Automatica ��

Vapnik� V� �� Estimation of Dependences based on Empirical Data� Springer�Verlag� New York�

Vogler� H� S� �� Elements of Chemical Reaction Engineering� Prentice�Hall�Englewood Cli�s� NJ�

Volterra, V. �� Leçons sur la théorie mathématique de la lutte pour la vie, Gauthier-Villars, Paris.

Wahba� G� �� Spline Models for Observational Data� SIAM� Philadelphia�

Wang� L��X� and Mendel� J� M� �� 3Fuzzy basis functions� universal approxi�mation� and orthogonal least�squares learning�� IEEE Trans� Neural Networks��

Weigend� A� S�� Huberman� B� A� and Rumelhart� D� E� �� 3Predicting thefuture� A connectionist approach�� International Journal of Neural Systems��

Weyer� E�� Williamson� R� C� and Mareels� I� M� Y� �� System identi�cationin the behavioral framework� Part II� Analysis� Submitted for publication�

Yager� R� R� and Filev� D� P� �� 3Uni�ed structure and parameter identi�ca�tion of fuzzy models�� IEEE Trans� Systems� Man� and Cybernetics ��

Ydstie� B� E� �� 3Transient performance and robustness of direct adaptivecontrol�� IEEE Trans� Automatic Control ��

Yoshinari� Y�� Pedrycz� W� and Hirota� K� �� 3Construction of fuzzy modelsthrough clustering techniques�� Fuzzy Sets and Systems ��

Zadeh� L� A� �� 3Fuzzy sets�� Information and Control ��

Zames� G� �� 3On the input�output stability of time�varying nonlinear feed�back systems� Part I� Conditions derived using concepts of loop gain� conicity�and positivity�� IEEE Trans� Automatic Control ��

Zhang� X��C�� Visala� A�� Halme� A� and Linko� P� �� 3Functional state mod�elling approach for bioprocesses� Local models for aerobic yeast growth pro�cesses�� J� Process Control ��
