Computers chem. Engng, Vol. 21, Suppl., pp. S1161-S1166, 1997
PSE '97-ESCAPE-7 Joint Conference
© 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain
PII: S0098-1354(97)00206-8    0098-1354/97 $17.00+0.00
Systems Modelling using Genetic Programming
MARK WILLIS*, HUGO HIDEN*, MARK HINCHLIFFE*, BEN McKAY* and GEOFFREY W. BARTON +
*Symbolic Optimisation Research Group (SORG), Dept. of Chemical and Process Engineering, University of Newcastle upon Tyne, NE1 7RU, UK. {mark.willis, h.g.hiden, m.p.hinchliffe, ben.mckay}@ncl.ac.uk. Ph: +44 191 222 7242, Fax: +44 191 222 5292. http://lorien.ncl.ac.uk/~sorg
+Dept. of Chemical Engineering University of Sydney NSW 2006 Australia [email protected] Ph: +61 2 351 2455 Fax: +61 2 351 2854
Abstract: In this contribution, a Genetic Programming (GP) algorithm is used to develop empirical models of chemical process systems. GP performs symbolic regression, determining both the structure and the complexity of a model. Initially, steady-state model development using a GP algorithm is considered; the methodology is then extended to the development of dynamic input-output models. The usefulness of the technique is demonstrated by the development of inferential estimation models for two typical processes: a vacuum distillation column and a twin screw cooking extruder.
INTRODUCTION
Genetic Programming (GP) began as an attempt to
discover how computers could learn to solve
problems without being explicitly programmed to do
so. The GP technique is an evolutionary algorithm
that bears a strong resemblance to genetic algorithms
(GAs), but uses a more flexible, tree-based coding of
candidate solutions, which allows the algorithm to
perform structural optimisation. Koza
(1992 and 1994) is largely responsible for the
popularisation of GP within the field of computer
science. His GP algorithm (coded in LISP) has been
applied to a wide range of problems including
symbolic regression, control, robotics, games and
classification. Iba et al. (1994) introduced a GP
algorithm called STROGANOFF (STructured
Representation On Genetic Algorithms for Non-linear
Function Fitting). The algorithm
integrated multiple regression analysis and a GA-
based search strategy. The function set was limited to
quadratic polynomials in two variables. The
effectiveness of STROGANOFF was demonstrated by
solving several system identification problems.
Howard and Oakley (1995) used Koza's LISP code
(Koza, 1992) to demonstrate that GP can be used to
perform time series prediction of chaotic systems.
Applications to chemical process systems
have included the generation of non-linear dynamic
models of biotechnological batch and fed-batch
fermentations (Bettenhausen and Marenbach, 1995;
Bettenhausen et al. , 1995; Marenbach et al. , 1996),
the identification of complex fluid flow patterns
(Watson and Parmee, 1996) and the generation of
steady-state input-output models of a range of
industrial chemical process systems (McKay et al.,
1996a,b and 1997). Comparisons with established
modelling paradigms (such as neural networks) are
also available (McKay et al., 1996a). Thus, it has
been demonstrated that a GP algorithm can provide
automatic model input selection, structure selection
and parameter optimisation. While GP provides a
novel approach to process modelling, the paradigm
does not exploit model structures accepted in basic
numerical analysis / approximation theory. If such an
exploitation were possible, then structural constraints
would reduce the search space, increasing the
efficiency of the methodology while simultaneously
improving the approximation capability of the
paradigm.
In the majority of the literature on
approximation theory, a function f(u) is approximated
by a combination of functions (Ralston and
Rabinowitz, 1978). Typically, given u (a set of input
data) and {φ_j(u)}, j = 1, …, n (a sequence of functions),
the objective is to approximate f(u) by a linear
combination of the {φ_j(u)}, j = 1, …, n, i.e.
f(u) = \sum_{j=1}^{n} a_j \phi_j(u)    (1)
The linear coefficients (aj) are estimated using least
squares. However, ill-conditioning problems can
occur if orthogonal basis functions are not used.
Polynomials are a popular choice for orthogonal basis
functions, as it is relatively straightforward to prove
convergence and completeness of the functional
approximation. However, there is no reason why the
sequence of functions {φ_j(u)}, j = 1, …, n, cannot be
constructed from an arbitrary set of primitives such
as {exp, sqrt, log, +, -, *, /}. The use of the latter set,
as opposed to, for example, standard polynomial
approximation, helps ensure that the final model
has a parsimonious structure.
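To make the role of Eqn (1) concrete, the following sketch (in Python with NumPy, and with an illustrative hand-picked basis rather than an evolved one) estimates the linear coefficients a_j by batch least squares:

```python
import numpy as np

# Illustrative basis functions phi_j(u) built from the primitive set
# {exp, sqrt, log, +, -, *, /}; in the GP setting these would be evolved.
phi = [
    lambda u: np.exp(-u),
    lambda u: np.sqrt(np.abs(u)),
    lambda u: u * np.log1p(np.abs(u)),
]

def fit_linear_combination(u, y, basis):
    """Estimate the coefficients a_j of Eqn (1), f(u) = sum_j a_j*phi_j(u),
    by batch least squares."""
    Phi = np.column_stack([b(u) for b in basis])  # one column per phi_j
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a

# Synthetic data that this basis can represent exactly
u = np.linspace(0.1, 2.0, 50)
y = 0.5 * np.exp(-u) + 1.5 * np.sqrt(u)
a = fit_linear_combination(u, y, phi)
y_hat = sum(aj * b(u) for aj, b in zip(a, phi))
print("a =", np.round(a, 3))
```

Because the model is linear in the a_j once the basis is fixed, the structural search (over the basis) and the parameter estimation can be cleanly separated.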
In a recent contribution (Hinchliffe et al.,
1996) a GP algorithm was proposed to evolve the
functional sequence {φ_j(u)}, j = 1, …, n, while the
parameters a_j in Eqn (1) were obtained using batch
least squares. It was demonstrated that the structure
could greatly improve the GP algorithm's
performance, both in terms of speed of solution and
accuracy of the final model. More recently, Frohlich
and Hafner (1996) suggested a similar structure,
however, in their work an attempt was made to evolve
an orthogonal sequence of functions. In this
contribution we extend the work presented in
Hinchliffe et al (1996) and Frohlich and Hafner
(1996) by consideration of dynamic systems
modelling.
After introducing the fundamentals of GP, a
steady-state model is developed for a binary
distillation column. GP is then used to develop a
dynamic model of an industrial food extruder. The
dynamic model obtained is compared to that obtained
using a feedforward artificial neural network.
GENETIC PROGRAMMING
GP belongs to a class of probabilistic search
procedures known as Evolutionary Algorithms (others
include Genetic Algorithms, Evolution Strategies and
Evolutionary Programming). These techniques use
computational models of natural evolutionary
processes for the development of computer based
problem solving systems. All evolutionary algorithms
function by simulating the evolution of individual
structures via processes of reproductive variation and
fitness based selection. The techniques have become
extremely popular due to their success at searching
complex non-linear spaces and their robustness in
practical applications.
GP was originally proposed by Koza (1992)
in an attempt to automatically generate computer
programs. While similar to Genetic Algorithms, the
main contribution of the GP paradigm is that it is
formulated in such way as to encourage evolutionary
algorithms to be applied to a range of new problem
domains. In particular, GP readily lends itself to
symbolic regression problems. A significant benefit
over existing techniques is the automation of structure
and input selection via a search in the topological
(structural) space; i.e., given a set of input data U =
{u1, u2, …, un} and a pre-defined library of functions,
e.g. L = {sqrt, exp, log, +, -, *, /}, a GP algorithm will
automatically discover a functional relationship.
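As a sketch of what such a search manipulates, candidate models are commonly held as expression trees whose internal nodes come from L and whose leaves come from U. The fragment below (the function arities and the 0.3 leaf probability are illustrative choices, not taken from the paper) grows one random tree and prints it:

```python
import random

FUNCTIONS = {'+': 2, '-': 2, '*': 2, 'exp': 1}  # a subset of the library L
TERMINALS = ['u1', 'u2', 'u3']                  # the input set U

def random_tree(depth=3, rng=random):
    """Grow a random expression tree: internal nodes come from L,
    leaves from U. The 0.3 leaf probability is an arbitrary choice."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op,) + tuple(random_tree(depth - 1, rng)
                         for _ in range(FUNCTIONS[op]))

def to_infix(node):
    """Render a tree as a readable expression string."""
    if isinstance(node, str):
        return node
    op, *args = node
    if len(args) == 1:
        return f"{op}({to_infix(args[0])})"
    return f"({to_infix(args[0])} {op} {to_infix(args[1])})"

print(to_infix(random_tree(rng=random.Random(0))))
```

Crossover and mutation then operate directly on subtrees of such structures, which is what gives GP its topological (structural) search capability.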
ALGORITHM DETAILS
A GP algorithm works on a population of individuals,
each of which represents a potential solution to a
problem. A flowchart of a typical GP algorithm is
shown in Fig. 1. Initially, a random population of
individuals is generated. Next, the fitness of each
individual is determined. Fitness is a numeric value
assigned to each member of a population to provide a
measure of the appropriateness of a solution and it is
used as the basis for selecting members of the
population for reproduction. Having selected
candidate members of the population, three basic
genetic operators: direct reproduction, crossover and
mutation are applied.
The direct reproduction operator simply
copies a member from the parent population to the
next generation. The genetic operation of crossover
takes two members of the population and combines
them to create new offspring, while mutation makes
random alterations to the proposed model structure.
These genetic and selection operators guide the
evolution of the population towards a pre-defined
goal; in this case, the selection of an empirical model
that gives the lowest squared prediction error.
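The generational loop just described can be sketched as follows; the tournament selection scheme and the operator probabilities are assumptions for illustration, since the text does not fix them:

```python
import random

def evolve(population, fitness, crossover, mutate, n_generations=50,
           p_crossover=0.7, p_mutation=0.1, rng=random):
    """Skeleton of the loop in Fig. 1: evaluate fitness, select parents,
    then apply crossover, mutation or direct reproduction."""
    for _ in range(n_generations):
        scores = [fitness(ind) for ind in population]

        def select():  # tournament of two (an assumed scheme)
            i = rng.randrange(len(population))
            j = rng.randrange(len(population))
            return population[i] if scores[i] >= scores[j] else population[j]

        next_gen = []
        while len(next_gen) < len(population):
            r = rng.random()
            if r < p_crossover:
                next_gen.append(crossover(select(), select()))
            elif r < p_crossover + p_mutation:
                next_gen.append(mutate(select()))
            else:
                next_gen.append(select())  # direct reproduction
        population = next_gen
    return max(population, key=fitness)

# Toy usage: "individuals" are numbers and the fitness peaks at 10.
random.seed(42)
best = evolve(
    population=[random.uniform(-20, 20) for _ in range(30)],
    fitness=lambda x: -(x - 10) ** 2,
    crossover=lambda a, b: (a + b) / 2,
    mutate=lambda a: a + random.gauss(0, 1),
)
print(round(best, 2))
```

In the GP setting the individuals would be expression trees and the three callables would be the tree-level operators described above.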
Algorithmic details are discussed at length in
Hinchliffe et al (1996), McKay et al (1997) and Willis
et al (1997). The main features that distinguish our
algorithm from standard approaches are:
• a standard optimisation routine is used to obtain
the regression constants of the model. It is known
that GP is good at evolving symbolic expressions
but not as good at evolving numeric expressions.
• the capacity to model process dynamics is
included by using a time history of the process
input variables, rather than recursive nodes. This is
analogous to the models used as the basis for
chemical process control techniques such as
Dynamic Matrix Control (DMC).
• crossover is adjusted, incorporating a synthesis of
ideas from both GAs and standard GP (see Willis
et al, 1997 for further details).
• mutation is allowed to evolve an optimal time-
history of the input transformations. In other
words, the algorithm automatically determines the
length of the time history associated with each
function group.
The two most important features of the algorithm are
the model structure and the fitness (cost) function
used. These are discussed in more detail below.
Fig. 1 Typical genetic programming algorithm flowsheet (generate initial population of size N; evaluate model fitness; apply selection and the genetic operators).

MODEL STRUCTURE

In the majority of the GP literature, U and L are
combined in an arbitrary fashion, i.e. no structural
constraints are imposed in the topological domain. It
is our contention that this reduces the efficiency of
the GP methodology. Therefore, in our work (and
following the standard numerical approximation
literature) we restrict the mapping function to be
either

y(t) = f(u) = \sum_{j=1}^{n} a_j \phi_j(u)    (2)

that is, a static mapping, or, for the case of a dynamic
model,

y(t) = f(u) = \sum_{j=1}^{n} \phi_j(a_j, z^{-1})    (3)

where \phi_j(u_k, z^{-1}), k = 1, …, m, is a sequence of
functions which, once evolved by the GP algorithm,
are re-defined as polynomials in the back-shift
operator (z^{-1}). This allows the development of
models capable of capturing process dynamics. Here,
a_j is a vector of parameters identified by batch least
squares, and m is the length of the time history, which
is normally of the order of four process time constants
and is adjusted during evolution.

The sequence of functions \phi_j(·), j = 1, …, n,
is evolved by combination of the input set U and the
function set L. A particular problem with this
methodology arises if the input data are correlated:
batch least squares suffers from ill-conditioning. This
can be exacerbated by the GP algorithm during
evolution; if identical (or near identical) functions are
generated, i.e. \phi_1(·) = \phi_2(·), correlation is
introduced into the data set operated on by the least
squares algorithm. To overcome these problems the
objective function must be modified to target the dual
goals of prediction error minimisation and orthogonal
(non-correlated) function generation.
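One way to read Eqn (3) is that each evolved function group contributes a set of lagged regressors whose coefficient vector a_j is then found by batch least squares. A minimal sketch (the time history m is fixed here, whereas the GP algorithm adjusts it during evolution, and the "evolved" transformations are plain arrays):

```python
import numpy as np

def lagged_design(x, m):
    """Columns [x(t-1), ..., x(t-m)]: one function group expanded as a
    polynomial in the back-shift operator z^-1 with time history m."""
    n = len(x)
    return np.column_stack([x[m - k: n - k] for k in range(1, m + 1)])

def fit_dynamic_model(groups, y, m):
    """Estimate the coefficient vectors a_j of Eqn (3) by batch least
    squares, given already-evolved input transformations (here supplied
    directly as arrays; the GP algorithm would evolve them)."""
    X = np.hstack([lagged_design(g, m) for g in groups])
    a, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return a

# Synthetic example: the output depends on two lags of a single input.
rng = np.random.default_rng(0)
u = rng.normal(size=200)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.6 * u[t - 1] + 0.3 * u[t - 2]
a = fit_dynamic_model([u], y, m=4)
print(np.round(a, 3))  # only the first two lags carry weight
```

Because the lagged inputs enter linearly, this avoids recursive nodes while still capturing process dynamics, in the spirit of the DMC-style models mentioned earlier.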
MULTIPLE OBJECTIVE COST FUNCTION
The fitness function is a linear weighted sum of a
scaled inverse root mean square (RMS) error and the
correlation measure. The error (e) obtained by fitting
Eqn(2) or Eqn(3) using least squares decreases for
better solutions. However, in GP it is more natural to
think of fitness values that are increasing during the
evolutionary process. As such, the following scaled
inverse RMS error was used:
f = E{y}/(E{y} + RMS)    (4)

where E{·} is the expectation operator. This gives
fitness values between 0 and 1, with higher values
corresponding to better model performance. The
standard correlation coefficient (γ) may be used to
determine the degree of similarity between functional
groups; values range between 0, if the groups are
not correlated, and 1, if there is an exact
correspondence. The correlation must be determined
between all sub-groups evolved by the GP algorithm
and the average used in the cost function:

\bar{\gamma} = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma_{i,j}}{\sum_{i=1}^{n} (n - i)}    (5)

where n is the number of functional groups and \bar{\gamma} is
merely a weighted average. This allows the definition
of the following fitness measure:

F = \alpha f + (1 - \alpha)(1 - \bar{\gamma})    (6)
The minimisation of the RMS error is set as the
highest priority, with the incorporation of the
correlation measure ensuring the integrity of the
models produced using the GP algorithm.
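The composite measure can be sketched as below; note that the denominator of Eqn (5) is just the number of group pairs, so the "weighted average" reduces to a mean over pairs, and taking absolute correlations is an assumption made here:

```python
import numpy as np

def scaled_inverse_rms(y, y_hat):
    """Eqn (4): f = E{y} / (E{y} + RMS); assumes E{y} > 0, as for the
    composition data here. Values lie in (0, 1], larger is better."""
    rms = np.sqrt(np.mean((y - y_hat) ** 2))
    return np.mean(y) / (np.mean(y) + rms)

def mean_correlation(groups):
    """Eqn (5): average pairwise correlation between function groups;
    the absolute value is an assumption to treat anti-correlation as
    similarity."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(np.corrcoef(groups[i], groups[j])[0, 1])
               for i, j in pairs) / len(pairs)

def fitness(y, y_hat, groups, alpha=0.9):
    """Eqn (6): F = alpha*f + (1 - alpha)*(1 - gamma_bar)."""
    return (alpha * scaled_inverse_rms(y, y_hat)
            + (1 - alpha) * (1 - mean_correlation(groups)))

y = np.array([1.0, 2.0, 3.0, 4.0])
groups = [y, y[::-1]]            # perfectly (anti-)correlated groups
print(round(fitness(y, y, groups), 3))  # → 0.9
```

With α close to 1 the error term dominates, matching the stated priority of RMS minimisation over de-correlation.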
CROSS VALIDATION
A GP algorithm runs for a specified number of
generations progressively altering model structure
with the objective of improving the fitness of the
solution. However, the fitness is based on the training
data set. It is well known that models should be
verified on a second validation data set. To perform
this cross validation the verification RMS is also
monitored. Generally, this RMS error will decrease in
a similar fashion to that of the training data. However,
there will be a point where the validation error
reaches a minimum and subsequent models produce
worse validation data RMS error. It is at this point
that over-training has occurred and the model chosen
should be that with the minimum RMS error on the
validation data set.
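This rule reduces to keeping the model from the generation at which the validation RMS is minimal; a sketch with illustrative (not measured) error histories:

```python
def select_best_generation(val_rms):
    """Pick the generation with the lowest validation RMS, i.e. the
    model kept before over-training sets in."""
    best = min(range(len(val_rms)), key=lambda g: val_rms[g])
    return best, val_rms[best]

# Illustrative histories: training error keeps falling while the
# validation error turns upward once over-training begins.
train_rms = [0.50, 0.31, 0.20, 0.14, 0.10, 0.08]
val_rms   = [0.55, 0.36, 0.25, 0.22, 0.27, 0.33]
gen, err = select_best_generation(val_rms)
print(gen, err)  # → 3 0.22
```

The training history is shown only to illustrate the divergence between the two curves; the selection itself uses the validation errors alone.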
BINARY DISTILLATION COLUMN
The vacuum column considered is equipped with 48
trays, a steam-heated reboiler and a total condenser
(see Fig. 2). The control objective is to maintain the
quality of the bottom product stream. While
composition analysers may be used to measure
product quality, investment and maintenance costs
often restrict their use. Thus, a reliable inferential
estimator can provide a cheap alternative to direct
measurement and allow tighter control of product
composition.
The column design specifications and operating data
are summarised in McKay et al (1997). Since the
system is essentially binary, compositions can be
estimated based on temperature and pressure
measurements.
However, the necessary sensors are only installed on
trays 12, 27 and 42. As a result, reliable composition
estimates can only be obtained for these three trays.
The objective of this study was to develop a model
that could be used to infer the bottom product
composition given the available measurements. The
control system on this column ensures that it operates
close to steady-state, thus allowing the dynamics
between the tray and the product compositions to be
neglected.

Figure 2: Vacuum distillation column (feed; distillate D, xD; boil-up and reboiler; bottom product B, xB).
Input-output data were obtained from a
mechanistic model of the column, comprising several
hundred differential and algebraic equations
describing the column dynamics (Gani et al., 1986).
To develop an estimator, 150 records of steady-state
composition estimates from trays 12 (x12), 27 (x27)
and 42 (x42), together with the corresponding value
of xB, were used. A second set of 50 data records was
used for model validation.
The GP algorithm was run for 50 generations
with a population size of 25 and α = 0.9. In other
words, preference was given to minimisation of the
RMS error while simultaneously attempting to reduce
correlation between sub-groups in order to obtain a
robust model. The best model obtained using our
technique was:
x_B = 0.48[exp(x12 exp(x42 x12)) x42^3 x12^2]
+ 0.13x27 + 0.52    (7)
This model had an RMS error on the verification data
set of 0.0189. Figure (3a) shows the actual and
predicted bottom composition for the validation data
set using Eqn (7), while Figure (3b) shows the fit
obtained using the following linear model:

x_B = 0.02x12 + 0.56x42 - 0.18x27 + 0.06    (8)

which was obtained using batch least squares and had
an RMS error of 0.33.
Fig. 3. Linear and non-linear estimator (column data).
Table 1 presents the correlation between the original
data columns. It may be noted that x12 is highly
correlated with x27, while there is also a strong
correlation between x27 and x42. This strong
correlation, coupled with the inherent process
non-linearity, explains the poor performance of the
linear estimator. For the non-linear model, the
correlation between the sub-groups is 0.24. The GP
algorithm has taken a non-linear combination of x12
and x42 and regressed this against a function of x27,
producing an accurate model. In other words, the GP
algorithm has automatically discovered an accurate
empirical model.
Table 1: Correlation between the original data columns

       x12    x42    x27
x12    1.00   0.26   0.80
x42    0.26   1.00   0.53
x27    0.80   0.53   1.00
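Correlation structure of this kind is straightforward to verify; the sketch below uses synthetic stand-ins constructed to mimic the strong x12-x27 dependence (the numbers are illustrative, not the plant records):

```python
import numpy as np

# Synthetic stand-ins for the 150 tray-composition records; x27 is
# constructed to correlate strongly with x12, mimicking Table 1.
rng = np.random.default_rng(1)
x12 = rng.normal(size=150)
x42 = rng.normal(size=150)
x27 = 0.8 * x12 + 0.2 * rng.normal(size=150)

R = np.corrcoef(np.vstack([x12, x42, x27]))
print(np.round(R, 2))  # symmetric, ones on the diagonal
```

A matrix like this is what motivates the de-correlation term in the fitness function: any two function groups built mostly from x12 and x27 will themselves be nearly collinear.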
TWIN SCREW COOKING EXTRUDER
Cooking extrusion has attracted considerable attention
from the food industry in recent years, as it provides
an efficient means for the continuous processing of a
wide range of foodstuffs. Cooking extruders are
already well established in the production of snack
foods, baby foods, breakfast cereals and pastas.
Extruders allow a short residence time and high
temperature cooking conditions, which result in a high
nutritional value of the product and low processing
costs.
A typical extruder consists of a barrel, inside
which one or more helical screws rotate to convey the
feed material towards a die at one end, as illustrated in
Fig. (4). The section nearest the feed point is referred
to as the solids conveying zone (SCZ), where the
screw channels are only partially filled and there is no
pressure build-up.
At some point along the extruder, the
channels become completely filled and the
temperature and pressure increase considerably as a
result of viscous heat dissipation and material
transport. This section is referred to as the melt zone
(MZ). Additional temperature control may be
provided by heating or cooling sections along the
barrel.

Fig. 4 Cooking extruder (feed flow QF, feed moisture MF, barrel temperature Tb; solids conveying zone and melt zone).
The process under consideration is a simulation of a
twin screw cooking extruder. As described in Elsey et
al (1996), the parameters of the simulation have been
fitted to plant data obtained from a pilot scale APV
Baker MPF 40 cooking extruder operated by the
CSIRO Division of Food Science, Australia for the
processing of corn flour. For modelling purposes, the
process inputs were the screw speed (N), feed rate
(QF), feed moisture content (MF) and the feed
temperature (TF). While there appears to be little
agreement over what quality actually means for
extruded food products, there are a number of product
attributes that clearly contribute to any measure of
product quality. One of these, the degree of
gelatinisation (g) of the extruded corn starch, has been
selected as the process output of interest in this study.
The dynamic models were developed using a
trajectory of 600 data points, with the first 450 used
for training and the remainder for verification.
Variable dead times were identified and removed by
lagging the data prior to presentation to the modelling
algorithms. A locally recurrent network was trained
using a Levenberg-Marquardt optimisation algorithm
and 1 hidden layer containing 4 neurons was found to
give the best performance. The dynamic GP algorithm
was run for 100 generations with a population size of
30. The best model obtained is summarised in Table
2.
Fig. (5) shows the actual and predicted
gelatinisation for the validation data set using the
model in Table 2 as well as that obtained using a
neural network. The RMS error for the GP
algorithm's model was 0.045 while the neural network
gave an RMS of 0.095.
Table 2: The best model obtained using the GP algorithm.

Function     Time history
N            1-15
N(N-1)       1-18
6.3Q + QN    1-17
This is an improvement, however, the main advantage
of the GP algorithm is that it has automatically
eliminated irrelevant model inputs as well as
automatically regressed an appropriate model offering
a degree of parsimony not attainable using a neural
network.
Fig. 5. GP and ANN estimator (extruder data).
CONCLUSIONS
The application of GP to the development of
empirical models of chemical process systems has
been considered. Two examples were used to
highlight the utility of this approach. The results
revealed that in each case the GP algorithm can
generate an accurate model based solely on observed
data. A distinct advantage of this method is that no
a priori assumptions have to be made about the actual
model form: the structure and complexity of the
model evolve as part of the problem solution. Indeed,
these preliminary results indicate the potential of GP
for developing models of non-linear process systems:
the methodology can perform both input recognition
and model form recognition. However, the important
issue of exactly where GP fits within the 'toolbox' of
techniques available has not been addressed; this is
the focus of our current work. The methodology is
being compared to other non-linear black-box
modelling approaches in an attempt to establish GP as
a technique for empirical modelling.
REFERENCES

Bettenhausen, K.D. and Marenbach, P. (1995) 'Self-organising modelling of biotechnological batch and fed-batch fermentations', Proc. EUROSIM'95.

Bettenhausen, K.D., Marenbach, P., Freyer, S., Rettenmaier, H. and Nieken, U. (1995) 'Self-organising structured modelling of a biotechnological fed-batch fermentation by means of genetic programming', Proc. IEE Conf. on Genetic Algorithms in Engng. Systems: Innovations and Applications - GALESIA'95, Sheffield, UK, No. 414, pp. 481-486.

Elsey, J., Riepenhausen, J., McKay, B. and Barton, G.W. (1996) 'Dynamic modelling of a cooking extruder', Proc. Chemeca'96, Sydney, Australia.

Frohlich, J. and Hafner, C. (1996) 'Extended and Generalised Genetic Programming for Function Analysis', available on WWW.

Gani, R.C., Ruiz, A. and Cameron, I.T. (1986) 'A Generalised Model for Distillation Columns - I', Comp. Chem. Eng., 10, pp. 181-198.

Hinchliffe, M.P., Willis, M.J., Hiden, H.G., Tham, M.T., McKay, B. and Barton, G. (1996) 'Modelling Chemical Process Systems using a Multi-Gene Genetic Programming Algorithm', Late Breaking Paper, GP'96, Stanford, USA.

Howard, E. and Oakley, N. (1995) 'Genetic programming as a means of assessing and reflecting chaos', Tech. Report FS-95-01, AAAI Press, pp. 68-72.

Iba, H., de Garis, H. and Sato, T. (1994) 'Genetic programming using local hill-climbing', in Davidor, Y., Schwefel, H.P. and Manner, R. (Eds), Parallel Problem Solving from Nature - PPSN III, Vol. 866 of Lecture Notes in Computer Science, Springer-Verlag, Germany, pp. 302-311.

Koza, J. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, USA.

Koza, J. (1994) Genetic Programming II: Automatic Discovery of Reusable Programs, The MIT Press, USA.

Marenbach, P., Bettenhausen, K.D. and Freyer, S. (1996) 'Signal path oriented approach for generation of dynamic process models', Proc. 1st Annual Conf. on Genetic Programming, Stanford University, USA, pp. 327-332.

McKay, B., Lennox, B., Willis, M.J., Montague, G. and Barton, G.W. (1996a) 'Extruder modelling: A comparison of two paradigms', Proc. UKACC Int. Conf. on Control - CONTROL'96, Exeter, UK, Vol. 2, pp. 734-739.

McKay, B., Elsey, J., Willis, M.J. and Barton, G.W. (1996b) 'Evolving input-output models of chemical process systems using genetic programming', Proc. IFAC 13th World Congress '96, San Francisco, USA, Vol. M, pp. 277-282.

McKay, B., Willis, M.J. and Barton, G. (1997) 'Steady-state modelling of chemical process systems using genetic programming', accepted for publication in Comp. and Chem. Eng.

Ralston, A. and Rabinowitz, P. (1978) A First Course in Numerical Analysis, McGraw-Hill, Inc.

Watson, A.H. and Parmee, I.C. (1996) 'Identification of fluid systems using genetic programming', Proc. Second Online Workshop on Evolutionary Computation - WEC2, pp. 45-48.

Willis, M.J., Hiden, H.G. and Montague, G.A. (1997) 'Developing inferential estimation algorithms using Genetic Programming', Banff, Canada.