Computers chem. Engng, Vol. 21, Suppl., pp. S1161-S1166, 1997
PSE '97-ESCAPE-7 Joint Conference
© 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain
PII: S0098-1354(97)00206-8    0098-1354/97 $17.00+0.00
Systems Modelling using Genetic Programming
MARK WILLIS*, HUGO HIDEN*, MARK HINCHLIFFE*, BEN McKAY* and GEOFFREY W. BARTON +
*Symbolic Optimisation Research Group (SORG), Dept. of Chemical and Process Engineering, University of Newcastle upon Tyne, NE1 7RU, UK. {mark.willis, h.g.hiden, m.p.hinchliffe, ben.mckay}@ncl.ac.uk. Ph: +44 191 222 7242, Fax: +44 191 222 5292. http://lorien.ncl.ac.uk/~sorg
+Dept. of Chemical Engineering University of Sydney NSW 2006 Australia [email protected] Ph: +61 2 351 2455 Fax: +61 2 351 2854
Abstract: In this contribution, a Genetic Programming (GP) algorithm is used to develop empirical models of chemical process systems. GP performs symbolic regression, determining both the structure and the complexity of a model. Initially, steady-state model development using a GP algorithm is considered; the methodology is then extended to the development of dynamic input-output models. The usefulness of the technique is demonstrated by the development of inferential estimation models for two typical processes: a vacuum distillation column and a twin screw cooking extruder.
INTRODUCTION
Genetic Programming (GP) began as an attempt to
discover how computers could learn to solve
problems without being explicitly programmed to do
so. The GP technique is an evolutionary algorithm
that bears a strong resemblance to genetic algorithms
(GAs), but uses a more flexible, tree-based coding of
candidate solutions, which allows the algorithm to
perform structural optimisation. Koza
(1992 and 1994) is largely responsible for the
popularisation of GP within the field of computer
science. His GP algorithm (coded in LISP) has been
applied to a wide range of problems including
symbolic regression, control, robotics, games and
classification. Iba et al. (1994) introduced a GP
algorithm called STROGANOFF (STructured
Representation On Genetic Algorithms for Non-linear
Function Fitting). The algorithm
integrated multiple regression analysis and a GA-
based search strategy. The function set was limited to
quadratic polynomials in two variables. The
effectiveness of STROGANOFF was demonstrated by
solving several system identification problems.
Howard and Oakley (1995) used Koza's LISP code
(Koza, 1992) to demonstrate that GP can be used to
perform time series prediction of chaotic systems.
Applications to chemical process systems
have included the generation of non-linear dynamic
models of biotechnological batch and fed-batch
fermentations (Bettenhausen and Marenbach, 1995;
Bettenhausen et al. , 1995; Marenbach et al. , 1996),
the identification of complex fluid flow patterns
(Watson and Parmee, 1996) and the generation of
steady-state input-output models of a range of
industrial chemical process systems (McKay et al.,
1996a,b and 1997). Comparisons with established
modelling paradigms (such as neural networks) are
also available (McKay et al., 1996a). Thus, it has
been demonstrated that a GP algorithm can provide
automatic model input selection, structure selection
and parameter optimisation. While GP provides a
novel approach to process modelling, the paradigm
does not exploit model structures accepted in basic
numerical analysis / approximation theory. If such an
exploitation were possible, then structural constraints
would reduce the search space, increasing the
efficiency of the methodology while simultaneously
improving the approximation capability of the
paradigm.
In the majority of the literature on
approximation theory, a function f(u) is approximated
by a combination of functions (Ralston and
Rabinowitz, 1978). Typically, given u (a set of input
data) and {φ_j(u)}, j = 1, …, n (a sequence of functions),
the objective is to approximate f(u) by a linear
combination of the {φ_j(u)}, j = 1, …, n, i.e.
f(u) = \sum_{j=1}^{n} a_j \phi_j(u)    (1)
The linear coefficients (aj) are estimated using least
squares. However, ill-conditioning problems can
occur if orthogonal basis functions are not used.
Polynomials are a popular choice for orthogonal basis
functions, as it is relatively straightforward to prove
convergence and completeness of the functional
approximation. However, there is no reason why the
sequence of functions {φ_j(u)}, j = 1, …, n, cannot be
constructed from an arbitrary set of primitives such
as {exp, sqrt, log, +, -, *, /}. The use of the latter set,
as opposed to, for example, standard polynomial
approximation, helps ensure that the final model
has a parsimonious structure.
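To make the role of Eqn (1) concrete, the following sketch (in Python with NumPy, and with an illustrative hand-picked basis rather than an evolved one) estimates the linear coefficients a_j by batch least squares:

```python
import numpy as np

# Illustrative basis functions phi_j(u) built from the primitive set
# {exp, sqrt, log, +, -, *, /}; in the GP setting these would be evolved.
phi = [
    lambda u: np.exp(-u),
    lambda u: np.sqrt(np.abs(u)),
    lambda u: u * np.log1p(np.abs(u)),
]

def fit_linear_combination(u, y, basis):
    """Estimate the coefficients a_j of Eqn (1), f(u) = sum_j a_j*phi_j(u),
    by batch least squares."""
    Phi = np.column_stack([b(u) for b in basis])  # one column per phi_j
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return a

# Synthetic data that this basis can represent exactly
u = np.linspace(0.1, 2.0, 50)
y = 0.5 * np.exp(-u) + 1.5 * np.sqrt(u)
a = fit_linear_combination(u, y, phi)
y_hat = sum(aj * b(u) for aj, b in zip(a, phi))
print("a =", np.round(a, 3))
```

Because the model is linear in the a_j once the basis is fixed, the structural search (over the basis) and the parameter estimation can be cleanly separated.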
In a recent contribution (Hinchliffe et al.,
1996) a GP algorithm was proposed to evolve the
functional sequence {φ_j(u)}, j = 1, …, n, while the
parameters a_j in Eqn (1) were obtained using batch
least squares. It was demonstrated that the structure
could greatly improve the GP algorithm's
performance, both in terms of speed of solution and
accuracy of the final model. More recently, Frohlich
and Hafner (1996) suggested a similar structure,
however, in their work an attempt was made to evolve
an orthogonal sequence of functions. In this
contribution we extend the work presented in
Hinchliffe et al (1996) and Frohlich and Hafner
(1996) by consideration of dynamic systems
modelling.
After introducing the fundamentals of GP, a
steady-state model is developed for a binary
distillation column. GP is then used to develop a
dynamic model of an industrial food extruder. The
dynamic model obtained is compared to that obtained
using a feedforward artificial neural network.
GENETIC PROGRAMMING
GP belongs to a class of probabilistic search
procedures known as Evolutionary Algorithms (others
include Genetic Algorithms, Evolution Strategies and
Evolutionary Programming). These techniques use
computational models of natural evolutionary
processes for the development of computer based
problem solving systems. All evolutionary algorithms
function by simulating the evolution of individual
structures via processes of reproductive variation and
fitness based selection. The techniques have become
extremely popular due to their success at searching
complex non-linear spaces and their robustness in
practical applications.
GP was originally proposed by Koza (1992)
in an attempt to automatically generate computer
programs. While similar to Genetic Algorithms, the
main contribution of the GP paradigm is that it is
formulated in such way as to encourage evolutionary
algorithms to be applied to a range of new problem
domains. In particular, GP readily lends itself to
symbolic regression problems. A significant benefit
over existing techniques is the automation of structure
and input selection via a search in the topological
(structural) space; i.e., given a set of input data U =
{u1, u2, …, un} and a pre-defined library of functions,
e.g. L = {sqrt, exp, log, +, -, *, /}, a GP algorithm will
automatically discover a functional relationship.
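As a sketch of what such a search manipulates, candidate models are commonly held as expression trees whose internal nodes come from L and whose leaves come from U. The fragment below (the function arities and the 0.3 leaf probability are illustrative choices, not taken from the paper) grows one random tree and prints it:

```python
import random

FUNCTIONS = {'+': 2, '-': 2, '*': 2, 'exp': 1}  # a subset of the library L
TERMINALS = ['u1', 'u2', 'u3']                  # the input set U

def random_tree(depth=3, rng=random):
    """Grow a random expression tree: internal nodes come from L,
    leaves from U. The 0.3 leaf probability is an arbitrary choice."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op,) + tuple(random_tree(depth - 1, rng)
                         for _ in range(FUNCTIONS[op]))

def to_infix(node):
    """Render a tree as a readable expression string."""
    if isinstance(node, str):
        return node
    op, *args = node
    if len(args) == 1:
        return f"{op}({to_infix(args[0])})"
    return f"({to_infix(args[0])} {op} {to_infix(args[1])})"

print(to_infix(random_tree(rng=random.Random(0))))
```

Crossover and mutation then operate directly on subtrees of such structures, which is what gives GP its topological (structural) search capability.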
ALGORITHM DETAILS
A GP algorithm works on a population of individuals,
each of which represents a potential solution to a
problem. A flowchart of a typical GP algorithm is
shown in Fig. 1. Initially, a random population of
individuals is generated. Next, the fitness of each
individual is determined. Fitness is a numeric value
assigned to each member of a population to provide a
measure of the appropriateness of a solution and it is
used as the basis for selecting members of the
population for reproduction. Having selected
candidate members of the population, three basic
genetic operators: direct reproduction, crossover and
mutation are applied.
The direct reproduction operator simply
copies a member from the parent population to the
next generation. The genetic operation of crossover
takes two members of the population and combines
them to create new offspring, while mutation makes
random alterations to the proposed model structure.
These genetic and selection operators guide the
evolution of the population towards a pre-defined
goal; in this case, the selection of an empirical model
that gives the lowest squared prediction error.
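The generational loop just described can be sketched as follows; the tournament selection scheme and the operator probabilities are assumptions for illustration, since the text does not fix them:

```python
import random

def evolve(population, fitness, crossover, mutate, n_generations=50,
           p_crossover=0.7, p_mutation=0.1, rng=random):
    """Skeleton of the loop in Fig. 1: evaluate fitness, select parents,
    then apply crossover, mutation or direct reproduction."""
    for _ in range(n_generations):
        scores = [fitness(ind) for ind in population]

        def select():  # tournament of two (an assumed scheme)
            i = rng.randrange(len(population))
            j = rng.randrange(len(population))
            return population[i] if scores[i] >= scores[j] else population[j]

        next_gen = []
        while len(next_gen) < len(population):
            r = rng.random()
            if r < p_crossover:
                next_gen.append(crossover(select(), select()))
            elif r < p_crossover + p_mutation:
                next_gen.append(mutate(select()))
            else:
                next_gen.append(select())  # direct reproduction
        population = next_gen
    return max(population, key=fitness)

# Toy usage: "individuals" are numbers and the fitness peaks at 10.
random.seed(42)
best = evolve(
    population=[random.uniform(-20, 20) for _ in range(30)],
    fitness=lambda x: -(x - 10) ** 2,
    crossover=lambda a, b: (a + b) / 2,
    mutate=lambda a: a + random.gauss(0, 1),
)
print(round(best, 2))
```

In the GP setting the individuals would be expression trees and the three callables would be the tree-level operators described above.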
Algorithmic details are discussed at length in
Hinchliffe et al (1996), McKay et al (1997) and Willis
et al (1997). The main features that distinguish our
algorithm from standard approaches are:
• a standard optimisation routine is used to obtain
the regression constants of the model. It is known
that GP is good at evolving symbolic expressions
but not as good at evolving numeric expressions.
• the capacity to model process dynamics is
included by using a time history of the process
input variables, rather than recursive nodes. This is
analogous to the models used as the basis for
chemical process control techniques such as
Dynamic Matrix Control (DMC).
• crossover is adjusted, incorporating a synthesis of
ideas from both GAs and standard GP (see Willis
et al, 1997 for further details).
• mutation is allowed to evolve an optimal time-
history of the input transformations. In other
words, the algorithm automatically determines the
length of the time history associated with each
function group.
The two most important features of the algorithm are
the model structure and the fitness (cost) function
used. These are discussed in more detail below.
Fig. 1 Typical genetic programming algorithm flowsheet (generate initial population of size N; evaluate model fitness; apply selection and the genetic operators).

MODEL STRUCTURE

In the majority of the GP literature, U and L are
combined in an arbitrary fashion, i.e. no structural
constraints are imposed in the topological domain. It
is our contention that this reduces the efficiency of
the GP methodology. Therefore, in our work (and
following the standard numerical approximation
literature) we restrict the mapping function to be
either

y(t) = f(u) = \sum_{j=1}^{n} a_j \phi_j(u)    (2)

that is, a static mapping, or, for the case of a dynamic
model,

y(t) = f(u) = \sum_{j=1}^{n} \phi_j(a_j, z^{-1})    (3)

where \phi_j(u_k, z^{-1}), k = 1, …, m, is a sequence of
functions which, once evolved by the GP algorithm,
are re-defined as polynomials in the back-shift
operator (z^{-1}). This allows the development of
models capable of capturing process dynamics. Here,
a_j is a vector of parameters identified by batch least
squares, and m is the length of the time history, which
is normally of the order of four process time constants
and is adjusted during evolution.

The sequence of functions \phi_j(·), j = 1, …, n,
is evolved by combination of the input set U and the
function set L. A particular problem with this
methodology arises if the input data are correlated:
batch least squares suffers from ill-conditioning. This
can be exacerbated by the GP algorithm during
evolution; if identical (or near identical) functions are
generated, i.e. \phi_1(·) = \phi_2(·), correlation is
introduced into the data set operated on by the least
squares algorithm. To overcome these problems the
objective function must be modified to target the dual
goals of prediction error minimisation and orthogonal
(non-correlated) function generation.
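One way to read Eqn (3) is that each evolved function group contributes a set of lagged regressors whose coefficient vector a_j is then found by batch least squares. A minimal sketch (the time history m is fixed here, whereas the GP algorithm adjusts it during evolution, and the "evolved" transformations are plain arrays):

```python
import numpy as np

def lagged_design(x, m):
    """Columns [x(t-1), ..., x(t-m)]: one function group expanded as a
    polynomial in the back-shift operator z^-1 with time history m."""
    n = len(x)
    return np.column_stack([x[m - k: n - k] for k in range(1, m + 1)])

def fit_dynamic_model(groups, y, m):
    """Estimate the coefficient vectors a_j of Eqn (3) by batch least
    squares, given already-evolved input transformations (here supplied
    directly as arrays; the GP algorithm would evolve them)."""
    X = np.hstack([lagged_design(g, m) for g in groups])
    a, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return a

# Synthetic example: the output depends on two lags of a single input.
rng = np.random.default_rng(0)
u = rng.normal(size=200)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.6 * u[t - 1] + 0.3 * u[t - 2]
a = fit_dynamic_model([u], y, m=4)
print(np.round(a, 3))  # only the first two lags carry weight
```

Because the lagged inputs enter linearly, this avoids recursive nodes while still capturing process dynamics, in the spirit of the DMC-style models mentioned earlier.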
MULTIPLE OBJECTIVE COST FUNCTION
The fitness function is a linear weighted sum of a
scaled inverse root mean square (RMS) error and the
correlation measure. The error (e) obtained by fitting
Eqn(2) or Eqn(3) using least squares decreases for
better solutions. However, in GP it is more natural to
think of fitness values that are increasing during the
evolutionary process. As such, the following scaled
inverse RMS error was used:
f = E{y}/(E{y} + RMS)    (4)

where E{·} is the expectation operator. This gives
fitness values between 0 and 1, with higher values
corresponding to better model performance. The
standard correlation coefficient (γ) may be used to
determine the degree of similarity between functional
groups; values range between 0, if the groups are
not correlated, and 1, if there is an exact
correspondence. The correlation must be determined
between all sub-groups evolved by the GP algorithm
and the average used in the cost function:

\bar{\gamma} = \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma_{i,j}}{\sum_{i=1}^{n} (n - i)}    (5)

where n is the number of functional groups and \bar{\gamma} is
merely a weighted average. This allows the definition
of the following fitness measure:

F = \alpha f + (1 - \alpha)(1 - \bar{\gamma})    (6)
The minimisation of the RMS error is set as the
highest priority, with the incorporation of the
correlation measure ensuring the integrity of the
models produced using the GP algorithm.
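The composite measure can be sketched as below; note that the denominator of Eqn (5) is just the number of group pairs, so the "weighted average" reduces to a mean over pairs, and taking absolute correlations is an assumption made here:

```python
import numpy as np

def scaled_inverse_rms(y, y_hat):
    """Eqn (4): f = E{y} / (E{y} + RMS); assumes E{y} > 0, as for the
    composition data here. Values lie in (0, 1], larger is better."""
    rms = np.sqrt(np.mean((y - y_hat) ** 2))
    return np.mean(y) / (np.mean(y) + rms)

def mean_correlation(groups):
    """Eqn (5): average pairwise correlation between function groups;
    the absolute value is an assumption to treat anti-correlation as
    similarity."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(np.corrcoef(groups[i], groups[j])[0, 1])
               for i, j in pairs) / len(pairs)

def fitness(y, y_hat, groups, alpha=0.9):
    """Eqn (6): F = alpha*f + (1 - alpha)*(1 - gamma_bar)."""
    return (alpha * scaled_inverse_rms(y, y_hat)
            + (1 - alpha) * (1 - mean_correlation(groups)))

y = np.array([1.0, 2.0, 3.0, 4.0])
groups = [y, y[::-1]]            # perfectly (anti-)correlated groups
print(round(fitness(y, y, groups), 3))  # → 0.9
```

With α close to 1 the error term dominates, matching the stated priority of RMS minimisation over de-correlation.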
CROSS VALIDATION
A GP algorithm runs for a specified number of
generations progressively altering model structure
with the objective of improving the fitness of the
solution. However, the fitness is based on the training
data set. It is well known that models should be
verified on a second validation data set. To perform
this cross validation the verification RMS is also
monitored. Generally, this RMS error will decrease in
a similar fashion to that of the training data. However,
there will be a point where the validation error
reaches a minimum and subsequent models produce
worse validation data RMS error. It is at this point
that over-training has occurred and the model chosen
should be that with the minimum RMS error on the
validation data set.
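This rule reduces to keeping the model from the generation at which the validation RMS is minimal; a sketch with illustrative (not measured) error histories:

```python
def select_best_generation(val_rms):
    """Pick the generation with the lowest validation RMS, i.e. the
    model kept before over-training sets in."""
    best = min(range(len(val_rms)), key=lambda g: val_rms[g])
    return best, val_rms[best]

# Illustrative histories: training error keeps falling while the
# validation error turns upward once over-training begins.
train_rms = [0.50, 0.31, 0.20, 0.14, 0.10, 0.08]
val_rms   = [0.55, 0.36, 0.25, 0.22, 0.27, 0.33]
gen, err = select_best_generation(val_rms)
print(gen, err)  # → 3 0.22
```

The training history is shown only to illustrate the divergence between the two curves; the selection itself uses the validation errors alone.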
BINARY DISTILLATION COLUMN
The vacuum column considered is equipped with 48
trays, a steam-heated reboiler and a total condenser
(see Fig. 2). The control objective is to maintain the
quality of the bottom product stream. While
composition analysers may be used to measure
product quality, investment and maintenance costs
often restrict their use. Thus, a reliable inferential
estimator can provide a cheap alternative to direct
measurement and allow tighter control of product
composition.
The column design specifications and operating data
are summarised in McKay et al (1997). Since the
system is essentially binary, compositions can be
estimated based on temperature and pressure
measurements.
However, the necessary sensors are only installed on
trays 12, 27 and 42. As a result, reliable composition
estimates can only be obtained for these three trays.
The objective of this study was to develop a model
that could be used to infer the bottom product
composition given the available measurements. The
control system on this column ensures that it operates
close to steady-state, thus allowing the dynamics
between the tray and the product compositions to be
neglected.

Figure 2: Vacuum distillation column (feed; distillate D, xD; boil-up and reboiler; bottom product B, xB).
Input-output data were obtained from a
mechanistic model of the column, comprising several
hundred differential and algebraic equations
describing the column dynamics (Gani et al., 1986).
To develop an estimator, 150 records of steady-state
composition estimates from trays 12 (x12), 27 (x27)
and 42 (x42), together with the corresponding value
of xB, were used. A second set of 50 data records was
used for model validation.
The GP algorithm was run for 50 generations
with a population size of 25 and α = 0.9. In other
words, preference was given to minimisation of the
RMS error while simultaneously attempting to reduce
correlation between sub-groups in order to obtain a
robust model. The best model obtained using our
technique was:
x_B = 0.48[exp(x12 exp(x42 x12)) x42^3 x12^2]
+ 0.13x27 + 0.52    (7)
This model had an RMS error on the verification data
set of 0.0189. Figure (3a) shows the actual and
predicted bottom composition for the validation data
set using Eqn (7), while Figure (3b) shows the fit
obtained using the following linear model:

x_B = 0.02x12 + 0.56x42 - 0.18x27 + 0.06    (8)

which was obtained using batch least squares and had
an RMS error of 0.33.
Fig. 3. Linear and non-linear estimator (column data).
Table 1 presents the correlation between the original
data columns. It may be noted that x12 is highly
correlated with x27, while there is also a strong
correlation between x27 and x42. This strong
correlation, coupled with the inherent process
non-linearity, explains the poor performance of the
linear estimator. For the non-linear model, the
correlation between the sub-groups is 0.24. The GP
algorithm has taken a non-linear combination of x12
and x42 and regressed this against a function of x27,
producing an accurate model. In other words, the GP
algorithm has automatically discovered an accurate
empirical model.
Table 1: Correlation between the original data columns

       x12    x42    x27
x12    1.00   0.26   0.80
x42    0.26   1.00   0.53
x27    0.80   0.53   1.00
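Correlation structure of this kind is straightforward to verify; the sketch below uses synthetic stand-ins constructed to mimic the strong x12-x27 dependence (the numbers are illustrative, not the plant records):

```python
import numpy as np

# Synthetic stand-ins for the 150 tray-composition records; x27 is
# constructed to correlate strongly with x12, mimicking Table 1.
rng = np.random.default_rng(1)
x12 = rng.normal(size=150)
x42 = rng.normal(size=150)
x27 = 0.8 * x12 + 0.2 * rng.normal(size=150)

R = np.corrcoef(np.vstack([x12, x42, x27]))
print(np.round(R, 2))  # symmetric, ones on the diagonal
```

A matrix like this is what motivates the de-correlation term in the fitness function: any two function groups built mostly from x12 and x27 will themselves be nearly collinear.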
TWIN SCREW COOKING EXTRUDER
Cooking extrusion has attracted considerable attention
from the food industry in recent years, as it provides
an efficient means for the continuous processing of a
wide range of foodstuffs. Cooking extruders are
already well established in the production of snack
foods, baby foods, breakfast cereals and pastas.
Extruders allow a short residence time and high
temperature cooking conditions, which result in a high
nutritional value of the product and low processing
costs.
A typical extruder consists of a barrel, inside
which one or more helical screws rotate to convey the
feed material towards a die at one end, as illustrated in
Fig. (4). The section nearest the feed point is referred
to as the solids conveying zone (SCZ), where the
screw channels are only partially filled and there is no
pressure build-up.
At some point along the extruder, the
channels become completely filled and the
temperature and pressure increase considerably as a
result of viscous heat dissipation and material
transport. This section is referred to as the melt zone
(MZ). Additional temperature control may be
provided by heating or cooling sections along the
barrel.

Fig. 4 Cooking extruder (feed flow QF, feed moisture MF, barrel temperature Tb; solids conveying zone and melt zone).
The process under consideration is a simulation of a
twin screw cooking extruder. As described in Elsey et
al (1996), the parameters of the simulation have been
fitted to plant data obtained from a pilot scale APV
Baker MPF 40 cooking extruder operated by the
CSIRO Division of Food Science, Australia for the
processing of corn flour. For modelling purposes, the
process inputs were the screw speed (N), feed rate
(QF), feed moisture content (MF) and the feed
temperature (TF). While there appears to be little
agreement over what quality actually means for
extruded food products, there are a number of product
attributes that clearly contribute to any measure of
product quality. One of these, the degree of
gelatinisation (g) of the extruded corn starch, has been
selected as the process output of interest in this study.
The dynamic models were developed using a
trajectory of 600 data points, with the first 450 used
for training and the remainder for verification.
Variable dead times were identified and removed by
lagging the data prior to presentation to the modelling
algorithms. A locally recurrent network was trained
using a Levenberg-Marquardt optimisation algorithm
and 1 hidden layer containing 4 neurons was found to
give the best performance. The dynamic GP algorithm
was run for 100 generations with a population size of
30. The best model obtained is summarised in Table
2.
Fig. (5) shows the actual and predicted
gelatinisation for the validation data set using the
model in Table 2 as well as that obtained using a
neural network. The RMS error for the GP
algorithm's model was 0.045 while the neural network
gave an RMS of 0.095.
Table 2: The best model obtained using the GP algorithm.

Function     Time history
N            1-15
N(N-1)       1-18
6.3Q + QN    1-17
This is an improvement, however, the main advantage
of the GP algorithm is that it has automatically
eliminated irrelevant model inputs as well as
automatically regressed an appropriate model offering
a degree of parsimony not attainable using a neural
network.
Fig. 5. GP and ANN estimator (extruder data).
CONCLUSIONS
The application of GP to the development of
empirical models of chemical process systems has
been considered. Two examples were used to
highlight the utility of this approach. The results
revealed that in each case the GP algorithm can
generate an accurate model based solely on observed
data. A distinct advantage of this method is that no
a priori assumptions have to be made about the actual
model form: the structure and complexity of the
model evolve as part of the problem solution. Indeed,
these preliminary results indicate the potential of GP
for developing models of non-linear process systems:
the methodology can perform both input recognition
and model form recognition. However, the important
issue of exactly where GP fits within the 'toolbox' of
techniques available has not been addressed; this is
the focus of our current work. The methodology is
being compared to other non-linear black-box
modelling approaches in an attempt to establish GP as
a technique for empirical modelling.
REFERENCES

Bettenhausen, K.D. and Marenbach, P. (1995) 'Self-organising modelling of biotechnological batch and fed-batch fermentations', Proc. EUROSIM'95.

Bettenhausen, K.D., Marenbach, P., Freyer, S., Rettenmaier, H. and Nieken, U. (1995) 'Self-organising structured modelling of a biotechnological fed-batch fermentation by means of genetic programming', Proc. IEE Conf. on Genetic Algorithms in Engng. Systems: Innovations and Applications - GALESIA'95, Sheffield, UK, No. 414, pp. 481-486.

Elsey, J., Riepenhausen, J., McKay, B. and Barton, G.W. (1996) 'Dynamic modelling of a cooking extruder', Proc. Chemeca'96, Sydney, Australia.

Frohlich, J. and Hafner, C. (1996) 'Extended and Generalised Genetic Programming for Function Analysis', available on WWW.

Gani, R.C., Ruiz, A. and Cameron, I.T. (1986) 'A Generalised Model for Distillation Columns - I', Comp. Chem. Eng., 10, pp. 181-198.

Hinchliffe, M.P., Willis, M.J., Hiden, H.G., Tham, M.T., McKay, B. and Barton, G. (1996) 'Modelling Chemical Process Systems using a Multi-Gene Genetic Programming Algorithm', Late Breaking Paper, GP'96, Stanford, USA.

Howard, E. and Oakley, N. (1995) 'Genetic programming as a means of assessing and reflecting chaos', Tech. Report FS-95-01, AAAI Press, pp. 68-72.

Iba, H., de Garis, H. and Sato, T. (1994) 'Genetic programming using local hill-climbing', in Davidor, Y., Schwefel, H.P. and Manner, R. (Eds), Parallel Problem Solving from Nature - PPSN III, Vol. 866 of Lecture Notes in Computer Science, Springer-Verlag, Germany, pp. 302-311.

Koza, J. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection, The MIT Press, USA.

Koza, J. (1994) Genetic Programming II: Automatic Discovery of Reusable Programs, The MIT Press, USA.

Marenbach, P., Bettenhausen, K.D. and Freyer, S. (1996) 'Signal path oriented approach for generation of dynamic process models', Proc. 1st Annual Conf. on Genetic Programming, Stanford University, USA, pp. 327-332.

McKay, B., Lennox, B., Willis, M.J., Montague, G. and Barton, G.W. (1996a) 'Extruder modelling: A comparison of two paradigms', Proc. UKACC Int. Conf. on Control - CONTROL'96, Exeter, UK, Vol. 2, pp. 734-739.

McKay, B., Elsey, J., Willis, M.J. and Barton, G.W. (1996b) 'Evolving input-output models of chemical process systems using genetic programming', Proc. IFAC 13th World Congress '96, San Francisco, USA, Vol. M, pp. 277-282.

McKay, B., Willis, M.J. and Barton, G. (1997) 'Steady-state modelling of chemical process systems using genetic programming', accepted for publication in Comp. and Chem. Eng.

Ralston, A. and Rabinowitz, P. (1978) A First Course in Numerical Analysis, McGraw-Hill, Inc.

Watson, A.H. and Parmee, I.C. (1996) 'Identification of fluid systems using genetic programming', Proc. Second Online Workshop on Evolutionary Computation - WEC2, pp. 45-48.

Willis, M.J., Hiden, H.G. and Montague, G.A. (1997) 'Developing inferential estimation algorithms using Genetic Programming', Banff, Canada.