8/8/2019 ESCext1
Computers and Chemical Engineering xxx (2005) xxxxxx
Monitoring process transitions by Kalman filtering and time-series segmentation
Balazs Feil, Janos Abonyi, Sandor Nemeth, Peter Arva
University of Veszprem, Department of Process Engineering, P.O. Box 158, H-8201 Veszprem, Hungary
Abstract
The analysis of historical process data of technological systems plays an important role in process monitoring, modelling and control. Time-series segmentation algorithms are often used to detect homogeneous periods of operation based on input-output process data. However,
historical process data alone may not be sufficient for the monitoring of complex processes. This paper incorporates the first-principle model
of the process into the segmentation algorithm. The key idea is to use a model-based non-linear state-estimation algorithm to detect the
changes in the correlation among the state-variables. The homogeneity of the time-series segments is measured using a PCA similarity factor
calculated from the covariance matrices given by the state-estimation algorithm. The whole approach is applied to the monitoring of an
industrial high-density polyethylene plant.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Process monitoring; Time-series segmentation; Non-linear state-estimation; Polyethylene production
1. Introduction
Continuous process plants undergo a number of changes
from one operating mode to another. These process transi-
tions are quite common in the chemical industry. The major
aims of monitoring plant performance at process transitions
are the reduction of off-specification production, the identifi-
cation of important process disturbances and the early warn-
ing of process malfunctions or plant faults (Wang, 1999).
Manual process supervision relies heavily on visual moni-
toring of characteristic process trends. Although humans are
very good at visually detecting such patterns, for a control
system software it is a difficult problem. The first step toward
building an automated decision support system is the intelligent analysis of archive process data (Kivikunnas, 1998;
Vincze, Arva, Abonyi, & Nemeth, 2003; Stephanopoulos &
Han, 1996).
The segmentation of multivariate time-series is especially
important in the data-based analysis and monitoring of mod-
Corresponding author.
E-mail address: [email protected] (J. Abonyi). URL: http://www.fmt.vein.hu/softcomp.
ern production systems, where huge amounts of historical
process data are recorded by distributed control systems (DCS). These data definitely have the potential to provide
information for product and process design, monitoring and
control (Yamashita, 2000). This is especially important in
many practical applications, where first-principles modeling
of complex, data-rich but knowledge-poor systems is
not possible (Zhang, Martin, & Morris, 1997). Hence, knowledge discovery in databases (KDD)
methods have been successfully applied to the analysis of
process systems, and the results have been used in process
design, process improvement, operator training, and so on
(Wang, 1999).
Time-series segmentation is often used to extract internally homogeneous segments from a given time-series to locate stable periods of time, to identify change points, or to
simply compress the original time-series into a more com-
pact representation (Last, Klein, & Kandel, 2000). Although
in many real-life applications a lot of variables must be simultaneously tracked and monitored, most of the segmentation
algorithms are used for the analysis of only one time-variant
variable (Kivikunnas, 1998).
The main problem with this univariate approach is that
in some cases the hidden process, and thus the correlation among
0098-1354/$ see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2005.02.014
the variables, varies in time. In process engineering
systems this phenomenon can occur when a different product
is formed, a different catalyst is applied, or there are
significant process faults, etc. The segmentation of only one
measured variable is not able to detect such changes. Hence,
the segmentation algorithm should be based on multivariate
statistical tools. The aim of this paper is therefore to develop
new algorithms that can handle time-varying multivariate
data and detect changes in the correlation structure among
the variables.
The segmentation algorithms simultaneously determine
the parameters of the models and the borders of the segments
by minimizing the sum of the costs of the individual segments.
Hence, a cost function describing the internal homogeneity
of individual segments should be defined. Usually, this cost
function is based on the distances between the actual values
of the time-series and the values given by a simple func-
tion fitted to the data of each segment (Keogh, Chu, Hart, &
Pazzani, 2001). Hence, time-series segmentation algorithms, such as methods that apply Principal Component Analysis (PCA) and fuzzy clustering algorithms (Nemeth, Abonyi,
Feil, & Arva, 2003), are based on input-output process data.
However, historical process data alone may not be sufficient
for monitoring complex processes. The currently
measured input-output data pairs are often not in a causal
relationship because of the dead time and the dynamical behavior
of the system. In practice, the state-variables are often
not measurable, or are measured only rarely by off-line laboratory
tests. To solve these problems, methods that use delayed
measurements in addition to the current data can be
applied, e.g. the method proposed in Srinivasan, Wang, Ho, and Lim (2004), which is based on
Dynamic Principal Component Analysis.
The main idea of this paper is to apply a non-linear state-estimation
algorithm to detect changes in the estimated state-variables
and in the correlation of their modelling error.
This paper is organized as follows. In Section 2.1, the basic
idea of time-series segmentation and the applied algorithm
are given. Section 2.2 gives an overview of multivariate segmentation
and the measure of internal homogeneity. Section
2.3 proposes three different methods to get information about
the changes of multivariate time-series. These approaches are
compared in a case study based on a real-life application ex-
ample in Section 3. Finally some conclusions are given in
Section 4.
2. State-estimation-based segmentation of historical
process data
2.1. Time-series segmentation
A time-series, T = {x_k | 1 ≤ k ≤ N}, is a finite set of
N samples labelled by time points t_1, ..., t_N, where x_k =
[x_{1,k}, x_{2,k}, ..., x_{n,k}]^T. A segment of T is a set of consecutive
time points, S(a, b) = {k | a ≤ k ≤ b}, i.e. x_a, x_{a+1}, ..., x_b.
The c-segmentation of time-series T is a partition of T into c
non-overlapping segments, S_T^c = {S_i(a_i, b_i) | 1 ≤ i ≤ c}, such
that a_1 = 1, b_c = N and a_i = b_{i-1} + 1. In other words, a c-segmentation
splits T into c disjoint time intervals by segment
boundaries s_1 < s_2 < ... < s_c, where S_i = S(s_{i-1} + 1, s_i).
Usually the goal is to find homogeneous segments from a given time-series. In order to formalize this goal, a cost
function measuring the internal homogeneity of individual segments
should be defined. This cost function can be any arbitrary
function. For example, in (Himberg, Korpiaho, Mannila,
Tikanmaki, & Toivonen, 2001; Vasko & Toivonen, 2002) the
sum of variances of the variables in the segment was used
as cost(S_i(a_i, b_i)):

\mathrm{cost}(S_i(a_i, b_i)) = \frac{1}{b_i - a_i + 1} \sum_{k=a_i}^{b_i} \| x_k - v_i \|^2,    (1)

where v_i is the mean of the segment.
Usually, the cost function, cost(S(a, b)), is defined based on the distances between the actual values of the time-series
and the values given by a simple function (constant or linear
function, or a polynomial of a higher but limited degree)
fitted to the data of each segment. Hence, the segmentation
algorithms simultaneously determine the parameters of the
models and the borders of the segments, a_i, b_i, by minimizing
the sum of the costs of the individual segments:

\mathrm{cost}(S_T^c) = \sum_{i=1}^{c} \mathrm{cost}(S_i).    (2)
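As a concrete illustration, the variance-based cost of Eq. (1) and the total cost of Eq. (2) can be sketched in Python (a minimal sketch; the function and variable names are ours, and segment borders are taken as 0-based, inclusive indices):

```python
import numpy as np

def segment_cost(X, a, b):
    """Eq. (1): mean squared distance of the samples x_a..x_b
    from the segment mean v_i."""
    seg = X[a:b + 1]                      # samples of the segment, shape (b-a+1, n)
    v = seg.mean(axis=0)                  # segment mean v_i
    return np.mean(np.sum((seg - v) ** 2, axis=1))

def total_cost(X, boundaries):
    """Eq. (2): sum of the individual segment costs.
    boundaries = [s_1, ..., s_c], 0-based inclusive right borders,
    with the last entry equal to len(X) - 1."""
    cost, a = 0.0, 0
    for s in boundaries:
        cost += segment_cost(X, a, s)
        a = s + 1
    return cost
```

On a series with a single level shift, for example, a segmentation whose border coincides with the shift yields a lower total cost than one with a misplaced border.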
This cost function can be minimized by dynamic programming, which is computationally intractable for many real
datasets (Himberg et al., 2001). Consequently, heuristic op-
timization techniques such as greedy top-down or bottom-up
techniques are frequently used to find good but suboptimal c-
segmentations (Keogh et al., 2001; Stephanopoulos and Han,
1996):
Sliding window: A segment is grown until it exceeds some
error bound. The process repeats with the next data
point not included in the newly approximated segment.
For example a linear model is fitted on the observed
period and the modelling error is analyzed.
Top-down method: The time-series is recursively partitioned
until some stopping criterion is met.
Bottom-up method: Starting from the finest possible approx-
imation, segments are merged until some stopping cri-
terion is met.
Search for inflection points: Searching for primitive
episodes located between two inflection points.
Among these heuristic approaches the bottom-up algorithm
has been proven to be practically useful. This algorithm begins
by creating a fine approximation of the time-series, and
iteratively merges the lowest-cost pair of segments
Table 1
Bottom-up segmentation algorithm

Create initial fine approximation.
Find the cost of merging for each pair of segments:
    mergecost(i) = cost(S(a_i, b_{i+1}))
while min_i mergecost(i) < maxerror
    Find the cheapest pair to merge: i = argmin_i mergecost(i)
    Merge the two segments, update the boundary indices, a_i, b_i, and recalculate the merge costs:
        mergecost(i) = cost(S(a_i, b_{i+1}))
        mergecost(i - 1) = cost(S(a_{i-1}, b_i))
end
until a stopping criterion is met. The detailed description of
the algorithm can be found in Table 1.
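A compact Python sketch of the procedure in Table 1 (assuming the variance-based cost of Eq. (1); `init_len` and the other names are ours):

```python
import numpy as np

def cost(X, a, b):
    """Eq. (1): mean squared deviation from the segment mean over x_a..x_b."""
    seg = X[a:b + 1]
    return np.mean(np.sum((seg - seg.mean(axis=0)) ** 2, axis=1))

def bottom_up(X, max_error, init_len=2):
    """Table 1: start from a fine approximation and iteratively merge
    the cheapest adjacent pair of segments while the merge cost is small."""
    N = len(X)
    # initial fine approximation: consecutive blocks of init_len samples
    segs = [(a, min(a + init_len - 1, N - 1)) for a in range(0, N, init_len)]
    while len(segs) > 1:
        merge_costs = [cost(X, segs[i][0], segs[i + 1][1])
                       for i in range(len(segs) - 1)]
        i = int(np.argmin(merge_costs))          # cheapest pair to merge
        if merge_costs[i] >= max_error:
            break
        segs[i:i + 2] = [(segs[i][0], segs[i + 1][1])]  # merge, update borders
    return segs
```

On a signal with an abrupt level shift, the surviving segment border tends to coincide with the change point.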
2.2. Covariance-based similarity measure
Time-series segmentation is often used to extract inter-
nally homogeneous segments from a given time-series. Usu-
ally, the cost function describing the internal homogeneity
of the individual segments is defined based on the distances
between the actual values of the time-series and the values
given by a simple univariate function fitted to the data of each
segment.
Due to the hidden nature of the process the measured variables
are correlated. In some cases the hidden process, and thus
the correlation among the variables, varies in time. This phenomenon
can occur at process transitions or when there is a
significant process fault, etc. The segmentation of only one
measured variable is not able to detect such changes. Hence,
the segmentation algorithm should be based on multivariate
statistical tools.
Covariance matrices, P_k, describe the relationship between the variables around the kth data point, and they can
also be used to calculate the cost function based on a covariance
matrix similarity measure:

\mathrm{cost}(S_i(a_i, b_i)) = \frac{1}{b_i - a_i + 1} \sum_{k=a_i}^{b_i} s_{\mathrm{cov}}(P_k, P_{S_i}),    (3)

where P_{S_i} is the covariance matrix of the ith segment with the
borders a_i and b_i, which can be calculated by averaging
the matrices P_k, a_i ≤ k ≤ b_i.
To compare covariance matrices, a PCA similarity factor,
s_cov, developed by Krzanowski (1979) can be applied. Let us
consider the first p eigenvectors of the P_i and P_j covariance matrices, U_{i,p} and U_{j,p}, which can be considered as (n × p) bases of the
subspaces of two PCA models. The similarity between these
subspaces is defined based on the sum of the squares of the
cosines of the angles between each principal component of
U_{i,p} and U_{j,p}:

s_{\mathrm{cov}}(P_i, P_j) = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \cos^2 \theta_{i,j} = \frac{1}{p} \mathrm{trace}(U_{i,p}^T U_{j,p} U_{j,p}^T U_{i,p})    (4)
Because the Ui,p and Uj,p subspaces contain the p most im-
portant principal components that account for most of the
variance of the state-variables at the ith and jth time instants,
scov is also a measure of the similarity between the two co-
variance matrices.
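A direct Python sketch of Eq. (4) (function names are ours; `numpy.linalg.eigh` is used since covariance matrices are symmetric):

```python
import numpy as np

def pca_similarity(P_i, P_j, p):
    """PCA similarity factor of Krzanowski (Eq. (4)): compares the subspaces
    spanned by the first p eigenvectors of two covariance matrices."""
    # eigh returns eigenvalues in ascending order; keep the p largest
    _, V_i = np.linalg.eigh(P_i)
    _, V_j = np.linalg.eigh(P_j)
    U_i = V_i[:, -p:]               # n x p matrix of leading eigenvectors
    U_j = V_j[:, -p:]
    M = U_i.T @ U_j                 # p x p matrix of subspace cosines
    return np.trace(M @ M.T) / p    # (1/p) trace(U_i^T U_j U_j^T U_i)
```

The factor lies in [0, 1]; it equals 1 when the two leading subspaces coincide and 0 when they are orthogonal.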
The similarity of the found segments can be displayed as
a dendrogram. A dendrogram is a tree-shaped map of the similarities
that shows the merging of segments into clusters
at various stages of the analysis. The interpretation of
the results is intuitive, which is the main reason such diagrams
are used to illustrate the results of a hierarchical clustering
(see Fig. 5).
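Assuming SciPy is available, such a dendrogram can be produced from the pairwise segment similarities; converting similarity to distance as 1 - s_cov is our choice here, not prescribed by the paper:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def segment_linkage(similarity):
    """Turn a symmetric matrix of pairwise segment similarities
    (s_cov values in [0, 1]) into a hierarchical-clustering linkage,
    using 1 - s_cov as the distance between segments."""
    D = 1.0 - np.asarray(similarity)
    np.fill_diagonal(D, 0.0)                       # zero self-distance
    return linkage(squareform(D, checks=False), method='average')
```

The returned matrix can be plotted with `scipy.cluster.hierarchy.dendrogram`.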
2.3. Covariance of the monitored variables
In the previous subsection, it has been shown that the co-
variance of the monitored process variables can be used to
measure the homogeneity of the segments of multivariate
time-series. The main problem in applying this approach
is how to estimate covariance matrices that contain
useful information about the operation of the monitored
process.
The most straightforward approach is the recursive estimation
of the P_k covariances:

P_k = \frac{1}{\lambda} \left[ P_{k-1} - \frac{P_{k-1} x_k x_k^T P_{k-1}}{\lambda + x_k^T P_{k-1} x_k} \right],    (5)

where P_k is a matrix proportional to the covariance matrix
and λ is a scalar forgetting factor.
This tool can be directly used to analyze the measured
input-output data, x_k = [u_k^T, y_k^T]^T; this approach is the
basis of the first algorithm proposed in the paper
(Algorithm 1).
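Eq. (5) translates directly into code (a sketch; we assume the data vectors are already normalized and use a single scalar forgetting factor):

```python
import numpy as np

def recursive_cov_update(P_prev, x, forgetting=0.95):
    """One step of the exponentially forgetting update of Eq. (5):
    P_k = (1/lambda) * (P_{k-1} - P_{k-1} x x^T P_{k-1} / (lambda + x^T P_{k-1} x))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    Px = P_prev @ x
    return (P_prev - (Px @ Px.T) / (forgetting + float(x.T @ Px))) / forgetting
```

By the Sherman-Morrison identity this update keeps P_k symmetric and positive definite, so the PCA similarity factor of Eq. (4) remains well defined at every step.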
Historical input-output process data alone may not be sufficient
for the monitoring of complex processes. Hence, the
main idea of this paper is to apply a non-linear state-estimation
algorithm to detect changes in the estimated state-variables
(Algorithm 2) and in the correlation of their modelling
error (Algorithm 3).
The proposed algorithms have been developed for the general
non-linear model of a dynamical system:

x_{k+1} = f(x_k, u_k, v_k)    (6)
y_k = g(x_k, w_k)    (7)

where v_k and w_k are noise variables assumed to be independent
of the current and past states, v_k ∼ N(\bar{v}_k, Q_k),
w_k ∼ N(\bar{w}_k, R_k).
The developed algorithm is based on the results of standard
state-estimation algorithms, i.e. the estimated state-variables,

\hat{x}_k = \bar{x}_k + K_k [y_k - \bar{y}_k]    (8)

and their a posteriori covariance matrix,

P_k = E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T].    (9)
In these expressions \bar{x}_k = E[x_k | Y_{k-1}], \bar{y}_k = E[y_k | Y_{k-1}]
(Y_{k-1} is a matrix containing the past measurements), and K_k
is the Kalman gain: K_k = P_{xy,k} P_{y,k}^{-1}, where P_{xy,k} = E[(x_k -
\bar{x}_k)(y_k - \bar{y}_k)^T | Y_{k-1}] and P_{y,k} = E[(y_k - \bar{y}_k)(y_k - \bar{y}_k)^T | Y_{k-1}].
By selecting the update of the estimated variables and their
covariance so that the covariance of the estimation error is minimized, we can obtain the following update-rule of the
covariance matrix

P_k = \bar{P}_k - K_k P_{y,k} K_k^T,    (10)

where

\bar{P}_k = E[(x_k - \bar{x}_k)(x_k - \bar{x}_k)^T | Y_{k-1}].    (11)
As the various expectations used in these equations are in general
intractable, some kind of approximation is commonly
used. The Extended Kalman Filter (EKF) is based on Taylor
linearization of the state-transition and output equations.
Although the developed algorithm can be applied with any state-estimation
algorithm, the effectiveness of the selected filter
has an effect on the results of the segmentation. The utilized
DD2 filter is based on approximations obtained with
a multivariable extension of Stirling's interpolation formula.
This filter is simple to implement, as no derivatives of the
model equations are needed, yet it provides excellent accuracy
(Poulsen, Norgaard, & Ravn, 2000).
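For orientation, the measurement-update step of Eqs. (8) and (10) can be sketched for the linear-Gaussian special case, where P_xy = \bar{P} H^T and P_y = H \bar{P} H^T + R (a minimal illustration, not the DD2 filter itself; names are ours):

```python
import numpy as np

def measurement_update(x_prior, P_prior, y, H, R):
    """Kalman measurement update: Eq. (8) for the state and Eq. (10) for the
    a posteriori covariance, in the linear case y = H x + w, w ~ N(0, R)."""
    y_pred = H @ x_prior                      # \bar{y}_k
    P_y = H @ P_prior @ H.T + R               # innovation covariance P_{y,k}
    K = P_prior @ H.T @ np.linalg.inv(P_y)    # Kalman gain K_k = P_{xy,k} P_{y,k}^{-1}
    x_post = x_prior + K @ (y - y_pred)       # Eq. (8)
    P_post = P_prior - K @ P_y @ K.T          # Eq. (10)
    return x_post, P_post
```

A posteriori covariances of this kind are what Algorithm 3 feeds into the similarity measure of Eq. (3).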
Based on the results of this non-linear state-estimation, two
different algorithms can be defined. Algorithm 2 is based
on the direct analysis of the estimated state-variables, \hat{x}_k,
while Algorithm 3, which is the main contribution of this
paper, uses the a posteriori covariance matrices given by
the non-linear state-estimation algorithm, i.e. the P_k in (3) is the a posteriori covariance of (9).
3. Application example
3.1. Problem description
In this section, the proposed algorithms are applied
to the data- and model-based product quality monitoring
and control of a polyethylene plant at Tiszai Vegyi Kombinát
(TVK) Ltd., the largest Hungarian polyolefin
production company (http://www.tvk.hu). The monitoring of
a medium- and high-density polyethylene (MDPE, HDPE)
plant is considered. HDPE is a versatile plastic used for household
goods, packaging, car parts and pipes. The main properties
of the HDPE products (Melt Index (MI) and density)
are controlled by the reactor temperature and by the monomer,
comonomer and chain-transfer agent concentrations. An interesting
feature of the process is that about ten product
grades must be produced according to market demand.
Hence, there is a clear need to minimize the changeover time,
because off-specification product may be produced during
the process transitions.
The polymerization unit is controlled by a Honeywell
Distributed Control System (DCS), and the relevant process
variables are collected and stored by the Honeywell Process
History Data-module. The proposed process monitoring
tool has been implemented independently from the DCS;
the historical process data are stored by a
MySQL SQL-server. Most of the measurements are available
every 15 s on the process variables, which consist of input and
output variables: the inlet flowrates and temperatures of the comonomer
hexene, the monomer ethylene, the solvent isobutane and the chain-transfer agent hydrogen
(u_{1,...,4} = F^{in}_{C6,C2,C4,H2} and u_{5,...,8} = T^{in}_{C6,C2,C4,H2}),
the flowrate of the catalyst (u_9 = F^{in}_{cat}), and the flowrate
and the inlet and outlet temperatures of the cooling water
(u_{10,...,12} = F^{in}_w, T^{in}_w, T^{out}_w).
The prototype of the proposed process monitoring
tool has been implemented in MATLAB with
the use of the database and Kalman filter toolboxes
(http://www.iau.dtu.dk/research/control/kalmtool.html).
3.2. The model of the process
The model used in the state-estimation algorithm contains
the mass, component and energy balance equations to estimate
the mass of the fluid and the formulated polymer in
the reactor, the concentrations of the main components (ethylene,
hexene, hydrogen and catalyst) and the reactor temperature.
Hence, the state-variables of this detailed first-principles
model are the mass of the fluid and the polymer in the reactor
(x_1 = G_F and x_2 = G_{PE}), the chain-transfer agent concentration
(x_3 = c_{H2}), the monomer, comonomer and catalyst concentrations
in the loop reactor (x_4 = c_{C2}, x_5 = c_{C6} and x_6 = c_{cat}),
and the reactor temperature (x_7 = T_R). Since there are some unknown
parameters related to the reaction rates of the different
catalysts applied to produce the different products, there
are additional state-variables: the reaction rate coefficients
x_8 = k_{C2}, x_9 = k_{C6}, x_{10} = k_{H2}.
With the use of these state-variables the main model equations
are formulated as follows:

\frac{dG_F}{dt} = \sum_j F^{in}_j - F^{out}_F - \sum_i k_i c_i G_F c_{cat} G_{PE}    (12)

\frac{dG_{PE}}{dt} = \sum_i k_i c_i G_F c_{cat} G_{PE} - F^{out}_{PE}    (13)

\frac{dc_i}{dt} = \frac{1}{G_F} \left( F^{in}_i - F^{out}_F c_i - k_i c_i G_F c_{cat} G_{PE} - c_i \frac{dG_F}{dt} \right)    (14)

\frac{dc_{cat}}{dt} = \frac{1}{G_{PE}} \left( F^{in}_{cat} - F^{out}_{PE} c_{cat} - c_{cat} \frac{dG_{PE}}{dt} \right)    (15)

\frac{dT_R}{dt} = \frac{1}{G_F c^F_p + G_{PE} c^{PE}_p + G_{reactor} c^{reactor}_p} \left( \sum_j F^{in}_j c^j_p (T^{in}_j - T_R) + \sum_i k_i c_i G_F c_{cat} G_{PE} \Delta H_i - Q_{cooling} + Q_{stirring} \right)    (16)
Notation: i ∈ {C2, C6, H2}, j ∈ {C4, C2, C6, H2},
Q_{cooling} = F^{in}_w c^w_p (T^{out}_w - T^{in}_w); G_{(.)} denotes mass,
F_{(.)} denotes mass flowrate, c^{(.)}_p denotes the specific heat of the (.)
component, and ΔH_i represents the heat of reaction i.
For the feedback to the filter, measurements are available
on the chain-transfer agent, monomer and comonomer concentrations
(y_{1,2,3} = x_{3,4,5}), the reactor temperature (y_4 = x_7)
and the density of the slurry in the reactor (y_5 = ρ_{slurry},
which is related to x_1 and x_2). The concentration measurements
are available only every 8 min.
The dimensionless state-variables are obtained by normalizing
the variables, x^n = (x - x_{min}) / x_{int}, where x_{min} is a minimal
value and x_{int} is the interval of the variable (based on
a priori knowledge, e.g. the operators' experience, if available).
The values of the input and state-variables are not shown
in the figures presented in the next sections for confidentiality
reasons.
3.3. Parameters of the segmentation algorithms
The results studied in the next sections have been obtained
by setting the initial process noise covariance matrix
to Q = diag(10^{-4}), the measurement noise covariance matrix
to R = diag(10^{-8}), and the initial state-covariance matrix to
P_0 = diag(10^{-8}). The values of these parameters depend heavily
on the analyzed dataset; that is why the proper normalization
method has an influence on the results. However, the
parameters above can be used to estimate the state-variables
not only for the datasets presented in the next sections, but also for
other datasets that contain data from the production of other products
under different operating conditions, provided the data come from the same reactor
and the same type of catalyst. In these cases,
the state-estimation algorithm was robust with respect to
these parameters: they can be varied over roughly two
orders of magnitude around the values above.
For the segmentation algorithm some parameters have to
be chosen in advance; one of them is the number of principal
components. This can be done by the analysis of the eigenvalues
of the covariance matrices of some initial segments.
For this purpose a so-called scree plot can be drawn, which plots
the ordered eigenvalues according to their contribution to the
variance of the data. Another possibility is to define p based on the desired accuracy (retained variance) of the PCA models.
The datasets shown in Figs. 3 and 4 were initially partitioned
into 10 segments. As Fig. 1 illustrates, the cumulative
rate of the sum of the eigenvalues shows that five PCs are sufficient
to approximate the distribution of the data with 97%
accuracy in both cases. Obviously, this analysis can be fully
automated based on the following criterion:

\frac{\sum_{j=1}^{p-1} \lambda_{i,j}}{\sum_{j=1}^{n} \lambda_{i,j}} < \text{accuracy} \le \frac{\sum_{j=1}^{p} \lambda_{i,j}}{\sum_{j=1}^{n} \lambda_{i,j}},    (17)
where p is the number of principal components, n is the number
of variables, and λ_{i,j} is the jth eigenvalue of the covariance
matrix of the ith initial segment.
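The criterion (17) amounts to picking the smallest p whose cumulative eigenvalue ratio reaches the desired accuracy, e.g. (a sketch; names are ours):

```python
import numpy as np

def num_principal_components(P, accuracy=0.97):
    """Smallest p satisfying Eq. (17): the first p eigenvalues of the
    covariance matrix P account for at least `accuracy` of the total variance."""
    eigvals = np.sort(np.linalg.eigvalsh(P))[::-1]    # descending eigenvalues
    ratios = np.cumsum(eigvals) / np.sum(eigvals)     # cumulative variance ratios
    return int(np.searchsorted(ratios, accuracy) + 1)
```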
Another important parameter is the number of segments.
One of the applicable methods is presented by Vasko and
Toivonen (2002). This method is based on a permutation test
to determine whether the increase of the model accuracy
with the increase of the number of segments is due to the
underlying structure of the data or due to noise. In this
paper, a simplified version of this method has been used. It
is based on the relative reduction of the modelling error (see
(2) and (3)):

RR(c|T) = \frac{\mathrm{cost}(S_T^{c-1}) - \mathrm{cost}(S_T^c)}{\mathrm{cost}(S_T^{c-1})},    (18)

where RR(c|T) is the relative reduction of error when c segments
are used instead of c - 1 segments.
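Given the total segmentation costs for c = 1, 2, ..., Eq. (18) is a one-liner (a sketch; names are ours):

```python
def relative_reduction(costs):
    """Eq. (18): RR(c|T) for c = 2, 3, ..., given costs[c-1] = cost(S_T^c)
    (i.e. costs is the list of total segmentation costs for c = 1, 2, ...)."""
    return [(costs[c - 1] - costs[c]) / costs[c - 1]
            for c in range(1, len(costs))]
```

A suitable number of segments is reached where RR drops to near zero, i.e. the "knee" visible in Fig. 2.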
Fig. 1. Screeplot for determining the proper number of principal components in case of datasets presented in (a) Section 3.4 and (b) Section 3.5, respectively.
Fig. 2. Determining the number of segments by Algorithm 3 in case of datasets presented in (a) Section 3.4 and (b) Section 3.5, respectively.
As can be seen in Fig. 2, significant reductions are not achieved by using more than five or six segments for
either dataset. Similar figures can be obtained by
Algorithm 2.
3.4. Monitoring of process transitions
In this study, a set of historical process data covering a 100-h
period of operation has been analyzed. This dataset includes
at least three segments because of a product transition around
the 45th hour (see Fig. 3). Based on the relative reduction of
error in Fig. 2(a), the algorithm searched for five segments
(c = 5). The results depicted in Fig. 3 show that the most reasonable
segmentation has been obtained based on the covariance
matrices of the state-estimation algorithm (Algorithm 3). The
segmentation obtained based on the estimated state-variables
is similar: the boundaries of the segment that contains the
transition around the 45th hour are nearly the same, and the
other segments contain parts of the analyzed dataset with
similar properties. In contrast, when only
the measured input-output data were used for the segmentation,
the algorithm was not even able to detect the process
transition.
It has to be noted that Algorithm 3 can be considered more
practical than Algorithm 2, because in the latter case one additional
parameter has to be chosen: the forgetting factor λ
in the recursive estimation of the covariance matrices in
(5). The result obtained by Algorithm 2 is very sensitive to
this choice; λ = 0.95 seemed to be a good trade-off
between robustness and flexibility.
3.5. Detection of changes in the catalyst productivity
Besides the analysis of the process transitions, the time-series
of stable operation have also been segmented to
detect interesting patterns of relatively homogeneous data.
For this purpose, Algorithm 3 was chosen from the methods
presented above, because it gave good results in the case
of product changes. One of these results can be seen in Fig.
4, which shows a 120-h production period without any
product changes. Based on the relative reduction of error in
Fig. 2(b), the number of segments was chosen to be
six (c = 6).
The homogeneity of a historical process data set can be
characterized by the similarity of the segments that can be
illustrated as a dendrogram (see Fig. 5).
This dendrogram and the borders of the segments make it possible
to analyze and to understand the hidden processes
of complex systems. In this example, the results confirm
that the quality of the catalyst has an important influence
on productivity. Around the 20th, 47th, 75th and 90th hours of the
presented period of operation, changes between the catalyst
feeder bins occurred. The segmentation algorithm based on
the estimated state-variables was able to detect these changes,
which had an effect on the catalyst productivity, but when only
the input-output variables were used, segments without any
useful information were detected.
It has to be noted that the borders of the segments given
by Algorithms 2 and 3 are similar in this case as well, but
the dendrograms are different. This is because the segments
without a product transition are much more similar to
each other than in the case of a time-series that contains
a product transition. Hence it is a more difficult problem to
differentiate segments of operation related to minor
changes in the technology, such as changes in the catalyst
productivity. This phenomenon can also be seen in the dendrogram:
the values on the ordinate axis are one or two
orders of magnitude smaller in the case of
a time-series without a product transition. In the case of a product
transition, not only the borders of the segments but
also the shapes of the dendrograms are nearly the same.
This shows that both algorithms are applicable for similar
purposes.
Fig. 3. (a and b) Segmentation based on Algorithm 1; (c and d) segmentation based on Algorithm 2; (e and f) segmentation based on Algorithm 3; (a, c and e)
input variables: F^{in}_{C2}, F^{in}_{C4}, F^{in}_{C6}, F^{in}_{H2}, F^{in}_{cat}, T^{in}_w, T^{out}_w; (b, d and f) process outputs and states: T_R, c_{C2}, c_{C4}, c_{C6}, ρ_{slurry}, k_{C2}, k_{C6}, k_{H2}.
Fig. 4. Segmentation based on the error covariance matrices.
Fig. 5. Similarity of the found segments.
4. Conclusions
This paper presented the synergistic combination of state-estimation
and advanced statistical tools for the analysis of
multivariate historical process data. The key idea of the presented
segmentation algorithm is to detect changes in the
correlation among the state-variables based on their a posteriori
covariance matrices estimated by a state-estimation
algorithm. The PCA similarity factor can be used to analyze
these covariance matrices. Although the developed algorithm
can be applied with any state-estimation algorithm,
the performance of the filter has a large effect on the segmentation.
The applied DD2 filter has proven to be accurate,
and it was straightforward to include a varying number
of parameters in the state-vector for simultaneous state
and parameter estimation, which was very useful for the
analysis of the reaction kinetic parameters during process
transitions. The application example showed the benefits of
incorporating state-estimation tools into segmentation
algorithms.
References
Himberg, J., Korpiaho, K., Mannila, H., Tikanmaki, J., & Toivonen, H. T. (2001). Time-series segmentation for context recognition in mobile devices. IEEE International Conference on Data Mining (ICDM'01, San Jose, California), pp. 203–210.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time-series. IEEE International Conference on Data Mining; http://www.citeseer.nj.nec.com/keogh01online.html.
Kivikunnas, S. (1998). Overview of process trend analysis methods and applications. ERUDIT Workshop on Applications in Pulp and Paper Industry, CD-ROM.
Krzanowski, W. J. (1979). Between-groups comparison of principal components. Journal of the American Statistical Association, 74, 703–707.
Last, M., Klein, Y., & Kandel, A. (2000). Knowledge discovery in time series databases. IEEE Transactions on Systems, Man, and Cybernetics, 31(1), 160–169.
Nemeth, S., Abonyi, J., Feil, B., & Arva, P. (2003). Fuzzy clustering based segmentation of time-series. Lecture Notes in Computer Science, 2810/2003, 275–285.
Poulsen, N. K., Norgaard, M., & Ravn, O. (2000). New developments in state estimation for nonlinear systems. Automatica, 36(11), 1627–1638.
Srinivasan, R., Wang, C., Ho, W. K., & Lim, K. W. (2004). Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Industrial & Engineering Chemistry Research, 43, 2123–2139.
Stephanopoulos, G., & Han, C. (1996). Intelligent systems in process engineering: A review. Computers and Chemical Engineering, 20, 743–791.
Vasko, K., & Toivonen, H. T. T. (2002). Estimating the number of segments in time series data using permutation tests. IEEE International Conference on Data Mining, 466–473.
Vincze, Cs., Arva, P., Abonyi, J., & Nemeth, S. (2003). Process analysis and product quality estimation by self-organizing maps with an application to polyethylene production. Computers in Industry, Special Issue on Soft Computing in Industrial Applications, 52(3), 221–234.
Wang, X. Z. (1999). Data mining and knowledge discovery for process monitoring and control. Springer.
Yamashita, Y. (2000). Supervised learning for the analysis of the process operational data. Computers and Chemical Engineering, 24, 471–474.
Zhang, J., Martin, E. B., & Morris, A. J. (1997). Process monitoring using non-linear statistical techniques. Chemical Engineering Journal, 67, 181–189.