
Vol. 50 No. 1 2017. Copyright © 2017 The Society of Chemical Engineers, Japan

Journal of Chemical Engineering of Japan, Vol. 50, No. 1, pp. 53–63, 2017

Batch Process Monitoring Based on Fuzzy Segmentation of Multivariate Time-Series

Harakhun Tanatavikorn and Yoshiyuki Yamashita

Department of Chemical Engineering, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan

Keywords: Batch Process, Fault Detection, Principal Component Analysis, Fuzzy Clustering, Penicillin Fermentation

This paper proposes a novel batch process monitoring method called adjoined time series principal component analysis (AdTsPCA). In this method, a modified Gath-Geva (GG) clustering is used for phase identification and data segmentation, and multiple time-ordered overlapping PCA models are constructed from the data segments. The PCA models are then used for statistical process monitoring. The key characteristic of AdTsPCA is that the additional information contained in the order of the PCA models allows for further diagnosis by comparing the known process phase with the suspected abnormal situation. The proposed AdTsPCA is applied to an industrial penicillin fermentation process to illustrate the effectiveness of the method. AdTsPCA is able to detect faults in the process and significantly reduces the number of false positive errors in process monitoring.

Introduction

Batch processes play an important role in today's demanding chemical industry because of their flexibility in producing low-volume, high-value products. They are characterized by the prescribed processing of raw materials for a finite duration to convert them into products (Cinar et al., 2003). They typically do not have steady states, usually operate over a broad set-point range, and exhibit strong nonlinear behavior. These attributes make batch processes challenging to monitor, and various monitoring methods have been proposed to improve monitoring accuracy and efficiency. Advances in process computers and developments in sensor technology have increased the amount of available historical process data and have led to the widespread use of data-driven process monitoring methods, with Multiway Principal Component Analysis (MPCA) and Multiway Partial Least Squares (MPLS) being the most widely used among them (Qin, 2012).

Statistical process monitoring is based on modeling the variability of the process variables around their average. In the case of batch processes, the average trajectory of the process variables is available. Subtracting this average trajectory removes most of the nonlinear dynamics, so that linear models can be used for monitoring (Kosanovich et al., 1996). MPCA extends the application of Principal Component Analysis (PCA) from continuous processes to batch processes (Nomikos and MacGregor, 1994). PCA extracts latent variables from process measurements and models the variance-covariance structure of the data with the fewest possible latent variables, while the multiway aspect allows for the analysis and monitoring of both within-batch and between-batch variations. Conventional MPCA takes the entire batch data as a single object in the modeling and constructs a single PCA model. However, most batch processes are inherently multiphase. A multiphase batch process is defined as one with a single processing unit but multiple operational regimes (Ündey and Cinar, 2002). These operational regimes are called phases, and the process dynamics and the correlations among variables often change at the transitions between phases (Nomikos and MacGregor, 1994). Failing to account for the multiphase nature of batch processes makes the process harder to understand and reduces the efficiency of the monitoring strategy. Different phase division and modeling methods have been developed that take the phase effect into consideration. Kosanovich et al. (1996) proposed dividing the model into two based on changes in the variance explained by the principal components during the batch. Lu et al. (2004) proposed the Sub-PCA method, which uses a clustering algorithm to detect segments of a batch that can be represented by the same model. Alternatively, Camacho and Picó (2006) developed the multiphase PCA (MPPCA) algorithm, which detects the phase division points based on the prediction abilities of the constructed PCA models. A survey of phase division methods was presented by Yao and Gao (2009).

Received on July 7, 2016; accepted on August 17, 2016. DOI: 10.1252/jcej.16we193. Correspondence concerning this article should be addressed to H. Tanatavikorn (E-mail address: [email protected]).

Research Paper

Of particular interest is the use of clustering algorithms to detect and segment batch data for PCA model construction. Batch process data can be viewed as multivariate time-series data. PCA models constructed from crisp data clustering often suffer from false positive (Type-I) errors during transitions between phase models (Ng and Srinivasan, 2009), and the resulting clusters may or may not correspond to the phases. Ng and Srinivasan (2009) discussed the need for adjoined multiple models in a multimodel approach and developed an adjoined multimodel-based method, AdPCA. AdPCA uses fuzzy c-means clustering to produce PCA models with overlapping borders. This reduces the occurrence of false positives, though it is still difficult to relate the PCA models to the process phases. To address this, we apply a different clustering algorithm to detect and segment the batch data. Abonyi et al. (2005) proposed a fuzzy clustering with a time-ordered structure in which the points in a cluster must come from successive time points. It uses local probabilistic principal component analysis (PPCA) models to measure the homogeneity of the segments and uses fuzzy sets to represent the segments in time. It also automatically merges similar adjacent segments, reducing the number of clusters based on cluster similarity and the distance between clusters. The clusters are continuous segments that retain their temporal information, which can then be used for phase identification.

In this paper, a novel batch process monitoring method, adjoined time series principal component analysis (AdTsPCA), based on the modified Gath-Geva clustering developed by Abonyi et al. (2005), is proposed. The clustering is used for phase identification and data segmentation. Multiple time-ordered overlapping PCA models are constructed from the data segments and used for monitoring. The key characteristic of AdTsPCA is that the additional information contained in the order of the PCA models allows for additional diagnostics by comparing the known process phase with the suspected abnormal operation. This significantly reduces the number of within-mode false positives and improves process monitoring. The proposed AdTsPCA is applied to an industrial penicillin fermentation process to illustrate the method's effectiveness.

The remainder of the paper is organized as follows. Section 1 provides background information on PCA and the multimodel approach as applied to process monitoring, together with a brief overview of the modified Gath-Geva clustering. The proposed AdTsPCA methodology is described in Section 2. Sections 3 and 4 present the application of the proposed method to a case study of an industrial-scale penicillin fermentation process.

1.　Background

1.1　Principal component analysis

Principal component analysis is a statistical method that is widely used in process monitoring (Venkatasubramanian et al., 2003). The principle of PCA is to project high-dimensional data onto a low-dimensional subspace while preserving the main information in the original data set. It uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components (PCs). Each principal component is a linear combination of the original variables, and the components are mutually uncorrelated. A data matrix X can be decomposed into a sum of products of score vectors u and loading vectors v, plus a residual matrix E, in the following way:

$$X = \underbrace{u_1 v_1^{T}}_{\text{PC1}} + \underbrace{u_2 v_2^{T}}_{\text{PC2}} + \cdots + \underbrace{E}_{\text{residual}} \qquad (1)$$

Here U = [u_1, …, u_q] is the score matrix, which contains information on how the data samples are related to each other, while V = [v_1, …, v_q] is the loading matrix, which contains information on the correlation among the variables. The PCs are sorted in descending order of the variance they capture, so each successive PC explains a smaller share of the variance. To achieve dimension reduction, only a selected number of PCs is retained for model construction. Jolliffe (2002) distinguished three kinds of methods for determining the number of PCs q: (1) ad-hoc rules (such as the cumulative percent variance or the scree test), (2) tests based on distributional assumptions, and (3) computational methods such as cross-validation. In this paper the cumulative percent variance (CPV) of the PCs is used to determine q with a threshold of 95%, so that the resulting PCA model accounts for 95% of the variance in the original data.
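The CPV rule can be illustrated with a short Python/NumPy sketch. It autoscales a data matrix, eigendecomposes its covariance matrix, and keeps the smallest q whose CPV reaches 95%. This is an illustrative sketch rather than the authors' code; the function name `fit_pca_cpv` and its returned keys are hypothetical.

```python
import numpy as np


def fit_pca_cpv(X, cpv_threshold=0.95):
    """Hypothetical helper: fit PCA to X (samples x variables) and keep the
    smallest number of PCs q whose cumulative percent variance reaches the threshold."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant variables
    Xs = (X - mean) / std                    # autoscale: zero mean, unit variance
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # sort PCs by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cpv = np.cumsum(eigvals) / eigvals.sum()
    q = int(min(np.searchsorted(cpv, cpv_threshold) + 1, eigvals.size))
    return {
        "mean": mean,
        "std": std,
        "V": eigvecs[:, :q],                 # loading matrix (m x q)
        "lam": eigvals[:q],                  # retained eigenvalues
        "all_lam": eigvals,                  # all eigenvalues (used for residual limits)
        "q": q,
    }


# Usage: two latent factors drive five variables, plus a little measurement noise
rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(2, 500))
X = np.column_stack([t1, t1, t2, t2, t1 + t2]) + 0.01 * rng.normal(size=(500, 5))
model = fit_pca_cpv(X)
print(model["q"])  # → 2
```

Sorting the eigenvalues in descending order makes the CPV a non-decreasing function of q, so the first crossing of the threshold gives the smallest adequate model.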

1.1.1　Fault detection with PCA monitoring statistics

Fault detection using PCA or its variants is generally conducted by monitoring two statistics associated with the PCA model: (1) Hotelling's T² and (2) the squared prediction error (SPE).

The Hotelling's T² statistic measures the variation of the sample within the PCA model and is defined as follows:

$$T_i^{2} = [u_{i,1}, u_{i,2}, \ldots, u_{i,q}]\,\Lambda^{-1}\,[u_{i,1}, u_{i,2}, \ldots, u_{i,q}]^{T} = x_i^{T} V \Lambda^{-1} V^{T} x_i \qquad (2)$$

where Λ⁻¹ is the diagonal matrix containing the inverses of the eigenvalues associated with the q eigenvectors retained in the PCA model (Ng and Srinivasan, 2009). The SPE statistic measures the deviation of the sample from the PCA model, i.e., its lack of fit to the model:

$$\mathrm{SPE}_i = e_i^{T} e_i = x_i^{T} \left(I - V V^{T}\right) x_i \qquad (3)$$
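Equations (2) and (3) can be evaluated directly from the retained loadings and eigenvalues. The following minimal sketch assumes the sample x has already been autoscaled with the training mean and standard deviation; `t2_spe` is a hypothetical helper name.

```python
import numpy as np


def t2_spe(x, V, lam):
    """Hypothetical helper: Hotelling's T^2 (Eq. 2) and SPE (Eq. 3) of one autoscaled sample.

    V   : m x q loading matrix with orthonormal columns (retained PCs)
    lam : the q eigenvalues associated with the retained PCs
    """
    u = V.T @ x                      # scores of x on the retained PCs
    t2 = float(u @ (u / lam))        # u^T Lambda^{-1} u = x^T V Lambda^{-1} V^T x
    e = x - V @ u                    # residual e = (I - V V^T) x
    spe = float(e @ e)               # e^T e = x^T (I - V V^T) x
    return t2, spe


# Usage: a 3-variable model that retains the first two axes
V = np.eye(3)[:, :2]
lam = np.array([4.0, 1.0])
x = np.array([2.0, 1.0, 3.0])
print(t2_spe(x, V, lam))  # → (2.0, 9.0)
```

Computing the residual explicitly and taking eᵀe gives the same value as the quadratic form in Eq. (3), because I − VVᵀ is a symmetric projection matrix.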

The process is considered normal if SPE ≤ Q(α), where Q(α) denotes the upper control limit for SPE at significance level α, based on a standard normal distribution (Jackson and Mudholkar, 1979). An upper control limit can be derived similarly for the T² statistic (Jackson, 2003). The T² and SPE statistics are complementary: T² represents the variability within the principal component subspace, whereas SPE represents the variability in the residual subspace. Raich and Çinar (1996) proposed the combined discriminant similarity index:

$$\varphi = \beta\,\frac{\mathrm{SPE}}{Q(\alpha)} + (1-\beta)\,\frac{T^{2}}{T^{2}(\alpha)} \qquad (4)$$

where β ∊ [0, 1] is a weighting constant. They further proposed that a value of the index below one be considered normal. It has been suggested that the use of a single index is preferable in practice (Joe Qin, 2003).
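To make Eq. (4) concrete, the sketch below computes the SPE limit Q(α) with the Jackson–Mudholkar approximation from the eigenvalues of the discarded PCs and then combines the two normalized statistics. It assumes the T² limit T²(α) has already been obtained elsewhere (for example from an F-distribution quantile) and is passed in; β = 0.5 is a hypothetical default, since the value used in this paper is not stated. The function names are illustrative.

```python
import math
from statistics import NormalDist


def spe_limit(all_eigvals, q, alpha=0.01):
    """Jackson-Mudholkar upper control limit Q(alpha) for SPE.

    all_eigvals : all eigenvalues of the training covariance, in descending order
    q           : number of retained PCs; the remaining ones span the residual space
    alpha       : significance level (0.01 for a 99% limit)
    """
    resid = all_eigvals[q:]
    theta1, theta2, theta3 = [sum(lam ** k for lam in resid) for k in (1, 2, 3)]
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)   # standard normal deviate
    term = (c_alpha * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)


def combined_index(spe, t2, q_lim, t2_lim, beta=0.5):
    """Combined index phi of Eq. (4); phi < 1 is considered normal.
    beta = 0.5 is an assumed default, not a value taken from the paper."""
    return beta * spe / q_lim + (1.0 - beta) * t2 / t2_lim


# Usage with hypothetical eigenvalues and a hypothetical T^2 limit of 9.0
q_lim = spe_limit([3.0, 1.5, 0.3, 0.2], q=2)
print(combined_index(spe=0.4, t2=2.0, q_lim=q_lim, t2_lim=9.0) < 1.0)  # → True
```

Because each statistic is divided by its own limit, a sample sitting exactly on both limits gives φ = 1 for any β, which is what makes the "φ < 1 is normal" rule independent of the weighting.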


1.2　Modified Gath-Geva clustering for segmentation of time series

This work uses the modified Gath-Geva (GG) clustering proposed by Abonyi et al. (2005) for time-series segmentation. Abonyi et al. interpreted the GG clustering algorithm in a probabilistic framework and used a modified distance norm based on probabilistic principal component analysis (PPCA) (Tipping and Bishop, 1999). They proposed a clustering algorithm for the simultaneous identification of local PPCA models, which are used to measure the homogeneity of the segments. Fuzzy sets in the form of overlapping multivariate Gaussian functions are then used to represent the segments in time. The algorithm uses time as an additional variable and forms clusters that are contiguous in time, which allows it to detect changes in the hidden structure of multivariate time series. The optimal number of clusters is determined in a manner similar to the bottom-up method: rather than guessing the number of clusters, the user defines a maximum number of clusters, and the algorithm iteratively progresses from this maximum to the optimal number of clusters by merging neighboring clusters according to a user-specified criterion. In this modified algorithm, a fuzzy decision-making algorithm determines whether clusters are merged based on a user-specified threshold on their compatibility (similarity) with adjacent clusters.

Given a time series $T = \{x_i \mid 1 \le i \le N\}$, a finite set of N samples labelled by time points $t_1, \ldots, t_N$, the segmentation splits T into c disjoint time intervals. The goal is to find homogeneous segments $S_1, \ldots, S_c$ in the given time series. This can be formulated as a constrained clustering problem: data points should be grouped based on their similarity, but all points in a cluster must come from successive time points. The optimal c-segmentation is obtained by minimizing the cost of the c-segmentation, which is usually the sum of the costs of the individual segments:

$$\mathrm{cost}\left(S^{c}_{\mathrm{total}}\right) = \sum_{a=1}^{c} \sum_{i=1}^{N} \left(\mu_{a,i}\right)^{m} D^{2}\left(x_i, v_a^{x}\right) \qquad (5)$$

The cost function minimized is the sum of the weighted squared distances $D^2(x_i, v_a^x)$ between the data points $x_i$ and the mean $v_a^x$ of the variables in the a-th segment. $\mu_{a,i}$ represents the degree of membership of the observation $x_i$ to the a-th cluster, and m is a weighting exponent that determines the fuzziness of the resulting clusters (typically chosen as m = 2).
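A short NumPy sketch of the cost in Eq. (5), together with the standard fuzzy membership update that uses the same distances (given later as Eq. (7)), may help make the notation concrete. It assumes the squared distances are already available as a c × N array with strictly positive entries; the function names are illustrative, not taken from Abonyi et al. (2005).

```python
import numpy as np


def segmentation_cost(mu, d2, m=2.0):
    """Illustrative cost of a fuzzy c-segmentation, Eq. (5):
    sum over clusters a and samples i of mu[a, i]**m * d2[a, i]."""
    return float(np.sum(mu ** m * d2))


def update_memberships(d2, m=2.0):
    """Illustrative update mu[a, i] = 1 / sum_l (D[a, i] / D[l, i])**(2 / (m - 1)).
    With squared distances, (D_a / D_l)**2 = d2_a / d2_l, so the exponent becomes 1 / (m - 1).
    Assumes every entry of d2 is strictly positive."""
    ratio = d2[:, None, :] / d2[None, :, :]          # ratio[a, l, i] = d2[a, i] / d2[l, i]
    return 1.0 / np.sum(ratio ** (1.0 / (m - 1.0)), axis=1)


# Usage: two clusters, two samples
d2 = np.array([[1.0, 4.0],
               [1.0, 1.0]])
mu = update_memberships(d2)
print(mu)                          # each column sums to one
print(segmentation_cost(mu, d2))
```

A sample that is equally distant from both clusters receives memberships of 0.5 each, while a sample four times farther (in squared distance) from cluster 1 than from cluster 2 is split 0.2/0.8 when m = 2.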

When $D^2(x_i, v_a^x)$ is interpreted in a probabilistic framework, the probability that a data point $x_i$ belongs to a cluster is inversely proportional to its distance $D^2(x_i, v_a^x)$ to the a-th cluster. Including time t requires redefining a data point as $z_i = [t_i, x_i]$, while $\eta_a$ denotes the prototype of the a-th cluster, which contains the various cluster parameters. The clustering algorithm is therefore based on the following distance measure $D^2(z_i, \eta_a)$:

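$$\frac{1}{D^{2}(z_i, \eta_a)} = p(z_i \mid \eta_a) = \alpha_a \, \underbrace{\frac{1}{\sqrt{2\pi\sigma_{a,t}^{2}}} \exp\!\left(-\frac{1}{2}\,\frac{(t_i - v_a^{t})^{2}}{\sigma_{a,t}^{2}}\right)}_{p(t_i \mid \eta_a)} \times \underbrace{\frac{1}{(2\pi)^{r/2}\sqrt{\det(A_a)}} \exp\!\left(-\frac{1}{2}\,(x_i - v_a^{x})^{T} A_a^{-1} (x_i - v_a^{x})\right)}_{p(x_i \mid \eta_a)} \qquad (6)$$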

The overall probability $p(z_i \mid \eta_a)$ is the product of three terms. The first term, $\alpha_a$, represents the a priori probability of the cluster and is calculated from the memberships $\mu_{a,i}$. The second term is the probability in the time dimension, denoted by the superscript and subscript t on the variables; it depends on the distance between the time $t_i$ of the i-th data point and the center $v_a^t$ of the a-th segment in time. The third term represents the distance between the data point and the cluster center in the feature space, where $v_a^x$ is the coordinate of the a-th cluster center in the feature space, $A_a$ is the distance norm matrix of the a-th cluster, and r is the rank of $A_a$.
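The distance measure of Eq. (6) can be evaluated as below. This is a hedged sketch: it works in the log domain to avoid numerical underflow, and it assumes the feature-space covariance has the PPCA form A_a = W_a W_aᵀ + σ² I (Tipping and Bishop, 1999), with W_a and σ² supplied by a local PPCA fit that is not shown. The helpers `ppca_covariance` and `gg_distance2` are hypothetical names.

```python
import math

import numpy as np


def ppca_covariance(W, sigma2):
    """Hypothetical helper: covariance of a PPCA model, A = W W^T + sigma2 * I."""
    return W @ W.T + sigma2 * np.eye(W.shape[0])


def gg_distance2(t, x, alpha, v_t, var_t, v_x, A):
    """Hypothetical helper: D^2(z, eta) = 1 / p(z | eta) of Eq. (6) for z = [t, x]."""
    r = x.size
    # log of the Gaussian in the time dimension, p(t | eta)
    log_pt = -0.5 * math.log(2.0 * math.pi * var_t) - 0.5 * (t - v_t) ** 2 / var_t
    # log of the Gaussian in the feature space, p(x | eta), with covariance A
    diff = x - v_x
    _, logdet = np.linalg.slogdet(A)
    maha = float(diff @ np.linalg.solve(A, diff))
    log_px = -0.5 * r * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * maha
    log_p = math.log(alpha) + log_pt + log_px
    return math.exp(-log_p)


# Usage: a point at the cluster center, unit variances, alpha = 1
A = ppca_covariance(np.zeros((2, 1)), sigma2=1.0)   # reduces to the identity matrix
d2 = gg_distance2(t=5.0, x=np.array([0.0, 0.0]), alpha=1.0,
                  v_t=5.0, var_t=1.0, v_x=np.zeros(2), A=A)
print(math.isclose(d2, (2.0 * math.pi) ** 1.5))  # → True
```

Working with logarithms keeps the product of the three terms numerically stable; only the final reciprocal is exponentiated.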

The first and second terms in Eq. (6) can be calculated directly from the cluster parameters. The third term is obtained by constructing a probabilistic principal component analysis (PPCA) model for each cluster. The local PPCA models provide the probabilities $p(x_i \mid \eta_a)$, which are then used to determine $p(z_i \mid \eta_a)$. The cost of Eq. (5), with the distance of Eq. (6), is then minimized under the membership constraints by alternating optimization (Bezdek and Dunn, 1975; Höppner et al., 1999). Once $D^2(z_i, \eta_a)$ has been determined, the partition matrix is updated by recalculating $\mu_{a,i}$:

$$\mu_{a,i}^{\mathrm{new}} = \frac{1}{\sum_{l=1}^{c} \left( D(z_i, \eta_a) / D(z_i, \eta_l) \right)^{2/(m-1)}} \qquad (7)$$

where 1 ≤ a ≤ c and 1 ≤ i ≤ N. To summarize, the algorithm operates according to the following general steps:
1. Calculate the parameters of each cluster: the centers ($v_a^t$, $v_a^x$), the overall probability $p(z_i \mid \eta_a)$, the degree of membership $\mu_{a,i}$, and the variance $\sigma_{a,t}^2$;
2. Compute the distance measure by Eq. (6);
3. Update the partition matrix (degrees of membership);
4. Repeat until the membership values stabilize (the error tolerance is reached).

The clusters are then evaluated by a fuzzy decision-making algorithm (Kaymak and Babuska, 1995). The decision-making algorithm determines a compatibility criterion that quantifies the similarity of the clusters. The compatibility criteria are (1) the similarity factor $S_{\mathrm{PCA}}$ (Krzanowski, 1979; Singhal and Seborg, 2001) and (2) the distance between the cluster centers (Gath and Geva, 1989). $S_{\mathrm{PCA}}$ is calculated from the sum of the squared cosines of the angles between the principal components of the two clusters:


$$S_{\mathrm{PCA}}^{a,b} = \frac{1}{q}\sum_{j=1}^{q}\sum_{k=1}^{q}\cos^{2}\theta_{j,k} = \frac{1}{q}\,\mathrm{tr}\!\left(L_{a,q}^{T} L_{b,q} L_{b,q}^{T} L_{a,q}\right) \qquad (8)$$

where $L_{a,q}$ and $L_{b,q}$ are matrices whose columns are the q retained eigenvectors of the PCA models for the data segments $S_a$ and $S_b$, respectively. The distance between the cluster centers v is calculated according to the following equation:

$$D\left(v_a^{x}, v_b^{x}\right) = \left\lVert v_a^{x} - v_b^{x} \right\rVert \qquad (9)$$

The overall cluster compatibility is determined by fuzzy aggregation of the two criteria. The result is the overall compatibility matrix O, which expresses the compatibility between the PCA models of each pair of segments. Each element of the compatibility matrix is calculated as:

$$O_{a,b} = \frac{\left(\tau^{1}_{a,b}\right)^{2} + \left(\tau^{2}_{a,b}\right)^{2}}{2} \qquad (10)$$

Here $\tau^1_{a,b}$ and $\tau^2_{a,b}$ are the degrees of parallelism and closeness obtained by evaluating the membership functions of the two compatibility criteria in the fuzzy decision-making algorithm. Finally, the clusters are merged based on the compatibility matrix O: clusters whose compatibility exceeds a user-specified threshold are merged, using the method developed by Kelly (1994). A detailed derivation and discussion of the modified GG fuzzy clustering algorithm can be found in Abonyi et al. (2005).
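The compatibility computation can be sketched as follows. The example evaluates S_PCA of Eq. (8) in both its cosine and trace forms, aggregates two degrees of compatibility with Eq. (10), and applies a merging threshold. The fuzzy membership functions that turn S_PCA and the center distance of Eq. (9) into τ¹ and τ² are not given here, so τ¹ and τ² are simply taken as inputs; the default γ = 0.4 mirrors the value used in the case study of Section 4.1, and all function names are illustrative.

```python
import numpy as np


def s_pca(La, Lb):
    """Similarity factor of Eq. (8), trace form.
    La, Lb: m x q matrices with orthonormal columns (retained eigenvectors)."""
    q = La.shape[1]
    return float(np.trace(La.T @ Lb @ Lb.T @ La)) / q


def s_pca_from_cosines(La, Lb):
    """Same factor from the cosines between PCs: entry (j, k) of La^T Lb
    is the cosine of the angle between the j-th PC of a and the k-th PC of b."""
    q = La.shape[1]
    return float(np.sum((La.T @ Lb) ** 2)) / q


def overall_compatibility(tau1, tau2):
    """Eq. (10): aggregate the degrees of parallelism tau1 and closeness tau2."""
    return (tau1 ** 2 + tau2 ** 2) / 2.0


def should_merge(o_ab, gamma=0.4):
    """Merge two adjacent clusters when their compatibility exceeds gamma
    (gamma = 0.4 is an assumed default taken from the case study tuning)."""
    return o_ab > gamma


# Usage: two random 2-D subspaces of R^5 give the same S_PCA both ways
rng = np.random.default_rng(1)
La, _ = np.linalg.qr(rng.normal(size=(5, 2)))
Lb, _ = np.linalg.qr(rng.normal(size=(5, 2)))
print(np.isclose(s_pca(La, Lb), s_pca_from_cosines(La, Lb)))  # → True
print(should_merge(overall_compatibility(0.9, 0.6)))          # → True
```

S_PCA is 1 for identical subspaces and 0 for orthogonal ones, so it behaves as a bounded degree of parallelism before any fuzzy membership function is applied.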

The overall procedure of the time-series segmentation is summarized as follows:
1. Uniformly segment the data into a large number of segments c. Determine the appropriate number of PCs q based on an analysis of the eigenvalues of these segments and the CPV.
2. Choose the fuzziness parameter m, the threshold γ for the compatibility matrix O, and the termination tolerance ε according to the suggestions of Abonyi et al. (2005).
3. Execute the clustering algorithm, evaluating cluster merging at selected intervals. The algorithm stops when the termination tolerance is reached and no further cluster merging is required.
4. Tune the output of the algorithm further by adjusting c and γ.
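Step 1 of this procedure can be sketched as below. The sketch represents time by the sample index, builds a crisp initial membership matrix for c equal-length segments, and counts the PCs each segment needs to reach 95% CPV. Combining the per-segment counts by taking their maximum is an assumption of this example, not a rule stated in the paper, and the function names are hypothetical.

```python
import numpy as np


def uniform_initial_segmentation(n_samples, c):
    """Hypothetical helper: split N time points into c equal-length segments.
    Returns a crisp c x N membership matrix and the time center of each segment."""
    idx = np.arange(n_samples)                    # assumption: time = sample index
    labels = idx * c // n_samples                 # segment label of each time point
    mu = np.zeros((c, n_samples))
    mu[labels, idx] = 1.0
    centers_t = np.array([idx[labels == a].mean() for a in range(c)])
    return mu, centers_t


def pcs_for_cpv(X_seg, threshold=0.95):
    """Number of PCs needed for the CPV of one segment to reach the threshold."""
    eigvals = np.linalg.eigvalsh(np.cov(X_seg, rowvar=False))[::-1]  # descending
    cpv = np.cumsum(eigvals) / eigvals.sum()
    return int(min(np.searchsorted(cpv, threshold) + 1, eigvals.size))


# Usage: c = 10 initial segments over 100 samples, then q per segment
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
mu, centers_t = uniform_initial_segmentation(100, c=10)
q_per_segment = [pcs_for_cpv(X[mu[a] == 1.0]) for a in range(10)]
q = max(q_per_segment)   # assumption: keep the largest q found over the segments
```

Starting from many small crisp segments gives the merging step room to work: adjacent segments that describe the same phase can later be combined, which is why c is chosen as an upper bound rather than a guess of the final number of phases.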

Figure 1 shows the data segmentation at the initial state, and the final result of the segmentation algorithm is shown in Figure 2.

1.2.1　Tuning of the modified GG algorithm

The modified GG algorithm has several tuning parameters:
1. c: the maximum number of local time segments;
2. q: the number of principal components of the PPCA models used in the clustering;
3. m: the fuzziness parameter of the cost in Eq. (5);
4. γ: the threshold on cluster compatibility used to evaluate cluster merging;
5. ε: the termination tolerance that indicates the stabilization of the fuzzy clusters.

In the case studies of Abonyi et al. (2005), it is suggested that the parameters be tuned to fit the data being clustered and its subsequent application. In their case studies, m = 2, ε = 10⁻⁴, and γ is usually between 0.3 and 0.75 depending on the homogeneity of the time series. c = 10 was chosen for both case studies, although no particular criteria were given for this choice. q was selected based on a scree plot of the eigenvalues, with the retained PCs accounting for 95–98% of the variance of the data. Finally, Abonyi et al. (2005) noted that segmentation is an iterative procedure and that the results should be evaluated by human experts or other data-mining tools to ensure satisfactory and feasible results.

2.　Fault Detection with AdTsPCA

In this section, we propose the use of time-ordered overlapping PCA models for monitoring batch operation. The proposed methodology uses fuzzy time-series segmentation as a preprocessing step for PCA model development. The modified GG clustering algorithm is used to distinguish and segregate multiple phases in the training data. The resulting clusters are overlapping multivariate Gaussian functions, denoted as fuzzy segments. These segments are then used to construct PCA models, which inherit the fuzzy nature and time-sensitive qualities of the data segments, resulting in time-ordered PCA models. During monitoring, the PCA model that best describes the current state is chosen as the monitoring model, with additional “adjacent” PCA models providing supplementary information. The monitoring statistics of this model are then used for monitoring. In a normal process, the main monitoring model transitions in an orderly manner, although it may sometimes skip the next model in the sequence. In case of faults, the monitoring statistics often exceed the confidence limit, and the usual orderly progression of the monitoring model is strongly interrupted. This information contained in the time-ordered models can assist in the fault identification procedure. The methodology is presented in Figure 3.

Fig. 1　Data segmentation at initial state

Fig. 2　Data segmentation final result for normal batches

First, a PCA model repository is built from historical data. The historical time-series batch data are unfolded batch-wise (variable × time) and normalized to zero mean and unit variance. The pretreated data are then fed into the modified GG clustering algorithm to produce multiple data groups. A data point belongs to a group if its membership value for that group is above a threshold $th_{\mu}$, where 0 ≤ $th_{\mu}$ ≤ 1. As seen in Figure 2, a smaller value of $th_{\mu}$ allows more overlap between the data segments, while $th_{\mu}$ = 0.5 prevents any overlap and produces disjoint multiple models. These local time-series data segments are then used to build the PCA models used for monitoring. The construction of the PCA repository is shown in Figure 4.
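The repository-building step can be sketched as below: a time point joins every segment in which its membership exceeds $th_{\mu}$, and one PCA model is fitted per segment in time order. The sketch assumes the data are already pretreated and that each segment is re-centered and re-scaled with its own statistics; that choice, the sigmoid-shaped example memberships, and the function names are assumptions made for illustration.

```python
import numpy as np


def overlapping_segments(mu, th_mu=0.3):
    """Hypothetical helper: indices of the time points whose membership in each
    cluster exceeds th_mu. mu is the c x N membership matrix from the clustering."""
    return [np.flatnonzero(mu[a] > th_mu) for a in range(mu.shape[0])]


def build_repository(X, segments, cpv_threshold=0.95):
    """Hypothetical helper: fit one local PCA model per segment, in time order.
    Assumption: each segment is re-centered and re-scaled with its own statistics."""
    repository = []
    for idx in segments:
        Xa = X[idx]
        mean, std = Xa.mean(axis=0), Xa.std(axis=0, ddof=1)
        std[std == 0] = 1.0
        eigvals, eigvecs = np.linalg.eigh(np.cov((Xa - mean) / std, rowvar=False))
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        cpv = np.cumsum(eigvals) / eigvals.sum()
        q = int(min(np.searchsorted(cpv, cpv_threshold) + 1, eigvals.size))
        repository.append({"mean": mean, "std": std, "V": eigvecs[:, :q],
                           "lam": eigvals[:q], "all_lam": eigvals})
    return repository


# Usage: two clusters whose memberships cross smoothly around sample 100
i = np.arange(200)
mu0 = 1.0 / (1.0 + np.exp((i - 100) / 10.0))
mu = np.vstack([mu0, 1.0 - mu0])
segments = overlapping_segments(mu, th_mu=0.3)     # th_mu = 0.3 as in Section 4.1
X = np.random.default_rng(3).normal(size=(200, 4))
repository = build_repository(X, segments)
```

With two clusters the memberships sum to one, so at most one of them can exceed 0.5 at any time point; this is why $th_{\mu}$ = 0.5 yields disjoint segments while smaller values produce an overlap around the transition.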

For online monitoring, as shown in Figure 3, the model $M^{\mathrm{op}}$ that best describes the operation is selected, and the monitoring statistics are based on this model. The distance between a new data sample $x_i$ and each model is evaluated using the combined discriminant similarity index φ of Eq. (4). The nearest PCA model is the one with the lowest φ, calculated from the T² and SPE values of $x_i$ and the corresponding confidence limits of each of the c PCA models:

$$M_i^{\mathrm{op}} = \arg\min_{k \in [1, c]} \varphi_{i,k} \qquad (11)$$
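Equation (11) amounts to scoring a new sample against every model in the repository and taking the arg-min of φ. In the sketch below, each model is a dictionary holding its scaling, loadings, eigenvalues and two precomputed limits; the keys `spe_lim` and `t2_lim`, the function name, and β = 0.5 are hypothetical choices. Models are indexed from 0 here, whereas the paper numbers them from 1.

```python
import numpy as np


def select_monitoring_model(x, repository, beta=0.5):
    """Hypothetical helper for Eq. (11): return the index of the model with the
    lowest combined index phi for the raw sample x, together with all phi values.
    Each model dict is assumed to hold mean, std, V, lam, spe_lim and t2_lim."""
    phis = []
    for model in repository:
        xs = (x - model["mean"]) / model["std"]     # scale with this model's statistics
        u = model["V"].T @ xs
        t2 = float(u @ (u / model["lam"]))
        e = xs - model["V"] @ u
        spe = float(e @ e)
        phis.append(beta * spe / model["spe_lim"] + (1.0 - beta) * t2 / model["t2_lim"])
    phis = np.array(phis)
    return int(np.argmin(phis)), phis


# Usage: two toy models that differ only in their mean
base = {"std": np.ones(2), "V": np.eye(2)[:, :1], "lam": np.array([1.0]),
        "spe_lim": 1.0, "t2_lim": 1.0}
repository = [dict(base, mean=np.zeros(2)), dict(base, mean=np.full(2, 10.0))]
k, phis = select_monitoring_model(np.array([10.0, 10.0]), repository)
print(k)  # → 1
```

Returning the full vector of φ values, not just the arg-min, keeps the information needed to analyze how φ evolves across models when the direction of model progression is uncertain.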

The monitoring statistics of $M^{\mathrm{op}}$ for the new data are then compared with their corresponding limits for fault detection. In normal operation, the selected $M^{\mathrm{op}}$ follows an orderly increase. When a process fault occurs, a major disruption in the $M^{\mathrm{op}}$ pattern is observed. AdTsPCA defines a fault as a violation of the confidence limits of the monitoring statistics combined with a disruption in the pattern of $M^{\mathrm{op}}$. $M^{\mathrm{op}}$ is tracked with a small time-lagged window of width l that contains the previous choices of $M^{\mathrm{op}}$:

$$\left[ M_t^{\mathrm{op}}, M_{t-1}^{\mathrm{op}}, M_{t-2}^{\mathrm{op}}, \ldots, M_{t-l}^{\mathrm{op}} \right] \qquad (12)$$

The choice of l affects the fault sensitivity of AdTsPCA. A small l reduces the false positives that occur during model transitions but increases the sensitivity of AdTsPCA to false positives within a local model. In the case study, l is chosen to cover the previous 0.4 h of monitoring. When a switch of the monitoring model occurs, the new $M_t^{\mathrm{op}}$ should be the next “adjacent” model among the time-ordered PCA models. Faults result in more drastic switches, causing $M^{\mathrm{op}}$ to revert to an earlier model or to skip models. A delay in the switch compared with normal operation is also indicative of a fault in the process. The model switching alone is not an accurate indicator of the expected model progression; because the switching is decided based on φ, the behavior of φ provides more detailed information about the model progression. The symbol → denotes a transition of the monitoring model; for example, 1 → 2 means that model 1 switches to model 2. A data point is flagged as a fault when one of the following conditions is true:
• A violation of either monitoring statistic together with a deviation from the expected monitoring model. In case of uncertainty, the behavior of φ is analyzed to ascertain the direction of model progression.
• A violation of a monitoring statistic that does not recover within the period of the time-lagged window l.
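The two flagging conditions can be sketched with a fixed-length window. This is a simplified illustration under stated assumptions: the window holds l + 1 entries as in Eq. (12), statistics are given as ratios to their 99% limits, an "expected" progression is assumed to mean staying on the current model or moving forward by at most `max_jump` models (a hypothetical parameter), and the φ-based analysis used in uncertain cases is not implemented. With the 0.2 h sampling time of the case study, a 0.4 h window corresponds to l = 2 samples.

```python
from collections import deque


class ProgressionMonitor:
    """Hypothetical sketch of the two AdTsPCA fault rules over the window of Eq. (12).

    l        : window width in samples (0.4 h at a 0.2 h sampling time gives l = 2)
    max_jump : assumed largest forward jump of the model index in normal operation
    """

    def __init__(self, l=2, max_jump=1):
        self.models = deque(maxlen=l + 1)       # M_t, M_{t-1}, ..., M_{t-l}
        self.violations = deque(maxlen=l + 1)   # limit-violation flags in the window
        self.max_jump = max_jump

    def update(self, m_op, t2_ratio, spe_ratio):
        """m_op: selected model index; *_ratio: statistic divided by its 99% limit.
        Returns True when the current sample is flagged as a fault."""
        violated = t2_ratio > 1.0 or spe_ratio > 1.0
        prev = self.models[-1] if self.models else None
        self.models.append(m_op)
        self.violations.append(violated)
        # Rule 1: violation plus deviation (reverting, or jumping too far ahead)
        deviates = prev is not None and not 0 <= m_op - prev <= self.max_jump
        # Rule 2: violation that does not recover within the window
        persistent = (len(self.violations) == self.violations.maxlen
                      and all(self.violations))
        return (violated and deviates) or persistent


# Usage: an orderly progression, then a T^2 violation with a reversion to model 0
mon = ProgressionMonitor(l=2)
normal = [mon.update(m, 0.5, 0.5) for m in (0, 0, 1, 1, 2)]
print(any(normal))                # → False
print(mon.update(0, 1.5, 0.8))    # → True
```

Using `deque(maxlen=...)` keeps the window at a fixed length automatically, so old model choices and violation flags drop out as new samples arrive.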

The overall procedure for AdTsPCA monitoring follows these steps:
1. Pretreat the process data by centering and normalization.
2. Segment the time-series data of normal batches using the modified GG clustering algorithm. The tuning is performed according to the data characteristics so that similar cluster segments are obtained consistently.
3. Merge segments denoting similar phases to produce a reference data set for normal operation.
4. Build a PCA model for each data segment. The resulting time-ordered PCA models are placed in a repository used for monitoring.
5. For online monitoring, calculate T² and SPE and their corresponding limits, then determine the combined similarity index φ according to Eq. (4).
6. Choose the main monitoring model $M_t^{\mathrm{op}}$ by selecting the model with the lowest φ value.
7. Monitor T², SPE, and the progression of $M_t^{\mathrm{op}}$ to determine whether a fault has occurred. A fault is defined as a violation of a monitoring statistic together with a deviation from the expected model progression.

Fig. 3　Monitoring using AdTsPCA

Fig. 4　Building the PCA model repository

3.　Case Study of an Industrial Penicillin Fermentation Process

In this section, the proposed AdTsPCA is tested on a penicillin fermentation process, a multiphase bioprocess with nonlinear dynamics. This study uses the industrial-scale fed-batch fermentation simulator developed by Goldrick et al. (2015). The simulator extends the structured penicillin fermentation model developed by Paul and Thomas (1996) and describes all the component balances relating to the process variables. It improves on previous fed-batch simulations by considering the typical problems encountered in large-scale fermentation, including the challenges associated with controlling the dissolved oxygen during highly viscous fermentation. Additionally, it was validated using the batch records of ten 100,000 L fed-batch penicillin fermentations. The simulator and industrial process data are available for download at www.industrialpenicillinsimulation.com.

The penicillin fermentation takes place in a 100,000 L bioreactor. The reactor is equipped with sensors that measure temperature, pH, and dissolved oxygen (DO2). The flow rates of the substrates, reactor discharge, water injection, and control fluids are also monitored. Online off-gas analysis, which measures the concentrations of carbon dioxide (CO2) and oxygen (O2), is also available. Finally, offline measurements, such as concentration and viscosity, were provided in the batch records and can be generated with the simulator. The feed rates of the substrate, soybean oil, and phenylacetic acid are controlled through sequential batch control using a recipe that follows a predetermined optimal profile. Similarly, the aeration rate and vessel pressure are controlled to maintain the desired dissolved oxygen concentration. The vessel weight is recorded using a load cell and used to schedule discharges that allow for extended penicillin production. Because the production of penicillin is sensitive to temperature and pH, these variables are controlled at 298 K and 6.5, respectively, by PID controllers. The evolution of the centered and scaled variables used in the analysis is shown in Figures 5 and 6. Detailed operating procedures and descriptions of the reactor are available in Goldrick et al. (2015).

A total of five normal batches are simulated to create a reference data set. Eight faulty batches are then simulated for different fault scenarios. The faults are listed in Table 2 and represent some common fault scenarios. The data were generated with a sampling time of 0.2 h, providing 5 samples per hour, and simulated with randomized initial conditions and batch lengths. The randomization parameters were provided with the simulator and were left at their default values. The variables used for monitoring are shown in Table 1. These variables are online measurements that correspond to the various sensor measurements. Offline-measured variables, such as the concentrations of substrate and penicillin, were not included in the monitoring because of their time-delayed nature. These offline-measured variables can be estimated by the simulator from the available data, but they were excluded from the analysis to allow comparison with the batch record data. The hot/cold water flow rate and the acid/base flow rate, the manipulated variables of the PID controllers, were also excluded because of their volatile nature: they alternate between periods of inactivity (constant values) and abrupt increases in response to temperature and pH changes. Combined with oscillations in the temperature and pH, they tend to have an excessive influence on the clustering algorithm and on monitoring.

Fig. 5　Centered and scaled variables for normal batches

Fig. 6　Flowrates of variables controlled by sequential batch control

Table 1　Variables used for monitoring the industrial penicillin fermentation process

Index　Variable　Units

1　Aeration rate　m³ min⁻¹

2　Substrate flow rate　L h⁻¹

3　Soybean oil flow rate　L h⁻¹

4　Phenylacetic acid flow rate　L h⁻¹

5　Reactor temperature　K

6　Reactor pH　—

7　Dissolved oxygen concentration　mg L⁻¹

8　Oxygen concentration in off-gas　%

9　Carbon dioxide concentration in off-gas　%

10　Vessel weight　kg

11　Vessel back pressure　bar

4.　Results and Discussion

In this section, the results of using AdTsPCA for monitoring are presented and compared with conventional single-model and multimodel approaches. The results for normal operation are presented and discussed first. The performance of AdTsPCA in four different fault scenarios is then presented and analyzed. The overall results of AdTsPCA for monitoring all the fault types listed in Table 2 are presented in Table 3 in the final part of this section.

4.1　Normal operation

First, the simulated normal batch data are fed into the clustering algorithm for phase separation. The clustering algorithm for the case study is tuned as follows: c = 10, q = 6, γ = 0.4, and $th_{\mu}$ = 0.3. This is performed for all five normal batches. With these tuning parameters, the clustering results for the normal batches exhibit similar phase divisions and produce 5 clusters (Figure 2). The clusters representing similar phases are then grouped based on knowledge of the process operating steps. The data clusters can be sorted into three general groups: (1) normal operation, (2) faults or abnormal operation, and (3) unknown operation. There is a degree of flexibility in selecting the data segments that represent normal operation. Within the scope of the current work, a single aggregated data set was found to be sufficient to describe normal operation and detect faults. However, we speculate that for processes with high variance in operating conditions and control actions, multiple models will be necessary to define normal operation.

As mentioned in previous sections, the clusters produced by the modified GG clustering are continuous time segments. Conventional clustering methods such as k-means or fuzzy c-means (FCM) produce clusters that may mix points from different time instances in the data, which results in the loss of the information contained in the time dimension of the data. It is also worth noting that, because of their randomized starting centers, k-means and FCM often require multiple runs to obtain appropriate clusters. Their clustering results therefore differ from run to run, and it becomes difficult to relate clusters from different batches. In contrast, the continuous time segments from the modified GG clustering retain the temporal information implicitly, allowing an intuitive relation between within-batch clusters and between-batch clusters. This allows for improved phase division and, subsequently, improved construction of the reference data. Figure 7 compares the clustering results.

Table 2　Faults in the industrial penicillin fermentation process

| Index | Fault type | Time [h] |
|---|---|---|
| F1 | 50% aeration rate step decrease in growth phase | 20 |
| F2 | 50% aeration rate step decrease in production phase | 80 |
| F3 | Aeration rate ramp increase | 40 |
| F4 | 75% substrate feed rate step decrease in growth phase | 2–6 |
| F5 | 50% substrate feed rate step decrease in production phase | 75 |
| F6 | Substrate feed rate ramp decrease | 40 |
| F7 | pH controller fault | 80–100 |
| F8 | Temperature controller fault | 40 |

Fig. 7　Comparison between clustering methods

Fig. 8　Monitoring using MPCA for normal batches

After the clusters are determined, the next step is to build the PCA models. When a single PCA model is used for monitoring, there are numerous violations of T² and SPE in the initial stages of the process, as shown in Figure 8. When multiple models are used, each model has its own confidence limit. To provide a consistent presentation, the monitoring statistics T² and SPE of each monitoring model have been scaled by their corresponding confidence limits. As a result, the 99% confidence limit becomes 1 and the monitoring statistics of all models are expressed on this common scale.

The multimodel approaches, shown in Figures 9 and 10, clearly improve fault detection performance in the initial stages of the batch by reducing the number of false positives. Note that, for Figure 10, the PCA models produced from k-means clustering require significant post-processing effort to relate them to operation stages, and this may sometimes be impossible because a cluster can mix data points from different time periods. This approach is also prone to false positives during transitions between models. Ng and Srinivasan (2009) used FCM clustering to address the issue of false positives by allowing overlap at the borders of the PCA models, although correlating the monitoring model with the process phase still requires significant effort. In contrast, the sequence of monitoring models used by AdTsPCA follows a step-wise, orderly pattern. This behavior is observed in all the normal batch runs of this study and shows that the PCA models inherit the time-ordered nature of their clusters. This property can be used for fault detection and to distinguish false positives from real faults.
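The limit scaling described above can be sketched as follows. This is a minimal sketch, not necessarily the exact procedure used in this work: it assumes that the SPE limit Q(α) of each model is computed with the Jackson and Mudholkar (1979) approximation from the eigenvalues of the principal components not retained in that model, and that the T² limit of each model is already available. The function names `spe_limit` and `scale_by_active_model` are hypothetical.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence


def spe_limit(residual_eigenvalues: Sequence[float], alpha: float = 0.01) -> float:
    """Jackson-Mudholkar approximation of the SPE control limit Q(alpha).

    residual_eigenvalues: eigenvalues of the data covariance matrix for the
    principal components NOT retained in the PCA model.
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    # Standard normal deviate for the upper (1 - alpha) quantile, about 2.326 for alpha = 0.01.
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    bracket = (c_alpha * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)


def scale_by_active_model(values: Sequence[float],
                          active_models: Sequence[int],
                          limits: Dict[int, float]) -> List[float]:
    """Divide each statistic by the limit of the model active at that sample,
    so that the confidence limit of every model maps to 1."""
    return [v / limits[m] for v, m in zip(values, active_models)]
```

For a single residual eigenvalue of 1, `spe_limit([1.0])` gives about 6.59 at α = 0.01, close to the exact χ² quantile of about 6.63 for one degree of freedom. After scaling, any value above 1 violates the limit of whichever model was active, which is what allows the statistics of several models to be drawn against a single limit line.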

From Figures 5, 6, and 9, it is observed that the peaks in the T² and SPE statistics coincide with the peaks of the temperature and pH oscillations. The phases are characterized by these peaks in addition to the sequential batch control actions. Because the PCA models switch in the expected order, the process is still considered to be progressing normally. The T² and SPE violations are therefore labeled as false positives on the basis of the AdTsPCA model-order progression. Such labeling is not possible with conventional multimodel approaches, in which these T² and SPE violations are flagged as faults.
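The labeling logic can be sketched in a few lines. The rule below is an assumption made for illustration: the active model may only stay the same or advance by one, violations are labeled as false positives while that order holds, and violations after an out-of-order switch are labeled as faults. The names `is_orderly` and `label_violations` are hypothetical, and the sketch ignores the time-lagged window used in this work.

```python
from typing import List, Sequence


def is_orderly(prev_model: int, curr_model: int) -> bool:
    """Assumed rule: the active model may stay the same or advance by one."""
    return curr_model == prev_model or curr_model == prev_model + 1


def label_violations(active_models: Sequence[int], violated: Sequence[bool]) -> List[str]:
    """Label each sample as 'normal', 'false_positive' or 'fault'.

    A limit violation is treated as a false positive while the model sequence
    has progressed in order; once an out-of-order switch is seen, later
    violations are labeled as faults.
    """
    labels: List[str] = []
    in_order = True
    for k, (model, flag) in enumerate(zip(active_models, violated)):
        if k > 0 and not is_orderly(active_models[k - 1], model):
            in_order = False
        if not flag:
            labels.append("normal")
        else:
            labels.append("false_positive" if in_order else "fault")
    return labels
```

A conventional multimodel scheme without model-order information would instead have to report every violation as a fault.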

Fig. 9　AdTsPCA monitoring for normal batches

Fig. 10　Multimodel produced from k-means clustering used for monitoring of normal batches

Fig. 11　AdTsPCA monitoring for fault type F1

Fig. 12　Distance to the monitoring models for fault type F1

4.2　Low aeration rate (F1)

Fault type F1 is a case of low aeration rate. It occurs as a step decrease in the aeration rate at t=20.0 h and lasts for the rest of the batch. The monitoring of F1 is shown in Figure 11. AdTsPCA detects the fault at t=20.8 h, although confirming it requires an analysis of the behavior of φ, which is plotted in Figure 12. Recall that φ provides a qualitative measure of the distance between the current data and each of the monitoring models. The process is said to move towards a model as φ for that model decreases, and away from it as φ increases. Raich and Çinar (1996) gave the rule that a statistic value less than 1 is considered normal. In Figure 12, the process moves away from all the other models, whereas in normal operation it would move away from Model 1 only and towards Model 2. This information, combined with the violation of both monitoring statistics for the current model, confirms the occurrence of the fault.
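The reasoning applied to Figure 12 can be sketched as a comparison of φ between two time steps. This is a hypothetical illustration: the φ values are made up, the function names are not from this work, and only the threshold of 1 comes from the Raich and Çinar (1996) rule quoted above.

```python
from typing import Dict

NORMAL_LIMIT = 1.0  # Raich and Cinar (1996): a statistic below 1 is considered normal


def phi_trends(phi_prev: Dict[int, float], phi_curr: Dict[int, float],
               tol: float = 0.0) -> Dict[int, str]:
    """For each model, classify whether the process moves towards or away from it."""
    trends: Dict[int, str] = {}
    for model, prev in phi_prev.items():
        delta = phi_curr[model] - prev
        if delta < -tol:
            trends[model] = "towards"
        elif delta > tol:
            trends[model] = "away"
        else:
            trends[model] = "steady"
    return trends


def moving_away_from_all(phi_prev: Dict[int, float], phi_curr: Dict[int, float],
                         tol: float = 0.0) -> bool:
    """True if phi increases for every model, as observed for fault F1."""
    return all(t == "away" for t in phi_trends(phi_prev, phi_curr, tol).values())


def within_any_model(phi_curr: Dict[int, float]) -> bool:
    """True if the current data are within the normal region of at least one model."""
    return any(value < NORMAL_LIMIT for value in phi_curr.values())
```

In normal operation one would expect a mix of "away" (from the current model) and "towards" (the next model); a simultaneous move away from every model, together with limit violations, points to a fault.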

4.3　Gradual decrease in substrate feed rate (F6)

The gradual decrease in substrate feed rate is simulated as a ramp decrease starting at t=40.0 h. AdTsPCA detects the fault at t=50.2 h, based primarily on the T² and SPE statistics, as shown in Figure 13. The substrate feed rate diverges from its normal constant flow into a slow decline. Because the ramp is subtle, detection is delayed by a significant margin. There are two reasons for this delay: (1) the magnitude of the change in substrate feed rate is relatively small compared with the changes in temperature, pressure, and gas concentrations, and (2) the ramp is a gradual change that remained within the confidence limits of the current monitoring model. When the process transitions to Model 3, the accumulated ramp change is sufficient to exceed the confidence limit of the new monitoring model, and the monitoring statistics respond with a sudden spike. The monitoring PCA model makes the expected transition (Model 2→3), but the significant, sustained violation of both monitoring statistics over the time-lagged window is sufficient to detect the fault.
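A time-lagged confirmation window of this kind can be sketched as follows, assuming the statistics have already been scaled so that the limit is 1. The rule (both statistics must exceed 1 for `window` consecutive samples) and the function name `detect_fault` are assumptions for illustration; the window length used in this work is not reproduced here.

```python
from typing import Optional, Sequence


def detect_fault(t2_scaled: Sequence[float], spe_scaled: Sequence[float],
                 window: int) -> Optional[int]:
    """Return the sample index at which a fault is declared, or None.

    Assumed rule: both limit-scaled statistics must exceed 1 for `window`
    consecutive samples; the fault is declared at the last sample of that run.
    """
    run = 0
    for k, (t2, spe) in enumerate(zip(t2_scaled, spe_scaled)):
        if t2 > 1.0 and spe > 1.0:
            run += 1
            if run >= window:
                return k
        else:
            run = 0  # a brief spike that does not persist is not confirmed
    return None
```

Requiring persistence suppresses short spikes at model borders, at the cost of delaying detection by up to the window length.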

With single-model PCA monitoring (MPCA), this ramp decrease is only detected at t=60.4 h, later than with the multimodel approaches, which are observed to be more sensitive to gradual changes, particularly at model borders. AdTsPCA (t=50.2 h) is more sensitive than Disjoined-PCA (t=52.4 h) and Adjoined-PCA (t=52.4 h). This is because the data segments of AdTsPCA are clustered with respect to both variance and time and form distinct phases that can be related to operating phases. The data segments are not contaminated with high-variance data from another operating phase. PCA models constructed from these data are more vulnerable to false positives but

4.4　pH controller fault (F7)

In this scenario, the pH controller fault is a step decrease in the base flow rate. It occurs at t=80.0 h and lasts until t=100.0 h. As seen in Figure 14, AdTsPCA detects the fault almost immediately, at t=80.6 h, using the combination of T², SPE, and the unexpected change of monitoring model.

The change in monitoring model is of particular interest. An orderly progression 1→2→3→4→5 is expected in normal operation, so the change from 5→1 indicates that a fault has occurred. AdTsPCA responds to the fault by switching to Model 1, which it considers closest to the faulty data. Recall that Model 1 is identified as the initial growth stage of the fermentation and is characterized by strong fluctuations in the variables, particularly pH and temperature. The pH control recovers at t=100.0 h, and the monitoring identifies that the characteristics of the current phase are most similar to Model 5 and switches accordingly. Although the recovered T² and SPE values do not match Model 5, it can be inferred that the fermentation has arrived at a different equilibrium closest in characteristics to Model 5. Recovery to a different equilibrium is characteristic of batch processes. In the current implementation, the AdTsPCA monitoring statistics do not recover from the fault. This could potentially be addressed either by using the data segment that contains the new equilibrium to form a new model that defines normal recovery from F7, or by integrating that data segment into Model 5.
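The model switching discussed above can be sketched by selecting the model with the smallest φ at each sample and then listing the transitions that break the expected order. Both the selection rule and the helper names are assumptions for illustration, and the model sequence in the usage check is a hypothetical toy that mirrors the 5→1→5 pattern described for F7.

```python
from typing import Dict, List, Sequence, Tuple


def select_model(phi: Dict[int, float]) -> int:
    """Assumed rule: the active model is the one with the smallest distance index phi."""
    return min(phi, key=phi.get)


def transitions(active_models: Sequence[int]) -> List[Tuple[int, int]]:
    """List every change of active model as a (from, to) pair."""
    return [(a, b) for a, b in zip(active_models, active_models[1:]) if a != b]


def unexpected_transitions(active_models: Sequence[int]) -> List[Tuple[int, int]]:
    """Transitions that do not follow the expected 1 -> 2 -> ... -> n order."""
    return [(a, b) for a, b in transitions(active_models) if b != a + 1]
```

For a toy sequence `[4, 5, 5, 1, 1, 5]`, both the jump to Model 1 at fault onset and the return to Model 5 after recovery are reported as unexpected, whereas an orderly run such as `[1, 1, 2, 3, 3, 4, 5]` produces none.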

4.5　Summary

The results for all faults and the comparison between the methods are summarized in Table 3. The detection time and false positive rate are calculated for each method. For normal operation, the false positive rate is the number of data points flagged as faults over the whole process divided by the total number of data points. For fault cases, it is calculated only up to the data point where the fault is detected. Table 3 compares the single-model approach (MPCA) with multimodel approaches based on k-means clustering (DisPCA), fuzzy c-means clustering (AdPCA), and modified GG clustering (AdTsPCA).
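The false positive rate defined above reduces to simple counting. The sketch below is a hypothetical helper, not the authors' code; it assumes that, for a fault case, the samples counted are those before the detection sample, excluding the detection sample itself.

```python
from typing import Optional, Sequence


def false_positive_rate(flags: Sequence[bool], detection_index: Optional[int] = None) -> float:
    """Percentage of samples flagged as faults.

    Normal batch: detection_index=None, so all samples are used.
    Fault batch: only samples before detection_index are counted (assumed
    to exclude the detection sample itself).
    """
    end = len(flags) if detection_index is None else detection_index
    if end == 0:
        return 0.0
    return 100.0 * sum(flags[:end]) / end
```

For example, 2 flagged samples out of 10 in a normal batch give 20%, while the same two flags counted only over the first 5 samples of a fault batch give 40%.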

All faults except F4 are detected by all four methods. To understand why F4 is not detected, recall from Figure 6 that the substrate flow rate is low: the step decrease is from 8 L h−1 to 2 L h−1. Although this is a 75% decrease, it is a rather small disturbance compared with other variables such as temperature or pH, and its short duration (2–6 h) makes it difficult to detect. It can therefore be said that, even though the variables are centered and scaled, temperature and pH have a dominant influence on the measured variance. Small faults or ramp faults that occur during strong fluctuations of these dominant variables tend to escape detection, unless the fault persists long enough to sufficiently influence the current monitoring statistics or the process transitions into a different phase where the dominant variables have less influence.

Fig. 13　AdTsPCA monitoring for fault type F6

Fig. 14　AdTsPCA monitoring for fault type F7

Note that, compared with the other faults, the false alarm rate for F8 is relatively high for DisPCA and AdPCA, at 10.1% and 9.8%, respectively. This is because temperature is a high-variance variable, particularly in the initial stages of the process, and the PCA models of DisPCA and AdPCA cannot accurately describe the initial normal operation for F8. For these two methods, it may be necessary to add more normal batches to the reference data set to cover a wider range of normal operation and improve performance. AdTsPCA reduces this problem by using the model order to eliminate false positives. This is demonstrated for F1 (Figure 11), where the false alarm rate of AdTsPCA is 0% because T² and the expected model order are combined to suppress the false alarms in the SPE statistic.

The results show that MPCA, which uses a single model for monitoring, is prone to false positives and detects faults more slowly than the multimodel approaches DisPCA, AdPCA, and AdTsPCA. AdPCA improves upon DisPCA by reducing the number of false positives that occur during model switching; Table 3 confirms that AdPCA has a lower false alarm percentage than DisPCA for all detected faults. AdTsPCA detects most faults slightly faster (the exception is F8, which AdPCA detects at 40.2 h versus 40.4 h for AdTsPCA) owing to the constrained clustering, which produces clusters of successive time points. This makes the resulting PCA models more sensitive to faults. The trade-off is that it also makes the PCA models more prone to false positives, but this is partially addressed through the implied information about the time order of the PCA models. Evaluating the monitoring statistics and the model order together yields an improved false alarm rate. The false alarms in AdTsPCA occur mainly during model switching, where a temporary spike in the dominant variables and a brief violation of both T² and SPE create uncertainty about whether the current data indicate a fault, until it can be confirmed whether the monitoring statistics recover within the period of the time-lagged window. In general, AdTsPCA provides competitive results, with improved detection speed and a superior false alarm rate. Its main advantage is that the model order can be used in conjunction with the monitoring statistics to filter out false alarms.
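The detection-speed comparison can be checked directly from Table 3. The sketch below transcribes the detection times reported there (F4 is omitted because no method detects it) and computes detection delays as detection time minus fault onset. Taking the onset of F7 as the start of its 80.0–100.0 h window is an assumption, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Fault onset [h] and detection times [h] transcribed from Table 3.
TABLE3: Dict[str, Tuple[float, Dict[str, float]]] = {
    "F1": (20.0, {"MPCA": 21.2, "DisPCA": 21.0, "AdPCA": 21.0, "AdTsPCA": 20.8}),
    "F2": (80.0, {"MPCA": 81.0, "DisPCA": 80.8, "AdPCA": 80.8, "AdTsPCA": 80.6}),
    "F3": (40.0, {"MPCA": 41.2, "DisPCA": 40.8, "AdPCA": 40.8, "AdTsPCA": 40.4}),
    "F5": (75.0, {"MPCA": 78.4, "DisPCA": 76.2, "AdPCA": 76.2, "AdTsPCA": 75.8}),
    "F6": (40.0, {"MPCA": 60.4, "DisPCA": 52.4, "AdPCA": 52.4, "AdTsPCA": 50.2}),
    "F7": (80.0, {"MPCA": 81.0, "DisPCA": 81.0, "AdPCA": 81.0, "AdTsPCA": 80.6}),
    "F8": (40.0, {"MPCA": 40.6, "DisPCA": 40.4, "AdPCA": 40.2, "AdTsPCA": 40.4}),
}


def detection_delays(table: Dict[str, Tuple[float, Dict[str, float]]]) -> Dict[str, Dict[str, float]]:
    """Delay [h] = detection time - fault onset, rounded to the table's 0.1 h resolution."""
    return {fault: {method: round(t - onset, 1) for method, t in times.items()}
            for fault, (onset, times) in table.items()}


def fastest(delays_for_fault: Dict[str, float]) -> List[str]:
    """Methods with the smallest delay for one fault (ties are all returned)."""
    best = min(delays_for_fault.values())
    return sorted(m for m, d in delays_for_fault.items() if d == best)
```

With these values, AdTsPCA has the shortest delay for six of the seven detected faults (for example, 10.2 h versus 20.4 h for MPCA on F6), and AdPCA is fastest only for F8.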

Conclusions

The monitoring of batch processes is important to ensure normal operation and to enable timely recovery from process faults. Batch processes have proven challenging to monitor because of their inherent properties. A single-model approach is insufficient to capture the multitude of variations in a batch process. Multiple-model approaches address some of these deficiencies, but data segmentation and identification then become important preprocessing steps before model construction, because the PCA models inherit the implicit properties of the data. Conventional segmentation using clustering algorithms has been explored in several works (Yao and Gao, 2009), though it has proven challenging to relate the clusters to process phases. There are also difficulties in managing and merging normal batches to build a proper reference case for monitoring. AdTsPCA addresses these issues through the use of overlapping, time-ordered PCA models and partially addresses the issue of phase identification through the implied time information. It allows a more intuitive merging of clusters and understanding of process variations in relation to time. AdTsPCA possesses the following main features:

1. AdTsPCA detects faults based on overlapping, time-ordered PCA models, using the monitoring statistics T² and SPE combined with the supplementary information contained in the model transitions.

2. AdTsPCA reduces the difficulties in phase identification and cluster merging by providing consistent clusters with time labels that can be easily related across different batches.

For future development, the model repository can be expanded by using the modified GG clustering algorithm to identify the phases of additional batches. The proposed AdTsPCA can also be extended to include PCA models built from data segments identified as faulty or recovered process operation.

Table 3　Summary of monitoring results for penicillin fermentation process

| Index | Fault time [h] | Multiway-PCA detected [h] | Multiway-PCA false positives [%] | Disjoined-PCA detected [h] | Disjoined-PCA false positives [%] | Adjoined-PCA detected [h] | Adjoined-PCA false positives [%] | Adjoined-TsPCA detected [h] | Adjoined-TsPCA false positives [%] |
|---|---|---|---|---|---|---|---|---|---|
| Normal | n/a | n/a | 14.6 | n/a | 8.2 | n/a | 7.4 | n/a | 0.2 |
| F1 | 20.0 | 21.2 | 25.8 | 21.0 | 7.5 | 21.0 | 6.9 | 20.8 | 0.0 |
| F2 | 80.0 | 81.0 | 17.2 | 80.8 | 8.1 | 80.8 | 6.4 | 80.6 | 0.2 |
| F3 | 40.0 | 41.2 | 21.0 | 40.8 | 6.6 | 40.8 | 5.8 | 40.4 | 0.1 |
| F4 | 2.0–6.0 | not detected | 12.6 | not detected | 7.3 | not detected | 6.9 | not detected | 0.2 |
| F5 | 75.0 | 78.4 | 16.8 | 76.2 | 6.8 | 76.2 | 6.0 | 75.8 | 0.2 |
| F6 | 40.0 | 60.4 | 22.2 | 52.4 | 7.7 | 52.4 | 6.5 | 50.2 | 0.1 |
| F7 | 80.0–100.0 | 81.0 | 16.9 | 81.0 | 8.7 | 81.0 | 6.5 | 80.6 | 0.2 |
| F8 | 40.0 | 40.6 | 22.8 | 40.4 | 10.1 | 40.2 | 9.8 | 40.4 | 0.1 |

Nomenclature

A = distance norm
c = number of segments
e = prediction error of PCA model
E = principal component residuals
L = matrix containing the eigenvectors of the PCA models
O = compatibility matrix
Q(α) = upper control limit for SPE with significance level α
q = number of retained principal components
r = rank of A_a, the distance norm corresponding to the a-th cluster
S = homogeneous time-series segment
S_PCA = PCA similarity factor
SPE = squared prediction error
T² = Hotelling's T² statistic
T²(α) = upper control limit for T² with significance level α
t = time
thμ = threshold value for cluster overlap
U = principal component scores
V = principal component loadings
v_a^t = center of the segment in time
v_a^x = coordinate of the a-th cluster center in the feature space
X = data covariance matrix
x = data point
α = significance level based on standard normal distribution
η = parameters of the cluster: covariance matrix, unconditional cluster probability, and cluster center coordinates
μ_a,k = membership of i-th data point to the a-th segment
φ = combined discriminant similarity index
σ_a,t = variances of the membership function
τ1 = degree of parallelism
τ2 = degree of closeness

Subscripts
a = specific index for segments/clusters
b = specific index for segments/clusters
i = index for data point
t = time index for data point

Literature Cited

Abonyi, J., B. Feil, S. Nemeth and P. Arva; "Modified Gath–Geva Clustering for Fuzzy Segmentation of Multivariate Time-series," Fuzzy Sets Syst., 149, 39–56 (2005)

Bezdek, J. C. and J. C. Dunn; "Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Distributions," IEEE Trans. Comput., 24, 835–838 (1975)

Camacho, J. and J. Picó; "Multi-phase Principal Component Analysis for Batch Processes Modelling," Chemom. Intell. Lab. Syst., 81, 127–136 (2006)

Cinar, A., S. Parulekar, C. Undey and G. Birol; Batch Fermentation: Modeling, Monitoring, and Control, Chemical Industries, pp. 4–5, Marcel Dekker, Inc., New York, United States (2003)

Gath, I. and A. B. Geva; "Unsupervised Optimal Fuzzy Clustering," IEEE Trans. Pattern Anal. Mach. Intell., 11, 773–780 (1989)

Goldrick, S., A. Ştefan, D. Lovett, G. Montague and B. Lennox; "The Development of an Industrial-scale Fed-batch Fermentation Simulation," J. Biotechnol., 193, 70–82 (2015)

Höppner, F., F. Klawonn, R. Kruse and T. Runkler; Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, pp. 157–184, John Wiley & Sons Ltd., Chichester, U.K. (1999)

Jackson, J. E. and G. S. Mudholkar; "Control Procedures for Residuals Associated With Principal Component Analysis," Technometrics, 21, 341–349 (1979)

Jackson, J.; A User's Guide to Principal Components, Wiley Series in Probability and Statistics, pp. 123–141, John Wiley & Sons Ltd., Chichester, U.K. (2003)

Joe Qin, S.; "Statistical Process Monitoring: Basics and Beyond," J. Chemometr., 17, 480–502 (2003)

Jolliffe, I.; Principal Component Analysis, Springer Series in Statistics, pp. 92–110, Springer, Berlin, Germany (2002)

Kaymak, U. and R. Babuska; "Compatible Cluster Merging for Fuzzy Modelling," Proceedings of 1995 IEEE International Conference on Fuzzy Systems (International Joint Conference of the Fourth IEEE International Conference on Fuzzy Systems and the Second International Fuzzy Engineering Symposium), vol. 2, pp. 897–904 (1995)

Kelly, P. M.; An Algorithm for Merging Hyperellipsoidal Clusters, Tech. Report LA-UR-94-3306, Los Alamos National Laboratory, Los Alamos, U.S.A. (1994)

Kosanovich, K. A., K. S. Dahl and M. J. Piovoso; "Improved Process Understanding Using Multiway Principal Component Analysis," Ind. Eng. Chem. Res., 35, 138–146 (1996)

Krzanowski, W. J.; "Between-Groups Comparison of Principal Components," J. Am. Stat. Assoc., 74, 703–707 (1979)

Lu, N., F. Gao and F. Wang; "Sub-PCA Modeling and On-line Monitoring Strategy for Batch Processes," AIChE J., 50, 255–259 (2004)

Ng, Y. S. and R. Srinivasan; "An Adjoined Multi-model Approach for Monitoring Batch and Transient Operations," Comput. Chem. Eng., 33, 887–902 (2009)

Nomikos, P. and J. F. MacGregor; "Monitoring Batch Processes Using Multiway Principal Component Analysis," AIChE J., 40, 1361–1375 (1994)

Paul, G. C. and C. R. Thomas; "A Structured Model for Hyphal Differentiation and Penicillin Production Using Penicillium chrysogenum," Biotechnol. Bioeng., 51, 558–572 (1996)

Qin, S. J.; "Survey on Data-driven Industrial Process Monitoring and Diagnosis," Annu. Rev. Contr., 36, 220–234 (2012)

Raich, A. and A. Çinar; "Statistical Process Monitoring and Disturbance Diagnosis in Multivariable Continuous Processes," AIChE J., 42, 995–1009 (1996)

Singhal, A. and D. E. Seborg; "Matching Patterns from Historical Data Using PCA and Distance Similarity Factors," American Control Conference, 2001, vol. 2, pp. 1759–1764 (2001)

Tipping, M. E. and C. M. Bishop; "Mixtures of Probabilistic Principal Component Analyzers," Neural Comput., 11, 443–482 (1999)

Ündey, C. and A. Cinar; "Statistical Monitoring of Multistage, Multiphase Batch Processes," IEEE Control Systems, 22, 40–52 (2002)

Venkatasubramanian, V., R. Rengaswamy, S. N. Kavuri and K. Yin; "A Review of Process Fault Detection and Diagnosis: Part III: Process History Based Methods," Comput. Chem. Eng., 27, 327–346 (2003)

Yao, Y. and F. Gao; "A Survey on Multistage/Multiphase Statistical Modeling Methods for Batch Processes," Annu. Rev. Contr., 33, 172–183 (2009)

http://dx.doi.org/10.1016/j.fss.2004.07.008

http://dx.doi.org/10.1109/T-C.1975.224317

http://dx.doi.org/10.1016/j.chemolab.2005.11.003

http://dx.doi.org/10.1109/34.192473

http://dx.doi.org/10.1016/j.jbiotec.2014.10.029

http://dx.doi.org/10.1080/00401706.1979.10489779

http://dx.doi.org/10.1002/cem.800

http://dx.doi.org/10.1021/ie9502594

http://dx.doi.org/10.1080/01621459.1979.10481674

http://dx.doi.org/10.1002/aic.10024

http://dx.doi.org/10.1016/j.compchemeng.2008.11.014

http://dx.doi.org/10.1002/(SICI)1097-0290(19960905)51:5%3c558::AID-BIT8%3e3.0.CO;2-B

http://dx.doi.org/10.1016/j.arcontrol.2012.09.004

http://dx.doi.org/10.1162/089976699300016728

http://dx.doi.org/10.1109/MCS.2002.1035216

http://dx.doi.org/10.1016/S0098-1354(02)00162-X



