Mechanical Systems and Signal Processing 21 (2007) 809–823

www.elsevier.com/locate/jnlabr/ymssp

An adaptive predictor for dynamic system forecasting

Wilson Wang*

Department of Mechanical Engineering, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, Canada P7B 5E1

*Tel.: +1 807 766 7174; fax: +1 807 343 8928. E-mail address: [email protected].

0888-3270/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.ymssp.2005.12.008

Received 23 July 2005; received in revised form 11 December 2005; accepted 14 December 2005

Available online 8 February 2006

Abstract

A reliable and real-time predictor is very useful to a wide array of industries to forecast the behaviour of dynamic systems. In this paper, an adaptive predictor is developed based on the neuro-fuzzy approach to dynamic system forecasting. An adaptive training technique is proposed to improve forecasting performance, accommodate different operation conditions, and prevent possible trapping due to local minima. The viability of the developed predictor is evaluated by using both gear system condition monitoring and material fatigue testing. The investigation results show that the developed adaptive predictor is a reliable and robust forecasting tool. It can capture the system’s dynamic behaviour quickly and track the system’s characteristics accurately. Its performance is superior to other classical forecasting schemes.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Neuro-fuzzy forecasting scheme; Adaptive training; Machinery condition monitoring; Model uncertainty; Fatigue testing

1. Introduction

A reliable predictor is very useful to a wide range of industries to forecast the upcoming states of a dynamic system. In mechanical systems, for example, the forecasting information can be used for: (1) condition monitoring to provide an accurate alarm before a fault reaches critical levels so as to prevent machinery performance degradation, malfunction, or catastrophic failure; (2) scheduling of repairs and predictive/preventive maintenance in manufacturing facilities; and (3) predictive and fault-tolerant control. System state forecasting utilises available observations to predict the future states of a dynamic system. The observations can be patterns from such information carriers as temperature, acoustic signal, or vibration. The vibration-based approach, however, is the most commonly used technique because of the ease of measurement and analysis. Thus, it is also used in this study.

Time-series forecasting can be performed for one-step or multiple-step predictions. The more steps ahead, the less reliable the forecasting operation is, because the approaches involved in multiple-step prediction are built on one-step operations. Thus, this research focuses on one-step forecasting operations.

Several techniques have been proposed in the literature for time-series prediction. The classical approaches are the use of stochastic models [1] and dynamics-based techniques [2,3]; usually, these methods are easy to implement but have difficulty forecasting the behaviour of complex dynamic systems. Recent interest in time-series forecasting has focused on the use of flexible models such as neural networks (NNs). After being properly trained, NNs can represent the non-linear relationship between the previous states and the future states of a dynamic system [4]. NN-based predictors have two typical network architectures: feedforward and recurrent networks, both of which have been employed in some applications [5,6]. Advanced investigation has indicated that the recurrent network predictors outperform those based on the feedforward networks [7]. NN forecasting schemes, however, have some disadvantages: their forecasting operation is opaque to users, and the convergence of training is usually slow and not guaranteed. To solve these problems, synergetic schemes based on NNs and fuzzy logic have been adopted in time-series forecasting [8]. In such synergetic schemes, the fuzzy logic provides NNs with a structural framework with high-level if-then rule-based thinking and reasoning, whereas the NNs provide the fuzzy systems with learning capability [9,10]. Jang et al. [11] proposed a neuro-fuzzy (NF) scheme for time-series forecasting. By simulation, they found that the NF predictor performed better than both the stochastic models and the feedforward NNs. The author and his research team developed an NF prognostic system for machinery condition applications [12]. Their investigation indicated that if an NF predictor is properly trained, it performs even better than both the feedforward and the recurrent network forecasting schemes.

Fig. 1. The architecture of the forecasting tool based on the adaptive predictor. [Blocks: Dynamic System, Data Collection, Signal Processing, Post Processing, Training, Predictor, Database.]

Even though the NF predictors have demonstrated their superior properties to other classical time-series forecasting schemes, advanced research needs to be done in a few aspects before they can be applied to general real-time industrial applications [13]: (1) improving their application robustness to accommodate different system conditions; (2) mitigating the requirements for the representative data sets; and (3) improving their convergence properties, especially for complex operation applications. Consequently, the aim of this paper is to develop an adaptive predictor to solve these problems in order to provide industries with a more reliable and real-time forecasting tool. Fig. 1 schematically shows the architecture of the forecasting tool based on the proposed adaptive predictor. Signals are collected using corresponding sensors. After being properly filtered and sampled, the signals are fed into a computer. In further processing, the first step is to generate the representative features from the collected signals by applying different signal processing techniques. Post-processing is done to enhance the feature characteristics and derive monitoring indices for forecasting operations. All the involved signal processing techniques and forecasting schemes are coded in MATLAB and then translated to a C++ environment for general application purposes.

This presentation starts with a description of the adaptive predictor and the corresponding adaptive training technique. Next, the predictor is implemented for real-time monitoring applications. The viability of this new predictor is verified by online experimental tests related to gear condition monitoring and material fatigue testing.

2. The adaptive NF predictor

In this proposed adaptive predictor, the forecasting reasoning is performed by fuzzy logic, whereas the fuzzy system parameters are trained by using NNs. To make it compatible with those in the author’s prior work [12], four input variables $\{x_{-3r}, x_{-2r}, x_{-r}, x_0\}$ are utilised in this case, where $x_0$ represents the current state of the dynamic system and $r$ denotes the time step. If two membership functions (MFs), small and large, are assigned to each input variable, then $2^4 = 16$ rules will be formulated to predict the future state of a dynamic system, $x_{+r}$,

$$\Re_j:\ \text{If } (x_0 \text{ is } M_{j0}) \text{ and } (x_{-r} \text{ is } M_{j1}) \text{ and } (x_{-2r} \text{ is } M_{j2}) \text{ and } (x_{-3r} \text{ is } M_{j3}) \text{ then } x_{+r} = C_j, \tag{1}$$

where $C_j = c_{j0}x_0 + c_{j1}x_{-r} + c_{j2}x_{-2r} + c_{j3}x_{-3r} + c_{j4}$; $M_{ji}$ denote MFs; $c_{ji}$ are constants; $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$.

Fig. 2. The network architecture of the adaptive predictor.

The network architecture of this adaptive predictor is schematically shown in Fig. 2. It is a 5-layer network in which each node performs a particular activation function on the incoming signals. The links, with unity weights, represent the flow direction of signals between nodes. The nodes in layer 1 transmit input signals to the next layer. Each node in layer 2 acts as an MF, which is either a single node that performs a simple activation function or multilayer nodes that perform a complex function. Different from the general NF predictor that was proposed in [12], this adaptive predictor has a feedback link to each node in layer 2 to improve the forecasting accuracy and convergence property (stability). The signal in each feedback link represents the node output in the previous time step (i.e., at $n-1$). For example, if sigmoid MFs with parameters $\{a_{ji}, b_{ji}\}$ are utilised in this case, then

$$\mu_{M_{ji}}\left(x^{(n)}_{-ir}\right) = \frac{1}{1 + \exp\left[-a_{ji}(X_i - b_{ji})\right]}, \tag{2}$$

where

$$X_i = x^{(n)}_{-ir} + \mu_{M_{ji}}\left(x^{(n-1)}_{-ir}\right) = x^{(n)}_{-ir} + \frac{1}{1 + \exp\left[-a_{ji}\left(x^{(n-1)}_{-ir} - b_{ji}\right)\right]}, \tag{3}$$

and $\mu_{M_{ji}}\left(x^{(n)}_{-ir}\right)$ and $\mu_{M_{ji}}\left(x^{(n-1)}_{-ir}\right)$ represent the MF values at the current and previous time steps, respectively.

Each node in layer 3 performs a fuzzy T-norm operation. If a max-product operator is applied in this case, the rule firing strength is

$$\mu_j = \prod_{i=0}^{3} \mu_{M_{ji}}(x_{-ir}), \quad j = 1, 2, \ldots, 16. \tag{4}$$

All the rule firing strengths are normalised in layer 4. After a linear combination of the input variables in layer 5, the predicted output $x_{+r}$ is obtained by using the centroid defuzzification:

$$x_{+r} = \sum_{j=1}^{16} \bar{\mu}_j \left(c_{j0}x_0 + c_{j1}x_{-r} + c_{j2}x_{-2r} + c_{j3}x_{-3r} + c_{j4}\right), \tag{5}$$

where $\bar{\mu}_j = \mu_j / \sum_{j=1}^{16} \mu_j$ denotes the normalised rule firing strength.
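The forward pass through the five layers, Eqs. (2)–(5), can be sketched in a few lines of Python. This is a minimal illustration, not the author's MATLAB/C++ implementation: the function names, the data layout of `mf_params` and `consequents`, and the order in which the 16 rules enumerate the small/large MF combinations are all assumptions made for the example.

```python
import math
from itertools import product


def sigmoid(a, b, x):
    """Sigmoid membership value 1 / (1 + exp(-a (x - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))


def forecast(x_now, x_prev, mf_params, consequents):
    """One-step forecast x_{+r} following Eqs. (2)-(5).

    x_now, x_prev : [x_0, x_{-r}, x_{-2r}, x_{-3r}] at time steps n and n-1
    mf_params     : mf_params[i][s] = (a, b) for input i and MF s (hypothetical layout)
    consequents   : consequents[j] = [c_j0, c_j1, c_j2, c_j3, c_j4] for rule j
    """
    # Layer 2: recurrent MFs; X_i adds the previous-step MF value (Eq. 3).
    grades = []
    for i in range(4):
        row = []
        for a, b in mf_params[i]:
            X = x_now[i] + sigmoid(a, b, x_prev[i])
            row.append(sigmoid(a, b, X))
        grades.append(row)
    # Layer 3: product of one MF grade per input for each of the 2^4 rules (Eq. 4).
    # The enumeration order of the rules is an assumption of this sketch.
    rules = list(product((0, 1), repeat=4))
    strengths = [math.prod(grades[i][s] for i, s in enumerate(rule)) for rule in rules]
    # Layers 4-5: normalise the strengths and combine the linear consequents (Eq. 5).
    total = sum(strengths)
    out = 0.0
    for mu, c in zip(strengths, consequents):
        C = sum(ci * xi for ci, xi in zip(c[:4], x_now)) + c[4]
        out += (mu / total) * C
    return out
```

Because the normalised firing strengths sum to one, a predictor whose 16 rules all share the same consequent simply returns that consequent, which gives a quick sanity check on any implementation.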


The input variables as well as the forecasted output $x_{+r}$ at each time step are stored in the training database.

The predictor, as represented in Eq. (1), can also be adopted for variable step forecasting operations. If the time interval is $kr$, where $k$ is an integer, the input variables to the predictor become $\{x_{-3kr}, x_{-2kr}, x_{-kr}, x_0\}$. The predicted state $x_{+kr}$ is equivalent to a multiple-step forecasting operation, and the corresponding forecasting rules will be represented by

$$\Re_j:\ \text{If } (x_0 \text{ is } M_{j0}) \text{ and } (x_{-kr} \text{ is } M_{j1}) \text{ and } (x_{-2kr} \text{ is } M_{j2}) \text{ and } (x_{-3kr} \text{ is } M_{j3}) \text{ then } x_{+kr} = C_j, \tag{6}$$

where $C_j = c_{j0}x_0 + c_{j1}x_{-kr} + c_{j2}x_{-2kr} + c_{j3}x_{-3kr} + c_{j4}$, $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$. $M_{ji}$ and $c_{ji}$ may differ from those in Eq. (1), and can be obtained by using training data recalibration and adaptive training as discussed in the next section.
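The variable-step arrangement of Eq. (6) amounts to re-sampling the stored series at a stride of $k$ base steps. The helper below is a hypothetical sketch (the name `lagged_inputs` and the list-based layout are assumptions) showing how the input vectors and targets could be assembled from a series sampled every $r$.

```python
def lagged_inputs(series, k=1):
    """Return (inputs, target) pairs for a forecasting interval of k base steps.

    Each input is [x_0, x_{-kr}, x_{-2kr}, x_{-3kr}] and the target is x_{+kr},
    assuming consecutive entries of `series` are one base step r apart.
    """
    pairs = []
    for n in range(3 * k, len(series) - k):
        inputs = [series[n - i * k] for i in range(4)]
        pairs.append((inputs, series[n + k]))
    return pairs
```

With `k = 1` this reproduces the one-step layout of Eq. (1); larger `k` leaves fewer usable pairs, since each pair spans $4k$ base steps.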

3. The adaptive training technique

3.1. The training algorithm

Before being applied to real-time applications, a forecasting scheme should be properly trained using representative data sets, which should cover all the possible application conditions [14,15]. Such a requirement is usually difficult to achieve in real-world applications because most machinery operates in a noisy and uncertain environment. Furthermore, the classical forecasting schemes are mainly used for time-invariant systems or systems with slowly varying model parameters. However, machinery dynamic characteristics may change suddenly just after repair or regular maintenance [16]. Therefore, a real-time predictor should have sufficient adaptive capability to accommodate different operation conditions [17]. In this section, an adaptive training technique is adopted to address these problems.

According to [12], an NF forecasting scheme can be properly trained as long as the number of representative data pairs is more than five times the number of parameters to be updated. The developed NF predictor, as represented in Eq. (1), has 96 unknown parameters (16 MF parameters and 80 consequent parameters), so more than 5 × 96 = 480 data pairs are needed. Therefore, to improve the training efficiency and facilitate the database management, the size of the training data sets in this adaptive predictor is limited to $N = 500$. The data sets in the training database are represented as

$$T_d = \begin{bmatrix} x^{(1)}_{-3r} & x^{(1)}_{-2r} & x^{(1)}_{-r} & x^{(1)}_{0} & x^{(1)}_{+r} \\ x^{(2)}_{-3r} & x^{(2)}_{-2r} & x^{(2)}_{-r} & x^{(2)}_{0} & x^{(2)}_{+r} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x^{(N)}_{-3r} & x^{(N)}_{-2r} & x^{(N)}_{-r} & x^{(N)}_{0} & x^{(N)}_{+r} \end{bmatrix}. \tag{7}$$

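For the $n$th input data pair $\{x^{(n)}_{-3r}, x^{(n)}_{-2r}, x^{(n)}_{-r}, x^{(n)}_{0}\}$, $n = 1, 2, \ldots, N$, the forecasted state $x^{(n)}_{+r}$ is computed by Eq. (5):

$$x^{(n)}_{+r} = \sum_{j=1}^{16} \bar{\mu}_j \left(c_{j0}x^{(n)}_{0} + c_{j1}x^{(n)}_{-r} + c_{j2}x^{(n)}_{-2r} + c_{j3}x^{(n)}_{-3r} + c_{j4}\right). \tag{8}$$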

Different from other online training methods that use the definite weight functions [8], this adaptive training technique will use an exponential gain function to highlight the contribution of the data sets from a time-variant series

$$g_n = \frac{1}{1 + \exp\left[a(n/2 - b)\right]}, \tag{9}$$

where $g_n \in [0, 1]$; $a = 5 \times 10^{-5} N$, $b = N/2$, and $N$ is the total number of training data pairs.


Eq. (9) is illustrated in Fig. 3. The recent data sets are given larger gain factors, whereas the old data pairs contribute less to training. All the gain factors are collected in a gain matrix $\mathbf{G}$:

$$\mathbf{G} = \begin{bmatrix} g_1 & 0 & 0 & \cdots & 0 \\ 0 & g_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & g_N \end{bmatrix}, \tag{10}$$

where $g_N = 1$. If $d^{(n)}$ denotes the desired output value for the $n$th training data set (i.e., $d^{(n)} = x^{(n+1)}_{0}$), the forecasting error is defined as

$$E = \sum_{n=1}^{N} E_n = \frac{1}{2} \sum_{n=1}^{N} g_n \left(x^{(n)}_{+r} - d^{(n)}\right)^2. \tag{11}$$

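If sigmoid MFs with parameters $\{a_{ji}, b_{ji}\}$ are applied, the fuzzy MF parameters are updated by using the gradient method:

$$a_{ji}(m+1) = a_{ji}(m) - \eta_a \sum_{n=1}^{N} \frac{\partial E_n}{\partial a_{ji}}, \tag{12}$$

$$b_{ji}(m+1) = b_{ji}(m) - \eta_b \sum_{n=1}^{N} \frac{\partial E_n}{\partial b_{ji}}, \tag{13}$$

where $m$ denotes the $m$th training epoch; $\eta_a$ and $\eta_b$ are the step sizes; and

$$\frac{\partial E_n}{\partial a_{ji}} = g_n \left(x^{(n)}_{+r} - d^{(n)}\right) \frac{C^{(n)}_j \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C^{(n)}_k}{\left(\sum_{k=1}^{16} \mu_k\right)^2} \left(x^{(n)}_{-ir} - b_{ji}\right)\left(1 - \mu_{M_{ji}}\right)\mu_j, \tag{14}$$

$$\frac{\partial E_n}{\partial b_{ji}} = g_n \left(x^{(n)}_{+r} - d^{(n)}\right) \frac{C^{(n)}_j \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C^{(n)}_k}{\left(\sum_{k=1}^{16} \mu_k\right)^2} \left(\mu_{M_{ji}} - 1\right)\mu_j a_{ji}, \tag{15}$$

where $i = 0, 1, \ldots, 3$, $j = 1, 2, \ldots, 16$; the derivation of Eqs. (14) and (15) can be found in the appendix.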

Fig. 3. The gain function. [Axes: Sample Span, 0–500; Gain Level, 0.5–1.]


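Given the values of the MF parameters and $N$ training data pairs to the adaptive predictor, $\{x^{(n)}_{-3r}, x^{(n)}_{-2r}, x^{(n)}_{-r}, x^{(n)}_{0}, d^{(n)}\}$, $n = 1, 2, \ldots, N$, $N$ linear equations in terms of the consequent parameters $\mathbf{h}$ can be formed as

$$\mathbf{A}\mathbf{h} = \mathbf{d}, \tag{16}$$

where $\mathbf{A}$ is the resulting matrix from the inference operation of the adaptive predictor:

$$\mathbf{A} = \begin{bmatrix} \bar{\mu}^{(1)}_{1}x^{(1)}_{0} & \bar{\mu}^{(1)}_{1}x^{(1)}_{-r} & \bar{\mu}^{(1)}_{1}x^{(1)}_{-2r} & \bar{\mu}^{(1)}_{1}x^{(1)}_{-3r} & \bar{\mu}^{(1)}_{1} & \cdots & \bar{\mu}^{(1)}_{16}x^{(1)}_{-2r} & \bar{\mu}^{(1)}_{16}x^{(1)}_{-3r} & \bar{\mu}^{(1)}_{16} \\ \bar{\mu}^{(2)}_{1}x^{(2)}_{0} & \bar{\mu}^{(2)}_{1}x^{(2)}_{-r} & \bar{\mu}^{(2)}_{1}x^{(2)}_{-2r} & \bar{\mu}^{(2)}_{1}x^{(2)}_{-3r} & \bar{\mu}^{(2)}_{1} & \cdots & \bar{\mu}^{(2)}_{16}x^{(2)}_{-2r} & \bar{\mu}^{(2)}_{16}x^{(2)}_{-3r} & \bar{\mu}^{(2)}_{16} \\ \vdots & & & & & & & & \vdots \\ \bar{\mu}^{(N)}_{1}x^{(N)}_{0} & \bar{\mu}^{(N)}_{1}x^{(N)}_{-r} & \bar{\mu}^{(N)}_{1}x^{(N)}_{-2r} & \bar{\mu}^{(N)}_{1}x^{(N)}_{-3r} & \bar{\mu}^{(N)}_{1} & \cdots & \bar{\mu}^{(N)}_{16}x^{(N)}_{-2r} & \bar{\mu}^{(N)}_{16}x^{(N)}_{-3r} & \bar{\mu}^{(N)}_{16} \end{bmatrix}. \tag{17}$$

$\mathbf{h}$ is a vector whose elements are the predictor’s consequent parameters to be updated:

$$\mathbf{h} = [\, c_{1,0} \;\; c_{1,1} \;\; c_{1,2} \;\; c_{1,3} \;\; c_{1,4} \;\; \cdots \;\; c_{16,0} \;\; c_{16,1} \;\; c_{16,2} \;\; c_{16,3} \;\; c_{16,4} \,]^T. \tag{18}$$

$\mathbf{d}$ represents the vector of the desired forecasting states:

$$\mathbf{d} = [\, d^{(1)} \;\; d^{(2)} \;\; \cdots \;\; d^{(N)} \,]^T. \tag{19}$$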

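Because the row vectors in $\mathbf{A}$ and the associated elements in $\mathbf{d}$ are obtained sequentially, $\mathbf{h}$ in Eq. (16) can be computed recursively. Let $\mathbf{a}_k^T$ denote the $k$th row vector in matrix $\mathbf{A}$, $\mathbf{A}_k$ be a submatrix of $\mathbf{A}$ composed of the first $k$ rows of $\mathbf{A}$, $\mathbf{d}_k$ be the subvector of $\mathbf{d}$ composed of the first $k$ elements of $\mathbf{d}$, $d^{(k)}$ be the $k$th element of $\mathbf{d}$, and $\mathbf{G}_k$ be a submatrix of $\mathbf{G}$ composed of the first $k$ rows of $\mathbf{G}$. Then the current output of the predictor is

$$x^{(k)}_{+r} = \mathbf{a}_k^T \mathbf{h}_k, \tag{20}$$

where $k = 0, 1, \ldots, N-1$. Correspondingly, Eq. (11) can be rewritten as

$$E = \frac{1}{2} \sum_{k=0}^{N-1} g_k \left(\mathbf{a}_k^T \mathbf{h}_k - d^{(k)}\right)^2 = \frac{1}{2} (\mathbf{A}\mathbf{h} - \mathbf{d})^T \mathbf{G} (\mathbf{A}\mathbf{h} - \mathbf{d}) = \frac{1}{2} \left(\mathbf{h}^T \mathbf{A}^T \mathbf{G} \mathbf{A} \mathbf{h} - 2\mathbf{d}^T \mathbf{G} \mathbf{A} \mathbf{h} + \mathbf{d}^T \mathbf{G} \mathbf{d}\right). \tag{21}$$

To minimise the forecasting error, set $\partial E / \partial \mathbf{h} = \mathbf{0}$, then

$$\mathbf{h} = (\mathbf{A}^T \mathbf{G} \mathbf{A})^{-1} \mathbf{A}^T \mathbf{G} \mathbf{d}. \tag{22}$$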
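The step from Eq. (21) to Eq. (22) can be spelled out as follows, under the assumptions that the gain matrix is symmetric (it is diagonal here) and that $\mathbf{A}^T \mathbf{G} \mathbf{A}$ is invertible:

```latex
\frac{\partial E}{\partial \mathbf{h}}
  = \frac{1}{2}\left( 2\,\mathbf{A}^T \mathbf{G} \mathbf{A}\,\mathbf{h}
                      - 2\,\mathbf{A}^T \mathbf{G}\,\mathbf{d} \right)
  = \mathbf{A}^T \mathbf{G} \mathbf{A}\,\mathbf{h} - \mathbf{A}^T \mathbf{G}\,\mathbf{d}
  = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{A}^T \mathbf{G} \mathbf{A}\,\mathbf{h} = \mathbf{A}^T \mathbf{G}\,\mathbf{d}.
```

Left-multiplying both sides by $(\mathbf{A}^T \mathbf{G} \mathbf{A})^{-1}$ gives the weighted least-squares estimate of Eq. (22).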

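To derive the formula for a recursive estimate, corresponding to the $k$th input data pair, $\mathbf{h}_k$ can be computed by Eq. (22), that is,

$$\mathbf{h}_k = (\mathbf{A}_k^T \mathbf{G}_k \mathbf{A}_k)^{-1} \mathbf{A}_k^T \mathbf{G}_k \mathbf{d}_k. \tag{23}$$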

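Corresponding to the $(k+1)$th data pair $\{\mathbf{a}_{k+1}^T;\ d^{(k+1)}\}$, Eq. (23) becomes

$$\mathbf{h}_{k+1} = \left( \begin{bmatrix} \mathbf{A}_k \\ \mathbf{a}_{k+1}^T \end{bmatrix}^T \begin{bmatrix} \mathbf{G}_k & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{A}_k \\ \mathbf{a}_{k+1}^T \end{bmatrix} \right)^{-1} \begin{bmatrix} \mathbf{A}_k \\ \mathbf{a}_{k+1}^T \end{bmatrix}^T \begin{bmatrix} \mathbf{G}_k & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{d}_k \\ d^{(k+1)} \end{bmatrix} = \left(\mathbf{A}_k^T \mathbf{G}_k \mathbf{A}_k + \mathbf{a}_{k+1}\mathbf{a}_{k+1}^T\right)^{-1} \left(\mathbf{A}_k^T \mathbf{G}_k \mathbf{d}_k + \mathbf{a}_{k+1} d^{(k+1)}\right). \tag{24}$$

Let $\mathbf{S}_k = (\mathbf{A}_k^T \mathbf{G}_k \mathbf{A}_k)^{-1}$, that is, $\mathbf{S}_k^{-1} = \mathbf{A}_k^T \mathbf{G}_k \mathbf{A}_k$, then

$$\mathbf{S}_{k+1} = \left( \begin{bmatrix} \mathbf{A}_k \\ \mathbf{a}_{k+1}^T \end{bmatrix}^T \begin{bmatrix} \mathbf{G}_k & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{A}_k \\ \mathbf{a}_{k+1}^T \end{bmatrix} \right)^{-1} = \mathbf{S}_k - \mathbf{S}_k \mathbf{a}_{k+1} \left(I + \mathbf{a}_{k+1}^T \mathbf{S}_k \mathbf{a}_{k+1}\right)^{-1} \mathbf{a}_{k+1}^T \mathbf{S}_k. \tag{25}$$

Eq. (24) becomes

$$\mathbf{h}_{k+1} = \mathbf{S}_{k+1} \mathbf{A}_k^T \mathbf{G}_k \mathbf{d}_k + \mathbf{S}_{k+1} \mathbf{a}_{k+1} d^{(k+1)} = \mathbf{h}_k + \mathbf{S}_{k+1} \mathbf{a}_{k+1} \left(d^{(k+1)} - \mathbf{a}_{k+1}^T \mathbf{h}_k\right), \tag{26}$$

where $k = 0, 1, \ldots, N-1$. The least-squares estimate of $\mathbf{h}$ is $\mathbf{h}_N$. The initial conditions to Eqs. (25) and (26) are $\mathbf{h}_0 = \mathbf{0}$ and $\mathbf{S}_0 = \alpha \mathbf{I}$, respectively, where $\alpha \in [10^2, 10^6]$ is a large positive constant, and $\mathbf{I}$ is an identity matrix which is 80 × 80 in this case (one row and column per consequent parameter in Eq. (18)). In Eq. (25), the term $\mathbf{a}_{k+1}^T \mathbf{S}_k \mathbf{a}_{k+1}$ is a scalar, so the inverse in parentheses reduces to a scalar division.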
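The recursion of Eqs. (25) and (26) can be sketched in plain Python as below. It is an illustrative sketch rather than the author's code: the function names are hypothetical, the matrices are stored as nested lists for self-containment, and, as in Eq. (24), each new sample enters with unit gain. The rank-one update of Eq. (25) is an instance of the Sherman–Morrison identity.

```python
def rls_update(h, S, a, d):
    """One recursive least-squares step, following Eqs. (25) and (26).

    h : current estimate h_k (list of length p)
    S : current p x p matrix S_k (list of lists, assumed symmetric)
    a : new regressor a_{k+1} (list of length p)
    d : new desired output d^{(k+1)}
    """
    p = len(h)
    Sa = [sum(S[r][c] * a[c] for c in range(p)) for r in range(p)]  # S_k a
    denom = 1.0 + sum(a[r] * Sa[r] for r in range(p))               # 1 + a^T S_k a
    # Eq. (25); a^T S_k equals (S_k a)^T because S_k is symmetric.
    S_new = [[S[r][c] - Sa[r] * Sa[c] / denom for c in range(p)] for r in range(p)]
    # Eq. (26): correct h_k by the gain-weighted prediction error.
    err = d - sum(a[r] * h[r] for r in range(p))
    gain = [sum(S_new[r][c] * a[c] for c in range(p)) for r in range(p)]
    h_new = [h[r] + gain[r] * err for r in range(p)]
    return h_new, S_new


def rls_fit(rows, targets, alpha=1e6):
    """Run the recursion from h_0 = 0 and S_0 = alpha * I over all data pairs."""
    p = len(rows[0])
    h = [0.0] * p
    S = [[alpha if r == c else 0.0 for c in range(p)] for r in range(p)]
    for a, d in zip(rows, targets):
        h, S = rls_update(h, S, a, d)
    return h
```

As a usage example, feeding rows `[x, 1.0]` with targets `2 * x + 1` recovers coefficients close to `[2, 1]`; a large `alpha` makes the initial guess `h_0 = 0` carry almost no weight.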

3.2. Training processes and error measurement

Consider a given ($i$th) data set $\{x^{(i)}_{-3r}, x^{(i)}_{-2r}, x^{(i)}_{-r}, x^{(i)}_{0}, d^{(i)}\}$, where $d^{(i)}$ is the desired system output. As illustrated in Fig. 2, the relative forecasting error (i.e., the point residual error) is measured by

$$e_i = \left| \frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \right|, \tag{27}$$

where $x^{(i)}_{+r}$ is the predictor output obtained by Eq. (5), and $d^{(i)} = x^{(i+1)}_{0}$.

The change rate of the forecasting error for this $i$th data pair will be

$$\dot{e}_i = \frac{de_i}{dt} = \left| \frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \right| - \left| \frac{x^{(i-1)}_{+r} - d^{(i-1)}}{d^{(i-1)}} \right|. \tag{28}$$

The adaptive training of the predictor is triggered if the point residual error $e_i$ exceeds a tolerance $\epsilon_t$ and the error is increasing, that is,

$$[e_i > \epsilon_t] \wedge [\dot{e}_i > 0] > 0. \tag{29}$$
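The trigger logic of Eqs. (27)–(29) is compact enough to express directly. The sketch below uses hypothetical function names and takes the change rate as the difference between the two most recent point errors (one time step apart), which is how Eq. (28) reads; the default tolerance of 0.05 is the example value quoted in Section 3.3.

```python
def relative_error(forecast, desired):
    """Point residual error e_i = |(x_{+r} - d) / d| (Eq. 27)."""
    return abs((forecast - desired) / desired)


def should_retrain(forecasts, desired, tol=0.05):
    """Evaluate the adaptive-training trigger of Eq. (29) for the latest sample.

    forecasts, desired : sequences holding at least the two most recent values.
    """
    e_now = relative_error(forecasts[-1], desired[-1])
    e_prev = relative_error(forecasts[-2], desired[-2])
    e_dot = e_now - e_prev  # Eq. (28): change of the error over one time step
    return e_now > tol and e_dot > 0
```

Note that both conditions must hold: a large error that is already shrinking does not trigger retraining, and neither does a growing error that is still within tolerance.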

In training, the predictor’s consequent parameters are optimised in the forward pass of each training epoch by using Eqs. (25) and (26), whereas the MF parameters are fine-tuned in the backward pass by using Eqs. (12) and (13). The training error $E$ in Eq. (11) is minimised successively. If the mismatching is caused by, for example, the convergence deterioration due to a local minimum, the error-making rules, which take the most active part in making the current forecasting operation, are punished by reducing their contributions to the following operations. The corresponding rule boundary properties in the decision space are modified accordingly. To prevent possible trapping due to that local minimum, an optimal search direction is generated based on both forecasting error states and the error change trend orientations over a long term.

The training process is terminated once the training error is reduced to a specified tolerance level (e.g., $10^{-5}$ in this case)

$$\Delta E = E(m) - E(m-1) \le 10^{-5}, \tag{30}$$

where $E(m)$ and $E(m-1)$ represent, respectively, the training error for the $m$th and $(m-1)$th epoch.

The general forecasting accuracy of the predictor can also be estimated over a specified region (e.g., over $M$ data sets) according to the following regional residual error

$$e_R = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( \frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \right)^2}. \tag{31}$$
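The regional residual error of Eq. (31) is the root mean square of the relative point errors. A minimal sketch, with a hypothetical function name and plain Python lists as the data layout, is:

```python
import math


def regional_residual_error(forecasts, desired):
    """e_R = sqrt((1/M) * sum(((x_{+r} - d) / d)^2)) over M data sets (Eq. 31)."""
    M = len(desired)
    total = sum(((x - d) / d) ** 2 for x, d in zip(forecasts, desired))
    return math.sqrt(total / M)
```

For instance, two forecasts that are each off by 10% of the desired value give $e_R = 0.1$, regardless of the sign of the individual errors.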

3.3. Modelling uncertainty analysis

In employing training data sets to identify predictor model parameters, there always exist some errors due to imperfections in model assumptions and training processes, as well as noises (i.e., the signals not related to the information of interest) in measurement and analysis. This effect is termed model uncertainty. Several techniques have been proposed in the literature for analysing model uncertainty problems [18]. For linear models, in which the model’s output is linear (or approximately linear) in terms of its parameters, the model uncertainty processing techniques can be classified as active and passive approaches [19]. The noise in active approaches is assumed random in nature and is characterised by some stochastic representations such as probability density functions. Consequently, reference observers are constructed for model uncertainty analysis [20]. In real-world applications such as in machinery systems, however, noise signals may not be exactly random in nature, and they are difficult to characterise stochastically. The passive approach applies error thresholds for noise analysis, and a noise signal is characterised by its upper and lower bounds. The well-accepted passive approaches include the ellipsoidal outer bounding technique, the orthotopic outer bounding algorithm, and the exact parameter description method [18]. In this study, because the NF predictor has a complete architecture (i.e., 16 complementary rules corresponding to four input variables each having two MFs) and is adaptively trained, a simplified bounded-error method is adopted for model uncertainty analysis.

Based on the specific application requirements, a point error threshold $\epsilon_t$ is chosen (e.g., $\epsilon_t = 0.05$). For a given data set, if the relative residual error is less than the a priori threshold, that is

$$e_i = \left| \frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \right| \le \epsilon_t, \tag{32}$$

the monitoring is assumed to be in its normal condition. Adaptive training is performed as long as Eq. (29) holds. After the training process is completed, the relative model uncertainty error can be determined by Eq. (31),

$$e_R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \right)^2}, \tag{33}$$

where $N$ represents the number of training data sets.

In normal forecasting operations, for a given (the $i$th) data set, the point relative residual signal will satisfy

$$\frac{x^{(i)}_{+r} - d^{(i)}}{d^{(i)}} \in [-\epsilon_t,\ \epsilon_t]. \tag{34}$$

Correspondingly, the confidence interval for the predictor output $x^{(i)}_{+r}$ is

$$(1 - \epsilon_t)\, d^{(i)} \le x^{(i)}_{+r} \le (1 + \epsilon_t)\, d^{(i)}. \tag{35}$$

4. Performance evaluation

In this section, the proposed adaptive predictor is implemented for real-time forecasting applications. Online tests are conducted corresponding to gear system monitoring and material fatigue testing, respectively. To make a comparison, the forecasting results from a general NF predictor with offline training, as proposed in [12], are also listed; that general NF predictor has demonstrated its superior performance to other classical forecasting schemes.

4.1. Initial training

Suppose that sufficient representative data sets are not available at the starting stage (a general case). The initial training data pairs in the training database $T_d$ in Eq. (7) are then generated from the Mackey–Glass differential equation [21] with initial conditions of $x(0) = 1$ and $\tau = 2$:

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t - \tau)}{1 + x^{10}(t - \tau)} - 0.1\,x(t). \tag{36}$$

The data sets from Eq. (36) have chaotic, non-linear, and non-convergent properties, which make them suitable to train forecasting schemes for general application purposes. In the test cases in this section, both predictors are initially trained using the same data set from Eq. (36). In the adaptive predictor, the Mackey–Glass data pairs in the training database are gradually replaced by the available new data sets from testing, whereas the adaptive training is conducted as long as Eq. (29) holds.
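One way to seed such a database is sketched below. Several choices here are assumptions not stated in the text: forward-Euler integration with step `dt`, a zero pre-history ($x(t) = 0$ for $t < 0$), one database sample per integration step, and the hypothetical names `mackey_glass` and `make_database`. The fixed-length `deque` mimics the gradual replacement of old pairs by new test data.

```python
from collections import deque


def mackey_glass(n_samples, tau=2.0, x0=1.0, dt=0.1):
    """Integrate Eq. (36) with forward Euler and return n_samples values.

    Assumption: x(t) = 0 for t < 0, since the text only specifies x(0) and tau.
    """
    delay = int(round(tau / dt))
    x = [0.0] * delay + [x0]                 # history on [-tau, 0]
    for _ in range(n_samples - 1):
        x_tau = x[-1 - delay]                # delayed state x(t - tau)
        dx = 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[-1]
        x.append(x[-1] + dt * dx)
    return x[delay:]


def make_database(series, size=500):
    """Fixed-size database of rows [x_{-3r}, x_{-2r}, x_{-r}, x_0, x_{+r}] (cf. Eq. 7).

    The last column holds the next sample of the series, used as the target.
    """
    rows = [series[n - 3:n + 2] for n in range(3, len(series) - 1)]
    return deque(rows[-size:], maxlen=size)
```

Appending a new row to the returned `deque` silently drops the oldest one, so the database size stays at $N = 500$ while Mackey–Glass pairs are phased out.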

4.2. Gear system monitoring

4.2.1. Experimental set-up

Fig. 4 schematically shows the experimental set-up used for online gear system monitoring. The system is driven by a 3-HP DC motor. The load is provided by a heavy-duty magnetic brake system. A two-stage gearbox is tested in this work. The number of teeth is 17, 25, 19, and 23, respectively, for gears #1 to #4. The shafts are mounted to the gearbox housing by journal bearings. A magnetic pick-up sensor is mounted in the radial direction of gear #4 to provide a reference signal (one pulse per tooth period) for the time synchronous average filtering. The gap between the magnetic pick-up sensor and the gear top land is in the range of 0.3–1.0 mm. The motor rotation is controlled by a speed controller, which allows the tested gear system to operate in the range of 20–3600 rpm. The speed of the drive motor and the load of the magnetic brake system are adjusted automatically to accommodate the range of speed/load operation conditions. The vibration is measured using two accelerometers mounted at both ends of the gearbox housing. The signals from different sources are collected using an intelligent data acquisition device made by the author’s research team [17], which consists of a micro-controller and the circuits for the purposes of sampling, anti-aliasing filters, and AD/DA converters. The preconditioned signals are then fed into a computer for further processing.

4.2.2. Monitoring index determination

The measured vibration from the experimental set-up is an overall signal associated with different vibratory sources, such as the magnetic brake, shafts, journal bearings, and gear mesh. As discussed in [17], each component will generate a vibratory signal with specific spectral characteristics, whereas the characteristic frequencies of a gear train are located in the highest bandwidth in a gearbox. Therefore, the gear signal can be derived by filtering out those frequency components lower than the gear mesh frequency $f_r Z$ Hz, where $f_r$ denotes the gear rotation speed in Hz, and $Z$ is the number of teeth of the gear of interest.
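As a small worked example of the cut-off computation, the gear mesh frequency $f_r Z$ can be obtained from a shaft speed in rpm. The function name is hypothetical, and the 1800 rpm operating point is only an illustrative value inside the 20–3600 rpm range of the rig; the speed passed in is assumed to be that of the shaft carrying the gear of interest.

```python
def gear_mesh_frequency(rpm, teeth):
    """Gear mesh frequency f_r * Z in Hz, with f_r the gear rotation speed in Hz."""
    f_r = rpm / 60.0  # rotation speed in revolutions per second
    return f_r * teeth


# Gear #1 has 17 teeth; at an illustrative 1800 rpm, f_r = 30 Hz.
mesh_hz = gear_mesh_frequency(1800, 17)  # 30 Hz * 17 = 510 Hz
```

Frequency components below this value would then be filtered out to isolate the gear signal.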

The monitoring in this study is conducted gear by gear. Thus, an important procedure is to differentiate the signal specific to each gear by using the time synchronous average filtering [22]. This process is performed with the help of a reference signal related to the rotation of the gear of interest. In this case, the rotation reference for each gear is computed by the corresponding transmission ratio with respect to gear #4. Through this synchronous average filtering process, all the signals non-synchronous to the rotation of the gear of interest are removed, and each gear signal is derived and represented in one full revolution (i.e., the signal average).

Fig. 4. The experimental set-up for gear system monitoring.

In condition monitoring, the monitoring indices should be sensitive to pattern modulation due to machinery faults but insensitive to noise [13]. Several signal processing techniques have been proposed in the literature for gear fault diagnosis, but each has its own merits and limitations. According to the investigation in [23], one of the most robust gear fault detection techniques is the beta kurtosis, and it will be applied as an example in this paper to demonstrate the monitoring process.

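Suppose $\mu$ and $\sigma^2$ represent the mean and variance of the gear signal average, respectively. The beta kurtosis index can then be represented as

$$x_\beta = \frac{\alpha\beta(\alpha + \beta + 2)(\alpha + \beta + 3)}{3(\alpha + \beta + 1)(2\alpha^2 - 2\alpha\beta + \alpha^2\beta + \alpha\beta^2 + 2\beta^2)}, \tag{37}$$

where $\alpha = \dfrac{\mu}{\sigma^2}(\mu - \mu^2 - \sigma^2)$ and $\beta = \dfrac{1 - \mu}{\sigma^2}(\mu - \mu^2 - \sigma^2)$. The details about these signal processing techniques can be found in [23].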
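Eq. (37) is the reciprocal of the kurtosis of a beta distribution whose parameters are fitted to the signal by the method of moments. The sketch below illustrates the computation under stated assumptions: the signal average is already normalised to the interval (0, 1), the population variance is used, $\beta$ is taken as the method-of-moments estimate $((1 - \mu)/\sigma^2)(\mu - \mu^2 - \sigma^2)$, and the function name is hypothetical.

```python
def beta_kurtosis_index(signal):
    """Beta kurtosis index of Eq. (37) for a signal normalised to (0, 1)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n  # population variance (assumption)
    common = mean - mean ** 2 - var
    alpha = (mean / var) * common
    beta = ((1.0 - mean) / var) * common
    num = alpha * beta * (alpha + beta + 2.0) * (alpha + beta + 3.0)
    den = 3.0 * (alpha + beta + 1.0) * (
        2.0 * alpha ** 2 - 2.0 * alpha * beta
        + alpha ** 2 * beta + alpha * beta ** 2 + 2.0 * beta ** 2
    )
    return num / den
```

As a sanity check, a signal spread uniformly over (0, 1) corresponds to $\alpha \approx \beta \approx 1$, for which Eq. (37) evaluates to about 5/9, the reciprocal of the uniform distribution's kurtosis of 1.8.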

Three gear cases are tested in this study as represented in Fig. 5: (a) healthy gears, (b) cracked gears, and (c) pitted gears. Many tests have been conducted corresponding to each gear condition, and two typical examples are illustrated in the following sections.

In the following analysis, the predictor is considered to be properly trained if the regional residual error $e_R \le 0.01$, where $e_R$ is computed by Eq. (31) over the training data sets. In the following forecasting operations, the threshold for point residual error is chosen as $\epsilon_t = 5\%$, whereas the threshold for regional residual error is $\epsilon_r = 2\%$. Therefore, the predictor is considered to have captured the dynamic characteristics of the monitored system as long as $e_i \le 0.05$ and $e_R \le 0.02$, where $e_R$ and $e_i$ are computed by Eqs. (31) and (32), respectively.

4.2.3. Cracked gear monitoring

At first, all the gears in the gearbox are in a healthy condition. The tests are conducted under load levels from 0.5 to 3 hp and motor speeds from 100 to 3600 rpm. During testing, motor speed and load levels are randomly changed to simulate practical machinery operation conditions. The monitoring time-step is set at $r = 0.5$ h, that is, both predictors are applied automatically every 0.5 h to forecast the upcoming values of the normalised beta kurtosis index $x_\beta$. After about 150 h, a transverse cut with 10% of the tooth root thickness is introduced to one tooth of gear #1 to simulate the initial fatigue crack. Then the test is resumed and continues until the damaged tooth is broken off about 206 h later.

Fig. 5. Gear conditions tested: (a) healthy gear, (b) cracked gear, (c) pitted gear.

Fig. 6. The test results for the cracked gear (solid curves): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve). [Axes: Time Span (hours), 0–350; Beta Kurtosis Index, 0.4–1.]

Fig. 6 shows the forecasting results of the normalised monitoring index $x_\beta$ for gear #1. It is seen that both predictors can capture the system’s dynamic behaviours. After the fault is introduced, the gear mesh dynamics change correspondingly. The general predictor (Fig. 6b) takes about 5 samples to recapture the system’s dynamic characteristics, whereas the adaptive predictor (Fig. 6a) takes only a couple of steps. As more data sets are available in testing, the adaptive predictor performs better than the general predictor because of the adaptive training process. During the last monitoring section, big fluctuations appear because the gear mesh dynamics change dramatically just before and after the tooth failure. The adaptive and the general predictors have provided alarms at about 5–6 samples and 3–4 samples, respectively, prior to the tooth breakage. These are very valuable indications for gear system condition monitoring.

4.2.4. Pitted gear monitoring

The prior testing is associated with a localised gear fault. This test is for distributed fault condition monitoring. A new pair of gears #1 and #2 is mounted and tested. After about 150 h, several pits are introduced on a tooth surface in gear #1 to simulate a pitting fault. The tests resume and proceed until serious noise occurs due to pitting damage. Fig. 7 shows the forecasting results. Both predictors perform reasonably well during the healthy period. After the pitting defect is introduced, the adaptive predictor (Fig. 7a) responds more quickly than the general predictor (Fig. 7b) to re-capture the gear system’s new dynamic characteristics.

As the pitting fault propagates, the monitoring indicator $x_\beta$ becomes smaller, even though the noise level becomes stronger. This healing phenomenon, which is associated with fault signal properties, is misleading in condition monitoring. After the pitting occurs, the pitted area can no longer carry a load, and the unpitted area has to take the extra load. Consequently, the unpitted area is prone to fatigue, and the pitting fault will propagate rapidly over the entire tooth surface and to other gear teeth. Correspondingly, the localised fault becomes a distributed surface failure. From the point of view of signal properties, when a localised fault occurs, some high-amplitude pulses will be generated due to impacts, which are relatively easy for a signal processing technique to detect. As the pitting propagates, the overall energy of the fault will increase, but it often becomes more wideband in nature and difficult to detect in the presence of the other vibratory components of the machine. This example identifies a characteristic of currently used signal processing techniques: it is usually easier to detect a distinct low-level narrowband tone than a high-level wideband signal in the presence of other signals or noises. Therefore, a possible solution to this healing problem is to integrate the information from both vibration monitoring and acoustics analysis.

Furthermore, from examining Fig. 7, it is clear that the general predictor (Fig. 7b) loses track of the gear system’s behaviour during the healing section. This is because, during this specific condition, the general predictor cannot adapt itself to the system’s new dynamics, so its forecasting operation is trapped by local minima. On the other hand, the adaptive predictor (Fig. 7a) can prevent this local trapping problem because of the effective training process.

Fig. 7. The test results for the pitted gear (solid curves): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve). [Axes: Time Span (hours), 0–300; Beta Kurtosis Index, 0.5–0.8.]

It should be stated that if variable time steps, such as 8 × 0.5, 20 × 0.5, 100 × 0.5 h, etc., are applied, as represented in Eq. (6), the proposed adaptive predictor can also be used for long-term forecasting operations. That information will be more helpful for scheduling machinery maintenance and repairs and for eliminating routine machine shutdowns and check-outs.
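The effect of a variable time step on the training data can be sketched as follows. This is only an illustration of how the four inputs and the one-step-ahead target shift when the step r is a multiple of the sampling interval; the helper `make_pairs`, the 0.5 h base interval read from the examples above, and the synthetic series are assumptions rather than part of the original implementation.

```python
import numpy as np

def make_pairs(series, step):
    """Build ([x_-3r, x_-2r, x_-r, x_0], x_+r) pairs with r = `step` samples."""
    series = np.asarray(series, dtype=float)
    # Indices that play the role of the current state x_0.
    t = np.arange(3 * step, len(series) - step)
    inputs = np.stack(
        [series[t - 3 * step], series[t - 2 * step], series[t - step], series[t]],
        axis=1,
    )
    targets = series[t + step]
    return inputs, targets

sample_interval_h = 0.5                      # assumed base monitoring interval
series = np.arange(1000, dtype=float)        # placeholder monitoring index

X8, y8 = make_pairs(series, step=8)
for multiple in (8, 20, 100):
    X, y = make_pairs(series, step=multiple)
    horizon = multiple * sample_interval_h
    print(f"r = {multiple} x 0.5 h: forecast {horizon} h ahead, {len(y)} pairs")
```

A longer step leaves fewer usable pairs from the same record, which is one practical limit on how far ahead the same four-input structure can be trained.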

4.3. Material fatigue testing

The generality of the developed adaptive predictor is evaluated in this section by applying it to other cases represented by different scales and monitoring parameters. One of the examples is material fatigue testing, which is usually a time-consuming process. A reliable forecasting tool is very useful in quickly identifying the material properties so that the experimental time can be shortened. This test is performed on a specimen with a thickness of 3 mm, as shown in Fig. 8. The left end of the specimen is fixed to an experimental set-up, whereas the load is applied to its right end using an exciter. An initial crack of about 3 mm is introduced in the middle area, close to the supporting end of the plate. Both predictors are also initially trained by using a Mackey–Glass data set. The monitoring time-step is set at r = 3000 cycles.
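For readers who wish to reproduce the initial training data, a Mackey–Glass series can be generated from the delay differential equation dx/dt = βx(t − τ)/(1 + x(t − τ)^n) − γx(t). The sketch below is a minimal forward-Euler integration; the parameter values (τ = 17, β = 0.2, γ = 0.1, n = 10), the constant initial history, and the function name `mackey_glass` are commonly used benchmark choices assumed here, not values confirmed by this section.

```python
import numpy as np

def mackey_glass(n_samples, tau=17.0, beta=0.2, gamma=0.1, power=10,
                 dt=0.1, x0=1.2):
    """Forward-Euler integration of the Mackey-Glass equation.

    Returns one sample per unit time. All default values are assumptions.
    """
    delay_steps = int(round(tau / dt))
    steps_per_sample = int(round(1.0 / dt))
    total = n_samples * steps_per_sample
    x = np.empty(total + delay_steps + 1)
    x[: delay_steps + 1] = x0                  # constant history for t <= 0
    for k in range(delay_steps, delay_steps + total):
        delayed = x[k - delay_steps]
        growth = beta * delayed / (1.0 + delayed**power)
        x[k + 1] = x[k] + dt * (growth - gamma * x[k])
    # Keep every `steps_per_sample`-th value after the initial history.
    return x[delay_steps + 1 :: steps_per_sample][:n_samples]

mg_series = mackey_glass(500)
print(mg_series[:5])
```

The resulting series is bounded and aperiodic, which is why it is a popular generic benchmark for pre-training a predictor before it is exposed to application data.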

Fig. 9 shows the forecasting results of the crack propagation trend, which is represented by a direct (offline) measurement. At the starting period, the general predictor (Fig. 9b) gives large forecasting errors due to its slow convergence. The adaptive predictor (Fig. 9a), however, can overcome this problem by an adaptive training process.

The same fatigue crack propagation trend is also indirectly measured (online) at a location close to the crack with the help of a special electric circuit. The forecasting results are illustrated in Fig. 10. It is seen that both predictors can capture and track the system’s characteristics. However, the adaptive predictor (Fig. 10a) performs better than the general predictor (Fig. 10b), thanks to its adaptive network architecture and effective training process.

Fig. 8. A specimen with an initial crack for material fatigue testing.

Fig. 9. The test results for a crack propagation using direct measurement (solid curves): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve). Axes in both panels: Cycles (×10^5) versus Monitoring Index (mm).

Fig. 10. The test results for a crack propagation using indirect measurement (solid curves): (a) the forecasting result by the adaptive predictor (dotted curve); (b) the forecasting result by the general predictor (dotted curve). Axes in both panels: Cycles (×10^5) versus Monitoring Index (Volts).

Without a doubt, the forecasting performance in the aforementioned cases can be further improved if the predictors are properly trained by using representative data sets corresponding to specific applications.

5. Conclusion

To provide a wide array of industries with a more reliable and real-time forecasting tool, an adaptive predictor was developed in this paper based on the NF approach to predict the behaviour of dynamic systems. An adaptive training technique was proposed to further improve the forecasting efficiency. The adaptive predictor has been implemented for both gear condition monitoring and material fatigue testing, where the gear conditions include both localised and distributed faults. Test results showed that the developed adaptive predictor is a reliable forecasting tool. It can capture the system’s dynamic behaviour quickly and track the system’s characteristics accurately. It is also a robust forecasting tool in terms of its capability to accommodate different system operation conditions and variations in the system’s dynamic characteristics. The adaptive training technique is efficient in improving the forecasting performance by modifying the properties of the decision space boundaries and by preventing possible trapping due to local minima. The forecasting accuracy of the proposed adaptive predictor is higher than that of the general NF predictor, which is, in turn, superior to other classical forecasting schemes.

Further research is currently underway to implement the adaptive predictor in complex industrial facilities and to develop new strategies for multiple-step predictions.

Acknowledgements

The author wishes to thank Drs. F. Ismail and F. Golnaraghi from the Department of Mechanical Engineering at the University of Waterloo for their support of this work. Appreciation is also given to the project students, Mrs. D. Simatovic and M. Puric, for their help in conducting the tests. Financial support of this project has been provided by the Natural Sciences and Engineering Research Council of Canada and by MC Technologies Inc.

Appendix A. Derivation of some equations

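For the nth training data pair $\{x_{-3r}^{(n)}, x_{-2r}^{(n)}, x_{-r}^{(n)}, x_{0}^{(n)}, d^{(n)}\}$, $n = 1, 2, \ldots, N$, if, for example, sigmoid MFs with parameters $\{a_i^j, b_i^j\}$ are applied,

$$
\mu_{M_i^j}\bigl(x_{-ir}^{(n)}\bigr) = \frac{1}{1 + \exp\bigl(-a_i^j\,(x_{-ir}^{(n)} - b_i^j)\bigr)}, \quad i = 0, 1, \ldots, 3;\ j = 1, 2, \ldots, 16, \tag{A.1}
$$

the following equations can be obtained:

$$
\frac{\partial \mu_{M_i^j}}{\partial a_i^j}
= \frac{\exp\bigl(-a_i^j\,(x_{-ir}^{(n)} - b_i^j)\bigr)\,\bigl(x_{-ir}^{(n)} - b_i^j\bigr)}{\bigl[1 + \exp\bigl(-a_i^j\,(x_{-ir}^{(n)} - b_i^j)\bigr)\bigr]^2}
= \bigl(x_{-ir}^{(n)} - b_i^j\bigr)\bigl(1 - \mu_{M_i^j}\bigr)\,\mu_{M_i^j}, \tag{A.2}
$$

$$
\frac{\partial \mu_{M_i^j}}{\partial b_i^j}
= \frac{-\exp\bigl(-a_i^j\,(x_{-ir}^{(n)} - b_i^j)\bigr)\,a_i^j}{\bigl[1 + \exp\bigl(-a_i^j\,(x_{-ir}^{(n)} - b_i^j)\bigr)\bigr]^2}
= \bigl(\mu_{M_i^j} - 1\bigr)\,a_i^j\,\mu_{M_i^j}. \tag{A.3}
$$

From Eq. (4)

$$
\frac{\partial \mu_j}{\partial \mu_{M_i^j}}
= \frac{\partial \bigl(\prod_{k=0}^{3} \mu_{M_k^j}\bigr)}{\partial \mu_{M_i^j}}
= \frac{\prod_{k=0}^{3} \mu_{M_k^j}}{\mu_{M_i^j}}
= \frac{\mu_j}{\mu_{M_i^j}}. \tag{A.4}
$$

From Eq. (5)

$$
\begin{aligned}
\frac{\partial x_{+r}^{(n)}}{\partial a_i^j}
&= \frac{\partial}{\partial a_i^j}\left(\frac{\sum_{k=1}^{16} \mu_k C_k^{(n)}}{\sum_{k=1}^{16} \mu_k}\right)
= \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\frac{\partial \mu_j}{\partial a_i^j} \\
&= \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\frac{\partial \mu_j}{\partial \mu_{M_i^j}}\,\frac{\partial \mu_{M_i^j}}{\partial a_i^j} \\
&= \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\bigl(x_{-ir}^{(n)} - b_i^j\bigr)\bigl(1 - \mu_{M_i^j}\bigr)\,\mu_j,
\end{aligned}
\tag{A.5}
$$

where $C_j^{(n)} = c_0^j x_0^{(n)} + c_1^j x_{-r}^{(n)} + c_2^j x_{-2r}^{(n)} + c_3^j x_{-3r}^{(n)} + c_4^j$.

Similarly,

$$
\begin{aligned}
\frac{\partial x_{+r}^{(n)}}{\partial b_i^j}
&= \frac{\partial}{\partial b_i^j}\left(\frac{\sum_{k=1}^{16} \mu_k C_k^{(n)}}{\sum_{k=1}^{16} \mu_k}\right)
= \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\frac{\partial \mu_j}{\partial b_i^j} \\
&= \frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\bigl(\mu_{M_i^j} - 1\bigr)\,\mu_j\,a_i^j.
\end{aligned}
\tag{A.6}
$$

Therefore,

$$
\begin{aligned}
\frac{\partial E_n}{\partial a_i^j}
&= \gamma_n \bigl(x_{+r}^{(n)} - d^{(n)}\bigr)\,\frac{\partial x_{+r}^{(n)}}{\partial a_i^j} \\
&= \gamma_n \bigl(x_{+r}^{(n)} - d^{(n)}\bigr)\,\frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\bigl(x_{-ir}^{(n)} - b_i^j\bigr)\bigl(1 - \mu_{M_i^j}\bigr)\,\mu_j,
\end{aligned}
\tag{A.7}
$$

$$
\begin{aligned}
\frac{\partial E_n}{\partial b_i^j}
&= \gamma_n \bigl(x_{+r}^{(n)} - d^{(n)}\bigr)\,\frac{\partial x_{+r}^{(n)}}{\partial b_i^j} \\
&= \gamma_n \bigl(x_{+r}^{(n)} - d^{(n)}\bigr)\,\frac{C_j^{(n)} \sum_{k=1}^{16} \mu_k - \sum_{k=1}^{16} \mu_k C_k^{(n)}}{\bigl(\sum_{k=1}^{16} \mu_k\bigr)^2}\,\bigl(\mu_{M_i^j} - 1\bigr)\,\mu_j\,a_i^j.
\end{aligned}
\tag{A.8}
$$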

Eqs. (A.7) and (A.8) are Eqs. (14) and (15), respectively.
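The derivatives of the predictor output with respect to the MF parameters can be checked against central finite differences. The sketch below is only an illustration under stated assumptions: each of the 16 rules is given its own sigmoid parameters for each of the four inputs, following the index notation of this appendix, and the function names (`forward`, `analytic_grads`, `numeric_grad`) and random test values are hypothetical.

```python
import numpy as np

def forward(x, a, b, c):
    """x: (4,) = [x_0, x_-r, x_-2r, x_-3r]; a, b: (16, 4); c: (16, 5)."""
    mf = 1.0 / (1.0 + np.exp(-a * (x - b)))   # sigmoid MF grades, Eq. (A.1)
    mu = mf.prod(axis=1)                       # rule firing strengths, Eq. (4)
    C = c[:, :4] @ x + c[:, 4]                 # linear consequents C_j
    out = np.sum(mu * C) / np.sum(mu)          # predictor output, Eq. (5)
    return out, mf, mu, C

def analytic_grads(x, a, b, c):
    """Derivatives of the output w.r.t. a and b, following (A.5) and (A.6)."""
    _, mf, mu, C = forward(x, a, b, c)
    s = mu.sum()
    common = (C * s - np.sum(mu * C)) / s**2   # shared fraction, one per rule
    d_a = common[:, None] * (x - b) * (1.0 - mf) * mu[:, None]
    d_b = common[:, None] * (mf - 1.0) * mu[:, None] * a
    return d_a, d_b

def numeric_grad(f, p, eps=1e-6):
    """Central finite differences of scalar f() w.r.t. array p (edited in place)."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        hi = f()
        p[idx] = old - eps
        lo = f()
        p[idx] = old
        g[idx] = (hi - lo) / (2.0 * eps)
    return g

rng = np.random.default_rng(1)
x = rng.normal(size=4)
a = rng.uniform(0.5, 2.0, size=(16, 4))
b = rng.normal(size=(16, 4))
c = rng.normal(size=(16, 5))

d_a, d_b = analytic_grads(x, a, b, c)
max_err_a = np.max(np.abs(d_a - numeric_grad(lambda: forward(x, a, b, c)[0], a)))
max_err_b = np.max(np.abs(d_b - numeric_grad(lambda: forward(x, a, b, c)[0], b)))
print(max_err_a, max_err_b)
```

The error gradients then follow by multiplying these output derivatives by the weighted error term of the nth data pair.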

References

[1] M. Pourahmadi, Foundations of Time Series Analysis and Prediction Theory, Wiley, New York, 2001.

[2] D. Chelidze, J. Cusumano, A dynamical systems approach to failure prognosis, Journal of Vibration and Acoustics 126 (2004) 1–7.

[3] C. Li, H. Lee, Gear fatigue crack prognosis using embedded model, gear dynamic model and fracture mechanics, Mechanical Systems and Signal Processing 19 (2005) 836–846.

[4] D. Husmeier, Neural Networks for Conditional Probability Estimation: Forecasting beyond Point Prediction, Springer-Verlag London Ltd., 1999.

[5] A. Atiya, S. El-Shoura, S. Shaheen, M. El-Sherif, A comparison between neural-network forecasting techniques-case study: river flow forecasting, IEEE Transactions on Neural Networks 10 (1999) 402–409.

[6] J. Connor, R. Martin, L. Atlas, Recurrent neural networks and robust time series prediction, IEEE Transactions on Neural Networks 5 (1994) 240–254.

[7] P. Tse, D. Atherton, Prediction of machine deterioration using vibration based fault trends and recurrent neural networks, Journal of Vibration and Acoustics 121 (1999) 355–362.

[8] F. Karray, C. de Silva, Soft Computing and Intelligent Systems Design: Theory, Tools, and Applications, Pearson Education Publishing Inc., 2004.

[9] A. Ruano, Intelligent Control Systems using Computational Intelligence Techniques, IEE, London, 2005.

[10] J. Korbicz, Fault Diagnosis: Models, Artificial Intelligence, Applications, Springer, Berlin, 2004.

[11] J. Jang, C. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1997.

[12] W. Wang, F. Golnaraghi, F. Ismail, Prognosis of machine health condition using neuro-fuzzy systems, Mechanical Systems and Signal Processing 18 (2004) 813–831.

[13] W. Wang, F. Ismail, F. Golnaraghi, A neuro-fuzzy approach for gear system monitoring, IEEE Transactions on Fuzzy Systems 12 (2004) 710–723.

[14] D. Nauck, Adaptive rule weights in neuro-fuzzy systems, Journal of Neural Computing and Applications 9 (2000) 60–70.

[15] M. Figueiredo, R. Ballini, S. Soares, M. Andrade, F. Gomide, Learning algorithms for a class of neurofuzzy network and applications, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 34 (2004) 293–301.

[16] W. Wang, F. Golnaraghi, F. Ismail, Condition monitoring of a multistage printing press, Journal of Sound and Vibration 270 (2004) 755–766.

[17] W. Wang, F. Golnaraghi, F. Ismail, A real-time condition monitoring system for multistage machinery, US Patent 6,901,335, 2005.

[18] E. Walter, L. Pronzato, Identification of Parametric Models from Experimental Data, Springer, Berlin, 1997.

[19] M. Kowal, J. Korbicz, Robust fault detection using neuro-fuzzy networks, in: Proceedings of the 16th IFAC World Congress, Prague, Czech Republic, 2005.

[20] R. Patton, J. Chen, Robust Model-based Fault Diagnosis for Dynamic Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1999.

[21] M. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (1977) 287–289.

[22] P. McFadden, Interpolation techniques for time domain averaging of gear vibration, Mechanical Systems and Signal Processing 3 (1989) 87–97.

[23] W. Wang, F. Ismail, F. Golnaraghi, Assessment of gear damage monitoring techniques using vibration measurements, Mechanical Systems and Signal Processing 15 (2001) 905–922.
