


Applied Soft Computing Journal 73 (2018) 898–913

Contents lists available at ScienceDirect

Applied Soft Computing Journal

journal homepage: www.elsevier.com/locate/asoc

Roller bearing fault diagnosis using stacked denoising autoencoder in deep learning and Gath–Geva clustering algorithm without principal component analysis and data label

Fan Xu, Wai tai Peter Tse ∗, Yiu Lun Tse

Department of Systems Engineering and Engineering Management, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China

highlights

• Reduce the dimension of the extracted features using SDAE directly, without PCA.
• Fulfill bearing fault diagnosis by using SDAE and a clustering model without data labels.
• Use SDAE to reduce the high-dimensional frequency-domain features obtained after FFT decomposition to 2 or 3 dimensions directly.

article info

Article history:
Received 19 February 2018
Received in revised form 24 September 2018
Accepted 26 September 2018
Available online 4 October 2018

Keywords:
Deep learning
Stacked denoising autoencoder
Gath–Geva clustering algorithm
Roller bearing fault diagnosis

abstract

Most deep learning models, such as the stacked autoencoder (SAE) and stacked denoising autoencoder (SDAE), are used for fault diagnosis with data labels. These models are applied to extract useful features with several hidden layers, and a classifier is then used to complete the fault diagnosis. However, such classification methods are only suitable for tagged datasets, while many datasets in practical engineering are untagged. A clustering method can classify data without labels. Therefore, a method based on SDAE and the Gath–Geva (GG) clustering algorithm is proposed in this study for roller bearing fault diagnosis without data labels. First, SDAE is selected to extract useful features and reduce the dimension of the vibration signal to two or three dimensions directly at the final hidden layer, without principal component analysis (PCA). Then GG is deployed to identify the different faults. To demonstrate that the feature extraction performance of SDAE is better than that of SAE and the EEMD with fuzzy entropy (FE) model, PCA is selected to reduce the dimension of the eigenvectors obtained from the earlier hidden layers, except for the final hidden layer. Compared with the SAE and ensemble empirical mode decomposition (EEMD)-fuzzy entropy (FE) models, the results show that as the number of hidden layers increases, the fault samples under different conditions are separated better by SDAE than by the other feature extraction models. In addition, three evaluation indicators, the partition coefficient (PC), classification entropy (CE), and classification accuracy, are used to assess the performance of the presented method. Finally, the results show that the clustering effect and classification accuracy of the presented method are superior to those of the other combination models, including SAE-fuzzy C-means (FCM)/Gustafson–Kessel (GK)/GG and EEMD-FE-PCA-FCM/GK/GG.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

The roller bearing is one of the most important components of a rotating machinery system, and its operating status directly affects the reliability of the whole system. Feature extraction and fault identification are urgent tasks, since the characteristics of rolling bearing fault signals are nonlinear and nonstationary [1–4]. For information extraction, vibration-based feature extraction is a common and useful method. As the

∗ Corresponding author.
E-mail address: [email protected] (W.t.P. Tse).

vibration signals are nonstationary, fault diagnosis remains challenging in the mechanical engineering community.

For machinery condition monitoring, feature extraction based on nonlinear analysis methods, such as empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), has been widely applied in the field of mechanical fault diagnosis. Because rolling bearing fault vibration signals are nonlinear and nonstationary, a self-adaptive method named EMD can decompose a complicated signal into intrinsic mode functions (IMFs), based on the local characteristic time scale of the signal [5]. However, there is a mode-mixing problem in EMD. To overcome this problem, EEMD [6], an improved version

https://doi.org/10.1016/j.asoc.2018.09.037
1568-4946/© 2018 Elsevier B.V. All rights reserved.



Fig. 1. The structure of SDAE.

of EMD, was proposed by Huang et al. To eliminate the mode-mixing phenomenon, EEMD introduces random white noise signals and calculates the ensemble means of each IMF. Recently, EMD and EEMD have been widely applied in fault diagnosis [7–11].

However, the aforementioned vibration signal feature extraction methods generally require complex mathematical operations, and extensive experience is needed to understand the vibration signal. For some complex systems, where ambient interference and the internal structure interact with each other, it is difficult to extract useful vibration signal features from the measured signals automatically and effectively [12,13].

In recent years, many scholars have gradually paid attention to deep learning due to its strong feature extraction capabilities. Compared with EMD and EEMD, a stacked autoencoder (SAE) in deep learning can extract useful information automatically and reduce the dependence on expert troubleshooting experience with signal processing technology. In addition, SAE has been successfully applied in fault diagnosis. Feng et al. presented a method based on a stacked autoencoder (SAE) in deep learning for roller bearing fault diagnosis [14]. In that paper, SAE was used to extract the vibration signals' features; after pre-training and fine-tuning of the network parameters, the eigenvectors from the final hidden layer were regarded as the input of the output layer to distinguish the different fault categories. Tan et al. [15] denoised signals with a digital wavelet frame (DWF) and performed fault diagnosis based on a stacked autoencoder (SAE), combining low-level features to form more abstract high-level features that represent the distributed characteristics of the data. Other scholars have also employed SAE successfully for fault feature extraction and fault diagnosis [16–19].

However, SAE is only a simple reconstruction of the input data, and the learnt features do not have good generalization capabilities. By adding "damage noise" to the raw data on top of the SAE network structure, that is, randomly zeroing part of the input data and reconstructing the "pure raw input" from the "noisy data", a stacked denoising autoencoder (SDAE) can obtain more robust feature representations of the original input information than SAE [20,21]. The input is corrupted by randomly setting some of its entries to zero, which is called dropout noise. This denoising process helps the autoencoders to learn robust representations. In addition, each autoencoder layer is intended to learn an increasingly abstract representation of the input [16]. SDAE has been extensively used in different types of applications [15,22–24].

For fault diagnosis, many machine learning models have been widely applied, such as the support vector machine (SVM), random forest (RF), and artificial neural networks (ANN) [8,25,26]. However, most deep learning models with a classifier output layer, as well as the above machine learning classification models, are only suitable for tagged datasets. Sometimes researchers also rely on knowledge about faults, such as the fault characteristic frequency, extract the related frequency band energy as features, and then establish the relationship between the feature vectors and their labels (fault type or healthy condition) [27]. But marking data requires large amounts of time and material resources when the amount of data is large. A clustering method can classify data without labels. Fuzzy c-means (FCM) is one of the most common clustering methods. Zhang et al. developed a method based on FCM for fault diagnosis [28], but FCM is only suitable for a homogeneous data structure and handles spherically distributed data with a standard norm. Gustafson–Kessel (GK) is an improved algorithm based on FCM; GK can handle data dispersed in any direction by introducing an adaptive distance norm and covariance matrix [29]. Wang et al. applied GK to roller bearing fault diagnosis [30]. However, FCM and GK are



Fig. 2. The frame diagram of the proposed method.

only suitable for datasets with a spherical structure, whereas data obtained from practical engineering systems have different shapes and structures. Therefore, a method called Gath–Geva (GG) has been developed to solve this issue. It adopts fuzzy maximum likelihood estimation to measure the distance norm between two samples, and it is suitable for data of different shapes and orientations [31,32]. In [33], the authors considered EEMD with fuzzy entropy (FE), PCA, and GG models for roller bearing fault diagnosis. But this combination model is complicated and needs several models to extract the vibration signal features and diagnose the fault.

As mentioned above, the advantages of the SDAE model for roller bearing fault diagnosis without a data label by using GG are considered. Therefore, this study mainly contributes in the following aspects:

1. Most deep learning models, including SAE and SDAE, are used for fault diagnosis with a data label by using a classifier such as RF or SVM. Actually, many datasets in practical engineering are untagged, while marking data requires labor and experience, and a clustering method can classify data without a label. Therefore, a method based on SDAE and GG is proposed in this study for roller bearing fault diagnosis without data labels.

2. SDAE is also employed to reduce the dimension of the extracted eigenvector to two or three dimensions directly through the final hidden layer for data visualization, without PCA.

3. To compare and demonstrate that the performance of feature extraction for roller bearing vibration signals by using SDAE



Fig. 3. The time domain of the roller bearings in Table 1 (Dataset A).

Fig. 4. The time and frequency domain of the IRF2 vibration signal.

Fig. 5. The first three PCs obtained by EEMD-FE-PCA. (a) Dataset A, (b) Dataset B, (c) Dataset C.

is superior to that of SAE and EEMD-FE, PCA is used to reduce the dimension of the extracted features at the earlier hidden layers, except for the final hidden layer. The experimental results show that the feature extraction ability of SDAE is superior to that of SAE and EEMD-FE, and that the classification accuracy and clustering effect of the presented method (SDAE–GG) are also better than those of the other combination models (EEMD-FE-PCA-FCM/GK/GG, SAE-FCM/GK/GG).

The rest of this paper is organized as follows. In Section 2, the principles of SDAE and GG are introduced. The procedures of the proposed method and the dataset source are given in Section 3. Section 4 contains the experimental validation, and conclusions are presented in Section 5.

2. The basic theory of the SDAE and GG

2.1. Autoencoder

Autoencoder (AE) is a three-layer feed-forward neural network that contains an input layer $X = [x_1, x_2, \ldots, x_n]$, a hidden layer $Y = [y_1, y_2, \ldots, y_n]$, and an output layer $Z = [z_1, z_2, \ldots, z_n]$, where $n$ is the number of samples. Across these three layers, AE can be divided into two stages, encoder and decoder. The main idea of AE is to construct an identity function between the input and output, achieving dimension reduction while retaining the feature information. A detailed description and the network structure of AE are given in [14,17].

(1) Encoder: After the datasets have been normalized to [0,1], the high-dimensional input vector X is calculated by a nonlinear



Fig. 6. The first three PCs (PC1–PC3) obtained from SAE with PCA, A: Dataset A, B: Dataset B, C: Dataset C. 1–8: The number of the hidden layer.



Fig. 7. The first three PCs (PC1–PC3) obtained from SDAE with PCA, A: Dataset A, B: Dataset B, C: Dataset C. 1–8: The number of the hidden layer.

encoder function and mapped into the hidden layer Y; the calculation procedure is as follows.

$$Y = f_\delta(X) = f(WX + b) \tag{1}$$

where $W = [w_1, w_2, \ldots, w_n]$ is the weight matrix between the two layers, $n$ is the number of hidden-layer nodes, and $b = [b_1, b_2, \ldots, b_n]$ is the bias vector. The parameters are $\delta = [W, b]$ for each layer, and $f_\delta$ denotes the sigmoid activation function $f_\delta(x) = 1/(1 + e^{-x})$.

(2) Decoder: The output vector Z is reconstructed from Y according to the following equation.

$$Z = g_\delta(Y) = g(WY + b) \tag{2}$$

where $Y$ is obtained according to Eq. (1), $g_\delta$ is the sigmoid activation function $g_\delta(y) = 1/(1 + e^{-y})$, and $b = [b_1, b_2, \ldots, b_n]$ is the bias vector for each layer. Hence the AE training process aims to adjust the parameters to fit an identity function, which is used to reduce the



Fig. 8. The results of 2-dimensional clustering by using EEMD-FE-PCA-FCM/GK/GG (a)–(c) Dataset A, (d)–(f) Dataset B, (g)–(i) Dataset C.

reconstruction error L between X and Z as follows.

$$L = \arg\min \|X - Z\|^2 = \arg\min \|X - g_\delta(Y)\|^2 = \arg\min \|X - g_\delta(f_\delta(X))\|^2 \tag{3}$$

We define a cost function as follows:

$$L = J(W, b) = \left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\bigl(x(i) - g(f(x(i)))\bigr)^2\right] + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i=1}^{S_l}\sum_{j=1}^{S_{l+1}}\bigl(W_{ji}^{l}\bigr)^2 \tag{4}$$

where $x(i)$ denotes the $i$th sample, and $W_{ji}^{l}$ is the connection weight between the $i$th node at the $l$th layer and the $j$th node at the $(l+1)$th layer. $n$ is the number of samples, $S_l$ represents the number of neural nodes at the $l$th layer, and $\lambda$ is a regularization coefficient. The neural network obtains the appropriate parameters $W$ and $b$ through error back-propagation and the batch gradient descent algorithm.
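As a concrete illustration of Eqs. (1)–(4), the following sketch (not the authors' code; the layer sizes, data, and regularization coefficient λ are assumed values) computes one encoder/decoder pass and the regularized cost with NumPy:

```python
# Illustrative sketch of a single autoencoder: Eq. (1) encode, Eq. (2)
# decode, Eq. (4) regularized cost. All sizes here are assumptions.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n, d, h = 100, 32, 8           # samples, input dim, hidden dim (assumed)
X = rng.random((n, d))         # inputs already normalized to [0, 1]

W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)   # encoder parameters
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)   # decoder parameters

Y = sigmoid(X @ W1 + b1)       # Eq. (1): encoder
Z = sigmoid(Y @ W2 + b2)       # Eq. (2): decoder

lam = 1e-4                     # regularization coefficient lambda (assumed)
reconstruction = 0.5 * np.mean(np.sum((X - Z) ** 2, axis=1))
weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
cost = reconstruction + weight_decay    # Eq. (4)
print(round(float(cost), 4))
```

Training would then adjust W1, b1, W2, b2 by gradient descent on this cost.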

2.2. Stacked denoising autoencoder (SDAE)

SDAE was proposed by Vincent et al. using DAE [20,21]. DAE is an improved method based on AE; a detailed description and the structure of DAE are given in [15,16]. The training data are mixed with noise (usually Gaussian noise, or by setting data to zero randomly), and the autoencoder is forced to learn to remove the noise and recover the uncontaminated input data. In the event of impaired input, the DAE can find more stable and more useful features. The structure of SDAE is shown in Fig. 1, where X is the original input data, X1 indicates the damaged input data, f and g denote the sigmoid functions, Y is the new feature obtained by encoding X1 with function f, and Z is the decoded output of Y using function g. The main difference between DAE and AE is that DAE destroys the original input data X through a random mapping with corruption probability P, $X \rightarrow X_1 \sim P(X_1 \mid X)$. Therefore, the cost function $L_D$ can be written as follows, and it is minimized by the



Fig. 9. The results of 3-dimensional clustering by using EEMD-FE-PCA-FCM/GK/GG A: Dataset A, B: Dataset B, C: Dataset C.

gradient descent algorithm [15,16].

$$\begin{aligned} L_D &= \arg\min \|X - Z\|^2 = \arg\min \|X - g_\delta(Y)\|^2 = \arg\min \|X - g_\delta(f_\delta(X_1))\|^2 \\ &= J(W, b) = \left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\bigl(x(i) - g(f(x_1(i)))\bigr)^2\right] + \frac{\lambda}{2}\sum_{l=1}^{2}\sum_{i=1}^{S_l}\sum_{j=1}^{S_{l+1}}\bigl(W_{ji}^{l}\bigr)^2 \end{aligned} \tag{5}$$

Compared with AE, DAE aims to improve robustness by introducing artificial noise and reconstructing the input data. In other words, part of the input vector X is set to 0 according to the probability P, and then the AE is used to calculate the output values. Several DAE models are stacked into an SDAE with N hidden layers. For a given vibration signal input vector X, the number of input layer nodes is the dimension of X. The second hidden layer is regarded as the second DAE, reconstructing the previous layer's data. Likewise, each subsequent hidden layer in SDAE is initialized by the previous layer. This process is conducted sequentially until the Nth DAE has been trained. The vibration signal feature vector Z is obtained through the final hidden layer. The structure of the SDAE is given in Fig. 1.
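The corruption-and-stacking procedure described above can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the layer sizes, corruption probability P, learning rate, and epoch count are assumptions, and gradient descent is reduced to a few plain steps. Each DAE zeroes inputs with probability P, reconstructs the clean data, and hands its codes to the next DAE; the final layer has 2 nodes for visualization:

```python
# Minimal SDAE pre-training sketch: greedy layer-wise training of DAEs
# with dropout-noise corruption. All hyperparameters are assumed values.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae(X, h, P=0.3, lr=0.5, epochs=200, seed=0):
    """Train one denoising autoencoder; return encoder weights and codes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
    W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        X1 = X * (rng.random(X.shape) > P)     # dropout corruption
        Y = sigmoid(X1 @ W1 + b1)              # encode corrupted input
        Z = sigmoid(Y @ W2 + b2)               # decode
        E = (Z - X) * Z * (1 - Z)              # error vs the *clean* X
        dW2 = Y.T @ E / n
        EY = (E @ W2.T) * Y * (1 - Y)          # back-propagated error
        dW1 = X1.T @ EY / n
        W2 -= lr * dW2; b2 -= lr * E.mean(0)
        W1 -= lr * dW1; b1 -= lr * EY.mean(0)
    return W1, b1, sigmoid(X @ W1 + b1)        # codes of the clean input

rng = np.random.default_rng(1)
X = rng.random((60, 64))                       # stand-in feature matrix
codes = X
for h in (32, 16, 2):                          # stack DAEs layer by layer
    _, _, codes = train_dae(codes, h)
print(codes.shape)                             # (60, 2): 2-D features
```

The 2-D codes from the final layer are what the clustering stage receives.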

2.3. Gath–Geva (GG) clustering algorithm

The fuzzy c-means (FCM) algorithm determines the degree to which a sample belongs to each cluster by using membership values. The FCM clustering algorithm can only reflect hyperspherical data structures, because it uses the standard Euclidean distance.

Gustafson and Kessel (GK) extended the standard fuzzy c-means algorithm by employing an adaptive distance norm to detect clusters of different geometrical shapes in one dataset.

Although the GK clustering algorithm addresses the isotropy problem of FCM, it still cannot fully adapt the cluster shape: each direction of the radius remains almost equal. As the distribution of the data is not always ball-shaped, it is necessary to change the distance measure; the shape of the cluster is determined by the matrix of the distance function. The GG clustering algorithm introduces a fuzzy maximum likelihood estimation (FMLE) distance norm, so it can reflect different shapes and orientations of the data. Note that the input data of GG are obtained from the final hidden layer of SDAE; hence the dimension of the input data equals the number of nodes in the final hidden layer of SDAE.

The GG method divides the dataset $Z = \{Z_1, Z_2, \ldots, Z_q, \ldots, Z_N\}$, $1 \le q \le N$, with $N$ samples into $c$ classes ($1 \le c \le N$). The size of each sample is $k$, where $k$ is the number of final-hidden-layer nodes in SDAE, so $Z_q = (z_{q1}, z_{q2}, \ldots, z_{qk})$. The membership matrix is $U = (u_{iq})_{c \times N}$, where $i = 1, 2, \ldots, c$ and $q = 1, 2, \ldots, N$; the element $u_{iq}$ is the membership of the $q$th sample in the $i$th class. Meanwhile $u_{iq} \in [0, 1]$, $\sum_{i=1}^{c} u_{iq} = 1$ for $1 \le q \le N$, and $0 < \sum_{q=1}^{N} u_{iq} < N$. The calculation steps of the GG method are as follows:

(1) Initialize the membership matrix $U = (u_{iq})_{c \times N}$ and the termination tolerance $\varepsilon > 0$.
(2) For $l = 1, 2, \ldots$



Fig. 10. The results of 2-dimensional clustering using SAE-FCM/GK/GG (a)–(c): Dataset A, (d)–(f): Dataset B, (g)–(i): Dataset C.

The cluster centers are calculated by

$$v_i^{l} = \sum_{q=1}^{N}\bigl(\mu_{iq}^{l-1}\bigr)^2 Z_q \Big/ \sum_{q=1}^{N}\bigl(\mu_{iq}^{l-1}\bigr)^2, \quad 1 \le i \le c,\; 1 \le q \le N \tag{6}$$

where $v_i$ is the $i$th cluster center of the matrix $V = (v_1, v_2, \ldots, v_i, \ldots, v_c)$, $1 \le i \le c$.

The distance measure $D_{iq}^2$ is then computed, based on the fuzzy covariance matrix $F_i^l$ of each cluster:

$$F_i^{l} = \frac{\sum_{q=1}^{N}\bigl(\mu_{iq}^{l-1}\bigr)^2 \bigl(Z_q - v_i^{l}\bigr)^{T}\bigl(Z_q - v_i^{l}\bigr)}{\sum_{q=1}^{N}\bigl(\mu_{iq}^{l-1}\bigr)^2}, \quad 1 \le i \le c \tag{7}$$

(3) The distance function is chosen as:

$$D_{iq}^{2}\bigl(Z_q, v_i\bigr) = \frac{(2\pi)^{\frac{n}{2}}\sqrt{\det(F_i)}}{\alpha_i}\,\exp\!\left(\frac{1}{2}\bigl(Z_q - v_i^{l}\bigr)^{T} F_i^{-1}\bigl(Z_q - v_i^{l}\bigr)\right) \tag{8}$$

where $\alpha_i$ is the average membership value of the $i$th cluster, $\alpha_i = \frac{1}{N}\sum_{q=1}^{N}\mu_{iq}$.

(4) Update the partition matrix.

$$\mu_{iq}^{(l)} = \frac{1}{\sum_{j=1}^{c}\bigl(D_{iq}(Z_q, v_i) / D_{jq}(Z_q, v_j)\bigr)^{2}} \tag{9}$$

Repeat until $\|U^{(l)} - U^{(l-1)}\| < \varepsilon$, where $\varepsilon$ is the termination tolerance of the clustering method.
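The four steps above can be condensed into a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the fuzzifier is fixed at 2 (matching the squared memberships in Eqs. (6)–(7)), initialization is random, the two-blob test data are synthetic, a small diagonal term guards the covariance inversion, and the exponent in Eq. (8) is capped to avoid overflow:

```python
# Compact Gath-Geva iteration following Eqs. (6)-(9); all data and
# numerical safeguards here are illustrative assumptions.
import numpy as np

def gath_geva(Z, c, eps=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, k = Z.shape
    U = rng.random((c, N))
    U /= U.sum(axis=0)                        # columns of U sum to 1
    for _ in range(max_iter):
        U_prev = U.copy()
        W = U ** 2                            # squared memberships
        D2 = np.empty((c, N))
        for i in range(c):
            v = W[i] @ Z / W[i].sum()         # Eq. (6): cluster center
            diff = Z - v
            F = (W[i, :, None] * diff).T @ diff / W[i].sum()   # Eq. (7)
            F += 1e-10 * np.eye(k)            # guard against singular F
            alpha = U[i].mean()               # average membership of cluster i
            m2 = np.einsum('nj,jk,nk->n', diff, np.linalg.inv(F), diff)
            D2[i] = ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(F))
                     / alpha) * np.exp(np.minimum(0.5 * m2, 700.0))  # Eq. (8)
        U = 1.0 / (D2 * (1.0 / D2).sum(axis=0))   # Eq. (9): update partition
        if np.abs(U - U_prev).max() < eps:        # termination test
            break
    return U

# Two well-separated synthetic blobs as a smoke test.
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(3, 0.1, (30, 2))])
U = gath_geva(Z, c=2)
print(U.shape, bool(np.allclose(U.sum(axis=0), 1.0)))
```

In practice GG is often initialized from an FCM or GK partition rather than at random, which makes the FMLE iteration considerably more stable.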

3. Dataset source and the procedures for the proposed method

3.1. Data source

The roller bearing fault diagnosis data came from Case Western Reserve University (CWRU) [34]. The experimental data were obtained from a motor-driven mechanical system with a sampling frequency of 12 kHz. The roller bearing faults were divided into four types: normal (NR), ball fault (BF), inner race fault (IRF), and outer race fault (ORF); these fault datasets with different fault diameters of 0.18 mm, 0.36 mm, and 0.54 mm were used in this paper. A detailed description of the roller bearing datasets is given in Table 1.

As shown in Table 1, the symbols A, B, and C represent the three subsets. Each subset contains 10 kinds



Fig. 11. The results of 3-dimensional clustering using SAE-FCM/GK/GG (a)–(c): Dataset A, (d)–(f): Dataset B, (g)–(i): Dataset C.

Table 1
The roller bearing experimental data under different conditions.

Datasets  Fault diameters (mm)  Fault type  Number of samples
A/B/C     0.18/0.36/0.54        NR          30/30/30
                                BF1         30/30/30
                                IRF1        30/30/30
                                ORF1        30/30/30
                                BF2         30/30/30
                                IRF2        30/30/30
                                ORF2        30/30/30
                                BF3         30/30/30
                                IRF3        30/30/30
                                ORF3        30/30/30

of roller bearing faults under different fault diameters. Meanwhile, each subset has 30 samples of 2048 points per condition; hence each dataset has 300 samples.
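A hypothetical sketch of how the sample layout in Table 1 can be produced: each condition's recording (a synthetic stand-in here, not an actual CWRU file) is cut into 30 non-overlapping segments of 2048 points:

```python
# Hypothetical segmentation of one condition's recording into the
# (30, 2048) sample matrix of Table 1. The signal is synthetic.
import numpy as np

def segment(signal, n_samples=30, length=2048):
    """Slice a 1-D vibration signal into (n_samples, length) windows."""
    assert signal.size >= n_samples * length, "recording too short"
    return signal[:n_samples * length].reshape(n_samples, length)

fs = 12_000                               # CWRU sampling frequency, 12 kHz
t = np.arange(30 * 2048) / fs
recording = np.sin(2 * np.pi * 58 * t)    # stand-in for one fault condition
samples = segment(recording)
print(samples.shape)                      # (30, 2048)
```

Repeating this for the 10 conditions of a subset yields its 300 samples.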

3.2. Evaluation of clustering effect

Two indicators, the partition coefficient (PC) and the classification entropy (CE), are used to assess the quality of the clustering results.

(1) The calculation of PC is given as:

$$PC = \alpha = \frac{1}{N}\sum_{i=1}^{c}\sum_{q=1}^{N}\bigl(\mu_{iq}\bigr)^2 \tag{10}$$

where $\mu_{iq}$ is the membership value of the $q$th sample in the $i$th cluster. The disadvantage of PC is its lack of a direct connection with any property of the data themselves. The optimal number of clusters corresponds to the maximum value of PC.

(2) CE measures only the fuzziness of the cluster partition, which is similar in purpose to PC.

$$CE = \beta = -\frac{1}{N}\sum_{i=1}^{c}\sum_{q=1}^{N}\mu_{iq}\log\bigl(\mu_{iq}\bigr) \tag{11}$$

A PC value close to 1 indicates that the clustering effect is good; conversely, a CE value close to 0 indicates a good clustering effect [33].
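Eqs. (10) and (11) translate directly into code; the membership matrices below are made-up examples chosen to check the two extremes of the indicators:

```python
# PC and CE of Eqs. (10)-(11), computed from a membership matrix U
# (c x N, columns summing to 1). The example matrices are made up.
import numpy as np

def partition_coefficient(U):
    return float(np.sum(U ** 2) / U.shape[1])               # Eq. (10)

def classification_entropy(U, eps=1e-12):
    # eps avoids log(0) for crisp memberships
    return float(-np.sum(U * np.log(U + eps)) / U.shape[1]) # Eq. (11)

U_crisp = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # near-ideal partition
U_fuzzy = np.full((2, 3), 0.5)                          # maximally fuzzy
print(partition_coefficient(U_crisp), partition_coefficient(U_fuzzy))  # 1.0 0.5
print(classification_entropy(U_fuzzy))                  # ~ log(2) ~ 0.693
```

A crisp partition gives PC = 1 and CE near 0; a maximally fuzzy one gives PC = 1/c and CE = log(c), matching the interpretation above.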

3.3. Procedures for the proposed method

In this section, the proposed method includes the following procedures:

(1) Data preprocessing: (a) Load the original vibration signal and obtain the frequency spectra of the roller bearing vibration



Fig. 12. The results of 2-dimensional clustering using SDAE-FCM/GK/GG (a)–(c): Dataset A, (d)–(f): Dataset B, (g)–(i): Dataset C.

signals under different conditions by using FFT. (b) Half of the coefficient matrix is selected as the input of SAE and SDAE for training. Before training the SAE and SDAE neural networks, all samples are normalized to [0,1], and the different parameters in SAE and SDAE are initialized, including the size of the input data, the iteration number, the denoising probability P, and the learning rate of the gradient descent algorithm.

(2) Feature extraction: Construct SAE and SDAE with several hidden layers, specifying the number of hidden layers, the number of nodes at each hidden layer, and the size of the input data. Then use the unlabeled and denoised data X1 with denoising probability P in Fig. 2 to train the SAE and SDAE layer by layer. Consider the first hidden layer in SAE and SDAE as the hidden layer of the first AE and DAE, and select the unlabeled data X1 as the input and the target output Z1 to train the first AE and DAE. Then use Z1 as the input and output to train the second AE and DAE, setting the parameters of the second hidden layer of the SAE and SDAE and obtaining Z2. Finally, continue the training steps until the Sl-th AE and DAE are trained and the extracted features are obtained; here Sl denotes the number of hidden layers. It should be noted that the number of nodes in the final hidden layer is fixed at 2 or 3 for data visualization. To compare the feature extraction performance of SDAE, SAE, and EEMD with FE, PCA is employed to reduce the dimension of the extracted features to 2 or

3 at each hidden layer, except the final hidden layer. Detailed information about PCA is given in reference [33].

(3) Fault diagnosis: Use the eigenvectors as the input of the FCM, GK, and GG models for roller bearing fault diagnosis.

(a) Initialize the different parameters in FCM, GK, and GG, including the number of clusters c and the termination tolerance ε.

(b) Calculate the cluster centers v and the distances $D_{iq}$ according to Eqs. (6)–(9).

(c) Update the partition matrix $U$ until the condition $\|U^{(l)} - U^{(l-1)}\| < \varepsilon$ is met.

(4) Comparison analysis: Compute PC, CE, and the classification accuracy, and compare the presented method with the other combination models.

The frame diagram of the proposed method is shown in Fig. 2.

4. Fault diagnosis by using the method presented

In this section, the experimental data are used to verify the presented method. First, the time domain of the roller bearing vibration signals in Table 1 is given in Fig. 3. (As space is limited, one sample of each type of original vibration signal in dataset A is given here.)

In Fig. 3, the ten kinds of vibration signals are difficult to distinguish because they have various amplitudes, and the NR and BF signals show no obvious regularity.



Fig. 13. The results of 3-dimensional clustering using SDAE-FCM/GK/GG (a)–(c) Dataset A, (d)–(f) Dataset B, (g)–(i) Dataset C.

However, the IRF and ORF signals have apparent vibration regularity and obvious vibration patterns. This is because the bearing inner and outer races experience impacts when the rollers of the bearing rotate and encounter the race surface. Therefore, the IRF and ORF vibration signals have strong periodic regularity.

First, the FFT is used to transform the time domain vibration signal into the frequency domain; an IRF2 signal is taken as an example here. The time and frequency domains of the IRF2 vibration signal are shown in Fig. 4. As shown in Fig. 4(b), the energy of the IRF signal is mainly concentrated in 0–1000 Hz; because the working frequency of the IRF signal is 58 Hz, the main frequency components are focused on 58 Hz and the double frequency (164 Hz).

This result indicates that the frequency domain signal contains useful feature information. Therefore, the FFT is used to preprocess the different vibration signals in the first step.
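This preprocessing step can be illustrated with a synthetic stand-in for an IRF signal. The 58 Hz line follows the text; the 12 kHz sampling rate, the harmonic, and the noise level are assumptions made only for the illustration:

```python
import numpy as np

fs, n = 12000, 2048                        # assumed sampling rate; 2048 points per sample
t = np.arange(n) / fs
rng = np.random.default_rng(0)
# toy IRF-like signal: a 58 Hz line, a weaker harmonic, and white noise
x = np.sin(2*np.pi*58*t) + 0.5*np.sin(2*np.pi*116*t) + 0.1*rng.normal(size=n)

spec = np.abs(np.fft.rfft(x))[:n // 2]     # one-sided amplitude spectrum: 1024 bins
freqs = np.fft.rfftfreq(n, 1/fs)[:n // 2]
print(spec.shape)                          # (1024,) -> the SAE/SDAE input size
# the dominant bin lies within one bin width (fs/n, about 5.9 Hz) of the 58 Hz line
print(bool(abs(freqs[spec.argmax()] - 58) < fs / n))
```

Keeping only the first half of the FFT coefficients, as in step (4) of the parameter settings, gives exactly the 1024-point input vector of the SAE and SDAE.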

After the FFT transformation, the EEMD with FE, SAE, and SDAE models are used to process the vibration signals. Several parameters must therefore be selected in the EEMD, FE, SAE, and SDAE models.

(1) EEMD: Two parameters must be set in EEMD, the ensemble number m and the amplitude of the added white noise ni(t) [7,8,33]. Generally speaking, an ensemble number of a few hundred leads to an exact result, and the residual noise causes less than a fraction of one percent of error if the added noise has a standard deviation that is a fraction of the standard deviation of the input signal. For the standard deviation (SD) of the added white noise, the authors of [7,8,33] suggested that the white noise should be about 20% of the standard deviation of the input signal. The parameter m is set as an integer multiple of 100 [7,8,33]; here, m = 100.
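A full EMD is beyond a short sketch, but the effect of these two parameter choices can be checked directly: white noise added at 0.2·SD of the signal averages out across the m ensemble members at a rate of about 1/√m, which is why an ensemble of a few hundred leaves well under one percent residual error. The signal below is a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 5 * t)                  # toy signal
m = 100                                        # ensemble number, as in the paper
noise_sd = 0.2 * x.std()                       # added white noise at 20% of the signal SD

# each EEMD trial decomposes x + noise by EMD; here we only average the
# noise-added copies to show how the residual noise cancels
ensemble = np.array([x + rng.normal(0.0, noise_sd, x.size) for _ in range(m)])
residual = ensemble.mean(axis=0) - x
print(bool(residual.std() < 2 * noise_sd / np.sqrt(m)))   # residual ~ noise_sd / sqrt(m)
```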

(2) FE: Three parameters must be selected before FE is calculated. The first, the embedding dimension m, is the length of the sequences to be compared in FE. Typically, a larger m allows a more detailed reconstruction of the dynamic process [25–27,34–38]; generally, m is fixed at 2 [25–27,34–38]. The similarity tolerance r and the parameter n determine the width and the gradient, respectively, of the boundary of the exponential function. If the FE similarity boundary determined by r and n is too narrow, the result suffers a salient influence from noise, while too broad a boundary, as mentioned above, should be avoided for fear of information loss. It is convenient to set the width of the boundary as r multiplied by the standard deviation (SD) of the original dataset. Empirically, r = (0.1–0.25)·SD [25–27,34–38]; the parameter r was set as 0.2·SD in the FE model in this paper. Finally, the parameter n was fixed at 2 [25–27,34–38].

Table 2
The results of α (PC) and β (CE).

Model         Dim  Dataset  FCM               GK                GG
                            α(PC)    β(CE)    α(PC)    β(CE)    α(PC)   β(CE)
EEMD-FE-PCA   2    A        0.6580   0.7735   0.7085   0.6630   0.9788  NaN
                   B        0.6894   0.6920   0.7123   0.6262   0.9613  NaN
                   C        0.7123   0.6440   0.7453   0.5619   0.9602  NaN
              3    A        0.6538   0.7738   0.6010   0.9004   0.9761  NaN
                   B        0.6728   0.7190   0.6624   0.7983   0.9677  NaN
                   C        0.7326   0.6211   0.6854   0.7543   0.9893  NaN
SAE           2    A        0.8028   0.4731   0.8507   0.3389   0.9817  NaN
                   B        0.7873   0.4608   0.8013   0.4339   0.9778  NaN
                   C        0.8205   0.4119   0.7963   0.4021   0.9775  NaN
              3    A        0.7938   0.5260   0.8507   0.3495   1       NaN
                   B        0.8398   0.3990   0.8910   0.2496   0.9999  NaN
                   C        0.8151   0.4451   0.8777   0.2848   0.9967  NaN
SDAE          2    A        0.9477   0.1321   0.9483   0.1001   0.9984  NaN
                   B        0.9257   0.1681   0.9577   0.0852   0.990   NaN
                   C        0.9612   0.0852   0.9735   0.0492   0.9997  NaN
              3    A        0.9483   0.1395   0.9949   0.0133   1       NaN
                   B        0.9568   0.1170   0.9726   0.0577   1       NaN
                   C        0.9930   0.0224   0.9812   0.0335   1       NaN
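The FE computation with these settings (m = 2, n = 2, r = 0.2·SD) can be sketched as below. Implementation details such as the number of comparison vectors follow one common variant and are assumptions, not the authors' exact code:

```python
import numpy as np

def fuzzy_entropy(x, m=2, n=2, r=None):
    """Fuzzy entropy: exponential similarity exp(-d**n / r) between
    baseline-removed embedding vectors of length m and m + 1."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                      # similarity tolerance 0.2 * SD

    def phi(dim):
        N = len(x) - dim
        emb = np.array([x[i:i + dim] - x[i:i + dim].mean() for i in range(N)])
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)  # Chebyshev distance
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)             # exclude self-matches
        return sim.sum() / (N * (N - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

rng = np.random.default_rng(0)
fe_sine = fuzzy_entropy(np.sin(np.linspace(0, 8 * np.pi, 500)))
fe_noise = fuzzy_entropy(rng.normal(size=500))
print(bool(fe_sine < fe_noise))                # a regular signal has the lower entropy
```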

(3) FCM/GK/GG: The parameter c, the number of clusters, is set to 10. Meanwhile, the termination tolerance is ε = 1e−6.

(4) SAE/SDAE: Each subset (A, B, C) containing 300 samples in Table 1 is preprocessed by FFT. Each sample contains 2048 points, and only half of the coefficient matrix is used after the FFT decomposition; therefore, the input size of the SAE and SDAE is 1024. Eight hidden layers were used in this paper. In order to reduce the dimension of the extracted feature, a triangular structure is used, so the number of nodes in each hidden layer is no more than half the number of nodes in the previous layer. The numbers of nodes of the hidden layers are 512, 256, 128, 64, 32, 16, 8, and 3 (or 2). Note that the number of nodes in the final hidden layer is 2 or 3 because this makes it easy to show the extracted feature with the GG clustering model. The weights and biases of the SAE and SDAE were initialized randomly. The lower the learning rate, the more slowly the cost function changes; a low learning rate is used to find a local minimum as reliably as possible, but the training then takes longer. Hence the number of iterations is selected as 3000 and the learning rate of the gradient descent algorithm as 0.1 in this paper. If the denoising probability P in SDAE is too large, the SDAE will produce a high error because too much information is missing, while decreasing the noise level P layer by layer can achieve a more satisfactory result [39]. In addition, the parameter P is usually fixed below 0.5 [40–42]. Therefore, the denoising probability P in SDAE is selected as 0.05 in this study; this value indicates that 5% of the input data are set to zero randomly.

To demonstrate that the feature extraction performance of the SDAE is superior to that of the SAE and EEMD-FE, principal component analysis (PCA) is used to reduce the dimension of the features extracted by the first seven hidden layers (all except the final hidden layer). The results of the first three PCs obtained by PCA for each hidden layer are given in Figs. 5–7 for the EEMD with FE, SAE, and SDAE models.

In Fig. 6, the title of each subfigure, such as ''SAE-512-(B1)'', denotes that the SAE model has 512 neural nodes in the first hidden layer when using dataset B. As can be seen from Figs. 6–7, as the number of hidden layers increases, the various preprocessed vibration signals are separated well. It should be noted that the dimension of all samples is already 3 in the final hidden layer, hence the 8th subfigures in Fig. 6 (v–x) and Fig. 7 (v–x) do not require PCA. In the previous layers 1 to 7, however, the dimensions of all samples' features are reduced by PCA to achieve the data visualization target. PC1, PC2, and PC3 represent the first three principal components.

In Figs. 6–7, the data of the same fault type are obviously scattered in the first four layers, and there may be overlap between data of different fault types in all the datasets. As the number of hidden layers increases, these scattered data points become more concentrated at one point and the data of different fault types become more separated from each other. As can be seen in Fig. 7, for all datasets, the data points of the same shape are more concentrated at one point, and there is a distinct separation between different fault data types in the final hidden layer compared with the first hidden layer. For example, all the NR signals' data points, which have a triangular shape, are concentrated (overlapping each other) in Fig. 7 (v–x) in all datasets in the final hidden layer; in the first hidden layer, however, they are scattered.

Compared with the SAE, as the number of hidden layers increases, the feature extraction ability of the SDAE improves. For SDAE-PCA-A/B/C-8, the scattered data points are more concentrated at one point and the data of different fault types are more separated from each other at the final hidden layer, whereas some samples remain scattered at the final hidden layer of the SAE. This result indicates that the robustness of the SDAE is better than that of the SAE.

As shown in Fig. 5, compared with the EEMD-FE model, the samples of each fault are close together in SDAE, especially in Fig. 7 (v–x), where each class looks like a single symbol, e.g. ''*'' (IRF3). In Fig. 5, however, the different fault types are slightly overlapped, and when the EEMD model is used, too many samples are spread out away from their cluster center points. The closer a cluster center point is to its corresponding sample points, the better the clustering is: if the distance between the cluster center point and a sample point is small, it is easy to judge which cluster the sample point belongs to. This indicates that the SDAE has good feature extraction ability. The extracted features are selected as the input of FCM, GK, and GG for roller bearing fault diagnosis; the results of two-dimensional and three-dimensional clustering using EEMD-FE-PCA-FCM/GK/GG, SAE-FCM/GK/GG, and SDAE-FCM/GK/GG are shown in Figs. 8–13.

As shown in Figs. 12–13, the symbol ''cc'' denotes the cluster center point of each type of sample.

(1) As can be seen in Figs. 8–13, the results of 3-dimensional clustering with the different models are generally better than those of 2-dimensional clustering; compare Figs. 8(a), 9(a), 10(a), 11(a), 12(a), and 13(a).

Page 14: Rollerbearingfaultdiagnosisusingstackeddenoisingautoencoderin … › seam › Web Content › Finalized... · 2019-10-09 · F.Xuetal./AppliedSoftComputingJournal73(2018)898–913

F. Xu et al. / Applied Soft Computing Journal 73 (2018) 898–913 911

Table 3
The results of classification accuracy by using different models. The accuracy of each cluster is given in %.

Mode                 Model  Dataset  NR     BF1    IRF1   ORF1   BF2    IRF2   ORF2   BF3    IRF3   ORF3   Total (%)
EEMD-FE-PCA (k = 2)  FCM    A        93.3   100    60     100    100    100    93.3   100    100    26.7   87.33
                            B        100    100    100    63.3   100    73.3   100    100    100    30     86.69
                            C        100    100    56.7   43.3   96.7   96.7   66.7   100    100    100    76.34
                     GK     A        80     100    60     100    100    100    100    100    100    50     89
                            B        76.7   100    100    66.7   100    100    100    100    100    66.7   91.01
                            C        93.3   100    100    50     50     100    100    100    100    86.7   88.01
                     GG     A        100    100    100    100    100    100    13.3   96.7   100    53.3   86.33
                            B        100    100    100    63.3   100    20     100    80     100    30     79.36
                            C        100    100    40     60     100    100    46.7   100    100    100    84.67
EEMD-FE-PCA (k = 3)  FCM    A        100    100    86.7   100    100    50     83.3   100    96.7   43.3   86
                            B        100    100    63.3   100    100    100    100    100    66.7   36.7   86.67
                            C        46.7   100    100    83.3   100    100    53.3   100    100    100    88.33
                     GK     A        93.3   100    60     100    100    66.7   100    100    100    43.3   80.33
                            B        100    100    70     100    100    100    33.3   100    73.3   100    87.66
                            C        100    100    100    93.3   50     100    100    100    50     100    89.33
                     GG     A        100    100    70     100    100    100    100    60     70     43.3   84.33
                            B        100    100    70     100    100    100    100    60     70     43.3   84.33
                            C        63.3   100    100    96.7   100    100    36.7   100    100    100    89.67
SAE (k = 2)          FCM    A        73.3   100    100    100    100    100    100    40     100    46.6   85.99
                            B        63.3   100    100    53.3   100    100    60     100    100    76.6   85.32
                            C        100    56.67  100    100    80     66.7   43.3   100    100    100    84.67
                     GK     A        93.3   100    100    43.3   100    100    100    56.7   100    100    89.33
                            B        66.7   100    100    96.7   100    100    30     100    100    80     87.34
                            C        100    30     100    16.7   100    90     56.7   100    100    100    79.34
                     GG     A        100    100    100    100    100    100    100    30     100    53.3   88.33
                            B        76.7   100    93.3   100    100    100    20     53.3   100    100    84.33
                            C        100    43.3   100    100    100    63.3   60     100    100    13.3   77.99
SAE (k = 3)          FCM    A        100    93.3   100    100    96.7   100    100    100    100    100    99
                            B        100    100    100    100    100    80     80     100    100    100    96
                            C        100    96.7   100    100    100    100    100    63.3   100    100    96
                     GK     A        43.3   100    100    100    100    100    100    100    100    66.7   91
                            B        100    100    60     100    93.3   63.3   100    100    100    100    91.66
                            C        100    100    50     100    100    100    83.3   100    100    23.3   85.66
                     GG     A        93.3   100    100    100    100    100    100    100    100    100    93.3
                            B        100    96.7   100    100    100    100    86.7   100    100    100    98.34
                            C        100    100    93.3   86.7   100    100    100    100    100    100    98
SDAE (k = 2)         FCM    A        100    100    100    100    93.3   43.3   100    56.7   100    100    89.33
                            B        33.3   100    100    100    100    100    100    100    63.3   100    89.66
                            C        100    100    100    100    100    100    46.7   100    100    53.3   90
                     GK     A        96.7   100    100    100    100    50     50     100    93.3   100    89
                            B        50     100    100    100    90     100    73.3   100    56.7   26.7   79.67
                            C        100    100    100    100    100    100    66.7   100    96.7   33.3   89.67
                     GG     A        100    100    100    100    93.3   43.3   100    56.7   100    100    89.33
                            B        10     100    100    100    100    100    100    100    93.3   100    93.3
                            C        100    100    100    100    100    100    13.3   100    100    86.7   90
SDAE (k = 3)         FCM    A        100    100    100    93.3   100    100    100    100    100    100    93.3
                            B        100    100    100    86.7   100    100    100    100    100    100    98.67
                            C        100    100    100    100    100    100    100    100    100    100    100
                     GK     A        100    100    100    100    93.3   100    100    100    100    93.3   98.66
                            B        56.7   100    100    90     100    100    100    100    100    100    94.67
                            C        100    100    100    40     100    96.7   100    100    100    60     89.67
                     GG     A        100    100    100    93.3   100    100    100    100    100    100    93.3
                            B        100    100    100    90     100    100    100    100    100    100    99
                            C        100    100    100    100    100    100    100    100    100    100    100

(2) Compared with FCM and GK, the shapes and orientations of the data structure in GG are closer to the actual data distribution. As shown in Figs. 8, 10 and 12, the data distribution in FCM is approximately round and that in GK resembles an oval, but the shape of the data in GG is neither round nor oval. This is because FCM and GK use the Euclidean distance to compute the distance between two samples, so they are suitable only for data with a spherical structure, whereas GG uses fuzzy maximum likelihood estimation to measure the norm distance between any two samples, so it can reflect data structures of different shapes and orientations.

(3) Fig. 13 shows that the different types of sample in all datasets are divided clearly when SDAE is used, but in Figs. 9 and 11 they are not divided well. These results indicate that the robustness of SDAE is superior to that of SAE.

(4) To further verify that the effect of GG is better than that of FCM and GK, two cluster indicators, α (PC) and β (CE) in Eqs. (10) and (11), were used to compare FCM, GK, and GG. In [26], the authors demonstrated that the closer the value of α (PC) is to 1, the better the clustering model performs, whereas the closer the value of β (CE) is to 0, the better. The values of α (PC) and β (CE) are calculated from the membership value µiq in Eq. (9), where µiq denotes the degree to which the qth sample belongs to the ith cluster. The greater µiq is, the more likely the sample is to be assigned to that cluster; meanwhile, the memberships of the qth sample sum to 1. The corresponding results are shown in Table 2.
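Both indicators can be computed directly from the partition matrix U. The sketch below assumes the usual definitions PC = (1/N) Σi Σq µiq² and CE = −(1/N) Σi Σq µiq ln µiq, and also shows why a membership of exactly 0, as in a crisp partition, yields the NaN entries reported in Table 2:

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/N) * sum of squared memberships; PC = 1 for a crisp partition."""
    return (U ** 2).sum() / U.shape[1]

def classification_entropy(U):
    """CE = -(1/N) * sum of mu * ln(mu); CE -> 0 is the crisp limit, but a
    membership of exactly 0 gives 0 * (-inf) = NaN."""
    return -(U * np.log(U)).sum() / U.shape[1]

U_fuzzy = np.array([[0.5, 0.4],        # c x N partition matrix,
                    [0.5, 0.6]])       # columns sum to 1
U_crisp = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
print(round(partition_coefficient(U_fuzzy), 2))         # 0.51
print(partition_coefficient(U_crisp))                   # 1.0
print(bool(np.isnan(classification_entropy(U_crisp))))  # True: 0 * ln(0)
```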

(a) As shown in Table 2, the value of PC increases gradually from top to bottom across the different modes, and the value of CE decreases gradually. When the GG model is used, the greatest value of PC is 1. All PC values in GG are greater than those in FCM and GK, and all CE values in SDAE are smaller than those in the SAE and EEMD-FE-PCA modes.

(b) Table 2 shows that all PC values in GG are greater than thosein the FCM and GK models. This indicates that the GG model isbetter than the FCM and GK models.

(c) In Table 2, k denotes the dimension of the extracted feature. The larger k is, the greater the value of PC, and vice versa for CE. Because k represents the number of PCs, the eigenvectors contain more feature information when the value of k increases; therefore, the PC values for k = 3 are greater than those for k = 2, and vice versa for CE. The value of ln µiq approaches negative infinity when µiq in Eq. (10) is 0, and 0 · (−∞) is not a number; hence the CE values in the final column are NaN. In summary, these results demonstrate that the proposed method is superior to the other combination models.

The classification accuracy is used to compare the effect of SDAE–GG and the other combination models. The classification accuracies of the different models are shown in Table 3, where the parameter k denotes the number of nodes in the final hidden layer of the SAE and SDAE, which is also the size of the input data of the GG model. Meanwhile, the accuracy of each cluster is also given in Table 3.

(1) The highest classification accuracy is 100%, achieved by the SDAE model when k = 3, as in the SDAE-FCM and SDAE–GG combinations. The lowest classification accuracy is 76.34%, obtained by the EEMD model when k = 2. As shown in Table 3, the classification accuracy of SDAE is overall higher than that of the SAE and EEMD models.

(2) The classification accuracy of GG is overall higher than that of the FCM and GK models. Table 3 shows that the accuracy of SDAE–GG is superior to that of SDAE-FCM and SDAE–GK. The results mentioned above demonstrate that the classification performance of SDAE–GG is better than that of the other combination models.

5. Conclusion

A method based on SDAE and GG for roller bearing fault diagnosis is proposed in this paper. Because the frequency domain of the vibration signal carries useful information, the FFT is deployed to transform the time domain vibration signal into a frequency domain series, and the transformed frequency domain signals are used as the input data for training the SDAE model. The SDAE, a deep model implemented with several hidden layers, is used to extract the useful information and reduce the dimension without PCA. To verify that the feature extraction performance of the SDAE is better than that of the SAE and EEMD with the FE model, PCA is selected to reduce the dimension of the eigenvectors obtained from the earlier hidden layers, except for the final hidden layer. The results show that, as the number of hidden layers increases, all the fault samples in SDAE under different conditions are separated better than those in the SAE and EEMD-FE models. Finally, the FCM, GK, and GG models are used to perform the roller bearing fault diagnosis. Three indicators, PC, CE, and classification accuracy, are selected to evaluate the identification performance of the presented method and the other combination models. The two indicators PC and CE, calculated from the membership values, show that SDAE–GG has a good clustering effect, and the classification accuracy demonstrates that the presented method is better than the other combination models referred to in this paper.

Acknowledgments

The work described in this paper is fully supported by a grant from the Research Grants Council (Project No. CityU11201315) and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. [T32-101/15-R]).

References

[1] Y. Huang, B.X. Wu, J.Q. Wang, Test for active control of boom vibration of a concrete pump truck, J. Vib. Shock 31 (2) (2012) 91–94, http://en.cnki.com.cn/Article_en/CJFDTOTAL-ZDCJ201202020.htm.

[2] F. Resta, F. Ripamonti, G. Cazzulani, Independent modal control for nonlinear flexible structures: an experimental test rig, J. Sound Vib. 329 (8) (2011) 961–972, http://dx.doi.org/10.1016/j.jsv.2009.10.021.

[3] G. Bagordo, G. Cazzulani, F. Resta, A modal disturbance estimator for vibration suppression in nonlinear flexible structures, J. Sound Vib. 330 (25) (2011) 6061–6069, http://dx.doi.org/10.1016/j.jsv.2011.07.014.

[4] X.B. Wang, S.G. Tong, Nonlinear dynamical behavior analysis on rigid flexible coupling mechanical arm of hydraulic excavator, J. Vib. Shock 33 (1) (2014) 63–70, http://en.cnki.com.cn/Article_en/CJFDTotal-ZDCJ201401012.htm.

[5] N.E. Huang, Z. Shen, S.R. Long, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454 (1998) 903–995, http://dx.doi.org/10.1098/rspa.1998.0193.

[6] Z.H. Wu, N.E. Huang, Ensemble empirical mode decomposition: a noise-assisted data analysis method, Adv. Adapt. Data Anal. 1 (2009) 1–41, http://dx.doi.org/10.1142/S1793536909000047.

[7] X.Y. Zhang, Y.T. Liang, Y. Zang, A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM, Measurement 69 (2015) 164–179, http://dx.doi.org/10.1016/j.measurement.2015.03.017.

[8] X.Y. Zhang, J.Z. Zhou, Multi-fault diagnosis for rolling element bearings based on ensemble empirical mode decomposition and optimized support vector machines, Mech. Syst. Signal Process. 41 (2013) 127–140, http://dx.doi.org/10.1016/j.ymssp.2013.07.006.

[9] Q. Miao, D. Wang, M. Pecht, Rolling element bearing fault feature extraction using EMD-based independent component analysis, in: IEEE PHM, 2011, pp. 1–6, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6024349.

[10] C. Yi, D. Wang, F. Wei, EEMD based steady-state indexes and their applications to condition monitoring and fault diagnosis of railway axle bearings, Sensors 18 (704) (2018) 1–21, https://www.mdpi.com/1424-8220/18/3/704/htm.

[11] D. Wang, W. Guo, P.W. Tse, An enhanced empirical mode decomposition method for blind component separation of a single-channel vibration signal mixture, J. Vib. Control 22 (11) (2014) 2603–2618, http://journals.sagepub.com/doi/abs/10.1177/1077546314550221.

[12] H.D. Shao, H.K. Jiang, H.W. Zhao, A novel deep autoencoder feature learning method for rotating machinery fault diagnosis, Mech. Syst. Signal Process. 95 (2017) 187–204, http://dx.doi.org/10.1016/j.ymssp.2017.03.034.

[13] M. Van, H.J. Kang, Bearing defect classification based on individual wavelet local fisher discriminant analysis with particle swarm optimization, IEEE Trans. Ind. Inf. 12 (2016) 124–135, http://dx.doi.org/10.1109/TII.2015.2500098.

[14] J. Feng, Y.G. Lei, L. Jing, Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data, Mech. Syst. Signal Process. (2016) 303–315, http://dx.doi.org/10.1016/j.ymssp.2015.10.025.

[15] C.C. Tan, C. Eswaran, Using autoencoders for mammogram compression, J. Med. Syst. 35 (2011) 49–58.

[16] F.Y. Lv, C.L. Wen, M.Q. Liu, Weighted time series fault diagnosis based on a stacked sparse autoencoder, J. Chemometrics 31 (9) (2017) 1–16, http://dx.doi.org/10.1002/cem.2912.

[17] Y.M. Qi, C.Q. Shen, D. Wang, Stacked sparse autoencoder-based deep network for fault diagnosis of rotating machinery, IEEE Access 5 (2017) 15066–15079, https://ieeexplore.ieee.org/abstract/document/7983338/.

[18] L.K. Wang, X.Y. Zhao, J.G. Pei, Transformer fault diagnosis using continuous sparse autoencoder, SpringerPlus 5 (2016) 1–13, http://dx.doi.org/10.1186/s40064-016-2107-7.

[19] S. Tang, C. Shen, D. Wang, Adaptive deep feature learning network with Nesterov momentum and its application to rotating machinery fault diagnosis, Neurocomputing (2018) 1–14, http://dx.doi.org/10.1016/j.neucom.2018.04.048.

[20] P. Vincent, H. Larochelle, I. Lajoie, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010) 3371–3408, http://www.jmlr.org/papers/v11/vincent10a.html.

[21] P. Vincent, H. Larochelle, Y. Bengio, Extracting and composing robust features with denoising autoencoders, in: International Conference on Machine Learning, 2008, pp. 1096–1103, https://dl.acm.org/citation.cfm?id=1390294.

[22] B. Leng, S. Guo, X. Zhang, X. Zhang, 3D object retrieval with stacked local convolutional autoencoder, Signal Process. 112 (2015) 119–128, http://dx.doi.org/10.1016/j.sigpro.2014.09.005.

[23] Y. Liu, X. Feng, Z. Zhou, Multimodal video classification with stacked contractive autoencoders, Signal Process. 120 (2015) 761–766, http://dx.doi.org/10.1016/j.sigpro.2015.01.001.

[24] J. Li, Z. Struzik, L. Zhang, A. Cichocki, Feature learning from incomplete EEG with denoising autoencoder, Neurocomputing 165 (2014) 23–31, http://dx.doi.org/10.1016/j.neucom.2014.08.092.

[25] L. Zhang, G.L. Xiong, H.S. Liu, Bearing fault diagnosis using multi-scale entropy and adaptive neuro-fuzzy inference, Expert Syst. Appl. 37 (2010) 6077–6085, http://dx.doi.org/10.1016/j.eswa.2010.02.118.

[26] F. Xu, Y.J. Fang, Z.M. Kong, A fault diagnosis method based on MBSE and PSO-SVM for roller bearings, J. Vib. Eng. Technol. 4 (2016) 383–394, https://www.researchgate.net/profile/Zhengmin_Kong/publication/308051735_A_fault_diagnosis_method_based_on_mbse_and_pso-SVM_for_roller_bearings/links/582725a008ae5c0137edcc66/A-fault-diagnosis-method-based-on-mbse-and-pso-SVM-for-roller-bearings.pdf.

[27] X.J. Guo, C.Q. Shen, L. Chen, Deep fault recognizer: An integrated model to denoise and extract features for fault diagnosis in rotating machinery, Appl. Sci.-Basel 7 (41) (2017) 1–17, http://www.mdpi.com/2076-3417/7/1/41.

[28] S.Q. Zhang, G.X. Sun, L. Li, Study on mechanical fault diagnosis method based on LMD approximate entropy and fuzzy C-means clustering, Chin. J. Sci. Instrum. 34 (3) (2013) 714–720, http://en.cnki.com.cn/Article_en/CJFDTotal-YQXB201303034.htm.

[29] D.E. Gustafson, W.C. Kessel, Fuzzy clustering with a fuzzy covariance matrix, in: IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, 1979, pp. 761–766, http://ieeexplore.ieee.org/abstract/document/4046215/.

[30] S.T. Wang, L. Li, S.Q. Zhang, Mechanical fault diagnosis method based on EEMD sample entropy and GK fuzzy clustering, Chin. J. Sci. Instrum. 24 (22) (2013) 3036–3044.

[31] I. Gath, A.B. Geva, Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 11 (7) (1989) 773–781, http://ieeexplore.ieee.org/abstract/document/192473/.

[32] J.C. Bezdek, J.C. Dunn, Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions, IEEE Trans. Comput. (1975) 835–838, https://ieeexplore.ieee.org/abstract/document/1672910/.

[33] F. Xu, Y.J. Fang, R. Zhang, PCA-GG rolling bearing clustering fault diagnosis based on EEMD fuzzy entropy, Comput.-Integr. Manuf. Syst. 22 (11) (2016) 2631–2642.

[34] Case Western Reserve University, Bearing data center: seeded fault test data, http://csegroups.case.edu/bearingdatacenter/pages/download-data-file (Accessed 2013).

[35] S.M. Pincus, Approximate entropy as a measure of system complexity, Proc. Natl. Acad. Sci. 88 (1991) 2297–2301, http://dx.doi.org/10.1073/pnas.88.6.2297.

[36] R.Q. Yan, R.X. Gao, Approximate entropy as a diagnostic tool for machine health monitoring, Mech. Syst. Signal Process. 21 (2007) 824–839, http://dx.doi.org/10.1016/j.ymssp.2006.02.009.

[37] W. Chen, J. Zhuang, W. Yu, Measuring complexity using FuzzyEn, ApEn, and SampEn, Med. Eng. Phys. 31 (2009) 61–68, http://dx.doi.org/10.1016/j.medengphy.2008.04.005.

[38] G. Xiong, L. Zhang, H. Liu, A comparative study on ApEn, SampEn and their fuzzy counterparts in a multiscale framework for feature extraction, J. Zhejiang Univ. Sci. A (Appl. Phys. Eng.) 11 (4) (2010) 270–279, https://link.springer.com/article/10.1631/jzus.A0900360.

[39] J.W. Leng, Q.X. Chen, N. Mao, Combining granular computing technique with deep learning for service planning under social manufacturing contexts, Knowl.-Based Syst. 143 (2018) 295–306.

[40] J. Dolz, N. Betrouni, M. Quidet, Stacking denoising auto-encoders in a deep network to segment the brainstem on MRI in brain cancer patients: A clinical study, Comput. Med. Imaging Graph. 52 (2016) 8–18.

[41] R.L. Tang, X. Li, J.G. Lai, A novel optimal energy management strategy for a maritime hybrid energy system based on large-scale global optimization, Appl. Energy 228 (2018) 254–264.

[42] R.L. Tang, Z. Wu, X. Li, Optimal operation of photovoltaic/battery/diesel/cold-ironing hybrid energy system for maritime application, Energy 162 (2018) 697–714.