A Novel Toolbox for Bearing Fault Detection Based on PCC and Residual Blocks
Chengkun Li1*, Yufan Lin1*, Yujing Liu1*, Qi Gao1
1 Beijing Institute of Technology
E-mail: {1120171048, 1120171146, 1120171028, gaoqi}@bit.edu.cn
[Received 22/07/2020; accepted 14/08/2020]
Bearings are essential components of mechanical systems, and bearing fault detection is of great importance in bearing production and system fault diagnosis. In this paper, a novel toolbox for bearing fault detection using bearing vibration signals is proposed. Two baseline models are included: 1. Baseline for Feature Engineering Based Method, which consists of three steps: time-frequency feature extraction, Pearson Correlation Coefficient (PCC) reduction and classification. 2. Baseline for Deep Learning Based Methods: a powerful deep neural network model consisting of Convolutional Blocks and Residual Blocks. The experimental results show that the methods in our toolbox are sufficiently robust to produce results with accuracy between 98% and 100%.
Keywords: Rolling bearing, Fault detection, Machine Learning, Deep Learning, Time series classification
1. Introduction
Bearings are among the most widely used parts of mechanical equipment. Their running state directly affects the performance and production safety of the equipment, so research on rolling bearing fault diagnosis technology is of considerable significance [1]. Different sensing modalities for solving the bearing fault detection problem have been explored, including vibration [2], [3], acoustic noise [4], [5], stator current [6], [7], thermal imaging [8], and multiple sensor fusion [9], among which vibration analysis is the most dominant [10]. The existence of a bearing fault as well as its specific fault type can be readily determined by performing frequency spectral analysis [11]. However, accurately identifying the presence of a bearing fault can be challenging in practice. The uniqueness of a bearing failure lies in its multi-physics nature. An abnormal electric signal is triggered by the primary mechanical vibration due to the bearing defect. The vibration further influences the output torque, the motor speed, and finally the bearing vibration pattern itself. Furthermore, the accuracy of physical model-based vibration analysis can be further affected by background noise, and its sensitivity is subject to change with respect to sensor mounting positions and spatial constraints in a highly compact environment. Therefore, more robust and reliable methods of constructing the classification model are required.
* indicates equal contributions.
The problem of bearing fault detection via vibration signals is essentially a time series classification problem. For traditional machine learning classifiers, features are usually extracted from the time domain, the frequency domain and the time-frequency domain. A commonly used time-frequency feature extraction method is empirical mode decomposition (EMD), where cubic splines are used to define the upper and lower envelopes in each sift [12]. In [13], generalized empirical mode decomposition (GEMD), empirical envelope demodulation (EED) and the Hilbert-Huang transform (HHT) were used. Xu [14] and his group applied an algorithm based on empirical mode and spectral kurtosis to simplify the model, and Mou [15] proposed a new frequency spectrum-based feature extraction method.
The wavelet packet transform is widely used in bearing fault analysis. Li et al. [16] proposed a method of introducing the wavelet packet transform into bearing fault analysis. Guo et al. [17] obtained enhanced fault features by extracting wavelet packet energy spectrum features.
For Deep Learning methods, Li et al. [18] proposed a BP network model combined with the VMD (Variational Mode Decomposition) algorithm to tackle the bearing fault detection problem, and Wen et al. [19] proposed a bearing fault detection pipeline. Moreover, in terms of Deep Learning-based solutions for time series classification (TSC), Cui et al. [20] proposed a neural network model called Multi-scale Convolution Neural Network based on a multi-spec convolutional layer. Wang et al. [21] proposed three end-to-end deep neural networks for the TSC problem: Multiple Layer Perceptron (MLP), Fully Convolutional Network (FCN), and Residual Network (ResNet). Their work provides an important reference for deep networks that use raw time series as input without the need for data augmentation and data preprocessing.
In this paper, a toolbox for bearing fault detection is proposed, which includes two baseline methods:
• A feature dimension reduction pipeline based on the Pearson Correlation Coefficient, which effectively ensures both the efficiency and the robustness of detection.
• A simple yet powerful deep learning baseline model based on Convolutional Blocks and Residual Blocks, which achieves comparable performance with relatively low computing cost.
The toolbox could serve as a baseline method in the field of bearing fault detection due to its simplicity and robustness.
The 9th International Symposium on Computational Intelligence and Industrial Applications (ISCIIA2020), CITIC Jingling Hotel Beijing, Beijing, China, Oct. 31-Nov. 3, 2020
2. Baseline Model for Feature Engineering Based Method
In this part, a Feature Engineering Baseline model based on feature reduction with the Pearson correlation coefficient is proposed for rolling bearing fault detection. The model is tested on the dataset described in Section 4.2 and reaches 100% test accuracy.
The performance of our model is compared with a widely used feature reduction method, Principal Component Analysis (PCA). The extraction of features, the reduction of features and the training of classifiers are described in the following parts.
2.1. Preprocessing and Feature Extraction
For each vibration sequence, $A_{i,j}$ denotes the $i$-th value of the $j$-th kind of preprocessed data, and $A_i$ denotes the $i$-th sampled point of a vibration sequence with $2n$ sampled points. The sampled vibration signal is our first kind of preprocessed data, denoted as
$$A_{i,1} = A_i \quad (i = 1, 2, 3, \dots, 2n) \tag{1}$$
Taking the absolute value of the raw data gives the second kind of preprocessed data, denoted as
$$A_{i,2} = |A_{i,1}| \quad (i = 1, 2, 3, \dots, 2n) \tag{2}$$
Taking the Fourier transform of the raw signal and keeping the first $n$ dimensions, in view of the symmetry of the spectrum, gives the third kind, denoted as
$$A_{i,3} \quad (i = 1, 2, \dots, n) \tag{3}$$
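The three preprocessing steps of Eqs. (1) to (3) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function name `preprocess` is hypothetical, and taking the magnitude of the FFT coefficients is an assumption, since the text only says that the first n dimensions of the Fourier transform are kept.

```python
import numpy as np

def preprocess(signal):
    """Return the three preprocessed views of one vibration sequence."""
    a1 = np.asarray(signal, dtype=float)   # Eq. (1): raw samples, length 2n
    a2 = np.abs(a1)                        # Eq. (2): absolute values
    n = a1.size // 2
    # Eq. (3): first n FFT coefficients (magnitude is an assumption here)
    a3 = np.abs(np.fft.fft(a1))[:n]
    return a1, a2, a3

# Example with a synthetic 6000-point sequence (the length used in Section 4.2)
rng = np.random.default_rng(0)
a1, a2, a3 = preprocess(rng.standard_normal(6000))
```

Only the first half of the spectrum is kept because the FFT of a real-valued signal is conjugate-symmetric, so the second half carries no additional magnitude information.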
After the preprocessing of the data, features can be extracted from the preprocessed data.
For the rolling bearing vibration data, the sum of the vibration displacement of the time-domain signal is extracted as a feature, denoted as
$$\mathrm{sum} = \sum_{i=1}^{2n} A_{i,1} \tag{4}$$
Due to the symmetrical property of time-domain signals, the sum of the absolute values of the time-domain signal is extracted as another feature, denoted as
$$\mathrm{abs\_sum} = \sum_{i=1}^{2n} A_{i,2} \tag{5}$$
The maximum and minimum of the absolute values of the time-domain vibration signal are also taken as features, denoted as
$$\max = \max_i (A_{i,2}), \qquad \min = \min_i (A_{i,2}) \tag{6}$$
In the time domain of the vibration signal, the vibration amplitudes are sorted from large to small and their quantiles are taken: the 90th to 99th quantiles and the 5th quantile. The $k$-th quantile is denoted as $\mathrm{per}_k$. The median is denoted as median.
The mean value of the time series is calculated and denoted as
$$\mathrm{mean} = \frac{1}{2n} \sum_{i=1}^{2n} A_{i,1} \tag{7}$$
Then we obtain the standard deviation of the vibration time series, denoted as
$$\mathrm{std} = \sqrt{\frac{\sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2}{2n}} \tag{8}$$
The variance of the time series is denoted as
$$\mathrm{var} = \frac{\sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2}{2n} \tag{9}$$
The skew reflects the distribution of the data relative to the average value; a normal distribution has a skew of 0. The skewness of the vibration signal is calculated as follows:
$$\mathrm{skew} = \frac{\frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^3}{\left( \frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2 \right)^{3/2}} \tag{10}$$
Kurtosis is used to describe the steepness or smoothness of the data distribution. The kurtosis feature of the rolling bearing signal is calculated as follows:
$$\mathrm{kurtosis} = \frac{\frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^4}{\left( \frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2 \right)^{2}} - 3 \tag{11}$$
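The time-domain statistics of Eqs. (4) to (11) could be computed in one pass as sketched below. The function name `time_domain_features` and the dictionary keys are hypothetical. Two points are assumptions rather than statements from the paper: the quantiles are taken over the absolute values of the signal, and the median is taken over the raw signal.

```python
import numpy as np

def time_domain_features(a1):
    """Compute the statistical features of Eqs. (4)-(11) for one sequence."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.abs(a1)
    mean = a1.mean()                          # Eq. (7)
    centered = a1 - mean
    var = np.mean(centered ** 2)              # Eq. (9): population variance
    feats = {
        "sum": a1.sum(),                      # Eq. (4)
        "abs_sum": a2.sum(),                  # Eq. (5)
        "max": a2.max(),                      # Eq. (6)
        "min": a2.min(),                      # Eq. (6)
        "median": np.median(a1),              # assumption: median of raw signal
        "mean": mean,
        "std": np.sqrt(var),                  # Eq. (8)
        "var": var,
        "skew": np.mean(centered ** 3) / var ** 1.5,        # Eq. (10)
        "kurtosis": np.mean(centered ** 4) / var ** 2 - 3,  # Eq. (11)
    }
    # 5th and 90th..99th quantiles (assumption: of the absolute amplitudes)
    for k in [5] + list(range(90, 100)):
        feats[f"per{k}"] = np.percentile(a2, k)
    return feats

feats = time_domain_features([1.0, 2.0, 3.0, 4.0])
```

Note that Eqs. (8) and (9) divide by 2n rather than 2n - 1, so the sketch uses the population (not sample) variance.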
Maximum amplitude feature extraction for spectrum segmentation: for different fault types, the amplitude distribution of each sample in the spectrum is not identical, so the spectrum can be divided into $k_1$ sub-spectrums of the same size, and the top $k_2$ frequency amplitude values of each frequency band are taken to characterize that band. Compared with random sampling of the spectrum, this method better represents the characteristics of a given frequency segment. The $j$-th largest value in group $i$ is denoted as FFT_k1×k2_Gi_jth.
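A possible implementation of the segmented-spectrum feature is sketched below. The function name `segmented_spectrum_peaks` and the default values of `k1` and `k2` are illustrative assumptions; the paper does not state the values it used. Any remainder that does not fill a whole band is dropped here so that all bands have equal size.

```python
import numpy as np

def segmented_spectrum_peaks(signal, k1=8, k2=3):
    """Split the half spectrum into k1 equal bands and keep the k2 largest
    amplitudes of each band, sorted in descending order."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.fft(signal))[: signal.size // 2]
    usable = (spectrum.size // k1) * k1       # drop remainder for equal bands
    bands = spectrum[:usable].reshape(k1, -1)
    return -np.sort(-bands, axis=1)[:, :k2]   # row i, column j: (j+1)-th largest in band i

# A pure tone at FFT bin 100 falls into the first band
t = np.arange(6000)
peaks = segmented_spectrum_peaks(np.sin(2 * np.pi * 100 * t / 6000))
```

Row i, column j of the result corresponds to the feature written as FFT_k1×k2_Gi_jth in the text, with zero-based indices.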
The time-frequency analysis method analyzes and processes the signal in the time-frequency domain. Wavelet packet decomposition is an orthogonal decomposition method formed on the basis of multi-resolution wavelet transformation. Liu Tao [22] put forward the concept of wavelet packet node energy and found that, compared with extracting the wavelet packet decomposition coefficients directly, the node energy has better stability. The wavelet packet node energy is defined as follows. Assume we apply a $j$-level decomposition to the vibration signal $x(t)$ and get $2^j$ child frequency bands; the energy of the $w$-th band is denoted as
$$En_w = \sum |f_w|^2, \quad w = 0, 1, 2, \dots, 2^j - 1 \tag{12}$$
where $f_w$ represents the $w$-th component after decomposition of $x(t)$. Then the wavelet packet energy spectrum for decomposition level $j$ is denoted as
$$En_j = \left[ En_0, En_1, En_2, \dots, En_{2^j - 1} \right]^T \tag{13}$$
The norm is a measure of energy; the energy characteristic of each node can be obtained by taking the 2-norm of the node's frequency band signal.
The algorithm flow of the wavelet packet feature extraction used in this paper is as follows:
1. Decompose the acquired original bearing vibration signal with a 3-layer wavelet packet.
2. Obtain 14 frequency bands, from low frequency to high frequency, in the first 3 layers.
3. Reconstruct the wavelet packet decomposition coefficients and calculate the norm of each frequency band signal.
4. Take the norm of each frequency band signal as an element to construct the feature vector
$$V = [N_1, N_2, N_3, \dots, N_{12}, N_{13}, N_{14}]$$
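The four steps above can be sketched with a hand-written Haar wavelet packet in NumPy. This is only an illustration under stated assumptions: the paper does not name its wavelet, so the Haar filter is a stand-in; the helper names `haar_split` and `wavelet_packet_norms` are hypothetical; and the nodes are returned in natural (decomposition) order rather than sorted by frequency. Because the Haar transform is orthonormal, the norm of a node's coefficients equals the norm of the band signal reconstructed from that node alone, so the explicit reconstruction of step 3 is skipped here.

```python
import numpy as np

def haar_split(x):
    """One orthonormal Haar step: return (approximation, detail) coefficients."""
    x = x[: x.size // 2 * 2]                  # use an even number of samples
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def wavelet_packet_norms(signal, levels=3):
    """2-norm of every wavelet packet node on levels 1..levels.

    For levels=3 this gives 2 + 4 + 8 = 14 values, matching V = [N1..N14].
    """
    nodes = [np.asarray(signal, dtype=float)]
    norms = []
    for _ in range(levels):
        # split every node of the previous level into two children
        nodes = [child for node in nodes for child in haar_split(node)]
        norms.extend(np.linalg.norm(node) for node in nodes)
    return np.array(norms)

rng = np.random.default_rng(0)
x = rng.standard_normal(6000)
V = wavelet_packet_norms(x)
```

The squared norms on each level sum to the energy of the input signal, which is a quick sanity check that the decomposition is orthonormal.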
2.2. Feature Dimension Reduction
With the extracted features, two methods are adopted for feature reduction: feature selection based on the Pearson correlation coefficient (PCC) and dimension reduction based on PCA. Our method, feature selection based on PCC, is analyzed in comparison with the reference model based on PCA.
PCA is a technique for analyzing and simplifying data. By applying a linear transformation, the problem is transformed from high-dimensional to low-dimensional. Through dimension reduction, complex multidimensional data is converted into simple, intuitive and uncorrelated low-dimensional data, which effectively reduces the difficulty and complexity of data analysis [23]. PCA is a common method of feature reduction in rolling bearing fault detection, and we compare our new baseline method with the PCA-based feature extraction method.
The Pearson correlation coefficient is a metric that reflects the linear correlation between two variables, and it can be applied to feature reduction. Based on the extracted features, Pearson correlation coefficients are calculated between every two features, as well as between each feature and the category label. The PCC between two features $X$ and $Y$ is calculated as follows:
$$r(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \tag{14}$$
where $\mathrm{cov}(X, Y)$ is the covariance of the two features, $\mathrm{Var}[X]$ is the variance of $X$, and $\mathrm{Var}[Y]$ is the variance of $Y$.
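As a quick check of Eq. (14), the coefficient can be computed directly from the covariance and the two variances. The helper name `pearson` is hypothetical; the result should agree with NumPy's built-in `np.corrcoef`.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient, computed term by term as in Eq. (14)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # cov(X, Y)
    return cov / np.sqrt(x.var() * y.var())          # divide by sqrt(Var[X] Var[Y])

r_pos = pearson([1, 2, 3], [2, 4, 6])   # perfectly linear, increasing
r_neg = pearson([1, 2, 3], [3, 2, 1])   # perfectly linear, decreasing
```

The covariance and both variances use the same normalization (division by the number of samples), so the normalization cancels in the ratio.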
Fig. 1. : The correlation coefficient matrix among the first 50 features
Certain patterns can be seen in the correlation coefficient matrix of the top 50 features most correlated with the category label (shown in Fig 1). Moreover, the experimental results show that features with a high correlation coefficient with the category label perform well in classification, while high correlation coefficients among the selected features lead to overfitting and degradation of the classifiers.
Therefore, when performing feature dimension reduction based on the Pearson correlation coefficient, the correlation coefficients between the features and the category label are calculated first. Secondly, the features that have a high correlation with the category label are selected. Thirdly, features that have high correlation coefficients with other features are removed. With these steps, the feature dimension is reduced by selecting the features that are effective for classification, and overfitting of the classifiers on the selected features is effectively avoided.
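The three selection steps can be sketched as a greedy filter. This is one plausible reading of the procedure, not the authors' implementation: the function name `pcc_select` and its parameters are hypothetical, absolute correlation values are used, and the category label is treated as a numeric variable. The defaults of 128 candidates, 40 kept features and a 0.9 threshold are the values reported in Section 4.3. Constant features are assumed to have been removed beforehand, since their correlation is undefined.

```python
import numpy as np

def pcc_select(X, y, n_candidates=128, n_keep=40, threshold=0.9):
    """Select features that correlate with the label but not with each other."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    label_corr = corr[:d, d]       # step 1: |PCC| of each feature with the label
    feat_corr = corr[:d, :d]       # |PCC| between every pair of features
    # step 2: candidates ranked from high to low label correlation
    candidates = np.argsort(-label_corr)[:n_candidates]
    selected = []
    for j in candidates:
        # step 3: skip a feature that is highly correlated with a kept one
        if all(feat_corr[j, s] < threshold for s in selected):
            selected.append(int(j))
        if len(selected) == n_keep:
            break
    return selected

rng = np.random.default_rng(0)
y = np.arange(100) % 10
f0 = y + 0.1 * rng.standard_normal(100)
X = np.column_stack([f0, 2 * f0 + 1, rng.standard_normal(100)])
chosen = pcc_select(X, y, n_candidates=3, n_keep=2)
```

In the toy example the second column is a linear copy of the first, so only one of the two is kept, and the unrelated third column fills the remaining slot.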
2.3. Classifier Selection and Training
Before training, the features are first normalized to speed up training and improve classification accuracy. Support Vector Machine, Random Forest, K-Nearest Neighbor and Multi-layer Perceptron classifiers are trained. The classifier hyperparameters are selected by cross-validation, with part of the training set held out as the validation set.
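A minimal sketch of normalization followed by cross-validated hyperparameter selection, shown for the SVM only and using scikit-learn. The library choice, the use of `StandardScaler` for normalization, the 5-fold split and the parameter grid are all assumptions for illustration; the paper does not specify them.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_svm(X_train, y_train):
    """Normalize features, then pick SVM hyperparameters by cross-validation."""
    pipeline = make_pipeline(StandardScaler(), SVC())
    # hypothetical grid; make_pipeline names the SVC step "svc"
    grid = {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.01]}
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    return search

# Toy data: two well-separated classes with five features each
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(4, 1, (40, 5))])
y_demo = np.array([0] * 40 + [1] * 40)
search = fit_svm(X_demo, y_demo)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the validation folds do not leak into the normalization statistics.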
3. Baseline for Deep Learning Based Methods
In this part, we propose a Deep Learning Baseline model for rolling bearing fault detection based on residual layers. The structure of the model is shown in Fig 2. The model is tested on the dataset described in Section 4.2 and reaches 100% test accuracy.
[Figure 2 diagram. Example input: "Fault of the Outer Race of Diameter 3" (x-axis: time stamp, 0 to 6000; y-axis: vibration displacement, -5 to 5). Layer sequence: Input (1 × 6000) → Conv 1×6, BN, ReLU → Conv 1×4, BN, ReLU → Conv 1×3, BN, ReLU → Conv 1×6, BN, ReLU → Conv 1×4, BN, ReLU → Conv 1×3, BN, ReLU → Conv 1×6, BN, ReLU → Conv 1×4, BN, ReLU → Conv 1×3 → + (skip connection) → BN, ReLU → Average → Fully connected (10 outputs). Feature maps are labeled with 64 or 128 channels over 6000 time steps.]
Fig. 2. : Structure of our baseline model for rolling bearing fault detection based on Residual Blocks
Fig. 3. : Structure of the reference model
The performance of our model is compared with a reference model that is popular for the given dataset. It consists of a 1D convolution layer, a pooling layer, a dropout layer and a fully connected layer; its structure is shown in Fig 3. Our model mainly consists of Convolution Blocks and Residual Blocks, which are described in the following parts.
3.1. Convolution Block
Convolution blocks make up most of our model (see blocks #2 to #7 in Fig 2). The function of an individual block can be described as
$$y = W \ast x + b \tag{15}$$
$$s = \mathrm{BN}(y) \tag{16}$$
$$\mathrm{out} = \mathrm{ReLU}(s) \tag{17}$$
where BN stands for Batch Normalization and $W$ is the learned weights. These blocks serve as up-sampling structures and low-level vibration feature extractors.
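Eqs. (15) to (17) map naturally onto a small PyTorch module, sketched below. The class name `ConvBlock` is hypothetical, and the zero padding that keeps the sequence length unchanged is an assumption inferred from the constant length of 6000 shown in Fig 2; the paper does not describe its padding scheme.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """1D convolution followed by batch normalization and ReLU."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        total = kernel_size - 1
        left = total // 2
        # pad so that the output length equals the input length (assumption)
        self.pad = nn.ConstantPad1d((left, total - left), 0.0)
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)
        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.conv(self.pad(x))   # Eq. (15): y = W * x + b
        s = self.bn(y)               # Eq. (16): s = BN(y)
        return self.relu(s)          # Eq. (17): out = ReLU(s)

block = ConvBlock(in_channels=1, out_channels=64, kernel_size=6)
out = block(torch.randn(2, 1, 6000))   # batch of two raw sequences
```

Explicit asymmetric padding is used because an even kernel size such as 6 cannot be centered with a single symmetric padding value.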
3.2. Residual Block
The residual network in Deep Learning was initially proposed by He et al. [24], who showed that adding skip-connect layers can avoid a significant decrease in network performance (also known as degradation) in deeper networks. In our model, we placed a Residual Block near the end of the model (blocks #8 to #10 in Fig 2) to improve the overall classification performance. Note that the underlying condition for using a Residual Block is that the dimension of the input must be the same as that of the output, so that an identity shortcut connection can be added:
$$h_1 = \mathrm{Block}_{k_1}(x) \tag{18}$$
$$h_2 = \mathrm{Block}_{k_2}(h_1) \tag{19}$$
$$h_3 = \mathrm{Block}_{k_3}(h_2) \tag{20}$$
$$y = h_3 + x \tag{21}$$
$$h = \mathrm{ReLU}(y) \tag{22}$$
$$\mathrm{out} = \mathrm{BN}(h) \tag{23}$$
The dimensions for the Convolution Blocks and the Residual Block are [64, 64, 128] in our model, and the final classification is performed by a fully connected layer with the dimension of [128 × 10]. The input of the model is the original vibration sequence with the dimension of [1 × 6000]. The convolution kernel sizes in our model are set to {6, 4, 3} to extract local features of the vibration signal.
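A Residual Block following Eqs. (18) to (23), with a pooling and classification head, could look like the sketch below. Several points are assumptions: each Block is taken to be a full convolution block (convolution, BN, ReLU), although Fig 2 suggests the third convolution may omit BN and ReLU; the helper names `conv_block` and `ResidualBlock` are hypothetical; and the head averages over the time axis before the [128 × 10] fully connected layer. A shortened sequence is used in the example only to keep it fast.

```python
import torch
from torch import nn

def conv_block(channels_in, channels_out, k):
    """Length-preserving convolution block: pad, Conv1d, BN, ReLU."""
    left = (k - 1) // 2
    return nn.Sequential(
        nn.ConstantPad1d((left, k - 1 - left), 0.0),
        nn.Conv1d(channels_in, channels_out, k),
        nn.BatchNorm1d(channels_out),
        nn.ReLU(),
    )

class ResidualBlock(nn.Module):
    """Three blocks with kernel sizes {6, 4, 3} plus an identity shortcut."""

    def __init__(self, channels=128, kernels=(6, 4, 3)):
        super().__init__()
        self.blocks = nn.Sequential(
            *[conv_block(channels, channels, k) for k in kernels]
        )
        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x):
        h3 = self.blocks(x)     # Eqs. (18)-(20)
        y = h3 + x              # Eq. (21): input and output shapes must match
        h = self.relu(y)        # Eq. (22)
        return self.bn(h)       # Eq. (23)

features = torch.randn(2, 128, 600)         # (batch, channels, time)
res_out = ResidualBlock()(features)
head = nn.Linear(128, 10)                   # final [128 x 10] classifier
logits = head(res_out.mean(dim=2))          # average over time, then classify
```

The shortcut in Eq. (21) is why the block keeps both the channel count and the sequence length unchanged.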
4. Experiment and Analysis
4.1. Evaluation Method
We use accuracy as the criterion for classification results. Accuracy is calculated as
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$
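For a multi-class problem such as this one, Eq. (24) reduces to the fraction of correctly classified samples. A minimal sketch follows, with the hypothetical helper name `accuracy`:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels, as in Eq. (24)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Three of the four predictions match the true labels
score = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```

In the binary case the numerator TP + TN counts exactly the matching predictions, so both forms give the same value.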
4.2. Dataset
We used the official dataset from the DC Bearing Fault Detection Competition website for training and testing. Each signal value represents the displacement/magnitude of vibration, sampled continuously at time stamps 1 to 6000. There are 10 categories of bearing conditions, whose signals are shown in Fig 4 and Fig 5.
[Figure 4 plots: nine panels, "Fault of the Outer Race", "Fault of the Inner Race" and "Fault of the Ball", each for Diameter 1, 2 and 3; x-axis: time stamp (0 to 6000); y-axis: vibration displacement (-5 to 5)]
Fig. 4. : Nine types of rolling bearing faults
[Figure 5 plot: "Normal Bearing"; x-axis: time stamp (0 to 6000); y-axis: vibration displacement (-5 to 5)]
Fig. 5. : Vibration signal of a bearing in good condition
4.3. Baseline Based on Traditional Classification Methods and Feature Engineering
Two experiments are performed: the first derives our baseline method, feature reduction with the Pearson correlation coefficient, and the second further verifies its effectiveness and robustness. Different feature combinations are obtained with different feature reduction methods and used to train different classifiers. The trained classifiers are evaluated on the test set, and the results of our method are compared with those of PCA.
In the derivation of our method, the correlation coefficients between the features and the category label are obtained first, and the features are ranked from high to low by their correlation with the category label. The top 40 features are kept and the rest are filtered out. Two sets of features are used in the experiment: a feature set with wavelet packet features and a feature set without wavelet packet features. Different results are obtained with these two sets.
For the feature set with wavelet packet features, the correlation coefficients among the top 40 features are computed after the features' correlation coefficients with the label have been calculated; the correlation matrix is shown in Fig 6.
For the feature set without wavelet packet features, the correlation coefficients among the top 40 features are computed in the same way; the correlation matrix is shown in Fig 7.
The results show that classifiers trained on the feature set without wavelet packet features have higher accuracy after feature dimension reduction (see Table 1). In Fig 6, although the wavelet packet features have high correlation coefficients with the classification label, the correlation coefficients among the wavelet packet features are high as well. The wavelet packet features are also highly correlated with the time-domain maximum value and variance features, which leads to feature redundancy and degraded classification performance. In the feature set without wavelet packet features (Fig 7), the correlation among features is less severe, which leads to a better result.
Based on the analysis above, our feature reduction method is derived: first, select the features that have a high correlation with the category label; second, among the selected features, remove those that have a high correlation coefficient (≥ 0.9) with other selected features. With this method, 40 features are extracted from the top 128 features ranked by correlation coefficient with the category label. Their correlation is shown in Fig 8, where features with high correlation coefficients with others have been removed.
Based on the 40 features derived by our method, classifiers are trained. To evaluate the effectiveness of our method, PCA is also used to reduce the feature dimension to 40, and the classifiers are trained and tested on the result as well. Table 1 shows the performance of the classifiers with the different reduction methods. The results show that the classifiers trained with our method perform well on the test set. With SVM and kNN, our method has the same performance as PCA (99.43% and 99.81% respectively), and with MLP it performs better (100% versus 99.24%), while with RF the accuracy of PCA is higher (99.81% versus 98.67%). This completes the derivation of the correlation-based reduction method.

Table 1. : Test accuracy (%) of different classifiers after correlation-based dimension reduction; 40-correlation-wv: 40 features from the set with wavelet packet, 40-correlation: 40 features from the set without wavelet packet, 40-PCC: 40 features reduced with PCC, 40-PCA: 40 features reduced with PCA

Model   40-correlation-wv   40-correlation   40-PCC   40-PCA
RF      96.02               98.30            98.67    99.81
SVM     76.14               99.62            99.43    99.43
kNN     81.25               99.81            99.81    99.81
MLP     67.23               95.17            100      99.24

Fig. 6. : Correlation plot of the top 40 high-correlation features with wavelet packet (wp)
Fig. 7. : Correlation plot of the top 40 high-correlation features without wavelet packet (wp)
Fig. 8. : Correlation plot of the 40 features after PCC
To further analyze and verify the effectiveness of our method, the classifiers are evaluated after correlation-based reduction to different feature dimensions, from 30 to 40, and compared with PCA at the same dimensions. As Table 2 shows, our method performs well on the test set, with accuracy from 98.48% to 100%. Our method has higher accuracy with the SVM and kNN classifiers, while PCA performs better with RF. Although PCA can reach high accuracy, our method maintains a relatively high accuracy of at least 98.48%. According to Fig 9, the variance of the accuracy of our method is smaller than that of PCA, especially with kNN, which means our method performs relatively stably in bearing fault detection. Our method therefore combines relatively high accuracy with better robustness than PCA, which makes it a good baseline where high robustness and sufficiently high accuracy are required.
Table 2. : Comparison between our method and PCA in terms of testing accuracy with different feature dimensions; underline indicates the accuracy of our method

# dim   RF                 SVM                kNN
30      99.05% / 98.86%    99.43% / 99.81%    99.81% / 99.81%
31      99.81% / 98.67%    100.0% / 99.24%    100.0% / 99.81%
32      99.05% / 98.67%    99.81% / 99.24%    100.0% / 99.81%
33      99.62% / 98.67%    100.0% / 99.81%    99.81% / 99.81%
34      99.81% / 98.86%    100.0% / 99.05%    100.0% / 99.81%
35      100.0% / 98.67%    99.43% / 99.62%    100.0% / 99.81%
36      99.43% / 98.48%    99.24% / 99.24%    100.0% / 99.81%
37      99.81% / 99.05%    100.0% / 99.24%    99.81% / 99.81%
38      99.81% / 98.86%    100.0% / 99.62%    100.0% / 99.81%
39      99.43% / 99.05%    99.43% / 99.81%    99.81% / 99.81%
40      99.81% / 98.67%    99.62% / 99.43%    99.37% / 99.81%
[Figure 9 plots: three panels (SVM, KNN, RF), each comparing PCA and Our Method; y-axis: Accuracy (98.5 to 100)]
Fig. 9. : Comparison between our method and PCA on the robustness of classification accuracy
The derived correlation-based method reduces features through feature selection, whereas PCA is an efficient reduction method based on feature extraction. The analysis verifies that our method is effective, with relatively high accuracy and good robustness. Therefore, feature reduction with the correlation coefficient is an effective method and can serve as a good feature-selection baseline in bearing fault detection when high accuracy and robustness are required.
4.4. Baseline Based on Deep Neural Networks
We implemented both our baseline model and the reference model in the PyTorch framework. Both models were trained with the Adam optimizer and the Cross Entropy loss function.
During the experiment, we divided the dataset into a training set (640 pieces of data), a validation set (152 pieces of data), and a test set (528 pieces of data). The training set and validation set are used to perform cross-validation to find suitable hyperparameters for the models. Test results were obtained by evaluating the models on the test set after training. The hyperparameters used in the experiment were a learning rate of 0.0002 and a batch size of 10. The GPU used for training is an NVIDIA RTX2060, and the models are trained for 30 epochs.
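The training configuration described above (Adam, Cross Entropy loss, learning rate 0.0002, batch size 10, 30 epochs) could be wired up in PyTorch as sketched below. The function name `train` and the tiny stand-in model are hypothetical and only keep the example fast; they are not the authors' training script, and the validation loop is omitted.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, signals, labels, epochs=30, lr=2e-4, batch_size=10):
    """Train a classifier with Adam and cross-entropy loss."""
    loader = DataLoader(TensorDataset(signals, labels),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)   # model outputs raw class scores
            loss.backward()
            optimizer.step()
    return model

# Stand-in model and synthetic data: 20 short sequences, 10 classes
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(16, 10))
toy_signals = torch.randn(20, 1, 16)
toy_labels = torch.randint(0, 10, (20,))
initial_weight = toy_model[1].weight.detach().clone()
train(toy_model, toy_signals, toy_labels, epochs=1)
```

`nn.CrossEntropyLoss` expects unnormalized scores, so the model's final layer should not apply a softmax.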
We then evaluated the performances of our model com-paring to the reference model and the result is shown inFig 10.
Table 3. : Results of our Deep Neural Network model compared with the reference model; training time and accuracy are measured for 30 epochs

Model       Training time (s)   Model size (MB)   Test accuracy
Reference   26                  27.56             97.91%
Our Model   118                 190.02            100%
Comparing the performance of the two models in the first 10 epochs (see Table 3 for details), we can see that the training loss of our model converges faster, and its testing accuracy soon reaches 100% and stays relatively stable there.
Additionally, unlike in the reference model, the width of the Convolutional layers and the Residual layer in our model is the same as the width of the input time series data, which leads to a larger model size and longer training time. Nevertheless, the deep neural network still has advantages over feature engineering in terms of training time.
Compared with the baseline method based on feature engineering, the deep neural network-based method requires a certain amount of parallel computing capability. Therefore, it is necessary to confirm whether the hardware meets the requirements before using it.
[Figure 10 plots: top panel, Testing Accuracy (0 to 100) versus epoch (2 to 10); bottom panel, Training Loss (0 to 150) versus Epoch (2 to 10); curves: Reference Model, Our Baseline]
Fig. 10. : Comparison between our baseline model and the reference model; Top: testing accuracy, Bottom: training loss.
5. Conclusion
In this paper, we addressed the problem of rolling bearing fault detection and proposed a novel pipeline that includes two baseline methods, based respectively on feature engineering & dimension reduction and on a deep neural network. The models were tested on our rolling bearing dataset and reached promising results. The conclusions for each part of the pipeline are as follows:
1. The Baseline based on feature engineering & dimension reduction: In our Baseline model, we first extracted features from the time domain, the frequency domain, and the time-frequency domain. Then we applied the PCC method to reduce the dimension of the features. We tested our method on the dataset mentioned in Section 4.2, and it showed relatively high accuracy and good robustness when compared with PCA. Our model based on feature engineering & dimension reduction can serve as a Baseline with relatively high accuracy and robustness.
2. The Baseline based on Deep Neural Network: In our Baseline model, we used Convolutional Blocks and Residual Blocks as low-level and high-level feature extractors and achieved a promising result on the dataset mentioned in Section 4.2, significantly outperforming the reference model. Moreover, our model converged and reached 100% accuracy on the test set after only 10 epochs, with no sign of overfitting. We verified that with our baseline model, data augmentation can be avoided even when the model is given a relatively small dataset.
Given all of the above, we recommend using our toolbox for rolling bearing fault detection problems for the following reasons:
(1) Both methods can complete the task of bearing fault detection in a relatively short time and at low cost, and thus could provide alternative solutions and assistance for manual bearing inspection.
(2) When computing power is limited, the user can choose a method from the toolbox that suits the available computing power and get a robust result regardless of the hardware restrictions.
(3) When feature dimension reduction based on feature selection is needed, the user can use the Baseline method based on feature engineering & dimension reduction.
Acknowledgements
This work was inspired by the DC data analysis competition on rolling bearing fault detection.
References:
[1] M. Z. X, "Application of acoustic emission technique in fault diagnostics of rolling bearing," Master's thesis, Tsinghua University, Beijing, Haidian, 2006.
[2] J. Harmouche, C. Delpha, and D. Diallo, "Improved fault diagnosis of ball bearings based on the global spectrum of vibration signals," IEEE Transactions on Energy Conversion, vol. 30, no. 1, pp. 376–383, 2015.
[3] F. Immovilli, A. Bellini, R. Rubini, and C. Tassoni, "Diagnosis of bearing faults of induction machines by vibration or current signals: A critical comparison," IEEE Transactions on Industry Applications, vol. 46, no. 4, pp. 1350–1359, 2010.
[4] M. Kang, J. Kim, and J. M. Kim, "An FPGA-based multicore system for real-time bearing fault diagnosis using ultrasampling rate AE signals," IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2319–2329, 2015.
[5] A. B. Ming, W. Zhang, Z. Y. Qin, and F. L. Chu, "Dual-impulse response model for the acoustic emission produced by a spall and the size evaluation in rolling element bearings," IEEE Transactions on Industrial Electronics, vol. 62, no. 10, pp. 6606–6615, 2015.
[6] R. R. Schoen, T. G. Habetler, F. Kamran, and R. G. Bartheld, "Motor bearing damage detection using stator current monitoring," Proceedings of 1994 IEEE Industry Applications Society Annual Meeting, vol. 1, pp. 110–116, 1994.
[7] M. Blodt, P. Granjon, B. Raison, and G. Rostaing, "Models for bearing damage detection in induction motors using stator current monitoring," IEEE Transactions on Industrial Electronics, vol. 55, no. 4, pp. 1813–1822, 2008.
[8] D. Lopez-Perez and J. Antonino-Daviu, "Application of infrared thermography to failure detection in industrial induction motors: Case stories," IEEE Transactions on Industry Applications, pp. 1901–1908, 2017.
[9] E. Esfahani, S. Wang, and V. Sundararajan, "Multisensor wireless system for eccentricity and bearing fault detection in induction motors," IEEE/ASME Transactions on Mechatronics, vol. 19, no. 3, pp. 818–826, 2014.
[10] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, "Deep learning algorithms for bearing fault diagnostics: A comprehensive review," IEEE Access, vol. PP, no. 99, pp. 1–1, 2020.
[11] C. Taylor, "Rolling bearing analysis: 3rd edn.; by T. A. Harris; published by Wiley, Chichester, West Sussex, 1991; 1013 pp.; price, 97.35," vol. 155, no. 2, pp. 393–394, 1992.
[12] J. Zheng, J. Cheng, and Y. Yang, "Generalized empirical mode decomposition and its applications to rolling element bearing fault diagnosis," Mechanical Systems and Signal Processing.
[13] A. Bellini, F. Filippetti, C. Tassoni, and G. A. Capolino, "Advances in diagnostic techniques for induction machines,"
[14] K. Xu, Z. Chen, C. Zhang, and G. Dong, "Fault diagnosis of rolling bearing based on empirical mode decomposition and support vector machine," in Control Theory and Application: 1-8, vol. 1, pp. 257–261, 2019.
[15] Z. Mou and Z. Du, "Double hidden layer neural network for bearing fault detection using pre-extracted optimal features," in 2019 12th International Symposium on Computational Intelligence and Design (ISCID), vol. 1, pp. 257–261, 2019.
[16] S. Li, Z. Li, and H. Li, "The method of roller bearing fault monitoring based on wavelet packet energy feature," Journal of System Simulation, no. 1, pp. 76–80, 2003.
[17] W. Guo, H. Zhao, C. Li, Y. Li, and A. Tang, "Fault feature enhancement method for rolling bearing fault diagnosis based on wavelet packet energy spectrum and principal component analysis," Acta Armamentarii, no. 11, 2019.
[18] J. Li, X. Yao, X. Wang, Q. Yu, and Y. Zhang, "Multiscale local features learning based on BP neural network for rolling bearing intelligent fault diagnosis," Measurement, vol. 153, p. 107419, 2020.
[19] L. Wen, X. Li, L. Gao, and Y. Zhang, "A new convolutional neural network-based data-driven fault diagnosis method," IEEE Transactions on Industrial Electronics, vol. 65, no. 7, pp. 5990–5998, 2017.
[20] Z. Cui, W. Chen, and Y. Chen, "Multi-scale convolutional neural networks for time series classification," arXiv preprint arXiv:1603.06995, 2016.
[21] Z. Wang, W. Yan, and T. Oates, "Time series classification from scratch with deep neural networks: A strong baseline," in 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585, 2017.
[22] T. Liu, A. Li, Y. Ding, Z. Li, and Q. Fei, "Experimental study on structural damage alarming method based on wavelet packet energy spectrum," Journal of Vibration and Shock, vol. 28, no. 4, pp. 4–9, 2009.
[23] Y. Gu, Z. Cheng, and F. Zhu, "Rolling bearing fault feature fusion based on PCA and SVM," China Mechanical Engineering, 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.