
A Novel Toolbox for Bearing Fault Detection Based on PCC and Residual Blocks


Chengkun Li 1*, Yufan Lin 1*, Yujing Liu 1*, Qi Gao 1

1 Beijing Institute of Technology
E-mail: {1120171048, 1120171146, 1120171028, gaoqi}@bit.edu.cn

[Received 22/07/2020; accepted 14/08/2020]

Bearings are essential components of mechanical systems, and bearing fault detection is of great importance in bearing production and system fault diagnosis. In this paper, a novel toolbox for bearing fault detection using bearing vibration signals is proposed. Two baseline models are included: 1. a baseline for feature-engineering-based methods, which consists of three steps: time-frequency feature extraction, Pearson Correlation Coefficient (PCC) reduction, and classification; 2. a baseline for deep-learning-based methods: a deep neural network model consisting of Convolutional Blocks and Residual Blocks. The experimental results show that the methods in our toolbox are sufficiently robust to produce results with accuracy between 98% and 100%.

Keywords: Rolling bearing, Fault detection, Machine Learning, Deep Learning, Time series classification

1. Introduction

Bearings are among the most widely used parts of mechanical equipment. Because their running state directly affects the performance and production safety of the equipment, research on rolling bearing fault diagnosis technology is of considerable significance [1]. Different sensing modalities for bearing fault detection have been explored, including vibration [2], [3], acoustic noise [4], [5], stator current [6], [7], thermal imaging [8], and multiple sensor fusion [9], among which vibration analysis is the most dominant [10]. The existence of a bearing fault, as well as its specific fault type, can be readily determined by performing frequency spectral analysis [11]. However, accurately identifying the presence of a bearing fault can be challenging in practice. The uniqueness of a bearing failure lies in its multi-physics nature. An abnormal electric signal is triggered by the primary mechanical vibration due to the bearing defect. The vibration further influences the output torque, the motor speed, and finally the bearing vibration pattern itself. Furthermore, the accuracy of physical model-based vibration analysis can be affected by background noise, and its sensitivity is subject to change with respect to sensor mounting positions and spatial constraints in a highly compact environment. Therefore, more robust and reliable methods of constructing the classification model are required.

* indicates equal contributions.

The problem of bearing fault detection via vibration signals is essentially a time series classification problem. For traditional machine learning classifiers, features are usually extracted from the time domain, the frequency domain, and the time-frequency domain. A commonly used time-frequency feature extraction method is empirical mode decomposition (EMD), where cubic splines are used to define the upper and lower envelopes in each sift [12]. In [13], generalized empirical mode decomposition (GEMD), empirical envelope demodulation (EED), and the Hilbert-Huang transform (HHT) were used. Xu and his group [14] applied an algorithm based on empirical mode and spectral kurtosis to simplify the model, and Mou [15] proposed a new frequency spectrum-based feature extraction method.

The wavelet packet transform is widely used in bearing fault analysis. Li et al. [16] proposed a method that introduces the wavelet packet transform into bearing fault analysis. Guo et al. [17] obtained enhanced fault features by extracting wavelet packet energy spectrum features.

For deep learning methods, Li et al. [18] proposed a BP network model combined with the Variational Mode Decomposition (VMD) algorithm to tackle the bearing fault detection problem, and Wen et al. [19] proposed a bearing fault detection pipeline. Moreover, in terms of deep learning-based solutions for time series classification (TSC), Cui et al. [20] proposed a neural network model called the Multi-scale Convolutional Neural Network, based on a multi-spec convolutional layer. Wang et al. [21] proposed three end-to-end deep neural networks for the TSC problem: Multilayer Perceptron (MLP), Fully Convolutional Network (FCN), and Residual Network (ResNet). Their work provides an important reference for deep networks that use raw time series as input without the need for data augmentation or data preprocessing.

In this paper, a toolbox for bearing fault detection is proposed, which includes two baseline methods:

• A feature dimension reduction pipeline based on the Pearson Correlation Coefficient, which effectively ensures the efficiency as well as the robustness of detection.

• A simple yet powerful deep learning baseline model based on Convolutional Blocks and Residual Blocks, which achieves comparable performance at relatively low computing cost.

The toolbox could serve as a baseline method in the field of bearing fault detection due to its simplicity and robustness.

The 9th International Symposium on Computational Intelligence and Industrial Applications (ISCIIA2020), CITIC Jingling Hotel Beijing, Beijing, China, Oct.31-Nov.3, 2020

2. Baseline Model for Feature Engineering Based Method

In this part, a feature engineering baseline model for rolling bearing fault detection is proposed, based on a feature reduction method that uses the Pearson correlation coefficient. The model is tested on the dataset described in Section 4.2 and reaches 100% test accuracy.

The performance of our model is compared with a widely used feature reduction method, Principal Component Analysis (PCA). Feature extraction, feature reduction, and classifier training are described in the following parts.

2.1. Preprocessing and Feature Extraction

For each vibration sequence, $A_{i,j}$ denotes the value of the preprocessed data, where $A_i$ is the $i$th sampled point of a vibration sequence with $2n$ sampled points and $j$ indexes the kind of preprocessing. The sampled vibration signal is the first kind of preprocessed data, denoted as

$$A_{i,1} = A_i \quad (i = 1, 2, 3, \cdots, 2n) \tag{1}$$

Taking the absolute value of the raw data gives the second kind of preprocessed data, denoted as

$$A_{i,2} = |A_{i,1}| \quad (i = 1, 2, 3, \cdots, 2n) \tag{2}$$

Taking the Fourier transform of the raw signal and keeping the first $n$ dimensions, in view of the symmetry of the spectrum, gives the third kind, denoted as

$$A_{i,3} \quad (i = 1, 2, \cdots, n) \tag{3}$$

After preprocessing, features can be extracted from the preprocessed data.

For the rolling bearing vibration data, the sum of the vibration displacements of the time-domain signal is extracted as a feature, denoted as

$$\mathrm{sum} = \sum_{i=1}^{2n} A_{i,1} \tag{4}$$

Because the time-domain signal is symmetrical, the sum of the absolute values of the time-domain signal is extracted as another feature, denoted as

$$\mathrm{abs\_sum} = \sum_{i=1}^{2n} A_{i,2} \tag{5}$$

The maximum and minimum of the absolute values of the time-domain signal are taken as features, denoted as

$$\max = \max_i (A_{i,2}), \qquad \min = \min_i (A_{i,2}) \tag{6}$$

In the time domain of the vibration signal, the vibration amplitudes are ordered from large to small and their quantiles are taken: the 90- to 99-quantiles and the 5-quantile. The $k$-quantile is denoted as $\mathrm{per}_k$. The median is denoted as median.

The mean value of the time series is calculated and denoted as

$$\mathrm{mean} = \frac{1}{2n} \sum_{i=1}^{2n} A_{i,1} \tag{7}$$

The standard deviation of the vibration time series is then obtained, denoted as

$$\mathrm{std} = \sqrt{\frac{\sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2}{2n}} \tag{8}$$

The variance of the time series is denoted as

$$\mathrm{var} = \frac{\sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2}{2n} \tag{9}$$

The skew reflects the distribution of the data relative to the average value; a normal distribution has a skew of 0. The skewness of the vibration signal is calculated as follows:

$$\mathrm{skew} = \frac{\frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^3}{\left( \frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2 \right)^{3/2}} \tag{10}$$

Kurtosis describes the steepness or smoothness of the data distribution. The kurtosis feature of the rolling bearing signal is calculated as follows:

$$\mathrm{kurtosis} = \frac{\frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^4}{\left( \frac{1}{2n} \sum_{i=1}^{2n} (A_{i,1} - \mathrm{mean})^2 \right)^{2}} - 3 \tag{11}$$
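The time-domain statistics of Eqs. (4)-(11) can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolbox's actual code: the function name `time_domain_features` and the dictionary keys are assumptions chosen to mirror the feature names in the text.

```python
import math


def time_domain_features(signal):
    """Compute the statistics of Eqs. (4)-(11) for one vibration sequence.

    `signal` holds the 2n sampled points A_{i,1}. The function name and the
    dictionary keys are illustrative assumptions, not names from the paper.
    """
    n_points = len(signal)                     # 2n in the paper's notation
    abs_signal = [abs(v) for v in signal]      # A_{i,2}, Eq. (2)
    total = sum(signal)                        # Eq. (4)
    mean = total / n_points                    # Eq. (7)
    centered = [v - mean for v in signal]
    # Central moments divided by 2n (population form, as in Eqs. (8)-(11)).
    m2 = sum(c ** 2 for c in centered) / n_points
    m3 = sum(c ** 3 for c in centered) / n_points
    m4 = sum(c ** 4 for c in centered) / n_points
    # Note: a constant signal has m2 == 0, so skew and kurtosis are undefined.
    return {
        "sum": total,
        "abs_sum": sum(abs_signal),            # Eq. (5)
        "max": max(abs_signal),                # Eq. (6)
        "min": min(abs_signal),                # Eq. (6)
        "mean": mean,
        "var": m2,                             # Eq. (9)
        "std": math.sqrt(m2),                  # Eq. (8)
        "skew": m3 / m2 ** 1.5,                # Eq. (10)
        "kurtosis": m4 / m2 ** 2 - 3.0,        # Eq. (11)
    }


features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Dividing by 2n rather than 2n - 1 follows the equations as written; a library routine that defaults to the sample (unbiased) form would give slightly different values.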

Maximum-amplitude feature extraction with spectrum segmentation: for different fault types, the amplitude distributions of the samples in the spectrum are not identical. The spectrum is therefore divided into $k_1$ segments of the same size, and the top $k_2$ frequency amplitudes in each segment are taken to characterize that frequency band. Compared with the randomness of spectrum sampling, this method can better represent the characteristics of a given frequency segment. The $j$th largest value in group $i$ is denoted as FFT_k1×k2_Gi_jth.
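The segmentation step can be illustrated as follows. This is a hedged sketch: `amplitude_spectrum` and `segment_top_amplitudes` are hypothetical helper names, and a naive O(N^2) DFT is used only to keep the example dependency-free (a real pipeline would use an FFT routine).

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Naive DFT amplitudes of a real signal; only the first half is kept
    because the spectrum of a real signal is symmetric (cf. Eq. (3))."""
    n_points = len(signal)
    amplitudes = []
    for k in range(n_points // 2):
        acc = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n_points)
            for t in range(n_points)
        )
        amplitudes.append(abs(acc))
    return amplitudes


def segment_top_amplitudes(amplitudes, k1, k2):
    """Split the spectrum into k1 equal segments and keep the k2 largest
    amplitudes of each; trailing bins that do not fill a segment are dropped."""
    seg_len = len(amplitudes) // k1
    if seg_len == 0:
        raise ValueError("k1 is larger than the number of spectrum bins")
    groups = []
    for i in range(k1):
        segment = amplitudes[i * seg_len:(i + 1) * seg_len]
        groups.append(sorted(segment, reverse=True)[:k2])
    return groups


# A pure sine at frequency bin 3 of a 32-point window.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
spectrum = amplitude_spectrum(signal)
bands = segment_top_amplitudes(spectrum, k1=4, k2=2)
```

Here `bands[i][j]` plays the role of the feature for the (j+1)th largest value in group i; for the sine above, the energy is concentrated in the first segment.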

Time-frequency analysis analyzes and processes the signal in the time-frequency domain. Wavelet packet decomposition is an orthogonal decomposition method built on the multi-resolution wavelet transform. Liu Tao [22] put forward the concept of wavelet packet node energy and found that, compared with extracting the wavelet packet decomposition coefficients directly, the node energy has better stability. The wavelet packet node energy is defined as follows. Assume a $j$-level decomposition is applied to the vibration signal $x(t)$, giving $2^j$ child frequency bands; the energy of the $w$th band is denoted as

$$En_w = \sum |f_w|^2, \quad w = 0, 1, 2, \ldots, 2^j - 1 \tag{12}$$

where $f_w$ represents the $w$th component after decomposition of $x(t)$. Then the wavelet packet energy spectrum for decomposition level $j$ is denoted as

$$En_j = \left[ En_0, En_1, En_2, \ldots, En_{2^j - 1} \right]^T \tag{13}$$

The norm is a measure of energy; a feature reflecting the node energy can be obtained by taking the 2-norm of each node frequency band.

The wavelet packet feature extraction used in this paper proceeds as follows:

1. Decompose the acquired original bearing vibration signal with a 3-layer wavelet packet.

2. Obtain the 14 frequency bands, from low frequency to high frequency, in the first 3 layers.

3. Reconstruct the wavelet packet decomposition coefficients and calculate the norm of each frequency band signal.

4. Take the norm of each frequency band signal as an element to construct the feature vector

$$V = [N_1, N_2, N_3, \ldots, N_{12}, N_{13}, N_{14}]$$
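The four steps above can be sketched with a Haar wavelet packet. Several things here are assumptions, since the paper does not state them: the Haar wavelet itself, the helper names `haar_split` and `wavelet_packet_norms`, and the node ordering (natural tree order rather than strict frequency order). Because the Haar transform is orthonormal, the 2-norm of a node's coefficients equals the norm of the band signal reconstructed from that node alone, so the reconstruction step is skipped.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (low-pass, high-pass) coefficients.
    An odd trailing sample, if any, is dropped."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_norms(signal, levels=3):
    """Return the 2-norm of every node in levels 1..levels of the packet tree.
    With levels=3 this gives 2 + 4 + 8 = 14 values, the vector V."""
    nodes = [list(signal)]
    norms = []
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
        norms.extend(math.sqrt(sum(c * c for c in node)) for node in nodes)
    return norms


signal = [float(i % 5) for i in range(16)]
V = wavelet_packet_norms(signal, levels=3)
```

The 14 entries come from the 2, 4, and 8 nodes of the first three layers, which matches the count in step 2.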

2.2. Feature Dimension Reduction

With the extracted features, two methods are adopted for feature reduction: feature selection based on the Pearson correlation coefficient (PCC) and dimension reduction based on PCA. Our method, feature selection based on PCC, is analyzed in comparison with the reference model based on PCA.

PCA is a technique for analyzing and simplifying data. By applying a linear transformation, the problem is transformed from a high-dimensional one to a low-dimensional one. Through dimension reduction, complex multidimensional data are converted into simple, intuitive, and uncorrelated low-dimensional data, which effectively reduces the difficulty and complexity of data analysis [23]. PCA is a common feature reduction method in rolling bearing fault detection, and we compare our new baseline method with the PCA-based feature extraction method.

The Pearson correlation coefficient is a metric that reflects the linear correlation between two variables and can be applied to feature reduction. Based on the extracted features, Pearson correlation coefficients are calculated between every pair of features, as well as between each feature and the category label (see Fig. 1). The PCC between two features $X$ and $Y$ is calculated as follows:

$$r(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} \tag{14}$$

where $\mathrm{cov}(X, Y)$ is the covariance of the two features, $\mathrm{Var}[X]$ is the variance of $X$, and $\mathrm{Var}[Y]$ is the variance of $Y$.

Fig. 1. The correlation coefficient matrix among the first 50 features

Certain patterns can be seen in the correlation coefficient matrix of the top 50 features most correlated with the category label (shown in Fig. 1). Moreover, the experimental results show that features with a high correlation coefficient with the category label perform well in classification, while high correlation coefficients among the selected features lead to overfitting and degraded classifiers.

Therefore, feature dimension reduction based on the Pearson correlation coefficient proceeds in three steps. First, the correlation coefficients between the features and the category label are calculated. Second, the features that are highly correlated with the category label are selected. Third, features that present high correlation coefficients with other selected features are removed. With these steps, the feature dimension is reduced by selecting the features that are effective for classification, which also helps avoid overfitting of the classifiers on the selected features.
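The three steps can be sketched as a greedy selection. This is an illustration under stated assumptions rather than the toolbox's implementation: the function names, the use of the absolute value of the correlation, the encoding of the category label as numbers, and the greedy order in which redundant features are dropped are all choices made for this example; the 0.9 threshold is the value quoted in Section 4.3.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of Eq. (14); returns 0.0 if either
    input is constant (the coefficient is undefined in that case)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, top_k, n_keep, threshold=0.9):
    """columns: dict mapping feature name -> list of values (one per sample).

    Step 1-2: rank features by |r(feature, label)| and keep the top_k.
    Step 3: walk the ranking and drop any feature whose |r| with an
    already-kept feature reaches the threshold, until n_keep are kept.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )[:top_k]
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(columns[name], columns[other])) >= threshold
            for other in kept
        )
        if not redundant:
            kept.append(name)
        if len(kept) == n_keep:
            break
    return kept


labels = [0, 0, 1, 1, 2, 2]
base = [0.1, 0.0, 1.1, 0.9, 2.0, 2.1]
columns = {
    "a": base,
    "a_scaled": [2 * v for v in base],   # perfectly correlated with "a"
    "noise": [1, 0, 1, 0, 1, 0],
}
selected = select_features(columns, labels, top_k=3, n_keep=2)
```

In the toy data, `a` and `a_scaled` carry the same information, so only one of them survives the third step.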

2.3. Classifier Selection and Training

Before training, the features are first normalized to speed up training and improve classification accuracy. Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (kNN), and Multi-layer Perceptron (MLP) are applied as classifiers. Classifier hyperparameters are selected by cross-validation, for which part of the training set is split off as a validation set.
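A minimal sketch of the normalization and validation split is given below. The paper does not say which normalization or which cross-validation scheme is used, so z-score standardization and an interleaved k-fold split are assumptions, as are all function names.

```python
import math


def fit_standardizer(rows):
    """Per-feature mean and standard deviation, fitted on training rows only."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)   # constant feature: avoid dividing by 0
    return means, stds


def standardize(rows, means, stds):
    """Apply z-score normalization with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train_rows)
normalized = standardize(train_rows, means, stds)
splits = list(k_fold_indices(6, 3))
```

Fitting the statistics on the training portion only, and reusing them for validation and test data, keeps information from the held-out data out of the normalization.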

3. Baseline for Deep Learning Based Methods

In this part, we propose a deep learning baseline model for rolling bearing fault detection based on residual layers. The structure of the model is shown in Fig. 2. The model is tested on the dataset described in Section 4.2 and reaches 100% test accuracy.


[Figure 2 image not recovered. Labels from the diagram, in reading order: example input "Fault of the Outer Race of Diameter 3" (axes: time stamp, vibration displacement); Input, 1×6000×1; Conv 1×6, Conv 1×4, Conv 1×3 (each with BN and ReLU) on 64×6000×1 tensors; Conv 1×6, Conv 1×4, Conv 1×3 (each with BN and ReLU) on 128×6000×1 tensors; Conv 1×6 and Conv 1×4 (each with BN and ReLU) and Conv 1×3 on 128×6000×1 tensors, followed by an addition (+), BN, and ReLU; Average; Fully connected.]

Fig. 2. Structure of our baseline model for rolling bearing fault detection based on Residual Blocks

Fig. 3. Structure of the reference model

The performance of our model is compared with a reference model that is popular for the given dataset. It consists of 1D convolution, pooling, dropout, and fully connected layers; its structure is shown in Fig. 3. Our model mainly consists of Convolution Blocks and Residual Blocks, which are described in the following parts.

3.1. Convolution Block

Convolution Blocks make up most of our model (see blocks #2 to #7 in Fig. 2). The function of an individual block can be described as

$$y = W \otimes x + b \tag{15}$$

$$s = \mathrm{BN}(y) \tag{16}$$

$$\mathrm{out} = \mathrm{ReLU}(s) \tag{17}$$

where BN stands for Batch Normalization, $W$ is the learned weights, and $b$ is the bias. These blocks serve as upsampling structures and low-level vibration feature extractors.
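Eqs. (15)-(17) can be traced on a single channel with plain Python lists. This is a didactic sketch, not the PyTorch model: the zero padding that keeps the output length equal to the input length is inferred from the constant length of 6000 in Fig. 2, the way the padding is split for even kernel sizes is an assumption, and the normalization here uses the statistics of the one sequence with no learned scale or shift.

```python
import math


def conv1d_same(x, weights, bias=0.0):
    """Eq. (15): sliding dot product with zero padding so len(out) == len(x).
    For even kernel sizes the extra padding goes on the right (an assumption)."""
    k = len(weights)
    left = (k - 1) // 2
    padded = [0.0] * left + list(x) + [0.0] * (k - 1 - left)
    return [
        sum(w * padded[i + j] for j, w in enumerate(weights)) + bias
        for i in range(len(x))
    ]


def batch_norm(y, eps=1e-5):
    """Eq. (16), simplified: normalize with the mean and variance of y itself."""
    mean = sum(y) / len(y)
    var = sum((v - mean) ** 2 for v in y) / len(y)
    return [(v - mean) / math.sqrt(var + eps) for v in y]


def relu(s):
    """Eq. (17): element-wise max(0, s)."""
    return [max(0.0, v) for v in s]


def conv_block(x, weights, bias=0.0):
    return relu(batch_norm(conv1d_same(x, weights, bias)))


x = [0.0, 1.0, 0.0, -1.0] * 4
out = conv_block(x, weights=[1.0 / 6.0] * 6)
```

The kernel of length 6 matches the first kernel size listed for the model; in the real network each block also maps between channel counts, which this single-channel sketch leaves out.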

3.2. Residual Block

The residual network was initially proposed by He et al. [24], who showed that adding skip connections can avoid the significant decrease in network performance (known as degradation) that occurs as networks get deeper. In our model, we place a Residual Block near the end of the model (blocks #8 to #10 in Fig. 2) to improve the overall classification performance. Note that a Residual Block requires the dimension of the input to be the same as that of the output, so that an identity shortcut connection can be added:

$$h_1 = \mathrm{Block}_{k_1}(x) \tag{18}$$

$$h_2 = \mathrm{Block}_{k_2}(h_1) \tag{19}$$

$$h_3 = \mathrm{Block}_{k_3}(h_2) \tag{20}$$

$$y = h_3 + x \tag{21}$$

$$h = \mathrm{ReLU}(y) \tag{22}$$

$$\mathrm{out} = \mathrm{BN}(h) \tag{23}$$
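The data flow of Eqs. (18)-(23) can be sketched independently of any framework. In this hedged example the three inner blocks are passed in as arbitrary callables (simple scaling functions stand in for the real convolution blocks), and `residual_block` is a hypothetical name; the order of ReLU and BN follows the equations as written.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def batch_norm(values, eps=1e-5):
    """Simplified normalization over one sequence, without learned parameters."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def residual_block(x, blocks):
    """Apply the inner blocks in sequence (Eqs. 18-20), add the identity
    shortcut (Eq. 21), then ReLU (Eq. 22) and BN (Eq. 23)."""
    h = x
    for block in blocks:
        h = block(h)
    if len(h) != len(x):
        # The identity shortcut needs matching input and output dimensions.
        raise ValueError("residual branch changed the dimension of the input")
    y = [a + b for a, b in zip(h, x)]
    return batch_norm(relu(y))


def halve(values):
    """Stand-in for a convolution block; keeps the length unchanged."""
    return [0.5 * v for v in values]


x = [1.0, -1.0, 1.0, -1.0]
out = residual_block(x, [halve, halve, halve])
```

The explicit length check mirrors the condition stated in the text: without equal dimensions, the addition in Eq. (21) is not defined.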

The dimensions for the Convolution Blocks and the Residual Block are [64, 64, 128] in our model, and the final classification is performed by a fully connected layer with dimension [128×10]. The input of the model is the original vibration sequence with dimension [1×6000]. The convolution kernel sizes in our model are set to {6, 4, 3} to extract local features in the vibration signal.

4. Experiment and Analysis

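4.1. Evaluation Method

We use accuracy as the criterion for classification results. The accuracy is calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives.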
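For a multi-class problem such as this one, Eq. (24) reduces to the fraction of correctly classified samples. A minimal sketch, with `accuracy` as an illustrative function name:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label.
    In the binary case the numerator is TP + TN and the denominator is
    TP + TN + FP + FN, as in Eq. (24)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


score = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```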

4.2. Dataset

We used the official dataset from the website of the DC Bearing Fault Detection Competition for training and testing. The value of the signal represents the displacement (magnitude) of vibration, sampled continuously at time stamps 1 to 6000. There are 10 categories of bearing conditions, whose signals are shown in Fig. 4 and Fig. 5.

[Figure 4 image not recovered. Nine panels, each plotting vibration displacement (-5 to 5) against time stamp (0 to 6000): Fault of the Outer Race of Diameter 1; Fault of the Inner Race of Diameter 1; Fault of the Ball of Diameter 1; Fault of the Outer Race of Diameter 2; Fault of the Inner Race of Diameter 2; Fault of the Ball of Diameter 2; Fault of the Outer Race of Diameter 3; Fault of the Inner Race of Diameter 3; Fault of the Ball of Diameter 3.]

Fig. 4. Nine types of rolling bearing faults

[Figure 5 image not recovered. One panel plotting vibration displacement (-5 to 5) against time stamp (0 to 6000): Normal Bearing.]

Fig. 5. Vibration signal of a bearing in good condition

4.3. Baseline Based on Traditional Classification Methods and Feature Engineering

Two experiments are performed to derive our baseline method, the feature reduction method based on the Pearson correlation coefficient, and to further verify its effectiveness and robustness. Different feature combinations are obtained with different feature reduction methods and used to train different classifiers. The trained classifiers are evaluated on the test set, and the results of our method are compared with those of PCA.

In deriving our method, the correlation coefficients between the features and the category label are obtained first, and the features are ranked from high to low by their correlation with the category label. The top 40 features are kept and the rest are filtered out. Two sets of features are used in the experiment: a feature set with wavelet packet features and a feature set without wavelet packet features. These two sets lead to different results.

For the feature set with wavelet packet features, the correlation coefficients among the first 40 features are computed after the features' correlation coefficients with the label are calculated; the correlation matrix is shown in Fig. 6.

For the feature set without wavelet packet features, the correlation coefficients among the first 40 features are computed after the features' correlations with the label are calculated; the correlation matrix is shown in Fig. 7.

The results show that classifiers trained on the feature set without wavelet packet features have higher accuracy after feature dimension reduction (see Table 1). In Fig. 6, although the wavelet packet features have high correlation coefficients with the classification label, the correlation coefficients among the wavelet packet features are high as well. The wavelet packet features are also highly correlated with the time-domain maximum value and variance features, which leads to feature redundancy and degraded classification performance. In the feature set without wavelet packet features (Fig. 7), the correlation among features is less severe, which leads to a better result.

Based on the analysis above, our feature reduction method is derived: first, select the features that have a high correlation with the category label; second, among the selected features, remove those that have a high correlation coefficient (≥ 0.9) with one another. With this method, 40 features are extracted from the top 128 features most correlated with the category label. Their correlation is shown in Fig. 8, where features highly correlated with others have been removed.

Classifiers are trained on the 40 features derived by our method. To compare and evaluate the effectiveness of our method, PCA is also used to reduce the feature dimension to 40, and classifiers are trained and tested on the result. Table 1 shows the performance of the classifiers with the different reduction methods. The results show that the classifiers trained with our method perform well on the test set. With SVM and kNN, our method performs the same as PCA, and with MLP it performs better. This yields the correlation-coefficient-based reduction method.

Table 1. Test accuracy (%) of different classifiers after correlation-based dimension reduction. 40-correlation-wv: 40 features from the set with wavelet packet; 40-correlation: 40 features from the set without wavelet packet; 40-PCC: 40 features reduced with PCC; 40-PCA: 40 features reduced with PCA

| Model | 40-correlation-wv | 40-correlation | 40-PCC | 40-PCA |
|-------|-------------------|----------------|--------|--------|
| RF    | 96.02             | 98.30          | 98.67  | 99.81  |
| SVM   | 76.14             | 99.62          | 99.43  | 99.43  |
| kNN   | 81.25             | 99.81          | 99.81  | 99.81  |
| MLP   | 67.23             | 95.17          | 100    | 99.24  |

Fig. 6. Correlation plot of the top 40 high-correlation features with wp

Fig. 7. Correlation plot of the top 40 high-correlation features without wp

Fig. 8. Correlation plot of 40 features after PCC

To further analyze and verify the effectiveness of our method, the classifiers are evaluated after correlation-coefficient-based reduction to different feature dimensions, from 30 to 40, and compared with PCA at the same dimensions. As Table 2 shows, our method performs well on the test set, with accuracy from 98.48% to 100%. Our method has higher accuracy with the SVM and kNN classifiers, while with RF, PCA performs better. Although PCA can reach high accuracy, our method maintains a relatively high accuracy of at least 98.48%. According to Fig. 9, the variance of the accuracy of our method is smaller than that of PCA, especially with kNN, which means our method performs relatively stably in bearing fault detection. Our method thus has relatively high accuracy and better robustness than PCA, which makes it a good baseline where high robustness and sufficiently high accuracy are required.

Table 2. Comparison between our method and PCA in terms of testing accuracy with different feature dimensions; each cell gives PCA / our method (the latter underlined in the original)

| # dim | RF              | SVM             | kNN             |
|-------|-----------------|-----------------|-----------------|
| 30    | 99.05% / 98.86% | 99.43% / 99.81% | 99.81% / 99.81% |
| 31    | 99.81% / 98.67% | 100.0% / 99.24% | 100.0% / 99.81% |
| 32    | 99.05% / 98.67% | 99.81% / 99.24% | 100.0% / 99.81% |
| 33    | 99.62% / 98.67% | 100.0% / 99.81% | 99.81% / 99.81% |
| 34    | 99.81% / 98.86% | 100.0% / 99.05% | 100.0% / 99.81% |
| 35    | 100.0% / 98.67% | 99.43% / 99.62% | 100.0% / 99.81% |
| 36    | 99.43% / 98.48% | 99.24% / 99.24% | 100.0% / 99.81% |
| 37    | 99.81% / 99.05% | 100.0% / 99.24% | 99.81% / 99.81% |
| 38    | 99.81% / 98.86% | 100.0% / 99.62% | 100.0% / 99.81% |
| 39    | 99.43% / 99.05% | 99.43% / 99.81% | 99.81% / 99.81% |
| 40    | 99.81% / 98.67% | 99.62% / 99.43% | 99.37% / 99.81% |
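The spread of the accuracies across dimensions 30 to 40 can be checked directly from the rows of Table 2. The sketch below transcribes the kNN and RF columns and assumes that the first value in each cell belongs to PCA and the second to our method; the variable names are illustrative.

```python
from statistics import mean, pvariance

# Accuracies (%) for feature dimensions 30..40, transcribed from Table 2.
# Assumption: in each cell the first value is PCA, the second is our method.
knn_pca = [99.81, 100.0, 100.0, 99.81, 100.0, 100.0, 100.0, 99.81, 100.0, 99.81, 99.37]
knn_ours = [99.81] * 11
rf_pca = [99.05, 99.81, 99.05, 99.62, 99.81, 100.0, 99.43, 99.81, 99.81, 99.43, 99.81]
rf_ours = [98.86, 98.67, 98.67, 98.67, 98.86, 98.67, 98.48, 99.05, 98.86, 99.05, 98.67]

summary = {
    name: {"mean": mean(values), "variance": pvariance(values), "min": min(values)}
    for name, values in {
        "kNN / PCA": knn_pca,
        "kNN / ours": knn_ours,
        "RF / PCA": rf_pca,
        "RF / ours": rf_ours,
    }.items()
}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f} var={stats['variance']:.4f} min={stats['min']:.2f}")
```

Under that reading of the table, the kNN accuracy of our method does not change with the feature dimension, and the lowest accuracy of our method in these columns is the 98.48% quoted in the text.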

[Figure 9 image not recovered. Three panels (SVM, KNN, RF), each comparing the accuracy (axis from 98.5 to 100) of PCA and Our Method.]

Fig. 9. Comparison between our method and PCA on the robustness of classification accuracy


The derived correlation-coefficient-based method is a feature reduction method based on feature selection, whereas PCA is an efficient feature reduction method based on feature extraction. The analysis above verifies that our method is effective, with relatively high accuracy and good robustness. Therefore, the correlation-coefficient-based feature reduction method can be used as a good baseline in bearing fault detection, meeting the requirements of high accuracy and robustness with a feature selection method.

4.4. Baseline Based on Deep Neural Networks

We implemented both our baseline model and the reference model in the PyTorch framework. Both models used the Adam optimizer and the Cross Entropy Loss function.
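For readers unfamiliar with the loss, the per-sample quantity behind a softmax cross-entropy loss can be sketched in plain Python. This is an illustration of the formula only, not the toolbox code; PyTorch's `torch.nn.CrossEntropyLoss` computes the same quantity on raw logits and, by default, averages it over the batch. The function name and the example logits are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    # With identical logits the model is maximally unsure, so the loss is log(K).
    print(cross_entropy([0.0, 0.0, 0.0, 0.0], target=2))
    # A confident, correct prediction gives a loss close to zero.
    print(cross_entropy([10.0, 0.0, 0.0, 0.0], target=0))
```

The loss shrinks toward zero as the logit of the true class dominates the others, which is what the training-loss curve tracks during optimization.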

During the experiment, we divided the dataset into a training set (640 samples), a validation set (152 samples), and a test set (528 samples). The training and validation sets were used to perform cross-validation to find suitable hyperparameters for the models. Test results were obtained by evaluating the models on the test set after training. The hyperparameters used in the experiment were a learning rate of 0.0002 and a batch size of 10. The GPU used for training was an NVIDIA RTX 2060, and the number of training epochs was set to 30.
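The split sizes and hyperparameters above can be written down as a small sketch. It assumes a simple seeded random shuffle of sample indices; the text does not state how the split was actually drawn, and `split_indices`, `HPARAMS`, and `batches_per_epoch` are hypothetical names.

```python
import random

# Hyperparameters as reported in the text.
HPARAMS = {"learning_rate": 2e-4, "batch_size": 10, "epochs": 30}


def split_indices(n_train=640, n_val=152, n_test=528, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    n_total = n_train + n_val + n_test  # 1320 samples in total
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches per epoch, counting a final partial batch."""
    return -(-n_samples // batch_size)  # ceiling division


if __name__ == "__main__":
    train, val, test = split_indices()
    steps = batches_per_epoch(len(train), HPARAMS["batch_size"])
    print(len(train), len(val), len(test))            # sizes of the three subsets
    print(steps, steps * HPARAMS["epochs"])           # updates per epoch and in total
```

With 640 training samples and a batch size of 10, each epoch consists of 64 parameter updates, so 30 epochs correspond to 1920 updates.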

We then evaluated the performance of our model against the reference model; the results are shown in Table 3 and Fig. 10.

Table 3. Results of our deep neural network model compared to the reference model; training time and accuracy are measured for 30 epochs.

Model Performances and Parameters

Model       Training time (s)   Model size (MB)   Test accuracy
Reference   26                  27.56             97.91%
Our Model   118                 190.02            100%

Comparing the performance of the two models over the first 10 epochs (see Fig. 10 for details), we can see that the training loss of our model converges faster and that its testing accuracy quickly reaches 100% and remains relatively stable there.

Additionally, unlike in the reference model, the width of the Convolutional layer and Residual layer in our model is the same as the width of the input time series data, which leads to a larger model size (190.02 MB versus 27.56 MB in Table 3). Nevertheless, the deep neural network still has advantages over feature engineering in terms of the training time.
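The cost of the wider layers can be quantified directly from the Table 3 figures. The short sketch below only performs that arithmetic; the dictionary layout and the names `TABLE3` and `ratio` are hypothetical.

```python
# Values copied from Table 3 (30 training epochs).
TABLE3 = {
    "Reference": {"train_time_s": 26, "size_mb": 27.56, "test_acc": 97.91},
    "Our Model": {"train_time_s": 118, "size_mb": 190.02, "test_acc": 100.0},
}


def ratio(key):
    """Ratio of our model's value to the reference model's value for one column."""
    return TABLE3["Our Model"][key] / TABLE3["Reference"][key]


time_ratio = ratio("train_time_s")  # roughly 4.5 times longer training
size_ratio = ratio("size_mb")       # roughly 6.9 times larger model
acc_gain = TABLE3["Our Model"]["test_acc"] - TABLE3["Reference"]["test_acc"]

if __name__ == "__main__":
    print(f"training time ratio: {time_ratio:.2f}")
    print(f"model size ratio:    {size_ratio:.2f}")
    print(f"accuracy gain:       {acc_gain:.2f} percentage points")
```

In other words, the gain of about 2.09 percentage points in test accuracy comes with a model that is several times larger and slower to train than the reference.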

Compared with the Baseline method based on feature engineering, the deep neural network-based method requires a certain amount of parallel computing capability. Therefore, it is necessary to confirm whether the hardware meets the requirements before using it.

[Figure 10: two panels over Epoch (2 to 10). Top: Testing Accuracy (0 to 100). Bottom: Training Loss (0 to 150). Legend: Reference Model, Our Baseline.]

Fig. 10. Comparison between our baseline model and reference model; Top: testing accuracy, Bottom: training loss.

5. Conclusion

In this paper, we addressed the problem of rolling bearing fault detection and proposed a novel pipeline that includes two baseline methods, based on feature engineering & dimension reduction and on a deep neural network, respectively. The models were tested on our rolling bearing dataset and achieved promising results. The conclusions for each part of the pipeline are as follows:

1. The Baseline based on feature engineering & dimension reduction: In our Baseline model, we first extracted features from the time domain, frequency domain, and time-frequency domain. We then applied the PCC method to reduce the dimension of the features. We tested our method on the dataset mentioned in Section 4.2, and it showed relatively high accuracy and good robustness when compared with PCA. Our model based on feature engineering & dimension reduction can serve as a Baseline with relatively high accuracy and robustness.

2. The Baseline based on Deep Neural Network: In our Baseline model, we used Convolutional Blocks and Residual Blocks to realize low-level as well as high-level feature extractors and achieved a promising result on the dataset mentioned in Section 4.2, significantly outperforming the reference model. Moreover, our model converged and reached 100% accuracy on the test set after only 10 epochs, with no sign of overfitting. We verified that, by using our baseline model, data augmentation can be avoided even when the model is presented with a relatively small dataset.

Given all of the above, we recommend using our toolbox for rolling bearing fault detection problems for the following reasons:

(1) Both methods can complete the task of bearing fault detection in a relatively short time and at low cost, and can thus provide alternative solutions and assistance for manual bearing inspection.

(2) When computing power is limited, users can choose a method from the toolbox that matches their own computing power and still obtain a robust result regardless of the hardware restrictions.

(3) When feature dimension reduction based on feature selection is needed, users can use the Baseline method based on feature engineering & dimension reduction.

Acknowledgements

This work was inspired by the DC data analysis competition on rolling bearing fault detection.

References:

[1] M. Z. X, “Application of acoustic emission technique in fault diagnostics of rolling bearing,” Master’s thesis, Tsinghua University, Beijing, Haidian, 2006.

[2] J. Harmouche, C. Delpha, and D. Diallo, “Improved fault diagnosis of ball bearings based on the global spectrum of vibration signals,” IEEE Transactions on Energy Conversion, vol. 30, no. 1, pp. 376–383, 2015.

[3] F. Immovilli, A. Bellini, R. Rubini, and C. Tassoni, “Diagnosis of bearing faults of induction machines by vibration or current signals: A critical comparison,” IEEE Transactions on Industry Applications, vol. 46, no. 4, pp. 1350–1359, 2010.

[4] M. Kang, J. Kim, and J. M. Kim, “An FPGA-based multicore system for real-time bearing fault diagnosis using ultrasampling rate AE signals,” IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2319–2329, 2015.

[5] A. B. Ming, W. Zhang, Z. Y. Qin, and F. L. Chu, “Dual-impulse response model for the acoustic emission produced by a spall and the size evaluation in rolling element bearings,” IEEE Transactions on Industrial Electronics, vol. 62, no. 10, pp. 6606–6615, 2015.

[6] R. R. Schoen, T. G. Habetler, F. Kamran, and R. G. Bartheld, “Motor bearing damage detection using stator current monitoring,” Proceedings of 1994 IEEE Industry Applications Society Annual Meeting, vol. 1, pp. 110–116, 1994.

[7] M. Blodt, P. Granjon, B. Raison, and G. Rostaing, “Models for bearing damage detection in induction motors using stator current monitoring,” IEEE Transactions on Industrial Electronics, vol. 55, no. 4, pp. 1813–1822, 2008.

[8] D. Lopez-Perez and J. Antonino-Daviu, “Application of infrared thermography to failure detection in industrial induction motors: Case stories,” IEEE Transactions on Industry Applications, pp. 1901–1908, 2017.

[9] E. Esfahani, S. Wang, and V. Sundararajan, “Multisensor wireless system for eccentricity and bearing fault detection in induction motors,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 3, pp. 818–826, 2014.

[10] S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep learning algorithms for bearing fault diagnostics a comprehensive review,” IEEE Access, vol. PP, no. 99, pp. 1–1, 2020.

[11] C. Taylor, “Rolling bearing analysis: 3rd edn.; by T. A. Harris; published by Wiley, Chichester, West Sussex, 1991; 1013 pp.; price, 97.35,” vol. 155, no. 2, pp. 393–394, 1992.

[12] J. Zheng, J. Cheng, and Y. Yang, “Generalized empirical mode decomposition and its applications to rolling element bearing fault diagnosis,” Mechanical Systems and Signal Processing.

[13] A. Bellini, F. Filippetti, C. Tassoni, and G. A. Capolino, “Advances in diagnostic techniques for induction machines,”

[14] K. Xu, Z. Chen, C. Zhang, and G. Dong, “Fault diagnosis of rolling bearing based on empirical mode decomposition and support vector machine,” in Control Theory and Application:1- 8, vol. 1, pp. 257–261, 2019.

[15] Z. Mou and Z. Du, “Double hidden layer neural network for bearing fault detection using pre-extracted optimal features,” in 2019 12th International Symposium on Computational Intelligence and Design (ISCID), vol. 1, pp. 257–261, 2019.

[16] S. Li, Z. Li, and H. Li, “The method of roller bearing fault monitoring based on wavelet packet energy feature,” Journal of System Simulation, no. 1, pp. 76–80, 2003.

[17] W. Guo, H. Zhao, C. Li, Y. Li, and A. Tang, “Fault feature enhancement method for rolling bearing fault diagnosis based on wavelet packet energy spectrum and principal component analysis,” Acta Armamentarii, no. 11, 2019.

[18] J. Li, X. Yao, X. Wang, Q. Yu, and Y. Zhang, “Multiscale local features learning based on BP neural network for rolling bearing intelligent fault diagnosis,” Measurement, vol. 153, p. 107419, 2020.

[19] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Transactions on Industrial Electronics, vol. 65, no. 7, pp. 5990–5998, 2017.

[20] Z. Cui, W. Chen, and Y. Chen, “Multi-scale convolutional neural networks for time series classification,” arXiv preprint arXiv:1603.06995, 2016.

[21] Z. Wang, W. Yan, and T. Oates, “Time series classification from scratch with deep neural networks: A strong baseline,” in 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585, 2017.

[22] T. Liu, A. Li, Y. Ding, Z. Li, and Q. Fei, “Experimental study on structural damage alarming method based on wavelet packet energy spectrum,” Journal of Vibration and Shock, vol. 028, no. 004, pp. 4–9, 2009.

[23] Y. Gu, Z. Cheng, and F. Zhu, “Rolling bearing fault feature fusion based on PCA and SVM,” China Mechanical Engineering, 2015.

[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

