


Challenging Anomaly Detection in Wire Ropes Using Linear Prediction Combined with One-class Classification

Esther-Sabrina Platzer¹, Joachim Denzler¹, Herbert Süße¹, Josef Nägele² and Karl-Heinz Wehking²

¹Chair for Computer Vision, Friedrich Schiller University of Jena
²Institute of Mechanical Handling and Logistics, University of Stuttgart

    Email: {platzer,denzler,herbert.suesse}@informatik.uni-jena.de

    Abstract

Automatic visual inspection has gained high importance in many fields of industrial application. Especially in safety-relevant applications, visual inspection is obligatory. Unfortunately, this task currently also bears a risk for the human inspector, as in the case of visual rope inspection. The huge and heavy rope is mounted at great height, or it is directly connected to running machines. Furthermore, the defects and anomalies are so inconspicuous that this is a very demanding task even for a human expert. For this reason, we present an approach for the automatic detection of defects and anomalies in wire ropes. Features which incorporate context information from the preceding rope region are extracted with the help of linear prediction. These features are then used to learn the faultless, normal structure of the rope with a one-class classification approach. Two different learning strategies, K-means clustering and a Gaussian mixture model, are used and compared. The evaluation is done on real rope data from a ropeway. Our first results on this demanding task show that it is possible to exclude more than 90 percent of the rope as faultless.

    1 Introduction and Motivation

Wire ropes are used in many fields of daily life: they bear the weight of bridges and can be found as the load cable in every elevator. Moreover, they are the foundation of every ropeway. All mentioned applications of wire ropes imply a high risk for human life if such a rope is damaged and therefore no longer safe. It is thus not astonishing that strict rules for regular inspection exist [14].

Figure 1: Rope defects: in the upper image a wire fracture can be seen, and in the lower image a wire is missing.

Typical defects in wire ropes are small wire fractures, rope material damaged by lightning strikes, or missing wires. Figure 1 displays two of the mentioned defects, marked in the rope. Furthermore, reduced stress of the rope or untwisting can be the origin of evolving defects and structural anomalies. For many applications, the ends of the rope have to be connected by interweaving individual wires of one end with lacings of the other end. Every time such an interleaving is done, there is a special structural modification, called a knot. The area which covers all knots is called the splice and is known as a region of higher liability to defects. Therefore, it is also important to detect these knots as structural anomalies.

Visual inspection of wire ropes is a difficult and dangerous task [14]. The inspectors are exposed to many risks, like falling or injuries caused by the closeness to the running rope. Besides, the inspection speed is quite high (on average 0.5 meters/second), which makes it hard to concentrate on the passing rope without missing small defective rope regions. For this reason, an automatic visual inspection would be an important improvement. This was the motivation for the development of a prototypic acquisition system based on four line cameras, yielding rope data in digital form [14]. This prototype system can be seen in figure 2. Using this new technique, it became possible to inspect the acquired data in a safe and comfortable way and without time pressure.

VMV 2008 O. Deussen, D. Keim, D. Saupe (Editors)

Figure 2: A prototype version of the device which acquires the rope data

A solution to the problem of automatic visual inspection of wire ropes, based on the digitally acquired data, is introduced in this paper. Considering visual inspection of wire ropes as a computer vision task bears some further problems. First, the defective regions are mostly limited to small areas, and the image quality is degraded by mud, powder, grease or water as well as by reflections caused by the lighting conditions. So even for a human it is hard to locate small defects, like a missing or broken wire, or small structural anomalies in the image data. Therefore, contextual information from neighbouring regions is important for the detection of abnormal regions.

Regarding machine learning strategies, restrictions arise from the small sample set of defects available in advance. Because safety is the key aspect, inspection instructions are rigorous. This makes it hard to collect enough sample defects of the different defect classes for supervised learning methods. In contrast, it is no problem to build a huge sample set of defect-free rope data.

The only obtainable ground-truth data is delivered by the human inspector through manually labeling the defects in a rope data sample set. Hence, statistical learning approaches are limited to one-class classification. Given just a sample set of defect-free, structurally consistent rope data, the task is to determine a boundary around this data in order to optimize the separation between the target class and outliers [19].

    1.1 Related Work

Comparable state-of-the-art work can be found in the fields of visual inspection of textile fabrics [18], fault detection in material surfaces [9], and other comparable anomaly detection tasks.

Most of the approaches for anomaly and defect detection in this context are based on textural features, like local binary patterns (LBP) [18] or Gabor filter outputs [9, 8]. Chan and Pang [2] make use of Fourier analysis to detect defects in fabrics. The approaches of Chetverikov [3] and Vartiainen et al. [20] both deal with the problem of irregularity detection in regular patterns or structures. Whereas Chetverikov [3] uses regularity features based on the auto-correlation function for the detection of non-regular structures, Vartiainen et al. [20] separate regular and irregular image information with the help of the Fourier transform.

Mostly, defect detection is performed by thresholding the filtered or transformed image data [8]. Chetverikov [3] uses a clustering approach and defines a threshold on the maximum distance to the cluster center for the detection of defects. Threshold determination is performed by training on defect-free samples. More sophisticated learning strategies use unsupervised classification methods like Gaussian mixture models [21] or self-organizing maps (SOM) [10]. These approaches can be arranged in the context of novelty detection. An extensive overview of the field of novelty detection, also known as one-class classification or outlier detection, is presented in [12, 13, 6]. More detailed work can be found in [19].

However, most of these approaches for defect detection do not take into consideration the aspect of contextual information over time. For the detection of small structural anomalies and defects in rope data this contextual information is highly important. In this paper, rope data is regarded as a time series, and a model which can explain the appearance of the data is sought. One key technique in the field of time series analysis is linear prediction (LP). Linear prediction aims to predict the next value of a time series based on the p last values and so incorporates a certain temporal context. In speech processing it is used to model the human vocal tract [15]. Although linear prediction has been known for decades, it was only recently used in the field of defect detection. Hajimowlana et al. [5] use linear prediction for defect detection in web inspection systems. Based on the prediction error they are able to localize defects in textures containing a constant background. In [17], 2-dimensional linear prediction is used for the automatic determination of seed points for a region-growing algorithm in order to detect microcalcifications in mammograms.

In this paper, a new approach combining features extracted by linear prediction with a one-class classification approach for defect detection in wire rope data is presented. Section 2 introduces how LP features are extracted from multichannel rope data. These features are used in section 3 to form an individual feature space for every channel, which serves for the separation into target class (non-defective rope regions) and outliers (possible defects). Experiments and results using real rope data from a ropeway are presented in section 4. They reveal the usability of the presented approach for automatic defect detection in wire ropes. A discussion of the contribution of this paper and of future work can be found in section 5.

    2 Linear Prediction of Rope Data

Linear prediction can be seen as one of the key techniques in speech recognition. It is used to compute parameters that determine the spectral characteristics of the underlying speech signal. Related applications are furthermore the recognition of EEG signals and the analysis of seismic signals [11].

The general idea is to develop a parametric representation that models the behaviour of the underlying signal. In the case of linear prediction, this is done by forecasting the value x(t) of the signal x at time t by a linear combination of the p past values x(t-k) with k = 1, ..., p, where p is the order of the autoregressive process. The prediction \hat{x}(t) of a 1-D signal can be written as:

\hat{x}(t) = -\sum_{k=1}^{p} \alpha_k x(t-k) \qquad (1)

    and the prediction error can be derived as:

e(t) = x(t) - \hat{x}(t) = x(t) + \sum_{k=1}^{p} \alpha_k x(t-k). \qquad (2)

This motivates the choice of linear prediction for feature extraction: the prediction of the actual value uses the contextual information of the past values, so context information is implicitly incorporated in the resulting feature.

A general assumption made in linear prediction is the stationarity of the signal. For this reason, a non-stationary signal should be segmented into relatively small, overlapping frames [15]. Hence, the signal is multiplied with a window function w(t), which here is chosen to be a rectangular window. The window size N is application-dependent and was chosen here to be 40 pixels wide, due to a strong response of the auto-correlation function for a translation of 40 pixels. This peak corresponds to one period of the regular wire structure. The quadratic error function for the whole window with t \in [t_0, t_1] is

E = \sum_{t=t_0}^{t_1} (e(t))^2 = \sum_{t=t_0}^{t_1} (x(t) - \hat{x}(t))^2 \qquad (3)

= \sum_{t=t_0}^{t_1} \Big( \sum_{k=0}^{p} \alpha_k x(t-k) \Big)^2 \qquad (4)

with \alpha_0 = 1. Based on this least-squares formulation, the optimal parameters \vec{\alpha} = (1, \alpha_1, ..., \alpha_p) can be derived by solving the following system

\frac{\partial E}{\partial \alpha_i} = 2 \sum_{k=0}^{p} \alpha_k \sum_{t=t_0}^{t_1} x(t-k)\, x(t-i) = 0 \qquad (5)

with i = 1, ..., p. Considering that \alpha_0 = 1, one can rewrite (5) as the following set of equations

\sum_{k=1}^{p} \alpha_k \phi_{ki} = -\phi_{0i} \quad \text{for } i = 1, ..., p \qquad (6)

with \phi_{ki} = \sum_{t=t_0}^{t_1} x(t-k)\, x(t-i). These equations are known as the normal equations [11] and can be solved for the prediction coefficients \alpha_k, 1 \le k \le p. The optimal parameter set is derived by use of the auto-correlation method [11, 15]. Due to the assumption of a stationary signal covered by the window, the short-time auto-correlation function

R(i) = \sum_{t=t_0}^{t_1 - i} x(t)\, x(t+i) \qquad (7)

can be used to replace \phi_{ki} by R(k-i). This results in

\sum_{k=1}^{p} \alpha_k R(k-i) = -R(i) \qquad (8)

where R(k-i) forms the auto-correlation matrix, which is known to be a Toeplitz matrix [15]. Such Toeplitz systems can be solved efficiently by use of the Levinson-Durbin recursion [11].
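As a minimal sketch (not the authors' implementation), the normal equations (8) can be solved with the Levinson-Durbin recursion, which exploits the Toeplitz structure and also yields the prediction error E_p as a by-product:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the normal equations sum_k alpha_k R(k-i) = -R(i),
    i = 1..p, for the LP coefficients via the Levinson-Durbin
    recursion; r holds the autocorrelation values R(0)..R(p).
    Returns (alpha_1..alpha_p, final prediction error E_p)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    E = r[0]
    for i in range(1, p + 1):
        # reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / E
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        E *= 1.0 - k * k
    return a[1:], E
```

The result agrees with directly solving the Toeplitz system, but needs only O(p^2) operations instead of O(p^3).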

    2.1 1-D Feature Extraction

As a result of the Levinson-Durbin recursion one gets the parameter vector \vec{\alpha} = (1, \alpha_1, ..., \alpha_p), containing the LP coefficients, and the minimum total error E_p of order p [11]. Because the signal is windowed into sufficiently small frames, a possible defect will probably cover large parts of the window. Hence, linear prediction in this window would fit the model to the defect and does not necessarily result in a high total error. For this reason, the overall total prediction error does not seem to be a distinctive indicator of a defect.

Furthermore, the rope data is a 2-dimensional signal, so 1-D linear prediction is not sufficient to analyse the whole signal. To overcome this, the rope data is considered as a multichannel time series. The signal \vec{x} consists of c channels, \vec{x} = (\vec{x}_1\; \vec{x}_2\; ...\; \vec{x}_c)^T, and every channel represents a 1-dimensional time series \vec{x}_i = (x_i(1), ..., x_i(t)) with i \in \{1, ..., c\}. For every channel i of this signal an individual 1-D linear prediction is performed, leading to the estimate \hat{x}_i(t) and a representative coefficient vector \vec{\alpha}_i. This resulting coefficient vector \vec{\alpha}_i is used directly as the feature for the actual frame and channel i.
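The channel-wise feature extraction can be sketched as follows; this is a simplified illustration (not the original code), and the model order p = 8 is a placeholder choice. For each channel of a frame, the short-time autocorrelation values are computed and the normal equations are solved directly:

```python
import numpy as np

def lp_feature(window, p=8):
    """LP coefficient vector (alpha_1..alpha_p) for one windowed
    channel signal, from the autocorrelation normal equations."""
    # short-time autocorrelation R(0)..R(p) of the window
    r = np.array([np.dot(window[:len(window) - i], window[i:])
                  for i in range(p + 1)])
    # Toeplitz system R(k - i) alpha = -R(i)
    T = np.array([[r[abs(k - i)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(T, -r[1:])

def frame_features(frame, p=8):
    """One LP feature vector per channel of a (channels x width) frame."""
    return np.stack([lp_feature(channel, p) for channel in frame])
```

With a frame of 130 channels and 40 pixels width, this yields a 130 x p feature matrix, one coefficient vector per channel.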

3 One-class Classification of Rope Data

After feature extraction, a separation between faultless and faulty samples is desired. The faultless samples represent the target class \omega_T, and the defects are considered as outliers \omega_O. In other words, a representation of the target density p(\vec{\alpha} \mid \omega_T) is sought without any knowledge about the outlier density p(\vec{\alpha} \mid \omega_O) [19]. This way of looking at the one-class classification problem fits all present problems very well. It is no problem to construct a large training sample set of defect-free feature samples for finding an optimal description of the target density. Furthermore, the small number of available defective samples is no longer a problem, nor is the fact that not all possible defects can be covered by such a sample set. Considering the goal of excluding as many rope meters as possible from further inspection, the theory of one-class classification seems to be a good choice.

A comparison of the one-class classification problem with normal binary classification is given in [19] and is shortly summarized in the following. In binary classification problems, an optimal parameter vector \vec{w}^* is sought in training which minimizes the total error \varepsilon of the classification function f(\vec{\alpha}; \vec{w}) with respect to the class label y:

\varepsilon_{\text{true}}(f, \vec{w}, A) = \int \varepsilon(f(\vec{\alpha}; \vec{w}), y)\, p(\vec{\alpha}, y)\, d\vec{\alpha}\, dy \qquad (9)

Here f is the classification function, which depends on the parameters \vec{w}. The error function \varepsilon can be an arbitrary function for defining the classification error, like the mean squared error for real-valued classification functions or the 0-1 loss for a discrete-valued function f. A is the feature space.

In contrast, for one-class classification problems the only obtainable information is that of the target density. This solely allows a minimization of the rejection of defect-free samples. Hence, the total error (9) cannot be minimized due to the lack of knowledge about the outlier density. The total error is therefore replaced by two other error measures: the false negative rate (FNR) and the false positive rate (FPR). The false negative rate measures how many faultless samples are regarded as outliers with respect to all positive samples, and the false positive rate gives the fraction of defects wrongly classified as fault-free. The FNR can be measured directly on the training set; by treating the whole feature space as the target class, it could be trivially minimized. The false positive rate, in contrast, cannot be measured in training without a sample set containing a sufficient number of defective samples. However, assuming a uniform distribution of outliers, a minimization of the FNR in combination with a minimization of the descriptive volume of the target density p(\vec{\alpha} \mid \omega_T) results in a minimization of the FNR together with the FPR [19].
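Under this convention (the target class is the faultless rope), the two rates can be written down directly; a small sketch with hypothetical label arrays:

```python
import numpy as np

def error_rates(is_target, rejected):
    """FNR: fraction of faultless (target) samples rejected as
    outliers. FPR: fraction of defects accepted as fault-free."""
    is_target = np.asarray(is_target, dtype=bool)
    rejected = np.asarray(rejected, dtype=bool)
    fnr = float(np.mean(rejected[is_target]))
    fpr = float(np.mean(~rejected[~is_target]))
    return fnr, fpr
```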

There exist many different methods for novelty detection [6]. Tax [19] divides them into three categories: density-based methods, boundary methods and reconstruction-based methods. Density-based methods attempt to estimate the whole probability density of the target class; examples are Gaussian distributions or Gaussian mixture models (GMM). Boundary methods focus on computing the boundaries of the target class without estimating the complete density. This is advantageous especially if the training sample set is not representative or consists of only a few examples [19]. Reconstruction-based methods subdivide the feature space and represent it by subspaces or prototypes. Most of these approaches make use of a priori knowledge about the data or the generating process [19]. For this work, two approaches are chosen: K-means clustering and a Gaussian mixture model. Both are shortly summarized in the following subsections.

    3.1 K-means Clustering

K-means clustering is chosen due to its simplicity and its ability to represent the feature space as a set of prototype vectors [6]. Considering the rope data, it is an acceptable assumption to think of the features as representatives of certain signal characteristics, which repeat over time due to the periodic structure of the rope. Hence, an approach which subdivides the feature space into clusters belonging to one of the prototypes fits this assumption. The number of prototypes equals the number of clusters K and is predefined in advance. For training, which results in the prototype vectors and a cluster radius, the following error function is minimized, where for every feature \vec{\alpha}_i extracted by linear prediction its nearest cluster center \vec{\mu}_k is computed in the sense of the Euclidean distance:

\varepsilon_{\text{kmeans}} = \sum_{i} \min_{k} \|\vec{\alpha}_i - \vec{\mu}_k\|^2 . \qquad (10)

Training is done via an iterative procedure: in the first step samples are assigned to the nearest cluster, and in the second step the cluster centers, representing the prototypes, are recomputed as the mean vectors of all cluster samples [1]. After training, it is possible to define a threshold on the maximum distance to a cluster as a criterion for outlier rejection. The distance of a feature \vec{\alpha}' to the nearest prototype is defined as

d_{\text{kmeans}}(\vec{\alpha}') = \min_{k} \|\vec{\alpha}' - \vec{\mu}_k\|^2 . \qquad (11)

The maximum distance obtained for a feature in the training stage could be used directly for threshold computation [6]. However, due to noise and uncertainty in the training set, the maximum distance is not a good choice. Instead, an optimal threshold t_{\text{opt}} should lie between the mean distance d_{\text{mean}} and the maximum distance d_{\text{max}} measured during the training step:

t_{\text{opt}} = \lambda\, d_{\text{mean}} + (1 - \lambda)\, d_{\text{max}} \qquad (12)

Here \lambda with 0 \le \lambda \le 1 is the free parameter steering the threshold. A good evaluation method for an optimally chosen threshold t_{\text{opt}} are receiver operating characteristics (ROC) [16]. This evaluation is given in section 4.
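A minimal numpy sketch of this scheme (plain Lloyd iterations plus the threshold from (12)); the paper used the Torch3 library, so this is an assumed re-implementation, not the original code:

```python
import numpy as np

def kmeans(X, k=5, iters=50, seed=0):
    """Plain Lloyd iterations: assign each sample to the nearest
    centre, then recompute each centre as the mean of its samples."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def rejection_threshold(X, centres, lam=0.5):
    """t_opt = lam * d_mean + (1 - lam) * d_max, equation (12),
    computed from the training distances to the nearest prototype."""
    d = ((X[:, None, :] - centres[None]) ** 2).sum(-1).min(axis=1)
    return lam * d.mean() + (1.0 - lam) * d.max()
```

A test sample is then rejected as an outlier if its squared distance to the nearest centre exceeds t_opt.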

    3.2 Gaussian Mixture Models

As the second classification strategy, a Gaussian mixture model is chosen. Estimating the underlying density with just a single Gaussian would be too inflexible due to its unimodal character. A way to approximate complex densities is provided by Gaussian mixture models [1]. Many authors use a Gaussian mixture model with a predefined number of Gaussian kernels to describe the underlying density in the sense of one-class classification, e.g. [19, 16, 21]. A Gaussian mixture consists of a linear combination of K single Gaussian kernels of dimension d:

p_N(\vec{\alpha} \mid \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\Big( -\frac{1}{2} (\vec{\alpha} - \vec{\mu})^T \Sigma^{-1} (\vec{\alpha} - \vec{\mu}) \Big) \qquad (13)

Each Gaussian component j = 1, ..., K is parametrized by its mean vector \vec{\mu}_j and its covariance matrix \Sigma_j as well as the corresponding mixing coefficient \pi_j:

p_{\text{MoG}}(\vec{\alpha}) = \sum_{j=1}^{K} \pi_j\, p_N(\vec{\alpha} \mid \vec{\mu}_j, \Sigma_j) \qquad (14)

The mixing coefficients \pi_j represent the influence of each component on the overall density. It holds that \sum_{j=1}^{K} \pi_j = 1.

(14) gives the likelihood of a certain sample \vec{\alpha} belonging to the target density. The parameters \vec{\pi} = \{\pi_1, ..., \pi_K\}, \vec{\mu} = \{\vec{\mu}_1, ..., \vec{\mu}_K\} and \Sigma = \{\Sigma_1, ..., \Sigma_K\} are derived in the training step with the EM algorithm [1]. In contrast to K-means clustering, the Gaussian mixture model computes a soft assignment of every new sample to the components [1]. For density-based models it is possible to define an analytic threshold [19]; one strategy for doing so is presented in [7]. In this work, the threshold is determined in a similar manner as described in section 3.1 for K-means clustering: based on the minimal likelihood assigned to one of the training samples and the mean likelihood achieved in training, the optimal threshold is evaluated with the help of ROC curves. Results are presented in section 4.
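This thresholding strategy can be sketched with scikit-learn's `GaussianMixture` as a stand-in for the Torch3 implementation used in the paper (an assumed sketch, not the original code; the interpolation between mean and minimum training log-likelihood mirrors equation (12)):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_detector(features, k=8, lam=0.5, seed=0):
    """Fit the target density on defect-free features and derive a
    log-likelihood rejection threshold between the mean and the
    minimum training log-likelihood."""
    gmm = GaussianMixture(n_components=k, random_state=seed).fit(features)
    ll = gmm.score_samples(features)  # log p_MoG per training sample
    t_opt = lam * ll.mean() + (1.0 - lam) * ll.min()
    return gmm, t_opt

def reject(gmm, t_opt, feature):
    """Reject a sample as outlier if its log-likelihood under the
    target density falls below the threshold."""
    return bool(gmm.score_samples(feature[None])[0] < t_opt)
```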

    3.3 Anomaly Detection Model

In this section, the anomaly detection model is introduced. Regarding the rope data as a multichannel version of a 1-D time series, it becomes clear that the signal characteristics of the individual channels are not equal. This is induced by the different periodicity of the local structure in different regions of the rope. Especially in regions around the rope borders, the period length of a light-dark change differs from that of the interior rope regions. Furthermore, defective regions are limited to only a small number of channels. Hence, constructing one very high-dimensional feature vector out of the feature vectors of all individual channels would suppress the defect probability of the frame in case of a present defect. For this reason, every channel is treated separately concerning feature extraction and classification. Figure 3 presents a rough sketch of the underlying model. For every channel its own feature space is created. In every feature space, clustering of the defect-free samples is performed, resulting in a representation of the overall density by an individual model. Testing with real rope data is also performed channel-wise: for every channel an individual decision about rejection as an outlier is made. Unfortunately, this technique in general leads to a high rejection rate of defect-free samples due to noise or mud present in at least one of the channel signals. Therefore, the proposed approach makes use of neighbourhood information in the decision process. The classification output is examined in a fixed-size channel neighbourhood, and the whole frame is only labeled as an anomaly if the number of potential outliers in one neighbourhood exceeds a certain threshold t_N. Experimental evaluation of this threshold is presented in section 4.
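The neighbourhood rule can be sketched as a sliding window over the per-channel outlier votes (a neighbourhood of 15 channels as in the experiments; the function name is ours):

```python
import numpy as np

def frame_is_anomaly(channel_outliers, neigh=15, t_n=3):
    """Label the frame an anomaly only if some window of `neigh`
    adjacent channels contains at least t_n channel-level
    outlier votes."""
    votes = np.asarray(channel_outliers, dtype=float)
    # sum of votes in every window of `neigh` adjacent channels
    window_sums = np.convolve(votes, np.ones(neigh), mode="valid")
    return bool(np.max(window_sums) >= t_n)
```

Scattered single-channel rejections, as caused by noise or mud, thus no longer trigger a frame-level anomaly.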

Since most of the defects in real rope data have an elongation between 100 and 300 lines, a further assumption is made for the outlier detection: if one frame in the range of a defect is rejected as an outlier, the whole defect is marked as detected. This procedure is necessary at the moment due to the fitting characteristic of the linear prediction model to the underlying data: it is not possible to determine the exact area of the defect with the presented approach. In a detection run in a real-life scenario without ground-truth labeling, the human expert would have to inspect a sufficiently large area around the outlier frame.
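This evaluation rule is easy to state in code (a sketch; names hypothetical):

```python
def detected_defects(defect_ranges, outlier_lines):
    """A labeled defect, given as an inclusive (start, end) line
    range, counts as detected if at least one frame rejected as an
    outlier falls inside its range."""
    return [any(start <= line <= end for line in outlier_lines)
            for start, end in defect_ranges]
```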

    4 Experiments and Results

Evaluation is done on real rope data, which was acquired within the feasibility study for the already mentioned prototype device. For this reason, the underlying dataset contains real rope defects, and the data is noisy and contaminated with mud or water. Since these are real-life problems, the following evaluation can prove the practicability of the presented approach.

The data set we used has 13,618,176 lines acquired by each line camera. For every view there exists a labeled error table giving the range and the label of every defect or anomaly in the rope. For the learning step, only rope regions which are assumed to be fault-free were used. For the testing phase, a connected region containing all available defects was chosen. The number of defects in this region of about 13,100,000 lines varies from view to view between 9 and 13.

The evaluation compares the two different one-class classification strategies. We used the Torch3 library [4] as the implementation of both K-means clustering and the Gaussian mixture model. The performance of the whole system is best assessed with the help of ROC curves, which show the system behaviour concerning false and true positive rates with respect to the threshold for outlier rejection. Furthermore, some of the parameters are evaluated in order to find an optimal setting. These are in the following: the size of the training set (in lines), the number of channel outliers which have to be reached in order to consider the whole frame as an outlier, and the number of cluster centers or Gaussians predefined for the classification task.

Figure 3: Multichannel version of the classification model. For every channel in the frame (blue) a feature is extracted and examined in a separate feature space for that channel. The red part in the detection window marks the signal values which are predicted in the actual step.

Figure 4: Plot of the mean distance with respect to the number of clusters (left) and the mean log pdf-value with respect to the number of Gaussians (right), achieved in a training step with 100,000 lines of defect-free rope data. Evaluation was done on channel 50 of view 2.

    4.1 Number of Clusters/Gaussians

In order to determine the optimal number of clusters or Gaussians for the one-class classification of defect-free rope data, the mean distance (or the mean probability density function (pdf) value per feature, in the mixture case) reached in the learning step was used. This mean distance (pdf-value) is assumed to decrease (increase) with every added cluster (Gaussian). However, adding one additional cluster or Gaussian to a small number of already used ones should have a higher impact on the mean distance (pdf-value) than adding it to an already huge number of clusters/Gaussians. The plots in figure 4 show exemplary results for channel 50 out of 130; for the other channels the results are similar. Note that the Torch3 implementation, like common approaches, uses the log-likelihood instead of the likelihood and, to be consistent concerning the range, negates the Euclidean distance used in the K-means approach. Based on these plots, the number of clusters for the K-means approach was set to five; for the Gaussian mixture model, 8 components were used in the following experiments.

    4.2 ROC Curves

ROC curves describe the behaviour of the system with respect to true and false positive rates under a varying threshold for the rejection of one class. In general, the goal is to maximize the TPR and minimize the FPR. In the context of visual rope inspection, however, it is important not to miss any defect. On account of this, the primary and most important interest is to minimize the FPR; a maximization of the TPR is only meaningful while the FPR stays zero. The ROC curves in figure 5 show the system behaviour with regard to the threshold computation from (12), where \lambda is varied from 0 to 1 in steps of 0.05. The number of lines used for training was 10,000. Apparently, the Gaussian mixture approach outperforms K-means clustering. Nevertheless, both strategies result in a quite high true positive rate. Hence, a huge amount of rope, up to 90 percent and more, can be excluded from further inspection.

Figure 5: ROC curves for the usage of K-means clustering (red) and a Gaussian mixture model (green) with 5 and 8 clusters/Gaussians respectively, and a learning sample set consisting of features from 10,000 lines of defect-free rope.
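The ROC construction used here can be sketched as a sweep over \lambda (a simplified illustration with a distance-based score; following the paper's convention, the TPR counts accepted faultless frames and the FPR counts accepted defects):

```python
import numpy as np

def roc_points(d_train, d_test, is_defect, lams=None):
    """One (FPR, TPR) point per lambda: the threshold of equation
    (12) is computed on the training distances and applied to the
    test distances; samples at or below the threshold are accepted."""
    if lams is None:
        lams = np.arange(0.0, 1.0001, 0.05)
    points = []
    for lam in lams:
        t = lam * d_train.mean() + (1.0 - lam) * d_train.max()
        accepted = d_test <= t
        tpr = float(accepted[~is_defect].mean())  # faultless kept
        fpr = float(accepted[is_defect].mean())   # defects missed
        points.append((fpr, tpr))
    return points
```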

    4.3 Influence of the Training Size

Clearly, the size of the training sample set has an influence on the results, because the goal is to represent the underlying density of defect-free samples as well as possible. For this task, it is important to cover enough faultless observations from the rope data in order to discriminate between truly defective samples and noisy, but defect-free samples. Test runs resulting in ROC curves were performed for both classification approaches with training sample sets of different sizes (2,000, 20,000 and 200,000 lines of rope data). The outcome of the experiments can be seen in figure 6. A size of 2,000 lines is not sufficient for the chosen setting with a frame width of 40 lines and an overlap of 15 lines. The best results were obtained for a training sample set consisting of features from 200,000 lines of rope data. The ROC curve for K-means clustering trained with 2,000 lines of rope data shows a curious behaviour at one point; this can be caused by the random initialisation of the K-means clustering in every learning step. Note that missing even one defect causes the FPR to increase noticeably.

    4.4 Impact of the Outlier Threshold

This experiment shows the impact of the heuristic post-processing step applied after the classification of every individual channel. Given a neighbourhood of 15 channels, the whole frame is only rejected as an outlier if a sufficient number of channels in this neighbourhood vote for an outlier. This threshold is varied in our experiment between 1 and 14, and figure 7 shows the results for tN = 3, 5 and 10. The results for the Gaussian case (right) are close to those for the K-means approach (left). Obviously, too high values for tN result in lower overall performance, because the procedure becomes less sensitive to defects. On the other hand, a very low threshold also reduces the performance due to a high sensitivity to noise. Regarding the plots, a threshold between three and five seems to be a good choice in order to achieve a high TPR with minimal FPR.
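The neighbourhood voting heuristic can be sketched as follows; the sliding-window formulation and the function name `vote_outliers` are assumptions, since the paper does not spell out the exact aggregation rule:

```python
import numpy as np

def vote_outliers(channel_flags, window=15, t_n=3):
    """Reject a frame only if at least t_n channels inside a window of
    `window` neighbouring channels flag an outlier.
    channel_flags: boolean array with one per-channel decision."""
    flags = np.asarray(channel_flags, dtype=bool)
    half = window // 2
    for i in np.flatnonzero(flags):
        lo, hi = max(0, i - half), min(len(flags), i + half + 1)
        if flags[lo:hi].sum() >= t_n:
            return True
    return False

# A single flagged channel is treated as noise ...
frame = np.zeros(130, dtype=bool)
frame[60] = True
assert not vote_outliers(frame, t_n=3)

# ... but three neighbouring votes reject the whole frame.
frame[61] = frame[62] = True
assert vote_outliers(frame, t_n=3)
```

This directly mirrors the trade-off discussed above: raising `t_n` suppresses isolated noisy channels but also desensitises the system to genuine defects.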

    5 Conclusion and Outlook

A new approach for the challenging task of automatic visual inspection of wire ropes was presented. As a first step, features based on linear prediction were introduced. The use of these features is justified by their ability to incorporate the temporal context, which is useful for detecting deviations from the normal structure. A combination of a one-class classification strategy, using K-means clustering or Gaussian mixture models, with the LP-based features was motivated and described. The presented work integrates well into the field of novelty detection with respect to an optimal separation of defective or abnormal samples from the learned, fault-free structure of wire ropes. The evaluation of the whole system proves its applicability to the real-life problem of visual rope inspection: the system is able to exclude up to 90 percent and more of the rope data from further visual inspection. Considering the difficulty of the data, this is a remarkable result. However, it


Figure 6: ROC curves for the usage of K-means clustering (left) and a Gaussian mixture model (right) with training sets of different sizes: 2,000 (red), 20,000 (green) and 200,000 (blue) lines of rope data.


Figure 7: ROC curves with regard to the outlier rejection threshold tN = 3, 5, 10 for the K-means approach (left) and the Gaussian mixture model (right). Training was performed with 20,000 defect-free sample lines.

is not possible to classify the potential defects automatically with respect to their corresponding defect class; this will be the focus of future work. Furthermore, it has to be studied whether a context-based classification approach from the field of time series analysis, such as a hidden Markov model, can improve the system to the point where the exact area of a defect can be determined. Beyond that, the incorporation of a priori knowledge about defects from the few examples available in advance will be a topic under investigation. A comparison with another frequently used one-class classification method, the support vector data description [19], is also of great interest.

Acknowledgment: This research is supported by the German Research Foundation (DFG) within the projects DE 735/6-1 and WE 2187/15-1.

    References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[2] C. Chan and G. K. H. Pang. Fabric Defect Detection by Fourier Analysis. IEEE Transactions on Industry Applications, 36(5):1267–1276, 2000.

[3] D. Chetverikov. Structural Defects: General Approach and Application to Textile Inspection. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR), volume 1, pages 521–524, 2000.

[4] R. Collobert, S. Bengio, and J. Mariéthoz. Torch: a modular machine learning software library. Technical report, IDIAP, 2002.

[5] S. H. Hajimowlana, R. Muscedere, G. A. Jullien, and J. W. Roberts. 1D Autoregressive Modeling for Defect Detection in Web Inspection Systems. In Proceedings of the 1998 Midwest Symposium on Circuits and Systems, pages 318–321, 1998.

[6] V. J. Hodge and J. Austin. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.

[7] J. Ilonen, P. Paalanen, J.-K. Kamarainen, and H. Kälviäinen. Gaussian mixture pdf in one-class classification: computing and utilizing confidence values. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR), volume 2, pages 577–580, 2006.

[8] A. Kumar and K. H. Pang. Defect Detection in Textured Materials Using Gabor Filters. IEEE Transactions on Industry Applications, 38(2):425–440, 2002.

[9] C. Mandriota, E. Stella, M. Nitti, N. Ancona, and A. Distante. Rail corrugation detection by Gabor filtering. In Proceedings of the IEEE International Conference on Image Processing (ICIP), volume 2, pages 626–628, 2001.

[10] T. Mäenpää, M. Turtinen, and M. Pietikäinen. Real-time surface inspection by texture. Real-Time Imaging, 9(5):289–296, 2003.

[11] J. Makhoul. Linear Prediction: A Tutorial Review. Proceedings of the IEEE, 63(4):561–580, 1975.

[12] M. Markou and S. Singh. Novelty detection: a review – part 1: statistical approaches. Signal Processing, 83(12):2481–2497, 2003.

[13] M. Markou and S. Singh. Novelty detection: a review – part 2: neural network based approaches. Signal Processing, 83(12):2499–2521, 2003.

[14] D. Moll. Visuelle und magnetinduktive Seilprüfung bei Bergbahn- und Brückenseilen. In Tagungsband des 2. Internationalen Stuttgarter Seiltags, pages 1–11, 2005.

[15] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall PTR, 1993.

[16] F. Ratle, M. Kanevski, A.-L. Terrettaz-Zufferey, P. Esseiva, and O. Ribaux. A Comparison of One-Class Classifiers for Novelty Detection in Forensic Case Data. In Intelligent Data Engineering and Automated Learning (IDEAL), pages 67–76. Springer, 2007.

[17] C. Serrano, J. Diaz-Trujillo, B. Acha, and R. M. Rangayyan. Use of 2D Linear Prediction Error to Detect Microcalcifications in Mammograms. In Proceedings of the 2nd Latin American Congress of Biomedical Engineering, Havana, Cuba, pages 1–4, article 330, 2001.

[18] F. Tajeripour, E. Kabir, and A. Sheikhi. Fabric Defect Detection Using Modified Local Binary Patterns. EURASIP Journal on Advances in Signal Processing, 8(1):1–12, 2008.

[19] D. M. J. Tax. One-Class Classification: Concept-learning in the Absence of Counter-examples. PhD thesis, Delft University of Technology, 2001.

[20] J. Vartiainen, A. Sadovnikov, J.-K. Kamarainen, L. Lensu, and H. Kälviäinen. Detection of irregularities in regular patterns. Machine Vision and Applications, 19(4):249–259, 2008.

[21] X. Xie and M. Mirmehdi. TEXEMS: Texture Exemplars for Defect Detection on Random Textured Surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(8):1454–1464, 2007.