a methodology for improving tear film lipid layer classification

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 4, JULY 2014 1485

A Methodology for Improving Tear FilmLipid Layer Classification

Beatriz Remeseiro, Veronica Bolon-Canedo, Diego Peteiro-Barral, Amparo Alonso-Betanzos,Bertha Guijarro-Berdinas, Antonio Mosquera, Manuel G. Penedo, and Noelia Sanchez-Marono

Abstract—Dry eye is a symptomatic disease which affects a widerange of population and has a negative impact on their daily ac-tivities. Its diagnosis can be achieved by analyzing the interferencepatterns of the tear film lipid layer and by classifying them into oneof the Guillon categories. The manual process done by experts is notonly affected by subjective factors but is also very time consuming.In this paper we propose a general methodology to the automaticclassification of tear film lipid layer, using color and texture infor-mation to characterize the image and feature selection methods toreduce the processing time. The adequacy of the proposed method-ology was demonstrated since it achieves classification rates over97% while maintaining robustness and provides unbiased results.Also, it can be applied in real time, and so allows important timesavings for the experts.

Index Terms—Feature selection, Guillon categories, machinelearning, tear film lipid layer, textural features.

I. INTRODUCTION

T EARS are secreted from the lachrymal gland and dis-tributed by blinking to form the tear film of the ocular

surface [1]. The tear film is responsible for wetting the ocularsurface, which is the first line of defense, and is also essentialfor clear visual imaging [2]. Its outer layer, known as tear filmlipid layer (TFLL), is composed of a polar phase with surfactantproperties overlaid by a non-polar phase. It is the thinnest layerof the tear film and is mainly secreted by the meibomian glands,embedded in the upper and lower tarsal plates [3]. Some of itsfunctions include forming a glossy and smooth surface with highoptic qualities, establishing the tear film, sealing the lid margins

Manuscript received February 14, 2013; revised October 10, 2013; acceptedDecember 8, 2013. Date of publication December 11, 2013; date of currentversion June 30, 2014. This paper has been partially funded by the Secretarıade Estado de Investigacion of the Spanish Government and FEDER funds ofthe European Union through the research projects PI10/00578, TIN2011-25476,and TIN2012-37954; and by the Consellerıa de Industria of the Xunta de Galiciathrough the research projects CN2011/007 and CN2012/211. B. Remeseiro, V.Bolon-Canedo and D. Peteiro-Barral acknowledge the support of Xunta deGalicia under Plan I2C Grant Program.

B. Remeseiro, V. Bolon-Canedo, D. Peteiro-Barral, A. Alonso-Betanzos,B. Guijarro-Berdinas, M.G. Penedo, and N. Sanchez-Marono are with the Dept.de Computacion, Universidade da Coruna, Campus de Elvina s/n, A Coruna15071, Spain (e-mail: [email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected]; [email protected]).

A. Mosquera is with the Dept. de Electronica y Computacion, Universidadede Santiago de Compostela, Campus Universitario Sur, Santiago de Compostela15782, Spain (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JBHI.2013.2294732

during sleep, preventing spillover of tears, and controlling waterevaporation from the tear film [4].

A quantitative or qualitative change in the normal lipid layerhas a negative effect on the quality of vision measured as con-trast sensitivity [5] and on the evaporation of tears from theocular surface. Actually, it has been shown that a substan-tial tear evaporation caused by alterations of the lipid layeris characteristic of the evaporative dry eye (EDE) [6]. This dis-ease leads to irritation of the ocular surface and is associatedwith symptoms of discomfort and dryness [7]. Also, it affectsa wide sector of the population, specially among contact lensusers, and worsens with age. The current work conditions, suchas computer use, has increased the proportion of people withEDE [8].

The lipid layer thickness can be evaluated by the classifica-tion of the interference phenomena, which correlates with tearfilm quality [9], since a thinner lipid layer speeds up water evap-oration, decreasing the tear film stability and often causing theEDE. The Tearscope plus is the instrument designed by Guillonfor rapid assessment of lipid layer thickness [10]. Other deviceswere designed for lipid layer examination, but Tearscope plusis still the most commonly used instrument in clinical settingsand research. In general, Tearscope images have been analyzedin vivo, without any acquisition process. However, there aresome studies that have used video/images acquisition proce-dures [11], [12], which aid the observer to correctly classify thepatterns and are indispensable for computer-based analysis. Theacquisition process in those studies are similar to that performedin this paper: Rolando et al. [11] acquired lipid layer videos us-ing the Tearscope plus, whilst Nichols et al. [12] obtained aphotograph series.

Guillon defined five main grades of lipid layer interferencepatterns [10] to evaluate the lipid layer thickness through theTearscope plus: open meshwork, closed meshwork, wave, amor-phous and color fringe. However, the classification into thesegrades is a difficult clinical task, especially with thinner lipidlayers that lack color and/or morphological features. The sub-jective interpretation of the experts via visual inspection mayaffect the classification. This time-consuming task is very de-pendent on the training and experience of the optometrist(s),and so produces a high degree of inter- and also intra- observervariability [13]. The development of a systematic and objec-tive computerized method for analysis and classification is thushighly desirable, allowing for homogeneous diagnosis and re-lieving the experts from this tedious task. Some techniques havebeen designed to objectively calculate the lipid layer thicknesswhere a sophisticated optic system was necessary [14] or an

2168-2194 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications standards/publications/rights/index.html for more information.

1486 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 4, JULY 2014

Fig. 1. Steps of the research methodology.

interference camera evaluated the lipid layer thickness by onlyanalyzing the interference color [15].

In [13], [16], it was demonstrated that the interference phe-nomena can be characterized as a color texture pattern, so theclassification could be automatized using machine learning tech-niques. Similarly, in [17] a wide set of feature vectors was pro-posed using different texture analysis methods in three colorspaces and a 95% of classification accuracy was obtained. Upto the authors’ knowledge, there are no more attempts in theliterature to automatize TFLL classification.

The problem with the approach proposed in [17] is that thetime required to extract some of the textural features is too long(more than one minute). Interviews with optometrists revealedthat a scale of computation time over ten seconds per imagemakes the system not usable. Therefore, the previous approachprevents the practical clinical use of an application developedto automatize the process, because it could not work in realtime. In order to deal with this problem, feature selection canplay a crucial role. In machine learning, feature selection (FS) isdefined as the process of detecting the relevant features and dis-carding the irrelevant ones [18], to obtain the subset of featuresthat describes properly the given problem. Therefore, becausethe number of features to extract and process will be reduced,the time required also will be reduced in consonance, and mostof the times, this can be achieved with a minimum degradationof performance.

In this paper, a methodology for improving TFLL classi-fication will be presented, in which feature selection plays acrucial role by optimizing both accuracy and processing time.For this purpose, several FS methods will be applied after ex-tracting the color and texture features trying to improve theperformance of the results achieved in [17]. Moreover, since theevaluation of the system is performed in terms of several crite-ria, a multiple-criteria decision-making (MCDM) method willbe used to obtain the best solution. This will lead us to a gen-eral methodology able to be applied to some other classificationproblems.

This paper is organized as follows: Section II describes theresearch methodology, Section III presents the materials andmethods employed in this paper, Section IV shows the resultsand discussion and, finally, Section V includes the conclusions.

II. RESEARCH METHODOLOGY

In order to obtain an efficient method for automatic TFLLclassification, a five-step methodology was applied as illustratedin Fig. 1. First, feature extraction is performed, after which FSmethods are applied to select the subset of relevant features,that allow for correct classification of the TFLL thickness. Afterthat, several performance measures are computed to evaluate theperformance of the system. Finally, a MCDM method is carriedout to obtain a final result. In what follows, every step will beexplained in depth.

Fig. 2. Steps of the feature extraction step.

Fig. 3. (a) Input image in RGB. (b) Sub-template and region of interest.

A. Feature Extraction

The initial stage in this methodology consists in process-ing the tear film images in order to extract their features (seeFig. 2). First, the region of interest of an input image in RGBis extracted. Then, this extracted region in RGB is converted tothe Lab color space and its channels L, a and b are obtained.After that, the texture features of each channel are extractedand three individual descriptors are generated. Finally, the threeindividual descriptors are concatenated in order to generate thefinal descriptor of the input image which contains its color andtexture features. Next, every stage will be explained in detailapart from the Concatenation stage due to its simplicity.

1) Extraction of the Region of Interest: The input images,as depicted in Fig. 3(a), include several areas of the eye that donot contain relevant information for the classification. Expertsusually focus on the bottom part of the iris, because this is thearea where the tear can be perceived with the best contrast.This forces a preprocessing step aimed at extracting the regionof interest (ROI) [19]. The acquisition procedure guaranteesthat this region corresponds to the most illuminated area of theimage. To this end, the input image is transformed to the Labcolor space and only the L component is considered in this step.Also, a set of ring-shaped templates that define different ROIshapes is used. Then, the normalized cross-correlation betweenthe L component and the set of templates is computed. Finally,the region with maximum cross-correlation value is selected [seeFig. 3(b)] and the ROI of the input image is obtained through acompletely automatic process.

2) Color Analysis: Color is one of the discriminant featuresof the Guillon categories [10] for TFLL classification. Some cat-egories show distinctive color characteristics and, for this rea-son, tear film images were analyzed [17] not only in grayscalebut also in the Lab and RGB color spaces. The best results wereobtained in Lab, so we will focus on it from now on. The CIE1976 L*a*b color space [20] (Lab) was defined by the Interna-tional Commission on Illumination, abbreviated as CIE from itsFrench title Commission Internationale de l’Eclairage. It is achromatic color space that describes all the colors that the humaneye can perceive. Lab is a 3-D model where its three coordi-nates represent: (i) the luminance of the color L, (ii) its positionbetween magenta and green a, and (iii) its position between

REMESEIRO et al.: METHODOLOGY FOR IMPROVING TEAR FILM LIPID LAYER CLASSIFICATION 1487

yellow and blue b. Its use is recommended by CIE in imageswith natural illumination and its colorimetric components makethis color space appropriate in texture extraction.

3) Texture Analysis: As well as color, interference patternsare one of the discriminant features of the Guillon categories.It has been demonstrated [17] that the interference phenomenacan be characterized as a texture pattern, since thick lipid layersshow clear patterns while thinner layers are more homogeneous.As several techniques for texture analysis can be applied, in thispaper five methods are tested. These are subsequently described:

1) Butterworth bandpass filters [21] are frequency domainfilters that have an approximately flat response in the band-pass frequency, which gradually decays in the stopband.The order of the filter defines the slope of the decay; thehigher the order, the faster the decay.

2) The discrete wavelet transform [22] generates a set ofwavelets by scaling and translating a mother wavelet. Thewavelet decomposition of an image consists of applyingthese wavelets horizontally and vertically, generating fourimages (LL, LH, HL, HH). The process is repeated it-eratively on the LL subimage resulting in the standardpyramidal wavelet decomposition.

3) Co-occurrence features method was introduced byHaralick et al. [23], which is based on the computation ofthe conditional joint probabilities of all pairwise combi-nations of gray levels. This method generates a set of graylevel co-occurrence matrices and extracts several statisticsfrom their elements. In general, the number of orientationsand so of matrices for a distance d is 4d.

4) Markov random fields [25] generate a texture model byexpressing the gray values of each pixel in an image as afunction of the gray values in its neighborhood.

5) Gabor filters [26] are complex exponential functions mod-ulated by Gaussian functions. The parameters of Gaborfilters define their shape, and represent their location inthe spatial and frequency domains.

B. Feature Selection

Machine learning can take advantage of FS to reduce thenumber of features so as to improve the performance of au-tomatic classifiers [18]. FS methods can be divided into threemain models: filters, wrappers and embedded methods [18]. Thefilter model relies on general characteristics of the data (correla-tion, entropy, etc.) to evaluate and select feature subsets withoutinvolving any learning algorithm or prediction model. On theother hand, wrapper models use a specific prediction method asa black box to score subsets of features as part of the selectionprocess and finally, embedded methods perform FS as part of thetraining process of the prediction model. By having some inter-action with the classifier, wrapper and embedded methods tendto give better performance results than filters, at the expense ofa higher computational cost. Also, it is well-known that wrap-pers have the risk of overfitting when having more features thansamples [27], as it is the case in this paper. Trying to overcomethis limitation, some preliminary tests have been performed inthis paper using a wrapper approach with sequential forward

search, however the performance obtained was not good. Thepoor behavior showed by wrappers in this kind of scenarios,together with the significant computational burden required bythis approach [28], prevent their use in this paper. Therefore,filters were chosen because they allow for reducing the dimen-sionality of the data without compromising the time and mem-ory requirements of machine learning algorithms. Among thebroad suite of methods present in the literature, the followingfilters were chosen based on previous studies [29], [30] and aresubsequently presented:

1) Correlation-based feature selection (CFS) [31] is a sim-ple multivariate filter algorithm that ranks feature subsetsaccording to a correlation based heuristic. The goal is toselect subsets that contain features that are highly corre-lated with the class and uncorrelated with each other.

2) Consistency-based filter [32] evaluates the worth of a sub-set of features by the level of consistency in the classvalues when the samples are projected onto the subset ofattributes. The inconsistency criterion specifies to whatextent the dimensionally reduced data can be accepted.

3) The INTERACT algorithm [33] is a subset filter based onsymmetrical uncertainty and the consistency contribution,which is an indicator about how significantly the elimina-tion of a feature will affect consistency. This method canhandle feature interaction.

C. Classification

Support vector machine (SVM) is a widely-used classifierbased on the statistical learning theory and revolves around thenotion of a “margin”, either side of a hyperplane that separatestwo classes [34]. SVM reaches a global minimum and avoidslocal minima, which may happen in other algorithms. Theyavoid problems of overfitting and, with an appropriate kernel,they can work well even if the data is not linearly separable.

D. Performance Measures

After the SVM is trained, the performance of the system isevaluated in terms of different measures of relevance to the prob-lem in question. The definitions of the performance measuresare provided as follows:

1) The accuracy is the percentage of correctly classified in-stances on a dataset with optimum illumination.

2) The robustness is the classification accuracy in a noisydataset, i.e. its accuracy when the images in the datasetshow illuminations outside the optimum range. This mea-sure is related to the generalization ability of the methodwhen handling noisy inputs. Notice that the higher therobustness the higher the generalization performance.

3) The feature extraction time is the time that the textureanalysis methods take to extract the selected features of asingle image. Notice that this does not include the trainingtime of the classifier, which is not relevant for practicalapplications because the classifier will be trained off-line.This also applies to FS, which is a pre-processing step thatis performed off-line.


TABLE ILIPID LAYER INTERFERENCE PATTERNS. FROM TOP TO BOTTOM: OPEN MESHWORK, CLOSED MESHWORK, WAVE, AND COLOR FRINGE

E. Multiple-Criteria Decision-Making

Multiple-criteria decision-making (MCDM) [35] is focusedon evaluating classifiers from different aspects and producerankings of them. A multi-criteria problem is formulated usinga set of alternatives and criteria. Among many MCDM meth-ods that have been developed up to now, technique for orderof preference by similarity to ideal solution (TOPSIS) [36] is awell-known method that will be used. TOPSIS finds the best al-gorithms by minimizing the distance to the ideal solution whilstmaximizing the distance to the anti-ideal one. The extension ofTOPSIS proposed by Olson [37] is used in this paper.

III. MATERIALS AND METHODS

The aim of this paper is to present a general methodologybased on color texture analysis, FS and MCDM. In this paper,we test this methodology on TFLL classification to improveprevious results [17]. The materials and methods used in thispaper are described in this section.

A. Datasets

Guillon defined five main grades of lipid layer interferencepatterns in increasing thickness [10]: open meshwork, closedmeshwork, wave, amorphous and color fringe. In this paper, aclinical dataset provided by optometrists will be used. It doesnot contain images within the amorphous category because itis a very uncommon pattern [13], and for this reason the lattercategory is not considered. See the representative images fromTable I as examples of the four Guillon categories and the ROIspicked from 10 different images according to Section II-A1. Theappearance and estimated thickness of each category are alsodescribed in Table I.

The acquisition of the input images was carried out with theTearscope-plus [38] attached to a Topcon SL-D4 slit lamp [39]

and a Topcon DV-3 digital video camera [40]. The slit lamp’smagnification was set at 200X and the images were stored viathe Topcon IMAGEnet i-base [41] at a spatial resolution of1024 × 768 pixels per frame in RGB. The characteristics of theillumination are provided by Topcon parameters. There are fourlevels of brightness obtained with the Topcon shutter establishedto 16, 28, 45, and 81 units. Since the tear lipid film is notstatic between blinks, a video was recorded and analyzed by anoptometrist in order to select the best images for processing.Those images were selected when the TFLL was completelyexpanded after the eye blink and are the same that specialistsanalyze by hand.

In order to analyze the texture in Lab color space, the inputimage in RGB was transformed to Lab and each componentwas analyzed separately, generating three descriptors per imagecorresponding to the L, a and b components. Next, these threedescriptors were concatenated to generate the final descriptor,so its size is 3 times the size of the descriptors. Table II shows thearrangements for applying the texture analysis methods. Notethat column No. of features contains the number of featuresgenerated by each method, in which they are always multipliedby three because of the use of Lab.

Two banks of images acquired from the same group of healthypatients with ages ranging from 19 to 33 years were available.All images in these banks have been annotated by optometristsfrom the Faculty of Optics and Optometry of the Universityof Santiago de Compostela (Spain). Although the interferencepatterns are independent of the illumination, there is an opti-mum range of illuminations used by optometrists to obtain theimages. Images with illuminations outside this range are con-sidered noisy images. In this manner, two bank datasets aremanaged in this paper. There is a first bank which contains105 images from VOPTICAL_I1 dataset [42], all of them takenover the same illumination conditions, which are consideredto be the optimum ones by practitioners. This dataset contains


TABLE IIARRANGEMENTS FOR TEXTURE ANALYSIS METHODS AND NUMBER OF FEATURES

Texture analysis Configuration (per component) No. features

Butterworthfilters

A bank of Butterworth bandpass filters composed of 9 second order filters was used, with bandpassfrequencies covering the whole frequency spectrum. The filter bank maps each input image to 9 outputimages, one per frequency band. Each output image was normalized separately and then an uniformhistogram with non-equidistant bins [16] was computed. Since 16 bin histograms were used, the featurevectors contain 16 components per frequency band.

9 × 16 × 3 = 432

Discrete wavelettransform

A Haar algorithm [22] was applied as the mother wavelet. The descriptor of an input image is constructedcalculating the mean and the absolute average deviation of the input and LL images, and the energy ofthe LH, HL and HH images. Since 2 scales were used, the feature vectors contain 12 components.

12 × 3 = 36

Co-occurrencefeatures

A set of 14 statistics proposed in [23] was computed from each co-occurrence matrix. These statisticsrepresent features such as homogeneity or contrast. The descriptor of an image consists of 2 properties,the mean and range across matrices of these statistics, thus obtaining a feature vector with 28 componentsper distance. Distances from 1 to 7 were considered.

28 × 7 × 3 = 588

Markov randomfields

In this work, the neighborhood of a pixel is defined as the set of pixels within a Chebyshev distanced. To generate the descriptor, the directional variances proposed by Cesmeli and Wang [44] were used.For a distance d, the descriptor comprises 4d features. Distances from 1 to 10 were considered.

(Σ10d=14d) × 3 = 660

Gabor filters

A bank of 16 Gabor filters centred at 4 frequencies and 4 orientations was created. The filter bank mapseach input image to 16 output images, one per frequency-orientation pair. Using the same idea as inButterworth Filters, the descriptor of each output image is its uniform histogram with non-equidistantbins. Since 7 bin histograms were used, the feature vectors contain 7 components per filter.

16 × 7 × 3 = 336

Fig. 4. (a) Image obtained with an optimum illumination, and its ROI wherea color fringe pattern is clearly observable. (b) Image obtained with a too highillumination, and its ROI where a color fringe pattern is hardly observable.

the samples that are expected to be obtained in a real casesituation and will be used to compute the performance of al-gorithms. It includes 29 open meshwork, 29 closed meshwork,25 wave and 22 color fringe images. The second bank contains406 images from VOPTICAL_Is dataset [43], taken over dif-ferent illumination conditions. This bank will be used only toevaluate the sensibility of algorithms to noisy data. It includes159 open meshwork, 117 closed meshwork, 90 wave and 40color fringe images. Fig. 4 shows an example of two imagesfrom the same subject. It can be seen that a too high illumina-tion produces an image where the interference pattern is hardlyappreciated.

B. Experimental Procedure

The experimental procedure is detailed as follows,1) Apply the five texture analysis methods (see Table II) to the

two banks of images. Moreover, the concatenation of allthe features extracted by these five methods is also consid-ered. As a result, six datasets with optimum illumination(105 images) and six datasets with different illuminations(406 images) are available.

2) Apply the three FS methods (see Section II-B) to thedatasets with optimum illumination to provide the sub-set of features that describe properly the given problem.

TABLE IIINO. OF FEATURES

Texture analysis Feature selection filterNone CFS Cons INT

Butterworth filters 432 26 6 14Discrete wavelet transform 36 10 8 7Co-occurrence features 588 27 6 21Markov random fields 660 24 13 15Gabor filters 336 29 7 18Concatenation 2052 56 6 29

3) Train a SVM (see Section II-C) with radial basis kerneland automatic parameter estimation, according to [45]. Aten-fold cross validation was used, so the average erroracross all 10 trials is computed.

4) Evaluate the effectiveness of FS in terms of the threeperformance measures (see Section II-D), by means ofthe MCDM method TOPSIS (see Section II-E).

Experimentation was performed on an Intel R©CoreTMi5 CPU760 @ 2.80 GHz with RAM 4 GB.

IV. RESULTS AND DISCUSSION

In this section, the results obtained with and without FS will becompared in terms of the three performance measures describedabove. Bear in mind that the column None in the tables shows theresults when no FS was performed, and Concatenation stands forthe concatenation of all the features of the five texture analysismethods.

The number of features selected by each of the three FS filtersis summarized in Table III. In average, CFS, Consistency-basedfilter (Cons) and INTERACT (INT) retain only the 4.9%, 1.6%and 3.2% of the features, respectively.

A. Classification Accuracy

Table IV shows the test accuracies for all pairwise textureanalysis and FS methods after applying the SVM classifier overthe VOPTICAL_I1 dataset. The best result for each is marked inbold face. As can be seen, all texture analysis techniques perform


TABLE IVMEAN TEST CLASSIFICATION ACCURACY (%), VOPTICAL_I1 DATASET


Butterworth filters 91.42 93.33 83.81 86.67Discrete wavelet transform 88.57 91.43 94.29 96.19Co-occurrence features 95.24 94.29 86.67 93.33Markov random fields 84.76 85.71 83.81 75.24Gabor filters 95.24 91.43 86.67 86.67Concatenation 97.14 96.19 87.62 93.33

TABLE VROBUSTNESS: MEAN TEST ACCURACY (%), VOPTICAL_IS DATASET



quite well providing results over 84% accuracy. The best resultis generated by the concatenation of all methods. Individually,Gabor filters and co-occurrence features without FS outperformthe other methods. Although Markov random fields use informa-tion of the pixel’s neighborhood, as the co-occurrence featuresdo, this method does not work so well because the statisticsproposed by Haralick et al. [23] provide much more textural in-formation. Regarding FS, it outperforms primal results in threeout of six methods (Butterworth filters, the discrete wavelettransform and Markov random fields), while accuracy is almostmaintained in co-occurrence features and the concatenation ofall methods when CFS is applied. As conclusions, the best re-sult is obtained by using the concatenation of all when no FSis performed (97.14%). Closely, the discrete wavelet transformwith INTERACT filter and the concatenation of all with CFSobtain an accuracy of 96.19%. Notice that although these re-sults improve slightly the previous ones [17] (95%), our goal isto reduce the processing time whilst maintaining accuracy.

As mentioned in Section II-B, results of the wrapper approachare not displayed because of its poor performance. Particularly,the highest accuracy obtained by the wrapper was 89.52%,whilst the best one for the filters was 96.19%. This degrada-tion in accuracy is caused by overfitting, demonstrating the nonadequacy of the wrapper approach for this problem.

B. Robustness to Noise

Table V shows the robustness of the six different methodsover VOPTICAL_IS dataset. Co-occurrence features and theconcatenation of all obtain remarkable better results than theremainder methods. Both methods obtain values of robustnessover 90% for some configurations. In particular, the best resultis obtained by using the concatenation of all methods whenCFS filter is used (93.84%). In relative terms, co-occurrencefeatures and the concatenation of all methods deteriorate theirmean classification accuracy by 2.66% and 4.59%, respectively(mean differences between the values contained in Tables IV

TABLE VIFEATURE EXTRACTION TIME (S)



and V). However, the remainder methods deteriorate their meanclassification accuracy by between 6.78% and 8.23%. Note alsothat the illumination levels affect the robustness in differentdegrees. The brighter the illumination, the lower the robustnessto noise. This also happens to practitioners when performingthis task by hand. For this reason, their experience to control theillumination level during the acquisition stage is cornerstone forensuring good classification performance.

C. Feature Extraction Time

TFLL classification is a real-time task so the time a methodtakes to process an image cannot be a bottleneck. After apply-ing FS and so reducing the number of input attributes, the timeneeded for analyzing a single image with any of the six methodswas also reduced as can be seen in Table VI. In general terms,Butterworth filters, the discrete wavelet transform and Gaborfilters take a negligible lapse of time to extract the features of animage (regardless of whether or not FS is applied as preprocess-ing step). Moreover, Markov random fields takes a time whichcould be acceptable for practical applications, even when no FSis applied, although it could not work in real time. Co-occurrencefeatures has been known to be slow and, despite the authorsimplemented an optimization of the method based on [24], itpresents an unacceptable extraction time. Regarding the timeof the concatenation of all methods, note that it is influencedby the time of co-occurrence features. Co-occurrence featuresand the concatenation of all methods are only acceptable forpractical applications when consistency-based or INTERACTfilters are used. Consistency-based filter selects fewer features(see Table III) and consequently the processing time when thisfilter is used is smaller. Co-occurrence features is the core be-hind the good performance of the concatenation of all methods.This is demonstrated by further experiments showing that theconcatenation of the other four methods achieves a maximumaccuracy of 93.33% and robustness of 88.91%. These results aresignificantly worse (around 4%) than the best results obtainedby the concatenation of all methods.

D. Overall Analysis

The efficiency of FS is tested in this paper. In general terms,we can assert that in a field with a very large number of features,FS filters play a significant role to reduce the cost of obtainingdata and the complexity of the classifier. Consistency-based fil-ter performed the most aggressive selection retaining only the1.6% of the features (see Table III). CFS retained three timesmore features (4.9%) than the former. Halfway, INTERACT


TABLE VIITOPSIS VALUES OBTAINED FOR EVERY METHOD WHEN w = [1/3, 1/3, 1/3]



selected in average 3.2% of features. Moreover, in most casesthe test accuracy is improved or maintained with a remark-able reduction in the number of features when FS is used (seeTable IV). The effectiveness of FS on TFLL classification wasthen demonstrated.

Evaluating the performance of the methods for texture analy-sis presented in this paper is a multi-objective problem definedin terms of accuracy, robustness, and feature extraction time.Butterworth filters, the discrete wavelet transform and Gabor fil-ters obtain competitive classification accuracies in short spansof time (see Tables IV and VI). However, these methods arevery sensitive to noisy data (see Table V) which make theminappropriate for practical applications. On the other hand, co-occurrence features presents competitive results in classificationaccuracy and generalization (see Tables IV and VI). However,the time the method takes to extract its features is an impedi-ment (see Table VI). The concatenation of all methods improvesthe previous results but at the expense of an even longer featureextraction time.

Table VII shows the TOPSIS values obtained for everymethod when the weights of each criteria are set equally. Notethat the larger the value the better the method. The top threemethods are marked in bold. As can be seen, those methodswith the best balance among classification accuracy, robustnessto noise and feature extraction time are ranked in higher posi-tions. In particular, Gabor filters with no FS, the discrete wavelettransform with INTERACT filter, and the concatenation of allmethods with INTERACT filter rank the top three positions ofthe ranking. However, those methods with good performance inaccuracy and robustness but very long feature extraction timeare penalized. e.g. co-occurrence features or the concatenationof all methods, with no feature selection in both cases.

A more detailed look at the results contained in Tables IV, Vand VI reveals that the combination of all methods in both CFSfiltering configuration and without FS obtain the best results interms of accuracy and robustness. These two configurations arein the Pareto front [46] of accuracy versus robustness (see Fig. 5).In multi-objective optimization, the Pareto front is defined asthe border between the region of feasible points (not strictlydominated by any other), for which all constraints are satisfied,and the region of unfeasible points (dominated by others). Inthis case, solutions are constrained to maximize accuracy androbustness.

The suitability of these two solutions to the problem inquestion is also corroborated by TOPSIS. Table VIII showsthe TOPSIS values when only accuracy and robustness areconsidered (note that the third term in the weight vector is

65 70 75 80 85 90 95 10065

70

75

80

85

90

95

100

Accuracy (%)

Rob

ustn

ess

to n

oise

(%)

Concatenation FS: CFS

Concatenation

FS: N

one

Fig. 5. Pareto front of a multi-objective optimization problem based on accu-racy and robustness to noise.

TABLE VIIITOPSIS VALUES OBTAINED FOR EVERY METHOD WHEN w = [1/2, 1/2, 0]



considered to be the feature extraction time). The concatenationof all methods without FS and with CFS filtering rank first andsecond, respectively.

However, the time for extracting the features must be short-ened for practical applications. With this aim a case of study, inwhich a deeper analysis for FS is carried out, is presented in thenext section. Notice that the number of features in the concate-nation of all methods without FS is too large (2052 features)to be optimized by hand. For this reason, we will focus on theconcatenation on all methods with CFS (56 features).

E. The Concatenation of All Methods With CFS:A Case of Study

When using FS, features are selected according to some spe-cific criteria depending on the method. Filters remove featuresbased on redundancy and relevance, but they do not take intoaccount costs for obtaining them. Note that the cost for obtain-ing a feature depends on the procedures required to extract it.Therefore, each feature has an associated cost that can be re-lated to financial cost, physical risk or computation demands.This is the case of co-occurrence features and, consequently, theconcatenation of all methods. In co-occurrence features the costfor obtaining the 588 features is not homogeneous. Features arevectorized in groups of 28 related to distances and channels inthe color space. Each group of 28 features corresponds withthe mean and range of 14 statistics across the co-occurrencematrices (see Table II).


TABLE IXCO-OCCURRENCE FEATURES SELECTED BY CFS OVER THE CONCATENATION

OF ALL METHODS, IN WHICH FEATURES CORRESPONDING WITH 14TH

STATISTIC ARE MARKED IN BOLD

Distance Component in the color spaceL a b

1 – 29, 50 662 98 121, 133 –3 193 – 2304 267, 268, 275, 276, 277 – 3215 350, 359 – –6 434, 443, 446 – 492, 5027 518 546 576

If we focus on co-occurrence features when using CFS, thenumber of features were reduced by 95.41% (from 588 to 27)but the processing time was not reduced in the same proportion,and it is now 27.01 instead of 102.18 seconds (a reduction of73.57%). This fact clearly shows that computing some of the588 features takes longer than others. Some experimentationwas performed on the time the method takes to compute eachof the 14 statistics. Results disclosed that computing the 14thstatistic, which corresponds with the maximal correlation coef-ficient [23], takes around 96% of the total time. So the time forobtaining a single matrix is negligible compared to the time forcomputing the 14th statistic. Therefore, the key for reducing thefeature extraction time is to reduce the number of 14th statisticsin the selection.

In the case of the concatenation of all methods with CFS, thefilter selects 56 features (see Table III) distributed as follows:17 features of Butterworth filters, one of the discrete wavelettransform, 24 of co-occurrence features, one of Markov randomfields, and 13 of Gabor filters. Five of the features selected inco-occurrence features correspond with the 14th statistic (seeTable IX). In co-occurrence features, the cost of obtaining thestatistics also depends on the distance and component in thecolor space. On the one hand, the longer the distance the largerthe number of matrices to compute (and so, the higher the pro-cessing time). On the other hand, the differences of color havelittle contrast so the colorimetric components of the Lab colorspace are minimal. As a consequence, the matrices within com-ponents a and b have smaller dimension than the matrices withincomponent L. As expected, the smaller the dimension the shorterthe time to compute a statistic.

Computing the five 14th statistics in the different distancesand components take: 3.12 s (feature 98), 8.23 s (feature350), 9.61 s (feature 434), 11.49 s (feature 518), and 4.81 s(feature 546). As can be seen, avoiding computing some ofthem will entail saving a significant amount of time. The aimhere is to explore the impact of removing some of the five 14thstatistics selected by CFS in terms of accuracy, robustness andtime. There are five features within the 14th statistic so only25 = 32 different configurations need to be explored. An em-pirical evaluation of brute force is acceptable. Table X shows theperformance of the different configurations in terms of accuracy,robustness and time. Each configuration corresponds with thosefeatures selected by CFS removing some 14th statistics. Forpurposes of simplicity, only the acceptable results are shown. It

TABLE XPERFORMANCE MEASURES FOR THE CONCATENATION OF ALL METHODS WITH

CFS WHEN SOME OF THE FIVE 14TH STATISTICS ARE NOT SELECTED

Features removed Acc (%) Rob (%) Time (s){}, baseline performance 96.19 93.84 37.04{98, 434} 97.14 94.09 24.31{98, 434, 546} 97.14 93.84 19.83{98, 350, 518, 546} 97.14 93.60 9.72{98, 434, 518, 546} 97.14 92.86 8.34{98, 350, 434, 518, 546} 97.14 92.61 0.11The best results are marked in bold.

is assumed that one solution is unacceptable if it obtains a loweraccuracy and robustness in a longer span of time than other.

In terms of accuracy and robustness to noisy data, the best re-sult is obtained when removing the features {98, 434} (results of97.14% and 94.09%, respectively), but at the expense of a quitelong lapse of time (24.31). Note that this result even improvesthe baseline performance. In the remainder results, the classifi-cation accuracy is maintained whilst the feature extraction timeis reduced, only at the expense of a slightly deterioration interms of robustness to noise (less than 2%).

It is also important to remark the effectiveness of CFS filterfor selecting the most appropriate features. If we do not applyfeature selection and we simply remove the 14th statistics fromthe 588 features corresponding with co-occurrence features inthe concatenation of all methods, the accuracy and the robust-ness are 92.86% for both of them. That is, the accuracy is worsethan the results shown in Table X and the robustness is not sig-nificantly different. As expected, the time is also longer: 14.74seconds.

V. CONCLUSION

The time required by existing approaches which deal withTFLL classification prevent their clinical use because they couldnot work in real time. In this paper, a methodology for improv-ing this classification problem was proposed. This methodologyincludes the application of FS and MCDM methods which, upto the authors’ knowledge, has not been explored yet in thisfield. Three of the most popular FS methods were used: CFS,Consistency-based, and INTERACT. They were tested on fivepopular texture analysis methods: Butterworth filters, the dis-crete wavelet transform, co-occurrence features, Markov ran-dom fields, and Gabor filters; and the concatenation of all thesemethods. In order to decide the best combination of techniques,the MCDM method TOPSIS was applied. Results obtained withthis methodology surpass previous results in terms of processingtime whilst maintaining accuracy and robustness to noise.

In clinical terms, the manual process done by experts canbe automated with the benefits of being faster and unaffectedby subjective factors, with maximum accuracy over 97% andprocessing time under 1 second. The clinical significance ofthese results should be highlighted, as the agreement betweensubjective observers is between 91%–100% according to [13].

In this paper, the proposed methodology was later optimizedad hoc regarding the different costs to compute the features.This process was suitable because of the low number of fea-tures but it would become unfeasible as the number of features


increases. Therefore, as future research, we plan to develop aFS method which automatically takes into account costs of fea-tures. Regarding image processing, we would like to increaseour database by including the amorphous category. Finally, asthis is a general methodology, we plan to use it to solve someother problems.

ACKNOWLEDGMENT

We would like to thank the Facultad de Optica y Optometrıaof the Universidade de Santiago de Compostela for providingus with the annotated datasets.

REFERENCES

[1] S. Pflugfelder, S. Tseng, O. Sanabria, H. Kell, C. Garcia, C. Felix,W. Feuer, and B. Reis, “Evaluation of subjective assessments and ob-jective diagnostic tests for diagnosing tear-film disorders known to causeocular irritation,” Cornea, vol. 17, no. 1, pp. 38–56, 1998.

[2] G. Rieger, “The importance of the precorneal tear-film for the quality ofoptical imaging,” Br. J. Ophthalmol., vol. 76, no. 3, pp. 157–158, 1992.

[3] K. Nichols, J. Nichols, and G. Mitchell, “The lack of association betweensigns and symptoms in patients with dry eye disease,” Cornea, vol. 23,no. 8, pp. 762–70, 2004.

[4] A. Bron, J. Tiffany, S. Gouveia, N. Yokoi, and L. Voon, “Functionalaspects of the tear film lipid layer,” Experiment. Eye Res., vol. 78, no. 3,pp. 347–360, 2004.

[5] M. Rolando, M. Iester, A. Macrı, and G. Calabria, “Low spatial-contrastsensitivity in dry eyes,” Cornea, vol. 17, no. 4, pp. 376–369, 1998.

[6] M. Rolando, M. Refojo, and K. Kenyon, “Increased tear evaporation ineyes with keratoconjunctivitis sicca,” Arch. Ophthalmol., vol. 101, no. 4,pp. 557–558, 1983.

[7] M. Lemp, “Report of the national eye institute/industry workshop onclinical trials in dry eyes,” CLAO J., vol. 21, no. 4, pp. 221–232, 1995.

[8] M. Lemp, C. Baudouin, J. Baum, M. Dogru, G. Foulks, S. Kinoshita,P. Laibson, J. McCulley, J. Murube, S. Pfugfelder, M. Rolando, and I. Toda,“The definition and classification of dry eye disease: Report of the defini-tion and classification subcommittee of the international dry eye workshop(2007),” Ocular Surf., vol. 5, no. 2, pp. 75–92, 2007.

[9] G. Foulks, “The correlation between the tear film lipid layer and dry eyedisease,” in Surv. Ophthalmol., 2007, pp. 52:369–74.

[10] J. Guillon, “Non-invasive tearscope plus routine for contact lens fitting,”Cont. Lens Anterior Eye, vol. 21 Suppl. 1, pp. 31–40, 1998.

[11] M. Rolando, C. Valente, and S. Barabino, “New test to quantify lipid layerbehavior in healthy subjects and patients with keratoconjunctivitis sicca,”Cornea, vol. 27, no. 8, pp. 866–870, 2008.

[12] J. Nichols, K. Nichols, B. Puent, M. Saracino, and G. Mitchell, “Evaluationof tear film interference patterns and measures of tear break-up time,”Optom. Vis. Sci., vol. 79, no. 6, pp. 363–369, 2002.

[13] C. Garcıa-Resua, M. Giraldez-Fernandez, M. Penedo, D. Calvo, M. Penas,and E. Yebra-Pimentel, “New software application for clarifying tear filmlipid layer patterns,” Cornea, vol. 32, no. 4, pp. 538–546, 2013.

[14] P. King-Smith, B. Fink, and N. Fogt, “Three interferometric methods formeasuring the thickness of layers of the tear film,” in Optom. Vis. Sci.,1999, pp. 76:19–32.

[15] E. Goto, Y. Yagi, M. Kaido, Y. Matsumoto, K. Konomi, and K. Tsubota,“Improved functional visual acuity after punctal occlusion in dry eyepatients,” Am. J. Ophthalmol., vol. 135, no. 5, pp. 704–705, 2003.

[16] L. Ramos, M. Penas, B. Remeseiro, A. Mosquera, N. Barreira, andE. Yebra-Pimentel, “Texture and color analysis for the automatic clas-sification of the eye lipid layer,” in Proc. LNCS: Adv. Computat. Intell.(Int. Work Conf. Artif. Neural Netw.), vol. 6692, pp. 66–73.

[17] B. Remeseiro, L. Ramos, M. Penas, E. Martınez, M. Penedo, andA. Mosquera, “Color texture analysis for classifying the tear film lipidlayer: A comparative study,” in Proc. Int. Conf. Digital Image Comput.:Techn. Appl., Noosa, Australia, Dec. 2011, pp. 268–273.

[18] I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Feature Extraction:Foundations and Applications. New York, NY, USA: Springer-Verlag,2006.

[19] D. Calvo, A. Mosquera, M. Penas, C. Garcıa-Resua, and B. Remeseiro,“Color texture analysis for tear film classification: A preliminary study,”in Proc. Lecture Notes Comput. Sci.: Int. Conf. Image Anal. Recognit.,2010, vol. 6112, pp. 388–397.

[20] K. McLaren, “The development of the CIE 1976 (L*a*b) uniform color-space and color-difference formula,” J. Soc. Dyers Colorists, vol. 92, no. 9,pp. 338–341, 1976.

[21] R. Gonzalez and R. Woods, Digital Image Processing. EnglewoodCliffs, NJ, USA: Pearson/Prentice-Hall, 2008.

[22] S. G. Mallat, “A theory for multiresolution signal decomposition: thewavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11,no. 7, pp. 674–693, Jul. 1989.

[23] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Texture features forimage classification,” IEEE Trans. Syst., Man, Cybern. Syst., Man Cybern.,vol. 3, no. 6, pp. 610–621, Nov. 1973.

[24] D. A. Clausi and M. E. Jernigan, “A fast method to determine co-occurrence texture features,” IEEE Trans. Geosci. Remote Sens., vol. 36,no. 1, pp. 298–300, Jan. 1998.

[25] J. Woods, “Two-dimensional discrete markovian fields,” IEEE Trans. Inf.Theory, vol. 18, no. 2, pp. 232–240, Mar. 1972.

[26] D. Gabor, “Theory of communication,” J. Inst. Electr. Eng., vol. 93,pp. 429–457, 1946.

[27] J. Loughrey and P. Cunningham, “Overfitting in wrapper-based featuresubset selection: The harder you try the worse it gets,” Res. DevelopmentIntell. Syst. XXI, pp. 33–43, 2005.

[28] M. Dong and R. Kothari, “Feature subset selection using a new definitionof classifiability,” Pattern Recognit. Lett., vol. 24, no. 9, pp. 1215–1225,2003.

[29] V. Bolon-Canedo, N. Sanchez-Marono, and A. Alonso-Betanzos, “On thebehavior of feature selection methods dealing with noise and relevanceover synthetic scenarios,” in Proc. IEEE Neural Netw. Int. Joint Conf.,2011, pp. 1530–1537.

[30] V. Bolon-Canedo, N. Sanchez-Marono, and A. Alonso-Betanzos, “A re-view of feature selection methods on synthetic data,” Knowl. Inf. Syst.,vol. 34, no. 3, pp. 483–519, 2013.

[31] M. Hall, “Correlation-based feature selection for machine learning,” Ph.D.dissertation, Dept. Comput. Sci., Univ. Waikato, Hamilton, New Zealand,1999.

[32] M. Dash and H. Liu, “Consistency-based search in feature selection,”Artif. Intell., vol. 151, no. 1–2, pp. 155–176, 2003.

[33] Z. Zhao and H. Liu, “Searching for interacting features,” in Proceedings ofthe 20th International Joint Conference on Artificial Intelligence. SanMateo, CA, USA: Morgan Kaufmann Publishers Inc., 2007, pp. 1156–1161.

[34] C. J. C. Burges, “A tutorial on support vector machines for pattern recog-nition,” Data Mining Knowl. Discov., vol. 2, pp. 121–167, 1998.

[35] M. Zeleny, Multiple Criteria Decision Making. vol. 25, New York, NY,USA: McGraw-Hill, 1982.

[36] C. Hwang and K. Yoon, Multiple Attribute Decision Making: Methods andApplications: A State-of-the-Art Survey. vol. 13, New York, NY, USA:Springer-Verlag, 1981.

[37] D. Olson, “Comparison of weights in TOPSIS models,” Math. Comput.Modell., vol. 40, no. 7–8, pp. 721–727, 2004.

[38] Tearscope Plus Clinical Hand Book and Tearscope Plus Instructions.Broomall, PA, USA: Keeler Ltd. Windsor, Berkshire, Keeler Inc, 1997.

[39] “Topcon SL-D4 slit lamp,” Topcon Medical Systems, Oakland, NJ, USA.[40] “Topcon DV-3 digital video camera,” Topcon Medical Systems, Oakland,

NJ, USA.[41] “Topcon IMAGEnet i-base,” Topcon Medical Systems, Oakland, NJ,

USA.[42] “VOPTICAL_I1. VARPA optical dataset annotated by optometrists

from the Faculty of Optics and Optometry, University of Santi-ago de Compostela (Spain),” [Online]. Available: http://www.varpa.es/voptical_I1.html, last access: december 2013.

[43] “VOPTICAL_Is. VARPA optical dataset annotated by optometristsfrom the Faculty of Optics and Optometry, University of Santi-ago de Compostela (Spain),” [Online]. Available: http://www.varpa.es/voptical_Is.html, last access: december 2013.

[44] E. Cesmeli and D. Wang, “Texture segmentation using Gaussian-Markovrandom fields and neural oscillator networks,” IEEE Trans. Neural Netw.,vol. 12, Mar. 2001.

[45] B. Remeseiro, M. Penas, A. Mosquera, J. Novo, M. Penedo, and E. Yebra-Pimentel, “Statistical comparison of classifiers applied to the interferentialtear film lipid layer automatic classification,” Computat. Math. MethodsMed., vol. 2012, 2012.

[46] J. Teich, “Pareto-front exploration with uncertain objectives,” in Evo-lut. Multi-Criter. Optim. New York, NY, USA: Springer-Verlag, 2001,pp. 314–328.

Authors’ photographs and biographies not available at the time of publication.

a methodology for improving tear film lipid layer classification

Documents