
An Interpretable Graph-based Image Classifier

Filippo M. Bianchi, Simone Scardapane, Lorenzo Livi, Aurelio Uncini, and Antonello Rizzi

Abstract— The generalization capability is usually recognized as the most desired feature of data-driven learning systems, such as classifiers. However, in many practical applications, obtaining human-understandable information relevant to the problem at hand from the classification model can be equally important. In this paper we propose a classification system able to fulfill these two requirements simultaneously for a generic image classification task. As a first preprocessing step, an input image to the classifier is represented by a labeled graph, relying on a segmentation algorithm. The graph is conceived to represent visual and topological information of the relevant segments of the image. The graph is then classified by a suited inductive inference engine. In the learning procedure, all the training set images are represented by graphs, feeding a state-of-the-art classification system working on structured domains. The synthesis procedure consists in extracting characterizing subgraphs from the training set, which are used to embed the graphs into a vector space, thus enabling the applicability of well-known classifiers for feature-based patterns. Such characterizing subgraphs, which are derived in an unsupervised fashion, are interpretable by suitable field experts, allowing a semantic analysis of the discovered classification rules for the given problem at hand. The system is optimized with a genetic algorithm, which tunes the system parameters according to a cross-validation scheme. We show the validity of the approach by performing experiments on some image classification problems derived from an on-line repository.

I. INTRODUCTION

DESPITE the recent advances in computational processing power and computer vision algorithms, automatic classification of images in a non-controlled setting remains a challenging task [1]. The challenge is further magnified if we require our classification system to provide some semantic information about the underlying data generation process, i.e., about the specific components of the images that play a key role in discriminating the classes. This information may pertain only to the visual appearance of a given object (e.g., its shape); in more complex scene analysis, however, it can also relate to the mutual relations intervening between the objects. Consider as an example the two images in Fig. 1. Any human is able to see immediately that (i) a small triangular shape is repeated three times in both images, and (ii) it is the peculiar disposition of these three objects that discriminates image A from image B.

If the interest is focused only on semantic information related to single objects, several approaches already exist in the literature. As an example, deep neural networks [2] have recently attracted vast interest due to, among other factors, their capability of working directly on the pixel representation of an image, while extracting high-level features in an unsupervised fashion by virtue of a hierarchical design. These features, however, are not always easily interpretable, especially when obtained from very large neural network models. An alternative approach is to define an a-priori set of high-level features to be searched/recognized in the input image, performing the classification by maximizing the sparsity of the employed features (see [3] and references therein). To achieve generality, here the burden is on designing a sufficiently large number of feature extractors able to cover a wide class of possible classification problems.

Authors are with the Department of Information Engineering, Electronics and Telecommunications (DIET), “Sapienza” University of Rome, Via Eudossiana 18, 00184, Rome (email: {filippomaria.bianchi, simone.scardapane, aurelio.uncini, livi, antonello.rizzi}@uniroma1.it).

Fig. 1: Example of a complex two-class discrimination where the key information is in the mutual relations among the constituting objects. (a) Class A; (b) Class B.

A similar set of alternatives is instead unavailable if we are also interested in more complex patterns arising from object relations [4]. Such a problem can be faced by mapping the image into a structured domain, e.g., labeled graphs, which provide an effective model for computer-based analysis, preserving topological and metric information on the relevant components of the image [5], [6]. Along this line, in [7] image classification is performed by first segmenting the input image, and subsequently constructing a graph where each vertex embodies the information of a segment and each edge stores the information of the spatial relation between two segments. In this paper, we elaborate on the work in [7] by exploiting the capabilities of the GRanular computing Approach for Labeled Graphs (GRALG) graph classifier [8]. GRALG is a general-purpose graph classifier that is conceived to allow the interpretation of the input data. GRALG first searches for recurrent subgraphs of the training set, building the so-called symbols alphabet. Successively, it generates a suitable embedding of the input graphs, i.e., it associates to each graph a numeric vector, called symbolic histogram, which encodes the original image expressed in terms of those symbols [9]. Symbolic histogram representations are hence exploitable by any feature-based classifier (e.g., a conventional neural network). A genetic algorithm optimizes the overall system, with the objective of finding the optimal parameters while minimizing the number of symbols used for the embedding.

2014 International Joint Conference on Neural Networks (IJCNN), July 6-11, 2014, Beijing, China.


Fig. 2: Scheme of the classification process: Raw Image → Image Segmentation → Graph Builder → Graph Classifier → Class Label + Semantic Info.

The symbols alphabet is a collection of the subgraphs (in our specific case, a collection of image segments together with their mutual relations) that are recurrent and that are discriminant for the classes. As a consequence, such a collection allows performing the interpretation of the input images from the class discrimination viewpoint. With respect to earlier works pursuing the same research direction (e.g., [4], [10]), the following innovations are proposed in this paper. First, the use of a more flexible representation of the image components, able to adapt in an automatic fashion to the specific task at hand. Secondly, the capability of providing high-level semantic information on the images' characterization within a classification setting. Finally, a high level of automation of the learning procedure, allowing the possibility of dealing with a wide range of different problems, without any need to tweak the system's parameters by hand.

The remainder of the paper is organized as follows. In Section II we present the details of the proposed image classification system. Experiments carried out on some image classification problems extracted from an on-line repository are discussed in Section III. Finally, we provide some conclusive remarks in Section IV.

II. SYSTEM ARCHITECTURE

The proposed image classification system is composed of three macro blocks. The first one is the preprocessing block, which is responsible for identifying the segments contained in the input raw image by applying a proper segmentation algorithm. The second block constructs a labeled graph from the extracted segments, whose vertices and edges represent the segments and their mutual spatial relations. Finally, the last block is the GRALG graph classifier. The whole image classification process is depicted in Fig. 2.

The system depends on several parameters, in particular those used for segmenting the image (H) and those used for training the classifier (P), including the parameters of the graph matching procedure adopted to define the core inductive inference engine. Letting the user set such parameters is impractical and unrealistic, since it requires a deep knowledge of the specific classification problem (especially from the segmentation viewpoint). Additionally, not all parameters are easily interpretable by a non-expert user. For this reason, in our system the parameters are automatically tuned during the training phase using a standard genetic algorithm that exploits the classification error on a validation set as the main objective guiding the optimization. To this end, we split the original dataset in three disjoint parts: the Training Set Str, the Validation Set Svs, and a Test Set Sts. The i-th individual is represented by a genetic code C_i, given by instances of both the H and P parameters, H_i and P_i. For a given individual, image segmentation is performed on both the Str and Svs data, with the parameter setting given by H_i. Successively, each image is encoded into a graph and the symbols alphabet A_i is derived from the training set graphs only, considering the setting P_i. Then, the symbolic histograms of the Str images are used to synthesize a classification model M_i operating on R^n. Finally, the error rate ERR(C_i) of M_i on the symbolic histogram representation of Svs is computed.

Fig. 3: Training and testing of the image classification system: the genetic algorithm drives segmentation, training, and classification using Str and Svs; the final model is assessed on Sts.

To promote an effective and parsimonious model, the fitness is defined as:

f(C_i) = λ·ERR(C_i) + (1 − λ)·‖C_i‖_0,   (1)

where ‖C_i‖_0 is the number of non-zero elements in the genetic code (i.e., the number of considered parameters), and λ ∈ [0, 1] balances the two terms. The genetic algorithm terminates when a threshold performance is reached or a maximum number of evolutions has been performed. Finally, we compute the classification accuracy on Sts of the best individual (classification model) found by the genetic optimization according to (1). For both the training and test phases, we adopted the K-Nearest Neighbors classifier with k = 3. A schematic representation of the training procedure is given in Fig. 3.
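To make the optimization loop concrete, the following Python sketch shows how Eq. (1) scores an individual, with the validation error supplied by a k-NN classifier with k = 3 as in the paper. The helper names are hypothetical and scikit-learn is used purely for illustration; the actual system is written in C++ on top of the SPARE library.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def validation_error(H_tr, y_tr, H_vs, y_vs, k=3):
        # ERR(C_i): error rate of the k-NN model (k = 3 in the paper) trained
        # on the symbolic histograms of Str and evaluated on those of Svs.
        knn = KNeighborsClassifier(n_neighbors=k).fit(H_tr, y_tr)
        return 1.0 - knn.score(H_vs, y_vs)

    def fitness(err, genetic_code, lam=0.9):
        # Eq. (1): f(C_i) = lam * ERR(C_i) + (1 - lam) * ||C_i||_0.
        # The L0 term counts the parameters switched on in the genetic code,
        # biasing the search toward parsimonious models (lam = 0.9, Sec. III).
        return lam * err + (1.0 - lam) * np.count_nonzero(genetic_code)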

In the following, we describe (i) the procedure we used for segmenting the input images, (ii) the related graph representation technique, and finally (iii) the GRALG classifier.

A. Image Segmentation

Image segmentation is the process of partitioning an image into multiple collections/regions of pixels (segments), such that in each region the pixels share some similarities [11]. Our aim is to assign a label to every pixel of the image such that pixels with the same label share certain visual characteristics. All pixels in a region should be similar with respect to some characteristic/property (color, intensity, texture), while pixels belonging to distinct segments should be significantly different. The goal of the segmentation is to turn the considered image into a new, simplified image that is more interpretable and easier to analyze. This consists in partitioning the image into a collection of segments covering the entire image. The algorithm we used is a Watershed-based segmentation procedure [11], [7], which considers the gradient magnitude of an image as a topographic surface; however, other state-of-the-art segmentation procedures [12, Chapter 4] could work equally well in our system. Pixels having the highest gradient magnitude intensities correspond to watershed lines, which represent the region boundaries. “Water” placed on any pixel enclosed by a common watershed line flows downhill to a common local intensity minimum. Pixels draining to a common minimum form a “basin”, which represents a segment. In the following, we discuss the segmentation procedure more extensively.
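As a rough illustration of this principle only (not the paper's exact pipeline, which builds its basins from the LWDM-based stability map described below), a generic gradient-magnitude watershed can be sketched with scikit-image; the marker-selection rule here is an arbitrary illustrative choice:

    import numpy as np
    from scipy import ndimage as ndi
    from skimage import color, filters, segmentation

    def watershed_segments(rgb_image):
        # Treat the gradient magnitude as a topographic surface [11]:
        # each catchment basin becomes one segment.
        gray = color.rgb2gray(rgb_image)
        gradient = filters.sobel(gray)
        # Markers: connected low-gradient areas act as basin bottoms
        # (the 0.5 factor is an arbitrary illustrative threshold).
        markers, _ = ndi.label(gradient < 0.5 * gradient.mean())
        return segmentation.watershed(gradient, markers)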

At first, for each pixel p of the image we extract a signature s(p), composed of 6 fields:
• CrCb: chrominance values of the pixel (2 values);
• BRG: value of the brightness of the pixel;
• WTX: a vector containing 3 wavelet coefficients computed on the pixel's neighborhood with the Daubechies transformation [13].
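A possible Python sketch of the signature extraction follows. The exact neighborhood size and the reduction of the Daubechies sub-bands to 3 coefficients are specified in [7], [14], not here, so the choices below (a 5×5 window, one averaged coefficient per detail sub-band, pywt for the transform, division by 255 as a coarse normalization) are assumptions:

    import numpy as np
    import pywt
    from skimage import color

    def pixel_signature(rgb_image, i, j, half=2):
        # s(p) = [BRG, Cr, Cb, WTX_0..2]: brightness, 2 chrominance values,
        # and 3 wavelet coefficients from the pixel's neighborhood.
        ycbcr = color.rgb2ycbcr(rgb_image)
        brg = ycbcr[i, j, 0] / 255.0                # brightness
        crcb = ycbcr[i, j, [2, 1]] / 255.0          # chrominance (Cr, Cb)
        patch = ycbcr[max(i - half, 0):i + half + 1,
                      max(j - half, 0):j + half + 1, 0]
        _, (cH, cV, cD) = pywt.dwt2(patch, 'db2')   # Daubechies transform [13]
        wtx = np.array([cH.mean(), cV.mean(), cD.mean()])
        return np.concatenate(([brg], crcb, wtx))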

We refer to [7], [14] for a more complete description. The dissimilarity between two signatures s(p_i), s(p_j), associated respectively with the pixels p_i, p_j, is computed using the LWDM dissimilarity measure:

LWDM(s(p_i), s(p_j)) = η_0·(CrCb_i − CrCb_j) + η_1·(BRG_i − BRG_j) + η_2·(WTX_i − WTX_j),   (2)

where the 3 coefficients are computed as η_0 = (1 − ω_Brg)·(1 − ω_Wbin), η_1 = ω_Brg·(1 − ω_Wbin), and η_2 = ω_Wbin, being ω_Brg the coefficient that weights brightness over chrominance and ω_Wbin the coefficient that weights wavelet components over the remainder.
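A direct transcription of Eq. (2) is sketched below. Since CrCb and WTX are vectors, the per-field differences must be reduced to scalars; Euclidean norms are assumed here, as the paper leaves the reduction implicit:

    import numpy as np

    def lwdm(s_i, s_j, w_brg, w_wbin):
        # Signatures laid out as [BRG, Cr, Cb, WTX_0..2] (see pixel_signature).
        eta0 = (1 - w_brg) * (1 - w_wbin)   # chrominance weight
        eta1 = w_brg * (1 - w_wbin)         # brightness weight
        eta2 = w_wbin                       # wavelet weight
        d = np.asarray(s_i) - np.asarray(s_j)
        return (eta0 * np.linalg.norm(d[1:3])     # CrCb term
                + eta1 * abs(d[0])                # BRG term
                + eta2 * np.linalg.norm(d[3:6]))  # WTX term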

Subsequently, for each pixel p_i the sum S_i of the dissimilarities of its signature from those of its 8 closest neighbors is computed using (2):

S_i = (1/8) ∑_{j∈n(p_i)} LWDM(s(p_i), s(p_j)),   (3)

where n(p_i) denotes the set of neighbors of p_i. If S_i ≤ τ_bin, where τ_bin is the stability threshold, the pixel p_i is marked as stable; otherwise, it belongs to a transition region between two stable regions and is marked as unstable. At the end of the procedure we obtain a mapping, illustrated in Fig. 4(b), where white and black pixels represent, respectively, the stable and the unstable regions.
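Reusing the lwdm sketch above, the stability map of Fig. 4(b) can be computed as in the following illustrative routine, where signatures is an (H, W, 6) array of per-pixel signatures (border pixels are left unstable for brevity):

    import numpy as np

    def stability_map(signatures, w_brg, w_wbin, tau_bin):
        # Eq. (3): S_i is the average LWDM dissimilarity of each pixel from
        # its 8 neighbors; pixels with S_i <= tau_bin are marked stable.
        H, W, _ = signatures.shape
        stable = np.zeros((H, W), dtype=bool)
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                s_sum = sum(lwdm(signatures[i, j],
                                 signatures[i + di, j + dj],
                                 w_brg, w_wbin)
                            for di, dj in offsets)
                stable[i, j] = (s_sum / 8.0) <= tau_bin
        return stable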

In the next step, each connected region of stable pixels is grouped into a single cluster. At the end of this phase, we obtain an initial segmentation of the image, as depicted in Fig. 4(c), where to each stable pixel we associate the label of the corresponding region. Then we filter out the regions which are too small: if the relative area of a region is less than a given threshold τ_reg, the region is discarded. For all the remaining regions R_x, we compute the centroid μ_x as the average of all pixel signatures. Then, we proceed with the absorption phase: each unstable pixel p_i is associated to the closest region R_x, i.e., the region whose centroid μ_x is most similar to p_i according to the following dissimilarity measure:

d(p_i, μ_x) = α·LWDM(s(p_i), s(μ_x)) + (1 − α)·d_c(c(p_i), c(μ_x)),   (4)

where c(p_i) represents the coordinates of the pixel within the image, the dissimilarity d_c(·, ·) is the Euclidean distance, and the parameter α ≥ 0 weights coordinate over signature dissimilarity. The result is shown in Fig. 4(d). Finally, the background can optionally be removed, if it is considered not interesting for a given problem (Fig. 4(e)). In order to fix potential “hyper-segmentation” phenomena, we perform a clustering on the centroids of the regions using (4) as the dissimilarity measure, in order to merge similar regions that should represent a unique segment. We use a fast sequential clustering algorithm that depends on a resolution parameter τ_fus, namely the so-called BSAS algorithm [15, Chap. 12].

Fig. 4: From the original image (a), stable and transition pixels are identified (b). Connected regions of adjacent stable pixels are grouped into single regions (c). Unstable pixels are absorbed into the most similar stable region (d), and finally background removal is applied (e).
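The two ingredients of this step can be sketched as follows: the dissimilarity of Eq. (4), and a minimal BSAS pass over the region centroids. Classic BSAS also updates each cluster representative as members join; that update is omitted here for brevity:

    import numpy as np

    def absorb_distance(sig_p, coord_p, sig_mu, coord_mu,
                        alpha, w_brg, w_wbin):
        # Eq. (4): alpha trades signature dissimilarity (LWDM) against the
        # Euclidean distance d_c between image coordinates.
        return (alpha * lwdm(sig_p, sig_mu, w_brg, w_wbin)
                + (1 - alpha) * np.linalg.norm(np.asarray(coord_p)
                                               - np.asarray(coord_mu)))

    def bsas(items, dist, tau_fus):
        # BSAS [15, Chap. 12]: a single sequential pass. Each item joins the
        # closest existing cluster if the distance to its representative is
        # within tau_fus; otherwise it seeds a new cluster.
        reps, labels = [], []
        for x in items:
            d = [dist(x, r) for r in reps]
            if d and min(d) <= tau_fus:
                labels.append(int(np.argmin(d)))
            else:
                reps.append(x)
                labels.append(len(reps) - 1)
        return labels, reps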

At the end of this procedure, we obtain the segmentation of the input image, that is, a mapping that associates each pixel with the label of its corresponding region. The complete list of parameters for the segmentation procedure is H = {ω_Brg, ω_Wbin, τ_bin, τ_reg, α, τ_fus}.

B. Graph Construction

Once we have obtained the segmentation of the input image, we derive the corresponding labeled graph representation. We represent each segment as a vertex of the graph and the spatial relations among all segments with the edges of the graph (see Fig. 5); hence, we obtain complete graphs. In order to describe each region with the highest accuracy, we defined the label of the vertices as a structured vector that contains several fields, useful for solving a wide range of different problems. Tab. I lists the features contained in the vertex and edge labels. Details on the implementation can be found in [7].
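A minimal sketch of the graph construction using networkx is given below; the region dictionaries and field names follow Tab. I (the dB feature, the closest-pixel distance, is omitted since it requires the full pixel masks):

    import itertools
    import networkx as nx

    def build_graph(regions):
        # One vertex per segment, labeled with the Tab. I node features;
        # one edge per segment pair (complete graph, Fig. 5), labeled with
        # the distances between the two regions.
        g = nx.Graph()
        for idx, r in enumerate(regions):
            g.add_node(idx, **r)
        for a, b in itertools.combinations(range(len(regions)), 2):
            g.add_edge(a, b,
                       dXc=abs(regions[a]['Xc'] - regions[b]['Xc']),
                       dYc=abs(regions[a]['Yc'] - regions[b]['Yc']))
        return g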

Fig. 5: Labeled graph representing a segmented image.

TABLE I: Vertex/Edge features of the graphs.

Node features:
• Xc, Yc: Normalized position of the centroid.
• CrCb: Component-wise average of the chrominance.
• Sat: Average of the saturation from the HSB representation.
• Brg: Average of the brightness.
• Wlet: Component-wise average of the wavelet components.
• Area: Size of the region, expressed as the percentage of the total area of the image.
• Symm: A scalar value expressing how symmetric the shape of the region is with respect to its center of mass.
• Rnd: A scalar value that measures the roundness of the shape of the region, i.e., how similar it is to a circle.
• Cmp: A scalar value that measures how compact the region is, i.e., how many pixels of the region fall outside a circle of the same area centered in the center of mass of the region.
• Or: A complex number that expresses whether the segment has an orientation and which this orientation is.
• Dir: A complex number that expresses whether the segment has a direction and which this direction is.

Edge features:
• dXc: Distance of the Xc components of the two regions.
• dYc: Distance of the Yc components of the two regions.
• dB: Distance of the closest pair of pixels in the two regions.

Vertex and edge labels are defined by numbers normalized in [0, 1]. The dissimilarities d_v and d_e between the respective labels are evaluated as the average of the differences of each field, pre-multiplied by a weighting factor relative to the specific field. For example, given the vertex labels l_v^(1) and l_v^(2) associated to the vertices v^(1) and v^(2), their dissimilarity is computed as:

d_v(l_v^(1), l_v^(2)) = (1/12) ∑_i w_i·(f_i^(1) − f_i^(2)),   (5)

where the f_i are the components of the vertex label described above and w_i is the variable that weights the relevance of each component in the dissimilarity. Note that the total number of node descriptors is 12. Each w_i is binary, thus characterizing the complete relevance/irrelevance of the particular feature for the description of the image with respect to the classification task. This cannot be decided a-priori, as it depends on the specific classification problem at hand. For example, if in our classification problem the dimension of the segments of the images is not relevant, the field “Area” will be neglected by setting w_Area = 0. Analogously, for evaluating the edge labels' dissimilarity we define:

d_e(l_e^(a), l_e^(b)) = (1/3) ∑_i w_i·(f_i^(a) − f_i^(b)).   (6)

The dissimilarity measures of vertex and edge labels thus depend on a collection Θ of 15 weight parameters, which are automatically tuned by the genetic algorithm during the aforementioned training phase.
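Since Eqs. (5) and (6) are the same weighted average applied to 12-dimensional vertex labels and 3-dimensional edge labels respectively, a single helper suffices. Absolute differences are assumed below, since all features are normalized in [0, 1] and a dissimilarity should be non-negative (the printed equations show plain differences):

    import numpy as np

    def label_dissimilarity(f1, f2, w):
        # Eqs. (5)-(6): average of the weighted component-wise differences.
        # w is the binary relevance mask (12 vertex weights + 3 edge weights
        # form the 15-element set Theta tuned by the genetic algorithm).
        f1, f2, w = map(np.asarray, (f1, f2, w))
        return float(np.sum(w * np.abs(f1 - f2)) / len(f1))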

C. Graph Classification

The graph classification is carried out by GRALG [8]. The graph embedding procedure in GRALG is based on the symbolic histograms, which are constructed by first identifying a collection of frequent subgraphs A = {g_1, ..., g_m} of the training set. In the following, we briefly introduce GRALG; we refer to [8] for the details.

The computation of A, called the symbols alphabet, is performed by means of a clustering ensemble procedure applied to an appropriate set of subgraphs extracted from the training set only. A suited adaptive filtering of the obtained partitions is performed with the aim of selecting only compact and populated clusters of subgraphs. The subgraphs populating A are derived by compressing the information of the non-filtered clusters, considering the representative subgraphs only; the representative subgraph of a cluster is computed by means of the well-known Minimum Sum Of Distances (MinSOD) [16]. Once A is obtained, the embedding consists in describing each input graph G_i with an integer-valued vector h_i, called symbolic histogram, which is defined as follows:

h_i = φ_A(G_i) = [occ(g_1), ..., occ(g_m)]^T,  ∀ G_i ∈ G.   (7)

The function occ(·) counts the “occurrences” of each representative subgraph g_j ∈ A in the input graph. The occurrence of a subgraph g_j in a graph G_i is evaluated using a weighted GED-based graph matching procedure, d(·, ·), called weighted Best Match First (wBMF) [17], which operates directly on the input space of graphs G. The wBMF procedure adopts a weighting scheme for the edit operations based on a set Δ of six parameters, which are used to modulate the importance of the substitution, insertion, and deletion edit operations for both vertices and edges. If the matching score reaches the symbol-dependent threshold τ_j, the occurrence is considered and the corresponding component is incremented by one. The adopted graph matching procedure is based on the two dissimilarity measures (5) and (6), for node labels and edge labels, respectively.

The embedding space D is defined as the vector space containing all the symbolic histogram representations, {h_i}_{i=1}^n ⊂ D ⊆ R^m. Reasonably, graphs belonging to the same class will be characterized by a similar distribution in terms of symbol occurrences, while graphs pertaining to different classes will denote a discriminative representation. At this point, a generic classifier conceived to work in a vector space can easily deal with the classification task.
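The embedding of Eq. (7) can be sketched as below. The dissimilarity argument stands in for the wBMF matching score [17]; enumerating the candidate subgraphs of G_i is the expensive part of the real system and is abstracted here as a pre-computed list:

    import numpy as np

    def symbolic_histogram(candidate_subgraphs, alphabet, dissimilarity, taus):
        # Eq. (7): h_i = [occ(g_1), ..., occ(g_m)]^T. A symbol g_j "occurs"
        # whenever its matching score against a candidate subgraph of G_i
        # stays within the symbol-dependent threshold tau_j.
        h = np.zeros(len(alphabet), dtype=int)
        for j, (g_j, tau_j) in enumerate(zip(alphabet, taus)):
            for sub in candidate_subgraphs:
                if dissimilarity(g_j, sub) <= tau_j:
                    h[j] += 1
        return h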


Fig. 6: Scheme of the embedding procedure: the two graphs are represented with symbolic histograms that enumerate the occurrences of each symbol of the alphabet (set of recurrent subgraphs) within the graphs.

The behaviour of the embedding procedure depends, besides the dissimilarity parameters Δ and Θ, on other important parameters, synthetically denoted as Γ. The entire set of GRALG parameters is denoted with P = {Δ, Θ, Γ}, as depicted in Fig. 2. The values of the variables contained in H (relative to the segmentation) and in P (relative to GRALG) are tuned during the learning phase through a genetic algorithm. The joint result of this optimization is a first instance of the alphabet A, which contains recurrent symbols. However, for a given subgraph, the property of being very frequent in the training dataset cannot be considered sufficient for taking part in the classification model synthesis, since its semantic value could be unrelated to the problem at hand. For this reason, a second optimization stage is performed, which leads to the selection of the smallest subset of A that is sufficient for solving the problem at hand with high accuracy, i.e., with high class discrimination.

The key assumption underlying the GRALG classification system is that the classes can be discriminated in terms of the frequent subgraphs extracted from the training set Str. If such a characterization exists, GRALG is able to develop a new vector representation (embedding) of the input graphs in terms of symbolic histograms, which in turn contains the information needed to synthesize a suited classification rule. Moreover, the symbolic histogram representation contains “interpretable information”, which can be exploited by field experts to derive insights for the problem at hand. For example, it is possible to understand which features characterize a class by interpreting the distribution of the symbols in the histograms. It is important to underline that the clustering ensemble procedure is in charge of defining a collection of clusters of frequent subgraphs, which are candidates to become meaningful symbols. The two optimization stages are performed in order to discover those information granules actually related to the classification task at hand. A symbol is therefore not just a representative of a set of similar subgraphs found in the training set: it is an information granule with a specific semantic value. The semantic value is attributed by the system during the first and second optimization stages, when it is recognized as useful to the final classification task, usually working in some logic conjunction with other symbols. Note that the same frequent subgraph discovered as a symbol for a given classification problem can be completely uninformative for another problem. Figure 6 shows an example describing the mechanism underlying GRALG.

III. EXPERIMENTAL RESULTS

In this section, we show the results obtained on 4 different classification problems taken from the SPImR1 repository [18] (here we denote those problems with the same names assigned in the repository). The problems contain images coming from different classes that are split into the three datasets Str, Svs, and Sts. In the following, we first give a description of the problems, and then we report the average classification accuracy achieved on the test set over 5 different runs. Moreover, we report the (binary) dissimilarity parameters characterizing the image features that have been selected as relevant by the genetic algorithm in order to correctly solve the classification problem. Most importantly, we visually show the symbols identified by GRALG that are discriminating for the classes. For each problem we performed several runs, initializing the genetic algorithm each time with different random seeds. In many cases we obtained solutions which achieved the same recognition rate with different configurations of selected features and alphabet symbols. This proves that each problem can be solved in different ways, and that more than one set of features can be used for discriminating the classes. In this section, for the sake of conciseness, we report one solution for each problem, in particular the most recurrent among the ones found in the different runs. In those tests we set the value of the variable λ, i.e., the weight in equation (1) that balances the error rate on the validation set against the complexity of the model, to 0.9. Preliminary tests have shown that this weight value allows the system to produce good results in terms of classification error, while keeping the complexity (i.e., the number of considered features) at a low level. The system is implemented in C++ using functions and classes from the SPARE library1 [19].

A. Problem P08

We start with a simple problem, where each image contains a single object. In particular, images of class 1 depict one medium-sized red object or one big green object, while images of class 2 depict one medium-sized blue object or one small green object (see Fig. 7).

Fig. 7: Some image instances of problem P08 (classes 1 and 2).

We obtained an average recognition rate on Sts of 100%. Typical features of the vertex label that have been identified as relevant by the genetic algorithm during the training phase for describing a segment are: Xc, Yc, CrCb, Brg, Wlet, Area, Symm, Rnd, and Cmp. All graphs being of order 1 (i.e., single vertices), no edge features have accordingly been selected by the algorithm. In this case, it is possible to note that the features Xc, Yc, Wlet, Symm, Rnd, and Cmp have (although sometimes) been considered, even if they are not strictly necessary. This may happen because, even though we designed the objective function of the genetic algorithm to be biased toward the selection of a small set of relevant parameters, sometimes the solution corresponding to the global optimum is not identified (the genetic algorithm does not guarantee that). Furthermore, sometimes the system finds some “hidden patterns” that characterize the classification problem, which have been unintentionally generated during the design of the problem itself. For example, in this problem images of class 1 displaying large green objects are less compact than the others, a characteristic which is only coincidental, although it explains why the field Cmp is considered relevant.

1 http://sourceforge.net/p/libspare/home/Spare/

The GRALG system has found 2 discriminative subgraphs (symbols), namely a large green object (s1) and a medium-sized red object (s2). Fig. 8 shows them using a rendered image of each symbol. Images of class 1 have been represented by histograms (MinSOD representatives) that contain the occurrences of symbols s1 or s2, while images of class 2 are represented by empty histograms, i.e., only the first class is effectively characterized. This result is due to the fact that GRALG synthesizes the essential alphabet in a completely unsupervised fashion. An example of embedding for a small set of testing patterns (labeled from p(1) to p(6)) is shown in Tab. II.

Fig. 8: Symbols (s1, s2) found by GRALG for discriminating class 1 from class 2 in problem P08.

TABLE II: Embedding of some samples in problem P08.

            Pattern    s1    s2
  Class 1   p(1)        1     0
            p(2)        0     1
            p(3)        0     1
  Class 2   p(4)        0     0
            p(5)        0     0
            p(6)        0     0

B. Problem P15

This problem contains images of three different classes. Images of class 1 depict one eraser on the left-hand side of three guitar picks; images of class 2 depict one eraser on the right of three guitar picks; finally, images of class 3 depict one eraser among three guitar picks (see Fig. 9).

The recognition rate achieved on Sts for this problem is still 100%. The dissimilarity parameters that have been identified as relevant are dXc, dYc, Yc, CrCb, and Brg. Even though the features relative to the color of the regions are not relevant for this classification problem, the ones associated with the position of the regions and their relative distances are correctly identified as relevant. For this problem, GRALG found two symbols for classifying the patterns (see Fig. 10).

In particular, in the graphs that represent patterns of class 1 only the symbol s1 occurs; graphs associated with patterns of class 2 are characterized by the occurrence of symbol s2; finally, the graphs of class 3 contain occurrences of both symbols s1 and s2. In Tab. III, some examples of the symbolic histograms associated with patterns of the 3 different classes are presented.


Fig. 9: Some instances of the three classes of P15.

Fig. 10: The two symbols (s1, s2) used to characterize the three classes.

TABLE III: Embedding of the patterns in problem P15.

            Pattern    s1    s2
  Class 1   p(1)        0     1
            p(2)        0     1
            p(3)        0     1
  Class 2   p(4)        1     0
            p(5)        1     0
            p(6)        1     0
  Class 3   p(7)        1     1
            p(8)        1     1
            p(9)        1     1

C. Problem P05

This classification problem contains images drawn from two different classes. The images of class 1 depict one colored object in the lower right corner of the image and gray objects in the other three corners; images of class 2 depict one colored object in the upper left corner of the image and gray objects in the other three corners (see Fig. 11).

The average recognition rate achieved on Sts is 90%. The genetic algorithm selected Xc, Yc, CrCb, and Sat, which are coherent with the expected class descriptions. In fact, the dissimilarity between different regions must consider the color of each region and its position on the plane. In this case, a single symbol is extracted and therefore used for discriminating the 2 classes (the subgraph is depicted in Fig. 12). As we can see from the image, the subgraph is a graph of order 3 that represents a colored object in the upper left corner and two gray objects on the right.

In the graphs of class 1 the symbol s1 does not occur, while it does occur in those of class 2, since the latter in fact represent images containing a colored object in the upper left corner. In Tab. IV some examples of the synthesized symbolic histograms are reported.

Fig. 11: Some instances of the two classes of P05.

Fig. 12: Patterns of class 1 are discriminated by the absence of this symbol (s1); those of class 2 by its occurrence.

TABLE IV: Embedding of the patterns in problem P05.

            Pattern    s1
  Class 1   p(1)        0
            p(2)        0
            p(3)        0
  Class 2   p(4)        1
            p(5)        1
            p(6)        1

D. Problem P23

This problem is not contained in the SPImR1 repository; it was generated specifically for testing the system described in this paper. In fact, each class here is explicitly characterized by a recurrent substructure that should be identified by our system. The images of class 1 contain one random object plus a composed sub-pattern formed by a yellow star and a red circle in a random position, but at a fixed mutual distance and size. Images of class 2 contain instead one random object plus a yellow star and a green square in a random position, again placed at a fixed mutual distance and size (see Fig. 13).

The recognition rate achieved on Sts is 100%. The dissimilarity parameters that have been identified as relevant by the genetic algorithm during the training phase are CrCb and Cmp, which are sufficient to discriminate the regions belonging to the two different classes. The GRALG system has found only one discriminative symbol, that is, the subgraph of order 2 composed of the yellow star and the green square (Fig. 14).


Fig. 13: Some image instances of the two classes of P23.

Fig. 14: Patterns of class 1 are discriminated by the absence of this symbol (s1), while those of class 2 by its occurrence.

This symbol is in fact sufficient to correctly discriminate the classes: images of class 1 are represented with a symbolic histogram component equal to 0, while images of class 2 are represented by a histogram with a single component equal to 1 (see Tab. V).

TABLE V: Embedding of the patterns in problem P23.

            Pattern    s1
  Class 1   p(1)        0
            p(2)        0
            p(3)        0
  Class 2   p(4)        1
            p(5)        1
            p(6)        1

IV. CONCLUSIONS

In this paper, we have presented a fully automated graph-based system for image classification, whose output can be easily interpreted by a human user by relying on the weights of the inner node and edge dissimilarity measures (i.e., features pertinent to the images) and by considering recurring and discriminating substructures (i.e., the symbols). All problems faced in this paper and presented in Section III have been processed without manually modifying any parameter, nor providing the system with any a-priori knowledge of the semantics of the problems. Due to page constraints, we have presented only a handful of the available benchmarking problems for image classification and interpretation. It is worth mentioning that those problems have been conceived in a controlled environment with a well-defined semantics, so that the interpretability of the results could be easily assessed and verified. In future work, we expect to present the results obtained by our system when it is applied to more complex and challenging image classification and interpretation problems. Finally, it is worth stressing that the same scheme characterizing the presented system could be applied to other contexts, where it is possible (and meaningful) to represent an input pattern as a labeled graph, which in turn can be conveniently characterized in terms of its recurring subgraphs.

REFERENCES

[1] D. Lu and Q. Weng, “A survey of image classification methods and techniques for improving classification performance,” International Journal of Remote Sensing, vol. 28, no. 5, pp. 823–870, 2007.

[2] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng, “Building high-level features using large scale unsupervised learning,” in International Conference on Machine Learning, 2012.

[3] L.-J. Li, H. Su, L. Fei-Fei, and E. P. Xing, “Object bank: A high-level image representation for scene classification & semantic feature sparsification,” in Advances in Neural Information Processing Systems, 2010, pp. 1378–1386.

[4] B. Ozdemir and S. Aksoy, “Image classification using subgraph histogram representation,” in 2010 20th International Conference on Pattern Recognition (ICPR), 2010, pp. 1112–1115.

[5] B. Le Saux and H. Bunke, “Feature selection for graph-based image classifiers,” in Pattern Recognition and Image Analysis, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2005, vol. 3523, pp. 147–154.

[6] T. Athanasiadis, P. Mylonas, Y. Avrithis, and S. Kollias, “Semantic image segmentation and object labeling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 298–312, 2007.

[7] A. Rizzi and G. Del Vescovo, “Automatic Image Classification by a Granular Computing Approach,” in Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, Sept. 2006, pp. 33–38.

[8] F. M. Bianchi, L. Livi, A. Rizzi, and A. Sadeghian, “A Granular Computing approach to the design of optimized graph classification systems,” Soft Computing, Jun. 2013.

[9] G. Del Vescovo and A. Rizzi, “Automatic classification of graphs by symbolic histograms,” in IEEE International Conference on Granular Computing (GRC 2007), Nov. 2007, pp. 410–416.

[10] N. Acosta-Mendoza, A. Gago-Alonso, and J. E. Medina-Pagola, “Frequent approximate subgraphs as features for graph-based image classification,” Knowledge-Based Systems, vol. 27, pp. 381–392, 2012.

[11] A. Bleau and L. J. Leon, “Watershed-based segmentation and region merging,” Computer Vision and Image Understanding, vol. 77, no. 3, pp. 317–370, 2000.

[12] A. C. Bovik, Handbook of Image and Video Processing. Elsevier, 2010.

[13] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 205–220, 1992.

[14] A. Rizzi, G. Del Vescovo, L. Livi, F. M. Bianchi, and A. Sadeghian, “A Granular Modeling System for Image Classification and Understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, under review, Manuscript ID: TPAMI-2014-01-0083.

[15] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed. Academic Press, 2006.

[16] G. Del Vescovo, L. Livi, F. M. Frattale Mascioli, and A. Rizzi, “On the Problem of Modeling Structured Data with the MinSOD Representative,” International Journal of Computer Theory and Engineering, vol. 6, no. 1, pp. 9–14, 2014.

[17] L. Livi and A. Rizzi, “The graph matching problem,” Pattern Analysis and Applications, vol. 16, no. 3, pp. 253–283, 2013.

[18] “SPImR1: A set of 20 instances of synthetic and photographic image classification problems.” [Online]. Available: http://infocom.uniroma1.it/~rizzi/index.htm

[19] L. Livi, A. Rizzi, and G. Del Vescovo, “Building Pattern Recognition Applications with the SPARE Library,” Pattern Analysis and Applications, 2013, under review. Manuscript ID: PAAA-D-13-00168.
