Complex networks for image classification
Maarten Baijens ANR: 722956
Tilburg University School of Humanities and Digital Sciences
Department of Cognitive Science & Artificial Intelligence Tilburg, The Netherlands
January 2019
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE IN COMMUNICATION AND INFORMATION
SCIENCES, MASTER TRACK DATA SCIENCE BUSINESS & GOVERNANCE, AT THE
SCHOOL OF HUMANITIES AND DIGITAL SCIENCES OF TILBURG UNIVERSITY
Thesis committee:
Dr. Martin Atzmueller
Dr. Grzegorz Chrupala
Abstract
Complex networks have shown success in improving classification rates for images.
Particularly, research on texture-based images has shown that modeling images as complex networks can help identify them. This research examines the use of these networks for non-texture-based images. The main research question is to what extent modeling images as complex networks works for classification.
This research uses a simplified model based on the multilayer complex network descriptors that Scabini, Condori, Goncalves & Bruno (2018) created in their study. They successfully used these network descriptors on texture-based images. This research investigates whether a simplified model also works on non-texture-based images.
Two datasets are used to find an answer to the main question. The first is the Vision Texture
database. This dataset contains texture-based images that are used to see whether the
simplified model works on texture-based images. The second set is the CIFAR-10 dataset.
This set contains non-texture-based images that are used to see whether using complex
networks works on those types of images.
The results show that the model works for texture-based images: the two algorithms used achieved accuracy scores of up to 90.6% on these images. The non-texture-based images modeled as complex networks scored considerably lower, with accuracies no higher than 22%.
There are two reasons for these results. Firstly, the images used have a low resolution, which means there is less information available for the modeling. Secondly, the non-texture-based images do not differ enough from each other, which means the classification algorithms cannot differentiate the classes.
The conclusion is that modeling non-texture-based images as complex networks with this
model does not work.
Table of Contents
Abstract
1.0 Introduction
    1.1 Scientific relevance
    1.2 Practical relevance
    1.3 Research questions
    1.4 Findings
2.0 Related Work
    2.1 Image recognition
    2.2 Texture analysis
        2.2.1 Statistical methods
        2.2.2 Model based methods
        2.2.3 Structural methods
        2.2.4 Color texture analysis
    2.4 Complex Networks
    2.5 Convolutional Neural Networks
    2.6 Multilayer Complex Network Descriptors
3.0 Methods
    3.1 Image processing
    3.2 Modeling as Complex Networks
    3.3 Classification algorithms
        3.3.1 Support Vector Machines
        3.3.2 Random Forest
4.0 Experimental Setup
    4.1 Datasets
        4.1.1 Vision Texture
        4.1.2 CIFAR-10
    4.2 Software
        4.2.1 Packages in R-Studio
    4.3 Proposed model
    4.4 Evaluation
5.0 Results
    5.1 VisTex modeled
        5.1.1 SVM
        5.1.2 Random Forest
        5.1.3 Algorithms compared
    5.2 CIFAR-10 unmodeled
        5.2.1 SVM
        5.2.2 Random Forest
        5.2.3 Algorithms compared
    5.3 CIFAR-10 modeled as complex networks
        5.3.1 SVM
        5.3.2 Random Forest
        5.3.3 Algorithms compared
    5.4 Summary of the results
6.0 Discussion
7.0 Conclusion
    7.1 Performance on texture-based images as complex networks
    7.2 Performance on base non-texture-based images
    7.3 Performance on non-texture-based images as complex networks
    7.4 Performances of algorithms compared
    7.5 Performance compared to baseline
    7.6 Modeling non-texture-based images as complex networks?
References
1.0 Introduction
Image recognition is a popular topic not only in scientific research but also in mainstream media. Through the years, the methods used to recognize images have evolved and the algorithms keep advancing. Methods like grey-scale texture analysis and color distribution analysis have been used successfully to classify images. Because of the increase in computational power over the years, these methods have evolved into more demanding algorithms that can be more accurate. However, the error rate of image recognition is in many cases still significantly higher than that of humans (Kovalevsky, 2012).
One of the most recent approaches uses networks in images for classification. Convolutional Neural Networks achieve high accuracies on image sets and show much promise (Krizhevsky, Sutskever & Hinton, 2012). Recent research (Scabini, Condori, Goncalves & Bruno, 2018) uses complex networks, in the form of multilayer complex network descriptors, to improve image classification; in some cases they perform even better than Convolutional Neural Networks.
This all leads to a field in which a lot is still to be learned and discovered. This study takes a
further look into using complex networks to model images.
1.1 Scientific relevance
The idea of image recognition by machine learning has been a popular research topic for many years. There are, however, still new methods to be discovered and old methods to be improved on. In recent years the use of networks has shown promise as one of the best methods currently available. Especially Convolutional Neural Networks show high accuracies in classifying images (Krizhevsky et al., 2012). More recent research shows complex networks as another efficient way to classify images.
To further the knowledge in this area, this research builds on the existing research of Scabini et al. (2018) to examine the use of complex networks and how they can be used to classify other images. Scabini et al. (2018) used complex networks in their research to improve the classification of textures. Because of this, their research was primarily focused on texture-based images such as wood and brick. The improvements that this use of complex networks brings could also work on other types of images. This research looks at whether the basis of their modeling works on images that are based not on textures but on real-life objects. Real-life objects also have certain textures and patterns, so complex networks could improve classification of these objects. A simplified model is used compared to the model of Scabini et al. (2018).
These findings can further image recognition as a research area. This research gives insight into the current state of image recognition methods and adds to the knowledge that is available.
1.2 Practical relevance
Image recognition is widespread these days. Everyone can use Google's reverse image search to find out what an image shows. Facebook can predict which people are in a photograph. This means that improvements or alternative methods are useful for practical reasons, since they impact the lives of many people.
Image recognition is also used in more critical technology. Imaging in the medical field is used to diagnose diseases; Alzheimer's, for instance, can be detected from images of brain scans (Anitha & Jyothi, 2016). In industry, image recognition can make inspecting machinery easier: images can be fed into an algorithm to check for problems instead of having a person look at them. It can also be used in sensor technology and a whole catalog of other areas that make use of image data.
1.3 Research questions
This research looks at patterns in non-texture-based images. The patterns used are based on network theory, specifically on complex networks as proposed by Scabini et al. (2018), who showed that complex networks work on texture-based images. This leads to the following main research question:
To what extent does modeling non-texture-based images as complex networks work
for classification?
To answer this question, this research looks at different aspects. A baseline must be formed with which the performance can be compared.
The first part of the baseline is the performance of the model on texture-based images. These are images based on a single texture, like brick or wood, as opposed to non-texture-based images, which do not have just one texture, such as an image of a train and its surroundings. The second part of the baseline is the performance on the base non-texture-based images. Different algorithms are used; these need to be compared to each other.
This leads to the following sub-questions:
How do classification algorithms perform on texture-based images modeled as
complex networks?
How do classification algorithms perform on the base non-texture-based images?
How do classification algorithms perform on the non-texture-based images modeled
as complex networks?
How do the performances of the algorithms compare to each other?
How do the performances of the non-texture-based images modeled as complex
networks compare to the baseline?
1.4 Findings
The main findings of the research are found in the table below. This table contains the highest
accuracy rates for each of the used methods and datasets.
                  VisTex    CIFAR-10 Baseline    CIFAR-10 Modeled
SVM               90.6%     28%                  22%
Random Forest     90.6%     28%                  19%
The results show that the algorithms achieved high accuracies on the texture-based images modeled as complex networks, represented in the VisTex column, named after the dataset used to get those results. They show low accuracies on the base non-texture-based images, the column called CIFAR-10 Baseline. Finally, they show even lower accuracies on the non-texture-based images modeled as complex networks, represented in the last column. The differences between the algorithms are small, and both are unable to classify non-texture-based images with the modeled dataset as input.
2.0 Related Work
In this chapter the related work is discussed. Image recognition as a topic is discussed first. After that, different methods that have been used in the past are discussed in broad terms, with some examples to show the ideas behind them. This leads to networks being used in image classification and to the theory of multilayer complex network descriptors on which this research is mainly based.
2.1 Image recognition
Image recognition is the ability of an algorithm to identify something in an image. This can
be an object, a place, people or even writing (Rouse, 2017). Human brains can recognize
these objects easily, but computers have a much harder time with this task. Image recognition
is basically pattern recognition for images. There are many methods used for the purposes of image recognition; different classification algorithms can be used, for instance. The problem with most classification algorithms, however, is that they are not accurate on raw image data. The data first needs to be transformed, and different methods are used for this purpose. In the next paragraphs, the methods that are in line with the one used in this study are explained.
2.2 Texture analysis
Texture analysis is an active research topic in which textures in images are studied in order to recognize them. Textures in images have been studied extensively, and different schemes of analysis have been proposed (Wouwer, Scheunders, Livens & Dyck, 1997). The schemes all have in common that they look at spatial interactions between the different pixels in an image. The goal of texture analysis in general is to look at texture aspects in images and to use them for classification purposes (Scabini et al., 2018).
2.2.1 Statistical methods
Statistical methods in texture analysis use a collection of statistics of selected features (Zhang & Tan, 2002). This is because the human visual system uses statistical features for texture discrimination. The statistics include first-order, second-order and higher-order statistics. Some examples of statistical methods are polarograms, harmonic expansion and the feature distribution method. To further illustrate statistical methods, a couple of them are explained.
Polarograms are polar plots of texture statistics described as a function of orientation (Davis, 1981). Polar plots are plots in which coordinates are written as the distance between a point and the origin of the plot. From a polarogram, texture features are derived based on its size and shape (Zhang & Tan, 2002). The shape depends on the boundary of the polarogram but also on the position of the origin. These features are then used for classification, with experiments achieving 75% to 90% classification rates.
In the harmonic expansion approach (Alpati & Sanderson, 1985), an image is decomposed into a combination of harmonic components in a polar form (Zhang & Tan, 2002). This projection of the original image in harmonic form yields features of the pattern. Experiments gave classification rates of 90% and better.
The feature distribution method (Ojala, Pietikainen & Harwood, 1996) bases features on
center-symmetric auto-correlation, local binary pattern and gray-level difference to describe
textures in images (Zhang & Tan, 2002).
2.2.2 Model based methods
Model based methods model a texture image as a probability model or as a linear combination of a set of basis functions (Zhang & Tan, 2002). These models can be used to calculate coefficients, which can then be used to classify the image. Some examples of model-based methods are SAR models, the Markov model and the wavelet transform. To further illustrate model-based methods, a couple of them are explained.
SAR stands for simultaneous autoregressive model (Zhang & Tan, 2002). It looks at the gray level of pixels in textured images. This leads to the following model definition:

    f(s) = u + Σ_{r∈ω} θ(r) f(s + r) + ε(s)

In this model, f(s) is the grey level of the pixel at site s and ω is the set of neighbors of that pixel. ε(s) is a Gaussian random variable, u is the bias independent of the mean gray value of the image, and θ(r) are the model parameters, which can be used as the texture features (Zhang & Tan, 2002). This model has been used for classification but also for segmentation and synthesis. Other models have been derived from it, such as the CSAR and RISAR models.
The Markov model (Cohen, Fan & Patel, 1991) models texture as Gaussian Markov random fields and uses maximum likelihood to estimate coefficients and rotation angles (Zhang & Tan, 2002). The model is problematic in that it is both computationally intensive and highly nonlinear (Zhang & Tan, 2002).
The wavelet transform is a model for texture discrimination (Chitre & Dhawan, 1999). It decomposes a texture image into frequency channels, which are created such that they have narrower bandwidths at the lower frequencies. This makes it especially useful for smooth textures, because the information of such textures is concentrated in the lower frequencies (Zhang & Tan, 2002).
2.2.3 Structural methods
Structural methods view texture as consisting of many textural elements that are arranged according to placement rules (Zhang & Tan, 2002). This is because humans strongly perceive the structural properties of textures. Some examples of structural methods are the invariant histogram and morphological decomposition. To further illustrate structural methods, a couple of them are explained.
The invariant histogram (Goyal, Goh, Mital & Chan, 1995) is a useful method for texture analysis. The histogram is created from texture elements and can in turn be used for texture characterization. It is based on a weighted frequency f_w, which combines a_i, the area of a texel with area-index i, and n_i, the number of elements with index i (Zhang & Tan, 2002). Experiments with this method achieved a 95% classification rate.
Morphological decomposition (Lam & Li, 1997) decomposes a texture into a scale-dependent set of component images. For each of these components, statistical features are obtained. Experiments with this method have achieved a 97.5% correct classification rate on images (Zhang & Tan, 2002).
2.2.4 Color texture analysis
The three types of methods explained above are based on grey-scale images. This leaves out an important cue that humans use to recognize images, namely the color information. Because of this, there are also methods that use color. The most basic way of using color is to make a histogram in which the colors are binned in a certain way. Comparing the histograms shows which images are similar in color (Hafner, Sawhney, Equitz, Flicker & Niblack, 1995). The disadvantage of this method is that the spatial information is lost. One way to overcome this disadvantage is to look at images as networks, which is the direction this research takes.
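The color-binning idea can be made concrete with a minimal sketch, given here in Python for illustration (the thesis's own software is R; the bin count and the histogram-intersection measure are arbitrary choices for this sketch):

```python
# Minimal sketch of color-histogram comparison (illustrative only).
# Each pixel is an (R, G, B) tuple; colors are binned per channel and
# histograms are compared with histogram intersection.

def color_histogram(pixels, bins=4):
    """Bin each RGB channel into `bins` ranges and count joint occurrences."""
    width = 256 // bins                      # size of one bin per channel
    hist = {}
    for r, g, b in pixels:
        key = (r // width, g // width, b // width)
        hist[key] = hist.get(key, 0) + 1
    return hist

def histogram_intersection(h1, h2):
    """Similarity: sum of the minimum counts over all shared bins."""
    return sum(min(h1.get(k, 0), h2.get(k, 0)) for k in set(h1) | set(h2))

img_a = [(250, 10, 10), (240, 20, 5)]        # two reddish pixels
img_b = [(245, 15, 12), (10, 10, 250)]       # one reddish, one bluish pixel
sim = histogram_intersection(color_histogram(img_a), color_histogram(img_b))
```

Note that the histogram keeps no record of where in the image a color occurred, which is exactly the loss of spatial information described above.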
2.4 Complex Networks
As said before, texture analysis looks at the spatial interactions between different pixels. This gives way to a new way of thinking based on networks, which likewise look at interactions between different nodes. This leads to complex networks. These networks come from an interdisciplinary research area in which physics, mathematics, biology, computer science and other disciplines look at a wide variety of complex network structures (Silva & Zhao, 2016). This is done to understand interwoven systems and their complexity. Structure affects function (Strogatz, 2001), so the structure of an image affects what it shows and how it can be classified. This structure can be described in the form of a complex network.
These network structures form patterns and so can be used for pattern recognition (Scabini et al., 2018). A network consists of vertices and edges; networks can be weighted or unweighted, and edges can be directed or undirected.
Measured quantities are needed to characterize a network structure (Barrenas, Chavali, Holme, Mobini & Benson, 2009). Network measures are elements of networks that can be represented in different ways and can distinguish two networks from each other (Rubinov & Sporns, 2010). Network measures are used in this research to summarize images based on their complex networks; the ones used are explained in the methods section.
2.5 Convolutional Neural Networks
Proof that recognition through networks works for images is found in the accuracy that Convolutional Neural Networks achieve on image classification tasks. CNNs are among the best-performing algorithms when it comes to image classification (Krizhevsky, Sutskever & Hinton, 2012).
CNNs are hierarchical neural networks that use the 2-D structure of an image to classify it (Ciresan, Meier, Masci, Gambardella & Schmidhuber, 2011). CNNs consist of different layers. The first is an optional image processing layer, in which predefined filters can be applied; this can give extra input next to the raw image data. The next layer is the convolutional layer, which creates the feature maps that form the basis for the classification. The last layer is the classification layer, in which a class is assigned to the object on the basis of the created features.
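The core operation of the convolutional layer, sliding a small filter over the 2-D pixel grid, can be sketched as follows (Python, illustrative only; the filter values are hypothetical):

```python
# Minimal sketch of a 2-D convolution as used in a convolutional layer
# (valid padding, stride 1, single channel, no bias or activation).

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge filter applied to an image half dark (0), half bright (9):
image = [[0, 0, 9, 9]] * 4
kernel = [[-1, 1]]        # responds where intensity jumps left-to-right
feature_map = convolve2d(image, kernel)
```

The feature map responds strongly only at the column where the intensity changes, which is the kind of local pattern a trained CNN filter picks up.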
2.6 Multilayer Complex Network Descriptors
Based on complex networks, Scabini et al. (2018) proposed a new technique to model and characterize color texture. Each color channel is mapped as a layer in the network. The vertices in this network are the pixels of the image, with one vertex layer for each color channel; with RGB, for instance, these are the red, green and blue channels. Whether or not two vertices are connected is based on their Euclidean distance. By looking at the connected vertices, the different edges can be composed. After this first step, every vertex has the same number of connections, except those on the borders. This forms a regular network. For it to become a complex network, another step must be taken: since a texture consists of patterns of intensity variation, the approach is to cut network connections while keeping the connections between distinct vertices.
The weight of a connection between two vertices is a function of the following quantities: p(v), the intensity value in a given color channel; d(vi, vj), the Euclidean distance between the two pixels; L, the maximum value of the color channels; and r, the radius within which pixels are connected in the cartesian space.
How the theory of multilayer complex network descriptors described here is used to create complex networks from images is explained in the methods section. This research shows whether complex networks can be used on images that are not texture-based, as the images in the original research are all textures, like brick and wood, rather than real-world objects. It also looks at different classification algorithms to see whether modeling images as complex networks can improve the accuracy of those algorithms.
3.0 Methods
Different methods are used to get to the required results. The images are processed first. Afterwards, the images are turned into complex networks, which in turn are the input for two classification algorithms.
3.1 Image processing
The images are processed to form rows in a dataframe, with each row being an image. Each row contains the color information based on the RGB color channels: the first values are the red values, then come the green values and finally the blue values. Within each channel, the first value represents the first pixel of the top row of the image, the second value the second pixel of the top row, and so on. After a row of pixels is finished, the sequence continues with the next row of pixels until the whole image is processed. For example, an image of 4x4 pixels is processed into a single row of 48 values: the 16 red values in pixel order, followed by the 16 green values and the 16 blue values.
The labels of the images are then added to the dataframe with each row getting the label
corresponding to the image. These rows can be used for the modeling that is needed to turn
the images into complex networks. This step is explained in chapter 3.2.
The advantage of processing the images and transforming them into numbers in a dataframe is twofold. Firstly, it is easier to do calculations on the numbers than on the image file directly. Secondly, this transformation makes it easier to feed the data into the two classification algorithms used in this research.
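The flattening described above can be sketched as follows (Python, for illustration; the thesis's own processing is done in R, and the function name and toy pixel values are hypothetical):

```python
# Sketch of the processing step: flatten an RGB image into one row with
# all red values first, then all green, then all blue (row-major pixel order).

def image_to_row(image):
    """`image` is a list of pixel rows; each pixel is an (R, G, B) tuple."""
    pixels = [p for row in image for p in row]            # row-major order
    return ([p[0] for p in pixels]                        # red channel
            + [p[1] for p in pixels]                      # green channel
            + [p[2] for p in pixels])                     # blue channel

# A 2x2 toy image (hypothetical values):
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (10, 20, 30)]]
row = image_to_row(img)   # 12 values: 4 red, 4 green, 4 blue
```

Stacking one such row per image, with the class label appended, yields the dataframe used in the rest of the pipeline.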
3.2 Modeling as Complex Networks
The modeling of the data revolves around creating a complex network from the processed image data described above. For each image, each pixel is a possible node in a network. This model is based on the work of Scabini et al. (2018), who used two steps to create the edges between different nodes. The first is to look at the spatial features of the pixels, specifically their coordinates in cartesian space. The second is to compare the different pixels to each other based on the color channels and connect the ones that differ from each other. After this, they use the network to create different subnets, compute topological measures on these subnets, and convert the measures into RGB images. An example of the images they recreate this way can be seen below (Scabini et al., 2018).
In this research a simpler approach is used. The steps in the modeling process are based on the first steps of the research of Scabini et al. (2018). Here the complex networks are used to obtain variables based on network measures, and these measures are used as input for the classification algorithms.
The first step is to calculate which pixels are connected within a certain radius. A pixel is connected to another pixel if that pixel lies within the radius R; with a small radius, for instance, the first pixel of the first row is connected to the second pixel of the first row and to the first and second pixels of the second row. To calculate this, the following formula is used:

    d = √((x1 − x2)² + (y1 − y2)²)

In this formula, x1 and x2 are the x-coordinates and y1 and y2 the y-coordinates of the two pixels being compared. The resulting distance d is compared with the radius R: if d is smaller than the radius, the pixels are connected; if it is bigger, they are not.
The second step is to compare the pixels based on the color data, using the following formula:

    w = |v1 − v2| / L

In this formula, v1 is the color value of the first pixel and v2 the color value of the second pixel. L is the maximum value the color channel can have; for RGB this is 255. This formula is applied to each of the channels, so with RGB it is used for red, green and blue.
The third step is to add these per-channel values together and divide by three to get the average. This average is then multiplied by b, which is 1 or 0 depending on whether the pixels are connected based on the radius. This leads to a weight c, as shown in the formula below:

    c = b × (wR + wG + wB) / 3

Here wR, wG and wB are the per-channel differences from the second step for the red, green and blue channels. The final step is to compare this weight with t, a number between 0 and 1 that sets the threshold for the connections. A higher weight indicates pixels that differ strongly from each other. So, if c is bigger than t, the pixels are connected to each other, with c as the weight of the connection; if not, the value is 0 because they are not connected.
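The four steps above can be sketched for a single pair of pixels as follows (Python, illustrative only; the thesis implementation is in R, and the function name and the default values for R and t are hypothetical):

```python
import math

# Sketch of the four modeling steps for one pair of pixels (RGB, L = 255):
# 1) spatial check within radius R, 2) per-channel color difference / L,
# 3) average over the three channels times the spatial indicator,
# 4) keep the edge only if the weight exceeds the threshold t.

def edge_weight(p1, p2, rgb1, rgb2, R=2.0, t=0.3, L=255):
    # Step 1: pixels are connected only if their Euclidean distance < R.
    d = math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)
    b = 1 if d < R else 0
    # Step 2: normalized color difference per channel, |v1 - v2| / L.
    diffs = [abs(v1 - v2) / L for v1, v2 in zip(rgb1, rgb2)]
    # Step 3: average over the channels, multiplied by the indicator b.
    c = b * sum(diffs) / len(diffs)
    # Step 4: threshold - keep the edge only if the pixels differ enough.
    return c if c > t else 0

# Two neighboring pixels with maximally different colors keep their edge:
w = edge_weight((0, 0), (1, 0), (255, 255, 255), (0, 0, 0))
```

Applying this to every pixel pair of an image yields the weighted edge list of that image's complex network.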
This gives a network with the pixels as nodes and the edges based on the calculations explained above. From this network, different variables can be extracted. The first variable used is the total number of edges per pixel. This indicates the pixels that are important within the image because they differ a lot from their surroundings; the edge of an object differs strongly from its background, which makes it an important pixel within an image. The processed original data has as many values per pixel as there are color channels. After the modeling there is only one value per pixel, so for RGB the number of values is divided by three.
Some preliminary results showed that the total number of edges per pixel is not enough to correctly classify the images used. To solve this, the model can be used differently: other variables can be extracted from it on top of the total edges per pixel.
Networks have measures that can be used to describe the network and its characteristics. The measures used in this research are explained below:
Degree: the node degree is the number of edges that are connected to the node (Rubinov & Sporns, 2010). Here the mean degree, the standard deviation of the degree and the maximum degree are used. The mean degree is the mean over the degrees of all nodes. The standard deviation is the deviation within those degrees, and the maximum degree is the number of edges of the node with the highest degree.
Strength: the node strength is the sum of the weights of the edges connected to that node. In this study the mean strength is used, which is the mean over the strengths of all nodes (Rubinov & Sporns, 2010).
Density: this is a number which is calculated by dividing the number of connections
by the number of possible connections (Rubinov & Sporns, 2010).
Diameter: this is the longest of the shortest paths between the different nodes (Rubinov & Sporns, 2010).
Betweenness: this is the number of shortest paths that pass through a node (Rubinov
& Sporns, 2010).
Closeness: this is the average number of steps it takes for a node to reach all other
nodes in the network (Costenbader & Valente, 2003).
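As a sketch of how such measures can be extracted, the snippet below computes them for a small weighted adjacency matrix with the networkx library in Python (the thesis uses the igraph package in R; the matrix values here are illustrative):

```python
import networkx as nx
import numpy as np

# A small symmetric weighted adjacency matrix standing in for the pixel grid.
A = np.array([[0.0, 0.8, 0.5, 0.0],
              [0.8, 0.0, 0.0, 0.6],
              [0.5, 0.0, 0.0, 0.9],
              [0.0, 0.6, 0.9, 0.0]])
G = nx.from_numpy_array(A)

degrees = [d for _, d in G.degree()]                   # edges per node
strengths = [s for _, s in G.degree(weight="weight")]  # summed edge weights
measures = {
    "mean_degree": np.mean(degrees),
    "sd_degree": np.std(degrees),
    "max_degree": max(degrees),
    "mean_strength": np.mean(strengths),
    "density": nx.density(G),
    "diameter": nx.diameter(G),  # longest of the shortest paths
    "mean_betweenness": np.mean(list(nx.betweenness_centrality(G).values())),
    "mean_closeness": np.mean(list(nx.closeness_centrality(G).values())),
}
```

One such row of measures per image is what the classification algorithms described below receive as input.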
These measures further reduce the variables to only those based on the measurements. The theory is that similar images have similar measures for their respective complex networks. These different outputs of the model are input for the classification algorithms that are described below, and their results are compared to the original processed data.
3.3 Classification algorithms
After modeling the images as complex networks, two classification algorithms are used to see
if the images modeled as complex networks can be used to classify the images correctly and
to what extent they can do that compared to a baseline. These classification algorithms are
SVM and Random Forest. Both classification algorithms are used on the texture-based
images modeled as complex networks, the base non-texture-based images and finally the
non-texture-based images modeled as complex networks. This gives the opportunity to
compare the performance of the baselines to the non-texture-based images modeled as
complex networks.
3.3.1 Support Vector Machines
One classification algorithm used is SVM (Support Vector Machines). SVMs are based on the structural risk minimization principle from computational learning theory (Joachims, 1998). This principle looks for a hypothesis with the lowest true error. SVMs are universal learners: they use a linear threshold function as their basis, but they can be modified in various ways, for example to be used as polynomial classifiers (Joachims, 1998).
The goal of SVM is to create a model based on training data which can then predict values of
target data (Hsu, Chang & Lin, 2003). It is a supervised machine learning algorithm and it is
mostly used for classification (Ray, 2017). Each data item is plotted as a point in an n-dimensional space, and classification is done by finding the hyperplane that best separates the different classes. One of the advantages of SVMs is their ability to generalize well with many features. This is especially important in this research because of the large number of features the images used here have.
The algorithm is used with three different datasets. The first is the VisTex dataset that has been modeled and consists of network measures. The second is the base image data from the
CIFAR-10 dataset. The third and last is the CIFAR-10 dataset made up of network measures
of the modeled images. For the first and third datasets there are different versions based on
the parameters that are used for the modeling. The results are used for comparison and to
draw conclusions from.
3.3.2 Random Forest
Random forest is a supervised learning algorithm just like SVM. It is a classification and
regression method (Belgiu & Dragut, 2016). The underlying theory is combining a group of decision trees to get a more accurate prediction (Donges, 2018). A random forest does not search for the most important feature while splitting nodes like a normal decision tree does. Instead, it searches for the best feature among a random subset of features. This is done to gain a wide diversity among the trees, which leads to a better model (Breiman, 2001).
This algorithm uses the same three datasets as the SVM algorithm. The VisTex dataset, the
CIFAR-10 base image dataset and the CIFAR-10 dataset of network measures. For the first
and third datasets there are different versions based on the parameters that are used for the
modeling. The results are used for comparison and to draw conclusions from.
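A minimal sketch of this train-and-compare setup in Python with scikit-learn (the thesis uses the caret and randomForest packages in R; the data below is synthetic and stands in for the 8 network measures per image):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Synthetic stand-in for the modeled data: 160 images, 8 network
# measures each, 5 classes of 32 images (the VisTex proportions).
X = rng.normal(size=(160, 8))
y = np.repeat(np.arange(5), 32)
X = X + y[:, None]  # shift each class so the sketch has signal to learn

# 80/20 train/test split with a fixed seed, as in the evaluation setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

results = {}
for model in (SVC(), RandomForestClassifier(random_state=1)):
    # Fit on the train set, then score accuracy on the held-out test set.
    results[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
```

Both classifiers see exactly the same split, so their accuracy scores can be compared directly, which mirrors how the two algorithms are compared in the results chapter.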
4.0 Experimental Setup
The experimental setup first explains the two datasets that are used in the experiment. After
this, the programs that are used are explained, followed by the packages and the
programming of the proposed model.
4.1 Datasets
In this research two different datasets are used. One with the texture-based images that are
needed for the baseline, and the other with non-texture-based images.
4.1.1 Vision Texture
The Vision Texture database (VisTex) was created as an alternative to the Brodatz texture library. This database was used by Scabani et al. (2018) in their research, and they got 99.9% accuracy with their model. To check whether the simplified version of their model used in this research works, this database is also used here to set up a baseline.
In this research five of the reference textures are used for the texture-based images needed for the baseline. The classes used are: brick, clouds, leaves, flowers and food. These classes are selected because of the mix of big textures and especially the smaller textures like flowers and leaves. The idea is that those are closer to the non-texture-based images used in this research and can therefore be used for a better comparison.
For each of the classes two 128x128 pixel images are used. These are split up into 32x32 pixel images, which gives 32 images per class for a total of 160 images. This method of splitting up the images follows the original research of Scabini et al. (2018), since their theory is used here. In that research they split 512x512 pixel images up into 128x128 pixel images. The same 512x512 pixel images are also available as 128x128 pixel images.
The first reason that the lower resolution images have been used is the computational complexity. Since the model used calculates weights between pairs of pixels, each additional pixel complicates the model. Where 32x32 pixel images give networks of 1,024 nodes, 128x128 pixel images give networks of 16,384 nodes, and the number of possible edges grows quadratically with the number of nodes. The second reason is that the CIFAR-10 dataset that is explained next consists of 32x32 pixel images. To be able to better compare the results, the same resolution is used for the two datasets.
In short, the VisTex dataset is used for the texture-based images. It contains 160 images that
are evenly distributed over 5 classes.
4.1.2 CIFAR-10
The second database that is used is the CIFAR-10 dataset. This dataset consists of 60,000
32x32 pixel images divided into 10 different classes. They are labelled subsets of the 80 million tiny images dataset collected by Krizhevsky, Nair, and Hinton. All 10 classes are
used in this study. The classes are the following: airplane, automobile, bird, cat, deer, dog,
frog, horse, ship and truck. The images in this dataset are non-texture-based images. Each
contains certain objects or animals with a background. The objects are pictured from different
sides and in different sizes compared to the surroundings.
From this dataset 500 images are used, 50 for each of the classes. The reason this is done is to
create a balanced dataset that is still small enough to try out different parameters. The images
are kept as 32x32 pixel images, which means networks of 1,024 nodes. This dataset has been used in convolutional neural network research with classification rates of 90% and more (Benenson, 2016). It is interesting for this research because neural network models have been shown to work on it, but this type of complex network modeling has not yet been used.
In short, the CIFAR-10 dataset is used for the non-texture-based images. It contains 10
different classes and 50 images per class.
4.2 Software
Python is used for the CIFAR-10 dataset, specifically the pickle module, to unpickle the CIFAR-10 dataset into a dictionary with the RGB values and labels. From this, the files are stored in a dataframe using Pandas and then saved as a CSV file to easily export it to R-Studio, where the rest of the work is done.
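The unpickling and export step can be sketched as follows; CIFAR-10 batches store each image as a row of 3,072 values (1,024 red, then 1,024 green, then 1,024 blue). The file paths and helper names below are illustrative:

```python
import csv
import pickle

def load_batch(path):
    """Unpickle one CIFAR-10 batch into a dict with b'data' and b'labels'."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")

def batch_to_rows(batch):
    """Turn a batch dict into rows of (label, 3072 channel values)."""
    return [[label, *pixels]
            for label, pixels in zip(batch[b"labels"], batch[b"data"])]

def write_csv(rows, path):
    # One image per line: the label followed by the R, G and B channel values.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Demonstration with a tiny synthetic batch in the same layout as CIFAR-10.
fake_batch = {b"labels": [3], b"data": [list(range(3072))]}
rows = batch_to_rows(fake_batch)
```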
The main software that is used is R-Studio and with that the main programming language
used is R. This software is used to process the images and program the model. The algorithms
that are used are part of R-packages. To get reproducible results a set seed is used.
4.2.1 Packages in R-Studio
This is a list of the packages used with a short description of what they were used for:
randomForest: this is the random forest algorithm, as explained in the method section,
used in this study
dplyr: this is the package that is used for data manipulation of the image data
caret: this package contains several algorithms, one of which is SVM as explained in
the method section. It also contains the confusion matrix function that is used for
evaluation.
e1071: this is an additional package which is needed for the caret package.
igraph: this package is used to create the network from the adjacency matrix that is explained in the model section.
pixmap: the images of the VisTex dataset are ppm images; this package allows opening these in R-Studio.
4.3 Proposed model
This is an explanation of how the model as proposed in the methods is used in R-Studio. The first step of the modeling is looking at the Cartesian coordinates to see if pixels are connected. For this a grid is created where the rows and columns represent the different pixels. Each row and column has 1,024 values, and this creates what is basically an adjacency matrix, with the values being 0 or 1 in this first step. These values are based on the formula given in the method section:
d = sqrt((x1 - x2)^2 + (y1 - y2)^2)
If this Euclidean distance d is smaller than the radius R that is chosen, then the value is 1, otherwise it is 0. This grid is the same for all the images because these are all 32x32 pixels, so they have the same Cartesian coordinates. This does not mean that this grid only has to be calculated once: different values for R can be chosen to change the network by increasing or decreasing the radius in which pixels are connected, and each value of R requires its own grid. In this research the values from 1 to 6 are used for R.
Three grids just like the one in the first step are used for the second step of the modeling. Each of these grids represents a color channel in the RGB system. Instead of 0 or 1, the values in the rows and columns of these three grids are the weights based on the formula given in the methods section. These three grids are then added together and divided by three to get the average weight, and then multiplied with the grid from the first step, as explained in the formula, with b being the grid of zeros and ones.
This makes a single grid out of the four grids. This grid contains all the weights per connection. These are compared with the threshold t: if the weight is bigger than t, the value remains the same. If not, the value becomes 0 to indicate that the pixels are not connected. For t different values are used, from 0.3 to 0.6. This final grid can be used for two things. First, it is possible to count the total connections per pixel to see which pixels are different enough to be important. Secondly, the grid can be used as a weighted adjacency matrix to create a network with the igraph package. This network can then be used to get network measures. In this
research the following network measures are used: density, mean strength, diameter, mean
closeness, mean betweenness, mean degree, standard deviation of degree, and maximum
degree. Below is an example of what this data looks like. This data is then input for the
classification algorithms.
4.4 Evaluation
In this research there are three types of data that are input for the classification algorithms:
the non-texture-based images, the texture-based images modeled as complex networks, and
the non-texture-based images modeled as complex networks.
Each of these are split into a train and a test set with an 80/20 split. This is done randomly
with a set seed. The training and test sets are the same for both the algorithms. This is to be
able to compare the results of the algorithms with each other.
The train set is used as input for the algorithms to create a fit. The fit is then used to predict
the test data. The predicted labels are then compared to the true labels to get a confusion
matrix. The main value of interest in the confusion matrices is the overall accuracy. This
value is used to compare the different types of data with different parameters to each other.
The other values like the positive and negative prediction values can also tell us which
classes perform better than others and point to classes that are of special interest.
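As an illustration of these per-class statistics, the function below derives them from one-vs-rest confusion matrix counts; the example counts are chosen so that the output lines up with the leaves column of the first VisTex table:

```python
def class_stats(tp, fp, fn, tn):
    """Per-class statistics from one-vs-rest confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true class members found
        "specificity": tn / (tn + fp),  # share of non-members rejected
        "ppv": tp / (tp + fp),          # positive prediction value
        "npv": tn / (tn + fn),          # negative prediction value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Example: 3 of 5 leaves found (tp=3, fn=2), plus 1 false alarm
# among the 27 non-leaves in a 32-image test set.
stats = class_stats(tp=3, fp=1, fn=2, tn=26)
```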
5.0 Results
In this chapter the different results are presented with explanations. First discussed is the
modeled data from the VisTex dataset, followed by the unmodeled CIFAR-10 set. These two
combined are the baseline. The last one discussed is the modeled CIFAR-10 dataset. For each of the datasets, the results are split into those from the SVM algorithm and those from the Random Forest algorithm.
5.1 VisTex modeled
The VisTex dataset was mainly used to see if the variables created by the simplified model can be used for the classification of images. The focus is not on trying as many r and t-values as possible to get the best result, but simply on providing proof that the model works on this type of data and on producing comparison material for the non-texture-based images. The t-values are selected so that they hold the middle ground between a network that has many connections and a network that has few connections. The r-values are chosen so that the radius does not encompass too much area of the image.
5.1.1 SVM
In the table below, you can see what the effect of different t and r values is on the accuracy of
the classification for the modeled data.
SVM t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65
r = 2 40.6% 40.6% 71.9% 50% 50%
r = 3 71.9% 81.3% 75% 87.5% 53.1%
r = 4 81.3% 84.4% 72.9% 81.3% 53.1%
r = 5 84.4% 84.4% 78.1% 90.6% 53.1%
These results show that SVM can correctly classify the modeled images by using the network measures. The highest score achieved is 90.6%. The r-values and t-values show that the network should not be too small. A low r-value means that fewer pixels are connected because the radius in which they can be connected is smaller. The table shows that with r=2 there is a drop-off. The same is true for the t-value: a higher t-value means that the threshold for connections is higher, so fewer pixels are connected based on the color differences. There is a drop-off in performance at t=0.65.
Below are statistics from the confusion matrix for the highest accuracy score that explain the
performance further:
Confusion Matrix Statistics
Algorithm SVM
r-value used 5
t-value used 0.6
Overall Accuracy 90.6%
Statistics by Class Brick Cloud Flowers Food Leaves
Sensitivity 100.0% 100.0% 87.5% 100.0% 60.0%
Specificity 100.0% 100.0% 91.7% 100.0% 96.3%
Positive Prediction Value 100.0% 100.0% 77.8% 100.0% 75.0%
Negative Prediction Value 100.0% 100.0% 95.7% 100.0% 92.9%
The first statistic that is important is the overall accuracy. The table shows that the overall
accuracy for this result is 90.6%. The overall accuracy shows in how many instances the
classifier is correct. This shows that the algorithm can classify the images based on the
network measures. Scabani et al. (2018) got 99.8% accuracy in their study. The difference in
performance can have different reasons. The first reason is the model used in this study. It is a
simplified version of the model of Scabani et al. (2018). The second reason is the images
used. In their research 128x128 pixel images are used compared to the 32x32 pixel images
used here. This means that the images here contain less detail and less information.
The overall accuracy is interesting for an overview of the performance. The individual
statistics per class show how the overall accuracy is influenced by the different classes. The
statistics that are selected here show to what extent the algorithm can classify different
classes. When looking at the classes the table shows that three classes have 100%
performance on all the statistics. The other two classes are the ones where the performance
drops.
The sensitivity statistic of the leaves class shows that 60% of the images that are leaves, got
classified as leaves. This means that 40% got classified as something else. The positive
prediction rate shows the accuracy of the classifier on leaves. In 75% of the cases the
algorithm classified images as leaves correctly and in 25% of the cases it classified images as
leaves that are not leaves. This means that the algorithm has difficulties with classifying this
specific class. The same is true for the flowers class to a lesser extent. This difference in
performance compared to the other three classes is the reason that the overall accuracy is
lower. The reason the performance is lower has to do with the texture and the image size. Flowers and leaves are small textures compared to bricks. When this is combined with the low detail in the small images, it leads to an unclear texture that is hard to classify.
problem is further discussed in chapter 6 and is seen in other results as well.
One of the sub-questions is about the performance on texture-based images modeled as
complex networks. The overall accuracy of 90.6% shows this performance and the individual
classes show what influences the performance. The performance on this dataset is high based
on the numbers that are shown above even though some classes show problems.
5.1.2 Random Forest
The SVM algorithm achieved high scores. The table below shows how the Random Forest algorithm performed on the same data:
Random Forest t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65
r = 2 84.4% 87.5% 78.1% 75% 59.4%
r = 3 75% 75% 78.1% 84.4% 59.4%
r = 4 84.4% 90.6% 81.3% 87.5% 43.8%
r = 5 84.4% 87.5% 75% 81.3% 53.1%
These results show that Random Forest can classify texture-based images based on the network measures. Its highest accuracy score is 90.6%. For the t-values the story is the same for Random Forest as with SVM: a higher t-value produces a network that is too small because of the high threshold for connections. The lower r-values do not seem to matter as much with Random Forest based on these results. Below are statistics from the confusion matrix for the highest accuracy score that explain the performance further.
Confusion Matrix Statistics
Algorithm Random Forest
r-value used 4
t-value used 0.4
Overall Accuracy 90.6%
Statistics by Class Brick Cloud Flowers Food Leaves
Sensitivity 100.0% 100.0% 87.5% 100.0% 60.0%
Specificity 100.0% 100.0% 91.7% 96.4% 100.0%
Positive Prediction Value 100.0% 100.0% 77.8% 80.0% 100.0%
Negative Prediction Value 100.0% 100.0% 95.7% 100.0% 93.1%
These statistics show much of the same compared to the SVM results. The overall accuracy is 90.6%, which is the same as the overall accuracy of SVM. This means that the Random Forest algorithm can classify the images based on the network measures, just like the SVM algorithm can.
The individual classes perform similarly compared to the SVM algorithm. A notable difference is that the food class did not get 100% on all the statistics. This class got 80% on the positive prediction rate, which means that some images that are not food got classified as food. It does get 100% on sensitivity, so it classifies all the images that are food as food. The flowers and leaves classes show problems again, and since they both have sensitivities below 100%, some of their images got classified as food. This gives the same information as with the SVM algorithm: the main problem lies with the flowers and leaves classes. The reasons for this problem are the same as for the SVM algorithm, small textures combined with low-detail images.
One of the sub-questions is about the performance on texture-based images modeled as
complex networks. The highest overall accuracy of 90.6% shows a high performance and the
individual classes show what influences this performance. The other accuracies show high
percentages as well depending on the t-value and r-value selected. The performance on
texture-based images modeled as complex networks is quite high as shown by the numbers.
5.1.3 Algorithms compared
One of the sub-questions is about comparing the performance of the two algorithms. Some
preliminary comparisons have been made in the text above. Based on the table below some
more things can be concluded.
SVM t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65 Average %
r = 2 40.6% 40.6% 71.9% 50% 50% 50.6%
r = 3 71.9% 81.3% 75% 87.5% 53.1% 73.8%
r = 4 81.3% 84.4% 72.9% 81.3% 53.1% 74.6%
r = 5 84.4% 84.4% 78.1% 90.6% 53.1% 78.1%
Average % 69.6% 73.7% 74.5% 77.4% 52.3% 69.3%
Random Forest t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65 Average %
r = 2 84.4% 87.5% 78.1% 75% 59.4% 76.9%
r = 3 75% 75% 78.1% 84.4% 59.4% 74.4%
r = 4 84.4% 90.6% 81.3% 87.5% 43.8% 77.5%
r = 5 84.4% 87.5% 75% 81.3% 53.1% 76.3%
Average % 82.1% 85.2% 78.1% 82.1% 53.9% 76.3%
At first glance, the Random Forest seems to perform better overall. When looking at the average performance over all the results, this seems to hold, with 69.3% average accuracy for SVM and 76.3% average accuracy for Random Forest. Comparing all the averages, there is only one where SVM comes out on top: the average for r = 5. Overall the differences are enough to say that the Random Forest algorithm performs better than SVM on this dataset.
5.2 CIFAR-10 unmodeled
For the unmodeled CIFAR-10 dataset there are no special parameters. For each algorithm there is a single confusion matrix and a single score.
5.2.1 SVM
The accuracy score for the SVM algorithm is 28%. Below is the confusion matrix with the
different classes:
Confusion Matrix
Baseline
Algorithm SVM
Overall Accuracy 28.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 25.0% 42.9% 18.2% 0.0% 50.0% 28.6% 7.7% 20.0% 77.8% 27.2%
Specificity 96.4% 94.6% 91.0% 90.2% 78.3% 96.8% 98.9% 94.4% 85.7% 94.4%
Positive Prediction 57.1% 37.5% 20.0% 0.0% 16.7% 40.0% 50.0% 28.6% 35.0% 37.5%
Negative Prediction 87.1% 95.7% 90.0% 91.2% 94.7% 94.7% 87.8% 91.4% 97.5% 91.3%
The overall accuracy is 28%, which means that the algorithm is inaccurate on this dataset: it predicted only 28% of the images correctly and 72% incorrectly. The reason for this low accuracy can be found in the statistics by class.
The overall sensitivity is low, with none of the classes getting higher than 50%. This means
that the algorithm has a hard time classifying a class as the correct one. This can also be seen
in the positive prediction rate percentages of which none get higher than 57.1% with some
going as low as 0%. The differences in percentages seem to indicate that there are classes on which the algorithm works better. When testing this, however, using a binary approach where one class is the positive class and all the others the negative class, it showed that none of the classes performed well.
Looking at the specificity and negative prediction rate, the results seem more positive. However, the odds of randomly predicting that something is not a class are much higher than predicting the correct class when there are this many classes. The combination of low sensitivity and positive prediction rate with the low overall accuracy shows that the predictions are mostly random.
One of the sub-questions is about the performance of classification algorithms on the base non-texture-based image data. The results show that the performance is 28%. This is a low performance, since it means that 72% of the images got classified incorrectly. The reason for the low score is difficult to judge because all the classes have a low performance. The data is RGB-channel information ordered by pixels, and the SVM algorithm is not able to differentiate classes based on that.
5.2.2 Random Forest
The Random Forest algorithm got the same accuracy of 28%, with somewhat different numbers in the confusion matrix, as shown below.
Confusion Matrix
Baseline
Algorithm Random
Forest
Overall Accuracy 28.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 50.0% 57.1% 18.1% 12.5% 25.0% 28.6% 76.9% 0.0% 55.6% 27.2%
Specificity 90.5% 93.6% 92.1% 92.4% 81.5% 91.4% 97.7% 98.9% 90.1% 92.1%
Positive Prediction 50.0% 40.0% 22.2% 12.5% 10.5% 20.0% 33.3% 0.0% 35.7% 30.0%
Negative Prediction 90.5% 96.7% 91.1% 92.4% 92.6% 94.4% 87.6% 89.9% 95.4% 91.1%
This confusion matrix tells a similar story to that of the SVM confusion matrix. The Random Forest algorithm classifies 72% incorrectly. The statistics by class are not the same as the statistics for SVM, but they tell the same story. The sensitivity does go up to 76.9%, which might indicate classes that perform better. The positive prediction rate goes up to 50% and differs between classes. However, when using the binary approach, it shows that none of the classes score noticeably better, with sensitivity statistics not getting above 22%.
Just like with the SVM algorithm the specificity and negative prediction rates are high. The
reason for this is the same. It is easier to randomly predict an image not being a certain class
than the other way around.
One of the questions is about the performance of classification algorithms on the base non-texture-based image data. The results show that the performance is 28%. This is a low performance, since it means that 72% of the images got classified incorrectly. The reason for the low score is difficult to judge because all the classes have a low accuracy; the predictions seem mostly random. The data consists of RGB-channel information ordered by pixels, and the Random Forest algorithm is not able to differentiate classes based on that.
5.2.3 Algorithms compared
One of the questions is how the algorithms compare to each other. It is hard to tell much difference between the two algorithms from the results on the unmodeled non-texture-based images. Both perform about equally in terms of overall accuracy, at 28%. The statistics do differ between classes, but because the accuracy is so low and the predictions are mostly random, it is not possible to compare these statistics meaningfully. Neither algorithm is able to classify images based on the dataset that is used.
5.3 CIFAR-10 modeled as complex networks
The VisTex results show that the model can work. The unmodeled CIFAR-10 results show that the base data is unusable for classification. The modeled data shows whether the model can bring any improvement to that. The t-values and r-values are chosen in the same way as for the VisTex dataset. The t-values are selected so that the network holds the middle ground between too many connections and too few connections. The r-values are chosen so that the radius does not encompass too much area of the image.
5.3.1 SVM
The expectation is that by generalizing the data into complex networks, the results improve
compared to the results on the base data. The table below shows the accuracy scores for the
modeled image data with different t and r values.
SVM t = 0.3 t = 0.4 t = 0.5 t = 0.6
r = 2 15% 9% 11% 9%
r = 3 13% 10% 16% 9%
r = 4 22% 13% 9% 12%
r = 5 16% 17% 11% 17%
r = 6 15% 18% 15% 13%
The table shows that the results are low. The highest score is only 22%, which is lower than the result from the base images. While there are some small differences, these do not tell much because all the scores are so low. These scores show that the predictions are mostly random. Below are statistics from the confusion matrix with the highest accuracy.
Confusion Matrix
Algorithm SVM
r-value used 4
t-value used 0.3
Overall Accuracy 22.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 18.8% 42.9% 18.2% 12.5% 50.0% 0.0% 76.9% 0.0% 44.4% 36.4%
Specificity 91.7% 87.1% 94.4% 96.7% 80.4% 95.7% 98.9% 88.9% 90.1% 91.0%
Positive Prediction 30.0% 20.0% 28.6% 20.0% 18.2% 0.0% 50.0% 0.0% 30.8% 33.3%
Negative Prediction 85.6% 95.3% 90.3% 92.6% 94.9% 92.7% 87.8% 88.9% 94.3% 92.1%
The overall accuracy is 22% which means that 78% is classified incorrectly. This score shows
that the performance is lower than the scores of the base images.
The highest sensitivity is 76.9% for the frog class; after that it drops to 50% for the deer class, and two classes get a 0.0% sensitivity score. The 76.9% looks positive, but comparing it to the positive prediction rate of 50% shows that even this class performs poorly. The positive prediction rates tell a similar story to the sensitivity scores: of the 10 classes, two get a 0% prediction rate and none get higher than 50%. Overall the sensitivity scores give an indication of the inability of the algorithm to correctly classify images.
There are statistics that look positive. The specificity and negative prediction rates are high, with most of the classes scoring around 90%. The reason for this is that it is easy to randomly predict that an image is not a certain class: with 10 classes, such a prediction is correct 90% of the time. This means that a high score on those statistics does not matter when the sensitivity and positive prediction rates are as low as they are here.
One of the questions is what the performance on the modeled dataset is. These results show that the performance is low, even lower than on the base data and much lower than on the modeled VisTex dataset. One of the reasons for the low performance can be found in something that happened with the VisTex dataset: there, two classes performed worse than the others because the details in the images are too small. This is true for the images from the CIFAR-10 dataset as well. The other reason is the content of the images: images within different classes do not differ enough, and images of the same class do not look similar enough. These reasons are further illustrated in the discussion section. The SVM algorithm cannot classify images based on the dataset that is used here.
5.3.2 Random Forest
The Random Forest results show the same story as the results of SVM. As the table below
shows.
Here the highest accuracy score is 19% which is even lower than the best result of SVM.
There are some small differences but nothing that is telling. Below are statistics from the
confusion matrix for the best score:
Confusion Matrix
Algorithm Random
Forest
r-value used 4
t-value used 0.3
Overall Accuracy 19.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 25.0% 28.6% 18.2% 0.0% 25.0% 0.0% 23.1% 0.0% 33.3% 27.3%
Specificity 89.3% 86.0% 88.8% 96.7% 89.1% 92.5% 93.1% 91.1% 92.3% 91.0%
Positive Prediction 30.8% 13.3% 16.7% 0.0% 16.7% 0.0% 33.3% 0.0% 30.0% 27.3%
Negative Prediction 86.2% 94.1% 89.8% 91.8% 93.2% 92.5% 89.0% 89.1% 93.3% 91.0%
The overall accuracy is 19% which is 3% lower than the overall accuracy of SVM. Just like
with SVM the score is lower than the base data and much lower than the modeled VisTex
dataset.
The statistics by class tell a similar story to those of SVM. The sensitivity does not go higher than 33.3% across all the classes, and three classes score 0%. The positive prediction rates are low as well, with none of the classes scoring higher than 33%. The three classes with a positive prediction rate of 0% are the same classes that scored 0% on sensitivity. These statistics give a clear indication that the algorithm cannot classify images correctly based on the dataset and model used here.
The specificity and negative prediction rates are high. The reason for this is the same as with SVM: it is easier to randomly predict that an image does not belong to a certain class, because that prediction is correct for nine out of ten classes.
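The statistics above follow the standard one-vs-rest definitions. As an illustration, the sketch below computes them from a confusion matrix; the function name and the matrix layout are assumptions made for this example, not part of the implementation used in this thesis.

```python
def per_class_stats(confusion):
    """One-vs-rest statistics per class from a confusion matrix.

    confusion[i][j] holds the number of images with true class i
    that were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    stats = []
    for c in range(n):
        tp = confusion[c][c]                                # correctly labeled c
        fn = sum(confusion[c]) - tp                         # class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp    # wrongly labeled c
        tn = total - tp - fn - fp                           # everything else
        stats.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "pos_pred": tp / (tp + fp) if tp + fp else 0.0,
            "neg_pred": tn / (tn + fn) if tn + fn else 0.0,
        })
    return stats
```

With ten roughly balanced classes, the true negatives dominate every one-vs-rest split, which is why specificity and negative prediction stay around 90% even for near-random predictions.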
Accuracy scores of the Random Forest per parameter combination:

Random Forest   t = 0.3   t = 0.4   t = 0.5   t = 0.6
r = 2           14%       14%       12%       11%
r = 3           14%       16%       11%       11%
r = 4           19%       13%       10%       10%
r = 5           15%       10%       12%       15%
r = 6           17%       16%       18%       12%
One of the questions is what the performance on the modeled dataset is. The results show that the performance is low: it is lower than the baseline of the unmodeled CIFAR-10 dataset and much lower than that of the modeled VisTex data. The reasons are in line with what was discussed for the SVM algorithm. The images are too small, which means less information, and the content of the images is not different enough between classes and not similar enough within the same class. These reasons are further explained in the next chapter. The Random Forest cannot classify images based on the model and data used here.
5.3.3 Algorithms compared
One of the research questions is how the algorithms compare to each other. The results are not meaningfully comparable because both algorithms score low. The 3% difference in highest score tells us nothing beyond which algorithm happened to guess correctly more often, because the predictions are essentially random. The underlying problem is that the data provided to the algorithms contains no pattern they can exploit, which makes the scores meaningless in the context of this comparison.
5.4 Summary of the results
Putting everything together gives the following best results for each of the algorithms:
               VisTex   CIFAR-10 Baseline   CIFAR-10 Modeled
SVM            90.6%    28%                 22%
Random Forest  90.6%    28%                 19%
This shows that the scores of the modeled data are much lower than the baseline, which is based on the VisTex results and the unmodeled data. The modeled CIFAR-10 data only reached 19% and 22% accuracy, which is low compared to convolutional neural networks that reach 90% and more on the same dataset (Benenson, 2016).
The main research question is to what extent modeling non-texture-based images as complex networks works for classification. The results show that this method of modeling non-texture-based images as complex networks does not work for classification purposes. The reasons are explained in depth in the next chapter.
6.0 Discussion
This chapter looks back at the results and discusses in depth what the reasons for these results can be and how they connect to each other. It also looks into things that can be improved or that have not been answered yet.
There are two main reasons as to why modeling the non-texture-based images as complex
networks and using the network measures does not work.
The first has to do with the amount of information that an image contains. This research used images of 32x32 pixels, which means that a lot of detail is lost by compressing images to such a small size. With a large texture like bricks this matters less, because the texture is not as detailed. Compare this with an airplane that fills a third of an image and the problem becomes apparent. It also means that the radius within which pixels are connected cannot be too big, because it then encompasses too much of the image when the image is small.
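To make the interaction between the radius, the threshold and the image size concrete, the sketch below builds this kind of pixel graph and returns the degree of every pixel. The exact connection rule is an assumption based on the description of the model in this thesis; the model of Scabini et al. (2018) is more elaborate.

```python
import math

def pixel_graph_degrees(image, r, t):
    """Degree of every pixel in a radius-and-threshold pixel graph.

    image: 2D list of intensities normalized to [0, 1]
    r: radius within which two pixels may be connected
    t: maximum intensity difference allowed for an edge
    """
    h, w = len(image), len(image[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    degrees = {p: 0 for p in pixels}
    for i, (y1, x1) in enumerate(pixels):
        for (y2, x2) in pixels[i + 1:]:
            dist = math.hypot(y1 - y2, x1 - x2)
            # Connect pixels that are close in space and similar in intensity.
            if 0 < dist <= r and abs(image[y1][x1] - image[y2][x2]) <= t:
                degrees[(y1, x1)] += 1
                degrees[(y2, x2)] += 1
    return degrees
```

On a 32x32 image, a radius of 5 or 6 already spans a third of the image width, which shows why the radius cannot grow much further on small images.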
This problem does not only show itself in the CIFAR-10 dataset; the VisTex dataset shows the same problem. The classes that scored worst there are flowers and leaves. Compared to bricks, leaves are much smaller in detail, and the same is true for flowers. This is one of the reasons that the scores for those classes are lower than the rest. Leaves and flowers are also overlapping textures, which further complicates things when images are compressed: the lines between different leaves start to blur and become less clear. For a model that explicitly searches for differing pixels, this is a problem. The blurring that happens when compressing an image makes pixels more similar and differences smaller. This can explain the difference in performance compared to the model of Scabini et al. (2018), who used 128x128 pixel images, which increases the information in an image. Below, both images are shown at the same display size; the left one is 128x128 pixels and the right one 32x32 pixels:
Even though the difference in image quality is visible, it is still clear what the right image shows, because the texture is large. Compare that to a non-texture-based image like the one below, where even a human will have problems recognizing that it is a cat.
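The loss of local contrast through downscaling can be illustrated with a small sketch. The 2x2 average pooling used here is only an assumption about how images are reduced, chosen to show the effect, not the actual resizing method of the datasets.

```python
def avg_pool_2x2(image):
    """Downscale a 2D image by averaging non-overlapping 2x2 blocks."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def max_horizontal_contrast(image):
    """Largest intensity difference between horizontally adjacent pixels."""
    return max(abs(row[x] - row[x + 1])
               for row in image for x in range(len(row) - 1))

# A fine checkerboard: maximal local contrast before pooling.
board = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
```

Before pooling the checkerboard has a local contrast of 1.0; after pooling every block averages to 0.5 and the contrast drops to 0.0. The differing pixels that the model searches for simply disappear.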
It seems plausible that, with a larger image size, the model might work on non-texture-based images. This is, however, where the second reason comes in.
The second reason has to do with the objects in the images. The similarities within classes and the differences between classes are too small. The expectation at the start of the study was that the classes have enough similarities within them, and especially differences between them, to improve classification rates. The idea was that similar networks are created within classes, or at least differing networks between classes. Below are two images that look similar, a third that looks different and a fourth that is even more different. The first three belong to the same class, but even the two that look similar have different backgrounds. The fourth image looks more like the third but belongs to another class.
The model used here is not able to differentiate the different classes enough to classify them
correctly.
The expectation is that for non-texture-based images to work, there must be more information in the image, so a higher pixel count, and the object must be pictured in the same way against the same sort of background. But even then, there are classes that look similar, like the red car and the red boat shown above. To know this for certain, further research is needed. There is also a problem attached to increasing the pixel count: it takes increasingly more computing power and time, since every pixel is a possible node in the network and the values must be calculated for each pixel. Even though this is a simplified model, it still took quite some time to get the different results, which is why only a small part of the dataset was used. The model proposed here must be made more efficient to handle bigger images.
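A rough count of candidate pixel pairs makes this scaling problem concrete. The counting scheme below mirrors the radius-based connection rule; the real cost also depends on the network measures calculated per pair, so this is a lower bound, not a full cost model.

```python
def candidate_pairs(n, r):
    """Number of pixel pairs within Euclidean distance r in an n x n image."""
    count = 0
    for y in range(n):
        for x in range(n):
            for dy in range(0, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx <= 0:
                        continue  # count each unordered pair exactly once
                    if dy * dy + dx * dx > r * r:
                        continue  # outside the connection radius
                    if 0 <= y + dy < n and 0 <= x + dx < n:
                        count += 1
    return count
```

Going from 32x32 to 128x128 pixels multiplies the number of candidate pairs by roughly a factor of sixteen before any network measures are computed, which is why the model needs to be made more efficient for bigger images.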
The question is whether it is worth researching this method further. Even though it might improve classification, this study gives no indication that it will improve enough to compete with other methods that are used for image classification, like convolutional neural networks.
7.0 Conclusion
In this chapter the questions that are stated at the beginning of this thesis are answered. First
the sub-questions are answered, followed by an answer to the main question.
7.1 Performance on texture-based images as complex networks
The first sub-question was:
How do classification algorithms perform on texture-based images modeled as
complex networks?
The SVM algorithm and the Random Forest algorithm both got scores of 90.6%. This was expected, because the more complicated model of Scabini et al. (2018) shows that high accuracy rates can be obtained with this type of model.
The only problems with the classification are the leaves and flowers classes. This has two reasons. The image size used in this research means that the images contain less information, which makes it harder to differentiate the different parts of the texture. The second reason compounds the first: these two classes contain small textures, which are harder to differentiate in the first place.
The conclusion is that classification algorithms show high accuracy rates on texture-based
images that are modeled as complex networks, with the performance being dependent on the
model used but especially the type of images and their size.
7.2 Performance on base non-texture-based images
The second sub-question was:
How do classification algorithms perform on the base non-texture-based images?
The SVM and the Random Forest algorithm both got scores of 28%. All the classes had poor
individual scores when comparing them to the scores on the modeled VisTex dataset. The
sensitivity scores and positive prediction rates are low and show that the algorithms are not
able to classify images correctly based on this dataset. The reason for the low scores is that
the algorithms cannot find a pattern in the way the image data is structured.
The conclusion is that the classification algorithms have a low performance on the base non-
texture-based images.
7.3 Performance on non-texture-based images as complex networks
The third sub-question was:
How do classification algorithms perform on the non-texture-based images modeled
as complex networks?
The SVM algorithm got an accuracy rate of 22% and the highest rate for Random Forest was
19%. Looking at the statistics by class shows that the predictions are mostly random. None of
the classes got acceptable scores when comparing them to the results of the modeled VisTex
dataset. The statistics show that the classification algorithms are not able to correctly classify the images based on the methods used here.
The reason for this is in line with the reason that the flowers and leaves classes of the VisTex dataset perform worse than the other classes. The non-texture-based images contain more varied detail, of which only a part belongs to the object that needs to be classified. An image of a plane contains other details that do not contribute to classification, because they differ within a class. When modeled as complex networks, the images of different classes are too similar and the images within the same class are too different. This problem is compounded by the small size of the images used, which washes out details and makes them harder to differentiate.
This leads to the conclusion that the classification algorithms perform poorly on the non-texture-based images modeled as complex networks.
7.4 Performances of algorithms compared
The fourth sub-question was:
How do the performances of the algorithms compare to each other?
The algorithms scored similarly, as can be seen below:

               VisTex   CIFAR-10 Baseline   CIFAR-10 Modeled
SVM            90.6%    28%                 22%
Random Forest  90.6%    28%                 19%
The highest accuracy rates on the VisTex dataset are the same for both algorithms. Looking at all the results, the Random Forest scores better, with an average accuracy of 76.3% compared to the 69.3% that the SVM algorithm obtained.
For the CIFAR-10 dataset, both modeled and baseline, it is not possible to select one algorithm that performs better than the other. This is mainly because the results are random, and the differences are due to random chance rather than any real difference.
The conclusion is that both algorithms perform similarly on the VisTex dataset, with the Random Forest scoring better on average.
7.5 Performance compared to baseline
The last sub-question was:
How do the performances of the non-texture-based images modeled as complex
networks compare to the baseline?
The baseline is comprised of the scores for the modeled VisTex data and the CIFAR-10 base
data. The non-texture-based images modeled as complex networks have highest accuracy
scores of 19% and 22%. The VisTex dataset got 90.6% accuracy and the unmodeled non-
texture-based images got 28% accuracy.
When comparing the performance of the non-texture-based images modeled as complex
networks to the baseline, the results show that it has a low performance. It scores lower than
the unmodeled data by a small margin. The main difference is with the results from the
VisTex dataset which are much higher than 19 to 22%.
The reason for the performance difference is the type of images. The model is not able to
generalize non-texture-based images to the same degree as it is able to do with texture-based
images.
The conclusion is that the non-texture-based images modeled as complex networks compare
unfavorably with the baseline by a substantial margin.
7.6 Modeling non-texture-based images as complex networks?
These sub-questions lead to the answer to the main question, which was:
To what extent does modeling non-texture-based images as complex networks work
for classification?
The conclusions to the sub-questions show that the non-texture-based images have a low performance compared to the baseline. The scores of 19% to 22% accuracy for the modeled non-texture-based images can be considered low even without comparative material. This leads to the conclusion that modeling non-texture-based images as complex networks does not work for classification purposes with the model used here. The choice of algorithm does not affect this conclusion, as the comparison of the two algorithms showed.
The reason is that the model used here is not able to generalize the images of the CIFAR-10 dataset sufficiently. One issue is that the images are too small and therefore lose too much detail. The other issue is that the images within classes are too different and the images of different classes are too similar.
This does not mean that using complex networks does not work at all for non-texture-based images; it primarily means that this particular way of modeling images as complex networks does not work for these images. Convolutional neural networks also make use of networks to classify images, and those have been shown to work on the CIFAR-10 dataset. Complex networks can still be used to great success, just not in the way they were modeled here. The method needs more research: the results from the VisTex dataset and studies with convolutional neural networks show promise, and other methods might be discovered that work even better.
References
Alapati, N. K., & Sanderson, A. C. (1985, December). Texture Classification Using Multi-
resolution Rotation--Invariant Operators. In Intelligent Robots and Computer Vision IV (Vol.
579, pp. 27-39). International Society for Optics and Photonics.
Anitha, R., & Jyothi, S. (2016, March). A segmentation technique to detect the Alzheimer's
disease using image processing. In Electrical, Electronics, and Optimization Techniques
(ICEEOT), International Conference on (pp. 3800-3801). IEEE.
Barrenas, F., Chavali, S., Holme, P., Mobini, R., & Benson, B. (2009). Supplementary
Material: Network Measures. PloS one, 4(11), e8090.
Benenson, R. (2016). What is the class of this image? Discover the current state of the art in
objects classification. From:
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#434946
41522d3130
Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
Cambridge. (n.d.). Introduction to Network Theory. From:
https://www.cl.cam.ac.uk/teaching/1011/PrincComm/slides/graph_theory_1-11.pdf
Chitre, Y., & Dhawan, A. P. (1999). M-band wavelet discrimination of natural
textures. Pattern Recognition, 32(5), 773-789.
Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L., & Schmidhuber, J. (2011, July).
Flexible, high performance convolutional neural networks for image classification. In IJCAI
Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p.
1237).
Cohen, F. S., Fan, Z., & Patel, M. A. (1991). Classification of rotated and scaled textured
images using Gaussian Markov random field models. IEEE Transactions on Pattern Analysis
& Machine Intelligence, (2), 192-202.
Costenbader, E., & Valente, T. W. (2003). The stability of centrality measures when networks
are sampled. Social networks, 25(4), 283-307.
Davis, L. S. (1981). Polarogram: A new tool for image texture analysis. Pattern Recognition, 13.
Donges, N. (2018). The Random Forest Algorithm. From:
https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd
Goyal, R. K., Goh, W. L., Mital, D. P., & Chan, K. L. (1995). Scale and rotation invariant
texture analysis based on structural property. In Industrial Electronics, Control, and
Instrumentation, 1995., Proceedings of the 1995 IEEE IECON 21st International Conference
on (Vol. 2, pp. 1290-1294). IEEE.
Gupta, P. (2017). Decision trees in machine learning. From:
https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052
Hafner, J., Sawhney, H. S., Equitz, W., Flickner, M., & Niblack, W. (1995). Efficient color
histogram indexing for quadratic form distance functions. IEEE transactions on pattern
analysis and machine intelligence, 17(7), 729-736.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector
classification.
Joachims, T. (1998, April). Text categorization with support vector machines: Learning with
many relevant features. In European conference on machine learning (pp. 137-142).
Springer, Berlin, Heidelberg.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems (pp.
1097-1105).
Lam, W. K., & Li, C. K. (1997). Rotated texture classification by improved iterative
morphological decomposition. IEE Proceedings-Vision, Image and Signal
Processing, 144(3), 171-179.
Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp.
971-980).
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures
with classification based on featured distributions. Pattern recognition, 29(1), 51-59.
Ray, S. (2017). Understanding Support Vector Machine Algorithm. From:
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-
example-code/
Rouse, M. (2017). Image Recognition. From:
https://searchenterpriseai.techtarget.com/definition/image-recognition
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: uses
and interpretations. Neuroimage, 52(3), 1059-1069.
Scabini, L. F., Condori, R. H., Gonçalves, W. N., & Bruno, O. M. (2018). Multilayer
Complex Network Descriptors for Color-Texture Characterization. arXiv preprint
arXiv:1804.00501.
Silva, T. C., & Zhao, L. (2016). Machine learning in complex networks (Vol. 1). Springer
International Publishing.
Stanford. (n.d.). Convolutional Neural Network. From:
http://deeplearning.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
Strogatz, S. H. (2001). Exploring complex networks. nature, 410(6825), 268.
Van de Wouwer, G., Scheunders, P., Livens, S., & Van Dyck, D. (1999). Wavelet correlation
signatures for color texture characterization. Pattern recognition, 32(3), 443-451.
Zhang, J., & Tan, T. (2002). Brief review of invariant texture analysis methods. Pattern
recognition, 35(3), 735-747.