Complex networks for image classification
Maarten Baijens ANR: 722956
Tilburg University School of Humanities and Digital Sciences
Department of Cognitive Science & Artificial Intelligence Tilburg, The Netherlands
January 2019
THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE IN COMMUNICATION AND INFORMATION
SCIENCES, MASTER TRACK DATA SCIENCE BUSINESS & GOVERNANCE, AT THE
SCHOOL OF HUMANITIES AND DIGITAL SCIENCES OF TILBURG UNIVERSITY
Thesis committee:
Dr. Martin Atzmueller
Dr. Grzegorz Chrupala
Abstract
Complex networks have shown success in improving classification rates for images.
Particularly, research on texture-based images has shown that modeling images as complex networks can help identify them. This research examines the use of these networks for non-texture-based images. The main research question is to what extent modeling images as complex networks works for classification.
This research uses a simplified model based on the multilayer complex network descriptors that Scabini, Condori, Goncalves & Bruno (2018) created in their study. They successfully used these network descriptors on texture-based images. This research investigates whether a simplified model also works on non-texture-based images.
Two datasets are used to find an answer to the main question. The first is the Vision Texture
database. This dataset contains texture-based images that are used to see whether the
simplified model works on texture-based images. The second set is the CIFAR-10 dataset.
This set contains non-texture-based images that are used to see whether using complex
networks works on those types of images.
The results show that the model works for texture-based images: the two algorithms used achieved accuracy scores of up to 90.6% on these images. The non-texture-based images modeled as complex networks scored considerably lower, with accuracies no higher than 22%.
There are two reasons for these results. Firstly, the images used have a low resolution, which means there is less information available for the modeling. Secondly, the non-texture-based images do not differ enough from each other, which means the classification algorithms cannot differentiate the classes.
The conclusion is that modeling non-texture-based images as complex networks with this
model does not work.
Table of Contents
Abstract
1.0 Introduction
    1.1 Scientific relevance
    1.2 Practical relevance
    1.3 Research questions
    1.4 Findings
2.0 Related Work
    2.1 Image recognition
    2.2 Texture analysis
        2.2.1 Statistical methods
        2.2.2 Model based methods
        2.2.3 Structural methods
        2.2.4 Color texture analysis
    2.4 Complex Networks
    2.5 Convolutional Neural Networks
    2.6 Multilayer Complex Network Descriptors
3.0 Methods
    3.1 Image processing
    3.2 Modeling as Complex Networks
    3.3 Classification algorithms
        3.3.1 Support Vector Machines
        3.3.2 Random Forest
4.0 Experimental Setup
    4.1 Datasets
        4.1.1 Vision Texture
        4.1.2 CIFAR-10
    4.2 Software
        4.2.1 Packages in R-Studio
    4.3 Proposed model
    4.4 Evaluation
5.0 Results
    5.1 VisTex modeled
        5.1.1 SVM
        5.1.2 Random Forest
        5.1.3 Algorithms compared
    5.2 CIFAR-10 unmodeled
        5.2.1 SVM
        5.2.2 Random Forest
        5.2.3 Algorithms compared
    5.3 CIFAR-10 modeled as complex networks
        5.3.1 SVM
        5.3.2 Random Forest
        5.3.3 Algorithms compared
    5.4 Summary of the results
6.0 Discussion
7.0 Conclusion
    7.1 Performance on texture-based images as complex networks
    7.2 Performance on base non-texture-based images
    7.3 Performance on non-texture-based images as complex networks
    7.4 Performances of algorithms compared
    7.5 Performance compared to baseline
    7.6 Modeling non-texture-based images as complex networks?
References
1.0 Introduction
Image recognition is a popular topic not only in scientific research but also in mainstream media. Through the years, the methods used to recognize images have evolved and the algorithms keep advancing. Methods like grey-scale texture analysis and color distribution analysis have been used successfully to classify images. Because of the increase in computational power over the years, these methods have evolved into more demanding algorithms that can be more accurate. However, the error rate of image recognition is in many cases still significantly higher than that of humans (Kovalevsky, 2012).
One of the most recent approaches uses networks in images for classification. Convolutional Neural Networks achieve high accuracies on image sets and show much promise (Krizhevsky, Sutskever & Hinton, 2012). Recent research (Scabini, Condori, Goncalves & Bruno, 2018) uses complex networks, in the form of multilayer complex network descriptors, to improve image classification; in some cases they perform even better than Convolutional Neural Networks.
This all leads to a field in which a lot is still to be learned and discovered. This study takes a
further look into using complex networks to model images.
1.1 Scientific relevance
The idea of image recognition by machine learning has been a popular research topic for many years. There are, however, still new methods to be discovered and old methods to be improved on. In recent years the use of networks has shown promise as one of the best methods currently available. Especially Convolutional Neural Networks show high accuracies in classifying images (Krizhevsky et al., 2012). More recent research shows complex networks as another efficient way to classify images.
To further the knowledge in this area, this research builds on the existing research of Scabini et al. (2018) to examine the use of complex networks and how they can be used to classify other images. Scabini et al. (2018) used complex networks in their research to improve the classification of textures. Because of this, their research was primarily focused on texture-based images such as wood and brick. The improvements that this use of complex networks brings could also work on other types of images. This research looks at whether the basis of their modeling works on images that are based not on textures but on real-life objects. Real-life objects also have certain textures and patterns, so complex networks could improve classification of these objects. A simplified model is used compared to the model of Scabini et al. (2018).
These findings can further image recognition as a research area. This research gives insight into the current state of image recognition methods and adds to the knowledge that is available.
1.2 Practical relevance
Image recognition is widespread these days. Everyone can use Google's reverse image search to find out what an image shows. Facebook can predict which people are in a photograph. This means that improvements or alternative methods are useful for practical reasons, since they impact the lives of many people.
Image recognition is also used in more critical technology. Imaging in the medical field is used to diagnose diseases; Alzheimer's, for instance, can be detected from images of brain scans (Anitha & Jyothi, 2016). In industry, image recognition can make inspecting machinery easier: images can be fed into an algorithm to check for problems instead of having a person look at them. It can also be used in sensor technology and a whole catalog of other areas that make use of image data.
1.3 Research questions
This research looks at patterns in non-texture-based images. The patterns used are based on network theory, specifically on complex networks as proposed by Scabini et al. (2018), who showed that complex networks work on texture-based images. This leads to the following main research question:
To what extent does modeling non-texture-based images as complex networks work
for classification?
To answer this question, this research looks at different aspects. A baseline must be formed with which the performance can be compared.
The first part of the baseline is the performance of the model on texture-based images. These are images based on a single texture, like brick or wood, as opposed to non-texture-based images, which do not have just one texture, such as an image of a train and its surroundings. The second part of the baseline is the performance on the base non-texture-based images. Different algorithms are used; these need to be compared to each other.
This leads to the following sub-questions:
How do classification algorithms perform on texture-based images modeled as
complex networks?
How do classification algorithms perform on the base non-texture-based images?
How do classification algorithms perform on the non-texture-based images modeled
as complex networks?
How do the performances of the algorithms compare to each other?
How do the performances of the non-texture-based images modeled as complex
networks compare to the baseline?
1.4 Findings
The main findings of the research are found in the table below. This table contains the highest
accuracy rates for each of the used methods and datasets.
                  VisTex    CIFAR-10 Baseline    CIFAR-10 Modeled
SVM               90.6%     28%                  22%
Random Forest     90.6%     28%                  19%
The results show that the algorithms achieved high accuracies on the texture-based images modeled as complex networks, represented in the VisTex column, named after the dataset used to get those results. They show low accuracies on the base non-texture-based images, the column called CIFAR-10 Baseline. Finally, they show even lower accuracies on the non-texture-based images modeled as complex networks, represented in the last column. The differences between the algorithms are small, and both are unable to classify non-texture-based images with the modeled dataset as input.
2.0 Related Work
In this chapter the related work is discussed. Image recognition as a topic is discussed first. After that, different methods that have been used in the past are discussed in broad terms, with some examples to show the ideas behind them. This leads to networks being used in image classification and to the theory of multilayer complex network descriptors on which this research is mainly based.
2.1 Image recognition
Image recognition is the ability of an algorithm to identify something in an image. This can
be an object, a place, people or even writing (Rouse, 2017). Human brains can recognize
these objects easily, but computers have a much harder time with this task. Image recognition
is basically pattern recognition for images. There are many methods used for the purposes of image recognition; different classification algorithms can be used, for instance. The problem with most classification algorithms, however, is that they are not accurate on raw image data. The data first needs to be transformed, and different methods are used for this purpose. In the next paragraphs, the methods that are in line with the one used in this study are explained.
2.2 Texture analysis
Texture analysis is an active research topic in which textures in images are studied in order to recognize them. Textures in images have been studied extensively, and different schemes of analysis have been proposed (Wouwer, Scheunders, Livens & Dyck, 1997). The schemes all have in common that they look at spatial interactions between the different pixels in an image. The goal of texture analysis in general is to look at texture aspects in images and to use them for classification purposes (Scabini et al., 2018).
2.2.1 Statistical methods
Statistical methods in texture analysis use a collection of statistics of selected features (Zhang & Tan, 2002). This is because the human visual system uses statistical features for texture discrimination. The statistics include first-order, second-order and higher-order statistics. Some examples of statistical methods are polarograms, harmonic expansion and the feature distribution method. To further illustrate statistical methods, a couple of them are explained.
Polarograms are polar plots of texture statistics described as a function of orientation (Davis, 1981). Polar plots are plots in which coordinates are written as the distance between a point and the origin of the plot. From a polarogram, texture features are derived based on its size and shape (Zhang & Tan, 2002). The shape depends on the boundary of the polarogram but also on the position of the origin. These features are then used for classification, with experiments achieving 75% to 90% classification rates.
In the harmonic expansion approach (Alpati & Sanderson, 1985), an image is decomposed into a combination of harmonic components in a polar form (Zhang & Tan, 2002). This projection of the original image in harmonic form yields features of the pattern. Experiments gave classification rates of 90% and better.
The feature distribution method (Ojala, Pietikainen & Harwood, 1996) bases features on
center-symmetric auto-correlation, local binary pattern and gray-level difference to describe
textures in images (Zhang & Tan, 2002).
2.2.2 Model based methods
Model based methods model a texture image as a probability model or as a linear combination of a set of basis functions (Zhang & Tan, 2002). These models can be used to calculate coefficients, which can then be used to classify the image. Some examples of model-based methods are SAR models, the Markov model and the wavelet transform. To further illustrate model-based methods, a couple of them are explained.
SAR stands for simultaneous autoregressive model (Zhang & Tan, 2002). It looks at the gray level of pixels in textured images. This leads to the following model definition:

    f(s) = u + Σ_{r∈ω} θ(r) f(s + r) + ε(s)

In this model, f(s) is the grey level of the pixel at site s and ω is the set of neighbors of that pixel. ε(s) is a Gaussian random variable, u is the bias independent of the mean gray value of the image, and θ(r) are the model parameters, which can be used as the texture features (Zhang & Tan, 2002). This model has been used for classification but also for segmentation and synthesis. Other models have been derived from it, such as the CSAR and RISAR models.
The Markov model (Cohen, Fan & Patel, 1991) models texture as Gaussian Markov random fields and uses maximum likelihood to estimate coefficients and rotation angles (Zhang & Tan, 2002). The model is problematic in that it is both computationally intensive and highly nonlinear (Zhang & Tan, 2002).
The wavelet transform is a model for texture discrimination (Chitre & Dhawan, 1999). It decomposes a texture image into frequency channels, which are created such that they have narrower bandwidths at the lower frequencies. This makes it especially useful for smooth textures, because the information of such textures is concentrated in the lower frequencies (Zhang & Tan, 2002).
2.2.3 Structural methods
Structural methods view texture as consisting of many textural elements that are arranged according to placement rules (Zhang & Tan, 2002). This is because humans strongly perceive the structural properties of textures. Some examples of structural methods are the invariant histogram and morphological decomposition. To further illustrate structural methods, a couple of them are explained.
The invariant histogram (Goyal, Goh, Mital & Chan, 1995) is a useful method for texture analysis. The histogram is created from texture elements and can in turn be used for texture characterization. It is based on a weighted frequency f_w, which combines a_i, the area of a texel with area-index i, and n_i, the number of elements with index i (Zhang & Tan, 2002). Experiments with this method achieved a 95% classification rate.
Morphological decomposition (Lam & Li, 1997) decomposes a texture into a scale-dependent set of component images. For each of these components, statistical features are obtained. Experiments with this method have achieved a 97.5% correct classification rate on images (Zhang & Tan, 2002).
2.2.4 Color texture analysis
The three types of methods explained above are based on grey-scale images. This leaves out an important cue that humans use to recognize images, namely the color information. Because of this, there are also methods that use color. The most basic way of using color is to make a histogram in which the colors are binned in a certain way. Comparing the histograms shows which images are similar in color (Hafner, Sawhney, Equitz, Flicker & Niblack, 1995). The disadvantage of this method is that the spatial information is lost. One way to overcome this disadvantage is to look at images as networks, which is the direction this research takes.
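The color-binning idea can be made concrete with a minimal sketch, given here in Python for illustration (the thesis's own software is R; the bin count and the histogram-intersection measure are arbitrary choices for this sketch):

```python
# Minimal sketch of color-histogram comparison (illustrative only).
# Each pixel is an (R, G, B) tuple; colors are binned per channel and
# histograms are compared with histogram intersection.

def color_histogram(pixels, bins=4):
    """Bin each RGB channel into `bins` ranges and count joint occurrences."""
    width = 256 // bins                      # size of one bin per channel
    hist = {}
    for r, g, b in pixels:
        key = (r // width, g // width, b // width)
        hist[key] = hist.get(key, 0) + 1
    return hist

def histogram_intersection(h1, h2):
    """Similarity: sum of the minimum counts over all shared bins."""
    return sum(min(h1.get(k, 0), h2.get(k, 0)) for k in set(h1) | set(h2))

img_a = [(250, 10, 10), (240, 20, 5)]        # two reddish pixels
img_b = [(245, 15, 12), (10, 10, 250)]       # one reddish, one bluish pixel
sim = histogram_intersection(color_histogram(img_a), color_histogram(img_b))
```

Note that the histogram keeps no record of where in the image a color occurred, which is exactly the loss of spatial information described above.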
2.4 Complex Networks
As said before, texture analysis looks at the spatial interactions between different pixels. This gives way to a new way of thinking based on networks, which likewise look at interactions between different nodes. This leads to complex networks. These networks come from an interdisciplinary research area in which physics, mathematics, biology, computer science and other disciplines look at a wide variety of complex network structures (Silva & Zhao, 2016). This is done to understand interwoven systems and their complexity. Structure affects function (Strogatz, 2001), so the structure of an image affects what it shows and how it can be classified. This structure can be described in the form of a complex network.
These network structures form patterns and so can be used for pattern recognition (Scabini et al., 2018). A network consists of vertices and edges; networks can be weighted or unweighted, and edges can be directed or undirected.
Measured quantities are needed to characterize a network structure (Barrenas, Chavali, Holme, Mobini & Benson, 2009). Network measures are elements of networks that can be represented in different ways and can distinguish two networks from each other (Rubinov & Sporns, 2010). Network measures are used in this research to summarize images based on their complex networks; the ones used are explained in the methods section.
2.5 Convolutional Neural Networks
Proof that recognition through networks works for images is found in the accuracy that Convolutional Neural Networks achieve on image classification tasks. CNNs are among the best-performing algorithms when it comes to image classification (Krizhevsky, Sutskever & Hinton, 2012).
CNNs are hierarchical neural networks that use the 2-D structure of an image to classify it (Ciresan, Meier, Masci, Gambardella & Schmidhuber, 2011). CNNs consist of different layers. The first is an optional image processing layer, in which predefined filters can be applied; this can give extra input next to the raw image data. The next layer is the convolutional layer, which creates the feature maps that form the basis for the classification. The last layer is the classification layer, in which a class is assigned to the object on the basis of the created features.
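The core operation of the convolutional layer, sliding a small filter over the 2-D pixel grid, can be sketched as follows (Python, illustrative only; the filter values are hypothetical):

```python
# Minimal sketch of a 2-D convolution as used in a convolutional layer
# (valid padding, stride 1, single channel, no bias or activation).

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge filter applied to an image half dark (0), half bright (9):
image = [[0, 0, 9, 9]] * 4
kernel = [[-1, 1]]        # responds where intensity jumps left-to-right
feature_map = convolve2d(image, kernel)
```

The feature map responds strongly only at the column where the intensity changes, which is the kind of local pattern a trained CNN filter picks up.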
2.6 Multilayer Complex Network Descriptors
Based on complex networks, Scabini et al. (2018) proposed a new technique to model and characterize color texture. Each color channel is mapped as a layer in the network. The vertices in this network are the pixels of the image, with one vertex layer for each color channel; with RGB, for instance, these are the red, green and blue channels. Whether or not two vertices are connected is based on their Euclidean distance. By looking at the connected vertices, the different edges can be composed. After this first step, every vertex has the same number of connections, except those on the borders. This forms a regular network. For it to become a complex network, another step must be taken: since a texture consists of patterns of intensity variation, the approach is to cut network connections while keeping the connections between distinct vertices.
The weight of a connection between two vertices is a function of the following quantities: p(v), the intensity value in a given color channel; d(vi, vj), the Euclidean distance between the two pixels; L, the maximum value of the color channels; and r, the radius within which pixels are connected in the cartesian space.
How the theory of multilayer complex network descriptors described here is used to create complex networks from images is explained in the methods section. This research shows whether complex networks can be used on images that are not texture-based, as the images in the original research are all textures, like brick and wood, rather than real-world objects. It also looks at different classification algorithms to see whether modeling images as complex networks can improve the accuracy of those algorithms.
3.0 Methods
Different methods are used to get to the required results. The images are processed first. Afterwards, the images are turned into complex networks, which in turn are the input for two classification algorithms.
3.1 Image processing
The images are processed to form rows in a dataframe, with each row being an image. Each row contains the color information based on the RGB color channels: the first values are the red values, then come the green values and finally the blue values. Within each channel, the first value represents the first pixel of the top row of the image, the second value the second pixel of the top row, and so on. After a row of pixels is finished, the sequence continues with the next row of pixels until the whole image is processed. For example, an image of 4x4 pixels is processed into a single row of 48 values: the 16 red values in pixel order, followed by the 16 green values and the 16 blue values.
The labels of the images are then added to the dataframe with each row getting the label
corresponding to the image. These rows can be used for the modeling that is needed to turn
the images into complex networks. This step is explained in chapter 3.2.
The advantage of processing the images and transforming them into numbers in a dataframe is twofold. Firstly, it is easier to do calculations on the numbers than on the image file directly. Secondly, this transformation makes it easier to feed the data into the two classification algorithms used in this research.
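The flattening described above can be sketched as follows (Python, for illustration; the thesis's own processing is done in R, and the function name and toy pixel values are hypothetical):

```python
# Sketch of the processing step: flatten an RGB image into one row with
# all red values first, then all green, then all blue (row-major pixel order).

def image_to_row(image):
    """`image` is a list of pixel rows; each pixel is an (R, G, B) tuple."""
    pixels = [p for row in image for p in row]            # row-major order
    return ([p[0] for p in pixels]                        # red channel
            + [p[1] for p in pixels]                      # green channel
            + [p[2] for p in pixels])                     # blue channel

# A 2x2 toy image (hypothetical values):
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (10, 20, 30)]]
row = image_to_row(img)   # 12 values: 4 red, 4 green, 4 blue
```

Stacking one such row per image, with the class label appended, yields the dataframe used in the rest of the pipeline.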
3.2 Modeling as Complex Networks
The modeling of the data revolves around creating a complex network from the processed image data described above. For each image, each pixel is a possible node in a network. This model is based on the work of Scabini et al. (2018), who used two steps to create the edges between different nodes. The first is to look at the spatial features of the pixels, specifically their coordinates in cartesian space. The second is to compare the different pixels to each other based on the color channels and connect the ones that differ from each other. After this, they use the network to create different subnets, compute topological measures on these subnets, and convert the measures into RGB images. An example of the images they recreate this way can be seen below (Scabini et al., 2018).
In this research a simpler approach is used. The steps in the modeling process are based on the first steps of the research of Scabini et al. (2018). Here the complex networks are used to obtain variables based on network measures, and these measures are used as input for the classification algorithms.
The first step is to calculate which pixels are connected within a certain radius. A pixel is connected to another pixel if that pixel lies within the radius R; with a small radius, for instance, the first pixel of the first row is connected to the second pixel of the first row and to the first and second pixels of the second row. To calculate this, the following formula is used:

    d = √((x1 − x2)² + (y1 − y2)²)

In this formula, x1 and x2 are the x-coordinates and y1 and y2 the y-coordinates of the two pixels being compared. The resulting distance d is compared with the radius R: if d is smaller than the radius, the pixels are connected; if it is bigger, they are not.
The second step is to compare the pixels based on the color data, using the following formula:

    w = |v1 − v2| / L

In this formula, v1 is the color value of the first pixel and v2 the color value of the second pixel. L is the maximum value the color channel can have; for RGB this is 255. This formula is applied to each of the channels, so with RGB it is used for red, green and blue.
The third step is to add these per-channel values together and divide by three to get the average. This average is then multiplied by b, which is 1 or 0 depending on whether the pixels are connected based on the radius. This leads to a weight c, as shown in the formula below:

    c = b × (wR + wG + wB) / 3

Here wR, wG and wB are the per-channel differences from the second step for the red, green and blue channels. The final step is to compare this weight with t, a number between 0 and 1 that sets the threshold for the connections. A higher weight indicates pixels that differ strongly from each other. So, if c is bigger than t, the pixels are connected to each other, with c as the weight of the connection; if not, the value is 0 because they are not connected.
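The four steps above can be sketched for a single pair of pixels as follows (Python, illustrative only; the thesis implementation is in R, and the function name and the default values for R and t are hypothetical):

```python
import math

# Sketch of the four modeling steps for one pair of pixels (RGB, L = 255):
# 1) spatial check within radius R, 2) per-channel color difference / L,
# 3) average over the three channels times the spatial indicator,
# 4) keep the edge only if the weight exceeds the threshold t.

def edge_weight(p1, p2, rgb1, rgb2, R=2.0, t=0.3, L=255):
    # Step 1: pixels are connected only if their Euclidean distance < R.
    d = math.sqrt((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2)
    b = 1 if d < R else 0
    # Step 2: normalized color difference per channel, |v1 - v2| / L.
    diffs = [abs(v1 - v2) / L for v1, v2 in zip(rgb1, rgb2)]
    # Step 3: average over the channels, multiplied by the indicator b.
    c = b * sum(diffs) / len(diffs)
    # Step 4: threshold - keep the edge only if the pixels differ enough.
    return c if c > t else 0

# Two neighboring pixels with maximally different colors keep their edge:
w = edge_weight((0, 0), (1, 0), (255, 255, 255), (0, 0, 0))
```

Applying this to every pixel pair of an image yields the weighted edge list of that image's complex network.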
This gives a network with the pixels as nodes and the edges based on the calculations explained above. From this network, different variables can be extracted. The first variable used is the total number of edges per pixel. This indicates the pixels that are important within the image because they differ a lot from their surroundings; the edge of an object differs strongly from its background, which makes it an important pixel within an image. The processed original data has as many values per pixel as there are color channels. After the modeling there is only one value per pixel, so for RGB the number of values is divided by three.
Some preliminary results showed that the total number of edges per pixel is not enough to correctly classify the images used. To solve this, the model can be used differently: other variables can be extracted from it on top of the total edges per pixel.
Networks have measures that can be used to describe the network and its characteristics. The measures used in this research are explained below:
Degree: the node degree is the number of edges that are connected to the node (Rubinov & Sporns, 2010). Here the mean degree, the standard deviation of the degree and the maximum degree are used. The mean degree is the mean over the degrees of all nodes. The standard deviation is the deviation within those degrees, and the maximum degree is the number of edges of the node with the highest degree.
Strength: the node strength is the sum of the weights of the edges connected to that node. In this study the mean strength is used, which is the mean over the strengths of all nodes (Rubinov & Sporns, 2010).
Density: this is a number which is calculated by dividing the number of connections
by the number of possible connections (Rubinov & Sporns, 2010).
Diameter: this is the longest of the shortest paths between the different nodes (Rubinov & Sporns, 2010).
Betweenness: this is the number of shortest paths that pass through a node (Rubinov
& Sporns, 2010).
Closeness: this is the average number of steps it takes for a node to reach all other
nodes in the network (Costenbader & Valente, 2003).
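As a sketch of how such measures can be extracted, the snippet below computes them for a small weighted adjacency matrix with the networkx library in Python (the thesis uses the igraph package in R; the matrix values here are illustrative):

```python
import networkx as nx
import numpy as np

# A small symmetric weighted adjacency matrix standing in for the pixel grid.
A = np.array([[0.0, 0.8, 0.5, 0.0],
              [0.8, 0.0, 0.0, 0.6],
              [0.5, 0.0, 0.0, 0.9],
              [0.0, 0.6, 0.9, 0.0]])
G = nx.from_numpy_array(A)

degrees = [d for _, d in G.degree()]                   # edges per node
strengths = [s for _, s in G.degree(weight="weight")]  # summed edge weights
measures = {
    "mean_degree": np.mean(degrees),
    "sd_degree": np.std(degrees),
    "max_degree": max(degrees),
    "mean_strength": np.mean(strengths),
    "density": nx.density(G),
    "diameter": nx.diameter(G),  # longest of the shortest paths
    "mean_betweenness": np.mean(list(nx.betweenness_centrality(G).values())),
    "mean_closeness": np.mean(list(nx.closeness_centrality(G).values())),
}
```

One such row of measures per image is what the classification algorithms described below receive as input.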
These measures further reduce the variables to only those based on the measurements. The theory is that similar images have similar measures for their respective complex networks. These different outputs of the model are input for the classification algorithms that are described below, and their results are compared to the original processed data.
3.3 Classification algorithms
After modeling the images as complex networks, two classification algorithms are used to see
if the images modeled as complex networks can be used to classify the images correctly and
to what extent they can do that compared to a baseline. These classification algorithms are
SVM and Random Forest. Both classification algorithms are used on the texture-based
images modeled as complex networks, the base non-texture-based images and finally the
non-texture-based images modeled as complex networks. This gives the opportunity to
compare the performance of the baselines to the non-texture-based images modeled as
complex networks.
3.3.1 Support Vector Machines
One classification algorithm used is SVM (Support Vector Machines). SVMs are based on the structural risk minimization principle from computational learning theory (Joachims, 1998). This principle looks for a hypothesis with the lowest true error. SVMs are universal learners: they use a linear threshold function as their basis, but they can be modified in various ways, for example to be used as polynomial classifiers (Joachims, 1998).
The goal of SVM is to create a model based on training data which can then predict values of
target data (Hsu, Chang & Lin, 2003). It is a supervised machine learning algorithm and it is
mostly used for classification (Ray, 2017). Each data item is plotted as a point in an n-dimensional space, and classification is done by finding the hyperplane that best separates the different classes. One of the advantages of SVMs is their ability to generalize well with many features. This is especially important in this research because of the large number of features the images used here have.
The algorithm is used with three different datasets. The first is the VisTex dataset that has been modeled and consists of network measures. The second is the base image data from the
CIFAR-10 dataset. The third and last is the CIFAR-10 dataset made up of network measures
of the modeled images. For the first and third datasets there are different versions based on
the parameters that are used for the modeling. The results are used for comparison and to
draw conclusions from.
3.3.2 Random Forest
Random forest is a supervised learning algorithm just like SVM. It is a classification and
regression method (Belgiu & Dragut, 2016). The underlying theory is combining a group of decision trees to get a more accurate prediction (Donges, 2018). A random forest does not search for the most important feature while splitting nodes like a normal decision tree does. Instead, it searches for the best feature among a random subset of features. This is done to gain a wide diversity among the trees, which leads to a better model (Breiman, 2001).
This algorithm uses the same three datasets as the SVM algorithm. The VisTex dataset, the
CIFAR-10 base image dataset and the CIFAR-10 dataset of network measures. For the first
and third datasets there are different versions based on the parameters that are used for the
modeling. The results are used for comparison and to draw conclusions from.
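A minimal sketch of this train-and-compare setup in Python with scikit-learn (the thesis uses the caret and randomForest packages in R; the data below is synthetic and stands in for the 8 network measures per image):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Synthetic stand-in for the modeled data: 160 images, 8 network
# measures each, 5 classes of 32 images (the VisTex proportions).
X = rng.normal(size=(160, 8))
y = np.repeat(np.arange(5), 32)
X = X + y[:, None]  # shift each class so the sketch has signal to learn

# 80/20 train/test split with a fixed seed, as in the evaluation setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

results = {}
for model in (SVC(), RandomForestClassifier(random_state=1)):
    # Fit on the train set, then score accuracy on the held-out test set.
    results[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
```

Both classifiers see exactly the same split, so their accuracy scores can be compared directly, which mirrors how the two algorithms are compared in the results chapter.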
4.0 Experimental Setup
The experimental setup first explains the two datasets that are used in the experiment. After
this, the programs that are used are explained, followed by the packages and the
programming of the proposed model.
4.1 Datasets
In this research two different datasets are used. One with the texture-based images that are
needed for the baseline, and the other with non-texture-based images.
4.1.1 Vision Texture
The Vision Texture database (VisTex) was created as an alternative to the Brodatz texture library. This database was used by Scabani et al. (2018) in their research, and they got 99.9% accuracy with their model. To check whether the simplified version of their model used in this research works, this database is also used here to set up a baseline.
In this research five of the reference textures are used for the texture-based images needed for the baseline. The classes used are: brick, clouds, leaves, flowers and food. These classes are selected because of the mix of big textures and especially the smaller textures like flowers and leaves. The idea is that those are closer to the non-texture-based images used in this research and can therefore be used for a better comparison.
For each of the classes two 128x128 pixel images are used. These are split up into 32x32 pixel images, which gives 32 images per class for a total of 160 images. This method of splitting up the images follows the original research of Scabini et al. (2018), since their theory is used here. In that research they split 512x512 pixel images up into 128x128 pixel images. The same 512x512 pixel images are also available as 128x128 pixel images.
The first reason that the lower resolution images have been used is the computational complexity. Since the model used calculates weights between pairs of pixels, each additional pixel complicates the model. Where 32x32 pixel images give networks of 1,024 nodes, 128x128 pixel images give networks of 16,384 nodes, and the number of possible edges grows quadratically with the number of nodes. The second reason is that the CIFAR-10 dataset that is explained next consists of 32x32 pixel images. To be able to better compare the results, the same resolution is used for the two datasets.
In short, the VisTex dataset is used for the texture-based images. It contains 160 images that
are evenly distributed over 5 classes.
4.1.2 CIFAR-10
The second database that is used is the CIFAR-10 dataset. This dataset consists of 60,000
32x32 pixel images divided into 10 different classes. They are labelled subsets of the 80 million tiny images dataset collected by Krizhevsky, Nair, and Hinton. All 10 classes are
used in this study. The classes are the following: airplane, automobile, bird, cat, deer, dog,
frog, horse, ship and truck. The images in this dataset are non-texture-based images. Each
contains certain objects or animals with a background. The objects are pictured from different
sides and in different sizes compared to the surroundings.
From this dataset 500 images are used, 50 for each of the classes. The reason this is done is to
create a balanced dataset that is still small enough to try out different parameters. The images
are kept as 32x32 pixel images, which means networks of 1,024 nodes. This dataset has been used in convolutional neural network research with classification rates of 90% and more (Benenson, 2016). It is interesting for this research because neural network models have been shown to work on it, but this type of complex network modeling has not yet been used.
In short, the CIFAR-10 dataset is used for the non-texture-based images. It contains 10
different classes and 50 images per class.
4.2 Software
Python is used for the CIFAR-10 dataset, specifically the pickle module, to unpickle the CIFAR-10 dataset into a dictionary with the RGB values and labels. From this, the files are stored in a dataframe using Pandas and then saved as a CSV file to easily export it to R-Studio, where the rest of the work is done.
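The unpickling and export step can be sketched as follows; CIFAR-10 batches store each image as a row of 3,072 values (1,024 red, then 1,024 green, then 1,024 blue). The file paths and helper names below are illustrative:

```python
import csv
import pickle

def load_batch(path):
    """Unpickle one CIFAR-10 batch into a dict with b'data' and b'labels'."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")

def batch_to_rows(batch):
    """Turn a batch dict into rows of (label, 3072 channel values)."""
    return [[label, *pixels]
            for label, pixels in zip(batch[b"labels"], batch[b"data"])]

def write_csv(rows, path):
    # One image per line: the label followed by the R, G and B channel values.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Demonstration with a tiny synthetic batch in the same layout as CIFAR-10.
fake_batch = {b"labels": [3], b"data": [list(range(3072))]}
rows = batch_to_rows(fake_batch)
```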
The main software that is used is R-Studio and with that the main programming language
used is R. This software is used to process the images and program the model. The algorithms
that are used are part of R-packages. To get reproducible results a set seed is used.
4.2.1 Packages in R-Studio
This is a list of the packages used with a short description of what they were used for:
randomForest: this is the random forest algorithm, as explained in the method section,
used in this study
dplyr: this is the package that is used for data manipulation of the image data
caret: this package contains several algorithms, one of which is SVM as explained in
the method section. It also contains the confusion matrix function that is used for
evaluation.
e1071: this is an additional package which is needed for the caret package.
igraph: this package is used to create the network from the adjacency matrix that is explained in the model section.
pixmap: the images of the VisTex dataset are ppm images; this package allows opening these in R-Studio.
4.3 Proposed model
This is an explanation of how the model as proposed in the methods is used in R-Studio. The first step of the modeling is looking at the Cartesian coordinates to see if pixels are connected. For this a grid is created where the rows and columns represent the different pixels. Each row and column has 1,024 values, and this creates what is basically an adjacency matrix, with the values being 0 or 1 in this first step. These values are based on the formula given in the method section:
d = sqrt((x1 - x2)^2 + (y1 - y2)^2)
If this Euclidean distance d is smaller than the radius R that is chosen, then the value is 1, otherwise it is 0. This grid is the same for all the images because these are all 32x32 pixels, so they have the same Cartesian coordinates. This does not mean that this grid only has to be calculated once: different values for R can be chosen to change the network by increasing or decreasing the radius in which pixels are connected, and each value of R requires its own grid. In this research the values from 1 to 6 are used for R.
Three grids just like the one in the first step are used for the second step of the modeling. Each of these grids represents a color channel in the RGB system. Instead of 0 or 1, the values in the rows and columns of these three grids are the weights based on the formula given in the methods section. These three grids are then added together and divided by three to get the average weight, and then multiplied with the grid from the first step, as explained in the formula, with b being the grid of zeros and ones.
This makes a single grid out of the four grids. This grid contains all the weights per connection. These are compared with the threshold t: if the weight is bigger than t, the value remains the same. If not, the value becomes 0 to indicate that the pixels are not connected. For t different values are used, from 0.3 to 0.6. This final grid can be used for two things. First, it is possible to count the total connections per pixel to see which pixels are different enough to be important. Secondly, the grid can be used as a weighted adjacency matrix to create a network with the igraph package. This network can then be used to get network measures. In this
research the following network measures are used: density, mean strength, diameter, mean
closeness, mean betweenness, mean degree, standard deviation of degree, and maximum
degree. Below is an example of what this data looks like. This data is then input for the
classification algorithms.
4.4 Evaluation
In this research there are three types of data that are input for the classification algorithms:
the non-texture-based images, the texture-based images modeled as complex networks, and
the non-texture-based images modeled as complex networks.
Each of these are split into a train and a test set with an 80/20 split. This is done randomly
with a set seed. The training and test sets are the same for both the algorithms. This is to be
able to compare the results of the algorithms with each other.
The train set is used as input for the algorithms to create a fit. The fit is then used to predict
the test data. The predicted labels are then compared to the true labels to get a confusion
matrix. The main value of interest in the confusion matrices is the overall accuracy. This
value is used to compare the different types of data with different parameters to each other.
The other values like the positive and negative prediction values can also tell us which
classes perform better than others and point to classes that are of special interest.
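As an illustration of these per-class statistics, the function below derives them from one-vs-rest confusion matrix counts; the example counts are chosen so that the output lines up with the leaves column of the first VisTex table:

```python
def class_stats(tp, fp, fn, tn):
    """Per-class statistics from one-vs-rest confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true class members found
        "specificity": tn / (tn + fp),  # share of non-members rejected
        "ppv": tp / (tp + fp),          # positive prediction value
        "npv": tn / (tn + fn),          # negative prediction value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Example: 3 of 5 leaves found (tp=3, fn=2), plus 1 false alarm
# among the 27 non-leaves in a 32-image test set.
stats = class_stats(tp=3, fp=1, fn=2, tn=26)
```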
5.0 Results
In this chapter the different results are presented with explanations. First discussed is the
modeled data from the VisTex dataset, followed by the unmodeled CIFAR-10 set. These two
combined are the baseline. The last one discussed is the modeled CIFAR-10 dataset. For each of the datasets, the results are split into those from the SVM algorithm and those from the Random Forest algorithm.
5.1 VisTex modeled
The VisTex dataset was mainly used to see if the variables created by the simplified model can be used for the classification of images. The focus is not on trying as many r and t-values as possible to get the best result, but simply on providing proof that the model works on this type of data and on producing comparison material for the non-texture-based images. The t-values are selected so that they hold the middle ground between a network that has many connections and a network that has few connections. The r-values are chosen so that the radius does not encompass too much area of the image.
5.1.1 SVM
In the table below, you can see what the effect of different t and r values is on the accuracy of
the classification for the modeled data.
SVM t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65
r = 2 40.6% 40.6% 71.9% 50% 50%
r = 3 71.9% 81.3% 75% 87.5% 53.1%
r = 4 81.3% 84.4% 72.9% 81.3% 53.1%
r = 5 84.4% 84.4% 78.1% 90.6% 53.1%
These results show that SVM can correctly classify the modeled images by using the network measures. The highest score achieved is 90.6%. The r-values and t-values show that the network should not be too small. A low r-value means that fewer pixels are connected because the radius in which they can be connected is smaller. The table shows that with r=2 there is a drop-off. The same is true for the t-value: a higher t-value means that the threshold for connections is higher, so fewer pixels are connected based on the color differences. There is a drop-off in performance at t=0.65.
Below are statistics from the confusion matrix for the highest accuracy score that explain the
performance further:
Confusion Matrix Statistics
Algorithm SVM
r-value used 5
t-value used 0.6
Overall Accuracy 90.6%
Statistics by Class Brick Cloud Flowers Food Leaves
Sensitivity 100.0% 100.0% 87.5% 100.0% 60.0%
Specificity 100.0% 100.0% 91.7% 100.0% 96.3%
Positive Prediction Value 100.0% 100.0% 77.8% 100.0% 75.0%
Negative Prediction Value 100.0% 100.0% 95.7% 100.0% 92.9%
The first statistic that is important is the overall accuracy. The table shows that the overall
accuracy for this result is 90.6%. The overall accuracy shows in how many instances the
classifier is correct. This shows that the algorithm can classify the images based on the
network measures. Scabani et al. (2018) got 99.8% accuracy in their study. The difference in
performance can have different reasons. The first reason is the model used in this study. It is a
simplified version of the model of Scabani et al. (2018). The second reason is the images
used. In their research 128x128 pixel images are used compared to the 32x32 pixel images
used here. This means that the images here contain less detail and less information.
The overall accuracy is interesting for an overview of the performance. The individual
statistics per class show how the overall accuracy is influenced by the different classes. The
statistics that are selected here show to what extent the algorithm can classify different
classes. When looking at the classes the table shows that three classes have 100%
performance on all the statistics. The other two classes are the ones where the performance
drops.
The sensitivity statistic of the leaves class shows that 60% of the images that are leaves, got
classified as leaves. This means that 40% got classified as something else. The positive
prediction rate shows the accuracy of the classifier on leaves. In 75% of the cases the
algorithm classified images as leaves correctly and in 25% of the cases it classified images as
leaves that are not leaves. This means that the algorithm has difficulties with classifying this
specific class. The same is true for the flowers class to a lesser extent. This difference in
performance compared to the other three classes is the reason that the overall accuracy is
lower. The reason the performance is lower has to do with the texture and the image size. Flowers and leaves are small textures compared to bricks. When this is combined with the low detail in the small images, it leads to an unclear texture that is hard to classify.
problem is further discussed in chapter 6 and is seen in other results as well.
One of the sub-questions is about the performance on texture-based images modeled as
complex networks. The overall accuracy of 90.6% shows this performance and the individual
classes show what influences the performance. The performance on this dataset is high based
on the numbers that are shown above even though some classes show problems.
5.1.2 Random Forest
The SVM algorithm achieved high scores. The table below shows how the Random Forest algorithm performed on the same data:
Random Forest t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65
r = 2 84.4% 87.5% 78.1% 75% 59.4%
r = 3 75% 75% 78.1% 84.4% 59.4%
r = 4 84.4% 90.6% 81.3% 87.5% 43.8%
r = 5 84.4% 87.5% 75% 81.3% 53.1%
These results show that Random Forest can classify texture-based images based on the network measures. Its highest accuracy score is 90.6%. For the t-values the story is the same for Random Forest as with SVM: a higher t-value produces a network that is too small because of the high threshold for connections. The lower r-values do not seem to matter as much with Random Forest based on these results. Below are statistics from the confusion matrix for the highest accuracy score that explain the performance further.
Confusion Matrix Statistics
Algorithm Random Forest
r-value used 4
t-value used 0.4
Overall Accuracy 90.6%
Statistics by Class Brick Cloud Flowers Food Leaves
Sensitivity 100.0% 100.0% 87.5% 100.0% 60.0%
Specificity 100.0% 100.0% 91.7% 96.4% 100.0%
Positive Prediction Value 100.0% 100.0% 77.8% 80.0% 100.0%
Negative Prediction Value 100.0% 100.0% 95.7% 100.0% 93.1%
These statistics show much of the same compared to the SVM results. The overall accuracy is 90.6%, which is the same as the overall accuracy of SVM. This means that the Random Forest algorithm can classify the images based on the network measures, just like the SVM algorithm can.
The individual classes perform similarly compared to the SVM algorithm. A notable difference is that the food class did not get 100% on all the statistics. This class got 80% on the positive prediction rate, which means that some images that are not food got classified as food. It does get 100% on sensitivity, so it classifies all the images that are food as food. The flowers and leaves classes show problems again, and since they both have sensitivities below 100%, some of their images got classified as food. This gives the same information as with the SVM algorithm: the main problem lies with the flowers and leaves classes. The reasons for this problem are the same as for the SVM algorithm, small textures combined with low-detail images.
One of the sub-questions is about the performance on texture-based images modeled as
complex networks. The highest overall accuracy of 90.6% shows a high performance and the
individual classes show what influences this performance. The other accuracies show high
percentages as well depending on the t-value and r-value selected. The performance on
texture-based images modeled as complex networks is quite high as shown by the numbers.
5.1.3 Algorithms compared
One of the sub-questions is about comparing the performance of the two algorithms. Some
preliminary comparisons have been made in the text above. Based on the table below some
more things can be concluded.
SVM t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65 Average %
r = 2 40.6% 40.6% 71.9% 50% 50% 50.6%
r = 3 71.9% 81.3% 75% 87.5% 53.1% 73.8%
r = 4 81.3% 84.4% 72.9% 81.3% 53.1% 74.6%
r = 5 84.4% 84.4% 78.1% 90.6% 53.1% 78.1%
Average % 69.6% 73.7% 74.5% 77.4% 52.3% 69.3%
Random Forest t = 0.3 t = 0.4 t = 0.55 t = 0.6 t = 0.65 Average %
r = 2 84.4% 87.5% 78.1% 75% 59.4% 76.9%
r = 3 75% 75% 78.1% 84.4% 59.4% 74.4%
r = 4 84.4% 90.6% 81.3% 87.5% 43.8% 77.5%
r = 5 84.4% 87.5% 75% 81.3% 53.1% 76.3%
Average % 82.1% 85.2% 78.1% 82.1% 53.9% 76.3%
At first glance, the Random Forest seems to perform better overall. When looking at the average performance over all the results, this seems to hold, with 69.3% average accuracy for SVM and 76.3% average accuracy for Random Forest. Comparing all the averages, there is only one where SVM comes out on top: the average for r = 5. Overall the differences are enough to say that the Random Forest algorithm performs better than SVM on this dataset.
5.2 CIFAR-10 unmodeled
For the unmodeled CIFAR-10 dataset there are no special parameters. For each algorithm there is a single confusion matrix and a single score.
5.2.1 SVM
The accuracy score for the SVM algorithm is 28%. Below is the confusion matrix with the
different classes:
Confusion Matrix
Baseline
Algorithm SVM
Overall Accuracy 28.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 25.0% 42.9% 18.2% 0.0% 50.0% 28.6% 7.7% 20.0% 77.8% 27.2%
Specificity 96.4% 94.6% 91.0% 90.2% 78.3% 96.8% 98.9% 94.4% 85.7% 94.4%
Positive Prediction 57.1% 37.5% 20.0% 0.0% 16.7% 40.0% 50.0% 28.6% 35.0% 37.5%
Negative Prediction 87.1% 95.7% 90.0% 91.2% 94.7% 94.7% 87.8% 91.4% 97.5% 91.3%
The overall accuracy is 28%, which means that the algorithm is inaccurate on this dataset: it predicted only 28% of the images correctly and 72% incorrectly. The reason for this low accuracy can be found in the statistics by class.
The overall sensitivity is low, with none of the classes getting higher than 50%. This means
that the algorithm has a hard time classifying a class as the correct one. This can also be seen
in the positive prediction rate percentages of which none get higher than 57.1% with some
going as low as 0%. The differences in percentages seem to indicate that there are classes on which the algorithm works better. When testing this, however, using a binary approach where one class is the positive class and all the others the negative class, it showed that none of the classes performed well.
Looking at the specificity and negative prediction rate, the results seem more positive. However, the odds of randomly predicting that something is not a class are much higher than predicting the correct class when there are this many classes. The combination of low sensitivity and positive prediction rate with the low overall accuracy shows that the predictions are mostly random.
One of the sub-questions is about the performance of classification algorithms on the base non-texture-based image data. The results show that the performance is 28%. This is a low performance, since it means that 72% of the images got classified incorrectly. The reason for the low score is difficult to judge because all the classes have a low performance. The data is RGB-channel information ordered by pixels, and the SVM algorithm is not able to differentiate classes based on that.
5.2.2 Random Forest
The Random Forest algorithm got the same accuracy of 28%, with somewhat different numbers in the confusion matrix, as shown below.
Confusion Matrix
Baseline
Algorithm Random
Forest
Overall Accuracy 28.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 50.0% 57.1% 18.1% 12.5% 25.0% 28.6% 76.9% 0.0% 55.6% 27.2%
Specificity 90.5% 93.6% 92.1% 92.4% 81.5% 91.4% 97.7% 98.9% 90.1% 92.1%
Positive Prediction 50.0% 40.0% 22.2% 12.5% 10.5% 20.0% 33.3% 0.0% 35.7% 30.0%
Negative Prediction 90.5% 96.7% 91.1% 92.4% 92.6% 94.4% 87.6% 89.9% 95.4% 91.1%
This confusion matrix tells a similar story to that of the SVM confusion matrix. The Random Forest algorithm classifies 72% incorrectly. The statistics by class are not the same as the statistics for SVM, but they tell the same story. The sensitivity does go up to 76.9%, which might indicate classes that perform better. The positive prediction rate goes up to 50% and differs between classes. However, when using the binary approach, it shows that none of the classes score noticeably better, with sensitivity statistics not getting above 22%.
Just like with the SVM algorithm the specificity and negative prediction rates are high. The
reason for this is the same. It is easier to randomly predict an image not being a certain class
than the other way around.
One of the questions is about the performance of classification algorithms on the base non-texture-based image data. The results show that the performance is 28%. This is a low performance, since it means that 72% of the images got classified incorrectly. The reason for the low score is difficult to judge because all the classes have a low accuracy; the predictions seem mostly random. The data consists of RGB-channel information ordered by pixels, and the Random Forest algorithm is not able to differentiate classes based on that.
5.2.3 Algorithms compared
One of the questions is how the algorithms compare to each other. It is hard to tell much difference between the two algorithms from the results on the unmodeled non-texture-based images. Both perform about equally in terms of overall accuracy, at 28%. The statistics do differ between classes, but because the accuracy is so low and the predictions are mostly random, it is not possible to compare these statistics meaningfully. Neither algorithm is able to classify images based on the dataset that is used.
5.3 CIFAR-10 modeled as complex networks
The VisTex results show that the model can work. The unmodeled CIFAR-10 results show that the base data is unusable for classification. The modeled data shows whether the model can bring any improvement to that. The t-values and r-values are chosen in the same way as for the VisTex dataset. The t-values are selected so that the network holds the middle ground between too many connections and too few connections. The r-values are chosen so that the radius does not encompass too much area of the image.
5.3.1 SVM
The expectation is that by generalizing the data into complex networks, the results improve
compared to the results on the base data. The table below shows the accuracy scores for the
modeled image data with different t and r values.
SVM t = 0.3 t = 0.4 t = 0.5 t = 0.6
r = 2 15% 9% 11% 9%
r = 3 13% 10% 16% 9%
r = 4 22% 13% 9% 12%
r = 5 16% 17% 11% 17%
r = 6 15% 18% 15% 13%
The table shows that the results are low. The highest score is only 22%, which is lower than the result from the base images. While there are some small differences, these do not tell much because all the scores are so low. These scores show that the predictions are mostly random. Below are statistics from the confusion matrix with the highest accuracy.
Confusion Matrix
Algorithm SVM
r-value used 4
t-value used 0.3
Overall Accuracy 22.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 18.8% 42.9% 18.2% 12.5% 50.0% 0.0% 76.9% 0.0% 44.4% 36.4%
Specificity 91.7% 87.1% 94.4% 96.7% 80.4% 95.7% 98.9% 88.9% 90.1% 91.0%
Positive Prediction 30.0% 20.0% 28.6% 20.0% 18.2% 0.0% 50.0% 0.0% 30.8% 33.3%
Negative Prediction 85.6% 95.3% 90.3% 92.6% 94.9% 92.7% 87.8% 88.9% 94.3% 92.1%
The overall accuracy is 22% which means that 78% is classified incorrectly. This score shows
that the performance is lower than the scores of the base images.
The highest sensitivity is 76.9% for the frog class; after that it drops to 50% for the deer class, and two classes get a 0.0% sensitivity score. The 76.9% looks positive, but comparing it to the positive prediction rate of 50% shows that even this class performs poorly. The positive prediction rates tell a similar story to the sensitivity scores: of the 10 classes, two get a 0% prediction rate and none get higher than 50%. Overall the sensitivity scores give an indication of the inability of the algorithm to correctly classify images.
There are statistics that look positive. The specificity and negative prediction rates are high, with most of the classes scoring around 90%. The reason for this is that it is easy to randomly predict that an image is not a certain class: with 10 classes, such a prediction is correct 90% of the time. This means that a high score on those statistics does not matter when the sensitivity and positive prediction rates are as low as they are here.
One of the questions is what the performance on the modeled dataset is. These results show that the performance is low, even lower than on the base data and much lower than on the modeled VisTex dataset. One of the reasons for the low performance can be found in something that happened with the VisTex dataset: there, two classes performed worse than the others because the details in the images are too small. This is true for the images from the CIFAR-10 dataset as well. The other reason is the content of the images: images within different classes do not differ enough, and images of the same class do not look similar enough. These reasons are further illustrated in the discussion section. The SVM algorithm cannot classify images based on the dataset that is used here.
5.3.2 Random Forest
The Random Forest results show the same story as the results of SVM. As the table below
shows.
Here the highest accuracy score is 19% which is even lower than the best result of SVM.
There are some small differences but nothing that is telling. Below are statistics from the
confusion matrix for the best score:
Confusion Matrix
Algorithm Random
Forest
r-value used 4
t-value used 0.3
Overall Accuracy 19.0%
Statistics by Class Airplane Auto Bird Cat Deer Dog Frog Horse Ship Truck
Sensitivity 25.0% 28.6% 18.2% 0.0% 25.0% 0.0% 23.1% 0.0% 33.3% 27.3%
Specificity 89.3% 86.0% 88.8% 96.7% 89.1% 92.5% 93.1% 91.1% 92.3% 91.0%
Positive Prediction 30.8% 13.3% 16.7% 0.0% 16.7% 0.0% 33.3% 0.0% 30.0% 27.3%
Negative Prediction 86.2% 94.1% 89.8% 91.8% 93.2% 92.5% 89.0% 89.1% 93.3% 91.0%
The overall accuracy is 19% which is 3% lower than the overall accuracy of SVM. Just like
with SVM the score is lower than the base data and much lower than the modeled VisTex
dataset.
The statistics by class tell a similar story to those of SVM. The sensitivity does not go higher than 33.3% across all the classes, and three classes score 0%. The positive prediction rates are low as well, with none of the classes scoring higher than 33%. The three classes with a positive prediction rate of 0% are the same classes that scored 0% on sensitivity. These statistics give a clear indication that the algorithm cannot classify images correctly based on the dataset and model used here.
The specificity and negative prediction rates are high. The reason for this is the same as with SVM: it is easier to randomly predict that an image does not belong to a certain class, because that prediction is correct for nine out of ten classes.
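The statistics above follow the standard one-vs-rest definitions. As an illustration, the sketch below computes them from a confusion matrix; the function name and the matrix layout are assumptions made for this example, not part of the implementation used in this thesis.

```python
def per_class_stats(confusion):
    """One-vs-rest statistics per class from a confusion matrix.

    confusion[i][j] holds the number of images with true class i
    that were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    stats = []
    for c in range(n):
        tp = confusion[c][c]                                # correctly labeled c
        fn = sum(confusion[c]) - tp                         # class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp    # wrongly labeled c
        tn = total - tp - fn - fp                           # everything else
        stats.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "pos_pred": tp / (tp + fp) if tp + fp else 0.0,
            "neg_pred": tn / (tn + fn) if tn + fn else 0.0,
        })
    return stats
```

With ten roughly balanced classes, the true negatives dominate every one-vs-rest split, which is why specificity and negative prediction stay around 90% even for near-random predictions.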
Accuracy scores of the Random Forest per parameter combination:

Random Forest   t = 0.3   t = 0.4   t = 0.5   t = 0.6
r = 2           14%       14%       12%       11%
r = 3           14%       16%       11%       11%
r = 4           19%       13%       10%       10%
r = 5           15%       10%       12%       15%
r = 6           17%       16%       18%       12%
One of the questions is what the performance on the modeled dataset is. The results show that the performance is low: it is lower than the baseline of the unmodeled CIFAR-10 dataset and much lower than that of the modeled VisTex data. The reasons are in line with what was discussed for the SVM algorithm. The images are too small, which means less information, and the content of the images is not different enough between classes and not similar enough within the same class. These reasons are further explained in the next chapter. The Random Forest cannot classify images based on the model and data used here.
5.3.3 Algorithms compared
One of the research questions is how the algorithms compare to each other. The results are not meaningfully comparable because both algorithms score low. The 3% difference in highest score tells us nothing beyond which algorithm happened to guess correctly more often, because the predictions are essentially random. The underlying problem is that the data provided to the algorithms contains no pattern they can exploit, which makes the scores meaningless in the context of this comparison.
5.4 Summary of the results
Putting everything together gives the following best results for each of the algorithms:
               VisTex   CIFAR-10 Baseline   CIFAR-10 Modeled
SVM            90.6%    28%                 22%
Random Forest  90.6%    28%                 19%
This shows that the scores of the modeled data are much lower than the baseline, which is based on the VisTex results and the unmodeled data. The modeled CIFAR-10 data only reached 19% and 22% accuracy, which is low compared to convolutional neural networks that reach 90% and more on the same dataset (Benenson, 2016).
The main research question is to what extent modeling non-texture-based images as complex networks works for classification. The results show that this method of modeling non-texture-based images as complex networks does not work for classification purposes. The reasons are explained in depth in the next chapter.
6.0 Discussion
This chapter looks back at the results and discusses in depth what the reasons for these results can be and how they connect to each other. It also looks into things that can be improved or that have not been answered yet.
There are two main reasons as to why modeling the non-texture-based images as complex
networks and using the network measures does not work.
The first has to do with the amount of information that an image contains. This research used images of 32x32 pixels, which means that a lot of detail is lost by compressing images to such a small size. With a large texture like bricks this matters less, because the texture is not as detailed. Compare this with an airplane that fills a third of an image and the problem becomes apparent. It also means that the radius within which pixels are connected cannot be too big, because it then encompasses too much of the image when the image is small.
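To make the interaction between the radius, the threshold and the image size concrete, the sketch below builds this kind of pixel graph and returns the degree of every pixel. The exact connection rule is an assumption based on the description of the model in this thesis; the model of Scabini et al. (2018) is more elaborate.

```python
import math

def pixel_graph_degrees(image, r, t):
    """Degree of every pixel in a radius-and-threshold pixel graph.

    image: 2D list of intensities normalized to [0, 1]
    r: radius within which two pixels may be connected
    t: maximum intensity difference allowed for an edge
    """
    h, w = len(image), len(image[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    degrees = {p: 0 for p in pixels}
    for i, (y1, x1) in enumerate(pixels):
        for (y2, x2) in pixels[i + 1:]:
            dist = math.hypot(y1 - y2, x1 - x2)
            # Connect pixels that are close in space and similar in intensity.
            if 0 < dist <= r and abs(image[y1][x1] - image[y2][x2]) <= t:
                degrees[(y1, x1)] += 1
                degrees[(y2, x2)] += 1
    return degrees
```

On a 32x32 image, a radius of 5 or 6 already spans a third of the image width, which shows why the radius cannot grow much further on small images.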
This problem does not only show itself in the CIFAR-10 dataset; the VisTex dataset shows the same problem. The classes that scored worst there are flowers and leaves. Compared to bricks, leaves are much smaller in detail, and the same is true for flowers. This is one of the reasons that the scores for those classes are lower than the rest. Leaves and flowers are also overlapping textures, which further complicates things when images are compressed: the lines between different leaves start to blur and become less clear. For a model that explicitly searches for differing pixels, this is a problem. The blurring that happens when compressing an image makes pixels more similar and differences smaller. This can explain the difference in performance compared to the model of Scabini et al. (2018), who used 128x128 pixel images, which increases the information in an image. Below, both images are shown at the same display size; the left one is 128x128 pixels and the right one 32x32 pixels:
Even though the difference in image quality is visible, it is still clear what the right image shows, because the texture is large. Compare that to a non-texture-based image like the one below, where even a human will have problems recognizing that it is a cat.
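The loss of local contrast through downscaling can be illustrated with a small sketch. The 2x2 average pooling used here is only an assumption about how images are reduced, chosen to show the effect, not the actual resizing method of the datasets.

```python
def avg_pool_2x2(image):
    """Downscale a 2D image by averaging non-overlapping 2x2 blocks."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def max_horizontal_contrast(image):
    """Largest intensity difference between horizontally adjacent pixels."""
    return max(abs(row[x] - row[x + 1])
               for row in image for x in range(len(row) - 1))

# A fine checkerboard: maximal local contrast before pooling.
board = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
```

Before pooling the checkerboard has a local contrast of 1.0; after pooling every block averages to 0.5 and the contrast drops to 0.0. The differing pixels that the model searches for simply disappear.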
It seems plausible that, with a larger image size, the model might work on non-texture-based images. This is, however, where the second reason comes in.
The second reason has to do with the objects in the images. The similarities within classes and the differences between classes are too small. The expectation at the start of the study was that the classes have enough similarities within them, and especially differences between them, to improve classification rates. The idea was that similar networks are created within classes, or at least differing networks between classes. Below are two images that look similar, a third that looks different and a fourth that is even more different. The first three belong to the same class, but even the two that look similar have different backgrounds. The fourth image looks more like the third but belongs to another class.
The model used here is not able to differentiate the different classes enough to classify them
correctly.
The expectation is that for non-texture-based images to work, there must be more information in the image, so a higher pixel count, and the object must be pictured in the same way against the same sort of background. But even then, there are classes that look similar, like the red car and the red boat shown above. To know this for certain, further research is needed. There is also a problem attached to increasing the pixel count: it takes increasingly more computing power and time, since every pixel is a possible node in the network and the values must be calculated for each pixel. Even though this is a simplified model, it still took quite some time to get the different results, which is why only a small part of the dataset was used. The model proposed here must be made more efficient to handle bigger images.
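A rough count of candidate pixel pairs makes this scaling problem concrete. The counting scheme below mirrors the radius-based connection rule; the real cost also depends on the network measures calculated per pair, so this is a lower bound, not a full cost model.

```python
def candidate_pairs(n, r):
    """Number of pixel pairs within Euclidean distance r in an n x n image."""
    count = 0
    for y in range(n):
        for x in range(n):
            for dy in range(0, r + 1):
                for dx in range(-r, r + 1):
                    if dy == 0 and dx <= 0:
                        continue  # count each unordered pair exactly once
                    if dy * dy + dx * dx > r * r:
                        continue  # outside the connection radius
                    if 0 <= y + dy < n and 0 <= x + dx < n:
                        count += 1
    return count
```

Going from 32x32 to 128x128 pixels multiplies the number of candidate pairs by roughly a factor of sixteen before any network measures are computed, which is why the model needs to be made more efficient for bigger images.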
The question is whether it is worth researching this method further. Even though it might improve classification, this study gives no indication that it will improve enough to compete with other methods that are used for image classification, like convolutional neural networks.
7.0 Conclusion
In this chapter the questions that are stated at the beginning of this thesis are answered. First
the sub-questions are answered, followed by an answer to the main question.
7.1 Performance on texture-based images as complex networks
The first sub-question was:
How do classification algorithms perform on texture-based images modeled as
complex networks?
The SVM algorithm and the Random Forest algorithm both got scores of 90.6%. This was expected, because the more complicated model of Scabini et al. (2018) shows that high accuracy rates can be obtained with this type of model.
The only problems with the classification are the leaves and flowers classes. This has two reasons. The image size used in this research means that the images contain less information, which makes it harder to differentiate the different parts of the texture. The second reason compounds the first: these two classes contain small textures, which are harder to differentiate in the first place.
The conclusion is that classification algorithms show high accuracy rates on texture-based
images that are modeled as complex networks, with the performance being dependent on the
model used but especially the type of images and their size.
7.2 Performance on base non-texture-based images
The second sub-question was:
How do classification algorithms perform on the base non-texture-based images?
The SVM and the Random Forest algorithm both got scores of 28%. All the classes had poor
individual scores when comparing them to the scores on the modeled VisTex dataset. The
sensitivity scores and positive prediction rates are low and show that the algorithms are not
able to classify images correctly based on this dataset. The reason for the low scores is that
the algorithms cannot find a pattern in the way the image data is structured.
The conclusion is that the classification algorithms have a low performance on the base non-
texture-based images.
7.3 Performance on non-texture-based images as complex networks
The third sub-question was:
How do classification algorithms perform on the non-texture-based images modeled
as complex networks?
The SVM algorithm got an accuracy rate of 22% and the highest rate for Random Forest was
19%. Looking at the statistics by class shows that the predictions are mostly random. None of
the classes got acceptable scores when comparing them to the results of the modeled VisTex
dataset. The statistics show that the classification algorithms are not able to correctly classify the images based on the methods used here.
The reason for this is in line with the reason that the flowers and leaves classes of the VisTex dataset perform worse than the other classes. The non-texture-based images contain more varied detail, of which only a part belongs to the object that needs to be classified. An image of a plane contains other details that do not contribute to classification, because they differ within a class. When modeled as complex networks, the images of different classes are too similar and the images within the same class are too different. This problem is compounded by the small size of the images used, which washes out details and makes them harder to differentiate.
This leads to the conclusion that the classification algorithms perform poorly on the non-texture-based images modeled as complex networks.
7.4 Performances of algorithms compared
The fourth sub-question was:
How do the performances of the algorithms compare to each other?
The algorithms scored similarly, as can be seen below:

               VisTex   CIFAR-10 Baseline   CIFAR-10 Modeled
SVM            90.6%    28%                 22%
Random Forest  90.6%    28%                 19%
The highest accuracy rates on the VisTex dataset are the same for both algorithms. Looking at all the results, the Random Forest scores better, with an average accuracy of 76.3% compared to the 69.3% that the SVM algorithm obtained.
For the CIFAR-10 dataset, both modeled and baseline, it is not possible to select one algorithm that performs better than the other. This is mainly because the results are random, and the differences are due to random chance rather than any real difference.
The conclusion is that both algorithms perform similarly on the VisTex dataset, with the Random Forest scoring better on average.
7.5 Performance compared to baseline
The last sub-question was:
How do the performances of the non-texture-based images modeled as complex
networks compare to the baseline?
The baseline is comprised of the scores for the modeled VisTex data and the CIFAR-10 base
data. The non-texture-based images modeled as complex networks have highest accuracy
scores of 19% and 22%. The VisTex dataset got 90.6% accuracy and the unmodeled non-
texture-based images got 28% accuracy.
When comparing the performance of the non-texture-based images modeled as complex
networks to the baseline, the results show that it has a low performance. It scores lower than
the unmodeled data by a small margin. The main difference is with the results from the
VisTex dataset which are much higher than 19 to 22%.
The reason for the performance difference is the type of images. The model is not able to
generalize non-texture-based images to the same degree as it is able to do with texture-based
images.
The conclusion is that the non-texture-based images modeled as complex networks compare
unfavorably with the baseline by a substantial margin.
7.6 Modeling non-texture-based images as complex networks?
These sub-questions lead to the answer to the main question, which was:
To what extent does modeling non-texture-based images as complex networks work
for classification?
The conclusions to the sub-questions show that the non-texture-based images have a low performance compared to the baseline. The scores of 19% to 22% accuracy for the modeled non-texture-based images can be considered low even without comparative material. This leads to the conclusion that modeling non-texture-based images as complex networks does not work for classification purposes with the model used here. The choice of algorithm does not affect this conclusion, as the comparison of the two algorithms showed.
The reason is that the model used here is not able to generalize the images of the CIFAR-10 dataset sufficiently. One issue is that the images are too small and therefore lose too much detail. The other issue is that the images within classes are too different and the images of different classes are too similar.
This does not mean that using complex networks does not work at all for non-texture-based images; it primarily means that this particular way of modeling images as complex networks does not work for these images. Convolutional neural networks also make use of networks to classify images, and those have been shown to work on the CIFAR-10 dataset. Complex networks can still be used to great success, just not in the way they were modeled here. The method needs more research: the results from the VisTex dataset and studies with convolutional neural networks show promise, and other methods might be discovered that work even better.
References
Alapati, N. K., & Sanderson, A. C. (1985, December). Texture Classification Using Multi-
resolution Rotation--Invariant Operators. In Intelligent Robots and Computer Vision IV (Vol.
579, pp. 27-39). International Society for Optics and Photonics.
Anitha, R., & Jyothi, S. (2016, March). A segmentation technique to detect the Alzheimer's
disease using image processing. In Electrical, Electronics, and Optimization Techniques
(ICEEOT), International Conference on (pp. 3800-3801). IEEE.
Barrenas, F., Chavali, S., Holme, P., Mobini, R., & Benson, B. (2009). Supplementary
Material: Network Measures. PloS one, 4(11), e8090.
Benenson, R. (2016). What is the class of this image? Discover the current state of the art in
objects classification. From:
http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#434946
41522d3130
Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
Cambridge. (n.d.). Introduction to Network Theory. From:
https://www.cl.cam.ac.uk/teaching/1011/PrincComm/slides/graph_theory_1-11.pdf
Chitre, Y., & Dhawan, A. P. (1999). M-band wavelet discrimination of natural
textures. Pattern Recognition, 32(5), 773-789.
Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L., & Schmidhuber, J. (2011, July).
Flexible, high performance convolutional neural networks for image classification. In IJCAI
Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p.
1237).
Cohen, F. S., Fan, Z., & Patel, M. A. (1991). Classification of rotated and scaled textured
images using Gaussian Markov random field models. IEEE Transactions on Pattern Analysis
& Machine Intelligence, (2), 192-202.
Costenbader, E., & Valente, T. W. (2003). The stability of centrality measures when networks
are sampled. Social networks, 25(4), 283-307.
Davis, L. S. (1981). Polarogram: A new tool for image texture analysis. Pattern Recognition, 13.
Donges, N. (2018). The Random Forest Algorithm. From:
https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd
Goyal, R. K., Goh, W. L., Mital, D. P., & Chan, K. L. (1995). Scale and rotation invariant
texture analysis based on structural property. In Industrial Electronics, Control, and
Instrumentation, 1995., Proceedings of the 1995 IEEE IECON 21st International Conference
on (Vol. 2, pp. 1290-1294). IEEE.
Gupta, P. (2017). Decision trees in machine learning. From:
https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052
Hafner, J., Sawhney, H. S., Equitz, W., Flickner, M., & Niblack, W. (1995). Efficient color
histogram indexing for quadratic form distance functions. IEEE transactions on pattern
analysis and machine intelligence, 17(7), 729-736.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector
classification.
Joachims, T. (1998, April). Text categorization with support vector machines: Learning with
many relevant features. In European conference on machine learning (pp. 137-142).
Springer, Berlin, Heidelberg.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
convolutional neural networks. In Advances in neural information processing systems (pp.
1097-1105).
Lam, W. K., & Li, C. K. (1997). Rotated texture classification by improved iterative
morphological decomposition. IEE Proceedings-Vision, Image and Signal
Processing, 144(3), 171-179.
Li, Y., Liu, L., Shen, C., & van den Hengel, A. (2015). Mid-level deep pattern mining.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp.
971-980).
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures
with classification based on featured distributions. Pattern recognition, 29(1), 51-59.
Ray, S. (2017). Understanding Support Vector Machine Algorithm. From:
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-
example-code/
Rouse, M. (2017). Image Recognition. From:
https://searchenterpriseai.techtarget.com/definition/image-recognition
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: uses
and interpretations. Neuroimage, 52(3), 1059-1069.
Scabini, L. F., Condori, R. H., Gonçalves, W. N., & Bruno, O. M. (2018). Multilayer
Complex Network Descriptors for Color-Texture Characterization. arXiv preprint
arXiv:1804.00501.
Silva, T. C., & Zhao, L. (2016). Machine learning in complex networks (Vol. 1). Springer
International Publishing.
Stanford. (n.d.). Convolutional Neural Network. From:
http://deeplearning.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
Strogatz, S. H. (2001). Exploring complex networks. nature, 410(6825), 268.
Van de Wouwer, G., Scheunders, P., Livens, S., & Van Dyck, D. (1999). Wavelet correlation
signatures for color texture characterization. Pattern recognition, 32(3), 443-451.
Zhang, J., & Tan, T. (2002). Brief review of invariant texture analysis methods. Pattern
recognition, 35(3), 735-747.