Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2018

Automated Measurements of Liver Fat Using Machine Learning

Tobias Grundström


liu.diva-portal.org/smash/get/diva2:1248500/FULLTEXT01.pdf



Master of Science Thesis in Electrical Engineering

Automated Measurements of Liver Fat Using Machine Learning

Tobias Grundström

LiTH-ISY-EX--18/5166--SE

Supervisors: Andreas Robinson, ISY, Linköping University
Magnus Borga and Hannes Järrendahl, AMRA Medical AB

Examiner: Maria Magnusson, ISY, Linköping University

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2018 Tobias Grundström


Abstract

The purpose of the thesis was to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging (mri). The thesis presents methods for texture-based liver classification and Proton Density Fat Fraction (pdff) regression using multi-layer perceptrons utilizing 2D and 3D textural image features. The first proposed method was a data classification method with the goal of distinguishing between suitable and unsuitable regions in which to measure pdff. The second proposed method was a combined classification and regression method, where the classification distinguishes between liver and non-liver tissue. The goal of the regression model was to predict the difference d = pdff_mean − pdff_ROI between the manual ground truth mean and the fat fraction of the active Region of Interest (roi).

Tests were performed with varying sizes of feature regions of interest (froi) and combinations of image features for both of the proposed methods. The tests showed that 3D measurements using image features from discrete wavelet transforms produced measurements similar to the manual fat measurements. The first method resulted in lower relative errors, while the second method had a higher method agreement compared to manual measurements.



Acknowledgments

Firstly, I would like to thank AMRA Medical AB and my supervisors Magnus Borga and Hannes Järrendahl for the opportunity to perform my thesis work there. I also want to thank everyone else at AMRA for all the help and great ideas, especially Peter Karlsson Zetterberg and the rest of the DevOps team.

Secondly, I want to thank my supervisor Andreas Robinson and my examiner Maria Magnusson at the Department of Electrical Engineering. Your valuable knowledge and feedback have helped me produce a thesis that I am really proud of.

Finally, I want to thank everyone that I have come into contact with in different associations during my years at Linköping University. All the adventures we have had together have helped motivate me in my studies, and have left me eager to keep on exploring the world.

Linköping, August 2018
Tobias Grundström



Contents

Notation

1 Introduction
  1.1 Background
  1.2 Problem Statement
  1.3 Limitations
  1.4 Thesis Outline

2 Theoretical Background
  2.1 Related Work
  2.2 Magnetic Resonance Imaging
    2.2.1 Spin Physics
    2.2.2 Excitation
    2.2.3 Relaxation Times
    2.2.4 Volume Construction
    2.2.5 Chemical Shift
    2.2.6 Dixon Imaging
    2.2.7 Proton Density Fat Fraction
  2.3 Neural Networks
    2.3.1 Data Classification
    2.3.2 Data Regression
    2.3.3 Perceptron
    2.3.4 Activation Functions
    2.3.5 Multi-layer Perceptron
    2.3.6 Back-propagation
  2.4 Image Features
    2.4.1 Gray Level Co-occurrence Matrix
    2.4.2 Discrete Wavelet Transform
    2.4.3 Histogram of Gradient Magnitude

3 Method
  3.1 Development Environment
  3.2 Data Acquisition
    3.2.1 Liver Localization
    3.2.2 Binary Masks
    3.2.3 Feature Extraction
  3.3 Classification Method
  3.4 Classification and Regression Method
  3.5 Network Training
  3.6 Performance Evaluation
    3.6.1 Classifier Evaluation
    3.6.2 System Evaluation

4 Results
  4.1 Approach
  4.2 Experimental Setup
    4.2.1 Classification Method
    4.2.2 Classification and Regression Method
  4.3 Feature Region Size
  4.4 Image Features
  4.5 Classification and Regression

5 Discussion
  5.1 Feature Region Size
  5.2 Image Features
  5.3 Classification and Regression

6 Conclusion
  6.1 Conclusion
  6.2 Future Work

Bibliography


Notation

Number sets

Notation  Description
R         Set of real numbers
Z         Set of integers

Abbreviations

Abbreviation  Description
mri           Magnetic resonance imaging
roi           Region of interest
froi          Feature region of interest
pdff          Proton density fat fraction
rf            Radio frequency
mlp           Multi-layer perceptron
glcm          Gray level co-occurrence matrix
i             Intensity (image feature)
dwt           Discrete wavelet transform
hgm           Histogram of gradient magnitude
loa           Limits of agreement
cnn           Convolutional neural network



1 Introduction

The following chapter presents the background and purpose of this thesis. The limitations and the outline of the thesis are also presented.

1.1 Background

The purpose of this thesis is to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging. Today, liver fat is measured in a number of manually positioned regions of interest (roi:s), placed so as to avoid larger blood vessels and bile ducts that would disturb the measurements. AMRA is now interested in automating this process. The thesis project investigates machine learning techniques for selecting the placement of the liver roi:s. The aim is to have a validated prototype for automated placement of roi:s for liver fat measurements.

1.2 Problem Statement

The following section presents the primary and secondary questions that the project aims to answer. The questions are stated from the previously mentioned background to the project and cover both data acquisition and the use of machine learning as a method.

• Is it possible to generate training data by using a liver MR image and a measured ground truth of average fat percentage?

• Does the approach of machine learning work as a way of automating liver fat measurements?

• Is a segmentation of the liver necessary to obtain good results?



• What aspects need to be considered if further work on the subject is to be performed?

1.3 Limitations

This thesis does not include automated localization of the liver in a whole-body scan. Instead, spatial information about the liver based on mri scans centered at the liver will be used.

1.4 Thesis Outline

Chapter 2 presents the theoretical background of this thesis, including Magnetic Resonance Imaging, Neural Networks and a number of image features. Chapter 3 describes the proposed method for data acquisition, classification and regression. Chapter 4 contains the experimental setup and the achieved experimental results. Chapter 5 discusses the results in Chapter 4 and is the foundation for the concluding remarks and suggested further research in Chapter 6.


2 Theoretical Background

In the following chapter the theoretical background to the work is presented, describing Magnetic Resonance Imaging, Neural Networks and image feature extraction methods. The level of theory is adapted to a reader studying at Master's degree level in engineering and assumes basic knowledge of physics and multivariable calculus.

2.1 Related Work

Previous research uses a variety of image features to analyse liver regions. The most common types of image features in Magnetic Resonance Imaging (mri) describe local texture patterns. Textural features typically consist of Gray Level Co-occurrence Matrix (glcm) measurements, gradient histograms and features calculated using the Discrete Wavelet Transform (dwt). The textural measurements are generally used in classification of tumors or other abnormalities in bodily organs. Frequently occurring machine learning techniques include Multi-Layer Perceptrons (mlp), Support Vector Machines (svm) and Convolutional Neural Networks (cnn).

Yudong Zhang et al. [25] perform brain tumour classification using dwt image features fed to a neural network. The dimensionality of the image features is reduced by principal component analysis (pca). They achieve high classification accuracy with a neural network consisting of one hidden layer and a low number of nodes.

Gobert Lee et al. [7] conduct research on liver cirrhosis identification using gradient magnitude histograms and glcm texture analysis in subset roi:s of an MR image. The learning approach used unsupervised k-means clustering and achieved a sensitivity and specificity of 72% and 60%, respectively.


Li Zhenjiang et al. [10] show that texture analysis and detection of liver lesions in mri scans using glcm, dwt and gray level gradient co-occurrences are valuable techniques that achieve good results. They compare several machine learning classification methods, including Neural Networks, Support Vector Machines and k-Nearest Neighbor, all of which perform well.

Ritu Punia and Shailendra Singh [18] use image features based on dwt and gray level histograms to detect liver regions. The optimal features are found using a genetic algorithm inspired by the principle of biological evolution. The optimized feature vector is fed to a neural network, and they achieve a liver pixel classification accuracy of 96.2%.

Convolutional Neural Networks (cnn), used in many state-of-the-art applications as in [9], produce high-accuracy classification. The disadvantage is that pre-trained medical imaging cnns are not as widely available as cnns trained on natural images, resulting in longer training times and more data needed to achieve state-of-the-art results.

Multi-layer perceptrons (mlp) are deemed sufficient, based on previous work, to test the hypothesis of machine learning based pdff measurements. Also, shorter training times compared to cnn:s enable several approaches to be evaluated in the given thesis time frame. This thesis presents methods for texture-based liver classification and pdff regression using mlp:s utilizing 2- and 3-dimensional textural image features.

2.2 Magnetic Resonance Imaging

Magnetic Resonance Imaging (mri) uses properties of hydrogen atoms in human tissue when exposed to a strong magnetic field to generate images. The Larmor frequency is associated with the magnetic moment of the proton and the magnetic field strength. A radio frequency pulse at a certain frequency ω is applied. If ω matches the Larmor frequency, the protons will precess. When the pulse is turned off, the protons return to their lowest energy state and release energy that is measured. The following sections are based on [4] and [17] unless other sources are mentioned.

2.2.1 Spin Physics

The principle behind mri is built on the quantum mechanical property of spin. Spin can be described as the rotation of the nucleus about its own axis. Due to the positive charge of the nucleus, a magnetic field is generated. The magnetic moment of the nucleus µ is proportional to its spin S by its gyromagnetic ratio γ,

µ = γS. (2.1)


The magnetic moment also possesses the same spatial orientation as S.

In an mri scan, a strong external magnetic field B0 is applied over the subject. Protons in nuclei, mainly of hydrogen atoms, will precess about the direction of B0 at the Larmor frequency ωL, as shown in Figure 2.1. The relation between the magnitude of the magnetic field and the precession frequency is given by the Larmor equation,

ωL = γB0. (2.2)
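As a concrete illustration of Eq. (2.2), the sketch below (not part of the thesis) computes the proton precession frequency at a typical clinical field strength; the reduced gyromagnetic ratio for hydrogen is the standard literature constant.

```python
# Illustrative sketch of the Larmor equation (2.2): f_L = (gamma / 2*pi) * B0.
# GAMMA_BAR_H is the reduced gyromagnetic ratio of the hydrogen proton,
# approximately 42.577 MHz per tesla.
GAMMA_BAR_H = 42.577e6  # Hz/T

def larmor_frequency(b0_tesla):
    """Precession frequency in Hz for a given field strength in tesla."""
    return GAMMA_BAR_H * b0_tesla

f = larmor_frequency(1.5)  # typical clinical scanner field, 1.5 T
print(f / 1e6)             # about 63.9 MHz
```

The linear relationship means that doubling B0 doubles the precession frequency, which is why fat-water frequency separation also scales with field strength (cf. Section 2.2.5).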


Figure 2.1: A rotating nucleus in a static magnetic field, precessing at the Larmor frequency. Image adapted from [17].

The net magnetization M of the N nuclei in a given volume V is calculated from the magnetic moments,

M = (1/V) Σ_{i=1}^{N} µi. (2.3)

M consists of components both parallel and perpendicular to B0. The component in the transverse plane, M⊥, is zero at the equilibrium state, due to the nuclei precessing at random phases and cancelling one another out. The longitudinal component in the z-direction, on the other hand, has a small net magnetization M‖ = M0, due to the nuclei tending to align along the direction of the external magnetic field. This tendency comes from the potential energy of the magnetic moments in B0 moving towards the minimum energy state. According to

E = −µ ·B0, (2.4)

the lowest energy is achieved when µ and B0 are parallel.


2.2.2 Excitation

A magnetic field pulse B1 in the radio frequency (rf) domain is applied to the subject in the transverse plane. In Figure 2.2, B1 oscillates at a frequency equal to the Larmor frequency, ω = ωL; thus resonance of the protons is achieved and the spin is tipped away from the direction of B0. The flip angle θ, describing how far the magnetization is tipped away from that direction, increases with the magnitude or the duration τ of B1 according to the relation

θ = γB1τ. (2.5)

When the pulse is turned off the protons gradually return to the equilibrium state.


Figure 2.2: A reference frame rotating at the Larmor frequency, depicting the flip angle of the magnetization when a magnetic field B1 is applied. The protons precess around the axis along which B1 is applied. Image adapted from [17].

2.2.3 Relaxation Times

When an rf pulse is applied, the magnetization in the transverse plane, Mxy, increases while Mz decreases. When the pulse ends, Mxy has reached its maximum magnitude. The magnitude then gradually decreases until the protons have returned to the equilibrium state, when Mz = M0. The time it takes to reach equilibrium is called the relaxation time. The relaxation time in the longitudinal direction is the time it takes for Mz to reach 63% of its maximum value. It is called T1 relaxation and is shown in Figure 2.3a. It is given by

Mz(t) = M0 (1 − e^(−t/T1)). (2.6)


In the transverse direction the process is called T2 relaxation, seen in Figure 2.3b. T2 relaxation is defined as the time it takes the magnetization Mxy to decrease to 37% of its initial value. It is described as

Mxy(t) = Mxy(0) e^(−t/T2). (2.7)
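A small numerical sketch of the two relaxation curves, Eqs. (2.6) and (2.7); the T1 and T2 values below are made-up example constants, not tissue measurements.

```python
import numpy as np

def mz(t, m0=1.0, t1=0.8):
    """Longitudinal recovery, Eq. (2.6): Mz(t) = M0 * (1 - exp(-t / T1))."""
    return m0 * (1.0 - np.exp(-t / t1))

def mxy(t, mxy0=1.0, t2=0.1):
    """Transverse decay, Eq. (2.7): Mxy(t) = Mxy(0) * exp(-t / T2)."""
    return mxy0 * np.exp(-t / t2)

# At t = T1 the longitudinal magnetization has recovered to 1 - 1/e (about 63%)
# of M0, and at t = T2 the transverse magnetization has decayed to 1/e (about 37%),
# matching the definitions of T1 and T2 above.
print(mz(0.8), mxy(0.1))
```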

(a) Increase of magnetization over time in the longitudinal direction, showing T1 relaxation.

(b) Decrease of magnetization in the transverse plane, showing T2 relaxation.

Figure 2.3: Graphs showing T1 and T2 magnetization relaxation. Image source: [20].

2.2.4 Volume Construction

In order to construct an MR volume, each voxel has to be associated with a signal response; the concept of gradient coils is used to encode this spatial information.

First, a slab, a 3D volume of the patient, is excited as described in Section 2.2.2. A gradient magnetic field is applied in every spatial dimension (x, y and z), gradually increasing in magnitude as the coordinates increase. The gradient fields (Gx, Gy, Gz) combined with the static field B0 result in a spatially varying field,

B(x, y, z, t) = B0 + xGx(t) + yGy(t) + zGz(t). (2.8)


The angular frequencies of the spins can then be calculated using the relation in (2.2) as

ω(x, y, z, t) = γB(x, y, z, t) = γB0 + γ(xGx(t) + yGy(t) + zGz(t)). (2.9)

The emitted signal of the slab is contributed to by all the spins in that volume. The signal responses are stored in an array of numbers called a k-space volume, with encoding in the x-direction on one axis, kx, encoding in the y-direction on the second axis, ky, and encoding in the z-direction, kz, on the third axis. When the amplitudes of the three gradients change with the time t, it is possible to measure the signal in every position of the k-space. The k-space s(kx, ky, kz) is a frequency domain representation of the MR volume, and using the inverse three-dimensional Fourier transform,

I(x, y, z) = ∫∫∫ s(kx, ky, kz) e^(2πi(kx x + ky y + kz z)) dkx dky dkz = F⁻¹{s(kx, ky, kz)}, (2.10)

the volume I in the signal domain is reconstructed. A slice of the volume is shown in Figure 2.4b.

(a) A k-space representation of the volume slice in Figure 2.4b.

(b) A reconstructed MR volume slice I(x, y, z = z1) after the inverse Fourier transform has been applied.

Figure 2.4: The k-space representation of the signal and the reconstructed MR volume.
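The Fourier relationship of Eq. (2.10) can be sketched numerically (this is only an illustration, not the scanner pipeline): a fully sampled k-space and the image volume form a 3D Fourier transform pair, so the inverse FFT recovers the volume.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))           # stand-in "MR volume" I(x, y, z)

k_space = np.fft.fftn(volume)            # image -> k-space (forward transform)
reconstructed = np.fft.ifftn(k_space)    # k-space -> image, cf. Eq. (2.10)

# The round trip recovers the volume up to floating-point error; the imaginary
# part is numerically zero because the original volume is real-valued.
print(np.allclose(reconstructed.real, volume))
```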

2.2.5 Chemical Shift

Living tissue is dominated by hydrogen atoms in water, which produce an mri signal. Tissue containing fat also produces a notable signal. Fat and water are chemically shifted, and this property is utilized when separating tissue types in an MR image. A chemical shift is associated with a frequency shift of the different protons' Larmor precession frequencies due to differences in magnetic shielding in


the chemical compounds. The fat is shifted to a lower precession frequency (ωf) compared to water (ωw), and the difference is described by

∆ωfw = ωf − ωw = −σγB0, (2.11)

where σ is a dimensionless fraction of the field B0 that describes the chemical shift.

In the reference frame of the water signal w, and due to ∆ωfw, the protons will alternate between precessing in phase and out of phase. This is the property that the Dixon separation method in Section 2.2.6 takes advantage of.

2.2.6 Dixon Imaging

The two-point Dixon method [5] consists of acquiring two separate spin echo images with different echo times. First, an image where the fat and water signals are in phase,

IIP = f + w, (2.12)

and secondly, one where the signals are out of phase,

IOP = f − w. (2.13)

The image containing the separated fat signal can be extracted as half the sum of IIP and IOP according to

(1/2)(IIP + IOP) = (1/2)(f + w + f − w) = (1/2)(2f) = f, (2.14)

and the separated water signal is similarly extracted as half the difference of IIP and IOP,

(1/2)(IIP − IOP) = (1/2)(f + w − (f − w)) = (1/2)(2w) = w. (2.15)

See Figure 2.5 for an illustration of the different images.
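The two-point separation of Eqs. (2.12)–(2.15) takes only a few lines in code; the fat and water "images" below are made-up arrays, not scanner data.

```python
import numpy as np

fat = np.array([[0.8, 0.1], [0.3, 0.6]])      # synthetic fat signal f
water = np.array([[0.2, 0.9], [0.7, 0.4]])    # synthetic water signal w

in_phase = fat + water                         # I_IP = f + w, Eq. (2.12)
out_of_phase = fat - water                     # I_OP = f - w, Eq. (2.13)

fat_image = 0.5 * (in_phase + out_of_phase)    # Eq. (2.14): recovers f
water_image = 0.5 * (in_phase - out_of_phase)  # Eq. (2.15): recovers w

print(np.allclose(fat_image, fat), np.allclose(water_image, water))
```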

2.2.7 Proton Density Fat Fraction

The signals of the mri scan are separated into fat and water images as described in Section 2.2.6. From these images, the Proton Density Fat Fraction (pdff) is measured as the ratio of the density of mobile protons in the fat signal to the total density of protons in mobile fat and mobile water. pdff is a standardized biomarker for measuring fat concentration. It is interpreted as the actual fat fraction of MR-visible tissue [19].

PDFF = f / (f + w). (2.16)
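Voxel-wise, Eq. (2.16) is a direct ratio of the separated images; a minimal sketch with made-up fat and water values:

```python
import numpy as np

fat = np.array([0.05, 0.30, 0.10])    # separated fat signal per voxel (example data)
water = np.array([0.95, 0.70, 0.90])  # separated water signal per voxel (example data)

pdff = fat / (fat + water)  # Eq. (2.16); a fraction in [0, 1]
print(pdff * 100)           # fat fraction expressed in percent
```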


(a) An in-phase image of a liver, IIP.

(b) An opposite-phase image of a liver, IOP.

(c) A fat image created from addition of IIP and IOP.

(d) A water image created from subtraction of IIP and IOP.

Figure 2.5: In-phase and opposite-phase images that create fat- and water-separated images.

2.3 Neural Networks

An artificial neural network (ann) is a mathematical model for data classification and regression, inspired by a network of biological neurons as in the human brain. Neural networks are common within, for example, image processing, object recognition and voice recognition. This section presents the different parts of a simple neural network called the multi-layer perceptron and how it is trained from given data. The following section is based on [12] and [2].

2.3.1 Data Classification

Data classification is the process of correctly classifying data as one of several possible classes from a given input. In supervised learning, where neural networks are used, a rule for classification called a discriminant function, which separates the data, is created. The discriminant is based on data samples where the class is known, in order to predict the corresponding unknown class for a new, previously unseen data sample. Figure 2.6 shows two classes being separated by a discriminant. New samples on one side of the discriminant will correspond to


one class, and samples on the other side to the other class.

Figure 2.6: Two data classes (green and red) separated by a linear discriminant function. The discriminant function outputs a value of 1 if the input sample belongs to the green class and 0 if the sample belongs to the red class.

2.3.2 Data Regression

Data regression analysis is the task of estimating a model that fits given data samples, where the output value is known. Compare this with classification, where the goal is to discriminate between samples. The goal of the regression model is to correctly predict output values based on inputs. Neural networks can also be used in regression analysis to predict continuous output values. Figure 2.7 depicts a linear regression model that aims to describe the surrounding samples.

Figure 2.7: A regression model fit to several data samples. The regressionmodel generates a continuous value based on the input sample.


2.3.3 Perceptron

The perceptron creates linear discriminant functions and regression models in the form of a hyperplane. The perceptron in Figure 2.8 is a statistical model inspired by a neuron. For a given input, the perceptron returns the class of the sample. The decision is based on previous data samples used to optimize the parameters of the model. The equation

zk = Σ_j wkj xj + w0, (2.17)

describes a perceptron mathematically. Here xj are the inputs, which can come from the environment or from other perceptrons, and wkj are the weights to neuron node zk. The weight w0 is called the bias weight and has the purpose of providing a more general model by translating the model in the input feature space. The input x0 is called the bias unit and is always set to a static value of +1. To create a non-linear function, two additional steps are necessary, namely activation functions and hidden layers. These are discussed in Sections 2.3.4 and 2.3.5.


Figure 2.8: A single perceptron that adds all weighted inputs and passes the value to an activation function f(z).

2.3.4 Activation Functions

In order to model non-linear functions, the first step is to use different types of activation functions f(zk), as shown in Figure 2.8, at the output of neurons,

yk = f (zk). (2.18)

An activation function both provides non-linearity to the weighted sum of the neuron and limits the output of the neuron to the output range of the activation function. A common activation function in the intermediate layers of a multi-layer perceptron is the logistic sigmoid function in Equation (2.19). The logistic sigmoid function limits the output of the neuron to between 0 and 1.

f(zk) = 1 / (1 + exp(−zk)) (2.19)
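Equations (2.17)–(2.19) combine into a single forward pass; a minimal sketch with made-up weights and inputs:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, Eq. (2.19): output limited to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0):
    """Weighted sum z = sum_j w_j x_j + w0, Eq. (2.17), passed through f(z)."""
    z = np.dot(w, x) + w0
    return sigmoid(z)

x = np.array([0.5, -1.0, 2.0])   # example inputs x_j
w = np.array([0.4, 0.3, -0.2])   # example weights w_j
y = perceptron(x, w, w0=0.1)     # z = -0.4, so y = sigmoid(-0.4) ~ 0.40
print(y)
```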


2.3.5 Multi-layer Perceptron

The second step in modelling non-linear functions is to add several layers of multiple perceptron nodes, a model called the multi-layer perceptron (mlp), see Figure 2.9. A multi-layer perceptron contains an input layer where data enters the network, an output layer where the resulting classification or prediction of the data is shown, and one or several intermediate layers, called hidden layers. All the layers are fully connected, meaning that all inputs are connected to all nodes by a weighted path. Only the input and the expected output are known when training the network, and that is the property used in the back-propagation training method. When training the network, the purpose is to find the optimal weights that lead to an output that minimizes an error function. An example is the squared error function

ε = (1/2) Σ_{k=1}^{K} (dk − yk)², (2.20)

where dk is the expected output, yk is the received output from node k, and K is the number of output nodes. A second example is the logarithmic loss function, also called the cross-entropy function,

ε = − Σ_{k=1}^{K} Σ_{n=1}^{N} dk log yk, (2.21)

with K output nodes and N sample classes. The loss function simplified to two discriminating classes results in

ε = − Σ_k [ dk log yk + (1 − dk) log(1 − yk) ]. (2.22)

The cross-entropy function measures the dissimilarity between dk and yk. When a correct classification is performed, the value of the function is 0.
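The two error functions, Eqs. (2.20) and (2.22), in code form; the target and output vectors are made-up example values:

```python
import numpy as np

def squared_error(d, y):
    """Eq. (2.20): 0.5 * sum_k (d_k - y_k)^2."""
    return 0.5 * np.sum((d - y) ** 2)

def binary_cross_entropy(d, y):
    """Eq. (2.22): -sum_k [d_k log y_k + (1 - d_k) log(1 - y_k)]."""
    return -np.sum(d * np.log(y) + (1.0 - d) * np.log(1.0 - y))

d = np.array([1.0, 0.0])   # expected outputs d_k
y = np.array([0.9, 0.2])   # network outputs y_k

print(squared_error(d, y))         # 0.5 * (0.1^2 + 0.2^2) = 0.025
print(binary_cross_entropy(d, y))  # -(log 0.9 + log 0.8), about 0.329
```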

2.3.6 Back-propagation

The back-propagation method is based on minimizing the output error by using the steepest descent algorithm; the algorithm utilizes the direction of the gradient,

∇ε = ∂ε/∂wkj, (2.23)

to incrementally move towards a minimum value. The gradient of the error function is determined by using the derivative chain rule to work its way back to the known input of the system. The weights from node j to k are updated by

w_kj^(m+1) = w_kj^m + ∆w_kj^m,   ∆w_kj^m = −η∇ε, (2.24)



Figure 2.9: An example of a multi-layer perceptron with an input layer, one hidden layer and a single-node output layer.

at the m:th iteration of the algorithm, where η is the learning rate.

Given output layer nodes k ∈ {0, ..., K} and adjacent previous layer nodes j ∈ {0, ..., J}, the output layer weights wkj are updated using (2.24) and the gradient

Φk = (∂ε/∂yk) · (∂yk/∂zk) = f′(zk)(yk − dk), (2.25)

∇ε = ∂ε/∂wkj = (∂ε/∂yk) · (∂yk/∂zk) · (∂zk/∂wkj) = Φk (∂zk/∂wkj) = Φk yj. (2.26)

The update rule can be generalized for an arbitrary layer, disregarding the output layer. The generalized update rule is described by (2.27) and (2.28). For any activation function f, the update rule is extended by updating the weights from nodes i to j at layer r. The update is performed by taking the layers r − 1, r, r + 1, with nodes i, j and k, into consideration.

Φrj =∂ε∂yrj

·∂yrj∂zrj

= f ′(zrj )∑k

Φr+1k wr+1

kj , (2.27)

∇ε =∂ε∂wrji

=∂ε∂yrj

·∂yrj∂zrj

·∂zrj∂wrji

= Φrj

∂zrj∂wrji

= Φrjyr−1i (2.28)

Equation (2.27) indicates that Φ_j^r at layer r depends on all subsequent layers (r + 1, r + 2, ...) up until the output layer. The learning rate η determines the length of the step taken in the direction of the gradient, in other words how big the updates of the weight parameter values are. To decrease the time until the function has converged, an adaptive learning rate can be used. In this case a larger value of η is chosen at the start of the training. By lowering the value of η as the learning process slows down, fine tuning of the weight parameters can be achieved.
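The update rules (2.24)-(2.28) can be sketched in NumPy for a network with one hidden layer. This is a minimal illustration, not the thesis implementation (which used Keras); the layer sizes, the tanh/sigmoid activation pairing and the squared-error loss are assumptions chosen to match Sections 2.3.5-2.3.6.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W1, W2, eta=0.1):
    """One steepest-descent update of a 1-hidden-layer MLP.

    Implements (2.25)-(2.28): the local gradient Phi_k at the output
    layer, Phi_j propagated back through W2, and the weight updates
    w <- w - eta * grad of (2.24), for a squared-error loss.
    """
    # Forward pass
    z1 = W1 @ x            # hidden pre-activations
    y1 = np.tanh(z1)       # hidden activations, f = tanh
    z2 = W2 @ y1           # output pre-activation
    y2 = sigmoid(z2)       # output activation

    # Output-layer local gradient: Phi_k = f'(z_k) (y_k - d_k)     (2.25)
    phi2 = y2 * (1.0 - y2) * (y2 - d)
    # Hidden-layer local gradient: Phi_j = f'(z_j) sum_k Phi_k w_kj (2.27)
    phi1 = (1.0 - y1 ** 2) * (W2.T @ phi2)

    # Gradients (2.26), (2.28) and the updates (2.24)
    W2_new = W2 - eta * np.outer(phi2, y1)
    W1_new = W1 - eta * np.outer(phi1, x)
    return W1_new, W2_new
```

Repeating `backprop_step` on a training sample drives the squared error down, which is all the full training loop in Chapter 3 does at larger scale.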

2.4 Image Features

The input to the mlp classifier consists of several image features. In the following section the gray level co-occurrence matrix, the histogram of gradient magnitudes and discrete wavelet transforms are presented. The different image features describe textural properties of local image regions.

2.4.1 Gray Level Co-occurrence Matrix

0 1 1 2 3
0 1 1 3 2
0 0 0 1 3
1 2 2 3 3
1 2 2 3 3

Figure 2.10: A 5 × 5 pixel image region with four different gray levels.

The gray level co-occurrence matrix (glcm) [13] is used for calculating textural features in images. The textural features consist of several statistical measurements describing different relationships between pixel gray levels in a designated region.

Let the gray-scale image I(x) be an M × M matrix, i.e. x ∈ {1, ..., M}², with N gray levels. Let the gray level co-occurrence matrix P be an N × N matrix with elements P(i, j), i, j ∈ {0, ..., N − 1}. For all x such that x ∈ {1, ..., M}² and x + d ∈ {1, ..., M}², count the tuples (i = I(x), j = I(x + d)) and store the counts in P(i, j).

As stated above, the glcm of an image patch extracted from an image with N gray levels consists of an array P(i, j) of N × N elements. Figure 2.10 shows a 5 × 5 pixel image patch with four gray levels that will be used as an example. Several measures for different angles can be performed. For example, at angle 0 the neighboring pixel at distance d to the right of the query pixel is considered. The occurrence rates of gray value i with neighboring gray value j at distance d = 1 are counted and entered at P(i, j) in Figure 2.12a. For example, at position (3, 2)


Figure 2.11: The glcm of an image region can be calculated at different angles (0, π/4, π/2, 3π/4) and different step lengths (d = 1, d = 2).

the number of times a pixel with gray value 3 has a neighboring pixel to the right with gray value 2 is counted. Matrices can be calculated for the angles 0, π/4, π/2 and 3π/4 in Figure 2.11, resulting in different angular glcm:s. Due to symmetry, where (i, j) occurs at the same rate at angle 0 as (j, i) occurs at angle π, the remaining angles can be taken into account by adding P(i, j) to its transpose, see Figure 2.12b. In a similar way π/4 is combined with 5π/4 and π/2 is combined with 3π/2. To provide rotation invariance to the features, the final textural features are calculated from an average of all the angular glcm matrices. In Figure 2.12c the glcm is normalized by dividing each entry in the matrix in Figure 2.12b by the total number of occurrences.

Haralick [13] presents 14 textural measures. Since several of the measures describe similar features, not all of them will be described here; three distinct measures are shown. The contrast measure,

f_con = ∑_{i,j} P(i, j) |i − j|², (2.29)

describes the amount of local variation in the image. All diagonal values in P, which have gray levels i = j, give no contribution to the contrast. By weighting larger gray level differences more heavily, these elements contribute more to the contrast measure. The angular second-moment (ASM),

f_ASM = ∑_{i,j} P(i, j)², (2.30)

results in high values when occurrence rates are high. In other words, if a certain gray level difference occurs more frequently, there is more homogeneity in the image and the ASM measure will be higher. To measure linear dependencies in


(a) glcm calculated from the image region with four gray levels in Figure 2.10, showing the occurrence rates of gray level pairs (i, j), i, j = 0...3, at angle 0 and distance 1:

2 3 0 0
0 2 3 2
0 0 2 3
0 0 1 2

(b) Symmetrical glcm created by adding P(i, j) to P(i, j)^T:

4 3 0 0
3 4 3 2
0 3 4 4
0 2 4 4

(c) Normalized symmetrical glcm where all elements have been divided by the total number of occurrences:

0.1   0.075 0     0
0.075 0.1   0.075 0.05
0     0.075 0.1   0.1
0     0.05  0.1   0.1

Figure 2.12: GLCM matrices in the basic, symmetrical and normalized variants.

the image, the correlation feature,

f_corr = ( ∑_{i,j} i·j·P(i, j) − μ_x μ_y ) / (σ_x σ_y), (2.31)

is used, where the means μ_x, μ_y and standard deviations σ_x, σ_y of p_x(i) = ∑_j P(i, j) and p_y(j) = ∑_i P(i, j) are taken into account. A high correlation indicates higher amounts of linear structure in the image.
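The GLCM construction and the contrast measure (2.29) can be reproduced in a few lines of NumPy on the example region of Figure 2.10. The thesis itself used scikit-image for the feature computations; this plain-loop version is only a sketch of the definition.

```python
import numpy as np

def glcm(image, d=(0, 1), levels=4):
    """Gray level co-occurrence counts P(i, j) for offset d = (dy, dx).

    For every pixel whose offset neighbor is inside the image, the
    gray-level pair (i, j) = (I(x), I(x + d)) is counted, as in
    Section 2.4.1.
    """
    P = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    dy, dx = d
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                P[image[y, x], image[y2, x2]] += 1
    return P

# Example image from Figure 2.10
I = np.array([[0, 1, 1, 2, 3],
              [0, 1, 1, 3, 2],
              [0, 0, 0, 1, 3],
              [1, 2, 2, 3, 3],
              [1, 2, 2, 3, 3]])

P = glcm(I)                    # angle 0, distance 1 (Figure 2.12a)
P_sym = P + P.T                # symmetrical GLCM (Figure 2.12b)
P_norm = P_sym / P_sym.sum()   # normalized GLCM (Figure 2.12c)

# Contrast measure (2.29) on the normalized matrix
i, j = np.indices(P_norm.shape)
contrast = (P_norm * (i - j) ** 2).sum()
```

Running this reproduces the matrices printed in Figure 2.12, e.g. P(3, 2) = 1 for the one occurrence of gray value 3 with a right neighbor of gray value 2.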

2.4.2 Discrete Wavelet Transform

The continuous wavelet transform

C_f(a, b) = ∫_R f(t) ψ_{a,b}(t) dt (2.32)

is a 1D signal analysis method that acts in several resolutions simultaneously. By applying wavelets derived from a mother wavelet ψ,

ψ_{a,b}(t) = (1/√a) ψ((t − b)/a),   ∫_R ψ(t) dt = 0, (2.33)

to a signal f, the fluctuations of the signal can be analyzed at any scale a ∈ R+ and position b ∈ R using the coefficients C_f(a, b).


In the case of sampled signals, a discrete version of the previously mentioned transform, the discrete wavelet transform (dwt) with

ψ_{j,k}(t) = (1/√(2^j)) ψ((t − k·2^j)/2^j),   j, k ∈ Z, (2.34)

is derived. It can be implemented using filter banks containing high- and low-pass filters as in Figure 2.13. The dwt decomposes the signal into approximation coefficients A and detail coefficients D that are obtained by a convolution followed by down-sampling by a factor of 2. The signal is convolved with a low-pass filter for the approximation and with a high-pass filter for the detail coefficients. By recursively repeating the process on A, the signal can be decomposed into several levels. The approximation A_{j−1} at level j − 1 can be reconstructed using A_j and D_j at level j according to

A_{j−1} = A_j + D_j. (2.35)

Figure 2.13: A two-level filter bank implementation of the dwt. Decomposition of the input 1D signal into first and second level detail (D1, D2) and approximation (A1, A2) coefficients. The boxes denote low- and high-pass filters (LP and HP) and down-sampling by a factor of 2 (2↓).

To generalize the dwt algorithm to two-dimensional signals (images), the decomposition is performed in both the horizontal and the vertical directions. This results in detail coefficients in the horizontal D_H, vertical D_V and diagonal D_D directions, as seen in Figure 2.14 [15].

The image features are calculated as the mean and standard deviation of each set of coefficients at every decomposition level [21].
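The filter-bank decomposition and the mean/std feature extraction can be sketched as below. The thesis used PyWavelets with the db2 wavelet; here the Haar filter pair is substituted purely because its coefficients fit on one line, so this is a simplified stand-in rather than the actual implementation.

```python
import numpy as np

def haar_dwt_level(signal):
    """One filter-bank level: low/high-pass filtering + down-sampling by 2.

    Haar stand-in for the db2 wavelet used in the thesis: the low-pass
    output is the scaled pairwise sum (approximation A), the high-pass
    output the scaled pairwise difference (detail D).
    """
    s = np.asarray(signal, dtype=float)
    A = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # approximation coefficients
    D = (s[0::2] - s[1::2]) / np.sqrt(2.0)   # detail coefficients
    return A, D

def dwt_features(signal, levels=2):
    """Mean and std of each coefficient set at every decomposition level,
    i.e. the feature vector described in Section 2.4.2."""
    feats = []
    A = np.asarray(signal, dtype=float)
    for _ in range(levels):
        A, D = haar_dwt_level(A)             # recurse on the approximation
        feats += [D.mean(), D.std()]
    feats += [A.mean(), A.std()]             # final approximation level
    return feats
```

A constant signal produces zero detail coefficients at every level, which is the smooth/homogeneous texture case the features are meant to flag.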


Figure 2.14: Two-level 2D discrete wavelet transform with approximation A and horizontal D_H, vertical D_V and diagonal D_D detail coefficients at levels 1 and 2.

2.4.3 Histogram of Gradient Magnitude

The histogram of gradient magnitude (hgm) is an image feature that describes the distribution of local shape intensities in an image region I. A gradient image is created using convolution kernels, the Sobel filters,

Sx =
 1  0 -1
 2  0 -2
 1  0 -1 ,  (2.36)

Sy =
 1  2  1
 0  0  0
-1 -2 -1 ,  (2.37)

resulting in G_x = S_x ∗ I and G_y = S_y ∗ I, respectively. The gradient magnitude image is a combination of the gradient images, G_m = |G| = √(G_x² + G_y²). The gradient magnitudes are normalized and distributed in a histogram with n bins, H(G_m, n), as shown in Figure 2.15, where the histogram itself is the image feature descriptor. [22]


Figure 2.15: A histogram of gradient magnitudes H(G_m, n) with 20 bins (n ∈ {1, 2, ..., 20}) that is used as a feature descriptor.
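The whole hgm descriptor fits in a short NumPy sketch: Sobel filtering, magnitude, normalization, histogram. The hand-rolled "valid" convolution and the 20-bin default are illustrative choices; the thesis computed its features through scikit-image.

```python
import numpy as np

SX = np.array([[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]], dtype=float)   # Sobel Sx, Eq. (2.36)
SY = SX.T                                   # Sobel Sy, Eq. (2.37)

def conv2_valid(image, kernel):
    """Plain 'valid'-mode 2D convolution, enough for 3x3 Sobel kernels."""
    k = np.flipud(np.fliplr(kernel))        # true convolution flips the kernel
    h, w = k.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (image[y:y + h, x:x + w] * k).sum()
    return out

def hgm(image, n_bins=20):
    """Normalized histogram of gradient magnitudes H(G_m, n)."""
    gx = conv2_valid(image, SX)
    gy = conv2_valid(image, SY)
    gm = np.sqrt(gx ** 2 + gy ** 2)         # gradient magnitude image
    if gm.max() > 0:
        gm = gm / gm.max()                  # normalize magnitudes to [0, 1]
    hist, _ = np.histogram(gm, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()                # histogram as feature descriptor
```

On a pure horizontal intensity ramp every pixel has the same gradient magnitude, so the whole histogram mass lands in a single bin.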


3 Method

In the following chapter the two proposed methods are presented. The first method is a data classification method with the goal to distinguish between suitable and unsuitable regions to measure pdff in. The second method is a combined classification and regression method where the classification distinguishes between liver and non-liver tissue. The goal of the regression model is to predict the difference d = pdff_mean − pdff_roi between the manual ground truth mean and the fat fraction of the active roi. Thus the data sample output will be d instead of a class label as in the classification case. It is described how training data is acquired from the available data collection. This is followed by a description of the classifier implementation.

3.1 Development Environment

The system was developed in Python 3, with the external libraries scikit-learn [16] and Keras [6] used for the implementation of the learning algorithms. For the image processing methods, scikit-image [24] and PyWavelets [11] were used.

3.2 Data Acquisition

The UK Biobank [1] is a health data resource in which 500,000 people aged 40-69 years have taken part in measurements and analyses of different kinds. The goal of the project was to create an important resource open for research both in academia and industry. In this thesis, full body and liver mri scans were used in the experiments. The data available from the UK Biobank has been marked with roi:s by an operator and measured using AMRA's methods. All possible roi placements within the liver mask area were tested and compared to the average


Figure 3.1: Design of the implementation of data acquisition based on whole body MRI scans and operator placed ROIs. The pipeline blocks are: whole body scans and liver slab → Find Liver Slice; operator ROIs → Liver Region Mask and Non-liver Region Mask; Extract Feature Region → Extract Image Features (feature vector); Compare to Ground Truth PDFF Value (class label). The events within the dotted area are repeated for all positions inside the Liver Region Mask.

fat percentage that had been previously measured. This approach aimed to find all possible roi:s in the liver image, both those that match and those that do not match the average fat percentage. In Sections 3.2.1 - 3.2.3, the parts of Figure 3.1 are described in more detail.

3.2.1 Liver Localization

Acquisition of mlp training data was performed according to Figure 3.1. The input to the system was fat and water separated whole body scans and a liver slab scan, a limited measured volume centered at the liver that is not fat and water separated. The operator placed roi:s were also fed to the system. The liver z-coordinate z_liver in the whole body scan was retrieved from the liver slab scan by transforming its z-coordinate. The z-coordinate was transformed by taking voxel sizes V_z and coordinate system offsets δ into account according to Equations (3.1) and (3.2), where wc denotes the world coordinate system. A one voxel high volume at the z-coordinate gave the 2D liver image in Figure 3.3a that was passed on in the pipeline.


z_wc = δ − V_z/2 − z·V_z (3.1)

z_liver = (z_wc,slab − z_wc,body) / V_z,body (3.2)
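Equations (3.1) and (3.2) amount to mapping a slab voxel index into world coordinates and back into whole-body voxel indices. A direct transcription is below; the function and parameter names are illustrative, and z_wc,body is taken here to be the world coordinate of the whole-body volume at z = 0, which is one plausible reading of (3.2).

```python
def slab_to_body_z(z_slab, vz_slab, delta_slab, delta_body, vz_body):
    """Map a liver-slab z index to a whole-body z index.

    Sketch of Equations (3.1)-(3.2) under the assumption that
    z_wc,body is the world z of the whole-body volume's first voxel.
    """
    # (3.1): slab voxel index -> world coordinate
    z_wc_slab = delta_slab - vz_slab / 2.0 - z_slab * vz_slab
    # (3.1) with z = 0: world coordinate of the whole-body origin
    z_wc_body = delta_body - vz_body / 2.0
    # (3.2): world-coordinate difference -> whole-body voxel index
    return (z_wc_slab - z_wc_body) / vz_body
```

For example, a slab voxel z = 10 with 3 mm voxels and offset 100, mapped into a body volume with 2 mm voxels and offset 50, lands at index (68.5 − 49)/2 = 9.75.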

3.2.2 Binary Masks

To generate data samples describing regions that are suited and regions that are unsuited for roi placement, two binary masks were created. The liver region mask in Figure 3.3b was based on the positions of operator placed roi:s. The mask was created by calculating the distance d from every roi to its closest neighboring roi. By connecting all roi:s with a distance less than c × max(d) (taken over all roi:s), polygons were created that were used to create the binary mask. The creation of the polygons was performed according to Algorithm 1. All pixels contained within the region spanned by the roi:s were known to contain liver tissue. The non-liver region mask in Figure 3.3c was extracted from the right-most third of the image, where the probability of finding liver tissue was low, thus describing areas outside of the liver.

3.2.3 Feature Extraction

For every pixel contained within the masks, a square Feature Region of Interest (froi) was extracted. One example of size 30 × 30 is shown in Figure 3.2. From

Figure 3.2: A 30 × 30 pixel froi extracted from a liver mri image.

these regions the texture and intensity image features described in Section 2.4 were computed. In every froi within the liver region mask, a roi was placed to measure the fat fraction of the region. The roi:s were in most cases smaller than the froi:s. In the method using only roi classification, an acceptance limit


Algorithm 1 Create liver region mask

c = Limiting parameter
L = List of roi:s

for m in L do                          ▷ Calculate distances between roi pairs
    for n in L do
        D(m, n) = ‖L(m) − L(n)‖
        if m == n then
            D(m, n) = ∞
        end if
    end for
end for

M = max(min(D))                        ▷ Find max. value of all closest neighbors

for m in L do                          ▷ Create polygons by drawing lines between roi:s
    for n in L do
        if D(m, n) ≤ c · M then
            Draw line between L(m) and L(n)
        end if
    end for
end for

Fill all polygons
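The distance part of Algorithm 1 can be expressed compactly with NumPy broadcasting. The sketch below stops at selecting the roi pairs to connect; the line drawing and polygon filling are omitted, and the Euclidean distance is assumed (the printed formula reads like a garbled Euclidean norm).

```python
import numpy as np

def roi_adjacency(rois, c=1.0):
    """Pairs of roi:s to connect when building the liver region mask.

    Computes all pairwise Euclidean distances D(m, n), the largest
    closest-neighbor distance M, and returns the index pairs with
    D(m, n) <= c * M, i.e. the polygon edges of Algorithm 1.
    """
    pts = np.asarray(rois, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))    # pairwise distance matrix
    np.fill_diagonal(D, np.inf)              # D(m, m) = infinity
    M = D.min(axis=1).max()                  # max over closest neighbors
    edges = [(m, n) for m in range(len(pts))
             for n in range(m + 1, len(pts)) if D[m, n] <= c * M]
    return edges
```

With three clustered roi:s and one outlier, a small c keeps only the edges inside the cluster, which is exactly how the limiting parameter prunes long connections.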


parameter was set. Region fat fractions within ±0.8 percentage points (based on company experience) of the ground truth fat fraction were classified as suitable regions and given class label 1. All measurements outside the limit were classified as unsuitable regions and given class label 0. In addition, all froi:s outside of the liver mask were classified as unsuitable regions with class label 0. In the method combining both a classifier and a regression model, all roi:s contained by the liver mask were classified as liver regions with class label 1. All regions outside the liver mask were classified as non-liver regions and given class label 0. The difference

d = pdffmean − pdffroi (3.3)

between the manual ground truth mean and the fat fraction of the active roi generated the outputs used in regression model training.

(a) Slice of a whole body mri scan at the z-coordinate of the liver. (b) Binary liver mask based on manually placed roi:s. (c) Binary mask used to generate samples outside of the liver.

Figure 3.3: A liver image, binary liver mask and outside liver binary mask.

3.3 Classification Method

The method for data classification in Algorithm 2 resembled the data acquisition method in some aspects. The only difference was that all roi:s were treated equally; no distinction between liver and non-liver regions was made. When placing and classifying roi:s, every roi position in the liver slice was tested by moving stepwise through the whole image. For every position a froi was created and image features were computed. The features were assigned a class label through the trained classifier; if the roi was placed in a suitable region, i.e. Class = 1, it was used to calculate the mean fat fraction of the liver. When calculating the mean fat fraction, the nine roi:s with the highest probabilities were used. Nine roi:s were used since this is the same number as AMRA uses in analyses. The class probability is the output y from the mlp in Section 2.3.5:

• If y > 0.5, Class = 1,
• If y < 0.5, Class = 0.

Algorithm 2 roi Classification

Find liver slice
for all possible roi positions do
    Extract feature region
    Compute image features
    Classify roi
    if Class == 1 then
        Save roi
        Save class probability
    end if
end for
Calculate mean fat fraction from the 9 saved roi:s with highest probabilities
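Algorithm 2 is essentially a sliding-window scan followed by a top-k selection. A minimal sketch is given below; `classify` and `fat_fraction` are stand-ins for the trained mlp and the pdff measurement, and the froi size and step are the values mentioned in Chapter 4, not fixed parts of the algorithm.

```python
import numpy as np

def measure_liver_fat(liver_slice, classify, fat_fraction,
                      froi_size=14, step=10, n_best=9):
    """Sliding-window version of Algorithm 2.

    `classify(patch)` returns a class probability for a froi patch and
    `fat_fraction(y, x)` measures the roi pdff at that position; both
    are placeholders for the thesis components.
    """
    candidates = []
    H, W = liver_slice.shape
    for y in range(0, H - froi_size, step):
        for x in range(0, W - froi_size, step):
            patch = liver_slice[y:y + froi_size, x:x + froi_size]
            p = classify(patch)        # class probability from the mlp
            if p > 0.5:                # Class = 1: suitable region
                candidates.append((p, fat_fraction(y, x)))
    # mean fat fraction over the n_best roi:s with highest probability
    best = sorted(candidates, reverse=True)[:n_best]
    return np.mean([ff for _, ff in best])
```

The regression variant (Algorithm 3) differs only in what is saved and sorted: predicted d-values in ascending order instead of class probabilities in descending order.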

3.4 Classification and Regression Method

The combined method of liver classification and pdff regression in Algorithm 3 had two steps. Both steps were based on similar neural network architectures. First, a roi was classified as being in the liver or not. Second, the difference d in Equation (3.3) was predicted by the neural network to determine the expected level of agreement with ground truth values. roi:s classified as liver, i.e. Class = 1, and with small d values were used when calculating the mean fat fraction.

Algorithm 3 Liver Classification and Regression

Find liver slice
for all possible roi positions do
    Extract feature region
    Compute image features
    Classify roi
    if Class == 1 then
        Predict d
        Save roi
    end if
end for
Calculate mean fat fraction from the 9 saved roi:s with smallest d


3.5 Network Training

The mlp in Figure 3.4 was implemented as a sequential Keras model, which allows changes to several aspects of the neural network layout. The mlp consisted of one hidden layer with 20 hidden nodes and a f_H = tanh(x) activation function. The output layer with one node used the logistic activation function f_O,c = (1 + exp(−x))^{−1} in classification and f_O,r = x in regression. The loss function that was optimized in the regression model was the squared error function in Equation (2.20). In classification, the cross entropy function in Equation (2.22) was used.

Figure 3.4: Principle sketch of the implemented mlp: an input layer taking the features, one hidden layer with 20 nodes and activation f_H, and a one node output layer with activation f_O.

Optimization of the mlp was performed using the adaptive optimization algorithm Adam [14]. The acquired training data was balanced to achieve a similar distribution between suitable and unsuitable region samples. Balancing was performed by removing redundant samples until the wanted class balance was achieved. The data was processed to have zero mean and a standard deviation of 1. This was achieved by subtracting from each feature the mean of the ensemble and dividing by the standard deviation of the ensemble. The model was fit to the data until no further notable decrease in the value of the loss function was achieved on the validation data.
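The zero-mean, unit-variance preprocessing described above is a one-liner per feature column. A small sketch, with a guard against constant features added as a practical assumption (the thesis does not say how such columns were handled):

```python
import numpy as np

def standardize(features):
    """Zero-mean, unit-variance scaling of a (samples x features) matrix.

    Each column has the ensemble mean subtracted and is divided by the
    ensemble standard deviation, as in Section 3.5. Returns the scaled
    data plus the statistics so the same transform can be applied to
    validation and test data.
    """
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard: leave constant features unscaled
    return (features - mu) / sigma, mu, sigma
```

Reusing the returned `mu` and `sigma` on held-out data matters: fitting the statistics on validation or test samples would leak information into the evaluation.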

3.6 Performance Evaluation

To evaluate the proposed method, evaluations of both the classifier and the sys-tem as a whole were performed.

3.6.1 Classifier Evaluation

Evaluation of the trained classifier is necessary since different aspects of the model are tuned to increase performance. Evaluation is also important when comparing different classifiers to one another to determine which one provides the best model of the data. The primary evaluation metric used in the experiments was the accuracy score for binary classifiers [23], which measures the ratio of correctly classified data samples and shows the overall effectiveness of the classifier. The accuracy score is determined from the confusion matrix in Figure 3.5 by using Equation (3.4). The accuracy score was used when comparing different training sessions to find the best performing parameter setup.

TP = True Positive
TN = True Negative
FP = False Positive
FN = False Negative

Figure 3.5: The confusion matrix for a binary classifier, containing TP, FP, FN and TN samples.

Accuracy = (TP + TN) / (TP + TN + FP + FN) (3.4)

According to [8], the accuracy score has to be applied to all data samples, and thus a score estimation method should be applied to obtain a more statistically correct evaluation of the classifier. To achieve this, the k-fold cross validation method was used to train and validate the classifier on several subsets of the data set in different combinations, as shown in Figure 3.6 (where k = 3). A mean accuracy score was then calculated from these measurements. To keep data that the system has not trained on for testing, a portion of the data was extracted from the dataset before separating it into training and validation data.

Figure 3.6: A 3-fold cross validation scheme that performs training on different parts of the data set. A fourth part of the data is never trained on and is used for testing.
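The hold-out-then-rotate scheme of Figure 3.6 and the accuracy of Equation (3.4) can be sketched together. `train_and_predict` below is a placeholder for fitting the mlp and predicting validation labels; the shuffling seed and 25% test fraction are illustrative assumptions.

```python
import numpy as np

def kfold_accuracy(X, y, train_and_predict, k=3, test_fraction=0.25, seed=0):
    """k-fold cross-validated accuracy after setting aside a test split.

    A test portion is extracted first and never trained on; the rest is
    rotated through k train/validation folds and the mean validation
    accuracy (Eq. 3.4) is returned.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    work = idx[n_test:]                      # test part excluded from folds
    folds = np.array_split(work, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = train_and_predict(X[train], y[train], X[val])
        scores.append((pred == y[val]).mean())   # accuracy on this fold
    return float(np.mean(scores))
```

Accuracy here reduces to the mean of correct predictions, which is exactly (TP + TN)/(TP + TN + FP + FN) for binary labels.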


3.6.2 System Evaluation

The system was evaluated by adding the automatically generated roi:s to the liver slice under investigation and calculating the mean fat percentage. These values were compared to the fat percentages corresponding to the manually placed roi:s. To compare the two approaches, the Bland-Altman [3] method was used. The Bland-Altman graph structure in Figure 3.7 shows the difference between measurements (A − B) against their average (A + B)/2. The plot determines the mean difference of the methods, the bias, and the 95% confidence interval called the limits of agreement (loa). The interpretation of the graph is that a lower bias and narrower limits of agreement indicate more similar measurements and a higher method agreement.

Figure 3.7: Structure of a Bland-Altman plot, with the difference plotted against the mean and horizontal lines for the mean difference and the upper (+) and lower (−) limits of agreement. The plot shows the mean difference between compared methods and the 95% confidence interval.
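The two numbers read off a Bland-Altman plot are straightforward to compute. The 1.96·sd multiplier is the usual choice for the 95% limits of agreement; the thesis does not state the exact multiplier it used, so that constant is an assumption here.

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between two measurement methods.

    Returns the mean difference (bias) and the interval
    bias +/- 1.96 * sd of the differences, the quantities plotted
    in Figure 3.7.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)            # sample standard deviation
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

If one method reads consistently 1 unit above the other with no scatter, the bias is 1 and the limits of agreement collapse onto it.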

4 Results

In the following chapter, the chosen approach to answering the questions stated in Section 1.2 is presented. This is followed by the experimental setup and the experimental results.

4.1 Approach

To determine whether or not a machine learning approach is applicable to measuring liver fat, and how to generate good training data, several parts of the problem have to be investigated. To achieve this, experiments were performed with focus on the following areas:

• Comparing different image features.

• Whether the froi where image features are extracted and the roi where liver fat is measured should have different sizes. This was tested by running trials on different sizes and comparing the outcomes to one another.

• Whether the 2D approach of measuring gives a good enough result or if the measurements have to be expanded to 3D volumes. This was determined by comparing the automatic measurements to the manual measurements.

4.2 Experimental Setup

500 subjects were used in the following experiments. Of these, 400 were used in data acquisition, resulting in model training and validation data. The final 100 subjects were used in method evaluation by performing Bland-Altman analysis and measuring the mean relative error (MRE),


MRE = (1/N) ∑_{n=1}^{N} |pdff_true,n − pdff_mean,n| / pdff_true,n, (4.1)

compared to ground truth values. The ground truth fat fraction values were measured using a different type of mri scan method. This resulted in comparisons between two different scan methods as well as a comparison between manual and automatic measurements.
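Equation (4.1) reduces to a one-line computation. Note one assumption: the absolute value in the numerator matches the positive MRE scores reported in Tables 4.1-4.2, but the extracted text of (4.1) does not show the bars explicitly.

```python
import numpy as np

def mean_relative_error(pdff_true, pdff_mean):
    """Mean relative error (4.1) between ground truth and predicted pdff.

    Written with an absolute value in the numerator (an assumption
    consistent with the positive reported scores); multiply by 100
    to obtain a percentage.
    """
    t = np.asarray(pdff_true, dtype=float)
    m = np.asarray(pdff_mean, dtype=float)
    return float(np.mean(np.abs(t - m) / t))
```

For example, predictions of 9 and 22 against ground truths of 10 and 20 give relative errors of 0.1 each, so the MRE is 0.1 (10%).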

Voxels in the data set had an approximate resolution of 2 × 2 × 3 mm. The roi:s were created using a cylinder with 7 mm radius and 11 mm height in the 3D case, resulting in approximately 3 pixels radius and height. In the 2D case, roi:s had a 3 pixel radius and a 1 pixel height. To reduce computation times during evaluation of the methods, froi:s were placed at 10 pixel intervals. The region sizes are discussed further in Section 4.3. The image features were calculated from both the fat- and water separated images, resulting in two sets of each image feature. During the calculation of glcm features, the input images were quantized to 64 gray levels to lower computational times; no notable loss in feature quality was observed. The step lengths (1, 2, 3, 4) and angles (0°, 45°, 90°, 135°) were used. The hgm values were separated into a 10 bin histogram. In calculations of dwt features, images were decomposed into 2 levels using the mother wavelet db2 from the Daubechies family [15].

4.2.1 Classification Method

Training and validation data samples were extracted as three different types in order to simplify balancing. There were samples describing suitable positions inside the liver (Class = 1), unsuitable positions inside the liver (Class = 0) and samples outside the liver (Class = 2). Samples were balanced to 50%/25%/25% of suitable regions, unsuitable regions and outside liver, respectively. After balancing, all samples of class 2 were given class 0, so that class 0 described unsuitable regions both inside and outside of the liver.

4.2.2 Classification and Regression Method

In the combined method, training and validation samples were extracted as inside liver (Class = 1) and outside liver (Class = 0). Balancing of the data was performed to reach a 50%/50% ratio. Regression model samples were extracted exclusively from inside the liver.


4.3 Feature Region Size

The aim of the first test of the classification method was to determine the froi size to use in further experiments. Experiments were performed on three different froi sizes, shown in Figure 4.2 to visualize the different levels of detail. The 7 × 7 froi is coarse, while the 21 × 21 froi contains more detail. The widths of the froi:s in 2D were multiples of the width of the roi:s. In 3D, a region size of 5 × 5 × 5 was also tested. The results presented in Table 4.1 contain measurements of classification accuracy on validation data at the last epoch of training. The table also presents the results from Bland-Altman analysis on 100 test data sets, where the bias is the mean percentage point difference and the limits of agreement, loa, describe the 95% confidence interval. The column named Input presents the number of input features. In Table 4.1, it can be seen that a 14 × 14 froi produced the lowest bias and loa. Both an increase and a decrease in froi size resulted in higher values of the mean difference and a wider confidence interval. The method agreement measurements, bias and loa, were higher in 3D measurements. The mean relative errors (MRE) were in general lower for small regions.

Since the results from the 5 × 5 × 5 and 7 × 7 × 7 froi:s produce similar values, their respective Bland-Altman graphs are shown in Figure 4.1 for further analysis. The different froi sizes produced similar predictions.

Table 4.1: Results from the froi size experiments using the classification method. For an explanation of the measurements, see Section 4.3.

FROI (px)      Features   Input  Accuracy (%)  Bias (pp)  LoA (pp)  MRE (%)
5 × 5 × 5      i + glcm   28     84.30         0.26       ± 1.63    14.10
7 × 7          i + glcm   28     82.00         0.19       ± 1.76    16.74
7 × 7 × 7      i + glcm   28     83.74         0.22       ± 1.64    14.28
14 × 14        i + glcm   28     81.50         0.13       ± 1.52    17.02
14 × 14 × 14   i + glcm   28     82.24         0.17       ± 1.82    20.14
21 × 21        i + glcm   28     80.50         0.15       ± 1.98    18.97
21 × 21 × 21   i + glcm   28     81.63         0.29       ± 3.82    23.12


(a) Bland-Altman analysis of the 5 × 5 × 5 froi compared to ground truth. The difference is given in percentage points. (b) Bland-Altman analysis of the 7 × 7 × 7 froi compared to ground truth. The difference is given in percentage points.

Figure 4.1: Bland-Altman comparison of 5 × 5 × 5 and 7 × 7 × 7 size froi:s.

(a) 7 × 7 pixel froi placed in a liver image. (b) 14 × 14 pixel froi placed in a liver image. (c) 21 × 21 pixel froi placed in a liver image.

Figure 4.2: Different sizes of froi:s show different detail levels of texture.

4.4 Image Features

The second test of the classification method aimed to study different image features and their impact on the method outcome. Image features were tested exclusively and in combinations, to analyze whether any are complementary when describing image regions. The results are presented in Table 4.2. 3D tests were performed using a 7 × 7 × 7 froi, while the 2D experiments were performed on 14 × 14 froi:s. dwt features were calculated using only one decomposition level in 3D, due to the froi size being too small for a two level decomposition. Setups of image features consisted of intensity (i), intensity and histogram of gradient magnitude (i + hgm), intensity and gray level co-occurrence matrix (i + glcm), and intensity and discrete wavelet transform features (i + dwt). Two different three-feature combinations were also tested, consisting of i + hgm + glcm and i + hgm + dwt features. The setups containing only i + glcm and i + dwt produced the lowest bias and the narrowest loa. Due to these results, those were the only feature setups tested in volumetric measurements. In the volumetric measurements, the i + dwt combination resulted in the lowest MRE and loa of all combinations. In Figure 4.3, roi placements in a sample liver are shown. It can be seen that the feature combinations containing glcm seemed to avoid some smaller textural structures, while all feature combinations avoided the coarser abnormalities. Figure 4.4 shows Bland-Altman graphs (left) of the tests on i + glcm and i + dwt in 2D and 3D. Plots of the relative errors (right), from which the MRE score is calculated, are also shown. What can be noted is that a few relative errors were significantly higher. This was due to roi false positives in unsuited positions and few regions classified as suited positions. This resulted in smaller distributions of roi:s in liver tissue, thus giving false fat fraction values.

Table 4.2: Results from the image feature experiments using the classification method. For an explanation of the measurements, see Section 4.4.

froi (px)    Features          Input   Accuracy (%)   Bias (pp)   LoA (pp)   MRE (%)

14 × 14      i                 4       80.31          0.22        ± 2.31     20.43
14 × 14      i + hgm           24      80.90          0.25        ± 2.28     16.97
14 × 14      i + glcm          28      81.50          0.13        ± 1.52     17.02
14 × 14      i + dwt           32      80.70          0.13        ± 1.54     19.14
14 × 14      i + hgm + glcm    48      81.11          0.16        ± 1.69     17.47
14 × 14      i + hgm + dwt     52      81.05          0.18        ± 1.82     16.86

7 × 7 × 7    i + glcm          28      82.24          0.17        ± 1.82     20.14
7 × 7 × 7    i + dwt           36      80.32          0.22        ± 1.37     13.62
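As an illustration of the glcm features referenced above, the following pure-numpy sketch quantizes a froi and derives a few Haralick-style properties. The offsets, number of gray levels, and exact property set in the thesis are not specified here, so everything below is an assumed example rather than the implementation used.

```python
import numpy as np

def quantize(patch, levels):
    """Map patch intensities onto integer gray levels 0..levels-1."""
    p = np.asarray(patch, float)
    lo, hi = p.min(), p.max()
    if hi == lo:
        return np.zeros(p.shape, dtype=int)
    return np.minimum((levels * (p - lo) / (hi - lo)).astype(int), levels - 1)

def glcm_features(patch, levels=8, offset=(0, 1)):
    """Contrast, homogeneity and energy from a normalized GLCM at one offset."""
    q = quantize(patch, levels)
    dr, dc = offset
    glcm = np.zeros((levels, levels))
    rows, cols = q.shape
    # Count co-occurring gray-level pairs (pixel, neighbor at the offset).
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            glcm[q[r, c], q[r + dr, c + dc]] += 1
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    contrast = np.sum(glcm * (i - j) ** 2)
    homogeneity = np.sum(glcm / (1.0 + (i - j) ** 2))
    energy = np.sum(glcm ** 2)
    return contrast, homogeneity, energy
```

In practice a library implementation such as scikit-image (cited by the thesis for image processing) would be used; the hand-rolled version above only shows what the features measure.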

4.5 Classification and Regression

Testing of the combined classification and regression method was performed on i + glcm and i + dwt features. Both two-dimensional and three-dimensional measurements were tested, using a 14 × 14 pixel froi and a 7 × 7 × 7 voxel froi, respectively. Table 4.3 presents the results of the experiments. The reported accuracy is the liver classification accuracy on the validation data at the final epoch of classifier training. The column named regression loss shows the final mean squared error on the validation data. All tests resulted in bias values close to 0 and similar widths of the limits of agreement (loa). The MRE measurements were also in the same range in all tests except for the 3D i + dwt features, where the MRE was lower. Figures 4.5-4.8 show examples of liver classification and of measurement roi:s selected from the predicted d-values. The last image in each figure shows the final roi positions (blue circles) and ground truth roi:s (green squares). On the right side in Figures 4.6a, 4.7a and 4.8a, some false positive classifications can be noted, mostly located in the spleen. Figure 4.9 shows Bland-Altman graphs (left) and relative error plots (right) of the tests on i + glcm and i + dwt in 2D and 3D. In the relative error graphs it can be seen that the large relative errors occurred at low fat fractions. In Figure 4.9h the majority of errors were below 100% relative error. These results were better than those in Figures 4.9b, 4.9d and 4.9f, where a larger number of measurements had a higher relative error due to more false fat fraction predictions and false positive classifications.

(a) i features (b) i + hgm features

(c) i + glcm features (d) i + dwt features

(e) i + hgm + glcm features (f) i + hgm + dwt features

Figure 4.3: roi placement in the same subject using different combinations of image features and the classification method. Green squares show manually placed and blue circles show automatically placed roi:s.
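The two-network setup evaluated in Table 4.3 (a liver/non-liver classifier and a d-value regressor, both multi-layer perceptrons on froi feature vectors) can be sketched as below. The thesis trained such networks with Keras; this numpy forward pass is only illustrative, and the hidden layer width is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, w1, b1, w2, b2, out_act):
    """One ReLU hidden layer followed by a task-specific output activation."""
    h = np.maximum(0.0, x @ w1 + b1)
    return out_act(h @ w2 + b2)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
identity = lambda z: z

n_features = 36   # e.g. the 3D i + dwt feature vector size from Table 4.3
hidden = 32       # hidden width is an assumption, not taken from the thesis
x = rng.normal(size=(5, n_features))  # five froi feature vectors

# Classifier head: probability that the region is liver tissue.
cw1 = 0.1 * rng.normal(size=(n_features, hidden)); cb1 = np.zeros(hidden)
cw2 = 0.1 * rng.normal(size=(hidden, 1));          cb2 = np.zeros(1)
p_liver = mlp_forward(x, cw1, cb1, cw2, cb2, sigmoid)

# Regressor head: predicted d = pdff_mean - pdff_roi, linear output.
rw1 = 0.1 * rng.normal(size=(n_features, hidden)); rb1 = np.zeros(hidden)
rw2 = 0.1 * rng.normal(size=(hidden, 1));          rb2 = np.zeros(1)
d_pred = mlp_forward(x, rw1, rb1, rw2, rb2, identity)

# The regression loss reported in Table 4.3 is the mean squared error.
d_true = rng.normal(size=(5, 1))
mse = np.mean((d_pred - d_true) ** 2)
```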

Table 4.3: Results from the classification and regression method experiments. For an explanation of the measurements, see Section 4.5.

froi (px)    Features    Input   Accuracy (%)   Reg. Loss   Bias (pp)   LoA (pp)   MRE (%)

14 × 14      i + glcm    28      99.20          6.011       0.03        ± 1.98     28.69
14 × 14      i + dwt     32      99.27          5.155       -0.03       ± 1.67     26.64

7 × 7 × 7    i + glcm    28      99.06          2.651       0.02        ± 2.09     29.75
7 × 7 × 7    i + dwt     36      99.31          1.550       -0.03       ± 1.44     21.72

The results presented in Table 4.4 are from the same tests as in Table 4.3. The difference is that only roi:s with d predictions within a 0.8 percentage point limit are used to calculate the mean fat fraction. Figure 4.10 shows the limited version of Figure 4.9. These tests were performed in order to show how well the method performs when a threshold value for d is used, in the same way as in the method using solely classification. The results showed that limiting the predicted d-values did not greatly affect the method agreement scores. The similar measurements indicate that a majority of the placed roi:s were already within the 0.8 percentage point limit. The measurements using i + glcm features produced slightly wider limits of agreement in 2D, while i + dwt narrowed them. In 3D, i + glcm was unaffected while i + dwt resulted in slightly wider limits of agreement and a higher MRE. Limiting the fat fraction difference predictions removes roi:s, which gives a less distributed placement and a fat fraction calculation more affected by fluctuations; this is why differences between the measurements are seen.
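The selection step described above can be sketched as follows: keep the roi:s with the smallest predicted |d|, optionally discard those outside the 0.8 percentage point limit, and average the remaining fat fractions. The number of roi:s kept (`n_rois`) is a hypothetical parameter, not a value taken from the thesis.

```python
import numpy as np

def select_rois(pdff_roi, d_pred, n_rois=9, limit=None):
    """Mean fat fraction over the ROIs with the smallest predicted |d|.

    If `limit` is given, ROIs whose |d| exceeds it (in percentage points)
    are discarded, mirroring the 0.8 pp acceptance limit in Table 4.4.
    Returns the mean PDFF and the indices of the ROIs that were used.
    """
    pdff_roi = np.asarray(pdff_roi, float)
    d_pred = np.asarray(d_pred, float)
    order = np.argsort(np.abs(d_pred))[:n_rois]
    if limit is not None:
        order = order[np.abs(d_pred[order]) <= limit]
    return pdff_roi[order].mean(), order
```

This also makes the fluctuation effect visible: tightening `limit` shrinks the set of averaged roi:s, so single outlying regions weigh more in the mean.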

Table 4.4: Results from the classification and regression method experiments with a 0.8 percentage point limit on the predicted d-values. For an explanation of the measurements, see Section 4.5.

froi (px)    Features    Input   Accuracy (%)   Reg. Loss   Bias (pp)   LoA (pp)   MRE (%)

14 × 14      i + glcm    28      99.20          6.011       0.02        ± 2.36     29.79
14 × 14      i + dwt     32      99.27          5.155       0.03        ± 1.60     24.19

7 × 7 × 7    i + glcm    28      99.06          2.651       0.04        ± 2.08     29.74
7 × 7 × 7    i + dwt     36      99.31          1.550       -0.03       ± 1.47     22.14


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.4: Bland-Altman analyses and relative error plots from i + glcm and i + dwt in 2D and 3D using the classification method. Note the varying scales on the y-axes.


(a) (b)

(c)

Figure 4.5: Examples of the classification and regression method. (a) Liver classification using two-dimensional i + glcm features. (b) Green squares contain the lowest predicted d-values. (c) Green regions show manually placed roi:s and blue circles show automatically placed roi:s.


(a) (b)

(c)

Figure 4.6: Examples of the classification and regression method. (a) Liver classification using two-dimensional i + dwt features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.


(a) (b)

(c)

Figure 4.7: Examples of the classification and regression method. (a) Liver classification using three-dimensional i + glcm features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.


(a) (b)

(c)

Figure 4.8: Examples of the classification and regression method. (a) Liver classification using three-dimensional i + dwt features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.9: Bland-Altman analyses and relative error plots from i + glcm and i + dwt in 2D and 3D using the combined classification and regression method without limitations on d predictions. Note the varying scales on the y-axes.


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.10: Bland-Altman analyses and relative error plots from i + glcm and i + dwt in 2D and 3D using the combined classification and regression method with a 0.8 percentage point limit on d predictions. Note the varying scales on the y-axes.


5 Discussion

In the following chapter, all experimental results from Chapter 4 are analysed and discussed.

5.1 Feature Region Size

In Table 4.1, the three trials on froi sizes showed that the 14 × 14 region excelled in most of the measurements for 2D. A smaller region (7 × 7 pixels) produced lower classification accuracy and both higher bias and loa. This might be due to the region being too small to capture textural properties in the image. The larger region (21 × 21 pixels), on the other hand, produced results more similar to the 14 × 14 region, but was overall outperformed by it. A large region gives a more averaged measure of textural properties and might not provide the level of locality needed to distinguish between roi positions. For 3D measurements, the 7 × 7 × 7 region was the overall best performing froi size. Since the total number of voxels was higher, a smaller region size provided more image region information with increased locality, thus achieving more accurate results, as seen in Table 4.1.

5.2 Image Features

The experiments on different image features showed similar results. The examples of qualitative results in Figure 4.3 show similar roi placements in the same liver samples. It can be noted that the roi:s in (c) and (e) avoided a certain region containing finer texture, while the other features mostly avoided only the rough texture variations. The images in (a) and (d) indicate that, in this particular case, the dwt features did not contribute to the decision making, since the same placement was achieved using only signal intensity features. The intensity (i) feature produced quantitative results, seen in Table 4.2, in the same value range as the combined features. This indicates that the main contributors to classification accuracy were the signal mean and standard deviation in the fat and water images. Since these values directly correspond to actual fat and water content, it is reasonable to believe that they provided an indication of suitable positions for roi placement. The Bland-Altman analysis showed both a higher bias and wider limits of agreement for i features compared to the combined features, showing that a reasonable classification accuracy does not guarantee the same level of method agreement. The features generating the highest method agreement scores, with the lowest bias and loa, were i + glcm and i + dwt. These were considered the best performing combinations and were used in the subsequent experiments.
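Given fat and water images, the four-dimensional i feature vector (input size 4 in Table 4.2) and the region fat fraction can be sketched as below. The pdff expression F/(F + W) follows the standard definition cited by the thesis [19]; computing it from patch means is our simplification.

```python
import numpy as np

def intensity_features(fat_patch, water_patch):
    """Mean and standard deviation of the fat and water signal in one froi,
    i.e. the four-dimensional i feature vector."""
    return np.array([fat_patch.mean(), fat_patch.std(),
                     water_patch.mean(), water_patch.std()])

def pdff(fat_patch, water_patch):
    """Proton density fat fraction of the region, in percent: F / (F + W)."""
    f, w = fat_patch.mean(), water_patch.mean()
    return 100.0 * f / (f + w)
```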

5.3 Classification and Regression

The liver classification and pdff regression tests in Table 4.3 showed results of a similar order of magnitude as the prior trials. A notable difference is the better method agreement, with lower bias and narrower limits of agreement, for the i + dwt features. This improvement was opposed by an increase in MRE. The liver classification accuracy was high, making the classifier a suitable liver tissue extractor that provides a reasonable segmentation of the liver. Comparing the measurements in Tables 4.3 and 4.4 indicated that setting the acceptance limit parameter to 0.8, as in the first method using classification only, did not significantly increase or decrease the measurement scores. A clear advantage of the tested method was thus that no acceptance limit parameter had to be set.


6 Conclusion

The following chapter presents the concluding remarks of the thesis in relation to the problem statement, as well as suggestions on future work to further improve the method.

6.1 Conclusion

This thesis showed that machine learning is a viable approach to automatically measuring liver fat. It was also shown that, given liver images and manually measured fat fractions, a training data set can be generated from large quantities of MR images. 3D measurements using local image intensity features combined with features calculated from a wavelet decomposed image region resulted in the highest method agreement (lowest bias and loa) compared to manual measurements. For 2D measurements, a 14 × 14 pixel region contained sufficient information to describe local properties, while for 3D measurements a 7 × 7 × 7 voxel region sufficed.
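The wavelet features credited above can be illustrated with a minimal single-level 2D Haar decomposition. The thesis used PyWavelets [11]; the normalization and the choice of subband statistics below are assumptions made for the sake of the example.

```python
import numpy as np

def haar_dwt2(patch):
    """One-level 2D Haar decomposition into LL, LH, HL, HH subbands."""
    p = np.asarray(patch, float)
    a, b = p[0::2, :], p[1::2, :]          # pair up rows
    lo, hi = (a + b) / 2.0, (a - b) / 2.0  # vertical average / detail
    c, d = lo[:, 0::2], lo[:, 1::2]        # pair up columns of each band
    e, f = hi[:, 0::2], hi[:, 1::2]
    ll, lh = (c + d) / 2.0, (c - d) / 2.0
    hl, hh = (e + f) / 2.0, (e - f) / 2.0
    return ll, lh, hl, hh

def dwt_features(patch):
    """Mean absolute value and std of each subband as wavelet texture features."""
    return np.concatenate([[np.abs(s).mean(), s.std()]
                           for s in haar_dwt2(patch)])
```

A flat region yields zero detail energy, so these features respond to local texture in the way the dwt setups in Chapter 4 require.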

The two proposed methods had different advantages and disadvantages that should be taken into consideration. The classification method resulted in lower relative errors (MRE). The method using liver classification and fat fraction prediction (regression) showed a higher agreement with the manual method, while showing larger relative errors. This is probably due to a larger quantity of low fat fraction livers with erroneous measurements: small errors on low fat fractions can result in high relative errors while still maintaining a low bias and narrow limits of agreement.

The greatest advantage of the classification and regression method was that the acceptance limit parameter did not have to be set to a specific value in order to achieve results similar to the manual method. A disadvantage of both methods was that roi:s in some cases were placed in the spleen, resulting in a misleading measurement of the liver fat fraction. As shown by the classification method, a correct segmentation of the liver was not necessary to produce results similar to manual measurements, but could in some cases increase performance.

6.2 Future Work

Future work is necessary in order to achieve a fully automatic method. Below are suggestions on areas that need further investigation to improve the methods discussed in this thesis.

Firstly, more data is available through the UK Biobank; tests using larger amounts of data should therefore be performed to increase statistical certainty. Larger data sets should result in a more precise analysis of the suggested methods. Since the intensity features produced high scores without any textural features, it would be interesting to expand the number of gray value statistical features. For example, intensity histograms can be used to provide more detailed descriptions of intensity distributions.
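One possible concrete form of the suggested histogram features is sketched below; the bin count and value range are arbitrary choices, not recommendations from the thesis.

```python
import numpy as np

def histogram_features(patch, bins=16, value_range=(0.0, 1.0)):
    """Normalized intensity histogram of a froi, usable as an extended
    gray value feature vector alongside the mean and standard deviation."""
    hist, _ = np.histogram(patch, bins=bins, range=value_range)
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)
```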

To remove the false positive classifications outside the liver, an improved liver segmentation should be developed. An improved method may consist of removing positive classifications on the right side of the image, where the liver is not located in most cases. More advanced and precise methods may also be tested. In order to create a fully automated liver fat measurement method, the limitations of this thesis have to be removed, namely by automatically locating the liver position in a whole body mri scan. Since the tested textural features achieved such satisfying results within the liver and in liver tissue classification, they should also suffice for locating the liver, or at least be a relevant starting point for future investigations.


Bibliography

[1] About the UK Biobank. http://www.ukbiobank.ac.uk/about-biobank-uk/, 2018. Online; accessed 22-January-2018. Cited on page 21.

[2] Ethem Alpaydin. Introduction to Machine Learning, Third Edition. The MIT Press, 2014. ISBN 9780262028189. Cited on page 10.

[3] D.G. Altman and J.M. Bland. Measurement in Medicine: the Analysis of Method Comparison Studies. The Statistician, 32:307–317, 1983. Cited on page 29.

[4] Robert W. Brown, Y.-C. Norman Cheng, and E. Mark Haacke. Magnetic Resonance Imaging: Physical Principles and Sequence Design. Wiley, 2014. ISBN 9781118633984. Cited on page 4.

[5] Thomas W. Dixon. Simple Proton Spectroscopic Imaging. Radiology, 153, 1984. Cited on page 9.

[6] François Chollet et al. Keras. https://keras.io, 2015. Cited on page 21.

[7] Gobert Lee et al. Classification of Cirrhotic Liver in Gadolinium-enhanced MR Images. Proceedings of SPIE Medical Imaging 2007, 6514, 2007. Cited on page 3.

[8] M. Kurzynski et al. Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations. Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017, Advances in Intelligent Systems and Computing, 578, 2017. Cited on page 28.

[9] Patrick Ferdinand Christ et al. Automatic Liver and Tumor Segmentation of CT and MRI Volumes using Cascaded Fully Convolutional Neural Networks. 2017. URL http://arxiv.org/abs/1702.05970. Cited on page 4.

[10] Zhenjiang Li et al. Texture-based Classification of Different Single Liver Lesion Based on SPAIR T2W MRI Images. BMC Medical Imaging, 17:42, 2017. Cited on page 4.


[11] G. Lee, F. Wasilewski, R. Gommers, K. Wohlfart, A. O'Leary, H. Nahrstaedt, and contributors. PyWavelets - Wavelet Transforms in Python. Cited on page 21.

[12] Daniel Graupe. Principles of Artificial Neural Networks, 3rd Edition. WorldScientific Publishing Co, 2013. ISBN 9789814522731. Cited on page 10.

[13] Robert M. Haralick, K. Shanmugam, and Its'hak Dinstein. Textural Features for Image Classification. IEEE Transactions on Systems, Man and Cybernetics, SMC-3, No. 6, 1973. Cited on pages 15 and 16.

[14] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980. Cited on page 27.

[15] Michael Misiti, Yves Misiti, Georges Oppenheim, and Jean-Michel Poggi.Wavelets and their Applications. 2006. Cited on pages 18 and 32.

[16] F. Pedregosa et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011. Cited on page 21.

[17] Prasad V. Pottumarthi. Magnetic Resonance Imaging: Methods and Biologic Applications. Humana Press Inc, 2006. ISBN 1597450103. Cited on pages 4, 5, and 6.

[18] Ritu Punia and Shailendra Singh. Automatic Detection of Liver in CT Images Using Optimal Feature Based Neural Network. International Journal of Computer Applications, 76, 2013. Cited on page 4.

[19] Scott B. Reeder, Houchun H. Hu, and Claude B. Sirlin. Proton Density Fat-Fraction: A Standardized MR-Based Biomarker of Tissue Fat Concentration. Journal of Magnetic Resonance Imaging, 36:1011–1014, 2012. Cited on page 9.

[20] Thobias Romu. Fat-referenced MRI: Quantitative MRI for Tissue Characterization and Volume Measurement. Number 1910 in Linköping Studies in Science and Technology. Dissertations, 2018. URL http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-145316. Cited on page 7.

[21] Nicu Sebe and Michael S. Lew. Wavelet Based Texture Classification. Proceedings of the 15th International Conference on Pattern Recognition, 3:947–950, 2000. Cited on page 18.

[22] Monika Sharma and Hiranmay Ghosh. Histogram of Gradient Magnitudes: A Rotation Invariant Texture-Descriptor. Proceedings of the IEEE International Conference on Image Processing, 2015. Cited on page 19.

[23] Marina Sokolova and Guy Lapalme. A Systematic Analysis of Performance Measures for Classification Tasks. Information Processing and Management, 45:427–437, 2009. Cited on page 28.


[24] Stéfan van der Walt et al. Scikit-image: Image Processing in Python. PeerJ, 2:e453, 2014. ISSN 2167-8359. doi: 10.7717/peerj.453. URL http://dx.doi.org/10.7717/peerj.453. Cited on page 21.

[25] Yudong Zhang, Zhengchao Dong, Lenan Wu, and Shuihua Wang. A Hybrid Method for MRI Brain Image Classification. Expert Systems with Applications, 38:10049–10053, 2011. Cited on page 3.