Automatic Computer Aided Diagnosis of Breast Cancer in Dynamic Contrast Enhanced Magnetic Resonance Images
by
Hongbo Wu
A thesis submitted in conformity with the requirements
for the degree of Master of Science
Graduate Department of Medical Biophysics
University of Toronto
© Copyright 2016 by Hongbo Wu
Abstract
Automatic Computer Aided Diagnosis of Breast Cancer in Dynamic Contrast Enhanced
Magnetic Resonance Images
Hongbo Wu
Master of Science
Graduate Department of Medical Biophysics
University of Toronto
2016
Automated Computer Aided Diagnosis (CADx) systems have the potential to improve
the diagnostic accuracy of radiologists. Most CADx algorithms use features generated
from outlined regions to differentiate between benign and malignant lesions. Manually
outlining these regions for the purpose of analysis is not viable and therefore an au-
tomated segmentation method is essential. Our proposed method uses a trained deep
Artificial Neural Network (ANN) to classify overlapping tiles in breast Dynamic Con-
trast Enhanced Magnetic Resonance Imaging (DCE-MRI) images as lesion or non-lesion.
The classified tiles are then grouped into regions. Additional morphological, kinetic and
textural features are computed for each detected region. A cascaded Random Forests
Classifier (RFC) classifies the regions as malignant or benign. Our method was tested on
a dataset containing 71 malignant, 140 benign, and 316 normal studies. Free-response
Receiver Operating Characteristic (FROC) analysis of our method shows 94.4% sensitiv-
ity at 0.12 false positive detections per normal study.
Acknowledgements
Although this thesis is published under one name, there were many people who made
this work possible. This section is dedicated to everyone who helped me get to this point.
First of all, I owe a great deal of gratitude to my supervisor Dr. Anne Martel for her
guidance as well as her easy-going supervision style, which provides the ideal environment
for me to grow as an independent scientist. Similarly, I would like to thank my supervisory
committee members Dr. Philip Beatty who aided me with the engineering aspect and
Dr. Martin Yaffe who provided me with the more clinical perspectives of my work.
I would also like to thank the members of the Martel breast CAD group - Cristina,
Martin, Yingli, Sharmilla, Nikita and Sylvester - for providing me with their constructive
feedback and inspiring discussions.
Finally, I would like to thank my family for the irreplaceable support they have given
me throughout the past few years.
Contents
List of Abbreviations

1 Introduction
  1.1 Breast Cancer Overview
  1.2 Breast Cancer Screening
  1.3 Computer Aided Detection and Diagnosis
    1.3.1 Detection Overview
    1.3.2 Classification Overview
  1.4 Thesis Outline

2 Automated Computer Aided Diagnosis using Deep Learning
  2.1 Introduction and Background
  2.2 Method Overview
  2.3 Dataset
  2.4 Preprocessing
  2.5 Region Selection
    2.5.1 Unsupervised Pretraining
    2.5.2 Supervised Training
    2.5.3 Optimal Region Threshold
    2.5.4 Postprocessing
  2.6 Segmentation
  2.7 Region Classification
    2.7.1 Feature Extraction
    2.7.2 RFC Training
  2.8 Results
  2.9 Discussion
  2.10 Implementation Details and Limitations
    2.10.1 Acknowledgements

3 Discussion and Future Work
  3.1 Significance of Contributions
  3.2 Future Directions
  3.3 Summary of Contributions

Appendices
A Perceptrons
B Lesion Features
C List of Abbreviations

Bibliography
List of Tables
1.1 The Breast Imaging Reporting And Data System (BI-RADS) risk classification system for breast Magnetic Resonance Imaging (MRI) that radiologists can assign to an exam. Table adapted from American College of Radiology.
2.1 A breakdown of our data into training and testing sets.
2.2 A breakdown of BI-RADS category for our training data.
2.3 Statistics of our proposed method on the training set and testing set. The measures were computed after applying both RFC1 and RFC2 classifiers and provide a rough estimate of how well our algorithm does in practice.
2.4 A breakdown of the performance on the testing set with respect to BI-RADS category.
List of Figures
1.1 A slice containing a malignant lesion in a series of DCE-MRI volumes. The red box indicates the location of a malignant lesion. (a) Processed maximum intensity projection image of the subtraction to show enhancing lesions and blood vessels. (b) A series of DCE-MRI images at 5 time points. The first image is the baseline before the contrast agent injection.
1.2 Examples of (a) mass and (b) non-mass lesions in subtracted DCE-MRI.
1.3 Kinetic enhancement map generated by Merge CADStream Software. (Adapted from http://www.axisimagingnews.com/2010/07/better-breast-imaging/)
1.4 Flowchart of our automated CADx pipeline. After preprocessing each image, we apply our detection algorithm to classify all the detected regions as benign or malignant and present the resulting regions to the radiologist for diagnosis.
2.1 (a) An 8-neighbourhood connection scheme is used to divide the rendered 4D DCE-MRI matrix into overlapping image tiles of size 5 × 1 × 3 × 3 (5 time points, 1 slice, 3-by-3 voxel window). (b) Each tile is then flattened to a 1D input vector of size 45 for use in training and classification by our ANN.
2.2 (a) Architecture of Deep ANN with 45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax output nodes. (b) Stacked Denoising Autoencoder (dAE) used to initiate the network. The first dAE uses a tanh while the second dAE uses a sigmoid as the encoding function. The dashed arrow shows the path with respect to the original network.
2.3 Result of conditional dilation operation to join disconnected islands together. Left is the subtraction image showing a 2D slice of the lesion. Middle is the segmentation without dilation and right is the segmentation with dilation.
2.4 A schema of the cascaded RFC. The first RFC classifies lesion and non-lesion regions while the second RFC differentiates the resulting lesions as malignant or benign.
2.5 Illustrated examples of features learned by our ANN. (a) 2D representation of first hidden-layer network weights. (b) The value of each row is averaged and plotted on a graph.
2.6 Aggregated Receiver Operating Characteristic (ROC) curve of the lesion classifier (RFC1) for each of the 10-fold cross-validation. RFC1 achieved 0.91 Area Under the Curve (AUC) (0.91-0.94 interquartile range). The optimal threshold value of 0.6 was selected to maximize the sensitivity and specificity.
2.7 Aggregated ROC curve of the malignant/benign classifier (RFC2) for each of the 10-fold cross-validation. RFC2 achieved 0.81 AUC (0.80-0.85 interquartile range). The optimal threshold value of 0.63 was selected to maximize the sensitivity and specificity.
2.8 Examples of false positive misclassifications by our algorithm. The top row shows mild background parenchymal enhancements misclassified as malignant lesions. The bottom row shows a lymph node detected as malignant.
2.9 Image of the missed malignant lesion in the test data set (circled in red). The lesion resembles background enhancement.
3.1 Diagram of a Convolutional Neural Network (ConvNet). The 2 convolution layers act as a feature extractor without segmentation, while the fully connected layers act as classifiers. The segmentation and classification steps are in essence merged as a single classifier. Image adapted from http://parse.ele.tue.nl/education/cluster2.
A.1 Diagram of the perceptron unit. It computes the weighted sum of its inputs as activation and proceeds to fire a signal if a threshold is passed.
Chapter 1
Introduction
1.1 Breast Cancer Overview
Breast cancer is one of the most commonly diagnosed cancers among women in Canada.
It is currently the second leading cause of cancer death in women, resulting in an esti-
mated mortality rate of 17.9 per 100,000. Statistics show that 1 in 9 women is expected
to develop breast cancer over her lifetime and 13.6% of the women who have breast can-
cer will eventually succumb to it [44]. The mortality rate has been steadily declining over the
past few decades due to a combination of screening programs and improved treatment
[43].
The breast is an organ containing a network of ducts and lobules responsible for milk
production. The majority of cancers diagnosed are carcinomas arising from the ductal
and lobular epithelial cells of the breast. Breast cancer is categorized into 2 distinct types:
invasive and in situ carcinoma. In situ carcinomas may arise from the ducts or lobules
of the breast and are contained within their respective epithelium. When these cancer
cells proceed to invade the outer membrane beyond the epithelium, they are considered
invasive carcinomas. These types of cancers have a higher chance of metastasis.
Common approaches for treating breast cancer include chemotherapy, radiotherapy,
hormonal therapy, surgery (e.g. mastectomy, lumpectomy), or potentially a combination
of these procedures. Anatomical imaging can be used as a non-invasive way to detect
tumours and assess their size and progression. In vivo imaging has therefore become an
irreplaceable part of cancer diagnosis and treatment. The 3 most common modalities
used in breast imaging are X-ray mammography, Ultrasound (US), and MRI.
Currently, the standard for general population breast cancer screening is X-ray mam-
mography. A retrospective study following patients over a 2 year period shows that the
Ontario Breast Screening Program (OBSP) has 86.1% sensitivity, 93.1% specificity, and
6.5% Positive Predictive Value (PPV) after recall exam[38]. Research into Computer
Aided Detection (CADe) systems within the clinical workflow has shown that the sen-
sitivity can be further increased without loss in PPV [16]. Nevertheless, mammography
still struggles with detecting cancers in younger women who have denser breasts. MRI is
an alternative imaging modality that is known for having the highest sensitivity. Despite
this, it suffers from moderate specificity [27]. Studies have suggested that CADx systems
can help improve the diagnostic accuracy of DCE-MRI [7]. Current CADx systems on
the market for breast MRI rely on human interaction for detecting lesions and will not
increase the overall cancer detection rate. A fully automated CADx system can there-
fore have the potential to increase both the sensitivity (detection rate) as well as the
diagnostic accuracy of lesions.
This thesis presents a fully automatic lesion detection and classification algorithm that
can diagnose both mass and nonmass lesions in DCE-MRI. The key contribution of this
work is a proposed automated CADx system for high-risk breast cancer screening. In the
following sections of this chapter, we present an overview of breast cancer screening in
Canada and highlight the various CADe and CADx algorithms described in the literature.
1.2 Breast Cancer Screening
Cancers detected in their early stages tend to be easier to treat and have potentially fewer
complications, which may lead to a better prognosis for patients [2]. Thus, early detection
and monitoring of treatment response is important for improving patient survival rate.
Consequently, government agencies have introduced screening programs as part of the
healthcare system. The current standard for population-wide breast cancer screening is
X-ray mammography. This procedure typically involves compressing each breast between
2 plates while X-ray images are taken in 2 different planes. As X-rays travel through
the breast, various tissues will absorb the energy differently. This difference translates
directly to brightness in the resulting image. Since tumours have higher attenuation
than fatty tissue, they show up brighter in mammograms. Problems arise in denser
breasts where tumours have a greater risk of being occluded by the parenchymal tissue
in the resulting 2D image. A 3D method such as tomosynthesis is known to minimize this
problem by imaging at multiple angles. In the case of abnormal findings during screening,
subsequent US or MRI exams might be arranged before making a final diagnosis.
Breast US imaging offers a relatively cheap radiation-free diagnostic modality. It
involves a hand-held transducer that sends sound waves through the breast and detects
the resulting echoes. A common application of US within breast cancer diagnosis is the
detection of cysts. Water is known to have a lower acoustic attenuation than fat and
tissue. This means that it will reflect fewer acoustic waves and will therefore appear darker
in US images. Since cysts are typically filled with fluid, they are easily recognized by their
characteristic dark appearance. So in conjunction with X-ray mammography, radiologists
can rule out benign lesions such as cysts. A disadvantage of conventional breast US is the
reliance on a hand-held device, which makes image quality dependent on the technologist
operating it. Some clinics offer Automated Breast Ultrasound (ABUS) as an alternative
screening modality for women with dense breasts. The ABUS uses a robotic device
instead of the hand-held transducers in conventional US, which allows for faster and
more consistent imaging. Furthermore, the ABUS generates a 3D reconstruction of the
images unlike conventional US where only 2D images are taken. A study measuring the
accuracy of ABUS imaging has found that it approaches 95% sensitivity for malignant
and 66% for benign masses [10]. It has been suggested by [22] that ABUS in conjunction
with mammography could reach the same sensitivity as MRI.
The OBSP recommends that women undergo routine breast screening between the
age 50 and 74. While this is sufficient for the majority of the population, evidence
suggests that a certain group of women can have up to an 85% chance of developing breast
cancer over their lifetime [51]. The increased risk means that overall, women in this
group tend to develop breast cancer at a younger age and will therefore have denser
breasts. Mammographic images of breasts with high density have a higher chance of
missed lesions due to occlusion [26]. On the other hand, the issue of breast density does
not affect MRI images as much. Consequently, the OBSP initiated an annual high-risk
screening program for women between the ages of 30 and 69 at high risk of developing
breast cancer. Women included in this program must have at least one of the following
criteria: (1) carriers of BRCA1/2 mutation, (2) did not undergo genetic assessment
but have first-degree relatives with such mutation, (3) have a lifetime risk greater than
25% based on genetic assessment, or (4) have received chest radiation before the
age of 30 and at least 8 years ago. For these women, the primary diagnostic screening
modality is DCE-MRI, which has been shown to have higher sensitivity compared to
X-ray mammography and breast ultrasound [9]. In fact, clinical studies have shown
that among the 3 modalities used, MRI was the only modality able to detect all of the
invasive cancers in their screening population [12]. This might be due to the fact that in
mammography, breasts with high amounts of dense tissue have a greater risk of occluding
smaller cancers. A 3D imaging method such as breast tomosynthesis could reduce this
problem. However, the lack of availability makes it difficult to implement in the current
population-wide screening program.
Recent studies have acknowledged MRI as an invaluable tool for detecting cancer in
women at high risk. The typical MRI breast screening exam involves many different MRI
sequences to produce T1-Weighted (T1W), T2-Weighted (T2W), Fat Saturated (Fat-Sat)
images, Diffusion MRI, and DCE-MRI images. The main diagnostic tool for radiologists
is the analysis of DCE-MRI through the flow of contrast agents. The procedure involves
first taking a pre-contrast image of the breast. Then, a gadolinium-based contrast agent
is injected into the bloodstream and images are taken periodically afterwards to assess
the flow of the agent through the breast. Since aggressive tumours are known to have
very permeable membranes, the contrast agent flows into and out of the tumour more
readily compared to other tissues. Due to the paramagnetic properties of gadolinium
in the contrast agent, areas containing gadolinium will show up brightly in T1W MRI
images indicating a suspicious lesion. Figure 1.1 shows the enhancement of a lesion within
Figure 1.1: A slice containing a malignant lesion in a series of DCE-MRI volumes. The red box indicates the location of a malignant lesion. (a) Processed maximum intensity projection image of the subtraction to show enhancing lesions and blood vessels. (b) A series of DCE-MRI images at 5 time points. The first image is the baseline before the contrast agent injection.
a slice of DCE-MRI volume over 5 time points. In order to make lesions easier to detect,
Maximum Intensity Projection (MIP) and subtraction images are often generated as part
of the protocol.
While MRI is known for achieving the highest sensitivity in detection, [13] shows that
contrast enhanced X-ray modalities can achieve detection rates equivalent to MRI. This
implies that the main factor for the detection of lesions is the presence of contrast agent
enhancement. Indeed, it is well studied that more malignant tumours tend to induce
rapid vasculature growth (a process called angiogenesis) which allows contrast agents to
permeate the tissue more readily [25]. These contrast enhanced X-ray
modalities are less widely used, however, due to the increased radiation dose as well as the
high contrast agent dose, which make them unsuitable as a screening modality.
A commonly used standard for reporting MRI exams is BI-RADS. The lexicon pro-
vides descriptors for lesions, Background Parenchymal Enhancement (BPE), as well as
criteria for categorizing the likelihood of cancer. Enhancements that take on a distinct
shape within an area are considered mass lesions, while enhancements distributed over
multiple areas without a distinct shape are categorized as non-mass lesions. These lesions are further described by
morphological (e.g. shape, margin), texture (e.g. internal enhancement patterns), and
kinetic (e.g. wash-out) features. Examples of mass and non-mass lesions are depicted in
Figure 1.2. A third category called foci describes enhancing regions that are too small
Table 1.1: The BI-RADS risk classification system for breast MRI that radiologists can assign to an exam. Table adapted from American College of Radiology.
to accurately characterize with respect to their margins. These regions can be correlated
with T2W imaging to rule out the presence of lymph nodes. Foci that do not appear
bright in T2W images are likely to be looked on with suspicion and might require biopsy
if they are observed to have increased in size at a follow-up exam. After considering
all these findings, the radiologist then assigns a score based on the likelihood of cancer.
The full BI-RADS cancer risk classification system for breast MRI is listed in Table 1.1.
Statistical analysis of biopsies performed at a clinic shows that BI-RADS 3 has a positive
predictive value of 3% while BI-RADS 4 and 5 are at 23% and 92% respectively [28].
To put this into perspective, close to half of the biopsies performed were of BI-RADS 3.
The large number of BI-RADS 3 biopsies performed, along with the low PPV, means that
most of the negative biopsies performed belong in this group.
Figure 1.2: Examples of (a) mass and (b) non-mass lesions in subtracted DCE-MRI.
1.3 Computer Aided Detection and Diagnosis
It is believed that computer assistance within the clinical workflow can help improve
the radiologists’ diagnostic accuracy. Computer assistance can be categorized as CADe,
where the main goal is the detection of lesions, and CADx, where the system attempts to
differentiate benign and malignant lesions. While there are no automated CADx systems
on the market, manual and semi-automatic CADe systems for mammography are already
being integrated into the clinical workflow. In clinical practice, the CADe system is
applied after the primary radiologist finishes examining the image [24]. This essentially
allows the CADe system to bring attention to regions that the radiologist might have
overlooked. Such systems are commonly used in clinical practice as a second reader. On
the other hand, CADx systems attempt to diagnose any detected lesions as malignant or
benign. Several factors have been identified for CADx systems to be successfully adopted
into wide clinical practice [47]. A CADx system should improve the radiologist’s performance,
save time, be seamlessly integrated into the workflow, be cost-saving, and should not
Figure 1.3: Kinetic enhancement map generated by Merge CADStream Software. (Adapted from http://www.axisimagingnews.com/2010/07/better-breast-imaging/)
impose liability concerns.
While CADe systems are currently in use for mammography screening exams, a study
done in the United Kingdom has shown that CAD systems for mammography only offer
marginal improvements of 1% in sensitivity over a single-reader radiologist while taking
almost twice as long (45 seconds) [24]. On the other hand, CADx systems can have
huge cost-savings potential within MRI imaging. Since each DCE-MRI volume usually
contains hundreds of images at various time points, the time required to analyze DCE-
MRI volumes is much longer compared to other modalities. Moreover, the majority of
findings in these exams have a high chance of being false positives after biopsy. Therefore,
it has been suggested by [41] that employing CADx systems as an additional diagnostic
tool can improve a radiologist’s diagnostic accuracy and thereby reduce the number of
unnecessary biopsies. Current breast MRI CADx systems are able to provide overlays of
image features such as kinetic enhancement parameters and time-intensity curves, which
facilitates the diagnosis procedure by making the MRI images easier to comprehend.
Figure 1.3 shows an example of a commercial breast MRI CADx software in action.
There exists a rapidly growing body of literature on CADx. A study by [36] demon-
strated that all the clinicians regardless of skill or experience were able to outperform an
expert MRI radiologist with the help of a CADx system. However, the authors were not
able to demonstrate improvements in detection rate since semi-automated segmentation
was used in their study. This motivates the development of an automated CADx system
which can potentially improve both the detection rate and diagnostic accuracy.
1.3.1 Detection Overview
The first step for any CADx system is to localize any suspicious regions. To this end,
many types of segmentation algorithms have been developed in the medical imaging
literature. Classical computer vision methods use various combinations of thresholding
and mathematical models to segment images. Naïve methods of segmentation include
seeded region growing and automated thresholding. There are also many mathematical
models such as clustering and active contour models developed to capture the outline of
lesions.
The naïve approach to segmenting an image is to define a lower and upper intensity
threshold. The regions that are within the defined thresholds will be highlighted. How-
ever, in medical images such as MRI, the variation in image intensity between patients
makes it difficult to assign a single pair of threshold values for every lesion. Attempts
have been made to automate the selection of thresholds using various mathematical mod-
els. One example is Otsu's Method for threshold selection [52]. Otsu's Method attempts
to find a threshold that minimizes the intra-class variance, defined as the weighted sum of
the variances of the 2 classes. The algorithm steps through each possible intensity value
and calculates the variance between the pixels above and the pixels below that candidate
threshold. The intensity value that maximizes this between-class variance (which is
equivalent to minimizing the intra-class variance) is selected as the segmentation threshold. Many of the
methods described in literature use some type of thresholding within their algorithm.
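To make this concrete, the following is a minimal sketch of Otsu-style threshold selection, assuming NumPy is available; the number of candidate thresholds is an arbitrary illustrative choice rather than a value from any cited study.

```python
import numpy as np

def otsu_threshold(image, n_candidates=256):
    """Return the threshold maximizing the between-class variance
    (equivalently, minimizing the intra-class variance)."""
    values = image.ravel().astype(np.float64)
    candidates = np.linspace(values.min(), values.max(), n_candidates)[1:-1]
    best_t, best_score = values.min(), -np.inf
    for t in candidates:
        below, above = values[values <= t], values[values > t]
        if below.size == 0 or above.size == 0:
            continue
        w0 = float(below.size) / values.size
        w1 = 1.0 - w0
        # Between-class variance: weighted squared distance between class means.
        score = w0 * w1 * (below.mean() - above.mean()) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Usage: binary_mask = image > otsu_threshold(image)
```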
A localized approach to segmentation is the Seeded Region Growing (SRG) algorithm.
This method uses manually or automatically planted seeds to segment an image. The
segmentation process starts out from the seed position and extends to neighbouring
regions based on a selection criterion. Just like the threshold method, it is difficult to
select a suitable criterion for a threshold that will work for all lesions in every image.
If segmentation is done using a single criterion, lesions will likely be over- or under-
segmented. Therefore, an adaptive threshold selection is often used in conjunction with
SRG to produce a more accurate segmentation. An adaptive SRG method was used
by [8] to find the contour of a mass lesion. A human-delineated region of interest was
necessary as a preprocessing step to restrict the range over which the algorithm operates.
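As an illustration of the basic (non-adaptive) idea, a minimal seeded region growing sketch in Python is shown below; the fixed intensity tolerance is a hypothetical parameter and, as noted above, a single value is unlikely to suit every lesion.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tolerance=50.0):
    """Grow a region from a (row, col) seed, accepting 4-connected neighbours
    whose intensity is within `tolerance` of the seed intensity."""
    rows, cols = image.shape
    seed_value = float(image[seed])
    region = np.zeros((rows, cols), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if abs(float(image[nr, nc]) - seed_value) <= tolerance:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region
```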
The watershed segmentation algorithm is analogous to a flood simulator [53]. This
method treats a grayscale image as a topographic map wherein water is poured from
certain points. A gradient map is built from the image with each pixel corresponding to
the intensity change with respect to its neighbouring cells. These gradients form what is
called a basin where water gathers. The edge of the basins will become watershed lines
used to segment the image. The point of entry for water can be manually assigned or
automatically generated based on the unique features of points of interest (e.g. morphol-
ogy of lesions). The watershed algorithm was used in [14] to segment breast lesions in
DCE-MRI images on a slice-by-slice basis.
Fuzzy C-Means (FCM) is a cluster-based segmentation algorithm that groups a num-
ber of data points into c classes. Unlike traditional clustering where each data point can
only correspond to a single class, fuzzy clustering allows the data point to have a degree
of membership amongst different classes. A matrix is built to store the membership in-
formation of each data point. The matrix is then modified iteratively to minimize the
cluster membership error of each data point. For example, with grayscale images, the
pixels of the image will be clustered based on similar intensity values (e.g. by minimizing
the difference in intensity). This method was proposed for the segmentation of breast
lesions in DCE-MRI by [11]. The proposed method requires a human to first select an
ROI containing a lesion. The region is then normalized using the post-contrast T1W
image intensity at subsequent time steps. The FCM is applied to the enhanced region
to categorize the voxels into lesion and non-lesion. Final post-processing is done to take
into account necrotic regions and reduce false positive regions such as blood vessels.
The Gradient Vector Flow (GVF) or snake is a method that distorts a curve in order
to fit the outline of an object. The curve is distorted through the interaction between
internal and external energy functions. The external energy function is based on an
intensity gradient that is minimized when the curve is at the desired edge whereas the
internal energy function forces smoothness of the curve. Thus, the edge of the object
is found by minimizing the sum of the internal and external energies. A combination
of FCM and this method was used by [39] for segmenting and extracting morphological
features from breast lesions. First, 5 volume subtraction images were generated using
the pre- and post-contrast MRI data. For each lesion a representative MRI slice with the
greatest contrast was selected by an operator and a Region Of Interest (ROI) box was
placed around the lesion. Next, a crude contour was drawn around the lesion and the
GVF algorithm was applied to outline the boundary of the lesion.
While the methods described so far operate on the raw intensity values,
other studies have attempted to segment lesions based on computed features. For in-
stance, a mean intensity projection image was generated to detect enhancing regions
within a volume of interest [15]. Then, various dynamic (e.g. mean, standard deviation)
and texture features (e.g. kurtosis, entropy) were extracted and statistically analyzed to
determine how much each feature contributed to the lesion detection. The authors found
that the standard deviation and maximum mean intensity projection features had the
highest diagnostic accuracy at 90% detection rate. Other studies attempted to segment
lesions based on various kinetic models of the contrast agent. [21] attempted to segment
Invasive Ductal Carcinoma (IDC) by applying time series analysis to a linear dynamic
system model of the contrast enhancements. The authors report 100% sensitivity and
90% accuracy in detecting IDC cancers with this method. While the method had high
sensitivity, only 24 cases were studied and no specificity or false positive rate was reported
by the authors.
A different approach to image segmentation is to treat it as a classification problem
in which pixels are divided into object and background class. Classifiers essentially try
to learn a decision boundary separating the object and background classes based on a
given dataset.
A RFC approach was explored by [18] in which lesion segmentation was treated as
a blob detection problem. The method first computed various Hessian-based blob-like
features as well as kinetic enhancement values of the image and then trained an RFC
to classify voxels with blob-like features as lesions. A false positive removal stage using
another trained RFC was later performed to reduce the number of false candidate regions.
The RFC is essentially an ensemble of decision trees. The method operates by building
many decision trees, each trained on a randomized subset of the samples and features. Classification
is then achieved by querying each tree in order to obtain a majority vote for a target class. An inherent
property of RFCs is the automatic ranking of the importance of the various features in the
training data. Analyzing these rankings could give unique insights into how each feature
affects the diagnosis.
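As an illustration of how such a classifier is built and queried, a minimal scikit-learn sketch is given below; the feature matrix, labels, and number of trees are placeholders for illustration only and are not taken from [18] or any other cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical per-voxel features (e.g. Hessian blobness, wash-in, wash-out)
# and lesion/non-lesion labels; placeholders only.
X_train = rng.rand(1000, 5)
y_train = rng.randint(0, 2, size=1000)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Majority vote across the trees, expressed as a class probability.
lesion_probability = forest.predict_proba(rng.rand(10, 5))[:, 1]

# Built-in ranking of how much each feature contributed to the decisions.
print(forest.feature_importances_)
```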
On a different note, the ANN is a graphical model inspired by biological neural networks.
Just like the brain, the ANN is organized into layers composed of many neuron-like
processing units called perceptrons. A detailed description of the structure of perceptrons
is included in Appendix A. In general, each individual perceptron unit in a layer takes as
input the weighted sum of the outputs of the previous layer. Through repeated exposure to
examples, the network of perceptrons can adapt its weights to capture the distribution
of the provided data. The use of a single hidden layer ANN model was explored by [32]
to detect breast cancer in DCE-MRI by attempting to model pharmacokinetic curves. A
similar approach was used in [6] in which the signal-time curve was classified in an attempt
to identify malignant, benign, and normal tissues. The authors were able to achieve 92%
accuracy on 34 test cases. The authors in [42] used a trained ANN to adaptively modify
the threshold of an SRG method to segment lesions in X-ray mammography images. An
ROI was accepted as input and the ANN was then used to initialize the seed point for the
SRG algorithm. The method achieved an average accuracy of 82% and 95% on 2 different
public mammography datasets.
We have summarized a few of the many different types of algorithms proposed in lit-
erature for the detection of breast lesions. Most of the aforementioned methods involve
a manual delineation of ROI in order to reduce computation time and increase accuracy.
Furthermore, some algorithms will not work correctly if the input ROI provided is too
large. For instance, if the initial contour for the GVF was drawn too far from the lesion,
the GVF might fail to find the object or outline a different object. Methods such as
Otsu’s thresholding are highly dependent on the ROI parameters (i.e. image intensity,
region size) so changes in size or location of the ROI can give varying results. Therefore,
automated algorithms which do not require manual ROI selection are desirable for en-
suring robustness of segmentation while minimizing observer variabilities. The drawback
of many classical segmentation algorithms is the need for human interaction (whether to
provide ROIs or seed points). On the other hand, statistical machine learning methods
such as Support Vector Machines (SVMs) and RFCs function by classifying precomputed
features while the biology-inspired ANNs excel in directly learning from the data. With
respect to automated segmentation, the machine learning based algorithms show greater
potential in achieving the functionality required for automated CADx systems. Although
there are many advantages to using machine learning based methods, the performance
is directly related to the amount of high quality data, which is not always available in
abundance.
1.3.2 Classification Overview
The next step for a CADx system is to classify each of the detected lesions as a possible
malignant cancer or benign tissue (such as cysts). While there exists a rapidly growing
body of literature on CADx, most of these methods rely on a machine learning algorithm
to classify lesions based on features extracted from the image. Studies have shown that
a combination of morphological, texture, and kinetic features can be used to distinguish
between benign and malignant lesions [17]. It is thought that computer analysis of
these features could be used as an aid to improve the radiologist’s ability to differentiate
benign and malignant cancers. Since malignant cancers require more nutrients than
normal tissue, it follows that lesions corresponding to malignant tumours will have higher
amounts of contrast agent flow. Pharmacokinetic modelling based on time-intensity data
can be used to characterize the malignancy of the lesion. In one study, an FCM algorithm
was used to cluster voxels based on time-intensity curves [50]. Morphological and textural
features were then used to classify the resulting regions as benign or malignant. The
authors noted that combining morphological and kinetic features proved to be more
robust when differentiating benign and malignant lesions.
The use of a combination of kinetic, morphological, and spatiotemporal features was
proposed by [4]. A histogram based threshold was applied to select enhancing regions
while kinetic and morphological filters were applied to reduce the number of false positive
regions. The authors then used an SVM to classify the extracted series of morphological,
kinetic, and spatiotemporal features of each region as benign or malignant. Generally,
SVMs attempt to find the optimal decision boundary in a multidimensional feature space
such that the orthogonal distance between the boundary and closest training data points
(known as support vectors) is maximized. Various kernels such as a Radial Basis Func-
tion can be used to transform the data before learning the boundary point in order to
improve separability of the classes. A semi-automatic method was proposed by [31]. The
algorithm involves having a user draw ellipses on a suspicious lesion and a non-lesion nor-
mal region. The voxels within the lesion region are assigned as the positive sample and
training is done on the fly for each case. A bounding box is drawn around the selected
samples and then the trained SVM is used to classify all the voxels within the bounding
box. [33] applied SVMs to differentiate invasive and non-invasive cancers in DCE-MRI
based on the signal intensity-time metrics. The authors first computed various voxel-
based kinetic features such as wash-out slope and area under the curve for each voxel
(representing contrast agent concentration), and used the SVM to classify the voxels as
potential lesions. The method had 72% sensitivity and 98% specificity on 26 malignant
and benign lesions. The poor sensitivity could be attributed to the limited dataset the
authors used.
Various studies have also explored the use of ANNs for the classification of breast
lesions. The authors in [36] used a semi-automated region growing method to generate
ROIs in DCE-MRI images. Then the shape, texture and kinetic enhancement of each
region was computed and classified using an ANN. Operating on a set of 43 malignant
and 37 benign lesions, the algorithm achieved an AUC of 0.97. Likewise, ANNs have been
used for the classification of lesions in X-ray mammography [42]. First, a cellular neural
network was used to automatically segment the lesion. Intensity, shape, and texture
features of the lesion were computed and classified using a simple ANN. The algorithm
was able to achieve 96.87% sensitivity and 95.94% specificity in diagnosing mass lesions.
As a final note, our lab has previously shown that a cascaded classifier can improve
on the performance of a single classifier in differentiating lesions [17]. In this study, a
cascaded RFC was used to determine lesion malignancy. Lesions are first categorized as
mass and nonmass using a combination of different kinetic, morphological and texture
features. A second RFC is then used to classify the lesion as benign or malignant.
Improvements in performance were noted for the cascaded RFC compared to the single
RFC. This phenomenon seems consistent with the idea of boosting in which an ensemble
of weaker learners can create a strong classifier. We will exploit this concept as part of
our proposed automated CADx system.
1.4 Thesis Outline
The remainder of this thesis is dedicated to describing the implementation of an auto-
mated CADx pipeline for breast cancer diagnosis. An overview of the proposed pipeline
is summarized by Figure 1.4. For each case to analyze, we first apply the necessary
preprocessing steps such as motion correction, breast segmentation and contrast normal-
ization in order to provide a common framework for our algorithm. We then apply a
trained ANN to each 3 × 3 patch (over 5 time points) to create a map of the probability
that each patch belongs to a lesion. The resulting map is then binarized to create regions
of interest. After that, we analyze the kinetic, morphological, and textural features of
each region in order to classify it as benign or malignant. A list of the resulting regions
will then be provided to the radiologist to aid in diagnosis.
Chapter 2 presents our proposed pipeline in more detail. The particulars surrounding
the training of each classifier are elucidated and the relevant statistics are reported in
this chapter. Furthermore, I will provide some insights into the design decisions made
concerning the architecture of our pipeline as well as discuss some limitations and ways
to accommodate them.
Chapter 3 summarizes the contribution of this thesis and emphasizes the significance
of our proposed CADx pipeline in the context of breast cancer screening. I will then
discuss some of the mistakes our algorithm has made and finally conclude this thesis
by presenting some ideas for future work.
Figure 1.4: Flowchart of our automated CADx pipeline. After preprocessing each image, we apply our detection algorithm to classify all the detected regions as benign or malignant and present the resulting regions to the radiologist for diagnosis.
Chapter 2
Automated Computer Aided
Diagnosis using Deep Learning
2.1 Introduction and Background
Breast cancer is currently one of the most commonly diagnosed cancers among women. Evidence
suggests that early screening and treatment reduce the incidence of advanced-stage breast
cancer in certain high-risk groups [51]. The primary screening modality for these women
is DCE-MRI, which has been shown to have higher sensitivity compared to X-ray mam-
mography and breast ultrasound [9]. However, the time required to analyze DCE-MRI
volumes is often much longer compared to other modalities. Moreover, the majority of
findings in these MRI exams turn out to be false positives after biopsy. It has been sug-
gested by [5] that employing CADx systems as an additional diagnostic tool can improve
a radiologist’s diagnostic accuracy. A study conducted by [19] reported that a CADx
system can potentially reduce 36.9% of unnecessary biopsies within the BI-RADS 4A
group.
Currently available CADx systems provide an overlay of morphological, texture, and
kinetic features to medical images. Radiologists are then expected to give a diagnosis
by examining the characteristics of suspicious lesions. Benign lesions tend to be circular
and have a sharp margin while malignant lesions tend to have an irregular shape and
spiculated margin. In many cases, however, this distinction is less well defined and the
analytical powers of computers are needed to mitigate this problem. In order to compute
these features, robust outlines of these lesions must be provided. Due to the nature of
DCE-MRI images, manually segmenting lesions by trained experts is prohibitively expen-
sive and time-consuming. While CADx systems relying on semi-automated segmentation
algorithms have demonstrated improvements in diagnostic accuracy over human experts
[36], they are still not optimal since they require a human to first locate the lesion. There-
fore, it is necessary for a CADx system to have an automated detection and segmentation
algorithm in order to allow improvements in both diagnostic accuracy as well as detection
rate of cancers.
The simplest approach to segmenting an image is to define a lower and upper intensity
threshold. Voxels within the defined intensity boundary are selected as regions of interest.
However, the variation in image intensity of MR images makes it difficult to assign a single
pair of thresholds for every possible image. More complex methods such as watershed,
FCM, and GVF are described in [14, 11, 39] respectively. Most of these algorithms require
manual selection of seed points or regions of interest in order to reduce computation time
and improve robustness of the algorithm. These manual interventions are also subject to
observer variabilities.
A different paradigm from classical image segmentation algorithms is to treat segmentation as
a classification problem wherein each pixel is classified as object or background. The
recent resurgence of ANNs and Deep Learning (DL) provides an excellent framework for
this problem. The term “Deep Learning” refers to a branch of machine learning
that utilizes multiple non-linear transformations to learn some type of hierarchical
representation of data. With the introduction of faster and cheaper hardware, deep learn-
ing has become a powerful tool in research and industry. ANNs have seen huge success
in recent years by achieving state of the art benchmark results in the computer vision
and linguistics communities. The advent of deep learning and unsupervised training has
shown promise in learning hierarchical features using unlabelled data [30]. This is of par-
ticular interest for the training of an automatic classifier-based segmentation algorithm
since ground truth lesion segmentations are scarce and expensive to produce in medical
images.
A common procedure for deep learning with ANNs is to use layer-wise pretraining of
data to initiate the ANN. An outline of this procedure is presented in Figure 2.2(b). The
idea behind layer-wise pretraining is to use an unsupervised ANN architecture to learn
latent representations of the data one layer at a time. After learning the representations
of one layer, the input is transformed by the learned representations and used as input to
train the next layer until all the layers have been trained. A common ANN architecture
for pretraining a layer is the Autoencoder (AE). AEs are networks that have the same
number of output nodes as input nodes. The objective of AEs is to learn an encoding of the
data such that the reconstruction error with respect to the original input is minimized.
During the layer-wise pretraining procedure, once an encoding has been learned, it can
be copied over to the original ANN.
Real data are usually noisy and might contain partially corrupted inputs. It is there-
fore necessary to find features that are robust against such corruptions. The denoising
Autoencoder (dAE) is a special type of AE that essentially tries to learn from such
corruptions. Rather than training using the original input data, the dAE artificially
corrupts the input during training and tries to reconstruct the original data from the
partially destroyed input [40]. The informal reasoning behind the conception of dAEs
is that a good representation is expected to capture robust structures of the data in the
form of dependencies within its input distribution. This means that with the amount
of redundancies within images, it should be possible to fully recover partially corrupted
images. Humans for instance, excel at recognizing partially occluded objects. This type
of pretraining is therefore well suited to medical images, where noise is inherent.
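To make the corrupt-then-reconstruct idea concrete, a minimal NumPy sketch of a single dAE layer is given below. The masking-style corruption, tied weights, tanh encoding, and plain gradient descent are simplifying assumptions for illustration, not the exact architecture or training settings used later in this chapter.

```python
import numpy as np

rng = np.random.RandomState(0)

def train_dae_layer(X, n_hidden, corruption=0.3, lr=0.01, epochs=10):
    """One denoising autoencoder layer: corrupt the input, encode with tanh,
    decode with tied weights, and minimize the reconstruction error against
    the clean (uncorrupted) input."""
    n_samples, n_visible = X.shape
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_hidden = np.zeros(n_hidden)
    b_visible = np.zeros(n_visible)
    for _ in range(epochs):
        # Masking noise: each input value has a `corruption` chance of being zeroed.
        X_corrupt = X * rng.binomial(1, 1.0 - corruption, size=X.shape)
        hidden = np.tanh(X_corrupt.dot(W) + b_hidden)
        reconstruction = hidden.dot(W.T) + b_visible
        err = reconstruction - X  # error is measured against the clean input
        # Gradient descent on the mean squared reconstruction error.
        d_hidden = err.dot(W) * (1.0 - hidden ** 2)
        W -= lr * (X_corrupt.T.dot(d_hidden) + err.T.dot(hidden)) / n_samples
        b_hidden -= lr * d_hidden.mean(axis=0)
        b_visible -= lr * err.mean(axis=0)
    return W, b_hidden

# The learned weights can initialize one hidden layer of the classifier; that
# layer's activations then become the training input for the next dAE.
```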
The second stage of a CADx system is the classification of the segmented regions. An
SVM-based method was proposed by [33] to discriminate between malignant and benign
voxels using kinetic enhancement features. The authors in [18] used RFCs to detect
malignant lesions based on a combination of morphological and kinetic features. A simple
ANN architecture was used by [32] to determine malignancy using kinetic enhancement
of single raw voxels. RFCs are ideal for diagnostic purposes, because they quantify the
importance of the features used in training whereas ANNs and SVMs do not provide a
clear description of how the data is clustered. We have therefore decided to use ANNs
and DL to segment the lesions while employing RFCs to classify the resulting regions.
2.2 Method Overview
Our approach models the radiologist’s workflow in which we first generate regions of
interest based on contrast enhancements and then classify those regions as benign or
malignant based on a combination of morphological, texture, and kinetic features. We
exploit the generalization capabilities of deep learning for the detection of suspicious
regions and use cascaded RFCs to differentiate benign and malignant lesions.
The proposed method can be outlined as follows:
1. Each DCE-MRI series is rendered as a 4D matrix (3D volume at 5 time points)
2. The 4D matrix is divided into small overlapping tiles (Figure 2.1a).
3. A trained deep ANN is used to classify each tile as containing a lesion or not.
4. The resulting lesion probability map is then processed to generate regions of inter-
est.
5. Kinetic, morphological, and texture features are generated from those regions.
6. A trained RFC is used to classify the regions as benign or malignant (Section 2.7).
We used the following open-source Python 2.7 packages for the implementation and eval-
uation of all the methods described in this chapter: pylearn2, scikits-learn.
2.3 Dataset
For our dataset, a subset of 573 histology-proven malignant and benign lesions from
patient exams with BI-RADS 3 or higher was identified in our research database. For
each lesion, ground truth was semi-automatically generated using a seeded 3D connected-
component region growing method where manual seed points were placed by the authors
based on the lesion location indicated in the radiologist’s report. Another set of 630
normal studies (BI-RADS 3 and lower) was selected from patients who had no
imaging abnormalities (benign or malignant) detected for at least 2 consecutive
years. The histology-proven lesion studies were stratified into roughly 3 equal parts. 2
parts were joined to form a training set (150 malignant, 212 benign) and the last part
was left as a testing set (71 malignant, 140 benign). The 630 normal
studies were divided into 2 roughly equal parts and added to the training set and
testing set such that no patient was included twice in all 1203 studies. As a result, our
training set contains 150 malignant, 212 benign, 314 normal studies while the testing set
contains 71 malignant, 140 benign, 316 normal studies. Table 2.1 shows a breakdown of
our dataset.
All of our images in this dataset were acquired as T1W Fat-Sat sagittal DCE-MRI
using a GE 1.5T scanner at an average resolution of 0.388mm by 0.388mm in-plane and
3.0mm between slices. Due to the large slice thickness of our images, each MRI volume
is treated as a stack of 2D slices and all operations were applied on a slice-by-slice basis.
A summary of BI-RADS score for our training data is shown in Table 2.2.
Table 2.1: A breakdown of our data into training and testing sets.
Training Set Testing Set
Normal 314 316
Benign 212 140
Malignant 150 71
Total 676 527
Table 2.2: A breakdown of BI-RADS category for our training data.
BI-RADS 0 1 2 3 4 5 6
Normal 0 84 196 34 0 0 0
Malignant 3 0 1 9 36 60 41
Benign 3 0 7 18 152 17 15
Figure 2.1: (a) An 8-neighbourhood connection scheme is used to divide the rendered 4D DCE-MRI matrix into overlapping image tiles of size 5 × 1 × 3 × 3 (5 time points, 1 slice, 3-by-3 voxel window). (b) Each tile is then flattened to a 1D input vector of size 45 for use in training and classification by our ANN.
2.4 Preprocessing
Before the segmentation process, a certain number of preprocessing steps to clean the
image are necessary in order to reduce the number of false positives. Since our method
relies on patch-wise classification of time-intensity curves over several acquisition time
points, any type of motion in between acquisitions will affect our results. To reduce this
type of problem, we have used the optical-flow method described in [34] to correct for
the motion. We then render our DCE-MRI volumes as a 4D matrix (3D MRI at 5 time
points). The image intensities are clipped to the 99.5th percentile in order to remove
spikes in intensity values. The contrast between enhancing regions and background tis-
sue is improved by standardizing the image using the equation V_{t,i,j,k} = (V_{t,i,j,k} − mean(V)) / std(V),
where t is the index of the dynamic sequence (from 0 to 4) and i, j, k are matrix indices
of the respective voxel. The chest area within breast DCE-MRI images tends to be
highly enhancing, which may lead to false positive detected regions. We therefore used a
classifier-based breast segmentation algorithm described in [35] to isolate the breast and
only consider areas within the breast as possible lesions.
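As a rough illustration of the clipping and standardization steps (motion correction and breast segmentation are outside its scope), a minimal NumPy sketch is shown below; computing the statistics over the whole 4D matrix rather than per time point is an assumption made here.

```python
import numpy as np

def clip_and_standardize(volume_4d, percentile=99.5):
    """volume_4d: array of shape (5, n_slices, n_rows, n_cols)."""
    V = volume_4d.astype(np.float64)
    # Clip intensity spikes at the 99.5th percentile.
    V = np.minimum(V, np.percentile(V, percentile))
    # Standardize: V = (V - mean(V)) / std(V).
    return (V - V.mean()) / V.std()
```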
Figure 2.2: (a) Architecture of Deep ANN with 45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax output nodes. (b) Stacked dAE used to initiate the network. The first dAE uses a tanh while the second dAE uses a sigmoid as the encoding function. The dashed arrow shows the path with respect to the original network.
2.5 Region Selection
We use an ANN to generate a list of suspicious regions. Our particular architecture uses
45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax classifier
nodes. These parameters were experimentally optimized to minimize the number of sam-
ples from the training set that were misclassified (misclassification error). The proposed
architecture is illustrated in Fig. 2.2. An overview of the activation functions for the
nodes can be found in Appendix A.
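For reference, a minimal NumPy sketch of the forward pass through this 45-32-7-2 architecture is shown below; the random weights are placeholders standing in for the pretrained and fine-tuned parameters, and the actual implementation relies on pylearn2 rather than hand-written NumPy.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Placeholder parameters; in the actual pipeline these come from unsupervised
# pretraining followed by supervised fine-tuning.
W1, b1 = rng.normal(scale=0.1, size=(45, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 7)), np.zeros(7)
W3, b3 = rng.normal(scale=0.1, size=(7, 2)), np.zeros(2)

def classify_tiles(tiles):
    """tiles: (n, 45) flattened 5x1x3x3 image tiles -> (n, 2) class probabilities."""
    h1 = np.tanh(tiles.dot(W1) + b1)   # 32 tanh hidden nodes
    h2 = sigmoid(h1.dot(W2) + b2)      # 7 sigmoid hidden nodes
    return softmax(h2.dot(W3) + b3)    # 2 softmax output nodes over {lesion, non-lesion}

probabilities = classify_tiles(rng.rand(4, 45))
```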
2.5.1 Unsupervised Pretraining
We initialized our ANN by greedy layer-wise training of a stack of dAEs for the 2 hidden layers.
The training data for this process was acquired by dividing each volume in the training
set into 5×1×3×3 image tiles (Figure 2.1). Each layer of dAE in the stack was trained
for 30 epochs using a dropout rate of 30% (each node has 30% chance of being set to
0), batch size of 100, and annealed learning rate starting at 0.001. During each epoch,
millions of image tiles extracted randomly from volumes in the training set are processed
by the stacked dAE and optimized to minimize the reconstruction error. The resulting
weights represent latent representations of our dataset, which are used to initialize the
ANN.
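A minimal sketch of dividing a volume into overlapping 5 × 1 × 3 × 3 tiles and flattening them into 45-element input vectors (Figure 2.1) might look as follows; skipping the one-voxel image border is an assumption made here for simplicity, not necessarily how the original pipeline handles boundaries.

```python
import numpy as np

def extract_tiles(volume_4d):
    """volume_4d: (5, n_slices, n_rows, n_cols) -> (n_tiles, 45) matrix.

    Each tile covers all 5 time points, one slice, and the 3x3 window
    (8-neighbourhood) centred on a voxel, flattened to 45 values."""
    n_t, n_slices, n_rows, n_cols = volume_4d.shape
    tiles = []
    for s in range(n_slices):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                tile = volume_4d[:, s, r - 1:r + 2, c - 1:c + 2]  # 5 x 3 x 3
                tiles.append(tile.ravel())
    return np.asarray(tiles)
```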
2.5.2 Supervised Training
After initialization, the ANN is fine-tuned using labeled data. Lesion samples were
generated by taking image tiles within our ground truth segmentations while non-lesion
samples were acquired by taking image tiles within areas of enhancement (e.g. blood ves-
sels, background parenchymal enhancement, artifacts) in normal breasts. In total, 1.8M
input vectors (equally split between lesion and non-lesion samples) from the training set
were used to train each epoch. Since our lesion and non-lesion samples were unbalanced,
we augmented our data by oversampling the minority class with Gaussian noise (σ: 1/10
of the minimum feature standard deviation, µ: 0) in order to balance the 2 classes.
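A minimal sketch of this noise-based oversampling is shown below, assuming NumPy; the noise scale follows the description above (zero mean, standard deviation one tenth of the smallest per-feature standard deviation), while the sampling-with-replacement strategy is an assumption for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

def oversample_with_noise(X_minority, n_needed):
    """Draw minority-class samples with replacement and add zero-mean Gaussian
    noise scaled by one tenth of the smallest per-feature standard deviation."""
    sigma = 0.1 * X_minority.std(axis=0).min()
    idx = rng.randint(0, X_minority.shape[0], size=n_needed)
    noise = rng.normal(loc=0.0, scale=sigma, size=(n_needed, X_minority.shape[1]))
    return X_minority[idx] + noise
```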
The training was performed using stochastic mini-batch gradient descent backpropa-
gation with initial learning rate of 0.1 and batch size 100. The learning rate was expo-
nentially decreased over 30 epochs. Early stopping based on the MSE (Mean Squared
Error) of the training data was used to prevent over-fitting of our model.
2.5.3 Optimal Region Threshold
To determine the optimal threshold for generating our regions of interest, we performed
FROC (Free response Receiver Operating Characteristic) analysis on the training dataset.
The threshold T was varied between 0.05 and 0.95. For each threshold value, 362 (150
malignant, 212 benign) lesions from our training set were used to calculate the sensitivity
while the average number of detected regions in the remaining 314 normal studies was
used to compute the Mean False Candidate Regions (MFCR). A lesion is considered to
be a true positive if the Dice score (Equation 2.1) between the ground truth and the
binarized image is greater than 0. In order to achieve robust outlines, our threshold not
only has to capture all the lesions but also retain a high correspondence to the ground
truth segmentations. The optimal threshold, therefore, was selected based on the highest
performance metric as defined by Equation 2.2. We use a scaling factor α of 0.5 as a
compromise between the sensitivity and Dice score. In cases where multiple thresholds
Figure 2.3: Result of the conditional dilation operation used to join disconnected islands together. Left is the subtraction image showing a 2D slice of the lesion. Middle is the segmentation without dilation and right is the segmentation with dilation.
had the same performance metric, we picked the one that had the fewest MFCR. The
optimal threshold selected was 0.615.
DS = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (2.1)

\mathrm{metric}[t] = \alpha \times \mathrm{mean}(DS[t]) + (1 - \alpha) \times \mathrm{mean}(sens[t]), \quad \forall t \in \mathrm{Thresholds} \qquad (2.2)
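The two quantities above are straightforward to compute; the short NumPy sketch below shows one way Equations 2.1 and 2.2 could be evaluated for a single threshold value, with the array and variable names being illustrative.

```python
import numpy as np

def dice(ground_truth, prediction):
    """Dice similarity (Eq. 2.1) between two binary masks."""
    a, b = ground_truth.astype(bool), prediction.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 0.0

def threshold_metric(mean_dice, mean_sensitivity, alpha=0.5):
    """Performance metric (Eq. 2.2) for one candidate threshold."""
    return alpha * mean_dice + (1.0 - alpha) * mean_sensitivity
```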
2.5.4 Postprocessing
Since nonmass lesions can potentially consist of multiple regions, an attempt was made to
join these regions so that they could be treated as a single structure for further analysis.
We performed a conditional dilation by first dilating the thresholded image by 1 mm and then multiplying the resulting image by a mask (the probability map thresholded at 0.5). An
example of the resulting pre- and post-operation is shown in Figure 2.3.
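A minimal sketch of this conditional dilation, using SciPy, is given below; the in-plane voxel spacing value and the isotropic approximation of the 1 mm dilation are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def conditional_dilation(binary_seg, prob_map, voxel_mm=0.38, dilate_mm=1.0):
    """Dilate the segmentation by ~1 mm, but only into voxels whose lesion
    probability is at least 0.5, so that disconnected islands can merge
    without growing into clearly non-lesion tissue."""
    iterations = max(1, int(round(dilate_mm / voxel_mm)))
    dilated = binary_dilation(binary_seg, iterations=iterations)
    return np.logical_and(dilated, prob_map >= 0.5)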
2.6 Segmentation
After ANN training and optimal threshold selection, we can generate lesion candidates
as follows:
1. Preprocess the DCE-MRI series as described in Section 2.4.
2. Reformat the volume as a series of overlapping image tiles (Fig. 2.1).
3. Use the trained ANN to classify each tile as lesion or non-lesion to generate a
probability map that represents lesion-likelihood for each voxel.
4. Apply the optimal threshold from Section 2.5.3 to the probability map to get a
binary image.
5. Apply morphological postprocessing to connect and filter regions.
6. Assign labels to each region so that voxels within each region have the same value.
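As a concrete illustration of steps 4 to 6, the short sketch below thresholds a probability map, applies the conditional dilation, and labels connected regions with SciPy; the placeholder array contents and the 3-voxel dilation radius are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation, label

# prob_map: the per-voxel lesion-likelihood volume produced in step 3
prob_map = np.random.rand(32, 64, 64)              # placeholder volume for illustration

binary = prob_map >= 0.615                         # step 4: optimal threshold from Section 2.5.3
dilated = binary_dilation(binary, iterations=3)    # step 5: ~1 mm dilation (0.38 mm voxels)
binary = np.logical_and(dilated, prob_map >= 0.5)  # ...conditioned on the 0.5 probability mask
labelled, n_regions = label(binary)                # step 6: one integer label per connected region
```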
2.7 Region Classification
The previous section describes a method to segment out enhancing regions as potential
lesion candidates. An ANN was trained to detect regions of interest analogous to how
a human would find lesions by finding bright enhancing regions. In order to find out
whether a detected region is a false positive (e.g. a blood vessel), a malignant lesion, or a benign
lesion, we need to consider additional features such as its shape and texture. In order to
accomplish this, we employ a cascaded 2-stage RFC (Figure 2.4) similar to [17].
The first stage removes as many false positive regions as possible while the second stage
classifies the remaining regions as benign or malignant. To this end, we compute various
morphological, kinetic, and textural features for each region and use them to differentiate
between lesion and non-lesion regions (e.g. artifacts, blood vessels). We then use the
same features to classify the remaining lesion regions as benign or malignant.
Figure 2.4: A schema of the cascaded RFC. The first RFC classifies lesion and non-lesion regions while the second RFC differentiates the resulting lesions as malignant or benign.
2.7.1 Feature Extraction
We developed a feature extraction pipeline to generate a combination of 75 morphological,
kinetic, and textural features for each region. The full description of these features
is listed in Appendix B. The output segmentations of the trained ANN applied to
the training set were used to generate training samples for the cascaded RFC. Features
extracted from each of the segmented regions were labeled accordingly (malignant for
regions in malignant studies, benign for regions in benign studies, and normal for
regions in normal studies) and used as training samples for the RFC.
2.7.2 RFC Training
The cascaded RFC consists of a lesion classifier (RFC1) and malignancy classifier (RFC2).
Since our lesion versus non-lesion samples were greatly unbalanced, each individual tree
within the RFC was trained by using all of the lesion and a subset of the non-lesion
samples such that each tree was trained on an equal number of lesion and non-lesion
Figure 2.5: Illustrated examples of features learned by our ANN. (a) 2D representation of first hidden-layer network weights. (b) The value of each row is averaged and plotted on a graph.
samples. The RFC classifiers were trained by performing a grid search along with 10-
fold cross-validation for each set of parameters within the grid. The final classifier was
obtained by keeping all the RFCs across the folds that had an AUC greater than 0.75.
The operating point closest to 100% sensitivity and 100% specificity was selected as the
optimal decision threshold.
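A rough scikit-learn sketch of this training procedure is given below. The parameter grid and placeholder data are illustrative, the per-tree lesion/non-lesion balancing is only approximated with class_weight="balanced_subsample", and the step of retaining every fold's RFC with AUC above 0.75 is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_curve

# placeholder feature matrix (75 features per region, Appendix B) and lesion labels
X = np.random.rand(400, 75)
y = np.random.randint(0, 2, 400)

# illustrative parameter grid searched with 10-fold cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced_subsample", random_state=0),
    param_grid, scoring="roc_auc", cv=10)
search.fit(X, y)

# operating point closest to the (FPR = 0, TPR = 1) corner of the ROC curve
scores = search.best_estimator_.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, scores)
optimal_threshold = thresholds[np.argmin(np.hypot(fpr, 1.0 - tpr))]
```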
2.8 Results
The unsupervised training allowed our ANN to capture representations of our data. The
list of features learned by the first hidden layer of our ANN is shown in Figure 2.5. The
plots generated from the weights resemble a series of intensity-time curves (in the form
of image patches). It is interesting to note how the weights across each row starting
from the second row are fairly uniform while the first row, representing the precontrast
patch, is more heterogeneous. This might signify that there is more spatial variance in
the precontrast patch compared to the post-contrast patches. To validate our ANN as
an effective way to delineate lesions, we applied the trained network to the unseen testing
set and measured its performance. Our ANN detected 342 out of 362 (94.4%) lesions
from the training set and 204 out of 211 lesions from the testing set.
Figure 2.6: Aggregated ROC curve of the lesion classifier (RFC1) for each fold of the 10-fold cross-validation. RFC1 achieved 0.91 AUC (0.91-0.94 interquartile range). The optimal threshold value of 0.6 was selected to maximize the sensitivity and specificity.
A cross-validation ROC analysis was performed to demonstrate the generalizability of
our RFC. Figure 2.6 shows the 10-fold cross-validation performance of the RFC1 lesion
classifier on the hold-out validation set while Figure 2.7 shows the performance of the
RFC2 malignancy classifier.
After successfully training both classifiers, we applied our CADx pipeline to the testing
set consisting of 71 malignant, 140 benign, and 316 normal studies from completely
different patients. This distribution is approximately 8.5 times the provincial high-risk
screening population breast cancer incidence rate of 1.6% [12]. Our algorithm was able
to correctly detect 204 out of 211 (96.7%) lesions (both benign and malignant) and
correctly classified 67 out of the 71 (94.3%) malignant lesions and 113 of 140 (80.7%)
benign lesions. The overall false positive rate was 0.12 per breast. An overview of the
results is summarized in Table 2.3.
Figure 2.7: Aggregated ROC curve of the malignant/benign classifier (RFC2) for each fold of the 10-fold cross-validation. RFC2 achieved 0.81 AUC (0.80-0.85 interquartile range). The optimal threshold value of 0.63 was selected to maximize the sensitivity and specificity.
Table 2.3: Statistics of our proposed method on the training set and testing set. The measures were computed after applying both the RFC1 and RFC2 classifiers and provide a rough estimate of how well our algorithm performs in practice.
Statistic Training Set Testing Set
Sensitivity 0.873 0.944
Specificity 0.859 0.807
Accuracy 0.862 0.886
PPV 0.639 0.545
Table 2.4: A breakdown of the performance on the testing set with respect to BI-RADS category.
BI-RADS 0 1 2 3 4 5 6
False Negative 0 0 0 1 0 1 2
False Positive 1 3 21 8 17 0 6
True Negative 1 87 179 37 82 5 9
True Positive 5 0 0 3 19 17 26
2.9 Discussion
We have shown that our method correctly classified the majority of malignant lesions
in our testing set; 27 of the 140 benign studies and 29 of the 316 normal studies
were classified as malignant. Our method was able to correctly identify the majority of
benign lesions (113 out of 140) at a cost of 29 false positive detections in normal breasts.
Since benign lesions in breasts designated as BI-RADS 3 or 4 are often biopsied, our
method would have greatly reduced the number of benign biopsies in a clinical setting.
A breakdown of our test results is shown in Table 2.4. One of the false negative lesions
(the BI-RADS 3) was determined to be low-grade Ductal Carcinoma In Situ (DCIS)
after biopsy. Since most of the BI-RADS classifications in our database were assigned per
breast, benign biopsied lesions in the same breast as malignant lesions were assigned the
same BI-RADS score in our analysis. This explains the presence of the 6 BI-RADS-6
False Positives (benign classified as malignant) and 9 True Negatives (benign correctly
classified as benign).
Many of the false candidate regions in the normal studies were due to partially seg-
mented blood vessels, imaging artifacts, background parenchymal enhancement, and en-
hancing foci that resemble nonmass lesions (see Figure 1.2). Examples of false positive
classifications are shown in Figure 2.8. Small enhancing islands were classified as ma-
lignant in the top row. This might be due to the similarity of the enhancing regions to
some of the nonmass lesions in our dataset. This problem could be rectified by boot-
strapping the ANN training with patches within these regions as non-lesion samples.
Alternatively, we could augment our RFC2 malignancy classifier training set with these
cases as additional benign lesion samples.
A second type of false positive misclassification is caused by the existence of a known
benign lesion in the image (see Figure 2.8, bottom row). When radiologists find an
enhancing region determined to be BI-RADS 3 or lower, the patient could be scheduled
for another exam in 6 months to examine its growth. When no changes are observed,
the radiologist can deem the study to be normal, meaning that no cancer is present
in the breast despite the presence of enhancements. Due to the scarcity of our labeled
data, we have included these follow-up exams in our training and testing dataset. The
enhancement detected by our algorithm was described as an intramammary lymph node
by the radiologist after a follow up exam and T2W imaging. However, without access to
additional information, it is not unreasonable for our algorithm to flag these cases as requiring
further inspection.
While ANNs have been used in the past for both segmentation and classification
of lesions, our approach differs firstly by including an unsupervised learning stage for
initializing the ANN. We have also introduced a neighbourhood approach in which we
classify a group of pixels rather than individual ones as is the case in previous studies.
Finally, we reinforced the idea of a cascaded approach to the differentiation of benign
and malignant lesions.
Figure 2.8: Examples of false positive misclassifications by our algorithm. The top row shows mild background parenchymal enhancements misclassified as malignant lesions. The bottom row shows a lymph node detected as malignant.
2.10 Implementation Details and Limitations
We made some design decisions concerning the architecture of our proposed method,
which will be discussed in this section. Then, we will outline some limitations of our
algorithm and propose ways to alleviate them.
Patch Size Selection
Through preliminary testing on the ANN architecture, we found that the 8-neighbours
connection scheme (3×3 patch size) gave us better results compared to the 0-neighbours
(single voxel) and 24-neighbours (5 × 5 size) architecture. Although no formal inves-
tigation was carried out, we suspect that the 5 × 5 architecture gave worse results because of the curse of dimensionality: as the input
dimensionality increases, the space in which we train our classifiers becomes more sparse.
Since the current patch size results in 45 (3 by 3 patch over 5 time points) inputs, the
increased neighbours count would require 125 inputs. Therefore, in order for the 24-
neighbours scheme to have the same amount of robustness as our current architecture
we would require at least 3 times as much data. Another possibility is that our data has an in-plane voxel size of around 0.38 mm, which effectively gives our 3 × 3 patches a spatial extent of about 1 mm. Testing the patch size on images with different voxel sizes might provide more insight into this effect.
Optimal Threshold Criteria
Since we decided that it is more detrimental to miss a malignant lesion than to include
a benign one, the optimal threshold selection for all the classifiers was skewed towards
higher sensitivity. Additionally, we used an arbitrary α value of 0.5 in Equation 2.2 to
select the optimal threshold. In order to optimize this value, we would have had to repeat
the training of all the classifiers (both ANN and RFC) and their optimal thresholds for
each α between 0 and 1, which seemed inefficient with respect to time and accuracy
Figure 2.9: Image of the missed malignant lesion in the test data set (circled in red). The lesion resembles background enhancement.
trade-offs. Figure 2.9 shows the missed malignant lesion from our testing set due to the
applied threshold. The lesion can be distinctly seen in the probability map, but is missing
in the resulting thresholded image.
Limitations
Since our ANN was trained using images from our clinic, our method might not work as
well on images from other clinics taken using a different machine or MRI pulse sequence.
One problem arises when the ANN is applied to DCE-MRI volumes that do not have exactly 5 time points. This could be alleviated by down-sampling sequences with more than 5 time points so that they fit our ANN architecture. Further training using
those resampled images would be beneficial to ensure robustness. Another concern could
be differences in image resolution. Since our algorithm was trained on sagittal MRI
volumes, applying it to images with a different orientation might pose problems due to
differences in slice thickness. This problem could be avoided by resampling the training data to an isotropic resolution and including images acquired at different orientations. Since
many of the operations are on a per-slice basis, the processing time of our algorithm
could be greatly reduced by utilizing the parallel computation capabilities of a GPU for
the calculations.
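For instance, a resampling step of the kind suggested above could look like the following SimpleITK sketch; the target spacing and the input file name are hypothetical.

```python
import SimpleITK as sitk

def resample_isotropic(image, spacing_mm=1.0):
    """Resample a volume to isotropic voxel spacing with linear interpolation."""
    old_spacing = image.GetSpacing()
    old_size = image.GetSize()
    new_spacing = (spacing_mm,) * image.GetDimension()
    new_size = [int(round(sz * sp / spacing_mm)) for sz, sp in zip(old_size, old_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())

volume = resample_isotropic(sitk.ReadImage("study.nii.gz"))  # hypothetical input file
```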
2.10.1 Acknowledgements
We would like to acknowledge the contributions of the OICR Smarter Imaging Program and the Canadian Breast Cancer Foundation, which made this research possible.
Chapter 3
Discussion and Future Work
3.1 Significance of Contributions
In the previous chapters, we emphasized the need for robust lesion segmentation in an
automated CADx pipeline for high risk breast cancer screening. Prior to the quantifi-
cation of tumour malignancy, it is necessary to robustly delineate the corresponding
regions of interest. Relevant features are then computed from those regions and analyzed
to provide a cancer likelihood score. This thesis presents an automated CADx pipeline
that allows accurate detection and diagnosis of breast lesions, which facilitates screening
exams. This work attempts to first segment suspicious lesions based on their kinetic en-
hancement features and then classify them as malignant or benign using a combination
of kinetic, morphological and textural features. The use of kinetic enhancement features
for segmentation corresponds to the way radiologists report findings based on enhancing
regions. Since malignant and benign lesions are characterized by their shape (e.g. round
versus irregular), margin (e.g. circumscribed, spiculated), internal enhancement texture
(e.g. homogeneous versus heterogeneous), and kinetic enhancement curves (e.g. wash-
out versus persistent), we compute these types of features and use them to characterize
our segmented regions. As we have stated in Chapter 1, we have developed an automated
CADx software that can help radiologists diagnose cancers faster and more accurately.
Although our method did not detect all the cancers from our dataset, we must consider
whether our achieved sensitivity of 94.5% is acceptable in clinical practice. It is important
to note that our dataset is biased towards cancers that radiologists found and therefore
does not represent the true sensitivity of general screening exams. For instance, the
population-wide breast screening program in Ontario detected only 86.1% of cancers
[38]. Moreover, our algorithm only uses DCE-MRI images whereas clinicians have access
to additional information such as T2W images, ultrasounds and mammography to aid in
detection. DCIS, for example, often produces micro-calcifications, which are undetectable on MRI. Since our method relies on the DCE-MRI modality for detection of lesions, it
is impossible to achieve 100% sensitivity and so we should consider the minimal acceptable
detection rate.
3.2 Future Directions
Convolutional Neural Networks
Our proposed automated CADx pipeline follows the classical image processing paradigm
in which lesions are first segmented and then classified. However, the nature of non-mass
lesions implies the existence of multiple enhancing regions. An intrinsic problem with this
classical paradigm arises from the treatment of these regions as individual disconnected
lesions rather than as a single non-mass lesion. This might pose the risk of the system
presenting the individual parts as benign lesions while in reality the actual non-mass
lesion is malignant. Although we have not encountered this problem in our dataset, it is not an unfounded concern. One way to overcome this problem is to merge the segmentation
and classification into a single stage. Rather than segmenting each lesion as an individual
region, we can treat it as a ROI-recognition problem where we classify each ROI based on
whether it contains a malignant lesion or not. In fact, many top results in natural image
Figure 3.1: Diagram of a ConvNet. The 2 convolution layers act as feature extractors without segmentation, while the fully connected layers act as classifiers. The segmentation and classification steps are in essence merged into a single classifier. Image adapted from http://parse.ele.tue.nl/education/cluster2.
object recognition challenges use some type of ConvNet model [1]. With the advent of more powerful consumer-level hardware, scientists were able to train deeper ANN architectures at minimal cost. This facilitated the adoption of ConvNets within the
medical sciences community. These networks take advantage of the spatial information
within natural images to learn robust hierarchical features that can be used to reconstruct
the original image. While ConvNets are able to solve more complex classification tasks
compared to other classifiers, they require proportionately more data. Figure 3.1 shows
an example of ConvNet architecture where the need for segmentation is avoided.
Transfer Learning
The strength of ConvNets lies in the fact that the same features learned from one domain
can be applied to another domain with minimal to no adjustments. This phenomenon is
called transfer learning and models the fact that humans are adept at using knowledge
learned in one domain and applying it to another (e.g. using the concept of differentials
from mathematics in physics). Transfer learning has been applied in medical image
analysis by [48] and was shown to improve classification by up to 60% compared to regular
supervised machine learning algorithms when training data is scarce. This is consistent
with how humans can recognize abnormalities in MRI images despite being only exposed
to natural images throughout their lives. Therefore, transfer learning can be used to compensate for the lack of abundant labelled data in the medical field.
Radiomics
One of the difficulties in classifying lesions is the fact that some cancers remain dormant
for years while others metastasize rapidly. There might be a genetic cause
behind their development which could be deciphered via machine learning. The field of
radiomics revolves around correlating genomic information with radiology images in order
to characterize tumour phenotype [3]. [29] introduced a workflow in which quantitative
image features are correlated with treatment outcome or gene expression. The current
framework of our CADx pipeline allows easy integration of additional modules such as
learned genomic features. The full potential of our CADx system within the context
of medical screening programs could be realized by incorporating genomic data. For
instance, an ANN based classifier could be attached at the end of the CADx pipeline
to map our extracted image features to a dataset of genomic data. Similar approaches
have been done in the computer vision field in which image captions are generated from
natural images (and vice versa) [23].
3.3 Summary of Contributions
Many existing CADx algorithms in literature are only able to detect malignant lesions [18,
15, 16] or focus on differentiating one type of lesion (either mass or nonmass) as benign or
malignant [41, 45, 49]. I integrated a novel lesion detection algorithm with an existing
classification scheme and demonstrated that our proposed method was able to accurately
detect and classify both mass and nonmass-like lesions as benign or malignant. To
summarize, the trained deep ANN was able to provide accurate and robust segmentations
of both malignant and benign breast lesions in our dataset. Thus, our method can be
used as a catch-all for segmenting any type of breast lesion. The results suggest that
localized time-intensity curves contain sufficient information to delineate breast lesions
from other tissue in DCE-MRI. We believe that this might be due to the robust kinetic
features that our ANN was able to learn from the data. The 2-stage cascaded classifier
approach was able to reduce the number of false positives detected by our segmentation
algorithm to a reasonable amount acceptable for clinical usage. We were able to lower
the False Positive Rate (FPR) well below 1 per breast on unseen data without any
reduction in sensitivity (see Table 2.3). In fact, if every false positive detection from
our algorithm were to be biopsied, our method would have caused 60% fewer negative
biopsies (compared to the total benign cases in our dataset). Our reported results seem
to surpass most of the current automated lesion detection methods in the literature and
should be scalable for clinical practice. A more robust validation method would involve
a comparison of performance against other methods using publicly available datasets.
Appendix A
Perceptrons
Perceptrons are the processing units used in ANNs. Each layer of the ANN is composed
of many perceptron processing units working in conjunction. A diagram of the perceptron
is outlined in Figure A.1. The perceptron is modeled after the biological neuron. Just as a normal neuron receives signals from other neurons or sensory organs as input,
the perceptron accepts a weighted sum of outputs from other perceptrons as its input.
When the strength of the signals passes a certain threshold, the neuron proceeds to fire
the signal to other units connected to its axon. The derivation shown in A.1 models this
interaction: when the perceptron’s activation reaches a certain threshold, the perceptron
Figure A.1: Diagram of the perceptron unit. It computes the weighted sum of its inputs as activation and proceeds to fire a signal if a threshold is passed.
outputs 1 and otherwise 0. The threshold θ is often referred to as the bias in the literature and is incorporated into the weighted sum to allow each perceptron to learn its own threshold
during training.
Activation Function: f(a) = \begin{cases} 1 & a \geq \theta \\ 0 & a < \theta \end{cases}

Sum: a = \sum_{i=1}^{n} x_i w_i

\sum_{i=1}^{n} x_i w_i \geq \theta, \quad where \theta is the activation threshold

\sum_{i=1}^{n} x_i w_i - \theta = 0, \quad set x_0 = \theta and w_0 = -1

\sum_{i=0}^{n} x_i w_i = 0

a = \sum_{i=0}^{n} x_i w_i \qquad (A.1)

Output: y = \begin{cases} 1 & a \geq 0 \\ 0 & a < 0 \end{cases}
Activation Functions
In the perceptron example above, we used a linear step function as the activation function.
In order to introduce non-linearity in our classifier, a non-linear activation function can be
used instead. Equation A.2 shows various activation functions that could be used for each
perceptron. The sigmoid and hyperbolic tangent functions are commonly used within
hidden layers whereas the softmax and linear functions are reserved for the output layer
of an ANN. ANNs applied to classification problems usually employ the softmax activation
function as output while regression problems typically use the linear activation function.
Within the context of an ANN layer, a sigmoid layer refers to an array of perceptron units
with the sigmoid activation function. Although each perceptron within a layer could have
a different activation function, it is ineffective to do so in practice as the network will
learn to adjust for this during training.
tanh function: y = \frac{2}{1 + e^{-2a}} - 1

sigmoid function: y = \frac{1}{1 + e^{-a}}

softmax function: y_j = \frac{e^{a_j}}{\sum_{i=1}^{N} e^{a_i}} \quad for j = 1, ..., N

linear function: y = a \qquad (A.2)
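The same four functions of Eq. A.2, written out in NumPy; the max subtraction in the softmax is a standard numerical-stability detail not shown in the equation.

```python
import numpy as np

def tanh(a):
    return 2.0 / (1.0 + np.exp(-2.0 * a)) - 1.0

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - np.max(a))   # subtract the max for numerical stability
    return e / e.sum()

def linear(a):
    return a
```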
Appendix B
Lesion Features
According to the MRI BI-RADS lexicon, the first stage in assessing lesion malignancy
is to classify the type of enhancement as mass, nonmass, or focus. The radiologist then
estimates the likelihood for malignancy by assessing the kinetic, morphological as well as
textural characteristics of the lesion.
The kinetic curve shape type is intrinsically related to the perfusion, capillary per-
meability, and diffusion of contrast media from blood vessels to the extracellular space.
Invasive cancers predominantly present as mass lesions, with washout and persistent
curve shapes. Previous work by [20] has shown that kinetic analysis has the potential
to differentiate between benign and malignant mass lesions effectively. However, when
analysing lesions presenting as nonmass-like enhancements, conventional kinetic analysis
has failed to demonstrate discriminative power between benign and malignant nonmass
lesions [37].
The morphological characteristics of lesions are also evaluated. The main morphologi-
cal difference between mass and nonmass lesions is that unlike mass lesions, nonmass-like
enhancements exhibit poorly defined boundaries, leading to difficulties in the analysis of
morphology [20, 46]. Ultimately, morphological and kinetic features reflect the biological
characteristics of lesions and help explain the differences between benign and malignant
lesions.
Below is a list of all the features used by our cascaded RFC to differentiate between
malignant and benign lesions.
Dynamic Features
Contrast Enhancement: C(r, i) = \frac{S(r, i) - S(r, 0)}{S(r, 0)}

Average Contrast Enhancement: \bar{C}(i) = \mathrm{mean}_r[C(r, i)]

Maximum Uptake: \max_{i=0,1,...,5}[\bar{C}(i)]

Peak Location of Enhancement: time frame index at which maximum enhancement occurs

Uptake Rate: \frac{\text{Maximum Uptake}}{\text{Peak Location of Enhancement}}

Washout Rate: \begin{cases} \frac{\text{Contrast Enhancement} - \bar{C}(5)}{5 - \text{Peak Location of Enhancement}} & \text{Peak Location of Enhancement} \neq 5 \\ 0 & \text{Peak Location of Enhancement} = 5 \end{cases}

Inhomogeneity of Contrast Uptake: \max_{i=0,...,M-1}\left\{\frac{\mathrm{var}_r[I(r, i)]}{\mathrm{var}_r[I(r, 0)]}\right\}, where I(r, i) is the set of voxel intensity values in the lesion at time point i and r is the vector pointing to the lesion

Variance of Uptake: \min_{i=0,...,M-2}\left\{\frac{\mathrm{var}_r[I(r, i)]}{\mathrm{var}_r[I(r, i+1)]}\right\}
Spatial Variance of Enhancement: V(i) = \frac{1}{L - 1} \sum_{r=1}^{L} [C(r, i) - \bar{C}(i)]^2, \quad where i = 0, 1, ..., 5

Maximum Variance of Enhancement: \max_i[V(i)]

Peak Location of Variance: time frame index at which maximum variance occurs

Enhancement Variance Increasing Rate: \frac{\text{Maximum Variance of Enhancement}}{\text{Peak Location of Variance}}

Enhancement Variance Decreasing Rate (F_{III,4}): \begin{cases} \frac{\text{Maximum Variance of Enhancement} - V(5)}{5 - \text{Peak Location of Variance}} & \text{Peak Location of Variance} \neq 5 \\ 0 & \text{Peak Location of Variance} = 5 \end{cases}

Enhancement Variance at First Post-Contrast Frame: V(1)
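As an illustration, a few of the kinetic features above can be computed directly from the average contrast enhancement curve, as in the NumPy sketch below; the function name and the example curve are illustrative, and the washout numerator is interpreted here as the maximum uptake.

```python
import numpy as np

def kinetic_features(mean_enhancement):
    """Compute a few of the kinetic features above from the average contrast
    enhancement curve C(i), indexed i = 0..5 as in the formulas."""
    c = np.asarray(mean_enhancement, dtype=float)
    max_uptake = c.max()
    peak = int(c.argmax())                                     # Peak Location of Enhancement
    uptake_rate = max_uptake / peak if peak > 0 else 0.0
    washout_rate = (max_uptake - c[5]) / (5 - peak) if peak != 5 else 0.0
    return {"maximum_uptake": max_uptake, "peak_location": peak,
            "uptake_rate": uptake_rate, "washout_rate": washout_rate}

# e.g. a washout-type curve: rapid initial uptake followed by a decline
print(kinetic_features([0.0, 1.2, 1.5, 1.3, 1.1, 0.9]))
```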
Morphological Features
3D Sharpness of Lesion Margin: \max_{i=0,...,M-1}\left\{\frac{\mathrm{mean}_r\,\|\nabla[F_m(r, i) - F_m(r, 0)]\|}{\mathrm{mean}_r\,F_m(r, i)}\right\}

3D Variance of Margin Gradient: \max_{i=0,...,M-1}\left\{\frac{\mathrm{var}_r\,\|\nabla[F_m(r, i) - F_m(r, 0)]\|}{[\mathrm{mean}_r\,F_m(r, i)]^2}\right\}

3D Circularity: \frac{\text{volume of sphere with effective lesion diameter}}{\text{volume of lesion}}

3D Irregularity: 1 - \frac{\pi(\text{effective lesion diameter})^2}{\text{surface area of lesion}}

effective lesion diameter = 2\sqrt[3]{\frac{3 \cdot \text{volume of lesion}}{4\pi}}

Radial Gradient Histogram (RGH): histogram H(p) of voxel-value gradients along lines intersecting the centroid of the lesion

Maximum Variance of RGH Values: \max_{i=0,...,M-1}\{\mathrm{var}_p\,H(p)\}, where p = \frac{|\nabla[F_b(r, i) - F_b(r, 0)] \cdot (r - r_c)|}{\|\nabla[F_b(r, i) - F_b(r, 0)]\| \cdot \|(r - r_c)\|}

Maximum Standard Deviation of RGH Values: \max_{i=0,...,M-1}\{\mathrm{std}_p\,H(p)\}
Texture Features
Energy: \sum_{i=1}^{G} \sum_{j=1}^{G} p(i, j)^2

Maximum Probability: \max_{i,j}\, p(i, j)

Contrast: \sum_{k=0}^{G-1} k^2 \left(\sum_{|i-j|=k} p(i, j)\right)

Sum of Squares (Variance): \sum_{i=1}^{G} \sum_{j=1}^{G} (i - \mu)^2 p(i, j)

Correlation: \frac{\sum_{i=1}^{G} \sum_{j=1}^{G} (ij)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}

Maximal Correlation Coefficient: \sqrt{\text{second largest eigenvalue of } Q}, where Q(i, j) = \sum_{k} \frac{p(i, k)\, p(j, k)}{p_x(i)\, p_y(k)}

Sum Average: \sum_{i=2}^{2N} i\, p_{x+y}(i)

Sum Entropy: -\sum_{i=2}^{2N} p_{x+y}(i) \log p_{x+y}(i)

Sum Variance: \sum_{i=2}^{2N} (i - \text{Sum Entropy})^2\, p_{x+y}(i)

Difference Entropy: -\sum_{i=0}^{N-1} p_{x-y}(i) \log p_{x-y}(i)

Difference Variance: variance of p_{x-y}
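As an illustration, a handful of the co-occurrence (GLCM) features above can be computed with scikit-image as sketched below; the gray-level quantization, distance, and angle choices are assumptions made for this example, and the remaining Haralick-style features follow the same pattern.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """A few of the co-occurrence features above for one 2D lesion patch."""
    edges = np.linspace(patch.min(), patch.max(), levels)
    quantized = (np.digitize(patch, edges) - 1).astype(np.uint8)   # G = `levels` gray levels
    glcm = graycomatrix(quantized, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    return {"energy": graycoprops(glcm, "ASM")[0, 0],
            "contrast": graycoprops(glcm, "contrast")[0, 0],
            "correlation": graycoprops(glcm, "correlation")[0, 0],
            "maximum_probability": glcm.max()}

print(glcm_features(np.random.rand(64, 64)))
```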
Appendix C
List of Abbreviations
Abbreviations
ABUS: Automated Breast Ultrasound
AE: Autoencoder
ANN: Artificial Neural Network
AUC: Area Under the Curve
BI-RADS: Breast Imaging Reporting And Data System
BPE: Background Parenchymal Enhancement
CADe: Computer Aided Detection
CADx: Computer Aided Diagnosis
ConvNet: Convolutional Neural Network
dAE: Denoising Autoencoder
DCE-MRI: Dynamic Contrast Enhanced Magnetic Resonance Imaging
DCIS: Ductal Carcinoma In Situ
DL: Deep Learning
Fat-Sat: Fat Saturated
FCM: Fuzzy C-Means
FPR: False Positive Rate
FROC: Free-response Receiver Operating Characteristic
GVF: Gradient Vector Flow
IDC: Invasive Ductal Carcinoma
MFCR: Mean False Candidate Regions
MIP: Maximum Intensity Projection
MRI: Magnetic Resonance Imaging
OBSP: Ontario Breast Screening Program
PPV: Positive Predictive Value
RFC: Random Forests Classifier
ROC: Receiver Operating Characteristic
ROI: Region Of Interest
SRG: Seeded Region Growing
SVM: Support Vector Machine
T1W: T1-Weighted
T2W: T2-Weighted
TPR: True Positive Rate
US: Ultrasound
Bibliography
[1] Large scale visual recognition challenge 2015.
[2] O. Abe, R. Abe, K. Enomoto, K. Kikuchi, H. Koyama, H. Masuda, Y. Nomura,
K. Sakai, K. Sugimachi, T. Tominaga, J. Uchino, M. Yoshida, J. L. Haybittle,
C. Davies, V. J. Harvey, T. M. Holdaway, R. G. Kay, B. H. Mason, J. F. Forbes,
N. Wilcken, M. Gnant, R. Jakesz, M. Ploner, H. M. A. Yosef, C. Focan, J. P. Lo-
belle, U. Peek, G. D. Oates, J. Powell, M. Durand, L. Mauriac, A. Di Leo, S. Dolci,
M. J. Piccart, M. B. Masood, D. Parker, J. J. Price, Psgj Hupperets, S. Jackson,
J. Ragaz, D. Berry, G. Broadwater, C. Cirrincione, H. Muss, L. Norton, R. B. Weiss,
H. T. Abu-Zahra, S. M. Portnoj, M. Baum, J. Cuzick, J. Houghton, D. Riley, N. H.
Gordon, H. L. Davis, A. Beatrice, J. Mihura, A. Naja, Y. Lehingue, P. Romestaing,
J. B. Dubois, T. Delozier, J. Mace-Lesec’h, P. Rambert, O. Andrysek, J. Bark-
manova, J. R. Owen, P. Meier, A. Howell, G. C. Ribeiro, R. Swindell, R. Alison,
J. Boreham, M. Clarke, R. Collins, S. Darby, P. Elphinstone, V. Evans, J. Godwin,
R. Gray, C. Harwood, C. Hicks, S. James, E. MacKinnon, P. McGale, T. McHugh,
G. Mead, R. Peto, Y. Wang, J. Albano, C. F. de Oliveira, H. Gervasio, J. Gordilho,
H. Johansen, H. T. Mouridsen, R. S. Gelman, J. R. Harris, I. C. Henderson, C. L.
Shapiro, K. W. Andersen, C. K. Axelsson, et al. Effects of chemotherapy and
hormonal therapy for early breast cancer on recurrence and 15-year survival: an
overview of the randomised trials. Lancet, 365(9472):1687–1717, 2005.
[3] Hjwl Aerts, E. R. Velazquez, R. T. H. Leijenaar, C. Parmar, P. Grossmann, S. Cav-
alho, J. Bussink, R. Monshouwer, B. Haibe-Kains, D. Rietveld, F. Hoebers, M. M.
Rietbergen, C. R. Leemans, A. Dekker, J. Quackenbush, R. J. Gillies, and P. Lambin.
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics
approach. Nature Communications, 5, 2014.
[4] S. Agliozzo, M. De Luca, C. Bracco, A. Vignati, V. Giannini, L. Martincich, L. A.
Carbonaro, A. Bert, F. Sardanelli, and D. Regge. Computer-aided diagnosis for
dynamic contrast-enhanced breast mri of mass-like lesions using a multiparametric
model combining a selection of morphological, kinetic, and spatiotemporal features.
Medical Physics, 39(4):1704–1715, 2012.
[5] T. Ayer, M. U. Ayvaci, Z. X. Liu, O. Alagoz, and E. S. Burnside. Computer-aided
diagnostic models in breast cancer screening. Imaging in Medicine, 2(3):313–323,
2010.
[6] B. Bayram, H. K. Koca, B. Narin, G. C. Cavdaroglu, L. Celik, U. Acar, and
R. Cubuk. An efficient algorithm for automatic tumor detection in contrast en-
hanced breast mri by using artificial neural network (neubrea). Neural Network
World, 23(5):483–498, 2013.
[7] S. Behrens, H. Laue, M. Althaus, T. Boehler, B. Kuemmerlen, H. K. Hahn, and H. O.
Peitgen. Computer assistance for mr based diagnosis of breast cancer: Present and
future challenges. Computerized Medical Imaging and Graphics, 31(4-5):236–247,
2007.
[8] T. Berber, A. Alpkocak, P. Balci, and O. Dicle. Breast mass contour segmentation al-
gorithm in digital mammograms. Computer Methods and Programs in Biomedicine,
110(2):150–159, 2013.
[9] W. A. Berg, L. Gutierrez, M. S. NessAiver, W. B. Carter, M. Bhargavan, R. S. Lewis,
and O. B. Ioffe. Diagnostic accuracy of mammography, clinical examination, us, and
mr imaging in preoperative assessment of breast cancer. Radiology, 233(3):830–849,
2004.
[10] J. M. Chang, W. K. Moon, N. Cho, J. S. Park, and S. J. Kim. Radiologists’
performance in the detection of benign and malignant masses with 3d automated
breast ultrasound (abus). European Journal of Radiology, 78(1):99–103, 2011.
[11] W. J. Chen, M. L. Giger, and U. Bick. A fuzzy c-means (fcm)-based approach
for computerized segmentation of breast lesions in dynamic contrast-enhanced mr
images. Academic Radiology, 13(1):63–72, 2006.
[12] A. M. Chiarelli, M. V. Prummel, D. Muradali, V. Majpruz, M. Horgan, J. C. Carroll,
A. Eisen, W. S. Meschino, R. S. Shumak, E. Warner, and L. Rabeneck. Effectiveness
of screening with annual magnetic resonance imaging and mammography: Results
of the initial screen from the ontario high risk breast screening program. Journal of
Clinical Oncology, 32(21):2224–2230, 2014.
[13] Chen-Pin Chou, John M. Lewin, Chia-Ling Chiang, Bao-Hui Hung, Tsung-Lung
Yang, Jer-Shyung Huang, Jia-Bin Liao, and Huay-Ben Pan. Clinical evaluation
of contrast-enhanced digital mammography and contrast enhanced tomosynthesis-
comparison to contrast-enhanced breast mri. European journal of radiology,
84(12):2501–8, 2015.
[14] Y. F. Cui, Y. Q. Tan, B. S. Zhao, L. Liberman, R. Parbhu, J. Kaplan,
M. Theodoulou, C. Hudis, and L. H. Schwartz. Malignant lesion segmentation
in contrast-enhanced breast mr images based on the marker-controlled watershed.
Medical Physics, 36(10):4359–4369, 2009.
[15] G. Ertas, O. Gulcur, and M. Tunaci. Improved lesion detection in mr mammography:
Three-dimensional segmentation, moving voxel sampling, and normalized maximum
intensity-time ratio entropy. Academic Radiology, 14(2):151–161, 2007.
[16] T. W. Freer and M. J. Ulissey. Screening mammography with computer-aided detec-
tion: Prospective study of 12,860 patients in a community breast center. Radiology,
220(3):781–786, 2001.
[17] C. Gallego-Ortiz and A.L. Martel. Improving the accuracy of computer-aided diag-
nosis for breast mr imaging by differentiating between mass and nonmass lesions.
Radiology, 0(0):150241, 2015. PMID: 26383229.
[18] A. Gubern-Merida, R. Marti, J. Melendez, J. L. Hauth, R. M. Mann, N. Karsse-
meijer, and B. Platel. Automated localization of breast cancer in dce-mri. Medical
Image Analysis, 20(1):265–274, 2015.
[19] Leichter Isaac, Lederman Richard, Buchbinder Shalom, Srour Yossi, Bamberger
Philippe, and Sperber Fanny. Computerized classification can reduce unnecessary
biopsies in bi-rads category 4a lesions. In Proceedings of the 8th International Con-
ference on Digital Mammography, IWDM’06, pages 76–83, Berlin, Heidelberg, 2006.
Springer-Verlag.
[20] Sanaz A. Jansen, Xiaobing Fan, Gregory S. Karczmar, Hiroyuki Abe, Robert A.
Schmidt, and Gillian M. Newstead. Differentiation between benign and malignant
breast lesions detected by bilateral dynamic contrast-enhanced mri: A sensitivity
and specificity study. Magnetic Resonance in Medicine, 59(4):747–754, 2008.
[21] J. Jayender, S. Chikarmane, F. A. Jolesz, and E. Gombos. Automatic segmentation
of invasive breast carcinomas from dynamic contrast-enhanced mri using time series
analysis. Journal of Magnetic Resonance Imaging, 40(2):467–475, 2014.
[22] K. M. Kelly and G. A. Richwald. Automated whole-breast ultrasound: Advancing
the performance of breast cancer screening. Seminars in Ultrasound Ct and Mri,
32(4):273–280, 2011.
[23] Xu Kelvin, Ba Jimmy, Kiros Ryan, Cho Kyunghyun, C. Courville Aaron, Salakhut-
dinov Ruslan, S. Zemel Richard, and Bengio Yoshua. Show, attend and tell: Neural
image caption generation with visual attention. CoRR, abs/1502.03044, 2015.
[24] L. A. L. Khoo, P. Taylor, and R. M. Given-Wilson. Computer-aided detection in the
united kingdom national breast screening programme: Prospective study. Radiology,
237(2):444–449, 2005.
[25] M. V. Knopp, E. Weiss, H. P. Sinn, J. Mattern, H. Junkermann, J. Radeleff, A. Ma-
gener, G. Brix, S. Delorme, I. Zuna, and G. van Kaick. Pathophysiologic basis
of contrast enhancement in breast tumors. Jmri-Journal of Magnetic Resonance
Imaging, 10(3):260–266, 1999.
[26] T. M. Kolb, J. Lichy, and J. H. Newhouse. Comparison of the performance of
screening mammography, physical examination, and breast us and evaluation of
factors that influence them: An analysis of 27,825 patient evaluations. Radiology,
225(1):165–175, 2002.
[27] C. Kuhl. The current status of breast mr imaging - part i. choice of technique, im-
age interpretation, diagnostic accuracy, and transfer to clinical practice. Radiology,
244(2):356–378, 2007.
[28] M. A. Lacquement, D. Mitchell, and A. B. Hollingsworth. Positive predictive value
of the breast imaging reporting and data system. Journal of the American College
of Surgeons, 189(1):34–40, 1999.
[29] P. Lambin, E. Rios-Velazquez, R. Leijenaar, S. Carvalho, Rgpm van Stiphout,
P. Granton, C. M. L. Zegers, R. Gillies, R. Boellard, A. Dekker, Hjwl Aerts, and I. C.
ConCePT Consortium Qu. Radiomics: Extracting more information from medical
images using advanced feature analysis. European Journal of Cancer, 48(4):441–446,
2012.
[30] Q. V. Le and Ieee. Building high-level features using large scale unsupervised learn-
ing. In IEEE International Conference on Acoustics, Speech, and Signal Process-
ing (ICASSP), International Conference on Acoustics Speech and Signal Processing
ICASSP, pages 8595–8598, NEW YORK, 2013. Ieee.
[31] J. E. D. Levman, E. Warner, P. Causer, and A. L. Martel. A vector machine
formulation with application to the computer-aided diagnosis of breast cancer from
dce-mri screening examinations. Journal of Digital Imaging, 27(1):145–151, 2014.
[32] R. Lucht, S. Delorme, and G. Brix. Neural network-based segmentation of dynamic
mr mammographic images. Magnetic Resonance Imaging, 20(2):147–154, 2002.
[33] S. Marrone, G. Piantadosi, R. Fusco, A. Petrillo, M. Sansone, and C. Sansone.
Automatic lesion detection in breast dce-mri. Image Analysis and Processing (Iciap
2013), Pt Ii, 8157:359–368, 2013.
[34] A. L. Martel, M. S. Froh, K. K. Brock, D. B. Plewes, and D. C. Barber. Evaluating an
optical-flow-based registration algorithm for contrast-enhanced magnetic resonance
imaging of the breast. Physics in Medicine and Biology, 52(13):3803–3816, 2007.
[35] Anne L. Martel, Cristina Gallego-Ortiz, and YingLi Lu. Breast segmentation in mri
using poisson surface reconstruction initialized with random forest edge detection,
2016.
[36] L. A. Meinel, A. H. Stolpen, K. S. Berbaum, L. L. Fajardo, and J. M. Reinhardt.
Breast mri lesion classification: Improved performance of human readers with a
backpropagation neural network computer-aided diagnosis (cad) system. Journal of
Magnetic Resonance Imaging, 25(1):89–95, 2007.
[37] Dustin Newell, Ke Nie, Jeon-Hor Chen, Chieh-Chih Hsu, Hon J. Yu, Orhan Nal-
cioglu, and Min-Ying Su. Selection of diagnostic features on breast mri to dif-
ferentiate between malignant and benign lesions using computer-aided diagnosis:
differences in lesions presenting as mass and non-mass-like enhancement. European
Radiology, 20(4):771–781, 2010.
[38] Cancer Care Ontario. Ontario breast screening program 2011 report. Technical
report, Government of Canada, 2011.
[39] Y. C. Pang, L. Li, W. Y. Hu, Y. X. Peng, L. Z. Liu, and Y. Z. Shao. Computerized
segmentation and characterization of breast lesions in dynamic contrast-enhanced
mr images using fuzzy c-means clustering and snake algorithm. Computational and
Mathematical Methods in Medicine, 2012.
[40] V. Pascal, H. Larochelle, Y. Bengio, and M. Pierre-Antoine. Extracting and compos-
ing robust features with denoising autoencoders. In Proceedings of the Twenty-fifth
International Conference on Machine Learning (ICML’08), International Conference
on Machine Learning (ICML), pages 1096–1103, NEW YORK, 2008. ACM.
[41] D. M. Renz, J. Bottcher, F. Diekmann, A. Poellinger, M. H. Maurer, A. Pfeil,
F. Streitparth, F. Collettini, U. Bick, B. Hamm, and E. M. Fallenberg. Detec-
tion and classification of contrast-enhancing masses by a fully automatic computer-
assisted diagnosis system for breast mri. Journal of Magnetic Resonance Imaging,
35(5):1077–1088, 2012.
[42] R. Rouhi, M. Jafari, S. Kasaei, and P. Keshavarzian. Benign and malignant breast
tumors classification based on region growing and cnn segmentation. Expert Systems
with Applications, 42(3):990–1002, 2015.
[43] S. Shapiro, W. Venet, P. Strax, L. Venet, and R. Roeser. 10-year to 14-year effect
of screening on breast-cancer mortality. Journal of the National Cancer Institute,
69(2):349–355, 1982.
[44] Canadian Cancer Society, Statistics Canada, Public Health Agency of Canada, and
Provincial/Territorial Cancer Registries cancer.ca/statistics. Canadian cancer statis-
tics. Technical report, Government of Canada, 2015.
[45] M. X. Tan, J. T. Pu, and B. Zheng. Optimization of breast mass classification
using sequential forward floating selection (sffs) and a support vector machine
(svm) model. International Journal of Computer Assisted Radiology and Surgery,
9(6):1005–1020, 2014.
[46] Mitsuhiro TOZAKI, Takao IGARASHI, and Kunihiko FUKUDA. Positive and neg-
ative predictive values of bi-rads-mri descriptors for focal breast masses. Magnetic
Resonance in Medical Sciences, 5(1):7–15, 2006.
[47] B. van Ginneken, C. M. Schaefer-Prokop, and M. Prokop. Computer-aided diagnosis:
How to move from the laboratory to the clinic. Radiology, 261(3):719–732, 2011.
[48] A. van Opbroek, M. A. Ikram, M. W. Vernooij, and M. de Bruijne. Transfer learning
improves supervised image segmentation across imaging protocols. Ieee Transactions
on Medical Imaging, 34(5):1018–1030, 2015.
[49] A. Vignati, V. Giannini, M. De Luca, L. Morra, D. Persano, L. A. Carbonaro,
I. Bertotto, L. Martincich, D. Regge, A. Bert, and F. Sardanelli. Performance of
a fully automatic lesion detection system for breast dce-mri. Journal of Magnetic
Resonance Imaging, 34(6):1341–1351, 2011.
[50] T. C. Wang, Y. H. Huang, C. S. Huang, J. H. Chen, G. Y. Huang, Y. C. Chang, and
R. F. Chang. Computer-aided diagnosis of breast dce-mri using pharmacokinetic
model and 3-d morphology analysis. Magnetic Resonance Imaging, 32(3):197–205,
2014.
[51] E. Warner, K. Hill, P. Causer, D. Plewes, R. Jong, M. Yaffe, W. D. Foulkes,
P. Ghadirian, H. Lynch, F. Couch, J. Wong, F. Wright, P. Sun, and S. A. Narod.
Prospective study of breast cancer incidence in women with a brca1 or brca2 mu-
tation under surveillance with and without magnetic resonance imaging. Journal of
Clinical Oncology, 29(13):1664–1669, 2011.
[52] Wikipedia. Otsu’s method — Wikipedia, the free encyclopedia, 2015.
[53] Wikipedia. Watershed (image processing) — Wikipedia, the free encyclopedia, 2015.