Automatic Computer Aided Diagnosis of Breast Cancer in Dynamic Contrast Enhanced Magnetic Resonance Images
by
Hongbo Wu
A thesis submitted in conformity with the requirements
for the degree of Master of Science
Graduate Department of Medical Biophysics
University of Toronto
© Copyright 2016 by Hongbo Wu
Abstract
Automatic Computer Aided Diagnosis of Breast Cancer in Dynamic Contrast Enhanced
Magnetic Resonance Images
Hongbo Wu
Master of Science
Graduate Department of Medical Biophysics
University of Toronto
2016
Automated Computer Aided Diagnosis (CADx) systems have the potential to improve
the diagnostic accuracy of radiologists. Most CADx algorithms use features generated
from outlined regions to differentiate between benign and malignant lesions. Manually
outlining these regions for the purpose of analysis is not viable and therefore an au-
tomated segmentation method is essential. Our proposed method uses a trained deep
Artificial Neural Network (ANN) to classify overlapping tiles in breast Dynamic Con-
trast Enhanced Magnetic Resonance Imaging (DCE-MRI) images as lesion or non-lesion.
The classified tiles are then grouped into regions. Additional morphological, kinetic and
textural features are computed for each detected region. A cascaded Random Forests
Classifier (RFC) classifies the regions as malignant or benign. Our method was tested on
a dataset containing 71 malignant, 140 benign, and 316 normal studies. Free-response
Receiver Operating Characteristic (FROC) analysis of our method shows 94.4% sensitiv-
ity at 0.12 false positive detections per normal study.
Acknowledgements
Although this thesis is published under one name, there were many people who made
this work possible. This section is dedicated to everyone who helped me get to this point.
First of all, I owe a great deal of gratitude to my supervisor Dr. Anne Martel for her
guidance as well as her easy-going supervision style, which provides the ideal environment
for me to grow as an independent scientist. Similarly, I would like to thank my supervisory
committee members Dr. Philip Beatty who aided me with the engineering aspect and
Dr. Martin Yaffe who provided me with the more clinical perspectives of my work.
I would also like to thank the members of the Martel breast CAD group - Cristina,
Martin, Yingli, Sharmilla, Nikita and Sylvester - for providing me with their constructive
feedback and inspiring discussions.
Finally, I would like to thank my family for the irreplaceable support they have given
me throughout the past few years.
Contents
List of Abbreviations

1 Introduction
  1.1 Breast Cancer Overview
  1.2 Breast Cancer Screening
  1.3 Computer Aided Detection and Diagnosis
    1.3.1 Detection Overview
    1.3.2 Classification Overview
  1.4 Thesis Outline

2 Automated Computer Aided Diagnosis using Deep Learning
  2.1 Introduction and Background
  2.2 Method Overview
  2.3 Dataset
  2.4 Preprocessing
  2.5 Region Selection
    2.5.1 Unsupervised Pretraining
    2.5.2 Supervised Training
    2.5.3 Optimal Region Threshold
    2.5.4 Postprocessing
  2.6 Segmentation
  2.7 Region Classification
    2.7.1 Feature Extraction
    2.7.2 RFC Training
  2.8 Results
  2.9 Discussion
  2.10 Implementation Details and Limitations
    2.10.1 Acknowledgements

3 Discussion and Future Work
  3.1 Significance of Contributions
  3.2 Future Directions
  3.3 Summary of Contributions

Appendices
A Perceptrons
B Lesion Features
C List of Abbreviations

Bibliography
List of Tables
1.1 The Breast Imaging Reporting And Data System (BI-RADS) risk classification system for breast Magnetic Resonance Imaging (MRI) that radiologists can assign to an exam. Table adapted from American College of Radiology.
2.1 A breakdown of our data into training and testing sets.
2.2 A breakdown of BI-RADS category for our training data.
2.3 Statistics of our proposed method on the training set and testing set. The measures were computed after applying both RFC1 and RFC2 classifiers and provide a rough estimate of how well our algorithm does in practice.
2.4 A breakdown of the performance on the testing set with respect to BI-RADS category.
List of Figures
1.1 A slice containing a malignant lesion in a series of DCE-MRI volumes. The red box indicates the location of a malignant lesion. (a) Processed maximum intensity projection image of the subtraction to show enhancing lesions and blood vessels. (b) A series of DCE-MRI images at 5 time points. The first image is the baseline before the contrast agent injection.
1.2 Examples of (a) mass and (b) non-mass lesions in subtracted DCE-MRI.
1.3 Kinetic enhancement map generated by Merge CADStream Software. (Adapted from http://www.axisimagingnews.com/2010/07/better-breast-imaging/)
1.4 Flowchart of our automated CADx pipeline. After preprocessing each image, we apply our detection algorithm to classify all the detected regions as benign or malignant and present the resulting regions to the radiologist for diagnosis.
2.1 (a) An 8-neighbourhood connection scheme is used to divide the rendered 4D DCE-MRI matrix into overlapping image tiles of size 5 × 1 × 3 × 3 (5 time points, 1 slice, 3-by-3 voxel window). (b) Each tile is then flattened to a 1D input vector of size 45 for use in training and classification by our ANN.
2.2 (a) Architecture of Deep ANN with 45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax output nodes. (b) Stacked Denoising Autoencoder (dAE) used to initiate the network. The first dAE uses a tanh while the second dAE uses a sigmoid as the encoding function. The dashed arrow shows the path with respect to the original network.
2.3 Result of conditional dilation operation to join disconnected islands together. Left is the subtraction image showing a 2D slice of the lesion. Middle is the segmentation without dilation and right is the segmentation with dilation.
2.4 A schema of the cascaded RFC. The first RFC classifies lesion and non-lesion regions while the second RFC differentiates the resulting lesions as malignant or benign.
2.5 Illustrated examples of features learned by our ANN. (a) 2D representation of first hidden-layer network weights. (b) The value of each row is averaged and plotted on a graph.
2.6 Aggregated Receiver Operating Characteristic (ROC) curve of the lesion classifier (RFC1) for each of the 10-fold cross-validation. RFC1 achieved 0.91 Area Under the Curve (AUC) (0.91-0.94 interquartile range). The optimal threshold value of 0.6 was selected to maximize the sensitivity and specificity.
2.7 Aggregated ROC curve of the malignant/benign classifier (RFC2) for each of the 10-fold cross-validation. RFC2 achieved 0.81 AUC (0.80-0.85 interquartile range). The optimal threshold value of 0.63 was selected to maximize the sensitivity and specificity.
2.8 Examples of false positive misclassifications by our algorithm. The top row shows mild background parenchymal enhancements misclassified as malignant lesions. The bottom row shows a lymph node detected as malignant.
2.9 Image of the missed malignant lesion in the test data set (circled in red). The lesion resembles background enhancement.
3.1 Diagram of a Convolutional Neural Network (ConvNet). The 2 convolution layers act as a feature extractor without segmentation, while the fully connected layers act as classifiers. The segmentation and classification steps are in essence merged as a single classifier. Image adapted from http://parse.ele.tue.nl/education/cluster2.
A.1 Diagram of the perceptron unit. It computes the weighted sum of its inputs as activation and proceeds to fire a signal if a threshold is passed.
Chapter 1
Introduction
1.1 Breast Cancer Overview
Breast cancer is one of the most commonly diagnosed cancers among women in Canada.
It is currently the second leading cause of cancer death in women, resulting in an esti-
mated mortality rate of 17.9 per 100,000. Statistics show that 1 in 9 women is expected
to develop breast cancer over her lifetime and 13.6% of the women who have breast can-
cer will eventually succumb to it [44]. The mortality rate has been steadily declining over the
past few decades due to a combination of screening programs and improved treatment
[43].
The breast is an organ containing a network of ducts and lobules responsible for milk
production. The majority of cancers diagnosed are carcinomas arising from the ductal
and lobular epithelial cells of the breast. Breast cancer is categorized into 2 distinct types:
invasive and in situ carcinoma. In situ carcinomas may arise from the ducts or lobules
of the breast and are contained within their respective epithelium. When these cancer
cells proceed to invade the outer membrane beyond the epithelium, they are considered
invasive carcinomas. These types of cancers have a higher chance of metastasis.
Common approaches for treating breast cancer include chemotherapy, radiotherapy,
hormonal therapy, surgery (e.g. mastectomy, lumpectomy), or potentially a combination
of these procedures. Anatomical imaging can be used as a non-invasive way to detect
tumours and assess their size and progression. In vivo imaging has therefore become an
irreplaceable part of cancer diagnosis and treatment. The 3 most common modalities
used in breast imaging are X-ray mammography, Ultrasound (US), and MRI.
Currently, the standard for general population breast cancer screening is X-ray mam-
mography. A retrospective study following patients over a 2 year period shows that the
Ontario Breast Screening Program (OBSP) has 86.1% sensitivity, 93.1% specificity, and
6.5% Positive Predictive Value (PPV) after recall exam[38]. Research into Computer
Aided Detection (CADe) systems within the clinical workflow has shown that the sen-
sitivity can be further increased without loss in PPV [16]. Nevertheless, mammography
still struggles with detecting cancers in younger women who have denser breasts. MRI is
an alternative imaging modality that is known for having the highest sensitivity. Despite
this, it suffers from moderate specificity [27]. Studies have suggested that CADx systems
can help improve the diagnostic accuracy of DCE-MRI [7]. Current CADx systems on
the market for breast MRI rely on human interaction for detecting lesions and will not
increase the overall cancer detection rate. A fully automated CADx system can there-
fore have the potential to increase both the sensitivity (detection rate) as well as the
diagnostic accuracy of lesions.
This thesis presents a fully automatic lesion detection and classification algorithm that
can diagnose both mass and nonmass lesions in DCE-MRI. The key contribution of this
work is a proposed automated CADx system for high-risk breast cancer screening. In the
following sections of this chapter, we present an overview of breast cancer screening in
Canada and highlight the various CADe and CADx algorithms described in the literature.
1.2 Breast Cancer Screening
Cancers detected in their early stages tend to be easier to treat and have potentially fewer
complications, which may lead to a better prognosis for patients [2]. Thus, early detection
and monitoring of treatment response is important for improving patient survival rate.
Consequently, government agencies have introduced screening programs as part of the
healthcare system. The current standard for population-wide breast cancer screening is
X-ray mammography. This procedure typically involves compressing each breast between
2 plates while X-ray images are taken in 2 different planes. As X-rays travel through
the breast, various tissues will absorb the energy differently. This difference translates
directly to brightness in the resulting image. Since tumours have higher attenuation
than fatty tissue, they show up brighter in mammograms. Problems arise in denser
breasts where tumours have a greater risk of being occluded by the parenchymal tissue
in the resulting 2D image. A 3D method such as tomosynthesis is known to minimize this
problem by imaging at multiple angles. In the case of abnormal findings during screening,
subsequent US or MRI exams might be arranged before making a final diagnosis.
Breast US imaging offers a relatively cheap radiation-free diagnostic modality. It
involves a hand-held transducer that sends sound waves through the breast and detects
the resulting echoes. A common application of US within breast cancer diagnosis is the
detection of cysts. Water is known to have a lower acoustic attenuation than fat and
tissue. This means that it will reflect fewer acoustic waves and will therefore appear darker
in US images. Since cysts are typically filled with fluid, they are easily recognized by their
characteristic dark appearance. So in conjunction with X-ray mammography, radiologists
can rule out benign lesions such as cysts. A disadvantage of conventional breast US is the
reliance on a hand-held device, which makes image quality dependent on the technologist
operating it. Some clinics offer Automated Breast Ultrasound (ABUS) as an alternative
screening modality for women with dense breasts. The ABUS uses a robotic device
instead of the hand-held transducers in conventional US, which allows for faster and
more consistent imaging. Furthermore, the ABUS generates a 3D reconstruction of the
images unlike conventional US where only 2D images are taken. A study measuring the
accuracy of ABUS imaging has found that it approaches 95% sensitivity for malignant
and 66% for benign masses [10]. It has been suggested by [22] that ABUS in conjunction
with mammography could reach the same sensitivity as MRI.
The OBSP recommends that women undergo routine breast screening between the
age 50 and 74. While this is sufficient for the majority of the population, evidence
suggests that a certain group of women can have up to an 85% chance of developing breast
cancer over their lifetime [51]. The increased risk means that overall, women in this
group tend to develop breast cancer at a younger age and will therefore have denser
breasts. Mammographic images of breasts with high density have a higher chance of
missed lesions due to occlusion [26]. On the other hand, the issue of breast density does
not affect MRI images as much. Consequently, the OBSP initiated an annual high-risk
screening program for women between the ages of 30 and 69 at high risk of developing
breast cancer. Women included in this program must have at least one of the following
criteria: (1) carriers of BRCA1/2 mutation, (2) did not undergo genetic assessment
but have first-degree relatives with such mutation, (3) have a lifetime risk greater than
25% based on genetic assessment, or (4) have received chest radiation before the
age of 30 and at least 8 years ago. For these women, the primary diagnostic screening
modality is DCE-MRI, which has been shown to have higher sensitivity compared to
X-ray mammography and breast ultrasound [9]. In fact, clinical studies have shown
that among the 3 modalities used, MRI was the only modality able to detect all of the
invasive cancers in their screening population [12]. This might be due to the fact that in
mammography, breasts with high amounts of dense tissue have a greater risk of occluding
smaller cancers. A 3D imaging method such as breast tomosynthesis could reduce this
problem. However, the lack of availability makes it difficult to implement in the current
population-wide screening program.
Recent studies have acknowledged MRI as an invaluable tool for detecting cancer in
women at high risk. The typical MRI breast screening exam involves many different MRI
sequences to produce T1-Weighted (T1W), T2-Weighted (T2W), Fat Saturated (Fat-Sat)
images, Diffusion MRI, and DCE-MRI images. The main diagnostic tool for radiologists
is the analysis of DCE-MRI through the flow of contrast agents. The procedure involves
first taking a pre-contrast image of the breast. Then, a gadolinium-based contrast agent
is injected into the bloodstream and images are taken periodically afterwards to assess
the flow of the agent through the breast. Since aggressive tumours are known to have
very permeable membranes, the contrast agent flows into and out of the tumour more
readily compared to other tissues. Due to the paramagnetic properties of gadolinium
in the contrast agent, areas containing gadolinium will show up brightly in T1W MRI
images indicating a suspicious lesion. Figure 1.1 shows the enhancement of a lesion within
Figure 1.1: A slice containing a malignant lesion in a series of DCE-MRI volumes. The red box indicates the location of a malignant lesion. (a) Processed maximum intensity projection image of the subtraction to show enhancing lesions and blood vessels. (b) A series of DCE-MRI images at 5 time points. The first image is the baseline before the contrast agent injection.
a slice of DCE-MRI volume over 5 time points. In order to make lesions easier to detect,
Maximum Intensity Projection (MIP) and subtraction images are often generated as part
of the protocol.
While MRI is known for achieving the highest sensitivity in detection, [13] shows that
contrast enhanced X-ray modalities can achieve detection rates equivalent to MRI. This
implies that the main factor for the detection of lesions is the presence of contrast agent
enhancement. Indeed, it is well studied that more malignant tumours tend to induce
rapid vasculature growth (a process called angiogenesis) which allows contrast agents to
permeate the tissue more readily [25]. These contrast enhanced X-ray
modalities are less widely used, however, due to the increased radiation dose as well as the
high contrast agent dose, which make them unsuitable as a screening modality.
A commonly used standard for reporting MRI exams is BI-RADS. The lexicon pro-
vides descriptors for lesions, Background Parenchymal Enhancement (BPE), as well as
criteria for categorizing the likelihood of cancer. Enhancements that take on a distinct
shape within an area are considered mass lesions, while enhancements distributed over
multiple areas without a distinct shape are categorized as non-mass lesions. These lesions are further described by
morphological (e.g. shape, margin), texture (e.g. internal enhancement patterns), and
kinetic (e.g. wash-out) features. Examples of mass and non-mass lesions are depicted in
Figure 1.2. A third category called foci describes enhancing regions that are too small
Table 1.1: The BI-RADS risk classification system for breast MRI that radiologists can assign to an exam. Table adapted from American College of Radiology.
to accurately characterize with respect to their margins. These regions can be correlated
with T2W imaging to rule out the presence of lymph nodes. Foci that do not appear
bright in T2W images are likely to be looked on with suspicion and might require biopsy
if they are observed to have increased in size at a follow-up exam. After considering
all these findings, the radiologist then assigns a score based on the likelihood of cancer.
The full BI-RADS cancer risk classification system for breast MRI is listed in Table 1.1.
Statistical analysis of biopsies performed at a clinic shows that BI-RADS 3 has a positive
predictive value of 3% while BI-RADS 4 and 5 are at 23% and 92% respectively [28].
To put this into perspective, close to half of the biopsies performed were of BI-RADS 3.
The large number of BI-RADS 3 biopsies performed, along with the low PPV, means that
most of the negative biopsies performed belong in this group.
Figure 1.2: Examples of (a) mass and (b) non-mass lesions in subtracted DCE-MRI.
1.3 Computer Aided Detection and Diagnosis
It is believed that computer assistance within the clinical workflow can help improve
the radiologists’ diagnostic accuracy. Computer assistance can be categorized as CADe,
where the main goal is the detection of lesions, and CADx, where the system attempts to
differentiate benign and malignant lesions. While there are no automated CADx systems
on the market, manual and semi-automatic CADe systems for mammography are already
being integrated into the clinical workflow. In clinical practice, the CADe system is
applied after the primary radiologist finishes examining the image [24]. This essentially
allows the CADe system to bring attention to regions that the radiologist might have
overlooked. Such systems are commonly used in clinical practice as a second reader. On
the other hand, CADx systems attempt to diagnose any detected lesions as malignant or
benign. Several factors have been identified for CADx systems to be successfully adopted
into wide clinical practice [47]. A CADx system should improve the radiologist’s performance,
save time, be seamlessly integrated into the workflow, be cost-saving, and should not
Figure 1.3: Kinetic enhancement map generated by Merge CADStream Software. (Adapted from http://www.axisimagingnews.com/2010/07/better-breast-imaging/)
impose liability concerns.
While CADe systems are currently in use for mammography screening exams, a study
done in the United Kingdom has shown that CAD systems for mammography only offer
marginal improvements of 1% in sensitivity over a single-reader radiologist while taking
almost twice as long (45 seconds) [24]. On the other hand, CADx systems can have
huge cost-savings potential within MRI imaging. Since each DCE-MRI volume usually
contains hundreds of images at various time points, the time required to analyze DCE-
MRI volumes is much longer compared to other modalities. Moreover, the majority of
findings in these exams have a high chance of being false positives after biopsy. Therefore,
it has been suggested by [41] that employing CADx systems as an additional diagnostic
tool can improve a radiologist’s diagnostic accuracy and thereby reduce the number of
unnecessary biopsies. Current breast MRI CADx systems are able to provide overlays of
image features such as kinetic enhancement parameters and time-intensity curves, which
facilitates the diagnosis procedure by making the MRI images easier to comprehend.
Figure 1.3 shows an example of a commercial breast MRI CADx software in action.
There exists a rapidly growing body of literature on CADx. A study by [36] demon-
strated that all the clinicians regardless of skill or experience were able to outperform an
expert MRI radiologist with the help of a CADx system. However, the authors were not
able to demonstrate improvements in detection rate since semi-automated segmentation
was used in their study. This motivates the development of an automated CADx system
which can potentially improve both the detection rate and diagnostic accuracy.
1.3.1 Detection Overview
The first step for any CADx system is to localize any suspicious regions. To this end,
many types of segmentation algorithms have been developed in the medical imaging
literature. Classical computer vision methods use various combinations of thresholding
and mathematical models to segment images. Naïve methods of segmentation include
seeded region growing and automated thresholding. There are also many mathematical
models such as clustering and active contour models developed to capture the outline of
lesions.
The naïve approach to segmenting an image is to define a lower and upper intensity
threshold. The regions that are within the defined thresholds will be highlighted. How-
ever, in medical images such as MRI, the variation in image intensity between patients
makes it difficult to assign a single pair of threshold values for every lesion. Attempts
have been made to automate the selection of thresholds using various mathematical mod-
els. One example is Otsu's Method for threshold selection [52]. Otsu's Method attempts
to find a threshold that minimizes the intra-class variance, defined as the weighted sum of
the variances of the 2 classes. The algorithm steps through each possible intensity value
and calculates the variance between the pixels above and the pixels below that candidate
threshold. The intensity value that maximizes this between-class variance (which is
equivalent to minimizing the intra-class variance) is selected as the segmentation threshold. Many of the
methods described in literature use some type of thresholding within their algorithm.
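To make this concrete, the following is a minimal sketch of Otsu-style threshold selection, assuming NumPy is available; the number of candidate thresholds is an arbitrary illustrative choice rather than a value from any cited study.

```python
import numpy as np

def otsu_threshold(image, n_candidates=256):
    """Return the threshold maximizing the between-class variance
    (equivalently, minimizing the intra-class variance)."""
    values = image.ravel().astype(np.float64)
    candidates = np.linspace(values.min(), values.max(), n_candidates)[1:-1]
    best_t, best_score = values.min(), -np.inf
    for t in candidates:
        below, above = values[values <= t], values[values > t]
        if below.size == 0 or above.size == 0:
            continue
        w0 = float(below.size) / values.size
        w1 = 1.0 - w0
        # Between-class variance: weighted squared distance between class means.
        score = w0 * w1 * (below.mean() - above.mean()) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Usage: binary_mask = image > otsu_threshold(image)
```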
A localized approach to segmentation is the Seeded Region Growing (SRG) algorithm.
This method uses manually or automatically planted seeds to segment an image. The
segmentation process starts out from the seed position and extends to neighbouring
regions based on a selection criterion. Just like the threshold method, it is difficult to
select a suitable criterion for a threshold that will work for all lesions in every image.
If segmentation is done using a single criterion, lesions will likely be over- or under-
segmented. Therefore, an adaptive threshold selection is often used in conjunction with
SRG to produce a more accurate segmentation. An adaptive SRG method was used
by [8] to find the contour of a mass lesion. A human-delineated region of interest was
necessary as a preprocessing step to restrict the range over which the algorithm operates.
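As an illustration of the basic (non-adaptive) idea, a minimal seeded region growing sketch in Python is shown below; the fixed intensity tolerance is a hypothetical parameter and, as noted above, a single value is unlikely to suit every lesion.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tolerance=50.0):
    """Grow a region from a (row, col) seed, accepting 4-connected neighbours
    whose intensity is within `tolerance` of the seed intensity."""
    rows, cols = image.shape
    seed_value = float(image[seed])
    region = np.zeros((rows, cols), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if abs(float(image[nr, nc]) - seed_value) <= tolerance:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region
```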
The watershed segmentation algorithm is analogous to a flood simulator [53]. This
method treats a grayscale image as a topographic map wherein water is poured from
certain points. A gradient map is built from the image with each pixel corresponding to
the intensity change with respect to its neighbouring cells. These gradients form what is
called a basin where water gathers. The edge of the basins will become watershed lines
used to segment the image. The point of entry for water can be manually assigned or
automatically generated based on the unique features of points of interest (e.g. morphol-
ogy of lesions). The watershed algorithm was used in [14] to segment breast lesions in
DCE-MRI images on a slice-by-slice basis.
Fuzzy C-Means (FCM) is a cluster-based segmentation algorithm that groups a num-
ber of data points into c classes. Unlike traditional clustering where each data point can
only correspond to a single class, fuzzy clustering allows the data point to have a degree
of membership amongst different classes. A matrix is built to store the membership in-
formation of each data point. The matrix is then modified iteratively to minimize the
cluster membership error of each data point. For example, with grayscale images, the
pixels of the image will be clustered based on similar intensity values (e.g. by minimizing
the difference in intensity). This method was proposed for the segmentation of breast
lesions in DCE-MRI by [11]. The proposed method requires a human to first select an
ROI containing a lesion. The region is then normalized using the post-contrast T1W
image intensity at subsequent time steps. The FCM is applied to the enhanced region
to categorize the voxels into lesion and non-lesion. Final post-processing is done to take
into account necrotic regions and reduce false positive regions such as blood vessels.
The Gradient Vector Flow (GVF) or snake is a method that distorts a curve in order
to fit the outline of an object. The curve is distorted through the interaction between
internal and external energy functions. The external energy function is based on an
intensity gradient that is minimized when the curve is at the desired edge whereas the
internal energy function forces smoothness of the curve. Thus, the edge of the object
is found by minimizing the sum of the internal and external energies. A combination
of FCM and this method was used by [39] for segmenting and extracting morphological
features from breast lesions. First, 5 volume subtraction images were generated using
the pre- and post-contrast MRI data. For each lesion a representative MRI slice with the
greatest contrast was selected by an operator and a Region Of Interest (ROI) box was
placed around the lesion. Next, a crude contour was drawn around the lesion and the
GVF algorithm was applied to outline the boundary of the lesion.
While the methods described so far operate on the raw intensity values,
other studies have attempted to segment lesions based on computed features. For in-
stance, a mean intensity projection image was generated to detect enhancing regions
within a volume of interest [15]. Then, various dynamic (e.g. mean, standard deviation)
and texture features (e.g. kurtosis, entropy) were extracted and statistically analyzed to
determine how much each feature contributed to the lesion detection. The authors found
that the standard deviation and maximum mean intensity projection features had the
highest diagnostic accuracy at 90% detection rate. Other studies attempted to segment
lesions based on various kinetic models of the contrast agent. [21] attempted to segment
Invasive Ductal Carcinoma (IDC) by applying time series analysis to a linear dynamic
system model of the contrast enhancements. The authors report 100% sensitivity and
90% accuracy in detecting IDC cancers with this method. While the method had high
sensitivity, only 24 cases were studied and no specificity or false positive rate was reported
by the authors.
A different approach to image segmentation is to treat it as a classification problem
in which pixels are divided into object and background class. Classifiers essentially try
to learn a decision boundary separating the object and background classes based on a
given dataset.
A RFC approach was explored by [18] in which lesion segmentation was treated as
a blob detection problem. The method first computed various Hessian-based blob-like
features as well as kinetic enhancement values of the image and then trained an RFC
to classify voxels with blob-like features as lesions. A false positive removal stage using
another trained RFC was later performed to reduce the number of false candidate regions.
The RFC is essentially an ensemble of decision trees. The method operates by building
many decision trees, each trained on a randomized subset of the samples and features. Classification
is then achieved by querying each tree in order to obtain a majority vote for a target class. An inherent
property of RFCs is the automatic ranking of the importance of the various features in the
training data. Analyzing these rankings could give unique insights into how each feature
affects the diagnosis.
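As an illustration of how such a classifier is built and queried, a minimal scikit-learn sketch is given below; the feature matrix, labels, and number of trees are placeholders for illustration only and are not taken from [18] or any other cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Hypothetical per-voxel features (e.g. Hessian blobness, wash-in, wash-out)
# and lesion/non-lesion labels; placeholders only.
X_train = rng.rand(1000, 5)
y_train = rng.randint(0, 2, size=1000)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Majority vote across the trees, expressed as a class probability.
lesion_probability = forest.predict_proba(rng.rand(10, 5))[:, 1]

# Built-in ranking of how much each feature contributed to the decisions.
print(forest.feature_importances_)
```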
On a different note, the ANN is a graphical model inspired by biological neural networks.
Just like the brain, the ANN is organized into layers composed of many neuron-like
processing units called perceptrons. A detailed description of the structure of perceptrons
is included in Appendix A. In general, each individual perceptron unit in a layer takes as
input the weighted sum of the outputs of the previous layer. Through repeated exposure to
examples, the network of perceptrons can adapt its weights to capture the distribution
of the provided data. The use of a single hidden layer ANN model was explored by [32]
to detect breast cancer in DCE-MRI by attempting to model pharmacokinetic curves. A
similar approach was used in [6] in which the signal-time curve was classified in an attempt
to identify malignant, benign, and normal tissues. The authors were able to achieve 92%
accuracy on 34 test cases. The authors in [42] used a trained ANN to adaptively modify
the threshold of an SRG method to segment lesions in X-ray mammography images. An
ROI was accepted as input and the ANN was then used to initialize the seed point for the
SRG algorithm. The method achieved an average accuracy of 82% and 95% on 2 different
public mammography datasets.
We have summarized a few of the many different types of algorithms proposed in lit-
erature for the detection of breast lesions. Most of the aforementioned methods involve
a manual delineation of ROI in order to reduce computation time and increase accuracy.
Furthermore, some algorithms will not work correctly if the input ROI provided is too
large. For instance, if the initial contour for the GVF was drawn too far from the lesion,
the GVF might fail to find the object or outline a different object. Methods such as
Otsu’s thresholding are highly dependent on the ROI parameters (i.e. image intensity,
region size) so changes in size or location of the ROI can give varying results. Therefore,
automated algorithms which do not require manual ROI selection are desirable for en-
suring robustness of segmentation while minimizing observer variabilities. The drawback
of many classical segmentation algorithms is the need for human interaction (whether to
provide ROIs or seed points). On the other hand, statistical machine learning methods
such as Support Vector Machines (SVMs) and RFCs function by classifying precomputed
features while the biology-inspired ANNs excel in directly learning from the data. With
respect to automated segmentation, the machine learning based algorithms show greater
potential in achieving the functionality required for automated CADx systems. Although
there are many advantages to using machine learning based methods, the performance
is directly related to the amount of high quality data, which is not always available in
abundance.
1.3.2 Classification Overview
The next step for a CADx system is to classify each of the detected lesions as a possible
malignant cancer or benign tissue (such as cysts). While there exists a rapidly growing
body of literature on CADx, most of these methods rely on a machine learning algorithm
to classify lesions based on features extracted from the image. Studies have shown that
a combination of morphological, texture, and kinetic features can be used to distinguish
between benign and malignant lesions [17]. It is thought that computer analysis of
these features could be used as an aid to improve the radiologist’s ability to differentiate
benign and malignant cancers. Since malignant cancers require more nutrients than
normal tissue, it follows that lesions corresponding to malignant tumours will have higher
amounts of contrast agent flow. Pharmacokinetic modelling based on time-intensity data
can be used to characterize the malignancy of the lesion. In one study, an FCM algorithm
was used to cluster voxels based on time-intensity curves [50]. Morphological and textural
features were then used to classify the resulting regions as benign or malignant. The
authors noted that combining morphological and kinetic features proved to be more
robust when differentiating benign and malignant lesions.
The use of a combination of kinetic, morphological, and spatiotemporal features was
proposed by [4]. A histogram based threshold was applied to select enhancing regions
while kinetic and morphological filters were applied to reduce the number of false positive
regions. The authors then used an SVM to classify the extracted series of morphological,
kinetic, and spatiotemporal features of each region as benign or malignant. Generally,
SVMs attempt to find the optimal decision boundary in a multidimensional feature space
such that the orthogonal distance between the boundary and closest training data points
(known as support vectors) is maximized. Various kernels such as a Radial Basis Func-
tion can be used to transform the data before learning the boundary point in order to
improve separability of the classes. A semi-automatic method was proposed by [31]. The
algorithm involves having a user draw ellipses on a suspicious lesion and a non-lesion nor-
mal region. The voxels within the lesion region are assigned as the positive sample and
training is done on the fly for each case. A bounding box is drawn around the selected
samples and then the trained SVM is used to classify all the voxels within the bounding
box. [33] applied SVMs to differentiate invasive and non-invasive cancers in DCE-MRI
based on the signal intensity-time metrics. The authors first computed various voxel-
based kinetic features such as wash-out slope and area under the curve for each voxel
(representing contrast agent concentration), and used the SVM to classify the voxels as
potential lesions. The method had 72% sensitivity and 98% specificity on 26 malignant
and benign lesions. The poor sensitivity could be attributed to the limited dataset the
authors used.
Various studies have also explored the use of ANNs for the classification of breast
lesions. The authors in [36] used a semi-automated region growing method to generate
ROIs in DCE-MRI images. Then the shape, texture and kinetic enhancement of each
region was computed and classified using an ANN. Operating on a set of 43 malignant
and 37 benign lesions, the algorithm achieved an AUC of 0.97. Likewise, ANNs have been
used for the classification of lesions in X-ray mammography [42]. First, a cellular neural
network was used to automatically segment the lesion. Intensity, shape, and texture
features of the lesion were computed and classified using a simple ANN. The algorithm
was able to achieve 96.87% sensitivity and 95.94% specificity in diagnosing mass lesions.
As a final note, our lab has previously shown that a cascaded classifier can improve
on the performance of a single classifier in differentiating lesions [17]. In this study, a
cascaded RFC was used to determine lesion malignancy. Lesions are first categorized as
mass and nonmass using a combination of different kinetic, morphological and texture
features. A second RFC is then used to classify the lesion as benign or malignant.
Improvements in performance were noted for the cascaded RFC compared to the single
RFC. This phenomenon seems consistent with the idea of boosting in which an ensemble
of weaker learners can create a strong classifier. We will exploit this concept as part of
our proposed automated CADx system.
1.4 Thesis Outline
The remainder of this thesis is dedicated to describing the implementation of an auto-
mated CADx pipeline for breast cancer diagnosis. An overview of the proposed pipeline
is summarized by Figure 1.4. For each case to analyze, we first apply the necessary
preprocessing steps such as motion correction, breast segmentation and contrast normal-
ization in order to provide a common framework for our algorithm. We then apply a
trained ANN to each 3 × 3 patch (over 5 time points) to create a map of the probability
that each patch belongs to a lesion. The resulting map is then binarized to create regions
of interest. After that, we analyze the kinetic, morphological, and textural features of
each region in order to classify it as benign or malignant. A list of the resulting regions
will then be provided to the radiologist to aid in diagnosis.
Chapter 2 presents our proposed pipeline in more detail. The particulars surrounding
the training of each classifier are elucidated and the relevant statistics are reported in
this chapter. Furthermore, I will provide some insights into the design decisions made
concerning the architecture of our pipeline as well as discuss some limitations and ways
to accommodate them.
Chapter 3 summarizes the contribution of this thesis and emphasizes the significance
of our proposed CADx pipeline in the context of breast cancer screening. I will then
discuss some of the mistakes our algorithm has made and finally conclude this thesis
by presenting some ideas for future work.
Figure 1.4: Flowchart of our automated CADx pipeline. After preprocessing each image, we apply our detection algorithm to classify all the detected regions as benign or malignant and present the resulting regions to the radiologist for diagnosis.
Chapter 2
Automated Computer Aided
Diagnosis using Deep Learning
2.1 Introduction and Background
Breast cancer is currently one of the most commonly diagnosed cancers among women. Evidence
suggests that early screening and treatment reduce the incidence of advanced-stage breast
cancer in certain high-risk groups [51]. The primary screening modality for these women
is DCE-MRI, which has been shown to have higher sensitivity compared to X-ray mam-
mography and breast ultrasound [9]. However, the time required to analyze DCE-MRI
volumes is often much longer compared to other modalities. Moreover, the majority of
findings in these MRI exams turn out to be false positives after biopsy. It has been sug-
gested by [5] that employing CADx systems as an additional diagnostic tool can improve
a radiologist’s diagnostic accuracy. A study conducted by [19] reported that a CADx
system can potentially reduce 36.9% of unnecessary biopsies within the BI-RADS 4A
group.
Currently available CADx systems provide an overlay of morphological, texture, and
kinetic features to medical images. Radiologists are then expected to give a diagnosis
by examining the characteristics of suspicious lesions. Benign lesions tend to be circular
and have a sharp margin while malignant lesions tend to have an irregular shape and
spiculated margin. In many cases, however, this distinction is less well defined and the
analytical powers of computers are needed to mitigate this problem. In order to compute
these features, robust outlines of these lesions must be provided. Due to the nature of
DCE-MRI images, manually segmenting lesions by trained experts is prohibitively expen-
sive and time-consuming. While CADx systems relying on semi-automated segmentation
algorithms have demonstrated improvements in diagnostic accuracy over human experts
[36], they are still not optimal since they require a human to first locate the lesion. There-
fore, it is necessary for a CADx system to have an automated detection and segmentation
algorithm in order to allow improvements in both diagnostic accuracy as well as detection
rate of cancers.
The simplest approach to segmenting an image is to define a lower and upper intensity
threshold. Voxels within the defined intensity boundary are selected as regions of interest.
However, the variation in image intensity of MR images makes it difficult to assign a single
pair of thresholds for every possible image. More complex methods such as watershed,
FCM, and GVF are described in [14, 11, 39] respectively. Most of these algorithms require
manual selection of seed points or regions of interest in order to reduce computation time
and improve robustness of the algorithm. These manual interventions are also subject to
observer variabilities.
A different paradigm from classical image segmentation algorithms is to treat segmentation as
a classification problem wherein each pixel is classified as object or background. The
recent resurgence of ANNs and Deep Learning (DL) provides an excellent framework for
this problem. The term “Deep Learning” refers to a branch of machine learning
that utilizes multiple non-linear transformations to learn some type of hierarchical
representation of data. With the introduction of faster and cheaper hardware, deep learn-
ing has become a powerful tool in research and industry. ANNs have seen huge success
in recent years by achieving state of the art benchmark results in the computer vision
and linguistics communities. The advent of deep learning and unsupervised training has
shown promise in learning hierarchical features using unlabelled data [30]. This is of par-
ticular interest for the training of an automatic classifier-based segmentation algorithm
since ground truth lesion segmentations are scarce and expensive to produce in medical
images.
A common procedure for deep learning with ANNs is to use layer-wise pretraining of
data to initiate the ANN. An outline of this procedure is presented in Figure 2.2(b). The
idea behind layer-wise pretraining is to use an unsupervised ANN architecture to learn
latent representations of the data one layer at a time. After learning the representations
of one layer, the input is transformed by the learned representations and used as input to
train the next layer until all the layers have been trained. A common ANN architecture
for pretraining a layer is the Autoencoder (AE). AEs are networks that have the same
number of output nodes as input nodes. The objective of AEs is to learn an encoding of the
data such that the reconstruction error with respect to the original input is minimized.
During the layer-wise pretraining procedure, once an encoding has been learned, it can
be copied over to the original ANN.
Real data are usually noisy and might contain partially corrupted inputs. It is there-
fore necessary to find features that are robust against such corruptions. The denoising
Autoencoder (dAE) is a special type of AE that essentially tries to learn from such
corruptions. Rather than training using the original input data, the dAE artificially
corrupts the input during training and tries to reconstruct the original data from the
partially destroyed input [40]. The informal reasoning behind the conception of dAEs
is that a good representation is expected to capture robust structures of the data in the
form of dependencies within its input distribution. This means that with the amount
of redundancies within images, it should be possible to fully recover partially corrupted
images. Humans for instance, excel at recognizing partially occluded objects. This type
of pretraining is therefore well suited to medical images, where noise is inherent.
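To make the corrupt-then-reconstruct idea concrete, a minimal NumPy sketch of a single dAE layer is given below. The masking-style corruption, tied weights, tanh encoding, and plain gradient descent are simplifying assumptions for illustration, not the exact architecture or training settings used later in this chapter.

```python
import numpy as np

rng = np.random.RandomState(0)

def train_dae_layer(X, n_hidden, corruption=0.3, lr=0.01, epochs=10):
    """One denoising autoencoder layer: corrupt the input, encode with tanh,
    decode with tied weights, and minimize the reconstruction error against
    the clean (uncorrupted) input."""
    n_samples, n_visible = X.shape
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_hidden = np.zeros(n_hidden)
    b_visible = np.zeros(n_visible)
    for _ in range(epochs):
        # Masking noise: each input value has a `corruption` chance of being zeroed.
        X_corrupt = X * rng.binomial(1, 1.0 - corruption, size=X.shape)
        hidden = np.tanh(X_corrupt.dot(W) + b_hidden)
        reconstruction = hidden.dot(W.T) + b_visible
        err = reconstruction - X  # error is measured against the clean input
        # Gradient descent on the mean squared reconstruction error.
        d_hidden = err.dot(W) * (1.0 - hidden ** 2)
        W -= lr * (X_corrupt.T.dot(d_hidden) + err.T.dot(hidden)) / n_samples
        b_hidden -= lr * d_hidden.mean(axis=0)
        b_visible -= lr * err.mean(axis=0)
    return W, b_hidden

# The learned weights can initialize one hidden layer of the classifier; that
# layer's activations then become the training input for the next dAE.
```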
The second stage of a CADx system is the classification of the segmented regions. An
SVM-based method was proposed by [33] to discriminate between malignant and benign
voxels using kinetic enhancement features. The authors in [18] used RFCs to detect
malignant lesions based on a combination of morphological and kinetic features. A simple
ANN architecture was used by [32] to determine malignancy using kinetic enhancement
of single raw voxels. RFCs are ideal for diagnostic purposes, because they quantify the
importance of the features used in training whereas ANNs and SVMs do not provide a
clear description of how the data is clustered. We have therefore decided to use ANNs
and DL to segment the lesions while employing RFCs to classify the resulting regions.
2.2 Method Overview
Our approach models the radiologist’s workflow in which we first generate regions of
interest based on contrast enhancements and then classify those regions as benign or
malignant based on a combination of morphological, texture, and kinetic features. We
exploit the generalization capabilities of deep learning for the detection of suspicious
regions and use cascaded RFCs to differentiate benign and malignant lesions.
The proposed method can be outlined as follows:
1. Each DCE-MRI series is rendered as a 4D matrix (3D volume at 5 time points)
2. The 4D matrix is divided into small overlapping tiles (Figure 2.1a).
3. A trained deep ANN is used to classify each tile as containing a lesion or not.
4. The resulting lesion probability map is then processed to generate regions of inter-
est.
5. Kinetic, morphological, and texture features are generated from those regions.
6. A trained RFC is used to classify the regions as benign or malignant (Section 2.7).
We used the following open-source Python 2.7 packages for the implementation and eval-
uation of all the methods described in this chapter: pylearn2, scikits-learn.
2.3 Dataset
For our dataset, a subset of 573 histology-proven malignant and benign lesions from
patient exams with BI-RADS 3 or higher was identified in our research database. For
each lesion, ground truth was semi-automatically generated using a seeded 3D connected-
component region growing method where manual seed points were placed by the authors
based on the lesion location indicated in the radiologist’s report. Another set of 630
normal studies (BI-RADS 3 and lower) was selected from patients who had no
imaging abnormalities (benign or malignant) detected for at least 2 consecutive
years. The histology-proven lesion studies were stratified into roughly 3 equal parts. 2
parts were joined to form a training set (150 malignant, 212 benign) and the last part
was left as a testing set (71 malignant, 140 benign). The 630 normal
studies were divided into 2 roughly equal parts and added to the training set and
testing set such that no patient was included twice in all 1203 studies. As a result, our
training set contains 150 malignant, 212 benign, 314 normal studies while the testing set
contains 71 malignant, 140 benign, 316 normal studies. Table 2.1 shows a breakdown of
our dataset.
All of our images in this dataset were acquired as T1W Fat-Sat sagittal DCE-MRI
using a GE 1.5T scanner at an average resolution of 0.388mm by 0.388mm in-plane and
3.0mm between slices. Due to the large slice thickness of our images, each MRI volume
is treated as a stack of 2D slices and all operations were applied on a slice-by-slice basis.
A summary of BI-RADS score for our training data is shown in Table 2.2.
Table 2.1: A breakdown of our data into training and testing sets.
Training Set Testing Set
Normal 314 316
Benign 212 140
Malignant 150 71
Total 676 527
Table 2.2: A breakdown of BI-RADS category for our training data.
BI-RADS 0 1 2 3 4 5 6
Normal 0 84 196 34 0 0 0
Malignant 3 0 1 9 36 60 41
Benign 3 0 7 18 152 17 15
Figure 2.1: (a) An 8-neighbourhood connection scheme is used to divide the rendered 4D DCE-MRI matrix into overlapping image tiles of size 5 × 1 × 3 × 3 (5 time points, 1 slice, 3-by-3 voxel window). (b) Each tile is then flattened to a 1D input vector of size 45 for use in training and classification by our ANN.
2.4 Preprocessing
Before the segmentation process, a certain number of preprocessing steps to clean the
image are necessary in order to reduce the number of false positives. Since our method
relies on patch-wise classification of time-intensity curves over several acquisition time
points, any type of motion in between acquisitions will affect our results. To reduce this
type of problem, we have used the optical-flow method described in [34] to correct for
the motion. We then render our DCE-MRI volumes as a 4D matrix (3D MRI at 5 time
points). The image intensities are clipped to the 99.5th percentile in order to remove
spikes in intensity values. The contrast between enhancing regions and background tis-
sue is improved by standardizing the image using the equation V_{t,i,j,k} = (V_{t,i,j,k} − mean(V)) / std(V),
where t is the index of the dynamic sequence (from 0 to 4) and i, j, k are matrix indices
of the respective voxel. The chest area within breast DCE-MRI images tends to be
highly enhancing, which may lead to false positive detected regions. We therefore used a
classifier-based breast segmentation algorithm described in [35] to isolate the breast and
only consider areas within the breast as possible lesions.
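As a rough illustration of the clipping and standardization steps (motion correction and breast segmentation are outside its scope), a minimal NumPy sketch is shown below; computing the statistics over the whole 4D matrix rather than per time point is an assumption made here.

```python
import numpy as np

def clip_and_standardize(volume_4d, percentile=99.5):
    """volume_4d: array of shape (5, n_slices, n_rows, n_cols)."""
    V = volume_4d.astype(np.float64)
    # Clip intensity spikes at the 99.5th percentile.
    V = np.minimum(V, np.percentile(V, percentile))
    # Standardize: V = (V - mean(V)) / std(V).
    return (V - V.mean()) / V.std()
```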
Figure 2.2: (a) Architecture of Deep ANN with 45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax output nodes. (b) Stacked dAE used to initiate the network. The first dAE uses a tanh while the second dAE uses a sigmoid as the encoding function. The dashed arrow shows the path with respect to the original network.
2.5 Region Selection
We use an ANN to generate a list of suspicious regions. Our particular architecture uses
45 input nodes, 32 tanh hidden nodes, 7 sigmoid hidden nodes, and 2 softmax classifier
nodes. These parameters were experimentally optimized to minimize the number of sam-
ples from the training set that were misclassified (misclassification error). The proposed
architecture is illustrated in Fig. 2.2. An overview of the activation functions for the
nodes can be found in Appendix A.
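For reference, a minimal NumPy sketch of the forward pass through this 45-32-7-2 architecture is shown below; the random weights are placeholders standing in for the pretrained and fine-tuned parameters, and the actual implementation relies on pylearn2 rather than hand-written NumPy.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Placeholder parameters; in the actual pipeline these come from unsupervised
# pretraining followed by supervised fine-tuning.
W1, b1 = rng.normal(scale=0.1, size=(45, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 7)), np.zeros(7)
W3, b3 = rng.normal(scale=0.1, size=(7, 2)), np.zeros(2)

def classify_tiles(tiles):
    """tiles: (n, 45) flattened 5x1x3x3 image tiles -> (n, 2) class probabilities."""
    h1 = np.tanh(tiles.dot(W1) + b1)   # 32 tanh hidden nodes
    h2 = sigmoid(h1.dot(W2) + b2)      # 7 sigmoid hidden nodes
    return softmax(h2.dot(W3) + b3)    # 2 softmax output nodes over {lesion, non-lesion}

probabilities = classify_tiles(rng.rand(4, 45))
```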
2.5.1 Unsupervised Pretraining
We initialized our ANN by greedy layer-wise training of a stack of dAEs for the 2 hidden layers.
The training data for this process was acquired by dividing each volume in the training
set into 5×1×3×3 image tiles (Figure 2.1). Each layer of dAE in the stack was trained
for 30 epochs using a dropout rate of 30% (each node has 30% chance of being set to
0), batch size of 100, and annealed learning rate starting at 0.001. During each epoch,
millions of image tiles extracted randomly from volumes in the training set are processed
by the stacked dAE and optimized to minimize the reconstruction error. The resulting
weights represent latent representations of our dataset, which are used to initialize the
ANN.
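A minimal sketch of dividing a volume into overlapping 5 × 1 × 3 × 3 tiles and flattening them into 45-element input vectors (Figure 2.1) might look as follows; skipping the one-voxel image border is an assumption made here for simplicity, not necessarily how the original pipeline handles boundaries.

```python
import numpy as np

def extract_tiles(volume_4d):
    """volume_4d: (5, n_slices, n_rows, n_cols) -> (n_tiles, 45) matrix.

    Each tile covers all 5 time points, one slice, and the 3x3 window
    (8-neighbourhood) centred on a voxel, flattened to 45 values."""
    n_t, n_slices, n_rows, n_cols = volume_4d.shape
    tiles = []
    for s in range(n_slices):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                tile = volume_4d[:, s, r - 1:r + 2, c - 1:c + 2]  # 5 x 3 x 3
                tiles.append(tile.ravel())
    return np.asarray(tiles)
```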
2.5.2 Supervised Training
After initialization, the ANN is fine-tuned using labeled data. Lesion samples were
generated by taking image tiles within our ground truth segmentations while non-lesion
samples were acquired by taking image tiles within areas of enhancement (e.g. blood ves-
sels, background parenchymal enhancement, artifacts) in normal breasts. In total, 1.8M
input vectors (equally split between lesion and non-lesion samples) from the training set
were used to train each epoch. Since our lesion and non-lesion samples were unbalanced,
we augmented our data by oversampling the minority class with Gaussian noise (σ: 1/10
of the minimum feature standard deviation, µ: 0) in order to balance the 2 classes.
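A minimal sketch of this noise-based oversampling is shown below, assuming NumPy; the noise scale follows the description above (zero mean, standard deviation one tenth of the smallest per-feature standard deviation), while the sampling-with-replacement strategy is an assumption for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

def oversample_with_noise(X_minority, n_needed):
    """Draw minority-class samples with replacement and add zero-mean Gaussian
    noise scaled by one tenth of the smallest per-feature standard deviation."""
    sigma = 0.1 * X_minority.std(axis=0).min()
    idx = rng.randint(0, X_minority.shape[0], size=n_needed)
    noise = rng.normal(loc=0.0, scale=sigma, size=(n_needed, X_minority.shape[1]))
    return X_minority[idx] + noise
```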
The training was performed using stochastic mini-batch gradient descent backpropa-
gation with initial learning rate of 0.1 and batch size 100. The learning rate was expo-
nentially decreased over 30 epochs. Early stopping based on the MSE (Mean Squared
Error) of the training data was used to prevent over-fitting of our model.
2.5.3 Optimal Region Threshold
To determine the optimal threshold for generating our regions of interest, we performed
FROC (Free response Receiver Operating Characteristic) analysis on the training dataset.
The threshold T was varied between 0.05 and 0.95. For each threshold value, 362 (150
malignant, 212 benign) lesions from our training set were used to calculate the sensitivity
while the average number of detected regions in the remaining 314 normal studies was
used to compute the Mean False Candidate Regions (MFCR). A lesion is considered to
be a true positive if the Dice score (Equation 2.1) between the ground truth and the
binarized image is greater than 0. In order to achieve robust outlines, our threshold not
only has to capture all the lesions but also retain a high correspondence to the ground
truth segmentations. The optimal threshold, therefore, was selected based on the highest
performance metric as defined by Equation 2.2. We use a scaling factor α of 0.5 as a
compromise between the sensitivity and Dice score. In cases where multiple thresholds
Figure 2.3: Result of the conditional dilation operation used to join disconnected islands together. Left is the subtraction image showing a 2D slice of the lesion. Middle is the segmentation without dilation and right is the segmentation with dilation.
had the same performance metric, we picked the one that had the fewest MFCR. The
optimal threshold selected was 0.615.
DS = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (2.1)

\mathrm{metric}[t] = \alpha \times \mathrm{mean}(DS[t]) + (1 - \alpha) \times \mathrm{mean}(sens[t]), \quad \forall t \in \mathrm{Thresholds} \qquad (2.2)
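The two quantities above are straightforward to compute; the short NumPy sketch below shows one way Equations 2.1 and 2.2 could be evaluated for a single threshold value, with the array and variable names being illustrative.

```python
import numpy as np

def dice(ground_truth, prediction):
    """Dice similarity (Eq. 2.1) between two binary masks."""
    a, b = ground_truth.astype(bool), prediction.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 0.0

def threshold_metric(mean_dice, mean_sensitivity, alpha=0.5):
    """Performance metric (Eq. 2.2) for one candidate threshold."""
    return alpha * mean_dice + (1.0 - alpha) * mean_sensitivity
```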
2.5.4 Postprocessing
Since nonmass lesions can potentially consist of multiple regions, an attempt was made to
join these regions so that they could be treated as a single structure for further analysis.
We performed a conditional dilation by first dilating the thresholded image by 1 mm and then multiplying the resulting image by a mask (the probability map thresholded at 0.5). An
example of the resulting pre- and post-operation is shown in Figure 2.3.
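A minimal sketch of this conditional dilation, using SciPy, is given below; the in-plane voxel spacing value and the isotropic approximation of the 1 mm dilation are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def conditional_dilation(binary_seg, prob_map, voxel_mm=0.38, dilate_mm=1.0):
    """Dilate the segmentation by ~1 mm, but only into voxels whose lesion
    probability is at least 0.5, so that disconnected islands can merge
    without growing into clearly non-lesion tissue."""
    iterations = max(1, int(round(dilate_mm / voxel_mm)))
    dilated = binary_dilation(binary_seg, iterations=iterations)
    return np.logical_and(dilated, prob_map >= 0.5)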
2.6 Segmentation
After ANN training and optimal threshold selection, we can generate lesion candidates
as follows:
1. Preprocess the DCE-MRI series as described in Section 2.4.
2. Reformat the volume as a series of overlapping image tiles (Fig. 2.1).
3. Use the trained ANN to classify each tile as lesion or non-lesion to generate a
probability map that represents lesion-likelihood for each voxel.
4. Apply the optimal threshold from Section 2.5.3 to the probability map to get a
binary image.
5. Apply morphological postprocessing to connect and filter regions.
6. Assign labels to each region so that voxels within each region have the same value.
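As a concrete illustration of steps 4 to 6, the short sketch below thresholds a probability map, applies the conditional dilation, and labels connected regions with SciPy; the placeholder array contents and the 3-voxel dilation radius are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation, label

# prob_map: the per-voxel lesion-likelihood volume produced in step 3
prob_map = np.random.rand(32, 64, 64)              # placeholder volume for illustration

binary = prob_map >= 0.615                         # step 4: optimal threshold from Section 2.5.3
dilated = binary_dilation(binary, iterations=3)    # step 5: ~1 mm dilation (0.38 mm voxels)
binary = np.logical_and(dilated, prob_map >= 0.5)  # ...conditioned on the 0.5 probability mask
labelled, n_regions = label(binary)                # step 6: one integer label per connected region
```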
2.7 Region Classification
The previous section describes a method to segment out enhancing regions as potential
lesion candidates. An ANN was trained to detect regions of interest analogous to how
a human would find lesions by finding bright enhancing regions. In order to find out
whether a detected region is a false positive (e.g. a blood vessel), a malignant lesion, or a benign
lesion, we need to consider additional features such as its shape and texture. In order to
accomplish this, we employ a cascaded 2-stage RFC (Figure 2.4) similar to [17].
The first stage removes as many false positive regions as possible while the second stage
classifies the remaining regions as benign or malignant. To this end, we compute various
morphological, kinetic, and textural features for each region and use them to differentiate
between lesion and non-lesion regions (e.g. artifacts, blood vessels). We then use the
same features to classify the remaining lesion regions as benign or malignant.
Figure 2.4: A schema of the cascaded RFC. The first RFC classifies lesion and non-lesion regions while the second RFC differentiates the resulting lesions as malignant or benign.
2.7.1 Feature Extraction
We developed a feature extraction pipeline to generate a combination of 75 morphological,
kinetic, and textural features for each region. The full description of these features
is listed in Appendix B. The output segmentations of the trained ANN applied to
the training set were used to generate training samples for the cascaded RFC. Features
extracted from each of the segmented regions were labeled accordingly (malignant for
regions in malignant studies, benign for regions in benign studies, and normal for
regions in normal studies) and used as training samples for the RFC.
2.7.2 RFC Training
The cascaded RFC consists of a lesion classifier (RFC1) and malignancy classifier (RFC2).
Since our lesion versus non-lesion samples were greatly unbalanced, each individual tree
within the RFC was trained by using all of the lesion and a subset of the non-lesion
samples such that each tree was trained on an equal number of lesion and non-lesion
Figure 2.5: Illustrated examples of features learned by our ANN. (a) 2D representation of first hidden-layer network weights. (b) The value of each row is averaged and plotted on a graph.
samples. The RFC classifiers were trained by performing a grid search along with 10-
fold cross-validation for each set of parameters within the grid. The final classifier was
obtained by keeping all the RFCs across the folds that had an AUC greater than 0.75.
The operating point closest to 100% sensitivity and 100% specificity was selected as the
optimal decision threshold.
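A rough scikit-learn sketch of this training procedure is given below. The parameter grid and placeholder data are illustrative, the per-tree lesion/non-lesion balancing is only approximated with class_weight="balanced_subsample", and the step of retaining every fold's RFC with AUC above 0.75 is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_curve

# placeholder feature matrix (75 features per region, Appendix B) and lesion labels
X = np.random.rand(400, 75)
y = np.random.randint(0, 2, 400)

# illustrative parameter grid searched with 10-fold cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced_subsample", random_state=0),
    param_grid, scoring="roc_auc", cv=10)
search.fit(X, y)

# operating point closest to the (FPR = 0, TPR = 1) corner of the ROC curve
scores = search.best_estimator_.predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, scores)
optimal_threshold = thresholds[np.argmin(np.hypot(fpr, 1.0 - tpr))]
```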
2.8 Results
The unsupervised training allowed our ANN to capture representations of our data. The
list of features learned by the first hidden layer of our ANN is shown in Figure 2.5. The
plots generated from the weights resemble a series of intensity-time curves (in the form
of image patches). It is interesting to note how the weights across each row starting
from the second row are fairly uniform while the first row, representing the precontrast
patch, is more heterogeneous. This might signify that there is more spatial variance in
the precontrast patch compared to the post-contrast patches. To validate our ANN as
an effective way to delineate lesions, we applied the trained network to the unseen testing
set and measured its performance. Our ANN detected 342 out of 362 (94.4%) lesions
from the training set and 204 out of 211 lesions from the testing set.
Figure 2.6: Aggregated ROC curve of the lesion classifier (RFC1) for each fold of the 10-fold cross-validation. RFC1 achieved 0.91 AUC (0.91-0.94 interquartile range). The optimal threshold value of 0.6 was selected to maximize the sensitivity and specificity.
A cross-validation ROC analysis was performed to demonstrate the generalizability of
our RFC. Figure 2.6 shows the 10-fold cross-validation performance of the RFC1 lesion
classifier on the hold-out validation set while Figure 2.7 shows the performance of the
RFC2 malignancy classifier.
After successfully training both classifiers, we applied our CADx pipeline to the testing
set consisting of 71 malignant, 140 benign, and 316 normal studies from completely
different patients. This distribution is approximately 8.5 times the provincial high-risk
screening population breast cancer incidence rate of 1.6% [12]. Our algorithm was able
to correctly detect 204 out of 211 (96.7%) lesions (both benign and malignant) and
correctly classified 67 out of the 71 (94.3%) malignant lesions and 113 of 140 (80.7%)
benign lesions. The overall false positive rate was 0.12 per breast. An overview of the
results is summarized in Table 2.3.
Figure 2.7: Aggregated ROC curve of the malignant/benign classifier (RFC2) for each fold of the 10-fold cross-validation. RFC2 achieved 0.81 AUC (0.80-0.85 interquartile range). The optimal threshold value of 0.63 was selected to maximize the sensitivity and specificity.
Table 2.3: Statistics of our proposed method on the training set and testing set. The measures were computed after applying both the RFC1 and RFC2 classifiers and provide a rough estimate of how well our algorithm performs in practice.
Statistic Training Set Testing Set
Sensitivity 0.873 0.944
Specificity 0.859 0.807
Accuracy 0.862 0.886
PPV 0.639 0.545
Table 2.4: A breakdown of the performance on the testing set with respect to BI-RADS category.
BI-RADS 0 1 2 3 4 5 6
False Negative 0 0 0 1 0 1 2
False Positive 1 3 21 8 17 0 6
True Negative 1 87 179 37 82 5 9
True Positive 5 0 0 3 19 17 26
2.9 Discussion
We have shown that our method correctly classified the majority of malignant lesions
in our testing set; 27 of the 140 benign studies and 29 of the 316 normal studies
were classified as malignant. Our method was able to correctly identify the majority of
benign lesions (113 out of 140) at a cost of 29 false positive detections in normal breasts.
Since benign lesions in breasts designated as BI-RADS 3 or 4 are often biopsied, our
method would have greatly reduced the number of benign biopsies in a clinical setting.
A breakdown of our test results is shown in Table 2.4. One of the false negative lesions
(the BI-RADS 3) was determined to be low-grade Ductal Carcinoma In Situ (DCIS)
after biopsy. Since most of the BI-RADS classifications in our database were assigned per
breast, benign biopsied lesions in the same breast as malignant lesions were assigned the
same BI-RADS score in our analysis. This explains the presence of the 6 BI-RADS-6
False Positives (benign classified as malignant) and 9 True Negatives (benign correctly
classified as benign).
Many of the false candidate regions in the normal studies were due to partially seg-
mented blood vessels, imaging artifacts, background parenchymal enhancement, and en-
hancing foci that resemble nonmass lesions (see Figure 1.2). Examples of false positive
classifications are shown in Figure 2.8. Small enhancing islands were classified as ma-
lignant in the top row. This might be due to the similarity of the enhancing regions to
some of the nonmass lesions in our dataset. This problem could be rectified by boot-
strapping the ANN training with patches within these regions as non-lesion samples.
Alternatively, we could augment our RFC2 malignancy classifier training set with these
cases as additional benign lesion samples.
A second type of false positive misclassification is caused by the existence of a known
benign lesion in the image (see Figure 2.8, bottom row). When radiologists find an
enhancing region determined to be BI-RADS 3 or lower, the patient could be scheduled
for another exam in 6 months to examine its growth. When no changes are observed,
the radiologist can deem the study to be normal, meaning that no cancer is present
in the breast despite the presence of enhancements. Due to the scarcity of our labeled
data, we have included these follow-up exams in our training and testing dataset. The
enhancement detected by our algorithm was described as an intramammary lymph node
by the radiologist after a follow up exam and T2W imaging. However, without access to
additional information, it is not unreasonable for our algorithm to flag these cases as requiring
further inspection.
While ANNs have been used in the past for both segmentation and classification
of lesions, our approach differs firstly by including an unsupervised learning stage for
initializing the ANN. We have also introduced a neighbourhood approach in which we
classify a group of pixels rather than individual ones as is the case in previous studies.
Finally, we reinforced the idea of a cascaded approach to the differentiation of benign
and malignant lesions.
Figure 2.8: Examples of false positive misclassifications by our algorithm. The top row shows mild background parenchymal enhancements misclassified as malignant lesions. The bottom row shows a lymph node detected as malignant.
2.10 Implementation Details and Limitations
We made some design decisions concerning the architecture of our proposed method,
which will be discussed in this section. Then, we will outline some limitations of our
algorithm and propose ways to alleviate them.
Patch Size Selection
Through preliminary testing on the ANN architecture, we found that the 8-neighbours
connection scheme (3×3 patch size) gave us better results compared to the 0-neighbours
(single voxel) and 24-neighbours (5 × 5 size) architecture. Although no formal inves-
tigation was carried out, we suspect that the 5 × 5 architecture gave worse results because of the curse of dimensionality: as the input
dimensionality increases, the space in which we train our classifiers becomes more sparse.
Since the current patch size results in 45 (3 by 3 patch over 5 time points) inputs, the
increased neighbours count would require 125 inputs. Therefore, in order for the 24-
neighbours scheme to have the same amount of robustness as our current architecture
we would require at least 3 times as much data. Another possibility is that our data has an in-plane voxel size of around 0.38 mm, which effectively gives our 3 × 3 patches a spatial extent of about 1 mm. Testing the patch size on images with different voxel sizes might provide more insight into this effect.
Optimal Threshold Criteria
Since we decided that it is more detrimental to miss a malignant lesion than to include
a benign one, the optimal threshold selection for all the classifiers was skewed towards
higher sensitivity. Additionally, we used an arbitrary α value of 0.5 in Equation 2.2 to
select the optimal threshold. In order to optimize this value, we would have had to repeat
the training of all the classifiers (both ANN and RFC) and their optimal thresholds for
each α between 0 and 1, which seemed inefficient with respect to time and accuracy
Figure 2.9: Image of the missed malignant lesion in the test data set (circled in red). The lesion resembles background enhancement.
trade-offs. Figure 2.9 shows the missed malignant lesion from our testing set due to the
applied threshold. The lesion can be distinctly seen in the probability map, but is missing
in the resulting thresholded image.
Limitations
Since our ANN was trained using images from our clinic, our method might not work as
well on images from other clinics taken using a different machine or MRI pulse sequence.
One problem arises when the ANN is applied to DCE-MRI volumes that do not have exactly 5 time points. This could be alleviated by down-sampling sequences with more than 5 time points so that they fit our ANN architecture. Further training using
those resampled images would be beneficial to ensure robustness. Another concern could
be differences in image resolution. Since our algorithm was trained on sagittal MRI
volumes, applying it to images with a different orientation might pose problems due to
differences in slice thickness. This problem could be avoided by resampling the training data to an isotropic resolution and including images acquired at different orientations. Since
many of the operations are on a per-slice basis, the processing time of our algorithm
could be greatly reduced by utilizing the parallel computation capabilities of a GPU for
the calculations.
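For instance, a resampling step of the kind suggested above could look like the following SimpleITK sketch; the target spacing and the input file name are hypothetical.

```python
import SimpleITK as sitk

def resample_isotropic(image, spacing_mm=1.0):
    """Resample a volume to isotropic voxel spacing with linear interpolation."""
    old_spacing = image.GetSpacing()
    old_size = image.GetSize()
    new_spacing = (spacing_mm,) * image.GetDimension()
    new_size = [int(round(sz * sp / spacing_mm)) for sz, sp in zip(old_size, old_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), new_spacing, image.GetDirection(),
                         0.0, image.GetPixelID())

volume = resample_isotropic(sitk.ReadImage("study.nii.gz"))  # hypothetical input file
```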
2.10.1 Acknowledgements
We would like to acknowledge the contributions of the OICR Smarter Imaging Program and the Canadian Breast Cancer Foundation, which made this research possible.
Chapter 3
Discussion and Future Work
3.1 Significance of Contributions
In the previous chapters, we emphasized the need for robust lesion segmentation in an
automated CADx pipeline for high risk breast cancer screening. Prior to the quantifi-
cation of tumour malignancy, it is necessary to robustly delineate the corresponding
regions of interest. Relevant features are then computed from those regions and analyzed
to provide a cancer likelihood score. This thesis presents an automated CADx pipeline
that allows accurate detection and diagnosis of breast lesions, which facilitates screening
exams. This work attempts to first segment suspicious lesions based on their kinetic en-
hancement features and then classify them as malignant or benign using a combination
of kinetic, morphological and textural features. The use of kinetic enhancement features
for segmentation corresponds to the way radiologists report findings based on enhancing
regions. Since malignant and benign lesions are characterized by their shape (e.g. round
versus irregular), margin (e.g. circumscribed, spiculated), internal enhancement texture
(e.g. homogeneous versus heterogeneous), and kinetic enhancement curves (e.g. wash-
out versus persistent), we compute these types of features and use them to characterize
our segmented regions. As we have stated in Chapter 1, we have developed an automated
CADx software that can help radiologists diagnose cancers faster and more accurately.
Although our method did not detect all the cancers from our dataset, we must consider
whether our achieved sensitivity of 94.5% is acceptable in clinical practice. It is important
to note that our dataset is biased towards cancers that radiologists found and therefore
does not represent the true sensitivity of general screening exams. For instance, the
population-wide breast screening program in Ontario detected only 86.1% of cancers
[38]. Moreover, our algorithm only uses DCE-MRI images whereas clinicians have access
to additional information such as T2W images, ultrasounds and mammography to aid in
detection. DCIS, for example, often produces micro-calcifications, which are undetectable on MRI. Since our method relies on the DCE-MRI modality for detection of lesions, it
is impossible to achieve 100% sensitivity and so we should consider the minimal acceptable
detection rate.
3.2 Future Directions
Convolutional Neural Networks
Our proposed automated CADx pipeline follows the classical image processing paradigm
in which lesions are first segmented and then classified. However, the nature of non-mass
lesions implies the existence of multiple enhancing regions. An intrinsic problem with this
classical paradigm arises from the treatment of these regions as individual disconnected
lesions rather than as a single non-mass lesion. This might pose the risk of the system
presenting the individual parts as benign lesions while in reality the actual non-mass
lesion is malignant. Although we have not encountered this problem in our dataset, it is not an unfounded concern. One way to overcome this problem is to merge the segmentation
and classification into a single stage. Rather than segmenting each lesion as an individual
region, we can treat it as a ROI-recognition problem where we classify each ROI based on
whether it contains a malignant lesion or not. In fact, many top results in natural image
Figure 3.1: Diagram of a ConvNet. The 2 convolution layers act as feature extractors without segmentation, while the fully connected layers act as classifiers. The segmentation and classification steps are in essence merged into a single classifier. Image adapted from http://parse.ele.tue.nl/education/cluster2.
object recognition challenges use some type of ConvNet model [1]. With the advent of more powerful consumer-level hardware, scientists were able to train deeper ANN architectures at minimal cost. This facilitated the adoption of ConvNets within the
medical sciences community. These networks take advantage of the spatial information
within natural images to learn robust hierarchical features that can be used to reconstruct
the original image. While ConvNets are able to solve more complex classification tasks
compared to other classifiers, they require proportionately more data. Figure 3.1 shows
an example of ConvNet architecture where the need for segmentation is avoided.
Transfer Learning
The strength of ConvNets lies in the fact that the same features learned from one domain
can be applied to another domain with minimal to no adjustments. This phenomenon is
called transfer learning and models the fact that humans are adept at using knowledge
learned in one domain and applying it to another (e.g. using the concept of differentials
from mathematics in physics). Transfer learning has been applied in medical image
analysis by [48] and was shown to improve classification by up to 60% compared to regular
supervised machine learning algorithms when training data is scarce. This is consistent
with how humans can recognize abnormalities in MRI images despite being only exposed
to natural images throughout their lives. Therefore, transfer learning can be used to compensate for the lack of abundant labelled data in the medical field.
Radiomics
One of the difficulties in classifying lesions is the fact that some cancers remain dormant
for years while others metastasize rapidly. There might be a genetic cause
behind their development which could be deciphered via machine learning. The field of
radiomics revolves around correlating genomic information with radiology images in order
to characterize tumour phenotype [3]. [29] introduced a workflow in which quantitative
image features are correlated with treatment outcome or gene expression. The current
framework of our CADx pipeline allows easy integration of additional modules such as
learned genomic features. The full potential of our CADx system within the context
of medical screening programs could be realized by incorporating genomic data. For
instance, an ANN based classifier could be attached at the end of the CADx pipeline
to map our extracted image features to a dataset of genomic data. Similar approaches
have been done in the computer vision field in which image captions are generated from
natural images (and vice versa) [23].
3.3 Summary of Contributions
Many existing CADx algorithms in literature are only able to detect malignant lesions [18,
15, 16] or focus on differentiating one type of lesion (either mass or nonmass) as benign or
malignant [41, 45, 49]. I integrated a novel lesion detection algorithm with an existing
classification scheme and demonstrated that our proposed method was able to accurately
detect and classify both mass and nonmass-like lesions as benign or malignant. To
summarize, the trained deep ANN was able to provide accurate and robust segmentations
of both malignant and benign breast lesions in our dataset. Thus, our method can be
used as a catch-all for segmenting any type of breast lesion. The results suggest that
localized time-intensity curves contain sufficient information to delineate breast lesions
from other tissue in DCE-MRI. We believe that this might be due to the robust kinetic
features that our ANN was able to learn from the data. The 2-stage cascaded classifier
approach was able to reduce the number of false positives detected by our segmentation
algorithm to a reasonable amount acceptable for clinical usage. We were able to lower
the False Positive Rate (FPR) well below 1 per breast on unseen data without any
reduction in sensitivity (see Table 2.3). In fact, if every false positive detection from
our algorithm were to be biopsied, our method would have caused 60% fewer negative
biopsies (compared to the total benign cases in our dataset). Our reported results seem
to surpass most of the current automated lesion detection methods in the literature and
should be scalable for clinical practice. A more robust validation method would involve
a comparison of performance against other methods using publicly available datasets.
Appendix A
Perceptrons
Perceptrons are the processing units used in ANNs. Each layer of the ANN is composed
of many perceptron processing units working in conjunction. A diagram of the perceptron
is outlined in Figure A.1. The perceptron is modeled after the biological neuron. Just as a normal neuron receives signals from other neurons or sensory organs as input,
the perceptron accepts a weighted sum of outputs from other perceptrons as its input.
When the strength of the signals passes a certain threshold, the neuron proceeds to fire
the signal to other units connected to its axon. The derivation shown in A.1 models this
interaction: when the perceptron’s activation reaches a certain threshold, the perceptron
Figure A.1: Diagram of the perceptron unit. It computes the weighted sum of its inputs as activation and proceeds to fire a signal if a threshold is passed.
outputs 1 and otherwise 0. The threshold θ is often referred to as the bias in the literature and is incorporated into the weighted sum to allow each perceptron to learn its own threshold
during training.
Activation Function: f(a) = \begin{cases} 1 & a \geq \theta \\ 0 & a < \theta \end{cases}

Sum: a = \sum_{i=1}^{n} x_i w_i

\sum_{i=1}^{n} x_i w_i \geq \theta, \quad where \theta is the activation threshold

\sum_{i=1}^{n} x_i w_i - \theta = 0, \quad set x_0 = \theta and w_0 = -1

\sum_{i=0}^{n} x_i w_i = 0

a = \sum_{i=0}^{n} x_i w_i \qquad (A.1)

Output: y = \begin{cases} 1 & a \geq 0 \\ 0 & a < 0 \end{cases}
Activation Functions
In the perceptron example above, we used a linear step function as the activation function.
In order to introduce non-linearity in our classifier, a non-linear activation function can be
used instead. Equation A.2 shows various activation functions that could be used for each
perceptron. The sigmoid and hyperbolic tangent functions are commonly used within
hidden layers whereas the softmax and linear functions are reserved for the output layer
of an ANN. ANNs applied to classification problems usually employ the softmax activation
function as output while regression problems typically use the linear activation function.
Within the context of an ANN layer, a sigmoid layer refers to an array of perceptron units
with the sigmoid activation function. Although each perceptron within a layer could have
a different activation function, it is ineffective to do so in practice as the network will
learn to adjust for this during training.
tanh function: y = \frac{2}{1 + e^{-2a}} - 1

sigmoid function: y = \frac{1}{1 + e^{-a}}

softmax function: y_j = \frac{e^{a_j}}{\sum_{i=1}^{N} e^{a_i}} \quad for j = 1, ..., N

linear function: y = a \qquad (A.2)
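The same four functions of Eq. A.2, written out in NumPy; the max subtraction in the softmax is a standard numerical-stability detail not shown in the equation.

```python
import numpy as np

def tanh(a):
    return 2.0 / (1.0 + np.exp(-2.0 * a)) - 1.0

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - np.max(a))   # subtract the max for numerical stability
    return e / e.sum()

def linear(a):
    return a
```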
Appendix B
Lesion Features
According to the MRI BI-RADS lexicon, the first stage in assessing lesion malignancy
is to classify the type of enhancement as mass, nonmass, or focus. The radiologist then
estimates the likelihood for malignancy by assessing the kinetic, morphological as well as
textural characteristics of the lesion.
The kinetic curve shape type is intrinsically related to the perfusion, capillary per-
meability, and diffusion of contrast media from blood vessels to the extracellular space.
Invasive cancers predominantly present as mass lesions, with washout and persistent
curve shapes. Previous work by [20] has shown that kinetic analysis has the potential
to differentiate between benign and malignant mass lesions effectively. However, when
analysing lesions presenting as nonmass-like enhancements, conventional kinetic analysis
has failed to demonstrate discriminative power between benign and malignant nonmass
lesions [37].
The morphological characteristics of lesions are also evaluated. The main morphologi-
cal difference between mass and nonmass lesions is that unlike mass lesions, nonmass-like
enhancements exhibit poorly defined boundaries, leading to difficulties in the analysis of
morphology [20, 46]. Ultimately, morphological and kinetic features reflect the biological
characteristics of lesions and help explain the differences between benign and malignant
lesions.
Below is a list of all the features used by our cascaded RFC to differentiate between
malignant and benign lesions.
Dynamic Features
Contrast Enhancement: C(r, i) = \frac{S(r, i) - S(r, 0)}{S(r, 0)}

Average Contrast Enhancement: \bar{C}(i) = \mathrm{mean}_r[C(r, i)]

Maximum Uptake: \max_{i=0,1,...,5}[\bar{C}(i)]

Peak Location of Enhancement: time frame index at which maximum enhancement occurs

Uptake Rate: \frac{\text{Maximum Uptake}}{\text{Peak Location of Enhancement}}

Washout Rate: \begin{cases} \frac{\text{Contrast Enhancement} - \bar{C}(5)}{5 - \text{Peak Location of Enhancement}} & \text{Peak Location of Enhancement} \neq 5 \\ 0 & \text{Peak Location of Enhancement} = 5 \end{cases}

Inhomogeneity of Contrast Uptake: \max_{i=0,...,M-1}\left\{\frac{\mathrm{var}_r[I(r, i)]}{\mathrm{var}_r[I(r, 0)]}\right\}, where I(r, i) is the set of voxel intensity values in the lesion at time point i and r is the vector pointing to the lesion

Variance of Uptake: \min_{i=0,...,M-2}\left\{\frac{\mathrm{var}_r[I(r, i)]}{\mathrm{var}_r[I(r, i+1)]}\right\}
Spatial Variance of Enhancement: V(i) = \frac{1}{L - 1} \sum_{r=1}^{L} [C(r, i) - \bar{C}(i)]^2, \quad where i = 0, 1, ..., 5

Maximum Variance of Enhancement: \max_i[V(i)]

Peak Location of Variance: time frame index at which maximum variance occurs

Enhancement Variance Increasing Rate: \frac{\text{Maximum Variance of Enhancement}}{\text{Peak Location of Variance}}

Enhancement Variance Decreasing Rate (F_{III,4}): \begin{cases} \frac{\text{Maximum Variance of Enhancement} - V(5)}{5 - \text{Peak Location of Variance}} & \text{Peak Location of Variance} \neq 5 \\ 0 & \text{Peak Location of Variance} = 5 \end{cases}

Enhancement Variance at First Post-Contrast Frame: V(1)
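As an illustration, a few of the kinetic features above can be computed directly from the average contrast enhancement curve, as in the NumPy sketch below; the function name and the example curve are illustrative, and the washout numerator is interpreted here as the maximum uptake.

```python
import numpy as np

def kinetic_features(mean_enhancement):
    """Compute a few of the kinetic features above from the average contrast
    enhancement curve C(i), indexed i = 0..5 as in the formulas."""
    c = np.asarray(mean_enhancement, dtype=float)
    max_uptake = c.max()
    peak = int(c.argmax())                                     # Peak Location of Enhancement
    uptake_rate = max_uptake / peak if peak > 0 else 0.0
    washout_rate = (max_uptake - c[5]) / (5 - peak) if peak != 5 else 0.0
    return {"maximum_uptake": max_uptake, "peak_location": peak,
            "uptake_rate": uptake_rate, "washout_rate": washout_rate}

# e.g. a washout-type curve: rapid initial uptake followed by a decline
print(kinetic_features([0.0, 1.2, 1.5, 1.3, 1.1, 0.9]))
```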
Morphological Features
3D Sharpness of Lesion Margin: \max_{i=0,...,M-1}\left\{\frac{\mathrm{mean}_r\,\|\nabla[F_m(r, i) - F_m(r, 0)]\|}{\mathrm{mean}_r\,F_m(r, i)}\right\}

3D Variance of Margin Gradient: \max_{i=0,...,M-1}\left\{\frac{\mathrm{var}_r\,\|\nabla[F_m(r, i) - F_m(r, 0)]\|}{[\mathrm{mean}_r\,F_m(r, i)]^2}\right\}

3D Circularity: \frac{\text{volume of sphere with effective lesion diameter}}{\text{volume of lesion}}

3D Irregularity: 1 - \frac{\pi(\text{effective lesion diameter})^2}{\text{surface area of lesion}}

effective lesion diameter = 2\sqrt[3]{\frac{3 \cdot \text{volume of lesion}}{4\pi}}

Radial Gradient Histogram (RGH): histogram H(p) of voxel-value gradients along lines intersecting the centroid of the lesion

Maximum Variance of RGH Values: \max_{i=0,...,M-1}\{\mathrm{var}_p\,H(p)\}, where p = \frac{|\nabla[F_b(r, i) - F_b(r, 0)] \cdot (r - r_c)|}{\|\nabla[F_b(r, i) - F_b(r, 0)]\| \cdot \|(r - r_c)\|}

Maximum Standard Deviation of RGH Values: \max_{i=0,...,M-1}\{\mathrm{std}_p\,H(p)\}
Texture Features
Energy: \sum_{i=1}^{G} \sum_{j=1}^{G} p(i, j)^2

Maximum Probability: \max_{i,j}\, p(i, j)

Contrast: \sum_{k=0}^{G-1} k^2 \left(\sum_{|i-j|=k} p(i, j)\right)

Sum of Squares (Variance): \sum_{i=1}^{G} \sum_{j=1}^{G} (i - \mu)^2 p(i, j)

Correlation: \frac{\sum_{i=1}^{G} \sum_{j=1}^{G} (ij)\, p(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y}

Maximal Correlation Coefficient: \sqrt{\text{second largest eigenvalue of } Q}, where Q(i, j) = \sum_{k} \frac{p(i, k)\, p(j, k)}{p_x(i)\, p_y(k)}

Sum Average: \sum_{i=2}^{2N} i\, p_{x+y}(i)

Sum Entropy: -\sum_{i=2}^{2N} p_{x+y}(i) \log p_{x+y}(i)

Sum Variance: \sum_{i=2}^{2N} (i - \text{Sum Entropy})^2\, p_{x+y}(i)

Difference Entropy: -\sum_{i=0}^{N-1} p_{x-y}(i) \log p_{x-y}(i)

Difference Variance: variance of p_{x-y}
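As an illustration, a handful of the co-occurrence (GLCM) features above can be computed with scikit-image as sketched below; the gray-level quantization, distance, and angle choices are assumptions made for this example, and the remaining Haralick-style features follow the same pattern.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """A few of the co-occurrence features above for one 2D lesion patch."""
    edges = np.linspace(patch.min(), patch.max(), levels)
    quantized = (np.digitize(patch, edges) - 1).astype(np.uint8)   # G = `levels` gray levels
    glcm = graycomatrix(quantized, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    return {"energy": graycoprops(glcm, "ASM")[0, 0],
            "contrast": graycoprops(glcm, "contrast")[0, 0],
            "correlation": graycoprops(glcm, "correlation")[0, 0],
            "maximum_probability": glcm.max()}

print(glcm_features(np.random.rand(64, 64)))
```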
Appendix C
List of Abbreviations
Abbreviations
ABUS: Automated Breast Ultrasound
AE: Autoencoder
ANN: Artificial Neural Network
AUC: Area Under the Curve
BI-RADS: Breast Imaging Reporting And Data System
BPE: Background Parenchymal Enhancement
CADe: Computer Aided Detection
CADx: Computer Aided Diagnosis
ConvNet: Convolutional Neural Network
dAE: Denoising Autoencoder
DCE-MRI: Dynamic Contrast Enhanced Magnetic Resonance Imaging
DCIS: Ductal Carcinoma In Situ
DL: Deep Learning
Fat-Sat: Fat Saturated
FCM: Fuzzy C-Means
FPR: False Positive Rate
FROC: Free-response Receiver Operating Characteristic
GVF: Gradient Vector Flow
IDC: Invasive Ductal Carcinoma
MFCR: Mean False Candidate Regions
MIP: Maximum Intensity Projection
MRI: Magnetic Resonance Imaging
OBSP: Ontario Breast Screening Program
PPV: Positive Predictive Value
RFC: Random Forests Classifier
ROC: Receiver Operating Characteristic
ROI: Region Of Interest
SRG: Seeded Region Growing
SVM: Support Vector Machine
T1W: T1-Weighted
T2W: T2-Weighted
TPR: True Positive Rate
US: Ultrasound
Bibliography
[1] Large scale visual recognition challenge 2015.
[2] O. Abe, R. Abe, K. Enomoto, K. Kikuchi, H. Koyama, H. Masuda, Y. Nomura,
K. Sakai, K. Sugimachi, T. Tominaga, J. Uchino, M. Yoshida, J. L. Haybittle,
C. Davies, V. J. Harvey, T. M. Holdaway, R. G. Kay, B. H. Mason, J. F. Forbes,
N. Wilcken, M. Gnant, R. Jakesz, M. Ploner, H. M. A. Yosef, C. Focan, J. P. Lo-
belle, U. Peek, G. D. Oates, J. Powell, M. Durand, L. Mauriac, A. Di Leo, S. Dolci,
M. J. Piccart, M. B. Masood, D. Parker, J. J. Price, Psgj Hupperets, S. Jackson,
J. Ragaz, D. Berry, G. Broadwater, C. Cirrincione, H. Muss, L. Norton, R. B. Weiss,
H. T. Abu-Zahra, S. M. Portnoj, M. Baum, J. Cuzick, J. Houghton, D. Riley, N. H.
Gordon, H. L. Davis, A. Beatrice, J. Mihura, A. Naja, Y. Lehingue, P. Romestaing,
J. B. Dubois, T. Delozier, J. Mace-Lesec’h, P. Rambert, O. Andrysek, J. Bark-
manova, J. R. Owen, P. Meier, A. Howell, G. C. Ribeiro, R. Swindell, R. Alison,
J. Boreham, M. Clarke, R. Collins, S. Darby, P. Elphinstone, V. Evans, J. Godwin,
R. Gray, C. Harwood, C. Hicks, S. James, E. MacKinnon, P. McGale, T. McHugh,
G. Mead, R. Peto, Y. Wang, J. Albano, C. F. de Oliveira, H. Gervasio, J. Gordilho,
H. Johansen, H. T. Mouridsen, R. S. Gelman, J. R. Harris, I. C. Henderson, C. L.
Shapiro, K. W. Andersen, C. K. Axelsson, et al. Effects of chemotherapy and
hormonal therapy for early breast cancer on recurrence and 15-year survival: an
overview of the randomised trials. Lancet, 365(9472):1687–1717, 2005.
[3] Hjwl Aerts, E. R. Velazquez, R. T. H. Leijenaar, C. Parmar, P. Grossmann, S. Cav-
alho, J. Bussink, R. Monshouwer, B. Haibe-Kains, D. Rietveld, F. Hoebers, M. M.
Rietbergen, C. R. Leemans, A. Dekker, J. Quackenbush, R. J. Gillies, and P. Lambin.
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics
approach. Nature Communications, 5, 2014.
[4] S. Agliozzo, M. De Luca, C. Bracco, A. Vignati, V. Giannini, L. Martincich, L. A.
Carbonaro, A. Bert, F. Sardanelli, and D. Regge. Computer-aided diagnosis for
dynamic contrast-enhanced breast mri of mass-like lesions using a multiparametric
model combining a selection of morphological, kinetic, and spatiotemporal features.
Medical Physics, 39(4):1704–1715, 2012.
[5] T. Ayer, M. U. Ayvaci, Z. X. Liu, O. Alagoz, and E. S. Burnside. Computer-aided
diagnostic models in breast cancer screening. Imaging in Medicine, 2(3):313–323,
2010.
[6] B. Bayram, H. K. Koca, B. Narin, G. C. Cavdaroglu, L. Celik, U. Acar, and
R. Cubuk. An efficient algorithm for automatic tumor detection in contrast en-
hanced breast mri by using artificial neural network (neubrea). Neural Network
World, 23(5):483–498, 2013.
[7] S. Behrens, H. Laue, M. Althaus, T. Boehler, B. Kuemmerlen, H. K. Hahn, and H. O.
Peitgen. Computer assistance for mr based diagnosis of breast cancer: Present and
future challenges. Computerized Medical Imaging and Graphics, 31(4-5):236–247,
2007.
[8] T. Berber, A. Alpkocak, P. Balci, and O. Dicle. Breast mass contour segmentation al-
gorithm in digital mammograms. Computer Methods and Programs in Biomedicine,
110(2):150–159, 2013.
[9] W. A. Berg, L. Gutierrez, M. S. NessAiver, W. B. Carter, M. Bhargavan, R. S. Lewis,
and O. B. Ioffe. Diagnostic accuracy of mammography, clinical examination, us, and
mr imaging in preoperative assessment of breast cancer. Radiology, 233(3):830–849,
2004.
[10] J. M. Chang, W. K. Moon, N. Cho, J. S. Park, and S. J. Kim. Radiologists’
performance in the detection of benign and malignant masses with 3d automated
breast ultrasound (abus). European Journal of Radiology, 78(1):99–103, 2011.
[11] W. J. Chen, M. L. Giger, and U. Bick. A fuzzy c-means (fcm)-based approach
for computerized segmentation of breast lesions in dynamic contrast-enhanced mr
images. Academic Radiology, 13(1):63–72, 2006.
[12] A. M. Chiarelli, M. V. Prummel, D. Muradali, V. Majpruz, M. Horgan, J. C. Carroll,
A. Eisen, W. S. Meschino, R. S. Shumak, E. Warner, and L. Rabeneck. Effectiveness
of screening with annual magnetic resonance imaging and mammography: Results
of the initial screen from the ontario high risk breast screening program. Journal of
Clinical Oncology, 32(21):2224–2230, 2014.
[13] Chen-Pin Chou, John M. Lewin, Chia-Ling Chiang, Bao-Hui Hung, Tsung-Lung
Yang, Jer-Shyung Huang, Jia-Bin Liao, and Huay-Ben Pan. Clinical evaluation
of contrast-enhanced digital mammography and contrast enhanced tomosynthesis-
comparison to contrast-enhanced breast mri. European journal of radiology,
84(12):2501–8, 2015.
[14] Y. F. Cui, Y. Q. Tan, B. S. Zhao, L. Liberman, R. Parbhu, J. Kaplan,
M. Theodoulou, C. Hudis, and L. H. Schwartz. Malignant lesion segmentation
in contrast-enhanced breast mr images based on the marker-controlled watershed.
Medical Physics, 36(10):4359–4369, 2009.
[15] G. Ertas, O. Gulcur, and M. Tunaci. Improved lesion detection in mr mammography:
Three-dimensional segmentation, moving voxel sampling, and normalized maximum
intensity-time ratio entropy. Academic Radiology, 14(2):151–161, 2007.
[16] T. W. Freer and M. J. Ulissey. Screening mammography with computer-aided detec-
tion: Prospective study of 12,860 patients in a community breast center. Radiology,
220(3):781–786, 2001.
[17] C. Gallego-Ortiz and A.L. Martel. Improving the accuracy of computer-aided diag-
nosis for breast mr imaging by differentiating between mass and nonmass lesions.
Radiology, 0(0):150241, 2015. PMID: 26383229.
[18] A. Gubern-Merida, R. Marti, J. Melendez, J. L. Hauth, R. M. Mann, N. Karsse-
meijer, and B. Platel. Automated localization of breast cancer in dce-mri. Medical
Image Analysis, 20(1):265–274, 2015.
[19] Leichter Isaac, Lederman Richard, Buchbinder Shalom, Srour Yossi, Bamberger
Philippe, and Sperber Fanny. Computerized classification can reduce unnecessary
biopsies in bi-rads category 4a lesions. In Proceedings of the 8th International Con-
ference on Digital Mammography, IWDM’06, pages 76–83, Berlin, Heidelberg, 2006.
Springer-Verlag.
[20] Sanaz A. Jansen, Xiaobing Fan, Gregory S. Karczmar, Hiroyuki Abe, Robert A.
Schmidt, and Gillian M. Newstead. Differentiation between benign and malignant
breast lesions detected by bilateral dynamic contrast-enhanced mri: A sensitivity
and specificity study. Magnetic Resonance in Medicine, 59(4):747–754, 2008.
[21] J. Jayender, S. Chikarmane, F. A. Jolesz, and E. Gombos. Automatic segmentation
of invasive breast carcinomas from dynamic contrast-enhanced mri using time series
analysis. Journal of Magnetic Resonance Imaging, 40(2):467–475, 2014.
[22] K. M. Kelly and G. A. Richwald. Automated whole-breast ultrasound: Advancing
the performance of breast cancer screening. Seminars in Ultrasound Ct and Mri,
32(4):273–280, 2011.
[23] Xu Kelvin, Ba Jimmy, Kiros Ryan, Cho Kyunghyun, C. Courville Aaron, Salakhut-
dinov Ruslan, S. Zemel Richard, and Bengio Yoshua. Show, attend and tell: Neural
image caption generation with visual attention. CoRR, abs/1502.03044, 2015.
[24] L. A. L. Khoo, P. Taylor, and R. M. Given-Wilson. Computer-aided detection in the
united kingdom national breast screening programme: Prospective study. Radiology,
237(2):444–449, 2005.
[25] M. V. Knopp, E. Weiss, H. P. Sinn, J. Mattern, H. Junkermann, J. Radeleff, A. Ma-
gener, G. Brix, S. Delorme, I. Zuna, and G. van Kaick. Pathophysiologic basis
of contrast enhancement in breast tumors. Jmri-Journal of Magnetic Resonance
Imaging, 10(3):260–266, 1999.
[26] T. M. Kolb, J. Lichy, and J. H. Newhouse. Comparison of the performance of
screening mammography, physical examination, and breast us and evaluation of
factors that influence them: An analysis of 27,825 patient evaluations. Radiology,
225(1):165–175, 2002.
[27] C. Kuhl. The current status of breast mr imaging - part i. choice of technique, im-
age interpretation, diagnostic accuracy, and transfer to clinical practice. Radiology,
244(2):356–378, 2007.
[28] M. A. Lacquement, D. Mitchell, and A. B. Hollingsworth. Positive predictive value
of the breast imaging reporting and data system. Journal of the American College
of Surgeons, 189(1):34–40, 1999.
[29] P. Lambin, E. Rios-Velazquez, R. Leijenaar, S. Carvalho, Rgpm van Stiphout,
P. Granton, C. M. L. Zegers, R. Gillies, R. Boellard, A. Dekker, Hjwl Aerts, and I. C.
ConCePT Consortium Qu. Radiomics: Extracting more information from medical
images using advanced feature analysis. European Journal of Cancer, 48(4):441–446,
2012.
[30] Q. V. Le and Ieee. Building high-level features using large scale unsupervised learn-
ing. In IEEE International Conference on Acoustics, Speech, and Signal Process-
ing (ICASSP), International Conference on Acoustics Speech and Signal Processing
ICASSP, pages 8595–8598, NEW YORK, 2013. Ieee.
[31] J. E. D. Levman, E. Warner, P. Causer, and A. L. Martel. A vector machine
formulation with application to the computer-aided diagnosis of breast cancer from
dce-mri screening examinations. Journal of Digital Imaging, 27(1):145–151, 2014.
[32] R. Lucht, S. Delorme, and G. Brix. Neural network-based segmentation of dynamic
mr mammographic images. Magnetic Resonance Imaging, 20(2):147–154, 2002.
[33] S. Marrone, G. Piantadosi, R. Fusco, A. Petrillo, M. Sansone, and C. Sansone.
Automatic lesion detection in breast dce-mri. Image Analysis and Processing (Iciap
2013), Pt Ii, 8157:359–368, 2013.
[34] A. L. Martel, M. S. Froh, K. K. Brock, D. B. Plewes, and D. C. Barber. Evaluating an
optical-flow-based registration algorithm for contrast-enhanced magnetic resonance
imaging of the breast. Physics in Medicine and Biology, 52(13):3803–3816, 2007.
[35] Anne L. Martel, Cristina Gallego-Ortiz, and YingLi Lu. Breast segmentation in mri
using poisson surface reconstruction initialized with random forest edge detection,
2016.
[36] L. A. Meinel, A. H. Stolpen, K. S. Berbaum, L. L. Fajardo, and J. M. Reinhardt.
Breast mri lesion classification: Improved performance of human readers with a
backpropagation neural network computer-aided diagnosis (cad) system. Journal of
Magnetic Resonance Imaging, 25(1):89–95, 2007.
[37] Dustin Newell, Ke Nie, Jeon-Hor Chen, Chieh-Chih Hsu, Hon J. Yu, Orhan Nal-
cioglu, and Min-Ying Su. Selection of diagnostic features on breast mri to dif-
ferentiate between malignant and benign lesions using computer-aided diagnosis:
differences in lesions presenting as mass and non-mass-like enhancement. European
Radiology, 20(4):771–781, 2010.
[38] Cancer Care Ontario. Ontario breast screening program 2011 report. Technical
report, Government of Canada, 2011.
[39] Y. C. Pang, L. Li, W. Y. Hu, Y. X. Peng, L. Z. Liu, and Y. Z. Shao. Computerized
segmentation and characterization of breast lesions in dynamic contrast-enhanced
mr images using fuzzy c-means clustering and snake algorithm. Computational and
Mathematical Methods in Medicine, 2012.
[40] V. Pascal, H. Larochelle, Y. Bengio, and M. Pierre-Antoine. Extracting and compos-
ing robust features with denoising autoencoders. In Proceedings of the Twenty-fifth
International Conference on Machine Learning (ICML’08), International Conference
on Machine Learning (ICML), pages 1096–1103, NEW YORK, 2008. ACM.
[41] D. M. Renz, J. Bottcher, F. Diekmann, A. Poellinger, M. H. Maurer, A. Pfeil,
F. Streitparth, F. Collettini, U. Bick, B. Hamm, and E. M. Fallenberg. Detec-
tion and classification of contrast-enhancing masses by a fully automatic computer-
assisted diagnosis system for breast mri. Journal of Magnetic Resonance Imaging,
35(5):1077–1088, 2012.
[42] R. Rouhi, M. Jafari, S. Kasaei, and P. Keshavarzian. Benign and malignant breast
tumors classification based on region growing and cnn segmentation. Expert Systems
with Applications, 42(3):990–1002, 2015.
[43] S. Shapiro, W. Venet, P. Strax, L. Venet, and R. Roeser. 10-year to 14-year effect
of screening on breast-cancer mortality. Journal of the National Cancer Institute,
69(2):349–355, 1982.
[44] Canadian Cancer Society, Statistics Canada, Public Health Agency of Canada, and
Provincial/Territorial Cancer Registries cancer.ca/statistics. Canadian cancer statis-
tics. Technical report, Government of Canada, 2015.
[45] M. X. Tan, J. T. Pu, and B. Zheng. Optimization of breast mass classification
using sequential forward floating selection (sffs) and a support vector machine
(svm) model. International Journal of Computer Assisted Radiology and Surgery,
9(6):1005–1020, 2014.
[46] Mitsuhiro TOZAKI, Takao IGARASHI, and Kunihiko FUKUDA. Positive and neg-
ative predictive values of bi-rads-mri descriptors for focal breast masses. Magnetic
Resonance in Medical Sciences, 5(1):7–15, 2006.
[47] B. van Ginneken, C. M. Schaefer-Prokop, and M. Prokop. Computer-aided diagnosis:
How to move from the laboratory to the clinic. Radiology, 261(3):719–732, 2011.
[48] A. van Opbroek, M. A. Ikram, M. W. Vernooij, and M. de Bruijne. Transfer learning
improves supervised image segmentation across imaging protocols. Ieee Transactions
on Medical Imaging, 34(5):1018–1030, 2015.
[49] A. Vignati, V. Giannini, M. De Luca, L. Morra, D. Persano, L. A. Carbonaro,
I. Bertotto, L. Martincich, D. Regge, A. Bert, and F. Sardanelli. Performance of
a fully automatic lesion detection system for breast dce-mri. Journal of Magnetic
Resonance Imaging, 34(6):1341–1351, 2011.
[50] T. C. Wang, Y. H. Huang, C. S. Huang, J. H. Chen, G. Y. Huang, Y. C. Chang, and
R. F. Chang. Computer-aided diagnosis of breast dce-mri using pharmacokinetic
model and 3-d morphology analysis. Magnetic Resonance Imaging, 32(3):197–205,
2014.
[51] E. Warner, K. Hill, P. Causer, D. Plewes, R. Jong, M. Yaffe, W. D. Foulkes,
P. Ghadirian, H. Lynch, F. Couch, J. Wong, F. Wright, P. Sun, and S. A. Narod.
Prospective study of breast cancer incidence in women with a brca1 or brca2 mu-
tation under surveillance with and without magnetic resonance imaging. Journal of
Clinical Oncology, 29(13):1664–1669, 2011.
[52] Wikipedia. Otsu’s method — Wikipedia, the free encyclopedia, 2015.
[53] Wikipedia. Watershed (image processing) — Wikipedia, the free encyclopedia, 2015.