
    MEDICAL IMAGE SEGMENTATION FOR EMBRYO IMAGE ANALYSIS

    A THESIS SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI‘I IN PARTIAL FULFILLMENT OF THE

    REQUIREMENTS FOR THE DEGREE OF

    MASTER OF SCIENCE

    IN

    ELECTRICAL ENGINEERING

    MAY 2020

    By Md Yousuf Harun

    Thesis Committee:

    Dr. Aaron Ohta, Chairperson
    Dr. Il Yong Chun, Chairperson
    Dr. Victor Lubecke

    © Copyright 2020 by Md Yousuf Harun
    All Rights Reserved

    To the sundial in the center of the courtyard

    Acknowledgements

    I would like to express my gratitude to a number of individuals who have been instrumental

    to my education over the past two years at the University of Hawai‘i at Mānoa.

    First of all, I would like to thank my advisors, Dr. Aaron Ohta and Dr. Il Yong Chun

    for their enormous help in my MS research.

    I am grateful to Dr. Aaron Ohta for involving me in the exciting interdisciplinary

    embryo image segmentation project that brings together doctors and engineers. He always

    supported me and gave me the freedom to pursue my research in directions that were of

    special interest to me. I have learned many invaluable things from him which will pave my

    future research endeavors. Working with him is a great opportunity and rewarding in every

    aspect of my academic life.

    I want to express my gratitude to Dr. Il Yong Chun for introducing me to the exciting

    field of computational medical imaging. He always motivated me and guided me to perform

    good research. I am thankful to him for helping me to improve my technical writing skills.

    I would like to thank Dr. Victor Lubecke for taking time to be a part of my thesis

    committee. I appreciate the input and support he has given in order to improve this thesis.

    I want to thank Dr. Thomas Huang for his guidance and collaboration in the embryo

    image segmentation project. No result in this thesis would have been possible without the

    fruitful cooperation between doctors and engineers.

    I am also thankful to M Arifur Rahman, Kareem Elassy, Mohsen Paryavi,

    Meenakshi Vohra, Richie Chio and many laboratory colleagues I have had for their

    support and guidance. The discussions I have had with them were instrumental to my

    research and this manuscript. Special thanks to Arif for his help and suggestions in all aspects of graduate life.

    Most of all, I would like to thank my parents, family and friends for their continued support

    throughout my life. Their encouragement has driven me to do my best as both a student

    and a person. I dedicate this thesis to them.

    Abstract

    This thesis describes a project that applies electrical engineering to biomedical applications.

    The project involves the development of a deep learning-based image segmentation method

    to identify cellular regions in microscopic images of human embryos for their morphological

    and morphokinetic analysis during in vitro fertilization (IVF) treatment. First, we aim

    to segment inner cell mass (ICM) and trophectoderm epithelium (TE) in zona pellucida

    (ZP)-intact embryos imaged by a microscope for morphological analysis. ICM and TE

    segmentation in ZP-intact embryonic images is difficult due to the small number of training

    images (211 ZP-intact embryonic images) and similar textures among ICM, TE, ZP, and

    artifacts. We overcame the aforementioned challenges by leveraging deep learning and

    semantic segmentation techniques. In this work, we implemented a UNet variant model

    named Residual Dilated UNet (RD-UNet) to segment ICM and TE in ZP-intact embryonic

    images. We added residual convolution to the encoding and decoding units and replaced

    the conventional convolutional layers with multiple dilated convolutional layers at the central

    bridge of RD-UNet. The experimental results with a testing set of 38 ZP-intact embryonic

    images demonstrate that RD-UNet outperforms existing models. RD-UNet can identify

    ICM with a Dice Coefficient of 94.3% and a Jaccard Index of 89.3%. The model can

    segment TE with a Dice Coefficient of 92.5% and a Jaccard Index of 85.3%.

    Second, we aim to segment inner cell regions in ZP-ablated embryonic images obtained

    by time-lapse microscopic imaging for morphokinetic analysis. Segmenting inner cell

    regions in ZP-ablated embryonic images has the following challenges: irregular expansion of

    inner cell, surrounding fragmented cellular clusters and artifacts, and inner cell expansion

    beyond the culture well. We proposed a UNet-based architecture named Deep Dilated Residual

    Recurrent UNet (D2R2-UNet) to segment inner cell regions in ZP-ablated embryonic

    images. We incorporated residual recurrent convolution into the encoding and decoding

    units, dilated convolution into the central bridge, and residual convolution into the

    encoder-decoder skip-connections in order to maximize the segmentation performance. The

    experimental results with a testing set of 342 ZP-ablated embryonic images demonstrate

    that the proposed D2R2-UNet improves inner cell segmentation performance over existing

    UNet variants. Our model obtains the best overall performance as compared to other

    models in inner cell segmentation, with a Jaccard Index of 95.65% and a Dice Coefficient

    of 97.78%.

    Table of Contents

    Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

    Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

    List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

    List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

    Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.1 Motivation of Embryo Image Segmentation . . . . . . . . . . . . . . . . . . 1

    1.2 Quantitative Evaluation of Embryo Viability . . . . . . . . . . . . . . . . . 3

    1.3 ICM and TE Segmentation Challenges in Morphological Analysis . . . . . . 4

    1.4 Inner Cell Segmentation Challenges in Morphokinetic Analysis . . . . . . . 5

    1.5 Semantic Segmentation with Deep Learning . . . . . . . . . . . . . . . . . . 6

    1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    Chapter 2: Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.1 Baseline UNet Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    2.2 Proposed D2R2-UNet Architecture . . . . . . . . . . . . . . . . . . . . . . . 13

    2.2.1 Residual Convolutional Unit . . . . . . . . . . . . . . . . . . . . . . 14

    2.2.2 Recurrent Convolutional Unit . . . . . . . . . . . . . . . . . . . . . . 14

    2.2.3 Residual Recurrent (R2) Convolutional Unit . . . . . . . . . . . . . 15

    2.2.4 Dilated Convolution in the Central Bridge . . . . . . . . . . . . . . . 16

    2.2.5 Residual Convolutional Skip-Connections between Encoder and

    Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.3 RD-UNet Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    2.4 Network Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    2.5 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    2.6 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.7 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    2.7.1 Dataset for ICM and TE Segmentation . . . . . . . . . . . . . . . . 21

    2.7.2 Dataset for Inner Cell Segmentation . . . . . . . . . . . . . . . . . . 21

    Chapter 3: Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    3.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    3.2 ICM and TE Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . 28

    3.2.1 Quantitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    3.2.2 Qualitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    3.3 Inner Cell Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3.1 Quantitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

    3.3.2 Qualitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    Chapter 4: Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    List of Tables

    3.1 Comparison of ICM results of our method with that of existing methods

    based on same data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.2 Comparison of TE results of our method with that of existing methods based

    on same data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.3 Comparison among different UNet architectures based on their inner cell

    segmentation performance evaluated on same testing set . . . . . . . . . . . 34

    List of Figures

    1.1 (a) an image of an embryo and its (b) annotated regions. Here, ZP, ICM,

    CM, and TE denote zona pellucida, inner cell mass, cavity mass, and

    trophectoderm epithelium, respectively. . . . . . . . . . . . . . . . . . . . . 3

    1.2 Expansion kinetics of (a) a genetically normal embryo (b) a genetically

    abnormal embryo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.3 Examples of inner cell segmentation challenges in ZP-ablated embryo: (a)

    ZP-ablated embryo, (b) artifacts, (c) inner cell beyond culture well. . . . . 5

    1.4 Semantic segmentation: (a) an image of street view and its (b) pixel

    annotated segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    2.1 The baseline UNet architecture [2].The height and width of each box

    represents the image size and number of channels, respectively. The dotted

    boxes denote copied feature maps. . . . . . . . . . . . . . . . . . . . . . . . 13

    2.2 Different convolutional units we compared in UNet. (a) A convolutional

    unit in the baseline UNet [2]. (b) A residual convolutional unit [4]. (c) A

    recurrent convolutional unit [5]. (d) A R2 convolutional unit [11]. (e) A

    recurrent convolutional layer [5] with the number of evolution steps S = 3.

    For all UNet variations, we use ELU [10] instead of RELU [6] since ELU

    slightly improved the embryo image segmentation performance. . . . . . . . 15

    2.3 A residual convolutional encoder-decoder skip-connection consisting of four

    residual convolutional layers, each of which applies 3×3 convolution followed

    by ELU activation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    2.4 The D2R2-UNet architecture for inner cell segmentation: we modify

    the UNet backbone in Fig. 2.1 by using R2 convolutional units, dilated

    convolutional layers, and residual convolutional encoder-decoder skip-

    connections. The height and width of each box represents the image size and

    number of channels, respectively. The black and blue dotted boxes denote

    central bridge and copied feature maps, respectively. . . . . . . . . . . . . . 17

    2.5 The RD-UNet architecture for ICM and TE segmentation [1]: we modify the

    UNet backbone in Fig. 2.1 by using residual convolutional units and dilated

    convolutional layers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    3.1 ICM segmentation results by RD-UNet. The background (non ICM) is

    colored dark cyan, the annotated ground truth ICM is light green, the

    network predicted ICM is yellow, and the contour of the ground truth ICM

    is red. JI and DC stand for Jaccard Index and Dice Coefficient, respectively. 31

    3.2 TE segmentation results by RD-UNet. The background (non TE) is colored

    dark cyan, the annotated ground truth TE is light green, the network

    predicted TE is yellow, and the contour of the ground truth TE is red. JI

    and DC stand for Jaccard Index and Dice Coefficient, respectively. . . . . . 32

    3.3 Comparisons of the joint loss (Equation 2.4) between different UNet variant

    models for inner cell segmentation: (a) training loss and (b) testing loss. . . 34

    3.4 Segmentation results. Light green in 2nd and 5th rows indicates segmented

    inner cell by D2R2-UNet. Red and blue in 3rd and 6th rows indicate the

    boundaries of ground truth and predicted inner cell, respectively. JI and DC

    stand for Jaccard index and Dice coefficient, respectively. . . . . . . . . . . 36

    Chapter 1

    Introduction

    1.1 Motivation of Embryo Image Segmentation

    According to the Centers for Disease Control and Prevention, almost six million women in the

    United States suffer from infertility [1]. The World Health Organization reports the total

    number of patients worldwide suffering from infertility as almost fifty million [1]. The most

    effective treatment for infertility is in vitro fertilization (IVF) and IVF is performed more

    than one million times annually around the world [2]. However, IVF suffers from relatively

    low birth rates, i.e., less than 30% in the US from 1995 to 2016 [1]. One of the reasons for such low birth rates is the misidentification of embryo viability. During the IVF process, the

    fertilized eggs (embryos) are cultured in controlled environmental conditions and imaged

    digitally using microscopes or embryoscopes. When the embryos reach their blastocyst stage

    (at least 32 cells on the fifth day of culture), the healthiest embryo is selected for implantation.

    Morphology assessment is a standard approach for embryo grading in IVF. Several

    studies have been conducted to find the most important feature of embryo morphology

    [3, 4, 5]. The studies suggest that the morphological features such as inner cell mass

    (ICM), trophectoderm epithelium (TE), and degree of blastocoel cavity expansion relative

    to the zona pellucida (ZP) are effective measures to determine embryo viability. ICM

    eventually develops into a fetus which contains major body organs [4]. Successful

    hatching of an implanted embryo, i.e., live birth, correlates highly with a strong TE layer

    [3]. Therefore, identification of the ICM and TE regions is important to evaluate embryo

    implantation potential. In addition, [6] reports that the morphokinetics of an embryo

    highly correlates with its genetic quality, i.e., euploid or aneuploid. Here, an embryo with

    higher expansion rate has higher reproductive potential. A related study demonstrates that

    euploid (genetically normal) embryos expand more rapidly than the aneuploid embryos

    (genetically abnormal) [7].

    The identification of inner cell expansion is crucial for morphokinetic analysis of embryo

    towards genetic quality assessment in IVF. Traditionally, embryologists determine embryo

    viability by manually evaluating the morphological features of embryos based on visual

    inspection. This subjective and qualitative approach is prone to human bias and does not

    consider the genetic quality of an embryo. In addition, it poses high risk of misidentification

    of embryo viability, abnormal pregnancies, and health risks; it is a time-consuming task

    for embryologists to manually analyze the embryo morphology. This becomes labor and

    resource inefficient. To increase the chance of successful pregnancy, multiple embryos

    are transferred to the mother’s uterus, which oftentimes results in multiple pregnancies with

    associated health complications. Thus, identification of the single embryo with the highest

    potential for a live birth is critical to achieve sustained pregnancies and minimize health risks.

    Although preimplantation genetic screening (PGS) provides a good evaluation of embryo

    genetics [8, 9], such genetic testing remains very expensive.

    All the aforementioned issues necessitate a cost effective, automated, quantitative

    method for gauging embryo health. In this study, we developed a deep learning based

    segmentation method to precisely identify 1) ICM and TE regions in ZP-intact embryo

    images, and 2) inner cell in ZP-ablated embryo images. We use deep neural networks

    to recognize both local (texture) and contextual (spatial arrangement) representations of

    different embryo regions and segment them in the noisy images.


    Figure 1.1 (a) An image of an embryo and (b) its annotated regions. Here, ZP, ICM, CM, and TE denote zona pellucida, inner cell mass, cavity mass, and trophectoderm epithelium, respectively.

    1.2 Quantitative Evaluation of Embryo Viability

    There are two main approaches for quantitative evaluation of embryo viability:

    1) morphological analysis: this is based on morphological attributes of embryo cellular

    regions such as inner cell mass and trophectoderm epithelium, as illustrated in Fig. 1.1.

    The size of these regions is a good indicator of embryo viability. Biological studies [3, 4, 5]

    suggest that embryo morphology correlates with its health.

    2) morphokinetic analysis: this is based on the morphokinetics of an embryo, i.e., how rapidly

    the embryo grows in incubation. Studies [6, 7] suggest that morphokinetics of an embryo

    highly associates with its genetic information. Genetically normal embryos expand more

    rapidly than genetically abnormal embryos. Fig. 1.2 shows that a genetically normal embryo has a higher expansion rate (steep slope) as opposed to that of a genetically abnormal embryo

    (flat or negative slope). In the morphokinetic study, embryologists ablate the zona-pellucida

    (ZP) of an embryo to let the inner cell expand beyond the ZP region. Then they apply time-lapse microscopic imaging using embryoscopes to capture images of the ZP-ablated embryo during a ten-hour observation period. Finally, they estimate the total area of the inner cell at

    different time points and measure the expansion rate.


    Figure 1.2 Expansion kinetics of (a) a genetically normal embryo and (b) a genetically abnormal embryo.
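    As a concrete illustration of this measurement, the short sketch below (ours, not part of the thesis pipeline) fits a line to hypothetical inner-cell area measurements and reports its slope as the expansion rate; the time points and area values are made up for illustration only.

```python
# Illustrative sketch (ours): the expansion rate as the slope of a least-squares
# line fitted to inner-cell area measurements over the observation window.
import numpy as np

hours = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])                    # hypothetical time points
area_px = np.array([41e3, 44.5e3, 49.8e3, 55.4e3, 60.1e3, 66.2e3])   # hypothetical areas (pixels)

slope, intercept = np.polyfit(hours, area_px, deg=1)   # least-squares line fit
print(f"expansion rate ~ {slope:.0f} pixels/hour")     # steeper slope -> faster expansion
```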

    1.3 ICM and TE Segmentation Challenges in Morphological

    Analysis

    ICM and TE analysis plays crucial roles in determining embryo viability for healthy

    pregnancies in IVF. At blastocyst stage, an embryo consists of three inner regions: 1)

    inner cell mass (ICM), 2) trophectoderm epithelium (TE), and 3) cavity mass (CM). These

    inner regions are enclosed by an outer layer named zona pellucida (ZP). For convenience,

    we refer to this embryo as a ZP-intact embryo. Fig. 1.1 illustrates a ZP-intact embryo and its

    annotated inner (ICM, CM, TE) and outer (ZP) regions.

    The ICM and TE have similar pixel intensity values, i.e., in general, it is hard to

    distinguish them. They are also surrounded by two other embryo regions such as zona

    pellucida (ZP) and cavity mass (CM), whose pixel intensity values are similar to those of the ICM and TE. In addition, undesirable fragments and artifacts exist near the ICM and

    TE regions. The similar pixel intensity values of surrounding CM, ZP, artifacts, fragments,

    and image contrast variations make it challenging to differentiate between ICM and TE

    regions and precisely segment them. The number of training images in the dataset is also

    small (211 images); this poses an additional challenge in the ICM and TE segmentation,

    such as less diversity in training data and overfitting to training data.


    Figure 1.3 Examples of inner cell segmentation challenges in ZP-ablated embryo: (a) ZP-ablated embryo, (b) artifacts, (c) inner cell beyond culture well.

    1.4 Inner Cell Segmentation Challenges in Morphokinetic

    Analysis

    The inner cell segmentation is critical for the morphokinetic study using an embryoscope

    [6]. The inner cell expansion rate is measured over a ten-hour observation period using

    time-lapse microscopic imaging. In these embryos, the embryologists ablate the ZP to

    perform preimplantation genetic screening. The goal of this project is to segment inner cell

    to facilitate the measurement of the morphokinetics of an embryo, i.e., how rapidly the inner cell expands, by estimating its total area. To estimate the total area of an embryo, [6]

    segmented objects with circular shapes using the embryoscope software tool.

    There exist significant challenges in this segmentation method, because a) inner cells

    expand with irregular rates, b) some artifacts and fragmented cellular clusters can exist

    close to inner cell outlines, and c) expanded inner cell can have white bands and/or dark

    background due to their expansion beyond the culture well. Fig. 1.3 shows some examples of

    such challenges.


    Figure 1.4 Semantic segmentation: (a) an image of a street view and (b) its pixel-annotated segmentation.

    1.5 Semantic Segmentation with Deep Learning

    Semantic segmentation is a high-level task that facilitates the complete scene understanding.

    The semantic segmentation techniques are applied to a wide range of images/videos,

    including still two-dimensional images, three-dimensional or volumetric images, and videos;

    the techniques are used in various applications including autonomous driving [10], human-

    machine interaction [11], computational photography [12], and image search engines [13].

    Semantic segmentation relates to the pixel- or voxel-wise image classification task, where

    each pixel or voxel is labeled according to the classes present in a two-dimensional or three-

    dimensional image; see an example in Fig. 1.4.

    Semantic segmentation has been addressed in the past using various computer vision

    and machine learning techniques such as active contour/snake models, clustering algorithms, watershed algorithms, graph-based region merging, random walks, and Markov random fields

    [14]. Recent advancements in deep learning have shown potential to solve challenging

    image segmentation problems [15]. The most popular convolutional neural network (CNN)

    model is UNet, which shows strong performance in medical image segmentation tasks

    [16]. The UNet architecture has been modified for medical image segmentation tasks in

    various medical applications such as retinal blood vessel segmentation [17], liver and tumor

    segmentation [18], skin lesion segmentation [19], and surgical instrument segmentation [20].

    To perform semantic segmentation, the CNNs learn representative features of an image

    and convert them into a pixel-wise categorization. In general, semantic segmentation CNN

    models consist of an encoding network and a decoding network. The encoder converts an

    input image into a set of representative feature maps. The role of the decoder is to convert

    the encoded features often in lower spatial resolution into the original high-resolution pixel

    space and generate a pixel-wise classification map.

    1.6 Outline

    This thesis contributes to the embryo image segmentation for IVF treatment. In the

    following section, an outline of the thesis is provided.

    Chapter 2 describes the methodology, proposed or implemented neural network

    architecture, network specification, loss function, implementation details, and dataset.

    Chapter 3 describes the evaluation metrics, performance comparison, results, and

    discussions.

    Chapter 4 summarizes the performance of the developed methods and the contributions of our work to related applications. The chapter also discusses future research directions.

    References

    [1] N. Gleicher, V. Kushnir, and D. Barad, “Worldwide decline of IVF birth rates and its

    probable causes,” Human Reproduction Open, vol. 2019, no. 3, p. hoz017, 2019.

    [2] E. Santos Filho, J. Noble, and D. Wells, “A review on automatic analysis of human

    embryo microscope images,” The open biomedical engineering journal, vol. 4, p. 170,

    2010.

    [3] A. Ahlström, C. Westin, E. Reismer, M. Wikland, and T. Hardarson, “Trophectoderm

    morphology: an important parameter for predicting live birth after single blastocyst

    transfer,” Human Reproduction, vol. 26, no. 12, pp. 3289–3296, 2011.

    [4] C. Lagalla, M. Barberi, G. Orlando, R. Sciajno, M. A. Bonu, and A. Borini, “A

    quantitative approach to blastocyst quality evaluation: morphometric analysis and

    related IVF outcomes,” Journal of Assisted Reproduction and Genetics, vol. 32, no. 5,

    pp. 705–712, 2015.

    [5] W. B. Schoolcraft, D. K. Gardner, M. Lane, T. Schlenker, F. Hamilton, and D. R.

    Meldrum, “Blastocyst culture and transfer: analysis of results and parameters affecting

    outcome in two in vitro fertilization programs,” Fertility and Sterility, vol. 72, no. 4,

    pp. 604–609, 1999.

    [6] T. T. Huang, D. H. Huang, H. J. Ahn, C. Arnett, and C. T. Huang, “Early blastocyst

    expansion in euploid and aneuploid human embryos: evidence for a non-invasive and

    quantitative marker for embryo selection,” Reproductive Biomedicine Online, vol. 39,

    no. 1, pp. 27–39, 2019.

    [7] T. T. Huang, B. C. Walker, M. Harun, A. T. Ohta, M. Rahman, J. Mellinger,

    and W. Chang, “Automated computer analysis of human blastocyst expansion from

    embryoscope time-lapse image files,” Fertility and Sterility, vol. 112, no. 3, pp. e292–

    e293, 2019.

    [8] R. T. Scott Jr, K. Ferry, J. Su, X. Tao, K. Scott, and N. R. Treff, “Comprehensive

    chromosome screening is highly predictive of the reproductive potential of human

    embryos: a prospective, blinded, nonselection study,” Fertility and Sterility, vol. 97,

    no. 4, pp. 870–875, 2012.

    [9] M. D. Werner, M. P. Leondires, W. B. Schoolcraft, B. T. Miller, A. B. Copperman,

    E. D. Robins, F. Arredondo, T. N. Hickman, J. Gutmann, W. J. Schillings et al.,

    “Clinically recognizable error rate after the transfer of comprehensive chromosomal

    screened euploid embryos is low,” Fertility and Sterility, vol. 102, no. 6, pp. 1613–1618,

    2014.

    [10] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson,

    U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene

    understanding,” in Proceedings of the IEEE conference on computer vision and pattern

    recognition, 2016, pp. 3213–3223.

    [11] M. Oberweger, P. Wohlhart, and V. Lepetit, “Hands deep in deep learning for hand

    pose estimation,” arXiv preprint arXiv:1502.06807, 2015.

    [12] Y. Yoon, H.-G. Jeon, D. Yoo, J.-Y. Lee, and I. So Kweon, “Learning a deep

    convolutional network for light-field image super-resolution,” in Proceedings of the

    IEEE international conference on computer vision workshops, 2015, pp. 24–32.

    [13] J. Wan, D. Wang, S. C. H. Hoi, P. Wu, J. Zhu, Y. Zhang, and J. Li, “Deep learning

    for content-based image retrieval: A comprehensive study,” in Proceedings of the 22nd

    ACM international conference on Multimedia, 2014, pp. 157–166.

    [14] H. Zhu, F. Meng, J. Cai, and S. Lu, “Beyond pixels: A comprehensive survey from

    bottom-up to semantic image segmentation and cosegmentation,” Journal of Visual

    Communication and Image Representation, vol. 34, pp. 12–27, 2016.

    [15] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A.

    Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in

    medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.

    [16] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for

    biomedical image segmentation,” in International Conference on Medical Image

    Computing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–

    241.

    [17] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent

    residual convolutional neural network based on U-Net (R2U-Net) for medical image

    segmentation,” arXiv preprint arXiv:1802.06955, 2018.

    [18] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-denseunet: hybrid

    densely connected unet for liver and tumor segmentation from ct volumes,” IEEE

    transactions on medical imaging, vol. 37, no. 12, pp. 2663–2674, 2018.

    [19] N. Ibtehaz and M. S. Rahman, “Multiresunet: Rethinking the U-Net architecture for

    multimodal biomedical image segmentation,” Neural Networks, vol. 121, pp. 74–87,

    2020.

    [20] Z.-L. Ni, G.-B. Bian, X.-H. Zhou, Z.-G. Hou, X.-L. Xie, C. Wang, Y.-J. Zhou, R.-Q.

    Li, and Z. Li, “Raunet: Residual attention u-net for semantic segmentation of cataract

    surgical instruments,” in International Conference on Neural Information Processing.

    Springer, 2019, pp. 139–149.

    Chapter 2

    Methodology

    We implement the RD-UNet model [1] for segmenting ICM and TE regions in ZP-intact embryo images obtained by a microscope. The model is based on the baseline UNet architecture

    [2], residual convolutional units, and dilated convolutional layers. We will discuss each of

    these components in the following sections.

    For segmenting inner cell regions in ZP-ablated embryos, we propose a UNet based

    model. Here, we use embryonic images obtained by time-lapse imaging. However, we adopt

    a static image segmentation approach. There are two reasons for this choice: 1) the inner cell varies dramatically across consecutive time frames, with nonperiodic time points across frames or videos (in general, 3 frames/hour); 2) the collected video dataset is relatively small – it consists of 45 videos with 30 or 31 frames each.

    Inspired by successful applications of UNet to medical image segmentation [3], we

    developed an improved convolutional neural network (CNN) architecture, called Deep

    Dilated Residual Recurrent UNet (D2R2-UNet) for ZP-ablated embryo image segmentation.

    Similar to the original UNet architecture [2], the proposed architecture, D2R2-UNet,

    consists of an encoder and a decoder, whose last encoding and first decoding units

    are connected by a central bridge. Inspired by deep residual model [4] and recurrent CNN

    [5], we made the following three modifications to the baseline UNet architecture:

    1) We replaced the convolutional units of the baseline UNet with residual convolutional

    units using two recurrent convolutional layers, called R2 convolutional units, in both the

    encoder and decoder.

    2) In the central bridge, we replaced the convolutional layers with dilated convolutional

    layers.

    3) We incorporated a series of residual convolutional layers into the baseline UNet

    encoder-decoder skip-connections.

    We will discuss the details of those modifications in the following sections.

    2.1 Baseline UNet Architecture

    To better understand our modifications, we first briefly review the baseline UNet

    architecture. The baseline UNet is composed of two symmetrical contracting (encoding)

    and expansive (decoding) units that are connected to each other via encoder-decoder skip-

    connections. The contracting units capture the context, whereas the expanding units enable

    localization. The contracting units encode the input image into a set of feature maps using

    convolutional layers with no skip connections. The expansive units decode the compact

    feature maps into pixel-wise representation, i.e., semantic segmentation. This encoding-

    decoding architecture is useful to perform the semantic segmentation task. The encoder and

    decoder are built on the conventional CNN architecture, and consist of four down-sampling

    and up-sampling convolutional units, respectively. Each down-sampling convolutional unit

    involves a sequence of two convolutional layers with the 3 × 3 kernel size, followed by a

    rectified linear unit (RELU) activation [6], and a max-pooling with the 2 × 2 window size

    and the stride parameter 2. Fig. 2.2(a) demonstrates each convolutional unit in the baseline

    UNet. The number of feature channels is doubled after down-sampling at each

    encoder block. At the decoder side, each up-sampling convolutional unit involves a sequence

    of two convolutional layers with the flipped 3×3 kernels of the encoder convolutional units,

    followed by a RELU activation and upsampling with the 2×2 window size and the stride 2.

    The feature channels are reduced by half at each up-sampling step. Then a concatenation,

    Figure 2.1 The baseline UNet architecture [2]. The height and width of each box represent the image size and number of channels, respectively. The dotted boxes denote copied feature maps.

    i.e., skip connection, is established between down-sampled and upsampled features. In

    the final layer in the decoder, a sigmoid activation is performed to generate class-wise

    probabilities for each pixel. Both encoder and decoder consist of four convolutional units.

    See the baseline UNet architecture in Fig. 2.1.
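    As a minimal illustration of the encoding units just described, the Keras sketch below builds one down-sampling unit (two 3 × 3 convolutions with RELU followed by 2 × 2 max pooling). It is a simplified sketch, not the thesis code: the function and variable names are ours, "same" padding is assumed for brevity, and the 256 × 256 input size is only an example.

```python
# A simplified Keras sketch (ours) of one baseline-UNet down-sampling unit:
# two 3x3 convolutions with RELU, then 2x2 max pooling with stride 2.
from tensorflow.keras import layers

def downsampling_unit(x, channels):
    x = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(channels, 3, padding="same", activation="relu")(x)
    skip = x                                        # feature map copied to the decoder
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    return x, skip

# Example: first encoder unit with 16 channels on a single-channel input image.
inputs = layers.Input(shape=(256, 256, 1))
x, skip1 = downsampling_unit(inputs, 16)
```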

    2.2 Proposed D2R2-UNet Architecture

    To better capture the context particularly related to small structures, i.e., to improve

    context modulation [5], we use residual recurrent (R2) convolutional units, instead of

    typical convolutional units in the baseline UNet. The R2 convolutional unit combines the strengths of residual and recurrent learning, which helps the CNN improve segmentation

    performance. To facilitate context extraction from high-level features at the central bridge

    without increasing CNN parameter dimensions, we use dilated convolutional layers [7] in

    the central bridge, rather than regular convolutional layers used in the baseline UNet. To

    reduce the semantic disparity between low-level and high-level features [8] and better recover information lost during pooling operations, we incorporate a series of residual convolutional layers

    into the baseline UNet encoder-decoder skip-connections. We describe these modifications

    in detail in the following subsections.

    2.2.1 Residual Convolutional Unit

    Skip connections [4] are incorporated to each convolutional unit of the baseline UNet

    [2], based on the empirical results in [9] that having skip connections produces a benign optimization landscape in training. We hypothesize that residual convolutional units

    improve the training/testing performances by considering that the baseline UNet is

    sufficiently deep (23 convolutional layers; further modifications in the following subsections

    lead to 36 convolutional layers). In a residual unit, there exists a residual skip connection

    between the input to the first convolutional layer and the output of the second convolutional layer.

    This residual skip connection is implemented by a 1× 1 convolution. Fig. 2.2(b) depicts a

    residual convolutional unit.
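    A hedged Keras sketch of such a residual convolutional unit is given below; the 1 × 1 convolution implements the residual skip connection, and ELU activation is used as in our UNet variants. Names and padding choices are ours, not taken from the thesis code.

```python
# A sketch (ours) of a residual convolutional unit as in Fig. 2.2(b): the input is
# projected by a 1x1 convolution and added to the output of two 3x3 + ELU convolutions.
from tensorflow.keras import layers

def residual_unit(x, channels):
    shortcut = layers.Conv2D(channels, 1, padding="same")(x)    # 1x1 residual skip
    y = layers.Conv2D(channels, 3, padding="same", activation="elu")(x)
    y = layers.Conv2D(channels, 3, padding="same", activation="elu")(y)
    return layers.add([shortcut, y])
```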

    2.2.2 Recurrent Convolutional Unit

    We replace conventional convolutional units in the baseline UNet with recurrent

    convolutional units that help the CNN better understand contexts, especially related to small objects, while avoiding increasing the number of CNN parameters [5]. At each step

    s ≥ 1, we add a recurrent feature and feed-forward feature, each computed by a shared

    convolutional kernel. Specifically, we use recurrent convolutional units with the number of

    evolution steps S = 3, where a first recurrent convolutional layer performs the following

    evolution steps:

    \[
    x_c^{(0)} = A(f_c \,⊛\, x), \qquad
    x_c^{(1)} = A(r_c \,⊛\, x_c^{(0)} + f_c \,⊛\, x), \qquad
    x_c^{(2)} = A(r_c \,⊛\, x_c^{(1)} + f_c \,⊛\, x)
    \tag{2.1}
    \]

    for c = 1, . . . , C, in which C is the number of channels. Here, A is some activation function,

    e.g., RELU [6], and ELU [10], the subscript index (·)c denotes the c-th convolutional channel,

    the superscript indices (·)(s) denote the step points, s = 0, . . . , S − 1, ⊛ denotes a convolution operator, and fc and rc are the feed-forward and recurrent convolutional kernels at the c-th channel,

    respectively, ∀c, and x denotes the input. The second recurrent convolutional layer does not

    expand the number of channels, i.e., it replaces x with the output from the first recurrent


    Figure 2.2 Different convolutional units we compared in UNet. (a) A convolutional unit in the baseline UNet [2]. (b) A residual convolutional unit [4]. (c) A recurrent convolutional unit [5]. (d) An R2 convolutional unit [11]. (e) A recurrent convolutional layer [5] with the number of evolution steps S = 3. For all UNet variations, we use ELU [10] instead of RELU [6] since ELU slightly improved the embryo image segmentation performance.

    convolutional layer, x(2)c , ∀c, in Equation 2.1. See graphical illustrations for a recurrent

    convolutional unit consisting of these two recurrent convolutional layers in Fig. 2.2(c), and

    a S = 3 recurrent convolutional layer in Fig. 2.2(e).

    Using the recurrent convolutional units increases the UNet depth, while avoiding

    increasing the UNet complexity by using shared convolutional kernels. We expect that

    this is useful to better understand the context while avoiding overfitting risks (using S = 4

    recurrent convolutional units improved the image recognition performances over CNN that

    has the same depth and number of parameters by simply increasing the depth of CNN [5]).

    We observed that in our application, using S = 3 recurrent convolutional units gives better

    overall image segmentation performance, compared to using S = 2 and S = 4 recurrent

    convolutional units.
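    The sketch below is one possible Keras realization of a recurrent convolutional layer following Equation 2.1 with S = 3 evolution steps: the feed-forward and recurrent kernels are shared across steps by reusing the same Conv2D layer objects. It is our illustrative interpretation, not the thesis implementation, and the function and variable names are ours.

```python
# A sketch (ours) of one recurrent convolutional layer following Equation 2.1 with S = 3:
# the feed-forward kernel f and recurrent kernel r are shared across evolution steps.
from tensorflow.keras import layers

def recurrent_conv_layer(x, channels, steps=3):
    feed_forward = layers.Conv2D(channels, 3, padding="same")   # shared kernel f
    recurrent = layers.Conv2D(channels, 3, padding="same")      # shared kernel r
    act = layers.Activation("elu")

    ff = feed_forward(x)          # f (*) x, computed once and reused at every step
    state = act(ff)               # x^(0)
    for _ in range(steps - 1):    # x^(1), x^(2)
        state = act(layers.add([recurrent(state), ff]))
    return state
```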

    2.2.3 Residual Recurrent (R2) Convolutional Unit

    To further improve the image segmentation performance, we fuse recurrent convolutional

    layers with residual connectivity and form R2 convolutional unit, similar to R2-UNet [11].

    Fig. 2.2(d) depicts a R2 convolutional unit. Different from R2-UNet [11] that uses four

    evolution steps S = 4, we use three S = 3 evolution steps (four evolution steps did

    not improve the performance). Fig. 2.2(e) shows the recurrent convolutional layer with S = 3 evolution steps used in the R2 unit. This formulation yields a more efficient CNN that improves the segmentation performance due to its better understanding of context over multiple evolution steps.
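    Building on the previous sketch, an R2 convolutional unit can be assembled by stacking two such recurrent layers and wrapping them with a 1 × 1 residual skip connection, as illustrated below; this is again an assumption-laden sketch that reuses recurrent_conv_layer() from the previous example, not the thesis code.

```python
# A sketch (ours) of an R2 convolutional unit: two recurrent convolutional layers
# (S = 3) from the previous sketch, wrapped by a 1x1 residual skip connection.
from tensorflow.keras import layers

def r2_unit(x, channels):
    shortcut = layers.Conv2D(channels, 1, padding="same")(x)    # residual projection
    y = recurrent_conv_layer(x, channels, steps=3)               # first recurrent layer
    y = recurrent_conv_layer(y, channels, steps=3)               # second recurrent layer
    return layers.add([shortcut, y])
```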

    2.2.4 Dilated Convolution in the Central Bridge

    The receptive field of a CNN plays a critical role in semantic image segmentation. A broader

    receptive field helps to extract information from a larger region of the image. Stacking more convolutional layers increases the receptive field size linearly with the kernel size, but increases the number of NN parameters [12]. Moreover, adding more down-sampling layers also expands the receptive field size multiplicatively, which comes at the price of spatial information loss [12]. Alternatively, dilated convolution provides exponential expansion of the receptive field with no increase in NN parameters and no loss of spatial information [7]. Unlike typical convolution

    with no space between kernel weights, dilated convolution inserts zero(s) between kernel

    weights depending on the dilation rate and expands the receptive field size accordingly. For example,

    a 3 × 3 kernel with dilation rate 2 increases the receptive field size from 3 × 3 to 7 × 7, while

    keeping the number of kernel parameters as 9. After several downsampling steps, we add

    multiple dilated convolution layers in the central bridge similar to [13], rather than stacking

    additional pooling layers and/or typical convolutional layers. Therefore, we can preserve

    spatial information in the central bridge and expand the receptive field of the baseline-

    UNet from 140 × 140 to 198 × 198. Thus, adding multiple dilated convolutional layers to

    the central bridge helps to expand the network’s receptive field, giving it larger access to the input.

    This helps the CNN better capture context and improve segmentation predictions.
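    A minimal sketch of such a dilated central bridge is shown below, stacking 3 × 3 convolutions with dilation rates 1, 2, 4, 8, and 16; the function name and padding choice are ours, not from the thesis code.

```python
# A sketch (ours) of the dilated central bridge: a stack of 3x3 convolutions with
# dilation rates 1, 2, 4, 8, and 16, widening the receptive field without extra
# down-sampling or additional parameters per kernel.
from tensorflow.keras import layers

def dilated_bridge(x, channels, rates=(1, 2, 4, 8, 16)):
    for rate in rates:
        x = layers.Conv2D(channels, 3, padding="same",
                          dilation_rate=rate, activation="elu")(x)
    return x
```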

    2.2.5 Residual Convolutional Skip-Connections between Encoder and

    Decoder

    The conventional UNet encoder-decoder skip connections copy encoded features in the

    encoder to the upsampled features in the decoder, which are supposed to be of higher

    level because they are derived at the very deep UNet layers. Merging two sets of these

    Figure 2.3 A residual convolutional encoder-decoder skip-connection consisting of four residual convolutional layers, each of which applies 3 × 3 convolution followed by ELU activation.

    Figure 2.4 The D2R2-UNet architecture for inner cell segmentation: we modify the UNet backbone in Fig. 2.1 by using R2 convolutional units, dilated convolutional layers, and residual convolutional encoder-decoder skip-connections. The height and width of each box represent the image size and number of channels, respectively. The black and blue dotted boxes denote the central bridge and copied feature maps, respectively.

    features in the decoder facilitates the spatial information propagation and recovers the

    lost information in upsampled features during pooling and/or RELU operations. However,

    a semantic gap potentially exists between the two sets of features, and this discrepancy might

    affect the prediction accuracy [8]. To moderate this potential issue, we adapt the technique

    in [8] that incorporates residual convolutional layers into the conventional encoder-decoder

    skip-connections. Fig. 2.3 shows a residual convolutional encoder-decoder skip-connection.
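    The sketch below illustrates one way such a residual convolutional skip-connection could be written in Keras, with a chain of residual 3 × 3 + ELU layers applied to the encoder features before they are concatenated in the decoder (Section 2.4 uses 4, 3, 2, and 1 such layers from shallow to deep). The channel count is assumed to equal the encoder feature depth; this is our sketch, not the thesis code.

```python
# A sketch (ours) of a residual convolutional encoder-decoder skip-connection:
# n_layers residual 3x3 + ELU layers applied to the encoder features before they
# are concatenated with the decoder features (n_layers = 4, 3, 2, 1 in D2R2-UNet).
from tensorflow.keras import layers

def residual_skip_connection(encoder_features, channels, n_layers=4):
    # channels is assumed to equal the depth of encoder_features so the adds are valid
    x = encoder_features
    for _ in range(n_layers):
        y = layers.Conv2D(channels, 3, padding="same", activation="elu")(x)
        x = layers.add([x, y])
    return x
```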

    2.3 RD-UNet Architecture

    We implement a CNN architecture named Residual Dilated UNet (RD-UNet) [1] for the

    ICM and TE segmentation. The RD-UNet is a modified version of the baseline UNet [2].

    Figure 2.5 The RD-UNet architecture for ICM and TE segmentation [1]: we modify the UNet backbone in Fig. 2.1 by using residual convolutional units and dilated convolutional layers.

    We included residual convolutional units (see section 2.2.1) in the encoder and decoder

    and dilated convolutional layers (see section 2.2.4) in the central bridge to improve the

    segmentation performance. See Fig. 2.5 for the detailed architecture of RD-UNet.

    2.4 Network Specifications

    The encoder and decoder, each consisting of four R2 convolutional units, are connected via residual convolutional encoder-decoder skip-connections in D2R2-UNet, as shown in Fig. 2.4. The number of channels is c = 16 in the first unit of the encoder (leftmost), and we double this number in each successive unit (towards the central bridge). Accordingly, we set the number of channels in the first unit of the decoder (next to the central bridge) to c = 8 · 16 and halve the number in each successive unit (towards the final unit at the rightmost). We reduced the size of the feature maps by half at each encoding step and doubled it at each decoding step. Then,

    we added five dilated convolutional layers to the central bridge using the dilation rates of 1,

    2, 4, 8, and 16, successively. Since the semantic gap between the encoder and decoder tends to

    decrease from shallow layers (at left) towards deep layers (at center), we gradually reduce

    the number of residual convolutional layers (4, 3, 2, and 1) in skip-connections between

    encoder and decoder in the direction from shallow layers to deep layers. We also added

    batch normalization to accelerate convergence [14]. Besides that, we included 5% dropout

    to prevent overfitting [15].

    We compared the proposed D2R2-UNet with other candidate models such as baseline

    UNet [2], Dilated UNet (D-UNet) [13], Residual UNet (Res-UNet) [16], Recurrent UNet

    (Rec-UNet) [11], Recurrent Residual UNet (R2-UNet) [11], and Residual Dilated UNet (RD-

    UNet) [1]. Baseline UNet consists of typical convolutional units. D-UNet employs series of

    five dilated convolutional layers with dilation rates 1, 2, 4, 8, and 16 in the central bridge.

    Res-UNet includes residual convolutional units. Rec-UNet utilizes recurrent convolutional

    units. R2-UNet has R2 convolutional units. Finally, RD-UNet consists of a central bridge

    similar to D-UNet and residual convolutional units. For a fair comparison among all the CNNs, we kept the basic architecture equivalent and the optimization hyperparameters and

    dataset identical.

    The RD-UNet for ICM and TE segmentation consists of four residual convolutional units in the encoder and decoder. It includes a series of four dilated convolutional layers with dilation

    rates 1, 2, 4, and 8 in the central bridge. Here, we used RELU activation [6] in each

    encoding and decoding unit. Fig. 2.5 illustrates the detailed architecture. We compare the

    RD-UNet model with existing models i.e., CNN with discrete cosine transform (DCT) [17],

    coarse-to-fine texture analysis [18], texture analysis, clustering, and watershed algorithm

    [19], VGG16 [20], and SD-UNet [13] for ICM segmentation. For TE segmentation, we

    compare the RD-UNet model with existing models i.e., level-set algorithm and Retinex

    theory [21], CNN with discrete cosine transform [17], and texture analysis, clustering, and

    watershed algorithm [19].

    2.5 Loss Function

    We aim to classify each pixel based on two classes, target (inner cell corresponding to 1)

    and background (non inner cell corresponding to 0), so our image segmentation problem can

    be viewed as a pixel-wise binary classification problem. A natural choice for the training loss

    function is the binary cross-entropy loss E(S;x,y) that learns an image segmentation CNN

    S by using input image x and ground-truth annotation image y ∈ {0, 1} by minimizing

    averaged pixel-wise cross-entropy:

    \[
    E(S; x, y) = -\frac{1}{N} \sum_{n=1}^{N} \left[\, y_n \log(S(x)_n) + (1 - y_n) \log(1 - S(x)_n) \,\right]
    \tag{2.2}
    \]

    where the sigmoid function in the final layer of S gives probability prediction values,

    i.e., {S(x)n ∈ (0, 1) : ∀n}.

    In our training images, the number of pixels classified as backgrounds often dominates

    that classified as inner cell, so the cross-entropy training loss in Equation 2.2 can potentially

    underestimate the inner cell prediction. To overcome the class imbalance limitation, we

    incorporate the Jaccard index J(S;x,y) [22] that quantifies the similarity between ground

    truth annotation y and probability prediction values S(x):

    \[
    J(S; x, y) = \frac{1}{N} \sum_{n=1}^{N} \frac{y_n S(x)_n}{y_n + S(x)_n - y_n S(x)_n},
    \tag{2.3}
    \]

    where J(S; x, y) ∈ [0, 1]. Combining Equations 2.2 and 2.3 gives the following joint training

    loss [22]:

    \[
    L(S; x, y) = E(S; x, y) - \log(J(S; x, y)).
    \tag{2.4}
    \]

    We do not include a regularization parameter in Equation 2.4, because we observed that

    both the binary cross-entropy and Jaccard losses, Equations 2.2 and 2.3, are in a similar range

    [22]. The net effect is that as the total loss is minimized, one simultaneously improves the pixel classification accuracy and increases the intersection between the ground truth and

    predicted segmentation.
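    For illustration, the joint loss of Equation 2.4 could be implemented in Keras roughly as sketched below; the per-pixel soft Jaccard follows Equation 2.3, and the small epsilon terms are our additions for numerical stability, not part of the thesis formulation.

```python
# A sketch (ours) of the joint loss in Equation 2.4: binary cross-entropy (Equation 2.2)
# minus the log of a per-pixel soft Jaccard index (Equation 2.3).
from tensorflow.keras import backend as K

def soft_jaccard(y_true, y_pred, eps=1e-7):
    # Per-pixel soft Jaccard ratio averaged over all pixels; eps guards division by zero.
    intersection = y_true * y_pred
    union = y_true + y_pred - intersection
    return K.mean(intersection / (union + eps))

def joint_loss(y_true, y_pred):
    bce = K.mean(K.binary_crossentropy(y_true, y_pred))          # Equation 2.2
    return bce - K.log(soft_jaccard(y_true, y_pred) + 1e-7)      # Equation 2.4
```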

    2.6 Implementation Details

    We implemented training and testing of all CNNs using Keras with TensorFlow backend.

    We used Nadam optimizer (Adam with Nesterov momentum) [23] to minimize the loss

    function in Equation 2.4. Here, we set the initial learning rate to 10−3 and reduced it by a

    factor of 0.05 in every 5 epochs, whereas the minimum learning rate was 10−5. We trained

    all CNNs on a GPU (NVIDIA GTX 1070 with 8GB memory) with the mini-batch size of

    4. Since loss and Jaccard values stagnated near 100 epochs, we set maximum epochs to

    100. We split the dataset of 1368 images into the training set (75% of dataset) and testing

    set (25% of dataset). We randomly sampled the training set in every epoch to improve the

    learning. Given a small training set, we applied data augmentation such as horizontal and

    vertical flips, rotation in a range up to 270◦, horizontal and vertical shifts up to 10% of

    width or height and zoom up to 10% in size. Finally, we used 0.5 threshold for the final

    semantic probability map.
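    A hedged sketch of this training configuration is given below; the model constructor is hypothetical and commented out, and the ImageDataGenerator arguments are our approximation of the flips, rotations, shifts, and zoom described above.

```python
# A sketch (ours) of the training configuration described in this section.
from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# model = build_d2r2_unet(...)                                        # hypothetical builder
# model.compile(optimizer=Nadam(learning_rate=1e-3), loss=joint_loss)

augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=270,        # rotations up to 270 degrees
    width_shift_range=0.1,     # horizontal shifts up to 10% of width
    height_shift_range=0.1,    # vertical shifts up to 10% of height
    zoom_range=0.1,            # zoom up to 10%
)
```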

    2.7 Dataset

    2.7.1 Dataset for ICM and TE Segmentation

    We used the blastocyst dataset [19]. The ICM and TE regions were manually segmented and

    annotated by embryologists at Pacific Centre for Reproductive Medicine, Canada. We used

    the human annotated images as ground truth to evaluate the segmentation performance.

    The dataset has 249 images in total. We split the dataset into two sets: a training set consisting of 211 images and a testing set consisting of 38 images.

    2.7.2 Dataset for Inner Cell Segmentation

    We constructed a dataset of 1368 images in total, representing both genetic health conditions

    (normal/abnormal). The embryologists at the Pacific IVF Institute in Hawai‘i cultured

    and monitored the embryos over 6 days using embryoscopes (Vitrolife, USA). At day 5 of

    culture, they ablated ZPs using a Lykos laser (Hamilton-Thorne, USA). The embryoscopes

    captured images of the ZP-ablated embryos for 10 hours using a time-lapse imaging technique

    [24, 25]. The pixels corresponding to the inner cell were manually annotated by personnel

    supervised by embryologists. We use human annotated images as ground truth to train and

    test CNNs.

    References

    [1] M. Y. Harun, T. Huang, and A. T. Ohta, “Inner cell mass and trophectoderm

    segmentation in human blastocyst images using deep neural network,” in 13th IEEE

    International Conference on Nano/Molecular Medicine and Engineering. IEEE, 2019,

    pp. 214–219.

    [2] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for

    biomedical image segmentation,” in International Conference on Medical Image

    Computing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–

    241.

    [3] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A.

    Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in

    medical image analysis,” Medical Image Analysis, vol. 42, pp. 60–88, 2017.

    [4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”

    in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,

    2016, pp. 770–778.

    [5] M. Liang and X. Hu, “Recurrent convolutional neural network for object recognition,”

    in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,

    2015, pp. 3367–3375.

    [6] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann

    machines,” in Proceedings of the 27th International Conference on Machine Learning

    (ICML), 2010, pp. 807–814.

    [7] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv

    preprint arXiv:1511.07122, 2015.

    [8] N. Ibtehaz and M. S. Rahman, “Multiresunet: Rethinking the U-Net architecture for

    multimodal biomedical image segmentation,” Neural Networks, vol. 121, pp. 74–87,

    2020.

    [9] H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein, “Visualizing the loss landscape

    of neural nets,” in Proc. NIPS 31, Montreal, Canada, Dec. 2018, pp. 6389–6399.

    [10] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network

    learning by exponential linear units (ELUs),” arXiv preprint arXiv:1511.07289, 2015.

    [11] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent

    residual convolutional neural network based on U-Net (R2U-Net) for medical image

    segmentation,” arXiv preprint arXiv:1802.06955, 2018.

    [12] W. Luo, Y. Li, R. Urtasun, and R. Zemel, “Understanding the effective receptive field

    in deep convolutional neural networks,” in Advances in neural information processing

    systems, 2016, pp. 4898–4906.

    [13] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Multi-resolutional ensemble of stacked

    dilated U-Net for inner cell mass segmentation in human embryonic images,” in 25th

    IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3518–

    3522.

    [14] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by

    reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.

    [15] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout:

    a simple way to prevent neural networks from overfitting,” The Journal of Machine

    Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

    [16] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE

    Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.

    [17] S. Kheradmand, P. Saeedi, and I. Bajic, “Human blastocyst segmentation using neural

    network,” in IEEE Canadian Conference on Electrical and Computer Engineering

    (CCECE). IEEE, 2016, pp. 1–4.

    [18] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Coarse-to-fine texture analysis for

    inner cell mass identification in human blastocyst microscopic images,” in Seventh

    International Conference on Image Processing Theory, Tools and Applications (IPTA).

    IEEE, 2017, pp. 1–5.

    [19] P. Saeedi, D. Yee, J. Au, and J. Havelock, “Automatic identification of human

    blastocyst components via texture,” IEEE Transactions on Biomedical Engineering,

    vol. 64, no. 12, pp. 2968–2978, 2017.

    [20] S. Kheradmand, A. Singh, P. Saeedi, J. Au, and J. Havelock, “Inner cell mass

    segmentation in human HMC embryo images using fully convolutional network,” in

    IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 1752–

    1756.

    [21] A. Singh, J. Au, P. Saeedi, and J. Havelock, “Automatic segmentation of trophectoderm

    in microscopic images of human blastocysts,” IEEE Transactions on Biomedical

    Engineering, vol. 62, no. 1, pp. 382–393, 2014.

    [22] V. Iglovikov, S. Mushinskiy, and V. Osin, “Satellite imagery feature detection

    using deep convolutional neural network: A Kaggle competition,” arXiv preprint

    arXiv:1706.06169, 2017.

    [23] T. Dozat, “Incorporating Nesterov momentum into Adam,” 2016.

    [24] T. T. Huang, D. H. Huang, H. J. Ahn, C. Arnett, and C. T. Huang, “Early blastocyst

    expansion in euploid and aneuploid human embryos: evidence for a non-invasive and

    quantitative marker for embryo selection,” Reproductive Biomedicine Online, vol. 39,

    no. 1, pp. 27–39, 2019.

    [25] T. T. Huang, B. C. Walker, M. Harun, A. T. Ohta, M. Rahman, J. Mellinger,

    and W. Chang, “Automated computer analysis of human blastocyst expansion from

    embryoscope time-lapse image files,” Fertility and Sterility, vol. 112, no. 3, pp. e292–

    e293, 2019.

    Chapter 3

    Results and Discussion

    3.1 Evaluation

To validate the embryo image segmentation, we used several evaluation metrics: the Jaccard Index, the Dice Coefficient, accuracy, precision, specificity, and recall. These metrics [1] are calculated from four cardinalities, i.e., true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP is the number of pixels correctly identified as the target class (ICM/TE/inner cell), and TN is the number of pixels correctly identified as background (non-ICM/TE/inner cell). Conversely, FP counts background pixels incorrectly labeled as the target class, and FN counts target-class pixels misclassified as background. The metrics are defined below; a short illustrative sketch after the list shows how they can be computed from binary masks.

• The Jaccard Index, also termed intersection over union, is a similarity measure defined as the intersection of two sets A and B divided by their union:

\[
\text{Jaccard} = \frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FP + FN} \tag{3.1}
\]

• The Dice Coefficient, also known as the overlap index, is likewise a measure of the overlap between two sets, defined by:

\[
\text{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,TP}{2\,TP + FP + FN} \tag{3.2}
\]


Both the Dice Coefficient and the Jaccard Index equal 1 when the predicted and ground-truth segmentations overlap completely.

• Accuracy is the fraction of correctly classified pixels, regardless of class, expressed as:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.3}
\]

• Specificity, also called the true negative rate, is the fraction of ground-truth negative pixels that are also detected as negative by the CNN:

\[
\text{Specificity} = \frac{TN}{TN + FP} \tag{3.4}
\]

• Precision, also called the positive predictive value, is the fraction of correctly segmented pixels among all segmented pixels. In the ideal case, a precision of 1 means the segmentation contains no false positives:

\[
\text{Precision} = \frac{TP}{TP + FP} \tag{3.5}
\]

• Recall is the fraction of all labeled ICM/TE/inner cell pixels that are correctly predicted:

\[
\text{Recall} = \frac{TP}{TP + FN} \tag{3.6}
\]
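As a minimal illustration (not the evaluation code used in this work), the six metrics above can be computed from a pair of binary masks with NumPy. The function name and the small smoothing constant eps are assumptions for this sketch.

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-8):
    """Compute Eqs. (3.1)-(3.6) from binary prediction and ground-truth masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)

    tp = np.logical_and(pred, gt).sum()       # target pixels correctly predicted
    tn = np.logical_and(~pred, ~gt).sum()     # background pixels correctly predicted
    fp = np.logical_and(pred, ~gt).sum()      # background predicted as target
    fn = np.logical_and(~pred, gt).sum()      # target predicted as background

    return {
        "jaccard":     tp / (tp + fp + fn + eps),
        "dice":        2 * tp / (2 * tp + fp + fn + eps),
        "accuracy":    (tp + tn) / (tp + tn + fp + fn + eps),
        "specificity": tn / (tn + fp + eps),
        "precision":   tp / (tp + fp + eps),
        "recall":      tp / (tp + fn + eps),
    }
```

For a single test image, segmentation_metrics(network_output > 0.5, ground_truth_mask) returns all six scores; averaging them over the test set gives values of the kind reported in the tables below.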

3.2 ICM and TE Segmentation Results

3.2.1 Quantitative Results

We compare the ICM segmentation performance of the RD-UNet model with that of existing methods, i.e., a CNN with the discrete cosine transform [2], coarse-to-fine texture analysis [3], texture analysis with clustering and the watershed algorithm [4], VGG16 [5], and SD-UNet [6]; see Table 3.1.

Table 3.1 Comparison of ICM segmentation results of our method with those of existing methods on the same data set.

Methods                                          | Jaccard (%) | Dice (%) | Precision (%) | Recall (%) | Accuracy (%)
CNN with DCT [2]                                 | 47.7        | 64.6     | 75.6          | 56.4       | 93.0
Coarse-to-fine texture analysis [3]              | 70.3        | 82.6     | 78.7          | 86.8       | –
Texture, clustering, and watershed algorithm [4] | 71.1        | 83.1     | 84.5          | 78.3       | 93.3
VGG16 [5]                                        | 76.5        | 86.7     | –             | –          | 95.6
SD-UNet [6]                                      | 81.6        | 89.5     | 88.6          | 91.5       | 98.3
RD-UNet [8]                                      | 89.3        | 94.3     | 94.9          | 93.8       | 99.1

Table 3.2 Comparison of TE segmentation results of our method with those of existing methods on the same data set.

Methods                                          | Jaccard (%) | Dice (%) | Precision (%) | Recall (%) | Accuracy (%)
Level-set algorithm and Retinex theory [7]       | 62.2        | 76.7     | 71.3          | 83.1       | 86.7
CNN with DCT [2]                                 | 58.9        | 74.2     | 69.1          | 80.0       | 90.0
Texture, clustering, and watershed algorithm [4] | 63.0        | 77.3     | 69.0          | 89.0       | 86.6
RD-UNet [8]                                      | 85.3        | 92.5     | 91.8          | 93.2       | 98.3

The experimental results show that the RD-UNet model achieves better performance than the aforementioned existing methods. It outperforms the SD-UNet model [6] by 0.8% in accuracy, 7.1% in precision, 2.5% in recall, 5.4% in the Dice Coefficient, and 9.4% in the Jaccard Index (relative improvements).

Moreover, we compare the TE segmentation performance of the RD-UNet model with that of existing methods, i.e., the level-set algorithm with Retinex theory [7], a CNN with the discrete cosine transform [2], and texture analysis with clustering and the watershed algorithm [4]; see Table 3.2. The TE segmentation results indicate that the RD-UNet model outperforms the existing methods; in particular, it surpasses the texture, clustering, and watershed model [4] by 13.5% in accuracy, 33% in precision, 4.7% in recall, 19.7% in the Dice Coefficient, and 35.4% in the Jaccard Index (relative improvements).
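The improvement figures quoted above are relative gains over the corresponding baseline scores in Tables 3.1 and 3.2, not absolute percentage-point differences. A small helper, written here purely for illustration, makes the computation explicit.

```python
def relative_improvement(ours, baseline):
    """Relative gain in percent over a baseline score."""
    return 100.0 * (ours - baseline) / baseline

# ICM segmentation, RD-UNet vs. SD-UNet (Table 3.1):
print(round(relative_improvement(89.3, 81.6), 1))  # 9.4  (Jaccard Index)
print(round(relative_improvement(94.3, 89.5), 1))  # 5.4  (Dice Coefficient)
```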

Compared to the existing methods, we achieve the highest Jaccard and Dice scores, i.e., the greatest overlap between the network's predictions and the corresponding ground-truth annotations. Furthermore, the RD-UNet model substantially reduces false-positive pixels (background misclassified as ICM/TE) and false-negative pixels (ICM/TE misidentified as background) throughout the dataset; see the increased precision and recall scores in Tables 3.1 and 3.2. The RD-UNet model captures context better owing to the residual convolutional units in the encoding and decoding paths and the dilated convolutional layers in the central bridge. Consequently, it improves both ICM and TE segmentation performance.

3.2.2 Qualitative Results

We compared the predicted ICM and TE segmentation results with the ground-truth ICM and TE annotations. The contours of the ground-truth annotations are overlaid on the predicted ICM and TE segments to visualize the differences. To better understand the ICM segmentation quality, we categorize the results as best (Jaccard Index of more than 97%), better (Jaccard Index from 92% to 97%), and fair (Jaccard Index from 77% to 92%) segmentation; see Fig. 3.1. For ICM segmentation, 36.8%, 50%, and 13.2% of the test images fall into the best, better, and fair categories, respectively.

To better understand the TE segmentation quality, we categorize the results as best (Jaccard Index of more than 94%), better (Jaccard Index from 87% to 94%), and fair (Jaccard Index from 76% to 87%) segmentation; see Fig. 3.2. For TE segmentation, 31.6%, 47.4%, and 21% of the test images fall into the best, better, and fair categories, respectively. The segmentation results are classified by the Jaccard Index since the other performance metrics are uniformly high; a short illustrative sketch of this binning follows.
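As an illustrative sketch only (the thresholds are those quoted above, and the function name is an assumption), the per-image Jaccard scores can be binned into the three categories as follows.

```python
def categorize(jaccard_scores, best_thr, better_thr):
    """Bin per-image Jaccard scores (in %) into best / better / fair categories."""
    counts = {"best": 0, "better": 0, "fair": 0}
    for score in jaccard_scores:
        if score > best_thr:
            counts["best"] += 1
        elif score > better_thr:
            counts["better"] += 1
        else:
            counts["fair"] += 1
    return counts

# ICM uses thresholds of 97% / 92%; TE uses 94% / 87% (see above).
# icm_counts = categorize(icm_jaccard_per_image, best_thr=97.0, better_thr=92.0)
```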

Next, we discuss the qualitative ICM segmentation performance of the RD-UNet model; Fig. 3.1 shows representative results, where the (i, j)th image denotes the image in the ith row and jth column of Fig. 3.1. For all three segmentation categories, RD-UNet successfully segments ICM regions even when they connect with and/or overlap the TE/CM; see the (1, 4)th, (2, 4)th, (3, 4)th, and (4, 4)th images. In general, the contours of the segmented ICM align well with those of the ground-truth annotations; compare the red contours and yellow regions in the 4th column. However, the RD-UNet model shows limited ICM segmentation performance where the features of the ICM and TE/CM are indistinct; see the fair segmentation category in Fig. 3.1. For example, in the (5, 4)th image, mis-segmentation occurs where the ICM and TE have similar texture; in the (6, 4)th image, mis-segmentation occurs where the ICM has a texture similar to that of the CM.

Figure 3.1 ICM segmentation results by RD-UNet. The background (non-ICM) is colored dark cyan, the annotated ground-truth ICM is light green, the network-predicted ICM is yellow, and the contour of the ground-truth ICM is red. JI and DC stand for Jaccard Index and Dice Coefficient, respectively.

Figure 3.2 TE segmentation results by RD-UNet. The background (non-TE) is colored dark cyan, the annotated ground-truth TE is light green, the network-predicted TE is yellow, and the contour of the ground-truth TE is red. JI and DC stand for Jaccard Index and Dice Coefficient, respectively.

We now discuss the qualitative TE segmentation performance of the RD-UNet model; Fig. 3.2 shows representative results, where the (i, j)th image denotes the image in the ith row and jth column of Fig. 3.2. For all three segmentation categories, RD-UNet successfully segments TE regions even when they connect with and/or overlap the ICM/CM; see the (1, 4)th, (2, 4)th, (3, 4)th, and (4, 4)th images. Here, the contours of the segmented TE align closely with those of the ground-truth annotations; compare the red contours and yellow regions in the 4th column. However, the RD-UNet model shows limited TE segmentation performance where the features of the TE and ICM/CM are indistinct; see the fair segmentation category in Fig. 3.2. For example, in the (5, 4)th image, mis-segmentation occurs where the edges of the TE and CM are hard to distinguish; in the (6, 4)th image, mis-segmentation occurs where the edges of the TE and ICM are difficult to differentiate.

3.3 Inner Cell Segmentation Results

3.3.1 Quantitative Results

We compared the D2R2-UNet with other comparable UNet variants, i.e., UNet [9], D-UNet [6], Res-UNet [10], Rec-UNet [11], R2-UNet [11], and RD-UNet [8], to gauge its segmentation potential. Fig. 3.3(a) shows the joint loss (Equation 2.4) during training; the D2R2-UNet model outperforms the other UNet variants with a 7.79% loss. The proposed model better captures inner cell features and more effectively isolates artifacts and fragmented cellular clusters.

Likewise, Fig. 3.3(b) shows the joint loss (Equation 2.4) during testing; again, our model surpasses the other models with a 7.45% loss. Comparing Fig. 3.3(a) and 3.3(b), the testing loss is lower than the training loss, which may imply two things. First, the network generalizes well, i.e., it avoids overfitting. Second, the network underfits slightly, which might be caused by over-regularization (i.e., the dropout rate); however, this did not significantly affect the segmentation performance.
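Equation 2.4 is defined in Chapter 2 and is not reproduced here. Purely as an illustration of the kind of joint loss commonly paired with UNet-style segmentation networks, the sketch below combines binary cross-entropy with a soft Dice term; this is an assumption for illustration, not necessarily the exact form of Equation 2.4.

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, target, alpha=0.5, eps=1e-6):
    """Illustrative joint loss: alpha * BCE + (1 - alpha) * soft Dice loss."""
    # Binary cross-entropy on the raw network outputs (logits).
    bce = F.binary_cross_entropy_with_logits(logits, target)
    # Soft Dice computed on the sigmoid probabilities.
    prob = torch.sigmoid(logits)
    intersection = (prob * target).sum()
    dice = (2 * intersection + eps) / (prob.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * (1 - dice)
```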

Figure 3.3 Comparison of the joint loss (Equation 2.4) among different UNet variant models for inner cell segmentation: (a) training loss and (b) testing loss. Both panels plot the loss (%) against the training epochs (10–100) for UNet, D-UNet, Res-UNet, Rec-UNet, R2-UNet, RD-UNet, and D2R2-UNet.

Table 3.3 Comparison among different UNet architectures based on their inner cell segmentation performance, evaluated on the same testing set.

CNNs               | Jaccard (%) | Dice (%) | Precision (%) | Accuracy (%) | Specificity (%)
UNet [9]           | 94.04       | 96.93    | 96.74         | 98.78        | 99.19
D-UNet [6]         | 95.40       | 97.64    | 97.29         | 99.07        | 99.33
Res-UNet [10]      | 95.26       | 97.57    | 97.36         | 99.04        | 99.35
Rec-UNet [11]      | 95.28       | 97.58    | 97.11         | 99.04        | 99.28
R2-UNet [11]       | 95.53       | 97.72    | 97.41         | 99.09        | 99.36
RD-UNet [8]        | 95.55       | 97.72    | 97.53         | 99.10        | 99.39
Proposed D2R2-UNet | 95.65       | 97.78    | 97.66         | 99.12        | 99.42

Here, we finely tuned the dropout rate to prevent overfitting, which causes the slight underfitting. Moreover, comparing the training loss with the testing loss, we observe that the minimum loss values are nearly the same in both cases. This highlights that our network, like the baseline UNet, generalizes well and avoids overfitting, while outperforming the baseline UNet by a large margin. The improved testing performance of D2R2-UNet reflects the fact that it better recognizes features relevant to the varying inner cell; it also better combines low-level and high-level features and captures context at deeper levels of the network. Finally, we summarize the overall segmentation performance of all models on a testing set of 342 images in Table 3.3.

Although UNet and its variants perform well, the D2R2-UNet provides the best overall performance, with a Jaccard Index of 95.65% and a Dice Coefficient of 97.78%.

The intuition behind this enhanced performance is that the proposed network forms a robust architecture owing to three major modifications: 1) R2 convolutional units in the encoder and decoder, 2) dilated convolutional layers in the central bridge, and 3) residual convolutional layers in the encoder-decoder skip-connections, whereas the other UNet models do not include all three. A minimal, illustrative sketch of the dilated central bridge follows.
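The following sketch illustrates only the second of these ideas, a dilated convolutional central bridge; it is written in PyTorch for concreteness, and the channel count, dilation rates, and module name are assumptions rather than the exact configuration used in this work.

```python
import torch
import torch.nn as nn

class DilatedBridge(nn.Module):
    """Illustrative central bridge: a stack of dilated 3x3 convolutions.

    Increasing dilation rates enlarge the receptive field without further
    downsampling, which is the role of the central bridge described above.
    """

    def __init__(self, channels=512, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ]
        self.bridge = nn.Sequential(*layers)

    def forward(self, x):
        # A residual connection around the dilated stack preserves encoder features.
        return x + self.bridge(x)

# Example: a bottleneck feature map of shape (batch, 512, 16, 16) keeps its size.
# y = DilatedBridge()(torch.randn(1, 512, 16, 16))
```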

3.3.2 Qualitative Results

The network's predictions were compared with the corresponding ground truths to evaluate the segmentation performance. We organized the segmentation results predicted by D2R2-UNet into three performance categories: 1) best prediction, 2) better prediction, and 3) fair prediction. This gives a clear picture of the overall segmentation performance across the testing dataset. Here, the best prediction is defined by a Jaccard Index of more than 96%, the better prediction by a Jaccard Index from 92% to 96%, and the fair prediction by a Jaccard Index between 86% and 92%. Of the 342 testing images, 167 fall into the best category, 163 into the better category, and the remaining 12 into the fair category. Among all predictions, the highest individual Jaccard and Dice scores are 98.55% and 99.27%, respectively; the lowest individual Jaccard and Dice scores are 86.06% and 92.51%, respectively.

We discuss the qualitative inner cell segmentation performance of the proposed D2R2-UNet model; Fig. 3.4 shows representative results, where the (i, j)th image denotes the image in the ith row and jth column of Fig. 3.4. For all three prediction categories, D2R2-UNet successfully segments inner cell regions that extend beyond the culture well, even in the presence of white bands and/or a dark background; see the (2, 1)th, (2, 2)th, (2, 6)th, (5, 3)th, (5, 4)th, and (5, 5)th images. The proposed D2R2-UNet model effectively identifies the outline of the inner cell even when the outline connects with artifacts, e.g., the (2, 4)th and (5, 2)th images, or with fragmented cellular clusters, e.g., the (2, 3)th and (5, 3)th images. In general, the contours of the segmented inner cells align well with those of the ground-truth annotations; compare the red and blue contours in the 3rd and 6th rows.

Figure 3.4 Segmentation results. Light green in the 2nd and 5th rows indicates the inner cell segmented by D2R2-UNet. Red and blue in the 3rd and 6th rows indicate the boundaries of the ground-truth and predicted inner cell, respectively. JI and DC stand for Jaccard Index and Dice Coefficient, respectively.

D2R2-UNet shows limited segmentation performance where the features of the inner cell and ZP are indistinct. For example, in the (3, 4)th image, mis-segmentation occurs where it is challenging to differentiate the edges of the ZP and the inner cell (the edge around the ZP is stronger than usual); in the (6, 5)th image, mis-segmentation occurs where the inner cell has a texture similar to that of the ZP; and in the (6, 6)th image, mis-segmentation occurs where the edges between the inner cell and the ZP are indistinct.

References

[1] A. A. Taha and A. Hanbury, “Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool,” BMC Medical Imaging, vol. 15, no. 1, p. 29, 2015.

[2] S. Kheradmand, P. Saeedi, and I. Bajic, “Human blastocyst segmentation using neural network,” in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2016, pp. 1–4.

[3] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Coarse-to-fine texture analysis for inner cell mass identification in human blastocyst microscopic images,” in Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, 2017, pp. 1–5.

[4] P. Saeedi, D. Yee, J. Au, and J. Havelock, “Automatic identification of human blastocyst components via texture,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 12, pp. 2968–2978, 2017.

[5] S. Kheradmand, A. Singh, P. Saeedi, J. Au, and J. Havelock, “Inner cell mass segmentation in human HMC embryo images using fully convolutional network,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 1752–1756.

[6] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Multi-resolutional ensemble of stacked dilated U-Net for inner cell mass segmentation in human embryonic images,” in 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3518–3522.

[7] A. Singh, J. Au, P. Saeedi, and J. Havelock, “Automatic segmentation of trophectoderm in microscopic images of human blastocysts,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 1, pp. 382–393, 2014.

[8] M. Y. Harun, T. Huang, and A. T. Ohta, “Inner cell mass and trophectoderm segmentation in human blastocyst images using deep neural network,” in 13th IEEE International Conference on Nano/Molecular Medicine and Engineering. IEEE, 2019, pp. 214–219.

[9] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–241.

[10] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.

[11] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation,” arXiv preprint arXiv:1802.06955, 2018.

Chapter 4

Conclusion and Future Work

4.1 Conclusion

This thesis has presented work on a biomedical project aiming to improve existing IVF treatment for infertility. Automating embryo image segmentation with high accuracy is important for sustaining healthy pregnancies in IVF, since segmentation is a basic element of the morphological and morphokinetic analyses used to evaluate embryo viability. The project demonstrated deep learning-based embryo image segmentation methods to 1) segment the ICM and TE in ZP-intact embryonic images for morphological analysis and 2) segment the inner cell in ZP-ablated embryonic images for morphokinetic study. Improving the semantic segmentation CNN is particularly useful for ICM and TE segmentation, since these regions are challenging to segment due to the similar textures of the embryo regions (ICM/TE/ZP/CM), artifacts, and image contrast variations. The CNN can resolve these issues by improving feature extraction and better capturing context. Furthermore, it is important to enhance the segmentation CNN for inner cell segmentation, because the inner cell is difficult to segment with the conventional method in an embryoscope due to irregular expansion of inner cells, artifacts and cellular clusters near inner cell outlines, and potential white bands and/or dark backgrounds around the expanded inner cell. The CNN can overcome these challenges by better capturing inner cell features.

We implemented the RD-UNet model and developed the D2R2-UNet model to overcome the aforementioned segmentation challenges. We implemented the RD-UNet model [1] by incorporating residual convolutional units in the encoder and decoder and adding a series of dilated convolutional layers to the central bridge. The RD-UNet model improves ICM segmentation and outperforms the existing methods, i.e., a CNN with the discrete cosine transform [2], coarse-to-fine texture analysis [3], texture analysis with clustering and the watershed algorithm [4], VGG16 [5], and SD-UNet [6], achieving a 94.3% Dice Coefficient and an 89.3% Jaccard Index. It also achieves the best TE segmentation performance, with a 92.5% Dice Coefficient and an 85.3% Jaccard Index, compared to the existing methods, i.e., the level-set algorithm with Retinex theory [7], a CNN with the discrete cosine transform [2], and texture analysis with clustering and the watershed algorithm [4]. We believe that this model can be used to precisely segment the ICM and TE for morphological analysis of embryos toward improved pregnancy outcomes in IVF.

For inner cell segmentation, we proposed a UNet-based CNN architecture that replaces the UNet encoding-decoding units, central bridge, and encoder-decoder skip-paths with R2 convolutional encoding-decoding units, a dilated convolutional central bridge, and residual convolutional encoder-decoder skip-paths, respectively. The proposed D2R2-UNet model improves inner cell segmentation performance, with a Jaccard Index of 95.65% and a Dice Coefficient of 97.78%, compared to the existing UNet variants, i.e., UNet [8], D-UNet [6], Res-UNet [9], Rec-UNet [10], R2-UNet [10], and RD-UNet [1]. The model better captures context and reduces the semantic disparity between the encoder and decoder. We believe that the proposed model can accurately segment the inner cell for morphokinetic analysis of embryos and facilitate sustained pregnancies in IVF.

4.2 Future Work

Our future work will use temporal information between frames to improve inner cell segmentation performance. The current dataset has a small number of frames per video (30 or 31 frames) captured at a long interval (20 minutes/frame), so there are large and irregular spatial variations between consecutive frames, and modeling temporal changes is challenging. In addition, the dataset contains only 45 videos, so it provides limited diversity for training video segmentation CNNs. To overcome these limitations, future work will acquire more frames per video (with a regular and shorter interval in the time-lapse imaging setup) and more videos.

References

[1] M. Y. Harun, T. Huang, and A. T. Ohta, “Inner cell mass and trophectoderm segmentation in human blastocyst images using deep neural network,” in 13th IEEE International Conference on Nano/Molecular Medicine and Engineering. IEEE, 2019, pp. 214–219.

[2] S. Kheradmand, P. Saeedi, and I. Bajic, “Human blastocyst segmentation using neural network,” in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2016, pp. 1–4.

[3] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Coarse-to-fine texture analysis for inner cell mass identification in human blastocyst microscopic images,” in Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, 2017, pp. 1–5.

[4] P. Saeedi, D. Yee, J. Au, and J. Havelock, “Automatic identification of human blastocyst components via texture,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 12, pp. 2968–2978, 2017.

[5] S. Kheradmand, A. Singh, P. Saeedi, J. Au, and J. Havelock, “Inner cell mass segmentation in human HMC embryo images using fully convolutional network,” in IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 1752–1756.

[6] R. M. Rad, P. Saeedi, J. Au, and J. Havelock, “Multi-resolutional ensemble of stacked dilated U-Net for inner cell mass segmentation in human embryonic images,” in 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 3518–3522.

[7] A. Singh, J. Au, P. Saeedi, and J. Havelock, “Automatic segmentation of trophectoderm in microscopic images of human blastocysts,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 1, pp. 382–393, 2014.

[8] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–241.

[9] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018.

[10] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation,” arXiv preprint arXiv:1802.06955, 2018.