NeuroImage 225 (2021) 117471
A contrast-adaptive method for simultaneous whole-brain and lesion segmentation in multiple sclerosis
Stefano Cerri a,b,∗, Oula Puonti b, Dominik S. Meier c, Jens Wuerfel c, Mark Mühlau d, Hartwig R. Siebner b,e,f, Koen Van Leemput a,g
a Department of Health Technology, Technical University of Denmark, Denmark b Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Denmark c Medical Image Analysis Center (MIAC AG) and Department of Biomedical Engineering, University Basel, Switzerland d Department of Neurology and TUM-Neuroimaging Center, School of Medicine, Technical University of Munich, Germany e Department of Neurology, Copenhagen University Hospital Bispebjerg, Denmark f Institute for Clinical Medicine, Faculty of Medical and Health Sciences, University of Copenhagen, Denmark g Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, USA
Article info

Keywords: Lesion segmentation; Multiple sclerosis; Whole-brain segmentation; Generative model

Abstract

Here we present a method for the simultaneous segmentation of white matter lesions and normal-appearing neuroanatomical structures from multi-contrast brain MRI scans of multiple sclerosis patients. The method integrates a novel model for white matter lesions into a previously validated generative model for whole-brain segmentation. By using separate models for the shape of anatomical structures and their appearance in MRI, the algorithm can adapt to data acquired with different scanners and imaging protocols without retraining. We validate the method using four disparate datasets, showing robust performance in white matter lesion segmentation while simultaneously segmenting dozens of other brain structures. We further demonstrate that the contrast-adaptive method can also be safely applied to MRI scans of healthy controls, and replicate previously documented atrophy patterns in deep gray matter structures in MS. The algorithm is publicly available as part of the open-source neuroimaging package FreeSurfer.
1. Introduction
Multiple sclerosis (MS) is the most frequent chronic inflammatory autoimmune disorder of the central nervous system, causing progressive damage and disability. The disease affects nearly half a million Americans and 2.5 million individuals world-wide (Goldenberg, 2012; Rosati, 2001), generating more than $10 billion in annual healthcare spending in the United States alone (Adelman et al., 2013).

The ability to diagnose MS and track its progression has been greatly enhanced by magnetic resonance imaging (MRI), which can detect characteristic brain lesions in white and gray matter (Bakshi et al., 2008; Blystad et al., 2015; García-Lorenzo et al., 2013; Lövblad et al., 2010). Lesions visualized by MRI are up to an order of magnitude more sensitive in detecting disease activity compared to clinical assessment (Filippi et al., 2006). The prevalence and dynamics of white matter lesions are thus used clinically to diagnose MS (Thompson et al., 2018), define disease stages and to determine the efficacy of a therapeutic regimen (Sormani, 2013). MRI is also an unparalleled tool for characterizing brain atrophy, which occurs at a faster rate in patients with MS compared to healthy controls (Azevedo et al., 2018; Barkhof et al., 2009) and, especially in deep gray matter structures and the cerebral cortex, has been shown to correlate with measures of disability (Geurts et al., 2012).

Although manual labeling remains the most accurate way¹ of delineating white matter lesions in MS (Commowick et al., 2018), this approach is very cumbersome and in itself prone to considerable intra- and inter-rater disagreement (Zijdenbos et al., 1998). Furthermore, manually labeling various normal-appearing brain structures to assess atrophy is simply too time consuming to be practically feasible. Therefore, there is a clear need for automated tools that can reliably and efficiently characterize the morphometry of white matter lesions, various neuroanatomical structures, and their changes over time directly from in vivo MRI. Such tools are of great potential value for diagnosing disease, tracking progression, and evaluating treatment. They can also help in obtaining a better understanding of underlying disease mechanisms, and facilitate more efficient testing in clinical trials. Ultimately, automated software tools may help clinicians to prospectively identify which patients are at highest risk of future disability accrual, leading to better counseling of patients and better overall clinical outcomes.

Fig. 1. Segmentation of white matter lesions and 41 different brain structures from the proposed method on T1w-FLAIR input. From left to right: sagittal, coronal, axial view. From top to bottom: T1w, FLAIR, automatic segmentation.

∗ Corresponding author. E-mail address: [email protected] (S. Cerri).

¹ Although selectively fusing several automatic methods has recently been shown to approach human performance (Carass et al., 2020).

https://doi.org/10.1016/j.neuroimage.2020.117471
Received 11 May 2020; Received in revised form 12 October 2020; Accepted 16 October 2020; Available online 22 October 2020.
1053-8119/© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Despite decades of methodological development (cf. García-Lorenzo et al., 2013 or Danelakis et al., 2018), currently available computational tools for analyzing MRI scans of MS patients remain limited in a number of important ways:

• Poor generalizability: Existing tools are often developed and tested on very specific imaging protocols, and may not be able to work on data that is acquired differently. Especially with the strong surge of supervised learning in recent years, where the relationship between image appearance and segmentation labels in training scans is directly and statically encoded, the segmentation performance of many state-of-the-art algorithms will degrade substantially when applied to data from different scanners and acquisition protocols (García-Lorenzo et al., 2013; Valverde et al., 2019), severely limiting their usefulness in practice.

• Dearth of available software: Despite the very large number of proposed methods, most algorithms are only developed and tested in-house, and very few tools are made publicly available (Griffanti et al., 2016; Schmidt et al., 2012; Shiee et al., 2010; Valverde et al., 2017). In order to ensure that computational methods make a real practical impact, they must be accompanied by software implementations that work robustly across a wide array of image acquisitions; that are made publicly available; and that are open-sourced, rigorously tested and comprehensively documented.

• Limitations in assessing atrophy: There is a lack of dedicated tools for characterizing brain atrophy patterns in MS: many existing methods characterize only aggregate measures such as global brain or gray matter volume (Smeets et al., 2016; Smith et al., 2002) rather than individual brain structures, or require that lesions are pre-segmented so that their MRI intensities can be replaced with placeholder values to avoid biased atrophy measures, so-called lesion filling (Azevedo et al., 2018; Battaglini et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010; Gelineau-Morel et al., 2012; Sdika and Pelletier, 2009).
In order to address these limitations, we describe a new open-source software tool for simultaneously segmenting white matter lesions and 41 neuroanatomical structures from MRI scans of MS patients. An example segmentation produced by this tool is shown in Fig. 1. By performing lesion segmentation in the full context of whole-brain modeling, the method obviates the need to segment lesions and assess atrophy in two separate processing phases, as currently required in lesion filling approaches. The method works robustly across a wide range of imaging hardware and protocols by completely decoupling computational models of anatomy from models of the imaging process, thereby side-stepping the intrinsic generalization difficulties of supervised methods such as convolutional neural networks. Our software implementation is freely available as part of the FreeSurfer neuroimaging analysis package (Fischl, 2012).
To the best of our knowledge, only two other methods have been developed for joint whole-brain and white matter lesion segmentation in MS. Shiee et al. (2010) model lesions as an extra tissue class in an unsupervised whole-brain segmentation method (Bazin and Pham, 2008), removing false positive detections of lesions using a combination of topological constraints and hand-crafted rules implementing various intensity- and distance-based heuristics. However, the method segments only a small set of neuroanatomical structures (10), and validation of this aspect was limited to a simulated MRI scan of a single subject. McKinley et al. (2019) use a cascade of two convolutional neural networks, with the first one skull-stripping individual image modalities and the second one generating the actual segmentation. However, the whole-brain segmentation performance of this method was only evaluated on a few structures (7). Furthermore, as a supervised method its applicability on data that differs substantially from its training data will necessarily be limited.
A preliminary version of this work was presented in Puonti and Van Leemput (2016). Compared to this earlier work, the current article employs more advanced models for the shape and appearance of white matter lesions, and includes a more thorough validation of the segmentation performance of the proposed method, including an evaluation of the whole-brain segmentation component and comparisons with human inter-rater variability.
2. Contrast-adaptive whole-brain segmentation

We build upon a method for whole-brain segmentation called Sequence Adaptive Multimodal SEGmentation (SAMSEG) that we previously developed (Puonti et al., 2016), and that we propose to extend with the capability to handle white matter lesions. SAMSEG robustly segments 41 structures from head MRI scans without any form of preprocessing or prior assumptions on the scanning platform or the number and type of pulse sequences used. Since we build heavily on this method for the remainder of the paper, we briefly outline its main characteristics here.

SAMSEG is based on a generative approach, in which a forward probabilistic model is inverted to obtain automated segmentations. Let $D = (d_1, \ldots, d_I)$ denote a matrix collecting the intensities in a multi-contrast brain MR scan with $I$ voxels, where the vector $d_i = (d_i^1, \ldots, d_i^N)^T$ contains the intensities in voxel $i$ for each of the available $N$ contrasts. Furthermore, let $l = (l_1, \ldots, l_I)^T$ be the corresponding labels, where $l_i \in \{1, \ldots, K\}$ denotes one of the $K$ possible segmentation labels assigned to voxel $i$. SAMSEG estimates a segmentation $l$ from MRI data $D$ by using a generative model, illustrated in black in Fig. 2. According to this model, $l$ is sampled from a segmentation prior $p(l | \theta_l)$, after which $D$ is obtained by sampling from a likelihood function $p(D | l, \theta_d)$, where $\theta_l$ and $\theta_d$ are model parameters with priors $p(\theta_l)$ and $p(\theta_d)$. Segmentation then consists of inferring the unknown $l$ from the observed $D$ under this model. In the following, we summarize the segmentation prior and the likelihood used in SAMSEG, as well as the way the resulting model is used to obtain automated segmentations.
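The forward direction of this generative model can be illustrated with a toy sampler. This is a minimal sketch, not the SAMSEG implementation: the function name `sample_scan` is hypothetical, the deformable atlas is replaced by a fixed per-voxel label prior, covariances are assumed diagonal, and the bias field is omitted.

```python
import numpy as np

def sample_scan(label_prior, means, variances, rng):
    """Draw labels l from a voxel-wise prior, then intensities D given l.

    label_prior: (I, K) array whose rows sum to 1 (stand-in for p(l | theta_l)).
    means, variances: (K, N) per-class Gaussian parameters
    (diagonal covariances, no bias field in this toy version).
    """
    n_voxels, n_classes = label_prior.shape
    # Sample one label per voxel by inverting the categorical CDF.
    cumulative = np.cumsum(label_prior, axis=1)
    u = rng.random((n_voxels, 1))
    labels = np.minimum((u > cumulative).sum(axis=1), n_classes - 1)
    # Sample intensities from the Gaussian of the selected class.
    noise = rng.standard_normal((n_voxels, means.shape[1]))
    intensities = means[labels] + np.sqrt(variances[labels]) * noise
    return labels, intensities
```

Segmentation runs this process in reverse: given the intensities, it infers the labels that most plausibly generated them.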
Fig. 2. Graphical model of the proposed method. In black the existing contrast-adaptive whole-brain segmentation method SAMSEG (without lesion modeling), in blue the proposed additional components to also model white matter lesions. Shading indicates observed variables. The plate indicates $I$ repetitions of the included variables, where $I$ is the number of voxels.

² http://freesurfer.net/

2.1. Segmentation prior
To model the spatial configuration of various neuroanatomical structures, we use a deformable probabilistic atlas as detailed in Puonti et al. (2016). In short, the atlas is based on a tetrahedral mesh, where the parameters $\theta_l$ are the spatial positions of the mesh's vertices, and $p(\theta_l)$ is a topology-preserving deformation prior that prevents the mesh from tearing or folding (Ashburner et al., 2000). The model assumes conditional independence of the labels between voxels for a given deformation:

$$p(l | \theta_l) = \prod_{i=1}^{I} p(l_i | \theta_l),$$

and computes the probability of observing label $k$ at voxel $i$ as

$$p(l_i = k | \theta_l) = \sum_{j=1}^{J} \alpha_j^k \, \psi_j^i(\theta_l), \tag{1}$$

where $\alpha_j^k$ are label probabilities defined at the $J$ vertices of the mesh, and $\psi_j^i(\theta_l)$ denotes a spatially compact, piecewise-linear interpolation basis function attached to the $j$th vertex and evaluated at the $i$th voxel (Van Leemput, 2009).

The topology of the mesh, the mode of the deformation prior $p(\theta_l)$, and the label probabilities $\alpha_j^k$ can be learned automatically from a set of segmentations provided as training data (Van Leemput, 2009). This involves an iterative process that combines a mesh simplification operation with a group-wise nonrigid registration step to warp the atlas to each of the training subjects, and an Expectation Maximization (EM) algorithm (Dempster et al., 1977) to estimate the label probabilities $\alpha_j^k$ in the mesh vertices. The result is a sparse mesh that encodes high-dimensional atlas deformations through a compact set of vertex displacements. As described in Puonti et al. (2016), the atlas used in SAMSEG was derived from manual whole-brain segmentations of 20 subjects, representing a mix of healthy individuals and subjects with questionable or probable Alzheimer's disease.
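For a voxel inside a single tetrahedron, a piecewise-linear interpolation as in Eq. (1) reduces to weighting the four vertex probabilities by the voxel's barycentric coordinates. The sketch below illustrates this under that assumption; the function names are hypothetical and the code is not taken from the SAMSEG source.

```python
import numpy as np

def barycentric_weights(vertices, point):
    """Barycentric coordinates of `point` in one tetrahedron.

    vertices: (4, 3) array, one vertex position per row.
    """
    # Solve sum_j w_j * v_j = point subject to sum_j w_j = 1.
    A = np.vstack([vertices.T, np.ones(4)])   # shape (4, 4)
    b = np.append(point, 1.0)
    return np.linalg.solve(A, b)

def label_prior_at(vertices, alphas, point):
    """Evaluate Eq. (1) for a point inside a single tetrahedron.

    alphas: (4, K) label probabilities at the four vertices (rows sum to 1).
    """
    w = barycentric_weights(vertices, point)  # plays the role of psi_j^i
    return w @ alphas
```

Because the weights are non-negative inside the tetrahedron and sum to one, the interpolated values form a valid probability distribution over the $K$ labels. Moving the vertices (changing $\theta_l$) changes the weights, which is how the atlas deforms.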
2.2. Likelihood function

For the likelihood function we use a Gaussian model for each of the $K$ different structures. We assume that the bias field artifact can be modelled as a multiplicative and spatially smooth effect (Wells et al., 1996). For computational reasons, we use log-transformed image intensities in $D$, and model the bias field as a linear combination of spatially smooth basis functions that is added to the local voxel intensities (Van Leemput et al., 1999). Letting $\theta_d$ collect all bias field parameters and Gaussian means and variances, the likelihood is defined as

$$p(D | l, \theta_d) = \prod_{i=1}^{I} p(d_i | l_i, \theta_d),$$

$$p(d_i | l_i = k, \theta_d) = \mathcal{N}(d_i | \mu_k + C \phi^i, \Sigma_k),$$

$$C = \begin{pmatrix} c_1^T \\ \vdots \\ c_N^T \end{pmatrix}, \quad c_n = \begin{pmatrix} c_{n,1} \\ \vdots \\ c_{n,P} \end{pmatrix}, \quad \phi^i = \begin{pmatrix} \phi_1^i \\ \vdots \\ \phi_P^i \end{pmatrix},$$

where $P$ denotes the number of bias field basis functions, $\phi_p^i$ is the basis function $p$ evaluated at voxel $i$, and $c_n$ holds the bias field coefficients for MRI contrast $n$. We use a flat prior for the parameters of the likelihood: $p(\theta_d) \propto 1$.
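The per-voxel likelihood term can be evaluated directly from these definitions. The following is a minimal sketch for one voxel and one class; the function name `log_likelihood` is hypothetical, and the code only illustrates the formula rather than reproducing the authors' implementation.

```python
import numpy as np

def log_likelihood(d_i, mu_k, Sigma_k, C, phi_i):
    """log N(d_i | mu_k + C phi^i, Sigma_k) for one voxel and one class.

    d_i:     (N,) log-transformed intensities of the voxel
    mu_k:    (N,) class mean; Sigma_k: (N, N) class covariance
    C:       (N, P) bias field coefficients, one row per contrast
    phi_i:   (P,) bias field basis functions evaluated at the voxel
    """
    n_contrasts = d_i.shape[0]
    # In the log domain the multiplicative bias field becomes additive.
    residual = d_i - (mu_k + C @ phi_i)
    _, logdet = np.linalg.slogdet(Sigma_k)
    mahalanobis = residual @ np.linalg.solve(Sigma_k, residual)
    return -0.5 * (n_contrasts * np.log(2.0 * np.pi) + logdet + mahalanobis)
```

Working with log-densities avoids numerical underflow when many contrasts or very small variances are involved.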
2.3. Segmentation

For a given MRI scan $D$, segmentation proceeds by computing a point estimate of the unknown model parameters $\theta = \{\theta_d, \theta_l\}$:

$$\hat{\theta} = \arg\max_{\theta} p(\theta | D),$$

which effectively fits the model to the data. Details of this procedure are given in Appendix A. Once $\hat{\theta}$ is found, the corresponding maximum a posteriori (MAP) segmentation

$$\hat{l} = \arg\max_{l} p(l | D, \hat{\theta})$$

is obtained by assigning each voxel to the label with the highest probability, i.e., $\hat{l}_i = \arg\max_k \hat{w}_{i,k}$, where $0 \le \hat{w}_{i,k} \le 1$ are probabilistic label assignments

$$w_{i,k} = \frac{\mathcal{N}(d_i | \mu_k + C \phi^i, \Sigma_k) \, p(l_i = k | \theta_l)}{\sum_{k'=1}^{K} \mathcal{N}(d_i | \mu_{k'} + C \phi^i, \Sigma_{k'}) \, p(l_i = k' | \theta_l)} \tag{2}$$

evaluated at the estimated parameters $\hat{\theta}$. It is worth emphasizing that, since the class means and variances $\{\mu_k, \Sigma_k\}$ are estimated from each target scan individually, the model automatically adapts to each scan's specific intensity characteristics, a property that we demonstrated experimentally on several data sets acquired with different imaging protocols, scanners and field strengths in Puonti et al. (2016).
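The label assignments of Eq. (2) are a normalized product of likelihood and prior. A minimal sketch, assuming the log-likelihoods and atlas priors have already been computed for every voxel and class (the function names are hypothetical):

```python
import numpy as np

def soft_assignments(log_lik, prior):
    """Eq. (2): w[i, k] is likelihood times prior, normalized over k.

    log_lik: (I, K) array of log N(d_i | mu_k + C phi^i, Sigma_k)
    prior:   (I, K) array of p(l_i = k | theta_l)
    """
    with np.errstate(divide="ignore"):      # log(0) = -inf is intended here
        log_post = log_lik + np.log(prior)
    # Subtract the row-wise maximum before exponentiating for stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    w = np.exp(log_post)
    return w / w.sum(axis=1, keepdims=True)

def map_labels(w):
    """Assign each voxel to the label with the highest probability."""
    return np.argmax(w, axis=1)
```

A class with zero prior probability at a voxel receives zero weight there regardless of how well its Gaussian fits the intensity, which is how the atlas constrains the segmentation spatially.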
Our implementation of this method, written in Python with the exception of C++ parts for the computationally demanding optimization of the atlas mesh deformation, is available as part of the open-source package FreeSurfer². It segments MRI brain scans without any form of preprocessing such as skull stripping or bias field correction, taking around 10 minutes to process one subject on a state-of-the-art computer (measured on a machine with an Intel 12-core i7-8700K processor). As explained in Puonti et al. (2016), in our implementation we make use of the fact that many neuroanatomical structures share the same intensity characteristics in MRI to reduce the number of free parameters in the model (e.g., all white matter structures share the same Gaussian mean $\mu_k$ and variance $\Sigma_k$, as do most gray matter structures). Furthermore, for some structures (e.g., non-brain tissue) we use Gaussian mixture models instead of a single Gaussian. In addition to using full covariance matrices $\Sigma_k$, our implementation also supports diagonal covariances, which is currently selected as the default behavior.
3. Modeling lesions

In order to make SAMSEG capable of additionally segmenting white matter lesions, we augment its generative model by introducing a binary lesion map $z = (z_1, \ldots, z_I)^T$, where $z_i \in \{0, 1\}$ indicates the presence of a lesion in voxel $i$. The augmented model is depicted in Fig. 2, where the blue parts indicate the additional components compared to the original SAMSEG method. The complete model consists of a joint (i.e., over both $l$ and $z$ simultaneously) segmentation prior $p(l, z | h, \theta_l)$, where $h$ is a new latent variable that helps constrain the shape of lesions, as well as a joint likelihood $p(D | l, z, \theta_d, \theta_{les})$, where $\theta_{les}$ are new parameters that govern their appearance. In the following, we summarize the segmentation prior and the likelihood used in the augmented model, as well as the way the resulting model is used to obtain automated segmentations.

3.1. Segmentation prior

We use a joint segmentation prior of the form

$$p(l, z | h, \theta_l) = p(z | h, \theta_l) \, p(l | \theta_l),$$

where $p(l | \theta_l)$ is the deformable atlas model defined in Section 2.1, and

$$p(z | h, \theta_l) = \prod_{i=1}^{I} p(z_i | h, \theta_l)$$

is a factorized model where the probability that a voxel is part of a lesion is given by:

$$p(z_i = 1 | h, \theta_l) = f_i(h) \, \rho_i(\theta_l).$$

Here $0 \le f_i(h) \le 1$ aims to enforce shape constraints on lesions, whereas $0 \le \rho_i(\theta_l) \le 1$ takes into account a voxel's spatial location within its neuroanatomical context. Below we provide more details on both these components of the model.
3.1.1. Modeling lesion shapes

In order to model lesion shapes, we use a variational auto-encoder (Kingma and Welling, 2013; Rezende et al., 2014) according to which lesion segmentation maps $z$ are generated in a two-step process: An unobserved, low-dimensional code $h$ is first sampled from a spherical Gaussian distribution $p(h) = \mathcal{N}(h | \mathbf{0}, \mathbf{I})$, and subsequently "decoded" into $z$ by sampling from a factorized Bernoulli model:

$$p_{\epsilon}(z | h) = \prod_{i=1}^{I} f_i(h)^{z_i} \left(1 - f_i(h)\right)^{(1 - z_i)}.$$

Here $f_i(h)$ are the outputs of a "decoder" convolutional neural network (CNN) with filter weights $\epsilon$, which parameterize the model.

Given a training data set in the form of $M$ binary segmentation maps $\mathcal{D} = \{z^{(m)}\}_{m=1}^{M}$, suitable network parameters $\epsilon$ can in principle be estimated by maximizing the log-probability assigned to the data by the model:

$$\log p_{\epsilon}(\mathcal{D}) = \sum_{z \in \mathcal{D}} \log p_{\epsilon}(z), \quad \text{where} \quad p_{\epsilon}(z) = \int_h p_{\epsilon}(z | h) \, p(h) \, \mathrm{d}h.$$

However, because the integral over the latent codes makes this intractable, we use amortized variational inference in the form of stochastic gradient variational Bayes (Kingma and Welling, 2013; Rezende et al., 2014). In particular, we introduce an approximate posterior

$$q_{\upsilon}(h | z) = \mathcal{N}\left(h \,|\, \mu_{\upsilon}(z), \mathrm{diag}(\sigma_{\upsilon}^2(z))\right),$$

where the functions $\mu_{\upsilon}(z)$ and $\sigma_{\upsilon}(z)$ are implemented as an "encoder" CNN parameterized by $\upsilon$. The variational parameters $\upsilon$ are then learned jointly with the model parameters $\epsilon$ by maximizing a variational lower bound

$$\sum_{z \in \mathcal{D}} \mathcal{L}_{\epsilon,\upsilon}(z) \le \log p_{\epsilon}(\mathcal{D})$$

using stochastic gradient descent, where

$$\mathcal{L}_{\epsilon,\upsilon}(z) = -D_{KL}\left(q_{\upsilon}(h | z) \,\|\, p(h)\right) + \mathbb{E}_{q_{\upsilon}(h | z)}\left[\log p_{\epsilon}(z | h)\right]. \tag{3}$$

The first term is the Kullback-Leibler divergence between the approximate posterior and the prior, which can be evaluated analytically. The expectation in the last term is approximated using Monte Carlo sampling, using a change of variables (known as the "reparameterization trick") to reduce the variance in the computation of the gradient with respect to $\upsilon$ (Kingma and Welling, 2013; Rezende et al., 2014).

Our training data set was derived from manual lesion segmentations in 212 MS subjects, obtained from the University Hospital of Basel, Switzerland. The segmentations were all affinely registered and resampled to a 1 mm isotropic grid of size 197 × 233 × 189. In order to reduce the risk of overfitting to the training data, we augmented each segmentation in the training data set by applying a rotation of 10 degrees around each axis, obtaining a total of 1484 segmentations. The architecture for our encoder and decoder networks is detailed in Fig. 3. We trained the model for 1000 epochs with a mini-batch size of 10 using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1e-4. We approximated the expectation in the variational lower bound of Eq. (3) by using a single Monte Carlo sample in each step.
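The two terms of the lower bound in Eq. (3) can be written out for a single segmentation map. The sketch below uses plain NumPy and treats the decoder as an arbitrary callable; the function names are hypothetical, no gradients are computed, and the convolutional networks of Fig. 3 are not reproduced.

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """Analytic KL( N(mu, diag(var)) || N(0, I) )."""
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))

def bernoulli_log_lik(z, f):
    """log p(z | h) for a factorized Bernoulli with probabilities f."""
    f = np.clip(f, 1e-7, 1.0 - 1e-7)         # avoid log(0)
    return np.sum(z * np.log(f) + (1.0 - z) * np.log(1.0 - f))

def elbo_single_sample(z, mu, var, decoder, rng):
    """One-sample Monte Carlo estimate of the bound in Eq. (3).

    mu, var: encoder outputs for z; decoder: callable mapping h to f(h).
    """
    eps = rng.standard_normal(mu.shape)
    h = mu + np.sqrt(var) * eps              # reparameterization trick
    return -kl_to_standard_normal(mu, var) + bernoulli_log_lik(z, decoder(h))
```

Writing the sample as a deterministic function of the encoder outputs plus independent noise is what lets a framework with automatic differentiation propagate gradients through the sampling step.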
3.1.2. Modeling the spatial location of lesions

In order to encode the spatially varying frequency of occurrence of lesions across the brain, we model the probability of finding a lesion in voxel $i$, based on its location alone, as

$$\rho_i(\theta_l) = \sum_{j=1}^{J} \beta_j \, \psi_j^i(\theta_l),$$

where lesion probabilities $0 \le \beta_j \le 1$ defined in the vertices of the SAMSEG atlas mesh are interpolated at the voxel location. This effectively defines a lesion probability map that deforms in conjunction with the SAMSEG atlas to match the neuroanatomy in each image being segmented, allowing the model to impose contextual constraints on where lesions are expected to be found.

We estimated the parameters $\beta_j$ by running SAMSEG on MRI scans (T1-weighted (T1w) and FLAIR) of 54 MS subjects in whom lesions had been manually annotated (data from the University Hospital of Basel, Switzerland), and recording the estimated atlas deformations. The parameters $\beta_j$ were then computed from the manual lesion segmentations by applying the same technique we used to estimate the $\alpha_j^k$ parameters in the SAMSEG atlas training phase (cf. Section 2.1).
3.2. Likelihood function

For the likelihood, which links joint segmentations $\{l, z\}$ to intensities $D$, we use the same model as SAMSEG in voxels that do not contain lesion ($z_i = 0$), but draw intensities in lesions ($z_i = 1$) from a separate Gaussian with parameters $\theta_{les} = \{\mu_{les}, \Sigma_{les}\}$:

$$p(D | l, z, \theta_d, \theta_{les}) = \prod_{i=1}^{I} p(d_i | l_i, z_i, \theta_d, \theta_{les}),$$

where

$$p(d_i | l_i = k, z_i, \theta_d, \theta_{les}) = \begin{cases} \mathcal{N}(d_i | \mu_{les} + C \phi^i, \Sigma_{les}) & \text{if } z_i = 1, \\ \mathcal{N}(d_i | \mu_k + C \phi^i, \Sigma_k) & \text{otherwise.} \end{cases}$$

In order to constrain the values that the lesion intensity parameters $\theta_{les}$ can take, we make them conditional on the remaining intensity parameters using a normal-inverse-Wishart distribution:

$$p(\theta_{les} | \theta_d) = \mathcal{N}(\mu_{les} | \mu_{WM}, \nu^{-1} \Sigma_{les}) \, \mathrm{IW}(\Sigma_{les} | \kappa \nu \Sigma_{WM}, \nu - N - 2). \tag{4}$$

Here the subscript "WM" denotes the white matter Gaussian and $\kappa > 1$ and $\nu \ge 0$ are hyperparameters in the model.

This choice of model is motivated by the fact that the normal-inverse-Wishart distribution is a conjugate prior for the parameters of a Gaussian distribution: Eq. (4) can be interpreted as providing $\nu$ "pseudo-voxels" with empirical mean $\mu_{WM}$ and variance $\kappa \Sigma_{WM}$ in scenarios where the lesion intensity parameters $\mu_{les}$ and $\Sigma_{les}$ need to be estimated from data. In the absence of any such pseudo-voxels ($\nu = 0$), Eq. (4) reduces to a flat prior on $\theta_{les}$ and lesions are modeled as a completely independent class. Although such models have been used in the literature (Guttmann et al., 1999; Kikinis et al., 1999; Shiee et al., 2010; Sudre et al., 2015), their robustness may suffer when applied to subjects with no or very few lesions, such as controls or patients with early disease, since there is essentially no data to estimate the lesion intensity parameters from. In the other extreme case, the number of pseudo-voxels can be set to such a high value ($\nu \to \infty$) that the intensity parameters of the lesions are fully determined by those of WM. This effectively replaces the Gaussian intensity model for WM in SAMSEG by a distribution with longer tails, in the form of a mixture of two Gaussians with identical means ($\mu_{les} \equiv \mu_{WM}$) but variances that differ by a constant factor ($\Sigma_{les} \equiv \kappa \Sigma_{WM}$ vs. $\Sigma_{WM}$). In this scenario, MS lesions are detected as model outliers in a method using robust model parameter estimation (Huber, 1981), another technique that has also frequently been used in the literature (Aït-Ali et al., 2005; Bricq et al., 2008; García-Lorenzo et al., 2011; Liu et al., 2009; Prastawa and Gerig, 2008; Rousseau et al., 2008; Van Leemput et al., 2001).

Fig. 3. Lesion shape model architecture consisting of two symmetrical convolutional neural networks: (a) decoder network and (b) encoder network. The decoder network generates lesion segmentations from a low-dimensional code. Its architecture has ReLU activation functions ($f(x) = \max(0, x)$) and batch normalization (Ioffe and Szegedy, 2015) between each deconvolution layer, with the last layer having a sigmoid activation function, ensuring $0 \le f_i(h) \le 1$. The encoder network encodes lesion segmentations into a latent code. The main differences compared to the decoder network are the use of convolutional layers instead of deconvolutional layers and, to encode the mean and variance parameters, the last layer has been split in two, with no activation function for the mean and a softplus activation function ($f(x) = \ln(1 + e^x)$) for the variance.
Based on pilot experiments on a variety of datasets (distinct from the ones used in the results section), we found that good results are obtained by using an intermediate value of $\nu = 500$ pseudo-voxels for 1 mm³ isotropic scans, together with a scaling factor $\kappa = 50$. In order to adapt to different image resolutions, $\nu$ is scaled inversely proportionally with the voxel size in our implementation. We will visually demonstrate the role of these hyperparameters in constraining the lesion intensity parameters in Section 5.1.
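The pseudo-voxel interpretation of Eq. (4) can be made concrete. With the standard normal and inverse-Wishart density conventions, maximizing the prior of Eq. (4) times a weighted Gaussian likelihood gives closed-form estimates in which the $\nu$ pseudo-voxels are simply pooled with the real lesion voxels. The sketch below illustrates that derivation; it is not the authors' update (which is given in their Appendix B and not reproduced here), the function names are hypothetical, and reading "voxel size" as voxel volume is an assumption.

```python
import numpy as np

def map_lesion_gaussian(d, w, mu_wm, Sigma_wm, nu=500.0, kappa=50.0):
    """MAP mean and covariance of the lesion Gaussian under Eq. (4).

    d: (I, N) bias-corrected intensities; w: (I,) soft lesion weights.
    """
    total = w.sum() + nu                     # real plus pseudo voxels
    mu = (w @ d + nu * mu_wm) / total
    diff = d - mu
    scatter = (w[:, None] * diff).T @ diff   # weighted data scatter
    shift = mu - mu_wm
    Sigma = (scatter + nu * np.outer(shift, shift)
             + kappa * nu * Sigma_wm) / total
    return mu, Sigma

def scaled_nu(voxel_volume_mm3, nu_1mm=500.0):
    """Scale nu inversely with voxel volume (assumed meaning of voxel size)."""
    return nu_1mm / voxel_volume_mm3
```

With no lesion voxels the estimates fall back to $\mu_{WM}$ and $\kappa \Sigma_{WM}$, and with $\nu = 0$ they reduce to the ordinary weighted sample mean and covariance, matching the two extremes discussed in Section 3.2.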
3.3. Segmentation

As in the original SAMSEG method, segmentation proceeds by first obtaining point estimates $\hat{\theta}$ that fit the model to the data, and then inferring the corresponding segmentation posterior:

$$p(l, z | D, \hat{\theta}),$$

which is now jointly over $l$ and $z$ simultaneously. Unlike in SAMSEG, however, both steps are made intractable by the presence of the new variables $\theta_{les}$ and $h$ in the model. In order to side-step this difficulty, we obtain $\hat{\theta}$ through a joint optimization over both $\theta$ and $\theta_{les}$:

$$\{\hat{\theta}, \hat{\theta}_{les}\} = \arg\max_{\{\theta, \theta_{les}\}} p(\theta, \theta_{les} | D)$$

in a simplified model in which the constraints on lesion shape have been removed, by clamping all decoder network outputs $f_i(h)$ to value 1. This simplification is defensible since the aim here is merely to find appropriate model parameters, rather than highly accurate lesion segmentations. By doing so, the latent code $h$ is effectively removed from the model and the optimization simplifies into the one used in the original SAMSEG method, with only minor modifications due to the prior $p(\theta_{les} | \theta_d)$. Details are provided in Appendix B.

Once parameter estimates $\hat{\theta}$ are available, we compute segmentations using the factorization

$$p(l, z | D, \hat{\theta}) = p(z | D, \hat{\theta}) \, p(l | z, D, \hat{\theta}),$$

first estimating $z$ from $p(z | D, \hat{\theta})$ (Step 1 below), and then plugging this into $p(l | z, D, \hat{\theta})$ to estimate $l$ (Step 2):

Step 1: Evaluating $p(z | D, \hat{\theta})$ involves marginalizing over both $h$ and $\theta_{les}$, which we approximate by drawing $S$ Monte Carlo samples $\{h^{(s)}, \theta_{les}^{(s)}\}_{s=1}^{S}$ from $p(h, \theta_{les} | D, \hat{\theta})$:

$$p(z | D, \hat{\theta}) = \int_{h, \theta_{les}} p(z | D, \hat{\theta}, h, \theta_{les}) \, p(h, \theta_{les} | D, \hat{\theta}) \, \mathrm{d}h \, \mathrm{d}\theta_{les} \approx \frac{1}{S} \sum_{s=1}^{S} p(z | D, \hat{\theta}, h^{(s)}, \theta_{les}^{(s)}).$$

This allows us to estimate the probability of lesion occurrence in each voxel, which we then compare with a user-specified threshold value $\gamma$

$$p(z_i = 1 | d_i, \hat{\theta}) \gtrless \gamma$$

to obtain the final lesion segmentation $\hat{z}_i$. Details on how we approximate $p(z_i = 1 | d_i, \hat{\theta})$ using Monte Carlo sampling are provided in Appendix C.

Step 2: Voxels that are not assigned to lesion ($\hat{z}_i = 0$) in the previous step are finally assigned to the neuroanatomical structure with the highest probability $p(l_i = k | z_i = 0, d_i, \hat{\theta})$, which simply involves computing $\hat{l}_i = \arg\max_k \hat{w}_{i,k}$ with $\hat{w}_{i,k}$ defined in Eq. (2).
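The two steps can be sketched as follows, assuming the per-sample lesion probabilities and the assignments of Eq. (2) are already available. The function name `segment`, the default threshold of 0.5, and the use of index $K$ as the lesion label are illustrative assumptions, not part of the published method.

```python
import numpy as np

def segment(lesion_prob_samples, w, gamma=0.5):
    """Combine Monte Carlo lesion probabilities with whole-brain labels.

    lesion_prob_samples: (S, I) lesion probabilities, one row per sample
    w:                   (I, K) soft assignments from Eq. (2)
    Returns (labels, lesion_prob); label K marks lesion voxels.
    """
    # Step 1: average over the S samples, then threshold at gamma.
    lesion_prob = lesion_prob_samples.mean(axis=0)
    is_lesion = lesion_prob > gamma
    # Step 2: remaining voxels take the most probable structure.
    labels = np.argmax(w, axis=1)
    labels[is_lesion] = w.shape[1]           # hypothetical lesion label id
    return labels, lesion_prob
```

Raising `gamma` makes the lesion segmentation more conservative, trading sensitivity for fewer false positives.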
In agreement with other work (Aït-Ali et al., 2005; García-Lorenzo et al., 2011; Jain et al., 2015; Prastawa and Gerig, 2008; Shiee et al., 2010; Van Leemput et al., 2001), we have found that using known prior information regarding the expected intensity profile of MS lesions in various MRI contrasts can help reduce the number of false positive detections. Therefore, we prevent some voxels from being assigned to lesion (i.e., forcing $\hat{z}_i = 0$) based on their intensities in relation to the estimated intensity parameters $\{\hat{\mu}_k, \hat{\Sigma}_k\}_{k=1}^{K}$: In our current implementation only voxels with an intensity higher than the mean of the gray matter Gaussian in FLAIR and/or T2 (if these modalities are present) are considered candidate lesions.

Since estimating $p(z_i = 1 | d_i, \hat{\theta})$ involves repeatedly invoking the decoder and encoder networks of the lesion shape model, as detailed in Appendix C, we implemented the proposed method as an add-on to SAMSEG in Python using the Tensorflow library (Abadi et al., 2015). Estimating $\hat{\theta}$ has the same computational complexity as running SAMSEG (i.e., taking approximately 10 minutes on a state-of-the-art machine with an Intel 12-core i7-8700K CPU), while the Monte Carlo sampling takes an additional 5 minutes on a GeForce GTX 1060 graphics card, bringing the total computation time to around 15 minutes per subject.

Table 1. Summary of the datasets used in our experiments.
4. Evaluation datasets and benchmark methods
In this section, we describe four datasets that we will use for the experiments in this paper, including two taken from public challenges. We also outline two relevant methods for MS lesion segmentation that the proposed method is compared to in detail, as well as the metrics and measures used in our experiments.
4.1. Datasets
In order to test the proposed method and demonstrate its contrast-adaptiveness, we conducted experiments on four datasets acquired with different scanner platforms, field strengths, acquisition protocols and image resolutions:
• MSSeg: This dataset is the publicly available training set of the MS lesion segmentation challenge that was held in conjunction with the MICCAI 2016 conference (Commowick et al., 2018). It consists of 15 MS cases from three different scanners, all acquired using a harmonized imaging protocol (Cotton et al., 2015). For each patient a 3D T1w sequence, a contrast-enhanced (T1c) sequence, an axial dual PD-T2-weighted (T2w) sequence and a 3D fluid attenuation inversion recovery (FLAIR) sequence were acquired. Each subject's lesions were delineated by seven different raters on the FLAIR scan and, if necessary, corrected using the T2w scan. These delineated images were then fused to create a consensus lesion segmentation for each subject. Both raw images and pre-processed images (pre-processing steps: denoising, rigid registration, brain extraction and bias field correction; see Commowick et al. (2018) for details) were made available by the challenge organizers. In our experiments we used the pre-processed data, which required only minor modifications in our software to remove non-brain tissues from the model. We note that the original challenge also included a separate set of 38 test subjects, but at the time of writing this data is no longer available.
• Trio: This dataset consists of 40 MS cases acquired on a Siemens Trio 3T scanner at the Danish Research Center of Magnetic Resonance (DRCMR). For each patient, a 3D T1w sequence, a T2w sequence and a FLAIR sequence were acquired. Ground truth lesion segmentations were automatically delineated on the FLAIR images using Jim software³, and then checked and, if necessary, corrected by an expert rater at DRCMR using the T2w and MPRAGE images.
• Achieva: This dataset consists of 50 MS cases and 25 healthy controls acquired on a Philips Achieva 3T scanner at DRCMR. After a visual inspection of the images, we decided to remove 2 healthy controls from the dataset as they presented marked gray matter atrophy and white matter hyperintensities. For each patient, a 3D T1w sequence, a T2w sequence and a FLAIR sequence were acquired. Ground truth lesion segmentations were delineated using the same protocol as the one used for the Trio dataset.
• ISBI: This dataset is the publicly available test set of the MS lesion segmentation challenge that was held at the 2015 International Symposium on Biomedical Imaging (Carass et al., 2017). It consists of 14 longitudinal MS cases, with 4 to 6 time points each, separated by approximately one year. Images were acquired on a Philips 3T scanner. For each patient, a 3D T1w sequence, a T2w sequence, a PDw sequence and a FLAIR sequence were acquired. Images were first preprocessed (inhomogeneity correction, skull stripping, dura stripping, and a second inhomogeneity correction; see Carass et al. (2017) for details), and then registered to a 1 mm MNI template. Each subject's lesions were delineated by two different raters on the FLAIR scan and, if necessary, corrected using the other contrasts. As part of the challenge, a training dataset of 5 additional longitudinal MS cases is also available, with the same scanner, imaging protocols and delineation procedure as the test dataset.
3 http://www.xinapse.com/
A summary of the datasets, with scanner type, image modalities and voxel resolution details, can be found in Table 1. For each subject, all the contrasts were co-registered and resampled to the FLAIR scan for MSSeg, and to the T1w scan for Trio, Achieva and ISBI. This is the only preprocessing step required by the proposed method.
4.2. Benchmark methods for lesion segmentation
In order to evaluate the lesion segmentation component of the proposed method in detail, we compared it to two publicly available and widely used algorithms for MS lesion segmentation:
• LST-lga⁴ (Schmidt et al., 2012): This lesion growth algorithm starts by segmenting a T1w image into three main tissue classes (CSF, GM and WM) using SPM12⁵, and combines the resulting segmentation with co-registered FLAIR intensities to calculate a lesion belief map. A pre-chosen initial threshold $\kappa$ is then used to create an initial binary lesion map, which is subsequently grown along voxels that appear hyperintense in the FLAIR image. We set $\kappa$ to its recommended default value of 0.3, which was also used in previous studies (Mühlau et al., 2013; Rissanen et al., 2014).
• NicMSLesions⁶ (Valverde et al., 2017, 2019): This deep learning method is based on a cascade of two 3D convolutional neural networks, where the first one reveals possible candidate lesion voxels, and the second one reduces the number of false positive outcomes. Both networks were trained by the authors of the method on T1w and FLAIR scans coming from a publicly available training dataset of the MS lesion segmentation challenge held in conjunction with the MICCAI 2008 conference (Styner et al., 2008) (20 cases) and the MSSeg dataset (15 cases). This method was one of the top performers on the test dataset of the MICCAI 2016 challenge (Commowick et al., 2018), and one of the few methods for which an implementation is publicly available.
4 https://www.applied-statistics.de/lst.html
5 https://www.fil.ion.ucl.ac.uk/spm/software/spm12/
We note that both these benchmark methods specifically target T1w-FLAIR input, whereas the proposed method is not tuned to any particular combination of input modalities.
Although we only compared our method in detail to these two benchmarks, many more good methods for MS lesion segmentation exist. We refer the reader to the MSSeg paper (Commowick et al., 2018), the ISBI challenge paper (Carass et al., 2017) and the ISBI challenge website⁷ for a further comparison of the reported performance with that of other methods.
4.3. Metrics and measures
In order to evaluate the influence of varying the input modalities on the segmentation performance of the proposed method, and to assess segmentation accuracy with respect to that of other methods and human raters, we used a combination of segmentation volume estimates, Pearson correlation coefficients between such estimates and reference values, and Dice scores. Volumes were computed by counting the number of voxels assigned to a specific structure and converting into mm³, whereas Dice coefficients were computed as
$$\mathrm{Dice}_{X,Y} = \frac{2 \cdot |X \cap Y|}{|X| + |Y|},$$
where $X$ and $Y$ denote segmentation masks, and $|\cdot|$ counts the number of voxels in a mask.
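As a minimal sketch of these two measures, the following NumPy functions compute a Dice coefficient and a volume in mm³ from binary masks. The function names, the `voxel_size_mm` argument, and the convention that two empty masks give a Dice score of 1 are assumptions made here for illustration; the text does not state how the empty case is handled.

```python
import numpy as np

def dice(x, y):
    """Dice overlap 2|X ∩ Y| / (|X| + |Y|) between two binary masks."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    denom = x.sum() + y.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks are treated as perfect agreement
    return float(2.0 * np.logical_and(x, y).sum() / denom)

def volume_mm3(mask, voxel_size_mm):
    """Volume of a binary mask: voxel count times the volume of one voxel."""
    n_voxels = np.asarray(mask, dtype=bool).sum()
    return float(n_voxels * np.prod(voxel_size_mm))
```

For example, two four-voxel masks that each contain two voxels and share one of them have a Dice score of 0.5.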
The proposed method and both benchmark algorithms produce a probabilistic lesion map that needs to be thresholded to obtain a final lesion segmentation. This requires an appropriate threshold value to be set (variable $\gamma$ in the proposed method). In order to ensure an objective comparison between the methods, we used a leave-one-out cross-validation strategy in which the threshold for each test image was set to the value that maximizes the average Dice overlap with manual segmentations in all the other images of the same dataset. For the reported performance of the methods on the ISBI dataset, the thresholds were instead tuned on the 5 training subjects that are part of the challenge.
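The leave-one-out threshold selection can be sketched as follows. This is a hedged illustration rather than the code used in the experiments: the grid of candidate thresholds, the function names, and the `>=` thresholding rule are assumptions.

```python
import numpy as np

def dice(x, y):
    """Dice overlap between two binary masks (empty-vs-empty counted as 1)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    denom = x.sum() + y.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(x, y).sum() / denom)

def best_threshold(prob_maps, manual_masks, thresholds):
    """Threshold that maximizes the mean Dice over the given images."""
    mean_dice = [
        np.mean([dice(p >= t, m) for p, m in zip(prob_maps, manual_masks)])
        for t in thresholds
    ]
    return thresholds[int(np.argmax(mean_dice))]

def leave_one_out_thresholds(prob_maps, manual_masks, thresholds):
    """For each test image, tune the threshold on all other images of the dataset."""
    n = len(prob_maps)
    chosen = []
    for test in range(n):
        train = [j for j in range(n) if j != test]
        chosen.append(best_threshold([prob_maps[j] for j in train],
                                     [manual_masks[j] for j in train],
                                     thresholds))
    return chosen
```

Each test image is then segmented with its own entry of the returned list, so its manual segmentation never influences the threshold applied to it.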
5. Results
In this section, we first illustrate the effect of the various components of our model. We then evaluate how the proposed model adapts to different input modalities and acquisition platforms. Subsequently we compare the lesion segmentation performance of our model against that of the two benchmark methods, relate it to human inter-rater variability, and analyze its performance on the ISBI challenge data. Finally, we perform an indirect validation of the whole-brain segmentation component of the method.
Throughout the section we use boxplots to show some of the results. In these plots, the median is indicated by a horizontal line, plotted inside boxes that extend from the first to the third quartile values of the data. The range of the data is indicated by whiskers extending from the boxes, with outliers represented by circles.
6 https://github.com/sergivalverde/nicMsLesions
7 https://smart-stats-tools.org/lesion-challenge
5.1. Illustration of the method
In order to illustrate the effect of the various components of the method, here we analyze its behaviour when segmenting T1w-FLAIR scans of two MS subjects, one with a low and one with a high lesion load. Fig. 4 shows, in addition to the input data and the final lesion probability estimate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$, also an intermediate lesion probability obtained with the simplified model used to estimate $\hat{\boldsymbol{\theta}}$, i.e., before the FLAIR-based intensity constraints and the lesion shape constraints are applied. From these images we can see that the lesion shape model and the intensity constraints help remove false positive detections and enforce more realistic shapes of lesions, especially for the case with low lesion load.
Fig. 5 analyzes the effect of the prior $p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$ on the lesion intensity parameters $\boldsymbol{\theta}_{les}$ for the two subjects shown in Fig. 4. When the lesion load is high, the prior does not have a strong influence, leaving the lesion Gaussian "free" to fit the data. However, when the lesion load is low, the lesion Gaussian is constrained to retain a wide variance and a mean close to the mean of WM, effectively turning the model into an outlier detection method for WM lesions. This behavior is important in cases when few lesions are present in the images, ensuring the method works robustly even when only limited data is available to estimate the lesion Gaussian parameters.
In order to analyze the effect of the lesion shape prior, we compared the lesion segmentation performance of the proposed method with that obtained when the shape prior was intentionally removed from the model (i.e., all the decoder network outputs $f_i(\mathbf{h})$ clamped to value 1). For a fair comparison, the lesion threshold value $\gamma$ was re-tuned to maximize performance for the method without shape prior, in the way described in Section 4.3. Table 2 summarizes the results across the MSSeg, Trio and Achieva datasets, for different ranges of lesion load. In addition to Dice scores, the table also reports results for precision and recall, defined as
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
where TP, FP and FN count the true positive, false positive and false negative voxels compared to the manual segmentation. The results indicate that, although performance is unchanged for high lesion loads, for which segmentation is generally easier (Commowick et al., 2018), the lesion shape prior clearly improves segmentations in subjects with small and medium lesion loads.
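The voxel-wise precision and recall defined above can be computed from two binary masks as in the following sketch. The function name and the choice to return 0 when a denominator is zero are assumptions made here; the text does not specify that edge case.

```python
import numpy as np

def precision_recall(segmentation, manual):
    """Voxel-wise precision TP/(TP+FP) and recall TP/(TP+FN)."""
    seg = np.asarray(segmentation, dtype=bool)
    ref = np.asarray(manual, dtype=bool)
    tp = int(np.sum(seg & ref))    # lesion in both masks
    fp = int(np.sum(seg & ~ref))   # lesion only in the automatic segmentation
    fn = int(np.sum(~seg & ref))   # lesion only in the manual segmentation
    # Assumption: report 0 when nothing was segmented / nothing was annotated.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```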
In order to demonstrate that the model also works robustly in control subjects (with no lesions at all), and can therefore be safely applied in studies comparing MS subjects with controls, we further segmented T1w-FLAIR scans of the Achieva dataset, and computed the total volume of the lesions in each subject. The results are shown in Fig. 6; the volumes were 8.95 ± 9.18 ml for MS subjects vs. 0.98 ± 0.77 ml for controls. Although the average lesion volume for controls was not exactly zero, visual inspection revealed that this was due to some controls having WM hyperintensities that were segmented by the method as MS lesions, which we find acceptable.
Fig. 4. Illustration of how intensity constraints and the lesion shape model help reduce false positive lesion detections in the method. Top row: a subject with a low lesion load; Bottom row: a subject with a high lesion load. From left to right: T1w and FLAIR input; intermediate lesion probability obtained with the simplified model used to estimate $\hat{\boldsymbol{\theta}}$; mask of candidate voxels based on intensity alone (intensity higher than the mean gray matter intensity in FLAIR); and final lesion probability estimate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ produced by the method.
Fig. 5. Illustration of the effect of the prior $p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$ on the lesion intensity parameters, both in the case of a lesion load that is low (left, corresponding to the subject in the top row of Fig. 4) and high (right, corresponding to the subject in the bottom row of Fig. 4). The illustration is from the Monte Carlo sampling phase of the method: in each case, the value of the parameters of the lesion Gaussian is taken as the average over the Monte Carlo samples $\{\boldsymbol{\theta}_{les}^{(s)}\}_{s=1}^{S}$, and the points represent the resulting lesion posterior estimate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ in each voxel.
Table 2
Comparison in terms of lesion segmentation performance between the proposed method and a method where the lesion shape model was intentionally removed. Results are expressed in terms of mean ± standard deviation of Dice overlap, precision and recall for different ranges of lesion load. Lesion segmentations were computed across three different datasets (MSSeg, Trio and Achieva) on T1w-FLAIR input.
| Lesion load [ml] | Dice, shape model | Dice, no shape model | Precision, shape model | Precision, no shape model | Recall, shape model | Recall, no shape model |
|---|---|---|---|---|---|---|
| (0, 2] | 0.42 (± 0.10) | 0.38 (± 0.10) | 0.32 (± 0.12) | 0.24 (± 0.07) | 0.28 (± 0.09) | 0.24 (± 0.07) |
| (2, 10] | 0.50 (± 0.13) | 0.47 (± 0.13) | 0.37 (± 0.13) | 0.33 (± 0.11) | 0.34 (± 0.12) | 0.32 (± 0.12) |
| (10, ∞) | 0.70 (± 0.11) | 0.70 (± 0.11) | 0.62 (± 0.20) | 0.62 (± 0.20) | 0.55 (± 0.12) | 0.55 (± 0.13) |
| (0, ∞) | 0.57 (± 0.16) | 0.55 (± 0.17) | 0.46 (± 0.20) | 0.43 (± 0.20) | 0.42 (± 0.16) | 0.40 (± 0.16) |
Fig. 6. Difference between healthy controls (HC) and MS subjects in lesion volume, as detected by the proposed method on the Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR input). Lines indicate means across subjects.
5.2. Scanner and contrast adaptive segmentations
In order to demonstrate the ability of our method to adapt to different types and combinations of MRI sequences acquired with different scanners, we show the method's segmentation results along with the manual segmentations for a representative subset of combinations for one subject in the MSSeg (consensus as manual segmentation), the Trio and the Achieva datasets in Fig. 7. It is not feasible to show all possible combinations. For instance, mixing the 5 contrasts in the MSSeg dataset alone already yields 31 possible multi-contrast combinations (all $2^5 - 1$ non-empty subsets). Nonetheless, it is clear that the model is indeed able to adapt to the specific contrast properties of its input scans. A visual inspection of its whole-brain segmentation component seems to indicate that the method benefits from having access to the T1w contrast for best performance. This is especially clear when only the FLAIR contrast is provided, as this visually degrades the segmentation of the white-gray boundaries in the cortical regions due to the low contrast between white and gray matter in FLAIR.
When comparing the lesion probability maps produced by the method visually with the corresponding manual lesion segmentations, it seems that the method benefits from having access to the FLAIR contrast for the best lesion segmentation performance. This is confirmed by a quantitative analysis shown in Fig. 8, which plots the Dice overlap scores for each of the seven input combinations that all our three datasets have in common, namely T1w, T2w, FLAIR, T1w-T2w, T1w-FLAIR, T2w-FLAIR, and T1w-T2w-FLAIR. Although the inclusion of additional contrasts does not hurt lesion segmentation performance, across all three datasets the best results are obtained whenever the FLAIR contrast is included as input to the model. This finding is perhaps not surprising, given that the manual delineations were all primarily based on the FLAIR image.
Considering both the whole-brain and lesion segmentation performance together, we conclude that the combination T1w-FLAIR is well-suited for obtaining good results with the proposed method, although it will also accept other and/or additional contrasts beyond T1w and FLAIR.
8 https://smart-stats-tools.org/lesion-challenge
5.3. Lesion segmentation
In order to compare the lesion segmentation performance of our model against that of the two benchmark methods, and relate it to human inter-rater variability, we here present a number of results based on the T1w-FLAIR input combination (which is the combination required by the benchmark methods). We also analyze the lesion segmentation performance of our method on the public ISBI challenge.
5.3.1. Comparison with benchmark lesion segmentation methods
Fig. 9 shows automatic segmentations of two randomly selected subjects from each of the MSSeg, Trio and Achieva datasets, both for our method and for the two benchmark methods LST-lga and NicMSLesions, along with the corresponding manual segmentations (consensus manual segmentations for MSSeg). Visually, all three methods perform similarly on the Achieva MS data, but some of the results for NicMSLesions appear to be inferior to those obtained with the other two methods on MSSeg and Trio data. This qualitative observation is confirmed by the quantitative analysis shown in Fig. 10, where the three methods' Dice overlap scores are compared on each dataset: similar performances are obtained for all methods on the Achieva data, but NicMSLesions trails the other two methods on MSSeg and Trio data. Especially for MSSeg data this is a surprising result, since NicMSLesions was trained on this specific dataset, i.e., the subjects used for testing were part of the training data of this method, potentially biasing the results in favor of NicMSLesions. Based on Dice scores, the proposed method outperforms LST-lga on MSSeg data, although there are no statistically significant differences between the two methods on the other datasets.
5.3.2. Results on the ISBI data
We also evaluated the performance of the proposed method on the ISBI challenge data, obtaining a mean Dice score of 0.58 when T1w-FLAIR input is used. This score is comparable to the ones we obtained on the other three datasets analyzed in this paper (cf. Fig. 10): 0.65 for MSSeg, 0.58 for Trio and 0.54 for Achieva. A few example segmentation results on the ISBI data are available in the Supplementary Material, Fig. 4.
The ISBI challenge website⁸ ranks submissions according to an overall lesion segmentation performance score that takes into account Dice overlap, volume correlation, surface distance, and a few other metrics (see Carass et al., 2017 for details). A score of 100 indicates perfect correspondence, while 90 is meant to correspond to human inter-rater performance (Carass et al., 2017; Styner et al., 2008). We obtained a score of 87.87, which places us around half-way in the ranking of the original challenge (Carass et al., 2017), although we note that the website currently lists methods with a much higher score.
In order to relate the performance of our method to the one obtained with the two benchmark methods, we also attempted to run LST-lga and NicMSLesions on this dataset. However, the preprocessing applied to the ISBI challenge data proved problematic for LST-lga, and we were not able to get any results with this method. Results for NicMSLesions in terms of Dice overlap are shown in Fig. 11, together with those obtained with the proposed method. It is clear that NicMSLesions suffers strongly from the domain shift between its training data and the ISBI data, a fact that was already reported in Valverde et al. (2019). For completeness, Fig. 11 also includes results for NicMSLesions when its network was updated on the ISBI training data as described in Valverde et al. (2019): different subsets of network parameters were retrained on the baseline scan of each of the five ISBI training subjects, and the combination that performed best on all 21 training images was retained. From the figure it can be seen that this partially retrained network has comparable performance to the proposed model, although the latter attains this performance without any retraining.
5.3.3. Inter-rater variability
To evaluate the proposed method's lesion segmentation performance in the context of human inter-rater variability, we took advantage of the availability of lesion segmentations by seven different raters in the MSSeg dataset. Table 3 shows the lesion segmentation performance in terms of average Dice overlap between each pair of the seven raters, and between each rater and the proposed method. On average, our method achieves a Dice overlap score of 0.57, which is slightly below the mean human raters' range of [0.59, 0.69]. We note that this result is in line with those obtained in the MSSeg challenge (Commowick et al., 2018).
Fig. 7. Contrast-adaptiveness of the proposed method to different combinations of input modalities. Segmentations are shown for one subject of the MSSeg (top row), the Trio (middle row) and the Achieva MS (bottom row) dataset. For each subject the top row shows slices of the data and the manual lesion annotation; the middle row shows the lesion probability map and Dice score computed by the proposed method for specific input combinations; and the bottom row shows the corresponding complete segmentations produced by the method. Enlarged figures for each subject are available in the Supplementary Material Figs. 1–3.
Fig. 8. Lesion segmentation performance of the proposed method in terms of Dice overlap with manual raters on three different datasets when different input contrasts are used (T1w, T2w, FLAIR, T1w-T2w, T1w-FLAIR, T2w-FLAIR, T1w-T2w-FLAIR). From left to right: Dice scores on MSSeg, Trio and Achieva MS data.
Table 3
Comparison of lesion segmentation performance in terms of average Dice score between each pair of the seven raters of the MSSeg dataset, and between each rater and the proposed method (T1w-FLAIR input).
5.4. Whole-brain segmentation
Since no ground truth segmentations are available for a direct evaluation of the whole-brain segmentation component of our method, we performed an indirect validation, evaluating its potential for replacing lesion filling approaches that rely on manually annotated lesions, as well as its ability to replicate known atrophy patterns in MS. The results concentrate on the following 25 main neuroanatomical regions, segmented from T1w-FLAIR scans: left and right cerebral white matter, cerebellum white matter, cerebral cortex, cerebellum cortex, lateral ventricle, hippocampus, thalamus, putamen, pallidum, caudate, amygdala, nucleus accumbens and brain stem. To avoid cluttering, the quantitative results for left and right structures are averaged. We note that lesion segmentations are not merged into any of these brain structures (i.e., leaving "holes" in white matter), so that the results reflect performance only for the normal-appearing parts of structures.
5.4.1. Comparison with lesion filling
It is well-known that white matter lesions can severely interfere with the quantification of normal-appearing structures when standard brain MRI segmentation techniques are used (Battaglini et al., 2012; Ceccarelli et al., 2012; Chard et al., 2010; Gelineau-Morel et al., 2012; Nakamura and Fisher, 2009; Vrenken et al., 2013). A common strategy is therefore to use a lesion-filling (Chard et al., 2010; Sdika and Pelletier, 2009) procedure, in which lesions are first manually segmented, their original voxel intensities are replaced with normal-appearing white matter intensities, and standard tools are then used to segment the resulting, preprocessed images. Using such a procedure with SAMSEG would yield whole-brain segmentations that can serve as "silver standard" benchmarks against which the results of the proposed method (which works directly on the original scans) can be compared. In practice, however, we have noticed that replacing lesion intensities, which is typically done in T1w only, did not work well in FLAIR in our experiments. Therefore, rather than explicitly replacing intensities, we obtained silver standard segmentations by simply masking out lesions during the SAMSEG processing, effectively ignoring lesion voxels during the model fitting.
We wished to interpret segmentation vs. silver standard discrepancies within the context of the human inter-rater variability associated with manually segmenting lesions. Therefore, we performed experiments on the MSSeg dataset, repeatedly re-computing the silver standard using each of the seven raters' manual lesion annotations in turn. The results are shown in Tables 4 and 5 for Pearson correlation coefficients between estimated volumes and Dice segmentation overlap scores, respectively. Each line in these tables corresponds to one structure, showing the average consistency between the silver standard of each rater compared to that of the six other raters, as well as the average consistency between the proposed method's segmentation and the silver standards of all raters. The results indicate that, in terms of Pearson correlation coefficient, the performance of our method falls within the range of inter-rater variability, albeit narrowly (average value 0.988 vs. inter-rater range [0.988, 0.992]). In terms of Dice scores, however, the method slightly underperforms compared to the inter-rater variability (average value 0.971 vs. inter-rater range [0.978, 0.980]).
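The consistency measure behind one line of Table 4 can be sketched as follows for a single brain structure. The array layout (raters by subjects) and the function names are assumptions made for illustration; the same averaging pattern would apply to Dice scores by swapping the pairwise measure.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two volume vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def volume_consistency(rater_volumes, method_volumes):
    """Average Pearson correlations for one structure.

    rater_volumes:  (R, S) silver-standard volumes, R raters by S subjects
    method_volumes: (S,) volumes from the fully automatic method
    Returns (per-rater average against the other raters,
             method average against all raters).
    """
    rater_volumes = np.asarray(rater_volumes, dtype=float)
    n_raters = rater_volumes.shape[0]
    rater_scores = []
    for k in range(n_raters):
        others = [pearson(rater_volumes[k], rater_volumes[j])
                  for j in range(n_raters) if j != k]
        rater_scores.append(float(np.mean(others)))
    method_score = float(np.mean([pearson(method_volumes, rater_volumes[j])
                                  for j in range(n_raters)]))
    return rater_scores, method_score
```

The method falls within the inter-rater range when `method_score` lies between the smallest and largest entries of `rater_scores`.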
5.4.2. Detecting atrophy patterns in MS
In a final analysis, we assessed whether previously reported volume reductions in specific brain structures in MS can automatically be detected with the proposed method. Towards this end, we segmented the 23 controls and the 50 MS subjects of the Achieva dataset, and compared the volumes of various structures between the two groups. Volumes were normalized for age, gender and total intracranial volume by regressing them out with a general linear model. The intracranial volume used for the normalization was computed by summing the volumes of all the structures, as segmented by the method, within the intracranial vault. The results are shown in Fig. 12. Although not all volumes showed a significant difference between groups, well established differences were replicated. In particular, we demonstrated decreased volumes of cerebral white matter, cerebral cortex, thalamus and caudate (Azevedo et al., 2018; Chard et al., 2002; Houtchens et al., 2007) as well as an increased volume of the lateral ventricles (Zivadinov et al., 2016).
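One way to regress covariates out of a set of volumes and then compare two groups is sketched below. The exact design matrix of the general linear model is not given in the text, so this is only an illustration under stated assumptions: covariates are mean-centered so that the normalized volumes keep their original mean, group membership is not included in the model, and the group comparison uses the Welch t statistic (the test named in the caption of Fig. 12) without computing a p-value.

```python
import numpy as np

def regress_out(volumes, covariates):
    """Remove the linear effect of covariates (e.g. age, gender, ICV) from volumes."""
    volumes = np.asarray(volumes, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]
    cov = cov - cov.mean(axis=0)  # center so the mean volume is preserved
    design = np.column_stack([np.ones(len(volumes)), cov])
    beta, *_ = np.linalg.lstsq(design, volumes, rcond=None)
    return volumes - cov @ beta[1:]  # subtract only the covariate effects

def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se2 = a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(se2))
```

In use, `regress_out` would be applied to the volumes of one structure across all 73 subjects, after which `welch_t` compares the normalized volumes of the MS and control groups.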
. Discussion and conclusion
In this paper, we have proposed a method for the simultaneous seg-entation of white matter lesions and normal-appearing neuroanatom-
cal structures from multi-contrast brain MRI scans of MS patients. Theethod integrates a novel model for white matter lesions into a previ-
usly validated generative model for whole-brain segmentation. By us-ng separate models for the shape of anatomical structures and their ap-earance in MRI, the algorithm is able to adapt to data acquired with dif-erent scanners and imaging protocols without needing to be retrained.
e validated the method using four disparate datasets, showing robusterformance in white matter lesion segmentation while simultaneouslyegmenting dozens of other brain structures. We further demonstratedhat it can also be safely applied to MRI scans of healthy controls, andeplicate previously documented atrophy patterns in deep gray mattertructures in MS. The proposed algorithm is publicly available as partf the open-source neuroimaging package FreeSurfer.
By performing both whole-brain and white matter lesion segmenta-ion at the same time, the method we propose aims to supplant the two-
-
S. Cerri, O. Puonti, D.S. Meier et al. NeuroImage 225 (2021) 117471
Fig. 9. Visual comparison of lesion probability maps on three different datasets for the proposed method and two state-of-the-art lesion segmentation methods (LST-lga and NicMsLesions) on T1w-FLAIR input. (Top) Two subjects from the MSSeg dataset; (Middle) Two subjects from the Trio dataset; (Bottom) Two subjects from the Achieva dataset. For each subject the top row shows slices of the data and the manual annotation while the bottom row shows the lesion probability maps for our model, LST-lga and NicMsLesions.
s r a s w b
s a o m d w
tage โlesion filling โ procedure that is commonly used in morphomet-ic studies in MS, in which lesions segmented in a first step are used tovoid biasing a subsequent analysis of normal-appearing structures withoftware tools developed for healthy brain scans. In order to evaluatehether our method is successful in this regard, we compared its whole-rain segmentation performance against the results obtained when le-
ions are segmented a priori by seven different human raters instead ofutomatically by the method itself. Our results show that the volumesf various neuroanatomical structures obtained when lesions are seg-ented automatically fall within the range of inter-rater variability, in-icating that the proposed method may be used instead of lesion fillingith manual lesion segmentations in large volumetric studies of brain
-
S. Cerri, O. Puonti, D.S. Meier et al. NeuroImage 225 (2021) 117471
Table 4
Average Pearson correlation coefficients of brain structure volume estimates between the silver standard of each rater compared to that of the six other raters in the MSSeg dataset, as well as the average consistency between the proposed methodโs segmentation and the silver standards of all raters (T1w-FLAIR input). Each line shows an average across raters for a specific brain structure.
Table 5
Same as Table 4 , but with Dice segmentation overlap scores. Each line shows an average across raters โ similar to the last row of Table 3 โ for a specific brain structure.
Fig. 10. Lesion segmentation performance in terms of Dice overlap with manual raters for the proposed method and two benchmark methods (LST-lga and NicMsLesions) on T1w-FLAIR input. Statistically significant differences between two methods, computed with a two-tailed paired t-test, are indicated by asterisks ("***" for p-value < 0.001, "**" for p-value < 0.01 and "*" for p-value < 0.05). From left to right: results on the MSSeg, the Trio and the Achieva dataset.
Fig. 11. Lesion segmentation performance in terms of Dice overlap with manual raters on the ISBI dataset for the proposed method, NicMsLesions, and NicMsLesions with partial retraining (see text for details). Statistically significant differences between two methods, computed with a two-tailed paired t-test, are indicated by asterisks ("***" indicates p-value < 0.001).
atrophy in MS. When detailed spatial overlap is analyzed, however, we found that the automatic segmentation does not fully reach the performance obtained with human lesion annotation as measured by Dice overlap.
Like many other methods for MS lesion segmentation, the method proposed here produces a spatial map indicating in each voxel its probability of belonging to a lesion, which can then be thresholded to obtain a final lesion segmentation. Although in our experience good results can be obtained by using the same threshold value across datasets (e.g., $\gamma = 0.5$), changing this value allows one to adjust the trade-off between false positive and false negative lesion detections. Since some MRI sequences and scanners will depict lesions with a higher contrast than others, and because there is often considerable disagreement between human experts regarding the exact extent of lesions (Zijdenbos et al., 1998), in our implementation we therefore expose this threshold value as an optional, tunable parameter to the end-user. Suitable threshold values can be found by visually inspecting the lesion segmentations of a few cases or, in large-scale studies, using cross-validation as we did in our experiments.
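The effect of the threshold on the two error types can be illustrated with a minimal sketch. The function names, the toy probability values and the reference mask below are hypothetical and only serve the illustration; they are not taken from the released software.

```python
import numpy as np

def threshold_lesion_map(prob, gamma=0.5):
    """Binarize a voxel-wise lesion probability map at threshold gamma."""
    return prob >= gamma

def detection_errors(pred, truth):
    """Count false-positive and false-negative voxels against a reference mask."""
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return fp, fn

# Toy probability map and hypothetical reference mask (illustrative values only).
prob = np.array([0.05, 0.30, 0.45, 0.55, 0.80, 0.95])
truth = np.array([False, False, True, True, True, True])

for gamma in (0.3, 0.5, 0.7):
    fp, fn = detection_errors(threshold_lesion_map(prob, gamma), truth)
    print(f"gamma={gamma:.1f}: false positives={fp}, false negatives={fn}")
```

In this toy case, lowering the threshold removes false negatives at the cost of a false positive, and raising it does the opposite, which is the trade-off described above.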
By providing the ability to robustly and efficiently segment multi-contrast scans of MS patients across a wide range of imaging equipment and protocols, the software tool presented here may help facilitate large cohort studies aiming to elucidate the morphological and temporal dynamics underlying disease progression and accumulation of disability in MS. Furthermore, in current clinical practice, high-resolution multi-contrast images, which can be used to increase the accuracy of lesion segmentation, represent a significantly increased burden for the neuroradiologist to read, and are hence frequently not acquired. The emergence of robust, multi-contrast segmentation tools such as ours may help break the link between the resolution and number of contrasts of the acquired data and the human time needed to evaluate it, thus potentially increasing the accuracy of the resulting measures.
The ability of the proposed method to automatically tailor its appearance models for specific datasets makes it very flexible, allowing it to seamlessly take advantage of novel, potentially more sensitive and specific MRI acquisitions as they are developed. Although not extensively tested, the proposed method should make it possible to, with minimal adjustments, segment data acquired with advanced research sequences such as MP2RAGE (Marques et al., 2010), DIR (Redpath and Smith, 1994), FLAIR² (Wiggermann et al., 2016) or T2* (Anderson et al., 2001), both at conventional and at ultra-high magnetic field strengths. We are currently pursuing several extensions of the proposed method, including the ability to go on and create cortical surfaces and parcellations in FreeSurfer, as well as a dedicated version for longitudinal data (Cerri et al., 2020).
Fig. 12. Differences between healthy controls (HC) and MS subjects in normalized volume estimates of various neuroanatomical structures, as detected by the proposed method on the Achieva dataset (23 HC subjects, 50 MS subjects, T1w-FLAIR input). Statistically significant differences between the two groups, computed with a Welch's t-test, are indicated by asterisks ("**" for p-value < 0.01 and "*" for p-value < 0.05).
Declaration of Competing Interest
Hartwig R. Siebner has received honoraria as speaker from Sanofi Genzyme, Denmark and Novartis, Denmark, as consultant from Sanofi Genzyme, Denmark and as senior editor (NeuroImage) from Elsevier Publishers, Amsterdam, The Netherlands. He has received royalties as book editor from Springer Publishers, Stuttgart, Germany.
CRediT authorship contribution statement
Stefano Cerri: Conceptualization, Methodology, Software, Formal analysis, Validation, Visualization, Writing - original draft. Oula Puonti: Supervision, Methodology, Software, Writing - review & editing. Dominik S. Meier: Resources, Writing - review & editing. Jens Wuerfel: Resources, Writing - review & editing. Mark Mühlau: Resources, Writing - review & editing, Funding acquisition. Hartwig R. Siebner: Supervision, Resources, Writing - review & editing, Funding acquisition. Koen Van Leemput: Supervision, Conceptualization, Methodology, Software, Writing - review & editing, Funding acquisition.
Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 765148, as well as from the National Institute Of Neurological Disorders and Stroke under project number R01NS112161. Hartwig R. Siebner holds a 5-year professorship in precision medicine at the Faculty of Health Sciences and Medicine, University of Copenhagen which is sponsored by the Lundbeck Foundation (Grant Nr. R186-2015-2138). Mark Mühlau was supported by the German Research Foundation (Priority Program SPP2177, Radiomics: Next Generation of Biomedical Imaging), project number 428223038.
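Appendix A. Parameter optimization in SAMSEG
We here describe how we perform the optimization of $p(\boldsymbol{\theta} \mid \mathbf{D})$ with respect to $\boldsymbol{\theta}$ in the original SAMSEG model. We follow a coordinate ascent approach, in which a limited-memory BFGS optimization of $\boldsymbol{\theta}_x$ is interleaved with a generalized EM (GEM) optimization of the remaining parameters $\boldsymbol{\theta}_d$. The GEM algorithm was derived in (Van Leemput et al., 1999) based on (Wells et al., 1996), and is repeated here for the sake of completeness. It iteratively constructs a tight lower bound to the objective function by computing the soft label assignments $w_{i,k}$ based on the current estimate of $\boldsymbol{\theta}_d$ (Eq. (2)), and subsequently improves the lower bound (and therefore the objective function) using the following set of analytical update equations for these parameters:
\[
\boldsymbol{\mu}_k \leftarrow \mathbf{m}_k \quad \text{and} \quad \boldsymbol{\Sigma}_k \leftarrow \mathbf{V}_k, \quad \forall k,
\]
\[
\begin{pmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_N \end{pmatrix}
\leftarrow
\begin{pmatrix}
\boldsymbol{\Phi}^T \mathbf{S}^{1,1} \boldsymbol{\Phi} & \cdots & \boldsymbol{\Phi}^T \mathbf{S}^{1,N} \boldsymbol{\Phi} \\
\vdots & \ddots & \vdots \\
\boldsymbol{\Phi}^T \mathbf{S}^{N,1} \boldsymbol{\Phi} & \cdots & \boldsymbol{\Phi}^T \mathbf{S}^{N,N} \boldsymbol{\Phi}
\end{pmatrix}^{-1}
\begin{pmatrix}
\boldsymbol{\Phi}^T \left( \sum_{n=1}^{N} \mathbf{S}^{1,n} \mathbf{r}^{1,n} \right) \\
\vdots \\
\boldsymbol{\Phi}^T \left( \sum_{n=1}^{N} \mathbf{S}^{N,n} \mathbf{r}^{N,n} \right)
\end{pmatrix},
\]
where
\[
\mathbf{m}_k = \frac{\sum_{i=1}^{I} w_{i,k} \, (\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i)}{N_k}
\quad \text{with} \quad
N_k = \sum_{i=1}^{I} w_{i,k},
\]
\[
\mathbf{V}_k = \frac{\sum_{i=1}^{I} w_{i,k} \, (\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i - \mathbf{m}_k)(\mathbf{d}_i - \mathbf{C} \boldsymbol{\phi}_i - \mathbf{m}_k)^T}{N_k},
\]
\[
\boldsymbol{\Phi} = \begin{pmatrix} \phi_1^1 & \cdots & \phi_P^1 \\ \vdots & \ddots & \vdots \\ \phi_1^I & \cdots & \phi_P^I \end{pmatrix},
\quad
\mathbf{S}^{m,n} = \mathrm{diag}\left( s_i^{m,n} \right),
\quad
\mathbf{r}^{m,n} = \begin{pmatrix} r_1^{m,n} \\ \vdots \\ r_I^{m,n} \end{pmatrix}
\]
and
\[
s_i^{m,n} = \sum_{k=1}^{K} s_{i,k}^{m,n},
\quad
s_{i,k}^{m,n} = w_{i,k} \left( \boldsymbol{\Sigma}_k^{-1} \right)_{m,n},
\quad
r_i^{m,n} = d_i^n - \frac{\sum_{k=1}^{K} s_{i,k}^{m,n} \, (\boldsymbol{\mu}_k)_n}{\sum_{k=1}^{K} s_{i,k}^{m,n}}.
\]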
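The Gaussian part of the update above (the quantities $N_k$, $\mathbf{m}_k$ and $\mathbf{V}_k$) can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that the bias-field-corrected intensities $\mathbf{d}_i - \mathbf{C}\boldsymbol{\phi}_i$ are already available as an array; the function name and array layout are hypothetical, and the bias field coefficient update is not covered.

```python
import numpy as np

def update_gaussians(residuals, w):
    """Weighted mean and covariance per class.

    residuals: (I, N) array of bias-corrected intensities d_i - C phi_i.
    w:         (I, K) array of soft label assignments w_{i,k}.
    Returns N_k with shape (K,), m_k with shape (K, N), V_k with shape (K, N, N).
    """
    n_k = w.sum(axis=0)
    means = (w.T @ residuals) / n_k[:, None]
    n_classes, n_contrasts = w.shape[1], residuals.shape[1]
    covs = np.empty((n_classes, n_contrasts, n_contrasts))
    for k in range(n_classes):
        centered = residuals - means[k]
        # Weighted scatter around m_k, normalized by N_k.
        covs[k] = (w[:, k, None] * centered).T @ centered / n_k[k]
    return n_k, means, covs
```

With hard (0/1) assignments this reduces to the ordinary per-class sample mean and covariance, which is a convenient sanity check.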
Appendix B. Parameter optimization
Here we describe how we perform the optimization of $p(\boldsymbol{\theta}, \boldsymbol{\theta}_{les} \mid \mathbf{D})$ with respect to $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_{les}$ in the augmented model of Sec. 3 with the decoder outputs $f_i(\mathbf{h})$ all clamped to value 1. In that case, the model can be reformulated in the same form as the original SAMSEG model, so that the same optimization strategy can be used. In particular, lesions can be considered to form an extra class (with index $K+1$) in a SAMSEG model with $K+1$ labels, provided that the mesh vertex label probabilities
\[
\tilde{\alpha}_j^k =
\begin{cases}
\beta_j & \text{if } k = K+1 \text{ (lesion)}, \\
\alpha_j^k \, (1 - \beta_j) & \text{otherwise}
\end{cases}
\]
are used instead of the original $\alpha_j^k$'s in the atlas interpolation model of Eq. (1).
The optimization described in Appendix A does require one modification because of the prior $p(\boldsymbol{\theta}_{les} \mid \boldsymbol{\theta}_d)$ binding the means and variances of the WM and lesion classes together. The following altered update equations for these parameters guarantee that the EM lower bound, and therefore the objective function, is improved in each iteration of the GEM algorithm:
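\[
\boldsymbol{\mu}_{WM} \leftarrow
\left( N_{WM} \mathbf{I} + \frac{\nu N_{les}}{N_{les} + \nu} \boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1} \right)^{-1}
\left( N_{WM} \mathbf{m}_{WM} + \frac{\nu N_{les}}{N_{les} + \nu} \boldsymbol{\Sigma}_{WM} \boldsymbol{\Sigma}_{les}^{-1} \mathbf{m}_{les} \right),
\]
\[
\boldsymbol{\Sigma}_{WM} \leftarrow
\frac{N_{WM} \mathbf{V}_{WM} + \boldsymbol{\Sigma}_{les} \boldsymbol{\Sigma}_{WM}^{-1} L_{les}}{N_{WM} + N_{les} + N + 2},
\]
\[
\boldsymbol{\mu}_{les} \leftarrow \frac{N_{les} \mathbf{m}_{les} + \nu \boldsymbol{\mu}_{WM}}{N_{les} + \nu},
\]
\[
\boldsymbol{\Sigma}_{les} \leftarrow \frac{L_{les} + \kappa \nu \boldsymbol{\Sigma}_{WM}}{N_{les} + \nu},
\]
where
\[
L_{les} = \frac{N_{les} \, \nu}{N_{les} + \nu} (\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})(\mathbf{m}_{les} - \boldsymbol{\mu}_{WM})^T + N_{les} \mathbf{V}_{les}.
\]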
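Two pieces of this appendix lend themselves to a short NumPy sketch: the augmented vertex probabilities that add lesions as class $K+1$, and the update of the lesion mean and covariance given the current WM parameters. The function names and array layouts below are hypothetical, and the sketch assumes the sufficient statistics $N_{les}$, $\mathbf{m}_{les}$ and $\mathbf{V}_{les}$ have already been computed as in Appendix A. The WM updates are not included.

```python
import numpy as np

def augment_vertex_priors(alpha, beta):
    """Append a lesion class to the per-vertex label probabilities.

    alpha: (J, K) label probabilities at each mesh vertex (rows sum to 1).
    beta:  (J,) lesion probability at each mesh vertex.
    Returns a (J, K + 1) array with the lesion class in the last column.
    """
    return np.column_stack([alpha * (1.0 - beta)[:, None], beta])

def update_lesion_gaussian(n_les, m_les, v_les, mu_wm, sigma_wm, nu, kappa):
    """Update of the lesion mean and covariance under the prior tied to WM."""
    diff = (m_les - mu_wm)[:, None]
    # L_les combines the offset from the WM mean with the lesion scatter.
    l_les = n_les * nu / (n_les + nu) * (diff @ diff.T) + n_les * v_les
    mu_les = (n_les * m_les + nu * mu_wm) / (n_les + nu)
    sigma_les = (l_les + kappa * nu * sigma_wm) / (n_les + nu)
    return mu_les, sigma_les
```

Two limiting cases are easy to read off: with no lesion voxels ($N_{les} = 0$) the lesion mean falls back to $\boldsymbol{\mu}_{WM}$ and the covariance to $\kappa \boldsymbol{\Sigma}_{WM}$, whereas for large $N_{les}$ the data terms dominate.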
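Appendix C. Estimating lesion probabilities
We here describe how we approximate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ using Monte Carlo sampling. We use a Markov chain Monte Carlo (MCMC) approach to sample triplets $\{\boldsymbol{\theta}_{les}^{(s)}, \mathbf{z}^{(s)}, \mathbf{h}^{(s)}\}$ from the distribution $p(\boldsymbol{\theta}_{les}, \mathbf{z}, \mathbf{h} \mid \mathbf{D}, \hat{\boldsymbol{\theta}})$: Starting from an initial lesion segmentation $\mathbf{z}^{(0)}$ obtained from the parameter estimation procedure described in Appendix B, we use a blocked Gibbs sampler in which each variable is updated conditioned on the other ones:
\[
\begin{aligned}
\boldsymbol{\Sigma}_{les}^{(s+1)} &\sim p(\boldsymbol{\Sigma}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)})
= \mathrm{IW}\left( \boldsymbol{\Sigma}_{les} \,\middle|\, L_{les}^{(s)} + \kappa \nu \hat{\boldsymbol{\Sigma}}_{WM}, \; N_{les}^{(s)} + \nu - N - 2 \right) \\
\boldsymbol{\mu}_{les}^{(s+1)} &\sim p(\boldsymbol{\mu}_{les} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{z}^{(s)}, \boldsymbol{\Sigma}_{les}^{(s+1)})
= \mathcal{N}\left( \boldsymbol{\mu}_{les} \,\middle|\, \frac{N_{les}^{(s)} \mathbf{m}_{les}^{(s)} + \nu \hat{\boldsymbol{\mu}}_{WM}}{N_{les}^{(s)} + \nu}, \; \frac{\boldsymbol{\Sigma}_{les}^{(s+1)}}{N_{les}^{(s)} + \nu} \right) \\
\mathbf{h}^{(s+1)} &\sim p(\mathbf{h} \mid \mathbf{z}^{(s)})
\simeq \mathcal{N}\left( \mathbf{h} \,\middle|\, \boldsymbol{\mu}_{\upsilon}(\mathbf{z}^{(s)}), \; \mathrm{diag}\left( \boldsymbol{\sigma}^2_{\upsilon}(\mathbf{z}^{(s)}) \right) \right) \\
\mathbf{z}^{(s+1)} &\sim p(\mathbf{z} \mid \mathbf{D}, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)})
= \prod_{i=1}^{I} p(z_i \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s+1)}, \boldsymbol{\theta}_{les}^{(s+1)}),
\end{aligned}
\]
where we use the encoder variational approximation obtained during the training of the lesion shape model (see Sec. 3.1.2) to sample $\mathbf{h}$ in the next-to-last step, and
\[
p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}, \boldsymbol{\theta}_{les})
= \frac{\mathcal{N}(\mathbf{d}_i \mid \boldsymbol{\mu}_{les} + \mathbf{C} \boldsymbol{\phi}_i, \boldsymbol{\Sigma}_{les}) \, f_i(\mathbf{h}) \, \rho_i(\hat{\boldsymbol{\theta}}_x)}
{\sum_{l_i = 1}^{K} \sum_{z'_i = 0}^{1} p(\mathbf{d}_i \mid l_i, z'_i, \hat{\boldsymbol{\theta}}_d, \boldsymbol{\theta}_{les}) \, p(z'_i \mid \hat{\boldsymbol{\theta}}_x, \mathbf{h}) \, p(l_i \mid \hat{\boldsymbol{\theta}}_x)}
\]
in the last step. In these equations, the variables $N_{les}^{(s)}$, $\mathbf{m}_{les}^{(s)}$, $\mathbf{V}_{les}^{(s)}$ and $L_{les}^{(s)}$ are as defined before, but using voxel assignments $w_{i,les} = z_i^{(s)}$. Once $S$ samples are obtained, we approximate $p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}})$ as
\[
p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}) \simeq \frac{1}{S} \sum_{s=1}^{S} p(z_i = 1 \mid \mathbf{d}_i, \hat{\boldsymbol{\theta}}, \mathbf{h}^{(s)}, \boldsymbol{\theta}_{les}^{(s)}).
\]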
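The first two steps of the sampler (drawing the lesion covariance from an inverse-Wishart distribution and then the lesion mean from a Gaussian) can be sketched with NumPy alone. This is an illustrative sketch, not the released implementation: the function names are hypothetical, it assumes at least one voxel is currently labeled as lesion, and it assumes the degrees of freedom $N_{les} + \nu - N - 2$ form an integer no smaller than the number of contrasts, so that an inverse-Wishart draw can be obtained by inverting a sum of Gaussian outer products.

```python
import numpy as np

def sample_inverse_wishart(rng, scale, dof):
    """Draw from IW(scale, dof) for integer dof >= dimension.

    Uses the fact that if W ~ Wishart(scale^{-1}, dof), then W^{-1} ~ IW(scale, dof).
    """
    dim = scale.shape[0]
    x = rng.multivariate_normal(np.zeros(dim), np.linalg.inv(scale), size=dof)
    sample = np.linalg.inv(x.T @ x)
    return 0.5 * (sample + sample.T)  # remove round-off asymmetry

def sample_lesion_parameters(rng, d_les, mu_wm, sigma_wm, nu, kappa):
    """One draw of (mu_les, Sigma_les) given the voxels currently labeled lesion.

    d_les: (N_les, N) bias-corrected intensities of voxels with z_i = 1.
    """
    n_les, dim = d_les.shape
    m_les = d_les.mean(axis=0)
    centered = d_les - m_les
    v_les = centered.T @ centered / n_les
    diff = (m_les - mu_wm)[:, None]
    l_les = n_les * nu / (n_les + nu) * (diff @ diff.T) + n_les * v_les
    dof = int(n_les + nu - dim - 2)
    sigma_les = sample_inverse_wishart(rng, l_les + kappa * nu * sigma_wm, dof)
    mean = (n_les * m_les + nu * mu_wm) / (n_les + nu)
    mu_les = rng.multivariate_normal(mean, sigma_les / (n_les + nu))
    return mu_les, sigma_les
```

A call such as `sample_lesion_parameters(np.random.default_rng(0), d_les, mu_wm, sigma_wm, nu=10, kappa=1.0)` returns one draw; the remaining two steps require the trained encoder and decoder networks and are therefore not sketched here.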
In our implementation, we use $S = 50$ samples, obtained after discarding the first 50 sweeps of the sampler (so-called "burn-in" phase). The algorithm repeatedly invokes the decoder and encoder networks of the lesion shape model described in Sec. 3.1.2. Since this shape model was trained in a specific isotropic space, the algorithm requires transitioning between this training space and subject space using an affine transformation. This is accomplished by resampling the input and output of the encoder and decoder, respectively, using linear interpolation.
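The burn-in and averaging scheme can be sketched generically as follows. The `sweep` callable is a hypothetical stand-in for one full pass of the sampler; it is assumed to return the updated state together with the per-voxel lesion probabilities evaluated for that state. Neither the name nor the interface is taken from the released software.

```python
import numpy as np

def monte_carlo_average(sweep, state, n_burn_in=50, n_samples=50):
    """Average per-voxel probabilities over sampler sweeps after burn-in.

    sweep(state) -> (new_state, prob) performs one full sweep (hypothetical
    interface) and returns an array of probabilities for the new state.
    """
    # Burn-in: advance the chain but discard its output.
    for _ in range(n_burn_in):
        state, _ = sweep(state)
    total = None
    for _ in range(n_samples):
        state, prob = sweep(state)
        total = prob.copy() if total is None else total + prob
    return total / n_samples
```

The defaults mirror the 50 discarded sweeps and 50 retained samples mentioned above; both are exposed as arguments because suitable values depend on how quickly the chain mixes.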
Supplementary material
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2020.117471.
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. Tensorflow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
Adelman, G., Rane, S.G., Villa, K.F., 2013. The cost burden of multiple sclerosis in the United States: a systematic review of the literature. J. Med. Econ. 16 (5), 639–647.
Aït-Ali, L.S., Prima, S., Hellier, P., Carsin, B., Edan, G., Barillot, C., 2005. STREM: A robust multidimensional parametric method to segment MS lesions in MRI. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3749, pp. 409–416.
Anderson, L., Holden, S., Davis, B., Prescott, E., Charrier, C., Bunce, N., Firmin, D., Wonke, B., Porter, J., Walker, J., Pennell, D., 2001. Cardiovascular T2-star (T2*) magnetic resonance for the early diagnosis of myocardial iron overload. Eur. Heart J. 22 (23), 2171–2179.
Ashburner, J., Andersson, J.L., Friston, K.J., 2000. Image registration using a symmetric prior – in three dimensions. Hum. Brain Mapp. 9 (4), 212–225.
Azevedo, C.J., Cen, S.Y., Khadka, S., Liu, S., Kornak, J., Shi, Y., Zheng, L., Hauser, S.L., Pelletier, D., 2018. Thalamic atrophy in multiple sclerosis: a magnetic resonance imaging marker of neurodegeneration throughout disease. Ann. Neurol. 83 (2), 223–234.
Bakshi, R., Thompson, A.J., Rocca, M.A., Pelletier, D., Dousset, V., Barkhof, F., Inglese, M., Guttmann, C.R., Horsfield, M.A., Filippi, M., 2008. MRI in multiple sclerosis: current status and future prospects. Lancet Neurol. 7 (7), 615–625.
Barkhof, F., Calabresi, P.A., Miller, D.H., Reingold, S.C., 2009. Imaging outcomes for neuroprotection and repair in multiple sclerosis trials. Nat. Rev. Neurol. 5 (5), 256–266.
Battaglini, M., Jenkinson, M., De Stefano, N., 2012. Evaluating and reducing the impact of white matter lesions on brain volume measurements. Hum. Brain Mapp. 33 (9), 2062–2071.
Bazin, P.-L., Pham, D.L., 2008. Homeomorphic brain image segmentation with topological and statistical atlases. Med. Image Anal. 12 (5), 616–625.
Blystad, I., Håkansson, I., Tisell, A., Ernerudh, J., Smedby, Ö., Lundberg, P., Larsson, E.-M., 2015. Quantitative MRI for analysis of active multiple sclerosis lesions without gadolinium-based contrast agent. Am. J. Neuroradiol. 37 (1), 94–100.
Bricq, S., Collet, C., Armspach, J.P., 2008. Lesions detection on 3D brain MRI using trimmed likelihood estimator and probabilistic atlas. In: Proceedings of the 2008 Fifth IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Proceedings, ISBI, pp. 93–96.
Carass, A., Roy, S., Gherman, A., Reinhold, J.C., Jesson, A., Arbel, T., Maier, O., Handels, H., Ghafoorian, M., Platel, B., Birenbaum, A., Greenspan, H., Pham, D.L., Crainiceanu, C.M., Calabresi, P.A., Prince, J.L., Roncal, W.R., Shinohara, R.T., Oguz, I., 2020. Evaluating white matter lesion segmentations with refined Sørensen-Dice analysis. Sci. Rep. 10, 1–19.
Carass, A., Roy, S., Jog, A., Cuzzocreo, J.L., Magrath, E., Gherman, A., Button, J., Nguyen, J., Prados, F., Sudre, C.H., Cardoso, M.J., Cawley, N., Ciccarelli, O., Wheeler-Kingshott, C.A.M., Ourselin, S., Catanese, L., Deshpande, H., Maurel, P., Commowick, O., Barillot, C., Tomas-Fernandez, X., Warfield, S.K., Vaidya, S., Chunduru, A., Muthuganapathy, R., Krishnamurthi, G., Jesson, A., Arbel, T., Maier, O., Handels, H., Iheme, L.O., Unay, D., Jain, S., Sima, D.M., Smeets, D., Ghafoorian, M., Platel, B., Birenbaum, A., Greenspan, H., Bazin, P.-L., Calabresi, P.A., Crainiceanu, C.M., Ellingsen, L.M., Reich, D.S., Prince, J.L., Pham, D.L., 2017. Longitudinal multiple sclerosis lesion segmentation: resource & challenge. NeuroImage 148, 77–102.
Ceccarelli, A., Jackson, J., Tauhid, S., Arora, A., Gorky, J., Dell'Oglio, E., Bakshi, A., Chitnis, T., Khoury, S.J., Weiner, H.L., et al., 2012. The impact of lesion in-painting and registration methods on voxel-based morphometry in detecting regional cerebral gray matter atrophy in multiple sclerosis. Am. J. Neuroradiol. 33 (8), 1579–1585.
Cerri, S., Hoopes, A., Greve, D.N., Mühlau, M., Van Leemput, K., 2020. A longitudinal method for simultaneous whole-brain and lesion segmentation in multiple sclerosis. In: Proceedings of the Third International Workshop in Machine Learning in Clinical Neuroimaging (accepted).
Chard, D.T., Griffin, C.M., Parker, G.J.M., Kapoor, R., Thompson, A.J., Miller, D.H., 2002. Brain atrophy in clinically early relapsing-remitting multiple sclerosis. Brain 125 (2), 327–337.
Chard, D.T., Jackson, J.S., Miller, D.H., Wheeler-Kingshott, C.A., 2010. Reducing the impact of white matter lesions on automated measures of brain gray and white matter volumes. J. Magn. Reson. Imaging 32 (1), 223–228.
Commowick, O., Istace, A.