Learning distributions of shape trajectories from longitudinal datasets: a hierarchical model on a manifold of diffeomorphisms

Alexandre Bône    Olivier Colliot    Stanley Durrleman
The Alzheimer's Disease Neuroimaging Initiative

Institut du Cerveau et de la Moelle épinière, Inserm, CNRS, Sorbonne Université, Paris, France
Inria, Aramis project-team, Paris, France

{alexandre.bone, olivier.colliot, stanley.durrleman}@icm-institute.org

Abstract

We propose a method to learn a distribution of shape trajectories from longitudinal data, i.e. the collection of individual objects repeatedly observed at multiple time-points. The method allows the computation of an average spatiotemporal trajectory of shape changes at the group level, and of the individual variations of this trajectory both in terms of geometry and time dynamics. First, we formulate a non-linear mixed-effects statistical model as the combination of a generic statistical model for manifold-valued longitudinal data, a deformation model defining shape trajectories via the action of a finite-dimensional set of diffeomorphisms with a manifold structure, and an efficient numerical scheme to compute parallel transport on this manifold. Second, we introduce an MCMC-SAEM algorithm with a specific approach to shape sampling, an adaptive scheme for proposal variances, and a log-likelihood tempering strategy to estimate our model. Third, we validate our algorithm on 2D simulated data, and then estimate a scenario of alteration of the shape of the hippocampus, a 3D brain structure, during the course of Alzheimer's disease. The method shows for instance that hippocampal atrophy progresses more quickly in female subjects, and occurs earlier in APOE4 mutation carriers. We finally illustrate the potential of our method for classifying pathological trajectories versus normal ageing.

1. Introduction

1.1. Motivation

At the interface of geometry, statistics, and computer science, statistical shape analysis meets a growing number of applications in computer vision and medical image analysis. This research field has addressed two main statistical questions: atlas construction for cross-sectional shape datasets, and shape regression for shape time series. The former is the classical extension of a mean-variance analysis, which aims to estimate a mean shape and a covariance structure from observations of several individual instances of the same object or organ. The latter extends the concept of regression by estimating a spatiotemporal trajectory of shape changes from a series of observations of the same individual object at different time-points. The emergence of longitudinal shape datasets, which consist in the collection of individual objects repeatedly observed at multiple time-points, has raised the need for a combined approach. One needs statistical methods to estimate normative spatiotemporal models from series of individual observations which differ in shape and dynamics of shape changes across individuals. Such a model should capture and disentangle the inter-subject variability in shape at each time-point and the temporal variability due to shifts in time or scalings in pace of shape changes. Considering individual series as samples along a trajectory of shape changes, this approach amounts to estimating a spatiotemporal distribution of trajectories, and has potential applications in various fields including silhouette tracking in videos, analysis of growth patterns in biology, or modelling disease progression in medicine.

1.2. Related work

The central difficulty in shape analysis is that shape spaces are either defined by invariance properties [21, 40, 41] or by the conservation of topological properties [5, 8, 13, 20, 34], and therefore intrinsically have the structure of infinite-dimensional Riemannian manifolds or Lie groups. Statistical Shape Models [9] are linear but require consistent point labelling across observations and offer no topology preservation guarantees. A now usual approach is to use the action of a group of diffeomorphisms to define a metric on a shape space [29, 43]. This approach has been used to compute a "Fréchet mean" together with a covariance matrix in the tangent space of the mean [1, 16, 17, 33, 44] from a cross-sectional dataset, and to perform regression from time series of shape data [4, 15, 16, 18, 26, 32]. In [12, 14, 31], these tools have been used to estimate an average trajectory of shape changes from a longitudinal dataset, under the convenient assumption that the parameters encoding inter-individual variability are independent of time. The work in [27] introduced the idea of using parallel transport to translate the spatiotemporal patterns seen in one individual into the geometry of another one. The co-adjoint transport is used in [38] for the same purpose. Both estimate a group-average trajectory from individual trajectories. The proposed models do not account for inter-individual variability in the time dynamics, which is of key importance in the absence of temporal markers of the progression to align the sequences. The same remarks apply to [6], which introduces an appealing theoretical setting for spaces of trajectories, in the case of a fixed number of temporal observations across subjects. The need for temporal alignment in longitudinal data analysis is highlighted for instance in [13] with a diffeomorphism-based morphometry approach, or in [40, 41] with quotient manifolds. In [24, 37], a generative mixed-effects model for the statistical analysis of manifold-valued longitudinal data is introduced for the analysis of feature vectors. This model describes both the variability in the direction of the individual trajectories, by introducing the concept of "exp-parallelization" which relies on parallel transport, and the pace at which those trajectories are followed, using "time-warp" functions. Similar time-warps are used by the authors of [23] to refine their linear modeling approach [22].

1.3. Contributions

In this paper, we propose to extend the approach of [37] from low-dimensional feature vectors to shape data. Using an approach designed for manifold-valued data on shape spaces defined by the action of a group of diffeomorphisms raises several theoretical and computational difficulties. Notably needed are: a finite-dimensional set of diffeomorphisms with a Riemannian manifold structure; stable and efficient numerical schemes to compute the Riemannian exponential and parallel transport operators on this manifold, as no closed-form expressions are available; and accelerated algorithmic convergence to cope with a dimensionality that is hundreds of times larger. To this end, we propose here:

• to formulate a generative non-linear mixed-effects model for a finite-dimensional set of diffeomorphisms defined by control points, to show that this set is stable under exp-parallelization, and to use an efficient numerical scheme for parallel transport;

• to introduce an adapted MCMC-SAEM algorithm with an adaptive block sampling of the latent variables, a specific sampling strategy for shape parameters based on random local displacements of the shape contours, and a vanishing tempering of the target log-likelihood;

• to validate our method on 2D simulated data and a large dataset of 3D brain structures in the context of Alzheimer's disease progression, and to illustrate the potential of our method for classifying spatiotemporal patterns, e.g. to discriminate pathological versus normal trajectories of ageing.

All in all, the proposed method estimates an average spatiotemporal trajectory of shape changes from a longitudinal dataset, together with distributions of space-shifts, time-shifts and acceleration factors describing the variability in shape, onset and pace of shape changes respectively.

2. Deformation model

2.1. The manifold of diffeomorphisms $\mathcal{D}_{c_0}$

We follow the approach taken in [12], built on the principles of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework [30]. We note $d \in \{2, 3\}$ the dimension of the ambient space. We choose $k$ a Gaussian kernel of width $\sigma \in \mathbb{R}_+^*$ and $c$ a set of $n_{cp} \in \mathbb{N}$ "control" points $c = (c_1, ..., c_{n_{cp}})$ of the ambient space $\mathbb{R}^d$. For any set of "momentum" vectors $m = (m_1, ..., m_{n_{cp}})$, we define the "velocity" vector field $v : \mathbb{R}^d \to \mathbb{R}^d$ as the convolution $v(x) = \sum_{k=1}^{n_{cp}} k(c_k, x)\, m_k$ for any point $x$ of the ambient space $\mathbb{R}^d$. From initial sets of $n_{cp}$ control points $c_0$ and corresponding momenta $m_0$, we obtain the trajectories $t \to (c_t, m_t)$ by integrating the Hamiltonian equations:

$\dot{c} = K_c\, m \; ; \quad \dot{m} = -\tfrac{1}{2} \nabla_c \big[ m^T K_c\, m \big]$   (1)

where $K_c$ is the $n_{cp} \times n_{cp}$ "kernel" matrix $[k(c_i, c_j)]_{ij}$, $\nabla(.)$ the gradient operator, and $(.)^T$ the matrix transposition. Those trajectories prescribe the trajectory $t \to v_t$ in the space of velocity fields. The integration along such a path from the identity generates a flow of diffeomorphisms $t \to \Phi_t$ of the ambient space [5]. We can now define:

$\mathcal{D}_{c_0} = \big\{ \Phi_1 \; ; \; \partial_t \Phi_t = v_t \circ \Phi_t, \; \Phi_0 = \mathrm{Id}, \; v_t = \mathrm{Conv}(c_t, m_t), \; (\dot{c}_t, \dot{m}_t) = \mathrm{Ham}(c_t, m_t), \; m_0 \in \mathbb{R}^{d\, n_{cp}} \big\}$   (2)

where $\mathrm{Conv}(., .)$ and $\mathrm{Ham}(., .)$ are compact notations for the convolution operator and the Hamiltonian equations (1) respectively. $\mathcal{D}_{c_0}$ has the structure of a manifold of finite dimension, where the metric at the tangent space $T_{\Phi_1}\mathcal{D}_{c_0}$ is given by $K_{c_1}^{-1}$. It is shown in [29] that the proposed paths $t \to \Phi_t$ are the paths of minimum deformation energy, and are therefore the geodesics of $\mathcal{D}_{c_0}$. These geodesics are fully determined by an initial set of momenta $m_0$.

Then, any point $x \in \mathbb{R}^d$ of the ambient space follows the trajectory $t \to \Phi_t(x)$. Such trajectories are used to deform any point cloud or mesh embedded in the ambient space, defining a diffeomorphic deformation of the shape. Formally, this operation defines a shape space $\mathcal{S}_{c_0, y_0}$ as the orbit of a reference shape $y_0$ under the action of $\mathcal{D}_{c_0}$. The manifold of diffeomorphisms $\mathcal{D}_{c_0}$ is used as a proxy to manipulate shapes: all computations are performed in $\mathcal{D}_{c_0}$, or more concretely on a finite set of control points and momentum vectors, and applied back to the template shape $y_0$ to obtain a result in $\mathcal{S}_{c_0, y_0}$.
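To make this construction concrete, the following minimal sketch evaluates the kernel convolution and integrates the Hamiltonian equations (1) with a plain Euler scheme. The function and variable names are illustrative only (they are not taken from the released code), and a production implementation would rather use a Runge-Kutta scheme and automatic differentiation.

import numpy as np

def gaussian_kernel(x, y, sigma):
    # k(x, y) = exp(-||x - y||^2 / sigma^2), evaluated for all pairs of rows
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / sigma ** 2)

def velocity_field(x, c, m, sigma):
    # v(x) = sum_k k(c_k, x) m_k : convolution of the momenta over the control points
    return gaussian_kernel(x, c, sigma) @ m

def hamiltonian_step(c, m, sigma, dt):
    # One explicit Euler step of:  dc/dt = K_c m,  dm/dt = -1/2 grad_c [m^T K_c m]
    K = gaussian_kernel(c, c, sigma)
    dc = K @ m
    # gradient of m^T K_c m with respect to each control point c_i
    diff = c[:, None, :] - c[None, :, :]              # (n_cp, n_cp, d)
    inner = (m @ m.T) * K                             # (m_i . m_j) k(c_i, c_j)
    grad = -4.0 / sigma ** 2 * np.sum(inner[:, :, None] * diff, axis=1)
    dm = -0.5 * grad
    return c + dt * dc, m + dt * dm

def shoot(c0, m0, sigma, n_steps=20):
    # Geodesic shooting: returns the trajectories t -> (c_t, m_t) on [0, 1]
    c, m, dt = c0.copy(), m0.copy(), 1.0 / n_steps
    trajectory = [(c, m)]
    for _ in range(n_steps):
        c, m = hamiltonian_step(c, m, sigma, dt)
        trajectory.append((c, m))
    return trajectory

The flow of diffeomorphisms itself is then recovered by advecting any ambient point (e.g. the template vertices) with the velocity field interpolated from $(c_t, m_t)$ at each step.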

2.2. Riemannian exponentials on $\mathcal{D}_{c_0}$

For any set of control points $c_0$, we define the exponential operator $\mathrm{Exp}_{c_0} : m_0 \in \mathbb{R}^{d\, n_{cp}} \to \Phi_1 \in \mathcal{D}_{c_0}$. Note that $\mathcal{D}_{c_0} = \big\{ \mathrm{Exp}_{c_0}(m_0) \,|\, m_0 \in \mathbb{R}^{d\, n_{cp}} \big\}$.

The following proposition ensures the stability of $\mathcal{D}_{c_0}$ under the exponential operator, i.e. that the control points obtained by applying successive compatible exponential maps with arbitrary momenta are reachable by a unique integration of the Hamiltonian equations from $c_0$:

Proposition. Let $c_0$ be a set of control points. $\forall \Phi_1 \in \mathcal{D}_{c_0}$, $\forall w$ momenta, we have $\mathrm{Exp}_{\Phi_1(c_0)}(w) \in \mathcal{D}_{c_0}$.

Proof. We note $\Phi'_1 = \mathrm{Exp}_{\Phi_1(c_0)}(w) \in \mathcal{D}_{\Phi_1(c_0)}$ and $c'_1 = \Phi'_1 \circ \Phi_1(c_0)$. By construction, there exist two paths $t \to \varphi_t$ in $\mathcal{D}_{c_0}$ and $s \to \varphi'_s$ in $\mathcal{D}_{\Phi_1(c_0)}$ such that $\varphi'_1 \circ \varphi_1(c_0) = c'_1$. Therefore there exists a diffeomorphic path $u \to \psi_u$ such that $\psi_1(c_0) = c'_1$. Concluding with [29], the path $u \to \psi_u$ of minimum energy exists, and is written $u \to \mathrm{Exp}_{c_0}(u \cdot m''_0)$ for some $m''_0 \in \mathbb{R}^{d\, n_{cp}}$. □

As a physical interpretation might be given to the integration time $t$ when building a statistical model, we introduce the notation $\mathrm{Exp}_{c_0, t_0, t} : m_0 \in \mathbb{R}^{d\, n_{cp}} \to \Phi_t \in \mathcal{D}_{c_0}$, where $\Phi_t$ is obtained by integrating from $t = t_0$. Note that $\mathrm{Exp}_{c_0} = \mathrm{Exp}_{c_0, 0, 1}$.

On the considered manifold $\mathcal{D}_{c_0}$, computing exponentials – i.e. geodesics – therefore consists in integrating ordinary differential equations. This operation is direct and computationally tractable. The top line of Figure 1 plots a geodesic $\gamma : t \to \Phi_t$ applied to the top-left shape $y_0$.

2.3. Parallel transport and exp-parallels on $\mathcal{D}_{c_0}$

In [37] is introduced the notion of exp-parallelism, a generalization of Euclidean parallelism to geodesically-complete manifolds. It relies on the Riemannian parallel transport operator, which we propose to compute using the fanning scheme [28]. This numerical scheme only requires the exponential operator to approximate the parallel transport along a geodesic, with proven convergence.

We note $P_{c_0, m_0, t_0, t} : \mathbb{R}^{d\, n_{cp}} \to \mathbb{R}^{d\, n_{cp}}$ the parallel transport operator, which transports any momenta $w$ along the geodesic $\gamma : t \to \Phi_t = \mathrm{Exp}_{c_0, t_0, t}(m_0)$ from $\Phi_{t_0}$ to $\Phi_t$. For any $c_0$, $m_0$, $w$ and $t_0$, we can now define the curve:

$t \to \eta_{c_0, m_0, t_0, t}(w) = \mathrm{Exp}_{\gamma(t)(c_0)}\big[ P_{c_0, m_0, t_0, t}(w) \big]$.   (3)

This curve, which we will call exp-parallel to $\gamma$, is well-defined on the manifold $\mathcal{D}_{c_0}$, according to the proposition of Section 2.2. Figure 1 illustrates the whole procedure. From the top-left shape, the computational scheme is as follows: integrate the Hamiltonian equations to obtain the control points $c(t)$ (red crosses) and momenta $m(t)$ (bold blue arrows); compute the associated velocity fields $v_t$ (light blue arrows); compute the flow $\gamma : t \to \Phi_t$ (shape progression); transport the momenta $w$ along $\gamma$ (red arrows); compute the exp-parallel curve $\eta$ by repeating the first three steps along the transported momenta.

Figure 1: Samples from a geodesic $\gamma$ (top) and an exp-parallelized curve $\eta$ (bottom) on $\mathcal{S}_{c_0, y_0}$. Parameters encoding the geodesic are the blue momenta attached to control points, plotted together with the associated velocity fields. Momenta in red are parallel transported along the geodesic and define a deformation mapping each frame of the geodesic to a frame of $\eta$. Exp-parallelization allows the transport of a shape trajectory from one geometry to another.
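The fanning-scheme idea only needs calls to the exponential operator: the transport over a small step is approximated by a Jacobi field, itself computed as a finite difference of perturbed geodesics. The minimal sketch below reuses the kernel and Euler-step helpers of the sketch in Section 2.1; it omits the norm-conservation correction used in [28], uses an illustrative step count, and converts transported velocities back to momenta with the kernel matrix, which is one possible convention.

import numpy as np

def flow(c, m, sigma, h, n_inner=5):
    # Integrate the Hamiltonian equations (1) over a duration h (Euler sub-steps).
    dt = h / n_inner
    for _ in range(n_inner):
        c, m = hamiltonian_step(c, m, sigma, dt)
    return c, m

def transport_step(c, m, w, sigma, h, eps=1e-3):
    # One step: the parallel transport of w over [t, t+h] is approximated by the
    # Jacobi field J(h) = d/d(eps) Exp_c(h (m + eps w)), estimated by a central
    # difference of perturbed geodesic endpoints and divided by h.
    c_main, m_main = flow(c, m, sigma, h)              # main geodesic endpoint
    c_plus, _ = flow(c, m + eps * w, sigma, h)         # perturbed geodesics
    c_minus, _ = flow(c, m - eps * w, sigma, h)
    J = (c_plus - c_minus) / (2.0 * eps * h)           # transported velocity at c_main
    K = gaussian_kernel(c_main, c_main, sigma)
    w_next = np.linalg.solve(K, J)                     # back to momentum coordinates
    return c_main, m_main, w_next

def parallel_transport(c0, m0, w, sigma, n_steps=20):
    # Transport the momenta w along the geodesic Exp_{c0,0,t}(m0) from t=0 to t=1.
    c, m, h = c0.copy(), m0.copy(), 1.0 / n_steps
    for _ in range(n_steps):
        c, m, w = transport_step(c, m, w, sigma, h)
    return w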

3. Statistical model

For each individual $1 \leq i \leq N$ are available $n_i$ longitudinal shape measurements $y = (y_{i,j})_{1 \leq j \leq n_i}$ and associated times $(t_{i,j})_{1 \leq j \leq n_i}$.

3.1. The generative statistical model

Let $c_0$ be a set of control points and $m_0$ associated momenta. We call $\gamma$ the geodesic $t \to \mathrm{Exp}_{c_0, t_0, t}(m_0)$ of $\mathcal{D}_{c_0}$. Let $y_0$ be a template mesh shape embedded in the ambient space. For a subject $i$, the observed longitudinal shape measurements $y_{i,1}, ..., y_{i,n_i}$ are modeled as sample points, at times $\psi_i(t_{i,j})$, of an exp-parallel curve $t \to \eta_{c_0, m_0, t_0, t}(w_i)$ to this geodesic $\gamma$, plus additional noise $\epsilon_{i,j}$:

$y_{i,j} = \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 + \epsilon_{i,j}$.   (4)

The time-warp function $\psi_i$ and the space-shift momenta $w_i$ respectively encode the individual time and space variability. The time-warp is defined as an affine reparametrization of the reference time $t$: $\psi_i(t) = \alpha_i (t - t_0 - \tau_i) + t_0$, where the individual time-shift $\tau_i \in \mathbb{R}$ allows an inter-individual variability in the stage of evolution, and the individual acceleration factor $\alpha_i \in \mathbb{R}_+^*$ a variability in the pace of evolution. For convenience, we write $\alpha_i = \exp(\xi_i)$. In the spirit of Independent Component Analysis [19], the space-shift momenta $w_i$ are modeled as the linear combination of $n_s$ sources, gathered in the $n_{cp} \times n_s$ matrix $A$: $w_i = A^{\perp m_0} s_i$. Before computing this superposition, each column $c_l(A)$ of $A$ has been projected on the hyperplane $m_0^{\perp}$ for the metric $K_{c_0}$, ensuring the orthogonality between $m_0$ and $w_i$. As argued in [37], this orthogonality is fundamental for the identifiability of the model. Without this constraint, the projection of the space-shifts $(w_i)_i$ on $m_0$ could be confounded with the acceleration factors $(\alpha_i)_i$.
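The individual random effects enter the model only through the time-warp $\psi_i$ and the space-shift $w_i$. The short sketch below builds both quantities, carrying out the projection of the columns of $A$ onto $m_0^{\perp}$ for the kernel metric $K_{c_0}$; variable names are illustrative, and the kernel helper of Section 2.1 is reused.

import numpy as np

def time_warp(t, t0, tau_i, xi_i):
    # psi_i(t) = alpha_i (t - t0 - tau_i) + t0, with alpha_i = exp(xi_i)
    return np.exp(xi_i) * (t - t0 - tau_i) + t0

def project_columns(A, m0, c0, sigma):
    # Project each column of A onto the hyperplane orthogonal to m0 for K_{c0}:
    #   a <- a - (<a, m0>_K / <m0, m0>_K) m0
    K = gaussian_kernel(c0, c0, sigma)
    m0_flat = m0.reshape(-1)
    den = np.sum(m0 * (K @ m0))
    A_proj = np.empty_like(A)
    for l in range(A.shape[1]):
        a = A[:, l]
        num = np.sum(a.reshape(m0.shape) * (K @ m0))   # K-metric inner product
        A_proj[:, l] = a - (num / den) * m0_flat
    return A_proj

def space_shift(A_proj, s_i):
    # w_i = A^{perp m0} s_i, a linear combination of the n_s projected sources
    return A_proj @ s_i   # flattened momenta; reshape to (n_cp, d) before shooting

The resulting $w_i$ is the set of momenta transported along the reference geodesic in Equation (4).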

3.2. Mixed-effects formulation

We use either the current [42] or the varifold [7] noise model for the residuals $\epsilon_{i,j}$, allowing our method to work with input meshes without any point correspondence. In this setting, we note $\epsilon_{i,j} \overset{iid}{\sim} \mathcal{N}(0, \sigma_\epsilon^2)$. The other previously introduced variables are modeled as random effects $z$, with: $y_0 \sim \mathcal{N}(\bar{y}_0, \sigma_y^2)$, $c_0 \sim \mathcal{N}(\bar{c}_0, \sigma_c^2)$, $m_0 \sim \mathcal{N}(\bar{m}_0, \sigma_m^2)$, $A \sim \mathcal{N}(\bar{A}, \sigma_A^2)$, $\tau_i \overset{iid}{\sim} \mathcal{N}(0, \sigma_\tau^2)$, $\xi_i \overset{iid}{\sim} \mathcal{N}(0, \sigma_\xi^2)$, $s_i \overset{iid}{\sim} \mathcal{N}(0, 1)$. We define $\theta = (\bar{y}_0, \bar{c}_0, \bar{m}_0, \bar{A}, t_0, \sigma_\tau^2, \sigma_\xi^2, \sigma_\epsilon^2)$ the fixed effects, i.e. the parameters of the model. The remaining variance parameters $\sigma_y^2$, $\sigma_c^2$, $\sigma_m^2$ and $\sigma_A^2$ can be chosen arbitrarily small. Standard conjugate distributions are chosen as Bayesian priors on the model parameters: $\bar{y}_0 \sim \mathcal{N}(\bar{\bar{y}}_0, \varsigma_y^2)$, $\bar{c}_0 \sim \mathcal{N}(\bar{\bar{c}}_0, \varsigma_c^2)$, $\bar{m}_0 \sim \mathcal{N}(\bar{\bar{m}}_0, \varsigma_m^2)$, $\bar{A} \sim \mathcal{N}(\bar{\bar{A}}, \varsigma_A^2)$, $t_0 \sim \mathcal{N}(\bar{t}_0, \varsigma_t^2)$, $\sigma_\tau^2 \sim \mathcal{IG}(m_\tau, \sigma_{\tau,0}^2)$, $\sigma_\xi^2 \sim \mathcal{IG}(m_\xi, \sigma_{\xi,0}^2)$, and $\sigma_\epsilon^2 \sim \mathcal{IG}(m_\epsilon, \sigma_{\epsilon,0}^2)$, with inverse-gamma distributions on the variance parameters. Those priors ensure the existence of the maximum a posteriori (MAP) estimator. In practice, they regularize and guide the estimation procedure.

The proposed model belongs to the curved exponential family (see the supplementary material, which gives the complete log-likelihood). In this setting, the algorithm introduced in the following section has proven convergence.

We have defined a distribution of trajectories that can be written $t \to y(t) = f_{\theta, t}(z)$, where $z$ is a random variable following a normal distribution. We call $t \to f_{\theta, t}(\mathbb{E}[z])$ the average trajectory, which may not be equal to the expected trajectory $t \to \mathbb{E}[f_{\theta, t}(z)]$ in the general non-linear case.
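Under these distributional assumptions, simulating a synthetic longitudinal dataset (as done in Section 5.1) amounts to drawing the individual random effects and evaluating Equation (4). The sketch below shows the sampling of the latent variables only; `exp_parallel_shape` stands for the whole geometric pipeline of Section 2 (shooting, parallel transport, flow of the template) and is assumed rather than implemented, and the additive Gaussian noise on the vertices is a simplification of the current/varifold noise model.

import numpy as np

def sample_individual_effects(N, n_s, sigma_tau, sigma_xi, rng):
    # tau_i ~ N(0, sigma_tau^2), xi_i ~ N(0, sigma_xi^2), s_i ~ N(0, I_{n_s})
    tau = rng.normal(0.0, sigma_tau, size=N)
    xi = rng.normal(0.0, sigma_xi, size=N)
    s = rng.normal(0.0, 1.0, size=(N, n_s))
    return tau, xi, s

def simulate_dataset(y0, c0, m0, A_proj, t0, times, sigma_tau, sigma_xi, sigma_eps, rng):
    # times[i] is the list of observation times t_{i,j} of subject i
    N, n_s = len(times), A_proj.shape[1]
    tau, xi, s = sample_individual_effects(N, n_s, sigma_tau, sigma_xi, rng)
    dataset = []
    for i in range(N):
        w_i = space_shift(A_proj, s[i])                  # Section 3.1 sketch
        observations = []
        for t in times[i]:
            psi_t = time_warp(t, t0, tau[i], xi[i])      # affine time reparametrization
            y = exp_parallel_shape(y0, c0, m0, w_i, t0, psi_t)   # eta(...) * y0 (assumed)
            observations.append(y + rng.normal(0.0, sigma_eps, size=y.shape))
        dataset.append(observations)
    return dataset, (tau, xi, s)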

4. Estimation

4.1. The MCMC-SAEM algorithm

The Expectation-Maximization (EM) algorithm [11] allows the estimation of the parameters of a mixed-effects model with latent variables, here the random effects $z$. It alternates between an expectation (E) step and a maximization (M) one. The E step is intractable in our case, due to the non-linearity of the model. In [10] is introduced and proved a stochastic approximation of the EM algorithm, where the E step is replaced by a simulation (S) step followed by an approximation (A) one. The S step requires sampling from $q(z | y, \theta^k)$, which is also intractable in our case. In the case of curved exponential models, the authors of [2] show that the convergence holds if the S step is replaced by a single transition of an ergodic Markov chain Monte Carlo (MCMC) sampler whose stationary distribution is $q(z | y, \theta^k)$. This global algorithm is called the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM), and is exploited in this paper to compute the MAP estimate of the model parameters $\theta_{map} = \arg\max_\theta \int q(y, z | \theta)\, dz$.

4.2. The adaptive block sampler

We use a block formulation of the Metropolis-Hastings within Gibbs (MHwG) sampler in the S-MCMC step. The latent variables $z$ are decomposed into $n_b$ natural blocks: $z = \big( y_0, c_0, m_0, [c_l(A)]_l, [\tau_i, \xi_i, s_i]_i \big)$. Those blocks have highly heterogeneous sizes, e.g. a single scalar for $\tau_i$ versus possibly thousands of coordinates for $y_0$, for which we introduce a specific proposal distribution in Section 4.3.

For all the other blocks, we use a symmetric random-walk MHwG sampler with normal proposal distributions of the form $\mathcal{N}(0, \sigma_b^2 \mathrm{Id})$ to perturb the current block state $z_b^k$. In order to achieve reasonable acceptance rates $ar$, i.e. around $ar^\star = 30\%$ [36], the proposal standard deviations $\sigma_b$ are dynamically adapted every $n_{adapt}$ iterations by measuring the mean acceptance rates $ar$ over the last $n_{detect}$ iterations, and applying, for any $b$:

$\sigma_b \leftarrow \sigma_b + \frac{1}{k^{\gamma}} \left( \frac{ar - ar^\star}{1 - ar^\star}\, \mathbb{1}_{ar \geq ar^\star} + \frac{ar - ar^\star}{ar^\star}\, \mathbb{1}_{ar < ar^\star} \right)$   (5)

with $\gamma > 0.5$. Inspired by [3], this dynamic adaptation is performed with a decreasing step-size $k^{-\gamma}$, ensuring the vanishing property of the adaptation scheme and the convergence of the whole algorithm [2, 3]. It proved very efficient in practice with $n_{adapt} = n_{detect} = 10$ and $\gamma = 0.51$, for any kind of data.

4.3. Efficient sampling of smooth template shapes

The first block $z_1 = y_0$, i.e. the coordinates of the points of the template mesh, is of very high dimension: naively sampling each scalar value of its numerical description would result both in unnaturally distorted shapes and in a daunting computational burden.

We propose to take advantage of the geometrical nature of $y_0$ and to leverage the framework introduced in Section 2 by perturbing the current block state $z_1^k$ with a small displacement field $v$, obtained by the convolution of random momenta on a pre-selected set of control points. This proposal distribution can be seen as a normal distribution $\mathcal{N}(0, \sigma_1^2 D^T D)$, where $\sigma_1^2$ is the variance associated with the random momenta and $D$ the convolution matrix. In practice, dynamically adapting the proposal variance $\sigma_1^2$ and selecting regularly-spaced shape points as control points proved efficient.
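In code, this structured proposal reduces to drawing a few random momenta and spreading them over the template vertices with the Gaussian kernel. A minimal sketch is given below, reusing the kernel helper of Section 2.1; the choice of control points as a regular subsampling of the template is taken from the text, while the specific values in the usage comment are placeholders.

import numpy as np

def propose_template(y0, template_control_points, sigma_1, kernel_width, rng):
    # Random-walk proposal for the template block: y0 + (smooth displacement field),
    # i.e. a Gaussian proposal whose covariance is sigma_1^2 times the Gram matrix
    # of the convolution operator D (cf. Section 4.3).
    n_ctrl, d = template_control_points.shape
    random_momenta = rng.normal(0.0, sigma_1, size=(n_ctrl, d))
    # D maps momenta to a displacement field evaluated at the template vertices
    D = gaussian_kernel(y0, template_control_points, kernel_width)   # (n_vertices, n_ctrl)
    displacement = D @ random_momenta                                 # (n_vertices, d)
    return y0 + displacement

# Illustrative usage: control points taken as every 10th vertex of the current template.
# rng = np.random.default_rng(0)
# ctrl = current_y0[::10]
# candidate_y0 = propose_template(current_y0, ctrl, sigma_1=0.05, kernel_width=5.0, rng=rng)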

Page 5: Learning distributions of shape trajectories from …...the action of a group of diffeomorphisms to define a metric on a shape space [29, 43]. This approach has been used to compute

4.4. Tempering

The MCMC-SAEM is proved convergent toward a local maximum of $\theta \to \int q(y, z | \theta)\, dz$. In practice, the dimensionality of the energetic landscape $q(y, z | \theta)$ and the presence of multiple local maxima can make the estimation procedure sensitive to initialization. Inspired by the globally-convergent simulated annealing algorithm, [25] proposes to carry out the optimization procedure in a smoothed version $q^T(y, z | \theta)$ of the original landscape. The temperature parameter $T$ controls this smoothing, and should decrease from large values to 1, for which $q^T = q$.

We propose to introduce such a temperature parameter only for the population variables $z_{pop}$. The tempered version of the complete log-likelihood is given in the supplementary material. In our experiments, the chosen temperature sequence $T_k$ remains constant at first, and then geometrically decreases to unity. Implementing this "tempering" feature had a dramatic impact on the required number of iterations before convergence, and greatly improved the robustness of the whole procedure. Note that the theoretical convergence properties of the MCMC-SAEM are not degraded, since the tempered phase of the algorithm can be seen as an initializing heuristic; they may actually be improved.
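The temperature schedule is simple enough to state in a few lines; the plateau length, initial temperature and decay rate below are illustrative placeholders, not the values used in the experiments.

def temperature(k, T_init=10.0, n_plateau=1000, decay=0.99):
    # Constant at T_init for the first n_plateau iterations, then geometric
    # decay towards 1, for which the tempered likelihood equals the original one.
    if k < n_plateau:
        return T_init
    return 1.0 + (T_init - 1.0) * decay ** (k - n_plateau)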

Algorithm 1: Estimation of the longitudinal deformations model with the MCMC-SAEM. Code publicly available at: www.deformetrica.org.

input: Longitudinal dataset of shapes $y = (y_{i,j})_{i,j}$. Initial parameters $\theta^0$ and latent variables $z^0$. Geometrically decreasing sequence of step-sizes $\rho_k$.
output: Estimation of $\theta_{map}$. Samples $(z^s)_s$ approximately distributed following $q(z \,|\, y, \theta_{map})$.

Initialization: set $k = 0$ and $S^0 = S(z^0)$.
repeat
    Simulation: foreach block of latent variables $z_b$ do
        Draw a candidate $z_b^c \sim p_b(\cdot \,|\, z_b^k)$.
        Set $z^c = (z_1^{k+1}, ..., z_{b-1}^{k+1}, z_b^c, z_{b+1}^k, ..., z_{n_b}^k)$.
        Compute the geodesic $\gamma : t \to \mathrm{Exp}_{c_0, t_0, t}(m_0)$.
        $\forall i$, compute $w_i = A^{\perp m_0} s_i$.
        $\forall i$, compute $w : t \to P_{c_0, m_0, t_0, t}(w_i)$.
        $\forall i, j$, compute $\mathrm{Exp}_{\gamma[\psi_i(t_{i,j})](c_0)}\big( w[\psi_i(t_{i,j})] \big)$.
        Compute the acceptance ratio $\omega = \min\big[ 1, \, q(z^c | y, \theta^k) / q(z^k | y, \theta^k) \big]$.
        if $u \sim \mathcal{U}(0, 1) < \omega$ then $z_b^{k+1} \leftarrow z_b^c$ else $z_b^{k+1} \leftarrow z_b^k$.
    end
    Stochastic approximation: $S^{k+1} \leftarrow S^k + \rho_k \big[ S(z^{k+1}) - S^k \big]$.
    Maximization: $\theta^{k+1} \leftarrow \theta^\star(S^{k+1})$.
    Adaptation: if remainder$(k + 1, n_{adapt}) = 0$ then update the proposal variances $(\sigma_b^2)_b$ with equation (5).
    Increment: set $k \leftarrow k + 1$.
until convergence;

4.5. Sufficient statistics and maximization step

Exhibiting the sufficient statistics $S_1 = y_0$, $S_2 = c_0$, $S_3 = m_0$, $S_4 = A$, $S_5 = \sum_i (t_0 + \tau_i)$, $S_6 = \sum_i (t_0 + \tau_i)^2$, $S_7 = \sum_i \xi_i^2$ and $S_8 = \sum_i \sum_j \| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2$, the update of the model parameters $\theta \leftarrow \theta^\star$ in the M step of the MCMC-SAEM can be derived in closed form. The explicit expressions are given in the supplementary material.

5. Experiments

5.1. Validation with simulated data in $\mathbb{R}^2$

Convergence study. To validate the estimation procedure, we first generate synthetic data directly from the model, without additional noise. Our choice of reference geodesic $\gamma$ is plotted on the top line of the previously introduced Figure 1: the template $y_0$ is the top central shape, the chosen five control points $c_0$ are the red crosses, and the momenta $m_0$ the bold blue arrows. Those parameters are completed with $t_0 = 70$, $\sigma_\tau = 1$, $\sigma_\xi = 0.1$. With $n_s = 4$ independent components, we simulate $N = 100$ individual trajectories and sample $\langle n_i \rangle_i = 5$ observations from each.

The algorithm is run ten times. Figure 2 plots the evolution of the error on the parameters along the estimation procedure in log scale. Each color corresponds to a different run: the algorithm converges to the same point each time, as confirmed by the small variances of the residual errors indicated in Table 1. Those residual errors come from the finite number of observations in the generated dataset and from the Bayesian priors, but are satisfyingly small, as qualitatively confirmed by Figure 3.

[Figure 2 consists of six panels plotting, against thousands of iterations: the varifold error on $y_0$, the L2 error on $v_0$, and the L1 errors on $t_0$, $\sigma_\tau$, $\sigma_\xi$ and $\sigma_\epsilon$.]

Figure 2: Error on the population parameters along the estimation procedure, with logarithmic scales. The residual on the template shape $y_0$ is computed with the varifold metric.

$\|\Delta y_0\|^2_{var.}$   $\|\Delta v_0\|^2$   $|\Delta t_0|$   $|\Delta\sigma_\tau|$   $|\Delta\sigma_\xi|$   $|\Delta\sigma_\epsilon|$   $\langle\|\Delta v_i\|^2\rangle_i$   $\langle|\Delta\xi_i|\rangle_i$   $\langle|\Delta\tau_i|\rangle_i$
1.43±5.6%   0.89±0.7%   0.19±2.7%   0.029±13.2%   0.017±7.6%   0.11±0.1%   2.47±1.7%   0.022±6.7%   0.19±0.8%

Table 1: Absolute residual errors on the estimated parameters and associated relative standard deviations across the 10 runs. We note $v_0 = \mathrm{Conv}(c_0, m_0)$ and $v_i = \mathrm{Conv}(c_0, w_i)$. The operator $\langle . \rangle_i$ indicates an average over the index $i$. Residuals are satisfyingly small, as can be seen for $|\Delta t_0|$ for instance when compared with the time-span $\max |\Delta t_{i,j}| = 4$. The low standard deviations suggest that the stochastic estimation procedure is stable and reproduces very similar results at each run.

Figure 3: Estimated mean progression (bottom line, in bold), and three reconstructed individual scenarios (top lines). Input data is plotted in red in the relevant frames, demonstrating the reconstruction ability of the estimated model. Our method is able to disentangle the variability in shape, starting time of the arm movement, and speed.

The estimated mean trajectory, shown in bold, matches the true one, given by the top line of Figure 1. Figure 3 also illustrates the ability of our method to reconstruct continuous individual trajectories.

Personalizing the model to unseen data. Once a model has been learned, i.e. once the parameters $\theta_{map}$ have been estimated, it can easily be personalized to the observations $y_{new}$ of a new subject by maximizing $q(y_{new}, z_{new} \,|\, \theta_{map})$ over the low-dimensional latent variables $z_{new}$. We implemented this maximization procedure with Powell's method [35], and evaluated it by registering the simulated trajectories to the true model. Table 2 gathers the results for the previously introduced dataset with $\langle n_i \rangle_i = 5$ observations per subject, and for extended ones with $\langle n_i \rangle_i = 7$ and 9.

Experiment                      $|\Delta\sigma_\epsilon|$   $\langle|\Delta s_i|\rangle_i$   $\langle|\Delta\xi_i|\rangle_i$   $\langle|\Delta\tau_i|\rangle_i$
$\langle n_i \rangle_i = 5$     0.110    3.34%    37.0%    5.45%
$\langle n_i \rangle_i = 7$     0.095    2.98%    16.2%    3.86%
$\langle n_i \rangle_i = 9$     0.087    2.38%    11.9%    3.28%

Table 2: Residual error metrics for the longitudinal registration procedure, for three simulated datasets. The absolute residual error on $\sigma_\epsilon$ is given; the other errors are given as percentages of the simulation standard deviation.

The parameters are satisfyingly estimated in all configurations: the reconstruction error measured by $|\Delta\sigma_\epsilon|$ remains as low as in the previous experiment (see Table 1, Figure 3). The acceleration factor is the most difficult parameter to estimate with small observation windows of the individual trajectories; at least two observations are needed to obtain a good estimate.
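Personalization reduces to a small derivative-free optimization over the individual latent variables. A minimal sketch with SciPy's Powell optimizer is given below; `log_posterior` stands for $\log q(y_{new}, z_{new} \,|\, \theta_{map})$ as computed by the full geometric model, and is assumed rather than implemented here.

import numpy as np
from scipy.optimize import minimize

def personalize(observations, times, theta_map, n_sources=4):
    # z_new = (tau, xi, s_1, ..., s_{n_s}): 6-dimensional for n_sources = 4.
    z0 = np.zeros(2 + n_sources)

    def negative_log_posterior(z_new):
        tau, xi, sources = z_new[0], z_new[1], z_new[2:]
        # log q(y_new, z_new | theta_map), assumed to wrap the deformation model
        return -log_posterior(observations, times, tau, xi, sources, theta_map)

    result = minimize(negative_log_posterior, z0, method="Powell")
    return result.x, -result.fun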

5.2. Hippocampal atrophy in Alzheimer’s disease

Longitudinal deformations model on MCIc subjects. We extract the T1-weighted magnetic resonance imaging measurements of $N = 100$ subjects from the ADNI database, with $\langle n_i \rangle_i = 7.6$ datapoints on average. Those subjects present mild cognitive impairment, and are eventually diagnosed with Alzheimer's disease (MCI converters, noted MCIc). In a pre-processing phase, the 3D images are affinely aligned and the segmentations of the right-hemisphere hippocampi are transformed into surface meshes. Each affine transformation is then applied to the corresponding mesh, before rigid alignment of the follow-up meshes on the baseline one. The hippocampus is a subcortical brain structure which plays a central role in memory, and experiences atrophy during the development of Alzheimer's disease. We initialize the geodesic population parameters $y_0, c_0, m_0$ with a geodesic regression [15, 16] performed on a single subject. The reference time $t_0$ is initialized to the mean of the observation times $(t_{i,j})_{i,j}$, and $\sigma_\tau^2$ to the corresponding variance. We choose to estimate $n_s = 4$ independent components and initialize the corresponding matrix $A$ to zero, as are the individual latent variables $\tau_i, \xi_i, s_i$. After 10,000 iterations, the parameter estimates stabilized.

Figure 4: Estimated mean progression of the right hippocampi. Successive ages: 69.3y, 71.8y (i.e. the template $y_0$), 74.3y, 76.8y, 79.3y, 81.8y, and 84.3y. The color map gives the norm of the velocity field $\|v_0\|$ on the meshes.

Figure 5: Third independent component. The plotted hippocampi correspond to $s_{i,3}$ successively equal to: -3, -2, -1, 0 (i.e. the template $y_0$), 1, 2 and 3. Note that this component is orthogonal to the temporal trajectory displayed in Figure 4.

Figure 4 plots the estimated mean progression, which exhibits a complex spatiotemporal atrophy pattern during disease progression: a pinching effect at the "joint" between the head and the body, combined with a specific atrophy of the medial part of the tail. Figure 5 plots an independent component, which is orthogonal to the mean progression by construction. This component seems to account for the inter-subject variability in the relative size of the hippocampus head compared to its tail.

We further examine the correlations between the individual parameters and several patient characteristics. Figure 6 exhibits the strong correlation between the estimated individual time-shifts $\tau_i$ and the age at diagnosis $t_i^{diag}$, suggesting that the hippocampal atrophy correlates well with the cognitive symptoms.

Figure 6: Comparison of the estimated individual time-shifts $\tau_i$, displayed as onset ages $t_0 + \tau_i$ (from early to late onset), versus the age at diagnosis $t_i^{diag}$. $R^2 = 0.74$.

The few outliers above the regression line might have better resisted the atrophy of their hippocampus thanks to a higher brain plasticity, in line with the cognitive reserve theory [39]. The few outliers below this line could have developed a subform of the disease, with delayed atrophy of their hippocampi. Further investigation is required to rule out potential convergence issues in the optimization procedure. Figures 7, 8 and 9 propose group comparisons based on the estimated individual parameters: the acceleration factor $\alpha_i$, the time-shift $\tau_i$ and the space-shift $s_{i,3}$ in the direction of the third component (see Figure 5). The distributions of those parameters are significantly different for the Mann-Whitney statistical test when dividing the $N = 100$ MCIc subjects according to gender, APOE4 mutation status, and onset age $t_0 + \tau_i$ respectively.

Figure 7: Distributions of acceleration factors $\alpha_i$ according to gender. Hippocampal atrophy is faster in female subjects (p = 0.045).

Figure 8: Distributions of time-shifts $\tau_i$, displayed as onset ages $t_0 + \tau_i$, according to the number of APOE4 alleles. Hippocampal atrophy occurs earlier in carriers of 1 or 2 alleles (p = 0.017 and 0.015).

Figure 9: Distribution of the third source term $s_{i,3}$ according to the categories $\{\tau_i \leq -3\}$, $\{-3 < \tau_i < 3\}$, and $\{3 \leq \tau_i\}$. Hippocampal atrophy seems to occur later in subjects presenting a lower volume ratio of the hippocampus tail over the hippocampus head (p = 0.0049).

$\langle n_i \rangle_i$    Shape features     Naive feature      All features
1        71%±4.5 [lr]     50%±5.0 [nb]      58%±5.0 [lr]
2        77%±4.3 [lr]     58%±4.9 [5nn]     68%±4.7 [dt]
4        79%±4.1 [svm]    67%±4.7 [5nn]     80%±4.0 [lr]
5        77%±4.2 [nn]     77%±4.2 [lr]      82%±3.8 [nb]
6.86     83%±3.7 [lr]     80%±4.0 [lr]      86%±3.4 [lr]

Table 3: Mean classification scores and associated standard deviations, computed on 10,000 bootstrap samples from the test dataset. Across all tested classifiers (sklearn default hyperparameters), only the best performing one is reported in each case: [lr] logistic regression, [nb] naive Bayes, [5nn] 5 nearest neighbors, [dt] decision tree, [nn] neural network, [svm] linear support vector machine.

Classifying pathological trajectories vs. normal ageing. We processed another hundred individuals from the ADNI database ($\langle n_i \rangle_i = 7.37$), choosing this time control subjects (CN). We form two balanced datasets, each containing 50 MCIc and 50 CN subjects. We learn two distinct longitudinal deformations models on the training MCIc ($N = 50$, $\langle n_i \rangle_i = 8.14$) and CN ($N = 50$, $\langle n_i \rangle_i = 8.08$) subjects. We personalize both models to all 200 subjects, and use the scaled and normalized differences $z_i^{MCIc} - z_i^{CN}$ as feature vectors of dimension 6, on which a list of standard classifiers are trained and tested to predict the label in {MCIc, CN}. For several configurations of the number of observations per test subject $\langle n_i \rangle_i$, we compute confidence intervals by bootstrapping the test set. Table 3 compares the results with a naive approach, which uses as its only feature the slope of individually-fitted linear regressions of the hippocampal volume against age. Classifiers performed consistently better with the features extracted from the longitudinal deformations model, even with a single observation. The classification performance increases with the number of available observations per subject. Interestingly, from $\langle n_i \rangle_i = 4$, pooling the shape and volume features yields an improved performance, suggesting complementarity.
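The classification stage itself relies on standard scikit-learn estimators with default hyperparameters, as stated in Table 3. The sketch below illustrates this last stage only, assuming the 6-dimensional difference features have already been computed by the two personalization runs; the bootstrap loop mirrors the confidence intervals of Table 3, and the function name is illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

def evaluate_classifiers(X_train, y_train, X_test, y_test, n_bootstrap=10000, seed=0):
    # X_*: (n_subjects, 6) difference features z_i^MCIc - z_i^CN; y_*: labels in {0, 1}.
    classifiers = {
        "lr": LogisticRegression(),
        "nb": GaussianNB(),
        "5nn": KNeighborsClassifier(n_neighbors=5),
        "dt": DecisionTreeClassifier(),
        "nn": MLPClassifier(),
        "svm": LinearSVC(),
    }
    rng = np.random.default_rng(seed)
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        predictions = clf.predict(X_test)
        # bootstrap the test set to obtain a mean accuracy and its standard deviation
        accuracies = []
        for _ in range(n_bootstrap):
            idx = rng.integers(0, len(y_test), size=len(y_test))
            accuracies.append(np.mean(predictions[idx] == np.asarray(y_test)[idx]))
        scores[name] = (np.mean(accuracies), np.std(accuracies))
    return scores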

6. Conclusion

We proposed a hierarchical model on a manifold of diffeomorphisms estimating the spatiotemporal distribution of longitudinal shape data. The observed shape trajectories are represented as individual variations of a group average, which can be seen as the mean progression of the population. Both the spatial and the temporal variability are estimated directly from the data, allowing the use of unaligned sequences. This feature is key for applications where no objective temporal markers are available, as is the case for Alzheimer's disease progression for instance, whose onset age and pace of progression vary among individuals. Our model builds on the principles of a generic longitudinal model for manifold-valued data [37]. We provided a coherent theoretical framework for its application to shape data, along with the needed algorithmic solutions for parallel transport and sampling on our specific manifold. We estimated our model with the MCMC-SAEM algorithm on both simulated and real data. The simulation experiments confirmed the ability of the proposed algorithm to retrieve the optimal parameters in realistic scenarios. The application to medical imaging data, namely segmented hippocampi of patients with Alzheimer's disease, delivered results coherent with medical knowledge, and provided more detailed insights into the complex atrophy pattern of the hippocampus and its variability across patients. In future work, the proposed method will be leveraged for automatic diagnosis and prognosis purposes. Further investigations are also needed to evaluate the algorithm's convergence with respect to the number of individual samples.

Acknowledgments. This work has been partly funded by the European Research Council with grant 678304, European Union's Horizon 2020 research and innovation program with grant 666992, and the program Investissements d'avenir ANR-10-IAIHU-06.


References

[1] S. Allassonnière, S. Durrleman, and E. Kuhn. Bayesian mixed effect atlas estimation with a diffeomorphic deformation model. SIAM Journal on Imaging Sciences, 8:1367–1395, 2015.
[2] S. Allassonnière, E. Kuhn, and A. Trouvé. Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study. Bernoulli, 16(3):641–678, 2010.
[3] Y. F. Atchadé. An adaptive version for the Metropolis adjusted Langevin algorithm with a truncated drift. Methodology and Computing in Applied Probability, 8(2):235–254, 2006.
[4] M. Banerjee, R. Chakraborty, E. Ofori, M. S. Okun, D. E. Vaillancourt, and B. C. Vemuri. A nonlinear regression technique for manifold valued data with applications to medical image analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4424–4432, 2016.
[5] F. Beg, M. Miller, A. Trouvé, and L. Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. IJCV, 2005.
[6] R. Chakraborty, M. Banerjee, and B. C. Vemuri. Statistics on the space of trajectories for longitudinal data analysis. In Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on, pages 999–1002. IEEE, 2017.
[7] N. Charon and A. Trouvé. The varifold representation of nonoriented shapes for diffeomorphic registration. SIAM Journal on Imaging Sciences, 6(4):2547–2580, 2013.
[8] G. E. Christensen, R. D. Rabbitt, and M. I. Miller. Deformable templates using large deformation kinematics. IEEE Transactions on Image Processing, 5(10):1435–1447, 1996.
[9] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.
[10] B. Delyon, M. Lavielle, and E. Moulines. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics, pages 94–128, 1999.
[11] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[12] S. Durrleman, S. Allassonnière, and S. Joshi. Sparse adaptive parameterization of variability in image ensembles. IJCV, 101(1):161–183, 2013.
[13] S. Durrleman, X. Pennec, A. Trouvé, J. Braga, G. Gerig, and N. Ayache. Toward a comprehensive framework for the spatiotemporal statistical analysis of longitudinal shape data. International Journal of Computer Vision, 103(1):22–59, May 2013.
[14] S. Durrleman, X. Pennec, A. Trouvé, G. Gerig, and N. Ayache. Spatiotemporal atlas estimation for developmental delay detection in longitudinal datasets. In Med Image Comput Comput Assist Interv, pages 297–304. Springer, 2009.
[15] J. Fishbaugh, M. Prastawa, G. Gerig, and S. Durrleman. Geodesic regression of image and shape data for improved modeling of 4D trajectories. In ISBI 2014 - 11th International Symposium on Biomedical Imaging, pages 385–388, Apr. 2014.
[16] T. Fletcher. Geodesic regression and the theory of least squares on Riemannian manifolds. IJCV, 105(2):171–185, 2013.
[17] P. Gori, O. Colliot, L. Marrakchi-Kacem, Y. Worbe, C. Poupon, A. Hartmann, N. Ayache, and S. Durrleman. A Bayesian framework for joint morphometry of surface and curve meshes in multi-object complexes. Medical Image Analysis, 35:458–474, Jan. 2017.
[18] J. Hinkle, P. Muralidharan, P. T. Fletcher, and S. Joshi. Polynomial Regression on Riemannian Manifolds, pages 1–14. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[19] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis, volume 46. John Wiley & Sons, 2004.
[20] S. C. Joshi and M. I. Miller. Landmark matching via large deformation diffeomorphisms. IEEE Transactions on Image Processing, 9(8):1357–1370, 2000.
[21] D. G. Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16(2):81–121, 1984.
[22] H. J. Kim, N. Adluru, M. D. Collins, M. K. Chung, B. B. Bendlin, S. C. Johnson, R. J. Davidson, and V. Singh. Multivariate general linear models (MGLM) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2705–2712, 2014.
[23] H. J. Kim, N. Adluru, H. Suri, B. C. Vemuri, S. C. Johnson, and V. Singh. Riemannian nonlinear mixed effects models: Analyzing longitudinal deformations in neuroimaging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[24] I. Koval, J.-B. Schiratti, A. Routier, M. Bacci, O. Colliot, S. Allassonnière, S. Durrleman, A. D. N. Initiative, et al. Statistical learning of spatiotemporal patterns from longitudinal manifold-valued networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 451–459. Springer, 2017.
[25] M. Lavielle. Mixed Effects Models for the Population Approach: Models, Tasks, Methods and Tools. CRC Press, 2014.
[26] M. Lorenzi, N. Ayache, G. Frisoni, and X. Pennec. 4D registration of serial brain MR images: a robust measure of changes applied to Alzheimer's disease. Spatio Temporal Image Analysis Workshop (STIA), MICCAI, 2010.
[27] M. Lorenzi, N. Ayache, and X. Pennec. Schild's ladder for the parallel transport of deformations in time series of images. Pages 463–474. Springer, 2011.
[28] M. Louis, A. Bône, B. Charlier, S. Durrleman, A. D. N. Initiative, et al. Parallel transport in shape analysis: a scalable numerical scheme. In International Conference on Geometric Science of Information, pages 29–37. Springer, 2017.
[29] M. I. Miller, A. Trouvé, and L. Younes. Geodesic shooting for computational anatomy. Journal of Mathematical Imaging and Vision, 24(2):209–228, 2006.
[30] M. I. Miller and L. Younes. Group actions, homeomorphisms, and matching: A general framework. International Journal of Computer Vision, 41(1-2):61–84, 2001.
[31] P. Muralidharan and P. T. Fletcher. Sasaki metrics for analysis of longitudinal data on manifolds. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1027–1034. IEEE, 2012.
[32] M. Niethammer, Y. Huang, and F.-X. Vialard. Geodesic regression for image time-series. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 655–662. Springer, 2011.
[33] X. Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, 2006.
[34] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, 2006.
[35] M. J. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 7(2):155–162, 1964.
[36] G. O. Roberts, A. Gelman, W. R. Gilks, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120, 1997.
[37] J.-B. Schiratti, S. Allassonnière, O. Colliot, and S. Durrleman. Learning spatiotemporal trajectories from manifold-valued longitudinal data. In NIPS 28, pages 2404–2412, 2015.
[38] N. Singh, J. Hinkle, S. Joshi, and P. T. Fletcher. Hierarchical geodesic models in diffeomorphisms. IJCV, 117(1):70–92, 2016.
[39] Y. Stern. Cognitive reserve and Alzheimer disease. Alzheimer Disease & Associated Disorders, 20(2):112–117, 2006.
[40] J. Su, S. Kurtek, E. Klassen, A. Srivastava, et al. Statistical analysis of trajectories on Riemannian manifolds: bird migration, hurricane tracking and video surveillance. The Annals of Applied Statistics, 8(1):530–552, 2014.
[41] J. Su, A. Srivastava, F. D. de Souza, and S. Sarkar. Rate-invariant analysis of trajectories on Riemannian manifolds with application in visual speech recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 620–627, 2014.
[42] M. Vaillant and J. Glaunès. Surface matching via currents. In Information Processing in Medical Imaging, pages 1–5. Springer, 2005.
[43] L. Younes. Shapes and Diffeomorphisms. Applied Mathematical Sciences. Springer Berlin Heidelberg, 2010.
[44] M. Zhang, N. Singh, and P. T. Fletcher. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In IPMI, volume 23, pages 37–48, 2013.


Appendix: supplementary material

We introduce the onset age individual random variable $t_i = t_0 + \tau_i \sim \mathcal{N}(t_0, \sigma_\tau^2)$ instead of the time-shift $\tau_i$. The obtained hierarchical model is equivalent to the one presented in Section 3, with unchanged parameters $\theta = (\bar{y}_0, \bar{c}_0, \bar{m}_0, \bar{A}, t_0, \sigma_\tau^2, \sigma_\xi^2, \sigma_\epsilon^2)$ and equivalent random effects $z = (z_{pop}, z_1, ..., z_N)$, where $z_{pop} = (y_0, c_0, m_0, A)$ and $\forall i \in [\![1, N]\!]$, $z_i = (t_i, \xi_i, s_i)$. The complete log-likelihood writes:

$\log q(y, z, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} \log q(y_{i,j} | z, \theta) + \log q(z_{pop} | \theta) + \sum_{i=1}^{N} \log q(z_i | \theta) + \log q(\theta)$   (6)

where the densities $q(y_{i,j} | z, \theta)$, $q(z_{pop} | \theta)$, $q(z_i | \theta)$ and $q(\theta)$ are given, up to an additive constant, by:

$-2 \log q(y_{i,j} | z, \theta) + cst = \Lambda \log \sigma_\epsilon^2 + \| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2 / \sigma_\epsilon^2$   (7)

$-2 \log q(z_{pop} | \theta) + cst = |y_0| \log \sigma_y^2 + \| y_0 - \bar{y}_0 \|^2 / \sigma_y^2 + |c_0| \log \sigma_c^2 + \| c_0 - \bar{c}_0 \|^2 / \sigma_c^2 + |m_0| \log \sigma_m^2 + \| m_0 - \bar{m}_0 \|^2 / \sigma_m^2 + |A| \log \sigma_A^2 + \| A - \bar{A} \|^2 / \sigma_A^2$   (8)

$-2 \log q(z_i | \theta) + cst = \log \sigma_\tau^2 + (t_i - t_0)^2 / \sigma_\tau^2 + \log \sigma_\xi^2 + \xi_i^2 / \sigma_\xi^2 + \| s_i \|^2$   (9)

$-2 \log q(\theta) + cst = \| \bar{y}_0 - \bar{\bar{y}}_0 \|^2 / \varsigma_y^2 + \| \bar{c}_0 - \bar{\bar{c}}_0 \|^2 / \varsigma_c^2 + \| \bar{m}_0 - \bar{\bar{m}}_0 \|^2 / \varsigma_m^2 + \| \bar{A} - \bar{\bar{A}} \|^2 / \varsigma_A^2 + (t_0 - \bar{t}_0)^2 / \varsigma_t^2 + m_\tau \log \sigma_\tau^2 + m_\tau \sigma_{\tau,0}^2 / \sigma_\tau^2 + m_\xi \log \sigma_\xi^2 + m_\xi \sigma_{\xi,0}^2 / \sigma_\xi^2 + m_\epsilon \log \sigma_\epsilon^2 + m_\epsilon \sigma_{\epsilon,0}^2 / \sigma_\epsilon^2$   (10)

noting $\Lambda$ the dimension of the space in which the residual $\| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2$ is computed, and $|y_0|$, $|c_0|$, $|m_0|$ and $|A|$ the total dimensions of $y_0$, $c_0$, $m_0$ and $A$ respectively. We chose either the current [42] or the varifold [7] norm for the residuals.

Noticing the identity $\eta_{c_0, m_0, t_0, \psi_i(t_{i,j})} = \eta_{c_0, m_0, 0, \psi_i(t_{i,j}) - t_0}$, the complete log-likelihood can be decomposed into $\log q(y, z, \theta) = \big\langle S(y, z), \phi(\theta) \big\rangle_{\mathrm{Id}} - \Psi(\theta)$, i.e. the proposed mixed-effects model belongs to the curved exponential family.

In this setting, the MCMC-SAEM algorithm presented in Section 4 has proven convergence.

Exhibiting the sufficient statistics $S_1 = y_0$, $S_2 = c_0$, $S_3 = m_0$, $S_4 = A$, $S_5 = \sum_i t_i$, $S_6 = \sum_i t_i^2$, $S_7 = \sum_i \xi_i^2$ and $S_8 = \sum_i \sum_j \| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2$ (see Section 4.5), the update of the model parameters $\theta \leftarrow \theta^\star$ in the M step of the MCMC-SAEM algorithm can be derived in closed form:

$\bar{y}_0^\star = \big[ \varsigma_y^2\, S_1 + \sigma_y^2\, \bar{\bar{y}}_0 \big] / \big[ \varsigma_y^2 + \sigma_y^2 \big]$        $t_0^\star = \big[ \varsigma_t^2\, S_5 + \sigma_\tau^{2\star}\, \bar{t}_0 \big] / \big[ N \varsigma_t^2 + \sigma_\tau^{2\star} \big]$   (11)

$\bar{c}_0^\star = \big[ \varsigma_c^2\, S_2 + \sigma_c^2\, \bar{\bar{c}}_0 \big] / \big[ \varsigma_c^2 + \sigma_c^2 \big]$        $\sigma_\tau^{2\star} = \big[ S_6 - 2\, t_0^\star S_5 + N t_0^{\star 2} + m_\tau \sigma_{\tau,0}^2 \big] / \big[ N + m_\tau \big]$   (12)

$\bar{m}_0^\star = \big[ \varsigma_m^2\, S_3 + \sigma_m^2\, \bar{\bar{m}}_0 \big] / \big[ \varsigma_m^2 + \sigma_m^2 \big]$        $\sigma_\xi^{2\star} = \big[ S_7 + m_\xi \sigma_{\xi,0}^2 \big] / \big[ N + m_\xi \big]$   (13)

$\bar{A}^\star = \big[ \varsigma_A^2\, S_4 + \sigma_A^2\, \bar{\bar{A}} \big] / \big[ \varsigma_A^2 + \sigma_A^2 \big]$        $\sigma_\epsilon^{2\star} = \big[ S_8 + m_\epsilon \sigma_{\epsilon,0}^2 \big] / \big[ \Lambda N \langle n_i \rangle_i + m_\epsilon \big]$   (14)

The intricate update of the parameters $t_0 \leftarrow t_0^\star$ and $\sigma_\tau^2 \leftarrow \sigma_\tau^{2\star}$ can be solved by iterative replacement.
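A minimal sketch of this iterative replacement, under the reconstructed notation above: the two coupled closed-form updates (11) and (12) are simply alternated until the values stabilize. Variable names are illustrative.

def update_t0_and_sigma_tau2(S5, S6, N, t0_bar, varsigma_t2, m_tau, sigma_tau0_2,
                             t0_init, sigma_tau2_init, n_iter=50, tol=1e-8):
    # Alternate the coupled closed-form updates (11) and (12) until convergence.
    t0, sigma_tau2 = t0_init, sigma_tau2_init
    for _ in range(n_iter):
        t0_new = (varsigma_t2 * S5 + sigma_tau2 * t0_bar) / (N * varsigma_t2 + sigma_tau2)
        sigma_tau2_new = (S6 - 2.0 * t0_new * S5 + N * t0_new ** 2
                          + m_tau * sigma_tau0_2) / (N + m_tau)
        converged = abs(t0_new - t0) < tol and abs(sigma_tau2_new - sigma_tau2) < tol
        t0, sigma_tau2 = t0_new, sigma_tau2_new
        if converged:
            break
    return t0, sigma_tau2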

Similarly to Equation (6), the tempered complete log-likelihood writes:

$\log q^T(y, z, \theta) = \sum_{i=1}^{N} \sum_{j=1}^{n_i} \log q^T(y_{i,j} | z, \theta) + \log q^T(z_{pop} | \theta) + \sum_{i=1}^{N} \log q(z_i | \theta) + \log q(\theta)$   (15)

with:

$-2 \log q^T(y_{i,j} | z, \theta) + cst = \Lambda \log(T \sigma_\epsilon^2) + \| y_{i,j} - \eta_{c_0, m_0, t_0, \psi_i(t_{i,j})}(w_i) \star y_0 \|^2 / (T \sigma_\epsilon^2)$   (16)

$-2 \log q^T(z_{pop} | \theta) + cst = |y_0| \log(T \sigma_y^2) + \| y_0 - \bar{y}_0 \|^2 / (T \sigma_y^2) + |c_0| \log(T \sigma_c^2) + \| c_0 - \bar{c}_0 \|^2 / (T \sigma_c^2) + |m_0| \log(T \sigma_m^2) + \| m_0 - \bar{m}_0 \|^2 / (T \sigma_m^2) + |A| \log(T \sigma_A^2) + \| A - \bar{A} \|^2 / (T \sigma_A^2)$   (17)

Tempering can therefore be understood as an artificial increase of the variances $\sigma_\epsilon^2$, $\sigma_y^2$, $\sigma_c^2$, $\sigma_m^2$ and $\sigma_A^2$ when computing the associated acceptance ratios in the S-MCMC step of the algorithm. This intuition is well explained in [25].