optimal transport for diffeomorphic registration -...

1
Optimal transport for diffeomorphic registration We define a fidelity term based on Optimal Transport to compare unlabeled shape data, and couple it we a registration algorithm. Jean Feydy 1,2 Benjamin Charlier 3,5 François-Xavier Vialard 4,6 Gabriel Peyré 1,5 1 DMA – École Normale Supérieure, Paris, France 2 CMLA – ENS Cachan, Cachan, France 3 Institut Montpelliérain Alexander Grothendieck, Univ. Montpellier, Montpellier, France 4 Univ. Paris-Dauphine - PSL Research, Paris, France 5 CNRS, Paris, France 6 INRIA Mokaplan, Paris, France The Measure+Kernel Paradigm We focus on the registration of a shape A to a shape B through a (rigid, diffeomorphic, etc.) transformation ϕ : A -→ ϕ(A) B. In a variational setting, one chooses a transforma- tion ϕ minimizing an energy E (ϕ)= Reg(ϕ) | {z } Regularization + d(ϕ(A) B ) | {z } fidelity term . Registration toolboxes thus require fidelity rou- tines d between unlabeled shapes ϕ(A) and B . Conveniently, one represents those as measures: ϕ(A) μ = I X i=1 p i δ x i and B ν = J X j =1 q j δ y j . If ϕ(A) is a segmented surface, each weighted dirac p i δ x i stands for a triangle. If ϕ(A) is a segmented density image, each weighted dirac p i δ x i stands for a voxel. Then, one typically chooses a blurring function G σ associated to a kernel k = G σ G σ and use d(μ ν )= G σ μ - G σ ν 2 L 2 = μ-ν | k(μ-ν ) . This simple fidelity can be computed at the cost of a single convolution through the data (μ - ν ). Fig. 1: Smoothed data G σ (μ - ν ) for two dif- ferent scales σ . (a) Fine kernels are not suited to large deformations, whereas (b) heavy-tailed ker- nels can be hard to tune. Our Contribution An Optimal Transport plan is a rough global matching, akin to a spring system. It may tear shapes apart. source A target B warped grid ϕ(R d ) OT plan Γ AB Use it to drive a smooth registration. This global gradient allows a smooth registration toolbox to reach quickly a satisfying overlap. warped source ϕ(A) Wait a little bit... source registered to the target At convergence, the smallest details can be retrieved without recurring to a coarse- -to-fine scheme. The proposed data attachment term is: Global, unlike kernel methods. Principled, as it relies on a blooming mathematical field. Differentiable, pluggable in any registration toolbox. Versatile, as it covers all scales and can be adapted to any feature space. Affordable, at a cost of 100-1000 gaussian convolutions per transport plan. The Math Behind It In the simplest setting, we assume that μ = I X i=1 p i δ x i and ν = J X j =1 q j δ y j have the same total mass. Then, we define the Wasserstein fidelity term through a minimization on I -by-J matrices – called transport plans Γ: W ε (μ, ν ) | {z } = d(μν ) = min Γ X i,j γ i,j ·|x i - y j | 2 | {z } transport cost + ε X i,j γ i,j log γ i,j | {z } entropic regularization under the constraint that Γ=(γ i,j ) satisfies i, j, γ i,j > 0, X j γ i,j = p i , X i γ i,j = q j . (1) Optimality conditions show that the OT plan can be written as a product γ i,j = Γ(x i y j )= a(x i ) k (x i ,y j ) b(y j ), where: The kernel function k is given by k (x i ,y j )= k (x i - y j )= e -|x i -y j | 2 . a and b are nonnegative functions supported respectively by {x i } and {y j }. The Sinkhorn theorem then asserts that a and b are uniquely determined by eq. (1), which now reads a = p kb , b = q ka . ε ε Fig. 2 : Two OT plans computed with different reg- ularization scales ε. Increasing this parameter results in a lower computational cost. The Algorithm Sinkhorn Iterative Algorithm Parameter : k : x e -|x| 2 Input : source μ = i p i δ x i target ν = j q j δ y j Output : fidelity W ε (μ, ν ) 1: a ones(size(p)) 2: b ones(size(q )) 3: while updates > tol do 4: a p/ (kb) 5: b q/ (ka) 6: return ε · ( p, log(a)+1/2 + q, log(b)+1/2 ) If the data lies on a grid, k · is a separable gaussian convolution. If the data is sparse, k · is the prod- uct with the kernel matrix (K ij )= k (x i ,y j ) and its transpose. In Practice it. 10 it. 50 it. 677 it. 100 Fig. 3 : Sinkhorn iterations propagate along both shapes the information encoded within (K ij ). The practical convergence rate of the Sinkhorn algorithm is not well un- derstood yet, but computing an opti- mal transport plan typically requires 1000 convolutions, depending on ε. Our Matlab and Python imple- mentations are freely available: github.com/jeanfeydy/lddmm-ot Bonus Features -----→ Fig. 4 : Use OT plans to registrate ex- otic data types. Use Unbalanced Transport, relaxing the constraints of eq. (1) with a soft penalty term. Generalize the algorithm to Features Spaces such as the “position + orientation” space. Compute seamlessly the derivatives of the fidelity. Implement the algorithm in the log-domain with Nesterov acceleration for increased numerical stability and speed. Take-Home Points The Sinkhorn algorithm, an iterative globalization trick, provides small kernels with long-distance vision. Computed at the cost of a few hundred convolutions, Optimal Transport plans can be used as spring systems driving a diffeomorphic registration routine. The resulting framework is more robust than a kernel-based one, as no target data is “out-of-sight”. This new scheme will find its use at the coarsest scales, where its properties are worth the computational overhead. References 1. Al-Rfou, R., Alain, G., Almahairi, A., Angermüller, C., Bahdanau, D., Ballas, N., ..., Zhang, Y.: Theano: A python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016) 2. Avants, B., Epstein, C., Grossman, M., Gee, J.: Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12(1), 26 – 41 (2008) 3. Charon, N., Trouvé, A.: The varifold representation of nonoriented shapes for dif- feomorphic registration. SIAM J. Imaging Sciences 6(4), 2547–2580 (2013) 4. Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling algorithms for unbal- anced transport problems. Preprint 1607.05816, Arxiv (2016) 5. Chizat, L., Schmitzer, B., Peyré, G., Vialard, F.X.: An interpolating distance be- tween optimal transport and Fisher–Rao. to appear in Foundations of Computa- tional Mathematics (2016) 6. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transportation. In: Proc. NIPS, vol. 26, pp. 2292–2300 (2013) 7. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Proceed- ings of the 31st International Conference on Machine Learning (ICML), JMLR WCP. vol. 32 (2014) 8. Gee, J.C., Gee, J.C., Reivich, M., Reivich, M., Bajcsy, R., Bajcsy, R.: Elastically deforming a three-dimensional atlas to match anatomical brain images. J. Comput. Assist. Tomogr 17, 225–236 (1993) 9. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., de Vico Fallani, F., Chavez, M., Lecomte, S., Poupon, C., Hartmann, A., Ayache, N., Durrleman, S.: A proto- type representation to approximate white matter bundles with weighted currents. Proceedings of Medical Image Computing and Computer Aided Intervention (MIC- CAI’14) Part III LNCS 8675, 289–296 (2014) 10. Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures. ArXiv e-prints (2015) 11. Montavon, G., Müller, K.R., Cuturi, M.: Wasserstein training of restricted Boltz- mann machines. In: Adv. in Neural Information Processing Systems (2016) 12. Santambrogio, F.: Optimal transport for applied mathematicians. Progress in Non- linear Differential Equations and their applications 87 (2015) 13. Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and rkhs-based statistics in hypothesis testing. Ann. Statist. 41(5), 2263–2291 (10 2013) 14. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: A survey. IEEE Transactions on Medical Imaging 32(7), 1153–1190 (July 2013) 15. Vaillant, M., Glaunès, J.: Surface matching via currents. In: Biennial International Conference on Information Processing in Medical Imaging. pp. 381–392. Springer (2005) Contact Information Jean Feydy, PhD student with Prof. Alain Trouvé, DMA, ENS Paris + CMLA, ENS Cachan. Mail : [email protected] Web : www.math.ens.fr/~feydy 1

Upload: dodung

Post on 11-Apr-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimal transport for diffeomorphic registration - DMA/ENSfeydy/Talks/MICCAI_2017/Poster_MICCAI2017.pdf · 15.Vaillant,M.,Glaunès,J.:Surfacematchingviacurrents.In:Biennial ... Jean

Optimal transport for diffeomorphic registrationWe define a fidelity term based on Optimal Transport to compare unlabeled shape data, and couple it we a registration algorithm.

Jean Feydy1,2 Benjamin Charlier3,5 François-Xavier Vialard4,6 Gabriel Peyré1,5

1DMA – École Normale Supérieure, Paris, France 2CMLA – ENS Cachan, Cachan, France3Institut Montpelliérain Alexander Grothendieck, Univ. Montpellier, Montpellier, France

4Univ. Paris-Dauphine - PSL Research, Paris, France 5CNRS, Paris, France 6INRIA Mokaplan, Paris, France

The Measure+Kernel Paradigm

We focus on the registration of a shape A toa shape B through a (rigid, diffeomorphic, etc.)transformation ϕ :

A −→ ϕ(A) ' B.

In a variational setting, one chooses a transforma-tion ϕ minimizing an energy

E(ϕ) = Reg(ϕ)︸ ︷︷ ︸Regularization

+ d(ϕ(A)→ B)︸ ︷︷ ︸fidelity term

.

Registration toolboxes thus require fidelity rou-tines d between unlabeled shapes ϕ(A) and B .Conveniently, one represents those as measures:

ϕ(A)↔ µ =I∑

i=1

piδxiand B ↔ ν =

J∑j=1

qjδyj.

• If ϕ(A) is a segmented surface, each weighteddirac piδxi

stands for a triangle.

• If ϕ(A) is a segmented density image, eachweighted dirac piδxi

stands for a voxel.

Then, one typically chooses a blurring function Gσ

associated to a kernel k = Gσ ? Gσ and use

d(µ→ ν) = ‖Gσ ? µ−Gσ ? ν ‖2L2 = 〈µ−ν | k?(µ−ν) 〉.

This simple fidelity can be computed at the cost ofa single convolution through the data (µ− ν).

Fig. 1 : Smoothed data Gσ ? (µ − ν) for two dif-ferent scales σ. (a) Fine kernels are not suited tolarge deformations, whereas (b) heavy-tailed ker-nels can be hard to tune.

Our Contribution

An Optimal Transportplan is a rough globalmatching, akin toa spring system.

It may tear shapesapart.

source A

target B

warped gridϕ(Rd)

OT planΓA→B

Use it to drive a smooth registration.

This global gradientallows a smoothregistration toolboxto reach quicklya satisfying overlap.warped source

ϕ(A)

Wait a little bit...

source registeredto the target

At convergence, thesmallest details canbe retrieved withoutrecurring to a coarse--to-fine scheme.

The proposed data attachment term is:

• Global, unlike kernel methods.

• Principled, as it relies on a blooming mathematical field.

• Differentiable, pluggable in any registration toolbox.

• Versatile, as it covers all scales and can be adapted to anyfeature space.

• Affordable, at a cost of 100-1000 gaussian convolutions pertransport plan.

The Math Behind It

In the simplest setting, we assume that

µ =I∑

i=1

piδxiand ν =

J∑j=1

qjδyj

have the same total mass. Then, we define theWasserstein fidelity term through a minimizationon I-by-J matrices – called transport plans Γ:

Wε(µ, ν)︸ ︷︷ ︸= d(µ→ν)

= minΓ

∑i,j

γi,j · |xi − yj|2︸ ︷︷ ︸transport cost

+ ε∑i,j

γi,j log γi,j︸ ︷︷ ︸entropic regularization

under the constraint that Γ = (γi,j) satisfies

∀i, j, γi,j > 0,∑

j

γi,j = pi,∑

i

γi,j = qj. (1)

Optimality conditions show that the OT plan canbe written as a product

γi,j = Γ(xi→ yj) = a(xi) k(xi, yj) b(yj), where:

• The kernel function k is given by

k(xi, yj) = k(xi − yj) = e−|xi−yj|2/ε.

• a and b are nonnegative functions supportedrespectively by {xi} and {yj}.

The Sinkhorn theorem then asserts that a and b areuniquely determined by eq. (1), which now reads

a = p

k ? b, b = q

k ? a.

√ε

√ε

Fig. 2 : Two OT plans computed with different reg-ularization scales

√ε. Increasing this parameter

results in a lower computational cost.

The Algorithm

Sinkhorn Iterative AlgorithmParameter : k : x 7→ e−|x|

2/ε

Input : source µ =∑

i piδxi

target ν =∑

j qjδyj

Output : fidelity Wε(µ, ν)1: a← ones(size(p))2: b← ones(size(q))3: while updates > tol do4: a← p / (k ? b)5: b← q / (k ? a)6: return ε ·

(〈 p, log(a) + 1/2 〉

+〈 q, log(b) + 1/2 〉)

If the data lies on a grid, k ? · is aseparable gaussian convolution.

If the data is sparse, k ? · is the prod-uct with the kernel matrix

(Kij) = k(xi, yj)

and its transpose.

In Practice

it. 10 it. 50

it. 677it. 100

Fig. 3 : Sinkhorn iterations propagatealong both shapes the informationencoded within (Kij).

The practical convergence rate of theSinkhorn algorithm is not well un-derstood yet, but computing an opti-mal transport plan typically requires∼1000 convolutions, depending on ε.Our Matlab and Python imple-mentations are freely available:github.com/jeanfeydy/lddmm-ot

Bonus Features

−−−−−→

Fig. 4 : Use OT plans to registrate ex-otic data types.

• Use Unbalanced Transport,relaxing the constraints of eq. (1)with a soft penalty term.

• Generalize the algorithm toFeatures Spaces such as the“position + orientation” space.

• Compute seamlessly thederivatives of the fidelity.

• Implement the algorithm in thelog-domain with Nesterovacceleration for increasednumerical stability and speed.

Take-Home Points

• The Sinkhorn algorithm, aniterative globalization trick,provides small kernels withlong-distance vision.

• Computed at the cost of a fewhundred convolutions, OptimalTransport plans can be used asspring systems driving adiffeomorphic registrationroutine.

• The resulting framework is morerobust than a kernel-based one,as no target data is “out-of-sight”.

• This new scheme will find its useat the coarsest scales, where itsproperties are worth thecomputational overhead.

References1. Al-Rfou, R., Alain, G., Almahairi, A., Angermüller, C., Bahdanau, D.,Ballas, N., ..., Zhang, Y.: Theano: A python framework for fastcomputation of mathematical expressions. CoRR abs/1605.02688 (2016)2. Avants, B., Epstein, C., Grossman, M., Gee, J.: Symmetric diffeomorphicimage registration with cross-correlation: Evaluating automatedlabeling of elderly and neurodegenerative brain. Medical Image Analysis12(1), 26 – 41 (2008)3. Charon, N., Trouvé, A.: The varifold representation of nonorientedshapes for dif- feomorphic registration. SIAM J. Imaging Sciences 6(4),2547–2580 (2013)4. Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Scaling algorithms forunbal- anced transport problems. Preprint 1607.05816, Arxiv (2016)5. Chizat, L., Schmitzer, B., Peyré, G., Vialard, F.X.: An interpolatingdistance be- tween optimal transport and Fisher–Rao. to appear inFoundations of Computa- tional Mathematics (2016)6. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal

transportation. In: Proc. NIPS, vol. 26, pp. 2292–2300 (2013)7. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In:Proceed- ings of the 31st International Conference on Machine Learning(ICML), JMLR WCP. vol. 32 (2014)8. Gee, J.C., Gee, J.C., Reivich, M., Reivich, M., Bajcsy, R., Bajcsy, R.:Elastically deforming a three-dimensional atlas to match anatomicalbrain images. J. Comput. Assist. Tomogr 17, 225–236 (1993)9. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., de Vico Fallani, F.,Chavez, M., Lecomte, S., Poupon, C., Hartmann, A., Ayache, N., Durrleman,S.: A proto- type representation to approximate white matter bundleswith weighted currents. Proceedings of Medical Image Computing andComputer Aided Intervention (MIC- CAI’14) Part III LNCS 8675, 289–296(2014)10. Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problemsand a new Hellinger–Kantorovich distance between positive measures.ArXiv e-prints (2015)

11. Montavon, G., Müller, K.R., Cuturi, M.: Wasserstein training ofrestricted Boltz- mann machines. In: Adv. in Neural InformationProcessing Systems (2016)12. Santambrogio, F.: Optimal transport for applied mathematicians.Progress in Non- linear Differential Equations and their applications 87(2015)13. Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.:Equivalence of distance-based and rkhs-based statistics in hypothesistesting. Ann. Statist. 41(5), 2263–2291 (10 2013)14. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical imageregistration: A survey. IEEE Transactions on Medical Imaging 32(7),1153–1190 (July 2013)15. Vaillant, M., Glaunès, J.: Surface matching via currents. In: BiennialInternational Conference on Information Processing in Medical Imaging.pp. 381–392. Springer (2005)

Contact Information

Jean Feydy,PhD student with Prof. Alain Trouvé,DMA, ENS Paris + CMLA, ENS Cachan.

Mail : [email protected] : www.math.ens.fr/~feydy

1