towards real-time multimodal fusion for image-guided … · 2014. 1. 17. · towards real-time...

Towards Real-time Multimodal Fusion for Image-Guided Interventions using Self-Similarities

Mattias P. Heinrich1, Mark Jenkinson2, Bartlomiej W. Papiez1, Sir J. Michael Brady3, Julia A. Schnabel1 1Institute of Biomedical Engineering 2Oxford University Centre for Functional MRI of the Brain 3Department of Oncology

University of Oxford, UK

Visual Example of Registration Result

3. Quantisation of Self-similarity Descriptors •  SSC descriptor has 12 elements per voxel (many cost evaluations needed) •  Example of quantisation for continuous valued descriptor (here 4 elements):

Subsampling of voxels within influence region of control points (similar to [5])

4. Validation and Comparison to State-of-the-Art •  13 clinical pre-operative MRI and intra-operative US scan pairs [6]

•  Similarity metrics (using deeds) •  Results from literature

Discussion and Outlook •  Very low registration error (2.12 mm) for challenging US-MRI datasets •  Efficient similarity term evaluation and regularisation (comp. time ~20 secs.) •  Generally applicable: currently 2nd lowest TRE (0.62 mm) and 2nd lowest

fissure error (0.08 %) for EMPIRE lung challenge (out of 33 contestants) → C++ source code freely available on request: [email protected]

References [1] Wachinger: Entropy and laplacian images: structural representations for multi-modal registration, MedIA 2012 [2] Heinrich: MIND: Modality independent neighbourhood descriptor for deformable multi-modal registration, MedIA 2012 [3] Heinrich: MRF-based deformable registration and ventilation estimation of lung CT, TMI 2013 [4] Christensen: Consistent image registration, TMI 2001 [5] De Nigris: Fast rigid registration of pre-operative magnetic resonance images to intra-operative ultrasound for neurosurgery based on high confidence gradient orientations, JCARS 2013 [6] Mercier : Online database of clinical MR and ultrasound images of brain tumors. Med. Phys. 2012 [7] Rivaz: Self-similarity weighted mutual information: A new nonrigid image registration metric, MICCAI 2012

Motivation of Multi-modal Registration for Interventions •  Propagate segmentation information for image-guided radiotherapy •  Overlay of high-quality pre-operative scan for guidance during surgery → requires accurate, reliable and fast multi-modal deformable registration

Challenges for Multi-modal Registrations Algorithms •  Limitations of current similarity metrics for MRI-US registration: - global mutual information fails due to locally varying intensity distortions - structural representations (entropy images [1], MIND [2]) are not robust

enough to strong noise and artefacts of ultrasound - high computation time for complex statistical metrics limits use for

interventions (e.g. deformable registration with α-MI takes up to 2 hours [3])

→ self-similarity descriptor with contextual information: improves noise robustness

→ quantisation scheme with Hamming distance reduces computation time

1. Self-similarities Descriptors for Multi-Modal Similarity •  Obtain structural image representation independent of modality •  Self-similarity descriptor S(I, x, y) (as previously used for MIND [2]):

•  SSD of patches capture most important geometrical features (edges, corners) •  Robust against local change in contrast and intensity relations across scans

Self-similarity layout N for patch sampling locations x and y •  For MIND [2] x is fixed to central patch and y uses 6 direct neighbours → high dependency on central patch decreases noise robustness

•  Self-similarity context (SSC) avoids central patch → use pairs of patches in six neighbourhood (with spatial distance √2) → much improved robustness against image noise (important for ultrasound)

Noise robustness of landmark localisation (additional experiment)

2. Efficient Discrete Optimisation (deeds) [3]

•  Dense sampling of displacement space (133≈2200 displacements at 1st level) •  Three level B-spline transformation grid (control point spacing: 6, 5, and 4 mm) •  Incremental diffusion regularisation using globally optimal belief propagation

(simplified graph structure using a minimum-spanning-tree)

Inverse consistent (diffeomorphic) transformations •  Forward un and backward vn transform are estimated independently •  Iterative calculation of optimal inverse-consistent transformation using:

→ Explicit inverse is not required (improves convergence over [4])

1 2 3 4 5 6 7

2 3 5 10 15 20 30 50 100 150

samples MI samples SSC

Institute of Biomedical Engineering, University of Oxford | [email protected]

Multimodal Fusion for Image-Guided Interventions using Self-Similarities 3

3 Methods

SSC is estimated based on patch-based self-similarities in a similar way as e.g.LSS or MIND, but rather than extracting a representation of local shape orgeometry, it aims to find the context around the voxel of interest. Therefore, thenegative influence of noisy patches can be greatly reduced, making this approachvery suitable for ultrasound registration. Spatial context has been successfullyapplied to object detection [7] and is the driving force of pictorial structures [3].

3.1 Self-similarity context

Self-similarity can be described by a distance function between image patcheswithin one image I (sum of squared differences SSD can be used within thesame scan), a local or global noise estimate σ2, and a certain neighbourhoodlayout N for which self-similarities are calculated. For a patch centred at x, theself-similarity descriptor S(I,x,y) is given by:

S(I,x,y) = exp

�−SSD(x,y)

σ2

�x,y ∈ N (1)

where y defines the centre location of a patch within N . In [4] and [15], theneighbourhood layout was defined to always include the patch centred around x

for pairwise distance calculations. This has the disadvantage that image artefactsor noise within the central patch always have a direct adverse effect on the self-similarity descriptor. We therefore propose to completely avoid using the patchat x for the calculation of the descriptor for this location. Instead, all pairwisedistances of patches within the six neighbourhood (with a Euclidean distance of√2 between them) are used. The spatial layout of this approach is visualised in

Fig. 1 and compared to MIND (showing the central patch in red, patches in Nin grey, and edges connecting pairs for which distances are calculated in blue).The aim of SSC is not to find a good structural representation of the underlyingshape, but rather the context within its neighbourhood. The noise estimate σ2 inEq. 1 is defined to be the mean of all patch distances. Descriptors are normalisedso that max(S) = 1.

Pointwise Multimodal Similarity using SSC: Structural image repre-sentations are appealing, because they enable multi-modal registration usingsimple similarity metric across modalities. Once the descriptors are extractedfor both images, yielding a vector for each voxel, the similarity metric betweenlocations xi and xj in two images I and J can be defined as the sum of abso-lute differences (SAD) between their corresponding descriptors. The distance Dbetween two descriptors is therefore:

D(xi,xj) =1

|N |�

y∈N|S(I,xi,y)− S(J,xj ,y)| (2)

Equation 2 requires |N | computations to evaluate the similarity at one voxel.Discrete optimisation frameworks, as the one employed here, use many cost func-tion evaluations per voxel. In order to speed up the computations, we propose to

MIND SSC

0 1 2 3 4 5 6 7

0 3 6 9 12 15 18 21

MIND 6-NH MIND 26-NH SSC

TR

E in

mm

(be

fore

6.8

mm

)

land

mar

k lo

calis

atio

n er

ror

in m

m

4 M. Heinrich

MIND SSC

Fig. 1. Concept of self-similarity context (SSC) compared to MIND with six-neighbourhood (6-NH). The patch around the voxel of interest is shown in red, allpatches within its immediate 6-NH are shown in grey. Left: All patch distances (shownwith blue lines) used for MIND within the 6-NH take the centre patch into account.Right: Geometrical and structural context can be better described by SSC using allpatch-to-patch distances, none of which is dependent on the central patch.

quantise the descriptor to a single integer value with 64 bits, without significantloss of accuracy. The exact similarity evaluation of Eq. 2 can then be obtainedusing the Hamming distance between two descriptors using only one operationper voxel. A descriptor using self-similarity context consists of 12 elements, forwhich we use 5 bits per element, which translates into 6 different possible values(note that we cannot use a quantisation of 25 because the Hamming weight onlycounts the number of bits, which differ). Figure 2 illustrates the concept, whichcould also be employed for other multi-feature based registration techniques.

3.2 Discrete Optimisation

Discrete optimisation is used in this work, because it is computationally efficient,no derivative of the similarity cost is needed, local minima are avoided and largedeformations can be covered by defining an appropriate range of displacementsu. Here, we adopt our recent approach [5, 6], which uses a block-wise parametrictransformation model with belief propagation on a tree graph (BP-T) [3]. Theregularisation term penalises squared differences of displacements for neighbour-ing control points and is weighted with a constant λ. Not all similarity termswithin the influence region of each control point, but only a randomly chosensubset of them are taken into account. This concept, which has also been usedin stochastic gradient descent optimisation [8] and in [2], greatly reduces thecomputation of the similarity term without loss of accuracy (as shown in [6]).

Inverse Consistent Transformations: We introduce a simple scheme toobtain inverse consistent mappings, given the forward and backward displace-ment fields u

n and vn respectively (which are independently calculated), by

iteratively updating the following equations:

un+1 = 0.5(un − v

n(x+ un)) (3)

vn+1 = 0.5(vn − u

n(x+ vn))

2/5 4/5

0/5 5/5

3/5 5/5

1/5 4/5

2/5 = 00011 4/5 = 01111 0/5 = 00000 5/5 = 11111

3/5 = 00111 5/5 = 11111 1/5 = 00001 4/5 = 01111

binary SSC (image I), 20 bit concatenated

binary SSC (image J), 20 bit concatenated

00011011110000011111 00111111110000101111 ⊕

00100100000000110000

5 bit quantisation

bit-wise exclusive or

bit-count (Hamming weight) = 4 (divide by 20) equivalent to SAD → only one operation

1.5

2

2.5

3

3.5

MI NGF MIND SSC

1.5

2

2.5

3

3.5

α-MI SeSaMI [7] GOA [5]

targ

et r

egist

ratio

n er

ror

in m

m (

5 ca

ses)

befo

re r

egist

ratio

n af

ter

regi

stra

tion

axial plane coronal plane

five samples per control point sufficient for SSC → much faster computation

increasing standard deviation of Gaussian noise (std of scan=416)

normalised M

RI intensities

normalised U

S intensities

towards real-time multimodal fusion for image-guided … · 2014. 1. 17. · towards real-time...

Documents