
A two-stage algorithm for joint multimodal image reconstruction

Yunmei Chen ∗ Bin Li † Xiaojing Ye ‡

Abstract

We propose a new two-stage joint image reconstruction method that recovers edges directly from observed data and then assembles the image using the recovered edges. More specifically, we reformulate joint image reconstruction with vectorial total-variation regularization as an $\ell_1$ minimization problem of the Jacobian of the underlying multi-modality or multi-contrast images. We provide a detailed derivation of the data fidelity for the Jacobian in the Radon and Fourier transform domains. The new minimization problem yields an optimal convergence rate higher than that of existing primal-dual based reconstruction algorithms, and the per-iteration cost remains low by using closed-form matrix-valued shrinkages. We conducted numerical tests on a number of multi-contrast CT and MR image datasets, which demonstrate that the proposed method significantly improves reconstruction efficiency and accuracy compared to state-of-the-art joint image reconstruction methods.

Keywords. Joint image reconstruction, matrix norm, accelerated gradient method.

AMS subject classifications. 90C25, 68U10, 65J22

1 Introduction

The advances of medical imaging technology have allowed simultaneous data acquisition for multi-modality images, such as PET-CT and PET-MRI, and multi-contrast images, such as T1/T2/PD in MRI. Multi-modality/contrast imaging integrates two or more imaging modalities/contrasts into one system to produce more comprehensive observations of the subject. In particular, such technology combines the strengths of different imaging modalities/contrasts in clinical diagnostic imaging, and hence can be much more precise and effective than conventional imaging. In multi-modality/contrast imaging, the images from different modalities or contrasts complement each other, which yields high spatial resolution and better tissue contrast. However, it is a challenging task to use the information from different modalities effectively in image reconstruction, especially when only limited data are available.

In multi-modal/contrast image reconstruction, the image of a subject to be reconstructed is vector-valued at every point (or pixel/voxel in the discrete case) in the image domain $\Omega$. For notational simplicity, we only consider two-dimensional (2D) images in this paper and assume that $\Omega := [0, 1]^2 \subset \mathbb{R}^2$. Then for every $x \in \Omega$, an image of $m$ modalities (or contrasts) can be represented by a function $u : \Omega \to \mathbb{R}^m$ such that $u(x) = (u_1(x), \ldots, u_m(x))^\top \in \mathbb{R}^m$ and $u_j(x) \in \mathbb{R}$ is the intensity

∗Department of Mathematics, University of Florida, Gainesville, Florida 32611, USA. Email: [email protected].

†School of Automation Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China. Email: [email protected].

‡Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia 30303, USA. Email: [email protected].


value of modality $j$ at every point $x \in \Omega$. The variational framework of joint image reconstruction is given by

$$\min_u\; \alpha R(u) + h(u), \quad \text{where } h(u) := \sum_{j=1}^m h_j(u_j), \qquad (1)$$

where $u_j : \Omega \to \mathbb{R}$ is the scalar-valued $j$th component (i.e., the $j$th modality/contrast, also called the $j$th channel) of the image $u$, and $h_j(u_j)$ is the corresponding fidelity term that describes the data acquisition process in the $j$th channel. In (1), the selection of the regularization term $R(u)$ is critical in joint image reconstruction. In principle, a good choice of $R(u)$ can explore the complementary information across different modalities or contrasts to improve reconstruction quality.

1.1 Related work

In the past several years, there has been a significant increase of interest in joint image reconstruction, most of which is under the framework (1) and focuses on the design of the regularization term $R(u)$. Since a main feature often shared by images from different channels is the location of edges, an important class of regularizations extracts image gradients from every channel and compares them. For instance, the parallelism of level sets (PLS) has been proposed as a measure to compare gradients of images [14, 15, 16, 20]. In the case of $m = 2$ and $u = (u_1, u_2)$, $\mathrm{PLS}(u_1, u_2)$ is defined in [15] as

$$\mathrm{PLS}(u_1, u_2) := \int_\Omega f\Big(|du_1(x)||du_2(x)| - g\big(\langle du_1(x), du_2(x)\rangle\big)\Big)\, dx,$$

for some properly chosen functions $f$ and $g$. Here $du_j(x) \in \mathbb{R}^2$ represents the gradient of image $u_j$ at $x$ for $j = 1, 2$. In particular, for $f(s) = s$ and $g(s) = s$, $\mathrm{PLS}(u_1, u_2)$ places a large penalty if $du_1$ and $du_2$ point in different directions [16]. For $f(s) = \sqrt{1 + s}$ and $g(s) = s^2$ [14], or $f(s) = s$ and $g(s) = s^2$ [20], minimizing $\mathrm{PLS}(u_1, u_2)$ enforces $du_1$ and $du_2$ to be parallel, in either the same or opposite directions. However, for images with very different intensity scales, the PLS defined above may penalize high gradient magnitudes rather than align the gradients. To overcome this problem, the generalized Bregman distance (GBD) with respect to the total variation (TV) is used in [27] to compare the gradients of images $u_1$ and $u_2$. The GBD is defined by $D^p_{TV}(u_1, u_2) = TV(u_1) - TV(u_2) - \langle p, u_1 - u_2\rangle$ for some $p \in \partial TV(u_2)$. With a special choice of $p$, in discrete form the GBD can be written as

$$D^p_{TV}(u_1, u_2) = \int_\Omega |du_1| \left(1 - \frac{\langle du_1, du_2\rangle}{|du_1||du_2|}\right) dx.$$

Thus it penalizes the total variation of $u_1$ weighted by the alignment of $du_1$ and $du_2$. Their iterative model then computes each channel using a regularizer formulated as a weighted linear combination of the GBDs to all other image channels from the previous iteration. However, minimizing this GBD promotes gradients in the same direction and hence is not suitable in many applications. To avoid this problem, the infimal convolution of Bregman distances (ICBD) is proposed in [30] as a similarity measure. The ICBD is defined by $\mathrm{ICBD}^p_{TV}(u_1, u_2) := \inf_{u_1 = \phi + \psi} D^p_{TV}(\phi, u_2) + D^{-p}_{TV}(\psi, u_2)$. That is, it uses a local decomposition of an image into two parts, of which one part matches the (sub)gradients of the same direction and the other part matches the (sub)gradients of the opposite direction. As an edge indicator, the ICBD is independent of the sign of jumps. This structure similarity measure has been applied to PET-MRI joint reconstruction [30] and to dynamic MRI reconstruction with anatomical priors and significantly reduced data [31].
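To make the PLS measure above concrete, here is a minimal numpy sketch of its discrete form; the forward-difference helper `gradients` and the default choice $f(s) = g(s) = s$ are our own illustrative assumptions, not code from the cited works.

```python
import numpy as np

def gradients(u):
    """Forward-difference gradient (du/dx, du/dy) of a 2D image."""
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    return gx, gy

def pls(u1, u2, f=lambda s: s, g=lambda s: s):
    """Discrete PLS(u1, u2): sum over pixels of
    f(|du1| * |du2| - g(<du1, du2>))."""
    g1x, g1y = gradients(u1)
    g2x, g2y = gradients(u2)
    mag1 = np.hypot(g1x, g1y)
    mag2 = np.hypot(g2x, g2y)
    inner = g1x * g2x + g1y * g2y
    return np.sum(f(mag1 * mag2 - g(inner)))
```

With $f(s) = g(s) = s$, the summand $|du_1||du_2| - \langle du_1, du_2\rangle$ is nonnegative and vanishes exactly where the two gradients are parallel and point in the same direction, matching the discussion above.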


Inspired by the success of TV regularization in grey-value image reconstruction, many algorithms for joint multi-modal image reconstruction extend the classical TV to a vectorial TV as the regularization. This change aims at capturing the sharable edge information and performing smoothing along the common edges across the modalities. There are numerous ways to define a vectorial TV. For instance, a family of TVs for vector-valued functions, termed collaborative TV (CTV), is proposed as a generalization of classical TV in [13]. Each of these CTVs is a specific combination of norms over the partial derivatives (gradient), components (channels, modalities/contrasts), and points (pixels/voxels). For example, the channel-by-channel TV for the vector-valued function $u(x) = (u_1(x), \ldots, u_m(x))$, defined as $\sum_{j=1}^m TV(u_j)$ in [3], takes the $\ell_2$ norm of the gradient $du_j$ of $u_j$ at every point $x$, then the $\ell_1$ norm over all points, and finally the $\ell_1$ norm over all modalities/channels. Similarly, $\big(\sum_{j=1}^m TV(u_j)^2\big)^{1/2}$ can be defined in the same way except that the last norm is the $\ell_2$ norm over all modalities. Another CTV variation, known as joint TV (JTV), is formulated as $\int_\Omega \big(\sum_{j=1}^m \sum_{l=1}^d |\partial_l u_j|^2\big)^{1/2}\, dx$ and has been successfully applied to color image reconstruction [7, 35], multi-contrast MR image reconstruction [21, 26], joint PET-MRI reconstruction [15], and geophysics [20]. It can be seen that JTV is a specific type of CTV that takes the $\ell_2$ norm over gradients, the $\ell_2$ norm across modalities, and finally the $\ell_1$ norm over points.
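As a concrete discrete example, JTV can be evaluated in a few lines of numpy; the forward-difference discretization and the channel-first array layout here are our own illustrative conventions.

```python
import numpy as np

def jtv(u):
    """Discrete joint TV of a multi-channel image u of shape (m, H, W):
    at each pixel, take the l2 norm over all channels and both partial
    derivatives, then sum (l1 norm) over all pixels."""
    sq = np.zeros(u.shape[1:])
    for uj in u:
        gx = np.diff(uj, axis=1, append=uj[:, -1:])  # horizontal differences
        gy = np.diff(uj, axis=0, append=uj[-1:, :])  # vertical differences
        sq += gx**2 + gy**2
    return np.sum(np.sqrt(sq))
```

For a single channel, this reduces to the usual isotropic discrete TV of that channel.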

Another type of extension of classical TV, termed vectorial TV (VTV), is based on the structure tensor of vector-valued functions. It has been shown that the eigenvalues and eigenfunctions of the structure tensor $J_\rho = k_\rho * du\, du^\top$, where $k_\rho$ is a convolution kernel and $u$ is a scalar-valued function (e.g., a grey-value image), are able to capture the first-order information in a local neighborhood and provide a more robust measure of image variation than classical TV [17, 19, 25, 37]. The idea of using such a structure tensor has been extended to joint multi-modal/contrast image reconstruction. With a geometric interpretation of the structure tensor, it is suggested in [12] to associate a two-dimensional (2D) $m$-vector-valued function $u$ (e.g., an image of $m$ modalities) with a parameterized 2D Riemannian manifold with metric $J = Du\, Du^\top$, where $u$ is a vector-valued image and $Du = [\partial_l u_j]_{l,j} \in \mathbb{R}^{2\times m}$ is formed by stacking the column vectors $du_j$ horizontally. The eigenvector corresponding to the smaller eigenvalue gives the direction along the vectorial edge. In [34], a family of VTVs is formed as the integral of $f(\lambda_+, \lambda_-)$ over the manifold, where $\lambda_+$ and $\lambda_-$ denote the larger and smaller eigenvalues of the metric $J$, respectively, and $f$ is a properly chosen, differentiable, scalar-valued function. For example, one choice of $f$ is $f(\lambda_+, \lambda_-) = \sqrt{\lambda_+ + \lambda_-}$, and $VTV(u)$ becomes
$$VTV_F(u) = \int_\Omega \|Du(x)\|_F\, dx, \qquad (2)$$

where $\|Du(x)\|_F$ is the Frobenius norm of the Jacobian $Du(x) \in \mathbb{R}^{2\times m}$ at $x$. This definition of $VTV_F$ coincides with the JTV mentioned above. Another choice is $f(\lambda_+, \lambda_-) = \sqrt{\lambda_+}$, with which $VTV(u)$ becomes
$$VTV_J(u) = \int_\Omega \|Du(x)\|_2\, dx = \int_\Omega \sigma_1(Du(x))\, dx, \qquad (3)$$

where $\sigma_1(Du(x))$ is the largest singular value of the Jacobian $Du(x)$ at $x$. In [18], this $VTV_J(u)$ is defined in a more general sense for functions in the space of bounded variation (hence the function $u$ does not need to be differentiable) as
$$VTV_J(u) := \sup_{(\xi,\eta)\in K}\; \sum_{k=1}^{d} \int_\Omega u_k\, \mathrm{div}(\eta_k \xi)\, dx \qquad (4)$$
with $K := C_0^1(\Omega; S^m \times S^d)$, where $S^d = \{(z_1, \ldots, z_d) \mid \sum_{l=1}^d z_l = 1,\ z_l \ge 0,\ \forall\, l\}$ is the probability simplex in $\mathbb{R}^d$. The dual formulation (4) is an analogue of the original formulation of classical TV.


It is shown in [18] that $VTV_J$ performs better than $VTV_F$ for color image restoration. For the choice $f(\lambda_+, \lambda_-) = \sqrt{\lambda_+} + \sqrt{\lambda_-}$, $VTV(u)$ reduces to
$$VTV_N(u) = \int_\Omega \|Du(x)\|_N\, dx = \int_\Omega \big(\sigma_1(Du(x)) + \sigma_2(Du(x))\big)\, dx, \qquad (5)$$

where $\|Du(x)\|_N$ is the nuclear norm of the matrix $Du(x)$, i.e., the sum of the singular values of $Du(x)$, which can be interpreted as a convex relaxation of the matrix rank. Minimizing the nuclear norm of $Du$ promotes linear dependence of the gradients across the modalities, i.e., parallel gradients. It has been applied to joint reconstruction of multi-spectral CT data in [32]. Furthermore, the work in [24] considers the patch-wise structure tensor $S_K(u)(x) := \rho_K * Du\, Du^\top$, where $\rho_K(x)$ is a nonnegative, rotationally symmetric convolution kernel (e.g., a Gaussian kernel), so that it takes neighborhood information into account. Then a regularization family, termed structure tensor total variation (STV), is defined by
$$STV_p(u) = \int_\Omega \big\|\big(\sqrt{\lambda_+},\, \sqrt{\lambda_-}\big)\big\|_p\, dx$$

for $p \ge 1$. For $p = 2$, $p = \infty$, and $p = 1$, $STV_p(u)$ coincides with $VTV_F(u)$ in (2), $VTV_J(u)$ in (3), and $VTV_N(u)$ in (5), respectively. Numerical results on color images in [24] show that the STV-based model outperforms models using $VTV_F(u)$ or the second-order total generalized variation $TGV^2(u)$ (see definition below) as a regularization.
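The three singular-value-based regularizers (2), (3), and (5) differ only in how the per-pixel singular values are aggregated; the following numpy sketch makes this explicit. The finite-difference Jacobian and the `norm` keyword are our own conventions, not code from the cited works.

```python
import numpy as np

def vtv(u, norm="F"):
    """Evaluate VTV_F (Frobenius), VTV_J (spectral), or VTV_N (nuclear)
    of a multi-channel image u of shape (m, H, W) by applying the chosen
    matrix norm to the 2 x m forward-difference Jacobian at each pixel."""
    m, H, W = u.shape
    J = np.empty((H, W, 2, m))
    for j in range(m):
        J[:, :, 0, j] = np.diff(u[j], axis=1, append=u[j][:, -1:])  # d/dx
        J[:, :, 1, j] = np.diff(u[j], axis=0, append=u[j][-1:, :])  # d/dy
    s = np.linalg.svd(J, compute_uv=False)   # per-pixel singular values
    if norm == "F":
        return np.sum(np.sqrt(np.sum(s**2, axis=-1)))
    if norm == "2":                          # largest singular value
        return np.sum(s[..., 0])
    if norm == "N":                          # nuclear: sum of singular values
        return np.sum(s)
    raise ValueError(norm)
```

Since $\sigma_1 \le (\sigma_1^2 + \sigma_2^2)^{1/2} \le \sigma_1 + \sigma_2$ pointwise, the ordering $VTV_J(u) \le VTV_F(u) \le VTV_N(u)$ always holds.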

For regularizations that use higher-order gradients, the total generalized variation (TGV) is a common choice. The concept of TGV was first introduced in [6]. The discrete form of the second-order TGV $TGV^2(u)$ of a scalar-valued image $u$ can be written as $TGV^2(u) = \min_w \|\nabla u - w\|_1 + a\|\mathcal{E}w\|_1$ for some weight $a > 0$, where $\mathcal{E}w := (1/2)\cdot(Dw + Dw^\top)$ is the symmetrized gradient. It has been successfully employed as a regularizer in grey-valued image processing [4, 5, 6, 22]. Recently, $TGV^2(u)$ has been extended to vector-valued images for regularization in joint reconstruction of PET and MR images [23] and multi-channel/contrast MR images [10].
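The discrete $TGV^2$ objective for a fixed candidate field $w$ can be evaluated as below; this sketch uses anisotropic $\ell_1$ norms and forward differences (both our own simplifying assumptions) and omits the inner minimization over $w$ that defines $TGV^2(u)$ itself.

```python
import numpy as np

def grad(u):
    """Forward differences with replicated boundary."""
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    return gx, gy

def tgv2_objective(u, w1, w2, a=2.0):
    """Value of ||grad(u) - w||_1 + a * ||E(w)||_1 at a candidate vector
    field w = (w1, w2); TGV^2(u) is the minimum of this over all w."""
    gx, gy = grad(u)
    w1x, w1y = grad(w1)
    w2x, w2y = grad(w2)
    e_off = 0.5 * (w1y + w2x)            # off-diagonal entry of E(w)
    fit = np.abs(gx - w1) + np.abs(gy - w2)
    pen = np.abs(w1x) + np.abs(w2y) + 2 * np.abs(e_off)
    return np.sum(fit) + a * np.sum(pen)
```

Taking $w = 0$ recovers the anisotropic TV of $u$ as an upper bound on $TGV^2(u)$.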

While all the aforementioned methods are based on the framework (1) and differ only in the choice of regularization $R(u)$, the work in [2] takes a different, two-stage approach. More specifically, it reconstructs the gradients of the underlying multi-contrast MR images jointly from their measurements in the Fourier space in the first stage, and then uses the recovered gradients to reconstruct the images in the second stage. In [2], the first stage is carried out under a hierarchical Bayesian framework, where the joint sparsity on gradients across multi-contrast MRI is exploited by sharing the hyper-parameter in their maximum a posteriori (MAP) estimations. The experimental results in [2] show the advantage of using joint sparsity on gradients over conventional sparsity. However, this Bayesian-estimation based method requires extensive computational cost. The advantage of using a two-stage approach to improve quality is also shown in [29, 33] for grey-valued image reconstruction with limited measurements. To the best of our knowledge, there is no other work on such a two-stage approach for joint multi-modal/contrast image reconstruction in the literature.

1.2 Our contribution

Our contribution in this paper is the development of a novel two-stage variational model inspired by the successes of VTV regularization and the edge-based reconstruction approach. In the first stage, the edges (gradients or Jacobian) of multi-modal/contrast images are reconstructed jointly by minimizing the data fidelity of the gradients, rather than that of the image itself, together with a regularization of the gradients derived from VTV. In the second stage, the image is easily reconstructed using the edges obtained in the previous stage.


The main advantage of the proposed method is that it yields an $\ell_1$-type minimization for the edges in the first stage and a simple smooth minimization in the second stage. While the smooth minimization in the second stage is easy to solve and often has a closed-form solution, the $\ell_1$-type minimization in the first stage can be solved by accelerated proximal gradient methods with an optimal, unimprovable convergence rate of $O(1/k^2)$, where $k$ is the iteration number. This is in sharp contrast to TV-based reconstruction methods, for which the best known primal-dual based algorithms, such as the alternating direction method of multipliers (ADMM) and the primal-dual hybrid gradient (PDHG) method, only converge at $O(1/k)$ due to the lack of strong convexity in (1). Therefore, the proposed method can significantly improve the overall effectiveness of joint image reconstruction.

To formulate the objective function of the gradients with $\ell_1$-type regularization in the first stage, we establish the mathematical relation between the edges of an underlying image and the observed imaging data (Section 2.2). Moreover, we show for the first time how the noise in the imaging data affects the data fidelity for the edges (i.e., image gradients) in MRI. In addition, we show that the subproblem of the $\ell_1$-type minimization only involves matrix norms, rather than TVs, and can be solved by a closed-form matrix-valued shrinkage.

We also conduct extensive numerical tests on synthetic and in-vivo multi-contrast MRI and multi-energy CT datasets in this paper. To show the effectiveness of the proposed method, we compare its performance against state-of-the-art primal-dual methods for (1) using different VTVs. The experimental results show very promising performance improvements by the proposed method.

1.3 Organization

The remainder of this paper is organized as follows. We first present the proposed framework and address a number of important questions regarding this new framework in Section 2. In Section 3, we conduct a series of numerical tests on a variety of multi-contrast MR and multi-energy CT image reconstruction problems. Section 4 provides some concluding remarks.

2 Proposed Method

In this section, we propose our two-stage joint image reconstruction method. In the first stage, we recover the multimodal image edges (i.e., gradients or Jacobian of the image) jointly from undersampled Fourier or Radon data. In the second stage, the final image is reconstructed using the edges obtained in the first stage.

For ease of presentation and numerical computation, we treat the image $u_j$, the $j$th modality of image $u$, as a 2D array where each entry represents the intensity value at that pixel location. We may also treat $u_j$ as a single column vector $u_j \in \mathbb{R}^n$ by stacking its columns, where $n$ is the total number of pixels in the image $u_j$.

2.1 Preliminaries on vectorial total-variation regularization

The standard total-variation (TV) regularized inverse problem for image reconstruction can be formulated as the following minimization problem,

$$\min_u\; \alpha\, TV(u) + h(u), \qquad (6)$$

where $h$ represents the data fidelity function, e.g., $h(u) = (1/2)\cdot\|Au - f\|^2$, for some given data sensing matrix $A$ and observed partial/noisy/blurry data $f$. For example, in the basic compressive


sensing MRI model, $A = PF$, where $F$ is the Fourier transform and $P$ represents an undersampling trajectory. In the discrete setting, $F \in \mathbb{C}^{n\times n}$ is the discrete Fourier matrix, and $P \in \mathbb{R}^{p\times n}$ is a binary matrix formed by the $p$ rows of the identity matrix corresponding to the indices of sampled Fourier coefficients. Then $f \in \mathbb{C}^p$ is the undersampled Fourier data. By solving (6), we obtain a solution $u$ that minimizes the sum of the TV regularization term and the data fidelity term in (6), with a weighting parameter $\alpha > 0$ that balances the two terms. It has been shown that TV regularization can effectively recover high-quality images with well-preserved object boundaries from limited and/or noisy data.
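In code, the operator $A = PF$ is typically applied without forming any matrix: $F$ via the FFT and $P$ via a boolean k-space mask. Here is a minimal sketch of the resulting fidelity $h(u)$ under these standard assumptions; the function name and the mask-based implementation are ours.

```python
import numpy as np

def fidelity_mri(u, mask, f):
    """h(u) = 0.5 * ||P F u - f||^2, with F the unitary 2D FFT and P
    implemented as a boolean k-space mask selecting sampled entries."""
    Fu = np.fft.fft2(u, norm="ortho")   # unitary discrete Fourier transform
    residual = Fu[mask] - f             # keep only the sampled coefficients
    return 0.5 * np.sum(np.abs(residual) ** 2)
```

Here `mask` is a boolean array marking the sampled k-space locations, and `f` holds the corresponding measured Fourier coefficients; the fidelity vanishes when `u` is consistent with the measurements.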

In joint multi-modality image reconstruction, the edges of images from different modalities are highly correlated. To take this factor into consideration, the standard TV regularized image reconstruction (6) can be replaced by its vectorial TV (VTV) regularized version:

$$\min_u\; \alpha\, VTV(u) + h(u), \qquad (7)$$

where $VTV(u)$ is a vectorial TV of $u$. In the case that $u$ is continuously differentiable, VTV is a direct extension of standard TV as an "$\ell_1$ of the gradient":

$$VTV(u) := \|Du\|_{\star,1} = \int_\Omega \|Du(x)\|_\star\, dx, \qquad (8)$$

where $Du(x) \in \mathbb{R}^{2\times m}$ is the Jacobian matrix at point $x$, and $\star$ indicates some specific matrix norm. Examples of such norms include those in (2), (3), and (5). There are many other choices for VTV due to the availability of different matrix norms. However, in this paper we focus only on the three norms above, as they are the most widely used in VTV regularized image reconstruction in the literature and appear to perform well. Recall that, for a $d$-by-$m$ matrix $Q = [q_{ij}]$, these matrix norms are defined as:

• Frobenius norm: $\|Q\|_F = \big(\sum_{i,j} |q_{ij}|^2\big)^{1/2}$. This essentially treats $Q$ as a $(dm)$-vector.

• Induced 2-norm: $\|Q\|_2 = \sigma_1$, where $\sigma_1$ is the largest singular value of $Q$. This norm is advocated in [12, 18, 34] with a geometric interpretation when used to define VTV.

• Nuclear norm: $\|Q\|_N = \sum_{i=1}^{\min\{d,m\}} \sigma_i$, where $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ are the singular values of $Q$. This is a convex relaxation of the matrix rank and is advocated in [32].
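All three norms can be read off from the singular values of $Q$, as the following numpy sketch illustrates:

```python
import numpy as np

def matrix_norms(Q):
    """The three matrix norms used in the VTV definitions, computed
    from the singular values of Q."""
    s = np.linalg.svd(Q, compute_uv=False)
    frobenius = np.sqrt(np.sum(s**2))   # same as np.linalg.norm(Q, 'fro')
    induced2 = s[0]                     # largest singular value
    nuclear = np.sum(s)                 # sum of all singular values
    return frobenius, induced2, nuclear
```

For example, for $Q = \mathrm{diag}(3, 4)$ the singular values are $\{4, 3\}$, giving Frobenius norm $5$, induced 2-norm $4$, and nuclear norm $7$.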

It is also worth noting that, in general, the VTV norm with matrix norm $\|\cdot\|_\star$ is defined for any function $u \in L^1_{\mathrm{loc}}(\Omega;\mathbb{R}^m)$ (not necessarily differentiable) as
$$VTV(u) = \sup\left\{\int_\Omega u(x)\, \mathrm{div}(\xi(x))\, dx \;\middle|\; \xi \in C_0^1(\Omega;\mathbb{R}^{2\times m}),\ \|\xi(x)\|_\bullet \le 1,\ \forall\, x \in \Omega\right\} \qquad (9)$$
where $\|\cdot\|_\bullet$ is the dual norm of $\|\cdot\|_\star$. In particular, we remark here that the Frobenius norm is dual to itself, and the induced 2-norm is dual to the nuclear norm (and vice versa). Although we do not always make use of the original VTV definition (9) in the discrete setting, it is helpful in the derivation of the closed-form soft-shrinkage with respect to the corresponding matrix $\star$-norm in Appendix A.

2.2 Joint edge reconstruction

From (8), we can see the two main features of VTV: the matrix $\star$-norm used to evaluate the Jacobian $Du(x)$ at each point $x$, and the $\ell_1$ norm that integrates $\|Du(x)\|_\star$ over all $x$. The former


is to explore edge information in $Du(x)$ from different modalities, and the latter imposes sparsity of $\|Du(x)\|_\star$ across the image domain $\Omega$ to suppress noise and artifacts. However, minimizing an objective function consisting of the nonsmooth VTV regularization demands primal-dual methods (such as ADMM) and, in general, yields only an $O(1/k)$ convergence rate without a strong convexity condition. On the other hand, we realize that this low convergence rate is due to the combination of the Jacobian operator $D$ and the nonsmooth $\ell_1$ norm, which introduces a constraint and requires updates of both primal and dual variables. Since it is $Du(x)$, the Jacobian, that really takes effect in VTV, this motivates us to consider reconstructing the Jacobian $Du$ first rather than the image $u$ directly. In this way, we do not need to deal with the constraint involving the Jacobian operator $D$ from variable splitting, but can work directly on the $\ell_1$-type minimization, which can be solved much more efficiently.

Therefore, the main idea of the two-stage reconstruction algorithm proposed in this paper is to reconstruct the edges (e.g., gradients/Jacobian) of multi-modality images jointly, and then assemble the final image using the reconstructed edges. To this end, we let $v$ denote the Jacobian of $u$. For example, assuming there are three modalities ($m = 3$) and $u(x) = (u_1(x), u_2(x), u_3(x)) \in \mathbb{R}^3$ at every point $x \in \Omega$, then $v(x)$ is the matrix

$$v(x) = \big(du_1(x),\, du_2(x),\, du_3(x)\big) = \begin{pmatrix} v_{11}(x) & v_{12}(x) & v_{13}(x) \\ v_{21}(x) & v_{22}(x) & v_{23}(x) \end{pmatrix} \in \mathbb{R}^{2\times m} \qquad (10)$$

where $v_{lj}(x) := \partial u_j(x)/\partial x_l$ at every $x = (x_1, x_2)$ for $l = 1, 2$ and $j = 1, \ldots, m$. As a result, the VTV of $u$ in (8) simplifies to $\int_\Omega \|v(x)\|_\star\, dx$. If $u$ is not differentiable, then $v$ may become the weak gradient of $u$ if $u \in W^{1,1}(\Omega;\mathbb{R}^m)$, or a vector-valued Radon measure if $u \in BV(\Omega;\mathbb{R}^m)$. However, in common numerical practice, where partial derivatives are approximated by finite differences, such subtlety does not make much difference. Therefore, we treat $v$ as the Jacobian of $u$ throughout the rest of the paper and in the numerical experiments.
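In the discrete setting, $v$ is simply a stack of per-pixel $2\times m$ finite-difference matrices; the following sketch matches (10), with the channel-first array layout being our own convention.

```python
import numpy as np

def jacobian(u):
    """Forward-difference Jacobian of a multi-channel image.
    Input:  u of shape (m, H, W)  (m modalities/channels).
    Output: v of shape (2, m, H, W) with v[l, j] ~ d u_j / d x_l,
    i.e. the 2 x m matrix v(x) of (10) at every pixel x."""
    m, H, W = u.shape
    v = np.zeros((2, m, H, W))
    v[0] = np.diff(u, axis=2, append=u[:, :, -1:])  # horizontal differences
    v[1] = np.diff(u, axis=1, append=u[:, -1:, :])  # vertical differences
    return v
```

The boundary is handled by replication, so the last row/column of differences is zero.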

Assume that we can derive the relation between the Jacobian $v$ and the original observed data $f$, and form a data fidelity $H(v)$ of $v$ as an analogue of the data fidelity $h(u)$ of the image $u$ (justification of this assumption is presented in Section 2.3). Then we can reformulate the VTV regularized image reconstruction problem (7) in $u$ as a matrix $\star$-norm regularized inverse problem in $v$:

$$\min_v\; \alpha\|v\|_{\star,1} + H(v), \qquad (11)$$

where $\|v\|_{\star,1} := \int_\Omega \|v(x)\|_\star\, dx$. Now we seek to reconstruct $v$ from (11), instead of $u$ from (7), in the first stage of our algorithm; then we assemble $u$ by solving (32) below with the reconstructed $v$ in the second stage. The reformulation (11) has two significant advantages over the original formulation (7):

• The formulation (11) is an $\ell_1$-type minimization which can be solved effectively by accelerated proximal gradient methods. For example, using the fast iterative shrinkage/thresholding algorithm (FISTA) [1], we can attain an optimal convergence rate of $O(1/k^2)$ when solving (11), even though it is not strongly convex, where $k$ is the iteration number. More details are given in Section 2.4. This is in sharp contrast to the best known $O(1/k)$ rate of primal-dual based methods (including the recent, successful PDHG and ADMM) for solving (7).

• The per-iteration complexity of the $\ell_1$-type minimization (11) is very low. In particular, we can derive a closed-form solution for $v$ in the iterations for (11) as a matrix $\star$-norm variant of soft-shrinkage. More details are given in Section 2.5.
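A minimal FISTA sketch for (11) with the Frobenius norm follows, where the prox is the per-pixel Frobenius shrinkage (the spectral and nuclear cases require a per-pixel SVD and are treated in Section 2.5). The dense operator `A` acting on the vectorized Jacobian and the function names are illustrative assumptions only, not the paper's actual implementation.

```python
import numpy as np

def frob_shrink(q, tau):
    """Closed-form prox of tau*||.||_F applied to each 2 x m matrix in
    the stack q of shape (N, 2, m): q -> max(1 - tau/||q||_F, 0) * q."""
    norms = np.sqrt(np.sum(q**2, axis=(1, 2), keepdims=True))
    return np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0) * q

def fista(A, f, alpha, shape, n_iter=200):
    """FISTA for min_v alpha * sum_x ||v(x)||_F + 0.5 * ||A vec(v) - f||^2,
    a small dense instance of (11)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad H
    v = np.zeros(shape)
    y, t = v.copy(), 1.0
    for _ in range(n_iter):
        grad = (A.T @ (A @ y.ravel() - f)).reshape(shape)
        v_new = frob_shrink(y - grad / L, alpha / L)   # proximal gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = v_new + ((t - 1.0) / t_new) * (v_new - v)  # Nesterov extrapolation
        v, t = v_new, t_new
    return v
```

In a real reconstruction, `A` would be the (Radon or Fourier) operator from Section 2.3 applied implicitly rather than as a dense matrix.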


In the remainder of this section, we answer the following four questions that are critical in the proposed joint image reconstruction method based on (11):

1. How to design the fidelity $H(v)$ for the Jacobian $v$ from the relation between an image $u$ and its measurement data in the Radon and Fourier transform domains? (Section 2.3)

2. How to solve (11) efficiently with an $O(1/k^2)$ convergence rate? (Section 2.4)

3. How to obtain the closed-form solution of the $v$-subproblem for matrix $\star$-norms when solving (11)? (Section 2.5)

4. How to reconstruct the image $u$ using the Jacobian $v$ obtained from (11)? (Section 2.6)

2.3 Formulating fidelity of Jacobian v

In many imaging technologies, especially medical imaging, the image data are acquired in transform domains. Among these, the Fourier transform and the Radon transform are the two most widely used in medical imaging. In what follows, we show that the relation between an image $u$ and its (undersampled) data $f$ can be converted into one between the Jacobian matrix $v$ and $f$ in both transform domains. This allows us to derive the data fidelity $H(v)$ of the Jacobian $v$ and to reconstruct $v$ from measured CT and MR data using (11).

2.3.1 Radon case

Computed tomography (CT) is a widely used medical imaging technology in clinical practice. CT data are acquired in the Radon transform domain of the image. Under ideal, noiseless conditions, the CT measurement can be modeled by a line integral through the continuous object function. For ease of presentation, we consider the 2D Radon transform of a grey-level (scalar-valued) image $u$ as follows. Let $\{\theta_k : 0 \le \theta_1 < \theta_2 < \cdots < \theta_p < \pi,\ k = 1, \ldots, p\}$ be the $p$ projection angles, and let $x' \in \mathbb{R}$ be the position of a detector on the detector array. Then the line integral of $u$ along direction $\theta_k$ with offset $x'$ from the center of the detector array is formulated as

$$R_{\theta_k}[u](x') := \int_{-\infty}^{\infty} u(x'\cos\theta_k - y'\sin\theta_k,\; x'\sin\theta_k + y'\cos\theta_k)\, dy'. \qquad (12)$$

Here $\{R_\theta[u](x') \mid x' \in \mathbb{R},\ \theta \in [0, \pi)\}$ is the Radon transform of $u$, where $(x', y')$ are the new orthogonal coordinates obtained by rotating $(x, y)$ counterclockwise by angle $\theta$, so that $x'$ ($y'$) is parallel (perpendicular) to the detector array. Let $f_k(x')$ be the measured Radon data $R_{\theta_k}[u](x')$ collected by the detectors on the detector array; then the data fidelity $h$ of $u$ in (7) for CT can be formulated as

$$h(u) = \frac{1}{2}\sum_{k=1}^{p} \big\|R_{\theta_k}[u] - f_k\big\|^2. \qquad (13)$$

On the other hand, by taking the derivative with respect to $x'$ on both sides of (12), we obtain

$$\big(R_{\theta_k}[u]\big)'(x') = \int_{-\infty}^{\infty} \big(\cos\theta_k\, v_1 + \sin\theta_k\, v_2\big)\, dy' = \cos\theta_k\, R_{\theta_k}[v_1] + \sin\theta_k\, R_{\theta_k}[v_2], \qquad (14)$$

where $v_l = v_l(x'\cos\theta_k - y'\sin\theta_k,\; x'\sin\theta_k + y'\cos\theta_k) = \partial_l u(x'\cos\theta_k - y'\sin\theta_k,\; x'\sin\theta_k + y'\cos\theta_k)$ for $l = 1, 2$. Therefore, the gradient $v = du = (v_1, v_2)$ of the image $u$ is related to the derivative of the Radon data $R_{\theta_k}[u]$, i.e., $f_k'(x')$, and the data fidelity of $v$ can be formulated as

$$H(v) = \frac{1}{2}\sum_{k=1}^{p} \big\|\cos\theta_k\, R_{\theta_k}[v_1] + \sin\theta_k\, R_{\theta_k}[v_2] - f_k'\big\|^2. \qquad (15)$$


We point out that the evaluations of $H$ and $\nabla H$ require twice as many Radon transforms as those of $h(u)$, since there are two unknowns $v_1$ and $v_2$ in solving (11) rather than the single $u$ in (7). However, the significantly improved convergence rate $O(1/k^2)$ of (11) over $O(1/k)$ of (7) can well compensate for the increased per-iteration computational cost. This is also demonstrated by the numerical results in Section 3, where we plot relative error versus actual CPU time (with single-core computation). These results show that the proposed method based on (11) is more efficient than existing methods based on (7), as the relative error of the former decays much faster in time. In practice, the improvement may be even more significant, since the computations related to $v_1$ and $v_2$ in $H$ and $\nabla H$ in the proposed method are separable and can be carried out in parallel.

The reformulation above is based on parallel beam CT scans, where the projections are parallel for each angle. For fan/cone beam CT scans, one can derive the reformulation similarly, or convert fan/cone beam CT data to parallel beam data and apply the reformulation above.

To summarize, given the measured Radon data fk for k = 1, . . . , p, we can formulate the data fidelity H(v) in (15) of the gradient/Jacobian v, instead of h(u) in (13) of the image u. Then we can directly solve for v from (11).
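For intuition, the relation (14) can be checked discretely in the axis-aligned case θ = 0, where the Radon projection reduces to summing the image over the y′-axis and the identity says that differentiating the projection equals projecting the partial derivative ∂1u. A minimal numpy sketch (the column sums and forward differences below are our illustrative discretization, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal((64, 64))   # toy image; axis 0 plays the role of x', axis 1 of y'

# Discrete Radon projection at theta = 0: integrate u along y'
proj = u.sum(axis=1)

# Forward difference along x' of the projection ...
d_proj = np.diff(proj)

# ... equals the projection of the forward difference of u along x'
v1 = np.diff(u, axis=0)             # discretized partial derivative of u in x'
proj_v1 = v1.sum(axis=1)

# The two sides of (14) agree (sums commute with differences)
assert np.allclose(d_proj, proj_v1)
```

The same commutation holds for a general angle θ after rotating coordinates, which is exactly what (14) expresses in the continuum.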

2.3.2 Fourier case

Imaging technologies, such as magnetic resonance imaging (MRI) and radar imaging, are based on the Fourier transform of images. The image data are the Fourier coefficients of the image acquired in the Fourier domain (also known as the k-space in MRI). The inverse problem of image reconstruction in such technologies often refers to recovering the image from partial (i.e., undersampled) Fourier data.

In the discrete setting, let u ∈ Rn denote a 2D gray-level (scalar-valued) MR image to be reconstructed, F ∈ Cn×n the (discrete) Fourier transform matrix (hence a unitary matrix), P ∈ Rp×n the undersampling matrix with binary values (0 or 1) formed by rows of the identity matrix to represent the undersampling pattern (also called the mask in k-space), and f ∈ Cp the undersampled Fourier data. Then the relation between the underlying image u and the observed partial data f is given by PFu = f. Consequently, the data fidelity term of u can be formed as

h(u) = (1/2)‖PFu − f‖². (16)

The gradient (partial derivatives) of an image can be regarded as a convolution. The Fourier transform, on the other hand, is well known to convert a convolution into a simple point-wise multiplication in the transform domain. In discrete settings, this simply means that D̂l := FDlF∗ is a diagonal matrix, where F∗ is the conjugate transpose of F and Dl ∈ Rn×n is the discrete partial derivative (e.g., forward finite difference) operator along the xl direction (l = 1, 2). This amounts to a straightforward formulation for the data fidelity of v as follows. Let vl be the partial derivative of u in the xl direction, i.e., vl = Dlu, which is the discretized ∂lu; then we have

PFvl = PFDlu = PFDlF∗Fu = PD̂lFu, (17)

where we used the fact that F is unitary and hence F∗F = In, the n × n identity matrix. Furthermore, we know that P⊤P ∈ Rn×n is a diagonal binary matrix and PP⊤ = Ip. Hence, by multiplying both sides by PP⊤, we obtain that

PFvl = (PP⊤)PD̂lFu = P(P⊤P)D̂lFu = PD̂lP⊤PFu = PD̂lP⊤f, (18)

where we used the facts that (P⊤P)D̂l = D̂l(P⊤P) in the third equality, since both P⊤P and D̂l are diagonal, and PFu = f in the last equality. Therefore, the data fidelity term H(v) in (11) can be formulated as

H(v) = (1/2) (‖PFv1 − PD̂1P⊤f‖² + ‖PFv2 − PD̂2P⊤f‖²). (19)

Now we can convert the measured Fourier data f for the image u into Fourier data PD̂lP⊤f for vl. In numerical implementation, this is done simply by multiplying D̂l(ω) ∈ C, which can be easily pre-computed, with f(ω) ∈ C for each of the p sampled Fourier frequencies ω.
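The claim that the Fourier transform diagonalizes the discrete derivative can be verified directly for a 1D periodic forward difference, which is circulant; the operator and conventions below are our illustrative assumptions:

```python
import numpy as np

n = 16
# Periodic forward-difference operator D: (Du)[i] = u[i+1] - u[i], indices mod n
D = np.roll(np.eye(n), -1, axis=0) - np.eye(n)

# Unnormalized DFT matrix: F @ u == np.fft.fft(u); F is symmetric
F = np.fft.fft(np.eye(n))
Finv = np.conj(F) / n                       # inverse of F, since F/sqrt(n) is unitary

Dhat = F @ D @ Finv                         # should be diagonal
off_diag = Dhat - np.diag(np.diag(Dhat))
assert np.max(np.abs(off_diag)) < 1e-10

# The diagonal is the DFT of the first column of the circulant D, precomputable
dhat = np.fft.fft(D[:, 0])
assert np.allclose(np.diag(Dhat), dhat)

# Converting data for u into data for v = Du is then a point-wise product
u0 = np.random.default_rng(1).standard_normal(n)
assert np.allclose(np.fft.fft(D @ u0), dhat * np.fft.fft(u0))
```

The last assertion is exactly the conversion step of (18): multiplying each sampled coefficient of f by the precomputed diagonal entry yields the Fourier data of the derivative.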

Similar to the CT case, the evaluations of H and ∇H require twice as many Fourier transforms as those for h(u). However, the significantly improved convergence rate of the proposed method can well compensate for its increased per-iteration computational cost, as also shown by our numerical results using error versus CPU time comparisons in Section 3.

In clinical MRI scans, it is also common that the Fourier data are not acquired on Cartesian grids, such as those using radial and spiral sampling trajectories. In this case, a preprocessing step called gridding can be applied to the data to interpolate values onto Cartesian grids of the k-space to obtain f for (19).

Noise transformation. The derivation of the data fidelity H(v) in (19) assumes noiseless Fourier measurements f. If f is contaminated by Gaussian noise, then we can derive the data fidelity H(v) to take the noise distribution into consideration. Suppose there are Gaussian noises in the data acquisition such that

f(ω) = PF [u](ω) + e(ω) (20)

for each frequency value ω in the Fourier domain. Here F[u] is the Fourier transform of the image u, and e(ω) = a(ω) + ib(ω) is the noise at frequency ω, where a(ω) and b(ω) are the real and imaginary parts of e(ω) respectively. By multiplying both sides of (20) by D̂l(ω), we see that the fidelity of vl (l = 1, 2 indicates the axes in R²) satisfies

D̂l(ω)f(ω) = PF[vl](ω) + D̂l(ω)e(ω).

Suppose that D̂l(ω) = Al(ω) + iBl(ω), where Al(ω) ∈ R and Bl(ω) ∈ R are the real and imaginary parts of D̂l(ω) respectively, and that the noise e(ω) satisfies a(ω) ∼ N(0, σa²) and b(ω) ∼ N(0, σb²) independently for all frequencies ω. Then we know that the transformed noises D̂l(ω)e(ω) are independent for different ω and distributed as bivariate Gaussians

(real(D̂l(ω)e(ω)), imag(D̂l(ω)e(ω)))⊤ ∼ N(0, Cl(ω)ΣCl(ω)⊤), (21)

where Cl(ω) = [Al(ω), −Bl(ω); Bl(ω), Al(ω)] ∈ R2×2 and Σ = diag(σa², σb²) ∈ R2×2. Therefore, we can readily obtain the maximum likelihood of D̂l(ω)e(ω) and hence the fidelity of vl. In particular, if σa = σb = σ, then real(D̂l(ω)e(ω)) and imag(D̂l(ω)e(ω)) are two i.i.d. N(0, σ²(Al(ω)² + Bl(ω)²)) variables, i.e., N(0, σ²|D̂l(ω)|²). In this case, we denote Ψl = diag(1/|D̂l(ω)|²); then the data fidelity (i.e., negative log-likelihood) of v simply becomes

H(v) = (1/2) (‖PFv1 − PD̂1P⊤f‖²_Ψ1 + ‖PFv2 − PD̂2P⊤f‖²_Ψ2), (22)

which is the same as (19) but with a proper weight Ψl for the data fidelity of vl. In the remainder of this paper, we assume that the standard deviations of the real and imaginary parts are both σ. Furthermore, in numerical experiments, we can pre-compute |D̂l(ω)|² and only perform an additional point-wise division by |D̂l(ω)|² after computing PFvl − PD̂lP⊤f in each iteration.
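For the periodic forward difference, the weights 1/|D̂l(ω)|² have a simple closed form, since |e^{2πiω/n} − 1|² = 2 − 2cos(2πω/n); the zero frequency, where D̂l vanishes, must be excluded. The symbol used below is our assumed discretization, and the zeroing of the DC weight is our illustrative handling:

```python
import numpy as np

n = 256
k = np.arange(n)
# Symbol of the periodic forward difference (our convention; other
# discretizations change this formula)
dhat = np.exp(2j * np.pi * k / n) - 1.0

# |dhat(w)|^2 = 2 - 2cos(2*pi*w/n) = 4 sin^2(pi*w/n)
mod2 = np.abs(dhat) ** 2
assert np.allclose(mod2, 4.0 * np.sin(np.pi * k / n) ** 2)

# Weights 1/|dhat|^2 for (22); frequency 0 is excluded since dhat(0) = 0
psi = np.zeros(n)
psi[1:] = 1.0 / mod2[1:]
assert psi[0] == 0.0 and np.all(np.isfinite(psi))
```

Note that the derivative data carry no information at frequency 0 anyway; this lost DC component is exactly what the assembly step in Section 2.6 recovers from h(u).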


2.4 Edge reconstruction scheme and computational complexity

As we have obtained the expression of H(v), the minimization problem (11) is formulated and can be solved by accelerated proximal gradient methods with the optimal convergence rate of O(1/k²). To this end, we employ the idea of accelerated gradient descent in [28] and obtain the following iterative scheme to solve (11):

v(k+1) = arg min_v α‖v‖⋆,1 + (1/(2τ))‖v − (w(k) − τ∇H(w(k)))‖², (23)

w(k+1) = v(k+1) + αk+1(αk⁻¹ − 1)(v(k+1) − v(k)), (24)

with arbitrary initialization v(0) and w(0) = v(0), and step size τ ∈ (0, 1/L], where L is the Lipschitz constant of ∇H. The subproblem (23) can be solved by a closed-form formula called matrix-valued shrinkage, for which the details will be given in Section 2.5.
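The scheme (23)-(24) is an accelerated proximal gradient (FISTA-type) iteration: a proximal step on the nonsmooth term followed by an extrapolation. The generic structure can be sketched as follows, where a plain vector soft-thresholding stands in for the matrix-valued shrinkage and a toy quadratic H is used; both substitutions are illustrative assumptions, not the paper's operators:

```python
import numpy as np

def soft(x, t):
    """Vector soft-thresholding: the prox of t*||.||_1, standing in here
    for the matrix-valued shrink operator of the paper."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Toy smooth fidelity H(v) = 0.5*||v - b||^2, so grad H(v) = v - b and L = 1
b = np.array([3.0, -0.5, 0.2, -2.0])
grad_H = lambda v: v - b

alpha, tau = 1.0, 1.0                   # regularization weight; step tau <= 1/L
v = np.zeros_like(b)
w = v.copy()
ak = 1.0                                # alpha_0 = 1
for k in range(50):
    v_new = soft(w - tau * grad_H(w), alpha * tau)        # proximal step (23)
    ak_new = 2.0 / (k + 3.0)                              # alpha_{k+1} = 2/(k+3)
    w = v_new + ak_new * (1.0 / ak - 1.0) * (v_new - v)   # extrapolation (24)
    v, ak = v_new, ak_new

# For this H the minimizer is known in closed form: soft(b, alpha)
assert np.allclose(v, soft(b, alpha))
```

In the paper's setting the same two lines per iteration apply, with v a field of 2 × m Jacobian blocks, ∇H evaluated via Radon or Fourier transforms, and soft replaced by shrink⋆ from Section 2.5.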

Now the scheme (23)-(24) generates a sequence of iterates {v(k)}k=0,1,... which yields the optimal convergence rate O(1/k²) in objective function among all first-order gradient based methods applied to (11). For completeness, we summarize this convergence result in the following theorem, and provide a concise proof by closely following [36].

Theorem 1. Let {v(k)} be the iterates generated by (23)-(24) with any initial v(0), and let αk be chosen such that (1 − αk+1)/α²k+1 ≤ 1/α²k, 0 < αk ≤ 1 for all k ≥ 0, and α0 = 1. Suppose H(v) is convex and its gradient is L-Lipschitz continuous. Let v∗ be any optimal solution of min_v φ(v) := α‖v‖⋆,1 + H(v) in (11) and φ∗ = φ(v∗). Then, for all k ≥ 1,

φ(v(k)) − φ∗ ≤ α²k‖v∗ − v(0)‖² / (2τ(1 − αk)) = O(1/k²). (25)

Proof. Define ℓφ(v, w) := α‖v‖⋆,1 + H(w) + 〈∇H(w), v − w〉; that is, ℓφ(v, w) approximates φ(v) by replacing the H(v) part with its first-order Taylor expansion H(w) + 〈∇H(w), v − w〉 at w. Then by the convexity of H and the Lipschitz continuity of ∇H, we have

ℓφ(v, w) ≤ φ(v) ≤ ℓφ(v, w) + (L/2)‖v − w‖² (26)

for all v, w. Denote ŵ(k) = (1 − αk)v(k) + αkv∗. Then we have

φ(v(k+1)) ≤ ℓφ(v(k+1), w(k)) + (L/2)‖v(k+1) − w(k)‖²
 ≤ ℓφ(v(k+1), w(k)) + (1/(2τ))‖v(k+1) − w(k)‖²
 ≤ ℓφ(ŵ(k), w(k)) + (1/(2τ))‖ŵ(k) − w(k)‖² − (1/(2τ))‖ŵ(k) − v(k+1)‖²
 ≤ (1 − αk)ℓφ(v(k), w(k)) + αk ℓφ(v∗, w(k)) + (1/(2τ))‖(1 − αk)v(k) + αkv∗ − w(k)‖² − (1/(2τ))‖(1 − αk)v(k) + αkv∗ − v(k+1)‖²
 ≤ (1 − αk)φ(v(k)) + αkφ(v∗) + (α²k/(2τ))(‖v∗ − z(k)‖² − ‖v∗ − z(k+1)‖²),

where we used (26) in the first inequality, τ ≤ 1/L in the second inequality, the optimality of v(k+1) in solving (23) (which is equivalent to v(k+1) = arg min_v ℓφ(v, w(k)) + (2τ)⁻¹‖v − w(k)‖²) together with the identity 2〈a − b, a〉 = ‖a − b‖² + ‖a‖² − ‖b‖² for all a, b in the third inequality, the convexity of ℓφ and the definition of ŵ(k) in the fourth inequality, and again (26) together with the notation z(k+1) := v(k) + αk⁻¹(v(k+1) − v(k)) for all k ≥ 0 and z(0) := v(0) in the last inequality. Now we subtract φ∗ from both sides, divide both sides by α²k, use the condition (1 − αk+1)/α²k+1 ≤ 1/α²k and α0 = 1, and finally take the telescoping sum on both sides from 0 to k to obtain

((1 − αk)/α²k)(φ(v(k)) − φ∗) ≤ ((1 − α0)/α²0)(φ(v(0)) − φ∗) + (1/(2τ))‖v∗ − z(0)‖² = (1/(2τ))‖v∗ − z(0)‖². (27)

Dividing both sides by (1 − αk)/α²k yields the inequality in (25). Furthermore, the condition (1 − αk+1)/α²k+1 ≤ 1/α²k implies that αk ≤ 2/(k + 2) and hence α²k/(1 − αk) = O(1/k²). This completes the proof.

2.5 Closed-form solution of matrix-valued shrinkage

As we can see, the algorithm step (34) calls for the solution of a problem of the following type,

min_{X∈R2×m} α‖X‖⋆ + (1/2)‖X − B‖²F, (28)

for a specific matrix norm ‖ · ‖⋆ and a given matrix B ∈ R2×m. In what follows, we provide the closed-form solutions of (28) when the matrix ⋆-norm is the Frobenius norm, the induced 2-norm, or the nuclear norm, as mentioned in Section 2.1. The closed-form solution of (28) is denoted by shrink⋆(B, α). The derivations are provided in Appendix A.

• Frobenius norm. This is the simplest case since ‖X‖F treats X as a vector in R2m in (28), for which the shrinkage has a closed-form solution. More specifically, the solution of (28) is

X∗ = max(‖B‖F − α, 0) B/‖B‖F. (29)

• Induced 2-norm. This norm is advocated for vectorial TV in [18], but now we can provide a closed-form solution of (28) as

X∗ = B − α ξη⊤, (30)

where ξ and η are the left and right singular vectors corresponding to the largest singular value of B.

• Nuclear norm. This norm promotes low rank and yields a closed-form solution of (28) as

X∗ = U max(Σ − α, 0)V⊤, (31)

where (U, Σ, V) is the reduced singular value decomposition (SVD) of B.

The computation of (29) is essentially a vector shrinkage and hence very cheap. The computations of (30) and (31) involve a (reduced) SVD; however, explicit formulas also exist since the matrices have the tiny size of 2-by-m (3-by-m if the images are 3D), where m is the number of image channels/modalities.
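The three closed-form shrinkages can be transcribed directly for a single 2 × m block; the sketch below uses numpy's general SVD rather than the explicit 2 × m formulas, which is our implementation choice, not the paper's:

```python
import numpy as np

def shrink_frobenius(B, a):
    """Solution (29): treat B as a vector and soft-threshold its length."""
    nB = np.linalg.norm(B)                      # Frobenius norm of B
    return np.zeros_like(B) if nB == 0 else max(nB - a, 0.0) / nB * B

def shrink_nuclear(B, a):
    """Solution (31): soft-threshold all singular values of B."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - a, 0.0)) @ Vt

def shrink_spectral(B, a):
    """Solution (30): subtract a times the leading singular dyad
    (valid in the regime s1 - a >= s2 covered by the text's formula)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return B - a * np.outer(U[:, 0], Vt[0, :])

B = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])
a = 0.5

# (29) matches plain vector shrinkage of the flattened matrix
X = shrink_frobenius(B, a)
assert np.allclose(X, max(np.linalg.norm(B) - a, 0) * B / np.linalg.norm(B))

# (31) shrinks every singular value by a
s_in = np.linalg.svd(B, compute_uv=False)
s_out = np.linalg.svd(shrink_nuclear(B, a), compute_uv=False)
assert np.allclose(s_out, np.maximum(s_in - a, 0.0))

# (30) reduces only the largest singular value by a
s2 = np.linalg.svd(shrink_spectral(B, a), compute_uv=False)
assert np.allclose(s2, [s_in[0] - a, s_in[1]])
```

In practice these maps are applied to every pixel's 2 × m Jacobian block, and for the 2 × m case the SVD itself admits a closed form that avoids the general routine.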

Remark. It is worth noting that the computation of (28) is carried out at every pixel independently of the others in each iteration. This allows straightforward parallel computing, which can further reduce real-world computation time.


2.6 Reconstruct image from gradients

Once the Jacobian v is reconstructed from data, the final step is to assemble the image u using v. Since this step is performed for each modality, the problem reduces to the reconstruction of a scalar-valued image u from its gradient v = (v1, v2). In [29, 33], the image u is reconstructed by solving the Poisson equation ∆u = div(v) = −(D⊤1 v1 + D⊤2 v2), since v1 = D1u and v2 = D2u are the partial derivatives of u. The boundary condition of this Poisson equation can be either Dirichlet or Neumann, depending on the property of the imaging modality. In medical imaging applications, such as MRI, CT, and PET, the boundary condition is simply 0, since there is often just background near the image boundary ∂Ω.

Numerically, it is more straightforward to recover u by solving the following minimization

min_u ‖D1u − v1‖² + ‖D2u − v2‖² + βh(u) (32)

with some parameter β > 0 to weight the data fidelity h(u) of u. The solution is easy to compute since the objective is smooth, and highly efficient algorithms such as BFGS and the conjugate gradient method can be applied. Moreover, it is also common that a closed-form solution exists. For example, in the MRI case where h(u) = (1/2)‖PFu − f‖², the solution of (32) is given by

u = F∗[(D̂∗1D̂1 + D̂∗2D̂2 + βP⊤P)⁻¹(D̂∗1Fv1 + D̂∗2Fv2 + βP⊤f)], (33)

where D̂∗1D̂1 + D̂∗2D̂2 + βP⊤P is diagonal and hence trivial to invert. The main computations are just a few Fourier transforms.

In fact, it is often sufficient to retrieve the base intensity of u from h(u) to reconstruct u from v, and hence the result is not sensitive to β when solving (32). Therefore, as long as we have recovered the finite differences v of u, and the intensity of u is 0 near ∂Ω (mostly true for medical imaging), the periodic boundary condition holds and we can just use the closed-form formula (33) to obtain u. To this end, we can estimate the DC coefficient f1 of f = (f1, . . . , fn)⊤ ∈ Cn, i.e., the Fourier coefficient of frequency 0, and use P = (1, 0, . . . , 0)⊤ ∈ Rn in (33). The value of β > 0 does not matter since the null spaces of D̂l and P are complementary. The details are omitted here. It is also worth pointing out that, by setting β = 0, the minimization (32) is just a least squares problem whose normal equation is the Poisson equation ∆u = div(v) = −(D⊤1 v1 + D⊤2 v2) mentioned above.
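Under periodic boundary conditions, this assembly step amounts to a few FFTs: solve the least squares normal equation in the Fourier domain and pin the DC coefficient separately. The sketch below assumes periodic forward differences, and the symbol names d1, d2 are ours:

```python
import numpy as np

def assemble_from_gradients(v1, v2, dc):
    """Least-squares solve of D1 u = v1, D2 u = v2 in the Fourier domain;
    dc supplies the Fourier coefficient of frequency 0 (the base intensity)."""
    n1, n2 = v1.shape
    # Symbols of periodic forward differences along each axis
    d1 = (np.exp(2j * np.pi * np.arange(n1) / n1) - 1.0)[:, None]
    d2 = (np.exp(2j * np.pi * np.arange(n2) / n2) - 1.0)[None, :]
    denom = np.abs(d1) ** 2 + np.abs(d2) ** 2        # zero only at (0, 0)
    numer = np.conj(d1) * np.fft.fft2(v1) + np.conj(d2) * np.fft.fft2(v2)
    denom[0, 0] = 1.0
    numer[0, 0] = dc                                 # pin the DC component
    return np.real(np.fft.ifft2(numer / denom))

# Round trip: forward-difference a test image, then reassemble it
rng = np.random.default_rng(2)
u = rng.standard_normal((32, 48))
v1 = np.roll(u, -1, axis=0) - u
v2 = np.roll(u, -1, axis=1) - u
u_rec = assemble_from_gradients(v1, v2, dc=u.sum())  # DC of fft2 is the sum
assert np.allclose(u_rec, u, atol=1e-10)
```

This is exactly (33) with β = 0 away from frequency zero, plus the estimated DC coefficient injected at (0, 0), mirroring the choice P = (1, 0, . . . , 0)⊤ discussed above.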

2.7 Summary of algorithm steps

To summarize, we propose the two-stage Algorithm 1, EdgeRec, which first recovers the image edge v and then assembles the image u. Each step in Algorithm 1 is executed only once. Step 0 requires formation of H(·), which is given by (15) and (19) for the Radon and Fourier cases, respectively. Step 1 itself requires iterations, which converge very quickly with rate O(1/k²), where k is the iteration number. In Step 1, ε > 0 is a prescribed error tolerance to determine the stopping criterion, and τ ∈ (0, 1/L] is a fixed step size, where L is the Lipschitz constant of ∇H. The numerical sequence {αk}k≥1 satisfies (1 − αk+1)/α²k+1 ≤ 1/α²k and 0 < αk ≤ 1 for all k ≥ 1 and α0 = 1. For example, αk = 2/(k + 2) for all k ≥ 0, or α0 = α1 = 1 and αk+1 = ((α⁴k + 4α²k)^{1/2} − α²k)/2 for k ≥ 1, etc. The operator shrink⋆ denotes the matrix-valued shrinkage corresponding to the ⋆-norm, as discussed in Section 2.5. Step 2 can be solved quickly by many smooth optimization methods such as BFGS or conjugate gradient. Moreover, it may just be solved using formula (33) as discussed earlier.


Algorithm 1 Two-stage edge-based joint image reconstruction (EdgeRec)

Input: Imaging data f = {fj}j=1,...,m of m modalities. Set initial guess u and α > 0.
Step 0: Compute the Jacobian v(0) = Du and form the fidelity H(·). Set w(0) = v(0).
Step 1: Iterate (34)-(35) below for k = 0, 1, . . . until ‖v(k+1) − v(k)‖/‖v(k+1)‖ < ε:

v(k+1)i = shrink⋆(w(k)i − τ[∇H(w(k))]i, ατ), for i = 1, . . . , n, (34)
w(k+1) = v(k+1) + αk+1(αk⁻¹ − 1)(v(k+1) − v(k)). (35)

Step 2: Reconstruct image uj by (32) using the output vj from Step 1 and fj for j = 1, . . . , m.
Output: Multimodal image u = {uj}j=1,...,m.

3 Numerical Results

To examine the performance of the proposed Algorithm 1 (labeled as EdgeRec), we conduct a series of numerical experiments on synthetic and in vivo multi-contrast MRI data and multi-energy CT data using this algorithm, and compare it with the following state-of-the-art methods:

• The model (7), where the ‖Du‖⋆ norm is the Frobenius norm, the induced 2-norm, or the nuclear norm, respectively, solved by the Chambolle-Pock primal-dual algorithm presented in [9]. This method is labeled as Primal Dual;

• The model (7), where the term V TV(u) is defined in (4), solved by the primal-dual coordinate update algorithm [18] that performs coordinate updates of the primal and two dual variables iteratively to approximate its solution. This method is labeled as VTV2;

• The methods presented in [32] for multi-energy CT image reconstruction. In [32] the joint image reconstruction problem is modeled as a constrained minimization problem:

min_u V TVS(u) (or V TVN(u)), subject to ‖Au − f‖W ≤ ε, (36)

where V TVS(u) = ∑_{j=1}^m TV(uj), V TVN(u) = ∫_Ω ‖Du(x)‖N dx, and W is the covariance matrix for data with Gaussian noise. By using the dual forms of all three terms ‖Au − f‖W, V TVS(u), and V TVN(u), the original problem (36) is reformulated as a min-max problem. Then the min-max problem is solved using the Chambolle-Pock primal-dual algorithm [9]. These methods are labeled as VTVS and VTVN respectively.

All comparison algorithms are implemented and tested in the MATLAB computing environment version R2014a under Windows OS on a laptop computer equipped with an Intel Core i7 2.7GHz CPU (only 1 core is used in computation) and 16GB of memory. In addition, we extensively test and tune the parameters, including the weight of regularization, step size(s), and ε (for VTVS and VTVN), for each algorithm to its near-optimal performance in terms of efficiency. The parameters we used to generate the results are included in the descriptions of the experiments below.

3.1 Multi-energy CT image reconstruction

To test the performance of the proposed method, we first conduct joint image reconstruction on two dual-energy CT datasets.

The first dataset is a synthetic shoulder CT image at two energy levels, 140 kVp and 80 kVp, extracted from Figure 1 in [32]. We use projections of 30 equally spaced angles between 0 and 178 degrees as the undersampled sinogram data for the 140 kVp image, and projections of another 30 equally spaced angles between 3 and 180 degrees for the 80 kVp image. This data simulation pattern mimics the clinical energy-switching CT scan. The goal is to reconstruct the two images of different energy levels jointly. The original images are used as the ground truth (leftmost column of Figure 1).

As we showed earlier, there are many matrix norms that can be used to define VTV. Therefore, our first test is to assess the difference between these norms when used for joint image reconstruction of multi-energy CT images. In particular, we use three norms, i.e., the Frobenius norm, the induced 2-norm, and the nuclear norm, in the definition of VTV for EdgeRec. For comparison purposes, the state-of-the-art Primal Dual [9] is also applied to the one-stage VTV regularized minimization (7) with the same three norms. For Primal Dual, the weight of the VTV term is set to 5 × 10⁻³, and the primal and dual step sizes are set to 5 × 10⁻³ and (1/8)/(5 × 10⁻³), respectively. For EdgeRec, the weight of the l1 term is set to 2.5 × 10⁻⁴ and the step size is set to 5 × 10⁻³. For both methods, the stopping criterion is that either the relative change ∑_{j=1,2} ‖uj(k) − uj(k−1)‖/‖uj(k)‖ is lower than εtol = 10⁻⁸ or the maximum iteration count reaches kmax = 100. For the Frobenius norm, both Primal Dual and EdgeRec only require a 2m-vector shrinkage at each pixel and hence are very fast. For the 2-norm and the nuclear norm, a reduced SVD of a 2 × m matrix is needed at each pixel. More precisely, the 2-norm requires the largest singular value and its corresponding left and right singular vectors to compute the shrinkage, while the nuclear norm requires a complete SVD, as shown in Section 2.5. Therefore, the computational costs for these two norms are much higher than that of the Frobenius norm in each iteration. In our numerical implementation, we used the SVD function built into MATLAB for these two norms, although faster implementations (especially for the 2 × m case where the SVD has a closed form) are available. In the top two rows of Figure 1, from left to right, we show the ground truth and the reconstructed images using Primal Dual (with Frobenius norm, 2-norm, and nuclear norm) and EdgeRec (same order of norms). The absolute errors of the reconstructions with respect to the ground truth images are shown in the bottom two rows of Figure 1, where brighter color means larger error. As we can see, with each of these three norms, EdgeRec consistently produces images with lower absolute error. The reconstruction efficiency can also be measured quantitatively, as shown in Figure 2. In the top row of Figure 2, we show the relative reconstruction error ‖uj(k) − u∗j‖/‖u∗j‖ (j = 1, 2) of Primal Dual and EdgeRec versus iteration number k for the images of the two energy levels, where u∗ is the ground truth. From these two plots, we observe that the three norms yield similar reconstruction errors with slight differences as iteration progresses, while EdgeRec is much more efficient than Primal Dual in terms of iteration complexity, as the error decays a lot faster. This is due to the O(1/k²) convergence rate of EdgeRec in contrast to the O(1/k) rate of the primal-dual algorithm on non-strongly convex functionals. However, as mentioned in Section 2.3.2, the number of transforms in each iteration of EdgeRec is different from Primal Dual, and hence the former has slightly higher per-iteration computational cost. In addition, the three norms used in Primal Dual and EdgeRec also incur different computational costs. Therefore, we plot the relative errors versus CPU time (in seconds) of Primal Dual and EdgeRec using single-core computation in MATLAB to compare their performance in practice. The comparison is shown in the bottom row of Figure 2. From these two plots, it is clearly evident that EdgeRec is more efficient than the one-stage Primal Dual algorithm (which is considered one of the most effective methods for solving (7)) for this joint reconstruction problem, since the relative error of EdgeRec decays much faster than that of Primal Dual. Moreover, these plots also show that the Frobenius norm appears to be the most cost-effective among the three norms used, due to its low computational cost.

As we found that both Primal Dual and EdgeRec are most cost-effective in terms of relative error when the Frobenius norm is used in the definition of VTV, we compare them to VTVS, VTVN,


Figure 1: Comparison of reconstruction results obtained by Primal Dual and EdgeRec using three different norms in the definition of VTV on a synthetic shoulder CT image of two energy levels, 140 kVp and 80 kVp. The top two rows are the reconstructed 140 kVp (1st row) and 80 kVp (2nd row) images. From left to right: ground truth, Primal Dual with Frobenius norm, 2-norm, and nuclear norm, EdgeRec with Frobenius norm, 2-norm, and nuclear norm. The bottom two rows show the corresponding absolute errors with respect to the ground truth images.

and VTV2 using the same dataset. For VTVS/VTVN, the two ε's used in the reformulated min-max problem are both set to 100 [32]. The weight W (see (36) in [32]) is set to 10⁶, the step size is set to 5 × 10³, and the stopping criteria are εtol = 10⁻⁶ and kmax = 120. For VTV2, the weight of the VTV term is set to 5 × 10⁻³, and the primal and dual step sizes are set to 5 × 10⁻³ and 1, respectively. The parameters for the stopping criteria are set to εtol = 10⁻⁸ and kmax = 150. The parameters for Primal Dual and EdgeRec are set the same as above. The reconstruction results are given in Figure 3, where the top two rows from left to right show the ground truth, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). We also show the absolute errors of these reconstructions from the ground truth in the bottom two rows of Figure 3. As we can see, the proposed EdgeRec produces better image quality with the smallest error among all comparison methods. The comparison plots of relative reconstruction error versus CPU time in Figure 4 also confirm that the proposed EdgeRec is more efficient than the others.

The second Radon dataset is an in vivo knee CT image, also with two energy levels: 140 kVp and 80 kVp. The data is artificially downsampled as in the first experiment for reconstruction. For this dataset, we conduct the same comparison of the different matrix norms for VTV using Primal Dual and EdgeRec. For Primal Dual, the weight of the VTV term is 0.05. For EdgeRec, the weight of the l1 term is 10⁻⁴. The parameter kmax = 150 for both methods, and all other parameters are set as above. The reconstructed images and errors are shown in Figure 5, from which we again observe



Figure 2: Relative error to ground truth versus iteration number (top row) and CPU time in seconds(bottom row) using Primal Dual and EdgeRec with different norms for VTV using the syntheticshoulder CT image of 140 kVp (left column) and 80 kVp (right column).

that EdgeRec produces better reconstruction quality with lower error. The relative error versus iteration number and CPU time are shown in Figure 6. This figure again suggests that the Frobenius norm is the most cost-effective norm for VTV in terms of relative error when used in Primal Dual and EdgeRec. It also confirms the improved efficiency of EdgeRec over Primal Dual. In addition, we also compare Primal Dual and EdgeRec to VTVS, VTVN, and VTV2. The parameters for these three methods are set the same as for the first dataset. The reconstructed images and their absolute errors are shown in Figure 7. The quantitative comparison using relative error versus CPU time is shown in Figure 8. From both figures, we can see that EdgeRec has significantly improved efficiency compared to the existing methods.

3.2 Multi-contrast MR image reconstruction

To test the performance of EdgeRec for image reconstruction with partial Fourier data, we also conduct numerical experiments on two multi-contrast MR image datasets.

The first dataset is obtained from the BrainWeb website [11], where the T1, T2, and PD k-space data are artificially downsampled to 11.9% of the full k-space data using a radial mask. As for the CT images, we first apply Primal Dual and EdgeRec using the three different matrix norms in VTV on the undersampled data. For Primal Dual, the weight of VTV is set to 10⁻³, and the primal and dual step sizes are 1/8 and 1 respectively. For EdgeRec, the weight of the l1 term is 10⁻⁴ and the step size


Figure 3: Comparison of reconstruction results by different methods using the synthetic shoulder CT image of two energy levels. The top two rows are the reconstructed 140 kVp (1st row) and 80 kVp (2nd row) images. From left to right: ground truth, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). The bottom two rows show the absolute errors with respect to the ground truth images.

is 1/8. For both methods, the termination parameters are set to εtol = 10⁻⁸ and kmax = 1000. The results are shown in Figure 9, from which we again observe lower reconstruction error by EdgeRec. The quantitative comparisons using relative error versus iteration number (top row) and CPU time in seconds (bottom row) are plotted in Figure 10. This figure indicates the improved convergence rate and efficiency of EdgeRec. In addition, it again suggests the Frobenius norm for VTV in Primal Dual and EdgeRec, as it is more cost-effective.

We also compare Primal Dual and EdgeRec with the Frobenius norm to VTVS, VTVN, and VTV2. In particular, we manually add complex-valued Gaussian noise of standard deviation σ = 4 and σ = 8 to the (unnormalized) undersampled Fourier data, and apply the comparison methods to these noisy data. For all noise levels, W is set to 10⁶, the step size is 0.15, and the three ε's are set to 2.4, 4.1, and 4.1 for both VTVS and VTVN. The termination parameter is εtol = 10⁻⁶, and kmax is set to 180 and 250 for VTVS and VTVN respectively. For VTV2, the weight of the VTV term is 5 × 10⁻⁴, and the primal and dual step sizes are set to 1 and 10⁶, respectively. The termination parameters are εtol = 10⁻⁸ and kmax = 1000 for VTV2. The weights of the VTV term in Primal Dual are 5 × 10⁻³, 3 × 10⁻³, and 3 × 10⁻³ for σ = 0, 4, 8 respectively. All other parameters for Primal Dual and EdgeRec are the



Figure 4: Comparisons of relative error to ground truth versus CPU time in seconds by different methods on the synthetic shoulder CT image of 140 kVp (left) and 80 kVp (right).

Figure 5: Comparison of reconstruction results obtained by Primal Dual and EdgeRec using three different norms in the definition of VTV on an in vivo knee CT image of two energy levels. The top two rows are the reconstructed 140 kVp (1st row) and 80 kVp (2nd row) images. From left to right: ground truth, Primal Dual with Frobenius norm, 2-norm, and nuclear norm, EdgeRec with Frobenius norm, 2-norm, and nuclear norm. The bottom two rows show the corresponding absolute errors with respect to the ground truth images.

same as in their previous comparison. The results without noise (σ = 0) and with noise σ = 4 and σ = 8 are shown in Figures 11, 12, and 13, respectively. The quantitative comparison is given in



Figure 6: Relative error to ground truth versus iteration number (top row) and CPU time in seconds (bottom row) using Primal Dual and EdgeRec with different norms for VTV on the in vivo knee CT image of 140 kVp (left column) and 80 kVp (right column).

Figure 14. Again the results confirm that EdgeRec with the Frobenius norm can outperform existing methods in terms of reconstruction error.

Since we can obtain the noise distribution in the Fourier transform of the Jacobian, the data fidelity is more accurately formulated by taking this noise distribution into consideration, as we showed in Section 2.3.2. To demonstrate this numerically, we test EdgeRec without and with consideration of the noise distribution on the BrainWeb dataset. More specifically, EdgeRec without consideration of the noise distribution uses (19) as the data fidelity term, whereas EdgeRec with consideration of the noise distribution uses (22), which essentially employs different weights, specified by Ψi, in the norm. The results are shown in Figure 15. They imply that larger noise in the data yields higher reconstruction error, which is expected. More importantly, EdgeRec reaches better accuracy when the noise distribution is taken into account and the weighted data fidelity term given in (22) is used.
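As a schematic illustration of this difference (the function names are ours, and the exact weights come from (22), not reproduced here), the two fidelity variants differ only in whether the squared Fourier-domain residual is weighted:

```python
import numpy as np

def fidelity_unweighted(r):
    """Plain l2 data fidelity on the Fourier-domain residual r,
    as in the unweighted formulation (19)."""
    return 0.5 * np.sum(np.abs(r) ** 2)

def fidelity_weighted(r, psi):
    """Weighted fidelity in the spirit of (22): entries with larger
    noise variance (smaller weight in psi) contribute less."""
    return 0.5 * np.sum(psi * np.abs(r) ** 2)
```

With psi chosen proportional to the inverse noise variance at each Fourier sample, noisier measurements are down-weighted in the objective, which is the effect exploited by the weighted formulation.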

The second MRI dataset is an in vivo MR image of two contrasts: T1 and T2. The k-space is again artificially undersampled to 15.0% using a radial mask. We first conduct the same test of different norms using Primal Dual and EdgeRec. For Primal Dual, the weight of VTV is set to 5 × 10^-3, and the primal and dual step sizes are 1/8 and 1, respectively. For EdgeRec, the weight of the l1 term is 5 × 10^-5 and the step size is 1/8. For both methods, the termination parameters are set to εtol = 10^-8 and kmax = 200. The reconstructed images are shown in Figure 16 and the relative error plots in Figure 17. As for the first MR dataset, these figures imply that EdgeRec is more efficient than


Figure 7: Comparison of reconstruction results by different methods using the in vivo knee CT image. Top two rows are the reconstructed 140 kVp image (1st row) and 80 kVp image (2nd row). From left to right: ground truth, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom two rows show the corresponding absolute error to the ground truth image.

Primal Dual, and that the Frobenius norm is more cost-effective than the other norms for these two methods. We also compare Primal Dual and EdgeRec with the Frobenius norm to VTVS, VTVN, and VTV2

for this dataset. In this case, we also manually add noise to the undersampled Fourier data with noise levels σ = 0, 4, 8. For all noise levels, the parameter W is set to 10^6, the step size is 0.2, and the two ε's are set to 7 and 11 for both VTVS and VTVN. The termination parameter is εtol = 10^-6 for both VTVN and VTVS. The parameters for Primal Dual and EdgeRec are the same as before. For VTV2, the weight of the VTV term is 5 × 10^-3, the primal and dual step sizes are set to 1 and 10, respectively, and the termination parameter is εtol = 10^-8. The weights of the VTV term for Primal Dual are 5 × 10^-3, 5 × 10^-3, and 1 × 10^-2 for σ = 0, 4, 8, respectively. All other parameters for Primal Dual and EdgeRec are the same as in their previous comparison. The reconstruction results are shown in Figures 18, 19, and 20, respectively. The plots of relative error versus CPU time in seconds are given in Figure 21. Again, the results confirm that EdgeRec with the Frobenius norm is the most efficient among all compared methods. Finally, we also test EdgeRec without and with noise weights for this in vivo dataset. Again, EdgeRec without consideration of the noise weight uses (19) as the data fidelity term, whereas EdgeRec with the noise weight uses (22). The results are shown in Figure 22, which implies that EdgeRec reaches better accuracy when the noise distribution is



Figure 8: Comparisons of relative reconstruction error vs CPU time in seconds by different methods on the in vivo knee CT image of 140 kVp (left) and 80 kVp (right).

considered and the weighted data fidelity term in (22) is used.

4 Concluding Remarks

We proposed a new two-stage joint image reconstruction method that recovers the edges (i.e., the gradients, or Jacobian) of the multi-modal/contrast image directly from observed data and then assembles the final image using the recovered edges. The first stage attains the optimal convergence rate O(1/k^2) for recovering the edges, and each of its iterations uses a fast, closed-form matrix-valued shrinkage. The second stage, reconstructing the image from the edges recovered in the first stage, is formulated as a smooth convex problem that can be solved very quickly; moreover, we showed that this stage even admits a closed-form solution requiring very little computational effort. Overall, the proposed method improves upon the O(1/k) convergence rate of state-of-the-art primal-dual algorithms for TV/VTV-based image reconstruction. We demonstrated this improvement through extensive numerical tests on multi-contrast CT and MR image reconstruction.
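The first stage is an accelerated proximal-gradient scheme whose O(1/k^2) rate comes from Nesterov-type extrapolation [1, 28]. A generic skeleton of such an iteration is sketched below; `grad_f`, `prox`, and `L` are placeholders for the problem-specific fidelity gradient, matrix-valued shrinkage, and Lipschitz constant, so this is an illustrative sketch rather than the exact implementation used in the paper.

```python
import numpy as np

def fista(grad_f, prox, x0, L, iters=100):
    """Generic accelerated proximal gradient (FISTA) iteration with
    the O(1/k^2) objective convergence rate. grad_f: gradient of the
    smooth fidelity term; prox: shrinkage operator taking a point and
    a step size; L: Lipschitz constant of grad_f."""
    x, y, t = x0, x0, 1.0
    for _ in range(iters):
        x_new = prox(y - grad_f(y) / L, 1.0 / L)        # gradient + shrinkage step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)     # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```

For example, with `grad_f(x) = x - 3`, `L = 1`, and soft-thresholding as `prox`, the loop minimizes 0.5(x - 3)^2 + 0.5|x|; in the paper's setting, `prox` would instead be the matrix-valued shrinkage of the appendix applied pointwise.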

Appendix

A Computation of matrix-valued shrinkage

We show that the minimization problem (28) has closed-form solutions when the matrix norm is the Frobenius norm, the induced 2-norm, or the nuclear norm.

• Frobenius norm. In this case, all matrices can be treated as vectors, and hence the vector-valued shrinkage formula applies directly. We omit the derivation here.
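As a minimal sketch (the function name is ours), the vector-valued shrinkage applied to a matrix scales it by max(1 - α/‖B‖F, 0):

```python
import numpy as np

def frobenius_shrink(B, alpha):
    """Prox of alpha*||X||_F at B: treat the 2-by-m matrix as a single
    vector and apply standard soft-shrinkage of its Euclidean norm."""
    norm = np.linalg.norm(B, "fro")
    if norm <= alpha:
        return np.zeros_like(B)
    return (1.0 - alpha / norm) * B
```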

• Induced 2-norm. In this case, (28) becomes
\[
\min_X \|X\|_2 + \frac{1}{2\alpha}\|X - B\|_F^2
= \min_X \max_{\|\xi\|=\|\eta\|=1} \xi^\top X \eta + \frac{1}{2\alpha}\|X - B\|_F^2, \tag{37}
\]
where $\xi = (\xi_1, \xi_2)^\top \in \mathbb{R}^2$ and $\eta = (\eta_1, \dots, \eta_m)^\top \in \mathbb{R}^m$.


Figure 9: Reconstruction results by Primal Dual and EdgeRec on a BrainWeb MR image of T1, T2, and PD contrasts. Top three rows are the reconstructed T1 (1st row), T2 (2nd row), and PD (3rd row) images. From left to right: ground truth, direct inverse Fourier, Primal Dual with Frobenius norm, 2-norm, and nuclear norm, EdgeRec with Frobenius norm, 2-norm, and nuclear norm. Bottom rows show their absolute error to the ground truth image.

Switching the min and max and solving for $X$ with $\xi$ and $\eta$ fixed, we obtain $X = B - \alpha \xi \eta^\top$, and the dual problem of (37) becomes
\[
\max_{\|\xi\|=\|\eta\|=1} \xi^\top B \eta - \alpha\, \xi^\top (\xi \eta^\top) \eta + \frac{\alpha}{2}\|\xi \eta^\top\|_F^2
= \max_{\|\xi\|=\|\eta\|=1} \xi^\top B \eta - \frac{\alpha}{2}, \tag{38}
\]
where we used the facts that $\xi^\top (\xi \eta^\top) \eta = \|\xi\|^2 \|\eta\|^2 = 1$ and $\|\xi \eta^\top\|_F^2 = \sum_i \sum_j (\xi_i \eta_j)^2 = \sum_i \xi_i^2 \sum_j \eta_j^2 = 1$. Therefore $\xi$ and $\eta$ are the leading left and right singular vectors of $B$, and hence the optimal solution of (37) is $X^* = B - \alpha \xi \eta^\top$.

To obtain X∗, one needs to compute the leading singular vectors of B, or the SVD of B. Note



Figure 10: Comparisons of relative error vs iteration number (top row) and CPU time in seconds (bottom row) by Primal Dual and EdgeRec with three different norms on the BrainWeb MR image of T1 (left), T2 (middle), and PD (right) contrasts.

that B has size 2 × m, and hence a reduced SVD is sufficient. In particular, for m = 2 there is a closed-form expression for the SVD of B. In general, the SVD of a 2 × m matrix is easy to compute and has at most two singular values.
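A minimal sketch of this update in NumPy, transcribing the formula X∗ = B − αξη⊤ above (the function name is ours; NumPy's reduced SVD already exploits the small 2 × m shape):

```python
import numpy as np

def spectral_shrink(B, alpha):
    """Prox step from (37)-(38): subtract alpha times the leading
    rank-one factor of the 2-by-m matrix B. Only the top singular
    pair is needed, so a reduced SVD suffices."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # B is 2 x m
    xi, eta = U[:, 0], Vt[0, :]                        # leading singular pair
    return B - alpha * np.outer(xi, eta)
```

The sign ambiguity of the SVD is harmless here, since ξ and η flip together and the outer product ξη⊤ is unchanged.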

• Nuclear norm. This corresponds to standard nuclear-norm shrinkage: compute the SVD B = UΣV⊤ and set X∗ = U max(Σ − α, 0)V⊤, as shown in [8]. The computational complexity is comparable to the 2-norm case above.
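The singular value thresholding step of [8] can be sketched as follows (function name is ours):

```python
import numpy as np

def nuclear_shrink(B, alpha):
    """Singular value thresholding: soft-threshold all singular values
    of B by alpha, i.e., X* = U max(Sigma - alpha, 0) V^T."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - alpha, 0.0)) @ Vt
```

Unlike the induced 2-norm case, which only shifts the leading singular value, this shrinks every singular value, so small ones are set exactly to zero.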

References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[2] B. Bilgic, V. K. Goyal, and E. Adalsteinsson. Multi-contrast reconstruction with Bayesian compressed sensing. Magnetic Resonance in Medicine, 66(6):1601–1615, 2011.

[3] P. Blomgren and T. F. Chan. Color TV: total variation methods for restoration of vector-valued images. IEEE Transactions on Image Processing, 7(3):304–309, 1998.

[4] K. Bredies and M. Holler. A TGV-based framework for variational image decompression, zooming, and reconstruction. Part I: Analytics. SIAM Journal on Imaging Sciences, 8(4):2814–2850, 2015.

[5] K. Bredies and M. Holler. A TGV-based framework for variational image decompression, zooming, and reconstruction. Part II: Numerics. SIAM Journal on Imaging Sciences, 8(4):2851–2886, 2015.

[6] K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3):492–526, 2010.


Figure 11: Reconstruction results by different methods on the BrainWeb MR image of T1, T2, and PD contrasts with no noise (σ = 0). Top three rows are the reconstructed T1 (1st row), T2 (2nd row), and PD (3rd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom three rows show their absolute error to the ground truth image.

[7] X. Bresson and T. F. Chan. Fast dual minimization of the vectorial total variation norm and applications to color image processing. Inverse Problems and Imaging, 2(4):455–484, 2008.

[8] J.-F. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.


Figure 12: Reconstruction results by different methods on the BrainWeb MR image of T1, T2, and PD contrasts with noise level σ = 4. Top three rows are the reconstructed T1 (1st row), T2 (2nd row), and PD (3rd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom three rows show their absolute error to the ground truth image.

[9] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

[10] I. Chatnuntawech, A. Martin, B. Bilgic, K. Setsompop, E. Adalsteinsson, and E. Schiavi. Vectorial total generalized variation for accelerated multi-channel multi-contrast MRI. Magnetic Resonance Imaging, 34(8):1161–1170, 2016.

Figure 13: Reconstruction results by different methods on the BrainWeb MR image of T1, T2, and PD contrasts with noise level σ = 8. Top three rows are the reconstructed T1 (1st row), T2 (2nd row), and PD (3rd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom three rows show their absolute error to the ground truth image.

[11] C. A. Cocosco, V. Kollokian, R. K.-S. Kwan, G. B. Pike, and A. C. Evans. BrainWeb: online interface to a 3D MRI simulated brain database. In NeuroImage. Citeseer, 1997.



Figure 14: Comparisons of relative error vs CPU time in seconds by different methods on the BrainWeb MR image of T1 (left), T2 (middle), and PD (right) contrasts with no noise (top row), noise level σ = 4 (middle row), and noise level σ = 8 (bottom row).


Figure 15: Relative error vs CPU time by EdgeRec without and with consideration of the noise weight using the BrainWeb MR image of T1 (left), T2 (middle), and PD (right) with noise levels σ = 4, 8, 12.


Figure 16: Reconstruction results by Primal Dual and EdgeRec on an in vivo MR image of T1 and T2 contrasts. Top two rows are the reconstructed T1 (1st row) and T2 (2nd row) images. From left to right: ground truth, direct inverse Fourier, Primal Dual with Frobenius norm, 2-norm, and nuclear norm, EdgeRec with Frobenius norm, 2-norm, and nuclear norm. Bottom two rows show their absolute error to the ground truth image.

[12] S. Di Zenzo. A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing, 33(1):116–125, 1986.

[13] J. Duran, M. Moeller, C. Sbert, and D. Cremers. Collaborative total variation: a general framework for vectorial TV models. SIAM Journal on Imaging Sciences, 9(1):116–151, 2016.

[14] M. J. Ehrhardt and S. R. Arridge. Vector-valued image processing by parallel level sets. IEEE Transactions on Image Processing, 23(1):9–18, 2014.

[15] M. J. Ehrhardt, K. Thielemans, L. Pizarro, D. Atkinson, S. Ourselin, B. F. Hutton, and S. R. Arridge. Joint reconstruction of PET-MRI by exploiting structural similarity. Inverse Problems, 31(1):015001, 2015.

[16] M. J. Ehrhardt, K. Thielemans, L. Pizarro, P. Markiewicz, D. Atkinson, S. Ourselin, B. F. Hutton, and S. R. Arridge. Joint reconstruction of PET-MRI by parallel level sets. In Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2014 IEEE, pages 1–6. IEEE, 2014.



Figure 17: Comparisons of relative error vs iteration number (top row) and CPU time in seconds (bottom row) by Primal Dual and EdgeRec with three different norms on the in vivo MR image of T1 (left) and T2 (right) contrasts.


Figure 18: Reconstruction results by different methods on the in vivo MR image of T1 and T2 contrasts with no noise (σ = 0). Top two rows are the reconstructed T1 (1st row) and T2 (2nd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom two rows show their absolute error to the ground truth image.

[17] V. Estellers, S. Soatto, and X. Bresson. Adaptive regularization with the structure tensor. IEEE Transactions on Image Processing, 24(6):1777–1790, 2015.

[18] B. Goldluecke and D. Cremers. An approach to vectorial total variation based on geometric measure theory. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 327–333. IEEE, 2010.

[19] M. Grasmair and F. Lenzen. Anisotropic total variation filtering. Applied Mathematics & Optimization, 62(3):323–339, 2010.

[20] E. Haber and M. H. Gazit. Model fusion and joint inversion. Surveys in Geophysics, 34(5):675–695, 2013.

[21] J. Huang, C. Chen, and L. Axel. Fast multi-contrast MRI reconstruction. Magnetic Resonance Imaging, 32(10):1344–1352, 2014.

[22] F. Knoll, K. Bredies, T. Pock, and R. Stollberger. Second order total generalized variation (TGV) for MRI. Magnetic Resonance in Medicine, 65(2):480–491, 2011.


Figure 19: Reconstruction results by different methods on the in vivo MR image of T1 and T2 contrasts with noise level σ = 4. Top two rows are the reconstructed T1 (1st row) and T2 (2nd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom two rows show their absolute error to the ground truth image.

[23] F. Knoll, M. Holler, T. Koesters, R. Otazo, K. Bredies, and D. K. Sodickson. Joint MR-PET reconstruction using a multi-channel image regularizer. IEEE Transactions on Medical Imaging, 36(1):1–16, 2017.

[24] S. Lefkimmiatis, A. Roussos, P. Maragos, and M. Unser. Structure tensor total variation. SIAM Journal on Imaging Sciences, 8(2):1090–1122, 2015.

[25] F. Lenzen and J. Berger. Solution-driven adaptive total variation regularization. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 203–215. Springer, 2015.

[26] A. Majumdar and R. K. Ward. Joint reconstruction of multiecho MR images using correlated sparsity. Magnetic Resonance Imaging, 29(7):899–906, 2011.

[27] M. Moeller, E.-M. Brinkmann, M. Burger, and T. Seybold. Color Bregman TV. SIAM Journal on Imaging Sciences, 7(4):2771–2806, 2014.


Figure 20: Reconstruction results by different methods on the in vivo MR image of T1 and T2 contrasts with noise level σ = 8. Top two rows are the reconstructed T1 (1st row) and T2 (2nd row) images. From left to right: ground truth, direct inverse Fourier, VTVS, VTVN, VTV2, Primal Dual (Frobenius), and EdgeRec (Frobenius). Bottom two rows show their absolute error to the ground truth image.

[28] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Technical Report 3, Doklady AN SSSR, 1983.

[29] V. M. Patel, R. Maleh, A. C. Gilbert, and R. Chellappa. Gradient-based image recovery methods from incomplete Fourier measurements. IEEE Transactions on Image Processing, 21(1):94–105, 2012.

[30] J. Rasch, E.-M. Brinkmann, and M. Burger. Joint reconstruction via coupled Bregman iterations with applications to PET-MR imaging. Inverse Problems, 34(1):014001, 2017.

[31] J. Rasch, V. Kolehmainen, R. Nivajarvi, M. Kettunen, O. Grohn, M. Burger, and E.-M. Brinkmann. Dynamic MRI reconstruction from undersampled data with an anatomical prescan. Inverse Problems, 34(7):074001, 2018.

[32] D. S. Rigie and P. J. La Riviere. Joint reconstruction of multi-channel, spectral CT data via constrained total nuclear variation minimization. Physics in Medicine & Biology, 60(5):1741, 2015.



Figure 21: Comparisons of relative error vs CPU time in seconds by different methods on the in vivo MR image of T1 (left) and T2 (right) contrasts with no noise (top row), noise level σ = 4 (middle row), and noise level σ = 8 (bottom row).



Figure 22: Relative error vs CPU time by EdgeRec without and with consideration of the noise weight using the in vivo MR image of T1 (left) and T2 (right) with noise levels σ = 4, 8, 12.

[33] E. Sakhaee, M. Arreola, and A. Entezari. Gradient-based sparse approximation for computed tomography. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 1608–1611. IEEE, 2015.

[34] G. Sapiro. Vector-valued active contours. In Computer Vision and Pattern Recognition, 1996. Proceedings CVPR'96, 1996 IEEE Computer Society Conference on, pages 680–685. IEEE, 1996.

[35] G. Sapiro and D. L. Ringach. Anisotropic diffusion of multivalued images with applications to color filtering. IEEE Transactions on Image Processing, 5(11):1582–1586, 1996.

[36] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.

[37] J. Weickert, B. T. H. Romeny, and M. A. Viergever. Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Transactions on Image Processing, 7(3):398–410, 1998.
