
  • 1

    Accelerated optimization methods for large-scale medical image reconstruction

    Jeffrey A. Fessler

    EECS Dept., BME Dept., Dept. of Radiology, University of Michigan

    web.eecs.umich.edu/~fessler

    CIMI Workshop, Toulouse

    25 June 2013


  • 2

    Disclosure

    • Research support from GE Healthcare
    • Research support to GE Global Research
    • Supported in part by NIH grants R01 HL-098686 and P01 CA-87634
    • Equipment support from Intel

  • 3

    Statistical image reconstruction: a CT revolution

    • A picture is worth 1000 words
    • (and perhaps several 1000 seconds of computation?)

    Thin-slice FBP: seconds | ASIR: a bit longer | Statistical: much longer

    (Same sinogram, so all at same dose)

  • 4

    Outline

    • Image denoising (review)

    • Image restoration
      Antonios Matakos, Sathish Ramani, JF, IEEE T-IP, May 2013
      Accelerated edge-preserving image restoration without boundary artifacts

    • Low-dose X-ray CT image reconstruction
      Sathish Ramani & JF, IEEE T-MI, Mar. 2012
      A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction
      Donghwan Kim, Sathish Ramani, JF, Fully3D June 2013
      Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods

    • Model-based MR image reconstruction
      Sathish Ramani & JF, IEEE T-MI, Mar. 2011
      Parallel MR image reconstruction using augmented Lagrangian methods

    • Image in-painting (e.g., from cutset sampling) using sparsity

    http://dx.doi.org/10.1109/TIP.2013.2244218
    http://dx.doi.org/10.1109/TMI.2011.2175233
    http://fully3d.org
    http://dx.doi.org/10.1109/TMI.2010.2093536

  • 5

    Image denoising

  • 6

    Denoising using sparsity

    Measurement model:

    y (observed) = x (unknown) + ε (noise)

    Object model: assume Qx is sparse (compressible) for some orthogonal sparsifying transform Q, such as an orthogonal wavelet transform (OWT).

    Sparsity regularized estimator:

    x̂ = argmin_x (1/2)∥y − x∥₂²  [data fit]  + β∥Qx∥_p  [sparsity].

    Regularization parameter β determines the trade-off.

    Equivalently (because Q⁻¹ = Q′ is an orthonormal matrix):

    x̂ = Q′θ̂,  θ̂ = argmin_θ (1/2)∥Qy − θ∥₂² + β∥θ∥_p = shrink(Qy; β, p)

    Non-iterative solution!

  • 7

    Orthogonal transform thresholding

    Equation: x̂ = Q′ shrink(Qy; β, p)

    Block diagram:

    Noisy image y → [Analysis transform Q] → [Shrink β, p] → θ̂ → [Synthesis transform Q′] → Denoised image x̂

    todo: show shrink function for p = 1 and p = 0

    But sparsity in orthogonal transforms often yields artifacts. Cycle spinning...

  • 8

    Hard thresholding example

    Noisy image (200×184): PSNR = 76.1 dB
    Denoised: PSNR = 89.7 dB

    p = 0, orthonormal Haar wavelets
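A minimal numpy sketch (not from the talk) of the non-iterative denoiser x̂ = Q′ shrink(Qy; β, p) for p = 0, with a one-level orthonormal 2D Haar transform standing in for Q:

```python
import numpy as np

def haar2d(x):
    """One level of an orthonormal 2D Haar transform (even side lengths assumed)."""
    def step(a):  # transform along axis 0
        s = (a[0::2] + a[1::2]) / np.sqrt(2)  # scaling (lowpass) coefficients
        d = (a[0::2] - a[1::2]) / np.sqrt(2)  # detail (highpass) coefficients
        return np.concatenate([s, d], axis=0)
    return step(step(x).T).T

def ihaar2d(c):
    """Inverse of haar2d (Q' applied to coefficients)."""
    def istep(c):
        n = c.shape[0] // 2
        s, d = c[:n], c[n:]
        a = np.empty_like(c)
        a[0::2] = (s + d) / np.sqrt(2)
        a[1::2] = (s - d) / np.sqrt(2)
        return a
    return istep(istep(c.T).T)

def denoise_hard(y, beta):
    """x_hat = Q' shrink(Q y; beta, p=0): hard-threshold the Haar coefficients."""
    c = haar2d(y)
    c[np.abs(c) < beta] = 0.0  # hard shrinkage (p = 0)
    return ihaar2d(c)
```

Because Q is orthonormal, the transform preserves energy (Parseval) and the estimator needs no iterations, matching the slide.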

  • 9

    Sparsity using shift-invariant models

    Analysis form:
    Assume Rx is sparse for some sparsifying transform R.
    Often R is a "tall" matrix, e.g., finite differences along horizontal and vertical directions, i.e., anisotropic total variation (TV).
    Often R is shift invariant: ∥Rx∥_p = ∥R circshift(x)∥_p and R′R is circulant.

    x̂ = argmin_x (1/2)∥y − x∥₂² + β∥Rx∥_p  [transform sparsity].

    Synthesis form:
    Assume x = Sθ where coefficient vector θ is sparse.
    Often S is a "fat" matrix (over-complete dictionary) and S′S is circulant.

    x̂ = Sθ̂,  θ̂ = argmin_θ (1/2)∥y − Sθ∥₂² + β∥θ∥_p  [sparse coefficients]

    Analysis form preferable to synthesis form? (Elad et al., Inv. Prob., June 2007)

    http://dx.doi.org/10.1088/0266-5611/23/3/007

  • 10

    Constrained optimization

    Unconstrained estimator (analysis form for illustration):

    x̂ = argmin_x (1/2)∥y − x∥₂² + β∥Rx∥_p .

    (Nonnegativity constraint or box constraints easily added.)

    Equivalent constrained optimization problem:

    min_{x,v} (1/2)∥y − x∥₂² + β∥v∥_p  sub. to v = Rx.

    (Y. Wang et al., SIAM J. Im. Sci., 2008)(M Afonso, J Bioucas-Dias, M Figueiredo, IEEE T-IP, Sep. 2010)

    (The auxiliary variable vvv is discarded after optimization; keep only x̂xx.)

    Penalty approach:

    x̂ = argmin_x min_v (1/2)∥y − x∥₂² + β∥v∥_p + (µ/2)∥v − Rx∥₂² .

    Large µ better enforces the constraint vvv = RRRxxx, but can worsen conditioning.

    Preferable (?) approach: augmented Lagrangian.

    http://dx.doi.org/10.1137/080724265http://dx.doi.org/10.1109/TIP.2010.2047910

  • 11

    Augmented Lagrangian method: V1

    General linearly constrained optimization problem:

    min_u Ψ(u)  sub. to Cu = b.

    Form augmented Lagrangian:

    L(u, γ) ≜ Ψ(u) + γ′(Cu − b) + (ρ/2)∥Cu − b∥₂²

    where γ is the dual variable or Lagrange multiplier vector.

    AL method alternates between minimizing over u and gradient ascent on γ:

    u^(n+1) = argmin_u L(u, γ^(n))
    γ^(n+1) = γ^(n) + ρ(Cu^(n+1) − b).

    Desirable convergence properties.
    AL penalty parameter ρ affects convergence rate, not solution!

    Unfortunately, minimizing over u is impractical here:

    v = Rx equivalent to Cu = b,  C = [R −I],  u = [x; v],  b = 0.
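A toy illustration (not from the talk) of the AL iteration above, on a small quadratic min_u (1/2)∥u − a∥₂² subject to Cu = b, where the u-update has a closed form; the result is checked against the KKT solution:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
C = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

rho = 1.0
u = np.zeros(5)
gamma = np.zeros(2)
for _ in range(200):
    # u-update: minimize (1/2)||u-a||^2 + gamma'(Cu-b) + (rho/2)||Cu-b||^2
    H = np.eye(5) + rho * C.T @ C
    u = np.linalg.solve(H, a - C.T @ gamma + rho * C.T @ b)
    # gradient ascent on the dual variable gamma
    gamma = gamma + rho * (C @ u - b)

# KKT solution of the equality-constrained quadratic, for comparison
K = np.block([[np.eye(5), C.T], [C, np.zeros((2, 2))]])
sol = np.linalg.solve(K, np.concatenate([a, b]))
```

Any ρ > 0 drives the iterates to the constrained minimizer; ρ only changes how fast, echoing the slide's point that ρ affects convergence rate, not the solution.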

  • 12

    Augmented Lagrangian method: V2

    General linearly constrained optimization problem:

    min_u Ψ(u)  sub. to Cu = b.

    Form (modified) augmented Lagrangian by completing the square:

    L(u, η) ≜ Ψ(u) + (ρ/2)∥Cu − η∥₂² + C_η,

    where η ≜ b − (1/ρ)γ is a modified dual variable (Lagrange multiplier vector) and C_η is a constant independent of u.

    AL method alternates between minimizing over u and updating η:

    u^(n+1) = argmin_u L(u, η^(n))
    η^(n+1) = η^(n) − (Cu^(n+1) − b).

    Desirable convergence properties.
    AL penalty parameter ρ affects convergence rate, not solution!

    Unfortunately, minimizing over u is impractical here:

    v = Rx equivalent to Cu = b,  C = [R −I],  u = [x; v],  b = 0.

  • 13

    Alternating direction method of multipliers (ADMM)

    When u has multiple component vectors, e.g., u = [x; v], rewrite the (modified) augmented Lagrangian in terms of all component vectors:

    L(x, v; η) = Ψ(x, v) + (ρ/2)∥Rx − v − η∥₂²
               = (1/2)∥y − x∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²  [cf. penalty!]

    because here Cu = Rx − v.

    Alternate between minimizing over each component vector:

    x^(n+1) = argmin_x L(x, v^(n), η^(n))
    v^(n+1) = argmin_v L(x^(n+1), v, η^(n))
    η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1)).

    Reasonably desirable convergence properties. (Inexact inner minimizations!)
    Sufficient conditions on matrix C.
    (Eckstein & Bertsekas, Math. Prog., Apr. 1992)
    (Douglas and Rachford, Tr. Am. Math. Soc., 1956, heat conduction problems)

    http://dx.doi.org/10.1007/BF01581204

  • 14

    ADMM for image denoising

    Augmented Lagrangian:

    L(x, v; η) = (1/2)∥y − x∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    Update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [I + ρR′R]⁻¹ (y + ρR′(v^(n) + η^(n)))    [Wiener filter]

    Update of auxiliary variable: (No "corner rounding" needed for ℓ1.)

    v^(n+1) = argmin_v L(x^(n+1), v, η^(n)) = shrink(Rx^(n+1) − η^(n); β/ρ, p)

    Update of multiplier: η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1))

    Equivalent to "split Bregman" approach. (Goldstein & Osher, SIAM J. Im. Sci. 2009)

    Each update is simple and exact (non-iterative) if [I + ρR′R]⁻¹ is easy.

    http://dx.doi.org/10.1137/080725891
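A compact numpy sketch (not from the talk) of this ADMM denoiser for p = 1, using periodic horizontal/vertical finite differences for R so that R′R is circulant and the Wiener-filter x-update is a pointwise FFT division:

```python
import numpy as np

def shrink(t, tau):
    """Soft threshold (the l1 proximal map)."""
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

def admm_tv_denoise(y, beta, rho=1.0, niter=50):
    """ADMM for min_x 0.5||y-x||^2 + beta||Rx||_1, R = periodic h/v differences."""
    # eigenvalues of R'R (circulant because the differences are periodic)
    dh = np.zeros_like(y); dh[0, 0] = -1.0; dh[0, -1] = 1.0
    dv = np.zeros_like(y); dv[0, 0] = -1.0; dv[-1, 0] = 1.0
    eigRtR = np.abs(np.fft.fft2(dh)) ** 2 + np.abs(np.fft.fft2(dv)) ** 2

    Rx = lambda x: np.stack([np.roll(x, -1, 1) - x, np.roll(x, -1, 0) - x])
    Rt = lambda v: (np.roll(v[0], 1, 1) - v[0]) + (np.roll(v[1], 1, 0) - v[1])

    x = y.copy()
    v = Rx(x)
    eta = np.zeros_like(v)
    for _ in range(niter):
        # x-update: Wiener-like filter (I + rho R'R)^{-1} applied via FFT
        rhs = y + rho * Rt(v + eta)
        x = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (1.0 + rho * eigRtR)))
        # v-update: shrinkage (exact, non-iterative)
        v = shrink(Rx(x) - eta, beta / rho)
        # multiplier update
        eta = eta - (Rx(x) - v)
    return x
```

Every update is exact and non-iterative, as the slide emphasizes; the FFT trick is exactly why [I + ρR′R]⁻¹ is "easy" here.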

  • 15

    ADMM image denoising example

    True / Noisy (SNR = 24.8 dB) / Denoised (SNR = 53.5 dB); 128×128 images.
    Plot: SNR (dB) vs. ADMM iteration.

    R: horizontal and vertical finite differences (anisotropic TV),
    p = 1 (i.e., ℓ1), β = 1/2, ρ = 1 (condition number of (I + ρR′R) is 9)

  • 16

    ADMM image denoising iterates

  • 17

    X-ray CT image reconstruction
    Part 1: ADMM

  • 18

    X-ray CT review 1

    X-ray source transmits X-ray photons through object.
    Recorded signal relates to line integral of attenuation along photon path.

  • 19

    X-ray CT review 2

    X-ray source and detector rotate around object.

  • 20

    X-ray CT review 3

    Collection of recorded views called a sinogram.
    Goal is to reconstruct (3D) image of object attenuation from sinogram.

  • 21

    Lower-dose X-ray CT

    Radiation dose proportional to X-ray source intensity.
    Reducing dose =⇒ fewer recorded photons =⇒ lower SNR.
    Conventional filtered back-projection (FBP) method derived for noiseless data.

    Conventional FBP reconstruction

    Statistical image reconstruction

  • 22

    Low-dose X-ray CT image reconstruction

    Regularized estimator:

    x̂ = argmin_{x⪰0} (1/2)∥y − Ax∥²_W  [data fit]  + β∥Rx∥_p  [sparsity].

    Complications:
    • Large problem size
      ◦ x: 512×512×800 ≈ 2·10⁸ unknown image volume
      ◦ y: 888×64×7000 ≈ 4·10⁸ measured sinogram
      ◦ A: (4·10⁸)×(2·10⁸) system matrix
      ◦ A is sparse but still too large to store
      ◦ Projection Ax and back-projection A′r operations computed on the fly
      ◦ Computing gradient ∇Ψ(x) = A′W(Ax − y) + β∇R(x) requires projection and back-projection operations that dominate computation
    • A′A is not circulant (but "approximately Toeplitz" in 2D)
    • A′WA is highly shift variant due to huge dynamic range of weighting W
    • Non-quadratic (edge-preserving) regularizer, e.g., R(x) = ∥Rx∥_p
    • Nonnegativity constraint
    • Goal: fast parallelizable algorithms that "converge" in a few iterations

  • 23

    Basic ADMM for X-ray CT

    Basic equivalent constrained optimization problem (cf. split Bregman):

    min_{x⪰0, v} (1/2)∥y − Ax∥²_W + β∥v∥_p  sub. to v = Rx.

    Corresponding (modified) augmented Lagrangian (cf. "split Bregman"):

    L(x, v; η) = (1/2)∥y − Ax∥²_W + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    ADMM update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [A′WA + ρR′R]⁻¹ (A′Wy + ρR′(v^(n) + η^(n)))

    Drawbacks:
    • Ignores nonnegativity constraint
    • [A′WA + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition: a "second-order method"
    • Auxiliary variable v = Rx is enormous in 3D CT

  • 24

    Improved ADMM for X-ray CT

    min_{x⪰0, u, v} (1/2)∥y − u∥²_W + β∥v∥_p  sub. to v = Rx, u = Ax.

    Corresponding (modified) augmented Lagrangian:

    L(x, u, v; η₁, η₂) = (1/2)∥y − u∥²_W + β∥v∥_p + (ρ₁/2)∥Rx − v − η₁∥₂² + (ρ₂/2)∥Ax − u − η₂∥₂²

    ADMM update of primal variable (ignoring nonnegativity):

    argmin_x L(x, u, v; η₁, η₂) = [ρ₂A′A + ρ₁R′R]⁻¹ (ρ₁R′(v + η₁) + ρ₂A′(u + η₂))

    For 2D CT, [ρ₂A′A + ρ₁R′R]⁻¹ is approximately Toeplitz, so a circulant preconditioner is very effective.

    ADMM update of auxiliary variable u:

    argmin_u L(x, u, v; η₁, η₂) = [W + ρ₂I]⁻¹ (Wy + ρ₂(Ax − η₂))    [diagonal]

    v update is shrinkage again. Reasonably simple to code.

    (Sathish Ramani & JF, IEEE T-MI, Mar. 2012)

    (Sathish Ramani & JF, IEEE T-MI, Mar. 2012)

    http://dx.doi.org/10.1109/TMI.2011.2175233

  • 25

    2D X-ray CT image reconstruction results: quality

    True / Hanning FBP / Regularized (display range [0, 0.4])

    PWLS with ℓ1 regularization of shift-invariant Haar wavelet transform.
    No nonnegativity constraint, but probably unimportant if well-regularized.

  • 26

    2D X-ray CT image reconstruction results: speed

    Plots: RMSE(x^(j)) (in dB) vs. iteration (left) and vs. time t_j in seconds (right),
    comparing NCG-5, MFISTA-5, SB-CG-1, SB-CG-2, SB-PCG-1, SB-PCG-2, ADMM-CG-1, ADMM-CG-2, ADMM-PCG-1, ADMM-PCG-2.

    Circulant preconditioner for [ρ₂A′A + ρ₁R′R]⁻¹ is crucial to acceleration.
    Similar results for real head CT scan in paper.

  • 27

    Lower-memory ADMM for X-ray CT

    min_{x, u, z⪰0} (1/2)∥y − u∥²_W + β∥Rz∥_p  sub. to z = x, u = Ax.

    (M McGaffin, S Ramani, JF, SPIE 2012)

    Corresponding (modified) augmented Lagrangian:

    L(x, u, z; η₁, η₂) = (1/2)∥y − u∥²_W + β∥Rz∥_p + (ρ₁/2)∥x − z − η₁∥₂² + (ρ₂/2)∥Ax − u − η₂∥₂²

    ADMM update of primal variable (nonnegativity not required, use PCG):

    argmin_x L(x, u, z; η₁, η₂) = [ρ₂A′A + ρ₁I]⁻¹ (ρ₁(z + η₁) + ρ₂A′(u + η₂)).

    ADMM update of auxiliary variable z:

    argmin_{z⪰0} L(x, u, z; η₁, η₂) = argmin_{z⪰0} (ρ₁/2)∥x − z − η₁∥₂² + β∥Rz∥_p .

    Use nonnegatively constrained, edge-preserving image denoising.

    ADMM update of auxiliary variable u same as before.
    Variations...

    http://dx.doi.org/10.1117/12.910941

  • 28

    3D X-ray CT image reconstruction results

    Awaiting better preconditioner for [ρ₂A′A + ρ₁I]⁻¹ in 3D CT...
    This is not an "easy problem" like in (idealized) image restoration...

  • 29

    X-ray CT image reconstruction
    Part 2: OS+Momentum

  • 30

    OS+Momentum preview

    • Optimization transfer (aka majorize-minimize, half-quadratic)
    • Ordered-subsets (OS) acceleration (aka incremental gradient, block iterative)
    • Nesterov's momentum-based acceleration
    • Proposed OS+Momentum approach combines both

    Plot: RMSD [HU] vs. iteration (preview of results).

    Donghwan Kim, Sathish Ramani, JF, Fully3D June 2013
    Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods

    http://fully3d.org

  • 31

    Optimization transfer method

    (aka majorization-minimization or half-quadratic)

    • At the nth iteration, replace the original cost function Ψ(x) by a surrogate function φ^(n)(x) that is easier to minimize:

      x^(n+1) = argmin_{x⪰0} φ^(n)(x)

    • To monotonically decrease Ψ(x), i.e., Ψ(x^(n+1)) ≤ Ψ(x^(n)), the surrogate should satisfy the following majorization conditions:

      φ^(n)(x^(n)) = Ψ(x^(n))
      φ^(n)(x) ≥ Ψ(x),  ∀x ⪰ 0

  • 32

    Separable quadratic surrogate (SQS)

    Quadratic surrogate functions are particularly convenient:

    Ψ(x) ≤ φ^(n)(x) ≜ Ψ(x^(n)) + ∇Ψ(x^(n))′(x − x^(n)) + (1/2)∥x − x^(n)∥²_D ,

    where D is a specially designed diagonal matrix:

    D = D_L + βD_R,  D_L ≜ diag{A′WA 1},  D_R ≜ λmax(∇²R(x)) I.

    (Erdoğan and Fessler, PMB, 1999)

    (Easier to compute D_L than to find Lipschitz constant of A′WA.)

    SQS leads to a trivial M-step:

    x^(n+1) = argmin_{x⪰0} φ^(n)(x) = [x^(n) − D⁻¹∇Ψ(x^(n))]₊ .

    A "diagonally preconditioned gradient projection method."

    http://dx.doi.org/10.1088/0031-9155/44/11/311
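A small numpy sketch (not from the talk) of the SQS update above for a weighted nonnegative least-squares cost Ψ(x) = (1/2)∥y − Ax∥²_W, using D_L = diag{A′WA 1}; the majorization guarantees the cost decreases monotonically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((30, 10)))   # toy nonnegative "system matrix"
W = np.diag(rng.uniform(0.5, 2.0, 30))      # statistical weighting
y = A @ np.abs(rng.standard_normal(10)) + 0.01 * rng.standard_normal(30)

# separable quadratic surrogate curvature: D = diag{A'WA 1}
D = (A.T @ W @ A) @ np.ones(10)

def cost(x):
    r = y - A @ x
    return 0.5 * r @ W @ r

x = np.zeros(10)
costs = [cost(x)]
for _ in range(100):
    grad = A.T @ W @ (A @ x - y)
    # trivial M-step: diagonally preconditioned projected gradient
    x = np.maximum(x - grad / D, 0.0)
    costs.append(cost(x))
```

Because A has nonnegative entries here, diag{A′WA 1} dominates A′WA, so each step is a true majorize-minimize update and `costs` is nonincreasing.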

  • 33

    Convergence rate of SQS method

    Asymptotic convergence rate:

    ρ(I − D⁻¹∇²Ψ(x̂))

    Slightly generalizing Theorem 3.1 of Beck and Teboulle:

    Ψ(x^(n)) − Ψ(x̂) ≤ ∥x^(0) − x̂∥²_D / (2n).

    (Beck and Teboulle, SIAM J Im. Sci., 2009)

    Pro: easily parallelizedCon: very slow convergence

    http://dx.doi.org/10.1137/080716542

  • 34

    Accelerating SQS using Nesterov’s momemtum

    SQS+Momentum Algorithm:• Initialize image xxx(0) and zzz(0)• for n = 0,1, . . .

    xxx(n+1) = argminxxx⪰000

    ϕ (n)(zzz(n)

    )=[zzz(n)−DDD−1∇Ψ

    (zzz(n)

    )]+

    tn+1 =(

    1+√

    1+4t2n)/2

    zzz(n+1) = xxx(n+1)+tn−1tn+1

    (xxx(n+1)− xxx(n)

    )Convergence rate of SQS+Momentum method:

    Ψ(xxx(n)

    )−Ψ(x̂xx)≤

    2∥∥xxx(0)− x̂xx∥∥2DDD(n+1)2

    .

    Simple generalization of Thm. 4.4 for FISTA of(Beck and Teboulle, SIAM J Im. Sci., 2009)

    Pro: Almost same computation per iteration; slightly more memory needed.Con: still converges too slowly for X-ray CT

    http://dx.doi.org/10.1137/080716542
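The momentum recursion above is easy to add to the SQS sketch; here is an illustrative numpy version (not from the talk) on the same toy weighted NNLS cost, with consistent data so the optimal cost is zero:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((30, 10)))
W = np.diag(rng.uniform(0.5, 2.0, 30))
y = A @ np.abs(rng.standard_normal(10))      # consistent data: min cost is 0
D = (A.T @ W @ A) @ np.ones(10)              # SQS diagonal majorizer

def grad(x):
    return A.T @ W @ (A @ x - y)

x = z = np.zeros(10)
t = 1.0
for n in range(100):
    x_new = np.maximum(z - grad(z) / D, 0.0)          # SQS step at momentum point z
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # Nesterov t-sequence
    z = x_new + ((t - 1.0) / t_new) * (x_new - x)     # momentum extrapolation
    x, t = x_new, t_new
```

The t-sequence grows roughly like n/2, which is what turns the O(1/n) SQS rate into the O(1/n²) rate on the slide.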

  • 35

    Ordered subsets (OS) methods

    • Recall: Projection operator A is computationally expensive.
    • OS methods group projection views into M subsets, and use one subset per update, instead of using all measurement data.
      (Hudson and Larkin, IEEE T-MI, 1994)
      (Erdoğan and Fessler, PMB, 1999)

    cf. block-iterative / incremental subgradient methods in machine learning

    http://dx.doi.org/10.1109/42.363108
    http://dx.doi.org/10.1088/0031-9155/44/11/311

  • 36

    OS projection view grouping

  • 37

    OS projection view grouping

  • 38

    OS projection view grouping

  • 39

    OS algorithm

    Cost function decomposition:

    Ψ(x) = ∑_{m=1}^{M} Ψ_m(x),  Ψ_m(x) = (1/2)∥y_m − A_m x∥²_{W_m} + (1/M) R(x)

    y_m, A_m, W_m: sinogram rows, system matrix rows, weighting elements for the mth subset of projection views

    Intuition: in early iterations (when x^(n) is far from x̂):

    M ∇Ψ_m(x^(n)) ≈ ∇Ψ(x^(n)).

    OS-SQS Algorithm (Erdoğan and Fessler, PMB, 1999)
    • Initialize image x^(0)
    • for n = 0, 1, ...
    • for m = 0, ..., M−1

      x^(n+(m+1)/M) = [x^(n+m/M) − D⁻¹ M ∇Ψ_m(x^(n+m/M))]₊

    http://dx.doi.org/10.1088/0031-9155/44/11/311
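An illustrative numpy sketch (not from the talk) of the OS-SQS loop above on a toy unregularized weighted NNLS problem, with interleaved "view" grouping standing in for the subset structure:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((40, 10)))
w = rng.uniform(0.5, 2.0, 40)                # diagonal weights W
y = A @ np.abs(rng.standard_normal(10))      # consistent data
D = A.T @ (w * (A @ np.ones(10)))            # diag{A'WA 1}

M = 4
subsets = [np.arange(m, 40, M) for m in range(M)]  # interleaved view grouping

x = np.zeros(10)
for n in range(100):           # one outer iteration updates all M subsets
    for idx in subsets:
        Am, wm, ym = A[idx], w[idx], y[idx]
        grad_m = Am.T @ (wm * (Am @ x - ym))       # gradient of subset term
        x = np.maximum(x - M * grad_m / D, 0.0)    # M*grad_m approximates full gradient
```

Each inner update touches only one subset of rows (one group of views), which is where the roughly M-fold acceleration per full data pass comes from.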

  • 40

    OS-SQS algorithm properties

    • One iteration corresponds to updating all M subsets.Computation cost similar to original SQS(one full forward AAA and back-projection AAA′ per iteration)

    • + Highly parallelizable• + In early iterations, “we expect” the sequence {xxx(n)} to satisfy

    Ψ(xxx(n)

    )−Ψ(x̂xx)⪅

    ∥∥xxx(0)− x̂xx∥∥2DDD2nM

    .

    M times acceleration!• - Does not converge to x̂xx

    Approaches a limit cycle, size related to MLuo, Neural Computation, June 1991

    • - Computing ∇R(xxx) for each of M subsets =⇒ prefer small M• Since about 1997, OS methods have been used for (unregularized) PET

    reconstruction in nearly every PET scanner sold.• Still undesirably slow (for small M) or unstable (for large M) in X-ray CT.

    http://dx.doi.org/10.1162/neco.1991.3.2.226

  • 41

    OS+Momentum algorithm
    • Initialize image x^(0) and z^(0)
    • for n = 0, 1, ...
    • for m = 0, 1, ..., M−1

      x^(n+(m+1)/M) = [z^(n+m/M) − D⁻¹ M ∇Ψ_m(z^(n+m/M))]₊

      t_new = (1 + √(1 + 4t_old²)) / 2

      z^(n+(m+1)/M) = x^(n+(m+1)/M) + ((t_old − 1)/t_new) (x^(n+(m+1)/M) − x^(n+m/M))

      t_old := t_new

    • + In early iterations, "we expect" the sequence {x^(n)} to satisfy

      Ψ(x^(n)) − Ψ(x̂) ⪅ ∥x^(0) − x̂∥²_D / (2(nM)²).

      M² times acceleration!
    • + Very similar computation as OS-SQS
    • + Easily implemented
    • − Unknown convergence properties

  • 42

    Summary of convergence rates

    SQS (optimization transfer) methods:

    • Convergence rate
      ◦ SQS: O(1/n)
      ◦ SQS+Momentum: O(1/n²)
    • Expected convergence rate with OS method in early iterations
      ◦ OS-SQS: O(1/(nM))
      ◦ Proposed OS-SQS+Momentum: O(1/(nM)²)
    • Pros: Owing to M² times acceleration from OS methods, we can use small M, improving stability and reducing regularizer computation.
    • Cons: Behavior of OS methods with momentum is unknown, while ordinary OS methods approach a limit cycle. (Luo, Neural Comp., Jun. 1991)

    http://dx.doi.org/10.1162/neco.1991.3.2.226

  • 43

    Patient 3D helical CT scan results

    • 3D cone-beam helical CT scan with pitch 1.0
    • 3D image x: 512×512×109
    • voxel size: 1.369 mm × 1.369 mm × 0.625 mm
    • measured sinogram data y: 888×32×7146 (detector columns × detector rows × projection views)

    Convergence rates (empirical)
    • Root mean square difference (RMSD) between current x^(n) and converged image x̂:

      RMSD ≜ ∥x^(n) − x̂∥₂ / √N_p  [HU],

      where N_p = 512×512×109 is the number of image voxels in x.
    • Normalized RMSD:

      NRMSD ≜ 20 log₁₀(∥x^(n) − x̂∥₂ / ∥x̂∥₂)  [dB].

    • x̂ obtained by many iterations of several convergent algorithms
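The two figures of merit above are one-liners; a numpy sketch (not from the talk) for checking them:

```python
import numpy as np

def rmsd_hu(x, xhat):
    """Root mean square difference [HU] between current and converged images."""
    return np.linalg.norm(x - xhat) / np.sqrt(x.size)

def nrmsd_db(x, xhat):
    """Normalized RMSD in dB: 20 log10(||x - xhat|| / ||xhat||)."""
    return 20.0 * np.log10(np.linalg.norm(x - xhat) / np.linalg.norm(xhat))
```

For example, an image offset from its reference by a constant 1 HU everywhere has RMSD exactly 1 HU, regardless of image size.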

  • 44

    Convergence rate: RMSD [HU]

    Plot: RMSD [HU] vs. iteration for SQS, SQS-momentum, OS(24)-SQS, OS(24)-SQS-momentum.

    • Slow convergence without OS methods.
    • OS methods with M = 24 subsets needed 20% extra compute time per iteration due to ∇R(x).
    • OS-SQS-Momentum "converges" very rapidly in early iterations!
    • Does not reach RMSD = 0...

  • 45

    Convergence rate: Normalized RMSD [dB]

    Plot: Normalized RMSD [dB] vs. iteration for SQS, SQS-momentum, OS(24)-SQS, OS(24)-SQS-momentum.

    Combining incremental (sub)gradient with Nesterov-type momentum acceleration may help other "big data" estimation problems.

  • 46

    Images

    Initial FBP image x^(0) and converged image x* (display window [800, 1200] HU).

  • 47

    Images

    SQS: RMSD = 29.3 [HU]
    SQS-momentum: RMSD = 27.9 [HU]
    OS(24)-SQS: RMSD = 16.8 [HU]
    OS(24)-SQS-momentum: RMSD = 1.9 [HU]

    Reconstructed images at 12th iteration. ([800, 1200] HU)

    OS-SQS+Momentum with M = 24 subsets much closer to minimizer x̂xx

  • 48

    Difference images

    SQS: RMSD = 29.3 [HU]
    SQS-momentum: RMSD = 27.9 [HU]
    OS(24)-SQS: RMSD = 16.8 [HU]
    OS(24)-SQS-momentum: RMSD = 1.9 [HU]

    Difference between reconstructed images at 12th iteration and converged image x̂. ([-100, 100] HU)

  • 49

    Newer Nesterov method

    Plot: RMSD [HU] vs. iteration for SQS-Nes83, SQS-Nes05, OS24-SQS, OS48-SQS, OS24-SQS-Nes83, OS24-SQS-Nes05, OS48-SQS-Nes83, OS48-SQS-Nes05.

    Remains stable even for M = 48 subsets. (Nesterov, Math. Prog., May 2005)

    http://dx.doi.org/10.1007/s10107-004-0552-5

  • 50

    Some research problems in CT

    x̂ = argmin_{x⪰0} (1/2)∥y − Ax∥²_W + βR(x).

    Reasonably mature research areas
    • Design and implementation of system model A
    • Statistical modeling W

    Open problems
    • Design of regularizer R(x) to maximize radiologist performance
    • Faster parallelizable algorithms (argmin) with global convergence
    • Distributed computation – reducing communication
    • Algorithms for more complete/complicated physical models (e.g., dual energy or spectral CT)
    • Dynamic imaging / motion-compensated image reconstruction
    • Analysis of statistical properties of (highly nonlinear) estimator x̂

  • 51

    Image reconstruction for parallel MRI

  • 52

    Parallel MRI

    Undersampled Cartesian k-space, multiple receive coils, ...
    (Pruessmann et al., MRM, Nov. 1999)

    Compressed sensing parallel MRI ≡ further (random) under-sampling
    (Lustig et al., IEEE Sig. Proc. Mag., Mar. 2008)

    http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S
    http://dx.doi.org/10.1109/MSP.2007.914728

  • 53

    Model-based image reconstruction in parallel MRI

    Regularized estimator:

    x̂ = argmin_x (1/2)∥y − FSx∥₂²  [data fit]  + β∥Rx∥_p  [sparsity].

    F is under-sampled DFT matrix (fat)

    Features:
    • coil sensitivity matrix S is block diagonal (Pruessmann et al., MRM, Nov. 1999)
    • F′F is circulant

    Complications:
    • Data-fit Hessian S′F′FS is highly shift variant due to coil sensitivity maps
    • Non-quadratic (edge-preserving) regularization ∥·∥_p
    • Complex quantities
    • Large problem size (if 3D)

    http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S

  • 54

    Basic ADMM for parallel MRI

    Basic equivalent constrained optimization problem (cf. split Bregman):

    min_{x,v} (1/2)∥y − FSx∥₂² + β∥v∥_p  sub. to v = Rx.

    Corresponding (modified) augmented Lagrangian (cf. "split Bregman"):

    L(x, v; η) = (1/2)∥y − FSx∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    (Skipping technical details about complex vectors.)

    ADMM update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [S′F′FS + ρR′R]⁻¹ (S′F′y + ρR′(v^(n) + η^(n)))

    • [S′F′FS + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition
    • (Trivial for single coil case with S = I.)
    • The "problem" matrix is on the opposite side:
      ◦ MRI: FS
      ◦ Restoration: TA

  • 55

    Improved ADMM for parallel MRI

    min_{x,u,v,z} (1/2)∥y − Fu∥₂² + β∥v∥_p  sub. to v = Rz, u = Sx, z = x.

    Corresponding (modified) augmented Lagrangian:

    (1/2)∥y − Fu∥₂² + β∥v∥_p + (ρ₁/2)∥Rz − v − η₁∥₂² + (ρ₂/2)∥Sx − u − η₂∥₂² + (ρ₃/2)∥x − z − η₃∥₂²

    ADMM update of primal variable:

    argmin_x L(x, u, v, z; η₁, η₂, η₃) = [ρ₂S′S + ρ₃I]⁻¹ (ρ₂S′(u + η₂) + ρ₃(z + η₃))    [diagonal]

    ADMM updates of auxiliary variables:

    argmin_u L(x, u, v, z; η₁, η₂, η₃) = [F′F + ρ₂I]⁻¹ (F′y + ρ₂(Sx − η₂))    [circulant]

    argmin_z L(x, u, v, z; η₁, η₂, η₃) = [ρ₁R′R + ρ₃I]⁻¹ (ρ₁R′(v + η₁) + ρ₃(x − η₃))    [circulant]

    v update is shrinkage again.
    Simple, but does not satisfy sufficient conditions.
    (Sathish Ramani & JF, IEEE T-MI, Mar. 2011)

    http://dx.doi.org/10.1109/TMI.2010.2093536

  • 56

    2.5D parallel MR image reconstruction results: data

    Fully sampled body coil image of human brain
    Poisson-disk-based k-space sampling, 16% sampling (acceleration 6.25)
    Square-root of sum-of-squares inverse FFT of zero-filled k-space data

  • 57

    2.5D parallel MR image reconstruction results: IQ

    • Fully sampled body coil image of human brain
    • Regularized reconstruction x^(∞) (1000s of iterations of MFISTA) (A Beck & M Teboulle, SIAM J. Im. Sci, 2009)
      Combined TV and ℓ1 norm of two-level undecimated Haar wavelets
    • Difference image magnitude

    http://dx.doi.org/10.1137/080716542

  • 58

    2.5D parallel MR image reconstruction results: speed

    Plot: ξ(j) (in dB) vs. time t_j (seconds) for MFISTA-1, MFISTA-5, NCG-1, NCG-5, AL-P1-4, AL-P1-6, AL-P1-10, AL-P2.

    AL approach converges to x^(∞) much faster than MFISTA and CG

  • 59

    Current and future directions with ADMM
    • Motion-compensated image reconstruction: y = AT(α)x + ε
      (J H Cho, S Ramani, JF, 2nd CT meeting, 2012)
      (J H Cho, S Ramani, JF, IEEE Stat. Sig. Proc. Workshop, 2012)
    • Dynamic image reconstruction
    • Improved preconditioners for ADMM for 3D CT (M McGaffin and JF, submitted to Fully3D 2013)
    • Combining ADMM with ordered subsets (OS) methods (H Nien and JF, submitted to Fully3D 2013)
    • Generalize parallel MRI algorithm to include spatial support constraint (M Le, S Ramani, JF, to appear at ISMRM 2013)
    • Non-Cartesian MRI (combine optimization transfer and variable splitting) (S Ramani and JF, ISBI 2013, to appear)
    • SPECT-CT reconstruction with non-local means regularizer (S Y Chun, Y K Dewaraja, JF, submitted to Fully3D 2013)
    • Estimation of coil sensitivity maps (quadratic problem!) (M J Allison, S Ramani, JF, IEEE T-MI, Mar. 2013)
    • L1-SPIRiT for non-Cartesian parallel MRI (D S Weller, S Ramani, JF, IEEE T-MI, 2013, submitted)
    • Multi-frame super-resolution
    • Selection of AL penalty parameter ρ to optimize convergence rate

    • Other non-ADMM methods...

    http://dx.doi.org/10.1109/SSP.2012.6319667
    http://dx.doi.org/10.1109/TMI.2012.2229711

  • 60

    Acknowledgements

    CT group
    • Jang Hwan Cho
    • Donghwan Kim
    • Jungkuk Kim
    • Madison McGaffin
    • Hung Nien
    • Stephen Schmitt

    MR group
    • Michael Allison
    • Mai Le
    • Antonis Matakos
    • Matthew Muckley

    Post-doctoral fellows
    • Se Young Chun
    • Sathish Ramani
    • Daniel Weller

    UM collaborators
    • Doug Noll
    • Jon-Fredrik Nielsen
    • Mitch Goodsitt
    • Ella Kazerooni
    • Tom Chenevert
    • Charles Meyer

    GE collaborators
    • Bruno De Man
    • Jean-Baptiste Thibault

  • 61

    Image reconstruction toolbox

    MATLAB (and increasingly Octave) toolbox for imaging inverse problems (MRI, CT, PET, SPECT, Deblurring)

    web.eecs.umich.edu/~fessler

    3D plot: helical CT source trajectory; x (mm), y (mm), z (mm) axes.

  • 62

    Bibliography

    [1] A. Matakos, S. Ramani, and J. A. Fessler. Accelerated edge-preserving image restoration without boundary artifacts. IEEE Trans. Im. Proc., 22(5):2019–29, May 2013.
    [2] S. Ramani and J. A. Fessler. A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction. IEEE Trans. Med. Imag., 31(3):677–88, March 2012.
    [3] D. Kim, S. Ramani, and J. A. Fessler. Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 22–5, 2013.
    [4] D. Kim, S. Ramani, and J. A. Fessler. Ordered subsets with momentum for accelerated X-ray CT image reconstruction. In Proc. IEEE Conf. Acoust. Speech Sig. Proc., pages 920–3, 2013.
    [5] S. Ramani and J. A. Fessler. Parallel MR image reconstruction using augmented Lagrangian methods. IEEE Trans. Med. Imag., 30(3):694–706, March 2011.
    [6] M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse Prob., 23(3):947–68, June 2007.
    [7] I. W. Selesnick and M. A. T. Figueiredo. Signal restoration with overcomplete wavelet transforms: comparison of analysis and synthesis priors. In Proc. SPIE 7446 Wavelets XIII, page 74460D, 2009.
    [8] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imaging Sci., 1(3):248–72, 2008.
    [9] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Im. Proc., 19(9):2345–56, September 2010.
    [10] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. & Trends in Machine Learning, 3(1):1–122, 2010.
    [11] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, April 1992.
    [12] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc., 82(2):421–39, July 1956.
    [13] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–43, 2009.
    [14] S. J. Reeves. Fast image restoration without boundary artifacts. IEEE Trans. Im. Proc., 14(10):1448–53, October 2005.
    [15] M. G. McGaffin, S. Ramani, and J. A. Fessler. Reduced memory augmented Lagrangian algorithm for 3D iterative X-ray CT image reconstruction. In Proc. SPIE 8313 Medical Imaging 2012: Phys. Med. Im., page 831327, 2012.
    [16] H. Erdoğan and J. A. Fessler. Ordered subsets algorithms for transmission tomography. Phys. Med. Biol., 44(11):2835–51, November 1999.
    [17] H. M. Hudson and R. S. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601–9, December 1994.
    [18] Z. Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3(2):226–45, June 1991.
    [19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–52, May 2005.
    [20] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: sensitivity encoding for fast MRI. Mag. Res. Med., 42(5):952–62, November 1999.

    [21] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing MRI. IEEE Sig. Proc. Mag., 25(2):72–82, March2008.

    [22] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.,2(1):183–202, 2009.

    [23] M. J. Allison, S. Ramani, and J. A. Fessler. Accelerated regularized estimation of MR coil sensitivities using augmentedLagrangian methods. IEEE Trans. Med. Imag., 32(3):556–64, March 2013.

    [24] M. J. Allison and J. A. Fessler. Accelerated computation of regularized field map estimates. In Proc. Intl. Soc. Mag. Res. Med.,page 0413, 2012.

    [25] J. H. Cho, S. Ramani, and J. A. Fessler. Motion-compensated image reconstruction with alternating minimization. In Proc.2nd Intl. Mtg. on image formation in X-ray CT, pages 330–3, 2012.

    [26] J. H. Cho, S. Ramani, and J. A. Fessler. Alternating minimization approach for multi-frame image reconstruction. In IEEEWorkshop on Statistical Signal Processing, pages 225–8, 2012.

    [27] M. McGaffin and J. A. Fessler. Sparse shift-varying FIR preconditioners for fast volume denoising. In Proc. Intl. Mtg. on Fully3D Image Recon. in Rad. and Nuc. Med, pages 284–7, 2013.

    [28] H. Nien and J. A. Fessler. Combining augmented Lagrangian method with ordered subsets for X-ray CT image reconstruction.In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pages 280–3, 2013.

    [29] M. Le, S. Ramani, and J. A. Fessler. An efficient variable splitting based algorithm for regularized SENSE reconstruction withsupport constraint. In Proc. Intl. Soc. Mag. Res. Med., page 2654, 2013. To appear.

    [30] S. Ramani and J. A. Fessler. Accelerated non-Cartesian SENSE reconstruction using a majorize-minimize algorithm combin-ing variable-splitting. In Proc. IEEE Intl. Symp. Biomed. Imag., pages 700–3, 2013.

    [31] S. Y. Chun, Y. K. Dewaraja, and J. A. Fessler. Alternating direction method of multiplier for emission tomography with non-localregularizers. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pages 62–5, 2013.

    [32] D. Weller, S. Ramani, and J. A. Fessler. Augmented Lagrangian with variable splitting for faster non-Cartesian L1-SPIRiT MRimage reconstruction. IEEE Trans. Med. Imag., 2013. Submitted.

    [33] J. A. Fessler and W. L. Rogers. Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs. IEEE Trans. Im. Proc., 5(9):1346–58, September 1996.