
  • 1

    Accelerated optimization methods for large-scale medical image reconstruction

    Jeffrey A. Fessler

    EECS Dept., BME Dept., Dept. of Radiology, University of Michigan

    web.eecs.umich.edu/~fessler

    CIMI Workshop, Toulouse

    25 June 2013


  • 2

    Disclosure

    • Research support from GE Healthcare
    • Research support to GE Global Research
    • Supported in part by NIH grants R01 HL-098686 and P01 CA-87634
    • Equipment support from Intel

  • 3

    Statistical image reconstruction: a CT revolution

    • A picture is worth 1000 words
    • (and perhaps several 1000 seconds of computation?)

    Thin-slice FBP: seconds | ASIR: a bit longer | Statistical: much longer

    (Same sinogram, so all at same dose)

  • 4

    Outline

    • Image denoising (review)

    • Image restoration
      Antonios Matakos, Sathish Ramani, JF, IEEE T-IP, May 2013
      Accelerated edge-preserving image restoration without boundary artifacts

    • Low-dose X-ray CT image reconstruction
      Sathish Ramani & JF, IEEE T-MI, Mar. 2012
      A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction
      Donghwan Kim, Sathish Ramani, JF, Fully3D June 2013
      Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods

    • Model-based MR image reconstruction
      Sathish Ramani & JF, IEEE T-MI, Mar. 2011
      Parallel MR image reconstruction using augmented Lagrangian methods

    • Image in-painting (e.g., from cutset sampling) using sparsity

    http://dx.doi.org/10.1109/TIP.2013.2244218
    http://dx.doi.org/10.1109/TMI.2011.2175233
    http://fully3d.org
    http://dx.doi.org/10.1109/TMI.2010.2093536

  • 5

    Image denoising

  • 6

    Denoising using sparsity

    Measurement model:

    y (observed) = x (unknown) + ε (noise)

    Object model: assume Qx is sparse (compressible) for some orthogonal sparsifying transform Q, such as an orthogonal wavelet transform (OWT).

    Sparsity regularized estimator:

    x̂ = argmin_x (1/2)∥y − x∥₂²  [data fit]  + β∥Qx∥_p  [sparsity].

    Regularization parameter β determines the trade-off.

    Equivalently (because Q⁻¹ = Q′ is an orthonormal matrix):

    x̂ = Q′θ̂,  θ̂ = argmin_θ (1/2)∥Qy − θ∥₂² + β∥θ∥_p = shrink(Qy; β, p)

    Non-iterative solution!

  • 7

    Orthogonal transform thresholding

    Equation: x̂ = Q′ shrink(Qy; β, p)

    Block diagram:

    Noisy image y → [Analysis transform Q] → [Shrink β, p] → θ̂ → [Synthesis transform Q′] → Denoised image x̂

    todo: show shrink function for p = 1 and p = 0

    But sparsity in orthogonal transforms often yields artifacts. Cycle spinning...

  • 8

    Hard thresholding example

    Noisy image (200×184): PSNR = 76.1 dB
    Denoised: PSNR = 89.7 dB

    p = 0, orthonormal Haar wavelets
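A minimal numpy sketch (not from the talk) of the non-iterative denoiser x̂ = Q′ shrink(Qy; β, p) for p = 0, with a one-level orthonormal 2D Haar transform standing in for Q:

```python
import numpy as np

def haar2d(x):
    """One level of an orthonormal 2D Haar transform (even side lengths assumed)."""
    def step(a):  # transform along axis 0
        s = (a[0::2] + a[1::2]) / np.sqrt(2)  # scaling (lowpass) coefficients
        d = (a[0::2] - a[1::2]) / np.sqrt(2)  # detail (highpass) coefficients
        return np.concatenate([s, d], axis=0)
    return step(step(x).T).T

def ihaar2d(c):
    """Inverse of haar2d (Q' applied to coefficients)."""
    def istep(c):
        n = c.shape[0] // 2
        s, d = c[:n], c[n:]
        a = np.empty_like(c)
        a[0::2] = (s + d) / np.sqrt(2)
        a[1::2] = (s - d) / np.sqrt(2)
        return a
    return istep(istep(c.T).T)

def denoise_hard(y, beta):
    """x_hat = Q' shrink(Q y; beta, p=0): hard-threshold the Haar coefficients."""
    c = haar2d(y)
    c[np.abs(c) < beta] = 0.0  # hard shrinkage (p = 0)
    return ihaar2d(c)
```

Because Q is orthonormal, the transform preserves energy (Parseval) and the estimator needs no iterations, matching the slide.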

  • 9

    Sparsity using shift-invariant models

    Analysis form:
    Assume Rx is sparse for some sparsifying transform R.
    Often R is a "tall" matrix, e.g., finite differences along horizontal and vertical directions, i.e., anisotropic total variation (TV).
    Often R is shift invariant: ∥Rx∥_p = ∥R circshift(x)∥_p and R′R is circulant.

    x̂ = argmin_x (1/2)∥y − x∥₂² + β∥Rx∥_p  [transform sparsity].

    Synthesis form:
    Assume x = Sθ where coefficient vector θ is sparse.
    Often S is a "fat" matrix (over-complete dictionary) and S′S is circulant.

    x̂ = Sθ̂,  θ̂ = argmin_θ (1/2)∥y − Sθ∥₂² + β∥θ∥_p  [sparse coefficients]

    Analysis form preferable to synthesis form? (Elad et al., Inv. Prob., June 2007)

    http://dx.doi.org/10.1088/0266-5611/23/3/007

  • 10

    Constrained optimization

    Unconstrained estimator (analysis form for illustration):

    x̂ = argmin_x (1/2)∥y − x∥₂² + β∥Rx∥_p .

    (Nonnegativity constraint or box constraints easily added.)

    Equivalent constrained optimization problem:

    min_{x,v} (1/2)∥y − x∥₂² + β∥v∥_p  sub. to v = Rx.

    (Y. Wang et al., SIAM J. Im. Sci., 2008)(M Afonso, J Bioucas-Dias, M Figueiredo, IEEE T-IP, Sep. 2010)

    (The auxiliary variable vvv is discarded after optimization; keep only x̂xx.)

    Penalty approach:

    x̂ = argmin_x min_v (1/2)∥y − x∥₂² + β∥v∥_p + (µ/2)∥v − Rx∥₂² .

    Large µ better enforces the constraint vvv = RRRxxx, but can worsen conditioning.

    Preferable (?) approach: augmented Lagrangian.

    http://dx.doi.org/10.1137/080724265http://dx.doi.org/10.1109/TIP.2010.2047910

  • 11

    Augmented Lagrangian method: V1

    General linearly constrained optimization problem:

    min_u Ψ(u)  sub. to Cu = b.

    Form augmented Lagrangian:

    L(u, γ) ≜ Ψ(u) + γ′(Cu − b) + (ρ/2)∥Cu − b∥₂²

    where γ is the dual variable or Lagrange multiplier vector.

    AL method alternates between minimizing over u and gradient ascent on γ:

    u^(n+1) = argmin_u L(u, γ^(n))
    γ^(n+1) = γ^(n) + ρ(Cu^(n+1) − b).

    Desirable convergence properties.
    AL penalty parameter ρ affects convergence rate, not solution!

    Unfortunately, minimizing over u is impractical here:

    v = Rx equivalent to Cu = b,  C = [R −I],  u = [x; v],  b = 0.
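A toy illustration (not from the talk) of the AL iteration above, on a small quadratic min_u (1/2)∥u − a∥₂² subject to Cu = b, where the u-update has a closed form; the result is checked against the KKT solution:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
C = rng.standard_normal((2, 5))
b = rng.standard_normal(2)

rho = 1.0
u = np.zeros(5)
gamma = np.zeros(2)
for _ in range(200):
    # u-update: minimize (1/2)||u-a||^2 + gamma'(Cu-b) + (rho/2)||Cu-b||^2
    H = np.eye(5) + rho * C.T @ C
    u = np.linalg.solve(H, a - C.T @ gamma + rho * C.T @ b)
    # gradient ascent on the dual variable gamma
    gamma = gamma + rho * (C @ u - b)

# KKT solution of the equality-constrained quadratic, for comparison
K = np.block([[np.eye(5), C.T], [C, np.zeros((2, 2))]])
sol = np.linalg.solve(K, np.concatenate([a, b]))
```

Any ρ > 0 drives the iterates to the constrained minimizer; ρ only changes how fast, echoing the slide's point that ρ affects convergence rate, not the solution.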

  • 12

    Augmented Lagrangian method: V2

    General linearly constrained optimization problem:

    min_u Ψ(u)  sub. to Cu = b.

    Form (modified) augmented Lagrangian by completing the square:

    L(u, η) ≜ Ψ(u) + (ρ/2)∥Cu − η∥₂² + C_η,

    where η ≜ b − (1/ρ)γ is a modified dual variable (Lagrange multiplier vector) and C_η is a constant independent of u.

    AL method alternates between minimizing over u and updating η:

    u^(n+1) = argmin_u L(u, η^(n))
    η^(n+1) = η^(n) − (Cu^(n+1) − b).

    Desirable convergence properties.
    AL penalty parameter ρ affects convergence rate, not solution!

    Unfortunately, minimizing over u is impractical here:

    v = Rx equivalent to Cu = b,  C = [R −I],  u = [x; v],  b = 0.

  • 13

    Alternating direction method of multipliers (ADMM)

    When u has multiple component vectors, e.g., u = [x; v], rewrite the (modified) augmented Lagrangian in terms of all component vectors:

    L(x, v; η) = Ψ(x, v) + (ρ/2)∥Rx − v − η∥₂²
               = (1/2)∥y − x∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²  [cf. penalty!]

    because here Cu = Rx − v.

    Alternate between minimizing over each component vector:

    x^(n+1) = argmin_x L(x, v^(n), η^(n))
    v^(n+1) = argmin_v L(x^(n+1), v, η^(n))
    η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1)).

    Reasonably desirable convergence properties. (Inexact inner minimizations!)
    Sufficient conditions on matrix C.
    (Eckstein & Bertsekas, Math. Prog., Apr. 1992)
    (Douglas and Rachford, Tr. Am. Math. Soc., 1956, heat conduction problems)

    http://dx.doi.org/10.1007/BF01581204

  • 14

    ADMM for image denoising

    Augmented Lagrangian:

    L(x, v; η) = (1/2)∥y − x∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    Update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [I + ρR′R]⁻¹ (y + ρR′(v^(n) + η^(n)))    [Wiener filter]

    Update of auxiliary variable: (No "corner rounding" needed for ℓ1.)

    v^(n+1) = argmin_v L(x^(n+1), v, η^(n)) = shrink(Rx^(n+1) − η^(n); β/ρ, p)

    Update of multiplier: η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1))

    Equivalent to "split Bregman" approach. (Goldstein & Osher, SIAM J. Im. Sci. 2009)

    Each update is simple and exact (non-iterative) if [I + ρR′R]⁻¹ is easy.

    http://dx.doi.org/10.1137/080725891
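A compact numpy sketch (not from the talk) of this ADMM denoiser for p = 1, using periodic horizontal/vertical finite differences for R so that R′R is circulant and the Wiener-filter x-update is a pointwise FFT division:

```python
import numpy as np

def shrink(t, tau):
    """Soft threshold (the l1 proximal map)."""
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

def admm_tv_denoise(y, beta, rho=1.0, niter=50):
    """ADMM for min_x 0.5||y-x||^2 + beta||Rx||_1, R = periodic h/v differences."""
    # eigenvalues of R'R (circulant because the differences are periodic)
    dh = np.zeros_like(y); dh[0, 0] = -1.0; dh[0, -1] = 1.0
    dv = np.zeros_like(y); dv[0, 0] = -1.0; dv[-1, 0] = 1.0
    eigRtR = np.abs(np.fft.fft2(dh)) ** 2 + np.abs(np.fft.fft2(dv)) ** 2

    Rx = lambda x: np.stack([np.roll(x, -1, 1) - x, np.roll(x, -1, 0) - x])
    Rt = lambda v: (np.roll(v[0], 1, 1) - v[0]) + (np.roll(v[1], 1, 0) - v[1])

    x = y.copy()
    v = Rx(x)
    eta = np.zeros_like(v)
    for _ in range(niter):
        # x-update: Wiener-like filter (I + rho R'R)^{-1} applied via FFT
        rhs = y + rho * Rt(v + eta)
        x = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (1.0 + rho * eigRtR)))
        # v-update: shrinkage (exact, non-iterative)
        v = shrink(Rx(x) - eta, beta / rho)
        # multiplier update
        eta = eta - (Rx(x) - v)
    return x
```

Every update is exact and non-iterative, as the slide emphasizes; the FFT trick is exactly why [I + ρR′R]⁻¹ is "easy" here.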

  • 15

    ADMM image denoising example

    True / Noisy (SNR = 24.8 dB) / Denoised (SNR = 53.5 dB); 128×128 images.
    Plot: SNR (dB) vs. ADMM iteration.

    R: horizontal and vertical finite differences (anisotropic TV),
    p = 1 (i.e., ℓ1), β = 1/2, ρ = 1 (condition number of (I + ρR′R) is 9)

  • 16

    ADMM image denoising iterates

  • 17

    X-ray CT image reconstruction
    Part 1: ADMM

  • 18

    X-ray CT review 1

    X-ray source transmits X-ray photons through object.
    Recorded signal relates to line integral of attenuation along photon path.

  • 19

    X-ray CT review 2

    X-ray source and detector rotate around object.

  • 20

    X-ray CT review 3

    Collection of recorded views called a sinogram.
    Goal is to reconstruct (3D) image of object attenuation from sinogram.

  • 21

    Lower-dose X-ray CT

    Radiation dose proportional to X-ray source intensity.
    Reducing dose =⇒ fewer recorded photons =⇒ lower SNR.
    Conventional filtered back-projection (FBP) method derived for noiseless data.

    Conventional FBP reconstruction

    Statistical image reconstruction

  • 22

    Low-dose X-ray CT image reconstruction

    Regularized estimator:

    x̂ = argmin_{x⪰0} (1/2)∥y − Ax∥²_W  [data fit]  + β∥Rx∥_p  [sparsity].

    Complications:
    • Large problem size
      ◦ x: 512×512×800 ≈ 2·10⁸ unknown image volume
      ◦ y: 888×64×7000 ≈ 4·10⁸ measured sinogram
      ◦ A: (4·10⁸)×(2·10⁸) system matrix
      ◦ A is sparse but still too large to store
      ◦ Projection Ax and back-projection A′r operations computed on the fly
      ◦ Computing gradient ∇Ψ(x) = A′W(Ax − y) + β∇R(x) requires projection and back-projection operations that dominate computation
    • A′A is not circulant (but "approximately Toeplitz" in 2D)
    • A′WA is highly shift variant due to huge dynamic range of weighting W
    • Non-quadratic (edge-preserving) regularizer, e.g., R(x) = ∥Rx∥_p
    • Nonnegativity constraint
    • Goal: fast parallelizable algorithms that "converge" in a few iterations

  • 23

    Basic ADMM for X-ray CT

    Basic equivalent constrained optimization problem (cf. split Bregman):

    min_{x⪰0, v} (1/2)∥y − Ax∥²_W + β∥v∥_p  sub. to v = Rx.

    Corresponding (modified) augmented Lagrangian (cf. "split Bregman"):

    L(x, v; η) = (1/2)∥y − Ax∥²_W + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    ADMM update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [A′WA + ρR′R]⁻¹ (A′Wy + ρR′(v^(n) + η^(n)))

    Drawbacks:
    • Ignores nonnegativity constraint
    • [A′WA + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition: a "second-order method"
    • Auxiliary variable v = Rx is enormous in 3D CT

  • 24

    Improved ADMM for X-ray CT

    min_{x⪰0, u, v} (1/2)∥y − u∥²_W + β∥v∥_p  sub. to v = Rx, u = Ax.

    Corresponding (modified) augmented Lagrangian:

    L(x, u, v; η₁, η₂) = (1/2)∥y − u∥²_W + β∥v∥_p + (ρ₁/2)∥Rx − v − η₁∥₂² + (ρ₂/2)∥Ax − u − η₂∥₂²

    ADMM update of primal variable (ignoring nonnegativity):

    argmin_x L(x, u, v; η₁, η₂) = [ρ₂A′A + ρ₁R′R]⁻¹ (ρ₁R′(v + η₁) + ρ₂A′(u + η₂))

    For 2D CT, [ρ₂A′A + ρ₁R′R]⁻¹ is approximately Toeplitz, so a circulant preconditioner is very effective.

    ADMM update of auxiliary variable u:

    argmin_u L(x, u, v; η₁, η₂) = [W + ρ₂I]⁻¹ (Wy + ρ₂(Ax − η₂))    [diagonal]

    v update is shrinkage again. Reasonably simple to code.

    (Sathish Ramani & JF, IEEE T-MI, Mar. 2012)

    (Sathish Ramani & JF, IEEE T-MI, Mar. 2012)

    http://dx.doi.org/10.1109/TMI.2011.2175233

  • 25

    2D X-ray CT image reconstruction results: quality

    True / Hanning FBP / Regularized (display range [0, 0.4])

    PWLS with ℓ1 regularization of shift-invariant Haar wavelet transform.
    No nonnegativity constraint, but probably unimportant if well-regularized.

  • 26

    2D X-ray CT image reconstruction results: speed

    Plots: RMSE(x^(j)) (in dB) vs. iteration (left) and vs. time t_j in seconds (right),
    comparing NCG-5, MFISTA-5, SB-CG-1, SB-CG-2, SB-PCG-1, SB-PCG-2, ADMM-CG-1, ADMM-CG-2, ADMM-PCG-1, ADMM-PCG-2.

    Circulant preconditioner for [ρ₂A′A + ρ₁R′R]⁻¹ is crucial to acceleration.
    Similar results for real head CT scan in paper.

  • 27

    Lower-memory ADMM for X-ray CT

    min_{x, u, z⪰0} (1/2)∥y − u∥²_W + β∥Rz∥_p  sub. to z = x, u = Ax.

    (M McGaffin, S Ramani, JF, SPIE 2012)

    Corresponding (modified) augmented Lagrangian:

    L(x, u, z; η₁, η₂) = (1/2)∥y − u∥²_W + β∥Rz∥_p + (ρ₁/2)∥x − z − η₁∥₂² + (ρ₂/2)∥Ax − u − η₂∥₂²

    ADMM update of primal variable (nonnegativity not required, use PCG):

    argmin_x L(x, u, z; η₁, η₂) = [ρ₂A′A + ρ₁I]⁻¹ (ρ₁(z + η₁) + ρ₂A′(u + η₂)).

    ADMM update of auxiliary variable z:

    argmin_{z⪰0} L(x, u, z; η₁, η₂) = argmin_{z⪰0} (ρ₁/2)∥x − z − η₁∥₂² + β∥Rz∥_p .

    Use nonnegatively constrained, edge-preserving image denoising.

    ADMM update of auxiliary variable u same as before.
    Variations...

    http://dx.doi.org/10.1117/12.910941

  • 28

    3D X-ray CT image reconstruction results

    Awaiting better preconditioner for [ρ₂A′A + ρ₁I]⁻¹ in 3D CT...
    This is not an "easy problem" like in (idealized) image restoration...

  • 29

    X-ray CT image reconstruction
    Part 2: OS+Momentum

  • 30

    OS+Momentum preview

    • Optimization transfer (aka majorize-minimize, half-quadratic)
    • Ordered-subsets (OS) acceleration (aka incremental gradient, block iterative)
    • Nesterov's momentum-based acceleration
    • Proposed OS+Momentum approach combines both

    Plot: RMSD [HU] vs. iteration (preview of results).

    Donghwan Kim, Sathish Ramani, JF, Fully3D June 2013
    Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods

    http://fully3d.org

  • 31

    Optimization transfer method

    (aka majorization-minimization or half-quadratic)

    • At the nth iteration, replace the original cost function Ψ(x) by a surrogate function φ^(n)(x) that is easier to minimize:

      x^(n+1) = argmin_{x⪰0} φ^(n)(x)

    • To monotonically decrease Ψ(x), i.e., Ψ(x^(n+1)) ≤ Ψ(x^(n)), the surrogate should satisfy the following majorization conditions:

      φ^(n)(x^(n)) = Ψ(x^(n))
      φ^(n)(x) ≥ Ψ(x),  ∀x ⪰ 0

  • 32

    Separable quadratic surrogate (SQS)

    Quadratic surrogate functions are particularly convenient:

    Ψ(x) ≤ φ^(n)(x) ≜ Ψ(x^(n)) + ∇Ψ(x^(n))′(x − x^(n)) + (1/2)∥x − x^(n)∥²_D ,

    where D is a specially designed diagonal matrix:

    D = D_L + βD_R,  D_L ≜ diag{A′WA 1},  D_R ≜ λmax(∇²R(x)) I.

    (Erdoğan and Fessler, PMB, 1999)

    (Easier to compute D_L than to find Lipschitz constant of A′WA.)

    SQS leads to a trivial M-step:

    x^(n+1) = argmin_{x⪰0} φ^(n)(x) = [x^(n) − D⁻¹∇Ψ(x^(n))]₊ .

    A "diagonally preconditioned gradient projection method."

    http://dx.doi.org/10.1088/0031-9155/44/11/311
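A small numpy sketch (not from the talk) of the SQS update above for a weighted nonnegative least-squares cost Ψ(x) = (1/2)∥y − Ax∥²_W, using D_L = diag{A′WA 1}; the majorization guarantees the cost decreases monotonically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((30, 10)))   # toy nonnegative "system matrix"
W = np.diag(rng.uniform(0.5, 2.0, 30))      # statistical weighting
y = A @ np.abs(rng.standard_normal(10)) + 0.01 * rng.standard_normal(30)

# separable quadratic surrogate curvature: D = diag{A'WA 1}
D = (A.T @ W @ A) @ np.ones(10)

def cost(x):
    r = y - A @ x
    return 0.5 * r @ W @ r

x = np.zeros(10)
costs = [cost(x)]
for _ in range(100):
    grad = A.T @ W @ (A @ x - y)
    # trivial M-step: diagonally preconditioned projected gradient
    x = np.maximum(x - grad / D, 0.0)
    costs.append(cost(x))
```

Because A has nonnegative entries here, diag{A′WA 1} dominates A′WA, so each step is a true majorize-minimize update and `costs` is nonincreasing.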

  • 33

    Convergence rate of SQS method

    Asymptotic convergence rate:

    ρ(I − D⁻¹∇²Ψ(x̂))

    Slightly generalizing Theorem 3.1 of Beck and Teboulle:

    Ψ(x^(n)) − Ψ(x̂) ≤ ∥x^(0) − x̂∥²_D / (2n).

    (Beck and Teboulle, SIAM J Im. Sci., 2009)

    Pro: easily parallelizedCon: very slow convergence

    http://dx.doi.org/10.1137/080716542

  • 34

    Accelerating SQS using Nesterov’s momemtum

    SQS+Momentum Algorithm:• Initialize image xxx(0) and zzz(0)• for n = 0,1, . . .

    xxx(n+1) = argminxxx⪰000

    ϕ (n)(zzz(n)

    )=[zzz(n)−DDD−1∇Ψ

    (zzz(n)

    )]+

    tn+1 =(

    1+√

    1+4t2n)/2

    zzz(n+1) = xxx(n+1)+tn−1tn+1

    (xxx(n+1)− xxx(n)

    )Convergence rate of SQS+Momentum method:

    Ψ(xxx(n)

    )−Ψ(x̂xx)≤

    2∥∥xxx(0)− x̂xx∥∥2DDD(n+1)2

    .

    Simple generalization of Thm. 4.4 for FISTA of(Beck and Teboulle, SIAM J Im. Sci., 2009)

    Pro: Almost same computation per iteration; slightly more memory needed.Con: still converges too slowly for X-ray CT

    http://dx.doi.org/10.1137/080716542
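The momentum recursion above is easy to add to the SQS sketch; here is an illustrative numpy version (not from the talk) on the same toy weighted NNLS cost, with consistent data so the optimal cost is zero:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((30, 10)))
W = np.diag(rng.uniform(0.5, 2.0, 30))
y = A @ np.abs(rng.standard_normal(10))      # consistent data: min cost is 0
D = (A.T @ W @ A) @ np.ones(10)              # SQS diagonal majorizer

def grad(x):
    return A.T @ W @ (A @ x - y)

x = z = np.zeros(10)
t = 1.0
for n in range(100):
    x_new = np.maximum(z - grad(z) / D, 0.0)          # SQS step at momentum point z
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # Nesterov t-sequence
    z = x_new + ((t - 1.0) / t_new) * (x_new - x)     # momentum extrapolation
    x, t = x_new, t_new
```

The t-sequence grows roughly like n/2, which is what turns the O(1/n) SQS rate into the O(1/n²) rate on the slide.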

  • 35

    Ordered subsets (OS) methods

    • Recall: Projection operator A is computationally expensive.
    • OS methods group projection views into M subsets, and use one subset per update, instead of using all measurement data.
      (Hudson and Larkin, IEEE T-MI, 1994)
      (Erdoğan and Fessler, PMB, 1999)

    cf. block-iterative / incremental subgradient methods in machine learning

    http://dx.doi.org/10.1109/42.363108
    http://dx.doi.org/10.1088/0031-9155/44/11/311

  • 36

    OS projection view grouping

  • 37

    OS projection view grouping

  • 38

    OS projection view grouping

  • 39

    OS algorithm

    Cost function decomposition:

    Ψ(x) = ∑_{m=1}^{M} Ψ_m(x),  Ψ_m(x) = (1/2)∥y_m − A_m x∥²_{W_m} + (1/M) R(x)

    y_m, A_m, W_m: sinogram rows, system matrix rows, weighting elements for the mth subset of projection views

    Intuition: in early iterations (when x^(n) is far from x̂):

    M ∇Ψ_m(x^(n)) ≈ ∇Ψ(x^(n)).

    OS-SQS Algorithm (Erdoğan and Fessler, PMB, 1999)
    • Initialize image x^(0)
    • for n = 0, 1, ...
    • for m = 0, ..., M−1

      x^(n+(m+1)/M) = [x^(n+m/M) − D⁻¹ M ∇Ψ_m(x^(n+m/M))]₊

    http://dx.doi.org/10.1088/0031-9155/44/11/311
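An illustrative numpy sketch (not from the talk) of the OS-SQS loop above on a toy unregularized weighted NNLS problem, with interleaved "view" grouping standing in for the subset structure:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((40, 10)))
w = rng.uniform(0.5, 2.0, 40)                # diagonal weights W
y = A @ np.abs(rng.standard_normal(10))      # consistent data
D = A.T @ (w * (A @ np.ones(10)))            # diag{A'WA 1}

M = 4
subsets = [np.arange(m, 40, M) for m in range(M)]  # interleaved view grouping

x = np.zeros(10)
for n in range(100):           # one outer iteration updates all M subsets
    for idx in subsets:
        Am, wm, ym = A[idx], w[idx], y[idx]
        grad_m = Am.T @ (wm * (Am @ x - ym))       # gradient of subset term
        x = np.maximum(x - M * grad_m / D, 0.0)    # M*grad_m approximates full gradient
```

Each inner update touches only one subset of rows (one group of views), which is where the roughly M-fold acceleration per full data pass comes from.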

  • 40

    OS-SQS algorithm properties

    • One iteration corresponds to updating all M subsets.Computation cost similar to original SQS(one full forward AAA and back-projection AAA′ per iteration)

    • + Highly parallelizable• + In early iterations, “we expect” the sequence {xxx(n)} to satisfy

    Ψ(xxx(n)

    )−Ψ(x̂xx)⪅

    ∥∥xxx(0)− x̂xx∥∥2DDD2nM

    .

    M times acceleration!• - Does not converge to x̂xx

    Approaches a limit cycle, size related to MLuo, Neural Computation, June 1991

    • - Computing ∇R(xxx) for each of M subsets =⇒ prefer small M• Since about 1997, OS methods have been used for (unregularized) PET

    reconstruction in nearly every PET scanner sold.• Still undesirably slow (for small M) or unstable (for large M) in X-ray CT.

    http://dx.doi.org/10.1162/neco.1991.3.2.226

  • 41

    OS+Momentum algorithm
    • Initialize image x^(0) and z^(0)
    • for n = 0, 1, ...
    • for m = 0, 1, ..., M−1

      x^(n+(m+1)/M) = [z^(n+m/M) − D⁻¹ M ∇Ψ_m(z^(n+m/M))]₊

      t_new = (1 + √(1 + 4t_old²)) / 2

      z^(n+(m+1)/M) = x^(n+(m+1)/M) + ((t_old − 1)/t_new) (x^(n+(m+1)/M) − x^(n+m/M))

      t_old := t_new

    • + In early iterations, "we expect" the sequence {x^(n)} to satisfy

      Ψ(x^(n)) − Ψ(x̂) ⪅ ∥x^(0) − x̂∥²_D / (2(nM)²).

      M² times acceleration!
    • + Very similar computation as OS-SQS
    • + Easily implemented
    • − Unknown convergence properties

  • 42

    Summary of convergence rates

    SQS (optimization transfer) methods:

    • Convergence rate
      ◦ SQS: O(1/n)
      ◦ SQS+Momentum: O(1/n²)
    • Expected convergence rate with OS method in early iterations
      ◦ OS-SQS: O(1/(nM))
      ◦ Proposed OS-SQS+Momentum: O(1/(nM)²)
    • Pros: Owing to M² times acceleration from OS methods, we can use small M, improving stability and reducing regularizer computation.
    • Cons: Behavior of OS methods with momentum is unknown, while ordinary OS methods approach a limit cycle. (Luo, Neural Comp., Jun. 1991)

    http://dx.doi.org/10.1162/neco.1991.3.2.226

  • 43

    Patient 3D helical CT scan results

    • 3D cone-beam helical CT scan with pitch 1.0
    • 3D image x: 512×512×109
    • voxel size: 1.369 mm × 1.369 mm × 0.625 mm
    • measured sinogram data y: 888×32×7146 (detector columns × detector rows × projection views)

    Convergence rates (empirical)
    • Root mean square difference (RMSD) between current x^(n) and converged image x̂:

      RMSD ≜ ∥x^(n) − x̂∥₂ / √N_p  [HU],

      where N_p = 512×512×109 is the number of image voxels in x.
    • Normalized RMSD:

      NRMSD ≜ 20 log₁₀(∥x^(n) − x̂∥₂ / ∥x̂∥₂)  [dB].

    • x̂ obtained by many iterations of several convergent algorithms
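The two figures of merit above are one-liners; a numpy sketch (not from the talk) for checking them:

```python
import numpy as np

def rmsd_hu(x, xhat):
    """Root mean square difference [HU] between current and converged images."""
    return np.linalg.norm(x - xhat) / np.sqrt(x.size)

def nrmsd_db(x, xhat):
    """Normalized RMSD in dB: 20 log10(||x - xhat|| / ||xhat||)."""
    return 20.0 * np.log10(np.linalg.norm(x - xhat) / np.linalg.norm(xhat))
```

For example, an image offset from its reference by a constant 1 HU everywhere has RMSD exactly 1 HU, regardless of image size.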

  • 44

    Convergence rate: RMSD [HU]

    Plot: RMSD [HU] vs. iteration for SQS, SQS-momentum, OS(24)-SQS, OS(24)-SQS-momentum.

    • Slow convergence without OS methods.
    • OS methods with M = 24 subsets needed 20% extra compute time per iteration due to ∇R(x).
    • OS-SQS-Momentum "converges" very rapidly in early iterations!
    • Does not reach RMSD = 0...

  • 45

    Convergence rate: Normalized RMSD [dB]

    Plot: Normalized RMSD [dB] vs. iteration for SQS, SQS-momentum, OS(24)-SQS, OS(24)-SQS-momentum.

    Combining incremental (sub)gradient with Nesterov-type momentum acceleration may help other "big data" estimation problems.

  • 46

    Images

    Initial FBP image x^(0) and converged image x* (display window [800, 1200] HU).

  • 47

    Images

    SQS: RMSD = 29.3 [HU]
    SQS-momentum: RMSD = 27.9 [HU]
    OS(24)-SQS: RMSD = 16.8 [HU]
    OS(24)-SQS-momentum: RMSD = 1.9 [HU]

    Reconstructed images at 12th iteration. ([800, 1200] HU)

    OS-SQS+Momentum with M = 24 subsets much closer to minimizer x̂xx

  • 48

    Difference images

    SQS: RMSD = 29.3 [HU]
    SQS-momentum: RMSD = 27.9 [HU]
    OS(24)-SQS: RMSD = 16.8 [HU]
    OS(24)-SQS-momentum: RMSD = 1.9 [HU]

    Difference between reconstructed images at 12th iteration and converged image x̂. ([-100, 100] HU)

  • 49

    Newer Nesterov method

    Plot: RMSD [HU] vs. iteration for SQS-Nes83, SQS-Nes05, OS24-SQS, OS48-SQS, OS24-SQS-Nes83, OS24-SQS-Nes05, OS48-SQS-Nes83, OS48-SQS-Nes05.

    Remains stable even for M = 48 subsets. (Nesterov, Math. Prog., May 2005)

    http://dx.doi.org/10.1007/s10107-004-0552-5

  • 50

    Some research problems in CT

    x̂ = argmin_{x⪰0} (1/2)∥y − Ax∥²_W + βR(x).

    Reasonably mature research areas
    • Design and implementation of system model A
    • Statistical modeling W

    Open problems
    • Design of regularizer R(x) to maximize radiologist performance
    • Faster parallelizable algorithms (argmin) with global convergence
    • Distributed computation – reducing communication
    • Algorithms for more complete/complicated physical models (e.g., dual energy or spectral CT)
    • Dynamic imaging / motion-compensated image reconstruction
    • Analysis of statistical properties of (highly nonlinear) estimator x̂

  • 51

    Image reconstruction for parallel MRI

  • 52

    Parallel MRI

    Undersampled Cartesian k-space, multiple receive coils, ...
    (Pruessmann et al., MRM, Nov. 1999)

    Compressed sensing parallel MRI ≡ further (random) under-sampling
    (Lustig et al., IEEE Sig. Proc. Mag., Mar. 2008)

    http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S
    http://dx.doi.org/10.1109/MSP.2007.914728

  • 53

    Model-based image reconstruction in parallel MRI

    Regularized estimator:

    x̂ = argmin_x (1/2)∥y − FSx∥₂²  [data fit]  + β∥Rx∥_p  [sparsity].

    F is under-sampled DFT matrix (fat)

    Features:
    • coil sensitivity matrix S is block diagonal (Pruessmann et al., MRM, Nov. 1999)
    • F′F is circulant

    Complications:
    • Data-fit Hessian S′F′FS is highly shift variant due to coil sensitivity maps
    • Non-quadratic (edge-preserving) regularization ∥·∥_p
    • Complex quantities
    • Large problem size (if 3D)

    http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S

  • 54

    Basic ADMM for parallel MRI

    Basic equivalent constrained optimization problem (cf. split Bregman):

    min_{x,v} (1/2)∥y − FSx∥₂² + β∥v∥_p  sub. to v = Rx.

    Corresponding (modified) augmented Lagrangian (cf. "split Bregman"):

    L(x, v; η) = (1/2)∥y − FSx∥₂² + β∥v∥_p + (ρ/2)∥Rx − v − η∥₂²

    (Skipping technical details about complex vectors.)

    ADMM update of primal variable (unknown image):

    x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [S′F′FS + ρR′R]⁻¹ (S′F′y + ρR′(v^(n) + η^(n)))

    • [S′F′FS + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition
    • (Trivial for single coil case with S = I.)
    • The "problem" matrix is on the opposite side:
      ◦ MRI: FS
      ◦ Restoration: TA

  • 55

    Improved ADMM for parallel MRI

    min_{x,u,v,z} (1/2)∥y − Fu∥₂² + β∥v∥_p  sub. to v = Rz, u = Sx, z = x.

    Corresponding (modified) augmented Lagrangian:

    (1/2)∥y − Fu∥₂² + β∥v∥_p + (ρ₁/2)∥Rz − v − η₁∥₂² + (ρ₂/2)∥Sx − u − η₂∥₂² + (ρ₃/2)∥x − z − η₃∥₂²

    ADMM update of primal variable:

    argmin_x L(x, u, v, z; η₁, η₂, η₃) = [ρ₂S′S + ρ₃I]⁻¹ (ρ₂S′(u + η₂) + ρ₃(z + η₃))    [diagonal]

    ADMM updates of auxiliary variables:

    argmin_u L(x, u, v, z; η₁, η₂, η₃) = [F′F + ρ₂I]⁻¹ (F′y + ρ₂(Sx − η₂))    [circulant]

    argmin_z L(x, u, v, z; η₁, η₂, η₃) = [ρ₁R′R + ρ₃I]⁻¹ (ρ₁R′(v + η₁) + ρ₃(x − η₃))    [circulant]

    v update is shrinkage again.
    Simple, but does not satisfy sufficient conditions.
    (Sathish Ramani & JF, IEEE T-MI, Mar. 2011)

    http://dx.doi.org/10.1109/TMI.2010.2093536

  • 56

    2.5D parallel MR image reconstruction results: data

    Fully sampled body coil image of human brain
    Poisson-disk-based k-space sampling, 16% sampling (acceleration 6.25)
    Square-root of sum-of-squares inverse FFT of zero-filled k-space data

  • 57

    2.5D parallel MR image reconstruction results: IQ

    • Fully sampled body coil image of human brain
    • Regularized reconstruction x^(∞) (1000s of iterations of MFISTA) (A Beck & M Teboulle, SIAM J. Im. Sci, 2009)
      Combined TV and ℓ1 norm of two-level undecimated Haar wavelets
    • Difference image magnitude

    http://dx.doi.org/10.1137/080716542

  • 58

    2.5D parallel MR image reconstruction results: speed

    Plot: ξ(j) (in dB) vs. time t_j (seconds) for MFISTA-1, MFISTA-5, NCG-1, NCG-5, AL-P1-4, AL-P1-6, AL-P1-10, AL-P2.

    AL approach converges to x^(∞) much faster than MFISTA and CG

  • 59

    Current and future directions with ADMM
    • Motion-compensated image reconstruction: y = AT(α)x + ε
      (J H Cho, S Ramani, JF, 2nd CT meeting, 2012)
      (J H Cho, S Ramani, JF, IEEE Stat. Sig. Proc. Workshop, 2012)
    • Dynamic image reconstruction
    • Improved preconditioners for ADMM for 3D CT (M McGaffin and JF, submitted to Fully3D 2013)
    • Combining ADMM with ordered subsets (OS) methods (H Nien and JF, submitted to Fully3D 2013)
    • Generalize parallel MRI algorithm to include spatial support constraint (M Le, S Ramani, JF, to appear at ISMRM 2013)
    • Non-Cartesian MRI (combine optimization transfer and variable splitting) (S Ramani and JF, ISBI 2013, to appear)
    • SPECT-CT reconstruction with non-local means regularizer (S Y Chun, Y K Dewaraja, JF, submitted to Fully3D 2013)
    • Estimation of coil sensitivity maps (quadratic problem!) (M J Allison, S Ramani, JF, IEEE T-MI, Mar. 2013)
    • L1-SPIRiT for non-Cartesian parallel MRI (D S Weller, S Ramani, JF, IEEE T-MI, 2013, submitted)
    • Multi-frame super-resolution
    • Selection of AL penalty parameter ρ to optimize convergence rate

    • Other non-ADMM methods...

    http://dx.doi.org/10.1109/SSP.2012.6319667
    http://dx.doi.org/10.1109/TMI.2012.2229711

  • 60

    Acknowledgements

    CT group
    • Jang Hwan Cho
    • Donghwan Kim
    • Jungkuk Kim
    • Madison McGaffin
    • Hung Nien
    • Stephen Schmitt

    MR group
    • Michael Allison
    • Mai Le
    • Antonis Matakos
    • Matthew Muckley

    Post-doctoral fellows
    • Se Young Chun
    • Sathish Ramani
    • Daniel Weller

    UM collaborators
    • Doug Noll
    • Jon-Fredrik Nielsen
    • Mitch Goodsitt
    • Ella Kazerooni
    • Tom Chenevert
    • Charles Meyer

    GE collaborators
    • Bruno De Man
    • Jean-Baptiste Thibault

  • 61

    Image reconstruction toolbox

    MATLAB (and increasingly Octave) toolbox for imaging inverse problems (MRI, CT, PET, SPECT, Deblurring)

    web.eecs.umich.edu/~fessler

    3D plot: helical CT source trajectory; x (mm), y (mm), z (mm) axes.

  • 62

    Bibliography

    [1] A. Matakos, S. Ramani, and J. A. Fessler. Accelerated edge-preserving image restoration without boundary artifacts. IEEE Trans. Im. Proc., 22(5):2019–29, May 2013.
    [2] S. Ramani and J. A. Fessler. A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction. IEEE Trans. Med. Imag., 31(3):677–88, March 2012.
    [3] D. Kim, S. Ramani, and J. A. Fessler. Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 22–5, 2013.
    [4] D. Kim, S. Ramani, and J. A. Fessler. Ordered subsets with momentum for accelerated X-ray CT image reconstruction. In Proc. IEEE Conf. Acoust. Speech Sig. Proc., pages 920–3, 2013.
    [5] S. Ramani and J. A. Fessler. Parallel MR image reconstruction using augmented Lagrangian methods. IEEE Trans. Med. Imag., 30(3):694–706, March 2011.
    [6] M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse Prob., 23(3):947–68, June 2007.
    [7] I. W. Selesnick and M. A. T. Figueiredo. Signal restoration with overcomplete wavelet transforms: comparison of analysis and synthesis priors. In Proc. SPIE 7446 Wavelets XIII, page 74460D, 2009.
    [8] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imaging Sci., 1(3):248–72, 2008.
    [9] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Im. Proc., 19(9):2345–56, September 2010.
    [10] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. & Trends in Machine Learning, 3(1):1–122, 2010.
    [11] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, April 1992.
    [12] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc., 82(2):421–39, July 1956.
    [13] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–43, 2009.
    [14] S. J. Reeves. Fast image restoration without boundary artifacts. IEEE Trans. Im. Proc., 14(10):1448–53, October 2005.
    [15] M. G. McGaffin, S. Ramani, and J. A. Fessler. Reduced memory augmented Lagrangian algorithm for 3D iterative X-ray CT image reconstruction. In Proc. SPIE 8313 Medical Imaging 2012: Phys. Med. Im., page 831327, 2012.
    [16] H. Erdoğan and J. A. Fessler. Ordered subsets algorithms for transmission tomography. Phys. Med. Biol., 44(11):2835–51, November 1999.
    [17] H. M. Hudson and R. S. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601–9, December 1994.
    [18] Z. Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3(2):226–45, June 1991.
    [19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–52, May 2005.
    [20] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: sensitivity encoding for fast MRI. Mag. Res. Med., 42(5):952–62, November 1999.

    [21] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing MRI. IEEE Sig. Proc. Mag., 25(2):72–82, March2008.

    [22] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.,2(1):183–202, 2009.

    [23] M. J. Allison, S. Ramani, and J. A. Fessler. Accelerated regularized estimation of MR coil sensitivities using augmentedLagrangian methods. IEEE Trans. Med. Imag., 32(3):556–64, March 2013.

    [24] M. J. Allison and J. A. Fessler. Accelerated computation of regularized field map estimates. In Proc. Intl. Soc. Mag. Res. Med.,page 0413, 2012.

    [25] J. H. Cho, S. Ramani, and J. A. Fessler. Motion-compensated image reconstruction with alternating minimization. In Proc.2nd Intl. Mtg. on image formation in X-ray CT, pages 330–3, 2012.

    [26] J. H. Cho, S. Ramani, and J. A. Fessler. Alternating minimization approach for multi-frame image reconstruction. In IEEEWorkshop on Statistical Signal Processing, pages 225–8, 2012.

    [27] M. McGaffin and J. A. Fessler. Sparse shift-varying FIR preconditioners for fast volume denoising. In Proc. Intl. Mtg. on Fully3D Image Recon. in Rad. and Nuc. Med, pages 284–7, 2013.

    [28] H. Nien and J. A. Fessler. Combining augmented Lagrangian method with ordered subsets for X-ray CT image reconstruction.In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pages 280–3, 2013.

    [29] M. Le, S. Ramani, and J. A. Fessler. An efficient variable splitting based algorithm for regularized SENSE reconstruction withsupport constraint. In Proc. Intl. Soc. Mag. Res. Med., page 2654, 2013. To appear.

    [30] S. Ramani and J. A. Fessler. Accelerated non-Cartesian SENSE reconstruction using a majorize-minimize algorithm combin-ing variable-splitting. In Proc. IEEE Intl. Symp. Biomed. Imag., pages 700–3, 2013.

    [31] S. Y. Chun, Y. K. Dewaraja, and J. A. Fessler. Alternating direction method of multiplier for emission tomography with non-localregularizers. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med, pages 62–5, 2013.

    [32] D. Weller, S. Ramani, and J. A. Fessler. Augmented Lagrangian with variable splitting for faster non-Cartesian L1-SPIRiT MRimage reconstruction. IEEE Trans. Med. Imag., 2013. Submitted.

    [33] J. A. Fessler and W. L. Rogers. Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs. IEEE Trans. Im. Proc., 5(9):1346–58, September 1996.