TRANSCRIPT
-
1
Accelerated optimization methods for large-scale medical image reconstruction
Jeffrey A. Fessler
EECS Dept., BME Dept., Dept. of Radiology, University of Michigan
web.eecs.umich.edu/~fessler
CIMI Workshop, Toulouse
25 June 2013
-
2
Disclosure
• Research support from GE Healthcare
• Research support to GE Global Research
• Supported in part by NIH grants R01 HL-098686 and P01 CA-87634
• Equipment support from Intel
-
3
Statistical image reconstruction: a CT revolution
• A picture is worth 1000 words
• (and perhaps several 1000 seconds of computation?)

Thin-slice FBP (seconds) | ASIR (a bit longer) | Statistical (much longer)
(Same sinogram, so all at the same dose)
-
4
Outline
• Image denoising (review)
• Image restoration
  Antonios Matakos, Sathish Ramani, JF, IEEE T-IP, May 2013:
  "Accelerated edge-preserving image restoration without boundary artifacts"
• Low-dose X-ray CT image reconstruction
  Sathish Ramani & JF, IEEE T-MI, Mar. 2012:
  "A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction"
  Donghwan Kim, Sathish Ramani, JF, Fully3D, June 2013:
  "Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods"
• Model-based MR image reconstruction
  Sathish Ramani & JF, IEEE T-MI, Mar. 2011:
  "Parallel MR image reconstruction using augmented Lagrangian methods"
• Image in-painting (e.g., from cutset sampling) using sparsity
http://dx.doi.org/10.1109/TIP.2013.2244218
http://dx.doi.org/10.1109/TMI.2011.2175233
http://fully3d.org
http://dx.doi.org/10.1109/TMI.2010.2093536
-
5
Image denoising
-
6
Denoising using sparsity
Measurement model:
y (observed) = x (unknown) + ε (noise)
Object model: assume Qx is sparse (compressible) for some orthogonal sparsifying transform Q, such as an orthogonal wavelet transform (OWT).
Sparsity regularized estimator:
x̂ = argmin_x (1/2)‖y − x‖₂² + β‖Qx‖_p
        (data fit)      (sparsity)
Regularization parameter β determines trade-off.
Equivalently (because Q⁻¹ = Q′ when Q is an orthonormal matrix):

x̂ = Q′θ̂,  θ̂ = argmin_θ (1/2)‖Qy − θ‖₂² + β‖θ‖_p = shrink(Qy; β, p)
Non-iterative solution!
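To make the closed form concrete, here is a small NumPy sketch (an illustration, not from the talk): `soft` and `hard` are the shrink operators for p = 1 and p = 0, and `Q` is a stand-in orthonormal transform built by QR of a random matrix (in practice Q would be an OWT).

```python
import numpy as np

def soft(z, t):
    # shrink for p = 1: soft thresholding (prox of t * ||.||_1)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def hard(z, t):
    # shrink for p = 0: hard thresholding
    return np.where(np.abs(z) > t, z, 0.0)

def denoise(y, Q, beta):
    # non-iterative solution: x_hat = Q' shrink(Q y; beta), here for p = 1
    return Q.T @ soft(Q @ y, beta)

rng = np.random.default_rng(0)
n = 64
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # stand-in orthonormal Q
y = rng.standard_normal(n)
beta = 0.3
xh = denoise(y, Q, beta)

def cost(x):
    # (1/2)||y - x||^2 + beta ||Q x||_1
    return 0.5 * np.sum((y - x) ** 2) + beta * np.sum(np.abs(Q @ x))
```

Because the cost is convex and Q is orthonormal, `xh` minimizes it globally, so any perturbation of `xh` can only increase the cost.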
-
7
Orthogonal transform thresholding
Equation: x̂ = Q′ shrink(Qy; β, p)

Block diagram:
Noisy image y → Analysis transform Q → shrink(·; β, p) → θ̂ → Synthesis transform Q′ → Denoised image x̂
todo: show shrink function for p = 1 and p = 0
But sparsity in orthogonal transforms often yields artifacts. Cycle spinning...
-
8
Hard thresholding example
Noisy image: PSNR = 76.1 dB. Denoised: PSNR = 89.7 dB.
(p = 0, orthonormal Haar wavelets; 200×184 images)
-
9
Sparsity using shift-invariant models
Analysis form: assume Rx is sparse for some sparsifying transform R.
Often R is a "tall" matrix, e.g., finite differences along horizontal and vertical directions, i.e., anisotropic total variation (TV).
Often R is shift invariant: ‖Rx‖_p = ‖R circshift(x)‖_p and R′R is circulant.

x̂ = argmin_x (1/2)‖y − x‖₂² + β‖Rx‖_p   (transform sparsity)
Synthesis form: assume x = Sθ where the coefficient vector θ is sparse.
Often S is a "fat" matrix (over-complete dictionary) and S′S is circulant.

x̂ = Sθ̂,  θ̂ = argmin_θ (1/2)‖y − Sθ‖₂² + β‖θ‖_p   (sparse coefficients)

Analysis form preferable to synthesis form? (Elad et al., Inv. Prob., June 2007)
http://dx.doi.org/10.1088/0266-5611/23/3/007
-
10
Constrained optimization
Unconstrained estimator (analysis form for illustration):

x̂ = argmin_x (1/2)‖y − x‖₂² + β‖Rx‖_p.
(Nonnegativity constraint or box constraints easily added.)
Equivalent constrained optimization problem:

min_{x,v} (1/2)‖y − x‖₂² + β‖v‖_p  sub. to v = Rx.

(Y. Wang et al., SIAM J. Im. Sci., 2008)
(M. Afonso, J. Bioucas-Dias, M. Figueiredo, IEEE T-IP, Sep. 2010)

(The auxiliary variable v is discarded after optimization; keep only x̂.)
Penalty approach:

x̂ = argmin_x min_v (1/2)‖y − x‖₂² + β‖v‖_p + (µ/2)‖v − Rx‖₂².

Large µ better enforces the constraint v = Rx, but can worsen conditioning.
Preferable (?) approach: augmented Lagrangian.
http://dx.doi.org/10.1137/080724265
http://dx.doi.org/10.1109/TIP.2010.2047910
-
11
Augmented Lagrangian method: V1
General linearly constrained optimization problem:
min_u Ψ(u)  sub. to C u = b.

Form augmented Lagrangian:
L(u, γ) ≜ Ψ(u) + γ′(C u − b) + (ρ/2)‖C u − b‖₂²
where γ is the dual variable or Lagrange multiplier vector.

AL method alternates between minimizing over u and gradient ascent on γ:
u^(n+1) = argmin_u L(u, γ^(n))
γ^(n+1) = γ^(n) + ρ(C u^(n+1) − b).
Desirable convergence properties.
AL penalty parameter ρ affects convergence rate, not solution!
Unfortunately, minimizing over u is impractical here:

v = Rx equivalent to C u = b, with C = [R −I], u = [x; v], b = 0.
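A minimal numeric sketch of the AL iteration above (an illustration, not from the talk), for a toy problem where the u-minimization is tractable: Ψ(u) = (1/2)‖u − a‖₂², whose constrained minimizer is the projection of a onto {u : Cu = b}, available in closed form for checking.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 8
C = rng.standard_normal((m, n))
b = rng.standard_normal(m)
a = rng.standard_normal(n)

# closed-form solution (projection of a onto {u : C u = b}),
# used only to check convergence of the AL iterates
u_star = a - C.T @ np.linalg.solve(C @ C.T, C @ a - b)

rho = 10.0
gamma = np.zeros(m)      # Lagrange multiplier estimate
u = np.zeros(n)
I = np.eye(n)
for _ in range(200):
    # u-update: argmin_u L(u, gamma); quadratic Psi, so a linear solve
    u = np.linalg.solve(I + rho * (C.T @ C), a - C.T @ gamma + rho * C.T @ b)
    # multiplier update: gradient ascent on gamma
    gamma += rho * (C @ u - b)
```

As the slide notes, ρ changes how fast the iterates converge, not the limit point.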
-
12
Augmented Lagrangian method: V2
General linearly constrained optimization problem:
min_u Ψ(u)  sub. to C u = b.

Form (modified) augmented Lagrangian by completing the square:
L(u, η) ≜ Ψ(u) + (ρ/2)‖C u − η‖₂² + C_η,
where η ≜ b − (1/ρ)γ is a modified dual variable or Lagrange multiplier vector and C_η is a constant independent of u.

AL method alternates between minimizing over u and a gradient step on η:
u^(n+1) = argmin_u L(u, η^(n))
η^(n+1) = η^(n) − (C u^(n+1) − b).
Desirable convergence properties.
AL penalty parameter ρ affects convergence rate, not solution!

Unfortunately, minimizing over u is impractical here:
v = Rx equivalent to C u = b, with C = [R −I], u = [x; v], b = 0.
-
13
Alternating direction method of multipliers (ADMM)
When u has multiple component vectors, e.g., u = [x; v],
rewrite the (modified) augmented Lagrangian in terms of all component vectors:

L(x, v; η) = Ψ(x, v) + (ρ/2)‖Rx − v − η‖₂²
           = (1/2)‖y − x‖₂² + β‖v‖_p + (ρ/2)‖Rx − v − η‖₂²   (cf. penalty!)

because here C u = Rx − v.
Alternate between minimizing over each component vector:
x^(n+1) = argmin_x L(x, v^(n), η^(n))
v^(n+1) = argmin_v L(x^(n+1), v, η^(n))
η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1)).
Reasonably desirable convergence properties. (Inexact inner minimizations!)
Sufficient conditions on matrix C.
(Eckstein & Bertsekas, Math. Prog., Apr. 1992)
(Douglas & Rachford, Tr. Am. Math. Soc., 1956, heat conduction problems)
http://dx.doi.org/10.1007/BF01581204
-
14
ADMM for image denoising
Augmented Lagrangian:
L(x, v; η) = (1/2)‖y − x‖₂² + β‖v‖_p + (ρ/2)‖Rx − v − η‖₂²

Update of primal variable (unknown image):
x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [I + ρR′R]⁻¹ (y + ρR′(v^(n) + η^(n))),
where [I + ρR′R]⁻¹ acts as a Wiener filter.

Update of auxiliary variable: (No "corner rounding" needed for ℓ₁.)
v^(n+1) = argmin_v L(x^(n+1), v, η^(n)) = shrink(Rx^(n+1) − η^(n); β/ρ, p)

Update of multiplier: η^(n+1) = η^(n) − (Rx^(n+1) − v^(n+1))

Equivalent to the "split Bregman" approach. (Goldstein & Osher, SIAM J. Im. Sci., 2009)
Each update is simple and exact (non-iterative) if [I + ρR′R]⁻¹ is easy.
http://dx.doi.org/10.1137/080725891
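As a concrete illustration (not from the talk), the following NumPy sketch runs these three updates on a 1D piecewise-constant signal; R is a periodic first-difference operator, so I + ρR′R is circulant and the x-update is an exact FFT solve.

```python
import numpy as np

def soft(z, t):  # shrink for p = 1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(2)
n = 200
x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])
y = x_true + 0.3 * rng.standard_normal(n)

R = lambda x: np.roll(x, -1) - x    # periodic finite differences
Rt = lambda v: np.roll(v, 1) - v    # adjoint R'

beta, rho = 0.5, 1.0
# eigenvalues of the circulant matrix I + rho R'R (the "Wiener filter")
eig = 1.0 + rho * (2.0 - 2.0 * np.cos(2 * np.pi * np.arange(n) / n))

def cost(x):
    return 0.5 * np.sum((y - x) ** 2) + beta * np.sum(np.abs(R(x)))

x, v, eta = y.copy(), R(y), np.zeros(n)
for _ in range(200):
    rhs = y + rho * Rt(v + eta)
    x = np.fft.ifft(np.fft.fft(rhs) / eig).real   # exact x-update via FFT
    v = soft(R(x) - eta, beta / rho)              # shrinkage v-update
    eta = eta - (R(x) - v)                        # multiplier update
```

Each iteration costs two FFTs plus element-wise work; at convergence the split variable v agrees with Rx, so x solves the original anisotropic TV denoising problem.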
-
15
ADMM image denoising example
True, noisy (SNR = 24.8 dB), and denoised (SNR = 53.5 dB) 128×128 images; SNR (dB) vs ADMM iteration rises from about 25 dB to 53 dB within 20 iterations.

R: horizontal and vertical finite differences (anisotropic TV),
p = 1 (i.e., ℓ₁), β = 1/2, ρ = 1 (condition number of (I + ρR′R) is 9)
-
16
ADMM image denoising iterates
-
17
X-ray CT image reconstruction
Part 1: ADMM
-
18
X-ray CT review 1
X-ray source transmits X-ray photons through object.Recorded signal relates to line integral of attenuation along photon path.
-
19
X-ray CT review 2
X-ray source and detector rotate around object.
-
20
X-ray CT review 3
Collection of recorded views called a sinogram.Goal is to reconstruct (3D) image of object attenuation from sinogram.
-
21
Lower-dose X-ray CT
Radiation dose is proportional to X-ray source intensity.
Reducing dose =⇒ fewer recorded photons =⇒ lower SNR.
The conventional filtered back-projection (FBP) method was derived for noiseless data.
Conventional FBP reconstruction
Statistical image reconstruction
-
22
Low-dose X-ray CT image reconstruction
Regularized estimator:
x̂ = argmin_{x⪰0} (1/2)‖y − Ax‖²_W + β‖Rx‖_p
            (data fit)        (sparsity)
Complications:
• Large problem size
  ◦ x: 512×512×800 ≈ 2·10⁸ unknown image volume
  ◦ y: 888×64×7000 ≈ 4·10⁸ measured sinogram
  ◦ A: (4·10⁸) × (2·10⁸) system matrix
  ◦ A is sparse but still too large to store
  ◦ Projection Ax and back-projection A′r operations computed on the fly
  ◦ Computing the gradient ∇Ψ(x) = A′W(Ax − y) + β∇R(x) requires projection and back-projection operations that dominate computation
• A′A is not circulant (but "approximately Toeplitz" in 2D)
• A′WA is highly shift variant due to the huge dynamic range of the weighting W
• Non-quadratic (edge-preserving) regularizer, e.g., R(x) = ‖Rx‖_p
• Nonnegativity constraint
• Goal: fast parallelizable algorithms that "converge" in a few iterations
-
23
Basic ADMM for X-ray CT
Basic equivalent constrained optimization problem (cf. split Bregman):

min_{x⪰0, v} (1/2)‖y − Ax‖²_W + β‖v‖_p  sub. to v = Rx.

Corresponding (modified) augmented Lagrangian:

L(x, v; η) = (1/2)‖y − Ax‖²_W + β‖v‖_p + (ρ/2)‖Rx − v − η‖₂²

ADMM update of primal variable (unknown image):

x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [A′WA + ρR′R]⁻¹ (A′Wy + ρR′(v^(n) + η^(n)))

Drawbacks:
• Ignores nonnegativity constraint
• [A′WA + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition: a "second-order method"
• Auxiliary variable v = Rx is enormous in 3D CT
-
24
Improved ADMM for X-ray CT
min_{x⪰0, u, v} (1/2)‖y − u‖²_W + β‖v‖_p  sub. to v = Rx, u = Ax.

Corresponding (modified) augmented Lagrangian:

L(x, u, v; η₁, η₂) = (1/2)‖y − u‖²_W + β‖v‖_p + (ρ₁/2)‖Rx − v − η₁‖₂² + (ρ₂/2)‖Ax − u − η₂‖₂²

ADMM update of primal variable (ignoring nonnegativity):

argmin_x L(x, u, v; η₁, η₂) = [ρ₂A′A + ρ₁R′R]⁻¹ (ρ₁R′(v + η₁) + ρ₂A′(u + η₂))

For 2D CT, [ρ₂A′A + ρ₁R′R] is approximately Toeplitz, so a circulant preconditioner is very effective.

ADMM update of auxiliary variable u:

argmin_u L(x, u, v; η₁, η₂) = [W + ρ₂I]⁻¹ (Wy + ρ₂(Ax − η₂)),  where [W + ρ₂I] is diagonal.

v update is shrinkage again. Reasonably simple to code.
(Sathish Ramani & JF, IEEE T-MI, Mar. 2012)
http://dx.doi.org/10.1109/TMI.2011.2175233
-
25
2D X-ray CT image reconstruction results: quality
True, Hanning FBP, and regularized reconstructions (shared gray scale, 0 to 0.4).
PWLS with ℓ₁ regularization of a shift-invariant Haar wavelet transform.
No nonnegativity constraint, but probably unimportant if well regularized.
-
26
2D X-ray CT image reconstruction results: speed
RMSE(x^(j)) in dB vs iteration (0 to 360) and vs time t_j in seconds (0 to 790), decreasing from about −17.7 dB toward −34.5 dB, for: NCG-5, MFISTA-5, SB-CG-1, SB-CG-2, SB-PCG-1, SB-PCG-2, ADMM-CG-1, ADMM-CG-2, ADMM-PCG-1, ADMM-PCG-2.
A circulant preconditioner for [ρ₂A′A + ρ₁R′R]⁻¹ is crucial to acceleration.
Similar results for a real head CT scan in the paper.
-
27
Lower-memory ADMM for X-ray CT
min_{x, u, z⪰0} (1/2)‖y − u‖²_W + β‖Rz‖_p  sub. to z = x, u = Ax.

(M. McGaffin, S. Ramani, JF, SPIE 2012)

Corresponding (modified) augmented Lagrangian:

L(x, u, z; η₁, η₂) = (1/2)‖y − u‖²_W + β‖Rz‖_p + (ρ₁/2)‖x − z − η₁‖₂² + (ρ₂/2)‖Ax − u − η₂‖₂²

ADMM update of primal variable (nonnegativity not required, use PCG):

argmin_x L(x, u, z; η₁, η₂) = [ρ₂A′A + ρ₁I]⁻¹ (ρ₁(z + η₁) + ρ₂A′(u + η₂)).

ADMM update of auxiliary variable z:

argmin_{z⪰0} L(x, u, z; η₁, η₂) = argmin_{z⪰0} (ρ₁/2)‖x − z − η₁‖₂² + β‖Rz‖_p.

Use nonnegatively constrained, edge-preserving image denoising.

ADMM updates of auxiliary variables u and v same as before.
Variations...
http://dx.doi.org/10.1117/12.910941
-
28
3D X-ray CT image reconstruction results
Awaiting a better preconditioner for [ρ₂A′A + ρ₁I]⁻¹ in 3D CT...
This is not an "easy problem" like (idealized) image restoration...
-
29
X-ray CT image reconstruction
Part 2: OS+Momentum
-
30
OS+Momentum preview
• Optimization transfer (aka majorize-minimize, half-quadratic)
• Ordered-subsets (OS) acceleration (aka incremental gradient, block iterative)
• Nesterov's momentum-based acceleration
• Proposed OS+Momentum approach combines both

(Preview: RMSD [HU] vs iteration, 0 to 20; convergence plot shown later.)

Donghwan Kim, Sathish Ramani, JF, Fully3D, June 2013:
"Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods"
http://fully3d.org
-
31
Optimization transfer method
(aka majorization-minimization or half-quadratic)
• At the nth iteration, replace the original cost function Ψ(x) by a surrogate function φ^(n)(x) that is easier to minimize:

x^(n+1) = argmin_{x⪰0} φ^(n)(x)

• To monotonically decrease Ψ(x), i.e., Ψ(x^(n+1)) ≤ Ψ(x^(n)), the surrogate should satisfy the following majorization conditions:

φ^(n)(x^(n)) = Ψ(x^(n))
φ^(n)(x) ≥ Ψ(x), ∀x ⪰ 0
-
32
Separable quadratic surrogate (SQS)
Quadratic surrogate functions are particularly convenient:

Ψ(x) ≤ φ^(n)(x) ≜ Ψ(x^(n)) + ∇Ψ(x^(n))(x − x^(n)) + (1/2)‖x − x^(n)‖²_D,

where D is a specially designed diagonal matrix:

D = D_L + βD_R,  D_L ≜ diag{A′WA·1},  D_R ≜ λ_max(∇²R(x)) I.

(Erdoğan and Fessler, PMB, 1999)
(Easier to compute D_L than to find the Lipschitz constant of A′WA.)

SQS leads to a trivial M-step:

x^(n+1) = argmin_{x⪰0} φ^(n)(x) = [x^(n) − D⁻¹∇Ψ(x^(n))]₊,

a "diagonally preconditioned gradient projection method."
http://dx.doi.org/10.1088/0031-9155/44/11/311
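A small NumPy sketch of the SQS update (illustrative, not the toolbox implementation), using a nonnegative A and diagonal weights so that D_L = diag{A′WA·1} is a valid majorizer; the test relies on the guaranteed monotone decrease.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 20
A = rng.random((m, n))            # nonnegative system matrix (as in CT)
w = rng.random(m) + 0.1           # diagonal statistical weighting W
y = A @ rng.random(n) + 0.01 * rng.standard_normal(m)

def psi(x):                        # Psi(x) = (1/2) ||y - A x||^2_W
    r = y - A @ x
    return 0.5 * np.sum(w * r * r)

# diagonal majorizer D_L = diag{A'WA 1}: valid because A'WA is
# entrywise nonnegative here, so diag of its row sums dominates it
d = A.T @ (w * (A @ np.ones(n)))

x = np.zeros(n)
costs = [psi(x)]
for _ in range(100):
    grad = A.T @ (w * (A @ x - y))
    x = np.maximum(x - grad / d, 0.0)   # SQS step with nonnegativity
    costs.append(psi(x))
```

Each step is a diagonally scaled gradient step followed by projection onto the nonnegative orthant, and the majorization conditions guarantee the cost never increases.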
-
33
Convergence rate of SQS method
Asymptotic convergence rate: ρ(I − D⁻¹∇²Ψ(x̂))

Slightly generalizing Theorem 3.1 of Beck and Teboulle:

Ψ(x^(n)) − Ψ(x̂) ≤ ‖x^(0) − x̂‖²_D / (2n).

(Beck and Teboulle, SIAM J. Im. Sci., 2009)

Pro: easily parallelized
Con: very slow convergence
http://dx.doi.org/10.1137/080716542
-
34
Accelerating SQS using Nesterov's momentum
SQS+Momentum Algorithm:
• Initialize image x^(0) and z^(0) (and t_0 = 1)
• for n = 0, 1, . . .
  x^(n+1) = argmin_{x⪰0} φ^(n)(x) = [z^(n) − D⁻¹∇Ψ(z^(n))]₊  (surrogate built at z^(n))
  t_{n+1} = (1 + √(1 + 4t_n²)) / 2
  z^(n+1) = x^(n+1) + ((t_n − 1)/t_{n+1}) (x^(n+1) − x^(n))

Convergence rate of the SQS+Momentum method:

Ψ(x^(n)) − Ψ(x̂) ≤ 2‖x^(0) − x̂‖²_D / (n+1)².

Simple generalization of Thm. 4.4 for FISTA of (Beck and Teboulle, SIAM J. Im. Sci., 2009)

Pro: almost the same computation per iteration; slightly more memory needed.
Con: still converges too slowly for X-ray CT
http://dx.doi.org/10.1137/080716542
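Continuing the previous sketch's setup (same hypothetical nonnegative A and weights), adding the t-sequence momentum needs only two extra lines per iteration; the test checks the O(1/n²) behavior loosely via a large drop in cost.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 20
A = rng.random((m, n))
w = rng.random(m) + 0.1
y = A @ rng.random(n) + 0.01 * rng.standard_normal(m)

def psi(x):
    r = y - A @ x
    return 0.5 * np.sum(w * r * r)

d = A.T @ (w * (A @ np.ones(n)))   # same diagonal majorizer D_L

x = z = np.zeros(n)
t = 1.0
t_hist = [t]
for _ in range(30):
    x_new = np.maximum(z - A.T @ (w * (A @ z - y)) / d, 0.0)  # SQS step at z
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0           # momentum weights
    z = x_new + ((t - 1.0) / t_new) * (x_new - x)              # momentum step
    x, t = x_new, t_new
    t_hist.append(t)
```

The only extra state is the previous iterate x^(n) and the scalar t_n, which is the "slightly more memory" the slide mentions.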
-
35
Ordered subsets (OS) methods
• Recall: the projection operator A is computationally expensive.
• OS methods group projection views into M subsets, and use one subset per update, instead of using all measurement data.
(Hudson and Larkin, IEEE T-MI, 1994)
(Erdoğan and Fessler, PMB, 1999)
cf. block-iterative incremental subgradient methods in machine learning

http://dx.doi.org/10.1109/42.363108
http://dx.doi.org/10.1088/0031-9155/44/11/311
-
36
OS projection view grouping
-
39
OS algorithm
Cost function decomposition:

Ψ(x) = Σ_{m=1}^{M} Ψ_m(x),  Ψ_m(x) = (1/2)‖y_m − A_m x‖²_{W_m} + (1/M) R(x)

y_m, A_m, W_m: sinogram rows, system matrix rows, weighting elements for the mth subset of projection views.

Intuition: in early iterations (when x^(n) is far from x̂):

M ∇Ψ_m(x^(n)) ≈ ∇Ψ(x^(n)).

OS-SQS Algorithm (Erdoğan and Fessler, PMB, 1999)
• Initialize image x^(0)
• for n = 0, 1, . . .
  • for m = 0, . . . , M−1
    x^(n+(m+1)/M) = [x^(n+m/M) − D⁻¹ M ∇Ψ_m(x^(n+m/M))]₊
http://dx.doi.org/10.1088/0031-9155/44/11/311
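A quick NumPy check (illustrative; the grouping and sizes are made up, and regularization is omitted for brevity) of the decomposition and the subset-gradient intuition: with an interleaved view grouping, the subset gradients sum exactly to the full gradient, and far from the solution each scaled subset gradient M∇Ψ_m approximates ∇Ψ well.

```python
import numpy as np

rng = np.random.default_rng(4)
n_views, n, M = 120, 16, 4
A = rng.random((n_views, n))
w = rng.random(n_views) + 0.1
y = A @ rng.random(n)                    # noiseless data for simplicity

# interleaved grouping: subset m takes views m, m+M, m+2M, ...
subsets = [np.arange(m, n_views, M) for m in range(M)]

def grad_full(x):
    return A.T @ (w * (A @ x - y))

def grad_sub(x, idx):
    Am = A[idx]
    return Am.T @ (w[idx] * (Am @ x - y[idx]))

x = np.zeros(n)                          # "early iteration": far from x_hat
g = grad_full(x)
gs = [grad_sub(x, idx) for idx in subsets]
```

Near the minimizer the residual loses its consistent sign and the approximation degrades, which is the root of the limit-cycle behavior discussed on the next slide.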
-
40
OS-SQS algorithm properties
• One iteration corresponds to updating all M subsets. Computation cost similar to original SQS (one full forward projection A and back-projection A′ per iteration)
• + Highly parallelizable
• + In early iterations, "we expect" the sequence {x^(n)} to satisfy
  Ψ(x^(n)) − Ψ(x̂) ⪅ ‖x^(0) − x̂‖²_D / (2nM):
  an M-times acceleration!
• − Does not converge to x̂; approaches a limit cycle whose size is related to M. (Luo, Neural Computation, June 1991)
• − Computing ∇R(x) for each of the M subsets =⇒ prefer small M
• Since about 1997, OS methods have been used for (unregularized) PET reconstruction in nearly every PET scanner sold.
• Still undesirably slow (for small M) or unstable (for large M) in X-ray CT.
http://dx.doi.org/10.1162/neco.1991.3.2.226
-
41
OS+Momentum algorithm
• Initialize image x^(0) and z^(0); t_old = 1
• for n = 0, 1, . . .
  • for m = 0, 1, . . . , M−1
    x^(n+(m+1)/M) = [z^(n+m/M) − D⁻¹ M ∇Ψ_m(z^(n+m/M))]₊
    t_new = (1 + √(1 + 4t_old²)) / 2
    z^(n+(m+1)/M) = x^(n+(m+1)/M) + ((t_old − 1)/t_new) (x^(n+(m+1)/M) − x^(n+m/M))
    t_old := t_new

• + In early iterations, "we expect" the sequence {x^(n)} to satisfy
  Ψ(x^(n)) − Ψ(x̂) ⪅ ‖x^(0) − x̂‖²_D / (2(nM)²):
  an M²-times acceleration!
• + Very similar computation to OS-SQS
• + Easily implemented
• − Unknown convergence properties
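The loop above can be sketched as follows (an illustration, not the paper's code); as a stability safeguard for this toy, the diagonal majorizer is taken conservatively to cover M·A_m′W_m A_m for every subset, which is an assumption beyond the slide's single D.

```python
import numpy as np

rng = np.random.default_rng(5)
n_views, n, M = 120, 16, 4
A = rng.random((n_views, n))
w = rng.random(n_views) + 0.1
y = A @ rng.random(n) + 0.01 * rng.standard_normal(n_views)
subsets = [np.arange(m, n_views, M) for m in range(M)]

def psi(x):
    r = y - A @ x
    return 0.5 * np.sum(w * r * r)

def grad_sub(x, idx):
    Am = A[idx]
    return Am.T @ (w[idx] * (Am @ x - y[idx]))

# conservative diagonal majorizer covering every scaled subset Hessian
# (an extra safeguard in this sketch; the slides use one D for the full cost)
d = np.max([M * A[idx].T @ (w[idx] * (A[idx] @ np.ones(n)))
            for idx in subsets], axis=0)

x = z = np.zeros(n)
t_old = 1.0
psi0 = psi(x)
for _ in range(20):                       # outer iterations
    for idx in subsets:                   # one subset per inner update
        x_new = np.maximum(z - (M * grad_sub(z, idx)) / d, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t_old ** 2)) / 2.0
        z = x_new + ((t_old - 1.0) / t_new) * (x_new - x)
        x, t_old = x_new, t_new
```

Each inner update touches only one subset of views, so one outer iteration costs about one full projection and back-projection, as in plain OS-SQS.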
-
42
Summary of convergence rates
SQS (optimization transfer) methods:

• Convergence rate
  ◦ SQS: O(1/n)
  ◦ SQS+Momentum: O(1/n²)
• Expected convergence rate with OS methods in early iterations
  ◦ OS-SQS: O(1/(nM))
  ◦ Proposed OS-SQS+Momentum: O(1/(nM)²)
• Pros: owing to the M²-times acceleration from OS methods, we can use small M, improving stability and reducing regularizer computation.
• Cons: the behavior of OS methods with momentum is unknown, while ordinary OS methods approach a limit cycle. (Luo, Neural Comp., Jun. 1991)
http://dx.doi.org/10.1162/neco.1991.3.2.226
-
43
Patient 3D helical CT scan results
• 3D cone-beam helical CT scan with pitch 1.0
• 3D image x: 512×512×109
• voxel size: 1.369 mm × 1.369 mm × 0.625 mm
• measured sinogram data y: 888×32×7146 (detector columns × detector rows × projection views)

Convergence rates (empirical)
• Root mean square difference (RMSD) between the current x^(n) and the converged image x̂:

  RMSD ≜ ‖x^(n) − x̂‖₂ / √N_p  [HU],

  where N_p = 512×512×109 is the number of image voxels in x.
• Normalized RMSD:

  NRMSD ≜ 20 log₁₀(‖x^(n) − x̂‖₂ / ‖x̂‖₂)  [dB].

• x̂ obtained by many iterations of several convergent algorithms
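In code, the two metrics are one-liners (illustrative):

```python
import numpy as np

def rmsd(x, xhat):
    """Root mean square difference, in the units of x (HU here)."""
    return np.linalg.norm(x - xhat) / np.sqrt(x.size)

def nrmsd(x, xhat):
    """Normalized RMSD in dB."""
    return 20 * np.log10(np.linalg.norm(x - xhat) / np.linalg.norm(xhat))

# sanity check with a known offset: x = xhat + 5 gives RMSD = 5 HU
xhat = 100.0 * np.ones(1000)
x = xhat + 5.0
```

RMSD keeps the physical units (HU), while NRMSD is scale free, which makes it comparable across data sets.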
-
44
Convergence rate: RMSD [HU]
RMSD [HU] vs iteration (0 to 20), decreasing from about 30 HU, for SQS, SQS-momentum, OS(24)-SQS, and OS(24)-SQS-momentum.

• Slow convergence without OS methods.
• OS methods with M = 24 subsets needed 20% extra compute time per iteration due to ∇R(x).
• OS-SQS-Momentum "converges" very rapidly in early iterations!
• Does not reach RMSD = 0...
-
45
Convergence rate: Normalized RMSD [dB]
Normalized RMSD [dB] vs iteration (0 to 20), decreasing from about −25 dB toward −50 dB, for SQS, SQS-momentum, OS(24)-SQS, and OS(24)-SQS-momentum.
Combining incremental (sub)gradient with Nesterov-type momentumacceleration may help other “big data” estimation problems.
-
46
Images
Initial FBP image x^(0) and converged image x̂ (display window [800, 1200] HU).
-
47
Images
Reconstructed images at the 12th iteration (display window [800, 1200] HU):
SQS: RMSD = 29.3 HU; SQS-momentum: RMSD = 27.9 HU;
OS(24)-SQS: RMSD = 16.8 HU; OS(24)-SQS-momentum: RMSD = 1.9 HU.
OS-SQS+Momentum with M = 24 subsets much closer to minimizer x̂xx
-
48
Difference images
Differences between reconstructed images at the 12th iteration and the converged image x̂ (display window [−100, 100] HU):
SQS: RMSD = 29.3 HU; SQS-momentum: RMSD = 27.9 HU;
OS(24)-SQS: RMSD = 16.8 HU; OS(24)-SQS-momentum: RMSD = 1.9 HU.
-
49
Newer Nesterov method
RMSD [HU] vs iteration (0 to 20) for SQS-Nes83, SQS-Nes05, OS24-SQS, OS48-SQS, OS24-SQS-Nes83, OS24-SQS-Nes05, OS48-SQS-Nes83, and OS48-SQS-Nes05.

Remains stable even for M = 48 subsets. (Nesterov, Math. Prog., May 2005)
http://dx.doi.org/10.1007/s10107-004-0552-5
-
50
Some research problems in CT
x̂ = argmin_{x⪰0} (1/2)‖y − Ax‖²_W + βR(x).

Reasonably mature research areas
• Design and implementation of the system model A
• Statistical modeling W

Open problems
• Design of the regularizer R(x) to maximize radiologist performance
• Faster parallelizable algorithms (argmin) with global convergence
• Distributed computation: reducing communication
• Algorithms for more complete/complicated physical models (e.g., dual-energy or spectral CT)
• Dynamic imaging / motion-compensated image reconstruction
• Analysis of statistical properties of the (highly nonlinear) estimator x̂
-
51
Image reconstruction for parallel MRI
-
52
Parallel MRI
Undersampled Cartesian k-space, multiple receive coils, ... (Pruessmann et al., MRM, Nov. 1999)

Compressed sensing parallel MRI ≡ further (random) under-sampling (Lustig et al., IEEE Sig. Proc. Mag., Mar. 2008)

http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S
http://dx.doi.org/10.1109/MSP.2007.914728
-
53
Model-based image reconstruction in parallel MRI
Regularized estimator:

x̂ = argmin_x (1/2)‖y − FSx‖₂² + β‖Rx‖_p
          (data fit)       (sparsity)

F is an under-sampled DFT matrix (fat).

Features:
• coil sensitivity matrix S is block diagonal (Pruessmann et al., MRM, Nov. 1999)
• F′F is circulant

Complications:
• Data-fit Hessian S′F′FS is highly shift variant due to coil sensitivity maps
• Non-quadratic (edge-preserving) regularization ‖·‖_p
• Complex quantities
• Large problem size (if 3D)
http://dx.doi.org/10.1002/(SICI)1522-2594(199911)42:53.0.CO;2-S
-
54
Basic ADMM for parallel MRI
Basic equivalent constrained optimization problem (cf. split Bregman):

min_{x,v} (1/2)‖y − FSx‖₂² + β‖v‖_p  sub. to v = Rx.

Corresponding (modified) augmented Lagrangian:

L(x, v; η) = (1/2)‖y − FSx‖₂² + β‖v‖_p + (ρ/2)‖Rx − v − η‖₂²

(Skipping technical details about complex vectors.)

ADMM update of primal variable (unknown image):

x^(n+1) = argmin_x L(x, v^(n), η^(n)) = [S′F′FS + ρR′R]⁻¹ (S′F′y + ρR′(v^(n) + η^(n)))

• [S′F′FS + ρR′R]⁻¹ requires iteration (e.g., PCG) but is hard to precondition
• (Trivial for the single-coil case with S = I.)
• The "problem" matrix is on the opposite side:
  ◦ MRI: FS
  ◦ Restoration: TA
-
55
Improved ADMM for parallel MRI
min_{x,u,v,z} (1/2)‖y − Fu‖₂² + β‖v‖_p  sub. to v = Rz, u = Sx, z = x.

Corresponding (modified) augmented Lagrangian:

L(x, u, v, z; η₁, η₂, η₃) = (1/2)‖y − Fu‖₂² + β‖v‖_p + (ρ₁/2)‖Rz − v − η₁‖₂² + (ρ₂/2)‖Sx − u − η₂‖₂² + (ρ₃/2)‖x − z − η₃‖₂²

ADMM update of primal variable:

argmin_x L = [ρ₂S′S + ρ₃I]⁻¹ (ρ₂S′(u + η₂) + ρ₃(z + η₃)),  where [ρ₂S′S + ρ₃I] is diagonal.

ADMM updates of auxiliary variables:

argmin_u L = [F′F + ρ₂I]⁻¹ (F′y + ρ₂(Sx − η₂)),  where [F′F + ρ₂I] is circulant.

argmin_z L = [ρ₁R′R + ρ₃I]⁻¹ (ρ₁R′(v + η₁) + ρ₃(x − η₃)),  where [ρ₁R′R + ρ₃I] is circulant.

v update is shrinkage again.
Simple, but does not satisfy sufficient conditions.
(Sathish Ramani & JF, IEEE T-MI, Mar. 2011)
http://dx.doi.org/10.1109/TMI.2010.2093536
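A 1D sketch of these four updates (illustrative only; the sizes, sensitivity profiles, sampling pattern, and parameters are all made up): S stacks two diagonal coil-sensitivity blocks, F is an undersampled orthonormal DFT, and each subproblem is the stated diagonal solve, circulant (FFT) solve, or complex shrinkage.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 64
x_true = np.zeros(n, dtype=complex)
x_true[16:40] = 1.0

# two coils with smooth (made-up) sensitivity profiles; S is block diagonal
t = np.arange(n)
s = np.stack([1.0 + 0.5 * np.cos(2 * np.pi * t / n),
              1.0 - 0.5 * np.cos(2 * np.pi * t / n)]).astype(complex)

mask = rng.random(n) < 0.5
mask[0] = True                                   # keep DC sampled
F = lambda u: mask * np.fft.fft(u, norm="ortho")  # per-coil undersampled DFT
Ft = lambda k: np.fft.ifft(mask * k, norm="ortho")

y = np.stack([F(s[c] * x_true) for c in range(2)])  # noiseless k-space data

R = lambda x: np.roll(x, -1) - x                 # periodic differences
Rt = lambda v: np.roll(v, 1) - v

def csoft(z, tau):                               # complex (magnitude) shrinkage
    mag = np.abs(z)
    return np.where(mag > tau, (1 - tau / np.maximum(mag, 1e-30)) * z, 0.0)

beta, r1, r2, r3 = 0.005, 1.0, 1.0, 1.0
ssq = np.sum(np.abs(s) ** 2, axis=0)             # diagonal of S'S
eigR = r1 * (2.0 - 2.0 * np.cos(2 * np.pi * t / n))  # eigenvalues of r1 R'R

def cost(x):
    fit = sum(np.sum(np.abs(y[c] - F(s[c] * x)) ** 2) for c in range(2))
    return 0.5 * fit + beta * np.sum(np.abs(R(x)))

x = np.sum(np.conj(s) * np.stack([Ft(y[c]) for c in range(2)]), axis=0)  # adjoint init
z, u, v = x.copy(), np.stack([s[c] * x for c in range(2)]), R(x)
e1, e2, e3 = np.zeros(n, complex), np.zeros((2, n), complex), np.zeros(n, complex)
c0 = cost(x)
for _ in range(300):
    # x-update: diagonal solve [r2 S'S + r3 I]^{-1}
    x = (r2 * np.sum(np.conj(s) * (u + e2), axis=0) + r3 * (z + e3)) / (r2 * ssq + r3)
    # u-update (per coil): circulant solve [F'F + r2 I]^{-1} via FFT
    for c in range(2):
        rhs = Ft(y[c]) + r2 * (s[c] * x - e2[c])
        u[c] = np.fft.ifft(np.fft.fft(rhs, norm="ortho") / (mask + r2), norm="ortho")
    # z-update: circulant solve [r1 R'R + r3 I]^{-1} via FFT
    rhs = r1 * Rt(v + e1) + r3 * (x - e3)
    z = np.fft.ifft(np.fft.fft(rhs) / (eigR + r3))
    v = csoft(R(z) - e1, beta / r1)              # shrinkage v-update
    e1 = e1 - (R(z) - v)                         # multiplier updates
    e2 = e2 - (np.stack([s[c] * x for c in range(2)]) - u)
    e3 = e3 - (x - z)
```

No inner iterative solver appears anywhere: every update is diagonal, an FFT pair, or element-wise shrinkage, which is the point of the extra splitting variables u and z.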
-
56
2.5D parallel MR image reconstruction results: data
Fully sampled body coil image of a human brain.
Poisson-disk-based k-space sampling, 16% sampling (acceleration 6.25).
Square root of sum-of-squares inverse FFT of zero-filled k-space data.
-
57
2.5D parallel MR image reconstruction results: IQ
• Fully sampled body coil image of a human brain
• Regularized reconstruction x^(∞) (1000s of iterations of MFISTA) (A. Beck & M. Teboulle, SIAM J. Im. Sci., 2009); combined TV and ℓ₁ norm of two-level undecimated Haar wavelets
• Difference image magnitude
http://dx.doi.org/10.1137/080716542
-
58
2.5D parallel MR image reconstruction results: speed
ξ(j) in dB vs time t_j in seconds (0 to 773), for MFISTA-1, MFISTA-5, NCG-1, NCG-5, AL-P1-4, AL-P1-6, AL-P1-10, and AL-P2.
AL approach converges to xxx(∞) much faster than MFISTA and CG
-
59
Current and future directions with ADMM
• Motion-compensated image reconstruction: y = A T(α) x + ε
  (J. H. Cho, S. Ramani, JF, 2nd CT meeting, 2012)
  (J. H. Cho, S. Ramani, JF, IEEE Stat. Sig. Proc. Workshop, 2012)
• Dynamic image reconstruction
• Improved preconditioners for ADMM for 3D CT (M. McGaffin and JF, submitted to Fully3D 2013)
• Combining ADMM with ordered subsets (OS) methods (H. Nien and JF, submitted to Fully3D 2013)
• Generalize parallel MRI algorithm to include a spatial support constraint (M. Le, S. Ramani, JF, to appear at ISMRM 2013)
• Non-Cartesian MRI (combine optimization transfer and variable splitting) (S. Ramani and JF, ISBI 2013, to appear)
• SPECT-CT reconstruction with a non-local means regularizer (S. Y. Chun, Y. K. Dewaraja, JF, submitted to Fully3D 2013)
• Estimation of coil sensitivity maps (a quadratic problem!) (M. J. Allison, S. Ramani, JF, IEEE T-MI, Mar. 2013)
• L1-SPIRiT for non-Cartesian parallel MRI (D. S. Weller, S. Ramani, JF, IEEE T-MI, 2013, submitted)
• Multi-frame super-resolution
• Selection of the AL penalty parameter ρ to optimize convergence rate
• Other non-ADMM methods...
http://dx.doi.org/10.1109/SSP.2012.6319667
http://dx.doi.org/10.1109/TMI.2012.2229711
-
60
Acknowledgements
CT group
• Jang Hwan Cho
• Donghwan Kim
• Jungkuk Kim
• Madison McGaffin
• Hung Nien
• Stephen Schmitt

MR group
• Michael Allison
• Mai Le
• Antonis Matakos
• Matthew Muckley

Post-doctoral fellows
• Se Young Chun
• Sathish Ramani
• Daniel Weller

UM collaborators
• Doug Noll
• Jon-Fredrik Nielsen
• Mitch Goodsitt
• Ella Kazerooni
• Tom Chenevert
• Charles Meyer

GE collaborators
• Bruno De Man
• Jean-Baptiste Thibault
-
61
Image reconstruction toolbox
MATLAB (and increasingly Octave) toolbox for imaging inverse problems (MRI, CT, PET, SPECT, deblurring)

web.eecs.umich.edu/~fessler

(3D helical CT source trajectory plot: x, y, z axes in mm.)
-
62
Bibliography
[1] A. Matakos, S. Ramani, and J. A. Fessler. Accelerated edge-preserving image restoration without boundary artifacts. IEEE Trans. Im. Proc., 22(5):2019–29, May 2013.
[2] S. Ramani and J. A. Fessler. A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction. IEEE Trans. Med. Imag., 31(3):677–88, March 2012.
[3] D. Kim, S. Ramani, and J. A. Fessler. Accelerating X-ray CT ordered subsets image reconstruction with Nesterov's first-order methods. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 22–5, 2013.
[4] D. Kim, S. Ramani, and J. A. Fessler. Ordered subsets with momentum for accelerated X-ray CT image reconstruction. In Proc. IEEE Conf. Acoust. Speech Sig. Proc., pages 920–3, 2013.
[5] S. Ramani and J. A. Fessler. Parallel MR image reconstruction using augmented Lagrangian methods. IEEE Trans. Med. Imag., 30(3):694–706, March 2011.
[6] M. Elad, P. Milanfar, and R. Rubinstein. Analysis versus synthesis in signal priors. Inverse Prob., 23(3):947–68, June 2007.
[7] I. W. Selesnick and M. A. T. Figueiredo. Signal restoration with overcomplete wavelet transforms: comparison of analysis and synthesis priors. In Proc. SPIE 7446 Wavelets XIII, page 74460D, 2009.
[8] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imaging Sci., 1(3):248–72, 2008.
[9] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Im. Proc., 19(9):2345–56, September 2010.
[10] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. & Trends in Machine Learning, 3(1):1–122, 2010.
[11] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, April 1992.
[12] J. Douglas and H. H. Rachford. On the numerical solution of heat conduction problems in two and three space variables. Trans. Amer. Math. Soc., 82(2):421–39, July 1956.
[13] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–43, 2009.
[14] S. J. Reeves. Fast image restoration without boundary artifacts. IEEE Trans. Im. Proc., 14(10):1448–53, October 2005.
[15] M. G. McGaffin, S. Ramani, and J. A. Fessler. Reduced memory augmented Lagrangian algorithm for 3D iterative X-ray CT image reconstruction. In Proc. SPIE 8313 Medical Imaging 2012: Phys. Med. Im., page 831327, 2012.
[16] H. Erdoğan and J. A. Fessler. Ordered subsets algorithms for transmission tomography. Phys. Med. Biol., 44(11):2835–51, November 1999.
[17] H. M. Hudson and R. S. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601–9, December 1994.
[18] Z. Q. Luo. On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation, 3(2):226–45, June 1991.
[19] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–52, May 2005.
[20] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: sensitivity encoding for fast MRI. Mag. Res. Med., 42(5):952–62, November 1999.
[21] M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing MRI. IEEE Sig. Proc. Mag., 25(2):72–82, March 2008.
[22] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.
[23] M. J. Allison, S. Ramani, and J. A. Fessler. Accelerated regularized estimation of MR coil sensitivities using augmented Lagrangian methods. IEEE Trans. Med. Imag., 32(3):556–64, March 2013.
[24] M. J. Allison and J. A. Fessler. Accelerated computation of regularized field map estimates. In Proc. Intl. Soc. Mag. Res. Med., page 0413, 2012.
[25] J. H. Cho, S. Ramani, and J. A. Fessler. Motion-compensated image reconstruction with alternating minimization. In Proc. 2nd Intl. Mtg. on image formation in X-ray CT, pages 330–3, 2012.
[26] J. H. Cho, S. Ramani, and J. A. Fessler. Alternating minimization approach for multi-frame image reconstruction. In IEEE Workshop on Statistical Signal Processing, pages 225–8, 2012.
[27] M. McGaffin and J. A. Fessler. Sparse shift-varying FIR preconditioners for fast volume denoising. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 284–7, 2013.
[28] H. Nien and J. A. Fessler. Combining augmented Lagrangian method with ordered subsets for X-ray CT image reconstruction. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 280–3, 2013.
[29] M. Le, S. Ramani, and J. A. Fessler. An efficient variable splitting based algorithm for regularized SENSE reconstruction with support constraint. In Proc. Intl. Soc. Mag. Res. Med., page 2654, 2013. To appear.
[30] S. Ramani and J. A. Fessler. Accelerated non-Cartesian SENSE reconstruction using a majorize-minimize algorithm combining variable-splitting. In Proc. IEEE Intl. Symp. Biomed. Imag., pages 700–3, 2013.
[31] S. Y. Chun, Y. K. Dewaraja, and J. A. Fessler. Alternating direction method of multiplier for emission tomography with non-local regularizers. In Proc. Intl. Mtg. on Fully 3D Image Recon. in Rad. and Nuc. Med., pages 62–5, 2013.
[32] D. Weller, S. Ramani, and J. A. Fessler. Augmented Lagrangian with variable splitting for faster non-Cartesian L1-SPIRiT MR image reconstruction. IEEE Trans. Med. Imag., 2013. Submitted.
[33] J. A. Fessler and W. L. Rogers. Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs. IEEE Trans. Im. Proc., 5(9):1346–58, September 1996.