
Page 1: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal Recovery

Aleksandar Dogandžić
Electrical and Computer Engineering

joint work with Renliang Gu, Ph.D. student

Page 2: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Terminology and Notation

Soft-thresholding operator for $a = (a_i)_{i=1}^N \in \mathbb{R}^N$:

$$[\mathcal{T}_\lambda(a)]_i = \operatorname{sign}(a_i)\,\max\bigl(|a_i| - \lambda,\, 0\bigr);$$

"$\succeq$" is the elementwise version of "$\geq$".

Proximal operator for a function $r(x)$ scaled by $\lambda$:

$$\operatorname{prox}_{\lambda r} a = \arg\min_x \tfrac{1}{2}\|x - a\|_2^2 + \lambda\, r(x).$$

$\varepsilon$-subgradient (Rockafellar 1970, Sec. 23):

$$\partial_\varepsilon r(x) \triangleq \bigl\{ g \in \mathbb{R}^p \mid r(z) \geq r(x) + (z - x)^T g - \varepsilon,\ \forall z \in \mathbb{R}^p \bigr\}.$$
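These operators are simple to realize numerically. A minimal NumPy sketch (not from the slides; the closed-form prox shown assumes the separable penalty $r(x) = \|x\|_1$, i.e., an identity sparsifying transform):

```python
import numpy as np

def soft_threshold(a, lam):
    """Soft-thresholding [T_lam(a)]_i = sign(a_i) * max(|a_i| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def prox_l1(a, lam):
    """prox_{lam r}(a) for r(x) = ||x||_1: the prox reduces to soft-thresholding."""
    return soft_threshold(a, lam)
```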


Page 3: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Introduction I

For most natural signals $x$,

$$\#\ \text{significant coefficients of } \psi(x) \ll \text{signal size } p$$

where $\psi(x): \mathbb{R}^p \mapsto \mathbb{R}^{p'}$ is a sparsifying transform.

Page 4: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Sparsifying Transforms I

[Figure: a $p$-pixel image and its discrete wavelet transform (DWT) coefficients, related by a linear transform; # significant coeffs $\ll p$.]

$$\psi(x) = \Psi^T x$$

where $\Psi \in \mathbb{R}^{p \times p'}$ is a known sparsifying dictionary matrix.

Page 5: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Sparsifying Transforms II

[Figure: a $p$-pixel image and its gradient map; # significant coeffs $\ll p$.]

$$[\psi(x)]_i \triangleq \sqrt{\sum_{j \in \mathcal{N}_i} (x_i - x_j)^2}, \qquad i = 1, \ldots, p'$$

where $\mathcal{N}_i$ is the index set of neighbors of $x_i$.
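As an illustration, here is a minimal NumPy sketch of this gradient map for a 2-D image, assuming each neighbor set $\mathcal{N}_i$ contains the right and bottom neighbors (one common choice; the slide does not specify $\mathcal{N}_i$):

```python
import numpy as np

def gradient_map(x):
    """[psi(x)]_i = sqrt(sum over j in N_i of (x_i - x_j)^2) for a 2-D image x,
    with N_i = {right neighbor, bottom neighbor}; boundary pixels keep fewer terms."""
    dh = np.zeros_like(x, dtype=float)
    dv = np.zeros_like(x, dtype=float)
    dh[:, :-1] = x[:, :-1] - x[:, 1:]   # differences with right neighbors
    dv[:-1, :] = x[:-1, :] - x[1:, :]   # differences with bottom neighbors
    return np.sqrt(dh**2 + dv**2)
```

Summing the entries of this map gives the isotropic TV seminorm, which is how the TV penalty $\|\psi(x)\|_1$ arises later in the talk.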

Page 6: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Convex-Set Constraint

x 2 Cwhere C is a nonempty closed convex set.Example: the nonnegative signal set

C D RpC

is of significant practical interest and applicable to X-ray CT,SPECT, PET, and MRI.

6 / 71

Page 7: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Goal

Sense the significant components of $\psi(x)$ using a small number of measurements. Define the noiseless measurement vector $\phi(x)$, where $\phi(\cdot): \mathbb{R}^p \mapsto \mathbb{R}^N$ ($N \ll p$).

Example: linear model

$$\phi(x) = \Phi x$$

where $\Phi \in \mathbb{R}^{N \times p}$ is a known sensing matrix.

Page 8: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Penalized NLL

Objective function:

$$f(x) = L(x) + \underbrace{u\,\|\psi(x)\|_1 + I_C(x)}_{u\,r(x)}$$

- $L(x)$: convex differentiable negative log-likelihood (NLL),
- $u\,\|\psi(x)\|_1 + I_C(x)$: convex penalty term,
- $u > 0$ is a scalar tuning constant,
- $C \subseteq \operatorname{cl}\bigl(\operatorname{dom} L(x)\bigr)$.

Page 9: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comment

Our objective function f .x/ isconvex

has a convex set as minimum (unique if L.x/ is stronglyconvex),

not differentiable with respect to the signal xcannot apply usual gradient- or Newton-type algorithms,need proximal-gradient (PG) schemes.

9 / 71

Page 10: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Goals

Develop a fast algorithm withO�k�2

�convergence-rate and

iterate convergence guaranteesfor minimizing f .x/ that

is general (for a diverse set of NLLs),requires minimal tuning, andis matrix-free�, a must for solving large-scale problems.

�involves only matrix-vector multiplications implementable using, e.g.,function handle in Matlab

10 / 71

Page 11: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Majorizing Function

Define the quadratic approximation of the NLL L.x/:

Qˇ�x j xx� D L.xx/C .x � xx/TrL.xx/C 1

2ˇkx � xxk22

with ˇ chosen so that Qˇ�x j xx� majorizes L.x/ in the

neighborhood of x D xx.

x̄(j)x̄(i)

y

0 x

L(x)Qβ(i) (x|x̄(i))Qβ(j) (x|x̄(j))
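A minimal sketch of how this majorization test can be coded (names are placeholders: L and grad_L stand for the NLL and its gradient):

```python
import numpy as np

def Q(L, grad_L, x, x_bar, beta):
    """Quadratic approximation Q_beta(x | x_bar) of the NLL around x_bar."""
    d = x - x_bar
    return L(x_bar) + d @ grad_L(x_bar) + (d @ d) / (2.0 * beta)

def majorizes_at(L, grad_L, x, x_bar, beta):
    """Local majorization condition L(x) <= Q_beta(x | x_bar), checked at x only."""
    return L(x) <= Q(L, grad_L, x, x_bar, beta)
```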


Page 12: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

x̄(j)x̄(i)

y

0 x

L(x)Qβ(i) (x|x̄(i))Qβ(j) (x|x̄(j))

x̄(j)x̄(i)

y

0 x

L(x)Qβ(i) (x|x̄(i))Qβ(j) (x|x̄(j))

Q10β(j) (x|x̄(j))

Figure 1: Majorizing function: Impact of ˇ.

No need for strict majorization, sufficient to majorize in theneighborhood of xx where we wish to move next!

12 / 71

Page 13: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

PNPG Method: Iteration i

B.i/ D ˇ.i�1/=ˇ.i/

� .i/ D˚1; i � 11 Cqb C B.i/�� .i�1/�2; i > 1

xx.i/ D PC�x.i�1/ C � .i�1/ � 1

� .i/

�x.i�1/ � x.i�2/

��accel. step

x.i/ D proxˇ .i/ur

�xx.i/ � ˇ.i/rL�xx.i/�� PG step

where ˇ.i/ > 0 is an adaptive step size:satisfies

L�x.i/

� � Qˇ .i/

�x.i/ j xx.i/

�majorization condition,

is as large as possible.more
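A minimal sketch of one such iteration, under the definitions above (project_C, prox, and grad_L are problem-specific placeholders; the backtracking search that actually sets $\beta^{(i)}$ is given in Algorithm 1 below):

```python
import numpy as np

def pnpg_iteration(x_prev, x_prev2, theta_prev, beta_prev, beta,
                   grad_L, prox, project_C, u, gamma=2.0, b=0.25):
    """One PNPG update (i > 1): projected acceleration step + PG step."""
    B = beta_prev / beta
    theta = 1.0 / gamma + np.sqrt(b + B * theta_prev**2)
    # acceleration step, projected onto the convex set C
    x_bar = project_C(x_prev + (theta_prev - 1.0) / theta * (x_prev - x_prev2))
    # proximal-gradient step with step size beta; prox(a, lam) solves prox_{lam r}(a)
    x = prox(x_bar - beta * grad_L(x_bar), beta * u)
    return x, x_bar, theta
```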


Page 14: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Method: Iteration i

Same iteration as on the previous slide; the projection $P_C$ in the acceleration step allows a general $\operatorname{dom} L$.

Page 15: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

PNPG Method: Iteration i

B.i/ D ˇ.i�1/=ˇ.i/

� .i/ D˚i; i � 11 Cqb C B.i/�� .i�1/�2; i > 1

xx.i/ D PC�x.i�1/ C � .i�1/ � 1

� .i/

�x.i�1/ � x.i�2/

��accel. step

x.i/ D proxˇ .i/ur

�xx.i/ � ˇ.i/rL�xx.i/�� PG step

where ˇ.i/ > 0 is an adaptive step size:satisfies

L�x.i/

� � Qˇ .i/

�x.i/ j xx.i/

�majorization condition,

needs to hold for x.i/, not for all x!

L�x� — Qˇ .i/

�x j xx.i/

�in general, for an arbitrary x! more

15 / 71

Page 16: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Algorithm 1: PNPG method

Input: x.�1/, u, n, m, �, �, and threshold �Output: arg min

xf .x/

Initialization: � .0/ 0, x.0/ 0, i 0, � 0, ˇ.1/ by the BB methodrepeat

i i C 1 and � � C 1while true do // backtracking search

evaluate B.i/, � .i/, and xx.i/if xx.i/ … domL then // domain restart

� .i�1/ 1 and continue

solve the PG stepif majorization condition holds then

breakelse

if ˇ.i/ > ˇ.i�1/ then // increase nn nCm

ˇ.i/ �ˇ.i/ and � 0

if i > 1 and f�x.i/

�> f

�x.i�1/

�then // function restart

� .i�1/ 1, i i � 1, and continue

if convergence condition holds thendeclare convergence

if � � n then // adapt step size� 0 and ˇ.iC1/ ˇ.i/=�

elseˇ.iC1/ ˇ.i/

until convergence declared or maximum number of iterations exceeded16 / 71

Page 17: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

[Figure 2: Step size $\beta^{(i)}$ versus iteration count for PNPG ($n = 4$) and PNPG ($n = \infty$): illustration of step-size selection for the Poisson generalized linear model (GLM) with identity link.]

Page 18: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Momentum Illustration

xx.i/ D PC� yx.i/

¥x.i�1/ C �.i�1/�1

�.i/

�x.i�1/ � x.i�2/� 

momentum termprevents zigzagging

b

b

b

b

x(i−2)

x(i−1)

x̂(i)

x(i)

−∇L(x(i))

C

18 / 71

Page 19: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comments on the extrapolation term � .i/ I

� .i/ D 1 Cqb C B.i/�� .i�1/�2; i � 2

where

� 2; b 2 �0; 1=4�are momentum tuning constants.

To establish O�k�2

�convergence of PNPG, need

� .i/ � 12Cq14C B.i/�� .i�1/�2; i � 2:

controls the rate of increase of � .i/.

19 / 71

Page 20: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comments on the extrapolation term � .i/ II

� .i/x? implies stronger momentum.

Effect of step size

In the “steady state” where ˇ.i�1/ D ˇ.i/, � .i/x? aproximatelylinearly with i , with slope 1= .Changes in the step size affect � .i/:ˇ.i/ < ˇ.i�1/ step size decrease, faster increase of � .i/,ˇ.i/ > ˇ.i�1/ step size increase, decrease or slower increase of � .i/

than in the steady state.
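A quick numerical check of the steady-state claim: with $B^{(i)} = 1$ fixed and $(\gamma, b) = (2, 1/4)$, the recursion should grow approximately linearly with slope $1/\gamma = 0.5$:

```python
import numpy as np

gamma, b = 2.0, 0.25
theta = 1.0                                        # theta^(1) = 1
for i in range(2, 1002):                           # 1000 steady-state updates
    theta = 1.0 / gamma + np.sqrt(b + theta**2)    # B^(i) = 1
print(theta / 1001)                                # ~0.5, i.e., slope 1/gamma
```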


Page 21: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Proximal Mapping

To compute

prox�r a D arg minx

12kx � ak22 C �r.x/

usefor `1-norm penalty with .x/ D ‰Tx, alternating directionmethod of multipliers (ADMM)

o iterative

for total-variation (TV)-norm penalty with gradient map .x/,an inner iteration with the TV-based denoising method in(Beck and Teboulle 2009b).

o iterative

21 / 71

Page 22: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Inexact PG Steps

B.i/ D ˇ.i�1/=ˇ.i/

� .i/ D˚i; i � 11 Cqb C B.i/�� .i�1/�2; i > 1

xx.i/ D PC

�x.i�1/ C � .i�1/ � 1

� .i/

�x.i�1/ � x.i�2/�� accel. step

x.i/ Ñ".i/ proxˇ .i/ur

�xx.i/ � ˇ.i/rL�xx.i/�� PG step

Because of their iterative nature, PG steps are inexact: ".i/

quantifies the precision of the PG step in Iteration i .more

22 / 71

Page 23: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Remark (Monotonicity)

The projected Nesterov’s proximal-gradient (PNPG) iteration withrestart is non-increasing:

f�x.i/

� � f �x.i�1/�if the inexact PG steps are sufficiently accurate and satisfy

".i/ �pı.i/

where

ı.i/ , x.i/ � x.i�1/ 2

2

is the local variation of signal iterates.

23 / 71

Page 24: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Convergence Criterion

pı.i/ < �

x.i/ 2

where � > 0 is the convergence threshold.more

RestartThe goal of function and domain restarts is to ensure that

the PNPG iteration is monotonic andxx.i/ and x.i/ remain within domf .

more

24 / 71

Page 25: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Summary of PNPG Approach

Combineconvex-set projection withNesterov acceleration.

Applyadaptive step size,restart.

more

25 / 71

Page 26: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Why?

Thanks to step-size adaptation, no need for Lipschitzcontinuity of the gradient of the NLL.domL does not have to be Rp.

Extends the application of the Nesterov’s acceleration� to moregeneral measurement models than those used previously.

�Y. Nesterov, “A method of solving a convex programming problem withconvergence rate O.1=k2/,” Sov. Math. Dokl., vol. 27, 1983, pp. 372–376.

26 / 71

Page 27: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Theorem (Convergence of the Objective Function)

Assume

NLL L.x/ is convex and differentiable and r.x/ is convex,C � domL: no need for domain restart.

Consider the PNPG iteration without restart.

27 / 71

Page 28: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Theorem (Convergence of the Objective Function)

$$f\bigl(x^{(k)}\bigr) - f\bigl(x^\star\bigr) \leq \frac{2 \bigl\| x^{(0)} - x^\star \bigr\|_2^2 + E^{(k)}}{2 \Bigl( \sqrt{\beta^{(1)}} + \sum_{i=1}^{k} \sqrt{\beta^{(i)}} \Bigr)^{2}}$$

where

$$E^{(k)} \triangleq \sum_{i=1}^{k} \bigl( \theta^{(i)} \varepsilon^{(i)} \bigr)^2$$

is an error term that accounts for the inexact PG steps, and

$$x^\star \triangleq \arg\min_x f(x).$$

Page 29: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comments

f�x.k/

� � f �x?� � 2 x.0/ � x? 22C E.k/

2�p

ˇ.1/ CPkiD1

pˇ.i/

�2Step sizes ˇ.i/

x?, convergence-rate upper bound?y.

better initialization, convergence-rate upper bound #.smaller prox-step approx. error, convergence-rate bound #.

29 / 71

Page 30: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

CorollaryUnder the condition of the Theorem,

f�x.k/

� � f �x?� � 2

x.0/ � x? 22C E.k/

2k2ˇminŸO�k�2

�if E.C1/ < C1

provided that

ˇmin ,C1minkD1

ˇ.k/ > 0:

The assumption that the step-size sequence is lower-bounded by astrictly positive quantity is weaker than Lipschitz continuity ofrL.x/ because it is guaranteed to have ˇmin > �=L if rL.x/ has aLipschitz constant L.

30 / 71

Page 31: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Theorem (Convergence of Iterates)

Assume that1 NLL L.x/ is convex and differentiable and r.x/ is convex,2 C � domL, hence no need for domain restart,

3 cumulative error term E.k/ converges: E.C1/ < C1,4 momentum tuning constants satisfy > 2 and b 2 �0; 1= 2�,5 the step-size sequence

�ˇ.i/

�C1iD1 is bounded within the range�

ˇmin; ˇmax�; .ˇmin > 0/.

R The sequence of PNPG iterates x.i/ without restart convergesweakly to a minimizer of f .x/. a minimizer of f .x/.

31 / 71

Page 32: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Theorem (Convergence of Iterates)strict inequality

Assume that1 NLL L.x/ is convex and differentiable and r.x/ is convex,2 C � domL, hence no need for domain restart,

3 cumulative error term E.k/ converges: E.C1/ < C1,

4 momentum tuning const. satisfy > 2 and b 2 �0; 1= 2�,5 the step-size sequence

�ˇ.i/

�C1iD1 is bounded within the range�

ˇmin; ˇmax�; .ˇmin > 0/.

R The sequence of PNPG iterates x.i/ without restart convergesweakly to a minimizer of f .x/. a minimizer of f .x/.

narrower than�0; 1=4

32 / 71

Page 33: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Idea of Proof.Recall the inequality

� .i/ D 1 Cqb C B.i/�� .i�1/�2 � 1

2Cq14C B.i/�� .i�1/�2

used to establish convergence of the objective function.Assumption 4:

> 2; b 2 �0; 1= 2�creates a sufficient “gap” in this inequality that allows us to

show faster convergence of the objective function than theprevious theorem andestablish the convergence of iterates.

R Inspired by (Chambolle and Dossal 2015).

33 / 71

Page 34: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Introduction

Signal reconstruction from Poisson-distributed measurements withaffine model for the mean-signal intensity is important for

tomographic (Ollinger and Fessler 1997),astronomic, optical, microscopic (Bertero et al. 2009),hyperspectral (Willett et al. 2014)

imaging.

34 / 71

Page 35: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

[Figure: PET: coincidence detection due to positron decay and annihilation (Prince and Links 2015).]

Page 36: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Measurement Model

N independent measurements y D .yn/NnD1 follow the Poissondistribution with means

Œˆx C b�nwhere

ˆ 2 RN�pC ; b

are the known sensing matrix and intercept term�.

�the intercept b models background radiation and scattering, obtained, e.g.,by calibration before the measurements y have been collected
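For concreteness, a sketch of the resulting Poisson NLL and its gradient under this affine-mean model (standard Poisson GLM algebra, up to a constant independent of $x$; not copied from the slides):

```python
import numpy as np

def poisson_nll(x, Phi, b, y):
    """L(x) = sum_n ([Phi x + b]_n - y_n * log([Phi x + b]_n)), up to a constant."""
    mean = Phi @ x + b
    return np.sum(mean - y * np.log(mean))

def poisson_nll_grad(x, Phi, b, y):
    """grad L(x) = Phi^T (1 - y / (Phi x + b))."""
    mean = Phi @ x + b
    return Phi.T @ (1.0 - y / mean)
```

Note that the logarithm requires $[\Phi x + b]_n > 0$ wherever $y_n > 0$, so $\operatorname{dom} L$ need not be all of $\mathbb{R}^p$; this is exactly the situation that PNPG's domain restart handles.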


Page 37: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Existing Work

The sparse Poisson-intensity reconstruction algorithm (SPIRAL)§

approximates the logarithm functionin the underlying NLL by adding asmall positive term to it and thendescends a regularized NLLobjective function with proximalsteps that employ Barzilai-Borwein(BB) step size in each iteration,followed by backtracking.

§Z. T. Harmany et al., “This is SPIRAL-TAP: Sparse Poisson intensityreconstruction algorithms—theory and practice,” IEEE Trans. Image Process.,vol. 21, no. 3, pp. 1084–1096, Mar. 2012.

37 / 71

Page 38: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

PET Image Reconstruction

128 � 128 concentration map x.Collect the photons from 90 equallyspaced directions over 180ı, with 128radial samples at each direction,Background radiation, scattering effect,and accidental coincidence combined to-gether lead to a known intercept term b.The elements of the intercept termare set to a constant equal to 10% of

the sample mean of ˆx: b D 1Tˆx

10N1.

The model, choices of parameters in the PET system setup, andconcentration map have been adopted from Image ReconstructionToolbox (IRT) (Fessler 2016).

38 / 71

Page 39: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Numerical Example

Main metric for assessing the performance of the comparedalgorithms is relative square error (RSE)

RSE D kyx � xtruek22

kxtruek22where xtrue and yx are the true and reconstructed signal,respectively.All iterative methods use the convergence threshold

� D 10�6

and have the maximum number of iterations limited to 104.Regularization constant u has the form

u D 10a:We vary a in the range Œ�6; 3� with a grid size of 0.5 andsearch for the reconstructions with the best RSE performance.
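A sketch of this tuning protocol (reconstruct is a placeholder for any of the compared solvers, called here with the regularization constant as its only argument):

```python
import numpy as np

def rse(x_hat, x_true):
    """Relative square error ||x_hat - x_true||_2^2 / ||x_true||_2^2."""
    return np.sum((x_hat - x_true)**2) / np.sum(x_true**2)

def best_over_grid(reconstruct, x_true, a_grid=np.arange(-6.0, 3.5, 0.5)):
    """Sweep u = 10^a over the grid; keep the reconstruction with the best RSE."""
    results = [(rse(reconstruct(10.0**a), x_true), a) for a in a_grid]
    return min(results)            # (best RSE, corresponding exponent a)
```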


Page 40: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Compared Methods

Filtered backprojection (FBP) (Ollinger and Fessler 1997) andPG methods that aim at minimizing f .x/ with nonnegative x:

C D RpC:

All iterative methods initialized by FBP reconstructions.

Matlab implementation available athttp://isucsp.github.io/imgRecSrc/npg.html.

40 / 71

Page 41: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

PG Methods

PNPG with . ; b/ D .2; 1=4/.AT (Auslender and Teboulle 2006) implemented in thetemplates for first-order conic solvers (TFOCS) package(Becker et al. 2011) with a periodic restart every 200 iterations(tuned for its best performance) and our proximal mapping.SPIRAL, when possible.

41 / 71

Page 42: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

(a) radio-isotope concentration (b) attenuation map

Figure 3: (a) True emission image and (b) density map.

42 / 71

Page 43: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

RSE=3.09 %

(a) FBP

RSE=0.66 %

(b) `1

Figure 4: Reconstructions of the emission concentration map for expectedtotal annihilation photon count (SNR) equal to 108.

43 / 71

Page 44: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

RSE=0.66 %

(a) `1

RSE=0.22 %

(b) TV

Figure 5: Comparison of the two sparsity regularizations.

44 / 71

Page 45: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

10�1

100

101

102

103

104

105

106

0 20 40 60 80 100 120 140

�.i

/

Time/s

ATPNPG .n D1/

PNPG .n D m D 0/PNPG .n D m D 4/

(a) `1, 1T E.y/ D 108

10�4

10�3

10�2

10�1

100

101

102

103

104

105

106

0 10 20 30 40 50 60

�.i

/

Time/s

SPIRALAT

PNPG .n D1/PNPG .n D m D 0/PNPG .n D m D 4/

(b) TV, 1T E.y/ D 107

Figure 6: Centered objectives as functions of CPU time.

45 / 71


Page 47: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Linear Model with Gaussian Noise

$$L(x) = \tfrac{1}{2}\|y - \Phi x\|_2^2$$

where $y \in \mathbb{R}^N$ is the measurement vector and the elements of the sensing matrix $\Phi$ are independent, identically distributed (i.i.d.), drawn from the standard normal distribution.

We select the $\ell_1$-norm sparsifying signal penalty with the linear map

$$\psi(x) = \Psi^T x.$$
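This NLL plugs into the same PNPG interface as the Poisson one; a minimal sketch (the gradient $\Phi^T(\Phi x - y)$ is standard least-squares algebra):

```python
import numpy as np

def gaussian_nll(x, Phi, y):
    """L(x) = 0.5 * ||y - Phi x||_2^2."""
    res = Phi @ x - y
    return 0.5 * (res @ res)

def gaussian_nll_grad(x, Phi, y):
    """grad L(x) = Phi^T (Phi x - y); dom L = R^p, so no domain restart is needed."""
    return Phi.T @ (Phi @ x - y)
```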


Page 48: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

−0.5

0

0.5

1

1.5

2

2.5

0 200 400 600 800 1000

Figure 7: True signal.

47 / 71

Page 49: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comments I

More methods available for comparison:

sparse reconstruction by separable approximation (SpaRSA)(Wright et al. 2009),generalized forward-backward (GFB) (Raguet et al. 2013),primal-dual splitting (PDS) (Condat 2013).

Select the regularization parameter u as

u D 10aU; U , ‰TrL.0/ 1

where a is an integer selected from the interval Œ�9;�1� and Uis an upper bound on u of interest.

Choose the nonnegativity convex set:

C D RpC:

48 / 71

Page 50: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Comments II

If we remove the convex-set constraint by setting C D Rp,PNPG iteration reduces to the Nesterov’s proximal gradientiteration with adaptive step size that imposes signal sparsityonly in the analysis form (termed NPGS).

49 / 71

Page 51: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

−0.5

0

0.5

1

1.5

2

2.5

0 200 400 600 800 1000

RSE=0.011 %

(a) PNPG

−0.5

0

0.5

1

1.5

2

2.5

0 200 400 600 800 1000

RSE=0.23 %

(b) NPGS

Figure 8: PNPG and NPGS reconstructions for N=p D 0:34.

50 / 71

Page 52: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

10−8

10−6

10−4

10−2

100

102

104

0 10 20 30 40 50 60 70 80 90

∆(i

)

time/s

(a) a D �5;N=p D 0:34

10−8

10−6

10−4

10−2

100

102

104

0 10 20 30 40 50 60 70 80 90

∆(i

)

time/s

SPIRALSpaRSA (cont.)

PNPGPNPG (cont.)

GFBPDSAT

(b) a D �6;N=p D 0:49

Figure 9: Centered objectives as functions of CPU time.

51 / 71

Page 53: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

10−8

10−6

10−4

10−2

100

102

104

0 10 20 30 40 50 60 70 80time/s

(a) a D �4;N=p D 0:34

10−8

10−6

10−4

10−2

100

102

104

0 10 20 30 40 50 60 70 80time/s

SPIRALSpaRSA (cont.)

PNPGPNPG (cont.)

GFBPDSAT

(b) a D �5;N=p D 0:49

Figure 10: Centered objectives as functions of CPU time.

52 / 71

Page 54: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

10−8

10−6

10−4

10−2

100

102

104

0 5 10 15 20 25 30time/s

(a) a D �3;N=p D 0:34

10−8

10−6

10−4

10−2

100

102

104

0 5 10 15 20time/s

SPIRALSpaRSA (cont.)

PNPGPNPG (cont.)

GFBPDSAT

(b) a D �4;N=p D 0:49

Figure 11: Centered objectives as functions of CPU time.

53 / 71

Page 55: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Publications

R. G. and A. D. (May 2016), Projected Nesterov’sproximal-gradient algorithm for sparse signal reconstructionwith a convex constraint, version 5. arXiv: 1502.02613[stat.CO].

R. G. and A. D., “Projected Nesterov’s proximal-gradientsignal recovery from compressive Poisson measurements,”Proc. Asilomar Conf. Signals, Syst. Comput., Pacific Grove,CA, Nov. 2015, pp. 1490–1495.

54 / 71

Page 56: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Preliminary Work

R. G. and A. D., “A fast proximal gradient algorithm forreconstructing nonnegative signals with sparse transformcoefficients,” Proc. Asilomar Conf. Signals, Syst. Comput.,Pacific Grove, CA, Nov. 2014, pp. 1662–1667.

R. G. and A. D. (Mar. 2015), Reconstruction of nonnegativesparse signals using accelerated proximal-gradient algorithms,version 3. arXiv: 1502.02613v3 [stat.CO].

55 / 71

Page 57: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Conclusion

Developed a fast framework for reconstructing signals that aresparse in a transform domain and belong to a closed convex setby employing a projected proximal-gradient scheme withNesterov’s acceleration, restart and adaptive step size.Applied the proposed framework to construct the firstNesterov-accelerated Poisson compressed-sensingreconstruction algorithm.Derived convergence-rate upper-bound that accounts forinexactness of the proximal operator.Proved convergence of iterates.Our PNPG approach is computationally efficient compared withthe state-of-the-art.

56 / 71

Page 58: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

References I

A. Auslender and M. Teboulle, "Interior gradient and proximal methods for convex and conic optimization," SIAM J. Optim., vol. 16, no. 3, pp. 697–725, 2006.

A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, 2009.

A. Beck and M. Teboulle, "Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems," IEEE Trans. Image Process., vol. 18, no. 11, pp. 2419–2434, 2009.

S. R. Becker, E. J. Candès, and M. C. Grant, "Templates for convex cone problems with applications to sparse signal recovery," Math. Program. Comp., vol. 3, no. 3, pp. 165–218, 2011. [Online]. Available: http://cvxr.com/tfocs.

M. Bertero, P. Boccacci, G. Desiderà, and G. Vicidomini, "Image deblurring with Poisson data: From cells to galaxies," Inverse Prob., vol. 25, no. 12, pp. 123006-1–123006-26, 2009.

S. Bonettini, I. Loris, F. Porta, and M. Prato, "Variable metric inexact line-search-based methods for nonsmooth optimization," SIAM J. Optim., vol. 26, no. 2, pp. 891–921, 2016.

Page 59: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

References II

S. Bonettini, F. Porta, and V. Ruggiero, "A variable metric forward-backward method with extrapolation," SIAM J. Sci. Comput., vol. 38, no. 4, A2558–A2584, 2016.

A. Chambolle and C. Dossal, "On the convergence of the iterates of the 'fast iterative shrinkage/thresholding algorithm'," J. Optim. Theory Appl., vol. 166, no. 3, pp. 968–982, 2015.

L. Condat, "A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," J. Optim. Theory Appl., vol. 158, no. 2, pp. 460–479, 2013.

J. A. Fessler, Image reconstruction toolbox, [Online]. Available: http://www.eecs.umich.edu/~fessler/code (visited on 08/23/2016).

Z. T. Harmany, R. F. Marcia, and R. M. Willett, "This is SPIRAL-TAP: Sparse Poisson intensity reconstruction algorithms—theory and practice," IEEE Trans. Image Process., vol. 21, no. 3, pp. 1084–1096, Mar. 2012.

Y. Nesterov, "A method of solving a convex programming problem with convergence rate $O(1/k^2)$," Sov. Math. Dokl., vol. 27, 1983, pp. 372–376.

Page 60: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

References III

B. O'Donoghue and E. Candès, "Adaptive restart for accelerated gradient schemes," Found. Comput. Math., vol. 15, no. 3, pp. 715–732, 2015.

J. M. Ollinger and J. A. Fessler, "Positron-emission tomography," IEEE Signal Process. Mag., vol. 14, no. 1, pp. 43–55, 1997.

B. T. Polyak, "Some methods of speeding up the convergence of iteration methods," USSR Comput. Math. Math. Phys., vol. 4, no. 5, pp. 1–17, 1964.

B. T. Polyak, Introduction to Optimization. New York: Optimization Software, 1987.

J. L. Prince and J. M. Links, Medical Imaging Signals and Systems, 2nd ed. Upper Saddle River, NJ: Pearson, 2015.

H. Raguet, J. Fadili, and G. Peyré, "A generalized forward-backward splitting," SIAM J. Imag. Sci., vol. 6, no. 3, pp. 1199–1226, 2013.

R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton Univ. Press, 1970.

Page 61: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

References IV

S. Salzo (May 2016), The variable metric forward-backward splitting algorithm under mild differentiability assumptions, arXiv: 1605.00952 [math.OC].

S. Villa, S. Salzo, L. Baldassarre, and A. Verri, "Accelerated and inexact forward-backward algorithms," SIAM J. Optim., vol. 23, no. 3, pp. 1607–1633, 2013.

R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: Sensing, reconstruction, and target detection," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 116–126, Jan. 2014.

S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2479–2493, 2009.

Page 62: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Adaptive Step Size

1. If there have been no step-size backtracking events or increase attempts for $n$ consecutive iterations, start with a larger step size

$$\check{\beta}^{(i)} = \frac{\beta^{(i-1)}}{\xi} \qquad \text{(increase attempt)}$$

where $\xi \in (0, 1)$ is a step-size adaptation parameter; otherwise start with $\check{\beta}^{(i)} = \beta^{(i-1)}$.

2. (Backtracking search) Select

$$\beta^{(i)} = \xi^{t_i} \check{\beta}^{(i)} \qquad (2)$$

where $t_i \geq 0$ is the smallest integer such that (2) satisfies the majorization condition; a backtracking event corresponds to $t_i > 0$.

3. If $\max\bigl(\beta^{(i)}, \beta^{(i-1)}\bigr) < \check{\beta}^{(i)}$, increase $n$ by a nonnegative integer $m$: $n \leftarrow n + m$.
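A sketch of this three-part rule wrapped around the majorization test (try_pg_step and majorizes are placeholders for the PG step and the majorization check from the main algorithm; the domain-restart branch is omitted):

```python
def select_step_size(beta_prev, xi, quiet_count, n, m, try_pg_step, majorizes):
    """Adaptive step size: occasional increase attempt + backtracking search.
    quiet_count counts consecutive iterations without backtracking/increase events."""
    beta_check = beta_prev / xi if quiet_count >= n else beta_prev   # step 1
    beta, t = beta_check, 0
    while True:                                    # step 2: backtracking search
        x = try_pg_step(beta)
        if majorizes(x, beta):
            break
        beta *= xi                                 # beta = xi^t * beta_check
        t += 1
    if max(beta, beta_prev) < beta_check:          # step 3: lengthen the period
        n += m
    return beta, x, n, t > 0                       # t > 0 marks a backtracking event
```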


Page 63: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Restart

Whenever f�x.i/

�> f

�x.i�1/

�or xx.i/ 2 C n domL, we set

� .i�1/ D 1 (restart)

and refer to this action as function restart (O‘Donoghue and Candès2015) or domain restart respectively.

back

62 / 71

Page 64: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Inner Convergence Criteria

TV:
$$\bigl\| x^{(i,j)} - x^{(i,j-1)} \bigr\|_2 \leq \eta \sqrt{\delta^{(i-1)}} \qquad \text{(3a)}$$

$\ell_1$:
$$\max\Bigl( \bigl\| s^{(i,j)} - \Psi^T x^{(i,j)} \bigr\|_2,\ \bigl\| s^{(i,j)} - s^{(i,j-1)} \bigr\|_2 \Bigr) \leq \eta\, \bigl\| \Psi^T \bigl( x^{(i-1)} - x^{(i-2)} \bigr) \bigr\|_2 \qquad \text{(3b)}$$

where $j$ is the inner-iteration index, $x^{(i,j)}$ is the iterate of $x$ in the $j$th inner iteration step within the $i$th step of the (outer) PNPG iteration, and $\eta \in (0, 1)$ is the convergence tuning constant, chosen to trade off the accuracy and speed of the inner iterations and provide sufficiently accurate solutions to the proximal mapping.

Page 65: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Definition (Inexact Proximal Operator (Villa et al. 2013))

We say that $x$ is an approximation of $\operatorname{prox}_{ur} a$ with $\varepsilon$-precision, denoted by

$$x \approx^{\varepsilon} \operatorname{prox}_{ur} a,$$

if

$$\frac{a - x}{u} \in \partial_{\varepsilon^2 / (2u)}\, r(x).$$

Note: this definition implies $\bigl\| x - \operatorname{prox}_{ur} a \bigr\|_2^2 \leq \varepsilon^2$.

Page 66: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Relationship with FISTA I

PNPG can be thought of as a generalized fast iterativeshrinkage-thresholding algorithm (FISTA) (Beck and Teboulle2009a) that accommodates

convex constraints,more general NLLs,‘ and (increasing) adaptive step size

thanks to this step-size adaptation, PNPG does not requireLipschitz continuity of the gradient of the NLL.

back

‘FISTA has been developed for the linear Gaussian model.65 / 71

Page 67: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Relationship with FISTA II

We need $B^{(i)}$ to derive theoretical guarantees for the convergence speed of the PNPG iteration. In contrast with PNPG, FISTA has a non-increasing step size $\beta^{(i)}$, which allows for setting $B^{(i)} = 1$ for all $i$:¶

$$\theta^{(i)} = \frac{1}{2}\Bigl( 1 + \sqrt{1 + 4 \bigl(\theta^{(i-1)}\bigr)^2} \Bigr).$$

A simpler version of FISTA is

$$\theta^{(i)} = \frac{1}{2} + \theta^{(i-1)} = \frac{i + 1}{2}$$

for $i \geq 1$, which corresponds to $(\gamma, b) = (2, 0)$.

¶Y. Nesterov, "A method of solving a convex programming problem with convergence rate $O(1/k^2)$," Sov. Math. Dokl., vol. 27, 1983, pp. 372–376.
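A quick check that the simpler sequence lower-bounds the standard FISTA sequence and that both grow like $i/2$:

```python
import numpy as np

theta_fista = theta_simple = 1.0                   # theta^(1) = 1
for i in range(2, 102):
    theta_fista = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta_fista**2))
    theta_simple = 0.5 + theta_simple              # equals (i + 1) / 2
    assert theta_fista >= theta_simple
print(theta_simple, theta_fista)                   # 51.0 and slightly above it
```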


Page 68: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Relationship with AT

(Auslender and Teboulle 2006):

$$\theta^{(i)} = \frac{1}{2}\Bigl( 1 + \sqrt{1 + 4 \bigl(\theta^{(i-1)}\bigr)^2} \Bigr)$$

$$\bar{x}^{(i)} = \Bigl( 1 - \frac{1}{\theta^{(i)}} \Bigr) x^{(i-1)} + \frac{1}{\theta^{(i)}} \tilde{x}^{(i-1)}$$

$$\tilde{x}^{(i)} = \operatorname{prox}_{\theta^{(i)} \beta^{(i)} u r}\Bigl( \tilde{x}^{(i-1)} - \theta^{(i)} \beta^{(i)} \nabla L\bigl(\bar{x}^{(i)}\bigr) \Bigr)$$

$$x^{(i)} = \Bigl( 1 - \frac{1}{\theta^{(i)}} \Bigr) x^{(i-1)} + \frac{1}{\theta^{(i)}} \tilde{x}^{(i)}$$

Other variants, with infinite memory, are available in (Becker et al. 2011).

Page 69: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

PNPG Algorithm Applications References References References Acronyms

Heavy-ball Method

(Polyak 1964; Polyak 1987):

x.i/ D proxˇ .i/ur

�x.i�1/ � ˇ.i/rL�x.i�1/��C‚.i/�x.i�1/ � x.i�2/�:

back

68 / 71

Page 70: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Glossary I

ADMM alternating direction method of multipliers.
BB Barzilai-Borwein.
CPU central processing unit.
CT computed tomography.
FBP filtered backprojection.
FISTA fast iterative shrinkage-thresholding algorithm.
GFB generalized forward-backward.
GLM generalized linear model.
i.i.d. independent, identically distributed.
IRT Image Reconstruction Toolbox.
MRI magnetic resonance imaging.

Page 71: Projected Nesterov's Proximal-Gradient Algorithm for Sparse Signal  Recovery

Glossary II

NLL negative log-likelihood.
NPGS Nesterov's proximal-gradient sparse.
PDS primal-dual splitting.
PET positron emission tomography.
PG proximal-gradient.
PNPG projected Nesterov's proximal-gradient.
RSE relative square error.
SpaRSA sparse reconstruction by separable approximation.
SPECT single photon emission computed tomography.
SPIRAL sparse Poisson-intensity reconstruction algorithm.
TFOCS templates for first-order conic solvers.
TV total-variation.