short-and-sparse deconvolution a geometric approach · yenson lau1,4, qing qu2,4, han-wen (henry)...

1
Short-and-Sparse Deconvolution – A Geometric Approach Yenson Lau , , Qing Qu , , Han-Wen (Henry) Kuo , Pengcheng Zhou , Yuqian Zhang , and John Wright Columbia University, New York University, Cornell University, denote equal contribution Short-and-Sparse (SaS) Model Model signals containing repeated (short) motifs: ~ ~ ~ y a 0 |{z} short kernel ~ x 0 |{z} sparse signal Problem: SaS Deconvolution (SaSD) Given convolution y = a 0 ~ x 0 R m between short a 0 R n 0 (n 0 m), and sparse x 0 R m , recover both a 0 and x 0 . Two Intrinsic Properties of SaSD -1 1 1 -2 0 -1 0 2 = ~ ~ y = αs [ a 0 ] (1)s - [ x 0 ] ~ Symmetry: All scale & shifts of (a 0 , x 0 ) are equivalent solutions; Spiky Generic Tapered Generic Lowpass μ 0 μ n -1/2 0 μ const. θ n -1/2 0 θ n -3/4 0 θ n -1 0 a 0 x 0 shift- coherence allowable sparsity - Intrinsic shift symmetry leads to nonconvex problems. Sparsity-coherence tradeoff. For a 0 S n 0 -1 , larger coherence of a 0 (in other words, smoother) μ(a 0 ) = max =0 |a 0 ,s [ a 0 ]| ∈ (0, 1], results in sparser x 0 to be recovered, and vice versa. - In practice, kernels tend to be smooth with μ(a 0 ). 1. Problem Formulation Bilinear Lasso (BL). Natural nonconvex objective min aS n-1 , x Ψ BL (a, x) := 1 2 y - a ~ x 2 | {z } data fidelity + λ ·x 1 | {z } sparsity . - Optimize with a R n and n =3n 0 - 2, to eliminate boundary effects (shift-truncations). - Break scale symmetry by spherical constraint a S n-1 ; Approximate Bilinear Lasso (ABL) [2] Ψ ABL (a, x) := 1 2 x 2 -y , a ~ x + 1 2 y 2 + λ x 1 , - min x Ψ ABL (a, x) has closed-form solution; - Ψ ABL (a, x) Ψ BL (a, x) when μ(a 0 ) 0. The approximation breaks when μ(a 0 ) is large. Nonconvex Optimization Landscape Nonconvex landscape over the sphere. Study Ψ BL (a) := min x Ψ BL (a, x), Ψ ABL (a) := min x Ψ ABL (a, x). - The objectives are convex w.r.t. x; - dim(a) dim(x), the space of a is where measure concentrates. (a) Ψ ABL (a), μ(a 0 ) 0 (b) Ψ BL (a), μ(a 0 ) 0 (c) Ψ BL (a), μ(a 0 ) 1 Benign geometry over subspace of shifts [2] S I . = n ∈I α s [ a 0 ]: α R o T S n-1 if |I| ∈ O (n 0 θ ), λ (n 0 θ ) -1/2 , and θn 0 / μ -1/2 (a 0 ). Near s 1 [ a 0 ] s 1 [a 0 ] Near S { 1 , 2 } s 1 [a 0 ] s 2 [a 0 ] S { 1 , 2 , 3 } s 1 [a 0 ] s 2 [a 0 ] s 3 [a 0 ] ϕ ABL (a) S p-1 - Every local minimizer is a shift of the kernel a 0 ; - Saddle point exhibits negative curvature in symmetry breaking directions; - Theoretical justified for Ψ ABL (a) with μ(a 0 ) 0 [2], less theoretically well-understood when Ψ BL (a) or μ(a 0 ) 1. Convolutional Dictionary Learning To recover multiple unknown kernels, we optimize min a k ,x k 1 2 y - N X k =1 a k ~ x k 2 + λ N X k =1 x k 1 , s.t. a k =1 by adapting similar ideas for recovering a single kernel. Optimization from Geometric Intuitions Initialize. Use trunk of y (sum of truncated shifts) data y kernel a 0 sparse x 0 windowed data a (-1) initialize a (0) = * α i s i [a 0 ] + α j s j [a 0 ] y consists of shifts a (0) near subspace S i,j where a (0) = -P S n-1 Ψ BL P S n-1 (a (-1) ) . Alternating descent method (ADM) 1 Fix a and take proximal gradient on x: x prox (x - τ ·∇ x Ψ BL (a, x)) 2 Fix x and take a Riemannian gradient on a: a ←P S n-1 (a - t · grad a Ψ BL (a, x)) 3 Alternate until convergence. Heuristics for dealing coherent problems (a) Without momentum (b) With momentum - Momentum acceleration: deal with ill-conditioning by adding momentum term on descent steps for both a and x. z (k +1) z (k ) - αf (z (k ) ) | {z } gradient direction + β z (k ) - z (k -1) | {z } inertial term , - Homotopy method: adaptively shrink λ through the solution path (a, x) of ADM for optimizing ϕ BL (a, x) (a) Ψ BL (a), λ =5 × 10 -1 (b) Ψ BL (a), λ =5 × 10 -2 (c) Ψ BL (a), λ =5 × 10 -3 Numerical comparison (a) Incoherent Kernel (b) Coherent Kernel (c) Incoherent Kernel (d) Coherent Kernel Applications Calcium Imaging raw vs. estimated calcium sequence estimated spike train x estimated kernel a calcium image Y kernel A k (k =1, 2) activation map X k (k =1, 2) reconstruction Y k = A k ~ X k (k =1, 2) Defects Detection in Scan Tunneling Microscopy (STM) STM image Y kernel A k (k =1, 2) activation map X k (k =1, 2) reconstruction Y k = A k ~ X k (k =1, 2) Super-resolution Microscopy Imaging orignal image deconvolved image Spike Sorting References [1]Y. Lau, Q. Qu, and H. Kuo, P. Zhou, Y. Zhang, and J. Wright, “Short-and-sparse deconvolution – A geometric approach”, arXiv preprint arXiv:1908.10959, submitted to ICLR’20, 2019. [2]H. Kuo and Y. Lau, Y. Zhang, and J. Wright, “Geometry and Symmetry in Short-and-Sparse Deconvolution”. ICML, 2019. [3] Q. Qu and Y. Zhai, X. Li, Y. Zhang, and Z. Zhu, “Geometric analysis of nonconvex optimization landscapes for overcomplete learning”. submitted to ICLR’20, 2019. https://deconvlab.github.io/ https://github.com/qingqu06/ sparse_deconvolution

Upload: others

Post on 16-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Short-and-Sparse Deconvolution A Geometric Approach · Yenson Lau1,4, Qing Qu2,4, Han-Wen (Henry) Kuo1, Pengcheng Zhou1, Yuqian Zhang3, and John Wright1 Created Date: 10/24/2019 1:24:26

Short-and-Sparse Deconvolution – A Geometric ApproachYenson Lau†,?, Qing Qu‡,?, Han-Wen (Henry) Kuo†, Pengcheng Zhou†, Yuqian Zhang�, and John Wright†

†Columbia University, ‡New York University, �Cornell University, ?denote equal contribution

Short-and-Sparse (SaS) Model

•Model signals containing repeated (short) motifs:

500 1000 1500 2000

200

400

600

800

1000

1200

1400

1600 50 100 150 200 250 300 350 400

50

100

150

200

250

300

350

400

~

~

~

y ≈ a0︸︷︷︸short kernel

~ x0︸︷︷︸sparse signal

Problem: SaS Deconvolution (SaSD)

Given convolution y = a0 ~ x0 ∈ Rm between shorta0 ∈ Rn0 (n0 � m), and sparse x0 ∈ Rm, recoverboth a0 and x0.

Two Intrinsic Properties of SaSD

−1 1

1 −2 0

−1 0 2

=~

~

y = αs`[a0] (1/α)s−`[x0]~

• Symmetry: All scale & shifts of (a0,x0) are equivalentsolutions;

Spiky Generic Tapered Generic Lowpass

µ ≈ 0 µ ≈ n−1/20 µ ≈ const.

θ ≈ n−1/20 θ ≈ n

−3/40 θ ≈ n−1

0

a0

x0

shift-coherence

allowablesparsity

- Intrinsic shift symmetry leads to nonconvex problems.

• Sparsity-coherence tradeoff. For a0 ∈ Sn0−1, largercoherence of a0 (in other words, smoother)

µ(a0) = max6̀=0|〈a0, s`[a0]〉| ∈ (0, 1],

results in sparser x0 to be recovered, and vice versa.- In practice, kernels tend to be smooth with µ(a0). ≈ 1.

Problem Formulation

•Bilinear Lasso (BL). Natural nonconvex objective

mina∈Sn−1, x

ΨBL(a,x) := 12‖y − a~ x‖2︸ ︷︷ ︸

data fidelity

+ λ · ‖x‖1︸ ︷︷ ︸sparsity

.

- Optimize with a ∈ Rn and n = 3n0 − 2, to eliminate boundary effects(shift-truncations).

- Break scale symmetry by spherical constraint a ∈ Sn−1;•Approximate Bilinear Lasso (ABL) [2]

ΨABL(a,x) := 12‖x‖2 − 〈y,a~ x〉 + 1

2‖y‖2 + λ ‖x‖1 ,

- minxΨABL(a,x) has closed-form solution;- ΨABL(a,x) ≈ ΨBL(a,x) when µ(a0)↘ 0. The approximation breakswhen µ(a0) is large.

Nonconvex Optimization Landscape

•Nonconvex landscape over the sphere. StudyΨBL(a) := min

xΨBL(a,x), ΨABL(a) := min

xΨABL(a,x).

- The objectives are convex w.r.t. x;- dim(a)� dim(x), the space of a is where measure concentrates.

(a) ΨABL(a), µ(a0) ≈ 0 (b) ΨBL(a), µ(a0) ≈ 0 (c) ΨBL(a), µ(a0) ≈ 1

•Benign geometry over subspace of shifts [2]SI .=

{ ∑`∈I α`s` [a0] : α` ∈ R

} ⋂ Sn−1

if |I| ∈ O(n0θ), λ ≈ (n0θ)−1/2, and θn0 / µ−1/2(a0).Near s`1[a0]

s`1[a0]

NearS{`1,`2}

s`1[a0]s`2[a0]

S{`1,`2,`3}

s`1[a0] s`2[a0]

s`3[a0]

ϕABL(a)

Sp−1

- Every local minimizer is a shift of the kernel a0;- Saddle point exhibits negative curvature in symmetry breaking directions;- Theoretical justified for ΨABL(a) with µ(a0) ≈ 0 [2], less theoreticallywell-understood when ΨBL(a) or µ(a0) ≈ 1.

Convolutional Dictionary Learning

•To recover multiple unknown kernels, we optimize

minak,xk

12

∥∥∥∥∥∥y −N∑k=1ak ~ xk

∥∥∥∥∥∥2

+ λN∑k=1‖xk‖1 , s.t. ‖ak‖ = 1

by adapting similar ideas for recovering a single kernel.

Optimization from Geometric Intuitions

• Initialize. Use trunk of y (sum of truncated shifts)

data y kernel a0 sparse x0

windowed data a(−1) initialize a(0)

= ∗

αisi[a0] + αjsj[a0]

y consistsof shifts

a(0) nearsubspace Si,j

where a(0) = −PSn−1∇ΨBL(PSn−1(a(−1))

).

•Alternating descent method (ADM)1 Fix a and take proximal gradient on x:

x ← prox (x− τ · ∇xΨBL(a,x))2 Fix x and take a Riemannian gradient on a:

a ← PSn−1 (a− t · gradaΨBL(a,x))3 Alternate until convergence.

•Heuristics for dealing coherent problems

(a) Without momentum (b) With momentum- Momentum acceleration: deal with ill-conditioning by adding momentumterm on descent steps for both a and x.

z(k+1) ← z(k) − α∇f (z(k))︸ ︷︷ ︸gradient direction

+ β(z(k) − z(k−1)

)︸ ︷︷ ︸

inertial term

,

- Homotopy method: adaptively shrink λ through the solution path (a,x)of ADM for optimizing ϕBL(a,x)

(a) ΨBL(a), λ = 5× 10−1 (b) ΨBL(a), λ = 5× 10−2 (c) ΨBL(a), λ = 5× 10−3

•Numerical comparison

(a) Incoherent Kernel (b) Coherent Kernel

(c) Incoherent Kernel (d) Coherent Kernel

Applications

•Calcium Imagingraw vs. estimated calcium sequence

estimated spike train x

estimated kernel a

calcium image Y

kernel Ak (k = 1, 2)

activation map Xk (k = 1, 2)

reconstructionY k = Ak ~Xk (k = 1, 2)

•Defects Detection in Scan Tunneling Microscopy (STM)STM image Y

kernel Ak (k = 1, 2)

activation map Xk (k = 1, 2)

reconstructionY k = Ak ~Xk (k = 1, 2)

• Super-resolution Microscopy Imagingorignal image deconvolved image

• Spike Sorting

References

[1] Y. Lau, Q. Qu, and H. Kuo, P. Zhou, Y. Zhang, and J. Wright, “Short-and-sparse deconvolution – Ageometric approach”, arXiv preprint arXiv:1908.10959, submitted to ICLR’20, 2019.

[2] H. Kuo and Y. Lau, Y. Zhang, and J. Wright, “Geometry and Symmetry in Short-and-Sparse Deconvolution”.ICML, 2019.

[3] Q. Qu and Y. Zhai, X. Li, Y. Zhang, and Z. Zhu, “Geometric analysis of nonconvex optimization landscapesfor overcomplete learning”. submitted to ICLR’20, 2019.

https://deconvlab.github.io/

https://github.com/qingqu06/

sparse_deconvolution