short-and-sparse deconvolution a geometric approach · yenson lau1,4, qing qu2,4, han-wen (henry)...

Short-and-Sparse Deconvolution – A Geometric ApproachYenson Lau†,?, Qing Qu‡,?, Han-Wen (Henry) Kuo†, Pengcheng Zhou†, Yuqian Zhang�, and John Wright†

†Columbia University, ‡New York University, �Cornell University, ?denote equal contribution

Short-and-Sparse (SaS) Model

•Model signals containing repeated (short) motifs:

500 1000 1500 2000

200

400

600

800

1000

1200

1400

1600 50 100 150 200 250 300 350 400

50

100

150

200

250

300

350

400

≈

≈

≈

~

~

~

y ≈ a0︸︷︷︸short kernel

~ x0︸︷︷︸sparse signal

Problem: SaS Deconvolution (SaSD)

Given convolution y = a0 ~ x0 ∈ Rm between shorta0 ∈ Rn0 (n0 � m), and sparse x0 ∈ Rm, recoverboth a0 and x0.

Two Intrinsic Properties of SaSD

−1 1

1 −2 0

−1 0 2

=~

~

y = αs`[a0] (1/α)s−`[x0]~

• Symmetry: All scale & shifts of (a0,x0) are equivalentsolutions;

Spiky Generic Tapered Generic Lowpass

µ ≈ 0 µ ≈ n−1/20 µ ≈ const.

θ ≈ n−1/20 θ ≈ n

−3/40 θ ≈ n−1

0

a0

x0

shift-coherence

allowablesparsity

- Intrinsic shift symmetry leads to nonconvex problems.

• Sparsity-coherence tradeoff. For a0 ∈ Sn0−1, largercoherence of a0 (in other words, smoother)

µ(a0) = max6̀=0|〈a0, s`[a0]〉| ∈ (0, 1],

results in sparser x0 to be recovered, and vice versa.- In practice, kernels tend to be smooth with µ(a0). ≈ 1.

Problem Formulation

•Bilinear Lasso (BL). Natural nonconvex objective

mina∈Sn−1, x

ΨBL(a,x) := 12‖y − a~ x‖2︸︷︷︸

data fidelity

+ λ · ‖x‖1︸︷︷︸sparsity

.

- Optimize with a ∈ Rn and n = 3n0 − 2, to eliminate boundary effects(shift-truncations).

- Break scale symmetry by spherical constraint a ∈ Sn−1;•Approximate Bilinear Lasso (ABL) [2]

ΨABL(a,x) := 12‖x‖2 − 〈y,a~ x〉 + 1

2‖y‖2 + λ ‖x‖1 ,

- minxΨABL(a,x) has closed-form solution;- ΨABL(a,x) ≈ ΨBL(a,x) when µ(a0)↘ 0. The approximation breakswhen µ(a0) is large.

Nonconvex Optimization Landscape

•Nonconvex landscape over the sphere. StudyΨBL(a) := min

xΨBL(a,x), ΨABL(a) := min

xΨABL(a,x).

- The objectives are convex w.r.t. x;- dim(a)� dim(x), the space of a is where measure concentrates.

(a) ΨABL(a), µ(a0) ≈ 0 (b) ΨBL(a), µ(a0) ≈ 0 (c) ΨBL(a), µ(a0) ≈ 1

•Benign geometry over subspace of shifts [2]SI .=

{ ∑`∈I α`s` [a0] : α` ∈ R

} ⋂ Sn−1

if |I| ∈ O(n0θ), λ ≈ (n0θ)−1/2, and θn0 / µ−1/2(a0).Near s`1[a0]

s`1[a0]

NearS{`1,`2}

s`1[a0]s`2[a0]

S{`1,`2,`3}

s`1[a0] s`2[a0]

s`3[a0]

ϕABL(a)

Sp−1

- Every local minimizer is a shift of the kernel a0;- Saddle point exhibits negative curvature in symmetry breaking directions;- Theoretical justified for ΨABL(a) with µ(a0) ≈ 0 [2], less theoreticallywell-understood when ΨBL(a) or µ(a0) ≈ 1.

Convolutional Dictionary Learning

•To recover multiple unknown kernels, we optimize

minak,xk

12

∥∥∥∥∥∥y −N∑k=1ak ~ xk

∥∥∥∥∥∥2

+ λN∑k=1‖xk‖1 , s.t. ‖ak‖ = 1

by adapting similar ideas for recovering a single kernel.

Optimization from Geometric Intuitions

• Initialize. Use trunk of y (sum of truncated shifts)

data y kernel a0 sparse x0

windowed data a(−1) initialize a(0)

= ∗

≈

αisi[a0] + αjsj[a0]

y consistsof shifts

a(0) nearsubspace Si,j

where a(0) = −PSn−1∇ΨBL(PSn−1(a(−1))

).

•Alternating descent method (ADM)1 Fix a and take proximal gradient on x:

x ← prox (x− τ · ∇xΨBL(a,x))2 Fix x and take a Riemannian gradient on a:

a ← PSn−1 (a− t · gradaΨBL(a,x))3 Alternate until convergence.

•Heuristics for dealing coherent problems

(a) Without momentum (b) With momentum- Momentum acceleration: deal with ill-conditioning by adding momentumterm on descent steps for both a and x.

z(k+1) ← z(k) − α∇f (z(k))︸︷︷︸gradient direction

+ β(z(k) − z(k−1)

)︸︷︷︸

inertial term

,

- Homotopy method: adaptively shrink λ through the solution path (a,x)of ADM for optimizing ϕBL(a,x)

(a) ΨBL(a), λ = 5× 10−1 (b) ΨBL(a), λ = 5× 10−2 (c) ΨBL(a), λ = 5× 10−3

•Numerical comparison

(a) Incoherent Kernel (b) Coherent Kernel

(c) Incoherent Kernel (d) Coherent Kernel

Applications

•Calcium Imagingraw vs. estimated calcium sequence

estimated spike train x

estimated kernel a

calcium image Y

kernel Ak (k = 1, 2)

activation map Xk (k = 1, 2)

reconstructionY k = Ak ~Xk (k = 1, 2)

•Defects Detection in Scan Tunneling Microscopy (STM)STM image Y

kernel Ak (k = 1, 2)

activation map Xk (k = 1, 2)

reconstructionY k = Ak ~Xk (k = 1, 2)

• Super-resolution Microscopy Imagingorignal image deconvolved image

• Spike Sorting

References

[1] Y. Lau, Q. Qu, and H. Kuo, P. Zhou, Y. Zhang, and J. Wright, “Short-and-sparse deconvolution – Ageometric approach”, arXiv preprint arXiv:1908.10959, submitted to ICLR’20, 2019.

[2] H. Kuo and Y. Lau, Y. Zhang, and J. Wright, “Geometry and Symmetry in Short-and-Sparse Deconvolution”.ICML, 2019.

[3] Q. Qu and Y. Zhai, X. Li, Y. Zhang, and Z. Zhu, “Geometric analysis of nonconvex optimization landscapesfor overcomplete learning”. submitted to ICLR’20, 2019.

https://deconvlab.github.io/

https://github.com/qingqu06/

sparse_deconvolution

https://deconvlab.github.io/

https://github.com/qingqu06/sparse_deconvolution

https://github.com/qingqu06/sparse_deconvolution

short-and-sparse deconvolution a geometric approach · yenson lau1,4, qing qu2,4, han-wen (henry)...

Documents