
Decomposition and Denoising for Moment Sequences Using Convex Optimization

Badri Narayan Bhaskar

High Dimensional Inference

Statistics in the high-dimensional regime n ≪ p: microarray data, hyperspectral imaging, predicting ratings.

Sparse Vector Estimation: Gene Expressions

Disease susceptibility (the response) = Xβ, where X is the n × p matrix of gene expressions (n samples, p genes) and the coefficient vector β is k-sparse: only a few active genes.

Low Rank Matrices: The Netflix Problem

The movie-ratings matrix (millions of subscribers × thousands of movies) decomposes as a sum of a few rank-one terms, each the outer product of a user profile and a movie profile ("Action" plus a handful of other features).

Low Rank Matrices: Topic Models

The term-document matrix (millions of words × thousands of documents) decomposes as a sum of a few rank-one terms, each the outer product of a topic profile and degrees of topicality ("Politics" plus a handful of other features).

Simple Objects

Objects             | Notion of Simplicity | Atoms
Vectors             | Sparsity             | Canonical unit vectors
Matrices            | Rank                 | Rank-1 matrices
Bandlimited signals | No. of frequencies   | Complex sinusoids
Linear systems      | Number of poles      | Single-pole systems

Every simple object decomposes over an atomic set:

x = Σ_{a∈A} c_a a

where A is the atomic set, the atoms a are the "features", and the weights c_a ≥ 0 are nonnegative.

Universal Regularizer: Chandrasekaran, Recht, Parrilo, Willsky (2010)

Objects             | Notion of Simplicity | Atomic Norm
Vectors             | Sparsity             | ℓ1 norm
Matrices            | Rank                 | Nuclear norm
Bandlimited signals | No. of frequencies   | (defined below)
Linear systems      | McMillan degree      | (defined below)

The atomic norm is the gauge of the convex hull of the atoms:

‖x‖_A = inf{ t > 0 : x ∈ t · conv(A) }
      = inf{ Σ_a c_a : x = Σ_{a∈A} c_a a, c_a > 0 }
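Instantiating this definition recovers the familiar special cases in the table above: with A = {±e₁, …, ±e_p} (signed canonical unit vectors), conv(A) is the cross-polytope and ‖x‖_A = ‖x‖₁; with A = { uv* : ‖u‖₂ = ‖v‖₂ = 1 } (unit-norm rank-one matrices), conv(A) is the nuclear-norm ball and ‖X‖_A = Σ_i σ_i(X), the sum of singular values.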

Moment Problems

x = a(s₁) + a(s₂) + ⋯ : the atoms a(s) are indexed by a measure space, and x is the vector of moments of a discrete measure, its "atomic moments."

System Identification

[Figure: example poles in the unit disk and the corresponding frequency response.]

Line Spectral Estimation

[Figure: an example line spectrum.]

Decomposition

How many atoms do you need to represent x?

If the composing atoms lie on an exposed face, the decomposition is easy to find: a supporting hyperplane certifies it. We concentrate on recovering "good" objects whose atoms lie on exposed faces.

Dual Certificate

[Figure: the scaled convex hull t · conv(A) with gradient and subgradient directions at x.]

Consider the semi-infinite problem

maximize_q ⟨q, x⟩ subject to ‖q‖*_A ≤ 1.

Its optimum value is the atomic norm, and its solutions are subgradients at x.

Dual characterization of the atomic norm (under mild conditions; Bonsall):

‖q‖*_A = sup_{a∈A} ⟨q, a⟩

An optimal dual vector certifies the support: ⟨q, a⟩ = 1 for atoms a in the support, and ⟨q, a⟩ < 1 otherwise.

Denoising

Observe y = x* + w in Cⁿ, with noise w ~ N(0, σ²Iₙ). AST (Atomic Soft Thresholding):

minimize_x ½‖y - x‖₂² + τ‖x‖_A

with tradeoff parameter τ. Dual AST:

maximize_q ⟨q, y⟩ - ½‖q‖₂² subject to ‖q‖*_A ≤ τ

AST Results

Optimality conditions:

q̂ + x̂ = y,  ‖q̂‖*_A ≤ τ,  ⟨q̂, x̂⟩ = τ‖x̂‖_A.

The first suggests q̂ ≈ w, which motivates τ ≈ E‖w‖*_A; the last two mean q̂ ∈ ∂(τ‖x̂‖_A), so the composing atoms can be found from the dual solution.
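A concrete special case may help: when the atoms are the phase-scaled canonical unit vectors, ‖x‖_A is the ℓ1 norm and AST reduces to entrywise soft thresholding. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def ast_l1(y, tau):
    """AST when conv(A) is the l1 ball: entrywise (complex) soft thresholding."""
    mag = np.maximum(np.abs(y), 1e-15)           # avoid division by zero
    return np.maximum(1.0 - tau / mag, 0.0) * y  # shrink magnitudes by tau, keep phases

# usage: denoise a noisy sparse vector
y = np.array([3.0, 0.1, -2.5, 0.05])
print(ast_l1(y, tau=0.5))                        # -> [ 2.5  0.  -2.   0. ]
```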

Tradeoff = Gaussian Width

No cross-validation needed: just estimate the Gaussian width E‖w‖*_A. The Gaussian width is also the key quantity for characterizing convergence rates.

AST Convergence Rates

For comparison, the Lasso with an s-sparse signal:

Fast rate:  (1/n)‖Xβ̂ - Xβ*‖₂² ≤ C σ² s log(p)/n

Slow rate:  (1/n)‖Xβ̂ - Xβ*‖₂² ≤ C σ ‖β*‖₁ √(log(p)/n)

For AST, choose the appropriate tradeoff τ = E‖w‖*_A. Then a universal but slow rate holds under minimal assumptions on the atoms:

E (1/n)‖x̂ - x*‖₂² ≤ (τ/n)‖x*‖_A

With a "cone condition", faster rates are possible. The condition is weaker than RIP, coherence, RSC, or RE:

(1/n)‖x̂ - x*‖₂² ≤ C τ²/(γ² n)

for a large enough cone constant γ (see the Cone Condition slide at the end).

Summary

• Notion of simple models
• Atomic norm unifies treatment of many models
• Recovers tight published bounds
• Bounds depend on Gaussian width
• Faster rates with a local cone condition

Line Spectral Estimation

Time samples of a line spectrum:

x_k = Σ_{j=1}^s c_j e^{i2πf_j k},  k = 0, …, n-1

These are moments of the discrete measure μ = Σ_j c_j δ_{f_j}; the goal is to extrapolate the remaining moments.

Super-resolution is the time-frequency dual: extrapolate high-frequency information from low-frequency samples. Just exchange time and frequency; it is the same problem as line spectral estimation.

Classical Line Spectrum Estimation (a crash course)

Define, for any vector x, the Hermitian Toeplitz matrix

T_n(x) = [ x₁ x₂ ⋯ x_n ; x̄₂ x₁ ⋯ x_{n-1} ; ⋮ ; x̄_n x̄_{n-1} ⋯ x₁ ]

Nonlinear parameter estimation: x_k = Σ_{j=1}^s c_j e^{2πif_j k}.

FACT (low rank and Toeplitz):

T_n(x) = Σ_{j=1}^s c_j v(f_j) v(f_j)*,  where v(f) = [1, e^{i2πf}, …, e^{i2π(n-1)f}]ᵀ.

Classical techniques:

PRONY: estimate s, root finding
CADZOW: alternating projections
MUSIC: low rank, plot pseudospectrum

Drawbacks: sensitive to model order, no rigorous theory, optimal only asymptotically.
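A quick numerical check of the FACT, as a sketch (the frequencies and amplitudes below are illustrative):

```python
import numpy as np
from scipy.linalg import toeplitz

n, s = 16, 3
f = np.array([0.12, 0.37, 0.71])              # illustrative frequencies
c = np.array([1.0, 0.5, 2.0])                 # positive amplitudes
k = np.arange(n)
x = (c * np.exp(2j * np.pi * np.outer(k, f))).sum(axis=1)  # x_k = sum_j c_j e^{i 2 pi f_j k}
T = toeplitz(x)   # first column x; scipy uses conj(x) for the first row, so T is Hermitian
print(np.linalg.matrix_rank(T))               # -> 3: rank equals the number of frequencies
```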


Convex Perspective

[Figure: a noisy time series of n samples.]

x = Σ_{j=1}^s c_j a(f_j), with amplitudes c_j and frequencies f_j, where the atoms are

a(f) = [1, e^{i2πf}, …, e^{i2πf(n-1)}]ᵀ

Atoms for the line spectrum:

A₊ = { a(f) : f ∈ [0,1] }  (positive amplitudes)
A  = { a(f)e^{iφ} : f ∈ [0,1], φ ∈ [0,2π] }  (complex amplitudes)

Convergence Rates

For a signal with n samples of the form

y_k = Σ_{j=1}^s c_j e^{2πif_j k} + w_k,

choose the tradeoff as τ ≈ σ√(n log n). Using the global AST guarantee we get

E (1/n)‖x̂ - x*‖₂² ≤ C σ √(log(n)/n) · Σ_{j=1}^s |c_j|.

The choice of τ comes from sandwich bounds on the Gaussian width, obtained via Dudley's inequality and Bernstein's polynomial theorem: with high probability, for n > 4,

σ √( n log n + (n/2) log(4π log n) ) ≤ E‖w‖*_A ≤ σ (1 + 1/log n) √( n log n + n log(16π^{3/2} log n) ),

so E‖w‖*_A ≍ σ√(n log n), with upper and lower bounds matching up to a (1 + 1/log n) factor.

Fast Rates

If every two frequencies are far enough apart (the minimum separation condition)

min_{p≠q} d(f_p, f_q) ≥ 4/n,

then for s frequencies and n samples the MSE satisfies

E (1/n)‖x̂ - x*‖₂² ≤ C σ² s log(n)/n.

This is not a global condition on the dictionary: it is local to the signal. The proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernandez-Granda.

This holds when the tradeoff parameter is chosen correctly: τ = η σ√(n log n) with η large enough.
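For a sense of scale (an illustrative calculation, not a number from the talk): with n = 128 samples and s = 8 frequencies, the bound reads E (1/n)‖x̂ - x*‖₂² ≤ C σ² · 8 · log(128)/128 ≈ 0.30 C σ²; with n = 256 the factor s log(n)/n drops to about 0.17, so the guarantee improves roughly linearly in n up to the log factor.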

Can't do much better

Our rate nearly matches the minimax bound on the best prediction rate for signals with 4/n minimum separation:

Minimax rate:   E (1/n)‖x̂ - x*‖₂² ≳ C σ² s log(n/(4s))/n

Our rate:       E (1/n)‖x̂ - x*‖₂² ≤ C σ² s log(n)/n

"Oracle rate":  E (1/n)‖x̂ - x*‖₂² ≈ C σ² s/n

so AST is only a logarithmic factor from the oracle rate.

Minimum Separation and Exact Recovery (result of Candès and Fernandez-Granda)

4/n separation ⟹ one can construct an explicit dual certificate of exact recovery: the convex hulls of far-enough vertices are exposed faces. (Prony does not have this limitation. Then why bother? For the robustness guarantee.)

Minimum separation is necessary, except in the positive-amplitude case: there the convex hull of every collection of at most n/2 vertices is an exposed face. The moment curve is an infinite-dimensional version of the cyclic polytope, which is neighbourly, so no separation condition is needed.

Localizing Frequencies

[Figure: dual polynomial value vs. frequency, with a zoom on the band 0.34 to 0.46; the modulus touches 1 exactly at the recovered frequencies.]

Dual Polynomial: writing y = x̂ + q̂, form

Q(f) = Σ_{j=0}^{n-1} q̂_j e^{i2πjf},

which satisfies ⟨q̂, a⟩ = 1 for atoms a in the support and ⟨q̂, a⟩ < 1 otherwise.

Localization Guarantees

[Figure: dual polynomial magnitude vs. frequency marking the true, estimated, and spurious frequencies; each true frequency f_j has a near region N_j of radius 0.16/n, and F denotes the far region.]

With T the set of k true frequencies:

Spurious amplitudes:
Σ_{l : f̂_l ∈ F} |ĉ_l| ≤ C₁ σ² k log(n)/n

Weighted frequency deviation:
Σ_{l : f̂_l ∈ N_j} |ĉ_l| · min_{f_j ∈ T} d(f_j, f̂_l)² ≤ C₂ σ² k log(n)/n

Near region approximation:
| c_j - Σ_{l : f̂_l ∈ N_j} ĉ_l | ≤ C₃ σ² k log(n)/n

How to Actually Solve AST: Semidefinite Characterizations

minimize_x ½‖y - x‖₂² + τ‖x‖_A

requires a description of conv(A₊) or conv(A). For positive amplitudes, A₊ = { a(f) : f ∈ [0,1] }, Carathéodory-Toeplitz gives

‖x‖_{A₊} = x₀ if T_n(x) ⪰ 0, and +∞ otherwise.

For complex amplitudes, A = { a(f)e^{iφ} : f ∈ [0,1], φ ∈ [0,2π] }, the Bounded Real Lemma gives

‖x‖_A = inf { (1/(2n)) tr(T_n(u)) + t/2 : [ T_n(u) x ; x* t ] ⪰ 0 }.
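For concreteness, a minimal sketch of this SDP in CVXPY (assuming CVXPY with an SDP solver such as SCS is available; the helper name is illustrative, and the Toeplitz structure is imposed entrywise, so this only scales to modest n):

```python
import numpy as np
import cvxpy as cp

def ast_sdp(y, tau):
    """Solve min_x 0.5||y-x||^2 + tau*||x||_A via the SDP characterization."""
    n = y.size
    Z = cp.Variable((n + 1, n + 1), hermitian=True)   # Z = [[T_n(u), x], [x*, t]]
    toep = [Z[i, j] == Z[i - 1, j - 1]                # principal block constant on diagonals
            for i in range(1, n) for j in range(1, n)]
    x = Z[:n, n]
    obj = (0.5 * cp.sum_squares(y - x)
           + (tau / 2) * (cp.real(cp.trace(Z[:n, :n])) / n + cp.real(Z[n, n])))
    cp.Problem(cp.Minimize(obj), toep + [Z >> 0]).solve(solver=cp.SCS)
    return x.value

# toy usage: three sinusoids in noise
n, sigma = 32, 0.5
f, c = np.array([0.1, 0.4, 0.7]), np.array([1.0, 2.0, 1.5])
x0 = (c * np.exp(2j * np.pi * np.outer(np.arange(n), f))).sum(axis=1)
y = x0 + sigma * (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
xhat = ast_sdp(y, tau=sigma * np.sqrt(n * np.log(n)))
```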

Alternating Direction Method of Multipliers

AST SDP formulation:

minimize_{t,u,x,Z}  ½‖x - y‖₂² + (τ/2)(t + u₁)
subject to  Z = [ T(u) x ; x* t ],  Z ⪰ 0.

Alternating direction iterations:

(t^{l+1}, u^{l+1}, x^{l+1}) ← argmin_{t,u,x} L_ρ(t, u, x, Z^l, Λ^l)    (quadratic objective plus linear terms: closed form)

Z^{l+1} ← argmin_{Z ⪰ 0} L_ρ(t^{l+1}, u^{l+1}, x^{l+1}, Z, Λ^l)    (projection onto the PSD cone)

Λ^{l+1} ← Λ^l + ρ ( Z^{l+1} - [ T(u^{l+1}) x^{l+1} ; (x^{l+1})* t^{l+1} ] )    (linear dual update)

Form the augmented Lagrangian:

L_ρ(t, u, x, Z, Λ) = ½‖x - y‖₂² + (τ/2)(t + u₁)
  + ⟨ Λ, Z - [ T(u) x ; x* t ] ⟩                 (Lagrangian term)
  + (ρ/2) ‖ Z - [ T(u) x ; x* t ] ‖_F²            (augmented term; ρ is the momentum parameter)
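The Z-update above is a Euclidean projection onto the PSD cone, computed by clipping negative eigenvalues; a minimal sketch (the helper name is illustrative):

```python
import numpy as np

def project_psd(M):
    """Frobenius-norm projection of a (nearly) Hermitian matrix onto the PSD cone."""
    H = 0.5 * (M + M.conj().T)               # symmetrize against round-off
    w, V = np.linalg.eigh(H)                 # eigendecomposition, real eigenvalues
    return (V * np.maximum(w, 0)) @ V.conj().T

# in the iteration above: Z = project_psd(block([[T(u), x], [x*, t]]) - Lambda / rho)
```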

Alternative Discretization Method

Can use a grid A_N = { a(f) : f = j/N, j = 1, …, N }, phases in [0, 2π). We show

(1 - 2πn/N) ‖x‖_{A_N} ≤ ‖x‖_A ≤ ‖x‖_{A_N}

(via Bernstein's polynomial theorem or Fritz John; this works for general ε-nets), so AST is approximated by a Lasso:

y = A_N c + w, with the frequencies on an N-point grid and amplitudes c (the atomic moments).

Benefits: fast shrinkage algorithms; Fourier projections are cheap; coherence is irrelevant.
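A sketch of the gridded approach, assuming a dense DFT-style grid (grid size, iteration count, and helper names are illustrative): solve the complex Lasso minimize_c ½‖y - Ac‖₂² + τ‖c‖₁ by iterative soft thresholding.

```python
import numpy as np

def soft(z, t):
    """Complex soft thresholding: shrink magnitudes by t, keep phases."""
    mag = np.maximum(np.abs(z), 1e-15)
    return np.maximum(1.0 - t / mag, 0.0) * z

def gridded_ast(y, N=4096, tau=1.0, iters=300):
    n = y.size
    A = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(N) / N))  # gridded atoms
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    c = np.zeros(N, dtype=complex)
    for _ in range(iters):                   # ISTA: gradient step + shrinkage
        c = soft(c - A.conj().T @ (A @ c - y) / L, tau / L)
    return c                                 # the support of c localizes the frequencies
```

The products A @ c and A.conj().T @ r are zero-padded FFTs, which is why the Fourier projections are cheap in practice.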

Experimental Setup

• n = 64, 128, or 256 samples
• number of frequencies: n/16, n/8, or n/4
• SNR: -5 dB, 0 dB, …, 20 dB
• (Root) MUSIC, matrix pencil, Cadzow: all fed the true model order
• AST: uses an empirically estimated (not fed) noise variance
• Compared mean-squared error and localization error metrics, averaged over 20 trials

Mean Squared Error: MSE vs. SNR

[Figure: MSE (dB) vs. SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies.]

Mean Squared Error: Performance Profiles

[Figure: performance profiles P_s(β) for AST, Lasso (2^15 grid), MUSIC, Cadzow, and matrix pencil.]

P(β) = fraction of experiments with MSE less than β times the minimum MSE. Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = -5, 0, …, 20 dB.

Mean Squared Error: Effect of Gridding

[Figure: performance profiles for the Lasso with grid sizes 2^10 through 2^15.]

Frequency Localization

[Figures: performance profiles for AST, MUSIC, and Cadzow on the three localization metrics, and the metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near region approximation) vs. SNR at k = 32 frequencies.]

Simple LTI Systems

Vibration input u(t) → response y(t). State space:

x[t+1] = A x[t] + B u[t]
y[t]   = C x[t] + D u[t]

Goal: find a simple linear system from time-series data. Existing approaches: nonlinear Kalman-filter based estimation; parametric methods; Hankel low-rank formulations; subspace methods.

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank
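A toy check of the low-rank claim, under the simplifying assumption of an autonomous system (u = 0, so Y = OX exactly); the random stable system below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                         # true state dimension
A = 0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0]   # stable: spectral radius 0.9
C = rng.standard_normal((1, d))
x = rng.standard_normal(d)
ys = []
for _ in range(40):                           # autonomous simulation: y[t] = C x[t]
    ys.append((C @ x).item())
    x = A @ x
Y = np.array([[ys[i + j] for j in range(20)] for i in range(20)])  # Hankel matrix
print(np.linalg.matrix_rank(Y))               # -> 3 = McMillan degree
```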

SISO

Past inputs → [Hankel operator G] → future outputs: G : ℓ₂(-∞, 0) → ℓ₂[0, ∞).

Fact: the rank of the Hankel operator equals the McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.

Convex Perspective: SISO Systems

Example:

G(z) = 2(1 - 0.7²)/(z - 0.7) + 5(1 - 0.5²)/(z - (0.3 + 0.4i)) + 5(1 - 0.5²)/(z - (0.3 - 0.4i))

[Figure: the poles of G in the unit disk D and the magnitude of its frequency response.]

Atoms: single-pole systems,

A = { (1 - |w|²)/(z - w) : w ∈ D },  D the unit disk.

Measurements: y = L(G(z)) + w, linear measurements of the transfer function plus noise.

Atomic Norm Regularization: regularize with the atomic norm,

minimize_G ‖y - L(G)‖₂² + μ‖G‖_A

Theorem: for all G, the atomic norm is equivalent to the Hankel nuclear norm ‖G‖₁ up to a universal constant: c‖G‖_A ≤ ‖G‖₁ ≤ ‖G‖_A.

How to solve this? The problem above is equivalent to

minimize_{c,x} ½‖y - x‖₂² + μ Σ_{w∈D} |c_w|/(1 - |w|²)
subject to x = Σ_{w∈D} c_w L(1/(z - w)).

Discretize: pick a net of poles {w_ℓ} ⊂ D and set φ_kℓ = L_k( (1 - |w_ℓ|²)/(z - w_ℓ) ). Then solve

minimize_c ½‖y - Φc‖₂² + μ‖c‖₁,   a Lasso.
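A sketch of the discretization step (the polar net below is an illustrative choice; any sufficiently fine net on the disk works):

```python
import numpy as np

n = 30
# illustrative polar net of candidate poles in the unit disk
radii = np.linspace(0.1, 0.95, 12)
poles = np.concatenate([r * np.exp(2j * np.pi * np.arange(max(4, int(40 * r)))
                                   / max(4, int(40 * r)))
                        for r in radii])
z = np.exp(2j * np.pi * np.arange(n) / n)     # uniformly spaced frequency samples
Phi = (1 - np.abs(poles) ** 2) / (z[:, None] - poles[None, :])   # Phi[k, l]: atom l at sample k
# then solve min_c 0.5*||y - Phi c||^2 + mu*||c||_1, e.g. with the ISTA loop sketched earlier
```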

DAST Analysis

Measure n uniformly spaced samples of the frequency response: set

φ_kℓ = (1 - |w_ℓ|²)/(e^{2πik/n} - w_ℓ)

and solve, over a net on the disk,

x̂ = argmin_x ½‖y - x‖₂² + μ‖x‖_A.

Writing Ĝ(z) = Σ_ℓ ĉ_ℓ (1 - |w_ℓ|²)/(z - w_ℓ), where the coefficients ĉ correspond to the support of x̂, we then have

‖Ĝ - G‖²_{H₂} ≤ c₀ σ ‖G‖₁ / ((1 - ρ)√n)

with c₀ a constant, ‖G‖₁ the Hankel nuclear norm, ρ the stability radius, and n the number of measurements.

Experiment: 6 poles, ρ = 0.95, σ = 1e-1, n = 30; H₂ error ≈ 2e-2.

Summary

• Optimal bounds for denoising using convex methods
• Better empirical performance than classical approaches
• Local conditions on signals instead of global dictionary conditions
• Discretization works

Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.

Future Work

• Better convergence results for discretization
• Why does the weighted-mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees

Referenced Publications

Bhaskar and Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Communication, Control, and Computing, 2011.

Bhaskar, Tang, and Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.

Tang, Bhaskar, and Recht, "Near Minimax Line Spectral Estimation," to be submitted.

Shah, Bhaskar, Tang, and Recht, "Linear System Identification using Atomic Norm Minimization," IEEE Conference on Decision and Control, 2012.

Other Publications

Tang, Bhaskar, Shah, and Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.

Dasarathy, Shah, Bhaskar, and Nowak, "Covariance Sketching," 50th Allerton Conference on Communication, Control, and Computing, 2012.

Gubner, Bhaskar, and Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultra-Wideband Systems, 2012.

Wittenberg, Alimadhi, Bhaskar, and Lau, "ei RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.

Tang, Bhaskar, and Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.

Cone Condition

γ = inf{ ‖z‖₂ / ‖z‖_A : z ≠ 0, z ∈ C_θ(x) }

C_θ(x) = cone{ z : ‖x + z‖_A ≤ ‖x‖_A + θ‖z‖_A },  θ ∈ (0, 1)

C_θ(x) is an inflated descent cone (C₀ is the descent cone itself); the condition says no direction in C_θ has zero atomic norm.

Minimum separation is necessary

Using two sinusoids:

‖a(f₁) - a(f₂)‖_A ≤ n^{1/2} ‖a(f₁) - a(f₂)‖₂ ≈ 2πn² |f₁ - f₂|,

since ‖a(f₁) - a(f₂)‖₂ ≈ 2πn^{3/2} |f₁ - f₂|. So when the separation is below roughly 1/(4n²), the atomic norm can't recover the pair. One can empirically show failure with less than 1/n separation and an adversarial signal.

Page 2: Decomposition and Denoising for moment sequences using convex optimization

High Dimensional Inference

n p

Statistics

Microarray Data

HyperspectralImaging

Predict Ratings

Sparse Vector Estimation Gene Expressions

X

n

p

k sparse

Samples

Genes

Few Active Genes

Disease Susceptibility =

Response

Low Rank Matrices The Netflix Problem

Movie Ratings

thousands ofmovies

millions ofsubscribers

movie profile

user profile

Action

+ a handful of other features

Decomposition

Low Rank Matrices Topic Models

Term DocumentMatrix

thousands ofdocuments

millions ofwords

degree of topicality

topic profile

Politics

+ a handful of other features

Decomposition

Simple objects

Objects Notion of Simplicity Atoms

Vectors Sparsity Canonical unit vectors

Matrices Rank Rank-1 matrices

Bandlimited signals No of frequencies Complex sinusoids

Linear systems Number of poles Single pole systems

x =X

a2Acaax =

X

a2Acaa

atomic set

atomsldquofeaturesrdquo

nonnegative weights

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 3: Decomposition and Denoising for moment sequences using convex optimization

Sparse Vector Estimation Gene Expressions

X

n

p

k sparse

Samples

Genes

Few Active Genes

Disease Susceptibility =

Response

Low Rank Matrices The Netflix Problem

Movie Ratings

thousands ofmovies

millions ofsubscribers

movie profile

user profile

Action

+ a handful of other features

Decomposition

Low Rank Matrices Topic Models

Term DocumentMatrix

thousands ofdocuments

millions ofwords

degree of topicality

topic profile

Politics

+ a handful of other features

Decomposition

Simple objects

Objects Notion of Simplicity Atoms

Vectors Sparsity Canonical unit vectors

Matrices Rank Rank-1 matrices

Bandlimited signals No of frequencies Complex sinusoids

Linear systems Number of poles Single pole systems

x =X

a2Acaax =

X

a2Acaa

atomic set

atomsldquofeaturesrdquo

nonnegative weights

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 4: Decomposition and Denoising for moment sequences using convex optimization

Low Rank Matrices The Netflix Problem

Movie Ratings

thousands ofmovies

millions ofsubscribers

movie profile

user profile

Action

+ a handful of other features

Decomposition

Low Rank Matrices Topic Models

Term DocumentMatrix

thousands ofdocuments

millions ofwords

degree of topicality

topic profile

Politics

+ a handful of other features

Decomposition

Simple objects

Objects Notion of Simplicity Atoms

Vectors Sparsity Canonical unit vectors

Matrices Rank Rank-1 matrices

Bandlimited signals No of frequencies Complex sinusoids

Linear systems Number of poles Single pole systems

x =X

a2Acaax =

X

a2Acaa

atomic set

atomsldquofeaturesrdquo

nonnegative weights

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 5: Decomposition and Denoising for moment sequences using convex optimization

Low Rank Matrices Topic Models

Term DocumentMatrix

thousands ofdocuments

millions ofwords

degree of topicality

topic profile

Politics

+ a handful of other features

Decomposition

Simple objects

Objects Notion of Simplicity Atoms

Vectors Sparsity Canonical unit vectors

Matrices Rank Rank-1 matrices

Bandlimited signals No of frequencies Complex sinusoids

Linear systems Number of poles Single pole systems

x =X

a2Acaax =

X

a2Acaa

atomic set

atomsldquofeaturesrdquo

nonnegative weights

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Define, for any vector x,

$$T_n(x) = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ x_2^* & x_1 & \cdots & x_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_n^* & x_{n-1}^* & \cdots & x_1 \end{bmatrix}$$

Nonlinear parameter estimation: $x_k = \sum_{j=1}^{s} c_j e^{2\pi i f_j k}$

FACT (low rank and Toeplitz):

$$T_n(x) = \sum_{j=1}^{s} c_j \begin{bmatrix} 1 \\ e^{i2\pi f_j} \\ \vdots \\ e^{i2\pi(n-1)f_j} \end{bmatrix} \begin{bmatrix} 1 & e^{-i2\pi f_j} & \cdots & e^{-i2\pi(n-1)f_j} \end{bmatrix} = \sum_{j=1}^{s} c_j\, a(f_j)\, a(f_j)^*$$
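This fact is easy to check numerically. Below is a minimal NumPy sketch (the frequencies, amplitudes, and n are arbitrary illustrative values) that builds the Hermitian Toeplitz matrix from the moment vector and confirms it is positive semidefinite with rank s = 3:

```python
import numpy as np

# Verify the FACT: for x_k = sum_j c_j e^{2 pi i f_j k} with c_j > 0,
# T_n(x) is positive semidefinite with rank s (here s = 3).
n = 16
fs = np.array([0.11, 0.37, 0.62])       # illustrative frequencies
cs = np.array([1.0, 2.5, 0.7])          # illustrative positive amplitudes
k = np.arange(n)
x = (cs * np.exp(2j * np.pi * np.outer(k, fs))).sum(axis=1)

# Hermitian Toeplitz: first column x, first row conj(x)
Tn = np.array([[x[i - j] if i >= j else np.conj(x[j - i]) for j in range(n)]
               for i in range(n)])
eigs = np.linalg.eigvalsh(Tn)
print(eigs.min() > -1e-9, int((eigs > 1e-8).sum()))   # True, 3
```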

Classical techniques:

PRONY: estimate s, root finding
CADZOW: alternating projections
MUSIC: exploit low rank, plot the pseudospectrum

Drawbacks: sensitive to model order, no rigorous theory, only optimal asymptotically
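For concreteness, here is a minimal sketch of the MUSIC recipe from the list above; the subarray length, the grid size, and the assumption that the model order s is known are illustrative choices, not prescriptions from the slides:

```python
import numpy as np

def music_pseudospectrum(x, s, grid=2048):
    """Sketch of MUSIC: split signal and noise subspaces of a sample
    covariance and plot the pseudospectrum 1/||E_n^H a(f)||^2."""
    n = len(x)
    m = n // 2                                               # subarray length
    X = np.array([x[i:i + m] for i in range(n - m + 1)]).T   # m x snapshots
    R = X @ X.conj().T / X.shape[1]
    evals, V = np.linalg.eigh(R)          # eigenvalues in ascending order
    En = V[:, :m - s]                     # noise subspace
    f = np.arange(grid) / grid
    A = np.exp(2j * np.pi * np.outer(np.arange(m), f))
    return f, 1.0 / np.linalg.norm(En.conj().T @ A, axis=0) ** 2
```

Peaks of the returned pseudospectrum localize the s frequencies, which is why a wrong model order degrades the estimate.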


Convex Perspective

(Figure: samples of a line-spectral signal.)

$$x = \sum_{j=1}^{s} c_j\, a(f_j)$$

with amplitudes $c_j$ and frequencies $f_j$; entrywise,

$$\begin{bmatrix} x_0 \\ \vdots \\ x_k \\ \vdots \\ x_{n-1} \end{bmatrix} = \sum_{j=1}^{s} c_j \begin{bmatrix} 1 \\ \vdots \\ e^{i2\pi f_j k} \\ \vdots \\ e^{i2\pi f_j (n-1)} \end{bmatrix}$$

Atoms: Line Spectrum

$\mathcal{A}_+ = \{\, a(f) \mid f \in [0,1] \,\}$: positive amplitudes

$\mathcal{A} = \{\, a(f)e^{i\phi} \mid f \in [0,1],\ \phi \in [0,2\pi] \,\}$: complex amplitudes

Convergence rates

For a signal with n samples of the form

$$y_k = \sum_{j=1}^{s} c_j e^{2\pi i f_j k} + w_k,$$

choose the tradeoff as $\tau \approx \sigma\sqrt{n \log(n)}$. Using the global AST guarantee, we get

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x\|_2^2 \leq C\,\sigma\sqrt{\frac{\log(n)}{n}}\, \sum_{j=1}^{s} |c_j|$$

Via Dudley's inequality and Bernstein's polynomial theorem, with high probability for n > 4,

$$\sigma\sqrt{n \log(n) + \tfrac{n}{2}\log(4\pi\log(n))} \;\leq\; \mathbb{E}\,\|w\|_{\mathcal{A}}^* \;\leq\; \sigma\left(1 + \tfrac{1}{\log(n)}\right)\sqrt{n \log(n) + n \log(16\pi^{3/2}\log(n))}$$
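As a tiny worked example, the tradeoff rule above is one line of code; here eta stands in for the unspecified "large enough" multiplier:

```python
import numpy as np

def ast_tradeoff(sigma, n, eta=1.0):
    """Tradeoff choice tau ~ eta * sigma * sqrt(n log n) from the slides;
    eta is the 'large enough' constant left unspecified here."""
    return eta * sigma * np.sqrt(n * np.log(n))
```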

Fast Rates

If every two frequencies are far enough apart (minimum separation condition),

$$\min_{p \neq q} d(f_p, f_q) \geq \frac{4}{n},$$

then for s frequencies and n samples the MSE is

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x\|_2^2 \leq C\,\sigma^2\, \frac{s \log(n)}{n},$$

provided $\tau = \eta\,\sigma\sqrt{n\log(n)}$ with $\eta$ large enough, i.e. the tradeoff parameter is chosen correctly.

This is not a global condition on the dictionary; it is local to the signal.

The proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernandez-Granda.

Can't do much better

This also nearly matches the minimax bound on the best prediction rate for signals with 4/n minimum separation:

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x\|_2^2 \geq C\,\sigma^2\, \frac{s \log(n/4s)}{n}$$

Our rate,

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x\|_2^2 \leq C\,\sigma^2\, \frac{s \log(n)}{n},$$

is only a logarithmic factor from the "oracle rate" $C\,\sigma^2 s / n$.

Minimum Separation and Exact Recovery: Result of Candès and Fernandez-Granda

4/n separation ⟹ can construct an explicit dual certificate of exact recovery

Convex hulls of far-enough vertices are exposed faces

Prony does not have this limitation (then why bother? robustness guarantees)

Minimum separation is necessary, except for the positive case

For positive amplitudes, the convex hull of every collection of n/2 vertices is an exposed face: an infinite-dimensional version of the cyclic polytope, which is neighbourly. No separation condition is needed.

Localizing Frequencies

(Figure: dual polynomial magnitude vs. frequency, with a zoomed panel near f ≈ 0.4; the magnitude reaches 1 exactly at the recovered frequencies.)

Dual Polynomial

$$y = \hat{x} + \tau q, \qquad Q(f) = \sum_{j=0}^{n-1} q_j e^{i2\pi j f}$$

$$\langle q, a \rangle = 1 \text{ on the support}, \qquad |\langle q, a \rangle| < 1 \text{ otherwise}$$
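A sketch of how the dual polynomial is used to localize frequencies in practice: recover q from the AST solution via $y = \hat{x} + \tau q$ and scan $|Q(f)|$ on a fine grid. The grid size and the sign convention in the exponent are assumptions here:

```python
import numpy as np

def dual_polynomial(y, x_hat, tau, grid=4096):
    """Evaluate |Q(f)| = |sum_j q_j e^{i 2 pi j f}| with q = (y - x_hat)/tau;
    |Q(f)| touches 1 at the recovered frequencies."""
    q = (y - x_hat) / tau
    f = np.arange(grid) / grid
    Q = np.exp(2j * np.pi * np.outer(f, np.arange(len(q)))) @ q
    return f, np.abs(Q)
```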

Localization Guarantees

(Figure: dual polynomial magnitude with true, estimated, and spurious frequencies marked; the near region around each true frequency has radius 0.16/n.)

Spurious amplitudes (far region F):

$$\sum_{l : \hat{f}_l \in F} |\hat{c}_l| \leq C_1\, \frac{k \sigma^2 \log(n)}{n}$$

Weighted frequency deviation (near region $N_j$):

$$\sum_{l : \hat{f}_l \in N_j} |\hat{c}_l|\, \min_{f_j \in T} d(f_j, \hat{f}_l)^2 \leq C_2\, \frac{k \sigma^2 \log(n)}{n}$$

Near region approximation:

$$\Big| c_j - \sum_{l : \hat{f}_l \in N_j} \hat{c}_l \Big| \leq C_3\, \frac{k \sigma^2 \log(n)}{n}$$
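A sketch of how these three quantities might be computed from a true and an estimated line spectrum; the 0.16/n near-region radius comes from the figure above, while the wrap-around metric and the function signature are assumptions here:

```python
import numpy as np

def wrap_dist(a, b):
    """Wrap-around distance between frequencies on the unit circle."""
    d = np.abs(a - b) % 1.0
    return np.minimum(d, 1.0 - d)

def localization_metrics(f_true, c_true, f_est, c_est, n):
    """Sketch of the three localization errors bounded above."""
    radius = 0.16 / n
    D = wrap_dist(f_est[None, :], f_true[:, None])     # shape (s, m)
    near_any = (D <= radius).any(axis=0)
    m1 = np.abs(c_est[~near_any]).sum()                # spurious amplitudes
    m2 = (np.abs(c_est[near_any]) * D.min(axis=0)[near_any] ** 2).sum()
    m3 = max(abs(c_true[j] - c_est[D[j] <= radius].sum())
             for j in range(len(f_true)))              # near region approx.
    return m1, m2, m3
```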

How to actually solve AST? Semidefinite Characterizations

$$\underset{x}{\text{minimize}}\ \tfrac{1}{2}\|y - x\|_2^2 + \tau\|x\|_{\mathcal{A}}$$

For $\mathcal{A}_+ = \{a(f) \mid f \in [0,1]\}$ (positive amplitudes), $\mathrm{conv}(\mathcal{A}_+)$ is characterized by Carathéodory-Toeplitz:

$$\|x\|_{\mathcal{A}_+} = \begin{cases} x_0 & T_n(x) \succeq 0 \\ +\infty & \text{otherwise} \end{cases}$$

For $\mathcal{A} = \{a(f)e^{i\phi} \mid f \in [0,1],\ \phi \in [0,2\pi]\}$ (complex amplitudes), $\mathrm{conv}(\mathcal{A})$ is characterized by the Bounded Real Lemma:

$$\|x\|_{\mathcal{A}} = \inf\left\{ \frac{1}{2n}\mathrm{tr}(T_n(u)) + \frac{1}{2}t \;:\; \begin{bmatrix} T_n(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}$$
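This characterization can be written almost verbatim in a convex modeling tool. A minimal CVXPY sketch, assuming CVXPY's complex-variable and Hermitian semidefinite support (the entry-by-entry Toeplitz construction is only sensible for small n):

```python
import numpy as np
import cvxpy as cp

def atomic_norm_sdp(x):
    """Sketch of the SDP characterization of ||x||_A above."""
    n = len(x)
    u = cp.Variable(n, complex=True)          # first column of T_n(u)
    t = cp.Variable(nonneg=True)
    # affine Hermitian Toeplitz expression, built entry by entry
    T = cp.vstack([
        cp.hstack([u[i - j] if i >= j else cp.conj(u[j - i])
                   for j in range(n)])
        for i in range(n)
    ])
    Zexpr = cp.bmat([[T, x.reshape(n, 1)],
                     [x.conj().reshape(1, n), cp.reshape(t, (1, 1))]])
    Z = cp.Variable((n + 1, n + 1), hermitian=True)
    prob = cp.Problem(
        cp.Minimize(cp.real(cp.trace(T)) / (2 * n) + t / 2),
        [Z == Zexpr, Z >> 0, cp.imag(u[0]) == 0],
    )
    prob.solve(solver=cp.SCS)
    return prob.value
```

As a sanity check, for a single atom x = a(f) the returned value should come out close to 1.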

Alternating Direction Method of Multipliers

AST SDP Formulation:

$$\begin{aligned} \underset{t,u,x,Z}{\text{minimize}}\quad & \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\tau}{2}(t + u_1) \\ \text{subject to}\quad & Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \quad Z \succeq 0 \end{aligned}$$

Form the augmented Lagrangian (Lagrangian term plus quadratic augmentation, with momentum parameter $\rho$):

$$\mathcal{L}_\rho(t,u,x,Z,\Lambda) = \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\tau}{2}(t + u_1) + \left\langle \Lambda,\; Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\rangle + \tfrac{\rho}{2}\left\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\|_F^2$$

Alternating direction iterations:

$$(t^{l+1}, u^{l+1}, x^{l+1}) \leftarrow \arg\min_{t,u,x}\; \mathcal{L}_\rho(t, u, x, Z^l, \Lambda^l) \quad \text{(quadratic objective: linear solve)}$$

$$Z^{l+1} \leftarrow \arg\min_{Z \succeq 0}\; \mathcal{L}_\rho(t^{l+1}, u^{l+1}, x^{l+1}, Z, \Lambda^l) \quad \text{(projection onto the psd cone)}$$

$$\Lambda^{l+1} \leftarrow \Lambda^l + \rho\left( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ x^{l+1\,*} & t^{l+1} \end{bmatrix} \right) \quad \text{(linear dual update)}$$
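The iterations above admit closed forms: the (t, u, x) step is a separable quadratic (entrywise and per-diagonal averages), the Z step is an eigenvalue truncation, and the dual step is linear. A minimal NumPy sketch, with rho, the iteration count, and the initialization as illustrative choices rather than values from the slides:

```python
import numpy as np

def herm_toeplitz(u):
    """Hermitian Toeplitz matrix whose first column is u (u[0] real)."""
    n = len(u)
    return np.array([[u[i - j] if i >= j else np.conj(u[j - i])
                      for j in range(n)] for i in range(n)])

def ast_admm(y, tau, rho=2.0, iters=500):
    """Sketch of the ADMM iterations for AST."""
    n = len(y)
    x = y.astype(complex).copy()
    u = np.zeros(n, dtype=complex); u[0] = 1.0
    t = 1.0
    Z = np.zeros((n + 1, n + 1), dtype=complex)
    Lam = np.zeros((n + 1, n + 1), dtype=complex)
    for _ in range(iters):
        W = Z + Lam / rho                      # fold the dual into the target
        W11 = (W[:n, :n] + W[:n, :n].conj().T) / 2
        w12 = (W[:n, n] + W[n, :n].conj()) / 2
        # (t, u, x) update: the quadratic decouples
        x = (y + 2 * rho * w12) / (1 + 2 * rho)
        t = W[n, n].real - tau / (2 * rho)
        u = np.array([np.diagonal(W11, offset=k).mean() for k in range(n)])
        u[0] = u[0].real - tau / (2 * rho * n)
        Q = np.zeros((n + 1, n + 1), dtype=complex)
        Q[:n, :n] = herm_toeplitz(u)
        Q[:n, n] = x; Q[n, :n] = x.conj(); Q[n, n] = t
        # Z update: projection onto the positive semidefinite cone
        S = Q - Lam / rho
        S = (S + S.conj().T) / 2
        evals, V = np.linalg.eigh(S)
        Z = (V * np.maximum(evals, 0)) @ V.conj().T
        # linear dual update
        Lam = Lam + rho * (Z - Q)
    return x
```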

Alternative Discretization Method

Can use a grid $\mathcal{A}_N = \{\, a(f)e^{i\phi} : f = j/N,\ j = 1, \dots, N,\ \phi \in [0, 2\pi) \,\}$. We show

$$\left(1 - \frac{2\pi n}{N}\right)\|x\|_{\mathcal{A}_N} \leq \|x\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}_N}$$

via Bernstein's polynomial theorem (or Fritz John); this works for general epsilon nets. So AST is approximated by Lasso:

$$y = \Phi c + w$$

with $\Phi$ the matrix of atoms at the N grid frequencies and $c$ the amplitudes.

Fast shrinkage algorithms apply, Fourier projections are cheap, and coherence is irrelevant.
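A minimal sketch of the discretized approach: build the grid atoms, run ISTA (a basic fast shrinkage algorithm), and read frequencies off the sparse coefficient vector. The grid size, tradeoff, and iteration count are illustrative; in practice the products with A and its adjoint are applied with FFTs, which is why the Fourier projections are cheap:

```python
import numpy as np

def soft_threshold(c, thresh):
    """Proximal map of the complex l1 norm."""
    mag = np.abs(c)
    return c * np.maximum(mag - thresh, 0) / np.maximum(mag, 1e-12)

def dast_grid(y, N, tau, iters=300):
    """Sketch of discretized AST: Lasso over N grid atoms, solved by ISTA."""
    n = len(y)
    A = np.exp(2j * np.pi * np.outer(np.arange(n), np.arange(N)) / N)
    c = np.zeros(N, dtype=complex)
    L = float(N)      # ||A||^2 = N, since A A^H = N I_n when N >= n
    for _ in range(iters):
        c = soft_threshold(c - A.conj().T @ (A @ c - y) / L, tau / L)
    return c          # sparse; nonzero entries sit near the true frequencies
```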

Experimental setup

n = 64, 128, or 256 samples

Number of frequencies: n/16, n/8, n/4

SNR: -5 dB, 0 dB, 20 dB

(Root) MUSIC, matrix pencil, and Cadzow are all fed the true model order

AST uses an empirically estimated (not fed) noise variance

Compared mean-squared-error and localization error metrics, averaged over 20 trials

Mean Squared Error: MSE vs SNR

(Figure: MSE (dB) vs SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies.)

Mean Squared Error: Performance Profiles

(Figure: performance profiles $P_s(\beta)$ for AST, Lasso with $N = 2^{15}$, MUSIC, Cadzow, and Matrix Pencil.)

$P_s(\beta)$ = fraction of experiments with MSE less than $\beta \times$ the minimum MSE

Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = -5, 0, 20 dB

Mean Squared Error: Effect of Gridding

(Figure: performance profiles for Lasso at grid sizes $N = 2^{10}$ through $2^{15}$.)

Frequency Localization

(Figure: performance profiles $P_s(\beta)$ of the localization metrics for AST, MUSIC, and Cadzow.)

(Figure: the metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near region approximation) vs. SNR (dB) at k = 32 frequencies, for AST, MUSIC, and Cadzow.)

Simple LTI Systems

Vibration u(t) → Response y(t)

State space: $x[t+1] = Ax[t] + Bu[t], \qquad y[t] = Cx[t] + Du[t]$

Goal: find a simple linear system from time series data

Existing approaches: nonlinear Kalman-filter-based estimation; parametric methods based on Hankel low-rank formulations

Subspace Methods

$$\underbrace{\begin{bmatrix} y_1 & y_2 & \cdots & y_k \\ y_2 & y_3 & \cdots & y_{k+1} \\ y_3 & y_4 & \cdots & y_{k+2} \\ \vdots & \vdots & & \vdots \\ y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1} \end{bmatrix}}_{Y\ (\text{output})} = \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{\mathcal{O}\ (\text{observability})} \underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X\ (\text{state})} + \underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & 0 & \cdots \\ CAB & CB & D & \cdots \\ \vdots & & \ddots & \\ CA^{\ell-2}B & \cdots & CB & D \end{bmatrix}}_{\mathcal{T}\ (\text{system matrix})} \underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_k \\ u_2 & u_3 & \cdots & u_{k+1} \\ u_3 & u_4 & \cdots & u_{k+2} \\ \vdots & \vdots & & \vdots \\ u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1} \end{bmatrix}}_{U\ (\text{input})}$$

With a little computation: $Y = \mathcal{O}X + \mathcal{T}U$, and projecting orthogonal to the input rows, $Y^{\perp U} = \mathcal{O}\,X^{\perp U}$, so the system matrix drops out.

Low rank structure: SVD → Subspace ID (n4sid); relax rank via the Nuclear Norm (Vandenberghe et al.)
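A minimal sketch of this subspace step (scalar input and output for brevity; the number of block rows and the rank tolerance are illustrative choices, and the input must be sufficiently exciting for the rank to be meaningful):

```python
import numpy as np

def hankel_rows(v, rows):
    """Hankel matrix with the given number of rows built from the series v."""
    cols = len(v) - rows + 1
    return np.array([v[i:i + cols] for i in range(rows)])

def estimate_order(u, y, rows=10, tol=1e-6):
    """Project the output Hankel matrix onto the orthogonal complement of
    the input's row space, so Y P = O X P, and read off the numerical rank."""
    Y, U = hankel_rows(y, rows), hankel_rows(u, rows)
    P = np.eye(U.shape[1]) - U.T @ np.linalg.pinv(U @ U.T) @ U
    s = np.linalg.svd(Y @ P, compute_uv=False)
    return int((s > tol * s[0]).sum())    # numerical rank ~ McMillan degree
```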

SISO

(Diagram: input sequence u[t] feeding the system G; past inputs at t < 0, future outputs at t ≥ 0.)

The Hankel operator $\Gamma_G : \ell_2(-\infty, 0) \to \ell_2[0, \infty)$ maps past inputs to future outputs.

Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.

Convex Perspective: SISO Systems

Example:

$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$

(Figure: pole locations in the unit disk and the sampled frequency response.)

Atoms: single-pole systems

$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}, \qquad \mathbb{D} = \text{the unit disk}$$

Measurements: $y = L(G(z)) + w$, a linear map of the transfer function plus noise

Atomic Norm Regularization: regularize with atomic norms,

$$\underset{G}{\text{minimize}}\ \|y - L(G)\|_2^2 + \mu\|G\|_{\mathcal{A}}$$

Theorem: $\tfrac{1}{8}\|G\|_{\mathcal{A}} \leq \|G\|_1 \leq \|G\|_{\mathcal{A}}$, i.e. the atomic norm is equivalent to the Hankel nuclear norm $\|G\|_1$ up to constants.

How to solve this? Discretize. The problem is equivalent to

$$\begin{aligned} \underset{c,\,x}{\text{minimize}}\quad & \|y - x\|_2^2 + \mu \sum_{w \in D} \frac{|c_w|}{1 - |w|^2} \\ \text{subject to}\quad & x = \sum_{w \in D} c_w\, L\!\left(\frac{1}{z - w}\right) \end{aligned}$$

Set $\Phi_{k\ell} = L_k\!\left(\dfrac{1 - |w_\ell|^2}{z - w_\ell}\right)$ over a net D of poles; then

$$\underset{c}{\text{minimize}}\ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu\|c\|_1 \qquad \text{(Lasso)}$$
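A sketch of the resulting procedure: build Φ over a crude net on the disk and run ISTA on the Lasso problem. The net construction, μ, and the iteration count are illustrative assumptions:

```python
import numpy as np

def disk_net(n_radii=8, n_angles=32, rho_max=0.95):
    """A crude net on the disk of stability radius rho_max (illustrative)."""
    r = np.linspace(0.1, rho_max, n_radii)
    th = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    return (r[:, None] * np.exp(1j * th[None, :])).ravel()

def dast_system_id(y, mu, iters=500):
    """Sketch of discretized atomic-norm system ID: Lasso over single-pole
    atoms evaluated at n uniformly spaced frequency samples."""
    n = len(y)
    w = disk_net()
    z = np.exp(2j * np.pi * np.arange(n) / n)
    Phi = (1 - np.abs(w) ** 2)[None, :] / (z[:, None] - w[None, :])
    c = np.zeros(len(w), dtype=complex)
    L = np.linalg.norm(Phi, 2) ** 2     # Lipschitz constant of the gradient
    for _ in range(iters):
        c = c - Phi.conj().T @ (Phi @ c - y) / L
        mag = np.abs(c)
        c = c * np.maximum(mag - mu / L, 0) / np.maximum(mag, 1e-12)
    return w, c    # poles on the net and their (mostly zero) coefficients
```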

DAST Analysis

Measure n uniformly spaced samples of the frequency response of

$$G(z) = \sum_\ell c_\ell\, \frac{1 - |w_\ell|^2}{z - w_\ell},$$

set

$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell},$$

and solve

$$\hat{x} = \arg\min_x\ \tfrac{1}{2}\|y - x\|_2^2 + \mu\|x\|_{\mathcal{A}}$$

over a net on the disk, where the coefficients of $\hat{G}$ correspond to the support of $\hat{x}$. Then we have

$$\|\hat{G} - G\|_{H_2}^2 \leq \frac{C\,\sigma\,\|G\|_1}{(1 - \rho)\sqrt{n}}$$

with constant C, Hankel nuclear norm $\|G\|_1$, stability radius $\rho$, and n the number of measurements.

Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30: $H_2$ error = 2e-2.

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements: Results derived in collaboration with Tang, Shah, and Prof. Recht

Future Work

• Better convergence results for discretization
• Why does the weighted mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees

Referenced Publications

B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.

B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.

Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.

Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.

Other Publications

Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.

Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.

Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.

Wittenberg, Alimadhi, B., Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.

Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 6: Decomposition and Denoising for moment sequences using convex optimization

Simple objects

Objects Notion of Simplicity Atoms

Vectors Sparsity Canonical unit vectors

Matrices Rank Rank-1 matrices

Bandlimited signals No of frequencies Complex sinusoids

Linear systems Number of poles Single pole systems

x =X

a2Acaax =

X

a2Acaa

atomic set

atomsldquofeaturesrdquo

nonnegative weights

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 7: Decomposition and Denoising for moment sequences using convex optimization

Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)

Objects Notion of Simplicity Atomic Norm

Vectors Sparsity

Matrices Rank nuclear norm

Bandlimited signals No of frequencies

Linear systems McMillan degree

1 QRUP

xA = LQI t 0 x t FRQY(A)

= LQI

a

ca x =

aAcaa ca gt 0

t

xxA = LQI t 0 x t FRQY(A)

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 8: Decomposition and Denoising for moment sequences using convex optimization

Moment Problems

A

atoms

+ +

x

measure space

moments of a discrete measure

atomic moments

System Identification

minus1

0

1

minus1

0

10

2

4

6

minus05 0 052

3

4

5

6

7

Line Spectral Estimation

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 9: Decomposition and Denoising for moment sequences using convex optimization

Decomposition

How many atoms do you need to represent x

composing atoms are on an exposed face decomposition is easy to find

find a supporting hyperplane to certify

concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces

t

x

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis
Measure uniformly spaced samples of the frequency response. Then
$$\|\hat{G} - G\|_{H_2}^2 \le \frac{C_0\, \|G\|_1}{(1 - \rho)\sqrt{n}}$$
with $C_0$ a constant, $\|G\|_1$ the Hankel nuclear norm, $\rho$ the stability radius, and $n$ the number of measurements.

Set $G(z) = \sum_\ell c_\ell \dfrac{1 - |w_\ell|^2}{z - w_\ell}$ over a net $\{w_\ell\}$ on the disk. Then
$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell}$$
Solve $\hat{x} = \arg\min_x \tfrac{1}{2}\|y - x\|_2^2 + \mu\|x\|_{\mathcal{A}}$, where the coefficients correspond to the support of $\hat{x}$.
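A minimal sketch of this discretized identification step: build a pole net, form $\Phi$, and run proximal gradient on the Lasso. The net construction, solver loop, and names are assumptions of this sketch.

```python
import numpy as np

def disk_net(radii, num_angles):
    # crude net on the unit disk: rings of equally spaced poles
    return np.concatenate(
        [r * np.exp(2j * np.pi * np.arange(num_angles) / num_angles)
         for r in radii])

def dast_lasso(y, w, mu, iters=2000):
    """min_c 0.5*||y - Phi c||^2 + mu*||c||_1 with
    Phi[k, l] = (1 - |w_l|^2) / (exp(2i*pi*k/n) - w_l)."""
    n = len(y)
    zk = np.exp(2j * np.pi * np.arange(n) / n)[:, None]
    Phi = (1 - np.abs(w) ** 2) / (zk - w[None, :])
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2    # 1/||Phi||^2 (spectral)
    c = np.zeros(len(w), dtype=complex)
    for _ in range(iters):
        c = c - step * Phi.conj().T @ (Phi @ c - y)
        mag = np.maximum(np.abs(c), 1e-12)
        c = c * np.maximum(1 - step * mu / mag, 0)  # soft-threshold
    return c   # nonzero entries give the identified poles and residues
```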

Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30 ⇒ H₂ error ≈ 2e-2.

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.

Future Work

• Better convergence results for discretization
• Why does the weighted-mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees

Referenced Publications

B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.

B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.

Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.

Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.

Other Publications

Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.

Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.

Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.

Wittenberg, Alimadhi, B., Lau, "ei RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.

Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.

Cone Condition
$$\bar{\gamma} = \inf_{z \ne 0,\ z \in C_\gamma(x)} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}},
\qquad
C_\gamma(x) = \mathrm{cone}\{\, z \mid \|x + z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \gamma \|z\|_{\mathcal{A}} \,\}$$

Inflated Descent Cone
$C_\gamma$ with $\gamma \in (0, 1)$ inflates the descent cone $C_0$; no direction in $C_\gamma$ has zero atomic norm.

Minimum separation is necessary
Using two sinusoids:
$$\|a(f_1) - a(f_2)\|_2 \le 2\pi n^{3/2} |f_1 - f_2|,
\qquad
\|a(f_1) - a(f_2)\|_{\mathcal{A}} \le n^{1/2}\, \|a(f_1) - a(f_2)\|_2 \le 2\pi n^2 |f_1 - f_2|$$
So when the separation is below $1/(4n^2)$, the atomic norm cannot distinguish the pair and recovery fails.
Can empirically show failure with less than $1/n$ separation and an adversarial signal.
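A quick numerical check of the $\ell_2$ scaling above; a sketch, using the atom convention $a(f)_k = e^{2\pi i f k}$ from the line-spectrum slides (for small separations the ratio to the stated bound approaches $1/\sqrt{3}$, consistent with the bound holding).

```python
import numpy as np

def atom(f, n):
    # line-spectrum atom: a(f)_k = exp(2i*pi*f*k), k = 0..n-1
    return np.exp(2j * np.pi * f * np.arange(n))

n, delta = 128, 1e-4
d = np.linalg.norm(atom(0.3, n) - atom(0.3 + delta, n))
bound = 2 * np.pi * n ** 1.5 * delta
print(d, bound, d / bound)   # ratio ~ 0.577 = 1/sqrt(3) for small delta
```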

Page 10: Decomposition and Denoising for moment sequences using convex optimization

Dual Certificate

1 2 3 4 5 6 7

0

1

2

3

4

5

6

X Axis

Y A

xis

Gradient

Subgradients

optimum value is atomic norm

solutions are subgradients at x

semi-infinite problem

maximize

qhq xi

subject to kqkA 1

Dual Characterization of atomic norm

kqkA = supa2A

hq ai

(under mild conditions Bonsall)

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 11: Decomposition and Denoising for moment sequences using convex optimization

Denoising

minimizex

1

2ky xk22 + kxkA

Tradeoff parameter

AST (Atomic Soft Thresholding)

Dual AST

maximize

qhq yi kqk22

subject to kqkA 1

y = x + wLQ Cn

QRLVH N (0 2In)

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 12: Decomposition and Denoising for moment sequences using convex optimization

AST Results

q + x = y

kqkA 1

hq xi = kxkA

Optimality Conditions

suggests q w ) EkwkA

means

q 2 kxkACan find composing atoms

Tradeoff = Gaussian Width

No cross validation needed - estimate Gaussian width

Gaussian width also important to characterize convergence rates

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 13: Decomposition and Denoising for moment sequences using convex optimization

AST Convergence Rates

Fast Rate

s sparse

1

nX X2

2 C2 s log(p)

n

Slow RateLasso

n

1

nX X2

2

log(p)

n1

Choose appropriate tradeoff

= E (wA) 1

Universal but slow rateMinimal assumptions on atoms

1

nx x2

2

nxA

With a ldquocone conditionrdquo faster rates are possible

Weaker than RIP Coherence RSC RE 1

nx x2

2 C2

12n

ĪIRUODUJHHQRXJK ī

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 14: Decomposition and Denoising for moment sequences using convex optimization

Summary

Notion of simple models

Atomic norm unifies treatment of many models

Recovers tight published bounds

Bounds depend on Gaussian width

Faster rates with a local cone condition

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," in 49th Allerton Conference on Controls and Systems, 2011.

B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.

Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.

Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.

Other Publications

Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.

Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.

Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.

Wittenberg, Alimadhi, B., Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.

Tang, B., Recht, "Sparse recovery over continuous dictionaries: Just discretize," submitted to Asilomar Conference, 2013.

Cone Condition

$$\gamma = \inf_{z\neq 0,\ z\in C_\varepsilon(x)} \frac{\|z\|_2}{\|z\|_\mathcal{A}}$$

Inflated descent cone, for ε ∈ (0, 1):

$$C_\varepsilon(x) = \operatorname{cone}\{\, z \;|\; \|x+z\|_\mathcal{A} \le \|x\|_\mathcal{A} + \varepsilon\|z\|_\mathcal{A} \,\}$$

No direction in C_ε has zero atomic norm (ε = 0 gives the descent cone C₀).
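For intuition, a minimal sketch of the standard argument by which such a cone condition yields the faster rate, assuming the error x̂ − x lies in C_ε(x) and the tradeoff satisfies τ ≥ ‖w‖*_A; this reconstructs the usual reasoning, not the slides' own proof:

\begin{align*}
\tfrac12\|y-\hat{x}\|_2^2 + \tau\|\hat{x}\|_\mathcal{A}
  &\le \tfrac12\|y-x\|_2^2 + \tau\|x\|_\mathcal{A}
  && \text{(optimality of $\hat{x}$)} \\
\tfrac12\|\hat{x}-x\|_2^2
  &\le \langle w,\hat{x}-x\rangle + \tau\big(\|x\|_\mathcal{A}-\|\hat{x}\|_\mathcal{A}\big)
  \le 2\tau\|\hat{x}-x\|_\mathcal{A}
  && (y = x + w,\ \langle w,z\rangle \le \tau\|z\|_\mathcal{A}) \\
\tfrac12\|\hat{x}-x\|_2^2
  &\le \frac{2\tau}{\gamma}\,\|\hat{x}-x\|_2
  && \text{(cone condition)} \\
\frac1n\|\hat{x}-x\|_2^2
  &\le \frac{16\,\tau^2}{\gamma^2 n}.
\end{align*}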

Minimum separation is necessary

Using two sinusoids: for nearby frequencies,

$$\|a(f_1)-a(f_2)\|_2 \approx 2 n^{3/2} |f_1 - f_2|,\qquad\text{so}\qquad \|a(f_1)-a(f_2)\|_\mathcal{A} \le n^{1/2}\,\|a(f_1)-a(f_2)\|_2 \approx 2 n^2 |f_1 - f_2|.$$

So when the separation is below 1/(4n²), the atomic norm can't recover the pair. Empirically, failure can be shown with less than 1/n separation and an adversarial signal; see the numeric check below.
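A quick numeric sanity check of the ℓ₂ distance scaling above; this is a hypothetical snippet (the ratio it prints settles near a constant, confirming the n^{3/2}|f₁ − f₂| growth that drives the 1/(4n²) threshold):

```python
import numpy as np

# Sanity check (illustrative): ||a(f1) - a(f2)||_2 grows like n^{3/2} |f1 - f2|
# for small separations, which drives the 1/(4n^2) failure threshold above.
def atom(f, n):
    """Fourier atom a(f) = (e^{2 pi i f k}), k = 0, ..., n-1."""
    return np.exp(2j * np.pi * f * np.arange(n))

for n in (64, 256, 1024, 4096):
    delta = 1.0 / (4 * n**2)  # separation right at the claimed threshold
    gap = np.linalg.norm(atom(0.1, n) - atom(0.1 + delta, n))
    print(n, gap / (n**1.5 * delta))  # ratio approaches a constant (about 3.6)
```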

Page 15: Decomposition and Denoising for moment sequences using convex optimization

Line Spectral Estimation

x0

xk

xn1

=

s

j=1

cjei2fjk

Line Spectrum Time Samples

Extrapolate the remaining moments

micro =

j

cjfj

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 16: Decomposition and Denoising for moment sequences using convex optimization

Super resolution Time frequency dual

Extrapolate high frequency information from low frequency samples

Just exchange time and frequency same as line spectral estimation

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 17: Decomposition and Denoising for moment sequences using convex optimization

Classical Line Spectrum Estimation (a crash course)

Tn(x) =

2

6664

x1 x2 xn

x

2 x1 xn1

x

n x

n1 x1

3

7775Define for any vector x

Nonlinear Parameter Estimation xk =

s

j=1

cje2ifjk

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 18: Decomposition and Denoising for moment sequences using convex optimization

Classical techniques

PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum

Sensitive to model orderNo rigorous theoryOnly optimal asymptotically

Low rank and ToeplitzFACT

Tn(x) =s

j=1

cj

1ei2fj

ei2(n1)fj

1 ei2fj ei2(n1)fj

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 19: Decomposition and Denoising for moment sequences using convex optimization

Convex Perspective

0 2 4 6 8 10 12 14minus10

minus5

0

5

10

15

x =s

j=1

cja(fj)

amplitudes

frequencies

x0

xk

xn1

=

s

j=1

cjei2fjk

1

ei2fjk

ei2fj(n1)

AtomsLine Spectrum

A+ = a(f) | f [0 1]Positive Amplitudes

A =a(f)ei

f [0 1] [0 2]

Complex

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 20: Decomposition and Denoising for moment sequences using convex optimization

Convergence rates

With high probability for ngt4

pn log(n)Choose tradeoff as

Dudleyrsquos inequalityBernsteinrsquos polynomial theorem

For a signal with n samplesof the form

Using the global AST guarantee we get

E

1

nx x2

2

C

log(n)

n

s

j=1

|cj |

yk =s

j=1

cje2ifjk + wk

n log(n) n2 log(4 log(n)) w

A

1 +

1

log(n)

n log(n) + n log(1632 log(n))

Fast Rates

E1

n

kx x

k22

C

2s

log(n)

n

minp 6=q

d(up uq) 4

n

If every two frequencies are far enough apart (minimum separation condition)

For s frequencies and n samples MSE is

Not a global condition on the dictionary local to the signal

Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda

= pn log(n) 2 (11) large enough

and the tradeoff parameter is chosen correctly

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

Fast Rates

If every two frequencies are far enough apart (minimum separation condition)

$$\min_{p \neq q} d(f_p, f_q) \;\ge\; \frac{4}{n},$$

then for $s$ frequencies and $n$ samples the MSE is

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \;\le\; C\,\frac{\sigma^2 s \log(n)}{n},$$

provided the tradeoff parameter is chosen correctly: $\tau = \eta\,\sigma\sqrt{n \log(n)}$ with $\eta > 1$ large enough.

Not a global condition on the dictionary; local to the signal.

Proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernandez-Granda.

Can't do much better

Nearly matches the minimax bound on the best prediction rate for signals with $4/n$ minimum separation:

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \;\gtrsim\; C\,\frac{\sigma^2 s \log(n/4s)}{n}.$$

Our rate,

$$\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \;\le\; C\,\frac{\sigma^2 s \log(n)}{n},$$

is only a logarithmic factor from the "oracle rate" $C\,\sigma^2 s / n$.

Minimum Separation and Exact Recovery (result of Candès and Fernandez-Granda)

$4/n$ separation $\Rightarrow$ an explicit dual certificate of exact recovery can be constructed: convex hulls of far-enough vertices are exposed faces.

Prony's method does not have this limitation. (Then why bother? Robustness guarantees.)

Minimum separation is necessary, except for the positive case: there the convex hull of every collection of $n/2$ vertices is an exposed face, an infinite-dimensional version of the cyclic polytope, which is neighbourly. No separation condition is needed.

Localizing Frequencies

[Figure: dual polynomial value vs. frequency, with a zoomed panel near $f \approx 0.4$; the polynomial attains magnitude 1 exactly at the recovered frequencies.]

Dual Polynomial: write $y = \hat{x} + \hat{q}$ and form

$$Q(f) = \sum_{j=0}^{n-1} \hat{q}_j\, e^{i 2\pi j f},$$

with $\langle \hat{q}, a \rangle = 1$ for $a$ in the support and $|\langle \hat{q}, a \rangle| < 1$ otherwise.

Localization Guarantees

[Figure: dual polynomial magnitude vs. frequency, marking true, estimated, and spurious frequencies; the near region of each true frequency has width $0.16/n$.]

Let $T$ be the set of true frequencies, $N_j$ the near region of $f_j$, and $F$ the far region. Then:

Spurious amplitudes: $\displaystyle\sum_{l : f_l \in F} |\hat{c}_l| \;\le\; C_1\,\frac{k^2 \log(n)}{n}$

Near region approximation: $\displaystyle\Bigl|c_j - \sum_{l : f_l \in N_j} \hat{c}_l\Bigr| \;\le\; C_3\,\frac{k^2 \log(n)}{n}$

Weighted frequency deviation: $\displaystyle\sum_{l : f_l \in N_j} |\hat{c}_l|\, \min_{f_j \in T} d(f_j, f_l)^2 \;\le\; C_2\,\frac{k^2 \log(n)}{n}$
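As a concrete reading of the three guarantees, here is a minimal numpy sketch of the corresponding empirical error metrics ($m_1$, $m_2$, $m_3$ in the experiments below). The $0.16/n$ near-region width is taken from the figure; the wrap-around distance $d$ and the exact aggregation are my assumptions, not the paper's code.

import numpy as np

def localization_metrics(f_true, c_true, f_est, c_est, n):
    # Wrap-around distance on the unit frequency interval.
    d = lambda a, b: np.minimum(np.abs(a - b), 1 - np.abs(a - b))
    D = d(f_est[:, None], f_true[None, :])   # pairwise distances, estimated x true
    near = D <= 0.16 / n                     # near region N_j of each true frequency
    far = ~near.any(axis=1)                  # estimated frequencies in the far region F
    m1 = np.abs(c_est[far]).sum()                        # spurious amplitudes
    m2 = (np.abs(c_est)[:, None] * D**2 * near).sum()    # weighted frequency deviation
    m3 = max(abs(c_true[j] - c_est[near[:, j]].sum())    # near region approximation
             for j in range(len(f_true)))
    return m1, m2, m3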

How to actually solve AST? Semidefinite Characterizations

$$\underset{x}{\text{minimize}}\ \ \tfrac{1}{2}\|y - x\|_2^2 + \tau \|x\|_{\mathcal{A}}$$

Positive amplitudes, $\mathcal{A}_+ = \{a(f) \mid f \in [0,1]\}$ (Carathéodory-Toeplitz):

$$\|x\|_{\mathcal{A}_+} = \begin{cases} x_0 & T_n(x) \succeq 0 \\ +\infty & \text{otherwise} \end{cases}$$

Complex phases, $\mathcal{A} = \{a(f)e^{i\phi} \mid f \in [0,1],\ \phi \in [0, 2\pi]\}$ (Bounded Real Lemma):

$$\|x\|_{\mathcal{A}} = \inf\left\{ \frac{1}{2n}\,\mathrm{tr}(T_n(u)) + \frac{1}{2}\,t \ :\ \begin{bmatrix} T_n(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}$$

[Figure: $\mathrm{conv}(\mathcal{A}_+)$ and $\mathrm{conv}(\mathcal{A})$.]
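For the positive-amplitude case the characterization above turns AST into a small SDP directly. A minimal cvxpy sketch, restricted to real-valued data so that $T_n(x)$ is a symmetric Toeplitz matrix; the function name and solver defaults are assumptions, not the authors' implementation.

import cvxpy as cp

def ast_positive(y, tau):
    n = len(y)
    x = cp.Variable(n)
    # Symmetric Toeplitz matrix T_n(x) with first column x.
    T = cp.bmat([[x[abs(i - j)] for j in range(n)] for i in range(n)])
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - x) + tau * x[0]),
                      [T >> 0])
    prob.solve()
    return x.value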

Alternating Direction Method of Multipliers

AST SDP formulation:

$$\begin{aligned} \underset{t,u,x,Z}{\text{minimize}}\ & \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1) \\ \text{subject to}\ & Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \quad Z \succeq 0 \end{aligned}$$

Alternating direction iterations:

$$\begin{aligned} (t^{l+1}, u^{l+1}, x^{l+1}) &\leftarrow \underset{t,u,x}{\arg\min}\ \mathcal{L}_\rho(t, u, x, Z^l, \Lambda^l) \qquad \text{(quadratic objective, linear solve)} \\ Z^{l+1} &\leftarrow \underset{Z \succeq 0}{\arg\min}\ \mathcal{L}_\rho(t^{l+1}, u^{l+1}, x^{l+1}, Z, \Lambda^l) \qquad \text{(projection onto the psd cone)} \\ \Lambda^{l+1} &\leftarrow \Lambda^l + \rho\left( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ (x^{l+1})^* & t^{l+1} \end{bmatrix} \right) \qquad \text{(linear dual update)} \end{aligned}$$
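The $Z$-update is a Euclidean projection onto the positive semidefinite cone: eigendecompose and clip negative eigenvalues at zero. A minimal numpy sketch; the explicit symmetrization is a numerical-safety assumption.

import numpy as np

def project_psd(M):
    # Hermitian-symmetrize for numerical safety, then clip negative eigenvalues.
    H = (M + M.conj().T) / 2
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, 0)) @ V.conj().T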

Form the augmented Lagrangian, with Lagrangian term, quadratic augmentation, and momentum parameter $\rho$:

$$\mathcal{L}_\rho(t, u, x, Z) = \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1) + \left\langle \Lambda,\; Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\rangle + \tfrac{\rho}{2}\left\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\|_F^2$$

Alternative Discretization Method

Can use a grid $\mathcal{A}_N = \{a(f) : f = j/N,\ j = 1, \dots, N\}$. We show

$$\left(1 - \frac{2\pi n}{N}\right)\|x\|_{\mathcal{A}_N} \;\le\; \|x\|_{\mathcal{A}} \;\le\; \|x\|_{\mathcal{A}_N},$$

via Bernstein's polynomial theorem (or Fritz John); this works for general epsilon nets. So AST is approximated by a Lasso problem $y = \Phi c + w$, where $y$ collects the $n$ atomic moments, the columns of $\Phi$ are atoms at the $N$ grid frequencies, and $c$ holds the amplitudes.

Fast shrinkage algorithms apply, Fourier projections are cheap, and coherence is irrelevant.
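A minimal sketch of the gridded problem solved by iterative soft-thresholding (ISTA). It uses two facts noted above: products with $\Phi$ and $\Phi^H$ are FFTs, and $\Phi\Phi^H = N I$, so $1/N$ is a valid step size. The iteration count is an assumption.

import numpy as np

def soft(v, thr):
    # Complex soft-thresholding: shrink magnitudes toward zero, keep phases.
    mag = np.abs(v)
    return np.where(mag > thr, (1 - thr / np.maximum(mag, 1e-12)) * v, 0)

def gridded_ast(y, N, mu, iters=1000):
    n = len(y)
    A  = lambda c: N * np.fft.ifft(c)[:n]                 # Phi @ c, atoms a(j/N)
    AH = lambda r: np.fft.fft(np.r_[r, np.zeros(N - n)])  # Phi^H @ r
    c = np.zeros(N, dtype=complex)
    for _ in range(iters):
        c = soft(c - AH(A(c) - y) / N, mu / N)            # gradient step + shrinkage
    return c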

Experimental setup

n = 64, 128, or 256 samples
Number of frequencies: n/16, n/8, n/4
SNR: -5 dB, 0 dB, 20 dB
(Root) MUSIC, Matrix Pencil, Cadzow: all fed the true model order
Empirically estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics, averaged over 20 trials

Mean Squared Error: MSE vs SNR

[Figure: MSE (dB) vs SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies.]

Mean Squared Error: Performance Profiles

[Figure: performance profiles $P_s(\beta)$ for AST, Lasso ($2^{15}$ grid), MUSIC, Cadzow, and Matrix Pencil.]

$P_s(\beta)$ = fraction of experiments with MSE less than $\beta \times$ the minimum MSE. Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = -5, 0, 20 dB.
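Computing the profile from the definition just given is short. A sketch where `mse` is a (solvers x experiments) array of average MSEs; the array layout is an assumption.

import numpy as np

def performance_profile(mse, betas):
    best = mse.min(axis=0)   # best MSE achieved on each experiment
    return np.array([[np.mean(mse[s] <= beta * best) for beta in betas]
                     for s in range(mse.shape[0])])   # P_s(beta) per solver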

Mean Squared Error: Effect of Gridding

[Figure: performance profiles $P_s(\beta)$ for Lasso with grid sizes $2^{10}$ through $2^{15}$.]

Frequency Localization

[Figure: performance profiles $P_s(\beta)$ comparing AST, MUSIC, and Cadzow on the localization error metrics; three panels with different $\beta$ ranges.]

[Figure: localization error metrics $m_1$ (spurious amplitudes), $m_2$ (weighted frequency deviation), and $m_3$ (near region approximation) vs SNR (dB) for AST, MUSIC, and Cadzow, at $k = 32$ frequencies.]

Simple LTI Systems

[Figure: vibration input u(t) driving a system G with response y(t).]

State space:

$$x[t+1] = A\,x[t] + B\,u[t], \qquad y[t] = C\,x[t] + D\,u[t]$$

Goal: find a simple linear system from time series data. Existing approaches: nonlinear Kalman-filter-based estimation; parametric methods based on Hankel low-rank formulations; subspace methods, sketched below.

Subspace Methods

$$\underbrace{\begin{bmatrix} y_1 & y_2 & \cdots & y_k \\ y_2 & y_3 & \cdots & y_{k+1} \\ y_3 & y_4 & \cdots & y_{k+2} \\ \vdots & & & \vdots \\ y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1} \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{\mathcal{O}} \underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X} + \underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & 0 & \cdots \\ CAB & CB & D & \cdots \\ \vdots & & & \ddots \\ CA^{\ell-2}B & \cdots & CB & D \end{bmatrix}}_{T} \underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_k \\ u_2 & u_3 & \cdots & u_{k+1} \\ u_3 & u_4 & \cdots & u_{k+2} \\ \vdots & & & \vdots \\ u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1} \end{bmatrix}}_{U}$$

(output $Y$, observability $\mathcal{O}$, state $X$, system matrix $T$, input $U$). With a little computation,

$$Y = \mathcal{O}X + TU \quad\Longrightarrow\quad Y\,\Pi_U^{\perp} = \mathcal{O}\,X\,\Pi_U^{\perp},$$

whose right-hand side is low rank: take an SVD (Subspace ID, n4sid) or relax with the nuclear norm (Vandenberghe et al.).

SISO

[Figure: past inputs mapped to future outputs by the Hankel operator of $G$, $\ell_2(-\infty, 0) \to \ell_2[0, \infty)$.]

Fact: the rank of the Hankel operator equals the McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.
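A quick numerical illustration of this fact, using a hypothetical two-pole impulse response (a sketch, not an example from the deck):

import numpy as np
from scipy.linalg import hankel

t = np.arange(40)
g = 0.9**t + 0.5 * (-0.7)**t   # impulse response of a degree-2 SISO system

H = hankel(g[1:21], g[20:])    # Hankel matrix of the Markov parameters g[1], g[2], ...
print(np.linalg.matrix_rank(H, tol=1e-8))   # prints 2: Hankel rank = McMillan degree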

Convex Perspective: SISO Systems

$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$

[Figure: poles of $G$ in the unit disk $\mathbb{D}$ and the measured response.]

Atoms: single-pole systems

$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}$$

Measurements: $y = L(G(z)) + w$ for a linear measurement map $L$ and noise $w$.

Atomic Norm Regularization

Regularize with atomic norms:

$$\underset{G}{\text{minimize}}\ \ \|y - L(G)\|_2^2 + \mu \|G\|_{\mathcal{A}}$$

Theorem: $\tfrac{1}{8}\|G\|_{\mathcal{A}} \le \|G\|_1 \le \|G\|_{\mathcal{A}}$, where $\|G\|_1$ is the Hankel nuclear norm.

How to solve this? The problem is equivalent to

$$\begin{aligned} \underset{c,\,x}{\text{minimize}}\ & \|y - x\|_2^2 + \mu \sum_{w \in \mathbb{D}} \frac{|c_w|}{1 - |w|^2} \\ \text{subject to}\ & x = \sum_{w \in \mathbb{D}} c_w\, L\!\left(\frac{1}{z - w}\right). \end{aligned}$$

Discretize and set $\phi_{k\ell} = L_k\!\left(\dfrac{1 - |w_\ell|^2}{z - w_\ell}\right)$ to get a Lasso problem:

$$\underset{c}{\text{minimize}}\ \ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu \|c\|_1$$

DAST Analysis

Measure uniformly spaced samples of the frequency response of $G(z) = \sum_\ell c_\ell \dfrac{1 - |w_\ell|^2}{z - w_\ell}$, and set

$$\phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell}.$$

Solve, over a net on the disk,

$$\hat{x} = \underset{x}{\arg\min}\ \tfrac{1}{2}\|y - x\|_2^2 + \mu \|x\|_{\mathcal{A}},$$

where the pole estimates correspond to the support of $\hat{x}$. Then we have

$$\|\hat{G} - G\|_{H_2}^2 \;\le\; \frac{C_0\,\|G\|_1}{(1 - \rho)\sqrt{n}},$$

with $C_0$ a constant, $\|G\|_1$ the Hankel nuclear norm, $\rho$ the stability radius, and $n$ the number of measurements.

Example: 6 poles, $\rho = 0.95$, $\sigma = 10^{-1}$, $n = 30$ gives $H_2$ error $2 \times 10^{-2}$.
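Putting the pieces together, a minimal end-to-end sketch of the discretized estimator with cvxpy; the polar net over the disk of radius $\rho$ and the grid density are assumptions (the paper's net may differ).

import numpy as np
import cvxpy as cp

def dast(y, mu, rho=0.95, n_rad=20, n_ang=40):
    n = len(y)
    # Polar net of candidate poles inside the disk of radius rho.
    radii  = np.linspace(0.05, rho, n_rad)
    angles = np.linspace(0, 2 * np.pi, n_ang, endpoint=False)
    poles  = (radii[:, None] * np.exp(1j * angles)[None, :]).ravel()
    z = np.exp(2j * np.pi * np.arange(n) / n)   # measurement frequencies
    Phi = (1 - np.abs(poles)**2) / (z[:, None] - poles[None, :])   # phi_{k,l}
    c = cp.Variable(len(poles), complex=True)
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - Phi @ c) + mu * cp.norm1(c)))
    prob.solve()
    return poles, c.value   # the support of c estimates the poles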

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.

Future Work

• Better convergence results for discretization
• Why does the weighted mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees

Referenced Publications

B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.

B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.

Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.

Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.

Other Publications

Tang, B., Shah, Recht, "Compressed Sensing off the grid," submitted to IEEE Transactions on Information Theory.

Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.

Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.

Wittenberg, Alimadhi, B., Lau, "ei RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.

Tang, B., Recht, "Sparse recovery over continuous dictionaries: Just discretize," submitted to Asilomar Conference, 2013.

Cone Condition

$$\gamma \;=\; \inf_{z \neq 0,\ z \in C_{\bar\gamma}(x)} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}}$$

Inflated descent cone, for $\bar\gamma \in (0, 1)$:

$$C_{\bar\gamma}(x) = \mathrm{cone}\left\{ z \;\middle|\; \|x + z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \bar\gamma \|z\|_{\mathcal{A}} \right\}$$

No direction in $C_{\bar\gamma}$ has zero atomic norm; $C_0$ is the descent cone.

Minimum separation is necessary

Using two sinusoids:

$$\|a(f_1) - a(f_2)\|_{\mathcal{A}} \;\le\; n^{1/2}\,\|a(f_1) - a(f_2)\|_2 \;\approx\; 2\pi n^2\,|f_1 - f_2|,$$

since $\|a(f_1) - a(f_2)\|_2 \approx 2\pi n^{3/2}\,|f_1 - f_2|$ for nearby frequencies. So when the separation is smaller than $1/(4n^2)$, the atomic norm can't recover the pair. Can empirically show failure with less than $1/n$ separation and an adversarial signal.
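A quick numeric sanity check of the small-separation approximation above (a sketch; the $1/\sqrt{3}$ factor in the comparison line comes from the Taylor expansion and is my addition):

import numpy as np

n, df = 64, 1e-4
f1, f2 = 0.3, 0.3 + df
a = lambda f: np.exp(2j * np.pi * f * np.arange(n))   # moment vector of one sinusoid

print(np.linalg.norm(a(f1) - a(f2)))         # ~0.184: exact distance
print(2 * np.pi * n**1.5 * df / np.sqrt(3))  # ~0.186: 2*pi*n^{3/2}|f1-f2| up to a constant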

Page 22: Decomposition and Denoising for moment sequences using convex optimization

Canrsquot do much better

Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation

E1

n

kx x

k22

C

2 s log(n4s)

n

Only logarithmic factor from ldquooracle raterdquo

E1

n

kx x

k22

C

2s

log(n)

n

Our rate

E

1

nx x2

2

C2 s

n

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 23: Decomposition and Denoising for moment sequences using convex optimization

Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda

4n separation = can construct explicit dual certificate of exact recovery

Convex hull of far enough vertices are exposed faces

Prony does not have this limitation (Then why bother Robustness guarantee)

Minimum separation is necessary

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 24: Decomposition and Denoising for moment sequences using convex optimization

except for the positive case

Convex hull of every collection of n2 vertices is an exposed face

Infinite dimensional version of cyclic polytope which is neighbourly

No separation condition needed

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 25: Decomposition and Denoising for moment sequences using convex optimization

Localizing Frequencies

0 02 04 06 08 10

02

04

06

08

1

Frequency

Dua

l Pol

ynom

ial V

alue

034 036 038 04 042 044 046085

09

095

1

Frequency

Dua

l Pol

ynom

ial V

alueDual Polynomial

y = x+ q

Q(f) =n1

j=0

qjei2jf

hq ai = 1 a 2 support

hq ai lt 1 otherwise

x

t

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 26: Decomposition and Denoising for moment sequences using convex optimization

Localization Guarantees

Spurious Amplitudes

Weighted Frequency Deviation

Near region approximation

10 01 02 03 04 05 06 07 08 09

1

0

Frequency

Du

al P

oly

no

mia

l M

agn

itu

de

True FrequencyEstimated Frequency

016nSpurious Frequency

lflF

|cl| C1

k2 ORJ(n)

n

cj

lflNj

cl

C3

k2 ORJ(n)

n

lflNj

|cl|PLQfjT

d(fj fl)

2

C2

k2 ORJ(n)

n

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

06

08

1

β

Ps(β

)

ASTLasso 215

MUSICCadzowMatrix Pencil

P () = Fraction of experiments with MSE less than minimum MSE

Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB

Mean Squared Error Effect of Gridding

1 2 3 4 5 6 7 8 9 100

02

04

06

08

1

β

Ps(β

)

Lasso 210

Lasso 211

Lasso 212

Lasso 213

Lasso 214

Lasso 215

Frequency Localization

20 40 60 80 1000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

100 200 300 400 5000

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

5 10 15 200

02

04

06

08

1

β

Ps(β

)

AST

MUSIC

Cadzow

minus10 minus5 0 5 10 15 200

20

40

60

80

100

k = 32

SNR (dB)

m1

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

001

002

003

004

k = 32

SNR (dB)

m2

ASTMUSICCadzow

minus10 minus5 0 5 10 15 200

1

2

3

4

5

k = 32

SNR (dB)m

3

ASTMUSICCadzow

Spurious Amplitudes Weighted Frequency Deviation Near region approximation

Simple LTI SystemsVibration u(t)

Response y(t)

u(t) y(t)

x[t+ 1] = Ax[t] +Bu[t]

y[t] = Cx[t] +Du[t]State Space

Find a simple linear system from time series data

Nonlinear Kalman filter based estimation

Parametric methods based on Hankel Low rank formulations

Subspace Methods

2

666664

y1 y2 middot middot middot yk

y2 y3 middot middot middot yk+1

y3 y4 middot middot middot yk+2

y` y`+1 middot middot middot y`+k1

3

777775=

2

666664

C

CA

CA

2

CA

`1

3

777775

x1 x2 xk

+

2

666664

D 0 0CB D 0 0CAB CB D 0

CA

`2B D

3

777775

2

666664

u1 u2 middot middot middot uk

u2 u3 middot middot middot uk+1

u3 u4 middot middot middot uk+2

u` u`+1 middot middot middot u`+k1

3

777775

With a little computation

| z | z

| z

Y

X

U

Y = OX + TU Y U = OXU

Output

State

Input

System Matrix T

Observability

SVD Subspace ID (n4sid)

Nuclear Norm (Vandenberghe et al)

Low Rank

SISO

2 1

u[t]G

0 1 2

Past inputs Hankel Operator Future Outputs

G 2( 0) 2[0)

Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute

Convex Perspective SISO Systems

G(z) =2(1 072)

z 07+

5(1 052)

z (03 + 04i)+

5(1 052)

z (03 04i)

minus1

0

1

minus1

0

10

2

4

6

A =

1 |w|2

z w

w 2 D

D

y = L (G(z)) + w

minus05 0 052

3

4

5

6

7

minimizeG

ky L(G)k22 + microkGkA

Regularize with atomic norms

8kGkA kGk1 kGkA

Theorem

Atomic Norm Regularization

How to solve this

Equivalent to

Discretize and Set

k` = Lk

1 |w|2

z w`

minimizec

12ky ck22 + microkck1 Lasso

minimizeG

ky L(G)k22 + microkGkA

minimize

cx

ky xk22 + micro

Pw2D

|cw|1|w|2

subject to x =

Pw2D

c

w

L

1zw

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 27: Decomposition and Denoising for moment sequences using convex optimization

How to actually solve AST Semidefinite Characterizations

minimizex

1

2ky xk22 + kxkA

conv(A+) conv(A)

kxkA+ =

(x0 Tn(x) 0

+1 otherwise

Bounded Real LemmaCaratheodory Toeplitz

A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei

f [0 1] [0 2]

Complex

xA = inf

1

2ntr(Tn(u)) +

1

2t

Tn(u) x

x t

0

Alternating Direction Method of Multipliers

minimize

tuxZ

12kx yk22 +

2 (t+ u1)

subject to Z =

T (u) x

x

t

Z 0

AST SDP Formulation

(tl+1 u

l+1 x

l+1) arg mintux

L

(t u x Zl

l)

Z

l+1 argminZ0

L

(tl+1 u

l+1 x

l+1 Zl)

l+1 l +

Z

l+1 T (ul+1) x

l+1

x

l+1t

l+1

Alternating Direction Iterations

quadratic objective linear

projection on psd cone

linear dual update

L(t u x Z) =1

2kx yk22 +

2(t+ u1) +

Z

T (u) x

x

t

+

2

Z T (u) x

x

t

2

F

Form Augmented Lagrangian

Lagrangian Term Augmented Lagrangian

Momentum Parameter

| z | z

Alternative Discretization Method

Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)

We show (1 2n

N

)kxkAN kxkA kxkAN

AST is approximated by Lasso

n

frequencies on a N grid

amplitudes

c

+ w

atomic moments

Via Bernsteinrsquos Polynomial Theorem or Fritz John

Works for general epsilon nets

Fast shrinkage algorithms

Fourier projections are cheap

Coherence is irrelevant

Experimental setup

n=64 or 128 or 256 samples

no of frequencies n16 n8 n4

SNR -5 dB 0 dB 20 dB

(Root) MUSIC Matrix pencil Cadzow All fed true model order

Empirical estimated (not fed) noise variance for AST

Compared mean-squared-error and localization error metrics averaged over 20 trials

Mean Squared Error MSE vs SNR

iuml10 iuml5 0 5 10 15 20iuml30

iuml20

iuml10

0

10

SNR(dB)

MSE

(dB)

ASTCadzowMUSICLasso

128 samples 8 frequencies

Mean Squared Error Performance Profiles

2 4 6 8 10 12 140

02

04

Page 28: Decomposition and Denoising for moment sequences using convex optimization

Alternating Direction Method of Multipliers

AST SDP Formulation

\[
\begin{aligned}
&\underset{t,\,u,\,x,\,Z}{\text{minimize}} && \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1) \\
&\text{subject to} && Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \quad Z \succeq 0
\end{aligned}
\]

Form Augmented Lagrangian

\[
\mathcal{L}_\rho(t,u,x,Z,\Lambda) = \tfrac{1}{2}\|x-y\|_2^2 + \tfrac{\tau}{2}(t+u_1)
+ \left\langle \Lambda,\; Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\rangle
+ \tfrac{\rho}{2}\left\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\|_F^2
\]

(the inner product is the Lagrangian term, the Frobenius penalty the augmentation; ρ is the momentum parameter).

Alternating Direction Iterations

\[
\begin{aligned}
(t^{l+1}, u^{l+1}, x^{l+1}) &\leftarrow \arg\min_{t,u,x}\; \mathcal{L}_\rho(t,u,x,Z^l,\Lambda^l) && \text{(quadratic objective: linear update)} \\
Z^{l+1} &\leftarrow \arg\min_{Z \succeq 0}\; \mathcal{L}_\rho(t^{l+1},u^{l+1},x^{l+1},Z,\Lambda^l) && \text{(projection on psd cone)} \\
\Lambda^{l+1} &\leftarrow \Lambda^l + \rho\left( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ (x^{l+1})^* & t^{l+1} \end{bmatrix} \right) && \text{(linear dual update)}
\end{aligned}
\]
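To make the iteration concrete, here is a minimal NumPy sketch of this ADMM scheme, assuming the closed-form block updates that follow from the Lagrangian above; the helper names (`T`, `psd_project`, `ast_admm`) and the parameter defaults are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.linalg import toeplitz

def T(u):
    # Hermitian Toeplitz matrix with first column u (scipy conjugates the first row).
    return toeplitz(u)

def psd_project(M):
    # Projection onto the PSD cone via an eigendecomposition.
    w, V = np.linalg.eigh((M + M.conj().T) / 2)
    return (V * np.maximum(w, 0)) @ V.conj().T

def ast_admm(y, tau, rho=2.0, iters=500):
    n = len(y)
    x = np.zeros(n, complex)
    Z = np.zeros((n + 1, n + 1), complex)
    Lam = np.zeros_like(Z)
    for _ in range(iters):
        W = Z + Lam / rho
        # (t, u, x)-update: the objective is quadratic, so each block has a
        # closed form in terms of W = Z + Lam / rho.
        x = (y + 2 * rho * W[:n, n]) / (1 + 2 * rho)
        t = W[n, n].real - tau / (2 * rho)
        # Adjoint of T: average each diagonal of the top-left block of W.
        u = np.array([np.diagonal(W[:n, :n], -k).mean() for k in range(n)])
        u[0] = u[0].real - tau / (2 * rho * n)
        # Z-update: project the block matrix (minus the scaled dual) onto the PSD cone.
        B = np.block([[T(u), x[:, None]], [x.conj()[None, :], np.array([[t]])]])
        Z = psd_project(B - Lam / rho)
        # Dual update (linear).
        Lam = Lam + rho * (Z - B)
    return x  # denoised moment sequence
```

Recovering the composing frequencies from the Toeplitz factor T(u) (for instance by a Prony-type decomposition, or from the dual polynomial) is omitted from the sketch.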

Page 29: Decomposition and Denoising for moment sequences using convex optimization

Alternative Discretization Method

Can use a grid: \( \mathcal{A}_N = \{\, a(f) : f = j/N,\; j = 1, \dots, N \,\} \), i.e. N equispaced frequencies covering \([0, 2\pi)\).

We show
\[
\left(1 - \frac{2\pi n}{N}\right)\|x\|_{\mathcal{A}_N} \;\le\; \|x\|_{\mathcal{A}} \;\le\; \|x\|_{\mathcal{A}_N},
\]
so AST is approximated by a Lasso: \( y = \Phi c + w \), where the columns of \(\Phi\) are the atomic moments of the frequencies on the N-point grid and \(c\) holds the amplitudes.

• Via Bernstein's Polynomial Theorem or Fritz John
• Works for general epsilon nets
• Fast shrinkage algorithms
• Fourier projections are cheap
• Coherence is irrelevant
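A minimal sketch of this gridded Lasso, assuming atoms \(a(f)_t = e^{i2\pi f t}\) on an N-point grid and a plain proximal-gradient (ISTA) loop; the function name, default grid size, and step-size bound are our choices.

```python
import numpy as np

def lasso_grid_ast(y, N=2**12, mu=1.0, iters=300):
    n = len(y)
    t = np.arange(n)
    # Gridded atoms a(j/N)_t = exp(2i pi j t / N).  A @ c is a truncated
    # inverse FFT, so the products can be made cheap ("Fourier projections
    # are cheap"); we keep the explicit matrix for clarity.
    A = np.exp(2j * np.pi * np.outer(t, np.arange(N)) / N)
    c = np.zeros(N, complex)
    L = n * N  # crude (Frobenius) bound on ||A||_2^2; a power iteration is tighter
    for _ in range(iters):
        z = c - A.conj().T @ (A @ c - y) / L
        mag = np.abs(z)
        c = z * np.maximum(1 - (mu / L) / np.maximum(mag, 1e-12), 0)  # complex soft threshold
    return c  # nonzero coefficients localize the frequencies on the grid
```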

Page 30: Decomposition and Denoising for moment sequences using convex optimization

Experimental setup

• n = 64, 128, or 256 samples
• Number of frequencies: n/16, n/8, n/4
• SNR: −5 dB, 0 dB, 20 dB
• (Root) MUSIC, Matrix Pencil, and Cadzow are all fed the true model order
• AST uses an empirically estimated (not fed) noise variance
• Compared mean-squared-error and localization error metrics, averaged over 20 trials
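The following sketch generates test signals consistent with this setup (random frequencies and phases, complex Gaussian noise scaled to a target SNR); the exact amplitude and frequency-separation conventions of the experiments are assumptions here.

```python
import numpy as np

def make_signal(n=128, k=8, snr_db=10.0, rng=np.random.default_rng(0)):
    t = np.arange(n)
    f = np.sort(rng.random(k))                                # frequencies in [0, 1)
    amps = rng.random(k) * np.exp(2j * np.pi * rng.random(k)) # random magnitudes and phases
    x = np.exp(2j * np.pi * np.outer(t, f)) @ amps
    noise_var = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)
    w = np.sqrt(noise_var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return x + w, x, f  # observations, clean signal, true frequencies
```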

Page 31: Decomposition and Denoising for moment sequences using convex optimization

Mean Squared Error: MSE vs SNR

[Figure: MSE (dB) versus SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies.]

Page 32: Decomposition and Denoising for moment sequences using convex optimization

Mean Squared Error: Performance Profiles

[Figure: performance profiles P_s(β) versus β for AST, Lasso (2^15 grid), MUSIC, Cadzow, and Matrix Pencil.]

P_s(β) = fraction of experiments with MSE less than β × (minimum MSE). Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = −5, 0, 20 dB.
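A sketch of the profile computation just described; `mse` rows index experiments, columns index solvers (names are ours, not from the slides).

```python
import numpy as np

def performance_profile(mse, betas):
    # mse: (n_experiments, n_solvers) array of per-run MSEs.
    # Returns P_s(beta) for each beta: the fraction of experiments where a
    # solver is within a factor beta of the best solver on that experiment.
    best = mse.min(axis=1, keepdims=True)
    return np.array([(mse <= b * best).mean(axis=0) for b in betas])
```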

Page 33: Decomposition and Denoising for moment sequences using convex optimization

Mean Squared Error: Effect of Gridding

[Figure: performance profiles P_s(β) versus β for Lasso with grid sizes 2^10, 2^11, 2^12, 2^13, 2^14, and 2^15.]

Page 34: Decomposition and Denoising for moment sequences using convex optimization

Frequency Localization

[Figure: performance profiles P_s(β) versus β for AST, MUSIC, and Cadzow under three localization metrics.]

[Figure: localization metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near-region approximation) versus SNR (dB) at k = 32 frequencies, for AST, MUSIC, and Cadzow.]

Page 35: Decomposition and Denoising for moment sequences using convex optimization

Simple LTI Systems

Vibration u(t) → Response y(t)

State space:
\[
\begin{aligned}
x[t+1] &= A\,x[t] + B\,u[t] \\
y[t] &= C\,x[t] + D\,u[t]
\end{aligned}
\]

Find a simple linear system from time series data:
• Nonlinear Kalman-filter-based estimation
• Parametric methods based on Hankel low-rank formulations
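A tiny simulator for this state-space model; the matrices and the input sequence are supplied by the caller, and the names are illustrative.

```python
import numpy as np

def simulate(A, B, C, D, u, x0=None):
    # u: length-T sequence of input vectors; returns the output sequence y.
    x = np.zeros(A.shape[0]) if x0 is None else x0
    ys = []
    for ut in u:
        ys.append(C @ x + D @ ut)   # y[t] = C x[t] + D u[t]
        x = A @ x + B @ ut          # x[t+1] = A x[t] + B u[t]
    return np.array(ys)
```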

Page 36: Decomposition and Denoising for moment sequences using convex optimization

Subspace Methods

\[
\underbrace{\begin{bmatrix}
y_1 & y_2 & \cdots & y_k \\
y_2 & y_3 & \cdots & y_{k+1} \\
y_3 & y_4 & \cdots & y_{k+2} \\
\vdots & & & \vdots \\
y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1}
\end{bmatrix}}_{Y\ \text{(Output)}}
=
\underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{\text{Observability}}
\underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X\ \text{(State)}}
+
\underbrace{\begin{bmatrix}
D & 0 & \cdots & 0 \\
CB & D & 0 & \cdots \\
CAB & CB & D & \cdots \\
\vdots & & & \ddots \\
CA^{\ell-2}B & \cdots & CB & D
\end{bmatrix}}_{\text{System Matrix } T}
\underbrace{\begin{bmatrix}
u_1 & u_2 & \cdots & u_k \\
u_2 & u_3 & \cdots & u_{k+1} \\
u_3 & u_4 & \cdots & u_{k+2} \\
\vdots & & & \vdots \\
u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1}
\end{bmatrix}}_{U\ \text{(Input)}}
\]

In shorthand, Y = OX + TU. With a little computation (projecting onto the orthogonal complement of the row space of U), the input term drops out: \( Y \Pi_U^\perp = O X \Pi_U^\perp \).

• SVD → Subspace ID (n4sid)
• Low rank → Nuclear Norm (Vandenberghe et al.)
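A short sketch of the block-Hankel construction above for scalar signals, with the input-projection step noted in a comment; the depth and width choices are assumptions.

```python
import numpy as np

def hankel_blocks(seq, ell, k):
    # Row i is the shifted window seq[i : i + k], giving an ell-by-k Hankel matrix.
    return np.array([seq[i:i + k] for i in range(ell)])

# Usage: Y = hankel_blocks(y, ell, k); U = hankel_blocks(u, ell, k).
# Projecting out the inputs with Pi = np.eye(k) - np.linalg.pinv(U) @ U
# gives Y @ Pi = O @ X @ Pi, whose column space (via an SVD) estimates the
# observability matrix O, as in n4sid.
```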

Page 37: Decomposition and Denoising for moment sequences using convex optimization

SISO

u[t] → G → y[t]

The Hankel operator of G maps past inputs to future outputs: \( \ell_2(-\infty, 0) \to \ell_2[0, \infty) \).

Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.
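An illustrative numerical check of this fact (ours, not from the slides), using a three-pole example with the same poles as the G(z) on the next page: the Hankel matrix built from the impulse response has numerical rank 3.

```python
import numpy as np
from scipy.linalg import hankel

poles = np.array([0.7, 0.3 + 0.4j, 0.3 - 0.4j])
coefs = np.array([2.0, 5.0, 5.0])        # arbitrary residues; conjugate pair kept real
k = np.arange(40)
h = (coefs[None, :] * poles[None, :] ** k[:, None]).sum(axis=1).real  # h[k] = sum_i c_i p_i^k
H = hankel(h[:20], h[19:])               # 20 x 21 Hankel matrix of impulse-response samples
print(np.linalg.matrix_rank(H, tol=1e-8))  # prints 3 = McMillan degree
```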

Page 38: Decomposition and Denoising for moment sequences using convex optimization

Convex Perspective: SISO Systems

Example transfer function:
\[
G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}
\]

[Figure: the three poles of G inside the unit disk, and the magnitude of its frequency response.]

Atoms: single-pole systems
\[
\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}, \qquad \mathbb{D} \text{ the open unit disk.}
\]

Measurements: y = L(G(z)) + w, a linear map of the transfer function plus noise.

Regularize with atomic norms:
\[
\underset{G}{\text{minimize}}\;\; \|y - L(G)\|_2^2 + \mu\|G\|_{\mathcal{A}}
\]

Theorem: for all G, the atomic norm is equivalent to the Hankel nuclear norm \(\|G\|_1\),
\[
c\,\|G\|_{\mathcal{A}} \;\le\; \|G\|_1 \;\le\; \|G\|_{\mathcal{A}}
\]
for a universal constant c > 0.

Page 39: Decomposition and Denoising for moment sequences using convex optimization

Atomic Norm Regularization

How to solve
\[
\underset{G}{\text{minimize}}\;\; \|y - L(G)\|_2^2 + \mu\|G\|_{\mathcal{A}}\;?
\]

Equivalent to
\[
\begin{aligned}
&\underset{c,\,x}{\text{minimize}} && \|y - x\|_2^2 + \mu \sum_{w \in \mathbb{D}} \frac{|c_w|}{1 - |w|^2} \\
&\text{subject to} && x = \sum_{w \in \mathbb{D}} c_w\, L\!\left(\frac{1}{z - w}\right).
\end{aligned}
\]

Discretize and set \( \Phi_{k\ell} = L_k\!\left( \dfrac{1 - |w_\ell|^2}{z - w_\ell} \right) \):
\[
\underset{c}{\text{minimize}}\;\; \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu\|c\|_1 \qquad \text{(Lasso)}
\]
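A sketch combining this discretization with the frequency-response measurements of the DAST analysis on the next page; the pole net, regularization level, and ISTA solver are our choices, not the slides'.

```python
import numpy as np

def dast_lasso(y, mu=1.0, iters=300, grid=50):
    """Discretized atomic-norm system ID: Lasso over a net of poles in the disk."""
    n = len(y)
    k = np.arange(n)
    # A net on the disk (polar grid); the density is an assumption.
    radii = np.linspace(0.05, 0.95, grid // 5)
    angles = 2 * np.pi * np.arange(grid) / grid
    w = (radii[:, None] * np.exp(1j * angles[None, :])).ravel()
    # Phi[k, l] = (1 - |w_l|^2) / (e^{2 pi i k/n} - w_l): uniformly spaced
    # frequency-response samples of the normalized single-pole atoms, so a
    # plain l1 penalty on c matches the atomic norm.
    zk = np.exp(2j * np.pi * k / n)
    Phi = (1 - np.abs(w) ** 2)[None, :] / (zk[:, None] - w[None, :])
    c = np.zeros(len(w), complex)
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(iters):
        z = c - Phi.conj().T @ (Phi @ c - y) / L
        mag = np.abs(z)
        c = z * np.maximum(1 - (mu / L) / np.maximum(mag, 1e-12), 0)
    return w, c  # poles on the net and their (sparse) coefficients
```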

Page 40: Decomposition and Denoising for moment sequences using convex optimization

DAST Analysis

constant Hankel nuclear norm

stability radius

number of measurements

Measure uniform spaced samples of frequency response

kG Gk2H2 0kGk1

(1 )pn

G(z) =X

`

c`1 |w`|2

z w`Set

Then we have

k` =1 |w`|2

e2ikn w`

Solvex = argmin

x

12ky xk22 + microkxkA

over a net on the disk

where the coefficients correspond to the support of

x

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 41: Decomposition and Denoising for moment sequences using convex optimization

6 polesρ = 095σ = 1e-1

n=30

H2 err = 2e-2

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 42: Decomposition and Denoising for moment sequences using convex optimization

Summary

Optimal bounds for denoising using convex methods

Better empirical performance than classical approaches

Local conditions on signals instead of global dictionary conditions

Discretization works

Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 43: Decomposition and Denoising for moment sequences using convex optimization

Future Work

bull Better convergence results for discretization

bull Why does weighted mean heuristic work so well for discretization

bull Better algorithms for System Identification

bull Greedy techniques

bull More atomic sets General localization guarantees

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 44: Decomposition and Denoising for moment sequences using convex optimization

Referenced Publications

B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011

B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing

Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted

Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 45: Decomposition and Denoising for moment sequences using convex optimization

Other Publications

Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory

Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012

Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012

Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software

Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 46: Decomposition and Denoising for moment sequences using convex optimization

Cone Condition

= LQIz =0

z2

zA

z C(x)

C(x) = FRQH z | x + zA xA + zA

Inflated Descent Cone

No direction in Cᵧ with zero atomic normHVFHQWampRQH C0

(0 1)

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal

Page 47: Decomposition and Denoising for moment sequences using convex optimization

Minimum separation is necessary

Using two sinusoids

ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|

So

When separation is lower than 14n2 atomic norm canrsquot recover it

ka(f1) a(f2)k2 = 2n32|f1 f2|

Can empirically show failure with less than 1n separation and adversarial signal