matthieu kowalski - github pages€¦ · introductiondirect modelinverse problemnumerical results...

Introduction Direct model Inverse problem Numerical results

Audio declipping

Matthieu Kowalski

Univ Paris-SudL2S (GPI)

Matthieu Kowalski Audio declipping 1 / 22


1 Introduction

2 Direct model

3 Inverse problem

4 Numerical results



Audio Declipping

Original signal:

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

Time (s)

-1

-0.5

0

0.5

1



Audio Declipping

Clipped signal:

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

Time (s)

-1

-0.5

0

0.5

1



Audio Declipping

Goal:0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Can we get a good estimation of the original signal (blue) from theclipped one (red) ?



Reliable vs Unreliable coeff.

Introducing examples Problem statement Time-dom. framework Algorithms Experiments Conclusions

Problem description and matrix formulation

Unreliable data

Observation y

Missing data to be estimated

Original s (unknown)

Degradation

Mm

Mr

yr = Mry

ym = Mmy

Mm =

00010000000000000000000010000000000000000010000000000000000000100000000000000000100000000000000000100000000000000000001

Mr =

10000000000000000010000000000000000010000000000000000001000000000000000001000000000000000001000000000000000000010000000000000000010000000000000000000010000000000000000010

Reliable data

Audio Inpainting - V. Emiya 13Matthieu Kowalski Audio declipping 6 / 22



Reliable samples: yr = Mry

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

Time (s)

-1

-0.5

0

0.5

1




Unreliable (clipped) samples: ym = Mmy

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

Time (s)

-1

-0.5

0

0.5

1




Reliable + Unreliable (clipped) samples:

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

Time (s)

-1

-0.5

0

0.5

1



Audio inpainting: forward problem [A. Adler, V. Emiya et Al]

We have then:yr = Mry = Mrs

where

s ∈ RN is the unknown “clean” signal;

yr ∈ RM are the “reliable” sample of the observed signal

Mr ∈ RM×N is the matrix of the reliable support of x

we can also define the ”missing” samples as

ym = Mmy = Mms



Inverse problem: data term

Using the reliable coefficients, we must have

yr = Mrs

where Mr select the reliable samples. We can use a simple `2 loss

s = argmins

1

2‖yr −Mrs‖22

We must take the clipped samples into account



Inverse problem: clipping constraints

For audio declipping, we can add the following constraint

s = argmins

1

2‖yr −Mrs‖22

s.t. Mm+

Φα > θclip

Mm−Φα < −θclip

where

Mm+

(resp. Mm−) select the positive (resp. negative) clipped

samples.

θclip is the clip threshold (here θclip = 0.2)

Problem: infinite solutions! We must add some constraints on s



Audio declipping: use a dictionnary

Let Φ a dictionnary such that:

s = Φα

where α are sparse synthesis coefficients

Audio signal: use the short time Fourier transform

s(t) = Φα =∑n,f

αn,f ϕn,f (t)

Time (s)

Freq

uenc

y (H

z)

1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2x 104



Inverse problem: a constrained sparse problem

Using the dictionnary Φ + sparsity

α = argmins

1

2‖yr −MrΦα‖22 + λ‖α‖1

s.t. Mm+

Φα > θclip

Mm−Φα < −θclip

where

Mm+

(resp. Mm−) select the positive (resp. negative) clipped

samples.

θclip is the clip threshold (here θclip = 0.2)

s = α

Problems:

the proximity operator has no closed form

Cannot use simple algorithms such as (F)ISTA



Rewrite the constraints

Idea: use a `2 loss on the clipped samples if the constraint is notrespected

If ym(t) > θclip

then L(θclip − ym(t)) = 0

If ym(t) < θclip

else L(θclip − ym(t)) = (θclip − ym(t))2

-5 -4 -3 -2 -1 0 1 2 3 4 50

5

10

15

20

25



Rewrite the constraints

The squared hinge loss:

L(θclip − ym) = [θclip − ym]2+

=∑

t:ym(t)>0

(θclip − ym(t))2+ +∑

t:ym(t)<0

(−θclip + ym(t))2+

= [θclip −MmΦα]2+

-5 -4 -3 -2 -1 0 1 2 3 4 50

5

10

15

20

25



Audio declipping: (convex unconstrained) inverse problem

We consider the following unconstrained convex problem:

α = argminα

1

2‖yr −MrΦα‖22 +

1

2[θclip −MmΦα]2+ + λ‖α‖1

which is under the form

f1(α) + f2(α)

with f1 Lipschitz-differentiable and f2 semi-convex.

We can apply (relaxed)-ISTA directly !



FISTA for declipping



Thresolding operators



Numerical results

0 0.2 0.4 0.6 0.8 10

2

4

6

8

10

12

14

16

18

Clipping Level

Ave

rage

SN

Rm

Impr

ovem

ent

Speech @ 16kHz

LEWWGLPEWHTOMP

0 0.2 0.4 0.6 0.8 10

1

2

3

4

5

6

7

8

9

10

11

Clipping Level

Ave

rage

SN

Rm

Impr

ovem

ent

Music @ 16kHz

Average SNRmiss for 10 speech (left) and music (right) signals overdifferent clipping levels and operators. Neighborhoods extend 3 and 7coefficients in time for speech and music signals, respectively.



Numerical results: zoom on reconstructions

4230 4235 4240 4245 4250 4255

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

Time in ms

Am

plitu

de

OriginalPEWEWLWGLHTOMP

Declipped music signal using different operators for clip level θclip = 0.2using the Lasso, WGL, EW, PEW, HT, and OMP operators.Neighborhood size for WGL and PEW was 7.



Original Vs clipped Vs declipped Signal

0 1 2 3 4 5−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

0 1 2 3 4 5−1

−0.5

0

0.5

1


matthieu kowalski - github pages€¦ · introductiondirect modelinverse problemnumerical results...

Documents