
Source Separation Tutorial Mini-Series I

Speech enhancement algorithms

Eunjoon Cho

Stanford University, EE

April 2nd, 2013

Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 1 / 41

Speech enhancement: A source separation perspective

Source separation: Decoupling of two or more sources with no, little, or some prior information.

Speech enhancement is a natural application for source separation.

Speech enhancement: The application perspective

Goal: Increase the intelligibility/quality of noisy speech.

Practical constraints:

- Computationally efficient: real-time applications on mobile phones and teleconferences.
- A solution independent of the noise environment.
- Stronger emphasis on reconstructing speech (different objective/subjective measures).

Objective

Present some of the well-established methods in the speech enhancement literature and discuss the relationship between them.

Shed light on how such methods differ in approach and assumptions from methods that rely on matrix factorization and/or prior training of sources.

Provide working code that can act as a baseline for any speech enhancement work.

Speech enhancement: Model sources

Under-determined problem: $Y(\omega) = X(\omega) + D(\omega)$ (noisy speech = clean speech + noise)

Train dictionaries (bases) of noise and/or speech to use as a prior.

- The model of the noise can be inaccurate.
- Training online can be computationally expensive.

Assume that noise varies more slowly than speech and that speech is temporally sparse.

- Use voice activity detectors and estimate the noise when there is no speech.
- Keep track of the minimum level of the spectrum at a given frequency.

Speech enhancement: Separate sources

Given an estimate of the noise, how can we estimate the speech?

- Spectral subtraction [Bol79]
- Wiener filtering: MMSE estimator [LO79]
- Spectral amplitude MMSE estimator [EM84]

Frequency domain approaches

Discuss approaches in the frequency domain: $Y(\omega) = X(\omega) + D(\omega)$

Overall flow of STFT processing:

$y(n)$ → window $w(n - mR)$ → $y_m(n)$ → FT → $Y_m(\omega)$ → Estimation → $\hat{X}_m(\omega)$ → IFT → $\hat{x}_m(n)$ → Overlap-add → $\hat{x}(n)$
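The STFT processing flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which window or overlap is used, so a periodic Hann window with 50% overlap is assumed here, a naive DFT stands in for an FFT, and the names `dft`, `idft`, `stft_process`, and `estimate` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT; a practical implementation would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft_process(y, frame_len, hop, estimate):
    """Window, transform, modify, invert, and overlap-add.

    `estimate` maps one frame spectrum Y_m (list of complex) to X_hat_m.
    """
    # Assumed window: periodic Hann. With an even frame_len and
    # hop = frame_len // 2, the shifted windows sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    out = [0.0] * len(y)
    for start in range(0, len(y) - frame_len + 1, hop):
        y_m = [y[start + t] * window[t] for t in range(frame_len)]  # y(n) w(n - mR)
        x_hat_m = idft(estimate(dft(y_m)))                           # FT, estimation, IFT
        for t in range(frame_len):                                   # overlap-add
            out[start + t] += x_hat_m[t]
    return out
```

With an identity estimator (`lambda Y: Y`), the samples covered by two overlapping frames are reconstructed unchanged, which is a quick sanity check before plugging in any of the gain functions discussed later.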

Spectral subtraction

The method is built up step by step on a single frame (frame 25):

1. Start from the noisy magnitude spectrum $|Y_m(\omega)|$.
2. Obtain a noise estimate $|D_m(\omega)|$, in practice its expected value $E[|D_m(\omega)|]$.
3. Subtract: $|\hat{X}_m(\omega)| = |Y_m(\omega)| - E[|D_m(\omega)|]$.
4. Half-wave rectify so that the magnitude stays non-negative: $|\hat{X}_m(\omega)| = \max\{|Y_m(\omega)| - E[|D_m(\omega)|],\ 0\}$.
5. Reuse the noisy phase: $\angle \hat{X}_m(\omega) = \angle Y_m(\omega)$.

[Figure: power magnitude (dB) versus frequency (0 to 8000 Hz) for frame 25, showing the noisy speech, the noise estimate, and the denoised speech.]
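The subtraction, rectification, and phase steps for one frame can be sketched as follows. The function name `spectral_subtract` and the list-based interface are assumptions made for illustration; the noise magnitudes are taken as given.

```python
import cmath


def spectral_subtract(Y_m, noise_mag):
    """Magnitude spectral subtraction for one frame.

    Y_m       : list of complex STFT bins of the noisy frame
    noise_mag : list of E[|D_m(w)|] estimates, one per bin
    """
    X_hat = []
    for Y, d in zip(Y_m, noise_mag):
        mag = max(abs(Y) - d, 0.0)            # half-wave rectify negative magnitudes
        phase = cmath.phase(Y)                # reuse the noisy phase
        X_hat.append(cmath.rect(mag, phase))  # |X_hat| * exp(j * angle(Y))
    return X_hat
```

A bin whose magnitude falls below the noise estimate is set to zero; every other bin keeps its phase and loses `d` in magnitude.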

Power spectral subtraction

Generalized spectral subtraction with exponent $\alpha$:

\[ |\hat{X}_m(\omega)|^{\alpha} = \max\{ |Y_m(\omega)|^{\alpha} - E[|D_m(\omega)|^{\alpha}],\ 0 \} \]

Power spectral subtraction is the case $\alpha = 2$:

\[ |\hat{X}_m(\omega)|^{2} = \max\{ |Y_m(\omega)|^{2} - E[|D_m(\omega)|^{2}],\ 0 \} \]

Gain for power spectral subtraction

From the power spectral subtraction,

\[ |\hat{X}_m(\omega)|^{2} = |Y_m(\omega)|^{2} - E[|D_m(\omega)|^{2}] \]

the gain can be expressed as follows:

\[ H_m(\omega) = \frac{|\hat{X}_m(\omega)|}{|Y_m(\omega)|} = \sqrt{\frac{|Y_m(\omega)|^{2} - E[|D_m(\omega)|^{2}]}{|Y_m(\omega)|^{2}}} = \sqrt{\frac{\gamma(\omega) - 1}{\gamma(\omega)}} \]

$\gamma(\omega)$ is called the a-posteriori SNR:

\[ \gamma(\omega) = \frac{|Y_m(\omega)|^{2}}{E[|D_m(\omega)|^{2}]} \]

[Figure: gain $20\log_{10} H(\omega)$ (dB) versus a-posteriori SNR $10\log_{10}\gamma(\omega)$ (dB), over 0 to 20 dB.]

Gain for general spectral subtraction

\[ |\hat{X}_m(\omega)|^{\alpha} = |Y_m(\omega)|^{\alpha} - E[|D_m(\omega)|^{\alpha}] \]

Different gain functions for various $\alpha$:

\[ H_m(\omega) = \frac{|\hat{X}_m(\omega)|}{|Y_m(\omega)|} = \left( \frac{\gamma(\omega)^{\alpha/2} - 1}{\gamma(\omega)^{\alpha/2}} \right)^{1/\alpha} \]

[Figure: gain $20\log_{10} H(\omega)$ (dB) versus SNR $10\log_{10}\gamma(\omega)$ (dB), over 0 to 20 dB, for $\alpha = 1$, $\alpha = 2$, and $\alpha = 3$.]
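The gain expression is easy to tabulate for different exponents. The sketch below is illustrative only: the function name `subtraction_gain` is hypothetical, and the clamp to zero for $\gamma \le 1$ is an assumption borrowed from the half-wave rectified form of the subtraction rule.

```python
import math


def subtraction_gain(gamma, alpha=2.0):
    """Gain H = ((gamma^(alpha/2) - 1) / gamma^(alpha/2))^(1/alpha).

    gamma is the a-posteriori SNR on a linear (not dB) scale.
    The gain is clamped to 0 when gamma <= 1 (half-wave rectification).
    """
    g = gamma ** (alpha / 2.0)
    if g <= 1.0:
        return 0.0
    return ((g - 1.0) / g) ** (1.0 / alpha)


# Gain in dB for alpha = 1, 2, 3 at a few a-posteriori SNR values.
for snr_db in (5, 10, 20):
    gamma = 10 ** (snr_db / 10)
    print(snr_db, [round(20 * math.log10(subtraction_gain(gamma, a)), 2) for a in (1, 2, 3)])
```

Setting `alpha=2.0` gives the power spectral subtraction gain $\sqrt{(\gamma - 1)/\gamma}$.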

Example of spectral subtraction

- Noisy speech
- Power spectral subtraction
- Magnitude spectral subtraction

Musical noise

Q. Why do we get musical noise?

A1. Inaccurate estimates of unknown variables.

From $Y_m(\omega) = X_m(\omega) + D_m(\omega)$,

\[ |X_m(\omega)|^{2} = |Y_m(\omega)|^{2} - |D_m(\omega)|^{2} - \left( X_m(\omega) D_m^{*}(\omega) + X_m^{*}(\omega) D_m(\omega) \right) \]

Spectral subtraction replaces the two unknown terms by their expectations:

\[ |D_m(\omega)|^{2} \approx E[|D_m(\omega)|^{2}] \]

\[ 2\,\mathrm{Re}\{X_m(\omega) D_m^{*}(\omega)\} \approx E[2\,\mathrm{Re}\{X_m(\omega) D_m^{*}(\omega)\}] = 0 \]

Noise estimation

A simple method is to average the power spectra when there is no speech activity:

\[ |\hat{D}_m(\omega)|^{2} = E[|Y_m(\omega)|^{2}] = E[|D_m(\omega)|^{2}] \]

- Needs an accurate voice activity detector (VAD).
- Has issues in non-stationary noise (babble noise) conditions.

Issues with the approximation $|D_m(\omega)|^{2} \approx E[|D_m(\omega)|^{2}]$:

- The noise power spectrum (periodogram) has high variance with respect to the underlying power spectral density.

[Figure: power (dB) versus FFT index (about 20 to 120), comparing the periodogram with the true power.]
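The averaging method can be sketched as below. The VAD itself is not specified on the slide, so its per-frame decisions are simply passed in as booleans; `estimate_noise_power` and its arguments are hypothetical names.

```python
def estimate_noise_power(frames_power, is_speech):
    """Average |Y_m(w)|^2 over the frames flagged as noise-only.

    frames_power : list of frames, each a list of |Y_m(w)|^2 per bin
    is_speech    : list of booleans from a (hypothetical) VAD, one per frame
    """
    noise_frames = [p for p, s in zip(frames_power, is_speech) if not s]
    if not noise_frames:
        raise ValueError("no noise-only frames to average")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]
```

Averaging over more noise-only frames lowers the variance of the estimate, at the cost of tracking non-stationary noise more slowly.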

Cross term spectrum

$2\,\mathrm{Re}\{X_m(\omega) D_m^{*}(\omega)\}$ requires the phase information, which is difficult to estimate.

[Figure: power magnitude (dB) versus frequency (0 to 8000 Hz) for frame 18, comparing $|X(\omega)|^{2}$, $|Y(\omega)|^{2}$, and $|2\,\mathrm{Re}(X(\omega) D^{*}(\omega))|$.]

Musical noise

Q. Why do we get musical noise?

A1. Inaccurate estimates of unknown variables.

A2. How we handle the cases where the estimate is bad.

Half-wave rectify negative values:

\[ |\hat{X}_m(\omega)|^{2} = \max\{ |Y_m(\omega)|^{2} - E[|D_m(\omega)|^{2}],\ 0 \} \]

Artifacts from half-wave rectifying

[Figure: artifacts from half-wave rectification. Image from [Bol79].]

Oversubtraction [BSM79]

Over-subtract the noise estimate to reduce noise peaks (here $\alpha$ is the oversubtraction factor, not the exponent of the generalized spectral subtraction):

\[ |\hat{X}_m(\omega)|^{2} = |Y_m(\omega)|^{2} - \alpha E[|D_m(\omega)|^{2}] \]

This comes at the expense of attenuating the underlying signal.

[Figure: three panels of power magnitude (dB) versus frequency (0 to 8000 Hz) for frame 25, with $\alpha = 1$, $\alpha = 2$, and $\alpha = 3$; each panel shows the noisy speech, the noise estimate, and the denoised speech.]

Oversubtraction [BSM79]

Fill in the valleys of the spectrum to mask residual noise:

\[ |\hat{X}_m(\omega)|^{2} = \max\{ |Y_m(\omega)|^{2} - \alpha E[|D_m(\omega)|^{2}],\ \beta E[|D_m(\omega)|^{2}] \} \]

[Figure: three panels of power magnitude (dB) versus frequency (0 to 8000 Hz) for frame 25, with $\alpha = 2$ and no floor, $\alpha = 2$ with $\beta = 0.001$, and $\alpha = 2$ with $\beta = 0.5$; each panel shows the noisy speech, the noise estimate, and the denoised speech.]
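Oversubtraction with a spectral floor is a one-line rule per frequency bin. The sketch below assumes power spectra are given as plain lists; the function name `oversubtract` and the default parameter values are illustrative choices, not values recommended by the slides.

```python
def oversubtract(Y_power, noise_power, alpha=2.0, beta=0.01):
    """Return |X_hat|^2 = max(|Y|^2 - alpha * E|D|^2, beta * E|D|^2) per bin.

    Y_power     : list of |Y_m(w)|^2 values for one frame
    noise_power : list of E[|D_m(w)|^2] estimates, one per bin
    alpha       : oversubtraction factor (alpha >= 1)
    beta        : spectral floor, as a fraction of the noise power
    """
    return [max(y - alpha * d, beta * d) for y, d in zip(Y_power, noise_power)]
```

With `beta=0` this reduces to half-wave rectified oversubtraction; a larger `beta` leaves more broadband residual noise but hides isolated peaks.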

Oversubtraction [BSM79]

$\alpha$ should depend on the segmental SNR of the frame ($\gamma$).

Less attenuation (small $\alpha$) for high SNR, and more attenuation (large $\alpha$) for low SNR.

[Figure: $\alpha$ (0 to 8) versus SNR (−10 to 25 dB).]

Example of spectral oversubtraction

- Noisy speech
- Power spectral subtraction
- Spectral oversubtraction: $\alpha = 15$, $\beta = 0.01$

Wiener filter

Find the optimal linear filter that outputs the desired signal (clean speech):

\[ e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{k=0}^{M-1} h_k\, y(n-k) \]

Find $\mathbf{h}^{*}$ that minimizes $E[e^{2}(n)]$ by solving $\partial E[e^{2}(n)] / \partial \mathbf{h} = 0$:

\[ \mathbf{h}^{*} = \mathbf{R}_{yy}^{-1}\, \mathbf{r}_{yx} = (\mathbf{R}_{xx} + \mathbf{R}_{dd})^{-1}\, \mathbf{r}_{xx} \]
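The step from the minimization to the closed form can be written out as follows. This is a sketch that assumes real-valued signals and zero-mean speech and noise that are uncorrelated at all lags; the vector notation $\mathbf{y}(n) = [y(n), \dots, y(n-M+1)]^{T}$ and $\mathbf{h} = [h_0, \dots, h_{M-1}]^{T}$ is introduced here for illustration and does not appear on the slide.

```latex
\begin{align*}
E[e^{2}(n)] &= E[x^{2}(n)] - 2\,\mathbf{h}^{T}\mathbf{r}_{yx} + \mathbf{h}^{T}\mathbf{R}_{yy}\,\mathbf{h},
  \qquad \mathbf{R}_{yy} = E[\mathbf{y}(n)\,\mathbf{y}^{T}(n)],\quad
  \mathbf{r}_{yx} = E[\mathbf{y}(n)\,x(n)] \\
\frac{\partial E[e^{2}(n)]}{\partial \mathbf{h}} &= -2\,\mathbf{r}_{yx} + 2\,\mathbf{R}_{yy}\,\mathbf{h} = 0
  \;\Longrightarrow\; \mathbf{h}^{*} = \mathbf{R}_{yy}^{-1}\,\mathbf{r}_{yx} \\
y(n) = x(n) + d(n),\quad E[x(n)\,d(n-k)] = 0 \ \text{for all } k
  &\;\Longrightarrow\; \mathbf{R}_{yy} = \mathbf{R}_{xx} + \mathbf{R}_{dd},\quad
  \mathbf{r}_{yx} = \mathbf{r}_{xx}
\end{align*}
```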

Wiener filter in frequency domain

If we assume a non-causal IIR filter, then by the convolution theorem, i.e., $x(n) * h(n) \leftrightarrow X(\omega) H(\omega)$,

\[ E(\omega) = X(\omega) - H(\omega) Y(\omega) \]

If we minimize $E[|E(\omega)|^{2}]$ with respect to $H(\omega)$, we have

\[ H(\omega) = \frac{E[X(\omega) Y^{*}(\omega)]}{E[|Y(\omega)|^{2}]} = \frac{E[X(\omega)(X^{*}(\omega) + D^{*}(\omega))]}{E[|Y(\omega)|^{2}]} = \frac{E[|X(\omega)|^{2}]}{E[|Y(\omega)|^{2}]} = \frac{E[|X(\omega)|^{2}]}{E[|X(\omega)|^{2}] + E[|D(\omega)|^{2}]} \]

where the last two steps use $E[X(\omega) D^{*}(\omega)] = 0$, i.e., speech and noise are uncorrelated.

Parametric Wiener filters

Wiener filter:

\[ H(\omega) = \frac{E[|X(\omega)|^{2}]}{E[|Y(\omega)|^{2}]} = \frac{E[|X(\omega)|^{2}]}{E[|X(\omega)|^{2}] + E[|D(\omega)|^{2}]} \]

More generally,

\[ H(\omega) = \left( \frac{E[|X(\omega)|^{2}]}{E[|X(\omega)|^{2}] + \alpha E[|D(\omega)|^{2}]} \right)^{\beta} \]
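The parametric form can be sketched as a small function of the speech and noise powers. The name `parametric_wiener_gain` is hypothetical, and the code assumes the exponent on the bracket is the $\beta$ referred to on the following slides; the noise power is assumed to be strictly positive.

```python
def parametric_wiener_gain(speech_power, noise_power, alpha=1.0, beta=1.0):
    """H = (E|X|^2 / (E|X|^2 + alpha * E|D|^2)) ** beta for one frequency bin.

    alpha = 1, beta = 1   : the ordinary Wiener gain
    alpha = 1, beta = 1/2 : the square-root Wiener gain
    """
    return (speech_power / (speech_power + alpha * noise_power)) ** beta
```

For example, `parametric_wiener_gain(3.0, 1.0)` evaluates the plain Wiener gain at a speech-to-noise power ratio of 3.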

Connection with spectral subtraction

\[ H(\omega) = \left( \frac{E[|X(\omega)|^{2}]}{E[|X(\omega)|^{2}] + \alpha E[|D(\omega)|^{2}]} \right)^{\beta} \]

$E[|X(\omega)|^{2}]$ is unknown.

If $\alpha = 1$, $\beta = 1/2$, and we substitute the estimate itself, $E[|X(\omega)|^{2}] = |\hat{X}(\omega)|^{2}$, then

\[ |\hat{X}(\omega)| = H(\omega)\,|Y(\omega)| = \sqrt{\frac{|\hat{X}(\omega)|^{2}}{|\hat{X}(\omega)|^{2} + E[|D(\omega)|^{2}]}}\; |Y(\omega)| \]

Squaring both sides and rearranging,

\[ |\hat{X}(\omega)|^{2} \left( |\hat{X}(\omega)|^{2} + E[|D(\omega)|^{2}] \right) = |\hat{X}(\omega)|^{2}\, |Y(\omega)|^{2} \]

gives two solutions, $|\hat{X}(\omega)|^{2} = |Y(\omega)|^{2} - E[|D(\omega)|^{2}]$ or $|\hat{X}(\omega)|^{2} = 0$, which is essentially the power spectral subtraction algorithm.

Wiener filter gain

If we replace $E[|X(\omega)|^{2}]$ with $|Y(\omega)|^{2} - E[|D(\omega)|^{2}]$, the Wiener filter becomes

\[ H(\omega) = \frac{E[|X(\omega)|^{2}]}{E[|X(\omega)|^{2}] + E[|D(\omega)|^{2}]} = \frac{\gamma(\omega) - 1}{\gamma(\omega)}, \quad \text{where } \gamma(\omega) = \frac{|Y(\omega)|^{2}}{E[|D(\omega)|^{2}]} \]

The square-root Wiener filter is the same as power spectral subtraction:

\[ H(\omega)^{1/2} = \sqrt{\frac{\gamma(\omega) - 1}{\gamma(\omega)}} \]
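A short sketch makes the comparison between the two gains concrete. The function names are hypothetical, and the clamp at zero for $\gamma \le 1$ is an assumption carried over from half-wave rectification.

```python
import math


def wiener_gain(gamma):
    """Wiener gain H = (gamma - 1) / gamma, clamped at 0 for gamma <= 1."""
    return max(gamma - 1.0, 0.0) / gamma


def sqrt_wiener_gain(gamma):
    """Square-root Wiener gain, i.e. the power spectral subtraction gain."""
    return math.sqrt(wiener_gain(gamma))


# Compare both gains in dB at a few a-posteriori SNR values (in dB).
for snr_db in (5, 10, 20):
    gamma = 10 ** (snr_db / 10)
    print(snr_db,
          round(20 * math.log10(wiener_gain(gamma)), 2),
          round(20 * math.log10(sqrt_wiener_gain(gamma)), 2))
```

Because the gain lies between 0 and 1, its square root is never smaller than the gain itself, so the Wiener filter attenuates at least as much as power spectral subtraction at the same $\gamma$.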

Wiener filter gain

[Figure: gain $20\log_{10} H(\omega)$ (dB) versus a-posteriori SNR $10\log_{10}\gamma(\omega)$ (dB), over 0 to 20 dB, comparing the Wiener filter with the square-root Wiener filter (i.e., power spectral subtraction).]

Connection with spectral subtraction

If $\alpha \neq 1$, the same method gives

\[ |\hat{X}(\omega)|^{2} = |Y(\omega)|^{2} - \alpha E[|D(\omega)|^{2}] \]

which is the spectral oversubtraction method.

Note that the Wiener filter is zero-phase, and thus $\angle \hat{X}(\omega) = \angle Y(\omega)$, just like the spectral subtraction method.

MMSE-STSA Estimator

Suggested by Ephraim and Malah [EM84].

Estimator that minimizes the mean square error of the spectral magnitude.

Given $X(\omega_k) = X_k e^{j \angle X(\omega_k)}$,

\[ \min E[(X_k - \hat{X}_k)^{2}] \]

Comparison of MMSE-STSA Estimator with Wiener filter

1. MMSE in the complex spectrum vs. the magnitude spectrum

   Wiener: $\min E[|X(\omega_k) - \hat{X}(\omega_k)|^{2}]$

   MMSE-STSA: $\min E[(X_k - \hat{X}_k)^{2}]$

2. Linear assumption vs. assumption on the distribution of $X_k$

   Wiener: $\min E[|X(\omega_k) - H(\omega_k) Y(\omega_k)|^{2}]$

   MMSE-STSA: $\min E[(X_k - \hat{X}_k)^{2}]$, where the expectation is taken over $p(Y(\omega_k), X_k)$

Page 54: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with

MMSE-STSA Estimator

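$$\min E[(X_k - \hat{X}_k)^2]$$

From Bayesian statistics, the optimal MMSE estimator is

$$\hat{X}_k = E[X_k \mid Y(\omega_k)] = \int_0^\infty x_k \, p(x_k \mid Y(\omega_k)) \, dx_k = \frac{\int_0^\infty x_k \, p(Y(\omega_k) \mid x_k) \, p(x_k) \, dx_k}{p(Y(\omega_k))}$$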

We need knowledge of the distributions of $X(\omega_k)$ and $Y(\omega_k)$.


Distribution assumption for $X(\omega_k)$, $D(\omega_k)$, and $Y(\omega_k)$

Fourier transform coefficients (of both speech and noise) are Gaussian distributed.

From the central limit theorem: $Y(\omega_k) = \sum_{n=0}^{N-1} y(n) e^{-j\omega_k n}$

CLT holds for weakly dependent signals too

The variance of the distribution, $E|Y(\omega_k)|^2$, is time varying

$$X(\omega_k) \sim \mathcal{N}(0, E[|X(\omega_k)|^2])$$

$$D(\omega_k) \sim \mathcal{N}(0, E[|D(\omega_k)|^2])$$

$$Y(\omega_k) \sim \mathcal{N}(0, E[|X(\omega_k)|^2] + E[|D(\omega_k)|^2])$$

$X_k \sim \mathrm{Rayleigh}(\sigma)$, with $\sigma = \sqrt{E[|X(\omega_k)|^2] / 2}$
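The Rayleigh claim can be checked with a small simulation. The sketch below assumes a circular complex Gaussian coefficient, meaning independent real and imaginary parts each with variance E[|X|²]/2; that reading of the Gaussian assumption, the function name, and the sample size are choices made here for illustration.

```python
import math
import random

def simulate_magnitudes(power, n_samples=100_000, seed=0):
    """Draw |X| for X = Re + j*Im with Re, Im ~ N(0, power / 2), independent."""
    rng = random.Random(seed)
    sigma = math.sqrt(power / 2.0)   # per-component standard deviation
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n_samples)]

power = 4.0                                   # assumed E[|X|^2]
sigma = math.sqrt(power / 2.0)                # Rayleigh parameter from the slide
mags = simulate_magnitudes(power)

sample_mean = sum(mags) / len(mags)
sample_power = sum(m * m for m in mags) / len(mags)
rayleigh_mean = sigma * math.sqrt(math.pi / 2.0)   # mean of Rayleigh(sigma)

print(sample_mean, rayleigh_mean)   # the two should be close
print(sample_power, power)          # E[|X|^2] = 2 * sigma^2
```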


Spectral gain of MMSE-STSA estimator

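The spectral gain can be represented with two variables.

The a-priori SNR: $\xi_k = \frac{E[|X(\omega_k)|^2]}{E[|D(\omega_k)|^2]}$

The a-posteriori SNR: $\gamma_k = \frac{|Y(\omega_k)|^2}{E[|D(\omega_k)|^2]}$

Using a temporary variable, $\nu_k = \frac{\xi_k}{1 + \xi_k} \gamma_k$,

$$\hat{X}_k = \frac{\sqrt{\pi}}{2} \frac{\sqrt{\nu_k}}{\gamma_k} \exp\left(-\frac{\nu_k}{2}\right) \left[ (1 + \nu_k) I_0\left(\frac{\nu_k}{2}\right) + \nu_k I_1\left(\frac{\nu_k}{2}\right) \right] Y_k = G(\xi_k, \gamma_k) Y_k$$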
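The gain G(ξ_k, γ_k) is straightforward to evaluate numerically. The following sketch uses only the Python standard library and computes the modified Bessel functions I0 and I1 from their power series; the helper names are hypothetical, and the series approach is only intended for moderate ν (for very large ν the exponential and Bessel terms would need a scaled formulation).

```python
import math

def bessel_i0_i1(x, max_terms=500):
    """Modified Bessel functions I0(x) and I1(x) via power series, x >= 0."""
    half = x / 2.0
    term0 = 1.0        # (x/2)^(2k) / (k!)^2 at k = 0
    term1 = half       # (x/2)^(2k+1) / (k! (k+1)!) at k = 0
    i0, i1 = term0, term1
    for k in range(1, max_terms):
        term0 *= (half / k) ** 2
        term1 *= half * half / (k * (k + 1))
        i0 += term0
        i1 += term1
        if term0 < 1e-17 * i0:   # remaining terms are negligible
            break
    return i0, i1

def mmse_stsa_gain(xi, gamma):
    """MMSE-STSA gain for a-priori SNR xi and a-posteriori SNR gamma (linear)."""
    nu = xi / (1.0 + xi) * gamma
    i0, i1 = bessel_i0_i1(nu / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(nu) / gamma) \
        * math.exp(-nu / 2.0) * ((1.0 + nu) * i0 + nu * i1)

def wiener_gain(xi):
    """Wiener gain xi / (1 + xi), for comparison."""
    return xi / (1.0 + xi)

print(mmse_stsa_gain(1.0, 1.0), wiener_gain(1.0))        # low SNR: the gains differ
print(mmse_stsa_gain(100.0, 101.0), wiener_gain(100.0))  # high SNR: the gains nearly agree
```

At high SNR the MMSE-STSA gain approaches the Wiener gain, while at low SNR it attenuates less.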


Gain as a function of a-priori SNR


Estimating the a-priori SNR

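$$\xi_k = \frac{E[|X(\omega_k)|^2]}{E[|D(\omega_k)|^2]}$$

Instantaneous SNR:

$$\hat{\xi}_k = \frac{|Y(\omega_k)|^2 - E[|D(\omega_k)|^2]}{E[|D(\omega_k)|^2]} = \frac{|Y(\omega_k)|^2}{E[|D(\omega_k)|^2]} - 1$$

Decision directed approach:

$$\hat{\xi}_k(m) = a \frac{\hat{X}_k^2(m-1)}{E[|D(\omega_k, m-1)|^2]} + (1 - a) \left( \frac{|Y(\omega_k)|^2}{E[|D(\omega_k)|^2]} - 1 \right)$$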
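The decision directed recursion can be sketched for a single frequency bin as follows. Several details are assumptions made here rather than taken from the slides: the noise power is treated as constant across frames, the previous clean estimate is formed with the Wiener gain ξ/(1 + ξ), the instantaneous term γ − 1 is clamped at zero, the estimate is floored at a small positive value, and a = 0.98 is used as the smoothing constant.

```python
def decision_directed_snr(noisy_power, noise_power, a=0.98, xi_floor=1e-3):
    """Track the a-priori SNR of one bin over frames.

    noisy_power: sequence of |Y(w_k)|^2, one value per frame m
    noise_power: E[|D(w_k)|^2], assumed constant over frames
    """
    xi_track = []
    prev_clean_power = None                              # X_hat_k^2(m - 1)
    for y_pow in noisy_power:
        inst = max(y_pow / noise_power - 1.0, 0.0)       # gamma - 1, clamped (assumption)
        if prev_clean_power is None:
            xi = inst                                    # first frame: no history yet
        else:
            xi = a * prev_clean_power / noise_power + (1.0 - a) * inst
        xi = max(xi, xi_floor)                           # keep the estimate positive
        gain = xi / (1.0 + xi)                           # Wiener gain (assumption)
        prev_clean_power = (gain ** 2) * y_pow           # feeds the next frame
        xi_track.append(xi)
    return xi_track

# Noise-only style input whose power fluctuates around the noise level.
noisy_power = [0.5, 3.0] * 20
noise_power = 1.0
instantaneous = [max(p / noise_power - 1.0, 0.0) for p in noisy_power]
smoothed = decision_directed_snr(noisy_power, noise_power, a=0.98)

print(max(instantaneous))   # jumps to 2.0 on every second frame
print(max(smoothed))        # stays far below the instantaneous peaks
```

With a = 0 the recursion reduces to the (clamped and floored) instantaneous estimate, so a controls how strongly frame-to-frame fluctuations are suppressed.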


Effect of smoothed SNR

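Instantaneous SNR:

$$\hat{\xi}_k = \gamma(\omega_k) - 1 = \frac{|Y(\omega_k)|^2}{E[|D(\omega_k)|^2]} - 1$$

Decision directed approach:

$$\hat{\xi}_k(m) = a \frac{\hat{X}_k^2(m-1)}{E[|D(\omega_k, m-1)|^2]} + (1 - a) \left( \frac{|Y(\omega_k)|^2}{E[|D(\omega_k)|^2]} - 1 \right)$$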

[Figure: SNR (dB), from −50 to 30 dB, plotted against time, from 0 to 180. Curves: ξ_k and γ_k − 1.]


Examples with decision directed a-priori SNR estimation

Noisy speech

MMSE-STSA (Instantaneous SNR)

MMSE-STSA (Decision Directed SNR)


Examples with decision directed a-priori SNR estimation

Noisy speech

Wiener Filter (Instantaneous SNR)

Wiener Filter (Decision Directed SNR)


Summary

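Model noise when speech is absent: $|\hat{D}(\omega)| = E[|D(\omega)|]$

Separate speech by applying gain on the noisy spectrum.

1 Spectral subtraction: $|\hat{X}(\omega)| = |Y(\omega)| - |\hat{D}(\omega)|$

2 Wiener filter: $\hat{X}(\omega) = \frac{E[|X(\omega)|^2]}{E[|X(\omega)|^2] + E[|D(\omega)|^2]} Y(\omega)$

3 MMSE-STSA: $\hat{X}(\omega) = G(\xi(\omega), \gamma(\omega)) Y(\omega)$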


References I

Steven Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing 27 (1979), no. 2, 113–120.

M. Berouti, M. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP '79), 1979, pp. 208–211.

Y. Ephraim and D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech and Signal Processing (1984), 1109–1121.

Jae S. Lim and Alan V. Oppenheim, Enhancement and bandwidth compression of noisy speech, Proceedings of the IEEE 67 (1979), no. 12, 1586–1604.
