
Perceptual WPT and time-adaptive level thresholding based enhancement of

degraded speech

Presented by

Nitesh Kumar Chaudhary

Department of Electronics & Communication Engineering

The LNM Institute Of Information Technology, Jaipur

Under the Supervision of

Dr. Navneet Upadhyay

Why speech enhancement?

The presence of noise in speech can significantly reduce the intelligibility of speech and degrade automatic speech recognition performance.

Noise reduction has therefore become an important issue in speech signal processing systems, such as speech coding and speech recognition systems.

Speech can be degraded by:

(a) Additive acoustic noise - noise added to the speech signal when it is recorded in an environment with noticeable background noise, such as an aircraft cockpit.

(b) Acoustic reverberation - results from the additive effect of multiple reflections of an acoustic signal.

(c) Convolutive channel effects - an uneven or band-limited response, which can result when the communication channel is not modeled well enough for the channel equalizer to remove the channel impulse response.

(d) Electrical interference

(e) Codec distortion - distortion caused by the compression performed by the coding algorithm

(f) Distortion introduced by the recording apparatus - for example, poor microphone response

Keywords: perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection (Pd) and false alarm (Pf), masking.

Block Diagram

[Figure: block diagram of the method. Input: noisy signal X(n). Blocks: Perceptual WPT, Teager Energy Operator, critical band selection, level-dependent thresholding, inverse PWPT, VAS and time-adaptive thresholding. Output: recovered clean signal Y(n). Intermediate signals are labelled W_{j,m}(k), t_{j,m}(k), M_{j,m}(k), L_{j,m}(k) and W_m(n), each for m = 1...17.]

Perceptual Wavelet Packet Transform :

The Wavelet Packet Transform (WPT) is a time-frequency analysis tool: it maps the signal into a domain that carries both time and frequency information.

In wavelet analysis, a signal is split into an approximation and a detail. The

approximation is then itself split into a second-level approximation and detail,

and the process is repeated.
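The repeated approximation/detail split can be sketched in a few lines. This is only an illustration: it assumes the Haar filter pair for brevity (the slides evaluate Daubechies filters), and the names `haar_split` and `dwt` are hypothetical.

```python
import math

def haar_split(x):
    """Split an even-length sequence into (approximation, detail) with Haar filters."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def dwt(x, levels):
    """Plain wavelet analysis: only the approximation is split again at each level."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_split(approx)
        details.append(d)
    return approx, details

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = dwt(signal, levels=2)
print(len(approx), [len(d) for d in details])
```

Because the Haar pair is orthonormal, the energy of the input equals the combined energy of the final approximation and all details, which is a convenient sanity check on any such implementation.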

In the perceptual wavelet packet case, each detail coefficient vector is also decomposed into two parts, using the same approach as for the approximation vector. 17 critical bands are selected because, for speech sampled at 8 kHz, 17 critical bands are required to cover the entire frequency range (0 to 4 kHz).
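One way to see where the figure of 17 comes from is to list the terminal nodes of the tree with their frequency ranges. The sketch below assumes the terminal nodes are (5,0)-(5,7), (4,4)-(4,9) and (3,5)-(3,7), as suggested by the per-node plots in these slides, and that node indices are in frequency order. `band_edges` is a hypothetical helper.

```python
FS = 8000  # sampling rate in Hz, as stated on the slide

# Terminal nodes (level, index), assumed from the node-by-node plots:
# eight at level 5, six at level 4, three at level 3.
nodes = (
    [(5, k) for k in range(0, 8)]
    + [(4, k) for k in range(4, 10)]
    + [(3, k) for k in range(5, 8)]
)

def band_edges(level, index, fs=FS):
    """Frequency range of a node, assuming frequency-ordered node indices."""
    width = (fs / 2) / (2 ** level)
    return index * width, (index + 1) * width

bands = sorted(band_edges(j, k) for j, k in nodes)
for lo, hi in bands:
    print(f"{lo:7.1f} - {hi:7.1f} Hz")
```

Under these assumptions the bands are 125 Hz wide up to 1 kHz, 250 Hz wide up to 2.5 kHz and 500 Hz wide up to 4 kHz, so narrow bands sit at low frequencies and wide bands at high frequencies.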

[Figure: Noisy Signal Wavelet Packet Decomposition. One panel shows the wavelet decomposition tree (decomposition levels 0 to 5) with nodes (0,0); (1,0)-(1,1); (2,0)-(2,3); (3,0)-(3,7); (4,0)-(4,9); (5,0)-(5,7). The other panel plots signal magnitude against sample point.]

TEO & level dependent thresholding

The Teager energy operator (TEO) is a powerful non-linear operator that has been used successfully in various speech applications. It can be used to estimate the second-moment angular bandwidth of a signal, as well as the moments of the signal's duration and of its spectrum.

The TEO can determine the energy of quite complicated functions. For a given band-limited signal s(n), the TEO introduced by Kaiser is given by

$$\Psi[s(n)] = s^2(n) - s(n+1)\,s(n-1)$$
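A minimal sketch of the operator, assuming the standard discrete form Ψ[s(n)] = s²(n) − s(n+1)s(n−1) and evaluating it only at interior samples. The function name `teager_energy` is hypothetical.

```python
import math

def teager_energy(x):
    """Discrete Teager energy: x[n]^2 - x[n+1]*x[n-1] for interior samples."""
    return [x[n] ** 2 - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(omega*n) the operator returns the constant (A*sin(omega))^2,
# so it tracks both amplitude and frequency.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
energy = teager_energy(tone)
expected = (A * math.sin(omega)) ** 2
print(max(abs(e - expected) for e in energy))
```

The pure-tone identity follows from cos²θ − cos(θ+ω)cos(θ−ω) = sin²ω, which is why the printed deviation is at the level of floating-point rounding.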

A time-adaptive threshold is computed for the wavelet coefficients so that time-varying noise is taken into account.

[Equation: time-adaptive threshold for the wavelet coefficients; not legible in the source.]


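Masking Construction:

For a selected band, the mask is obtained by

$$M_{j,m}(k) = t_{j,m}(k) * H_j(k)$$

where $t_{j,m}(k)$ is the Teager energy of the wavelet coefficients in band m at level j, * denotes the convolution operation, and $H_j(k)$ is a $256/2^{j}$-point level-dependent Hamming window.

The voice activity shape V(n) is calculated by

$$V(n) = \sum_{m=1}^{17} W_m(n)$$

where $W_m(n)$ is the inverse perceptual wavelet packet transform of $M_{j,m}(k)$.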
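The mask step amounts to smoothing the Teager energy of a band with a Hamming window. The sketch below is an illustration under stated assumptions: a window length of 256/2^j for level j, and a centred "same-length" trim of the convolution. The helper names `hamming`, `convolve_same` and `mask` are hypothetical.

```python
import math

def hamming(n_points):
    """Symmetric Hamming window of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]

def convolve_same(x, h):
    """Linear convolution of x with h, trimmed to len(x) (centred)."""
    full = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            full[i + k] += xi * hk
    start = (len(h) - 1) // 2
    return full[start:start + len(x)]

def mask(teo_coeffs, level, base_length=256):
    """Smooth TEO coefficients with a window whose length shrinks with the level."""
    window = hamming(max(1, base_length // 2 ** level))
    return convolve_same(teo_coeffs, window)

# A single spike is spread into the shape of the window.
t = [0.0] * 40
t[20] = 1.0
m = mask(t, level=5)  # 256 / 2**5 = 8-point window
print(len(m), max(m))
```

Shortening the window at deeper levels keeps the smoothing span roughly constant in time, since each level halves the number of coefficients per band.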

Time-adaptive threshold calculation: to determine the time-adaptive threshold value AWT, an iterative algorithm has been proposed.

[Equation: iterative update rule for AWT(i); not legible in the source.]

where AWT(i) is the time-adaptive threshold value of frame i, and Frame(i) is defined as

Frame(i) = [V((i-1)*160 + 1), V((i-1)*160 + 2), ..., V(i*160)]

Noise is defined as

$$\mathrm{Noise}(n) = p \cdot \frac{E[V^{(2)}(n)] + \mathrm{Mean}(\mathrm{Frame}(i))}{2}$$

where $E[V^{(k)}(n)]$ is the mean of $V^{(k)}(n)$.

The voice-active regions are characterized by V(n) > AWT
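The decision rule V(n) > AWT can be sketched frame by frame. This is a minimal illustration that assumes non-overlapping frames of 160 samples (20 ms at 8 kHz) and takes the per-frame thresholds as a ready-made list. It does not implement the iterative threshold update, and the values used are made up.

```python
FRAME_LEN = 160  # samples per frame (20 ms at 8 kHz)

def frames(v, frame_len=FRAME_LEN):
    """Split V(n) into consecutive non-overlapping frames, dropping any short tail."""
    return [v[i:i + frame_len] for i in range(0, len(v) - frame_len + 1, frame_len)]

def voice_active(v, awt, frame_len=FRAME_LEN):
    """Per-sample decision V(n) > AWT(i), where i is the frame containing sample n."""
    decisions = []
    for i, frame in enumerate(frames(v, frame_len)):
        decisions.extend(sample > awt[i] for sample in frame)
    return decisions

# Toy input: one quiet frame followed by one loud frame, with made-up thresholds.
v = [0.01] * 160 + [0.8] * 160
awt = [0.1, 0.1]
active = voice_active(v, awt)
print(sum(active[:160]), sum(active[160:]))
```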

Level 3

[Figure: Level 3, node-by-node denoising. Two columns of panels: the noisy signal at level 3 of the wavelet tree and the denoised signal at level 3 of the wavelet tree, for nodes (3,5), (3,6) and (3,7). Axes: signal amplitude (-1 to 1) against frequency in Hz (0 to 3500).]

Level 4

[Figure: Level 4, node-by-node denoising. Two columns of panels: the noisy signal at level 4 of the wavelet tree and the denoised signal at level 4 of the wavelet tree, for nodes (4,4) to (4,9). Axes: amplitude (-1 to 1) against frequency in Hz (0 to 1600).]


Level 5


[Figure: Level 5, node-by-node denoising. Two columns of panels: the noisy signal at level 5 of the wavelet tree and the denoised signal at level 5 of the wavelet tree, for nodes (5,0) to (5,7). Axes: amplitude (-1 to 1) against frequency in Hz (0 to 800).]

Evaluation

To verify the effectiveness of the proposed algorithms, we compared the speech-detection and false-alarm probabilities.

The proposed methods are all evaluated using receiver operating characteristic (ROC) curves, which show how well the voice activity detector (VAD) discriminates between noise-only and noisy-speech frames in terms of the probability of correct detection (Pd) and the probability of false alarm (Pf).

[Equations: definitions of Pd and Pf; not legible in the source.]
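As an illustration, Pd and Pf can be computed from per-frame labels. The sketch assumes the usual detection-theory definitions, which may differ from the ones used in these slides: Pd is the share of true speech frames flagged as speech, and Pf is the share of noise-only frames flagged as speech. The function name and the toy labels are hypothetical.

```python
def detection_rates(reference, decision):
    """Return (Pd, Pf) in percent from per-frame boolean labels.

    reference[i] is True when frame i really contains speech;
    decision[i] is True when the detector flags frame i as speech.
    """
    speech = [d for r, d in zip(reference, decision) if r]
    noise = [d for r, d in zip(reference, decision) if not r]
    p_d = 100.0 * sum(speech) / len(speech)
    p_f = 100.0 * sum(noise) / len(noise)
    return p_d, p_f

reference = [True, True, True, True, False, False, False, False, False, False]
decision = [True, True, True, False, True, False, False, False, False, False]
p_d, p_f = detection_rates(reference, decision)
print(p_d, p_f)
```

Sweeping the detector's threshold and plotting the resulting (Pf, Pd) pairs gives one point per threshold on an ROC curve.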

[Figure: ROC curve. Axes: Pd, probability of detection, against Pf, probability of false alarm, with logarithmic tick labels (10^-0.01 to 10^0.04 and 10^-0.01 to 10^0.01).]

Performance Evaluation

[Figure: plot with legend entries "shape-preserving" and "linear", annotated with 20.6710 dB.]

Table 1

Wavelet filter type    Probability of correct    Probability of false    Computation time
(filter length)        detection (Pd, %)         alarm (Pf, %)           (CP time, s)

Daubechies 2           86.4                      15.6                    2.872
Daubechies 4           89.3                      11.7                    2.884
Daubechies 8           91.8                      9.2                     3.023
Daubechies 10          94.3                      5.7                     3.074
Daubechies 12          94.5                      5.5                     3.898
Daubechies 14          94.8                      5.2                     3.899

The cost-performance (CP) is defined as

CP = ( )

where the CP time is the average PWPT processing time for a specific wavelet. Considering the cost-performance ratio given in Table 1, the Daubechies wavelet filter of length 12, which has the best CP ratio, is recommended for the proposed algorithm.

References:

Shi-Huang Chen, Hsin-Te Wu, Yukon Chang and T. K. Truong, Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator, Pattern Recognition Letters 28 (2007) 1327-1332.

I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF conference series in applied mathematics, SIAM, 1992.

D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation via wavelet shrinkage, Biometrika, vol. 81, pp. 425-455, 1994.

S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.

M. Berouti, R. Schwartz and J. Makhoul, Enhancement of speech corrupted by acoustic noise, in Proc. IEEE ICASSP, Apr. 1979, pp. 208-211.

I. M. Johnstone and B. W. Silverman, Wavelet threshold estimators for data with correlated noise, J. Roy. Stat. Soc. B, vol. 59, pp. 319-351, 1997.

G. David Forney, Jr., Exponential error bounds for erasure, list, and decision feedback schemes, IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206-220, Mar. 1968.
