Perceptual WPT and time-adaptive level thresholding based enhancement of degraded speech Presented by Nitesh Kumar Chaudhary Department of Electronics & Communication Engineering The LNM Institute Of Information Technology, Jaipur Under the Supervision of Dr. Navneet Upadhyay

Upload: niteshlnmiit

Post on 09-Nov-2015


  • Perceptual WPT and time-adaptive level thresholding based enhancement of

    degraded speech

    Presented by

    Nitesh Kumar Chaudhary

    Department of Electronics & Communication Engineering

    The LNM Institute Of Information Technology, Jaipur

    Under the Supervision of

    Dr. Navneet Upadhyay

  • Why speech enhancement?

    The presence of noise in speech can significantly reduce the intelligibility of speech and degrade automatic speech recognition performance.

    Noise reduction has become an important issue in speech signal processing systems, such as speech coding and speech recognition systems.

    Speech can be degraded by:

    (a) Additive acoustic noise - such as the noise added to the speech signal when it is recorded in an environment with noticeable background noise, like an aircraft cockpit.

    (b) Acoustic reverberation - results from the additive effect of multiple reflections of an acoustic signal.

    (c) Convolutive channel effects - an uneven or band-limited response, which can result when the communication channel is not modeled effectively enough for the channel equalizer to remove the channel impulse response.

  • (d) Electrical interference

    (e) Codec distortion - distortion caused by the coding algorithm due to compression

    (f) Distortion introduced by recording apparatus - poor response of microphone

    Keywords: perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection Pd and of false alarm Pf, masking.

  • Block Diagram

    [Figure: block diagram of the proposed system. Blocks: Perceptual WPT, Teager Energy Operator, Critical Band Selection, Level-dependent Thresholding, Inverse PWPT, VAS & Time-adaptive Thresholding. Input: noisy signal X(n). Output: recovered clean signal Y(n). Intermediate signals: W_{j,m}(k), t_{j,m}(k), M_{j,m}(k), L_{j,m}(k) and W_m(n), with m = 1...17.]

  • Perceptual Wavelet Packet Transform:

    The Wavelet Packet Transform (WPT) is a time-frequency analysis tool: it brings the signal into a domain that contains both time and frequency information.

    In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated.

    In the corresponding perceptual wavelet packet decomposition, each detail coefficient vector is also decomposed into two parts, using the same approach as in approximation vector splitting. 17 critical bands are selected because, for speech sampled at 8 kHz, 17 critical bands are required to cover the entire frequency range.
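
    The splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Haar filter pair for brevity (the slides evaluate Daubechies filters), it builds a full tree rather than the pruned 17-band tree, and the names haar_split and wavelet_packet are hypothetical. Nodes are indexed in natural (filter-bank) order, not frequency order.

```python
import math

def haar_split(x):
    """Split an even-length sequence into Haar approximation and detail halves."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wavelet_packet(x, levels):
    """Return {(level, index): coefficients} for a full wavelet packet tree.

    Unlike the plain wavelet transform, both the approximation and the
    detail of every node are split again at the next level.
    """
    tree = {(0, 0): list(x)}
    for j in range(levels):
        for m in range(2 ** j):
            approx, detail = haar_split(tree[(j, m)])
            tree[(j + 1, 2 * m)] = approx
            tree[(j + 1, 2 * m + 1)] = detail
    return tree

# Example: a 16-sample signal decomposed to level 3 gives 8 nodes of 2 samples.
signal = [float(i % 5) for i in range(16)]
tree = wavelet_packet(signal, 3)
```

    Because the Haar pair is orthonormal, the total energy of the coefficients at any level equals the energy of the input signal, which is a convenient sanity check for such a sketch.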

  • [Figure: Noisy Signal Wavelet Packet Decomposition. Wavelet decomposition tree by decomposition level, with nodes

    (0,0)
    (1,0) (1,1)
    (2,0) (2,1) (2,2) (2,3)
    (3,0) (3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (3,7)
    (4,0) (4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (4,7) (4,8) (4,9)
    (5,0) (5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (5,7)

    shown with a plot of signal magnitude against sample point (0 to about 2 x 10^4 samples).]
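
    The tree above ends in 17 terminal nodes: (5,0) to (5,7), (4,4) to (4,9) and (3,5) to (3,7). A small worked example shows how these could tile the 0 to 4 kHz range of 8 kHz speech. It assumes frequency-ordered node indices, so that node (j, m) covers the band from m * 4000 / 2^j to (m + 1) * 4000 / 2^j Hz; the slides do not state the ordering, and the names below are hypothetical.

```python
NYQUIST_HZ = 4000.0  # half of the 8 kHz sampling rate

# Terminal nodes (level, index) read off the decomposition tree.
TERMINAL_NODES = (
    [(5, m) for m in range(0, 8)]
    + [(4, m) for m in range(4, 10)]
    + [(3, m) for m in range(5, 8)]
)

def node_band(level, index):
    """Return (low_hz, high_hz) of a node, assuming frequency-ordered indices."""
    width = NYQUIST_HZ / 2 ** level
    return index * width, (index + 1) * width

# Bands sorted by lower edge: 125 Hz wide at level 5,
# 250 Hz wide at level 4 and 500 Hz wide at level 3.
bands = sorted(node_band(j, m) for j, m in TERMINAL_NODES)
```

    Under this assumption the bands are contiguous and get wider with frequency, which is the qualitative behaviour expected of critical bands.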

  • TEO & level-dependent thresholding

    The Teager energy operator (TEO) is a powerful non-linear operator that has been used successfully in various speech applications. It can be used to estimate the second-moment angular bandwidth of a signal, as well as the moments of the signal's duration and of its spectrum.

    TEO can determine the energy functions of quite complicated functions. For a given band-limited signal x(n), the TEO introduced by Kaiser is given by

    $\Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1)$

    A time-adaptive threshold for the wavelet coefficients is then computed, which takes the time-varying noise into account.

    [Threshold equation garbled in the source.]
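
    As a minimal sketch, Kaiser's discrete operator, psi[x(n)] = x(n)^2 - x(n+1) * x(n-1), can be computed directly. The function name teager_energy is hypothetical, and the two end samples are simply dropped because they lack a neighbour; other boundary treatments are possible.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[x(n)] = x(n)^2 - x(n+1) * x(n-1).

    The first and last samples have no two-sided neighbourhood,
    so the output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(omega*n + phi) the operator returns the constant
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
psi = teager_energy(tone)
```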


  • Masking Construction:

    For a selected band, the mask is obtained by

    $M_{j,m}(k) = t_{j,m}(k) * H_j(k)$

    where $*$ denotes the convolution operation and $H_j(k)$ is a $256/2^j$-point level-dependent Hamming window.

    The voice activity shape V(n) is calculated by

    $V(n) = \sum_{m=1}^{17} W_m(n)$

    where $W_m(n)$ is the inverse perceptual wavelet packet transform of the mask $M_{j,m}(k)$.
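
    The mask step can be sketched as a smoothing of the Teager energy of one subband with a Hamming window. The sketch reads the window length as 256 / 2^j points for a subband at level j, and it assumes a centred, same-length convolution with no normalisation; neither detail is given in the slides. The names hamming, convolve_same and build_mask are hypothetical.

```python
import math

def hamming(n_points):
    """Symmetric Hamming window of n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def convolve_same(x, h):
    """Linear convolution of x with h, trimmed to the length of x."""
    full = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            full[i + k] += xi * hk
    start = (len(h) - 1) // 2
    return full[start:start + len(x)]

def build_mask(teo, level):
    """Smooth the Teager energy of a level-j subband with a 256/2^j-point window."""
    window = hamming(256 // 2 ** level)
    return convolve_same(teo, window)

# A single energy spike in a level-5 subband is spread over 8 samples.
spike = [0.0] * 20 + [1.0] + [0.0] * 20
mask = build_mask(spike, 5)
```

    With this reading the window shrinks from 32 points at level 3 to 8 points at level 5, which matches the halving of the subband sample rate at each level.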

  • Time-adaptive threshold calculation: To determine the time-adaptive threshold value AWT, an iterative algorithm has been proposed.

    [Iterative update equation for AWT(i) garbled in the source.]

    where AWT(i) is the time-adaptive threshold value of frame i, and Frame(i) is defined as

    $\mathrm{Frame}(i) = [\,V((i-1)\cdot 160 + 1),\ \dots,\ V(i\cdot 160)\,]$

    Noise is defined as

    $\mathrm{Noise}(n) = p \cdot \{E[V^{(2)}(n)] + \mathrm{Mean}(\mathrm{Frame}(i))\}/2$

    where $E[V^{(k)}(n)]$ is the mean of $V^{(k)}(n)$.

    The voice-active regions are characterized by V(n) > AWT.
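
    The final decision rule, V(n) > AWT, can be illustrated with a short sketch. It assumes non-overlapping frames of 160 samples (20 ms at 8 kHz) and takes the per-frame thresholds as an input, since the iterative update that produces AWT(i) is not reproduced here. The names split_frames and voice_active are hypothetical.

```python
def split_frames(v, frame_len=160):
    """Split V(n) into consecutive non-overlapping frames.

    A trailing partial frame, if any, is dropped.
    """
    return [v[i:i + frame_len]
            for i in range(0, len(v) - frame_len + 1, frame_len)]

def voice_active(v, awt, frame_len=160):
    """Flag each sample as voice-active when V(n) > AWT(i) for its frame i."""
    flags = []
    for i, frame in enumerate(split_frames(v, frame_len)):
        flags.extend(sample > awt[i] for sample in frame)
    return flags

# Two frames: a quiet one followed by an active one, same threshold for both.
shape = [0.1] * 160 + [0.9] * 160
flags = voice_active(shape, awt=[0.5, 0.5])
```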

  • Level 3

    [Figure: Level 3, node-by-node denoising. Panels for nodes (3,5), (3,6) and (3,7) plot signal amplitude against frequency in Hz (axis 0 to 3500). One set of panels shows the noisy signal at level 3 of the wavelet tree, the other the denoised signal at level 3 of the wavelet tree.]

  • Level 4

    [Figure: Level 4, node-by-node denoising. Panels for nodes (4,4) to (4,9) plot amplitude against frequency in Hz (axis 0 to 1600). One set of panels shows the noisy signal at level 4 of the wavelet tree, the other the denoised signal at level 4 of the wavelet tree.]

  • Level 5

    [Figure: Level 5, node-by-node denoising. Panels for nodes (5,0) to (5,7) plot amplitude against frequency in Hz (axis 0 to 800). One set of panels shows the noisy signal at level 5 of the wavelet tree, the other the denoised signal at level 5 of the wavelet tree.]

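  • Evaluation

    To verify the effectiveness of the proposed algorithms, we compared their speech detection and false-alarm probabilities.

    The proposed methods are all evaluated with receiver operating characteristic (ROC) curves, which show how well the VAD discriminates between noise-only and noisy speech frames in terms of the probability of correct detection (Pd) and the probability of false alarm (Pf), such that

    $P_d = \dfrac{\text{number of speech frames detected as speech}}{\text{total number of speech frames}}$

    $P_f = \dfrac{\text{number of noise-only frames detected as speech}}{\text{total number of noise-only frames}}$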
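
    A short sketch shows how the two probabilities could be computed from frame labels. It assumes the usual definitions: Pd is the fraction of speech frames that the detector marks as speech, and Pf is the fraction of noise-only frames that it marks as speech. The function name detection_rates and the example labels are hypothetical.

```python
def detection_rates(truth, decision):
    """Return (Pd, Pf) from per-frame labels.

    truth[i] is 1 for a speech frame and 0 for a noise-only frame;
    decision[i] is 1 when the detector declares speech.
    """
    speech = sum(1 for t in truth if t)
    noise = len(truth) - speech
    hits = sum(1 for t, d in zip(truth, decision) if t and d)
    false_alarms = sum(1 for t, d in zip(truth, decision) if not t and d)
    return hits / speech, false_alarms / noise

# Four speech frames (three detected) and six noise frames (one false alarm).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
decision = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
pd, pf = detection_rates(truth, decision)
```

    Sweeping the detector's threshold and recording one (Pf, Pd) pair per setting traces out an ROC curve of the kind used in the evaluation.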

  • [Figure: Performance Evaluation. ROC plot of the probability of detection (Pd) against the probability of false alarm (Pf) on logarithmic axes, with "shape-preserving" and "linear" curves in the legend and an annotation of 20.6710 dB.]

  • Table 1

    Wavelet filter type (filter length)   Pd (%)   Pf (%)   Computation time (CP time)
    Daubechies 2                          86.4     15.6     2.872 s
    Daubechies 4                          89.3     11.7     2.884 s
    Daubechies 8                          91.8      9.2     3.023 s
    Daubechies 10                         94.3      5.7     3.074 s
    Daubechies 12                         94.5      5.5     3.898 s
    Daubechies 14                         94.8      5.2     3.899 s

    Pd: probability of correct detection. Pf: probability of false alarm.

    The cost-performance (CP) is defined as

    [Equation garbled in the source.]

    where the CP time is the average PWPT processing time of the specific wavelet. Considering the cost-performance rate given in Table 1, the Daubechies wavelet filter with length 12, which has the best CP ratio, is recommended for the proposed algorithm.

