perceptual wpt and time-adaptive level thresholding based enhancement of degraded speech

Perceptual WPT and time-adaptive level thresholding based enhancement of degraded speech
Presented by Nitesh Kumar Chaudhary
Department of Electronics & Communication Engineering
The LNM Institute of Information Technology, Jaipur
Under the supervision of Dr. Navneet Upadhyay

Upload: niteshlnmiit

Post on 17-Dec-2015


DESCRIPTION

The basic idea introduced in this article is to detect the voiced part of a noisy signal and to enhance speech quality using the perceptual wavelet packet transform and time-adaptive level-dependent thresholding. The main advantage of this method is that it does not require any a priori SNR estimate or a static soft or hard threshold value, which makes it more efficient for voice detection. Instead of the conventional wavelet packet transform, it decomposes the input signal into adaptive critical sub-bands, which improves the performance of wavelet-based speech processing. In the speech enhancement stage, the Teager energy operator (TEO) is applied to a given sub-band of the discrete speech signal to enhance the discriminability of the speech components.

TRANSCRIPT


  • Why speech enhancement? The presence of noise in speech can significantly reduce its intelligibility and degrade the performance of automatic speech recognition. Noise reduction has therefore become an important issue in speech signal processing systems such as speech coding and speech recognition. Typical sources of degradation are:

    (a) Additive acoustic noise: noise added to the speech signal when it is recorded in an environment with noticeable background noise, such as an aircraft cockpit.
    (b) Acoustic reverberation: the additive effect of multiple reflections of the acoustic signal.
    (c) Convolutive channel effects: an uneven or band-limited response, which arises when the communication channel is not modeled accurately enough for the channel equalizer to remove the channel impulse response.
    (d) Electrical interference.
    (e) Codec distortion: distortion caused by the compression in the coding algorithm.
    (f) Distortion introduced by the recording apparatus, such as the poor response of a microphone.

    Keywords: perceptual wavelet packet transform (PWPT), time-adaptive thresholding, Teager energy operator (TEO), probability of detection (Pd) and of false alarm (Pf), masking.

  • Block Diagram

  • Perceptual Wavelet Packet Transform: The wavelet packet transform (WPT) is a time-frequency analysis tool. It brings the signal into a domain that contains both time and frequency information.

    In wavelet analysis, a signal is split into an approximation and a detail. The approximation is then itself split into a second-level approximation and detail, and the process is repeated.
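The recursive approximation/detail split can be sketched in a few lines. This is a minimal illustration assuming the orthonormal Haar filter pair (the shortest Daubechies filter) and an input whose length is divisible by 2^levels; the names `haar_split` and `dwt` are hypothetical, and the longer Daubechies filters compared in the evaluation would replace the two-tap sums and differences.

```python
import math

# Orthonormal Haar pair (the shortest Daubechies filter), assumed here only for brevity.
SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_split(x):
    """One analysis step: split x (even length) into approximation and detail."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(half)]
    return approx, detail


def dwt(x, levels):
    """Split only the approximation at each level; return [d1, ..., dL, aL]."""
    if len(x) % 2 ** levels:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_split(approx)
        details.append(detail)
    return details + [approx]


if __name__ == "__main__":
    coeffs = dwt([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0], 3)
    print([len(c) for c in coeffs])  # [4, 2, 1, 1]
```

Because the Haar pair is orthonormal, the total energy of the coefficients equals the energy of the input, which is a convenient sanity check.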

    In the wavelet packet case, each detail coefficient vector is also decomposed into two parts, using the same approach as for the approximation vector. In the perceptual WPT, 17 critical bands are selected, because 17 critical bands are required to cover the entire frequency range (0 to 4 kHz) of speech sampled at 8 kHz.
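A sketch of the packet split and of a 17-leaf perceptual tree, again assuming Haar filters. The particular tree is an assumption for illustration (8 bands of 125 Hz up to 1 kHz, 6 bands of 250 Hz up to 2.5 kHz, 3 bands of 500 Hz up to 4 kHz), since the slides do not list the exact band edges. Because each high-pass branch inverts the spectrum after downsampling, the frequency-ordered index of a node is mapped to its filter path through a Gray code. All names (`wpt_node`, `path_for`, `PERCEPTUAL_TREE`, `perceptual_wpt`) are hypothetical.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)
FS = 8000  # sampling rate stated in the slides (Hz)

# Hypothetical 17-leaf perceptual tree as (level, frequency-ordered index) pairs.
# The layout is an assumption chosen to tile 0-4 kHz; it is not taken from the slides.
PERCEPTUAL_TREE = (
    [(5, f) for f in range(0, 8)]     # 0-1000 Hz in 125 Hz bands
    + [(4, f) for f in range(4, 10)]  # 1000-2500 Hz in 250 Hz bands
    + [(3, f) for f in range(5, 8)]   # 2500-4000 Hz in 500 Hz bands
)


def haar_split(x):
    """Split x (even length) into low-pass and high-pass halves (Haar filters)."""
    low = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return low, high


def wpt_node(x, path):
    """Follow a filter path such as 'LHL'; unlike the DWT, 'H' outputs are split too."""
    coeffs = list(x)
    for step in path:
        low, high = haar_split(coeffs)
        coeffs = low if step == "L" else high
    return coeffs


def band_edges(level, f, fs=FS):
    """Nominal frequency range (Hz) of the f-th band, in frequency order, at a level."""
    width = fs / 2 / 2 ** level
    return f * width, (f + 1) * width


def path_for(level, f):
    """Map a frequency-ordered index to a filter path.

    Each high-pass branch inverts the spectrum after downsampling, so the
    natural (filter-path) index is the Gray code of the frequency index.
    """
    gray = f ^ (f >> 1)
    return "".join("H" if (gray >> (level - 1 - k)) & 1 else "L" for k in range(level))


def perceptual_wpt(x):
    """Return the 17 sub-band coefficient lists; len(x) must be a multiple of 32."""
    if len(x) % 2 ** 5:
        raise ValueError("signal length must be a multiple of 32")
    return [wpt_node(x, path_for(level, f)) for level, f in PERCEPTUAL_TREE]
```

The leaves form a complete, non-overlapping tiling of the tree, so the transform stays critically sampled: a 64-sample input yields exactly 64 coefficients spread over the 17 bands.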

  • TEO & level-dependent thresholding: The Teager energy operator (TEO) is a powerful nonlinear operator that has been used successfully in various speech applications. It can be used to estimate the second-moment angular bandwidth of a signal as well as the moments of the signal's duration and of its spectrum, and it can determine the energy of quite complicated signals. For a given band-limited discrete-time signal x(n), the TEO introduced by Kaiser is given by

    $$\Psi[x(n)] = x^2(n) - x(n+1)\,x(n-1)$$
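Kaiser's discrete TEO, Ψ[x(n)] = x²(n) − x(n+1)x(n−1), needs only three neighbouring samples per output. The sketch below is a minimal version; how the two end samples are handled is not specified in the slides, so copying the nearest interior value is an assumption, as is the name `teager_energy`. For a pure tone x(n) = A cos(Ωn + φ) the operator returns the constant A² sin²(Ω), which gives a quick check.

```python
import math


def teager_energy(x):
    """Kaiser's discrete TEO: psi[n] = x[n]**2 - x[n+1] * x[n-1].

    The first and last samples lack a neighbour, so (as an assumption of this
    sketch) they reuse the nearest interior value.
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    psi = [x[n] * x[n] - x[n + 1] * x[n - 1] for n in range(1, len(x) - 1)]
    return [psi[0]] + psi + [psi[-1]]


if __name__ == "__main__":
    # For A*cos(omega*n + phi) every output equals A**2 * sin(omega)**2.
    tone = [2.0 * math.cos(0.5 * n + 0.3) for n in range(32)]
    print(teager_energy(tone)[:3], 4.0 * math.sin(0.5) ** 2)
```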

    A time-adaptive threshold is computed for the wavelet coefficients, so that the time variation of the noise is taken into account.
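The slides do not show how the base (time-invariant) threshold of each band is set. A common choice, assumed here, is the universal threshold of Donoho and Johnstone (1994), λ = σ√(2 ln N) for a band of N coefficients, with the noise level σ estimated as the median absolute coefficient divided by 0.6745. The function names are hypothetical; a time-adaptive scheme would then vary this value over time.

```python
import math
import statistics


def noise_sigma(detail):
    """Robust noise level (assumed estimator): median |coefficient| / 0.6745."""
    return statistics.median(abs(d) for d in detail) / 0.6745


def universal_threshold(detail):
    """Universal threshold sigma * sqrt(2 ln N) for a band of N coefficients."""
    return noise_sigma(detail) * math.sqrt(2.0 * math.log(len(detail)))
```

The median makes the noise estimate insensitive to the few large coefficients that carry speech, which is why it is usually preferred over the standard deviation here.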

  • For a selected band, the mask is obtained by
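The mask equation itself did not survive in this transcript. As an illustration only, the sketch below assumes a common construction: the TEO of the band's coefficients, clipped at zero, smoothed with a moving average and normalized so that the mask lies in [0, 1]. The window length `win` and the function name `teo_mask` are hypothetical.

```python
def teo_mask(coeffs, win=5):
    """Hypothetical sub-band mask: clipped TEO, moving-average smoothed, scaled to [0, 1].

    win is an illustrative smoothing length, not a value from the slides.
    """
    n = len(coeffs)
    if n < 3:
        raise ValueError("need at least 3 coefficients")
    # Discrete TEO of the coefficients; negative values are clipped to zero.
    teo = [0.0] * n
    for i in range(1, n - 1):
        teo[i] = max(coeffs[i] ** 2 - coeffs[i + 1] * coeffs[i - 1], 0.0)
    teo[0], teo[-1] = teo[1], teo[-2]
    # Centered moving average (the window shrinks at the edges).
    half = win // 2
    smooth = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smooth.append(sum(teo[lo:hi]) / (hi - lo))
    peak = max(smooth)
    return [s / peak for s in smooth] if peak > 0 else [0.0] * n
```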

    The voice activity shape V(n) is calculated by
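The formula for V(n) is also missing from the transcript. A hypothetical sketch: sum the per-band masks on a common time grid (deeper bands are shorter, so each coefficient is repeated until the band spans the input length) and flag a sample as speech where V(n) exceeds a threshold. Both the aggregation and the decision threshold are assumptions, not the presenter's definition.

```python
def voice_activity_shape(band_masks, length):
    """Hypothetical V(n): sum of per-band masks on a common time grid.

    Each band is shorter than the input by a power of two, so every value is
    repeated (zero-order hold) until the band spans `length` samples.
    """
    v = [0.0] * length
    for mask in band_masks:
        if not mask or len(mask) > length:
            raise ValueError("each mask must be non-empty and no longer than length")
        factor = length // len(mask)
        for n in range(length):
            v[n] += mask[min(n // factor, len(mask) - 1)]
    return v


def vad_decision(v, threshold):
    """Flag speech (1) where V(n) exceeds a hypothetical threshold, else noise (0)."""
    return [1 if value > threshold else 0 for value in v]
```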

    Masking Construction:

  • Time-adaptive threshold calculation:
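One common form of time-adaptive threshold (assumed here, since the slide's equation is not reproduced) scales a band's base threshold λ by the mask value M(m) at time m: λ(m) = λ(1 − αM(m)), with 0 ≤ α ≤ 1. The parameter `alpha` and the function name are hypothetical.

```python
def time_adaptive_threshold(base_threshold, mask, alpha=0.5):
    """Hypothetical time-adaptive threshold: lambda(m) = lambda * (1 - alpha * M(m)).

    Where the mask is near 1 (speech-like energy) the threshold is lowered so that
    speech coefficients survive; where it is near 0 the full threshold applies.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [base_threshold * (1.0 - alpha * m) for m in mask]
```

With this form the threshold never becomes negative, and α controls how strongly speech-dominated instants are protected from shrinkage.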

  • Level 3: node-by-node denoising

  • Level 4: node-by-node denoising

  • Level 5: node-by-node denoising
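The slides show node-by-node denoising at levels 3 to 5 but not the shrinkage rule. The sketch assumes soft thresholding, in which each coefficient is shrunk toward zero by its own time-varying threshold; hard thresholding (zeroing coefficients below the threshold) would be the obvious alternative. Function names are hypothetical.

```python
import math


def soft_threshold(coeffs, thresholds):
    """Shrink each coefficient toward zero by its own (possibly time-varying) threshold."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c, lam in zip(coeffs, thresholds)]


def denoise_nodes(nodes, thresholds_per_node):
    """Apply soft thresholding independently to every node of one decomposition level."""
    return [soft_threshold(c, t) for c, t in zip(nodes, thresholds_per_node)]
```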

  • Evaluation: To verify the effectiveness of the proposed algorithm, we compared the speech detection and false-alarm probabilities obtained with different wavelet filters.

    The proposed method is evaluated with receiver operating characteristic (ROC) curves, which show how well the VAD discriminates between noise-only and noisy-speech frames in terms of the probability of correct detection (Pd) and the probability of false alarm (Pf), where Pd is the percentage of speech frames correctly detected as speech and Pf is the percentage of noise-only frames wrongly detected as speech.
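Pd and Pf can be computed directly from per-frame reference labels and VAD decisions, and sweeping the decision threshold over a per-frame score traces an ROC curve. A minimal sketch, assuming labels 1 for speech and 0 for noise-only frames; the function names and the score input are hypothetical.

```python
def detection_rates(reference, decision):
    """Pd and Pf in percent from per-frame labels (1 = speech, 0 = noise only)."""
    speech = sum(1 for r in reference if r == 1)
    noise = len(reference) - speech
    hits = sum(1 for r, d in zip(reference, decision) if r == 1 and d == 1)
    false_alarms = sum(1 for r, d in zip(reference, decision) if r == 0 and d == 1)
    pd = 100.0 * hits / speech if speech else 0.0
    pf = 100.0 * false_alarms / noise if noise else 0.0
    return pd, pf


def roc_points(reference, scores, thresholds):
    """(Pf, Pd) pairs obtained by sweeping a decision threshold over per-frame scores."""
    points = []
    for t in thresholds:
        decision = [1 if s > t else 0 for s in scores]
        pd, pf = detection_rates(reference, decision)
        points.append((pf, pd))
    return points
```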

  • Wavelet filter type (filter length), probability of correct detection Pd (%), probability of false alarm Pf (%) and computation time (CP):

    Wavelet filter    Pd (%)   Pf (%)   Computation time (CP)
    Daubechies 2      86.4     15.6     2.872 s
    Daubechies 4      89.3     11.7     2.884 s
    Daubechies 8      91.8      9.2     3.023 s
    Daubechies 10     94.3      5.7     3.074 s
    Daubechies 12     94.5      5.5     3.898 s
    Daubechies 14     94.8      5.2     3.899 s

  • References:
    Shi-Huang Chen, Hsin-Te Wu, Yukon Chang and T. K. Truong, "Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator," Pattern Recognition Letters, vol. 28, pp. 1327–1332, 2007.
    I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Conference Series in Applied Mathematics, SIAM, 1992.
    D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425–455, 1994.
    S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, July 1989.
    M. Berouti, R. Schwartz and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP, Apr. 1979, pp. 208–211.
    I. M. Johnstone and B. W. Silverman, "Wavelet threshold estimators for data with correlated noise," J. Roy. Stat. Soc. B, vol. 59, pp. 319–351, 1997.
    G. David Forney, Jr., "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Transactions on Information Theory, vol. 14, no. 2, pp. 206–220, Mar. 1968.
