1 introduction1 introduction 2 spectral subtraction 3 qbne 4 results 5 conclusion, & future...
TRANSCRIPT
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Enhancement of Electrolaryngeal Speech
by Reducing Leakage Noise Using Spectral Subtraction
by
Prem C. Pandey
EE Dept, IIT Bombay
Feb’07
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Abstract
Transcervical electrolarynx is a vibrator held against the neck tissue in order to provide excitation to the vocal tract, as a substitute to that provided by a natural larynx. It is of great help in verbal communication to a large number of laryngectomee patients. Its intelligibility suffers from the presence of a background noise, caused by leakage of the acoustic energy from the vibrator. Pitch synchronous application of spectral subtraction method, normally used for enhancement of speech corrupted by uncorrelated random noise, can be used for reduction of the self leakage noise for enhancement of electrolaryngeal speech. Average magnitude spectrum of leakage noise, obtained with lips closed, is subtracted from the magnitude spectrum of the noisy speech and the signal is reconstructed using the original phase spectrum. However, the spectrum of the leakage noise varies because of variation in the application pressure and movement of the throat tissue. A quantile based dynamic estimation of the magnitude spectrum without the need for silence/voice detection was found to be effective in noise reduction.
2
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Overview
● Introduction
● Spectral subtraction for enhancement of
electrolaryngeal speech
● Quantile-based noise estimation
● Results, summary, & ongoing work
3
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Natural speech productionIntroduction 1/5
Glottal excitation to vocal tract
4
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
External electronic larynx (Barney et al 1959)
Excitation to vocal tract from external vibrator
Introduction 2/5
5
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Problems with artificial larynx
● Difficulty in coordinating controls
● Spectrally deficit
● Unvoiced segments substituted by voiced segments
● Background noise due to leakage of acoustic energy
Introduction 3/5
6
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Model of noise generation
Causes of noise generation:• Leakage of vibrations produced by vibrator membrane• Improper coupling of vibrator to neck tissue
Introduction 4/5
7
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Methods of noise reduction
Vibrator design Acoustic shielding of vibrator ( Epsy-Wilson et al 1996)
Piezoelectric vibrators (Katsutoshi et al 1999)
Signal processing 2-input noise cancellation based on LMS algorithm ( Epsy-Wilson et al 1996)
Single input noise cancellation ( Pandey et al 2002) based on spectral subtraction algorithm (Boll 1979 & Berouti et al 1979)
Introduction 5/5
8
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Spectral subtraction for enhancement of electrolayngeal speech (Pandey et al 2000)
s(n) = e(n)*hv(n), l(n) = e(n)*hl(n)x(n) = s(n) + l(n)Xn(ej) = En(ej)[Hvn
(ej) + Hln(ej)]
Assumption: hv(n) and hl (n) uncorrelated Xn(ej)2 = En(ej)2[Hvn
(ej)2 + Hln(ej)2]
Noise estimation mode: s(n) = 0Xn(ej)2 = Ln(ej)2 = En(ej)2 Hln
(ej)2
L(ej)2 : averaged over many segments
Speech enhancement mode: Yn(ej)2 = Xn(ej)2 - L(ej)2
contd…
Spect. subtrn. 1/4
9
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Implementation using DFT
Yn(k)2 = Xn(k)2 - L(k)2
yn(m) = IDFT [ Yn(k) ej Xn
(k)]
Modified spectral subtraction (Berouti et al 1979)
Yn(k) = Xn(k) -L(k)
Yn(k) = Yn(k) if Yn(k) L(k)
=L(k) otherwise
( : subtraction, : spectral floor, : exp. factors)
Output normalization factor for < 1 (Berouti et al 1979)
G = {(Xn(k)2 - L(k)2)/ Yn(k)2}/
Spect. subtrn. 2/4
10
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Spectral subtraction method with ABNE (Pandey et al 2002)
Spect. subtrn. 3/4
11
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Drawback of averaged noise estimation during silence
● Two modes: noise estimation & speech enhancement
● Estimated noise considered stationary over entire speech enhancement mode
● Some musical & broadband noise in the output Investigations for continuous noise estimation & signal enhancement● System with voice activity detector (Berouti et al 1979)
● Without involving speech vs non-speech detection (Stahl et al 2000, Evans et al 2002, Houwu et al 2002)
Spect. subtrn. 4/4
12
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Quantile-based noise estimation
Basis for the technique
● During speech segments, frequency bins tend not to be permanently occupied by speech
● Speech / non-speech boundaries detected implicitly on per frequency basis
● Noise estimates updated throughout non- speech and speech periods
QBNE 1/6
13
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Implementation of QBNE
● DFT of windowed speech segments
● FIFO array of past spectral values for each freq. sample is formed
● An efficient indexing algorithm used to sort the arrays to obtain particular quantile value:
– A sorted value buffer and an index buffer, for each frequency sample
– New data placed at locations of oldest data in sorted buffer by referring index buffer
– In all sorted buffers only one value needs to be placed at correct position
QBNE 2/6
14
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
QBNE 3/4Spectral subtraction with QBNE
QBNE 3/6
10
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Investigations with QBNE
● Single quantile value - Quantile value which gives best visual match between quantile derived spect. & avg. spect. of noise is selected
● Two quantile values- Two quantiles for two frequency bands, which estimates noise close to avg. spect. of noise, were selected
● Frequency dependent quantile values - Estimated spectrum from noisy speech will be close match to the avg. spectrum of noise
QBNE 4/6
16
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
QBNE 5/6Investigations with QBNE (Contd..)
● Smoothened quantile values - Matched quantiles were averaged using 9 frequency values
● SNR based dynamic quantiles - Dynamic selection of quantiles depending on signal strength
q(k) = [(q1 (k) - q0 (k)) SNR (k) / SNR1 (k)] + q0 (k) q0 (k) if q (k) < 0
q1 (k) if q (k) > q1 (k)
17
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Plot of SNR and frequency dependent quantilesfor three different applications of vibrator
Frequency sample
18
QBNE 5/6
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Recorded and enhanced speech with (α=2,β=0.001,γ=1,N=16 ms), speaker: SP, material: /a/, /i/,and /u/ using electrolarynx Servox
Noise segment
/a/
/u/
/i/
Unprocessed Processed
Enhancement results
19
Results 1/3
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Enhancement results Results 2/3
Recorded and enhanced speech with (α=2,β=0.001,γ=1, Widow length=16 ms), speaker: SP, material: question-answer pair in English “What is your name? My name is santosh” using electrolarynx Servox
20
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Recorded and enhanced speech with (α=2,β=0.001,γ=1), speaker: SP, material: question-answer pair in English “What is your name? My name is santosh” using electrolarynx NP-1, Servox, and Solatone
Results 3/3Enhancement results
21
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Conclusion 1/2
Conclusion● QBNE technique implemented for cont. updating of noise spectrum
& different methods for selection of quantile values for noise estimation investigated
● Results with QBNE during non-speech segment are comparable with results using ABNE
● Smoothened quantiles and SNR based quantiles resulted in better quality speech
● QBNE is effective for longer duration
● QBNE using SNR based dynamic quantiles is effective during long pauses
22
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
Ongoing work
● Evaluation of intelligibility and quality improvement
● Selection of optimum quantile values for different models of electrolarynx and users
● Phase resynthesis
from magnitude spectrum
using cepstral method● Real-time implementation of noise reduction, using ADSP- BF533 board
● Analysis-synthesis for
introducing small amount of
jitter to improve naturalness
Conclusion 2/2
23
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
References
A. Work at IIT Bombay
[1] S. M. Bhandarkar / P. C. Pandey (Supervisor), “Reduction of background noise in artificial larynx” M.Tech. Dissertation, Dept. of Electrical Engineering, IIT Bombay, January 2002.
[2] S. S. Pratapwar / P. C. Pandey (Supervisor), “Reduction of background noise in artificial larynx”, M.Tech. Dissertation, Dept. of Electrical Engineering, IIT Bombay, Feb 2004.
[3] B. R. Budiredla / P. C. Pandey (Supervisor), “Real-time implementation of spectral subtraction for enhancement of electrolaryngeal speech”, M.Tech. Dissertation, Dept. of Electrical Engineering Department, IIT Bombay, July 2005.
[4] P. Mitra / P. C. Pandey (Supervisor), “Enhancement of Electrolaryngeal Speech by Background Noise Reduction and Spectral Compensation”, M.Tech. Dissertation, Dept. of Electrical Engineering Department, IIT Bombay, July 2006.
[5] P. C. Pandey, S. M. Bhandarkar, G. K. Bachher, and P. K. Lehana, “Enhancement of alaryngeal speech using spectral subtraction” in Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), Santorini, Greece, pp. 591-594, 2002.
[6] S. S. Pratapwar, P. C. Pandey, and P. K. Lehana, “Reduction of background noise in alaryngeal speech using spectral subtraction with quantile based noise estimation”, in Proc. 7th World Conference on Systemics, Cybernetics, and Informatics,(SCI 2003 ), Orlando, Florida, 2003.
[7] P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise”, in Proc. 18th International Congress on Acoustics, (ICA2004), Kyoto, Japan, 2004.
[8] P. Mitra and P. C. Pandey, “Enhancement of electrolaryngeal speech by spectral subtraction with minimum statistics-based noise estimation” (A), J. Acoustical Society Am., 120 (5), p. 3039, November 2006
B. Literature on Artificial Larynx
[1] Y. Lebrun, "History and development of laryngeal prosthetic devices,” The Artificial Larynx. Amsterdam: Swets and Zeitlinger, pp. 19-76, 1973.
[2] L. P. Goldstein, “History and development of laryngeal prosthetic devices,” Electrostatic Analysis and Enhancement of Alaryngeal Speech, Spring field, Ill: Charles C. Thomas, pp. 137-165, 1982.
[3] M. Weiss, G. Y. Komshian, and J. Heinz, "Acoustical and perceptual characteristics of speech produced with an electronic artificial larynx," J. Acoust. Soc. Am., vol. 65, No. 5, pp. 1298-1308, 1979.
[4] H. L. Barney, F. E. Haworth, and H. K. Dunn, "An experimental transistorized artificial larynx," Bell Systems Tech. J., vol. 38, No. 6, pp. 1337-1356, 1959.
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
[5] Q. Yingyong and B. Weinberg, “Low-frequency energy deficit in electrolaryngeal speech,” J. Speech and Hearing Research, vol. 34, pp. 1250- 1256, 1991.
[6] C. Y. Espy-Wilson, V. R. Chari, and C. B. Huang, “Enhancement of alaryngeal speech by adaptive filtering,” in Proc. ICSLP, pp. 764-771, 1996.
[7] H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., vol. 120, No. 1, pp. 398-406, July 2006.
C. Literature on Speech Processing and Speech Enhancement
[1] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Englewood Cliffs, New Jersey: Prentice Hall, 1978.[2] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process,
Vol. 27, No. 2, pp.113-120, 1979.[3] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in Proc.IEEE ICASSP’79,
pp. 208-211, 1979.[4] N. W. D. Evans and J. S. Mason, “Time-frequency quantile-based noise estimation”, in Proc. EUSIPCO ’02, 2002.[5] V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and wiener filtering,” in Proc.
IEEE ICASSP’00, Vol.3, pp. 1875-1878, 2000.[6] N. W. D. Evans and J.S. Mason, “Noise estimation without explicit speech, nonspeech detection: a comparison of mean,
median and model based approaches,” in Proc. Eurospeech, 2001.[7] N. W. D. Evans, J.S. Mason, and B. Fauve, “Effective real-time noise estimation without speech, non-speech detection: An
assessment on the aurora corpus” in Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), Santorini, Greece, pp. 985-988, 2002.
[8] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1975.[9] M. Hayes, J. S. Lim, and A.V. Oppenheim, “Signal reconstruction from phase or and magnitude,” IEEE Trans. Acoust.,
Speech, Signal Process., vol. ASSP-28, pp. 672-680, December 1980.[10] T. F. Quatieri, and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or
magnitude,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, pp. 1187-1193, December 1981.[11] B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or
magnitude,” in Proc. IEEE ICASSP, pp. 639-642, 1983.[12] S. H. Nawab, T. F. Quatieri, and J. S. Lim, “Signal-reconstruction from short time Fourier transform magnitude,” IEEE
Trans. Acoust., Speech Signal Process., Vol. ASSP-31, No. 4, pp. 986-998, August 1983.
1 Introduction 2 Spectral subtraction 3 QBNE 4 Results 5 Conclusion, & future work
P.C
. P
an
de
y
/ E
E D
ep
t /
IIT
Bo
mb
ay
IIT
Bo
mb
ay
[13] B. L. McKinley and G. H. Whiple, “Model based speech pause detection," in Proc. IEEE ICASSP-97, pp. 1179-1182, 20-24 April 1997.
[14] J. Meyer, K. U. Simmer, and K. D. Kammeyer, “Comparison of one- and two-channel noise-estimation techniques,” in Proc. 5th International Workshop on Acoustic Echo and Noise Control (IWAENC-97), London, UK, pp. 137-145, 11-12 September 1997.
[15] J. Sohn, N. S. Kim, and W. Sung, “A statistical model based voice activity detector,” IEEE Signal Processing Letters, Vol. 6, No. 1, pp. 1-3, January 1999.
[16] H. G. Hirsch and C. Ehrlicher, “Noise estimation techniques for robust speech recog nition,” in Proc. IEEE ICASSP-95, pp. 153-156, 8-12 May 1995.
[17] R. J. McAulay and M. L. Malpass, “Speech enhancement using a soft-decision noise suppression filter,” IEEE Trans. Acoust., Speech Signal Process., Vol. ASSP-28, No. 2, pp. 137-145, April 1980.
[18] C. Ris and S. Dupont, “Assessing local noise level estimation methods: application to noise robust ASR,” Speech Communication, Vol. 34, No.1-2, pp.141-158, April 2001.
[19] R. Martin, “Spectral subtraction based on minimum statistic,” in Proc. 7th European Signal Processing Conf., EUSIPCO - 94, Edinburgh, Scotland, pp. 1182-1185, 13-16 September 1994.
[20] S. Rangachari, “Noise estimation algorithms for highly non-stationary environments,” M. S. Thesis, Dept. of Electrical Engineering, University of Texas at Dallas, August 2004. Available online at www.utdallas.edu/~loizou/thesis/ sundar_ms_thesis.pdf (accessed in October, 2005)
[21] I. Cohen and B. Berdugo, “Noise estimation by minima controlled recursive averaging for robust speech enhancement,” IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, January 2002.
[22] I. Cohen and B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal Processing, Vol. 81, No. 11, pp. 2403-2418, November 2001.
[23] R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Trans. Speech and Audio Processing, Vol. 9, No. 5, pp. 504-512, July 2001.
[24] G. Doblinger, “Computationally efficient speech enhancement by spectral minima tracking in sub-bands,” in Proc 4th European Conf. Speech, Communication and Technology, EUROSPEECH'95, Madrid, Spain, pp.1513-1516, 18-21 September 1995.
[25] B. Widrow, J. R. Glover, J. M. McCool, and J. Kaunitz, “Adaptive noise canceling: principles and applications,” Proc. IEEE, vol. 63, pp. 1692-1716, December 1975.