HIWIRE Meeting, Athens, November 3-4, 2005
HIWIRE MEETING
Athens, November 3-4, 2005
José C. Segura, Ángel de la Torre
2 HIWIRE Meeting – Athens, 3 - 4 November, 2005
Schedule
- HIWIRE database evaluations
- Non-linear feature normalization
  - ECDF segmental implementation
  - Parametric equalization
- Robust VAD
  - Bispectrum-based VAD
- Model-based feature compensation
  - VTS results on AURORA4
  - Including uncertainty caused by noise
HIWIRE database evaluations
- PARAMETERS: MFCC_0_D_A_Z (39 components)
- MODELS:
  - TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians in total)
  - WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians in total)
  - WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians in total)
- ADAPTATION: MLLR with 32 regression classes / 50 adaptation utterances
- GRAMMAR: LORIA & Word-Loop
- MODIFICATIONS: Some transcriptions have been modified to match the grammar definition
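The Gaussian totals quoted for each model set follow from multiplying the number of states by the mixture size. A quick check in Python; the dictionary layout and the function name are illustrative and not part of the original setup:

```python
# Total Gaussians = units x states per unit x Gaussians per state.
# The figures come from the model list; the data layout is only illustrative.
MODEL_SETS = {
    "TIMIT": {"units": 46, "states_per_unit": 3, "gaussians_per_state": 128},
    # For WSJ16k the tied states are counted directly (one "state" per unit).
    "WSJ16k": {"units": 3608, "states_per_unit": 1, "gaussians_per_state": 6},
    "WSJ16kFon": {"units": 40, "states_per_unit": 3, "gaussians_per_state": 128},
}


def total_gaussians(spec):
    """Return the total number of Gaussians in one acoustic model set."""
    return spec["units"] * spec["states_per_unit"] * spec["gaussians_per_state"]


if __name__ == "__main__":
    for name, spec in MODEL_SETS.items():
        print(name, total_gaussians(spec))
```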
Transcription modifications
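    # Normalize the transcriptions so that they match the grammar definition.
    # LISTA is presumably supplied on the command line (awk -v LISTA=...).
    BEGIN { lista = LISTA; nfrase = 0; }
    {
        linea = $0;
        gsub("-", "_", linea);                     # hyphens to underscores
        gsub("Due_to_", "Due_to ", linea);         # keep "Due_to" as a single token
        gsub("Mayday_Mayday", "Mayday Mayday", linea);
        gsub("Pan_Pan", "Pan Pan", linea);
        gsub("three hundred twenty", "three_hundred_twenty", linea);
        gsub("one hundred sixty", "one_hundred_sixty", linea);
        printf("%s\n", tolower(linea));            # emit the line in lower case
        nfrase = nfrase + 1;                       # count processed sentences
    }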
HIWIRE database results
RESULTS WITHOUT ADAPTATION (WER)

| MODELS | French | Greek | Italian | Spanish | World | Avg |
|---|---|---|---|---|---|---|
| TIMIT | 7.30 | 9.93 | 11.87 | 9.27 | 6.26 | 8.93 |
| WSJ16k | 14.70 | 25.11 | 20.66 | 18.01 | 14.32 | 18.56 |
| WSJ16kFon | 10.43 | 19.51 | 16.52 | 15.33 | 8.72 | 14.10 |
| TIMIT_WL | 26.79 | 33.77 | 35.61 | 30.88 | 22.53 | 29.92 |

RESULTS WITH MLLR (WER)

| MODELS | French | Greek | Italian | Spanish | World | Avg |
|---|---|---|---|---|---|---|
| TIMIT+MLLR | 3.13 | 2.51 | 3.80 | 2.99 | 3.16 | 3.12 |
| WSJ16k+MLLR | 3.85 | 4.48 | 5.94 | 4.53 | 4.00 | 4.56 |
| WSJ16kFon+MLLR | 3.50 | 2.98 | 7.00 | 5.55 | 3.94 | 4.59 |
| TIMIT_WL+MLLR | 11.12 | 9.43 | 14.61 | 13.14 | 12.20 | 12.10 |
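The Avg column is the plain mean of the five per-accent WER figures, and the gain from MLLR can be summarized as a relative WER reduction. The sketch below recomputes both from the reported numbers; the variable and function names are illustrative, and the relative reduction is a derived summary that does not appear on the slide:

```python
# WER figures copied from the two result tables (French, Greek, Italian, Spanish, World).
WER_BASE = {
    "TIMIT": [7.30, 9.93, 11.87, 9.27, 6.26],
    "WSJ16k": [14.70, 25.11, 20.66, 18.01, 14.32],
    "WSJ16kFon": [10.43, 19.51, 16.52, 15.33, 8.72],
    "TIMIT_WL": [26.79, 33.77, 35.61, 30.88, 22.53],
}
WER_MLLR = {
    "TIMIT": [3.13, 2.51, 3.80, 2.99, 3.16],
    "WSJ16k": [3.85, 4.48, 5.94, 4.53, 4.00],
    "WSJ16kFon": [3.50, 2.98, 7.00, 5.55, 3.94],
    "TIMIT_WL": [11.12, 9.43, 14.61, 13.14, 12.20],
}


def average(values):
    """Arithmetic mean of a list of WER values."""
    return sum(values) / len(values)


def relative_reduction(before, after):
    """Relative WER reduction in percent when going from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for model in WER_BASE:
        base = average(WER_BASE[model])
        mllr = average(WER_MLLR[model])
        print(f"{model}: {base:.2f} -> {mllr:.2f} "
              f"({relative_reduction(base, mllr):.1f} % relative reduction)")
```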
ECDF segmental implementation
- Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer
- Current work: nonlinear feature transformation with a clean reference, to avoid the problem of system retraining
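As an illustration of the idea, a segmental ECDF-based Gaussian transformation for a single feature component can be sketched as follows. This assumes a sliding window centred on each frame, a rank-based ECDF and a standard normal reference; the function name, the window length and the (rank - 0.5)/T convention are assumptions made here, and the sketch is not the reference C implementation mentioned above.

```python
from statistics import NormalDist


def segmental_gaussianize(values, half_window=50):
    """Map each value to the standard-normal quantile of its rank within a
    sliding window centred on it (one feature component over time)."""
    std_normal = NormalDist()  # reference distribution: zero mean, unit variance
    out = []
    for t, x in enumerate(values):
        lo = max(0, t - half_window)
        hi = min(len(values), t + half_window + 1)
        window = values[lo:hi]
        # Rank of the centre value: number of window samples <= x (at least 1).
        rank = sum(1 for v in window if v <= x)
        # The -0.5 offset keeps the ECDF strictly inside (0, 1).
        ecdf = (rank - 0.5) / len(window)
        out.append(std_normal.inv_cdf(ecdf))
    return out


if __name__ == "__main__":
    print(segmental_gaussianize([3.0, 1.0, 2.0, 5.0, 4.0], half_window=2))
```

Because only the rank inside the window is used, the output is invariant to any monotonic distortion of the input within that window, which is the property that makes this family of transformations attractive for noise robustness.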
Parametric Equalization (1)
"Parametric Nonlinear Feature Equalization for Robust Speech Recognition" (submitted to ICASSP'06)
HEQ limitations:
- Influence of the relative amount of silence in utterances
With a parametric model, a more robust equalization can be obtained
Parametric Equalization (2)
- CLASS-DEPENDENT LINEAR EQUALIZATION
- SOFT-DECISION VAD (two-class Gaussian classifier on C0)
- NONLINEAR INTERPOLATION
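One possible reading of these three ingredients, given only as a hedged sketch: each class (silence and speech) gets its own linear transformation that maps the class mean and standard deviation of the test data onto those of a clean reference, and the two transformed values are interpolated with the posterior of a two-class Gaussian classifier on C0. All names, the dictionary layout and the equal class priors are assumptions; the formulation in the submitted paper may differ in its details.

```python
import math


def gauss_pdf(x, mean, std):
    """Univariate Gaussian density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def speech_posterior(c0, sil_c0, spe_c0, prior_speech=0.5):
    """Soft-decision VAD: P(speech | C0) from a two-class Gaussian classifier."""
    p_spe = prior_speech * gauss_pdf(c0, *spe_c0)
    p_sil = (1.0 - prior_speech) * gauss_pdf(c0, *sil_c0)
    return p_spe / (p_spe + p_sil)


def linear_eq(x, src, ref):
    """Class-dependent linear equalization: match the mean and std of one class."""
    (m_src, s_src), (m_ref, s_ref) = src, ref
    return m_ref + (x - m_src) * s_ref / s_src


def peq(x, c0, test, ref):
    """Interpolate the two class-dependent linear transforms with the VAD
    posterior; the weights depend on the frame, so the overall map is nonlinear."""
    w = speech_posterior(c0, test["sil_c0"], test["spe_c0"])
    return (w * linear_eq(x, test["spe"], ref["spe"])
            + (1.0 - w) * linear_eq(x, test["sil"], ref["sil"]))


if __name__ == "__main__":
    # Hypothetical (mean, std) statistics for one coefficient and for C0.
    clean = {"sil": (0.0, 1.0), "spe": (4.0, 2.0),
             "sil_c0": (-2.0, 1.0), "spe_c0": (3.0, 2.0)}
    noisy = {"sil": (1.5, 0.8), "spe": (4.5, 1.5),
             "sil_c0": (0.0, 0.8), "spe_c0": (3.5, 1.5)}
    print(peq(2.0, 1.0, noisy, clean))
```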
Parametric Equalization (3)
Parametric Equalization (4)
In comparison with HEQ, PEQ transformations are smoother
For C0 a monotonic transformation is obtained
For other coefficients, the interpolated transformation is not monotonic
Parametric Equalization (5)
- BASE: MFCC_0_D_A_Z (39 components)
- HEQ: quantile-based CDF transformation
  - Clean reference
  - Implemented over MFCC_0
  - CMS and regressions computed after HEQ
- AFE: standard implementation
- PEQ:
  - Clean reference
  - Implemented over MFCC_0
  - CMS and regressions computed after PEQ
Parametric Equalization (6)
Current work:
- Develop an on-line version
- Relax the diagonal covariance assumption
- Investigate the normalization of dynamic features
- Use a more detailed model of speech frames (i.e., more than one Gaussian)
Bispectrum-based VAD (1)
Motivations:
- Ability of higher-order statistics to detect signals in noise
- Polyspectra methods rely on a priori knowledge of the input processes
Issues to be addressed:
- Computationally expensive
- Variance of the bispectrum estimators is much higher than that of power spectral estimators for identical data record size
Solution: integrated bispectrum
- J. K. Tugnait, "Detection of non-Gaussian signals using integrated polyspectrum," IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
- Computationally efficient and reduced-variance statistical test based on the integrated polyspectra
- Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise
Bispectrum-based VAD (2)
Integrated bispectrum:
- Defined as a cross spectrum between the signal and its square; it is therefore a function of a single frequency variable
Benefits:
- Its computation as a cross spectrum leads to significant computational savings
- The variance of the estimator is of the same order as that of the power spectrum estimator
Properties:
- The bispectrum of a Gaussian process is identically zero, and so is its integrated bispectrum
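The cross-spectrum definition can be illustrated with a small block-averaged estimator. This is a sketch under stated assumptions: the squared signal is centred by its per-block mean, the conjugation convention is one of several in use, and a plain O(N^2) DFT is used to keep the example free of dependencies. The function names and the `block_len` parameter are chosen here for illustration and do not correspond to any implementation from the project.

```python
import cmath


def dft(block):
    """Plain discrete Fourier transform (O(N^2), adequate for a sketch)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def integrated_bispectrum(signal, block_len):
    """Block-averaged cross spectrum between y(t) and the centred square
    y(t)^2 - mean(y^2): a function of a single frequency index."""
    n_blocks = len(signal) // block_len
    if n_blocks == 0:
        raise ValueError("signal shorter than one block")
    acc = [0j] * block_len
    for b in range(n_blocks):
        y = signal[b * block_len:(b + 1) * block_len]
        power = sum(v * v for v in y) / block_len
        y2 = [v * v - power for v in y]          # centred squared signal
        spec_y, spec_y2 = dft(y), dft(y2)
        for k in range(block_len):
            acc[k] += spec_y[k].conjugate() * spec_y2[k] / block_len
    return [a / n_blocks for a in acc]           # average over blocks


if __name__ == "__main__":
    skewed = [2.0, -1.0, -1.0] * 8               # zero mean, asymmetric amplitudes
    print([round(abs(v), 3) for v in integrated_bispectrum(skewed, 6)])
```

A signal with a symmetric amplitude distribution (for example an alternating +1/-1 sequence) gives a zero estimate, while the skewed sequence in the usage example does not, which mirrors the property that the statistic vanishes for Gaussian processes.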
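Bispectrum-based VAD (3)
Two alternatives explored for formulating the decision rule:
- Estimation by block averaging:
$$L(\hat{y}_l) = \frac{p_{y|H_1}(\hat{y}_l \mid H_1)}{p_{y|H_0}(\hat{y}_l \mid H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \frac{P(H_0)}{P(H_1)}$$
- MO-LRT: given a set of N = 2m+1 consecutive observations:
$$L_{l,N}(\hat{y}_{l-m}, \ldots, \hat{y}_l, \ldots, \hat{y}_{l+m}) = \prod_{k=l-m}^{l+m} \frac{p_{y_k|H_1}(\hat{y}_k \mid H_1)}{p_{y_k|H_0}(\hat{y}_k \mid H_0)}$$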
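Both decision rules can be sketched in the log domain, where the product over the N = 2m+1 frames becomes a sum. The example below assumes scalar observations and Gaussian class-conditional likelihoods with hypothetical parameters; it does not use the likelihood and variance expressions derived for the integrated bispectrum, and all function names are illustrative.

```python
import math


def log_gauss(x, mean, var):
    """Log of a univariate Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_lr(obs, h0, h1):
    """Single-observation log-likelihood ratio: log p(obs|H1) - log p(obs|H0)."""
    return log_gauss(obs, *h1) - log_gauss(obs, *h0)


def mo_lrt(observations, l, m, h0, h1):
    """Multiple-observation LRT: sum of log-LRs over the N = 2m+1 frames
    centred on frame l (the window is clipped at the signal edges)."""
    lo, hi = max(0, l - m), min(len(observations), l + m + 1)
    return sum(log_lr(o, h0, h1) for o in observations[lo:hi])


def decide(score, p_h0=0.5):
    """Declare H1 (speech) when the log score exceeds log(P(H0) / P(H1))."""
    return score > math.log(p_h0 / (1.0 - p_h0))


if __name__ == "__main__":
    h0, h1 = (0.0, 1.0), (3.0, 1.0)      # hypothetical (mean, variance) per class
    obs = [0.1, -0.2, 3.1, 2.9, 3.2, 0.0]
    for frame in range(len(obs)):
        print(frame, decide(mo_lrt(obs, frame, 1, h0, h1)))
```

Setting m = 0 reduces the multiple-observation rule to the single-frame test; larger m smooths the decision over neighbouring frames at the cost of a longer delay.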
Bispectrum-based VAD (4)
Likelihoods
Variances
Bispectrum-based VAD results (1)
[Figure: pause hit rate (HR0, 0 to 100) versus false alarm rate (FAR0, 0 to 60) for G.729, AMR1, AMR2, AFE (Noise Est.), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, BA-IBI (KB = 1, NB = 256), BA-IBI (KB = 3, NB = 256) and BA-IBI (KB = 5, NB = 256)]
Bispectrum-based VAD results (2)
[Figure: pause hit rate (HR0, 0 to 100) versus false alarm rate (FAR0, 0 to 60) for G.729, AMR1, AMR2, AFE (Noise Est.), AFE (frame-dropping), Li, Marzinzik, Sohn, Woo, MO-LRT IBI (KB = 1, NB = 256, m = 2), MO-LRT IBI (KB = 1, NB = 256, m = 5) and MO-LRT IBI (KB = 1, NB = 256, m = 7)]
Bispectrum-based VAD results (3)
Schedule
- Model-based feature compensation, VTS: results on AURORA4
  - VTS formulation
  - VTS vs non-linear feature normalization procedures
  - VTS results on AURORA 4
- Including uncertainty caused by noise
  - Including uncertainty in noise compensation
  - Wiener filtering + uncertainty: results on Aurora 2
  - Wiener filtering + uncertainty: results on Aurora 4
  - VTS + uncertainty: formulation
  - Numerical integration of probabilities: formulation
VTS formulation
VTS: Vector Taylor Series approach to remove additive (and channel) noise
References:
- P. J. Moreno, "Speech recognition in noisy environments," Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996.
- A. de la Torre, "Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla" [Techniques for improving the representation in automatic speech recognition systems], Ph.D. thesis, University of Granada, Spain, Apr. 1999.
VTS formulation
VTS provides an estimation of the clean speech in a statistical framework:
Log-FBO domain, assumed additive noise:
Effect of noise described using the “correction function” g():
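For reference, a standard way to write this relation for one log-FBO component is given below. It is a sketch: the sign convention chosen for the argument of g() is an assumption and may differ from the one used by the authors.

```latex
y = \log\left(e^{x} + e^{n}\right) = x + g(x - n),
\qquad
g(z) = \log\left(1 + e^{-z}\right)
```

Here x, n and y denote the clean-speech, noise and noisy-speech log filter-bank outputs; the correction g(x - n) tends to zero when the speech dominates the noise and grows as the noise level approaches or exceeds the speech level.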
VTS formulation
Auxiliary functions f() and h(): 1st and 2nd derivatives:
VTS provides an estimate of the noisy-speech Gaussian given the clean-speech and the noise Gaussians:
Noisy-speech Gaussian obtained with the expected values:
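A scalar sketch of how a Taylor expansion yields the noisy-speech Gaussian from the clean-speech and noise Gaussians. It assumes the usual log-domain model y = log(exp(x) + exp(n)), independent speech and noise, a second-order expansion for the mean and a first-order expansion for the variance. The names `g` and `noisy_gaussian` are chosen here, and the derivative terms are written out directly rather than through the slide's f() and h(), whose exact definitions are not assumed.

```python
import math


def g(z):
    """Correction term g(z) = log(1 + exp(-z)) with z = x - n (stable form)."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """Mean and variance of y = x + g(x - n), expanded around (mu_x, mu_n)."""
    z = mu_x - mu_n
    a = math.exp(-g(z))            # dy/dx = 1 / (1 + exp(-z)); dy/dn = 1 - a
    curvature = a * (1.0 - a)      # d2y/dx2 = d2y/dn2 at the expansion point
    # Second-order mean: the cross term vanishes for independent x and n.
    mu_y = mu_x + g(z) + 0.5 * curvature * (var_x + var_n)
    # First-order variance.
    var_y = a * a * var_x + (1.0 - a) ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    print(noisy_gaussian(10.0, 4.0, 8.0, 1.0))
```

When the noise is far below the speech level, the derivative with respect to x approaches one and the noisy Gaussian collapses onto the clean one; when the two levels are equal, both sources contribute equally to the variance.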
VTS formulation
Noisy-speech Gaussian: formulas:
Models for noise and clean speech:
VTS formulation
Model for clean speech provides the model for noisy speech, and also P(k|y) (posterior probability of each Gaussian):
Estimation of clean speech:
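A common form of this estimator, shown here as a scalar sketch, subtracts from y the correction term of each mixture component weighted by its posterior P(k|y). The assumptions are: a one-dimensional Gaussian mixture for clean speech, a single Gaussian for noise, first-order noisy Gaussians, and the model y = x + g(x - n) with g(z) = log(1 + exp(-z)). The function names and the tuple layout of the mixture are illustrative, and the exact estimator on the slide may differ.

```python
import math


def g(z):
    """Correction term g(z) = log(1 + exp(-z)), z = x - n (stable form)."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_gauss(x, mean, var):
    """Log of a univariate Gaussian density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def vts_estimate(y, clean_gmm, mu_n, var_n):
    """Return (x_hat, posteriors) for one observation y.

    clean_gmm is a list of (weight, mean, variance) tuples for clean speech.
    """
    log_post, corrections = [], []
    for weight, mu_x, var_x in clean_gmm:
        z = mu_x - mu_n
        a = math.exp(-g(z))                      # dy/dx at the component mean
        mu_y = mu_x + g(z)                       # noisy-speech mean (first order)
        var_y = a * a * var_x + (1.0 - a) ** 2 * var_n
        log_post.append(math.log(weight) + log_gauss(y, mu_y, var_y))
        corrections.append(g(z))
    top = max(log_post)                          # normalize in a stable way
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    x_hat = y - sum(p * c for p, c in zip(post, corrections))
    return x_hat, post


if __name__ == "__main__":
    gmm = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]    # hypothetical clean-speech model
    print(vts_estimate(10.5, gmm, 10.0, 1.0))
```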
VTS vs non-linear feature normalization
VTS:
- Statistical framework:
  - Model for noise in log-FBO domain: 1 Gaussian PDF
  - Model for clean speech in log-FBO domain: Gaussian mixture
  - Noise assumed to be additive in FBO domain
- Accurate description of the noise process
- ACCURATE COMPENSATION
Non-linear feature normalization:
- No a priori assumption
- Component-by-component
- MORE FLEXIBLE, LESS ACCURATE
VTS results on AURORA 4
| Experiment | Train mode | Test size | WER exp. 01-07 | WER exp. 08-14 | WER exp. 01-14 |
|---|---|---|---|---|---|
| Baseline | Clean | 166 | 40.53 % | 50.60 % | 45.57 % |
| HEQ | Clean | 166 | 32.19 % | 42.74 % | 37.47 % |
| Parametric non-linear EQ | Clean | 166 | 28.78 % | 34.27 % | 31.53 % |
| VTS | Clean | 166 | 29.46 % | 37.22 % | 33.34 % |
| VTS (noise known) | Clean | 166 | 26.97 % | 32.25 % | 26.97 % |
| AFE | Clean | 166 | 27.57 % | 34.99 % | 31.28 % |
| Baseline | Multi | 166 | 24.58 % | 29.88 % | 27.23 % |
Including uncertainty in noise compensation
Noise is a random process: we do not know n, only p(n)
Then, from an observation y we cannot find x, only p(x|y,λ_x,λ_n)
Usually, compensation procedures provide E[x|y,λ_x,λ_n]
What about the uncertainty of x?
Mean and variance of x:
Including uncertainty in noise compensation
Including uncertainty in noise compensation
An approach for the estimation of the variance:
Evaluation of HMM Gaussians:
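A common way to bring the uncertainty of the estimate into the evaluation of an HMM Gaussian, assumed here as an illustration rather than taken from the slide, is to treat the clean-speech estimate as a Gaussian with its own variance and integrate it out. For Gaussians this amounts to adding the variance of the estimate to the model variance. The function name and arguments are hypothetical.

```python
import math


def log_gauss_with_uncertainty(x_hat, var_hat, mean, var):
    """Log-likelihood of an HMM Gaussian N(mean, var) for an estimate x_hat
    whose own uncertainty is var_hat.

    Integrating N(x; x_hat, var_hat) * N(x; mean, var) over x gives
    N(x_hat; mean, var + var_hat), so the two variances simply add.
    """
    total = var + var_hat
    return -0.5 * (math.log(2.0 * math.pi * total) + (x_hat - mean) ** 2 / total)


if __name__ == "__main__":
    # A reliable estimate (small var_hat) versus an unreliable one.
    print(log_gauss_with_uncertainty(5.0, 0.0, 0.0, 1.0))
    print(log_gauss_with_uncertainty(5.0, 3.0, 0.0, 1.0))
```

With zero uncertainty the expression reduces to the ordinary Gaussian evaluation; frames with a large estimation variance are penalized less for lying far from the model mean, so unreliable frames contribute less to the discrimination between states.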
Wiener filtering + uncertainty: results on AURORA 2
Preliminary results with Wiener filtering:
Results on Aurora 2 with Wiener filtering + uncertainty

| Experiment | Train mode | WER Set A | WER Set B | WER Set C | Aver. WER |
|---|---|---|---|---|---|
| Wiener | Clean | 15.75 % | 15.87 % | 17.62 % | 16.17 % |
| Wiener + Uncert. | Clean | 12.13 % | 12.90 % | 13.28 % | 12.67 % |
| Wiener | Multi | 8.91 % | 10.44 % | 10.95 % | 9.93 % |
| Wiener + Uncert. | Multi | 8.87 % | 10.34 % | 10.69 % | 9.82 % |
Wiener filter + uncertainty: results on AURORA 4
| Experiment | Train mode | Test size | WER exp. 01-07 | WER exp. 08-14 | WER exp. 01-14 |
|---|---|---|---|---|---|
| Baseline | Clean | 166 | 40.53 % | 50.60 % | 45.57 % |
| HEQ | Clean | 166 | 32.19 % | 42.74 % | 37.47 % |
| Parametric non-linear EQ | Clean | 166 | 28.78 % | 34.27 % | 31.53 % |
| VTS | Clean | 166 | 29.46 % | 37.22 % | 33.34 % |
| Wiener + Uncertainty | Clean | 166 | 27.68 % | 33.79 % | 30.74 % |
| AFE | Clean | 166 | 27.57 % | 34.99 % | 31.28 % |
| Baseline | Multi | 166 | 24.58 % | 29.88 % | 27.23 % |
VTS + uncertainty: formulation
VTS based estimation of clean speech:
VTS based estimation of variance:
Numerical integration of probabilities: formulation
Computation of expected values:
Numerical integration of expected values: