THE ELECTRICAL ENGINEERING AND APPLIED SIGNAL PROCESSING SERIES
Edited by Alexander Poularikas
The Advanced Signal Processing Handbook: Theory and Implementation for Radar, Sonar, and Medical Imaging Real-Time Systems
Stergios Stergiopoulos
The Transform and Data Compression Handbook
K.R. Rao and P.C. Yip
Handbook of Multisensor Data Fusion
David Hall and James Llinas
Handbook of Neural Network Signal Processing
Yu Hen Hu and Jenq-Neng Hwang
Handbook of Antennas in Wireless Communications
Lal Chand Godara
Noise Reduction in Speech Applications
Gillian M. Davis
Signal Processing Noise
Vyacheslav P. Tuzlukov
Digital Signal Processing with Examples in MATLAB®
Samuel Stearns
Applications in Time-Frequency Signal Processing
Antonia Papandreou-Suppappola
The Digital Color Imaging Handbook
Gaurav Sharma
Pattern Recognition in Speech and Language Processing
Wu Chou and Biing Hwang Juang
Forthcoming Titles
Propagation Data Handbook for Wireless Communication System Design
Robert Crane
Smart Antennas
Lal Chand Godara
Nonlinear Signal and Image Processing: Theory, Methods, and Applications
Kenneth Barner and Gonzalo R. Arce
Soft Computing with MATLAB®
Ali Zilouchian
Signal and Image Processing Navigational Systems
Vyacheslav P. Tuzlukov
Wireless Internet: Technologies and Applications
Apostolis K. Salkintzis and Alexander Poularikas
CRC PRESS
Boca Raton London New York Washington, D.C.
Edited by
WU CHOU
Avaya Labs Research
BIING HWANG JUANG
Georgia Institute of Technology
PATTERN RECOGNITION in SPEECH and LANGUAGE PROCESSING
Preface
Basking Ridge, New Jersey
September, 2002
Contributors
A. Abella
James Allan
T. Alonso
Jerome R. Bellegarda
William Byrne
Wu Chou
Sadaoki Furui
Jean-Luc Gauvain
Vaibhava Goel
Allen L. Gorin
Qiang Huo
Biing-Hwang Juang
Shigeru Katagiri
Lori Lamel
Qi (Peter) Li
John Makhoul
Hermann Ney
F. J. Och
G. Riccardi
Richard M. Schwartz
J. H. Wright
Contents
1 Minimum Classification Error (MCE) Approach in Pattern Recognition
Wu Chou
2 Minimum Bayes-Risk Methods in Automatic Speech Recognition
Vaibhava Goel and William Byrne
3 A Decision Theoretic Formulation for Robust Automatic Speech Recognition
Qiang Huo
4 Speech Pattern Recognition using Neural Networks
Shigeru Katagiri
5 Large Vocabulary Speech Recognition Based on Statistical Methods
Jean-Luc Gauvain and Lori Lamel
6 Toward Spontaneous Speech Recognition and Understanding
Sadaoki Furui
7 Speaker Authentication
Qi Li and Biing-Hwang Juang
8 HMMs for Language Processing Problems
Richard M. Schwartz and John Makhoul
9 Statistical Language Models With Embedded Latent Semantic Knowledge
Jerome R. Bellegarda
10 Semantic Information Processing of Spoken Language – How May I Help You?℠
A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright
11 Machine Translation Using Statistical Modeling
Hermann Ney and F. J. Och
12 Modeling Topics for Detection and Tracking
James Allan
1
Minimum Classification Error (MCE) Approach in Pattern Recognition
Wu Chou
Avaya Labs Research, Avaya Inc., USA
CONTENTS
1.1 Introduction
1.2 Optimal Classifier from Bayes Decision Theory
1.3 Discriminant Function Approach to Classifier Design
1.4 Speech Recognition and Hidden Markov Modeling
1.4.1 Hidden Markov Modeling of Speech
1.5 MCE Classifier Design Using Discriminant Functions
1.5.1 MCE Classifier Design Strategy
loss function
1.5.2 Optimization Methods
1.5.2.1 Expected Loss
Property 1 Suppose the following conditions are satisfied: [...] such that for all t, the inner product [...], where [...] is the Hessian matrix of second order partial derivatives; [...] is the unique [...] such that [...]. Then, [...] given by [...] will converge to [...] almost surely (i.e. with probability one).
1.5.2.2 Empirical Loss
1.5.3 Other Optimization Methods
1.5.4 HMM as a Discriminant Function
segmental GPD
1.5.5 Relation between MCE and MMI
FIGURE 1.1 A plot of the value of the derivative of the sigmoid function (horizontal axis from -10 to 10, vertical axis from -0.1 to 0.6).
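As a rough illustration of the kind of curve shown in Figure 1.1, the following Python sketch evaluates a sigmoid and its derivative over the plotted range. The functional form 1 / (1 + exp(-gamma * d + theta)), the values gamma = 1 and theta = 0, and the function names are assumptions made for this example; they are not taken from the chapter, so the peak height here need not match the figure.

```python
import math


def sigmoid(d, gamma=1.0, theta=0.0):
    """Sigmoid 1 / (1 + exp(-gamma * d + theta)); gamma and theta are assumed values."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


def sigmoid_derivative(d, gamma=1.0, theta=0.0):
    """Derivative of the sigmoid with respect to d: gamma * s * (1 - s)."""
    s = sigmoid(d, gamma, theta)
    return gamma * s * (1.0 - s)


if __name__ == "__main__":
    # Tabulate the derivative on the same horizontal range as the figure.
    for d in range(-10, 11, 2):
        print(f"d = {d:4d}   derivative = {sigmoid_derivative(d):.6f}")
```

With these assumed parameters the derivative is largest at d = 0 and falls off quickly on both sides, so a gradient-based update built on this function is dominated by samples whose argument d lies near zero.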
1.5.6 Discussions and Comments
1.6 Embedded String Model Based MCE Training
FIGURE 1.2 A structure diagram of a context dependent head-body-tail digit model in speech recognition.
1.6.1 String Model Based MCE Approach
FIGURE 1.3 A diagram of the embedded string model based MCE training process.
1.6.2 Combined String Model Based MCE Approach
1.6.2.1 Discriminative Model Combination
1.6.2.2 Discriminative Language Model Estimation
“too” “two”
1.6.3 Discriminative Feature Extraction
1.7 Verification and Identification
FIGURE 1.4 Block diagram of a speaker verification system.
1.7.1 Speaker Verification and Identification
1.7.2 Utterance Verification
1.8 Summary
Acknowledgement
2
Minimum Bayes-Risk Methods in Automatic Speech Recognition
Vaibhava Goel (IBM) and William Byrne (Johns Hopkins University)
CONTENTS
2.1 Minimum Bayes-Risk Classification Framework
hypothesis space
expected loss
evidence space
evidence distribution
2.1.1 Likelihood Ratio Based Hypothesis Testing
2.1.2 Maximum A-Posteriori Probability Classification
2.1.3 Previous Studies of Application Sensitive ASR
2.2 Practical MBR Procedures for ASR
2.2.1 Summation over Hidden State Sequences
language model
acoustic model
N-best list
lattice
2.2.2 MBR Recognition with N-best Lists
2.2.3 MBR Recognition with Lattices
2.2.3.1 Lattice Definitions
path
complete path
path segment
partial path
partial path log-probability
lattice backward log-probability
lattice total probability
FIGURE 2.1 An example lattice. The time marks correspond to the node times and the word ending times. The numbers on the edges are logarithms of conditional joint probabilities as described in the text. The partial path log-probability of a partial hypothesis is the log of the probability of its path. The lattice backward log-probability of a partial hypothesis is the log of the sum of probabilities of all lattice paths from the end node of the partial hypothesis to the lattice end node; for the partial path (‘HELLO’, ‘0.6’) in this lattice these paths are indicated by dotted lines. The lattice total probability of a partial path is the exponentiated sum of its partial path log-probability and lattice backward log-probability.
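The three quantities named in the caption of Figure 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the edge-tuple representation, the function names, and the toy lattice with its probabilities are all invented here and do not reproduce the lattice in the figure.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def backward_log_probs(edges, end_node):
    """Map each node to the log of the summed probability of all paths to end_node.

    edges: list of (src, dst, word, log_prob) tuples describing an acyclic lattice.
    """
    outgoing = defaultdict(list)
    for src, dst, _word, log_prob in edges:
        outgoing[src].append((dst, log_prob))

    memo = {end_node: 0.0}  # the empty path from the end node has probability 1

    def visit(node):
        if node not in memo:
            terms = [log_prob + visit(dst) for dst, log_prob in outgoing[node]]
            # A node with no route to end_node gets log-probability -inf.
            memo[node] = logsumexp(terms) if terms else float("-inf")
        return memo[node]

    for src, _dst, _word, _log_prob in edges:
        visit(src)
    return memo


def partial_path_log_prob(path_edges):
    """Log-probability of a partial path: the sum of its edge log-probabilities."""
    return sum(log_prob for _src, _dst, _word, log_prob in path_edges)


def lattice_total_probability(path_edges, backward):
    """exp(partial path log-probability + backward log-probability of its end node)."""
    end_of_partial_path = path_edges[-1][1]
    return math.exp(partial_path_log_prob(path_edges) + backward[end_of_partial_path])


# Hypothetical toy lattice (node 0 = start, node 3 = end); all numbers are invented.
EDGES = [
    (0, 1, "HELLO", math.log(0.5)),
    (0, 2, "YELLOW", math.log(0.2)),
    (1, 3, "WORLD", math.log(0.4)),
    (1, 3, "WORD", math.log(0.1)),
    (2, 3, "WORLD", math.log(0.3)),
]

if __name__ == "__main__":
    backward = backward_log_probs(EDGES, end_node=3)
    hello = [EDGES[0]]  # partial path consisting of the single edge "HELLO"
    print("partial path log-probability:", partial_path_log_prob(hello))
    print("backward log-probability:", backward[1])
    print("lattice total probability:", lattice_total_probability(hello, backward))
```

The backward pass is memoized, so each node is evaluated once; working in the log domain with a stable log-sum-exp avoids underflow when edge probabilities are small.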
2.2.3.2 A* Search Under General Loss Functions
2.2.3.3 Single Stack Search Under Levenshtein Loss Function
2.2.3.4 Prefix Tree Search Under Levenshtein Loss Function
prefix tree
partial hypothesis comparison cost
2.2.3.5 Pruning and Multistack Organization of the Prefix Tree Search
2.2.3.6 Loss Functions Other than Levenshtein Distance
2.3 Segmental MBR Procedures
high confidence regions
low confidence regions
segment sets
conjunction rule
Proposition.
induced
2.3.1 Segmental Voting
2.3.2 ROVER
simultaneous alignment
correspondence set
FIGURE 2.2 An example word transition network.
2.3.3 e-ROVER
joining expanded
FIGURE 2.3 Joining two correspondence sets.
segmentation
2.4 Experimental Results
2.4.1 Parameter Tuning within the MBR Classification Rule
word insertion penalty
language model scale factor
likelihood scale factor
TABLE 2.1
2.4.1.1 Optimization of Likelihood Parameters
supervised optimization
unsupervised optimization
2.4.2 Utterance Level MBR Word and Keyword Recognition
abilities, bartenders, calculation, databases
a, and, the, besides, collaboration, distribution
2.4.2.1 Likelihood Scale Factor Tuning
2.4.2.2 N-best List Rescoring and A* Search
TABLE 2.2
2.4.3 ROVER and e-ROVER for Multilingual ASR
FIGURE 2.4 Top panel shows the ratio of the total number of e-ROVER correspondence sets to that of ROVER correspondence sets, as a function of the pinching threshold. Bottom panel shows the WER performance of e-ROVER for these thresholds.
2.4.3.1 Correspondence Set Pinching
2.5 Summary
2.6 Acknowledgements
3
A Decision Theoretic Formulation for Robust Automatic Speech Recognition
Qiang Huo
The University of Hong Kong, Hong Kong, China
CONTENTS
3.1 Introduction
decision problem
class
statistical pattern recognition
FIGURE 3.1 Communication Theoretic View of ASR: Noisy Channel for Speech Generation and Signal Capturing (adapted from [68]).
parametric family
training data
plug-in MAP a posteriori decision rule
statistical decision
3.2 Optimal Bayes’ Decision Rule for ASR
decision rule
decision space
nonrandomized decision rule
sampling paradigm
loss
loss function
true distribution
total risk
Bayes’ decision rule
Bayes’ risk
0-1 loss function
minimum classification error
MAP decision rule
3.3 Adaptive Decision Rules Constructed from Training Samples
true
prior uncertainty
independent
adaptive decision rule
3.3.1 Plug-in Bayes’ Decision Rules with Maximum-likelihood Density Estimate
3.3.1.1 What are Plug-in Bayes’ Decision Rules?
plug-in decision rules
plug-in MAP decision rule
plug-in risk
density plug-in estimator
3.3.1.2 Why Could Plug-in Bayes’ Decision Rules Work?
Property:
Bayes’ risk consistency
Theorem: (Bayes’ risk consistency):
3.3.1.3 Implications on Parametric Models and Parameter Estimation
assume
estimated
Bayes’ risk consistent
representative
Discrete HMM
Continuous Density HMM
finite state knowledge sources
network search
maximum likelihood
point estimator
minimum discrimination information
discrimination information
directed divergence
discriminative training
maximum mutual information
conditional maximum likelihood estimate
H-criteria
corrective training
minimum empirical classification error
3.3.2 Maximum-Discriminant Decision Rules Minimizing the Empirical Classification Error
3.3.2.1 What are Maximum-Discriminant Decision Rules?
discriminant function
maximum-discriminant decision rule
minimum misclassification
best-count
density estimator
3.3.2.2 Why Could Discriminant Approach Work?
Theorem: (Uniform Convergence)
m-convex
uniformly
best-count
3.3.2.3 Implications on the Choice of Discriminant Functions and the Practical Training Algorithms
3.3.3 Discussion
plug-in MAP
maximum discriminant
minimum empirical classification error
representative
3.4 Violations of Modeling Assumptions in ASR
3.4.1 Types of Distortions
independent
representative
modeling error
estimation error
3.4.2 Towards Adaptive and Robust ASR
3.5 Improving Adaptive Decision Rules via Decision Parameter Adaptation
3.5.1 Decision Parameter Adaptation for Stationary Operating Conditions
goals of adaptation
3.5.1.1 Adaptation for Plug-in Decision Rules
Remark 1:
regularization
imposing constraints
maximum penalized likelihood
Remark 2:
3.5.1.2 Adaptation for Maximum-Discriminant Decision Rules
w.r.t.
empirical
minimum
expected classification error
stochastic approximation
3.5.2 Decision Parameter Adaptation for Slowly Changing Operating Conditions
forgetting mechanisms
3.5.3 Decision Parameter Adaptation for Switching Operating Conditions
adaptive model fusion
3.5.4 Discussion
robust decision rule
3.6 Robust Decision Rules
3.6.1 Decision Rule Robustness
guaranteed (upper) risk
overall risk
robust (with respect to distortions) decision rules
minimax decision rule
predictive decision rule
Bayesian predictive decision rule
uncertainty neighborhood
3.6.2 Minimax Classification Rule
uncertainty neighborhood
minimax decision rule
model-space stochastic matching
3.6.3 Bayesian Predictive Classification Rule
average out
Bayesian predictive classification
a priori
hyperparameters
point estimate
empirical Bayes
prior uncertainty
representative
overall risk
predictive densities
Bayesian predictive classification (BPC) rule
model parameter uncertainty
3.6.4 Discussion
training set
reproducing density
approximate Bayesian (AB) decision rule
Bayesian minimax rule
Bayesian predictive density
Bayesian predictive density based model compensation
3.7 Summary
Acknowledgement
4
Speech Pattern Recognition using Neural Networks
Shigeru Katagiri
NTT Communication Science Laboratories
CONTENTS
4.1 Introduction
4.2 Bayes Decision Theory
4.2.1 Preparations
4.2.2 Decision Rule
4.2.3 Minimum Error-rate Classification
4.2.4 Probability Function Estimation
4.2.5 Discriminative Training
4.2.5.1 Functional Form Embodiment of the Entire Process
4.2.5.2 Discriminant Functions
4.2.5.3 Loss over an Individual Pattern
4.2.5.4 Loss over Multiple Patterns
4.2.5.5 Adjustment of Trainable System Parameters
[Probabilistic Descent Theorem]
4.2.5.6 Training Optimality
4.2.5.7 Global Design Scope
4.3 Speech Recognizers Based on Neural Networks
4.3.1 Preparations
4.3.2 Classification Error Minimization
4.3.2.1 Learning Vector Quantization
4.3.2.2 Shift-tolerant LVQ Classifier
FIGURE 4.1 Architecture of shift-tolerant LVQ classifier [20].
4.3.2.3 LVQ/HMM Hybrid Classifier
FIGURE 4.2 Block diagram of LVQ/HMM hybrid classifier.
4.3.2.4 HMM/LVQ Hybrid Classifier
FIGURE 4.3 Block diagram of HMM/LVQ hybrid classifier.
4.3.3 Squared Error Minimization
4.3.3.1 Training Using the Squared Error Loss
FIGURE 4.4 Architecture of time-delay neural network [27].
4.3.3.2 Time-delay Neural Network
FIGURE 4.5 Schematic description of distance classifier as a single intermediate layer network (2-dimensional input, 3 references/class, 3 classes).
4.3.3.3 Multi-state Time-delay Neural Network
4.3.4 Cross Entropy Minimization
4.3.4.1 Training Using the Cross Entropy Loss
4.3.4.2 Unidirectional Network Classifier
4.3.4.3 Bidirectional Network Classifier
FIGURE 4.6 Architecture of unidirectional network [23]. Diagram labels: W, V, time delay, s_t, s(t+1); u_t: input vector; s_t: state vector; y_t: output vector.
4.4 Fusion of Multiple Classification Decisions
4.4.1 Principles
FIGURE 4.7 Architecture of bi-directional network [25].
FIGURE 4.8 Typical classifier design schemes of averaging-based decision fusion.
4.4.2 Examples of Embodiment
4.4.2.1 Multi-codebook Classifier Designed with GPD
FIGURE 4.9 Relation between recognition accuracy and the number of prototypes per class and codebook [3].
4.4.2.2 Multi-class Classification Based on Support Vector Machine
4.4.2.3 Decision Fusion Using Different Classifiers
FIGURE 4.10 Typical block diagrams of the MSTDNN-based audio-visual speech recognition [7].
4.4.2.4 Decision Fusion Using Multi-modal Classifiers
FIGURE 4.11 Block diagram of the twofold-HMM-based audio-visual speech recognition [21].
4.5 Concluding Remarks
References
4.6 Appendix: Maximizing Mutual Information
5
Large Vocabulary Speech Recognition Based on Statistical Methods
Jean-Luc Gauvain and Lori Lamel
LIMSI, France
CONTENTS
5.1 Introduction
5.2 Overview
FIGURE 5.1 LVCSR speech generation model: The word sequence produced by the language model is successively transformed by the pronunciation model and the acoustic model, resulting in the speech signal.
5.3 Language Modeling
FIGURE 5.2 System diagram of a generic speech recognizer based on statistical models, including training and decoding processes and the main knowledge sources.
5.3.1 Text Preparation
one hundred fifty dollars
nineteen ninety one
one thousand nine hundred and ninety one
hundred
hundred and
million dollars
million
FIGURE 5.3 Some example transformation rules applied during text normalization with associated probabilities.
million officials
neunzehnhunderteinundneunzig
neunzehn hundert einund neunzig
5.3.2 Vocabulary Selection
5.3.3 N-gram Estimation
5.3.4 LM Adaptation
cache model
trigger model
topic coherence modeling
5.4 Pronunciation Modeling
FIGURE 5.4 Set of 45 phone symbols for English with illustrative words, with the portion corresponding to the phone sound underlined.
excuse, record, moderate
anti-, bi-, multi-, -ization
FIGURE 5.5 Some example lexical entries and their pronunciations along with estimated probabilities. For the compound words, the original concatenated pronunciation is given in the 1st line and the reduced forms are given in the 2nd line.
interest, conference, company
don’t know, did you, going to
gonna, dunno
5.5 Acoustic Modeling
5.5.1 Acoustic Front-end
FIGURE 5.6 A simple 3-state left-to-right HMM topology commonly used for allophone modeling in LVCSR. The model generates at least 3 speech frames per allophone, resulting in a minimal phone segment duration of 30 ms for a frame rate of 100 Hz.
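The duration constraint stated in the caption can be checked with a minimal sketch. The function names and the self-loop probability of 0.5 are assumptions made for illustration only; the original figure and any trained transition values are not reproduced here.

```python
def left_to_right_transitions(num_states=3, self_loop=0.5):
    """Build a (num_states + 1) x (num_states + 1) transition matrix.

    Each emitting state either repeats (self-loop) or advances to the next
    state; the final row is a non-emitting exit state. The self-loop
    probability is an illustrative assumption, not a trained value.
    """
    size = num_states + 1
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(num_states):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance; no skip transitions
    matrix[size - 1][size - 1] = 1.0        # absorbing exit state
    return matrix


def min_duration_ms(num_states=3, frame_rate_hz=100):
    """Shortest segment: exactly one frame per emitting state."""
    frame_ms = 1000.0 / frame_rate_hz
    return num_states * frame_ms


if __name__ == "__main__":
    for row in left_to_right_transitions():
        print(row)
    print(min_duration_ms())
```

Because the topology has no skip transitions, every path must visit all three emitting states, which is where the 3-frame minimum comes from.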
5.5.2 Modeling Allophones
FIGURE 5.7 Examples of allophonic transcriptions in terms of intra-word triphones and quinphones. Each contextual unit is defined by the central phone followed by its phone context shown in parentheses (left-context, right-context). * is a wildcard signifying any context.
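The labeling convention described in the caption can be sketched as follows. The function name `allophone_labels`, the `width` parameter, and the phone symbols in the usage lines are hypothetical; the phone symbols of the original figure did not survive extraction. Context phones are simply concatenated, which is only unambiguous for single-character symbols.

```python
def allophone_labels(phones, width=1):
    """Label each phone with its intra-word context as phone(left,right).

    width=1 gives triphone-style labels and width=2 quinphone-style labels.
    A missing context at a word boundary is written as the wildcard '*'.
    """
    labels = []
    for i, phone in enumerate(phones):
        left = "".join(phones[max(0, i - width):i]) or "*"
        right = "".join(phones[i + 1:i + 1 + width]) or "*"
        labels.append(f"{phone}({left},{right})")
    return labels


if __name__ == "__main__":
    word = ["s", "a", "s", "t", "a"]  # hypothetical phone sequence
    print(allophone_labels(word, width=1))
    print(allophone_labels(word, width=2))
```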
a priori
FIGURE 5.8 Example questions used for decision tree clustering. Question categories: position, general classes, vowel classes, consonant classes, individual phones.
senones, genones, PELs, tied-states
5.5.3 HMM Parameter Estimation
FIGURE 5.9 The most frequently used decision tree questions for an American English broadcast news transcription system [40]. The [+1] and [-1] indicate that the question has been applied to the right or left context respectively, and [0] to the phone itself. Table columns: question, log likelihood gain.
A Posteriori
5.5.4 HMM Adaptation
5.6 Decoding
5.6.1 Speech/Non-speech Detection
5.6.2 Decoding Strategies
FIGURE 5.10 Example word lattice generated by a speech recognizer using a bigram language model for a 2.1 s utterance. Each graph edge corresponds to a word hypothesis and a time interval (as specified by the time information on the nodes). In this example the word transcription with the highest likelihood is “sil IT WAS A GOOD PROGRAM sil”, which happens to be what was said. (The acoustic and language model likelihoods are not given in the figure.)
5.6.3 Efficiency
5.6.4 Confidence Measures
5.7 Indicative Performance Levels
substitutions, insertions, deletions
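These three error types are conventionally combined into a word error rate, (substitutions + insertions + deletions) divided by the number of reference words. The sketch below assumes that standard definition and a plain whitespace tokenization; the function names are illustrative and this is not the scoring tool used for the results reported in the chapter.

```python
def error_counts(reference, hypothesis):
    """Return (substitutions, insertions, deletions) of a minimum-cost
    word alignment between a reference and a hypothesis string."""
    ref, hyp = reference.split(), hypothesis.split()
    # cell[i][j] = (total, subs, ins, dels) for ref[:i] aligned to hyp[:j]
    cell = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, 0, i)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, j, 0)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, n, d = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, n, d)          # match, no cost
            else:
                diagonal = (t + 1, s + 1, n, d)  # substitution
            t, s, n, d = cell[i][j - 1]
            insertion = (t + 1, s, n + 1, d)
            t, s, n, d = cell[i - 1][j]
            deletion = (t + 1, s, n, d + 1)
            # Tuples compare on the total first, so the total is minimal.
            cell[i][j] = min(diagonal, insertion, deletion)
    _, subs, ins, dels = cell[-1][-1]
    return subs, ins, dels


def word_error_rate(reference, hypothesis):
    """(S + I + D) / N, with N the number of reference words."""
    subs, ins, dels = error_counts(reference, hypothesis)
    return (subs + ins + dels) / len(reference.split())


if __name__ == "__main__":
    print(word_error_rate("it was a good program", "it was good program"))
```

When several alignments share the same minimum total, the split between the three error types depends on the tie-breaking, but the resulting error rate does not.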
5.7.1 Dictation
5.7.2 Speech Recognition for Dialog Systems
exact
5.7.3 Transcription for Audio Indexation
5.8 Portability and Language Dependencies
References
The THISL Broadcast News Retrieval System
Experiments in Vocal Tract Normalization
A Compact Model for Speaker Adaptation Training
One Pass Cross Word Decoding for Large Vocabularies Based on a Lexical Tree Search Organization
The Forward-Backward Search Strategy for Real-Time Speech Recognition
Preliminary results on the performance of a system for the automatic recognition of continuous speech
Acoustic Markov Models used in the Tangora Speech Recognition System
A Maximum Likelihood Approach to Continuous Speech Recognition
A Fast Match for Continuous Speech Recognition Using Allophonic Models
Large Vocabulary Recognition of Wall Street Journal Sentences at Dragon Systems
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains
Vector quantization for efficient computation of continuous density likelihoods
A Baseline for the Transcription of Italian Broadcast News
Word and acoustic confidence annotation for large vocabulary speech recognition
Improvements in Language, Lexical and Phonetic Modeling in Sphinx-II
An empirical study of smoothing techniques for language modeling
Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion
The Role of Word-Dependent Coarticulatory Effects in a Phoneme-Based Speech Recognition System
Statistical Language Modelling using CMU-Cambridge Toolkit
Comparison of Parametric Representations of Monosyllabic Word Recognition in Continuously Spoken Sentences
Maximum Likelihood from Incomplete Data via the EM Algorithm
Human Speech Recognition Performance on the 1995 CSR Hub-3 Corpus
Genones: Optimization the Degree of Tying in a Large Vocabulary HMM-based Speech Recognizer
Speaker adaptation using constrained estimation of Gaussian mixtures
Sonograph and Sound Mechanics
Automatic Recognition of Phonetic Patterns in Speech
Human Speech Recognition Performance on the 1994 CSR Spoke 10 Corpus
Comparison of speaker recognition methods using statistical features and dynamic features
An improved approach to hidden Markov model decomposition of speech and noise
Robust Continuous Speech Recognition using Parallel Model Combination
Cluster Adaptive Training for Speech Recognition
Semi-Tied Covariance Matrices for Hidden Markov Models
Transcribing Broadcast News: The LIMSI Nov96 Hub4 System
Spoken Language component of the MASK Kiosk
Speech Recognition for an Information Kiosk
Partitioning and Transcription of Broadcast News Data
Developments in Continuous Speech Dictation using the ARPA WSJ Task
Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
The LIMSI Broadcast News Transcription System
A Rapid Match Algorithm for Continuous Speech Recognition
A Probabilistic Approach to Confidence Measure Estimation and Evaluation
Real-time Telephone-based Speech Recognition in the Jupiter Domain
SWITCHBOARD: Telephone Speech Corpus for Research and Development
The Population Frequencies of Species and the Estimation of Population Parameters
A tree search strategy for large-vocabulary continuous speech recognition
Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition
Segment Generation and Clustering in the HTK Broadcast News Transcription System
News-on-Demand-’An Application of Informedia Technology’
The ATIS Spoken Language Systems Pilot Corpus
Perceptual linear predictive (PLP) analysis of speech
Large vocabulary continuous speech recognition using a hybrid connectionist-HMM system
Signal Representation
Subphonetic Modeling with Markov States - Senone
Predicting Unseen Triphones with Senones
Continuous Speech Recognition by Statistical Methods
Statistical Methods for Speech Recognition
A Dynamic Language Model for Speech Recognition
: Speech Based Video Retrieval
Maximum-Likelihood Estimation for Mixture Multivariate Stochastic Observations of Markov Chains
Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
Unsupervised Training of a Speech Recognizer: Recent Experiments
The 1995 Abbot hybrid connectionist-HMM large-vocabulary recognition system
Improved Clustering Techniques for Class-Based Statistical Language Modelling
Improved backing-off for n-gram language modeling
Design of the 1994 CSR Benchmark Tests
Toward Automatic Recognition of Broadcast News
Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition
Eigenvoices for Speaker Adaptation
On Designing Pronunciation Lexicons for Large Vocabulary, Continuous Speech Recognition
Speech Recognition of European Languages
Continuous Speech Recognition at LIMSI
A Phone-based Approach to Non-Linguistic Speech Feature Identification
Lightly Supervised and Unsupervised Acoustic Model Training
Development of Spoken Language Corpora for Travel Information
Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system
Speaker Normalization Using Efficient Frequency Warping Procedures
Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models
Maximum Likelihood Estimation for Multivariate Observations of Markov Sources
Speech recognition by machines and humans
Fast Speaker Change Detection for Broadcast News Transcription and Indexing
Multi-site Data Collection for a Spoken Language Corpus
Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
Subspace distribution clustering for continuous observation density hidden Markov models
Spoken Language Processing and Human-Machine Communication in the European Union Programs
An overview of EU programs related to conversational/interactive systems
Algorithms for Bigram and Trigram Clustering
News on Demand
Named Entity Extraction from Broadcast News
Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition
Large-Vocabulary Dictation using SRI’s Decipher Speech Recognition System: Progressive Search Techniques
The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition
Improvements in Beam Search for 10000-Word Continuous Speech Recognition
Single-Tree Method for Grammar-Directed Search
The Use of Decision Trees with Context Sensitive Phoneme Modelling
A One Pass Decoder Design for Large Vocabulary Recognition
Recent Advances in Japanese Broadcast News Transcription
Modeling Inverse Covariance Matrices by Basis Expansion
Language-model look-ahead for large vocabulary speech recognition
A Word Graph Algorithm for Large Vocabulary Continuous Speech Recognition
The Role of Phonological Rules in Speech Understanding Research
Continuous Word Recognition Based on the Stochastic Segment Model
1993 Benchmark Tests for the ARPA Spoken Language Program
1994 Benchmark Tests for the ARPA Spoken Language Program
1995 Hub-3 Multiple Microphone Corpus Benchmark Tests
1998 Broadcast News Benchmark Test Results: English and Non-English Word Error Rate Performance Measures
An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model
Improved Discriminative Training Techniques For Large Vocabulary Continuous Speech Recognition
Evaluation of Spoken Language Systems: The ATIS Domain
An Introduction to Hidden Markov Models
Efficient Algorithms for Speech Recognition
Stochastic pronunciation modelling from hand-labelled phonetic corpora
Improvements in Stochastic Language Modeling
Adaptive Statistical Language Modeling
Two Decades of Statistical Language Modeling: Where Do We Go From Here?
Language-independent and language-adaptive acoustic modeling for speech recognition
Memory-efficient LVCSR search using a one-pass stack decoder
New uses for N-Best Sentence Hypothesis, within the BYBLOS Speech Recognition System
Improved Hidden Markov Modeling of Phonemes for Continuous Speech Recognition
NYU Language Modeling Experiments for the 1995 CSR Evaluation
A Markov Random Field Approach to Bayesian Speaker Adaptation
Modeling Those F-Conditions – Or Not
Scalable backoff language models
Automatic Segmentation, Classification and Clustering of Broadcast News Audio
Evaluation of word confidence for speech recognition systems
Entropy-based Pruning of Backoff Language Models
Four-level Tied Structure for Efficient Representation of Acoustic Modeling
An Investigation into Vocal Tract Length Normalization
Human Benchmarks for Speaker Independent Large Vocabulary Recognition Performance
Speech discrimination by dynamic programming
Elements-wise recognition of continuous speech composed of words from a specified dictionary
Verbmobil: Translation of Face-to-Face Dialogs
Multilinguality in Speech and Spoken Language Systems
Probabilistic Models for Topic Detection and Tracking
Dragon Systems’ 1997 Broadcast News Transcription System
Progress in Broadcast News Transcription at Dragon Systems
Neural-Network based Measures of Confidence for Word Recognition
Using word probabilities as confidence measures
Unsupervised training of acoustic models for large vocabulary continuous speech recognition
The Zero Frequency Problem: Estimating the Probabilities of Novel Events in Adaptive Text Compression
Large scale discriminative training of hidden Markov models for speech recognition
The development of the 1994 HTK large vocabulary speech recognition system
The HTK large vocabulary recognition system for the 1995 ARPA H3 task
A Hidden Markov Approach to Text Segmentation and Event Tracking
A Review of Large-Vocabulary Continuous Speech Recognition
Multilingual large vocabulary speech recognition: the European SQALE project
Speech recognition evaluation: a review of the U.S. CSR and LVCSR programmes
Tree-Based State Tying for High Accuracy Acoustic Modeling
The Use of State Tying in Continuous Speech Recognition
Utilizing Untranscribed Training Data to Improve Performance
Maximum a Posteriori Adaptation for Large Scale HMM Recognizers
The MIT Speech Recognition System: A Progress Report
6
Toward Spontaneous Speech Recognition and Understanding
Sadaoki Furui
Tokyo Institute of Technology
CONTENTS
6.1 Introduction
FIGURE 6.1 Progress of spoken language technology along the dimensions of vocabulary size and speaking styles.
6.2 Four Categories of Speech Recognition Tasks
TABLE 6.1
6.3 Spontaneous Speech Recognition and Understanding - Review
6.3.1 Category I (human-to-human dialogue)
6.3.2 Category II (human-to-human monologue)
FIGURE 6.2 The SCANMail architecture [12].
6.3.3 Category III (human-to-machine dialogue)
FIGURE 6.3 AT&T Communicator architecture [15].
6.4 Japanese National Project on Spontaneous Speech Corpus and Processing Technology
6.4.1 Project Overview
FIGURE 6.4 Overview of the Japanese national project on spontaneous speech corpus and processing technology.
6.4.2 Corpus
FIGURE 6.5 Overall design of the Corpus of Spontaneous Japanese.
6.5 Automatic Transcription of Spontaneous Presentation
6.5.1 Recognition Task
6.5.2 Language and Acoustic Modeling
CSJ:
Web:
TABLE 6.2
SpnL WebL
SpnL:
WebL:
SpnA:
RdA:
6.5.3 Recognition Results
SpnL, WebL
FIGURE 6.6 Test-set perplexity and OOV rate for the two language models.
SpnL, WebL, SpnA, RdA
FIGURE 6.7 Word accuracy for each combination of models.
SpnA, SpnL
6.5.4 Analysis on Individual Differences
FIGURE 6.8 Results of unsupervised adaptation.
6.5.4.1 Speaker Attributes
TABLE 6.3
0.28 0.32
-0.42 -0.47 -0.54 -0.62 -0.40 -0.33 -0.54 -0.51 0.33 0.52 0.38 0.38 -0.50 -0.41
-0.30 -0.31
6.5.4.2 Correlation Analysis
6.5.4.3 Regression Analysis
FIGURE 6.9 Speaking rate vs. word accuracy.
6.5.4.4 Selection of Major Attributes
FIGURE 6.10 Summary of correlation between various attributes.
TABLE 6.4
6.5.5 Discussion
6.6 Automatic Speech Summarization and Evaluation
6.6.1 Summarization of Each Sentence Utterance
6.6.1.1 Word Significance Score
FIGURE 6.11 An example of dependency structure.
6.6.1.2 Linguistic Score
6.6.1.3 Word Confidence Score
6.6.1.4 Word Concatenation Score
FIGURE 6.12 A phrase structure tree based on a dependency structure.
right-headed
left-headed
6.6.2 Summarization of Multiple Utterances
6.6.3 Evaluation
6.6.3.1 Word Network of Manual Summarization Results for Evaluation
6.6.3.2 Evaluation Data
6.6.3.3 Training Data for Summarization Models
FIGURE 6.13 Summarization of each utterance at 70% summarization ratio. Axis labels: I, I_L, I_L_C, I_L_T, I_L_C_T, RDM, SUB; REC, TRS.
6.6.3.4 Evaluation Results
FIGURE 6.14 Article summarizations at 30% summarization ratio. Axis labels: I, I_L, I_L_C, I_L_T, I_L_C_T, RDM, SUB; REC, TRS.
6.6.4 Discussion
6.7 Spontaneous Speech Recognition and Understanding Research Issues
6.7.1 Language Models and Corpora
6.7.2 Message-driven Speech Recognition and Understanding
FIGURE 6.15 A communication-theoretic view of speech generation and recognition.
6.7.3 Statistical Approaches and Speech Science
-
6.7.4 Research on the Human Brain
6.7.5 Dynamic Spectral Features
FIGURE 6.16Speech-generation and speech-perception processes.
6.8 Conclusion
References
7
Speaker Authentication
Qi Li¹ and Biing-Hwang Juang²
¹Bell Labs; ²Avaya Labs Research
CONTENTS
7.1 Introduction
FIGURE 7.1 Speaker authentication approaches.
Speaker authentication
7.1.1 Speaker Recognition and Verification
Speaker recognition
Speaker verification
hypothesis testing
Speaker identification
classification
FIGURE 7.2 A speaker verification system.
direct method
directly
fixed pass-phrase system
text-prompted system
text-independent SV system
closed test
open test
7.1.2 Verbal Information Verification
FIGURE 7.3 An example of verbal information verification by asking sequential questions. (Similar sequential tests can also be applied in speaker verification and other biometric or multi-modality verification.)
indirect method
7.2 Pattern Recognition in Speaker Authentication
7.2.1 Bayesian Decision Theory
a posteriori
Bayes decision rule
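The equations of this subsection did not survive extraction, so the following is only a minimal sketch of the general idea behind a Bayes decision rule: choose the class whose a posteriori probability is largest, which is equivalent to maximizing prior times class-conditional likelihood. The one-dimensional Gaussian class models, the class names, and all numeric values are illustrative assumptions, not taken from the chapter.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_decide(x, priors, class_models):
    """Pick the class with the largest prior * likelihood.

    The product is proportional to the a posteriori probability, so the
    normalizing term p(x) can be ignored when comparing classes.
    """
    scores = {c: priors[c] * gaussian_pdf(x, *class_models[c]) for c in priors}
    return max(scores, key=scores.get)

# Hypothetical (mean, variance) models for two classes.
class_models = {"speaker_a": (0.0, 1.0), "speaker_b": (2.0, 1.0)}
equal_priors = {"speaker_a": 0.5, "speaker_b": 0.5}
skewed_priors = {"speaker_a": 0.9, "speaker_b": 0.1}

decision_equal = bayes_decide(1.2, equal_priors, class_models)
decision_skewed = bayes_decide(1.2, skewed_priors, class_models)
```

With equal priors the observation 1.2 is assigned to the closer class mean; with a strong prior for the other class the decision flips, which is the effect the prior term is meant to capture.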
7.2.2 Stochastic Models for Stationary Process
pdf
speaker-dependent
7.2.3 Stochastic Models for Non-Stationary Process
FIGURE 7.4 Left-to-right hidden Markov model.
segmental K-means
7.2.4 Speech Segmentation
7.2.5 Statistical Verification
Neyman-Pearson
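Statistical verification in the Neyman-Pearson sense is commonly implemented as a likelihood-ratio test: accept the identity claim when the log-likelihood ratio between a target model and a background model exceeds a threshold. The sketch below illustrates that test under loud assumptions: single one-dimensional Gaussians stand in for the speaker and background models, and the parameters, feature values, and zero threshold are hypothetical rather than the chapter's.

```python
import math

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood of 1-D features under N(mean, var)."""
    total = 0.0
    for x in frames:
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return total / len(frames)

def verify(frames, target, background, threshold=0.0):
    """Likelihood-ratio test: accept when the log-likelihood ratio exceeds the threshold."""
    llr = avg_log_likelihood(frames, *target) - avg_log_likelihood(frames, *background)
    return llr, llr > threshold

# Hypothetical (mean, variance) pairs; real systems use HMMs or mixtures.
target_model = (1.0, 0.5)
background_model = (0.0, 1.0)

llr_true, accept_true = verify([0.9, 1.1, 1.0, 1.2], target_model, background_model)
llr_imp, accept_imp = verify([-0.2, 0.1, -0.4, 0.0], target_model, background_model)
```

Raising the threshold trades false acceptances for false rejections; the zero threshold here is only a placeholder.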
7.3 Speaker Verification System
FIGURE 7.5 A fixed-phrase speaker verification system.
whole-word or whole-phrase model
TABLE 7.1 Experimental Results in Average Equal-Error Rates
7.4 Verbal Information Verification
FIGURE 7.6 Utterance verification in verbal information verification (VIV).
7.4.1 Utterance Segmentation
7.4.2 Subword Hypothesis Testing
7.4.3 Confidence Measure Calculation
normalized confidence measure
7.4.4 Sequential Utterance Verification
step-down procedure
false rejection
false acceptance
equal-error rate
Definition 1: False rejection error on a set of utterances
Definition 2: False acceptance error on a set of utterances
Definition 3: Equal-error rate on a set of utterances
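The formal definitions above are not recoverable from the extracted text, so the sketch below shows only the conventional single-threshold reading of these quantities: the false rejection rate, the false acceptance rate, and the operating point where the two are (approximately) equal. It does not reproduce the chapter's sequential multi-utterance formulation, and the scores are made-up illustration data.

```python
def error_rates(true_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    fr = sum(1 for s in true_scores if s < threshold) / len(true_scores)
    fa = sum(1 for s in impostor_scores if s >= threshold) / len(impostor_scores)
    return fr, fa

def equal_error_rate(true_scores, impostor_scores):
    """Scan observed scores as thresholds; return (eer, threshold) where |FR - FA| is smallest."""
    best = None
    for t in sorted(set(true_scores) | set(impostor_scores)):
        fr, fa = error_rates(true_scores, impostor_scores, t)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2.0, t)
    return best[1], best[2]

# Hypothetical confidence scores for true claims and impostor claims.
true_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer, eer_threshold = equal_error_rate(true_scores, impostor_scores)
```

On this toy data one true claim is rejected and one impostor is accepted at the selected threshold, so both error rates are one in five.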
Example 1:
Example 2:
7.4.5 VIV Experimental Results
TABLE 7.2 Summary of the Experimental Results on Verbal Information Verification
FIGURE 7.7 An integrated voice authentication system combining verbal information verification and speaker verification.
7.5 Speaker Authentication by Combining SV and VIV
TABLE 7.3 Experimental Results without Adaptation in Average Equal-Error Rates
TABLE 7.4 Experimental Results with Adaptation in Average Equal-Error Rates
7.6 Summary
References
8
HMMs for Language Processing Problems
Richard M. Schwartz and John Makhoul
BBN Technologies, Verizon
CONTENTS
8.1 Introduction
8.2 Use of Probabilities
8.2.1 Hidden Markov Models
8.3 Name Spotting
8.4 Topic Classification
FIGURE 8.1 A hidden Markov model for topics. Each state can emit words for one topic. State T0 emits words corresponding to general language.
8.4.1 The Model
8.4.2 Estimating HMM Parameters
8.4.3 Classification
8.4.4 Experiments
8.5 Information Retrieval
8.5.1 A Bayesian Model for IR
8.5.2 Training the IR HMM
8.5.3 Performance
8.6 Event Tracking
8.7 Unsupervised Topic Detection
8.8 Summary
References
9
Statistical Language Models With Embedded Latent Semantic Knowledge
Jerome R. Bellegarda
Apple Computer, Inc.
CONTENTS
9.1 Introduction
9.1.1 Scope Locality
stocks fell sharply as a result of the announcement
stocks, as a result of the announcement, sharply fell
fell stocks
information aggregation
span extension
9.1.2 Syntactically–Driven Span Extension
headwords
words
stocks fell
9.1.3 Semantically–Driven Span Extension
document
stock market trends
stocks fell stocks fell
stocks fell
latent semantic analysis
9.1.4 Organization
9.2 Latent Semantic Analysis
9.2.1 Feature Extraction
9.2.2 Singular Value Decomposition
word vector
document vector
9.2.3 General Behavior
FIGURE 9.1 Improved Topic Separability in LSA Space.
9.3 LSA Feature Space
9.3.1 Word Clustering
9.3.2 Word Cluster Example
drawing rule
polysemy
drawing rule
FIGURE 9.2 Word Cluster Example (After [2]).
drawing a conclusion
breaking a rule
hysteria here
9.3.3 Document Clustering
FIGURE 9.3 Document Cluster Example.
9.3.4 Document Cluster Example
9.4 Semantic Classification
9.4.1 Framework Extension
pseudo document vector
9.4.2 Semantic Inference
semantic anchor
semantic inference
what is the time
what is the day
what time is the meeting
cancel the meeting
what is the
day cancel the
day cancel
what–is time time meeting time
when is the meeting
what time is the meeting
FIGURE 9.4 An Example of Semantic Inference for Command and Control.
9.4.3 Caveats
not
change popup to window
change window to popup
exact same point
9.5 N-gram+LSA Language Modeling
9.5.1 LSA Component
9.5.1.1
9.5.1.2
the
9.5.2 Integration with N-grams
9.5.3 Context Scope Selection
9.6 Smoothing
9.6.1 Word Smoothing
9.6.2 Document Smoothing
9.6.3 Joint Smoothing
9.7 Experiments
9.7.1 Experimental Conditions
9.7.2 Experimental Results
TABLE 9.1
9.7.3 Context Scope Selection
TABLE 9.2
9.8 Inherent Trade-Offs
9.8.1 Cross-Domain Training
TABLE 9.3
9.8.2 Discussion
9.9 Conclusion
References
Context-Dependent Vector Clustering for Speech Recognition
A Multi-Span Language Modeling Framework for Large Vocabulary Speech Recognition
Large Vocabulary Speech Recognition With Multi-Span Statistical Language Models
Exploiting Latent Semantic Information in Statistical Language Modeling
Robustness in Statistical Language Modeling: Review and Perspectives
Fast Update of Latent Semantic Spaces Using a Linear Transform Framework
A Novel Word Clustering Algorithm Based on Latent Semantic Analysis
Toward Unconstrained Command and Control: Data-Driven Semantic Inference
Natural Language Spoken Interface Control Using Data-Driven Semantic Inference
Large–Scale Sparse Singular Value Computations
Using Linear Algebra for Intelligent Information Retrieval
An Overview of Parallel Algorithms for the Singular Value and Dense Symmetric Eigenvalue Problems
Natural Language Call Routing: A Robust, Self–Organized Approach
Structure and Performance of a Dependency Language Model
Recognition Performance of a Structured Language Model
Building Probabilistic Models for Natural Language
Dialog Management in Vector–Based Call Routing
Language Model Adaptation Using Mixtures and an Exponentially Decaying Cache
Towards Better Integration of Semantic Predictors in Statistical Language Modeling
Lanczos Algorithms for Large Symmetric Eigenvalue Computations – Vol. 1 Theory
Recognizing and Using Knowledge Structures in Dialog Systems
Indexing by Latent Semantic Analysis
Adaptive Language Model Estimation Using Minimum Discrimination Estimation
Improving the Retrieval of Information from External Sources
Latent Semantic Indexing (LSI) and TREC–2
Language Modeling
Personalized Information Delivery: An Analysis of Information Filtering Methods
On Topic Identification and Dialogue Move Recognition
Topic–Based Language Modeling Using EM
Matrix Computations
Document Space Models Using Latent Semantic Analysis
Probabilistic Latent Semantic Analysis
Probabilistic Topic Maps: Navigating Through Large Text Collections
Modeling Long Distance Dependencies in Language: Topic Mixtures Versus Dynamic Cache Models
Self–Organized Language Modeling for Speech Recognition
Putting Language into Language Modeling
Using a Stochastic Context–Free Grammar as a Language Model for Speech Recognition
Putting Language Back into Language Modeling
Statistical Language Modeling Using a Variable Context
The Hub and Spoke Paradigm for CSR Evaluation
A Cache-based Natural Language Method for Speech Recognition
Cluster Expansion and Iterative Scaling for Maximum Entropy Language Models
Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge
How Well Can Passage Meaning Be Derived Without Using Word Order: A Comparison of Latent Semantic Analysis and Humans
Trigger–Based Language Models: A Maximum Entropy Approach
On Structuring Probabilistic Dependences in Stochastic Language Modeling
A Variable–Length Category–Based N–Gram Language Model
Latent Semantic Indexing: A Probabilistic Analysis
Beyond Word N-Grams
An Overview of Automatic Speech Recognition
The CMU Statistical Language Modeling Toolkit and its Use in the 1994 ARPA CSR Evaluation
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
Two Decades of Statistical Language Modeling: Where Do We Go From Here
Interactive Feature Induction and Logistic Regression for Whole Sentence Exponential Language Models
Language Representation
A Maximum Likelihood Model for Topic Classification of Broadcast News
An Explanation of the Effectiveness of Latent Semantic Indexing by Means of a Bayesian Regression Model
Combining Nonlocal, Syntactic and N-Gram Dependencies in Language Modeling
Recognition and Parsing of Context–Free Languages in Time $n^3$
Using Detailed Linguistic Structure in Language Modeling
Linguistic Features for Whole Sentence Maximum Entropy Language Models
Integration of Speech Recognition and Natural Language Processing in the MIT Voyager System
10
Semantic Information Processing of Spoken Language – How May I Help You?℠
A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright
AT&T Laboratories
CONTENTS
10.1 Introduction
AT&T’s ‘How May I Help You?’
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning, ... These semantic aspects of communication are irrelevant to the engineering problem.”
confirm
clarify
“Do you want to make a collect call?”
“Charge this call please”
“How do you want to charge this call, to a credit card or to a third number?”
“What is your home phone number?”
Construct Algebra
dialog motivators
inheritance hierarchy
‘is a’
‘has a’
10.2 Call-Classification
‘press one if you want x, press two if you want y’
‘please say collect, calling card’
‘press or say one if you want x’
‘How may I help you?’
“I want to reverse the charges on this call.”
“Can you tell me what time it is in Tokyo?”
“I was trying to call my sister and dialed a wrong number.”
“I’ve been trying to dial this number all day and can’t get through.”
“How much money do I owe you?”
“I don’t recognize this phone call to Tallahassee on October 4.”
“What’s this charge for one dollar and fifty cents?”
“I have a question about my bill.”
FIGURE 10.1 Call classification and routing in HMIHY.
‘How may I help you?’
‘How may I help you?’
FIGURE 10.2 Inheritance hierarchy of task knowledge in operator services.
perplexity
Evaluating Call Classification.
FIGURE 10.3 Histogram of utterance lengths.
false rejection
correct classification
true rejection rate
Remark:
“I want to know how to pay my bill”
10.3 Language Modeling for Recognition and Understanding
‘I want to make a’
‘collect call’
‘card call’
‘wrong’
‘wrong number’
‘dialed a wrong number’
‘dialed a wrong number’
‘dialed the wrong number’
FIGURE 10.4 A salient grammar fragment.
salient grammar fragments
• User
• yeah I’m not AT&T WIRELESS PHONE and when I got and she told me that I would be switched to 7 CENTS A MINUTES FOR ALL my AT&T long distance on that I was on 10 10 cents ONE RATE PLAN
• SLU
FIGURE 10.5 Natural spoken dialog in HMIHY.
10.4 Dialog
Machine: User: Machine: User: Machine: User: Machine: User: Machine:
Machine: User: Machine: User: Machine:
User: Machine:
10.5 Conclusions
www.research.att.com/~algor/hmihy
References
11
Machine Translation Using Statistical Modeling
Hermann Ney and F. J. Och
Aachen University of Technology, Germany
CONTENTS
Abstract.
11.1 Introduction
machine translation
written language text
spoken speech
spontaneous speech
• Statistical Decision Theory and Linguistics.
• Alignment and Lexicon Models.
• Alignment Templates: From Single Word to Word Groups.
• Experimental Results. written, spoken
• Speech Translation: The Integrated Approach. serial, integrated
11.2 Statistical Decision Theory and Linguistics
11.2.1 The Statistical Approach
11.2.2 Bayes Decision Rule for Written Language Translation
word
full-form
after
11.2.3 Related Approaches
FIGURE 11.1 Architecture of the translation approach based on Bayes decision rule. [Figure blocks: Source Language Text; Transformation; Global Search (maximize over); Lexicon Model; Alignment Model; Language Model; Transformation; Target Language Text.]
11.3 Alignment and Lexicon Models
11.3.1 Concept of Alignment Modelling
FIGURE 11.2 Example of an alignment for a German-English sentence pair. [German: “ja ich denke wenn wir das hinkriegen an beiden Tagen acht Uhr”; English: “well I think if we can make it at eight on both days”.]
exactly one
alignment models
11.3.2 Hidden Markov Models
‘hidden’
not
• baseline HMM
• homogeneous HMM
• context dependent HMM
11.3.3 Models IBM 1–5
before
• models IBM-1 and IBM-2: zero-order dependence.
first-order
zero-order
absolute
• model IBM-3: fertility concept.
• models IBM-4 and IBM-5: inverted alignments with first-order dependence.
absolute
relative
word context
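The model equations of this subsection are lost, so as an illustration of the simplest member of the family, the sketch below trains Model IBM-1 lexicon probabilities t(f|e) with the standard EM procedure, in which every alignment of a source word to a target word is equally likely a priori (zero-order dependence). Assumptions to note: f denotes source words and e target words, the empty word is omitted for brevity, and the three-sentence corpus is a made-up toy example rather than data from the chapter.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """EM training of IBM-1 lexicon probabilities t[(f, e)] = p(f | e).

    pairs: list of (source_words, target_words) sentence pairs.
    """
    source_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(source_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per target word
        for fs, es in pairs:
            for f in fs:
                # E-step: posterior of aligning f to each target word e.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per target word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy German-English corpus (hypothetical illustration data).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
lexicon = train_ibm1(corpus)
```

After a few iterations the co-occurrence pattern alone pulls probability mass toward the pairs das/the and Buch/book, even though no alignment was given.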
11.3.4 Training
exact
all
maximum approximation
11.3.5 Search
inverted
target
source
several
FIGURE 11.3 Illustration of search in statistical translation. [Figure blocks: SENTENCE IN SOURCE LANGUAGE; TRANSFORMATION; SEARCH: INTERACTION OF KNOWLEDGE SOURCES; KNOWLEDGE SOURCES; WORD + POSITION ALIGNMENT; LANGUAGE MODEL; BILINGUAL LEXICON; ALIGNMENT MODEL; WORD RE-ORDERING; SYNTACTIC AND SEMANTIC ANALYSIS; LEXICAL CHOICE; HYPOTHESES; TRANSFORMATION; SENTENCE GENERATED IN TARGET LANGUAGE.]
sets
FIGURE 11.4 Illustration of bottom-to-top search.
bottom-to-top
all
once
11.3.6 Algorithmic Differences between Speech Recognition and Language Translation
11.4 Alignment Templates: From Single Words to Word Groups
11.4.1 Concept
alignment template
FIGURE 11.5 Example of alignment templates for a German-English sentence pair. [English: “okay , how about the nineteenth at maybe , two o’clock in the afternoon ?”; German: “okay , wie sieht es am neunzehnten aus , vielleicht um zwei Uhr nachmittags ?”]
within
between
between
within
11.4.2 Training
each
11.4.3 Search
between the word groups
within
11.5 Experimental Results
11.5.1 The Task and the Corpus
before
don’t → do not
11.5.2 Offline Results
several
TABLE 11.1
TABLE 11.2
11.5.3 Integration into the Prototype System
stattrans
stattrans
repair
stattrans
prosody
prosody
11.5.4 Final Evaluation
slot filling
and
relative
TABLE 11.3
11.6 Speech Translation: The Integrated Approach
11.6.1 Principle
11.6.2 Practical Implementation
FIGURE 11.6 Integrated architecture of speech translation approach based on Bayes decision rule. [Figure blocks: Speech Input in Source Language; Acoustic Analysis; Global Search (maximize over); Acoustic Model; Lexicon Model; Alignment Model; Language Model; Translated Text in Target Language.]
joint
meaning
semantically
source
target
11.7 Summary
Acknowledgment
11.8 References
12
Modeling Topics for Detection and Tracking
James Allan
University of Massachusetts Amherst
CONTENTS
12.1 Topic Detection and Tracking
12.1.1 Topic and Events
event
topic
not
12.1.2 TDT Tasks
12.1.2.1 Segmentation
12.1.2.2 Cluster Detection
12.1.2.3 Tracking
12.1.2.4 New Event Detection
12.1.2.5 Link Detection
12.1.3 Corpora
each
12.1.4 Evaluation
FIGURE 12.1 A sample detection error tradeoff (DET) curve for the TDT tracking task with one training story. [Axes: False Alarm Rate; Miss Rate.]
minimum
12.2 Basic Topic Models
12.2.1 Vector Space
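The formulas of this subsection are not recoverable, so the following is only a minimal sketch of the usual vector-space comparison: a topic and a story are each represented as a term vector, and their similarity is the cosine of the angle between the vectors. Raw term counts are an assumption made for brevity (practical systems typically weight terms, for example by tf-idf), and the example texts are invented.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words vector with raw term counts (no stemming or weighting)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical topic description and two incoming stories.
topic = term_vector("bombing oklahoma city federal building")
story_on_topic = term_vector("federal building bombing trial oklahoma")
story_off_topic = term_vector("stock market trends fell sharply")

sim_on = cosine(topic, story_on_topic)
sim_off = cosine(topic, story_off_topic)
```

A tracking system would compare such a similarity against a threshold to decide whether a story belongs to the topic; the choice of threshold is what a DET curve summarizes.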
12.2.2 Language Models
12.3 Implementing the Models
12.3.1 Named Entities
President Bush
George Bush
12.3.2 Document Expansion
12.3.3 Clustering
12.3.4 Time Decay
12.4 Comparing Models
12.4.1 Nearest Neighbors
12.4.2 Decision Trees
12.4.3 Model-to-Model
12.5 Miscellaneous Issues
12.5.1 Deferral
12.5.2 Multi-modal Issues
third
12.5.3 Multi-lingual Issues
FIGURE 12.2 Screen snapshot of the Lighthouse system that was created to portray TDT topic clusters and their relationships.
12.6 Using TDT Interactively
12.6.1 Demonstrations
12.6.2 Timelines
Oklahoma
Oklahoma
McVeigh
Simpson
FIGURE 12.3 Overview of January-June 1998. The topic labeled monica lewinsky allegation is the highest ranked topic by the χ² measure. The pop-up on oregon school shooting shows significant named entities for that event. The other pop-up displays a sub-menu for obtaining more information on the name kip kinkel.
12.7 Modeling Events
12.8 Conclusion
research
References
ACM Transactions on Information Systems(TOIS)
Topic Detection and Tracking: Event-based Information Organization