Statistical Pronunciation Modeling for Non-native Speech
Dissertation
Rainer Gruhn
Nov. 14th, 2008
Institute of Information Technology University of Ulm, Germany
In cooperation with Advanced Telecommunication Research Labs, Kyoto
Outline
Introduction
→ Motivation and background
→ Thesis objectives
Hidden Markov Models as statistical lexicon
→ Initialization and training
→ Application
Experiments
→ ATR non-native speech database
→ Evaluation
Closing
→ Thesis contributions
→ Publications
Introduction | HMMs as statistical lexicon | Experiments | Closing
Non-native English speech
Mispronunciations include phoneme insertions, deletions and substitutions (e.g. in German English: /th/)
Different patterns for each language (→ accent)
Example: “Certainly. What time do you anticipate checking in?”
Example accents: Japanese, Indonesian, Chinese, French
Relevant in many applications of speech recognition:
→ Automatic tourist information system
→ Car navigation with user going abroad
→ Speech recognition in the media domain
Schematic Outline of a Speech Recognition System
[Figure: block diagram. Speech → feature extraction → features → n-best recognition → n-best word hypotheses → rescoring → result. Knowledge sources shown: acoustic model, language model, dictionary, additional knowledge.]
Improve performance for individual speakers:
→ acoustic model adaptation (e.g. Maximum A Posteriori)
Common approach for non-native speakers:
→ rule-based dictionary enhancement (Goronzy 2002, Mayfield-Tomokiyo 2001)
[Figure: the same block diagram, now with the proposed statistical HMM lexicon shown in place of the "additional knowledge" box.]
Proposed method:
→ rescoring with HMMs as statistical lexicon
Common Approach: Rules
Common approach:
→ Phoneme confusion rules (data-driven / knowledge-based)
→ Apply rules to the pronunciation dictionary
Example:
Recognition result: s ae ng k - i uw
Transcription: th ae ng k - y uw
Comparison: $S ae ng k - $S uw
Generated rules: th→s, y→i
Resulting dictionary: thank: /th ae ng k/, /s ae ng k/; you: /y uw/, /i uw/
Rule set: th→s, y→i
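The rule-based approach can be sketched in a few lines of Python. This is only an illustration under a strong assumption: the recognition result and the transcription have the same length, so phonemes can be compared position by position (handling insertions and deletions would need an edit-distance alignment). The names `derive_rules` and `apply_rules` are hypothetical and do not come from the thesis.

```python
from itertools import product

def derive_rules(recognized, transcription):
    """Collect substitution rules (canonical, observed) from an equal-length pair."""
    if len(recognized) != len(transcription):
        raise ValueError("sketch assumes equal-length phoneme sequences")
    return {(ref, hyp) for ref, hyp in zip(transcription, recognized) if ref != hyp}

def apply_rules(pronunciation, rules):
    """Generate every variant of one pronunciation by optionally applying each rule."""
    options = []
    for phoneme in pronunciation:
        # The canonical phoneme stays first, followed by its observed substitutes.
        alternatives = [phoneme] + sorted(hyp for ref, hyp in rules if ref == phoneme)
        options.append(alternatives)
    return [list(variant) for variant in product(*options)]

recognized = "s ae ng k i uw".split()
transcription = "th ae ng k y uw".split()
rules = derive_rules(recognized, transcription)

dictionary = {"thank": "th ae ng k".split(), "you": "y uw".split()}
enhanced = {word: apply_rules(pron, rules) for word, pron in dictionary.items()}
```

Because `apply_rules` expands every combination of applicable rules, the number of dictionary entries can grow quickly for longer words with several confusable phonemes.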
Problems about Rules
Pronunciation variations also depend on context
Variations unseen in training data cannot be modeled
Knowledge-based: manual rule generation
When rules are applied to the pronunciation dictionary, there is a tradeoff between:
→ Large dictionary (including all possible variations as entries)
→ Losing information (choosing to apply only some rules)
Thesis Objective
Non-native speech: many pronunciation variations make automatic speech recognition difficult
Improve automatic speech recognition of non-native speakers
Target: model those variations automatically and statistically
Cover all pronunciation variations
Approach: train a discrete Hidden Markov Model (HMM) for each word as pronunciation model
Statistical Lexicon
HMMs to represent pronunciations (not explicitly representing the confusions)
One discrete HMM for each word
Initialization on baseline lexicon
Training on phoneme sequences generated by phoneme recognition
Initialization, Training and Application
[Figure: overview of the three steps. Initialization: a word HMM (Enter, one state per phoneme, Exit) built from the baseline lexicon. Training: phoneme recognition on speech data generates phoneme sequences, and each word pronunciation model is trained on all instances of that word. Application of models: the recognized phoneme sequence is used to compute a pronunciation score for each n-best hypothesis.]
Word Model Example: AND
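AND: /ae n d/, /ax n d/
[Figure: discrete HMM with a chain of three states between Enter and Exit; labels mark the transitions, the states and the probability distributions.]
State 1: ae 0.495, ax 0.495, ah 0.0002, b 0.0002, d 0.0002, n 0.0002, …
State 2: n 0.99, ae 0.0002, ax 0.0002, ah 0.0002, b 0.0002, d 0.0002, …
State 3: d 0.99, ae 0.0002, ax 0.0002, ah 0.0002, b 0.0002, n 0.0002, …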
Model Initialization
Given: standard pronunciation dictionary
One discrete HMM for each word
Number of states equals number of baseline phonemes (+ enter, exit states)
Several pronunciation variants in the dictionary are integrated into one word model
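A rough Python sketch of such an initialization follows. It rests on assumptions that are not stated on the slides: all dictionary variants of a word have the same length, the dictionary phonemes of each state share a probability mass of 0.99 (matching the AND example), and the remaining mass is spread evenly over all other phonemes. The function name `init_word_model` and the toy phoneme inventory are hypothetical; the non-emitting enter and exit states are left out.

```python
def init_word_model(variants, phoneme_set, dict_mass=0.99):
    """Build one emission distribution per state from a word's dictionary variants."""
    lengths = {len(v) for v in variants}
    if len(lengths) != 1:
        raise ValueError("sketch assumes all variants have the same length")
    states = []
    for position in range(lengths.pop()):
        # Phonemes the dictionary lists at this position share the main mass.
        listed = {v[position] for v in variants}
        others = [p for p in phoneme_set if p not in listed]
        floor = (1.0 - dict_mass) / len(others) if others else 0.0
        share = (dict_mass if others else 1.0) / len(listed)
        states.append({p: (share if p in listed else floor) for p in phoneme_set})
    return states

# Toy inventory for illustration only; a real phoneme set is much larger.
PHONEMES = ["ae", "ax", "ah", "b", "d", "n", "m", "t"]
and_model = init_word_model([["ae", "n", "d"], ["ax", "n", "d"]], PHONEMES)
```

With these settings the first state assigns 0.495 each to /ae/ and /ax/, and the second state assigns 0.99 to /n/, in line with the AND example. The floor values differ from the slide because the toy inventory is smaller.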
Model Training
Segmentation of training data into words
Phoneme recognition
Train discrete HMM for each word on phoneme sequence
Default unseen words to baseline lexicon phoneme sequence(s)
Training of Discrete HMMs
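Phoneme recognition to generate phoneme sequences
[Figure: speech data → phoneme recognition → phoneme sequences, e.g. "ax n d w ih th ah sh ow" and "ax n d eh n d ah n t"; the segments belonging to the word AND are collected.]
Train word pronunciation model on all instances of that word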
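The slides do not say which training algorithm is used. The Python sketch below swaps in Viterbi-style (hard alignment) re-estimation of the emission distributions as one possible way to train a word model on all instances of a word. It makes simplifying assumptions: transition probabilities are ignored, every state must emit at least one phoneme (so deletions cannot be modeled), and the phoneme inventory, training sequences and helper names (`peaked`, `align`, `train`) are all made up for illustration.

```python
import math
from collections import Counter

PHONEMES = ["ae", "ax", "ah", "b", "d", "n", "m", "t"]  # toy inventory

def peaked(high):
    """Distribution with 0.99 shared by the phonemes in `high`, 0.01 by the rest."""
    rest = [p for p in PHONEMES if p not in high]
    return {p: 0.99 / len(high) if p in high else 0.01 / len(rest) for p in PHONEMES}

def align(states, seq):
    """Best monotonic assignment of phonemes to states (each state used at least once)."""
    n, t = len(states), len(seq)
    if t < n:
        raise ValueError("sketch cannot model deletions: sequence shorter than model")
    neg_inf = float("-inf")
    score = [[neg_inf] * n for _ in range(t)]
    back = [[0] * n for _ in range(t)]
    score[0][0] = math.log(states[0][seq[0]])
    for i in range(1, t):
        for s in range(n):
            stay = score[i - 1][s]
            move = score[i - 1][s - 1] if s > 0 else neg_inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best > neg_inf:
                score[i][s] = best + math.log(states[s][seq[i]])
                back[i][s] = prev
    # Trace back from the last state at the last phoneme.
    path, s = [], n - 1
    for i in range(t - 1, -1, -1):
        path.append(s)
        s = back[i][s]
    return path[::-1]

def train(states, sequences, iterations=3, smoothing=1e-3):
    """Re-estimate emissions from hard alignments, with additive smoothing."""
    for _ in range(iterations):
        counts = [Counter() for _ in states]
        for seq in sequences:
            for phoneme, s in zip(seq, align(states, seq)):
                counts[s][phoneme] += 1
        states = [
            {p: (c[p] + smoothing) / (sum(c.values()) + smoothing * len(PHONEMES))
             for p in PHONEMES}
            for c in counts
        ]
    return states

initial = [peaked(["ae", "ax"]), peaked(["n"]), peaked(["d"])]
# Hypothetical recognized instances of the word AND.
instances = [s.split() for s in ["ax n d", "ae n d", "ae n t", "ax m d", "ax n n d"]]
trained = train(initial, instances)
```

After training on these toy instances, the second state still prefers /n/ but gives /m/ noticeably more weight than unseen phonemes, which is the kind of shift shown in the trained AND model.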
Word Model After Training
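AND: /ae n d/, /ax n d/
[Figure: the same discrete HMM (Enter, three states, Exit) with the trained probability distributions.]
State 1: ae 0.5, ax 0.3, ah 0.15, ih 0.05, d 0.0001, n 1.0e-6, …
State 2: n 0.7, m 0.2, ng 0.005, hh 0.002, b 0.0001, d 1.0e-6, …
State 3: d 0.7, t 0.2, b 0.05, ae 0.0001, ax 0.0001, ah 1.0e-6, …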
Model Application
Standard n-best decoding of test set
1-best phoneme recognition of whole utterance
Calculate pronunciation score of each n-best hypothesis
Select best hypothesis based on pronunciation score with weighted language model score
[Figure: processing flow. Test utterance → n-best recognition → n-best string; test utterance → phoneme recognition → phoneme sequence; both feed a Viterbi alignment against the proposed statistical HMM lexicon, which yields the pronunciation score; a max. score selector combines it with the language model (LM) score and outputs the best from n-best.]
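The pronunciation score of one hypothesis can be sketched as a Viterbi pass of the recognized phoneme sequence over the concatenated word models. The Python code below is a simplified illustration, not the thesis implementation: transition probabilities are ignored, every state must consume at least one phoneme, and unseen phonemes get a hypothetical floor probability. The AND distributions reuse a few values from the trained example; the model for "end" is invented.

```python
import math

def pron_score(words, word_models, phonemes, floor=1e-4):
    """Best-path log score of a phoneme sequence against concatenated word models."""
    states = [state for w in words for state in word_models[w]]
    neg_inf = float("-inf")
    if not states or not phonemes:
        return neg_inf
    prev = [neg_inf] * len(states)
    for i, ph in enumerate(phonemes):
        cur = [neg_inf] * len(states)
        for s, emit in enumerate(states):
            if i == 0:
                # The path has to start in the first state.
                best = 0.0 if s == 0 else neg_inf
            else:
                # Either stay in the same state or advance from the previous one.
                best = max(prev[s], prev[s - 1] if s > 0 else neg_inf)
            if best > neg_inf:
                cur[s] = best + math.log(emit.get(ph, floor))
        prev = cur
    return prev[-1]  # the path has to end in the last state

word_models = {
    "and": [{"ae": 0.5, "ax": 0.3}, {"n": 0.7, "m": 0.2}, {"d": 0.7, "t": 0.2}],
    "end": [{"eh": 0.9}, {"n": 0.9}, {"d": 0.9}],  # invented for the example
}
observed = ["ax", "n", "t"]
score_and = pron_score(["and"], word_models, observed)
score_end = pron_score(["end"], word_models, observed)
```

Here `score_and` is higher than `score_end`, because the trained AND model expects /ax/ and /t/ as plausible variants while the invented "end" model falls back to the floor probability for both.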
Rescoring of N-best
Phoneme sequence: ae n l eh n w ih ch ih l eh k t ix s t ey
N-best hypotheses with pronunciation score:
anywhere you'd like to stay: -82.5
and when would you like to stay: -69.0
and what will to I like to stay: -75.0
…
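The final selection step can be illustrated with a short Python sketch that combines each pronunciation score with a weighted language model score and picks the maximum. The pronunciation scores are the three values from the slide; the language model scores, the weight and the name `select_best` are made up for the example, since the slides give no concrete values.

```python
def select_best(hypotheses, lm_weight):
    """Return the hypothesis maximizing pron + lm_weight * lm."""
    return max(hypotheses, key=lambda h: h["pron"] + lm_weight * h["lm"])

n_best = [
    # "lm" values are hypothetical log scores, not taken from the thesis.
    {"text": "anywhere you'd like to stay", "pron": -82.5, "lm": -20.0},
    {"text": "and when would you like to stay", "pron": -69.0, "lm": -22.0},
    {"text": "and what will to I like to stay", "pron": -75.0, "lm": -30.0},
]
best = select_best(n_best, lm_weight=0.5)
```

With a weight of 0.5 the combined scores are -92.5, -80.0 and -90.0, so the second hypothesis is selected. A much larger weight lets the language model dominate, which is why the weight has to be tuned.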
ATR Non-native Speech Database
country China France Germany Indonesia Japan all
#speakers 17 15 15 15 28 96
Existing comparable databases (large, multi-accent):
→ M-ATC, Hiwire: noisy, special military vocabulary
→ Crosstowns: unavailable to the public
Collected in this work
One of the largest non-native English speech databases
Data available at ATR
Total: 22 h of speech
Per speaker: 12 minutes training, 2 minutes test data (2 hotel reservation dialogs)
Read speech
Content: uniform set of
→ hotel reservation dialogs
→ phonetically balanced sentences
→ digit sequences
Speaker skill: various, rated
Database Collection
Non-nativeness vs. anxiousness:
– Instructor in same room, nodding
– Non-intimidating environment
– Words where speaker was not sure how to pronounce: speaker had to try
– Speakers could repeat sentence until satisfied
Experimental Setup
Baseline dictionary: 7311 words, 8875 entries
→ 7311 pronunciation HMMs
10-best word recognition
Generate pronunciation HMMs separately for each accent group
Acoustic model: trained on Wall Street Journal database
Word bigram LM, trained on travel arrangement task text data
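Phoneme/word error rate:
$$\mathrm{ERR} = \frac{\mathrm{INS} + \mathrm{DEL} + \mathrm{SUB}}{N_{\mathrm{total}}}$$
Relative error rate improvement:
$$\frac{\mathrm{ERR}_{\mathrm{before}} - \mathrm{ERR}_{\mathrm{after}}}{\mathrm{ERR}_{\mathrm{before}}}$$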
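As an illustration of these two measures, the Python sketch below counts substitutions, deletions and insertions with a standard Levenshtein alignment and then computes the error rate and the relative improvement. It is a generic textbook formulation, not the scoring tool used in the experiments; the function names and the example numbers are assumptions.

```python
def error_counts(reference, hypothesis):
    """Minimum-edit alignment; returns (substitutions, deletions, insertions)."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # Each cell holds (total_errors, substitutions, deletions, insertions).
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)
    for i in range(1, rows):
        for j in range(1, cols):
            if reference[i - 1] == hypothesis[j - 1]:
                table[i][j] = table[i - 1][j - 1]
                continue
            t, s, d, n = table[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = table[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = table[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            table[i][j] = min(substitution, deletion, insertion)
    return table[-1][-1][1:]

def error_rate(reference, hypothesis):
    """(INS + DEL + SUB) divided by the number of reference tokens."""
    return sum(error_counts(reference, hypothesis)) / len(reference)

def relative_improvement(err_before, err_after):
    """(ERR_before - ERR_after) / ERR_before."""
    return (err_before - err_after) / err_before

ref = "and when would you like to stay".split()
hyp = "and what will you like to stay".split()
wer = error_rate(ref, hyp)
gain = relative_improvement(0.40, 0.35)  # hypothetical error rates
```

The same functions work for phoneme error rates when the token lists contain phonemes instead of words.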
Phoneme Recognition
Both pronunciation model training and application steps require phoneme recognition
Error rate calculated relative to canonical transcription
Recognition of whole utterance
Phoneme bigram as phonotactic constraint
[Figure: bar chart of phoneme error rate (y-axis ticks 40 to 70) per accent group (CH, FR, GER, IN, JAP, Average); series: Monophone, Triphone.]
Pronunciation Scoring: Results
Accent type CH FR GER IN JP Avg
rel. WER impr. 11.9 8.3 5.9 5.4 8.0 8.2
[Figure: bar chart of word error rates for non-native speech recognition, with and without pronunciation rescoring (y-axis ticks 25 to 60) per accent group (CH, FR, GER, IN, JAP, Average); series: Baseline, Rescoring.]
Comparing to Standard Technology
Standard approach to adjust for non-native speech: rule-based dictionary modification
Comparison of relative improvements
Evaluated for the Japanese speaker set
[Figure: bar chart of relative word error rate improvement in % (y-axis ticks 0 to 8); bars: Rules, Statistical Lexicon.]
[Figure: improvement vs. pronunciation alternatives added to dictionary; relative improvement in % (y-axis ticks 0 to 4.5) plotted against the number of pronunciations in the dictionary.]
Thesis Contributions
Theoretical
→ Integrated framework for statistical pronunciation modeling
→ Both learned and unseen variations are considered
→ Data-driven: no expert knowledge about accent is required
Practical
→ Collected a large non-native English speech database
→ 22 h of speech uttered by 96 speakers
→ Among the largest such databases existing
Experimental
→ Consistently improved performance for any type of accent
→ Largest improvement achieved: 11.9% relative WER reduction
Publications (Excerpt)
1. A Statistical Lexicon for Non-Native Speech Recognition. Rainer Gruhn, Konstantin Markov, Satoshi Nakamura, ICSLP 2004
2. Discrete HMMs for statistical pronunciation modeling. Rainer Gruhn, Konstantin Markov, Satoshi Nakamura, SLP 2004
3. A multi-accent non-native English database. Rainer Gruhn, Tobias Cincarek, Satoshi Nakamura, ASJ 2004
4. A Statistical Lexicon Based on HMMs. Rainer Gruhn, Satoshi Nakamura, IPSJ 2004
5. Probability Sustaining Phoneme Substitution for Non-Native Speech Recognition. Rainer Gruhn, Konstantin Markov, Satoshi Nakamura, ASJ 2002
6. CORBA-based Speech-to-Speech Translation System. Rainer Gruhn, Koji Takashima, Atsushi Nishino, Satoshi Nakamura, ASRU 2001
7. A CORBA based Speech-to-Speech Translation System. Rainer Gruhn, Koji Takashima, Atsushi Nishino, Satoshi Nakamura, ASJ 2001
8. Multilingual Speech Recognition with the CALLHOME Corpus. Rainer Gruhn, Satoshi Nakamura, ASJ 2001
9. Cellular Phone Based Speech-To-Speech Translation System ATR-MATRIX. Rainer Gruhn, Harald Singer, Hajime Tsukada, Atsushi Nakamura, Masaki Naito, Atsushi Nishino, Yoshinori Sagisaka, Satoshi Nakamura, ICSLP 2000
10. Towards a Cellular Phone Based Speech-To-Speech Translation Service. Rainer Gruhn, Satoshi Nakamura, Yoshinori Sagisaka, MSC 2000
11. Scalar Quantization of Cepstral Parameters for Low Bandwidth Client-Server Speech Recognition Systems. Rainer Gruhn, Harald Singer, Yoshinori Sagisaka, ASJ 1999
Total: 46 publications
Patents
2001-222292 A computer with a speech processing system and program in memory
2001-222531 A computer with a program in memory that provides speech translation and feedback
2002-135642 A speech to speech translation system
2002-304392 A speech to speech translation system
2002-311983 A speech to speech translation system
2002-320037 A speech to speech translation system
2005-234504 A method for training HMM pronunciation models for speech recognition
2005-292770 A method for acoustic model generation and speech recognition
2006-84965 A system and program for speech data collection
2006-84966 A method and program for automatic rating of spoken speech
Total: 10 patents, all granted by the Japanese Patent Office
Future Directions
Applicability on native speech
Baseline dictionary with no pronunciation variants
Speech controlled services on mobile devices
Experiments on word level → smaller units?
→ Syllables
→ N-phones
Special states to model insertion errors
Accent recognition
! THANK YOU !