Topic-independent Speaking-Style Transformation of Language Model for Spontaneous Speech Recognition
TRANSCRIPT
Topic-independent Speaking-Style Transformation of Language model for
Spontaneous Speech Recognition
Yuya Akita, Tatsuya Kawahara
Introduction
• Spoken style vs. written style
  – Combination of document and spontaneous corpora
• Irrelevant linguistic expressions
  – Model transformation
    • Simulated spoken-style text by randomly inserting fillers
    • Weighted finite-state transducer framework
    • Statistical machine translation framework
• Problem with model transformation methods
  – Small corpus, data sparseness
  – One solution: POS tags
Statistical Transformation of Language model
• Posterior probability:

    P(Y|X) = P(X|Y) P(Y) / P(X)

  – X: source language model (document style)
  – Y: target language model (spoken language)
• So,

    P(Y) = P(Y|X) P(X) / P(X|Y)

  – P(X|Y) and P(Y|X) are the transformation models
• Transformation models can be estimated using a parallel corpus
  – n-gram count:

    N_LM(y) = N_LM(x) · P(y|x) / P(x|y)
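The count transformation above can be sketched in a few lines. This is a toy illustration, not the authors' implementation; the n-gram counts and probabilities are invented values.

```python
# Sketch of the count transformation N_LM(y) = N_LM(x) * P(y|x) / P(x|y).
# All numbers below are toy values, not taken from the paper.

def transform_count(n_x, p_y_given_x, p_x_given_y):
    """Derive a spoken-style n-gram count from a document-style one."""
    return n_x * p_y_given_x / p_x_given_y

# A document-style n-gram "x" observed 1000 times; the transformation model
# maps it to spoken-style "y" with probability 0.3, and the reverse
# direction has probability 0.6.
n_y = transform_count(1000, 0.3, 0.6)
print(n_y)  # 500.0
```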
Statistical Transformation of Language model (cont.)
• Data sparseness problem for the parallel corpus
  – POS information
    • Linear interpolation
    • Maximum entropy
Training
• Use aligned corpus
  – Word-based transformation probability:

    P_word(y|x) = N(x, y) / N(x)

  – POS-based transformation probability:

    P_POS(y|x) = N_POS(x, y) / N_POS(x)

  – P_word(x|y) and P_POS(x|y) are estimated accordingly
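A minimal sketch of the relative-frequency estimation from aligned word pairs. The corpus of aligned (document-style, spoken-style) pairs below is invented for illustration.

```python
from collections import Counter

# Sketch: relative-frequency estimate P_word(y|x) = N(x, y) / N(x)
# from an aligned corpus. The word pairs are toy examples.

aligned_pairs = [("is", "is"), ("is", "'s"), ("is", "'s"), ("not", "n't")]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(x for x, _ in aligned_pairs)

def p_word(y, x):
    """Probability that document-style word x surfaces as spoken-style y."""
    return pair_counts[(x, y)] / source_counts[x]

print(p_word("'s", "is"))  # 2/3
```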
Training (cont.)
• Back-off scheme:

    P(y|x) = P_word(y|x)   if the word pair (x, y) exists
           = P_POS(y|x)    else if the POS pair exists

• Linear interpolation scheme:

    P(y|x) = λ P_word(y|x) + (1 - λ) P_POS(y|x)

• Maximum entropy scheme:

    P(y|x) = (1/Z) exp( Σ_i λ_i f_i(x, y) )

  – The ME model is applied to every n-gram entry of the document-style model
  – A spoken-style n-gram is generated if the transformation probability is larger than a threshold
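The back-off and linear-interpolation schemes can be sketched as follows. The probability tables, POS tags, and the interpolation weight are assumed toy values, not trained parameters.

```python
# Sketch of combining word-based and POS-based transformation probabilities.
# p_word / p_pos are toy lookup tables; lam is an assumed interpolation weight.

p_word = {("is", "'s"): 0.4}       # word-pair estimates
p_pos = {("VBZ", "VBZ"): 0.1}      # POS-pair estimates
pos_of = {"is": "VBZ", "'s": "VBZ"}

def backoff(x, y):
    """Use the word estimate if the pair was seen, else back off to POS."""
    if (x, y) in p_word:
        return p_word[(x, y)]
    return p_pos.get((pos_of[x], pos_of[y]), 0.0)

def interpolate(x, y, lam=0.7):
    """Linear interpolation: lam * P_word + (1 - lam) * P_POS."""
    return lam * p_word.get((x, y), 0.0) + (1 - lam) * p_pos.get((pos_of[x], pos_of[y]), 0.0)

print(backoff("is", "'s"))      # 0.4  (word pair exists)
print(backoff("'s", "is"))      # 0.1  (falls back to the POS pair)
print(interpolate("is", "'s"))  # 0.7 * 0.4 + 0.3 * 0.1 = 0.31
```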
Experiments
• Training corpora:
  – Baseline corpus: National Congress of Japan, 71M words
  – Parallel corpus: Budget Committee meetings in 2003, 666K words
  – Corpus of Spontaneous Japanese, 2.9M words
• Test corpus:
  – Another Budget Committee meeting in 2003, 63K words
Experiments (cont.)
• Evaluation of the generality of the transformation model
• LM (results shown in a figure on the slide)
Experiments (cont.)
(results shown in a figure on the slide)
Conclusions
• Proposed a novel statistical transformation approach for language models
Non-stationary n-gram model
Concept
• Probability of a sentence
  – n-gram LM:

    P(s) = Π_{i=1}^{n} P(w_i | w_{i-n+1}, ..., w_{i-1})

• Actually,

    P(s) = Π_{i=1}^{n} P(w_i | w_1, pl_1, ..., w_{i-1}, pl_{i-1})
         ≈ Π_{i=1}^{n} P(w_i | w_{i-n+1}, pl_{i-n+1}, ..., w_{i-1}, pl_{i-1})
         ≈ Π_{i=1}^{n} P(w_i | w_{i-n+1}, ..., w_{i-1})

  where pl_i denotes the position of word w_i
• Long-distance and word-position information is lost when the Markov assumption is applied
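The position-blindness of a stationary n-gram can be sketched directly: the same word pair gets the same probability wherever it occurs in the sentence. The bigram table is a toy value.

```python
# Sketch: a stationary bigram assigns the same probability to a word pair
# regardless of where it occurs, illustrating the information the Markov
# assumption discards. The probability is a toy value.

bigram = {("I", "think"): 0.2}

def p_stationary(prev, w, position):
    """Stationary bigram lookup: the position argument is simply ignored."""
    return bigram.get((prev, w), 0.0)

# Same value at sentence start and in the middle of the sentence:
print(p_stationary("I", "think", position=1))  # 0.2
print(p_stationary("I", "think", position=9))  # 0.2
```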
Concept (cont.)
• Condition the n-gram on the word position t:

    P(s) = Π_{i=1}^{n} P(w_i | w_1, pl_1, ..., w_{i-1}, pl_{i-1})
         ≈ Π_{i=1}^{n} P(w_i | w_{i-n+1}, ..., w_{i-1}, t)
Training (cont.)
• ML estimation:

    p(w_i | w_{i-n+1}, ..., w_{i-1}, t) = C(w_{i-n+1}, ..., w_i, t) / C(w_{i-n+1}, ..., w_{i-1}, t)

• Smoothing
  – Use lower order
  – Use small bins
  – Transform with smoothed normal n-gram
• Combination
  – Linear interpolation
  – Back-off
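The ML estimate is just a ratio of position-tagged counts, which can be sketched as below. The tiny event list (bigram case, with t as a position bin) is invented for illustration.

```python
from collections import Counter

# Sketch of the ML estimate p(w_i | w_{i-1}, t) = C(w_{i-1}, w_i, t) / C(w_{i-1}, t)
# where t is a position bin. The tiny corpus below is invented.

# (previous word, word, position bin) events
events = [("<s>", "I", 0), ("I", "think", 0), ("<s>", "I", 0), ("I", "am", 0)]

tri = Counter(events)                       # C(w_{i-1}, w_i, t)
ctx = Counter((h, t) for h, _, t in events)  # C(w_{i-1}, t)

def p_ns(w, h, t):
    """Non-stationary bigram probability from counts."""
    return tri[(h, w, t)] / ctx[(h, t)]

print(p_ns("think", "I", 0))  # 0.5
print(p_ns("I", "<s>", 0))    # 1.0
```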
Smoothing with lower order (cont.)
• Additive smoothing:

    P(w_i | w_{i-1}, t) = (1 + C(w_{i-1}, w_i, t)) / (V + C(w_{i-1}, t))

• Back-off smoothing:

    P(w_i | w_{i-1}, t) = P_GT(w_i | w_{i-1}, t)     if C(w_{i-1}, w_i, t) > 0
                        = α(w_{i-1}, t) P(w_i | t)   otherwise

• Linear interpolation:

    P̂(w_i | w_{i-1}, t) = λ_t P(w_i | w_{i-1}, t) + (1 - λ_t) P(w_i | t)
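Additive (add-one) smoothing of the position-dependent bigram can be sketched as follows; the corpus and the vocabulary size V are toy assumptions.

```python
from collections import Counter

# Sketch of add-one smoothing for the position-dependent bigram:
# P(w_i | w_{i-1}, t) = (1 + C(w_{i-1}, w_i, t)) / (V + C(w_{i-1}, t)).
# Corpus and vocabulary size are toy assumptions.

V = 4  # assumed vocabulary size
events = [("I", "think", 0), ("I", "think", 0), ("I", "am", 0)]
tri = Counter(events)
ctx = Counter((h, t) for h, _, t in events)

def p_add1(w, h, t):
    return (1 + tri[(h, w, t)]) / (V + ctx[(h, t)])

print(p_add1("think", "I", 0))  # (1 + 2) / (4 + 3) = 3/7
print(p_add1("you", "I", 0))    # unseen pair still gets 1/7
```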
Smoothing with small bins (k=1) (cont.)
• Back-off smoothing:

    P(w_i | w_{i-1}, t) = P_GT(w_i | w_{i-1}, t)          if C(w_{i-1}, w_i, t) > 0
                        = β(w_{i-1}, t) P̃(w_i | w_{i-1})  otherwise

    with P̃(w_i | w_{i-1}) = P_GT(w_i | w_{i-1})

• Linear interpolation:

    P̂(w_i | w_{i-1}, t) = λ_t P(w_i | w_{i-1}, t) + (1 - λ_t) P̃(w_i | w_{i-1})

    with P̃(w_i | w_{i-1}) = λ P(w_i | w_{i-1}) + (1 - λ) P(w_i)

• Hybrid smoothing:

    P̂(w_i | w_{i-1}, t) = λ_t P(w_i | w_{i-1}, t) + (1 - λ_t) P̃(w_i | w_{i-1})
Transformation with smoothed n-gram

• Novel method:

    P(w_i | w_{i-1}, t) = (1/Z) exp( -(t - Mean(w_i))² / Var(w_i) ) P_SMOOTHED(w_i | w_{i-1})

  – If |t - Mean(w)| decreases, the word is more important at that position
  – Var(w) is used to balance the Mean(w) term for active words
  – Active word: a word that can appear at any position in a sentence
• Combined with back-off smoothing & linear interpolation
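The position-weighting term can be sketched on its own; the per-word means and variances below are toy values (a high variance stands in for an "active" word).

```python
import math

# Sketch of the position weight exp(-(t - Mean(w))**2 / Var(w)) that scales a
# smoothed bigram before renormalisation by Z. Means/variances are toy values.

mean = {"however": 1.0, "the": 5.0}
var = {"however": 2.0, "the": 50.0}  # large variance ~ "active" word

def weight(w, t):
    return math.exp(-((t - mean[w]) ** 2) / var[w])

# A position-sensitive word is strongly preferred near its mean position...
print(weight("however", 1) > weight("however", 8))  # True
# ...while a high-variance (active) word is barely position-sensitive:
print(weight("the", 1) / weight("the", 8))
```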
Experiments
Observation: Marginal position & middle position
Experiments (cont.)
• NS bigram
Experiments (cont.)
• Comparison with three smoothing techniques
Experiments (cont.)
• Error rate with different bins
Conclusions
• The traditional n-gram model is enhanced by relaxing its stationarity assumption and exploiting word-position information in language modeling
Two-way Poisson Mixture model
Essential
• Poisson distribution:

    P(n | λ) = e^{-λ} λ^n / n!

• Poisson mixture model:

    P(X = x | Y = k) = Σ_{r=1}^{R_k} π_{kr} Π_{j=1}^{p} φ(x_j | λ_{krj})

    where φ(x_j | λ_{krj}) = e^{-λ_{krj}} λ_{krj}^{x_j} / x_j!
(Diagram: a document x is modeled as a p-dimensional multivariate Poisson vector, p = lexicon size; for class k, R_k Poisson components with weights π_k1, ..., π_kR_k are summed to give the class likelihood)

* Word clustering: reduce the Poisson dimension => two-way mixtures
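The class likelihood of the Poisson mixture can be sketched directly from the formula. The mixture weights, rate vectors, and word-count vector below are toy values, not trained parameters.

```python
import math

# Sketch of the Poisson mixture class likelihood:
# P(X = x | Y = k) = sum_r pi_kr * prod_j Poisson(x_j | lambda_krj).
# All parameters below are toy values.

def poisson(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

def class_likelihood(x, weights, lambdas):
    """weights: pi_kr per component; lambdas: per-component rate vectors."""
    return sum(
        pi * math.prod(poisson(xj, lj) for xj, lj in zip(x, lam_r))
        for pi, lam_r in zip(weights, lambdas)
    )

x = [2, 0, 1]                      # word counts over a 3-word lexicon
weights = [0.6, 0.4]               # two components for this class
lambdas = [[1.0, 0.2, 0.5], [2.0, 0.1, 1.0]]
print(class_likelihood(x, weights, lambdas))
```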