ACL2014 Reading: [Zhang+] "Kneser-Ney Smoothing on Expected Count" and [Pickhardt+] "A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing"
[Page 1]
[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
[Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
2014/7/12 ACL Reading @ PFI
Nakatani Shuyo, Cybozu Labs Inc.
[Page 2]
Kneser-Ney Smoothing [Kneser+ 1995]
• Discounting & interpolation:

$$p(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c(w_{i-n+1}^{i}) - D,\ 0\right)}{c(w_{i-n+1}^{i-1})} + \frac{D\, N_{1+}(w_{i-n+1}^{i-1}\,\bullet)}{c(w_{i-n+1}^{i-1})}\, p(w_i \mid w_{i-n+2}^{i-1})$$

• where $w_i^j = w_i \cdots w_j$ and $N_{1+}(w_i^j\,\bullet) = \left|\{w' : c(w_i^j w') > 0\}\right|$ is the number of distinct continuations, i.e. the number of discounted n-gram types
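The interpolated formula above can be sketched in Python; this is a minimal bigram version not taken from the slides, with a fixed discount `D` and illustrative function names, and it assumes the queried context was observed in training:

```python
from collections import Counter

def kneser_ney_bigram(tokens, D=0.75):
    """Build an interpolated Kneser-Ney bigram model from a token list."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigram_ctx = Counter(tokens[:-1])                   # c(w_{i-1})
    continuations = Counter(w2 for (w1, w2) in bigrams)  # N_1+(. w)
    followers = Counter(w1 for (w1, w2) in bigrams)      # N_1+(w .)
    total_bigram_types = len(bigrams)                    # N_1+(. .)

    def prob(w_prev, w):
        # lower-order (continuation) probability
        p_cont = continuations[w] / total_bigram_types
        # discounted higher-order term + interpolation weight
        disc = max(bigrams[(w_prev, w)] - D, 0) / unigram_ctx[w_prev]
        lam = D * followers[w_prev] / unigram_ctx[w_prev]
        return disc + lam * p_cont

    return prob

p = kneser_ney_bigram("a b a b a c".split())
# probabilities over the vocabulary after context "a" sum to 1
```

Note how the discounted mass `D * N_1+(w_prev .)` is exactly what gets redistributed to the continuation distribution, so each context's probabilities still sum to one.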
[Page 3]
Modified KN-Smoothing [Chen+ 1999]
$$p(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\!\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma(w_{i-n+1}^{i-1})\, p(w_i \mid w_{i-n+2}^{i-1})$$

• where $D(c) = 0$ if $c = 0$; $D_1$ if $c = 1$; $D_2$ if $c = 2$; $D_{3+}$ if $c \ge 3$
• $\gamma(w_{i-n+1}^{i-1}) = \dfrac{\text{[amount of discounting]}}{c(w_{i-n+1}^{i-1})}$

Weighted discounting (the $D_n$ are estimated by leave-one-out cross-validation)
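The leave-one-out estimates for $D_1, D_2, D_{3+}$ have a closed form in terms of the counts-of-counts $n_r$ (the number of n-grams seen exactly $r$ times); a sketch assuming the standard Chen & Goodman formulas, which requires $n_1 \ldots n_4 > 0$:

```python
from collections import Counter

def modified_kn_discounts(ngram_counts):
    """Estimate D1, D2, D3+ from counts-of-counts (Chen & Goodman 1999).

    ngram_counts: mapping from n-gram to its observed count.
    """
    n = Counter(ngram_counts.values())  # n[r] = number of n-grams seen r times
    Y = n[1] / (n[1] + 2 * n[2])
    D1 = 1 - 2 * Y * n[2] / n[1]
    D2 = 2 - 3 * Y * n[3] / n[2]
    D3p = 3 - 4 * Y * n[4] / n[3]
    return D1, D2, D3p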
[Page 4]
[Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• When each sentence has a fractional weight:
  – Domain adaptation
  – EM algorithm on word alignment
• Proposes KN smoothing using expected fractional counts

I'm interested in it!
[Page 5]
Model
• $u$ denotes the context $w_{i-n+1}^{i-1}$, and $u'$ denotes $w_{i-n+2}^{i-1}$
• A sequence $uw$ occurs $N$ times, and the $k$-th occurrence carries probability $p_k$ ($k = 1, \cdots, N$) as its weight;
• then the count $c(uw)$ is distributed according to the Poisson binomial distribution:
• $P(c(uw) = r) = s(N, r)$, where

$$s(N, r) = \begin{cases} s(N-1, r)(1 - p_N) + s(N-1, r-1)\, p_N & \text{if } 0 \le r \le N \\ 1 & \text{if } r = N = 0 \\ 0 & \text{otherwise} \end{cases}$$
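The recurrence above is a simple dynamic program; a self-contained sketch (function name is illustrative):

```python
def poisson_binomial(ps):
    """Distribution of c = sum of independent Bernoulli(p_k) variables.

    Returns s where s[r] = P(c = r), computed by the recurrence
    s(N, r) = s(N-1, r)(1 - p_N) + s(N-1, r-1) p_N.
    """
    s = [1.0]  # base case: s(0, 0) = 1
    for p in ps:
        nxt = [0.0] * (len(s) + 1)
        for r, prob in enumerate(s):
            nxt[r] += prob * (1 - p)  # this occurrence not counted
            nxt[r + 1] += prob * p    # this occurrence counted
        s = nxt
    return s

dist = poisson_binomial([0.5, 0.5])
# → [0.25, 0.5, 0.25], matching Binomial(2, 0.5)
```

With all weights equal it reduces to the binomial distribution, which makes a handy sanity check.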
[Page 6]
MLE on this model
• Expectations:
  – $E[c(uw)] = \sum_r r \cdot P(c(uw) = r)$
  – $E[N_r(u\,\bullet)] = \sum_w P(c(uw) = r)$
  – $E[N_{r+}(u\,\bullet)] = \sum_w P(c(uw) \ge r)$
• Maximize the (expected) likelihood:
  – $E[L] = E\left[\sum_w c(uw) \log p(w \mid u)\right] = \sum_w E[c(uw)] \log p(w \mid u)$
  – obtain $p_{\mathrm{MLE}}(w \mid u) = \dfrac{E[c(uw)]}{E[c(u\,\bullet)]}$
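These expectations have simple closed forms under independence: by linearity $E[c(uw)] = \sum_k p_k$, and $P(c(uw) > 0) = 1 - \prod_k (1 - p_k)$. A sketch (the per-word occurrence weights and function names are illustrative, not from the paper):

```python
import math

def expected_count(ps):
    """E[c(uw)]: sum of the occurrence weights (linearity of expectation)."""
    return sum(ps)

def prob_nonzero(ps):
    """P(c(uw) > 0) = 1 - prod(1 - p_k)."""
    return 1.0 - math.prod(1.0 - p for p in ps)

def expected_n1plus(weights_by_word):
    """E[N_1+(u .)]: expected number of word types with nonzero count."""
    return sum(prob_nonzero(ps) for ps in weights_by_word.values())

weights = {"cat": [0.9, 0.5], "dog": [0.2]}
# E[c(u,'cat')] = 1.4; P(c(u,'cat') > 0) = 1 - 0.1*0.5 = 0.95
# E[N_1+(u .)] = 0.95 + 0.2 = 1.15
```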
[Page 7]
Expected Kneser-Ney
• $\tilde c(uw) = \max\left(0,\ c(uw) - D\right) + N_{1+}(u\,\bullet)\, D\, p'(w \mid u')$
• So, $E[\tilde c(uw)] = E[c(uw)] - P(c(uw) > 0)\, D + E[N_{1+}(u\,\bullet)]\, D\, p'(w \mid u')$
  – where $p'(w \mid u') = \dfrac{E[N_{1+}(\bullet\, u'w)]}{E[N_{1+}(\bullet\, u'\,\bullet)]}$
• then $p(w \mid u) = \dfrac{E[\tilde c(uw)]}{E[\tilde c(u\,\bullet)]}$
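Putting the pieces together, here is a sketch of the expected-KN pseudo-counts for a single context $u$; it assumes the lower-order distribution $p'(w \mid u')$ is given, normalizes only over the observed words for brevity, and uses illustrative names throughout:

```python
import math

def expected_kn(weights, p_lower, D=0.75):
    """Expected-KN smoothed probabilities for one context u.

    weights: {word: [p_1, ..., p_N]} occurrence weights of uw
    p_lower: {word: p'(w|u')} lower-order distribution (given)
    """
    # E[N_1+(u .)]: expected number of observed continuation types
    E_n1plus = sum(1 - math.prod(1 - p for p in ps) for ps in weights.values())

    def e_tilde(w):
        ps = weights[w]
        Ec = sum(ps)                              # E[c(uw)]
        p_pos = 1 - math.prod(1 - p for p in ps)  # P(c(uw) > 0)
        # E[c~(uw)] = E[c] - P(c>0) D + E[N_1+(u .)] D p'(w|u')
        return Ec - p_pos * D + E_n1plus * D * p_lower[w]

    total = sum(e_tilde(w) for w in weights)      # E[c~(u .)]
    return {w: e_tilde(w) / total for w in weights}
```

A real model would normalize over the whole vocabulary, but the structure of the estimator is the same.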
[Page 8]
Language model adaptation
• Our corpus consists of:
  – large general-domain data, and
  – small specific-domain (in-domain) data
• Sentence $e$'s weight:
  – $P(e \text{ is in-domain}) = \dfrac{1}{1 + \exp\left(-H(e)\right)}$
  – where $H(e) = \dfrac{\log P_{\mathrm{in}}(e) - \log P_{\mathrm{out}}(e)}{|e|}$
  – $P_{\mathrm{in}}$: language model of the in-domain data; $P_{\mathrm{out}}$: that of the general-domain data
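The weighting is just a sigmoid of the per-word log-probability difference between the two language models; a minimal sketch (function name is illustrative):

```python
import math

def in_domain_weight(logp_in, logp_out, length):
    """P(e is in-domain) = sigmoid of the per-word log-prob difference.

    logp_in / logp_out: log-probabilities of sentence e under the
    in-domain and general-domain language models; length = |e|.
    """
    H = (logp_in - logp_out) / length
    return 1.0 / (1.0 + math.exp(-H))

# equal scores -> weight 0.5; in-domain model prefers e -> weight > 0.5
```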
[Page 9]
• Figure 1: On the language model adaptation task, expected KN outperforms all other methods across all sizes of selected subsets. Integral KN is applied to unweighted instances, while fractional WB, fractional KN and expected KN are applied to weighted instances. (via [Zhang+ ACL2014])
  – (Chart annotations: subsets are selected from the general-domain data; in-domain data: 54k training sentences, 3k test sentences; annotated perplexities: 192, 162, 156, 148.)
Why isn't there Modified KN as a baseline?
[Page 10]
[Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• Higher-order n-grams are very sparse
  – Especially noticeable on small data (e.g. domain-specific data!)
• Improves performance on small data via skipped n-grams and modified KN smoothing
  – Perplexity is reduced by 25.7% for very small training data of only 736 KB of text
[Page 11]
"Generalized Language Models"
• $\partial_3\, w_1 w_2 w_3 w_4 = w_1 w_2\, \_\, w_4$
  – "_" means a word placeholder (the skip operator $\partial_j$ removes the $j$-th word)

$$P_{\mathrm{GLM}}(w_i \mid w_{i-n+1}^{i-1}) = \frac{c(w_{i-n+1}^{i}) - D\!\left(c(w_{i-n+1}^{i})\right)}{c(w_{i-n+1}^{i-1})} + \gamma_{\mathrm{high}}(w_{i-n+1}^{i-1})\, \frac{1}{n-1} \sum_{j=1}^{n-1} \hat P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1})$$

$$\hat P_{\mathrm{GLM}}(w_i \mid \partial_j w_{i-n+1}^{i-1}) = \frac{N_{1+}(\partial_j w_{i-n+1}^{i}) - D\!\left(c(\partial_j w_{i-n+1}^{i})\right)}{N_{1+}(\partial_j w_{i-n+1}^{i-1}\,\bullet)} + \gamma_{\mathrm{mid}}(\partial_j w_{i-n+1}^{i-1})\, \frac{1}{n-2} \sum_{k=1,\, k \ne j}^{n-1} \hat P_{\mathrm{GLM}}(w_i \mid \partial_j \partial_k w_{i-n+1}^{i-1})$$
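The skip operator $\partial_j$ and the patterns summed over above can be sketched as follows (helper names are illustrative, not from the paper):

```python
from itertools import combinations

def skip(context, j):
    """Apply the skip operator: replace the j-th word (1-based) with '_'."""
    return tuple("_" if i == j - 1 else w for i, w in enumerate(context))

def all_skip_patterns(context, k):
    """All contexts with exactly k words replaced by the placeholder."""
    return [
        tuple("_" if i in idx else w for i, w in enumerate(context))
        for idx in combinations(range(len(context)), k)
    ]

print(skip(("w1", "w2", "w3"), 3))              # ('w1', 'w2', '_')
print(all_skip_patterns(("w1", "w2", "w3"), 1))  # 3 single-skip contexts
```

The recursion interpolates over all single-skip contexts at the top level and over all pairs of skips below it, which is why the model stores far more entries than plain modified KN (see the space-complexity slide).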
[Page 12]
• The bold arrows correspond to interpolation of models in traditional modified Kneser-Ney smoothing. The lighter arrows illustrate the additional interpolations introduced by our generalized language models. (via [Pickhardt+ ACL2014])
[Page 13]
• Shrunk training data sets for the English Wikipedia (simulating small domain-specific data)
[Page 14]
Space Complexity
• model size = 9.5 GB, # of entries = 427M
• model size = 15 GB, # of entries = 742M
[Page 15]
References
• [Zhang+ ACL2014] Kneser-Ney Smoothing on Expected Count
• [Pickhardt+ ACL2014] A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
• [Kneser+ 1995] Improved Backing-off for m-gram Language Modeling
• [Chen+ 1999] An Empirical Study of Smoothing Techniques for Language Modeling