1
A Comparative Investigation of Morphological Language Modeling
for the Languages of the European Union
Thomas Müller, Hinrich Schütze and Helmut Schmid
ACL, June 3-8, 2012
Reporter: Sitong Yang
ICT
2
Outline
• Introduction
• Modeling of morphology and shape
• Experimental Setup
• Results and Discussion
• Conclusion
5
Motivation
Language model?
• Frequent history: "potentially" followed by large, dangerous, serious
• Rare history: "hypothetically" followed by large, dangerous, serious
• How to transfer what is learned from the frequent history to the rare history?
• Morphology
8
Main idea
• Goal
  • Perplexity reduction (PD) for a large number of languages
• Features
  • Morphology
  • Shape features
• Parameters
  • Frequency threshold θ
  • Number of suffixes used φ
  • Morphological segmentation algorithm
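As a reminder of what the goal measures, here is a minimal sketch of perplexity and relative perplexity reduction. It assumes natural-log token probabilities and a reduction measured relative to the baseline; the function names and the sample probabilities are hypothetical and not taken from the paper.

```python
import math

def perplexity(log_probs):
    """Perplexity from the natural-log probabilities of the test tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

def perplexity_reduction(pp_baseline, pp_model):
    """Relative reduction of the model's perplexity against the baseline."""
    return (pp_baseline - pp_model) / pp_baseline

# Hypothetical per-token probabilities, for illustration only.
baseline_pp = perplexity([math.log(p) for p in (0.10, 0.05, 0.20)])
model_pp = perplexity([math.log(p) for p in (0.12, 0.06, 0.20)])
reduction = perplexity_reduction(baseline_pp, model_pp)
```

A positive value means the model assigns higher probability to the test data than the baseline does.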
11
Morphology
• Automatic suffix identification algorithms: Reports, Morfessor and Frequency
• Parameter: the φ most frequent suffixes
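The Frequency variant and the φ parameter can be sketched as follows. This is a rough illustration, assuming that Frequency simply ranks word-final letter sequences by how many vocabulary words end in them. The function names and the `max_len` cap are hypothetical, and Reports and Morfessor are not covered.

```python
from collections import Counter

def most_frequent_suffixes(vocabulary, phi, max_len=4):
    """Return the phi most frequent word-final letter sequences."""
    counts = Counter()
    for word in vocabulary:
        # Only proper suffixes: at least one character is left as the stem.
        for n in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-n:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]

def suffix_of(word, suffixes):
    """Longest selected suffix that the word ends in, or None."""
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    return max(matches, key=len) if matches else None

# Toy vocabulary, for illustration only.
vocabulary = ["potentially", "hypothetically", "largely", "seriously",
              "walked", "talked"]
suffixes = most_frequent_suffixes(vocabulary, phi=2)
```

Words sharing the same selected suffix could then be grouped into one morphological class.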
13
Similarity measure
• The similarity measure and details of the shape features are described in prior work (Müller and Schütze, 2011).
15
Experimental Setup
• Baseline
• Morphological class language model
• Distributional class language model
• Corpus
16
Experimental Setup
• Experiments
  • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters
• Baseline
  • Modified KN model
18
Morphological class language model
Final model PM interpolates PC with a modified KN model:
Unknown word estimation:
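The slide's formulas are not reproduced in this text, so the following is only a sketch of how a class model PC might be interpolated with a KN estimate. It assumes the standard class-based bigram factorization and a single interpolation weight `lam`; every name here is hypothetical, the KN probability is passed in as a plain number, and the unknown word estimate is not covered.

```python
def class_probability(word, prev_word, word_class,
                      p_class_transition, p_word_given_class):
    """Class-based bigram: P(class | previous class) * P(word | class)."""
    c_prev = word_class[prev_word]
    c = word_class[word]
    transition = p_class_transition.get((c_prev, c), 0.0)
    emission = p_word_given_class.get((word, c), 0.0)
    return transition * emission

def interpolate(p_c, p_kn, lam):
    """Linear interpolation of the class model with the KN model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_c + (1.0 - lam) * p_kn

# Hypothetical classes and probabilities, for illustration only.
word_class = {"hypothetically": "SUFFIX:ly", "dangerous": "SUFFIX:ous"}
p_class_transition = {("SUFFIX:ly", "SUFFIX:ous"): 0.5}
p_word_given_class = {("dangerous", "SUFFIX:ous"): 0.1}

p_c = class_probability("dangerous", "hypothetically", word_class,
                        p_class_transition, p_word_given_class)
p_m = interpolate(p_c, p_kn=0.01, lam=0.4)
```

The weight would in practice be tuned on held-out data, which is what "optimal interpolation parameters" in the setup suggests.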
20
Distributional class language model
• PD has the same form as PM
• The difference is that the classes are morphological for PM and distributional for PD
• Whole-context distributional vector space model
23
Results and Discussion
• Morphological model vs. Distributional model
• Sensitivity analysis of parameters
24
Morphological model vs. Distributional model
• MM: the more morphologically complex the language, the larger the perplexity reduction and the larger φ.
• MM: yields considerable perplexity reductions of 3%-11%
• Frequency performs surprisingly well
• Only in 4 cases is DM better than MM
• DM: clustering is restricted to less frequent words
26
Sensitivity analysis of parameters
• Best and worst values of each parameter and the difference in perplexity improvement between the two.
• θ
  • Strong influence on PD
  • Positively correlated with morphological complexity
• φ and segmentation algorithms
  • Negligible effect
  • Frequency performs best
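To make the role of θ concrete, here is a minimal sketch of threshold-based class assignment. It assumes that words occurring more than θ times keep a class of their own while rarer words are clustered; the slides do not state the exact rule, so that direction is an assumption, and `assign_classes` and `cluster_of` are hypothetical names.

```python
from collections import Counter

def assign_classes(tokens, theta, cluster_of):
    """Map each word type to a class label.

    Assumption: words seen more than theta times get a singleton class,
    all other words are clustered by the supplied cluster_of function.
    """
    counts = Counter(tokens)
    classes = {}
    for word, count in counts.items():
        if count > theta:
            classes[word] = "WORD:" + word
        else:
            classes[word] = cluster_of(word)
    return classes

# Toy corpus and a toy two-letter suffix clustering, for illustration only.
tokens = ["the"] * 5 + ["walked", "talked"]
classes = assign_classes(tokens, theta=2,
                         cluster_of=lambda w: "SUFFIX:" + w[-2:])
```

Under this reading, a larger θ sends more word types through the clustering step, which is one way the threshold could affect the perplexity reduction.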
29
Conclusion
• Features: morphology and shape features
• Result: perplexity reductions of 3%-11%
• Parameters
  • θ: considerable influence
  • φ and segmentation algorithms: small effect
30
Future Work
• A model that interpolates KN, the morphological class model and the distributional class model.