![Page 1: Language Change as a Constrained Multi-Objective Optimization Monojit Choudhury Microsoft Research Lab, India monojitc@microsoft.com A tale of the lazy](https://reader037.vdocuments.us/reader037/viewer/2022102906/56649c7d5503460f94932bfa/html5/thumbnails/1.jpg)
Language Change as a
Constrained Multi-Objective
Optimization
Monojit Choudhury
Microsoft Research Lab, India
A tale of the lazy tongue
Indo-Australia Workshop on Optimization in Human Language Technology
16th Dec 2012, IIT Patna
Language Change
• Change in the syntactic/semantic/phonological features of a language
• Perpetual, universal, directional (?)
• Phonological Change:
– Affects the sounds
– Structured, independent of syntax/semantics
– Example: Loss of consonant clusters in Hindi: agni → aag, dugdha → dUdh, raatri → raat
Effects of the “Lazy Tongue”
Assimilation
• in+apt = inapt
• in+decent = indecent
• in+polite = impolite
• in+mature = immature
• in+legal = illegal
• in+regular = irregular
Deletion
• cannot → can’t
• do not → don’t
• will not → won’t
• are not → ain’t
• information → info
Explanations for Change
Exogenous causes
– Language contact
– Socio-political factors
– Communication medium
Endogenous causes
– Functional
– Phonetic error-based
– Frequency drifts
– Evolutionary
Functional Explanation of Language Change
• There are three evolutionary forces on any linguistic system:
– Minimization of effort (energy)
– Maximization of perceptual distinctiveness (minimization of ambiguity)
– Maximization of learnability
Language is a perpetually evolving system shaped by these three conflicting forces.
Outline of the Talk
• Morpho-phonological change of Bangla verb systems and emergence of dialect diversity
– Approach: Multi-Objective Constrained Optimization
– Technique: Multi-Objective Genetic Algorithm (MOGA)
• Understanding Computer Mediated Communication
– Normalization of texting language
– Romanization of Indian language text
Geography of Bangla
• Standard Colloquial Bengali (SCB)
• Agartala Colloquial Bengali (ACB)
• Sylhetti
History of Bangla
1200 AD 1800 AD
Bangla Verb Morphology
করেছিলাম: kar-echh-il-aam (I had done)
– kar: verb root (do)
– echh: aspect (perfect)
– il: tense (past)
– aam: person (first)
Cognates in the Dialects
| Features | Classical | SCB | ACB |
|---|---|---|---|
| Non-finite | kariyA | kore | kairA |
| Ps, 2, per. | kariyAChila | koreChilo | korsilo |
| Ps, 1, cont. | kariteChilAm | korChilAm | kartAslAm |

root: kar (to do)
Atomic Phonological Operators
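[Figure: derivation of modern verb forms from the classical form kariteChila through atomic phonological operators]
Forms shown: kariteChila, karitChila, kariChila, kairChila, korChila, korChilo
Operators shown: Del(e/t_Ch), Del(t/_Ch), Met(ri/_Ch), Asm(a→o/_i), Mut(a→o/_$)
Operator types: Deletion, Metathesis, Assimilation, Mutation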
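The operator notation on this slide reads like context-sensitive rewrite rules, so three of the operators can be sketched as regular-expression substitutions. This is a minimal illustration only: the regex encoding, the `derive` helper, and the order in which the operators are chained are assumptions, not the representation used in the original work.

```python
import re

# Each operator is (name, regex pattern, replacement). The contexts follow the
# slide's notation, e.g. Del(e/t_Ch) = delete "e" between "t" and "Ch".
# Encoding them as regexes is an assumption made for this sketch.
OPERATORS = [
    ("Del(e/t_Ch)", r"(?<=t)e(?=Ch)", ""),
    ("Del(t/_Ch)", r"t(?=Ch)", ""),
    ("Met(ri/_Ch)", r"ri(?=Ch)", "ir"),
]


def derive(form, operators=OPERATORS):
    """Apply the operators in order and return every intermediate form."""
    steps = [form]
    for _name, pattern, replacement in operators:
        form = re.sub(pattern, replacement, form)
        steps.append(form)
    return steps


print(derive("kariteChila"))
# ['kariteChila', 'karitChila', 'kariChila', 'kairChila']
```

Each intermediate form produced here is one of the forms listed on the slide.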
Hypothesis
A sequence of Atomic Phonological Operators is preferred if the verb forms obtained by applying this sequence to the classical forms have some functional benefit over the classical forms.
Thus, all the modern dialects of Bangla have some functional advantage over the classical dialect.
A Formal Model of Functional Explanation
f1: Effort of articulation
f2: [Acoustic distinctiveness]⁻¹
[Figure: plot of f1 against f2 with regions labeled unstable languages, impossible languages, and metastable languages]
Genetic Algorithm
Gene (a string of symbols) → what the solution actually looks like
GA: search for good solutions mimicking nature [recombination and mutation of genes]
Phenotype
kori
korChi
:
korte
kori
kartAsi
:
kartA
Lexicon consisting of 28 forms for the verb kar
Genotype
A sequence of atomic phonological operators
Del t Met ri NOP Del e Asm a Del i NOP
Dsm e NOP NOP Met ri Asm a Del e NOP
Genotype Phenotype
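Genotype: Del t, Met ri, NOP, Del e, Asm a, Del i, NOP
Phenotype at successive stages:
– Classical forms: kari, kariteChi, karite
– After Del t: kari, karieChi, karie
– After Met ri: kair, kaireChi, kaire
– After the remaining operators: kor, korCh, kor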
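The genotype-to-phenotype mapping on this slide can be sketched as a list of string-rewriting functions applied in order to every form in the lexicon. The helper names below are hypothetical, and reading "Asm a" as "a becomes o when immediately followed by i" is an assumption taken from the Asm(a→o/_i) operator shown earlier in the deck.

```python
import re


def delete(segment):
    """Operator that removes every occurrence of the segment."""
    return lambda word: word.replace(segment, "")


def metathesize(pair):
    """Operator that swaps the two symbols of the pair wherever it occurs."""
    return lambda word: word.replace(pair, pair[::-1])


def assimilate_a(word):
    # Assumed reading of "Asm a": a -> o when immediately followed by i.
    return re.sub(r"a(?=i)", "o", word)


def nop(word):
    """NOP leaves the form unchanged."""
    return word


# Del t, Met ri, NOP, Del e, Asm a, Del i, NOP (the genotype on the slide)
GENOTYPE = [delete("t"), metathesize("ri"), nop, delete("e"),
            assimilate_a, delete("i"), nop]


def express(genotype, lexicon):
    """Apply every operator, in order, to every form in the lexicon."""
    forms = list(lexicon)
    for operator in genotype:
        forms = [operator(form) for form in forms]
    return forms


print(express(GENOTYPE, ["kari", "kariteChi", "karite"]))
# ['kor', 'korCh', 'kor']
```

With these readings, the sketch reproduces the final forms shown on the slide.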
Crossover
Mutation
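Only the slide titles survive for crossover and mutation, so the following is a generic sketch of the two genetic operators on sequences of operator names. One-point crossover, the per-gene mutation rate, and the small `REPERTOIRE` list are all assumptions chosen for illustration; the deck does not say which variants were used.

```python
import random

# Hypothetical, tiny repertoire for illustration only.
REPERTOIRE = ["Del t", "Del e", "Del i", "Met ri", "Asm a", "NOP"]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: swap the tails of two equal-length genotypes."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(genotype, rng, rate=0.1):
    """Replace each gene, with probability `rate`, by a random repertoire entry."""
    return [rng.choice(REPERTOIRE) if rng.random() < rate else gene
            for gene in genotype]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
parent_a = ["Del t", "Met ri", "NOP", "Del e", "Asm a", "Del i", "NOP"]
parent_b = ["Dsm e", "NOP", "NOP", "Met ri", "Asm a", "Del e", "NOP"]
child_a, child_b = crossover(parent_a, parent_b, rng)
print(child_a)
print(mutate(child_a, rng))
```

The two parents are the example genotypes from the Genotype slide.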
Multi-Objective GA
Multi-Objective GA: Apply constraints
Multi-Objective GA: Finding out good solutions
Multi-Objective GA: But also keep some not-so-good solutions
Multi-Objective GA: After several iterations
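The notion of a "good solution" in a multi-objective setting is usually Pareto dominance. The sketch below shows only that concept, assuming every objective is minimized; it is not the NSGA-II algorithm itself, which additionally ranks solutions by non-dominated sorting and preserves diversity with a crowding distance. The sample points are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (f1, f2) values, e.g. (effort, inverse distinctiveness).
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (4.0, 4.0)]
print(pareto_front(points))
# [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```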
Objective functions
• Articulatory effort
– fe(Λ): weighted sum of the number of syllables, the number of letters, and vowel height differences, averaged over all words in the lexicon
• Acoustic distinctiveness
– fd(Λ): inverse of mean edit distance between words
• Learnability
– fr(Λ): correlation between feature match and edit distance
Experiments
• NSGA-II: a package for fast MOGA
• Gene length: 15 APOs
• A repertoire of 128 APOs
• Population: 1000, Generations: 500
• 6 models with different combinations of constraints and objectives
Pareto-optimal front
[Figure: Pareto-optimal front with the positions of CB, Sylhetti, ACB, and SCB marked]
Observations
• Vertical and horizontal limb
• Real dialects on the horizontal limb
• Sound changes push the dialects from right to left (reduce effort)
• but never up the limb
• why?
Role of Constraints
For more information
Choudhury et al., Evolution, optimization, and language change: the case of Bengali verb inflections, in Proceedings of ACL SIGMORPHON9, Association for Computational Linguistics, 2007
http://research.microsoft.com/people/monojitc/
MOGA and NSGA II: Kanpur Genetic Algorithms Laboratory
http://www.iitk.ac.in/kangal/index.shtml
Food for Thought
• Evaluation:
– Myriads of possible dialects, but only a few observed in nature
• Fixed set of pre-defined APOs – how to generalize for any change?
• MOGA is an optimization tool, which in no way simulates language change
– How do languages optimize themselves?
Computer Mediated Communication
Form
Texting Language
• A new genre of English & also other languages used in chats, sms, emails, blogs, tweets, FB posts, comments etc.
dis is n eg 4 txtin lang
This is an example for Texting language
• Ungrammatical, unconventional spellings
• The texting version has 24 characters, the standard version 39
• The shorter the faster
• Constraint: understandability
Analysis of Social Media
• A hot topic in NLP
– Normalization
– Language identification
– Sentiment/Polarity detection
– Summarization/trend prediction
Choudhury et al. (2007) Investigation and Modeling of the Structure of Texting Language. In IJCAI Workshop on Analytics of Noisy Data 2007
Tomorrow never dies!!!
• 2moro (9)
• tomoz (25)
• tomoro (12)
• tomrw (5)
• tom (2)
• tomra (2)
• tomorrow (24)
• tomora (4)
• tomm (1)
• tomo (3)
• tomorow (3)
• 2mro (2)
• morrow (1)
• tomor (2)
• tmorro (1)
• moro (1)
Patterns or Compression Operators
• Phonetic substitution (phoneme)
– psycho → syco, then → den
• Phonetic substitution (syllable)
– today → 2day, see → c
• Deletion of vowels
– message → mssg, about → abt
• Deletion of repeated characters
– tomorrow → tomorow
• Truncation (deletion of tails)
– introduction → intro, evaluation → eval
• Common Abbreviations
– Bangalore → blr, text back → tb
• Informal pronunciation
– going to → gonna, better → betta
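Several of the compression operators listed above are simple string transformations, and a short sketch makes them concrete. The function names are hypothetical, the substitution table contains only the two syllable examples from the slides, and keeping the first character during vowel deletion is an assumption made so that "about" yields "abt".

```python
import re


def drop_repeats(word):
    """Deletion of repeated characters: collapse runs of the same character."""
    return re.sub(r"(.)\1+", r"\1", word)


def drop_vowels(word):
    """Deletion of vowels, keeping the first character (an assumption)."""
    if not word:
        return word
    return word[0] + re.sub(r"[aeiou]", "", word[1:])


def truncate(word, length):
    """Truncation: keep only the first `length` characters."""
    return word[:length]


# Only the two syllable substitutions shown on the slide.
SYLLABLE_SUBSTITUTIONS = {"to": "2", "see": "c"}


def substitute_syllables(word):
    """Phonetic substitution of syllables by shorter symbols."""
    for spoken, symbol in SYLLABLE_SUBSTITUTIONS.items():
        word = word.replace(spoken, symbol)
    return word


print(drop_repeats("tomorrow"))       # tomorow
print(drop_vowels("message"))         # mssg
print(truncate("introduction", 5))    # intro
print(substitute_syllables("today"))  # 2day
```

Real texting forms combine several such operators at once, which is why a single word such as "tomorrow" surfaces in so many variants.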
HMMs for SMS Normalization
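[Figure: word HMM for “today”. States G1 to G5 are labeled ‘T’, ‘O’, ‘D’, ‘A’, ‘Y’; states P2 and P4 are labeled /AH/ and /AY/; state S1 is labeled “2”; S0 and S6 are the remaining states. Below each of the letters T, O, D, A, Y the figure lists the symbols ε, the letter itself, and @.]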
Bigram Examples
• TL: would b gd 2 c u some time soon
• Op: would be good to see you some time soon
• TL: just wanted 2 say a big thanx 4 my bday card
• Op: just wanted to say a big thanks for my today card
• TL: me wel i fink bein at home makes me feel a lot more stressed den bein away from it
• Op: me well i think being at home makes me feel a lot more stressed deny being away from it
Use of Indian Languages on Online Social Media
• Code mixing
• Transliteration
• Spelling Change
• Indian English
Concluding Remarks
• Languages are perpetually evolving and optimizing systems
– Computational modeling of language change is still in its infancy
– Lots of scope for research
Why Computational Models?
FOR
• Formalization
• Virtual experimentation
• Exploration
AGAINST
• Intractable
• Simplified assumptions
• Toy languages
Can we model real world language change?
Objectives and Constraints - 1
• Articulatory effort
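$$f_e(w) = \alpha_1 f_{e1}(w) + \alpha_2 f_{e2}(w) + \alpha_3 f_{e3}(w)$$
$$f_{e1}(w) = |w|$$
$$f_{e2}(w) = \sum_i \mathrm{hr}(\sigma_i)$$
$$f_{e3}(w) = \sum_i |\mathrm{ht}(V_i) - \mathrm{ht}(V_{i+1})|$$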
Objectives and Constraints - 2
• Acoustic distinctiveness
$$f_d(\Lambda) = \frac{1}{N} \sum_{i \neq j} \mathrm{ed}(w_i, w_j)^{-1}$$
$$C_d(\Lambda) = -1 \text{ if } \mathrm{ed}(w_i, w_j) = 0 \text{ for more than 2 pairs}$$
• Phonotactic constraints
Cp(Λ) = -1 if any of the words violate the phonotactic constraints of the language
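The distinctiveness objective and its constraint can be sketched with a standard Levenshtein edit distance. Several choices here are assumptions rather than details from the slides: each character counts as one symbol (so "Ch" is two symbols), N is taken to be the number of word pairs with non-zero distance, identical pairs are left out of the sum to avoid division by zero, and the constraint is returned as a boolean instead of -1.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distinctiveness_objective(lexicon):
    """Mean inverse edit distance over distinct word pairs (lower is better)."""
    distances = [edit_distance(a, b) for a, b in combinations(lexicon, 2)]
    nonzero = [d for d in distances if d > 0]
    if not nonzero:
        return float("inf")  # every pair is identical
    return sum(1 / d for d in nonzero) / len(nonzero)


def violates_distinctiveness(lexicon, max_identical_pairs=2):
    """True if more than `max_identical_pairs` word pairs are identical."""
    identical = sum(1 for a, b in combinations(lexicon, 2)
                    if edit_distance(a, b) == 0)
    return identical > max_identical_pairs


print(distinctiveness_objective(["kori", "korChi", "korte"]))
print(violates_distinctiveness(["kor", "korCh", "kor"]))  # False
```

For the three sample forms the pairwise distances are 2, 2, and 3, so the objective evaluates to 4/9.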
Objectives and Constraints - 3
• Learnability as Regularity
– fr: The correlation coefficient between the edit distance and number of matching morphological attributes for every word pair
– Cr = -1 if fr > 0.8
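The regularity measure can be sketched as a Pearson correlation over all word pairs. The toy lexicon below is invented (a root "ka" plus one-letter tense and person suffixes) and is not Bangla data; the use of Pearson's coefficient and the sign convention are also assumptions, since the slide does not spell them out. In this sketch a perfectly regular lexicon gives a correlation of -1, because pairs sharing more attributes are closer in edit distance.

```python
from itertools import combinations
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


def regularity(lexicon):
    """Correlate edit distance with the number of matching attributes.

    `lexicon` is a list of (form, attributes) pairs, where attributes is a
    tuple such as (tense, person).
    """
    distances, matches = [], []
    for (form_a, attrs_a), (form_b, attrs_b) in combinations(lexicon, 2):
        distances.append(edit_distance(form_a, form_b))
        matches.append(sum(x == y for x, y in zip(attrs_a, attrs_b)))
    return pearson(distances, matches)


# Invented toy lexicon: suffix p/f marks tense, suffix x/y marks person.
toy_lexicon = [
    ("kapx", ("past", "first")),
    ("kapy", ("past", "second")),
    ("kafx", ("future", "first")),
    ("kafy", ("future", "second")),
]
print(regularity(toy_lexicon))
```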
Emergent dialects
| Classical | D1 | D2 | D3 |
|---|---|---|---|
| kariteChilAm | kartA | karChi (korChi) | karteChi (kartAsi) |
| kariteChila | kartAa | karCha (korCha) | karteCha (kartAsa) |
| kariteChilen | kartAen | karChen (korChen) | karteChen (kartAsen) |