NLP Morphology 1
Morphology 1
• Introduction
• Morphology
• Morphological Analysis (MA)
• Using FS techniques in MA
• Automatic learning of the morphology of a language
Morphology 2
• Morphology
  • Structure of a word as a composition of morphemes
  • Related to word formation rules
• Functions
  • Inflection
  • Derivation
  • Composition (compounding)
• Result of morphological analysis
  • Morphosyntactic categorization (POS)
    • e.g. Parole tagset (VMIP1S0), more than 150 categories for Spanish
    • e.g. Penn Treebank tagset (VBD), about 30 categories for English
  • Morphological features
    • Number, case, gender, lexical functions
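A positional tag such as VMIP1S0 packs several morphological features into one string, one character per feature. The sketch below decodes such a tag. The field order (category, type, mood, tense, person, number, gender) and the value tables are assumptions made for illustration, not the full Parole specification, and `decode_verb_tag` is a hypothetical helper name.

```python
# Assumed field layout for verb tags; only a few values per field are listed.
VERB_FIELDS = [
    ("category", {"V": "verb"}),
    ("type", {"M": "main", "A": "auxiliary"}),
    ("mood", {"I": "indicative", "S": "subjunctive"}),
    ("tense", {"P": "present", "S": "past"}),
    ("person", {"1": "1", "2": "2", "3": "3"}),
    ("number", {"S": "singular", "P": "plural"}),
    ("gender", {"M": "masculine", "F": "feminine", "0": "unspecified"}),
]


def decode_verb_tag(tag):
    """Map each character of a positional verb tag to a named feature."""
    if len(tag) != len(VERB_FIELDS):
        raise ValueError("expected a tag of length %d" % len(VERB_FIELDS))
    # Unknown characters are reported as "?" rather than guessed.
    return {name: values.get(ch, "?") for (name, values), ch in zip(VERB_FIELDS, tag)}


features = decode_verb_tag("VMIP1S0")
print(features)
```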
Morphology 3
• Morphological analysis
  • Decompose a word into a concatenation of morphemes
  • Usually some of the morphemes carry the meaning
    • One (root or stem) in inflection and derivation
    • More than one in composition
  • The others (affixes) provide morphological features
• Problems
  • Phonological alterations in morpheme concatenation
  • Morphotactics
    • Which morphemes can be concatenated with which others
Morphology 4
• Problems
  • Affixes
    • Suffixes, prefixes, infixes, interfixes
    • Inflectional affixes ≠ derivational affixes
  • Derivation sometimes implies a semantic change that is not always predictable
    • Meaning extensions
    • Lexical rules
  • A derivational suffix can be followed by an inflectional one
    • love => lover => lovers
  • Inflection does not change POS; derivation sometimes does
  • Inflection affects other words in the sentence
    • agreement
Morphology 5
• Morphotactics
  • Word formation rules
  • Valid combinations between morphemes
    • Simple concatenation
    • Complex models root/pattern
  • Language-dependent regularity
• Phonological alterations (Morphophonology)
  • Changes when concatenating morphemes
  • Source: phonology, morphology, orthography
  • Variable in number and complexity
    • e.g. vowel harmony
Morphology 6
Morphemes
• 1 morpheme:
  • evitar (verb: to avoid)
• 2 morphemes:
  • evitable = evitar + able (adj: can be avoided)
• 3 morphemes:
  • inevitable = in + evitar + able (adj: cannot be avoided)
• 4 morphemes:
  • inevitabilidad = in + evitar + able + idad (noun: the quality of being unavoidable)
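The decompositions above can be reproduced with a very small affix-stripping sketch. The three tables below are toy data invented for this example. They map surface allomorphs (e.g. "abil", "evit") to the morphemes named on the slide. `segment` is a hypothetical helper, not a general analyzer for Spanish.

```python
# Toy tables (assumptions): surface allomorph -> morpheme.
PREFIXES = {"in": "in"}
SUFFIXES = {"idad": "idad", "abil": "able", "able": "able"}
STEMS = {"evit": "evitar", "evitar": "evitar"}


def segment(word):
    """Strip known suffixes from the right, then one prefix, then look up the stem."""
    suffixes = []
    stripped = True
    while stripped:
        stripped = False
        for surface, morpheme in SUFFIXES.items():
            if word.endswith(surface) and len(word) > len(surface):
                word = word[: -len(surface)]
                suffixes.insert(0, morpheme)
                stripped = True
                break
    prefixes = []
    for surface, morpheme in PREFIXES.items():
        # Only strip a prefix when what remains is a known stem.
        if word not in STEMS and word.startswith(surface) and word[len(surface):] in STEMS:
            word = word[len(surface):]
            prefixes.append(morpheme)
    if word not in STEMS:
        return None  # no analysis with these toy tables
    return prefixes + [STEMS[word]] + suffixes


for w in ["evitar", "evitable", "inevitable", "inevitabilidad"]:
    print(w, "=", " + ".join(segment(w)))
```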
Morphology 7
Inflectional Morphology
• number
  • house, houses
  • cheval, chevaux
  • casa, casas
• verbal form
  • walk, walks, walked, walking
  • amo, amas, aman ...
• gender
  • niño, niña
Morphology 8
Derivational Morphology
• Form
  • Without change: barcelonés
  • Prefix: inevitable
  • Suffix: importantísimo
• Source
  • verb => adjective: tardar => tardío
  • verb => noun: sufrir => sufrimiento
  • noun => noun: actor => actorazo
  • noun => adjective: atleta => atlético
  • adjective => adjective: rojo => rojizo
  • adjective => adverb: alegre => alegremente
Morphological Analysis 1
Types of morphological analyzers
• Formaries
  • Dictionaries of word forms
    + efficiency
    + languages with few variants (e.g. English)
    + extensibility
    + can be built and maintained from a morphological generator
    – languages with high inflectional variation
    – derivation, composition
• FS techniques
  • FSA
    • 1-level analyzers
  • FST
    • > 1-level analyzers
Morphological Analysis 2
Two-level morphological analyzers
• General model for languages with morpheme concatenation
• Independence between lingware and analyzer
• Valid for analysis and generation
• Distinction between lexical and surface levels
• Parallel rules for morphophonology
• Simple implementation
Morphological Analysis 3
• Morphological rules
  • Define the relations between characters (surface) and morphemes, mapping strings of characters to the morphemic structure of the word.
• Spelling rules
  • Operate at the level of the letters forming the word. Can be used to define the valid phonological alterations.
• Ritchie, Pulman, Black, Russell, 1987
Morphological Analysis 4
• input:
  • form
• output:
  • lemma + morphological features

Input     Output
cat       cat + N + sg
cats      cat + N + pl
cities    city + N + pl
merging   merge + V + pres_part
caught    (catch + V + past) or (catch + V + past_part)
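The simplest way to realize this input/output behaviour is a dictionary of word forms. The sketch below stores the table rows in a Python dict. `FORMARY` and `analyze` are names chosen for this example, and the entries are only the five forms listed above.

```python
# Dictionary of word forms: form -> list of (lemma, POS, features) analyses.
FORMARY = {
    "cat": [("cat", "N", "sg")],
    "cats": [("cat", "N", "pl")],
    "cities": [("city", "N", "pl")],
    "merging": [("merge", "V", "pres_part")],
    "caught": [("catch", "V", "past"), ("catch", "V", "past_part")],
}


def analyze(form):
    """Return every stored analysis of a form; unknown forms yield an empty list."""
    return [" + ".join(analysis) for analysis in FORMARY.get(form.lower(), [])]


print(analyze("caught"))
print(analyze("dogs"))
```

Lookup is fast, but every form must be listed explicitly, so an unlisted form such as "dogs" gets no analysis.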
Morphological Analysis 5
Morphotactics
[Figure: finite-state automaton with states 0, 1, 2 and arcs labelled reg_noun, plural (-s), irreg_pl_noun, irreg_sg_noun]

reg_noun   irreg_pl_noun   irreg_sg_noun   plural
fox        sheep           sheep           -s
cat        mice            mouse
dog
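A morphotactic automaton of this kind can be sketched as a transition table over morpheme classes. The arc layout below is an assumption, since the figure's arrows are not recoverable from the text: reg_noun leads from 0 to 1, plural from 1 to 2, both irregular classes lead from 0 directly to 2, and states 1 and 2 are final. The input is an already segmented list of morphemes, so no spelling rule is applied.

```python
LEXICON = {
    "reg_noun": {"fox", "cat", "dog"},
    "irreg_pl_noun": {"sheep", "mice"},
    "irreg_sg_noun": {"sheep", "mouse"},
    "plural": {"s"},
}

# Assumed arcs: (source state, morpheme class) -> target state.
ARCS = {
    (0, "reg_noun"): 1,
    (1, "plural"): 2,
    (0, "irreg_pl_noun"): 2,
    (0, "irreg_sg_noun"): 2,
}
FINAL_STATES = {1, 2}


def accepts(morphemes):
    """Check whether a morpheme sequence is a valid path through the automaton."""
    states = {0}
    for morpheme in morphemes:
        states = {
            target
            for (source, cls), target in ARCS.items()
            if source in states and morpheme in LEXICON[cls]
        }
    return bool(states & FINAL_STATES)


print(accepts(["cat", "s"]))   # regular noun + plural
print(accepts(["mice", "s"]))  # irregular plural cannot take -s
```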
Morphological Analysis 6
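Letter Transducers
[Figure: letter-level automaton whose arcs spell out the words below one letter at a time, followed by an s arc and an ε arc]
Words: fox, cat, dog, donkey, mouse, mice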
Morphological Analysis 7
upper level (lexical)   cat +N      cat +N +pl
lower level (surface)   cat         cats

c:c a:a t:t +N:ε +pl:s
Morphological Analysis 8
Using FST
• As a recognizer
  • Given a pair of strings (one lexical, one surface), determines whether one is a transduction of the other
• As a generator
  • Generates pairs of strings
• As a translator
  • From a surface string, generates its lexical translation
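The three uses can be illustrated on the cat/cats example written in pair notation (c:c a:a t:t +N:ε +pl:s). The sketch below is not a real automaton: it enumerates two accepting paths by hand and derives a finite relation from them. `PATHS`, `RELATION` and the three function names are assumptions made for this example.

```python
EPS = ""  # ε: nothing is emitted on that side

# Each path is a list of upper:lower (lexical:surface) symbol pairs.
PATHS = [
    [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS)],
    [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+pl", "s")],
]


def sides(path):
    """Read the lexical and the surface string off one path."""
    return "".join(u for u, _ in path), "".join(l for _, l in path)


RELATION = {sides(path) for path in PATHS}


def recognize(lexical, surface):
    """Recognizer: is this pair of strings in the relation?"""
    return (lexical, surface) in RELATION


def generate():
    """Generator: list every pair of strings in the relation."""
    return sorted(RELATION)


def translate(surface):
    """Translator: map a surface string to its lexical strings."""
    return sorted(lex for lex, surf in RELATION if surf == surface)


print(recognize("cat+N+pl", "cats"))
print(generate())
print(translate("cats"))
```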
Morphological Analysis 9
reg_noun   irreg_pl_noun    irreg_sg_noun   plural
fox        sheep            sheep           s
cat        m o:i u:ε ce     mouse
dog        g o:e o:e se     goose

[Figure: transducer with states 0 to 6; arcs labelled reg_noun, irreg_sg_noun, irreg_pl_noun, +N:ε, +sg:ε, +pl:s, +pl:ε]
Morphological Analysis 10
lexical level        f o x +N +pl
  (morphotactics)
intermediate level   f o x ^ s
  (spelling rules)
surface level        f o x e s
Morphological Analysis 11
[Figure: letter transducer for the words fox, cat, dog, donkey, mouse, mice; arcs are labelled with single letters and with the pairs o:i, u:ε, +N:ε, +sg:ε, +pl:^s, +pl:ε]
Morphological Analysis 12
Spelling rules

name                 description                                          example
consonant doubling   single-letter consonant doubled before -ing/-ed      beg/begging
e deletion           silent e dropped before -ing/-ed                     make/making
e insertion          e added after -s, -z, -x, -ch, -sh before -s         watch/watches
y replacement        -y changes to -ie before -s, to -i before -ed        try/tries
k insertion          verbs ending with vowel + c add -k                   panic/panicked
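The five rules in the table can be approximated with a few regular expressions. This is a rough sketch for generation only, under simplifying assumptions: consonant doubling is limited to one-syllable consonant-vowel-consonant stems, "silent e" is detected from spelling alone, and irregular forms are ignored. `add_s` and `add_suffix` are hypothetical helper names.

```python
import re


def add_s(stem):
    """Attach -s, applying e insertion and y replacement."""
    if re.search(r"(s|z|x|ch|sh)$", stem):
        return stem + "es"            # e insertion: watch -> watches
    if re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"      # y replacement: try -> tries
    return stem + "s"


def add_suffix(stem, suffix):
    """Attach -ing or -ed, applying k insertion, e deletion, y replacement, doubling."""
    if re.search(r"[aeiou]c$", stem):
        return stem + "k" + suffix    # k insertion: panic -> panicked
    if stem.endswith("e") and (suffix == "ed" or re.search(r"[^aeiou]e$", stem)):
        return stem[:-1] + suffix     # e deletion: make -> making
    if suffix == "ed" and re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "i" + suffix   # y replacement: try -> tried
    if re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):
        return stem + stem[-1] + suffix   # consonant doubling: beg -> begging
    return stem + suffix


for stem, suffix in [("beg", "ing"), ("make", "ing"), ("panic", "ed"), ("try", "ed")]:
    print(stem, "+", suffix, "->", add_suffix(stem, suffix))
print("watch + s ->", add_s("watch"))
```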
Morphological Analysis 13
Spelling rules: e-insertion

ε:e ⇔ [xsz]^:ε ___ s#

decomposition of ⇔ into ⇒ and /⇐:

ε:e ⇒ [xsz]^:ε ___ s#
ε:ε /⇐ [xsz]^:ε ___ s#
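Read in the generation direction, the e-insertion rule says: insert an e after x, s or z plus a morpheme boundary, when the next symbols are s and the end of the word. A minimal regex sketch of that reading follows. It assumes the intermediate string marks the boundary with ^ and the word end with #, and it covers only the [xsz] context of this rule. `e_insertion` is a name chosen for this example.

```python
import re


def e_insertion(intermediate):
    """Map an intermediate string such as 'fox^s#' to its surface form."""
    # Left context [xsz]^, right context s#; the boundary is replaced by the inserted e.
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    # Remaining boundary and end markers are realized as nothing on the surface.
    return with_e.replace("^", "").replace("#", "")


print(e_insertion("fox^s#"))
print(e_insertion("cat^s#"))
```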
Morphological Analysis 14
epenthesis

+:e <=> {< {s:s c:c} h:h> s:s x:x z:z} --- s:s

Operators
• =>   context restriction
• <=   surface coercion
• <=>  both => and <=

Context symbols
• C: {...}
• V: {a,e,i,o,u,y}
• C2: {...}
• =: whatever

example:
  lexical   b o x + s
  surface   b o x e s
Morphological Analysis 15
e-deletion

e:0 <=> =:C2 --- <+:0 V:=>
     or <C:C V:V> --- <+:0 e:e>
     or <c:c g:g> --- <+:0 {e:e i:i}>
     or l:0 --- +:0
     or c:c --- <+:0 a:0 t:t b:b>

examples:
  lexical   mov e + ed      surface   mov ed
  lexical   agre e + ed     surface   agre ed
Morphological Analysis 16
a-deletion

a:0 <=> <c:c e:0 +:0> --- t:t

example:
  lexical   redu c e + a t ion
  surface   redu c t ion

Rule layout: ... left context, focus, right context ...
Morphological Analysis 17
lexical level        f o x +N +pl
  Lexicon-FST
intermediate level   f o x ^ s
  spelling rules: FST1, FST2, ..., FSTn
surface level        f o x e s
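The three levels can be sketched as two functions applied in sequence: one standing in for the Lexicon-FST and one standing in for the spelling-rule transducers. This is a simplification made for illustration: the lexicon is a three-noun toy set, the spelling rules are reduced to a single e-insertion rule, and plain functions replace real transducers. All names are assumptions.

```python
import re

NOUNS = {"fox", "cat", "dog"}  # toy lexicon of regular nouns


def lexical_to_intermediate(lexical):
    """Stand-in for the Lexicon-FST: 'fox+N+pl' -> 'fox^s#'."""
    match = re.fullmatch(r"(\w+)\+N\+(sg|pl)", lexical)
    if match is None or match.group(1) not in NOUNS:
        raise ValueError("no lexical entry for %r" % lexical)
    stem, number = match.groups()
    return stem + ("^s" if number == "pl" else "") + "#"


def intermediate_to_surface(intermediate):
    """Stand-in for the spelling rules: only e-insertion after x, s, z."""
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1e", intermediate)
    return with_e.replace("^", "").replace("#", "")


def generate(lexical):
    """Run both levels in sequence, from lexical to surface."""
    return intermediate_to_surface(lexical_to_intermediate(lexical))


for lexical in ["fox+N+pl", "cat+N+pl", "dog+N+sg"]:
    print(lexical, "->", lexical_to_intermediate(lexical), "->", generate(lexical))
```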
Morphological Analysis 18
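[Figure: the spelling-rule transducers FST1 ... FSTn, applied in parallel below the Lexicon-FST, are merged by intersection into a single transducer FSTA = FST1 ∧ ... ∧ FSTn; the Lexicon-FST is then combined with FSTA by composition (Lexicon-FST • FSTA)]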
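Intersection and composition are easiest to see on finite relations, that is, sets of (input, output) string pairs. In the sketch below each rule is a hand-enumerated set of the pairs it permits: the intersection keeps the pairs every rule permits, and composition chains the lexicon relation with the result. The three relations are toy data invented for this example. Real systems apply these operations to the automata themselves.

```python
def compose(r1, r2):
    """Relation composition: (a, c) whenever (a, b) is in r1 and (b, c) is in r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


# Lexical -> intermediate pairs (stand-in for the Lexicon-FST).
LEXICON = {("fox+N+pl", "fox^s#"), ("cat+N+pl", "cat^s#"), ("try+N+pl", "try^s#")}

# Intermediate -> surface pairs permitted by an e-insertion rule,
# which places no constraint on how y is realized.
RULE_E = {("fox^s#", "foxes"), ("cat^s#", "cats"), ("try^s#", "trys"), ("try^s#", "tries")}

# Pairs permitted by a y-replacement rule, which places no constraint on e-insertion.
RULE_Y = {("fox^s#", "foxes"), ("fox^s#", "foxs"), ("cat^s#", "cats"), ("try^s#", "tries")}

RULES = RULE_E & RULE_Y               # intersection: pairs allowed by all rules
ANALYZER = compose(LEXICON, RULES)    # composition: lexical -> surface

print(sorted(ANALYZER))
```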
Automatic morphology learning 1
• Problem
  • Paradigm: stem + affixes
  • Obtaining the stems
  • Classification of stems into models
  • Learning part of the morphology (e.g. derivational)
• Two approaches
  • No previous morphological knowledge is available
    • Goldsmith, 2001
    • Brent, 1999
    • Snover, Brent, 2001, 2002
  • Morphological knowledge can be used
    • Oliver et al., 2002
Automatic morphology learning 2
• Automatic morphological analysis
  • Identification of borders between morphemes
    • Zellig Harris
    • {prefix, suffix} conditional entropy
    • bigrams and trigrams with high probability of forming a morpheme
  • Learning of patterns or rules of mapping between pairs of words
  • Global approach (top-down)
    • Goldsmith, Brent, de Marcken
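One boundary-detection idea usually associated with Harris is successor variety: count how many different letters can follow each word prefix in a word list, and treat a prefix with many possible continuations as a likely morpheme boundary. The sketch below assumes this is the technique meant by the Harris bullet. The word list is a toy sample and "#" is an end-of-word marker chosen for this example.

```python
from collections import defaultdict


def successor_variety(words):
    """For every prefix, count the distinct symbols that can follow it."""
    followers = defaultdict(set)
    for word in words:
        for i in range(len(word) + 1):
            # The symbol after the prefix word[:i]; '#' marks the end of the word.
            nxt = word[i] if i < len(word) else "#"
            followers[word[:i]].add(nxt)
    return {prefix: len(symbols) for prefix, symbols in followers.items()}


WORDS = ["walk", "walks", "walked", "walking", "wall"]
variety = successor_variety(WORDS)
for prefix in ["wal", "walk", "walke"]:
    print(prefix, variety[prefix])
```

In this sample the count peaks after "walk", which matches the stem boundary.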
Automatic morphology learning 3
• Goldsmith’s system based on MDL (Minimum Description Length)
  • Initial partition: word -> stem + suffix
    • split-all-words
      • A good candidate {stem, suffix} split in a word has to be a good candidate in many other words
    • MI (mutual information) strategy
      • Faster convergence
  • Learning signatures
    • {signatures, stems, suffixes}
    • MDL
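The intuition behind MDL here is that a good analysis makes the lexicon shorter to write down: shared stems and suffixes are stored once and each word becomes a pair of pointers. The sketch below compares two crude costs measured in symbols rather than bits. The cost functions are a drastic simplification invented for this example, not Goldsmith's actual description-length formula.

```python
def cost_word_list(words):
    """Cost of listing every word in full (one extra symbol per word as separator)."""
    return sum(len(word) + 1 for word in words)


def cost_split(words, split_at):
    """Cost of storing each distinct stem and suffix once, plus two pointers per word."""
    stems = {word[: split_at[word]] for word in words}
    suffixes = {word[split_at[word]:] for word in words}
    letters = sum(len(s) + 1 for s in stems) + sum(len(s) + 1 for s in suffixes)
    pointers = 2 * len(words)  # assumed unit cost per pointer
    return letters + pointers


WORDS = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped", "jumping"]
SPLITS = {word: 4 for word in WORDS}  # hypothesis: every stem is 4 letters long

print(cost_word_list(WORDS), cost_split(WORDS, SPLITS))
```

With this hypothesis the split model costs 36 symbols against 52 for the plain word list, so it would be preferred.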
Automatic morphology learning 4
• Semi-automatic morphological analysis
  • Oliver, 2004
    • Starts with a set of manually written morphological rules
      • TL:TF:Desc
        • lemma ending
        • form ending
        • POS
    • Lists of non-inflecting classes, closed classes and irregular words
    • Corpora
      • Serbo-Croatian: 9 Mw
      • Russian: 16 Mw
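A rule in the TL:TF:Desc format pairs a lemma ending with a form ending and a description. Applied to a word form, it replaces the form ending with the lemma ending to propose a lemma. The sketch below shows that mechanism only. The three English rules and the description strings are invented for illustration (the corpora above are Serbo-Croatian and Russian), and how candidates are filtered afterwards is not covered.

```python
# Hypothetical rules in TL:TF:Desc format (lemma ending : form ending : description).
RULES = ["y:ies:N+pl", ":s:N+pl", "e:ing:V+pres_part"]


def parse_rule(rule):
    """Split a rule string into (lemma ending, form ending, description)."""
    lemma_ending, form_ending, description = rule.split(":")
    return lemma_ending, form_ending, description


def candidates(form):
    """Propose (lemma, description) pairs for every rule whose form ending matches."""
    proposals = []
    for lemma_ending, form_ending, description in map(parse_rule, RULES):
        if form.endswith(form_ending) and len(form) > len(form_ending):
            stem = form[: len(form) - len(form_ending)]
            proposals.append((stem + lemma_ending, description))
    return proposals


print(candidates("cities"))
print(candidates("merging"))
```

The rules overgenerate: "cities" yields both "city" and the non-word "citie", so some later step has to choose among the candidates.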