morphological analysis
DESCRIPTION
Morphological Analysis. Chapter 3. Morphology. Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using morphemes base form ( stem,lemma ), e.g., believe affixes (suffixes, prefixes, infixes), e.g., un-, -able, - ly - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/1.jpg)
Morphological Analysis
Chapter 3
![Page 2: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/2.jpg)
Morphology
• Morpheme = "minimal meaning-bearing unit in a language"
• Morphology handles the formation of words by using morphemes– base form (stem,lemma), e.g., believe– affixes (suffixes, prefixes, infixes), e.g., un-, -able, -ly
• Morphological parsing = the task of recognizing the morphemes inside a word– e.g., hands, foxes, children
• Important for many tasks– machine translation, information retrieval, etc.– Parsing, text simplification, etc
2
![Page 3: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/3.jpg)
Morphemes and Words
• Combine morphemes to create words Inflection
combination of a word stem with a grammatical morpheme same word class, e.g. clean (verb), clean-ing (verb)
Derivation combination of a word stem with a grammatical morpheme Yields different word class, e.g delight (verb), delight-ful
(adj) Compounding
combination of multiple word stems Cliticization
combination of a word stem with a clitic different words from different syntactic categories, e.g. I’ve
= I + have
3
![Page 4: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/4.jpg)
Inflectional Morphology
• Inflectional Morphology• word stem + grammatical morpheme cat + s• only for nouns, verbs, and some adjectives
• Nouns plural: regular: +s, +es irregular: mouse - mice; ox - oxen many spelling rules: e.g. -y -> -ies like: butterfly -
butterflies possessive: +'s, +'
• Verbs main verbs (sleep, eat, walk) modal verbs (can, will, should) primary verbs (be, have, do)
4
![Page 5: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/5.jpg)
Inflectional Morphology (verbs)
• Verb Inflections for:• main verbs (sleep, eat, walk); primary verbs (be, have, do)
• Morpholog. Form Regularly Inflected Form• stem walk merge try
map• -s form walks merges tries maps• -ing participle walking merging trying
mapping• past; -ed participle walked merged tried
mapped
• Morph. Form Irregularly Inflected Form• stem eat catch cut • -s form eats catches cuts • -ing participle eating catching cutting • -ed past ate caught cut• -ed participle eaten caught cut
5
![Page 6: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/6.jpg)
• Noun Inflections for:• regular nouns (cat, hand); irregular nouns(child, ox)
• Morpholog. Form Regularly Inflected Form• stem cat hand• plural form cats hands
• Morph. Form Irregularly Inflected Form• stem child ox • plural form children oxen
Inflectional Morphology (nouns)
6
![Page 7: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/7.jpg)
Inflectional and Derivational Morphology (adjectives)
• Adjective Inflections and Derivations:
• prefix un- unhappy adjective, negation• suffix -ly happily adverb, manner• suffix -ier, -iest happier, happiest comparatives• suffix -ness happiness noun
• plus combinations, like unhappiest, unhappiness.
• Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g. no negation for big.
7
![Page 8: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/8.jpg)
Derivational Morphology (nouns)
8
![Page 9: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/9.jpg)
Derivational Morphology (adjectives)
9
![Page 10: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/10.jpg)
Verb Clitics
10
![Page 11: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/11.jpg)
11
Morpholgy and FSAs
• We’d like to use the machinery provided by FSAs to capture these facts about morphology Recognition:
Accept strings that are in the language Reject strings that are not
In a way that doesn’t require us to in effect list all the words in the language
![Page 12: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/12.jpg)
12
Computational Lexicons
• Depending on the purpose, computational lexicons have various types of information Between FrameNet and WordNet, we
saw POS, word sense, subcategorization, semantic roles, and lexical semantic relations
For our purposes now, we care about stems, irregular forms, and information about affixes
![Page 13: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/13.jpg)
13
Starting Simply
• Let’s start simply: Regular singular nouns listed explicitly
in lexicon Regular plural nouns have an -s on the
end Irregulars listed explicitly too
![Page 14: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/14.jpg)
14
Simple Rules
![Page 15: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/15.jpg)
15
Now Plug in the Words
Recognition of valid wordsBut “foxs” isn’t right; we’ll see how to fix that
![Page 16: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/16.jpg)
16
Parsing/Generation vs. Recognition
• We can now run strings through these machines to recognize strings in the language
• But recognition is usually not quite what we need Often if we find some string in the language we
might like to assign a structure to it (parsing) Or we might have some structure and we want to
produce a surface form for it (production/generation)
• Example From “cats” to “cat +N +PL”
![Page 17: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/17.jpg)
17
Finite State Transducers
• Add another tape• Add extra symbols to the transitions• On one tape we read “cats”, on the
other we write “cat +N +PL”
![Page 18: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/18.jpg)
18
FSTs
![Page 19: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/19.jpg)
19
Applications
• The kind of parsing we’re talking about is normally called morphological analysis
• It can either be • An important stand-alone component of
many applications (spelling correction, information retrieval)
• Or simply a link in a chain of further linguistic analysis
![Page 20: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/20.jpg)
20
Transitions
• c:c means read a c on one tape and write a c on the other• +N:ε means read a +N symbol on one tape and write
nothing on the other• +PL:s means read +PL and write an s
c:c a:a t:t +N: ε +PL:s
![Page 21: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/21.jpg)
21
Ambiguity
• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.• Didn’t matter which path was actually
traversed• In FSTs the path to an accept state
does matter since different paths represent different parses and different outputs will result
![Page 22: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/22.jpg)
22
Ambiguity
• What’s the right parse (segmentation) for• Unionizable• Union-ize-able• Un-ion-ize-able
• Each represents a valid path through the morphology machine.
![Page 23: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/23.jpg)
23
Ambiguity
• There are a number of ways to deal with this problem• Simply take the first output found• Find all the possible outputs (all paths)
and return them all (without choosing)• Bias the search so that only one or a few
likely paths are explored
![Page 24: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/24.jpg)
24
The Gory Details
• Of course, its not as easy as • “cat +N +PL” <-> “cats”
• As we saw earlier there are geese, mice and oxen
• But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes• Fox and Foxes vs. Cat and Cats
![Page 25: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/25.jpg)
25
Multi-Tape Machines
• To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next
• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
![Page 26: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/26.jpg)
26
Multi-Level Tape Machines
• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
#
![Page 27: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/27.jpg)
27
Intermediate to Surface
• The add an “e” rule as in fox^s# --> foxes#
![Page 28: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/28.jpg)
28
Lexical to Intermediate Level
![Page 29: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/29.jpg)
29
Foxes
#
This arrow should point straight down
![Page 30: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/30.jpg)
30
Notes
• The transducers may be run in the other direction too (examples in lecture)
• The transducers are cascaded: The output of one layer serves as the input to the next
![Page 31: Morphological Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062805/56814d27550346895dba59f4/html5/thumbnails/31.jpg)
31
Overall Scheme
#
We aren’t coveringthe overall schemein any more detail than this