Natural Language Processing Chapter 3 : Morphological Analysis

Upload: arthur-mcdonald

Post on 13-Dec-2015


Natural Language Processing

Chapter 3 : Morphological Analysis

04/18/23 NLP 2

Definition

• Morphology is the study of word formation – how words are built up from smaller pieces. When we do morphological analysis, then, we’re asking questions like, what pieces does this word have? What does each of them mean? How are they combined?

• Goal: Given a word that’s not in the dictionary, can we derive a root form that is in the dictionary?

• Morphological analysis is the process of recognizing the root form and type of a morphological variant (prefix, suffix). Given a word W:


Algorithm

• 1- If W is in the dictionary, then return its definition.
• 2- Else apply morphology rules to identify possible root forms of W.
  - Each morphology rule strips a prefix or suffix from W, and sometimes adds back replacement characters, to produce a possible root form. If the root form is in the dictionary, success!
  - When a morphology rule succeeds, the root word definition is returned along with properties of the morphological variant.
• Rules must be applied recursively! Multiple derivations are common!
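The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries, the rule list, and the names `DICTIONARY`, `RULES`, and `analyze` are hypothetical and do not come from the slides, and real rule sets are far larger.

```python
# Hypothetical toy dictionary: root form -> (part of speech, short definition).
DICTIONARY = {
    "walk": ("verb", "to move on foot"),
    "happy": ("adjective", "feeling pleasure"),
    "try": ("verb", "to attempt"),
}

# Hypothetical rules: (affix kind, affix to strip, replacement, variant property).
RULES = [
    ("suffix", "ies", "y", "plural or 3rd-person singular"),
    ("suffix", "ied", "y", "past tense"),
    ("suffix", "ed", "", "past tense"),
    ("suffix", "ing", "", "progressive"),
    ("suffix", "ily", "y", "adverb"),
    ("suffix", "s", "", "plural or 3rd-person singular"),
    ("prefix", "un", "", "negation"),
    ("prefix", "re", "", "repetition"),
]


def analyze(word, max_depth=3):
    """Return all derivations of `word` as (root, definition, properties)."""
    if word in DICTIONARY:                       # step 1: direct lookup
        return [(word, DICTIONARY[word], [])]
    if max_depth == 0:                           # stop runaway recursion
        return []
    results = []
    for kind, affix, replacement, prop in RULES:  # step 2: try every rule
        if len(word) <= len(affix):
            continue
        if kind == "suffix" and word.endswith(affix):
            candidate = word[: -len(affix)] + replacement
        elif kind == "prefix" and word.startswith(affix):
            candidate = replacement + word[len(affix):]
        else:
            continue
        # Recurse: the candidate may itself be a morphological variant.
        for root, definition, props in analyze(candidate, max_depth - 1):
            results.append((root, definition, props + [prop]))
    return results


for derivation in analyze("unhappily"):
    print(derivation)
```

With these toy rules, "unhappily" is reached by two different rule orders (strip "un-" first or strip "-ily" first), which illustrates why multiple derivations are common.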


Sample Morphology Rules


Example of Morphological Analysis


Basic Parts of Speech

• Parts of Speech: adjective, adverb, article, conjunction, noun, verb, preposition, pronoun, ...

• A closed class is a class that contains a relatively fixed set of words; new words are rarely introduced into the language.
• Ex: articles, conjunctions, pronouns, prepositions, ...
• An open class is a class that contains a constantly changing set of words and readily accepts new members; new words are often introduced into the language.
• Ex: adjectives, adverbs, nouns, verbs

Examples of Closed Classes
• Articles: a, an, the
• Conjunctions: and, but, or, ...
• Demonstratives: this, that, these, ...


• Prepositions: to, for, with, between, at, of, ...
• Pronouns: I, you, me, we, he, she, him, her, ...
• Quantifiers (words of indefinite quantity): some, every, most, any, both, ...

Articles

• Articles are especially problematic for natural language generation.

• Many noun phrases begin with an article.
• Ex: a newspaper, an apple, the movie
• But there are many exceptions, for example:
• The bowl was full of rice. - The bowl was full of apple.
• I go to college. - I go to university.
• She went on vacation. - She went on trip.


Nouns
• Nouns: words that represent objects, places, concepts, events.
• Ex: dog, city, idea, marathon
• Proper nouns: names of persons, cities
• Count nouns: describe specific objects or sets of objects.
• Ex: dogs, cities, ideas, marathons
• Mass nouns: describe composites or substances.
• Ex: dirt, water, garbage

Modifiers
• Adjectives: words that attribute qualities to objects.
• Ex: wet, loud, happy, funny
• Noun modifiers: nouns that modify other nouns.
• Ex: dog food, aluminum can, song book


Prepositions and Particles

• Prepositions represent relationships, such as time, location, modification, and complements. For example:
• He put the book on the table.
• Sam gave the book to Mary.
• Jane walked up the stairs.
• Particles follow verbs and create a new meaning. For example:
• Greg passed out.
• Charlie threw up his lunch.
• Sometimes there is preposition/particle ambiguity:
• Sarah looked over the paper.


Verbs
• Verbs: represent actions, commands, or assertions.
• Main verbs: walk, eat, believe, claim, ask, ...
• Auxiliary verbs: be, do, have
• Modals: would, should, could, can, will, may, ...
• Transitive verbs: take a direct object complement.
• Ex: eat an apple, read a book, sing a song
• Intransitive verbs: do not take a direct object.
• Ex: she laughed, he lied, I slept.
• Bitransitive (ditransitive) verbs: take both a direct object and an indirect object.
• I gave Mary a gift.
• She sang the baby a lullaby.


Part-of-Speech Tagging

Tagging: the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.

Example:

WORDS: the girl kissed the baby on the cheek

TAGS: N, V, P, DET


WORD: the girl kissed the baby on the cheek

LEMMA: the girl kiss the baby on the cheek

TAG: +DET +NOUN +VPAST +DET +NOUN +PREP +DET +NOUN


Rule-Based Tagging

• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules of the type: if word+1 is an adjective, adverb, or quantifier, and the word after that is a sentence boundary, and word-1 is not a verb like “consider”, then eliminate non-adverb tags; else eliminate the adverb tag.
– Typically more than 1000 hand-written rules, but rules may also be machine-learned.
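The assign-then-eliminate idea can be sketched as follows. The slide does not say which word its sample rule targets; the sketch assumes it is the word "that", as in "it isn't that odd". The lexicon, the tag names, the `CONSIDER_LIKE` list, and the fallback tag for unknown words are all hypothetical.

```python
# Hypothetical lexicon: word -> every tag the word can take.
LEXICON = {
    "it": {"PRON"},
    "is": {"V"},
    "not": {"ADV"},
    "that": {"ADV", "PRON", "DET", "COMP"},
    "odd": {"ADJ"},
    "i": {"PRON"},
    "consider": {"V"},
}
# Assumed list of verbs that behave like "consider".
CONSIDER_LIKE = {"consider", "believe", "find"}
BOUNDARY = "</s>"  # sentence-boundary marker


def tag_candidates(words):
    """Step 1: assign all possible tags to each word (unknown words: N)."""
    return [set(LEXICON.get(w.lower(), {"N"})) for w in words]


def apply_that_rule(words, candidates):
    """Step 2: prune the tags of 'that' using the constraint from the slide."""
    lowered = [w.lower() for w in words]
    padded = lowered + [BOUNDARY, BOUNDARY]
    for i, word in enumerate(lowered):
        if word != "that" or "ADV" not in candidates[i]:
            continue
        next_tags = LEXICON.get(padded[i + 1], set())
        previous = padded[i - 1] if i > 0 else BOUNDARY
        adverbial = (
            bool(next_tags & {"ADJ", "ADV", "QUANT"})
            and padded[i + 2] == BOUNDARY
            and previous not in CONSIDER_LIKE
        )
        if adverbial:
            candidates[i] = {"ADV"}                   # eliminate non-adverb tags
        else:
            candidates[i] = candidates[i] - {"ADV"}   # eliminate the adverb tag
    return candidates


sentence = ["It", "is", "not", "that", "odd"]
print(apply_that_rule(sentence, tag_candidates(sentence)))
```

In "I consider that odd" the same rule removes the adverb reading instead, because the preceding word is a verb like "consider".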


Stochastic Tagging

• Based on the probability of a certain tag occurring, given the possible tags for the word

• Requires a training corpus

• No probabilities for words not in corpus.

• Training corpus may be different from test corpus.
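A minimal sketch of the simplest stochastic tagger estimates P(tag | word) by relative frequency in a tagged training corpus and picks the most probable tag. Real stochastic taggers usually also condition on neighboring tags; this word-only version is a simplifying assumption. The tiny corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical tagged training corpus: (word, tag) pairs.
TRAINING = [
    ("the", "DET"), ("girl", "NOUN"), ("kissed", "VPAST"), ("the", "DET"),
    ("baby", "NOUN"), ("on", "PREP"), ("the", "DET"), ("cheek", "NOUN"),
    ("the", "DET"), ("kiss", "NOUN"), ("they", "PRON"), ("kiss", "VERB"),
    ("a", "DET"), ("kiss", "NOUN"),
]


def train(tagged_words):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return counts


def tag_probability(counts, word, tag):
    """Estimate P(tag | word); None when the word was never seen."""
    if word not in counts:
        return None
    return counts[word][tag] / sum(counts[word].values())


def most_probable_tag(counts, word):
    """Pick the tag seen most often with the word; None for unseen words."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


counts = train(TRAINING)
print(most_probable_tag(counts, "kiss"), tag_probability(counts, "kiss", "NOUN"))
print(most_probable_tag(counts, "dog"))  # unseen word: no probability available
```

The second call shows the limitation listed above: a word that never occurs in the training corpus has no probability estimate, so a practical tagger needs some fallback for it.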


Transformation-Based Tagging (Brill Tagging)

• Combination of rule-based and stochastic tagging methodologies
– Like rule-based because rules are used to specify tags in a certain environment
– Like the stochastic approach because machine learning is used, with a tagged corpus as input
• Input:
– tagged corpus
– dictionary (with most frequent tags)
• Usually constructed from the tagged corpus
• Basic Idea:
– Set the most probable tag for each word as a start value
– Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order
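The basic idea can be sketched as a two-pass procedure: start from the most frequent tag for each word, then apply an ordered list of transformations. The dictionary contents, the rule encoding `(from_tag, to_tag, tag of word-1)`, and the fallback tag for unknown words are assumptions made for this illustration.

```python
# Hypothetical dictionary of most frequent tags (normally built from the corpus).
MOST_FREQUENT_TAG = {"the": "DET", "can": "VERB", "is": "VERB", "rusty": "ADJ"}

# Ordered transformations: (from_tag, to_tag, required tag of word-1).
# The single rule encodes the example quoted above.
RULES = [("VERB", "NOUN", "DET")]


def initial_tags(words):
    """Start value: the most probable tag per word (unknown words: NOUN)."""
    return [MOST_FREQUENT_TAG.get(w.lower(), "NOUN") for w in words]


def apply_rules(tags, rules):
    """Apply each transformation, in order, across the whole sentence."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


words = ["The", "can", "is", "rusty"]
start = initial_tags(words)
print(start)
print(apply_rules(start, RULES))
```

Here "can" starts out with its assumed most frequent tag, VERB, and the determiner rule changes it to NOUN, while "is" keeps its VERB tag because it does not follow a determiner.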


• Training is done on a tagged corpus:
– 1. Write a set of rule templates
– 2. Among the set of rules, find the one with the highest score
– 3. Continue from 2 until the lowest score threshold is passed
– 4. Keep the ordered set of rules
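The training loop can be sketched as a greedy search. This sketch assumes a single rule template ("change tag A to B when the previous tag is C") and scores a rule by the number of tagging errors it removes on the tagged corpus; both choices, and all names below, are illustrative assumptions rather than the slides' definition.

```python
from itertools import product


def apply_rule(tags, rule):
    """Apply one (from_tag, to_tag, prev_tag) transformation left to right."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags, gold):
    """Number of positions where the current tags disagree with the corpus."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(current, gold, tagset, min_score=1):
    """Greedily collect the best-scoring rule until none reaches min_score."""
    learned = []
    current = list(current)
    while True:
        best_rule, best_score = None, 0
        # Instantiate the one template for every (A, B, C) combination.
        for rule in product(tagset, repeat=3):
            if rule[0] == rule[1]:
                continue
            score = errors(current, gold) - errors(apply_rule(current, rule), gold)
            if score > best_score:          # keep the highest-scoring rule
                best_rule, best_score = rule, score
        if best_rule is None or best_score < min_score:
            break                           # score threshold passed: stop
        learned.append(best_rule)           # the order of learning is kept
        current = apply_rule(current, best_rule)
    return learned


initial = ["DET", "VERB", "VERB", "DET", "VERB"]   # most-frequent-tag start
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]      # tags from the corpus
print(learn_rules(initial, gold, ["DET", "NOUN", "VERB"]))
```

On this toy input the learner recovers the determiner rule from the previous slide, since it fixes both errors without introducing new ones.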