word tagging using max entropy model and feature selection


Word Tagging using Max Entropy Model and Feature Selection

NLP Final Project

Advanced NLP Pre-PhD Course

Submitted to:

Prof.Dr. Ali Fahmy

Prof.Dr. Ali Farghali

Submitted by:

Eman Negm

Marwa Mostafa

Wessam Sayed

Yomna Mahmoud

Yosr Eman


Contents

Project Introduction and Motivation
    What is a Maximum Entropy Model?
    Why Maximum Entropy in NLP?
    Why are we concerned with POS Tagging?
Tools
Methodology
    Corpus Selection
    Part-of-Speech Tags Selection
    Indicators Definition
    MaxEntropy learning module
Results and Analysis
    For the Noun Phrase tag
        Sample of Selected Features
    For the Verb Phrase tag
        Sample of Selected Features
    For the Adjective tag
        Sample of Selected Features
    For the Adverb tag
        Sample of Selected Features
    For the Pronoun tag
        Sample of Selected Features
    For all tags
References


Project Introduction and Motivation

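What is a Maximum Entropy Model?

A maximum entropy (MaxEnt) model is an information-theoretic tool for constructing a probability model from partially available data. When modeling unknown events, among all the models that are consistent with the available data we choose the one that has maximum entropy, i.e. the one that assumes nothing beyond what is known.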
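As a small illustration of the principle (not part of the original project code), the sketch below computes the Shannon entropy of two distributions over four outcomes. The function name `entropy` and both example distributions are assumptions chosen for illustration. With no constraint other than the probabilities summing to one, the uniform distribution has the higher entropy, which is why a MaxEnt model prefers it when nothing else is known.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # assumes nothing beyond "four outcomes"
skewed = [0.70, 0.10, 0.10, 0.10]   # encodes an extra preference not backed by data

print(entropy(uniform))
print(entropy(skewed))
```

The uniform distribution reaches the maximum of 2 bits for four outcomes; any skew lowers the entropy.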

Why Maximum Entropy in NLP?

MaxEnt has been applied successfully in various fields, including NLP. Work similar to ours has been presented over the years [1, 2, 8].

Why are we concerned with POS Tagging?

Part-of-Speech (POS) tagging is the task of determining the grammatical role of every word in a sentence based on its definition and context. POS tagging helps the computer to distinguish words grammatically and to understand sentences correctly.


Tools

In this project we used NLTK (Natural Language Toolkit), a Python-based toolkit. Our work was done on Windows, and we used Python 3 as the base for our code.

Methodology

Our work was divided into five main steps:

1. Corpus selection.
2. Part-of-Speech tags selection.
3. Indicators definition.
4. MaxEntropy learning module.
5. Running the algorithms on different selected features and performing analysis (this step is described in detail in the Results and Analysis section).

We go through each step and describe it in detail.

Corpus Selection:

Our focus was to select a tagged corpus so that we could validate our work by comparing our results to the existing tags. We selected a part of a well-known corpus that has been used heavily in natural language research, the Brown Corpus [3], and used 90% of it as training data and 10% as test data.
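A minimal sketch of such a 90/10 split is shown below. The helper name `split_train_test` is hypothetical, and holding out the last 10% (rounded down) is an assumption: the report does not say whether the data was shuffled or which portion was held out. A toy list stands in for the corpus so that the example runs without NLTK; with NLTK and the corpus data installed, `nltk.corpus.brown.tagged_words()` could supply real (word, tag) pairs.

```python
def split_train_test(items, test_percent=10):
    """Hold out the last `test_percent` percent of the items (rounded down)."""
    items = list(items)
    n_test = len(items) * test_percent // 100
    cut = len(items) - n_test
    return items[:cut], items[cut:]

# Toy stand-in for a list of (word, tag) pairs of the size reported for nouns.
toy = [("word%d" % i, "NN") for i in range(18438)]
train, test = split_train_test(toy)
print(len(train), len(test))
```

With 18438 items, rounding down gives a test set of 1843 items, which matches the testing data size reported for the noun tag.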

Part-of-Speech Tags Selection:

After selecting the tagged corpus, we selected the tags. We chose the tags for the most common words; the main selected tags are nouns, verbs, pronouns, adjectives and adverbs. Each main tag has a set of subtags. Descriptions of the subtags are available on the University of Leeds website [4].

Indicators Definition:

For each of the tags defined above, we defined a set of indicators that represent the appearance of the tagged POS. For example, for a verb in the present tense, 3rd person singular, we defined the following indicator: the word ends with s or es, and the word before it is an adverb or a noun.

These sets of indicators were collected from various online resources and from our knowledge of English grammar.

The indicators were matched using regular expressions.
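The verb example above can be sketched as a regular-expression indicator. This is an illustrative reconstruction, not the project's actual code: the function name, the pattern names and the choice of Brown-style tag prefixes (NN and NP for nouns, RB for adverbs) are assumptions.

```python
import re

# Hypothetical indicator for present-tense, 3rd person singular verbs:
# the word ends in "s" (which also covers "es") and the previous word
# is tagged as a noun or an adverb.
ENDS_WITH_S = re.compile(r"\w+s$", re.IGNORECASE)
NOUN_OR_ADVERB_TAG = re.compile(r"^(NN|NP|RB)")  # assumed Brown-style prefixes

def third_person_verb_indicator(word, previous_tag):
    """Return True only when both parts of the indicator match."""
    word_matches = bool(ENDS_WITH_S.search(word))
    context_matches = bool(NOUN_OR_ADVERB_TAG.match(previous_tag or ""))
    return word_matches and context_matches

print(third_person_verb_indicator("plays", "NP"))  # proper noun before the word
print(third_person_verb_indicator("play", "NP"))   # no -s ending
print(third_person_verb_indicator("plays", "AT"))  # article before the word
```

Only the first call satisfies both conditions. An indicator like this is deliberately loose (plural nouns also end in s); the classifier is expected to weigh it against the other indicators.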

MaxEntropy learning module:

We used the classify package in NLTK (nltk.classify). It provides a classifier based on the maximum entropy modeling framework. To learn the weights of the model, we used the Improved Iterative Scaling (IIS) algorithm.
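The sketch below shows roughly how such a training step can look with NLTK. `MaxentClassifier.train` with `algorithm="iis"` and the `max_iter` cutoff, and `nltk.classify.accuracy`, are NLTK calls; everything else (the helper names, the single `ends_with_ly` indicator, and the binary "is this the target tag" labeling) is an assumption made for illustration, since the report does not show how its feature sets and labels were built.

```python
def to_labeled_featuresets(tagged_words, extract_features, target_prefix):
    """Turn (word, tag) pairs into (feature dict, label) pairs for one tag.

    The label records whether the gold tag starts with `target_prefix`
    (an assumed, binary way of training "each tag alone").
    """
    return [(extract_features(word), tag.startswith(target_prefix))
            for word, tag in tagged_words]

def train_and_score(train_set, test_set, iterations=100):
    """Train a MaxEnt classifier with IIS and return (classifier, accuracy)."""
    # Imported here so the helper above stays usable without NLTK installed.
    from nltk.classify import MaxentClassifier, accuracy
    classifier = MaxentClassifier.train(
        train_set, algorithm="iis", trace=0, max_iter=iterations)
    return classifier, accuracy(classifier, test_set)

def ends_with_ly(word):
    """Hypothetical one-indicator feature extractor."""
    return {"ends_with_ly": word.lower().endswith("ly")}

sample = [("quickly", "RB"), ("dog", "NN"), ("happily", "RB"), ("runs", "VBZ")]
featuresets = to_labeled_featuresets(sample, ends_with_ly, "RB")
print(featuresets[0])
```

Calling `train_and_score(train_set, test_set)` on feature sets built this way requires NLTK and NumPy to be installed.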


Results and Analysis

We trained each tag alone, and all tags together, using 100 iterations. The features were selected according to our research on noun features and our knowledge of the English language. The following sections describe the results for the implemented tags:

For the Noun Phrase tag:

The noun phrase tag has many features; in our project we implemented 15 of them. The result after 100 iterations was as follows:

Length of features set: 18438

Testing data size: 1843

Total Accuracy: 0.9696147585458491

Sample of Selected Features:

1. Nouns have determiners before them like: a, an, the, this,

that, these, those, some, many, their, one, two, three,

several.

2. Nouns may be singular or plural.

   One book      five books
   One map       several maps
   One tooth     three teeth
   One box       six boxes
   One girl      many girls
   One child     eight children


3. Nouns can own or be owned (can be possessive).

Frank’s bike is a ten-speed.

The window’s pane was frosted.

The duck’s pond was cloudy with muck.

The dog’s fur was curly and coarse.

4. Nouns can be formed from other words by using noun suffixes such as:

   -ation   imagine + ation = imagination (information, creation, suffocation, inspiration)
   -ism     capital + ism = capitalism (Mormonism, Catholicism, idealism, realism, pessimism)
   -ment    assign + ment = assignment (arrangement, encampment, enlargement, judgement)
   -ness    lonely + ness = loneliness (sadness, happiness, painlessness, graciousness)
   -ance    accept + ance = acceptance (distance, penance, repentance, romance)
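The suffix feature in item 4 could be matched with a single regular expression, as in the sketch below. The pattern name, the function name and the minimum stem length of two characters are illustrative assumptions, not details taken from the project code.

```python
import re

# Suffixes taken from the list above; requiring at least two characters
# before the suffix is an assumption to avoid matching very short words.
NOUN_SUFFIX = re.compile(r"^\w{2,}(ation|ism|ment|ness|ance)$", re.IGNORECASE)

def noun_suffix_feature(word):
    """Return the matched noun suffix, or None if the word has none of them."""
    match = NOUN_SUFFIX.match(word)
    return match.group(1).lower() if match else None

for word in ["imagination", "capitalism", "loneliness", "acceptance", "walk"]:
    print(word, noun_suffix_feature(word))
```

Returning the suffix itself rather than a plain boolean lets the classifier learn a separate weight for each suffix.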

For the Verb Phrase tag:

The verb phrase tag consists of four main sections: the DO verbs, the BE verbs, the HAVE verbs, and a fourth section that covers every other verb. The result after 100 iterations was as follows:




Total Accuracy: 0.9975845410628019


Sample of Selected Features:

1. Some verbs end with s or es, and the word before them is an adverb or a noun.
   E.g. “Mohammed plays football”, “Mona amazingly handled the situation”.

2. Some verbs in the infinitive form end or start with certain morphemes: ending with -ate, -ify, -en, -ize; starting with en-, em-, re-, over-, sub-, mis-, un-.
   E.g. criticize, modify.

For the Adjective tag:

The adjective tag has many features; in this project we implemented 10 of them.

The result for 100 iterations was as follows:



Total Accuracy: 0.9807692307692307

The features were selected according to our research on various adjective features and our knowledge of the English language [5].



Sample of Selected Features:

1. Words ending in “-able” or “-ible” with a verb base are tagged as adjectives.
   Example: adorable, agreeable.

2. Another good indicator of an adjective is a comparative or superlative form. We test this by checking whether the word ends in “-er” or “-est”.
   Example: warmer, warmest, harder, shortest, smallest.

3. Words ending in “-ful” are tagged as adjectives.
   Example: awful, beautiful, colorful.

For the Adverb tag:

Adverbs are divided into many categories [6]. The largest category is called “manner adverbs”; most of the words in this category are derived -ly adverbs (e.g. quickly, bravely, happily). Other categories include the comparative category (e.g. earlier, better, later, higher), the superlative category (e.g. highest, uppermost, nearest) and the particle category (e.g. over, on, in, about, through). The Brown Corpus uses 10 tags to cover the different adverb categories. In this project we implemented 10 features to recognize these tags. The result for 100 iterations was as follows:

Length of features: 2742


Total Accuracy: 0.9416058394160584



Sample of Selected Features:

1. A feature that represents manner adverbs by recognizing words that end with ‘-ly’.
   Example: quickly, happily.

2. A feature that represents comparative adverbs by recognizing words that end with ‘-er’.
   Example: earlier, better.

3. A feature that represents superlative adverbs by recognizing words that end with ‘-est’.
   Example: highest, uppermost.

For the Pronoun tag:

Pronouns can be divided into several categories: personal, indefinite, reflexive, reciprocal, possessive, demonstrative, interrogative and relative [7]. In this project we defined 24 features. The result for 100 iterations was as follows:

Length of features: 3400


Total Accuracy: 0.8147058823529412



Sample of Selected Features:

1. A feature that represents singular, reflexive pronouns.
   Example: itself, himself, myself, yourself and ownself.

2. A feature that represents plural, reflexive pronouns.
   Example: themselves, ourselves and yourselves.

3. A feature that represents personal, accusative pronouns.
   Example: them, it, him, me, us, you, 'em, her and we'uns.

4. A feature that represents personal, nominative, 3rd person singular pronouns.
   Example: he, she and thee.
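Because pronouns form a small closed class, features like these can also be sketched as word-list lookups rather than suffix patterns. In the sketch below the word lists are copied from the examples above, while the dictionary name, the feature names and the function name are hypothetical.

```python
# Word lists copied from the examples above; the feature names are hypothetical.
PRONOUN_LISTS = {
    "singular_reflexive": {"itself", "himself", "myself", "yourself", "ownself"},
    "plural_reflexive": {"themselves", "ourselves", "yourselves"},
    "personal_accusative": {"them", "it", "him", "me", "us", "you", "'em",
                            "her", "we'uns"},
    "personal_nominative_3sg": {"he", "she", "thee"},
}

def pronoun_features(word):
    """Return one boolean indicator per pronoun category for the given word."""
    lowered = word.lower()
    return {name: lowered in words for name, words in PRONOUN_LISTS.items()}

print(pronoun_features("Himself"))
```

A word outside every list simply yields all-False indicators, so the same function can be applied to every token in the corpus.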

For all tags

The table below summarizes the results over all tags:

Tag         Total feature set length   Testing feature set length (10%)   Accuracy
Noun        18438                      1843                               96.96%
Verb        8287                       828                                99.75%
Adjective   4686                       468                                98.07%
Adverb      2742                       274                                94.16%
Pronoun     3400                       340                                81.47%
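As a worked piece of arithmetic on the figures above, the sketch below combines the per-tag accuracies in two ways. These combined values are derived here for illustration and are not figures reported in the project. The micro average assumes that each accuracy equals the number of correct items divided by the test set size, so the number of correct items can be recovered by rounding.

```python
# (test set size, reported accuracy) per tag, copied from the results above.
results = {
    "Noun": (1843, 0.9696147585458491),
    "Verb": (828, 0.9975845410628019),
    "Adjective": (468, 0.9807692307692307),
    "Adverb": (274, 0.9416058394160584),
    "Pronoun": (340, 0.8147058823529412),
}

# Macro average: every tag counts equally.
macro = sum(acc for _, acc in results.values()) / len(results)

# Micro average: every test item counts equally
# (assumes accuracy = correct items / test set size).
total = sum(size for size, _ in results.values())
correct = sum(round(size * acc) for size, acc in results.values())
micro = correct / total

print(f"macro={macro:.4f} micro={micro:.4f}")
```

The macro average is pulled down by the pronoun tag, while the micro average is dominated by the much larger noun test set.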


References

[1] Nugues, Pierre M. "Part-of-Speech Tagging Using Statistical Techniques." Language Processing with Perl and Prolog. Springer Berlin Heidelberg, 2014. 223-251.

[2] Ratnaparkhi, Adwait. "A maximum entropy model for part-of-speech tagging." Proceedings of the conference on empirical methods in natural language processing. Vol. 1. 1996.

[3] Francis, W. Nelson, and Henry Kucera. "Brown corpus manual." Brown University Department of Linguistics (1979).

[4] The Brown corpus tag-set. Available at:
    http://www.scs.leeds.ac.uk/ccalas/tagsets/brown.html
    http://www.uefap.com/writing/feature/complex_noun.htm

[5] S. Malik. "Parsing Java Method Names for Improved Software Analysis." Spring 2011.

[6] Nancarrow, Owen, and Eric Atwell. "A comparative study of the tagging of adverbs in modern English corpora." Proceedings of Corpus Linguistics 2007 (2007).

[7] Börjars, Kersti, and Kate Burridge. "Introducing English Grammar (2nd ed.)." London: Hodder Education, 2010. pp. 50–57. ISBN 978-1444109870.

[8] Malecha, Gregory, and Ian Smith. "Maximum Entropy Part-of-Speech Tagging in NLTK." Unpublished course-related report: http://www.people.fas.harvard.edu/gmalecha (2010).
