Post on 30-May-2020


Lecture 9: Part of Speech

Kai-Wei Chang
CS @ University of Virginia

kw@kwchang.net

Course webpage: http://kwchang.net/teaching/NLP16

CS6501 Natural Language Processing

This lecture

- Parts of speech (POS)
- POS tagsets


Parts of Speech

- Traditional parts of speech
  - ~8 of them


POS examples

- N    noun         chair, bandwidth, pacing
- V    verb         study, debate, munch
- ADJ  adjective    purple, tall, ridiculous
- ADV  adverb       unfortunately, slowly
- P    preposition  of, by, to
- PRO  pronoun      I, me, mine
- DET  determiner   the, a, that, those


Parts of Speech

- A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags, ...
- Lots of debate within linguistics about the number, nature, and universality of these


POS Tagging

- The process of assigning a part of speech to each word in a collection (sentence).

WORD    tag
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
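As a minimal sketch of what a tagger's output looks like, the snippet below pairs each word of the example sentence with one tag by dictionary lookup. The lexicon `LEXICON` and the function `tag_sentence` are hypothetical, built only from the table above; a pure lookup like this cannot resolve words that take several tags, which is exactly what makes real tagging hard.

```python
# Hypothetical toy lexicon built only from the example sentence above.
# A real tagger must also handle unknown and ambiguous words.
LEXICON = {
    "the": "DET",
    "koala": "N",
    "put": "V",
    "keys": "N",
    "on": "P",
    "table": "N",
}


def tag_sentence(words, lexicon=LEXICON, unknown="UNK"):
    """Return one (word, tag) pair per input word, using dictionary lookup."""
    return [(w, lexicon.get(w.lower(), unknown)) for w in words]


sentence = "the koala put the keys on the table".split()
for word, tag in tag_sentence(sentence):
    print(f"{word}\t{tag}")
```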


Why is POS Tagging Useful?

- First step of a vast number of practical tasks
  - Parsing
    - Need to know if a word is an N or V before you can parse
  - Information extraction
    - Finding names, relations, etc.
  - Speech synthesis/recognition
    - OBject / obJECT
    - OVERflow / overFLOW
    - DIScount / disCOUNT
    - CONtent / conTENT
  - Machine translation
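To make the speech-synthesis point concrete: the stress pattern of words like "object" cannot be chosen from the spelling alone, but it can be once the POS is known. The sketch below assumes a hypothetical pronunciation table keyed by (word, tag); the noun/verb and noun/adjective pairings follow the usual textbook reading of these examples and are an assumption, not part of the slide.

```python
# Hypothetical mini pronunciation lexicon: stress pattern keyed by (word, POS).
# Capitals mark the stressed syllable, as on the slide.
STRESS = {
    ("object", "N"): "OBject",
    ("object", "V"): "obJECT",
    ("overflow", "N"): "OVERflow",
    ("overflow", "V"): "overFLOW",
    ("discount", "N"): "DIScount",
    ("discount", "V"): "disCOUNT",
    ("content", "N"): "CONtent",
    ("content", "ADJ"): "conTENT",
}


def pronounce(word, tag):
    """Pick the stress pattern for a word given its POS; fall back to the word."""
    return STRESS.get((word.lower(), tag), word)


print(pronounce("object", "N"))  # OBject
print(pronounce("object", "V"))  # obJECT
```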


Open and Closed Classes

- Closed class: a small fixed membership
  - Prepositions: of, in, by, …
  - Pronouns: I, you, she, mine, his, them, …
  - Usually function words (short common words which play a role in grammar)
- Open class: new ones can be created
  - English has 4: nouns, verbs, adjectives, adverbs
  - Many languages have these 4, but not all!


Open Class Words

- Nouns
  - Proper nouns (Boulder, Granby, Eli Manning)
  - Common nouns (the rest)
  - Count nouns and mass nouns
    - Count: have plurals, get counted: goat/goats, one goat, two goats
    - Mass: don’t get counted (snow, salt, communism) (*two snows)
- Verbs
  - In English, verbs have morphological affixes (eat/eats/eaten)


Closed Class Words

Examples:
- prepositions: on, under, over, …
- particles: up, down, on, off, …
- determiners: a, an, the, …
- pronouns: she, who, I, …
- conjunctions: and, but, or, …
- auxiliary verbs: can, may, should, …
- numerals: one, two, three, third, …
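Because closed classes have fixed membership, they can be stored as plain word lists, while anything not on a list is a candidate for an open class. The sketch below copies only the words shown in the examples (the "…" parts are left out, so the lists are incomplete by construction); the names `CLOSED_CLASS` and `closed_classes` are hypothetical. Note that "on" lands in two classes, a first hint that membership alone does not determine the tag.

```python
# Closed-class word lists copied from the examples above. Incomplete by design:
# the "…" continuations on the slide are not filled in.
CLOSED_CLASS = {
    "preposition": {"on", "under", "over"},
    "particle": {"up", "down", "on", "off"},
    "determiner": {"a", "an", "the"},
    "pronoun": {"she", "who", "i"},
    "conjunction": {"and", "but", "or"},
    "auxiliary": {"can", "may", "should"},
    "numeral": {"one", "two", "three", "third"},
}


def closed_classes(word):
    """Return the sorted list of closed classes that contain this word."""
    w = word.lower()
    return sorted(c for c, members in CLOSED_CLASS.items() if w in members)


def is_open_class_candidate(word):
    """A word on none of the closed lists may belong to an open class."""
    return not closed_classes(word)


print(closed_classes("on"))              # ['particle', 'preposition']
print(is_open_class_candidate("koala"))  # True
```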


Prepositions from CELEX

CELEX: online dictionary. Frequency counts are from the COBUILD 16-billion-word corpus.


English Particles


Conjunctions


Choosing a Tagset

- Could pick very coarse tagsets
  - N, V, Adj, Adv, Other
- More commonly used sets are finer grained
  - E.g., “Penn TreeBank tagset”, 45 tags: PRP$, WRB, WP$, VBG
  - Brown corpus, 87 tags
- Prague Dependency Treebank (Czech)
  - 4452 tags
  - AAFP3----3N----: (nejnezajímavějším)
    Adj Regular Feminine Plural….Superlative [Hajic 2006, VMC tutorial]
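The Czech tag is large because it is positional: each character slot encodes one morphological feature, and the thousands of tags arise from combining slot values. The sketch below splits the tag into named slots. The 15 slot names are my recollection of the PDT documentation and should be treated as an assumption; `decode_pdt_tag` is a hypothetical helper.

```python
# Slot names for 15-character PDT positional tags (assumed from the PDT docs).
PDT_POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def decode_pdt_tag(tag):
    """Split a positional tag into {slot name: value}, skipping '-' (not applicable)."""
    if len(tag) != len(PDT_POSITIONS):
        raise ValueError(f"expected {len(PDT_POSITIONS)} characters, got {len(tag)}")
    return {name: value for name, value in zip(PDT_POSITIONS, tag) if value != "-"}


print(decode_pdt_tag("AAFP3----3N----"))
# {'POS': 'A', 'SubPOS': 'A', 'Gender': 'F', 'Number': 'P', 'Case': '3', 'Grade': '3', 'Negation': 'N'}
```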


Penn TreeBank POS Tagset


Using the Penn Tagset

- The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
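The word/TAG notation above is easy to read back into (word, tag) pairs. A minimal sketch, assuming that the tag itself never contains "/", so splitting on the last slash is safe even for tokens like the final "./." (the helper name `parse_tagged` is hypothetical):

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs.

    Uses rpartition on the last '/', assuming tags never contain '/';
    this keeps tokens such as './.' or '1/2/CD' intact.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


example = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
           "of/IN other/JJ topics/NNS ./.")
print(parse_tagged(example)[:3])  # [('The', 'DT'), ('grand', 'JJ'), ('jury', 'NN')]
```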

Universal Tag set

- ~12 different tags
  - NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X
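A coarse tagset like this is usually obtained by collapsing a finer one. The sketch below maps Penn Treebank tags onto the 12 universal tags. The table is partial and written from memory of the published Penn-to-universal mapping, so treat individual entries as assumptions; unmapped tags fall back to X.

```python
# Partial Penn Treebank -> universal tag mapping (from memory; verify before use).
PTB_TO_UNIVERSAL = {
    **dict.fromkeys(["NN", "NNS", "NNP", "NNPS"], "NOUN"),
    **dict.fromkeys(["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"], "VERB"),
    **dict.fromkeys(["JJ", "JJR", "JJS"], "ADJ"),
    **dict.fromkeys(["RB", "RBR", "RBS", "WRB"], "ADV"),
    **dict.fromkeys(["PRP", "PRP$", "WP", "WP$"], "PRON"),
    **dict.fromkeys(["DT", "PDT", "WDT", "EX"], "DET"),
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    **dict.fromkeys(["RP", "TO", "POS"], "PRT"),
    # punctuation
    **dict.fromkeys([".", ",", ":", "``", "''", "-LRB-", "-RRB-"], "."),
}


def to_universal(ptb_tag):
    """Collapse a Penn tag to a universal tag; FW, SYM, UH, etc. become X."""
    return PTB_TO_UNIVERSAL.get(ptb_tag, "X")


penn = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN", "IN", "JJ", "NNS", "."]
print([to_universal(t) for t in penn])
# ['DET', 'ADJ', 'NOUN', 'VERB', 'ADP', 'DET', 'NOUN', 'ADP', 'ADJ', 'NOUN', '.']
```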


POS Tagging vs. Word Clustering

- Words often have more than one POS: back
  - The back door = JJ
  - On my back = NN
  - Win the voters back = RB
  - Promised to back the bill = VB

These examples from Dekang Lin
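One way to see this ambiguity in data is to record, for each word type, every tag it was seen with in a tagged corpus. The sketch below does that for the four "back" sentences; the tags for the words other than "back" are my own Penn-style guesses, not part of the slide, and the helper names are hypothetical.

```python
from collections import defaultdict

# The four example sentences; tags on words other than "back" are assumed.
SENTENCES = [
    [("The", "DT"), ("back", "JJ"), ("door", "NN")],
    [("On", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("Win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB")],
    [("Promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]


def tag_dictionary(sentences):
    """Map each lowercased word type to the set of tags it was observed with."""
    tags_of = defaultdict(set)
    for sent in sentences:
        for word, tag in sent:
            tags_of[word.lower()].add(tag)
    return tags_of


def ambiguous_words(sentences):
    """Word types observed with more than one tag."""
    return sorted(w for w, tags in tag_dictionary(sentences).items() if len(tags) > 1)


print(sorted(tag_dictionary(SENTENCES)["back"]))  # ['JJ', 'NN', 'RB', 'VB']
print(ambiguous_words(SENTENCES))                 # ['back']
```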


How Hard is POS Tagging?

POS tag sequences

- Some tag sequences are more likely to occur than others

- POS n-gram viewer: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_
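How likely one tag is to follow another can be estimated by counting tag bigrams in tagged text. A minimal sketch using maximum-likelihood estimates, P(next | prev) = C(prev, next) / C(prev, ·), with hypothetical sentence-boundary markers `<s>` and `</s>`; the one-sentence corpus reuses the koala example and is far too small for real estimates.

```python
from collections import Counter


def tag_bigrams(tagged_sentences, start="<s>", end="</s>"):
    """Count adjacent tag pairs, padding each sentence with boundary markers."""
    counts = Counter()
    for sent in tagged_sentences:
        tags = [start] + [tag for _, tag in sent] + [end]
        counts.update(zip(tags, tags[1:]))
    return counts


def transition_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, nxt)] / total if total else 0.0


corpus = [[("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
           ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")]]
counts = tag_bigrams(corpus)
print(counts[("DET", "N")])                # 3
print(transition_prob(counts, "DET", "N"))  # 1.0
```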


Existing methods often model POS tagging as a sequence tagging problem
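A classic instance of such a sequence model is a first-order hidden Markov model decoded with the Viterbi algorithm, which picks the tag sequence that maximizes the product of transition and emission probabilities. The sketch below is one such model, not necessarily the one used in this course; all probabilities are invented for illustration, and the toy example shows context resolving "back" as a noun after a determiner.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most likely tag sequence under a first-order HMM (log-space Viterbi).

    start_p[t], trans_p[prev][t] and emit_p[t][word] are probabilities;
    missing entries are treated as 0.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: log-prob of the best tag path for words[:i+1] ending in tag t
    best = [{t: log(start_p.get(t, 0)) + log(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log(emit_p[t].get(words[i], 0))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy parameters, invented for illustration only.
TAGS = ["DET", "N", "V"]
START = {"DET": 0.8, "N": 0.1, "V": 0.1}
TRANS = {"DET": {"N": 0.9, "V": 0.1},
         "N": {"V": 0.6, "N": 0.3, "DET": 0.1},
         "V": {"DET": 0.6, "N": 0.3, "V": 0.1}}
EMIT = {"DET": {"the": 1.0},
        "N": {"back": 0.4, "door": 0.6},
        "V": {"back": 0.5, "opens": 0.5}}

print(viterbi(["the", "back", "opens"], TAGS, START, TRANS, EMIT))  # ['DET', 'N', 'V']
```

Working in log space avoids numerical underflow on long sentences; the cost is O(n·T²) for n words and T tags.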

Evaluation

- How many words in the unseen test data can be tagged correctly?

- Usually evaluated on the Penn Treebank
  - State of the art: ~97%
  - Trivial baseline (most likely tag): ~94%
  - Human performance: ~97%
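Per-token accuracy and the "most likely tag" baseline are both short to write down. In this sketch the baseline gives each word the tag it carried most often in training, and unseen words get a default tag (NN here, an arbitrary choice). The tiny training and test sets are invented, so the resulting number says nothing about the Penn Treebank figures.

```python
from collections import Counter, defaultdict


def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def train_most_frequent_tag(pairs):
    """For each word, remember the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def baseline_tag(words, model, default="NN"):
    """Tag each word with its most frequent training tag, or the default."""
    return [model.get(w, default) for w in words]


# Invented toy data.
train = [("the", "DT"), ("back", "NN"), ("back", "NN"), ("back", "VB"),
         ("door", "NN"), ("to", "TO")]
model = train_most_frequent_tag(train)
words = ["to", "back", "the", "bill"]
gold = ["TO", "VB", "DT", "NN"]
pred = baseline_tag(words, model)
print(pred)                  # ['TO', 'NN', 'DT', 'NN']
print(accuracy(gold, pred))  # 0.75
```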
