Introduction to Syntax, with Part-of-Speech Tagging

Owen Rambow

September 17 & 19

Admin Stuff

• These slides available at

o http://www.cs.columbia.edu/~rambow/teaching.html

• For Eliza in homework, you can use a tagger or chunker, if you want – details at:

o http://www.cs.columbia.edu/~ani/cs4705.html

• Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721

Statistical POS Tagging

• Want to choose most likely string of tags (T), given the string of words (W)

• W = w1, w2, …, wn

• T = t1, t2, …, tn

• I.e., want argmax_T p(T | W)

• Problem: sparse data


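$$
\begin{aligned}
p(W \mid T) &= p(w_1, w_2, \ldots, w_n \mid t_1, t_2, \ldots, t_n) \\
&= \prod_i p(w_i \mid w_1, \ldots, w_{i-1}, t_1, t_2, \ldots, t_n) \\
&\approx \prod_i p(w_i \mid t_i)
\end{aligned}
$$

$$
\begin{aligned}
\operatorname{argmax}_T\, p(T \mid W) &= \operatorname{argmax}_T\, p(W \mid T)\, p(T) \\
&\approx \operatorname{argmax}_T \prod_i p(w_i \mid t_i)\, p(t_i \mid t_{i-2}, t_{i-1})
\end{aligned}
$$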

• Relatively easy to get data for parameter estimation (next slide)

• But: need smoothing for unseen words

• Easy to determine the argmax (Viterbi algorithm in time linear in sentence length)
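The argmax search can be sketched as follows. This is a minimal illustration, not the tagger from the homework: to stay short it uses a bigram tag model p(ti | ti-1) instead of the trigram model on the slide (a trigram version would use pairs of tags as states), and the tag set and all probabilities in the toy model are made-up assumptions.

```python
import math

def viterbi(words, tags, emit, trans, start):
    """Most likely tag sequence under a bigram HMM (simplified from the trigram model).

    emit[t][w]  : p(w | t)
    trans[p][t] : p(t | p)
    start[t]    : p(t | start of sentence)
    """
    def lp(x):
        # Log-probability; a zero probability maps to -inf.
        return math.log(x) if x > 0 else float("-inf")

    # best[i][t] = best log-score of any tag sequence for words[:i+1] ending in t
    best = [{t: lp(start.get(t, 0)) + lp(emit[t].get(words[0], 0)) for t in tags}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Best previous tag for reaching t at this position.
            prev = max(tags, key=lambda p: best[-1][p] + lp(trans[p].get(t, 0)))
            scores[t] = best[-1][prev] + lp(trans[prev].get(t, 0)) + lp(emit[t].get(w, 0))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy model: all numbers below are invented for illustration.
tags = ["Det", "N", "V"]
start = {"Det": 0.8, "N": 0.1, "V": 0.1}
trans = {
    "Det": {"Det": 0.05, "N": 0.9, "V": 0.05},
    "N": {"Det": 0.1, "N": 0.3, "V": 0.6},
    "V": {"Det": 0.6, "N": 0.3, "V": 0.1},
}
emit = {
    "Det": {"the": 0.6, "a": 0.4},
    "N": {"boy": 0.4, "girl": 0.4, "likes": 0.2},
    "V": {"likes": 1.0},
}

words = "the boy likes a girl".split()
tagged = viterbi(words, tags, emit, trans, start)
print(list(zip(words, tagged)))
```

The outer loop visits each word once and does a fixed amount of work per pair of tags, which is where the linear dependence on sentence length comes from.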

Probability Estimation for trigram POS Tagging

Maximum-Likelihood Estimation

• $p'(w_i \mid t_i) = c(w_i, t_i) / c(t_i)$

• $p'(t_i \mid t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})$
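The two count ratios above can be sketched in a few lines. This is an illustration under assumptions: the two-sentence corpus is invented, `<s>` is a hypothetical padding tag for the start of a sentence, and the context count is taken over tag pairs that are followed by another tag so that the estimates sum to one.

```python
from collections import Counter

def estimate(tagged_sentences):
    """Maximum-likelihood estimates for a trigram tagger from (word, tag) sentences."""
    emit_count = Counter()     # c(w_i, t_i)
    tag_count = Counter()      # c(t_i)
    trigram_count = Counter()  # c(t_{i-2}, t_{i-1}, t_i)
    context_count = Counter()  # c(t_{i-2}, t_{i-1}), counted as a trigram history

    for sentence in tagged_sentences:
        # "<s>" is a hypothetical padding tag marking the sentence start.
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for word, tag in sentence:
            emit_count[word, tag] += 1
            tag_count[tag] += 1
        for i in range(2, len(tags)):
            trigram_count[tags[i - 2], tags[i - 1], tags[i]] += 1
            context_count[tags[i - 2], tags[i - 1]] += 1

    def p_emit(word, tag):
        # p'(w_i | t_i) = c(w_i, t_i) / c(t_i)
        return emit_count[word, tag] / tag_count[tag] if tag_count[tag] else 0.0

    def p_trans(tag, prev2, prev1):
        # p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
        context = context_count[prev2, prev1]
        return trigram_count[prev2, prev1, tag] / context if context else 0.0

    return p_emit, p_trans

# Invented toy corpus.
corpus = [
    [("the", "Det"), ("boy", "N"), ("likes", "V"), ("a", "Det"), ("girl", "N")],
    [("a", "Det"), ("boy", "N"), ("sleeps", "V")],
]
p_emit, p_trans = estimate(corpus)
print(p_emit("boy", "N"), p_trans("V", "Det", "N"), p_emit("dog", "N"))
```

The unseen word `dog` gets probability zero under these raw estimates, which is the reason smoothing is needed.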

Statistical POS Tagging

• Method common to many tasks in speech & NLP

• “Noisy Channel Model”, Hidden Markov Model

Back to Syntax

• (((the/Det) boy/N) likes/V ((a/Det) girl/N))

[Figure: phrase-structure tree for "the boy likes a girl": S dominates NP, likes, and NP; each NP dominates a DetP (the, a) and a noun (boy, girl)]

• Nonterminal symbols = constituents

• Terminal symbols = words

Phrase Structure and Dependency Structure

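[Figure: dependency tree for "the boy likes a girl" (likes/V is the root with dependents boy/N and girl/N; the/Det depends on boy/N and a/Det on girl/N), shown next to the phrase-structure tree for the same sentence (S, NP, DetP)]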

Types of Dependency

[Figure: dependency tree with labeled arcs. Nodes: likes/V, boy/N, girl/N, the/Det, a/Det, small/Adj, very/Adv, sometimes/Adv. Arc labels: Subj, Obj, Adj(unct), Fw]

Grammatical Relations

• Types of relations between words

o Arguments: subject, object, indirect object, prepositional object

o Adjuncts: temporal, locative, causal, manner, …

o Function Words

Subcategorization

• List of arguments of a word (typically, a verb), with features about realization (POS, perhaps case, verb form, etc.)

• In canonical order Subject-Object-IndObj

• Example:

o like: N-N, N-V(to-inf)

o see: N, N-N, N-N-V(inf)

• Note: J&M talk about subcategorization only within VP
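One way to picture a subcategorization lexicon is as a mapping from each verb to its list of frames. The sketch below transcribes the two example entries from this slide; the names `SUBCAT` and `licenses` are hypothetical, and the frames are kept as plain POS labels without the further features (case, etc.) a real lexicon would carry.

```python
# Frames in canonical order Subject-Object-IndObj, taken from the slide's examples.
SUBCAT = {
    "like": [("N", "N"), ("N", "V(to-inf)")],
    "see": [("N",), ("N", "N"), ("N", "N", "V(inf)")],
}

def licenses(verb, arguments):
    """True if the verb has a frame that matches the argument realizations exactly."""
    return tuple(arguments) in SUBCAT.get(verb, [])

print(licenses("like", ["N", "N"]))            # subject and object
print(licenses("like", ["N"]))                 # subject only
print(licenses("see", ["N", "N", "V(inf)"]))   # subject, object, bare infinitive
```

Note that the subject is part of each frame here, in line with the slide's definition rather than the VP-internal view in J&M.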

Where is the VP?

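[Figure: two phrase-structure trees for "the boy likes a girl": one in which S directly dominates NP, likes, and NP, and one in which likes and the object NP are grouped under a VP node below S]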

Where is the VP?

• Existence of VP is a linguistic (empirical) claim, not a methodological claim

• Semantic evidence???

• Syntactic evidence

o VP-fronting (and quickly clean the carpet he did!)

o VP-ellipsis (He cleaned the carpets quickly, and so did she)

o Can have adjuncts before and after VP, but not in VP (He often eats beans, *he eats often beans)

• Note: in all right-branching structures, issue is different again

Penn Treebank, Again

• Syntactically annotated corpus (phrase structure)

• PTB is not naturally occurring data!

• Represents a particular linguistic theory (but a fairly “vanilla” one)

• Particularities

o Very indirect representation of grammatical relations (need for head percolation tables)

o Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat )

o Has flat Ss, flat VPs

Context-Free Grammars

• Defined in formal language theory (comp sci)

• Terminals, nonterminals, start symbol, rules

• String-rewriting system

• Start with start symbol, rewrite using rules, done when only terminals left
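The rewriting process can be sketched as follows. The grammar is a toy one invented for this illustration (it is not a grammar from the course), and `derive` is a hypothetical helper that always rewrites the leftmost nonterminal, using a caller-supplied list of rule choices.

```python
# Toy grammar: the dictionary keys are the nonterminals, everything else is a terminal.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["boy"], ["girl"]],
    "V": [["likes"]],
}

def derive(choices, start="S"):
    """Leftmost derivation: rewrite until only terminals are left.

    `choices` gives, in order, the index of the rule to apply at each step.
    Returns every intermediate string, from the start symbol to the final one.
    """
    form = [start]
    steps = [list(form)]
    choices = iter(choices)
    while any(sym in RULES for sym in form):
        # Position of the leftmost nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        rhs = RULES[form[i]][next(choices)]
        form = form[:i] + rhs + form[i + 1:]
        steps.append(list(form))
    return steps

steps = derive([0, 0, 0, 0, 0, 0, 0, 1, 1])
for form in steps:
    print(" ".join(form))
```

Each printed line is one rewriting step; the last line contains only terminals.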

Derivations of CFGs

• String rewriting system: we derive a string (=derived structure)

• But derivation history represented by phrase-structure tree (=derivation structure)!

Grammar Equivalence and Normal Form

• Can have different grammars that generate same set of strings (weak equivalence)

• Can have different grammars that have same set of derivation trees (strong equivalence)

Nobody Uses CFGs Only (Except Intro NLP Courses)

o All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another

o All successful parsers currently use statistics about phrase structure and about dependency

Massive Ambiguity of Syntax

• For a standard sentence, and a grammar with wide coverage, there are 1000s of derivations!

• Example:

o The large head master told the man that he gave money and shares in a letter on Wednesday
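How quickly the number of derivations grows can be illustrated with a small chart-based counter. The grammar below is a hypothetical two-rule fragment (NP → NP PP, PP → P NP) chosen only to show prepositional-phrase attachment ambiguity; it is far smaller than a wide-coverage grammar, and the word list is an assumption made for the example.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [("NP", "NP", "PP"), ("PP", "P", "NP")]
LEXICON = {
    "money": "NP", "shares": "NP", "letter": "NP", "Wednesday": "NP",
    "in": "P", "on": "P", "with": "P",
}

def count_parses(words, root="NP"):
    """Count the distinct derivation trees for `words` with a CKY-style chart."""
    n = len(words)
    # chart[i][j][X] = number of trees rooted in X that span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][root]

for phrase in [
    "money in letter",
    "money in letter on Wednesday",
    "money in shares in letter on Wednesday",
    "money in shares in letter on Wednesday with money",
]:
    print(phrase, "->", count_parses(phrase.split()))
```

Even with only two rules, each added prepositional phrase multiplies the number of analyses; a wide-coverage grammar adds many more sources of ambiguity on top of attachment.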

Some Syntactic Constructions: Wh-Movement

Control

Raising
