Introduction to Syntax, with Part-of-Speech Tagging
Owen Rambow
September 17 & 19
Admin Stuff
• These slides available at
  o http://www.cs.columbia.edu/~rambow/teaching.html
• For Eliza in homework, you can use a tagger or chunker, if you want – details at:
  o http://www.cs.columbia.edu/~ani/cs4705.html
• Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721
Statistical POS Tagging
• Want to choose most likely string of tags (T), given the string of words (W)
• $W = w_1, w_2, \ldots, w_n$
• $T = t_1, t_2, \ldots, t_n$
• I.e., want $\arg\max_T p(T \mid W)$
• Problem: sparse data
Statistical POS Tagging (ctd)
• $p(T \mid W) = p(T, W) / p(W) = p(W \mid T)\, p(T) / p(W)$
• $\arg\max_T p(T \mid W)$
  $= \arg\max_T p(W \mid T)\, p(T) / p(W)$
  $= \arg\max_T p(W \mid T)\, p(T)$ (since $p(W)$ does not depend on $T$)
Statistical POS Tagging (ctd)
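$p(T) = p(t_1, t_2, \ldots, t_{n-1}, t_n)$
$= p(t_n \mid t_1, \ldots, t_{n-1})\, p(t_1, \ldots, t_{n-1})$
$= p(t_n \mid t_1, \ldots, t_{n-1})\, p(t_{n-1} \mid t_1, \ldots, t_{n-2})\, p(t_1, \ldots, t_{n-2})$
$= \prod_i p(t_i \mid t_1, \ldots, t_{i-1})$
$\approx \prod_i p(t_i \mid t_{i-2}, t_{i-1})$ trigram (n-gram)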
Statistical POS Tagging (ctd)
$p(W \mid T) = p(w_1, w_2, \ldots, w_n \mid t_1, t_2, \ldots, t_n)$
$= \prod_i p(w_i \mid w_1, \ldots, w_{i-1}, t_1, t_2, \ldots, t_n)$
$\approx \prod_i p(w_i \mid t_i)$
Statistical POS Tagging (ctd)
$\arg\max_T p(T \mid W) = \arg\max_T p(W \mid T)\, p(T)$
$\approx \arg\max_T \prod_i p(w_i \mid t_i)\, p(t_i \mid t_{i-2}, t_{i-1})$
• Relatively easy to get data for parameter estimation (next slide)
• But: need smoothing for unseen words
• Easy to determine the argmax (Viterbi algorithm in time linear in sentence length)
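The argmax over tag sequences can be sketched in code. The following is a minimal trigram Viterbi in Python, not the course's reference implementation. The padding tag `START`, the tables `EMIT` and `TRANS`, and all probability values are made up for illustration; missing table entries are treated as probability 0, so there is no smoothing here.

```python
import math

START = "<s>"  # hypothetical padding tag for the two positions before the sentence

def viterbi_trigram(words, tags, emit, trans):
    """Return the tag list maximizing prod_i p(w_i|t_i) p(t_i|t_{i-2},t_{i-1}).

    emit[(word, tag)] and trans[(t_prev2, t_prev1, tag)] hold probabilities.
    Returns None if no tag sequence has nonzero probability.
    """
    def logp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    # best[(u, v)] = best log score of any prefix whose last two tags are u, v
    best = {(START, START): 0.0}
    backpointers = []
    for w in words:
        new_best, bp = {}, {}
        for (u, v), score in best.items():
            for t in tags:
                s = score + logp(trans, (u, v, t)) + logp(emit, (w, t))
                if s > new_best.get((v, t), float("-inf")):
                    new_best[(v, t)] = s
                    bp[(v, t)] = u  # remember the tag two positions back
        best = new_best
        backpointers.append(bp)
    if not best:
        return None

    # Follow backpointers from the best final state.
    state = max(best, key=best.get)
    result = []
    for bp in reversed(backpointers):
        result.append(state[1])
        state = (bp[state], state[0])
    result.reverse()
    return result

# Toy model (all numbers invented for the example).
TAGS = ["Det", "N", "V"]
EMIT = {("the", "Det"): 0.6, ("a", "Det"): 0.4,
        ("boy", "N"): 0.45, ("girl", "N"): 0.45, ("likes", "N"): 0.1,
        ("likes", "V"): 1.0}
TRANS = {(START, START, "Det"): 1.0, (START, "Det", "N"): 1.0,
         ("Det", "N", "V"): 0.8, ("Det", "N", "N"): 0.2,
         ("N", "V", "Det"): 1.0, ("N", "N", "Det"): 1.0,
         ("V", "Det", "N"): 1.0, ("N", "Det", "N"): 1.0}

print(viterbi_trigram("the boy likes a girl".split(), TAGS, EMIT, TRANS))
# → ['Det', 'N', 'V', 'Det', 'N']
```

Each word is processed once, and the work per word depends only on the size of the tag set, which is where the linear running time in sentence length comes from. A word missing from `EMIT` makes every path score zero, which is the unseen-word problem that smoothing addresses.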
Probability Estimation for trigram POS Tagging
Maximum-Likelihood Estimation
• $p'(w_i \mid t_i) = c(w_i, t_i) / c(t_i)$
• $p'(t_i \mid t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})$
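The two count ratios can be computed directly from a tagged corpus. The sketch below uses a tiny hand-made corpus (not real data) and a hypothetical padding tag `START`. One assumption to note: the denominator for the tag trigram counts a tag pair only when another tag follows it, so that the estimates for each context sum to 1.

```python
from collections import Counter

START = "<s>"  # hypothetical padding tag

# Tiny hand-made tagged corpus, for illustration only.
corpus = [
    [("the", "Det"), ("boy", "N"), ("likes", "V"), ("a", "Det"), ("girl", "N")],
    [("a", "Det"), ("girl", "N"), ("sees", "V"), ("the", "Det"), ("boy", "N")],
    [("the", "Det"), ("girl", "N"), ("likes", "V"), ("the", "Det"), ("boy", "N")],
]

word_tag = Counter()   # c(w_i, t_i)
tag_count = Counter()  # c(t_i)
trigram = Counter()    # c(t_{i-2}, t_{i-1}, t_i)
context = Counter()    # c(t_{i-2}, t_{i-1}), counted as a trigram context

for sent in corpus:
    tags = [START, START] + [t for _, t in sent]
    for w, t in sent:
        word_tag[(w, t)] += 1
        tag_count[t] += 1
    for i in range(2, len(tags)):
        trigram[tuple(tags[i - 2:i + 1])] += 1
        context[tuple(tags[i - 2:i])] += 1

def p_emit(w, t):
    """MLE of p(w | t); 0.0 for unseen tags or unseen (word, tag) pairs."""
    return word_tag[(w, t)] / tag_count[t] if tag_count[t] else 0.0

def p_trans(t, prev2, prev1):
    """MLE of p(t | prev2, prev1); 0.0 for unseen contexts."""
    c = context[(prev2, prev1)]
    return trigram[(prev2, prev1, t)] / c if c else 0.0

print(p_emit("boy", "N"))         # 3 of the 6 N tokens are "boy"
print(p_trans("V", "Det", "N"))   # Det N is followed by V all 3 times
print(p_emit("dog", "N"))         # unseen word: estimate is 0.0
```

The last line shows why unsmoothed estimates are a problem: any sentence containing an unseen word gets probability 0 under every tag sequence.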
Statistical POS Tagging
• Method common to many tasks in speech & NLP
• “Noisy Channel Model”, Hidden Markov Model
Back to Syntax
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
[Figure: phrase-structure tree for “the boy likes a girl”: [S [NP [DetP the] boy] likes [NP [DetP a] girl]]]
Phrase-structure tree
nonterminal symbols = constituents
terminal symbols = words
Phrase Structure and Dependency Structure
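[Figure: dependency tree for “the boy likes a girl” (likes/V at the root with dependents boy/N and girl/N, which in turn have dependents the/Det and a/Det), shown beside the phrase-structure tree [S [NP [DetP the] boy] likes [NP [DetP a] girl]]]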
Types of Dependency
[Figure: dependency tree rooted in likes/V with nodes boy/N, girl/N, a/Det, small/Adj, the/Det, very/Adv and sometimes/Adv; arcs are labeled Subj, Obj, Adj(unct) and Fw]
Grammatical Relations
• Types of relations between words
  o Arguments: subject, object, indirect object, prepositional object
  o Adjuncts: temporal, locative, causal, manner, …
  o Function Words
Subcategorization
• List of arguments of a word (typically, a verb), with features about realization (POS, perhaps case, verb form, etc.)
• In canonical order Subject-Object-IndObj
• Example:
  o like: N-N, N-V(to-inf)
  o see: N, N-N, N-N-V(inf)
• Note: J&M talk about subcategorization only within VP
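One way to picture a subcategorization lexicon is as a table from each word to its list of frames. The sketch below stores the two example entries from the slide; the names `SUBCAT` and `licenses` are made up for this illustration, and the category strings are copied as written rather than parsed into features.

```python
# Frames copied from the slide. Each frame lists argument categories in the
# canonical order Subject-Object-IndObj.
SUBCAT = {
    "like": [("N", "N"), ("N", "V(to-inf)")],
    "see": [("N",), ("N", "N"), ("N", "N", "V(inf)")],
}

def licenses(verb, args):
    """True if the sequence of argument categories matches a frame of the verb."""
    return tuple(args) in SUBCAT.get(verb, [])

print(licenses("see", ["N"]))                 # True: "see" has a subject-only frame
print(licenses("like", ["N"]))                # False: "like" lists no such frame
print(licenses("like", ["N", "V(to-inf)"]))   # True
```

Note that the subject is part of every frame here, in line with listing all arguments of the word and not only those inside the VP.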
Where is the VP?
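[Figure: two phrase-structure trees for “the boy likes a girl”: a flat one without a VP node, [S [NP [DetP the] boy] likes [NP [DetP a] girl]], and one with a VP node, [S [NP [DetP the] boy] [VP likes [NP [DetP a] girl]]]]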
Where is the VP?
• Existence of VP is a linguistic (empirical) claim, not a methodological claim
• Semantic evidence???
• Syntactic evidence
  o VP-fronting (and quickly clean the carpet he did!)
  o VP-ellipsis (He cleaned the carpets quickly, and so did she)
  o Can have adjuncts before and after VP, but not in VP (He often eats beans, *he eats often beans)
• Note: in all right-branching structures, issue is different again
Penn Treebank, Again
• Syntactically annotated corpus (phrase structure)
• PTB is not naturally occurring data!
• Represents a particular linguistic theory (but a fairly “vanilla” one)
• Particularities
  o Very indirect representation of grammatical relations (need for head percolation tables)
  o Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat)
  o Has flat Ss, flat VPs
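To illustrate what a head percolation table does, here is a small sketch that reads dependencies off a phrase-structure tree. The table `HEAD_TABLE` is a hypothetical toy and not the table used with the Penn Treebank, and the trees use the slide-style labels (S, NP, DetP, VP), not PTB labels. Trees are nested tuples `(label, child, ...)` and a leaf is `(POS, word)`.

```python
# Hypothetical head table: for each phrase label, child labels to try in order.
HEAD_TABLE = {"S": ["VP", "V"], "VP": ["V"], "NP": ["N"], "DetP": ["Det"]}

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def head_index(node):
    """Index of the head child, chosen via HEAD_TABLE."""
    children = node[1:]
    for wanted in HEAD_TABLE.get(node[0], []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1  # fallback: rightmost child (an arbitrary choice)

def head_word(node):
    """Percolate the head word up from the leaves."""
    if is_leaf(node):
        return node[1]
    return head_word(node[1:][head_index(node)])

def dependencies(node):
    """List of (head word, dependent word) pairs for the whole tree."""
    if is_leaf(node):
        return []
    children = node[1:]
    h = head_index(node)
    deps = []
    for i, child in enumerate(children):
        if i != h:
            deps.append((head_word(children[h]), head_word(child)))
        deps.extend(dependencies(child))
    return deps

subj = ("NP", ("DetP", ("Det", "the")), ("N", "boy"))
obj = ("NP", ("DetP", ("Det", "a")), ("N", "girl"))
flat_tree = ("S", subj, ("V", "likes"), obj)
vp_tree = ("S", subj, ("VP", ("V", "likes"), obj))

print(dependencies(vp_tree))
# → [('likes', 'boy'), ('boy', 'the'), ('likes', 'girl'), ('girl', 'a')]
```

Under this toy table the flat tree and the tree with a VP node yield the same unlabeled dependencies. Recovering labels such as Subj or Obj would need more information than the head table alone, which is one sense in which the representation of grammatical relations is indirect.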
Context-Free Grammars
• Defined in formal language theory (comp sci)
• Terminals, nonterminals, start symbol, rules
• String-rewriting system
• Start with start symbol, rewrite using rules, done when only terminals left
CFG: Example
• Rules
  o S → NP VP
  o VP → V NP
  o NP → Det N | AdjP NP
  o AdjP → Adj | Adv AdjP
  o N → boy | girl
  o V → sees | likes
  o Adj → big | small
  o Adv → very
  o Det → a | the
the very small boy likes a girl
Derivations of CFGs
• String rewriting system: we derive a string (=derived structure)
• But derivation history represented by phrase-structure tree (=derivation structure)!
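The relation between the tree and the rewriting steps can be made concrete. The sketch below takes a phrase-structure tree and replays the leftmost derivation it encodes, listing every intermediate string. The tree encoding (nested tuples, with words as plain strings) and the function name are assumptions made for this illustration.

```python
def leftmost_derivation(tree):
    """Return the sentential forms of the leftmost derivation encoded by tree.

    A tree is (label, child, ...); a terminal is a plain string.
    """
    form = [tree]  # mix of unexpanded subtrees and terminal strings

    def show():
        return " ".join(x if isinstance(x, str) else x[0] for x in form)

    steps = [show()]
    while True:
        # Find the leftmost symbol that is still a nonterminal.
        idx = next((k for k, x in enumerate(form) if not isinstance(x, str)), None)
        if idx is None:
            break  # only terminals left: derivation is complete
        node = form[idx]
        form[idx:idx + 1] = list(node[1:])  # rewrite it with its children
        steps.append(show())
    return steps

tree = ("S",
        ("NP", ("Det", "the"), ("N", "boy")),
        ("VP", ("V", "likes"), ("NP", ("Det", "a"), ("N", "girl"))))

for step in leftmost_derivation(tree):
    print(step)
```

The first line printed is the start symbol `S` and the last is the derived string `the boy likes a girl`. The tree fixes which rule is applied to each symbol, while the order of rewriting (leftmost here) is a free choice that does not change the tree.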
Grammar Equivalence and Normal Form
• Can have different grammars that generate same set of strings (weak equivalence)
• Can have different grammars that have same set of derivation trees (strong equivalence)
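A small check of weak equivalence, as a sketch: the two toy grammars below are invented for this example and are not from the slides. One is right-branching and the other left-branching, so they assign different trees, yet they generate the same strings. The enumeration assumes no rule has an empty right-hand side, so that sentential forms never shrink and can be pruned by length.

```python
def language(grammar, start, max_len):
    """All terminal strings with at most max_len symbols derivable from start."""
    seen, frontier, out = set(), [(start,)], set()
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        # Leftmost nonterminal, if any.
        idx = next((k for k, s in enumerate(form) if s in grammar), None)
        if idx is None:
            out.add(" ".join(form))
            continue
        for rhs in grammar[form[idx]]:
            frontier.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return out

G_RIGHT = {"S": [["a", "S"], ["a"]]}  # trees branch to the right
G_LEFT = {"S": [["S", "a"], ["a"]]}   # trees branch to the left

print(sorted(language(G_RIGHT, "S", 4)))
print(language(G_RIGHT, "S", 4) == language(G_LEFT, "S", 4))  # True
```

Agreement up to a length bound is only evidence of weak equivalence, not a proof. For these two grammars it is easy to see that both generate exactly the strings of one or more `a`.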
Nobody Uses CFGs Only (Except Intro NLP Courses)
o All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another
o All successful parsers currently use statistics about phrase structure and about dependency
Massive Ambiguity of Syntax
• For a standard sentence, and a grammar with wide coverage, there are 1000s of derivations!
• Example:
  o The large head master told the man that he gave money and shares in a letter on Wednesday
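To get a feel for how quickly derivations multiply, the sketch below counts parses with a CKY-style chart. The grammar is a hypothetical toy in Chomsky normal form that covers only noun phrases with attached prepositional phrases; it is far smaller than a wide-coverage grammar and is not the grammar behind the example sentence above.

```python
from collections import defaultdict

# Hypothetical toy grammar: NP -> NP PP, PP -> P NP, plus a small lexicon.
BINARY = {("NP", "PP"): ["NP"], ("P", "NP"): ["PP"]}
LEXICAL = {"shares": ["NP"], "letters": ["NP"], "Wednesdays": ["NP"],
           "banks": ["NP"], "stamps": ["NP"],
           "in": ["P"], "on": ["P"], "from": ["P"], "with": ["P"]}

def count_parses(words, start="NP"):
    """Number of distinct derivations of words from start."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, label) -> number of parses of words[i:j]
    for i, w in enumerate(words):
        for label in LEXICAL.get(w, []):
            chart[(i, i + 1, label)] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for (left, right), parents in BINARY.items():
                    c = chart.get((i, k, left), 0) * chart.get((k, j, right), 0)
                    if c:
                        for parent in parents:
                            chart[(i, j, parent)] += c
    return chart.get((0, n, start), 0)

phrase = "shares in letters on Wednesdays from banks with stamps".split()
for n_words in (3, 5, 7, 9):
    print(n_words, count_parses(phrase[:n_words]))
# → 3 1
# → 5 2
# → 7 5
# → 9 14
```

Each extra prepositional phrase can attach to any earlier noun phrase, and the counts follow the Catalan numbers. With conjunction, clause attachment and lexical ambiguity added on top, thousands of derivations for an ordinary sentence are plausible.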
Some Syntactic Constructions: Wh-Movement
Control
Raising