TRANSCRIPT
CS395T: Structured Models for NLP
Lecture 7: Parsing I
Greg Durrett (adapted from Dan Klein, UC Berkeley)
Administrivia
Project 1 due one week from today!
Recall: EM for HMMs

Maximize a lower bound on the log marginal likelihood:

log Σ_y P(x, y | θ) ≥ E_{q(y)}[log P(x, y | θ)] + Entropy[q(y)]

EM: alternating maximization
§ E-step: maximize w.r.t. q:  q_t = P(y | x, θ_{t-1})
§ M-step: maximize w.r.t. θ:  θ_t = argmax_θ E_{q_t}[log P(x, y | θ)]
Supervised learning from fractional annotation
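The bound above can be checked numerically: with q(y) set to the posterior P(y | x), the expected complete log-likelihood plus the entropy of q exactly recovers the log marginal likelihood. A minimal sketch with a made-up two-label joint distribution (the numbers are illustrative, not from the lecture):

```python
import math

# Toy joint distribution P(x, y | theta) for a fixed x and two latent labels.
joint = {"y1": 0.3, "y2": 0.1}

log_marginal = math.log(sum(joint.values()))  # log sum_y P(x, y)

# E-step: q(y) = P(y | x) makes the bound tight.
Z = sum(joint.values())
q = {y: p / Z for y, p in joint.items()}

elbo = sum(q[y] * math.log(joint[y]) for y in joint)    # E_q[log P(x, y)]
elbo += -sum(q[y] * math.log(q[y]) for y in joint)      # + Entropy[q]

assert abs(elbo - log_marginal) < 1e-12  # bound is tight at the posterior
```

For any other q, the same expression comes out strictly below log_marginal, which is exactly why the E-step picks the posterior.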
Road Map
§ Done: sequences – generative, discriminative, supervised, unsupervised
§ Now: trees (parsing) – a little more linguistics...
§ This week: constituency – lots of generative models
§ Next week: dependency (Project 2) – more discriminative models
This Lecture
§ Constituency formalism
§ (Probabilistic) Context-Free Grammars
§ CKY
§ Refining grammars
§ Next time: finish constituency + writing tips
Syntax
Parse Trees
The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market
Phrase Structure Parsing
§ Phrase structure parsing organizes syntax into constituents or brackets
§ In general, this involves nested trees
§ Linguists can, and do, argue about details
§ Lots of ambiguity
§ Dependency (next week) makes more sense for some languages
Constituency Tests
§ How do we know what nodes go in the tree?
§ Classic constituency tests:
  § Substitution by proform
  § Clefting (It was with a spoon...)
  § Answer ellipsis (What did you eat?)
  § Coordination
§ Cross-linguistic arguments, too
Conflicting Tests
§ Constituency isn't always clear
§ Phonological reduction:
  § I will go → I'll go
  § I want to go → I wanna go
  § à le centre → au centre (French)
§ Coordination
  § He went to and came from the store.

La vélocité des ondes sismiques ("the velocity of seismic waves")
Classical NLP: Parsing
§ Write symbolic or logical rules:
§ Use deduction systems to prove parses from words
  § Minimal grammar on the "Fed raises" sentence: 36 parses
  § Simple 10-rule grammar: 592 parses
  § Real-size grammar: many millions of parses
§ This scaled very badly and didn't yield broad-coverage tools
Grammar (CFG):
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon:
NN → interest
NNS → raises
VBP → interest
VBZ → raises
…
Ambiguities
Ambiguities: PP Attachment
§ I cleaned the dishes in my pajamas
§ I cleaned the dishes in the sink
Syntactic Ambiguities I
§ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition: The puppy tore up the staircase.
§ Complement structures: The tourists objected to the guide that they couldn't hear. / She knows you like the back of her hand.
§ Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.
Syntactic Ambiguities II
§ Modifier scope within NPs: impractical design requirements / plastic cup holder
§ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.
Dark Ambiguities
§ Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around)
§ Unknown words and new usages
§ Solution: we need probabilistic techniques to handle this uncertainty

This analysis corresponds to the correct parse of "This will panic buyers!"
PCFGs

Probabilistic Context-Free Grammars
§ A context-free grammar is a tuple <N, T, S, R>
  § N: the set of non-terminals
    § Phrasal categories: S, NP, VP, ADJP, etc.
    § Parts-of-speech (pre-terminals): NN, JJ, DT, VB
  § T: the set of terminals (the words)
  § S: the start symbol
    § Often written as ROOT or TOP (not S – not all "sentences" are sentences)
  § R: the set of rules
    § Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    § Examples: S → NP VP, VP → VP CC VP
    § Also called rewrites or productions
§ A PCFG adds:
  § A top-down production probability per rule, P(Y1 Y2 … Yk | X)
Treebank Grammars
§ Need a PCFG for broad-coverage parsing.
§ Can take a grammar right off the trees (doesn't work well):
§ Maximum-likelihood estimate: get P(NP → PRP | NP) by counting + normalizing
§ Better results by enriching the grammar (lexicalization, other techniques)

ROOT → S 1
S → NP VP . 1
NP → PRP 1
VP → VBD ADJP 1
…
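The maximum-likelihood estimate described above (count each rewrite, normalize by the left-hand-side count) can be sketched as follows; the tuple-based tree encoding is my assumption, not a format from the lecture:

```python
from collections import Counter

# Trees are nested tuples (label, child1, child2, ...) with string leaves.
def count_rules(tree, rule_counts, lhs_counts):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                  # preterminal -> word
    else:
        rhs = tuple(c[0] for c in children)   # X -> child labels
        for c in children:
            count_rules(c, rule_counts, lhs_counts)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

# A one-tree toy "treebank".
treebank = [
    ("S", ("NP", ("PRP", "He")),
          ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right")))),
]
rule_counts, lhs_counts = Counter(), Counter()
for t in treebank:
    count_rules(t, rule_counts, lhs_counts)

# P(rhs | lhs) = count(lhs -> rhs) / count(lhs)
pcfg = {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}
assert pcfg[("NP", ("PRP",))] == 1.0   # every rule is deterministic here
```

With a real treebank the same counts give probabilities like P(NP → PRP | NP) well below 1, split across all observed NP expansions.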
Chomsky Normal Form
§ Chomsky normal form:
  § All rules of the form X → Y Z or X → w
  § In principle, this is no limitation on the space of (P)CFGs
§ N-ary rules introduce new non-terminals
§ Unaries/empties are "promoted"
§ NOT equivalent to this:
§ In practice: binarize the grammar, keep unaries
[Figure: binarizing VP → VBD NP PP PP with intermediate dotted-rule symbols such as [VP → VBD NP •] and [VP → VBD NP PP •], contrasted with flat and nested n-ary alternatives]
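Binarization can be sketched as a simple rewrite that peels children off one at a time, introducing an intermediate symbol for the remaining suffix; the `@VP[...]` naming scheme is my choice, not the lecture's:

```python
def binarize(lhs, rhs):
    """Turn lhs -> rhs (len(rhs) >= 2) into equivalent binary rules."""
    rules = []
    cur = lhs
    remaining = list(rhs)
    while len(remaining) > 2:
        head, rest = remaining[0], remaining[1:]
        # Intermediate symbol records which children are still to be built.
        new_sym = "@%s[%s]" % (lhs, "_".join(rest))
        rules.append((cur, (head, new_sym)))
        cur, remaining = new_sym, rest
    rules.append((cur, tuple(remaining)))
    return rules

# VP -> VBD NP PP PP becomes three binary rules.
rules = binarize("VP", ("VBD", "NP", "PP", "PP"))
assert len(rules) == 3 and all(len(r[1]) == 2 for r in rules)
```

Unary rules (len(rhs) == 1) are deliberately left alone here, matching the "binarize the grammar, keep unaries" practice above.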
CKY Parsing

A Recursive Parser
§ max over k and rule being applied
§ Will this parser work?

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X,s[i])
  else
    return max over k, X->YZ of
      score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)

§ Can also organize things bottom-up
§ Not every tag/nonterminal can be built over every span!
A Bottom-Up Parser (CKY)

bestScore(s)
  for (i : [0,n-1])
    for (X : tags[s[i]])
      score[X][i][i+1] = tagScore(X,s[i])
  for (diff : [2,n])
    for (i : [0,n-diff])
      j = i + diff
      for (X->YZ : rule)
        for (k : [i+1, j-1])
          score[X][i][j] = max(score[X][i][j],
            score(X->YZ) * score[Y][i][k] * score[Z][k][j])

[Diagram: X over span (i,j) built from Y over (i,k) and Z over (k,j)]
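The bottom-up pseudocode translates fairly directly into Python; the grammar and lexicon formats here are my assumptions, and this version keeps only Viterbi scores (no backpointers for tree recovery):

```python
from collections import defaultdict

def cky(words, binary_rules, lexicon):
    """binary_rules: dict (Y, Z) -> list of (X, prob); lexicon: word -> [(tag, prob)]."""
    n = len(words)
    score = defaultdict(float)  # (X, i, j) -> best probability of X over span [i, j)
    for i, w in enumerate(words):
        for tag, p in lexicon[w]:
            score[(tag, i, i + 1)] = p
    for diff in range(2, n + 1):
        for i in range(0, n - diff + 1):
            j = i + diff
            for k in range(i + 1, j):
                for (Y, Z), parents in binary_rules.items():
                    left, right = score[(Y, i, k)], score[(Z, k, j)]
                    if left == 0.0 or right == 0.0:
                        continue  # child missing from the chart: rule can't fire
                    for X, p in parents:
                        cand = p * left * right
                        if cand > score[(X, i, j)]:
                            score[(X, i, j)] = cand
    return score

# Toy grammar: S -> NP VP, NP -> DT NN, VP -> VBD NP (all prob 1.0).
binary_rules = {("NP", "VP"): [("S", 1.0)],
                ("DT", "NN"): [("NP", 1.0)],
                ("VBD", "NP"): [("VP", 1.0)]}
lexicon = {"the": [("DT", 1.0)], "dog": [("NN", 1.0)],
           "ate": [("VBD", 1.0)], "snack": [("NN", 1.0)]}
chart = cky("the dog ate the snack".split(), binary_rules, lexicon)
assert chart[("S", 0, 5)] == 1.0
```

Indexing rules by child pair (Y, Z) is one of the layout choices mentioned later under "Efficient CKY"; indexing by left child alone is another common option.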
Unary Rules
§ Unary rules?
§ Problem: the dynamic program is self-referential!

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X,s[i])
  else
    return max of:
      max over k, X->YZ of score(X->YZ) * bestScore(Y,i,k,s) * bestScore(Z,k,j,s)
      max over X->Y of score(X->Y) * bestScore(Y,i,j,s)
Unary Closure
§ We need unaries to be non-cyclic
§ Can address by pre-calculating the unary closure
§ Rather than having zero or more unaries, always have exactly one
§ Alternate unary and binary layers
§ Reconstruct unary chains afterwards

[Figure: trees showing unary chains such as VP → SBAR → S → VP over VBD NP / DT NN subtrees, collapsed via the closure]
Alternating Layers

bestScoreU(X,i,j,s)
  if (j == i+1)
    return tagScore(X,s[i])
  else
    return max over X->Y of score(X->Y) * bestScoreB(Y,i,j)

bestScoreB(X,i,j,s)
  return max over k, X->YZ of
    score(X->YZ) * bestScoreU(Y,i,k) * bestScoreU(Z,k,j)
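The unary closure mentioned above can be precomputed by repeatedly relaxing chains of unary rules, keeping the best-scoring chain between each pair of symbols; a sketch, assuming rule probabilities below 1 so cyclic chains never improve a score:

```python
def unary_closure(symbols, unary_rules):
    """unary_rules: dict (X, Y) -> prob. Returns best prob of X =>* Y via unaries."""
    best = {(X, X): 1.0 for X in symbols}   # zero-step chain
    best.update(unary_rules)
    changed = True
    while changed:                           # relax until no chain improves
        changed = False
        for (X, Y), p in list(best.items()):
            for (Y2, Z), q in unary_rules.items():
                if Y2 == Y and best.get((X, Z), 0.0) < p * q:
                    best[(X, Z)] = p * q
                    changed = True
    return best

symbols = ["S", "SBAR", "VP"]
closure = unary_closure(symbols, {("S", "VP"): 0.1, ("SBAR", "S"): 0.5})
assert abs(closure[("SBAR", "VP")] - 0.05) < 1e-12  # SBAR -> S -> VP
```

At parse time, bestScoreU then consults this closed table once per span instead of iterating unary rules to a fixed point inside the chart; the actual unary chains are reconstructed from the closure afterwards.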
Analysis
Time: Theory
§ How much time will it take to parse?
  § For each diff (≤ n)
    § For each i (≤ n)
      § For each rule X → Y Z
        § For each split point k: do constant work
§ Total time: |rules| * n^3
§ A simple grammar takes 0.1 sec to parse a 20-word sentence; bigger grammars can take 10+ seconds unoptimized
Time: Practice
§ Parsing with the vanilla treebank grammar (~20K rules; not an optimized parser!)
§ Observed exponent: 3.6
§ Why's it worse in practice?
  § Longer sentences "unlock" more of the grammar
  § All kinds of systems issues don't scale
Same-Span Reachability
[Figure: same-span reachability among nonterminals – ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP, TOP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, PRT]
Efficient CKY
§ Lots of tricks to make CKY efficient
§ Some of them are little engineering details:
  § E.g., first choose k, then enumerate through the Y : [i,k] which are non-zero, then loop through rules by left child.
  § Optimal layout of the dynamic program depends on the grammar
§ Some are algorithmic improvements:
  § Pruning: rule out chunks of the chart based on a simpler model
Learning PCFGs

Typical Experimental Setup
§ Corpus: Penn Treebank, WSJ
  § Training: sections 02-21
  § Development: section 22 (here, first 20 files)
  § Test: section 23
§ Accuracy – F1: harmonic mean of per-node labeled precision and recall.
§ Here: also size – number of symbols in the grammar.
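Labeled bracket F1 can be sketched as follows; this treats brackets as multisets of (label, start, end) spans and ignores the special cases (punctuation, root handling) that the standard evalb scorer applies:

```python
from collections import Counter

def bracket_f1(gold, guess):
    """gold, guess: lists of (label, start, end) brackets."""
    g, p = Counter(gold), Counter(guess)
    matched = sum((g & p).values())            # multiset intersection
    prec = matched / max(sum(p.values()), 1)   # labeled precision
    rec = matched / max(sum(g.values()), 1)    # labeled recall
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
guess = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(round(bracket_f1(gold, guess), 3))  # 3 of 4 spans match -> 0.75
```

Note that the (3,5) span counts as wrong despite being the right span, because its label differs; unlabeled F1 would score it as correct.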
Treebank PCFGs
§ Use PCFGs for broad-coverage parsing
§ Can take a grammar right off the trees (doesn't work well):

ROOT → S 1
S → NP VP . 1
NP → PRP 1
VP → VBD ADJP 1
…

Model      F1
Baseline   72.0

[Charniak 96]
Conditional Independence?
§ Not every NP expansion can fill every NP slot
§ A grammar with symbols like "NP" won't be context-free
§ Statistically, conditional independence is too strong

Non-Independence
§ Independence assumptions are often too strong.
§ Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
§ Also: the subject and object expansions are correlated!

Expansion   All NPs   NPs under S   NPs under VP
NP PP       11%       9%            23%
DT NN       9%        9%            7%
PRP         6%        21%           4%
Grammar Refinement
§ Structure Annotation [Johnson '98, Klein & Manning '03]
§ Lexicalization [Collins '99, Charniak '00]
§ Latent Variables [Matsuzaki et al. '05, Petrov et al. '06]

Structural Annotation

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Structural annotation

Vertical Markovization
§ Vertical Markov order: rewrites depend on past k ancestor nodes (cf. parent annotation)
[Charts: parsing F1 (roughly 72%–79%) and grammar size (up to ~25,000 symbols) vs. vertical Markov order 1, 2v, 2, 3v, 3; order-1 and order-2 example trees]
Klein and Manning (2003)
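Order-2 vertical Markovization is essentially parent annotation, which is easy to sketch on tuple-encoded trees (label, children...) with string leaves; that tree format, and the choice to leave preterminals unsplit, are my assumptions:

```python
def parent_annotate(tree, parent=None):
    """Append the parent label to each phrasal node: NP under S -> NP^S."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # leave preterminals (tag -> word) unsplit
    new_label = "%s^%s" % (label, parent) if parent else label
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

t = ("S", ("NP", ("PRP", "He")),
          ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))))
out = parent_annotate(t)
assert out[1][0] == "NP^S" and out[2][2][0] == "ADJP^VP"
```

Reading a PCFG off the annotated trees then gives separate statistics for NP^S (subjects) and NP^VP (objects), which is exactly the non-independence the table above exposes; higher vertical orders just append more ancestors.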
Horizontal Markovization

[Charts: parsing F1 (roughly 70%–74%) and grammar size (up to ~12,000 symbols) vs. horizontal Markov order 0, 1, 2v, 2, ∞; order-1 and order-∞ example trees]

Klein and Manning (2003)
Tag Splits
§ Problem: treebank tags are too coarse.
§ Example: sentential, PP, and other prepositions are all marked IN.
§ Partial solution: subdivide the IN tag.

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K

Klein and Manning (2003)
A Fully Annotated (Unlexicalized) Tree

Some Test Set Results
§ Beats "first generation" lexicalized parsers.
§ Baseline: ~72

Parser        LP     LR     F1     CB     0 CB
Magerman 95   84.9   84.6   84.7   1.26   56.6
Collins 96    86.3   85.8   86.0   1.14   59.9
K+M 2003      86.9   85.7   86.3   1.10   60.3
Charniak 97   87.4   87.5   87.4   1.00   62.1
Collins 99    88.7   88.6   88.6   0.90   67.1

Klein and Manning (2003)
Lexicalization

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
§ Structural annotation [Johnson '98, Klein and Manning '03]
§ Head lexicalization [Collins '99, Charniak '00]
Problems with PCFGs
§ If we do no annotation, these trees differ only in one rule:
  § VP → VP PP
  § NP → NP PP
§ Parse will go one way or the other, regardless of words
§ Lexicalization allows us to be sensitive to specific words
Problems with PCFGs
§ What's different between basic PCFG scores here?
§ What (lexical) correlations need to be scored?
Lexicalized Trees
§ Add "headwords" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
  § NP:
    § Take leftmost NP
    § Take rightmost N*
    § Take rightmost JJ
    § Take right child
  § VP:
    § Take leftmost VB*
    § Take leftmost VP
    § Take left child
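The head rules listed above can be sketched directly; only the NP and VP rules from the slide are implemented, and the default fallback for other categories is my assumption:

```python
def find_head_index(label, child_labels):
    """Return the index of the head child under the slide's NP/VP rules."""
    if label == "NP":
        for i, c in enumerate(child_labels):          # leftmost NP
            if c == "NP":
                return i
        for i in reversed(range(len(child_labels))):  # rightmost N*
            if child_labels[i].startswith("N"):
                return i
        for i in reversed(range(len(child_labels))):  # rightmost JJ
            if child_labels[i] == "JJ":
                return i
        return len(child_labels) - 1                  # right child
    if label == "VP":
        for i, c in enumerate(child_labels):          # leftmost VB*
            if c.startswith("VB"):
                return i
        for i, c in enumerate(child_labels):          # leftmost VP
            if c == "VP":
                return i
        return 0                                      # left child
    return 0  # fallback for other categories (my assumption, not from the slide)

assert find_head_index("NP", ["DT", "JJ", "NN"]) == 2   # rightmost N*
assert find_head_index("VP", ["VBD", "NP", "PP"]) == 0  # leftmost VB*
```

Propagating the head word up from the head child at every node yields the lexicalized trees the slide describes.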
Lexicalized PCFGs?
§ Problem: we now have to estimate probabilities of fully lexicalized rules
§ Never going to get these atomically off of a treebank
§ Solution: break up the derivation into smaller steps
Lexical Derivation Steps
§ A derivation of a local tree [Collins 99]
  § Choose a head tag and word: P(child symbol | parent, head word)
  § Generate children from the head sequentially: P(child tag, child head | parent, head symbol, head word)
  § Finish generating the children; each new one conditions on the previous ones
Lexicalized CKY
§ How big is the state space?
  § Nonterminals = number of symbols × vocab size – way too large!
§ Can't use standard CKY

Lexicalized CKY
§ Track the index h of the head in the DP: O(n^5)
§ Better algorithms next lecture!

bestScore(X,i,j,h,s)
  if (j == i+1)
    return tagScore(X,s[i])
  else
    return max of:
      max over k, h', X->YZ of score(X[h]->Y[h] Z[h']) * bestScore(Y,i,k,h,s) * bestScore(Z,k,j,h',s)
      max over k, h', X->YZ of score(X[h]->Y[h'] Z[h]) * bestScore(Y,i,k,h',s) * bestScore(Z,k,j,h,s)

[Diagram: X[h] over span (i,j) built from Y[h] over (i,k) and Z[h'] over (k,j)]
Results
§ Some results:
  § Collins 99 – 88.6 F1 (generative lexical)
  § Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative lexical / reranked)
  § McClosky et al. 06 – 92.1 F1 (gen + rerank + self-train)
§ 92.1 was SOTA for around 8 years!
Latent Variable PCFGs

The Game of Designing a Grammar
§ Annotation refines base treebank symbols to improve statistical fit of the grammar
  § Parent annotation [Johnson '98]
  § Head lexicalization [Collins '99, Charniak '00]
  § Automatic clustering?

Latent Variable Grammars

[Figure: parse tree, sentence, parameters, and the space of derivations]
Learning Latent Annotations
EM algorithm:
§ Brackets are known
§ Base categories are known
§ Only induce subcategories

[Figure: tree over "He was right ." with latent subcategory variables X1 … X7, with Forward/Backward passes]

Just like Forward-Backward for HMMs: learn label refinements; base labels are known!
Refinement of the DT tag
DT
DT-1 DT-2 DT-3 DT-4
Hierarchical Refinement

Hierarchical Estimation Results

[Chart: parsing accuracy (F1, roughly 74–90) vs. total number of grammar symbols (100–1700)]

Model                   F1
Flat Training           87.3
Hierarchical Training   88.4
Refinement of the "," tag
§ Splitting all categories equally is wasteful:
Adaptive Splitting
§ Want to split complex categories more
§ Idea: split everything, roll back splits which were least useful
Adaptive Splitting Results

Model              F1
Previous           88.4
With 50% Merging   89.5

Number of Phrasal Subcategories
[Bar chart: subcategory counts (0–40) per phrasal category; NP, VP, and PP receive the most splits, while categories like NAC, RRC, X, ROOT, and LST receive few]

Number of Lexical Subcategories
[Bar chart: subcategory counts (0–70) per POS tag; NNP, JJ, NNS, and NN receive the most splits, while rare tags like SYM, RP, LS, and # receive few]
Learned Splits
§ Proper Nouns (NNP):

NNP-14   Oct.    Nov.        Sept.
NNP-12   John    Robert      James
NNP-2    J.      E.          L.
NNP-1    Bush    Noriega     Peters
NNP-15   New     San         Wall
NNP-3    York    Francisco   Street

§ Personal pronouns (PRP):

PRP-0    It      He          I
PRP-1    it      he          they
PRP-2    it      them        him

§ Cardinal Numbers (CD):

CD-7     one     two         Three
CD-4     1989    1990        1988
CD-11    million billion     trillion
CD-0     1       50          100
CD-3     1       30          31
CD-9     78      58          34
Hierarchical Pruning

coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  … (and so on) …
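The coarse-to-fine idea can be sketched as a pruning mask: compute span posteriors under the coarse grammar, then allow a refined symbol over a span only if its coarse projection clears a threshold. The data structures and threshold value here are my assumptions:

```python
def prune_chart(coarse_posteriors, projection, fine_symbols, threshold=1e-4):
    """coarse_posteriors: dict (coarse_sym, i, j) -> posterior probability.
    projection: fine symbol -> coarse symbol (e.g. 'NP-3' -> 'NP').
    Returns the set of (fine_sym, i, j) cells the fine parser may build."""
    allowed = set()
    for (sym, i, j), post in coarse_posteriors.items():
        if post >= threshold:
            for fine in fine_symbols:
                if projection[fine] == sym:
                    allowed.add((fine, i, j))
    return allowed

# Toy example: NP is plausible over span (0, 2), VP essentially isn't.
posteriors = {("NP", 0, 2): 0.9, ("VP", 0, 2): 1e-6}
projection = {"NP-1": "NP", "NP-2": "NP", "VP-1": "VP"}
allowed = prune_chart(posteriors, projection, ["NP-1", "NP-2", "VP-1"])
assert ("NP-1", 0, 2) in allowed and ("VP-1", 0, 2) not in allowed
```

Chaining this through each level of the split hierarchy (coarse, 2-way, 4-way, 8-way, ...) is what produces the large speedups reported on the next slide.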
Bracket Posteriors

Speedup
§ 100x speedup if you use the full coarse-to-fine hierarchy vs. just parsing with the finest grammar
§ Parse the test set in 15 minutes – 2 sentences/second
Final Results (Accuracy)

Language   Parser                                ≤ 40 words F1   all F1
ENG        Charniak & Johnson '05 (generative)   90.1            89.6
ENG        Split / Merge                         90.6            90.1
GER        Dubey '05                             76.3            -
GER        Split / Merge                         80.8            80.1
CHN        Chiang et al. '02                     80.0            76.6
CHN        Split / Merge                         86.3            83.4

Ensemble of split-merge parsers = best cross-lingual parser for ~7 years!