Parsing with Compositional Vector Grammars
Socher, Bauer, Manning, Ng (2013)
TRANSCRIPT
Problem
• How can we parse a sentence and create a dense representation of it?
– N-grams have obvious problems, the most important being sparsity
• Can we resolve syntactic ambiguity with context? “They ate udon with forks” vs. “They ate udon with chicken” (the prepositional phrase attaches to the verb as an instrument in one, to the noun as an ingredient in the other)
Standard Recursive Neural Net
[Figure: bottom-up composition of “I like green eggs” with a single matrix. Vector(I) and Vector(like) are combined through WMain into Vector(I-like), which gets a Score from a classifier; the same WMain then combines Vector(I-like) with Vector(green) into Vector((I-like)green), with Vector(eggs) still to be merged.]
Standard Recursive Neural Net
The parent is p = f(W[a; b]), where f is usually tanh or the logistic function. In other words, stack the two word vectors and multiply through a matrix W, and you get a vector of the same dimensionality as the children a or b.
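The composition step can be sketched in a few lines. This is a minimal sketch, not the paper's code: tanh is one of the nonlinearities the paper allows, while the dimensionality d = 4, the random word vectors, and the half-identity initialization (described on the later “Tricks” slide, here without the noise) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# two half-identities side by side, so initially the parent is roughly
# the average of its children
W = 0.5 * np.hstack([np.eye(d), np.eye(d)])

def compose(a, b):
    """p = tanh(W [a; b]): stack the two children, multiply through W."""
    return np.tanh(W @ np.concatenate([a, b]))

vec_i, vec_like = rng.standard_normal(d), rng.standard_normal(d)
p = compose(vec_i, vec_like)             # Vector(I-like)
q = compose(p, rng.standard_normal(d))   # the SAME W is reused one level up
```

Note that p has the same dimensionality as each child, which is what lets the same function apply recursively all the way up the tree.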
Syntactically Untied RNN
[Figure: the same sentence with per-category matrices. The lower level is first parsed with a PCFG into the categories N V Adj N; Vector(green) and Vector(eggs) are combined through WAdj,N into Vector(green-eggs), Vector(I) and Vector(like) through WN,V into Vector(I-like), and a classifier assigns each merge a Score.]
Syntactically Untied RNN
The weight matrix used at each node is determined by the PCFG categories of the children a and b. (You have one matrix per category combination.)
Examples: Composition Matrices
• Notice that he initializes them with two identity matrices side by side (in the absence of other information, the parent should roughly average its children)
Learning the Weights
• Errors are backpropagated through structure (Goller and Kuchler, 1996)
• Weight derivatives are additive across branches! (Not obvious; a good proof/explanation is in Socher, 2014)
• At each node the backpropagated error δ is multiplied elementwise by f′(x); for the logistic function this is f′(x) = f(x)(1 − f(x))
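The additivity claim can be checked numerically. Below is an illustrative two-node tree with tanh (not the paper's exact setup): the same W is used at both nodes, the analytic gradient is formed by summing the per-node outer-product contributions, and one entry is checked against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
W = 0.1 * rng.standard_normal((d, 2 * d))
a, b, c = rng.standard_normal((3, d))

def forward(W):
    p1 = np.tanh(W @ np.concatenate([a, b]))   # lower merge
    p2 = np.tanh(W @ np.concatenate([p1, c]))  # upper merge reuses W
    return p1, p2, p2.sum()                    # scalar toy "loss"

p1, p2, loss = forward(W)

# delta at the top node, then pushed down through W's left half
delta2 = 1.0 - p2**2                             # tanh'(z2) * dloss/dp2
delta1 = (1.0 - p1**2) * (W[:, :d].T @ delta2)   # error reaching the lower node

grad = np.outer(delta2, np.concatenate([p1, c])) \
     + np.outer(delta1, np.concatenate([a, b]))  # contributions simply ADD

# finite-difference check on one entry of W
eps, (i, j) = 1e-6, (0, 1)
Wp = W.copy(); Wp[i, j] += eps
num = (forward(Wp)[2] - loss) / eps
assert abs(num - grad[i, j]) < 1e-4
```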
Tricks
• Our good friend, AdaGrad (diagonal variant): scale each parameter’s learning rate by the inverse square root of its accumulated squared gradients (elementwise)
• Initialize matrices with identity + small random noise
• Use Collobert and Weston (2008) word embeddings to start
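A sketch of the diagonal AdaGrad update on a toy quadratic loss; the values of alpha and eps here are illustrative, not the paper's.

```python
import numpy as np

alpha, eps = 0.1, 1e-8

def adagrad_step(theta, grad, hist):
    """Diagonal AdaGrad: per-parameter rates shrink as squared grads accumulate."""
    hist += grad**2
    theta -= alpha * grad / (np.sqrt(hist) + eps)
    return theta, hist

target = np.array([1.0, 2.0, 3.0])
theta, hist = np.zeros(3), np.zeros(3)
for _ in range(10):
    grad = 2 * (theta - target)          # gradient of ||theta - target||^2
    theta, hist = adagrad_step(theta, grad, hist)
```

The elementwise history is what makes the variant "diagonal": no interaction between parameters, so it stays cheap even with one matrix per category pair.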
Learning the Tree
• We want the score of the correct parse tree to be higher than the score of every incorrect tree by a margin
(Correct Parse Trees are Given in the Training Set)
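Written out, the standard structured max-margin objective this describes (my reconstruction; Δ(y, yᵢ) is a margin that grows with how wrong the candidate tree is) is:

```latex
% the score of the correct tree y_i must beat every candidate y by the margin
s(x_i, y_i) \ge s(x_i, y) + \Delta(y, y_i) \qquad \forall\, y \in Y(x_i)
% turned into a minimization objective over the training set
J = \sum_i \left[ \max_{y \in Y(x_i)} \bigl( s(x_i, y) + \Delta(y, y_i) \bigr) - s(x_i, y_i) \right]
```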
Finding the Best Tree (inference)
• Want to find the parse tree with the max score (which is the sum of the scores of all subtrees)
• Too expensive to try every combination
• Trick: use a non-RNN method (the base PCFG with the CKY algorithm) to select the best 200 trees, then beam-search these trees with the RNN
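The second stage boils down to scoring each surviving candidate tree (a sum of per-node scores) and keeping the best, a simplification of the beam search above. A toy sketch: trees are nested tuples, node_score is a stand-in for the RNN's scoring layer, and the two hand-written candidates are the PP-attachment readings from the Problem slide.

```python
def node_score(left, right):
    # stand-in for the RNN's scalar score of merging two subtrees;
    # a real system would compute this from the composed vectors
    return 1.0 if isinstance(left, str) else 0.5

def tree_score(tree):
    """Total score of a tree = sum of node scores over all internal nodes."""
    if isinstance(tree, str):          # leaf: a word
        return 0.0
    left, right = tree
    return node_score(left, right) + tree_score(left) + tree_score(right)

candidates = [                         # pretend these came from the CKY top-k
    (("they", "ate"), ("udon", ("with", "forks"))),
    ((("they", "ate"), "udon"), ("with", "forks")),
]
best = max(candidates, key=tree_score)
```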
Model Comparisons (WSJ Dataset)
[Table: F1 for parse labels on the WSJ dataset, comparing Socher’s model against the baselines]
Analysis of Errors
Conclusions:
• Not the best model, but fast
• No hand-engineered features
• Huge number of parameters
• Notice that Socher can’t make the standard RNN perform better than the PCFG; there is a pattern here. Most of the papers from this group involve very creative modifications to the standard RNN (SU-RNN, RNTN, RNN + max pooling).
• The model in this paper has (probably) been eclipsed by the Recursive Neural Tensor Network (RNTN); subsequent work showed the RNTN performed better (in different situations) than the SU-RNN.