How much do word embeddings encode about syntax?
Jacob Andreas and Dan Klein
UC Berkeley
Everybody loves word embeddings
(Figure: embedding space; visible labels include few, most, that, the, a, each, this, every.)
[Collobert 2011, Mikolov 2013, Freitag 2004, Schuetze 1995, Turian 2010]
What might embeddings bring?
Cathleen complained about the magazine’s shoddy editorial quality.
(Figure labels: Mary, executive, average.)
Three hypotheses
• Vocabulary expansion (good for OOV words): Cathleen, Mary
• Statistic pooling (good for medium-frequency words): average, editorial, executive
• Embedding structure (good for features): transitivity, tense
Vocabulary expansion hypothesis:
Embeddings help with handling out-of-vocabulary words
(Figure labels: Cathleen, Mary)
Vocabulary expansion
(Figure: embedding space with the names John, Mary, Pierre, the adjectives yellow, enormous, hungry, and the unseen word Cathleen.)
Cathleen complained about the magazine’s shoddy editorial quality.
(Figure: Cathleen paired with its known neighbor Mary.)
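The vocabulary expansion idea can be sketched in a few lines of Python. An unseen word is mapped to the most similar known word, so the parser can reuse that word's statistics. This is a minimal sketch, not the authors' implementation. The function names, the toy two-dimensional vectors, and the use of cosine similarity as the closeness measure are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def expand_vocabulary(word, known_words, embeddings):
    """Map a word to a known word: itself if known, else its nearest known neighbor.

    Returns None when the word has no embedding (or no known word does),
    in which case the parser would fall back to its ordinary OOV model.
    """
    if word in known_words:
        return word
    if word not in embeddings:
        return None
    candidates = [w for w in known_words if w in embeddings]
    return max(candidates,
               key=lambda w: cosine(embeddings[word], embeddings[w]),
               default=None)

# Hypothetical toy embeddings: names in one region, adjectives in another.
embeddings = {
    "John": [0.9, 0.1], "Mary": [0.8, 0.3], "Pierre": [0.7, 0.0],
    "yellow": [0.1, 0.9], "enormous": [0.0, 1.0], "hungry": [0.2, 0.8],
    "Cathleen": [0.75, 0.35],  # has an embedding but never appears in the treebank
}
known_words = {"John", "Mary", "Pierre", "yellow", "enormous", "hungry"}

print(expand_vocabulary("Cathleen", known_words, embeddings))  # Mary
```

The nearest neighbor is chosen no matter how far away it is. A poor embedding can therefore map a word to an unhelpful stand-in, which is one possible way the “help” from embeddings ends up worse than nothing.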
Vocab. expansion results
Baseline: 91.13
+OOV: 91.22
Vocab. expansion results (300 sentences)
Baseline: 71.88
+OOV: 72.20
Statistic pooling hypothesis:
Embeddings help with handling medium-frequency words
(Figure labels: average, editorial, executive)
Statistic pooling
(Figure: the neighboring words executive, kind, giant, editorial and average, each labeled with a tag set such as {NN, JJ}, {NN} or {JJ}. After pooling, every word carries {NN, JJ}. A final build highlights editorial with the tag NN.)
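Statistic pooling can be sketched in the same way. A word's tag counts are interpolated with those of its embedding-space neighbors, so a word seen only as NN can pick up JJ mass from neighbors such as average and executive. This is a hypothetical sketch, not the method used in the talk. The neighbor lists are given directly here, although they would presumably come from embedding similarity. The counts and the interpolation weight are invented for illustration.

```python
from collections import Counter

def pool_tag_counts(word, tag_counts, neighbors, weight=0.5):
    """Smooth a word's tag counts with down-weighted counts from its neighbors.

    tag_counts: word -> {tag: count} observed in training data
    neighbors:  word -> list of nearby words in embedding space
    weight:     interpolation weight applied to every neighbor count (hypothetical)
    """
    pooled = Counter()
    for tag, count in tag_counts.get(word, {}).items():
        pooled[tag] += count
    for other in neighbors.get(word, []):
        for tag, count in tag_counts.get(other, {}).items():
            pooled[tag] += weight * count
    return pooled

# Hypothetical counts: "editorial" was only ever seen as a noun.
tag_counts = {
    "editorial": {"NN": 5},
    "average": {"JJ": 20, "NN": 4},
    "executive": {"NN": 30, "JJ": 6},
}
neighbors = {"editorial": ["average", "executive"]}

pooled = pool_tag_counts("editorial", tag_counts, neighbors)
print(sorted(pooled.items()))  # [('JJ', 13.0), ('NN', 22.0)]
```

Raw counts from frequent neighbors can swamp a word's own evidence. In this example the pooled NN count comes mostly from executive. A real system would probably normalize the counts or weight each neighbor by its similarity.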
Statistic pooling results
Baseline: 91.13
+Pooling: 91.11
Statistic pooling results (300 sentences)
Baseline: 71.88
+Pooling: 72.21
Embedding structure hypothesis:
The organization of the embedding space directly encodes useful features
(Figure labels: transitivity, tense)
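Embedding structure
(Figure: the verb forms vanished/vanishing, dined/dining, devoured/devouring and assassinated/assassinating, with one direction of the embedding space labeled “transitivity” and another labeled “tense”; dined is shown with the tag VBD.)
[Huang 2011]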
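For the embedding structure hypothesis, one way to expose the embedding to a parser is as real-valued lexicon features. Each dimension is conjoined with a candidate tag, and a log-linear model scores P(tag | word). The sketch below is hypothetical. Purely for illustration, it assumes that dimension 0 of a toy embedding tracks tense and dimension 1 tracks transitivity. Real embeddings need not align such properties with individual axes, and the feature names and weights are invented.

```python
import math

def embedding_features(word, tag, embeddings):
    """Conjoin each embedding dimension with a candidate tag.

    Returns {feature_name: value}; words without an embedding get no features.
    """
    vec = embeddings.get(word, [])
    return {f"dim{i}&{tag}": value for i, value in enumerate(vec)}

def tag_distribution(word, tags, embeddings, weights):
    """Log-linear P(tag | word) computed from the embedding features alone."""
    scores = {}
    for tag in tags:
        feats = embedding_features(word, tag, embeddings)
        scores[tag] = sum(weights.get(name, 0.0) * value
                          for name, value in feats.items())
    z = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / z for tag, s in scores.items()}

# Hypothetical toy embeddings: dim 0 = "tense" (+1 past), dim 1 = "transitivity".
embeddings = {
    "dined": [1.0, -1.0], "dining": [-1.0, -1.0],
    "devoured": [1.0, 1.0], "devouring": [-1.0, 1.0],
}
# Hypothetical learned weights: the tense dimension separates VBD from VBG.
weights = {"dim0&VBD": 2.0, "dim0&VBG": -2.0}

dist = tag_distribution("dined", ["VBD", "VBG"], embeddings, weights)
print(max(dist, key=dist.get))  # VBD
```

In a full parser, features like these would presumably be added alongside the existing lexicon features rather than replace them. A word with no embedding receives a uniform distribution from these features alone.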
Embedding structure results
Baseline: 91.13
+Features: 91.08
Embedding structure results (300 sentences)
Baseline: 71.88
+Features: 70.32
To summarize (300 sentences)
(Bar chart comparing Baseline, +OOV, +Pooling and +Features.)
Combined results
Baseline: 90.70
+OOV+Pooling: 90.11
Combined results (300 sentences)
Baseline: 71.88
+OOV+Pooling: 72.21
What about…
• Domain adaptation? (no significant gain)
• French? (no significant gain)
• Other kinds of embeddings? (no significant gain)
Why didn’t it work?
• Context clues often provide enough information to reason around words with incomplete or incorrect statistics
• The parser already has robust OOV and small-count models
• Sometimes “help” from embeddings is worse than nothing:
(examples: bifurcate, Soap, homered, Paschi, tuning, unrecognized)
What about other parsers?
• Dependency parsers (continuous representations as syntactic abstraction)
• Neural networks (continuous representations as structural requirement)
[Henderson 2004, Socher 2013, Koo 2008, Bansal 2014]
Conclusion
• Embeddings provide no apparent benefit to a state-of-the-art parser for:
  – OOV handling
  – Parameter pooling
  – Lexicon features
• Code online at http://cs.berkeley.edu/~jda