university of montreal, microsoft research, university of ... · university of montreal, microsoft...

35
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance Yikang Shen*, Zhouhan Lin*, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio University of Montreal, Microsoft Research, University of Waterloo

Upload: others

Post on 26-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Yikang Shen*, Zhouhan Lin*,Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio

University of Montreal, Microsoft Research, University of Waterloo

Page 2: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Overview- Motivation- Syntactic Distance based Parsing Framework- Model- Experimental Results

Page 3: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Overview- Motivation- Syntactic Distance based Parsing Framework- Model- Experimental Results

Page 4: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

ICLR 2018: Neural Language Modeling by Jointly Learning Syntax and Lexicon

Syntactic Distance

Structured Self-Attention

LSTM

Language Model (61 ppl)

Unsupervised Constituency parser(68 UF1)

Supervised Constituency Parsing with Syntactic Distance?

[Shen et al. 2018]

Page 5: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Chart Neural Parsers

1. High computational cost:Complexity of CYK is O(n^3).

2. Complicated loss function:

Transition based Neural Parsers

1. Greedy decoding:Incompleted tree (the shift and reduce steps may not match).

2. Exposure biasThe model is never exposed to its own mistakes during training

[Stern et al., 2017; Cross and Huang, 2016]

Page 6: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Overview- Motivation- Syntactic Distance based Parsing Framework- Model- Experimental Results

Page 7: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Intuitions

Only the order of split (or combination) matters for reconstructing the tree.

Can we model the order directly?

Page 8: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Syntactic distance

For each split point, their syntactic distance should share the same order as the height of related node

N1

N2

S1 S2

Page 9: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Convert to binary tree

[Stern et al., 2017]

Page 10: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Tree to Distance

The height for each non-terminal node is the maximum height of its children plus 1

Page 11: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Tree to Distance

S VP S-VP ∅

NP ∅ ∅ NP ∅

Page 12: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Distance to Tree

Split point for each bracket is the one with maximum distance.

Page 13: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Distance to Tree

Page 14: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Overview- Motivation- Syntactic Distance based Parsing Framework- Model- Experimental Results

Page 15: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Framework for inferring the distances and labels

Distances

Labels for leaf nodes

Labels for non-leaf nodes

Page 16: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Inferring the distances

Distances

Page 17: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Inferring the distances

Page 18: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Pairwise learning-to-rank loss for distances

a variant of hinge loss

Page 19: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Pairwise learning-to-rank loss for distances

L L

-1 1

While di > dj : While di < dj :

Page 20: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Framework for inferring the distances and labels

Distances

Labels for leaf nodes

Labels for non-leaf nodes

Page 21: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Framework for inferring the distances and labels

Labels for leaf nodes

Labels for non-leaf nodes

Page 22: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Inferring the Labels

Page 23: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Inferring the Labels

Page 24: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Inferring the Labels

Page 25: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Putting it together

Page 26: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Putting it together

Page 27: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Overview- Motivation- Syntactic Distance based Parsing Framework- Model- Experimental Results

Page 28: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Experiments: Penn Treebank

Page 29: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Experiments: Chinese Treebank

Page 30: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Experiments: Detailed statistics in PTB and CTB

Page 31: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Experiments: Ablation Test

Page 32: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Experiments: Parsing Speed

Page 33: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Conclusions and Highlights

- A novel constituency parsing scheme: predicting tree structure from a set of real-valued scalars (syntactic distances).

- Completely free from compounding errors. - Strong performance compare to previous models, and- Significantly more efficient than previous models- Easy deployment: The architecture of model is no more than a stack

of standard recurrent and convolutional layers.

Page 34: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

One more thing... Why it works now?

- The research in rank loss is well-studied in the topic of learning-to-rank, since 2005 (Burges et al. 2005).

- Models that are good at learning these syntactic distances are not widely known until the rediscovery of LSTM in 2013 (Graves 2013).

- Efficient regularization methods for LSTM didn’t become mature until 2017 (Merity 2017).

Page 35: University of Montreal, Microsoft Research, University of ... · University of Montreal, Microsoft Research, University of Waterloo. Overview - Motivation - Syntactic Distance based

Thank you!

Questions?

Yikang Shen, Zhouhan LinMILA, Université de Montréal{yikang.shn, lin.zhouhan}@gmail.com

Paper:Code: