
Linguists for Deep Learning; or: How I Learned to Stop Worrying and Love Neural Networks

Christopher Potts
Stanford Linguistics

*SEM 2018, June 5–6, New Orleans


Signs of the apocalypse?

Neil Lawrence in 2015, quoted by Manning (2015):
"NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened."

Yann LeCun in 2015 [link]:
"The next frontier for Deep Learning is natural language understanding."

Yann LeCun at Stanford in 2018 [link]:
"I would say language is number 300 in the list of 500 problems that we need to face."

Did deep learning swerve off the road instead?


But what does this mean?

• If deep learning brings useful tools, ideas, and insights to another field, has it thereby damaged that field? I'd say the opposite!

• So what potential does deep learning have to improve the science of language?

My argument today: Deep learning has much to offer the study of linguistic meaning and communication.

Lexical semantics

Dimensions of lexical meaning

[Figure: a word-by-context matrix, with words w1, w2, ... as rows and contexts c1, c2, c3, c4, c5, ... as columns.]

The stock deteriorated.

The neglect of lexical meaning in semantics

Thomason (1974):
"The problems of a semantic theory should be distinguished from those of lexicography [...] A central goal of (semantics) is to explain how different kinds of meanings attach to different syntactic categories; another is to explain how the meanings of phrases depend on those of their components. [...] But we should not expect a semantic theory to furnish an account of how any two expressions belonging to the same syntactic category differ in meaning. 'Walk' and 'run,' for instance, and 'unicorn' and 'zebra' certainly do differ in meaning, and we require a dictionary of English to tell us how. But the making of a dictionary demands considerable knowledge of the world."

Jerrold Katz (1972) on meaning

"The arbitrariness of the distinction between form and matter reveals itself [...]"

The question "What is meaning?" broken down:

• What is synonymy?
• What is antonymy?
• What is superordination?
• What is semantic ambiguity?
• What is semantic truth?
• What is a possible answer to a question?
• ...

Children are situated word learners

Children learn word meanings

1. with incredible speed
2. despite relatively few inputs
3. by using cues from
   • contrast inherent in the forms they hear
   • social cues
   • assumptions about the speaker's goals
   • regularities in the physical environment.

(Frank et al. 2012; Frank & Goodman 2014)


Purely distributional meaning

• High-dimensional

• Meaning from dense linguistic inter-relationships

• Meaning solely from (nth-order) co-occurrence

• No grounding in physical or social contexts

• Not symbolic
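Here is a minimal sketch (mine, not from the talk) of this purely distributional recipe: a word's representation is nothing but its profile of co-occurrence counts, so words with overlapping contexts come out similar. All names are illustrative.

```python
# Toy distributional vectors: "stock" and "weather" share contexts
# ("the", "deteriorated"), so their count vectors overlap.
from collections import Counter, defaultdict

def cooccurrence(corpus, window=2):
    """Map each word to a Counter over words seen within `window` tokens."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

corpus = [["the", "stock", "deteriorated"],
          ["the", "stock", "improved"],
          ["the", "weather", "deteriorated"]]
vecs = cooccurrence(corpus)
print(vecs["stock"])    # Counter({'the': 2, 'deteriorated': 1, 'improved': 1})
print(vecs["weather"])  # Counter({'the': 1, 'deteriorated': 1})
```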


Faruqui et al. (2015): Retrofitting to graphs

$$\sum_{i \in V} \alpha_i \, \|q_i - \hat{q}_i\|^2 \;+\; \sum_{(i,j,r) \in E} \beta_{ij} \, \|q_i - q_j\|^2$$

This balances fidelity to the original vector q̂_i against looking more like one's graph neighbors. Forces are balanced with α_i = 1 and β_ij = 1/Degree(i).
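As a concrete illustration, here is a minimal sketch (mine, not from the slides) of the coordinate update that minimizes this objective under the slide's settings α_i = 1 and β_ij = 1/Degree(i); the function and variable names are illustrative.

```python
# Minimal retrofitting sketch. q_hat maps words to their original
# vectors; edges maps words to lists of graph neighbors.
import numpy as np

def retrofit(q_hat, edges, n_iters=10):
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(n_iters):
        for w, nbrs in edges.items():
            if not nbrs:
                continue  # no neighbors: q_w stays at q_hat_w
            # Closed-form minimizer for q_w given its neighbors:
            # (alpha * q_hat_w + sum_j beta * q_j) / (alpha + sum_j beta);
            # with beta = 1/Degree(w), the denominator is exactly 2.
            nbr_mean = sum(q[u] for u in nbrs) / len(nbrs)
            q[w] = (q_hat[w] + nbr_mean) / 2.0
    return q
```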

What retrofitting to WordNet might do

• Cluster mammal with dog and puppy even though mammal has a different, unusual distribution.

• Avoid polarity mistakes like modeling superb and awful as similar (though beware antonym edges!).

• Holistic consistency:

[Figure 3 from Faruqui et al. 2015: two-dimensional PCA projections of 100-dimensional skip-gram vector pairs holding the "adjective to adverb" relation, before (left) and after (right) retrofitting. Before retrofitting, the analogy vectors point in inconsistent directions; afterwards they are aligned.]


Concerns about identity retrofitting

• No attention to edge semantics; edges mean 'similar to'.

• Presupposes a uniform initial embedding space.

• No modeling of missing edges.

[Figure: a toy knowledge graph over Athelas, Kingsfoil, the Black Breath, the Nazgûl, and Aragorn, with typed edges Is, Treats, Uses, and Causes.]

Hand-built functions from Mrksic et al. (2016)

• AntonymRepel:
$$\sum_{(i,j) \in A} \mathrm{ReLU}\bigl(1.0 - d(q_i, q_j)\bigr)$$

• SynonymAttract:
$$\sum_{(i,j) \in S} \mathrm{ReLU}\bigl(d(q_i, q_j) - 0\bigr)$$

• VectorSpacePreservation:
$$\sum_{i} \sum_{j \in N(i)} \mathrm{ReLU}\bigl(d(q_i, q_j) - d(\hat{q}_i, \hat{q}_j)\bigr)$$
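A minimal sketch (mine) of these three penalties, taking d to be cosine distance, an assumption consistent with the margins of 1.0 and 0 above:

```python
import numpy as np

def d(u, v):
    """Cosine distance."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def relu(x):
    return max(0.0, x)

def antonym_repel(q, antonym_pairs):
    # Penalize antonyms that are closer than distance 1.0.
    return sum(relu(1.0 - d(q[i], q[j])) for i, j in antonym_pairs)

def synonym_attract(q, synonym_pairs):
    # Penalize synonyms for any distance above the margin (0 here).
    return sum(relu(d(q[i], q[j]) - 0.0) for i, j in synonym_pairs)

def vector_space_preservation(q, q_hat, neighbors):
    # Penalize pairs that drift farther apart than they originally were.
    return sum(relu(d(q[i], q[j]) - d(q_hat[i], q_hat[j]))
               for i in neighbors for j in neighbors[i])
```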

Functional relations (Lengerich et al. 2018)

Framework:
$$\sum_{i \in V} \alpha_i \, \|q_i - \hat{q}_i\|^2 \;+\; \sum_{(i,j,r) \in E} \beta_{ijr} \, f_r(q_i, q_j) \;-\; \sum_{(i,j,r) \in E^-} \beta_{ijr} \, f_r(q_i, q_j) \;+\; \lambda \sum_r \rho(f_r)$$

Faruqui et al. as a special case:
$$f_r(q_i, q_j) = \|q_i - q_j\|^2, \quad \text{with } \beta_{ijr} = 0 \text{ on the negative edges } E^-$$

Linear:
$$f_r(q_i, q_j) = \|A_r q_j + b_r - q_i\|^2$$

• ρ(f_r) = ‖A_r‖²
• We initialize A_r = I and b_r = 0.
• Initialization can be different for different relations, e.g., A_antonym = −I.

Simplest neural (akin to Sutskever et al. 2009):
$$f_r(q_i, q_j) = \tanh(q_i^\top A_r q_j)$$

Neural Tensor Network (akin to Socher et al. 2013):
$$f_r(q_i, q_j) = u_r^\top \tanh(q_i^\top A_r q_j), \quad \text{where } A_r \in \mathbb{R}^{d \times d \times k} \text{ and } \rho(f_r) = \|A_r\|^2 + \|u_r\|^2$$

Your favorite graph embedding method: Bordes et al. 2013; Wang et al. 2014; Lin et al. 2015; for overviews, see Nickel et al. 2011; Hamilton et al. 2017.
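To make the f_r options concrete, here is a minimal numpy sketch (mine, not from the paper) of the linear, bilinear, and Neural Tensor Network scoring functions above; shapes follow the slide's definitions, and all names are illustrative.

```python
import numpy as np

def f_linear(q_i, q_j, A_r, b_r):
    """||A_r q_j + b_r - q_i||^2 (the Linear choice)."""
    diff = A_r @ q_j + b_r - q_i
    return float(diff @ diff)

def f_bilinear(q_i, q_j, A_r):
    """tanh(q_i^T A_r q_j) (akin to Sutskever et al. 2009)."""
    return float(np.tanh(q_i @ A_r @ q_j))

def f_ntn(q_i, q_j, A_r, u_r):
    """u_r^T tanh(q_i^T A_r q_j) with A_r in R^{d x d x k}
    (akin to Socher et al. 2013)."""
    scores = np.tanh(np.einsum("i,ijk,j->k", q_i, A_r, q_j))
    return float(u_r @ scores)

dim, k = 4, 3
q_i, q_j = np.ones(dim), np.ones(dim)
print(f_linear(q_i, q_j, np.eye(dim), np.zeros(dim)))  # 0.0: A_r = I, b_r = 0
```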

FrameNet evaluation

Model           'Inheritance'  'Using'     'Reframing'  'Subframe'  'Perspective On'
                (2132/992)     (1552/668)  (544/312)    (356/168)   (336/148)
None            87.58          88.59       85.60        91.24       89.59
Faruqui et al.  90.79          87.87       87.02        94.50       94.24
FR-Linear       92.92          92.04       89.37        94.65       94.73
FR-Neural       92.46          92.54       89.57        95.65       94.04

Model           'Precedes'  'See Also'  'Causative Of'  'Inchoative Of'
                (220/136)   (268/76)    (204/36)        (60/16)
None            87.30       85.11       86.11           82.50
Faruqui et al.  85.26       83.81       84.49           78.33
FR-Linear       87.00       91.93       92.09           82.50
FR-Neural       89.16       93.25       94.33           85.00

A drug–disease knowledge graph

[Figure: drug–disease embedding visualizations for Faruqui et al. (left) and FR-Linear (right).]

Model           'Treats' (9152/2490)
None            72.02 ± 0.50
Faruqui et al.  72.93 ± 0.82
FR-Linear       84.22 ± 0.82
FR-Neural       73.52 ± 0.89

Functional complexity in the lexicon

Category            Semantic type
nouns               properties
intransitive verbs  properties
transitive verbs    entities to properties
adjectives          properties to properties
prepositions        entities to (properties to properties)
determiners         properties to sets of properties

Vector space models tend to be monotyped, but see Clark et al. 2011.
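One way to see the typing discipline in this table is as function types. A minimal sketch (mine), rendering each category as a Python type alias, with "properties" as functions from entities to truth values:

```python
from typing import Callable

Entity = str
Property = Callable[[Entity], bool]          # nouns, intransitive verbs
TransVerb = Callable[[Entity], Property]     # entities to properties
Adjective = Callable[[Property], Property]   # properties to properties
Preposition = Callable[[Entity], Adjective]  # entities to (properties to properties)
Determiner = Callable[[Property], Callable[[Property], bool]]
# A determiner maps a property to a set of properties, represented here
# by the set's characteristic function.

DOMAIN = ["kim", "lee"]

def every(f: Property) -> Callable[[Property], bool]:
    # "every f" is the set of properties g that hold of all f-things.
    return lambda g: all(g(x) for x in DOMAIN if f(x))
```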

Some lexical generalizations

1. Some transitive verbs entail the existence of their direct object (see) and some do not (seek).

2. Across languages, verbs lexicalize manner or result, but not both (Rappaport Hovav & Levin 2010):
   • Manner: nibble, scribble, sweep, flutter
   • Result: clean, cover, empty, fill

3. Some adjectives predicate distributively across their arguments, others do not (Glass 2018):
   • Box A and Box B are new. (entails both are new)
   • Box A and Box B are heavy. (does not entail both are heavy)

Can we develop deep learning systems that derive such generalizations? No training against them; that's just restating them!

Compositional semantics

A semanticist's ideal

Every student attended a lecture

∀z ((student z) → (∃x (lecture x) ∧ (attended x z)))

[Derivation tree:]
every student:       λg ∀z ((student z) → (g z))
  every:             λf λg ∀z ((f z) → (g z))
  student:           student
attended a lecture:  λy (∃x (lecture x) ∧ (attended x y))
  attended:          attended
  a lecture:         λg (∃x (lecture x) ∧ (g x))
    a:               λf λg (∃x (f x) ∧ (g x))
    lecture:         lecture

But is this really so ideal?
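For concreteness, a minimal sketch (mine) that evaluates the slide's logical form in a toy model:

```python
# Toy model for: forall z ((student z) -> exists x ((lecture x) & (attended x z)))
students = {"kim", "lee"}
lectures = {"l1", "l2"}
attended = {("kim", "l1"), ("lee", "l2")}   # (student, lecture) pairs

truth = all(any((z, x) in attended for x in lectures) for z in students)
print(truth)  # True: every student attended some lecture in this model
```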

Complete semantic representations?

MacCartney & Manning (2009):
"The difficulty is plain: truly natural language is fiendishly complex. [...] Consider for a moment the difficulty of fully and accurately translating

1. Every firm polled saw costs grow more than expected, even after adjusting for inflation.

to a formal meaning representation."

Sparse, fragmented feature representations

[Parse tree: S → NP (The NYT) VP (reported S → NP (the deal) VP (fell through))]

Feature                  Value
the                      2
source_NYT               T
NYT                      1
embedded_implicit_neg    T
report                   1
deal_neg                 1
length                   7
vocab                    6
...                      ...
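A minimal sketch (mine) of how such a sparse, hand-built feature dictionary might be computed; the simpler features (counts, length, vocab) fall out directly, while features like embedded_implicit_neg would require a parser. Names are illustrative.

```python
def features(tokens):
    # Surface features only; deeper features (lemmas, negation scope)
    # would need lemmatization and parsing.
    feats = {"length": len(tokens),
             "vocab": len(set(t.lower() for t in tokens))}
    for t in tokens:
        feats[t.lower()] = feats.get(t.lower(), 0) + 1
    if "NYT" in tokens:
        feats["source_NYT"] = True
    return feats

print(features("The NYT reported the deal fell through".split()))
# {'length': 7, 'vocab': 6, 'the': 2, 'nyt': 1, ..., 'source_NYT': True}
```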

The answer from deep learning

[Tree-structured composition over "The NYT reported the deal fell through":]

S1  = f(NP1, VP1)
NP1 = f(The, NYT)
VP1 = f(reported, S2)
S2  = f(NP2, VP2)
NP2 = f(the, deal)
VP2 = f(fell, through)

The answer from deep learning

[Sequential composition over the same sentence:]

h1 = f(h0, The)
h2 = f(h1, NYT)
h3 = f(h2, reported)
h4 = f(h3, the)
h5 = f(h4, deal)
h6 = f(h5, fell)
h7 = f(h6, through)
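A minimal sketch (mine) of this recurrence, with the abstract f instantiated as f(h, x) = tanh([h; x] W + b), an assumption in the spirit of the tanh composition shown later in the talk; the random embeddings stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
words = "The NYT reported the deal fell through".split()
emb = {w: rng.normal(size=dim) for w in set(words)}  # stand-in embeddings
W = rng.normal(size=(2 * dim, dim)) * 0.1
b = np.zeros(dim)

h = np.zeros(dim)  # h0
for w in words:
    # h_t = f(h_{t-1}, x_t), here f(h, x) = tanh([h; x] W + b)
    h = np.tanh(np.concatenate([h, emb[w]]) @ W + b)
# h is now h7, a fixed-width representation of the whole sentence.
print(h.shape)  # (8,)
```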

The answer from deep learning

[Figure: the same sequence model drawn as a chain of states h0–h7 over "The NYT reported the deal fell through".]

The answer from deep learning

All our parses are wrong, but perhaps we can discover the right one(s).

[Figure: the chain of states h0–h7 over "The NYT reported the deal fell through", repeated.]

A new perspective on compositionality

Compositionality: The meaning of a complex phrase is a function of the meanings of its constituent phrases.

Partee (1984): Context-dependence, Ambiguity, and Challenges to Local, Deterministic Compositionality

S1  = tanh([NP1; VP1] W + b)
VP1 = tanh([reported; S2] W + b)
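A minimal sketch (mine) applying this composition rule bottom-up over the parse of "The NYT reported the deal fell through"; random vectors stand in for learned embeddings and parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim)
       for w in ["The", "NYT", "reported", "the", "deal", "fell", "through"]}
W = rng.normal(size=(2 * dim, dim)) * 0.1
b = np.zeros(dim)

def f(left, right):
    # parent = tanh([left; right] W + b), the rule on the slide
    return np.tanh(np.concatenate([left, right]) @ W + b)

NP1 = f(emb["The"], emb["NYT"])
NP2 = f(emb["the"], emb["deal"])
VP2 = f(emb["fell"], emb["through"])
S2  = f(NP2, VP2)
VP1 = f(emb["reported"], S2)
S1  = f(NP1, VP1)  # vector for the whole sentence
```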

Compositional generalizations: monotonicity

(Arrows point from the entailing sentence to the entailed one.)

    Kim smoked.                      Kim didn't smoke.
         ↑                                 ↓
    Kim smoked cigars.               Kim didn't smoke cigars.

              A student smoked.
              ↗               ↖
A Swedish student smoked.       A student smoked cigars.

              No student smoked.
              ↙               ↘
No Swedish student smoked.      No student smoked cigars.

              Every student smoked.
              ↙               ↖
Every Swedish student smoked.   Every student smoked cigars.

              Most students smoked.
              −               ↖
Most Swedish students smoked.   Most students smoked cigars.

(Bowman 2017)

Pragmatics

Natural language is situated and social

1. I am speaking.
2. We won. [A team I'm on; a team I support; ...]
3. I am here. [NAACL; New Orleans; planet earth; ...]
4. We are here. [pointing at a map]
5. I'm not here now. [answering machine]
6. We went to a local bar after the workshop.
7. three days ago, tomorrow, now, ...


Page 80: Linguists for Deep Learning; or: How I Learned to Stop ...cgpotts/talks/potts-starsem2018-slides.pdfLearning machine, waiting to be flattened. But what does this mean? • If deep

Signs of the apocalypse? Lexical semantics Compositional semantics Pragmatics The next frontier

Natural language is situated and social

8. Where are you from?
   a. Connecticut.
   b. Stanford.
   c. The U.S.
   d. Planet earth.

9. If kangaroos had no tails, they would fall over.
   a. True,
   b. as long as we don’t slip in the premise that they have jet packs.

10. I didn’t see any.
   a. Are there typos in my slides?
   b. Are there cookies in the cupboard?
   c. Are there bookstores downtown?


11. I think this is the way to the library. [politeness]

12. Would you mind if I stole your pen for a second? [politeness]

13. He’s not exactly a genius/idiot. [irony]

14. Great idea! [sarcasm(?)]

15. Any chance we can sort this out here, officer? [bribery(?)]

16. It’d be a shame if something happened to your dog. [threat(?)]

29 / 42


SHRDLU (Winograd 1972)
Fully grounded system that parses the user’s input, maps it to logical form, interprets that logical form in its world, tries to take appropriate action, and generates responses.

One project did succeed. Terry Winograd’s program SHRDLU could use English intelligently, but there was a catch: the only subject you could discuss was a micro-world of simulated blocks.

30 / 42


SHRDLU (Winograd 1972)

1. Person: WHAT DOES THE BOX CONTAIN?
2. Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
3. Person: WHAT IS THE PYRAMID SUPPORTED BY?
4. Computer: THE BOX.
5. Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
6. Computer: FOUR OF THEM.
7. Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
8. Computer: YES, THE RED CUBE.
9. Person: IS IT SUPPORTED?
10. Computer: YES, BY THE TABLE.

30 / 42


Example: sarcasm

[Figure: an RNN reads the words “Yeah great idea” through hidden states h0–h3; the final state feeds a binary “sarcastic?” classifier.]

31 / 42


Example: sarcasm

[Figure: the same RNN over “Yeah great idea”, now with a user-context vector combined with the final hidden state before the sarcasm classifier.]

SARC (Khodak et al. 2017); Hazarika et al. 2018

32 / 42
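A minimal PyTorch sketch of the architecture in the figures; the layer sizes and names here are assumptions of the sketch, not details from Hazarika et al. 2018:

import torch
import torch.nn as nn

class SarcasmClassifier(nn.Module):
    """An LSTM reads the word sequence; its final hidden state,
    concatenated with a user-context vector, feeds a binary
    sarcastic / not-sarcastic classifier."""
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50, user_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim + user_dim, 2)

    def forward(self, token_ids, user_vector):
        _, (h_final, _) = self.lstm(self.embed(token_ids))
        features = torch.cat([h_final[-1], user_vector], dim=-1)
        return self.classify(features)  # logits: [not sarcastic, sarcastic]

# Toy usage: "Yeah great idea" as token ids 1, 2, 3 for one user.
model = SarcasmClassifier(vocab_size=10)
logits = model(torch.tensor([[1, 2, 3]]), torch.randn(1, 50))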


Example: Colors in context

Context (three color swatches)      Utterance
[swatches]                          blue
[swatches]                          The darker blue one
[swatches]                          dull pink not the super bright one
[swatches]                          Purple
[swatches]                          blue

Table: Example from the Colors in Context corpus from the Stanford Computation & Cognition Lab.

33 / 42


Literal neural speaker S0

[Figure: the context colors c1, c2, cT pass through a fully connected layer to initialize an LSTM, which generates the utterance tokens x1, x2, …, ⟨/s⟩ via a softmax over the vocabulary.]

34 / 42
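As a rough PyTorch sketch of the figure (layer sizes are my assumptions, not the original model’s): the color context initializes the LSTM via a fully connected layer, and a softmax over the vocabulary scores each next token.

import torch
import torch.nn as nn

class LiteralSpeaker(nn.Module):
    """S0: embed the color context, map it through a fully connected
    layer to the LSTM's initial state, then predict each next token
    of the utterance with a softmax over the vocabulary."""
    def __init__(self, vocab_size, color_dim=3, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.context_to_h0 = nn.Linear(3 * color_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.vocab_logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, colors, token_ids):
        # colors: (batch, 3, color_dim); token_ids: (batch, seq_len),
        # beginning with <s> so each step predicts the next token.
        h0 = torch.tanh(self.context_to_h0(colors.flatten(1))).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        states, _ = self.lstm(self.embed(token_ids), (h0, c0))
        return self.vocab_logits(states)  # softmax-ready next-token scores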


Neural literal listener L0

[Figure: the utterance tokens x1, x2, x3 are embedded and encoded by an LSTM into a Gaussian (μ, Σ) over color space, which scores the context colors c1, c2, c3; a softmax picks the referent (here c3).]

35 / 42
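And a matching sketch of L0; the sizes and the diagonal covariance are again assumptions of the sketch. The LSTM encodes the utterance into a Gaussian over color space, and a softmax over the context colors picks the referent.

import torch
import torch.nn as nn

class LiteralListener(nn.Module):
    """L0: encode the utterance with an LSTM into (mu, Sigma) over
    color space, score each context color by its Gaussian fit, and
    softmax over the three candidates."""
    def __init__(self, vocab_size, color_dim=3, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, color_dim)
        self.to_log_sigma = nn.Linear(hidden_dim, color_dim)  # diagonal

    def forward(self, token_ids, colors):
        # token_ids: (batch, seq_len); colors: (batch, 3, color_dim)
        _, (h_final, _) = self.lstm(self.embed(token_ids))
        mu = self.to_mu(h_final[-1]).unsqueeze(1)
        sigma = self.to_log_sigma(h_final[-1]).exp().unsqueeze(1)
        # Unnormalized Gaussian log-density of each candidate color.
        scores = -0.5 * (((colors - mu) / sigma) ** 2).sum(-1)
        return scores.softmax(-1)  # distribution over the three colors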


Neural pragmatic agents

Neural pragmatic speaker (Andreas & Klein 2016)

$S_1(\mathit{msg} \mid c, C; \theta) = \dfrac{L_0(c \mid \mathit{msg}, C; \theta)}{\sum_{\mathit{msg}' \in X} L_0(c \mid \mathit{msg}', C; \theta)}$

where $X$ is a sample from $S_0(\mathit{msg} \mid c, C; \theta)$ such that $\mathit{msg}^* \in X$.

Neural pragmatic listener

$L_1(c \mid \mathit{msg}, C; \theta) \propto S_1(\mathit{msg} \mid c, C; \theta)$

36 / 42
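Concretely, the two formulas are just column and row normalizations of a literal-listener table. Here is a toy NumPy sketch; the L0 numbers are invented for illustration, and in the real models the rows would be the sample X drawn from S0:

import numpy as np

# Toy L0(c | msg, C): rows are candidate messages, columns are the
# three context colors.
l0 = np.array([
    [0.8, 0.1, 0.1],   # "blue"
    [0.4, 0.5, 0.1],   # "the darker blue one"
    [0.1, 0.1, 0.8],   # "purple"
])

def pragmatic_speaker(l0_probs):
    """S1(msg | c): renormalize each target color's column of L0
    over the sampled messages, per the formula above."""
    return l0_probs / l0_probs.sum(axis=0, keepdims=True)

def pragmatic_listener(s1_probs):
    """L1(c | msg) is proportional to S1: renormalize each message's row."""
    return s1_probs / s1_probs.sum(axis=1, keepdims=True)

s1 = pragmatic_speaker(l0)
l1 = pragmatic_listener(s1)
# For the middle color, S1 shifts probability from plain "blue" to
# "the darker blue one", since that message points to it more
# reliably than plain "blue" does.
print(s1[:, 1].round(2))   # [0.14 0.71 0.14]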


Example: Pragmatic image captioning

Mao et al. (2016); Vedantam et al. (2017): Captions that are true and distinguish their images from related ones.

Reasoning about all possible utterances/captions?

⇒ Sample from S0
⇒ Full pragmatic reasoning about characters

(Cohn-Gordon et al. 2018)

37 / 42
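A greedy sketch of the character-level idea, with interfaces that are my assumptions rather than the paper’s: at each step, S0’s next-character distributions for the target and distractor images are turned into an incremental listener, which rescores S0 before the next character is chosen.

import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz #")  # "#" ends the caption

def pragmatic_decode(s0_next_char, images, target, max_len=60):
    """Greedy character-level pragmatic decoding in the spirit of
    Cohn-Gordon et al. 2018. `s0_next_char(image, prefix)` is assumed
    to return S0's distribution over ALPHABET for the next character
    of a caption of `image` continuing `prefix`."""
    prefix = ""
    for _ in range(max_len):
        # S0's next-character distribution for each image in context.
        dists = np.stack([s0_next_char(img, prefix) for img in images])
        # Incremental listener: P(image | next char), uniform prior.
        l0 = dists / dists.sum(axis=0, keepdims=True)
        # Pragmatic speaker: S0 for the target, rescored by the listener.
        s1 = dists[target] * l0[target]
        char = ALPHABET[int(s1.argmax())]
        if char == "#":
            break
        prefix += char
    return prefix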


Some pragmatic generalizations

1. Scalar implicature: general terms tend to signal that their more specific alternatives are pragmatically marked.

2. I-implicature: if a general term has prototypical instantiations in context, then it might be refined to pick out just those prototypes.

3. Manner implicature: unusual events are described with unusual language; normal events with normal language.

4. Metaphor: metaphorical language is pervasive and enables the speaker to highlight specific dimensions of meaning efficiently.

5. Contextual refinement: word and phrase meanings are flexible and respond to the social context.

38 / 42


The next frontier

The Human Speechome Project (Roy et al. 2006)

And, not to play into stereotypes of linguists, but some symbolic reasoning would be useful!

Thanks!

39 / 42


References

References I

Andreas, Jacob & Dan Klein. 2016. Reasoning about pragmatics with neural listeners and speakers. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1173–1182. Association for Computational Linguistics. http://aclweb.org/anthology/D16-1125.

Bordes, Antoine, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston & Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.

Bowman, Samuel R. 2017. Modeling natural language semantics in learned representations. Stanford, CA: Stanford University dissertation.

Clark, Stephen, Bob Coecke & Mehrnoosh Sadrzadeh. 2011. Mathematical foundations for a compositional distributed model of meaning. Linguistic Analysis 36(1–4). 345–384.

Cohn-Gordon, Reuben, Noah D. Goodman & Christopher Potts. 2018. Pragmatically informative image captioning with character-level inference. In Human Language Technologies: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics.

Faruqui, Manaal, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy & Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1606–1615. Stroudsburg, PA: Association for Computational Linguistics. http://www.aclweb.org/anthology/N15-1184.

Frank, Michael C. & Noah D. Goodman. 2014. Inferring word meanings by assuming that speakers are informative. Cognitive Psychology 75(1). 80–96. doi:10.1016/j.cogpsych.2014.08.002.

Frank, Michael C., Joshua B. Tenenbaum & Anne Fernald. 2012. Social and discourse contributions to the determination of reference in cross-situational word learning. Language, Learning, and Development.

Glass, Lelia. 2018. Deriving the distributivity potential of adjectives via measurement theory. Proceedings of the Linguistic Society of America 3(49). 1–14. doi:10.3765/plsa.v3i1.4343.

Hamilton, William L., Rex Ying & Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 52–74. IEEE Press.

40 / 42


References II

Hazarika, Devamanyu, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann & Rada Mihalcea. 2018. CASCADE: Contextual sarcasm detection in online discussion forums. ArXiv:1805.06413.

Katz, Jerrold J. 1972. Semantic theory. New York: Harper & Row.

Khodak, Mikhail, Nikunj Saunshi & Kiran Vodrahalli. 2017. A large self-annotated corpus for sarcasm. ArXiv:1704.05579.

Lengerich, Benjamin J., Andrew L. Maas & Christopher Potts. 2018. Retrofitting distributional embeddings to knowledge graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). The COLING 2018 Organizing Committee. ArXiv:1708.00112.

Lin, Yankai, Zhiyuan Liu, Maosong Sun, Yang Liu & Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2181–2187.

MacCartney, Bill & Christopher D. Manning. 2009. An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics, 140–156. Tilburg, The Netherlands: Association for Computational Linguistics. http://www.aclweb.org/anthology/W09-3714.

Manning, Christopher D. 2015. Computational linguistics and deep learning. Computational Linguistics 41(4). doi:10.1162/COLI_a_00239.

Mao, Junhua, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille & Kevin Murphy. 2016. Generation and comprehension of unambiguous object descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 11–20. IEEE.

Mrksic, Nikola, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen & Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 142–148. Association for Computational Linguistics. doi:10.18653/v1/N16-1018. http://aclanthology.coli.uni-saarland.de/pdf/N/N16/N16-1018.pdf.

41 / 42


References III

Nickel, Maximilian, Volker Tresp & Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 809–816. ACM.

Partee, Barbara H. 1984. Compositionality. In Fred Landman & Frank Veltman (eds.), Varieties of Formal Semantics, 281–311. Dordrecht: Foris. Reprinted in Barbara H. Partee (2004) Compositionality in Formal Semantics, Oxford: Blackwell, 153–181. Page references to the reprinting.

Rappaport Hovav, Malka & Beth Levin. 2010. Reflections on manner/result complementarity. In Malka Rappaport Hovav, Edit Doron & Ivy Sichel (eds.), Syntax, Lexical Semantics, and Event Structure, 21–38. Oxford University Press. doi:10.1093/acprof:oso/9780199544325.003.0002.

Roy, Deb, Rupal Patel, Philip DeCamp, Rony Kubat, Michael Fleischman, Brandon Roy, Nikolaos Mavridis, Stefanie Tellex, Alexia Salata, Jethran Guinness et al. 2006. The Human Speechome Project. In Symbol Grounding and Beyond, 192–196. Springer.

Socher, Richard, Danqi Chen, Christopher D. Manning & Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, 926–934.

Sutskever, Ilya, Joshua B. Tenenbaum & Ruslan R. Salakhutdinov. 2009. Modelling relational data using Bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems, 1821–1828.

Thomason, Richmond H. 1974. Introduction. In Formal Philosophy: Selected Papers of Richard Montague, 1–69. New Haven, CT: Yale University Press.

Vedantam, Ramakrishna, Samy Bengio, Kevin Murphy, Devi Parikh & Gal Chechik. 2017. Context-aware captions from context-agnostic supervision. ArXiv:1701.02870.

Wang, Zhen, Jianwen Zhang, Jianlin Feng & Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Winograd, Terry. 1972. Understanding natural language. Cognitive Psychology 3(1). 1–191.

42 / 42