
Linguists for Deep Learning; or: How I Learned to Stop Worrying and Love Neural Networks

Christopher Potts
Stanford Linguistics

*SEM 2018, June 5–6, New Orleans


Signs of the apocalypse?

Neil Lawrence in 2015, quoted by Manning (2015):
"NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened."

Yann LeCun in 2015 [link]:
"The next frontier for Deep Learning is natural language understanding."

Yann LeCun at Stanford in 2018 [link]:
"I would say language is number 300 in the list of 500 problems that we need to face."

Did deep learning swerve off the road instead?


But what does this mean?

• If deep learning brings useful tools, ideas, and insights to another field, has it thereby damaged that field? I'd say the opposite!

• So what potential does deep learning have to improve the science of language?

My argument today: Deep learning has much to offer the study of linguistic meaning and communication.

Lexical semantics

Dimensions of lexical meaning

[Figure: a word-by-context matrix, with words w1, w2, ... as rows and contexts c1, c2, c3, c4, c5, ... as columns.]

The stock deteriorated.

The neglect of lexical meaning in semantics

Thomason (1974):
"The problems of a semantic theory should be distinguished from those of lexicography [...] A central goal of (semantics) is to explain how different kinds of meanings attach to different syntactic categories; another is to explain how the meanings of phrases depend on those of their components. [...] But we should not expect a semantic theory to furnish an account of how any two expressions belonging to the same syntactic category differ in meaning. 'Walk' and 'run,' for instance, and 'unicorn' and 'zebra' certainly do differ in meaning, and we require a dictionary of English to tell us how. But the making of a dictionary demands considerable knowledge of the world."

Jerrold Katz (1972) on meaning

"The arbitrariness of the distinction between form and matter reveals itself [...]"

The question "What is meaning?" broken down:

• What is synonymy?
• What is antonymy?
• What is superordination?
• What is semantic ambiguity?
• What is semantic truth?
• What is a possible answer to a question?
• ...

Children are situated word learners

Children learn word meanings

1. with incredible speed
2. despite relatively few inputs
3. by using cues from
   • contrast inherent in the forms they hear
   • social cues
   • assumptions about the speaker's goals
   • regularities in the physical environment.

(Frank et al. 2012; Frank & Goodman 2014)


Purely distributional meaning

• High-dimensional

• Meaning from dense linguistic inter-relationships

• Meaning solely from (nth-order) co-occurrence

• No grounding in physical or social contexts

• Not symbolic
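Here is a minimal sketch (mine, not from the talk) of this purely distributional recipe: a word's representation is nothing but its profile of co-occurrence counts, so words with overlapping contexts come out similar. All names are illustrative.

```python
# Toy distributional vectors: "stock" and "weather" share contexts
# ("the", "deteriorated"), so their count vectors overlap.
from collections import Counter, defaultdict

def cooccurrence(corpus, window=2):
    """Map each word to a Counter over words seen within `window` tokens."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

corpus = [["the", "stock", "deteriorated"],
          ["the", "stock", "improved"],
          ["the", "weather", "deteriorated"]]
vecs = cooccurrence(corpus)
print(vecs["stock"])    # Counter({'the': 2, 'deteriorated': 1, 'improved': 1})
print(vecs["weather"])  # Counter({'the': 1, 'deteriorated': 1})
```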


Faruqui et al. (2015): Retrofitting to graphs

$$\sum_{i \in V} \alpha_i \, \|q_i - \hat{q}_i\|^2 \;+\; \sum_{(i,j,r) \in E} \beta_{ij} \, \|q_i - q_j\|^2$$

This balances fidelity to the original vector q̂_i against looking more like one's graph neighbors. Forces are balanced with α_i = 1 and β_ij = 1/Degree(i).
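As a concrete illustration, here is a minimal sketch (mine, not from the slides) of the coordinate update that minimizes this objective under the slide's settings α_i = 1 and β_ij = 1/Degree(i); the function and variable names are illustrative.

```python
# Minimal retrofitting sketch. q_hat maps words to their original
# vectors; edges maps words to lists of graph neighbors.
import numpy as np

def retrofit(q_hat, edges, n_iters=10):
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(n_iters):
        for w, nbrs in edges.items():
            if not nbrs:
                continue  # no neighbors: q_w stays at q_hat_w
            # Closed-form minimizer for q_w given its neighbors:
            # (alpha * q_hat_w + sum_j beta * q_j) / (alpha + sum_j beta);
            # with beta = 1/Degree(w), the denominator is exactly 2.
            nbr_mean = sum(q[u] for u in nbrs) / len(nbrs)
            q[w] = (q_hat[w] + nbr_mean) / 2.0
    return q
```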

What retrofitting to WordNet might do

• Cluster mammal with dog and puppy even though mammal has a different, unusual distribution.

• Avoid polarity mistakes like modeling superb and awful as similar (though beware antonym edges!).

• Holistic consistency:

[Figure 3 from Faruqui et al. 2015: two-dimensional PCA projections of 100-dimensional skip-gram vector pairs holding the "adjective to adverb" relation, before (left) and after (right) retrofitting. Before retrofitting, the analogy vectors point in inconsistent directions; afterwards they are aligned.]


Concerns about identity retrofitting

• No attention to edge semantics; edges mean 'similar to'.

• Presupposes a uniform initial embedding space.

• No modeling of missing edges.

[Figure: a toy knowledge graph over Athelas, Kingsfoil, the Black Breath, the Nazgûl, and Aragorn, with typed edges Is, Treats, Uses, and Causes.]

Hand-built functions from Mrksic et al. (2016)

• AntonymRepel:
$$\sum_{(i,j) \in A} \mathrm{ReLU}\bigl(1.0 - d(q_i, q_j)\bigr)$$

• SynonymAttract:
$$\sum_{(i,j) \in S} \mathrm{ReLU}\bigl(d(q_i, q_j) - 0\bigr)$$

• VectorSpacePreservation:
$$\sum_{i} \sum_{j \in N(i)} \mathrm{ReLU}\bigl(d(q_i, q_j) - d(\hat{q}_i, \hat{q}_j)\bigr)$$
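A minimal sketch (mine) of these three penalties, taking d to be cosine distance, an assumption consistent with the margins of 1.0 and 0 above:

```python
import numpy as np

def d(u, v):
    """Cosine distance."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def relu(x):
    return max(0.0, x)

def antonym_repel(q, antonym_pairs):
    # Penalize antonyms that are closer than distance 1.0.
    return sum(relu(1.0 - d(q[i], q[j])) for i, j in antonym_pairs)

def synonym_attract(q, synonym_pairs):
    # Penalize synonyms for any distance above the margin (0 here).
    return sum(relu(d(q[i], q[j]) - 0.0) for i, j in synonym_pairs)

def vector_space_preservation(q, q_hat, neighbors):
    # Penalize pairs that drift farther apart than they originally were.
    return sum(relu(d(q[i], q[j]) - d(q_hat[i], q_hat[j]))
               for i in neighbors for j in neighbors[i])
```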

Functional relations (Lengerich et al. 2018)

Framework:
$$\sum_{i \in V} \alpha_i \, \|q_i - \hat{q}_i\|^2 \;+\; \sum_{(i,j,r) \in E} \beta_{ijr} \, f_r(q_i, q_j) \;-\; \sum_{(i,j,r) \in E^-} \beta_{ijr} \, f_r(q_i, q_j) \;+\; \lambda \sum_r \rho(f_r)$$

Faruqui et al. as a special case:
$$f_r(q_i, q_j) = \|q_i - q_j\|^2, \quad \text{with } \beta_{ijr} = 0 \text{ on the negative edges } E^-$$

Linear:
$$f_r(q_i, q_j) = \|A_r q_j + b_r - q_i\|^2$$

• ρ(f_r) = ‖A_r‖²
• We initialize A_r = I and b_r = 0.
• Initialization can be different for different relations, e.g., A_antonym = −I.

Simplest neural (akin to Sutskever et al. 2009):
$$f_r(q_i, q_j) = \tanh(q_i^\top A_r q_j)$$

Neural Tensor Network (akin to Socher et al. 2013):
$$f_r(q_i, q_j) = u_r^\top \tanh(q_i^\top A_r q_j), \quad \text{where } A_r \in \mathbb{R}^{d \times d \times k} \text{ and } \rho(f_r) = \|A_r\|^2 + \|u_r\|^2$$

Your favorite graph embedding method: Bordes et al. 2013; Wang et al. 2014; Lin et al. 2015; for overviews, see Nickel et al. 2011; Hamilton et al. 2017.
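To make the f_r options concrete, here is a minimal numpy sketch (mine, not from the paper) of the linear, bilinear, and Neural Tensor Network scoring functions above; shapes follow the slide's definitions, and all names are illustrative.

```python
import numpy as np

def f_linear(q_i, q_j, A_r, b_r):
    """||A_r q_j + b_r - q_i||^2 (the Linear choice)."""
    diff = A_r @ q_j + b_r - q_i
    return float(diff @ diff)

def f_bilinear(q_i, q_j, A_r):
    """tanh(q_i^T A_r q_j) (akin to Sutskever et al. 2009)."""
    return float(np.tanh(q_i @ A_r @ q_j))

def f_ntn(q_i, q_j, A_r, u_r):
    """u_r^T tanh(q_i^T A_r q_j) with A_r in R^{d x d x k}
    (akin to Socher et al. 2013)."""
    scores = np.tanh(np.einsum("i,ijk,j->k", q_i, A_r, q_j))
    return float(u_r @ scores)

dim, k = 4, 3
q_i, q_j = np.ones(dim), np.ones(dim)
print(f_linear(q_i, q_j, np.eye(dim), np.zeros(dim)))  # 0.0: A_r = I, b_r = 0
```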

FrameNet evaluation

Model           'Inheritance'  'Using'     'Reframing'  'Subframe'  'Perspective On'
                (2132/992)     (1552/668)  (544/312)    (356/168)   (336/148)
None            87.58          88.59       85.60        91.24       89.59
Faruqui et al.  90.79          87.87       87.02        94.50       94.24
FR-Linear       92.92          92.04       89.37        94.65       94.73
FR-Neural       92.46          92.54       89.57        95.65       94.04

Model           'Precedes'  'See Also'  'Causative Of'  'Inchoative Of'
                (220/136)   (268/76)    (204/36)        (60/16)
None            87.30       85.11       86.11           82.50
Faruqui et al.  85.26       83.81       84.49           78.33
FR-Linear       87.00       91.93       92.09           82.50
FR-Neural       89.16       93.25       94.33           85.00

A drug–disease knowledge graph

[Figure: drug–disease embedding visualizations for Faruqui et al. (left) and FR-Linear (right).]

Model           'Treats' (9152/2490)
None            72.02 ± 0.50
Faruqui et al.  72.93 ± 0.82
FR-Linear       84.22 ± 0.82
FR-Neural       73.52 ± 0.89

Functional complexity in the lexicon

Category            Semantic type
nouns               properties
intransitive verbs  properties
transitive verbs    entities to properties
adjectives          properties to properties
prepositions        entities to (properties to properties)
determiners         properties to sets of properties

Vector space models tend to be monotyped, but see Clark et al. 2011.
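One way to see the typing discipline in this table is as function types. A minimal sketch (mine), rendering each category as a Python type alias, with "properties" as functions from entities to truth values:

```python
from typing import Callable

Entity = str
Property = Callable[[Entity], bool]          # nouns, intransitive verbs
TransVerb = Callable[[Entity], Property]     # entities to properties
Adjective = Callable[[Property], Property]   # properties to properties
Preposition = Callable[[Entity], Adjective]  # entities to (properties to properties)
Determiner = Callable[[Property], Callable[[Property], bool]]
# A determiner maps a property to a set of properties, represented here
# by the set's characteristic function.

DOMAIN = ["kim", "lee"]

def every(f: Property) -> Callable[[Property], bool]:
    # "every f" is the set of properties g that hold of all f-things.
    return lambda g: all(g(x) for x in DOMAIN if f(x))
```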

Some lexical generalizations

1. Some transitive verbs entail the existence of their direct object (see) and some do not (seek).

2. Across languages, verbs lexicalize manner or result, but not both (Rappaport Hovav & Levin 2010):
   • Manner: nibble, scribble, sweep, flutter
   • Result: clean, cover, empty, fill

3. Some adjectives predicate distributively across their arguments, others do not (Glass 2018):
   • Box A and Box B are new. (entails both are new)
   • Box A and Box B are heavy. (does not entail both are heavy)

Can we develop deep learning systems that derive such generalizations? No training against them; that's just restating them!

Compositional semantics

A semanticist's ideal

Every student attended a lecture

∀z ((student z) → (∃x (lecture x) ∧ (attended x z)))

[Derivation tree:]
every student:       λg ∀z ((student z) → (g z))
  every:             λf λg ∀z ((f z) → (g z))
  student:           student
attended a lecture:  λy (∃x (lecture x) ∧ (attended x y))
  attended:          attended
  a lecture:         λg (∃x (lecture x) ∧ (g x))
    a:               λf λg (∃x (f x) ∧ (g x))
    lecture:         lecture

But is this really so ideal?
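For concreteness, a minimal sketch (mine) that evaluates the slide's logical form in a toy model:

```python
# Toy model for: forall z ((student z) -> exists x ((lecture x) & (attended x z)))
students = {"kim", "lee"}
lectures = {"l1", "l2"}
attended = {("kim", "l1"), ("lee", "l2")}   # (student, lecture) pairs

truth = all(any((z, x) in attended for x in lectures) for z in students)
print(truth)  # True: every student attended some lecture in this model
```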

Complete semantic representations?

MacCartney & Manning (2009):
"The difficulty is plain: truly natural language is fiendishly complex. [...] Consider for a moment the difficulty of fully and accurately translating

1. Every firm polled saw costs grow more than expected, even after adjusting for inflation.

to a formal meaning representation."

Sparse, fragmented feature representations

[Parse tree: S → NP (The NYT) VP (reported S → NP (the deal) VP (fell through))]

Feature                  Value
the                      2
source_NYT               T
NYT                      1
embedded_implicit_neg    T
report                   1
deal_neg                 1
length                   7
vocab                    6
...                      ...
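A minimal sketch (mine) of how such a sparse, hand-built feature dictionary might be computed; the simpler features (counts, length, vocab) fall out directly, while features like embedded_implicit_neg would require a parser. Names are illustrative.

```python
def features(tokens):
    # Surface features only; deeper features (lemmas, negation scope)
    # would need lemmatization and parsing.
    feats = {"length": len(tokens),
             "vocab": len(set(t.lower() for t in tokens))}
    for t in tokens:
        feats[t.lower()] = feats.get(t.lower(), 0) + 1
    if "NYT" in tokens:
        feats["source_NYT"] = True
    return feats

print(features("The NYT reported the deal fell through".split()))
# {'length': 7, 'vocab': 6, 'the': 2, 'nyt': 1, ..., 'source_NYT': True}
```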

The answer from deep learning

[Tree-structured composition over "The NYT reported the deal fell through":]

S1  = f(NP1, VP1)
NP1 = f(The, NYT)
VP1 = f(reported, S2)
S2  = f(NP2, VP2)
NP2 = f(the, deal)
VP2 = f(fell, through)

The answer from deep learning

[Sequential composition over the same sentence:]

h1 = f(h0, The)
h2 = f(h1, NYT)
h3 = f(h2, reported)
h4 = f(h3, the)
h5 = f(h4, deal)
h6 = f(h5, fell)
h7 = f(h6, through)
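A minimal sketch (mine) of this recurrence, with the abstract f instantiated as f(h, x) = tanh([h; x] W + b), an assumption in the spirit of the tanh composition shown later in the talk; the random embeddings stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
words = "The NYT reported the deal fell through".split()
emb = {w: rng.normal(size=dim) for w in set(words)}  # stand-in embeddings
W = rng.normal(size=(2 * dim, dim)) * 0.1
b = np.zeros(dim)

h = np.zeros(dim)  # h0
for w in words:
    # h_t = f(h_{t-1}, x_t), here f(h, x) = tanh([h; x] W + b)
    h = np.tanh(np.concatenate([h, emb[w]]) @ W + b)
# h is now h7, a fixed-width representation of the whole sentence.
print(h.shape)  # (8,)
```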

The answer from deep learning

[Figure: the same sequence model drawn as a chain of states h0–h7 over "The NYT reported the deal fell through".]

The answer from deep learning

All our parses are wrong, but perhaps we can discover the right one(s).

[Figure: the chain of states h0–h7 over "The NYT reported the deal fell through", repeated.]

A new perspective on compositionality

Compositionality: The meaning of a complex phrase is a function of the meanings of its constituent phrases.

Partee (1984): Context-dependence, Ambiguity, and Challenges to Local, Deterministic Compositionality

S1  = tanh([NP1; VP1] W + b)
VP1 = tanh([reported; S2] W + b)
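A minimal sketch (mine) applying this composition rule bottom-up over the parse of "The NYT reported the deal fell through"; random vectors stand in for learned embeddings and parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim)
       for w in ["The", "NYT", "reported", "the", "deal", "fell", "through"]}
W = rng.normal(size=(2 * dim, dim)) * 0.1
b = np.zeros(dim)

def f(left, right):
    # parent = tanh([left; right] W + b), the rule on the slide
    return np.tanh(np.concatenate([left, right]) @ W + b)

NP1 = f(emb["The"], emb["NYT"])
NP2 = f(emb["the"], emb["deal"])
VP2 = f(emb["fell"], emb["through"])
S2  = f(NP2, VP2)
VP1 = f(emb["reported"], S2)
S1  = f(NP1, VP1)  # vector for the whole sentence
```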

Compositional generalizations: monotonicity

(Arrows point from the entailing sentence to the entailed one.)

    Kim smoked.                      Kim didn't smoke.
         ↑                                 ↓
    Kim smoked cigars.               Kim didn't smoke cigars.

              A student smoked.
              ↗               ↖
A Swedish student smoked.       A student smoked cigars.

              No student smoked.
              ↙               ↘
No Swedish student smoked.      No student smoked cigars.

              Every student smoked.
              ↙               ↖
Every Swedish student smoked.   Every student smoked cigars.

              Most students smoked.
              −               ↖
Most Swedish students smoked.   Most students smoked cigars.

(Bowman 2017)

Pragmatics

Natural language is situated and social

1. I am speaking.
2. We won. [A team I'm on; a team I support; ...]
3. I am here. [NAACL; New Orleans; planet earth; ...]
4. We are here. [pointing at a map]
5. I'm not here now. [answering machine]
6. We went to a local bar after the workshop.
7. three days ago, tomorrow, now, ...


Page 80: Linguists for Deep Learning; or: How I Learned to Stop ...cgpotts/talks/potts-starsem2018-slides.pdfLearning machine, waiting to be flattened. But what does this mean? • If deep

Signs of the apocalypse? Lexical semantics Compositional semantics Pragmatics The next frontier

Natural language is situated and social

8. Where are you from?
   a. Connecticut.
   b. Stanford.
   c. The U.S.
   d. Planet earth.

9. If kangaroos had no tails, they would fall over.
   a. True,
   b. as long as we don’t slip in the premise that they have jet packs.

10. I didn’t see any.
   a. Are there typos in my slides?
   b. Are there cookies in the cupboard?
   c. Are there bookstores downtown?


11. I think this is the way to the library. [politeness]

12. Would you mind if I stole your pen for a second? [politeness]

13. He’s not exactly a genius/idiot. [irony]

14. Great idea! [sarcasm(?)]

15. Any chance we can sort this out here, officer? [bribery(?)]

16. It’d be a shame if something happened to your dog. [threat(?)]

29 / 42


SHRDLU (Winograd 1972)
Fully grounded system that parses the user’s input, maps it to logical form, interprets that logical form in its world, tries to take appropriate action, and generates responses.

One project did succeed. Terry Winograd’s program SHRDLU could use English intelligently, but there was a catch: the only subject you could discuss was a micro-world of simulated blocks.

30 / 42


SHRDLU (Winograd 1972)

1. Person: WHAT DOES THE BOX CONTAIN?
2. Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
3. Person: WHAT IS THE PYRAMID SUPPORTED BY?
4. Computer: THE BOX.
5. Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
6. Computer: FOUR OF THEM.
7. Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
8. Computer: YES, THE RED CUBE.
9. Person: IS IT SUPPORTED?
10. Computer: YES, BY THE TABLE.

30 / 42


Example: sarcasm

[Figure: an RNN reads the words “Yeah great idea” through hidden states h0–h3; the final state feeds a binary “sarcastic?” classifier.]

31 / 42


Example: sarcasm

[Figure: the same RNN over “Yeah great idea”, now with a user-context vector combined with the final hidden state before the sarcasm classifier.]

SARC (Khodak et al. 2017); Hazarika et al. 2018

32 / 42
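A minimal PyTorch sketch of the architecture in the figures; the layer sizes and names here are assumptions of the sketch, not details from Hazarika et al. 2018:

import torch
import torch.nn as nn

class SarcasmClassifier(nn.Module):
    """An LSTM reads the word sequence; its final hidden state,
    concatenated with a user-context vector, feeds a binary
    sarcastic / not-sarcastic classifier."""
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50, user_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim + user_dim, 2)

    def forward(self, token_ids, user_vector):
        _, (h_final, _) = self.lstm(self.embed(token_ids))
        features = torch.cat([h_final[-1], user_vector], dim=-1)
        return self.classify(features)  # logits: [not sarcastic, sarcastic]

# Toy usage: "Yeah great idea" as token ids 1, 2, 3 for one user.
model = SarcasmClassifier(vocab_size=10)
logits = model(torch.tensor([[1, 2, 3]]), torch.randn(1, 50))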


Example: Colors in context

Context (three color swatches)      Utterance
[swatches]                          blue
[swatches]                          The darker blue one
[swatches]                          dull pink not the super bright one
[swatches]                          Purple
[swatches]                          blue

Table: Example from the Colors in Context corpus from the Stanford Computation & Cognition Lab.

33 / 42


Literal neural speaker S0

[Figure: the context colors c1, c2, cT pass through a fully connected layer to initialize an LSTM, which generates the utterance tokens x1, x2, …, ⟨/s⟩ via a softmax over the vocabulary.]

34 / 42
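As a rough PyTorch sketch of the figure (layer sizes are my assumptions, not the original model’s): the color context initializes the LSTM via a fully connected layer, and a softmax over the vocabulary scores each next token.

import torch
import torch.nn as nn

class LiteralSpeaker(nn.Module):
    """S0: embed the color context, map it through a fully connected
    layer to the LSTM's initial state, then predict each next token
    of the utterance with a softmax over the vocabulary."""
    def __init__(self, vocab_size, color_dim=3, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.context_to_h0 = nn.Linear(3 * color_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.vocab_logits = nn.Linear(hidden_dim, vocab_size)

    def forward(self, colors, token_ids):
        # colors: (batch, 3, color_dim); token_ids: (batch, seq_len),
        # beginning with <s> so each step predicts the next token.
        h0 = torch.tanh(self.context_to_h0(colors.flatten(1))).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        states, _ = self.lstm(self.embed(token_ids), (h0, c0))
        return self.vocab_logits(states)  # softmax-ready next-token scores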


Neural literal listener L0

[Figure: the utterance tokens x1, x2, x3 are embedded and encoded by an LSTM into a Gaussian (μ, Σ) over color space, which scores the context colors c1, c2, c3; a softmax picks the referent (here c3).]

35 / 42
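And a matching sketch of L0; the sizes and the diagonal covariance are again assumptions of the sketch. The LSTM encodes the utterance into a Gaussian over color space, and a softmax over the context colors picks the referent.

import torch
import torch.nn as nn

class LiteralListener(nn.Module):
    """L0: encode the utterance with an LSTM into (mu, Sigma) over
    color space, score each context color by its Gaussian fit, and
    softmax over the three candidates."""
    def __init__(self, vocab_size, color_dim=3, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, color_dim)
        self.to_log_sigma = nn.Linear(hidden_dim, color_dim)  # diagonal

    def forward(self, token_ids, colors):
        # token_ids: (batch, seq_len); colors: (batch, 3, color_dim)
        _, (h_final, _) = self.lstm(self.embed(token_ids))
        mu = self.to_mu(h_final[-1]).unsqueeze(1)
        sigma = self.to_log_sigma(h_final[-1]).exp().unsqueeze(1)
        # Unnormalized Gaussian log-density of each candidate color.
        scores = -0.5 * (((colors - mu) / sigma) ** 2).sum(-1)
        return scores.softmax(-1)  # distribution over the three colors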


Neural pragmatic agents

Neural pragmatic speaker (Andreas & Klein 2016)

$S_1(\mathit{msg} \mid c, C; \theta) = \dfrac{L_0(c \mid \mathit{msg}, C; \theta)}{\sum_{\mathit{msg}' \in X} L_0(c \mid \mathit{msg}', C; \theta)}$

where $X$ is a sample from $S_0(\mathit{msg} \mid c, C; \theta)$ such that $\mathit{msg}^* \in X$.

Neural pragmatic listener

$L_1(c \mid \mathit{msg}, C; \theta) \propto S_1(\mathit{msg} \mid c, C; \theta)$

36 / 42
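Concretely, the two formulas are just column and row normalizations of a literal-listener table. Here is a toy NumPy sketch; the L0 numbers are invented for illustration, and in the real models the rows would be the sample X drawn from S0:

import numpy as np

# Toy L0(c | msg, C): rows are candidate messages, columns are the
# three context colors.
l0 = np.array([
    [0.8, 0.1, 0.1],   # "blue"
    [0.4, 0.5, 0.1],   # "the darker blue one"
    [0.1, 0.1, 0.8],   # "purple"
])

def pragmatic_speaker(l0_probs):
    """S1(msg | c): renormalize each target color's column of L0
    over the sampled messages, per the formula above."""
    return l0_probs / l0_probs.sum(axis=0, keepdims=True)

def pragmatic_listener(s1_probs):
    """L1(c | msg) is proportional to S1: renormalize each message's row."""
    return s1_probs / s1_probs.sum(axis=1, keepdims=True)

s1 = pragmatic_speaker(l0)
l1 = pragmatic_listener(s1)
# For the middle color, S1 shifts probability from plain "blue" to
# "the darker blue one", since that message points to it more
# reliably than plain "blue" does.
print(s1[:, 1].round(2))   # [0.14 0.71 0.14]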


Example: Pragmatic image captioning

Mao et al. (2016); Vedantam et al. (2017): Captions that are true and distinguish their images from related ones.

Reasoning about all possible utterances/captions?

⇒ Sample from S0
⇒ Full pragmatic reasoning about characters

(Cohn-Gordon et al. 2018)

37 / 42
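A greedy sketch of the character-level idea, with interfaces that are my assumptions rather than the paper’s: at each step, S0’s next-character distributions for the target and distractor images are turned into an incremental listener, which rescores S0 before the next character is chosen.

import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz #")  # "#" ends the caption

def pragmatic_decode(s0_next_char, images, target, max_len=60):
    """Greedy character-level pragmatic decoding in the spirit of
    Cohn-Gordon et al. 2018. `s0_next_char(image, prefix)` is assumed
    to return S0's distribution over ALPHABET for the next character
    of a caption of `image` continuing `prefix`."""
    prefix = ""
    for _ in range(max_len):
        # S0's next-character distribution for each image in context.
        dists = np.stack([s0_next_char(img, prefix) for img in images])
        # Incremental listener: P(image | next char), uniform prior.
        l0 = dists / dists.sum(axis=0, keepdims=True)
        # Pragmatic speaker: S0 for the target, rescored by the listener.
        s1 = dists[target] * l0[target]
        char = ALPHABET[int(s1.argmax())]
        if char == "#":
            break
        prefix += char
    return prefix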


Some pragmatic generalizations

1. Scalar implicature: general terms tend to signal that their more specific alternatives are pragmatically marked.

2. I-implicature: if a general term has prototypical instantiations in context, then it might be refined to pick out just those prototypes.

3. Manner implicature: unusual events are described with unusual language; normal events with normal language.

4. Metaphor: metaphorical language is pervasive and enables the speaker to highlight specific dimensions of meaning efficiently.

5. Contextual refinement: word and phrase meanings are flexible and respond to the social context.

38 / 42


The next frontier

The Human Speechome Project (Roy et al. 2006)

And, not to play into stereotypes of linguists, but some symbolic reasoning would be useful!

Thanks!

39 / 42


References

References I

Andreas, Jacob & Dan Klein. 2016. Reasoning about pragmatics with neural listeners and speakers. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1173–1182. Association for Computational Linguistics. http://aclweb.org/anthology/D16-1125.

Bordes, Antoine, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston & Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.

Bowman, Samuel R. 2017. Modeling natural language semantics in learned representations. Stanford, CA: Stanford University dissertation.

Clark, Stephen, Bob Coecke & Mehrnoosh Sadrzadeh. 2011. Mathematical foundations for a compositional distributed model of meaning. Linguistic Analysis 36(1–4). 345–384.

Cohn-Gordon, Reuben, Noah D. Goodman & Christopher Potts. 2018. Pragmatically informative image captioning with character-level inference. In Human Language Technologies: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics.

Faruqui, Manaal, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy & Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1606–1615. Stroudsburg, PA: Association for Computational Linguistics. http://www.aclweb.org/anthology/N15-1184.

Frank, Michael C. & Noah D. Goodman. 2014. Inferring word meanings by assuming that speakers are informative. Cognitive Psychology 75(1). 80–96. doi:10.1016/j.cogpsych.2014.08.002.

Frank, Michael C., Joshua B. Tenenbaum & Anne Fernald. 2012. Social and discourse contributions to the determination of reference in cross-situational word learning. Language, Learning, and Development.

Glass, Lelia. 2018. Deriving the distributivity potential of adjectives via measurement theory. Proceedings of the Linguistic Society of America 3(49). 1–14. doi:10.3765/plsa.v3i1.4343.

Hamilton, William L., Rex Ying & Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 52–74. IEEE Press.

40 / 42


References II

Hazarika, Devamanyu, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann & Rada Mihalcea. 2018. CASCADE: Contextual sarcasm detection in online discussion forums. ArXiv:1805.06413.

Katz, Jerrold J. 1972. Semantic theory. New York: Harper & Row.

Khodak, Mikhail, Nikunj Saunshi & Kiran Vodrahalli. 2017. A large self-annotated corpus for sarcasm. ArXiv:1704.05579.

Lengerich, Benjamin J., Andrew L. Maas & Christopher Potts. 2018. Retrofitting distributional embeddings to knowledge graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). The COLING 2018 Organizing Committee. ArXiv:1708.00112.

Lin, Yankai, Zhiyuan Liu, Maosong Sun, Yang Liu & Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2181–2187.

MacCartney, Bill & Christopher D. Manning. 2009. An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics, 140–156. Tilburg, The Netherlands: Association for Computational Linguistics. http://www.aclweb.org/anthology/W09-3714.

Manning, Christopher D. 2015. Computational linguistics and deep learning. Computational Linguistics 41(4). doi:10.1162/COLI_a_00239.

Mao, Junhua, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille & Kevin Murphy. 2016. Generation and comprehension of unambiguous object descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 11–20. IEEE.

Mrksic, Nikola, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen & Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 142–148. Association for Computational Linguistics. doi:10.18653/v1/N16-1018. http://aclanthology.coli.uni-saarland.de/pdf/N/N16/N16-1018.pdf.

41 / 42


References III

Nickel, Maximilian, Volker Tresp & Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 809–816. ACM.

Partee, Barbara H. 1984. Compositionality. In Fred Landman & Frank Veltman (eds.), Varieties of Formal Semantics, 281–311. Dordrecht: Foris. Reprinted in Barbara H. Partee (2004) Compositionality in Formal Semantics, Oxford: Blackwell, 153–181. Page references to the reprinting.

Rappaport Hovav, Malka & Beth Levin. 2010. Reflections on manner/result complementarity. In Malka Rappaport Hovav, Edit Doron & Ivy Sichel (eds.), Syntax, Lexical Semantics, and Event Structure, 21–38. Oxford University Press. doi:10.1093/acprof:oso/9780199544325.003.0002.

Roy, Deb, Rupal Patel, Philip DeCamp, Rony Kubat, Michael Fleischman, Brandon Roy, Nikolaos Mavridis, Stefanie Tellex, Alexia Salata, Jethran Guinness et al. 2006. The Human Speechome Project. In Symbol Grounding and Beyond, 192–196. Springer.

Socher, Richard, Danqi Chen, Christopher D. Manning & Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, 926–934.

Sutskever, Ilya, Joshua B. Tenenbaum & Ruslan R. Salakhutdinov. 2009. Modelling relational data using Bayesian clustered tensor factorization. In Advances in Neural Information Processing Systems, 1821–1828.

Thomason, Richmond H. 1974. Introduction. In Formal Philosophy: Selected Papers of Richard Montague, 1–69. New Haven, CT: Yale University Press.

Vedantam, Ramakrishna, Samy Bengio, Kevin Murphy, Devi Parikh & Gal Chechik. 2017. Context-aware captions from context-agnostic supervision. ArXiv:1701.02870.

Wang, Zhen, Jianwen Zhang, Jianlin Feng & Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Winograd, Terry. 1972. Understanding natural language. Cognitive Psychology 3(1). 1–191.

42 / 42