![Page 1: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/1.jpg)
Reasoning about pragmatics with neural listeners and speakers
Jacob Andreas and Dan Klein
![Page 2: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/2.jpg)
The reference game
![Page 3: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/3.jpg)
The reference game
![Page 4: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/4.jpg)
The reference game
The one with the snake
![Page 5: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/5.jpg)
The reference game
Mike is holding a baseball bat
![Page 6: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/6.jpg)
The reference game
bat a is holding Mike baseball
![Page 7: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/7.jpg)
The reference game
They are sitting by a picnic table
![Page 8: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/8.jpg)
The reference game
There is a bat
![Page 9: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/9.jpg)
The reference game
There is a bat
![Page 10: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/10.jpg)
The reference game
Why do we care about this game?
Don’t you think it’s a little cold in here?
Do you know what time it is?
Some of the children played in the park.
![Page 11: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/11.jpg)
Deriving pragmatics from reasoning
Mike is holding a baseball bat
![Page 12: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/12.jpg)
Deriving pragmatics from reasoning
Jenny is running from the snake
![Page 13: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/13.jpg)
Deriving pragmatics from reasoning
Mike is holding a baseball bat
![Page 14: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/14.jpg)
How to win
DERIVED STRATEGY: Reason about listener beliefs
DIRECT STRATEGY: Imitate successful human play
There is a snake
There is a snake
There is a snake
?
![Page 15: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/15.jpg)
How to win
[Mao et al. 2015]
[Kazemzadeh et al. 2014]
[FitzGerald et al. 2013]
[Monroe and Potts 2015]
[Smith et al. 2013]
[Vogel et al. 2013]
[Golland et al. 2010]
DERIVED STRATEGY: Reason about listener beliefs
DIRECT STRATEGY: Imitate successful human play
![Page 16: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/16.jpg)
How to win
PRO: domain repr “for free”
CON: past work needs targeted data
PRO: pragmatics “for free”
CON: past work needs hand-engineering
DERIVED STRATEGY: Reason about listener beliefs
DIRECT STRATEGY: Imitate successful human play
![Page 17: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/17.jpg)
How to win
DERIVED STRATEGY: Reason about listener beliefs
DIRECT STRATEGY: Imitate successful human play
Learn base models for interpretation & generation without pragmatic context
Explicitly reason about base models to get novel behavior
![Page 18: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/18.jpg)
Data
Abstract Scenes Dataset
1000 scenes
10k sentences
Feature representations
![Page 19: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/19.jpg)
Approach
Literal speaker
Literal listener
Sampler
Reasoning speaker
![Page 20: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/20.jpg)
A literal speaker (S0)
Mike is holding a baseball bat
![Page 21: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/21.jpg)
A literal speaker (S0)
Referent encoder
Referent decoder
Mike is holding a baseball bat
![Page 22: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/22.jpg)
Module architectures
Referent encoder: ref features → FC → referent
Referent decoder: referent, word_n, word_<n → FC → ReLU → FC → Softmax → word_n+1
![Page 23: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/23.jpg)
Training S0
Mike is holding a baseball bat
![Page 24: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/24.jpg)
A literal speaker (S0)
S0
Mike is holding a baseball bat
The sun is in the sky
Jenny is standing next to Mike
![Page 25: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/25.jpg)
A literal listener (L0)
Mike is holding a baseball bat
![Page 26: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/26.jpg)
A literal listener (L0)
Descr. encoder
Referent encoder
Referent encoder
Scorer
Mike is holding a baseball bat
0.87
0.13
![Page 27: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/27.jpg)
Module architectures
Referent encoder
Referent decoder
sentence
Descr. encoder: n-gram features → FC → desc
Scorer: referent, desc → Sum → ReLU → FC → Softmax → choice
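The listener pipeline on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the layer sizes, the parameter names (`W_ref`, `W_desc`, `w_out`), and the random untrained weights are all assumptions, and the reading of the diagram as "sum the two encodings, apply ReLU, project to a score, softmax over candidates" is an interpretation of the slide labels.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def listener_choice(ref_feats, desc_feats, params):
    """Return a distribution over candidate referents for one description."""
    # Description encoder: one FC layer over n-gram features.
    e_desc = params["W_desc"] @ desc_feats
    scores = []
    for feats in ref_feats:
        # Referent encoder: one FC layer over referent features.
        e_ref = params["W_ref"] @ feats
        # Scorer: sum the encodings, apply ReLU, project to a scalar.
        hidden = np.maximum(0.0, e_ref + e_desc)
        scores.append(params["w_out"] @ hidden)
    # Softmax across candidates gives the listener's choice distribution.
    return softmax(np.array(scores))

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_ref, n_ngram, n_hidden = 8, 12, 16
params = {
    "W_ref": rng.normal(size=(n_hidden, n_ref)),
    "W_desc": rng.normal(size=(n_hidden, n_ngram)),
    "w_out": rng.normal(size=n_hidden),
}
refs = [rng.random(n_ref), rng.random(n_ref)]  # target and distractor
desc = rng.random(n_ngram)
probs = listener_choice(refs, desc, params)
```

With two candidates, `probs` plays the role of the two scores shown on the literal listener slide (such as 0.87 and 0.13), although untrained weights will not reproduce those values.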
![Page 28: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/28.jpg)
Training L0
Mike is holding a baseball bat
(random distractor)
0.87
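One way to build listener training examples with random distractors is sketched below. The data layout (a dict from scene id to its captions), the function name `make_listener_examples`, and the choice to shuffle candidate order are assumptions made for illustration; the slide only states that the distractor is random.

```python
import random

def make_listener_examples(captions_by_scene, seed=0):
    """Pair each (scene, caption) with a randomly chosen distractor scene."""
    rng = random.Random(seed)
    scene_ids = list(captions_by_scene)
    examples = []
    for target in scene_ids:
        # Any scene other than the target can serve as a distractor.
        others = [s for s in scene_ids if s != target]
        for caption in captions_by_scene[target]:
            distractor = rng.choice(others)
            candidates = [target, distractor]
            rng.shuffle(candidates)  # avoid a fixed target position
            examples.append({
                "description": caption,
                "candidates": candidates,
                "label": candidates.index(target),  # index of the true referent
            })
    return examples

# Toy data with hypothetical scene ids.
toy_captions = {
    "scene_1": ["mike is holding a baseball bat"],
    "scene_2": ["the sun is in the sky", "jenny is standing next to mike"],
    "scene_3": ["there is a snake"],
}
examples = make_listener_examples(toy_captions)
```

The listener is then trained to assign high probability to `label` given the description and both candidates.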
![Page 29: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/29.jpg)
A literal listener (L0)
L0
Mike is holding a baseball bat
![Page 30: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/30.jpg)
A reasoning speaker (S1)
Mike is holding a baseball bat
?
![Page 31: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/31.jpg)
A reasoning speaker (S1)
Literal speaker
The sun is in the sky
Jenny is standing next to Mike
Literal listener
0.9
0.5
0.7
Mike is holding a baseball bat
![Page 32: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/32.jpg)
A reasoning speaker (S1)
Literal speaker
The sun is in the sky
Jenny is standing next to Mike
Literal listener
0.9
0.5
0.7
Mike is holding a baseball bat
0.05
0.09
0.08
![Page 33: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/33.jpg)
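A reasoning speaker (S1)
Literal speaker
The sun is in the sky
Jenny is standing next to Mike
Literal listener
$0.9^{1-\lambda}$
$0.5^{1-\lambda}$
$0.7^{1-\lambda}$
Mike is holding a baseball bat
0.05
0.09
0.08
$\times\ 0.05^{\lambda}$
$\times\ 0.09^{\lambda}$
$\times\ 0.08^{\lambda}$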
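The rescoring step can be sketched as follows: each candidate caption gets the listener score raised to the power 1 − λ times the speaker score raised to the power λ, and the highest-scoring candidate wins. The pairing of captions with scores below is an assumption (the slide text does not preserve which number belongs to which caption), and the function names are hypothetical.

```python
def rescore(candidates, lam):
    """Weight listener and speaker scores as p_L0^(1 - lam) * p_S0^lam."""
    return [
        (caption, (p_listener ** (1.0 - lam)) * (p_speaker ** lam))
        for caption, p_listener, p_speaker in candidates
    ]

def reasoning_speaker(candidates, lam):
    """Return the caption with the highest combined score."""
    return max(rescore(candidates, lam), key=lambda pair: pair[1])[0]

# (caption, listener score, speaker score); the pairing is assumed.
candidates = [
    ("Mike is holding a baseball bat", 0.9, 0.05),
    ("The sun is in the sky", 0.5, 0.09),
    ("Jenny is standing next to Mike", 0.7, 0.08),
]

best_mixed = reasoning_speaker(candidates, lam=0.02)    # mostly listener
best_speaker = reasoning_speaker(candidates, lam=1.0)   # speaker only
best_listener = reasoning_speaker(candidates, lam=0.0)  # listener only
```

Under this assumed pairing, small λ selects the caption the listener resolves most reliably, while λ = 1 falls back to the caption the literal speaker finds most likely.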
![Page 34: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/34.jpg)
Experiments
![Page 35: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/35.jpg)
Baselines
• Literal: the S0 model by itself
• Contrastive: a conditional LM trained on both the target image and a random distractor [Mao et al. 2015]
![Page 36: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/36.jpg)
Results (test)
Literal: 64%
Contrastive: 69%
Reasoning: 81%
![Page 37: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/37.jpg)
Accuracy and fluency
Figure 5: Tradeoff between speaker and listener models, controlled by the parameter λ in Equation 8. With λ = 0, all weight is placed on the literal listener, and the model produces highly discriminative but somewhat disfluent captions. With λ = 1, all weight is placed on the literal speaker, and the model produces fluent but generic captions.
4.1 How good are the base models?
To measure the performance of the base models, we draw 10 samples $d_{jk}$ for a subset of 100 pairs $(r_{1,j}, r_{2,j})$ in the Dev-All set. We collect human fluency and accuracy judgments for each of the 1000 total samples. This allows us to conduct a post-hoc search over values of λ: for a range of λ, we compute the average accuracy and fluency of the highest scoring sample. By varying λ, we can view the tradeoff between accuracy and fluency that results from interpolating between the listener and speaker model: setting λ = 0 gives samples from $p_{L0}$, and λ = 1 gives samples from $p_{S0}$.
Figure 5 shows the resulting accuracy and fluency for various values of λ. It can be seen that relying entirely on the listener gives the highest accuracy but degraded fluency. However, by adding only a very small weight to the speaker model, it is possible to achieve near-perfect fluency without a substantial decrease in accuracy. Example sentences for an individual reference game are shown in Figure 5; increasing λ causes captions to become more generic. For the remaining experiments in this paper, we take λ = 0.02, finding that this gives excellent performance on both metrics.
On the development set, λ = 0.02 results in an average fluency of 4.8 (compared to 4.8 for the literal speaker λ = 1). This high fluency can be confirmed by inspection of model samples (Figure 4). We thus focus on accuracy for the remainder of the evaluation.
4.2 How many samples are needed?
Next we turn to the computational efficiency of the reasoning model. As in all sampling-based inference, the number of samples that must be drawn from the proposal is of critical interest: if too many samples are needed, the model will be too slow to use in practice. Having fixed λ = 0.02 in the preceding section, we measure accuracy for versions of the reasoning model that draw 1, 10, 100, and 1000 samples. Results are shown in Table 1. We find that gains continue up to 100 samples.
4.3 Is reasoning necessary?
Because they do not require complicated inference procedures, direct approaches to pragmatics typically enjoy better computational efficiency than derived ones. Having built an accurate derived speaker, can we bootstrap a more efficient direct speaker?
To explore this, we constructed a “compiled” speaker model as follows: Given reference candidates $r_1$ and $r_2$ and target $t$, this model produces embeddings $e_1$ and $e_2$, concatenates them together into a “contrast embedding” $[e_t, e_{-t}]$, and then feeds this whole embedding into a string decoder module. Like S0, this model generates captions without the need for discriminative rescoring; unlike S0, the contrast embedding means this model can in principle learn to produce pragmatic captions, if given access to pragmatic training data. Since no such training data exists, we train the compiled model on
(a) target (b) distractor

| λ | Caption |
| --- | --- |
| 0.0 (prefer L0) | a hamburger on the ground |
| 0.1 | mike is holding the burger |
| 0.2 (prefer S0) | the airplane is in the sky |

Figure 5: Captions for the same pair with varying λ. Changing λ alters both the naturalness and specificity of the output.
![Page 38: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/38.jpg)
How many samples?
[Plot: accuracy (axis from 50 to 100) against number of samples (1, 10, 100, 1000)]
![Page 39: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/39.jpg)
Examples
(a) the sun is in the sky [contrastive]
(d) the plane is flying in the sky [contrastive]
(c) the dog is standing beside jenny [contrastive]
(b) mike is wearing a chef’s hat [non-contrastive]
Figure 4: Four randomly-chosen samples from our model. For each, the target image is shown on the left, the distractor image is shown on the right, and description generated by the model is shown below. All descriptions are fluent, and generally succeed in uniquely identifying the target scene, even when they do not perfectly describe it (e.g. (c)). These samples are broadly representative of the model’s performance (Table 2).
| Model | Dev acc. (%), All | Dev acc. (%), Hard | Test acc. (%), All | Test acc. (%), Hard |
| --- | --- | --- | --- | --- |
| Literal (S0) | 66 | 54 | 64 | 53 |
| Contrastive | 71 | 54 | 69 | 58 |
| Reasoning (S1) | 83 | 73 | 81 | 68 |

Table 2: Success rates at RG on abstract scenes. “Literal” is a captioning baseline corresponding to the base speaker S0. “Contrastive” is a reimplementation of the approach of Mao et al. (2015). “Reasoning” is the model from this paper. All differences between our model and baselines are significant (p < 0.05, Binomial).
of the base model (it improves noticeably over S0 on scenes with 2–3 differences), the overall gain is negligible (the difference in mean scores is not significant). The compiled model significantly underperforms the reasoning model. These results suggest either that the reasoning procedure is not easily approximated by a shallow neural network, or that example descriptions of randomly-sampled training pairs (which are usually easy to discriminate) do not provide a strong enough signal for a reflex learner to recover pragmatic behavior.
4.4 Final evaluation
Based on the preceding sections, we keep λ = 0.02 and use 100 samples to generate predictions. We
| Model | 1 difference | 2 differences | 3 differences | 4 differences | Mean (%) |
| --- | --- | --- | --- | --- | --- |
| Literal (S0) | 50 | 66 | 70 | 78 | 66 |
| Reasoning | 64 | 86 | 88 | 94 | 83 |
| Compiled (S1) | 44 | 72 | 80 | 80 | 69 |

Table 3: Comparison of the “compiled” pragmatic speaker model with literal and explicitly reasoning speakers. The models are evaluated on subsets of the development set, arranged by difficulty: column headings indicate the number of differences between the target and distractor scenes.
evaluate on the test set, comparing this Reasoning model S1 to two baselines: Literal, an image captioning model trained normally on the abstract scene captions (corresponding to our S0), and Contrastive, a model trained with a soft contrastive objective, and previously used for visual referring expression generation (Mao et al., 2015).
Results are shown in Table 2. Our reasoning model outperforms both the literal baseline and previous work by a substantial margin, achieving an improvement of 17% on the all-pairs set and 15% on hard pairs.² Figures 4 and 6 show various representative
² For comparison, a model with hand-engineered pragmatic behavior (trained using a feature representation with indicators on only those objects that appear in the target image but not the distractor) produces an accuracy of 78% and 69% on all and
![Page 40: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/40.jpg)
Examples
![Page 41: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/41.jpg)
Examples
![Page 42: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/42.jpg)
Conclusions
• Standard neural kit of parts for base models
• Probabilistic reasoning for high-level goals
• A little bit of structure goes a long way!
![Page 43: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/43.jpg)
Thank you!
![Page 44: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/44.jpg)
“Compiling” the reasoning model
What if we train the contrastive model on the output of the reasoning model?
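The idea on this slide amounts to generating a training set from the reasoning speaker's own output. A rough sketch follows; the function name `build_compiled_training_set`, the stub speaker, and the choice to use each scene in a pair as the target once are all assumptions, not details given in the slides.

```python
def build_compiled_training_set(pairs, reasoning_speaker):
    """Label each (target, distractor) ordering with the reasoning speaker's caption."""
    examples = []
    for scene_a, scene_b in pairs:
        # Assumption: each scene in the pair takes a turn as the target.
        for target, distractor in ((scene_a, scene_b), (scene_b, scene_a)):
            caption = reasoning_speaker(target, distractor)
            examples.append((target, distractor, caption))
    return examples

def stub_reasoning_speaker(target, distractor):
    # Placeholder standing in for the slow sample-and-rescore speaker.
    return f"caption for {target} against {distractor}"

pairs = [("scene_1", "scene_2"), ("scene_3", "scene_4")]
compiled_data = build_compiled_training_set(pairs, stub_reasoning_speaker)
```

A direct model trained on `compiled_data` would see both scenes as input and the reasoning speaker's caption as the supervision target, so no sampling is needed at test time.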
![Page 45: Reasoning about pragma0cs with neural listeners and speakersjda/slides/ak_pragma.pdf · DIRECT STRATEGY: Imitate successful human play ... ing model S1 to two baselines: Literal,](https://reader034.vdocuments.us/reader034/viewer/2022050205/5f5833f123ef2e28a9527dca/html5/thumbnails/45.jpg)
Results (dev)
Literal: 66%
Compiled: 69%
Reasoning: 83%