translating neuralese - arxiv · translating neuralese jacob andreas anca dragan dan klein ......

14
Translating Neuralese Jacob Andreas Anca Dragan Dan Klein Computer Science Division University of California, Berkeley {jda,anca,klein}@cs.berkeley.edu Abstract Several approaches have recently been pro- posed for learning decentralized deep mul- tiagent policies that coordinate via a dif- ferentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communi- cation strategies has remained a challenge. Here we propose to interpret agents’ mes- sages by translating them. Unlike in typi- cal machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural lan- guage strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmat- ics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward rel- ative to players with a common language. 1 1 Introduction Several recent papers have described approaches for learning deep communicating policies (DCPs): decentralized representations of behavior that en- able multiple agents to communicate via a differ- entiable channel that can be formulated as a recur- rent neural network. DCPs have been shown to solve a variety of coordination problems, including reference games (Lazaridou et al., 2016b), logic puzzles (Foerster et al., 2016), and simple control (Sukhbaatar et al., 2016). Appealingly, the agents’ communication protocol can be learned via direct 1 We have released code and data at http://github. com/jacobandreas/neuralese. z (1) a z (2) a z (1) b z (2) b Figure 1: Example interaction between a pair of agents in a deep communicating policy. Both cars are attempting to cross the intersection, but cannot see each other. By exchanging message vectors z (t) , the agents are able to coordinate and avoid a collision. This paper presents an approach for under- standing the contents of these message vectors by translating them into natural language. backpropagation through the communication chan- nel, avoiding many of the challenging inference problems associated with learning in classical de- centralized decision processes (Roth et al., 2005). But analysis of the strategies induced by DCPs has remained a challenge. As an example, Figure 1 depicts a driving game in which two cars, which are unable to see each other, must both cross an intersection without colliding. In order to ensure success, it is clear that the cars must communi- cate with each other. But a number of successful communication strategies are possible—for exam- ple, they might report their exact (x, y) coordinates at every timestep, or they might simply announce whenever they are entering and leaving the inter- section. If these messages were communicated in natural language, it would be straightforward to determine which strategy was being employed. However, DCP agents instead communicate with an automatically induced protocol of unstructured, real-valued recurrent state vectors—an artificial language we might call “neuralese,” which superfi- cially bears little resemblance to natural language, and thus frustrates attempts at direct interpretation. arXiv:1704.06960v4 [cs.CL] 6 Jan 2018

Upload: nguyentruc

Post on 26-Jul-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Translating Neuralese

Jacob Andreas Anca Dragan Dan KleinComputer Science Division

University of California, Berkeley{jda,anca,klein}@cs.berkeley.edu

Abstract

Several approaches have recently been pro-posed for learning decentralized deep mul-tiagent policies that coordinate via a dif-ferentiable communication channel. Whilethese policies are effective for many tasks,interpretation of their induced communi-cation strategies has remained a challenge.Here we propose to interpret agents’ mes-sages by translating them. Unlike in typi-cal machine translation problems, we haveno parallel data to learn from. Instead wedevelop a translation model based on theinsight that agent messages and natural lan-guage strings mean the same thing if theyinduce the same belief about the world in alistener. We present theoretical guaranteesand empirical evidence that our approachpreserves both the semantics and pragmat-ics of messages by ensuring that playerscommunicating through a translation layerdo not suffer a substantial loss in reward rel-ative to players with a common language.1

1 Introduction

Several recent papers have described approachesfor learning deep communicating policies (DCPs):decentralized representations of behavior that en-able multiple agents to communicate via a differ-entiable channel that can be formulated as a recur-rent neural network. DCPs have been shown tosolve a variety of coordination problems, includingreference games (Lazaridou et al., 2016b), logicpuzzles (Foerster et al., 2016), and simple control(Sukhbaatar et al., 2016). Appealingly, the agents’communication protocol can be learned via direct

1 We have released code and data at http://github.com/jacobandreas/neuralese.

z(1)a z(2)

a

z(1)b z

(2)b

Figure 1: Example interaction between a pair of agents in adeep communicating policy. Both cars are attempting to crossthe intersection, but cannot see each other. By exchangingmessage vectors z(t), the agents are able to coordinate andavoid a collision. This paper presents an approach for under-standing the contents of these message vectors by translatingthem into natural language.

backpropagation through the communication chan-nel, avoiding many of the challenging inferenceproblems associated with learning in classical de-centralized decision processes (Roth et al., 2005).

But analysis of the strategies induced by DCPshas remained a challenge. As an example, Figure 1depicts a driving game in which two cars, whichare unable to see each other, must both cross anintersection without colliding. In order to ensuresuccess, it is clear that the cars must communi-cate with each other. But a number of successfulcommunication strategies are possible—for exam-ple, they might report their exact (x, y) coordinatesat every timestep, or they might simply announcewhenever they are entering and leaving the inter-section. If these messages were communicatedin natural language, it would be straightforwardto determine which strategy was being employed.However, DCP agents instead communicate withan automatically induced protocol of unstructured,real-valued recurrent state vectors—an artificiallanguage we might call “neuralese,” which superfi-cially bears little resemblance to natural language,and thus frustrates attempts at direct interpretation.

arX

iv:1

704.

0696

0v4

[cs

.CL

] 6

Jan

201

8

We propose to understand neuralese messagesby translating them. In this work, we present a sim-ple technique for inducing a dictionary that mapsbetween neuralese message vectors and short natu-ral language strings, given only examples of DCPagents interacting with other agents, and humansinteracting with other humans. Natural languagealready provides a rich set of tools for describingbeliefs, observations, and plans—our thesis is thatthese tools provide a useful complement to the visu-alization and ablation techniques used in previouswork on understanding complex models (Strobeltet al., 2016; Ribeiro et al., 2016).

While structurally quite similar to the task ofmachine translation between pairs of human lan-guages, interpretation of neuralese poses a numberof novel challenges. First, there is no natural sourceof parallel data: there are no bilingual “speakers”of both neuralese and natural language. Second,there may not be a direct correspondence betweenthe strategy employed by humans and DCP agents:even if it were constrained to communicate usingnatural language, an automated agent might chooseto produce a different message from humans in agiven state. We tackle both of these challenges byappealing to the grounding of messages in game-play. Our approach is based on one of the coreinsights in natural language semantics: messages(whether in neuralese or natural language) havesimilar meanings when they induce similar beliefsabout the state of the world.

Based on this intuition, we introduce a transla-tion criterion that matches neuralese messages withnatural language strings by minimizing statisticaldistance in a common representation space of dis-tributions over speaker states. We explore severalrelated questions:

• What makes a good translation, and underwhat conditions is translation possible at all?(Section 4)

• How can we build a model to translatebetween neuralese and natural language?(Section 5)

• What kinds of theoretical guarantees canwe provide about the behavior of agentscommunicating via this translation model?(Section 6)

Our translation model and analysis are general,and in fact apply equally to human–computer and

large bird black wings

black crown

agent translator

agent translator

small brown light brown dark brown

Figure 2: Overview of our approach—best-scoring transla-tions generated for a reference game involving images of birds.The speaking agent’s goal is to send a message that uniquelyidentifies the bird on the left. From these translations it can beseen that the learned model appears to discriminate based oncoarse attributes like size and color.

human–human translation problems grounded ingameplay. In this paper, we focus our experimentsspecifically on the problem of interpreting commu-nication in deep policies, and apply our approachto the driving game in Figure 1 and two referencegames of the kind shown in Figure 2. We find thatthis approach outperforms a more conventional ma-chine translation criterion both when attemptingto interoperate with neuralese speakers and whenpredicting their state.

2 Related work

A variety of approaches for learning deep policieswith communication were proposed essentially si-multaneously in the past year. We have broadlylabeled these as “deep communicating policies”;concrete examples include Lazaridou et al. (2016b),Foerster et al. (2016), and Sukhbaatar et al. (2016).The policy representation we employ in this paperis similar to the latter two of these, although thegeneral framework is agnostic to low-level model-ing details and could be straightforwardly appliedto other architectures. Analysis of communicationstrategies in all these papers has been largely ad-hoc, obtained by clustering states from which simi-lar messages are emitted and attempting to manu-ally assign semantics to these clusters. The presentwork aims at developing tools for performing thisanalysis automatically.

Most closely related to our approach is that ofLazaridou et al. (2016a), who also develop a modelfor assigning natural language interpretations tolearned messages; however, this approach relieson supervised cluster labels and is targeted specif-ically towards referring expression games. Herewe attempt to develop an approach that can handlegeneral multiagent interactions without assuming aprior discrete structure in space of observations.

The literature on learning decentralized multi-agent policies in general is considerably larger(Bernstein et al., 2002; Dibangoye et al., 2016).This includes work focused on communication inmultiagent settings (Roth et al., 2005) and evencommunication using natural language messages(Vogel et al., 2013b). All of these approaches em-ploy structured communication schemes with man-ually engineered messaging protocols; these are, insome sense, automatically interpretable, but at thecost of introducing considerable complexity intoboth training and inference.

Our evaluation in this paper investigates com-munication strategies that arise in a number of dif-ferent games, including reference games and anextended-horizon driving game. Communicationstrategies for reference games were previously ex-plored by Vogel et al. (2013a), Andreas and Klein(2016) and Kazemzadeh et al. (2014), and refer-ence games specifically featuring end-to-end com-munication protocols by Yu et al. (2016). On thecontrol side, a long line of work considers nonver-bal communication strategies in multiagent policies(Dragan and Srinivasa, 2013).

Another group of related approaches focuses onthe development of more general machinery forinterpreting deep models in which messages haveno explicit semantics. This includes both visualiza-tion techniques (Zeiler and Fergus, 2014; Strobeltet al., 2016), and approaches focused on generat-ing explanations in the form of natural language(Hendricks et al., 2016; Vedantam et al., 2017).

3 Problem formulation

Games Consider a cooperative game with twoplayers a and b of the form given in Figure 3. Atevery step t of this game, player a makes an ob-servation x(t)a and receives a message z(t−1)b fromb. It then takes an action u(t)a and sends a messagez(t)a to b. (The process is symmetric for b.) The

distributions p(ua|xa, zb) and p(za|xa) togetherdefine a policy π which we assume is shared byboth players, i.e. p(ua|xa, zb) = p(ub|xb, za) andp(za|xa) = p(zb|xb). As in a standard Markovdecision process, the actions (u

(t)a , u

(t)b ) alter the

world state, generating new observations for bothplayers and a reward shared by both.

The distributions p(z|x) and p(u|x, z) may alsobe viewed as defining a language: they specify howa speaker will generate messages based on worldstates, and how a listener will respond to these mes-

a

b

x(1)a

x(1)b x

(2)b

u(1)a u(2)

a

u(2)bu

(1)b

z(1)a z(2)

a

z(1)b

z(2)b

a

b

x(2)a

0.3: stop 0.5: forward 0.1: left 0.1: right

observations actions messages

Figure 3: Schematic representation of communication games.At every timestep t, players a and b make an observation x(t)

and receive a message z(t−1), then produce an action u(t) anda new message z(t).

sages. Our goal in this work is to learn to translatebetween pairs of languages generated by differentpolicies. Specifically, we assume that we have ac-cess to two policies for the same game: a “robotpolicy” πr and a “human policy” πh. We wouldlike to use the representation of πh, the behavior ofwhich is transparent to human users, in order to un-derstand the behavior of πr (which is in general anuninterpretable learned model); we will do this byinducing bilingual dictionaries that map messagevectors zr of πr to natural language strings zh ofπh and vice-versa.

Learned agents πr Our goal is to present toolsfor interpretation of learned messages that are ag-nostic to the details of the underlying algorithm foracquiring them. We use a generic DCP model asa basis for the techniques developed in this paper.Here each agent policy is represented as a deeprecurrent Q network (Hausknecht and Stone, 2015).This network is built from communicating cells ofthe kind depicted in Figure 4. At every timestep,this agent receives three pieces of information: an

x(t)a

z(t�1)b

h(t�1)a h(t)

a

u(t)a

z(t)aMLP

GRU

Figure 4: Cell implementing a single step of agent commu-nication (compare with Sukhbaatar et al. (2016) and Foersteret al. (2016)). MLP denotes a multilayer perceptron; GRUdenotes a gated recurrent unit (Cho et al., 2014). Dashed linesrepresent recurrent connections.

observation of the current state of the world, theagent’s memory vector from the previous timestep,and a message from the other player. It then pro-duces three outputs: a predicted Q value for everypossible action, a new memory vector for the nexttimestep, and a message to send to the other agent.

Sukhbaatar et al. (2016) observe that models ofthis form may be viewed as specifying a singleRNN in which weight matrices have a particularblock structure. Such models may thus be trainedusing the standard recurrent Q-learning objective,with communication protocol learned end-to-end.

Human agents πh The translation model we de-velop requires a representation of the distributionover messages p(za|xa) employed by human speak-ers (without assuming that humans and agents pro-duce equivalent messages in equivalent contexts).We model the human message generation processas categorical, and fit a simple multilayer percep-tron model to map from observations to words andphrases used during human gameplay.

4 What’s in a translation?

What does it mean for a message zh to be a “trans-lation” of a message zr? In standard machine trans-lation problems, the answer is that zh is likely toco-occur in parallel data with zr; that is, p(zh|zr)is large. Here we have no parallel data: even ifwe could observe natural language and neuralesemessages produced by agents in the same state, wewould have no guarantee that these messages ac-tually served the same function. Our answer mustinstead appeal to the fact that both natural languageand neuralese messages are grounded in a commonenvironment. For a given neuralese message zr,we will first compute a grounded representationof that message’s meaning; to translate, we find anatural-language message whose meaning is mostsimilar. The key question is then what form thisgrounded meaning representation should take. Theexisting literature suggests two broad approaches:

Semantic representation The meaning of a mes-sage za is given by its denotations: that is, by theset of world states of which za may be felicitouslypredicated, given the existing context available toa listener. In probabilistic terms, this says that themeaning of a message za is represented by the dis-tribution p(xa|za, xb) it induces over speaker states.Examples of this approach include Guerin and Pitt(2001) and Pasupat and Liang (2016).

Pragmatic representation The meaning of amessage za is given by the behavior it induces ina listener. In probabilistic terms, this says that themeaning of a message za is represented by the dis-tribution p(ub|za, xb) it induces over actions giventhe listener’s observation xb. Examples of this ap-proach include Vogel et al. (2013a) and Gauthierand Mordatch (2016).

These two approaches can give rise to rather dif-ferent behaviors. Consider the following example:

square hexagon circle

few many many

The top language (in blue) has a unique name forevery kind of shape, while the bottom language (inred) only distinguishes between shapes with fewsides and shapes with many sides. Now imaginea simple reference game with the following form:player a is covertly assigned one of these threeshapes as a reference target, and communicatesthat reference to b; b must then pull a lever labeledlarge or small depending on the size of thetarget shape. Blue language speakers can achieveperfect success at this game, while red languagespeakers can succeed at best two out of three times.

How should we translate the blue word hexagoninto the red language? The semantic approach sug-gests that we should translate hexagon as many:while many does not uniquely identify the hexagon,it produces a distribution over shapes that is clos-est to the truth. The pragmatic approach insteadsuggests that we should translate hexagon as few,as this is the only message that guarantees that thelistener will pull the correct lever large. So inorder to produce a correct listener action, the trans-lator might have to “lie” and produce a maximallyinaccurate listener belief.

If we were exclusively concerned with buildinga translation layer that allowed humans and DCPagents to interoperate as effectively as possible, itwould be natural to adopt a pragmatic representa-tion strategy. But our goals here are broader: wealso want to facilitate understanding, and specif-ically to help users of learned systems form truebeliefs about the systems’ computational processesand representational abstractions. The exampleabove demonstrates that “pragmatically” optimiz-ing directly for task performance can sometimeslead to translations that produce inaccurate beliefs.

We instead build our approach around seman-tic representations of meaning. By preserving se-mantics, we allow listeners to reason accuratelyabout the content and interpretation of messages.We might worry that by adopting a semantics-firstview, we have given up all guarantees of effectiveinteroperation between humans and agents usinga translation layer. Fortunately, this is not so: aswe will see in Section 6, it is possible to show thatplayers communicating via a semantic translatorperform only boundedly worse (and sometimes bet-ter!) than pairs of players with a common language.

5 Translation models

In this section, we build on the intuition that mes-sages should be translated via their semantics todefine a concrete translation model—a procedurefor constructing a natural language ↔ neuralesedictionary given agent and human interactions.

We understand the meaning of a message za tobe represented by the distribution p(xa|za, xb) itinduces over speaker states given listener context.We can formalize this by defining the beliefdistribution β for a message z and context xb as:

β(za, xb) = p(xa|za, xb) =p(za|xa)p(xa, xa)∑x′ap(za|x′a)p(x′a, xb)

.

Here we have modeled the listener as performinga single step of Bayesian inference, using the lis-tener state and the message generation model (byassumption shared between players) to computethe posterior over speaker states. While in gen-eral neither humans nor DCP agents compute ex-plicit representations of this posterior, past workhas found that both humans and suitably-trainedneural networks can be modeled as Bayesian rea-soners (Frank et al., 2009; Paige and Wood, 2016).

This provides a context-specific representationof belief, but for messages z and z′ to have the samesemantics, they must induce the same belief overall contexts in which they occur. In our probabilis-tic formulation, this introduces an outer expectationover contexts, providing a final measure q of thequality of a translation from z to z′:

q(z, z′) = E[DKL(β(z,Xb) || β(z′, Xb)) | z, z′

]=∑xa,xb

p(xa, xb|z, z′)DKL(β(z, xb) || β(z′, xb))

∝∑xa,xb

p(xa, xb) · p(z|xa) · p(z′|xa)

· DKL(β(z, xb) || β(z′, xb)); (1)

Algorithm 1 Translating messages

given: a phrase inventory Lfunction TRANSLATE(z)

return argminz′∈L q(z, z′)

function q(z, z′)// sample contexts and distractorsxai, xbi ∼ p(Xa, Xb) for i = 1..nx′ai ∼ p(Xa|xbi)// compute context weightswi ← p(z|xai) · p(z′|xai)wi ← wi/

∑j wj

// compute divergenceski ←

∑x∈{xa,x′a} p(z|x) log

p(z|x)p(z′|x)

p(z′)p(z)

return∑

iwiki

recalling that in this setting

DKL(β || β′) =∑xa

p(xa|z, xb) logp(xa|z, xb)p(xa|z′, xb)

∝∑xa

p(xa, xb)p(z|xa) logp(z|xa)p(z′|xa)

p(z′)

p(z)(2)

which is zero when the messages z and z′ give riseto identical belief distributions and increases asthey grow more dissimilar. To translate, we wouldlike to compute tr(zr) = argminzh q(zr, zh) andtr(zh) = argminzr q(zh, zr). Intuitively, Equa-tion 1 says that we will measure the quality of aproposed translation z 7→ z′ by asking the follow-ing question: in contexts where z is likely to beused, how frequently does z′ induce the same beliefabout speaker states as z?

While this translation criterion directly encodesthe semantic notion of meaning described in Sec-tion 4, it is doubly intractable: the KL divergenceand outer expectation involve a sum over all obser-vations xa and xb respectively; these sums are notin general possible to compute efficiently. To avoidthis, we approximate Equation 1 by sampling. Wedraw a collection of samples (xa, xb) from the priorover world states, and then generate for each sam-ple a sequence of distractors (x′a, xb) from p(x′a|xb)(we assume access to both of these distributionsfrom the problem representation). The KL termin Equation 1 is computed over each true sampleand its distractors, which are then normalized andaveraged to compute the final score.

Sampling accounts for the outer p(xa, xb) inEquation 1 and the inner p(xa|xb) in Equation 2.

a

b

xa z

xb

uFigure 5: Simplified game representation used for analysis inSection 6. A speaker agent sends a message to a listener agent,which takes a single action and receives a reward.

The only quantities remaining are of the formp(z|xa) and p(z). In the case of neuralese, theseare determined by the agent policy πr. For naturallanguage, we use transcripts of human interactionsto fit a model that maps from world states to a dis-tribution over frequent utterances as discussed inSection 3. Details of these model implementationsare provided in Appendix B, and the full translationprocedure is given in Algorithm 1.

6 Belief and behavior

The translation criterion in the previous sectionmakes no reference to listener actions at all. Theshapes example in Section 4 shows that somemodel performance might be lost under translation.It is thus reasonable to ask whether this transla-tion model of Section 5 can make any guaranteesabout the effect of translation on behavior. In thissection we explore the relationship between belief-preserving translations and the behaviors they pro-duce, by examining the effect of belief accuracyand strategy mismatch on the reward obtained bycooperating agents.

To facilitate this analysis, we consider a sim-plified family of communication games with thestructure depicted in Figure 5. These games can beviewed as a subset of the family depicted in Fig-ure 3; and consist of two steps: a listener makesan observation xa and sends a single message zto a speaker, which makes its own observation xb,takes a single action u, and receives a reward. Weemphasize that the results in this section concernthe theoretical properties of idealized games, andare presented to provide intuition about high-levelproperties of our approach. Section 8 investigatesempirical behavior of this approach on real-worldtasks where these ideal conditions do not hold.

Our first result is that translations that minimizesemantic dissimilarity q cause the listener to takenear-optimal actions:2

2Proof is provided in Appendix A.

Proposition 1.Semantic translations reward rational listeners.Define a rational listener as one that chooses thebest action in expectation over the speaker’s state:

U(z, xb) = argmaxu

∑xa

p(xa|xb, z)r(xa, xb, u)

for a reward function r ∈ [0, 1] that depends onlyon the two observations and the action.3 Now let abe a speaker of a language r, b be a listener of thesame language r, and b′ be a listener of a differentlanguage h. Suppose that we wish for a and b′ tointeract via the translator tr : zr 7→ zh (so thata produces a message zr, and b′ takes an actionU(zh = tr(zr), xb′)). If tr respects the semanticsof zr, then the bilingual pair a and b′ achieves onlyboundedly worse reward than the monolingual paira and b. Specifically, if q(zr, zh) ≤ D, then

Er(Xa, Xb, U(tr(Z))

≥ Er(Xa, Xb, U(Z))−√2D (3)

So as discussed in Section 4, even by committingto a semantic approach to meaning representation,we have still succeeded in (approximately) captur-ing the nice properties of the pragmatic approach.

Section 4 examined the consequences of a mis-match between the set of primitives available intwo languages. In general we would like somemeasure of our approach’s robustness to the lack ofan exact correspondence between two languages.In the case of humans in particular we expect thata variety of different strategies will be employed,many of which will not correspond to the behaviorof the learned agent. It is natural to want some as-surance that we can identify the DCP’s strategy aslong as some human strategy mirrors it. Our secondobservation is that it is possible to exactly recovera translation of a DCP strategy from a mixture ofhumans playing different strategies:

Proposition 2.Semantic translations find hidden correspondences.Consider a fixed robot policy πr and a set ofhuman policies {πh1 , πh2 , . . . } (recalling fromSection 3 that each π is defined by distributions

3This notion of rationality is a fairly weak one: it permitsmany suboptimal communication strategies, and requires onlythat the listener do as well as possible given a fixed speaker—a first-order optimality criterion likely to be satisfied by anyrichly-parameterized model trained via gradient descent.

p(z |xa) and p(u |z , xb)). Suppose further thatthe messages employed by these human strate-gies are disjoint; that is, if phi(z |xa) > 0, thenphj (z |xa) = 0 for all j 6= i. Now suppose thatall q(zr , zh) = 0 for all messages in the supportof some phi(z |xa) and > 0 for all j 6= i. Thenevery message zr is translated into a message pro-duced by πhi , and messages from other strategiesare ignored.

This observation follows immediately from thedefinition of q(zr, zh), but demonstrates one ofthe key distinctions between our approach and aconventional machine translation criterion. Maxi-mizing p(zh|zr) will produce the natural languagemessage most often produced in contexts wherezr is observed, regardless of whether that messageis useful or informative. By contrast, minimizingq(zh, zr) will find the zh that corresponds mostclosely to zr even when zh is rarely used.

The disjointness condition, while seeminglyquite strong, in fact arises naturally in manycircumstances—for example, players in the drivinggame reporting their spatial locations in absolutevs. relative coordinates, or speakers in a color refer-ence game (Figure 6) discriminating based on light-ness vs. hue. It is also possible to relax the abovecondition to require that strategies be only locallydisjoint (i.e. with the disjointness condition holdingfor each fixed xa), in which case overlapping hu-man strategies are allowed, and the recovered robotstrategy is a context-weighted mixture of these.

7 Evaluation

7.1 TasksIn the remainder of the paper, we evaluate the em-pirical behavior of our approach to translation. Ourevaluation considers two kinds of tasks: referencegames and navigation games. In a reference game(e.g. Figure 6a), both players observe a pair of can-didate referents. A speaker is assigned a target ref-erent; it must communicate this target to a listener,who then performs a choice action correspondingto its belief about the true target. In this paper weconsider two variants on the reference game: a sim-ple color-naming task, and a more complex taskinvolving natural images of birds. For examplesof human communication strategies for these tasks,we obtain the XKCD color dataset (McMahan andStone, 2015; Monroe et al., 2016) and the Caltech–UCSD Birds dataset (Welinder et al., 2010) with

(a) (b)

(c)

Figure 6: Tasks used to evaluate the translation model. (a–b)Reference games: both players observe a pair of referencecandidates (colors or images); Player a is assigned a target(marked with a star), which player b must guess based ona message from a. (c) Driving game: each car attempts tonavigate to its goal (marked with a star). The cars cannot seeeach other, and must communicate to avoid a collision.

accompanying natural language descriptions (Reedet al., 2016). We use standard train / validation /test splits for both of these datasets.

The final task we consider is the driving task(Figure 6c) first discussed in the introduction. Inthis task, two cars, invisible to each other, musteach navigate between randomly assigned start andgoal positions without colliding. This task takesa number of steps to complete, and potentially in-volves a much broader range of communicationstrategies. To obtain human annotations for thistask, we recorded both actions and messages gener-ated by pairs of human Amazon Mechanical Turkworkers playing the driving game with each other.We collected close to 400 games, with a total ofmore than 2000 messages exchanged, from whichwe held out 100 game traces as a test set.

7.2 Metrics

A mechanism for understanding the behavior ofa learned model should allow a human user bothto correctly infer its beliefs and to successfullyinteroperate with it; we accordingly report resultsof both “belief” and “behavior” evaluations.

To support easy reproduction and comparison(and in keeping with standard practice in machine

translation), we focus on developing automaticmeasures of system performance. We use the avail-able training data to develop simulated models ofhuman decisions; by first showing that these mod-els track well with human judgments, we can beconfident that their use in evaluations will corre-late with human understanding. We employ thefollowing two metrics:

Belief evaluation This evaluation focuses on thedenotational perspective in semantics that moti-vated the initial development of our model. Wehave successfully understood the semantics of amessage zr if, after translating zr 7→ zh, a humanlistener can form a correct belief about the statein which zr was produced. We construct a simplestate-guessing game where the listener is presentedwith a translated message and two state observa-tions, and must guess which state the speaker wasin when the message was emitted.

When translating from natural language to neu-ralese, we use the learned agent model to directlyguess the hidden state. For neuralese to naturallanguage we must first construct a “model humanlistener” to map from strings back to state repre-sentations; we do this by using the training data tofit a simple regression model that scores (state, sen-tence) pairs using a bag-of-words sentence repre-sentation. We find that our “model human” matchesthe judgments of real humans 83% of the time onthe colors task, 77% of the time on the birds task,and 77% of the time on the driving task. This givesus confidence that the model human gives a reason-ably accurate proxy for human interpretation.

Behavior evaluation This evaluation focuses onthe cooperative aspects of interpretability: we mea-sure the extent to which learned models are able tointeroperate with each other by way of a translationlayer. In the case of reference games, the goal ofthis semantic evaluation is identical to the goal ofthe game itself (to identify the hidden state of thespeaker), so we perform this additional pragmaticevaluation only for the driving game. We foundthat the most reliable way to make use of humangame traces was to construct a speaker-only modelhuman. The evaluation selects a full game tracefrom a human player, and replays both the human’sactions and messages exactly (disregarding any in-coming messages); the evaluation measures thequality of the natural-language-to-neuralese transla-tor, and the extent to which the learned agent model

(a)

as speakerR H

aslis

tene

r R 1.000.50 random0.70 direct0.73 belief (ours)

H*0.50

0.830.720.86

(b)

as speakerR H

aslis

tene

r R 0.950.50 random0.55 direct0.60 belief (ours)

H*0.50

0.770.570.75

Table 1: Evaluation results for reference games. (a) The colorstask. (b) The birds task. Whether the model human is in alistener or speaker role, translation based on belief matchingoutperforms both random and machine translation baselines.

can accommodate a (real) human given translationsof the human’s messages.

Baselines We compare our approach to two base-lines: a random baseline that chooses a translationof each input uniformly from messages observedduring training, and a direct baseline that directlymaximizes p(z′|z) (by analogy to a conventionalmachine translation system). This is accomplishedby sampling from a DCP speaker in training stateslabeled with natural language strings.

8 Results

In all below, “R” indicates a DCP agent, “H” in-dicates a real human, and “H*” indicates a modelhuman player.

Reference games Results for the two referencegames are shown in Table 1. The end-to-end trainedmodel achieves nearly perfect accuracy in both

magenta, hot, rose, violet, purple

magenta, hot, violet, rose, purple

olive, puke, pea, grey, brown

pinkish, grey, dull, pale, light

Figure 7: Best-scoring translations generated for color task.

as speakerR H

aslis

tene

r R 0.850.50 random0.45 direct0.61 belief (ours)

H*0.5

0.770.450.57

Table 2: Belief evaluation results for the driving game. Drivingstates are challenging to identify based on messages alone (asevidenced by the comparatively low scores obtained by single-language pairs) . Translation based on belief achieves the bestoverall performance in both directions.

R / R H / H R / H

1.93 / 0.71 — / 0.771.35 / 0.64 random1.49 / 0.67 direct1.54 / 0.67 belief (ours)

Table 3: Behavior evaluation results for the driving game.Scores are presented in the form “reward / completion rate”.While less accurate than either humans or DCPs with a sharedlanguage, the models that employ a translation layer obtainhigher reward and a greater overall success rate than baselines.

cases, while a model trained to communicate innatural language achieves somewhat lower perfor-mance. Regardless of whether the speaker is aDCP and the listener a model human or vice-versa,translation based on the belief-matching criterionin Section 5 achieves the best performance; indeed,when translating neuralese color names to naturallanguage, the listener is able to achieve a slightlyhigher score than it is natively. This suggests thatthe automated agent has discovered a more effec-tive strategy than the one demonstrated by humansin the dataset, and that the effectiveness of thisstrategy is preserved by translation. Example trans-lations from the reference games are depicted inFigure 2 and Figure 7.

Driving game Behavior evaluation of the drivinggame is shown in Table 3, and belief evaluation isshown in Table 2. Translation of messages in thedriving game is considerably more challenging thanin the reference games, and scores are uniformlylower; however, a clear benefit from the belief-matching model is still visible. Belief matchingleads to higher scores on the belief evaluation inboth directions, and allows agents to obtain a higherreward on average (though task completion ratesremain roughly the same across all agents). Someexample translations of driving game messages areshown in Figure 8.

at goal

done

left to top

going in intersection

proceed

going

you first

following

going down

Figure 8: Best-scoring translations generated for driving taskgenerated from the given speaker state.

9 Conclusion

We have investigated the problem of interpretingmessage vectors from deep networks by translat-ing them. After introducing a translation criterionbased on matching listener beliefs about speakerstates, we presented both theoretical and empiricalevidence that this criterion outperforms a conven-tional machine translation approach at recoveringthe content of message vectors and facilitating col-laboration between humans and learned agents.

While our evaluation has focused on under-standing the behavior of deep communicating poli-cies, the framework proposed in this paper couldbe much more generally applied. Any encoder–decoder model (Sutskever et al., 2014) can bethought of as a kind of communication game playedbetween the encoder and the decoder, so we cananalogously imagine computing and translating“beliefs” induced by the encoding to explain whatfeatures of the input are being transmitted. The cur-rent work has focused on learning a purely categor-ical model of the translation process, supported byan unstructured inventory of translation candidates,and future work could explore the compositionalstructure of messages, and attempt to synthesizenovel natural language or neuralese messages fromscratch. More broadly, the work here shows thatthe denotational perspective from formal seman-tics provides a framework for precisely framing thedemands of interpretable machine learning (Wil-son et al., 2016), and particularly for ensuring thathuman users without prior exposure to a learnedmodel are able to interoperate with it, predict itsbehavior, and diagnose its errors.

Acknowledgments

JA is supported by a Facebook Graduate Fellow-ship and a Berkeley AI / Huawei Fellowship. Weare grateful to Lisa Anne Hendricks for assistancewith the Caltech–UCSD Birds dataset, and to LiangHuang and Sebastian Schuster for useful feedback.

ReferencesJacob Andreas and Dan Klein. 2016. Reasoning about

pragmatics with neural listeners and speakers. InProceedings of the Conference on Empirical Meth-ods in Natural Language Processing.

Daniel S Bernstein, Robert Givan, Neil Immerman,and Shlomo Zilberstein. 2002. The complexity ofdecentralized control of Markov decision processes.Mathematics of operations research 27(4):819–840.

Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bah-danau, and Yoshua Bengio. 2014. On the propertiesof neural machine translation: Encoder-decoder ap-proaches. arXiv preprint arXiv:1409.1259 .

Jilles Steeve Dibangoye, Christopher Amato, OlivierBuffet, and Francois Charpillet. 2016. Optimallysolving Dec-POMDPs as continuous-state MDPs.Journal of Artificial Intelligence Research 55:443–497.

Anca Dragan and Siddhartha Srinivasa. 2013. Gener-ating legible motion. In Robotics: Science and Sys-tems.

Jakob Foerster, Yannis M Assael, Nando de Freitas,and Shimon Whiteson. 2016. Learning to commu-nicate with deep multi-agent reinforcement learning.In Advances in Neural Information Processing Sys-tems. pages 2137–2145.

Michael C Frank, Noah D Goodman, Peter Lai, andJoshua B Tenenbaum. 2009. Informative communi-cation in word production and word learning. In Pro-ceedings of the 31st annual conference of the cogni-tive science society. pages 1228–1233.

Yang Gao, Oscar Beijbom, Ning Zhang, and TrevorDarrell. 2016. Compact bilinear pooling. In Pro-ceedings of the IEEE Conference on Computer Vi-sion and Pattern Recognition. pages 317–326.

Jon Gauthier and Igor Mordatch. 2016. A paradigm forsituated and goal-driven language learning. arXivpreprint arXiv:1610.03585 .

Frank Guerin and Jeremy Pitt. 2001. Denotational se-mantics for agent communication language. In Pro-ceedings of the fifth international conference on Au-tonomous agents. ACM, pages 497–504.

Matthew Hausknecht and Peter Stone. 2015. Deeprecurrent q-learning for partially observable mdps.arXiv preprint arXiv:1507.06527 .

Lisa Anne Hendricks, Zeynep Akata, MarcusRohrbach, Jeff Donahue, Bernt Schiele, and TrevorDarrell. 2016. Generating visual explanations. InEuropean Conference on Computer Vision. Springer,pages 3–19.

Sahar Kazemzadeh, Vicente Ordonez, Mark Matten,and Tamara L Berg. 2014. ReferItGame: Referringto objects in photographs of natural scenes. In Pro-ceedings of the Conference on Empirical Methods inNatural Language Processing. pages 787–798.

Diederik Kingma and Jimmy Ba. 2014. Adam: Amethod for stochastic optimization. arXiv preprintarXiv:1412.6980 .

Angeliki Lazaridou, Alexander Peysakhovich, andMarco Baroni. 2016a. Multi-agent cooperation andthe emergence of (natural) language. arXiv preprintarXiv:1612.07182 .

Angeliki Lazaridou, Nghia The Pham, andMarco Baroni. 2016b. Towards multi-agentcommunication-based language learning. arXivpreprint arXiv:1605.07133 .

Brian McMahan and Matthew Stone. 2015. ABayesian model of grounded color semantics. Trans-actions of the Association for Computational Lin-guistics 3:103–115.

Will Monroe, Noah D Goodman, and Christopher Potts.2016. Learning to generate compositional color de-scriptions. arXiv preprint arXiv:1606.03821 .

Brooks Paige and Frank Wood. 2016. Inference net-works for sequential monte carlo in graphical mod-els. volume 48.

Panupong Pasupat and Percy Liang. 2016. Inferringlogical forms from denotations. arXiv preprintarXiv:1606.06900 .

Scott Reed, Zeynep Akata, Honglak Lee, and BerntSchiele. 2016. Learning deep representations offine-grained visual descriptions. In Proceedings ofthe IEEE Conference on Computer Vision and Pat-tern Recognition. pages 49–58.

Marco Tulio Ribeiro, Sameer Singh, and CarlosGuestrin. 2016. Why should I trust you?: Explain-ing the predictions of any classifier. In Proceedingsof the 22nd ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining. ACM,pages 1135–1144.

Maayan Roth, Reid Simmons, and Manuela Veloso.2005. Reasoning about joint beliefs for execution-time communication decisions. In Proceedingsof the fourth international joint conference on Au-tonomous agents and multiagent systems. ACM,pages 786–793.

Hendrik Strobelt, Sebastian Gehrmann, Bernd Huber,Hanspeter Pfister, and Alexander M Rush. 2016. Vi-sual analysis of hidden state dynamics in recurrentneural networks. arXiv preprint arXiv:1606.07461 .

Sainbayar Sukhbaatar, Rob Fergus, et al. 2016. Learn-ing multiagent communication with backpropaga-tion. In Advances in Neural Information ProcessingSystems. pages 2244–2252.

Ilya Sutskever, Oriol Vinyals, and Quoc VV Le. 2014.Sequence to sequence learning with neural networks.In Advances in Neural Information Processing Sys-tems. pages 3104–3112.

Ramakrishna Vedantam, Samy Bengio, Kevin Murphy,Devi Parikh, and Gal Chechik. 2017. Context-awarecaptions from context-agnostic supervision. arXivpreprint arXiv:1701.02870 .

Adam Vogel, Max Bodoia, Christopher Potts, andDaniel Jurafsky. 2013a. Emergence of Gricean max-ims from multi-agent decision theory. In Proceed-ings of the Human Language Technology Confer-ence of the North American Chapter of the Asso-ciation for Computational Linguistics. pages 1072–1081.

Adam Vogel, Christopher Potts, and Dan Jurafsky.2013b. Implicatures and nested beliefs in approx-imate Decentralized-POMDPs. In ACL (2). pages74–80.

P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff,S. Belongie, and P. Perona. 2010. Caltech-UCSDBirds 200. Technical Report CNS-TR-2010-001,California Institute of Technology.

Andrew Gordon Wilson, Been Kim, and William Her-lands. 2016. Proceedings of nips 2016 workshop oninterpretable machine learning for complex systems.arXiv preprint arXiv:1611.09139 .

Licheng Yu, Hao Tan, Mohit Bansal, and Tamara LBerg. 2016. A joint speaker-listener-reinforcermodel for referring expressions. arXiv preprintarXiv:1612.09542 .

Matthew D Zeiler and Rob Fergus. 2014. Visualizingand understanding convolutional networks. In Euro-pean conference on computer vision. Springer, pages818–833.

A Proofs

Proof of Proposition 1 We know that

U(z, xb) := argmaxu

∑xa

p(xa|xb, z)r(xa, xb, z)

and that for all translations (z, z′ = t(r))

D ≥∑xb

p(xb|z, z′)DKL(β(z, xb) || β(z′, xb)) .

Applying Pinsker’s inequality:

≥ 2∑xb

p(xb|z, z′)δ(β(z, xb), β(z′, xb))2

and Jensen’s inequality:

≥ 2

(∑xb

p(xb|z, z′)δ(β(z, xb), β(z′, xb))))2

so √D/2 ≥

∑xb

p(xb|z, z′)δ(β(z, xb), β(z′, xb)) .

The next step relies on the following well-known property of the total variation distance: for distributionsp and q and a function f bounded by [0, 1],

|Epf(x)− Eqf(x)| ≤ δ(p, q) . (*)

For convenience we will write

δ := δ(β(z, xb), β(z′, xb)) .

A listener using the speaker’s language expects a reward of∑xb

p(xb)∑xa

p(xa|xb, z)r(xa, xb, U(z, xb))

≤∑xb

p(xb)

(∑xa

p(xa|xb, z′)r(xa, xb, U(z, xb)) + δ

)via (*). From the assumption of player rationality:

≤∑xb

p(xb)

(∑xa

p(xa|xb, z′)r(xa, xb, U(z′, xb)) + δ

)using (*) again:

≤∑xb

p(xb)

(∑xa

p(xa|xb, z)r(xa, xb, U(z′, xb)) + 2δ

)≤∑xa,xb

p(xa, xb|z)r(xa, xb, U(z′, xb)) +√2D .

So the true reward achieved by a z′-speaker receiving a translated code is only additively worse than thenative z-speaker reward:( ∑

xa,xb

p(xa, xb|z)r(xa, xb, U(z, xb))

)−√2D

B Implementation details

B.1 Agents

Learned agents have the following form:

x(t)a

z(t�1)b

h(t�1)a h(t)

a

u(t)a

z(t)aMLP

GRU

where h is a hidden state, z is a message from theother agent, u is a distribution over actions, andx is an observation of the world. A single hiddenlayer with 256 units and a tanh nonlinearity is usedfor the MLP. The GRU hidden state is also of size256, and the message vector is of size 64.

Agents are trained via interaction with the worldas in Hausknecht and Stone (2015) using theADAM optimizer (Kingma and Ba, 2014) and adiscount factor of 0.9. The step size was chosenas 0.003 for reference games and 0.0003 for thedriving game. An ε-greedy exploration strategyis employed, with the exploration parameter fortimestep t given by:

ε = max

(1000− t)/1000(5000− t)/500000

As in Foerster et al. (2016), we found it usefulto add noise to the communication channel: in thiscase, isotropic Gaussian noise with mean 0 andstandard deviation 0.3. This also helps smoothp(z|xa) when computing the translation criterion.

B.2 Representational models

As discussed in Section 5, the translation criterionis computed based on the quantity p(z|x). The pol-icy representation above actually defines a distri-bution p(z|x, h), additionally involving the agent’shidden state h from a previous timestep. Whilein principle it is possible to eliminate the depen-dence on h by introducing an additional samplingstep into Algorithm 1, we found that it simplifiedinference to simply learn an additional model ofp(z|x) directly. For simplicity, we treat the termlog(p(z′)/p(z)) as constant, those these could bemore accurately approximated with a learned den-sity estimator.

This model is trained alongside the learned agentto imitate its decisions, but does not get to observethe recurrent state, like so:

x(t)a

z(t�1)b z(t)

aMLP

Here the multilayer perceptron has a single hiddenlayer with tanh nonlinearities and size 128. It isalso trained with ADAM and a step size of 0.0003.

We use exactly the same model and parametersto implement representations of p(z|x) for humanspeakers, but in this case the vector z is taken to bea distribution over messages in the natural languageinventory, and the model is trained to maximize thelikelihood of labeled human traces.

B.3 Tasks

Colors We use the version of the XKCD datasetprepared by McMahan and Stone (2015). Here theinput feature vector is simply the LAB representa-tion of each color, and the message inventory takento be all unigrams that appear at least five times.

Birds We use the dataset of Welinder et al. (2010)with natural language annotations from Reed et al.(2016). The model’s input feature representationsare a final 256-dimensional hidden feature vectorfrom a compact bilinear pooling model (Gao et al.,2016) pre-trained for classification. The messageinventory consists of the 50 most frequent bigramsto appear in natural language descriptions; examplehuman traces are generated by for every frequent(bigram, image) pair in the dataset.

Driving Driving data is collected from pairs ofhuman workers on Mechanical Turk. Workers re-ceived the following description of the task:

Your goal is to drive the red car onto thered square. Be careful! You’re drivingin a thick fog, and there is another caron the road that you cannot see. How-ever, you can talk to the other driver tomake sure you both reach your destina-tions safely.

Players were restricted to messages of 1–3 words,and required to send at least one message per game.Each player was paid $0.25 per game. 382 gameswere collected with 5 different road layouts, eachrepresented as an 8x8 grid presented to players asin Figure 8. The action space is discrete: playerscan move forward, back, turn left, turn right, orwait. These were divided into a 282-game trainingset and 100-game test set. The message inventory

consists of all messages sent more than 3 times.Input features consists of indicators on the agent’scurrent position and orientation, goal position, andmap identity. Data is available for download athttp://github.com/jacobandreas/neuralese.