Planning-Based Communication Models
Introduction to Speech Act Theory
A scalable model of planning perlocutionary acts (Koller et al., 2010)
Konstantina Garoufi
9 November 2011
Today we talk about...
• Basic speech act theory
• A scalable model of planning perlocutionary acts
• Demo (if time)
Speech act theory
• Main idea: Communication is more than just transmitting information. It changes the world!
• What can communication change?
• Mental states of the agents involved in a conversation
• Overall state of the dialog
Communication = action
• “I sentence you to death.”
• “I promise to be at home for dinner.”
• “I now pronounce you husband and wife.”
• “I order you to attack the enemy.”
Levels of speech acts
• Locutionary: locution (= saying, expression; from the Latin “loqui” = to speak) + -ary
• Illocutionary: in- + locution + -ary
• Perlocutionary: per- (= through) + locution + -ary
Austin (1962)
http://dictionary.reference.com/browse/locutionary
Locutionary act
• The act of saying something
• Utterance + meaning
• Phonetic act: verbal aspect
• Phatic act: syntactic aspect
• Rhetic act: semantic aspect
Illocutionary act
• The act performed in saying something
• Intended meaning of an utterance
• Illocutionary force: type of action (e.g. request, inform, ...)
• Propositional content: details of action (e.g. what is the hearer requested to do?)
Perlocutionary act
• The act performed by saying something
• Intended effects of the utterance on the state of the dialog
• E.g. to persuade the hearer that something is true, or to get the hearer to perform a physical act
Example: “Mary, don’t go into the water”
• Locutionary act: phonetics, syntax, semantics of “Mary, don’t go into the water”
• Illocutionary act: warning (illocutionary force) that Mary should not go into the water (propositional content)
• Perlocutionary act: Mary stays away from the water
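The three levels in this example can be written down as a small record. This is only an illustrative sketch: the class `SpeechAct` and its field names are hypothetical and come from neither Austin nor the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechAct:
    """Hypothetical container for the three levels of one utterance."""
    utterance: str              # locutionary act: what is said
    illocutionary_force: str    # type of action, e.g. "warning"
    propositional_content: str  # details of the action
    intended_effect: str        # perlocutionary act

    def describe(self) -> str:
        return (f"Saying '{self.utterance}' is a {self.illocutionary_force} "
                f"that {self.propositional_content}; "
                f"intended effect: {self.intended_effect}.")


warning = SpeechAct(
    utterance="Mary, don't go into the water",
    illocutionary_force="warning",
    propositional_content="Mary should not go into the water",
    intended_effect="Mary stays away from the water",
)
print(warning.describe())
```

Keeping the intended effect as a separate field mirrors the point of the example: the same locution and illocution may or may not achieve their perlocutionary effect.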
Taxonomy of illocutionary acts
• Assertives: commit the speaker to the truth of an expressed proposition (e.g. believe)
• Directives: get the hearer to do something (e.g. request)
• Commissives: commit the speaker to future actions (e.g. promise)
• Expressives: express speaker’s psychological state (e.g. thank)
• Declarations: change reality in accordance with the proposition of the declaration (e.g. baptize)
Searle (1976)
Note on performatives
• In his early work, Austin distinguished between two types of utterances
• Constatives: describe states of affairs (e.g. “Snow is white.”)
• Performatives: describe actions carried out by speakers
• Explicit: “I promise to come.”
• Implicit: “I intend to come.”
• How to distinguish statements from actions?
➔ Both the constative vs. performative and the explicit vs. implicit distinctions are problematic
• Austin eventually abandoned these dichotomies for more general notions
Performatives: more info
• Searle (1989). How performatives work.
• Bach & Harnish (1992). How performatives really work.
• Grewendorf (2002). How performatives don’t work.
A scalable model of planning perlocutionary acts
Motivation: the SCARE corpus
• 15 spontaneous English dialogue sessions
• Each session records the joint problem-solving of a pair of human partners working through a treasure-hunt style task in a 3D virtual world
Stoia et al. (2008)
The SCARE corpus
• The instruction giver (IG) guides the instruction follower (IF) through completing tasks
[Figure: IF’s view of the world, as displayed on IG’s monitor]
[Figure: IG’s map of the world]
Stoia et al. (2008)
Example (transcribed)
walk forward and go through the first door you see [pause]
and then go through the next one right in front of it [pause]
yeah that one [pause] ok [disfluency - w] and then turn to your right [pause]
and then hit the button in the middle [pause]
Example step-by-step
walk forward and go through the first door you see
and then go through the next one right in front of it
and then turn to your right
and then hit the button in the middle
Precondition: particular spatio-visual configuration
Effect: spatio-visual configuration changes, history of interaction is enriched
Example step-by-step
walk forward and go through the first door you see
and then go through the next one right in front of it
and then turn to your right
and then hit the button in the middle
Precondition: particular spatio-visual configuration, particular history of interaction
Effect: spatio-visual configuration changes, history of interaction is enriched
Example step-by-step
walk forward and go through the first door you see
and then go through the next one right in front of it
and then turn to your right
and then hit the button in the middle
[Bracket label: navigation]
Epistemic goal: improve cognition by reducing time/space complexity or unreliability of mental computation during the next turns of the discourse
Example step-by-step
walk forward and go through the first door you see
and then go through the next one right in front of it
and then turn to your right
and then hit the button in the middle
[Bracket label: referring expression generation]
Pragmatic goal: complete task at hand
The problems
• Which linguistic and extra-linguistic factors are important for situated communication?
• How does the interplay of language, pragmatic and epistemic action, and goal-based problem-solving work?
• In what ways can we model this interplay?
Solution: Sentence generation as planning
goal: express “sleep′(e,a)”
Koller & Stone (2007)
Lexicalized tree adjoining grammar
Figure 1: The example grammar (elementary trees, in bracket notation; ↓ marks a substitution node, * marks a foot node).
• “sleeps”: S:self(NP:subj↓ VP:self(V:self(sleeps))), semantics {sleep(self,subj)}
• “the rabbit”: NP:self(the N:self(rabbit)), semantics {rabbit(self)}
• “white”: N:self(white N:self*), semantics {white(self)}
Figure 2: Derivation of “The white rabbit sleeps.”
• Instantiated trees: S:e(NP:a↓ VP:e(V:e(sleeps))), NP:a(the N:a(rabbit)), N:a(white N:a*)
• Derived tree: S:e(NP:a(the N:a(white N:a(rabbit))) VP:e(V:e(sleeps)))
In a first step, called sentence planning, the semantic representation is first enriched with more information; for instance, referring expressions, which refer to individuals that we want to talk about, are determined at this point. In a second step, called surface realization, this enriched representation is then translated into a natural-language sentence, using the grammar.
In practice, the process of determining referring expressions typically interacts with the realization step and so it turns out to be beneficial to perform both of these steps together. This was the goal of the SPUD system (Stone et al. 2003), which performed this task using a top-down generation algorithm based on tree-adjoining grammars (Joshi and Schabes 1997) whose lexical entries were equipped with semantic and pragmatic information. Unfortunately, SPUD suffered from having to explore a huge search space, and had to resort to a non-optimal greedy search strategy to retain reasonable efficiency. To improve on the efficiency of SPUD, Koller and Stone (2007) translate the sentence generation problem into a planning problem and use a planning algorithm for generation.
We illustrate this process on a simplified example. Consider a knowledge base containing the individuals e, r1 and r2, and a set of attributes encoding the fact that r1 and r2 are rabbits, r1 is white and r2 is brown, and e is an event in which r1 sleeps. Say that we want to express the information {sleep(e, r1)} using the tree-adjoining grammar shown in Figure 1. This grammar consists of elementary trees (i.e., the disjoint trees in the figure), each of which contributes certain semantic content. We can instantiate these trees by substituting individuals for semantic roles, such as self and subj, and then combine the tree instances as shown in Figure 2 to obtain the sentence “The white rabbit sleeps”.
We compute this grammatical derivation in a top-down manner, starting with the elementary tree for “sleeps”. This tree satisfies the need to convey the semantic information, but introduces a need to generate a noun phrase (NP) for the subject; this NP must refer uniquely to the target referent r1. In a second step, we substitute the tree for “the rabbit” into the open NP leaf, which makes the derivation grammatically complete. Since there are two different individuals that could be described as “the rabbit” (technically, r2 is still a distractor: based on the description “the rabbit”, the hearer might erroneously think that we’re talking about r2 and not r1), we are still not finished. To complete the derivation, the tree for “white” is added to the existing structure by an adjunction operation, making the derivation syntactically and semantically complete.
Figure 3: PDDL actions for generating the sentence “The white rabbit sleeps.”
    (:action add-sleeps
      :parameters (?u - node ?xself - individual ?xsubj - individual)
      :precondition (and (subst S ?u) (referent ?u ?xself)
                         (sleep ?xself ?xsubj))
      :effect (and (not (subst S ?u))
                   (expressed sleep ?xself ?xsubj)
                   (subst NP (subj ?u))
                   (referent (subj ?u) ?xsubj)
                   ; every individual other than the subject is a distractor
                   (forall (?y - individual)
                     (when (not (= ?y ?xsubj))
                           (distractor (subj ?u) ?y)))))
    (:action add-rabbit
      :parameters (?u - node ?xself - individual)
      :precondition (and (subst NP ?u) (referent ?u ?xself)
                         (rabbit ?xself))
      :effect (and (not (subst NP ?u))
                   (canadjoin N ?u)
                   ; non-rabbits stop being distractors
                   (forall (?y - individual)
                     (when (not (rabbit ?y))
                           (not (distractor ?u ?y))))))
    (:action add-white
      :parameters (?u - node ?xself - individual)
      :precondition (and (canadjoin N ?u) (referent ?u ?xself)
                         (white ?xself))
      ; non-white individuals stop being distractors
      :effect (forall (?y - individual)
                (when (not (white ?y))
                      (not (distractor ?u ?y)))))
The process described above has clear parallels to planning: we manipulate a state by applying actions in order to achieve a goal. We can make this connection even more precise by translating the SPUD problem into a planning problem. For instance, Figure 3 shows the corresponding PDDL actions for the above generation task, where each action corresponds to an operation that adds a single elementary tree to the derivation. In each case, the first parameter of the action is a node name in the derivation tree, and the remaining parameters stand for the individuals to which the semantic roles will be instantiated. The syntactic precon…
Koller & Stone (2007)
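The way each added tree shrinks the set of distractors can be traced in a few lines of Python. This is a minimal sketch, not the authors' implementation: the knowledge base follows the rabbit example, while the helper names `holds`, `add_sleeps` and `add_property` are made up for illustration.

```python
# Knowledge base from the example: r1 and r2 are rabbits, r1 is white,
# r2 is brown, and e is an event in which r1 sleeps.
INDIVIDUALS = {"e", "r1", "r2"}
FACTS = {
    ("rabbit", "r1"), ("rabbit", "r2"),
    ("white", "r1"), ("brown", "r2"),
    ("sleep", "e", "r1"),
}


def holds(pred, *args):
    """True if the atom pred(args) is in the knowledge base."""
    return (pred, *args) in FACTS


def add_sleeps(event, subj):
    """Adding 'sleeps' opens an NP for the subject; every individual
    other than the subject starts out as a distractor for that NP."""
    assert holds("sleep", event, subj)
    return {y for y in INDIVIDUALS if y != subj}


def add_property(pred, target, distractors):
    """Adding a noun or adjective that is true of the target removes
    every distractor that lacks the property."""
    assert holds(pred, target)
    return {y for y in distractors if holds(pred, y)}


d0 = add_sleeps("e", "r1")             # after "sleeps"
d1 = add_property("rabbit", "r1", d0)  # after "the rabbit"
d2 = add_property("white", "r1", d1)   # after "white"
print(sorted(d0), sorted(d1), sorted(d2))
```

The referring expression is complete once the distractor set is empty, which here happens only after “white” has been added.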
Combining elementary trees
[Figure: the instantiated elementary trees N:a(white N:a*), NP:a(the N:a(rabbit)) and S:e(NP:a↓ VP:e(V:e(sleeps))) are combined by adjunction and substitution]
[Callout: derived tree]
Koller & Stone (2007)
Elementary trees as actions
S-sleeps(0,e,a)
pre: open(0,S,e), sleep′(e,a)
effect: ¬open(0,S,e), open(1,NP,a), ∀y. y ≠ a → distractor(1,y)
NP-the-rabbit(1,a)
pre: open(1,NP,a), rabbit′(a)
effect: ¬open(1,NP,a), allowed(1,adj-N,a), ∀y. ¬rabbit′(y) → ¬distractor(1,y)
N-white(1,a)
pre: allowed(1,adj-N,a), white′(a)
effect: ∀y. ¬white′(y) → ¬distractor(1,y)
[Callouts on the slide: semantics, syntax]
Koller & Stone (2007)
Sentence generation as planning
1. Input: the goal, express “sleep′(e,a)”, together with the planning operators S-sleeps(0,e,a), NP-the-rabbit(1,a) and N-white(1,a)
2. An off-the-shelf automated planning system computes the plan S-sleeps(0,e,a), NP-the-rabbit(1,a), N-white(1,a)
3. Plan decoding maps each plan step back to its elementary tree and combines the trees
4. Output: The white rabbit sleeps.
Koller & Stone (2007)
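The pipeline of goal, planner and plan decoding can be imitated with a toy breadth-first search. This is a sketch under strong simplifications: the operators are grounded by hand for the referent r1 (written a on the slides), the search stands in for a real off-the-shelf planner, and `decode` is a hypothetical string-level shortcut for combining the trees.

```python
from collections import deque

INDIVIDUALS = ("e", "r1", "r2")
FACTS = {("rabbit", "r1"), ("rabbit", "r2"), ("white", "r1"),
         ("brown", "r2"), ("sleep", "e", "r1")}


def s_sleeps(state):
    if ("open", 0, "S", "e") not in state or ("sleep", "e", "r1") not in FACTS:
        return None
    new = set(state) - {("open", 0, "S", "e")}
    new |= {("expressed", "sleep", "e", "r1"), ("open", 1, "NP", "r1")}
    new |= {("distractor", 1, y) for y in INDIVIDUALS if y != "r1"}
    return frozenset(new)


def np_the_rabbit(state):
    if ("open", 1, "NP", "r1") not in state or ("rabbit", "r1") not in FACTS:
        return None
    new = set(state) - {("open", 1, "NP", "r1")}
    new.add(("allowed", 1, "adj-N", "r1"))
    new -= {("distractor", 1, y) for y in INDIVIDUALS if ("rabbit", y) not in FACTS}
    return frozenset(new)


def n_white(state):
    if ("allowed", 1, "adj-N", "r1") not in state or ("white", "r1") not in FACTS:
        return None
    drop = {("distractor", 1, y) for y in INDIVIDUALS if ("white", y) not in FACTS}
    return frozenset(set(state) - drop)


OPERATORS = {"S-sleeps": s_sleeps, "NP-the-rabbit": np_the_rabbit, "N-white": n_white}
INITIAL = frozenset({("open", 0, "S", "e")})


def is_goal(state):
    # Goal: the fact is expressed, no open nodes, no distractors left.
    unfinished = any(atom[0] in ("open", "distractor") for atom in state)
    return ("expressed", "sleep", "e", "r1") in state and not unfinished


def plan(initial, operators):
    """Breadth-first search for a shortest operator sequence reaching the goal."""
    queue, seen = deque([(initial, [])]), {initial}
    while queue:
        state, steps = queue.popleft()
        if is_goal(state):
            return steps
        for name, op in operators.items():
            nxt = op(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [name]))
    return None


def decode(steps):
    """Turn plan steps into a word string (stand-in for tree combination)."""
    words = []
    for step in steps:
        if step == "S-sleeps":
            words = ["NP", "sleeps"]                      # S with an open NP
        elif step == "NP-the-rabbit":
            i = words.index("NP")
            words[i:i + 1] = ["the", "rabbit"]            # substitution
        elif step == "N-white":
            words.insert(words.index("rabbit"), "white")  # adjunction
    return " ".join(words).capitalize() + "."


steps = plan(INITIAL, OPERATORS)
print(steps, decode(steps))
```

The plan cannot stop after “the rabbit” because a distractor atom for r2 is still in the state, which is exactly what forces the adjective into the sentence.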
Extensions for planning perlocutionary acts
Figure 4: An example SCRISP lexicon (trees in bracket notation; ↓ marks a substitution node, * marks a foot node).
• “push”: S:self(V:self(push) NP:obj↓)
  semreq: visible(p, o, obj)
  nonlingcon: player-pos(p), player-ori(o)
  impeff: push(obj)
• “turn left”: S:self(V:self(turn) Adv(left))
  nonlingcon: player-ori(o1), next-ori-left(o1, o2)
  nonlingeff: ¬player-ori(o1), player-ori(o2)
  impeff: turnleft
• “and”: S:self(S:self* and S:other↓)
… of you”. This lowers the cognitive load on the IF, and presumably improves the rate of correctly interpreted REs.
SCRISP is capable of deliberately generating such context-changing navigation instructions. The key idea of our approach is to extend the CRISP planning operators with preconditions and effects that describe the (simulated) physical environment: A “turn left” action, for example, modifies the IF’s orientation in space and changes the set of visible objects; a “push” operator can then pick up this changed set and restrict the distractors of the forthcoming RE it introduces (i.e. “the button”) to only objects that are visible in the changed context. We also extend CRISP to generate imperative rather than declarative sentences.
4.1 Situated CRISP
We define a lexicon for SCRISP to be a CRISP lexicon in which every lexicon entry may also describe non-linguistic conditions, non-linguistic effects and imperative effects. Each of these is a set of atoms over constants, semantic roles, and possibly some free variables. Non-linguistic conditions specify what must be true in the world so a particular instance of a lexicon entry can be uttered felicitously; non-linguistic effects specify what changes uttering the word brings about in the world; and imperative effects contribute to the IF’s “to-do list” (Portner, 2007) by adding the properties they denote.
A small lexicon for our example is shown in Fig. 4. This lexicon specifies that saying “push X” puts pushing X on the IF’s to-do list, and carries the presupposition that X must be visible from the location where “push X” is uttered; this reflects our simplifying assumption that the IG can only refer to objects that are currently visible. Similarly, “turn left” puts turning left on the IF’s agenda. In addition, the lexicon entry for “turn left” specifies that, under the assumption that the IF understands and follows the instruction, they will turn 90 degrees to the left after hearing it. The planning operators are written in a way that assumes that the intended (perlocutionary) effects of an utterance actually come true. This assumption is crucial in connecting the non-linguistic effects of one SCRISP action to the non-linguistic preconditions of another, and generalizes to a scalable model of planning perlocutionary acts. We discuss this in more detail in Koller et al. (2010a).
Figure 5: SCRISP planning operators for the lexicon in Fig. 4.
turnleft(u, x, o1, o2):
  Precond: subst(S, u), ref(u, x), player-ori(o1), next-ori-left(o1, o2), ...
  Effect: ¬subst(S, u), ¬player-ori(o1), player-ori(o2), to-do(turnleft), ...
push(u, u1, un, x, x1, p, o):
  Precond: subst(S, u), ref(u, x), player-pos(p), player-ori(o), visible(p, o, x1), ...
  Effect: ¬subst(S, u), subst(NP, u1), ref(u1, x1), ∀y.(y ≠ x1 ∧ visible(p, o, y) → distractor(u1, y)), to-do(push(x1)), canadjoin(S, u), ...
and(u, u1, un, e1, e2):
  Precond: canadjoin(S, u), ref(u, e1), ...
  Effect: subst(S, u1), ref(u1, e2), ...
We then translate a SCRISP generation problem into a planning problem. In addition to what CRISP does, we translate all non-linguistic conditions into preconditions and all non-linguistic effects into effects of the planning operator, adding any free variables to the operator’s parameters. An imperative effect P is translated into an effect to-do(P). The operators for the example lexicon of Fig. 4 are shown in Fig. 5. Finally, we add information about the situated environment to the initial state, and specify the planning goal by adding to-do(P) atoms for each atom P that is to be placed on the IF’s agenda.
4.2 An example
Now let’s look at how this generates the appropriate instructions for our example scene of Fig. 3. We encode the state of the world as depicted in the map in an initial state which contains, among others, the atoms player-pos(pos3,2), player-ori(north), next-ori-left(north, west), …
action: push(e,x,p,o)
precondition: visible(p,o,x), ...
effect: ∀y. (y≠x ∧ visible(p,o,y) → distractor(y)), to-do(push(x)), ...
action: turn-left(e,x,y)
precondition: player-ori(x), ...
effect: ¬player-ori(x), player-ori(y), to-do(turn-left), ...
communicative acts have the intended perlocutionary effects
pragmatic conditions
Koller et al. (2010)
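How a non-linguistic effect of one operator feeds the precondition of the next can be sketched in Python. Everything concrete here is an assumption made for illustration: the visibility table `VISIBLE`, the object names and the `State` class are invented, and the operators are heavily reduced versions of turn-left and push.

```python
from dataclasses import dataclass, field

NEXT_ORI_LEFT = {"north": "west", "west": "south", "south": "east", "east": "north"}
# Hypothetical visibility table: objects visible from (position, orientation).
VISIBLE = {
    ("p1", "north"): {"b2", "b3", "door1"},
    ("p1", "west"): {"b1", "lamp1"},
}
BUTTONS = {"b1", "b2", "b3"}


@dataclass
class State:
    pos: str
    ori: str
    to_do: list = field(default_factory=list)
    distractors: set = field(default_factory=set)


def turn_left(s):
    """Assumes the IF follows the instruction: the orientation changes."""
    return State(s.pos, NEXT_ORI_LEFT[s.ori], s.to_do + ["turnleft"], set(s.distractors))


def push(s, target):
    """Applicable only if the target is visible; all other visible
    objects become distractors for the upcoming referring expression."""
    visible = VISIBLE.get((s.pos, s.ori), set())
    if target not in visible:
        return None
    return State(s.pos, s.ori, s.to_do + [f"push({target})"], visible - {target})


def the_button(s):
    """'the button' removes every distractor that is not a button."""
    return State(s.pos, s.ori, list(s.to_do), {y for y in s.distractors if y in BUTTONS})


s0 = State("p1", "north")
blocked = push(s0, "b1")   # b1 is not visible while facing north
s1 = turn_left(s0)         # "Turn left"
s2 = push(s1, "b1")        # "and push ..."
s3 = the_button(s2)        # "... the button."
print(blocked, s3.to_do, s3.distractors)
```

Because the turn is assumed to succeed, the planner can evaluate visibility in the changed orientation, where the short description “the button” is already unambiguous.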
Example
[Map labels: target; instruction follower’s location]
hit the yellow button on the left wall in the next room
1. walk forward and go through the door
2. (...) ok and then turn to your left
3. and then hit the button
✔
Problem
How to navigate the user so that the forthcoming referring expression is cognitively simple?
Push the button on the wall to your left.
Push the button on the wall to your left in the next room.
[Map label: target]
State: player-pos(p1), player-ori(o1), ¬visible(p1,o1,b1)
turn-left(e1,o1,o2)
Turn left
action: turn-left(e1,o1,o2)
precondition: player-ori(o1), ...
effect: ¬player-ori(o1), player-ori(o2), ...
Solution
[Map label: target]
State: player-pos(p1), player-ori(o2), visible(p1,o2,b1)
action: and(e1,e2)
precondition: ...
effect: ...
and(e1,e2)
push(e2,b1,p1,o2)
the-button(b1)
and push the button.
action: the-button(b1)
effect: ∀y. (¬button(y) → ¬distractor(y)), ...
action: push(e2,b1,p1,o2)
precondition: visible(p1,o2,b1), ...
effect: ∀y. (y≠b1 ∧ visible(p1,o2,y) → distractor(y)), to-do(push(b1)), ...
Solution
[Map label: target]
State: to-do(push(b1)) ✔
Solution
Push the button.
State: to-do(hit(b)) ✔
Notice that...
...by assuming that the intended perlocutionary effects become true, the system
➡ achieves deliberate manipulation of the extra-linguistic context
➡ but makes execution monitoring crucial
Solution
Monitoring perlocutionary effects
• Inactivity execution monitor: tracks whether the user has not moved or acted in a certain period of time, in which case it repeats the utterance
• Distance execution monitor: observes whether the user is moving away from the target, in which case it recomputes the utterance
• Danger execution monitor: monitors whether the user approaches a trap in the world, in which case it issues a warning
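The three monitors can be sketched as small checks over a history of observations. The `Observation` record, the thresholds (`timeout`, `radius`) and the returned labels are all hypothetical; the slides only state what each monitor watches for and how the system reacts.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    time: float       # seconds since the instruction was given
    position: tuple   # (x, y) in world coordinates
    acted: bool       # did the user move or act since the last observation?


def inactivity_monitor(history, timeout=10.0):
    """'repeat' if the user has not moved or acted for `timeout` seconds."""
    last_active = max((o.time for o in history if o.acted), default=history[0].time)
    return "repeat" if history[-1].time - last_active >= timeout else None


def distance_monitor(history, target):
    """'replan' if the latest step moved the user away from the target."""
    if len(history) < 2:
        return None
    prev, cur = history[-2], history[-1]
    moved_away = math.dist(cur.position, target) > math.dist(prev.position, target)
    return "replan" if moved_away else None


def danger_monitor(history, traps, radius=1.5):
    """'warn' if the user is within `radius` of any trap."""
    cur = history[-1].position
    return "warn" if any(math.dist(cur, t) <= radius for t in traps) else None


history = [
    Observation(0.0, (0, 0), True),
    Observation(2.0, (1, 0), True),
    Observation(13.0, (1, 0), False),
]
print(inactivity_monitor(history))
print(distance_monitor(history[:2], target=(-3, 0)))
print(danger_monitor(history, traps=[(2, 0)]))
```

Each monitor returns a reaction label or `None`, so a generation loop could poll all three after every observation and repeat, recompute or warn as needed.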
Evaluation framework: GIVE Challenge
• The user plays a 3D game in a virtual world.
• A natural language generation system, connected over the internet, generates instructions in real time, e.g. “move forward 2 steps!”, “press the blue button!”
Byron et al. (2009)
Website
Try GIVE out!
http://www.give-challenge.org
Implementation as a GIVE system
• Solves planning problems with off-the-shelf FF planner
• Max planning time 1.03 sec (3 GHz CPU) for a knowledge base of about 1500 facts and a grammar of about 30 lexicon entries
Real-time performance!
Evaluation
• System achieves competitive performance in the GIVE task
• With respect to task success, it outperforms an informed baseline and performs comparably to an improved version of the best performer in GIVE-1
• It can refer to objects from significantly further away than the baselines
Conclusions
• It is possible to model perlocutionary effects of speech acts as direct effects of corresponding planning operators
• This keeps planning complexity low...
• ...and employs execution monitoring to make sure the intended perlocutionary effects are actually achieved
Discussion points
• Which kinds of speech acts (according to Searle) has this approach covered?
• For which kinds of speech acts is it most appropriate?
• Is it possible to model any speech act in this manner?
References
• Austin (1962). How to do things with words.
• Byron et al. (2009). Report on the First NLG Challenge on Generating Instructions in Virtual Environments (GIVE).
• Koller & Stone (2007). Sentence generation as a planning problem.
• Searle (1976). A classification of illocutionary acts.
• Stoia et al. (2008). SCARE: A Situated Corpus with Annotated Referring Expressions.