

Page 1: Simultaneous Intention Estimation and Knowledge Augmentation via Human-Robot …szhang/papers/2018_WS_MRHRC... · 2020. 3. 14. · mobile robot platforms that are able to interact

Simultaneous Intention Estimation and Knowledge Augmentation via Human-Robot Dialog

Sujay Bajracharya, Saeid Amiri
Cleveland State University

{s.bajracharya; s.amiri}@vikes.csuohio.edu

Jesse Thomason
University of Washington

[email protected]

Shiqi Zhang
SUNY Binghamton

[email protected]

Abstract—Robots have been able to interact with humans using natural language, and to identify service requests through human-robot dialog. However, few robots are able to improve their language capabilities from this experience. In this paper, we develop a dialog agent for robots that is able to interpret user commands using a semantic parser, while asking clarification questions using a probabilistic dialog manager. This dialog agent is able to augment its knowledge base and improve its language capabilities by learning from dialog experiences, e.g., adding new entities and learning new ways of referring to existing entities. We have extensively tested our dialog system using a mobile robot that is capable of completing object delivery and human guidance tasks. Experiments were conducted both in simulation and with human participants. Results suggest that our dialog agent performs better in both efficiency and accuracy when compared to one that does not learn from this experience and another that learns using a predefined strategy.

I. INTRODUCTION

Mobile robots have been extensively used to conduct tasks such as object delivery in the real world. Notable examples include the Amazon warehouse robots and the Relay robots from Savioke. However, these robots either work in human-forbidden environments, or have no interaction with humans except for obstacle avoidance. Researchers are developing mobile robot platforms that are able to interact with and provide services to people in everyday, human-inhabited environments [9, 28, 7, 3]. Some of the robot platforms can learn from the experience of human-robot interaction (HRI) to improve their language skills, e.g., learning new synonyms [26], but none of them learn entirely new entities. This work aims to enable the robots to simultaneously identify service requests through human-robot dialog, and learn from this experience to augment the robot's knowledge base (KB).

A robot dialog system typically includes the following four components: 1) A language understanding component for converting spoken or text-based language inputs into a formal representation that computers understand; 2) A state tracking component that estimates the current dialog state based on the language understanding; 3) A dialog management component that suggests a language action (e.g., clarification questions); and 4) A language generator that outputs spoken or text-based natural language. The dialog agent developed in this work includes the four components, and further supports dialog-based knowledge augmentation.

There are at least two types of dialog systems that can be distinguished based on their design purposes. The first

Fig. 1. Segway-based mobile robot platform used in this research.

type focuses on maximizing social engagement, e.g., Microsoft XiaoIce, where the dialog agent usually prefers extended conversations. We are concerned with the second type of dialog systems, often referred to as being goal-oriented, that aim at maximizing information gain. Goal-oriented dialog systems can be evaluated based on dialog efficiency and accuracy. In this setting, people prefer dialog agents that are able to accurately identify human intention using fewer dialog turns.

Goal-oriented dialog systems are necessary for language-based human-robot interaction because, in most cases, people cannot fully and accurately deliver information using a single dialog turn. Consider a service request of "Robot, please deliver a coffee to the conference room!" It is possible that the robot does not know which conference room the speaker is referring to, in which case it is necessary to ask clarification questions such as "Where should I deliver a coffee?" To further identify the service request, the robot might want to ask about the recipient as well: "For whom is the delivery?" Although such dialog systems have been implemented on robots, few of them can learn to improve their language capabilities or augment their KB from the experience of human-robot conversations in the real world (Section II).

This work focuses on dialog-based robot knowledge augmentation, where the agent must identify when it is necessary to augment its KB and how to do that, as applied to our Segway-based mobile robot shown in Figure 1. Partially observable Markov decision processes (POMDPs) [8] have been used to construct dialog managers to account for the uncertainty in language understanding [30]. We develop a


dual-track POMDP-based dialog manager to help the agent maintain a confidence level of its current knowledge being able to support the dialog, and accordingly decide whether to augment its KB. After the agent becomes confident that new entities are necessary to make progress in the dialog, it decides where in the KB to add a new entity (e.g., as a new item or a new person) by analyzing the flow of the dialog. As a result, our dialog agent is able to decide when it is necessary to augment its KB and how to do so in a semantically meaningful way.

Our dialog system has been implemented both in simulation and on a Segway-based mobile robot that is able to interact with people using natural language and conduct guidance and delivery tasks. Experimental results show that our dialog system performs better in service request identification in both efficiency and accuracy, in comparison to baselines that use a static KB or update the KB using a predefined strategy. Experiments with human participants suggest that our knowledge augmentation component improves user experiences as well.

II. RELATED WORK

This work is related to existing research on robots interpreting human language. We discuss applications where language inputs are used for providing service requests, introducing new concepts, and instructing a robot to conduct specific tasks.

Researchers have developed algorithms for learning to interpret natural language commands. Examples include the work of Matuszek et al. that has enabled a robot to learn to understand navigation commands [13]. Other examples include the work of "Tell me Dave" [15], and the work of Tellex et al. that focused on a graph-based language understanding approach [25]. Recent research enabled the co-learning of syntax and semantics of spatial language [23, 6]. Although these systems support the learning of language skills, they do not have a dialog management component (implicitly assuming perfect language understanding), making the methods not suitable for applications that require multi-turn communications.

Another research area that is related to this work focuses on learning dialog strategies. For instance, the NJFun system [22] modeled the dialog management problem using Markov Decision Processes (MDPs) [19], and learned the dialog strategy using Reinforcement Learning (RL) algorithms [24]. Recent research on Deep RL (DRL) has enabled an agent to interactively learn action policies in more complex domains [16], and DRL methods have been used to learn dialog strategies [29, 4]. These systems do not include a semantic parsing component. As a result, users can only communicate with their dialog agents using simple or predefined language patterns.

Mobile robot platforms have been equipped with semantic parsing and dialog management capabilities. After a task is identified in dialog, these robots are able to conduct service tasks using a task planner. One example is the dialog agent developed for the KeJia robot [12]. Integrated commonsense reasoning and probabilistic planning (CORPP) has been applied to dialog systems, resulting in a commonsense-aware robot dialog system [31]. Recent research further integrated multi-modal perception capabilities, such as locations and facial expressions, into dialog systems [11]. Although these systems enable a robot to identify human requests via dialog, they do not learn from such experiences.

Thomason et al. developed a dialog agent for a mobile service robot that is able to conduct service tasks such as human guidance and object delivery [26]. A dialog manager suggests language actions for asking clarification questions, and the agent is able to learn from human-robot conversations. Recent research further enabled the agent to learn to simultaneously improve the semantic parsing and dialog management capabilities [17]. These methods focus on learning to improve an agent's language capabilities, and do not augment its knowledge base in this process. This work builds on the dialog agent implemented by Thomason et al., and introduces a dual-track POMDP-based dialog manager and a strategy for augmenting the robot's knowledge base.

There are other dialog agents that specifically aim at knowledge augmentation through human-robot dialog. An example is the CMU CoBots that are able to learn procedural knowledge from verbal instructions via human-robot dialog [14], and learn task knowledge by querying the web [18]. Recently, the instructable agent developed by Azaria et al. is able to learn new concepts and new procedural knowledge through human-robot dialog [2]. Recent work enabled a mobile robot to ground new concepts using visual-linguistic observations, e.g., to ground the new word "box" given a command of "move to the box" by exploring the environment and hypothesizing objects [27]. These agents are able to augment their knowledge bases through language-based interactions with humans. However, their dialog management components (if any) are relatively weak and do not model the noise in natural language understanding.

She and Chai developed a robot dialog system that focuses on situated verb semantics learning [21]. Their dialog agent uses RL to learn a dialog management policy, and uses a semantic parser to process natural language inputs. However, their work specifically focused on learning the semantics of verbs, limiting the applicability of their knowledge augmentation approach.

The dialog agent developed in this work processes language inputs using a semantic parser to understand users' service requests, leverages a POMDP-based dialog manager to handle the uncertainty from the unreliable parser by asking clarification questions, and augments its knowledge base when the current knowledge is not sufficient to interpret the human request.

III. DIALOG AGENT

In this section, we present our dialog agent that integrates a decision-theoretic approach for dialog management under uncertainty, and an information-theoretic approach for knowledge management. Figure 2 shows an overview of our dialog system. Next, we present our dual-track controller for knowledge and dialog management, the language understanding component, and the representation of our knowledge base.

Page 3: Simultaneous Intention Estimation and Knowledge Augmentation via Human-Robot …szhang/papers/2018_WS_MRHRC... · 2020. 3. 14. · mobile robot platforms that are able to interact


Fig. 2. A pictorial overview of our dialog system, including a semantic parser for language understanding, a knowledge base represented in Answer Set Programming (ASP) [5], and a dual-track POMDP for the management of the knowledge base (KB) and dialog.

A. Dialog and Knowledge Management

Markov decision process (MDP) is a general sequential decision-making framework that can be used for planning under uncertainty in action outcomes [19]. Partially observable MDP (POMDP) [8] generalizes the MDP framework by assuming that the world is partially observable. POMDPs have been used for dialog management [30].

There are two POMDP-based control loops in our dialog agent, resulting in a dual-track POMDP controller. One track focuses on maintaining the dialog belief state, and accordingly suggesting language actions to the agent. The other track focuses on maintaining the belief of the current knowledge being (in)sufficient to complete the task, and accordingly suggesting knowledge augmentation.

1) Dialog Management Track: The dialog management POMDP includes the following components:

• S : (ST × SI × SR) ∪ term, where ST is the set of task types (delivery and guidance in our case), SI is the set of items used in the task, SR is the set of recipients of the delivery, and term is the terminal state.

• A : AW ∪ AC ∪ AR is the action set. AW consists of general "wh" questions, such as "Whom is this delivery for?" and "What item should I deliver?" AC includes confirming questions that expect yes/no answers, and reporting actions AR return the estimated human requests.

• T : S × A × S′ → [0, 1] is the state-transition function. In our case, the dialog remains in the same state after question-asking actions, and reporting actions lead transitions to term deterministically.

• R : S × A → ℝ is the reward function, and the reward values are assigned as:

    R(s, a) =
        rC,  if s ∈ S, a ∈ AC
        rW,  if s ∈ S, a ∈ AW
        r+,  if s ∈ S, a ∈ AR, and a correctly reports s
        r−,  if s ∈ S, a ∈ AR, and a incorrectly reports s

where rC and rW are the costs of confirming and general questions, in the form of negative, relatively small values; r+ is a big bonus for correct reports; and r− is a big penalty (negative) for wrong reports.

• Z : ZT ∪ ZI ∪ ZR ∪ {z+, z−} is the set of observations, where ZT, ZI and ZR include observations of task type, item, and recipient respectively. z+ and z− correspond to "yes" and "no". Our dialog agent takes in observations as semantic parses that have correspondence to elements in Z. Other parses, including the malformed ones, produce random observations (Section III-B).

• O : S × A × Z ∪ inapplicable is the observation function that specifies the probability of observing z ∈ Z in state s ∈ S, after taking action a ∈ A. Reporting actions yield the inapplicable observation. Our observation function models the noise in language understanding, e.g., the probability of correctly recognizing z+ ("yes") is 0.8. The noise model is heuristically designed in this work, though it can be learned from real conversations.

It should be noted that this POMDP model is dynamically revised (similar to [32]), when new entities (Section III-C) are added into the knowledge base. Solving this POMDP generates a policy π, which maps a belief to an action that maximizes long-term information gain.
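As an illustration of the model's size, the state and action sets above can be enumerated directly from the KB contents. The following sketch is hypothetical (the paper provides no code); the function and variable names are our own:

```python
from itertools import product

# Hypothetical enumeration of the dialog POMDP from the KB contents.
# Names are illustrative, not taken from the paper's implementation.
def build_dialog_pomdp(task_types, items, recipients):
    requests = list(product(task_types, items, recipients))
    states = requests + ["term"]                              # S = (ST x SI x SR) U {term}
    wh_questions = ["ask_task", "ask_item", "ask_recipient"]  # AW
    confirms = [("confirm", r) for r in requests]             # AC
    reports = [("report", r) for r in requests]               # AR
    return states, wh_questions + confirms + reports

states, actions = build_dialog_pomdp(
    ["delivery", "guidance"], ["coffee", "coke"], ["alice", "bob"])
# 2*2*2 = 8 request states plus the terminal state; 3 + 8 + 8 = 19 actions
```

This also makes concrete why the model must be revised when the KB grows: adding one item or recipient multiplies the request states, and the confirm/report action sets grow with them.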

2) Knowledge Management Track: In addition to the dialog management POMDP, we have a knowledge management POMDP that monitors whether the agent's knowledge is sufficient to support estimating human intentions. The knowledge management POMDP formulation is similar to that for dialog management but includes entities for unknown items and recipients.

The components of the knowledge management POMDP are shown below, where transition and observation functions are generated accordingly and hence not listed:

• S+ : S ∪ {(sT , sI , s̄R) | ∀sT ∈ ST , ∀sI ∈ SI} ∪ {(sT , s̄I , sR) | ∀sT ∈ ST , ∀sR ∈ SR} is the set of states. It includes all states in S along with the states corresponding to new entries that correspond to an unknown item s̄I and an unknown recipient s̄R.

• A+ : A ∪ {aI , aR} ∪ AR is the set of actions, including the actions in A, two actions for confirming the unknown item and recipient respectively, and AR, reporting actions that correspond to the states in S+.

Fig. 3. An example of parsing a service request sentence using CCG semantic parsing and λ-calculus: "Bring coffee to james" is parsed into the logical form λx.coffee(x) ∧ to(x, james).

• Z+ = Z ∪ {zI , zR} is the augmented observation set.

At runtime, we maintain belief distributions for both tracks of POMDPs. Belief b of the dialog POMDP is used for sequential decision making and dialog management, and belief b+ of the knowledge POMDP is only used for language augmentation purposes, i.e., determining when it is necessary to augment the KB (Section III-D). When observations are made (observation z ∈ Z), both beliefs are updated using the Bayes update rule:

b′(s′) = ( O(s′, a, z) · ∑s∈S T(s, a, s′) · b(s) ) / pr(z|a, b)

where s is the state, a is the action, pr(z|a, b) is the normalizing factor, and z is the observation. In our dialog system, observations are made based on the language understanding using a semantic parser.
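A minimal sketch of this belief update, assuming tabular transition and observation functions (the names here are illustrative, not the paper's implementation):

```python
def belief_update(b, a, z, T, O, states):
    """One step of the Bayes belief update:
    b'(s') = O(s', a, z) * sum_s T(s, a, s') * b(s) / pr(z | a, b)."""
    unnormalized = {
        s2: O(s2, a, z) * sum(T(s, a, s2) * b[s] for s in states)
        for s2 in states
    }
    pr_z = sum(unnormalized.values())  # normalizing factor pr(z | a, b)
    if pr_z == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / pr_z for s2, v in unnormalized.items()}

# Tiny usage check: two states, static transitions (question-asking actions
# keep the state), and an observation that favors s1 with probability 0.8.
states = ["s0", "s1"]
T = lambda s, a, s2: 1.0 if s == s2 else 0.0
O = lambda s2, a, z: 0.8 if s2 == "s1" else 0.2
b2 = belief_update({"s0": 0.5, "s1": 0.5}, "ask_item", "z_coffee", T, O, states)
# b2 concentrates on s1: {"s0": 0.2, "s1": 0.8}
```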

Remarks: Fundamentally, one can merge multiple POMDPs to unify the action selection process. We use a two-track controller in this work, because the knowledge track is not used for action selection. We believe separating the knowledge and dialog tracks reduces the complexity of the entire framework, and we leave formal analysis to future work.

B. Semantic Parsing for Language Understanding

In order to understand natural language and make observations for POMDPs, we use a semantic parser that builds on the Semantic Parsing Framework (SPF) described in [1]. The input of the semantic parser is natural language from human users, and the output is a list of possible parses for a given sentence. Using the semantic parser, the request in natural language is transformed to a formal representation that computers understand.

Figure 3 shows an example of the parser recognizing a sentence. It can reason over the ontology of the known words when it parses a sentence, e.g., james:pe and coffee:it. The dialog manager can use this information to translate words into the corresponding observation for the question asked by the robot. If the language understanding fails (e.g., producing parses that are malformed or do not comply with the preceding questions), then a random observation from Z will be made for the unknown part of the request.
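The fallback behavior described above can be sketched as follows; `parse_to_observation` and the observation sets are hypothetical names used only for illustration:

```python
import random

# Illustrative sketch (not the paper's code) of turning a semantic parse
# into a POMDP observation, falling back to a random observation when the
# parse is malformed or does not answer the question that was asked.
def parse_to_observation(parse, expected_obs, rng=random):
    """expected_obs: the subset of Z relevant to the last question,
    e.g., Z_R after 'Whom is this delivery for?'."""
    if parse is not None and parse in expected_obs:
        return parse
    return rng.choice(list(expected_obs))  # failed parse -> random observation

Z_R = ["alice", "bob", "carol"]
assert parse_to_observation("alice", Z_R) == "alice"   # well-formed parse
assert parse_to_observation(None, Z_R) in Z_R          # malformed parse
```

The random fallback is what lets the POMDP belief update proceed even when understanding fails; repeated clarification questions then wash out the noise.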

C. Domain Knowledge Representation

We use Answer Set Prolog (ASP) [5], a declarative language, for knowledge representation. The agent's knowledge base (KB), in the form of an ASP program, includes rules in the form of:

l0 ← l1, · · · , ln, not ln+1, · · · , not ln+k

where l's are literals that represent whether a statement is true. The right side of a rule is the body, and the left side is the head. A rule without a body is called a fact.

The KB of our agent includes a set of objects in ASP: {alice, sandwich, kitchen, office1, delivery, · · · }, where delivery specifies the task type. A set of predicates, such as {recipient, item, task, room}, are used for specifying a category for each object.

Using the above predicates and objects, we can use rules to fully specify tasks, such as "deliver a coke to Alice":

task(delivery).

item(coke).

recipient(alice).

One can easily include more information into the ASP-based KB, such as rooms, positions of people, and a categorical tree of objects. This ASP-based KB is mainly used for responding to information queries, and task planning purposes, where the query and/or task are specified by our dialog agent.

D. Our Algorithm for Simultaneous Intention Estimation and Knowledge Augmentation

In this subsection, we first define a few terms and functions, and then introduce the main algorithm for simultaneously estimating human intention (service requests, in our case) and augmenting the agent's knowledge base. We use entropy to measure the uncertainty level of the agent's belief distribution:

H(b) = − ∑ i=0..n−1 b(si) · log b(si)

where H is the entropy function. When the agent is unconfident about the state, the entropy value is high; when it is confident, the entropy value is low. In particular, a uniform belief distribution corresponds to the highest entropy level. We use entropy for two purposes in our algorithm.
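The entropy measure can be computed directly from a belief distribution; this is a standard implementation sketch, not the paper's code:

```python
import math

# Standard implementation of the entropy of a belief distribution;
# 0 * log 0 is treated as 0 by skipping zero-probability states.
def belief_entropy(b):
    return -sum(p * math.log(p) for p in b if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty: H = log(4)
peaked = [0.97, 0.01, 0.01, 0.01]    # near-certain belief: low entropy
assert belief_entropy(uniform) > belief_entropy(peaked)
```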

Rewording Service Request: If the belief entropy is higher than threshold h (meaning the agent is highly uncertain about the dialog state), we encourage the human user to state the entire request in one sentence. Otherwise, the dialog manager decides the flow of the conversation.

Entropy Fluctuation: We use the concept of entropy to define entropy fluctuation (EF):

f(b) = sign( H(b[1]) − H(b[0]) ) ⊕ sign( H(b[2]) − H(b[1]) )

where b is a belief queue that includes the three most recent dialog beliefs, f(b) outputs true, if there is an EF in the last


[Figure 4: a plot of the entropy of the dialog belief (y-axis, 0.5–2.0) against the dialog turn (x-axis, 0–12), annotated with the dialog turns; the entropy fluctuates until Dennis is added to the KB, after which it decreases.]

Fig. 4. In this example the user requests a Pop for a novel recipient, Dennis, who is not yet included in the system's ontology. The dialog system makes an observation for Pop but does not understand Dennis, so it selects another person at random as an observation (in this case, Alice). The user denies this incorrect person choice, then confirms pop. The conversation continues with the system trying to figure out who the delivery is for, but it cannot yet reason about Dennis, who is absent from the ontology. When the number of EFs crosses a specified threshold (5 in this case), Dennis is added as a new ontological object. After that, the system asks some final confirmation questions to clear any confusion; for example, there may be some belief in Carol, which is cleared once the user replies no. The entropy continues to decrease until the request is identified successfully.

Algorithm 1 Simultaneous Intention Estimation and Knowledge Augmentation
Require: τb, h, ∆, M, M+, and a POMDP solver
 1: Initialize b, b+ with uniform distributions
 2: Initialize EF counter δ ← 0
 3: Initialize queue b of size 3 with {b, b, b}
 4: repeat
 5:   if Pr(sR = s̄R) > τb, where s̄R ∈ b+ then
 6:     Add a new recipient entity in KB
 7:   else if Pr(sI = s̄I) > τb, where s̄I ∈ b+ then
 8:     Add a new item entity in KB
 9:   else if δ > ∆ then
10:     Add the (item or recipient) entity that is more likely
11:   if f(b, t) is true then
12:     δ ← δ + 1
13:   if H(b) > h then
14:     a ← "Please reword your service request"
15:   else
16:     a ← π(b)
17:   o ← parse(human response)
18:   Update b based on observation o and action a
19:   b.enqueue(b)
20:   if a ∈ AC then
21:     b+ ← update(b+)
22: until s is term
23: return the request based on the last (reporting) action, and the (possibly updated) knowledge base.

three beliefs (i.e., the entropy of the second last is the highest or lowest among the three), and ⊕ is the xor operator. Figure 4 shows an example of EFs in a dialog.
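Under the definition above, an EF occurs exactly when the middle belief's entropy is a local peak or valley. A sketch with illustrative names (ties in the sign comparison are treated as non-increasing, an edge case the text does not specify):

```python
import math

def entropy(b):
    return -sum(p * math.log(p) for p in b if p > 0)

# f(b) = sign(H(b[1]) - H(b[0])) XOR sign(H(b[2]) - H(b[1])): true exactly
# when the entropy of the middle belief is a local peak or valley.
def entropy_fluctuation(bq):
    h0, h1, h2 = (entropy(b) for b in bq)
    return (h1 - h0 > 0) != (h2 - h1 > 0)

falling = [[0.2, 0.8], [0.1, 0.9], [0.05, 0.95]]  # entropy steadily decreasing
valley = [[0.2, 0.8], [0.05, 0.95], [0.3, 0.7]]   # entropy dips, then rises
assert not entropy_fluctuation(falling)
assert entropy_fluctuation(valley)
```

Counting such fluctuations (δ in Algorithm 1) is what signals that the dialog is oscillating rather than converging, which the agent takes as evidence that its KB is missing an entity.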

Algorithm 1 shows the main operation loop of the dialog system. τb is a probability threshold; h is an entropy threshold; ∆ is a threshold over the number of EFs; and M and M+ are POMDPs for dialog and knowledge management respectively.

The algorithm starts by initializing the two belief distributions uniformly. δ, which counts the number of EFs, is initialized to 0. If the marginal probability over s̄R (or s̄I) of knowledge belief b+ is higher than threshold τb, or the number of EFs is higher than ∆, we add a new entity into the KB. If the entropy of the dialog belief is higher than h, then the agent asks for a rewording of the original service request. Otherwise, we use the dialog POMDP to maintain the dialog flow. Finally, the knowledge belief is only updated by confirming questions, which are able to invalidate the agent's hypothesis of unknown entities. The algorithm returns the request and the updated knowledge base.
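Lines 5–10 of Algorithm 1 reduce to a small decision rule over the knowledge belief's marginals and the EF counter. A hedged sketch, with illustrative names and threshold values:

```python
# Sketch of Algorithm 1's augmentation test (lines 5-10). The function name,
# return labels, and default thresholds are illustrative, not the paper's.
# pr_unknown_recipient / pr_unknown_item are the knowledge belief's marginals
# over the unknown-recipient and unknown-item states.
def augmentation_decision(pr_unknown_recipient, pr_unknown_item,
                          ef_count, tau_b=0.6, delta=5):
    if pr_unknown_recipient > tau_b:
        return "add_recipient"
    if pr_unknown_item > tau_b:
        return "add_item"
    if ef_count > delta:
        # Add whichever unknown entity is currently more likely.
        return ("add_recipient"
                if pr_unknown_recipient >= pr_unknown_item else "add_item")
    return None  # keep asking questions; no augmentation yet

assert augmentation_decision(0.7, 0.1, 0) == "add_recipient"
assert augmentation_decision(0.2, 0.3, 6) == "add_item"
assert augmentation_decision(0.2, 0.1, 2) is None
```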

When adding a new entity, the agent explicitly asks the user to help specify the name of the unknown item (or person). The KB is updated along with the lexicon for the semantic parser. The index for the unknown item or person is associated with the newly added entry. With the new knowledge added to the KB, the model has to be adjusted in size so that the dialog can continue properly, and the belief b is replaced with b+, which provides a reinitialization to continue the conversation.

E. An Illustrative Example

Table I is an illustrative example of one of the trials in the human participant experiment. The semantic parser failed to correctly parse the request, because it does not know the meaning of get. As a result, the dialog agent made a random guess that it is "Alice requesting hamburger". The robot asks again for clarification on what it should deliver. The user responds "coffee", making the robot confused between coffee and hamburger. The robot asked a confirmation question about "hamburger", and received a negative response. Although the


[Figure 5 plots: QA Cost, Dialog Reward, and Overall Success (y-axes) against KB Size (x-axis: 10, 17, 26, 37), comparing the dual-track POMDP manager with the baseline learning agent.]

Fig. 5. Comparison between our dual-track POMDP-based agent, and a baseline learning agent that augments its knowledge base (KB) if the dialog does not terminate in N = 5 turns. The x-axis is the "KB size", measured by the number of states in the dialog state space. Results of "QA Cost" and "Dialog Reward" are only based on dialog management, whereas we consider both dialog and knowledge management in "Overall Success".

TABLE I
AN EXAMPLE DIALOG FROM A HUMAN PARTICIPANT.

Robot: How can I help you?
Human: Get me coffee
Robot: What item should I bring?
Human: Coffee
Robot: Do you want me to deliver hamburger?
Human: No
Robot: Who should I bring the item to?
Human: Nate
Robot: Is this delivery for ellen?
Human: No
Robot: It seems I do not know the person you are talking about. Please write their name so I can learn it.
Human: Nate
Robot: Who should I bring the item to?
Human: Nate
Robot: Is this delivery for Nate?
Human: Yes
Robot: What item should I bring?
Human: Coffee
Execute: Robot brings coffee for Nate and the dialog is over.

user was explicitly guided to select the recipient among the people, this participant used the pronoun "me", an unknown word to the robot. When the robot heard "Nate" the first time, it could not understand, and made a random guess that it is "Ellen". As a result, the agent confirmed "Ellen" afterwards, and was invalidated by the user. So it kept asking for clarifications until it added the person's name to its knowledge and became confident about the request.

IV. EXPERIMENTS

For the experiments, we used a Segway-based robot platform, pictured in Figure 1, which is equipped with a screen that displays questions and responses to the user. For the purposes of our experiments, the participants used a wireless keyboard and the mounted screen to communicate with the robot. The dialog agent was implemented using the Robot Operating System (ROS) [20]. In simulation experiments, we model the uncertainty in natural language understanding by adding noise to the observations. For instance, the agent can correctly recognize "coffee" with 0.8 probability, and this probability decreases given more items in the KB. When the user verbalizes the entire request, the agent receives a sequence of three (unreliable) observations on task, item and recipient in a row. The cost of confirming questions is RC = −1, and the cost of wh-questions is RW = −1.5. In each trial, a request is randomly selected with 50% chance of including an unknown item or person. POMDPs are solved using an off-the-shelf system [10].
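The simulated observation noise can be sketched as follows; the function and parameter names are illustrative, with the 0.8 probability taken from the setup described above:

```python
import random

# Sketch of the simulated observation noise: with probability p_correct the
# true value is observed; otherwise a uniformly random other candidate from
# the same observation set is returned. Names are illustrative.
def noisy_observation(true_value, candidates, p_correct=0.8, rng=random):
    if rng.random() < p_correct:
        return true_value
    others = [c for c in candidates if c != true_value]
    return rng.choice(others) if others else true_value

rng = random.Random(0)
obs = [noisy_observation("coffee", ["coffee", "coke", "soda"], 0.8, rng)
       for _ in range(1000)]
accuracy = obs.count("coffee") / len(obs)  # close to 0.8
```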

Experiments were designed to evaluate the following hypotheses: 1) In comparison to a baseline that augments the KB after the number of turns exceeds a threshold, our algorithm reduces the QA cost while maintaining similar success rates. 2) A dialog agent that learns from human-robot conversations creates a better user experience than agents that use static KBs.

Evaluation metrics used in the experiments consist of: QA Cost, the total cost of QA actions; Overall Success, where a trial is successful if the service request is correctly identified and (if needed) the KB is correctly augmented; and Dialog Reward, where both the QA cost and the bonus/penalty are considered. Focusing on knowledge augmentation accuracy, we also use the F1 score, the harmonic mean of precision and recall in knowledge management.
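For concreteness, the F1 metric over knowledge-management decisions can be computed as below; the counts are hypothetical, where a "positive" means the agent added an entity to its KB.

```python
def f1_score(tp, fp, fn):
    """F1 over knowledge-augmentation decisions: tp = correct additions,
    fp = spurious additions, fn = missed additions."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 8 correct additions, 2 spurious, 2 missed (hypothetical counts)
print(round(f1_score(8, 2, 2), 2))  # 0.8
```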

A. Experiments in Simulation

Figure 5 shows the results of comparing our dual-track POMDP-based agent with a baseline that augments its KB if the human intention cannot be identified within N = 5 turns (Hypothesis 1). As the domain size increases, our dialog agent performs consistently better than the baseline in Overall Success. From Figure 5-Left, we see that our agent asks more questions in larger domains, until it finds the domain (with KB size 37) too large for further questions to be worthwhile. From Figure 5-Middle, we see that both methods produce lower dialog reward in larger domains, because it is harder to become confident that new knowledge is needed to achieve the dialog goal.

We further evaluated the effect of the EF threshold ∆ on system performance. Intuitively, a higher EF threshold makes it more difficult to introduce new entities into the knowledge base, resulting in more conservative behavior. Figure 6 shows the results, where we can see that a small value of ∆ encourages the agent to add new knowledge even when it is not very confident that doing so is necessary. The effects of this behavior are reflected in the "F1 Score" subfigure on the right. Given a higher ∆ threshold, the agent is more conservative in knowledge augmentation and achieves higher F1 scores. The downside of a higher ∆ is that the agent fails in more cases where knowledge augmentation is needed but does not occur.
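The gating role of ∆ can be sketched as follows; the evidence value and threshold numbers are illustrative, not the paper's actual belief computation.

```python
def should_augment(evidence, delta):
    """Add a new entity to the KB only when the accumulated evidence that
    the referenced entity is unknown exceeds the threshold delta."""
    return evidence > delta

evidence = 0.6  # hypothetical accumulated evidence for "entity is unknown"
# low delta: eager augmentation (higher recall, lower precision)
print(should_augment(evidence, delta=0.4))  # True
# high delta: conservative augmentation (higher precision, more missed additions)
print(should_augment(evidence, delta=0.8))  # False
```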


[Figure 6 plots, against the number of EFs: QA Cost, Overall Success, Dialog Reward, and F1 Score / Precision / Recall.]

Fig. 6. Simulation experiment on a fixed-size domain with a varying EF threshold. As the EF threshold increases, the agent becomes more conservative in adding new knowledge, which lengthens the conversation while producing higher F1 scores.

Fig. 7. Left: the Segway RMP 110 used in the experiment. Right: the lists of items and recipients that participants used to request deliveries. Two items and two people were unknown to the robot.

B. Experiments with Human Participants

Twelve students of ages 19-30 volunteered to participate in an experiment where they could ask the robot to conduct delivery tasks. Two entries on each of the item and recipient lists were unknown to the robot, so about 49% of the possible service requests (i.e., 1 − 5/7 × 5/7) involved an unknown item or person and could not be identified without knowledge augmentation. The participants were not aware of this setting, and arbitrarily chose any combination of an item and a recipient to form a delivery task. Each participant conducted the experiment in two trials: in one, the robot used the dual-track POMDPs, and in the other, the robot used a baseline dialog agent with a static KB. The delivery items and recipients are shown in Figure 7.
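The 49% figure follows directly from the list sizes:

```python
# 5 of the 7 items and 5 of the 7 recipients were known to the robot, so a
# random item-recipient pair is fully known with probability (5/7) * (5/7)
p_fully_known = (5 / 7) * (5 / 7)
p_needs_augmentation = 1 - p_fully_known
print(round(p_needs_augmentation, 2))  # 0.49
```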

At the end of the experiment, participants were required to fill out a survey form indicating their qualitative opinions on the following items. The response choices were 0 (Strongly disagree), 1 (Somewhat disagree), 2 (Neutral), 3 (Somewhat agree), and 4 (Strongly agree).

1) How easily the tasks were defined;
2) How well the robot understood the human request;
3) Whether the robot frustrated the participant or not; and
4) The participant's willingness to use the robot in the future.

Table II shows the average scores. We see that, with a p-value threshold of 0.1, our dual-track POMDP approach makes significant improvements in the responses to Q3 ("robot frustrated

TABLE II
RESULTS OF THE HUMAN PARTICIPANT EXPERIMENT.

                                         Q1     Q2     Q3     Q4
Average score using dual-track POMDP     3.42   2.50   1.50   2.50
Average score of baseline                3.33   1.83   2.17   1.75

Fig. 8. Results of the survey from participants. The survey consisted of four statements with Likert-scale responses.

me”) and Q4 (“Will use the robot”). There is no significancedifference observed in responses to the other two questions.Figure 8 presents more details from the survey papers.

V. CONCLUSIONS AND FUTURE WORK

We introduced a dialog agent that simultaneously supports human intention identification and knowledge augmentation on an as-needed basis. Experiments show that our dual-track POMDP controller enables the agent to perform both dialog and knowledge management. In comparison to a baseline that augments its knowledge base after a fixed number of turns, our dialog agent consistently produces higher overall dialog success. Experiments with human participants show that humans felt less frustrated by, and more willing to use, our agent. This dialog agent can be particularly useful for robot platforms that work in open domains, where pre-programming a knowledge base is impractical. This includes places such as offices, factories, and hospitals, where the sets of items and people may change over time.

In the future, we plan to enable the agent to augment knowledge bases with more complex structures, e.g., to model subclasses of items. Such structures make knowledge management more difficult. Another direction is to enable the


agent to learn and merge synonyms via human-robot dialog, and to remove incorrect (or unnecessary) concepts on an as-needed basis. In this process, the agent may want to actively introduce "redundant" dialog turns for knowledge acquisition and clarification purposes. Finally, we plan to experiment with more human participants and in more dynamic environments.

REFERENCES

[1] Yoav Artzi. Cornell SPF: Cornell semantic parsing framework. arXiv preprint arXiv:1311.3011, 2013.

[2] Amos Azaria, Jayant Krishnamurthy, and Tom M. Mitchell. Instructable intelligent personal agent. In AAAI, pages 2681–2689, 2016.

[3] Yingfeng Chen, Feng Wu, Wei Shuai, Ningyang Wang, Rongya Chen, and Xiaoping Chen. Kejia robot – an attractive shopping mall guider. In International Conference on Social Robotics, pages 145–154. Springer, 2015.

[4] Heriberto Cuayahuitl. SimpleDS: A simple deep reinforcement learning dialogue system. In Dialogues with Social Robots, pages 109–118. Springer, 2017.

[5] Michael Gelfond and Yulia Kahl. Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach. Cambridge University Press, 2014.

[6] Ze Gong and Yu Zhang. Temporal spatial inverse semantics for robots communicating with humans. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2018.

[7] Nick Hawes, Christopher Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrova, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Kortner, et al. The STRANDS project: Long-term autonomy in everyday environments. IEEE Robotics & Automation Magazine, 24(3):146–156, 2017.

[8] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.

[9] Piyush Khandelwal, Shiqi Zhang, Jivko Sinapov, Matteo Leonetti, Jesse Thomason, Fangkai Yang, Ilaria Gori, Maxwell Svetlik, Priyanka Khante, Vladimir Lifschitz, et al. BWIbots: A platform for bridging the gap between AI and human–robot interaction research. The International Journal of Robotics Research, 36(5-7):635–659, 2017.

[10] Hanna Kurniawati, David Hsu, and Wee Sun Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, Zurich, Switzerland, 2008.

[11] Dongcai Lu, Shiqi Zhang, Peter Stone, and Xiaoping Chen. Leveraging commonsense reasoning and multimodal perception for robot spoken dialog systems. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6582–6588. IEEE, 2017.

[12] Dongcai Lu, Yi Zhou, Feng Wu, Zhao Zhang, and Xiaoping Chen. Integrating answer set programming with semantic dictionaries for robot task planning. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 4361–4367. AAAI Press, 2017.

[13] Cynthia Matuszek, Evan Herbst, Luke Zettlemoyer, and Dieter Fox. Learning to parse natural language commands to a robot control system. In Experimental Robotics, pages 403–415. Springer, 2013.

[14] Cetin Mericli, Steven D. Klee, Jack Paparian, and Manuela Veloso. An interactive approach for situated task specification through verbal instructions. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, pages 1069–1076. International Foundation for Autonomous Agents and Multiagent Systems, 2014.

[15] Dipendra K. Misra, Jaeyong Sung, Kevin Lee, and Ashutosh Saxena. Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions. The International Journal of Robotics Research, 35(1-3):281–300, 2016.

[16] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.

[17] Aishwarya Padmakumar, Jesse Thomason, and Raymond J. Mooney. Integrated learning of dialog strategies and semantic parsing. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 547–557, 2017.

[18] Vittorio Perera, Robin Soetens, Thomas Kollar, Mehdi Samadi, Yichao Sun, Daniele Nardi, Rene van de Molengraft, and Manuela Veloso. Learning task knowledge from dialog and web access. Robotics, 4(2):223–252, 2015.

[19] Martin L. Puterman. Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 2014.

[20] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y. Ng. ROS: An open-source robot operating system. In ICRA Workshop on Open Source Software, volume 3, page 5. Kobe, Japan, 2009.

[21] Lanbo She and Joyce Chai. Interactive learning of grounded verb semantics towards human-robot communication. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1634–1644, 2017.

[22] Satinder Singh, Diane Litman, Michael Kearns, and Marilyn Walker. Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16:105–133, 2002.

[23] Michael Spranger and Luc Steels. Co-acquisition of syntax and semantics: An investigation in spatial language. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 1909–1915. AAAI Press, 2015.

[24] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, Cambridge, 1998.

[25] Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew R. Walter, Ashis Gopal Banerjee, Seth J. Teller, and Nicholas Roy. Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI, volume 1, page 2, 2011.

[26] Jesse Thomason, Shiqi Zhang, Raymond J. Mooney, and Peter Stone. Learning to interpret natural language commands through human-robot dialog. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 1923–1929. AAAI Press, 2015.

[27] Mycal Tucker, Derya Aksaray, Rohan Paul, Gregory J. Stein, and Nicholas Roy. Learning unknown groundings for natural language interaction with mobile robots. In International Symposium on Robotics Research (ISRR), 2017.

[28] Manuela Veloso, Joydeep Biswas, Brian Coltin, and Stephanie Rosenthal. CoBots: Robust symbiotic autonomous mobile service robots. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 4423–4429. AAAI Press, 2015.

[29] Jason D. Williams and Geoffrey Zweig. End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning. arXiv preprint arXiv:1606.01269, 2016.

[30] Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013.

[31] Shiqi Zhang and Peter Stone. CORPP: Commonsense reasoning and probabilistic planning, as applied to dialog with a mobile robot. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 1394–1400. AAAI Press, 2015.

[32] Shiqi Zhang, Piyush Khandelwal, and Peter Stone. Dynamically constructed (PO)MDPs for adaptive robot planning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 3855–3862, 2017.