
Towards Efficient Human-Robot Dialogue Collection: Moving Fido into the Virtual World

Cassidy Henry1, Pooja Moolchandani1, Kimberly A. Pollard1, Claire Bonial1, Ashley Foots1, Ron Artstein2, Cory Hayes1, Clare R. Voss1, David Traum2, and Matthew Marge1

1U.S. Army Research Laboratory, Adelphi, MD 20783
2USC Institute for Creative Technologies, Playa Vista, CA 90094

[email protected]

Abstract

Our research aims to develop a natural dialogue interface between robots and humans. We describe two focused efforts to increase data collection efficiency towards this end: creation of an annotated corpus of interaction data, and a robot simulation, allowing greater flexibility in when and where we can run experiments.

1 Introduction

Effective human-robot (H-R) teaming requires robots to engage in natural communication. Natural language (NL) dialogue allows bi-directional information exchange, with the benefit of familiarity and flexibility for humans. Our goal is to develop dialogue processing capabilities for an automated robot receiving instruction from a remote human teammate in a collaborative search-and-navigate task. The physical robot, affectionately nicknamed Fido by a participant, is a Clearpath Robotics Jackal running ROS (Quigley et al., 2009).

We follow a multiphase Wizard-of-Oz (WoZ) data collection approach (Marge et al., 2016a) to bootstrap the robot's planned language capabilities, as the technology to support such teaming interactions does not yet exist. The solution cannot simply decompose into autonomous robot control and language processing. It cannot be fully autonomous, as it must respond to and integrate instructions, questions, and information from human teammates into its plans. Natural language processing (NLP) relies on situated interaction grounded in the dynamic state and the robot's action, perception, and inferential capabilities; it can be neither as simple as translation to a rigid command language nor as extensive as requiring the full range of human reasoning power and common-sense knowledge.

Figure 1: WoZ experiment setup, in which a CMD gives vocal instruction to a remote robot, represented by the two-wizard setup.

We use a two-wizard setup (Figure 1) to address this interdependence, allowing separate simulation of both NL interaction, based on flexible but limited robot intelligence, and navigation controls (Marge et al., 2016b). The Dialogue Manager (DM) listens to Commander (CMD) speech, then either types back dialogue replies or types constrained action sets to the Robot Navigator (RN), who joysticks the robot.
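For concreteness, the following minimal sketch shows one way the DM's routing decision could eventually be implemented once automated. It is our illustration, not the system's actual code: the ROS topic names and the keyword heuristic are assumptions.

```python
# Hypothetical sketch of the two-wizard message flow (not the authors' code).
import rospy
from std_msgs.msg import String

class DialogueManager:
    """Routes Commander (CMD) speech: executable instructions go to the
    Robot Navigator (RN) as constrained actions; everything else gets a
    dialogue reply back to the CMD."""

    def __init__(self):
        rospy.init_node("dm_wizard")
        # Topic names are hypothetical placeholders.
        self.to_cmd = rospy.Publisher("/dm/reply", String, queue_size=10)
        self.to_rn = rospy.Publisher("/dm/action", String, queue_size=10)
        rospy.Subscriber("/cmd/speech", String, self.on_speech)

    def on_speech(self, msg):
        text = msg.data.lower()
        # Toy stand-in for the DM's judgment of whether an instruction
        # is executable as-is.
        if any(w in text for w in ("move", "turn", "go", "stop")):
            self.to_rn.publish(String(data=text))   # constrained action
            self.to_cmd.publish(String(data="executing..."))
        else:
            self.to_cmd.publish(String(data="Could you rephrase that?"))

if __name__ == "__main__":
    DialogueManager()
    rospy.spin()
```

In the wizarded phases this decision is made by a human; the sketch only illustrates the interface an automated DM would eventually fill.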

The first experiment phase revealed several data collection challenges, described in Section 2. In Section 3, we describe current efforts to annotate dialogue data, to be used to scope requirements for automated action and interaction, and to provide training data for automating NLP components. In Section 4, we describe ongoing robot simulator development, which provides the ability to collect additional training data more efficiently for both dialogue processing and robot navigation.

2 Challenges of H-R Dialogue Collection

Substantial resources are required for data collection. Due to shared test space and a single robot, we could run only one participant at a time. Four researchers are needed for each participant for a two-hour time block. The location dependence restricts the participant pool, making sufficient participant recruitment difficult. We found that more data is necessary both to capture more natural variation in participant language use, and to collect sufficient training data to automate dialogue processing and robot navigation capabilities.

Two focused efforts enable us to address these issues: development of an annotated corpus of language interactions (to automate more aspects of the language process and reduce human wizard labor), and a virtual simulation replicating our physical environment (allowing greater flexibility in when and where we can run experiments).

3 Annotation of H-R Dialogue

We collected ∼10.5 hours of CMD speech in total in phase one, along with the DM text messages and RN feedback. Utterances were segmented in Praat (Boersma, 2001) on a per-command basis, each consisting of a single action or answer. There were 1668 total commands across all ten participants. Initial corpus processing included speech transcription and manual time alignment of the four streams to enable analysis of utterance relationships. Figure 2 shows the four streams (two typed, two spoken) from three interlocutors and contains: instructions, translations to the RN, feedback, clarification, and question answering. We performed several annotations on the data, including those for dialogue moves (Marge et al., 2017), structure, and relations.

Figure 2: Excerpt of dialogue corpus, showing four message streams.
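As an illustration of the time-alignment step described above (a sketch of the idea, not our actual processing scripts; the stream names and example messages are invented), the four streams can be interleaved into a single time-ordered transcript once each message carries a start time:

```python
# Toy sketch: interleave the four annotated streams by start time.
def merge_streams(streams):
    """streams: dict mapping stream name to a list of (start_sec, text).
    Returns all messages as (start_sec, stream, text), time-ordered."""
    events = [(t, name, text)
              for name, items in streams.items()
              for t, text in items]
    return sorted(events)

transcript = merge_streams({
    "CMD speech":  [(12.4, "move forward three feet")],
    "DM->RN":      [(14.1, "forward 3 ft")],
    "RN feedback": [(16.0, "done")],
    "DM->CMD":     [(16.5, "done")],
})
for t, stream, text in transcript:
    print(f"{t:6.1f}s  {stream:11s}  {text}")
```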

Each typed DM utterance was labeled with a dialogue move. The dialogue move categories were analyzed and condensed to form a best-representative reply for each category. We condensed DM utterances into a smaller set of specific utterances and utterance templates, which were used to build a GUI for the second phase that begins to automate language production by providing tractable response data. It transforms a fully-generative task into one of selecting a response and specifying a few details, speeding up response time and lowering the effort required to produce complex language instructions and feedback.
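A minimal sketch of the template idea behind the GUI follows; the template inventory and slot names here are invented for illustration, whereas the actual condensed set is derived from the annotated data.

```python
# Invented examples of condensed DM responses: fixed strings plus
# templates with a few fillable slots, so the wizard selects and
# specifies rather than composing free text.
TEMPLATES = {
    "ack":      "executing...",
    "done":     "done",
    "clarify":  "which {landmark} do you mean?",
    "describe": "I see {observation}",
}

def realize(move, **slots):
    """Pick a response template by dialogue move and fill its slots."""
    return TEMPLATES[move].format(**slots)

print(realize("clarify", landmark="doorway"))
print(realize("describe", observation="a calendar on the wall"))
```

Selection plus slot-filling is what makes the wizard's task faster and, later, learnable by an automated component.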

Utterance data is being annotated and analyzed to automate the NL understanding and dialogue management modules, relying on observed behavior patterns to generate, tune, and evaluate policies for responding to the participant and translating instructions to the navigation component.

4 Moving “Fido” Into the Virtual World

Our simulation setup aims to reduce the requirements to run our research program's next phase and collect more dialogue data. We developed high-fidelity replications of the robot and physical environment using ROS and Gazebo (Koenig and Howard, 2004).
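The following sketch shows one standard way to place a robot model into a running Gazebo world via the gazebo_ros spawn service. It is illustrative rather than our actual launch configuration; the URDF path and start pose are placeholders.

```python
# Illustrative only: spawn a robot model into a running Gazebo world
# using the standard gazebo_ros service.
import rospy
from gazebo_msgs.srv import SpawnModel
from geometry_msgs.msg import Pose

rospy.init_node("spawn_virtual_jackal")
rospy.wait_for_service("/gazebo/spawn_urdf_model")
spawn = rospy.ServiceProxy("/gazebo/spawn_urdf_model", SpawnModel)

with open("jackal.urdf") as f:   # placeholder path to the robot model
    urdf = f.read()

pose = Pose()                    # start pose in the replicated world
pose.position.x, pose.position.y = 0.0, 0.0
spawn(model_name="jackal", model_xml=urdf, robot_namespace="/jackal",
      initial_pose=pose, reference_frame="world")
```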

The virtual Jackal was equipped with the same sensors as the physical platform. The simulated setup will host the same task, and the GUI display to the CMD is the same as in earlier experiments (see Figure 1, top). A point-and-click navigation system was included alongside the virtual robot, which, along with the GUI, enables a single wizard to perform both dialogue management and navigation.
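As a sketch of how point-and-click navigation can work in ROS (this assumes the standard navigation stack and RViz's Publish Point tool; our tool may differ in detail), a clicked map point can be turned into a move_base goal:

```python
# Sketch: convert a clicked map point into a navigation goal, so one
# wizard can drive by clicking instead of joysticking. Assumes the
# standard ROS navigation stack (move_base) is running.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
from geometry_msgs.msg import PointStamped

def on_click(point):
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position = point.point
    goal.target_pose.pose.orientation.w = 1.0   # arbitrary final heading
    client.send_goal(goal)

rospy.init_node("point_and_click_nav")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()
# RViz's "Publish Point" tool publishes PointStamped on /clicked_point.
rospy.Subscriber("/clicked_point", PointStamped, on_click)
rospy.spin()
```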

Simulation thus provides several advantages for data collection: (i) faster, parallel data collection, (ii) experimentation at multiple sites without physical robots, and (iii) simpler control for human operators. With the robot and simulation operating on the same software and emulated hardware, we expect the experiment will transition smoothly back into a physical environment for validation purposes.

Figure 3: Differences across experiment phases, with increasing numbers of participants.

5 Conclusion

Our overall program objective is to provide more natural ways for humans to interact and communicate with robots using language, utilizing multiphase data collection experiments to incrementally automate the system, towards the ultimate goal of full automation. We highlighted two focused efforts to increase data collection efficiency: a corpus creation/GUI development effort, and robot simulation. This corpus will help address many issues encountered in understanding and processing situated H-R dialogue. The robot simulation replicates our physical environment while allowing greater flexibility in running experiments, and allows validation of simulated results in a physical environment after data collection is complete.

References

Paul Boersma. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10):341–345.

Nathan Koenig and Andrew Howard. 2004. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the International Conference on Intelligent Robots and Systems. IEEE.

Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, and Clare Voss. 2016a. Applying the Wizard-of-Oz technique to multimodal human-robot dialogue. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).

Matthew Marge, Claire Bonial, Ashley Foots, Cory Hayes, Cassidy Henry, Kimberly A. Pollard, Ron Artstein, Clare R. Voss, and David Traum. 2017. Investigating variation of natural human commands to a robot in a collaborative navigation task. In Proceedings of RoboNLP: Language Grounding for Robotics.

Matthew Marge, Claire Bonial, Kimberly A. Pollard, Ron Artstein, Brendan Byrne, Susan G. Hill, Clare Voss, and David Traum. 2016b. Assessing agreement in human-robot dialogue strategies: A tale of two wizards. In Proceedings of the Sixteenth International Conference on Intelligent Virtual Agents.

Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y. Ng. 2009. ROS: an open-source Robot Operating System. In Proceedings of the IEEE International Conference on Robotics and Automation. IEEE.