Paper submission for HICSS-36
Emerging Technologies Track
Augmented Cognition and Human-Robot Interaction Minitrack

Linking Perception and Action in a Control Architecture for Human-Robot Domains

Monica N. Nicolescu and Maja J Mataric
Robotics Research Laboratory
University of Southern California
941 West 37th Place, MC 0781
Los Angeles, CA 90089-0781
E-mail: monica|[email protected]
Tel: (213) 740-6243, Fax: (213) 740-7512
Abstract

Human-robot interaction is a growing research domain; there are many approaches to robot design, depending on the particular aspects of interaction being focused on. In this paper we present an action-based framework that provides a natural means for robots to interact with humans and to learn from them. Perception and action are the essential means for a robot's interaction with the environment; for successful robot performance it is thus important to exploit this relation between a robot and its environment. Our approach links perception and actions in a unique architecture for representing a robot's skills (behaviors). We use this architecture to endow the robots with the ability to convey their intentions by acting upon their environment, and also to learn to perform complex tasks from observing and experiencing a demonstration by a human teacher. We demonstrate these concepts with a Pioneer 2DX mobile robot, learning various tasks from a human and, when needed, interacting with a human to get help by conveying its intentions through actions.

Keywords: Robotics, Learning and Human-Robot Interaction
1 Introduction

Human-robot interaction is an area of growing interest in Robotics. Environments that feature the interaction of humans and robots present a significant number of challenges, spawning several important research directions. These domains of human-machine co-existence form a new type of "society" in which the robot's role is essential in determining the nature of the resulting interactions. In this work we focus on two major challenges of key importance for designing robots that will be effective in human-robot domains.

The first challenge we address is the design of robots that exhibit social behavior, in order to allow them to engage in various types of interactions. This is a very large domain, with examples including teachers [1], [2], [3], workers, and members of a team cooperating with other robots and people to solve and perform tasks [4]. Robots can be entertainers, such as museum tour-guides [5], toys [6], pets, or emotional companions [7]. Designing control architectures for such robots presents particular challenges, in large part specific to each of these domains.

The second challenge we address is to build robots that have the ability to learn through social interaction with humans or with other robots in the environment, in order to improve their performance and expand their capabilities. Successful examples include robots imitating demonstrated tasks (such as maze learning [8] and juggling [9]) and the use of natural cues (such as models of joint attention [10]) as means for social interaction.
In this paper we present an approach that unifies the two challenges above, interaction and learning in human-robot environments, by unifying perception and action in the form of action-based interaction. Our approach relies on an architecture that is based on a set of behaviors, or skills, consisting of both active and perceptual components.

The perceptual component of a behavior gives the robot the capability of creating a link between its observations and its own actions, which enables it to learn to perform a particular task from the experiences it had while interacting with humans.

The active component of a robot behavior allows the use of implicit communication, which does not rely on a symbolic language and instead uses actions, whose outcomes are invariant to the specific body performing them. A robot can thus convey its intentions by suggesting them through actions, rather than communicating them through conventional signs, sounds, gestures, or marks with previously agreed-upon meanings. We employ these actions as a vocabulary that a robot can use to induce a human to assist it with parts of tasks that it is not able to perform on its own. The particularities of our behavior architecture are described in Section 2.
To illustrate our approach, we present experiments in which a human acts both as a teacher and a collaborator for a mobile robot. The different aspects of this interaction help demonstrate the robot's learning and social abilities.

This paper is organized as follows. Section 2 presents the behavior representation that we are using and the importance of the architecture for our proposed challenges. In Section 3, we present the model for human-robot interaction and the general strategy for communicating intentions, including experiments in which a robot engaged a human in interaction through actions indicative of its intentions. Section 4 describes the method for learning task representations from experienced interactions with humans and presents experimental demonstrations and validation of learning task representations from demonstration. Sections 5 and 6 discuss related approaches and present the conclusions of the described work.
2 Behavior representation

Perception and action are the essential means of interaction with the environment. The performance and the capabilities of a robot are dependent on its available actions, which are thus an essential component of its design. As the underlying control architecture we use a behavior-based approach [11, 12], in which time-extended actions that achieve or maintain a particular goal are grouped into behaviors, the key building blocks for intelligent, complex observable behavior. The complexity of a robot's skills can range from elementary actions (such as "go forward", "turn left") to temporally-extended behaviors (such as "follow", "go home", etc.).

Figure 1: Structure of the inputs/outputs of an abstract and primitive behavior.

Within our architecture, behaviors are built from two components: one related to perception (abstract behavior), the other to actions (primitive behavior) (Figure 1). The abstract behavior is simply an explicit specification of the behavior's activation conditions (i.e., preconditions) and its effects (i.e., postconditions). The behaviors that do the work of achieving the specified effects under the given conditions are called primitive behaviors. An abstract behavior takes sensory information from the environment and, when its preconditions are met, activates the corresponding primitive behavior(s), which achieve the effects specified in its postconditions.

This architecture provides a simple and natural way of representing robot tasks in the form of behavior networks [13], and also has the flexibility required for robust functioning in dynamically changing environments. Figure 2 shows a generic behavior network.
Figure 2: Example of a behavior network.
The abstract behaviors embed representations of a behavior's goals in the form of abstracted environmental states. This is a key feature of our architecture, and a critical aspect for learning from experience. In order to learn a task, the robot has to create a link between its perception (observations) and the actions that would achieve the same observed effects. This process is enabled by the abstract behaviors, the perceptual component of a behavior. This component fires each time the observations match a primitive's goals, allowing the robot to identify, during its experience, the behaviors that are relevant for the task being learned.
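To make this pairing concrete, the following is a minimal sketch in Python (the paper's implementation uses AYLLU/C; the class and method names here are our own illustration, not the actual API) of how an abstract behavior gates its primitive on preconditions and reports goal achievement from postconditions:

    class AbstractBehavior:
        """Perceptual component: an explicit pre/postcondition specification."""
        def __init__(self, preconditions, postconditions, primitive):
            self.preconditions = preconditions    # predicates over sensed world state
            self.postconditions = postconditions  # predicates describing the goal
            self.primitive = primitive            # the active component it gates

        def goal_achieved(self, world):
            # Fires whenever current observations match the primitive's goals;
            # during a demonstration this marks the behavior as relevant.
            return all(p(world) for p in self.postconditions)

        def step(self, world):
            # Activate the primitive only while the preconditions hold and the
            # desired effects have not yet been achieved.
            if all(p(world) for p in self.preconditions) and not self.goal_achieved(world):
                self.primitive.execute(world)

    class PrimitiveBehavior:
        """Active component: issues the motor commands that achieve the effects."""
        def execute(self, world):
            raise NotImplementedError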
The primitive behaviors are the active component of a behavior, executing the robot's actions and achieving its goals. Acting in the environment is a form of implicit communication that plays a key role in human interaction. Using evocative actions, people (and other animals) convey emotions, desires, interests, and intentions. Action-based communication has the advantage that it need not be restricted to robots or agents with a humanoid body or face: structural similarities between the interacting agents are not required to achieve successful interaction. Even if there is no exact mapping between a mobile robot's physical characteristics and those of a human user, the robot may still be able to convey a message, since communication through action also draws on human common sense [14]. In the next section we describe how our approach achieves this type of communication.
3 Communication by acting - a means for robot-human interaction

Our goal is to develop a model of interaction with humans that would allow a robot to induce a human to assist it, by being able to express its intentions in a way that humans can easily understand. We first present a general example that illustrates the basic idea of our approach.
Consider a prelinguistic child who wants a toy that is out of his reach. To get it, the child will try to bring a grown-up to the toy and will then point and even try to reach it, indicating his intentions. Similarly, a dog will run back and forth to induce its owner to come to a place where it has found something it desires. The ability of the child and the dog to demonstrate their intentions by calling a helper and mock-executing an action is an expressive and natural way to communicate a problem and a need for help. The capacity of a human observer to understand these intentions from exhibited behavior is also natural, since the actions carry intentional meanings and thus are easy to understand.
We apply the same strategy in the robot domain. The action-based communication approach we propose for the purpose of suggesting intentions is general and can be applied across different tasks and physical bodies/platforms. In our approach, a robot performs its task independently, but if it fails in a cognizant fashion, it searches for a human and attempts to induce him to follow it to the place where the failure occurred, where it demonstrates its intentions in hopes of obtaining help. Next, we describe how this communication is achieved.
Immediately after a failure, the robot saves the current state of the task execution (the failure context), in order to be able to later restart execution from that point.
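As a minimal sketch, the saved failure context need only record which behaviors have completed and which one failed; the field names below are illustrative, not taken from the paper:

    from dataclasses import dataclass

    @dataclass
    class FailureContext:
        task_network: str     # which behavior network was being executed
        executed: list        # behaviors whose effects were achieved before the failure
        failed_behavior: str  # the behavior whose execution failed

    # Reloading this record later lets the robot resume the task from the
    # failed step rather than restarting from the beginning.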
Figure 3: Behavior network for calling a human, consisting of two Track instances, Track(Human, 90, 50) and Track(Human, 90, 100), together with an Initialize behavior.
Next, the robot starts the process of finding and luring a human to help. This is implemented as a behavior-based system which uses two instances of a Track(Human, angle, distance) behavior, with different values of the Distance parameter: one for getting close (50 cm) and one for getting farther (1 m) (Figure 3). As part of the first tracking behavior, the robot searches for and follows a human until he stops and the robot gets sufficiently close. At that point, the preconditions for the second tracking behavior become active, so the robot backs up in order to get to the farther distance. Once the outcomes of this behavior have been achieved (and detected by the Init behavior), the robot re-instantiates the network, resulting in a back-and-forth cycling behavior, much like a dog's behavior for enticing a human to follow it. When the detected distance between the robot and the human stays smaller than the value of the Distance parameter of either of its Track behaviors for some period of time, the cycling behavior is terminated.
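A rough sketch of this cycling logic in Python, with assumed helper methods (track_human, human_closer_than) standing in for the two Track instances and the Init behavior:

    CLOSE_CM, FAR_CM = 50, 100   # the two values of the Distance parameter

    def lure_human(robot, patience_s=5.0):
        while True:
            robot.track_human(angle=90, distance=CLOSE_CM)  # approach until close
            robot.track_human(angle=90, distance=FAR_CM)    # back up to the farther range
            # The cycle ends once the human stays within the Track distance for
            # some period of time, i.e., he is following the robot.
            if robot.human_closer_than(FAR_CM, duration=patience_s):
                break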
The Track behavior enables the robot to follow colored targets at any distance in the [30, 200] cm range and any angle in the [0, 180] degree range. The information from the camera is merged with data from the laser range-finder in order to allow the robot to track targets that are outside of its visual field (see Figure 4). The robot uses the camera to first detect the target, and then to track it after it goes out of the visual field. As long as the target is visible to the camera, the robot uses the target's position in the visual field (x_img) to infer an approximate angle to the target, α_target (the "approximation" in the angle comes from the fact that we are not using precisely calibrated data from the camera, and we compute the angle without taking into consideration the distance to the target). We get the real distance to the target, d_target, from the laser reading in a small neighborhood of the angle α_target. When the target disappears from the visual field, we continue to track it by looking in the neighborhood of its previous position in terms of angle and distance, now computed as α_target = α_prev + Δα and d_target = d_prev + Δd, where the increments Δα and Δd account for the robot's own motion since the target was last seen. Thus, the behavior gives the robot the ability to keep track of the positions of objects around it, even if they are not currently visible, akin to working memory. This is extremely useful during the learning process, as discussed in the next section.
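The following is a minimal sketch of this fusion scheme, under assumed camera, laser, and odometry interfaces (the paper does not give its implementation, so all names and the dead-reckoning update are our illustration):

    import math

    def track_target(camera, laser, memory, odom):
        detection = camera.detect_target()
        if detection is not None:
            # Approximate bearing from the horizontal image position alone
            # (uncalibrated, and ignoring the distance to the target).
            alpha = camera.pan_deg + (detection.x_img / camera.width - 0.5) * camera.fov_deg
            # Real distance: minimum laser range in a small window around alpha.
            dist = min(laser.ranges_near(alpha, window_deg=5))
            memory.alpha, memory.dist = alpha, dist
        else:
            # Target out of the visual field: propagate the previous estimate
            # through the robot's own motion (a form of working memory).
            memory.alpha -= odom.delta_heading_deg
            memory.dist -= odom.delta_forward_cm * math.cos(math.radians(memory.alpha))
        return memory.alpha, memory.dist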
Figure 4: Merging laser and visual information for tracking. (a) Space coverage using the laser range-finder and camera. (b) Principle for target tracking by merging vision and laser data.
After capturing the human's attention, the robot switches back to the task it was performing (i.e., it loads the task behavior network and the failure context that determines which behaviors have been executed and which behavior has failed), while making sure that the human is following. This is achieved by adjusting the speed of the robot such that the human follower is kept within a desirable range behind the robot. If the follower is lost, the robot starts searching again for another helper. After a few experiences with unhelpful humans, the robot will again attempt to perform the task on its own. If a human provides useful assistance and the robot is able to execute the previously failed behavior, the robot continues with task execution as normal.
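The follower-keeping logic can be as simple as a speed schedule keyed to the detected distance behind the robot; a sketch with assumed range bounds (the paper does not specify them):

    def leading_speed(follower_dist_cm, v_nominal=300):
        """Return a translation speed that keeps the helper in range."""
        if follower_dist_cm is None:    # follower lost: stop; a new search begins
            return 0
        if follower_dist_cm > 150:      # helper falling behind: slow down
            return int(v_nominal * 0.5)
        if follower_dist_cm < 60:       # helper very close: lead more briskly
            return int(v_nominal * 1.2)
        return v_nominal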
Thus, the robot retries its task from the point where it failed, while making sure that the human helper is nearby. Executing the previously failed behavior will likely fail again, effectively expressing the robot's problem to the human.
In the next section we describe the experiments we performed to test the above approach to human-robot interaction, involving cases in which the human is helpful, unhelpful, or uninterested.
3.1 Experiments on Robots Interacting with Humans - Communication by Acting
The experiments that we present in this section focus on performing actions as a means of communicating intentions and needs. Initially, the robot (which has a typical mobile robot form, entirely different from that of the human) was given a behavior set that allowed it to track colored targets, open doors, and pick up, drop, and push objects. The behaviors were implemented using AYLLU [15], an extension of the C language for the development of distributed control systems for mobile robot teams.

We tested our concepts on a Pioneer 2-DX mobile robot, equipped with two rings of sonars (8 front and 8 rear), a SICK laser range-finder, a pan-tilt-zoom color camera, a gripper, and on-board computation on a PC104 stack (Figure 5).

Figure 5: A Pioneer 2-DX robot.
In order to test the interaction model we described above, we designed a set of experiments in which the environment was changed so that the robot's execution of the task became impossible without some outside assistance.
The failure to perform any one of the steps of the task induced the robot to seek help and to perform evocative actions in order to catch the attention of a human and get him to the place where the problem occurred. In order to communicate the nature of the problem, the robot repeatedly tried to execute the failed behavior in front of its helper. This is a general strategy that can be employed for a wide variety of failures. However, as demonstrated in our third example below, there are situations for which this approach is not sufficient for conveying the message about the robot's intent. In those cases, explicit communication, such as natural language, is more effective. We discuss how different types of failures require different modes of communication for help.

In our validation experiments, we asked a person who had not worked with the robot before to stay close by during the task's execution and to expect to be engaged in interaction. During the experiment set, we encountered different situations, corresponding to different reactions of the human in response to the robot. We can group these cases into the following main categories:
- uninterested: the human was not interested in, did not react to, or did not understand the robot's calling for help. As a result, the robot started to search for another helper.

- interested, unhelpful: the human was interested and followed the robot for a while, but then abandoned it. As in the previous case, when the robot detected that the helper was lost, it started to look for another one.

- helpful: the human followed the robot to the location of the problem and assisted the robot. In these cases the robot was able to finish the execution of the task, benefiting from the help it had received.
We purposefully constrained the environment in which the task was to be performed, in order to encourage human-robot interaction. The helper's behavior, consequently, had a decisive impact on the robot's task performance: when the helper was uninterested or unhelpful, failure ensued, due either to exceeding time constraints or to the robot giving up the task after trying too many times. However, there were also cases in which the robot failed to find the human or entice him to come along, due to visual sensing limitations or to the robot failing to expressively execute its calling behavior. The few cases in which a failure occurred despite the assistance of a helpful human are presented below, along with a description of each of the three experimental tasks and the overall results.
3.1.1 Traversing blocked gates

In this section we discuss an experiment in which the robot is given the task of traversing gates formed by two closely placed colored targets (see Figure 6(a)). The environment is arranged such that the path between the targets is blocked by a large box that prevents the robot from going through.
The robot expresses its intention of performing this task by executing the Track behavior, which allows it to make its way around one of the targets. While trying to reach the desired distance and angle to the target, hindered by the large box, the robot shows the direction it wants to go in, which is blocked by the obstacle.
Figure 6: The human-robot interaction experiment setups. (a) Going through a gate. (b) Picking up an inaccessible box. (c) Visiting a missing target.
We performed 12 experiments in which the human proved to be helpful. Failures in accomplishing the task occurred in three of the cases, in which the robot could not get through the gate even after the human had cleared the box from its way. In the rest of the cases the robot successfully finished the task with the human's assistance.
3.1.2 Moving inaccessible objects

The experiment described in this section involves moving objects around. The robot is supposed to pick up a small object placed close to a big blue target. In order to induce the robot to seek help, we placed the desired object in a narrow space between two large boxes, thus making it inaccessible to the robot (see Figure 6(b)).

The robot expresses its intention of getting the object by simply attempting to execute the corresponding PickUp behavior. This forces the robot to lower and open its gripper and tilt its camera down when approaching the object. The drive to pick up the object is combined with the effect of avoiding large boxes, causing the robot to go back and forth in front of the narrow space and thus convey an expressive message about its intentions and its problem.
From the 12 experiments in which the human proved to be helpful, we recorded two failures in achieving the task. These failures were due to the robot losing track of the object during the human's intervention and being unable to find it again before the allocated time expired. In the rest of the cases the help received allowed the robot to successfully finish the task execution.
3.1.3 Visiting non-existing targets

In this section we present an experiment that does not fall into the category of the tasks mentioned above, and is an example for which the framework of communicating through actions should be extended to include more explicit means of communication. Consider a task of visiting a number of targets in a given order (Green, Orange, Blue, Yellow, Orange, Green), in which one of the targets has been removed from the environment (Figure 6(c)).
The robot gives up after some time spent searching for the missing target and goes to the human for help. Applying the same strategy of executing the failed behavior in front of the helper results in a continuous wandering in search of the target, from which it is hard to infer what the robot's goal and problem are. It is evident that the robot is looking for something, but without the ability to name the missing object, the robot cannot get the human to intervene in a helpful way.
3.2 Discussion

The experiments presented above demonstrate that implicit yet expressive action-based communication can be successfully used even in the domain of mobile robotics, where the robots cannot utilize physical structure similarities between themselves and the people they are interacting with. However, as our third experiment showed, there are situations in which actions alone are not sufficient for conveying the robot's intent. This is due to the fact that the failure the robot encountered has aspects that could not be expressed by only repeating the unsuccessful actions. For those cases we should employ explicit forms of communication, such as natural language, to convey the necessary information.

From the results, our observations, and the report of the human subject interacting with the robot throughout the experiments, we derive the following conclusions about the various aspects of the robot's social behavior:
- Capturing a human's attention by approaching and then going back and forth in front of him is a behavior typically easily recognized and interpreted as soliciting help.

- Getting a human to follow by turning around and starting to go to the place where the problem occurred (after capturing the human's attention) requires multiple trials in order for the human to follow the robot the entire way. This is due to several reasons. First, even if interested and realizing that the robot wants something from him, the human may not actually believe that he is being called by the robot in the way a dog would do it, and does not expect that following is what he should do. Second, after choosing to go with the robot, if wandering in search of the place with the problem takes too much time, the human gives up, not knowing whether the robot still needs him.

- Conveying intentions by repeating the actions of a failing behavior in front of a helper is easily achieved for tasks in which all the elements of the behavior execution are observable to the human. Upon reaching the place of the robot's problem, the helper is already engaged in the interaction and is expecting to be shown something. Therefore, seeing the robot trying and failing to perform certain actions is a clear indication of the robot's intentions and need for assistance.
4 Learning from human demonstrations

In order to design robots that can successfully and efficiently perform in human-robot domains, it is important to endow them with learning capabilities. This enables them not only to adapt and improve their performance, but also to be more accessible to a larger range of users, from the lay to the skilled.

Designing controllers for robotic tasks is usually done by people specialized in programming robots. Even for them, most often, this is a complicated process, as it essentially requires creating by hand a new and different controller for each particular task. Although certain parts of controllers, once refined, can be reused, it is still necessary to, at least partially, redesign and customize the existing code for each new task. If robots are to be effective in human-robot domains, even users without programming skills should be able to interact with them and "re-program" them.

Therefore, automating the robot controller design process becomes of particular interest. A natural approach to this problem is the use of teaching by demonstration. Instead of having to write, by hand, a controller that achieves a particular task, we allow the robot to automatically build it from observation or from the experience it had while interacting with a teacher. It is the latter approach that we consider in this work, as a means for transferring task knowledge from teachers to robots.
We assume that the robot is equipped with a set of behaviors, also called primitives, which can be combined into a variety of tasks. We then focus on a learning strategy that helps a robot build a high-level task representation that achieves the goals demonstrated by a teacher through the activation of the existing behavior set. We do not attempt to reproduce the exact trajectories or actions of the teacher, but rather to learn the task in terms of its high-level goals.
In our particular approach to learning, we use learning from experienced demonstrations. This implies that the robot actively participates in the demonstration provided by the teacher, by following the human and experiencing the task through its own sensors. Thus, our approach is once again action-based: the robot has to perform the task in order to learn it. This is an essential characteristic of our approach, and is what provides the robot with the data necessary for learning. In the mobile robot domain, experienced demonstrations are achieved by following and interacting with the teacher. The advantage of "putting the robot through" the task during the demonstration is that the robot is able to adjust its behaviors (through their parameters) using the information gathered through its own sensors. In contrast, if the task were designed by hand, a user would have to determine those parameter values. Furthermore, if the robot were merely observing but not executing the task, it would also have to estimate the parameter values, at least for the initial trial or set of trials. In addition to experiencing parameter values directly, the execution of the behaviors provides observations that contain the temporal information needed for proper behavior sequencing, which would be tedious to design by hand for tasks with long temporal sequences.
An important challenge for a learning method that is based on the robot's observations is to distinguish between the relevant and irrelevant information that the robot is perceiving. In our architecture, the abstract behaviors help the robot significantly in pruning the observations that are not related to its own skills, but it is still impossible to determine exactly what is really relevant for a particular task. For example, while being taught to go and pick up the mail, a robot can detect numerous other aspects along its path (e.g., passing a chair, meeting another robot, etc.). However, these observations should not be included in the robot's learned task, as they are irrelevant for getting the mail.
To have a robot learn a task correctly in such conditions, the teacher needs a means of providing the robot with additional information beyond the demonstration experience itself. In our approach, the teacher is allowed to signal, through gestures or speech, the moments in time when the environment presents aspects relevant to the task. While this allows the robot to discard some of the irrelevant observations, it still may not allow it to learn the task perfectly. For this, methods such as multiple demonstrations and generalization techniques can be applied. We are currently investigating these methods as a future extension to this work.
The general idea of the algorithm is to add to the network task representation an instance of every behavior whose postconditions have been true during the demonstration, and during which there have been signals from the teacher, in the order of their occurrence. At the end of the teaching experience, the intervals of time when the effects of each of the behaviors were true are known, and are used to determine whether these effects were active during overlapping intervals or in sequence. Based on this information, the algorithm generates the proper network links (i.e., precondition-postcondition dependencies). This learning process, shown in Figure 7, is described in more detail in [16].
Figure 7: Steps of the learning from demonstration algorithm.
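In outline, the construction step can be sketched as follows (the interval bookkeeping and helper names are ours, not the paper's code; see [16] for the actual algorithm):

    def build_task_network(episodes):
        """episodes: ordered list of (behavior, t_start, t_end) intervals during
        which a behavior's postconditions held and the teacher signaled relevance."""
        nodes = [behavior for (behavior, _, _) in episodes]
        links = []   # precondition-postcondition dependencies between nodes
        for (b_prev, _, e_prev), (b_cur, s_cur, _) in zip(episodes, episodes[1:]):
            if s_cur > e_prev:
                links.append((b_prev, b_cur, "sequential"))   # effects held one after another
            else:
                links.append((b_prev, b_cur, "overlapping"))  # effects held concurrently
        return nodes, links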
We designed several different experiments that rely on the navigation and object manipulation skills of the robots. First, we report on the performance of learning from human teachers in clean environments, followed by learning in cluttered environments.
4.1 Experimental results - learning in clean environments

We performed three different experiments in a 4 m x 6 m arena in which only the objects relevant to the tasks were present. During the demonstration phase, a human teacher led the robot through the environment while the robot recorded its observations relative to the postconditions of its behaviors. The demonstrations included:

- teaching a robot to visit a number of cylindrical colored targets in a particular order;

- teaching a robot to slalom around cylindrical colored objects;

- teaching a robot to transport objects between a source and a destination location (marked by cylindrical colored objects).
We repeated these teaching experiments more than five times for each of the demonstrated tasks, to validate that our learning algorithm reliably constructs the same task representation for the same demonstrated task. Next, using the behavior networks constructed during the robot's observations, we performed experiments in which the robot reliably repeated the task it had been shown and had learned. We tested the robot executing the task five times in the same environment as the one in the learning phase, and also five times in a changed environment. We present the details and the results for each of the tasks in the following sections.
4.1.1 Learning to visit targets in a particular order

The goal of this experiment was to teach the robot to reach a set of targets in the order indicated by the arrows in Figure 8(a). The robot's behavior set contains a Tracking behavior, parameterizable in terms of the colors of the targets that are known to the robot. Therefore, during the demonstration phase, different instances of the same behavior produced output according to their settings.
Figure 8: Experimental setup for the target visiting task. (a) Experimental setup (1). (b) Experimental setup (2). (c) Approximate robot trajectory.
Figure 9 shows the behavior network the robot constructed as a result of the above demonstration.
Figure 9: Task representation learned from the demonstration of the Visit targets task. The network links an INIT behavior with Track(Green, 179, 468), Track(Orange, 121, 590), Track(Blue, 179, 531), Track(Yellow, 179, 814), Track(Orange, 55, 769), and Track(Green, 0, 370).
More than five trials of the same demonstration were performed in order to verify the reliability of the network generation mechanism. All of the produced controllers were identical, validating that the robot learned the correct representation for this task.
4.1.2 Learning to slalom

In this experiment, the goal was to teach a robot to slalom through four targets placed in a line, as shown in Figure 10(a). We changed the size of the arena to 2 m x 6 m for this task.
Figure 10: The Slalom task: (a) Experimental setup; (b) Approximate robot trajectory.
During 8 different trials the robot learned the correct task representation, as shown in the behavior network in Figure 11.

Figure 11: Task representation learned from the demonstration of the Slalom task. The network consists of Track(Yellow, 0, 364), Track(Orange, 178, 378), Track(Blue, 10, 350), and Track(Green, 179, 486), together with an Initialize behavior.

We performed 20 experiments, in which the robot correctly executed the slalom task in 85% of the cases. The failures were of two types: 1) the robot, after passing one "gate," could not find the next one due to the limitations of its vision system; and 2) the robot, while searching for a gate, turned back toward the already visited gates. Figure 10(b) shows the approximate trajectory of the robot successfully executing the slalom task on its own.
4.1.3 Learning to traverse "gates" and move objects from one place to another

The goal of this experiment was to extend the complexity of the learned task by adding object manipulation. For this, the robot used its behaviors for picking up and dropping objects, in addition to the behaviors for navigation and tracking already described.

Figure 12: The Object manipulation task: (a) Traversing gates and moving objects; (b) Approximate trajectory of the robot.
The setup for this experiment is presented in Figure 12(a). Note the small orange box close to the green target. In order to teach the robot that the task is to pick up the orange box placed near the green target (the source), the human led the robot to the box and, when the robot was sufficiently near it, placed the box between the robot's grippers. After leading the robot through the "gate" formed by the blue and yellow targets, upon reaching the orange target (the destination), the human took the box from the robot's gripper. The learned behavior network representation is shown in Figure 13. Since the robot started the demonstration with nothing in the gripper, the effects of the Drop behavior were met, and thus an instance of that behavior was added to the network. This ensures correct execution for the case when the robot might start the task while holding something: the first step would be to drop the object being carried.
The ability to track targets within a [0, 180] degree range allows the robot to learn to naturally execute the part of the task involving going through a gate. This experience is mapped onto the robot's representation as follows: "track the yellow target until it is at 180 degrees (and 50 cm) with respect to you, then track the blue target until it is at 0 degrees (and 40 cm)." At execution time, since the robot is able to track both targets even after they disappear from its visual field, the goals of the above Track behaviors were achieved with a smooth, natural trajectory of the robot passing through the gate.

Figure 13: Task representation learned from the demonstration of the Object manipulation task. The network links an INIT behavior with Drop1, Track(Green, 179, 528), PickUp(Orange), Track(Yellow, 179, 396), Track(Blue, 0, 569), Track(Orange, 55, 348), and Drop2.
Due to the increased complexity of the task demonstration, in 10% of the cases (out of more than 10 trials) the behavior network representations built by the robot were not completely accurate. The errors were specialized versions of the correct representation, such as: Track the green target from a certain angle and distance, followed by the same Track behavior but with different parameters, where only the last one was in fact relevant.

The robot correctly executed the task in 90% of the cases. The failures were all of the type involving exceeding the allocated amount of time for the task. This happened when the robot failed to pick up the box because it was too close to it, and thus ended up pushing the box without being able to perceive it. This failure is due to the undesirable arrangement and range of the robot's sensors, not to any algorithmic issues. Figure 14 shows the robot's progress during the execution of a successful task, specifically the intervals of time during which the postconditions of the behaviors in the network were true: the robot started by going to the green target (the source), then picked up the box, traversed the gate, and followed the orange target (the destination), where it finally dropped the box.

Figure 14: The robot's progress (achievement of behavior postconditions) while performing the Object manipulation task. The plot shows, over a run of about 60 seconds, when the postconditions of Drop1, Track(Green), PickUp(Orange), Track(Yellow), Track(Blue), Track(Orange), and Drop2 held.
4.1.4 Discussion

The results obtained from the above experiments demonstrate the effectiveness of using human demonstration, combined with our behavior architecture, as a mechanism for learning task representations. The approach we presented allows a robot to automatically construct such representations from a single demonstration. A summary of the experimental results is presented in Table 1. Furthermore, the tasks the robot is able to learn can embed arbitrarily long sequences of behaviors, which are encoded within the behavior network representation.
Table 1: Summary of the experimental results.

Experiment name            Trials   Successes (Nr.)   Successes (Percent)
Six targets (learning)        5            5               100%
Six targets (execution)       5            5               100%
Slalom (learning)             8            8               100%
Slalom (execution)           20           17                85%
Object move (learning)       10            9                90%
Object move (execution)      10            9                90%
Analyzing the task representations the robot built during the above experiments, we observe a tendency toward over-specialization. The behavior networks the robot learned enforce that the execution go through all demonstrated steps of the task, even if some of them might not be relevant. Since there is no direct information from the human about what is or is not relevant during a demonstration, and since the robot learns the task representation from even a single demonstration, it assumes that everything it notices about the environment is important, and represents it accordingly.
As with any one-shot learning system, our system learns a correct, but potentially overly specialized, representation of the demonstrated task. Additional demonstrations of the same task would allow it to generalize at the level of the constructed behavior network. In the next section we address the problem of over-specialization by experimenting in cluttered environments and allowing the teacher to signal to the robot the saliency of particular events, or even objects. While this does not prevent irrelevant environment state from being observed, it biases the robot to notice and (if capable) capture the key elements.
4.2 Learning in Environments With Distractors

The goal of the experiments presented in this section is to show the ability of the robots to learn in environments containing distractor objects that are not relevant to the demonstrated tasks.
The task to be learned by the robot is similar to the moving objects task from above (Figure 15(a)): pick up the orange box placed near the light green target (the source), go through the "gate" formed by the yellow and light orange targets, drop the box at the dark green target (the destination), and then come back to the source target. The orange and the yellow targets at the left are distractors that should not be considered part of the task. In order to teach the robot that it has to pick up the box, the human led the robot to it and then, when the robot was sufficiently near it, placed the box between the robot's grippers. At the destination target, the teacher took the box from the robot's grippers. The moments in time signaled by the teacher as being relevant to the task were: giving the robot the box while close to the light green target, the teacher reaching the yellow and light orange targets, taking the box from the robot while at the green target, and the teacher reaching the light green target at the end. Thus, although the robot observed that it had passed the orange and the distant yellow targets during the demonstration, it did not include them in its task representation, since the teacher did not signal any relevance while at them.
We performed 10 human-robot demonstration experiments to validate the performance of our learning algorithm. We then evaluated each learned representation both by inspecting it structurally and by having the robot perform it, to get physical validation that the robot had learned the correct task.
Figure 15: The Object manipulation task in environments with distractors. (a) The environment. (b) Achievement of behavior postconditions over time (about 250 seconds) for Drop0, Track(LightGreen), PickUp(Orange), Track(LightOrange), Track(Yellow), Track(Green), Drop9, and Track(LightGreen).
In 9 of the 10 experiments the robot learned a structurally correct representation (sequencing of the relevant behaviors) and also performed it correctly. In one case, although the structure of the behavior network was correct, the learned values of one of the behavior's parameters caused the robot to perform an incorrect task (instead of going between two of the targets, the robot went to them and then around them). The learned behavior network representation of this task is presented in Figure 16.
Figure 16: Task representation learned from human demonstration for the Object manipulation task. The network links an INIT behavior with DROP0, MTLGreen1, PICKUPOrange2, MTLOrange4, MTYellow7, MTGreen8, DROP9, and MTLGreen11.
In Figure 15(b) we show the robot's progress during the execution of the task, more specifically the instants of time, or the intervals, during which the postconditions of the behaviors in the network were true.
For the 9 successes out of 10 trials we recorded, the 95% confidence interval for the binomial distribution of the learning rate is [0.5552, 0.9975], obtained using a Paulson-Camp-Pratt approximation [17] of the confidence limits.
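For comparison, an exact Clopper-Pearson interval for 9 successes in 10 trials can be computed directly; this sketch is a cross-check we add here, not the paper's method:

    from scipy.stats import beta

    def clopper_pearson(k, n, alpha=0.05):
        """Exact two-sided binomial confidence interval for k successes in n trials."""
        lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lo, hi

    print(clopper_pearson(9, 10))   # approximately (0.555, 0.997), close to [0.5552, 0.9975]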
As a base-case scenario, to demonstrate the reliability of the learned representation, we performed 10 trials in which the robot repeatedly executed one of the learned representations of the above task. In 9 of the 10 cases the robot correctly completed the execution of the task. The only failure was due to a time-out in tracking the green target.
5 Related work

The work presented here is most related to two important areas of robotics research: human-robot interaction and robot learning. Here we discuss its relation to both areas and state the advantages gained by combining the two in the context of adding social capabilities to agents in human-robot domains.
Most of the approaches to human-robot interaction so far rely on using predefined, common vocabularies of gestures [18], signs, or words. These can be said to be using a symbolic language, whose elements explicitly communicate specific meanings. The advantage of these methods is that, assuming an appropriate vocabulary and grammar, arbitrarily complex information can be directly transmitted. However, as we are still far from a true dialogue with a robot, most approaches that use natural language for communication employ a limited and specific vocabulary which has to be known in advance by both the robot and the human users. Similarly, for gesture and sign languages, a mutually predefined, agreed-upon vocabulary of symbols is necessary for successful communication. In this work, we show that communication between robots and humans can be achieved even without such explicit prior vocabulary sharing.
One of the most important forms of implicit communication, which has received a great deal of attention among researchers, is the use of various forms of body language. Using this type of communication for human-robot interaction, and human-machine interaction in general, is becoming very popular. For example, it has been applied to humanoid robots (in particular head-eye systems) for communicating emotional states through facial expressions [19] or body movements [7], where the interaction is performed through body language. This idea has been explored in autonomous assistants and interface agents as well [20]. While facial expressions are a natural means of interaction for a humanoid, or in general a "headed," robot, they cannot be entirely applied to the domain of mobile robots, where the platforms typically have a very different, non-anthropomorphic physical structure. In our approach, we demonstrate that the use of implicit, action-based methods for communicating and expressing intentions can be extended to the mobile robot domain, despite the structural differences between mobile robots and humans.
Teaching robots new tasks is a topic of great interest in robotics. In the context of behavior-based robot learning, methods for learning policies (situation-behavior mappings) have been successfully applied to single-robot learning of various tasks, most commonly navigation [21], hexapod walking [22], and box-pushing [23], as well as to multi-robot learning [24].
In the area of teaching robots by demonstration, also referred to as imitation, [8] demonstrated simplified maze learning, i.e., learning turning behaviors, by following another robot teacher. The robot used its own observations to relate the changes in the environment to its own forward, left, and right turn actions. [9] used model-based reinforcement learning to speed up learning for a system in which a 7-DOF robot arm learned the task of balancing a pole from a brief human demonstration. Other work in our lab is also exploring imitation based on mapping observed human demonstration onto a set of behavior primitives, implemented on a 20-DOF dynamic humanoid simulation [25, 26]. The key difference between the work presented here and those above is the level of learning. The work above focuses on learning at the level of action imitation (and thus usually results in acquiring reactive policies), while our approach enables learning of high-level, sequential tasks.
6 Conclusions

In this paper we presented an action-based approach to human-robot interaction and robot learning, both dealing with aspects of designing socially intelligent agents. The method was shown to be effective for interacting with humans using implicit, action-based communication, and for learning from experienced demonstrations.

We argued that the means of communication and interaction of mobile robots which do not have anthropomorphic, animal, or pet-like appearance and expressiveness need not be limited to explicit types of interaction, such as speech or gestures. We demonstrated that simple actions can be used to allow a robot to successfully interact with users and express its intentions. For a large class of intentions of the form "I want to do this, but I can't," the process of capturing a human's attention and then trying to execute the action and failing is expressive enough to effectively convey the message, and thus to obtain assistance.
We also presented a methodology for learning from demonstration in which the robot learns by relating its observations to the known effects of its behavior repertoire. This is made possible by our behavior architecture, which has a perceptual component (the abstract behavior) that embeds representations of the robot's behavior goals. We demonstrated that the method is robust and can be applied to a variety of tasks involving the execution of long, and sometimes even repeated, sequences of behaviors.

While we believe that robots should be endowed with as many interaction modalities as is possible and efficient, we focus on action-based interaction as a lesser-studied but powerful methodology for both learning and human-machine interaction in general.
References

[1] A. David and M. P. Ball, "The video game: a model for teacher-student collaboration," Momentum, vol. 17, no. 1, pp. 24-26, 1986.

[2] C. Murray and K. VanLehn, "DT Tutor: A decision-theoretic, dynamic approach for optimal selection of tutorial actions," in Proc., ITS, 6th Intl. Conf., 2000, pp. 153-162.

[3] V. J. Shute and J. Psotka, "Intelligent tutoring systems: past, present and future," Handbook of Research on Educational Communications and Technology, 1996.

[4] T. Matsui et al., "An office conversation mobile robot that learns by navigation and conversation," in Proc., Real World Computing Symp., 1997, pp. 59-62.

[5] Sebastian Thrun et al., "A second generation mobile tour-guide robot," in Proc. of IEEE, ICRA, 1999.

[6] Francois Michaud and S. Caron, "Roball - an autonomous toy-rolling robot," in Proc. of the Workshop on Interactive Robotics and Entertainment, 2000.

[7] Lola D. Canamero and Jakob Fredslund, "How does it feel? Emotional interaction with a humanoid LEGO robot," Tech Report FS-00-04, AAAI Fall Symposium, 2000.

[8] Gillian Hayes and John Demiris, "A robot controller using learning by imitation," in Proc. of the Intl. Symp. on Intelligent Robotic Systems, Grenoble, France, 1994, pp. 198-204.

[9] Stefan Schaal, "Learning from demonstration," in Advances in Neural Information Processing Systems 9, M. C. Mozer, M. Jordan, and T. Petsche, Eds., 1997, pp. 1040-1046, MIT Press, Cambridge.

[10] Brian Scassellati, "Investigating models of social development using a humanoid robot," in Biorobotics, Barbara Webb and Thomas Consi, Eds., MIT Press, to appear, 2000.

[11] Maja J. Mataric, "Behavior-based control: Examples from navigation, learning, and group behavior," Journal of Experimental and Theoretical Artificial Intelligence, vol. 9, no. 2-3, pp. 323-336, 1997.

[12] Ronald C. Arkin, Behavior-Based Robotics, MIT Press, 1998.

[13] Monica N. Nicolescu and Maja J. Mataric, "A hierarchical architecture for behavior-based robots," in Proc., Intl. Conf. on Autonomous Agents and Multiagent Systems, Bologna, Italy, July 2002.

[14] Daniel C. Dennett, The Intentional Stance, MIT Press, Cambridge, 1987.

[15] Barry Brian Werger, "Ayllu: Distributed port-arbitrated behavior-based control," in Proc., The 5th Intl. Symp. on Distributed Autonomous Robotic Systems, Knoxville, TN, 2000, pp. 25-34, Springer.

[16] Monica N. Nicolescu and Maja J. Mataric, "Experience-based representation construction: learning from human and robot teachers," in Proc., IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, Maui, Hawaii, USA, Oct 2001, pp. 740-745.

[17] Colin R. Blyth, "Approximate binomial confidence limits," Journal of the American Statistical Association, vol. 81, no. 395, pp. 843-855, September 1986.

[18] David Kortenkamp, Eric Huber, and R. Peter Bonasso, "Recognizing and interpreting gestures on a mobile robot," Proc., AAAI, pp. 915-921, 1996.

[19] Cynthia Breazeal and Brian Scassellati, "How to build robots that make friends and influence people," in Proc., IROS, Kyongju, Korea, 1999, pp. 858-863.

[20] Tomoko Koda and Pattie Maes, "Agents with faces: The effects of personification of agents," in Proceedings, HCI, 1996, pp. 98-103.

[21] Marco Dorigo and Marco Colombetti, Robot Shaping: An Experiment in Behavior Engineering, MIT Press, Cambridge, 1997.

[22] Pattie Maes and Rodney A. Brooks, "Learning to coordinate behaviors," in Proc., AAAI, Boston, MA, 1990, pp. 796-802.

[23] Sridhar Mahadevan and Jonathan Connell, "Scaling reinforcement learning to robotics by exploiting the subsumption architecture," in Eighth Intl. Workshop on Machine Learning, 1991, pp. 328-337.

[24] Maja J. Mataric, "Reinforcement learning in the multi-robot domain," Autonomous Robots, vol. 4, no. 1, pp. 73-83, 1997.

[25] Maja J. Mataric, "Sensory-motor primitives as a basis for imitation: Linking perception to action and biology to robotics," in Imitation in Animals and Artifacts, Chrystopher Nehaniv and Kerstin Dautenhahn, Eds., MIT Press, Cambridge, to appear, 2001.

[26] Odest Chadwicke Jenkins, Maja J. Mataric, and Stefan Weber, "Primitive-based movement classification for humanoid imitation," in Proc., First IEEE-RAS Intl. Conf. on Humanoid Robotics, Cambridge, MA, MIT, 2000.