Paper submission for HICSS-36
Emerging Technologies Track
Augmented Cognition and Human-Robot Interaction Minitrack

Linking Perception and Action in a Control Architecture for Human-Robot Domains

Monica N. Nicolescu and Maja J Mataric
Robotics Research Laboratory
University of Southern California
941 West 37th Place, MC 0781
Los Angeles, CA 90089-0781
E-mail: monica|[email protected]
Tel: (213) 740-6243, Fax: (213) 740-7512
Abstract

Human-robot interaction is a growing research domain; there are many approaches to robot design, depending on the particular aspects of interaction being focused on. In this paper we present an action-based framework that provides a natural means for robots to interact with humans and to learn from them. Perception and action are the essential means for a robot's interaction with the environment; for successful robot performance it is thus important to exploit this relation between a robot and its environment. Our approach links perception and actions in a unique architecture for representing a robot's skills (behaviors). We use this architecture to endow the robots with the ability to convey their intentions by acting upon their environment, and also to learn to perform complex tasks from observing and experiencing a demonstration by a human teacher. We demonstrate these concepts with a Pioneer 2DX mobile robot, learning various tasks from a human and, when needed, interacting with a human to get help by conveying its intentions through actions.

Keywords: Robotics, Learning and Human-Robot Interaction
1 Introduction

Human-robot interaction is an area of growing interest in Robotics. Environments that feature the interaction of humans and robots present a significant number of challenges, spawning several important research directions. These domains of human-machine co-existence form a new type of "society" in which the robot's role is essential in determining the nature of the resulting interactions. In this work we focus on two major challenges of key importance for designing robots that will be effective in human-robot domains.

The first challenge we address is the design of robots that exhibit social behavior, in order to allow them to engage in various types of interactions. This is a very large domain, with examples including teachers [1], [2], [3], workers, and members of a team cooperating with other robots and people to solve and perform tasks [4]. Robots can be entertainers, such as museum tour-guides [5], toys [6], pets, or emotional companions [7]. Designing control architectures for such robots presents particular challenges, in large part specific to each of these domains.

The second challenge we address is to build robots that have the ability to learn through social interaction with humans or with other robots in the environment, in order to improve their performance and expand their capabilities. Successful examples include robots imitating demonstrated tasks (such as maze learning [8] and juggling [9]) and the use of natural cues (such as models of joint attention [10]) as means for social interaction.
In this paper we present an approach that unifies the two challenges above, interaction and learning in human-robot environments, by unifying perception and action in the form of action-based interaction. Our approach relies on an architecture that is based on a set of behaviors, or skills, consisting of both active and perceptual components.

The perceptual component of a behavior gives the robot the capability of creating a link between its observations and its own actions, which enables it to learn to perform a particular task from the experiences it had while interacting with humans.

The active component of a robot behavior allows the use of implicit communication, which does not rely on a symbolic language and instead uses actions, whose outcomes are invariant to the specific body performing them. A robot can thus convey its intentions by suggesting them through actions, rather than communicating them through conventional signs, sounds, gestures, or marks with previously agreed-upon meanings. We employ these actions as a vocabulary that a robot can use to induce a human to assist it with parts of tasks that it is not able to perform on its own. The particularities of our behavior architecture are described in Section 2.
To illustrate our approach, we present experiments in which a human acts both as a teacher and a collaborator for a mobile robot. The different aspects of this interaction help demonstrate the robot's learning and social abilities.

This paper is organized as follows. Section 2 presents the behavior representation that we are using and the importance of the architecture for our proposed challenges. In Section 3, we present the model for human-robot interaction and the general strategy for communicating intentions, including experiments in which a robot engaged a human in interaction through actions indicative of its intentions. Section 4 describes the method for learning task representations from experienced interactions with humans and presents experimental demonstrations and validation of learning task representations from demonstration. Sections 5 and 6 discuss related approaches and present the conclusions of the described work.
2 Behavior representation

Perception and action are the essential means of interaction with the environment. The performance and the capabilities of a robot are dependent on its available actions, which are thus an essential component of its design. As the underlying control architecture we use a behavior-based approach [11, 12], in which time-extended actions that achieve or maintain a particular goal are grouped into behaviors, the key building blocks for intelligent, complex observable behavior. The complexity of a robot's skills can range from elementary actions (such as "go forward", "turn left") to temporally-extended behaviors (such as "follow", "go home", etc.).

Figure 1: Structure of the inputs/outputs of an abstract and primitive behavior.

Within our architecture, behaviors are built from two components: one related to perception (abstract behavior), the other to actions (primitive behavior) (Figure 1). The abstract behavior is simply an explicit specification of the behavior's activation conditions (i.e., preconditions) and its effects (i.e., postconditions). The behaviors that do the work of achieving the specified effects under the given conditions are called primitive behaviors. An abstract behavior takes sensory information from the environment and, when its preconditions are met, activates the corresponding primitive behavior(s), which achieve the effects specified in its postconditions.

This architecture provides a simple and natural way of representing robot tasks in the form of behavior networks [13], and also has the flexibility required for robust functioning in dynamically changing environments. Figure 2 shows a generic behavior network.
Figure 2: Example of a behavior network.
The abstract behaviors embed representations of a behavior's goals in the form of abstracted environmental states. This is a key feature of our architecture, and a critical aspect for learning from experience. In order to learn a task, the robot has to create a link between its perception (observations) and the actions that would achieve the same observed effects. This process is enabled by the abstract behaviors, the perceptual component of a behavior. This component fires each time the observations match a primitive's goals, allowing the robot to identify, during its experience, the behaviors that are relevant for the task being learned.
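To make this pairing concrete, the following is a minimal sketch in Python (the paper's implementation uses AYLLU/C; the class and method names here are our own illustration, not the actual API) of how an abstract behavior gates its primitive on preconditions and reports goal achievement from postconditions:

    class AbstractBehavior:
        """Perceptual component: an explicit pre/postcondition specification."""
        def __init__(self, preconditions, postconditions, primitive):
            self.preconditions = preconditions    # predicates over sensed world state
            self.postconditions = postconditions  # predicates describing the goal
            self.primitive = primitive            # the active component it gates

        def goal_achieved(self, world):
            # Fires whenever current observations match the primitive's goals;
            # during a demonstration this marks the behavior as relevant.
            return all(p(world) for p in self.postconditions)

        def step(self, world):
            # Activate the primitive only while the preconditions hold and the
            # desired effects have not yet been achieved.
            if all(p(world) for p in self.preconditions) and not self.goal_achieved(world):
                self.primitive.execute(world)

    class PrimitiveBehavior:
        """Active component: issues the motor commands that achieve the effects."""
        def execute(self, world):
            raise NotImplementedError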
The primitive behaviors are the active component of a behavior, executing the robot's actions and achieving its goals. Acting in the environment is a form of implicit communication that plays a key role in human interaction. Using evocative actions, people (and other animals) convey emotions, desires, interests, and intentions. Action-based communication has the advantage that it need not be restricted to robots or agents with a humanoid body or face: structural similarities between the interacting agents are not required to achieve successful interaction. Even if there is no exact mapping between a mobile robot's physical characteristics and those of a human user, the robot may still be able to convey a message, since communication through action also draws on human common sense [14]. In the next section we describe how our approach achieves this type of communication.
3 Communication by acting - a means for robot-human interaction

Our goal is to develop a model of interaction with humans that would allow a robot to induce a human to assist it, by being able to express its intentions in a way that humans can easily understand. We first present a general example that illustrates the basic idea of our approach.
Consider a prelinguistic child who wants a toy that is out of his reach. To get it, the child will try to bring a grown-up to the toy and will then point and even try to reach it, indicating his intentions. Similarly, a dog will run back and forth to induce its owner to come to a place where it has found something it desires. The ability of the child and the dog to demonstrate their intentions by calling a helper and mock-executing an action is an expressive and natural way to communicate a problem and a need for help. The capacity of a human observer to understand these intentions from exhibited behavior is also natural, since the actions carry intentional meanings and thus are easy to understand.
We apply the same strategy in the robot domain. The action-based communication approach we propose for the purpose of suggesting intentions is general and can be applied across different tasks and physical bodies/platforms. In our approach, a robot performs its task independently, but if it fails in a cognizant fashion, it searches for a human and attempts to induce him to follow it to the place where the failure occurred, where it demonstrates its intentions in hopes of obtaining help. Next, we describe how this communication is achieved.
Immediately after a failure, the robot saves the current state of the task execution (the failure context), in order to be able to later restart execution from that point.
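As a minimal sketch, the saved failure context need only record which behaviors have completed and which one failed; the field names below are illustrative, not taken from the paper:

    from dataclasses import dataclass

    @dataclass
    class FailureContext:
        task_network: str     # which behavior network was being executed
        executed: list        # behaviors whose effects were achieved before the failure
        failed_behavior: str  # the behavior whose execution failed

    # Reloading this record later lets the robot resume the task from the
    # failed step rather than restarting from the beginning.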
Figure 3: Behavior network for calling a human, consisting of two Track instances, Track(Human, 90, 50) and Track(Human, 90, 100), together with an Initialize behavior.
Next, the robot starts the process of finding and luring a human to help. This is implemented as a behavior-based system which uses two instances of a Track(Human, angle, distance) behavior, with different values of the Distance parameter: one for getting close (50 cm) and one for getting farther (1 m) (Figure 3). As part of the first tracking behavior, the robot searches for and follows a human until he stops and the robot gets sufficiently close. At that point, the preconditions for the second tracking behavior become active, so the robot backs up in order to get to the farther distance. Once the outcomes of this behavior have been achieved (and detected by the Init behavior), the robot re-instantiates the network, resulting in a back-and-forth cycling behavior, much like a dog's behavior for enticing a human to follow it. When the detected distance between the robot and the human stays smaller than the value of the Distance parameter of either of its Track behaviors for some period of time, the cycling behavior is terminated.
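A rough sketch of this cycling logic in Python, with assumed helper methods (track_human, human_closer_than) standing in for the two Track instances and the Init behavior:

    CLOSE_CM, FAR_CM = 50, 100   # the two values of the Distance parameter

    def lure_human(robot, patience_s=5.0):
        while True:
            robot.track_human(angle=90, distance=CLOSE_CM)  # approach until close
            robot.track_human(angle=90, distance=FAR_CM)    # back up to the farther range
            # The cycle ends once the human stays within the Track distance for
            # some period of time, i.e., he is following the robot.
            if robot.human_closer_than(FAR_CM, duration=patience_s):
                break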
The Track behavior enables the robot to follow colored targets at any distance in the [30, 200] cm range and any angle in the [0, 180] degree range. The information from the camera is merged with data from the laser range-finder in order to allow the robot to track targets that are outside of its visual field (see Figure 4). The robot uses the camera to first detect the target, and then to track it after it goes out of the visual field. As long as the target is visible to the camera, the robot uses the target's position in the visual field (x_img) to infer an approximate angle to the target, α_target (the "approximation" in the angle comes from the fact that we are not using precisely calibrated data from the camera, and we compute the angle without taking into consideration the distance to the target). We get the real distance to the target, d_target, from the laser reading in a small neighborhood of the angle α_target. When the target disappears from the visual field, we continue to track it by looking in the neighborhood of its previous position in terms of angle and distance, now computed as α_target = α_prev + Δα and d_target = d_prev + Δd, where the increments Δα and Δd account for the robot's own motion since the target was last seen. Thus, the behavior gives the robot the ability to keep track of the positions of objects around it, even if they are not currently visible, akin to working memory. This is extremely useful during the learning process, as discussed in the next section.
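The following is a minimal sketch of this fusion scheme, under assumed camera, laser, and odometry interfaces (the paper does not give its implementation, so all names and the dead-reckoning update are our illustration):

    import math

    def track_target(camera, laser, memory, odom):
        detection = camera.detect_target()
        if detection is not None:
            # Approximate bearing from the horizontal image position alone
            # (uncalibrated, and ignoring the distance to the target).
            alpha = camera.pan_deg + (detection.x_img / camera.width - 0.5) * camera.fov_deg
            # Real distance: minimum laser range in a small window around alpha.
            dist = min(laser.ranges_near(alpha, window_deg=5))
            memory.alpha, memory.dist = alpha, dist
        else:
            # Target out of the visual field: propagate the previous estimate
            # through the robot's own motion (a form of working memory).
            memory.alpha -= odom.delta_heading_deg
            memory.dist -= odom.delta_forward_cm * math.cos(math.radians(memory.alpha))
        return memory.alpha, memory.dist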
Figure 4: Merging laser and visual information for tracking. (a) Space coverage using the laser range-finder and camera. (b) Principle for target tracking by merging vision and laser data.
After capturing the human's attention, the robot switches back to the task it was performing (i.e., it loads the task behavior network and the failure context that determines which behaviors have been executed and which behavior has failed), while making sure that the human is following. This is achieved by adjusting the speed of the robot such that the human follower is kept within a desirable range behind the robot. If the follower is lost, the robot starts searching again for another helper. After a few experiences with unhelpful humans, the robot will again attempt to perform the task on its own. If a human provides useful assistance and the robot is able to execute the previously failed behavior, the robot continues with task execution as normal.
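The follower-keeping logic can be as simple as a speed schedule keyed to the detected distance behind the robot; a sketch with assumed range bounds (the paper does not specify them):

    def leading_speed(follower_dist_cm, v_nominal=300):
        """Return a translation speed that keeps the helper in range."""
        if follower_dist_cm is None:    # follower lost: stop; a new search begins
            return 0
        if follower_dist_cm > 150:      # helper falling behind: slow down
            return int(v_nominal * 0.5)
        if follower_dist_cm < 60:       # helper very close: lead more briskly
            return int(v_nominal * 1.2)
        return v_nominal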
Thus, the robot retries its task from the point where it failed, while making sure that the human helper is nearby. Executing the previously failed behavior will likely fail again, effectively expressing the robot's problem to the human.
In the next section we describe the experiments we performed to test the above approach to human-robot interaction, involving cases in which the human is helpful, unhelpful, or uninterested.
3.1 Experiments on Robots Interacting with Humans - Communication by Acting
The experiments that we present in this section focus on performing actions as a means of communicating intentions and needs. Initially, the robot (which has a typical mobile robot form, entirely different from that of the human) was given a behavior set that allowed it to track colored targets, open doors, and pick up, drop, and push objects. The behaviors were implemented using AYLLU [15], an extension of the C language for the development of distributed control systems for mobile robot teams.

We tested our concepts on a Pioneer 2-DX mobile robot, equipped with two rings of sonars (8 front and 8 rear), a SICK laser range-finder, a pan-tilt-zoom color camera, a gripper, and on-board computation on a PC104 stack (Figure 5).

Figure 5: A Pioneer 2-DX robot.
In order to test the interaction model we described above, we designed a set of experiments in which the environment was changed so that the robot's execution of the task became impossible without some outside assistance.
The failure to perform any one of the steps of the task induced the robot to seek help and to perform evocative actions in order to catch the attention of a human and get him to the place where the problem occurred. In order to communicate the nature of the problem, the robot repeatedly tried to execute the failed behavior in front of its helper. This is a general strategy that can be employed for a wide variety of failures. However, as demonstrated in our third example below, there are situations for which this approach is not sufficient for conveying the message about the robot's intent. In those cases, explicit communication, such as natural language, is more effective. We discuss how different types of failures require different modes of communication for help.

In our validation experiments, we asked a person who had not worked with the robot before to stay close by during the task's execution and to expect to be engaged in interaction. During the experiment set, we encountered different situations, corresponding to different reactions of the human in response to the robot. We can group these cases into the following main categories:
- uninterested: the human was not interested in, did not react to, or did not understand the robot's calling for help. As a result, the robot started to search for another helper.

- interested, unhelpful: the human was interested and followed the robot for a while, but then abandoned it. As in the previous case, when the robot detected that the helper was lost, it started to look for another one.

- helpful: the human followed the robot to the location of the problem and assisted the robot. In these cases the robot was able to finish the execution of the task, benefiting from the help it had received.
We purposefully constrained the environment in which the task was to be performed, in order to encourage human-robot interaction. The helper's behavior, consequently, had a decisive impact on the robot's task performance: when the helper was uninterested or unhelpful, failure ensued, due either to exceeding time constraints or to the robot giving up the task after trying too many times. However, there were also cases in which the robot failed to find the human or entice him to come along, due to visual sensing limitations or to the robot failing to expressively execute its calling behavior. The few cases in which a failure occurred despite the assistance of a helpful human are presented below, along with a description of each of the three experimental tasks and the overall results.
3.1.1 Traversing blocked gates

In this section we discuss an experiment in which the robot is given the task of traversing gates formed by two closely placed colored targets (see Figure 6(a)). The environment is arranged such that the path between the targets is blocked by a large box that prevents the robot from going through.
The robot expresses its intention of performing this task by executing the Track behavior, which allows it to make its way around one of the targets. While trying to reach the desired distance and angle to the target, hindered by the large box, the robot shows the direction it wants to go in, which is blocked by the obstacle.
Figure 6: The human-robot interaction experiment setups. (a) Going through a gate. (b) Picking up an inaccessible box. (c) Visiting a missing target.
We performed 12 experiments in which the human proved to be helpful. Failures in accomplishing the task occurred in three of the cases, in which the robot could not get through the gate even after the human had cleared the box from its way. In the rest of the cases the robot successfully finished the task with the human's assistance.
3.1.2 Moving inaccessible objects

The experiment described in this section involves moving objects around. The robot is supposed to pick up a small object placed close to a big blue target. In order to induce the robot to seek help, we placed the desired object in a narrow space between two large boxes, thus making it inaccessible to the robot (see Figure 6(b)).

The robot expresses its intention of getting the object by simply attempting to execute the corresponding PickUp behavior. This forces the robot to lower and open its gripper and tilt its camera down when approaching the object. The drive to pick up the object is combined with the effect of avoiding large boxes, causing the robot to go back and forth in front of the narrow space and thus convey an expressive message about its intentions and its problem.
From the 12 experiments in which the human proved to be helpful, we recorded two failures in achieving the task. These failures were due to the robot losing track of the object during the human's intervention and being unable to find it again before the allocated time expired. In the rest of the cases the help received allowed the robot to successfully finish the task execution.
3.1.3 Visiting non-existing targets

In this section we present an experiment that does not fall into the category of the tasks mentioned above, and is an example for which the framework of communicating through actions should be extended to include more explicit means of communication. Consider a task of visiting a number of targets in a given order (Green, Orange, Blue, Yellow, Orange, Green), in which one of the targets has been removed from the environment (Figure 6(c)).
The robot gives up after some time spent searching for the missing target and goes to the human for help. Applying the same strategy of executing the failed behavior in front of the helper results in a continuous wandering in search of the target, from which it is hard to infer what the robot's goal and problem are. It is evident that the robot is looking for something, but without the ability to name the missing object, the robot cannot get the human to intervene in a helpful way.
3.2 Discussion

The experiments presented above demonstrate that implicit yet expressive action-based communication can be successfully used even in the domain of mobile robotics, where the robots cannot utilize physical structure similarities between themselves and the people they are interacting with. However, as our third experiment showed, there are situations in which actions alone are not sufficient for conveying the robot's intent. This is due to the fact that the failure the robot encountered has aspects that could not be expressed by only repeating the unsuccessful actions. For those cases we should employ explicit forms of communication, such as natural language, to convey the necessary information.

From the results, our observations, and the report of the human subject interacting with the robot throughout the experiments, we derive the following conclusions about the various aspects of the robot's social behavior:
- Capturing a human's attention by approaching and then going back and forth in front of him is a behavior typically easily recognized and interpreted as soliciting help.

- Getting a human to follow by turning around and starting to go to the place where the problem occurred (after capturing the human's attention) requires multiple trials in order for the human to follow the robot the entire way. This is due to several reasons. First, even if interested and realizing that the robot wants something from him, the human may not actually believe that he is being called by the robot in the way a dog would do it, and does not expect that following is what he should do. Second, after choosing to go with the robot, if wandering in search of the place with the problem takes too much time, the human gives up, not knowing whether the robot still needs him.

- Conveying intentions by repeating the actions of a failing behavior in front of a helper is easily achieved for tasks in which all the elements of the behavior execution are observable to the human. Upon reaching the place of the robot's problem, the helper is already engaged in the interaction and is expecting to be shown something. Therefore, seeing the robot trying and failing to perform certain actions is a clear indication of the robot's intentions and need for assistance.
4 Learning from human demonstrations

In order to design robots that can successfully and efficiently perform in human-robot domains, it is important to endow them with learning capabilities. This enables them not only to adapt and improve their performance, but also to be more accessible to a larger range of users, from the lay to the skilled.

Designing controllers for robotic tasks is usually done by people specialized in programming robots. Even for them, most often, this is a complicated process, as it essentially requires creating by hand a new and different controller for each particular task. Although certain parts of controllers, once refined, can be reused, it is still necessary to, at least partially, redesign and customize the existing code for each new task. If robots are to be effective in human-robot domains, even users without programming skills should be able to interact with them and "re-program" them.

Therefore, automating the robot controller design process becomes of particular interest. A natural approach to this problem is the use of teaching by demonstration. Instead of having to write, by hand, a controller that achieves a particular task, we allow the robot to automatically build it from observation or from the experience it had while interacting with a teacher. It is the latter approach that we consider in this work, as a means for transferring task knowledge from teachers to robots.
We assume that the robot is equipped with a set of behaviors, also called primitives, which can be combined into a variety of tasks. We then focus on a learning strategy that helps a robot build a high-level task representation that achieves the goals demonstrated by a teacher through the activation of the existing behavior set. We do not attempt to reproduce the exact trajectories or actions of the teacher, but rather to learn the task in terms of its high-level goals.
In our particular approach to learning, we use learning from experienced demonstrations. This implies that the robot actively participates in the demonstration provided by the teacher, by following the human and experiencing the task through its own sensors. Thus, our approach is once again action-based: the robot has to perform the task in order to learn it. This is an essential characteristic of our approach, and is what provides the robot with the data necessary for learning. In the mobile robot domain, experienced demonstrations are achieved by following and interacting with the teacher. The advantage of "putting the robot through" the task during the demonstration is that the robot is able to adjust its behaviors (through their parameters) using the information gathered through its own sensors. In contrast, if the task were designed by hand, a user would have to determine those parameter values. Furthermore, if the robot were merely observing but not executing the task, it would also have to estimate the parameter values, at least for the initial trial or set of trials. In addition to experiencing parameter values directly, the execution of the behaviors provides observations that contain the temporal information needed for proper behavior sequencing, which would be tedious to design by hand for tasks with long temporal sequences.
An important challenge for a learning method that is based on the robot's observations is to distinguish between the relevant and irrelevant information that the robot is perceiving. In our architecture, the abstract behaviors help the robot significantly in pruning the observations that are not related to its own skills, but it is still impossible to determine exactly what is really relevant for a particular task. For example, while being taught to go and pick up the mail, a robot can detect numerous other aspects along its path (e.g., passing a chair, meeting another robot, etc.). However, these observations should not be included in the robot's learned task, as they are irrelevant for getting the mail.
To have a robot learn a task correctly in such conditions, the teacher needs a means of providing the robot with additional information beyond the demonstration experience itself. In our approach, the teacher is allowed to signal, through gestures or speech, the moments in time when the environment presents aspects relevant to the task. While this allows the robot to discard some of the irrelevant observations, it still may not allow it to learn the task perfectly. For this, methods such as multiple demonstrations and generalization techniques can be applied. We are currently investigating these methods as a future extension to this work.
The general idea of the algorithm is to add to the network task representation an instance of every behavior whose postconditions have been true during the demonstration, and during which there have been signals from the teacher, in the order of their occurrence. At the end of the teaching experience, the intervals of time when the effects of each of the behaviors were true are known, and are used to determine whether these effects were active during overlapping intervals or in sequence. Based on this information, the algorithm generates the proper network links (i.e., precondition-postcondition dependencies). This learning process, shown in Figure 7, is described in more detail in [16].
Figure 7: Steps of the learning from demonstration algorithm.
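In outline, the construction step can be sketched as follows (the interval bookkeeping and helper names are ours, not the paper's code; see [16] for the actual algorithm):

    def build_task_network(episodes):
        """episodes: ordered list of (behavior, t_start, t_end) intervals during
        which a behavior's postconditions held and the teacher signaled relevance."""
        nodes = [behavior for (behavior, _, _) in episodes]
        links = []   # precondition-postcondition dependencies between nodes
        for (b_prev, _, e_prev), (b_cur, s_cur, _) in zip(episodes, episodes[1:]):
            if s_cur > e_prev:
                links.append((b_prev, b_cur, "sequential"))   # effects held one after another
            else:
                links.append((b_prev, b_cur, "overlapping"))  # effects held concurrently
        return nodes, links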
We designed several different experiments that rely on the navigation and object manipulation skills of the robots. First, we report on the performance of learning from human teachers in clean environments, followed by learning in cluttered environments.
4.1 Experimental results - learning in clean environments

We performed three different experiments in a 4 m x 6 m arena in which only the objects relevant to the tasks were present. During the demonstration phase, a human teacher led the robot through the environment while the robot recorded its observations relative to the postconditions of its behaviors. The demonstrations included:

- teaching a robot to visit a number of cylindrical colored targets in a particular order;

- teaching a robot to slalom around cylindrical colored objects;

- teaching a robot to transport objects between a source and a destination location (marked by cylindrical colored objects).
We repeated these teaching experiments more than five times for each of the demonstrated tasks, to validate that our learning algorithm reliably constructs the same task representation for the same demonstrated task. Next, using the behavior networks constructed during the robot's observations, we performed experiments in which the robot reliably repeated the task it had been shown and had learned. We tested the robot executing the task five times in the same environment as the one in the learning phase, and also five times in a changed environment. We present the details and the results for each of the tasks in the following sections.
4.1.1 Learning to visit targets in a particular order

The goal of this experiment was to teach the robot to reach a set of targets in the order indicated by the arrows in Figure 8(a). The robot's behavior set contains a Tracking behavior, parameterizable in terms of the colors of the targets that are known to the robot. Therefore, during the demonstration phase, different instances of the same behavior produced output according to their settings.
Figure 8: Experimental setup for the target visiting task. (a) Experimental setup (1). (b) Experimental setup (2). (c) Approximate robot trajectory.
Figure 9 shows the behavior network the robot constructed as a result of the above demonstration.
Figure 9: Task representation learned from the demonstration of the Visit targets task. The network links an INIT behavior with Track(Green, 179, 468), Track(Orange, 121, 590), Track(Blue, 179, 531), Track(Yellow, 179, 814), Track(Orange, 55, 769), and Track(Green, 0, 370).
More than five trials of the same demonstration were performed in order to verify the reliability of the network generation mechanism. All of the produced controllers were identical, validating that the robot learned the correct representation for this task.
4.1.2 Learning to slalom

In this experiment, the goal was to teach a robot to slalom through four targets placed in a line, as shown in Figure 10(a). We changed the size of the arena to 2 m x 6 m for this task.
Figure 10: The Slalom task: (a) Experimental setup; (b) Approximate robot trajectory.
During 8 different trials the robot learned the correct task representation, as shown in the behavior network in Figure 11.

Figure 11: Task representation learned from the demonstration of the Slalom task. The network consists of Track(Yellow, 0, 364), Track(Orange, 178, 378), Track(Blue, 10, 350), and Track(Green, 179, 486), together with an Initialize behavior.

We performed 20 experiments, in which the robot correctly executed the slalom task in 85% of the cases. The failures were of two types: 1) the robot, after passing one "gate," could not find the next one due to the limitations of its vision system; and 2) the robot, while searching for a gate, turned back toward the already visited gates. Figure 10(b) shows the approximate trajectory of the robot successfully executing the slalom task on its own.
4.1.3 Learning to traverse "gates" and move objects from one place to another

The goal of this experiment was to extend the complexity of the learned task by adding object manipulation. For this, the robot used its behaviors for picking up and dropping objects, in addition to the behaviors for navigation and tracking already described.

Figure 12: The Object manipulation task: (a) Traversing gates and moving objects; (b) Approximate trajectory of the robot.
The setup for this experiment is presented in Figure 12(a). Note the small orange box close to the green target. In order to teach the robot that the task is to pick up the orange box placed near the green target (the source), the human led the robot to the box and, when the robot was sufficiently near it, placed the box between the robot's grippers. After leading the robot through the "gate" formed by the blue and yellow targets, upon reaching the orange target (the destination), the human took the box from the robot's gripper. The learned behavior network representation is shown in Figure 13. Since the robot started the demonstration with nothing in the gripper, the effects of the Drop behavior were met, and thus an instance of that behavior was added to the network. This ensures correct execution for the case when the robot might start the task while holding something: the first step would be to drop the object being carried.
The ability to track targets within a [0, 180] degree range allows the robot to learn to naturally execute the part of the task involving going through a gate. This experience is mapped onto the robot's representation as follows: "track the yellow target until it is at 180 degrees (and 50 cm) with respect to you, then track the blue target until it is at 0 degrees (and 40 cm)." At execution time, since the robot is able to track both targets even after they disappear from its visual field, the goals of the above Track behaviors were achieved with a smooth, natural trajectory of the robot passing through the gate.

Figure 13: Task representation learned from the demonstration of the Object manipulation task. The network links an INIT behavior with Drop1, Track(Green, 179, 528), PickUp(Orange), Track(Yellow, 179, 396), Track(Blue, 0, 569), Track(Orange, 55, 348), and Drop2.
Due to the increased complexity of the task demonstration, in 10% of the cases (out of more than 10 trials) the behavior network representations built by the robot were not completely accurate. The errors were specialized versions of the correct representation, such as: Track the green target from a certain angle and distance, followed by the same Track behavior but with different parameters, where only the last one was in fact relevant.

The robot correctly executed the task in 90% of the cases. The failures were all of the type involving exceeding the allocated amount of time for the task. This happened when the robot failed to pick up the box because it was too close to it, and thus ended up pushing the box without being able to perceive it. This failure is due to the undesirable arrangement and range of the robot's sensors, not to any algorithmic issues. Figure 14 shows the robot's progress during the execution of a successful task, specifically the intervals of time during which the postconditions of the behaviors in the network were true: the robot started by going to the green target (the source), then picked up the box, traversed the gate, and followed the orange target (the destination), where it finally dropped the box.

Figure 14: The robot's progress (achievement of behavior postconditions) while performing the Object manipulation task. The plot shows, over a run of about 60 seconds, when the postconditions of Drop1, Track(Green), PickUp(Orange), Track(Yellow), Track(Blue), Track(Orange), and Drop2 held.
4.1.4 Discussion

The results obtained from the above experiments demonstrate the effectiveness of using human demonstration, combined with our behavior architecture, as a mechanism for learning task representations. The approach we presented allows a robot to automatically construct such representations from a single demonstration. A summary of the experimental results is presented in Table 1. Furthermore, the tasks the robot is able to learn can embed arbitrarily long sequences of behaviors, which are encoded within the behavior network representation.
Table 1: Summary of the experimental results.

Experiment name            Trials   Successes (Nr.)   Successes (Percent)
Six targets (learning)        5            5               100%
Six targets (execution)       5            5               100%
Slalom (learning)             8            8               100%
Slalom (execution)           20           17                85%
Object move (learning)       10            9                90%
Object move (execution)      10            9                90%
Analyzing the task representations the robot built during the above experiments, we observe a tendency toward over-specialization. The behavior networks the robot learned enforce that the execution go through all demonstrated steps of the task, even if some of them might not be relevant. Since there is no direct information from the human about what is or is not relevant during a demonstration, and since the robot learns the task representation from even a single demonstration, it assumes that everything it notices about the environment is important, and represents it accordingly.
As with any one-shot learning system, our system learns a correct, but potentially overly specialized, representation of the demonstrated task. Additional demonstrations of the same task would allow it to generalize at the level of the constructed behavior network. In the next section we address the problem of over-specialization by experimenting in cluttered environments and allowing the teacher to signal to the robot the saliency of particular events, or even objects. While this does not prevent irrelevant environment state from being observed, it biases the robot to notice and (if capable) capture the key elements.
4.2 Learning in Environments With Distractors

The goal of the experiments presented in this section is to show the ability of the robots to learn in environments containing distractor objects that are not relevant to the demonstrated tasks.
The task to be learned by the robot is similar to the moving objects task from above (Figure 15(a)): pick up the orange box placed near the light green target (the source), go through the "gate" formed by the yellow and light orange targets, drop the box at the dark green target (the destination), and then come back to the source target. The orange and the yellow targets at the left are distractors that should not be considered part of the task. In order to teach the robot that it has to pick up the box, the human led the robot to it and then, when the robot was sufficiently near it, placed the box between the robot's grippers. At the destination target, the teacher took the box from the robot's grippers. The moments in time signaled by the teacher as being relevant to the task were: giving the robot the box while close to the light green target, the teacher reaching the yellow and light orange targets, taking the box from the robot while at the green target, and the teacher reaching the light green target at the end. Thus, although the robot observed that it had passed the orange and the distant yellow targets during the demonstration, it did not include them in its task representation, since the teacher did not signal any relevance while at them.
We performed 10 human-robot demonstration experiments to validate the performance of our learning algorithm. We then evaluated each learned representation both by inspecting it structurally and by having the robot perform it, to get physical validation that the robot had learned the correct task.
Figure 15: The Object manipulation task in environments with distractors. (a) The environment. (b) Achievement of behavior postconditions over time (about 250 seconds) for Drop0, Track(LightGreen), PickUp(Orange), Track(LightOrange), Track(Yellow), Track(Green), Drop9, and Track(LightGreen).
In 9 of the 10 experiments the robot learned a structurally correct representation (sequencing of the relevant behaviors) and also performed it correctly. In one case, although the structure of the behavior network was correct, the learned values of one of the behavior's parameters caused the robot to perform an incorrect task (instead of going between two of the targets, the robot went to them and then around them). The learned behavior network representation of this task is presented in Figure 16.
Figure 16: Task representation learned from human demonstration for the Object manipulation task. The network links an INIT behavior with DROP0, MTLGreen1, PICKUPOrange2, MTLOrange4, MTYellow7, MTGreen8, DROP9, and MTLGreen11.
In Figure 15(b) we show the robot's progress during the execution of the task, more specifically the instants of time, or the intervals, during which the postconditions of the behaviors in the network were true.
For the 9 successes out of 10 trials we recorded, the 95% confidence interval for the binomial distribution of the learning rate is [0.5552, 0.9975], obtained using a Paulson-Camp-Pratt approximation [17] of the confidence limits.
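For comparison, an exact Clopper-Pearson interval for 9 successes in 10 trials can be computed directly; this sketch is a cross-check we add here, not the paper's method:

    from scipy.stats import beta

    def clopper_pearson(k, n, alpha=0.05):
        """Exact two-sided binomial confidence interval for k successes in n trials."""
        lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lo, hi

    print(clopper_pearson(9, 10))   # approximately (0.555, 0.997), close to [0.5552, 0.9975]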
As a base-case scenario, to demonstrate the reliability of the learned representation, we performed 10 trials in which the robot repeatedly executed one of the learned representations of the above task. In 9 of the 10 cases the robot correctly completed the execution of the task. The only failure was due to a time-out in tracking the green target.
5 Related work

The work presented here is most related to two important areas of robotics research: human-robot interaction and robot learning. Here we discuss its relation to both areas and state the advantages gained by combining the two in the context of adding social capabilities to agents in human-robot domains.
Most of the approaches to human-robot interaction so far rely on using predefined, common vocabularies of gestures [18], signs, or words. These can be said to be using a symbolic language, whose elements explicitly communicate specific meanings. The advantage of these methods is that, assuming an appropriate vocabulary and grammar, arbitrarily complex information can be directly transmitted. However, as we are still far from a true dialogue with a robot, most approaches that use natural language for communication employ a limited and specific vocabulary which has to be known in advance by both the robot and the human users. Similarly, for gesture and sign languages, a mutually predefined, agreed-upon vocabulary of symbols is necessary for successful communication. In this work, we show that communication between robots and humans can be achieved even without such explicit prior vocabulary sharing.
One of the most important forms of implicit communication, which has received a great deal of attention among researchers, is the use of various forms of body language. Using this type of communication for human-robot interaction, and human-machine interaction in general, is becoming very popular. For example, it has been applied to humanoid robots (in particular head-eye systems) for communicating emotional states through facial expressions [19] or body movements [7], where the interaction is performed through body language. This idea has been explored in autonomous assistants and interface agents as well [20]. While facial expressions are a natural means of interaction for a humanoid, or in general a "headed," robot, they cannot be entirely applied to the domain of mobile robots, where the platforms typically have a very different, non-anthropomorphic physical structure. In our approach, we demonstrate that the use of implicit, action-based methods for communicating and expressing intentions can be extended to the mobile robot domain, despite the structural differences between mobile robots and humans.
Teaching robots new tasks is a topic of great interest in robotics. In the context of behavior-based robot learning, methods for learning policies (situation-behavior mappings) have been successfully applied to single-robot learning of various tasks, most commonly navigation [21], hexapod walking [22], and box-pushing [23], as well as to multi-robot learning [24].
In the area of teaching robots by demonstration, also referred to as imitation, [8] demonstrated simplified maze learning, i.e., learning turning behaviors, by following another robot teacher. The robot used its own observations to relate the changes in the environment to its own forward, left, and right turn actions. [9] used model-based reinforcement learning to speed up learning for a system in which a 7-DOF robot arm learned the task of balancing a pole from a brief human demonstration. Other work in our lab is also exploring imitation based on mapping observed human demonstration onto a set of behavior primitives, implemented on a 20-DOF dynamic humanoid simulation [25, 26]. The key difference between the work presented here and those above is the level of learning. The work above focuses on learning at the level of action imitation (and thus usually results in acquiring reactive policies), while our approach enables learning of high-level, sequential tasks.
6 Conclusions

In this paper we presented an action-based approach to human-robot interaction and robot learning, both dealing with aspects of designing socially intelligent agents. The method was shown to be effective for interacting with humans using implicit, action-based communication, and for learning from experienced demonstrations.

We argued that the means of communication and interaction of mobile robots which do not have anthropomorphic, animal, or pet-like appearance and expressiveness need not be limited to explicit types of interaction, such as speech or gestures. We demonstrated that simple actions can be used to allow a robot to successfully interact with users and express its intentions. For a large class of intentions of the form "I want to do this, but I can't," the process of capturing a human's attention and then trying to execute the action and failing is expressive enough to effectively convey the message, and thus to obtain assistance.
We also presented a methodology for learning from demonstration in which the robot learns by relating its observations to the known effects of its behavior repertoire. This is made possible by our behavior architecture, which has a perceptual component (the abstract behavior) that embeds representations of the robot's behavior goals. We demonstrated that the method is robust and can be applied to a variety of tasks involving the execution of long, and sometimes even repeated, sequences of behaviors.

While we believe that robots should be endowed with as many interaction modalities as is possible and efficient, we focus on action-based interaction as a lesser-studied but powerful methodology for both learning and human-machine interaction in general.
References

[1] A. David and M. P. Ball, "The video game: a model for teacher-student collaboration," Momentum, vol. 17, no. 1, pp. 24-26, 1986.

[2] C. Murray and K. VanLehn, "DT Tutor: A decision-theoretic, dynamic approach for optimal selection of tutorial actions," in Proc., ITS, 6th Intl. Conf., 2000, pp. 153-162.

[3] V. J. Shute and J. Psotka, "Intelligent tutoring systems: past, present and future," Handbook of Research on Educational Communications and Technology, 1996.

[4] T. Matsui et al., "An office conversation mobile robot that learns by navigation and conversation," in Proc., Real World Computing Symp., 1997, pp. 59-62.

[5] Sebastian Thrun et al., "A second generation mobile tour-guide robot," in Proc. of IEEE, ICRA, 1999.

[6] Francois Michaud and S. Caron, "Roball - an autonomous toy-rolling robot," in Proc. of the Workshop on Interactive Robotics and Entertainment, 2000.

[7] Lola D. Canamero and Jakob Fredslund, "How does it feel? Emotional interaction with a humanoid LEGO robot," Tech Report FS-00-04, AAAI Fall Symposium, 2000.

[8] Gillian Hayes and John Demiris, "A robot controller using learning by imitation," in Proc. of the Intl. Symp. on Intelligent Robotic Systems, Grenoble, France, 1994, pp. 198-204.

[9] Stefan Schaal, "Learning from demonstration," in Advances in Neural Information Processing Systems 9, M. C. Mozer, M. Jordan, and T. Petsche, Eds., 1997, pp. 1040-1046, MIT Press, Cambridge.

[10] Brian Scassellati, "Investigating models of social development using a humanoid robot," in Biorobotics, Barbara Webb and Thomas Consi, Eds., MIT Press, to appear, 2000.

[11] Maja J. Mataric, "Behavior-based control: Examples from navigation, learning, and group behavior," Journal of Experimental and Theoretical Artificial Intelligence, vol. 9, no. 2-3, pp. 323-336, 1997.

[12] Ronald C. Arkin, Behavior-Based Robotics, MIT Press, 1998.

[13] Monica N. Nicolescu and Maja J. Mataric, "A hierarchical architecture for behavior-based robots," in Proc., Intl. Conf. on Autonomous Agents and Multiagent Systems, Bologna, Italy, July 2002.

[14] Daniel C. Dennett, The Intentional Stance, MIT Press, Cambridge, 1987.

[15] Barry Brian Werger, "Ayllu: Distributed port-arbitrated behavior-based control," in Proc., The 5th Intl. Symp. on Distributed Autonomous Robotic Systems, Knoxville, TN, 2000, pp. 25-34, Springer.

[16] Monica N. Nicolescu and Maja J. Mataric, "Experience-based representation construction: learning from human and robot teachers," in Proc., IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, Maui, Hawaii, USA, Oct 2001, pp. 740-745.

[17] Colin R. Blyth, "Approximate binomial confidence limits," Journal of the American Statistical Association, vol. 81, no. 395, pp. 843-855, September 1986.

[18] David Kortenkamp, Eric Huber, and R. Peter Bonasso, "Recognizing and interpreting gestures on a mobile robot," Proc., AAAI, pp. 915-921, 1996.

[19] Cynthia Breazeal and Brian Scassellati, "How to build robots that make friends and influence people," in Proc., IROS, Kyongju, Korea, 1999, pp. 858-863.

[20] Tomoko Koda and Pattie Maes, "Agents with faces: The effects of personification of agents," in Proceedings, HCI, 1996, pp. 98-103.

[21] Marco Dorigo and Marco Colombetti, Robot Shaping: An Experiment in Behavior Engineering, MIT Press, Cambridge, 1997.

[22] Pattie Maes and Rodney A. Brooks, "Learning to coordinate behaviors," in Proc., AAAI, Boston, MA, 1990, pp. 796-802.

[23] Sridhar Mahadevan and Jonathan Connell, "Scaling reinforcement learning to robotics by exploiting the subsumption architecture," in Eighth Intl. Workshop on Machine Learning, 1991, pp. 328-337.

[24] Maja J. Mataric, "Reinforcement learning in the multi-robot domain," Autonomous Robots, vol. 4, no. 1, pp. 73-83, 1997.

[25] Maja J. Mataric, "Sensory-motor primitives as a basis for imitation: Linking perception to action and biology to robotics," in Imitation in Animals and Artifacts, Chrystopher Nehaniv and Kerstin Dautenhahn, Eds., MIT Press, Cambridge, to appear, 2001.

[26] Odest Chadwicke Jenkins, Maja J. Mataric, and Stefan Weber, "Primitive-based movement classification for humanoid imitation," in Proc., First IEEE-RAS Intl. Conf. on Humanoid Robotics, Cambridge, MA, MIT, 2000.