
Behavioral Overlays for Non-Verbal Communication Expression on a Humanoid Robot

Andrew G. Brooks
Robotic Life Group
MIT Media Laboratory
77 Massachusetts Avenue, Cambridge, MA, USA
[email protected]

Ronald C. Arkin
Mobile Robot Laboratory and GVU Center
College of Computing, Georgia Institute of Technology
Atlanta, GA, USA
[email protected]

July 26, 2006

Abstract

This research details the application of non-verbal communication display behaviors to an autonomous humanoid robot, including the use of proxemics, which to date has been seldom explored in the field of human-robot interaction. In order to allow the robot to communicate information non-verbally while simultaneously fulfilling its existing instrumental behavior, a “behavioral overlay” model that encodes this data onto the robot’s pre-existing motor expression is developed and presented. The state of the robot’s system of internal emotions and motivational drives is used as the principal data source for non-verbal expression, but in order for the robot to display this information in a natural and nuanced fashion, an additional para-emotional framework has been developed to support the individuality of the robot’s interpersonal relationships with humans and of the robot itself. An implementation on the Sony QRIO is described which overlays QRIO’s existing EGO architecture and situated schema-based behaviors with a mechanism for communicating this framework through modalities that encompass posture, gesture and the management of interpersonal distance.

1 Introduction

When humans interact with other humans, they use a variety of implicit mechanisms to share information about their own state and the state of the interaction. Expressed over the channel of the physical body, these mechanisms are collectively known as non-verbal communication or “body language”. It has not been proven that humans respond in precisely the same way to the body language of a humanoid robot as they do to that of a human. Nor have the specific requirements that the robot must meet in order to ensure such a response been empirically established. It has, however, been shown that humans will apply a social model to a sociable robot (Breazeal, C. 2003), and will in many cases approach interactions with electronic media holding a set of preconceived expectations based on their experiences of interactions with other humans (Reeves, B. and Nass, C. 1996). If these social equivalences extend to the interpretation of human-like body language displayed by a robot, it is likely that there will be corresponding benefits associated with enabling robots to successfully communicate in this fashion. For a robot such as the Sony QRIO (Figure 1), whose principal function is interaction with humans, we identify three such potential benefits.

First is the practical benefit of increasing the data bandwidth available for the “situational awareness” of the human, by transmitting more information without adding additional load to existing communication mechanisms. If it is assumed that it is beneficial for the human to be aware of the internal state of the robot, yet there are cases in which it is detrimental for the robot to interrupt other activities (e.g. dialog) in order to convey this information, an additional simultaneous data channel is called for. Non-verbal communication is an example of such a channel and, provided that the cues are implemented according to cultural norms and are convincingly expressible by the robot, it adds the advantage of requiring no additional training for the human to interpret.


The second benefit is the forestalling of miscommunication within the expanded available data bandwidth. If humans do indeed have automatic and unconscious expectations of receiving state information through bodily signals, the problem raised is that certain states are represented by null signals, and humans interacting with a robot that does not communicate non-verbally (or which does so intermittently or ineffectively) may misconstrue a lack of communication as deliberate communication of such a state. For example, failing to respond to personal verbal communications with attentive signals, such as eye contact, can communicate coldness or indifference. If a humanoid robot is equipped with those sorts of emotions, it is imperative that we try to ensure that such “false positive” communications are avoided to the fullest extent possible.

The third potential benefit is an increased probability that humans will be able to form bonds with the robot that are analogous to those formed with other humans — for example, affection and trust. We believe that the development of such relations requires that the robot appear “natural” — that its actions can be seen as plausible in the context of the internal and external situations in which they occur. In other words, if a person collaborating with the robot can “at a glance” gain a perspective of not just what the robot is doing but why it is doing it, and what it is likely to do next, we think that he or she will be more likely to apply emotionally significant models to the robot. A principal theory concerning how humans come to be so skilled at modeling other minds is Simulation Theory, which states that humans model the motivations and goals of an observed agent by using their own cognitive structures to mentally simulate the situation of the observee (Davies, M. and Stone, T. 1995, Gordon, R. 1986, Heal, J. 2003). This suggests that it is likely that the more the observable behavior of the robot displays its internal state by referencing the behaviors the human has been conditioned to recognize — the more it “acts like” a human in its “display behaviors” — the more accurate the human’s mental simulation of the robot can become.

With these benefits in mind as ultimate goals, we hereby report on activities towards the more immediate goal of realizing the display behaviors themselves, under three specific constraints. First, such displays should not restrict the successful execution of “instrumental behaviors”, the tasks the robot is primarily required to perform. Second, the application of body language should be tightly controlled to avoid confusing the human — it must be expressed when appropriate, and suppressed when not. Third, non-verbal communication in humans is subtle and complex; the robot must similarly be able to use the technique to represent a rich meshing of emotions, motivations and memories. To satisfy these requirements, we have developed the concept of behavioral “overlays” for incorporating non-verbal communication displays into pre-existing robot behaviors. First, overlays provide a practical mechanism for modifying the robot’s pre-existing activities “on-the-fly” with expressive information, rather than requiring the design of specific new activities to incorporate it. Second, overlays permit the presence or absence of body language, and the degree to which it is expressed, to be controlled independently of the underlying activity. Third, the overlay system can be driven by an arbitrarily detailed model of these driving forces without forcing this model to be directly programmed into every underlying behavior, allowing even simple activities to become more nuanced and engaging.

Figure 1: Sony QRIO, an autonomous humanoid robot designed for entertainment and interaction with humans, shown here in its standard posture.

A brief summary of the contributions of this research follows:

1. This work broadens and reappraises the use of bodily expressiveness in humanoid robots, particularly in the form of proxemics, which has hitherto been only minimally considered due to safety considerations and the relative scarcity of mobile humanoid platforms.

2. This work introduces the concept of a behavioral overlay for non-verbal communication that both encodes state information into the physical output of ordinary behaviors without requiring modification to the behaviors themselves, and increases non-verbal bandwidth by injecting additional communicative behaviors in the absence of physical resource conflicts.

3. This paper further develops the behavioral communications overlay concept into a general model suitable for application to other robotic platforms and information sources.

4. In contrast with much prior work, the research described here provides more depth to the information that is communicated non-verbally, giving the robot the capability of presenting its internal state as interpreted via its own individuality and interpersonal memory rather than simply an instantaneous emotional snapshot.

5. Similarly, while expressive techniques such as facial feature poses are now frequently used in robots to communicate internal state and engage the human, this work makes progress towards the use of bodily expression in a goal-directed fashion.

6. Finally, this work presents a functioning implementation of a behavioral overlay system on a real robot, including the design of data structures to represent the robot’s individual responsiveness to specific humans and to its own internal model.

2 Proxemics and Body Language

Behavioral researchers have comprehensively enumerated and categorized various forms of non-verbal communication in humans and animals. In considering non-verbal communication for a humanoid robot, we have primarily focused on the management of spatial relationships and personal space (proxemics) and on bodily postures and movements that convey meaning (kinesics). The latter class will be more loosely referred to as “body language”, to underscore the fact that while many kinesic gestures can convey meaning in their own right, perhaps the majority of kinesic contributions to non-verbal communication occurs in a paralinguistic capacity, as an enhancement of concurrent verbal dialog (Dittmann, A. 1978). The use of this term is not intended, however, to imply that these postures and movements form a true language with discrete rules and grammars; but as Machotka and Spiegel point out, they convey coded messages that humans can interpret (Machotka, P. and Spiegel, J. 1982). Taken together, proxemics and body language can reflect some or all of the type of interaction, the relations between participants, the internal states of the participants and the state of the interaction.

2.1 Proxemics

Hall, pioneer of the field of proxemics, identified a number of factors that could be used to analyze the usage of interpersonal space in human-human interactions (Hall, E.T. 1966). State descriptors include the potential for the participants to touch, smell and feel the body heat of one another, and the visual appearance of one another’s face at a particular distance (focus, distortion, domination of the visual field). The reactions of individuals to particular proxemic situations were documented according to various codes, monitoring aspects such as the amount and type of visual and physical contact, and whether or not the body posture of the subjects was encouraging (“sociopetal”) or discouraging (“sociofugal”) of such contact.

An informal classification was used to divide the continuous space of interpersonal distance into four general zones according to these state descriptors. In order of increasing distance, these are “Intimate”, “Personal”, “Socio-Consultive” and “Public”. Human usage of these spaces in various relationships and situations has been observed and summarized (Weitz, S. 1974), and can be used to inform the construction of a robotic system that follows similar guidelines (subject to variations in cultural norms).

Spatial separation management therefore has practical effects in terms of the potential for sensing and physical contact, and emotional effects in terms of the comfort of the participants with a particular spatial arrangement. What constitutes an appropriate arrangement depends on the nature of the interaction (what kinds of sensing and contact are necessary, and what kinds of emotional states are desired for it), the relationship between the participants (an appropriate distance for a married couple may be different than that between business associates engaged in the same activity), and the current emotional states of the participants (the preceding factors being equal, the shape and size of an individual’s ideal personal “envelope” can exhibit significant variation based on his or her feelings at the time, as shown in Figure 2).

Figure 2: Illustration of how an individual’s ‘personal’ space zone may vary in size and shape according to emotion. During fear, the space an individual considers his own might expand, with greater expansion occurring to his rear as he avoids potential threats that he cannot see. During anger, the space an individual considers her own might expand to a greater extent to her front as she directs her confrontational attention to known presences.

To ensure that a robot displays appropriate usage of and respect for personal space, and to allow it to take actions to manipulate that space in ways from which the human can understand and infer the underlying reasoning, requires consideration of all of these factors. In addition, the size of the robot (which is not limited to the range fixed by human biology) may have to be taken into account when considering the proxemics that a human might be likely to find natural or comfortable. See Table 1 for a comparison of several proxemic factors in the case of adult humans and QRIO, and Figure 3 for the general proxemic zones that were selected for QRIO.

Table 1: A comparison of select proxemic factors that differ between adult humans and QRIO (equipped with standard optics).

  Proxemic Factor                            Human       QRIO
  Kinesthetic potential (arms only)          60–75cm     20cm
  Kinesthetic potential (arms plus torso)    90–120cm    25–35cm
  Minimum face recognition distance          < 5cm       20cm

There has been little exploration of the use of proxemics in human-robot interaction to date. The reasons for this are perhaps mostly pragmatic in nature. Robotic manipulators, including humanoid upper torsos, can be dangerous to humans and in most cases are not recommended for interactions within distances at which physical contact is possible. In addition, humanoid robots with full mobility are still relatively rare, and those with legged locomotion (complete humanoids) even more so, precluding such investigations. However, some related work exists.

Figure 3: QRIO’s proxemic zones in this implementation, selected as a balance between the zones for an adult human and those computed using QRIO’s relevant proxemic factors. The demarcation distances between zones represent the midpoint of a fuzzy threshold function, rather than a ‘hard’ cutoff.
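To make the fuzzy zone demarcations concrete, here is a minimal sketch of distance-to-zone classification. It is illustrative only: the boundary distances, the logistic threshold form and the softness constant are assumptions chosen for the example, not values taken from QRIO's implementation.

```python
import math

# Zone names follow Section 2.1; the demarcation distances (in metres)
# are illustrative assumptions, not the values plotted in Figure 3.
ZONES = ["Intimate", "Personal", "Socio-Consultive", "Public"]
BOUNDARIES = [0.25, 0.60, 1.50]  # midpoints of the fuzzy thresholds

def _crossed(distance, boundary, softness=0.05):
    """Logistic degree (0..1) to which `distance` lies beyond `boundary`."""
    return 1.0 / (1.0 + math.exp(-(distance - boundary) / softness))

def zone_memberships(distance):
    """Graded membership of each proxemic zone at `distance`; sums to 1."""
    memberships = {}
    inner = 1.0  # degree of having crossed every boundary so far
    for zone, boundary in zip(ZONES, BOUNDARIES):
        crossed = _crossed(distance, boundary)
        memberships[zone] = inner * (1.0 - crossed)
        inner *= crossed
    memberships[ZONES[-1]] = inner  # the remainder belongs to Public
    return memberships
```

Evaluated exactly at a demarcation distance, membership splits roughly evenly between the two adjacent zones, which is the behavior the fuzzy threshold midpoints above describe.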

The mobile robot ‘Chaser’ by Yamasaki and Anzai focused on one of the practical effects of interpersonal distance by attempting to situate itself at a distance from the human that was ideal for sensor operation, in this case the collection of speech audio (Yamasaki, N. and Anzai, Y. 1996). This work demonstrated that awareness of personal distance considerations could be acted upon to improve speech recognition performance.

In a similar vein, Kanda et al. performed a human-robot interaction field study in which the robot distinguished concurrently present humans as either participants or observers based on their proxemic distance (Kanda, T., Hirano, T., Eaton, D. and Ishiguro, H. 2004). A single fixed distance threshold was used for the classification, however, and it was left up to the human participants to maintain the appropriate proxemics.

Likhachev and Arkin explored the notion of “comfort zones” for a mobile robot, using attachment theory to inform an emotional model that related the robot’s comfort to its spatial distance from an object of attachment (Likhachev, M. and Arkin, R.C. 2000). The results of this work showed that the robot’s exploration behavior varied according to its level of comfort; while the published work did not deal with human-robot interaction directly, useful HRI scenarios could be envisaged for cases in which the object of attachment was a human.

More recently, there have been investigations into modifying the spatial behavior of non-humanoid mobile robots in order to make people feel more at ease with the robot. Smith investigated self-adaptation of the behavior of an interactive mobile robot, including its spatial separation from the human, based on its assessment of the human’s comfort (Smith, C. 2005). In this case the goal was for the robot to automatically learn the personal space preferences of individual humans, rather than to use spatial separation as a general form of non-verbal communication. However, some of the informal human subject experiments reported are instructive as to the value of taking proxemics into account in HRI, particularly the case in which misrecognition of a human discomfort response as a comfort response leads to a positive feedback loop that results in distressing behavior on the part of the robot.

Similar efforts have used considerations of humans’ personal space to affect robot navigation. Nakauchi and Simmons developed a robot whose goal was to queue up to register for a conference along with human participants (Nakauchi, Y. and Simmons, R. 2000). The robot thus needed to determine how to move to positions that appropriately matched human queuing behavior. Althaus et al. describe a system developed to allow a robot to approach a group of people engaged in a discussion, enter the group by assuming a spatially appropriate position, and then leave and continue its navigation (Althaus, P., Ishiguro, H., Kanda, T., Miyashita, T. and Christensen, H.I. 2004). Christensen and Pacchierotti use proxemics to inform a control strategy for the avoidance behavior exhibited by a mobile robot when forced by a constraining passageway to navigate in close proximity to humans (Christensen, H.I. and Pacchierotti, E. 2005). Pacchierotti et al. then report positive results from a pilot user study, in which subjects preferred the condition in which the robot moved fastest and signaled the most deference to the humans (by moving out of the way earliest and keeping the greatest distance away), though the subjects were all familiar and comfortable with robots (Pacchierotti, E., Christensen, H.I. and Jensfelt, P. 2005). While informed by proxemics theory, these efforts focus on control techniques for applying social appropriateness to navigation activity, rather than utilizing proxemic behavior as part of a non-verbal communication suite.

Another recent experiment, by te Boekhorst et al., gave some consideration to potential effects of the distance between children and a non-humanoid robot on the children’s attention to a “pass the parcel” game (te Boekhorst, R., Walters, M., Koay, K.L., Dautenhahn, K. and Nehaniv, C. 2005). No significant effects were recorded; however, the authors admit that the data analysis was complicated by violations of the assumptions underlying the statistical tests, and we therefore believe these results should not be considered conclusive. Data from the same series of experiments was used to point out that the children’s initial approach distances were socially appropriate according to human proxemics theory (Walters, M.L., Dautenhahn, K., Koay, K.L., Kaouri, C., te Boekhorst, R., Nehaniv, C.L., Werry, I. and Lee, D. 2005a). A follow-up experiment was conducted using the same non-humanoid robot to investigate the approach distances that adults preferred when interacting with the robot (Walters, M.L., Dautenhahn, K., te Boekhorst, R., Koay, K.L., Kaouri, C., Woods, S., Nehaniv, C.L., Lee, D. and Werry, I. 2005b). A majority, 60%, positioned themselves at a distance compatible with human proxemics theory, whereas the remainder, a significant minority of 40%, assumed positions substantially closer. While these results are generally encouraging in their empirical support of the validity of human-human proxemic theory in human-robot interactions, caution should be observed in extrapolating the results in either direction due to the non-humanoid appearance of the robot.

More recently than the initial submission of this paper, there has been interest shown in the use of proxemics and non-verbal communication by the robotic search and rescue community. In a conference poster, Bethel and Murphy proposed a set of guidelines for affect expression by appearance-constrained (i.e. non-humanoid) rescue robots based on their proxemic zone with respect to a human (Bethel, C.L. and Murphy, R.R. 2006). The goal of this work was once again to improve the comfort level of humans through socially aware behavior.

While not involving robots per se, some virtual environment (VE) researchers have examined reactions to proxemic considerations between humans and humanoid characters within immersive VEs. Bailenson et al., for example, demonstrated that people exhibited similar personal spatial behavior towards virtual humans as they would towards real humans, and this effect was increased the more the virtual human was believed to be the avatar of a real human rather than an agent controlled by the computer (Bailenson, J.N., Blascovich, J., Beall, A.C. and Loomis, J.M. 2003). However, they also encountered the interesting result that subjects exhibited more pronounced avoidance behavior when proxemic boundaries were violated by an agent rather than an avatar; the authors theorize that this is due to the subjects attributing more rationality and awareness of social spatial behavior to a human-driven avatar than a computer-controlled agent, thus “trusting” that the avatar would not walk into their virtual bodies, whereas an agent might be more likely to do so. This may present a lesson for HRI designers: the precise fact that an autonomous robot is known to be under computer control may make socially communicative proxemic awareness (as opposed to simple collision avoidance) particularly important for robots intended to operate in close proximity with humans.

2.2 Body Language

Body language is the set of communicative body motions, or kinesic behaviors, including those that are a reflection of, or are intended to have an influence on, the proxemics of an interaction. Knapp identifies five basic categories:

1. Emblems, which have specific linguistic meaning and are what is most commonly meant by the term ‘gestures’;

2. Illustrators, which provide emphasis to concurrent speech;

3. Affect Displays, more commonly known as facial expressions and used to represent emotional states;

4. Regulators, which are used to influence conversational turn-taking; and

5. Adaptors, which are behavioral fragments that convey implicit information without being tied to dialog (Knapp, M. 1972).

Dittmann further categorizes body language into discrete and continuous (persistent) actions, with discrete actions further partitioned into categorical (always performed in essentially the same way) and non-categorical (Dittmann, A. 1978). Body posture itself is considered to be a kinesic behavior, inasmuch as motion is required to modify it and it can be modulated by attitude (Knapp, M. 1972).

The kinds of body language displays that can be realized on a particular robot of course depend on the mechanical design of the robot itself, and these categorizations of human body language are not necessarily of principal usefulness to HRI designers other than sometimes suggesting implementational details (e.g. the necessity of precisely aligning illustrators with spoken dialog). However, it is useful to examine the broad range of expression that is detailed within these classifications, in order to select those appearing to have the most utility for robotic applications. For example, classified within these taxonomies are bodily motions that can communicate explicit symbolic concepts, deictic spatial references (e.g. pointing), emotional states and desires, likes and dislikes, social status, engagement and boredom. As a result there has been significant ongoing robotics research overlapping with all of the areas thus referenced. For a comprehensive survey of socially interactive robotic research in general, incorporating many of these aspects, see (Fong, T., Nourbakhsh, I. and Dautenhahn, K. 2003).

Emblematic gestures have been widely used for communication on robotic platforms and in the field of animatronics; examples are the MIT Media Lab’s ‘Leonardo’, an expressive humanoid which currently uses emblematic gesture as its only form of symbolic communication and also incorporates kinesic adaptors in the form of blended natural idle motions (Breazeal, C., Brooks, A.G., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A. and Chilongo, D. 2004), and Waseda University’s WE-4RII ‘emotion expression humanoid robot’, which was also designed to adopt expressive body postures (Zecca, M., Roccella, S., Carrozza, M.C., Cappiello, G., Cabibihan, J.-J., Dario, P., Takanobu, H., Matsumoto, M., Miwa, H., Itoh, K. and Takanishi, A. 2004).

Comprehensive communicative gesture mechanisms have also been incorporated into animated humanoid conversational agents and VE avatars. Kopp and Wachsmuth used a hierarchical kinesic model to generate complex symbolic gestures from gesture phrases, later interleaving them tightly with concurrent speech (Kopp, S. and Wachsmuth, I. 2000, Kopp, S. and Wachsmuth, I. 2002). Guye-Vuilleme et al. provided collaborative VE users with the means to manually display a variety of non-verbal bodily expressions on their avatars using a fixed palette of potential actions (Guye-Vuilleme, A., Capin, T.K., Pandzic, I.S., Thalmann, N.M. and Thalmann, D. 1998).

Similarly, illustrators and regulators have been used to punctuate speech and control conversational turn-taking on interactive robots and animated characters. Aoyama and Shimomura implemented contingent head pose (such as nodding) and automatic filler insertion during speech interactions with Sony QRIO (Aoyama, K. and Shimomura, H. 2005). Extensive work on body language for animated conversational agents has been performed at the MIT Media Lab, such as Thorisson’s implementation of a multimodal dialog skill management system on an animated humanoid for face-to-face interactions (Thorisson, K.R. 1996) and Cassell and Vilhjalmsson’s work on allowing human-controlled full-body avatars to exhibit communicative reactions to other avatars autonomously (Cassell, J. and Vilhjalmsson, H. 1999).

Robots that communicate using facial expression have also become the subject of much attention, too numerous to summarize here but beginning with well-known examples such as Kismet and the face robots developed by Hara (Breazeal, C. 2000, Hara, F., Akazawa, H. and Kobayashi, H. 2001). In addition to communication of emotional state, some of these robots have used affective facial expression with the aim of manipulating the human, either in terms of a desired emotional state, as in the case of Kismet, or in terms of increasing desired motivation to perform a collaborative task, as in subsequent work on Leonardo (Brooks, A.G., Berlin, M., Gray, J. and Breazeal, C. 2005).

However, much of this related work either focuses directly on social communication through body language as the central research topic, rather than on the interoperation of non-verbal communication with concurrent instrumental behavior, or on improvements to the interaction resulting from directly integrating non-verbal communication as part of the interaction design process. When emotional models are incorporated to control aspects such as affective display, they tend to be models designed to provide a “snapshot” of the robot’s emotional state (for example, represented by a number of discrete categories such as the Ekman model (Ekman, P. and Davidson, R.J. 1994)) suitable for immediate communication via the robot’s facial actuators, but with minimal reference to the context of the interaction. Recent research by Fridlund, however, has strongly challenged the widely accepted notion that facial expressions are an unconscious and largely culturally invariant representation of internal emotional state, arguing instead that they are very deliberate communications that are heavily influenced by the context in which they are expressed (Fridlund, A. 1994). This is a contention that may be worth keeping in mind concerning robotic body language expression in general.

Given the capabilities of our robotic platform (Sony QRIO) and the relevance of the various types of body language to the interactions envisaged for QRIO, the following aspects were chosen for specific attention:

I. Proxemics and the management of interpersonal distance, including speed of locomotion;

II. Emblematic hand and arm gestures in support of the above;

III. The rotation of the torso during interaction, which in humans reflects the desire for interaction (facing more squarely represents a sociopetal stance, whereas displaying an angular offset is a sociofugal posture) — thus known as the “sociofugal/sociopetal axis”;

IV. The posture of the arms, including continuous measures (arms akimbo, defensive raising of the arms, and rotation of the forearms, which appear sociopetal when rotated outwards and sociofugal when rotated inwards) and discrete postural stances (e.g. arms folded);

V. Head pose and the maintenance of eye contact;

VI. Illustrators, both pre-existing (head nodding) and newly incorporated (attentive torso leaning).

See Figure 15 in Section 6 for examples of some of these poses as displayed by QRIO.

3 Behavioral Overlays

The concept of a behavioral overlay for robots can be described as a motor-level modifier that alters the resulting appearance of a particular output conformation (a motion, posture, or combination of both). The intention is to provide a simultaneous display of information, in this case non-verbal communication, through careful alteration of the motor system in such a way that the underlying behavioral activities of the robot may continue as normal. Just as the behavior schemas that make up the robot’s behavioral repertoire typically need not know of the existence or method of operation of one another, behavioral overlays should be largely transparent to the behavior schema responsible for the current instrumental behavior at a given time. Preservation of this level of modularity simplifies the process of adding or learning new behaviors.


As a simplified example, consider the case of a simple 2-DOF output system: the tail of a robotic dog such as AIBO, which can be angled around horizontal and vertical axes. The tail is used extensively for non-verbal communication by dogs, particularly through various modes of tail wagging. Four examples of such communications via the tail from the AIBO motor primitives design specification are the following:

• Tail Wagging Friendly: Amplitude of wag large, height of tail low, speed of wag baseline slow but related to strength of emotion.

• Tail Wagging Defensive: Amplitude of wag small, height of tail high, speed of wag baseline fast but related to strength of emotion.

• Submissive Posture: In cases of low dominance/high submission, height of tail very low (between legs), no wagging motion.

• “Imperious Walking”: Simultaneous with locomotion in cases of high dominance/low submission; amplitude of wag small, height of tail high, speed of wag fast.

One method of implementing these communication modes would be to place them within each behavioral schema, designing these behaviors with the increased complexity of responding to the relevant emotional and instinct models directly. An alternative approach — behavioral overlays — is to allow simpler underlying behaviors to be externally modified to produce appropriate display activity. Consider the basic walking behavior $b_0$, in which the dog’s tail wags naturally left and right at height $\phi_0$ with amplitude $\pm\theta_0$ and frequency $\omega_0$, with default values arising from the walking step motion rather than the dog’s emotional state. The motor state $M_T$ of the tail under these conditions is thus given by:

$$ M_T(t) = \begin{bmatrix} b_{0x}(t) \\ b_{0y}(t) \end{bmatrix} = \begin{bmatrix} \theta_0 \sin(\omega_0 t) \\ \phi_0 \end{bmatrix} $$

Now consider a behavioral overlay vector for the tail $o_0 = [\alpha, \lambda, \delta]$, applied to the active behavior according to the mathematics of a multidimensional overlay coordination function $\Omega_T$ to produce the following overlaid tail motor state $M_T^+$:

$$ M_T^+(t) = \Omega_T(o_0, b_0(t)) = \begin{bmatrix} \alpha\,\theta_0 \sin(\lambda\,\omega_0 t) \\ \phi_0 + \delta \end{bmatrix} $$

For appropriate construction of $o_0$, the overlay system is now able to produce imperious walking ($\alpha < 1$, $\lambda > 1$, $\delta > 0$) as well as other communicative walk styles not specifically predefined (e.g. “submissive walking”: $\alpha = 0$, $\delta < 0$), without any modification of the underlying walking behavior. Moreover, the display output will continue to reflect as much as possible the parameters of the underlying activity (in this case the walking motion) in addition to the internal state used to generate the communications overlay (e.g. dominance/submission, emotion).
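As a concreteness check, the tail example can be written out directly in code. This is a minimal sketch that mirrors the two equations above; the specific numeric overlay values for the imperious and submissive styles are illustrative assumptions, not figures from the AIBO specification.

```python
import math

def tail_walk_behavior(t, theta0=0.5, omega0=2.0, phi0=0.0):
    """The walking behavior's natural tail output b0(t).

    Returns (horizontal angle, height angle), i.e.
    M_T(t) = [theta0 * sin(omega0 * t), phi0].
    """
    return (theta0 * math.sin(omega0 * t), phi0)

def omega_tail(overlay, behavior, t):
    """Overlay coordination function Omega_T for the 2-DOF tail.

    `overlay` is o = (alpha, lam, delta): amplitude scale, frequency
    scale and height offset, applied on top of the behavior's output.
    """
    alpha, lam, delta = overlay
    # Evaluating the behavior at lam * t rescales the wag frequency;
    # amplitude and tail height are then adjusted directly, giving
    # M_T+(t) = [alpha * theta0 * sin(lam * omega0 * t), phi0 + delta].
    x, y = behavior(lam * t)
    return (alpha * x, y + delta)

# Illustrative overlay vectors (values assumed for the example):
IMPERIOUS = (0.4, 2.0, 0.3)    # small amplitude, fast wag, tail high
SUBMISSIVE = (0.0, 1.0, -0.4)  # no wag, tail low

tail_pose = omega_tail(IMPERIOUS, tail_walk_behavior, t=1.0)
```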

However, in our AIBO example so far, the overlay system is only able to communicate the dog’s internal state using the tail when it is already being moved by an existing behavior (in this case walking). It may therefore be necessary to add an additional special type of behavior whose function is to keep the overlay system supplied with motor input. This type of behavior is distinguished by two characteristics: a different stimulus set than normal behaviors (either more or less, including none at all); and its output is treated differently by the overlay system (which may at times choose to ignore it entirely). We refer to such behaviors as “idler” behaviors. In this case, consider idler behavior $b_1$, which simply attempts to continuously wag the tail some amount in order to provide the overlay system with input to be accentuated or suppressed:

$$ b_1(t) = \begin{bmatrix} \theta_1 \sin(\omega_1 t) \\ \phi_1 \end{bmatrix} $$

This behavior competes for action selection with the regular “environmental” behaviors as normal, and when active it is overlaid by $\Omega_T$ in the same fashion. Thus, with the addition of one extremely basic behavior, the overlay system is able to achieve all four of the tail communication displays originally specified above, including variations in degree and combination, by appropriate selection of overlay components based on the robot’s internal state. For active behavior $b_i$ and overlay $o_j$, the tail activity produced is:

$$ M_T^+(t) = \Omega_T(o_j, b_i(t)) $$

However, the addition of specialized “idler” behaviors provides additional opportunities for manipulation of the robot’s display activity, as these behaviors can be designed to be aware of and communicate with the overlay system — for example, to enable the triggering of emblematic gestures. If the robot’s normal behaviors are subject at a given time to the stimulus vector $[S]$, the idler behaviors can be thought of as responding to an expanded stimulus vector $[S, \Psi]$, where $\Psi$ is the vector of feedback stimuli from the overlay system. For instance, AIBO’s tail idler behavior, upon receiving a feedback stimulus $\psi_p$, might interrupt its wagging action to trace out a predefined shape $P$ with the tail tip:

$$ b_1(\psi_p, t) = \begin{bmatrix} P_x(t) \\ P_y(t) \end{bmatrix} $$

In general, then, let us say that for a collection of active (i.e. having passed action selection) environmental behaviors $B$ and idler behaviors $I$, and an overlay vector $O$, the overlaid motor state $M^+$ is given according to the model:

$$ M^+ = \Omega(O, [B(S), I([S, \Psi])]) $$

This model is represented graphically in Figure 4.

Figure 4: The behavioral overlay model, shown overlaying active environmental behaviors $b_{1..n}$ and idler behaviors $i_{1..m}$ after action selection has already been performed.

In the idealized modular case, the environmental behaviors need neither communicate directly with nor even be aware of the existence of the behavioral overlay system. For practical purposes, however, a coarse level of influence by the environmental behaviors on the overlay system is required, because a complete a priori determination of whether or not motor modification will interfere with a particular activity is very difficult to make. This level of influence has been accounted for in the model, and the input to $\Omega$ from $B$ and $I$ as shown incorporates any necessary communication above and beyond the motor commands themselves.
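The composition just given can be sketched in a few lines, treating behaviors as plain callables. This is schematic only: the merge performed by Ω is left abstract, and nothing here reflects the actual plumbing of the EGO architecture.

```python
def overlaid_motor_state(omega, overlay, env_behaviors, idlers,
                         stimuli, feedback):
    """Compute M+ = Omega(O, [B(S), I([S, Psi])]).

    Environmental behaviors see only the stimulus vector S, while the
    idler behaviors additionally receive Psi, the feedback stimuli from
    the overlay system (e.g. a request to trigger an emblematic
    gesture).  All behaviors passed in have already won action
    selection.
    """
    env_outputs = [b(stimuli) for b in env_behaviors]
    idler_outputs = [i(stimuli, feedback) for i in idlers]
    # omega merges the overlay vector with all motor commands into M+.
    return omega(overlay, env_outputs + idler_outputs)
```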

Behavioral overlays as implemented in the research described in this paper include such facilities for schemas to communicate with the overlay system when necessary. Schemas may, if desired, protect themselves against modification in cases in which interference is likely to cause failure of the behavior (such as a task, like fine manipulation, that would similarly require intense concentration and suppression of the non-verbal communication channel when performed by a human). Care should, of course, be taken to use this facility sparingly, in order to avoid the inadvertent sending of non-verbal null signals during activities that should not require such concentration, or for which the consequences of failure are not excessively undesirable.

Schemas may also make recommendations to the overlay system that assist it in setting the envelope of potential overlays (such as reporting the characteristic proxemics of an interaction; e.g. whether a speaking behavior takes place in the context of an intimate conversation or a public address). In general, however, knowledge of and communication with the overlay system is not a requirement for execution of a behavior. As QRIO’s intentional mechanism is refined to better facilitate high-level behavioral control, the intention system may also communicate with the behavioral overlay system directly, preserving the modularity of the individual behaviors.

The related work of most relevance to this concept is the general body of research on motion parameterization. The essential purpose of this class of techniques is to describe bodily motions in terms of parameters other than their basic joint-angle time series. Ideally, the new parameters should capture essential qualities of the motion (such as its overall appearance) in such a way that these qualities can be predictably modified or held constant by modifying or holding constant the appropriate parameters. This has advantages both for motion generation (classes of motions can be represented more compactly as parameter ranges rather than clusters of individual exemplars) and motion recognition (novel motions can be matched to known examples by comparison of the parameter values).

Approaches to this technique, as applied to motion generation for animated characters, can differ in their principal focus. One philosophy involves creating comprehensive sets of basic motion templates that can then be used to fashion more complex motions by blending and modifying them with a smaller set of basic parameters, such as duration, amplitude and direction; this type of approach was used under the control of a scripting language to add real-time gestural activity to the animated character OLGA (Beskow, J. and McGlashan, S. 1997). At the other extreme, the Badler research group at the University of Pennsylvania argued that truly lifelike motion requires the use of a large number of parameters concerned with effort and shaping, creating the animation model EMOTE based on Laban Movement Analysis; however, the model is non-emotional and does not address autonomous action generation (Chi, D., Costa, M., Zhao, L. and Badler, N. 2000).

One of the best-known examples of motion parameterization, which has inspired extensive attention in both the animation and robotics communities, is the technique of “verbs and adverbs” proposed by Rose et al. (Rose, C., Cohen, M. and Bodenheimer, B. 1998). In this method, verbs are specific base actions and adverbs are collections of parameters that modify the verbs to produce functionally similar motor outputs that vary according to the specific qualities the adverbs were designed to affect.

This technique allows, for example, an animator to generate a continuous range of emotional expressions of a particular action, from say ‘excited waving’ to ‘forlorn waving’, without having to manually create every specific example separately; instead, a single base ‘waving’ motion would be created, and then parameter ranges that describe the variation from ‘excited’ to ‘forlorn’ provide the means of automatically situating an example somewhere on that continuum. While the essential approach is general, it is typically applied at the level of individual actions rather than overall behavioral output, due to the difficulty of specifying a suitable parameterization of all possible motion.
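Reduced to its simplest possible form, the idea amounts to weighted interpolation between time-aligned exemplar motions. The sketch below shows only that core; it assumes the exemplars are already sampled on a common timebase, and it omits the time-warping and interpolation machinery of the actual verbs-and-adverbs technique.

```python
def blend_verb(exemplars, weights):
    """Blend time-aligned exemplar trajectories of one 'verb'.

    `exemplars` maps an adverb setting (e.g. 'excited', 'forlorn') to a
    trajectory: a list of frames, each a list of joint angles.
    `weights` assigns each exemplar a coefficient summing to 1, placing
    the result somewhere on the continuum between the exemplars.
    """
    names = list(exemplars)
    blended = []
    for frame_set in zip(*(exemplars[n] for n in names)):
        # frame_set holds one aligned frame from each exemplar; blend
        # each joint angle as a weighted sum across the exemplars.
        blended.append([
            sum(weights[n] * angle for n, angle in zip(names, joints))
            for joints in zip(*frame_set)
        ])
    return blended

# e.g. waving 70% of the way from 'forlorn' towards 'excited':
# blend_verb({"excited": wave_a, "forlorn": wave_b},
#            {"excited": 0.7, "forlorn": 0.3})
```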

Similarly, the technique of “morphable models” proposed by Giese and Poggio describes motion expressions in terms of pattern manifolds inside which plausible-looking motions can be synthesized and decomposed with linear parameter coefficients, according to the principles of linear superposition (Giese, M.A. and Poggio, T. 2000). In their original example, locomotion gaits such as ‘walking’ and ‘marching’ were used as exemplars to define the parameter space, and from this the parameters could be re-weighted in order to synthesize new gaits such as ‘limping’. Furthermore, an observed gait could then be classified against the training examples using least-squares estimation in order to estimate its relationship to the known walking styles.

Not surprisingly, motion parameterization techniques such as the above have been shown significant interest by the segment of the robotics community concerned with robot programming by demonstration. Motion parameterization holds promise for the central problem that this approach attempts to solve: extrapolation from a discrete (and ideally small) set of demonstrated examples to a continuous task competency envelope; i.e., knowing what to vary to turn a known spatio-temporal sequence into one that is functionally equivalent but better represents current circumstances that were not prevailing during the original demonstration. The robotics literature in this area, even just concerning humanoid robots, is too broad to summarize, but see (Peters, R.A. II, Campbell, C.C., Bluethmann, W.J. and Huber, E. 2003) for a representative example that uses a verbs-adverbs approach for a learned grasping task and illustrates the depth of the problem.

Fortunately, the problem that behavioral overlays seek to address is somewhat simpler. In the first place, the task at hand is not to transform a known action that would be unsuccessful if executed under the current circumstances into one that now achieves a successful result; rather, it is to make modifications to certain bodily postures during known successful actions, to a degree that communicates information without causing those successful actions to become unsuccessful. This distinction carries with it the luxury of being able to choose many parameters in advance according to well-described human posture taxonomies such as those referred to in Section 2, allowing algorithmic attention to be concentrated on the appropriate quantity and combination of their application. Furthermore, such a situation, in which the outcome even of doing nothing at all is at least the success of the underlying behavior, has the added advantage that unused bodily resources can be employed in overlay service with a reasonable level of assurance that they will not cause the behavior to fail.

Secondly, the motion classification task, where it exists at all, is not the complex problem of parameterizing monolithic observed output motion-posture combinations, but simply to attempt to ensure that such conformations, when classified by the human observer’s built-in parameterization function, will be classified correctly. In a sense, behavioral overlays start with motor descriptions that have already been parameterized — into the instrumental behavior itself and the overlay information — and the task of the overlay function is to maintain and apply this parameterization in such a fashion that the output remains effective in substance and natural in appearance, a far less ambiguous situation than the reverse case of attempting to separate unconstrained natural behavior into its instrumental and non-instrumental aspects (e.g. ‘style’ from ‘content’).

In light of the above, and since behavioral overlays are designed with the intention of affecting all of the robot’s behavioral conformations, not just ones that have been developed or exhibited at the time of parameter estimation, the implementation of behavioral overlays described here has focused on two main areas: first, the development of general rules concerning the desired display behaviors and available bodily resources identified in Section 2 that can be applied to a wide range of the robot’s activities; and second, the development of a maintenance and releasing mechanism that maps these rules to the space of emotional and other internal information that the robot will use them to express. Details of the internal data representations that provide the robot with the contextual information necessary to make these connections are given in Section 4, and implementation details of the overlay system itself are provided in Section 5.

4 Relationships and Attitudes

An important driving force behind proxemics and body language is the internal state of the individual. QRIO’s standard EGO Architecture contains a system of internally-maintained emotions and state variables, and some of these are applicable to non-verbal communication. A brief review follows; please refer to Figure 5 for a block diagram of the unmodified EGO Architecture.

QRIO’s emotional model (EM) contains six emotions (ANGER, DISGUST, FEAR, JOY, SADNESS, SURPRISE) plus NEUTRAL, along the lines of the Ekman proposal (Ekman, P. and Davidson, R.J. 1994). Currently QRIO is only able to experience one emotion from this list at a given time, though the non-verbal communication overlay system has been designed in anticipation of the potential for this to change. Emotional levels are represented by continuous-valued variables.

QRIO’s internal state model (ISM) is also a system of continuous-valued variables, which are maintained within a certain range by a homeostasis mechanism. Example variables include FATIGUE, INFORMATION, VITALITY and INTERACTION. A low level of a particular state variable can be used to drive QRIO to seek objects or activities that can be expected to increase the level, and vice versa.
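The homeostatic reading of an ISM variable can be sketched as follows. The relaxation dynamics, gain and value range are assumptions made for illustration; only the variable names and the idea that a low level drives seeking behavior come from the description above.

```python
class InternalStateVariable:
    """One ISM variable (e.g. VITALITY), kept near a set-point."""

    def __init__(self, name, level=0.5, set_point=0.5, gain=0.05):
        self.name = name
        self.level = level          # current continuous value in [0, 1]
        self.set_point = set_point  # homeostatic target
        self.gain = gain            # assumed relaxation rate

    def step(self, external_delta=0.0):
        """Apply an outside influence, then relax toward the set-point."""
        self.level += external_delta
        self.level += self.gain * (self.set_point - self.level)
        self.level = min(1.0, max(0.0, self.level))

    def drive(self):
        """Signed urge: positive when QRIO should seek to raise the level."""
        return self.set_point - self.level

# A low INTERACTION level yields a positive drive, pushing QRIO toward
# activities expected to increase it, and vice versa.
interaction = InternalStateVariable("INTERACTION", level=0.2)
```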

Figure 5: Functional units and interconnections in QRIO’s standard EGO Architecture.

For more details of the QRIO emotionally grounded architecture beyond the above summary, see (Sawada, T., Takagi, T., Hoshino, Y. and Fujita, M. 2004). The remainder of this section details additions to the architecture that have been developed specifically to support non-verbal communication overlays. In the pre-existing EGO architecture, QRIO has been designed to behave differently with different individual humans by changing its EM and ISM values in response to facial identification of each human. However, these changes are instantaneous upon recognition of the human; the lack of additional factors distinguishing QRIO’s feelings about these specific individual humans limits the amount of variation and naturalness that can be expressed in the output behaviors.

In order for QRIO to respond to individual humans with meaningful proxemic and body language displays during personal interactions, QRIO requires a mechanism for preserving the differences between these individual partners — what we might generally refer to as a relationship. QRIO does have a long-term memory (LTM) feature; an associative memory, it is used to remember connections between people and objects in predefined contexts, such as the name of a person’s favorite food. To support emotional relationships, which can then be used to influence non-verbal communication display, a data structure to extend this system has been developed.

Each human with whom QRIO is familiar is represented by a single ‘Relationship’ structure, with several internal variables. A diagram of the structure can be seen in Figure 6. The set of variables chosen for this structure have generally been selected for the practical purpose of supporting non-verbal communication rather than to mirror a particular theoretical model of interpersonal relationships, with exceptions noted below.

Figure 6: An example instantiation of a Relationship structure, showing arbitrarily selected values for the single discrete enumerated variable and each of the five continuous variables.

The discrete-valued ‘Type’ field represents the general nature of QRIO’s relationship with this individual; it provides the global context within which the other variables are locally interpreted. Because a primary use of this structure will be the management of personal space, it has been based on delineations that match the principal proxemic zones set out by Weitz (Weitz, S. 1974). An INTIMATE relationship signifies a particularly close friendly or familial bond in which touch is accepted. A PERSONAL relationship represents most friendly relationships. A SOCIAL relationship includes most acquaintances, such as the relationship between fellow company employees. And a PUBLIC relationship is one in which QRIO may be familiar with the identity of the human but little or no social contact has occurred. It is envisaged that this value will not change frequently, but it could be learned or adapted over time (for example, a PUBLIC relationship becoming SOCIAL after repeated social contact).

All other Relationship variables are continuous-valued quantities, bounded and normalized, with some capable of negative values where a reaction similar in intensity but opposite in nature is semantically meaningful. The ‘Closeness’ field represents emotional closeness and could also be thought of as familiarity or even trust. The ‘Attraction’ field represents QRIO’s desire for emotional closeness with the human. The ‘Attachment’ field, based on Bowlby’s theory of attachment behavior (Bowlby, J. 1969), is a variable with direct proxemic consequences and represents whether or not the human is an attachment object for QRIO, and if so to what degree. The ‘Status’ field represents the relative sense of superiority or inferiority QRIO enjoys in the relationship, allowing the structure to represent formally hierarchical relationships in addition to informal friendships and acquaintances. Finally, the Relationship structure has a ‘Confidence’ field which represents QRIO’s assessment of how accurately the other continuous variables in the structure might represent the actual relationship; this provides a mechanism for allowing QRIO’s reactions to a person to exhibit differing amounts of variability as their relationship progresses, perhaps tending to settle as QRIO gets to know them better.
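Collecting the fields just described, a Relationship record might be declared as below. The field names and the Type enumeration come from the text; the concrete typing and the value-range notes in the comments are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class RelationshipType(Enum):
    """Global context for the relationship, matching Weitz's proxemic zones."""
    INTIMATE = 1
    PERSONAL = 2
    SOCIAL = 3
    PUBLIC = 4

@dataclass
class Relationship:
    """One structure per familiar human, as in Figure 6.

    Continuous fields are bounded and normalized; per the text, some
    (plausibly Status) may also take negative values when an opposite
    reaction is semantically meaningful.
    """
    type: RelationshipType
    closeness: float   # emotional closeness; familiarity or even trust
    attraction: float  # QRIO's desire for emotional closeness
    attachment: float  # degree to which the human is an attachment object
    status: float      # QRIO's relative superiority or inferiority
    confidence: float  # how accurately the other fields reflect reality
```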

In a similar vein, individual humans can exhibit markedly different output behavior under circumstances in which particular aspects of their internal states could be said to be essentially equivalent; their personalities affect the way in which their emotions and desires are interpreted and expressed. There is evidence to suggest that there may be positive outcomes to endowing humanoid robots with perceptible individual differences in the way in which they react to their internal signals and their relationships with people. Studies in social psychology and communication have repeatedly shown that people prefer to interact with people sharing similar attitudes and personalities (e.g. (Blankenship, V., Hnat, S.M., Hess, T.G. and Brown, D.R. 2004, Byrne, D. and Griffit, W. 1969)) — such behavior is known as “similarity attraction”. A contrary case is made for “complementary attraction”, in which people seek interactions with other people having different but complementary attitudes that have the effect of balancing their own personalities (Kiesler, D.J. 1983, Orford, J. 1986).

These theories have been carried over into the sphere of interactions involving non-human participants. In the product design literature, Jordan discusses the “pleasurability” of products; two of the categories in which products have the potential to satisfy their users are “socio-pleasure”, relating to interpersonal relationships, and “ideo-pleasure”, relating to shared values (Jordan, P.W. 2000). And a number of human-computer interaction studies have demonstrated that humans respond to computers as social agents with personalities, with similarity attraction being the norm (e.g. (Nass, C. and Lee, K.M. 2001)). Yan et al. performed experiments in which AIBO robotic dogs were programmed to simulate fixed traits of introversion or extroversion, and showed that subjects were able to correctly recognize the expressed trait; however, in this case the preferences observed indicated complementary attraction, with the authors postulating the embodiment of the robot itself as a potential factor in the reversal (Yan, C., Peng, W., Lee, K.M. and Jin, S. 2004).

As a result, if a humanoid robot can best be thought of as a product which seeks to attract and ultimately fulfil the desires of human users, it might be reasonable to predict that humans will be more attracted to and satisfied by robots which appear to match their own personalities and social responses. On the other hand, if a humanoid robot is instead best described as a human-like embodied agent that is already perceived as complementary to humans as a result of its differences in embodiment, it might alternatively be predicted that humans will tend to be more attracted to robots having personalities that are perceptibly different from their own. In either case, a robot possessing the means to hold such attitudes would be likely to have a general advantage in attaining acceptance from humans, and ultimately the choice of the precise nature of an individual robot could be left up to its human counterpart.

To allow QRIO to exhibit individualistic (and thus hopefully more interesting) non-verbal behavior, a system that interprets or filters certain aspects of QRIO’s internal model was required. At present this system is intended only to relate directly to the robot’s proxemic and body language overlays; no attempt has been made to give the robot anything approaching a complete ‘personality’. As such, the data structure representing these individual traits is instead entitled an ‘Attitude’. Each QRIO has one such structure; it is envisaged to have a long-to-medium-term effect, in that it is reasonable for the structure to be pre-set and not to change thereafter, but it also might be considered desirable to have the robot be able to change its nature somewhat over the course of a particularly long-term interaction (even a lifetime), much as human attitudes sometimes mellow or become more extreme over time. Brief instantiation of temporary replacement Attitude (and Relationship) structures would also provide a potential mechanism for QRIO to expand its entertainment repertoire with ‘acting’ ability, simply by using its normal mechanisms to respond to interactions as though they featured different participants.

The Attitude structure consists of six continuous-valued, normalized variables in three opposing pairs, as illustrated in Figure 7. Opposing pairs are directly related in that one quantity can be computed from the other; although only three variables are therefore computationally necessary, they are specified in this way to be more intuitively grounded from the point of view of the programmer or behavior designer.

Figure 7: An arbitrary instantiation of an Attitude structure, showing the three opposing pairs of continuous-valued variables, and the ISM, EM and Relationship variables that each interprets.
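In code, the pairing can be captured by storing one member of each pair and deriving its opposite. The sketch below is illustrative only; a complement over [0, 1] is assumed, since the paper specifies only that each opposing value is computable from its partner.

    // Sketch of the Attitude structure: three stored variables, each opposing
    // value derived. The complement relation is an illustrative assumption.
    struct Attitude {
        float extroversion   = 0.5f; // overt body language; boosts INTERACTION
        float aggressiveness = 0.5f; // boosts ANGER, dominant postures
        float attachment     = 0.5f; // interprets Relationship Attachment

        float introversion() const { return 1.0f - extroversion; }
        float timidity()     const { return 1.0f - aggressiveness; }
        float independence() const { return 1.0f - attachment; }
    };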

The ‘Extroversion’ and ‘Introversion’ fields are adapted directly from the Extroversion dimension of the Five-Factor Model (FFM) of personality (McCrae, R.R. and Costa, P.T. 1996), and as detailed above were used successfully in experiments with AIBO. In the case of QRIO, these values are used to affect the expression of body language and to interpret internal state desires. High values of Extroversion encourage more overt body language, whereas high Introversion results in more subtle bodily expression. Extroversion increases the effect of the ISM variable INTERACTION and decreases the effect of INFORMATION, whereas Introversion does the reverse.

The ‘Aggressiveness’ and ‘Timidity’ fields are more loosely adapted from the FFM — they can be thought of as somewhat similar to a hybrid of Agreeableness and Conscientiousness, though the exact nature is more closely tailored to the specific emotions and non-verbal display requirements of QRIO. Aggressiveness increases the effect of the EM variable ANGER and decreases the effect of FEAR, while Timidity accentuates FEAR and attenuates ANGER. High Timidity makes submissive postures more probable, while high Aggressiveness raises the likelihood of dominant postures and may negate the effect of Relationship Status.

Finally, the ‘Attachment’ and ‘Independence’ fields depart from the FFM and return to Bowlby; their only effect is proxemic, as an interpreter for the value of Relationship Attachment. While any human can represent an attachment relationship with the robot, robots with different attitudes should be expected to respond to such relationships in different ways. Relationship Attachment is intended to have the effect of compressing the distance from the human that the robot is willing to stray; the robot’s own Independence or Attachment could be used for example to alter the fall-off probabilities at the extremes of this range, or to change the ‘sortie’ behavior of the robot to bias it to make briefer forays away from its attachment object.

The scalar values within the Relationship and Attitude structures provide the raw material for affecting the robot’s non-verbal communication (and potentially many other behaviors). These have been carefully selected in order to be able to drive the output overlays in which we are interested. However, there are many possible ways in which this material could then be interpreted and mathematically converted into the overlay signals themselves. We do not wish to argue for one particular numerical algorithm over another, because that would amount to claiming that we have quantitative answers to questions such as “how often should a robot which is 90% introverted and 65% timid lean away from a person to whom it is only 15% attracted?”. We do not make such claims. Instead, we will illustrate the interpretation of these structures through two examples of general data usage models that contrast the variability of overlay generation available to a standard QRIO versus that available to one equipped with Relationships and Attitudes.

Sociofugal/sociopetal axis: We use a scalar value for the sociofugal/sociopetal axis, S, from 0.0 (the torso facing straight ahead) to 1.0 (maximum off-axis torso rotation). Since this represents QRIO’s unwillingness to interact, a QRIO equipped with non-verbal communication skills but neither Relationships nor Attitudes might generate this axis based on a (possibly non-linear) function s of its ISM variable INTERACTION, $I_{INT}$, and its EM variable ANGER, $E_{ANG}$:

$$S = s(I_{INT}, E_{ANG})$$

The QRIO equipped with Relationships and Attitudes is able to apply more data towards this computation. The value of the robot’s Introversion, $A_{Int}$, can decrease the effect of INTERACTION, resulting in inhibition of display of the sociofugal axis according to some combining function f. Conversely, the value of the robot’s Aggression, $A_{Agg}$, can increase the effect of ANGER, enhancing the sociofugal result according to the combining function g. Furthermore, the robot may choose to take into account the relative Status, $R_{Sta}$, of the person with whom it is interacting, in order to politely suppress a negative display. The improved sociofugal axis generation function $s^{+}$ is thus given by:

$$S' = s^{+}(f(I_{INT}, A_{Int}),\; g(E_{ANG}, A_{Agg}),\; R_{Sta})$$
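To make the shape of such a function concrete, the following sketch instantiates $s^{+}$ with simple linear combining functions. These formulas are placeholders chosen for illustration; as stated above, we deliberately do not claim them as the correct quantitative answer.

    #include <algorithm>

    // Placeholder combining functions; the paper leaves f, g and s+ open.
    float f(float iInt, float aInt) {          // Introversion damps INTERACTION
        return iInt * (1.0f - aInt);
    }
    float g(float eAng, float aAgg) {          // Aggressiveness amplifies ANGER
        return eAng * (1.0f + aAgg);
    }
    float sPlus(float fv, float gv, float rSta) {
        float s = gv - fv;                     // anger pushes off-axis, desire pulls back
        if (rSta > 0.0f) s *= (1.0f - rSta);   // politely suppress toward superiors
        return std::clamp(s, 0.0f, 1.0f);      // 0.0 sociopetal .. 1.0 max rotation
    }
    // Usage: float S = sPlus(f(iInt, aInt), g(eAng, aAgg), rSta);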

Proxemic distance: Even if the set of appropriate proxemic zones P for the type of interaction is specified by the interaction behavior itself, QRIO will need to decide on an actual scalar distance within those zones, D, to stand from the human. A standard QRIO might thus use its instantaneous EM variable FEAR, $E_{FEA}$, to influence whether it was prepared to choose a close value or to instead act more warily and stand back, according to another possibly non-linear function d:

$$D = d(P, E_{FEA})$$

The enhanced QRIO, on the other hand, is able to make much more nuanced selections of proxemic distance. The proxemic Type field of the Relationship, $R_{Typ}$, allows the robot to select the most appropriate single zone from the options given in P. The value of the robot’s Timidity, $A_{Tim}$, increases the effect of FEAR, altering the extent to which the robot displays wariness according to a combining function u. The Closeness of the robot’s relationship to the human, $R_{Clo}$, can also be communicated by altering the distance it keeps between them, as can its Attraction for the human, $R_{Attr}$. If the robot has an Attachment relationship with the human, its strength $R_{Atta}$ can be expressed by enforcing an upper bound on D, modulated by the robot’s Independence $A_{Ind}$ according to the combining function v. The improved ideal proxemic distance generation function $d^{+}$ is thus given by:

$$D' = d^{+}(P,\; R_{Typ},\; u(E_{FEA}, A_{Tim}),\; R_{Clo},\; R_{Attr},\; v(R_{Atta}, A_{Ind}))$$

Clearly, the addition of Relationships and Attitudes offers increased scope for variability of non-verbal communication output. More importantly, however, they provide a rich, socially grounded framework for that variability, allowing straightforward implementations to be developed that non-verbally communicate the information in a way that varies predictably with the broader social context of the interaction.
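A comparable placeholder for $d^{+}$ follows. The zone representation and weights are likewise invented for illustration (Figure 3 gives the interaction distances actually selected for QRIO), and the helper zoneFor() is hypothetical.

    #include <algorithm>

    struct Zone { float nearEdge, farEdge; };  // one proxemic zone, in metres

    float u(float eFea, float aTim) {          // Timidity amplifies FEAR-driven wariness
        return std::clamp(eFea * (1.0f + aTim), 0.0f, 1.0f);
    }
    float v(float rAtta, float aInd) {         // Attachment cap, relaxed by Independence
        return rAtta * (1.0f - aInd);
    }
    float dPlus(const Zone& zone, float wariness,
                float rClo, float rAttr, float cap) {
        // Wariness pushes toward the far edge; Closeness and Attraction pull
        // toward the near edge of the zone selected via the Relationship Type.
        float pull = std::clamp(0.5f * (rClo + rAttr), -1.0f, 1.0f);
        float t = std::clamp(0.5f + 0.5f * (wariness - pull), 0.0f, 1.0f);
        float d = zone.nearEdge + t * (zone.farEdge - zone.nearEdge);
        // An Attachment relationship enforces an upper bound on the distance.
        float bound = zone.farEdge - cap * (zone.farEdge - zone.nearEdge);
        return std::min(d, std::max(bound, zone.nearEdge));
    }
    // Usage: float D = dPlus(zoneFor(rTyp, P), u(eFea, aTim),
    //                        rClo, rAttr, v(rAtta, aInd));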


5 System Architecture and Implementation

The QRIO behavioral system, termed the EGO Architecture, is a distributed object-based software architecture based on the OPEN-R modular environment originally developed for AIBO. Objects in EGO are in general associated with particular functions. There are two basic memory objects: Short-Term Memory (STM), which processes perceptual information and makes it available in processed form at a rate of 2 Hz, and Long-Term Memory (LTM), which associates information (such as face recognition results) with known individual humans. The Internal Model (IM) object manages the variables of the ISM and EM. The Motion Controller (MC) receives and executes motion commands, returning a result to the requesting module.

QRIO’s actual behaviors are executed in up to three Situated Behavior Layer (SBL) objects: the Normal SBL (N-SBL) manages behaviors that execute at the 2 Hz STM update rate (homeostatic behaviors); the Reflexive SBL (R-SBL) manages behaviors that require responses faster than the N-SBL can provide, and therefore operates at a significantly higher update frequency (behaviors requiring perceptual information must communicate with perceptual modules directly rather than via STM); and deliberative behavior can be realized in the Deliberative SBL (D-SBL).

Within each SBL, behaviors are organized in a tree-structured network of schemas; schemas perform minimal communication between one another, and compete for activation according to a winner-take-all mechanism based on the resource requirements of individual schemas. For more information about the EGO Architecture and the SBL system of behavior control, please see (Fujita, M., Kuroki, Y., Ishida, T. and Doi, T. 2003).

5.1 NVC Object

Because behavioral overlays must be able to be applied to all behavior schemas, and because schemas are intended to perform minimal data sharing (so that schema trees can be easily constructed from individual schemas without having to be aware of the overall tree structure), it is not possible or desirable to completely implement an appropriate overlay system within the SBLs themselves. Instead, to implement behavioral overlays an additional, independent Non-Verbal Communication (NVC) object was added to the EGO Architecture.

Figure 8: Interdependence diagram of the EGO Architecture with NVC.

The NVC object has data connections with several of the other EGO objects, but its most central function as a motor-level overlay object is to intercept and modify motor commands as they are sent to the MC object. Addition of the NVC object therefore involves reconnecting the MC output of the various SBL objects to the NVC, and then connecting the NVC object to the MC. MC responses are likewise routed through the NVC object and then back to the SBLs.
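The routing can be summarized in code as follows. This is a conceptual sketch only, with invented class, message and method names; it shows the interception logic, not the OPEN-R/EGO messaging interfaces.

    struct MotorCommand {
        int  resource;        // e.g. HEAD, TRUNK, ARM_L, ARM_R, LEGS
        bool hasOptionPart;   // only parameterized commands can be overlaid
        // ... command body and option parameter set ...
    };

    class NVCObject {
    public:
        // SBL -> NVC -> MC: parameterized commands are modified according to
        // the current overlay values before being forwarded.
        void onCommandFromSBL(MotorCommand cmd) {
            if (cmd.hasOptionPart) applyOverlays(cmd);
            sendToMC(cmd);
        }
        // MC -> NVC -> SBL: results are routed back to the requesting schema.
        void onResultFromMC(int result) { sendToSBL(result); }

    private:
        void applyOverlays(MotorCommand&) { /* apply joint biases/gains */ }
        void sendToMC(const MotorCommand&) { /* post message to MC object */ }
        void sendToSBL(int) { /* post result back to the SBL */ }
    };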

In addition to the motor connection, the NVC object maintains a connection to the IM output (for receiving ISM and EM updates, which can also be used as a 2 Hz interrupt timer), the STM target update (for acquiring information about the location of humans, used in proxemic computations) and a custom message channel to the N-SBL and R-SBL (for receiving special information about the interaction, and sending trigger messages for particular gestures and postures). See Figure 8 for a graphical overview of the EGO Architecture with NVC. In addition to the connections shown, the NVC object manages the behavioral overlay values themselves with reference to the Relationship and Attitude structures; Figure 9 has an overview of the internal workings of NVC.


Figure 9: The non-verbal communication (NVC) module’s conceptual internal structure and data interconnections with other EGO modules.

5.2 Overlays, Resources and Timing

The data representation for the behavioral overlays themselves is basic yet flexible. They are divided according to the major resource types Head, Arms, Trunk and Legs. For each joint (or walking parameter value, in the case of the legs) within a resource type, the overlay maintains a value and a flag that allows the value to be interpreted as either an absolute bias (to allow constant changes to the postural conformation) or a relative gain (to accentuate or attenuate incoming motions). In addition, each overlay category contains a time parameter for altering the speed of motion of the resource, which can also be flagged as a bias or a gain. Finally, the legs overlay contains an additional egocentric position parameter that can be used to modify the destination of the robot in the case of walking commands.
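In code, this representation might look like the following sketch (an assumed layout, not the actual QRIO structures).

    #include <vector>

    // One overlay entry per joint (or per walking parameter, for the legs).
    struct OverlayValue {
        float value  = 0.0f;
        bool  isGain = false;  // false: absolute bias; true: relative gain
        float apply(float in) const { return isGain ? in * value : in + value; }
    };

    // Overlays grouped by major resource type; each category also carries a
    // timing parameter for altering the speed of motion of that resource.
    struct ResourceOverlay {
        std::vector<OverlayValue> joints;
        OverlayValue speed;                      // also flaggable as bias or gain
    };

    struct BodyOverlays {
        ResourceOverlay head, arms, trunk, legs;
        float legDestOffset[2] = {0.0f, 0.0f};   // egocentric shift of walk target
    };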

Motion commands that are routed through the NVC object consist of up to two parts: a command body, and an option parameter set. Motions that are parameterized (i.e., that have an option part) can be modified directly by the NVC object according to the current values of the overlays that the NVC object is storing. Such types of motions include direct positioning of the head, trunk and arm with explicit joint angle commands; general purpose motions that have been designed with reuse in mind, such as nodding (the parameter specifying the depth of nod); and commands that are intended to subsequently run in direct communication with perceptual systems with the SBL excluded from the decision loop, such as head tracking. Unfortunately, due to the design of the MC system, unparameterized motion commands (i.e., those with just a body) cannot be altered before reaching the MC object; but they can be ignored or replaced with any other single parameterized or unparameterized motion command having the same actuator resource requirements (this possibility is not yet taken advantage of in the current implementation).

Resource management in the EGO Architecture is coarse-grained and fixed; it is used for managing schema activation in the SBLs as well as simply for preventing direct motion command conflicts. The resource categories in EGO are the Head, Trunk, Right Arm, Left Arm and Legs. Thus a body language motion command that wanted only to adjust the sociofugal/sociopetal axis (trunk rotate), for example, would nevertheless be forced to take control of the entire trunk, potentially blocking an instrumental behavior from executing. The NVC object implements a somewhat finer-grained resource manager by virtue of the overlay system. By being able to choose to modify the commands of instrumental behaviors directly, or not to do so, the effect is as if the resource were managed at the level of individual joints.

This raises, however, the problem of time steps at which instrumental behaviors do not send motion commands yet it is desired to modify the robot’s body language; this situation is common. The NVC object skirts this problem by relying on a network of fast-activating idle schemas residing in the N-SBL. Each idle schema is responsible for a single MC resource. On each time step, if no instrumental behavior claims a given resource, the appropriate idle schema sends a null command to the NVC object for potential overlaying. If an overlay is desired, the null command is modified into a genuine action and passed on to the MC; otherwise it is discarded. Since commands from idle and other special overlay schemas are thus treated differently by the NVC object than those from behavioral schemas, the NVC object must keep track of which schemas are which; this is accomplished by a handshaking procedure that occurs at startup time, illustrated graphically in Figure 10. Normal operation of the idle schemas is illustrated in Figure 11.
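The schema side of this null-command mechanism is small enough to sketch directly (names invented); on the NVC side, each null command either becomes a genuine overlay action for that resource or is discarded.

    // Sketch of an idle schema (names invented). One schema exists per MC
    // resource; it offers a null command whenever that resource is unclaimed.
    struct NullCommand { int resource; };

    struct IdleSchema {
        int resource_;
        bool resourceClaimedThisTick();   // by any instrumental behavior?
        void sendToNVC(const NullCommand&);

        void onTick() {                   // at the 2 Hz N-SBL rate
            if (!resourceClaimedThisTick())
                sendToNVC(NullCommand{resource_});
        }
    };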

Figure 10: Special schemas register with the NVC object in a handshaking procedure. An acknowledgement from NVC is required because the SBLs do not know when the NVC object is ready to accept messages, so they attempt to register until successful. Within the R-SBL, the parent registers instead of the individual schemas, to preserve as closely as possible the mode of operation of the pre-existing conversational reflex system. The registration system also supports allowing schemas to deliberately change type at any time (e.g. from an idle schema to a behavioral schema), though no use of this extensibility has been made to date.

Figure 11: Normal idle activity of the system in the absence of gesture or reflex triggering. N-SBL idle and gesture schemas are active if resources permit. Active idle schemas send null motor commands to NVC on each time step. The dialog reflex parent is always active, but the individual dialog reflex schemas are inactive.

For practical considerations within the EGO Architecture, actions involved in non-verbal communication can be divided into four categories according to two classification axes. First is the timing requirements of the action. Some actions, such as postural shifts, are not precisely timed and are appropriately suited to the 2 Hz update rate of the N-SBL. Others, however, are highly contingent with the activity, such as nodding during dialog, and must reside in the R-SBL. Second is the resource requirements of the action. Some actions, such as gross posture, require only partial control of a resource, whereas others require its total control. Partial resource management has just been described above, and it functions identically with commands that are also synchronous and originate in the R-SBL. However, there are also non-verbal communication behaviors that require total resource control, such as emblematic gestures, and these are also implemented with the use of specialized “triggered” schemas: at the N-SBL level these schemas are called gesture schemas, and at the R-SBL level they are called dialog reflex schemas.

Gesture schemas reside within the same subtree of the N-SBL as the idle schemas; unlike the idle schemas, however, they do not attempt to remain active when no instrumental behavior is operating. Instead, they await gesture trigger messages from the NVC object, since the NVC object cannot create motion commands directly; it can only modify them. Upon receiving such a message, a gesture schema determines if it is designed to execute the requested gesture; if so, it increases its activation level (AL), sends the motion command, and then reduces its AL again (Figure 12). The gesture, which may be an unparameterized motion from a predesigned set, is then executed by the MC object as usual.

Figure 12: Triggering of a gesture occurs by NVC sending a trigger message to all registered N-SBL schemas. There is no distinction between idle and gesture schemas, as idle schemas can include gesture-like responses such as postural shifts. Individual active schemas decode the trigger message and choose whether or not to respond with a motor command; in this typical example one gesture schema responds.
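This trigger handling can be sketched as follows; the message format and method names are invented.

    // Sketch of a gesture schema's response to an NVC trigger message.
    struct GestureTrigger { int gestureId; };

    struct GestureSchema {
        int  myGestureId;
        void raiseAL();                  // win the winner-take-all competition
        void lowerAL();                  // yield the resource again
        void sendMotionCommand(int id);  // predesigned, possibly unparameterized

        void onTrigger(const GestureTrigger& t) {
            if (t.gestureId != myGestureId) return;  // not this schema's gesture
            raiseAL();
            sendMotionCommand(t.gestureId);
            lowerAL();
        }
    };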

Dialog reflex schemas operate in a similar but slightly different fashion. Existing research on QRIO has already resulted in a successful reactive speech interaction system that inserted contingent attentive head nods and speech filler actions into the flow of conversation (Aoyama, K. and Shimomura, H. 2005). It was of course desirable for the NVC system to complement, rather than compete with, this existing system. The prior system used the “intention” mechanism to modify the ALs of dialog reflex schemas, residing in the R-SBL, at appropriate points in the dialog. The NVC object manages these reflexes in almost exactly the same way.

Dialog reflex schemas are grouped under a parent schema which has no resource requirements and is always active. Upon receiving a control message from the NVC object, the parent schema activates its children’s monitor functions, which then choose to increase their own ALs, trigger the reflex gesture and then reduce their ALs again. Thus the only practical difference from the previous mechanism is that the reflex schemas themselves manage their own AL values, rather than the triggering object doing it for them. In addition to the existing head nodding dialog reflex, a second dialog reflex schema was added to be responsible for leaning the robot’s torso in towards the speaker (if the state of the robot’s interest and the Relationship deem it appropriate) in reaction to speech (Figure 13).

Figure 13: Triggering of a dialog reflex occurs by NVC sending an activation message to the active dialog reflex parent, which then signals the self-evaluation functions of its children. Children self-evaluating to active (e.g. speech is occurring) raise their own activation levels and send motor commands if required. The dialog reflex system remains active and periodically self-evaluating until receiving a deactivation message from NVC, so may either be tightly synchronized to non-causal data (e.g. pre-marked outgoing speech) or generally reactive to causal data (e.g. incoming speech).
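The parent/child protocol might be sketched as follows, again with invented names.

    #include <vector>

    struct DialogReflexChild {
        bool monitor();     // self-evaluation, e.g. "is speech occurring?"
        void fireReflex();  // raises own AL, sends motion, lowers AL again
    };

    struct DialogReflexParent {
        bool active_ = false;                  // no resource requirements
        std::vector<DialogReflexChild*> children_;

        void onNVCMessage(bool activate) { active_ = activate; }

        // Called periodically while active; children decide for themselves.
        void onTick() {
            if (!active_) return;
            for (auto* c : children_)
                if (c->monitor()) c->fireReflex();
        }
    };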

5.3 Execution Flow

The overall flow of execution of the NVC behavioral overlay system (Figure 14) is thus as follows:

• At system startup, the idle and gesture schemas must register themselves with the NVC object to allow subresource allocation and communication. These schemas set their own AL high to allow them to send messages; when they receive an acknowledgement from NVC, they reduce their ALs to a level that will ensure that instrumental behaviors take priority (Figure 10).

• Also at system startup, the dialog reflex schema parent registers itself with NVC so that NVC knows where to send dialog reflex trigger messages (Figure 10).


Figure 14: Execution flow of the NVC overlay system. Timing and synchronization issues such as responding to changes in the internal model and proxemic state (from STM) and waiting for the human to respond to an emblematic gesture are implicitly managed within the overlay calculation state (i.e. entering this state does not guarantee that any change to the overlays will in fact be made).

• Proxemic and body language activity commences when an N-SBL schema informs NVC that an appropriate interaction is taking place. Prior to this event NVC passes all instrumental motion commands to MC unaltered and discards all idle commands (Figure 11). The N-SBL interaction behavior also passes the ID of the human subject of the interaction to NVC so it can look up the appropriate Relationship.

• NVC generates overlay values using a set of functions that take into account the IM and Attitude of the robot, the Relationship with the human and the proxemic hint of the interaction (if available). This overlay generation phase includes the robot’s selection of the ideal proxemic distance, which is converted into a walking overlay if needed.

• Idle commands begin to be processed according to the overlay values. As idle commands typically need only be executed once until interrupted, the NVC object maintains a two-tiered internal subresource manager in which idle commands may hold a resource until requested by a behavior command, but completing behavior commands free the resource immediately.

• To make the robot’s appearance more lifelike and less rigid, the regular ISM update is used to trigger periodic re-evaluation of the upper body overlays. When this occurs, resources held by idle commands are freed and the next incoming idle commands are allowed to execute the new overlays.

• STM updates are monitored for changes in the proxemic state caused by movement of the target human, and if necessary the overlays are regenerated to reflect the updated situation. This provides an opportunity for triggering of the robot’s emblematic gestures concerned with manipulating the human into changing the proxemic situation, rather than the robot adjusting it directly with its own locomotion. A probabilistic function is executed that depends mainly on the robot’s IM and Attitude and the prior response to such gestures, and according to the results an emblematic gesture of the appropriate nature and urgency (e.g. beckoning the human closer, or waving him or her away) is triggered. Trigger messages are sent to all registered idle and gesture schemas, and it is left up to the schemas to choose whether or not to attempt to execute the proposed action (Figure 12); a sketch of such a decision function follows this list.

• Activation messages are sent from NVC to the dialog reflex schemas when dictated by the presence of dialog markers or the desire for general reactivity of the robot (Figure 13).
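As referenced in the list above, the following sketch shows one possible form for the probabilistic gesture-versus-locomotion decision; the weights and thresholds are invented, and the real function also draws on the robot’s full IM state.

    #include <random>

    enum class ProxemicAction { Beckon, WaveAway, WalkSelf };

    // Placeholder decision: gesture at the human, or adjust the distance
    // directly with the robot's own locomotion.
    ProxemicAction chooseProxemicAction(float distanceError, // desired - actual
                                        float patience,      // decays when ignored
                                        float extroversion,
                                        std::mt19937& rng) {
        std::uniform_real_distribution<float> coin(0.0f, 1.0f);
        float pGesture = patience * (0.3f + 0.5f * extroversion);
        if (coin(rng) < pGesture)
            return distanceError > 0.0f ? ProxemicAction::WaveAway
                                        : ProxemicAction::Beckon;
        return ProxemicAction::WalkSelf;  // adjust the distance itself
    }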

6 Results

All of the non-verbal communication capabilities set out in Section 2 were implemented as part of the overlay system. To test the resulting expressive range of the robot, we equipped QRIO with a suite of sample Attitudes and Relationships. The Attitude selections comprised a timid, introverted QRIO, an aggressive, agonistic QRIO, and an “average” QRIO. The example Relationships were crafted to represent an “old friend” (dominant attribute: high Closeness), an “enemy” (dominant attribute: low Attraction), and the “Sony President” (dominant attribute: high Status). The test scenario consisted of a simple social interaction in which QRIO notices the presence of a human in the room and attempts to engage him or her in a conversation. The interaction schema was the same across conditions, but the active Attitude and Relationship, as well as QRIO’s current EM and ISM state, were concurrently communicated non-verbally. This scenario was demonstrated to laboratory members, who were able to recognize the appropriate changes in QRIO’s behavior. Due to the high degree of existing familiarity of laboratory members with QRIO, however, we did not attempt to collect quantitative data from these interactions.

Due to the constraints of our arrangement with Sony to collaborate on their QRIO architecture, formal user studies of the psychological effectiveness of QRIO’s non-verbal expressions (as distinct from the design of the behavioral overlay model itself) have been recommended but not yet performed. In lieu of such experiments, the results of this work are illustrated with a comparison between general interactions with QRIO (such as the example above) in the presence and absence of the overlay system. Let us specify the baseline QRIO as equipped with its standard internal system of emotions and instincts, as well as behavioral schema trees for tracking of the human’s head, for augmenting dialog with illustrators such as head nodding and torso leaning, and for the conversation itself. The behavioral overlay QRIO is equipped with these same attributes plus the overlay implementation described here.

When the baseline QRIO spots the human, it may begin the conversation activity so long as it has sufficient internal desire for interaction. If so, QRIO commences tracking the human’s face and responding to the human’s head movements with movements of its own head to maintain eye contact, using QRIO’s built-in face tracking and fixation module. When QRIO’s built-in auditory analysis system detects that the human is speaking, QRIO participates with its reflexive illustrators such as head nodding, so the human can find QRIO to be responsive. However, it is up to the human to select an appropriate interpersonal distance — if the human walks to the other side of the room, or thrusts his or her face up close to QRIO’s, the conversation continues as before. Similarly, the human has no evidence of QRIO’s desire for interaction other than the knowledge that it was sufficiently high to allow the conversational behavior to be initiated; and no information concerning QRIO’s knowledge of their relationship other than what might come up in the conversation.

Contrast this scenario with that of the behavioral overlay QRIO. When this QRIO spots the human and the conversation activity commences, QRIO immediately begins to communicate information about its internal state and the relationship it shares with the human. QRIO begins to adopt characteristic body postures reflecting aspects such as its desire for interaction and its relative status compared with that of the human — for some examples see Figure 15. If the human is too close or too far away, QRIO may walk to a more appropriate distance (see Figure 3 for specifics concerning the appropriate interaction distances selected for QRIO). Alternatively, QRIO may beckon the human closer or motion him or her away, and then give the human a chance to respond before adjusting the distance itself if he or she does not. This may occur at any time during the interaction — approach too close, for example, and QRIO will back away or gesticulate its displeasure. Repeated cases of QRIO’s gestures being ignored reduce its patience for relying on the human’s response.

Figure 15: A selection of QRIO body postures generated by the overlay system. From left to right, top to bottom: normal (sociopetal) standing posture; maximum sociofugal axis; defensive arm crossing posture; submissive attentive standing posture; open, outgoing raised arm posture; defensive, pugnacious raised arm posture.

The behavioral overlay QRIO may also respond to the human’s speech with participatory illustrators. However, its behavior is again more communicative: if this QRIO desires interaction and has a positive relationship with the human, its nods and torso leans will be more pronounced; conversely, when disengaged or uninterested in the human, these illustrators will be attenuated or suppressed entirely. By adjusting the parameters of its head tracking behavior, the overlay system allows QRIO to observably alter its responsiveness to eye contact. The speed of its movements is also altered: an angry or fearful QRIO moves more rapidly, while a QRIO whose interest in the interaction is waning goes through the motions more sluggishly. All of this displayed internal information is modulated by QRIO’s individual attitude, such as its extroversion or introversion. The human, already trained to interpret such signals, is able to continuously update his or her mental model of QRIO and the relationship between them.

It is clear that the latter interaction described above is richer and exhibits more variation. Of course, many of the apparent differences could equally well have been implemented through specifically designed support within the conversational interaction schema tree. However, the overlay system allows these distinguishing features to be made available to existing behaviors such as the conversation schema, as well as future interactions, without such explicit design work on a repeated basis.

For the most part, the overlays generated within this implementation were able to make use of continuous-valued mapping functions from the various internal variables to the postural and proxemic outputs. For example, a continuous element such as the sociofugal/sociopetal axis is easily mapped to a derived measure of interaction desire (see Section 4). However, there were a number of instances in which it was necessary to select between discrete output states. This was due to the high-level motion control interface provided to us, which simplified the basic processes of generating motions on QRIO and added an additional layer of physical safety for the robot, such as prevention of falling down. On the other hand, this made some capabilities of QRIO’s underlying motion architecture unavailable to us. Some expressive postures such as defensive arm crossing, for instance, did not lend themselves to continuous modulation, as it could not be guaranteed via the abstraction of the high-level interface that the robot’s limbs would not interfere with one another. In these cases a discrete decision function with a probabilistic component was typically used, based on the variables used to derive the continuous postural overlays and with its output allowed to replace those overlay values directly.

In addition to discrete arm postures, it was similarly necessary to create several explicit gestures with a motion editor in order to augment QRIO’s repertoire, as the high-level interface was not designed for parameterization of atomic gestures. (This was presumably a safety feature, as certain parameterizations might affect the robot’s stability and cause it to fall over, even when the robot is physically capable of executing the gesture over the majority of the parameter space.) The gestures created were emblematic gestures in support of proxemic activity, such as a number of beckoning gestures using various combinations of one or both arms and the fingers. Similar probabilistic discrete decision functions were used to determine when these gestures should be selected to override continuous position overlays for QRIO to adjust the proxemic state itself.

QRIO’s high-level walking interface exhibited a similar trade-off in allowing easy generation of basic walking motions but correspondingly hiding access to underlying motion parameters that could otherwise have been used to produce more expressive postures and complex locomotion behaviors. Body language actions that QRIO is otherwise capable of performing, such as standing with legs akimbo, or making finely-tuned proxemic adjustments by walking along specific curved paths, were thus not available to us at this time. We therefore implemented an algorithm that used the available locomotion features to generate proxemic adjustments that appeared as natural as possible, such as uninterruptible linear walking movements to appropriate positions. The display behaviors implemented in this project should thus be viewed as a representative sample of a much larger behavior space that could be overlaid on QRIO’s motor output at lower levels of motion control abstraction.

7 Conclusions and Future Work

This research presented the model of a behavioral overlay system and demonstrated that an implementation of the model could successfully be used to modulate the expressiveness of behaviors designed without non-verbal communication in mind as well as those specifically created to non-verbally support verbal communication (e.g. dialog illustrators). Despite limitations in the motion controller’s support for motion parameterization, a spectrum of display behaviors was exhibited. Future extensions to the motion controller could support increased flexibility in the communicative abilities of the system without major alterations to the basic overlay implementation.

The proxemic component of the behavioral overlay system emphasized that the spatial nature of non-verbal communication through the management of interpersonal distance is not only an important feature of human-robot interaction that is largely yet to be explored, but can also be effectively modulated by means of an overlay. The proxemics in this case were computed simply in terms of egocentric linear distances, so there remains plenty of potential for exploring alternative conceptualizations of proxemic overlays, such as force fields or deformable surfaces. In order to be truly proxemically communicative, future work is required in giving QRIO a better spatial understanding of its environment, not only including physical features such as walls and objects, but also a more detailed model of the spatial activities of the humans within it.

The overlay generating functions used in this project were effective display generators by virtue of being hand-tuned to reflect the general body of research into human postures, gestures and interpersonal spacing, with accommodations made for the differences in body size and motion capability of QRIO. As such, they represent a system that is generically applicable but inflexible to cultural variations and the like. The overlay generation components thus offer an opportunity for future efforts to incorporate learning and adaptation mechanisms into the robot’s communicative repertoire. In particular, it would be beneficial to be able to extract more real-time feedback from the human, especially in the case of the proxemics. For example, given the nature of current human-robot interaction, being able to robustly detect a human’s discomfort with the proxemics of the situation should be an important priority.

This work introduced significant additional infrastructure concerning QRIO’s reactions to its internal states and emotions and to specific humans, in the form of the Attitude and Relationship structures. The Relationship support assists in providing an emotional and expressive component to QRIO’s memory, and the Attitude support offers the beginning of an individualistic aspect to the robot. There is a great deal of potential for future work in this area, primarily in the management and maintenance of relationships once instantiated; and in “closing the loop” by allowing the human’s activity, as viewed by the robot through the filter of the specific relationship, to have a feedback effect on the robot’s emotional state.

Finally, returning to first principles, this paper sought to argue that since it is well supported that non-verbal communication is an important mode of human interaction, it will also be a useful component of interaction between humans and humanoid robots, and that behavioral overlays will be an adequate and scalable method of facilitating this in the long term. Robust HRI studies are now necessary to confirm or reject this conjecture, so that future design work can proceed in the most effective directions.

8 Acknowledgements

The work described in this document was performed at Sony Intelligence Dynamics Laboratories (SIDL) during a summer research internship program. The authors wish to gratefully acknowledge the generous support and assistance provided by Yukiko Hoshino, Kuniaki Noda, Kazumi Aoyama, Hideki Shimomura, the rest of the staff and management of SIDL, and the other summer internship students, in the realization of this research.

References

Althaus, P., Ishiguro, H., Kanda, T., Miyashita, T. and Christensen, H.I.: 2004, Navigation for human-robot interaction tasks, Proc. International Conference on Robotics and Automation, Vol. 2, pp. 1894–1900.

Aoyama, K. and Shimomura, H.: 2005, Real world speech interaction with a humanoid robot on a layered robot behavior control architecture, Proc. International Conference on Robotics and Automation.

Bailenson, J.N., Blascovich, J., Beall, A.C. and Loomis, J.M.: 2003, Interpersonal distance in immersive virtual environments, Personality and Social Psychology Bulletin 29, 819–833.

Beskow, J. and McGlashan, S.: 1997, OLGA — a conversational agent with gestures, Proc. IJCAI-97 Workshop on Animated Interface Agents: Making Them Intelligent.

Bethel, C.L. and Murphy, R.R.: 2006, Affective expression in appearance-constrained robots, Proceedings of the 2006 ACM Conference on Human-Robot Interaction (HRI 2006), Salt Lake City, Utah, pp. 327–328.


Blankenship, V., Hnat, S.M., Hess, T.G. and Brown, D.R.: 2004, Reciprocal interaction and similarity of personality attributes, Journal of Social and Personal Relationships 1, 415–432.

Bowlby, J.: 1969, Attachment & Loss Volume 1: Attachment, Basic Books.

Breazeal, C.: 2000, Sociable Machines: Expressive Social Exchange Between Humans and Robots, PhD thesis, Massachusetts Institute of Technology.

Breazeal, C.: 2003, Towards sociable robots, Robotics and Autonomous Systems 42(3–4), 167–175.

Breazeal, C., Brooks, A.G., Gray, J., Hoffman, G., Kidd, C., Lee, H., Lieberman, J., Lockerd, A. and Chilongo, D.: 2004, Tutelage and collaboration for humanoid robots, International Journal of Humanoid Robots 1(2), 315–348.

Brooks, A.G., Berlin, M., Gray, J. and Breazeal, C.: 2005, Untethered robotic play for repetitive physical tasks, Proc. ACM International Conference on Advances in Computer Entertainment, Valencia, Spain.

Byrne, D. and Griffit, W.: 1969, Similarity and awareness of similarity of personality characteristic determinants of attraction, Journal of Experimental Research in Personality 3, 179–186.

Cassell, J. and Vilhjalmsson, H.: 1999, Fully embodied conversational avatars: Making communicative behaviors autonomous, Autonomous Agents and Multiagent Systems 2, 45–64.

Chi, D., Costa, M., Zhao, L. and Badler, N.: 2000, The EMOTE model for effort and shape, Proc. 27th Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH ’00), pp. 173–182.

Christensen, H.I. and Pacchierotti, E.: 2005, Embodied social interaction for robots, in Dautenhahn, K. (ed.), Proceedings of the 2005 Convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB-05), Hertfordshire, England, pp. 40–45.

Davies, M. and Stone, T.: 1995, Mental Simulation, Blackwell Publishers, Oxford, UK.

Dittmann, A.: 1978, The role of body movement in communication, Nonverbal Behavior and Communication, Lawrence Erlbaum Associates, Hillsdale.

Ekman, P. and Davidson, R.J.: 1994, The Nature of Emotion, Oxford University Press.

Fong, T., Nourbakhsh, I. and Dautenhahn, K.: 2003, A survey of socially interactive robots, Robotics and Autonomous Systems 42, 143–166.

Fridlund, A.: 1994, Human Facial Expression: An Evolutionary View, Academic Press, San Diego, CA.

Fujita, M., Kuroki, Y., Ishida, T. and Doi, T.: 2003, Autonomous behavior control architecture of entertainment humanoid robot SDR-4X, Proc. International Conference on Intelligent Robots and Systems, Las Vegas, NV, pp. 960–967.

Giese, M.A. and Poggio, T.: 2000, Morphable models for the analysis and synthesis of complex motion patterns, International Journal of Computer Vision 38(1), 59–73.

Gordon, R.: 1986, Folk psychology as simulation, Mind and Language 1, 158–171.

Guye-Vuilleme, A., Capin, T.K., Pandzic, I.S., Thalmann, N.M. and Thalmann, D.: 1998, Non-verbal communication interface for collaborative virtual environments, Proc. Collaborative Virtual Environments (CVE 98), pp. 105–112.

Hall, E.T.: 1966, The Hidden Dimension, Doubleday, Garden City, NY.

Hara, F., Akazawa, H. and Kobayashi, H.: 2001, Realistic facial expressions by SMA driven face robot, Proceedings of IEEE International Workshop on Robot and Human Interactive Communication, pp. 504–510.

Heal, J.: 2003, Understanding Other Minds from the Inside, Cambridge University Press, Cambridge, UK, pp. 28–44.

Jordan, P.W.: 2000, Designing Pleasurable Products: an Introduction to the New Human Factors, Taylor & Francis Books Ltd., UK.

Kanda, T., Hirano, T., Eaton, D. and Ishiguro, H.: 2004, Interactive robots as social partners and peer tutors for children, Human-Computer Interaction 19, 61–84.


Kiesler, D.J.: 1983, The 1982 interpersonal circle: A taxonomy for complementarity in human transactions, Psychological Review 90, 185–214.

Knapp, M.: 1972, Nonverbal Communication in Human Interaction, Reinhart and Winston, Inc., New York.

Kopp, S. and Wachsmuth, I.: 2000, A knowledge-based approach for lifelike gesture animation, Proc. ECAI-2000.

Kopp, S. and Wachsmuth, I.: 2002, Model-based animation of coverbal gesture, Proc. Computer Animation.

Likhachev, M. and Arkin, R.C.: 2000, Robotic comfort zones, Proceedings of SPIE: Sensor Fusion and Decentralized Control in Robotic Systems III Conference, Vol. 4196, pp. 27–41.

Machotka, P. and Spiegel, J.: 1982, The Articulate Body, Irvington.

McCrae, R.R. and Costa, P.T.: 1996, Toward a new generation of personality theories: Theoretical contexts for the five-factor model, in Wiggins, J.S. (ed.), Five-Factor Model of Personality, Guilford, New York, pp. 51–87.

Nakauchi, Y. and Simmons, R.: 2000, A social robot that stands in line, Proc. International Conference on Intelligent Robots and Systems, Vol. 1, pp. 357–364.

Nass, C. and Lee, K.M.: 2001, Does computer-generated speech manifest personality? Experimental test of recognition, similarity-attraction, and consistency-attraction, Journal of Experimental Psychology: Applied 7(3), 171–181.

Orford, J.: 1986, The rules of interpersonal complementarity: Does hostility beget hostility and dominance, submission?, Psychological Review 93, 365–377.

Pacchierotti, E., Christensen, H.I. and Jensfelt, P.: 2005, Human-robot embodied interaction in hallway settings: A pilot user study, Proceedings of the 2005 IEEE International Workshop on Robots and Human Interactive Communication, pp. 164–171.

Peters, R.A. II, Campbell, C.C., Bluethmann, W.J. and Huber, E.: 2003, Robonaut task learning through teleoperation, Proc. International Conference on Robotics and Automation, Taipei, Taiwan, pp. 2806–2811.

Reeves, B. and Nass, C.: 1996, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places, Cambridge University Press, Cambridge, England.

Rose, C., Cohen, M. and Bodenheimer, B.: 1998, Verbs and adverbs: Multidimensional motion interpolation, IEEE Computer Graphics and Applications 18(5), 32–40.

Sawada, T., Takagi, T., Hoshino, Y. and Fujita, M.: 2004, Learning behavior selection through interaction based on emotionally grounded symbol concept, Proc. IEEE-RAS/RSJ Int’l Conf. on Humanoid Robots (Humanoids ’04).

Smith, C.: 2005, Behavior adaptation for a socially interactive robot, Master’s thesis, KTH Royal Institute of Technology, Stockholm, Sweden.

te Boekhorst, R., Walters, M., Koay, K.L., Dautenhahn, K. and Nehaniv, C.: 2005, A study of a single robot interacting with groups of children in a rotation game scenario, Proc. 6th IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA 2005), Espoo, Finland.

Thorisson, K.R.: 1996, Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills, PhD thesis, MIT Media Laboratory, Cambridge, MA.

Walters, M.L., Dautenhahn, K., Koay, K.L., Kaouri, C., te Boekhorst, R., Nehaniv, C.L., Werry, I. and Lee, D.: 2005a, Close encounters: Spatial distances between people and a robot of mechanistic appearance, Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots (Humanoids ’05), Tsukuba, Japan, pp. 450–455.

Walters, M.L., Dautenhahn, K., te Boekhorst, R., Koay, K.L., Kaouri, C., Woods, S., Nehaniv, C.L., Lee, D. and Werry, I.: 2005b, The influence of subjects’ personality traits on personal spatial zones in a human-robot interaction experiment, Proceedings of IEEE Ro-Man 2005, 14th Annual Workshop on Robot and Human Interactive Communication, IEEE Press, Nashville, Tennessee, pp. 347–352.


Weitz, S.: 1974, Nonverbal Communication: Readings With Commentary, Oxford University Press.

Yamasaki, N. and Anzai, Y.: 1996, Active interface for human-robot interaction, Proc. International Conference on Robotics and Automation, pp. 3103–3109.

Yan, C., Peng, W., Lee, K.M. and Jin, S.: 2004, Can robots have personality? An empirical study of personality manifestation, social responses, and social presence in human-robot interaction, 54th Annual Conference of the International Communication Association.

Zecca, M., Roccella, S., Carrozza, M.C., Cappiello, G., Cabibihan, J.-J., Dario, P., Takanobu, H., Matsumoto, M., Miwa, H., Itoh, K. and Takanishi, A.: 2004, On the development of the emotion expression humanoid robot WE-4RII with RCH-1, Proc. IEEE-RAS/RSJ Int’l Conf. on Humanoid Robots (Humanoids ’04), Los Angeles, California.
