IEEE RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication, Toyama, Japan, Sept. 27 - Oct. 2, 2009
Showing Awareness of Humans’ Context
to Involve Humans in Interaction
Yasuhiko Hato, Thomas Kanold, Kentaro Ishii, and Michita Imai
Abstract— This paper proposes a robot communication strategy that enables a human's context to be incorporated into a robot's context. The strategy's fundamental principle is that a robot shows awareness of a human's context. In our pilot study, many participants did not actually start interacting with the robot, but instead tested its functionality. Based on the results of that pilot study, we implemented a robot with behaviors that show awareness of such peculiar human behaviors. We conducted a field experiment to verify the effectiveness of the robot's awareness-showing behaviors. The results indicated that showing awareness can be a method for involving humans
in an interaction with a robot.
I. INTRODUCTION
There has been much recent research that has introduced
guide robots into public places such as museums and exhi-
bition halls [1], [2], [3]. These robots interact with humans
through speech and gestures so that anyone can easily
communicate with them. Interaction through speech and gestures, however, does not always succeed, because humans may not pay attention to the robots during the interaction. A communication strategy that draws humans' attention to the robot is therefore required as a fundamental step toward speech and gesture interaction.
Robot communication strategies for obtaining people's attention have been studied before. Shiomi et al. used a communication robot to guide people in a science museum [4]. The robot could distinguish visitors by means of RFID tags worn around their necks and, based on the tag's ID, could call out each visitor's name. According to their questionnaire, the visitors gave a higher evaluation score to the name-calling robot than to a normal robot. In addition, Okada et al. investigated the behavior of people listening to the contents presented by a guide robot in an art museum [5]. They focused on the subjects' neck movements during the explanation and implemented them in the robot.
However, Shiomi et al. reported that visitors sometimes tried to evoke the robot's reaction to their own behavior and did not follow the explanations or directions it gave. They repeatedly touched the robot or showed their RFID
Y. Hato and T. Kanold are with the Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan. {hato, thomas}@ayu.ics.keio.ac.jp
T. Kanold is also with Technische Universität Darmstadt.
K. Ishii is with the Japan Science and Technology Agency, ERATO, IGARASHI Design Interface Project, Frontier Koishikawa Bldg. 7F, Koishikawa 1-28-1, Bunkyo-ku, Tokyo, Japan. [email protected]
M. Imai is a faculty member of the Science and Technology Department, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan. [email protected]
Fig. 1. Showing awareness of humans' context. (State: people pay attention to the robot itself, not to the content. Method: the robot shows awareness of the human's context in that state. Goal: people pay attention to the content.)
tag to the robot while the robot tried to explain an exhibit.
Our pilot study, described in Sec. II, showed a similar trend: 28 of 60 groups did not follow the robot's guidance, even though they looked at the robot. Two peculiar behaviors were identified. The first was waving a hand in front of the robot's cameras to invoke a reaction. The second was observing the robot's mechanism and outer appearance. In both cases, the participants looked at the robot but did not interact with it. We think the problem lies in a mismatch between how the human and the robot try to interact with each other. We define the term "context" as "the way a human or a robot interacts with the other." For example, the context of a person waving her/his hand in front of a robot's camera might be to test the robot's image processing function. Okada et al. succeeded in maintaining the interaction, but they did not discuss the possibility of context mismatching.
We propose a communication strategy that enables a
human’s context to be incorporated into a robot’s context.
The basic idea is for the robot to show awareness of a
human’s context to create the interaction. We aimed at
drawing a human’s attention to the robot’s words (Fig. 1).
This communication strategy is called showing awareness
(SA). Because the robot shows awareness of the current situation, a change occurs in the person's state of mind: she/he infers the robot's intention and then begins
to listen to its content.
We developed a communication robot that introduced
people to the various parts of a building, and conducted a
field experiment in its lobby to compare the “with SA” and
“without SA” strategies. Based on the results of the pilot study, we implemented two utterances for use as SA: (a) “It’s
OK, I can see you.” for those who try to test the robot’s vision
by waving their hand in front of the robot’s camera, and (b)
“Do you hear me?” for those who do not respond at all.
The remainder of this paper is organized as follows.
Sec. II describes the background for this paper. Sec. III
explains our communication strategy SA. Sec. IV describes
our experiment to compare the strategy with SA to without
SA. Sec. V presents the results from our experiment. Sec. VI
discusses the results and provides an overview of the future
work resulting from the experiment. Sec. VII concludes this
paper with a brief summary.
II. BACKGROUND
We focus on a robot whose task is guiding visitors in a
public space, and aim to draw their attention to the contents
of the robot’s guidance. In this section, we first introduce
studies related to our target. Next, we present our pilot study, conducted to identify concrete issues, and its results. Lastly, we present our approach to the problems identified in the pilot study.
A. Related Work
Hayashi et al. described a similar open-field experiment [6]. As in our study, they attempted to draw participants' attention to information. Their robots had the task of informing people in a train station about the station and giving some travel information. They prepared and compared several conditions: a single robot versus two robots, and non-interactive behavior (robots repeatedly announced the information regardless of the presence of visitors) versus interactive behavior (robots greeted visitors when they detected them, then started to announce the information). The results showed that non-interactive behavior by two robots attracted the most interest in the information. However, their experiment only proposed a method to convey information and did not address improving the interaction between robots and humans itself.
A relationship between a robot and a human supports better mutual understanding and more reliable, effective cooperation. It has been shown that, given an existing relationship, a human can better estimate a robot's intention even if the utterance is acoustically unclear [7]. Ono et al. focused on a human's utterance-recognition process and succeeded in creating a relationship between a robot and a human by migrating a known CG character from a PC to a robot. They adopted a non-verbal method, whereas we aim to create a relationship between a robot and humans through verbal expressions.
Our approach differs from experiments like [8] in that our
focus does not lie in visual features such as head tracking
or facial expressions, but in the investigation of conveying
information through utterances.
Fig. 2. Scene from pilot study
TABLE I
PILOT STUDY RESULTS

                                                individuals   groups   total
paid attention to the robot's contents                9         23      32
did not pay attention to the robot's contents         4         24      28
B. Pilot Study
The aim of this study was to see whether people would follow a robot's guidance proposal and how they would behave.
1) Outline of Pilot Study: A humanoid robot was placed
in an exhibition hall, and gave visitors information about
three scientific exhibits at a university festival while pointing
to the corresponding exhibition booth (Fig. 2). The exper-
imenters, who were hidden behind a wall, controlled the
timing of the robot’s utterances. The participants’ reactions
were observed while the robot was talking.
2) Results: The robot talked to 75 groups during the experiment. We analyzed 60 of these groups; the remaining 15 completely ignored the robot. A group consisted of one or more participants, and during the experiment groups and individuals were treated the same. 13 groups consisted of only one participant, while the other 47 consisted of two or more.
TABLE I lists the results of the pilot experiment according to whether or not the groups paid attention to the robot's contents. Groups that turned around to look in the direction the robot pointed were counted as paying attention to the robot's contents. When at least one participant in a group turned their attention to the robot, the whole group was counted as paying attention. We found that about 47% of the groups did not pay attention to the robot's contents.
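The rate reported above can be reproduced directly from the counts in TABLE I (a trivial check using the paper's own numbers):

```python
# Counts from TABLE I of the pilot study.
attended = 9 + 23   # single-participant groups + multi-participant groups
ignored = 4 + 24

# Fraction of the 60 analyzed groups that did not pay attention.
rate_ignored = ignored / (attended + ignored)
print(f"{rate_ignored:.0%}")  # prints "47%"
```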
Typical participants whom we did not consider to be paying attention to the contents behaved as follows:
1) Participants who waved their hands in front of the robot's camera; they could see that the robot tracked them with its head.
2) Participants who came very close to the robot and stooped down to examine the robot's construction, sometimes even touching the robot.
These participants' context seemed to be testing the robot's image processing function or examining its details, while the robot's context was to interact with them.
C. Approach
As described in Sec. II-B.2, the pilot experiment confirmed that information is not conveyed to humans unless their context coincides with the robot's context; the people in question paid attention only to the robot itself.
We defined the following behaviors as our targets.
WB: Waving Behavior. Participants who wave their hands in front of the robot's camera. Some play with the robot's image processing in another way: they see that the robot can track them with its head, so they walk from side to side to test and play around with that functionality.
NIB: No-Interaction Behavior. People who approach the robot to glance at its outer appearance but do not interact with it. No interaction means that they do not answer the robot's questions. Some come very close to the robot, stooping down to examine its construction and sometimes even touching the robot.
We attempted to achieve two goals with the approach to
our new experiment:
• To stop people from examining and playing with the
robot.
• To get them to start interacting with the robot.
The informative intention and communicative intention described in D. Sperber and D. Wilson's relevance theory [9] were taken into consideration when designing the robot's utterances. An informative intention is the intention that expresses the message itself; a communicative intention is the meta-intention to convey the informative intention. For example, the utterance "Please turn on the light" has the informative intention of having a partner turn on the light and a communicative intention of conveying that informative intention.
III. SHOWING AWARENESS OF HUMANS’ CONTEXT
In this section, we propose a communication strategy for
changing a human’s context. This strategy is called showing
awareness (SA).
A. Communication Strategy
WB and NIB humans do not see a robot as a communica-
tion partner. Usually, their context is to observe the robot’s
construction or functional abilities. The aim of SA is to
change the participants’ context. The idea is that if a robot
shows awareness of the participants’ context, the subjects
might change their context.
SA is expressed through utterances (SA utterances). We designed the SA utterances so that a human's mind changes through the following steps, which are also shown in Fig. 3.
Fig. 3. Outline of the transition of a human's mind caused by SA: in Step 1 the human reads the communicative intention from the SA utterance; in Step 2 she/he infers the informative intention.
Step 1: Human learns of the existence of the robot’s
communicative intention from SA utterance.
Step 2: Human infers that the robot’s informative inten-
tion is that it wants to interact in its context (i.e.,
it wants her/him to listen to its content).
Step 3: Human begins to listen to the content of guid-
ance.
We expect the human’s context to change as a result.
We prepared the following two utterances for WB and NIB
respectively.
(a) “It’s OK, I can see you. Please listen to me.”
(b) “Do you hear me? Please listen to me.”
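The mapping from a detected context to an SA utterance can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual robot was operated with a Wizard-of-Oz method (Sec. IV-C), so the behavior labels here are hypothetical stand-ins for the experimenter's judgment.

```python
# SA utterances (a) and (b), keyed by the target behavior from Sec. II-C.
SA_UTTERANCES = {
    "WB": "It's OK, I can see you. Please listen to me.",   # waving behavior
    "NIB": "Do you hear me? Please listen to me.",          # no-interaction behavior
}

def select_sa_utterance(behavior):
    """Return the SA utterance for the observed context ('WB' or 'NIB'),
    or None when the participant is already attending to the content."""
    return SA_UTTERANCES.get(behavior)
```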
B. Robot’s Guidance Behavior
The robot's guidance contents for the study are shown in Fig. 4. The general guidance behavior of the robot is as follows. It explains the facility to people nearby, using pointing gestures to indicate directions, and usually gazes at the participants' faces throughout the communication. At the questions marked with red squares in Fig. 4 it asks the participants for an answer, and the content branches according to a yes or no. The guidance finishes with the utterance "See you later." Whenever a participant checks the robot's image processing, the robot uses utterance (a). The guide robot also asks the participants whether they want some more information and then reacts appropriately; if the participants do not answer this question, the robot uses utterance (b) of our approach. This flow is shown in Fig. 5.
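The guidance-plus-SA behavior just described amounts to a scripted sequence with two interrupt rules. The sketch below is our simplification under stated assumptions: the script is abbreviated, the event names (`checks_camera`, `no_answer`) are hypothetical stand-ins for the wizard's observations, and branching on yes/no answers is omitted.

```python
# Abbreviated guide script: ("say", ...) plays an utterance; ("ask", ...)
# additionally expects an answer from the participant (red squares in Fig. 4).
GUIDE_SCRIPT = [
    ("say", "Hello. My name is Uni. I will explain this building."),
    ("ask", "May I explain more?"),
    ("say", "Thank you. See you later."),
]

def run_guidance(events):
    """events: one participant event per script step, each either
    'checks_camera' (WB), 'no_answer' (NIB at a question), or None."""
    log = []
    for (kind, text), event in zip(GUIDE_SCRIPT, events):
        # SA utterance (a): the participant plays with the camera at any time.
        if event == "checks_camera":
            log.append("It's OK, I can see you. Please listen to me.")
        log.append(text)
        # SA utterance (b): a question that received no answer.
        if kind == "ask" and event == "no_answer":
            log.append("Do you hear me? Please listen to me.")
    return log
```

For example, `run_guidance([None, "no_answer", None])` inserts "Do you hear me? Please listen to me." right after the unanswered question, mirroring the flow in Fig. 5.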
IV. EXPERIMENT
The experiment was conducted to verify the effectiveness of the SA strategy described in Sec. III.
A. Hypothesis
Hypothesis: A participant’s context of inattention towards
the robot’s communicated content can be changed if the robot
shows its awareness (SA) of that context.
Fig. 4. Guide flow of the robot UNI: each black square represents an utterance from UNI, a red square stands for an utterance demanding an answer from the participant, and each gray square is the participant's reaction. The contents proceed along the arrows, from "Hello. My name is Uni. I will explain this building." through pointed directions to the library, ITC on the B1 level, and the bakery "La poire", the question "May I explain more?", further explanations of the upper floors (laboratories, a scenic dome on the seventh floor, discussion rooms), and finally "Thank you. See you later."
Fig. 5. The SA flow: each dashed square represents an SA utterance, and each gray square the participant's reaction. When the participant checks UNI's image processing function, during an utterance or during the transition to the next utterance, UNI says "It's OK, I can see you. Please listen to me." When the participant shows no reaction to UNI's question, UNI says "Do you hear me?" before the next utterance.
B. Environment
We performed the experiment using a communication
robot named UNI in the elevator hall of the 14th building
at the Yagami Campus of Keio University (Fig. 6). This
place is an open environment, and passersby are familiar
with it. Since people are usually not used to seeing a robot,
we expected them to come closer to it to examine the new
technology. The participants were unaware of the experiment.
The experiment’s environment is shown in Fig. 7. The
experimenter controlled the robot from behind a guidance
board so that the participants could not see him. Two cameras
were used for recording the results. One camera was placed
on UNI to record the movements and reactions of the
participants. The other camera was appropriately placed to
record the entire scene.
C. Outline of Experiment
The experiment was conducted using the “Wizard-of-Oz”
method, and the experimenter always watched the scene
through the installed cameras. When a participant passed by,
Fig. 6. UNI: Robot used in experiment
Fig. 7. Experimental environment (showing the library, elevators, bench, robot, experimenter, guide board, cameras, and entrances)
the robot turned its head toward the subject, and the flow of the guide content began with the utterance "Hello," as described in Fig. 4. A picture of the experiment scene is shown in Fig.
8.
After UNI said the last utterance “See you later” or
participants left UNI, an experimenter appeared from behind
a guide board and asked them to fill out a questionnaire.
Through the questionnaire we collected the participants' age, gender, permission to use the recorded video, and impressions; they could also decline to answer it.
To prove or disprove our hypothesis, the participants showing Waving Behavior (WB) or No-Interaction Behavior (NIB) were divided into two groups. For the experimental group (EG) we used the patterns in both Fig. 4 and Fig. 5, while for the control group (CG) we used only the pattern in Fig. 4. Both EG and CG included only groups that showed WB or NIB.
D. Prediction
Based on the above plan, we made the following predictions: EG participants...
1) ...listen to utterances more than CG, because they pay
attention to contents of UNI due to SA.
2) ...stop checking functionality or observing the assem-
bly of UNI due to SA utterance (a).
Fig. 8. Scene from the experiment
3) ...start answering UNI’s question because of SA utter-
ance (b).
V. RESULTS
The experiment was held over two days, and 44 groups (74 people; 21 females and 53 males) participated. 23 groups consisted of only one participant, while the other 21 groups consisted of two or more. People who merely passed by while looking at the robot were not counted. 30 participants (15 groups) comprised the EG and 44 (29 groups) comprised the CG. The average age of the participants who filled out the questionnaire was 26.9 years (males, ages 22-61, averaged 28.8 years; females, ages 19-36, averaged 24.4 years).
A. Prediction 1
Fourteen utterances (Fig. 4) were prepared as guide contents. To test Prediction 1, we counted the number of utterances played before each group left UNI. The average number of utterances was 9.0 in the EG and 9.8 in the CG; no significant difference between EG and CG was found.
B. Prediction 2
The Target Group consisted of the EG participants whose behavior was WB. This included waving a hand in front of the camera (Fig. 9) or walking from side to side (Fig. 10) so that UNI's head had to follow them. We counted the number of people in the Target Group who stopped checking as a result of SA utterance (a). Three of the nine participants in the Target Group started interacting after the SA. Thus, our strategy was able to stop 33% of them from checking UNI's functions.
Moreover, all participants in the Target Group were sur-
prised by UNI’s SA utterances.
C. Prediction 3
We observed the groups in the EG whose behavior was NIB and investigated whether they would eventually interact after SA. Participants who did not answer any of UNI's questions were classified as NIB participants. During observation of the videos, groups and individuals were treated the same. For the evaluation of Prediction 3, six groups out of the experiment group were examined. They heard UNI's question "May I explain more?" or "Should I explain it again from the beginning?" but did not answer at first. Among these groups, three responded affirmatively after hearing SA utterance (b), "Do you hear me?", from UNI. Those replies were also regarded as interaction with the robot. Altogether, three out of six groups started interacting with UNI because of SA; our strategy thus succeeded in involving 50% of the EG groups that showed NIB in the interaction.
Also in this case, participants were surprised by UNI’s SA
utterances.
VI. DISCUSSION
A. Prediction 1
We expected that, because of SA, participants in the EG would continue to listen to UNI's contents, so that more utterances would be played in the EG. However, as described in Sec. V-A, there was no significant difference between EG and CG; the number of played utterances in the CG was even higher. We believe that the participants' previous knowledge of the building made them less interested in hearing UNI's explanation.
B. Prediction 2
As mentioned above, 33% of the participants in the Target Group stopped playing with UNI's camera. We believe this is due to their understanding of UNI's communicative and informative intentions; because of this, UNI could change those participants' context. However, 67% of the participants did not stop their examinations and remained amused by UNI's face tracking.
There are two possibilities.
• Step 2 failure. Because they could not infer UNI’s
informative intention, they did not change their context.
• Step 3 failure. They wanted to stay in their context (i.e.,
they did not intend to hear UNI’s content) though they
could understand UNI’s informative intention.
Considering the design of the SA utterances and the fact that all participants in the Target Group were surprised by them, a Step 3 failure seems to be the reason why they did not stop WB. Surprise in this case indicates that people did not expect UNI to have an intention at all.
Moreover, since they might already have known the building, UNI did not offer them any new information. This would also explain why they were not interested in UNI's guidance content.
Fig. 9. An example of a participant who waved her/his hands (five video frames)

Fig. 10. An example of a participant who walked from side to side (four video frames)
C. Prediction 3
50% of the participants whose behavior was NIB came to share context with UNI and began to interact with it. An SA utterance could change the participants' behavior and draw their attention to UNI's context. Given their reaction to the SA utterances described in Sec. V-C, we suggest that the informative and communicative intentions could be used to convey a message to them. In contrast, although the other 50% understood UNI's intention when it said the SA utterance, they remained in their own context and did not start to interact with it. Considering the design of the SA utterances and their surprise, a Step 3 failure was the reason why they did not stop NIB.
D. Future Study
We aim to design the next experiment to verify the effectiveness of SA with different utterances: for example, saying an utterance with SA ("It's OK. I can see you. Please listen to me.") in the EG, while saying an utterance without SA ("Please listen to me.") in the CG. When an SA utterance after a question leads to a response, we will have to repeat the question.
For the following experiment, it will be important to get
a higher number of participants into the target group.
As mentioned, the content of the robot’s guidance was not
suitable because all participants already knew it. We need
new content that participants do not know.
The effect of surprise caused by SA must be investigated.
We do not yet know if surprise is the main reason for a
sudden response or not.
VII. CONCLUSION
In this paper, we proposed a communication strategy
called SA that focuses on helping guide robots in public
places to draw attention to their spoken content. A robot can
show its awareness of people’s context using SA so that they
then change their context.
We conducted an experiment in the lobby of a building at our university and analyzed the recorded scenes from the experiment. SA stopped 33% of participants from playing with the robot and succeeded in involving 50% in the interaction. This suggests that people noticed the robot's intention because of the SA utterances. We believe that SA will become an effective method after a few adjustments.
REFERENCES
[1] J. Schulte, C. Rosenberg, and S. Thrun, "Spontaneous, Short-term Interaction with Mobile Robots," Proceedings of the 1999 IEEE International Conference on Robotics & Automation, pp. 658-663, 1999.
[2] Y. Koide, T. Kanda, Y. Sumi, K. Kogure, and H. Ishiguro, "An Approach to Integrating an Interactive Guide Robot with Ubiquitous Sensors," Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 3, pp. 2500-2505, 2004.
[3] K. Nohara, T. Tajika, M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, "Integrating Passive RFID Tag and Person Tracking for Social Interaction in Daily Life," Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, pp. 545-552, 2008.
[4] M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, "Interactive Humanoid Robots for a Science Museum," Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI '06), pp. 305-312, 2006.
[5] M. Okada, Y. Hoshi, K. Yamazaki, A. Yamazaki, Y. Kuno, and Y. Kobayashi, "Museum Guide Robot Attracting Visitors in a Talk: Synchronously Coordination in Verbal and Bodily Behaviors" (in Japanese), Human-Agent Interaction Symposium 2008, pp. 2B-1, 2008.
[6] K. Hayashi, D. Sakamoto, T. Kanda, and M. Shiomi, "Humanoid Robots as a Passive-Social Medium - A Field Experiment at a Train Station," Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, pp. 137-144, 2007.
[7] T. Ono, M. Imai, and H. Ishiguro, "Anthropomorphic Communications in the Emerging Relationship between Humans and Robots," Proceedings of the 2000 IEEE International Workshop on Robot and Human Interactive Communication, pp. 334-339, 2000.
[8] A. Bruce, I. Nourbakhsh, and R. Simmons, "The Role of Expressiveness and Attention in Human-Robot Interaction," AAAI Fall Symposium, 2001.
[9] D. Sperber and D. Wilson, Relevance: Communication and Cognition, Oxford: Basil Blackwell, 1986.