IEEE RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication, Toyama, Japan, Sept. 27 - Oct. 2, 2009
Showing Awareness of Humans’ Context
to Involve Humans in Interaction
Yasuhiko Hato, Thomas Kanold, Kentaro Ishii, and Michita Imai
Abstract— This paper proposes a robot communication strategy that enables a human's context to be incorporated into a robot's context. The strategy's fundamental principle is that a robot shows awareness of a human's context. In our pilot study, many participants did not actually start interacting with the robot, but instead tested its functionality. Based on the results of that pilot study, we implemented a robot with behaviors that show awareness of such peculiar human behaviors. We conducted a field experiment to verify the effectiveness of the robot's awareness-showing behaviors. The results indicated that showing awareness can be a method for involving humans
in an interaction with a robot.
I. INTRODUCTION
There has been much recent research that has introduced
guide robots into public places such as museums and exhi-
bition halls [1], [2], [3]. These robots interact with humans
through speech and gestures so that anyone can easily
communicate with them. Interaction through speech and gestures, however, does not always succeed, because humans may not pay attention to the robots during the interaction. A communication strategy that draws humans' attention to the robot is therefore required as a fundamental step toward speech and gesture interaction.
Robot communication strategies for obtaining people's attention have been studied before. Shiomi et al. used a communication robot to guide people in a science museum [4]. The robot could distinguish visitors by means of RFID tags worn around their necks and, based on the tag's ID, could call out each visitor's name. According to their questionnaire, the visitors gave a higher evaluation score to the name-calling robot than to a normal robot. In addition, Okada et al. investigated the behavior of people listening to the contents presented by a guide robot in an art museum [5]. They focused on the subjects' neck movements during the explanation and implemented them in the robot.
However, Shiomi et al. reported that visitors sometimes tried to evoke the robot's reaction to their own behavior and did not follow the explanations or directions it gave. They repeatedly touched the robot or showed their RFID
Y. Hato and T. Kanold are with the Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan. {hato, thomas}@ayu.ics.keio.ac.jp
T. Kanold is also with Technische Universität Darmstadt.
K. Ishii is with the Japan Science and Technology Agency, ERATO, IGARASHI Design Interface Project, Frontier Koishikawa Bldg. 7F, Koishikawa 1-28-1, Bunkyo-ku, Tokyo, Japan. [email protected]
M. Imai is a faculty member of the Science and Technology Department, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan. [email protected]
Fig. 1. Showing awareness of humans' context. (State: people pay attention to the robot itself, not to the content. Method: the robot shows awareness of the human's context in that state. Goal: people pay attention to the content.)
tag to the robot while the robot tried to explain an exhibit.
Our pilot study, described in Sec. II, showed a similar trend: 28 of 60 groups did not follow the robot's guidance, even though they looked at the robot. Two peculiar behaviors were identified. The first was waving a hand in front of the robot's cameras to invoke a reaction. The second was observing the robot's mechanism and outer appearance. In both cases, the participants looked at the robot but did not interact with it. We think the problem lies in a mismatch between how the human and the robot try to interact with each other. We define the term "context" as "the way a human or a robot interacts with the other." For example, the context of a person waving her/his hand in front of a robot's camera might be to test the robot's image processing function. Okada et al. succeeded in maintaining the interaction, but they did not discuss the possibility of context mismatching.
We propose a communication strategy that enables a
human’s context to be incorporated into a robot’s context.
The basic idea is for the robot to show awareness of a
human’s context to create the interaction. We aimed at
drawing a human’s attention to the robot’s words (Fig. 1).
This communication strategy is called showing awareness
(SA). Because the robot shows awareness of the current situation, a change occurs in the person's state of mind: she/he infers the robot's intention and then begins
to listen to its content.
We developed a communication robot that introduced
people to the various parts of a building, and conducted a
field experiment in its lobby to compare the “with SA” and
“without SA” strategies. Based on the results of the pilot study, we implemented two utterances for use as SA: (a) “It’s
OK, I can see you.” for those who try to test the robot’s vision
by waving their hand in front of the robot’s camera, and (b)
“Do you hear me?” for those who do not respond at all.
The remainder of this paper is organized as follows.
Sec. II describes the background for this paper. Sec. III
explains our communication strategy SA. Sec. IV describes
our experiment to compare the strategy with SA to without
SA. Sec. V presents the results from our experiment. Sec. VI
discusses the results and provides an overview of the future
work resulting from the experiment. Sec. VII concludes this
paper with a brief summary.
II. BACKGROUND
We focus on a robot whose task is guiding visitors in a
public space, and aim to draw their attention to the contents
of the robot’s guidance. In this section, we first introduce
studies related to our target. Next, we present our pilot study, conducted to identify concrete issues, and its results. Lastly, we present our approach to the problems identified in the pilot study.
A. Related Work
Hayashi et al. described a similar open-field experiment [6]. As in our study, they attempted to draw participants' attention to information. Their robots had the task of informing people in a train station about the station and giving some travel information. They prepared and compared several conditions: a single robot versus two robots, and non-interactive behavior (robots repeatedly announced the information regardless of the presence of visitors) versus interactive behavior (robots greeted visitors when they detected them, then started to announce the information). The results showed that non-interactive behavior by two robots attracted the most interest in the information. However, their experiment only proposed a method to convey information and did not address improving the interaction between robots and humans itself.
A relationship between a robot and a human supports better mutual understanding and more reliable, effective cooperation. It has been shown that, given an existing relationship, a human can better estimate a robot's intention even if the utterance is acoustically unclear [7]. Ono et al. focused on a human's utterance-recognition process and succeeded in creating a relationship between a robot and a human by migrating a known CG character from a PC to a robot. They adopted a non-verbal method, whereas we aim to create a relationship between a robot and humans through verbal expressions.
Our approach differs from experiments like [8] in that our
focus does not lie in visual features such as head tracking
or facial expressions, but in the investigation of conveying
information through utterances.
Fig. 2. Scene from pilot study
TABLE I
PILOT STUDY RESULTS

                                                individuals   groups   total
paid attention to the robot's contents                9         23      32
did not pay attention to the robot's contents         4         24      28
B. Pilot Study
The aim of this study was to see whether people would follow a robot's guidance proposal and how they would behave.
1) Outline of Pilot Study: A humanoid robot was placed
in an exhibition hall, and gave visitors information about
three scientific exhibits at a university festival while pointing
to the corresponding exhibition booth (Fig. 2). The exper-
imenters, who were hidden behind a wall, controlled the
timing of the robot’s utterances. The participants’ reactions
were observed while the robot was talking.
2) Results: The robot talked to 75 groups during the experiment. We analyzed 60 of these groups; the remaining 15 completely ignored the robot. A group consisted of one or more participants, and during the experiment groups and individuals were treated the same. 13 groups consisted of only one participant, while the other 47 consisted of two or more.
TABLE I lists the results of the pilot experiment according to whether or not the groups paid attention to the robot's contents. Groups that turned around to look in the direction the robot pointed were counted as paying attention to the robot's contents. When at least one participant in a group turned their attention to the robot, the whole group was counted as paying attention. We found that about 47% of the groups did not pay attention to the robot's contents.
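The rate reported above can be reproduced directly from the counts in TABLE I (a trivial check using the paper's own numbers):

```python
# Counts from TABLE I of the pilot study.
attended = 9 + 23   # single-participant groups + multi-participant groups
ignored = 4 + 24

# Fraction of the 60 analyzed groups that did not pay attention.
rate_ignored = ignored / (attended + ignored)
print(f"{rate_ignored:.0%}")  # prints "47%"
```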
Typical participants whom we did not consider to be paying attention to the contents behaved as follows:
1) Participants who waved their hands in front of the robot's camera; they could see that the robot tracked them with its head.
2) Participants who came very close to the robot and stooped down to examine the robot's construction, sometimes even touching the robot.
These participants' context seemed to be testing the robot's image processing function or examining its details, while the robot's context was to interact with them.
C. Approach
As described in Sec. II-B.2, the pilot experiment confirmed that information is not conveyed to humans unless their context coincides with the robot's context; the people in question paid attention only to the robot itself.
We defined the following behaviors as our targets.
WB: Waving Behavior. Participants who wave their hands in front of the robot's camera. Some play with the robot's image processing in another way: they see that the robot can track them with its head, so they walk from side to side to test and play around with that functionality.
NIB: No-Interaction Behavior. People who approach the robot to glance at its outer appearance but do not interact with it. No interaction means that they do not answer the robot's questions. Some come very close to the robot, stooping down to examine its construction and sometimes even touching the robot.
We attempted to achieve two goals with the approach to
our new experiment:
• To stop people from examining and playing with the
robot.
• To get them to start interacting with the robot.
The informative intention and communicative intention described in D. Sperber and D. Wilson's relevance theory [9] were taken into consideration when designing the robot's utterances. An informative intention is the intention that expresses the message itself; a communicative intention is the meta-intention to convey the informative intention. For example, the utterance "Please turn on the light" has the informative intention of having a partner turn on the light and a communicative intention of conveying that informative intention.
III. SHOWING AWARENESS OF HUMANS’ CONTEXT
In this section, we propose a communication strategy for
changing a human’s context. This strategy is called showing
awareness (SA).
A. Communication Strategy
WB and NIB humans do not see a robot as a communica-
tion partner. Usually, their context is to observe the robot’s
construction or functional abilities. The aim of SA is to
change the participants’ context. The idea is that if a robot
shows awareness of the participants’ context, the subjects
might change their context.
SA is expressed through utterances (SA utterances). We designed the SA utterances so that a human's mind changes through the following steps, which are also shown in Fig. 3.
Fig. 3. Outline of the transition of a human's mind caused by SA: in Step 1 the human reads the communicative intention from the SA utterance; in Step 2 she/he infers the informative intention.
Step 1: Human learns of the existence of the robot’s
communicative intention from SA utterance.
Step 2: Human infers that the robot’s informative inten-
tion is that it wants to interact in its context (i.e.,
it wants her/him to listen to its content).
Step 3: Human begins to listen to the content of guid-
ance.
We expect the human’s context to change as a result.
We prepared the following two utterances for WB and NIB
respectively.
(a) “It’s OK, I can see you. Please listen to me.”
(b) “Do you hear me? Please listen to me.”
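The mapping from a detected context to an SA utterance can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual robot was operated with a Wizard-of-Oz method (Sec. IV-C), so the behavior labels here are hypothetical stand-ins for the experimenter's judgment.

```python
# SA utterances (a) and (b), keyed by the target behavior from Sec. II-C.
SA_UTTERANCES = {
    "WB": "It's OK, I can see you. Please listen to me.",   # waving behavior
    "NIB": "Do you hear me? Please listen to me.",          # no-interaction behavior
}

def select_sa_utterance(behavior):
    """Return the SA utterance for the observed context ('WB' or 'NIB'),
    or None when the participant is already attending to the content."""
    return SA_UTTERANCES.get(behavior)
```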
B. Robot’s Guidance Behavior
The robot's guidance contents for the study are shown in Fig. 4. The general guidance behavior of the robot is as follows. It explains the facility to people nearby, using pointing gestures to indicate directions, and usually gazes at the participants' faces throughout the communication. At the questions marked with red squares in Fig. 4 it asks the participants for an answer, and the content branches according to a yes or no. The guidance finishes with the utterance "See you later." Whenever a participant checks the robot's image processing, the robot uses utterance (a). The guide robot also asks the participants whether they want some more information and then reacts appropriately; if the participants do not answer this question, the robot uses utterance (b) of our approach. This flow is shown in Fig. 5.
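The guidance-plus-SA behavior just described amounts to a scripted sequence with two interrupt rules. The sketch below is our simplification under stated assumptions: the script is abbreviated, the event names (`checks_camera`, `no_answer`) are hypothetical stand-ins for the wizard's observations, and branching on yes/no answers is omitted.

```python
# Abbreviated guide script: ("say", ...) plays an utterance; ("ask", ...)
# additionally expects an answer from the participant (red squares in Fig. 4).
GUIDE_SCRIPT = [
    ("say", "Hello. My name is Uni. I will explain this building."),
    ("ask", "May I explain more?"),
    ("say", "Thank you. See you later."),
]

def run_guidance(events):
    """events: one participant event per script step, each either
    'checks_camera' (WB), 'no_answer' (NIB at a question), or None."""
    log = []
    for (kind, text), event in zip(GUIDE_SCRIPT, events):
        # SA utterance (a): the participant plays with the camera at any time.
        if event == "checks_camera":
            log.append("It's OK, I can see you. Please listen to me.")
        log.append(text)
        # SA utterance (b): a question that received no answer.
        if kind == "ask" and event == "no_answer":
            log.append("Do you hear me? Please listen to me.")
    return log
```

For example, `run_guidance([None, "no_answer", None])` inserts "Do you hear me? Please listen to me." right after the unanswered question, mirroring the flow in Fig. 5.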
IV. EXPERIMENT
The experiment was conducted to verify the effectiveness of the SA strategy described in Sec. III.
A. Hypothesis
Hypothesis: A participant’s context of inattention towards
the robot’s communicated content can be changed if the robot
shows its awareness (SA) of that context.
Fig. 4. Guide flow of the robot UNI: each black square represents an utterance from UNI, a red square stands for an utterance demanding an answer from the participant, and each gray square is the participant's reaction. The contents proceed along the arrows, from "Hello. My name is Uni. I will explain this building." through pointed directions to the library, ITC on the B1 level, and the bakery "La poire", the question "May I explain more?", further explanations of the upper floors (laboratories, a scenic dome on the seventh floor, discussion rooms), and finally "Thank you. See you later."
Fig. 5. The SA flow: each dashed square represents an SA utterance, and each gray square the participant's reaction. When the participant checks UNI's image processing function, during an utterance or during the transition to the next utterance, UNI says "It's OK, I can see you. Please listen to me." When the participant shows no reaction to UNI's question, UNI says "Do you hear me?" before the next utterance.
B. Environment
We performed the experiment using a communication
robot named UNI in the elevator hall of the 14th building
at the Yagami Campus of Keio University (Fig. 6). This
place is an open environment, and passersby are familiar
with it. Since people are usually not used to seeing a robot,
we expected them to come closer to it to examine the new
technology. The participants were unaware of the experiment.
The experiment’s environment is shown in Fig. 7. The
experimenter controlled the robot from behind a guidance
board so that the participants could not see him. Two cameras
were used for recording the results. One camera was placed
on UNI to record the movements and reactions of the
participants. The other camera was appropriately placed to
record the entire scene.
C. Outline of Experiment
The experiment was conducted using the “Wizard-of-Oz”
method, and the experimenter always watched the scene
through the installed cameras. When a participant passed by,
Fig. 6. UNI: Robot used in experiment
Fig. 7. Experimental environment (showing the library, elevators, bench, robot, experimenter, guide board, cameras, and entrances)
the robot turned its head toward the subject, and the flow of the guide content began with the utterance "Hello," as described in Fig. 4. A picture of the experiment scene is shown in Fig.
8.
After UNI said the last utterance “See you later” or
participants left UNI, an experimenter appeared from behind
a guide board and asked them to fill out a questionnaire.
Through the questionnaire we collected the participants' age, gender, permission to use the recorded video, and impressions; they could also decline to answer it.
To prove or disprove our hypothesis, the participants showing Waving Behavior (WB) or No-Interaction Behavior (NIB) were divided into two groups. For the experimental group (EG) we used the patterns in both Fig. 4 and Fig. 5, while for the control group (CG) we used only the pattern in Fig. 4. Both EG and CG included only groups that showed WB or NIB.
D. Prediction
Based on the above plan, we made the following predictions: EG participants...
1) ...listen to utterances more than CG, because they pay
attention to contents of UNI due to SA.
2) ...stop checking functionality or observing the assem-
bly of UNI due to SA utterance (a).
Fig. 8. Scene from the experiment
3) ...start answering UNI’s question because of SA utter-
ance (b).
V. RESULTS
The experiment was held over two days, and 44 groups (74 people; 21 females and 53 males) participated. 23 groups consisted of only one participant, while the other 21 groups consisted of two or more. People who merely passed by while looking at the robot were not counted. 30 participants (15 groups) comprised the EG and 44 (29 groups) comprised the CG. The average age of the participants who filled out the questionnaire was 26.9 years (males, ages 22-61, averaged 28.8 years; females, ages 19-36, averaged 24.4 years).
A. Prediction 1
Fourteen utterances (Fig. 4) were prepared as guide contents. To test Prediction 1, we counted the number of utterances played before each group left UNI. The average number of utterances was 9.0 in the EG and 9.8 in the CG; no significant difference between EG and CG was found.
B. Prediction 2
The Target Group consisted of the EG participants whose behavior was WB. This included waving a hand in front of the camera (Fig. 9) or walking from side to side (Fig. 10) so that UNI's head had to follow them. We counted the number of people in the Target Group who stopped checking as a result of SA utterance (a). Three of the nine participants in the Target Group started interacting after the SA. Thus, our strategy was able to stop 33% of them from checking UNI's functions.
Moreover, all participants in the Target Group were sur-
prised by UNI’s SA utterances.
C. Prediction 3
We observed the groups in the EG whose behavior was NIB and investigated whether they would eventually interact after SA. Participants who did not answer any of UNI's questions were classified as NIB participants. During observation of the videos, groups and individuals were treated the same. For the evaluation of Prediction 3, six groups out of the experiment group were examined. They heard UNI's question "May I explain more?" or "Should I explain it again from the beginning?" but did not answer at first. Among these groups, three responded affirmatively after hearing SA utterance (b), "Do you hear me?", from UNI. Those replies were also regarded as interaction with the robot. Altogether, three out of six groups started interacting with UNI because of SA; our strategy thus succeeded in involving 50% of the EG groups that showed NIB in the interaction.
Also in this case, participants were surprised by UNI’s SA
utterances.
VI. DISCUSSION
A. Prediction 1
We expected that, because of SA, participants in the EG would continue to listen to UNI's contents, so that more utterances would be played in the EG. However, as described in Sec. V-A, there was no significant difference between EG and CG; the number of played utterances in the CG was even higher. We believe that the participants' previous knowledge of the building made them less interested in hearing UNI's explanation.
B. Prediction 2
As mentioned above, 33% of the participants in the Target Group stopped playing with UNI's camera. We believe this is due to their understanding of UNI's communicative and informative intentions; because of this, UNI could change those participants' context. However, 67% of the participants did not stop their examinations and remained amused by UNI's face tracking.
There are two possibilities.
• Step 2 failure. Because they could not infer UNI’s
informative intention, they did not change their context.
• Step 3 failure. They wanted to stay in their context (i.e.,
they did not intend to hear UNI’s content) though they
could understand UNI’s informative intention.
Considering the design of the SA utterances and the fact that all participants in the Target Group were surprised by them, a Step 3 failure seems to be the reason why they did not stop WB. Surprise in this case indicates that people did not expect UNI to have an intention at all.
Moreover, since they might already have known the building, UNI did not offer them any new information. This would also explain why they were not interested in UNI's guidance content.
Fig. 9. An example of a participant who waved her/his hands (five video frames)

Fig. 10. An example of a participant who walked from side to side (four video frames)
C. Prediction 3
50% of the participants whose behavior was NIB came to share context with UNI and began to interact with it. An SA utterance could change the participants' behavior and draw their attention to UNI's context. Given their reaction to the SA utterances described in Sec. V-C, we suggest that the informative and communicative intentions could be used to convey a message to them. In contrast, although the other 50% understood UNI's intention when it said the SA utterance, they remained in their own context and did not start to interact with it. Considering the design of the SA utterances and their surprise, a Step 3 failure was the reason why they did not stop NIB.
D. Future Study
We aim to design the next experiment to verify the effectiveness of SA with different utterances: for example, saying an utterance with SA ("It's OK. I can see you. Please listen to me.") in the EG, while saying an utterance without SA ("Please listen to me.") in the CG. When an SA utterance after a question leads to a response, we will have to repeat the question.
For the following experiment, it will be important to get
a higher number of participants into the target group.
As mentioned, the content of the robot’s guidance was not
suitable because all participants already knew it. We need
new content that participants do not know.
The effect of surprise caused by SA must be investigated.
We do not yet know if surprise is the main reason for a
sudden response or not.
VII. CONCLUSION
In this paper, we proposed a communication strategy
called SA that focuses on helping guide robots in public
places to draw attention to their spoken content. A robot can
show its awareness of people’s context using SA so that they
then change their context.
We conducted an experiment in the lobby of a building at our university and analyzed the recorded scenes from the experiment. SA stopped 33% of participants from playing with the robot and succeeded in involving 50% in the interaction. This suggests that people noticed the robot's intention because of the SA utterances. We believe that SA will become an effective method after a few adjustments.
REFERENCES
[1] J. Schulte, C. Rosenberg, and S. Thrun, "Spontaneous, Short-term Interaction with Mobile Robots," Proceedings of the 1999 IEEE International Conference on Robotics & Automation, pp. 658-663, 1999.
[2] Y. Koide, T. Kanda, Y. Sumi, K. Kogure, and H. Ishiguro, "An Approach to Integrating an Interactive Guide Robot with Ubiquitous Sensors," Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 3, pp. 2500-2505, 2004.
[3] K. Nohara, T. Tajika, M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, "Integrating Passive RFID Tag and Person Tracking for Social Interaction in Daily Life," Proceedings of the 17th IEEE International Symposium on Robot and Human Interactive Communication, pp. 545-552, 2008.
[4] M. Shiomi, T. Kanda, H. Ishiguro, and N. Hagita, "Interactive Humanoid Robots for a Science Museum," Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction (HRI '06), pp. 305-312, 2006.
[5] M. Okada, Y. Hoshi, K. Yamazaki, A. Yamazaki, Y. Kuno, and Y. Kobayashi, "Museum Guide Robot Attracting Visitors in a Talk: Synchronously Coordination in Verbal and Bodily Behaviors" (in Japanese), Human-Agent Interaction Symposium 2008, pp. 2B-1, 2008.
[6] K. Hayashi, D. Sakamoto, T. Kanda, and M. Shiomi, "Humanoid Robots as a Passive-Social Medium - A Field Experiment at a Train Station," Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, pp. 137-144, 2007.
[7] T. Ono, M. Imai, and H. Ishiguro, "Anthropomorphic Communications in the Emerging Relationship between Humans and Robots," Proceedings of the 2000 IEEE International Workshop on Robot and Human Interactive Communication, pp. 334-339, 2000.
[8] A. Bruce, I. Nourbakhsh, and R. Simmons, "The Role of Expressiveness and Attention in Human-Robot Interaction," AAAI Fall Symposium, 2001.
[9] D. Sperber and D. Wilson, Relevance: Communication and Cognition, Oxford: Basil Blackwell, 1986.