

Jun’ichiro Seyama*
Department of Psychology
Faculty of Letters
University of Tokyo
7-3-1 Hongo, Bunkyo-ku
Tokyo 113-0033, Japan

Ruth S. Nagayama
Department of Humanities and Social Sciences
Shizuoka Eiwa Gakuin University

Presence, Vol. 16, No. 4, August 2007, 337–351

© 2007 by the Massachusetts Institute of Technology

The Uncanny Valley: Effect of Realism on the Impression of Artificial Human Faces

Abstract

Roboticists believe that people will have an unpleasant impression of a humanoid robot that has an almost, but not perfectly, realistic human appearance. This is called the uncanny valley, and is not limited to robots, but is also applicable to any type of human-like object, such as dolls, masks, facial caricatures, avatars in virtual reality, and characters in computer graphics movies. The present study investigated the uncanny valley by measuring observers’ impressions of facial images whose degree of realism was manipulated by morphing between artificial and real human faces. Facial images yielded the most unpleasant impressions when they were highly realistic, supporting the hypothesis of the uncanny valley. However, the uncanny valley was confirmed only when morphed faces had abnormal features such as bizarre eyes. These results suggest that to have an almost perfectly realistic human appearance is a necessary but not a sufficient condition for the uncanny valley. The uncanny valley emerges only when there is also an abnormal feature.

1 Introduction

Roboticists have attempted to construct humanoid robots whose physical appearance is indistinguishable from real humans (e.g., Kobayashi, Ichikawa, Senda, & Shiba, 2003; Minato, MacDorman, et al., 2004; Minato, Shimada, Ishiguro, & Itakura, 2004). However, Mori (1970) warned that robots should not be made too similar to real humans because such robots can fall into the “uncanny valley,” where too high a degree of human realism evokes an unpleasant impression in the viewer (see also Norman, 2004; Reichardt, 1978).

To summarize his informal observations and predictions of how a robot’s degree of realism in physical appearance (or humanlikeness) can affect a human observer’s impression of the robot, Mori introduced a hypothetical graph of the impression of pleasantness as a function of the degree of realism (Figure 1). Although the degree of realism was defined as the robot’s physical similarity to real humans, the impression of pleasantness for the ordinate of the graph was not clearly defined. One definition consistent with Mori’s conjecture is that the ordinate numerically represents degrees of any pleasant impressions (e.g., attractive, pretty, and fascinating) in the positive range and any unpleasant impressions (e.g., unattractive, ugly, and uncanny) in the negative range.

*Correspondence to [email protected].


In Mori’s graph, a positive impression of pleasantness of a robot increases with an increasing degree of realism. For example, Honda ASIMO (Sakagami et al., 2002) may be more attractive than industrial robots. Mori claimed, however, that human observers have exceptionally unpleasant impressions of robots that have an almost, but not perfectly, realistic human appearance. This effect is shown in Mori’s graph as a negative peak at a relatively high level of realism (Figure 1). Mori called this negative peak the uncanny valley, by analogizing the shape of his graph to a mountain. Along the abscissa of Mori’s graph, not only robots but also various kinds of humanlike artificial objects (e.g., dolls and prosthetic hands) were sorted in subjective order of degree of realism. Thus, Mori’s hypothesis is not limited to robots but is also applicable to any type of artificial humanlike object.

The physical appearance of robots that are supposed to communicate, cooperate, and coexist with humans should be designed with due consideration of the emotional and psychological impact on human observers. The same holds true for the physical appearances of agents in virtual reality and characters in computer graphics movies. Mori’s hypothesis has been adopted as a guideline for designing the physical appearance of robots (Canamero & Fredslund, 2001; DiSalvo, Gemperle, Forlizzi, & Kiesler, 2002; Fong, Nourbakhsh, & Dautenhahn, 2003; Hara, 2004; Hinds, Roberts, & Jones, 2004; Minato, MacDorman, et al., 2004; Minato, Shimada, et al., 2004; Woods, Dautenhahn, & Schulz, 2004) and agents in virtual reality (Aylett, 2004; Fabri, Moor, & Hobbs, 2004; Wages, Grunvogel, & Grutzmacher, 2004). According to Mori’s hypothesis, designers should seek a moderate level of realism (e.g., within the range B in Figure 1) for the physical appearance of robots and virtual reality agents in order to avoid falling into the uncanny valley.

However, the validity of the uncanny valley has not been confirmed with psychological evidence. Thus it is uncertain whether the uncanny valley actually emerges at certain realism levels. In the present study we measured observers’ ratings of pleasantness of facial images with varying degrees of realism. The degree of realism was manipulated by morphing between images of artificial and real human faces: the degree of realism was represented as a morphing percentage (% of real human). The artificial face images used in this study were photographs of dolls and computer graphics images of human models. Participants in each experiment rated their impressions of the pleasantness of the morphed images on a five-point scale. The rated scores were plotted against the degree of realism in order to empirically validate Mori’s hypothetical graph.

In hypothesizing about the emergence of the uncanny valley, Mori assumed that the impression of pleasantness is zero (i.e., neutral impression) when the realism of robots is extremely low (e.g., industrial robots) and highest for a perfectly realistic human appearance (Figure 1). In other words, he assumed that impressions of pleasantness increase monotonically with an increasing degree of realism, except for the uncanny valley. However, this assumption may not be valid. In fact, some robots, dolls, and human characters in computer graphics films seem highly pleasant although very unrealistic, while others are unpleasant. Similarly, humans vary in impressions of pleasantness although all are realistic. Thus, the realism-pleasantness graph might not necessarily show a monotonic increasing trend. However, in the present study, we focus only on whether the negative peak, that is, the uncanny valley, actually emerges in an empirically obtained realism-pleasantness graph.

Figure 1. Mori’s (1970) uncanny valley. Human observers’ impression of artificial objects is plotted hypothetically as a function of the degree of realism (the artificial objects’ similarity to real humans). Mori assumed that the level of impression would drop suddenly at a relatively higher degree of realism, and called this dip the uncanny valley. Regions A and B are realism ranges related to discussion in Experiment 2. Modified after Mori (1970).

2 Method

2.1 Participants

The experiments were web-based. Each participant accessed a web page using a web browser. They were first guided to pages where they read instructions written in Japanese, and then they decided whether or not to participate in this study. Although most of the participants were students at the University of Tokyo, Shizuoka Eiwa Gakuin University, or Tokyo National University of Fine Arts and Music, some of them were identified only by their self-reported age and gender.

2.2 Stimuli

The stimuli were frames of image sequences in which an artificial face was gradually morphed into a real face. Examples of the stimuli are shown in Figures 2, 3, 5, 7, and 9. To define the correspondence between the two source images (i.e., artificial and real facial images), landmarks on each face were manually chosen. For the stimuli used in Experiments 1–3, the number of the landmarks was 9 for each eye, 8 for each eyebrow, 9 for the nose, 14 for the mouth, and 26 for facial contour. For the stimuli used in Experiment 4, the number of the landmarks varied from 222 to 325 depending on the details of the image content. Although the internal facial features and the face lines in the lower halves of the faces corresponded as precisely as possible, the outer contours of the upper halves of the faces (i.e., hair shapes) and the hairlines corresponded only roughly, because they were drastically different. Morphing software transformed positions and/or pixel values between corresponding points of the artificial and real faces. Morphing ratios, which controlled the magnitude of the morphing, were defined as the percentages of real human. Thus, an image with a morphing ratio of 0% would correspond to an unmorphed image of an artificial face, and an image with a morphing ratio of 100% would be a perfectly realistic human face image. Details of the morphing procedure differed among the experiments, which will be described later. We obtained written consent to use the photographs of the human faces for academic purposes.
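The role of the morphing ratio can be illustrated with a minimal sketch. It assumes that landmark positions and pixel values are interpolated linearly between the two source faces; the text does not state which interpolation the morphing software used, and the function names `morph_landmarks` and `blend_pixel` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def morph_landmarks(artificial: List[Point], real: List[Point],
                    percent_real: float) -> List[Point]:
    """Interpolate corresponding landmarks (assumed linear interpolation).

    percent_real = 0 returns the artificial face landmarks,
    percent_real = 100 returns the real face landmarks.
    """
    if len(artificial) != len(real):
        raise ValueError("landmark lists must correspond one to one")
    if not 0.0 <= percent_real <= 100.0:
        raise ValueError("percent_real must lie between 0 and 100")
    t = percent_real / 100.0
    return [((1 - t) * ax + t * rx, (1 - t) * ay + t * ry)
            for (ax, ay), (rx, ry) in zip(artificial, real)]


def blend_pixel(artificial_value: float, real_value: float,
                percent_real: float) -> float:
    """Cross-dissolve one pixel value with the same (assumed linear) weight."""
    t = percent_real / 100.0
    return (1 - t) * artificial_value + t * real_value
```

Under these assumptions a 50% morph places every landmark halfway between its position in the artificial face and its position in the real face.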

Each face was presented on a uniform square background with a height and width of 256 pixels. The size and orientation of each face were normalized so that the eyes were aligned along a horizontal line, and the distance between the left and right pupils was 60 pixels. The actual size of the stimuli measured in visual angles is not known, because it depended on the viewing condition of each participant when he/she accessed the web page for the experiments.
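The normalization of size and orientation can be sketched as a rotation and scaling derived from the two pupil positions. This is only an illustration: the 60-pixel pupil distance comes from the text, whereas the function names and the choice of rotating about an arbitrary origin are assumptions, and the position of the eyes on the 256-pixel background is not specified in the text.

```python
import math

PUPIL_DISTANCE = 60.0  # target distance between pupils in pixels (from the text)


def eye_alignment(left_pupil, right_pupil):
    """Return (scale, angle) that levels the eye line and sets its length to 60 px.

    angle is in radians; rotating the image by it makes the eye line horizontal.
    """
    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        raise ValueError("pupil positions must differ")
    scale = PUPIL_DISTANCE / distance
    angle = -math.atan2(dy, dx)
    return scale, angle


def apply_alignment(point, origin, scale, angle):
    """Rotate and scale one point about origin (hypothetical helper)."""
    x = point[0] - origin[0]
    y = point[1] - origin[1]
    c, s = math.cos(angle), math.sin(angle)
    return (origin[0] + scale * (c * x - s * y),
            origin[1] + scale * (s * x + c * y))
```

Applying the transform about the left pupil leaves that pupil fixed and moves the right pupil to a point 60 pixels away on the same horizontal line.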

2.3 Procedure

Each participant executed JavaScript programs in a web browser. In each trial a stimulus image and five buttons were presented in the web browser window (Figure 2). Each of the five buttons showed a Japanese phrase corresponding to “extremely unpleasant,” “unpleasant,” “difficult to decide (uncertain),” “pleasant,” or “extremely pleasant.” These buttons represented a five-point scale ranging from –2 (extremely unpleasant) to +2 (extremely pleasant). Participants were instructed to interpret the word pleasant as representing any emotionally positive adjective such as attractive, pretty, natural, healthy, intimate, or elegant, and the word unpleasant as representing any emotionally negative adjective such as unattractive, fearful, ugly, abnormal, sick, or inelegant. Each participant rated the pleasantness of the presented image by clicking the corresponding button. After the button was clicked, the stimulus image was replaced with a uniform black field for 1 s, and then the next trial started. The images for each experiment were presented in random order. Participants performed two practice trials at the beginning of each experiment. At the end of each experiment, participants sent the data, as well as their age and gender, to the authors. At this stage, participants were able to decide whether or not to send the data. The number of participants who did not send the data is not known.

Figure 2. A screenshot of a browser window during Experiment 4.
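The rating scale and the randomized presentation described above can be summarized in a short sketch. The experiments themselves ran as JavaScript in the browser; the Python below only illustrates the logic, with English labels standing in for the Japanese phrases and with hypothetical function names.

```python
import random

# Button labels (English equivalents) mapped to the five-point scale in the text.
SCALE = {
    "extremely unpleasant": -2,
    "unpleasant": -1,
    "difficult to decide (uncertain)": 0,
    "pleasant": 1,
    "extremely pleasant": 2,
}

BLANK_INTERVAL_S = 1.0  # uniform black field shown between trials (from the text)


def trial_order(image_ids, seed=None):
    """Return the images in random order; seed is only for reproducible demos."""
    rng = random.Random(seed)
    order = list(image_ids)
    rng.shuffle(order)
    return order


def score_response(label):
    """Convert a clicked button label into its numeric score."""
    return SCALE[label]
```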

2.4 Data analysis

The pleasantness scores for each experiment were averaged across participants and submitted to repeated-measures ANOVAs. When necessary, Bonferroni’s multiple comparisons were performed. Statistical tests for the other analyses will be specified as necessary.
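For readers who want to see how the degrees of freedom reported in the following sections arise, here is a minimal sketch of a one-way repeated-measures ANOVA using the standard sums-of-squares decomposition. It is not the authors' analysis code, the function names are hypothetical, and it omits sphericity corrections and the Bonferroni comparisons.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores[p][c] is the rating given by participant p to condition c.
    Returns (F, df_condition, df_error). Assumes a complete design and
    non-zero error variance.
    """
    n = len(scores)       # participants
    k = len(scores[0])    # conditions (e.g., morphing percentages)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[c] for row in scores) / n for c in range(k)]
    subj_means = [sum(row) / k for row in scores]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = rm_anova_df(n, k)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


def rm_anova_df(n_participants, n_conditions):
    """Degrees of freedom: (k - 1, (n - 1)(k - 1))."""
    return n_conditions - 1, (n_participants - 1) * (n_conditions - 1)
```

With 49 participants and 11 morphing percentages, `rm_anova_df(49, 11)` gives (10, 480), which matches the degrees of freedom reported for Experiment 1.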

3 Results and Discussion

3.1 Experiment 1

Let us hypothesize that an almost, but not perfectly, realistic human appearance possesses an exceptional perceptual significance and that for some reason the human visual system generates an unpleasant impression for such a “special” degree of realism. This hypothesis predicts that the uncanny valley would always emerge at some point when the degree of realism is increased approaching the most realistic level.

To test this, we showed the participants images from three types of morphing sequences, in each of which an artificial face was gradually morphed into a real human face (Figure 3). The morphing ratio varied from 0% to 100% in increments of 10%. Forty-nine participants (mean age 26.5 years, 20 female) rated the pleasantness of the face images. The percentage of real human (i.e., the degree of realism) was significantly related to the pleasantness score for all three morphing sequences (Fs(10, 480) > 5.96, ps < .001). However, the empirically obtained realism-pleasantness graphs did not show negative peaks (Figure 4). In one morphing sequence (Figure 3a), the face of a doll (Doll A, Pongratz Puppen) was morphed into a 19-year-old Japanese female’s face (Human A). This morphing sequence produced a positive rather than a negative peak at 80% real human (Figure 4, open circles). The second morphing sequence, in which the face of another doll (Doll B, BP053-1, CITITOY) was morphed to a one-year-old Japanese female’s face (Human G), yielded the highest score at 60% real human without producing a negative peak (Figure 4, filled squares). The images based on Doll B are not shown in Figure 3 due to copyright considerations. For the third morphing sequence (Figure 3b), in which a computer graphics (CG) image of an adult female face (CG A, Poser2 model bundled in Poser4, Curious Labs Inc.) was morphed into a 21-year-old Japanese female’s face (Human B), the pleasantness score increased monotonically with an increasing degree of realism (Figure 4, open triangles).

Figure 3. Examples of stimuli used in Experiment 1. (a) Morphing sequence from Doll A to Human A, (b) Morphing sequence from CG A to Human B, and (c) Morphing sequence from CG B to Human A, with morphing percentages of 0, 30, 50, 70, and 100.

Thirty-seven participants (mean age 24.9 years, 15 female) rated the pleasantness scores for another morphing sequence in which CG B (Aiko 3.0, DAZ Productions, Inc., Figure 3c) was morphed into Human A. To improve the precision of the data analysis, the step size of the morphing ratio was halved (i.e., 5%), and sixty-eight landmarks were used for the eyes to achieve smoother morphing. In spite of these improvements, however, there was no clear indication of a negative peak corresponding to the uncanny valley (Figure 4, open diamonds). Although the pleasantness scores were significantly influenced by the percentage of real human (F(20, 720) = 2.48, p < .001), a multiple comparison revealed that a significant difference was obtained only between the scores for 0% real human and 70% real human (p < .05).

Although the morphing sequences produced different tendencies, none of the four types of morphing sequences showed that an almost perfectly realistic human appearance was a sufficient condition for the uncanny valley to emerge.

3.2 Experiment 2

In Mori’s (1970) hypothetical graph (Figure 1), robots were supposed to have no resemblance to real humans at the lowest level of realism (e.g., industrial robots equipped with only manipulators). On the other hand, the stimuli used in Experiment 1 had reasonably realistic human appearances even at 0% real human. The morphing ratio of 0% did not imply that the face had no resemblance to real humans; it only indicated that the image was the same as the original image of an artificial face. Thus, one may argue that Experiment 1 failed to detect the uncanny valley because we tested only a limited range of realism, denoted as A in Figure 1. Furthermore, the images were somewhat unrealistic even at the morphing ratio of 100%, because they were low-resolution images presented on computer displays. So it is possible that Experiment 1 instead may only have tested the range of realism denoted as B in Figure 1, where the uncanny valley was not involved. In Experiment 2, we tested whether the failure to detect the uncanny valley in Experiment 1 was due to an inappropriate range of realism in the stimuli.

Figure 4. Pleasantness scores averaged across participants (Experiment 1) for the Doll A–Human A sequence (circles), for the Doll B–Human G sequence (squares), for the CG A–Human B sequence (triangles), and for CG B–Human A sequence (diamonds). Error bars are 95% confidence intervals.

Forty-five participants (mean age 23.6 years, 22 female) observed images from morphing sequences where the eyes and head (i.e., facial regions other than the eyes) were asynchronously morphed. In the eyes first sequence (Figure 5a), only the eyes of Doll A were morphed into those of Human A while the head was unchanged, resulting in realistic human eyes in an artificial head. Next, the artificial doll head was morphed into that of Human A, resulting in a wholly realistic human face. Participants gave the lowest pleasantness score when the eyes were 100% real human and the head was 0% real human. This morphed image had higher realism than the unmorphed image of Doll A because of its real human eyes, and lower realism than the unmorphed image of Human A because of its artificial head. In the head first sequence (Figure 5b), the head was morphed first and then the eyes. Participants gave the lowest pleasantness score when the head was 100% real human and the eyes were 0% real human. This face also had higher realism than the unmorphed image of Doll A because of its real human head, and lower realism than the unmorphed image of Human A because of its artificial eyes. Thus, in these two types of morphing sequences, the negative peaks were found where the degrees of realism were between those of Doll A and Human A. As shown in Figure 6, the percentage of real human significantly influenced the pleasantness score (Fs(10, 440) > 28.0, ps < .001), and the lowest scores were significantly lower than those for the unmorphed images of Doll A and Human A (ps < .001).
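The asynchronous morphing schedule can be written out as a list of (eyes %, head %) pairs. The sketch below assumes a step of 20%, which is inferred rather than stated: it yields the 11 levels implied by the 10 numerator degrees of freedom and includes the 60% frames shown in the Figure 5 caption. The function name is hypothetical.

```python
def asynchronous_schedule(first="eyes", step=20):
    """Return (eyes_percent, head_percent) pairs for one morphing sequence.

    The region named by `first` is morphed from 0% to 100% real human
    while the other stays at 0%; then the other region is morphed.
    The 20% step is an assumption, not a value given in the text.
    """
    if first not in ("eyes", "head"):
        raise ValueError("first must be 'eyes' or 'head'")
    if step <= 0 or 100 % step != 0:
        raise ValueError("step must divide 100")
    levels = list(range(0, 101, step))
    schedule = []
    for p in levels:            # phase 1: morph the first region only
        schedule.append((p, 0) if first == "eyes" else (0, p))
    for p in levels[1:]:        # phase 2: morph the remaining region
        schedule.append((100, p) if first == "eyes" else (p, 100))
    return schedule
```

In this sketch the frame with the largest mismatch sits in the middle of each sequence: (100, 0) for the eyes first sequence and (0, 100) for the head first sequence, which are the frames that received the lowest scores.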

The negative peaks found in Experiment 2 give empirical evidence for the emergence of the uncanny valley within a range comparable to the range tested in Experiment 1. Therefore, the failure to detect the uncanny valley in Experiment 1 is not attributable to the limited range of realism.

Figure 5. Examples of stimuli used in Experiment 2. (a) Morphing sequence from Doll A to Human A, where the eyes were morphed first and then the head. Morphing ratios were 0–0, 60–0, 100–0, 100–60, and 100–100 (eyes %–head %). (b) Sequence where the head was morphed first.

Figure 6. Pleasantness scores averaged across participants (Experiment 2) for the eyes first sequence (squares), and for the head first sequence (triangles). Error bars are 95% confidence intervals.

The uncanny valley emerged when the eyes and the head showed the largest mismatch in the degree of realism. Such mismatched realism was not presented in Experiment 1, where the facial features were morphed synchronously and the uncanny valley did not emerge. This suggests that mismatched realism may be a necessary condition for the uncanny valley’s emergence. However, we suspect that abnormalities in the stimuli, rather than the mismatched realism per se, were the direct cause of the emergence of the uncanny valley. Doll A’s face is abnormal if it is viewed as a real human face, in the sense that its features remarkably deviate from a real human. Nevertheless, the pleasantness scores for the unmorphed image of Doll A were not significantly lower than zero (one-tailed t tests, Experiment 1, t(48) = .60, p > .05; Experiment 2, t(44) = .17, p > .05). This suggests that the deviation may have been viewed as an artistic representation rather than abnormality, although Doll A’s potential attractiveness for children (Zeit, 1992) may have been underestimated by adult participants. However, the faces at the bottom of the uncanny valley may have been judged abnormal due to the mismatched realism between the eyes and the head. If the eyes were judged based on the head’s realism, Human A’s eyes in Doll A’s head may have been judged abnormal as a doll’s eyes (Figure 5a) and Doll A’s eyes in Human A’s head may have been judged abnormal as real human eyes (Figure 5b). Such judgments about the eyes, with reference to the head, are consistent with past findings that visual features of the head can influence the perceptual processing of the eyes (Hietanen, 1999; Kontsevich & Tyler, 2004; Langton, 2000; Seyama & Nagayama, 2002, 2005). The abnormalities induced by the mismatched realism may have produced unpleasant impressions, which in turn produced the uncanny valley.

The same reasoning holds true for the judgments of the abnormality of the head based on the realism of the eyes. However, it is still uncertain whether the perceptual processing of the head is influenced by the appearance of the eyes.

3.3 Experiment 3

If the uncanny valley reflects unpleasant impressions of abnormalities, then any type of abnormality besides mismatched realism should also produce the uncanny valley. Among various factors that can make faces bizarre (see, e.g., Murray, Rhodes, & Schuchinsky, 2003), we tested the effect of abnormal eye size using two morphing sequences in Experiment 3 (Figure 7).

Figure 7. Examples of stimuli used in Experiment 3. (a) Morphing sequence from Doll A to Human A with a manipulation of eye size (Doll A, Doll A with eyes scaled to 150%, 50% morph between Doll A and Human A both with 150% eyes, Human A with 150% eyes, and Human A). (b) Morphing sequence from CG B to Human B.


Forty participants (mean age 22.5 years, 33 female) rated images from the Doll A–Human A sequence (Figure 7a), and thirty-nine participants (mean age 24.2 years, 19 female) rated images from the CG B–Human B sequence (Figure 7b).

In each sequence, the eye size of an artificial face first increased from the original size (100%) up to 150%; however, the artificial faces did not produce unpleasant impressions (i.e., negative scores) even when their eyes were scaled to 150% (Figure 8, left plot area). The eye size did not significantly influence the pleasantness score for Doll A (F(5, 195) = 1.49, p > .05). Although the eye size significantly influenced the pleasantness score for CG B (F(5, 190) = 7.91, p < .001), the score for 150% eye size was not significantly different from that for 100% eye size (p > .05).

After the eye size was scaled to 150%, each artificial face was morphed into a real human face with 150% eyes. Pleasantness scores for the faces with 150% eyes decreased with increasing degree of realism (Figure 8, middle plot area). The Doll A–Human A sequence produced the lowest score for 100% real human face with enlarged eyes. Although the CG B–Human B sequence produced the lowest score for 80% real human, the scores for 60–100% were not significantly different from one another (ps > .05).

Finally, the enlarged eyes of the real human face were reduced to their original size. The pleasantness scores increased with decreasing eye size (Figure 8, right plot area). For each image sequence, the eye size significantly influenced the pleasantness score (F(5, 195) = 49.73 for Human A; F(5, 190) = 51.33 for Human B; ps < .001), and the score for 150% eye size was significantly different from 100% eye size (ps < .001). As a result, the uncanny valley emerged around the real human faces with 150% eyes. For the faces with varying degree of realism, the eyes scaled to 150% and the head were morphed synchronously in each morphing sequence. Thus, the abnormalities induced by the mismatched realism do not account for the results.

The results suggest that an interaction between the abnormal eye size and realism produced negative peaks that are comparable to Mori’s (1970) uncanny valley only for the faces with higher degrees of realism. It should be pointed out, however, that in each morphing sequence the degree of realism covaried with other facial characteristics such as gender, age, and expression. Thus, it is possible that the factor that interacted with the abnormality in Experiment 3 was a covarying factor rather than the realism.

To test this possibility, we presented the face stimuli with 100% eye size to 25 naive raters (undergraduates at Tokyo National University of Fine Arts and Music and Rikkyo University), and asked them to judge the realism, gender, facial expression, and age of each face. For the realism judgment, raters compared two faces presented side by side, and all raters consistently judged that the human faces were more realistic than the artificial faces (ps < .0001, binomial tests). This suggests that the realism in the two morphing sequences increased from left to right along the abscissa of Figure 8 (middle plot area).

The raters’ judgments about gender and facial expression were not consistent between the two morphing sequences. In the gender judgment, 18 of the 25 raters judged Doll A as male, and all raters judged Human A as female (ps < .05, binomial tests). For CG B and Human B, all raters judged both faces as female (p < .0001). For the facial expression judgment, the raters chose the most suitable expression for each face among happy, sad, surprised, angry, disgusted, fearful (six basic emotions; Ekman & Friesen, 1975) and neutral expressions. The overall patterns of the expression choice were significantly different between Doll A and Human A (p < .0001, Fisher’s exact test), but were not significantly different between CG B and Human B (p = .79). Therefore, neither gender nor facial expressions consistently explain the results for the two morphing sequences.

Figure 8. Pleasantness scores averaged across participants (Experiment 3). Upper abscissa: eye size scaling factor. Lower abscissa: degree of realism. Circles: Doll A to Human A. Triangles: CG B to Human B. Error bars are 95% confidence intervals.

For the judgments about age, the raters consistently judged that the human faces were older than the artificial faces. The raters observed two faces presented side by side, and chose the older one. All raters judged that Human A was older than Doll A, and 22 of the 25 raters judged that Human B was older than CG B (ps < .001, binomial tests). The raters also estimated the age in years for each face. The mean estimated ages for Human A (24.0 years, SD = 1.7) and Doll A (5.6 years, SD = 2.2) were significantly different (t(24) = 35.0, p < .001). Also the mean estimated age for Human B (24.5 years, SD = 4.1) and CG B (20.6 years, SD = 4.7) were significantly different (t(24) = 4.15, p < .001).

Thus, one may argue that what produced the uncanny valley in Figure 8 was the age rather than the realism interacting with abnormal eye size. It should also be pointed out that, even if the realism played a major role in Experiment 3, there still remains a possibility that the realism interacted with confounding factors other than abnormal eye size. The possible influences of confounding factors were investigated further in Experiment 4.
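The binomial tests on the rater judgments can be reproduced approximately with an exact test against chance. The sketch below assumes a two-sided test with a null probability of 0.5; the text does not say whether the reported tests were one- or two-sided, and the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, n):
    """Exact two-sided p value for H0: probability = 0.5.

    Because the null distribution is symmetric, the two-sided value is
    twice the tail at least as extreme as the observed count, capped at 1.
    """
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)
```

Under these assumptions, 18 of 25 raters gives a p value of about .043, 22 of 25 gives about .00016, and a unanimous 25 of 25 gives about .00000006.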

3.4 Experiment 4

To minimize the influence of confounding factors, we used images of artificial faces that were as similar as possible to real human faces. The artificial face images were produced by mapping the textures of artificial faces onto real human faces (Figure 9). In other words, shapes of artificial face images were warped into those of real human faces without changing their textures. As is shown later, most raters judged these stimuli to be artificial faces in spite of the realistic humanlike shapes. In producing the stimuli, four human face images (2 male and 2 female Japanese, age 21 to 22 years; Humans C, D, E, and F) were paired with four doll face images (Dolls C, D, E, and F). Doll C was a classic Japanese doll photographed by the authors, and Dolls D–F were images of masks obtained from a commercially available photo collection (Mask by Corel, SKU:CPH480). Based on these four Doll–Human pairs, stimuli were produced in a manner similar to Experiment 3. For simplicity, the eye size was scaled only to 100%, 125%, and 150%, and the doll faces with 150% eye size were morphed into the human faces with 150% eye size in only five steps (0, 25, 50, 75, and 100% real human). In addition, morph images between the artificial and real human faces with 100% eye size were also produced to measure the pleasantness scores for normal eye size. Forty-seven participants (mean age 20.6 years, 26 female, 20 male, one did not report his/her gender) rated their impressions of the pleasantness of these stimuli.

Figure 9. Examples of stimuli used in Experiment 4. From left to right columns, dolls with 100% eye size, dolls with 150% eye size, 50% morphs between doll and human faces with 150% eye size, humans with 150% eye size, and humans with 100% eye size.

Figure 10 shows the mean pleasantness scores in the same manner as Figure 8. As shown in the left plot area, the effect of eye size was not consistent across the four dolls. The eye size significantly influenced the pleasantness score only for Dolls C and E (Fs(2, 92) > 5.88, ps < .01). In contrast, the eye size influenced the pleasantness score in a consistent manner across the four human faces (Figure 10, right plot area). For the human faces, the pleasantness score decreased with increasing eye size (Fs(2, 92) > 27.8, ps < .001), and the eyes scaled to 150% produced the most unpleasant impression.

Although the results of Experiment 4 did demonstrate how the impression of faces with a 150% eye size varied as a function of the morphing percentage (Figure 10, middle plot area), the influence of various confounding factors must be removed from these results to isolate the effects of abnormal eye size and realism. Figure 11 shows the pleasantness scores for faces with 100% eye size. These scores may also reflect the influence of confounding factors. Thus, by subtracting the scores in Figure 11 from those for 150% eye size presented in Figure 10 (middle plot area), the influence of confounding factors can be eliminated (calibrated scores). For all four morphing sequences (solid lines in Figure 12), the calibrated scores decreased with increasing morphing percentage (Fs(4, 184) > 20.71, ps < .001).
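The calibration step described above is a pointwise subtraction, which can be sketched as follows. This is only a minimal illustration: the function name `calibrated_scores` and all numeric values are hypothetical placeholders, not the measured data of Experiment 4.

```python
def calibrated_scores(scores_150, scores_100):
    """Subtract the 100%-eye-size score from the 150%-eye-size score
    at each morphing percentage (keys are % real human)."""
    if scores_150.keys() != scores_100.keys():
        raise ValueError("both conditions must cover the same morphing steps")
    return {step: scores_150[step] - scores_100[step]
            for step in sorted(scores_150)}


# Hypothetical mean pleasantness scores for one morphing sequence
# (illustrative values only, NOT taken from the paper).
hypothetical_150 = {0: -0.5, 25: -0.75, 50: -1.0, 75: -1.5, 100: -2.0}
hypothetical_100 = {0: -0.5, 25: -0.25, 50: 0.0, 75: 0.25, 100: 0.5}

calibrated = calibrated_scores(hypothetical_150, hypothetical_100)
for step, score in calibrated.items():
    print(f"{step:3d}% real human: calibrated score = {score:+.2f}")
```

In this made-up sequence the calibrated score is zero at 0% real human and most negative at 100% real human, which is the qualitative pattern the text reports for Figure 12.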

Figure 10. Pleasantness scores for faces with eyes scaled to 150% (Experiment 4). Upper abscissa: eye size scaling factor. Lower abscissa: degree of realism. Squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Circles: Doll F to Human F. Error bars are 95% confidence intervals.

Figure 11. Pleasantness scores for faces with 100% eye size (Experiment 4). Squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Circles: Doll F to Human F. Error bars are 95% confidence intervals.


Figure 12 also shows the calibrated scores for the Doll A–Human A and CG B–Human B sequences (broken lines). For the Doll A–Human A sequence, the calibrated scores were obtained by subtracting the results of Experiment 1, scores for 100% eye size (Figure 4), from those of Experiment 3, scores for 150% eye size (middle plot area in Figure 8). For the CG B–Human B sequence, the calibrated scores were obtained based only on the results of Experiment 3. In Experiment 3, the pleasantness scores for 100% eye size were not measured at intermediate morphing percentages. Thus, the calibrated scores for this morphing sequence were obtained only at 0 and 100% real human.

As can be seen in Figure 12, the calibrated scores for the artificial faces (0% real human) were close to zero, indicating that the 100% eye size and the 150% eye size yielded similar pleasantness scores. This suggests that the scaling of the eyes from 100% to 150% had only a weak influence on the impression created by artificial faces. On the other hand, for the human faces (100% real human), all six morphing sequences produced the lowest calibrated scores, which were lower than zero. This suggests that the abnormally scaled eyes decreased pleasantness scores the most when the faces were the most realistic.

Although the interpretation of the ordinate of Figure 12 (i.e., the calibrated score) is not confounded, there still remains the possibility noted earlier, that the abscissa may represent variation in a confounding factor other than realism. To test this possibility, 45 naive raters (undergraduates at Tokyo National University of Fine Arts and Music) judged the gender, expression, age, and realism of the stimuli in the same manner as Experiment 3.

All raters judged the gender correctly for Humans D (male), E (female), and F (male), and only three were incorrect for Human C (female) (ps < .0001, binomial tests). The gender judgment for Human D was not significantly different from that for Doll D, which 43 raters judged as male (p = .49, Fisher’s exact test). Thus, gender does not account for the results for the Doll D–Human D sequence. On the other hand, the genders of the other doll faces were judged less consistently, and the male/female judgment ratios were significantly different between the doll and human faces (ps < .01, Fisher’s exact tests). The Doll C–Human C sequence and the Doll E–Human E sequence showed increasing femininity, and the Doll F–Human F sequence showed increasing masculinity. In spite of these opposite variations in gender, all doll–human sequences yielded the same trends for impressions of pleasantness (Figure 12). Therefore, the factor of gender does not explain the results.
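The two kinds of test reported for the gender judgments can be reproduced from the counts given in the text. The sketch below assumes two-sided tests, a chance level of 0.5 for the binomial test, and that the two raters who did not judge Doll D as male judged it as female; none of these details is stated in the text, and the function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def binomial_two_sided(k, n):
    """Two-sided binomial test against chance (p = 0.5): the distribution
    is symmetric, so the more extreme tail is doubled."""
    tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Human D: 45 male / 0 female judgments; Doll D: 43 male / 2 (assumed) female.
p_fisher = fisher_exact_two_sided(45, 0, 43, 2)
# Human C: 42 of 45 raters judged the gender correctly.
p_binom = binomial_two_sided(42, 45)

print(f"Fisher's exact test, Human D vs. Doll D: p = {p_fisher:.2f}")
print(f"Binomial test, Human C: p = {p_binom:.1e}")
```

Under these assumptions the Fisher result rounds to .49 and the binomial result falls far below .0001, in line with the values reported above.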

For the facial expression judgment, most raters chose either a happy or neutral expression for each face, and the overall patterns of the expression choices were not significantly different between each of the paired doll and human faces (ps > .39, Fisher’s exact test) except between Doll C and Human C (p < .05). As noted earlier, the facial expression judgments in Experiment 3 were significantly different between Doll A and Human A, but not significantly different between CG B and Human B. Thus, among the six morphing sequences tested in Experiments 3 and 4, the factor of facial expression does not explain the results for four of the morphing sequences. For Doll C, 30 of the 45 raters chose the neutral expression, but this ratio was reduced to 16/45 for Human C, suggesting that the human face appeared to be more expressive than the artificial face. In contrast, the neutral expression was chosen by only 5 of the 25 raters for Doll A, but 21 of the 25 raters for Human A, suggesting that the artificial face was more expressive than the human face. Because of this inconsistency between the two morphing sequences, it is difficult to explain the results shown in Figure 12 based on the facial expression.

Figure 12. Calibrated scores (Experiment 4) obtained by subtracting the scores for 100% eye size from those for 150% eye size. Filled squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Open circles: Doll F to Human F. Error bars are 95% confidence intervals. Filled circles and open squares connected with broken lines represent the scores for Doll A–Human A sequence and CG B–Human B sequence, respectively.

For the age judgment, the raters judged that Human C was older than Doll C (p < .001, binomial test). For the other doll–human pairs, they judged that the dolls were older than the humans (ps < .001). Thus, among the six morphing sequences tested in Experiments 3 and 4, human faces were perceived to be older than artificial faces in three morphing sequences, but the opposite was true in the other three morphing sequences. Despite these inconsistent age judgments, the six morphing sequences yielded similar results for the calibrated scores as shown in Figure 12. Therefore, the factor of age does not solely explain the results.

In contrast to the inconsistent judgments of gender, age, and facial expression, the raters consistently judged that the human faces were more realistic than the artificial faces in all six morphing sequences tested in Experiments 3 and 4 (ps < .0001, binomial tests). Thus, among the four factors considered here, only realism can consistently explain the results as the factor that interacted with the abnormal eye size. Therefore, we may interpret the abscissa of Figure 12 as representing the degree of realism.

4 General Discussion

It is assumed that the human visual system involves sophisticated mechanisms for processing facial information (e.g., Haxby, Hoffman, & Gobbini, 2000). Such mechanisms seem to be broadly tuned to faces with various degrees of realism. Real human faces, artificial faces of dolls and robots, computer generated facial images, schematic line drawings of faces, and even simple face-like patterns consisting of simple geometric shapes (e.g., Robert & Robert, 2000; Turati, 2004) are all accepted as “faces.” Past studies showed that facial images with different degrees of realism often yielded comparable experimental results (e.g., Driver et al., 1999 and Friesen & Kingstone, 1998; Wilson, Loffler, & Wilkinson, 2002; Yin, 1969). Nevertheless, in daily life people rarely confuse artificial faces with real human faces; people do not ask a mannequin in a store window for directions to a train station. This suggests that the visual system has sensitivity to the degree of realism of faces.

The present study investigated an effect of the degree of realism on the impression of pleasantness of artificial human faces, and in particular investigated the uncanny valley hypothesis proposed by Mori (1970). The results of our experiments showed that the uncanny valley actually emerged as Mori (1970) had predicted. However, our results also showed that the uncanny valley emerged only when the face images involved abnormal features. Thus, to fully understand the nature of the uncanny valley, we need to consider the effects of both the realism and the abnormality of artificial human appearance. For example, if human observers have unpleasant impressions of the faces of avatars in virtual reality or robots, the unpleasantness should not be attributed solely to the degree of realism. It is likely that the physical appearance of such avatars and robots involves certain abnormal visual features. Thus, improving the degree of realism of such avatars and robots without removing the abnormal features may simply lead to an exaggeration of the human observers’ unpleasant impressions of the artificial faces.

Although the degree of abnormality investigated in Experiments 3 and 4 (i.e., scaling of the eyes to 150%) was identical for artificial and real faces, its impact was greater for faces with higher realism. Participants may have judged that the eyes scaled to 150% were too large for real human eyes, but such eyes were acceptable as artificial human eyes. This implies that the judgment criterion for real faces was different from that for artificial faces. The human visual system may have knowledge about how eye size varies among humans (i.e., data on the statistical distribution of the size of real human eyes) from past experience, and such knowledge can serve as a judgment criterion of abnormality. If the eye size on a face deviated from the center of the statistical distribution of normal eye sizes, such a face may be judged abnormal. On the other hand, knowledge about how eye size varies among artificial faces may constitute another judgment criterion of abnormality. In fact, artificial faces can have arbitrary eye sizes depending on the designers’ intentions, and past experience in observing such artificial eyes may constitute statistical knowledge that is different from that about real human faces. It should be pointed out that the participants’ tolerance of the huge eyes on the artificial faces might have reflected their cultural background. Since most of the participants were Japanese, their judgment criterion may have been formed through their experience in watching Japanese-style animation movies, computer games, and comics (manga), in which huge eye size designs have been frequently employed.
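The idea of two separate judgment criteria can be illustrated with a toy model in which the same eye size is compared against two different statistical distributions. Everything in this sketch is an assumption made for illustration: the means, standard deviations, threshold, and the use of a simple standardized deviation are hypothetical and are not estimated from the present data.

```python
def deviation(eye_scale, mean, sd):
    """Distance of an eye size from the center of a distribution,
    in standard-deviation units."""
    return abs(eye_scale - mean) / sd


# Hypothetical (mean, SD) of eye size in % for each face category.
# Artificial faces are given a broader distribution because designers
# can choose arbitrary eye sizes.
REAL_FACE_CRITERION = (100.0, 10.0)        # assumption, not measured
ARTIFICIAL_FACE_CRITERION = (125.0, 25.0)  # assumption, not measured
THRESHOLD = 2.0                            # hypothetical cutoff in SD units


def judged_abnormal(eye_scale, criterion):
    """Return True if the eye size lies beyond the threshold for the
    given category's (mean, SD) criterion."""
    mean, sd = criterion
    return deviation(eye_scale, mean, sd) > THRESHOLD


abnormal_on_real = judged_abnormal(150.0, REAL_FACE_CRITERION)
abnormal_on_artificial = judged_abnormal(150.0, ARTIFICIAL_FACE_CRITERION)
print("150% eyes on a real face judged abnormal:", abnormal_on_real)
print("150% eyes on an artificial face judged abnormal:", abnormal_on_artificial)
```

With these placeholder parameters, eyes scaled to 150% lie five standard deviations from the real-face center but only one from the artificial-face center, so the same physical manipulation counts as abnormal only for the realistic face.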

Alternatively, the uncanny valley may indicate that perceptual sensitivity to facial features was higher for real faces than for artificial faces, and the higher sensitivity for real faces produced unpleasant impressions of abnormality while the lower sensitivity for artificial faces did not. Sensitivity to facial features is known to be better for familiar faces than for unfamiliar faces (Walker & Tanaka, 2003). Thus, the results of Experiments 3 and 4 may reflect the fact that participants were more familiar with real faces than with artificial faces.

Although we tested only static images, Mori (1970) noted that robots’ motion would also influence the uncanny valley. If the judgment criterion of an abnormality in motion is different for real and artificial human appearances (Hodgins, O’Brien, & Tumblin, 1998), then abnormality in motion would produce the uncanny valley depending on the degree of realism.

Further studies are necessary to unveil other aspects of the effect of realism on facial perception and cognition. Such studies will provide clues to further understanding human responses to artificial human-like objects (e.g., Arita, Hiraki, Kanda, & Ishiguro, 2005; Breazeal, 2003; Garau, Slater, Pertaub, & Razzaque, 2005; Hinds et al., 2004; Minato, MacDorman, et al., 2004; Minato, Shimada, et al., 2004; Shinozawa, Naya, Yamato, & Kogure, 2005). One of our unanswered questions is how the human visual system extracts the information of realism from the visual features of face images. Lacking this knowledge made it difficult to effectively manipulate only the degree of realism in our stimuli. Since we simply morphed artificial faces into real human faces, various confounding factors covaried with the realism. Although we showed that the realism explains the results of the present study better than the confounding factors of gender, age, and facial expression, there still remains a possibility that an interaction between abnormality and an untested confounding factor better explains the results. The participants’ task in the present study can be interpreted as a judgment of facial attractiveness (or unattractiveness). Researchers have shown that facial attractiveness is influenced by various factors, such as averageness, youthfulness, symmetry, hairstyle, and skin smoothness (see Rhodes & Zebrowitz, 2002 for a review). The influences of such factors on the results of the present study are still unclear.

The distinction between realism and abnormality is not so straightforward. We have operationally defined the degree of realism as the morphing percentage; that is, the similarity of a morphed image to the photograph of a real human face that was used as a source image for the morphing sequence. In the actual definition of realism employed by the human visual system, the similarity may be measured between an observed (artificial) face and a certain standard face. The average face of real humans (e.g., Rhodes et al., 2001; Rhodes & Zebrowitz, 2002) may serve as the standard face, since unrealistic artificial faces are supposed to have visual features that deviate considerably from those of the average face of real humans. It should be noted, however, that the degree of abnormality (or normality) may also be defined based on similarity to the average (or normal) face, since abnormal faces are supposed to have deviant visual features. In spite of the similarity between realism and abnormality, the results of the present study suggest that the human visual system processes realism and abnormality as separate perceptual dimensions. In other words, the human visual system may define realism and abnormality based on different visual features.


Acknowledgments

We thank S. Akamatsu for assistance in producing the morphed facial images; M. Nomura, K. Hasegawa, I. Ito, K. Taki, and K. Enokida for assistance in recruiting participants; A. Tanaka and D. Norman for comments; and E. Pongratz for permission to use the photo of Doll A. The preparation of this manuscript was supported by MEXT.KAKENHI (17730423).

References

Arita, A., Hiraki, K., Kanda, T., & Ishiguro, H. (2005). Can we talk to robots? Ten-month-old infants expected interactive humanoid robots to be talked to by persons. Cognition, 95(3), B49–B57.

Aylett, R. S. (2004). Agents and affect: Why embodied agents need affective systems. Lecture Notes in Computer Science, 3025, 496–504.

Breazeal, C. (2003). Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1–2), 119–155.

Canamero, L., & Fredslund, J. (2001). I show you how I like you: Can you read it in my face? IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 31(5), 454–459.

DiSalvo, C. F., Gemperle, F., Forlizzi, J., & Kiesler, S. (2002). All robots are not created equal: The design and perception of humanoid robot heads. Proceedings of the DIS Conference, 321–326.

Driver, J., Davis, G., Ricciardelli, P., Kidd, P., Maxwell, E., & Baron-Cohen, S. (1999). Gaze perception triggers reflexive visuospatial orienting. Visual Cognition, 6, 509–540.

Ekman, P., & Friesen, W. V. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs, NJ: Prentice-Hall.

Fabri, M., Moor, D., & Hobbs, D. (2004). Mediating the expression of emotion in educational collaborative virtual environments: An experimental study. Virtual Reality, 7(2), 66–81.

Fong, T., Nourbakhsh, I., & Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems, 42(3–4), 143–166.

Friesen, C. K., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonomic Bulletin & Review, 5(3), 490–495.

Garau, M., Slater, M., Pertaub, D.-P., & Razzaque, S. (2005). The responses of people to virtual humans in an immersive virtual environment. Presence: Teleoperators and Virtual Environments, 14(1), 104–116.

Hara, F. (2004). Artificial emotion of face robot through learning in communicative interactions with humans. Proceedings of the 2004 IEEE Workshop on Robot and Human Interactive Communication, 20–22.

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233.

Hietanen, J. K. (1999). Does your gaze direction and head orientation shift my visual attention? NeuroReport, 10(16), 3443–3447.

Hinds, P. J., Roberts, T. L., & Jones, H. (2004). Whose job is it anyway? A study of human-robot interaction in a collaborative task. Human-Computer Interaction, 19(1–2), 151–181.

Hodgins, J. K., O’Brien, J. F., & Tumblin, J. (1998). Perception of human motion with different geometric models. IEEE Transactions on Visualization and Computer Graphics, 4(4), 307–316.

Kobayashi, H., Ichikawa, Y., Senda, M., & Shiba, T. (2003). Realization of realistic and rich facial expressions by face robot. Proceedings of the 2003 IEEE International Conference on Intelligent Robots and Systems, 1123–1128.

Kontsevich, L. L., & Tyler, C. W. (2004). What makes Mona Lisa smile? Vision Research, 44(13), 1493–1498.

Langton, S. R. H. (2000). The mutual influence of gaze and head orientation in the analysis of social attention direction. Quarterly Journal of Experimental Psychology, 53A(3), 825–845.

Minato, T., MacDorman, K. F., Shimada, M., Itakura, S., Lee, K., & Ishiguro, H. (2004). Evaluating humanlikeness by comparing responses elicited by an android and a person. Proceedings of the 2nd International Workshop on Man-Machine Symbiotic Systems, 373–383.

Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. Lecture Notes in Computer Science, 3029, 424–434.

Mori, M. (1970). Bukimi no tani [The uncanny valley]. Energy, 7(4), 33–35.

Murray, J. E., Rhodes, G., & Schuchinsky, M. (2003). When is a face not a face? The effects of misorientation on mechanisms of face perception. In M. A. Peterson & G. Rhodes (Eds.), Perception of faces, objects, and scenes: Analytic and holistic processes (pp. 75–91). New York: Oxford University Press.

Norman, D. A. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Reichardt, J. (1978). Robots: Fact, fiction, and prediction. Harmondsworth, Middlesex: Penguin Books Ltd.

Rhodes, G., Yoshikawa, S., Clark, A., Lee, K., McKay, R., & Akamatsu, S. (2001). Attractiveness of facial averageness and symmetry in non-Western cultures: In search of biologically based standards of beauty. Perception, 30(5), 611–625.

Rhodes, G., & Zebrowitz, L. A. (Eds.). (2002). Facial attractiveness: Evolutionary cognitive and social perspectives. Westport, CT: Ablex.

Robert, F., & Robert, J. (2000). Faces. San Francisco: Chronicle Books.

Sakagami, Y., Watanabe, R., Aoyama, C., Matsunaga, S., Higaki, N., & Fujimura, K. (2002). The intelligent ASIMO: System overview and integration. Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2478–2483.

Seyama, J., & Nagayama, R. S. (2002). Perceived eye size is larger in happy faces than in surprised faces. Perception, 31(8), 1153–1155.

Seyama, J., & Nagayama, R. S. (2005). The effect of torso direction on the judgment of eye direction. Visual Cognition, 12(1), 103–116.

Shinozawa, K., Naya, F., Yamato, J., & Kogure, K. (2005). Differences in effect of robot and screen agent recommendations on human decision-making. International Journal of Human-Computer Studies, 62(2), 267–279.

Turati, C. (2004). Why faces are not special to newborns: An alternative account of the face preference. Current Directions in Psychological Science, 13(1), 5–8.

Wages, R., Grunvogel, S. M., & Grutzmacher, B. (2004). How realistic is realism? Considerations on the aesthetics of computer games. Lecture Notes in Computer Science, 3166, 216–225.

Walker, P. M., & Tanaka, J. W. (2003). An encoding advantage for own-race versus other-race faces. Perception, 32(9), 1117–1125.

Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces, face cubes, and the geometry of face space. Vision Research, 42(27), 2909–2923.

Woods, S., Dautenhahn, K., & Schultz, J. (2004). The design space of robots: Investigating children’s views. Proceedings of the 2004 IEEE International Workshop on Robot and Human Interactive Communication, 20–22.

Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141–145.

Zeit, U. (1992). Kunstler machen Puppen fur Kinder: von Marion Kaulitz bis Elisabeth Pongratz. Duisburg, Germany: Verlag Puppen und Spielzeug.
