

Factors influencing the perception of realism in synthetic facial expressions

Rafael Luiz Testa∗, Ariane Machado-Lima∗ and Fátima L. S. Nunes∗†
∗School of Arts, Sciences and Humanities of the University of São Paulo, São Paulo, Brazil

†Escola Politécnica of the University of São Paulo, São Paulo, Brazil

Abstract—One way to synthesize facial expressions is to change an image to represent a desired emotion, which is useful in entertainment applications and in the diagnosis and therapy of psychiatric disorders. Despite several existing approaches, there is little discussion about the factors that contribute to or hinder the perception of realism in synthetic facial expression images. After presenting an approach for facial expression synthesis through the deformation of facial features, this paper provides an evaluation by 155 volunteers regarding the realism of the synthesized images. The proposed facial expression synthesis generates new images using two source images (neutral and expressive face) to change the expression in a target image (neutral face). The results suggest that the assignment of realism depends on the type of image (real or synthetic). Nevertheless, the synthesis produces images that can be considered realistic, especially for the expression of happiness. Finally, while factors such as color differences between adjacent regions and unnaturally sized facial features contribute to less realism, other factors, such as the presence of wrinkles, contribute to a greater assignment of realism.

I. INTRODUCTION

In computer graphics, synthesis consists of generating computer images based on geometric and visual specifications of objects and scenes. One way to obtain such specifications is from other images, an approach called image-based rendering. In our work, an approach is presented for facial expression synthesis, aiming to change the emotional expression present in an image while preserving the identity of the altered face. Synthetic facial expression images can be used in several applications, such as the automation of interactive web agents for low-bandwidth videoconferences [1], interactive interfaces [2], improvement of photos [3], biometric systems that are invariant to facial expression [4], facial surgery planning [5], computer animation [6], and custom icons [1], besides training and diagnosing the recognition of facial expressions of emotion in people with psychological disorders [7], [8].

These applications often require a degree of realism in synthetic images. Realism may be a desired factor, for example, because synthetic images must be convincingly mixed with real images [9]. This is the case in applications such as improvement of photographs, animation applied to movies and games, and facial expression training. Realistic images allow users to forget that they are looking at a scene that does not exist [9], thereby ensuring a greater sense of immersion in the context to which the synthetic facial expression is applied.

The human brain has highly efficient mechanisms for face perception [10], [11]. Therefore, even a small unusual detail can make an image seem strange to the viewer. Much of the work of synthesizing facial expressions thus consists in generating realistic images. Although a considerable number of papers deal with facial expression synthesis as proposed in this work, important aspects of generating these images realistically are not discussed. Therefore, our aim is to discuss these aspects related to realism. To achieve this, we present an approach for synthesizing facial expressions based on a previous work [16], improving feature correspondence through changes in the alignment of the source images (Section III-A). Then, we propose an assessment to analyze the realism of the synthetic images (Section III-B). Finally, we describe the factors influencing realism in the synthetic images, discussing possible causes and solutions (Section IV-B).

The synthetic facial expressions herein are two-dimensional (2D) images that represent emotions commonly considered universal [12] (happiness, sadness, fear, surprise, disgust, and anger). The image deformation approach was chosen because it requires only one image of the target person and two images of the source person (neutral and expressive), unlike other approaches that may require videos [13] or multiple images [14], [15]. It is thus possible to cover a larger number of applications.

II. RELATED WORK

One way to simulate facial expressions in images is video-to-image facial retargeting. In this approach, a search is made, in images or videos, for the image/frame that best expresses the desired emotional state. Then, the resulting image of the search is used as the final expressive image. This approach assumes the existence of images of the target person performing varied facial expressions, which limits its possible applications [16], [17].

Another way to perform the synthesis is through 3D facial reconstruction [18], [19] and morphing of the reconstructed model [20], [21]. However, when only a single image of the target person is provided, the reconstruction of the model is highly dependent on precise landmarks [22] and can be time-consuming [23]. Furthermore, the resulting images do not usually have fine details such as skin wrinkles and creases [16].

The synthesis can also be performed by transferring the expressive features identified in an example image (source face) to the target face, on which the desired expression is reenacted. The expressive features of the face are identified and extracted from the source image. These characteristics are then adapted to the target face. Finally, the extracted and adapted properties are merged with the target face to generate the new image. This approach can be implemented in two ways: image deformation or machine learning.

The deformation-based approach consists in changing a target image based on an example (source image). The source image is used to define a new shape so that the facial features look expressive. Then, the facial features are deformed to the new shape. The most common way of deforming the image is by applying affine transformations [7], [13], [15], [16], [24]. In this approach, two images are used as an example: a neutral face and an expressive face of a certain person.

The machine learning-based approach uses techniques such as support vector machines (SVM) [15], [25], neural networks [17], [26], [27], or linear regression [28] to learn the changes in a facial expression. In this approach, the characteristics of each expressive face are generally learned from a set of example images and reproduced in the desired image (target face). A disadvantage of this approach compared with deformation-based approaches is that it requires a greater number of example images to synthesize realistic results.

The visual realism of images can be evaluated either passively or actively. In a passive assessment, the person who is evaluating may not know that the intention is to evaluate the realism of the images, while in an active assessment, the evaluator explicitly looks for elements that help determine whether a given image is real or not [29]. For the experiment described in Section III-B, an active assessment was performed, in which participants knew they would evaluate the realism of synthetic images.

III. METHODOLOGY

We divided this work into two parts: facial expression synthesis and realism assessment. Section III-A describes how the facial expression synthesis was performed, while Section III-B shows how the images were evaluated.

A. Facial expression synthesis

In the present approach, we use the same four concepts of facial images proposed in a related study [23]: source neutral face (Fsn), source expressive face (Fse), target neutral face (Ftn), and target expressive face (Fte). The individuals in the source images are different from the individual in the Ftn image.

Ftn is the image to be changed in order to generate a facial expression. The change is made based on an example (Fse), selected from a set of images (a facial expression database). The image database also contains an Fsn (without expression of emotion) of the same individual as Fse.

The changes that would be required to transform Fsn into Fse are identified and adapted to be used as a basis for modifying Ftn into Fte, so as to generate a new image containing the desired facial expression. For simplicity, the supplied Ftn is assumed to show a neutral facial expression. Figure 1 illustrates the steps taken to generate expressive synthetic facial images (Fte).

The procedures for feature correspondence and facial deformation are similar to a related work [16]. Additionally, our approach to source image alignment considers not only scale and rotation but also translation, which can improve the feature correspondence. The method described in this section does not consider fine details, such as wrinkles and creases. However, this absence is an important factor to be discussed (see Section IV-B), because we found no studies investigating the impact of these aspects on the generated images.

The first step is to identify a correspondence between the facial features in Fsn, Fse, and Ftn. This correspondence is established by identifying facial landmarks. Sixty-eight (68) landmarks are located to delimit the facial regions: eyebrows, eyes, nose, mouth, and face contour. The identification of these points is performed through an ensemble of regression trees [30] using the DLIB library¹. This correspondence is important for defining the new positions of the Fte landmarks in order to shape an expressive face.
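For reference, this landmark detection step can be reproduced with the DLIB Python bindings; the following is a minimal sketch, in which the image path is a placeholder and the 68-point model file is the one distributed separately by DLIB:

```python
import cv2
import dlib

# The 68-point model file is distributed separately by DLIB (assumption:
# it has been downloaded next to this script).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("target_neutral.png")          # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

face = detector(gray)[0]                          # assume one frontal face
shape = predictor(gray, face)
landmarks = [(p.x, p.y) for p in shape.parts()]   # 68 (x, y) points
```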

One of the steps towards defining the expressive landmarks is the alignment of the Fsn and Fse landmarks with the Ftn landmarks. Aligned landmarks can improve the accuracy of the new positions. To this end, a transformation (φ) is defined between the images in terms of rotation, scale, and translation. The rotation is computed so that the line through the centroids of the landmarks of the two eyes becomes level between the images; the scale guarantees that the distances between the eyes are the same; finally, the translation centers the image vertically and horizontally. This process was adapted from [31].
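As an illustration of how φ could be derived from the eye landmarks, the sketch below computes the rotation and scale components with NumPy; the index ranges follow the standard 68-point layout, and the function names are ours (the translation component, handled separately in the method, is omitted):

```python
import numpy as np

def eye_centroids(landmarks):
    """Centroids of the left and right eye groups in the 68-point layout."""
    pts = np.asarray(landmarks, dtype=float)
    return pts[36:42].mean(axis=0), pts[42:48].mean(axis=0)

def alignment_transform(src_landmarks, tgt_landmarks):
    """Rotation + scale mapping the source eye pair onto the target eye pair."""
    sl, sr = eye_centroids(src_landmarks)
    tl, tr = eye_centroids(tgt_landmarks)
    s_vec, t_vec = sr - sl, tr - tl
    scale = np.linalg.norm(t_vec) / np.linalg.norm(s_vec)
    angle = np.arctan2(t_vec[1], t_vec[0]) - np.arctan2(s_vec[1], s_vec[0])
    c, s = np.cos(angle), np.sin(angle)
    return scale * np.array([[c, -s], [s, c]])  # 2x2 linear part of phi
```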

Each landmark has its new position (p_te), equivalent to the expressive face, defined by the corresponding landmark position in Ftn (p_tn) plus the difference between its position in Fse (p_se) and in Fsn (p_sn). This process was adapted from [16] and is illustrated in Figure 2.

p_te = p_tn + φ · (p_se − p_sn)    (1)
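In code, Equation (1) amounts to a single vectorized operation; a sketch reusing the 2×2 linear part of φ from the previous snippet:

```python
import numpy as np

def expressive_landmarks(p_tn, p_sn, p_se, phi):
    """Equation (1): p_te = p_tn + phi . (p_se - p_sn), applied row-wise."""
    p_tn, p_sn, p_se = (np.asarray(a, dtype=float) for a in (p_tn, p_sn, p_se))
    return p_tn + (p_se - p_sn) @ phi.T
```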

The second step is the triangulation of the face region and its features. A triangle delimits a small region of the image that will be modified. The triangulation of the face is a step that can be performed before the image processing, because it does not depend on the input image but only on the order of the landmarks and the layout of the facial features. Therefore, it is pre-computed with Delaunay triangulation and manually adjusted to adequately separate the facial features (Figure 2).
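Because the triangulation depends only on the landmark ordering, it can be computed once over a canonical landmark layout and reused for every image; a sketch with SciPy (the landmark file name is a hypothetical placeholder):

```python
import numpy as np
from scipy.spatial import Delaunay

# Canonical 68-point layout, e.g. mean landmark positions (assumed file).
mean_shape = np.load("mean_68_landmarks.npy")    # shape (68, 2)
triangles = Delaunay(mean_shape).simplices       # (n_tri, 3) landmark indices
# The paper additionally adjusts this triangulation by hand so that
# triangles do not straddle distinct facial features.
```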

The region of each triangle of Ftn is mapped to a new triangle formed from the computed expressive positions (p_te). This mapping is performed with affine transformations, as exemplified in Figure 2. An affine transformation generalizes translation, scaling, reflection, rotation, and shear operations. The transformation matrix is computed by solving a linear system of equations between the points of a triangle of Ftn and the corresponding points of the triangle in Fte. Then, the pixels of each triangle of Ftn are warped by means of an inverse mapping to match the transformation.

¹Available at http://dlib.net/


Fig. 1. Steps of the facial expression synthesis.

Fig. 2. Illustration of the facial expression synthesis steps. Fsn and Ftn were retrieved from the IMPA Faces [32] database.

It is not possible to transform regions that are not exposed in the image; in our approach, this is the case for the inside of the mouth. This occluded region of Ftn is mapped from Fse (Figure 2).
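A sketch of one such per-triangle warp with OpenCV follows; the function and its bounding-box bookkeeping are our illustration of the technique, not the paper's exact code:

```python
import cv2
import numpy as np

def warp_triangle(src_img, dst_img, tri_src, tri_dst):
    """Warp the triangle tri_src of src_img onto tri_dst of dst_img."""
    r1 = cv2.boundingRect(np.float32([tri_src]))
    r2 = cv2.boundingRect(np.float32([tri_dst]))
    # Triangle coordinates relative to their bounding boxes.
    t1 = [(x - r1[0], y - r1[1]) for x, y in tri_src]
    t2 = [(x - r2[0], y - r2[1]) for x, y in tri_dst]
    patch = src_img[r1[1]:r1[1] + r1[3], r1[0]:r1[0] + r1[2]]
    # Affine matrix from the three source corners to the three target corners.
    M = cv2.getAffineTransform(np.float32(t1), np.float32(t2))
    warped = cv2.warpAffine(patch, M, (r2[2], r2[3]),
                            flags=cv2.INTER_LINEAR,
                            borderMode=cv2.BORDER_REFLECT_101)
    # Composite only the pixels inside the target triangle.
    mask = np.zeros((r2[3], r2[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(t2), (1.0, 1.0, 1.0))
    roi = dst_img[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]]
    dst_img[r2[1]:r2[1] + r2[3], r2[0]:r2[0] + r2[2]] = \
        roi * (1.0 - mask) + warped * mask
```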

Facial expression database: Two facial expression databases were selected for synthesizing the images: IMPA Faces [32] and the MUG Facial Expression Database [33]. These databases provide front-facing images of individuals staging the six aforementioned facial expressions. In the case of the MUG Facial Expression Database, the images are made available as frames of a video; hence, for this database, it was necessary to manually select a neutral frame and an expressive frame for each of the six emotions.

B. Experiment

Volunteers were informed that both photographs (real images) and computer-generated (not real) images would be evaluated. Although the same set of images was evaluated by all the volunteers, each participant received a form containing the images in a different random order. This procedure aimed at avoiding bias related to the order of presentation of the images. Volunteers had to categorize each image as “real” or “not real”, as illustrated in Figure 3. Each image was presented separately from the others to avoid comparisons.
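A minimal sketch of this per-participant randomization (the seeding scheme is our assumption; the paper only states that each form had a different random order):

```python
import random

def form_order(image_ids, participant_id):
    """Return a participant-specific random ordering of the 36 form images."""
    order = list(image_ids)
    random.Random(participant_id).shuffle(order)  # reproducible per participant
    return order
```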

Fig. 3. Example of a question of the form with a real image. The image evaluated in the question was retrieved from [32].

The images used in the form, besides being synthetic or real, included one of the six facial expressions specified earlier. Thus, each form contained 36 images ((3 synthetic + 3 real) × 6 emotions). The faces presented in the images were equally distributed between masculine and feminine genders. Likewise, no repeated faces were displayed. The images presented to participants were retrieved or synthesized from images provided in the facial expression databases described in Section III-A.

The realism assessment considered only two levels (“real” and “not real”) rather than a scale of realism [13], [16], [24], [34] because there is no evidence that a person can perceive degrees of realism [29]. Moreover, the term “not real” was chosen instead of “synthetic” or “false” because it is the direct negation of “real”, thereby reducing the bias related to the meaning of those terms [29].

Hypotheses: If the synthesis is effective in representing the images in a realistic way, the number of synthetic images considered real is expected to be greater than or equal to the number of real images considered real. To that end, we consider the “type of image”, i.e., whether it is real (a photograph) or not (synthesized by the approach presented in Section III-A), and raise the following hypotheses:

H0: the attribution of realism to an image, by volunteer evaluators, does not depend on the type of image;

H1: the attribution of realism to an image, by volunteer evaluators, depends on the type of image.

The hypotheses were tested using the chi-square test of independence, whose results are presented in Section IV-A.

Participants: 155 participants responded to the form. In order to obtain a qualitative evaluation, we collected the data from ten volunteers in an assisted way, including an extra task: they were asked to point out in the pictures which factors led them to consider some images “not real”. We use the responses to this question to discuss the factors influencing realism.

IV. RESULTS

In this section, we present the results of the evaluation of the synthetic images as described in Section III-B. Section IV-A presents an analysis of the assignment of realism to the images. Section IV-B discusses aspects of realism relevant to facial expression synthesis. Finally, Section IV-C addresses realism for each facial expression.

A. Realism assessment by volunteers

We found similar response times for real and not real images. Not real images had an average response time of 10.0 seconds (SD = 2.4), while for real images the mean response time was 9.8 seconds (SD = 1.3). On average, a participant took 9.8 seconds to evaluate each image (SD = 4.5).

As discussed in Section III-B, a chi-square test of independence was performed to examine whether there is an association between realism and image type. The hypothesis of independence between these variables is rejected (χ²(1, N = 115) = 1229.45, p < .05), indicating that the attribution of realism depends on the type of image. This can be observed in the plots of Figure 4, suggesting that the synthesized images need improvements to be perceived as real. It does not mean that such images cannot be used in the applications addressed in Section I, but they need to be improved so that a person cannot distinguish between a synthetic and a real image.

As shown in Table I, some volunteers who participated in the evaluation considered synthetic images to be real, and some participants also considered real images to be synthetic. The plot of Figure 4 shows that the three real images with the highest assignments as “not real” have percentages similar to the three synthetic images with the highest realism assignment. Section IV-B discusses which factors may have influenced the perception of realism in this study.

TABLE I
CONFUSION MATRIX OF THE RESPONSES

Image type   Judged “not real”   Judged “real”   Total
Synthetic    70% (1945)          30% (845)       100% (2790)
Real         23% (638)           77% (2152)      100% (2790)
Both         46% (2583)          54% (2997)      100% (5580)
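For reproducibility, the reported statistic can be recomputed directly from the counts in Table I; a minimal sketch using SciPy, whose default Yates continuity correction for 2×2 tables matches the value reported above:

```python
from scipy.stats import chi2_contingency

# Counts from Table I: rows = image type (synthetic, real),
# columns = judgment ("not real", "real").
observed = [[1945, 845],
            [638, 2152]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3g}")  # chi2(1) = 1229.45, p << .05
```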

A related study [16] conducted an evaluation in which realism attribution was assessed for synthetic and real videos, but five levels were considered for assigning realism: very likely fake (1), likely fake (2), could equally be real or fake (3), likely real (4), and very likely real (5). For synthetic videos, the authors found that 34% of the respondents assigned levels 1 and 2 (tending to fake), 46% assigned levels 4 and 5 (tending to real), and 20% were undecided.

The compared approach addresses some aspects that make images more realistic, such as jagged edges and wrinkles, as discussed in Section IV-B. The compared assessment and ours also differ in the way realism assignment data were collected (five levels of realism versus the two levels of the present work) and in the use of the word “fake” in the cited work instead of the term “not real” in our approach. Moreover, the studies had different numbers of volunteers: 30 in the compared research versus 155 herein.

These differences between the assessments hinder a direct comparison of the results. Despite the differences pointed out, similar results were obtained for the attribution of realism in our study and for levels 4 and 5 in the compared study. For this comparison, we considered the 4 images of the compared study against the 4 best images of our assessment regarding the same expressions. The results of our work in contrast to the compared study were, respectively: 52% versus 57% for happiness, 42% versus 49% for fear, 37% versus 36% for surprise, and 32% versus 41% for anger.

We chose to compare the top four images because the related study does not specify on which images/videos the evaluation was performed; thus, it is difficult to establish a precise comparison of the results. It is also important to point out that the compared study evaluates videos rather than images, which are more complex due to the features related to the animation process.

B. Aspects influencing the perception of realism

This section discusses aspects that influence the perception of realism according to the volunteers. These aspects are categorized into “differences between adjacent regions” and “unnatural aspects in facial features”. These categories allow a joint analysis of similar factors. The aspects are then illustrated by examples with synthetic and real images.


Fig. 4. Percentage of attributions as real and “not real” for each image: (a) real images; (b) synthetic images.

1) Differences between adjacent regions: The first artifact involves prominent jagged edges. The irregularity can appear as a result of the discontinuity between two regions that underwent the warping process. Participants pointed to jagged edges inside the mouth (Figure 5a) and on the chin (Figure 5c). This artifact had a negative influence on the perception of realism. It has two different causes depending on the region in which it is located. In the inner part of the mouth, the artifact is generated by the difference between Fse and Ftn, since the internal region of the mouth is deformed from Fse. Applying filters at the edge of the lips can diminish this problem. For the chin region, the artifact is generated because only the internal region of the face is deformed. Thus, for some images, the artifact can appear because the region of the chin is shrunk while the part below it is not changed. The solution to this problem would be to apply affine warping to the regions of the image that are outside the face, which could be accomplished by including landmarks/triangles on the neck or on the border of the image.

Another artifact in this category refers to a noticeable color difference between adjacent regions of the image. Interestingly, this difference was a factor that led volunteers to assign an absence of realism to both synthetic and real images. An example is shown in Figure 6a, which shows shading on the neck region. Another example is the darker color inside the mouth (Figure 6c and Figure 6d). Volunteers stated that these aspects suggested that the image had been digitally altered and that the region had not been adequately treated. In real images, this difference probably occurred because of lighting conditions during image acquisition. These real images were used as sources to synthesize other images; thus, the same problem was perceived in both images (synthetic and real). Moreover, the color difference between Fsn and Fte can increase the difference further.

2) Unnatural aspects in facial features: Another type of artifact reported in the images was a facial expression with aspects that do not seem natural. An example is a very large pupil, as pointed out in synthetic images (Figure 7a) and real ones (Figure 7c). A study [35] that evaluated the Uncanny Valley effect in real and synthetic images also found a lower attribution of realism to images with larger facial features. The disproportion of the facial feature may be caused by two factors: (i) the deformation of the whole eye region (and not just the pupil) to synthesize the image; and (ii) the fact that the facial expressions in the databases are staged rather than spontaneous, so the actors may not open their eyes enough. In the proposed approach, the pupil region is not treated independently of the eye. The disproportionate alteration of the pupil can be improved by identifying and deforming this region separately from the eye. The second problem can be addressed by building a facial expression database that contains spontaneous facial expressions. Such facial expressions may seem more congruent and can thus also help minimize this artifact; however, additional issues, such as face alignment and the difficulty of obtaining frontal images of spontaneous facial expressions, would have to be addressed.

Another problem reported by volunteers lies in the “complexity” of the facial expression. Some participants pointed out that facial expressions with wrinkles were real because they would be harder to generate by computer. This aspect can be observed when comparing Figures 8a and 8b. Indeed, the present approach did not synthesize wrinkles. To achieve this, techniques such as ERI (Expression Ratio Image) [36] can be used to synthesize facial expressions more convincingly.
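The core of ERI is a per-pixel ratio of the expressive to the neutral source image, applied multiplicatively to the target; the sketch below is a simplified rendition of the technique in [36], assuming all three images have already been warped into the same (target) geometry:

```python
import cv2
import numpy as np

def apply_eri(target_face, src_neutral, src_expressive, eps=1e-3):
    """Transfer illumination changes (e.g., wrinkles) via an expression ratio."""
    a = src_neutral.astype(np.float32) + eps       # avoid division by zero
    a_expr = src_expressive.astype(np.float32) + eps
    ratio = a_expr / a                             # per-pixel expression ratio
    ratio = cv2.GaussianBlur(ratio, (5, 5), 0)     # suppress ratio noise
    out = target_face.astype(np.float32) * ratio
    return np.clip(out, 0, 255).astype(np.uint8)
```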

The aspects presented in this section indicate possible improvements to the proposed approach. Artifacts such as jagged edges, color differences, and unnatural facial features reflect aspects of realism that must be observed when proposing facial expression synthesis approaches in which realism is an important prerequisite. However, some of these artifacts require improvement not only in the synthesis of a new image but also in the facial expression databases.


(a) Synthetic image with prominent jagged edges inside the mouth.

(b) Synthetic image without prominent jagged edges.

(c) Synthetic image with prominent jagged edges on the chin.

Fig. 5. Examples of images with and without aliasing. Images synthesized from images retrieved from IMPA Faces [32].

(a) Real image with color difference on the neck.

(b) Synthetic image with less color difference on the neck.

(c) Synthetic image with color difference inside the mouth.

(d) Real image with color difference inside the mouth.

(e) Synthetic image with less color difference inside the mouth.

Fig. 6. Examples of color difference. Images retrieved or synthesized from images of IMPA Faces [32].


C. Analysis of facial expressions

The attribution of realism also differs according to the facial expression. For example, Figure 9b shows that the facial expression of happiness obtained the greatest attribution of realism among the synthetic images. Among the real images (Figure 9a), the facial expression of disgust (Figure 10a) reached the highest attribution of realism.

A study [37] evaluated the recognition of facial expressions in real and synthetic images. The facial expression of happiness achieved the highest recognition rate, and disgust the lowest. The facial expression of happiness was also the synthetic facial expression most often classified as real, both in this work and in a related work [16]. The facial expression of disgust was the expression with the greatest realism among the real faces. One possible explanation is that more “complex” facial expressions tend to be considered more real (see Section IV-B). The facial expression of disgust (Figure 10a) has wrinkles in various parts of the face, while happiness has small wrinkles only at the corners of the eyes.

For the synthetic images (Figure 9b), the two facial expressions that obtained the highest attribution of realism were happiness and sadness. For these expressions, the shape of the mouth changes quite significantly, especially at the corners of the lips. The higher attribution of realism to these facial expressions can be explained by the synthesis being effective in altering this region (the mouth); examples of these facial expressions are presented in Figures 10b and 10c. This may have occurred because the mouth has a greater number of landmarks/triangles: the mouth has 20 landmarks, while the other features have fewer: nose (9 landmarks), eye (6 landmarks), and eyebrow (6 landmarks). The landmarks are illustrated in Figure 2.

V. CONCLUSIONS

This study proposed an approach for facial expression synthesis with some differences from related works. Nevertheless, the main contributions of this study are: i) the proposition of a technique to evaluate the realism of synthetic facial expressions; and ii) a detailed discussion of the factors influencing the perception of realism. Although the synthesized images do not have the same level of realism as the real ones, they are sufficiently expressive to be used in several applications.

Factors influencing the attribution of realism are related to the synthesis and to the source images used (facial expression databases). The analysis of these aspects is important for proposing similar approaches. The empirical test based on participant feedback was an important tool to evaluate the perception of realism, since we did not find a similar evaluation in related studies.

From the experiment, we notice that even subtle artifacts, such as small jagged edges, color differences, and unnatural facial feature sizes, lead volunteers to perceive images as less realistic. Of the solutions proposed to address unrealistic artifacts in the images, the one that stands out most is the addition of landmarks for a more detailed triangulation of the face. Such a triangulation allows synthesizing a more precise and detailed deformation, and is thus a way to improve aspects such as image discontinuity and face expressivity. Furthermore, participants pointed out that images presenting fine details such as wrinkles tend to appear more realistic, which makes this aspect essential for facial expression synthesis.


(a) Synthetic image with large pupil.

(b) Synthetic image with normal-sized pupil.

(c) Real image with large pupil.

Fig. 7. Examples of unnatural facial feature size. Images retrieved or synthesized from images of IMPA Faces [32].

(a) Real image with wrinkles. (b) Synthetic image without wrinkles.

Fig. 8. Examples of images with and without fine details. Images retrieved or synthesized from images of IMPA Faces [32].

Fig. 9. Percentage of assignments as real and “not real” for each facial expression: (a) real images; (b) synthetic images.

(a) Real image of disgust.

(b) Synthetic image of happiness.

(c) Synthetic image of sadness.

Fig. 10. Examples of facial expressions. Images retrieved or synthesized from images of IMPA Faces [32].


As future work, it is important to extend the study considering the aspects presented in Section IV-B to generate more realistic facial expressions. Another suggestion is to verify other aspects of the synthetic images besides the attribution of realism, such as the preservation of face identity and the expressiveness of each emotion represented.

Regarding the realism experiment, we suggest measuring the impact of each factor discussed, providing more details on how these aspects influence the perception of realism across different levels of each visual factor, as described in other work on general image realism [29]. Thus, it will be possible to analyze the amount of each visual factor (color differences, types of irregularities, wrinkles, creases, pupil sizes, etc.) that is necessary to perceive an image as real.



ACKNOWLEDGMENT

The authors would like to thank the Coordination for the Improvement of Higher Education Personnel (CAPES) and the Brazilian Institute of Science and Technology in Medicine Assisted by Scientific Computing (INCT-MACC) for their financial support.

REFERENCES

[1] X. Li, C.-C. Chang, and S.-K. Chang, “Face alive icon,” Journal of Visual Languages & Computing, vol. 18, no. 4, pp. 440–453, 2007, Visual Interactions in Software Artifacts.

[2] E. Mendi and C. Bayrak, “Facial animation framework for web and mobile platforms,” in 2011 IEEE 13th International Conference on e-Health Networking, Applications and Services, June 2011, pp. 52–55.

[3] H. Fujishiro, T. Suzuki, S. Nakano, A. Mejima, and S. Morishima, “A natural smile synthesis from an artificial smile,” in SIGGRAPH ’09: Posters. New York, NY, USA: ACM, 2009, pp. 59:1–59:1. [Online]. Available: http://doi.acm.org/10.1145/1599301.1599360

[4] B. Amberg, R. Knothe, and T. Vetter, “Expression invariant 3D face recognition with a morphable model,” in 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, Sept 2008, pp. 1–6.

[5] E. Keeve, S. Girod, R. Kikinis, and B. Girod, “Deformable modeling of facial tissue for craniofacial surgery simulation,” Computer Aided Surgery, vol. 3, no. 5, pp. 228–238, 1998.

[6] J.-y. Noh and U. Neumann, “Expression cloning,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’01). New York, NY, USA: ACM, 2001, pp. 277–288.

[7] L. Impett, P. Robinson, and T. Baltrusaitis, “A facial affect mapping engine,” in Proceedings of the Companion Publication of the 19th International Conference on Intelligent User Interfaces (IUI Companion ’14). New York, NY, USA: ACM, 2014, pp. 33–36.

[8] R. L. Testa, A. H. N. Muniz, L. U. S. Carpio, R. d. S. Dias, C. C. d. A. Rocca, A. M. Lima, and F. d. L. d. S. N. Marques, “Generating facial emotions for diagnosis and training,” in 2015 IEEE 28th International Symposium on Computer-Based Medical Systems, June 2015, pp. 304–309.

[9] M. Pharr, W. Jakob, and G. Humphreys, Physically Based Rendering: From Theory to Implementation, 3rd ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2016.

[10] R. Diamond and S. Carey, “Why faces are and are not special: An effect of expertise,” Journal of Experimental Psychology: General, vol. 115, no. 2, pp. 107–117, 1986.

[11] D. Y. Tsao and M. S. Livingstone, “Mechanisms of face perception,” Annual Review of Neuroscience, vol. 31, pp. 411–437, 2008. [Online]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2629401/

[12] H. Averbuch-Elor, D. Cohen-Or, J. Kopf, and M. F. Cohen, “Bringing portraits to life,” ACM Trans. Graph., vol. 36, no. 6, pp. 196:1–196:13, Nov. 2017. [Online]. Available: http://doi.acm.org/10.1145/3130800.3130818

[13] P. Ekman and W. V. Friesen, “Constants across cultures in the face and emotion,” Journal of Personality and Social Psychology, vol. 17, no. 2, p. 124, 1971.

[14] K. Li, Q. Dai, R. Wang, Y. Liu, F. Xu, and J. Wang, “A data-driven approach for facial expression retargeting in video,” IEEE Transactions on Multimedia, vol. 16, no. 2, pp. 299–310, Feb 2014.

[15] A. Mohammadian, H. Aghaeinia, and F. Towhidkhah, “Diverse videos synthesis using manifold-based parametric motion model for facial understanding,” IET Image Processing, vol. 10, no. 4, pp. 253–260, 2016.

[16] W. Wei, C. Tian, S. J. Maybank, and Y. Zhang, “Facial expression transfer method based on frequency analysis,” Pattern Recognition, vol. 49, pp. 115–128, 2016.

[17] F. Qiao, N. Yao, Z. Jiao, Z. Li, H. Chen, and H. Wang, “Geometry-contrastive generative adversarial network for facial expression synthesis,” ArXiv e-prints, Feb. 2018.

[18] E. Richardson, M. Sela, R. Or-El, and R. Kimmel, “Learning detailed face reconstruction from a single image,” CoRR, vol. abs/1611.05053, 2016. [Online]. Available: http://arxiv.org/abs/1611.05053

[19] Y. Feng, F. Wu, X. Shao, Y. Wang, and X. Zhou, “Joint 3D face reconstruction and dense alignment with position map regression network,” ArXiv e-prints, Mar. 2018.

[20] V. Blanz, C. Basso, T. Poggio, and T. Vetter, “Reanimating faces in images and video,” in Computer Graphics Forum, vol. 22, no. 3. Wiley Online Library, 2003, pp. 641–650.

[21] H. Liang, R. Liang, M. Song, and X. He, “Coupled dictionary learning for the detail-enhanced synthesis of 3-D facial expressions,” IEEE Transactions on Cybernetics, vol. 46, no. 4, pp. 890–901, April 2016.

[22] M. Piotraschke and V. Blanz, “Automated 3D face reconstruction from multiple images using quality measures,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 3418–3427.

[23] W. Xie, L. Shen, M. Yang, and J. Jiang, “Facial expression synthesis with direction field preservation based mesh deformation and lighting fitting based wrinkle mapping,” Multimedia Tools and Applications, vol. 77, no. 6, pp. 7565–7593, Mar 2018. [Online]. Available: https://doi.org/10.1007/s11042-017-4661-6

[24] Y. Zhang, W. Lin, B. Zhou, Z. Chen, B. Sheng, and J. Wu, “Facial expression cloning with elastic and muscle models,” Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 916–927, 2014.

[25] Y. Zhang and W. Wei, “A realistic dynamic facial expression transfer method,” Neurocomputing, vol. 89, pp. 21–29, 2012.

[26] M. Y. Y. Leung, H. Y. Hui, and I. King, “Facial expression synthesis by radial basis function network and image warping,” in IEEE International Conference on Neural Networks, vol. 3, Jun 1996, pp. 1400–1405.

[27] J. Ghent and J. McDonald, “Photo-realistic facial expression synthesis,” Image and Vision Computing, vol. 23, no. 12, pp. 1041–1050, 2005.

[28] W. Zhu, Y. Chen, Y. Sun, B. Yin, and D. Jiang, “SVR-based facial texture driving for realistic expression synthesis,” in Third International Conference on Image and Graphics (ICIG ’04), Dec 2004, pp. 456–459.

[29] P. M. Rademacher, “Measuring the perceived visual realism of images,” Ph.D. dissertation, University of North Carolina, 2002.

[30] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’14). Washington, DC, USA: IEEE Computer Society, 2014, pp. 1867–1874.

[31] D. L. Baggio, S. Emami, D. M. Escriva, K. Ievgen, J. Saragih, and R. Shilkrot, Mastering OpenCV 3, 2nd ed. Packt Publishing, 2017.

[32] J. P. Mena-Chalco, R. M. Cesar-Jr., and L. Velho, “Banco de dados de faces 3D: IMPA-Face3D,” IMPA-RJ, Tech. Rep., 2008. [Online]. Available: http://app.visgraf.impa.br/database/faces/

[33] N. Aifanti, C. Papachristou, and A. Delopoulos, “The MUG facial expression database,” in 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 10), April 2010, pp. 1–4.

[34] H. Wang and K. Wang, “Affective interaction based on person-independent facial expression space,” Neurocomputing, vol. 71, no. 10–12, pp. 1889–1901, 2008.

[35] J. Seyama and R. S. Nagayama, “The uncanny valley: Effect of realism on the impression of artificial human faces,” Presence: Teleoperators and Virtual Environments, vol. 16, no. 4, pp. 337–351, 2007. [Online]. Available: https://doi.org/10.1162/pres.16.4.337

[36] Z. Liu, Y. Shan, and Z. Zhang, “Expressive expression mapping with ratio images,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’01). New York, NY, USA: ACM, 2001, pp. 271–276.

[37] M. Dyck, M. Winbeck, S. Leiberg, Y. Chen, R. C. Gur, and K. Mathiak, “Recognition profile of emotions in natural and virtual faces,” PLOS ONE, vol. 3, no. 11, pp. 1–8, Nov 2008. [Online]. Available: https://doi.org/10.1371/journal.pone.0003628
[37] M. Dyck, M. Winbeck, S. Leiberg, Y. Chen, R. C. Gur, andK. Mathiak, “Recognition profile of emotions in natural and virtualfaces,” PLOS ONE, vol. 3, no. 11, pp. 1–8, 11 2008. [Online].Available: https://doi.org/10.1371/journal.pone.0003628