ieee transactions on affective computing, vol. x, no. y ... · psychologist mirroring, the computer...

1949-3045 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2016.2606594, IEEETransactions on Affective Computing

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. X, NO. Y, APRIL 2016 1

Assessing the Influence of Mirroring on thePerception of Professional Competence using

Wearable TechnologyMarıa Elena Meza-de-Luna, Juan R. Terven, Member, IEEE Bogdan Raducanu, and Joaquın

Salas, Member, IEEE

Abstract—Nonverbal communication is an intrinsic part in daily face-to-face meetings. A frequently observed behavior during socialinteractions is mirroring, in which one person tends to mimic the attitude of the counterpart. This paper shows that a computer visionsystem could be used to predict the perception of competence in dyadic interactions through the automatic detection of mirroringevents. To prove our hypothesis, we developed: (1) A social assistant for mirroring detection, using a wearable device which includes avideo camera and (2) an automatic classifier for the perception of competence, using the number of nodding gestures and mirroringevents as predictors. For our study, we used a mixed-method approach in an experimental design where 48 participants acting ascustomers interacted with a confederated psychologist. We found that the number of nods or mirroring events has a significantinfluence on the perception of competence. Our results suggest that: (1) Customer mirroring is a better predictor than psychologistmirroring; (2) the number of psychologist’s nods is a better predictor than the number of customer’s nods; (3) except for thepsychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images fromwearable smartglasses or fixed cameras.

Index Terms—Mirroring, Nodding, Competence, Perception, Wearable Technology

F

1 INTRODUCTION

NONVERBAL communication is an intrinsic part of oureveryday face-to-face interactions. It is this social

signaling that reinforces the conveyed message offeringsupplementary information smoothly integrated with ourverbal channel. Some psychologists [3], [30] suggest that inmany situations the social signals are just as important as theconscious content for determining human behavior. In otherwords, it is a piece of information that defines to a largeextent our attitude towards a situation and our reaction. Apossible way to understand the power of social signals is tomake an analogy with trying to follow a conversation in aforeign, unfamiliar language. Despite our lack of knowledgeof the language, we may still infer some general patterns,e.g., who is leading the conversation, or if there is a friendlyor tense discussion.

A frequently observed behavior during social interac-tions is mirroring, in which one person tends to mimicthe non-verbal communication (head movements, handgestures, facial expressions, tone of voice, verbal accent,breathing, etc.) of his/her counterpart. The role of mirroringis to signal empathy between people and, in general, is

• Maria Elena Meza de Luna is with the Universidad Autonoma deQueretaro.

• Juan R. Terven is with the Instituto Tecnologico de Mazatlan.

• Bogdan Raducanu is with the Computer Vision Center.

• Joaquın Salas is with the Instituto Politecnico Nacional. Sendcorrespondence to: [email protected]

Manuscript received April 15, 2016; revised April 15, 2016.

an early indicator of agreement of the interactional process[16], [37]. It has long been observed that head gestures suchas nodding, increase the opportunities for a person to beliked [29], [45], while the occurrence of mirroring is an earlypredictor of acceptance [22], [33], [38], [62]. However, asfar as we know, there is a void in the literature on hownodding/mirroring in a dyadic interaction can influencethe perception of competence 1. Hence, this investigationintends to answer whether it is possible to predict theperception of competence through nodding gestures or mir-roring events in dyadic interactions.

The thoughtful analysis of social interactions allowsus to advance the understanding of their consequences.However, the process of studying them is cumbersome, asthey are complex phenomena that generate large amountsof information. For instance, it has been reported that onehour of observation leads to about 10 hours of annota-tions [48]. In addition, gathering information may be achallenge in itself, since obtrusively intervening in a socialinteraction could affect its development. Besides, most ofthe analysis in this area is done after the interaction hasalready finished. As stated by Pentland [49], having a socialassistant decoding/interpreting the information containedin complex social signals could pave the way to correctingour attitude while there is still time to reduce the chancesfor an unwelcome result in the interaction.

Hence, inspired by the pioneering work of Pentland inthe processing of social signals, in this paper we present a

1. Competence entails the possession of skills, talents, and capabilitywith traits that include being clever, competent, creative, efficient,foresighted, ingenious, intelligent, and knowledgeable [19].




Fig. 1. Assessing Competence. During the interaction, the conversation is recorded using wearable and fixed cameras to generate a set of imagesI(t). These images are fed to a head-gesture recognizer that detects in real-time nodding η1(t), η2(t). Synchronous recognition of head nods leadsto mirroring detection µ(t). During training, a set of qualitative interviews was carried out with customers to evaluate their perception of competenceand satisfaction during the interaction. This creates a classification space C, which for a certain degree of activity provides an automated evaluationL of the likely outcome of the interaction

social assistant to recognize the perception of competencethat results from an interaction. Our central hypothesis isthat it is possible to develop a wearable system to determinewhether the interlocutor has been perceived as competentor non-competent in a dyadic conversation. Indeed, weprovide a practical confirmation that the perception ofcompetence can be evaluated using nodding gestures ormirroring events detected using a video camera embeddedin the wearable device. Our method can be summarized asfollows (see Figure 1): During the interaction, participants(acting as customers) ask a confederated psychologist (actingas service provider) for professional advice. The conversationis recorded using wearable and fixed cameras to generate aset of images I(t). The role of the pair of fixed cameras thatappear in our sketch scenario is only to serve as a reference,such that the images captured by the wearable could becompared. These images are fed to a head-gesture recog-nizer that detects the head nodding gestures η1(t), η2(t) inreal time. Synchronous recognition of head nods leads tomirroring detection µ(t). To automatically train the classifierthat detects the perceived competence, we obtain groundtruth from a set of qualitative interviews carried out withcustomers. This creates a classification space C, which for acertain degree of activity provides an automated evaluationL of the likely outcome of the interaction.

The main contribution of our approach is that we pro-pose a social assistant based on wearable technology, i.e.,smartglasses, which have a video camera embedded in thebridge that connects the two lenses (see Figure 2). Ourchoice for a solution based on computer vision is because:(1) Its non-invasive nature, which increases the chancesto be accepted by people; (2) it offers a first-person per-spective, compared to the classical fixed cameras, whichoffer a third-party perspective; and (3) the acceptance of

Fig. 2. Smartglasses used as a wearable camera. The glasses havea high-definition camera embedded in the bridge connecting the twolenses (by Pivothead Inc., with permission).

its use may increase given that the camera is embeddedin an everyday artifact used by a significant number ofpeople. The proposed social assistant could be employed bypsychologists in the automatic annotation/summarizationof videos and as a supportive tool by people suffering froma broad spectrum of impairments. For the latter, althoughwe are in the early stages, our goal is to develop wearableassistive technology to improve the social interactions of:(1) Visually impaired people by understanding the socialcontext, (2) autism children by decoding the facial expres-sions and social signals transmitted by the person they areinteracting with, and (3) persons affected by mild cognitiveimpairment by allowing them to review the persons theymet, conversations they had, places they visited, and eventsthey took part in during the day.

RELATED LITERATURE

During a face-to-face human interaction, nonverbal commu-nication plays a fundamental role, as it is used to support




the spoken message by placing particular emphasis oncertain aspects of it [40]. Usually, nonverbal communicationis manifested through a multiplicity of behavioral cuesincluding head movements, body postures/gestures, facialexpressions, winks, tone of voice, verbal accent, and vocalutterances [64]. Sometimes, these cues are also known associal signals, a term coined by Pentland [49], because theyare an entire part of our social interaction.

According to social psychology, head nodding plays aparamount role in social interactions. Apart from the ob-vious function of signaling a yes, head nods are used asbackchannels to display interest, enhance communicationor anticipate the counterpart intention for turn claiming [2],[35]. Besides, the psychology literature suggests that thefrequency of head nod events in face-to-face interactionscan reveal personal characteristics or even predict outcomes.For instance, job applicants producing more head nodsin employment interviews have been reported to be oftenperceived as more employable than applicants who do not[29], [45].

Mirroring is another important event that takes partduring social interaction, i.e., when one interlocutor triesto mimic the attitude of the counterpart [16], by imitatingspeech patterns (accent, voice prosody), facial expressions,postures, gestures, and idiosyncratic movements. The studyof mirroring has attracted the interest of psychologists for along time [18]. Mirroring behavior has an important, butbarely noticed, impact on our daily life. It reveals largepieces of information regarding the participants’ interper-sonal states and attitudes and represents a reliable indicatorof cooperativeness and empathy during interaction [65] tothe extent that it has been demonstrated to be positivelyassociated with romantic interest and attraction [22].

In marketing, mirroring has proved to influence the cus-tomers’ behavior in different settings. For instance, mimi-cking the verbal behavior of customers in a restaurant wasassociated with a higher rate of customers who gave a tipand with larger amounts of the tips [63]. In the retail sector,Jacob et al. [38] found an increment in sales rates whenmimicking (78.8%) versus non-mimicking (61.8%). Besides,clerk’s suggestions were more influential when purchasinga product: 71.1% of customers exposed to mirroring boughtthe object suggested by the seller versus 46.2% in the non-mirroring case. Similar results were reported by Gueguen[32], who found that mirroring is associated with greatercompliance with the sellers’ suggestions and customersrating the sellers and the store with greater positive evalu-ations. Indeed, mirroring seems to be a powerful techniqueto increase helping behavior. For instance, Van Baaren et al.[62] set an experiment where participants were invited toevaluate different advertisements. The experimenter, whowas seated in front of the participant, mimicked the partici-pant’s posture (position of their arms, legs) or not. When thetask finished the experimenter “accidentally” dropped sixpens on the floor. It was found that participants in the mir-roring condition picked up the pens more often (100%) thanparticipants in the non-mirroring condition (33%). Similarly,Gueguen et al. [34] found a positive effect of mirroringon helping behavior to the explicit verbal solicitation; ontheir experiment, 76.7% of the participants in the mirroringgroup agreed to help versus 46.7% in the non-mirroring

control condition. Even more, recent research revealed thatpeople who, even consciously, mimic the behavior of othersactivates behavioral strategies which may increase theirchances to achieve their goals [33]. Thus, a social interactionpresenting a high number of mirroring events is perceivedto run more smoothly and the chances to reach a positiveoutcome or an agreement increase significantly.

Recently, computer vision systems have been integratedto the nonverbal human communication studies to enhanceits understanding. Pentland [49] was the first one to claimthat social signals could be quantified automatically to inferbehavioral patterns in human interactions. The first attemptto prove this idea was reported by Curhan and Pentland[20], who tried to predict the behavioral outcome of employ-ment selection interviews using non-verbal audio cues. Thesame approach has also been applied for predicting salarynegotiations [14] and speed-dating conversations [44]. Someresearch on behavior analysis during social interactions hasfocused on different aspects such as the role of participantsin news broadcasts and movies [64], [66], the detection of theleadership role during meetings [51], [53], the inference ofpersonality traits [43], [56], and the simultaneous predictionof a job interview outcome and personality [47]. In thissense, the ability to automatically detect head nods could beuseful to build automatic inference methods of high-levelsocial constructs.

With all this evidence about the importance of socialsignals on human behavior, we aim to use a computervision system based on a wearable device to evaluatewhether mirroring behavior, understood as simultaneoushead nodding gestures, can be utilized as an early predictorof the perception of competence. We want to assess howthe gestures of two interacting people (customer and aservice provider) can influence a customer’s perception ofthe service provider competence.

METHOD

We integrated mixed methods [36], [60] in an experimen-tal design to assess the impact of nodding/mirroring inthe perception of professional competence. The experimentconsisted of a conversational scenario in which a customerinteracted with a confederated psychologist who answeredin the same verbal fashion to each customer. However, weinstructed the confederated psychologist to emit differentbackchannel regulatory gestures. During the interaction, ourwearable device-based computer vision system recognizednodding gestures and detected mirroring behavior auto-matically [61]. After the social interaction, we did semi-structured interviews to participants to assess, qualitatively,the satisfaction and their intentions to hire the psychologist.Finally, we used the number of either nods or mirroringevents to determine whether the customer perceived thepsychologist as competent or non-competent. The wholeprocess is summarized in Figure 1.

Participants

Our inclusion criterion to participate consisted of being acollege student at least 18 years old. The final sample had 48volunteers (50% women), ranging from 18 to 44 years (M =




0

10

20

30

40

Low Medium HighCases

Nu

mb

er o

f G

estu

res

Fig. 3. Confederated psychologist’s distribution of head nodding ges-tures for each of the three experimental cases.

21.83, SD = 4.10); of these, 29.2% were majoring in sociology,25% in politics, 18.8% in architecture, 10.4% in engineering,6.3% in journalism, 4.2% in business, 4.2% in mathematics,and 2.1% in nursing.

Experimental Procedure

In our scenario, we controlled: (1) the physical environmentin a lab setting; (2) the verbal explanations and gesturesof psychologist; and (3) the homogeneity of the sample,represented by the same number of men and women asparticipants, all being undergrad students. The confeder-ated psychologist was trained during one week before theexperiment and he rehearsed half an hour before eachsession.

We invited the participants to collaborate in an inquiryrelated to dyadic conversations. The participants were in-structed to present a particular personal pseudo-problem toa psychologist and ask for advice. We asked the participantsto explain the situation and to conduct their conversationsaround three questions with the psychologist: (1) what to doin the current situation?, (2) what the psychologist’s experi-ence with similar problems is? And, (3) how many sessionsare needed? Afterwards, the participants were conductedoutside of the psychologist’s office, and the smartglasseswere activated. The participants knocked on the office doorto start the experimental process where they asked foradvice. Once inside the office, the confederated psycholo-gist answered the questions in the same verbal style butcontrolling his nodding gestures in one of three occurrencelevels, which constitute the experimental cases: Low, thepsychologist acted restricting his nodding gestures; medium,the psychologist nodded and mirrored the customer’s nod-ding, and high, the psychologist increased nodding and pro-moted mirroring. The number of samples in each case wasbalanced. Figure 3 shows the distribution of head noddinggestures from the confederated psychologist for each of theexperimental cases. In this figure, we appreciate a cleardistinction between the number of head nodding gesturesfrom each case, performed by the psychologist during eachsession.

When the interaction ended, the participants left thepsychologist’s office, and the glasses were deactivated. Theparticipants were conducted to another room, where theywere interviewed with a semi-structured method to inquireabout their interaction. The aim of this interview was toevaluate their perception of the competence of the psycho-logist (illustrative examples are provided in Table 1).

We recorded each customer-psychologist conversationwith two static cameras and two wearable cameras. Thecustomer and the psychologist used smartglasses duringthe whole interaction. For the static cameras, we used Mi-crosoft LifeCam Studio cameras fixed on the table looking ateach interlocutor. For wearable cameras, we used Pivotheadglasses. After recording each session, we synchronized thefour videos. The final interview inquiring perception of thecustomer-psychologist interaction was digitally recorded.

Based on this scenario, we created a dataset, whichfor each experimental session consisted of: (1) Four videos(two from the customer’s and the psychologist’s wearablecameras and two from the fixed cameras), and (2) a recordedqualitative interview. The experiment took place betweenOctober 2014 and February 2015. On average, the dyadicconversation with the psychologist was three minutes long,and the qualitative interview was five minutes long. Al-though it is debatable that three minutes of interaction mayprovide enough evidence for a person to perceive his/herinterlocutor competence, our preference for that time in-terval is supported by some compelling studies that pointto the human capability to arrive at accurate perceptionsin short time intervals [3], [4], [5], [67]. In all cases, we re-warded the individual participation with 50 Mexican pesos(equivalent to about US$3.00 at the time of the experiment).

This study followed ethical standards as stipulated bythe American Psychological Association [27]. Participantssigned an informed consent letter. Confidentiality and per-son’s anonymity were maintained at all times. All videoand audio recordings were made with participant’s writtenauthorization. The protocol was approved by the EthicsCommittee from the Universidad Autonoma de Queretaro.

Analysis Procedure

Our procedure was the following: (1) analyze the qualitativedata about the psychologist’s competence with GroundedTheory [15], [31], [58], (2) quantify head nodding gesturesand mirroring events using computer vision algorithms, and(3) analyze the relation between head nodding gestures andperception of competence, using the probabilistic evaluationof membership to a class on a Bayesian framework. In whatfollows, we describe each step.

Qualitative Analysis on Professional CompetenceThe audio of the interviews was recorded digitally. Wethen used Sound Scriber [12] to transcribe verbatim andAtlas.ti [46] to analyze the qualitative data. We analyzed thequalitative data obtained from semi-structured interviews,using the principles of Grounded Theory [15], [31], [58]to explore the experience of the interaction between thecustomer and the psychologist. Grounded Theory is basedon a procedure of constant comparison, in which each pieceof data is analyzed and categorized. The method requires to




TABLE 1Topics and specific questions in the interview guide and detail of some responses from the customers. Using as guidelines the concepts in the

Topics column, we posed Questions as the ones shown. The qualitative analysis of responses from the customers, from which prototypicalexamples are shown, prompt us to assign a class to the client’s perception. In the qualitative analysis Satisfaction and Hire-ability resulted as

components of the Competence

Topics Questions ClassNon-competent Competent

Perception of interaction How was the inter-action?

It was fine, but the psychologist need to explainmore. He needs to give more details in the responsesto questions. . .

Fine, he was very friendly, he greeted mewhen I arrived. He answered correctly thequestions and solved my questions. . .

Com

pete

nce Satisfaction How did you feel? I would have felt more comfortable if he had ex-

plained betterGood, I felt listened. He was interested inwhat I was saying.

Hire-ability

Would you taketherapy with thispsychologist? Why?

Mm, well, definitely I would look for advice, but I amnot convinced by him. I believe that I can find a bettertherapeutic proposal somewhere else, with someoneelse that can convince me better [. . . ] I would lookfor a second opinion.

The interaction was very good. The psycho-logist was so kind and make me feel listened.All this made me feel confident on him. Iwould take therapy with him.

extract the similarities and differences to previous events inthe same category, while confirming its density of propertiesand dimensions. The theory to explain core concepts isdeveloped with both the categories and the relationshipsbetween them. Overall, the participants’ testimonies werecategorized into satisfaction or dissatisfaction, and the in-clination to hire or not. Two investigators did the analysisseparately to assess the reliability by intercode procedure[13], [36], obtaining high reliability (0.97).

Mirroring Detection with Wearable Technology

From the video acquired during the dyadic conversationswith the confederated psychologist, we applied an auto-matic head nodding gesture and mirroring detection basedon computer vision. The procedure started by tracking facialfeatures (shown in Figure 4) using the Supervised DescentMethod (SDM) [68], we then applied a stabilization step tocompensate for the motion induced by the wearable camera.

With the camera motion stabilized, we created a set ofdescriptors using histograms of orientations (HOO) [28] asfollows: First we tracked a facial feature (i.e., tip of thenose) during k consecutive frames, obtaining its motionp = (px, py), where p = pn − pn−1; then we calculatedits orientation, θ = arctan(py, px) (taking care of the casepx = 0), and magnitude, M =

√p2x + p2y ; then we added

the magnitude M to the bin b = ⌊θ/360⌋B, which belongs tothe B-bins histogram h. The histograms are used as inputsfor a head nodding classifier based on decision forests.Figures 6(a) and 6(b) show the non-normalized histogramfor a nodding gesture and an exemplary gesture other thannodding. Notice that a nodding gesture, which is describedas moving the head up and down has large bins at 90◦

and at 270◦. On the contrary, other head gestures havecontributions elsewhere with a different pattern.

Having detected the head nods from both participants,we measured mirroring in both directions following anapproach similar to Feese et al. [24]. We defined two events:Person A is mirroring Person B or (mAB); and Person B is mir-roring Person A or (mBA). To count an mAB event, personA needs to start displaying gesture ξ after person B startedand within a time ∆t after person B stopped displayinggesture ξ. In case that person A displays ξ multiple times

Fig. 4. Facial Features extracted with the SDM [68]. These features areused to track the face in consecutive video frames. The face trackingleads to head motion, which is used to detect nodding gestures on bothparticipants.

5

10

30

210

60

240

90

270

120

300

150

330

180

(a) Nodding Histogram

5

10

30

210

60

240

90

270

120

300

150

330

180

(b) Other gesture Histogram

Fig. 6. Head Gesture Histograms of Orientations (HOO). (a) Shows theHOO of a nodding gesture and (d) shows the HOO of an exemplarygesture other than nodding.




mAB mBA

Fig. 5. Mirroring detection. Top and middle plots depict the occurrence of nodding gestures from person A and person B respectively. Bottom plotdepicts the detected mirroring events. Note that there is a fixed interval of time ∆t when the mirroring effect may take place.

while B is displaying ξ, only one event is counted. Similarly,a mBA event is triggered when person B starts displayinggesture ξ after person A started and within ∆t after personA stopped displaying gesture ξ. Gesture repetitions weretreated the same way. More formally, given a sequence ofgestures gξ

1...Nξ of person A, the start and end times of

each gesture is given by t1(gξi

)and t2

(gξi

)respectively.

An mAB event is triggered if (following Feese et al. [24]):

gAi = gBj ,

t1(gBj

)< t1

(gAi

)< t2

(gBj

)+∆t.

(1)

Figure 5 shows two fragments from one of the videosin our dataset. The top plot depicts the nodding gesturesperformed by person A, the middle plot depicts the noddinggestures performed by person B, and the bottom plot de-picts the mirroring events. Figure 5(a) shows mAB events;the first mirroring event occurs when person A mirrorsperson B after person B stopped displaying the noddinggesture, but within a predefined window, lasting ∆t = 1.4seconds. The definition of ∆t is a problem in its own right.For instance, Bilakhia et al. [7] determined experimentallya motor mimicry value of 4s using the longest delay ob-served in the construction of their database. At their end,and while studying the reaction time of interlocutors inbehavioral mimicry, Kumano et al. [42] estimated a delayresponse of 1s. As could be appreciated, the time delayto mimic a gesture does not have a universal value and itlargely depends on the objective of the experimental study,as well as the particularities of the recorded database. Inour case, we estimated our value by analyzing all mirroringepisodes based on manually annotated data and settling inthe average delay between gesturing. We acknowledge thisis a heuristic method to decide the value of ∆t. In futurework, we aim to decide this value in a more systematicway (by analyzing a larger number of time intervals, andchoosing the one which provides the optimal results. Thisanalysis could be based on a ROC-like study). The secondmirror event occurs just after person B started the noddinggesture. Figure 5(b) shows mBA events; in this case, thetwo mirroring events occur just after person A started thenodding gesture.

Relation between Head Gestures and Perception of Compe-tenceAs a result of the previous stages, we end up with a set ofmeasurements of the number x of nodding and mirroringevents detected by the computer vision system and labelsC to identify the customer perception of the psychologist’scompetence. This set of measures may be represented byS = {(x,C)i}, for i = 1, ...,M . In the description of ourapproach, x is a scalar used to represent any of the vari-ables in the set {nf

c , nfp , n

wc , n

wp ,m

fc ,m

fp ,m

wc ,m

wp }, where

nβα and mβ

α reflect the number of automatically detectednods and mirroring events respectively, α ∈ {f, w} repre-sents whether the observation was performed with a fixed,f , or wearable camera, w, and β ∈ {c, p} highlights whetherthe observation was made on the customer, c, or on thepsychologist, p. For instance, nf

p is the number of nodsmade by the psychologist as observed from a fixed camera.In addition, C ∈ {Cn, Cc} is the label assigned to theperception acquired by the customer about the competence,Cc, or non-competence, Cn, of the psychologist.

Over the years, many supervised learning techniquesaimed to classify observations have been developed, in-cluding Naive Bayes, Logistic Regression, Decision Treesor Support Vector Machines, among others [8]. Since oursample is small and our feature is one dimensional, a rea-sonable choice is to use a Bayesian framework [52]. Here, themembership of an observation to a class is evaluated on thebasis of an estimate of the likelihood, P (x|C), obtained bytraining, and prior knowledge, P (C), about the competenceor non-competence of a particular psychologist, estimatedfrom the observations, such as

P (C|x) = kP (x|C)P (C), (2)

where the proportionality constant k is the same for bothP (Cc|x) and P (Cn|x). We estimate the likelihood, P (x|C),and the prior, P (C), directly from the data to avoid re-strictive assumptions about their form. In the case of thelikelihood, P (x|C), one possible way to do this could in-volve the use of normalized histograms. A histogram can beconstructed as the accumulation of the number of particularevents, such as

f(j|C) =M∑i=1

1(xi = j|C), ∀xi, (3)

where 1(·) is a function which outcome is one whenever itsargument is true and zero otherwise. However, factors such




(a) Frequency distribution (b) Cumulative frequency distributionfor functions in (a) and (c)

(c) Smoothed estimation of the fre-quency distribution

Fig. 7. Estimation of the smoothed frequency function. Frequency-based estimation of the probability mass function (a) may be improved bysmoothing the cumulative frequency distribution (b). The resulting smoothed frequency distribution draws information from neighbor cells to improveestimation.

as the position of the anchor point, the selection of the binswidth and the number of samples can have a substantialeffect on the estimation of the probability mass distributionbased on histograms. These effects may be seen clearly in thecase of complex, expensive or time-consuming experimentswhich could result in a sparse set of observations. It hasbeen argued that in these cases a better estimation of aparticular probability cell could be obtained by drawinginformation from nearby cells [17]. Aitchison and Aitekenwere the first to introduce a kernel function to smoothdiscrete probability distributions [1]. Nowadays, researchhas resulted in several kernel smoothers to select from [41],whose suitability depends on the particular problem at handand the features in the model that require to be enhanced.At our end, and despite the fact that some other optionsmay be available including those data-driven which makeno prior assumption about the form of the distribution [6],we compute an estimate of the probability mass function,P (xi|C), from a normalized smoothed histogram, fs(i|C).Note that due to the smoothing process, fs(i|C) may notbe an integer number. In turn, the smoothed histogram iscomputed by applying a Gaussian kernel on the cumula-tive histogram. In this case, the bandwidth, h, is a tuningparameter to select the spread of the kernel smoother. Itsvalue could be obtained by Bayesian optimization [69] usingcross-validation [41] or plugging in from the sample [17](the process is illustrated in Figure 7). At its end, the priorestimation, P (C), can be obtained empirically as the ratiobetween the number of observations assigned to each class,{Cn, Cc}, and the number of interviews.

A possible criterion to take a decision L(x) about thecustomer’s perception of competence could be based on thenumber of observed events x and a certain threshold α, suchas

L(x) ={

Cc if x ≥ α,Cn otherwise. (4)

The performance of the classifier could be evaluatedusing Receiving Operating Characteristics (ROC) analysis[55]. For a certain decision threshold α and a set of mea-surements S, the number of correct decisions (true positives(tp) and true negatives (tn)) can be computed based on the

Fig. 8. Illustration for the outcomes of a decision L(x). Given theprobability distributions corresponding to the classes competent andnon-competent, a decision to classify an event x is made based on athreshold α. For the observations made in an experiment, this will resultin a certain number of true positives, false positives, true negatives andfalse negatives. The optimal threshold will correspond to a combinationbetween the outcome and its associated cost. This illustration corre-sponds to nw

c , the number of nods made by the customer as observedwith a wearable camera.




expressions(see Figure 8)

tp =∑i≥α

fs(i|Ci = Cc) and tn =∑i<α

fs(i|Ci = Cn),

(5)whereas the number of wrong decisions (false positives (fp)and false negatives (fn)) can be computed as

fp =∑i≥α

f(i|Ci = Cn) and fn =∑i<α

f(i|Ci = Cc).

(6)

For ROC analysis, the expressions summarizing the over-all performance of the classifier include the sensitivity ortrue positive rate (tpr), and fall-out or false positive rate (fpr).They are defined as [59]

fpr =fp

fp+ tnand tpr =

tp

tp+ fn. (7)

Then, by changing the value of α over the possible numberof nods or mirrorings, we will obtain the scores for sen-sitivity and fall-out to construct a ROC curve. A widelyused performance criterion corresponds to the area underthe curve (AUC), basically the probability that a sample willbe classified correctly [59], and equivalent to the Wilcoxontest of ranks [23], which in turn is preferred to the pairedStudent’s t-test when the sample cannot be assumed to benormally distributed.

The uncertainty of the ROC analysis is evaluated usingcross-validation with the leave-one-out strategy [39]. Thatis, given our sample of n elements, we use n − 1 elementsto generate the estimated mass distributions, P (Cc|x) andP (Cn|x), and leave one observation out. We repeat thisprocedure for each element in the set of observations pro-ducing in every loop a ROC curve. The envelope and meanline constitute the uncertainty and expected ROC curve,respectively. In principle, this approach aims to minimizethe estimation bias and, given our sample size, its computa-tional cost is negligible. The ROC analysis may be a usefulguide to gain intuition on the trade off between sensitivity(true positive rate) and specificity (1 - false positive rate).The selected threshold α is the result of understanding thistrade off and the consequential costs associated to soundand wrong decisions.

Out of the eight possible configurations, where the num-ber of nodding or mirroring events is measured, the fixed orwearable cameras is used or the customer or psychologistis observed, we analyze whether they are equally effectiveto assess the perception of competence. Using the dataobtained from the interviews, we performed a balanced one-way Analysis of Variance (ANOVA) to test the hypothesisof equal means. Given that mirroring detection is based onnodding recognition, these two variables cannot be assumedto be confounding [26]. We did not perform a test of equalvariances, as some research has pointed out [11] that theANOVA is insensitive to departures from this assumptionwhen the sample sizes are equal. When the null hypothesisis rejected, there is the need to pursue post hoc analysis viaa multiple comparison procedure (MCP). The problem of se-lecting the most appropriate MCP is a difficult one becauseit depends on both objective and subjective properties ofthe observations being analyzed. For instance, Saville [54]

argues that the most conservative MCP , the ones aiming toreduce type I error, are the ones more inconsistent. Indeed, ithas been shown that when the variance is similar, the Tukey-Kramer’s test offers optimal results [57]. Therefore, we useTukey-Kramer’s test, which essentially is a series of t-testswith correction for dataset-wise error-rate.

Other classifiers could be used as well. For instance, alogistic regression classifier could be constructed using thenumber of nodding/mirroring events as the independentvariable and the competence/non-competence label as thedependent variable. For our problem, the dependent vari-able is discrete and binary. In logistic regression we use aBernoulli probability distribution defined as

P (Ci|x) =M∏i=1

λCi(1− λ)1−Ci , (8)

where Ci ∈ {0, 1} corresponding to the classes non-competent and competent. Correspondingly, the logisticsigmoid function λ is defined as

λ =1

1 + exp(ϕ0 + ϕ1x), (9)

and ϕ0 + ϕ1x is a linear function of the number of nod-ding/mirroring events. The parameters ϕ0 and ϕ1 can befound using Maximum Likelihood on the logarithm of (8).It has been shown that although the solution is not closed,it can be efficiently solved using a numerical method suchas the Newton’s one because (8) is a concave function ofϕ0 and ϕ1 [50]. Again using leave-one-out cross validationto construct the classifiers, we obtained the results shownin Table 2. In Table 2, the observed accuracy (ao) refers tothe sum of the number of true positives and true negativesdivided by the total number of observations. Similarly,kappa is a comparison between observed and expectedaccuracy. Expected accuracy (ae) is the accuracy of a randomclassifier. We compute ae from the confusion matrix. First,we sum the products of the relative frequency for thedifferent classes (for the confusion matrix in Figure 8, aowould be (0.76 + 0.60)/2 = 0.68, whereas ae would be0.76× 0.24+ 0.4× 0.6 = 0.4224). Kappa (κ) is then definedas

κ =ao − ae1− ae

. (10)

RESULTS

In this section, we detail the results of applying our methodto the scenario described above. Firstly, we describe themethod used to recognize head gestures and detect mir-rorings. Then, we explain how the label for competent andnon-competent was generated from the interviews. Next, wepresent the results of applying the proposed classificationscheme with a comparison with logistic regression.

Head Gestures and Mirroring DetectionFigure 9 shows the precision and recall curves of themirroring detection algorithm for the static and wearablecamera videos. We obtained these curves by changing thesensibility of the observation used for training the head




0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.00Recall

Pre

cisi

on

Cam Static Wearable

Fig. 9. Mirroring detection performance. The solid circle in the curvesshows the position for the highest F1-score in the validation set.

gestures recognition. When sensibility varies, this has twoeffects: on one hand, if the system is very sensible it willachieve high recall but low precision; on the other hand, ifthe system is less sensible it will achieve higher precisionbut low recall. To find the optimal sensibility, we use the F1

score defined as F1 = 2×(precision×recall)/(precision+recall)[55]. The highlighted points on the curves denotes the valueswith the highest F1 score. We can see that the performanceof the gestures recognition in the static-camera videos (redcurve) is higher than the performance in the wearable-camera videos. This is explained in terms of the ego-motionresiduals despite video stabilization. We have identified twomain sources of miss-detection or false negatives: (1) fastchanges in head motion cause head tracking losses and (2)the stabilization algorithm smooth out subtle gestures.

Customer Perception Analysis

The qualitative analysis of the interaction resulted in theclassification of the customer perception about the psycho-logist as either competent or non-competent. Here we detailthe properties in both categories.

Psychologist Perceived as Competent

When customers were satisfied with the interaction, theyperceived the psychologist as kind, friendly, well mannered,polite and attentive. The interaction produced on customersa feeling of being understood in a friendly environment,which was associated with empathy. Besides, they estimatedthat the psychologist was highly skilled because he was gi-ving clear explanations, had self-confidence and was able tolisten and respond appropriately. All these factors generatedin customers confidence that the psychologist could helpthem to solve the problem, and increased their intention tohire him (see Figure 10):

“The interaction was very nice. He was very clear, he listenedto me and I trust him.” (Woman, 18 years).

Psychologist perceived as non-competentParticipants receiving less nodding/mirroring tended to bedissatisfied with the interaction. Although, we attemptedto expose to all customers to the same experimental con-ditions (e.g., personal problem script, questions asked bycustomers, explanations of the psychologist, person actingas psychologist, physical layout), they gave arguments tojustify their dissatisfaction precisely on issues that had beencontrolled. For example, people in high nodding/mirroringcondition tended to be satisfied with the explanations of thepsychologist, and those who received less nodding tendedto argue that explanations were not enough. In general,customers could not relate their dissatisfaction to the restric-tion of noddings/mirroring. They tended to rationalize theirdissatisfaction in two ways: (1) with specific reasons relatedto the lack of competence of the psychologist or (2) withcontextual reasons. We denominated specific reasons to theformer and contextual reasons to the latter(see Figure 11).

Specific reasons to explain discomfort were mostly de-rived from the perception of the psychologist’s lack ofexperience. Dissatisfied customers requested the psycho-logist to: give more explanations, be more attentive, bemore empathetic and inquire more deeply on the contextof the problem. We found that all these perceptions canlead the customers to believe that the psychologist had noreal interest to help, and hence no vocation. Even more, hisperformance in the interaction could produce a suspicionthat the psychologist was just looking to profit:

“I felt that he was listening just out of obligation, it seems thathe was there just for the money.” (Man, 19 years).

Some other specific reasons to explain discomfort wereassociated with the deficient service offered by the psycho-logist. Customers expected to be invited for beverages, to beasked their names, to be offered a seat, to be introduced, andfor the psychologist to give his professional background:

“I have my doubts because I need to know better this psycho-logist, and even I have to know some other psychologists, besides;I would like to know what he has studied.” (Man, 21 years).

Contextual reasons for dissatisfaction included the physi-cal layout, the psychologist’s clothing and the experimentalmedium. Generally, contextual reasons reinforced the specificones and increased the reluctance to hire the psychologist.

“I wanted to be convinced at the moment I opened the officedoor, I needed something to leave me with the feeling ‘I have to behere’. I do not know if it was the office or the person [psychologist],but something in there told me ‘You should find someone else’. ”(Man, 19 years).

Mirroring and its Relationship with Competence Assess-ment

In our experiments, there were two issues we investigated.First, we wanted to find out whether the number of nodsor mirroring events could be the basis to distinguish if thecustomer perceived the psychologist as competent or non-competent. If so, then we wanted to find out which onewas a better indicator, the number of nods or the numberof mirroring events; whether it is better to use static orwearable cameras, and whether the gestures displayed bythe customer are better to distinguish the classes than thepsychologist’s or vice versa. We performed 48 interviews




Fig. 10. Attributions to service provider when evaluated as competent . The competent class was related to clients satisfied with the interaction anda positive perception of the psychologist’s professional skills, which results in an inclination to hire. Boxes represent categories and arrows standfor relationships.

Fig. 11. Attributions to service provider when evaluated as non-competent . The non-competent class was related with dissatisfied clients aboutspecific reasons, such as deficient professional skills and service, which results in an inclination for no-hiring. Contextual reasons contribute toreinforce specific reasons. Boxes represent categories and arrows stand for relationships.

where a confederated psychologist interacted with peopleacting as customers. Our computer vision system capturedimages and obtained the number of nods and the numberof times the interlocutors mirrored a nodding for eachinterview. Afterward, a professional psychologist (one ofthe authors of this paper) talked with the acting customersand analyzed this conversation qualitatively. As a result,we have a set of pairs S = {(x,C)i}, for i = 1, . . . , 48,where x reflects either the number of nods or the numberof mirroring events and C is the label assigned to theconversation. Using the procedure described previously, we

constructed a probability mass distribution, for each of theclasses, based on the product between the likelihood andthe prior. For the likelihood, we used a Gaussian smootherkernel function. Using the plug-in method described by [17],we dynamically computed the bandwidth at each iterationof the leave-one-out. Typical values varied between 0.8and 0.9 for all the configurations. For the prior, we usedthe ratio between the number of interviews labeled withthe perception of competence (30) or non-competence (18)versus the total number of interviews (48). This resulted ina prior value of competence, P (Cc), of 0.625, and a prior




value of non-competence, P (Cn), of 0.375. We used the areaunder the curve as the indicator for the performance of theclassifier. As we applied the leave-one-out training strategyfor the samples selected, we obtained a different perfor-mance curve for each configuration. Figure 14 illustrates thecurves P (Cc|x) and P (Cn|x) obtained through this process.It includes the mean ROC curve and the ROC curves for themaximum and minimum AUC .

Now we turn our attention to whether a particularconfiguration of type of camera (fixed or wearable), per-son observed (psychologist or customer) or event detected(mirroring or nodding) makes to vary the classifier per-formance. The variable of interest is thus the Area Underthe Curve as estimated for each configuration at each loopof the leave-one-out training process. To that end, weperformed a balanced design, single factor ANOVA analysisto test the null hypothesis that the observed means wererealizations of the same underlying process. With an F -value of 1.08×103 and a p-value of 0.00, we rejected thishypothesis. To determine which configurations are differentfrom the others, we conduct a Tukey-Kramer post-hoc testor pair comparison. The results for this test are summarizedin Table 3. Some intuition may be gained analyzing theseresults and the boxplot in Figure 13. We can verify thatthe configurations in the group {nw

p , nfp ,m

wc ,m

fc } have not

significant differences, while all the other configurationshave significant differences to each other.

It may be interesting to compare the Naive Bayesianand the Logistic Regression classifiers described previouslyusing the results obtained in Table 2. In a first approach,the Person’s correlation coefficient between the Accuracyand the AUC columns, with r = 0.955 and p − value of2×10−4 lead us to reject the null hypothesis stating that thevalues are uncorrelated. However, as Bland et al. [10] pointout, correlation is different to agreement. High correlationmay exist along any straight line while perfect agreementexists along the line of equality. To study agreement betweenthe measurement instruments, we adjust a straight lineto the points (Accuracy,AUC) with the distance betweenthem and the model as our optimization criterion. Then,following Bland et al. suggestion, we compute the distancewith respect to the resulting straight line model and estimatethe standard deviation of these differences. Figure 12 showthe result of these differences. The standard deviation is0.0146.

DISCUSSION

Our experimental results emphasize the importance of nod-ding/mirroring during social interaction. The attributes re-lated to the detected competence of a service provider in adyadic interaction seem to be highly related to them. Whenthe customers received less nodding/mirroring, they hada tendency to feel dissatisfied and they frequently couldnot identify those nodding/mirroring as the reason for thatfeeling. As a consequence, they started a reasoning processto justify their discontent. For instance, they ascribed theirdissatisfaction to the psychologist’s lack of skills or pro-fessional experience. Even when every participant receivedan explanation along the same lines and equal advice, the

Fig. 12. Agreement between the results of the Bayesian and Logisticregression classifier (using the values in Table 2). The standard de-viation between their differences is 0.0146. The central dashed linecorresponds to the straight line model obtained when we minimize thedistance to the points (Accuracy, AUC. The blue points correspond tothe distance to that line. The dotted lines are located at one standarddeviation, supporting the claim that the classifiers are in fact correlated.

TABLE 2Classifier performance summary. Observed accuracy (a0) and

expected accuracy (κ) of the logistic regression classifier comparedwith the mean AUC (AUC) and one standard deviation of the Bayes

classifier.

Feature Logistic regression Naive BayesAccuracy (a0) Kappa (κ) AUC AUCσ

nwc 0.60 -0.08 0.64 0.02

nfc 0.60 -0.08 0.65 0.01

nwp 0.75 0.42 0.76 0.01

nfp 0.75 0.42 0.77 0.01

mwc 0.81 0.58 0.77 0.01

mfc 0.81 0.58 0.78 0.01

mwp 0.67 0.10 0.66 0.01

mfp 0.73 0.43 0.72 0.01

Fig. 13. Illustration of the AUC intervals. For each configuration we showthe mean value (circle) and one standard deviation (segment extremes).

dissatisfied customers had a tendency to require the ser-vice to be broadened (e.g., asking for clearer explanations,improvement of the experimental physical settings). Theyhad doubts on the professional advice and ultimately onthe psychologist’s competence. In other words, they dis-




Nod

ding

wearable fixed

cust

omer

(a) UAC = 0.64 ± 0.02 (b) UAC = 0.65 ± 0.01

psyc

holo

gist

(c) UAC = 0.76 ± 0.01 (d) UAC = 0.77 ± 0.01

Mir

rori

ng

cust

omer

(e) UAC = 0.77 ± 0.01 (f) UAC = 0.78 ± 0.01

psyc

holo

gist

(g) UAC = 0.66 ± 0.01 (h) UAC = 0.72 ± 0.01

Fig. 14. ROC Curves for the configurations in the experiment. Larger AUC represent a better discrimination between competent and non-competentclasses. The dashed ROC curves correspond to the maximum and minimum AUC , are the result of leave-one-out cross-validation and reflect AUCuncertainty. The subfigure caption states the mean and standard deviation for the AUC . The inserted subfigure represents the probability massdistribution when all the observations are considered. The lines between the dots are drawn to improve readability. Although a thorough fullanalysis is provided in the text, the ROC curves in this illustration indicate that customer mirroring is a better predictor than psychologist mirroring;contrariwise, the number of psychologist’s nods is a better predictor than the number of customer’s nods; and that with the notable exceptionof the psychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images from wearablesmartglasses or fixed cameras.




TABLE 3Tukey-Kramer’s MCP for the observations corresponding to nodding

and mirroring. The cells show the significance level of the interaction, p.p < 0.05 are highlighted in bold. The variables in the set

{nfc , n

fp , n

wc , nw

p ,mfc ,m

fp ,m

wc ,mw

p }, where nβα and mβ

α reflect thenumber of automatically detected nods and mirroring events

respectively, α ∈ {f, w} represents whether the observation wasperformed with a fixed, f , or wearable camera, w, and β ∈ {c, p}

highlights whether the observation was made on the customer, c, or onthe psychologist, p. For instance, nf

p is the number of nods made bythe psychologist as observed from a fixed camera.

nfc nw

p nfp mw

c mfc mw

p mfp

nwc 0.002 0.000 0.000 0.000 0.000 0.000 0.000

nfc 0.000 0.000 0.000 0.000 0.000 0.000

nwp 0.060 0.159 0.000 0.000 0.000

nfp 1.000 0.359 0.000 0.000

mwc 0.168 0.000 0.000

mfc 0.000 0.000

mwp 0.000

trusted the psychologist. Hence, it appears that gesturessuch as nodding, or actions such as mirroring, can affectto a large extent the interpretation of what is being said.Nodding/mirroring can help to interact more smoothly andwith a comfortable feeling, which impacts the perception ofcompetence.

Even more, when people receive less nod-ding/mirroring, the event turns out into different intentionsof actions and negative attributions. We found thatcustomers who received less nodding/mirroring couldassign negative attributions to the psychologist such ashaving no real interest to help and no vocation. It appearsthat the customers experiencing discomfort, due to lackof nodding/mirroring, could arouse a suspicion that thepsychologist was just looking to profit.

This study stresses out that nodding/mirroring gesturescan impact the majority of the dimensions proposed byEpstein and Hundert [21]. People exposed to less noddingconditions was usually more dissatisfied with the psycholo-gist’s knowledge (cognitive dimension), his technical per-formance and proposed advice (technical and integrativedimensions), and his communication skills (relationship di-mension). In the affective domain, the customers referred alack of caring and emotional bonding from the psychologist,which is related to the dimension of habits of mind —thewillingness, patience, and emotional awareness to use theprofessional skills judiciously and humanely.

With respect to the quantitative analysis of the results,although it remains to be seen whether a different classifiercan perform better, this research shows that either a simpleBayesian scheme or a logistic regression could be usedfor this task. Using the data collected, we estimated theunderlying probabilistic mass distribution using smoothing.In fact, it has been argued that nonparametric smoothingcan be beneficial for cases similar to ours, where the datais sparse. For instance, Bishop et al. [9] showed that theestimators obtained through smoothing are often better thanproportions under squared error assessment. The smoothingprocess via sampling of the cumulative distribution andthe estimation of the kernel bandwidth dynamically arewell-established procedures. Our selection of the AUC asthe performance criterion aims to stress the importance

of improving the detection rate and reducing the missingrate. The leave-one-out highlighted the sensitivity of theestimation of performance. Nonetheless, in all cases, for allconfigurations, the AUC resulted well above the value of0.5, i.e., the resulting classifier gives results above randomdecisions. This may be important for schemes such as boost-ing, where the aim is to construct complex ensembles ofclassifiers.

The ANOVA analysis provided further insight and givesstatistical confidence to assess that not all the configu-rations, i.e., nods/mirrorings, customer/psychologist andwearable/fixed, resulted in the same level of performance.In fact, it highlights that the number of mirroring eventsmade by the customer is a better classification predictor thanhis/her number of nods. Even more, customer mirroring is abetter predictor than psychologist’s mirroring. On the otherhand, by a considerable margin the psychologist’s numberof nods is a better predictor than the customer’s numberof nods. In addition, there seems to be no difference onwhether the camera used in the computer vision systemwas wearable or fixed as the analysis of the behavior ofthe customer or the psychologist provided the same levelof performance to assess the perceived competence. Anexception is made for the analysis of the mirroring of thepsychologist where the fixed camera seems to provide betterperformance.

To reduce in as much as possible the factors involved inthe experiment, only one confederated psychologist inter-acted with the participants. With this, our intention was tominimize the effect of personal differences, such as physicalappearance or tone of voice. However, it may be importantto validate the findings of this research with experimentsincluding a mixture of men and women as confederatepsychologists. In any case, although our research was de-veloped in a highly controlled environment, there are someconfounding variables, such as micro-gestures, that are dif-ficult to isolate in human social interactions.

Another interesting direction of research would be torate competence on a Likert scale. Then a statistical testscould provide an indicator which in turn can be associatedto the number of noddings/mirrorings to construct a classi-fier. This approach will require a study in its own right. Oneof our main motivations to pursue a qualitative analysiswith open ended interviews is to circumvent a problemknown as social desirability bias [25], or the desire of respon-dents to avoid embarrassment and project a favorable imageto others. Contrasted with directly asking the participants,which would have resulted in the use of a Likert scaletype of approach, qualitative interviews take longer to carryon, and their evaluation is based on the free testimony ofparticipants, where they have the chance to explain theirexperiences, express their feelings and elaborate arguments.At his/her end, the interviewer has the opportunity toexplore the topics relevant to the investigation. Our use ofthe intercode procedure [13], [36], with two investigatorsseparately performed the analysis, was intended to mitigatesubjectivity.




CONCLUSION

In this paper, we have introduced an automatic social as-sistant (based on smartglasses) for the automatic inferenceof competence during face-to-face social meetings. In ourstudy we have shown that it is indeed possible to infer,with a level of confidence significantly well above randomchance, the resulting perception of competence acquiredby an acting customer after an interaction with a serviceprovider. Furthermore, our results have also shown that: (1)customer mirroring is a better predictor than psychologistmirroring; contrariwise, (2) the number of psychologist’snods is a better predictor than the number of customer’snods; (3) with the notable exception of the psychologistmirroring, the computer vision algorithm we used workedabout equally well whether it was acquiring images fromwearable smartglasses or fixed cameras.

Our scenario resembled a face-to-face interaction be-tween a customer and a psychologist. To validate our hy-pothesis, we used a representative number of participants.Therefore, for the people taking part in the experiment, ourresults show a significant degree of correlation between theobservations and the perceived perception of competence.Although the system was tested on a user-defined scenario,the proposed social assistant could serve as a training toolfor psychologists (for automatic annotation/summarizationof videos). In addition, it could be used as a supportive toolby people affected by a visual impairment to improve theirsocial integration, e.g., the assistant could inform the userwhen his counterpart performs a head nodding, and thus of-fering him the opportunity to respond to the gesture. Futureresearch will be devoted to extend the set of social signalsthat our assistant is able to recognize and to reproduce ofour results in a more natural setting. Current advances intechniques known as computer vision in the wild, may signalthe dawn of this possibility.

ACKNOWLEDGMENTS

This work was partially supported by CONACYT underGrant INFR-2016-01: 268542, by IPN-SIP under Grant No.20160132 and by UCMexus, for Joaquın Salas, and byMINECO Grants TIN2013-41751 and TIN2013-49982-EXP,Spain, for Bogdan Raducanu. Juan Ramon Terven was par-tially supported by Tecnologico Nacional de Mexico andCONACYT.

REFERENCES

[1] Aitchison, J., Aitken, C.: Multivariate Binary Discrimination by theKernel Method. Biometrika 63(3), 413–420 (1976)

[2] Allwood, J., Cerrato, L.: A Study of Gestural Feedback Expres-sions. In: Nordic Symposium on Multimodal Communication, pp.7–22 (2003)

[3] Ambady, N., Rosenthal, R.: Thin Slices of Expressive Behavioras Predictors of Interpersonal Consequences: A Meta-Analysis.Psychological Bulletin 111(2), 256 (1992)

[4] Ambady, N., Rosenthal, R.: Half a Minute: Predicting TeacherEvaluations from Thin Slices of Nonverbal Behavior and PhysicalAttractiveness. Journal of Personality and Social Psychology 64(3),431 (1993)

[5] Bar, M., Neta, M., Linz, H.: Very First Impressions. Emotion 6(2),269–278 (2006)

[6] Bernacchia, A., Pigolotti, S.: Self-consistent Method for DensityEstimation. Journal of the Royal Statistical Society: Series B(Statistical Methodology) 73(3), 407–422 (2011)

[7] Bilakhia, S., Petridis, S., Nijholt, A., Pantic, M.: The MAHNOBMimicry Database: A Database of Naturalistic Human Interac-tions. Pattern Recognition Letters 66, 52–61 (2015)

[8] Bishop, C.: Pattern Recognition and Machine Learning (2007)[9] Bishop, Y., Fienberg, S., Holland, P.: Discrete Multivariate Analy-

sis: Theory and Practice. Springer Science & Business Media (2007)[10] Bland, J.M., Altman, D.: Statistical Methods for Assessing Agree-

ment between Two Methods of Clinical Measurement. The Lancet327(8476), 307–310 (1986)

[11] Bradley, A.: The Use of the Area Under the ROC Curve in theEvaluation of Machine Learning Algorithms. Pattern Recognition30(7), 1145–1159 (1997)

[12] Breck, E.: Soundscriber (1998)[13] Burla, L., Knierim, B., Barth, J., Liewald, K., Duetz, M., Abel,

T.: From Text to Codings: Intercoder Reliability Assessment inQualitative Content Analysis. Nursing Research 57(2), 113–117(2008)

[14] Caneel, R.: Social Signaling in Decision Making. In: Master Thesis.MIT Press (2005)

[15] Charmaz, K.: Constructing Ground Theory. A Practical Guidethrough Qualitative Analysis. Sage (2008)

[16] Chartrand, T., Bargh, J.: The Chameleon Effect: The Perception-Behavior Link and Social Interaction. Journal of Personality andSocial Psychology 76(6), 893–910 (1999)

[17] Chu, C.Y., Henderson, D.J., Parmeter, C.F.: Plug-in BandwidthSelection for Kernel Density Estimation with Discrete Data. Econo-metrics 3(2), 199–214 (2015)

[18] Condon, W., Ogston, W.: Speech and Body Motion Synchrony ofthe Speaker-Hearer. In: The Perception of Language, pp. 150–184.Charles E. Merrill (1971)

[19] Cuddy, A., Fiske, S., Glick, P.: Warmth and Competence as Uni-versal Dimensions of Social Perception: The Stereotype ContentModel and the BIAS Map. Advances in Experimental SocialPsychology 40, 61–149 (2008)

[20] Curhan, J., Pentland, A.: Thin Slices of Negotiation: PredictingOutcomes from Conversational Dynamics within the First 5 Mi-nutes. Journal of Applied Psychology 92(3), 802–811 (2007)

[21] Epstein, R., Hundert, E.: Defining and Assessing ProfessionalCompetence. Journal of the American Medical Association 287(2),226–235 (2002)

[22] Farley, S.: Nonverbal Reactions to an Attractive Stranger: The Roleof Mimicry in Communicating Preferred Social Distance. Journalof Nonverbal Behavior. Journal of Nonverbal Behavior 38(2), 195–208 (2014)

[23] Fawcett, T.: An Introduction to ROC Analysis. Pattern RecognitionLetters 27(8), 861–874 (2006)

[24] Feese, S., Arnrich, B., Troster, G., Meyer, B., Jonas, K.: QuantifyingBehavioral Mimicry by Automatic Detection of Nonverbal Cuesfrom Body Motion. In: IEEE International Conference on SocialComputing, pp. 520–525 (2012)

[25] Fisher, R.: Social Desirability Bias and the Validity of IndirectQuestioning. Journal of Consumer Research 20(2), 303–315 (1993)

[26] Fisher, R.A.: The Design of Experiments (1937)[27] Flavio, L., Carneiro, M., Madge, J., Young, M., Meadows, D., John-

son, G., Fienup, J., Quance, D., Schaller, N.: Publication Manual ofthe American Psychological Association. In: Reports, Results andRecommendations from Technical Events Series, C30-56 (2010)

[28] Freeman, W., Roth, M.: Orientation Histograms for Hand GestureRecognition. In: International Workshop on Automatic Face andGesture Recognition, vol. 12, pp. 296–301 (1995)

[29] Gifford, R., Ng, C., Wilkinson, M.: Nonverbal Cues in the Em-ployment Interview: Links between Applicant Qualities and In-terviewer Judgments. Applied Psychology 70(4), 729–736 (1985)

[30] Gladwell, M.: The Tipping Point: How Little Things can make aBig Difference. Little, Brown (2006)

[31] Glaser, B., Strauss, A., Strutzel, E.: The Discovery of GroundedTheory; Strategies for Qualitative Research. Nursing Research17(4), 364 (1968)

[32] Gueguen, N.: Psychologie du Consommateur: Pour mieux Com-prendre Comment on Vous Influence. Dunod (2011)

[33] Gueguen, N., Jacob, C., Martin, A.: Mimicry in Social Interaction:Its Effect on Human Judgment and Behavior. European Journal ofSocial Sciences 8(2), 253–259 (2009)

[34] Gueguen, N., Martin, A., Meineri, S.: Mimicry and Helping Be-havior: An Evaluation of Mimicry on Explicit Helping Request.Journal of Social Psychology 151(1), 1–4 (2011)




[35] Hadar, U., Steiner, T., Rose, C.: Head Movement during ListeningTurns in Conversation. Nonverbal Behavior 9(4), 214–228 (1985)

[36] Hernandez, R., Fernandez, C., Baptista, P.: Metodologıa de laInvestigacion. Mc Graw Hill (2010)

[37] Iacoboni, M.: Mirroring People: The Science of Empathy and HowWe Connect with Others. Picador, New York (2009)

[38] Jacob, C., Gueguen, N., Martin, A., Boulbry, G.: Retail Sales-people’s Mimicry of Customers: Effects on Consumer Behavior.Journal of Retailing and Consumer Services 18(5), 381–388 (2011)

[39] James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction toStatistical Learning, vol. 112. Springer (2013)

[40] Knapp, M., Hall, J.: Nonverbal Communication in Human Inter-action. Cengage Learning (2009)

[41] Kokonendji, C., Kiesse, T.: Discrete Associated Kernels Methodand Extensions. Statistical Methodology 8(6), 497–516 (2011)

[42] Kumano, S., Otsuka, K., Matsuda, M., Yamato, J.: Analyzing Per-ceived Empathy based on Reaction Time in Behavioral Mimicry.Transactions on Information and Systems 97(8), 2008–2020 (2014)

[43] Lepri, B., Subramanian, R., Kalimeri, K., Staiano, J., Pianesi, F.,Sebe, N.: Connecting Meeting Behaviour with Extraversion: ASystematic Study. IEEE Transactions on Affective Computing 3(4),443–455 (2012)

[44] Madan, A., Caneel, R., Pentland, A.: Voices of Attraction. In:Proceedings. of Augmented Cognition (2005). See TR584, http://hd.media.mit.edu

[45] McGovern, T., Jones, B., Morris, S.: Comparison of Professionalversus Student Ratings of Job Interviewee Behavior 26(2), 176–179(1979)

[46] Muhr, T.: Atlas.ti: The Knowledge Workbench. Scientific SoftwareDevelopment (2004)

[47] Nguyen, L., Marcos-Ramiro, A., Marron, M., Gatica-Perez, D.:Multimodal Analysis of Body Communication Cues in Employ-ment Interviews. In: International Conference on MultimodalInterfaces, pp. 437–444 (2013)

[48] Paxton, A., Dale, R.: Frame-Differencing Methods for MeasuringBodily Synchrony in Conversation. Behavior Research Methods45(2), 329–343 (2013)

[49] Pentland, A.: Social Signal Processing. IEEE Signal ProcessingMagazine 24(4), 108–111 (2007)

[50] Prince, S.: Computer Vision: Models, Learning and Inference.Cambridge University Press (2012)

[51] Raducanu, B., Gatica-Perez, D.: Inferring Competitive Role Pat-terns in Reality TV Show through Nonverbal Analysis. Multime-dia Tools and Applications 56(1), 207–226 (2012)

[52] Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach.Prentice-Hall 25, 27 (1995)

[53] Sanchez-Cortes, D., Aran, O., Mast, M.S., Gatica-Perez, D.: Iden-tifying Emergent Leadership in Small Groups using NonverbalCommunicative Cues. In: International Conference on MultimodalInterfaces (2010). Article 39

[54] Saville, D.: Multiple Comparison Procedures: Cutting the GordianKnot. Agronomy Journal 107(2), 730–735 (2015)

[55] Sokolova, M., Lapalme, G.: A Systematic Analysis of PerformanceMeasures for Classification Tasks. Information Processing & Man-agement 45(4), 427–437 (2009)

[56] Staiano, J., Lepri, B., Aharony, N., Pianesi, F., Sebe, N., Pentland,A.: Friends don’t Lie - Inferring Personality Traits from SocialNetwork Structure. In: Ubicomp, pp. 321–330 (2012)

[57] Stoline, M.: The Status of Multiple Comparisons: SimultaneousEstimation of All Pairwise Comparisons in One-Way ANOVAdesigns. The American Statistician 35(3), 134–141 (1981)

[58] Straus, A., Corbin, J.: Basics of Qualitative Research: GroundedTheory Procedures and Techniques. Newbury Park, CA: Sage(1990)

[59] Swets, J., Dawes, R., Monahan, J.: Better Decisions through Sci-ence. Scientific American 283, 82–87 (2000)

[60] Tashakkori, A., Creswell, J.W.: The New Era of Mixed Methods.Journal of Mixed Methods Research 1(1), 3–7 (2007)

[61] Terven, J., Raducanu, B., Meza-de Luna, M.E., Salas, J.: Head-gestures Mirroring Detection in Dyadic Social Interactions withComputer Vision-based Wearable Devices. Neurocomputing 175,866–876 (2016)

[62] Van Baaren, R., Holland, R., Kawakami, Van Knippenberg, P.:Mimicry and Prosocial Behavior. Psychological Science 15(1), 71–74 (2004)

[63] Van Baaren, R., Holland, R., Kawakami, K., Van Knippenberg, A.:Mimicry and prosocial behavior. Psychological science 15(1), 71–74 (2004)

[64] Vinciarelli, A., Pantic, M., Bourlard, H.: Social Signal Processing:Survey of an Emerging Domain. Image and Vision computing27(12), 1743–1759 (2009)

[65] Wagner, P., Malisz, Z., Kopp, S.: Gesture and Speech in Interaction:An Overview 57, 209–232 (2014)

[66] Weng, C., Chu, W., Wu, J.: Movie Analysis based on Roles’ SocialNetwork. In: International Conference on Multimedia and Expo,pp. 1403–1406 (2007)

[67] Willis, J., Todorov, A.: First Impressions Making up Your Mindafter a 100-ms Exposure to a Face. Psychological Science 17(7),592–598 (2006)

[68] Xiong, X., Torre, F.: Supervised Descent Method and its Applica-tions to Face Alignment. In: IEEE Conference on Computer Visionand Pattern Recognition, pp. 532–539 (2013)

[69] Zougab, N., Adjabi, S., Kokonendji, C.: A Bayesian Approach toBandwidth Selection in Univariate Associate Kernel Estimation.Journal of Statistical Theory and Practice 7(1), 8–23 (2013)

Marıa-Elena Meza-de-Luna is researcher at theUniversidad Autonoma de Queretaro, Mexico.She is interested in the social and cultural is-sues to prevent the expression of violence. Cur-rently, she is director of !Atrevete Ya!/iHollaback!-Queretaro, an organization to prevent street ha-rassment (www.atrevete-ya.org) and presidentof IIPSIS, an NGO devoted to research and in-tervention in psychosocial matters. Contact herat [email protected].

Juan R. Terven obtained a doctor degree fromInstituto Politecnico Nacional, Mexico. He worksas a half-time lecturer at Mazatlan Institute ofTechnology (ITM). He has been a graduate vis-iting student at MIT and a research intern in Mi-crosoft. Terven’s research interests include em-bedded systems, computer vision, and assistivetechnologies design. Contact him at [email protected].

Bogdan Raducanu is a senior researcher andproject director at the Computer Vision Centerin Barcelona, Spain. His research interests in-clude computer vision, pattern recognition, andsocial computing. Raducanu received a PhDin computer science from the University of theBasque Country, Bilbao, Spain. Contact him [email protected].




Joaquın Salas is a professor in the area of Com-puter Vision at Instituto Politecnico Nacional. Hisresearch interests include the development ofassistive technology for the visually impaired andvisual interpretation of human activity. Salas re-ceived a PhD in computer science from ITESM,Mexico. Contact him at [email protected].

ieee transactions on affective computing, vol. x, no. y ... · psychologist mirroring, the computer...

Documents