
Auton Robot (2017) 41:1245–1261
DOI 10.1007/s10514-016-9592-y

Detecting perceived quality of interaction with a robot using contextual features

Ginevra Castellano1 · Iolanda Leite2 · Ana Paiva2

Received: 13 March 2015 / Accepted: 28 June 2016 / Published online: 8 July 2016
© Springer Science+Business Media New York 2016

Abstract This work aims to advance the state of the art in exploring the role of task, social context and their interdependencies in the automatic prediction of affective and social dimensions in human–robot interaction. We explored several SVM-based models with different features extracted from a set of context logs collected in a human–robot interaction experiment where children play a chess game with a social robot. The features include information about the game and the social context at the interaction level (overall features) and at the game turn level (turn-based features). While overall features capture game and social context at the interaction level, turn-based features attempt to encode the dependencies of game and social context at each turn of the game. Results showed that game and social context-based features can be successfully used to predict dimensions of quality of interaction with the robot. In particular, overall features proved to perform equally or better than turn-based features, and game context-based features were more effective than social context-based features. Our results show that the interplay between game and social context-based features, combined with features encoding their dependencies, leads to higher recognition performances for a subset of dimensions.

✉ Ginevra Castellano
[email protected]

Iolanda Leite
[email protected]

Ana Paiva
[email protected]

1 Division of Visual Information and Interaction, Department of Information Technology, Uppsala University, Uppsala, Sweden

2 INESC-ID and Instituto Superior Tecnico, Technical University of Lisbon, Porto Salvo, Portugal

Keywords Affect recognition · Quality of interaction · Social robots · Human–robot interaction · Contextual features

1 Introduction

Recent advances in affective behavioural computing have drawn considerable attention to the design of systems for the automatic prediction of users’ affective states in computer-based applications. Several efforts towards the design of systems capable of perceiving multimodal social and affective cues (e.g., facial expressions, eye gaze, speech, body movement, physiological data, etc.) (Castellano et al. 2012) have been made in recent years. Researchers have been investigating how accurately these systems can predict users’ affective states (Zeng et al. 2009) and other social variables related to engagement and quality of user experience, mainly in the areas of human–computer and human–robot interaction (Rani and Sarkar 2005; Mandryk et al. 2006; Rich et al. 2010; Lassalle et al. 2011).

Despite the increasing interest in this area, the role of context in the automatic recognition of affective and social dimensions is still underexplored. This is somewhat surprising, as behavioural expressions are strictly dependent on the context in which they occur (Castellano et al. 2010). On the other hand, the process of capturing context and its dependencies with behavioural expressions, social signals and affective states is extremely challenging (Ochs et al. 2011).

Capturing the role of context in the automatic recognition of affective and social dimensions is of high importance in applications where there is some form of interaction taking place, for example in human–robot interactions. Here, affective and social dimensions are related to contextual aspects such as the task the interactants are involved in, their behaviour, actions, preferences and history (Riek and Robinson 2011).


As robotic companions are increasingly being studied as partners that collaborate and perform tasks along with people (Aylett et al. 2011), the automatic prediction of affective and social dimensions in human–robot interaction must take into account the role of task and social interaction-based features.

In the area of automatic affect recognition, there are some good examples of studies exploring the role of task-related context (Kapoor and Picard 2005; Martinez and Yannakakis 2011; Sabourin et al. 2011). However, research investigating task-related context, social context and their interdependencies for the purpose of automatically predicting affective and social dimensions is still limited, especially in the areas of human–computer and human–robot interaction.

In this work, we investigate the role of social and game context and their interdependencies in the automatic recognition of quality of interaction between children and social robots. To do so, we measure perceived quality of interaction with affective [i.e., social engagement (Poggi 2007; Sidner et al. 2004)], friendship [i.e., help and self-validation (Mendelson and Aboud 1999)], and social presence [i.e., perceived affective interdependence (Biocca 1997)] dimensions, which have been previously shown to be successful in measuring the influence of a robot’s behaviour on the relationship between the robot itself and children (Leite 2013).

The interaction scenario involves a social robot, the iCat (Breemen et al. 2005), which plays chess with children and provides real-time feedback through non-verbal behaviours, supportive comments and actions based on the events unfolding in the game and the affective states experienced by the children. We derived a set of contextual features related to the game and the social behaviour displayed by the robot from automatically extracted context log files. These features encode information at the interaction level (overall features) and the game turn level (turn-based features). The overall features capture game and social context in an independent way at the interaction level, while turn-based features encode the interdependencies of game and social context at each turn of the game. The extracted features were used to train Support Vector Machine (SVM) models to predict dimensions of quality of interaction with the robot during an interaction session.

Our main research questions in this work are the following:

– How can game and social context-based features be used to predict children’s quality of interaction with a robot?

– What is the role of game and social context in the automatic prediction of dimensions of quality of interaction?

– How do overall features and turn-based features compare in terms of recognition performance?

– Is the combination of overall and turn-based features more successful than the two sets of features used independently for predicting dimensions of quality of interaction?

We trained several SVM-based models using different sets of game and social context-based features. Results showed that game and social context-based features can successfully predict children’s quality of interaction with a robot in our investigation scenario. In particular, overall features proved to perform equally or better than turn-based features, and game context-based features were more effective than social context-based features. Our results also showed that the integration of game and social context-based features with features encoding their interdependencies leads to higher recognition performances for the dimensions of social engagement and help.

This paper is organised as follows. In the next section we present related work on context-sensitive affect and social signal recognition, and socially intelligent robots. Section 3 describes our scenario and contains an overview of the architecture of the robotic game companion, while Sect. 4 presents the experimental methodology. Finally, Sect. 5 provides an overview of the experimental results, Sect. 6 discusses the results and in Sect. 7 we summarise our main findings.

2 Related work

2.1 Context-sensitive affect recognition

A limited number of studies have investigated the problem of context-sensitive affect recognition. Kapoor et al. (2005), for example, proposed an approach for predicting interest in a learning environment by using a combination of non-verbal cues and information about the learner’s task. Malta et al. (2008) proposed a system for the multimodal estimation of a driver’s irritation that exploits information about the driving context. Peters et al. (2010) modelled the user’s interest and engagement with a virtual agent displaying shared attention behaviour, by using contextualised eye gaze and head direction information. More recently, Sabourin et al. (2011) used Dynamic Bayesian Networks modelling personality attributes for the automatic prediction of learner affect in a learning environment. In our own previous work, we demonstrated the importance of game context for discriminating user’s affect (Castellano et al. 2009), and that children’s engagement with the robot can be recognised using posture data (Sanghvi et al. 2011) and a combination of task and social interaction-based features (Castellano et al. 2009).

One of the main challenges in this area of research is the question of how to model and encode relationships between different types of context and between context and other modalities.


In this work we investigate contextual feature representation in a human–robot interaction scenario and explore how to encode task and game context and their relationships over time in a human–robot interaction session. We take inspiration from the work by Morency et al. (2008), who proposed a context-based recognition framework that integrates information from human participants engaged in a conversation to improve visual gesture recognition. They proposed the concept of an encoding dictionary, a technique for contextual feature representation that captures different relationships between contextual and visual gestures. Similarly, we aim to propose a method to encode relationships between different types of features, but we focus primarily on different types of contextual information (and not on features encoding information from different modalities), and on predicting social variables such as quality of interaction (and not on automatic gesture recognition).

Other work used contextual information for the purpose of automatic affect recognition. For example, Martinez and Yannakakis (2011) investigated a method for the fusion of physiological signals and game-related information in a game play scenario. Their proposed approach uses frequent sequence mining to extract sequential features that combine events across different user input modalities. In contrast, our work attempts to model interdependencies of different contextual features not only at the turn level, but also at the interaction level.

2.2 Socially intelligent robots

Recent research on socially intelligent robots shows that robots are increasingly being studied as partners that collaborate, do tasks and learn from people, making the use of robotic platforms as tools for experimental human–robot interaction more approachable (Breazeal 2009).

Social perception abilities are of key importance for a successful interaction with human users. These allow robots to respond to human users in an appropriate way, and facilitate the establishment of a human-centred multi-modal communication with the robot.

Breazeal et al. were amongst the first to design socially perceptive abilities for robots: they developed an attention system based on low-level perceptual stimuli used by robots such as Kismet and Leonardo (Breazeal et al. 2001; Breazeal 2009). Other examples of social perceptive abilities are described in the works by Mataric and colleagues. Mead et al., for example, developed a model for the automatic recognition of interaction cues in a human–robot interaction scenario, such as initiation, acceptance and termination (Mead et al. 2011), while Feil-Seifer and Mataric proposed a method to automatically determine whether children affected by autism are interacting positively or negatively with a robot, using distance-based features (Feil-Seifer and Mataric 2011).

There are several types of robots that benefit from expressive behaviour and socially perceptive abilities, from those employed in socially assistive robotics (Scassellati et al. 2012) to those acting as companions of people. Examples of the latter include companions for health applications (von der Putten et al. 2011), for education purposes (Saerbeck et al. 2010), for children (Espinoza et al. 2011) and for the elderly (Heerink et al. 2008).

In this paper we focus on a game companion for young children, endowed with expressive behaviour and social perception and affect recognition abilities. We investigate social perception abilities in relation to the development of a computational model that allows a robot to predict a user’s perception of the quality of interaction with the robot itself based on contextual features.

3 Scenario

The human–robot interaction scenario used in this work consists of an iCat robot that acts as a chess game companion for children. The robot is able to play chess with children, providing affective feedback based on the evaluation of the moves played on an electronic chessboard. The affective feedback is provided by means of facial expressions and verbal utterances, and by reacting empathically to children’s feelings experienced throughout the game (see Fig. 1). If the user is experiencing a negative feeling, the robot generates an empathic response to help or encourage the user. Alternatively, if the user is experiencing a positive feeling, no empathic reaction is triggered. In this case, the robot simply generates an emotional facial expression (Leite et al. 2008). After every user move, the robot asks the user to perform its own move on the board by providing the desired chess coordinates. Then, the robot waits for the user’s move and so on. Initial user studies using this scenario showed that affect sensing and empathic interventions increase children’s engagement and perception of friendship towards the robot (Leite et al. 2012a, b).

3.1 System overview

Our system was built on a novel platform for affect sensitive, adaptive human–robot interaction that integrates an array of sensors in a modular client-server architecture (Leite et al. 2014). The main modules of the system are displayed in Fig. 2. The Vision module captures children’s non-verbal behaviours and interactions with the robot (Fig. 1). This module tracks head movements and salient facial points in real time using a face tracking toolkit¹ and extracts information about users’ gaze direction and probability of smile.


Fig. 1 User interacting with the robot during the experiment

Fig. 2 iCat system architecture

The Game Engine is built on top of an open-source chess engine.² In addition to deciding the next move for the robot, this module includes evaluation functions that determine how well the robot (or the child) is doing in the game. The values of these functions are updated after every move played in the game. Based on the history of these values, the game engine automatically extracts game-related contextual features, as well as features related to certain behaviours displayed by the robot. After every move played by the user, the Affect Detection module predicts the probability values for valence (i.e., whether the user is more likely to be experiencing a positive, neutral or negative feeling) (Castellano et al. 2013) using synchronised features from the Vision module (i.e., user’s probability of smile and gaze direction) and the Game Engine. This SVM-based module was trained with a corpus of contextualised children’s affective expressions (Castellano et al. 2010).

¹ http://www.seeingmachines.com/.
² http://www.tckerrigan.com/Chess/TSCP.

The Appraisal module assesses the situation of the game and determines the robot’s affective state. The output of this module is used by the Action Selection module, responsible for generating affective facial expressions in tune with the robot’s affective state. If the user is experiencing a negative feeling with probability over a certain threshold (fine-tuned after preliminary tests with a small group of users), this module also selects one of the robot’s empathic strategies to be displayed to the user.
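As a rough illustration of this action selection logic, the sketch below uses a hypothetical threshold value and simplified strategy names; it is not the authors' implementation, which fine-tuned the threshold empirically and, in the adaptive condition, learned the strategy choice per user (Leite et al. 2014).

```python
import random

# Hypothetical threshold; the paper only states it was fine-tuned in preliminary tests.
NEGATIVE_THRESHOLD = 0.6

EMPATHIC_STRATEGIES = [
    "encouraging_comment",   # strategy n. 0
    "scaffolding",           # strategy n. 1
    "offer_help",            # strategy n. 2
    "play_bad_move",         # strategy n. 3
]

def select_robot_action(p_negative, empathic_condition=True, choose_strategy=random.choice):
    """Decide the robot's social response after a user's move.

    p_negative: probability, from the Affect Detection module, that the user
    is experiencing a negative feeling.
    empathic_condition: False in the neutral condition, where no empathic
    strategy is ever triggered.
    choose_strategy: how a strategy is picked (random here; the adaptive
    condition would plug in a learned per-user selector instead).
    """
    if empathic_condition and p_negative > NEGATIVE_THRESHOLD:
        return choose_strategy(EMPATHIC_STRATEGIES)
    # Otherwise the robot only displays a facial expression in tune with its affective state.
    return "affective_facial_expression"
```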


4 Experimental methodology

To investigate the role of game and social context in the automatic prediction of dimensions of quality of interaction with the robot, we conducted a data collection field experiment. Our main goal was to explore new ways in which context can improve the design of empathic robots.

4.1 Experimental setting

The study was conducted in a Portuguese school where children have chess lessons as part of their maths curriculum. The setting consists of the robot, an electronic chessboard, a computer where all the processing takes place and a webcam that captures the children’s expressions to be analysed by the affect detection module (Fig. 1). Two video cameras were also used to record the interactions from a frontal and lateral perspective. The setting was installed in the room where children have their weekly chess lessons.

4.2 Procedure

A total of 40 children (19 male and 21 female), aged between 8 and 10 years, participated in the experiment. Participants were randomly assigned to three different conditions, corresponding to three different parameterisations of the robot’s behaviour:

1. Neutral The robot simply comments on the children’s moves in a neutral way (e.g., “you played well”, “good move”).

2. Random empathic When the user is experiencing a negative feeling (i.e., when the probability that the user is experiencing a negative feeling, calculated by the Affect Detection module, is high), the iCat randomly selects one of the available empathic strategies.

3. Adaptive empathic When the user is experiencing a negative feeling (i.e., when the probability that the user is experiencing a negative feeling, calculated by the Affect Detection module, is high), the empathic strategies are selected using an adaptive learning algorithm that captures the impact of each strategy on different users. In this case, the robot adapts its empathic behaviour considering the previous reactions of a particular user to an empathic strategy (Leite et al. 2014).

In addition to differences in selecting the empathic strategies, the robot’s affective feedback is also different in the three conditions. While in the two empathic conditions the affective behaviours are user-oriented (i.e., the robot displays positive emotions if the user makes good moves and negative emotions if the user makes bad moves), in the neutral condition the robot’s behaviour is self-oriented (i.e., it shows sadness if the user makes good moves, etc.).

Our final sample consisted of 14 children in the adaptive empathic condition, 13 in the random empathic and 13 in the neutral condition. Participants were instructed to sit in front of the robot and play a chess exercise. The exercise was suggested by the chess instructor so that the difficulty was appropriate for the children, and it was the same for all the participants. Two experimenters were in the room observing the interaction and controlling the experiment. On average, each participant played for 15 minutes with the robot. After that period, depending on the state of the game, the iCat either gave up (if it was at a disadvantage) or proposed a draw (if the child was losing or if neither player had an advantage), arguing that it had to play with another user.

4.3 Data collection

For each human–robot interaction, context logs including information about the game and the social context throughout the game were automatically collected. After playing with the robot, participants were directed to another room where they filled in a questionnaire to rate their quality of interaction with the robot. Note that children from the age group taken into consideration are sufficiently developed to answer questions with some consistency (Borgers et al. 2000). We collected a total of 40 context logs and ratings for five dimensions of quality of interaction.

4.3.1 Quality of interaction: dimensions

We measured quality of interaction using a set of affective, friendship, and social presence dimensions that have been previously shown to be successful in measuring the influence of a robot’s behaviour on the relationship between the robot itself and children (Leite 2013). Children were asked to rate these dimensions throughout the interaction using a 5-point Likert scale, where 1 meant “totally disagree” and 5 meant “totally agree”. For each dimension presented below, we considered the average ratings of all the questionnaire items associated with it.

Social engagement—this metric has been extensively used to measure quality of interaction between humans (Poggi 2007), and between humans and robots (Sidner et al. 2004). The engagement questionnaire we used is based on the questions formulated by Sidner et al. (2004) to evaluate users’ responses towards a robot capable of using social capabilities to attract users’ attention. It included the questions below:

– iCat made me participate more in the game.
– It was fun playing with iCat.
– Playing with the iCat caused me real feelings and emotions.
– I lost track of time while playing with iCat.

Help—measures how the robot is perceived to have provided guidance and other forms of aid to the user throughout the game.


In particular, we refer to help as a friendship dimension that measures the degree to which a friend fulfils a support function existing in most friendship definitions, as suggested by Mendelson and Aboud in the McGill Friendship Questionnaire (Mendelson and Aboud 1999). Questions included:

– iCat helped me during the game.
– iCat’s comments were useful to me.
– iCat showed me how to do things better.
– iCat’s comments were helpful to me.

Self-validation—another friendship dimension taken from the McGill Friendship Questionnaire (Mendelson and Aboud 1999), which measures the degree to which children perceived the robot as reassuring, encouraging and helping them to maintain a positive self-image. To measure self-validation we asked children the following questions:

– iCat encouraged me to play better during the game.
– iCat praised me when I played well.
– iCat made me feel smart.
– I felt that I could play better in the presence of iCat.
– iCat highlighted the things that I am good at.
– iCat made me feel special.

Perceived affective interdependence—this dimension was used to measure social presence, that is, “the degree to which a user feels access to the intelligence, intentions, and sensory impressions of another” (Biocca 1997). Here, perceived affective interdependence measures the extent to which the user’s emotional states are perceived to affect and be affected by the robot’s emotional states.

– I was influenced by iCat’s mood.
– iCat was influenced by my mood.

4.3.2 Game context

For each interaction with the robot, the game context logs include information on the events happening in the game. These include:

Game state—a value that represents the advantage/disadvantage of the child in the game. This value is calculated by the same chess evaluation function that the iCat uses to plan its own moves. The more positive the game state value, the greater the user’s advantage over the iCat; the more negative it is, the greater the iCat’s advantage.

Game evolution—given by the difference between the current and the previous value of the game state. A positive value indicates that the user is improving in the game with respect to the previous move, while a negative value means that the user’s performance is worse since the last move.

User emotivector—after every move made by the user, the chess evaluation function in the Game Engine returns a new value, updated according to the current state of the game. The robot’s Emotivector System (Martinho and Paiva 2006) is an anticipatory system that captures this value and, by using the history of evaluation values, computes an expected value for the chess evaluation function associated with the user’s moves. Based on the mismatch between the expected value and the actual value, the system generates a set of affective signals describing a sentiment of reward, punishment or neutral effect for the user (Leite et al. 2008), which are encoded into the following features: UserEmotivectorReward, UserEmotivectorPunishment and UserEmotivectorNeutral.

Game result—the outcome of the game (e.g., user won, iCat won, draw, iCat gave up).

Number of moves—total number of turns in the game.

Game duration—duration of the game in seconds.
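To make these definitions concrete, the sketch below shows how game evolution and an emotivector-style reward/punishment/neutral signal could be derived from the history of chess evaluation values. The expectation rule (a running mean) and the tolerance parameter are illustrative assumptions; the actual Emotivector System (Martinho and Paiva 2006) uses its own anticipatory model.

```python
def game_evolution(eval_history):
    """Difference between the current and the previous game state values."""
    if len(eval_history) < 2:
        return 0.0
    return eval_history[-1] - eval_history[-2]

def emotivector_signal(eval_history, tolerance=0.1):
    """Classify the latest evaluation as reward, punishment or neutral.

    The expected value is approximated here by the mean of the previous
    evaluations (an assumption made for this sketch). A new value clearly
    above expectation is a reward, clearly below is a punishment, and
    anything within the tolerance band counts as neutral.
    """
    if len(eval_history) < 2:
        return "neutral"
    expected = sum(eval_history[:-1]) / (len(eval_history) - 1)
    mismatch = eval_history[-1] - expected
    if mismatch > tolerance:
        return "reward"
    if mismatch < -tolerance:
        return "punishment"
    return "neutral"

# Example: evaluations after each of the user's moves (positive = user advantage).
history = [0.0, 0.2, -0.1, 0.4]
print(game_evolution(history), emotivector_signal(history))
```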

4.3.3 Social context

These logs contain information on the behaviour displayed by the iCat at specific times of the interaction. Additionally, social context logs also include information about the robot’s condition (i.e., neutral, random empathic, adaptive empathic) as well as, in the case of the empathic conditions, the type of empathic strategy adopted. The empathic strategies employed by the robot are the following:

Encouraging comments (strategy n. 0)—statements such as “don’t be sad, you can still recover your disadvantage”.

Scaffolding (strategy n. 1)—providing feedback on the user’s last move and allowing the user to take back a move (in case the move is not good).

Offering help (strategy n. 2)—suggesting a good move for the user to play.

Intentionally playing a bad move (strategy n. 3)—deliberately playing a bad move, for example, choosing a move that allows the user to capture a piece.

5 Experimental results

In this section, we start by describing the process of feature extraction and then the results of training and testing models for the prediction of quality of interaction with the robot. Our goal is to investigate whether we can predict the perceived quality of interaction with the robot over a whole interaction session, using solely information about the game and the social context throughout the interaction.


Table 1 Overall features

Feature | Name | Category of context | Description
1 | AvgGameState | Game | Game state averaged over the whole interaction
2 | AvgGameEvol | Game | Game evolution averaged over the whole interaction
3 | NofMoves/GameDur | Game | Number of moves played during the game divided by the duration of the game (in seconds)
4 | EmotivReward/NofMoves | Game | Number of moves associated with a feeling of reward divided by the total number of moves
5 | EmotivPunish/NofMoves | Game | Number of moves associated with a feeling of punishment divided by the total number of moves
6 | EmotivNeut/NofMoves | Game | Number of moves associated with a neutral feeling divided by the total number of moves
7 | iCatWon | Game | Whether or not iCat won (binary feature)
8 | UserWon | Game | Whether or not the user won (binary feature)
9 | Draw | Game | Whether or not iCat proposed a draw (binary feature)
10 | iCatGaveUp | Game | Whether or not iCat gave up (binary feature)
11 | Neutral | Social | Whether or not the iCat displayed neutral behaviour (binary feature)
12 | RandomEmpat | Social | Whether or not the iCat displayed random empathic behaviour (binary feature)
13 | AdaptEmpat | Social | Whether or not the iCat displayed adaptive empathic behaviour (binary feature)
14 | NofStrat0/NofStrategies | Social | Number of times empathic strategy n. 0 was displayed divided by the number of displayed strategies
15 | NofStrat1/NofStrategies | Social | Number of times empathic strategy n. 1 was displayed divided by the number of displayed strategies
16 | NofStrat2/NofStrategies | Social | Number of times empathic strategy n. 2 was displayed divided by the number of displayed strategies
17 | NofStrat3/NofStrategies | Social | Number of times empathic strategy n. 3 was displayed divided by the number of displayed strategies

5.1 Feature extraction

Game and social context-based features were extracted from the context logs for each of the 40 user interactions with the robot. These features were extracted at the interaction level (overall features) and the game turn level (turn-based features).

5.1.1 Overall features

These features were calculated over the whole interaction. They capture game and social context in an independent way at the interaction level. The complete list of overall features is shown in Table 1.
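As an illustration, the sketch below computes a few of the overall features of Table 1 from a parsed context log. The log representation (a list of per-turn records) and its field names are assumptions made for the example, not the format of the system's actual log files.

```python
def overall_features(turns, outcome, duration_s):
    """Compute a subset of the interaction-level (overall) features of Table 1.

    turns: list of dicts, one per game turn, e.g.
           {"game_state": 0.3, "game_evolution": 0.1, "emotivector": "reward"}
    outcome: one of "user_won", "icat_won", "draw", "icat_gave_up"
    duration_s: game duration in seconds
    """
    n = len(turns)
    return {
        "AvgGameState": sum(t["game_state"] for t in turns) / n,
        "AvgGameEvol": sum(t["game_evolution"] for t in turns) / n,
        "NofMoves/GameDur": n / duration_s,
        "EmotivReward/NofMoves": sum(t["emotivector"] == "reward" for t in turns) / n,
        "EmotivPunish/NofMoves": sum(t["emotivector"] == "punishment" for t in turns) / n,
        "EmotivNeut/NofMoves": sum(t["emotivector"] == "neutral" for t in turns) / n,
        "iCatWon": int(outcome == "icat_won"),
        "UserWon": int(outcome == "user_won"),
        "Draw": int(outcome == "draw"),
        "iCatGaveUp": int(outcome == "icat_gave_up"),
    }
```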

5.1.2 Turn-based features

These features were extracted after each game turn. They encode relationships between social and game context after every move, and are therefore more focused on the dynamics of the interaction. Note that the turn-based features are not dynamic per se, but are encoded in a way that attempts to capture the dynamic changes of the game context as a consequence of the robot’s behaviour. We are particularly interested in features that capture whether social context affects game context at each turn of the game. The complete set of turn-based features is listed in Table 2.
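Following the same pattern, here is a hedged sketch of how the game-state-based turn-based features of Table 2 might be derived (the emotivector-based ones would be counted analogously); the per-turn record fields are again hypothetical.

```python
from collections import Counter

def turn_based_features(turns):
    """Compute Table 2-style features: counts of game-state changes after a turn,
    conditioned on the robot's behaviour in that turn, normalised by the number of turns.

    turns: list of dicts such as
           {"condition": "neutral" or "empathic",
            "strategy_used": bool,
            "game_state_change": "up" | "down" | "same"}
    """
    counts = Counter()
    for t in turns:
        if t["condition"] == "neutral":
            behaviour = "Neut-No-strat"
        elif t["strategy_used"]:
            behaviour = "Emp-strat"
        else:
            behaviour = "Emp-No-strat"
        counts[(behaviour, t["game_state_change"])] += 1
    n = len(turns)
    return {f"{b}-{c}": v / n for (b, c), v in counts.items()}
```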

5.2 Methodology

The overall and turn-based features were used to train and test models based on Support Vector Machines (SVMs) to predict children’s perceived quality of interaction with the robot over a whole session. SVMs are state-of-the-art algorithms for multi-class classification, widely used for the automatic prediction of affect-related states (Batrinca et al. 2011; Hammal and Cohn 2012).

The classification experiments followed three different stages. First, we performed training and testing to predict perceived quality of interaction using only overall features. In the second stage, we focused on prediction of perceived quality of interaction using turn-based features only. Finally, we merged overall and turn-based features into a single feature vector. We used the LibSVM library (Chang and Lin 2011) and a Radial Basis Function (RBF) kernel for training and testing. RBF kernels are the most common kernels for SVM-based detection of discrete affect-related states, as they support cases in which the relation between class labels and attributes is not linear. Previous studies on the automatic recognition of affective and social dimensions indicate that they provide the best performances (Gunes et al. 2012).


Table 2 Turn-based features

Feature | Robot’s behaviour | Game state | Emotivector | Description
1 | Neut-No-strat | ↑ | N/A | N. of times game state increases after a turn when the condition is neutral / n. of turns
2 | Neut-No-strat | ↓ | N/A | N. of times game state decreases after a turn when the condition is neutral / n. of turns
3 | Neut-No-strat | − | N/A | N. of times game state stays the same after a turn when the condition is neutral / n. of turns
4 | Emp-strat | ↑ | N/A | N. of times game state increases after a turn when the condition is empathic and a strategy is employed / n. of turns
5 | Emp-strat | ↓ | N/A | N. of times game state decreases after a turn when the condition is empathic and a strategy is employed / n. of turns
6 | Emp-strat | − | N/A | N. of times game state stays the same after a turn when the condition is empathic and a strategy is employed / n. of turns
7 | Emp-No-strat | ↑ | N/A | N. of times game state increases after a turn when the condition is empathic and a strategy is NOT employed / n. of turns
8 | Emp-No-strat | ↓ | N/A | N. of times game state decreases after a turn when the condition is empathic and a strategy is NOT employed / n. of turns
9 | Emp-No-strat | − | N/A | N. of times game state stays the same after a turn when the condition is empathic and a strategy is NOT employed / n. of turns
10 | Neut-No-strat | N/A | Reward | N. of times emotivector is reward after a turn when the condition is neutral / n. of turns
11 | Neut-No-strat | N/A | Punishment | N. of times emotivector is punishment after a turn when the condition is neutral / n. of turns
12 | Neut-No-strat | N/A | Neutral | N. of times emotivector is neutral after a turn when the condition is neutral / n. of turns
13 | Emp-strat | N/A | Reward | N. of times emotivector is reward after a turn when the condition is empathic and a strategy is employed / n. of turns
14 | Emp-strat | N/A | Punishment | N. of times emotivector is punishment after a turn when the condition is empathic and a strategy is employed / n. of turns
15 | Emp-strat | N/A | Neutral | N. of times emotivector is neutral after a turn when the condition is empathic and a strategy is employed / n. of turns
16 | Emp-No-strat | N/A | Reward | N. of times emotivector is reward after a turn when the condition is empathic and a strategy is NOT employed / n. of turns
17 | Emp-No-strat | N/A | Punishment | N. of times emotivector is punishment after a turn when the condition is empathic and a strategy is NOT employed / n. of turns
18 | Emp-No-strat | N/A | Neutral | N. of times emotivector is neutral after a turn when the condition is empathic and a strategy is NOT employed / n. of turns

We followed a leave-one-subject-out cross-validation approach for training and testing. To address the issue of the limited size of the dataset, which includes one sample for each user interaction with the robot, this approach allowed us to train the system with as many samples as possible at each step, increasing the chance that the classifier is accurate, while at the same time providing a measure of the classifier’s ability to generalise to new users.

In all experiments, to optimise the cost parameter C and the kernel parameter, we carried out a grid search using cross-validation, as suggested in Hsu et al. (2010). The parameter values producing the highest cross-validation accuracy were selected. Before the training and testing phase for each classification experiment, a feature selection step was performed. To identify the features with the most discriminative power, a LibSVM tool that uses F-scores for ranking features was used (Chen and Lin 2006).
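For readers who want to reproduce this kind of pipeline, the sketch below shows leave-one-subject-out evaluation of an RBF-kernel SVM with a grid search over C and gamma. It uses scikit-learn rather than the LibSVM command-line tools used in the paper, and the feature matrix, label vector and grid values are placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row of overall and/or turn-based features per child (40 x n_features);
# y: binary quality-of-interaction labels. Random placeholders keep the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 17))
y = rng.integers(0, 2, size=40)

param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid,
    cv=LeaveOneOut(),   # one sample per subject, so LOSO coincides with leave-one-out
    scoring="accuracy",
)
search.fit(X, y)

baseline = max(np.mean(y), 1 - np.mean(y))  # most frequent class, as in the paper
print(f"Cross-validation accuracy: {search.best_score_:.3f} (baseline = {baseline:.3f})")
```

Because the dataset contains one sample per child, leave-one-subject-out reduces to leave-one-out, which is why LeaveOneOut is used here.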

5.3 Social engagement: results

The first step consisted of encoding social engagement with the objective of producing “ground truth” labels for this dimension for each interaction with the robot. We did not employ coders for this purpose, but used instead the answers provided by the children in the 5-point Likert scale questionnaire (self-reports).


Table 3 Social engagement prediction: comparison between different classification experiments

Classification experiment | Performance | Best features
Overall features | 77.5 % (baseline = 52.5 %) | iCatWon, Neutral, NofStrat0/NofMoves, NofStrat1/NofMoves, AvgGameEvol, iCatGaveUp, Draw, AvgGameState, RandomEmpat, NofStrat2/NofMoves
Turn-based features | 72.5 % (baseline = 52.5 %) | Neut-No-strat-AND-Game-state-lower, Neut-No-strat-AND-Emotiv-reward
Overall and turn-based features | 80 % (baseline = 52.5 %) | iCatWon, Neut-No-strat-AND-Game-state-lower, Neut-No-strat-AND-Emotiv-reward, Neutral

As the number of samples is relatively small and, on average, the children reported high levels of engagement with the robot, it was decided to divide the 40 interactions into two (rather than several) groups that differentiate the samples with medium-high to high engagement from the rest of the samples. The two groups correspond to two classes, one for medium-high to high engagement (21 interactions) and one for medium-high to low engagement (19 interactions). This distinction was made based on the children’s answers, e.g., all interactions rated higher than or equal to a specified threshold were included in the medium-high to high engagement group. The threshold was chosen with the objective of identifying two groups with a similar number of samples, in order to train the system with as many samples as possible from the two different groups. Therefore 21 samples labelled as medium-high to high engagement and 19 samples labelled as medium-high to low engagement were used in the evaluation.
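A small sketch of this binarisation step is given below, assuming the per-child dimension score is the average of its questionnaire items on the 1-5 scale; the threshold value in the example is a placeholder, since the paper only states that it was chosen to balance the two groups.

```python
def binarise_ratings(scores, threshold):
    """Turn averaged 1-5 questionnaire scores into two classes.

    Interactions rated at or above the threshold go into the
    'medium-high to high' class (label 1), the rest into the other class (label 0).
    """
    return [1 if s >= threshold else 0 for s in scores]

# Example with hypothetical averaged engagement scores for a few children.
scores = [4.2, 3.1, 4.8, 2.9, 3.6]
print(binarise_ratings(scores, threshold=3.5))  # -> [1, 0, 1, 0, 1]
```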

5.3.1 Social engagement prediction using overall features

We ran a series of classification experiments where models were trained and tested with different combinations of the overall features listed in Table 1. Experiments were performed by adding one feature at a time, based on the ranked output of the feature selection process. We started with a single-feature experiment using the top-ranked feature, then performed a 2-feature experiment with the two highest-ranked features, and so on, until the performance started decreasing.
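Here is a compact sketch of this incremental procedure, assuming a precomputed feature ranking (e.g., by F-score) and an evaluate(features) callable that returns the cross-validation accuracy for a given subset; both stand in for the LibSVM tools used in the paper.

```python
def incremental_selection(ranked_features, evaluate):
    """Add one ranked feature at a time and keep the best-performing subset.

    ranked_features: feature names sorted by discriminative power (best first).
    evaluate: callable mapping a list of feature names to a cross-validation accuracy.
    Stops as soon as adding the next feature decreases performance.
    """
    best_subset, best_score = [], 0.0
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score < best_score:
            break
        best_subset, best_score = subset, score
    return best_subset, best_score
```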

We followed a “leave-one-subject-out” cross validation approach for training and testing. The best recognition performance for predicting social engagement using overall features was 77.5 % (baseline = 52.5 %³), achieved using 10 features (Table 3). When using solely game context-based features, the models achieved a recognition rate of 77.5 % (baseline = 52.5 %), while 65 % (baseline = 52.5 %) was the best performance when social context-based features were used (Table 7).

³ Given that the samples in the training datasets are not evenly distributed across classes, in the following sections we will use the percentage of the samples of the most frequent class as the baseline for the classifiers’ performance.

5.3.2 Social engagement prediction using turn-based features

Classification experiments using turn-based features were performed by following the same approach. We ran a series of classification experiments, training and testing different combinations of the turn-based features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting social engagement using turn-based features was 72.5 % (baseline = 52.5 %), achieved using only 2 features (Table 3).

5.3.3 Social engagement prediction using overall and turn-based features

Following the same procedure, we trained and tested a series of models using different combinations of the overall and turn-based features.

As in the previous two experiments, a “leave-one-subject-out” cross validation approach was followed. The best performance for predicting social engagement using overall and turn-based features was 80 % (baseline = 52.5 %), achieved using 4 features (Table 3).

5.4 Help: results

The first step consisted of encoding help in order to produce “ground truth” labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. As the number of samples is relatively small and, on average, the children reported high levels of perceived help, we divided the 40 interactions into two (rather than several) groups that differentiate the samples with medium-high to high help from the rest of the samples. The two groups correspond to two classes, one for medium-high to high help (23 interactions) and one for medium-high to low help (17 interactions).


Table 4 Help prediction: comparison between different classification experiments

Classification experiment | Performance | Best features
Overall features | 75 % (baseline = 57.5 %) | NofStrat3/NofStrategies, iCatGaveUp
Turn-based features | 75 % (baseline = 57.5 %) | Emp-strat-AND-Game-state-same, Emp-strat-AND-Game-state-higher
Overall and turn-based features | 80 % (baseline = 57.5 %) | Emp-strat-AND-Game-state-same, NofStrat3/NofStrategies, Emp-strat-AND-Game-state-higher, Emp-strat-AND-emotiv-reward, iCatGaveUp

Table 5 Self-validation prediction: comparison between different classification experiments

Classification experiment | Performance | Best features
Overall features | 69.23 % (baseline = 56.41 %) | AdaptEmpat, NofStrat1/NofStrategies
Turn-based features | 69.23 % (baseline = 56.41 %) | Emp-strat-AND-Game-state-same, Emp-No-strat-AND-Game-state-higher, Emp-No-strat-AND-Game-state-same
Overall and turn-based features | 69.23 % (baseline = 56.41 %) | Emp-strat-AND-Game-state-same, Emp-No-strat-AND-Game-state-higher, Emp-No-strat-AND-Game-state-same

Table 6 Perceived affective interdependence prediction: comparison between different classification experiments

Classification experiment | Performance | Best features
Overall features | 72.5 % (baseline = 57.5 %) | NofStrat0/NofStrategies
Turn-based features | 67.5 % (baseline = 57.5 %) | Emp-strat-AND-emotiv-punish
Overall and turn-based features | 72.5 % (baseline = 57.5 %) | NofStrat0/NofStrategies, Emp-strat-AND-emotiv-punish

Therefore 23 samples labelled as medium-high to high help and 17 samples labelled as medium-high to low help were used in the evaluation, in order to train the system with as many samples as possible from the two different groups.

5.4.1 Help prediction using overall features

These classification experiments were performed using the overall features (Table 1). As we did for social engagement, we ran a series of classification experiments, where models were trained and tested with different combinations of the overall features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting help using overall features was 75 % (baseline = 57.5 %), achieved using 2 features (Table 4).

We also trained and tested models using first solely game context-based features and then solely social context-based features. We found that models achieved a recognition rate of 72.5 % (baseline = 57.5 %) using game context-based features and 70 % (baseline = 57.5 %) using social context-based features (Table 7).

5.4.2 Help prediction using turn-based features

Classification experiments using turn-based features were performed by following the same approach adopted for the experiments using overall features. As we did for social engagement, we ran a series of classification experiments, where models were trained and tested with different combinations of the turn-based features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting help using turn-based features was 75 % (baseline = 57.5 %), achieved using 2 features (Table 4).

5.4.3 Help prediction using overall and turn-based features

Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features.

As for the previous two experiments, a “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting help using overall and turn-based features was 80 % (baseline = 57.5 %), achieved using 5 features (Table 4).


Table 7 Predicting social engagement, help, self-validation and perceived affective interdependence using overall features: comparison between game and social context

Classification experiment | Social engagement | Help | Self-validation | Perceived affective interdependence
Overall features—game context | 77.5 % (baseline = 52.5 %) | 72.5 % (baseline = 57.5 %) | 74.36 % (baseline = 56.41 %) | 70 % (baseline = 57.5 %)
Overall features—social context | 65 % (baseline = 52.5 %) | 70 % (baseline = 57.5 %) | 64.1 % (baseline = 56.41 %) | 67.5 % (baseline = 57.5 %)


5.5 Self-validation: results

The first step consisted of encoding self-validation in order to produce “ground truth” labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. Note that all self-validation recognition experiments were performed on 39 subjects and not 40 as for social engagement, help and perceived affective interdependence, as one of the subjects did not provide a score for self-validation. As the number of samples is relatively small and, on average, the children reported high levels of perceived self-validation, we divided the 39 interactions into two (rather than several) groups that differentiate the samples with medium-high to high self-validation from the rest of the samples. The two groups correspond to two classes, one for medium-high to high self-validation (17 interactions) and one for medium-high to low self-validation (22 interactions). Therefore 17 samples labelled as medium-high to high self-validation and 22 samples labelled as medium-high to low self-validation were used in the evaluation, in order to train the system with as many samples as possible from the two different groups.

5.5.1 Self-validation prediction using overall features

These classification experiments were performed using the overall features (Table 1). As we did for social engagement and help, we ran a series of classification experiments, where models were trained and tested with different combinations of the overall features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using overall features was 69.23 % (baseline = 56.41 %), achieved using 2 features (Table 5).

We also trained and tested models using first solely game context-based features and then solely social context-based features. We found that models achieved a recognition rate of 74.36 % (baseline = 56.41 %) using game context-based features and 64.1 % (baseline = 56.41 %) using social context-based features (Table 7).

5.5.2 Self-validation prediction using turn-based features

Classification experiments using turn-based features were performed by following the same approach adopted for the experiments using overall features. As we did for social engagement and help, we ran a series of classification experiments, where models were trained and tested with different combinations of the turn-based features.


A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using turn-based features was 69.23 % (baseline = 56.41 %), achieved using 3 features (Table 5).

5.5.3 Self-validation prediction using overall and turn-based features

Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features.

As for the previous two experiments, a “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting self-validation using overall and turn-based features was 69.23 % (baseline = 56.41 %), achieved using 3 features (Table 5).

5.6 Perceived affective interdependence: results

The first step consisted of encoding perceived affective interdependence in order to produce “ground truth” labels for this dimension for each interaction with the robot, following the same procedure adopted for social engagement described earlier. As the number of samples is relatively small and, on average, the children reported high levels of perceived affective interdependence, we divided the 40 interactions into two (rather than several) groups that differentiate the samples with medium-high to high perceived affective interdependence from the rest of the samples. The two groups correspond to two classes, one for medium-high to high perceived affective interdependence (23 interactions) and one for medium-high to low perceived affective interdependence (17 interactions). Therefore 23 samples labelled as medium-high to high perceived affective interdependence and 17 samples labelled as medium-high to low perceived affective interdependence were used in the evaluation, in order to train the system with as many samples as possible from the two different groups.

5.6.1 Perceived affective interdependence prediction using overall features

These classification experiments were performed using the overall features (Table 1). As we did for social engagement, help, and self-validation, we ran a series of classification experiments, where models were trained and tested with different combinations of the overall features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using overall features was 72.5 % (baseline = 57.5 %), achieved using 1 feature (Table 6).

We also trained and tested models using first solely game context-based features and then solely social context-based features. We found that models achieved a recognition rate of 70 % (baseline = 57.5 %) using game context-based features and 67.5 % (baseline = 57.5 %) using social context-based features (Table 7).

5.6.2 Perceived affective interdependence prediction using turn-based features

Classification experiments using turn-based features were performed by following the same approach adopted for the experiments using overall features. As we did for social engagement, help, and self-validation, we ran a series of classification experiments, where models were trained and tested with different combinations of the turn-based features.

A “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using turn-based features was 67.5 % (baseline = 57.5 %), achieved using 1 feature (Table 6).

5.6.3 Perceived affective interdependence prediction using overall and turn-based features

Following the same methodology adopted for the previous two classification experiments, we trained and tested a series of models using different combinations of the overall and turn-based features.

As for the previous two experiments, a “leave-one-subject-out” cross validation approach was followed during the training and testing phase. The best recognition performance for predicting perceived affective interdependence using overall and turn-based features was 72.5 % (baseline = 57.5 %), achieved using 2 features (Table 6).

6 Discussion

This paper aims to investigate the relationship between social and game context-based features and the effect that their interdependencies have on the perception of quality of interaction.

Overall, results showed that game and social context-based features can be successfully used to predict children’s perceived quality of interaction with the robot over a whole interaction session.

Specifically, overall features performed as well as turn-based features for the dimensions of help and self-validation, and better for social engagement and perceived affective interdependence.


features that encode game and social context in an indepen-dent way at the interaction level, better capture the variablesthat affect the level of engagement with the robot (e.g., vari-ables that relate to the result of the chess game, the typeof robot condition (neutral vs. empathic,) and the type ofstrategy employed by the robot, see Table 3) and the extentto which the user’s affective state is perceived by the userto affect and be affected by the robot’s affective state (e.g.,variables that relate to the type of strategy employed by therobot, see Table 6), compared to turn-based features.

The second result is that overall game context features proved more successful than overall social context features. This applies to all dimensions of quality of interaction (Table 7). These results are somewhat surprising, except for the social engagement dimension, where game-related features, such as the result of the game, are likely to have affected children's reported level of engagement with the robot. It is worth noting, however, that the difference in performance between classifiers trained with game context features and classifiers trained with social context features is less pronounced for help and perceived affective interdependence.

Finally, the experiments showed that the integration of game and social context-based features with features encoding their interdependencies leads to higher recognition performances for a subset of dimensions (i.e., social engagement and help, see Tables 3, 4). For the dimensions of self-validation and perceived affective interdependence, instead, the performance of the classifiers trained with a combination of overall and turn-based features is, in the worst case, comparable with that of the classifiers trained with the overall and turn-based features separately (see Tables 5, 6).

One should note that some of the recognition rates are smaller than 75 %. Therefore, in order to evaluate success, we compare the recognition rates with a suitable baseline. While an option for the baseline would be chance level (50 %), in our case, given that the samples in the training datasets are not perfectly evenly distributed across classes, we used as the baseline the percentage of samples belonging to the most frequent class. When we compare the achieved recognition accuracies with the respective baselines calculated with this method, one can observe that in all cases the recognition accuracies are above the baselines. We consider the classifiers successful when they beat the baseline. Moreover, in most cases the performance above baseline is at least 10 % (reaching a maximum of 27.5 % in the case of recognition of social engagement using overall and turn-based features), with the exception of recognition of self-validation using social context-based features (7.69 %).
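
As an illustration, the majority-class baseline used here is simply the accuracy of always predicting the most frequent class; the short sketch below reproduces the 57.5 % figure for perceived affective interdependence (23 of the 40 samples belong to the most frequent class).

from collections import Counter

def majority_baseline(labels):
    # Accuracy of a classifier that always predicts the most frequent class.
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

labels = [1] * 23 + [0] * 17   # class distribution for perceived affective interdependence
print("baseline = %.1f %%" % (100 * majority_baseline(labels)))   # -> 57.5 %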

There are some important aspects encountered in the presented work that require further discussion.

The first aspect concerns the reliability of the ground truth for the dimensions of quality of interaction, which, in this work, is provided by the users themselves, i.e., the children. First, we would like to point out, as mentioned earlier in this paper, that children from this age group are sufficiently developed to answer questions with some consistency (Borgers et al. 2000). It was noted that, on average, children reported high levels of social engagement, help, self-validation and perceived affective interdependence. While we were not able to identify a large number of interactions characterised with low levels of the selected dimensions, it was clearly possible to identify, for all dimensions, two separate groups of interactions. This affected the way we decided to encode social engagement, help, self-validation and perceived affective interdependence for each interaction with the robot: we aimed to differentiate interactions rated with medium-high to high levels of these dimensions from those rated with medium-high to low levels. It is also worth noting that in this work we measure perceived quality of interaction at the end of the game/interaction with the robot. We are aware that children may have been biased, in their judgement, by the way the game ended. Finding ways of identifying a ground truth for affective and social dimensions throughout an ongoing interaction is still an open question for researchers in the field. This is especially true if such a ground truth is required from the users themselves, and not from external coders, and if one aims to measure perceived quality of interaction, as in our case. To support self-assessments with external assessments, an alternative could be the use of systems for the automatic detection of indicators relating to user behaviour (Bae et al. 2012), or brain–computer interface techniques (Poel et al. 2012), although the latter is not a viable solution when experimenting with children in a real-world environment, rather than in the laboratory. Other studies employed human coders to obtain external assessments (Castellano et al. 2014), but in this work we are interested in perception of quality of interaction, hence our preference for self-reports. Another solution that is currently being explored in the literature is the use of implicit probes (Corrigan et al. 2014) to find a ground truth for affect-related states, a non-intrusive, pervasive and embedded method of collecting informative data at different stages of an interaction.

Second, it is worth pointing out that there may be more dimensions that contribute to the definition of quality of interaction. In this work we chose to explore a subset of affective (i.e., social engagement), friendship (i.e., help and self-validation) and social presence (i.e., perceived affective interdependence) dimensions, which have been previously identified as playing a key role in measuring the effect that a robot's behaviour had on the relationship established between the robot and children (Leite 2013). However, there may be other dimensions that carry additional important information about quality of interaction, for example the successful establishment of social bonding between users and robots. This will be the subject of future work, for example how the dimensions considered here relate to social bonding, and how contextual information could be further exploited to allow the robot to make predictions of perceived quality of interaction that may help it plan more personalised, appropriate empathic responses.

Third, we note that, while the chosen contextual features can successfully predict dimensions of quality of interaction with a robot in a chess game context, they could not be used, exactly as they stand, in a scenario that is substantially different. However, the proposed approach for feature encoding and representation can be generalised to other types of human–machine interaction applications. Examples in the human–robot interaction domain include scenarios characterised by (1) social interaction between a user and a robot, and (2) a task that user and robot are engaged with, for example human–robot collaborative scenarios. More specifically, results point in the direction of choosing feature vectors that model task and social context and their interdependencies at the turn-based and interaction level. Social human–robot interaction scenarios involving some form of turn taking are an example of application that could benefit from this proposed modelling of social and task context.

Finally, it is important to acknowledge the potential impact that age and culture may have had on our results. Previous studies, for example, investigated the effect of cultural variability on opinions and attitudes towards robots (Bartneck et al. 2007; Riek et al. 2010; Mavridis et al. 2012). While the study presented in this paper was conducted with a limited population of Portuguese school children, future work in the area is likely to benefit from the exploration of the effect of culture and age on the perception of quality of interaction with a robot.

We believe that the results presented in this paper contribute to a better understanding of what quality of interaction is in a task-oriented social human–robot interaction, and how it can be measured, especially in relation to the effect that the interdependencies of game and social context have on perceived quality of interaction, rather than the role that game and social context-based features play in isolation.

7 Conclusions

Despite the recent advances in automatic affect recognition, the role of task and social context for the purpose of automatically predicting affective and social dimensions is still unclear. This work aimed to advance the state of the art in this direction, exploring the role of task, social context and their interdependencies in the automatic prediction of the quality of interaction between children and a robotic game companion.

Several SVM-based models with different features extracted from context logs collected in a human–robot interaction experiment were investigated. The features included information about the game and the social context during the interaction with the robot. Our experimental results lead us to the following conclusions:

Game and social context-based features can be successfully used to predict perceived quality of interaction with the robot. Using only information about the task and the social context of the interaction, our results suggest that it is possible to predict with high accuracy the perceived quality of interaction in a human–robot interaction scenario.

Game context proved more successful than social context in the automatic prediction of perceived quality of interaction. Despite the high impact of both game context and social context for the automatic detection of perceived quality of interaction, game context seems to play a more important role in this case.

Overall features performed equally or better than turn-based features; the combination of game and social context-based features with features encoding their interdependencies leads to higher recognition performances for a subset of the selected dimensions of quality of interaction. Contextual feature representation (e.g., how to encode game and social context and their relationships over time) is a key issue in context-sensitive recognition of affective and social dimensions. Our results indicate that overall features are more successful than turn-based features for predicting social engagement and perceived affective interdependence, and that the fusion of overall features with turn-based features encoding relationships between social and game context achieved the best performance rate for a subset of the considered dimensions, i.e., social engagement and help.

This paper represented a first step towards investigating the role of different types of contexts and their relations for the automatic prediction of perceived quality of interaction with a robot. We consider context-sensitive detection of perceived quality of interaction as a key requirement for the establishment of empathic and personalised human–computer and human–robot interactions that are more natural and engaging (Leite et al. 2012a; Castellano et al. 2010). While our proposed approach was tested on a specific human–robot interaction scenario, we believe that the modelling of interdependencies between social and game context-based features could be applied to other task-oriented social human–robot interactions.

Acknowledgements This work was partially supported by the European Commission (EC) and funded by the EU FP7 ICT-317923 project EMOTE. The authors are solely responsible for the content of this publication. It does not represent the opinion of the EC, and the EC is not responsible for any use that might be made of data appearing therein.


References

Aylett, R., Castellano, G., Raducanu, B., Paiva, A., & Hanheide, M. (2011). Long-term socially perceptive and interactive robot companions: Challenges and future perspectives. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.

Bae, B. C., Brunete, A., Malik, U., Dimara, E., Jermsurawong, J., & Mavridis, N. (2012). Towards an empathizing and adaptive storyteller system. In: Proceedings of the 8th AAAI conference on artificial intelligence and interactive digital entertainment (pp. 63–65). Stanford, CA.

Bartneck, C., Suzuki, T., Kanda, T., & Nomura, T. (2007). The influence of people's culture and prior experiences with AIBO on their attitude towards robots. AI and Society, The Journal of Human-Centred Systems, 21(1–2), 217–230.

Batrinca, L. M., Mana, N., Lepri, B., Pianesi, F., & Sebe, N. (2011). Please, tell me about yourself: Automatic personality assessment using short self-presentations. In: Proceedings of the 13th international conference on multimodal interaction (ICMI 2011) (pp. 255–262). New York: ACM.

Biocca, F. (1997). The cyborg's dilemma: Embodiment in virtual environments. In: Cognitive technology, 1997. 'Humanizing the information age'. Proceedings, second international conference on (pp. 12–26). IEEE.

Borgers, N., de Leeuw, E., & Hox, J. (2000). Children as respondents in survey research: Cognitive development and response quality. Bulletin de Methodologie Sociologique, 66(1), 60.

Breazeal, C. (2009). Role of expressive behaviour for robots that learn from people. Philosophical Transactions of the Royal Society B, 364, 3527–3538.

Breazeal, C., Edsinger, A., Fitzpatrick, P., & Scassellati, B. (2001). Active vision for sociable robots. IEEE Transactions on Systems, Man and Cybernetics-Part A, 31(5), 443–453.

Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2009). It's all in the game: Towards an affect sensitive and context aware game companion. In: Proceedings of the 3rd international conference on affective computing and intelligent interaction (ACII 2009) (pp. 29–36). IEEE.

Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. W. (2010). Inter-ACT: An affective and contextually rich multimodal video corpus for studying interaction with robots. In: Proceedings of the ACM international conference on multimedia (pp. 1031–1034). Florence: ACM.

Castellano, G., Pereira, A., Leite, I., Paiva, A., & McOwan, P. W. (2009). Detecting user engagement with a robot companion using task and social interaction-based features. In: International conference on multimodal interfaces and workshop on machine learning for multimodal interaction (ICMI-MLMI '09) (pp. 119–126). Cambridge, MA: ACM.

Castellano, G., Caridakis, G., Camurri, A., Karpouzis, K., Volpe, G., & Kollias, S. (2010). Body gesture and facial expression analysis for automatic affect recognition. In K. R. Scherer, T. Baenziger, & E. B. Roesch (Eds.), Blueprint for affective computing: A sourcebook. Oxford: Oxford University Press.

Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2010). Affect recognition for interactive companions: Challenges and design in real-world scenarios. Journal on Multimodal User Interfaces, 3(1–2), 89–98.

Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2013). Multimodal affect modelling and recognition for empathic robot companions. International Journal of Humanoid Robotics, 10, 1350010.

Castellano, G., Leite, I., Pereira, A., Martinho, C., Paiva, A., & McOwan, P. (2014). Context-based affect recognition for a robotic game companion. ACM Transactions on Interactive Intelligent Systems, 4(2), 10.

Castellano, G., Mancini, M., Peters, C., & McOwan, P. W. (2012). Expressive copying behavior for social agents: A perceptual analysis. IEEE Transactions on Systems, Man and Cybernetics Part A—Systems and Humans, 42(3), 776–783.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Chen, Y.-W., & Lin, C.-J. (2006). Combining SVMs with various feature selection strategies. In I. Guyon, S. Gunn, M. Nikravesh, & L. Zadeh (Eds.), Feature extraction, foundations and applications. New York: Springer.

Corrigan, L., Basedow, C., Kuster, D., Kappas, A., Peters, C., & Castellano, G. (2014). Mixing implicit and explicit probes: Finding a ground truth for engagement in social human–robot interactions. In: Proceedings of the ACM/IEEE international conference on human–robot interaction (HRI '14) (pp. 140–141). Bielefeld: ACM.

Espinoza, R. R., Nalin, M., Wood, R., Baxter, P., Looije, R., Demiris, Y., Belpaeme, T., Giusti, A., & Pozzi, C. (2011). Child-robot interaction in the wild: Advice for the aspiring experimenter. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.

Feil-Seifer, D., & Mataric, M. (2011). Automated detection and classification of positive versus negative robot interactions with children with autism using distance-based features. In: Proceedings of the 6th international conference on human–robot interaction (HRI '11) (pp. 323–330). New York: ACM.

Gunes, H., Shan, C., Chen, S., & Tian, Y. (2012). Bodily expression for automatic affect recognition. In A. Konar & A. Chakraborty (Eds.), Advances in emotion recognition. New York: Wiley.

Hammal, Z., & Cohn, J. F. (2012). Automatic detection of pain intensity. In: Proceedings of the 14th ACM international conference on multimodal interaction (ICMI '12) (pp. 47–52). New York: ACM.

Heerink, M., Krose, B., Evers, V., & Wielinga, B. (2008). The influence of social presence on acceptance of a companion robot by older people. Journal of Physical Agents, 2(2), 33–40.

Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification.

Kapoor, A., & Picard, R. W. (2005). Multimodal affect recognition in learning environments. In: ACM international conference on multimedia (pp. 677–682).

Lassalle, J., Gros, L., & Coppin, G. (2011). Combination of physiological and subjective measures to assess quality of experience for audiovisual technologies. In: Third international workshop on quality of multimedia experience (QoMEX) (pp. 13–18). IEEE.

Leite, I. (2013). Long-term interactions with empathic social robots. Ph.D. dissertation, Universidade Técnica de Lisboa, Instituto Superior Técnico.

Leite, I., Castellano, G., Pereira, A., Martinho, C., & Paiva, A. (2012b). Modelling empathic behaviour in a robotic game companion for children: An ethnographic study in real-world settings. In: ACM/IEEE international conference on human–robot interaction (HRI). Boston: ACM.

Leite, I., Pereira, A., Castellano, G., Mascarenhas, S., Martinho, C., & Paiva, A. (2012a). Modelling empathy in social robotic companions. In: Advances in user modeling: Selected papers from UMAP 2011 workshops. Lecture notes in computer science (Vol. 7138). New York: Springer.

Leite, I., Pereira, A., Martinho, C., & Paiva, A. (2008). Are emotional robots more fun to play with? In: Robot and human interactive communication, 2008. RO-MAN 2008. The 17th IEEE international symposium on (pp. 77–82).


Leite, I., Castellano, G., Pereira, A., Martinho, C., & Paiva, A. (2014). Empathic robots for long-term interaction. International Journal of Social Robotics, 6(3), 1–13.

Malta, L., Miyajima, C., & Takeda, K. (2008). Multimodal estimation of a driver's affective state. In: Workshop on affective interaction in natural environments (AFFINE), ACM international conference on multimodal interfaces (ICMI '08). Chania.

Mandryk, R., Inkpen, K., & Calvert, T. (2006). Using psychophysiological techniques to measure user experience with entertainment technologies. Behaviour and Information Technology, 25(2), 141–158.

Martinez, H. P., & Yannakakis, G. N. (2011). Mining multimodal sequential patterns: A case study on affect detection. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.

Martinho, C., & Paiva, A. (2006). Using anticipation to create believable behaviour. In: American Association for Artificial Intelligence technical conference (pp. 1–6). Boston, July 2006.

Mavridis, N., Katsaiti, M.-S., Naef, S., Falasi, A., Nuaimi, A., Araifi, H., et al. (2012). Opinions and attitudes towards humanoid robots in the Middle East. Springer Journal of AI and Society, 27(4), 517–534.

Mead, R., Atrash, A., & Mataric, M. J. (2011). Recognition of spatial dynamics for predicting social interaction. In: Proceedings of the 6th international conference on human–robot interaction (HRI '11) (pp. 201–202). New York: ACM.

Mendelson, M. J., & Aboud, F. E. (1999). Measuring friendship quality in late adolescents and young adults: McGill friendship questionnaires. Canadian Journal of Behavioural Science, 31(1), 130–132.

Morency, L.-P., de Kok, I., & Gratch, J. (2008). Context-based recognition during human interactions: Automatic feature selection and encoding dictionary. In: ACM international conference on multimodal interfaces (ICMI '08) (pp. 181–188). Chania.

Ochs, M., Niewiadomski, R., Brunet, P., & Pelachaud, C. (2011). Smiling virtual agent in social context. In: Cognitive processing. Special issue on "Social agents. From theory to applications", September 2011.

Peters, C., Asteriadis, S., & Karpouzis, K. (2010). Investigating shared attention with a virtual agent using a gaze-based interface. Journal on Multimodal User Interfaces, 3(1–2), 119–130.

Poel, M., Nijboer, F., van den Broek, E. L., Fairclough, S., & Nijholt, A. (2012). Brain computer interfaces as intelligent sensors for enhancing human–computer interaction. In: 14th ACM international conference on multimodal interaction (pp. 379–382). ACM.

Poggi, I. (2007). Mind, hands, face and body. A goal and belief view of multimodal communication. Berlin: Weidler.

Rani, P., & Sarkar, N. (2005). Operator engagement detection and robot behavior adaptation in human–robot interaction. In: Proceedings of the 2005 IEEE international conference on robotics and automation (ICRA 2005) (pp. 2051–2056). IEEE.

Rich, C., Ponsler, B., Holroyd, A., & Sidner, C. L. (2010). Recognizing engagement in human–robot interaction. In: HRI '10: Proceedings of the 5th ACM/IEEE international conference on human–robot interaction (pp. 375–382). New York: ACM.

Riek, L., Mavridis, N., Antali, S., Darmaki, N., Ahmed, Z., Al-Neyadi, M., & Alketheri, A. (2010). Ibn Sina steps out: Exploring Arabic attitudes toward humanoid robots. In: Proceedings of the symposium on new frontiers in human–robot interaction (SSAISB), the 36th annual convention of the Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB 2010) (pp. 88–94).

Riek, L., & Robinson, P. (2011). Challenges and opportunities in building socially intelligent machines. IEEE Signal Processing, 28(3), 146–149.

Sabourin, J., Mott, B., & Lester, J. (2011). Modeling learner affect with theoretically grounded dynamic Bayesian networks. In: Proceedings of the 4th international conference on affective computing and intelligent interaction (ACII '11). New York: Springer.

Saerbeck, M., Schut, T., Bartneck, C., & Janse, M. (2010). Expressive robots in education—Varying the degree of social supportive behavior of a robotic tutor. In: Proceedings of the 28th ACM conference on human factors in computing systems (CHI 2010) (pp. 1613–1622). Atlanta: ACM.

Sanghvi, J., Castellano, G., Leite, I., Pereira, A., McOwan, P. W., & Paiva, A. (2011). Automatic analysis of affective postures and body motion to detect engagement with a game companion. In: ACM/IEEE international conference on human–robot interaction. Lausanne: ACM.

Scassellati, B., Admoni, H., & Mataric, M. (2012). Robots for use in autism research. Annual Review of Biomedical Engineering, 14, 275–294.

Sidner, C. L., Kidd, C. D., Lee, C. H., & Lesh, N. B. (2004). Where to look: A study of human–robot engagement. In: IUI '04: Proceedings of the 9th international conference on intelligent user interfaces (pp. 78–84). New York: ACM.

van Breemen, A., Yan, X., & Meerbeek, B. (2005). iCat: An animated user-interface robot with personality. In: AAMAS '05: Proceedings of the fourth international joint conference on autonomous agents and multiagent systems (pp. 143–144). New York: ACM.

von der Putten, A. M., Kramer, N. C., & Eimler, S. C. (2011). Living with a robot companion: Empirical study on the interaction with an artificial health advisor. In: Proceedings of the 13th international conference on multimodal interaction (ICMI '11). New York: ACM.

Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39–58.

Ginevra Castellano is an Associate Senior Lecturer at the Department of Information Technology, Uppsala University, where she leads the Social Robotics Lab. Her research interests are in the areas of social robotics and affective computing. She was the coordinator of the EU FP7 EMOTE (EMbOdied-perceptive Tutors for Empathy-based learning) project (2012–2016). She is the recipient of a project grant for young researchers awarded by the Swedish Research Council (2016–2019) and co-investigator of the COIN (Co-adaptive human-robot interactive systems) project, funded by the Swedish Foundation for Strategic Research (2016–2021). She was an invited speaker at the Summer School on Social Human–Robot Interaction 2013; she is a member of the management board of the Association for the Advancement of Affective Computing (AAAC); co-founder and co-chair of the AFFINE (Affective Interaction in Natural Environments) workshops; co-editor of special issues of the Journal on Multimodal User Interfaces and the ACM Transactions on Interactive Intelligent Systems; Program Committee member of several international conferences, including ACM/IEEE HRI, IEEE/RSJ IROS, ACM ICMI, IEEE SocialCom, AAMAS, IUI, ACII.


Iolanda Leite is an Associate Research Scientist at Disney Research, Pittsburgh. She received her Ph.D. in Information Systems and Computer Engineering from Instituto Superior Técnico, University of Lisbon, in 2013. During her Ph.D., she was a Research Associate at the Intelligent Agents and Synthetic Characters Group of INESC-ID in Lisbon. From 2013 to 2015, she was a Postdoctoral Associate at the Yale Social Robotics Lab, conducting research within the NSF Expedition on Socially Assistive Robotics. Her doctoral dissertation, "Long-term Interactions with Empathic Social Robots", received an honorable mention in the IFAAMAS-13 Victor Lesser Distinguished Dissertation Award. Iolanda received the Best Student Paper Award at the International Conference on Social Robotics in 2012 and the Best Submission Award at the ACM Special Interest Group on Artificial Intelligence (SIGAI) Career Conference in 2015. Her main research interests lie in the areas of child-robot interaction, artificial intelligence and affective computing.

Ana Paiva is the research group leader of GAIPS at INESC-ID and a Full Professor at Instituto Superior Tecnico, University of Lisbon. She is well known in the area of Artificial Intelligence Applied to Education, Intelligent Agents and Affective Computing. Her research is focused on the affective elements in the interactions between users and computers, with concrete applications in robotics, education and entertainment. She served as a member of numerous international conferences and workshops. She has (co)authored over 100 publications in refereed journals, conferences and books. She was a founding member of the Kaleidoscope Network of Excellence SIG on Narrative and Learning Environments, and has been very active in the area of Synthetic Characters and Intelligent Agents. She coordinated the participation of INESC-ID in several European projects, such as the Safira project (IST 5th Framework), where she was the prime contractor, the VICTEC project in the area of virtual agents to help victims of bullying, the LIREC project (in the 7th Framework) in the area of robotic companions, the SIREN project in the area of games to teach conflict resolution, and recently the EMOTE project in the area of empathic robotic tutors.
