Beyond Basic Emotions: Expressive Virtual Actors with Social Attitudes
Adela Barbulescu, Remi Ronfard, Gerard Bailly, Georges Gagnere and Huseyin Cakmak
-
Purpose of study
Expressive speech animation, talking heads, complex mental states
How to encode the mental states of a talking character?
-
Basic emotions
Paul Ekman, An argument for basic emotions, 1992
-
Emotions in speech animation
Bregler et al., Mood swings: expressive speech animation, 2005 (Neutral, Happy and Angry)
Busso et al., Rigid head motion in expressive speech animation: analysis and synthesis, 2007 (Neutral, Angry, Happy and Sad)
Albrecht et al., Mixed feelings: expression of non-basic emotions in a muscle-based talking head, 2005 (24 categories: gloating, relief, pride, reproach, etc.)
[Figure: Bregler et al., 2005]
-
Motivation
Virtual actors performing an expressive dialogue
file:///C:/work/Desktop/sub.avi
-
Theoretical terms
Prosody = the style of speech (intonation)
Graf et al., Visual prosody: facial movements accompanying speech, 2002
Levine et al., Real-time prosody-driven synthesis of body language, 2009
Social attitude (e.g. comforting, ironic, doubtful)
Bolinger, Intonation and its uses: Melody in grammar and discourse, 1989
« How we feel when we say (emotions) and how we feel about what we say (attitudes) » [Bolinger, 1989]
-
Our solution
Study of prosodic features: voice pitch, speech rhythm, head movements
Discrete set of social attitudes
Approaches: distance metric, perceptual evaluation tests
-
Theoretical framework
Prosodic features present attitude-specific signatures that depend on sentence length [Morlec, 2001]
F0 and head motion are encoded as sentence-level contours for sentences of 1, 2, 5, 9, 11 and 13 syllables
Declarative, Question, Disbelieving
file:///C:/work/Dropbox/vids/cnt_1.avi
file:///C:/work/Dropbox/vids/cnt_3.avi
file:///C:/work/Dropbox/vids/cnt_10.avi
-
Expressive corpus
Captured with Faceshift: 1 director + 2 actors
35 identical phrases
13 attitudes from Mind Reading [Baron-Cohen, 2004] + Declarative, Interrogative, Exclamative
-
Attitudes in corpus
Declarative, Interrogative, Fond-liking, Comforting, Seductive, Fascinated, Jealous, Thinking, Disbelieving, Sarcastic, Scandalized, Dazed
-
Prosodic representation
Segmentation and annotation with Praat
Voice pitch: F0 extraction, 3 values / syllable (at 10, 50 and 80% of the vocalic nucleus)
Head movements: extracted using Faceshift, 3 values / syllable
Rhythm: duration factor using the elastic syllable model, 1 value / syllable
These features are used to compute a distance metric
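The per-syllable representation above can be sketched as follows, assuming F0 is available as (time, Hz) samples and the vocalic nucleus boundaries come from the Praat annotation. The duration factor uses one common formulation of the elastic syllable model (Campbell's elasticity hypothesis: all phones of a syllable stretch by the same number of standard deviations in the log domain); the function names and the per-phone (mu, sigma) statistics are hypothetical, and this is not necessarily the exact model used in the study.

```python
import bisect
import math


def interp(times, values, t):
    """Linearly interpolate a sampled contour (e.g. F0 in Hz) at time t (seconds)."""
    i = bisect.bisect_left(times, t)
    if i == 0:
        return values[0]
    if i == len(times):
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def sample_nucleus(times, values, start, end, positions=(0.10, 0.50, 0.80)):
    """Return 3 values per syllable, taken at 10, 50 and 80% of the vocalic nucleus."""
    return [interp(times, values, start + p * (end - start)) for p in positions]


def duration_factor(observed, phone_stats, lo=-10.0, hi=10.0, iters=60):
    """Return the single stretch factor k of one syllable (elasticity hypothesis).

    Each phone is assumed to last exp(mu + k * sigma) seconds, where (mu, sigma)
    are hypothetical log-duration statistics with sigma > 0. k is found by
    bisection so that the phone durations sum to the observed syllable duration.
    """
    def predicted(k):
        return sum(math.exp(mu + k * sigma) for mu, sigma in phone_stats)

    for _ in range(iters):  # predicted(k) increases with k, so bisection converges
        mid = (lo + hi) / 2
        if predicted(mid) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    f0_times = [0.00, 0.05, 0.10, 0.15, 0.20]
    f0_values = [180.0, 200.0, 220.0, 210.0, 190.0]
    print(sample_nucleus(f0_times, f0_values, 0.02, 0.18))
    stats = [(math.log(0.06), 0.3), (math.log(0.09), 0.25)]  # hypothetical values
    print(duration_factor(0.20, stats))  # positive: the syllable is lengthened
```

A positive factor means the syllable is longer than its phones' average durations predict, a negative one that it is compressed; one such value per syllable gives the rhythm feature.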
-
Evaluation paradigm
Distance metric (data analysis): inter-class distances based on the prosodic features
Perceptual evaluation: create 3 types of material and carry out perceptual tests for each type
Results comparison
-
Objective evaluation
Euclidean distances between sentences of equal length (same number of syllables), using normalized F0 and PCA components of head rotation and translation
K-nearest-neighbor framework; results to be compared with those of the perceptual tests
[Figure: confusion matrices (Audio-only, Visual-only, Audio-visual) over 16 classes: 1 declarative, 2 exclamative, 3 interrogative, 4 comforting, 5 fond-liking, 6 seductive, 7 fascinated, 8 jealous, 9 thinking, 10 incredulous, 11 sarcastic, 12 scandalized, 13 dazed, 14 responsible, 15 hurt, 16 embarrassed]
Recognition rates: Audio-only 30.60%, Visual-only 24.14%, Audio-visual 40.95%
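The objective evaluation can be sketched as a leave-one-out k-nearest-neighbor classifier over fixed-length feature vectors, one per sentence, comparing only sentences of equal syllable count. The feature layout, the value of k and the tie-breaking rule are assumptions for illustration, not the settings used in the study; the sketch returns a recognition rate and a confusion matrix comparable in form to the figures above.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_leave_one_out(features, labels, k=3):
    """Leave-one-out k-NN over sentences of equal syllable count.

    features: one fixed-length vector per sentence (hypothetical layout, e.g.
    normalized F0 samples followed by head-motion PCA coefficients).
    Returns (recognition_rate, classes, confusion) with confusion[true][predicted].
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    confusion = [[0] * len(classes) for _ in classes]
    correct = 0
    for i, x in enumerate(features):
        # Distances to every other sentence; the test sentence itself is left out.
        neighbours = sorted(
            (euclidean(x, y), labels[j]) for j, y in enumerate(features) if j != i
        )[:k]
        votes = Counter(label for _, label in neighbours)
        top = max(votes.values())
        # Majority vote; ties go to the tied label whose neighbour is closest.
        predicted = next(label for _, label in neighbours if votes[label] == top)
        confusion[index[labels[i]]][index[predicted]] += 1
        correct += int(predicted == labels[i])
    return correct / len(features), classes, confusion


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
    labs = ["comforting"] * 3 + ["sarcastic"] * 3
    print(knn_leave_one_out(feats, labs, k=3))
```

Running it separately on audio-only, visual-only and concatenated audio-visual vectors would yield the three conditions compared in the figure.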
-
Material 1
Original audio and original video
Sarcastic, Fascinated, Fond-liking
file:///C:/work/Dropbox/test/2_3.mp4
file:///C:/work/Dropbox/vids/8_3.wmv
file:///C:/work/Dropbox/vids/16_3.wmv
-
Material 2
Original audio and motion capture (animation platform)
Sarcastic, Fascinated, Fond-liking
file:///C:/work/Dropbox/test2/data3/30_11.mp4
file:///C:/work/Dropbox/vids/27_7_2.wmv
file:///C:/work/Dropbox/vids/27_5_2.wmv
-
Material 3
Resynthesis of prosody: head motion, pitch and rhythm from the expressive performance; all other parameters from the neutral performance
Sarcastic, Fascinated, Fond-liking
file:///C:/work/Dropbox/test3/data3/30_11.mp4
file:///C:/work/Dropbox/vids/27_7_9.wmv
file:///C:/work/Dropbox/vids/27_5_3.wmv
-
Material 3
Audio resynthesis: TD-PSOLA (Time-Domain Pitch-Synchronous Overlap and Add), which moves, deletes or duplicates short-time signals
[Figure: analysed speech vs. synthesized speech]
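The TD-PSOLA principle can be sketched as follows, assuming the analysis pitch marks (one per glottal period, in samples, at least two) are already known and a constant pitch-scaling factor is applied with unchanged duration. Two-period Hann-windowed grains are cut around analysis marks and overlap-added at synthesis marks spaced by the new period; picking the nearest analysis grain for each synthesis mark is what duplicates or deletes short-time signals. The function names and the constant-factor simplification are illustrative assumptions; the resynthesis in the study transplants full expressive pitch and rhythm contours.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, marks, pitch_factor):
    """Pitch-scale `signal` by `pitch_factor` (> 1 raises pitch), keeping its duration.

    `marks` are strictly increasing analysis pitch marks (sample indices).
    """
    out = [0.0] * len(signal)
    t = float(marks[0])
    while t < marks[-1]:
        # Nearest analysis mark: grains get duplicated or skipped as needed.
        i = min(range(len(marks)), key=lambda m: abs(marks[m] - t))
        # Local period estimated from the neighbouring analysis mark.
        if i + 1 < len(marks):
            period = marks[i + 1] - marks[i]
        else:
            period = marks[i] - marks[i - 1]
        window = hann(2 * period + 1)
        centre = int(round(t))
        # Overlap-add the windowed grain, re-centred on the synthesis mark.
        for k in range(-period, period + 1):
            src, dst = marks[i] + k, centre + k
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += window[k + period] * signal[src]
        # Synthesis marks move closer together when the pitch is raised.
        t += period / pitch_factor
    return out


if __name__ == "__main__":
    period = 100
    sig = [math.sin(2 * math.pi * (n % period) / period) for n in range(1000)]
    marks = list(range(50, 1000, period))
    raised = td_psola(sig, marks, 2.0)  # one octave up, same length
    print(len(raised))
```

With a factor of 1 the Hann grains overlap-add back to the original signal between the marks; with a factor of 2 the output repeats every half period.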
-
Material 3
Visual resynthesis: DTW (Dynamic Time Warping) from neutral to expressive
Cubic spline interpolation for translations, quaternion interpolation for rotations
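The DTW step can be sketched as a classic dynamic-programming alignment between the neutral and expressive performances, assuming each is a sequence of per-frame feature vectors compared with the Euclidean distance. The returned path maps neutral frames onto expressive frames and could then drive the resampling of neutral parameters; the feature choice and names are assumptions, not the study's exact setup.

```python
import math


def dtw_path(a, b):
    """Align a (neutral) and b (expressive), two sequences of feature vectors.

    Returns (total_cost, path) where path lists (i, j) frame-index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell, always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],  # diagonal first: preferred on ties
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    neutral = [[0.0], [1.0], [2.0]]
    expressive = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same gesture, slower
    print(dtw_path(neutral, expressive))
```

Repeated neutral indices in the path correspond to frames that must be stretched to match the slower expressive timing.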
[Figure: rotation (quaternion x component) and translation (x) curves]
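For the rotation track, quaternion interpolation can be sketched with spherical linear interpolation (slerp) between consecutive key poses, assuming unit quaternions stored as (w, x, y, z) with strictly increasing key times. The function names are hypothetical; translations would analogously be resampled with cubic splines, which this sketch does not cover.

```python
import bisect
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z), 0 <= t <= 1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q encode the same rotation: flip to take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: normalized linear interpolation is stable
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def resample_rotations(times, quats, new_times):
    """Resample a rotation track (one unit quaternion per key time) at new_times."""
    out = []
    for t in new_times:
        t = min(max(t, times[0]), times[-1])  # clamp to the recorded range
        k = min(bisect.bisect_right(times, t) - 1, len(times) - 2)
        u = (t - times[k]) / (times[k + 1] - times[k])
        out.append(slerp(quats[k], quats[k + 1], u))
    return out


if __name__ == "__main__":
    h = math.sqrt(0.5)
    identity, yaw90 = (1.0, 0.0, 0.0, 0.0), (h, 0.0, 0.0, h)
    print(resample_rotations([0.0, 1.0], [identity, yaw90], [0.0, 0.5, 1.0]))
```

Slerp keeps a constant angular velocity between keys, which avoids the shrinking and speed artefacts of interpolating quaternion components independently.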
-
Auto-evaluation
Material 1 (original audio and video), 3 participants; best results: female actor
[Figure: confusion matrices (Audio-only, Visual-only, Audio-visual) over 16 classes: 1 declarative, 2 exclamative, 3 interrogative, 4 comforting, 5 fond-liking, 6 seductive, 7 fascinated, 8 jealous, 9 thinking, 10 incredulous, 11 sarcastic, 12 scandalized, 13 dazed, 14 responsible, 15 hurt, 16 embarrassed]
Recognition rates: Audio-only 78.12%, Visual-only 81.25%, Audio-visual 78.12%
-
User study: Material 1
Material 1 (original audio and video), 84 participants
[Figure: confusion matrices (Audio-only, Visual-only, Audio-visual) over 16 classes: 1 declarative, 2 exclamative, 3 interrogative, 4 comforting, 5 fond-liking, 6 seductive, 7 fascinated, 8 jealous, 9 thinking, 10 incredulous, 11 sarcastic, 12 scandalized, 13 dazed, 14 responsible, 15 hurt, 16 embarrassed]
Recognition rates: Audio-only 30.98%, Visual-only 35.47%, Audio-visual 36.90%
-
User study: Material 2
Material 2 (original audio and motion capture), 42 participants
[Figure: confusion matrices (Audio-only, Visual-only, Audio-visual) over 15 classes: 1 declarative, 2 exclamative, 3 interrogative, 4 comforting, 5 fond-liking, 6 seductive, 7 fascinated, 8 jealous, 9 thinking, 10 incredulous, 11 sarcastic, 12 scandalized, 13 dazed, 14 responsible, 15 embarrassed]
Recognition rates: Audio-only 26.00%, Visual-only 17.73%, Audio-visual 31.72%
-
User study: Material 3
Material 3 (resynthesis of prosody), 13 participants
[Figure: confusion matrices (Audio-only, Visual-only, Audio-visual) over 13 classes: 1 declarative, 2 responsible, 3 embarrassed, 4 comforting, 5 fond-liking, 6 seductive, 7 fascinated, 8 jealous, 9 thinking, 10 incredulous, 11 sneaky, 12 scandalized, 13 dazed]
Recognition rates: Audio-only 15.58%, Visual-only 11.65%, Audio-visual 16.96%
-
Material 1: Results per attitude
[Chart: per-attitude results for Declarative, Interrogative, Fond-liking, Comforting, Seductive, Fascinated, Jealous, Thinking, Disbelieving, Sarcastic, Scandalized, Dazed, Embarrassed, Responsible, Exclamative, Hurt]
Values: 0.170, 0.066, 0.788, 0.290, 0.368, 0.761, 0.391, 0.167, 0.814, 0.214, 0.640, 0.264, 0.357, 0.273, 0.150, 0.617
-
Material 2: Results per attitude
[Chart: per-attitude results for Declarative, Interrogative, Fond-liking, Comforting, Seductive, Fascinated, Jealous, Thinking, Disbelieving, Sarcastic, Scandalized, Dazed, Embarrassed, Responsible, Exclamative]
Values: 0.235, 0.128, 0.602, 0.253, 0.372, 0.550, 0.286, 0.227, 0.424, 0.181, 0.313, 0.554, 0.219, 0.040, 0.540
-
Material 3: Results per attitude
[Chart: per-attitude results for Declarative, Fond-liking, Comforting, Seductive, Fascinated, Jealous, Thinking, Disbelieving, Sarcastic, Scandalized, Dazed, Embarrassed, Responsible]
Values: 0.124, 0.105, 0.143, 0.091, 0.231, 0.067, 0.379, 0.135, 0.222, 0.333, 0.200, 0.000, 0.192
-
Average results and conclusions
Test type         Audio-only   Visual-only   Audio-visual
Auto-evaluation   62.50%       73.96%        73.96%
Material 1        30.98%       35.47%        36.90%
Material 2        26.00%       17.73%        31.72%
Material 3        15.58%       11.65%        16.96%
Objective         30.60%       24.14%        40.95%
Generally better results for audio-visual presentation
Visual-only rates decrease when animation is used (Materials 2 and 3)
Results of objective and subjective tests are comparable for audio-only
F0 and speech rhythm present discriminant signatures
Head movement alone is not sufficient
-
Future work
Improve retargeting (Material 2); improve resynthesis (Material 3)
Blend prosodic and non-prosodic features; learn prosodic signatures
Investigate other prosodic features: audio (intensity), video (eyebrow movements, eye gaze, eye blinks)
Generate expressive dialogue
-
Thank you!