A Neural Network Model for Human Facial Expression Recognition · IJCNN ’99 · Computer Science and Engineering · Gary’s Unbelievable Research Unit (GURU) · Institute for Neural Computation · UC San Diego · Matthew N. Dailey · Garrison W. Cottrell


Post on 06-Aug-2020


Source: cs.ait.ac.th/~mdailey/papers/IJCNN-99-Slides.pdf

A Neural Network Model for Human Facial Expression Recognition

IJCNN ’99

Computer Science and Engineering

Gary’s Unbelievable Research Unit (GURU)

Institute for Neural Computation

UC San Diego

Matthew N. Dailey

Garrison W. Cottrell


Why Cognitive Modeling?

• Facial muscle movements can be observed objectively.

• But facial emotion is subjective.

• Machine recognition of emotional facial expressions depends on robustly detecting the muscle movement combinations that humans reliably identify as Fear, Sadness, etc.

• Therefore, robust, practical facial emotion recognition systems should be informed by human perceptual data.

• Strategy:
– Build simple systems that model human psychological data.

– Use the models to guide psychological research.

– Eventually, transfer the knowledge to practical systems.


Motivation

• Our goal: understanding the basis of human perception of facial expression through cognitive modeling.

• There are two classic theories of facial emotion perception: categorical and dimensional.

• Young et al. (1997) facial expression “megamix” experiments using emotion morph stimuli provide evidence partially supporting both theories.

• Our theory: the data on perception of facial affect can largely be explained by the computational requirements of associating facial expressions with emotion labels.

• Our successful facial expression recognition system accounts for both categorical and dimensional perceptual data with the same mechanism.


A Good Cognitive Model Should:

• Be psychologically relevant (i.e., it should address an area with a lot of real, interesting psychological data).

• Actually be implemented.

• If possible, perform the actual task of interest rather than a cartoon version of it.

• Be simplifying (i.e., it should be constrained to the essential features of the problem at hand).

• Fit the experimental data.

• Make new predictions to guide psychological research.


Motivation

• Our goal: understanding the basis of human perception of facial expression through cognitive modeling.

• There are two classic theories of facial emotion perception: categorical and multidimensional.

• Young et al. (1997) facial expression “megamix” experiments provide evidence partially supporting both theories.

• Our theory: the data on perception of facial affect can largely be explained by the computational requirements of associating facial expressions with emotion labels.

• Our facial expression recognition system accounts for both categorical and multidimensional data with one mechanism.


Dimensional Theories of Emotion

• Multidimensional scaling (MDS) of human similarity judgments usually leads to a two-dimensional emotion “circumplex,” with similar expressions closer together.

• Perceptual space is low-dimensional and continuous.

[Figure: two-dimensional emotion circumplex with Anger, Happiness, Fear, Surprise, Sadness, and Disgust arranged around the space]

• Predictions for perception:
– Morphs along chords should produce intrusions.
– Morphs across the center of the space should travel through Neutral.

cf. Russell (1980)
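To make the two dimensional-theory predictions concrete, here is a toy Python sketch that is not part of the original work. It assumes six expressions evenly spaced on a unit circle in an arbitrary, hypothetical order (not Russell’s fitted coordinates), places Neutral at the center, and labels each step of a straight-line morph by its nearest prototype. The names `EMOTIONS`, `nearest_label`, and `morph_labels` are illustrative.

```python
import numpy as np

# Hypothetical circumplex: six expressions evenly spaced on a unit circle.
# The ordering and the 60-degree spacing are illustrative assumptions.
EMOTIONS = ["Happiness", "Surprise", "Fear", "Sadness", "Disgust", "Anger"]
ANGLES = np.deg2rad(np.arange(0, 360, 60))
PROTOTYPES = {name: np.array([np.cos(a), np.sin(a)]) for name, a in zip(EMOTIONS, ANGLES)}
NEUTRAL = np.zeros(2)  # Neutral assumed to sit at the center of the space


def nearest_label(point, include_neutral):
    """Label a point by its nearest prototype (6-way, or 7-way with Neutral)."""
    candidates = dict(PROTOTYPES)
    if include_neutral:
        candidates["Neutral"] = NEUTRAL
    return min(candidates, key=lambda name: np.linalg.norm(point - candidates[name]))


def morph_labels(start, end, include_neutral=False, steps=11):
    """Interpolate linearly between two prototypes and label each morph step."""
    a, b = PROTOTYPES[start], PROTOTYPES[end]
    return [nearest_label((1.0 - t) * a + t * b, include_neutral)
            for t in np.linspace(0.0, 1.0, steps)]


# A chord between non-neighbors passes close to the expression between them.
print(morph_labels("Happiness", "Fear"))
# A morph across the center passes through Neutral in a 7-way choice.
print(morph_labels("Happiness", "Sadness", include_neutral=True))
```

With this layout, the Happiness-to-Fear chord picks up a run of Surprise labels in the middle (an intrusion), and the Happiness-to-Sadness morph crosses Neutral when Neutral is an allowed response.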


Categorical Theories of Emotion

• Categorical perception (e.g., colors in a rainbow):

– Sharp boundaries: a small physical change in the stimulus leads to a big change in classification at the category boundary.

– Discrimination is better near category boundaries than near prototypes.

– Response times are slower near category boundaries.

– Can be innate (e.g., color categories) or learned (e.g., the boundary between the /p/ and /b/ phonemes, or the boundary between Clinton and Kennedy in a morph sequence).

• Predictions for perception of facial expression morphs:

– No intrusions of other expressions.

– No difference between chord and bisector morph sequences.



Young et al. (1997) Megamix: Methods

• Created morphs between all possible pairs of the six basic expressions and Neutral. Example:

• Presented stimuli in random order to subjects, 6-way or 7-way forced choice (with or without Neutral).

[Figure: Fear and Sadness prototypes with morphs between them (0%, 10%, 30%, 50%, 70%, 90%, 100%); plot “Fear to Sadness Morph Transition” of % Identified (0–100) as Happy, Surprise, Fear, Sad, Disgust, Anger, or Neutral against % Fear in Morph (90% to 10%)]


Megamix Identification Results

• All transitions have relatively sharp boundaries. Here are 6:

[Figure: identification curves along the continuum Happiness → Surprise → Fear → Sadness → Disgust → Anger → Happiness]

• Only a few very small “intrusions” of unrelated expressions appear in the transitions.


Megamix: More Data Supporting Categorical Perception (CP)

• Subjects showed poor discrimination ability near emotion prototypes and better discrimination ability near transitions.

• Response times were faster near prototypes and slower near transitions (“scallop” shaped):

[Figure: response times along the continuum Happiness → Surprise → Fear → Sadness → Disgust → Anger → Happiness]


Less “Categorical” Data

• Subjects were above chance at detecting the “mixed-in” expression when it was present at 30%.

• Despite seemingly categorical effects in perception of the morphs, the second expression is still detectable.

[Figure: “Mixed-In Expression Detection”: Average Rank Score (−0.5 to 1.5) against Percent Mixing (10, 30, 50) for the mixed-in expression (humans) and unrelated expressions (humans)]

• Most apparent expression: score = 3.0.

• 2nd/3rd most apparent: score = 2.0/1.0.

• Normalized for response bias.
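A minimal sketch of the 3/2/1 rank scoring, assuming each trial yields one graded value per label (for a network, its six outputs). The response-bias normalization is not specified on the slide and is deliberately omitted; `LABELS`, `rank_scores`, and `average_rank_score` are hypothetical names.

```python
import numpy as np

LABELS = ["Happiness", "Surprise", "Fear", "Sadness", "Disgust", "Anger"]


def rank_scores(outputs):
    """Score 3/2/1 for the largest, 2nd-largest, and 3rd-largest values; 0 otherwise."""
    outputs = np.asarray(outputs, dtype=float)
    scores = np.zeros_like(outputs)
    top3 = np.argsort(outputs)[::-1][:3]  # indices of the three largest values
    scores[top3] = [3.0, 2.0, 1.0]
    return scores


def average_rank_score(trials, label):
    """Mean (unnormalized) rank score that one label receives over many trials."""
    idx = LABELS.index(label)
    return float(np.mean([rank_scores(t)[idx] for t in trials]))


trial = [0.05, 0.10, 0.60, 0.20, 0.03, 0.02]  # hypothetical graded responses
print(dict(zip(LABELS, rank_scores(trial))))  # Fear 3, Sadness 2, Surprise 1, others 0
```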



The Model’s Facial Expression Database

• Ekman and Friesen proposed a quantification of the prototypical muscle movement combinations (Facial Actions) involved in portraying happiness, sadness, fear, anger, surprise, and disgust.
– Result: the Pictures of Facial Affect (POFA, 1976).

– 70% agreement on emotional content by naive human subjects.

• 110 photos, 14 subjects, 7 expressions.

[Figure: actor “JJ” posing Surprised, Happy, Sad, Afraid, Angry, Disgusted, and Neutral]


Model: Expression Transitions

• Young et al. tested their subjects on morphs between pairs of the six “basic” expressions and Neutral.

• They used Ekman and Friesen’s actor “JJ.”

• We recreated the morph stimuli used in their study with commercial morphing software and the same JJ photos:

[Figure: example morph sequences among Fear, Surprise, Sadness, and Disgust]


Representation: Gabor Jets

• Based on the 2-D Gabor filter (Daugman, 1985).

• A 2-D sinusoidal wave localized by a Gaussian envelope.

• Combining kernels at multiple spatial frequencies and orientations forms a “jet.”

• Good for object recognition (Lades et al., 1993), face recognition (Wiskott et al., 1997), and classification of individual facial actions (Bartlett, 1998).
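A sketch of a complex 2-D Gabor kernel and a 40-value jet (8 orientations × 5 spatial frequencies) at one pixel, using NumPy. The wavelengths (4 to 16 pixels in half-octave steps), the envelope width (half the wavelength), and the 49 × 49 kernel support are assumptions for illustration, and this plain Gabor form omits the DC correction used in some jet implementations. Taking the magnitude of the complex response combines the cosine and sine quadrature pair into a phase-insensitive value; because a Gabor kernel flipped through the origin is its complex conjugate, this correlation gives the same magnitude as a true convolution.

```python
import numpy as np


def gabor_kernel(wavelength, theta, sigma, half_size):
    """Complex 2-D Gabor kernel: a plane wave under a Gaussian envelope.

    The real part is the cosine (even) filter, the imaginary part the sine (odd) filter.
    """
    coords = np.arange(-half_size, half_size + 1)
    x, y = np.meshgrid(coords, coords)             # x varies along columns, y along rows
    x_rot = x * np.cos(theta) + y * np.sin(theta)  # position along the wave direction
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


def gabor_jet(image, row, col, n_orientations=8, n_scales=5, half_size=24):
    """Phase-insensitive Gabor magnitudes at one pixel (n_scales * n_orientations values)."""
    patch = image[row - half_size:row + half_size + 1, col - half_size:col + half_size + 1]
    if patch.shape != (2 * half_size + 1, 2 * half_size + 1):
        raise ValueError("location is too close to the image border")
    wavelengths = 4.0 * np.sqrt(2.0) ** np.arange(n_scales)  # assumed: 4 to 16 px
    jet = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(wavelength, theta, wavelength / 2.0, half_size)
            response = np.sum(patch * kernel)  # complex: cosine + i * sine response
            jet.append(np.abs(response))       # combine the quadrature pair
    return np.array(jet)


image = np.random.default_rng(0).random((64, 64))  # placeholder image, not a face
jet = gabor_jet(image, 32, 32)
print(jet.shape)  # (40,): 5 scales x 8 orientations
```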


Gabor Jet Extraction

• Convolve the image with the real (cosine) and imaginary (sine) parts of each Gabor kernel, then combine the quadrature pairs to get phase-insensitive Gabor magnitudes.

• Gabor “jet” = 8 orientations × 5 spatial frequencies at one location.

• Jets extracted on a rectangular 29 × 36 grid form a 41,760-element pattern vector (29 × 36 × 40).

• PCA (top 35 principal component eigenvectors, computed from non-JJ faces) reduces this to a 35-element pattern vector that retains 69% of the variance.
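The dimensionality arithmetic and the PCA step can be sketched as below. The sketch assumes the 29 × 36 grid counts jet locations, which is consistent with 29 × 36 × 40 = 41,760. It computes PCA with NumPy’s SVD on placeholder random vectors standing in for the non-JJ faces, so the retained-variance figure it prints will not match the 69% reported for real jets; `fit_pca` and `project` are hypothetical names.

```python
import numpy as np

GRID_ROWS, GRID_COLS = 29, 36    # jet locations (assumed to be what the grid counts)
N_ORIENTATIONS, N_SCALES = 8, 5
PATTERN_LENGTH = GRID_ROWS * GRID_COLS * N_ORIENTATIONS * N_SCALES  # 41,760


def fit_pca(train_vectors, n_components=35):
    """Fit PCA to training pattern vectors (one row per face).

    Returns the mean, the top principal directions (as rows), and the
    fraction of total variance they retain.
    """
    mean = train_vectors.mean(axis=0)
    centered = train_vectors - mean
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values**2
    retained = variance[:n_components].sum() / variance.sum()
    return mean, vt[:n_components], float(retained)


def project(vectors, mean, basis):
    """Reduce full-length pattern vectors to n_components coefficients each."""
    return (vectors - mean) @ basis.T


rng = np.random.default_rng(0)
non_jj = rng.random((99, PATTERN_LENGTH))  # placeholder rows, not real Gabor jets
mean, basis, retained = fit_pca(non_jj)
reduced = project(non_jj[:5], mean, basis)
print(PATTERN_LENGTH, reduced.shape, f"{retained:.0%} of variance retained")
```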


Feedforward Network Classification

• Pattern vectors are classified independently by several feedforward backpropagation networks. Combining their evidence improves generalization accuracy (3% for JJ).

• Individual networks: train on 70 random faces, reserving the remaining 29 for early stopping.

[Figure: a training set and a holdout set feed an individual network (softmax outputs, cross-entropy loss) with outputs for Happiness, Surprise, Fear, Sadness, Disgust, and Anger]
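One individual network can be sketched in NumPy as below. The slides specify only a feedforward backpropagation network with softmax outputs, cross-entropy loss, and early stopping on a holdout set; the single tanh hidden layer, its 10 units, full-batch gradient descent, the learning rate, and the patience rule are all assumptions, and the data are synthetic 35-dimensional clusters rather than real pattern vectors.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, targets):
    """Mean cross-entropy for integer class targets."""
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked + 1e-12)))


class Network:
    """Assumed architecture: one tanh hidden layer and a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return h, softmax(h @ self.w2 + self.b2)

    def predict(self, x):
        return self.forward(x)[1]

    def step(self, x, targets, lr):
        """One full-batch backpropagation step on the cross-entropy loss."""
        h, probs = self.forward(x)
        d_out = probs.copy()
        d_out[np.arange(len(targets)), targets] -= 1.0  # softmax + CE gradient wrt logits
        d_out /= len(targets)
        d_hidden = (d_out @ self.w2.T) * (1.0 - h**2)   # back through tanh
        self.w2 -= lr * (h.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.w1 -= lr * (x.T @ d_hidden)
        self.b1 -= lr * d_hidden.sum(axis=0)


def train_with_early_stopping(x_train, y_train, x_hold, y_hold, n_classes=6,
                              n_hidden=10, lr=0.1, max_epochs=2000, patience=50, seed=0):
    """Train on x_train; keep the weights with the lowest holdout loss."""
    rng = np.random.default_rng(seed)
    net = Network(x_train.shape[1], n_hidden, n_classes, rng)
    best_loss, best_params, waited = np.inf, None, 0
    for _ in range(max_epochs):
        net.step(x_train, y_train, lr)
        loss = cross_entropy(net.predict(x_hold), y_hold)
        if loss < best_loss:
            best_loss, waited = loss, 0
            best_params = [p.copy() for p in (net.w1, net.b1, net.w2, net.b2)]
        else:
            waited += 1
            if waited >= patience:
                break
    net.w1, net.b1, net.w2, net.b2 = best_params
    return net


# Synthetic stand-in for 35-element pattern vectors: 70 training, 29 holdout.
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 1.0, (6, 35))
labels = rng.integers(0, 6, 99)
x = centers[labels] + rng.normal(0.0, 0.5, (99, 35))
net = train_with_early_stopping(x[:70], labels[:70], x[70:], labels[70:])
print(cross_entropy(net.predict(x[70:]), labels[70:]))
```

Varying `seed` and the train/holdout split is one way to obtain members with different weights and training sets.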


Ensemble Combination

[Figure: stimulus → preprocessing (Gabor + PCA) → Networks 1 through 5 → averaging → outputs for Happiness, Surprise, Fear, Sadness, Disgust, and Anger]

• An experimental subject is modeled by an ensemble of 5 networks with different weights and training sets.

• We combine the outputs of the individual networks by averaging their (softmax) outputs.
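Ensemble averaging itself is a one-line operation. The sketch below assumes member outputs are stacked as (networks, stimuli, labels); because each member output is a probability distribution, their mean is one too. The Dirichlet draws are placeholder outputs, not network results, and `ensemble_output` is a hypothetical name.

```python
import numpy as np


def ensemble_output(member_outputs):
    """Average member networks' softmax outputs.

    member_outputs has shape (n_networks, n_stimuli, n_labels); each
    (network, stimulus) row is a probability distribution over labels.
    """
    return np.mean(np.asarray(member_outputs, dtype=float), axis=0)


rng = np.random.default_rng(0)
members = rng.dirichlet(np.ones(6), size=(5, 3))  # placeholder: 5 networks, 3 stimuli, 6 labels
combined = ensemble_output(members)
print(combined.shape)        # (3, 6)
print(combined.sum(axis=1))  # each row still sums to 1
```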


Measures for Cognitive Modeling

• A college student = one trained-up network ensemble.

• Identification in a six-way forced-choice experiment = the label on the ensemble’s maximal output.

• Identification response time = the ensemble’s uncertainty, i.e., the difference between 1.0 and its maximal output.

• Stimulus discrimination ability = dissimilarity = (1 − correlation) between the ensemble’s 6-dimensional output vectors for two stimuli.

• Scoring the 1st, 2nd, and 3rd most apparent expressions = recording the labels on the largest, 2nd-largest, and 3rd-largest ensemble outputs.
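The identification, response-time, and discrimination measures map directly onto a six-element output vector. A minimal sketch with hypothetical function names and example vectors; Pearson correlation is computed with `np.corrcoef`.

```python
import numpy as np

LABELS = ["Happiness", "Surprise", "Fear", "Sadness", "Disgust", "Anger"]


def identify(output):
    """Six-way forced choice: the label on the ensemble's maximal output."""
    return LABELS[int(np.argmax(output))]


def model_rt(output):
    """Model response time: uncertainty, i.e. 1.0 minus the maximal output."""
    return 1.0 - float(np.max(output))


def dissimilarity(output_a, output_b):
    """Model discriminability: 1 - Pearson correlation of two output vectors."""
    return 1.0 - float(np.corrcoef(output_a, output_b)[0, 1])


near_prototype = [0.02, 0.03, 0.90, 0.03, 0.01, 0.01]  # hypothetical outputs
near_boundary = [0.02, 0.03, 0.45, 0.48, 0.01, 0.01]
print(identify(near_prototype), identify(near_boundary))   # Fear Sadness
print(model_rt(near_prototype) < model_rt(near_boundary))  # True: slower near the boundary
print(dissimilarity(near_prototype, near_boundary))
```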


Modeling Results

• Trained 50 ensembles of 5 networks on all actors except JJ. 49 ensembles generalized perfectly to the JJ prototypes.

• Tested the networks’ responses to the JJ morph sequences.

• Good quantitative fit: r² = 0.76 with zero fit parameters.

• Small qualitative differences: slightly larger “intrusions” and less variance.

[Figure: model identification curves along the continuum Happiness → Surprise → Fear → Sadness → Disgust → Anger → Happiness]


Model Response Times

• The distance between an ensemble’s maximal output and 1.0 is a measure of its uncertainty.

• The model RTs show the same scallop-shaped patterns as the data.

[Figure: model response times along the continuum Happiness → Surprise → Fear → Sadness → Disgust → Anger → Happiness]


Model Discrimination

• The correlation (r) between an ensemble’s responses to a pair of stimuli models a similarity judgment; 1 − r models dissimilarity (discriminability).

• As in the human data, model discriminability is best near the category boundary.
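The boundary peak in model discriminability can be reproduced with a toy example that involves no trained network. The sketch assumes hypothetical outputs along a Fear-to-Sadness morph, produced by a softmax whose Fear and Sadness logits change linearly with morph level (the steepness of 10 is arbitrary), and computes 1 − r between neighboring morph steps. `toy_outputs` and `adjacent_dissimilarity` are illustrative names.

```python
import numpy as np

N_LABELS = 6
FEAR, SADNESS = 2, 3  # indices into a hypothetical six-label output vector


def toy_outputs(morph_levels, steepness=10.0):
    """Hypothetical ensemble outputs along a Fear -> Sadness morph.

    Fear's logit falls and Sadness's rises linearly with the morph level;
    a softmax turns the logits into a six-way output vector.
    """
    logits = np.zeros((len(morph_levels), N_LABELS))
    logits[:, FEAR] = steepness * (1.0 - morph_levels)
    logits[:, SADNESS] = steepness * morph_levels
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def adjacent_dissimilarity(outputs):
    """1 - r between the outputs for each pair of neighboring morph steps."""
    return np.array([1.0 - np.corrcoef(a, b)[0, 1]
                     for a, b in zip(outputs[:-1], outputs[1:])])


levels = np.linspace(0.0, 1.0, 11)  # fraction of Sadness in the morph: 0.0, 0.1, ..., 1.0
d = adjacent_dissimilarity(toy_outputs(levels))
print(np.round(d, 3))
print("most discriminable pair starts at", levels[np.argmax(d)])
```

Neighboring steps near either prototype give nearly identical outputs (1 − r close to 0), while the pairs straddling the 50% point give the largest dissimilarity. This illustrates how a smooth but steep classifier can show boundary-peaked discrimination.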


Perception of “Mixed-In” Expression

• We can score and normalize the first, second, and third largest network outputs as we did for the humans.

• Model scores for the mixed-in expression are very close to the human scores.

[Figure: “Mixed-In Expression Detection”: Average Rank Score (−0.5 to 1.5) against Percent Mixing (10, 30, 50) for the mixed-in and unrelated expressions, for both humans and networks]


Conclusions

• Much of the human perceptual data can be accounted for by a simple feedforward neural network that only learns to associate expressions with emotion labels.

• The fit is “easy” because of inherent properties of nonlinear classifiers.

• The model’s minor failings may be due to a lack of training data (there is little between-network variance).

• What have we learned from building this model?
– Neutral classification should be separate from the emotions.

– The fit to human data helps us choose between two systems with equivalent performance on the POFA prototypes.

– Ensemble classification improves accuracy.


Work in Progress

• Robust Neutral/Expressive classification without test-set snooping.

• Adding dynamic information to improve performance.

• New experiments exploring the “malleability” of expression category boundaries in humans and in networks.

• Collection of a large public database of emotional facial expression images and video sequences.