Emotion Recognition
Recognising Emotions from an Ensemble of Features
Introduction
Human-Robot Interaction is receiving increasing attention nowadays. To make robots that socialize with humans, understanding an individual's facial gestures and visual cues is essential.
It allows a robot to understand human expressions, in turn enhancing its effectiveness in performing various tasks.
It serves as a measurement system for behavioural science.
It enables socially intelligent software tools.
Challenges in Recognizing Emotions
Pose and Frequent head movements
Presence of structural components
Occlusion
Image orientation
Imaging conditions
Subtle facial deformation
Ambiguity and uncertainty in face motion measurement
Describing Facial Expressions
Quantitative Dynamics
Determine the amplitude of the expression in terms of intensity levels, where the levels correspond to some measure of the extent to which the expression is present on the face.
Temporal Dynamics
The expression is split into three temporal phases: onset, apex and offset.
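As a hedged sketch, the three phases can be segmented from a per-frame expression-intensity curve; the `apex_frac` threshold below is an assumed value, not one taken from this work:

```python
import numpy as np

def temporal_phases(intensity, apex_frac=0.9):
    """Label each frame onset/apex/offset from an expression-intensity curve.
    Frames at >= apex_frac of the peak intensity are 'apex'; earlier frames
    are 'onset', later ones 'offset'. apex_frac is an assumed threshold."""
    intensity = np.asarray(intensity, dtype=float)
    apex = intensity >= apex_frac * intensity.max()
    first = np.argmax(apex)                       # first apex frame
    last = len(apex) - 1 - np.argmax(apex[::-1])  # last apex frame
    labels = np.array(["onset"] * len(intensity), dtype=object)
    labels[first:last + 1] = "apex"
    labels[last + 1:] = "offset"
    return labels

print(temporal_phases([0.1, 0.4, 0.9, 1.0, 0.95, 0.5, 0.2]))
# ['onset' 'onset' 'apex' 'apex' 'apex' 'offset' 'offset']
```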
Methodology
Input Data Stage
- Static Image
- Video Sequence
Pre-Processing
- Head Pose Identification
- Face Tracking
- Facial Part Identification
Feature Extraction
- Localization of Facial Action Units
- Facial Point Tracking
- Evolving Feature Points
Feature Classification
- Machine Learning
- Statistical Methods
Emotion Detected
1. Pre-Processing
The image is converted into grayscale.
Gamma correction is applied.
The image is enhanced using a Gaussian filter for sharpness and low aliasing.
After this normalization the image is fairly flat, with noise and blurring limited to the shadowed regions and with reduced jaggies and aliasing.
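The grayscale-conversion and gamma-correction steps can be sketched with NumPy alone (the Gaussian filtering step is omitted here, and `gamma = 0.8` is an assumed value):

```python
import numpy as np

def preprocess(rgb, gamma=0.8):
    """Grayscale conversion followed by gamma correction (values in [0, 1])."""
    # Luminance weights from ITU-R BT.601 (the same Y used by YCrCb)
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Gamma correction: out = in ** gamma; gamma < 1 brightens shadowed regions
    return np.clip(gray, 0.0, 1.0) ** gamma

img = np.random.default_rng(0).random((4, 4, 3))  # toy 4x4 RGB image
out = preprocess(img)
print(out.shape)  # (4, 4)
```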
1.1 Head Pose Identification
Inputs: sequence of images, camera parameters, static face geometry
Method: stochastic filtering
Output: pose parameters [rotation, scale, translation]
1.2 Face Tracking
Face tracking involves separating the face, as a feature space, from the raw image or video.
One reliable method of face tracking uses color models.
The YCrCb color space is widely used for digital video. In this format, luminance information is stored as a single component (Y) and chrominance information as two color-difference components (Cr and Cb).
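A minimal sketch of YCrCb-based skin detection, assuming the standard BT.601 conversion; the chroma thresholds are commonly quoted defaults for skin, not values from this work:

```python
import numpy as np

def rgb_to_ycrcb(rgb):
    """Convert an 8-bit RGB image to YCrCb (ITU-R BT.601, full range)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    cr = (r - y) * 0.713 + 128             # red-difference chroma
    cb = (b - y) * 0.564 + 128             # blue-difference chroma
    return np.stack([y, cr, cb], axis=-1)

def skin_mask(rgb, cr_range=(133, 173), cb_range=(77, 127)):
    """Boolean mask of likely skin pixels; the chroma ranges are assumed
    defaults often used in the literature."""
    ycrcb = rgb_to_ycrcb(rgb)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]))

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 120, 100)  # a skin-toned pixel; the rest is black
print(skin_mask(img))        # only the (0, 0) pixel is flagged
```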
1.3 Face Part Identification
1.3.1 Eye Identification:
Eyes display strong vertical edges (horizontal transitions) due to the iris and the white of the eye.
The Sobel mask can be applied to the image, and the horizontal projection of the vertical edges can be obtained to determine the Y coordinate of the eyes.
Sobel edge detection is applied to the upper half of the face image and the sum of each row is plotted horizontally.
The peak with the lower intensity value in the horizontal projection of intensity is selected as the Y coordinate.
A pair of regions satisfying certain geometric conditions (G < 60) is selected as the eyes.
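The row-projection idea can be sketched with a plain 3x3 Sobel kernel applied to the upper half of a grayscale image (the toy image and its sizes are purely illustrative):

```python
import numpy as np

def eye_row_projection(gray):
    """Horizontal projection of vertical-edge strength (Sobel x kernel),
    computed on the upper half of a grayscale face image."""
    upper = gray[: gray.shape[0] // 2].astype(float)
    # 3x3 Sobel kernel responding to vertical edges (horizontal transitions)
    k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    h, w = upper.shape
    edges = np.zeros((h - 2, w - 2))
    for i in range(h - 2):            # naive valid-mode convolution
        for j in range(w - 2):
            edges[i, j] = abs((upper[i:i + 3, j:j + 3] * k).sum())
    return edges.sum(axis=1)          # one value per row; eye rows peak

# Toy image with a short bright vertical stripe in the upper half
gray = np.zeros((12, 12))
gray[2:4, 6] = 255
proj = eye_row_projection(gray)
print(int(np.argmax(proj)))           # row index near the stripe
```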
1.3.2 Eyebrow Identification:
Two rectangular regions in the edge image lying directly above each eye region are selected as initial eyebrow regions.
The edge images of these two areas are obtained for further refinement.
These edge images are then dilated and the holes are filled.
1.3.3 Mouth Identification
Since the lips contain more red than the rest of the skin, a color filter is applied to enlarge the difference between the lips and the face.
Since the eye regions are known, the image region below the eyes is processed to find the regions that satisfy the following condition:
1.2 ≤ R/G ≤ 1.5
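The R/G lip condition can be sketched directly (the sample pixel values below are illustrative):

```python
import numpy as np

def lip_mask(rgb):
    """Flag pixels whose red/green ratio lies in the lip range 1.2 <= R/G <= 1.5."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    # Guard against division by zero on pure-black/green-free pixels
    ratio = np.divide(r, g, out=np.zeros_like(r), where=g > 0)
    return (ratio >= 1.2) & (ratio <= 1.5)

img = np.array([[[180, 130, 120],     # R/G ~ 1.38 -> lip
                 [150, 140, 130]]],   # R/G ~ 1.07 -> skin
               dtype=np.uint8)
print(lip_mask(img))  # [[ True False]]
```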
2. Action Units
AUs are considered to be the smallest visually discernible facial movements.
As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions.
However, both timing and the duration of various AUs are important for the interpretation of human facial behavior.
It provides an unambiguous means of describing all possible movements of the face in terms of 46 action units.
2.1 Localization of Action Units
The minor axis is a feature of the eye that varies with each emotion, whereas the major axis of the eye is more or less fixed for a particular person across emotions. The eye can be modelled as an ellipse parameterized by its minor (2b) and major (2a) axes.
From the edge-detected eye image, the value of the minor semi-axis b is computed by finding the uppermost and lowermost positions of the white pixels vertically.
The optimization is performed more than six times for each emotion to reach a consistent value of b.
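A sketch of computing b from a binary edge-detected eye image, under the assumption that white pixels mark the eye contour:

```python
import numpy as np

def eye_minor_axis(edge):
    """Semi-minor axis b of the eye ellipse: half the vertical extent of the
    white (edge) pixels in a binary edge-detected eye image."""
    rows = np.where(edge.any(axis=1))[0]   # row indices containing white pixels
    if rows.size == 0:
        return 0.0
    return (rows[-1] - rows[0]) / 2.0      # (lowermost - uppermost) / 2

edge = np.zeros((10, 10), dtype=bool)
edge[3, 4] = edge[7, 5] = True             # uppermost row 3, lowermost row 7
print(eye_minor_axis(edge))                # 2.0
```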
2.2 Facial Point Tracking
The motion seen on the skin surface at each muscle location was compared to a predetermined axis of motion along which each muscle expands and contracts, allowing estimates as to the activity of each muscle to be made.
Optical flow
Feature Based (Active Shape models)
Facial Action Coding System (FACS)
Action Units
- 9 Upper face
- 18 Lower face
- 5 Miscellaneous
Action Descriptors
- 11 Head position
- 9 Eye position
- 14 Miscellaneous
The Emotion Quadrants
Facial muscle movements can be translated into FAPs (facial animation parameters).
The selected FPs (feature points) can be automatically detected from real images or video sequences.
In the next step, the range of variation of each FAP is estimated.
2.3 Evolving Feature Points
Once facial motion has been determined, it is necessary to place the motion signatures into the correct class of facial expression.
We translate facial muscle movements into Facial Action Points along the emotion quadrants.
Classifiers
A classification method is used to distinguish between the emotions. These approaches have focused on classifying the six universal emotions.
Such classifiers are concerned with finding the optimal hyperplane that separates the classes in the feature space, i.e. the hyperplane with the maximum margin between the classes.
Some commonly used classifiers are:
1. Adaboost
2. Support Vector Machines
3. Multilayer Perceptron
Support Vector Machine (SVM) is a successful and effective statistical machine-learning approach to classification. SVM is a linear classifier that separates the classes in feature space using hyperplanes.
AdaBoost, like SVM, is a discriminative classifier. AdaBoost maintains a probability distribution of weights W over the gathered samples.
MLP is a network model composed of an input layer, an output layer and several hidden layers. Each unit in the hidden and output layers performs two computations: first it calculates its net input, then it passes that value through an activation function to obtain the unit's output.
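The two computations per unit can be sketched for one hidden layer in NumPy. The layer sizes follow the 41 inputs and 7 emotion classes mentioned in this presentation, while the weights are random placeholders, not a trained model:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: each unit first computes its net input, then applies
    the activation function (sigmoid here) to produce its output."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sigmoid(x @ W1 + b1)      # hidden-layer outputs
    return sigmoid(h @ W2 + b2)   # output-layer scores, one per emotion

rng = np.random.default_rng(0)
x = rng.random(41)                                  # 41 extracted feature inputs
W1, b1 = rng.standard_normal((41, 16)), np.zeros(16)   # 16 hidden units (assumed)
W2, b2 = rng.standard_normal((16, 7)), np.zeros(7)     # 7 emotion classes
scores = mlp_forward(x, W1, b1, W2, b2)
print(scores.shape)  # (7,)
```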
ANN as a Classifier
The extracted feature points are processed to obtain the inputs for the neural network.
The neural network is trained so that the emotions neutral, happiness, sadness, anger, disgust, surprise and fear are recognized.
The network uses roughly 41 input neurons.
References

B. Braathen et al. (2002), "An Approach to Automatic Recognition of Spontaneous Facial Actions": describes warping images into 3D canonical views, followed by machine-learning techniques for emotion identification.

Yu-Li Xue et al. (2006), "Facial Expression Database and Multiple Facial Expression Recognition": using human-machine interaction, various attributes of people are collected in a database and later used for recognition of their expressions.

Jane Reilly et al. (2009), "Estimation of the Temporal Dynamics of Facial Expression": locally linear embedding is applied over the temporal dynamics of the image sequence for emotion identification.

Yang Tong et al. (2010), "A Unified Probabilistic Framework for Spontaneous Facial Action Modelling and Understanding": recognition is performed with a probabilistic facial action model based on a Dynamic Bayesian Network (DBN) that simultaneously represents rigid and non-rigid facial motions, their spatiotemporal dependencies, and their image measurements.

He Li et al. (2010), "Real-Time Facial Expression Recognition with Illumination-Corrected Image Sequences": the face image is represented by a low-dimensional vector obtained by projecting the illumination-corrected image onto a low-dimensional expression manifold, which favours robust identification of features.

Zhiguo Niu et al. (2010), "Facial Expression Recognition Based on Weighted Principal Component Analysis and Support Vector Machines": weights determined from the distribution of action units across the facial areas are used to extract facial-expression features with SVMs.

"Fully Automatic Recognition of Temporal Phases of Facial Actions": a facial point detector automatically localizes 20 facial points, which are tracked through an image sequence using particle filtering with factorized likelihoods; support vector machines are then applied to temporal activation models based on the tracking data.