Emotion Recognition
Recognising Emotions from an Ensemble of Features
Introduction
Human-Robot Interaction is receiving increasing attention nowadays. To make robots that socialize with humans, understanding an individual's facial gestures and visual cues is essential.
It allows a robot to understand human expressions, in turn enhancing its effectiveness in performing various tasks.
It serves as a measurement system for behavioural science.
It enables socially intelligent software tools.
Challenges in Recognizing Emotions
Pose and Frequent head movements
Presence of structural components
Occlusion
Image orientation
Imaging conditions
Subtle facial deformation
Ambiguity and uncertainty in face motion measurement
Describing Facial Expressions
Quantitative Dynamics
Determine the amplitude of the expression in terms of intensity levels, where the levels correspond to some measure of the extent to which the expression is present on the face.
Temporal Dynamics
The expression is split into three temporal phases: onset, apex and offset.
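As a hedged sketch, the three phases can be segmented from a per-frame expression-intensity curve; the `apex_frac` threshold below is an assumed value, not one taken from this work:

```python
import numpy as np

def temporal_phases(intensity, apex_frac=0.9):
    """Label each frame onset/apex/offset from an expression-intensity curve.
    Frames at >= apex_frac of the peak intensity are 'apex'; earlier frames
    are 'onset', later ones 'offset'. apex_frac is an assumed threshold."""
    intensity = np.asarray(intensity, dtype=float)
    apex = intensity >= apex_frac * intensity.max()
    first = np.argmax(apex)                       # first apex frame
    last = len(apex) - 1 - np.argmax(apex[::-1])  # last apex frame
    labels = np.array(["onset"] * len(intensity), dtype=object)
    labels[first:last + 1] = "apex"
    labels[last + 1:] = "offset"
    return labels

print(temporal_phases([0.1, 0.4, 0.9, 1.0, 0.95, 0.5, 0.2]))
# ['onset' 'onset' 'apex' 'apex' 'apex' 'offset' 'offset']
```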
Methodology
Input Data Stage
- Static Image
- Video Sequence
Pre-Processing
- Head Pose Identification
- Face Tracking
- Facial Part Identification
Feature Extraction
- Localization of Facial Action Units
- Facial Point Tracking
- Evolving Feature Points
Feature Classification
- Machine Learning
- Statistical Methods
Emotion Detected
1. Pre-Processing
The image is converted into grayscale.
Gamma correction is applied.
The image is enhanced using a Gaussian filter for sharpness and low aliasing.
After this normalization the image is fairly flat, with noise and blurring limited to the shadowed regions and with reduced jaggies and aliasing.
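The grayscale-conversion and gamma-correction steps can be sketched with NumPy alone (the Gaussian filtering step is omitted here, and `gamma = 0.8` is an assumed value):

```python
import numpy as np

def preprocess(rgb, gamma=0.8):
    """Grayscale conversion followed by gamma correction (values in [0, 1])."""
    # Luminance weights from ITU-R BT.601 (the same Y used by YCrCb)
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Gamma correction: out = in ** gamma; gamma < 1 brightens shadowed regions
    return np.clip(gray, 0.0, 1.0) ** gamma

img = np.random.default_rng(0).random((4, 4, 3))  # toy 4x4 RGB image
out = preprocess(img)
print(out.shape)  # (4, 4)
```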
1.1 Head Pose Identification
Inputs: sequence of images, camera parameters, static face geometry
Method: stochastic filtering
Output: pose parameters [rotation, scale, translation]
1.2 Face Tracking
Face tracking involves separating the face, as a feature space, from the raw image or video.
One reliable method of face tracking uses color models.
The YCrCb color space is widely used for digital video. In this format, luminance information is stored as a single component (Y) and chrominance information as two color-difference components (Cr and Cb).
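A minimal sketch of YCrCb-based skin detection, assuming the standard BT.601 conversion; the chroma thresholds are commonly quoted defaults for skin, not values from this work:

```python
import numpy as np

def rgb_to_ycrcb(rgb):
    """Convert an 8-bit RGB image to YCrCb (ITU-R BT.601, full range)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    cr = (r - y) * 0.713 + 128             # red-difference chroma
    cb = (b - y) * 0.564 + 128             # blue-difference chroma
    return np.stack([y, cr, cb], axis=-1)

def skin_mask(rgb, cr_range=(133, 173), cb_range=(77, 127)):
    """Boolean mask of likely skin pixels; the chroma ranges are assumed
    defaults often used in the literature."""
    ycrcb = rgb_to_ycrcb(rgb)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1]))

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 120, 100)  # a skin-toned pixel; the rest is black
print(skin_mask(img))        # only the (0, 0) pixel is flagged
```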
1.3 Face Part Identification
1.3.1 Eye Identification:
Eyes display strong vertical edges (horizontal transitions) due to the iris and the white of the eye.
The Sobel mask can be applied to the image, and the horizontal projection of the vertical edges can be obtained to determine the Y coordinate of the eyes.
Sobel edge detection is applied to the upper half of the face image and the sum of each row is plotted horizontally.
The peak with the lower intensity value in the horizontal projection of intensity is selected as the Y coordinate.
A pair of regions satisfying certain geometric conditions (G < 60) is selected as the eyes.
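The row-projection idea can be sketched with a plain 3x3 Sobel kernel applied to the upper half of a grayscale image (the toy image and its sizes are purely illustrative):

```python
import numpy as np

def eye_row_projection(gray):
    """Horizontal projection of vertical-edge strength (Sobel x kernel),
    computed on the upper half of a grayscale face image."""
    upper = gray[: gray.shape[0] // 2].astype(float)
    # 3x3 Sobel kernel responding to vertical edges (horizontal transitions)
    k = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    h, w = upper.shape
    edges = np.zeros((h - 2, w - 2))
    for i in range(h - 2):            # naive valid-mode convolution
        for j in range(w - 2):
            edges[i, j] = abs((upper[i:i + 3, j:j + 3] * k).sum())
    return edges.sum(axis=1)          # one value per row; eye rows peak

# Toy image with a short bright vertical stripe in the upper half
gray = np.zeros((12, 12))
gray[2:4, 6] = 255
proj = eye_row_projection(gray)
print(int(np.argmax(proj)))           # row index near the stripe
```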
1.3.2 Eyebrow Identification:
Two rectangular regions in the edge image lying directly above each eye region are selected as initial eyebrow regions.
The edge images of these two areas are obtained for further refinement.
These edge images are then dilated and the holes are filled.
1.3.3 Mouth Identification
Since the lips contain more red than the rest of the skin, a color filter is applied to enlarge the difference between the lips and the face.
Since the eye regions are known, the image region below the eyes is processed to find the regions that satisfy the following condition:
1.2 ≤ R/G ≤ 1.5
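The R/G lip condition can be sketched directly (the sample pixel values below are illustrative):

```python
import numpy as np

def lip_mask(rgb):
    """Flag pixels whose red/green ratio lies in the lip range 1.2 <= R/G <= 1.5."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    # Guard against division by zero on pure-black/green-free pixels
    ratio = np.divide(r, g, out=np.zeros_like(r), where=g > 0)
    return (ratio >= 1.2) & (ratio <= 1.5)

img = np.array([[[180, 130, 120],     # R/G ~ 1.38 -> lip
                 [150, 140, 130]]],   # R/G ~ 1.07 -> skin
               dtype=np.uint8)
print(lip_mask(img))  # [[ True False]]
```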
2. Action Units
AUs are considered to be the smallest visually discernible facial movements.
As AUs are independent of any interpretation, they can be used as the basis for recognition of basic emotions.
However, both timing and the duration of various AUs are important for the interpretation of human facial behavior.
It provides an unambiguous means of describing all possible movements of the face in terms of 46 action units.
2.1 Localization of Action Units
The minor axis is a feature of the eye that varies with each emotion, whereas the major axis of the eye is more or less fixed for a particular person across emotions. The eye can be modelled as an ellipse parameterized by its minor (2b) and major (2a) axes.
From the edge-detected eye image, the value of the minor semi-axis b is computed by finding the uppermost and lowermost positions of the white pixels vertically.
The optimization is performed more than six times for each emotion to reach a consistent value of b.
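A sketch of computing b from a binary edge-detected eye image, under the assumption that white pixels mark the eye contour:

```python
import numpy as np

def eye_minor_axis(edge):
    """Semi-minor axis b of the eye ellipse: half the vertical extent of the
    white (edge) pixels in a binary edge-detected eye image."""
    rows = np.where(edge.any(axis=1))[0]   # row indices containing white pixels
    if rows.size == 0:
        return 0.0
    return (rows[-1] - rows[0]) / 2.0      # (lowermost - uppermost) / 2

edge = np.zeros((10, 10), dtype=bool)
edge[3, 4] = edge[7, 5] = True             # uppermost row 3, lowermost row 7
print(eye_minor_axis(edge))                # 2.0
```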
2.2 Facial Point Tracking
The motion seen on the skin surface at each muscle location was compared to a predetermined axis of motion along which each muscle expands and contracts, allowing estimates as to the activity of each muscle to be made.
Optical flow
Feature Based (Active Shape models)
Facial Action Coding System (FACS)
Action Units
- 9 Upper face
- 18 Lower face
- 5 Miscellaneous
Action Descriptors
- 11 Head position
- 9 Eye position
- 14 Miscellaneous
The Emotion Quadrants
Facial muscle movements can be translated into FAPs (facial animation parameters).
The selected FPs (feature points) can be automatically detected from real images or video sequences.
In the next step, the range of variation of each FAP is estimated.
2.3 Evolving Feature Points
Once facial motion has been determined, it is necessary to place the motion signatures into the correct class of facial expression.
We translate facial muscle movements into Facial Action Points along the emotion quadrants.
Classifiers
A classification method is used to distinguish between the emotions. These approaches have focused on classifying the six universal emotions.
Such classifiers are concerned with finding the optimal hyperplane that separates the classes in the feature space, i.e. the hyperplane with the maximum margin between the classes.
Some commonly used classifiers are:
1. Adaboost
2. Support Vector Machines
3. Multilayer Perceptron
Support Vector Machine (SVM) is a successful and effective statistical machine-learning approach to classification. SVM is a linear classifier that separates the classes in feature space using hyperplanes.
AdaBoost, like SVM, is a discriminative classifier. AdaBoost maintains a probability distribution of weights W over the gathered samples.
MLP is a network model composed of an input layer, an output layer and several hidden layers. Each unit in the hidden and output layers performs two computations: first it calculates its net input, then it passes that value through an activation function to obtain the unit's output.
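The two computations per unit can be sketched for one hidden layer in NumPy. The layer sizes follow the 41 inputs and 7 emotion classes mentioned in this presentation, while the weights are random placeholders, not a trained model:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: each unit first computes its net input, then applies
    the activation function (sigmoid here) to produce its output."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sigmoid(x @ W1 + b1)      # hidden-layer outputs
    return sigmoid(h @ W2 + b2)   # output-layer scores, one per emotion

rng = np.random.default_rng(0)
x = rng.random(41)                                  # 41 extracted feature inputs
W1, b1 = rng.standard_normal((41, 16)), np.zeros(16)   # 16 hidden units (assumed)
W2, b2 = rng.standard_normal((16, 7)), np.zeros(7)     # 7 emotion classes
scores = mlp_forward(x, W1, b1, W2, b2)
print(scores.shape)  # (7,)
```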
ANN as a Classifier
The extracted feature points are processed to obtain the inputs for the neural network.
The neural network is trained so that the emotions neutral, happiness, sadness, anger, disgust, surprise and fear are recognized.
The network uses roughly 41 input neurons.
References

B. Braathen et al. (2002), "An Approach to Automatic Recognition of Spontaneous Facial Actions": describes warping images into 3D canonical views, followed by machine-learning techniques for emotion identification.

Yu-Li Xue et al. (2006), "Facial Expression Database and Multiple Facial Expression Recognition": using human-machine interaction, various attributes of people are collected in a database and later used for recognition of their expressions.

Jane Reilly et al. (2009), "Estimation of the Temporal Dynamics of Facial Expression": locally linear embedding is applied over the temporal dynamics of the image sequence for emotion identification.

Yang Tong et al. (2010), "A Unified Probabilistic Framework for Spontaneous Facial Action Modelling and Understanding": recognition is performed with a probabilistic facial action model based on a Dynamic Bayesian Network (DBN) that simultaneously represents rigid and non-rigid facial motions, their spatiotemporal dependencies, and their image measurements.

He Li et al. (2010), "Real-Time Facial Expression Recognition with Illumination-Corrected Image Sequences": the face image is represented by a low-dimensional vector obtained by projecting the illumination-corrected image onto a low-dimensional expression manifold, which favours robust identification of features.

Zhiguo Niu et al. (2010), "Facial Expression Recognition Based on Weighted Principal Component Analysis and Support Vector Machines": weights determined from the distribution of action units across the facial areas are used to extract facial-expression features with SVMs.

"Fully Automatic Recognition of Temporal Phases of Facial Actions": a facial point detector automatically localizes 20 facial points, which are tracked through an image sequence using particle filtering with factorized likelihoods; support vector machines are then applied to temporal activation models based on the tracking data.