Feature Fusion of HOG and WLD for Facial Expression Recognition

Xiaohua Wang, Chao Jin, Wei Liu, Min Hu, Liangfeng Xu, and Fuji Ren

Proceedings of the 2013 IEEE/SICE International Symposium on System Integration (SII), Kobe International Conference Center, Kobe, Japan, December 15-17, 2013. SP1-L.2. 978-1-4799-2625-1/13/$31.00 ©2013 IEEE.

Abstract— A considerable amount of research has been done on facial expression recognition using local or global feature extraction methods. The Weber Local Descriptor (WLD), a simple and robust local image descriptor, was recently developed for local feature extraction. In facial expression recognition, the information contained in local regions is important for the recognition result. Histograms of Oriented Gradients (HOG) can describe local-area information well using the density distribution of edge gradients and orientations. To address the lack of contour and shape information in WLD features alone, and to extract facial local features more efficiently, we propose a hybrid approach that combines WLD with HOG features. We divide the images into blocks and weight each of them, then extract the two features and fuse them. Finally, the weighted fused histograms are used to classify facial expressions with the chi-square distance and the nearest neighbor method. The proposed method is applied to the popular JAFFE and Cohn-Kanade facial expression databases, achieving recognition rates of up to 93.97% and 95.86%, respectively. Compared with the Gabor wavelet, LBP, and AAM, the experimental results show that the proposed method achieves better performance for facial expression recognition.

Keywords— Facial Expression Recognition; Weber Local Descriptor; Histograms of Oriented Gradients; Feature Fusion.

I. INTRODUCTION

With the development of human-computer interaction, facial expression recognition [1]-[3] has become a hot topic in the field of pattern recognition and has made considerable progress over the years. As in general pattern recognition, facial expression recognition is primarily divided into three steps: the first step is target detection and image preprocessing, the second is feature extraction, and the third is classification.

Feature extraction is the most important step in pattern recognition, and facial expression recognition is no exception. Many methods have been proposed and have achieved notable results. The method based on the Gabor wavelet [4] extracts multi-scale and multi-direction information, but its high time and memory consumption makes it difficult to build an efficient human-computer interaction system. Features extracted by the Active Appearance Model (AAM) [5] are relatively reliable and yield a high recognition rate, but the computation is complex and the initial parameters are difficult to obtain. Manifold features [6] are simple to compute and can handle high-dimensional problems, but the parameters and the intrinsic dimension remain uncertain and are currently determined empirically. The calculation of the Local Binary Pattern (LBP) [7]-[8] is simple, and it offers gray-scale invariance, rotation invariance, and other advantages, but LBP features are sensitive to noise and capture only texture information, ignoring shape information. Moreover, LBP disregards the magnitude of the differences and therefore loses part of the information.

Manuscript received August 30, 2013. This work was supported in part by the National High-Tech Research & Development Program of China (863 Program, Grant No. 2012AA011103), the Science and Technology Research Projects of Anhui Province (No. 1206c0805039), and the National Natural Science Foundation of China (No. 61300119).

Xiaohua Wang, Chao Jin, Wei Liu, and Min Hu are with the Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine and the School of Computer and Information, Hefei University of Technology, 230009, China ([email protected]).

Liangfeng Xu is with the School of Computer and Information, Hefei University of Technology, China.

Fuji Ren is with the University of Tokushima, Shinkura-cho, Tokushima 770-8501, Japan (e-mail: [email protected]); he is a senior member of IEEE and a Fellow of the Japan Federation of Engineering Societies.

J. Chen, S. Shan, G. Zhao, and M. Pietikäinen [9] proposed a simple, efficient, and robust texture descriptor called the Weber Local Descriptor (WLD). WLD consists of two components: differential excitation and orientation. It is not only very effective at extracting image texture information but also very robust against noise and illumination changes. Recently, K. Yu, Z. Wang, L. Zhuo, J. Wang, Z. Chi, and D. Feng used WLD to learn realistic facial expressions from web images [10]. Another method with a strong ability to describe local information is Histograms of Oriented Gradients (HOG) [11], proposed by Dalal and Triggs and well known for human detection. Later, A. Albiol, D. Monzo, A. Martin, J. Sastre, and A. Albiol [12] combined the HOG feature with Elastic Bunch Graph Matching (EBGM) for face recognition, and S. Chen, Y. Tian, Q. Liu, and D. Metaxas [13] used improved HOG features for recognizing expressions from face and body gestures. All these methods share a common drawback: on their own they are insufficient for characterizing local details.

The main contribution of this paper is a hybrid approach that combines the two methods to represent local details. To distinguish the contributions of different blocks to facial expression recognition, the image is divided into blocks that are assigned different weights, which increases the impact of regions with greater contribution. As in [14], histograms can effectively describe the global features of a texture image. In this article, the histograms corresponding to each block are concatenated to represent the global features of a face. Finally, the chi-square distance and the nearest neighbor method are used to compute similarity.

The rest of the paper is organized as follows: Section II reviews related work; Section III describes the proposed method; Section IV presents detailed experiments and results; finally, conclusions are given.

II. RELATED WORK

A. Weber Local Descriptor

The Weber Local Descriptor (WLD) is inspired by Weber's Law [15], which states that the ratio of the increment threshold to the background intensity is a constant. The relationship can be defined as Equation (1):

\[ \frac{\Delta I}{I} = k, \tag{1} \]

where $\Delta I$ represents the increment threshold, $I$ represents the initial stimulus intensity, and $k$ stands for the proportion. The left side of the equation, $\Delta I / I$, known as the Weber fraction, remains constant despite variations in the $I$ term. WLD contains two main parts: differential excitation $\xi(x_c)$ and orientation $\theta(x_c)$.

1) Differential Excitation

The intensity differences between a current pixel and its neighbors are used as the changes of the current pixel. With this approach, it is possible to obtain the salient changes that are recognizable to human perception. Specifically, the differential excitation $\xi(x_c)$ of a current pixel $x_c$ is computed as follows. Firstly, the filter $f_{00}$ is used to calculate the differences between the neighbors and the center point:

\[
\begin{bmatrix} x_0 & x_1 & x_2 \\ x_7 & x_c & x_3 \\ x_6 & x_5 & x_4 \end{bmatrix}, \quad
f_{00} = \begin{bmatrix} +1 & +1 & +1 \\ +1 & -8 & +1 \\ +1 & +1 & +1 \end{bmatrix}, \quad
f_{01} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & +1 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]

\[ v_s^{00} = \sum_{i=0}^{p-1} \Delta x_i = \sum_{i=0}^{p-1} (x_i - x_c), \tag{2} \]

where $x_i\ (i = 0, 1, \ldots, p-1)$ denotes the $i$-th neighbor of $x_c$ and $p$ is the number of neighbors. Secondly, the ratio of the differences to the intensity of the current point is computed by combining the outputs of the two filters $f_{00}$ and $f_{01}$ (whose output $v_s^{01}$ is in fact the original image). Finally, the differential excitation $\xi(x_c)$ of the current pixel is computed as Equation (3):

\[ \xi(x_c) = \arctan\!\left[\frac{v_s^{00}}{v_s^{01}}\right] = \arctan\!\left[\sum_{i=0}^{p-1}\left(\frac{x_i - x_c}{x_c}\right)\right]. \tag{3} \]
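The computation of Eqs. (2)-(3) can be sketched in NumPy as follows. This is a minimal illustration (function name and edge-padding choice are ours, and a small `eps` guards the division, a detail the paper does not specify), assuming the 8-neighborhood of $x_c$:

```python
import numpy as np

def differential_excitation(img, eps=1e-7):
    """Differential excitation xi(x_c) per Eqs. (2)-(3): the arctangent of the
    ratio between the summed neighbor differences (filter f00) and the
    center intensity itself (filter f01)."""
    img = img.astype(np.float64)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # v00 = sum over the 8 neighbors of (x_i - x_c), i.e. convolution with f00
    v00 = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            v00 += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w] - img
    # v01 is simply the original image (filter f01 is the identity)
    v01 = img
    return np.arctan(v00 / (v01 + eps))
```

On a uniform region the neighbor differences cancel and the excitation is zero; a bright pixel against darker neighbors yields a negative value, bounded in $(-\pi/2, \pi/2)$ by the arctangent.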

2) Orientation

The orientation component of WLD is the gradient orientation as in [17]. Firstly, the filters $f_{10}$ and $f_{11}$ are used to calculate $v_s^{10}$ and $v_s^{11}$, from which the orientation is computed as Equation (4):

\[
f_{10} = \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & +1 & 0 \end{bmatrix}, \quad
f_{11} = \begin{bmatrix} 0 & 0 & 0 \\ +1 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix},
\]

\[ \theta(x_c) = \gamma_s = \arctan\!\left(\frac{v_s^{11}}{v_s^{10}}\right). \tag{4} \]

For simplicity, $\theta$ is further quantized into $T$ dominant orientations. The quantization function is calculated as Equation (5):

\[ \Phi_t = f_q(\theta') = \frac{2t}{T}\pi, \quad t = \operatorname{mod}\!\left(\left\lfloor \frac{\theta'}{2\pi / T} + \frac{1}{2} \right\rfloor,\ T\right), \tag{5} \]

where $\theta \in [-\pi/2, \pi/2]$ and $\theta' \in [0, 2\pi]$. Here $\theta'$ is calculated by the mapping $f: \theta \mapsto \theta'$:

\[ \theta' = \arctan2(v_s^{11}, v_s^{10}) + \pi, \]

\[
\arctan2(v_s^{11}, v_s^{10}) =
\begin{cases}
\theta, & v_s^{11} > 0 \text{ and } v_s^{10} > 0, \\
\pi + \theta, & v_s^{11} > 0 \text{ and } v_s^{10} < 0, \\
\theta - \pi, & v_s^{11} < 0 \text{ and } v_s^{10} < 0, \\
\theta, & v_s^{11} < 0 \text{ and } v_s^{10} > 0.
\end{cases} \tag{6}
\]

More details about the WLD descriptor and WLD histogram will be further depicted in Section III.
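A compact sketch of Eqs. (4)-(6) follows (function name ours). NumPy's `arctan2` already performs the quadrant case analysis of Eq. (6), so the code maps directly to the shifted angle $\theta'$ and its quantization:

```python
import numpy as np

def wld_orientation(img, T=8):
    """WLD gradient orientation per Eqs. (4)-(6), quantized into T dominant
    orientations per Eq. (5). Returns the integer bin index t per pixel."""
    img = img.astype(np.float64)
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    v10 = p[2 : 2 + h, 1 : 1 + w] - p[0:h, 1 : 1 + w]  # filter f10: x5 - x1
    v11 = p[1 : 1 + h, 0:w] - p[1 : 1 + h, 2 : 2 + w]  # filter f11: x7 - x3
    theta_p = np.arctan2(v11, v10) + np.pi             # theta' in [0, 2*pi]
    # quantize: t = mod(floor(theta' / (2*pi/T) + 1/2), T)
    t = np.mod(np.floor(theta_p / (2 * np.pi / T) + 0.5), T).astype(int)
    return t
```

For example, an image whose brightness increases uniformly to the right maps interior pixels to a single dominant orientation bin.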

B. Histograms of Oriented Gradients

The Scale-Invariant Feature Transform (SIFT) was proposed by David G. Lowe in 1999 [16], and in 2004 he summarized the algorithm comprehensively [17]. SIFT is invariant to rotation, translation, and scaling, and at the same time highly tolerant to brightness variations, noise, perspective transformation, and affine transformation. Because of this, it performs well in scene matching, object recognition, image stitching, face recognition, and other fields. The major steps of the SIFT algorithm are:

1) Scale-space extrema detection.
2) Orientation assignment.
3) Descriptor extraction.

The HOG feature descriptor comes from the final step of Lowe's SIFT method for wide-baseline image matching. It summarizes the distribution of measurements within image regions and is particularly useful for recognizing textured objects with deformable shapes. The HOG descriptor also has a certain similarity to the LBP operator, since both extract information in a differential mode; in this way, the impact of gray-scale changes caused by linear variations of illumination is weakened. The features described by HOG reflect the shape information of the image, and the gradient corresponds to the first derivative of the image. For an image $I(x, y)$, the gradient at an arbitrary pixel is a vector defined as Equation (7):

\[ \nabla I(x, y) = \left[ G_x, G_y \right]^T = \left[ \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right]^T, \tag{7} \]

where $G_x$ is the gradient in the X direction and $G_y$ is the gradient in the Y direction. The magnitude and direction angle of the gradient are given by Equations (8) and (9):

\[ \left| \nabla I(x, y) \right| = \sqrt{G_x^2 + G_y^2}, \tag{8} \]

\[ \theta(x, y) = \arctan\frac{G_y}{G_x}. \tag{9} \]

For a digital image $I(x, y)$, the gradient can also be defined as Equation (10):

\[ \left| \nabla I(x, y) \right| = \sqrt{\left[ f(x+1, y) - f(x-1, y) \right]^2 + \left[ f(x, y+1) - f(x, y-1) \right]^2}. \tag{10} \]
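Equations (7)-(10) amount to central-difference gradients per pixel. A minimal NumPy sketch (function name ours; borders are left at zero, a choice the paper does not specify) that also folds the angle into the unsigned range $[0, \pi)$ used later for HOG binning:

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Central-difference gradient of Eq. (10) with magnitude per Eq. (8)
    and unsigned orientation per Eq. (9), folded into [0, pi)."""
    f = img.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]   # f(x+1, y) - f(x-1, y)
    gy[1:-1, :] = f[2:, :] - f[:-2, :]   # f(x, y+1) - f(x, y-1)
    mag = np.sqrt(gx**2 + gy**2)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    return mag, ang
```

On a horizontal intensity ramp, interior pixels have magnitude equal to twice the per-pixel step and orientation zero.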

III. PROPOSED METHOD

To overcome the inability of any single descriptor to represent the details, we propose the expression recognition framework shown in Fig. 1. The main steps of our method fall into three components: preprocessing of facial images, facial feature representation, and classification. The three components are described in detail as follows.

A. Preprocessing

The experiments are performed on the JAFFE [18] and Cohn-Kanade [19] databases. Firstly, faces are detected in the database images using the AdaBoost face detection algorithm proposed by P. Viola and M. Jones [20]-[22]. After a face is detected, the coordinates of its center position, width, and height are stored and multiplied by the scaling coefficient s (s = 1.3). The images are then normalized to a size of 128*128, as shown in Fig. 2(a). Since different parts of the face contribute differently to facial expression recognition [10], we divide the images into blocks and weight each block separately according to the importance of the region in human perception, as shown in Fig. 2(b).
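The crop-and-normalize step can be sketched as follows. This is an illustration only: the face box is assumed to come from an external AdaBoost (Viola-Jones) detector, the function name is ours, and nearest-neighbor resampling is our assumption since the paper does not state its interpolation method:

```python
import numpy as np

def normalize_face(img, box, s=1.3, out=128):
    """Scale a detected face box by s about its center, crop, and resize
    the crop to out x out with nearest-neighbor sampling."""
    x, y, w, h = box                       # box from an AdaBoost face detector
    cx, cy = x + w / 2.0, y + h / 2.0
    w2, h2 = w * s, h * s                  # enlarge by the scaling coefficient
    x0 = int(max(0, round(cx - w2 / 2)))
    y0 = int(max(0, round(cy - h2 / 2)))
    x1 = int(min(img.shape[1], round(cx + w2 / 2)))
    y1 = int(min(img.shape[0], round(cy + h2 / 2)))
    crop = img[y0:y1, x0:x1]
    # nearest-neighbor resize to out x out
    rows = (np.arange(out) * crop.shape[0] / out).astype(int)
    cols = (np.arange(out) * crop.shape[1] / out).astype(int)
    return crop[np.ix_(rows, cols)]
```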

B. Facial Feature Representation

The differential excitation $\xi$ of the current pixel can be calculated using (3). As we can see, by using the differences between the current point and its neighborhood, it is possible to simulate human perception in finding salient variations of the image. After that, the gradient direction $\theta$ of each pixel is calculated using (4). However, $\theta$ ranges from $-\pi/2$ to $\pi/2$, so a mapping is made using (5). In [9], the WLD histogram $\{WLD(\xi_j, \theta_t)\}\ (j = 0, 1, \ldots, N-1;\ t = 0, 1, \ldots, T-1)$ is computed from both the differential excitation $\xi(x_c)$ and the orientation $\theta$. Here $N$ is the dimensionality of an image and $T$ is the number of dominant orientations. The size of this 2D histogram is $T \times C$, where $C$ is the number of cells in each orientation. In this 2D histogram, each column corresponds to a direction $\theta$ and each row corresponds to a differential excitation histogram with $C$ bins.
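Building the 2D histogram can be sketched as below. The function name and the linear mapping of $\xi$'s range $(-\pi/2, \pi/2)$ onto the $C$ excitation bins are our assumptions; the paper specifies only the $T \times C$ shape:

```python
import numpy as np

def wld_2d_histogram(xi, t, T=8, C=15):
    """2D WLD histogram: column t is a dominant orientation, and each
    column is a C-bin histogram of differential excitation values."""
    # map xi from (-pi/2, pi/2) linearly onto a bin index in [0, C)
    xi_bin = np.clip(((xi + np.pi / 2) / np.pi * C).astype(int), 0, C - 1)
    hist = np.zeros((C, T))
    for c, o in zip(xi_bin.ravel(), t.ravel()):
        hist[c, o] += 1
    return hist
```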

In our proposed method, we calculate the WLD histogram of each block, where $M$ is the number of segments of each subhistogram $H(t)$, i.e., $H_{m,t}$, and each $H_{m,t}$ is composed of $S$ bins. The WLD histogram of each block is then computed as $w(i, j) \cdot WLD(\xi, \theta)$, where $w(i, j)$ is the weight of the block corresponding to each color given in Fig. 2. The procedure is shown in Fig. 3. In our study, the optimal parameters for facial expression recognition are $M = 3$, $T = 8$, $S = 5$.

Fig. 1. The program flow chart of our proposed method.

Fig. 2. (a) An image normalized to 128*128; (b) the normalized image divided into 8x8 sub-regions, with each block weighted; the weights are those assigned for the weighted chi-square dissimilarity measure.

Fig. 3. The extraction and representation of WLD features.

In order to enhance the characterization of the areas that contribute most to expression recognition, HOG features are used to extract more information from these areas. As in [11], we use the terms cell and block. A block corresponds to each divided block with a weight of 1.0. Each block is uniformly divided into $n$ cells. For any pixel of the image, the gradient direction is given by (9). The orientation is then divided into $H$ bins. After computing the $H$-dimensional gradient histogram of the $n$ cells, they are put together into an $n \times H$-dimensional feature description. The extraction process of HOG features is shown in Fig. 4(a). Following our experimental observations, $n = 4$, $H = 9$, and the unsigned gradient orientation yield better and more stable results. Since every angle may take any value from $0$ to $\pi$ and is discretized into 9 bins, the gradient of each pixel is decomposed into its two adjacent bins according to Equation (11), as shown in Fig. 4(b); the weight is higher the closer the gradient is to the bin:

\[
\begin{aligned}
g(\beta_1) &= \frac{\beta_2 - \partial}{\beta_2 - \beta_1} * g, \\
g(\beta_2) &= \frac{\partial - \beta_1}{\beta_2 - \beta_1} * g,
\end{aligned} \tag{11}
\]

where $g$ and $\partial$ are the gradient magnitude and angle of the pixel, and $\beta_1$ and $\beta_2$ are the angles of its two adjacent bins.
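The split vote of Eq. (11) can be sketched for a single pixel as follows (function name ours; we assume the 9 unsigned bins are centered at $(k + 1/2)\,\pi/9$, a convention the paper does not make explicit):

```python
import numpy as np

def split_vote(g, a, H=9):
    """Distribute one gradient (magnitude g, unsigned angle a in [0, pi))
    between its two adjacent orientation bins per Eq. (11)."""
    width = np.pi / H                      # bin width for H unsigned bins
    pos = a / width - 0.5                  # position relative to bin centers
    b1 = int(np.floor(pos)) % H            # lower adjacent bin beta_1
    b2 = (b1 + 1) % H                      # upper adjacent bin beta_2
    frac = pos - np.floor(pos)             # closeness to the upper bin
    votes = np.zeros(H)
    votes[b1] += g * (1 - frac)            # g * (beta_2 - a) / (beta_2 - beta_1)
    votes[b2] += g * frac                  # g * (a - beta_1) / (beta_2 - beta_1)
    return votes
```

An angle exactly on a bin center gives that bin the full magnitude; an angle halfway between two centers splits the magnitude evenly.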

After the WLD and HOG features are extracted, the blocks with great contributions carry both of these features, as shown in Fig. 5. Finally, the whole face is represented by the concatenated histograms of the $M$ blocks.

C. Classification

We propose to use the joint histogram of the two complementary features, WLD and HOG, for facial expression recognition. Firstly, the chi-square distance between the weighted fused histograms of the test samples and the training samples is computed. The chi-square distance is defined as Equation (12):

\[ D(T, S) = \sum_{t=1}^{N} \frac{(T_t - S_t)^2}{T_t + S_t}, \tag{12} \]

where $T$ is the joint histogram of the test sample, $S$ is the joint histogram of the training sample, $N$ is the number of values in the histogram, $T_t$ is the value of bin $t$ in the histogram of the test sample, and $S_t$ is the value of bin $t$ in the histogram of the training sample.

After the chi-square distances between histograms are computed, the nearest neighbor method is used for classification, which yields the recognition results.
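The two classification steps above can be sketched directly (function names ours; a small `eps` avoids division by zero on empty bins, a detail the paper does not address):

```python
import numpy as np

def chi_square(T, S, eps=1e-12):
    """Chi-square distance of Eq. (12) between two fused histograms."""
    T = np.asarray(T, dtype=float)
    S = np.asarray(S, dtype=float)
    return np.sum((T - S) ** 2 / (T + S + eps))

def nearest_neighbor(test_hist, train_hists, train_labels):
    """Return the label of the training histogram closest in chi-square
    distance to the test histogram (1-NN classification)."""
    d = [chi_square(test_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]
```

For example, a test histogram identical to one training histogram has distance zero to it and is assigned that training sample's expression label.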

IV. EXPERIMENTS

A. Datasets

To evaluate the proposed method, the publicly available standard face databases JAFFE and Cohn-Kanade are used. The JAFFE database includes expression images of 10 individuals. Each of them shows seven kinds of facial expressions (Anger, Disgust, Fear, Happy, Neutral, Sad, and Surprise). There are 3 or 4 samples for every expression, for a total of 213 expression images. Cohn-Kanade consists of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American, and 3% were Asian or Latino. Subjects were instructed to perform a series of 32 facial expressions covering seven prototypic emotions, namely Anger, Disgust, Fear, Happy, Neutral, Sad, and Surprise. Some normalized facial expressions from the experiments are shown in Fig. 6.

Fig. 4. (a) The extraction process of HOG features. (b) The calculation of magnitude and angle.

Fig. 5. The extraction process of WLD and HOG features. The whole face is represented by these concatenated histograms.

Fig. 6. Some normalized facial expressions from the experiments. (a) Samples from the JAFFE database. (b) Samples from the Cohn-Kanade database.

B. Results and Discussions

In our study, we run the experiment with our proposed method three times on the JAFFE database, selecting 1-2 images of each expression for training and the remaining 1-2 images for testing, and compare with other well-known methods such as the Gabor wavelet, LBP, and AAM. Both the training and test sets contain 15 images per expression. Since the JAFFE database contains limited samples, the average recognition rate is obtained by traversing the database 3 times; each time, 1-2 images of each expression are chosen randomly for training and the rest for testing. The experiment is also performed on the Cohn-Kanade database, using the subjects that can be labeled with the seven prototypic emotions (Anger, Disgust, Fear, Happy, Neutral, Sad, and Surprise). We choose 32 subjects and extract six images per expression from the image sequences, totaling 1,344 images. Half of the emotions of each subject are used for training and the rest for testing. This experiment is repeated 4 times; in the same way, 3 images of each expression from different people are chosen randomly for training and the rest for testing.

The results of the experiments on the two databases are shown in Table I and Table II. The classifier for all methods is the chi-square distance with the nearest neighbor method.

The detailed data in the two tables show some misrecognized cases. The reasons for the misrecognition are: 1) some expressions are very similar, with little change between them; 2) some expression images are micro-expressions; 3) some images have slight posture changes.

In this article, we mainly compare with LBP, Gabor wavelet, and AAM on these two databases. The recognition rate and recognition time of each algorithm are shown in Fig. 7, Fig. 8, and Fig. 9.

Compared with the other algorithms, the proposed algorithm has the highest recognition rate and a lower recognition time. The main reasons are: 1) the features extracted by the proposed method contain both texture information and shape information; 2) the image is divided into different regions, each of which is weighted to highlight the important regions; 3) the dimensionality of the histograms extracted by HOG and WLD is lower.

Fig. 7. The comparison of recognition rate between LBP, Gabor, AAM, and the proposed method on the JAFFE database over 3 runs.

Fig. 8. The comparison of recognition rate between LBP, Gabor, AAM, and the proposed method on the Cohn-Kanade database over 4 runs.

Fig. 9. The comparison of average recognition time (ms) between LBP, Gabor, AAM, and the proposed method. This experiment runs on a machine with a base frequency of 2.2 GHz and 2 GB of memory.

TABLE I. The results of our system on the JAFFE database (%)

Expression    An     Di     Fe     Ha     Ne     Sa     Su
An           93.3    2.2    0.0    0.0    2.2    2.3    0.0
Di            4.4   88.9    0.0    0.0    2.2    4.5    0.0
Fe            4.5    0.0   88.9    2.2    4.4    0.0    0.0
Ha            0.0    0.0    0.0    100    0.0    0.0    0.0
Ne            0.0    0.0    2.2    0.0   93.3    4.5    0.0
Sa            4.4    0.0    0.0    0.0    2.3   93.3    0.0
Su            0.0    0.0    0.0    0.0    0.0    0.0    100

TABLE II. The results of our system on the CK database (%)

Expression    An     Di     Fe     Ha     Ne     Sa     Su
An           95.5    2.8    0.0    0.0    0.4    1.3    0.0
Di            2.3   96.9    0.2    0.0    0.0    0.6    0.0
Fe            0.4    0.2   92.7    0.0    3.8    2.1    0.8
Ha            0.0    0.0    0.5   98.3    1.2    0.0    0.0
Ne            0.2    0.0    1.0    0.9   95.9    1.5    0.5
Sa            1.5    2.3    0.0    0.0    2.2   94.0    0.0
Su            0.0    0.1    2.2    0.0    0.0    0.0   97.7


V. CONCLUSION

In this paper we have proposed a novel hybrid feature extraction method to improve facial expression recognition. HOG has a very strong ability in local description, and WLD can effectively extract the texture information of images. The edges and texture information extracted by WLD are highly consistent with human perception, because they depend on the human perception of differences in brightness and are well matched to subjective criteria. After dividing the image into different regions and weighting each of them, we can extract the local information better. These histograms are then concatenated to effectively describe the global features of the image. The proposed method is also insensitive to noise and non-monotonic illumination variations. Hence, the facial expression recognition system using fused WLD and HOG features can classify different expressions with higher accuracy in less time.

Future interest lies in adaptive weighting, handling pose and illumination changes, and applying our system in real time. In addition, machine learning methods will be introduced into the system to achieve more accurate classification in future work.

REFERENCES

[1] I. Kotsia and I. Pitas, "Facial expression recognition in image sequences using geometric deformation features and support vector machines," IEEE Transactions on Image Processing, 2007, vol. 16, no. 1, pp. 172-187.
[2] S. Yang and B. Bhanu, "Facial expression recognition using emotion avatar image," 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, 2011, pp. 866-871.
[3] T. Gehrig and H. K. Ekenel, "Facial action unit detection using kernel partial least squares," 2011 IEEE International Conference on Computer Vision Workshops, 2011, pp. 2092-2099.
[4] J. Ou, X. B. Bai, Y. Pei, L. Ma, and W. Liu, "Automatic facial expression recognition using Gabor filter and expression analysis," 2010 Second International Conference on Computer Modeling and Simulation, 2010, pp. 215-218.
[5] S. Hommel and U. Handmann, "AAM based continuous expression recognition for face image sequences," IEEE 12th International Symposium on Computational Intelligence and Informatics, 2011, pp. 189-194.
[6] Y. Cheon and D. Kim, "Natural facial expression recognition using differential-AAM and manifold learning," Pattern Recognition, 2009, pp. 1340-1350.
[7] T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on feature distributions," Pattern Recognition, 1999, vol. 29, no. 1, pp. 51-59.
[8] D. Huang, M. Ardabilian, Y. Wang, and L. Chen, "A Novel Geometric Facial Representation based on Multi-Scale Extended Local Binary Patterns," 2011 IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, 2011, pp. 1-7.
[9] J. Chen, S. Shan, C. He, G. Zhao, and M. Pietikäinen, "WLD: A Robust Local Image Descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, vol. 32, no. 9, pp. 1705-1720.
[10] K. Yu, Z. Wang, L. Zhuo, J. Wang, Z. Chi, and D. Feng, "Learning realistic facial expressions from web images," Pattern Recognition, 2013, pp. 2144-2155.
[11] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
[12] A. Albiol, D. Monzo, A. Martin, J. Sastre, and A. Albiol, "Face recognition using HOG-EBGM," Pattern Recognition Letters, 2008, pp. 1537-1543.
[13] S. Chen, Y. Tian, Q. Liu, and D. N. Metaxas, "Recognizing expressions from face and body gesture by temporal normalized motion and appearance features," Image and Vision Computing, 2013, pp. 175-185.
[14] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor Phase Patterns (HGPP): A Novel Object Representation Approach for Face Recognition," IEEE Transactions on Image Processing, 2007, vol. 16, no. 1, pp. 57-68.
[15] A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[16] D. Lowe, "Object Recognition from Local Scale-Invariant Features," Proc. of the International Conference on Computer Vision, 1999.
[17] D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[18] M. J. Lyons, J. Budynek, and S. Akamatsu, "Automatic Classification of Single Facial Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, vol. 21, no. 12, pp. 1357-1362.
[19] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive Database for Facial Expression Analysis," Proceedings of the Fourth IEEE International Conference on Face and Gesture Recognition, 2000, pp. 46-53.
[20] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting," Journal of Computer and System Sciences, 1997, vol. 55, no. 1, pp. 119-139.
[21] P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," IEEE Conference on Computer Vision and Pattern Recognition, 2001, pp. 511-518.
[22] P. Viola and M. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, 2004, vol. 57, no. 2, pp. 137-154.
