

Appl. Math. Inf. Sci. 7, No. ?, 1-13 (2013)

Applied Mathematics & Information Sciences
An International Journal

http://dx.doi.org/10.12785/amis/paper

Face Recognition Using a Novel Image Representation Scheme and Multi-Scale Local Features

Qingchuan Tao1, Zhiming Liu1, George Bebis2, and Muhammad Hussain3

1 School of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan, P.R. China
2 Department of Computer Science and Engineering, University of Nevada, Reno, Nevada, USA
3 Department of Computer Science, King Saud University, Riyadh, Saudi Arabia

Received: ..., Revised: ..., Accepted: ...
Published online: ...

Abstract: This paper presents a new method for improving face recognition performance under difficult conditions. The proposed method represents faces using multi-scale local features extracted from a novel representation of face images which employs color information. Specifically, past research on face recognition has shown that color information can improve recognition accuracy and robustness. Instead of using the primary colors, R, G, and B, a new image representation scheme is proposed which is derived from the YCrQ color space using Principal Component Analysis (PCA) followed by Fisher Linear Discriminant Analysis (FLDA). Multi-scale local features are used for face representation, which are computed by extracting different resolution Local Binary Patterns (LBP) features from the new image representation and transforming the LBP features into the wavelet domain using the Discrete Wavelet Transform (DWT) with Haar wavelets; we refer to this new type of features as LBP-DWT. To optimize face representation, a variant of Nonparametric Discriminant Analysis (NDA), called Regularized Nonparametric Discriminant Analysis (RNDA), is introduced to extract the most discriminating features from LBP-DWT. The proposed methodology has been evaluated using two challenging face databases (i.e., FERET and Multi-PIE). We report promising experimental results showing that the proposed method outperforms two state-of-the-art methods, one based on Gabor features and the other based on Sparse Representation Classification (SRC). Further improvements are reported using score-level fusion.

Keywords: face recognition, color image, Local Binary Patterns (LBP), Discrete Wavelet Transform (DWT), Fisher Linear Discriminant Analysis (FLDA), Nonparametric Discriminant Analysis (NDA)

1 Introduction

Feature selection and feature extraction are crucial to many pattern classification problems (e.g., face recognition). The key objective is transforming the original data into a new domain where more effective features can be extracted, giving rise to significant differences between classes. Using a good set of features is vital for improving classifier performance. This paper presents a new method for face recognition which capitalizes on fusing features across the color, spatial, and frequency domains to enhance recognition performance. Specifically, the proposed method uses features extracted by employing three different modules: the image representation module, the image feature extraction module, and the statistical feature extraction module.

While popular methods in face recognition mainly focus on extracting features either from the image space (e.g., using techniques such as Gabor kernels [6] and Local Binary Patterns (LBP) [22]) or from transformed spaces (e.g., using statistical methods such as PCA [28] and FLDA [2]), there is little work on studying the role of image representation in face recognition. Using an appropriate image representation scheme can be considered an important feature extraction stage. Most face recognition methods start with an image representation that is a linear combination of the three primary colors, R, G, and B, for example, using the luminance image Y = 0.299R + 0.587G + 0.114B or the intensity image I = (R + G + B)/3. However, more theoretical work and experimental results are in great demand to support that such image representation schemes are optimal for image analysis, especially for face recognition. Ideally, novel image representation

∗ Corresponding author e-mail: [email protected]

© 2013 NSP
Natural Sciences Publishing Cor.


schemes that improve class separability are highly desirable in order to simplify classifier design and improve face recognition performance. Motivated by recent work on fusing color information to improve face recognition accuracy [17,35], we propose a new image representation scheme, which is derived from the YCrQ color space using PCA followed by FLDA.

Local image features, such as Gabor and LBP, have demonstrated remarkable success in improving face recognition performance [15,1]. Since Gabor filters are sensitive to different scales and orientations, Gabor features have been used to capture salient visual properties and cope with image variations such as illumination changes. Motivated by these studies, we propose a multiple-scale LBP feature that is computed by transforming the LBP features into the wavelet domain using Haar wavelets. This new type of feature, which will be referred to as LBP-DWT, extends the traditional single-resolution LBP feature to a multiple-scale space across both the spatial and frequency domains.

Image features, or low-level features, alone are not sufficient to deal with face recognition under complex conditions, because image features are usually redundant and noisy. It is widely accepted that statistical learning techniques, such as PCA and FLDA, can find low-dimensional spaces that contain more effective features, offering enhanced discriminating ability and computational efficiency. Motivated by Nonparametric Discriminant Analysis (NDA) [7], Large Margin Nearest Neighbor (LMNN) classification [31], and eigenfeature regularization [11], we present a variant of NDA using regularization (RNDA) to extract the most discriminating LBP-DWT features for enhancing face recognition performance.

The proposed face recognition approach has been extensively evaluated on two challenging face databases, FERET [23] and CMU Multi-PIE [8]. FERET images were acquired during different photo sessions where illumination conditions and face size may vary. CMU Multi-PIE images have large variations across pose, illumination, and expression. Experimental results show that the proposed method improves face recognition performance significantly and outperforms two state-of-the-art methods, one based on Gabor features [15] and the other based on SRC [32].

The rest of the paper is organized as follows: Section 2 reviews methods related to our approach. Section 3 presents the novel image representation scheme, while Section 4 introduces the LBP-DWT features and the RNDA approach. Section 5 reports our experimental results and comparisons. Finally, Section 6 contains our conclusions and directions for future research.

2 Background

Face recognition has attracted a lot of attention recently due to the complexity of the problem and the enormous number of applications in both the commercial and government sectors [3,10,38]. Numerous methods have been proposed in the past two decades to address the problems of face recognition with large image variations due to illumination, expression, blurring, etc., which are often confronted in real-world face recognition applications.

Like any pattern recognition problem, face recognition relies heavily on the features employed for classification. Most state-of-the-art face recognition methods start with extracting image-level features that are desired to be invariant to image variations, in order to facilitate subsequent statistics-level feature extraction for classification. Two popular image-level features used in face recognition are Gabor [15] and LBP [1]. Gabor features model quite well the receptive field profiles of cortical simple cells [6]. Thus, they capture salient visual properties such as spatial localization, orientation selectivity, and spatial frequency characteristics. The success of LBP in face recognition has been attributed to its robustness to monotonic gray-level transformations, since faces can be seen as a composition of micropatterns that are well described by this operator [1]. In general, LBP features describe small face details while Gabor features capture facial shape at larger scales. The complementary information they provide has been exploited for enhancing face recognition performance [37,20,27].

Methods using gray-level images have difficulties dealing with complex image variations, such as severe illumination changes, as evidenced in a recent Face Recognition Grand Challenge (FRGC) competition [24]. Instead of developing more sophisticated feature extraction methods and classifiers, a few researchers have investigated the potential of using color information to address some of the challenges in face recognition [16,17,34,35,20,5,30]. Using color information for face recognition has been motivated by the fact that, for any given color face image, a natural yet powerful complementary image representation can be provided using multiple color components, which can enhance the diversity of misclassifications in statistical pattern recognition [13]. As a result, feature-level or decision-level fusion can be used to improve the performance of face recognition.

Although most researchers have ignored the role of color information in face recognition, its promising effect on significantly boosting face recognition performance has been validated using several color face databases, such as FRGC [16,17,20], FERET [5], PIE [5], XM2VTSDB [5], AR [35,30], and GT [30]. Several researchers [16,34,35,30] have proposed creating new color spaces that are more discriminative than existing ones for face recognition, by applying statistical learning methods such as PCA, FLDA, Independent Component Analysis (ICA), and Tensor-based Discriminant Analysis. On the other hand, it has also been demonstrated that performance gains in face recognition can be achieved by



extracting different features from different color spaces [17,20,5].

Low-level features, such as Gabor and LBP, usually reside in a high-dimensional space. As it is more likely to enhance the generalization accuracy and robustness of classification in a lower-dimensional space [12], there is a great demand to find meaningful and compact patterns to improve the robustness of pattern recognition. Therefore, applying dimensionality reduction is an essential step in designing robust face recognition paradigms. Commonly used methods for dimensionality reduction include PCA and FLDA. PCA reduces a large set of correlated variables to a smaller number of uncorrelated components, which can be used for recognition or face reconstruction. On the other hand, FLDA can achieve higher separability than PCA among different classes by deriving a projection basis that separates data in different classes as much as possible and compresses data in each class as much as possible. A fundamental limitation of FLDA arises from the parametric nature of its scatter matrices, whose computation relies on the assumption that the samples in each class follow a Gaussian distribution. To address the case of a non-Gaussian distribution, NDA [7] has been proposed, which computes the between-class scatter matrix using samples near the boundary between classes. The margin, which is derived from the boundary between classes, plays an important role in classification, since the larger the margin the lower the generalization error. A large margin between classes can be artificially maintained locally using a linear transformation to pull same-labeled samples closer together and push differently labeled samples further apart [31,9]. As the high dimensionality of face image features and the small number of training samples usually cause the Small Sample Size (SSS) problem, dimensionality reduction is considered an important method to overcome this obstacle. It should be mentioned that, in dimensionality reduction, the eigenspace of the within-class scatter matrix usually contains an unstable subspace due to noise and the finite number of training samples. Eigenfeature regularization based on an eigenspectrum model can be used to alleviate the problems of instability, overfitting, and poor generalization [11].

3 A Novel Image Representation

Over the past two decades, much effort has been devoted to developing face recognition methods based on gray-scale image representations. Except in face detection, the effectiveness of color information in face recognition has been largely ignored. Recently, a few works have considered using the complementary characteristics residing in color spaces to improve face recognition performance [16,17,34,35,20,5,30]. While these methods improve recognition performance significantly, they also have higher computational complexity due to using multiple color components. To provide a trade-off between performance and computational efficiency, we present an optimum color space conversion scheme based on PCA and FLDA, namely a conversion from a 3-D color space to a 2-D image space.

Assuming a C1C2C3 color space, an image of resolution m × n consists of three color components C1, C2, and C3. Without loss of generality, let C1, C2, and C3 be column vectors: C1, C2, C3 ∈ R^N, where N = m × n. Each vector C is normalized to have zero mean and unit variance. A data matrix X ∈ R^{3×Nl} can be formed using the training images:

X = \begin{bmatrix} C^1_1 & C^2_1 & C^3_1 \\ C^1_2 & C^2_2 & C^3_2 \\ \vdots & \vdots & \vdots \\ C^1_l & C^2_l & C^3_l \end{bmatrix}^t    (1)

where l is the number of training images. In X, each column is an observation and each row is a variable. The covariance matrix Σ_X may be formulated as Σ_X = \frac{1}{Nl-1} \bar{X}\bar{X}^t ∈ R^{3×3}, where \bar{X} is the centered data matrix. PCA is used to factorize Σ_X into the following form: Σ_X = ΦΛΦ^t, where Φ = [Φ1, Φ2, Φ3] ∈ R^{3×3} is an orthonormal eigenvector matrix and Λ = diag{λ1, λ2, λ3} ∈ R^{3×3} is a diagonal eigenvalue matrix with diagonal elements in decreasing order (λ1 ≥ λ2 ≥ λ3).

By projecting the three color components C1, C2, C3 of an image onto the eigenvector Φ1, a new image representation U ∈ R^N can be obtained:

U = [C1, C2, C3] Φ1    (2)

According to the properties of PCA, U is optimal for data representation but not for data classification. To obtain a representation that improves discrimination, one needs to handle within- and between-class variations separately. FLDA is a well-known subspace learning method that yields an optimal subspace, separating different classes as far as possible and compressing each class as compactly as possible. Thus, a more powerful image representation can be obtained based on U via the FLDA framework.

Let \bar{U}_i be the mean vector of class ω_i and \bar{U} be the grand mean vector. Then the between- and within-class scatter matrices S_b and S_w are defined as follows:

S_b = \sum_{i=1}^{k} P(\omega_i)(\bar{U}_i - \bar{U})(\bar{U}_i - \bar{U})^t    (3)

S_w = \sum_{i=1}^{k} P(\omega_i) E\{(U - \bar{U}_i)(U - \bar{U}_i)^t \mid \omega_i\}    (4)

where P(ω_i) is the prior probability of class ω_i, k is the number of classes, and S_b, S_w ∈ R^{N×N}. The general Fisher


criterion in the U image space can be defined as follows:

J(P) = \frac{|P^t S_b P|}{|P^t S_w P|}    (5)

This criterion is maximized by deriving the optimal transform matrix P = [ψ1, ψ2, ..., ψd] ∈ R^{N×d}, where ψ1, ψ2, ..., ψd are chosen from the generalized eigenvectors of S_b Ψ = λ S_w Ψ corresponding to the d largest eigenvalues.

Let C = [C1, C2, C3] be the original color configuration, with each vector normalized to zero mean and unit variance. By projecting C onto the matrix P, one can define the general color-space between-class scatter matrix L_b and within-class scatter matrix L_w as follows [34]:

L_b = \sum_{i=1}^{k} P(\omega_i)(\bar{C}_i - \bar{C})^t P P^t (\bar{C}_i - \bar{C})    (6)

L_w = \sum_{i=1}^{k} P(\omega_i) E\{(C - \bar{C}_i)^t P P^t (C - \bar{C}_i) \mid \omega_i\}    (7)

where \bar{C}_i is the mean of class ω_i, \bar{C} is the grand mean, and L_b, L_w ∈ R^{3×3}. By obtaining the generalized eigenvectors ξ1, ξ2, ξ3 of L_b Ξ = λ L_w Ξ, the eigenvector ξ1 corresponding to the largest eigenvalue is selected as the optimal color transformation, which generates a discriminating image representation suitable for face recognition. Finally, the novel image representation D ∈ R^N is derived by projecting the three color component images C1, C2, C3 of an image onto ξ1:

D = [C1, C2, C3] ξ1    (8)
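As an illustration, the two-stage conversion above can be sketched in NumPy on synthetic data. Everything below (the data shapes, the equal class priors, the small ridge term used to invert the scatter matrices, and all variable names) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for l training images in a C1C2C3 color space:
# each color component is flattened to an N-vector and normalized to
# zero mean / unit variance, as assumed in the text.
l, N, k = 12, 64, 3                       # images, pixels, classes
labels = np.repeat(np.arange(k), l // k)  # hypothetical class labels
C = rng.standard_normal((l, N, 3))
C = (C - C.mean(1, keepdims=True)) / C.std(1, keepdims=True)

def top_eigvecs(Sw, Sb, d, eps=1e-6):
    """Leading d generalized eigenvectors of Sb v = lam Sw v
    (ridge-regularized inverse for numerical safety)."""
    w, V = np.linalg.eig(np.linalg.solve(Sw + eps * np.eye(len(Sw)), Sb))
    order = np.argsort(w.real)[::-1]
    return V[:, order[:d]].real

# Stage 1 (PCA over the 3 color variables, Eq. 2).
X = C.reshape(-1, 3)                      # rows = pixel observations
Sigma = np.cov(X, rowvar=False)           # Sigma_X in R^{3x3}
Phi1 = np.linalg.eigh(Sigma)[1][:, -1]    # leading eigenvector
U = C @ Phi1                              # one N-vector per image

# Stage 2 (FLDA on U, equal priors for brevity, Eqs. 3-5).
gm = U.mean(0)
Sb = sum(np.outer(U[labels == i].mean(0) - gm,
                  U[labels == i].mean(0) - gm) for i in range(k)) / k
Sw = sum(np.cov(U[labels == i], rowvar=False, bias=True)
         for i in range(k)) / k
P = top_eigvecs(Sw, Sb, d=2)              # optimal transform matrix

# Color-space scatter matrices (Eqs. 6-7) and the final axis xi1 (Eq. 8).
Mi = np.stack([C[labels == i].mean(0) for i in range(k)])
Cg = C.mean(0)
Lb = sum(((Mi[i] - Cg).T @ P) @ ((Mi[i] - Cg).T @ P).T
         for i in range(k)) / k
Lw = sum(((C[j] - Mi[labels[j]]).T @ P) @ ((C[j] - Mi[labels[j]]).T @ P).T
         for j in range(l)) / l
xi1 = top_eigvecs(Lw, Lb, d=1)[:, 0]
D = C @ xi1                               # novel image representation
print(D.shape)                            # one N-vector per image
```

Note how the 3-D color space collapses to a single 2-D image per face: the output D carries one N-vector per training image, which is the "3-D to 2-D" trade-off the section describes.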

4 Face Recognition Using LBP-DWT and RNDA

In this section, we first present the LBP-DWT approach that yields multi-scale local features in both the spatial and frequency domains. Then, we present the RNDA approach that is used to extract the most discriminating features from LBP-DWT. For completeness, we provide a brief review of LBP.

4.1 LBP

LBP, which was originally introduced for texture analysis [21], has been successfully extended to describe faces as a composition of micro-patterns and has demonstrated good success in face recognition. In a 3×3 neighborhood of the image, the basic LBP operator assigns a binary label 0 or 1 to each surrounding pixel by thresholding its gray value against the value of the central pixel. Then, the value of the central pixel is replaced with a decimal number obtained by concatenating the binary labels of the neighbors. Formally, the LBP operator is defined as follows:

LBP = \sum_{p=0}^{7} 2^p \, s(i_p - i_c)    (9)

where s(i_p − i_c) equals 1 if i_p − i_c ≥ 0 and 0 otherwise. Fig. 1 illustrates an example of the basic LBP operator.

Fig. 1: An example of the basic LBP approach. (Figure: a 3×3 neighborhood with central value 60 is thresholded against its neighbors, yielding the binary string 10000011, i.e., the decimal label 131.)

Two extensions of the basic LBP were developed in [22] to enhance the discrimination ability of the texture descriptor. The first extension allows LBP to deal with any neighborhood size by using circular neighborhoods and bilinearly interpolating the pixel values. The second extension defines the so-called uniform patterns. Refer to [22] for the details.
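Eq. (9) can be sketched directly. The bit ordering below (which neighbor receives weight 2^0 and the traversal direction) is our own choice for illustration; the paper's figure uses its own convention, so the decimal labels differ while the operator is the same:

```python
import numpy as np

def basic_lbp(img):
    """Basic 3x3 LBP (Eq. 9): threshold the 8 neighbors against the
    center pixel and concatenate the resulting bits into a decimal label."""
    img = np.asarray(img, dtype=int)
    out = np.zeros_like(img)
    # Neighbor offsets, ordered p = 0..7 (here: clockwise from top-left).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            code = 0
            for p, (dr, dc) in enumerate(offsets):
                if img[r + dr, c + dc] >= img[r, c]:   # s(ip - ic)
                    code |= 1 << p                     # weight 2^p
            out[r, c] = code
    return out

# Toy 3x3 patch: only the three neighbors >= the center (60) set a bit.
patch = np.array([[80, 70, 60],
                  [55, 60, 52],
                  [50, 45, 41]])
print(basic_lbp(patch)[1, 1])   # -> 7 (bits 0, 1, 2 under this ordering)
```

In practice the per-pixel labels are not used directly; spatially local histograms of the labels form the feature vector, as in the LBP histograms of Fig. 2.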

The aforementioned extensions allow LBP to capture image information at different spatial resolutions. For example, the LBP^{u2}_{8,1} operator describes facial microstructure while LBP^{u2}_{8,3} encodes information in larger regions. Similarly to Gabor features, which encode information from multiple scales to improve face recognition, multiple-resolution LBP features contain face information from multiple scales.

A straightforward method to generate multiple-resolution LBP features is to concatenate histogram features derived from different LBP operators. This, however, leads to high-dimensional vectors that can degrade classifier performance. To address this concern, we propose using DWT.

4.2 LBP-DWT

DWT takes a series of n observations and produces n new values (i.e., wavelet coefficients). An advantage of DWT is its ability to decompose the variance of the observations on a scale-by-scale basis. Specifically, DWT decomposes an input signal x simultaneously using a low-pass filter g(k) and a high-pass filter h(k). The output consists of two parts, the approximation and detail coefficients. Mathematically, DWT is defined as follows:

y_{low}(k) = \sum_{n} x(n) \, g(n - 2k)    (10)

y_{high}(k) = \sum_{n} x(n) \, h(n - 2k)    (11)


Fig. 2: Illustration of the proposed face recognition method. (Figure: LBP^{u2}_{8,1}, LBP^{u2}_{8,2}, and LBP^{u2}_{8,3} histograms are extracted from the novel image representation as multiple-scale LBP features and passed through the discrete wavelet transform, yielding LBP-DWT approximation and detail coefficients; each part is processed by RNDA (nonparametric discriminant analysis) and the outputs are combined by score-level fusion.)

The decomposition is applied recursively to the low-pass output until a desired number of iterations is reached.

Due to concerns related to computational efficiency, we perform only a one-level DWT decomposition using the Haar wavelet. The resulting LBP-DWT features contain multiple-scale information from both the spatial and frequency domains. Fig. 2 illustrates the process for generating the LBP-DWT features from the novel image representation. The detail and approximation parts of LBP-DWT, which are complementary to each other, are fed to RNDA, and the classification outputs are fused at the score level to further improve classification results.
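A one-level Haar decomposition of a feature vector, per Eqs. (10)-(11), reduces to pairwise sums and differences with downsampling by two. The sketch below assumes an even-length input and the common orthonormal Haar filter pair; the toy feature vector is invented for illustration:

```python
import numpy as np

def haar_dwt1(x):
    """One-level DWT (Eqs. 10-11) with the Haar filter pair
    g = [1, 1]/sqrt(2) (low-pass) and h = [1, -1]/sqrt(2) (high-pass):
    convolution followed by dyadic downsampling (stride 2)."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(2.0)
    approx = (x[0::2] + x[1::2]) / s   # y_low: approximation coefficients
    detail = (x[0::2] - x[1::2]) / s   # y_high: detail coefficients
    return approx, detail

# A toy "LBP histogram" vector stands in for the multi-scale LBP features.
feat = np.array([4., 4., 6., 2., 5., 5., 1., 3.])
a, d = haar_dwt1(feat)
print(a)   # halved-length smooth part
print(d)   # halved-length detail part
```

Each output has half the input length, which is precisely why the transform counters the dimensionality blow-up of concatenated multi-resolution LBP histograms; the orthonormal filters also preserve the total energy of the input.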

4.3 RNDA

FLDA is a commonly used classification method for face recognition. However, when the samples follow a non-Gaussian distribution, it often leads to performance degradation. Another drawback of FLDA is that the number of features, assuming L classes, has an upper bound of L − 1. To overcome these problems, Fukunaga proposed a nonparametric technique known as NDA [7]. The basis of the extension from FLDA to NDA is a nonparametric between-class scatter matrix that measures between-class scatter using k-nearest neighbors (kNN) on a local basis, instead of merely the class centers. Under the new formulation, and by preferentially weighting the samples that are near the classification boundary, NDA inherently yields features preserving the structure that is important for classification [7]. The original NDA is restricted to two-class problems; recent work has extended NDA to the general L-class problem, making it suitable for face recognition [14,9].

The margin, which is defined as a hyperplane representing the largest separation between classes, is widely used in several machine learning algorithms. A well-known example is Support Vector Machines (SVMs), which choose a subset of input vectors (i.e., support vectors) to construct the decision hyperplane by maximizing the margin for good generalization. In practice, samples from different classes are likely to mix with each other near the boundary. Such overlapping samples bring considerable difficulty to the learning process and are the primary cause of classification errors. One solution to this problem is to learn a linear transformation of the input space, which pulls same-labeled samples closer together and pushes differently labeled samples further apart in a local region [31,9]. This allows a large margin between different classes to be created in overlapping regions. For visualization purposes, in Fig. 3 we use a three-class problem in a two-dimensional input space to illustrate the main idea behind using a linear transformation to enlarge the margin. Fig. 3a shows that the samples of three classes mix with each other in a local neighborhood. An appropriate linear transformation is learned to pull the same-class samples close while pushing the different-class samples away. After the transformation has been applied to the data, there exists a finite margin between class 2 and the other classes in the local region, as shown in Fig. 3d. In particular, the effect of the linear transformation, which enlarges the margin between class 1 and class 2 in Fig. 3d, can be approximated by reducing the distance d2 between sample b1 and local mean mb(b1) while increasing the distance d1 between sample b1 and local mean ma(b1), as shown in Fig. 3c [9].

Motivated by the aforementioned methods [31,9], we present a variant of NDA. Let x^i_p be the pth sample of class i. Its k nearest neighbors (kNNs) in class j can be found and their local mean is denoted as m_j(x^i_p). Based on m_j(x^i_p), its k nearest neighbors in class i can be found and their local mean is denoted as m_i(x^i_p). An example is given in Fig. 3c, where k = 4 is used to define the


Fig. 3: Schematic illustration of the linear transformation to enlarge the margin between three classes in a two-dimensional input space. A local neighborhood centered at sample b1 contains its 3 nearest neighbors (3NNs) in class 2, as shown in (a). A linear transformation can be learned in (b) to pull same-labeled samples closer together and push differently labeled samples further apart, such that after the transformation there exists a finite margin between class 2 and the other classes in this local region, as shown in (d). In (c), the effect of the linear transformation, which enlarges the margin between class 1 and class 2 in (d), can be approximated by reducing the distance d2 between sample b1 and local mean mb(b1) while increasing the distance d1 between sample b1 and local mean ma(b1).

kNNs, with ma(b1) = m_j(x^i_p) and mb(b1) = m_i(x^i_p). During the learned linear transformation, pushing m_j(x^i_p) away from x^i_p as far as possible and pulling m_i(x^i_p) toward x^i_p as close as possible are approximated by maximizing the ratio of between-class scatter to within-class scatter [9].

Following the definition of the nonparametric scatter matrix [7,14], the between-class and within-class scatter matrices are defined as

S_b = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \neq i}}^{c} \sum_{p=1}^{n_i} w(i, j, p) (x^i_p - m_j(x^i_p))(x^i_p - m_j(x^i_p))^t    (12)

S_w = \sum_{i=1}^{c} \sum_{p=1}^{n_i} (x^i_p - m_i(x^i_p))(x^i_p - m_i(x^i_p))^t    (13)

where c is the number of classes and n_i is the number of samples in class i. The weighting function, denoted as w(i, j, p), is defined as

w(i, j, p) = \frac{\min(\|x^i_p - m_i(x^i_p)\|^{\alpha}_{1}, \|x^i_p - m_j(x^i_p)\|^{\alpha}_{1})}{\|x^i_p - m_i(x^i_p)\|^{\alpha}_{1} + \|x^i_p - m_j(x^i_p)\|^{\alpha}_{1}}    (14)

where ∥·∥_1 is the l1-norm and α is a parameter between zero and infinity, which controls how fast the weight changes with respect to the distance ratio. Using the Fisher criterion, the optimum linear transformation is thus composed of the leading eigenvectors of S_w^{-1} S_b.
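The nonparametric scatter matrices (12)-(13) with the weighting (14) can be sketched as follows. This is a naive O(n^2) illustration on synthetic two-class data; whether a sample is excluded from its own k nearest neighbors, and the choices of k and alpha, are convention choices not fixed by the text:

```python
import numpy as np

def local_mean(x, X, k):
    """Mean of the k nearest neighbors of x among the rows of X
    (the point itself is included if present, for simplicity)."""
    dist = np.linalg.norm(X - x, axis=1)
    return X[np.argsort(dist)[:k]].mean(0)

def nda_scatter(X, y, k=3, alpha=1.0):
    """Nonparametric Sb and Sw of Eqs. (12)-(13) with the boundary
    weighting w(i, j, p) of Eq. (14) (l1 distances, exponent alpha)."""
    classes = np.unique(y)
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for i in classes:
        Xi = X[y == i]
        for xp in Xi:
            mi = local_mean(xp, Xi, k)
            Sw += np.outer(xp - mi, xp - mi)          # Eq. (13) term
            for j in classes:
                if j == i:
                    continue
                mj = local_mean(xp, X[y == j], k)
                di = np.sum(np.abs(xp - mi)) ** alpha  # l1 norm ** alpha
                dj = np.sum(np.abs(xp - mj)) ** alpha
                w = min(di, dj) / (di + dj)            # Eq. (14)
                Sb += w * np.outer(xp - mj, xp - mj)   # Eq. (12) term
    return Sb, Sw

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(3, 1, (10, 5))])
y = np.array([0] * 10 + [1] * 10)
Sb, Sw = nda_scatter(X, y)
# Discriminant directions: leading eigenvectors of inv(Sw) @ Sb.
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
print(evecs[:, np.argmax(evals.real)].real)
```

Note that w(i, j, p) is at most 0.5 and is largest when a sample is equidistant from both local means, i.e., near the class boundary, which is exactly the preferential boundary weighting the text describes.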

In face recognition, the dimensionality of the face image is usually very high compared to the number of available training samples, which may make the within-class scatter matrix singular. The inverse of the within-class scatter matrix then often becomes unstable and problematic. A popular solution to this problem is to first use PCA for dimensionality reduction so that the within-class scatter matrix becomes nonsingular; FLDA is then applied in the lower-dimensional space. However, FLDA discards the null space of the within-class scatter matrix, which is also deemed to contain important discriminatory information [4]. To address this issue, an eigenfeature regularization subspace approach was proposed in [11] to extract the discriminatory information residing in both the principal and null spaces of the within-class scatter matrix. Integrating eigenfeature regularization with NDA leads to the proposed RNDA approach, which can be summarized as follows:

Training stage:

1. Apply PCA to a set of training samples and select the leading eigenvectors to construct the principal subspace $W_{pca} = \{\psi_1, \psi_2, \ldots, \psi_n\}$, so as to avoid the singularity problem and reduce noise.

2. Project each training sample $x$ into $W_{pca}$ and whiten the features: $u = W_{wpca}^T x$, where $W_{wpca} = \{\psi_1/\sqrt{\lambda_1}, \psi_2/\sqrt{\lambda_2}, \ldots, \psi_n/\sqrt{\lambda_n}\}$.

3. Use $u$ to compute the nonparametric $S_b$ and $S_w$ by (12) and (13).

4. Subspace decomposition of $S_w$:
   (a) Apply PCA to $S_w$ to derive the eigenspace $\Phi = \{\phi_i\}_{i=1}^{l}$ and the eigenvalues, sorted in descending order $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_l$.
   (b) Split $\Phi$ into two subspaces: a principal subspace $\{\phi_i\}_{i=1}^{m}$ and its complement, a trailing subspace $\{\phi_i\}_{i=m+1}^{l}$. The value of $m$ can be estimated using a median-based operation [11].

5. Eigenspectrum regularization. In the trailing subspace $\{\phi_i\}_{i=m+1}^{l}$, the largest eigenvalue $\lambda_{m+1}$ is used to replace the remaining eigenvalues $\lambda_i$ ($m+2 \leq i \leq l$). As a result, a new eigenvalue matrix $\Gamma_m$ is formed, whose diagonal elements are the union of the eigenvalues in $\{\phi_i\}_{i=1}^{m}$ and the replaced ones in $\{\phi_i\}_{i=m+1}^{l}$.

6. Whiten $S_w$; the new between-class scatter matrix is $S_b' = \Gamma_m^{-1/2} \Phi^t S_b \Phi \Gamma_m^{-1/2}$.

7. Compute the eigenspace $\Theta$ of $S_b'$ and finally generate the transformation matrix $U = W_{wpca} \Phi \Gamma_m^{-1/2} \Theta$.
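The training stage can be sketched in NumPy as follows. The names are ours; the nonparametric $S_b$ and $S_w$ of step 3 are taken as inputs, and the median-based estimate of $m$ from [11] is replaced by a simple stand-in (counting eigenvalues above the median), so this is an illustration of the pipeline rather than the paper's exact procedure.

```python
import numpy as np

def pca_whiten(X, n_pca):
    """Steps 1-2: principal subspace plus whitening."""
    mu = X.mean(axis=0)
    Xc = X - mu
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(X))
    order = np.argsort(evals)[::-1][:n_pca]          # leading eigenvectors
    Wwpca = evecs[:, order] / np.sqrt(evals[order] + 1e-12)
    return Wwpca, mu

def rnda_subspace(Sb, Sw, Wwpca, n_out):
    """Steps 4-7: eigenspectrum regularization of S_w, then discriminant directions."""
    lam, Phi = np.linalg.eigh(Sw)
    idx = np.argsort(lam)[::-1]                      # eigenvalues descending
    lam, Phi = lam[idx], Phi[:, idx]
    m = int(np.sum(lam > np.median(lam)))            # stand-in for the median rule of [11]
    lam_reg = lam.copy()
    lam_reg[m + 1:] = lam_reg[m]                     # step 5: flatten the trailing spectrum
    G = np.diag(1.0 / np.sqrt(lam_reg))              # Gamma_m^{-1/2}
    Sb_prime = G @ Phi.T @ Sb @ Phi @ G              # step 6: whitened between-class scatter
    lb, Theta = np.linalg.eigh(Sb_prime)
    Theta = Theta[:, np.argsort(lb)[::-1][:n_out]]
    return Wwpca @ Phi @ G @ Theta                   # step 7: U = Wwpca Phi Gamma^{-1/2} Theta
```

The test plugs in ordinary parametric scatter matrices purely to exercise the pipeline; in the proposed method they would come from Eqs. (12)-(13).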

Recognition stage:

1. Project a probe sample $x_p$ into $U$ to extract the probe feature vector $y_p = U^T x_p$.

2. Apply a classifier to recognize the probe feature vectors.

The classifier employs the nearest neighbor rule for classification using the cosine distance measure between a probe feature vector $y_p$ and a gallery feature vector $y_g$:

$$\delta_{cos}(y_p, y_g) = \frac{-y_p^t\, y_g}{\|y_p\| \, \|y_g\|} \quad (15)$$
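The recognition stage reduces to a few lines: since minimizing $\delta_{cos}$ of Eq. (15) equals maximizing cosine similarity, a minimal sketch (names are ours) is:

```python
import numpy as np

def nearest_gallery(y_p, gallery, labels):
    """Nearest-neighbor classification under the cosine distance of Eq. (15).
    gallery: (G, d) feature vectors, labels: (G,) identities."""
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ (y_p / np.linalg.norm(y_p))   # cosine similarities; delta_cos = -sims
    return labels[int(np.argmax(sims))]
```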

5 Experiments

Extensive experiments have been carried out to evaluate the proposed face recognition method using two challenging data sets: FERET [23] and CMU Multi-PIE [8]. Specifically, we demonstrate the effectiveness of the proposed method using three sets of experiments. The first set evaluates the effectiveness of the novel image representation scheme for face recognition. The second set evaluates the discrimination power of LBP-DWT features and compares them with traditional LBP features. The third set compares the proposed face recognition method with two state-of-the-art methods: Gabor-based recognition [15] and RSC [36].

5.1 Data Sets

FERET. The color FERET database [23] consists of more than 13,000 facial images corresponding to more than 1,500 subjects. Since the images were acquired during different sessions, the illumination conditions and the size of the face may vary. The diversity of the FERET database spans gender, ethnicity, and age. The images were acquired without any restrictions imposed on facial expression and with at least two frontal images shot at different times during the same session. The FERET database has become the de facto standard for evaluating face recognition technologies.

The data sets used in the experiments contain four frontal face image sets: fa, fb, dup1, and dup2. Specifically, 1,660 images comprising 830 persons are first selected from the fa and fb sets to form a set, each person having one fa image and one fb image. From this set, a subset of 1,000 images corresponding to 500 persons is then randomly selected to construct the training set. The gallery set consists of 967 fa images. The two most challenging FERET probe sets, dup1 and dup2, which are used for analyzing the effect of aging on face recognition performance, serve as the testing sets in the experiments. The dup1 and dup2 sets contain 722 and 228 images, respectively. All images are normalized by the following procedure: first, manual annotation locates the centers of the eyes; second, rotation and scaling transformations align the centers of the eyes to predefined locations with a fixed interocular distance; finally, a subimage procedure crops the face image to the size of 64 × 64 to extract the facial region, with the left and right eyes located at (17,17) and (47,17).
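The eye-based normalization above can be sketched as a similarity transform that maps the annotated eye centers onto the predefined locations (17,17) and (47,17) of a 64 × 64 crop. This is our illustrative reconstruction, not the authors' code; nearest-neighbor sampling keeps it dependency-free, whereas a real pipeline would interpolate.

```python
import numpy as np

def align_face(img, left_eye, right_eye, out=64, dst_l=(17, 17), dst_r=(47, 17)):
    """Rotate and scale `img` so the eyes land on dst_l/dst_r, then crop out x out.
    Eye points are given as (x, y)."""
    src = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    dst = np.asarray(dst_r, float) - np.asarray(dst_l, float)
    s = np.linalg.norm(dst) / np.linalg.norm(src)     # enforce fixed interocular distance
    th = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    A = s * np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])     # maps the src eye axis onto dst
    Ainv = np.linalg.inv(A)
    ys, xs = np.mgrid[0:out, 0:out]
    pts = np.stack([xs.ravel() - dst_l[0], ys.ravel() - dst_l[1]])
    q = Ainv @ pts                                    # back-map each output pixel
    qx = np.clip(np.round(q[0] + left_eye[0]).astype(int), 0, img.shape[1] - 1)
    qy = np.clip(np.round(q[1] + left_eye[1]).astype(int), 0, img.shape[0] - 1)
    return img[qy, qx].reshape(out, out)              # nearest-neighbor sampling
```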

Multi-PIE. The Pose, Illumination, and Expression (PIE) database was collected at CMU in the fall of 2000 [25] in order to support research on face recognition across pose and illumination. Despite its popularity in the research community, the PIE database has some shortcomings. In particular, only 68 subjects were recorded, in a single session, displaying a small range of expressions (neutral, smile, blink, and talk) [8]. To address these issues, the Multi-PIE database was collected between October 2004 and March 2005. This new database improves upon the PIE database in a number of categories. Most notably, a substantially larger number of subjects were imaged (337 vs. only 68 in PIE) in up to four recording sessions [8]. In addition, the recording environment of the Multi-PIE database has been improved over the PIE collection through the use of a uniform, static background and live monitors showing the subjects during recording, allowing for constant control of the head position [8].

The experiments use only frontal images, i.e. pose 05_1. In particular, all 249 subjects present in session 1 serve as the training set. Each subject has six images covering illuminations 12, 14, and 16; the number of training images is thus 1,494. The gallery set consists of 249 images that were recorded in a neutral expression without flash illumination, i.e. illumination 0. Of all 337 subjects in Multi-PIE, 129 were recorded in all


Fig. 4: Example images from the FERET and Multi-PIE databases. The top row shows examples of FERET images. The bottom row shows examples of Multi-PIE images, in which the illumination labels are 0, 4, 8, 10, 12, 14, and 16 from left to right.

four sessions. The probe set comprises the images of these 129 subjects in sessions 2, 3, and 4, covering illuminations 4, 8, and 10; it contains 3,483 images. Since Multi-PIE does not provide ground-truth eye locations, face images in the experiments are geometrically rectified by first automatically detecting face and eye locations with the Viola-Jones face and eye detectors [29] and then aligning the centers of the eyes to predefined locations to extract the 64 × 64 face region. However, due to the occlusion of sunglasses and the presence of dark glasses frames, the Viola-Jones eye detector fails to locate the eyes for a small portion of the face images. In such cases, manual annotation is used to locate the eyes.

Fig. 4 shows some example FERET and Multi-PIE images used in the experiments, already cropped to a 64 × 64 face region. The FERET database features various ethnicities and acquisition environments, while the Multi-PIE database has different lighting directions and extreme expressions. All images used in the experiments are original color images without any photometric normalization.

5.2 Evaluation of the Novel Image Representation Scheme

The essence of the novel image representation scheme is learning a transformation from a space of uncorrelated color representations to the space of the most discriminative image. Previous research [18,19] has shown that the YCrQ color space provides more discriminatory information than other color spaces in the case of the FRGC database [24]; therefore, we have used YCrQ in our experiments. To derive the novel image representation from YCrQ, the transformation coefficients given by (8) are first obtained using PCA followed by FLDA on a set of training images corresponding to the union of the FERET and Multi-PIE training sets. In this case, the transformation coefficients are [0.5768, 0.8156, 0.0446]. It should be noted that the Y, Cr, and Q color images

Fig. 5: Examples of the novel image representation. The images from left to right are: Y, Cr, Q, and the novel image representation.

should be normalized to zero mean and unit variance before applying the linear combination. Fig. 5 shows some examples of the novel image representation scheme. Interestingly, the resulting representation looks more like the Cr color image than the gray-scale image Y. It is this color-related information that helps the novel image representation scheme outperform gray-level-based face recognition.
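Given the trained coefficients, forming the novel representation is a one-liner per pixel: z-score each plane, then combine. A minimal sketch (the function name is ours):

```python
import numpy as np

def novel_representation(Y, Cr, Q, coeffs=(0.5768, 0.8156, 0.0446)):
    """Linear combination of z-scored Y, Cr, Q planes; the coefficients are the
    PCA+FLDA values reported in the text for Eq. (8)."""
    z = lambda img: (img - img.mean()) / img.std()   # zero mean, unit variance
    return coeffs[0] * z(Y) + coeffs[1] * z(Cr) + coeffs[2] * z(Q)
```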

Three classification methods, Eigenfaces (PCA) [28], Fisherfaces (FLDA) [2], and SRC [32], were used in our experiments to evaluate the effectiveness of the novel image representation scheme for face recognition. The Eigenfaces and Fisherfaces methods are classical face


Table 1: The performance of the novel image representation, gray image Y, color image Cr, and color image Q on the FERET and Multi-PIE databases.

Image                 Method     FERET dup1   FERET dup2   Multi-PIE
Y image               PCA        34.76%       10.96%       19.37%
                      FLDA       50.69%       28.00%       55.29%
                      SRC [32]   55.95%       48.24%       56.04%
Cr image              PCA        48.75%       24.12%       38.38%
                      FLDA       60.66%       47.80%       59.57%
                      SRC [32]   56.64%       47.36%       64.05%
Q image               PCA        43.76%       27.19%       17.97%
                      FLDA       63.40%       49.60%       41.65%
                      SRC [32]   59.00%       52.63%       52.56%
Novel representation  PCA        54.98%       41.66%       33.90%
                      FLDA       72.71%       72.80%       62.13%
                      SRC [32]   67.45%       66.66%       63.04%

recognition methods, which exploit the global scatter information of the training samples but largely ignore the sample structures nonlinearly embedded in the high-dimensional face space. Although face images are naturally very high dimensional, they lie on or near low-dimensional subspaces or submanifolds. If a collection of representative samples is found for the distribution, one should expect that a typical sample has a very sparse representation with respect to such a basis [33]. Inspired by this finding, Wright et al. proposed the SRC approach for face recognition [32]. In SRC, a given test image is first coded as a sparse linear combination of the training samples via l1-norm minimization; recognition is then accomplished by evaluating which class among the training samples yields the smallest reconstruction error for the test image. In our experiments, we used all 1,660 training images from the FERET database in SRC to improve the likelihood that, for any test image, similar training images exist. In addition, PCA was applied before SRC to reduce the dimensionality of the input images.
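The two SRC steps, l1-regularized coding followed by minimum per-class residual, can be sketched as below. ISTA is our choice of l1 solver, and `lam` and `iters` are illustrative values, not parameters reported in [32].

```python
import numpy as np

def src_classify(x, A, labels, lam=0.01, iters=500):
    """SRC sketch: sparse coding of x over the training dictionary A by ISTA,
    then classification by minimum per-class reconstruction residual.
    A: (d, N) column-stacked, l2-normalized training samples; labels: (N,)."""
    L = np.linalg.norm(A, 2) ** 2                  # step size from the spectral norm
    c = np.zeros(A.shape[1])
    for _ in range(iters):
        c = c - A.T @ (A @ c - x) / L              # gradient step on 0.5*||Ac - x||^2
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)   # soft-thresholding
    classes = np.unique(labels)
    residuals = [np.linalg.norm(x - A[:, labels == k] @ c[labels == k])
                 for k in classes]
    return classes[int(np.argmin(residuals))]
```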

Table 1 shows the recognition accuracy of the novel image representation, the gray-level image Y, the color image Cr, and the color image Q on the two databases using the different recognition methods. Generally speaking, the Cr color image yields better accuracy than the gray image Y on both databases. This finding has also been confirmed on the FRGC database [20]. The new image representation, defined as a linear combination of Y, Cr, and Q, thus inherits the discrimination power of Y, Cr, and Q to further improve face recognition performance.

5.3 Evaluation of LBP-DWT

To extract the LBP features, the face image, which is of size 64 × 64, is divided into overlapping patches. The size of each patch is 9 × 9 and there are 3 overlapping pixels between adjacent patches. This leads to 144 patches, with each patch yielding LBP features of length 59. As a result, each of the three LBP operators, LBP^{u2}_{8,1}, LBP^{u2}_{8,2}, and LBP^{u2}_{8,3}, yields LBP histograms of length 8,496.
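A self-contained sketch of the per-patch uniform-LBP histograms follows. It covers only the radius-1 operator (radii 2 and 3 follow analogously), and the 5-pixel step is our reading of the patch counts: 144 patches of 9 × 9 from a 64 × 64 image imply a 12 × 12 grid.

```python
import numpy as np

def lbp_codes_r1(img):
    """Basic 8-neighbor, radius-1 LBP codes for the interior pixels of a patch."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(shifts):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (n >= c).astype(np.int64) << bit
    return codes

def u2_lookup():
    """Map each 8-bit code to one of 59 bins: 58 uniform patterns + 1 catch-all."""
    lut = np.full(256, 58, dtype=np.int64)
    nxt = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        if sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2:   # <= 2 transitions
            lut[code] = nxt
            nxt += 1
    return lut

def lbp_patch_histograms(face, patch=9, step=5):
    """Concatenated per-patch LBP^{u2} histograms: 144 patches x 59 bins = 8,496."""
    lut = u2_lookup()
    hists = []
    for y in range(0, face.shape[0] - patch + 1, step):
        for x in range(0, face.shape[1] - patch + 1, step):
            codes = lbp_codes_r1(face[y:y + patch, x:x + patch])
            hists.append(np.bincount(lut[codes].ravel(), minlength=59))
    return np.concatenate(hists)
```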

In the case of the new image representation, we first extract the LBP features using the three LBP operators; RNDA is then applied to extract a set of discriminating features from the detail and approximation parts of LBP-DWT; finally, the classification outputs are fused by the sum rule at the score level to derive the final classification results, as shown in Fig. 2. For comparison, multiclass NDA [14] is also implemented in the experiments. There are two important parameters in RNDA and NDA: the number of nearest neighbors k and the weight parameter α. As α does not influence recognition performance significantly, we set its value to 2 for both databases. However, k is training-set specific and affects performance. If k is set to the number of training samples per class and w(i, j, p) in (12) is set to one, NDA essentially reduces to a generalization of Parametric Discriminant Analysis (PDA) [14]. To avoid this, in our experiments we set k to one half of the number of training samples per class, i.e. 1 and 3 for the FERET and Multi-PIE databases, respectively.
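The split of an LBP histogram into the approximation and detail parts that RNDA operates on is a one-level Haar DWT; a minimal sketch:

```python
import numpy as np

def haar_dwt_1d(h):
    """One-level Haar DWT of a 1-D feature vector: returns the approximation
    (low-pass) and detail (high-pass) parts."""
    h = np.asarray(h, float)[: len(h) // 2 * 2]   # truncate to even length
    a = (h[0::2] + h[1::2]) / np.sqrt(2.0)        # approximation coefficients
    d = (h[0::2] - h[1::2]) / np.sqrt(2.0)        # detail coefficients
    return a, d
```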

Fig. 6 shows the effectiveness of the different LBP features. As can be observed, LBP^{u2}_{8,2} achieves better performance than LBP^{u2}_{8,1} and LBP^{u2}_{8,3}. The proposed LBP-DWT features consistently outperform the LBP features; their accuracy is 88.08%, 83.33%, and 72.26% on FERET dup1, FERET dup2, and Multi-PIE, respectively, when using RNDA. The improvement over LBP^{u2}_{8,2} is 2.3%, 1.0%, and 5.0%. In particular, the gain is more significant for the Multi-PIE database, which contains images with larger illumination variation. Generally, RNDA is marginally better than NDA for the LBP-DWT features.


Fig. 6: Face recognition performance (recognition rate, %) of traditional LBP features and the proposed LBP-DWT features, both extracted from the novel image representation, on FERET dup1, FERET dup2, and Multi-PIE with RNDA and NDA. Our results indicate that LBP^{u2}_{8,2} is better than LBP^{u2}_{8,1} and LBP^{u2}_{8,3}, while LBP-DWT can further improve face recognition performance.

5.4 Further improvements and comparisons

The face recognition performance achieved in the previous subsection can be further improved by applying score-level fusion of the novel image representation, which encodes color information, with the Y image, which encodes gray-scale information. Specifically, the sum rule was used to fuse the similarity score of the LBP-DWT approach using the novel image representation with the similarity score of the approximation part of the LBP-DWT approach using the gray-scale image Y. Fig. 7 shows the obtained results, in which RNDA is used.
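Sum-rule fusion at the score level can be sketched as below. Min-max normalization of each matcher's scores is our assumption; the paper does not state which normalization precedes the sum.

```python
import numpy as np

def sum_rule_fusion(score_lists):
    """Sum-rule fusion: min-max normalize each matcher's similarity scores over
    the gallery, then add them. score_lists: list of equal-length score arrays."""
    fused = np.zeros(len(score_lists[0]))
    for s in score_lists:
        s = np.asarray(s, float)
        span = s.max() - s.min()
        if span > 0:
            fused += (s - s.min()) / span   # normalized similarity scores
    return fused
```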

Table 2 compares the performance of the proposed approach with two state-of-the-art methods: Gabor-based classification [15] and robust sparse coding (RSC) classification [36]. RSC models sparse coding as a sparsity-constrained robust regression problem and seeks a maximum likelihood estimation (MLE) solution of the sparse coding problem. Experiments on Multi-PIE show that RSC is more robust to outliers (e.g., occlusions and corruptions) than SRC [36]. Our experimental results on FERET and Multi-PIE indicate that the proposed face recognition framework significantly outperforms the Gabor-based and RSC approaches.

As the local face details extracted by LBP differ from those extracted by Gabor, these two types of local features may be deemed complementary to each other. Therefore, their combination has the potential to boost face recognition performance [20,27]. To test this hypothesis, we used the sum rule to fuse the classification outputs of the proposed method with the one obtained

Fig. 7: Improving face recognition performance (recognition rate, %) on FERET dup1, FERET dup2, and Multi-PIE by fusing the similarity scores of the LBP-DWT approach using the new image representation and those of the approximation part of the LBP-DWT approach using the gray-scale image Y. Note that RNDA is used in the experiments.

using Gabor features with RNDA for classification. Table 3 shows that additional performance improvements can be obtained.

We have also evaluated the effectiveness of RSC [36] using YCrQ. In particular, given a face image, each color component image of YCrQ is first resized to 32 × 32 and then normalized to zero mean and unit variance. The three normalized image vectors are then concatenated and used for RSC-based classification. Our experimental results, shown in Table 3, indicate that YCrQ helps RSC improve its performance remarkably, which implies that


Table 2: Performance improvements and comparisons with state-of-the-art methods.

Method                        FERET dup1   FERET dup2   Multi-PIE
The method in [15] on Y^a     51.38%       57.01%       66.43%
Gabor with NDA on Y           55.95%       60.08%       68.87%
Gabor with RNDA on Y          56.64%       61.84%       69.36%
RSC [36] on Y                 45.15%       27.63%       64.85%
RSC [36] on the novel image   58.03%       48.24%       71.26%
The proposed method           88.78%       84.21%       77.43%

^a Because Gabor features do not work well on color images for face recognition [20], we extracted Gabor features from the gray-scale image Y only.

Table 3: Recognition results obtained by fusing the proposed method with Gabor-based classification, and comparisons with RSC [36] using YCrQ and the gray-scale image Y.

Method                                  FERET dup1   FERET dup2   Multi-PIE
Fusing the proposed method with Gabor   89.19%       85.52%       82.11%
RSC [36] with concatenation of YCrQ     73.13%       75.00%       81.73%
RSC [36] on the gray image Y            45.15%       27.63%       64.85%

color information is indeed a simple yet effective feature for face recognition. Nevertheless, the proposed face recognition framework fused with Gabor features is better than RSC using YCrQ.
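The YCrQ input vector for RSC described above can be sketched as follows; the 2 × 2 block mean stands in for proper image resizing, which is our simplification.

```python
import numpy as np

def ycrq_vector(Y, Cr, Q):
    """Build the RSC input: each 64 x 64 plane is reduced to 32 x 32, z-scored
    to zero mean and unit variance, and the three vectors are concatenated."""
    def prep(img):
        small = img.reshape(32, 2, 32, 2).mean(axis=(1, 3))   # 2x2 block mean
        return ((small - small.mean()) / small.std()).ravel()
    return np.concatenate([prep(Y), prep(Cr), prep(Q)])
```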

6 Conclusions

Classical face recognition methods, such as Eigenfaces and Fisherfaces, have become inadequate for tackling complex face recognition tasks, as evidenced by our experiments on the FERET and Multi-PIE databases. Recently, efforts have focused on extracting multiple complementary feature sets, and several successful face recognition systems have been built on this idea [26,20,27]. In this paper, we proposed a novel face recognition approach using multiple feature fusion across the color, spatial, and frequency domains. We summarize the main contributions of this work below.

–First and foremost, a novel image representation scheme was derived from the uncorrelated YCrQ color space by a transformation that maps the 3-D color space to a 2-D feature space. Because uncorrelated color components naturally decompose face color images into multiple complementary image representations, this property is important for an effective image representation for face recognition. The new face image therefore inherently possesses enhanced discrimination power for face recognition over the gray image.

–Second, LBP-DWT features were designed to describe multi-scale local face details. The multi-resolution LBP features are extracted at three different resolutions, and DWT decomposes the high-dimensional LBP features into multiple scales across the spatial and frequency domains, hence enhancing the complementary characteristics of the features. RNDA, a variant of NDA, was then introduced to extract a set of discriminative features from the LBP-DWT features.

Extensive experiments on the FERET and Multi-PIE databases demonstrated the feasibility of the proposed framework. In particular, our experimental results show that: 1) the novel image representation scheme performs better than the gray-scale image Y; 2) the LBP-DWT features are more powerful than traditional LBP features; 3) RNDA can extract more powerful features from LBP-DWT to improve face recognition performance; and 4) the proposed face recognition approach significantly outperforms state-of-the-art methods.

Recent research has shown that color space normalization (CSN) [35] techniques can significantly reduce the correlation between color components, thus enhancing their discrimination power for face recognition. Some hybrid color spaces, such as XGB, YRB, and ZRG, can achieve excellent face recognition performance when CSN is applied [35]. For future research, we plan to employ CSN techniques to create more discriminating image representations for face recognition. In addition, we plan to investigate the effectiveness of the new image representation scheme for facial expression recognition and gender classification.


Acknowledgement

This work was partially supported by the National Plan for Science and Technology, King Saud University, Riyadh, Saudi Arabia, under project number 10-INF1044-02.

References

[1] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: Application to face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.

[2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, 1997.

[3] K.W. Bowyer, K. Chang, P.J. Flynn, A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition. Computer Vision and Image Understanding, vol. 101, no. 1, pp. 1-15, 2006.

[4] L. Chen, H. Liao, J. Lin, M. Ko, G. Yu, A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, 2000.

[5] J.Y. Choi, Y.M. Ro, K.N. Plataniotis, Boosting color feature selection for face recognition. IEEE Trans. on Image Processing, vol. 20, no. 5, pp. 1425-1434, 2011.

[6] J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional cortical filters. J. Optical Soc. Am., vol. 2, no. 7, pp. 1160-1169, 1985.

[7] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. (Academic Press, 1990).

[8] R. Gross, I. Matthews, J. Cohn, T. Kanade, S. Baker, Multi-PIE. Image and Vision Computing, vol. 28, pp. 807-813, 2010.

[9] Z. Gu, J. Yang, L. Zhang, Push-pull marginal discriminant analysis for feature extraction. Pattern Recognition Letters, vol. 31, no. 15, pp. 2345-2352, 2010.

[10] A.K. Jain, S. Pankanti, S. Prabhakar, L. Hong, A. Ross, Biometrics: A grand challenge. Proc. IAPR International Conference on Pattern Recognition (ICPR'04), pp. 935-942, 2004.

[11] X. Jiang, B. Mandal, A. Kot, Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383-394, 2008.

[12] X. Jiang, Linear subspace learning-based dimensionality reduction. IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 16-26, 2011.

[13] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, 1998.

[14] Z. Li, D. Lin, X. Tang, Nonparametric discriminant analysis for face recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 755-761, 2009.

[15] C. Liu, H. Wechsler, Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. on Image Processing, vol. 11, no. 4, pp. 467-476, 2002.

[16] C. Liu, Learning the uncorrelated, independent, and discriminating color spaces for face recognition. IEEE Trans. on Information Forensics and Security, vol. 3, no. 2, pp. 213-222, 2008.

[17] Z. Liu, C. Liu, Fusion of the complementary discrete cosine features in the YIQ color space for face recognition. Computer Vision and Image Understanding, vol. 111, no. 3, pp. 249-262, 2008.

[18] Z. Liu, C. Liu, Q. Tao, Learning-based image representation and method for face recognition. Proc. IEEE 3rd International Conference on Biometrics: Theory, Applications and Systems (BTAS'09), 2009.

[19] Z. Liu, Q. Tao, Face recognition using new image representations. Proc. 2009 International Joint Conference on Neural Networks (IJCNN'09), 2009.

[20] Z. Liu, C. Liu, Fusion of color, local spatial and global frequency information for face recognition. Pattern Recognition, vol. 43, no. 8, pp. 2882-2890, 2010.

[21] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, vol. 29, no. 1, pp. 51-59, 1996.

[22] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.

[23] P.J. Phillips, H. Moon, S. Rizvi, P. Rauss, The FERET evaluation methodology for face recognition algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090-1104, 2000.

[24] P.J. Phillips, P.J. Flynn, T. Scruggs, K.W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, W. Worek, Overview of the face recognition grand challenge. Proc. IEEE International Conference on Computer Vision and Pattern Recognition Workshop, 2005.

[25] T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression database. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1615-1618, 2003.

[26] Y. Su, S. Shan, X. Chen, W. Gao, Hierarchical ensemble of global and local classifiers for face recognition. IEEE Trans. on Image Processing, vol. 18, no. 8, pp. 1885-1896, 2009.

[27] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. on Image Processing, vol. 19, no. 6, pp. 1635-1650, 2010.

[28] M. Turk, A. Pentland, Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

[29] P. Viola, M. Jones, Robust real-time face detection. International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.

[30] S.-J. Wang, J. Yang, N. Zhang, C.-G. Zhou, Tensor discriminant color space for face recognition. IEEE Trans. on Image Processing, vol. 20, no. 9, pp. 2490-2501, 2011.

[31] K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, vol. 10, pp. 207-244, 2009.

[32] J. Wright, A. Yang, A. Ganesh, S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.


[33] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, S. Yan, Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, vol. 98, no. 6, pp. 1031-1044, 2010.

[34] J. Yang, C. Liu, Color image discriminant models and algorithms for face recognition. IEEE Trans. Neural Networks, vol. 19, no. 12, pp. 2088-2098, 2008.

[35] J. Yang, C. Liu, L. Zhang, Color space normalization: Enhancing the discriminating power of color spaces for face recognition. Pattern Recognition, vol. 43, pp. 1454-1466, 2010.

[36] M. Yang, L. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition. Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 625-632, 2011.

[37] W. Zhang, S. Shan, W. Gao, X. Chen, H. Zhang, Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. Proc. IEEE International Conference on Computer Vision (ICCV'05), pp. 786-791, 2005.

[38] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: A literature survey. ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, 2003.
