
Pattern Recognition 45 (2012) 1892–1898


Efficient linear discriminant analysis with locality preserving for face recognition

Xin Shu*, Yao Gao, Hongtao Lu

MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, Department of Computer Science and Engineering,

Shanghai Jiao Tong University, Shanghai 200240, China

Article info

Article history:

Received 1 June 2011

Received in revised form 26 October 2011

Accepted 18 November 2011
Available online 26 November 2011

Keywords:

Face recognition

Spectral regression

Linear discriminant analysis

Locality preserving projection

* Corresponding author. Tel.: +86 13918570942. E-mail address: [email protected] (X. Shu).

doi:10.1016/j.patcog.2011.11.012

Abstract

Linear discriminant analysis (LDA) is one of the most popular techniques for extracting features in face recognition. LDA captures the global geometric structure. However, local geometric structure has recently been shown to be effective for face recognition. In this paper, we propose a novel feature extraction algorithm which integrates both global and local geometric structures. We first cast LDA as a least squares problem based on spectral regression, and then a regularization technique is used to model the global and local geometric structures. Furthermore, we impose a penalty on the parameters to tackle the singularity problem and design an efficient model selection algorithm to choose the optimal tuning parameter, which balances the tradeoff between the global and local structures. Experimental results on four well-known face data sets show that the proposed integration framework is competitive with traditional face recognition algorithms, which use either the global or the local structure only.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Face recognition has attracted tremendous attention over the past few decades, and many well-known face recognition techniques have been developed [1–3]. One of the most successful and well-studied techniques is the appearance-based method [4,5]. When using an appearance-based method, one usually converts an m×n image matrix into an nm×1-dimensional vector. However, fast face recognition is impossible in this high dimensional vector space due to the curse of dimensionality. A common way to resolve this problem is dimensionality reduction. One of the most popular approaches for supervised dimensionality reduction is linear discriminant analysis (LDA) [6,7], which has been successfully applied to face recognition and other pattern recognition problems. LDA searches for an optimal projection that makes the sample points from different classes as far from each other as possible while keeping the sample points from the same class as close to each other as possible. The optimal projection can be obtained by solving an eigenvalue problem involving the scatter matrices of the training data. LDA captures only the global structure with maximum class discrimination and ignores the local geometrical structure.

Recently, local geometrical structure has received much attention in dimensionality reduction [8]. The local geometrical structure of data is usually characterized by a Laplacian matrix constructed from an adjacency graph of the data [9]. Laplacian eigenmaps [8] and locality preserving projection (LPP) [10] are nonlinear and linear dimensionality reduction algorithms, respectively, which can capture the nonlinear and linear local structures of a training data set based on the graph Laplacian. Many techniques combining global and local structures for face recognition have been proposed in the literature. In [11], discriminant locality preserving projections (DLPP) was proposed to improve the classification performance of LPP and applied to face recognition. An orthogonal discriminant locality preserving projections (ODLPP) method was proposed in [12] for face recognition. However, these methods usually suffer from the small sample size problem when dealing with high dimensional data. To address the singularity problem, Yang et al. [13] proposed null space discriminant locality preserving projections for face recognition. The main drawback of their approach is the expensive computational cost due to the singular value decomposition and eigenvalue decomposition in the null space.

Recent work has shown that both LDA and LPP can be reformulated in a regression framework based on spectral regression [14,15]. In this paper, we propose a novel linear discriminant analysis which integrates both global and local structures. We first cast LDA as a least squares problem based on spectral regression and use the graph Laplacian as a regularization term to model the local structure. The use of the graph Laplacian as a regularization term has been studied in [16,17] in the context of regression and SVM. A tuning parameter is introduced to balance the tradeoff between the global and local structures.


To further improve the performance of our model, we introduce a smooth regularizer to avoid the ill-posed problem that usually occurs in the regression framework. An efficient algorithm is then developed to estimate the optimal tuning parameter. Model selection is an essential issue in machine learning; for high dimensional data, the computational cost of model selection may be expensive. Our theoretical analysis shows that the computational cost for estimating the optimal tuning parameter only involves a small matrix.

The rest of the paper is organized as follows. In Section 2, we give a brief review of LDA and its extensions. Section 3 introduces our efficient locality preserving LDA. Extensive face recognition experiments are conducted in Section 4 to verify the efficiency of our method. We conclude in Section 5.

2. Brief review of LDA

2.1. Classical LDA

Suppose we have n samples x_1, x_2, \ldots, x_n belonging to c classes. The objective function of LDA is defined as follows:

a^* = \arg\max_a \frac{a^T S_b a}{a^T S_w a},

S_b = \sum_{k=1}^{c} n_k (m^{(k)} - m)(m^{(k)} - m)^T,

S_w = \sum_{k=1}^{c} \sum_{i=1}^{n_k} (x_i^{(k)} - m^{(k)})(x_i^{(k)} - m^{(k)})^T,   (1)

where m^{(k)} is the k-th class mean vector, m is the centroid of all the samples, x_i^{(k)} is the i-th sample in the k-th class, and n_k denotes the number of samples in the k-th class. The matrices S_b and S_w are often called the between-class scatter matrix and the within-class scatter matrix, respectively.

Let S_t = \sum_{i=1}^{n} (x_i - m)(x_i - m)^T; we have S_t = S_b + S_w [6], where S_t is called the total scatter matrix. The objective function (1) is then equivalent to

a^* = \arg\max_a \frac{a^T S_b a}{a^T S_t a}.   (2)

The solutions of (2) are the generalized eigenvectors corresponding to the largest eigenvalues of

S_b a = \lambda S_t a.   (3)

They can be obtained by applying an eigen-decomposition to the matrix S_t^{-1} S_b, provided S_t is nonsingular. There are at most c-1 eigenvectors corresponding to nonzero eigenvalues, since the rank of S_b is bounded from above by c-1. Thus, the dimension of the subspace obtained by LDA is at most c-1.
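As a reference point before discussing the singularity issue, the following is a minimal NumPy/SciPy sketch of classical LDA solved through the generalized eigenproblem (3). The function name and the small ridge term reg (added only so the snippet also runs when S_t is singular) are our own illustrative choices, not part of the paper.

import numpy as np
from scipy.linalg import eigh

def classical_lda(X, labels, reg=1e-6):
    """Classical LDA via the generalized eigenproblem S_b a = lambda S_t a.

    X      : (m, n) data matrix, one sample per column.
    labels : (n,) integer class labels.
    reg    : small ridge added to S_t (illustrative regularization so the
             sketch also works in the small-sample case).
    Returns the (m, c-1) projection matrix.
    """
    m, n = X.shape
    mean = X.mean(axis=1, keepdims=True)
    St = (X - mean) @ (X - mean).T                      # total scatter matrix
    Sb = np.zeros((m, m))
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        dk = Xk.mean(axis=1, keepdims=True) - mean
        Sb += Xk.shape[1] * (dk @ dk.T)                  # between-class scatter
    # Generalized eigenproblem; keep the c-1 directions with largest eigenvalues.
    evals, evecs = eigh(Sb, St + reg * np.eye(m))
    c = len(np.unique(labels))
    order = np.argsort(evals)[::-1][: c - 1]
    return evecs[:, order]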

To get a stable solution of classical LDA, the total scatter matrix S_t is required to be nonsingular, which does not hold when the number of features is larger than the number of samples. This is known as the small sample size problem. Several techniques, including two-stage PCA+LDA [5,18], GSVD+LDA [19], regularized discriminant analysis (RDA) [20,21] and pseudo-inverse LDA [22], have been proposed to address the singularity problem.

However, all the above extensions of LDA involve the eigenvalue decomposition or singular value decomposition of the data matrix, which is computationally expensive. Another approach to dealing with the singularity is to reformulate LDA as a least squares problem and then use a penalty term or regularizer to encourage certain types of solutions. We briefly introduce this approach in the following.

2.2. Spectral regression LDA

In this section, we show that classical LDA can be reformulated in the graph embedding framework [14,15].

Let \bar{x}_i = x_i - m denote the centered data point and \bar{X}^{(k)} = [\bar{x}_1^{(k)}, \bar{x}_2^{(k)}, \ldots, \bar{x}_{n_k}^{(k)}] denote the centered data matrix of the k-th class. We have

S_b = \sum_{k=1}^{c} n_k (m^{(k)} - m)(m^{(k)} - m)^T
    = \sum_{k=1}^{c} n_k \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} (x_i^{(k)} - m) \Big) \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} (x_i^{(k)} - m) \Big)^T
    = \sum_{k=1}^{c} \frac{1}{n_k} \Big( \sum_{i=1}^{n_k} \bar{x}_i^{(k)} \Big) \Big( \sum_{i=1}^{n_k} (\bar{x}_i^{(k)})^T \Big)
    = \sum_{k=1}^{c} \bar{X}^{(k)} W^{(k)} (\bar{X}^{(k)})^T,

where W^{(k)} is an n_k \times n_k matrix with all elements equal to 1/n_k.

Let \bar{X} = [\bar{X}^{(1)}, \ldots, \bar{X}^{(c)}] be the centered sample matrix, and define an n \times n matrix W as follows:

W = \begin{bmatrix} W^{(1)} & 0 & \cdots & 0 \\ 0 & W^{(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W^{(c)} \end{bmatrix},   (4)

we have

S_b = \sum_{k=1}^{c} \bar{X}^{(k)} W^{(k)} (\bar{X}^{(k)})^T = \bar{X} W \bar{X}^T.   (5)

Since S_t = \bar{X}\bar{X}^T, we have

S_w = S_t - S_b = \bar{X}(I - W)\bar{X}^T = \bar{X} L \bar{X}^T.   (6)

The matrix W can be viewed as the edge weight matrix of a graph G, where W_{ij} is the weight of the edge joining vertices i and j; L = I - W is usually called the graph Laplacian [9].

From (5) and (6), we can reformulate the generalized eigen-problem (3) as

\bar{X} W \bar{X}^T a = \lambda \bar{X}\bar{X}^T a.   (7)

The following theorem, proposed in [15], provides an efficient way to solve the LDA eigen-problem (7):

Theorem. Let \bar{y} be an eigenvector of the eigen-problem

W \bar{y} = \lambda \bar{y}   (8)

with eigenvalue \lambda. If \bar{X}^T a = \bar{y}, then a is an eigenvector of the eigen-problem (7) with the same eigenvalue \lambda.

Indeed, if \bar{X}^T a = \bar{y} and W\bar{y} = \lambda\bar{y}, then \bar{X} W \bar{X}^T a = \bar{X} W \bar{y} = \lambda \bar{X}\bar{y} = \lambda \bar{X}\bar{X}^T a. Since the eigen-problem (8) can be readily solved [15], the above theorem shows that an LDA solution a can be obtained by solving the following linear system:

\bar{X}^T a = \bar{y},   (9)

where \bar{y} is an eigenvector of W. In practice, such an a may not exist exactly. A suggested way to find an optimal a is to use the following minimization criterion:

a^* = \arg\min_a \sum_{i=1}^{n} (a^T \bar{x}_i - \bar{y}_i)^2,   (10)

where \bar{x}_i is the i-th column of \bar{X} and \bar{y}_i is the i-th element of \bar{y}. When the number of features is larger than the number of samples, the minimization problem (10) is ill-posed. The most popular way to deal with an ill-posed problem is to impose a penalty on the parameter a, which leads to the following regularized


least squares problem:

a^* = \arg\min_a \Big\{ \sum_{i=1}^{n} (a^T \bar{x}_i - \bar{y}_i)^2 + \lambda \|a\|_2^2 \Big\}.   (11)

Regularized least squares is also called ridge regression and is well studied in statistics [23]. This form of LDA is called spectral regression discriminant analysis and has been studied in detail in [24].
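To make the two-step procedure above concrete, the following Python sketch (NumPy only) solves the ridge-regression problem (11) for all classes at once. The function name srda_fit is our own, and we use centered class-indicator vectors as the responses, which is a common simplification: they span the nontrivial eigenvalue-1 eigenspace of the matrix W in (8).

import numpy as np

def srda_fit(X, labels, lam=0.01):
    """Spectral-regression LDA sketch.

    X      : (m, n) data matrix, one sample per column.
    labels : (n,) integer class labels in {0, ..., c-1}.
    lam    : ridge penalty (the lambda in Eq. (11)).
    Returns the (m, c) matrix of projection vectors.
    """
    m, n = X.shape
    classes = np.unique(labels)

    # Center the data (x_bar_i = x_i - m).
    Xc = X - X.mean(axis=1, keepdims=True)

    # Responses: centered class-indicator vectors, one column per class.
    Y = np.zeros((n, len(classes)))
    for j, k in enumerate(classes):
        Y[labels == k, j] = 1.0
    Y -= Y.mean(axis=0, keepdims=True)

    # Ridge regression (Eq. (11)): solve (Xc Xc^T + lam I) a = Xc y for each column y.
    G = Xc @ Xc.T + lam * np.eye(m)
    return np.linalg.solve(G, Xc @ Y)

# Toy usage: 40 random 100-dimensional samples from 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
labels = np.repeat(np.arange(4), 10)
A = srda_fit(X, labels)
Z = A.T @ (X - X.mean(axis=1, keepdims=True))   # projected data, shape (4, 40)
print(Z.shape)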

3. Locality preserving LDA

Classical LDA and its extensions consider the global structure while de-emphasizing the local geometrical structure. In this section, we develop an efficient LDA which also incorporates the local geometrical structure.

3.1. Local structure modeling

Given a data set \{x_i\}_{i=1}^{n}, we can construct a graph G with n vertices, where each vertex represents a data point. Let W be a symmetric matrix with W_{ij} representing the weight of the edge joining vertices i and j. G and W can be used to model the geometric structure of the data set. One commonly used weight W_{ij} between two nodes is defined as follows:

W_{ij} = \begin{cases} \exp\!\Big(-\dfrac{\|x_i - x_j\|^2}{\sigma}\Big), & x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i), \\ 0, & \text{otherwise}, \end{cases}   (12)

where x_i \in N_k(x_j) means that x_i is among the k nearest neighbors of x_j [8]. Let z_i be a low-dimensional representation of the data point x_i. In order to preserve the local structure, one aims to minimize the following objective function [8]:

\sum_{i,j} \|z_i - z_j\|^2 W_{ij}.   (13)

Intuitively, if vertices i and j are close, then z_i and z_j are close as well; thus, the local structure is preserved.

Let L = D - W be the Laplacian matrix, where D = \mathrm{diag}(D_{11}, D_{22}, \ldots, D_{nn}) is a diagonal matrix with entries D_{ii} = \sum_{j=1}^{n} W_{ij}. It is easy to verify that

\frac{1}{2} \sum_{i,j} \|z_i - z_j\|^2 W_{ij} = \mathrm{tr}(Z L Z^T),   (14)

where Z = [z_1, z_2, \ldots, z_n].
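A minimal sketch of this construction is given below, assuming the heat-kernel weights of (12) with a symmetrized k-nearest-neighbor rule; the function name and parameter defaults are illustrative only.

import numpy as np

def knn_heat_kernel_graph(X, k=5, sigma=1.0):
    """Build the symmetric k-NN heat-kernel weight matrix of Eq. (12)
    and the graph Laplacian L = D - W of Eq. (14).

    X : (m, n) data matrix, one sample per column.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns of X.
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)               # exclude self-neighbors

    # k nearest neighbors of each point.
    nn = np.argsort(d2, axis=1)[:, :k]

    W = np.zeros((n, n))
    for i in range(n):
        for j in nn[i]:
            w = np.exp(-d2[i, j] / sigma)
            W[i, j] = w                        # x_j in N_k(x_i)
            W[j, i] = w                        # ... or x_i in N_k(x_j): keep W symmetric

    D = np.diag(W.sum(axis=1))
    L = D - W
    return W, L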

3.2. Locality preserving LDA

The local geometrical structure can thus be modeled by the regularization term defined in (14). Our global and local structure preserving LDA computes an optimal transformation matrix A^* from the following optimization problem:

A^* = \arg\min_A \big\{ \|\bar{X}^T A - Y\|_F^2 + \lambda\, \mathrm{tr}(A^T \bar{X} L \bar{X}^T A) + (1 - \lambda) \|A\|_F^2 \big\},   (15)

where \|\cdot\|_F denotes the Frobenius norm [25] of a matrix, \lambda \in (0, 1) is a tuning parameter and Y is a matrix whose columns are eigenvectors of (8).

Setting the derivative of the objective in (15) with respect to A to zero, we have

\bar{X}\bar{X}^T A + (1 - \lambda) A + \lambda \bar{X} L \bar{X}^T A = \bar{X} Y.   (16)

Since the matrix \bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1 - \lambda) I is nonsingular (see the proof in Appendix A), the optimal solution can be readily computed as

A^* = (\bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1 - \lambda) I)^{-1} \bar{X} Y,   (17)

where B^{-1} denotes the inverse of B.
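The closed-form solution (17) is straightforward to implement. The sketch below (illustrative names) assumes a centered data matrix, a response matrix Y such as the centered class indicators from the earlier sketch, and the graph Laplacian from Section 3.1; it solves the linear system rather than forming the explicit inverse.

import numpy as np

def loclda_fit(Xc, Y, L, lam=0.5):
    """Closed-form LocLDA transformation of Eq. (17).

    Xc  : (m, n) centered data matrix.
    Y   : (n, c) response matrix (eigenvectors of Eq. (8)).
    L   : (n, n) graph Laplacian from Section 3.1.
    lam : tuning parameter, 0 < lam < 1.
    """
    m = Xc.shape[0]
    B = Xc @ Xc.T + lam * (Xc @ L @ Xc.T) + (1.0 - lam) * np.eye(m)
    # Solve B A = Xc Y instead of explicitly inverting B.
    return np.linalg.solve(B, Xc @ Y)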

3.3. Efficient model selection

The tuning parameter \lambda plays an important role in our model: it balances the tradeoff between the global structure and the local structure. Therefore, the choice of \lambda is a critical issue for our algorithm. We usually use cross-validation over a large set of candidate values to find an optimal \lambda. However, when the data dimension is high, the matrices \bar{X}\bar{X}^T and \bar{X} L \bar{X}^T are large, and the computational cost of cross-validation becomes expensive. In the following, we design an efficient model selection method.

Assume \bar{X} \in \mathbb{R}^{m \times n}. It follows from (17) that

A^* = [\bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1-\lambda) I]^{-1} \bar{X} Y
    = \frac{1}{1-\lambda} \Big[ \frac{1}{1-\lambda} \bar{X}\bar{X}^T + \frac{\lambda}{1-\lambda} \bar{X} L \bar{X}^T + I \Big]^{-1} \bar{X} Y.

Let \bar{\lambda} = \lambda/(1-\lambda) and let D_{1-\lambda} be a diagonal matrix whose entries are all 1/(1-\lambda). By using the following matrix identity [26]

(I + AB)^{-1} = I - A(I + BA)^{-1} B,   (18)

we have

[I + \bar{X}(D_{1-\lambda} + \bar{\lambda} L)\bar{X}^T]^{-1} = I - \bar{X}\Lambda (I + \bar{X}^T \bar{X} \Lambda)^{-1} \bar{X}^T,   (19)

where \Lambda = D_{1-\lambda} + \bar{\lambda} L. Therefore

A^* = \frac{1}{1-\lambda} \bar{X} Y - \frac{1}{1-\lambda} \bar{X}\Lambda (I + \bar{X}^T \bar{X} \Lambda)^{-1} \bar{X}^T \bar{X} Y,   (20)

where I is an identity matrix of appropriate size. When the number of features (m) is much larger than the sample size (n), the matrix

I + \bar{X}^T \bar{X} (D_{1-\lambda} + \bar{\lambda} L) \in \mathbb{R}^{n \times n}

is much smaller than the matrix

\bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1-\lambda) I \in \mathbb{R}^{m \times m},

which dramatically reduces the computational cost and the memory requirement. When the sample size is much larger than the number of features, the original formulation (17) is more efficient.
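The following sketch implements the small-matrix form (20) under the same assumptions as the previous sketch; for m >> n it replaces the m×m solve of (17) with an n×n one and should agree with it up to floating-point error.

import numpy as np

def loclda_fit_small(Xc, Y, L, lam=0.5):
    """Equivalent LocLDA solution via Eq. (20): only an (n x n) system
    is solved, which pays off when the feature dimension m >> n.
    """
    n = Xc.shape[1]
    lam_bar = lam / (1.0 - lam)
    # Lambda = D_{1-lam} + lam_bar * L, with D_{1-lam} = I / (1 - lam).
    Lam = np.eye(n) / (1.0 - lam) + lam_bar * L
    S = np.eye(n) + (Xc.T @ Xc) @ Lam              # the small n x n matrix of Eq. (19)
    XY = Xc @ Y
    return (XY - Xc @ (Lam @ np.linalg.solve(S, Xc.T @ XY))) / (1.0 - lam)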

3.4. Practical consideration and efficiency

From the above discussion, the main computational cost of our algorithm is the inversion of a k \times k (k = \min(m, n)) matrix, which requires O(k^3) operations via Gauss–Jordan elimination. However, when we use (17) or (20) to compute the optimal A^*, it is not necessary to compute the inverse in (17) or (20) explicitly. We show here how the solution can be obtained. Suppose k = m, i.e., m < n. Let A = [a_1, a_2, \ldots, a_c], B = \bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1-\lambda) I and D = \bar{X} Y = [\bar{X} y_1, \bar{X} y_2, \ldots, \bar{X} y_c]. Then Eq. (16) can be decomposed into the following c linear systems:

B a_i = d_i,   i = 1, 2, \ldots, c.   (21)

Several efficient iterative algorithms have been developed to solve such large scale linear systems. In this paper, we use LSQR, an iterative algorithm designed to solve large scale sparse linear equations and least squares problems [27]. In each iteration, LSQR needs to compute two matrix–vector products [28]. If LSQR stops after j iterations, the main cost of LSQR for solving (21) is jc(2m^2 + 8m), which can be simplified to O(m^2 + m). In our experiments, LSQR converges after about 30 iterations. If m > n, the same procedure can be applied to (20) to find the optimal A.
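As an illustration of this matrix-free strategy, the sketch below solves the c systems of (21) with SciPy's LSQR routine, applying B only through matrix–vector products so that the m×m matrix B is never formed; the function name and the iteration limit are our own choices.

import numpy as np
from scipy.sparse.linalg import lsqr, LinearOperator

def solve_columns_lsqr(Xc, Y, L, lam=0.5, iter_lim=50):
    """Solve the c systems B a_i = d_i of Eq. (21) with LSQR.

    B = Xc Xc^T + lam * Xc L Xc^T + (1 - lam) I is applied only through
    matrix-vector products, so it is never formed explicitly.
    """
    m, n = Xc.shape
    lamL = lam * L

    def matvec(v):
        t = Xc.T @ v                                   # X^T v, an n-vector
        return Xc @ t + Xc @ (lamL @ t) + (1.0 - lam) * v

    # B is symmetric, so rmatvec can reuse matvec.
    B_op = LinearOperator((m, m), matvec=matvec, rmatvec=matvec, dtype=np.float64)
    D = Xc @ Y                                         # right-hand sides d_i = Xc y_i
    cols = [lsqr(B_op, D[:, i], iter_lim=iter_lim)[0] for i in range(D.shape[1])]
    return np.column_stack(cols)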

In order to evaluate the efficiency of the algorithm, we apply LocLDA to the PIE face database, which has the largest number of samples among all data sets in our experiments. Table 1 shows the computational time of LocLDA for different sizes of the PIE face database, ranging from 340 to 3400 samples. We can observe from Table 1 that the cost of LocLDA grows slowly as the database size increases. Because of the efficiency of LocLDA, it is practical to choose the optimal value of \lambda even for a large data set. In the following experiments, we use cross-validation to select the optimal value of \lambda.

Table 1. Computational time (in seconds) of LocLDA for different sizes of the PIE face database.

Size   340    680    1360   2040   2720   3400
Time   0.45   1.33   2.80   3.43   4.89   6.48

4. Experiments

In this section, we evaluate the performance of our proposed algorithm for face recognition. The face recognition task is handled as a multi-class classification problem. We first map each test image onto a low-dimensional subspace via a transformation matrix learned from the training data, and then classify the test image by the nearest neighbor criterion.
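A minimal sketch of this recognition pipeline (illustrative names; it assumes the transformation matrix A has already been learned by one of the methods above) is:

import numpy as np

def recognize(A, X_train, y_train, X_test):
    """Project gallery and probe images with a learned transformation A and
    label each probe by its nearest training neighbor (1-NN)."""
    mean = X_train.mean(axis=1, keepdims=True)
    Z_train = A.T @ (X_train - mean)       # low-dimensional training features
    Z_test = A.T @ (X_test - mean)         # probes centered with the training mean
    preds = []
    for z in Z_test.T:
        dists = np.linalg.norm(Z_train - z[:, None], axis=0)
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)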

4.1. Data sets and compared algorithms

Four publicly available face data sets are used in our experimental study: the ORL face database, the Yale face database, the extended Yale face database B and the CMU PIE face database. The important statistics of the four face data sets are listed below:


• The ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html) consists of a total of 400 pictures of 40 people, with 10 images per person. For some subjects, the images are taken at different times. The facial expressions (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses) also vary. The images are taken with a tolerance for some tilting and rotation of the face of up to 20°. Moreover, there is also some variation in scale of up to about 10%.

• The Yale face database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

• The extended Yale face database includes the Yale face database B [29] and the extended Yale face database B [30]. The Yale face database B contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses × 64 illumination conditions). For every subject in a particular pose, an image with ambient (background) illumination was also captured. The extended Yale face database B contains 16,128 images of 28 human subjects under nine poses and 64 illumination conditions. The data format of this database is the same as that of the Yale face database B. For simplicity, a subset called the extended Yale face database B (YaleB) was collected from the two databases; it contains 2414 face images of 38 subjects.

• The CMU PIE face database [31] contains 41,368 face images of 68 subjects. The facial images of each subject were captured under 13 different poses, 43 different illumination conditions, and with four different expressions. We choose the five near-frontal poses (C05, C07, C09, C27, C29) and use all 11,544 images under different illuminations and expressions, where each person has 170 images except for a few bad images.

Fig. 1 shows some samples from the four face data sets, where four persons are randomly chosen from each data set and each subject has six sample images. All the face images used in our experiments are manually aligned, cropped and re-sized to a spatial resolution of 32×32, with 256 gray levels per pixel. The features (pixel values) are then scaled to [0,1]. For the vector-based approaches, a face image is represented as a 1024-dimensional vector. For each data set, we randomly partition the data into a training set and a test set (l images per person for training and the remaining images for testing). A detailed description of the partitions is presented in Table 2. The partition procedure is repeated 50 times to give a better estimate of the recognition accuracy.

Fig. 1. Some samples randomly chosen from the four data sets (ORL, Yale, extended Yale B and PIE).

Table 2. Random partition of the four data sets with different training numbers.

Database   Different numbers for training (l)   Classes (c)
ORL        2, 3, 4, 5, 6                        40
Yale       2, 3, 4, 5, 6                        15
Yale B     10, 20, 30, 40, 50                   38
PIE        30, 60, 90, 120, 130                 68

The five algorithms used in our experiments are summarized below:

1. LocLDA, our locality preserving LDA proposed and analyzed in Section 3.
2. Locality preserving projection (LPP) [10], which preserves the neighborhood structure and can be seen as an alternative to PCA.
3. Linear discriminant analysis (LDA) [6,7]; an efficient implementation named Fisherfaces has been proposed in [5].
4. Uncorrelated linear discriminant analysis (ULDA), first proposed in [32,33]; an efficient algorithm to compute the ULDA solution was proposed in [34].
5. CLPP, contourlet-based locality preserving projection for face recognition [35,36].

4.2. Results

The recognition accuracies over the 50 random splits of the four face image data sets obtained with LocLDA, LPP, FisherFace and ULDA are presented in Fig. 2. We can observe from Fig. 2 that LPP outperforms LDA and ULDA slightly on the Yale face database, while LDA and ULDA outperform LPP on the ORL face database, the extended Yale face database B and the PIE face database. This implies that the relative importance of the global and local structures depends on the specific data set. For example, for the Yale face database the local structure may contain more important information, while for the other three face databases the local structure may be less important. However, in all cases, our method outperforms LPP, LDA and ULDA. This implies that the global and local structures can be complementary to each other, and therefore incorporating both may be beneficial.

We then compare the performance of LocLDA with LPP, FisherFace, ULDA and CLPP. We report the mean accuracy and standard deviation over the 50 different partitions for each data set with different training numbers in Tables 3–6 (the numbers in parentheses denote the reduced dimension used for LPP). It can be concluded that our LocLDA achieves the best overall performance. We can observe from Table 4 that when the training data size is relatively small, e.g., l = 2, the recognition accuracy of LocLDA is relatively poor compared with the other methods. This is due to the fact that when the training data size is small it is difficult to capture much global or local geometrical structure information, so integrating both structures may yield poor performance. It is interesting to note that all five algorithms achieve comparable performance on the PIE face database when the training size is relatively large. This implies that in certain cases with a relatively large training data size, the global and local structures may capture similar information, and integrating both structures does not help.

Fig. 2. Comparison of LocLDA, LPP, FisherFace and ULDA in classification accuracy on the four face data sets: ORL (training size = 5), Yale (training size = 5), extended Yale face database B (training size = 30) and PIE (training size = 30). The x-axis denotes the 50 different partitions into training and test sets; the y-axis is the classification accuracy for each partition.

Table 3. Recognition accuracy on the ORL face database (mean ± std-dev, %).

Method       2 Train          3 Train          4 Train          5 Train           6 Train
LocLDA       82.8 ± 3.1       90.8 ± 2.0       94.4 ± 1.5       96.5 ± 1.3        97.1 ± 1.7
LPP          74.2 ± 3.0 (48)  83.5 ± 2.4 (74)  88.1 ± 2.0 (95)  90.7 ± 1.8 (118)  92.5 ± 2.2 (136)
FisherFace   72.5 ± 3.3       86.3 ± 2.4       91.2 ± 2.0       94.0 ± 1.5        95.2 ± 1.8
ULDA         79.9 ± 2.6       87.2 ± 2.1       91.2 ± 1.8       93.6 ± 1.5        94.7 ± 1.9
CLPP         78.4 ± 2.7       85.4 ± 1.7       90.9 ± 1.5       93.5 ± 2.1        94.9 ± 1.8


Table 4. Recognition accuracy on the Yale face database (mean ± std-dev, %).

Method       2 Train          3 Train          4 Train          5 Train          6 Train
LocLDA       55.3 ± 4.3       67.0 ± 3.7       73.8 ± 4.3       78.1 ± 3.8       81.7 ± 3.8
LPP          56.2 ± 3.8 (18)  66.5 ± 3.9 (27)  72.2 ± 4.6 (37)  76.6 ± 3.2 (46)  79.7 ± 3.9 (55)
FisherFace   41.1 ± 4.6       60.7 ± 4.1       68.6 ± 4.7       74.1 ± 4.3       77.5 ± 4.1
ULDA         56.2 ± 4.2       66.4 ± 3.8       72.6 ± 4.5       76.2 ± 3.5       79.3 ± 4.1
CLPP         50.8 ± 5.0       60.7 ± 3.5       70.5 ± 3.6       71.9 ± 5.5       77.6 ± 3.4

Table 5. Recognition accuracy on the extended Yale face database B (mean ± std-dev, %).

Method       10 Train          20 Train           30 Train          40 Train          50 Train
LocLDA       79.1 ± 1.5        83.6 ± 1.8         85.7 ± 1.1        94.3 ± 0.9        96.7 ± 0.7
LPP          74.0 ± 4.3 (183)  64.5 ± 10.3 (302)  74.5 ± 1.8 (387)  91.8 ± 1.0 (444)  95.0 ± 0.9 (498)
FisherFace   78.3 ± 1.2        85.9 ± 0.8         81.3 ± 1.6        93.8 ± 0.8        96.4 ± 0.9
ULDA         68.8 ± 9.9        57.7 ± 12.5        79.4 ± 1.6        93.4 ± 0.8        96.1 ± 0.9
CLPP         65.7 ± 1.4        78.0 ± 0.8         83.3 ± 0.9        93.3 ± 1.4        95.9 ± 0.5

Table 6. Recognition accuracy on the PIE face database (mean ± std-dev, %).

Method       30 Train          60 Train          90 Train          120 Train         130 Train
LocLDA       90.1 ± 0.4        94.8 ± 0.3        95.9 ± 0.2        96.4 ± 0.3        96.5 ± 0.3
LPP          85.3 ± 0.7 (473)  93.6 ± 0.4 (605)  95.2 ± 0.2 (646)  95.9 ± 0.3 (655)  96.0 ± 0.3 (658)
FisherFace   89.1 ± 0.4        94.6 ± 0.3        95.7 ± 0.2        96.3 ± 0.3        96.4 ± 0.3
ULDA         88.0 ± 0.4        93.6 ± 0.3        94.8 ± 0.2        95.4 ± 0.3        95.6 ± 0.3
CLPP         88.3 ± 0.3        93.9 ± 0.3        95.0 ± 0.1        95.7 ± 0.3        95.8 ± 0.3


5. Conclusion and future work

We have proposed in this paper a novel linear discriminant analysis, called LocLDA, which integrates both global and local structures for dimensionality reduction and classification. By casting LDA as a least squares problem based on spectral regression, the local structure is modeled by a graph Laplacian regularization term. An efficient model selection algorithm is also presented to estimate the optimal tuning parameter. Experimental results on four benchmark face data sets have shown that our method is competitive with LPP, FisherFace and ULDA in terms of recognition accuracy.

In this paper, we focus only on supervised learning, i.e., all the training samples are labeled. Semi-supervised learning has received great attention in recent years because of application domains in which unlabeled data are plentiful. One direction of our future work is to extend our method to the semi-supervised learning framework. The current work also focuses on linear dimensionality reduction, which may be less effective when the data are only weakly linearly separable. This motivates us to extend our method to deal with nonlinear data using kernel tricks.

Acknowledgments

We are very grateful to the anonymous referees. This work was supported by the 863 Program (No. 2008AA02Z310), the NSFC (No. 60873133) and the Shanghai Committee of Science and Technology (Nos. 08411951200, 08JG05002).

Appendix A

Let B = \bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T. Since L is a graph Laplacian, it is symmetric and positive semi-definite [37]. From the symmetric Schur decomposition [25], we have L = Q S Q^T, where S = \mathrm{diag}(S_{11}, S_{22}, \ldots, S_{nn}) is a diagonal matrix with S_{ii} \geq 0. Let \bar{L} = Q S^{1/2}; then L = \bar{L}\bar{L}^T and hence B = \bar{X}\bar{X}^T + \lambda (\bar{X}\bar{L})(\bar{X}\bar{L})^T. It follows that B is symmetric and positive semi-definite, so B can be written as B = B_1 B_1^T. Let B_1 = U \Sigma V^T be the SVD of B_1. We have

B_1 B_1^T + (1-\lambda) I = U (\Sigma^2 + (1-\lambda) I) U^T.

Hence

|\bar{X}\bar{X}^T + \lambda \bar{X} L \bar{X}^T + (1-\lambda) I| = |U (\Sigma^2 + (1-\lambda) I) U^T| = |\Sigma^2 + (1-\lambda) I|,

which is nonzero since 1 - \lambda > 0 (recall \lambda \in (0, 1)). Therefore the matrix is nonsingular.

References

[1] R. Chellappa, C. Wilson, S. Sirohey, et al., Human and machine recognition of faces: a survey, Proceedings of the IEEE 83 (1995) 705–740.
[2] A. Samal, P. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey, Pattern Recognition 25 (1992) 65–77.
[3] B. Fasel, J. Luettin, Automatic facial expression analysis: a survey, Pattern Recognition 36 (2003) 259–275.
[4] M. Turk, A. Pentland, Face recognition using eigenfaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
[5] P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 711–720.
[6] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[7] R. Duda, P. Hart, D. Stork, Pattern Classification, Wiley, 2001.
[8] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural Information Processing Systems, vol. 1, 2002, pp. 585–592.
[9] F. Chung, Spectral Graph Theory, American Mathematical Society, 1997.
[10] X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems (NIPS), 2003.
[11] W. Yu, X. Teng, C. Liu, Face recognition using discriminant locality preserving projections, Image and Vision Computing 24 (2006) 239–248.


[12] L. Zhu, S. Zhu, Face recognition based on orthogonal discriminant locality preserving projections, Neurocomputing 70 (2007) 1543–1546.
[13] L. Yang, W. Gong, X. Gu, W. Li, Y. Liang, Null space discriminant locality preserving projections for face recognition, Neurocomputing 71 (2008) 3644–3649.
[14] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005) 328–340.
[15] D. Cai, X. He, J. Han, Spectral regression: a unified approach for sparse subspace learning, in: Proceedings of the International Conference on Data Mining (ICDM'07), 2007.
[16] M. Belkin, P. Niyogi, V. Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, The Journal of Machine Learning Research 7 (2006) 2399–2434.
[17] J. Chen, J. Ye, Q. Li, Integrating global and local structures: a least squares framework for dimensionality reduction, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1–8.
[18] D. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (2002) 831–836.
[19] P. Howland, H. Park, Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004) 995–1006.
[20] J. Friedman, Regularized discriminant analysis, Journal of the American Statistical Association 84 (1989) 165–175.
[21] T. Hastie, A. Buja, R. Tibshirani, Penalized discriminant analysis, The Annals of Statistics 23 (1995) 73–102.
[22] S. Raudys, R. Duin, Expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix, Pattern Recognition Letters 19 (1998) 385–392.
[23] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2001.
[24] D. Cai, X. He, J. Han, SRDA: an efficient algorithm for large-scale discriminant analysis, IEEE Transactions on Knowledge and Data Engineering 20 (2008) 1–12.
[25] G. Golub, C. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1996.
[26] K. Petersen, M. Pedersen, The Matrix Cookbook, http://matrixcookbook.com, 2008.
[27] C. Paige, M. Saunders, LSQR: an algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software (TOMS) 8 (1982) 43–71.
[28] C. Paige, M. Saunders, Algorithm 583 LSQR: sparse linear equations and least squares problems, ACM Transactions on Mathematical Software (TOMS) 8 (1982) 195–209.
[29] A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 643–660.
[30] K. Lee, J. Ho, D. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 684–698.
[31] T. Sim, S. Baker, M. Bsat, The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces, Technical Report CMU-RI-TR-01-02, Robotics Institute, Pittsburgh, PA, 2001.
[32] Z. Jin, J. Yang, Z. Hu, Z. Lou, Face recognition based on the uncorrelated discriminant transformation, Pattern Recognition 34 (2001) 1405–1416.
[33] Z. Jin, J. Yang, Z. Tang, Z. Hu, A theorem on the uncorrelated optimal discriminant vectors, Pattern Recognition 34 (2001) 2041–2047.
[34] J. Ye, B. Yu, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, Journal of Machine Learning Research 6 (2005) 483–502.
[35] M. Do, M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Transactions on Image Processing 14 (2005) 2091–2106.
[36] Y. Tan, Y. Zhao, X. Ma, Contourlet-based feature extraction with LPP for face recognition, in: International Conference on Multimedia and Signal Processing (CMSP), 2011.
[37] U. Von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (2007) 395–416.

Xin Shu received his M.S. degree in applied mathematics from Nanjing Agriculture University in 2009. He is now a Ph.D. candidate in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests include machine learning and computer vision.

Yao Gao received his B.S. degree from Harbin Institute of Technology at Weihai, China, in 2009. He is now pursuing a master's degree at Shanghai Jiao Tong University. His research interests include computer vision and image processing.

Hongtao Lu obtained his Ph.D. degree in Electronic Engineering from Southeast University, Nanjing, in 1997. After graduation he became a postdoctoral fellow in the Department of Computer Science, Fudan University, Shanghai, China, where he spent two years. In 1999, he joined the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, where he is now a professor. His research interests include machine learning, computer vision and pattern recognition, and EEG processing. He has published more than 60 papers in international journals, such as IEEE Transactions and Neural Networks, and in international conferences. His papers have obtained more than 400 citations by other researchers.