
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 36, NO. 4, AUGUST 2006 873

A Novel Incremental Principal Component Analysis and Its Application for Face Recognition

Haitao Zhao, Pong Chi Yuen, Member, IEEE, and James T. Kwok, Member, IEEE

Abstract—Principal component analysis (PCA) has been proven to be an efficient method in pattern recognition and image analysis. Recently, PCA has been extensively employed for face-recognition algorithms, such as eigenface and fisherface. Encouraging results have been reported and discussed in the literature. Many PCA-based face-recognition systems have also been developed in the last decade. However, existing PCA-based face-recognition systems are hard to scale up because of the computational cost and memory-requirement burden. To overcome this limitation, an incremental approach is usually adopted. Incremental PCA (IPCA) methods have been studied for many years in the machine-learning community. The major limitation of existing IPCA methods is that there is no guarantee on the approximation error. In view of this limitation, this paper proposes a new IPCA method based on the idea of a singular value decomposition (SVD) updating algorithm, namely an SVD updating-based IPCA (SVDU-IPCA) algorithm. For the proposed SVDU-IPCA algorithm, we have mathematically proved that the approximation error is bounded. A complexity analysis of the proposed method is also presented. Another characteristic of the proposed SVDU-IPCA algorithm is that it can be easily extended to a kernel version. The proposed method has been evaluated using available public databases, namely FERET, AR, and Yale B, and applied to existing face-recognition algorithms. Experimental results show that the difference in average recognition accuracy between the proposed incremental method and the batch-mode method is less than 1%. This implies that the proposed SVDU-IPCA method gives a close approximation to the batch-mode PCA method.

Index Terms—Error analysis, face recognition, incremental principal component analysis (PCA), singular value decomposition (SVD).

I. INTRODUCTION

FACE RECOGNITION has been an active research area in the computer-vision and pattern-recognition societies [1]–[6] in the last two decades. Since the original input-image space has a very high dimension [2], a dimensionality-reduction technique is usually employed before classification takes place. Principal component analysis (PCA) [1], [7] is one of the most popular representation methods for face recognition.

Manuscript received February 7, 2005; revised July 18, 2005. This project was supported in part by a Faculty Research Grant of the Hong Kong Baptist University and in part by the Research Grants Council under Grant HKBU-2119/03E. This paper was recommended by Associate Editor H. Qiao.

H. Zhao is with the Institute of Aerospace Science and Technology, Shanghai Jiaotong University, Shanghai, China, and also with the Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR (e-mail: [email protected]).

P. C. Yuen is with the Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR (e-mail: [email protected]).

J. T. Kwok is with the Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong, SAR (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSMCB.2006.870645

PCA not only reduces the image dimension, but also provides a compact feature for representing a face image. The well-known eigenface system was developed in 1991. In 1997, PCA was also employed for dimension reduction in linear discriminant analysis, and a new algorithm, namely fisherface, was developed. Since then, PCA has been extensively employed in face-recognition technology.

Usually, PCA is performed in batch mode, which means that all training data have to be available for calculating the PCA projection matrix during the training stage. Learning stops once the training data have been fully processed. If we want to incorporate additional training data into an existing PCA projection matrix, the matrix has to be retrained with all of the training data. As a result, it is hard to scale up the developed systems. To overcome this limitation, an incremental method is a straightforward approach.

Incremental PCA (IPCA) has been studied for more than two decades in the machine-learning community, and many IPCA methods have been developed. Basically, existing IPCA algorithms can be divided into two categories. The first category computes the principal components (PCs) without computing the covariance matrix [13], [16], [17]. To the best of our knowledge, the candid covariance-free IPCA (CCIPCA) algorithm [13] developed by Weng et al. is the most recent work. Instead of estimating the PCs by stochastic gradient ascent (SGA) [17], the CCIPCA algorithm generates "observations" in a complementary space for the computation of the higher order PCs. Suppose that sample vectors are acquired sequentially, e.g., u(1), u(2), . . ., possibly infinitely. The first k dominant PCs v_1(n), v_2(n), . . . , v_k(n) are obtained as follows [13].

For $n = 1, 2, \ldots$, do the following steps.

1) $u_1(n) = u(n)$.
2) For $i = 1, 2, \ldots, \min(k, n)$, do:
   a) if $i = n$, initialize the $i$th PC as $v_i(n) = u_i(n)$;
   b) otherwise
$$v_i(n) = \frac{n - 1 - l}{n}\,v_i(n-1) + \frac{1 + l}{n}\,u_i(n)u_i^T(n)\,\frac{v_i(n-1)}{\|v_i(n-1)\|}$$
$$u_{i+1}(n) = u_i(n) - u_i^T(n)\,\frac{v_i(n)}{\|v_i(n)\|}\,\frac{v_i(n)}{\|v_i(n)\|}$$
where $l$ is the amnesic parameter [13].
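To make the update concrete, the following is a minimal numpy sketch of one CCIPCA step (ours, not code from [13]; the function name, the list-based state, and the amnesic default $l = 2$ are illustrative assumptions):

```python
import numpy as np

def ccipca_update(V, u, n, k, l=2.0):
    """One CCIPCA step: V holds the unnormalized PC estimates and u is the
    nth sample u(n); V grows until it holds k components."""
    u = u.astype(float).copy()
    for i in range(min(k, n)):          # i is 0-based; the paper is 1-based
        if i == n - 1:
            V.append(u.copy())          # step 2a: initialize the ith PC
        else:
            vn = V[i] / np.linalg.norm(V[i])
            # step 2b: amnesic average of old estimate and new "observation"
            V[i] = (n - 1 - l) / n * V[i] + (1 + l) / n * (u @ vn) * u
            vn = V[i] / np.linalg.norm(V[i])
            u = u - (u @ vn) * vn       # deflate: residual drives the next PC
    return V

# usage: stream 500 samples of dimension 20, track the first 5 PCs
rng = np.random.default_rng(0)
V = []
for n, u in enumerate(rng.standard_normal((500, 20)), start=1):
    ccipca_update(V, u, n, k=5)
```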

Since the PCs are obtained sequentially and the computation of the (i+1)th PC depends on the ith PC, the error will be propagated and accumulated in the process. Although the estimated vector v_i(n) converges to the ith PC, no detailed error analysis was presented in [13]. Further, Weng et al. [13] defined



a sample-to-dimension ratio $n/d$, where $n$ is the number of training samples and $d$ is the dimension of the sample space. If the ratio is small, it may be a problem from the statistical-estimation point of view. In the face-recognition problem, the dimension of the face images is usually larger than 10 000. Even if 5000 training samples are used, the sample-to-dimension ratio ($5000/10\,000 = 0.5$) is still very low [13]. Hence, methods in this category are not suitable for face-recognition technology.

The second category of IPCA algorithms [14], [18]–[20], [26] reconstructs the significant PCs from the original training images and a newly added sample. When a new sample is added, the dimension of the subspace is increased by one. The updated eigenspace is obtained by using low-dimensional coefficient vectors of the original images; this works because the coefficient vectors and the reconstructed images are simply representations in different coordinate frames. Since the dimension of the eigenspace is small, this approach is computationally efficient [14]. However, since the new samples are added one by one and the least significant PC is discarded to preserve the dimension of the subspace, this approach also suffers from the problem of unpredictable approximation error. Hall et al. [12] also demonstrated that, for updating an eigenspace with low-dimensional coefficient vectors, adding one datum at a time is much less accurate than adding complete spaces.

On the other hand, there are powerful mathematical tools, namely SVD updating algorithms [8], [21]–[25], which were developed based on singular value decomposition (SVD). Zha and Simon [8] further enhanced the SVD updating algorithm and applied it to latent semantic indexing (LSI) for information retrieval. They investigated matrices possessing the so-called low-rank-plus-shift structure, i.e., matrices $A$ satisfying
$$A^TA = \text{a low-rank matrix} + \text{a multiple of the identity matrix}.$$
If a matrix has the low-rank-plus-shift structure, the SVD updating algorithm will obtain the same best low-rank approximation as in batch mode. This means that no error is introduced. Theoretically, this approach is appealing. However, in practice, it is difficult to prove that the data matrix has the low-rank-plus-shift structure, because the data matrix changes with the insertion of new sample(s), and the rank of the data matrix can change during the incremental-learning process. Another important missing item is that there is no error analysis for the case in which a matrix does not satisfy the low-rank-plus-shift structure.

In this paper, we develop a new IPCA method based on the SVD updating algorithm, namely the SVD updating-based IPCA (SVDU-IPCA) algorithm. Instead of working directly on the data matrix $X$, we apply our method to the matrix $\Sigma = (X - E(X))^T(X - E(X))$, where $E(\cdot)$ is the expectation operator. This is often employed in implementing PCA-based face-recognition methods, such as the eigenface method [1]. The dimensionality of the face image (usually higher than 10 000) is much higher than that of the data used in LSI (usually less than 1000), so working on the matrix $\Sigma$ can result in computational savings. We also prove that if the data matrix has the low-rank-plus-shift structure, the error bound using our proposed SVDU-IPCA method is equal to zero. However, as discussed, it is hard to prove that the input matrices have the low-rank-plus-shift structure. Therefore, we present a detailed error analysis and derive an error bound for the approximation between the incremental algorithm and the batch mode. Another characteristic of our proposed SVDU-IPCA method is that it can be easily extended to a kernel version. Since the conventional incremental methods [8], [13], [19] need to compute the additional vectors in the feature space (not in terms of inner products), they are difficult to adapt to the kernel trick.

The rest of this paper is organized as follows. Section II briefly reviews the PCA and SVD updating algorithms. Section III presents our proposed SVDU-IPCA method. A mathematical analysis of our proposed SVDU-IPCA algorithm is presented in Section IV. Section V reports our incremental kernel PCA (IKPCA). Experimental results are shown in Section VI, and Section VII concludes this paper.

II. BRIEF REVIEW ON PCA AND SVD UPDATING

In this section, we will briefly review the procedures of the PCA and SVD updating algorithms [8].

A. Principal Component Analysis

PCA [30] is one of the oldest and best known techniques in multivariate analysis. Let $x \in \mathbb{R}^n$ be a random vector, where $n$ is the dimension of the input space. The covariance matrix of $x$ is defined as $\Xi = E\{[x - E(x)][x - E(x)]^T\}$. Let $u_1, u_2, \ldots, u_n$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvectors and eigenvalues of $\Xi$, respectively, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then, PCA factorizes $\Xi$ into $\Xi = U\Lambda U^T$, with $U = [u_1, u_2, \ldots, u_n]$ and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. One important property of PCA is its optimal signal reconstruction in the sense of minimum mean squared error (mse). It follows that once the PCA of $\Xi$ is available, the best rank-$m$ approximation of $\Xi$ can be readily computed. Let $P = [u_1, u_2, \ldots, u_m]$, where $m < n$. Then $y = P^Tx$ gives the projection onto the first $m$ PCs, which is an important application of PCA in dimensionality reduction.
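As a concrete reference point for the batch procedure just reviewed, here is a minimal numpy sketch of PCA by eigendecomposition of the covariance matrix (ours; the function name is illustrative):

```python
import numpy as np

def batch_pca(X, m):
    """Columns of X are samples. Returns the top-m eigenpairs of the
    covariance matrix Xi and the projections y = P^T (x - mean)."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    Xi = (Xc @ Xc.T) / X.shape[1]          # covariance matrix, d x d
    lam, U = np.linalg.eigh(Xi)            # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:m]        # keep the m largest
    P = U[:, idx]                          # P = [u_1, ..., u_m]
    return P, lam[idx], P.T @ Xc           # dimensionality-reduced data

P, lam, Y = batch_pca(np.random.default_rng(0).standard_normal((10, 200)), m=3)
```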

B. SVD Updating

Let $A \in \mathbb{R}^{m\times n}$ with SVD $A = U\Lambda V^T$, and let its best rank-$k$ approximation be $A \approx U_k\Lambda_kV_k^T$, where $U_k$ and $V_k$ are formed by the first $k$ columns of $U$ and $V$, respectively, and $\Lambda_k$ is the $k$th leading principal submatrix of $\Lambda$. For any matrix $A \in \mathbb{R}^{m\times n}$, we will use $\mathrm{best}_k(A)$ to denote its best rank-$k$ approximation.

The SVD updating algorithm [8], [15] provides an efficient way to carry out the SVD of a larger matrix $[A_{m\times n}, B_{m\times r}]$, where $B$ is an $m\times r$ matrix consisting of $r$ additional columns. The procedure for obtaining the best rank-$k$ approximation of $[A, B]$ is summarized in Algorithm 1.

By exploiting the orthonormal properties and block structure, the SVD computation of $[A, B]$ can be efficiently carried out using the smaller matrices $U_k$ and $V_k$ and the SVD of the smaller matrix $\begin{bmatrix}\Lambda_k & U_k^TB\\ 0 & R\end{bmatrix}$. The computational-complexity analysis and details of the SVD updating algorithm are described in [8].

Algorithm 1—SVD Updating [8]:
1) Obtain the QR decomposition $(I - U_kU_k^T)B = QR$.
2) Obtain the rank-$k$ SVD of the $(k + r') \times (k + r)$ matrix
$$\begin{bmatrix}\Lambda_k & U_k^TB\\ 0 & R\end{bmatrix} = \tilde U\tilde\Lambda\tilde V^T$$
where $r'$ is the rank of $(I - U_kU_k^T)B$.
3) The best rank-$k$ approximation of $[A, B]$ is
$$\big([U_k, Q]\tilde U\big)\,\tilde\Lambda\,\left(\begin{bmatrix}V_k & 0\\ 0 & I\end{bmatrix}\tilde V\right)^T.$$
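A minimal numpy sketch of Algorithm 1 (ours; it assumes the rank-k SVD of A is given as Uk, Sk, Vk and that B supplies the r new columns):

```python
import numpy as np

def svd_update(Uk, Sk, Vk, B, k):
    """Best rank-k SVD of [A, B] given A ~ Uk diag(Sk) Vk^T (Algorithm 1)."""
    # Step 1: QR decomposition of (I - Uk Uk^T) B
    Q, R = np.linalg.qr(B - Uk @ (Uk.T @ B))
    # Step 2: rank-k SVD of the small matrix [[Lam_k, Uk^T B], [0, R]]
    F = np.vstack([np.hstack([np.diag(Sk), Uk.T @ B]),
                   np.hstack([np.zeros((R.shape[0], k)), R])])
    Uf, Sf, Vft = np.linalg.svd(F)
    Uf, Sf, Vf = Uf[:, :k], Sf[:k], Vft.T[:, :k]
    # Step 3: rotate back with [Uk, Q] and [[Vk, 0], [0, I]]
    n, r = Vk.shape[0], B.shape[1]
    Vblock = np.zeros((n + r, k + r))
    Vblock[:n, :k], Vblock[n:, k:] = Vk, np.eye(r)
    return np.hstack([Uk, Q]) @ Uf, Sf, Vblock @ Vf

# usage: update a rank-10 SVD of A with 5 new columns B
rng = np.random.default_rng(0)
A, B = rng.standard_normal((50, 30)), rng.standard_normal((50, 5))
U, S, Vt = np.linalg.svd(A, full_matrices=False)
Un, Sn, Vn = svd_update(U[:, :10], S[:10], Vt[:10].T, B, k=10)
```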

III. PROPOSED IPCA ALGORITHM

This section presents our proposed SVDU-IPCA algorithm based on the SVD updating algorithm. A comprehensive theoretical analysis of SVDU-IPCA will be given in the next section.

Let $x_1, x_2, \ldots, x_m$ be the original face-image vectors and $x_{m+1}, x_{m+2}, \ldots, x_{m+r}$ be the newly added face-image vectors, and let the data matrices be $X_1 = [x_1, x_2, \ldots, x_m]$, $X_2 = [x_{m+1}, x_{m+2}, \ldots, x_{m+r}]$, and $X = [X_1, X_2]$. Due to the high dimensionality of the face image and the difficulty in developing an incremental-learning method based on the covariance matrix $\Xi$, we propose to develop the incremental-learning algorithm based on the matrix $\Sigma = (X - E(X))^T(X - E(X))$.

Let
$$\Sigma_1 = (X_1 - E(X_1))^T(X_1 - E(X_1)) \qquad (1)$$
and
$$\Sigma = (X - E(X))^T(X - E(X)) = \begin{bmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix} \qquad (2)$$
where $\Sigma_2$ is of size $m\times r$ and $\Sigma_3$ is of size $r\times r$. Due to the high dimensionality of the face image, wherein $\Sigma_1$ and $\Sigma$ are often singular, we develop the incremental-learning method based on the presupposition of positive semidefinite matrices.¹ Then, $\Sigma$ can also be written as
$$\Sigma = \begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}^T\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}.$$

Fig. 1 shows the basic idea of our proposed IPCA algorithm graphically. We adopt the notations from [22], in which the rotation sign in Fig. 1 represents an orthogonal transformation. The idea of the proposed incremental procedure is as follows. The rank-$k$ eigendecomposition of $\Sigma_1 = P^TP$ corresponds to the best rank-$k$ approximation $P^{(1)}$ of $P$. We can then obtain the best rank-$k$ approximation $P^{(2)}$ of $\begin{bmatrix}P^{(1)}\\ Q_2\end{bmatrix}$. From the theoretical analysis in the next section, we will find that $Q_2 = 0$; hence, no computational cost is required in this procedure. Then, we can obtain the best rank-$k$ approximation of $\left[P^{(2)}, \begin{bmatrix}Q_1\\ Q_3\end{bmatrix}\right]$. This will approximate the SVD of $\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}$, from which we can get an approximated eigendecomposition of
$$\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}^T\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix} = \Sigma.$$

¹In other applications, $\Sigma_1$ and $\Sigma$ can be positive definite, and the proposed SVDU-IPCA algorithm can be obtained by the same theoretical derivation.

Fig. 1. Visualization of our novel IPCA algorithm. The orthogonal transformation is denoted by rotation.

Since $\Sigma_1$ is a positive semidefinite matrix, there exists an orthogonal matrix $U$ such that $\Sigma_1 = U\Lambda U^T$, where $U = [u_1, u_2, \ldots, u_m]$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$, and $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the eigenvalues of $\Sigma_1$. Suppose $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_l > 0$ and $\lambda_{l+1} = \lambda_{l+2} = \cdots = \lambda_m = 0$. Let $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l)$, $U = [u_1, u_2, \ldots, u_l]$, and $P_{l\times m} = \Lambda^{1/2}[u_1, u_2, \ldots, u_l]^T$. Then, $\Sigma_1 = P^TP$.

$Q_1$, $Q_2$, and $Q_3$ are derived as follows (please refer to the Appendix for a detailed derivation):
$$(Q_1)_{l\times r} = \Lambda^{-\frac12}U^T\Sigma_2 \qquad (3)$$
$$Q_2 = 0 \qquad (4)$$
$$Q_3^TQ_3 = \Sigma_3 - \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2. \qquad (5)$$

Since $P_{l\times m} = (\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_l))^{1/2}[u_1, u_2, \ldots, u_l]^T$, we have
$$P^{(1)} = \begin{bmatrix}I_k\\ 0\end{bmatrix}_{l\times k}\Lambda_kV_k^T$$
where $I_k \in \mathbb{R}^{k\times k}$, $I_k = \mathrm{diag}(1, 1, \ldots, 1)$, $\Lambda_k = (\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_k))^{1/2}$, and $V_k = [u_1, u_2, \ldots, u_k]$. Because $Q_2 = 0$, we have
$$P^{(2)} = \begin{bmatrix}P^{(1)}\\ Q_2\end{bmatrix} = \begin{bmatrix}\begin{bmatrix}I_k\\ 0\end{bmatrix}\Lambda_kV_k^T\\ 0\end{bmatrix}_{(l+r)\times m} = \begin{bmatrix}I_k\\ 0\end{bmatrix}_{(l+r)\times k}\Lambda_kV_k^T$$


which is of rank $k$ already. Hence, the best rank-$k$ approximation $P^{(2)}$ of $\begin{bmatrix}P^{(1)}\\ 0\end{bmatrix}$ is simply $\begin{bmatrix}P^{(1)}\\ 0\end{bmatrix}$ itself.

Based on these derivations, the proposed SVDU-IPCA algorithm is shown in Algorithm 2. Notice that although, conceptually, the incremental procedure involves the addition of rows, $Q_2 = 0$ means that no extra computation cost is required. Thus, only one round of approximation based on column addition is required. In the third and fourth steps of the incremental procedure, the computation cost of the matrices $\begin{bmatrix}0_{k\times k} & \\ & I\end{bmatrix}\begin{bmatrix}Q_1\\ Q_3\end{bmatrix}$ and $[I_k\ \ 0]\begin{bmatrix}Q_1\\ Q_3\end{bmatrix}$ is quite low. Hence, the proposed incremental algorithm is computationally efficient. In the next section, we will show that the approximation error is also bounded.

Algorithm 2—Proposed SVDU-IPCA Algorithm: Given the original data $X_1 = [x_1, x_2, \ldots, x_m]$ and the rank-$k$ eigendecomposition of $\Sigma_1 = P^TP$, for the newly added data $[x_{m+1}, x_{m+2}, \ldots, x_{m+r}]$, do the following.
1) Compute the matrices $\Sigma_2$ and $\Sigma_3$ according to (2).
2) Obtain the best rank-$k$ approximation of $P_{l\times m}$, which is
$$P^{(1)} = \begin{bmatrix}I_k\\ 0\end{bmatrix}_{l\times k}\Lambda_kV_k^T.$$
3) Compute $Q_1$ according to (3) and $Q_3$ as the square root of $\Sigma_3 - Q_1^TQ_1$ in (5).
4) Obtain the QR decomposition
$$\left(I_{(l+r)\times(l+r)} - \begin{bmatrix}I_k\\ 0\end{bmatrix}[I_k\ \ 0]\right)\begin{bmatrix}Q_1\\ Q_3\end{bmatrix} = JK, \quad \text{i.e.,} \quad \begin{bmatrix}0_{k\times k} & \\ & I\end{bmatrix}_{(l+r)\times(l+r)}\begin{bmatrix}Q_1\\ Q_3\end{bmatrix} = JK.$$
5) Obtain the SVD of the smaller matrix
$$\begin{bmatrix}\Lambda_k & [I_k\ \ 0]\begin{bmatrix}Q_1\\ Q_3\end{bmatrix}\\ 0 & K\end{bmatrix} = \tilde U\tilde\Lambda\tilde V^T.$$
6) Obtain the best rank-$k$ approximation of $\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}$, which is
$$\left(\left[\begin{bmatrix}I_k\\ 0\end{bmatrix},\ J\right]\tilde U\right)\tilde\Lambda\left(\begin{bmatrix}V_k & 0\\ 0 & I\end{bmatrix}\tilde V\right)^T.$$
7) Obtain the best rank-$k$ approximation of
$$\Sigma = \begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}^T\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}.$$

IV. ANALYSIS OF THE PROPOSED SVDU-IPCA ALGORITHM

This section presents a mathematical analysis of the proposed SVDU-IPCA algorithm. We prove that our incremental algorithm gives the same result as batch-mode PCA if the matrix has the low-rank-plus-shift structure. We also present an error-bound analysis for the case in which the low-rank-plus-shift structure is not satisfied. Finally, a computational-complexity analysis is discussed.

Before going into the detailed analysis, let us define some symbols. Let $\hat\Sigma_1 = (P^{(1)})^TP^{(1)}$ and
$$\hat\Sigma = \begin{bmatrix}\hat\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix}, \qquad \tilde\Sigma_1 = \Sigma_1 - \hat\Sigma_1, \qquad \tilde\Sigma = \Sigma - \hat\Sigma.$$

A. Use of $\hat\Sigma$

The eigendecomposition of $\hat\Sigma$ is
$$\hat\Sigma = W\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,W^T \qquad (6)$$
where $W = [w_1, w_2, \ldots, w_n]$ is orthogonal and the eigenvalues $\lambda_k$ are arranged in nonincreasing order.

In the following, we show that $\mathrm{best}_k(\hat\Sigma) = \mathrm{best}_k(\Sigma)$ if $\Sigma$ has the low-rank-plus-shift structure.

Theorem 1: Assume that $\Sigma = \Omega + \sigma^2I$, $\sigma > 0$, where $\Omega$ is symmetric and positive semidefinite with $\mathrm{rank}(\Omega) = k$ $(k \le m)$. Then, we have
$$\mathrm{best}_k(\hat\Sigma) = \mathrm{best}_k(\Sigma).$$

Proof: Let $\Sigma_1 = U\Gamma U^T = U\,\mathrm{diag}(\gamma_1, \ldots, \gamma_m)\,U^T$, where $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_m$. Then
$$\Sigma_1 = \hat\Sigma_1 + \tilde\Sigma_1 = U\,\mathrm{diag}(\gamma_1, \ldots, \gamma_k, 0, \ldots, 0)\,U^T + U\,\mathrm{diag}(0, \ldots, 0, \gamma_{k+1}, \ldots, \gamma_m)\,U^T.$$
Let $\Omega = \begin{bmatrix}\Omega_1 & \Omega_2\\ \Omega_2^T & \Omega_3\end{bmatrix}$; then we have $\Sigma_1 = \Omega_1 + \sigma^2I_m$, i.e., $\Omega_1 = \Sigma_1 - \sigma^2I_m$. Consider
$$U^T\Omega_1U = U^T\Sigma_1U - \sigma^2I_m = \Gamma - \sigma^2I_m = \mathrm{diag}(\gamma_1 - \sigma^2, \ldots, \gamma_k - \sigma^2, \gamma_{k+1} - \sigma^2, \ldots, \gamma_m - \sigma^2).$$
Since $\mathrm{rank}(\Omega_1) \le \mathrm{rank}(\Omega) \le k$, we get $U^T\Omega_1U = \mathrm{diag}(\gamma_1 - \sigma^2, \ldots, \gamma_k - \sigma^2, 0, \ldots, 0)$, i.e.,
$$U^T\Sigma_1U = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_k, \sigma^2, \ldots, \sigma^2).$$
Considering
$$\Omega = \Sigma - \sigma^2I = \begin{bmatrix}\Sigma_1 - \sigma^2I_m & \Sigma_2\\ \Sigma_2^T & \Sigma_3 - \sigma^2I_{n-m}\end{bmatrix}$$
we have
$$\Omega = \begin{bmatrix}U\,\mathrm{diag}(\gamma_1 - \sigma^2, \ldots, \gamma_k - \sigma^2, 0, \ldots, 0)\,U^T & \Sigma_2\\ \Sigma_2^T & \Sigma_3 - \sigma^2I_{n-m}\end{bmatrix} = \begin{bmatrix}\hat\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix} - \begin{bmatrix}\sigma^2I_k & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & \sigma^2I_{n-m}\end{bmatrix}.$$
Hence
$$W^T\Omega W = W^T\Sigma W - \sigma^2I = W^T\hat\Sigma W - \begin{bmatrix}\sigma^2I_k & & \\ & 0 & \\ & & \sigma^2I_{n-m}\end{bmatrix} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) - \begin{bmatrix}\sigma^2I_k & & \\ & 0 & \\ & & \sigma^2I_{n-m}\end{bmatrix}.$$
It follows that
$$W^T\Sigma W = W^T\hat\Sigma W + \begin{bmatrix}\sigma^2I_m & 0\\ 0 & 0\end{bmatrix}.$$
Since the $\lambda_k$'s are arranged in nonincreasing order, we can conclude that
$$\mathrm{best}_k(\hat\Sigma) = \mathrm{best}_k(\Sigma)$$
completing the proof. $\blacksquare$

In Theorem 1, we considered the case where $\Sigma$ satisfies the low-rank-plus-shift property. Our second objective is to examine the difference between the eigenvectors of $\Sigma$ and the eigenvectors of $\hat\Sigma$, which indicates the approximation error. Details are discussed as follows.

Theorem 2: From (6), let $\Sigma_1 = U\Gamma U^T = U\,\mathrm{diag}(\gamma_1, \ldots, \gamma_m)\,U^T$, where $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_m$, $\varepsilon = \gamma_{k+1}$, and
$$B = \begin{bmatrix}U & 0\\ 0 & I_{n-m}\end{bmatrix}\,\mathrm{diag}\Big(\underbrace{0, \ldots, 0}_{k},\ 1,\ \frac{\gamma_{k+2}}{\varepsilon}, \ldots, \frac{\gamma_m}{\varepsilon},\ 0, \ldots, 0\Big)\,\begin{bmatrix}U & 0\\ 0 & I_{n-m}\end{bmatrix}^T.$$
Since
$$\Sigma_1 = \hat\Sigma_1 + \tilde\Sigma_1 = U\,\mathrm{diag}(\gamma_1, \ldots, \gamma_k, 0, \ldots, 0)\,U^T + U\,\mathrm{diag}(0, \ldots, 0, \gamma_{k+1}, \ldots, \gamma_m)\,U^T$$
we have
$$\Sigma = \begin{bmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix} = \begin{bmatrix}\hat\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix} + \begin{bmatrix}\tilde\Sigma_1 & 0\\ 0 & 0\end{bmatrix} = \hat\Sigma + \tilde\Sigma.$$

Assume that $\beta_{ji} = w_j^TBw_i$. Then, for $\lambda_i > 0$ and $\lambda_i \ne \lambda_j$ $(i \ne j)$, we have
$$\|v_i - w_i\| \le 2\varepsilon\left(\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + \frac12\right) + O(\varepsilon^2)$$
where $V = [v_1, v_2, \ldots, v_n]$ is orthogonal and $V^T\Sigma V = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_n)$.

Proof: Since $\tilde\Sigma_1 = U\,\mathrm{diag}(0, \ldots, 0, \gamma_{k+1}, \ldots, \gamma_m)\,U^T$, then
$$\tilde\Sigma = \varepsilon\begin{bmatrix}U & 0\\ 0 & I_{n-m}\end{bmatrix}\,\mathrm{diag}\Big(0, \ldots, 0,\ 1,\ \frac{\gamma_{k+2}}{\varepsilon}, \ldots, \frac{\gamma_m}{\varepsilon},\ 0, \ldots, 0\Big)\,\begin{bmatrix}U & 0\\ 0 & I_{n-m}\end{bmatrix}^T.$$
We have $\tilde\Sigma = \varepsilon B$ and $\|B\| \le 1$. For a fixed $i$, consider $\Sigma = \hat\Sigma + \varepsilon B$ with eigenvector $v_i = w_i(\varepsilon)$ and eigenvalue $\hat\lambda_i = \lambda_i(\varepsilon)$. Since $\Sigma$ is symmetric, $\hat\lambda_i$ can be expressed as a power series in $\varepsilon$ [29]:
$$\hat\lambda_i = \lambda_i(\varepsilon) = \lambda_i + k_1\varepsilon + k_2\varepsilon^2 + \cdots. \qquad (7)$$
Similarly,
$$v_i = w_i(\varepsilon) = w_i + \varepsilon z_1 + \varepsilon^2z_2 + \cdots.$$
Since $\hat\Sigma$ is symmetric, we can express the vectors $z_i$ as linear combinations of the eigenvectors of $\hat\Sigma$ [29]. So
$$v_i = w_i(\varepsilon) = w_i + \varepsilon\sum_{j=1}^n t_{j1}w_j + \varepsilon^2\sum_{j=1}^n t_{j2}w_j + \cdots. \qquad (8)$$
Since this is an eigenvector, normalizing (8) such that the coefficient of $w_i$ equals 1 gives
$$v_i = w_i + \Big(\sum_{j=1}^{\infty}\varepsilon^jt_{1j}\Big)w_1 + \cdots + \Big(\sum_{j=1}^{\infty}\varepsilon^jt_{(i-1)j}\Big)w_{i-1} + \Big(\sum_{j=1}^{\infty}\varepsilon^jt_{(i+1)j}\Big)w_{i+1} + \cdots + \Big(\sum_{j=1}^{\infty}\varepsilon^jt_{nj}\Big)w_n. \qquad (9)$$
Consider
$$(\hat\Sigma + \varepsilon B)v_i = \Sigma v_i = \hat\lambda_iv_i = \lambda_i(\varepsilon)v_i, \qquad \hat\Sigma w_i = \lambda_iw_i. \qquad (10)$$
Substituting (7) and (9) into (10) and considering the terms of order $\varepsilon$, we get
$$\varepsilon\hat\Sigma\big(t_{11}w_1 + \cdots + t_{(i-1)1}w_{i-1} + t_{(i+1)1}w_{i+1} + \cdots + t_{n1}w_n\big) + \varepsilon Bw_i = \varepsilon\big(k_1w_i + \lambda_it_{11}w_1 + \cdots + \lambda_it_{(i-1)1}w_{i-1} + \lambda_it_{(i+1)1}w_{i+1} + \cdots + \lambda_it_{n1}w_n\big).$$
In other words,
$$\hat\Sigma\Big(\sum_{j\ne i}t_{j1}w_j\Big) + Bw_i = \lambda_i\Big(\sum_{j\ne i}t_{j1}w_j\Big) + k_1w_i$$
$$\sum_{j\ne i}t_{j1}\hat\Sigma w_j - \lambda_i\sum_{j\ne i}t_{j1}w_j + Bw_i = k_1w_i$$
$$\sum_{j\ne i}t_{j1}(\lambda_j - \lambda_i)w_j + Bw_i = k_1w_i. \qquad (11)$$


Since $w_l^Tw_l = 1$ and $w_l^Tw_i = 0$ $(l \ne i)$, multiplying (11) on the left by $w_l^T$ gives
$$(\lambda_l - \lambda_i)t_{l1}w_l^Tw_l + w_l^TBw_i = 0$$
for $l \ne i$. Therefore, $|t_{l1}| = |\beta_{li}/(\lambda_i - \lambda_l)|$. Recalling (9), we have
$$\|v_i - w_i\| \le \varepsilon\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + O(\varepsilon^2). \qquad (12)$$
Therefore,
$$\|v_i\| \le 1 + \varepsilon\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + O(\varepsilon^2).$$
Let $\|v_i\| = 1 + \varepsilon s + O(\varepsilon^2)$. If $\varepsilon$ is small enough, then we can get
$$|s| \le \sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + 1.$$
Thus, writing $\bar v_i$ for the normalized eigenvector,
$$v_i = \bar v_i\big(1 + \varepsilon s + O(\varepsilon^2)\big). \qquad (13)$$
Substituting (13) in (12), we have
$$\big\|\bar v_i\big(1 + \varepsilon s + O(\varepsilon^2)\big) - w_i\big\| \le \varepsilon\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + O(\varepsilon^2).$$
Since $\|\bar v_i\| = 1$, then
$$\|\bar v_i - w_i\| - \varepsilon|s| + O(\varepsilon^2) \le \varepsilon\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + O(\varepsilon^2)$$
$$\|\bar v_i - w_i\| \le \varepsilon\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + \varepsilon|s| + O(\varepsilon^2).$$
Considering $|s| \le \sum_{j\ne i}|\beta_{ji}/(\lambda_i - \lambda_j)| + 1$, we have
$$\|\bar v_i - w_i\| \le 2\varepsilon\left(\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + \frac12\right) + O(\varepsilon^2)$$

for $\lambda_i > 0$ and $\lambda_i \ne \lambda_j$ $(i \ne j)$, thus completing the proof. $\blacksquare$

Corollary: Let $\|\cdot\|_F$ denote the Frobenius norm. Then
$$\|V - W\|_F \le 2\sqrt{n}\,\varepsilon\left(\max_{1\le i\le n}\sum_{j\ne i}\left|\frac{\beta_{ji}}{\lambda_i - \lambda_j}\right| + \frac12\right) + O(\varepsilon^2).$$
Proof: The proof is straightforward and is therefore omitted. $\blacksquare$

According to the Corollary, when $\varepsilon \to 0$, $W \to V$. This means that when $\Sigma_1$ is closely approximated by $\hat\Sigma_1$, the error between the PCs of the batch method and those of SVDU-IPCA will be very small.
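The qualitative content of the corollary can be checked numerically. The following sketch (ours; it uses a generic symmetric perturbation rather than the specific B constructed above, and assumes distinct eigenvalues) shows the eigenvector difference shrinking roughly linearly in epsilon:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
Sigma_hat = A @ A.T                        # plays the role of the approximation
B = rng.standard_normal((n, n))
B = (B + B.T) / 2
B /= np.linalg.norm(B, 2)                  # ||B|| <= 1, as in the proof

for eps in [1e-1, 1e-2, 1e-3]:
    _, W = np.linalg.eigh(Sigma_hat)       # eigenvectors of Sigma_hat
    _, V = np.linalg.eigh(Sigma_hat + eps * B)
    W, V = W[:, ::-1], V[:, ::-1]          # nonincreasing eigenvalue order
    V = V * np.sign(np.sum(V * W, axis=0)) # resolve the sign ambiguity
    print(eps, np.linalg.norm(V - W))      # decreases roughly like eps
```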

B. Computational Complexity of the SVDU-IPCA Algorithm

Considering the first step of the proposed SVDU-IPCA algorithm in Algorithm 2, the matrix $\Sigma_1$ is seldom positive definite in face recognition. If we take $P^{(1)} = P$, then $\mathrm{rank}(P^{(1)}) = l$ $(l < m)$, and no information is lost in this step. Recall that $Q_2 = 0$; thus $\begin{bmatrix}P^{(1)}\\ Q_2\end{bmatrix} = \begin{bmatrix}P^{(1)}\\ 0\end{bmatrix}$, which is of rank $k$ already. The best rank-$k$ approximation $P^{(2)}$ of $\begin{bmatrix}P^{(1)}\\ Q_2\end{bmatrix}$ is simply $\begin{bmatrix}P^{(1)}\\ 0\end{bmatrix}$ itself. Then, no computation cost is required and no information is lost. Hence, only one round of approximation based on column addition is required. The SVDU-IPCA algorithm requires $O(r^3)$ flops in the second step and $O((l + r - k)r^2)$ flops for the QR decomposition in the third step. Then, $O((k + r)^2k)$ flops are needed in the fourth step. Since $Q_2 = 0$, $\left[P^{(2)}, \begin{bmatrix}Q_1\\ Q_3\end{bmatrix}\right]$ is a sparse matrix; hence, the computation time of SVDU-IPCA can be further reduced.

Consider $X = [x_1, x_2, \ldots, x_m]$, where $x_i$ denotes a face-image vector obtained by row concatenation of the two-dimensional image. If we perform the incremental procedure (SVD updating) directly on the matrix $X$, it needs at least $O(n^2k)$ flops [22]. In face-recognition applications, very often only two to six images are available for training, while the image dimension is higher than 10 000; that is to say, $l, r, k \ll n$. Hence, the computational complexity would be much higher than that of the proposed SVDU-IPCA. Now consider performing the incremental procedure (SVD updating) directly on the matrix $\Sigma$. Since $\Sigma_2 \ne 0$, we need to obtain two best rank-$k$ approximations, of $\Sigma^{(2)} = \begin{bmatrix}\Sigma^{(1)}\\ \Sigma_2^T\end{bmatrix}$ and of $\left[\Sigma^{(2)}, \begin{bmatrix}\Sigma_2\\ \Sigma_3\end{bmatrix}\right]$, respectively. Then, two rounds of approximation would be required.

C. Remarks

In this paper, our method assumes that the $\Sigma_1$ submatrix is not changed when the matrix is expanded. This implies that the mean of the training samples is assumed to be constant, which may not hold, especially when a large number of samples are added. In such cases, we need to perform noncentered PCA [30], which does not require centering ($\Sigma$ is then the outer-product matrix). Noncentered PCA is a well-established technique in ecology, chemistry, and geology [30], as well as in pattern recognition [31], especially face recognition [32]. Noncentered PCA [30] uses the autocorrelation matrix, instead of the covariance matrix, to get the PCs.

Empirical results in Section VI also show that our incremental method under large additions of face images outperforms batch-mode noncentered PCA and centered PCA.

V. INCREMENTAL KPCA (IKPCA)

In this section, we will first briefly review KPCA and then report our proposed IKPCA algorithm.


A. KPCA

This section gives a short review of KPCA [9]–[11]. Given a nonlinear mapping $\Phi$, the input data space $\mathbb{R}^n$ can be mapped into the feature space $\mathcal{F}$:
$$\Phi: \mathbb{R}^n \to \mathcal{F}, \qquad x \mapsto \Phi(x).$$
Correspondingly, a pattern in the original input space $\mathbb{R}^n$ is mapped into a potentially much higher dimensional feature vector in the feature space $\mathcal{F}$.

An initial motivation of KPCA is to perform PCA in the feature space $\mathcal{F}$. However, it is difficult to do so directly because computing dot products in a high-dimensional feature space is expensive. Fortunately, the algorithm can be implemented in the input space by virtue of the kernel trick; the explicit mapping is not required at all.

PCA performed in the feature space $\mathcal{F}$ can be formulated as the diagonalization of the covariance matrix
$$C = \frac{1}{n}\sum_{i=1}^n \Phi(x_i)\Phi(x_i)^T$$
where $x_1, x_2, \ldots, x_n$ are the given training samples in the input space $\mathbb{R}^n$. For simplicity, we assume that the mapped data need not be centered.² We can find the eigenvalues and eigenvectors of $C$ by solving the eigenvalue problem
$$n\lambda\alpha = K\alpha$$
where $K$ is a symmetric $(n\times n)$ Gram matrix with the elements
$$K_{ij} = (\Phi(x_i), \Phi(x_j)) := K(x_i, x_j).$$
Consider the eigendecomposition $K = G\Lambda G^T$, where $G = [\alpha_1, \alpha_2, \ldots, \alpha_n]$, with $\alpha_i = [\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{in}]^T$, is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Denote the projection of the $\Phi$ image of a pattern $x$ onto the $k$th component by $\varphi_k$. Finally, we have
$$\varphi_k = \frac{1}{\sqrt{\lambda_k}}\sum_{i=1}^n \alpha_{ki}K(x_i, x). \qquad (14)$$
The existence of such a kernel function is guaranteed by Mercer's Theorem [9]. The Gaussian kernel [or radial basis function (RBF) kernel] $K(x, y) = \exp(-\|x - y\|^2/\sigma^2)$ has been widely studied in the literature and is also used in this paper.
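For reference, a minimal numpy sketch of uncentered KPCA with the Gaussian kernel and the projection of eq. (14) (ours; the function names and default sigma are illustrative):

```python
import numpy as np

def kpca(X, sigma=1.0, k=2):
    """Columns of X are samples; returns a function that projects a new
    pattern onto the first k kernel PCs via eq. (14)."""
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    K = np.exp(-sq / sigma ** 2)                  # Gram matrix K_ij = K(x_i, x_j)
    lam, G = np.linalg.eigh(K)                    # K = G diag(lam) G^T
    lam, G = lam[::-1][:k], G[:, ::-1][:, :k]     # top-k eigenpairs
    def project(x):
        kx = np.exp(-((X - x[:, None]) ** 2).sum(axis=0) / sigma ** 2)
        return (G.T @ kx) / np.sqrt(lam)          # eq. (14)
    return project

project = kpca(np.random.default_rng(0).standard_normal((5, 40)))
print(project(np.zeros(5)))                       # 2-D kernel-PCA features
```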

B. IKPCA

The extension of existing incremental methods to KPCA is not straightforward. Most, if not all, incremental algorithms, such as those in [13], [14], [16], and [19], are iterative methods that need to compute the samples in the feature space directly (not in terms of inner products). They are therefore difficult to adapt to the kernel trick. To deal with the high computational complexity, sampling [27] and greedy [28] methods have been proposed to update KPCA. These algorithms can speed up KPCA very well when the Gram matrix is sparse. To the best of our knowledge, no incremental method for KPCA is available.

²This can be viewed as a Karhunen–Loeve (K–L) transformation in the feature space. Actually, all calculations can be reformulated to deal with centering [9].

From the theory of KPCA, using the kernel trick to deal with the nonlinear mapping, the key issue is an eigenvalue problem. Our new incremental method is easy to adapt to the kernel trick and extend to a kernel version.

Consider the eigen equation
$$n\lambda\alpha = K\alpha.$$
Let $K_1$ be an $m\times m$ Gram matrix and
$$K = \begin{bmatrix}K_1 & K_2\\ K_2^T & K_3\end{bmatrix}$$
be an expanded $(m+r)\times(m+r)$ Gram matrix, where $K_2$ is of size $m\times r$ and $K_3$ is of size $r\times r$. From Mercer's Theorem [9], both $K_1$ and $K$ are positive semidefinite matrices.

Let the eigendecomposition $K_1 = V\Lambda^2V^T = P^TP$, where $P = \Lambda V^T$. Following the same derivation as in Section III, IKPCA is developed and shown in Algorithm 3.

Algorithm 3—Proposed IKPCA: Given the original data $X_1 = [x_1, x_2, \ldots, x_m]$ and the rank-$k$ eigendecomposition of the original Gram matrix $K_1 = P^TP$, for the newly added data $[x_{m+1}, x_{m+2}, \ldots, x_{m+r}]$, do the following.
1) Compute the matrices $K_2$ and $K_3$.
2) Obtain the best rank-$k$ approximation of $P_{l\times m}$, which is
$$P^{(1)} = \begin{bmatrix}I_k\\ 0\end{bmatrix}_{l\times k}\Lambda_kV_k^T.$$
3) Compute $Q_1 = \Lambda^{-1}V^TK_2$ and $Q_3$ as the square root of $K_3 - Q_1^TQ_1$.
4) Obtain the QR decomposition
$$\left(I_{(l+r)\times(l+r)} - \begin{bmatrix}I_k\\ 0\end{bmatrix}[I_k\ \ 0]\right)\begin{bmatrix}Q_1\\ Q_3\end{bmatrix} = JL, \quad \text{i.e.,} \quad \begin{bmatrix}0_{k\times k} & \\ & I\end{bmatrix}_{(l+r)\times(l+r)}\begin{bmatrix}Q_1\\ Q_3\end{bmatrix} = JL.$$
5) Obtain the SVD of the smaller matrix
$$\begin{bmatrix}\Lambda_k & [I_k\ \ 0]\begin{bmatrix}Q_1\\ Q_3\end{bmatrix}\\ 0 & L\end{bmatrix} = \tilde U\tilde\Lambda\tilde V^T.$$
6) Obtain the best rank-$k$ approximation of $\begin{bmatrix}P & Q_1\\ 0 & Q_3\end{bmatrix}$, which is
$$\left(\left[\begin{bmatrix}I_k\\ 0\end{bmatrix},\ J\right]\tilde U\right)\tilde\Lambda\left(\begin{bmatrix}V_k & 0\\ 0 & I\end{bmatrix}\tilde V\right)^T.$$
7) Obtain the best rank-$k$ approximation of
$$K = \begin{bmatrix}P & Q_1\\ 0 & Q_3\end{bmatrix}^T\begin{bmatrix}P & Q_1\\ 0 & Q_3\end{bmatrix}.$$
8) Obtain the nonlinear projection of KPCA from (14).
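Since Algorithm 3 is Algorithm 2 applied to Gram blocks, the svdu_ipca sketch given after Algorithm 2 can be reused directly. A hedged usage illustration (ours), assuming that helper and a Gaussian-kernel Gram function:

```python
import numpy as np

def rbf_gram(X, Y, sigma=1.0):
    """Gram block with entries K(x_i, y_j) for columns of X and Y."""
    sq = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-sq / sigma ** 2)

rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((5, 30)), rng.standard_normal((5, 6))
K1 = rbf_gram(X1, X1)
lam, U = np.linalg.eigh(K1)
lam, U = lam[::-1], U[:, ::-1]                  # nonincreasing spectrum
keep = lam > 1e-10 * lam[0]                     # rank-l positive part
lam, U = lam[keep], U[:, keep]
# steps 1-7 of Algorithm 3 on the expanded Gram matrix (K2 = rbf_gram(X1, X2))
S2, Vn = svdu_ipca(lam, U, rbf_gram(X1, X2), rbf_gram(X2, X2), k=8)
# S2 approximates the top eigenvalues of the full Gram matrix; the columns of
# Vn supply the expansion coefficients for the projection in eq. (14)
```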


Fig. 2. Samples from the FERET dataset.

VI. EXPERIMENT RESULTS

Three experiments are presented in this section. In the first experiment, we evaluate and compare the performance of the proposed SVDU-IPCA with batch-mode PCA (both centered and noncentered) using the eigenface and fisherface face-recognition methods on the FERET database. The second experiment compares the performance of SVDU-IPCA with the most recent IPCA algorithm, namely CCIPCA [13], using the AR face database. The third experiment compares the performance of the proposed IKPCA with KPCA using the Yale B database. Details of each experiment are discussed as follows.

A. Experiment 1: Performance Evaluation Using the FERET Face Database

The objective of this experiment is to evaluate the performance of the proposed SVDU-IPCA algorithm by replacing the batch-mode PCA algorithm in the eigenface and fisherface face-recognition methods.

The FERET database is used for evaluation in this experiment. We selected 72 subjects from the FERET database, with six images for each subject. The six images are extracted from four different sets, namely Fa, Fb, Fc, and duplicate [2]. The images are selected to bear more differences in lighting, facial expressions, and facial details. All images are aligned at the centers of the eyes and mouth. Histogram equalization is applied to the face images for photometric normalization, and the images are also converted to intensity images that contain values in the range 0.0 (black) to 1.0 (full intensity or white). Images from two of the subjects are shown in Fig. 2.

1) Results of the Eigenface Method: Initially, we used 144 images (36 subjects, 4 per subject) to form the initial matrix $\Sigma_1$. Training images are then added in increments of 24 images (six subjects, four per subject) up to a maximum total of 288 training images (72 subjects, 4 per subject). The remaining images are used for recognition. A minimum-distance classifier is used in this experiment. The experiments are repeated 50 times, and the average accuracy is recorded. Figs. 3 and 4 show the performance of the proposed IPCA method, batch-mode centered PCA, and batch-mode noncentered PCA when 30 and 50 PCs are used. The computational time when 30 PCs are used is recorded in Table I, which indicates that our proposed IPCA algorithm is more efficient than the batch-mode PCA algorithm.

It is found that the recognition rate decreases when more training samples are added. This may be due to the fact that

Fig. 3. Recognition accuracy using 30 PCs.

Fig. 4. Recognition accuracy using 50 PCs.

when more images are added, the (fixed) number of PCs may not be enough for the recognition tasks. For example, if 50 classes need 50 PCs to get a good result, 100 classes may need more PCs; if only 50 PCs are still used, the accuracy will drop. From the experimental results, we can also find that the recognition accuracy with 50 PCs is higher than that with 30 PCs, for the same reason. Choosing the best dimension of the feature space is still an open problem and beyond the scope of this paper.

To demonstrate the benefit of using block updating, we perform an experiment using our SVDU-IPCA algorithm with increments by single samples and by blocks. In the experiment, 144 images (36 subjects, 4 per subject) are used first. Training images are then added in increments of 24 images (six subjects, four per subject) up to a maximum total of 288 training images, and the remaining images are used for testing. To have a benchmark, experiments are performed using the eigenface method (batch-mode PCA) on the initial 144 images to get the projection matrix. This means the projection matrix does not


TABLE I
COMPARISON OF CPU TIME (s) ON FERET DATA SET (2.4-GHz PENTIUM-4 CPU WITH 512-MB RAM)

Fig. 5. Comparing the accuracy of block updating and single updating.

change when new face images are added. Fig. 5 compares the performance when 30 PCs are used. It can be seen that incrementing by block gives better performance than incrementing by single samples.

2) Results of the Fisherface Method: Besides the eigenface method (PCA), the fisherface method (PCA+LDA) [33] is another well-known method in the appearance-based approach.

In the fisherface method, PCA is used for dimensionality reduction to ensure the full rank of the within-class scatter matrix. In doing so, the direct computation of linear discriminant analysis (LDA) [31] becomes feasible.

In this experiment, we initially use 144 images for training. Training images are then added in increments of 24 images up to a maximum total of 288 training samples. The remaining images are used for recognition. In the PCA step, in order to avoid the singularity of the within-class scatter matrix, 100 PCs are selected; then, in the LDA step, $c - 1$ features are used for face recognition, where $c$ is the number of classes. The results are recorded and plotted in Fig. 6. It can be seen that the proposed incremental method gives a very close performance to batch-mode PCA when applied to fisherface.

B. Experiment 2: Comparison Between SVDU-IPCA and CCIPCA Algorithms

We have shown in Experiment 1 that our proposed SVDU-IPCA gives a very close approximation to batch-mode PCA. In this section, we would like to show that our proposed SVDU-IPCA outperforms existing IPCA methods. For the comparison, we select the most recent method, namely the CCIPCA algorithm [13]. The AR database [34] is selected. Face-image variations in the AR database include

Fig. 6. Recognition accuracy of PCA+LDA and IPCA+LDA.

illumination, facial expression, and occlusion. For most of the individuals in the AR database, images were taken in two sessions (separated by two weeks). In our experiment, 119 individuals (64 male and 55 female) who participated in both sessions were selected. We manually cropped the face portion of the images. All the cropped images are aligned at the centers of the eyes and mouth and then normalized to a resolution of 112 × 92. Due to the presence of sunglasses and scarves, some face images in AR contain a large area of occlusion. We discard these images, and 952 images (119 × 8) are selected for our experiment. In order to get a larger sample-to-dimension ratio, which is important for the CCIPCA algorithm, the wavelet transform is applied to all the images twice, and the low-frequency images are used as input vectors. After two applications of the wavelet transform, the resolution of the input image is 30 × 25. Note that, after the wavelet transform, some information of the original image is lost.

In the experiment, we initially used 238 images (119 persons, 2 images per person). Training images are then added in increments of 119 images (119 persons, 1 image per person) up to a maximum total of 714 images (119 persons, 6 images per person). The remaining images are used for recognition. A minimum-distance classifier is used in this experiment. The experiments are repeated 50 times, and the average accuracy is recorded. Experiments on batch-mode PCA are performed as a benchmark. Figs. 7 and 8 plot the performance when 50 and 100 PCs are used, respectively. It can be seen that, in all conditions, as the sample-to-dimension ratio changes from 0.476 to 0.952, the SVDU-IPCA algorithm outperforms the CCIPCA algorithm.

In order to give a comprehensive comparison, we also use the measurement suggested in [13]. Initially, 238 training images are used, and the training set then grows to a maximum total


Fig. 7. Recognition accuracy using 50 PCs.

Fig. 8. Recognition accuracy using 100 PCs.

of 714 training images (119 persons, 6 images per person). Let the PCs of our SVDU-IPCA be $V = [v_1, v_2, \ldots, v_m]$, the PCs of CCIPCA be $\tilde V = [\tilde v_1, \tilde v_2, \ldots, \tilde v_m]$, and the PCs of batch-mode PCA on the total training samples be $\bar V = [\bar v_1, \bar v_2, \ldots, \bar v_m]$. We have computed the correlations $V^T\bar V$ and $\tilde V^T\bar V$. The correlation between two unit eigenvectors $v$ and $\bar v$ is represented by their inner product $v^T\bar v$. Figs. 9 and 10 show the correlations (dot products) of the PCs. In both figures, the upper row [Figs. 9(a), (b) and 10(a), (b)] shows the results of our SVDU-IPCA, while the lower row [Figs. 9(c), (d) and 10(c), (d)] shows the results of the CCIPCA algorithm. It can be seen that the proposed SVDU-IPCA can accurately estimate the eigenvectors, for both the first 20 and the last 20. In this experiment, the sample-to-dimension ratio is greater than 0.9, and CCIPCA estimates the first several PCs with high correlation to the actual ones. However, the other eigenvectors, especially the last ones, cannot be estimated accurately. The error in estimation may be due to the fact that, in CCIPCA, the computation of the $(i+1)$th PC is based on the $i$th PC, and the error is propagated.

C. Experiment 3: Results on IKPCA

The objective of this experiment is to demonstrate the effectiveness of the proposed IKPCA. The Yale Group B face database is selected for evaluation because its images have large variations in pose and illumination.

The Yale Group B face database contains 5850 source images of ten subjects, each captured under 585 viewing conditions (9 poses × 65 illumination conditions). In our experiments, we use images under 45 illumination conditions, and these 4050 images (9 poses × 45 illumination conditions) have been divided into four subsets according to the angle the light-source direction makes with the camera axis: subset 1 (up to 12°, 7 images per pose), subset 2 (up to 25°, 12 images per pose), subset 3 (up to 50°, 12 images per pose), and subset 4 (up to 77°, 14 images per pose) [35]. All frontal-pose images are aligned by the centers of the eyes and mouth, and the other images are aligned by the center points of the faces. Then, all images are normalized to the same resolution of 57 × 47. The images are also converted to intensity images that contain values in the range 0.0 (black) to 1.0 (full intensity or white). Some images from one individual are shown in Fig. 11.

For each subset, we show images of nine poses under one illumination condition: the first row shows images of nine poses under one illumination condition from subset 1, the second row shows images from subset 2, and so on.

First, we fix the pose variation to evaluate the performance of IKPCA. For each pose, we randomly select 80 images (two images per subset). Training images are then added in increments of 40 images (one image per subset) up to a maximum total of 240 images. The remaining images are used for recognition. A minimum-distance classifier is used. In the experiment, 50 PCs are used in the eigendecomposition updating and testing. The experiments are repeated 50 times, and the average accuracy is recorded and plotted in Fig. 12. It can be seen that the two curves are almost overlapping. This implies that our proposed IKPCA gives a close approximation to batch-mode KPCA.

Besides the recognition accuracy, we have compared the Gram matrix and the PC vectors from IKPCA and batch-mode KPCA. The results are recorded in Tables II and III. In Table II, $K$ and $\hat K$ are the Gram matrix used in the batch method and the approximate Gram matrix used in the incremental method, respectively. $\|V - W\|_F$ is the Frobenius norm of the difference between the PCs of the batch method and those of the incremental method. Table III records the difference between the first PC of the batch method and that of the incremental method. Using Theorem 2, we can also obtain the error bound of the angle between the first PC of the batch method and that of the incremental method.

Finally, we make the training samples include both pose and illumination variations. We select two images from each illumination subset and each pose; that is, we select 720 images (10 persons × 9 poses × 4 subsets × 2 images) for


Fig. 9. Correctness, or the correlation, in the experiment using 50 PCs. (a) and (c) The correlation, represented by the products $V^T\bar V$, of the first 20 PCs. (b) and (d) The correlation of the last 20 PCs. (a) and (b) show the results for the SVDU-IPCA algorithm, while (c) and (d) show those for CCIPCA.

Fig. 10. Correctness, or the correlation, in the experiment using 100 PCs. (a) and (c) The correlation, represented by the products $V^T\bar V$, of the first 20 PCs. (b) and (d) The correlation of the last 20 PCs. (a) and (b) show the results for the SVDU-IPCA algorithm, while (c) and (d) show those for CCIPCA.

training. Then, further training images are added in increments of 360 images (10 persons × 9 poses × 4 subsets × 1 image) up to a total of 2160 images. The remaining images are used for recognition. In the experiment, 50 PCs are used in the eigendecomposition updating and testing. Experimental results, together with the computational time, are recorded in Tables IV–VI. Table VI indicates that KPCA with incremental learning is computationally efficient.


Fig. 11. Images of one person from the Yale B database.

Fig. 12. Recognition accuracy on the Yale B database.

TABLE II
COMPARISON OF THE GRAM MATRIX AND PCS ON THE YALE B DATA SET WITH FIXED POSE AND VARIANT ILLUMINATION

TABLE III
COMPARISON OF THE FIRST PC ON THE YALE B DATA SET WITH FIXED POSE AND VARIANT ILLUMINATION

TABLE IV
COMPARISON OF THE GRAM MATRIX AND PCS ON THE YALE B DATA SET WITH VARIATIONS ON POSE AND ILLUMINATION

The experiment on the large database, Yale Group B, shows that our new incremental algorithm is not only computationally efficient, but also has a small approximation error.

TABLE V
COMPARISON OF THE FIRST PC ON THE YALE B DATA SET WITH VARIATIONS ON POSE AND ILLUMINATION

TABLE VI
COMPARISON OF CPU TIME (s) ON THE YALE B DATA SET WITH VARIATIONS ON POSE AND ILLUMINATION (2.4-GHz PENTIUM-4 CPU WITH 512-MB RAM)

VII. CONCLUSION

A new IPCA algorithm, namely SVDU-IPCA, is derived and reported in this paper. The proposed SVDU-IPCA algorithm adopts the concept of an SVD updating algorithm, in which we do not need to recompute the eigendecomposition from scratch. The main contribution of this paper is that we perform an error analysis of the proposed IPCA algorithm. A complete mathematical derivation is presented, and we prove that the error introduced by our SVDU-IPCA algorithm is bounded. In other words, the proposed SVDU-IPCA algorithm guarantees a maximum error when approximating batch-mode PCA. Another characteristic of the proposed SVDU-IPCA algorithm is that it can be easily extended to kernel space. An IKPCA algorithm is also presented. To the best of our knowledge, this is the first IKPCA algorithm.

Extensive experiments using available public databases have been performed to evaluate the performance of our proposed SVDU-IPCA algorithm. It is found that the approximation error is quite small. The proposed algorithm is also applied to face recognition. Two well-known PCA-based face-recognition methods, namely eigenface and fisherface, are used for evaluation. The experimental results show that the proposed SVDU-IPCA algorithm can be used in both methods with a very small degradation in recognition accuracy. This implies that existing eigenface- and fisherface-based algorithms/systems can be scaled up easily by using the proposed SVDU-IPCA method, as we do not need the previous training data.

APPENDIX

Since
$$\Sigma = \begin{bmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix} = \begin{bmatrix}P^TP & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix}$$
and
$$\Sigma = \begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix}^T\begin{bmatrix}P & Q_1\\ Q_2 & Q_3\end{bmatrix} = \begin{bmatrix}(P^TP + Q_2^TQ_2)_{m\times m} & (P^TQ_1 + Q_2^TQ_3)_{m\times r}\\ (Q_1^TP + Q_3^TQ_2)_{r\times m} & (Q_1^TQ_1 + Q_3^TQ_3)_{r\times r}\end{bmatrix} \qquad (15)$$
we have
$$P^TP = P^TP + Q_2^TQ_2$$
that is, $Q_2^TQ_2 = 0$ and hence $Q_2 = 0$.

Considering $Q_2 = 0$, (15) simplifies to
$$\begin{bmatrix}(P^TP)_{m\times m} & (P^TQ_1)_{m\times r}\\ (Q_1^TP)_{r\times m} & (Q_1^TQ_1 + Q_3^TQ_3)_{r\times r}\end{bmatrix}.$$
Equating terms for $\Sigma_2$, we have
$$(P^TQ_1)_{m\times r} = \Sigma_2$$
that is, $[u_1, u_2, \ldots, u_l]\Lambda^{1/2}Q_1 = \Sigma_2$. Hence
$$[u_1, u_2, \ldots, u_l]^T[u_1, u_2, \ldots, u_l]\Lambda^{\frac12}Q_1 = [u_1, u_2, \ldots, u_l]^T\Sigma_2$$
and then
$$(Q_1)_{l\times r} = \Lambda^{-\frac12}U^T\Sigma_2.$$
Equating terms for $\Sigma_3$, we have
$$Q_3^TQ_3 = \Sigma_3 - Q_1^TQ_1.$$
Since $Q_1^TQ_1 = \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2$, then
$$Q_3^TQ_3 = \Sigma_3 - \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2.$$
Consider
$$R = \begin{bmatrix}UU^T & 0\\ 0 & I\end{bmatrix}, \qquad S = \begin{bmatrix}I & -U\Lambda^{-1}U^T\Sigma_2\\ 0 & I\end{bmatrix}$$
where $I$ is the identity matrix of the appropriate size. We have
$$S^TR^T\begin{bmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix}RS = \begin{bmatrix}I & 0\\ -\Sigma_2^TU\Lambda^{-1}U^T & I\end{bmatrix}\begin{bmatrix}UU^T & 0\\ 0 & I\end{bmatrix}\begin{bmatrix}U\Lambda U^T & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix}\begin{bmatrix}UU^T & 0\\ 0 & I\end{bmatrix}\begin{bmatrix}I & -U\Lambda^{-1}U^T\Sigma_2\\ 0 & I\end{bmatrix} = \begin{bmatrix}U\Lambda U^T & 0\\ 0 & \Sigma_3 - \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2\end{bmatrix}.$$
As
$$\Sigma = \begin{bmatrix}\Sigma_1 & \Sigma_2\\ \Sigma_2^T & \Sigma_3\end{bmatrix}$$
is positive semidefinite, by the definition of a positive semidefinite matrix, $\Sigma_3 - \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2$ is also positive semidefinite. Hence, $Q_3$ can be obtained as the square root of $\Sigma_3 - \Sigma_2^TU\Lambda^{-1}U^T\Sigma_2$.
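A quick numeric check of these identities (ours, with illustrative sizes): build a positive semidefinite Sigma, form Q1 and Q3 as above, and verify that the Schur-complement term is positive semidefinite and that the factorization with Q2 = 0 reproduces Sigma:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
h, m, r = 20, 12, 4
Y = rng.standard_normal((h, m + r))
Sigma = Y.T @ Y                                  # positive semidefinite by construction
S1, S2, S3 = Sigma[:m, :m], Sigma[:m, m:], Sigma[m:, m:]

lam, U = np.linalg.eigh(S1)
lam, U = lam[::-1], U[:, ::-1]                   # nonincreasing eigenvalues

Q1 = np.diag(lam ** -0.5) @ U.T @ S2             # eq. (3)
M = S3 - S2.T @ U @ np.diag(1 / lam) @ U.T @ S2  # eq. (5)
print(np.linalg.eigvalsh(M).min() >= -1e-8)      # True: M is PSD, so Q3 exists
Q3 = np.real(sqrtm(M))

P = np.diag(lam ** 0.5) @ U.T
F = np.vstack([np.hstack([P, Q1]),
               np.hstack([np.zeros((r, m)), Q3])])   # [[P, Q1], [0, Q3]]
print(np.allclose(F.T @ F, Sigma))               # True: Sigma = F^T F with Q2 = 0
```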

ACKNOWLEDGMENT

The authors would like to thank the research laboratories that contributed the FERET, AR, and Yale B databases for the experiments. They would also like to thank the five anonymous reviewers for their precious comments.

REFERENCES

[1] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cognit. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[2] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.
[3] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey," Proc. IEEE, vol. 83, no. 5, pp. 705–740, May 1995.
[4] J. Daugman, "Face and gesture recognition: Overview," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 675–676, Jul. 1997.
[5] R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 10, pp. 1042–1053, Oct. 1993.
[6] A. Samal and P. A. Iyengar, "Automatic recognition and analysis of human faces and facial expression: A survey," Pattern Recognit., vol. 25, no. 1, pp. 65–77, Jan. 1992.
[7] M. Kirby and L. Sirovich, "Application of the Karhunen–Loeve procedure for the characterization of human faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 1, pp. 103–108, Jan. 1990.
[8] H. Zha and H. D. Simon, "On updating problems in latent semantic indexing," SIAM J. Sci. Comput., vol. 21, no. 2, pp. 782–791, 1999.
[9] B. Schölkopf and A. Smola, Learning With Kernels: Support Vector Machines, Regularization, Optimization and Beyond. Cambridge, MA: MIT Press, 2002.
[10] B. Schölkopf, A. Smola, and K. Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput., vol. 10, no. 5, pp. 1299–1319, Jul. 1998.
[11] M. Yang, "Kernel eigenfaces versus kernel fisherfaces: Face recognition using kernel methods," in Proc. 5th IEEE Int. Conf. Automatic Face and Gesture Recognition (FGR), Washington, DC, 2002, pp. 215–220.
[12] P. M. Hall, A. D. Marshall, and R. R. Martin, "Merging and splitting eigenspace models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 9, pp. 1042–1049, Sep. 2000.
[13] J. Weng, Y. Zhang, and W. S. Hwang, "Candid covariance-free incremental principal component analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp. 1034–1040, Aug. 2003.
[14] D. Skocaj and A. Leonardis, "Weighted and robust incremental method for subspace learning," in Proc. 9th IEEE Int. Conf. Computer Vision, Nice, France, 2003, vol. 2, pp. 1494–1500.
[15] J. T. Kwok and H. Zhao, "Incremental eigendecomposition," in Proc. ICANN, Istanbul, Turkey, Jun. 2003, pp. 270–273.
[16] T. D. Sanger, "Optimal unsupervised learning in a single-layer linear feedforward neural network," Neural Netw., vol. 2, pp. 459–473, 1989.
[17] E. Oja and J. Karhunen, "On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix," J. Math. Anal. Appl., vol. 106, no. 1, pp. 69–84, 1985.
[18] Y. Li, "On incremental and robust subspace learning," Pattern Recognit., vol. 37, no. 7, pp. 1509–1518, Jul. 2004.
[19] M. Artac, M. Jogan, and A. Leonardis, "Incremental PCA for on-line visual learning and recognition," in Proc. Int. Conf. Pattern Recognition, Quebec City, QC, Canada, 2002, vol. 3, pp. 781–784.
[20] A. Levy and M. Lindenbaum, "Sequential Karhunen–Loeve basis extraction and its application to images," IEEE Trans. Image Process., vol. 9, no. 8, pp. 1371–1374, Oct. 2000.
[21] J. Bunch and C. Nielsen, "Updating the singular value decomposition," Numer. Math., vol. 32, no. 2, pp. 131–152, 1978.
[22] M. Brand, "Incremental singular value decomposition of uncertain data with missing values," in Proc. Eur. Conf. Computer Vision, Copenhagen, Denmark, May 2002, vol. 2350, pp. 707–720.
[23] B. N. Parlett, The Symmetric Eigenvalue Problem. Englewood Cliffs, NJ: Prentice-Hall, 1980.
[24] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay, "Clustering in large graphs and matrices," in Proc. ACM SODA Conf., Baltimore, MD, 1999, pp. 291–299.
[25] A. Frieze, R. Kannan, and S. Vempala, "Fast Monte-Carlo algorithms for finding low rank approximations," in Proc. ACM FOCS Conf., Palo Alto, CA, 1998, pp. 370–378.
[26] D. Ross, J. Lim, and M. H. Yang, "Adaptive probabilistic visual tracking with incremental subspace update," in Proc. 8th ECCV, Prague, Czech Republic, May 2004, pp. 470–482.
[27] D. Achlioptas, F. McSherry, and B. Schölkopf, "Sampling techniques for kernel methods," in Advances in Neural Information Processing Systems, vol. 14. Cambridge, MA: MIT Press, 2002.
[28] A. J. Smola and B. Schölkopf, "Sparse greedy matrix approximation for machine learning," in Proc. 17th Int. Conf. Machine Learning, Stanford, CA, 2000, pp. 911–918.
[29] G. H. Golub, Lectures on Matrix Computations 2004, 2004. [Online]. Available: http://www.mat.uniroma1.it/~bertaccini/seminars/
[30] I. T. Jolliffe, Principal Component Analysis, 2nd ed. New York: Springer-Verlag, 2002.
[31] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic, 1990.
[32] Z. Jin, J. Y. Yang, Z. S. Hu, and Z. Lou, "Face recognition based on the uncorrelated discriminant transform," Pattern Recognit., vol. 34, no. 7, pp. 1405–1416, 2001.
[33] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces versus Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[34] A. M. Martinez and R. Benavente, "The AR face database," Purdue Univ., West Lafayette, IN, CVC Tech. Rep. 24, Jun. 1998.
[35] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: Illumination cone models for face recognition under variable lighting and pose," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 643–660, Jun. 2001.

Haitao Zhao received the M.Sc. degree in applied mathematics in 2000 and the Ph.D. degree in pattern recognition and artificial intelligence in 2003, both from Nanjing University of Science and Technology, Nanjing, China.

Currently, he is an Assistant Professor in the Institute of Aerospace Science and Technology, Shanghai Jiaotong University, Shanghai, China. His major research interests include human face recognition, image processing, and data fusion.

Pong Chi Yuen (S'92–M'93) received the B.Sc. degree in electronic engineering with first class honors from City Polytechnic of Hong Kong, Hong Kong, SAR, in 1989, and the Ph.D. degree in electrical and electronic engineering from The University of Hong Kong, Hong Kong, SAR, in 1993.

Currently, he is a Professor in the Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR. He was a recipient of a university fellowship to visit the University of Sydney in 1996, where he was associated with the Department of Electrical Engineering, Laboratory of Imaging Science and Engineering. In 1998, he spent a six-month sabbatical leave at the University of Maryland Institute for Advanced Computer Studies, University of Maryland at College Park, where he was also associated with the Computer Vision Laboratory, Center for Automation Research. He was the director of the Croucher Advanced Study Institute on Biometric Authentication 2004 and is a track Co-Chair of the International Conference on Pattern Recognition 2006. From July 2005 to January 2006, he was a Visiting Professor at the Institut National de Recherche en Informatique et en Automatique (INRIA), Rhône-Alpes, France. His major research interests include human-face recognition, signature recognition, and medical image processing. He has published more than 80 scientific articles in these areas.

James T. Kwok (M'98) received the B.Sc. degree in electrical and electronic engineering from the University of Hong Kong, Hong Kong, SAR, and the Ph.D. degree in computer science from the Hong Kong University of Science and Technology, Hong Kong, SAR.

He was a Consultant at Lucent Technologies Bell Laboratories, Murray Hill, NJ. He then joined the Department of Computer Science, Hong Kong Baptist University, as an Assistant Professor. He returned to the Hong Kong University of Science and Technology in 2000 and is an Assistant Professor in the Department of Computer Science. He is currently on the editorial board of Neurocomputing. His major research interests are kernel methods, machine learning, pattern recognition, and artificial neural networks.