

316 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2012

Human Gait Recognition Using Patch Distribution Feature and Locality-Constrained Group Sparse Representation

Dong Xu, Yi Huang, Zinan Zeng, and Xinxing Xu

Abstract—In this paper, we propose a new patch distribution feature (PDF), referred to as Gabor-PDF, for human gait recognition. We represent each gait energy image (GEI) as a set of local augmented Gabor features, which concatenate the Gabor features extracted from different scales and different orientations together with the X–Y coordinates. We learn a global Gaussian mixture model (GMM), referred to as the universal background model, with the local augmented Gabor features from all the gallery GEIs; then, each gallery or probe GEI is further expressed as the normalized parameters of an image-specific GMM adapted from the global GMM. Observing that one video is naturally represented as a group of GEIs, we also propose a new classification method called locality-constrained group sparse representation (LGSR) to classify each probe video by minimizing the weighted $\ell_{1,2}$ mixed-norm-regularized reconstruction error with respect to the gallery videos. In contrast to the standard group sparse representation method, which is a special case of LGSR, both the group sparsity and local smooth sparsity constraints are enforced in LGSR. Our comprehensive experiments on the benchmark USF HumanID database demonstrate the effectiveness of the newly proposed feature Gabor-PDF and the new classification method LGSR for human gait recognition. Moreover, LGSR using the new feature Gabor-PDF achieves the best average Rank-1 and Rank-5 recognition rates on this database among all gait recognition algorithms proposed to date.

Index Terms—Human gait recognition, patch distribution feature (PDF), sparse representation (SR).

I. INTRODUCTION

THERE is an increasing research interest in human identification in controlled environments such as airports, banks, and car parks. The human gait is an important biometric feature for human identification in such video-surveillance-based applications because it can be perceived unobtrusively from a medium to a great distance. From the viewpoint of biomechanics, individuals have distinctive ways of walking. Results from the field of psychology have also demonstrated the ability of humans to: 1) distinguish human locomotion from other motion patterns; 2) recognize friends; and 3) recognize gender, the direction of motion, and carrying conditions [1]. A more recent psychological study has further demonstrated that humans can indeed recognize people by their gait [2].

Manuscript received August 18, 2010; revised March 17, 2011 and May 15, 2011; accepted May 25, 2011. Date of publication June 30, 2011; date of current version December 16, 2011. This work was supported in part by the Singapore National Research Foundation & Interactive Digital Media R&D Program Office, MDA under Research Grant NRF2008IDM-IDM004-018 and in part by the Singapore MOE AcRF Tier 1 under Research Grant RG63/07. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. N. V. Boulgouris.

The authors are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2011.2160956

The existing methods for human gait recognition can be divided roughly into two categories: model-based and appearance-based approaches. In model-based approaches [3], [4], the human body structure is characterized using the model parameters fitted based on the extracted features. The parameters can be dynamic parameters (e.g., the stride length and speed) or static body parameters (e.g., the size ratios of various body parts). Compared with the model-based methods, the appearance-based approaches [1], [5]–[8], which employ a compact representation to characterize the motion patterns of the human body, have demonstrated better performance on the common databases [9]. In the appearance-based approaches, different types of features (e.g., the whole silhouettes [6], [9], [10], silhouette width vectors [11], or Fourier descriptors [12]) are first extracted. In the subsequent pattern-matching stage, some approaches exploit both the silhouette shape and dynamics information, whereas other approaches only employ the silhouette shape similarity. Several temporal alignment techniques (e.g., the simple temporal correlation [1], [8], Fourier analysis [12], [13], dynamic time warping [14], and Hidden Markov Models (HMMs) [11], [14], [15]) have been proposed to exploit the dynamic information. In the common approaches [6], [10] that only employ the silhouette shape similarity, the binary silhouettes over one gait cycle are averaged such that each gait video containing a number of gait cycles is represented by a set of gray-level average silhouette images [i.e., gait energy images (GEIs)]. Recently, Liu and Sarkar [9] have employed a generic walking model, which is referred to as population HMM (pHMM), to acquire the dynamics-normalized gait cycles for human gait recognition.

In the appearance-based approaches, the classical dimension-reduction techniques Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) [6], [9], [16] have also been applied to acquire an efficient and discriminant representation before formally conducting classification, in which each GEI is represented as a lengthy vector in the high-dimensional space. Bilinear and tensor subspace learning methods [7], [17]–[19] were also proposed for human gait recognition, in which one gray-level GEI is represented as a second-order tensor (i.e., a matrix) and one set of Gabor-filtered images can be characterized as a higher order tensor. These bilinear and tensor subspace learning methods have demonstrated promising results for human gait recognition. By representing each GEI as a set of local Gabor features, Huang et al. [20] have recently proposed an image-to-class distance for human gait recognition by directly calculating the distance from one probe GEI to all the gallery images belonging to a certain class.

1057-7149/$26.00 © 2011 IEEE

Fig. 1. Comparison of SR and LGSR for human gait recognition using GEIs. The gallery GEIs (resp. gallery videos) associated with nonzero reconstruction coefficients are connected with the corresponding probe GEIs (resp. probe video) in the SR (resp. LGSR).

In this paper, we propose a new patch distribution feature (PDF) for human gait recognition. We represent each GEI as a set of local augmented Gabor features, which concatenate the 40-D Gabor features extracted from five different scales and eight different orientations together with the 2-D X–Y coordinates. Following [21] and [22], we first learn a global Gaussian mixture model (GMM) [i.e., the universal background model (UBM)] by using the local augmented Gabor features from all the gallery GEIs. Then, each gallery or probe GEI is further expressed as an image-specific GMM adapted from the global GMM via the Maximum a Posteriori (MAP) adaptation process. Finally, the parameters of each image-specific GMM are normalized to have zero mean and a unit norm. We refer to our new PDF (i.e., the normalized parameters) as the Gabor-PDF. In contrast to the existing PDF [i.e., the Discrete Cosine Transform PDF (DCT-PDF)] [21], [22] for face recognition, in which each face image is represented as a set of DCT features extracted from the local patches, our newly proposed Gabor-PDF can achieve much better performance for human gait recognition by characterizing the distributions of a set of more discriminant Gabor features [23].

The sparse-representation (SR)-based method [24], which aims to recover the sparse linear representation of any query sample with respect to a set of reference samples, has been employed successfully for various vision applications such as face recognition [25], [26], general image classification [26]–[28], and image annotation [29]. Wright et al. [25] proposed to minimize the $\ell_1$ norm-regularized reconstruction error for robust face recognition. The recent work in locality-constrained linear coding (LLC) [27] has shown that the image classification performance using the classical bag-of-words model can be improved by enforcing the local smooth sparsity constraint in the vector quantization process, in which the SIFT features extracted from similar patches are enforced to be quantized into similar visual words. However, the reference samples are treated as independent data points in the above SR-based methods. When the reference samples and the query samples are organized as groups, group sparse representation (GSR)-based methods [26], [29]–[31] impose the $\ell_{1,2}$ mixed-norm penalty on the reconstruction coefficients such that only a limited number of groups in the training set would be chosen. Recently, the GSR-based method [30] has been employed to effectively fuse different types of features (e.g., HSV and HoG) for image-based visual classification [26] and image annotation [29].

In the context of human gait recognition, we need to classify each probe video with multiple GEIs. To effectively utilize the intrinsic group information from multiple GEIs within each video, we treat each probe/gallery video as a group of GEIs and propose a new classification method referred to as locality-constrained GSR (LGSR) for human gait recognition. Specifically, we impose the weighted $\ell_{1,2}$ mixed-norm penalty on the reconstruction coefficients in order to enforce both group sparsity and local smooth sparsity constraints. We also show that the standard GSR method is a special case of LGSR. Fig. 1 illustrates the motivation to use LGSR instead of SR for human gait recognition. As shown in Fig. 1, SR only enforces a sparse set of gallery GEIs to reconstruct each probe GEI independently, without utilizing the group information among the GEIs in the gallery videos and the probe video. Hence, the overall representation for the probe video may not be sparse in terms of the number of selected gallery videos. By incorporating the group information both in the gallery videos and the probe video in LGSR, we can obtain more robust and discriminant nonzero reconstruction coefficients that are only from a sparse set of gallery videos.

We conduct comprehensive experiments on the benchmark USF HumanID database [1]. Using the simplistic nearest neighbor classifier, as suggested in [10], we first demonstrate that our newly proposed feature Gabor-PDF outperforms other related features for human gait recognition. We also study the performance variation of Gabor-PDF with respect to different parameters including the number of scales and orientations of the Gabor kernel function, as well as the number of Gaussian components for Gabor-PDF. Using Gabor-PDF as the input feature, we further compare different SR methods, including SR [25], LLC [27], GSR [30], and the newly proposed LGSR. According to the experimental results, LGSR using the new feature Gabor-PDF achieves the best average Rank-1 and Rank-5 recognition rates on this database among all gait recognition algorithms proposed to date.


II. GABOR-PDF

Recently, the Gabor wavelet feature has been demonstrated to be an effective feature for human gait recognition [7], [20]. Tao et al. [7] proposed a tensor subspace learning method by representing the Gabor-filtered GEIs as higher order tensors, and Huang et al. [20] directly calculated the image-to-class distance for human gait recognition by representing each GEI as a set of Gabor features. In this paper, we propose a new PDF (i.e., Gabor-PDF) to characterize the distribution of Gabor features extracted from each GEI.

A. Augmented Gabor Feature Extraction

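Let us define a family of Gabor kernel functions at one given pixel $\mathbf{p} = (x, y)^\top$ as

$$\psi_{s,d}(\mathbf{p}) = \frac{\|\mathbf{k}_{s,d}\|^2}{\delta^2} \exp\left(-\frac{\|\mathbf{k}_{s,d}\|^2 \|\mathbf{p}\|^2}{2\delta^2}\right) \left[\exp\left(i\,\mathbf{k}_{s,d}^\top \mathbf{p}\right) - \exp\left(-\frac{\delta^2}{2}\right)\right] \qquad (1)$$

where $\mathbf{k}_{s,d} = k_s (\cos\phi_d, \sin\phi_d)^\top$ determines the scale and the orientation of the Gabor kernel function. Specifically, $k_s = k_{\max}/f^s$ determines the scale (with $k_{\max}$ the maximum frequency and $f$ the spacing factor between scales), and $\phi_d = \pi d/8$ determines the direction. Following [7], [23], and [32], we set parameter $\delta$ as $2\pi$, and in the default setting, we also set $s \in \{0, 1, 2, 3, 4\}$ and $d \in \{0, 1, \ldots, 7\}$. Then, we have 40 Gabor kernel functions from five scales and eight orientations. In Section IV-A, we also evaluate Gabor-PDF using other parameter settings such as three scales and eight orientations, five scales and four orientations, and three scales and four orientations. The results show that the default parameter setting (i.e., five scales and eight orientations) generally achieves better performance. The term $\exp(-\delta^2/2)$ is subtracted in (1) to make the kernel function DC-free; hence, the extracted Gabor feature is robust to illumination.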

For each GEI with the size of $h \times w$, we acquire 40 Gabor-filtered images after convolving the GEI with the Gabor kernel functions. For computational efficiency, we further downsample each Gabor-filtered image to $\hat{h} \times \hat{w}$, where $\hat{h}$ and $\hat{w}$ are the largest integers less than or equal to fixed fractions of $h$ and $w$, respectively. Let us denote $\mathbf{g}_i \in \mathbb{R}^{40}$ as the 40-D Gabor feature extracted from 40 Gabor-filtered images at the $i$th pixel. Without this downsampling process, it is much more time consuming to extract Gabor-PDF using the approach discussed in Section II-B. Moreover, such a downsampling process will not significantly degrade the recognition rate because the Gabor features extracted from the neighboring pixels are generally redundant to some extent.

Considering that each GEI is roughly aligned in the preprocessing stage, we also put the coordinate information into the local feature representation, as suggested in [22]. Specifically, we use the augmented Gabor feature $\mathbf{f}_i = [\mathbf{g}_i^\top, x_i, y_i]^\top \in \mathbb{R}^d$ ($d = 42$ in this paper), which concatenates $\mathbf{g}_i$ with the corresponding 2-D coordinates $(x_i, y_i)$. Finally, we represent the $n$th GEI after Gabor feature extraction as a set of augmented Gabor features $\{\mathbf{f}_i\}_{i=1}^{M}$, where $M$ is the total number of pixels.
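The extraction steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names, the kernel window size of 15, the values `k_max = pi/2` and `f = sqrt(2)` (a common choice in the Gabor literature), the use of the response magnitude, the FFT-based circular convolution, and the downsampling step of 4 are all assumptions made for the example.

```python
import numpy as np

def gabor_kernel(s, d, size=15, delta=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel for scale index s and orientation index d."""
    k_s, phi_d = k_max / f ** s, np.pi * d / 8
    kx, ky = k_s * np.cos(phi_d), k_s * np.sin(phi_d)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = (k_s ** 2 / delta ** 2) * np.exp(-k_s ** 2 * (x ** 2 + y ** 2) / (2 * delta ** 2))
    # Subtracting exp(-delta^2 / 2) removes the DC response of the carrier.
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-delta ** 2 / 2)
    return envelope * carrier

def augmented_gabor_features(gei, scales=5, orientations=8, step=4):
    """Return an (M, scales * orientations + 2) array of augmented features."""
    h, w = gei.shape
    rows, cols = (h // step) * step, (w // step) * step
    gei_fft = np.fft.fft2(gei)
    responses = []
    for s in range(scales):
        for d in range(orientations):
            kernel = gabor_kernel(s, d)
            kh, kw = kernel.shape
            padded = np.zeros((h, w), dtype=complex)
            padded[:kh, :kw] = kernel
            # Centre the kernel at the origin, then convolve (circularly) via the FFT.
            padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
            magnitude = np.abs(np.fft.ifft2(gei_fft * np.fft.fft2(padded)))
            # Keep floor(h / step) x floor(w / step) samples (assumed downsampling).
            responses.append(magnitude[:rows:step, :cols:step])
    gabor = np.stack(responses, axis=-1).reshape(-1, scales * orientations)
    ys, xs = np.mgrid[0:rows:step, 0:cols:step]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
    # Each row is [40-D Gabor feature, x, y].
    return np.hstack([gabor, coords])
```

With the default five scales and eight orientations, each row has 42 entries, matching the 40-D Gabor feature plus the 2-D coordinates described in the text.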

B. Gabor-PDF Extraction

We use the well-known GMM, which is also referred to as the image-specific GMM in this paper, to characterize the distribution of a set of augmented Gabor features extracted from each GEI. The augmented Gabor features extracted from one GEI may not contain sufficient information to robustly estimate the parameters of the image-specific GMM. Following [21] and [22], we propose to use a two-stage approach for parameter estimation.

1) Universal Background Model (UBM) Learning: In the first stage, we learn a global GMM (i.e., the UBM) using all the augmented Gabor features from all the gallery images. The global GMM can be represented as the parameter set $\Theta = \{w_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$, where $K$ is the total number of components in the GMM and $w_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ are the weight, the mean vector, and the covariance matrix of the $k$th Gaussian component, respectively. Note that we have constraint $\sum_{k=1}^{K} w_k = 1$.

There is no closed-form solution to estimate the parameters. We adopt the well-known Expectation–Maximization (EM) algorithm to iteratively update the weight, the mean vector, and the covariance matrix. We initialize $w_k$ as the uniform weight. We conduct $k$-means clustering to partition the augmented Gabor features from all the gallery GEIs into $K$ clusters; then, we exploit the samples in each cluster to initialize $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$. As suggested in [21] and [22], we also require the covariance matrix to be diagonal in order to improve the computational efficiency.

2) MAP Adaptation: In the second stage, an image-specific GMM can be adapted from the global GMM for each GEI via the MAP adaptation process. To indicate the membership probability of the augmented Gabor feature $\mathbf{f}_i$ belonging to the $k$th Gaussian component, we define the intermediate variable $\Pr(k \mid \mathbf{f}_i)$ as follows:

$$\Pr(k \mid \mathbf{f}_i) = \frac{w_k\, \mathcal{N}(\mathbf{f}_i; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} w_j\, \mathcal{N}(\mathbf{f}_i; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} \qquad (2)$$

where $\mathcal{N}(\cdot)$ represents the Gaussian probability density function. It is easy to verify that $\sum_{k=1}^{K} \Pr(k \mid \mathbf{f}_i) = 1$. We also define $n_k = \sum_{i=1}^{M} \Pr(k \mid \mathbf{f}_i)$. Given the set of local augmented Gabor features $\{\mathbf{f}_i\}_{i=1}^{M}$ extracted from one gallery or probe GEI and the prelearned global GMM parameters $\Theta = \{w_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$, the $k$th component of the image-specific GMM of this GEI can be adapted as follows:

$$\hat{w}_k = \frac{n_k}{M}, \qquad E_k = \frac{1}{n_k} \sum_{i=1}^{M} \Pr(k \mid \mathbf{f}_i)\, \mathbf{f}_i, \qquad \hat{\boldsymbol{\mu}}_k = \alpha_k E_k + (1 - \alpha_k) \boldsymbol{\mu}_k$$

where $\alpha_k = n_k/(n_k + r)$, with $r$ being a predefined parameter. To better cope with the instability problem of the GMM parameter estimation and to improve the computational efficiency of the adaptation process, only the means and the weights are adapted, as suggested in [21] and [22]. Note that $E_k$ is the expected mean of the $k$th Gaussian component based on the observation data $\{\mathbf{f}_i\}_{i=1}^{M}$. However, it is possible that $E_k$ may not be well estimated when only a limited number of augmented Gabor features is given. Hence, the linear combination of $E_k$ and $\boldsymbol{\mu}_k$ with the weighting coefficient $\alpha_k$ is then used to estimate the adapted mean $\hat{\boldsymbol{\mu}}_k$. As shown in [22], when $n_k$ is large (i.e., the $k$th Gaussian component has a high probabilistic count), $\alpha_k$ approaches 1, and the adapted parameters are determined mainly by the new sufficient statistics. Otherwise, the adapted parameters are determined mainly by the global model.

Finally, each GEI is represented by the image-specific GMM parameter set $\{\hat{w}_k, \hat{\boldsymbol{\mu}}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$, where $\boldsymbol{\Sigma}_k$ is the covariance matrix of the $k$th component from the global GMM.
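The adaptation stage can be sketched as below for a GMM with diagonal covariances. This is only an illustration under stated assumptions: the function name `map_adapt`, the relevance value `r = 16.0`, and the specific update forms (adapted weight taken as the normalized soft count, adapted mean taken as a convex combination of the data mean and the global mean with coefficient `n_k / (n_k + r)`) follow the standard MAP adaptation recipe and are not confirmed details of the paper.

```python
import numpy as np

def map_adapt(features, weights, means, variances, r=16.0):
    """Adapt the weights and means of a diagonal-covariance GMM to one image.

    features: (M, d) local features; weights: (K,); means, variances: (K, d).
    The covariances are kept from the global model and are not adapted.
    """
    diff = features[:, None, :] - means[None, :, :]
    # log N(f_i; mu_k, Sigma_k) for every feature i and component k.
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # Pr(k | f_i), rows sum to 1
    n_k = post.sum(axis=0)                             # soft count per component
    e_k = (post.T @ features) / np.maximum(n_k, 1e-12)[:, None]
    alpha = n_k / (n_k + r)                            # approaches 1 for large counts
    new_means = alpha[:, None] * e_k + (1 - alpha)[:, None] * means
    new_weights = n_k / features.shape[0]
    return new_weights, new_means, post
```

A component that receives almost no features keeps (essentially) its global mean, which is the behaviour described in the text for small probabilistic counts.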

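3) Normalization: For the $n$th GEI, we denote the adapted mean and weight of the $k$th Gaussian component as $\hat{\boldsymbol{\mu}}_{n,k}$ and $\hat{w}_{n,k}$, respectively. Similar to [22], we represent the PDF of the $n$th GEI as

$$\mathbf{m}_{n,k} = \sqrt{\hat{w}_{n,k}}\; \boldsymbol{\Sigma}_k^{-1/2} \left(\hat{\boldsymbol{\mu}}_{n,k} - \boldsymbol{\mu}_k\right) \qquad (3)$$

$$\mathbf{x}_n = \left[\mathbf{m}_{n,1}^\top, \mathbf{m}_{n,2}^\top, \ldots, \mathbf{m}_{n,K}^\top\right]^\top \in \mathbb{R}^{D} \qquad (4)$$

where $\mathbf{m}_{n,k}$ is the weighted mean vector from the $k$th component of the image-specific GMM for the $n$th GEI and $D = Kd$ is the feature dimension of $\mathbf{x}_n$. In contrast to [22], we subtract $\boldsymbol{\mu}_k$ in (3) because it can achieve better performance.

Finally, $\mathbf{x}_n$ is normalized to have zero mean and a unit norm. We refer to our new PDF (i.e., the normalized parameters) as Gabor-PDF.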

C. Discussion

It is worth mentioning the difference between Gabor-PDF and the existing PDF (i.e., DCT-PDF) [22] proposed for face recognition. To extract DCT-PDF, each face image is represented as a set of orderless patches, and the 2-D DCT is used to extract the features from each patch. While Gabor-PDF and DCT-PDF are both PDFs, Gabor-PDF characterizes the distribution of a set of Gabor features, and DCT-PDF represents the distribution of a set of DCT features. Gabor wavelets have a shape similar to the receptive field profiles of mammalian cortical simple cells. Compared with the DCT features, Gabor features are more discriminant due to their desirable characteristics in terms of the frequency and orientation selectivity as well as the spatial localization [7], [32], making Gabor-PDF a more effective feature for human gait recognition.

III. LGSR

Most existing human gait recognition methods [6], [7], [9], [18] employ the simplistic nearest neighbor classifier for classification. Inspired by the recent success of SR-based methods for various computer-vision applications [25], [26], [28], we propose a new classification method called LGSR for human gait recognition. Moreover, we show that the standard GSR method [30] is a special case of LGSR.

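Let us define $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] \in \mathbb{R}^{D \times N}$, where $\mathbf{x}_i$ is the $i$th GEI in the training set; $N$ and $C$ are the total numbers of training (i.e., gallery) GEIs and videos, respectively; $D$ is the feature dimension; and $\mathbf{X}^c = [\mathbf{x}^c_1, \ldots, \mathbf{x}^c_{N_c}] \in \mathbb{R}^{D \times N_c}$ represents the $c$th gallery video, with $N_c$ being the total number of GEIs in this video. Hence, we have $\mathbf{X} = [\mathbf{X}^1, \ldots, \mathbf{X}^C]$ and $N = \sum_{c=1}^{C} N_c$. We also define the test (i.e., probe) video $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_P] \in \mathbb{R}^{D \times P}$, where $\mathbf{y}_j$ is the $j$th GEI in the probe video and $P$ is the total number of GEIs in the probe video. Let us represent the reconstruction coefficient as $\mathbf{A} = [(\mathbf{A}^1)^\top, \ldots, (\mathbf{A}^C)^\top]^\top \in \mathbb{R}^{N \times P}$, where $\mathbf{A}^c \in \mathbb{R}^{N_c \times P}$ is the reconstruction coefficient for the probe video with respect to the $c$th gallery video. Moreover, the Frobenius norm of matrix $\mathbf{A}$ is denoted as $\|\mathbf{A}\|_F$, and the smallest element of $\mathbf{A}$ is denoted as $\min(\mathbf{A})$. The $\ell_1$ and $\ell_2$ norms of vector $\mathbf{a}$ are defined as $\|\mathbf{a}\|_1$ and $\|\mathbf{a}\|_2$, respectively. We also define $\odot$ as the Hadamard product (also known as entrywise product). In addition, we define $\mathbf{1}$ as a vector with all ones, and $\mathbf{0}$ as a matrix with all zeros.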

A. LGSR

The pioneering work in [25] proposed to classify face images by minimizing the $\ell_1$ norm-regularized reconstruction error, in which it seeks an SR for only a single test image. In the context of human gait recognition, we need to classify each probe video with multiple GEIs. To effectively utilize the intrinsic group information from multiple GEIs within each video, we treat each probe/gallery video as one group of GEIs and propose a new classification method called LGSR. In contrast to the standard GSR method, which is a special case of LGSR, we enforce both the group sparsity and local smooth sparsity constraints in LGSR by minimizing the weighted $\ell_{1,2}$ mixed-norm-regularized reconstruction error as follows:

$$\min_{\mathbf{A}} \; \frac{1}{2} \|\mathbf{Y} - \mathbf{X}\mathbf{A}\|_F^2 + \lambda \sum_{c=1}^{C} \|\mathbf{D}^c \odot \mathbf{A}^c\|_F \qquad (5)$$

where $\|\mathbf{Y} - \mathbf{X}\mathbf{A}\|_F^2$ represents the reconstruction error of the probe video $\mathbf{Y}$ with respect to all the gallery videos $\mathbf{X}$. The second term is the weighted $\ell_{1,2}$ mixed-norm-based regularizer on the reconstruction coefficient $\mathbf{A}$, and $\lambda$ is the regularization parameter to balance these two terms. $\mathbf{D}^c \in \mathbb{R}^{N_c \times P}$ is the distance matrix between the GEIs of the $c$th gallery video and the GEIs in the probe video. To calculate $\mathbf{D}^c$, we first use the single-level Earth mover's distance-based temporal matching method in [33] to calculate the distance $d_c$ between the $c$th gallery video and the probe video. Let us denote $d_{\min}$ as the minimum distance of $\{d_c\}_{c=1}^{C}$. For the $i$th GEI from the $c$th gallery video, we define the corresponding entries of $\mathbf{D}^c$ from $d_c$, $d_{\min}$, and the Euclidean distance between $\mathbf{x}^c_i$ and $\mathbf{y}_j$, with $\sigma$ being the bandwidth parameter.
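To make the structure of the objective concrete, the sketch below evaluates a weighted group-sparse objective of the form 0.5 * ||Y - X A||_F^2 + lam * sum_c ||D_c ⊙ A_c||_F. The factor 0.5, the function name `lgsr_objective`, and the list-of-groups layout are assumptions made for illustration; the locality weights `D_groups` are taken as given inputs rather than computed from the video distances.

```python
import numpy as np

def lgsr_objective(Y, X_groups, A_groups, D_groups, lam):
    """Weighted mixed-norm-regularized reconstruction error.

    Y: (D, P) probe video; X_groups[c]: (D, N_c) gallery video c;
    A_groups[c], D_groups[c]: (N_c, P) coefficients and locality weights.
    """
    X = np.hstack(X_groups)          # all gallery GEIs side by side
    A = np.vstack(A_groups)          # coefficients stacked group by group
    residual = Y - X @ A
    # One Frobenius norm per gallery video: sparsity across groups,
    # locality through the entrywise weights.
    penalty = sum(np.linalg.norm(D_c * A_c) for D_c, A_c in zip(D_groups, A_groups))
    return 0.5 * np.linalg.norm(residual) ** 2 + lam * penalty
```

Passing all-ones weight matrices reduces the penalty to the plain sum of group Frobenius norms, which is the standard group-sparse special case mentioned in the text.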

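Following [34], we employ the active set-based subgradient-descent algorithm to solve (5). Since the $\mathbf{A}^c$ values are independent of each other, we separately update each $\mathbf{A}^c$. Then, $\mathbf{A}^c$ is updated at iteration $t$ by

$$\mathbf{A}^c_{t+1} = \mathbf{A}^c_t + \eta_t \mathbf{V}^c_t \qquad (6)$$

where $\mathbf{V}^c_t$ is the updating direction and $\eta_t$ is the step size determined by a standard line search method. By taking the subgradient of the objective function $J$ in (5) with respect to $\mathbf{A}^c$, we can rewrite the updating direction as

$$\mathbf{V}^c = -\left(\mathbf{G}^c + \lambda \mathbf{Z}^c\right) \qquad (7)$$

where

$$\mathbf{G}^c = -(\mathbf{X}^c)^\top (\mathbf{Y} - \mathbf{X}\mathbf{A}) \qquad (8)$$

$$\mathbf{Z}^c = \begin{cases} \dfrac{\mathbf{D}^c \odot \mathbf{D}^c \odot \mathbf{A}^c}{\|\mathbf{D}^c \odot \mathbf{A}^c\|_F}, & \text{if } \mathbf{A}^c \neq \mathbf{0} \\ \mathbf{D}^c \odot \mathbf{W}, & \text{otherwise} \end{cases} \qquad (9)$$

and $\mathbf{W}$ is any arbitrary matrix that satisfies condition $\|\mathbf{W}\|_F \leq 1$. Similar to [35], we also have the following necessary optimal conditions for the objective function in (5):

$$(\mathbf{X}^c)^\top (\mathbf{Y} - \mathbf{X}\mathbf{A}) = \begin{cases} \lambda \dfrac{\mathbf{D}^c \odot \mathbf{D}^c \odot \mathbf{A}^c}{\|\mathbf{D}^c \odot \mathbf{A}^c\|_F}, & \text{if } \mathbf{A}^c \neq \mathbf{0} \\ \lambda\, \mathbf{D}^c \odot \mathbf{W} \text{ for some } \|\mathbf{W}\|_F \leq 1, & \text{otherwise.} \end{cases} \qquad (10)$$

The detailed optimization algorithm is summarized in Algorithm 1. We initialize $\mathbf{A}$ as a matrix with all its elements as zero such that any gallery video can be added into the active set and have its reconstruction coefficients updated. At each iteration of our algorithm, we insert at most one gallery video that violates the optimal condition (i.e., the corresponding reconstruction coefficient $\mathbf{A}^c = \mathbf{0}$ but the second condition in (10) does not hold) into the active set. We also remove the gallery videos from the active set if the corresponding reconstruction coefficient $\mathbf{A}^c = \mathbf{0}$. If the optimal conditions for these removed videos are satisfied, we do not need to update the corresponding reconstruction coefficient; otherwise, it may be selected into the active set in the subsequent iterations again. In our implementation, we also set $\mathbf{A}^c = \mathbf{0}$ if $\|\mathbf{A}^c\|_F$ falls below a small threshold. The convergence of our algorithm will be illustrated experimentally in Section IV-B.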

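Algorithm 1 The optimization algorithm for LGSR

Input: $\mathbf{Y}$: probe video, $\mathbf{X}$: gallery videos.

Initialize $\mathbf{A} = \mathbf{0}$, the active set $\mathcal{S} = \emptyset$, $t = 0$.

Compute $\mathbf{D}^c$ between the $c$th gallery video and the probe video, $c = 1, \ldots, C$.

WHILE not converged

Compute the subgradient terms in (8) for the gallery videos outside $\mathcal{S}$.

Find the gallery video $c^*$ outside $\mathcal{S}$ that most violates the optimal conditions in (10). If $c^*$ violates (10), then $\mathcal{S} = \mathcal{S} \cup \{c^*\}$.

For each $c$ in $\mathcal{S}$ do

Update $\mathbf{A}^c$ by using (6) with line search.

If $\mathbf{A}^c = \mathbf{0}$, then remove $c$ from $\mathcal{S}$.

End For

If no gallery video violates the optimal conditions in (10), then exit WHILE.

$t = t + 1$.

End While

Output $\mathbf{A}$;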
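For readers who want to experiment, the sketch below minimizes an objective of the form 0.5 * ||Y - X A||_F^2 + lam * sum_c ||D_c ⊙ A_c||_F with a plain subgradient descent and a backtracking line search. It deliberately omits the active set of Algorithm 1, so it is a simplified stand-in and not the procedure used in the paper; the function name `solve_lgsr`, the iteration budget, the choice of the zero matrix as the subgradient element for all-zero groups, and the stopping rule are assumptions.

```python
import numpy as np

def solve_lgsr(Y, X_groups, D_groups, lam, iters=200):
    """Subgradient descent with backtracking for the weighted group-sparse objective."""
    splits = np.cumsum([X_c.shape[1] for X_c in X_groups])[:-1]
    X = np.hstack(X_groups)
    A = np.zeros((X.shape[1], Y.shape[1]))     # start from all-zero coefficients

    def objective(A):
        penalty = sum(np.linalg.norm(D_c * A_c)
                      for D_c, A_c in zip(D_groups, np.split(A, splits)))
        return 0.5 * np.linalg.norm(Y - X @ A) ** 2 + lam * penalty

    current = objective(A)
    for _ in range(iters):
        grad = X.T @ (X @ A - Y)               # gradient of the smooth term
        sub = []
        for D_c, A_c in zip(D_groups, np.split(A, splits)):
            norm = np.linalg.norm(D_c * A_c)
            # For a zero group, the zero matrix is a valid subgradient element.
            sub.append(D_c * D_c * A_c / norm if norm > 0 else np.zeros_like(A_c))
        direction = -(grad + lam * np.vstack(sub))
        eta = 1.0
        # Halve the step until the objective strictly decreases.
        while eta > 1e-12 and objective(A + eta * direction) >= current:
            eta *= 0.5
        if eta <= 1e-12:                       # no decreasing step found: stop
            break
        A = A + eta * direction
        current = objective(A)
    return A
```

Because every accepted step strictly decreases the objective, the returned coefficients are never worse than the all-zero starting point. With a very large `lam`, no decreasing step exists from zero, and the all-zero solution is returned.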

B. Classification Method

Once we obtain the optimal reconstruction coefficient $\mathbf{A}$, we can use two classification methods based on different criteria to classify the probe video.

1) Minimum Reconstruction Error (minRE) Criterion: We compute the reconstruction error for each class as follows:

$$r_c = \|\mathbf{Y} - \mathbf{X}^c \mathbf{A}^c\|_F \qquad (11)$$

where the reconstruction coefficient $\mathbf{A}^c$ is the part of $\mathbf{A}$ that corresponds to the $c$th gallery video. Then, we classify the probe video to $c^* = \arg\min_c r_c$, as in [25].

2) Maximum Weighted Inverse Reconstruction Error (maxWIRE) Criterion: In the above criterion, the reconstruction coefficient is not used directly for classification. Intuitively, if the reconstruction errors of the probe video with respect to two gallery videos are the same, we should choose the class label of the gallery video that is associated with the larger Frobenius norm of the reconstruction coefficient. Specifically, we define the weighted inverse reconstruction error as follows:

$$s_c = \frac{\|\mathbf{A}^c\|_F}{\|\mathbf{Y} - \mathbf{X}^c \mathbf{A}^c\|_F} \qquad (12)$$

Then, we classify the probe video to $c^* = \arg\max_c s_c$.
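The two decision rules can be sketched as follows. The minRE rule picks the gallery video with the smallest reconstruction error; for maxWIRE, the score used here (Frobenius norm of the group's coefficients divided by that group's reconstruction error) is an assumed form that matches the verbal description, and the function name `classify`, the `labels` argument, and the small constant guarding against division by zero are illustrative choices.

```python
import numpy as np

def classify(Y, X_groups, A_groups, labels, criterion="minRE"):
    """Assign the probe video Y to the label of one gallery video.

    X_groups[c]: (D, N_c) gallery video c; A_groups[c]: (N_c, P) its coefficients.
    """
    errors = np.array([np.linalg.norm(Y - X_c @ A_c)
                       for X_c, A_c in zip(X_groups, A_groups)])
    if criterion == "minRE":
        return labels[int(np.argmin(errors))]
    # maxWIRE (assumed form): favour groups with large coefficients and small error.
    norms = np.array([np.linalg.norm(A_c) for A_c in A_groups])
    scores = norms / np.maximum(errors, 1e-12)
    return labels[int(np.argmax(scores))]
```

When two gallery videos give the same reconstruction error, the second rule prefers the one whose coefficients carry more energy, which is the tie-breaking intuition stated in the text.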

C. Discussions

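The proposed method LGSR is intrinsically different from the SR [25] and LLC [27]. For the $j$th GEI $\mathbf{y}_j$ of the probe video, the reconstruction coefficient $\mathbf{a}_j \in \mathbb{R}^N$ for SR and LLC can be obtained by minimizing the following objective functions:

$$\text{SR:} \quad \min_{\mathbf{a}_j} \; \frac{1}{2} \|\mathbf{y}_j - \mathbf{X}\mathbf{a}_j\|_2^2 + \lambda \|\mathbf{a}_j\|_1 \qquad (13)$$

$$\text{LLC:} \quad \min_{\mathbf{a}_j} \; \frac{1}{2} \|\mathbf{y}_j - \mathbf{X}\mathbf{a}_j\|_2^2 + \lambda \|\mathbf{d}_j \odot \mathbf{a}_j\|_2^2 \quad \text{s.t. } \mathbf{1}^\top \mathbf{a}_j = 1 \qquad (14)$$

where $\mathbf{d}_j$ is used to enforce the local smooth sparsity constraint in LLC and constraint $\mathbf{1}^\top \mathbf{a}_j = 1$ follows the shift-invariant requirements of the LLC code [27]. As suggested in [27], $\mathbf{d}_j = [d_{1j}, \ldots, d_{Nj}]^\top$ with $d_{ij} = \exp(e_{ij}/\sigma)$, where $\sigma$ is the bandwidth parameter and $e_{ij}$ is the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{y}_j$, and $\mathbf{d}_j$ is subsequently normalized to be between (0, 1].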

In (13) and (14), it is obvious that SR and LLC aim to choose a sparse set of gallery GEIs to reconstruct each probe GEI independently, without utilizing any group information among the GEIs in the gallery videos and probe video. For a video-based classification task such as human gait recognition, the reconstruction coefficient may spread over many groups, as shown in Fig. 1. In contrast, the goal of LGSR is to choose the more robust and discriminant nonzero reconstruction coefficients that are only from a sparse set of gallery videos by enforcing both group sparsity and local smooth sparsity constraints.


TABLE I
TWELVE PROBE SETS FOR EXPERIMENTS (B-BRIEFCASE, C-CLOTHING, S-SHOE, T-TIME, U-SURFACE, AND V-VIEWPOINT) [1]

Moreover, it is also worth mentioning that the GSR method with the objective function shown in (15) is just a special case of LGSR obtained by setting every locality weight in (5) to 1, i.e.,

(15)

To recover the block-sparse signal whose nonzero elements are grouped into blocks, Eldar and Mishali [31] formulated a GSR problem with the objective function equivalent to (15). They also showed that the recovery of multiple measurement vectors discussed in [36] is a special case of their framework. In contrast to [31] and [36], in this paper, the GSR method is adopted for the video-based recognition task of human gait recognition, in which each gait video that contains a set of GEIs is treated as one group. More importantly, we also propose a new classification method LGSR by enforcing both group sparsity and local smooth sparsity constraints, and our experiments demonstrate that LGSR outperforms GSR for human gait recognition in terms of effectiveness and efficiency.
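The relation between LGSR and GSR can be sketched numerically. The code below assumes that the objective combines half the squared Frobenius reconstruction error with a weighted mixed-norm penalty, namely the sum over gallery videos of the Frobenius norm of the elementwise-weighted coefficient block. This form, the function names, and the argument layout are assumptions made for illustration and are not a transcription of (5) or (15).

```python
import numpy as np

def lgsr_objective(Y, gallery, coeffs, weights, lam):
    """Assumed form: 0.5*||Y - sum_k X_k A_k||_F^2 + lam * sum_k ||D_k * A_k||_F."""
    recon = sum(X_k @ A_k for X_k, A_k in zip(gallery, coeffs))
    fit = 0.5 * np.linalg.norm(Y - recon, "fro") ** 2
    # One Frobenius norm per gallery video: whole blocks are driven to zero together.
    penalty = sum(np.linalg.norm(D_k * A_k, "fro")
                  for D_k, A_k in zip(weights, coeffs))
    return fit + lam * penalty

def gsr_objective(Y, gallery, coeffs, lam):
    """GSR as the special case of LGSR in which every locality weight equals 1."""
    ones = [np.ones_like(A_k) for A_k in coeffs]
    return lgsr_objective(Y, gallery, coeffs, ones, lam)
```

Under this assumed form, locality weights larger than 1 penalize the coefficients of distant gallery GEIs more strongly than GSR does, which is one way to read the local smooth sparsity constraint.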

IV. EXPERIMENTS

The USF HumanID gait database collected by Sarkar et al. [1] is currently the largest publicly available database for evaluating human gait recognition algorithms. It consists of 1870 video clips from 122 subjects walking on an elliptical path in front of the cameras. For each person, there are up to five covariates: 1) viewpoints (left/right); 2) shoe types (A/B); 3) surface types (grass/concrete); 4) carrying conditions (with/without a briefcase); and 5) date (May/November, where the time covariate implicitly contains the change of shoes and clothing). In order to test performance under different conditions, Sarkar et al. [1] fixed one gallery set containing the videos of all the 122 subjects and created 12 probe sets consisting of different numbers of subjects varying from 33 to 122. The differences between the probe sets and the gallery set are summarized in Table I. Sarkar et al. [1] also proposed a baseline algorithm to extract the binary silhouette, calculate the gait period length, and conduct final matching. In order to facilitate the subsequent research in this field, Sarkar et al. made the binary silhouettes and the gait period lengths publicly available at http://figment.csee.usf.edu/GaitBaseline/. In this paper, we employ the GEIs rather than all the binary silhouette images because [6], [7], [10], [17], [18], [20], and [23] have demonstrated experimentally that it is effective and efficient to utilize GEIs for human gait recognition. Fig. 2 illustrates two different subjects with each row representing the silhouette images of one person, in which the first seven images are the normalized and aligned binary silhouette images within one gait cycle and the last image is the gray-level GEI of the binary silhouette images.

Fig. 2. Each row represents the images within a gait cycle of a different person. For each row, the first seven images are normalized and aligned binary silhouette images, and the last image is the average GEI of the binary silhouette images.

A. Comparison of Different Features

We first compare our newly proposed feature Gabor-PDF with the commonly used gray-level feature, the DCT feature, and the Gabor feature (referred to as Gray, DCT, and Gabor, respectively) as well as the closely related feature DCT-PDF [22]. For DCT-PDF, we follow [22] to first extract a 25-D DCT feature from every pixel of each GEI and then represent each GEI as a set of 27-D augmented DCT features by concatenating the DCT features with the X–Y coordinates. Then, the same process discussed in Section II-B is used to obtain the DCT-PDF. For fair comparison, we empirically set the total number of Gaussian components to 300 for both DCT-PDF and Gabor-PDF. For the three baseline features (i.e., Gray, DCT, and Gabor), we directly concatenate the features extracted at every pixel into a long feature vector.
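The construction of augmented local features can be sketched as follows. The sketch assumes that a per-pixel feature map has already been computed (e.g., 25 DCT coefficients per pixel) and that the raw pixel indices are used as the X–Y coordinates; any coordinate scaling is not specified here and is left out. The function name `augment_with_xy` is hypothetical.

```python
import numpy as np

def augment_with_xy(features):
    """Append (x, y) pixel coordinates to an (H, W, C) per-pixel feature map.

    Returns an (H*W, C+2) array: one augmented local feature per pixel.
    """
    h, w, _ = features.shape
    ys, xs = np.mgrid[0:h, 0:w]                     # row (y) and column (x) indices
    coords = np.stack([xs, ys], axis=-1).astype(features.dtype)
    return np.concatenate([features, coords], axis=-1).reshape(h * w, -1)
```

For a 25-D feature map, each row of the output is a 27-D augmented feature, so a GEI becomes an unordered set of local descriptors that still carry their spatial position.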

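For fair comparison of different features, we adopt the method suggested in [10] to compute directly the distance between the probe video $P$ and one gallery video $G$ using the median and minimum operations on their GEIs $p_i$ and $g_j$, i.e.,

$$\mathrm{Dist}(P, G) = \operatorname{median}_{i}\Bigl(\min_{j}\, \lVert p_i - g_j \rVert\Bigr) \tag{16}$$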
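A minimal sketch of this video-to-video distance is given below. It assumes that each GEI is represented by one feature vector and that the Euclidean distance is used between GEIs; the function name `video_distance` and the row-per-GEI layout are illustrative assumptions.

```python
import numpy as np

def video_distance(probe, gallery):
    """Median over probe GEIs of the minimum distance to any gallery GEI.

    probe   : (m, d) array, one feature row per probe GEI.
    gallery : (n, d) array, one feature row per gallery GEI.
    """
    diff = probe[:, None, :] - gallery[None, :, :]
    pairwise = np.linalg.norm(diff, axis=2)      # (m, n) Euclidean distances
    return float(np.median(pairwise.min(axis=1)))
```

The inner minimum matches each probe GEI to its closest gallery GEI, and the outer median makes the video-level distance less sensitive to a few poorly matched GEIs than a mean would be.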

Based on the above distances, we use the nearest neighbor classifier for final classification. The results are shown in Table II, where Rank-1 indicates that the correct subject is ranked as the top candidate, Rank-5 means that the correct subject is ranked among the top five candidates, and Average is the recognition rate among all the probe sets (i.e., the ratio of correctly recognized persons to the total number of persons in all the probe sets). From Table II, we have the following observations:

1) When comparing the three baseline features, the Gabor feature outperforms the other two features on most probe sets, and the Gabor feature also achieves the best results in terms of the average Rank-1 and Rank-5 recognition rates. Our observation that the Gabor feature is a useful feature for human gait recognition is consistent with existing studies [7], [20].


TABLE II
RANK-1 AND RANK-5 RECOGNITION RATE (IN PERCENTAGE) COMPARISON USING DIFFERENT FEATURES ON THE USF HUMANID GAIT DATABASE. THE NEAREST NEIGHBOR CLASSIFIER IS USED FOR FINAL CLASSIFICATION

TABLE III
AVERAGE RANK-1 AND RANK-5 RECOGNITION RATE (IN PERCENTAGE) COMPARISON USING DIFFERENT PARAMETERS FOR GABOR-PDF ON THE USF HUMANID GAIT DATABASE. THE NEAREST NEIGHBOR CLASSIFIER IS USED FOR FINAL CLASSIFICATION

2) It is interesting to observe that DCT-PDF achieves the worst results for human gait recognition in terms of the average Rank-1 recognition rate. One possible explanation is that the DCT features extracted from the noisy GEI are less effective when compared with the Gabor features. It also indicates that it is desirable to develop new PDFs such as Gabor-PDF for human gait recognition.

3) Our newly proposed feature Gabor-PDF achieves the best results on all the 12 probe sets, and it is also much better than other features in terms of the average Rank-1 and Rank-5 recognition rates, which clearly demonstrates that it is an effective feature for human gait recognition.
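The Rank-1 and Rank-5 measures used in these comparisons can be sketched as follows, assuming that a matrix of probe-to-gallery distances is available and that each probe and gallery video carries one subject label. The function name `rank_k_rate` and its arguments are hypothetical.

```python
import numpy as np

def rank_k_rate(dist, probe_labels, gallery_labels, k):
    """Percentage of probes whose true subject is among the k nearest gallery videos.

    dist : (num_probe, num_gallery) matrix of probe-to-gallery distances.
    """
    order = np.argsort(dist, axis=1)[:, :k]             # k closest gallery videos
    top_labels = np.asarray(gallery_labels)[order]      # (num_probe, k)
    hits = (top_labels == np.asarray(probe_labels)[:, None]).any(axis=1)
    return 100.0 * hits.mean()
```

Setting `k=1` reproduces nearest neighbor classification. To obtain an average in the sense defined above, the hits of all probe sets would be pooled before taking the ratio, rather than averaging the per-set percentages.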

We also conduct an in-depth investigation of our newly proposed feature Gabor-PDF. First, we study the performance variation of Gabor-PDF with respect to different parameters, including the number of scales and orientations of the Gabor features, as well as the number of Gaussian components. Specifically, we set the number of Gaussian components to 100, 300, and 500. Recall that, in the default setting, we have 40 Gabor kernel functions from five scales and eight orientations [see (1)]. We also test Gabor-PDF using other parameter settings, such as three scales and four orientations. Table III reports the average Rank-1 and Rank-5 recognition rates of Gabor-PDF using different parameters, in which we employ the nearest neighbor classifier with the distances calculated using (16) for fair comparison. From Table III, we observe that the results using the default parameter setting (i.e., five scales and eight orientations) for Gabor kernel functions are generally better when compared with other parameter settings like three scales and eight orientations, five scales and four orientations, and three scales and four orientations. Moreover, the results generally improve when the number of Gaussian components increases.

TABLE IV
RUNNING TIME (IN SECONDS) OF GLOBAL GMM LEARNING AND THE AVERAGE RUNNING TIME (IN SECONDS) OF MAP ADAPTATION FOR ONE GALLERY/PROBE IMAGE USING THE DEFAULT SETTING OF THE GABOR KERNEL FUNCTIONS

TABLE V
AVERAGE RANK-1 AND RANK-5 RECOGNITION RATE (IN PERCENTAGE) COMPARISON BETWEEN GABOR-PDF AND GABOR-PDF_WITHOUTXY ON THE USF HUMANID GAIT DATABASE. THE NEAREST NEIGHBOR CLASSIFIER IS USED FOR FINAL CLASSIFICATION

TABLE VI
AVERAGE RANK-1 AND RANK-5 RECOGNITION RATE (IN PERCENTAGE) COMPARISON ON THE USF HUMANID GAIT DATABASE USING SR/GSR-BASED METHODS. TWO CLASSIFICATION RESULTS BASED ON THE MINRE CRITERION AND THE MAXWIRE CRITERION ARE REPORTED

TABLE VII
RANK-1 AND RANK-5 RECOGNITION RATE (IN PERCENTAGE) COMPARISON ON THE USF HUMANID GAIT DATABASE. THE RESULTS FOR GABOR-PDF-SR AND GABOR-PDF-LLC (RESP. GABOR-PDF-GSR AND GABOR-PDF-LGSR) ARE OBTAINED BY USING THE MINRE (RESP. MAXWIRE) CRITERION. FOR GABOR-PDF-LGSR, THE TWO NUMBERS IN THE PARENTHESES ARE THE OPTIMAL PARAMETERS FOR λ AND σ

In Table IV, we also report the running time for the two-step approach used for Gabor-PDF extraction (see Section II-B), in which 40 Gabor kernel functions from five scales and eight orientations are employed. The experiments are conducted on an IBM workstation (3.33-GHz CPU with 16-GB RAM). When the number of Gaussian components increases, it takes more time to learn the global GMM and conduct MAP adaptation for each gallery/probe image. As shown in Table III, the results of Gabor-PDF with 500 Gaussian components are only slightly better than those with 300 if the default setting (i.e., five scales and eight orientations) for Gabor kernel functions is employed. To balance effectiveness and efficiency, we empirically choose 300 Gaussian components as the default setting for Gabor-PDF when comparing different SR/GSR-based methods in Section IV-B.

Finally, we also report the results in Table V for the PDF (referred to as Gabor-PDF_withoutXY), in which we represent each GEI as a set of 40-D Gabor features without including the X–Y coordinates. For Gabor-PDF_withoutXY, the same two-step approach and the normalization method, as discussed in Section II-B, are employed. Again, we adopt the nearest neighbor classifier with the distances calculated using (16) for fair comparison. From Table V, we observe that the results of Gabor-PDF are consistently better when compared with Gabor-PDF_withoutXY, in which the number of Gaussian components is again set to 100, 300, and 500.

B. Comparison of GSR-Based Classification Methods

Using Gabor-PDF as the input feature, we test the GSR-based classification methods including SR [25], LLC¹ [27], GSR [30], and LGSR (referred to as Gabor-PDF-SR, Gabor-PDF-LLC, Gabor-PDF-GSR, and Gabor-PDF-LGSR, respectively) for human gait recognition. We also observe that, for some probe videos, it is possible that the number of gallery videos (referred to as active gallery videos), which are associated with nonzero reconstruction coefficients after the optimization process, is less than five. When calculating the Rank-5 recognition rate² in this case, we count the probe video as correctly classified if it has the same class label as one of the active gallery videos. Similar to other human gait recognition algorithms [7], [9], several parameters need to be set beforehand for each SR/GSR-based method. Specifically, we empirically set the parameter λ for all the SR/GSR-based methods and the bandwidth parameter σ for LLC and LGSR. In Tables VI and VII, we report the best results from the optimal parameter combinations. It is still an open problem to decide automatically the optimal parameters, which will be investigated in the future.

¹ In order to cope efficiently with a large amount of SIFT features, Wang et al. proposed a fast encoding algorithm to obtain an approximated solution in [27]. Considering that we only have a limited number of gallery/probe videos, in this paper, we use the exact solution to the optimization problem in (14), which can be calculated directly in closed form.

² Note that the reported Rank-5 recognition accuracy values for the SR/GSR-based classification methods are the underestimated results, i.e., the actual Rank-5 results for those methods should not be worse than the reported Rank-5 results in Tables VI and VII.

For all the SR/GSR-based methods, we make the final decision based on the minimum reconstruction error (minRE) criterion or the maximum weighted inverse reconstruction error (maxWIRE) criterion, as discussed in Section III-B. In Table VI, we report the average Rank-1 and Rank-5 recognition rates of Gabor-PDF-SR, Gabor-PDF-LLC, Gabor-PDF-GSR, and Gabor-PDF-LGSR by using the two criteria. It is interesting to observe that the results using the classification method based on the minRE criterion are better for SR-based methods (i.e., Gabor-PDF-SR and Gabor-PDF-LLC), whereas the results using the classification method based on the maxWIRE criterion are better for GSR-based methods (i.e., Gabor-PDF-GSR and Gabor-PDF-LGSR). An explanation for this is that the nonzero reconstruction coefficients spread over multiple gallery videos in SR-based methods. Hence, it is less effective to utilize the Frobenius norm of the reconstruction coefficient for classification. On the other hand, the nonzero reconstruction coefficients in GSR-based methods are only from a limited number of gallery videos; therefore, it is beneficial to use the Frobenius norm of the reconstruction coefficients in this case. In the subsequent Table VII, we report the detailed results for Gabor-PDF-SR and Gabor-PDF-LLC (resp. Gabor-PDF-GSR and Gabor-PDF-LGSR) using the minRE (resp. maxWIRE) criterion.

We also compare the SR/GSR-based methods with other state-of-the-art algorithms including the baseline [1], HMM [11], LDA [6], LDA Sync [6], LDA Fusion [6], the Dynamic-Normalized Gait Recognition (DNGR) algorithm [9], Matrix-based Marginal Fisher Analysis (MMFA) [18], General Tensor Discriminant Analysis (GTDA) [7], Image-to-Class Distance [20], and Regularized Trace-Ratio Discriminant Analysis (RTRDA) [23]. For GTDA [7], there are multiple results from the combinations of different feature representations and dimension-reduction algorithms. We only report the best results from the optimal combination in terms of the average recognition rates. Note that the Gabor feature is employed in GTDA [7], Image-to-Class Distance [20], and RTRDA [23], and different dimension-reduction methods such as LDA [6], [9], MMFA [18], GTDA [7], and RTRDA [23] are also used to extract the more effective features before conducting the final classification.

From Table VII, we have the following observations:

1) The SR/GSR-based methods Gabor-PDF-SR, Gabor-PDF-LLC, Gabor-PDF-GSR, and Gabor-PDF-LGSR outperform the existing methods [1], [6], [7], [9], [11], [18], [20], [23] as well as Gabor-PDF-NN in terms of the average Rank-1 and Rank-5 recognition rates.

2) Gabor-PDF-LLC is better than Gabor-PDF-SR in terms of the average Rank-1 and Rank-5 recognition rates, which demonstrates that it is beneficial to enforce the local smooth sparsity constraint for human gait recognition.

TABLE VIII
AVERAGE RUNNING TIME TO OBTAIN THE RECONSTRUCTION COEFFICIENT FOR ONE PROBE VIDEO USING GABOR-PDF-SR, GABOR-PDF-LLC, GABOR-PDF-GSR, AND GABOR-PDF-LGSR. PARAMETER λ IS SET AS 1/16 IN THIS TABLE

3) The GSR-based methods (i.e., Gabor-PDF-GSR and Gabor-PDF-LGSR) perform better than the SR-based methods (i.e., Gabor-PDF-SR and Gabor-PDF-LLC) in terms of the average Rank-1 and Rank-5 recognition rates. An explanation is that SR-based methods neglect the group information for the GEIs in the gallery videos and the probe video. In contrast, the GSR-based methods can utilize effectively the group information by enforcing the group sparsity constraint.

4) Gabor-PDF-LGSR outperforms Gabor-PDF-SR, Gabor-PDF-LLC, and Gabor-PDF-GSR in terms of the average Rank-1 and Rank-5 recognition rates, which demonstrates that it is beneficial to enforce simultaneously the group sparsity and local smooth sparsity constraints. In terms of the Rank-1 recognition rates, Gabor-PDF-LGSR is much better than Gabor-PDF-GSR, Gabor-PDF-LLC, and Gabor-PDF-SR on probe J, in which the performance improvement is at least 6%.

5) Gabor-PDF-LGSR achieves the best average Rank-1 and Rank-5 recognition rates on this data set among all gait recognition algorithms proposed to date. Specifically, the average Rank-1 (resp. Rank-5) recognition rate is significantly increased from 65.41% in [23] to 70.07% (resp. from 82.05% in [9] to 85.31%), which is equivalent to 13.47% (resp. 18.16%) relative classification error reduction. Moreover, the DNGR algorithm in [9] used additional training data (i.e., manually annotated silhouettes) to learn a pHMM in order to facilitate the temporal alignment of any probe silhouette image sequence with respect to the population-based generic walking model. However, the manually annotated silhouettes used in the pHMM learning are not publicly available. When compared with the second best Rank-5 recognition rate 80.11% in [23], which is obtained using the same training data, the relative classification error reduction of our Gabor-PDF-LGSR is 26.14%.
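The relative classification error reductions quoted in observation 5 follow from the recognition rates by simple arithmetic: the error rate is 100% minus the recognition rate, and the reduction is measured relative to the earlier error rate. A short sketch (the function name `relative_error_reduction` is hypothetical) reproduces the three figures:

```python
def relative_error_reduction(old_rate, new_rate):
    """Relative reduction (in %) of the error rate when accuracy rises from old to new."""
    old_err, new_err = 100.0 - old_rate, 100.0 - new_rate
    return 100.0 * (old_err - new_err) / old_err

print(round(relative_error_reduction(65.41, 70.07), 2))  # Rank-1 vs. [23]: 13.47
print(round(relative_error_reduction(82.05, 85.31), 2))  # Rank-5 vs. [9]:  18.16
print(round(relative_error_reduction(80.11, 85.31), 2))  # Rank-5 vs. [23]: 26.14
```

For example, the Rank-1 error rate drops from 34.59% to 29.93%, and 4.66/34.59 is approximately 13.47%.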

In Table VIII, we report the average running time to obtain the reconstruction coefficient for one probe video using Gabor-PDF-SR, Gabor-PDF-LLC, Gabor-PDF-GSR, and Gabor-PDF-LGSR, where the experiments are conducted with Matlab code on an IBM workstation (3.33-GHz CPU with 16-GB RAM). For Gabor-PDF-LLC, the running time is generally comparable with different values of σ. Therefore, we directly report the average running time for all σ values. From Table VIII, we observe that Gabor-PDF-LGSR using different values of σ is much faster than Gabor-PDF-GSR. An explanation is that, for Gabor-PDF-LGSR, the total number of groups in the active set, for which the corresponding reconstruction coefficients need to be updated, is smaller than for Gabor-PDF-GSR because of the weighted mixed-norm penalty imposed on the reconstruction coefficients. When a small value of σ is used in Gabor-PDF-LGSR, it runs even faster than Gabor-PDF-LLC, whose solution is in closed form and only involves simple matrix multiplication and inversion operations.

Fig. 3. Illustration of the convergence for LGSR with the first video in Probe A, Probe C, and Probe E, respectively.

In Fig. 3, we take LGSR as an example to illustrate the convergence of Algorithm 1 using the first video from probe A, probe C, and probe E as the test data. It is obvious from Fig. 3 that our algorithm converges after about 70 iterations. We observe a similar trend for both LGSR and GSR on all the test videos.

V. CONCLUSION

In this paper, we have proposed a new PDF (referred to as Gabor-PDF) for human gait recognition. To extract Gabor-PDF, we represent each GEI as a set of local augmented Gabor features from which the distribution is estimated by exploiting a two-stage approach to learn an image-specific GMM. Moreover, we develop a new classification method referred to as LGSR by enforcing both group sparsity and local smooth sparsity constraints, and we also show that the standard GSR-based method is a special case of LGSR. Comprehensive experiments on the benchmark USF HumanID database demonstrate the effectiveness of our newly proposed feature Gabor-PDF and the new classification method LGSR for human gait recognition.

REFERENCES

[1] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, “The HumanID gait challenge problem: Data sets, performance, and analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 162–177, Feb. 2005.

[2] S. V. Stevenage, M. S. Nixon, and K. Vince, “Visual analysis of gait as a cue to identity,” Appl. Cogn. Psychol., vol. 13, no. 6, pp. 513–526, Dec. 1999.

[3] A. F. Bobick and A. Y. Johnson, “Gait recognition using static activity-specific parameters,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2001, pp. 423–430.

[4] R. Tanawongsuwan and A. Bobick, “Gait recognition from time-normalized joint-angle trajectories in the walking plane,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2001, pp. 726–731.

[5] R. T. Collins, R. Gross, and J. Shi, “Silhouette-based human identification from body shape and gait,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recog., 2002, pp. 351–356.

[6] J. Han and B. Bhanu, “Individual recognition using gait energy image,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 2, pp. 316–332, Feb. 2006.

[7] D. Tao, X. Li, X. Wu, and S. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.

[8] L. Wang, T. Tan, H. Ning, and W. Hu, “Silhouette analysis-based gait recognition for human identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1505–1518, Dec. 2003.

[9] Z. Liu and S. Sarkar, “Improved gait recognition by gait dynamics normalization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 6, pp. 863–876, Jun. 2006.

[10] Z. Liu and S. Sarkar, “Simplest representation yet for gait recognition: Averaged silhouette,” in Proc. IEEE Int. Conf. Pattern Recog., 2004, pp. 211–214.

[11] A. Kale, A. Sundaresan, A. N. Rajagopalan, N. P. Cuntoor, A. K. Roy-Chowdhury, V. Krüger, and R. Chellappa, “Identification of humans using gait,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1163–1173, Sep. 2004.

[12] S. D. Mowbray and M. S. Nixon, “Automatic gait recognition via Fourier descriptors of deformable objects,” in Proc. AVBPA, 2003, pp. 566–573.

[13] L. Lee and W. E. L. Grimson, “Gait analysis for recognition and classification,” in Proc. IEEE Conf. Face Gesture Recog., 2002, pp. 155–161.

[14] A. Veeraraghavan, A. K. Roy-Chowdhury, and R. Chellappa, “Matching shape sequences in video with applications in human movement analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 12, pp. 1896–1909, Dec. 2005.

[15] L. Lee, G. Dalley, and K. Tieu, “Learning pedestrian models for silhouette refinement,” in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 663–670.

[16] N. Boulgouris and Z. Chi, “Gait recognition using Radon transform and linear discriminant analysis,” IEEE Trans. Image Process., vol. 16, no. 3, pp. 731–740, Mar. 2007.

[17] D. Xu, S. Yan, D. Tao, L. Zhang, X. Li, and H. Zhang, “Human gait recognition with matrix representation,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 7, pp. 896–903, Jul. 2006.

[18] D. Xu, S. Yan, D. Tao, S. Lin, and H. Zhang, “Marginal Fisher analysis and its variants for human gait recognition and content based image retrieval,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2811–2821, Nov. 2007.

[19] X. Li, S. Lin, S. Yan, and D. Xu, “Discriminant locally linear embedding with high-order tensor data,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 2, pp. 342–352, Apr. 2008.

[20] Y. Huang, D. Xu, and T. J. Cham, “Face and human gait recognition using image-to-class distance,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 3, pp. 431–438, Mar. 2010.

[21] S. Lucey and T. Chen, “A GMM parts based face representation for improved verification through relevance adaptation,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2004, pp. 855–861.

[22] M. Liu, S. Yan, Y. Fu, and T. Huang, “Flexible X-Y patches for face recognition,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2008, pp. 2113–2116.

[23] Y. Huang, D. Xu, and F. Nie, “Regularized trace ratio discriminant analysis with patch distribution feature for human gait recognition,” in Proc. IEEE Int. Conf. Image Process., 2010, pp. 2449–2452.

[24] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, vol. 98, no. 6, pp. 1031–1044, Jun. 2010.


[25] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–226, Feb. 2009.

[26] X. Yuan and S. Yan, “Visual classification with multi-task joint sparse representation,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2010, pp. 3493–3500.

[27] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2010, pp. 3360–3367.

[28] J. Yang, K. Yu, Y. Gong, and T. Huang, “Linear spatial pyramid matching using sparse coding for image classification,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2009, pp. 1794–1801.

[29] S. Zhang, J. Huang, Y. Huang, and D. Metaxas, “Automatic image annotation using group sparsity,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog., 2010, pp. 3312–3319.

[30] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” J. R. Stat. Soc., Ser. B, vol. 68, no. 1, pp. 49–67, Feb. 2006.

[31] Y. Eldar and M. Mishali, “Robust recovery of signals from a structured union of subspaces,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, Nov. 2009.

[32] L. Wiskott, J. M. Fellous, N. Krüger, and C. Malsburg, “Face recognition by elastic bunch graph matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 775–779, Jul. 1997.

[33] D. Xu and S. F. Chang, “Video event recognition using kernel methods with multi-level temporal alignment,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1985–1997, Nov. 2008.

[34] H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Proc. NIPS, 2006, pp. 801–808.

[35] G. Obozinski, B. Taskar, and M. Jordan, “Joint covariate selection and joint subspace selection for multiple classification problems,” J. Statist. Comput., vol. 20, no. 2, pp. 231–252, Jan. 2010.

[36] B. Rao, “Analysis and extension of the FOCUSS algorithm,” in Proc. IEEE Asilomar Conf. Signals, Syst., Comput., 1996, pp. 1218–1223.

Dong Xu (M’07) received the B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001 and 2005, respectively.

During his doctoral studies, he was with Microsoft Research Asia, Beijing, China, and with the Chinese University of Hong Kong, Shatin, Hong Kong, for more than two years. For one year, he was a Postdoctoral Research Scientist with Columbia University, New York, NY. He is currently an Assistant Professor with Nanyang Technological University, Singapore. His current research interests include computer vision, statistical learning, and multimedia content analysis.

Dr. Xu was a co-recipient of the Best Student Paper Award from the IEEE International Conference on Computer Vision and Pattern Recognition in 2010.

Yi Huang received the B.E. degree from Chongqing University, Chongqing, China, in 2006. She is currently working toward the Ph.D. degree with the School of Computer Engineering, Nanyang Technological University, Singapore.

Her research interests include dimension reduction and its application in face and human gait recognition.

Ms. Huang was the recipient of the Microsoft Research Asia Fellowship in 2008.

Zinan Zeng received the B.E. (first honor) degree from Nanyang Technological University, Singapore, in 2008, where he is currently working toward the M.S. degree.

From 2008 to 2009, he was an R&D Engineer at GDC Technology, North Point, Hong Kong. He is currently a Research Assistant with Nanyang Technological University, Singapore. His research interests include statistical learning, sparse coding, and clustering and their application in computer vision.

Xinxing Xu received the B.E. degree from the University of Science and Technology of China, Hefei, China, in 2009. He is currently working toward the Ph.D. degree with the School of Computer Engineering, Nanyang Technological University, Singapore.

His current research interests include multiple kernel learning as well as image and video understanding.