

Recursive spatiotemporal subspace learning for gait recognition

Rong Hu*, Wei Shen, Hongyuan Wang

Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

Article info

Neurocomputing 73 (2010) 1892–1899. Available online 12 March 2010. doi:10.1016/j.neucom.2009.12.034

Keywords: Gait recognition; Recursive spatiotemporal subspace learning; Periodicity feature vector; Gait feature vector; Principal Component Analysis; Discriminative Locality Alignment

* Corresponding author. Tel./fax: +86 027 87543535. E-mail addresses: [email protected] (R. Hu), [email protected] (W. Shen), [email protected] (H. Wang).

Abstract

In this paper, we propose a new gait recognition method using recursive spatiotemporal subspace learning. In the first stage, the periodic dynamic feature of gait over time is extracted by Principal Component Analysis (PCA) and gait sequences are represented in the form of a Periodicity Feature Vector (PFV). In the second stage, the shape feature of gait over space is extracted by Discriminative Locality Alignment (DLA) from the PFV representation of the gait sequences. After this recursive subspace learning, the gait sequence data is compressed into a very compact vector named the Gait Feature Vector (GFV), which is used for individual recognition. Compared to other gait recognition methods, the GFV is an effective representation of gait because the recursive spatiotemporal subspace learning technique extracts both the shape features and the dynamic features. At the same time, representing gait sequences in PFV form is an efficient way to save storage space and computational time. Experimental results show that the proposed method achieves highly competitive performance with respect to published gait recognition approaches on the USF HumanID gait database.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Gait recognition [1–3] is a challenging signal processing technology for video surveillance and biometric identification that has attracted increasing attention. Compared to other biometrics such as face [4], fingerprints [5], iris [6], and signature [7], gait can identify persons more effectively under public surveillance conditions. The benefits of gait recognition are: it is hard to fake, because gait recognition requires no prior consent of the observed subject; no special equipment is required for image acquisition, and cameras can easily be mounted in public places; and it offers the potential for recognition at a long distance, when the observed subject occupies too few image pixels for other biometrics to be perceivable.

Like most biometric recognition applications, gait recognition seeks to extract a human gait feature for identification. The main difference is that gait data is a three-dimensional cube, as in Fig. 1, while other biometrics [4–7] are two-dimensional. The additional dimension of gait is the time axis, which contains the dynamics of gait. As a consequence, gait recognition requires two feature extraction steps to obtain both the shape features and the dynamic features. Fig. 1 shows the framework of gait recognition. There are two alternative paths for gait feature extraction: the upper one takes a space-time extracting order, and the lower one takes a time-space extracting order. The final gait feature is obtained by a recursive feature extraction on the interim space or time gait feature.

2. Related work and our contribution

In recent years, various techniques have been proposed for human recognition by gait. These techniques can be divided into model-based and model-free approaches. Model-based approaches aim to derive the movement of the torso and/or the legs. BenAbdelkader et al.'s approach using structural stride parameters (stride and cadence) [8] is a prime example of a model-based approach. An early system for automatic extraction and description of gait models was proposed by Cunado et al. [9]; Yam et al. then extended this system to describe both legs and to handle walking as well as running [10]; Wagg and Nixon [11] developed an alternative model-based system that uses evidence gathering and model-based analysis driven by anatomical constraints. Unlike model-based approaches, model-free approaches operate directly on the gait sequences without assuming any specific model of the walking human. Prime examples of model-free approaches are Kale et al.'s and Sundaresan et al.'s use of hidden Markov models (HMMs) [12,13], which consider two different image features: the width of the outer contour of a binary silhouette and the entire binary silhouette itself. Wang et al. and Vega and Sarkar considered other image features, silhouette boundary vector variations [14] and changes of feature relationships [15], to which principal component analysis is applied to reduce their dimensionality while increasing their discriminative power.

Fig. 1. The framework of gait recognition to extract gait feature data.

Sarkar et al. proposed an approach that performs recognition by temporal correlation of silhouettes [16]; the aim was to develop a baseline technique against which future performance could be evaluated. Simple temporal correlation [14–16] is the most common method for sequence matching, while other methods such as Fourier analysis [17] and dynamic time warping [18] are also available.

Model-based approaches mainly focus on the dynamics of gait, while shape features are omitted. And since model-based approaches rely on the identification of specific gait parameters in the gait sequence, they usually require high-quality gait sequences to be useful. Moreover, other hindrances such as self-occlusion of walking subjects may even render the computation of model parameters impossible. Because of these serious drawbacks of model-based approaches, we focus on the model-free approach. Model-free approaches usually perform space feature extraction first and then match the feature sequence using simple temporal correlation, dynamic time warping, etc. But in real problems the silhouette images are quite distorted and the extracted space feature cannot represent the gait stance, so the classification rate is not satisfying. Efforts have been made to resolve such problems by averaging, such as the temporal template [20], the gait energy image [19] and the eigenstance of gait [13], which achieved obvious improvements. However, the average-image method is too simple to retain all the dynamic features of gait, and the eigenstance method needs accurate gait cycle detection and phase estimation.

Based on the above discussion, we propose a new gait recognition method using recursive spatiotemporal subspace learning. Different from most model-free approaches, we extract the time feature first. Since gait is a periodic activity with a stable frequency, each location of the aligned gait images exhibits a certain periodic characteristic during the movement. To make full use of this periodic (dynamic) feature of gait, we adopt the DFT (discrete Fourier transform) to transform the periodic signal at each location of the aligned gait images into the frequency domain. An unsupervised subspace learning method, PCA (Principal Component Analysis), is then applied to obtain the most discriminative components of the frequency signals. The extracted feature vector is called the Periodicity Feature Vector (PFV), which represents the dynamic feature at each location of gait. Based on the PFV representation of gait, a recursive subspace learning process named Discriminative Locality Alignment (DLA) is applied to extract the shape feature of gait. After DLA, the original gait sequence data is compressed into a very compact vector named the Gait Feature Vector (GFV). The Euclidean distance between GFVs is used for measuring the similarity of gaits.

In comparison with the state of the art, the contributions of this paper are:

1. Unlike model-based approaches, the recursive spatiotemporal subspace learning method considers both the dynamic and the shape features of gait, which improves the recognition rate. And since it does not rely on the identification of specific gait parameters, it does not require high-quality gait sequences to be useful. Other hindrances such as self-occlusion are not a problem either.

2. PFV is robust to silhouette distortions. Different from most model-free approaches, we extract the dynamic feature first. So in our approach, the shape feature of gait is based on the entire sequence rather than on single frames. Silhouette distortion in a single frame is averaged out by the whole sequence, which makes the PFV a robust representation of gait dynamics.

3. PFV is an efficient representation of gait dynamics that makes full use of the periodic characteristic of gait. Compared to the average gait image, the gait PFV representation contains more dynamic features.

4. The PFV representation of gait saves both storage space and computational time. It compresses the gait sequence into a single vector image while preserving most of the temporal information. At the same time, there is no need to extract the shape feature frame by frame; instead, based on the vector image, shape feature extraction is carried out only once.

5. Compared to traditional Fisher linear discriminant analysis (LDA), Discriminative Locality Alignment (DLA) reveals the nonlinear structure hidden in high-dimensional, non-Gaussian distributed data. Meanwhile, DLA does not suffer from the matrix singularity problem.

The rest of the paper is organized as follows: Section 3 introduces the details of extracting the periodicity feature vector using DFT and PCA; Section 4 introduces the details of extracting the gait feature vector from the gait PFV representation using DLA, and the procedure of individual recognition; Section 5 presents the experimental results and analysis; Section 6 concludes the paper.

3. Periodicity feature vector

3.1. Motivation

Gait is a typical periodic activity with a stable frequency, so we seek an efficient way to represent the dynamics at different gait locations while making the best use of this periodic characteristic. Under the assumption that the observed person is properly aligned along the whole gait sequence, the sample sequences at different gait locations should be periodic. For periodic signals, the power spectrum (obtained by DFT) is a natural choice of representation, since it contains the dynamic characteristics and sequences are automatically aligned at the same spectral positions. However, the power spectra alone are not discriminative enough; it is hard to tell the difference between those power spectra. To achieve the best discriminative effect, a subspace learning step should follow. This is a typical case of unsupervised subspace learning [30], since we do not know how many classes there are or which class a training spectrum belongs to. We adopt PCA [31] to increase the discriminative power between the spectrum signals.

3.2. Preprocessing

Silhouettes are extracted using the simple background subtraction method [16], and a normalization step follows immediately to make sure that all extracted silhouettes are scaled to the same height and the centroid of the body is aligned with the image's centerline. The sample sequence at position (x, y) is denoted by P(x, y, t), in which the variable t represents the time axis. To reduce sequential noise, P(x, y, t) is first Gaussian filtered in the time dimension,

P_g(x, y, t) = P(x, y, t) * G_\sigma(t)    (1)

where G_\sigma(t) is a Gaussian kernel and '*' denotes the convolution operation. The Fourier transform is then applied to these time sequences. But before that, P_g(x, y, t) should be zero-meaned to remove the DC component from its power spectrum,

P_z(x, y, t) = P_g(x, y, t) - \frac{1}{N} \sum_{t=1}^{N} P_g(x, y, t)    (2)

where N is the length of the gait sequence.

Since the Fourier transform is carried out in discrete form and the length of gait sequences varies, interpolation is necessary to make sure that all power spectra are aligned at the same frequency locations. Suppose the gait sequences have the same sampling rate. These sequences are transformed by the DFT (discrete Fourier transform) to the frequency domain, in which the frequency samples are uniformly distributed over the range [0, f_s/2], where f_s is the sampling rate. If the length of the kth gait sequence is N_k, then the number of samples of the corresponding frequency sequence is also N_k. The aim of interpolation is to transform this sequence from N_k samples to N samples while preserving its frequency characteristics, where N \neq N_k.

The first step of interpolation is to expand the power spectrum from N_k samples to aN_k samples while keeping the running sum unchanged,

\sum_{i=1}^{N_k} x(i) = \sum_{j=1}^{aN_k} \tilde{x}(j)    (3)

where x(i) represents the spectrum sequence and a is a positive integer. \tilde{x}(j) is calculated by the following criterion:

\tilde{x}(j) = x(i)/a, \quad \text{if } j \in [(i-1)a+1, \, ia]    (4)

The second step of interpolation is to compress the expanded sequence from aN_k samples to N samples, again keeping the running sum unchanged,

\sum_{i=1}^{aN_k} \tilde{x}(i) = \sum_{j=1}^{N} \hat{x}(j)    (5)

where \hat{x}(j) is calculated as follows:

\hat{x}(j) = \sum_{i=(j-1)b+1}^{jb} \tilde{x}(i)    (6)

Note that both a and b are positive integers and they are chosen such that aN_k = bN.

The sampling rates of the gait sequences are the same in most cases. However, if the sampling rates differ, a trailing-zeros step should be applied before interpolation. Trailing zeros expand those frequency sequences whose sampling rates are less than the maximum sampling rate. Suppose the original frequency range of the kth gait sequence is [0, f_k] while the maximum frequency range is [0, f_{max}]; trailing zeros are then added over the frequency range [f_k, f_{max}], and the number of trailing zeros is

N_{zeros} = \frac{f_{max} - f_k}{f_k / N_k}    (7)

where N_k is the number of samples in the original frequency sequence.
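As an illustration of the preprocessing just described, the following Python/NumPy sketch computes the per-location power spectrum of Eqs. (1)–(2) and resamples it to a common length following the sum-preserving scheme of Eqs. (3)–(6). The function names are hypothetical and the particular choice a = N, b = N_k is only one admissible setting of the interpolation factors; treat this as a sketch rather than the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def location_spectrum(p, sigma=1.0):
    """Power spectrum of one pixel's time sequence p(t)."""
    pg = gaussian_filter1d(p.astype(float), sigma)    # Gaussian filtering in time, Eq. (1)
    pz = pg - pg.mean()                               # zero-mean to remove the DC component, Eq. (2)
    return np.abs(np.fft.rfft(pz)) ** 2               # DFT power spectrum

def interpolate_spectrum(x, n_target):
    """Resample a spectrum from len(x) to n_target samples while preserving the running sum."""
    n_k = len(x)
    a, b = n_target, n_k                              # one valid choice with a * N_k == b * N
    expanded = np.repeat(x / a, a)                    # Eq. (4): split each sample into a equal parts
    return expanded.reshape(n_target, b).sum(axis=1)  # Eq. (6): merge groups of b samples

def gait_cube_spectra(cube, n_freq, sigma=1.0):
    """cube: (H, W, T) aligned silhouette sequence -> (H*W, n_freq) array of power spectra."""
    h, w, t = cube.shape
    flat = cube.reshape(h * w, t)
    return np.stack([interpolate_spectrum(location_spectrum(p, sigma), n_freq) for p in flat])
```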

3.3. PCA training

The training set for PCA consists of the power spectra at each location of the training gait sequences. Suppose N_{pca} is the number of training gait sequences and the size of the silhouette image is L; then the training set of PCA can be represented by X = {x_1, x_2, ..., x_n}, where x_i is a processed spectrum as described in Section 3.2 and n = L \times N_{pca}. The dimension of x_i is N, which equals the number of samples in the power spectra. First, compute the scatter matrix S of the training set X,

S = \sum_{i=1}^{n} (x_i - m)(x_i - m)^T    (8)

where m = \frac{1}{n} \sum_{i=1}^{n} x_i. We then obtain the d-dimensional periodicity feature vector y_k by

y_k = [e_1, e_2, \ldots, e_d]^T x_k = M x_k    (9)

where d < N and e_1, e_2, ..., e_d are the d eigenvectors of the scatter matrix S with the highest eigenvalues.

The dimension d of the periodicity feature vector y_k depends on the eigenvalues of the scatter matrix S. Suppose {\lambda_1, \lambda_2, \ldots, \lambda_N} are the eigenvalues of S arranged from high to low; the parameter d is decided by the following criterion:

W_d = \sum_{i=1}^{d} \lambda_i \Big/ \sum_{i=1}^{N} \lambda_i > T_s    (10)

where T_s is a threshold. Once the transform matrix [e_1, e_2, ..., e_d]^T is established in the training step, it can be used directly to obtain the periodicity feature vectors of all power spectra, and the time sequences are replaced by the periodicity feature vectors y_k. Since the dimension of y_k is much less than the original gait sequence length, the PFV representation of gait saves a great deal of storage space.
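A minimal sketch of this training step, assuming the processed spectra are stacked row-wise into a matrix; the function names are ours, and d is chosen by the cumulative eigenvalue criterion of Eq. (10):

```python
import numpy as np

def train_pfv_pca(spectra, t_s=0.9):
    """spectra: (n, N) matrix of processed power spectra. Returns the PFV transform M of shape (d, N)."""
    centered = spectra - spectra.mean(axis=0)
    scatter = centered.T @ centered                            # scatter matrix S, Eq. (8)
    eigval, eigvec = np.linalg.eigh(scatter)                   # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]             # reorder from high to low
    ratios = np.cumsum(eigval) / eigval.sum()
    d = int(np.searchsorted(ratios, t_s, side="right")) + 1    # smallest d with W_d > T_s, Eq. (10)
    return eigvec[:, :d].T                                     # M = [e_1, ..., e_d]^T

# y_k = M @ x_k projects a spectrum onto the d-dimensional PFV subspace (Eq. (9)).
```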

The PFV representation of gait is a vector image that can be split by dimension. The collection of the ith dimensional value of y_k at each location of gait forms the ith dimensional image of the PFV representation. To give a more intuitive understanding, we scale the values into the range [0, 255] and show them as gray-level images. Fig. 2 shows the first five dimensions of the gait PFV representation together with the averaged gait image as a comparison. From Fig. 2 we can see that each dimension represents a different periodicity characteristic.

4. Space feature extraction

Principal component analysis (PCA) and Fisher's linear discriminant analysis (LDA) are two of the most popular linear dimensionality reduction algorithms. PCA [31] maximizes the mutual information between the original high-dimensional Gaussian distributed data and the projected low-dimensional data. PCA is optimal for the reconstruction of Gaussian distributed data; however, it is not optimal for classification problems. LDA [32] overcomes this shortcoming by utilizing the class label information: it finds the projection directions that maximize the trace of the between-class scatter matrix while simultaneously minimizing the trace of the within-class scatter matrix.

Fig. 2. (a) Averaged gait image; (b)–(f) the first five dimensions of the gait PFV representation.

While LDA is a good algorithm for classification, it also has several problems. First, LDA considers only the global Euclidean structure, so it cannot discover the nonlinear structure hidden in high-dimensional non-Gaussian distributed data. Second, LDA is in fact based on the assumption that all samples contribute equally to discriminative dimensionality reduction, although samples around the margins are more important for classification than inner samples. Finally, LDA suffers from the matrix singularity problem, since the within-class scatter matrix is often singular.

In this section, a PCA+DLA [21] subspace selection method is used to extract the space feature of the gait PFV representation. PCA is first applied to the gait vector images to eliminate useless information, and then DLA (Discriminative Locality Alignment) is used for classification. Compared to LDA, DLA has three particular advantages: 1) because it focuses on the local patch of each sample, it can deal with the nonlinearity of the sample distribution while preserving the discriminative information; 2) since the importance of marginal samples is emphasized in discriminative subspace selection, it learns low-dimensional representations that are well suited to classification; and 3) because it obviates the need to compute the inverse of a matrix, it has no matrix singularity problem.

Fig. 3. An illustration of calculating the covariance between dimensions i and j of vector images.

4.1. PCA on vector images

The training samples of traditional subspace learning methods for space feature extraction are scalar images. However, the training samples in this paper are vector images, since gait sequences are represented in PFV form, so changes are needed to cope with this. Suppose G = {g_1, g_2, ..., g_n} is the training set for space feature extraction, where n is the number of gait sequences in the gallery set and g_k represents the kth gait vector image. g_k is an L \times d matrix, where L is the size of the vector image and d is the dimension of the periodicity feature vectors. The covariance between the ith and jth dimensions of the gait vector images is illustrated in Fig. 3, where y_{k,i} and y_{k,j} are the ith and jth dimensions of g_k, and E[y_i] and E[y_j] are the means of the ith and jth dimensions of the gait vector images, respectively. The covariance is formulated as follows,

cov(i, j) = \sum_{k=1}^{n} |y_{k,i} - E[y_i]| \cdot |y_{k,j} - E[y_j]| \cos\theta    (11)

where |y_{k,i} - E[y_i]| and |y_{k,j} - E[y_j]| represent the lengths of the vectors (y_{k,i} - E[y_i]) and (y_{k,j} - E[y_j]), respectively, while \theta represents the angle between them. Eq. (11) can be expressed in matrix form,

S = \sum_{k=1}^{n} (g_k - m)(g_k - m)^T    (12)

where both g_k and m are L \times d matrices, and m = \frac{1}{n} \sum_{k=1}^{n} g_k is the mean vector image.

We obtain the transformation matrix M_{PCA} = [e_1, e_2, \ldots, e_{d_1}]^T, d_1 < L, by calculating the eigenvectors of S in Eq. (12), and then transform each g_k (k = 1, 2, ..., n) into the lower-dimensional space by

x_k = M_{PCA} g_k    (13)

where x_k \in R^{d_1 \times d}. The Euclidean distance is used to measure the distance between points in the lower-dimensional space.
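A small sketch of this PCA step on vector images, assuming the gallery is stored as an (n, L, d) array; note that the scatter matrix of Eq. (12) is L × L, so for realistic image sizes an eigensolver on the dual n × n problem would normally be preferred, an optimization omitted here for clarity:

```python
import numpy as np

def train_vector_image_pca(G, d1):
    """G: (n, L, d) stack of gait vector images. Returns M_PCA of shape (d1, L)."""
    m = G.mean(axis=0)                          # mean vector image, L x d
    S = sum((g - m) @ (g - m).T for g in G)     # scatter matrix, Eq. (12), L x L
    eigval, eigvec = np.linalg.eigh(S)          # ascending eigenvalues
    return eigvec[:, ::-1][:, :d1].T            # top-d1 eigenvectors as rows

# x_k = M_PCA @ g_k maps each L x d vector image to a d1 x d matrix (Eq. (13)).
```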

4.2. Discriminative locality alignment

Many methods have been proposed to overcome the shortcomings of LDA, including the geometric mean [22], transductive component analysis [23], discriminant locally linear embedding with high-order tensor data [24], and so on. Discriminative Locality Alignment (DLA) [21] is one of these newly proposed methods and achieves excellent performance. DLA operates in three stages. In the first stage, for each sample in the dataset, a patch is built from the given sample and its neighbors, which include samples not only from the same class but also from different classes. On each patch, an objective function is designed to preserve the local discriminative information. In the second stage, a margin degree is defined for each sample as a measure of the sample's importance in contributing to classification, and the objective functions are weighted based on the margin degree. In the final stage, all the weighted objective functions are integrated together to form a global coordinate system. The projection matrix can then be obtained by solving a standard eigen-decomposition problem.

For a given gait x_i (after PCA), according to the class label information, DLA divides the other samples into two groups: samples in the same class as x_i and samples from different classes. It selects the k_1 nearest neighbors from the samples in the same class as x_i and the k_2 nearest neighbors from the samples in different classes, and the local patch for x_i is represented by X_i = [x_i, x_{i_1}, \ldots, x_{i_{k_1}}, x_{i^1}, \ldots, x_{i^{k_2}}]. For each patch, the corresponding output in the low-dimensional space is Y_i = [y_i, y_{i_1}, \ldots, y_{i_{k_1}}, y_{i^1}, \ldots, y_{i^{k_2}}]. DLA expects the distances between y_i and its neighbor samples from the same class to be as small as possible and, at the same time, the distances between y_i and its neighbor samples from different classes to be as large as possible, so we get:

\arg\min_{y_i} \left( \sum_{j=1}^{k_1} \| y_i - y_{i_j} \|^2 - \beta \sum_{p=1}^{k_2} \| y_i - y_{i^p} \|^2 \right)    (14)

where \beta is a scaling factor in [0, 1] that unifies the different measures of the within-class distance and the between-class distance. Define the coefficient vector

\omega_i = [\underbrace{1, \ldots, 1}_{k_1}, \underbrace{-\beta, \ldots, -\beta}_{k_2}]^T    (15)

Then Eq. (14) reduces to

\arg\min_{Y_i} \, \mathrm{tr}(Y_i L_i Y_i^T)    (16)

where L_i encapsulates both the local geometry and the discriminative information, and is given by

L_i = \begin{bmatrix} \sum_{j=1}^{k_1+k_2} (\omega_i)_j & -\omega_i^T \\ -\omega_i & \mathrm{diag}(\omega_i) \end{bmatrix}    (17)

To quantify the importance of a sample x_i for discriminative subspace selection, DLA defines a measure termed the margin degree m_i. For a sample, its margin degree is proportional to the number of samples whose class labels differ from that of the sample but which lie in the \epsilon-ball centered at the sample. The margin degree m_i of the ith sample is defined as

m_i = \exp\left( -\frac{1}{(n_i + \delta)\, t} \right)    (18)

where n_i is the number of samples in the \epsilon-ball centered at x_i with labels different from the label of x_i, \delta is a regularization parameter, and t is a scaling factor. In DLA, the part optimization of the ith patch is weighted by the margin degree of the ith sample before the whole alignment stage, i.e.,

\arg\min_{Y_i} \, m_i \, \mathrm{tr}(Y_i L_i Y_i^T) = \arg\min_{Y_i} \, \mathrm{tr}(Y_i m_i L_i Y_i^T)    (19)

Y_i is selected from the global coordinate Y = [y_1, \ldots, y_n], such that

Y_i = Y S_i    (20)

where S_i \in R^{n \times (k_1+k_2+1)} is a selection matrix. Then Eq. (19) can be rewritten as

\arg\min_{Y} \, \mathrm{tr}(Y S_i m_i L_i S_i^T Y^T)    (21)

By summing over all the part optimizations of Eq. (21), we obtain the whole alignment as

\arg\min_{Y} \, \mathrm{tr}\left( Y \left( \sum_{i=1}^{n} S_i m_i L_i S_i^T \right) Y^T \right) = \arg\min_{Y} \, \mathrm{tr}(Y L Y^T)    (22)

where L = \sum_{i=1}^{n} S_i m_i L_i S_i^T \in R^{n \times n} is the alignment matrix.

To obtain the linear and orthogonal projection matrix U, such that Y = U^T X, standard eigen-decomposition is applied to the matrix X L X^T \in R^{d_1 \times d_1}, where X \in R^{d_1 \times d \times n} is the input data set. We obtain the transformation matrix M_{DLA} = [e_1, e_2, \ldots, e_{d_2}]^T, d_2 < d_1, by calculating the eigenvectors of X L X^T, and transform each x_i (i = 1, 2, ..., n) into the gait feature space by

z_i = M_{DLA} x_i = M_{DLA} M_{PCA} g_i    (23)

where z_i \in R^{d_2 \times d} is called the gait feature vector (GFV). The Euclidean distance between gait feature vectors is used to measure the similarity between gaits.
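The following is a compact sketch of DLA on vectorised samples (each PCA-reduced vector image flattened into a column of X), assuming every class contributes at least k_1 + 1 samples and taking an assumed default for the \epsilon-ball radius; the projection is taken as the eigenvectors of X L X^T associated with the smallest eigenvalues, which minimizes the trace objective of Eq. (22). Function and parameter names are ours, not the authors':

```python
import numpy as np

def dla(X, labels, d2, k1=4, k2=6, beta=0.5, delta=1.0, t=1.0, eps=None):
    """X: (D, n) data matrix (columns are samples); labels: (n,); returns M_DLA of shape (d2, D)."""
    D, n = X.shape
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)   # pairwise sample distances
    if eps is None:
        eps = np.median(dist)               # assumed epsilon-ball radius (not specified in the paper)
    omega = np.concatenate([np.ones(k1), -beta * np.ones(k2)])      # coefficient vector, Eq. (15)
    Li = np.zeros((k1 + k2 + 1, k1 + k2 + 1))                       # local matrix, Eq. (17)
    Li[0, 0] = omega.sum()
    Li[0, 1:] = -omega
    Li[1:, 0] = -omega
    Li[1:, 1:] = np.diag(omega)
    L = np.zeros((n, n))                                            # alignment matrix
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        diff = np.where(labels != labels[i])[0]
        nn_same = same[np.argsort(dist[i, same])[:k1]]              # k1 same-class neighbors
        nn_diff = diff[np.argsort(dist[i, diff])[:k2]]              # k2 different-class neighbors
        Fi = np.concatenate(([i], nn_same, nn_diff))                # patch index set (selection S_i)
        ni = int(np.sum(dist[i, diff] < eps))                       # different-class samples in the eps-ball
        mi = np.exp(-1.0 / ((ni + delta) * t))                      # margin degree, Eq. (18)
        L[np.ix_(Fi, Fi)] += mi * Li                                # whole alignment, Eq. (22)
    eigval, eigvec = np.linalg.eigh(X @ L @ X.T)                    # eigen-decomposition of X L X^T
    return eigvec[:, :d2].T                                         # d2 eigenvectors with smallest eigenvalues
```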

4.3. Individual recognition

We train on the spectrum signals of the gait sequences for PFV extraction and obtain the transformation matrix M; we train on the labeled gallery gaits, represented in PFV form, for space feature extraction and obtain the transformation matrices M_PCA and M_DLA.

For any probe gait sequence:

(1) Calculate the power spectra at each gait location as described in Section 3.2;

(2) Use the matrix M to transform each spectrum into a periodicity feature vector (Eq. (9));

(3) Extract the gait feature vector (GFV) using M_PCA and M_DLA (Eq. (23));

(4) Calculate the Euclidean distance between the probe GFV and the gallery GFVs as follows:

D(z_i, z_j) = \sqrt{ \sum_{m=1}^{d_2} \sum_{n=1}^{d} \left( z_i^{m,n} - z_j^{m,n} \right)^2 }    (24)

where z_i represents the probe GFV and z_j represents a gallery GFV.

There are c classes (individuals) in the gallery. We assign the probe gait z_i to the class k whose gallery gaits contain its nearest neighbor:

k = \arg\min_{k} \, \min_{j \in D_k} D(z_i, z_j)    (25)

where D_k is the set of gait sequences belonging to the kth class.
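A minimal sketch of the matching step, with hypothetical function names, showing the GFV distance of Eq. (24) and the nearest-neighbor class assignment of Eq. (25):

```python
import numpy as np

def gfv_distance(z_i, z_j):
    """Euclidean distance between two d2 x d gait feature vectors, Eq. (24)."""
    return float(np.sqrt(np.sum((z_i - z_j) ** 2)))

def classify(probe_gfv, gallery_gfvs, gallery_labels):
    """Assign the probe to the class of its nearest gallery GFV, Eq. (25)."""
    dists = [gfv_distance(probe_gfv, g) for g in gallery_gfvs]
    return gallery_labels[int(np.argmin(dists))]
```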

5. Experiment and analysis

5.1. Data and parameters

Our experiments are carried out on the USF HumanID gait database [16]. This database consists of 122 persons, and for each person there are up to five covariates: viewpoint (left/right), shoe type (two types), surface type (grass/concrete), carrying condition (with/without a briefcase), and elapsed time (which also involves clothing changes). Twelve experiments are designed for individual recognition, as shown in Table 1.

Fig. 4 shows the first three dimensions of the PFV representation in the gallery set and their corresponding sequences in probe sets A–L. From Fig. 4, we can see that each dimension of the PFV representation captures a different type of periodic characteristic of the gait dynamics, and that these characteristics are unique to individuals.

There are three parameters to be decided in the process of reducing the dimensionality of the gait data: the dimension d of the Periodicity Feature Vectors (Section 3.3), the number of principal components d_1 in the PCA stage of space feature extraction (Section 4.1), and the number of chosen eigenvectors d_2 in the DLA stage of space feature extraction (Section 4.2).

Table 1
Twelve experiments designed for individual recognition in the USF HumanID gait database.

Exp.  Probe (Surface, Shoe, View, Carry, Time)   Number of subjects   Difference
      (C/G, A/B, L/R, NB/BF, time)
A     (G, A, L, NB, t1)                          122                  V
B     (G, B, R, NB, t1)                           54                  S
C     (G, B, L, NB, t1)                           54                  S+V
D     (G, A, R, NB, t1)                          121                  F
E     (C, B, R, NB, t1)                           60                  F+S
F     (C, A, L, NB, t1)                          121                  F+V
G     (C, B, L, NB, t1)                           60                  F+S+V
H     (G, A, R, BF, t1)                          120                  B
I     (G, B, R, BF, t1)                           60                  S+B
J     (G, A, L, BF, t1)                          120                  V+B
K     (G, A/B, R, NB, t2)                         33                  T+S+C
L     (C, A/B, R, NB, t2)                         33                  F+T+S+C

(V—view; S—shoe; F—surface; B—briefcase; C—clothing; T—time.)

Fig. 4. The first three dimensions of the PFV representation of three individuals in the gallery set and their corresponding sequences in probe sets A–L.


All of these parameters are decided by the criterion of Eq. (10), and T_s is set to 0.9 in our experiments. In DLA, the parameter k_1 is the number of neighbor samples with the same class label and k_2 is the number of neighbor samples with different class labels. If we set k_1 + k_2 = n, where n is the total number of gait sequences, then DLA becomes similar to LDA because only the global structure is considered; with this setting, DLA ignores the local geometry and performs poorly. Thus, by setting k_1 and k_2 suitably, DLA can capture both the local geometry and the discriminative information of the samples. In our experiments, k_1 is set to 4 and k_2 is set to 6.

5.2. Performance evaluation

Our comparison includes the following methods: the Baseline method [16], PCA+LDA [29], GEI+MFA-1 [26], GEI+TR1DA [27,28], Gabor+GTDA+LDA [25], and the PFV+PCA+DLA method of this paper, as shown in Table 2. Note that, for fairness, all the methods are evaluated on the same gait database. The performance in Table 2 is reported as Rank1 and Rank5 recognition rates.

Table 2
Rank1 and Rank5 recognition rates (CCR, %) for human gait recognition.

Rank1
Method       A    B    C    D    E    F    G    H    I    J    K    L
Baseline     73   78   48   32   22   17   17   61   57   36    3    3
PCA+LDA      87   85   76   31   30   18   21   63   59   54    3    6
GEI+MFA-1    89   89   83   38   43   25   29   58   59   56    9   18
GEI+TR1DA    85   88   71   19   23   15   14   49   47   45    7    7
Gabor+GTDA   91   93   86   32   47   21   32   95   90   68   16   19
PFV+DLA      94   92   85   46   51   28   34   68   66   62   13   20

Rank5
Method       A    B    C    D    E    F    G    H    I    J    K    L
Baseline     88   93   78   66   55   42   38   85   78   62   12   15
PCA+LDA      93   92   89   58   60   36   43   90   81   79   12   12
GEI+MFA-1    95   96   93   64   67   44   53   89   88   81   24   27
GEI+TR1DA    100  97   95   52   52   34   45   47   71   70   25   25
Gabor+GTDA   98   99   95   58   64   41   52   98   99   87   31   37
PFV+DLA      99   97   94   66   69   52   59   92   91   84   28   35

Table 3
The first ten eigenvalues in PFV training.

Index        1       2       3       4       5       6       7       8       9       10
Eigenvalue   1.0475  0.1149  0.0834  0.0497  0.0377  0.0262  0.0198  0.0160  0.0120  0.0100

Fig. 5. Explanation of the dimensions of the gait PFV representation.

Rank1 and Rank5 performance are the standard measures of the correct recognition rate. Rank1 performance means the percentage of probes for which the correct subject appears in the first place of the retrieved rank list, and Rank5 means the percentage for which the correct subject appears in any of the first five places of the retrieved rank list.
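For concreteness, a small sketch of how Rank-k recognition rates of this kind could be computed from a probe-to-gallery distance matrix; the function name and the choice to rank distinct gallery subjects are our assumptions:

```python
import numpy as np

def rank_k_rate(dist, probe_labels, gallery_labels, k):
    """dist: (n_probe, n_gallery) distance matrix. Fraction of probes whose correct
    subject appears among the k nearest distinct gallery subjects."""
    hits = 0
    for i, row in enumerate(dist):
        ranked = []
        for j in np.argsort(row):              # walk the rank list, keeping distinct subjects
            if gallery_labels[j] not in ranked:
                ranked.append(gallery_labels[j])
            if len(ranked) == k:
                break
        hits += probe_labels[i] in ranked
    return hits / len(dist)
```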

5.3. Analysis of PFV representation

To better understand the properties of the dimensions of the gait PFV representation, Fig. 5 gives an intuitive explanation. Locations having the same periodicity characteristic cluster into ellipses, such as the background region, the leg swing region, the hand swing region, the body region, and so on. The direction with the largest variance among all sample points is shown by the diagonal line labeled D1, which represents the first dimension of the PFVs; D2 and D3 are the directions with the second and third largest variance among all sample points. The first dimension represents the difference between foreground and background, and the remaining dimensions represent differences between body parts, such as the legs, hands, and torso. Table 3 lists the first ten eigenvalues in PFV training. The largest eigenvalue accounts for nearly 70% of the total energy, and the proportion then drops rapidly to around 7.5% for the second eigenvalue. The remaining eigenvalues each carry only a small share of the energy compared with the largest one, but they decrease slowly.

Like the gait averaged image [19], the PFV representation does not require phase estimation. However, the length of the gait sequence used in the DFT, as described in Section 3.2, has a certain impact on the recognition rate. If the length of the gait sequence is exactly an integer multiple of T, the period of gait, then the corresponding GFV is called the expected GFV. To evaluate the impact of sequence length on the recognition rate, we first compute all the expected GFVs (one gait for each individual), which are denoted by z_j. Then, for each gait, we vary the sequence length and obtain the corresponding GFV, denoted by \tilde{z}_{i,N}, where N represents the sequence length. The Euclidean distance between \tilde{z}_{i,N} and z_j with i = j is called the within-class variation, and the Euclidean distance between \tilde{z}_{i,N} and z_j with i \neq j is called the between-class variation.

Fig. 6. The change of the ratio of within-class variation to averaged between-class variation as sequence length increases, for the ith gait sequence.

For each gait sequence, the ratio of the within-class variation to the averaged between-class variation is an important measure of its discriminative power. For the ith gait sequence, Fig. 6 shows how this ratio changes as the sequence length increases. From this figure, we can see that the sequence length should be selected by two criteria: 1) it should be close to an integer multiple of T; and 2) it should be long enough, typically more than three or four gait cycles, to minimize the within-class variation.
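A brief sketch of this ratio, assuming the expected GFVs are collected in a list indexed by individual; the function name is hypothetical:

```python
import numpy as np

def variation_ratio(z_tilde_i, expected_gfvs, i):
    """Ratio of within-class variation to averaged between-class variation for the ith gait."""
    within = np.linalg.norm(z_tilde_i - expected_gfvs[i])   # distance to its own expected GFV
    between = [np.linalg.norm(z_tilde_i - z) for j, z in enumerate(expected_gfvs) if j != i]
    return within / np.mean(between)
```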

6. Conclusion and future work

In this paper, a new gait representation is proposed for individual recognition. Using DFT and PCA, gait sequences are represented in PFV form. Based on the PFV representation of gaits, PCA+DLA is then applied to extract the discriminative information for recognition. The gait data is finally compressed into a Gait Feature Vector (GFV), which shows competitive discriminative power. Our future work includes multimodal biometric recognition [33]: by integrating more individual modalities, e.g., face, gait, and fingerprint, the recognition accuracy is expected to improve significantly.

References

[1] Xuelong Li, S.J. Maybank, Shuicheng Yan, Dacheng Tao, Dong Xu, Gait components and their application to gender recognition, IEEE Trans. Syst., Man, Cybern., Part C 38 (2) (2008) 145–155.
[2] M.S. Nixon, J.N. Carter, Advances in automatic gait recognition, Proc. Int. Conf. Autom. Face Gesture Recognition (2004) 139–146.
[3] D. Gavrila, The visual analysis of human movement: a survey, Comput. Vision Image Understanding 73 (1) (1999) 82–98.
[4] M. Turk, A. Pentland, Face recognition using eigenfaces, Proc. Conf. Comput. Vision Pattern Recognition (1991) 586–591.
[5] A.K. Jain, L. Hong, S. Pankanti, R. Bolle, An identity verification system using fingerprints, Proc. IEEE 85 (9) (1999) 1365–1388.
[6] J. Daugman, High confidence visual recognition of persons by a test of statistical independence, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1148–1161.
[7] Y. Qi, B.R. Hunt, A multiresolution approach to computer verification of handwritten signatures, IEEE Trans. Image Process. 4 (6) (1995) 870–874.
[8] C. BenAbdelkader, R. Cutler, L. Davis, Stride and cadence as a biometric in automatic person identification and verification, Proc. Int. Conf. Autom. Face Gesture Recognition (2002) 372–377.
[9] D. Cunado, M.S. Nixon, J.N. Carter, Automatic extraction and description of human gait models for recognition purposes, Comput. Vision Image Understanding 90 (1) (2003) 1–41.
[10] C.Y. Yam, M.S. Nixon, J.N. Carter, Automated person recognition by walking and running via model-based approaches, Pattern Recognition 37 (5) (2004) 1057–1072.
[11] D.K. Wagg, M.S. Nixon, On automated model-based extraction and analysis of gait, Proc. Int. Conf. Autom. Face Gesture Recognition (2004) 11–16.
[12] A. Kale, A. Sundaresan, A.N. Rajagopalan, N.P. Cuntoor, A.K. Roy-Chowdhury, V. Kruger, R. Chellappa, Identification of humans using gait, IEEE Trans. Image Process. 13 (2004) 1163–1173.
[13] A. Sundaresan, A.R. Chowdhury, R. Chellappa, A hidden Markov model based framework for recognition of humans from gait sequences, Proc. IEEE Int. Conf. Image Process. (2003) 143–150.
[14] L. Wang, T. Tan, H. Ning, W. Hu, Silhouette analysis-based gait recognition for human identification, IEEE Trans. Pattern Anal. Mach. Intell. 25 (12) (2003) 1505–1518.
[15] I.R. Vega, S. Sarkar, Statistical motion model based on the change of feature relationships: human gait-based recognition, IEEE Trans. Pattern Anal. Mach. Intell. 25 (10) (2003) 1323–1328.
[16] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, K.W. Bowyer, The humanID gait challenge problem: data sets, performance, and analysis, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2) (2005) 162–177.
[17] S.D. Mowbry, M.S. Nixon, Automatic gait recognition via Fourier descriptors of deformable objects, Proc. Conf. Audio Visual Biometric Person Authentication (2003) 566–573.
[18] A. Veeraraghavan, A.R. Chowdhury, R. Chellappa, Matching shape sequences in video with applications in human movement analysis, IEEE Trans. Pattern Anal. Mach. Intell. 27 (12) (2005) 1896–1909.
[19] J. Han, B. Bhanu, Individual recognition using gait energy image, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2) (2006) 316–322.
[20] A.F. Bobick, J.W. Davis, The recognition of human movement using temporal templates, IEEE Trans. Pattern Anal. Mach. Intell. 23 (3) (2001) 257–267.
[21] Tianhao Zhang, Dacheng Tao, Jie Yang, Discriminative locality alignment, Proc. ECCV, Part I (2008) 725–738.
[22] Dacheng Tao, Xuelong Li, X.D. Wu, S.J. Maybank, Geometric mean for subspace selection, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) (2009) 260–274.
[23] Wei Liu, Dacheng Tao, J.Z. Liu, Transductive component analysis, Proc. IEEE Int. Conf. Data Mining (ICDM) (2008) 433–442.
[24] Xuelong Li, S. Lin, Shuicheng Yan, Dong Xu, Discriminant locally linear embedding with high-order tensor data, IEEE Trans. Syst., Man, Cybern., Part B: Cybern. 38 (2) (2008) 342–352.
[25] Dacheng Tao, Xuelong Li, X.D. Wu, S.J. Maybank, General tensor discriminant analysis and Gabor features for gait recognition, IEEE Trans. Pattern Anal. Mach. Intell. 29 (10) (2007) 1700–1715.
[26] Dong Xu, Shuicheng Yan, S. Lin, H.J. Zhang, Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval, IEEE Trans. Image Process. 16 (11) (2007) 2811–2821.
[27] Dacheng Tao, Xuelong Li, X.D. Wu, S.J. Maybank, Tensor rank one discriminant analysis—a convergent method for discriminative multilinear subspace selection, Neurocomputing 71 (10–12) (2008) 1866–1882.
[28] Dacheng Tao, Xuelong Li, X.D. Wu, S.J. Maybank, Elapsed time in human gait recognition: a new approach, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP) (2006) 177–180.
[29] P.S. Huang, C.J. Harris, M.S. Nixon, Recognizing humans by gait via parametric canonical space, Artificial Intelligence in Engineering 13 (1999) 359–366.
[30] A.M. Martinez, A.C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2) (2001) 228–233.
[31] I. Joliffe, Principal Component Analysis, Springer-Verlag, 1986.
[32] R.A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936) 179–188.
[33] Tianhao Zhang, Xuelong Li, Dacheng Tao, Jie Yang, Multimodal biometrics using geometry preserving projections, Pattern Recognition 41 (3) (2008) 805–813.

Rong Hu received the BS degree in electronics and information engineering from Huazhong University of Science & Technology (HUST), Wuhan, China, in 2004 and the MS degree in electronics and information engineering from HUST in 2006. He is now a PhD candidate in the Digital Video and Communication Laboratory, HUST. His research interests include computer graphics, computer vision, and pattern recognition.

Wei Shen received the BS degree in electronics and information engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2007. He is currently a PhD candidate in the Digital Video and Communication Laboratory, HUST.

Hongyuan Wang is a professor in the Department of Electronics and Information Engineering at Huazhong University of Science and Technology (HUST). From 1984 to 1985, he worked at the University of Oklahoma as a visiting scholar. His current research areas include digital video communication and digital signal processing.