
Performance Analysis for Gait in Camera Networks

Michela Goffredo, Imed Bouchrika, John N. Carter and Mark S. Nixon

ECS, University of Southampton (UK)
{mg2,ib04r,jnc,msn}@ecs.soton.ac.uk

ABSTRACT

This paper deploys gait analysis for subject identification in multi-camera surveillance scenarios. We present a new method for viewpoint independent markerless gait analysis that does not require camera calibration and works with a wide range of walking directions. These properties make the proposed method particularly suitable for gait identification in real surveillance scenarios, where people and their behaviour need to be tracked across a set of cameras. Tests on 300 synthetic and real video sequences, with subjects walking freely along different directions, have been performed. Since the choice of the cameras' characteristics is a key point in the development of a smart surveillance system, the performance of the proposed approach is measured with respect to different video properties: spatial resolution, frame rate, data compression and image quality. The obtained results show that markerless gait analysis can be achieved without any knowledge of the camera's position or the subject's pose. The extracted gait parameters allow recognition of people walking from different views, with a mean recognition rate of 92.2%, and confirm that gait can be effectively used for subjects' identification in a multi-camera surveillance scenario.

Keywords

Gait Analysis, Biometrics, Multi-view, Surveillance, Object Handover.

1. INTRODUCTION

Activity monitoring for security purposes and workflow control in critical infrastructures can be successfully achieved with a distributed network of video sensors, instead of one high-resolution camera with a limited field of view. Multiple cameras, in fact, provide a solution for wide and complex areas, where overlapping and non-overlapping views allow more accurate surveillance of human action. The aim of a smart surveillance system is to automatically track people across a network of cooperating active cameras and detect abnormal behaviours. In this context, "object handover" is defined as the subject's identification and tracking across a distributed network of cameras, which can have both overlapping and non-overlapping views.

Several strategies have been employed for coordinating subject tracking between multiple overlapping camera views [5, 27]. These assume that 3D ground calibration information is available for each camera, or estimate it by analysing training data. Recently there has been some interest in tracking objects between blind regions [21, 23, 26]. These methods make use of Kalman filtering assuming the ground plane is known [9], or assume that the transition models are known or hand-labelled [13, 37]. Some approaches recover the relative camera positions and the subject's movement across them with a statistical approach [28, 40]: two cameras are considered connected if objects seen departing in one camera are seen arriving in the other, and the statistical dependence is characterised by the distribution of observation transformations between cameras, such as departure-to-arrival transition times.

While there is a large number of papers on tracking between blind regions, few works address the issue of subject identification across a network of video sensors [7, 14, 25, 31, 39]. Appearance information is indeed one of the most popular solutions for the object handover issue, since it is relatively robust to size and orientation changes [7, 14, 25, 31]. Possible appearance features are colour templates, histograms, moments, signatures such as dominant colours, and partitive colour layouts. Since illumination, camera distortion and object resolution change in a multi-camera setting, these approaches include inter-camera distortion and illumination compensation [7, 14, 22]. However, in some environments, such as military or industrial scenarios where everyone wears identical clothes, appearance information cannot be applied. Furthermore, colour-based approaches work over short time periods and are not suitable for long-term workflow control. To this end, Stillman et al. used face identification in a multimodal approach where acoustic and visual data are combined [39]. However, in typical surveillance applications cameras are usually located far away from the subjects, which results in low-resolution face images. Another concern is face orientation: most face-based methods work only for frontal images, and identification accuracy quickly decreases even with slight orientation differences.

In this paper, we propose gait analysis as a solution for subjects' identification across a network of cameras.


Recognizing individuals by the way they walk is a particularly challenging research area, where the potential for personal identification is supported by a rich literature, including medical and psychological studies [11, 30]. Its complete unobtrusiveness, requiring no subject cooperation or contact for data acquisition, makes gait particularly attractive for identification purposes in camera handover. State-of-the-art gait recognition techniques can be divided into 3D [4, 29] and 2D approaches [20, 33, 45]. In the first group, identification relies on parameters extracted from the 3D limb movement; these methods use a large number of digital cameras, and the 3D reconstruction is achieved after a camera calibration process. On the other hand, 2D gait biometric approaches extract explicit features describing gait by means of human body models [6, 34] or silhouette shape [10, 41, 43]. A rich variety of data has been collected for the evaluation of 2D gait biometrics. The widely used and compared gait recognition databases include the University of South Florida [35], Carnegie Mellon University [17] and University of Southampton [38] data. The majority of methods and databases in the literature use a single camera positioned at a specific orientation with respect to the subject's walking direction (generally capturing the walk from the lateral view), and a large number of papers on gait recognition have been published [19, 33]. In real surveillance scenarios, however, we need a system that operates in an unconstrained environment, where there may be no information regarding the relative position between the camera and the walking subject [44]. 2D view independent gait identification is a recent research area: the approaches found in the literature either extract gait parameters which are independent of the human pose [3, 42] by analysing the silhouette shape over time, or aim at synthesising the sagittal view of the human body from any other arbitrary view [16, 24]. The latter choice is justified by the fact that the lateral view has proven recognition capability in a great number of works [20, 33, 45]. However, these methods need information on the camera parameters [24] or rely on the use of reflective markers on the body [16].

In this paper, a novel 2D markerless view independent gait analysis algorithm is presented and tested to assess its applicability to object handover in multi-camera scenarios. The method needs neither camera calibration nor prior knowledge of the subject's pose. Since the choice of the cameras' characteristics is a key point in the development of a smart surveillance system, the performance of the proposed approach is measured with respect to different video properties.

2. THEORY

The proposed method for view independent gait analysis is based on two consecutive stages: the markerless joints' estimation from the image sequence and the view-point rectification. Gait parameters are then used for identification.

2.1 Markerless Joints' Position Estimation

Let S(x, y, t_ϕ) be the RGB frame (of size R×C pixels) at time T = [t_1, t_2, ..., t_ϕ, ..., t_F], where (x_1, y_1) is the top-left corner of the image. A background subtraction method based on thresholding the three components of the YUV colour space yields a binary image, and the pixels (x_sil, y_sil) containing the human silhouette are selected as the object with maximum area [18]. The lower limb pose estimation algorithm is based on the proportions of the human body segments, following the medical results of anatomical studies [12]:

\[
\begin{cases}
y'_{hip} = \min(y_{sil}) + 0.50\,H\\
y'_{knee} = \min(y_{sil}) + 0.75\,H\\
y'_{ankle} = \min(y_{sil}) + 0.90\,H
\end{cases}
\tag{1}
\]

where H is the silhouette's height.
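As a concrete illustration of Equation (1), the sketch below (a hypothetical helper, assuming the silhouette is available as a binary numpy mask with the image origin at the top-left) computes the three landmark rows from the anatomical proportions:

```python
# Sketch of the anatomical-proportion landmarks of Eq. (1).
# `silhouette` is assumed to be a binary HxW array whose nonzero
# pixels are the subject; y grows downwards (image coordinates).
import numpy as np

def landmark_rows(silhouette: np.ndarray) -> dict:
    ys = np.nonzero(silhouette)[0]        # row indices of silhouette pixels
    top = int(ys.min())                   # min(y_sil): top of the silhouette
    H = int(ys.max()) - top               # silhouette height in pixels
    return {
        "hip":   top + int(0.50 * H),     # y'_hip
        "knee":  top + int(0.75 * H),     # y'_knee
        "ankle": top + int(0.90 * H),     # y'_ankle
    }
```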

Subsequently, the shins and thighs are analysed separately. Let X̂ be the set of horizontal coordinates where S(X̂, y'_knee, t̂_δ) = 1. The subset of frames

\[
\hat{T} = [\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_\delta, \ldots, \hat{t}_D], \quad D \le F
\tag{2}
\]

where the legs do not overlap consists of those frames in which X̂ forms two single connected components, each larger than 0.02·H pixels. For each frame t̂_δ, the shin extraction algorithm, based on a linear approximation of the skeleton of the portion of image containing the lower leg, is applied to S(x, y, t̂_δ). The sub-images containing the shins (labelled S_shin1 and S_shin2) are extracted from

\[
S_{lower\,leg}(x, y, \hat{t}_\delta) =
\begin{cases}
S(x, y_s, \hat{t}_\delta) & \text{if } y_s \in y_{shin}\\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]

where the vertical coordinates belonging to the two shins, y_shin = [y_1, y_2, ..., y_s, ..., y_S], are defined as

\[
y_{shin} = \left[\, y'_{knee},\; y'_{knee}+1,\; \cdots,\; \min\!\left(y'_{ankle},\; y_{overlap}-1\right) \right]
\tag{4}
\]

which accounts for possible feet overlap: y_overlap is the vertical coordinate where the set of corresponding horizontal coordinates x_overlap belongs to a single object larger than 0.02·H pixels. The left and right shins are therefore defined by the pixels with coordinates

\[
x^{skel}_{shin_\ell}(\hat{t}_\delta) = [x_{1\ell}, x_{2\ell}, \ldots, x_{s\ell}, \ldots, x_{S\ell}], \qquad
y^{skel}_{shin_\ell}(\hat{t}_\delta) = y_{shin}, \qquad \ell = \{1, 2\}
\tag{5}
\]

where

\[
x_{s\ell} = \sum_{j=1}^{C} x_j \cdot S_{shin_\ell}(x_j, y_s, \hat{t}_\delta) \Big/ \sum_{j=1}^{C} S_{shin_\ell}(x_j, y_s, \hat{t}_\delta)
\tag{6}
\]

The shins can then be linearly approximated by a first-order polynomial with coefficients

\[
p_\ell\!\left(x^{skel}_{shin_\ell}, \hat{t}_\delta\right) = p_{\ell 0}(\hat{t}_\delta) + p_{\ell 1}(\hat{t}_\delta) \cdot x^{skel}_{shin_\ell}(\hat{t}_\delta)
\tag{7}
\]

Therefore, the angle between each shin and the vertical axis (called the knee angle) at frame t̂_δ is α_ℓ(t̂_δ) = π − arctan[p_{ℓ1}(t̂_δ)]. The procedure is applied to each frame in the subset T̂ = [t̂_1, t̂_2, ..., t̂_δ, ..., t̂_D]. In order to estimate α_1 and α_2 also when the legs overlap, the extracted shins are labelled 'left' and 'right' according to the following rationale, without loss of generality: the subject walks for at least two gait cycles; in t_1 the shin labelled shin_1 belongs to the right leg; and the labels swap after the legs overlap. Subsequently, a 3rd-order polynomial interpolation of α_1 and α_2 is applied, so that the knee angles are obtained over the whole video sequence. This choice was determined experimentally.
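The shin fit of Equations (6) and (7) reduces to a row-wise centroid followed by a first-order polynomial fit. A minimal sketch, assuming a binary numpy sub-image containing a single shin (the helper name and input format are illustrative, not the authors' code):

```python
# Sketch of Eqs. (6)-(7): the shin skeleton is the centroid of each
# silhouette row, and the knee angle is alpha = pi - arctan(p1),
# where p1 is the slope of the fitted first-order polynomial.
import numpy as np

def knee_angle(shin_mask: np.ndarray) -> float:
    rows = np.nonzero(shin_mask.any(axis=1))[0]               # rows with shin pixels
    xs = [np.nonzero(shin_mask[r])[0].mean() for r in rows]   # Eq. (6) centroids
    p1, _ = np.polyfit(xs, rows, 1)                           # Eq. (7) linear fit
    return float(np.pi - np.arctan(p1))                       # angle vs. vertical axis
```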


The upper leg orientation is extracted at every frame T = [t_1, t_2, ..., t_ϕ, ..., t_F] with a coarse-to-fine hip estimation procedure. First, the hip positions are obtained with

\[
\begin{cases}
x'_{hip_\ell} = \dfrac{1}{P} \sum_{j=1}^{P} \tilde{x}_j + (2\ell-3) \cdot H \cdot \mu \cdot 10^{-3}\\[6pt]
y'_{hip_\ell} = y'_{hip} \cdot (2\ell-3) \cdot \dfrac{\tilde{x}_P - \tilde{x}_1}{2} \cdot \sin(0.3\,\mu)
\end{cases}
\tag{8}
\]

where X̃ = [x̃_1, x̃_2, ..., x̃_j, ..., x̃_P] is the subset of P (P ≤ C) horizontal coordinates such that S(X̃, y'_hip, t_i) = 1. Equation (8) relates the horizontal hip positions to the walking direction μ, calculated with respect to the horizontal axis of the image reference system. These relationships were obtained by regression analysis of the 3D Georgia Tech motion capture data [1], considering different camera positions.

The angle μ is estimated as the inclination of the straight line approximating the heel strike points, whose coordinates are

\[
(x_{str}, y_{str}) = (x_{feet}, y_{feet}) \;\Big|\; \sum_{i=1}^{F} SK(x_{feet}, y_{feet}, t_i) > \tau
\tag{9}
\]

where SK is the skeleton [32] of the image, x_feet and y_feet are the coordinates of the portions of silhouette belonging to the feet, and the threshold τ is set automatically so that at least 4 heel strikes are extracted (in accordance with the hypotheses listed above). Subsequently, a finer hip pose estimate is obtained by linearly approximating the thighs with a first-order polynomial whose coefficients are

\[
\begin{cases}
q_{\ell 0}(t_\varphi) = \dfrac{y^{m}_{shin_\ell} \cdot x'_{hip_\ell} - y'_{hip_\ell} \cdot x^{m}_{shin_\ell}}{x'_{hip_\ell} - x^{m}_{shin_\ell}}\\[6pt]
q_{\ell 1}(t_\varphi) = \dfrac{1}{2}\left(a'_\ell + \dfrac{y'_{hip_\ell} - q_{\ell 0}}{x'_{hip_\ell}}\right)
\end{cases}
\tag{10}
\]

where y^m_shinℓ = min(y_shin), x^m_shinℓ is the mean value of the corresponding horizontal coordinates, and a'_1 and a'_2 are the slopes of the straight lines that approximate, respectively, the left and right edges of the portions of silhouette belonging to the thighs. Therefore, the angle between each thigh and the vertical axis at frame t_ϕ is γ_ℓ(t_ϕ) = π − arctan[q_{ℓ1}(t_ϕ)]. Ultimately, the thighs are labelled 'left' or 'right' in accordance with the corresponding shin labelling. The angles α_1, α_2, γ_1 and γ_2 are then corrected by projecting the limb trajectories on the lateral plane, as explained next.
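The walking-direction estimate of Equation (9) can be read as a temporal accumulation: foot pixels that persist across many frames mark the heel strikes, and a line fitted through them gives μ. A sketch under that reading (the inputs and helper name are assumptions, not the authors' implementation):

```python
# Sketch of Eq. (9): accumulate foot-region pixels over the frames,
# threshold the accumulator to find heel-strike points, and take the
# inclination of the straight line fitted through them as mu.
import numpy as np

def walking_direction(foot_masks, tau: float) -> float:
    accum = np.zeros(foot_masks[0].shape, dtype=int)
    for mask in foot_masks:                  # one binary foot mask per frame
        accum += mask.astype(int)
    ys, xs = np.nonzero(accum > tau)         # heel-strike candidates
    slope, _ = np.polyfit(xs, ys, 1)         # line through the strike points
    return float(np.arctan(slope))           # angle w.r.t. the image x-axis
```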

2.2 View-point Independent Rectification

The method proposed by the authors in [16] is based on four main assumptions: the nature of human gait is cyclic; subjects walk along a straight line; the distances between the bone joints are constant; and the articulated leg motion is approximately planar. Under these assumptions, multiple periods of linear gait motion appear analogous to a single period viewed from many cameras related by linear translation, and the positions of the leg joints lie in an auto-epipolar configuration. If j_ℓi is the set of joint positions of leg ℓ = {1, 2} at the i-th frame in the image reference system, the relationship between j_ℓi and the corresponding positions J_ℓ in the world space is j_ℓi × P_i · J_ℓ = 0, where P_i = [R^T_e, −i·e_0] and R^T_e is the rotation matrix aligning the epipolar vector e_0 with the horizontal axis X. Then,

\[
j_{\ell i} = P_i
\begin{pmatrix} 1 & 0\\ 0 & H_V^{-1} \end{pmatrix}
\begin{pmatrix} 1 & 0\\ 0 & H_V \end{pmatrix}
J_\ell = H \cdot J_\ell
\tag{11}
\]

having expressed the limb plane transformation matrix with H_V, so that the two cross-section plane lines are centred and normalised with respect to the Y and Z axes and parallel with Y. By assuming that the lengths of the articulated limbs, D_ℓ² = Δj_ℓi^T Δj_ℓi, are constant over all frames, the pose difference vectors for the limb segments at two consecutive frames, Δj_ℓi and Δj_ℓi+1, are related by

\[
\Delta j_{\ell i}^{T} \cdot H^{T} H \cdot \Delta j_{\ell i} = \Delta j_{\ell i+1}^{T} \cdot H^{T} H \cdot \Delta j_{\ell i+1}
\tag{12}
\]

After recovering the fronto-parallel structure of the subject's gait, the representation of the leg joint functions [J_ℓx(t), J_ℓy(t)] is found by fitting a modified Fourier series to the data, with fixed fundamental frequency f_0 and period T:

\[
J_{\ell x}(t) = v_x t + \sum_{k=1}^{n} A_k \cos\!\left(2\pi k f_0 \left(t + \frac{(\ell-1)\,T}{2}\right) + \phi_k\right) + J_{\ell x_0}
\tag{13}
\]

and analogously for J_ℓy(t). Thus, the projection of the leg joints on the lateral plane is obtained with an optimised procedure:

\[
\breve{J}_\ell(t) = \left[\, h_1 \;\; h_2 \;\; h_3 \,\right] g\!\left(t + \frac{(\ell-1)\,T}{2};\; f_0, D_\ell, v_x, v_y, F\right)
\tag{14}
\]

where g(t) is the bilateral Fourier series function with coefficients F, and h are the values of the inverse normalisation transform matrix. Therefore, starting from a video sequence from a single camera and without any calibration, the proposed markerless system, in conjunction with [16], estimates the gait parameters projected on the lateral plane.
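Once f_0 is fixed, the model of Equation (13) is linear in its unknowns, since each harmonic A_k·cos(2πkf_0·t + φ_k) expands into a cosine/sine pair. The following least-squares sketch illustrates the fit for one joint coordinate (an illustration of the model only, not the authors' optimised procedure):

```python
# Sketch of the trajectory model of Eq. (13): linear drift v_x * t plus
# a truncated Fourier series with fundamental frequency f0. With f0
# fixed, the fit is an ordinary linear least-squares problem.
import numpy as np

def fit_joint_trajectory(t: np.ndarray, jx: np.ndarray, f0: float, n: int = 3):
    cols = [t, np.ones_like(t)]                      # v_x * t + J_x0
    for k in range(1, n + 1):                        # each A_k, phi_k becomes
        cols.append(np.cos(2 * np.pi * k * f0 * t))  #   a cosine/sine pair
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    A = np.stack(cols, axis=1)
    coeffs, *_ = np.linalg.lstsq(A, jx, rcond=None)  # least-squares solution
    return coeffs                                    # [v_x, J_x0, a_1, b_1, ...]
```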

2.3 View-point Invariant Gait Identification

The processing and derivation of good gait features from this trajectory-based data is still an unsolved and challenging problem, due to the complexity of the human visual perception system as well as the compound nature of gait motion, inherent in the numerous variables associated with it, including kinematics, kinetics and anthropometrics [8]. An important issue in gait recognition is the derivation of appropriate features that can capture the discriminative individuality of a subject's gait. Such features should meet crucial criteria such as robustness and invariance to weather conditions, clothing and operating conditions.

In order to identify a subject by their gait, we derive the angular measurements, anthropometric measurements and trunk spatial displacement which best describe the gait kinematics. The use of angular motion is very common in gait analysis and recognition; the angles of the joints, including the hip and the knee, are considered the most important kinematics of the lower limbs. The anthropometric measurements include the subject's height and the lengths of the lower limbs. Feature selection is employed to retain as many discriminative cues as possible whilst removing redundant and irrelevant gait features which may degrade the recognition rate. It is practically infeasible to run an exhaustive search over all possible combinations of features to obtain the optimal subset for recognition, due to the high dimensionality of the feature space.


For this reason, we employed the Adaptive Sequential Forward Floating Selection (ASFFS) search algorithm. The algorithm uses a validation-based evaluation criterion, which finds the subset of features that minimises the classification errors and ensures good separability between the different classes. In contrast to the voting scheme used in the KNN, the evaluation function uses different weights w to signify the importance of the nearest neighbours. The probability score for a sample s_c to belong to class c is expressed as

\[
f(s_c) = \frac{\sum_{i=1}^{N_c-1} z_i w_i}{\sum_{i=1}^{N_c-1} w_i}
\tag{15}
\]

where N_c is the number of instances in class c, and the weight w_i for the i-th nearest instance is related to proximity as

\[
w_i = (N_c - i)^2
\tag{16}
\]

The value of z_i is defined as

\[
z_i =
\begin{cases}
1 & \text{if } nearest(s_c, i) \in c\\
0 & \text{otherwise}
\end{cases}
\tag{17}
\]

where the nearest(s_c, i) function returns the i-th nearest instance to the sample s_c. The Euclidean distance metric is employed to find the nearest neighbours.
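A minimal sketch of this weighted scoring, assuming the gallery is given as numpy arrays (the function and variable names are illustrative):

```python
# Sketch of Eqs. (15)-(17): the Nc-1 nearest neighbours of a test
# sample are weighted by proximity rank, w_i = (Nc - i)^2, and the
# class score is the weighted fraction of neighbours belonging to c.
import numpy as np

def class_score(test, gallery_X, gallery_y, c) -> float:
    Nc = int(np.sum(gallery_y == c))              # instances in class c
    d = np.linalg.norm(gallery_X - test, axis=1)  # Euclidean distances
    order = np.argsort(d)[: Nc - 1]               # Nc-1 nearest neighbours
    w = (Nc - np.arange(1, Nc)) ** 2              # Eq. (16)
    z = (gallery_y[order] == c).astype(float)     # Eq. (17)
    return float(np.sum(z * w) / np.sum(w))       # Eq. (15)
```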

3. EXPERIMENTAL TESTS

The proposed approach for subject identification from different view-points has been tested on both synthetic and real data. Different video properties have been considered in order to compare the performance of the method with respect to the cameras' properties.

3.1 Performance Analysis of Gait Feature Extraction

A quantitative validation of the gait analysis has been obtained in a virtual environment: synthetic video sequences (500x490 pixels, 30 fps) of a humanoid walking for one gait cycle were created with Poser 7® (by e-frontier) in order to provide ground truth for validation purposes. Different spatial resolutions, image qualities and compressions have been considered. A range of spatial resolutions (250x245 and 163x163 pixels) was obtained by sub-sampling the original images by factors of 2 and 3. Table 1 shows the comparative results in terms of mean (m_e) and standard deviation (σ_e) of the joint angles compared with the ground truth over the gait cycle. The results, with a mean value of 2.63±2.61 degrees, are particularly encouraging and are of the same magnitude as those obtained with 3D markerless systems or complex 2D model-based ones [9][12]. As expected, reduced image resolution and silhouette height degrade the algorithm's performance; on the other hand, a 50-pixel-high silhouette still yields an acceptable result. Furthermore, Table 1 shows the mean time for processing one frame (Matlab 7® code run on a 2 GHz Intel CPU): the proposed model-free approach extracts the kinematic information at a computational cost per silhouette pixel lower than 2·10^-4 s.

Image resolution / mean sil. height   m_e (deg)   σ_e (deg)   Proc. time (s/frame)
500x490 / 200                         2.04        2.12        1.44
250x245 / 100                         2.80        2.15        0.97
163x163 / 50                          3.06        3.56        0.37

Table 1: Angle errors (in terms of mean and standard deviation) on synthetic gait with different spatial resolutions.

σ_n / PSNR (dB)   m_e (deg)   σ_e (deg)
0 / ∞             2.04        2.12
5 / 28.12         2.30        2.13
25 / 24.25        3.01        3.38
50 / 21.89        3.30        3.72
100 / 15.23       3.25        3.68
150 / 7.01        4.25        3.49

Table 2: Angle errors (in terms of mean and standard deviation) on synthetic gait with different zero-mean Gaussian noise.

In addition, to simulate background subtraction imprecision, zero-mean Gaussian noise has been added to the synthetic images. The standard deviation (σ_n) of the added noise varies from 0 (original data) to 150, corresponding to PSNR values from ∞ down to 7.01 dB. The results in Table 2 show m_e and σ_e with respect to the different PSNR values. As expected, the error increases with the level of noise added to the frames, but it remains below 5 degrees even at high noise levels. This robustness to noise indicates how far the proposed method depends on accurate silhouette extraction, and thus supports extending the approach to a real context. Since images from a network of cameras may be compressed for data storage reasons, different JPEG compressions have also been compared. Seven quality factors between 100 and 5 have been used for compressing the synthetic images, yielding compression ratios CR = S_u/S_c between 40 and 146, where S_u and S_c are the sizes of the uncompressed and compressed images respectively. Figure 1 shows m_e with respect to the image compression, together with the value obtained on uncompressed images. The mean error increases rapidly with compression and then remains roughly constant at about 3.3 degrees for JPEG qualities between 75 and 25.
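A sketch of this degradation protocol as we read it (8-bit images assumed; the helper is illustrative):

```python
# Sketch of the noise experiment: add zero-mean Gaussian noise with
# standard deviation sigma_n to an 8-bit frame and report the PSNR.
import numpy as np

def degrade(frame: np.ndarray, sigma_n: float):
    f = frame.astype(float)
    noisy = np.clip(f + np.random.normal(0.0, sigma_n, f.shape), 0, 255)
    mse = float(np.mean((noisy - f) ** 2))
    psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
    return noisy.astype(np.uint8), psnr
```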

The proposed method for gait analysis has also been tested on real data from the CASIA-B database [2], where 12 young healthy subjects walk along straight lines at 6 different camera orientations (36°, 54°, 72°, 90°, 108°, 126°); 90° corresponds to the side-view walking direction, as shown in Figure 2. The video sequences have a spatial resolution of 320x240 pixels and a frame rate of 25 fps. The method has been applied to these video sequences: the limb pose has been estimated frame by frame, and the hip and knee angles have been extracted for each camera position and each subject. Figure 2 shows an example of limb pose estimation in the 6 directions. The frames have been compressed with different JPEG qualities and the joint positions have been compared with those extracted from the uncompressed images.


Figure 2: Markerless joint extraction at different viewpoints.


Figure 1: Mean angle error on synthetic gait with different image compressions.

The mean joint distance is then measured as

\[
m_d = \frac{\sum_{i=1}^{I}\sum_{v=1}^{V}\sum_{j=1}^{J} \sqrt{\left(x^{c}_{ivj} - x^{u}_{ivj}\right)^2 + \left(y^{c}_{ivj} - y^{u}_{ivj}\right)^2}}{I \cdot V \cdot J}
\tag{18}
\]

where the superscripts u and c indicate the joint positions from the uncompressed and compressed images respectively, I is the number of individuals, V the number of video sequences per person and J the number of tracked joints. Figure 3 depicts m_d with respect to the image compressions. The results are in accordance with those obtained on synthetic images and show an increase in joint de-location, especially at the strongest compression levels. The results obtained with different compressions confirm the applicability of gait analysis in real surveillance scenarios, where cameras usually present JPEG compression qualities higher than 50.
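Equation (18) is simply the mean Euclidean displacement over all individuals, videos and joints; a short numpy sketch with assumed (I, V, J)-shaped coordinate arrays:

```python
# Sketch of Eq. (18): mean joint displacement between joints extracted
# from uncompressed (xu, yu) and compressed (xc, yc) sequences.
import numpy as np

def mean_joint_distance(xu, yu, xc, yc) -> float:
    return float(np.mean(np.hypot(xc - xu, yc - yu)))
```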

3.2 Viewpoint Analysis of Gait Recognition

In order to assess the performance of gait recognition from different viewpoints using a single uncalibrated camera, a set of 275 video sequences of 12 different subjects recorded at 6 viewpoints is taken from the CASIA gait database. The Correct Classification Rate (CCR) is computed using the K-nearest neighbour (KNN) classifier with the leave-one-out cross-validation rule. The KNN rule is applied at the classification phase because of its low complexity, and hence fast computation, besides the ease of comparison with other methods. In leave-one-out validation, every instance from the original sample is used for testing and is validated against the remaining observations; this is repeated for all observations in the dataset, and the recognition rate is computed as the average over all validations.
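The validation loop is straightforward; a sketch, with the classifier passed in as a callback (assumed interface):

```python
# Sketch of leave-one-out CCR: classify each sequence against all the
# others and average the rate of correct labels.
import numpy as np

def leave_one_out_ccr(X: np.ndarray, y: np.ndarray, classify) -> float:
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i          # hold out sample i
        if classify(X[mask], y[mask], X[i]) == y[i]:
            hits += 1
    return hits / len(X)
```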

To investigate the effects of the viewpoint, an initial experiment is carried out to measure the recognition rate using the non-rectified gait data. The CCR is computed separately for every viewpoint. Figure 4 shows the variation of the CCRs with respect to the different viewpoints.

Figure 3: Mean joint distance on real gait with different image compressions.

Figure 4: Gait recognition from different viewpoints using non-rectified data.

The highest recognition rate is 96% for the side view, whilst the recognition rate decreases as the walking direction departs from the lateral view. This is because of the variations in the extracted angular features across views. Therefore, non-rectified gait angular data cannot be used directly for biometric applications, as confirmed by a recent study by Goffredo et al. [15].

In the second experiment, the viewpoint rectification method described in Section 2.2 is applied to reconstruct the gait angular features in the lateral plane (i.e. the side view). Based on the set of 275 video sequences of walking subjects, a high recognition rate of 92.2% is achieved for all the viewpoints combined. It is worth noting that the database has other covariate factors, including clothing and load carriage. This signifies the importance of gait reconstruction in translating gait as a biometric into real-world surveillance and monitoring applications.

Another useful evaluation measure is the Cumulative Match Score (CMS), which was introduced by Phillips et al. in the FERET protocol [36] for the evaluation of face recognition algorithms. The measure assesses the ranking capabilities of the recognition system by producing a list of scores indicating the probability that the correct classification for a given test sample is within the top n matched class labels. Classification scores of 92.2% and 100% are achieved at the 1st and 7th rank respectively; the CMS at the 1st rank is the correct classification rate. Figure 5 shows the CMS curve of the gait identification for the rectified data.

Figure 5: CMS curve of gait identification for the rectified data.

Furthermore, a number of experiments were carried out on the same video set to investigate the algorithm's potential for data recorded at low frame rates. Lower frame rates are simulated by dropping a number of frames from every 25 frames (25 fps being the original frame rate) of the video sequences. Table 3 shows the recognition rates for data recorded at different frame rates. The performance is not greatly affected even when 40% of the frames are dropped, as the algorithm predicts the joint positions of the missing frames through data fitting.
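A sketch of the frame-dropping scheme as described (illustrative helper):

```python
# Keep `keep` of every `original` frames, e.g. keep=15 emulates 15 fps
# capture from a 25 fps sequence; the dropped joint positions are later
# recovered by the trajectory fitting of Section 2.2.
def subsample(frames, keep: int, original: int = 25):
    return [f for i, f in enumerate(frames) if (i % original) < keep]
```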

Video frame rate   Recognition rate
25 fps             92.2%
15 fps             81%

Table 3: Effect of video frame rate on the performance of gait recognition.

4. CONCLUSIONS

A smart surveillance system aims at automatically tracking people across a network of cameras and detecting abnormal behaviours. In this context, the subject's identification and tracking between the cameras is a critical point. Possible solutions found in the literature are based on colour or appearance information; however, most approaches to date are limited to short time periods and to environments where subjects do not wear uniforms.

In this paper, we have proposed gait analysis as a solution for subjects' identification across a network of cameras. Its complete unobtrusiveness, requiring no subject cooperation or contact for data acquisition, makes gait particularly attractive for identification purposes in camera handover.

A novel 2D markerless view-independent gait analysis algorithm has been presented: the method needs neither camera calibration nor prior knowledge of the subject's pose. Since the choice of the cameras' characteristics is a key point in the development of a smart surveillance system, the performance of the proposed approach has been measured with respect to different video properties. Tests on both synthetic and real video sequences allowed the performance of the proposed approach to be evaluated with respect to different spatial resolutions, frame rates, data compressions and image qualities. The obtained results show that gait analysis can be used efficiently for view-independent subjects' identification with commercially available video cameras.

5. ACKNOWLEDGMENTS

The study was supported by SCOVIS, FP7 ICT project no. 216465, and uses the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

6. REFERENCES

[1] www.cc.gatech.edu/cpl/projects/hid.

[2] CASIA Gait Database. Online, 2006.

[3] C. BenAbdelkader, L. S. Davis, and R. Cutler. Motion-based recognition of people in eigengait space. In Proc. IEEE Conf. AFG, pages 267–274, 2002.

[4] B. Bhanu and J. Han. Human recognition on combining kinematic and stationary features. In Proc. Int. Conf. AVBPA, pages 600–608, 2003.

[5] J. Black, T. Ellis, and P. Rosin. Multi view image surveillance and tracking. In Proc. Workshop on Motion and Video Computing, pages 169–174, Dec. 2002.

[6] I. Bouchrika and M. S. Nixon. Model-Based Feature Extraction for Gait Analysis and Recognition, volume 4418 of LNCS. Springer, 2007.

[7] Y. Cai, W. Chen, K. Huang, and T. Tan. Continuously tracking objects across multiple widely separated cameras. In ACCV (1), pages 843–852, 2007.

[8] T. Chau. A review of analytical techniques for gait data. Part 1: fuzzy, statistical and fractal methods. Gait Posture, 13(1):49–66, 2001.

[9] A. Chilgunde, P. Kumar, S. Ranganath, and H. WeiMin. Multi-camera target tracking in blind regions of cameras with non-overlapping fields of view. In British Machine Vision Conference, 2004.

[10] R. Collins, R. Gross, and J. Shi. Silhouette-based human identification from body shape and gait. In Proc. IEEE Conf. on Automatic Face and Gesture Recognition, pages 351–356, 2002.

[11] J. Cutting and L. Kozlowski. Recognizing friends by their walk: gait perception without familiarity cues. Bull. Psychonom. Soc., 9:353–356, 1977.

[12] W. T. Dempster and G. R. L. Gaughran. Properties of body segments based on size and weight. Am. J. Anat., 120:33–54, 1965.

[13] R. Fisher. Self-organization of randomly placed sensors. Page IV: 146 ff., 2002.

[14] A. Gilbert and R. Bowden. Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity. Pages 125–136, 2006.

[15] M. Goffredo, R. D. Seely, J. N. Carter, and M. S. Nixon. Markerless view independent gait analysis with self-camera calibration. In Proc. IEEE Conf. on Automatic Face and Gesture Recognition, 2008.

[16] M. Goffredo, N. Spencer, D. Pearce, J. N. Carter, and M. S. Nixon. Human Perambulation as a Self Calibrating Biometric, volume 4778 of LNCS. Springer, 2007.

[17] R. Gross and J. Shi. The CMU motion of body (MoBo) database. Technical Report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 2001.

[18] R. M. Haralick and L. G. Shapiro. Computer and Robot Vision, volume 1. Addison-Wesley, 1992.

[19] W. Hu, T. Tan, L. Wang, and S. Maybank. A survey on visual surveillance of object motion and behaviors. IEEE Trans. Systems, Man and Cybernetics, 34(3):334–352, 2004.

[20] P. Huang, C. Harris, and M. Nixon. Recognising humans by gait via parametric canonical space. J. Artif. Intell. Eng., 13(4):359–366, 1999.

[21] T. Huang and S. Russell. Object identification in a Bayesian context. In IJCAI97, pages 1276–1282, 1997.

[22] A. Ilie and G. Welch. Ensuring color consistency across multiple cameras. In Proc. Tenth IEEE Conf. on Computer Vision, 2:1268–1275, Oct. 2005.

[23] O. Javed, Z. Rasheed, K. Shafique, and M. Shah. Tracking across multiple cameras with disjoint views. In Proc. Ninth IEEE Conf. on Computer Vision, pages 952–957, Oct. 2003.

[24] A. Kale, A. Chowdhury, and R. Chellappa. Towards a view invariant gait recognition algorithm. In Proc. IEEE Conf. AVSS, pages 143–150, 2003.

[25] S. Kang, J. Min, and J.-K. Paik. Multiple-camera tracking system with seamless object handover. In Proc. SPIE, volume 4303, pages 27–34, 2001.

[26] V. Kettnaker and R. Zabih. Bayesian multi-camera surveillance. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2:–259, 1999.

[27] S. Khan and M. Shah. Consistent labeling of tracked objects in multiple cameras with overlapping fields of view. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1355–1360, Oct. 2003.

[28] D. Makris, T. Ellis, and J. Black. Bridging the gaps between cameras. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2:205–210, June–July 2004.

[29] L. Middleton, D. K. Wagg, A. I. Bazin, J. N. Carter, and M. S. Nixon. Developing a non-intrusive biometric environment. In Proc. IEEE Int. Conf. IROS, pages 723–728, 2006.

[30] M. P. Murray, A. B. Drought, and R. C. Kory. Walking patterns of normal men. J. Bone Joint Surgery, 46:335, 1964.

[31] C. Niu and E. Grimson. Recovering non-overlapping network topology using far-field vehicle tracking data. In ICPR (4), pages 944–949, 2006.

[32] M. Nixon and A. Aguado. Feature Extraction & Image Processing. Academic Press, 2nd edition, Dec. 2007.

[33] M. S. Nixon and J. N. Carter. Automatic recognition by gait. Proc. of the IEEE, 94(11):2013–2024, 2006.

[34] C. Orrite-Urunuela, J. del Rincon, J. Herrero-Jaraba, and G. Rogez. 2D silhouette and 3D skeletal models for human detection and tracking. In Proc. IEEE Conf. ICPR, 4:244–247, 2004.

[35] P. Phillips, S. Sarkar, I. Robledo, P. Grother, and K. Bowyer. The gait identification challenge problem: data sets and baseline algorithm. In Proc. Int. Conf. Pattern Recognition, 1:385–388, 2002.

[36] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1090–1104, 2000.

[37] A. Rahimi, B. Dunagan, and T. Darrell. Simultaneous calibration and tracking with a network of non-overlapping sensors. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1:187–194, June–July 2004.

[38] J. Shutler, M. Grant, M. S. Nixon, and J. N. Carter. On a large sequence-based human gait database. In Proc. Int. Conf. Recent Advances in Soft Computing, pages 66–72, 2002.

[39] S. Stillman and I. Essa. Towards reliable multimodal sensing in aware environments. In PUI '01: Proc. Workshop on Perceptive User Interfaces, pages 1–6, New York, NY, USA, 2001. ACM.

[40] K. Tieu, G. Dalley, and W. Grimson. Inference of non-overlapping camera network topology by measuring statistical dependence. In Proc. Tenth IEEE Conf. on Computer Vision, 2:1842–1849, Oct. 2005.

[41] G. Veres, L. Gordon, J. Carter, and M. Nixon. What image information is important in silhouette-based gait recognition? In Proc. IEEE Conf. CVPR, 2:776–782, 2004.

[42] L. Wang, T. Tan, W. Hu, and H. Ning. Automatic gait recognition based on statistical shape analysis. IEEE Trans. IP, 12(9):1120–1131, 2003.

[43] L. Wang, T. Tan, H. Ning, and W. Hu. Silhouette analysis-based gait recognition for human identification. IEEE Trans. PAMI, 25(12):1505–1518, 2003.

[44] Y. Yamada. Advanced method for improvement of obscure video image. In Proc. IEEE 33rd Annual Carnahan Conference on Security Technology, pages 440–445, 1999.

[45] R. Zhang, C. Vogler, and D. Metaxas. Human gait recognition at sagittal plane. J. Image Vision Comput., 25(3):321–330, 2007.