
Human Gait Estimation Using a Wearable Camera

Yoshihiro Watanabe, Tetsuo Hatanaka, Takashi Komuro and Masatoshi Ishikawa
Graduate School of Information Science and Technology, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
Yoshihiro [email protected]

Abstract

We focus on the growing need for a technology that can achieve motion capture in outdoor environments. The conventional approaches have relied mainly on fixed installed cameras. With this approach, however, it is difficult to capture motion in everyday surroundings. This paper describes a new method for motion estimation using a single wearable camera. We focused on walking motion. The key point is how the system can estimate the original walking state using limited information from a wearable sensor. This paper describes three aspects: the configuration of the sensing system, gait representation, and the gait estimation method.

1. Introduction

There is a growing need for ubiquitous sensing systems that are capable of estimating the position and pose of humans. This type of sensing system is expected to contribute greatly to the realization of applications in which human body motion is captured and predicted in order to provide cooperative assistance, for example, using image projection systems, robots, and mobility systems, as illustrated in Figure 1. The key requirement in such ubiquitous motion capture is that the sensing should not be restricted by location.

Conventional motion capture systems have been designed mainly to generate computer graphic animations. In typical systems, multiple cameras installed at various positions around a subject (fixed cameras) observe optical markers mounted on the subject’s joints [10, 3, 2]. Recently, a marker-less technique has been developed [9]. However, multiple fixed cameras are required to observe a single person at the same time without occlusions from the viewpoint of each camera. This approach is considered to hinder achievement of the goal described above.

A system configured of wearable sensors is a highly promising approach. For example, Vlasic et al. developed a novel motion capture system using accelerometers, gyroscopes, ultrasonic sources, and microphones [12]. However, one disadvantage is that the user is required to attach many sensors at the desired parts of the body. Also, because the positions of the body parts are measured using ultrasound, simultaneous usage of many such systems in crowded areas is difficult.

Figure 1. Human gait estimation using a wearable camera.

On the other hand, Hamaguchi et al. developed a wearable system for acquiring the walking state of a user [7]. In this system, the relative position from the user’s waist to heel is measured by using an electromagnetic tracker. The position of the user is updated by using this data when a button placed on the sole is pushed. However, the issue of the complex configuration of multiple wearable sensors still remains. There is also a system that recognizes activity by using wearable accelerometers [5]. This system cannot utilize landmarks in the environment, making it difficult to avoid accumulated errors in the user’s position.

On the other hand, a wearable-sensor technique using a single camera has been reported [4]. This is based on a method called Simultaneous Localisation and Mapping (SLAM), which estimates the camera motion and constructs a map of the surrounding environment. The report shows that the system can estimate its own location in an unknown environment using a single camera. Although this is not a technique for estimating the body pose of a human subject with a wearable camera, the framework is nevertheless promising for our purpose.

The key point of our system is to estimate the local body pose and the global position by using the minimum number of wearable sensors. In this paper, we focus on the possibility that the gait can be described as cyclic foot motions. This allows us to assume that the obtained sensor data follows a behavior based on fixed models, so that it is feasible to reconstruct the original state from the limited sensor data.

2. Motion estimation using a wearable camera

In this paper, we propose a system for motion estimation of a human subject using only a single camera. Recently, the size and the cost of sensors, including cameras, have been drastically reduced. Indeed, many mobile devices now come with such sensors installed. Therefore, it is becoming commonplace to assume that such sensors are available anywhere and anytime.

Figure 1 shows the configuration of the proposed system. The user mounts a camera on his leg, with the camera pointing downward. The camera attached to the user’s body captures changes in an image of the external environment caused by the motion. We consider this setup sufficient to estimate the gait. By utilizing the observed information in conjunction with models of human motion, our technique estimates both the current actual motion and the global position. Also, because the camera can observe the external environment, it should be possible to reduce the accumulated errors that occur in sensors such as accelerometers.

The direction of the camera is not an essential requirement. With the downward orientation, we intend to exclude moving objects around the user, including other people, to simplify the image recognition. A setup where the camera faces forward is also feasible; in that case, the amount of information available for location detection is expected to increase.

Under this configuration, the task involves the problem of pose estimation using a single camera under a situation where the motion is based on given models. The motion model is not unique and varies with time according to specific rules. The whole body motion must also be estimated from the camera pose.

3. Gait representation

The walking motion needs to be prepared in advance in order to recognize which motion causes changes in the captured images. This section describes the gait representation. In this paper, we assume that such advance knowledge of the gait is constructed for each person.

The inverted pendulum model is a well-known simple model of bipedal walking [6]. It is mainly used to create walking patterns for robots. However, it is not effective for complementing insufficient sensor data because the inverted pendulum model does not accurately describe human walking motion.

On the other hand, in the field of computer graphics, animations have been created mainly by using motion data directly obtained by motion capture systems. For example, Unuma et al. used a Fourier expansion in order to interpolate or extrapolate human walking [11]. This kind of frequency analysis is considered to be effective because most walking motions can be characterized as cyclic motion. The reported results show that walking can be represented using only low-frequency information with a small number of parameters for animation. We also employed a motion-data-based representation to take advantage of this feature.

Figure 2. Parameter definitions (world, body, and camera coordinate systems; waist position, traveling speed, angle, and relative joint positions).

The definitions of parameters are given here. In this paper, the walking state is updated with the following equation:

$$
\mu_t =
\begin{bmatrix}
x_t \\ y_t \\ z_t \\ \xi_t \\ \boldsymbol{p}_t
\end{bmatrix}
=
\begin{bmatrix}
x_{t-1} + r(\eta_{t-1})\cos(\xi_{t-1}) \\
y_{t-1} + r(\eta_{t-1})\sin(\xi_{t-1}) \\
z(\eta_{t-1}) \\
\xi_{t-1} + \dot{\xi}(\eta_{t-1}) \\
\boldsymbol{p}(\eta_{t-1})
\end{bmatrix}
\tag{1}
$$

Here, $[x_t, y_t, z_t]$ describes the waist position at time $t$, $r_t$ is the speed in the traveling direction, and $\dot{\xi}_t$ is the angular speed. The behaviors are parameterized based on cyclic motion with phase $\eta_t$. Also, the collected data indicating the relative positions of all joints, with the waist position set as the origin, is $\boldsymbol{p}_t = \{p_t^i \mid i = 1, \ldots, N_p\}$. Figure 2 illustrates the parameter definitions.
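As a minimal sketch (our own, not from the paper), one update step of equation (1) could be written as follows; the `gait` object with methods `r`, `z`, `xi_dot`, and `p` returning the learned cyclic behaviors is a hypothetical interface:

```python
import numpy as np

def update_walking_state(state, gait):
    """One step of the walking-state update in equation (1).

    `state` = (x, y, z, xi, eta): waist position, heading, and phase.
    `gait` is a hypothetical object exposing the learned cyclic
    behaviors r(eta), z(eta), xi_dot(eta) and joint offsets p(eta).
    The phase eta itself is advanced separately (see equation (5)).
    """
    x, y, z, xi, eta = state
    x_new = x + gait.r(eta) * np.cos(xi)   # advance along the heading
    y_new = y + gait.r(eta) * np.sin(xi)
    z_new = gait.z(eta)                    # waist height from the cycle
    xi_new = xi + gait.xi_dot(eta)         # heading change from the cycle
    joints = gait.p(eta)                   # relative joint positions p_t
    return (x_new, y_new, z_new, xi_new, eta), joints
```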

There are three coordinate systems: those attached to the world, the subject’s body, and the camera. The rigid transformation between the world and the body is described by using the walking state. Also, the camera is fixed to the body, and the rigid transformation between the body and the camera is assumed to be known in advance.

Based on this definition, as the gait data, the behaviors of the parameters $d_t = [z_t, r_t, \dot{\xi}_t, \boldsymbol{p}_t]$ are sampled in advance. Walking samples are collected by using a motion capture system. In this paper, we represent the cyclic motion by using a Fourier expansion of up to fifth order. An example in the case of the waist height is:

$$
z_t = \sum_{n=0}^{5} a_n^0 \sin(n\eta_t + b_n^0). \tag{2}
$$

Using this representation, the behaviors are parameterized by using $(a_n^i, b_n^i)$. Here, the superscript $i$ is an index identifying each parameter. The sampled gait motion is stored in the system as the feature vector $f = \{a_n^i, b_n^i \mid i = 1, \ldots, N_p + 3\}$.
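As an illustration of how such coefficients could be obtained (our own sketch, under our variable names), expanding $\sin(n\eta + b_n)$ as $\cos(b_n)\sin(n\eta) + \sin(b_n)\cos(n\eta)$ makes equation (2) linear in the unknowns, so the amplitudes and phase offsets follow from an ordinary least-squares fit:

```python
import numpy as np

def fit_fourier(eta, signal, order=5):
    """Fit z(eta) = sum_n a_n sin(n*eta + b_n) by linear least squares.

    With c_n = a_n cos(b_n) and s_n = a_n sin(b_n), the model becomes
    linear: sum_n c_n sin(n*eta) + s_n cos(n*eta).
    """
    cols = []
    for n in range(order + 1):
        cols.append(np.sin(n * eta))   # the n = 0 sine column is zero;
        cols.append(np.cos(n * eta))   # lstsq handles the rank deficiency
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, signal, rcond=None)
    c, s = coef[0::2], coef[1::2]
    a = np.hypot(c, s)                 # amplitudes a_n
    b = np.arctan2(s, c)               # phase offsets b_n
    return a, b

# Usage with sampled phase/waist-height data (hypothetical arrays):
# a, b = fit_fourier(eta_samples, waist_height_samples)
```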

4. Classification of the walking models

Human walking motion has some variations. This section describes the method used to classify gaits. In this paper, we call each classified motion a walking form. We also assume that the number of forms in steady walking is finite and that the system has a sufficient number of forms.

First, many walking samples are collected using a motion-capture system. The samples are converted to vectors $f_k$ by applying the frequency analysis described in Section 3. This collected data $f_{k=1,\ldots,N_s}$ is compressed to a lower dimension and projected onto a classification space. The projected points are classified by using a clustering method.

In this paper, principal component analysis is used to project the sampled data onto three-dimensional space. The following conversion is applied:

$$
g_k = U(f_k - \bar{f}) \tag{3}
$$

Here, $U$ is the normalized orthogonal basis, $\bar{f}$ is the mean vector of all data, and $g_k$ is the compressed feature. Also, we used the k-means clustering method for classification [8].
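A compact sketch of this classification step using scikit-learn (our choice of library; the paper does not specify an implementation, and the input file name is hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# F is an (N_s, D) matrix whose rows are the feature vectors f_k of
# Section 3; "gait_features.npy" is a hypothetical input file.
F = np.load("gait_features.npy")

pca = PCA(n_components=3)        # projection of equation (3)
G = pca.fit_transform(F)         # g_k = U (f_k - f_bar)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(G)   # one walking form per cluster
```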

Figure 3. Transition diagram of walking forms (nodes A–F connected by transition edges).

Our method classifies the walking forms and generates a transition diagram describing the connection relationships between the walking forms. An example diagram is shown in Figure 3. This diagram clarifies the time-sequential changes of forms, which should make it possible to narrow the search area to achieve robust estimation. The diagram is generated based on observations made while collecting samples.

5. Human gait estimation

5.1. Overview

In the proposed method, the walking form and the present state are estimated from the changes observed in a wearable camera. The walking state for this estimation is defined as $\hat{\mu}_t = [x_t, y_t, \xi_t, \eta_t, \dot{\eta}_t]$. The complete state $\mu_t$ defined in Section 3 can be reconstructed by using the estimated state $\hat{\mu}_t$ and the form data $f_k$.

Our method estimates the walking state at every sampling of sensor data. An overview of the method is shown in Figure 4. In this method, multiple candidate states are generated. The total number of candidates at time $t$ is $M_t$. Each candidate is tested through five steps: selection of walking forms, state prediction, likelihood estimation, resampling, and state updating. The state prediction and updating are based on the extended Kalman filter (EKF). The details of each step are described in the following subsections.

Figure 4. Overview of the walking estimation technique (sensor input, selection of walking forms, state prediction, likelihood estimation, resampling, and update).

In this paper, we assume that the world coordinates of the observed points in the environment are known. There is a possibility of removing this assumption by using the SLAM framework. It should be possible to introduce this approach into our method by extending the state in the EKF.

In addition, the intrinsic parameters of the camera used are assumed to be known in advance. Therefore, if we obtain the walking state, we can project the observed spatial points onto the image plane.
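For reference, projecting known spatial points through a calibrated pinhole camera can be sketched as follows (our own sketch); the rotation and translation are assumed to come from the walking state via the body pose and the fixed body-to-camera transform of Section 3:

```python
import numpy as np

def project_points(points_w, R_wc, t_wc, K):
    """Project known world points onto the image plane (the map h).

    R_wc and t_wc take world coordinates into the camera frame; they
    follow from the walking state via the body pose and the fixed
    body-to-camera transform (Section 3). K is the intrinsic matrix.
    points_w is an (N, 3) array; the result is (N, 2) pixel coordinates.
    """
    pc = R_wc @ points_w.T + t_wc.reshape(3, 1)   # world -> camera frame
    uv = K @ pc                                   # pinhole projection
    return (uv[:2] / uv[2]).T                     # divide by depth
```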

5.2. Selection of walking forms

The $i$-th candidate state generated at time $t-1$ is represented as $\hat{\mu}_{t-1}^i$. The walking forms for the tests of this state at time $t$ are selected based on the state transition diagram. For example, suppose the candidate state $\hat{\mu}_{t-1}^i$ was updated at that time by using the walking data of the A-form shown in Figure 3. In this case, this state is tested in the A-form and the connected B- and C-forms, as defined in the state transition diagram.
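A minimal sketch of this selection rule, with the diagram stored as adjacency lists; only the A–B and A–C edges are stated in the text, so the entries below are illustrative, and the real diagram would be learned from samples as in Section 4:

```python
# Adjacency lists for the transition diagram of Section 4. Only the
# A-B and A-C edges are stated in the text; the rest of the diagram
# (forms D-F in Figure 3) would be filled in from the collected samples.
TRANSITIONS = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}

def candidate_forms(current_form):
    """A candidate updated with `current_form` at time t-1 is tested at
    time t in that form and in every form connected to it (Section 5.2)."""
    return [current_form] + TRANSITIONS[current_form]
```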

5.3. Prediction of walking state

In the selection step, the walking form at time $t$ for the candidate state $\hat{\mu}_{t-1}^i$ is selected. State prediction is applied using the walking feature of the selected form $f_k$. The equation of state is

$$
\hat{\mu}_t^i = g(\hat{\mu}_{t-1}^i) + \varepsilon_t = \bar{\mu}_t^i + \varepsilon_t. \tag{4}
$$

Here, $\varepsilon_t$ is process noise, and $g$ is a nonlinear state prediction function. In this step, we only calculate the vector $\bar{\mu}_t^i$. In the function $g$, the parameters $[x_t, y_t, \xi_t]$ in the state $\bar{\mu}_t^i$ are predicted based on equation (1). The phase parameters are predicted by the following equation:

$$
\begin{bmatrix} \eta_t \\ \dot{\eta}_t \end{bmatrix}
=
\begin{bmatrix} \eta_{t-1} + \dot{\eta}_{t-1} \\ \dot{\eta}_{t-1} \end{bmatrix}. \tag{5}
$$
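A sketch of the noise-free part of this prediction (the mean $\bar{\mu}_t^i$), under the same hypothetical gait interface used in the earlier sketches:

```python
import numpy as np

def predict_state(mu_prev, gait):
    """Noise-free prediction g() of equations (4) and (5), giving mu_bar.

    mu_prev = [x, y, xi, eta, eta_dot]; `gait` exposes the Fourier-based
    behaviors r(eta) and xi_dot(eta) of the selected form (the same
    hypothetical interface as in the earlier sketches).
    """
    x, y, xi, eta, eta_dot = mu_prev
    return np.array([
        x + gait.r(eta) * np.cos(xi),   # from equation (1)
        y + gait.r(eta) * np.sin(xi),
        xi + gait.xi_dot(eta),
        eta + eta_dot,                  # equation (5): phase integrates
        eta_dot,                        # phase rate carried over
    ])
```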

5.4. Likelihood estimation and resampling

The likelihood $\omega_t^i$ of the $i$-th candidate state is estimated using the likelihood at the previous time $t-1$ based on the following equation:

$$
\omega_t^i = C \lambda_t^i \omega_{t-1}^i. \tag{6}
$$

In this update, if the candidate is predicted based on the same walking form as at the previous time, a higher value is given to the likelihood as an advantage. This operation is controlled by the parameter $C$: if the forms between two successive estimations are the same, $C$ is set to be larger than one; otherwise $C$ is set to one.

The parameter $\lambda_t^i$ is the reliability of the prediction tested against the sensing data. Using the predicted state, the spatial points based on world coordinates can be projected onto the image plane of the camera. Using the sum of the errors between the projected points and the observed corresponding points, we define the reliability $\lambda_t^i$ as

$$
\lambda_t^i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left\{ -\frac{\left( \lvert u_t - h(\bar{\mu}_t^i) \rvert / N_t - \nu \right)^2}{2\sigma^2} \right\}. \tag{7}
$$

Here, $u_t$ denotes the $N_t$ observed image points, and $h(\bar{\mu}_t)$ denotes the projected image points of the corresponding known spatial points. As shown in the equation, the error is represented using a Gaussian distribution. In the experiment, we set the average $\nu$ and the variance $\sigma^2$ to 0 and 10, respectively.

In the resampling step, the candidate states are filtered. Our method keeps only the candidates whose likelihoods are high. The state update described in the next section is applied to those remaining candidates. For the sake of convenience, the numbering of the candidate states is initialized after this operation.
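A sketch of equations (6) and (7) together with this resampling rule; $\nu = 0$ and $\sigma^2 = 10$ follow the experiment, while the value of $C$ and the number of candidates kept are our assumptions, since the paper only requires $C > 1$ for a matching form:

```python
import numpy as np

def candidate_weight(u_obs, u_proj, w_prev, same_form,
                     C=1.5, nu=0.0, sigma2=10.0):
    """Equations (6) and (7) for one candidate state.

    u_obs and u_proj are (N_t, 2) arrays of observed and projected
    image points; the error is taken as the norm of the stacked
    differences divided by N_t.
    """
    n_t = len(u_obs)
    err = np.linalg.norm(u_obs - u_proj) / n_t        # |u_t - h(mu)| / N_t
    lam = np.exp(-(err - nu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    return (C if same_form else 1.0) * lam * w_prev   # equation (6)

def resample(candidates, weights, keep=50):
    """Keep only the highest-weight candidates (number kept is assumed)."""
    order = np.argsort(weights)[::-1][:keep]
    return [candidates[i] for i in order], np.asarray(weights)[order]
```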

5.5. State update

The prediction errors obtained in the previous step are caused by sensor errors and differences between the stored walking form and the observed one. It remains possible that the observed form is slightly different from the prepared one. This error is eliminated by the update operation using the captured sensor data. The observation model is described by

$$
u_t = h(\mu_t) + \delta_t. \tag{8}
$$

The update operation is based on the following equations, namely, the framework of the EKF technique:

$$
\bar{\Sigma}_t = G_t \Sigma_{t-1} G_t^{\mathsf T} + P_t \tag{9}
$$

$$
K_t = \bar{\Sigma}_t H_t^{\mathsf T} \left( H_t \bar{\Sigma}_t H_t^{\mathsf T} + Q_t \right)^{-1} \tag{10}
$$

$$
\hat{\mu}_t = \bar{\mu}_t + K_t \left( u_t - h(\bar{\mu}_t) \right). \tag{11}
$$

Here, $G_t$ is the Jacobian of the state prediction $g$ described in Section 5.3, $H_t$ is the Jacobian of the observation model $h$ described in Section 5.4, and $P_t$, $Q_t$, and $\Sigma_t$ are the covariance matrices of the process noise, the observation noise, and the system, respectively.
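In code, one EKF update step might look like the following sketch; the final covariance line is the standard EKF correction, which the paper does not write out explicitly:

```python
import numpy as np

def ekf_update(mu_bar, Sigma_prev, u_obs, h, G, H, P, Q):
    """One EKF measurement update, equations (9)-(11).

    mu_bar is the predicted state from Section 5.3; G and H are the
    Jacobians of g and h; P and Q are the process- and observation-noise
    covariances.
    """
    Sigma_bar = G @ Sigma_prev @ G.T + P                          # eq. (9)
    K = Sigma_bar @ H.T @ np.linalg.inv(H @ Sigma_bar @ H.T + Q)  # eq. (10)
    mu_hat = mu_bar + K @ (u_obs - h(mu_bar))                     # eq. (11)
    # Standard EKF covariance correction (not written out in the paper).
    Sigma_hat = (np.eye(len(mu_bar)) - K @ H) @ Sigma_bar
    return mu_hat, Sigma_hat
```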

In our method, the state prediction $g$ changes with the transition of forms. Although this could cause a problem in the EKF technique, we confirmed through the experiments that it does not harm the estimation, owing to the robustness of the EKF as well as our design, in which the transitions between forms at the possible phases are smooth.

6. Experiment

6.1. Simulation using sample walking data

In this experiment, we used the data in the motion capture library of the Carnegie Mellon University Graphics Lab [1]. The data was for five gaits, including two types of “slow walk”, two types of “walk”, and one type of “run”. All walks were performed by the same person.

First, the five sets of data were converted to frequency features $f$ using the method described in Section 3. Next, those features were classified using the method described in Section 4. The classification result is shown in Figure 5. The features were classified into three types. The points in the same cluster are shown in the same color. The three classified types correspond to “slow walk”, “walk”, and “run”, respectively. This means that the classified results matched the initial classification decided visually based on the walking speed. The transition diagram is shown in Figure 5 by a black line. In this experiment, the connection was simple: transitions are allowed only between “slow walk” and “walk”, and between “walk” and “run”. Also, the walking data stored in the system was selected from those data. For “walk” and “slow walk”, one of the two data sets was used. For “run”, the single data set was used directly.

Figure 5. Classification result and transition diagram (1st-, 2nd-, and 3rd-order feature values; clusters: slow walk, walk, run).

Figure 6. The input human gait and the estimated gait.

The generated walking motion was created by connecting the three types of walking forms. In this experiment, the order of form changes was “slow walk”, “walk”, and “run”. For the “slow walk” and “walk” motions, the data that was not selected in the advance setup described above was used. The “run” motion was the same as the one that the system stored. The sensor input was created by projecting the spatial points onto the camera image plane based on the created motion. The camera was assumed to be placed on the outer side of the right thigh.

Figure 7. Average errors in joint positions (average error [cm] vs. frame number, for slow walk, walk, and run).

The estimation result is shown in Figure 6. The green subject in the upper part of the figure shows the input motion, and the blue subject in the lower part shows the estimated motion. The pairs of figures arranged one above the other show the poses at the same point in time. Figure 7 shows the time variation of the average error in the joint positions. The errors were within 10 cm, so the achieved accuracy is considered to be high. Although in this experiment the actual motion and the stored motion were different and the timings at which the forms change were unknown, the form transition and the state update in the estimation were not influenced by these problems and worked well to reconstruct the original walking motions.

6.2. Experiment using actual sensing data

Figure 8. Photograph of the experimental environment (wearable camera, landmarks, and motion capture system).

This section describes an experiment using actual sensing data. A photograph of the experiment is shown in Figure 8. In this experiment, the camera was placed on the outer side of the right thigh. The camera used was a Basler Pioneer. We used 400 × 400-pixel images captured at 250 fps. A chessboard pattern was set on the ground so that the world coordinates of the observed points could be assumed to be known. Also, the observation model described in Section 5.5 does not involve lens distortion. Therefore, the images were used after applying distortion correction.
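With OpenCV, for example, the correction could look like the following sketch; the intrinsic matrix, distortion coefficients, and file name are placeholders, not the calibration actually used in the experiment:

```python
import cv2
import numpy as np

# Intrinsics and distortion coefficients from a prior calibration of the
# wearable camera; the values below are placeholders, not the paper's.
K = np.array([[400.0,   0.0, 200.0],
              [  0.0, 400.0, 200.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.2, 0.05, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

frame = cv2.imread("frame_0001.png")           # hypothetical captured frame
undistorted = cv2.undistort(frame, K, dist)    # remove lens distortion
```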

First, the walking motions were sampled. The motion capture system used for this sample collection was a NaturalPoint OptiTrack. The sampled motions were classified after being converted to frequency features.

Figure 9. Sample waveforms of six different motions (waist height [cm] vs. phase [rad]).

Figure 10. The classification result using the actual sensor data (clusters: walk, slow walk, turn left, slow turn left, turn right, slow turn right).

As an example, the waist height behaviors of six different motions are shown in different colors in Figure 9. The classification result is shown in Figure 10. The motions were classified into six forms: “walk”, “left-turning walk”, and “right-turning walk”, each at two different rates. Three samples were collected for each motion. The motion of each group that was closest to the center of its cluster was stored in the system. Also, the defined transitions are shown in Figure 10 as black lines.

Based on this setup, the estimation was tested during walking. In the estimation, the subject walked straight ahead, turned left, and walked straight ahead again. He also attempted to walk at a slow speed. A Harris detector was used to detect the feature points in the captured image. The results of the estimation are shown in Figure 11. In the figure, the photographs captured during this experiment and the corresponding estimated poses are compared.
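Feature detection of this kind can be sketched with OpenCV's Harris detector; the parameter values and file name are typical defaults, not the settings used in the experiment:

```python
import cv2
import numpy as np

frame = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame
gray = np.float32(frame)

# blockSize, ksize and k are typical Harris settings, not the paper's.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())     # (row, col) pixels
```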

Also, Figure 12 shows the time variation of the average errors in the image plane between the observed image points and the image points at which the corresponding spatial points were projected based on the estimation results. A sufficient number of feature points could not always be detected in the images, so the errors became large in places. However, most errors were below roughly 10 pixels. Moreover, even when feature points were not detected and the walking motion differed from the stored motion, the estimation remained stable.

Figure 11. Estimation results of walking motion.

Figure 13 shows the time variation of the estimated walking forms. In this experiment, if the subject had achieved the ideal motion, the observed walking could have been represented using only three forms. However, in the figure, the forms varied. This was because similar poses were included in neighboring motions in a situation where the system captured motions for only a brief time. This is not a critical problem unless the estimated pose becomes unnatural. One way to improve the estimation accuracy is to introduce a physical model of the form change instead of representing it using the values expressed in equation (6).

Figure 12. Time variation of errors evaluated by projecting the spatial points onto the image plane based on the estimated pose (average error [pixels] vs. frame number).

Figure 13. Time variation of the walking forms (estimated form vs. frame number).

7. Conclusion

Ubiquitous motion capture technology will become increasingly important in many applications. This paper proposes a technique for estimating human gait with a simple sensor configuration. In our method, advance knowledge of the user’s motion allows the use of only a single camera as the wearable sensor.

This is a new approach to wearable sensors, with which it is possible to estimate the body motion and the global position using only the changes in images of the external environment caused by the user’s motion. Another novel aspect is the proposed representation of the various walking motions in a compact dataset. We also proposed utilizing this sensing configuration and the stored knowledge to achieve effective walking estimation. This method integrates changes in walking form using a transition diagram and state estimation using an extended Kalman filter (EKF). Experiments showed that our method worked well when applied to actual human motion.

As a next step, we plan to add typical non-cyclic motions, such as stumbling. The system also needs to handle not only motion on flat ground but also motion on stairs. In addition, the system must be able to handle unknown environments.

References

[1] CMU Graphics Lab Motion Capture Database. http://mocap.cs.cmu.edu/.
[2] OptiTrack. http://www.naturalpoint.com/optitrack/.
[3] VICON. http://www.vicon.com/.
[4] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.
[5] J. Frank, S. Mannor, and D. Precup. Activity and gait recognition with time-delay embeddings. In Proc. 24th AAAI Conference on Artificial Intelligence, 2010.
[6] M. Garcia, A. Chatterjee, A. Ruina, and M. Coleman. The simplest walking model: Stability, complexity, and scaling. ASME Journal of Biomechanical Engineering, 120:281–288, 1998.
[7] A. Hamaguchi, M. Kanbara, and N. Yokoya. User localization using wearable electromagnetic tracker and orientation sensor. In Proc. IEEE International Symposium on Wearable Computers, pages 55–58, 2006.
[8] J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Journal of the Royal Statistical Society, 28:100–108, 1979.
[9] T. B. Moeslund and E. Granum. A survey of computer vision-based human motion capture. Computer Vision and Image Understanding, 81:231–268, 2001.
[10] S. Rosenthal and J. Pella. The process of motion capture: Dealing with the data. Computer Animation and Simulation, 97:3–18, 1997.
[11] M. Unuma, K. Anjyo, and R. Takeuchi. Fourier principles for emotion-based human figure animation. In Proc. ACM SIGGRAPH, pages 91–96, 1995.
[12] D. Vlasic, R. Adelsberger, G. Vannucci, J. Barnwell, M. Gross, W. Matusik, and J. Popović. Practical motion capture in everyday surroundings. ACM Transactions on Graphics, 26:35, 2007.