
Page 1: Learning Pedestrian Dynamics from the Real Worldpscovann/SFM_ICCV09.pdf · Learning Pedestrian Dynamics from the Real World Paul Scovanner University of Central Florida Orlando, FL

Learning Pedestrian Dynamics from the Real World

Paul Scovanner
University of Central Florida
Orlando, FL
[email protected]

Marshall F. Tappen
University of Central Florida
Orlando, FL
[email protected]

Abstract

In this paper we describe a method to learn parameters which govern pedestrian motion by observing video data. Our learning framework is based on variational mode learning and allows us to efficiently optimize a continuous pedestrian cost model. We show that this model can be trained on automatic tracking results, and provides realistic and accurate pedestrian motions.

1. Introduction

Systems for predicting pedestrian movement are becoming increasingly important in a variety of research areas. They are used in virtual environments for generating realistic crowd movement [13]. In simulators, they can be used to evaluate structures for crowd flow or evacuation [9, 3]. More recently, they have been used to detect anomalous events [10].

In this paper, we present a system for predicting pedestrian movements that can be trained by observing human movements in a natural environment. Our system is unique in that a pedestrian’s movements are formulated as a series of continuous optimizations. This formulation overcomes significant issues with previous attempts at learning behavioral models from video, such as [2, 8]. Specifically, our model does not require the discretization of the space of possible locations that [2] relies on, and it is able to learn more complicated models with more parameters than the models from [8]. Section 3 discusses the benefits of our proposed approach in more detail.

This model can both produce qualitatively accurate simulations of pedestrian movement, such as the simulations shown in Figure 1, and provide predictions of pedestrian movement that are quantitatively more accurate than standard methods for predicting movement. Section 7 describes these results in more detail.

An overview of our model is given in Section 4 and the details of our learning method are given in Sections 5 and 6. We evaluate our method both quantitatively and qualitatively in Section 7.

Figure 1. This work focuses on learning a model of pedestrian movement from real-world pedestrian tracks taken from video data. This image shows an example of two pedestrians’ paths, shown in black, and the system’s predicted paths for those pedestrians, shown in red. The system accurately models the deflections in the pedestrians’ paths due to other pedestrians.

2. Modeling Pedestrian Behavior

Before describing our model and learning procedure in detail, this section will briefly review previously proposed models of pedestrian movement. We refer the reader to [13] for a more comprehensive review of different approaches.

One of the first pedestrian modeling methods to describe how people choose their paths with regard to their surroundings was the social force model (SFM), proposed by Helbing and Molnár [4]. At its core, the SFM operates on the assumption that the scene, the person’s preferences, and other pedestrians exert forces on a person, which help to determine his or her path. This model allows large crowds of people to be modeled by describing the characteristics of individual people using a combination of relatively simple forces. This basic model has been extended to more accurately describe various kinds of crowds [9, 3].

The social force model, and its relatives, take an agent-based approach to crowd movement. In contrast, other models describe large crowds as a fluid which obeys certain physical properties and gives no explicit regard to the tendencies of the parts which make up the whole [6]. These methods are popular for simulation and stability detection in extremely large crowds. There is also a hybrid of these two ways of describing crowd behavior, often called the continuum approach. This approach was introduced by Hughes [5] and was further developed by Treuille et al. [13]. The continuum approach takes a form similar to the SFM; however, it calculates the forces with respect to the environment (the pedestrian density of a specific region, the velocity of the average person at that location, and the discomfort experienced by being at that location) and then assumes that pedestrians will move according to both these shared forces and the forces guiding them towards their goal. In this sense the continuum model is also similar to the floor fields of [1].

We wish to note the difference between the models proposed here and work on modeling pedestrian tracks, such as [11]. The primary difference is that work like [11] models the tracks that people are likely to take without explaining why they take that track. These models can predict where a pedestrian is likely to be in a given scene, but do not explain why he or she is there. Instead, the models considered in this paper describe the pedestrian’s underlying motivations to predict how and why the pedestrian moves.

2.1. Learning Pedestrian Models

Models of pedestrian movement are typically hand-specified. Relatively little work has focused on the problem of using real-world data to produce accurate models. Here we review the two major attempts at learning a pedestrian movement model.

2.1.1 Discrete Choice Model

In [2], Antonini et al. model pedestrian behavior as a series of discrete choices. In this model, both time and space are discretized. At each time instant, the pedestrian chooses the next location from a set of possible discrete locations. This choice is made using a multi-class linear classifier. This leads to a straightforward formulation of the learning problem; however, discretizing the possible destinations introduces issues. The most pressing issue is the difficult balance between making the grid too coarse, which affects the accuracy of the prediction, and making the grid too fine, which enhances accuracy but requires substantially more computation. Their work proposes an adaptive spatial discretization approach to overcome the difficulties associated with a fixed grid, but this increases the complexity of implementation.

2.1.2 Learning Social Force Models

The second approach, described in [8], uses a classic social force model, similar to that proposed in [3]. The goal of their work is to find parameters of the model such that the simulated movements match tracks in video. This is accomplished using an unspecified evolutionary algorithm to optimize two parameters in the model. In their work, the learning algorithm is not described, so it is unclear how well the learning will scale up to models with many parameters.

3. Model Overview

Our model is similar in some respects to the previous two approaches, but its unique formulation overcomes issues in both approaches. Similar to the work of [2], our model is built on pedestrians choosing the next location at each discrete time step. However, in our model, that decision is made by optimizing a function that is continuous in space. This eliminates the need for complicated discretization schemes.

Our approach is similar to [8], in that we also learn the parameters of a continuous model. A key difference in our approach lies in how the model is specified. Here, we pose the pedestrian’s movement as an energy minimization problem. This enables us to build on previous work on learning parameters for energy functions, such as [12].

Posing the pedestrian’s movement as a series of continuous optimizations makes it possible to differentiate the predicted paths with respect to the parameters of the model. This enables us to use standard gradient-based optimization algorithms to optimize the model parameters. The availability of robust learning algorithms enables us to use more complicated models. The models described in this paper use 17 parameters.

4. Pedestrian Model

We pose the problem of predicting a pedestrian’s path as a series of energy minimization steps. The pedestrian’s path is modeled as a set of discrete steps. While the path is discretized in time, it is not discretized in space. An energy, or cost, is assigned to each possible (continuous) location that the pedestrian might move to. At the next discrete time step, the pedestrian moves to the location that minimizes this energy. We denote this cost as $E(x_t)$, where $x_t$ is a 2D vector containing the pedestrian’s location at time $t$.

We assume that the path a pedestrian takes is influenced by the following four general motivations. An energy function will describe each of these motivations. The motivations are

1. A desire to not move too far in a short amount of time. We refer to this as the limited movement term, and the energy function expressing this motivation will be denoted as $E_{LM}(x_t)$.

2. A desire to remain at a constant speed and direction. This motivation will be represented by $E_{CV}(x_t)$, where CV stands for “constant velocity”.

3. A desire to reach one’s destination, represented by $E_{Dest}(x_t)$.

4. A desire to avoid other pedestrians in the scene, expressed by $E_{AV}(x_t)$.

Figure 2. The avoidance forces can be seen here as a field which is overlaid on a frame of the video containing four pedestrians. The arrows display the direction and magnitude of the gradient of this avoidance field.

The complete energy $E(x_t)$ is a weighted combination of these components. If the weight of each component is expressed within the component itself (see the following subsections), then the complete energy can be written as:

$$E(x_t) = E_{LM}(x_t) + E_{CV}(x_t) + E_{Dest}(x_t) + E_{AV}(x_t) \tag{1}$$

The following sections describe each of the energy components listed above. During training, our learning algorithm optimizes a vector of parameters, $\theta$. The weights derived from these parameters are expressed within each of the energy functions. Section 6 and its subsections describe how these parameters can be found by training on object tracking data.

4.1. Movement Cost

The movement cost, or cost for moving too far in too short a time, prevents a pedestrian from jumping to a location that is too far away. This cost penalizes all movement; however, it balances with the other energies to allow reasonable speeds of movement while significantly penalizing physically impossible speeds.

$$E_{LM}(x_{t+1}) = w(\theta_1)\,\|x_{t+1} - x_t\|^2 \tag{2}$$

The term $w(\theta_1)$ is the weight assigned to this component. In practice, we use an exponential function to compute $w(\theta)$ for all of the components, making $w(\theta_1) = \exp(\theta_1)$. We use the exponential function to ensure that all weights are positive.
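The weight parametrization and the limited movement term of Equation (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names `weight` and `limited_movement_energy` and the use of tuples for 2D points are assumptions made for this example.

```python
import math

def weight(theta):
    """Map an unconstrained parameter to a strictly positive weight, w = exp(theta)."""
    return math.exp(theta)

def limited_movement_energy(x_next, x_t, theta1):
    """Equation (2): weighted squared distance between candidate and current position."""
    dx = x_next[0] - x_t[0]
    dy = x_next[1] - x_t[1]
    return weight(theta1) * (dx * dx + dy * dy)
```

Because the weight is an exponential, gradient-based learning can move $\theta_1$ freely over the real line without ever producing a negative penalty.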

4.2. Constant Velocity

In our model, a pedestrian also seeks to maintain a constant velocity and direction. This is expressed as an energy function over possible values of $x_{t+1}$, which is the location of the next step. This function is constructed as a smoothed approximation of the radius between $x_{t+1}$ and the point that the pedestrian would have reached if maintaining a constant velocity. This point is computed by extrapolating from the previous two steps in the pedestrian’s path. The $\epsilon$ term, which smooths the function to prevent a discontinuity in the derivative at its minimum, is set to $10^{-4}$.

$$E_{CV}(x_{t+1}) = w(\theta_2)\sqrt{\|x_{t+1} - (2x_t - x_{t-1})\|^2 + \epsilon} \tag{3}$$
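As a rough sketch of Equation (3), the constant velocity term can be written as follows. The function name and the tuple representation of points are assumptions for illustration; only the formula and the value of $\epsilon$ come from the text.

```python
import math

EPSILON = 1e-4  # smoothing constant stated in the text

def constant_velocity_energy(x_next, x_t, x_prev, theta2):
    """Equation (3): smoothed distance to the constant-velocity extrapolation."""
    # Point reached if the last displacement is repeated: 2 * x_t - x_prev.
    cx = 2.0 * x_t[0] - x_prev[0]
    cy = 2.0 * x_t[1] - x_prev[1]
    squared = (x_next[0] - cx) ** 2 + (x_next[1] - cy) ** 2
    return math.exp(theta2) * math.sqrt(squared + EPSILON)
```

At the extrapolated point itself the squared distance is zero, so the energy equals $w(\theta_2)\sqrt{\epsilon}$ rather than zero, which is what keeps the derivative well defined at the minimum.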

4.3. Destination

We hypothesize that pedestrians have some destination in mind and they eventually are observed reaching that destination. We do not have a strong final destination in our scene, so we assume the point where the person exits the scene to be their destination. In applications such as video analysis, destinations may be known and therefore could be used. In real-time tracking applications the destination of the individuals will likely not be known and this force may be left out of the cost calculation.

Similar to the constant velocity component, described above, we use a smoothed approximation to the radius:

$$E_{Dest}(x_{t+1}) = w(\theta_3)\sqrt{\|x_{t+1} - d\|^2 + \epsilon} \tag{4}$$

The point $d$ is the destination found from the track. In Section 7.1 we will show how this model can be used without a destination cost to accurately predict a pedestrian’s path several time steps ahead.


Figure 3. The avoidance energy is made up of the sum of avoidance terms at different locations and with different sizes. This function is created from a collection of rotated exponential functions. This makes it convenient to compute convex upper bounds on this function.

4.4. Avoidance

The previous energies modeled where a pedestrian would like to walk, but it is also important that the model be able to predict areas that the pedestrian would like to avoid. Avoidance is incorporated into the model with a repulsive energy function that goes to zero as one moves away from the center of the function.

The complete avoidance energy is the sum of avoidance terms at different locations and of different sizes. Figure 3 shows the shape of each individual avoidance term. The $k$th avoidance term in the avoidance energy, $E_{AV}$, is a repulsive function, which will be denoted $R(\cdot)$. This repulsive function, centered at location $p$ with size parameter $\sigma$, has the form

$$R(x; p, \sigma) = -\frac{1}{\gamma} \log \sum_{i=1}^{N_\phi} \exp\left(-\gamma e^{-r_i}\right) \tag{5}$$

where $r_i = \frac{1}{\sigma}\left((x_x - p_x)\cos(\phi_i) + (x_y - p_y)\sin(\phi_i)\right)$. The scalars $x_x$ and $x_y$ are the x and y components of $x$, with a similar notation for $p$. The angles $\phi_1 \ldots \phi_{N_\phi}$ are uniformly spaced between 0 and $2\pi$. The scalar $\gamma$ is fixed for all avoidance terms. It affects the sharpness of the fall-off. In practice, we use the value $\gamma = 20$. The combination of these avoidance functions creates a sort of avoidance field, which is visualized in Figure 2.
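The repulsive function of Equation (5) can be sketched directly from its definition. The value $\gamma = 20$ is taken from the text; the number of angles is not stated there, so `n_phi=16` below is an assumption chosen only for illustration, as are the function name and the tuple representation of points.

```python
import math

GAMMA = 20.0  # sharpness of the fall-off, as stated in the text

def repulsion(x, p, sigma, n_phi=16, gamma=GAMMA):
    """Equation (5): soft minimum over n_phi rotated exponential functions."""
    total = 0.0
    for i in range(n_phi):
        phi = 2.0 * math.pi * i / n_phi  # angles uniformly spaced in [0, 2*pi)
        # Projection of (x - p) onto direction phi, scaled by the size parameter.
        r_i = ((x[0] - p[0]) * math.cos(phi) + (x[1] - p[1]) * math.sin(phi)) / sigma
        total += math.exp(-gamma * math.exp(-r_i))
    return -math.log(total) / gamma
```

At the center every projection is zero, so the value is $1 - \log(N_\phi)/\gamma$, and it falls off as the point moves away in any direction. Note that this naive form can overflow in `math.exp` when the distance is hundreds of times larger than `sigma`; a production version would need a more careful evaluation.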

The value of this function at a point $x$ can be thought of as the smooth approximation of the minimum value at $x$ of a set of rotated exponential functions. We use this function for the avoidance energy rather than a more typical function, like a Gaussian, because this function makes it possible to learn the model parameters. As will be discussed in Section 5.1.2, it is possible to use Jensen’s inequality to compute a convex upper bound on this function. This enables us to use the Variational Mode Learning strategy from [12] to learn the model parameters.

4.4.1 Constructing the Avoidance Component from Terms

If there are $N_o$ obstacles, $o_1 \ldots o_{N_o}$, at time $t$, an avoidance term is created for each obstacle. This avoidance term itself is the sum of multiple copies of the repulsive function described above. A pedestrian may not avoid just the current location of another individual, but also the location of the individual in the near future. To account for this, repulsive functions are placed at the individual’s current location and his predicted location 2, 4, 6, 8, 10, and 12 steps in the future. These predicted avoidance locations, denoted as $p_1, p_2, \ldots$, are found by assuming that the individual maintains constant velocity.

In addition, it is unknown how far away an individual must be before affecting a pedestrian’s path. Therefore, we place repulsive functions with different values of $\sigma$ at each predicted location of the individual. Thus, if there are $N_p$ predicted locations for each individual and $N_\sigma$ different size parameters, the avoidance energy due to a single individual will consist of $N_p \times N_\sigma$ repulsive functions.

To control the number of parameters in the model, we assign weights to each predicted position, $\theta_{4_i}$, and each size parameter, $\theta_{5_j}$. These two weights are combined to produce the weight of each repulsive function in the avoidance energy. In practice, we multiply the weight due to size by the weight due to predicted position. Combining these weights, the avoidance energy due to a single individual can be expressed as

$$E_{AV}(x_{t+1}) = \sum_{i=1}^{N_p} \sum_{j=1}^{N_\sigma} w(\theta_{4_i})\, w(\theta_{5_j})\, R(x_{t+1}; p_i, \sigma_j) \tag{6}$$

If multiple individuals are present in the scene, then the avoidance energy is the sum of each individual’s avoidance energy. In the experiments in this paper $N_p$ is 7 and $N_\sigma$ is 7, resulting in a total of 17 parameters: the three weights $\theta_1$, $\theta_2$, and $\theta_3$, plus seven position weights and seven size weights.
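The construction of Equation (6) for a single individual can be sketched as below. The step offsets and $\gamma$ come from the text; the seven `SIGMAS` values and `N_PHI` are hypothetical placeholders, since the paper does not list them, and the helper names are likewise assumptions for this example.

```python
import math

GAMMA = 20.0
N_PHI = 16                                     # assumed; not given in the text
OFFSETS = (0, 2, 4, 6, 8, 10, 12)              # current location plus 6 look-ahead steps
SIGMAS = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5)   # hypothetical size parameters

def repulsion(x, p, sigma):
    """Equation (5), repeated here so the sketch is self-contained."""
    total = 0.0
    for i in range(N_PHI):
        phi = 2.0 * math.pi * i / N_PHI
        r_i = ((x[0] - p[0]) * math.cos(phi) + (x[1] - p[1]) * math.sin(phi)) / sigma
        total += math.exp(-GAMMA * math.exp(-r_i))
    return -math.log(total) / GAMMA

def predicted_positions(o_t, o_prev):
    """Constant-velocity extrapolation of an obstacle from its last two positions."""
    vx, vy = o_t[0] - o_prev[0], o_t[1] - o_prev[1]
    return [(o_t[0] + k * vx, o_t[1] + k * vy) for k in OFFSETS]

def avoidance_energy(x, o_t, o_prev, theta4, theta5):
    """Equation (6): product-weighted sum of N_p x N_sigma repulsive functions."""
    total = 0.0
    for p, t4 in zip(predicted_positions(o_t, o_prev), theta4):
        for sigma, t5 in zip(SIGMAS, theta5):
            total += math.exp(t4) * math.exp(t5) * repulsion(x, p, sigma)
    return total

# Three scalar weights (theta_1..theta_3) plus one weight per offset and per size.
N_PARAMS = 3 + len(OFFSETS) + len(SIGMAS)
```

Factoring each weight into a position part and a size part is what keeps the count at 7 + 7 avoidance parameters instead of 49.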

5. Optimization for Generating Pedestrian Tracks

In our model, a pedestrian’s track is generated by optimizing the pedestrian’s energy function $E(\cdot)$ to choose the next location. In this section, we describe how this optimization is performed. As Section 6 will explain, this optimization is structured to make it possible to compute the derivatives of the predicted path with respect to the model parameters. This makes it possible to minimize a loss function measuring the difference between the predicted pedestrian paths and ground-truth paths.

The optimization procedure uses a modified version of Newton’s method, without the backtracking line search, to minimize $E(\cdot)$. Traditionally, Newton’s method can be viewed as fitting a second-order Taylor approximation to $E(\cdot)$ at a point, then moving in the direction of the minimum of the approximation. In our implementation, rather than approximating the function directly, we approximate a tight, convex upper bound on $E(\cdot)$. The following subsections describe how these upper bounds are computed.

Utilizing upper bounds is necessary because the avoidance penalties described in Section 4.4 make $E(\cdot)$ non-convex. Thus, the optimization steps will fail if a point is encountered where the Hessian matrix has negative eigenvalues (see footnote 1). Using a convex upper bound on $E(\cdot)$ ensures that this will not happen. While convergence is not guaranteed without the line search, in our experiments we have not encountered any situations where the optimization does not converge.

Below, the optimization steps are described algorithmically. The variables $x_t$ and $x_{t-1}$ denote the current and previous locations of the pedestrian. $N_I$ is the number of optimization iterations. The result of the optimization is the pedestrian’s next step, denoted $x_{t+1}$. In the course of computing the next step, a number of intermediate locations are computed. These intermediate locations are denoted using the variable $z$. Using these variables, the optimization consists of the following steps:

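1. Initialize $z_0 \leftarrow x_t$;
2. for $j = 1 \ldots N_I$ do
3.     Compute $\hat{E}(z_j)$ by replacing the $E_{AV}$, $E_{Dest}$ and $E_{CV}$ terms of $E$ with convex upper bounds, computed at $z_{j-1}$;
4.     Compute $\nabla^2 \hat{E}(z_j)$ and $\nabla \hat{E}(z_j)$;
5.     $z_j \leftarrow z_{j-1} - \left(\nabla^2 \hat{E}(z_j)\right)^{-1} \nabla \hat{E}(z_j)$;
6. end
7. $x_{t+1} \leftarrow z_{N_I}$;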
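The loop above can be illustrated with a deliberately simplified sketch that keeps only the limited movement, constant velocity, and destination terms and omits the avoidance term. This simplification is an assumption made here for brevity, not the full method. With only these terms the upper bound is an isotropic quadratic, so its Hessian is a scalar multiple of the identity and the Newton step lands exactly on the minimizer of the bound. The names `energy` and `next_step` and the default `n_iters=10` are hypothetical.

```python
import math

EPSILON = 1e-4

def _dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def energy(x, x_t, x_prev, dest, theta):
    """E_LM + E_CV + E_Dest (Equations 2 to 4); the avoidance term is omitted."""
    w_lm, w_cv, w_dest = (math.exp(t) for t in theta)
    c = (2.0 * x_t[0] - x_prev[0], 2.0 * x_t[1] - x_prev[1])
    return (w_lm * _dist2(x, x_t)
            + w_cv * math.sqrt(_dist2(x, c) + EPSILON)
            + w_dest * math.sqrt(_dist2(x, dest) + EPSILON))

def next_step(x_t, x_prev, dest, theta, n_iters=10):
    """Repeatedly minimize the quadratic upper bound built at the previous iterate."""
    w_lm, w_cv, w_dest = (math.exp(t) for t in theta)
    c = (2.0 * x_t[0] - x_prev[0], 2.0 * x_t[1] - x_prev[1])
    z = x_t  # step 1: z_0 <- x_t
    for _ in range(n_iters):
        # Curvatures of the square-root bounds, evaluated at z_{j-1}.
        a_cv = 1.0 / math.sqrt(_dist2(z, c) + EPSILON)
        a_dest = 1.0 / math.sqrt(_dist2(z, dest) + EPSILON)
        # Hessian of the bound is s * I; solving gradient = 0 gives the new iterate.
        s = 2.0 * w_lm + w_cv * a_cv + w_dest * a_dest
        z = tuple((2.0 * w_lm * x_t[k] + w_cv * a_cv * c[k] + w_dest * a_dest * dest[k]) / s
                  for k in range(2))
    return z  # step 7: x_{t+1} <- z_{N_I}
```

Because each iterate minimizes a function that lies above the true energy and touches it at the previous iterate, the true energy cannot increase from one iteration to the next.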

5.1. Computing Upper Bounds

In Step 3 of the optimization, upper bounds are computed for all of the non-quadratic terms in $E(x)$. This section describes how these upper bounds are computed.

Footnote 1. Intuitively, a second-order approximation at the center of an avoidance term will be a quadratic function pointing downward. Minimizing this approximation will produce values at $\pm\infty$.

5.1.1 Upper-bounds for $E_{Dest}$ and $E_{CV}$

Both $E_{Dest}$ and $E_{CV}$ have the form $\sqrt{r^2 + \epsilon}$, where $r$ is some scalar distance. In the case of $E_{Dest}$, it is the distance to the destination, while for $E_{CV}$, it is the distance to the point that the pedestrian would move to if traveling with a constant velocity. For these two energies, we upper bound $\sqrt{r^2 + \epsilon}$ at a point $r_0$ with the function

$$\frac{1}{2}\frac{r^2}{\sqrt{(r_0)^2 + \epsilon}} + \left[\sqrt{(r_0)^2 + \epsilon} - \frac{1}{2}\frac{(r_0)^2}{\sqrt{(r_0)^2 + \epsilon}}\right] \tag{7}$$
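A quick numerical check makes the two properties of the bound in Equation (7) concrete: it touches $\sqrt{r^2 + \epsilon}$ at $r = r_0$ and lies above it everywhere else. The function names below are assumptions for this sketch.

```python
import math

EPSILON = 1e-4

def smoothed_distance(r):
    """The term being bounded: sqrt(r^2 + epsilon)."""
    return math.sqrt(r * r + EPSILON)

def quadratic_upper_bound(r, r0):
    """Equation (7): quadratic in r, constructed at the point r0."""
    s0 = math.sqrt(r0 * r0 + EPSILON)
    return 0.5 * r * r / s0 + (s0 - 0.5 * r0 * r0 / s0)
```

The bound follows from the concavity of the square root in $r^2$: a tangent line in $r^2$ lies above the curve, and a line in $r^2$ is a quadratic in $r$.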

5.1.2 Upper Bounds for EAV

The upper bounds for the avoidance terms, described in Section 4.4, can be found using Jensen’s inequality:

$$E_{AV}(x) \le \sum_{i=1}^{N_\phi} \frac{\exp(-\gamma e^{-r'_i})}{\sum_{j=1}^{N_\phi} \exp(-\gamma e^{-r'_j})}\, e^{-r_i} + K \tag{8}$$

where each term $r_i$ has the same definition as in Section 4.4 and $r'_i$ is the value of $r_i$ computed at the point where the upper bound is being constructed. The term $K$ denotes constant terms. These terms can be ignored, as our goal is to minimize this upper bound.
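The bound in Equation (8) can be checked numerically for a single repulsive function. In the sketch below, the constant is written out as $K = \frac{1}{\gamma}\sum_i q_i \log q_i$, where $q_i$ are the normalized weights appearing in Equation (8); this explicit form is derived here from Jensen's inequality and is not stated in the text. `N_PHI` and all function names are assumptions.

```python
import math

GAMMA = 20.0
N_PHI = 16  # assumed number of rotated exponentials

def projections(x, p, sigma):
    """The scaled projections r_i defined with Equation (5)."""
    return [((x[0] - p[0]) * math.cos(2.0 * math.pi * i / N_PHI)
             + (x[1] - p[1]) * math.sin(2.0 * math.pi * i / N_PHI)) / sigma
            for i in range(N_PHI)]

def repulsion(x, p, sigma):
    """Equation (5)."""
    return -math.log(sum(math.exp(-GAMMA * math.exp(-r))
                         for r in projections(x, p, sigma))) / GAMMA

def jensen_upper_bound(x, x_ref, p, sigma):
    """Right-hand side of Equation (8), with the bound constructed at x_ref."""
    r_ref = projections(x_ref, p, sigma)
    a_ref = [math.exp(-GAMMA * math.exp(-r)) for r in r_ref]
    norm = sum(a_ref)
    q = [a / norm for a in a_ref]
    # K = (1/gamma) * sum_i q_i * log(q_i); log(q_i) is expanded analytically
    # so that an underflowed q_i never reaches math.log.
    k_const = sum(qi * (-GAMMA * math.exp(-r) - math.log(norm))
                  for qi, r in zip(q, r_ref)) / GAMMA
    return sum(qi * math.exp(-ri)
               for qi, ri in zip(q, projections(x, p, sigma))) + k_const
```

Each $e^{-r_i}$ is convex in $x$ because $r_i$ is linear in $x$, and the weights $q_i$ are non-negative, so the bound is convex even though $R$ itself is not.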

6. Learning Model Parameters

Our goal is to choose the parameters $\theta$ that make the predicted pedestrian tracks match tracks observed in video as closely as possible. To accomplish this, we define a loss function $L(x^*, \mathbf{t})$ that measures the difference between a predicted track $x^*$ and the ground truth track $\mathbf{t}$. Here we use a smoothed version of the L1 difference between the ground-truth track and the predicted track:

$$L(x^*, \mathbf{t}) = \sum_{t=1}^{N_s} \sqrt{\|x_t - \mathbf{t}_t\|^2 + \epsilon} \tag{9}$$

where $x_t$ and $\mathbf{t}_t$ are the predicted and ground-truth locations at time $t$ in the overall track, and $N_s$ is the number of samples in the track.
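The loss of Equation (9) is short enough to sketch directly. The function name and the representation of a track as a list of `(x, y)` tuples are assumptions for illustration.

```python
import math

EPSILON = 1e-4

def track_loss(predicted, truth, eps=EPSILON):
    """Equation (9): sum of smoothed Euclidean distances between paired track points."""
    return sum(math.sqrt((px - tx) ** 2 + (py - ty) ** 2 + eps)
               for (px, py), (tx, ty) in zip(predicted, truth))
```

Because of the smoothing, a perfect prediction of $N_s$ points has loss $N_s\sqrt{\epsilon}$ rather than zero, which does not affect the location of the minimum.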

Because the loss depends on $x^*$, the loss can be minimized with gradient-based optimization methods if the derivatives of $x^*$ can be computed with respect to the parameters $\theta$. The optimization strategy described in Section 5 is designed to make these computations possible.

Computing the gradient of the loss is possible because an intermediate value during the optimization, $z_j$, is related to the previous value, $z_{j-1}$, by multiplication with an inverse matrix. Thus, the Jacobian matrix $\frac{\partial z_j}{\partial z_{j-1}}$ relating $z_j$ and $z_{j-1}$ can be found. Using this Jacobian, combined with $\frac{\partial z_{j-1}}{\partial \theta}$, it is then possible to compute $\frac{\partial z_j}{\partial \theta}$. These basic steps can be repeated until the derivative of each step $x_t$ with respect to the parameters $\theta$ has been computed. With these derivatives, it is trivial to compute the derivative of the loss with respect to $\theta$.

As in Variational Mode Learning, because each optimization step is differentiable, the gradient of the result of the optimization can be computed by repeated application of the chain rule, similar to back-propagation in neural networks.

For clarity, the following section shows how these derivatives can be calculated for the model parameters. The algorithm for computing the derivative of the loss function with respect to the parameters $\theta$ is:

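Initialize $x_0$ to the initial point on the track;
Initialize $\frac{\partial x_0}{\partial \theta}$ to a $2 \times N_\theta$ matrix, where $N_\theta$ is the number of parameters;
for $t = 1 \ldots N_S$ do
    $\frac{\partial z_0}{\partial \theta} \leftarrow \frac{\partial x_{t-1}}{\partial \theta}$;
    for $j = 1 \ldots N_I$ do
        Compute $\frac{\partial z_j}{\partial x_{t-1}}$, $\frac{\partial z_j}{\partial z_{j-1}}$, and $\frac{\partial z_j}{\partial x_{t-2}}$;
        Compute $\frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta}$ (see below for explanation);
        $\frac{\partial z_j}{\partial \theta} \leftarrow \frac{\partial z_j}{\partial z_{j-1}}\frac{\partial z_{j-1}}{\partial \theta} + \frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta}$;
        $\frac{\partial z_j}{\partial \theta} \leftarrow \frac{\partial z_j}{\partial \theta} + \frac{\partial z_j}{\partial x_{t-1}}\frac{\partial x_{t-1}}{\partial \theta} + \frac{\partial z_j}{\partial x_{t-2}}\frac{\partial x_{t-2}}{\partial \theta}$;
    end
    $\frac{\partial x_t}{\partial \theta} \leftarrow \frac{\partial z_{N_I}}{\partial \theta}$;
    $\frac{\partial L}{\partial \theta} \leftarrow \frac{\partial L}{\partial \theta} + \frac{\partial L}{\partial x_t}\frac{\partial x_t}{\partial \theta}$
end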

The matrix $\frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta}$ appears because the energy function is the sum of a set of component energy functions, each with its own weight, $w_c$. These weights are generated from the parameters $\theta$. The terms $\frac{\partial z_j}{\partial x_{t-1}}$ and $\frac{\partial z_j}{\partial x_{t-2}}$ appear because of the inertial and constant velocity components of the energy function. These components involve the location of previous steps in the pedestrian track.

6.1. Deriving Derivatives for EDest

Space does not allow for a full derivation of all of the derivatives, so only the derivation of terms for $E_{Dest}$ will be shown here. We refer the reader to [12] for more information about deriving the derivatives.

The Newton step in the optimization procedure described in Section 5 can be thought of as minimizing a second-order approximation of $\hat{E}(\cdot)$, the upper bound on $E(\cdot)$. In this section, the solution to this second-order approximation will be denoted as $z_j = A^{-1}h$, where $z_j$ is one of the intermediate steps in the optimization.

Because $\hat{E}(\cdot)$ is the sum of the individual component functions, such as $\hat{E}_{Dest}$, the second-order approximation of $\hat{E}(\cdot)$ is the sum of the second-order approximations of the individual energy components. Thus $A$ is actually the sum of matrices, with one matrix for each of the energy components. The vector $h$ is likewise a sum.

The contribution of $E_{Dest}$ to $A$ can be found by noting that the upper bound on the destination component, $\hat{E}_{Dest}$, is itself quadratic. It can be expressed in the form

$$\hat{E}_{Dest}(z_j; z_{j-1}) = \frac{1}{2} e^{\theta_3} (z_j - d)^T \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} (z_j - d) \tag{10}$$

where $d$ is the destination of the pedestrian, $a = \left(\sqrt{\|z_{j-1} - d\|^2 + \epsilon}\right)^{-1}$, and the term $e^{\theta_3}$ is the weight assigned to this component of the energy function. The exponential is used to ensure that all weights are positive.

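Differentiation of this quadratic system makes it possible to compute the contribution to $A$ and $h$, which will be denoted as $A_{Dest}$ and $h_{Dest}$, as

$$A_{Dest} = e^{\theta_3} \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} \qquad h_{Dest} = e^{\theta_3} \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix} d \tag{11}$$

As the second-order approximation of $\hat{E}(\cdot)$ is the sum of components from the different motivations, the derivative $\frac{\partial z_j}{\partial z_{j-1}}$ is the sum of terms corresponding to each of these components. We denote the contribution of the destination component $E_{Dest}(\cdot)$ as

$$\frac{\partial z_j^{Dest}}{\partial z_{j-1}} = -e^{\theta_3} A^{-1}\, \mathrm{diag}(z_j - d) \left[\frac{\partial \mathbf{a}}{\partial z}\right] \tag{12}$$

where $\mathbf{a} = \begin{bmatrix} a \\ a \end{bmatrix}$, using $a$ as defined above, and $\mathrm{diag}(z_j - d)$ is a diagonal matrix with the vector $(z_j - d)$ placed along the diagonal. The derivation of this contribution can be found in the Appendix.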

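The derivative $\frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta}$ can be computed in a similar fashion:

$$\frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta} = -A^{-1} \begin{bmatrix} e^{\theta_3} a & 0 \\ 0 & e^{\theta_3} a \end{bmatrix} (z_j - d) \tag{13}$$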
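Equation (13) can be sanity-checked against a finite difference in a reduced setting. The sketch below assumes an energy with only the limited movement term and the destination bound of Equation (10), so that $A = (2e^{\theta_1} + e^{\theta_3}a)I$ and $h = 2e^{\theta_1}x_t + e^{\theta_3}a\,d$; the curvature $a$ is held fixed, as it depends on $z_{j-1}$ rather than directly on $\theta_3$. This two-term setup and all names are assumptions made for illustration.

```python
import math

EPSILON = 1e-4

def surrogate_minimizer(theta1, theta3, x_t, d, a):
    """z_j = A^{-1} h for the two-term quadratic bound; A is s times the identity."""
    w1, w3 = math.exp(theta1), math.exp(theta3)
    s = 2.0 * w1 + w3 * a
    z = tuple((2.0 * w1 * x_t[k] + w3 * a * d[k]) / s for k in range(2))
    return z, s

def dz_dtheta3_analytic(theta1, theta3, x_t, d, a):
    """Equation (13): -A^{-1} (e^{theta3} a I) (z_j - d)."""
    z, s = surrogate_minimizer(theta1, theta3, x_t, d, a)
    w3 = math.exp(theta3)
    return tuple(-(w3 * a * (z[k] - d[k])) / s for k in range(2))

def dz_dtheta3_numeric(theta1, theta3, x_t, d, a, step=1e-6):
    """Central finite difference of z_j with respect to theta3."""
    z_plus, _ = surrogate_minimizer(theta1, theta3 + step, x_t, d, a)
    z_minus, _ = surrogate_minimizer(theta1, theta3 - step, x_t, d, a)
    return tuple((z_plus[k] - z_minus[k]) / (2.0 * step) for k in range(2))
```

Increasing $\theta_3$ strengthens the pull toward the destination, so the derivative points from $z_j$ toward $d$, which is the sign that Equation (13) gives.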

7. Evaluation

To evaluate this model of pedestrian movement, we gathered a dataset of 92 pedestrian tracks. In our study we used individuals moving through hallways as our training and testing data. For this experiment, ground-truthing the pedestrian tracks was done manually to eliminate errors that may be caused by automatic tracking methods, though the experiments in Section 7.1 will use tracks generated by an unsupervised algorithm. The location given to each person was his or her location on the ground plane, or simply where he or she was standing at each point in time.

Using a simple homography, we mapped the image coordinates to real-world coordinates, and all learning and prediction was done in the real-world coordinate system. A point on the top of the head, such as that used by [8], would be easier to track; however, these points would not move on the same plane, because people differ in height, which would cause errors in the mapping between coordinate systems.

Figure 4. Examples of pedestrian paths, shown in black, and predicted paths, shown in red. The model accurately predicts the deflection of pedestrians due to oncoming obstacles.

One third of the tracks were used for training and two-thirds were used for testing. In our first experiment, we used the model to predict each pedestrian’s path, given the individual’s initial position, velocity, and the locations of the obstacles. Figure 4 shows predicted and actual tracks of different individuals. The model accurately predicts the deflection of pedestrians from obstacles.

Figure 6 shows the primary failure mode of the model. A few of the tracks in our dataset include meetings between people, or groups of people walking together. Often when this happens, prediction suffers because the model expects pedestrians to be repelled by other pedestrians’ locations, rather than attracted to them. This behavior could potentially be used to classify the tracks in which meetings occur. Similar work in detection of anomalous events using social models has shown promise in this area [10].

7.1. Predicting Position from Noisy Tracking

When tracks are lost in many tracking applications, they are assumed to continue in their previously known direction. Here, we show how our model can be used to predict a pedestrian’s future position in the case where the position cannot be obtained from image data.

Because the system will not be working from whole tracks, the destination is not known, and for these experiments the corresponding $\theta_3$ parameter is fixed at $-\infty$, so that $w(\theta_3) = 0$. The model is trained to independently predict, at every time step, the next single step as accurately as possible. This model was trained using tracks automatically generated by the algorithm described in [7].

We compared the predictions from our model against predictions formed using only the assumption that the pedestrian maintains their last known trajectory. In these predictions, the pedestrian follows a straight line defined by the previous two steps. We also experimented with splines fit to the previous path, but found that the straight-line prediction performed better.
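The straight-line baseline described above amounts to repeating the last observed displacement. A minimal sketch, with a hypothetical function name and tuples for 2D points:

```python
def straight_line_prediction(x_prev, x_t, n_steps):
    """Extrapolate n_steps future positions along the line through the last two steps."""
    vx, vy = x_t[0] - x_prev[0], x_t[1] - x_prev[1]  # last observed displacement
    return [(x_t[0] + k * vx, x_t[1] + k * vy) for k in range(1, n_steps + 1)]
```

For example, a pedestrian observed at `(0.0, 0.0)` and then `(1.0, 2.0)` is predicted to continue through `(2.0, 4.0)`, `(3.0, 6.0)`, and so on. The first predicted point is the same extrapolated point that appears inside the constant velocity term of Equation (3).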

Length   Baseline Model   Our Model   Improvement
1        24.462           19.698      19.47%
2        41.155           30.198      26.62%
3        55.961           38.829      30.61%
4        71.116           47.486      33.23%
5        85.939           56.432      34.34%
6        102.986          66.154      35.76%

Table 1. Length denotes the number of time steps the model must predict; Baseline Model is a model which assumes pedestrians will continue in their current trajectory; Our Model denotes a model trained as described in this paper without the Destination cost; Improvement is the percentage decrease in error from the baseline model to our model. The distances for both models are given in an arbitrary coordinate system where 24 units correspond to approximately 1.5 ft.
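The Improvement column can be recomputed from the two error columns as a sanity check. The sketch below copies the values from Table 1 and uses the caption's scale of 24 units per 1.5 ft; the names `TABLE_1`, `UNITS_PER_FOOT`, and `improvement_percent` are hypothetical. The recomputed percentages may differ from the reported ones in the last digit, presumably because the published errors are rounded.

```python
# (length, baseline error, model error, reported improvement in percent),
# copied from Table 1.
TABLE_1 = [
    (1, 24.462, 19.698, 19.47),
    (2, 41.155, 30.198, 26.62),
    (3, 55.961, 38.829, 30.61),
    (4, 71.116, 47.486, 33.23),
    (5, 85.939, 56.432, 34.34),
    (6, 102.986, 66.154, 35.76),
]

# From the caption: 24 units correspond to approximately 1.5 ft.
UNITS_PER_FOOT = 24 / 1.5


def improvement_percent(baseline: float, model: float) -> float:
    """Percentage decrease in error from the baseline to the model."""
    return 100.0 * (baseline - model) / baseline


computed = [improvement_percent(b, m) for _, b, m, _ in TABLE_1]

for (length, baseline, model, reported), value in zip(TABLE_1, computed):
    model_feet = model / UNITS_PER_FOOT  # approximate error in feet
    print(f"length={length}: computed {value:.2f}% (reported {reported:.2f}%), "
          f"model error ~{model_feet:.2f} ft")
```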

[Figure 5 plot: "Weights of Learned Avoidance Terms"; x-axis: Time Offset; y-axis: w(θ).]

Figure 5. Learned parameter values corresponding to the multiple avoidance locations. A time offset of 2 corresponds to 0.8 seconds.

Table 1 shows the total error of the various methods over several time lengths. The error in the estimates of future position is significantly reduced by using the pedestrian model, which has avoidance terms in addition to the constant velocity assumption. Notice that our model offers greater improvement the farther ahead the prediction must be made.

Quantitative results on manually labeled tracks also showed significant improvements of 10% over the baseline.

7.2. Avoidance Field

As discussed in Section 4.4, multiple values for the σ parameter are each given their own weight, essentially learning the size of the radius of influence that one pedestrian exerts on another in the scene. Similarly, multiple avoidance locations are used to improve the accuracy of the model. The combination of these avoidance terms creates a sort of field, which is visualized in Figure 2.
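To make the idea of a multi-scale avoidance field concrete, the following sketch sums several weighted bumps of different widths around another pedestrian. Section 4.4 is not reproduced here, so the isotropic Gaussian form, the exponential weights exp(θ), and all numeric values are assumptions chosen for illustration rather than the paper's exact cost; the name `avoidance_cost` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def avoidance_cost(pos: Point, other: Point,
                   thetas: Sequence[float], sigmas: Sequence[float]) -> float:
    """Weighted sum of isotropic Gaussian bumps centred on `other`.

    Assumed form: sum_k exp(theta_k) * exp(-d^2 / (2 * sigma_k^2)),
    where d is the distance between `pos` and `other`.
    """
    d2 = (pos[0] - other[0]) ** 2 + (pos[1] - other[1]) ** 2
    return sum(math.exp(t) * math.exp(-d2 / (2.0 * s * s))
               for t, s in zip(thetas, sigmas))


# Hypothetical parameters: a narrow bump and a wide bump, equal weights.
thetas = [0.0, 0.0]
sigmas = [1.0, 2.0]

cost_at_center = avoidance_cost((0.0, 0.0), (0.0, 0.0), thetas, sigmas)
cost_nearby = avoidance_cost((1.0, 0.0), (0.0, 0.0), thetas, sigmas)
cost_far = avoidance_cost((10.0, 0.0), (0.0, 0.0), thetas, sigmas)
print(cost_at_center, cost_nearby, cost_far)
```

In a sketch like this, raising the weight on a wide bump relative to a narrow one enlarges the effective radius of influence, which is the quantity the learned weights are said to capture.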


Figure 6. An example of the limitation of the current model. In this case, the pedestrian is walking as part of a group. Because group interaction is not currently modeled, the model incorrectly treats the group's members as obstacles. Most of the system's large errors in prediction come from group behavior. Extending this model to consider group interactions is an area of future research.

After training, the θ4 weights showed that pedestrians prefer to avoid the predicted future location of a person even more than they avoid the actual person. Figure 5 shows the corresponding weights of the avoidance terms.

8. Conclusion

We have presented a method for automatically learning parameters for pedestrian models from real world observations. We have used automatically detected tracks to train a model which describes pedestrian movements. We have shown how this model can improve existing detection and tracking methods. We have shown how to pose the problem as an energy minimization problem and shown how training can produce accurate results which can be used in a variety of tasks.

Acknowledgements

This work was supported by grant HM-15820810021 through the NGA NURI program.

References

[1] S. Ali and M. Shah. Floor fields for tracking in high density crowd scenes. ECCV, 2008.

[2] G. Antonini, S. Martinez, M. Bierlaire, and J. Thiran. Behavioral priors for detection and tracking of pedestrians in video sequences. IJCV, 2006.

[3] D. Helbing, I. Farkas, and T. Vicsek. Simulating dynamical features of escape panic. Nature, 2000.

[4] D. Helbing and P. Molnar. Social force model for pedestrian dynamics. Physical Review E, 1995.

[5] R. Hughes. A continuum theory for the flow of pedestrians. Transportation Research Part B, 2002.

[6] R. Hughes. The flow of human crowds. Annual Review of Fluid Mechanics, 2003.

[7] O. Javed and M. Shah. Tracking and object classification for automated surveillance. ECCV, 2002.

[8] A. Johansson, D. Helbing, and P. Shukla. Specification of a microscopic pedestrian model by evolutionary adjustment to video tracking data. Advances in Complex Systems, 2008.

[9] T. Lakoba, D. Kaup, and N. Finkelstein. Modifications of the Helbing-Molnar-Farkas-Vicsek social force model for pedestrian evolution. SIMULATION, 2005.

[10] R. Mehran and M. Shah. Abnormal crowd behavior detection using social force model. CVPR, 2009.

[11] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. CVPR, 1999.

[12] M. Tappen. Utilizing variational optimization to learn Markov random fields. CVPR, 2007.

[13] A. Treuille, S. Cooper, and Z. Popovic. Continuum crowds. ACM SIGGRAPH, 2006.

Appendix: Derivations from Section 6

In this appendix, we describe the details of deriving the derivatives from Section 6.1, using the notation from that section. When computing the derivatives, we will rely on the identity

$$
\frac{\partial A^{-1}}{\partial \theta} = -A^{-1} \frac{\partial A}{\partial \theta} A^{-1} \tag{14}
$$
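The identity in Equation 14 can be checked numerically with a central finite difference. The sketch below uses a small 2×2 matrix A(θ) invented purely for the check (it is not the matrix from Section 6.1), and the helper names `inv2`, `matmul2`, `A`, and `dA` are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def A(theta):
    """Illustrative matrix: a fixed symmetric part plus exp(theta) * I."""
    e = math.exp(theta)
    return [[2.0 + e, 0.5], [0.5, 3.0 + e]]


def dA(theta):
    """Analytic derivative of A with respect to theta."""
    e = math.exp(theta)
    return [[e, 0.0], [0.0, e]]


theta, h = 0.3, 1e-5
A_inv = inv2(A(theta))

# Right-hand side of Equation 14: -A^{-1} (dA/dtheta) A^{-1}.
analytic = [[-v for v in row]
            for row in matmul2(matmul2(A_inv, dA(theta)), A_inv)]

# Left-hand side, approximated by a central difference of A^{-1}.
plus, minus = inv2(A(theta + h)), inv2(A(theta - h))
numeric = [[(plus[i][j] - minus[i][j]) / (2.0 * h) for j in range(2)]
           for i in range(2)]

max_gap = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

A gap near the finite-difference error level indicates that the analytic expression and the numerical derivative agree for this example.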

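We will begin by finding $\frac{\partial z_j}{\partial w}\frac{\partial w}{\partial \theta_3}$. Using the product rule, this is equal to $\frac{\partial A^{-1}}{\partial \theta_3} h + A^{-1} \frac{\partial h}{\partial \theta_3}$. The first term in this sum can be rewritten as

$$
\begin{aligned}
\frac{\partial A^{-1}}{\partial \theta_3} h
&= -A^{-1} \begin{bmatrix} e^{\theta_3} a & 0 \\ 0 & e^{\theta_3} a \end{bmatrix} A^{-1} h \\
&= -A^{-1} \begin{bmatrix} e^{\theta_3} a & 0 \\ 0 & e^{\theta_3} a \end{bmatrix} z_j
\end{aligned}
\tag{15}
$$

The derivation of the second term, $A^{-1} \frac{\partial h}{\partial \theta_3}$, is straightforward, leading to Equation 13.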

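The derivative $\frac{\partial z_j^{\mathrm{Dest}}}{\partial z_{j-1}}$ is computed in a similar fashion. Defining a matrix $W = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$, the primary difficulty is that $W$ must be differentiated with respect to both components of $z_{j-1}$. Following a process similar to that just shown,

$$
\begin{aligned}
\frac{\partial A^{-1}}{\partial z^x_{j-1}} h
&= -e^{\theta_3} A^{-1} \frac{\partial W}{\partial z^x_{j-1}} A^{-1} h \\
&= -e^{\theta_3} A^{-1} \frac{\partial W}{\partial z^x_{j-1}} z_j \\
&= -e^{\theta_3} A^{-1} \operatorname{diag}(z_j) \frac{\partial a}{\partial z^x_{j-1}}
\end{aligned}
\tag{16}
$$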

where $a$ is defined as in Section 6.1 and $z^x_{j-1}$ is the component of that vector that refers to the x, or horizontal, position of the pedestrian. Expressing the derivative in this form makes it convenient to compute the derivative with respect to both components of $z_{j-1}$:

$$
\frac{\partial A^{-1}}{\partial z_{j-1}} h = -e^{\theta_3} A^{-1} \operatorname{diag}(z_j) \frac{\partial a}{\partial z_{j-1}} \tag{17}
$$

The entire derivative $\frac{\partial z_j^{\mathrm{Dest}}}{\partial z_{j-1}}$ can be found by using a similar set of steps to find $A^{-1} \frac{\partial h}{\partial z_{j-1}}$.