journal of icce, vol. 1, no. 1, july 2012 1 tracking of ......journal of icce, vol. 1, no. 1, july...

9
JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 1 Tracking of 2D or 3D Irregular Movement by a Family of Unscented Kalman Filters Junli Tao and Reinhard Klette Abstract—The paper reports about the design of an object tracker which utilizes a family of unscented Kalman filters, one for each tracked object. This is a more efficient design than having one unscented Kalman filter for the family of all moving objects. The performance of the designed and implemented filter is shown by using simulated movements, and also for object movements in 2D an 3D space. Index Terms—unscented Kalman filter, detection-by-tracking, 2D/3D tracking. 1 I NTRODUCTION O BJECT tracking is an important task for many applications, such as for robot navigation, surveillance, automotive safety, or video content indexing. Based on trajectories obtained through tracking, some advanced be- haviour analysis can be applied. For instance, the pedestrian trajectory can be analysed to warn a driver if trajectories of the vehicle and of the pedestrian are potentially intersecting. For multiple object tracking, tracking-by- detection methods are the most popular algo- rithms. A detector is used in each image frame to obtain candidate objects. Then, with a data- association procedure, all the candidates are matched to the existing trajectories as known up to the previous frame. Any unmatched candidate starts a new trajectory. Since there is no perfect detector that detects all objects without any false positives and false nega- tives, sometimes objects are missed (i.e. they appear in the image but are not detected), or background windows are incorrectly detected as being objects. Such false-positive or false- negative detections increase the difficulty of tracking. Occlusion by other objects or background is one main reason that causes the detection to Junli Tao and Reinhard Klette are with the Department of Com- puter Science, University of Auckland, Auckland, NZ. E-mail: [email protected] Manuscript received July 18, 2012; revised. fail and increases the difficulty for tracking(e.g., identity switch). Some algorithms [22] [1] pro- pose to track objects in the 2D image plane. The occlusion problem is handled either us- ing part detectors and tracking detected body parts, or adopting instance-specific classifiers to improve performance of data assignment. However, tracking in 2D image plane itself increases the ambiguity for data association. A tall person nearby, and a small person far away, for example, may appear very close to each other in the image, and probably the small person is occluded by the tall one in some frames. but they are actually several meters away from each other. Thus, often, and also in this paper, stereo information is adopted to improve the tracking performance [6] [12] [14], and multiple pedestrians are tracked in 3D coordinates. Tracking objects with irregular movements in 3D space is a challenging task due to totally unknown speed and direction. In this paper, the application of an unscented Kalman filter (UKF) is demonstrated which can also handle nonlinear (actually: fully irregular) trajectories in 3D space. For the original paper on UKF, see [19]. Similar work is proposed in [14]. But instead of modelling the motion of the vehi- cle and the pedestrians separately, we straight forward model the relative motion between them, and no ground plane is assumed, objects moving in 6 degree-of-freedom can be tracked properly. Different types of model are tested

Upload: others

Post on 02-Feb-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 1

    Tracking of 2D or 3D Irregular Movementby a Family of Unscented Kalman Filters

    Junli Tao and Reinhard Klette

    Abstract—The paper reports about the design of an object tracker which utilizes a family of unscented Kalman filters,one for each tracked object. This is a more efficient design than having one unscented Kalman filter for the family of allmoving objects. The performance of the designed and implemented filter is shown by using simulated movements, andalso for object movements in 2D an 3D space.

    Index Terms—unscented Kalman filter, detection-by-tracking, 2D/3D tracking.

    F

    1 INTRODUCTION

    O BJECT tracking is an important task formany applications, such as for robotnavigation, surveillance, automotive safety, orvideo content indexing. Based on trajectoriesobtained through tracking, some advanced be-haviour analysis can be applied. For instance,the pedestrian trajectory can be analysed towarn a driver if trajectories of the vehicle andof the pedestrian are potentially intersecting.

    For multiple object tracking, tracking-by-detection methods are the most popular algo-rithms. A detector is used in each image frameto obtain candidate objects. Then, with a data-association procedure, all the candidates arematched to the existing trajectories as knownup to the previous frame. Any unmatchedcandidate starts a new trajectory. Since thereis no perfect detector that detects all objectswithout any false positives and false nega-tives, sometimes objects are missed (i.e. theyappear in the image but are not detected), orbackground windows are incorrectly detectedas being objects. Such false-positive or false-negative detections increase the difficulty oftracking.

    Occlusion by other objects or background isone main reason that causes the detection to

    • Junli Tao and Reinhard Klette are with the Department of Com-puter Science, University of Auckland, Auckland, NZ.E-mail: [email protected]

    Manuscript received July 18, 2012; revised.

    fail and increases the difficulty for tracking(e.g.,identity switch). Some algorithms [22] [1] pro-pose to track objects in the 2D image plane.The occlusion problem is handled either us-ing part detectors and tracking detected bodyparts, or adopting instance-specific classifiersto improve performance of data assignment.However, tracking in 2D image plane itselfincreases the ambiguity for data association.A tall person nearby, and a small person faraway, for example, may appear very close toeach other in the image, and probably the smallperson is occluded by the tall one in someframes. but they are actually several metersaway from each other. Thus, often, and alsoin this paper, stereo information is adopted toimprove the tracking performance [6] [12] [14],and multiple pedestrians are tracked in 3Dcoordinates.

    Tracking objects with irregular movements in3D space is a challenging task due to totallyunknown speed and direction. In this paper,the application of an unscented Kalman filter(UKF) is demonstrated which can also handlenonlinear (actually: fully irregular) trajectoriesin 3D space. For the original paper on UKF,see [19]. Similar work is proposed in [14]. Butinstead of modelling the motion of the vehi-cle and the pedestrians separately, we straightforward model the relative motion betweenthem, and no ground plane is assumed, objectsmoving in 6 degree-of-freedom can be trackedproperly. Different types of model are tested

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 2

    and compared in both simulation and real se-quences.

    2 RELATED WORKMultiple object tracking attracts much of at-tention recently in computer vision research.Today, an update of the review [24] from 2006should also include work such as in [1], [5], [6],[12], [13], [15]–[18].

    Kalman filters (KF) are extensively adoptedto deal with tracking task. KF is a recursiveBayesian filter, firstly, using motion informa-tion to predict possible position, followed byfusing the observation (detection) and pre-dicted position. A linear Kalman filter is usedfor tracking (see, e.g., [24]) when movementcan be approximated by linear models. Ob-viously, a linear model is not true for mostof the cases. The extended Kalman filter (EKF)was designed [21] for handling a nonlinearmodel by linearizing functions using the Taylorexpansion extensively. For example, an EKFhas been used for Simultaneous Localization andMapping (SLAM) [9], and for pedestrian track-ing [23]. Particle filter were used to handlethe task in [2]. Similar performance as EKF isreported in [6].

    The UKF can handle a nonlinear model byusing the unscented transform to estimate thefirst and second order moments of sigma points,which represent the distribution of a predictedstate and predicted observations, and it ap-pears that the UKF does this better than theEKF [8]. Thus, in this paper, an UKF is usedfor tracking multiple, irregularly moving ob-jects in 3D space, which is a ‘highly nonlinear’problem.

    3 UNSCENTED KALMAN FILTERThe unscented transform (UT) is the core partthat makes UKF able to handle nonlinear mod-els. Let L be the dimensionality of the systemstate xt−1|t−1 at time t − 1. If the system noise(process noise Q and measurement noises R)is not additive, the state is augmented beforeUT. In our case, the random acceleration isintroduce as the process noise, thus, the stateaugmented with a process noise vector, is de-noted by x(a)t−1|t−1, called vectors. The dimension

    of the augmented vector, depends on the pro-cess model, which is illustrated in Section 4.2.Let xt|t−1 denote the predicted state at time twhen passing xt−1|t−1 through process functionf . Let yt|t−1 be the predicted observation attime t when passing xt|t−1 through observationfunction h.

    The UT works by sampling 2L+ 1 sigma vec-tors X(a)i in the augmented state space (follow-ing [19]), forming a matrix X (a). The covariancematrix in augmented state space is denotedby P(a). Let P(xx)t−1|t−1 be the state covariancematrix (i.e. describing dependencies betweencomponents of a state x). Formally,

    X(a)0 = x

    (a)t−1|t−1

    X(a)i = x

    (a)t−1|t−1 + (

    √(L+ λ)P

    (a)t−1|t−1)i

    for i = 1, 2, . . . , L

    X(a)i = x

    (a)t−1|t−1 − (

    √(L+ λ)P

    (a)t−1|t−1)i−L

    for i = L+ 1, L+ 2, . . . , 2L

    X (a) =[X (s)0

    ]+

    [0X (n)

    ]

    where λ is a positive real, a scaling parameter.These sigma vectors can be passed through anonlinear function (e.g., f , h) one by one, thusdefining transformed (i.e. new) sigma vectors(e.g., X (s)t|t−1 and Yt|t−1 are obtained). The meansxt|t−1 or yt|t−1 and covariance matrices P

    (xx)t|t−1,

    or P(yy)t|t−1, are obtained as follows, take h forexample:

    Yt|t−1 = h(X (s)t−1|t−1)

    yt|t−1 ≈L∑

    i=0

    W(m)i Yi

    P(yy)t|t−1 ≈

    L∑i=0

    W(e)i (Yi − yt|t−1)(Yi − yt|t−1)T

    with constant weights W (.)i . Details are givenin [19].

    The UKF is illustrated as follows. At first weinitialize the state x = x0 and state covariancesP(xx) = P

    (xx)0 . For the augmented vectors, let

    x(a) = (xT 0T )T

    P(a) = diag(P(xx),Q)

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 3

    where Q denotes the process-noise covariancematrix. Details about Q are given in Section 4.2.

    For t ∈ (1, . . . ,∞), we calculate sigma vectorsas follows:

    X (a)t−1|t−1 = (x(a)t−1|t−1,

    x(a)t−1|t−1 + γ

    √P

    (a)t−1|t−1,

    x(a)t−1|t−1 − γ

    √P

    (a)t−1|t−1)

    where γ =√λ+ L. The process update is

    defined as follows:

    X (s)t|t−1 = f(X(s)t−1|t−1,X

    (n)t−1|t−1)

    xt|t−1 =2L∑i=0

    W(m)i X

    (s)i

    P(xx)t|t−1 =

    2L∑i=0

    W(c)i (X

    (s)i − xt|t−1)(X

    (s)i − xt|t−1)T

    We update the sigma vectors using

    X (s)t|t−1 = (xt|t−1,

    xt|t−1 + γ

    √P

    (xx)t|t−1,

    xt|t−1 − γ√

    P(xx)t|t−1)

    Yt|t−1 = h(X (s)t|t−1)

    yt|t−1 =2L∑i=0

    W(m)i Yi

    and update the measurement covariance ma-trix as follows:

    P(yy)t|t−1 =

    2L∑i=0

    W(c)i (Yi − yt|t−1)(Yi − yt|t−1)T + R

    where R is the assumed measurement noise co-variance, depending on the observation modelselected. Details are given in Section 4.2.

    Altogether, the UKF is defined by

    P(xy)t|t−1 =

    2L∑i=0

    W(c)i (Xi − xt|t−1)(Yi − yt|t−1)T

    Kt = P(xy)t|t−1(P(yy)t|t−1)

    −1

    xt|t = xt|t−1 +Kt(yt − yt|t−1)P

    (xx)t|t = P

    (xx)t|t−1 −KtP

    (xy)t|t−1K

    Tt

    4 MULTIPLE OBJECT TRACKING

    Following tracking-by-detection methods,which are popular for solving multiple-objecttracking tasks, a detector is applied in eachframe to generate object candidates which areoutputs of the detector. One UKF is adoptedfor tracking one object separately, thus a groupof detected pedestrians defines a family ofUKFs to be processed simultaneously. EachUKF tracks one detected object. The predictedstate of an UKF is used for data association;when an observation (of the tracked object) isavailable in the current frame then we updatethe predicted state by using the correspondingUKF.

    4.1 Detection

    Detection-by-tracking methods rely on eval-uating rectangular regions of interest, we callthem object boxes if positively identified as con-taining an object of interest. For pedestriantracking, we adopt the popular histogram oforiented gradients (HOG) feature method anda support vector machine (SVM) classifier, orig-inally introduced in [4]. HOG features describethe human profile by an oriented gradient his-togram. An SVM classifier is able to handlehigh-dimensional and non-linear features (suchas HOG features). It projects sample featuresinto a high-dimensional space, and then findsa hyperplane to separate two classes. Insteadof using a sliding window, regions of interest(i.e. inputs to the classifier) are selected byanalysing calculated stereo information (depthand disparity maps), as proposed in [7].

    Figure 1 shows several detection results inpedestrian sequence, dots(cyan) denotes theboxes centre that recognized as pedestrian, andthe red rectangle denotes the final detectionresults. As can be seen in the results, the objectboxes may contain background, shift from theobject, or miss the pedestrians.

    For the detection of Drosophila larvae (an ex-ample of 2D movement), thresholds and con-nected components are adopted to obtain oneobject box for each larvae. Several larvae detec-tion results are shown in Fig. 2. As the scene iscertain, the detection results are more reliable

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 4

    when compare to pedestrian sequence. But nodepth information is available here.

    4.2 UKF-Based Object Tracking

    As there is an unknown number of objects ina scene, the state-dimensionality would expandsignificantly if we would have decided to trackall pedestrians in one UKF; in this case, thespeed of tracking reduces dramatically whenthe scene is crowded with many detected ob-jects around. Thus, we decided for one UKF foreach detected object for tracking.

    Fig. 1. The depth map on top uses a colourcode for calculated distances; depth values areonly shown at pixels where the mode filter ac-cepts the given value. The lower image showsdetected (coloured) object boxes.

    Fig. 2. Larvae detection results shown by (cyan)object boxes.

    Choosing a proper model is important. Inthis subsection we offer three models for pos-sible selection, namely 3D position (world co-ordinates) with velocity observed, denoted by3DVT, 3D position without velocity observed,denoted by 3DT, and 2D position (image coor-dinates) without velocity observed, denoted by2DT. These models are compared in Section 5.

    4.2.1 Both 3D modelsIn the 3DVT model, the object is tracked in3D world coordinates. Its 3D position (x, y, z)is the first part of the state. We also includethe velocity (vx, vy, vz). Thus, a state x =(x, y, z, vx, vy, vz)

    T is 6-dimensional.Process Model. We assume constant velocity

    between adjacent frames, with Gaussian dis-tributed noise acceleration na ∈ N(0,Σna). Thediagonal elements in Σna are set to be equaland denoted by σ2na . Thus,

    ẋ = x+ (vx + nax∆t)∆t, v̇x = vx + nax∆t

    ẏ = y + (vy + nay∆t)∆t, v̇y = vy + nay∆t

    ż = z + (vz + naz∆t)∆t, v̇z = vz + naz∆t

    where ∆t is the time interval between subse-quent frames.

    Observation Model. An observation consistsof the position (io, jo) (say, the centroid ofthe detected object box in the left camera),disparity d of the detected object, and veloc-ity (vox, voy, voz) in 3D coordinates. The usualpinhole-camera projection model is used tomap 3D points into the image plane,

    io = fx/z, jo = fy/z

    d = fb/z, vox = vx

    voy = vy, voz = vz

    where f denotes focal length, and b de-notes the length of the baseline between two

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 5

    rectified stereo cameras. In this case, R =diag(σ2nmp , σ

    2nmp , σ

    2nmp , σ

    2nmv , σ

    2nmv , σ

    2nmv).

    For the disparity d we select the mode in thedisparity map in a fixed (e.g. 20 × 20) neigh-bourhood around the centroid (io, jo) of thedetected object box. 3D scene flow (vox, voy, voz)can be obtained by combining optic flow andstereo information [20].

    As it is difficult to obtain high-quality sceneflow as required for 3DVT, 3DT simplifies3DVT model by excluding the scene flow inobservation, and has the same process model as3DVT. In this case, R = diag(σ2nmp , σ

    2nmp , σ

    2nmp).

    4.2.2 The 2D modelIf only monocular recording is available, theobject is tracked in the 2D image plane only.The state x = (i, j, vi, vj)T consists of position(i, j) and velocity (vi, vj).

    Process Model. The same as for the 3Dmodels. We assume constant velocity betweensubsequent frames with a Gaussian noise dis-tribution for acceleration na:

    i̇ = i+×(vi + nai∆t)∆t, v̇i = vi + nai∆tj̇ = j +×(vj + naj ∆t)∆t, v̇j = vj + naj ∆t

    Observation Model. An observation consistsof the central position (io, jo) of an object boxonly, io = i and jo = j, resulting in R =diag(σ2nmp , σ

    2nmp) for this case.

    4.3 Data AssociationAs each object is tracked independently, dataassociation by matching candidates to existingtrajectories becomes important. If no matchthen we decide to initialize a new tracker.

    Since object movements are continuous, theestimated velocity in the UKF can be used asa cue to localize the search area for finding thematch object. For each trajectory, the possiblelocation (i.e. (xp, yp, zp) for 3D, and (ip, jp) for2D) of the object in the current frame is pre-dicted by process model used in the EKF. Thislocation is used as a reference for searchingpotentially matching candidates in the currentframe. Currently we simply match candidatesbased on shortest Euclidean distance and agiven threshold τ .

    One candidate might be matched with sev-eral trajectories if the Euclidean distance is be-low τ . Trajectories compete for the candidates,the closest wins finally. If a candidate is notmatched to any trajectory, a new tracker isinitialized. If a trajectory does not win any ofthe candidates, the tracker is propagated withthe given prediction, and the new state is thepredicted state, without being updated by anobservation (because not available).

    No object appearance description is usedhere for assigning an object to a trajectory.In general, the inclusion of appearance rep-resentation (e.g., a colour histogram, or aninstance-specific shape model) improves theperformance. However, this is out of the scopeof this paper where we discuss the combinationof different data-association methods.

    5 EXPERIMENTSIn this section, first, our three models (3DVT,3DT, and 2DT) are tested in a simulated envi-ronment with different parameter sets. Second,our multiple-object tracking method is testedon real video sequences where (3D example)pedestrians are walking in inner-city scenes,or (2D example) larvae are moving on a flatculture dish.

    5.1 Simulated TrackingThe three models defined in Section 4.2 aretested in a simulation environment in OpenGL.A cub is moving on a circular path around a 3Dpoint with constant speed, as show in Figs. 3and 4. Acceleration noise na with differentcovariance (e.g. σ2na = 0.0001, 0.01, 1), and mea-surement noise nm with different covariance(e.g. σ2nmp = 10, 50, 100, σ

    2nmv = 50, 100, 150), are

    used to test and compare the three models’performance. The simulation environment isdifferent for 2D and 3D models, where for 2D,positions are integral pixel coordinates in theimage plane, but for 3D, position coordinatesare reals. The radius of the circle in the 3Dmodels is 10, while in the 2D model it is50. In both environments, measurements aredegraded by noise before sent to the UKF.

    Figure 3 demonstrates the effect of σ2na , hav-ing fixed σ2nmp and σ

    2nmv . Experiments show

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 6

    that lager σ2na values result in more unstabletrajectories. Large σ2na means that the processmodel produces a predicted state that is fluc-tuating with large magnitudes. Results showthat σ2na = 0.0001 is a reasonable choice for3D models. For the 2D model, a smaller σ2nayields smooth estimation, but shifts are signif-icant. A larger σ2na value produces estimationsthat are more close to the true positions, butfluctuations are significant, for an experimentwith σ2na = 1 for the 2D case. In general, 3DTand 3dDVT converge better than 2DT, while3DT and 3dDVT show a similar performance.3D models use stereo information rather thanjust a single image as for the 2D model, whichalso proves that stereo information can helpto improve the tracking performance. As themeasured 3D position is noisy, the measure ofvelocity is even more noisy; this appears tobe the main reason for the observation thatthe inclusion of velocity cannot improve theperformance.

    Figure 4 shows results for the three modelsfor different covariance values σ2nmp and σ

    2nmv

    of measurement noise. Significantly increasing

    Fig. 3. Simulation results for variations in thevariance of acceleration noise. From left to right,σ2na = 0.0001, 0.01, or 1, with fixed valuesσ2nm = σ

    2nmp = 50, and σ

    2nmv = 100. From top

    to down, the tracking model is 3DVT, 3DT and2DT respectively.

    measurement noise (i.e. higher uncertainty ofobservations) reduces the performance onlyslightly. This demonstrates that the UKF is arobust tracker to some degree, which is notvulnerable to detection uncertainties. As be-fore, 3DT and 3dDVT converge better than2DT, while 3DT and 3dDVT show a similarperformance.

    Fig. 4. Simulation results for variable varianceof measurement noise. From left to right, σ2nmp =10, 50, or 100, σ2nmv = 50, 100, 150, respectively,with fixed σ2na = 0.0001 for 3D models, σ

    2na = 1

    for 2D models,. From top to down, the trackingmodel is 3DVT, 3DT and 2DT respectively.

    5.2 Multiple Object Tracking in Real DataIn this section we report about the performanceof UKF-supported tracking for multiple larvaeusing the 2DT model, and for multiple pedes-trians in traffic scenes using the 3DT model.larvae sequence and pedestrian sequences are30 and 15 frames per second.

    Results for larvae tracking are shown inFig. 5. As the velocity in the model is initial-ize by (0, 0), the UKF-estimation is “slower”than the real speed of the larvae in the first30 frames. The speed of convergence can beimproved by increasing σna , but note that thelarger the σna value is, the larger is the mag-nitude of fluctuation. The estimated trajecto-ries follow “well” the larvae moving, mainly

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 7

    because all larvae are properly detected in allframes. However, such a complete detectioncannot be expected for pedestrian sequences.Next, we test the UKF for such “noisy detectionresults” for pedestrian sequences.

    Fig. 5. 2D Tracking results of larvae sequences.From top to bottom: tracking results in Frames26, 46, and 166 of one sequence. Red linesshow the detected track, and white lines showthe UKF-predicted track. Blue lines representestimated trajectories. The left column is theoriginal intensity image overlaid with estimatedtrajectories.

    Results for pedestrian tracking are shown inFig. 6. Objects are missing or shifting from timeto time due to the clustered background (carin the traffic scene detected as a pedestrian),illumination variations (some pedestrians arenot detected), or internal variations betweenobjects (unstable detections). Our experimentsverified that UKF-predictions can follow irreg-ular moving pedestrians when detection failsfor a few frames, and can even correct unstabledetections.

    The second frame in Fig. 6 shows that the un-detected pedestrian is predicted correctly in thewhite object box, and successfully matched toa detected position in the third frame. The lastframe in Fig. ?? demonstrates that displaceddetections are corrected by the UKF. Using onlythe defined distance rule for data assignment,this appears to be insufficient, especially for

    the given detection results. A small thresholdmay lead to a mismatch (i.e. the detection failsto satisfy the rule), and a large threshold maylead to an identity switch (i.e. a pedestrian ismatched to another pedestrian).

    6 CONCLUSIONSAssigning one unscented Kalman filter to eachdetected (moving) object simplifies the designand implementation of UKF-prediction of 2Dor 3D motion. Experiments demonstrate therobustness of the chosen approach. This trackeronly generates short-term tracks when detec-tion is not reliable; long-term tracking shouldbe possible by also introducing dynamic pro-gramming. For evaluating the performance onreal-world (either 2D or 3D) applications, moreextensive tests need to be undertaken, espe-cially for the design and evaluation of quantita-tive performance measures. For example, mea-sures discussed in [10] for evaluating visualodometry techniques might also be of relevancefor the tracking case.

    ACKNOWLEDGMENTThe authors thank Simon Hermann for provid-ing his implementation of semi-global match-ing for stereo analysis, Gabriel Hartmann (bothAuckland) for support regarding the unscentedKalman filter, and Benjamin Risse (Münster) forvideo data with moving Drosophila larvae.

    REFERENCES[1] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier,

    and L. Van Gool. Online multiperson tracking-by-detectionfrom a single, uncalibrated camera. IEEE Trans. PatternAnalysis Machine Intelligence, 33:1820–1833, 2011.

    [2] Y. Cai, N. de Freitas, and J. J. Little. Robust visual trackingfor multiple targets. In Proc ECCV, pages 107–118, 2006.

    [3] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based objecttracking. IEEE Trans. Pattern Analysis Machine Intelligence,25:564–577, 2003.

    [4] N. Dalal and B. Triggs. Histograms of oriented gradientsfor human detection. In Proc. CVPR, volume 1, pages 886–893, 2005.

    [5] A. Ess, B. Leibe, K. Schindler, and L. Van Gool. Robustmultiperson tracking from a mobile platform. IEEE Trans.Pattern Analysis Machine Intelligence, 31:1831–1846, 2009.

    [6] A. Ess, K. Schindler, B. Leibe, and L. Van Gool. Objectdetection and tracking for autonomous navigation in dy-namic environments. Int. J. Robotic Research, 29:1707–1725,2010.

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 8

    Fig. 6. Left: object boxes (red), UKF-predictions (white), and UKF-estimations (blue) are overlaidon the original intensity image. The black box is an invalid detection excluded by a disparitycheck. Right: coloured blocks denote the object boxes (red), UKF-predictions (white), and UKF-estimations (blue) in the XZ-plane in real-world coordinates.

    [7] D. M. Gavrila and S. Munder. Multi-cue pedestrian detec-tion and tracking from a moving vehicle. Int. J. ComputerVision, 73:41–59, 2007.

    [8] G. Hartmann. Unscented Kalman filter sensor fusion formonocular camera localization. MSc thesis, Computer Sci-ence, Univ. Auckland, 2012.

    [9] S. Huang. Convergence analysis for extended Kalman filterbased SLAM. In Proc. IEEE Int. Conf. Robotics Automation,pages 412–417, 2006.

    [10] R. Jiang, R. Klette, and S. Wang. Statistical modeling oflong-range drift in visual odometry. In Proc. CVVT, ACCVworkshop, LNCS 6469, pages 214–224, 2011.

    [11] E. Kraft. A quaternion-based unscented Kalman filter fororientation tracking. In Proc. Int. Conf. Information Fusion,volume 1, pages 47–54, 2003.

    [12] B. Leibe, K. Schindler, N. Cornelis, S. Member, and L. VanGool. Coupled object detection and tracking from static

    cameras and moving vehicles. IEEE Trans. Pattern AnalysisMachine Intelligence, 30:1683–1698, 2008.

    [13] W. F. Leven and A. D. Lanterman. Unscented Kalmanfilters for multiple target tracking with symmetric mea-surement equations. IEEE Trans. Automatic Control, 54:370–375, 2009.

    [14] M. Meuter, U. Iurgel, S. B. Park, and A. Kummert. Theunscented Kalman filter for pedestrian tracking from amoving host. IEEE Symp. Intelligent Vehicles, pages 37–42,2008.

    [15] D. Mitzel, E. Horbert, A. Ess, and B. Leibe. Multi-persontracking with sparse detection and continuous segmenta-tion. In Proc ECCV, LNCS 6311, pages 397–410, 2010.

    [16] D. Mitzel, P. Sudowe, and B. Leibe. Real-time multi-person tracking with time-constrained detection. In ProcBMVC, pages 104.1–104.11, 2011.

    [17] D. Mitzel and B. Leibe. Real-time multi-person tracking

  • JOURNAL OF ICCE, VOL. 1, NO. 1, JULY 2012 9

    with detector assisted structure propagation. In Proc ICCVWorkshops, pages 974–981, 2011.

    [18] M. M. Shaikh, W. Bahn, C. Lee, T. Kim, T. Lee, K.Kim, and D. Cho. Mobile robot vision tracking systemusing unscented Kalman filter. In Proc. Int. Symp. SystemIntegration, pages 1214–1219, 2011.

    [19] E. A. Wan and R. Van Der Merwe. The unscented Kalmanfilter for nonlinear estimation. In Proc. IEEE Symp. AS-SPCC, pages 153–158, 2000.

    [20] A. Wedel, T. Brox, T. Vaudrey, C. Rabe, U. Franke, andD. Cremers. Stereoscopic scene flow computation for 3Dmotion understanding. Int. J. Computer Vision, 95:29–51,2011.

    [21] G. Welch and G. Bishop. An introduction to the Kalmanfilter. Technical Report, Univ. North Carolina at ChapelHill, 1995.

    [22] B. Wu and R. Nevatia. Detection and tracking of multiple,partially occluded humans by Bayesian combination ofedgelet based part detectors. Int. J. Computer Vision, 75:247–266, 2007.

    [23] H. S. Yazdi and S. E. Hosseini. Pedestrian tracking usingsingle camera with new extended Kalman filter. Int. J.Intelligent Computing Cybernetics, 1: 379–397, 2008.

    [24] A. Yilmaz, O. Javed, and M. Shah. Object tracking: asurvey. J. ACM Computing Surveys, 38, Article No. 13, 2006.

    Junli Tao is a PhD student at AucklandUniversity, New Zealand. She received herMaster degree in 2010 from Hunan Uni-versity at Hunan, China. Her research in-terests are in active pedestrian protectionsystems for diver assistance, describedby keywords such as ego-motion analy-sis, pedestrian detection and tracking, andpedestrian intention prediction.

    Dr. Reinhard Klette is a professor at theComputer Science Department at Auck-land University, New Zealand. His profes-sional interests are in computer vision (the-ory and applications) and in the designof geometric algorithms. He (co-)authoredmore than 350 peer-reviewed publications.He is the Editor-in-Chief of the Journal onControl Engineering and Technology.