Multiple Moving Target Detection, Tracking, and Recognition from a Moving Observer
Fenghui Yao, Ali Sekmen, and Mohan J. Malkani
Department of Computer Science / Department of Electrical and Computer Engineering
Tennessee State University, 3500 John A Merritt Blvd, Nashville, TN 37215, USA
{fyao, asekmen, mmalkani}@tnstate.edu
Abstract - This paper describes an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. When the camera is placed on a moving observer, the whole background of the scene appears to be moving, and the actual motion of the targets must be distinguished from the background motion. To do this, an affine motion model between consecutive frames is estimated, and then moving targets can be extracted. Next, the target tracking employs a similarity measure based on the joint feature-spatial space. Finally, the target recognition is performed by matching moving targets against a target database. The average processing time is 680 ms per frame, which corresponds to a processing rate of 1.5 frames per second. The algorithm was tested on the Vivid datasets provided by the Air Force Research Laboratory, and experimental results show that this method is efficient and fast for real-time application.
I. INTRODUCTION
Detection and tracking of moving objects in an image
sequence is one of the basic tasks in computer vision. The
detected moving object trajectory can either be of interest in its
own right or used as the input for a higher-level analysis such as
motion pattern understanding, moving behavior recognition
and so on. Applications include surveillance, homeland
security, protection of vital infrastructure, and advanced
human-machine communication. Therefore, moving object
detection and tracking has received more and more attention,
and many algorithms have been proposed. Among these, one
interesting approach is the particle filter [1], which has been
used and extended many times [2][3][4]. The particle filter was
developed to track objects in clutter, where the posterior
density and observation density are often non-Gaussian. The
key idea of particle filtering is to approximate the probability
distribution by a weighted sample set. Each sample consists of
an element which represents the hypothetical state of an object
and a corresponding probability. The state of an object may be
control points of a contour [1], the position, shape, and motion of an elliptical region [2], or specific model parameters [3]. That is, these methods [2][3] are model-based. Ross's approach [4] is a model-free, statistical detection method which uses both edge and color information. The common assumption of these methods [1][2][3] is that the background
does not move, and the image sequences are from a stationary
camera. Tian et al. [5] developed a real-time algorithm to detect salient motion in complex environments by combining temporal difference imaging and temporal filtered optical flow. The image sequence used in this method is also from a stationary camera.
The works of Smith and Brady [7] and Kang et al. [6] employed image sequences from moving platforms. Kang et al. developed an approach for tracking moving objects observed by both stationary and Pan-Tilt-Zoom cameras. Smith and Brady's approach employed the image sequence from a camera mounted on a vehicle to detect other moving vehicles. This method used special-purpose hardware to implement real-time target detection and tracking. The COMETS system detects targets from a moving observer (an autonomous helicopter) but does not perform tracking [8]. Yang et al.'s tracker works for image sequences from both stationary and moving platforms, but it detects and tracks a single target [9]. Literature [10] proposes a detection-based multiple object tracking method, literature [11] shows a multiple object tracking method based on a multiple hypotheses graph representation, and literature [12] demonstrates a distributed Bayesian multiple target tracker. However, they all employ image sequences from stationary observers. As shown above, there are few works that discuss multiple moving target detection and tracking from a moving observer, and few works deal with target recognition at the same time. This paper introduces a method for moving target detection, tracking, and recognition from a moving observer.
II. MOVING TARGET DETECTION FROM A MOVING OBSERVER
The entire configuration is shown in Fig. 1. The output of
the moving target detection is sent to the target tracking. The
tracked targets are sent to target recognition. This section describes moving target detection; target tracking and target recognition are discussed in Sections III and IV, respectively.
The moving observer usually means a camera mounted on a ground vehicle or on an airborne platform such as a helicopter or an unmanned aerial vehicle (UAV). In this work, the video sequences are generated by an airborne camera. In airborne video, everything (target and background) appears to be moving over time due to the camera motion. Before employing frame differencing (a simple motion detection method for stationary platforms) to detect motion images, it is necessary to conduct motion compensation first. Two-frame background motion
978-1-4244-2184-8/08/$25.00 © 2008 IEEE.
Proceedings of the 2008 IEEE
International Conference on Information and Automation
June 20 -23, 2008, Zhangjiajie, China
estimation is achieved by fitting a global parametric motion
model (affine or projective) to sparse optic flow. Here, we use the affine transformation model.
A. Optic Flow Detection
Sparse optic flow is obtained by applying the Lucas-Kanade algorithm [13]. The number of optic flow vectors is controlled in the range of 200 to 1000. Other methods, such as matching Harris corners, Moravec features, or SUSAN corners between frames, or matching SIFT features, are all applicable here. The main factors to be considered are computation cost and robustness. Experimental results show that the Lucas-Kanade method is the most reliable and quite fast.
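Per feature point, the Lucas-Kanade step solves a 2x2 linear system built from the spatial gradients and the temporal difference inside a small window. The following is a minimal stdlib-Python sketch of that single-window step on a synthetic image pair; the quadratic test image and window size are illustrative, and the paper's actual implementation uses OpenCV.

```python
def lk_flow(frame1, frame2, xs, ys):
    """Estimate one (u, v) optic-flow vector for the patch covering the
    given pixel coordinates, via the Lucas-Kanade normal equations."""
    sxx = sxy = syy = bx = by = 0.0
    for x in xs:
        for y in ys:
            # central-difference spatial gradients of frame1
            ix = (frame1(x + 1, y) - frame1(x - 1, y)) / 2.0
            iy = (frame1(x, y + 1) - frame1(x, y - 1)) / 2.0
            it = frame2(x, y) - frame1(x, y)  # temporal difference
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy
            bx -= ix * it; by -= iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("ill-conditioned patch (aperture problem)")
    u = (syy * bx - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v

# Synthetic check: I1(x, y) = x^2 + y^2, and I2 is I1 shifted by (dx, dy).
dx, dy = 0.4, -0.3
I1 = lambda x, y: x * x + y * y
I2 = lambda x, y: (x - dx) ** 2 + (y - dy) ** 2
u, v = lk_flow(I1, I2, range(-5, 6), range(-5, 6))
print(round(u, 6), round(v, 6))  # recovers the shift: 0.4 -0.3
```

On this symmetric patch the normal equations recover the shift exactly; on real images the estimate is only a first-order approximation, which is why pyramidal and iterative refinements are used in practice.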
B. Affine Parameter Estimation
The 2-D affine transformation is described as follows:

\begin{pmatrix} X_i \\ Y_i \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} a_5 \\ a_6 \end{pmatrix},   (1)
where (x_i, y_i) are the locations of feature points in the previous frame, and (X_i, Y_i) are the locations of feature points in the current frame.
Theoretically, three pairs of matched feature points are enough to determine the six affine parameters. How these three pairs of feature points are selected affects the precision of the affine parameter estimation. To reduce this estimation error, the parameters can be solved by the least-squares method based on all matched feature points. However, the computation cost of the least-squares method is heavy. To reduce both the computation time and the estimation error, this work uses an algorithm similar to LMedS (Least Median of Squares) [14]. Details are as follows. (i) Randomly select N pairs of matched feature points from the previous frame and the current frame. Further, randomly select M triplets from the N pairs of matched feature points, where M ≤ N.
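The triplet-sampling estimator can be sketched as follows. This is a stdlib-Python illustration of an LMedS-style selection; the values of N and M, the outlier model, and the exact scoring are assumptions for the demo, not the authors' code.

```python
import random
import statistics

def affine_from_triplet(src, dst):
    """Solve the six affine parameters exactly from 3 matched point pairs
    (two independent 3x3 linear systems, via Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-9:
        return None  # degenerate (collinear) triplet
    def solve(t1, t2, t3):  # fit t = a*x + b*y + c
        a = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        b = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        c = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c
    (X1, Y1), (X2, Y2), (X3, Y3) = dst
    return solve(X1, X2, X3) + solve(Y1, Y2, Y3)

def lmeds_affine(pairs, n_triplets=50, rng=random):
    """Keep the triplet whose exact affine fit minimizes the median of the
    squared residuals over all matched pairs (LMedS-style selection)."""
    best, best_med = None, float("inf")
    for _ in range(n_triplets):
        t = rng.sample(pairs, 3)
        params = affine_from_triplet([p[0] for p in t], [p[1] for p in t])
        if params is None:
            continue
        a1, a2, a5, a3, a4, a6 = params
        res = [(a1 * x + a2 * y + a5 - X) ** 2 + (a3 * x + a4 * y + a6 - Y) ** 2
               for (x, y), (X, Y) in pairs]
        med = statistics.median(res)
        if med < best_med:
            best_med, best = med, params
    return best

# Synthetic check: ~10 degree rotation + translation, with 5 corrupted matches.
rng = random.Random(0)
true = (0.98, -0.17, 12.0, 0.17, 0.98, -5.0)  # (a1, a2, a5, a3, a4, a6)
pairs = []
for _ in range(30):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    pairs.append(((x, y),
                  (true[0] * x + true[1] * y + true[2],
                   true[3] * x + true[4] * y + true[5])))
for i in range(5):  # outliers: replace the match with a random location
    pairs[i] = (pairs[i][0], (rng.uniform(0, 100), rng.uniform(0, 100)))
est = lmeds_affine(pairs, rng=rng)
print(max(abs(e - t) for e, t in zip(est, true)) < 1e-6)  # True
```

Because the median is insensitive to a minority of bad correspondences, an all-inlier triplet wins the selection even with several corrupted matches, which is the property the paper relies on.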
III. MOVING TARGET TRACKING

A. Similarity Measure

Let I_x = \{x_i, u_i\}_{i=1}^{N} and I_y = \{y_j, v_j\}_{j=1}^{M} be the sample locations and feature vectors of two targets. Their similarity in the joint feature-spatial space is

J(I_x, I_y) = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} K_h(x_i, y_j)\, G_h(u_i, v_j),   (3)

where K_h and G_h are kernel functions with bandwidth h in the spatial and feature domains, respectively.
J(I_x, I_y) is symmetric and bounded by zero and one. This similarity is based on the average separation criterion in cluster analysis [15], except that it replaces the distance with a kernelized one. This similarity measure has been applied to single-target tracking [16][17].
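The measure can be sketched directly: each target is a set of samples carrying a spatial location and a feature vector, and the similarity averages the product of a spatial kernel and a feature kernel over all cross pairs. A small stdlib-Python sketch with Gaussian kernels; the bandwidths and the 3-component color features are illustrative.

```python
import math

def gauss(d2, h):
    """Gaussian kernel on a squared distance, normalized so the peak is 1."""
    return math.exp(-d2 / (2.0 * h * h))

def similarity(samples_x, samples_y, h_space=10.0, h_feat=0.2):
    """J(I_x, I_y): average over all cross pairs of a spatial kernel K_h
    times a feature kernel G_h; symmetric and bounded by 0 and 1."""
    total = 0.0
    for (px, py, fx) in samples_x:
        for (qx, qy, fy) in samples_y:
            d2_space = (px - qx) ** 2 + (py - qy) ** 2
            d2_feat = sum((a - b) ** 2 for a, b in zip(fx, fy))
            total += gauss(d2_space, h_space) * gauss(d2_feat, h_feat)
    return total / (len(samples_x) * len(samples_y))

# Toy check: a compact "red" target vs. a shifted "blue" one.
red = [(x, y, (1.0, 0.2, 0.2)) for x in range(5) for y in range(5)]
blue = [(x + 30, y, (0.2, 0.2, 1.0)) for x in range(5) for y in range(5)]
j_self = similarity(red, red)
j_cross = similarity(red, blue)
print(j_self > j_cross,
      abs(similarity(red, blue) - similarity(blue, red)) < 1e-12)  # True True
```

Since every term is at most 1, the average stays in [0, 1], and swapping the two sample sets only reorders the same terms, giving the symmetry noted in the text.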
B. Modified Similarity Measure for Multiple Target Tracking
In multiple target tracking, the similarity between the target T_k, represented by the hull H_k in the (t-1)-th frame, and the target T_l in the t-th frame depends not only on the joint feature-spatial space but also on the distance between them. Therefore, the similarity measure in Eq. (3) is modified as follows:
S(T_k^{t-1}, T_l^{t}) = J\big(s^{t-1}(I_x^{t-1,k}),\, I_y^{t,l}\big)\, P\big(s^{t-1}(P_c^{t-1,k}),\, P_c^{t,l}\big),   (4)
where I_x^{t-1,k} and P_c^{t-1,k} are the distribution of target samples inside the hull H_k and the target center in the (t-1)-th frame, I_y^{t,l} and P_c^{t,l} are the distribution of target samples inside H_l and the target center in the t-th frame, respectively, and s^{t-1} is the affine transformation model from the (t-1)-th frame to the t-th frame.
To verify the robustness of this similarity measure, four targets extracted from aerial images as shown in Fig. 3(a), which are a gray truck (GT), red sedan (RS), blue sedan (BS), and gray sedan (GS) from left to right, are employed for similarity testing. These four targets are rotated in the range of 0° to 180°, with a 5° increment in each rotation. The similarity measures between these generated images and the gray truck in Fig. 3(a) are calculated using 500 random sample points from each image. The similarity measures for GT-RS, GT-BS, GT-GS, and GT-GT are shown in Fig. 3(b). The similarity measure variances for GT-RS, GT-BS, GT-GS, and GT-GT matching are 4.34×10^-6, 1.04×10^-5, 1.29×10^-5, and 5.04×10^-5, respectively. These results show that the similarity in Eq. (4) is robust to rotation and scaling. To reduce the computation time, there is no need to use all points inside the target hull; the sample points can be chosen randomly from the samples inside the target hull.
C. Tracking Graph Management
The multiple-target tracker needs to handle all the problems listed in Fig. 2. The algorithms to deal with these problems are as follows.
1) Missing detection prediction: Targets that are under tracking up to the frame right before the current frame may be missed at the current frame because of a detector failure. Missing detections at the i-th frame are estimated from the detection results obtained in the frames prior to the current frame, by applying estimators. According to the position and velocity of the target in previous frames, its new position and velocity in the new frame can be estimated by a Kalman filter, a recursive Bayesian estimator, or a particle filter. In this work, the Kalman filter is employed. From the previous state (x_c^{i-1,k}, y_c^{i-1,k}, v^{i-1,k}, θ^{i-1,k}), the next state (x_c^{i,k}, y_c^{i,k}, v^{i,k}, θ^{i,k}) is estimated, where (x_c^{i-1,k}, y_c^{i-1,k}) is the center of the target T_k (which is missing at the i-th frame) at the (i-1)-th frame, and (v^{i-1,k}, θ^{i-1,k}) are the average velocity and direction over the past frames. Fig. 4(c) shows a missing detection at frame 21, which will be estimated.
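The prediction step can be sketched with a per-axis constant-velocity Kalman filter. This is a minimal stdlib-Python illustration; the decoupled 1D filters, unit time step, and noise settings are assumptions and are simpler than the (x_c, y_c, v, θ) state used in the text.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate: state = (pos, vel)."""
    def __init__(self, pos, q=1e-4, r=1e-2):
        self.x = [pos, 0.0]                # state estimate
        self.p = [[1e3, 0.0], [0.0, 1e3]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self):
        x, p = self.x, self.p
        self.x = [x[0] + x[1], x[1]]       # pos += vel (unit time step)
        p00 = p[0][0] + p[1][0] + p[0][1] + p[1][1] + self.q
        p01 = p[0][1] + p[1][1]
        p10 = p[1][0] + p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.x[0]

    def update(self, z):
        s = self.p[0][0] + self.r          # innovation covariance (position only)
        k0, k1 = self.p[0][0] / s, self.p[1][0] / s
        y = z - self.x[0]
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p = self.p
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]

# A target moving at (+2, -1) px/frame is tracked for 9 frames, then missed:
fx, fy = Kalman1D(0.0), Kalman1D(50.0)
for t in range(1, 10):
    fx.predict(); fx.update(2.0 * t)
    fy.predict(); fy.update(50.0 - 1.0 * t)
px, py = fx.predict(), fy.predict()  # frame 10 has no detection: predict only
print(abs(px - 20.0) < 0.5, abs(py - 40.0) < 0.5)  # True True
```

When the detector misses a frame, the tracker simply runs the predict step without an update, which is exactly the behavior needed to bridge the missing detection at frame 21.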
2) New target detection: New targets usually appear near the four image borders, not in the interior area. If a target is detected and tracked for more than 2 frames, it is considered a new target. Currently, a 20-pixel-wide band along the four borders is cleared to zero, to remove the pixels that are not involved in generating the frame difference. Inside that, a 40-pixel-wide band along the four borders is the area where new targets may emerge. Fig. 4(a) shows 3 newly detected targets at frame 6.
3) False detection filtering: Targets that emerge in the interior area of the image and are not linked to targets in the previous frame or the next frame are false detections. They are filtered out. Fig. 4(b) shows a false detection at frame 9, which will be filtered out.
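The border-band rules above can be condensed into a small classification function. This is a sketch under the band widths stated in the text; the link and track-count bookkeeping is simplified.

```python
def classify_detection(center, frame_size, tracked_frames, linked):
    """Classify a detection per the tracking-graph rules:
    - a 20-pixel border band is zeroed before differencing, so nothing lives there;
    - new targets may only emerge in the 40-pixel band inside the border;
    - interior detections not linked to any existing track are false detections."""
    x, y = center
    w, h = frame_size
    in_emerge_band = not (40 <= x < w - 40 and 40 <= y < h - 40)
    if linked:
        return "tracked"
    if in_emerge_band:
        return "new" if tracked_frames >= 2 else "candidate"
    return "false"

print(classify_detection((30, 120), (640, 480), tracked_frames=2, linked=False))   # new
print(classify_detection((320, 240), (640, 480), tracked_frames=1, linked=False))  # false
print(classify_detection((320, 240), (640, 480), tracked_frames=5, linked=True))   # tracked
```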
Fig. 3 Robustness of similarity measure. (a) Four targets extracted from aerial image (from left to right: gray truck, red sedan, blue sedan, and white sedan); (b) similarity measures between targets in (a).
Fig. 4 Target detection results. (a) Three targets (gray truck, gray sedan, and red sedan) are detected at frame 6; (b) three targets and a false detection (lower red ellipse) at frame 9; (c) missing detection (red sedan) at frame 21.

Fig. 5 Target detection results. (a) Merging detection at frame 162; (b) mask image showing target merging at frame 162; (c) trajectories of six targets from frame 1 to frame 162.
4) Disappearing detection: Targets that are close to the four image borders and are not detected and tracked for 2 frames have disappeared from the monitoring range of the camera.
5) Merge detection: If two or more targets in the previous frame are linked to the same target in the current frame, target merging has occurred. In this case the graph manager will separate them. Fig. 5(a) shows target merging and Fig. 5(b) shows its mask image (the merging detection is marked by the red circle in the middle) for another input image sequence. The principal axis of the mask image for the merging targets is calculated and used as the boundary to separate the merged targets.
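The principal-axis split can be sketched from the second-order central moments of the mask: the orientation is 0.5·atan2(2μ11, μ20 − μ02), and pixels are assigned to either side of the axis through the centroid. A stdlib-Python illustration on a toy mask of two merged bars (one plausible reading of the paper's separation step):

```python
import math

def principal_axis(pixels):
    """Centroid and orientation of a binary mask via second-order central moments."""
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    mu20 = sum((p[0] - cx) ** 2 for p in pixels)
    mu02 = sum((p[1] - cy) ** 2 for p in pixels)
    mu11 = sum((p[0] - cx) * (p[1] - cy) for p in pixels)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta

def split_by_axis(pixels):
    """Split a merged mask into two pixel groups on either side of the
    principal axis through the centroid."""
    (cx, cy), theta = principal_axis(pixels)
    dx, dy = math.cos(theta), math.sin(theta)
    side_a = [p for p in pixels if (p[0] - cx) * dy - (p[1] - cy) * dx >= 0]
    side_b = [p for p in pixels if (p[0] - cx) * dy - (p[1] - cy) * dx < 0]
    return side_a, side_b

# Two side-by-side "vehicles" merged into one mask: two horizontal bars.
mask = [(x, y) for x in range(10) for y in range(3)] \
     + [(x, y) for x in range(10) for y in range(5, 8)]
a, b = split_by_axis(mask)
print(len(a), len(b))  # the axis separates the two bars: 30 30
```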
6) Split detection: If a target in the previous frame is linked to two targets in the current frame, and the split targets keep their tracks for 2 frames, a split has occurred.
The target graph manager maintains the trajectory of each target. Fig. 5(c) shows the target trajectories from frame 1 to frame 162 for the six targets.
IV. TARGET RECOGNITION
As indicated in Fig. 1, the moving target recognition subsystem accepts the tracked targets from the tracker. For each target, it performs matching against the target patterns in the database. The target database stores the target name, the target region represented by its hull, and image data. For the image data of a target, the pixels outside the hull region are cleared to zero (refer to Fig. 3(a)). The similarity measure for target recognition is based on Eq. (3). For a recognized target, this subsystem outputs the target name and updates its model image data. An unknown target is registered to the database.
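The recognize-or-register loop can be sketched as follows. Note this uses a plain histogram-intersection similarity as a stand-in for Eq. (3), and the threshold, naming scheme, and running-average model update are illustrative assumptions.

```python
def hist_intersection(h1, h2):
    """Similarity of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def recognize(db, target_hist, threshold=0.7, alpha=0.1):
    """Match a tracked target against the database; register it when unknown,
    otherwise blend it into the stored model (running average)."""
    best_name, best_sim = None, 0.0
    for name, model in db.items():
        s = hist_intersection(model, target_hist)
        if s > best_sim:
            best_name, best_sim = name, s
    if best_sim < threshold:
        name = "target_%d" % len(db)  # unknown: register under a new name
        db[name] = list(target_hist)
        return name, False
    db[best_name] = [(1 - alpha) * m + alpha * t
                     for m, t in zip(db[best_name], target_hist)]
    return best_name, True

db = {"gray_truck": [0.7, 0.2, 0.1], "red_sedan": [0.1, 0.8, 0.1]}
r1 = recognize(db, [0.65, 0.25, 0.10])  # close to the gray truck: recognized
r2 = recognize(db, [0.10, 0.10, 0.80])  # unlike anything stored: registered
print(r1, r2, sorted(db))
```

The model update keeps the stored appearance current as lighting and viewpoint change, which is the role the text assigns to "its model image data is updated".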
V. EXPERIMENT RESULTS
The above algorithm is implemented using MS Visual C++ 6.0 and Intel OpenCV, running on the Windows platform. The parameter used in missing detection is set at 3, and the parameter for new target detection and disappearing target detection is set at 5. The calculation of the modified similarity measure employs 500 randomly selected pixels inside the target hull, and the HSV feature is used in Eq. (4). The test video sequences are from the AFRL Vivid database. Fig. 6 shows some target detection and tracking results. The first column from the left shows the detected and tracked targets (shown by global number and circled by green ellipses) up to frame 48. The second column shows that tracking is lost because of the dynamic observer movement (red ellipses show the detected targets). The third column shows the five targets under tracking and a newly detected target (shown by the yellow ellipse). The fourth column shows target merging, which is split into two targets. Fig. 7 shows some target tracking and
Frame 48 Frame 108 Frame 198 Frame 342
Fig. 6 Target detection and tracking results at frame 48, 108, 198, and 342, respectively.
Frame 30 Frame 144 Frame 244 Frame 636
Fig. 7 Target tracking and recognition results at frames 30, 144, 244, and 636, respectively.
recognition results. From left to right: (i) blue sedan; (ii) gray pick-up truck; (iii) white sedan and gray pick-up truck; and (iv) white sedan and gray pick-up truck. In (iii), the gray pick-up truck is wrongly recognized as a blue sedan because it is partially hidden by trees, and in (iv) the white sedan and gray pick-up truck are both wrongly recognized as blue sedans because they are both partially hidden by trees. The average execution times for target detection, tracking, and recognition on a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB of memory are shown in Table 1.
TABLE 1 AVERAGE PROCESSING TIME FOR TARGET DETECTION, TRACKING AND RECOGNITION
Processing task Time (ms)
Target detection 316.1
Target tracking and recognition 363.7
VI. CONCLUSIONS
This paper proposed an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. The moving observer is a manned/unmanned aerial vehicle mounted with a camera. The proposed algorithm first estimates the motion model between two consecutive image frames, which is used to remove the moving background. Then it employs a similarity measure for target tracking based on the joint feature-spatial space, which combines the HSV feature and geometry information. The similarity calculation employs 500 randomly selected pixels. On a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB of memory, the average processing time is 680 ms per frame, which leads to a processing rate of 1.5 frames/s. The experimental results show that the proposed algorithm is efficient and fast.
ACKNOWLEDGEMENT

This work was partially supported by a grant from AFRL under the Minority Leaders Program, contract No. TENN 06-S567-07-C2. The authors would also like to thank AFRL for providing the datasets used in this research.
REFERENCES
[1] M. Isard and A. Blake, "CONDENSATION - Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[2] K. Nummiaro, E. Koller-Meier, and L. Van Gool, "An Adaptive Color-based Particle Filter," Image and Vision Computing, vol. 21, pp. 99-110, 2002.
[3] D. Tweed and A. Calway, "Tracking Many Objects Using Subordinated Condensation," in 13th British Machine Vision Conference (BMVC 2002), 2002.
[4] M. Ross, "Model-free, Statistical Detection and Tracking of Moving Objects," in 13th International Conference on Image Processing (ICIP 2006), Atlanta, GA, Oct. 8-11, 2006.
[5] Y. L. Tian and A. Hampapur, "Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance," IEEE Computer Society Workshop on Motion and Video Computing, Breckenridge, Colorado, Jan. 5-6, 2005.
[6] J. Kang, I. Cohen, G. Medioni, and C. Yuan, "Detection and Tracking of Moving Objects from a Moving Platform in Presence of Strong Parallax," IEEE International Conference on Computer Vision (ICCV), Beijing, China, Oct. 2005.
[7] S. M. Smith and J. M. Brady, "ASSET-2: Real-Time Motion Segmentation and Shape Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, Aug. 1995.
[8] A. Ollero, J. Ferruz, et al., "Motion Compensation and Object Detection for Autonomous Helicopter Visual Navigation in the COMETS System," in Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, April 26 - May 1, 2004.
[9] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, June 20-25, 2005, pp. 176-183.
[10] M. Han, A. Sethi, and Y. Gong, "A Detection-based Multiple Object Tracking Method," in Proceedings of the 2004 IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 24-27, 2004.
[11] A. Chia, W. Huang, and L. Li, "Multiple Objects Tracking with Multiple Hypotheses Graph Representation," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), Hong Kong, August 20-24, 2006.
[12] W. Qu, D. Schonfeld, and M. Mohamed, "Distributed Bayesian Multiple-Target Tracking in Crowded Environments Using Multiple Collaborative Cameras," EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 38373.
[13] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," in 7th International Joint Conference on Artificial Intelligence, 1981, pp. 674-679.
[14] S. Araki, T. Matsuoka, et al., "Real-time Tracking of Multiple Moving Object Contours in a Moving Camera Image Sequence," IEICE Transactions on Information and Systems, vol. E83-D, no. 7, July 2000.
[15] A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, UK, 2002.
[16] A. Elgammal, R. Duraiswami, and L. S. Davis, "Probabilistic Tracking in Joint Feature-Spatial Spaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Wisconsin, USA, June 16-22, 2003.
[17] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 20-25, 2005.