Multiple Moving Target Detection, Tracking, and Recognition from a Moving Observer
Fenghui Yao, Ali Sekmen, and Mohan J. Malkani
Department of Computer Science / Department of Electrical and Computer Engineering
Tennessee State University, 3500 John A Merritt Blvd, Nashville, TN 37215, USA
{fyao, asekmen, mmalkani}@tnstate.edu
Abstract - This paper describes an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. When the camera is placed on a moving observer, the whole background of the scene appears to be moving, and the actual motion of the targets must be distinguished from the background motion. To do this, an affine motion model between consecutive frames is estimated, and then moving targets can be extracted. Next, the target tracking employs a similarity measure based on the joint feature-spatial space. Finally, the target recognition is performed by matching moving targets against a target database. The average processing time is 680 ms per frame, which corresponds to a processing rate of 1.5 frames per second. The algorithm was tested on the Vivid datasets provided by the Air Force Research Laboratory, and experimental results show that this method is efficient and fast for real-time application.
I. INTRODUCTION
Detection and tracking of moving objects in an image
sequence is one of the basic tasks in computer vision. The
detected moving object trajectory can either be of interest in its
own right or used as the input for a higher-level analysis such as
motion pattern understanding, moving behavior recognition
and so on. Applications include surveillance, homeland
security, protection of vital infrastructure, and advanced
human-machine communication. Therefore, moving object
detection and tracking has received more and more attention,
and many algorithms have been proposed. Among these, one
interesting approach is the particle filter [1], which has been
used and extended many times [2][3][4]. The particle filter was
developed to track objects in clutter, where the posterior
density and observation density are often non-Gaussian. The
key idea of particle filtering is to approximate the probability
distribution by a weighted sample set. Each sample consists of
an element which represents the hypothetical state of an object
and a corresponding probability. The state of an object may be
control points of a contour [1], the position, shape, and motion of an elliptical region [2], or specific model parameters [3]. That is, these methods [2][3] are model-based. Ross's approach [4] is a model-free, statistical detection method which uses both edge and color information. The common assumption of these methods [1][2][3] is that the background
does not move, and the image sequences are from a stationary
camera. Tian et al. [5] developed a real-time algorithm to detect salient motion in complex environments by combining temporal difference imaging and temporal filtered optical flow. The image sequence used in this method is also from a stationary camera.
The works of Smith and Brady [7] and Kang et al. [6] employed image sequences from moving platforms. Kang et al. developed an approach for tracking moving objects observed by both stationary and Pan-Tilt-Zoom cameras. Smith and Brady's approach employed the image sequence from a camera mounted on a vehicle to detect other moving vehicles. This method used special-purpose hardware to implement real-time target detection and tracking. The COMETS system detects targets from a moving observer (an autonomous helicopter) but does not perform tracking [8]. Yang et al.'s tracker works for image sequences from both stationary and moving platforms, but it detects and tracks a single target [9]. Literature [10] proposes a detection-based multiple object tracking method, literature [11] shows a multiple object tracking method based on a multiple hypotheses graph representation, and literature [12] demonstrates a distributed Bayesian multiple target tracker. However, they all employ image sequences from stationary observers. As shown above, there are few works that discuss multiple moving target detection and tracking from a moving observer, and few works deal with target recognition at the same time. This paper introduces a method for moving target detection, tracking, and recognition from a moving observer.
II. MOVING TARGET DETECTION FROM A MOVING OBSERVER
The entire configuration is shown in Fig. 1. The output of
the moving target detection is sent to the target tracking. The
tracked targets are sent to target recognition. This section describes moving target detection; target tracking and target recognition are discussed in Sections III and IV, respectively.
The moving observer usually means a camera mounted on a ground vehicle or on an airborne platform such as a helicopter or an unmanned aerial vehicle (UAV). In this work, the video sequences are generated by an airborne camera. In airborne video, everything (target and background) appears to be moving over time due to the camera motion. Before employing frame differencing (a simple motion detection method for stationary platforms) to detect motion images, it is necessary to conduct motion compensation first. Two-frame background motion
978-1-4244-2184-8/08/$25.00 © 2008 IEEE.
Proceedings of the 2008 IEEE
International Conference on Information and Automation
June 20 -23, 2008, Zhangjiajie, China
estimation is achieved by fitting a global parametric motion
model (affine or projective) to sparse optic flow. Here, we use the affine transformation model.
A. Optic Flow Detection
Sparse optic flow is obtained by applying the Lucas-Kanade algorithm [13]. The number of optic flow vectors is controlled in the range of 200 to 1000. Other methods, such as matching Harris corners, Moravec features, or SUSAN corners between frames, or matching SIFT features, are all applicable here. The main factors to be considered are computation cost and robustness. Experimental results show that the Lucas-Kanade method is the most reliable and quite fast.
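Per feature point, the Lucas-Kanade step solves a 2x2 linear system built from the spatial gradients and the temporal difference inside a small window. The following is a minimal stdlib-Python sketch of that single-window step on a synthetic image pair; the quadratic test image and window size are illustrative, and the paper's actual implementation uses OpenCV.

```python
def lk_flow(frame1, frame2, xs, ys):
    """Estimate one (u, v) optic-flow vector for the patch covering the
    given pixel coordinates, via the Lucas-Kanade normal equations."""
    sxx = sxy = syy = bx = by = 0.0
    for x in xs:
        for y in ys:
            # central-difference spatial gradients of frame1
            ix = (frame1(x + 1, y) - frame1(x - 1, y)) / 2.0
            iy = (frame1(x, y + 1) - frame1(x, y - 1)) / 2.0
            it = frame2(x, y) - frame1(x, y)  # temporal difference
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy
            bx -= ix * it; by -= iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("ill-conditioned patch (aperture problem)")
    u = (syy * bx - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v

# Synthetic check: I1(x, y) = x^2 + y^2, and I2 is I1 shifted by (dx, dy).
dx, dy = 0.4, -0.3
I1 = lambda x, y: x * x + y * y
I2 = lambda x, y: (x - dx) ** 2 + (y - dy) ** 2
u, v = lk_flow(I1, I2, range(-5, 6), range(-5, 6))
print(round(u, 6), round(v, 6))  # recovers the shift: 0.4 -0.3
```

On this symmetric patch the normal equations recover the shift exactly; on real images the estimate is only a first-order approximation, which is why pyramidal and iterative refinements are used in practice.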
B. Affine Parameter Estimation
The 2-D affine transformation is described as follows:

\begin{pmatrix} X_i \\ Y_i \end{pmatrix} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix} + \begin{pmatrix} a_5 \\ a_6 \end{pmatrix},   (1)
where (x_i, y_i) are the locations of feature points in the previous frame, and (X_i, Y_i) are the locations of feature points in the current frame.
Theoretically, three pairs of matched feature points are enough to determine the six affine parameters. How these three pairs of feature points are selected affects the precision of the affine parameter estimation. To reduce this estimation error, the parameters can be solved by the least-squares method based on all matched feature points. However, the computation cost of the least-squares method is heavy. To reduce both the computation time and the estimation error, this work uses an algorithm similar to LMedS (Least Median of Squares) [14]. Details are as follows. (i) Randomly select N pairs of matched feature points from the previous frame and the current frame. Further, randomly select M triplets from the N pairs of matched feature points, where M ≤ N.
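The triplet-sampling estimator can be sketched as follows. This is a stdlib-Python illustration of an LMedS-style selection; the values of N and M, the outlier model, and the exact scoring are assumptions for the demo, not the authors' code.

```python
import random
import statistics

def affine_from_triplet(src, dst):
    """Solve the six affine parameters exactly from 3 matched point pairs
    (two independent 3x3 linear systems, via Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-9:
        return None  # degenerate (collinear) triplet
    def solve(t1, t2, t3):  # fit t = a*x + b*y + c
        a = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        b = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        c = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c
    (X1, Y1), (X2, Y2), (X3, Y3) = dst
    return solve(X1, X2, X3) + solve(Y1, Y2, Y3)

def lmeds_affine(pairs, n_triplets=50, rng=random):
    """Keep the triplet whose exact affine fit minimizes the median of the
    squared residuals over all matched pairs (LMedS-style selection)."""
    best, best_med = None, float("inf")
    for _ in range(n_triplets):
        t = rng.sample(pairs, 3)
        params = affine_from_triplet([p[0] for p in t], [p[1] for p in t])
        if params is None:
            continue
        a1, a2, a5, a3, a4, a6 = params
        res = [(a1 * x + a2 * y + a5 - X) ** 2 + (a3 * x + a4 * y + a6 - Y) ** 2
               for (x, y), (X, Y) in pairs]
        med = statistics.median(res)
        if med < best_med:
            best_med, best = med, params
    return best

# Synthetic check: ~10 degree rotation + translation, with 5 corrupted matches.
rng = random.Random(0)
true = (0.98, -0.17, 12.0, 0.17, 0.98, -5.0)  # (a1, a2, a5, a3, a4, a6)
pairs = []
for _ in range(30):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    pairs.append(((x, y),
                  (true[0] * x + true[1] * y + true[2],
                   true[3] * x + true[4] * y + true[5])))
for i in range(5):  # outliers: replace the match with a random location
    pairs[i] = (pairs[i][0], (rng.uniform(0, 100), rng.uniform(0, 100)))
est = lmeds_affine(pairs, rng=rng)
print(max(abs(e - t) for e, t in zip(est, true)) < 1e-6)  # True
```

Because the median is insensitive to a minority of bad correspondences, an all-inlier triplet wins the selection even with several corrupted matches, which is the property the paper relies on.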
III. MOVING TARGET TRACKING

A. Similarity Measure

Let I_x = \{x_i, u_i\}_{i=1}^{N} and I_y = \{y_j, v_j\}_{j=1}^{M} be the sample locations and feature vectors of two targets. Their similarity in the joint feature-spatial space is

J(I_x, I_y) = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} K_h(x_i, y_j)\, G_h(u_i, v_j),   (3)

where K_h and G_h are kernel functions with bandwidth h in the spatial and feature domains, respectively.
J(I_x, I_y) is symmetric and bounded by zero and one. This similarity is based on the average separation criterion in cluster analysis [15], except that it replaces the distance with a kernelized one. This similarity measure has been applied to single-target tracking [16][17].
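The measure can be sketched directly: each target is a set of samples carrying a spatial location and a feature vector, and the similarity averages the product of a spatial kernel and a feature kernel over all cross pairs. A small stdlib-Python sketch with Gaussian kernels; the bandwidths and the 3-component color features are illustrative.

```python
import math

def gauss(d2, h):
    """Gaussian kernel on a squared distance, normalized so the peak is 1."""
    return math.exp(-d2 / (2.0 * h * h))

def similarity(samples_x, samples_y, h_space=10.0, h_feat=0.2):
    """J(I_x, I_y): average over all cross pairs of a spatial kernel K_h
    times a feature kernel G_h; symmetric and bounded by 0 and 1."""
    total = 0.0
    for (px, py, fx) in samples_x:
        for (qx, qy, fy) in samples_y:
            d2_space = (px - qx) ** 2 + (py - qy) ** 2
            d2_feat = sum((a - b) ** 2 for a, b in zip(fx, fy))
            total += gauss(d2_space, h_space) * gauss(d2_feat, h_feat)
    return total / (len(samples_x) * len(samples_y))

# Toy check: a compact "red" target vs. a shifted "blue" one.
red = [(x, y, (1.0, 0.2, 0.2)) for x in range(5) for y in range(5)]
blue = [(x + 30, y, (0.2, 0.2, 1.0)) for x in range(5) for y in range(5)]
j_self = similarity(red, red)
j_cross = similarity(red, blue)
print(j_self > j_cross,
      abs(similarity(red, blue) - similarity(blue, red)) < 1e-12)  # True True
```

Since every term is at most 1, the average stays in [0, 1], and swapping the two sample sets only reorders the same terms, giving the symmetry noted in the text.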
B. Modified Similarity Measure for Multiple Target Tracking
In multiple target tracking, the similarity between the target T_k, represented by the hull H_k in the (t-1)-th frame, and the target T_l in the t-th frame depends not only on the joint feature-spatial space but also on the distance between them. Therefore, the similarity measure in Eq. (3) is modified as follows:
S(T_k^{t-1}, T_l^{t}) = J\big(s^{t-1}(I_x^{t-1,k}),\, I_y^{t,l}\big)\, P\big(s^{t-1}(P_c^{t-1,k}),\, P_c^{t,l}\big),   (4)
where I_x^{t-1,k} and P_c^{t-1,k} are the distribution of target samples inside the hull H_k and the target center in the (t-1)-th frame, I_y^{t,l} and P_c^{t,l} are the distribution of target samples inside H_l and the target center in the t-th frame, respectively, and s^{t-1} is the affine transformation model from the (t-1)-th frame to the t-th frame.
To verify the robustness of this similarity measure, four targets extracted from aerial images as shown in Fig. 3(a), which are a gray truck (GT), red sedan (RS), blue sedan (BS), and gray sedan (GS) from left to right, are employed for similarity testing. These four targets are rotated in the range of 0° to 180°, with a 5° increment in each rotation. The similarity measures between these generated images and the gray truck in Fig. 3(a) are calculated using 500 random sample points from each image. The similarity measures for GT-RS, GT-BS, GT-GS, and GT-GT are shown in Fig. 3(b). The similarity measure variances for GT-RS, GT-BS, GT-GS, and GT-GT matching are 4.34×10^-6, 1.04×10^-5, 1.29×10^-5, and 5.04×10^-5, respectively. These results show that the similarity in Eq. (4) is robust to rotation and scaling. To reduce the computation time, there is no need to use all points inside the target hull; the sample points can be chosen randomly from the samples inside the target hull.
C. Tracking Graph Management
The multiple-target tracker needs to handle all the problems listed in Fig. 2. The algorithms to deal with these problems are as follows.
1) Missing detection prediction: Targets that are under tracking up to the frame right before the current frame may be missed at the current frame because of a detector failure. Missing detections at the i-th frame are estimated from the detection results obtained in the frames prior to the current frame, by applying estimators. According to the position and velocity of the target in previous frames, its new position and velocity in the new frame can be estimated by a Kalman filter, a recursive Bayesian estimator, or a particle filter. In this work, the Kalman filter is employed. From the previous state (x_c^{i-1,k}, y_c^{i-1,k}, v^{i-1,k}, θ^{i-1,k}), the next state (x_c^{i,k}, y_c^{i,k}, v^{i,k}, θ^{i,k}) is estimated, where (x_c^{i-1,k}, y_c^{i-1,k}) is the center of the target T_k (which is missing at the i-th frame) at the (i-1)-th frame, and (v^{i-1,k}, θ^{i-1,k}) are the average velocity and direction over the past frames. Fig. 4(c) shows a missing detection at frame 21, which will be estimated.
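The prediction step can be sketched with a per-axis constant-velocity Kalman filter. This is a minimal stdlib-Python illustration; the decoupled 1D filters, unit time step, and noise settings are assumptions and are simpler than the (x_c, y_c, v, θ) state used in the text.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate: state = (pos, vel)."""
    def __init__(self, pos, q=1e-4, r=1e-2):
        self.x = [pos, 0.0]                # state estimate
        self.p = [[1e3, 0.0], [0.0, 1e3]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise

    def predict(self):
        x, p = self.x, self.p
        self.x = [x[0] + x[1], x[1]]       # pos += vel (unit time step)
        p00 = p[0][0] + p[1][0] + p[0][1] + p[1][1] + self.q
        p01 = p[0][1] + p[1][1]
        p10 = p[1][0] + p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.x[0]

    def update(self, z):
        s = self.p[0][0] + self.r          # innovation covariance (position only)
        k0, k1 = self.p[0][0] / s, self.p[1][0] / s
        y = z - self.x[0]
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p = self.p
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]

# A target moving at (+2, -1) px/frame is tracked for 9 frames, then missed:
fx, fy = Kalman1D(0.0), Kalman1D(50.0)
for t in range(1, 10):
    fx.predict(); fx.update(2.0 * t)
    fy.predict(); fy.update(50.0 - 1.0 * t)
px, py = fx.predict(), fy.predict()  # frame 10 has no detection: predict only
print(abs(px - 20.0) < 0.5, abs(py - 40.0) < 0.5)  # True True
```

When the detector misses a frame, the tracker simply runs the predict step without an update, which is exactly the behavior needed to bridge the missing detection at frame 21.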
2) New target detection: New targets usually appear near the four image borders, not in the interior area. If a target is detected and tracked for more than 2 frames, it is considered a new target. Currently, a 20-pixel-wide band along the four borders is cleared to zero, to remove the pixels that are not involved in generating the frame difference. Inside that, a 40-pixel-wide band along the four borders is the area where new targets may emerge. Fig. 4(a) shows 3 newly detected targets at frame 6.
3) False detection filtering: Targets that emerge in the interior area of the image and are not linked to targets in the previous frame or the next frame are false detections. They are filtered out. Fig. 4(b) shows a false detection at frame 9, which will be filtered out.
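The border-band rules above can be condensed into a small classification function. This is a sketch under the band widths stated in the text; the link and track-count bookkeeping is simplified.

```python
def classify_detection(center, frame_size, tracked_frames, linked):
    """Classify a detection per the tracking-graph rules:
    - a 20-pixel border band is zeroed before differencing, so nothing lives there;
    - new targets may only emerge in the 40-pixel band inside the border;
    - interior detections not linked to any existing track are false detections."""
    x, y = center
    w, h = frame_size
    in_emerge_band = not (40 <= x < w - 40 and 40 <= y < h - 40)
    if linked:
        return "tracked"
    if in_emerge_band:
        return "new" if tracked_frames >= 2 else "candidate"
    return "false"

print(classify_detection((30, 120), (640, 480), tracked_frames=2, linked=False))   # new
print(classify_detection((320, 240), (640, 480), tracked_frames=1, linked=False))  # false
print(classify_detection((320, 240), (640, 480), tracked_frames=5, linked=True))   # tracked
```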
Fig. 3 Robustness of similarity measure. (a) Four targets extracted from aerial image (from left to right: gray truck, red sedan, blue sedan, and white sedan); (b) similarity measures between targets in (a).
Fig. 4 Target detection results. (a) Three targets (gray truck, gray sedan, and red sedan) are detected at frame 6; (b) three targets and a false detection (lower red ellipse) at frame 9; (c) missing detection (red sedan) at frame 21.

Fig. 5 Target detection results. (a) Merging detection at frame 162; (b) mask image showing target merging at frame 162; (c) trajectories of six targets from frame 1 to frame 162.
4) Disappearing detection: Targets that are close to the four image borders and are not detected and tracked for 2 frames have disappeared from the monitoring range of the camera.
5) Merge detection: If two or more targets in the previous frame are linked to the same target in the current frame, target merging has occurred. In this case the graph manager will separate them. Fig. 5(a) shows target merging and Fig. 5(b) shows its mask image (the merging detection is marked by the red circle in the middle) for another input image sequence. The principal axis of the mask image for the merging targets is calculated and used as the boundary to separate the merged targets.
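The principal-axis split can be sketched from the second-order central moments of the mask: the orientation is 0.5·atan2(2μ11, μ20 − μ02), and pixels are assigned to either side of the axis through the centroid. A stdlib-Python illustration on a toy mask of two merged bars (one plausible reading of the paper's separation step):

```python
import math

def principal_axis(pixels):
    """Centroid and orientation of a binary mask via second-order central moments."""
    n = len(pixels)
    cx = sum(p[0] for p in pixels) / n
    cy = sum(p[1] for p in pixels) / n
    mu20 = sum((p[0] - cx) ** 2 for p in pixels)
    mu02 = sum((p[1] - cy) ** 2 for p in pixels)
    mu11 = sum((p[0] - cx) * (p[1] - cy) for p in pixels)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta

def split_by_axis(pixels):
    """Split a merged mask into two pixel groups on either side of the
    principal axis through the centroid."""
    (cx, cy), theta = principal_axis(pixels)
    dx, dy = math.cos(theta), math.sin(theta)
    side_a = [p for p in pixels if (p[0] - cx) * dy - (p[1] - cy) * dx >= 0]
    side_b = [p for p in pixels if (p[0] - cx) * dy - (p[1] - cy) * dx < 0]
    return side_a, side_b

# Two side-by-side "vehicles" merged into one mask: two horizontal bars.
mask = [(x, y) for x in range(10) for y in range(3)] \
     + [(x, y) for x in range(10) for y in range(5, 8)]
a, b = split_by_axis(mask)
print(len(a), len(b))  # the axis separates the two bars: 30 30
```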
6) Split detection: If a target in the previous frame is linked to two targets in the current frame, and the split targets keep their tracks for 2 frames, a split has occurred.
The target graph manager maintains the trajectory of each target. Fig. 5(c) shows the target trajectories from frame 1 to frame 162 for the six targets.
IV. TARGET RECOGNITION
As indicated in Fig. 1, the moving target recognition subsystem accepts the tracked targets from the tracker. For each target, it performs matching against the target patterns in the database. The target database stores the target name, the target region represented by its hull, and image data. For the image data of a target, the pixels outside the hull region are cleared to zero (refer to Fig. 3(a)). The similarity measure for target recognition is based on Eq. (3). For a recognized target, this subsystem outputs the target name and updates its model image data. An unknown target is registered to the database.
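The recognize-or-register loop can be sketched as follows. Note this uses a plain histogram-intersection similarity as a stand-in for Eq. (3), and the threshold, naming scheme, and running-average model update are illustrative assumptions.

```python
def hist_intersection(h1, h2):
    """Similarity of two normalized histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def recognize(db, target_hist, threshold=0.7, alpha=0.1):
    """Match a tracked target against the database; register it when unknown,
    otherwise blend it into the stored model (running average)."""
    best_name, best_sim = None, 0.0
    for name, model in db.items():
        s = hist_intersection(model, target_hist)
        if s > best_sim:
            best_name, best_sim = name, s
    if best_sim < threshold:
        name = "target_%d" % len(db)  # unknown: register under a new name
        db[name] = list(target_hist)
        return name, False
    db[best_name] = [(1 - alpha) * m + alpha * t
                     for m, t in zip(db[best_name], target_hist)]
    return best_name, True

db = {"gray_truck": [0.7, 0.2, 0.1], "red_sedan": [0.1, 0.8, 0.1]}
r1 = recognize(db, [0.65, 0.25, 0.10])  # close to the gray truck: recognized
r2 = recognize(db, [0.10, 0.10, 0.80])  # unlike anything stored: registered
print(r1, r2, sorted(db))
```

The model update keeps the stored appearance current as lighting and viewpoint change, which is the role the text assigns to "its model image data is updated".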
V. EXPERIMENT RESULTS
The above algorithm is implemented using MS Visual C++ 6.0 and Intel OpenCV, running on the Windows platform. The parameter used in missing detection is set at 3, and the parameter for new target detection and disappearing target detection is set at 5. The calculation of the modified similarity measure employs 500 randomly selected pixels inside the target hull, and the HSV feature is used in Eq. (4). The test video sequences are from the AFRL Vivid database. Fig. 6 shows some target detection and tracking results. The first column from the left shows the detected and tracked targets (shown by global number and circled by green ellipses) up to frame 48. The second column shows that tracking is lost because of the dynamic observer movement (red ellipses show the detected targets). The third column shows the five targets under tracking and a newly detected target (shown by the yellow ellipse). The fourth column shows target merging, which is split into two targets. Fig. 7 shows some target tracking and
Frame 48 Frame 108 Frame 198 Frame 342
Fig. 6 Target detection and tracking results at frame 48, 108, 198, and 342, respectively.
Frame 30 Frame 144 Frame 244 Frame 636
Fig. 7 Target tracking and recognition results at frames 30, 144, 244, and 636, respectively.
recognition results. From left to right: (i) blue sedan; (ii) gray pick-up truck; (iii) white sedan and gray pick-up truck; and (iv) white sedan and gray pick-up truck. In (iii), the gray pick-up truck is wrongly recognized as a blue sedan because it is partially hidden by trees, and in (iv) the white sedan and gray pick-up truck are both wrongly recognized as blue sedans because they are both partially hidden by trees. The average execution times for target detection, tracking, and recognition on a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB of memory are shown in Table 1.
TABLE 1 AVERAGE PROCESSING TIME FOR TARGET DETECTION, TRACKING AND RECOGNITION
Processing task Time (ms)
Target detection 316.1
Target tracking and recognition 363.7
VI. CONCLUSIONS
This paper proposed an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. The moving observer is a manned/unmanned aerial vehicle mounted with a camera. The proposed algorithm first estimates the motion model between two consecutive image frames, which is used to remove the moving background. Then it employs a similarity measure for target tracking based on the joint feature-spatial space, which combines the HSV feature and geometry information. The similarity calculation employs 500 randomly selected pixels. On a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB of memory, the average processing time is 680 ms per frame, which leads to a processing rate of 1.5 frames/s. The experimental results show that the proposed algorithm is efficient and fast.
ACKNOWLEDGEMENT

This work was partially supported by a grant from AFRL under the Minority Leaders Program, contract No. TENN 06-S567-07-C2. The authors would also like to thank AFRL for providing the datasets used in this research.
REFERENCES
[1] M. Isard and A. Blake, "CONDENSATION - Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[2] K. Nummiaro, E. Koller-Meier, and L. Van Gool, "An Adaptive Color-based Particle Filter," Image and Vision Computing, vol. 21, pp. 99-110, 2002.
[3] D. Tweed and A. Calway, "Tracking Many Objects Using Subordinated Condensation," in 13th British Machine Vision Conference (BMVC 2002), 2002.
[4] M. Ross, "Model-free, Statistical Detection and Tracking of Moving Objects," in 13th International Conference on Image Processing (ICIP 2006), Atlanta, GA, Oct. 8-11, 2006.
[5] Y. L. Tian and A. Hampapur, "Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance," IEEE Computer Society Workshop on Motion and Video Computing, Breckenridge, Colorado, Jan. 5-6, 2005.
[6] J. Kang, I. Cohen, G. Medioni, and C. Yuan, "Detection and Tracking of Moving Objects from a Moving Platform in Presence of Strong Parallax," IEEE International Conference on Computer Vision (ICCV), Beijing, China, Oct. 2005.
[7] S. M. Smith and J. M. Brady, "ASSET-2: Real-Time Motion Segmentation and Shape Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, Aug. 1995.
[8] A. Ollero, J. Ferruz, et al., "Motion Compensation and Object Detection for Autonomous Helicopter Visual Navigation in the COMETS System," in Proceedings of the IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, April 26 - May 1, 2004.
[9] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, June 20-25, 2005, pp. 176-183.
[10] M. Han, A. Sethi, and Y. Gong, "A Detection-based Multiple Object Tracking Method," in Proceedings of the 2004 IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 24-27, 2004.
[11] A. Chia, W. Huang, and L. Li, "Multiple Objects Tracking with Multiple Hypotheses Graph Representation," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), Hong Kong, August 20-24, 2006.
[12] W. Qu, D. Schonfeld, and M. Mohamed, "Distributed Bayesian Multiple-Target Tracking in Crowded Environments Using Multiple Collaborative Cameras," EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 38373.
[13] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," in 7th International Joint Conference on Artificial Intelligence, 1981, pp. 674-679.
[14] S. Araki, T. Matsuoka, et al., "Real-time Tracking of Multiple Moving Object Contours in a Moving Camera Image Sequence," IEICE Transactions on Information and Systems, vol. E83-D, no. 7, July 2000.
[15] A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, UK, 2002.
[16] A. Elgammal, R. Duraiswami, and L. S. Davis, "Probabilistic Tracking in Joint Feature-Spatial Spaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Wisconsin, USA, June 16-22, 2003.
[17] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 20-25, 2005.