Proceedings of the 2009 IEEE Student Conference on Research and Development (SCOReD 2009), 16-18 Nov. 2009, UPM Serdang, Malaysia
Study and Implementation of Color-based Object Tracking in Monocular Image Sequences
Muhammad Owais Mehmood
Undergraduate Student, Department of Electronic Engineering NED University of Engineering & Technology, Karachi, Pakistan
Abstract— Kernel tracking of density-based appearance models is implemented in this paper for real-time object tracking applications. First, a region of interest (ROI) is selected in real time to create a model. Then the matching and locating of the searched object is achieved using the mean-shift algorithm. Experimental results show that this method can perform object tracking with adaptation to scale and translation, robustness to noise, and handling of occlusion.
Keywords- Machine vision, Image matching, Image motion analysis, Image Processing, Pattern Recognition
I. INTRODUCTION

Object tracking is the assignment of a consistent label to an object across different frames of a video. It is a challenging, yet important, task in computer vision and finds applications in action recognition, multimedia indexing, automated surveillance, human-machine interfaces, and vehicular guidance systems.
Numerous approaches have been proposed for object tracking, including point tracking, kernel tracking, and silhouette tracking [1]. This paper focuses on the implementation of kernel tracking of density-based appearance models. Kernel tracking computes the motion of an object, represented as a primitive region, across a sequence of frames. To implement color-density-based tracking using color histograms, the mean-shift algorithm is used.
The implementation uses approaches from [2] and [3]. The result is a fast and lightweight object tracking solution that is robust to noise and variations in size. Experiments were performed on real-time data obtained from live camera feeds. Different objects and different background settings were used to demonstrate the robustness of the system.
II. MEAN-SHIFT ALGORITHM

Mean-shift is a tool for finding modes in a set of data samples that manifest an underlying probability density function (PDF). In other words, given a PDF, which in this case is a normalized histogram, the mean-shift algorithm climbs toward the peak of the PDF.
From the object to be tracked, a normalized PDF, in this case a normalized color histogram, is created. As shown in Figure 1, an object with a similar color histogram is searched for in the vicinity of the object's location in the previous frame. This histogram matching can be done with the Bhattacharyya coefficient [2]. If the histogram match is close enough, the system assumes it is the same object; otherwise the process is repeated over and over again, which results in tracking.
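The Bhattacharyya coefficient between two normalized histograms p and q is the sum over all bins of sqrt(p_u * q_u); it equals 1 for identical distributions and shrinks as they diverge. A minimal NumPy sketch (the histograms below are illustrative placeholders, not the paper's data):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms.

    Returns 1.0 for identical distributions and smaller values
    for poorer matches.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

# Example: a target histogram and two candidates (each sums to 1).
target    = np.array([0.5, 0.3, 0.2])
good_cand = np.array([0.45, 0.35, 0.2])
bad_cand  = np.array([0.1, 0.1, 0.8])

print(bhattacharyya(target, good_cand))  # close to 1.0
print(bhattacharyya(target, bad_cand))   # noticeably smaller
```

A tracker would accept the candidate whose coefficient exceeds some threshold, exactly the "close enough" test described above.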
Figure 1. Overview of the process: choose a reference model; represent the model by a PDF in color feature space; identify the model in the present frame; search for the model in the next consecutive frame; repeat.
III. CONTINUOUSLY ADAPTIVE MEAN-SHIFT ALGORITHM

Mean-shift can efficiently handle real data analysis without any prior shape assumptions. Only one parameter, the window size, has to be set. However, this parameter is non-trivial, and an inappropriate window size can cause false positives/negatives. Therefore, the continuously adaptive mean-shift algorithm (CAMSHIFT) is employed, as in [3].
It is called Continuously Adaptive because, in addition to running mean-shift, it also adjusts the size and angle of the tracking window. The scale and orientation are best fitted because CAMSHIFT not only keeps the window centered over the area of highest probability, but also finds the match by starting at the object's previous location and calculating the center of gravity of the PDF values within a rectangle. This process is repeated until the rectangle settles on the object's center of gravity.

978-1-4244-5187-6/09/$26.00 ©2009 IEEE
Due to the continuously adaptive window size, the system is robust to the object's scale and to translation. Translation in a monocular feed results in a change of apparent object size; hence it is properly handled by CAMSHIFT as well.

Moreover, the CAMSHIFT algorithm is based on the HSV color space and relies only on the hue value (separating out the brightness factor), which gives the CAMSHIFT algorithm much wider tolerance to illumination variations.
IV. CAMSHIFT ALGORITHM

Basic mean-shift tracking starts with the selection of a region of interest (ROI), whose normalized color histogram is used as the target model. In the next frame, a candidate histogram is created at the same location.
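Building the target model amounts to taking a normalized 1D hue histogram of the selected ROI. A small NumPy sketch, assuming the ROI is already given as an HSV array with 8-bit hue in [0, 180) as in OpenCV's convention (the bin count and the synthetic patch are illustrative choices, not values from the paper):

```python
import numpy as np

def hue_histogram(hsv_roi, bins=16, hue_max=180):
    """Normalized 1D hue histogram of an ROI given as an HSV array
    (hue channel first, values in [0, hue_max) as in OpenCV's 8-bit HSV)."""
    hue = hsv_roi[..., 0].ravel()
    hist, _ = np.histogram(hue, bins=bins, range=(0, hue_max))
    return hist / max(hist.sum(), 1)  # normalize so the histogram is a PDF

# Example: a synthetic 10x10 HSV patch that is mostly "red" (hue near 0).
rng = np.random.default_rng(0)
roi = np.zeros((10, 10, 3), dtype=np.uint8)
roi[..., 0] = rng.integers(0, 10, size=(10, 10))  # hue concentrated in bin 0
q = hue_histogram(roi)
print(q[0])  # all of the mass falls in the first hue bin
```

The candidate histogram for the next frame is computed the same way over the ROI at the previous location.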
Let $q_u$ be the normalized color histogram of the target ROI and $p_u(x)$ be the candidate normalized color histogram at location $x$, the same location as that of the target in the previous image. Each pixel $i$ in the ROI around $x$ provides a weighted sample: if $u$ is the histogram bin into which the pixel's color falls, the weight contributed by the pixel is

    w_i = \sqrt{ q_u / p_u(x) }

The mean-shift vector is given as

    M(x) = \frac{\sum_i w_i x_i}{\sum_i w_i} - x

and the position is then updated by

    x \leftarrow \frac{\sum_i w_i x_i}{\sum_i w_i}

where $\frac{\sum_i w_i x_i}{\sum_i w_i}$ is called the weighted ratio. The current position is then updated and the iteration is repeated. The weighted ratio is biased towards the mode of the normalized color histogram and hence tends toward the correct position. Basic mean-shift, however, fails when dealing with variable window sizes. CAMSHIFT instead calculates the centroid of the color probability distribution within its 2D window of calculation, re-centers the window, and then calculates the area for the next window size [3]. This overcomes the limitation of the basic mean-shift algorithm.
Figure 2. Object selected in (a) being tracked correctly in (b), (c) and (d) with variations in size and orientation.
The CAMSHIFT algorithm can be summarized as follows. A 1D color histogram is calculated from the H (hue) channel of the HSV color model, which is preferred over the RGB model because RGB is much more sensitive to lighting changes. As in the mean-shift algorithm, probability distributions of both the target and candidate models are built, with the ROI location being the same in both frames; however, the candidate model's ROI is slightly larger. Mean-shift is then performed for a set number of iterations, and the zeroth moment (area, or size) and the mean location are stored. The mean location serves as the center of the search window/ROI for the next frame, and the search window size is a function of the previously calculated zeroth moment. This is repeated for all frames in the sequence. Further details are available in [3].
V. RESULTS

Figures 2 and 3 show the real-time tracking results for rigid and non-rigid objects, respectively. Tracking is performed correctly despite variations in orientation, shape, and size.
Figure 3. Object tracked with variations in shape and size
Figure 4. (a) Object being tracked while leaving the field of view of the camera; (b) object leaves the FOV; (c) object re-enters the field of view and is tracked consistently.
As long as the occlusion is not 100%, CAMSHIFT continues to track an object [3]. Experiments with this implementation showed that when the object moves out of the scene (100% occlusion) and re-enters from the same point/area, the algorithm can successfully handle the full occlusion, as in Figure 4. However, if the object's entry and exit points differ, the system loses track completely. Figure 5 shows tracking results with an input feed obtained from a lower-quality webcam. The images are blurred, yet the object is correctly tracked.
VI. CONCLUSION

Object tracking is an important task for various computer and machine vision applications, ranging from industrial tasks, e.g., placing objects automatically or aligning them to a moving part, to complex tasks, e.g., localization and mapping in a robot vision application.
To conclude, this paper reviewed object tracking using mean-shift, and real-time tracking was performed using the continuously adaptive mean-shift algorithm, which produced correct correspondences even under noise, size and orientation variations, and occlusion. The application was implemented in VC++ using the OpenCV libraries.
REFERENCES

[1] A. Yilmaz, O. Javed, and M. Shah, "Object tracking: a survey," ACM Computing Surveys, vol. 38, no. 4, pp. 13+, 2006. [Online]. Available: http://dx.doi.org/10.1145/1177352.1177355
[2] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002. [Online]. Available: http://dx.doi.org/10.1109/34.1000236
[3] G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technology Journal, no. Q2, 1998. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.7673
[4] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2003.1195991
[5] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, 2000, pp. 142-149. [Online]. Available: http://dx.doi.org/10.1109/CVPR.2000.854761
[6] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, 1995. [Online]. Available: http://dx.doi.org/10.1109/34.400568
[7] J. Wang, Y. Xu, H.-Y. Shum, and M. F. Cohen, "Video tooning," in ACM SIGGRAPH 2004 Papers, 2004, pp. 574-583. [Online]. Available: http://doi.acm.org/10.1145/1186562.1015763
[8] A. Khan, "Computer Vision Application with C#," undated, viewed July-August 2009. [Online]. Available: http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=4058454
Figure 5. Tracking of the red object is performed in noisy, blurred images.