IEEE Proceedings, International Conference on Computer Graphics, Imaging and Visualization, 2004
Image Subtraction for Real Time Moving Object Extraction
Shahbe Mat Desa, Qussay A. Salih
Faculty of Information Technology, Multimedia University [email protected], [email protected]
Abstract
This paper studies the task of extracting moving objects from a static, irrelevant background, with an implementation intended for real-time applications. Image processing concepts related to this study are first presented and improvements are proposed. We obtain a motion mask by applying background subtraction and consecutive-frame differencing. We also propose a reliable background update and a noise reduction operator to refine the result of moving object extraction. The analysis and results are obtained using the Matlab Image Processing Toolbox Version 6.01 [1].
1. Introduction
In recent years, motion analysis has become essential in many vision systems with real-time requirements. The rising interest in this research goes hand in hand with the growing attention to employing real-time applications to control complex real-world systems, such as traffic monitoring, airport surveillance and face verification for ATM security. In realizing these diverse applications, motion detection is one of the most fundamental analysis tasks in the real-time process flow.
The main tasks of this study are to perform automatic (a) motion detection in a video sequence, (b) reference background updating, (c) segmentation of the dynamic region from the static region and (d) noise reduction in the segmentation result. The study is motivated by the strong need for detection and segmentation algorithms that facilitate automated surveillance systems. Motion detection still needs to be improved, however, as obstacles such as changes in the illumination function and temporal cluttered motion cause inaccurate detection. The objective of this study is to implement a reliable, computationally inexpensive process for real-time moving object extraction.
Normally, background subtraction is employed to segment the dynamic region from the static region. The output of normal background subtraction is shown in figures 4(b), 5(b), 6(b) and 7(b). Background subtraction separates the object of interest from the unrelated background, but the result contains scattered noise. In this study, we have applied background subtraction together with temporal differencing on three consecutive frames to enhance the result of common background subtraction. Additionally, a background update and a noise reduction operator are proposed to refine the result. The performance of these reliable and less complex steps is sufficient, as shown in figures 4(c), 5(c), 6(c) and 7(c). The proposed method produces a well-extracted region output that is suitable to extend to advanced image processing. We expect that higher-level processes such as object recognition and moving object tracking performed on this extracted region will be computationally less complex and simpler, as the meaningful moving region has already been segmented from the unrelated background.
We have performed the analysis on several different scenes: moving vehicles on a road, and people walking indoors and outdoors. Each scene has been categorized into one of three background quality levels: good, moderate and bad. The input is a video image captured with a digital video camera. The performance of the proposed motion detection is evaluated by measuring the root mean square error (RMSE) of both the normal subtracted image and the proposed subtracted image.
1.1 Real time motion detection
Motion detection is defined in [2] as a binary labeling problem whose goal is to assign each pixel s(x,y) of image S at time t one of the following label values ls:
Proceedings of the International Conference on Computer Graphics, Imaging and Visualization (CGIV’04)
0-7695-2178-9/04 $20.00 © 2004 IEEE
ls = 1, if s is a moving object
     0, if s is a static background          (1)
Real-time motion detection is generally a repeated operation and the launching point of all advanced steps in an automated system, as depicted in figure 1. The basic idea of most automated surveillance applications is that motion detection operates continuously, and the system is triggered to perform higher-level processes such as object recognition and tracking when it detects motion. For example, a home security system operates by continually inspecting the dynamic and static information in its surroundings; the system is automatically alerted to execute higher-level examination only when it detects the presence of a moving object based on the motion analysis.
1.2 Image subtraction
Generally, there are two approaches to image subtraction: (i) background subtraction, as discussed in [3], [4] and [5]; and (ii) temporal differencing, as discussed in [6] and [7]. Background subtraction computes the difference between the frame of interest and a background frame. Temporal differencing computes the difference between consecutive frames. The motion mask resulting from image subtraction is depicted in figure 2; the shaded region in the figure illustrates the dynamic pixels.
Each gray value A(x,y) of frameA, as defined in (2), is subtracted from the corresponding gray value B(x,y) of frameB, as defined in (3), where w and h are the frame width and height respectively.

frameA : { A(x,y) | x = 1,2,3,...,w and y = 1,2,3,...,h }          (2)

frameB : { B(x,y) | x = 1,2,3,...,w and y = 1,2,3,...,h }          (3)

The difference between two corresponding pixels A(x,y) and B(x,y) is converted to an absolute value and stored in the difference matrix dAB, as illustrated in (4); this eliminates negative values after subtraction [6]. The motion mask motionAB between the two frames is obtained by thresholding the difference matrix dAB (5).
      | [ A(1,1) ... A(1,h) ]   [ B(1,1) ... B(1,h) ] |
dAB = | [  ...         ...  ] - [  ...         ...  ] |          (4)
      | [ A(w,1) ... A(w,h) ]   [ B(w,1) ... B(w,h) ] |
motionAB(x,y) = 1, if dAB(x,y) > Td
                0, otherwise          (5)

where Td is a difference threshold value.
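As a concrete illustration of equations (4)-(5), here is a minimal sketch in Python with NumPy (the paper used Matlab; NumPy is substituted here, and the function name motion_mask is ours). It assumes grayscale frames scaled to [0, 1], consistent with the threshold Td = 0.06 used later in the paper.

```python
import numpy as np

def motion_mask(frame_a, frame_b, td=0.06):
    """Motion mask between two grayscale frames in [0, 1]:
    absolute difference (eq. 4) followed by thresholding (eq. 5)."""
    d_ab = np.abs(frame_a.astype(np.float64) - frame_b.astype(np.float64))
    return (d_ab > td).astype(np.uint8)

# Toy 4x4 frames: a single pixel changes between frameA and frameB.
frame_a = np.zeros((4, 4))
frame_b = frame_a.copy()
frame_b[1, 2] = 0.5
print(motion_mask(frame_a, frame_b).sum())  # -> 1 (only the changed pixel)
```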
2. Proposed moving object extraction for
real time process
The proposed method for moving object extraction continuously reads video image frames. The study is divided into three parts: firstly, motion mask extraction; secondly, background reconstruction; and thirdly, noise reduction. Initially, the system parameters are initialized as follows:

Difference threshold: Td = 0.06
Motion threshold: M = 4%
Background frame: B = f0, where f0 is a static background frame.
2.1 Motion mask extraction
In order to obtain the motion mask of frame fk, background subtraction and temporal differencing have been applied. Background subtraction is performed between frame fk and background frame B; the result is a difference matrix dB (6). Temporal differencing is performed between frames fk and fk-1, and between frames fk and fk+1; the outputs are two difference matrixes, referred to as dk-1 and dk+1 (7). Thresholding with the difference threshold value Td is then performed on the difference matrixes dk-1, dB and dk+1 (8).
Figure 1: Motion detection (flow: read the sequence of frames; automatically detect motion in fk using fk-1, fk and fk+1; if motion is detected, perform higher-level operations on frame fk; then k = k+1).

Figure 2: (a) frameA, (b) frameB, (c) motion mask between frameA and frameB.
dB = | fk - B |          (6)

dK' = | fk - fK' |          (7)

where K' = k-1 and k+1.

dK'(x,y) = 1, if dK'(x,y) > Td
           0, otherwise          (8)

where K' = k-1, B and k+1.
The process is followed by applying an AND operator between dB and dk-1, and between dB and dk+1 (9). The outputs of the AND operations are two motion masks: motionk-1 and motionk+1. Lastly, the motion mask of frame fk is obtained by applying an OR operator between these two motion masks; the output is named motionk (10). The process flow is illustrated in figure 3.
motionK' = dB AND dK'          (9)

where K' = k-1 and k+1.

motionk = motionk-1 OR motionk+1          (10)
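The scheme of equations (6)-(10) can be sketched as follows; this is an illustrative NumPy version, assuming grayscale frames scaled to [0, 1], with function and variable names of our own choosing.

```python
import numpy as np

def extract_motion_mask(f_prev, f_curr, f_next, background, td=0.06):
    """Proposed motion mask extraction (eqs. 6-10): threshold the
    background difference and the two temporal differences, AND each
    temporal mask with the background mask, then OR the two results."""
    diff = lambda a, b: np.abs(a - b) > td   # subtraction + thresholding
    d_b = diff(f_curr, background)           # background subtraction (6)
    d_prev = diff(f_curr, f_prev)            # temporal differencing (7)
    d_next = diff(f_curr, f_next)
    motion_prev = d_b & d_prev               # AND operation (9)
    motion_next = d_b & d_next
    return (motion_prev | motion_next).astype(np.uint8)  # OR operation (10)
```

On a toy sequence of a one-pixel object moving over an empty background, the mask keeps only the object's position in the current frame; pixels it has left or has not yet reached are suppressed by the AND step.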
2.2 Background reconstruction
Updating the scene model is necessary because the background varies over time as the scene changes due to (i) variation of the illumination function and (ii) temporal cluttered motion. Thus, the background model is constantly updated to reflect the current situation. We propose an update operator dynamick that is based on the percentage of dynamic pixels in the motion mask of frame fk (11).

dynamick = (dynamic_pixel / (w*h)) * 100%          (11)

where dynamick is the percentage of dynamic pixels, w*h is the size of frame fk and dynamic_pixel is the number of dynamic pixels in motion mask motionk. The background model is updated to frame fk only if dynamick is below the predefined motion threshold M, as defined in (12); otherwise, the background model remains unchanged.

B = fk, if dynamick < M          (12)
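The update rule of equations (11)-(12) amounts to a few lines; the sketch below assumes binary motion masks stored as NumPy arrays, and the function name is ours.

```python
import numpy as np

def update_background(background, frame, mask, m=0.04):
    """Background update (eqs. 11-12): replace the model with the
    current frame only when the fraction of dynamic pixels in the
    motion mask is below the motion threshold M = 4%."""
    dynamic_k = mask.sum() / mask.size   # fraction of dynamic pixels (11)
    return frame.copy() if dynamic_k < m else background  # rule (12)
```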
2.3 Noise reduction
A morphological operator is employed on the motion mask motionk to remove noisy spots [4]. The morphological operators implemented are erosion followed by dilation. Erosion removes isolated foreground pixels, while dilation adds pixels to the boundary of the object and closes isolated background pixels.

The erosion operator with structuring element E is denoted in (13). This operation results in a value of 1 in motion mask motionk at location P = (x,y) (14) if the spatial arrangement of ones in the structuring element EP fully matches the arrangement of ones in motionk. The dilation operator on motion mask motionk with structuring element D is denoted in (15). The result is the set of all points P = (x,y) such that the reflection D̂P and motionk overlap by at least one nonzero element.
motionk ⊖ E = { P | EP ⊆ motionk }          (13)

where EP is the structuring element of erosion at location P(x,y).

P : { P(x,y) | x = 1,2,3,...,w and y = 1,2,3,...,h }          (14)

where h and w are the frame height and width respectively.

motionk ⊕ D = { P | D̂P ∩ motionk ≠ ∅ }          (15)

where D̂P is the reflection of the structuring element of dilation at location P(x,y).
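The erosion-then-dilation step (a morphological opening) can be sketched with SciPy's ndimage routines standing in for the Matlab toolbox operators; the 3x3 square structuring element is our assumption, as the paper does not state which element it used.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def denoise(mask, size=3):
    """Erosion followed by dilation (eqs. 13-15) with a size x size
    square structuring element: isolated foreground spots smaller
    than the element are removed, larger regions are restored."""
    se = np.ones((size, size), dtype=bool)
    eroded = binary_erosion(mask.astype(bool), structure=se)
    return binary_dilation(eroded, structure=se).astype(np.uint8)
```

For example, a lone noisy pixel disappears under erosion and is never restored, while a 3x3 foreground block erodes to its center and is rebuilt by the dilation.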
Figure 3: Motion mask extraction (flow: consecutive frame differencing on frames fk-1, fk and fk+1 and background subtraction yield difference matrixes dk-1, dB and dk+1; AND operations produce motion masks motionk-1 and motionk+1; an OR operation produces motion mask motionk).
Figure 4: SceneA. Figure 5: SceneB. Figure 6: SceneC. Figure 7: SceneD. (In each figure, panel (b) shows the common background subtraction output and panel (c) the proposed method output.)
3. Result and discussion
There are several evaluation methods that can be used to examine the performance of the proposed algorithm, for example: (i) counting true positives and false positives, and (ii) measuring the root mean square error (RMSE). In this study we chose the second technique as the evaluation method; the smaller the RMSE, the better the performance of an algorithm.
To study the performance of the algorithm, noise in the normal background subtraction is manually removed to generate the ground truth G. The outputs of the moving object extraction for both the common and the proposed method are compared to the ground truth by calculating the RMSE. RMSE is the square root of the average squared difference between every pixel of the ground truth G(x,y) and of the analyzed output F(x,y), as shown in equation (16). The result of the proposed algorithm is shown in figures 4(c), 5(c), 6(c) and 7(c). The output of the moving object extraction has been improved: some of the unrelated pixels have been almost completely removed, while the target image remains almost unaffected. The run time of detecting objects in five different frames using a Pentium II machine is 8 to 15 seconds, which is considered acceptable for the real-time process. Figure 8 displays four background models. The RMSE of both methods on four different scenes is shown in figure 9 and table 1.
RMSE = [ (1/(w*h)) Σ(x=1..w) Σ(y=1..h) ( G(x,y) - F(x,y) )^2 ]^(1/2)          (16)
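Equation (16) can be sketched directly in NumPy; the function name rmse is ours.

```python
import numpy as np

def rmse(g, f):
    """Root mean square error between ground truth G and output F (eq. 16)."""
    g = np.asarray(g, dtype=np.float64)
    f = np.asarray(f, dtype=np.float64)
    return float(np.sqrt(np.mean((g - f) ** 2)))

# One wrong pixel out of four: mean squared error 0.25, RMSE 0.5.
print(rmse([[1, 0], [0, 1]], [[1, 0], [0, 0]]))  # -> 0.5
```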
Figure 9: Comparison of RMSE between the common method and the proposed method (bar chart of RMSE, 0 to 0.16, for SceneA through SceneD).

Table 1: Comparison of RMSE between the common method and the proposed method

Scene     RMSE of Common Method     RMSE of Proposed Method
SceneA    0.0324                    0.0275
SceneB    0.0524                    0.0216
SceneC    0.1288                    0.0611
SceneD    0.1495                    0.0696
SceneA contains a noncomplex background and is less affected by dynamic elements than SceneB; thus, the background quality of SceneA and SceneB is classified as good and moderate respectively. Both SceneC and SceneD are classified as bad quality backgrounds, as they contain complex backgrounds affected by temporal cluttered motion such as swaying plants. The RMSE measurement shows that the error in the result of the proposed method is lower than in the result of common background subtraction. Moreover, the worse the background quality, the higher the RMSE value.
4. Conclusion
Implementation of common background subtraction always results in flooding noise and the discarding of considerable motion pixels. Thus, a reliable method has been proposed and its output compared to the result of the common method. The comparison shows that the proposed method performs better than the common method. Furthermore, the performance of moving object extraction is highly dependent on the background quality, and the proposed approach appears necessary for more challenging scenes. There are many aspects of image subtraction that are not considered here. The problems of shadows cast by objects [3], adaptive thresholding [8] and the presence of temporal cluttered motion [9] are examples of related studies conducted by other researchers; taking them into consideration could lead to further improvement.
(a) SceneA (b) SceneB (c) SceneC (d) SceneD
Figure 8: Four different background scenes.
5. Acknowledgement
I would like to express my sincere gratitude and appreciation to Prof. Ryoichi Komiya for his very
helpful comments on this paper.
6. References

[1] Image Processing Toolbox User's Guide Version 2, The Math Works Inc., 1997.

[2] Christophe Dumontier, Franck Luthon, Jean-Pierre Charras, "Real Time DSP Implementation for MRF-Based Video Motion Detection", IEEE Transactions on Image Processing, Vol. 8, No. 10, pp. 1341-1347, Oct 1999.
[3] Paul L. Rosin and Tim Ellis, “Image Difference
Threshold Strategies and Shadows Detection”, 1995.
[4] LIU Ya, AI Haizho, XU Guangyou, “Moving Object
Detection and Tracking Based on Background
Subtraction”, 2001.
[5] B. Prabhakar and Damodar V. Kadaba, "Automatic Detection and Matching of Moving Objects", CRL Technical Journal, Vol. 3, No. 3, pp. 32-37, Dec 2001.
[6] S.Y. Koay, A.R. Ramli, Y.P. Lew, V. Prakash and R.
Ali, “A Motion Region Estimation Technique for Web
Camera Application”, Student Conference on Research and
Development Proceedings, pp. 352-355, Shah Alam
Malaysia, 2002.
[7] J. Pons, J. Prades-Nebot, A. Albiol and J. Molina, "Fast Motion Detection in Compressed Domain for Video Surveillance", IEE Electronics Letters, Vol. 38, No. 29, pp. 409-411, April 2002.
[8] Jong Bae Kim and Hang Joon Kim, “Efficient Region-
Based Motion Segmentation for a Video Monitoring
System”, 2002.
[9] Phillip M. Ngan, "Motion Detection using Approximate Entropy", 1997.