
[IEEE Proceedings. International Conference on Computer Graphics, Imaging and Visualization, 2004. CGIV 2004. Penang, Malaysia (26-29 July 2004)]

Image Subtraction for Real Time Moving Object Extraction

Shahbe Mat Desa, Qussay A. Salih

Faculty of Information Technology, Multimedia University [email protected], [email protected]

Abstract

This paper studies the task of extracting moving objects from a static, irrelevant background. The implementation is prepared for real time application. Some image processing concepts related to this study are first presented and an improvement is proposed. We obtain a motion mask by applying background subtraction and consecutive frame differencing. We also propose a reliable background update and a noise reduction operator to improve the result of moving object extraction. The analysis and results are obtained using the Matlab Image Processing Toolbox Version 6.01 [1].

1. Introduction

In recent years, motion analysis has become essential in many vision systems with real time requirements. The rising interest in this research accompanies the immense attention given to employing real time applications to control complex real world systems, as in the case of traffic monitoring, airport surveillance and face verification for ATM security. In realizing these diverse applications, motion detection is one of the most fundamental analysis tasks in the real time process flow.

The main tasks of this study are to perform automatic (a) motion detection in a video sequence, (b) reference background update, (c) segmentation of the dynamic region from the static region and (d) noise reduction in the segmentation result. The study is conducted and motivated by the high need for detection and segmentation algorithms to facilitate an automated surveillance system. However, motion detection still needs to be improved, as there are many obstacles, such as changes in the illumination function and temporal cluttered motion, that cause inaccurate detection. The objective of this study is to implement a reliable and computationally light process for real time moving object extraction.

Normally, background subtraction is employed to segment the dynamic region from the static region. The output of normal background subtraction is shown in figures 4(b), 5(b), 6(b) and 7(b). Background subtraction separates the object of interest from the unrelated background, but the result contains scattered noise. In this study, we apply background subtraction and temporal differencing on three consecutive frames to enhance the result of common background subtraction. Additionally, a background update and a noise reduction operator are proposed to refine the result. The performance of these reliable and less complex steps is sufficiently good, as shown in figures 4(c), 5(c), 6(c) and 7(c). The proposed method produces a well extracted region that is satisfactory to extend to advanced image processing. We expect that higher-level processes such as object recognition and moving object tracking on this extracted region will be computationally less complex and simpler to perform, as the meaningful moving region has been segmented from the unrelated background.

We have performed the analysis on several different scenes: moving vehicles on the road, and people walking indoors and outdoors. Each scene has been categorized into one of three background quality levels: good, moderate and bad. The input is a video image taken using a digital video camera. The performance of the proposed motion detection is evaluated by measuring the root mean square error (RMSE) of both the normal subtracted image and the proposed subtracted image.

1.1 Real time motion detection

Motion detection is defined in [2] as a binary labeling problem whose goal is to attribute to each pixel s(x,y) of image S at time t one of the following label values l_s:

Proceedings of the International Conference on Computer Graphics, Imaging and Visualization (CGIV’04)

0-7695-2178-9/04 $20.00 © 2004 IEEE


l_s = \begin{cases} 1, & \text{if } s \text{ belongs to a moving object} \\ 0, & \text{if } s \text{ belongs to the static background} \end{cases} \qquad (1)

Real time motion detection is generally a repeated operation and is the launching point of all advanced steps in an automated system, as depicted in figure 1. The basic idea of most automated surveillance applications is that motion detection operates continuously and the system is triggered to perform higher-level processes, such as object recognition and tracking, only when it detects motion. For example, a home security system performs continual inspection of the dynamic and static information in its surroundings; it is alarmed automatically, executing a higher-level examination, only when the motion analysis detects the presence of a moving object.

1.2 Image subtraction

Generally there are two approaches to image subtraction: (i) background subtraction, as discussed in [3], [4] and [5]; and (ii) temporal differencing, as discussed in [6] and [7]. Background subtraction computes the difference between the frame of interest and a background frame. Temporal differencing computes the difference between consecutive frames. The motion mask resulting from image subtraction is depicted in figure 2, where the shaded region illustrates the dynamic pixels.

Each gray value A(x,y) of frameA, as defined in (2), is subtracted from its corresponding gray value B(x,y) of frameB, as defined in (3), where w and h are the frame width and height respectively.

frameA: \{ A(x,y) \mid x = 1,2,3,\ldots,w \text{ and } y = 1,2,3,\ldots,h \} \qquad (2)

frameB: \{ B(x,y) \mid x = 1,2,3,\ldots,w \text{ and } y = 1,2,3,\ldots,h \} \qquad (3)

The difference between each pair of corresponding pixels A(x,y) and B(x,y) is converted into an absolute value and stored in the difference matrix d_AB, as illustrated in (4); this eliminates negative values after subtraction [6]. The motion mask motion_AB between the two frames is obtained by thresholding the difference matrix d_AB (5).

d_{AB} = \left| \begin{bmatrix} A(1,1) & \cdots & A(1,h) \\ \vdots & \ddots & \vdots \\ A(w,1) & \cdots & A(w,h) \end{bmatrix} - \begin{bmatrix} B(1,1) & \cdots & B(1,h) \\ \vdots & \ddots & \vdots \\ B(w,1) & \cdots & B(w,h) \end{bmatrix} \right| \qquad (4)

motion_{AB}(x,y) = \begin{cases} 1, & \text{if } d_{AB}(x,y) > T_d \\ 0, & \text{otherwise} \end{cases} \qquad (5)

where T_d is a difference threshold value.
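The paper's experiments were implemented in Matlab; the following is a minimal NumPy sketch of equations (4) and (5), with function and variable names of our own choosing. Frames are assumed to be grayscale arrays scaled to [0, 1]:

```python
import numpy as np

def motion_mask(frame_a: np.ndarray, frame_b: np.ndarray, t_d: float) -> np.ndarray:
    """Absolute pixel-wise difference of two grayscale frames (eq. 4)
    followed by thresholding with T_d (eq. 5); returns a binary mask."""
    d_ab = np.abs(frame_a.astype(float) - frame_b.astype(float))
    return (d_ab > t_d).astype(np.uint8)
```

With T_d = 0.06 as in section 2, any pixel whose intensity changes by more than 0.06 between the two frames is labeled dynamic.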

2. Proposed moving object extraction for

real time process

The proposed method for moving object extraction continuously reads video frames. The study is divided into three parts: firstly, motion mask extraction; secondly, background reconstruction; and thirdly, noise reduction. Initially, the system parameters are initialized as follows:

Difference threshold: T_d = 0.06
Motion threshold: M = 4%
Background frame: B = f_0, where f_0 is a static background frame.

2.1 Motion mask extraction

In order to obtain the motion mask of frame f_k, background subtraction and temporal differencing are applied. Background subtraction is performed between frame f_k and background frame B; the result is a difference matrix d_B (6). Temporal differencing is performed between frames f_k and f_{k-1}, and between frames f_k and f_{k+1}; the outputs are two difference matrices, referred to as d_{k-1} and d_{k+1} (7). Thresholding with the difference threshold value T_d is then performed on the difference matrices d_{k-1}, d_B and d_{k+1} (8).

Figure 1: Motion detection (a sequence of frames f_{k-1}, f_k, f_{k+1} is read; motion in f_k is detected automatically; if motion is detected, a higher-level operation is performed on frame f_k; then k = k+1).

Figure 2: (a) frameA, (b) frameB, (c) motion mask between frameA and frameB.


d_B = | f_k - B | \qquad (6)

d_{K'} = | f_k - f_{K'} | \qquad (7)

where K' = k-1 and k+1.

d'_K(x,y) = \begin{cases} 1, & \text{if } d_K(x,y) > T_d \\ 0, & \text{otherwise} \end{cases} \qquad (8)

where d_K \in \{ d_{k-1}, d_B, d_{k+1} \}.

The process is followed by applying the AND operator between d_B and d_{k-1}, and between d_B and d_{k+1} (9). The outputs of the AND operation are two motion masks: motion_{k-1} and motion_{k+1}. Lastly, the motion mask of frame f_k is obtained by applying the OR operator between these two motion masks; the output is named motion_k (10). The process flow is illustrated in figure 3.

motion_{K'} = d_B \wedge d_{K'} \qquad (9)

where K' = k-1 and k+1.

motion_k = motion_{k-1} \vee motion_{k+1} \qquad (10)
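The full motion mask extraction above can be sketched in NumPy as follows. This is an illustrative implementation under the same assumptions as before (grayscale frames in [0, 1]; all names are ours, not the paper's):

```python
import numpy as np

def extract_motion_mask(f_prev, f_k, f_next, background, t_d=0.06):
    """Motion mask of frame f_k per eqs (6)-(10): background subtraction
    and two-frame temporal differencing, thresholded, combined with AND,
    then merged with OR."""
    def thresh(a, b):
        # eqs (6)/(7) then thresholding per eq. (8)
        return np.abs(a.astype(float) - b.astype(float)) > t_d

    d_b = thresh(f_k, background)   # background subtraction
    d_prev = thresh(f_k, f_prev)    # temporal differencing, backward
    d_next = thresh(f_k, f_next)    # temporal differencing, forward
    motion_prev = d_b & d_prev      # eq. (9)
    motion_next = d_b & d_next
    return (motion_prev | motion_next).astype(np.uint8)  # eq. (10)
```

The AND step suppresses pixels that differ from the background but not between frames (e.g. a stale background), while the OR step recovers the moving region from either temporal direction.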

2.2 Background reconstruction

Updating the scene model is necessary because the background varies over time as the scene changes due to (i) variation of the illumination function and (ii) temporal cluttered motion. Thus, the background model is constantly updated to reflect the current situation. We propose an update operator based on dynamic_k, the percentage of dynamic pixels in the motion mask of frame f_k (11).

dynamic_k = \frac{dynamic\_pixel}{w \times h} \times 100\% \qquad (11)

where dynamic_k is the percentage of dynamic pixels, w \times h is the size of frame f_k and dynamic_pixel is the number of dynamic pixels in motion mask motion_k. The background model is updated to frame f_k only if dynamic_k is below the predefined motion threshold M, as defined in (12); otherwise, the background model remains unchanged.

B = f_k \quad \text{if } dynamic_k < M \qquad (12)
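The update rule of equations (11) and (12) amounts to a few lines; a hypothetical NumPy sketch (names are ours):

```python
import numpy as np

def update_background(background, f_k, mask_k, motion_threshold=4.0):
    """Eqs (11)-(12): adopt f_k as the new background only when the
    percentage of dynamic pixels in motion mask mask_k is below M."""
    h, w = mask_k.shape
    dynamic_k = 100.0 * np.count_nonzero(mask_k) / (w * h)  # eq. (11)
    return f_k if dynamic_k < motion_threshold else background  # eq. (12)
```

The intent is that a frame dominated by motion would corrupt the background model, so only near-static frames are adopted.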

2.3 Noise reduction

A morphological operator is employed on the motion mask motion_k to remove noisy spots [4]. The morphological operators implemented are erosion followed by dilation. Erosion removes isolated foreground pixels, while dilation adds pixels to the boundary of the object and closes isolated background pixels. The erosion operator with structuring element E is denoted in (13). This operation results in a value of 1 in motion mask motion_k at location P = (x,y) (14) if the spatial arrangement of ones in the structuring element E_P fully matches the arrangement of ones in motion_k. The dilation operator on motion mask motion_k with structuring element D is denoted in (15). The result is the set of all points P = (x,y) such that the reflection \hat{D}_P and motion_k overlap by at least one nonzero element.

motion_k \ominus E = \{ P \mid E_P \subseteq motion_k \} \qquad (13)

where E_P is the structuring element of erosion at location P(x,y).

P: \{ P(x,y) \mid x = 1,2,3,\ldots,w \text{ and } y = 1,2,3,\ldots,h \} \qquad (14)

where h and w are the frame height and width respectively.

motion_k \oplus D = \{ P \mid \hat{D}_P \cap motion_k \neq \emptyset \} \qquad (15)

where \hat{D}_P is the reflection of the structuring element of dilation at location P(x,y).
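Erosion followed by dilation (a morphological opening) can be sketched directly from equations (13) and (15). The plain NumPy implementation below is ours, not the paper's, and assumes zero padding at the borders and an odd-sized structuring element:

```python
import numpy as np

def erode(mask, se):
    """Binary erosion (eq. 13): 1 where the structuring element,
    centred at P, fits entirely inside the foreground."""
    out = np.zeros_like(mask)
    rh, rw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(mask, ((rh, rh), (rw, rw)))
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            window = padded[y:y + se.shape[0], x:x + se.shape[1]]
            out[y, x] = int(np.all(window[se == 1] == 1))
    return out

def dilate(mask, se):
    """Binary dilation (eq. 15): 1 where the reflected structuring
    element overlaps the foreground in at least one pixel."""
    se_ref = se[::-1, ::-1]  # reflection of the structuring element
    out = np.zeros_like(mask)
    rh, rw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(mask, ((rh, rh), (rw, rw)))
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            window = padded[y:y + se.shape[0], x:x + se.shape[1]]
            out[y, x] = int(np.any(window[se_ref == 1] == 1))
    return out

def denoise(mask, se=np.ones((3, 3), dtype=np.uint8)):
    """Opening: erosion followed by dilation, as in section 2.3."""
    return dilate(erode(mask, se), se)
```

With a 3x3 structuring element, isolated single-pixel noise is removed while any region that survives erosion is grown back by the dilation.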

Figure 3: Motion mask extraction (consecutive frame differencing of frames f_{k-1}, f_k, f_{k+1} and background subtraction yield difference matrices d_{k-1}, d_B, d_{k+1}; AND operations yield motion masks motion_{k-1} and motion_{k+1}; an OR operation yields motion mask motion_k).


Figure 4: SceneA ((a) frame, (b) common background subtraction, (c) proposed method). Figure 5: SceneB ((a) frame, (b) common background subtraction, (c) proposed method).

Figure 6: SceneC ((a) frame, (b) common background subtraction, (c) proposed method). Figure 7: SceneD ((a) frame, (b) common background subtraction, (c) proposed method).

3. Result and discussion

There are several evaluation methods that can be used to examine the performance of the proposed algorithm, for example: (i) counting true positives and false positives, and (ii) measuring the root mean square error (RMSE). In this study we chose the second technique as the evaluation method; the smaller the RMSE, the better the performance of the algorithm.

To study the performance of the algorithm, the noise in the normal background subtraction output is manually removed to generate the ground truth G. The outputs of moving object extraction for both the common and the proposed method are compared to the ground truth by calculating the RMSE. The RMSE is the square root of the average squared difference between every pixel in the ground truth G(x,y) and the analyzed output F(x,y), as shown in equation (16). The result of the proposed algorithm is shown in figures 4(c), 5(c), 6(c) and 7(c). The output of moving object extraction has been improved: some of the unrelated pixels have been almost completely removed while the target image remains almost unaffected. The run time for detecting objects in five different frames on a Pentium II machine is 8 to 15 seconds, which is acceptable for a real time process. Figure 8 displays four background models. The RMSE of both methods on the four different scenes is shown in figure 9 and table 1.

RMSE = \left[ \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \left( G(x,y) - F(x,y) \right)^2 \right]^{1/2} \qquad (16)
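Equation (16) translates directly into NumPy; a brief sketch (function name is ours):

```python
import numpy as np

def rmse(ground_truth, output):
    """Eq. (16): root of the mean squared pixel-wise difference between
    the ground truth G and the analyzed output F."""
    g = ground_truth.astype(float)
    f = output.astype(float)
    return float(np.sqrt(np.mean((g - f) ** 2)))
```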


Figure 9: Comparison of RMSE between the common method and the proposed method (bar chart of the RMSE per scene).

Table 1: Comparison of RMSE between the common method and the proposed method

Scene  | RMSE of Common Method | RMSE of Proposed Method
SceneA | 0.0324                | 0.0275
SceneB | 0.0524                | 0.0216
SceneC | 0.1288                | 0.0611
SceneD | 0.1495                | 0.0696

SceneA contains a noncomplex background and is less affected by dynamic elements than SceneB; thus, the background quality of SceneA and SceneB is classified as good and moderate respectively. Both SceneC and SceneD are classified as bad quality backgrounds, as they contain complex backgrounds and are affected by temporal cluttered motion such as swaying plants. The RMSE measurement shows that the error in the result of the proposed method is lower than in the result of common background subtraction. Moreover, the worse the background quality, the higher the RMSE value.

4. Conclusion

Implementation of common background subtraction always results in a flood of noise and the discarding of considerable motion pixels. Thus, a reliable method has been proposed and its output compared to the result of the common method. The comparison shows that the proposed method performs better than the common method. Furthermore, the performance of moving object extraction is highly dependent on the background quality; the proposed approach appears to be necessary for more challenging scenes. There are many aspects of image subtraction that are not considered here. The problems of shadows cast by the object [3], adaptive thresholding [8] and the presence of temporal cluttered motion [9] are examples of related studies conducted by existing researchers; taking them into consideration could lead to further improvement.

Figure 8: Four different background scenes: (a) SceneA, (b) SceneB, (c) SceneC, (d) SceneD.

5. Acknowledgement

I would like to express my sincere gratitude and appreciation to Prof. Ryoichi Komiya for his very helpful comments on this paper.

6. References

[1] Image Processing Toolbox User's Guide Version 2, The Math Works Inc., 1997.

[2] Christophe Dumontier, Franck Luthon, Jean-Pierre Charras, "Real Time DSP Implementation for MRF-Based Video Motion Detection", IEEE Transactions on Image Processing, Vol. 8 No. 10, pp. 1341-1347, Oct 1999.

[3] Paul L. Rosin and Tim Ellis, "Image Difference Threshold Strategies and Shadows Detection", 1995.

[4] LIU Ya, AI Haizho, XU Guangyou, "Moving Object Detection and Tracking Based on Background Subtraction", 2001.

[5] B. Prabhakar and Damodar V. Kadaba, "Automatic Detection and Matching of Moving Objects", CRL Technical Journal, Vol. 3 No. 3, pp. 32-37, Dec 2001.

[6] S.Y. Koay, A.R. Ramli, Y.P. Lew, V. Prakash and R. Ali, "A Motion Region Estimation Technique for Web Camera Application", Student Conference on Research and Development Proceedings, pp. 352-355, Shah Alam, Malaysia, 2002.

[7] J. Pons, J. Prades-Nebot, A. Albiol and J. Molina, "Fast Motion Detection in Compressed Domain for Video Surveillance", IEE Electronics Letters, Vol. 38 No. 29, pp. 409-411, April 2002.

[8] Jong Bae Kim and Hang Joon Kim, "Efficient Region-Based Motion Segmentation for a Video Monitoring System", 2002.

[9] Phillip M. Ngan, "Motion Detection using Approximate Entropy", 1997.
