
Flow detection via sparse frame analysis for suspicious event recognition in infrared imagery

Henrique C. Fernandes*a, Marcos A. Batista b, Celia A. Z. Barcelos c, Xavier P. V. Maldague a

a Laval University, Department of Electrical and Computer Engineering, 1065 av. de la Medecine, Quebec City, G1V 0A6, Canada.

b Federal University of Goias, Department of Computer Science, Catalao, Goias, Brazil.
c Federal University of Uberlandia, Faculty of Mathematics, Uberlandia, Minas Gerais, Brazil.

ABSTRACT

It is becoming increasingly evident that intelligent systems are very beneficial for society and that their further development is necessary to continue improving society's quality of life. One area that has drawn the attention of recent research is the development of automatic surveillance systems. In our work we outline a system capable of monitoring an uncontrolled area (an outdoor parking lot) using infrared imagery and recognizing suspicious events in this area. The first step is to identify moving objects and segment them from the scene's background. Our approach is based on a dynamic background-subtraction technique which robustly adapts detection to illumination changes. To segment moving objects, only regions where movement is occurring are analyzed, ignoring the influence of pixels from regions where there is no movement. Regions where movement is occurring are identified using flow detection via sparse frame analysis. During the tracking process the objects are classified into two categories, Persons and Vehicles, based on features such as size and velocity. The last step is to recognize suspicious events that may occur in the scene. Once the objects are correctly segmented and classified, it is possible to identify such events using features such as velocity and time spent motionless in one spot. In this paper we recognize the suspicious event "suspicion of object(s) theft from inside a parked vehicle at spot X by a person", and results show that the use of flow detection increases the recognition rate of this suspicious event from 78.57% to 92.85%.

Keywords: surveillance, infrared imagery, flow detection, moving object segmentation, background subtraction, suspicious event recognition

1. INTRODUCTION

In recent years the increase in society's concern with security issues has been remarkable. Cars are monitored by GPS, houses are equipped with movement-based theft alarms, and these alarm systems are integrated with the Internet. Examples of technology used to support security and surveillance activities can easily be found. For example, in 2010 the United Kingdom accounted for one quarter of all the CCTV (closed-circuit television) cameras in the world, and people in London were captured by CCTV cameras up to 300 times a day.1

In order to develop an efficient surveillance system, the installation of several video cameras is needed. Whether through public initiatives similar to what has been done in London or through private concerns which lead citizens to install video cameras on their property, video cameras will become increasingly present in our lives. A study shows that the global CCTV market is projected to reach around US$ 28 billion by the end of 2013.2 A constant and huge information flow is generated by the images from these CCTVs; however, sometimes so much information is available that the system operator cannot see what he/she really must see. Monitoring based only on cameras and a human operator is expensive, tiring and inefficient. For instance, monitoring 25 cameras, 24 hours a day, 7 days per week, using human observers costs around US$ 150,000.00 per year. Moreover, after only 20 minutes, human attention to video monitors degenerates to an unacceptable level.3

∗ E-mail: [email protected]

Thermosense: Thermal Infrared Applications XXXV, edited by Gregory R. Stockton, Fred P. Colbert, Proc. of SPIE Vol. 8705, 870507 · © 2013 SPIE · CCC code: 0277-786X/13/$18 · doi: 10.1117/12.2015077


To overcome these problems, a possible solution is the use of computational systems based on computer vision. They can assist the human operator by warning him/her of events that need attention, minimizing the dependence on the operator. The development of techniques for such systems has been the focus of many researchers in recent years, and several works are concerned with the development of automatic or semi-automatic surveillance systems.4–8

Systems like these are usually composed of three modules: a segmentation module, a tracking module and an event management module. The segmentation of moving objects in the scene is the first step needed to build an automatic surveillance system. After that, all segmented objects need to be tracked. Once these tasks have been performed, the next step is to identify events of interest involving these objects. Several papers dealing with the event management issue can be found in the literature.9–14

As these systems often operate 24 hours a day, there are difficulties in dealing with images captured with little or no light (at night, for example) and images captured during sudden changes in the scene's illumination (an abrupt appearance of the sun in outdoor scenes, for example). A solution to this problem is the use of infrared imagery, which provides a better view of the scene and thus leads to better results. This is due to the fact that the objects of interest (people and vehicles, in a parking lot surveillance system) are natural infrared emitters regardless of the presence of visible light. Usually, the temperature of the human body differs from the background's temperature. This leads to different energy distributions and gray-scale differences between the background and the human body (or vehicle) in thermal images.15

El Maadi and Maldague16,17 proposed a surveillance system for outdoor environments that uses a stationary infrared camera. This system uses a dynamic background-subtraction technique that considers the illumination change history in the scene to segment moving objects. During the tracking process, performed by a Kalman filter18 (a two-step function: prediction and correction), the system analyzes features such as temperature, velocity and size to classify objects into two classes: Person and Vehicle. Once these tasks are correctly performed, events of interest can be detected automatically in the scene. We proposed19 a different approach for moving object classification in an effort to achieve a better classification rate, and we also proposed the recognition of suspicious events involving these objects (the possible theft of object(s) from inside a vehicle, for example). Based on features such as velocity, spatial proximity and time spent motionless in one spot, the system is capable of recognizing suspicious actions and informing the human operator so that he/she can take the proper action.

In order to recognize suspicious events, one first needs to recognize the moving objects involved in the event. During the experiments conducted in Ref. 19 we noticed that some events were misclassified because the human involved in the event was not correctly segmented and classified. In order to minimize this problem, optical flow20 is used in this work to correctly segment and identify small moving objects located far from the camera. After detecting the flow, we perform the segmentation of moving objects only in the regions where the flow exists. More details are provided in the following section.

This paper is organized as follows: the next section presents the technique used to segment moving objects and the use of optical flow via sparse frame analysis in the segmentation process; in section 3 the criteria used to classify moving objects are presented; in section 4 the tracking process is outlined; in section 5 our method to recognize suspicious events is described (more details can be found in Ref. 19); in section 6 our experimental results are presented; and finally, in section 7, our final considerations are presented.

Figure 1: Object segmentation: (a) a scene containing two humans; (b) the two humans from scene (a) correctly classified.



2. OBJECT SEGMENTATION

The first step in developing a surveillance system for an outdoor environment is the segmentation of moving objects in the scene. In our previous work,19 the segmentation of moving objects was inspired by what is proposed in Ref. 16. We use a dynamic background-subtraction technique to robustly adapt segmentation to illumination changes in outdoor scenes. This approach is based on a background-subtraction method where the current frame at time k is subtracted from the scene's estimated background at the same time k and the result is thresholded, providing the foreground objects. The foreground image FOREk is constructed by verifying whether each point (i, j) of Fk belongs to the foreground. FOREk is given by Equation (1).

FORE_k(i, j) = \begin{cases} 1, & \text{if } |F_k(i, j) - Bg_k(i, j)| \geq Th_k, \\ 0, & \text{otherwise,} \end{cases} \quad (1)

where 1 ≤ i ≤ N; 1 ≤ j ≤ M; 1 ≤ k ≤ q; Thk is the binarization threshold, calculated for each instant k, which decides whether a pixel belongs to the foreground; Fk is the frame at time k; Bgk is the scene's background at time k; FOREk is the scene's foreground at time k; F−l, F−l+1, F−l+2, ..., F−1, F0, F1, ..., Fq is a sequence of l + q + 1 frames of size N × M; and we use the first l frames of this sequence to initialize the background. The background initialization is achieved by Equation (2).

Bg_k = \begin{cases} F_k, & \text{if } k = -l, \\ Bg_{k-1} + 0.5\,(F_{k-1} - Bg_{k-1}), & \text{otherwise,} \end{cases} \quad (2)

where −l ≤ k ≤ 0.

In a background-subtraction method, the scene's background must be updated because the scene is constantly changing. The background must be kept up to date so that FOREk can be correctly calculated. Bgk+1 is calculated by Equation (3).

Bg_{k+1} = Bg_k + \alpha_k\,(F_k - Bg_k) \quad (3)

where αk is the learning rate and lies within the interval [0, 1].
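As an illustration, a minimal NumPy sketch of Equations (1)–(3) might look as follows; the function names are ours and frames are assumed to be 2-D gray-scale arrays:

```python
import numpy as np

def init_background(init_frames):
    """Blend the first l frames into a starting background with a fixed
    gain of 0.5, following Equation (2)."""
    bg = init_frames[0].astype(np.float64)      # Bg_{-l} = F_{-l}
    for frame in init_frames[1:]:
        bg += 0.5 * (frame.astype(np.float64) - bg)
    return bg

def segment_foreground(frame, bg, th_k):
    """Equation (1): threshold |F_k - Bg_k| to get a binary foreground mask."""
    return (np.abs(frame.astype(np.float64) - bg) >= th_k).astype(np.uint8)

def update_background(bg, frame, alpha_k):
    """Equation (3): move the background toward the current frame."""
    return bg + alpha_k * (frame.astype(np.float64) - bg)
```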

In outdoor scenes, abrupt changes in the scene's illumination can occur frequently. In this case, even with optimal thresholds, the background-subtraction process can fail completely. This occurs because, with fixed thresholds, Equations (1) and (3) cannot follow the speed of the illumination changes in the scene. In order to overcome this problem, El Maadi and Maldague16,17 used dynamic Thk and αk in Equations (1) and (3) respectively. We use this approach to segment moving objects, calculating αk and Thk based on the history of the illumination (gray-scale) variation in the scene.

This history is represented by a vector containing the latest p + 1 mean gray levels of the scene's pixels (meank). In our implementation p = 4; meank is given by Equation (4) and the vector V containing the history is given by Equation (5).

mean_k = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} F_k(i, j) \quad (4)

V = (mean_{k-p}, \ldots, mean_k), \quad k > p \quad (5)

Based on the history of the illumination variation in the scene, we calculate the dynamic thresholds used in the background-subtraction process. Thus, Thk and αk are given by the following equations, respectively:


Th_k = \beta\,(mean_k + \Delta_k) \quad (6)

\alpha_k = a + b\,\Delta_k / \max(mean_k, mean_{k-p}), \quad k > p \quad (7)

\Delta_k = mean_k - mean_{k-p}, \quad k > p \quad (8)

where:

• Thk: the binarization threshold used to detect the foreground at time k;

• β: a value defined by the user; in our implementation it lies within the interval [0.1, 0.5];

• ∆k: the variation of the pixels' mean gray level in the video over the interval [k−p, k], defined by Equation (8);

• αk: the background update learning rate calculated at time k;

• a: the smallest value that αk may take; in our implementation a = 0.05;

• b: the slope of the line describing the gain variation curve, set by the user; in our implementation b has the empirical value 0.85.

The dynamic background-subtraction technique is obtained by applying Equations (6) and (7) to each frame, which gives the system the capability of dynamically controlling and adjusting the gain-threshold combination according to the speed of the illumination changes. αk and meank can be calculated using all pixels in the frame at each time step, as demonstrated in our previous work.19 Figure 1 shows the segmentation of two persons using this technique.
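Continuing the sketch above, Equations (4)–(8) could be implemented roughly as below; the fallback for k ≤ p and the clipping of αk to [a, 1] are our assumptions, as the paper does not specify these cases:

```python
from collections import deque

import numpy as np

P = 4               # history length p (Equation (5))
BETA = 0.3          # beta, user-defined in [0.1, 0.5]
A, B = 0.05, 0.85   # a (minimum learning rate) and b (gain slope)

history = deque(maxlen=P + 1)   # the vector V of Equation (5)

def dynamic_parameters(frame):
    """Return (Th_k, alpha_k) from the gray-level history, Equations (4)-(8)."""
    mean_k = float(frame.mean())           # Equation (4)
    history.append(mean_k)
    if len(history) <= P:                  # k <= p: not enough history yet
        return BETA * mean_k, A            # fallback values (our assumption)
    delta_k = history[-1] - history[0]     # Equation (8): mean_k - mean_{k-p}
    th_k = BETA * (mean_k + delta_k)       # Equation (6)
    alpha_k = A + B * delta_k / max(history[-1], history[0])   # Equation (7)
    return th_k, float(np.clip(alpha_k, A, 1.0))   # keep alpha in [a, 1]
```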

However, in typical parking lot surveillance videos the motion information is not always spread across the entire scene. There are situations where only a small object passes through the scene, and the region where the movement is occurring is quite small compared to the frame size. In such cases the identification/segmentation of these small moving objects is jeopardized if the entire scene is analyzed, because stationary pixels will influence the calculation of αk and Thk and consequently the background estimation and foreground detection. Thus, to improve the segmentation of moving objects at time k, in this work we propose to analyze only the region surrounding the moving objects, found via optical flow fields, instead of analyzing all pixels of the frame at each time k.

Figure 2: Flow detection: (a) original frame with two humans walking, the first in the foreground and the second at the top of the frame behind the branches of a tree; (b) the flow associated with this frame; (c) regions in the scene where the moving object segmentation process will be applied.


Figure 3: Optical flow detection schema via sparse frame analysis.

The main challenge of this approach is to determine where the motion is taking place. For this purpose the concept of the optical flow field is used. The optical flow is the distribution of apparent velocities that can be associated with apparent motion.20 It is used to find the regions in the scene where movement is taking place. Even if an object has moved only during a small number of frames, or the moving object is very small, the flow associated with this movement will be detected. Segmentation of moving objects inside the optical flow region then becomes an easy task.

The idea behind flow detection is to subtract consecutive frames in order to identify the differences between them, thus constructing a sparse representation of the video. A certain number of iterations allows us to determine the flow and thus identify the location of moving object(s) in the video sequence.

Let F^0_1, F^0_2, ..., F^0_{m+1} be a sequence of m + 1 consecutive frames that is part of a video composed of u frames. The flow associated with this sequence is a factor F and it can be constructed as:

F_i^j = F_{i+1}^{j-1} - F_i^{j-1} \quad (9)

where j represents the level of the flow detection, varying from 1 to m, and i varies from 1 to m − j + 1. Applying Equation (9) to each frame of the sequence m times leads to the detection of the flow associated with the sequence. This detected flow indicates the region in the scene where motion is taking place. The number m is chosen experimentally so that the flow associated with all movements occurring in a given sequence is identified. Thus F^m_1, the flow associated with the video sequence, is the result of applying Equation (9) to all frames of the given sequence m times. Figure 2a shows the original frame at time k and Figure 2b shows the flow associated with it, where m = 30. Once the optical flow fields are calculated (Figure 2b), morphological operations are applied to obtain Figure 2c, which indicates the regions where there is movement in the scene. The moving object segmentation process is applied only in these regions. Figures 2b and 2c show that even the flow associated with the movement of the person at the top of the scene, who is far from the camera and consequently very small, is detected by our process.

Figure 3 contains a graphical schema indicating how the factor F^m_1 is calculated via sparse frame analysis. After identifying the flow associated with the first sequence of m + 1 frames, the flow associated with the next sequence of m + 1 frames is constructed in the same way. This process is repeated until all u frames of the video have been processed. For the last sequence of frames we consider the last m + 1 frames of the video.

Figure 4 shows frames where the segmentation results using the procedure described in Ref. 19 are compared with the segmentation results obtained using what is proposed in this work. Figure 4c clearly indicates that a small person walking behind the branches of a tree at the top of the scene was successfully identified, and in Figure 4f this same person is identified at the top of the scene while partly behind a road sign. Figures 4b and 4e show that the same person could not be recognized without the approach proposed in this work.


Figure 4: Segmentation with and without flow detection: (a) original 69th frame of a sequence; (b) segmentation result obtained from (a) using the approach of Ref. 19; (c) segmentation result obtained from (a) using the approach proposed in this work; (d) original 113th frame of the same sequence; (e) segmentation result obtained from (d) using the approach of Ref. 19; (f) segmentation result obtained from (d) using the approach proposed herein.

The segmentation process identifies and segments the moving objects in the scene. This provides a series of blobs spread throughout the frame; these blobs are the moving objects. To represent these objects and extract features from them, a bounding box is drawn around each one. Figure 5 shows an object of the Person class with its bounding box.

3. CLASSIFICATION

The recognition of moving objects in the scene is only the first step of a surveillance system. Once the moving objects have been segmented, they must be classified. Inspired by what is proposed by El Maadi and Maldague,16,17 we considered object size, height/width ratio, average temperature of a specific part of the object (head for people and engine for vehicles) and the object's velocity to classify objects into two classes: Person and Vehicle. If an object is not classified into one of these two classes, it is assigned to a third, "unknown" class of non-identified objects, which includes objects such as birds, dogs, cyclists, etc. Objects from this third class are not considered by the event recognition module; however, they are considered by the classification module, which keeps trying to classify them into one of the other two classes until the object disappears from the scene.

The input of this module is the binary image of the foreground and the output is each of the scene's objects classified into one of the classes using the pre-established criteria. In this work the criteria used are those already described in our previous work.19 It was shown in Ref. 19 that these criteria provide better results than those

Figure 5: An object of the Person class with its bounding box.


Figure 6: Object classification: (a) two humans correctly classified; (b) a vehicle correctly classified.

used in Ref. 17. The criteria used in this work are summarized next. Figure 6 shows the successful classification of two humans and of a vehicle using these criteria.

The label "Person" is assigned to an object when the following criteria are verified: the object's velocity is different from zero (in other words, the object is not motionless), the object's size is smaller than the largest possible size for a person, and the height/width ratio is larger than 1.5.

The label "Vehicle" is assigned to an object when one of the following criteria is verified: the object's size is larger than a predefined threshold and the object's velocity is also larger than another predefined threshold; or the object's size is larger than the largest size for an object of the class "Person"; or the object's velocity is larger than the largest velocity for an object of the class "Person". All thresholds were chosen from an experimental analysis of the largest size and the largest velocity of objects of the class "Person".
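These decision rules can be summarized in a short sketch; the Blob container and the threshold names are ours, and the temperature cue mentioned earlier is omitted for brevity:

```python
from dataclasses import dataclass

@dataclass
class Blob:
    size: float     # area in pixels
    width: float
    height: float
    speed: float    # displacement per frame

def classify(b, max_person_size, max_person_speed,
             vehicle_size_th, vehicle_speed_th):
    """Rule-based labeling following the criteria above; all thresholds
    are placeholders to be tuned experimentally, as in the paper."""
    if b.speed > 0 and b.size < max_person_size and b.height / b.width > 1.5:
        return "Person"
    if ((b.size > vehicle_size_th and b.speed > vehicle_speed_th)
            or b.size > max_person_size or b.speed > max_person_speed):
        return "Vehicle"
    return "Unknown"   # re-examined every frame until the object leaves the scene
```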

4. TRACKING

Once a moving object has been correctly segmented, it is subjected to the tracking process using a Kalman filter.18,20 Basically, this is a two-step function: prediction and correction (Figure 7). The prediction phase is conducted at each time k and predicts where each object will be at time k + 1. In the correction phase, carried out at each time k + 1, it is checked whether the previous estimation is correct for the objects currently in the scene. At the end of this phase, all objects that were in the scene at time k are linked with objects at time k + 1. Note that this linking occurs only if the object did not leave the scene at time k + 1.

In the prediction phase, using linear extrapolation, the predicted state is estimated by a temporal projection from the location (xk, yk) at time k to the location (xk+1, yk+1) at time k + 1 as follows:

\begin{cases} x_{k+1} = x_k + V_{x_k} \\ y_{k+1} = y_k + V_{y_k} \end{cases} \quad (10)

where V_{x_k} and V_{y_k} are the displacements of the object along the x and y axes respectively, calculated using the previous frame at time k − 1 according to Equation (11):

\begin{cases} V_{x_k} = x_k - x_{k-1} \\ V_{y_k} = y_k - y_{k-1} \end{cases} \quad (11)
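A direct transcription of this prediction step, assuming the track is kept as a list of centroid positions, might be:

```python
def predict_next(track):
    """Equations (10)-(11): re-apply the displacement observed between the
    last two centroid positions in `track` (a list of (x, y) tuples)."""
    (x_prev, y_prev), (x_k, y_k) = track[-2], track[-1]
    vx, vy = x_k - x_prev, y_k - y_prev    # Equation (11)
    return x_k + vx, y_k + vy              # Equation (10)

# An object seen at (10, 5) and then (13, 7) is expected next at (16, 9):
assert predict_next([(10, 5), (13, 7)]) == (16, 9)
```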

Figure 7: Prediction/correction cycle for tracking.


Figure 8: Bounding boxes overlapping: the predicted and measured bounding boxes and their intersection zone.

Once the new measurement has been made at time k + 1, a matching criterion based on overlapping bounding boxes is computed between the predicted and measured states, as shown in Figure 8. The percentage of intersection between the bounding boxes of the measured object and the predicted object is then determined, to verify whether it is greater than a given threshold. This threshold depends on the image capture rate of the camera used to record the videos and on the mean displacement velocity of the objects in the scene. In this work, this threshold was fixed experimentally at 75% of the size of the bounding box of the predicted object. This means that, with an intersection greater than this threshold, the measured object is considered to be the same object that was predicted.
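A sketch of this matching criterion, with our choice of the (x1, y1, x2, y2) corner convention for boxes, might be:

```python
def overlap_fraction(pred, meas):
    """Intersection area of two boxes given as (x1, y1, x2, y2) corners,
    expressed as a fraction of the predicted box's area."""
    ix1, iy1 = max(pred[0], meas[0]), max(pred[1], meas[1])
    ix2, iy2 = min(pred[2], meas[2]), min(pred[3], meas[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    pred_area = (pred[2] - pred[0]) * (pred[3] - pred[1])
    return inter / pred_area if pred_area else 0.0

def is_same_object(pred_box, meas_box, threshold=0.75):
    """Matching criterion: accept the measurement as the predicted object
    when the intersection covers at least 75% of the predicted box."""
    return overlap_fraction(pred_box, meas_box) >= threshold
```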

5. EVENT RECOGNITION

For the development of a surveillance system, the event recognition step is the most difficult and also the most important one. Approaches in the literature state that event recognition can be divided into two categories depending on how events are modeled:9 implicitly or explicitly. In the first category, no a priori knowledge concerning the application domain is provided; the system automatically identifies common events from observed data. In the second category, the system requires an explicit definition of what constitutes normal and suspicious events. In this case, the system attempts to match the a priori knowledge provided by the operator with observed data patterns. Implicit modeling renders the system highly adaptable to different scenarios and situations, but inaccurate in detecting specific and complex events. On the other hand, explicit modeling generally yields better results in terms of false alarms and missed alarms but, of course, is not self-adapting, as all the knowledge is provided by the operator.

In our previous work19 we proposed to identify the suspicious event C: "suspicion of object(s) theft from inside a parked vehicle at spot X by a person". This suspicious event can be recognized by dividing it into sub-events: C1, 'person approaching a vehicle parked at spot X'; C2, 'person staying close to this vehicle for a certain amount of time'; C3, 'person entering this vehicle at spot X'; and C4, 'person moving away from this vehicle'. We used the terms proposed by Lavee et al. in Ref. 14 to model the scenario described above. According to Ref. 14, an event has three important aspects:

1. occupies a period of time;

2. is built of smaller semantic units called sub-events;

3. is described using the salient aspects of the video sequence input.

Therefore, let us consider the suspicious event C and the sub-events that compose it, Ci, where i ∈ {1, 2, 3, 4}. For Ci+1 to be created, the sub-event Ci must already exist. The features analyzed to recognize these sub-events were: object proximity, time spent motionless in a particular spot, object intersection and object velocity.

Each sub-event is stored in an array of sub-events containing the time at which the sub-event occurred, the sub-event type and the data of each actor participating in the event. Here there are two and only two actors: a Person and a Vehicle. All video frames are analyzed, and when the constraints for a sub-event are met, a sub-event is created for the two actors involved.

The recognition of C1 (person approaching a vehicle parked at spot X) is conducted by verifying whether the person is spatially close to a parked vehicle and this person's velocity is smaller than a predefined threshold. When a person meets these requirements, a sub-event is created for each parked vehicle this person is close to (more than one sub-event is created when a person is between two parked vehicles).


Figure 9: Suspicious event of type C: (a) C1, person approaching a vehicle parked at spot X; (b) C2, person remaining close to this vehicle for a while; (c) C3, person entering this vehicle at spot X; (d) C4, person moving away from this vehicle.

For sub-event C2 (person remaining close to this vehicle for a while), in addition to verifying the existence of sub-event C1, the occurrence times of sub-events C1 and C2 are compared. Thus, if the person's velocity is smaller than a threshold, the person is still close to the parked vehicle and the difference between the occurrence times of C2 and C1 is greater than another predefined threshold, then the sub-event C2 is created. To create sub-event C3 (person entering a vehicle at spot X), we verify the existence of sub-events C2 and C1, but we also verify whether the person's area is smaller than a threshold and whether the edges of the person's bounding box have crossed the edges of the vehicle's bounding box. In the case of sub-event C4 (person moving away from a vehicle), we verify whether the person is moving away from the vehicle. This is carried out by comparing the centroids of the objects' bounding boxes (person and vehicle): if the distance between them is larger than a predefined value and the sub-events C3, C2 and C1 exist for those actors, then the sub-event C4 is created. If the sub-events C1, C2, C3 and C4 are all recognized, a suspicious event of type C is created and the system operator is warned of its occurrence. Figure 9 shows an event of type C.
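The ordering constraint (Ci+1 requires Ci) suggests a simple state-machine sketch such as the one below; the boolean arguments abstract the proximity, velocity, waiting-time, intersection and distance tests described above, and all names are ours:

```python
def update_sub_events(events, t, is_near, is_slow, min_wait, entered, moved_away):
    """Advance the C1 -> C2 -> C3 -> C4 chain for one (person, vehicle) pair.
    `events` maps sub-event names to creation times; call once per frame."""
    if "C1" not in events:
        if is_near and is_slow:
            events["C1"] = t               # person approaching the parked vehicle
    elif "C2" not in events:
        if is_near and is_slow and t - events["C1"] > min_wait:
            events["C2"] = t               # person lingering close to the vehicle
    elif "C3" not in events:
        if entered:
            events["C3"] = t               # person entering the vehicle
    elif "C4" not in events and moved_away:
        events["C4"] = t                   # person moving away from the vehicle
        return "C"                         # full event recognized: warn the operator
    return None
```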

6. EXPERIMENTAL RESULTS

In this work the same database was used as in Ref. 19. This database contains videos recorded by a stationary infrared camera which captures images in the 7.0 to 15.0 µm wavelength range. One section of a parking lot at Laval University (Quebec City, Canada) was monitored during three consecutive days in late autumn from a height of ten meters. In total, the database is composed of ninety minutes of video recording, including images with bright sunlight, images with a cloudy sky, images at night and images with/without snow on the ground and on the vehicles. During the recordings the average temperature was −1.5 °C.

As thefts on the university campus are not frequent and no such event occurred over the three-day period, it was decided that the suspicious event of interest in this work (suspicion of object(s) theft from inside a parked vehicle


Table 1: Evaluation of the classification results using flow analysis.

Class     Total number of objects   Correctly classified   Incorrectly classified   Accuracy rate
Person    265                       250                    15                       94.34%
Vehicle   137                       121                    16                       88.32%
Total     402                       371                    31                       92.29%

Figure 10: Comparison of segmentation/classification rates between the approach used in Ref. 19 (without flow analysis) and the approach used herein (with flow analysis), for the Person class, the Vehicle class, and all objects in total.

at spot X by a person) would be simulated so that a database for testing could be created. Among the suspicious events of type C recorded are actions performed by different actors, vehicles parked in different spots and different methods of entering the vehicle (using a window left open, simulating the breaking open of the vehicle's door, or simulating the breaking of the vehicle's window).

The database contains the passage of 402 objects of interest through the scene: 265 from the Person class and 137 from the Vehicle class. Other objects have also passed through the scene, but they are not dealt with in this study, which focuses only on objects of the Person and Vehicle classes. Additionally, the database includes 14 acted suspicious events of type C.

In this work, experiments were performed on this database using the optical flow field approach, and the results were compared with those obtained in Ref. 19. Table 1 presents the results obtained using flow analysis for the classification of objects of interest. Of the 265 objects of the Person class, 250 were correctly classified (94.34% accuracy rate). For objects of the Vehicle class, 121 of the 137 objects were correctly classified (88.32% accuracy rate). In total, a classification rate of 92.29% was reached for all objects of interest. Comparing these results with those presented in Ref. 19 indicates that the use of optical flow improved the segmentation and classification process: in Ref. 19 a total classification rate of 89.8% was reached, compared to the 92.29% achieved in this work. This increase in accuracy is important because improving the object segmentation and classification process improves the recognition of suspicious events, since the segmented objects are the actors involved in the suspicious events. Figure 10 shows a graph comparing the results obtained here with those obtained in Ref. 19.

In order to explicitly demonstrate the effectiveness of the proposed method for the recognition of suspicious events, some experiments were performed regarding the recognition of suspicious events of type C; the results are outlined in Table 2.

Table 2: Evaluation of the suspicious event recognition using flow analysis.

Type   Total number of events   Nbr. of recognized events   False negatives   False positives
C      14                       16                          1                 3


The use of optical flow via sparse frame analysis also improved this recognition rate when applied to our database, which includes 14 acted suspicious events of type C. In Ref. 19 we were able to recognize 11 of these events, whereas herein we were able to recognize 13 of them using optical flow analysis (an increase from 78.57% to 92.85%).

7. CONCLUSIONS

In this work we proposed the use of optical flow via sparse frame analysis to segment moving objects in infrared imagery. This method was applied in a surveillance system for outdoor parking lots19 which is capable of recognizing the suspicious event "suspicion of object(s) theft from inside a parked vehicle at spot X by a person".

The final purpose of a surveillance system based on video cameras is to recognize suspicious events. Beforehand, however, it must be able to recognize moving objects. A dynamic background-subtraction technique based on the illumination change history in the scene is used to identify and classify moving objects. However, during the experiments conducted in Ref. 19 it became apparent that small objects located far from the camera were not segmented and recognized correctly, so a suspicious event occurring far from the camera could be misclassified because the objects involved in it would not be identified correctly.

Thus, in this work optical flow was used to reduce this problem. The optical flow is used to identify the regions of the scene where movement is occurring. Once the areas with motion are known, only the regions near these areas are analyzed in order to segment moving objects. As a result, the segmentation of small moving objects is not influenced by regions composed of stationary pixels. This influence stems from the fact that the foreground detection is calculated by subtracting the current frame from the current estimated background; the difference is then thresholded to provide binary objects, and this threshold is calculated dynamically based on the illumination change history in the scene, given by the variation in the mean gray level of the pixels. If all pixels in the frame are considered when calculating the mean gray level, unnecessary information from stationary pixels belonging to regions without movement is introduced, and the threshold will not be optimal for the current frame.

By analyzing only regions where movement occurs, the background-subtraction technique is not affected by unnecessary information from stationary pixels, and the segmentation of small or far-from-the-camera moving objects is improved. This improvement can be seen by comparing the segmentation/classification results obtained in this work (Table 1) with those obtained in Ref. 19. It is also seen in Figure 4, which shows two cases where objects were not identified using the method proposed in Ref. 19 but were correctly identified using the technique proposed in this work.

The improvement in segmentation and classification directly affects the recognition of suspicious events, since the first step of any event is the recognition of its actors. Compared with the recognition achieved in Ref. 19, the use of optical flow improves the recognition of suspicious events of type C from 11 to 13, thus missing just 1 of the 14 suspicious events. This improvement is significant since the main purpose of a surveillance system is to recognize the suspicious events that occur in the monitored area. The number of false positives also increased: in Ref. 19 the system produced 2 false positives (without the optical flow field restriction), whereas herein the system produced 3 (with the optical flow restriction). This indicates that the use of optical flow renders the process more sensitive. However, the increase in the number of false positives is not a significant drawback, since it is more important for a surveillance system to recognize all suspicious events than to avoid labeling a non-suspicious event as suspicious. In the case of a non-suspicious event labeled as suspicious, the system can warn the human operator so that he/she can mark it as non-suspicious.

The next step of this work is the validation of the proposed technique using a video database containing real thefts. Future work includes recognizing more events, such as "vehicle theft". Visible and infrared images could also be combined in order to recognize events in videos: using infrared images as a mask for visible images, it would be possible to recognize a vehicle's license plate and also recognize objects in a person's hands, so as to confirm whether this person left a suspicious backpack unattended (representing an abandoned explosive device) or removed package(s) from inside a parked vehicle.


ACKNOWLEDGMENTS

The support of the Emerging Leaders in the Americas Program (ELAP, a program of the Government of Canada), as well as of the National Council for Scientific and Technological Development (CNPq) and the Coordination for the Improvement of Higher Level Personnel (CAPES), Brazilian governmental agencies, is acknowledged.

REFERENCES

[1] Wright, D., Friedewald, M., Gutwirth, S., Langheinrich, M., Mordini, E., Bellanova, R., Hert, P. D., Wadhwa, K., and Bigo, D., "Sorting out smart surveillance," Computer Law and Security Review 26(4), 343–354 (2010).

[2] RNCOS, "Global CCTV market analysis (2008-2012)," tech. rep., RNCOS Industry Research Solutions (2009).

[3] Haering, N., Venetianer, P., and Lipton, A., "The evolution of video surveillance: an overview," Machine Vision and Applications 19, 279–290 (2008).

[4] Fernandez-Caballero, A., Castillo, J. C., Serrano-Cuerda, J., and Maldonado-Bascon, S., "Real-time human segmentation in infrared videos," Expert Systems with Applications 38, 2577–2584 (2010).

[5] Pantrigo, J. J., Hernandez, J., and Sanchez, A., "Multiple and variable target visual tracking for video-surveillance applications," Pattern Recognition Letters 31(12), 1577–1590 (2010).

[6] Fernandez-Caballero, A., Castillo, J. C., Martinez-Cantos, J., and Martinez-Tomas, R., "Optical flow or image subtraction in human detection from infrared camera on mobile robot," Robotics and Autonomous Systems 58, 1273–1281 (2011).

[7] Tziakos, I., Cavallaro, A., and Xu, L., "Event monitoring via local motion abnormality detection in non-linear subspace," Neurocomputing 73(10-12), 1881–1891 (2010).

[8] Fernandez, C., Baiget, P., Roca, F. X., and Gonzalez, J., "Augmenting video surveillance footage with virtual agents for incremental event evaluation," Pattern Recognition Letters 32(6), 878–889 (2011).

[9] Micheloni, C., Snidaro, L., and Foresti, G. L., "Exploiting temporal statistics for events analysis and understanding," Image and Vision Computing 27, 1459–1469 (2009).

[10] Foresti, G. L., Micheloni, C., and Snidaro, L., "Event classification for automatic visual-based surveillance of parking lots," in [Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004: Cambridge, England, UK], 3, 314–317 (2004).

[11] Diamantopoulos, G. and Spann, M., "Event detection for intelligent car park video surveillance," Real-Time Imaging 11, 233–243 (2005).

[12] Fernandez, C., Baiget, P., Roca, F. X., and Gonzalez, J., "Determining the best suited semantic events for cognitive surveillance," Expert Systems with Applications 38, 2048–4079 (2010).

[13] Tziakos, I., Cavallaro, A., and Xu, L., "Video event segmentation and visualisation in non-linear subspace," Pattern Recognition Letters 30(2), 123–131 (2009).

[14] Lavee, G., Rivlin, E., and Rudzsky, M., "Understanding video events: A survey of methods for automatic interpretation of semantic occurrences in video," IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews 39, 489–504 (2009).

[15] Xue, Z., Ming, D., Song, W., Wan, B., and Jin, S., "Infrared gait recognition based on wavelet transform and support vector machine," Pattern Recognition 43(8), 2904–2910 (2010).

[16] Maadi, A. E. and Maldague, X., "Outdoor infrared video surveillance: A novel dynamic technique for the subtraction of a changing background of IR images," Infrared Physics and Technology 49, 261–265 (2007).

[17] Maadi, A. E. and Maldague, X., "Classifying tracked objects and their interactions from infrared imagery," in [Canadian Conference on Electrical and Computer Engineering, 2006. CCECE'06. May 7-10, 2006 - Ottawa Congress Centre, Ottawa, Canada], 2194–2198 (2007).

[18] Welch, G. and Bishop, G., "An introduction to the Kalman filter (TR 95-041)," tech. rep., Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC (2006).

[19] Fernandes, H., Maldague, X., Batista, M. A., and Barcelos, C. Z., "Suspicious event recognition using infrared imagery," in [2011 IEEE International Conference on Systems, Man, and Cybernetics, October 9-12, 2011, Anchorage, Alaska], 2186–2191 (2011).

[20] Maggio, E. and Cavallaro, A., [Video Tracking: Theory and Practice], John Wiley and Sons, Ltd, United Kingdom (2011).
