
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 26, 611-629 (2010)


Automatic Traffic Surveillance System for Vision-Based Vehicle Recognition and Tracking

CHUNG-CHENG CHIU, MIN-YU KU AND CHUN-YI WANG

Department of Electrical and Electronic Engineering, Chung Cheng Institute of Technology, National Defense University, Taoyuan 335, Taiwan

This paper proposes a real-time traffic surveillance system for the detection, recognition, and tracking of multiple vehicles in roadway images. Moving vehicles can be automatically separated from the image sequences by a moving object segmentation method. Since CCD surveillance cameras are typically mounted at some distance from roadways, occlusion is a common and vexing problem for traffic surveillance systems. The segmentation and recognition method uses the length, width, and roof size to classify vehicles as vans, utility vehicles, sedans, mini trucks, or large vehicles, even when occlusive vehicles are continuously merging from one frame to the next. The segmented objects can be recognized and counted in accordance with their varying features, via the proposed recognition and tracking methods. The system has undergone roadside tests in Hsinchu and Taipei, Taiwan. Experiments using complex road scenes under various weather conditions are discussed and demonstrate the robustness, accuracy, and responsiveness of the method.

Keywords: intelligent transportation system, vehicle detection, vehicle recognition, vehicle tracking, occlusion segmentation

Received March 17, 2008; revised August 4 & October 21, 2008; accepted October 23, 2008. Communicated by Jiebo Luo.

1. INTRODUCTION

Intelligent transportation systems (ITS) have attracted considerable research attention in areas such as vehicle detection, recognition, and counting, and traffic parameter estimation. In light of the anticipated availability of low-cost hardware, as well as continuing progress in algorithmic research, computer vision has become a promising base technology for traffic sensing systems. Since vision sensors provide more information than the conventional sensors widely used in ITS, attention is now being focused on vision-based traffic surveillance systems.

Background extraction is an efficient preprocessing technique for traffic observation and surveillance. It reduces the amount of insignificant information in the image sequences and thus accelerates the processing. To reduce computational complexity, many approaches have been suggested for segmenting a video image into a background image and a moving object/foreground image. Background images tend to be motionless over a long period of time, and moving object images only contain foreground objects. Change detection [1, 2] is the simplest method for segmenting moving objects. An adaptive background update method [3] has been proposed to obtain background images over a given time period. He et al. [4] applied the Gaussian distribution to model each point of a background image, using the mean value of each pixel during a given time period to assemble the image. Stauffer et al. [5-7] utilized spectral features (distributions of intensities or colors at each pixel) to model the background. To adapt to changes in illumination, some spatial features have also been exploited [8-10]. These techniques can be used to obtain background images over long periods of time. Chen et al. [11] proposed a statistical algorithm to efficiently extract color backgrounds and moving objects. That algorithm provides better background image extraction than the methods mentioned above and was used for that purpose in this study.

Once an accurate background image is obtained, moving objects can be detected by subtracting the background image from the input image. After moving object detection, the next step is to identify occlusive vehicles and segment them from the moving objects. Vehicle occlusion causes errors in traffic parameter calculations and impedes vehicle recognition. Therefore, finding an efficient method of dealing with vehicle occlusion is an important objective for researchers in intelligent transportation systems (ITS). Pang et al. [12] applied a cubical model of the foreground image to detect occlusion and separate merged vehicles from a monocular image. Because the cubical model is computationally complex and requires segmented vehicles free of shadows and other visual artifacts, it is sensitive to foreground noise and relatively ineffective in real-time applications. Several methodologies [2, 13, 14] consider the problem of object recognition in shape models and topological structures. These techniques use 3-D models to extract the point, line, and plane features that are regarded as vehicle models. Although recognition speed can be enhanced with these precise 3-D models, accurate detection under real-world conditions is time-consuming. A model-based vehicle segmentation method [15] utilizes a two-stage approach to detect vehicles. However, a database of 2-D vehicle templates must be employed as matching templates. Thus, this technique is affected by the templates and requires a new set of templates whenever the CCD camera is set up at a different pitch angle. In their tracking method, Hu et al. [16] proposed a statistical motion pattern learning method to cluster foreground pixels and to predict the behavior of moving objects. However, the statistical procedures and the initialization of each cluster centroid from subsequent frames are complicated and time-consuming. Therefore, this study proposes a feature-based tracking method to track the moving objects.

This paper presents an automatic traffic surveillance system designed for real-world application. The system integrates image capture, an object segmentation algorithm, an occlusive vehicle segmentation method, a vehicle recognition method, and a vehicle tracking method. The occlusive vehicle segmentation and vehicle recognition methods use visual length and visual width to detect and recognize different vehicle types from occlusive or non-occlusive objects. To verify its feasibility, the proposed system was installed at three locations in Hsinchu and Taipei, Taiwan. Under varying conditions of weather and illumination, the average processing time of the system was less than 32 msec for each color frame. Thus far, the system has been operating for over 10 months to verify the stability of the algorithms.

A flowchart of the real-time vehicle recognition and tracking system is shown in Fig. 1. The first phase is the segmentation of the moving objects. In this study, we used the statistical algorithm of Chen et al. [11] to extract the background image. The moving objects are detected by subtracting the background image from the input image. Then the moving objects are processed via the connected component labeling method of Ranganathan et al. [17] to obtain the bounding boxes. Occlusive vehicles are detected and segmented in the bounding boxes. The method is effective for detecting and segmenting different kinds of occlusive vehicles on the basis of their shape characteristics and can segment two or more mutually occluded vehicles. Finally, the recognition and tracking methods are processed for each vehicle. The proposed system can classify five types of vehicle and detect traffic flow and average speed in real time.

Fig. 1. The flowchart of the proposed system.

This paper is organized as follows. In section 2, we introduce the concepts of visual length and visual width and describe how they are calculated. In section 3, we present the details of the vehicle recognition technique. Sections 4 and 5 describe the methods for occlusive vehicle detection and segmentation and for vehicle tracking. Experimental results and conclusions are discussed in sections 6 and 7.

2. CALCULATION OF VISUAL LENGTH AND VISUAL WIDTH

What features can best be used to recognize different car styles on the road? The dimensions of a vehicle are important parameters, since different vehicles are distinguishable by their length and width. Because the unit of length in the image plane is the pixel, the pixel length of a vehicle differs according to its position in the image plane. We must compute the actual length of the outline of a vehicle, regardless of its position. We therefore define the visual length and visual width of each vehicle style to approximate the actual length and width of the vehicle, and we propose a vehicle recognition method based on visual length and visual width.

Fig. 2 illustrates the use of optical geometry to find the relationship between the pixel lengths in the image plane and the visual length Dh1 on the ground. The dotted line F is the central line of the CCD camera, and Dh1 is the visual length of the vehicle above the dotted line F. R1 and R2 are the pixel lengths in the image plane, and Rp is the pixel size of the CCD camera. H is the altitude of the camera, f is the focal length of the lens, and θ is the pitch angle of the camera.

Depending on the altitude and the pitch angle of the CCD camera, the relation between H and F is

F = \frac{H}{\sin\theta}.    (1)


[Fig. 2 depicts the CCD camera at height H with pitch angle θ and focal length f. The central line F intersects the ground; the pixel lengths R1-R4 (multiplied by the pixel size Rp, defined as the CCD vertical length divided by the number of vertical pixels) in the image plane correspond to the ground distances D1-D4 and the visual lengths Dh1 and Dh2.]
Fig. 2. The visual length between the image and the ground planes.

By similar triangles, the lengths D1 and D2 can be obtained from Eqs. (2) and (3) as follows:

\frac{R_1 R_p}{f} = \frac{D_1 \sin\theta}{F + D_1 \cos\theta}    (2)

\frac{R_2 R_p}{f} = \frac{D_2 \sin\theta}{F + D_2 \cos\theta}.    (3)

Substituting Eq. (1) into Eqs. (2) and (3), D1 and D2 can be expressed as in Eqs. (4) and (5), respectively:

D_1 = \frac{R_1 R_p F}{f \sin\theta - R_1 R_p \cos\theta} = \frac{H}{\sin\theta}\left(\frac{R_1 R_p}{f \sin\theta - R_1 R_p \cos\theta}\right)    (4)

D_2 = \frac{R_2 R_p F}{f \sin\theta - R_2 R_p \cos\theta} = \frac{H}{\sin\theta}\left(\frac{R_2 R_p}{f \sin\theta - R_2 R_p \cos\theta}\right).    (5)

We can then compute the visual length Dh1 from Eq. (6):

Dh_1 = D_2 - D_1 = \frac{H}{\sin\theta}\left[\frac{R_2 R_p}{f \sin\theta - R_2 R_p \cos\theta} - \frac{R_1 R_p}{f \sin\theta - R_1 R_p \cos\theta}\right].    (6)

In the same way, the visual length Dh2 of the vehicle below the central line can be calculated via Eq. (7):

Dh_2 = D_4 - D_3 = \frac{H}{\sin\theta}\left[\frac{R_4 R_p}{f \sin\theta + R_4 R_p \cos\theta} - \frac{R_3 R_p}{f \sin\theta + R_3 R_p \cos\theta}\right].    (7)
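As a concrete illustration of Eqs. (4)-(6), the following is a minimal C++ sketch (C++ matching the Visual C++ environment mentioned in section 6) of the visual-length computation. The pixel offsets R1 and R2 are assumed to be measured upward from the image centre row, the camera parameters are the illustrative values listed in Table 4, and the function names groundDistance and visualLength are hypothetical, not part of the original system.

```cpp
#include <cmath>
#include <cstdio>

// Ground distance D beyond the central-line intersection for a pixel offset R
// (in pixels, measured upward from the image centre row), per Eqs. (4)/(5).
double groundDistance(double R, double H, double f, double Rp, double theta) {
    const double rr = R * Rp;  // offset on the CCD converted to metres
    return (H / std::sin(theta)) * rr / (f * std::sin(theta) - rr * std::cos(theta));
}

// Visual length Dh1 spanned by two pixel rows R1 < R2 above the centre, Eq. (6).
double visualLength(double R1, double R2, double H, double f, double Rp, double theta) {
    return groundDistance(R2, H, f, Rp, theta) - groundDistance(R1, H, f, Rp, theta);
}

int main() {
    // Illustrative camera parameters taken from Table 4:
    // H = 6 m, theta = 14 degrees, f = 8 mm, Rp = 0.025 mm/pixel.
    const double PI = 3.14159265358979323846;
    const double H = 6.0, theta = 14.0 * PI / 180.0;
    const double f = 8e-3, Rp = 0.025e-3;  // converted to metres
    // Hypothetical pixel rows bounding an object in the image plane.
    std::printf("Dh1 = %.2f m\n", visualLength(10.0, 25.0, H, f, Rp, theta));
    return 0;
}
```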


[Fig. 3 depicts the same camera geometry as Fig. 2 (height H, pitch angle θ, focal length f, central line F), with the pixel width Rw in the image plane and the corresponding visual widths Dw1 and Dw2 at the ground positions Dh1 and Dh2.]
Fig. 3. The visual width between the image and the ground planes.

Once the visual length has been obtained, a similar procedure can be used to compute the vehicle's visual width, as shown in Fig. 3. The visual widths are defined in Eqs. (8) and (9):

\frac{Dw_1}{R_w} = \frac{F + Dh_1 \cos\theta}{f} \;\Rightarrow\; Dw_1 = \frac{R_w (H + Dh_1 \sin\theta \cos\theta)}{f \sin\theta}    (8)

\frac{Dw_2}{R_w} = \frac{F - Dh_2 \cos\theta}{f} \;\Rightarrow\; Dw_2 = \frac{R_w (H - Dh_2 \sin\theta \cos\theta)}{f \sin\theta}.    (9)

In Eqs. (8) and (9), Rw is the pixel width of the vehicle in the image plane. Dw1 and Dw2 represent the vehicle's visual width above and below the central line F, respectively.
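A companion sketch of Eq. (8) follows. Converting the pixel width Rw to sensor units through Rp, and the name visualWidth, are assumptions made for this illustration; Eq. (9) differs only in the sign of the Dh term.

```cpp
#include <cmath>
#include <cstdio>

// Visual width Dw1 of an object whose pixel width is Rw and whose ground
// position is Dh beyond the central-line intersection, per Eq. (8). For
// positions below the central line, Eq. (9), the sign of the Dh term flips.
double visualWidth(double Rw, double Rp, double Dh, double H, double f, double theta) {
    const double w = Rw * Rp;  // pixel width converted to metres on the CCD (assumed)
    return w * (H + Dh * std::sin(theta) * std::cos(theta)) / (f * std::sin(theta));
}

int main() {
    const double PI = 3.14159265358979323846;
    // Same illustrative camera parameters as before (Table 4); Rw = 16 pixels,
    // object 10 m beyond the central-line intersection.
    std::printf("Dw1 = %.2f m\n",
                visualWidth(16.0, 0.025e-3, 10.0, 6.0, 8e-3, 14.0 * PI / 180.0));
    return 0;
}
```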

We calculated the average visual lengths and widths of different cars from a test sequence covering 29 brand names sold in Taiwan. Table 1 lists the results. Although car height caused a slight error in the estimated visual length, we could still reliably determine the type of a car on the road using the parameters of Table 1.

Table 1. The average visual length and visual width of different vehicle models.

Models        Average visual length (m)    Average visual width (m)
Sedan         5.1284                       1.5542
SUV           6.2793                       1.7164
Van           6.8390                       1.9483
Mini truck    6.8193                       2.4164
Truck, Bus    16.4936                      3.5013

Because of the errors caused by vehicle height, the visual length and visual width parameters are larger than the real dimensions. However, these values remain stable and distinct throughout the detection zone of a frame and thus provide a useful means of recognizing different types of vehicles.


3. VEHICLE RECOGNITION METHOD

Because the length and width of a vehicle vary according to type, the preliminary classification is carried out in terms of length and width. The visual length and visual width of a moving object are computed to recognize the object. If the visual length of a moving object is 15 to 17 m and the visual width is 3 to 4 m, the object is classified as a large vehicle, such as a bus or a truck. When the visual length of the moving object is between 4.5 and 7.5 m, and the visual width is between 1.4 and 3.0 m, the moving object is classified as a small vehicle, such as a van, utility vehicle, sedan, or mini truck. After preliminary classification, the proposed recognition method is employed to precisely classify small vehicles. The details are discussed below.

3.1 Horizontal Edge Detection and Quantization

The image sequences used in this study were recorded by a color CCD camera mounted above the roadway. The direction of the camera was parallel to the direction of the moving objects. After object segmentation, every object was marked by a bounding box whose width and height were denoted by X and Y, respectively. First, Canny's method [18] was used to find the horizontal edges of the object within the bounding box. We employed a Canny mask (5 × 5 pixels, σ = 2) to detect the horizontal edge pixels. After the Canny mask was processed, the thickness of the horizontal edges was greater than one pixel. Suppression of non-maxima and hysteresis thresholding [19] were used to refine each horizontal edge to a thickness of one pixel. We then computed the horizontal projection histogram by counting the edge pixels in each row; Hy[j] denotes the projection value of row j.
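As a minimal sketch of this row projection, assume the thinned edge map of the bounding box is available as a binary matrix; rowProjection and the toy edge map are illustrative only.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hy[j]: number of edge pixels in row j of the bounding box, computed from a
// binary edge map (1 = edge pixel) with one row vector per image row.
std::vector<int> rowProjection(const std::vector<std::vector<int>>& edges) {
    std::vector<int> Hy(edges.size(), 0);
    for (std::size_t j = 0; j < edges.size(); ++j)
        for (int px : edges[j]) Hy[j] += (px != 0);
    return Hy;
}

int main() {
    // Toy 3-row edge map; a real one would come from the thinned Canny output.
    const std::vector<std::vector<int>> edges = {{0, 1, 1, 0}, {0, 0, 0, 0}, {1, 1, 1, 1}};
    for (int v : rowProjection(edges)) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```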

In the captured image, the horizontal edge pixels of the outline of a vehicle are influenced by the shooting angle, which gives a small inclination to the horizontal edges. Thus the horizontal projection histogram was quantized as follows:

\begin{cases} Hy[j] = Hy[j] + Hy[j+1] \ \text{and}\ Hy[j+1] = 0, & \text{if } Hy[j] > Hy[j+1] \\ Hy[j+1] = Hy[j] + Hy[j+1] \ \text{and}\ Hy[j] = 0, & \text{if } Hy[j] \le Hy[j+1] \end{cases}    (10)

The threshold value Tp was then used to select the significant horizontal projection edges. The method is summarized by Eq. (11):

Hy[j] = \begin{cases} Hy[j], & \text{if } Hy[j] > Tp \\ 0, & \text{if } Hy[j] \le Tp \end{cases}    (11)

Because of the camera setup and the resulting pitch angle, the top horizontal line of a vehicle is the roof edge, as illustrated in Fig. 4. In this study, Tp was set at 90% of the first quantized horizontal projection edge of the vehicle.
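The quantization of Eqs. (10) and (11) can be sketched as follows, assuming the per-row projection Hy has already been computed as above; quantizeProjection is a hypothetical name, and Tp is set to 90% of the first quantized peak as described in the text.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Eq. (10): fold each pair of adjacent row projections into the larger one and
// zero the smaller; Eq. (11): suppress projections that do not exceed Tp, where
// Tp is 90% of the first non-zero quantized projection (the roof edge).
std::vector<int> quantizeProjection(std::vector<int> Hy) {
    for (std::size_t j = 0; j + 1 < Hy.size(); ++j) {
        if (Hy[j] > Hy[j + 1]) {
            Hy[j] += Hy[j + 1];
            Hy[j + 1] = 0;
        } else {
            Hy[j + 1] += Hy[j];
            Hy[j] = 0;
        }
    }
    int Tp = 0;
    for (int v : Hy) {
        if (v > 0) { Tp = static_cast<int>(0.9 * v); break; }
    }
    for (int& v : Hy) {
        if (v <= Tp) v = 0;
    }
    return Hy;
}

int main() {
    // Toy per-row edge counts (top row of the bounding box first).
    const std::vector<int> q = quantizeProjection({0, 3, 25, 2, 0, 1, 26, 2, 0, 24, 1});
    for (std::size_t j = 0; j < q.size(); ++j)
        if (q[j] > 0)
            std::printf("significant edge at row %u (value %d)\n",
                        static_cast<unsigned>(j), q[j]);
    return 0;
}
```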

These horizontal projection edges represent the horizontal edges of the vehicle's roof, rear window, trunk, and so on, and can be used to measure the corresponding visual lengths. The resulting information can then be used to recognize different types of vehicles.


[Fig. 4 (a) shows the original image of a moving object; Fig. 4 (b) shows the object's horizontal edges within the bounding box and the quantized horizontal histogram projection, with lines A to E marking, from top to bottom, the roof (A to B), the rear window (B to C), and the boot (C to D).]
Fig. 4. Outline definitions of a vehicle.

3.2 Measuring the Outlines of Vehicles

The vertical distance between two significant adjacent horizontal edges represents the visual length of the vehicle's roof, rear window, or trunk. The visual length of the roof, rear window, and trunk of the vehicle can be computed via Eq. (6) or (7). Fig. 4 illustrates the length definitions of the roof, rear window, and trunk after the processing of horizontal edge detection and quantization.

From the imaging geometry and pitch angle of the CCD camera, we can be certain that the hood of a vehicle is covered by its roof in the image plane. The first line A can thus be considered as belonging to the roof edge of the vehicle. Therefore, the visual length of the roof Dr can be defined as the visual length from line A to line B. The visual length of the vehicle Dl is defined as the visual length from line A to line E, and the visual width of the vehicle Dw is defined as the visual width computed from the average width of the object in the bounding box. Table 2 gives the average visual lengths of the roofs of different types of small vehicles (including 29 brand names) from a test sequence.

Table 2. Measurements of the roofs of different vehicle types.

Models        Average visual length of the roof (m)
Sedan         1.1950
SUV           1.7298
Van           2.3086
Mini truck    0.8

Because different vehicle types have different contours, the visual length and visual width parameters are important features in vehicle recognition. Specifically, we can categorize small vehicles into mini trucks, vans, utility vehicles, and sedans using two visual lengths, Dl and Dr, and a single visual width Dw. The classification rule is given in Eq. (12):

\text{Mini Truck}: (6.5\,\text{m} < Dl \le 7.5\,\text{m}) \wedge (0.5\,\text{m} < Dr \le 1.0\,\text{m}) \wedge (2.2\,\text{m} < Dw \le 3.0\,\text{m})

\left.\begin{aligned} \text{Van}&: (6.5\,\text{m} < Dl \le 7.5\,\text{m}) \wedge (2.0\,\text{m} < Dr \le 2.5\,\text{m})\\ \text{SUV}&: (6.0\,\text{m} < Dl \le 6.5\,\text{m}) \wedge (1.5\,\text{m} < Dr \le 2.0\,\text{m})\\ \text{Sedan}&: (4.5\,\text{m} < Dl \le 6.0\,\text{m}) \wedge (1.0\,\text{m} < Dr \le 1.5\,\text{m}) \end{aligned}\right\} \wedge (1.4\,\text{m} < Dw \le 2.2\,\text{m})    (12)


The proposed recognition method uses a coarse-to-fine strategy. Vehicles are first categorized as large or small according to visual length and visual width. Then the small vehicles are classified as mini trucks, vans, utility vehicles, or sedans via the classification rule of Eq. (12). If the visual length and visual width of an object are too small (i.e., (Dl < 4.5 m) ∧ (Dw < 1.4 m)), the object is classified as noise, which may be a pedestrian, motorcycle, or bicycle. Because the proposed method cannot recognize such objects, they are bypassed and deleted. Finally, the remaining objects may be subject to occlusion and are therefore processed using the occlusive vehicle detection and segmentation algorithm described in the next section.
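The coarse-to-fine strategy can be summarized by a small sketch that hard-codes the thresholds quoted above and in Eq. (12); classify and the returned labels are illustrative, and the occlusion handling of the next section is not included.

```cpp
#include <cstdio>
#include <string>

// Coarse-to-fine classification from the visual length Dl, the roof length Dr,
// and the visual width Dw (all in metres), using the thresholds of section 3
// and Eq. (12).
std::string classify(double Dl, double Dr, double Dw) {
    if (Dl < 4.5 && Dw < 1.4) return "noise";             // pedestrian, motorcycle, bicycle
    if (Dl > 15.0 && Dl <= 17.0 && Dw > 3.0 && Dw <= 4.0) return "large vehicle";
    if (Dl > 6.5 && Dl <= 7.5 && Dr > 0.5 && Dr <= 1.0 && Dw > 2.2 && Dw <= 3.0)
        return "mini truck";
    if (Dw > 1.4 && Dw <= 2.2) {                          // shared width range of Eq. (12)
        if (Dl > 6.5 && Dl <= 7.5 && Dr > 2.0 && Dr <= 2.5) return "van";
        if (Dl > 6.0 && Dl <= 6.5 && Dr > 1.5 && Dr <= 2.0) return "SUV";
        if (Dl > 4.5 && Dl <= 6.0 && Dr > 1.0 && Dr <= 1.5) return "sedan";
    }
    return "unclassified (possible occlusion)";
}

int main() {
    // Values close to the sedan row of Table 4.
    std::printf("%s\n", classify(5.16, 1.23, 1.52).c_str());
    return 0;
}
```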

4. OCCLUSIVE VEHICLE DETECTION AND SEGMENTATION

Occlusion is a commonplace and rather difficult problem in traffic surveillance systems. If a bounding box contains occlusive vehicles, those vehicles will not be successfully recognized in the connected component object. This problem must therefore be effectively addressed. From the occlusive images, we can identify four types of occlusion in a connected component object.

Case 1: Horizontal occlusion: a vehicle merges with others to its right or left.
Case 2: Vertical occlusion: a vehicle merges with others behind or ahead of it.
Case 3: Right diagonal occlusion: a vehicle merges with others behind it and to the right.
Case 4: Left diagonal occlusion: a vehicle merges with others behind it and to the left.

Horizontal and vertical occlusion can be detected and segmented by the visual length and width of the bounding boxes. If the visual width of an object in a bounding box is less than 6 m, and the visual length of the object is between 7.5 and 15 m, the object may be classified as case 2, case 3, or case 4. The segmentation steps are as follows:

Step 1: In the bounding box, count the number of pixels in each row to obtain the horizontal projection histogram.
Step 2: Compute the average projection value Pavg of the horizontal projection histogram within 3 m (visual length) of the bottom of the bounding box.
Step 3: Calculate the visual width Wvis of the average projection value Pavg.
Step 4: In accordance with the values in Table 1, use the visual width Wvis to find the most appropriate vehicle type and its visual length Lvis. If this is the first time the bounding box has been segmented, then go to step 5; else continue the processing as follows.
If {(Lvis ≥ LRvis) ∧ ((Lvis − LRvis)/Lvis ≤ 0.5)}, identify the object of the remnant bounding box as belonging to a single vehicle and go to step 12.
Else if {(Lvis ≥ LRvis) ∧ ((Lvis − LRvis)/Lvis > 0.5)}, delete the remnant bounding box and go to step 12.
Else go to step 5.


Step 5: Check the horizontal projection values within the visual length Lvis. If there is a projection value larger than 1.5 × Pavg, mark the position of that projection value as the Sth row, classify the occlusion as case 3 or case 4, and go to step 6. Else classify the occlusion as case 2 and go to step 10.
Step 6: In the (S + 5)th row, count the blank pixels on the left and right sides of the bounding box that do not belong to the object. (The (S + 5)th row is used in place of the Sth row to avoid interference.)
Step 7: If the number of left-side blank pixels is greater than the number of right-side blank pixels, classify the occlusion as case 3 and go to step 8. Else classify the occlusion as case 4 and go to step 9.
Step 8: From the Sth row to the top row of the bounding box, count the horizontal projection histogram and compute the differences of the adjacent projection values. Denote by Lstop the first position where the difference is more than one third less than Pavg. Start the segmentation from the bottom of the bounding box and stop in accordance with the visual length Lstop. Then clear the right-side object from the Sth row to the Lstopth row. Set the number of left-side blank pixels as the width of the upper vehicle and the Sth row as the bottom of the new bounding box. Then go to step 11.
Step 9: From the Sth row to the top row of the bounding box, count the horizontal projection histogram and compute the differences of the adjacent projection values. Denote by Lstop the first position where the difference is more than one third less than the previous difference. Start the segmentation from the bottom of the bounding box and stop in accordance with the visual length Lstop. Then clear the left-side object from the Sth row to the Lstopth row. Set the number of right-side blank pixels as the width of the upper vehicle, and set the Sth row as the bottom of the new bounding box. Then go to step 11.
Step 10: Start the segmentation from the bottom of the bounding box and stop in accordance with the visual length Lvis.
Step 11: Denote the visual length of the remnant bounding box by LRvis. If the value of LRvis is larger than 3 m, go to step 1. Else go to step 12.

Step 12: Stop the segmentation processing.

In a similar manner, horizontal occlusion is detected and segmented according to the visual width of the bounding box. If the visual width of an object in a bounding box is greater than 5 m, and the visual length of the object is less than 8 m, the object can be classified as case 1. The segmentation steps are as follows:

Step 1: From the bottom row to the top row of the bounding box, count the number of pixels in each row to obtain the horizontal projection histogram.
Step 2: Compute the differences of adjacent projection values. Label as L1 and L2 the first pair of positions where the difference increases or decreases by more than one third of the difference of the previous pair. If L1 and L2 can be found, then go to step 3. Else go to step 9.
Step 3: Count the number of blank pixels in the top row, on the left and right sides. Denote the number of left-side blank pixels by BTL and the number of right-side blank pixels by BTR.

Step 4: Count the number of blank pixels in the bottom row, on the left side and right side. Denote the number of left-side blank pixels by BBL and the number of right-side blank pixels by BBR.

Step 5: If (BTL > BTR) and (BBL < BBR), the right vehicle is somewhat higher than the left vehicle. Segment the length of the right vehicle from L1 to the top of the bounding box, and obtain the width in accordance with the value of BBR. Segment the length of the left vehicle from the bottom of the bounding box to L2, and obtain the width in accordance with the value of BTL. Then go to step 10.

Step 6: If (BTL < BTR) and (BBL > BBR), the right vehicle is somewhat lower than the left vehicle. Segment the length of the right vehicle from the bottom of the bounding box to L2, and obtain the width in accordance with the value of BTR. Segment the length of the left vehicle from L1 to the top of the bounding box, and obtain the width in accordance with the value of BBL. Then go to step 10.

Step 7: If (BTL < BTR) and (BBL < BBR), the right vehicle is smaller than the left vehicle and is located between the top and bottom of the left vehicle. Segment the length of the right vehicle from L1 to L2, and obtain the width in accordance with the value of BTR. Segment the length of the left vehicle from the top of the bounding box to the bottom of the bounding box, and subtract BTR from the width of the bounding box to obtain the width. Then go to step 10.

Step 8: If (BTL > BTR) and (BBL > BBR), the left vehicle is smaller than the right vehicle, and is located between the top and bottom of the right vehicle. Segment the length of the left vehicle from L1 to L2, and obtain the width in accordance with the value of BTL. Segment the length of the right vehicle from the top of the bounding box to the bottom of the bounding box, and subtract BTL from the width of the bounding box to obtain the width. Then go to step 10.

Step 9: Segment a horizontal occlusive vehicle according to the midpoint of the width of the bounding box.

Step 10: Stop the segmentation processing.

Output images from segmentation processing in the case of right diagonal occlusion are displayed in Fig. 5. In Fig. 5 (b), we used Table 1 to find the visual length of the right vehicle (the small bus), in accordance with its visual width. The segmentation point on the bottom of the left vehicle can be detected from the horizontal projection histogram. The segmented results are shown in Fig. 5 (e) with a different bounding box. In Fig. 5 (f), the segmented results are redrawn with different colored blocks.

[Fig. 5 panels: (a) Input image. (b) Foreground image. (c) Occlusion image. (d) The labeling image after moving object segmentation. (e) The labeling image after occlusion segmentation. (f) Segmented vehicles.]
Fig. 5. The output images of occlusive segmentation processing.

As soon as each vehicle in the occlusive bounding box has been detected and segmented, the procedure is complete. The segmented objects will then be recognized by the proposed vehicle recognition method and tracked by the method discussed in the next section.
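As a small sketch of how these dimensional criteria route a bounding box to the two procedures above, the following pre-classification uses the visual width and visual length thresholds quoted in this section; the enum and function names are illustrative, and the step-by-step segmentation itself is not reproduced.

```cpp
#include <cstdio>

// Route a connected-component bounding box, described by its visual width and
// visual length in metres, to the occlusion handling described in section 4.
enum class OcclusionHint {
    None,                 // dimensions already consistent with a single vehicle
    Horizontal,           // case 1: vehicles merged side by side
    VerticalOrDiagonal    // cases 2-4: vehicles merged front to back or diagonally
};

OcclusionHint occlusionHint(double width, double length) {
    if (width > 5.0 && length < 8.0)
        return OcclusionHint::Horizontal;          // handled by the second procedure
    if (width < 6.0 && length > 7.5 && length < 15.0)
        return OcclusionHint::VerticalOrDiagonal;  // handled by the first procedure
    return OcclusionHint::None;
}

int main() {
    // A 2.2 m wide, 11 m long blob: most likely two vehicles merged front to back.
    std::printf("%d\n", static_cast<int>(occlusionHint(2.2, 11.0)));
    return 0;
}
```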

5. TRACKING METHOD

After the segmentation and recognition procedures, the upper left corner of the connected component of a vehicle is defined as a reference point. The tracking method uses three parameters to ensure that the next reference point is on the vehicle being tracked. The three parameters are introduced as follows:

1. The distance between a predictive point and a reference point:

In a sequence of frames, the distance Dis between a predictive point and a reference point in the nth frame is defined in Eq. (13):

Dis = \sqrt{(x_n' - x_n)^2 + (y_n' - y_n)^2},    (13)

where (xn′, yn′) and (xn, yn) are the respective coordinates of the predictive point and the reference point in the nth frame. The predictive point is predicted from the reference point of the (n − 1)th frame, according to the motion of the vehicle. We use the predictive point to predict the coordinates of the reference point in the nth frame. Therefore, when the reference point of the nth frame is at minimum distance from the predictive point, the reference point could be the matching point.

2. The color of the vehicle: The color of a vehicle is defined by the average R, G, and B intensities in a small window near the reference point, in the bounding box. These are denoted by Ravg, Gavg, and Bavg and are computed from Eq. (14). We denote the width and the height of the small window by p and q and the color components in the region by R(x, y), G(x, y), and B(x, y), respectively:

R_{avg} = \frac{\sum_{x=0}^{p}\sum_{y=0}^{q} R(x, y)}{pq}, \quad G_{avg} = \frac{\sum_{x=0}^{p}\sum_{y=0}^{q} G(x, y)}{pq}, \quad B_{avg} = \frac{\sum_{x=0}^{p}\sum_{y=0}^{q} B(x, y)}{pq}.    (14)


3. The visual length of the roof: The visual length of the roof is computed by the recognition method.
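To make the second parameter concrete, the following sketch implements the window averaging of Eq. (14), assuming the frame is stored as an interleaved 8-bit RGB buffer; the Color struct and averageColor are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Color { double r, g, b; };

// Average R, G, B over a p x q window whose top-left corner is (x0, y0),
// per Eq. (14). The frame is assumed to be an interleaved 8-bit RGB buffer
// with the given width in pixels.
Color averageColor(const std::vector<std::uint8_t>& rgb, int width,
                   int x0, int y0, int p, int q) {
    double r = 0.0, g = 0.0, b = 0.0;
    for (int y = y0; y < y0 + q; ++y) {
        for (int x = x0; x < x0 + p; ++x) {
            const std::size_t i = 3 * (static_cast<std::size_t>(y) * width + x);
            r += rgb[i]; g += rgb[i + 1]; b += rgb[i + 2];
        }
    }
    const double n = static_cast<double>(p) * q;
    return {r / n, g / n, b / n};
}

int main() {
    // 4x4 toy frame filled with a single colour (R=200, G=150, B=100).
    std::vector<std::uint8_t> frame;
    for (int i = 0; i < 4 * 4; ++i) { frame.push_back(200); frame.push_back(150); frame.push_back(100); }
    const Color c = averageColor(frame, 4, 1, 1, 2, 2);
    std::printf("Ravg=%.0f Gavg=%.0f Bavg=%.0f\n", c.r, c.g, c.b);
    return 0;
}
```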

First, find the reference point having minimum distance Dis. Next, verify the color and visual length of the roof in the bounding box containing the reference point. If the color and visual length of the roof match the features of the vehicle in the previous frame, the vehicle is successfully tracked in the current frame. Otherwise, the reference point having the next smallest distance Dis is checked, and this process is continued until the best match is found.

The predictive point is predicted in terms of the displacement computed from the velocity and acceleration of the reference point. The coordinates (xn′, yn′) of the predictive point in the nth frame are given in Eq. (15). The coordinates (xn−1, yn−1) represent the reference point of the same vehicle in the (n − 1)th frame. The parameters ΔSn(x) and ΔSn(y) are the displacements of the x and y coordinates, and are calculated using Eq. (16):

(x_n', y_n') = (x_{n-1}, y_{n-1}) + (\Delta S_n(x), \Delta S_n(y))    (15)

\Delta S_n(A) = v_{n-1}(A) \cdot (t_n - t_{n-1}) + \tfrac{1}{2}\, a_{n-1}(A) \cdot (t_n - t_{n-1})^2,    (16)

where vn−1(A) = (An−1 − An−2)/(tn−1 − tn−2), an−1(A) = (vn−1(A) − vn−2(A))/(tn−1 − tn−2), and A = x or y. vn−1(A) and vn−2(A) are the velocities of the reference point in the (n − 1)th and (n − 2)th frames, an−1(A) is the acceleration of the reference point in the (n − 1)th frame, and tn, tn−1, tn−2 are the capture times of the nth, (n − 1)th, and (n − 2)th frames, respectively. The tracking system utilizes reference point displacement and the time interval between sequential images to track the vehicles. This method is therefore well suited to capture systems with variable capture speed.
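The prediction of Eqs. (15)-(16) and the matching distance of Eq. (13) can be sketched as follows; History, predict, and matchingDistance are illustrative names, and the reference-point history of the last three frames, with their capture times, is assumed to be available.

```cpp
#include <cmath>
#include <cstdio>

struct Point { double x, y; };

// Reference points of one tracked vehicle in frames n-3, n-2 and n-1, with
// their capture times in seconds; three samples are enough to estimate the
// velocity and acceleration used in Eq. (16).
struct History {
    Point p[3];      // p[0]: frame n-3, p[1]: frame n-2, p[2]: frame n-1
    double t[3];
};

// Displacement of one coordinate, Eq. (16):
// dS = v_{n-1} * (tn - t_{n-1}) + 0.5 * a_{n-1} * (tn - t_{n-1})^2
static double displacement(double a0, double a1, double a2,
                           double t0, double t1, double t2, double tn) {
    const double v1 = (a2 - a1) / (t2 - t1);   // velocity in frame n-1
    const double v0 = (a1 - a0) / (t1 - t0);   // velocity in frame n-2
    const double acc = (v1 - v0) / (t2 - t1);  // acceleration in frame n-1
    const double dt = tn - t2;
    return v1 * dt + 0.5 * acc * dt * dt;
}

// Predictive point in frame n (capture time tn), Eq. (15).
Point predict(const History& h, double tn) {
    return { h.p[2].x + displacement(h.p[0].x, h.p[1].x, h.p[2].x, h.t[0], h.t[1], h.t[2], tn),
             h.p[2].y + displacement(h.p[0].y, h.p[1].y, h.p[2].y, h.t[0], h.t[1], h.t[2], tn) };
}

// Matching distance between the predictive point and a candidate reference
// point in frame n, Eq. (13).
double matchingDistance(const Point& predicted, const Point& candidate) {
    return std::hypot(predicted.x - candidate.x, predicted.y - candidate.y);
}

int main() {
    // A vehicle moving roughly 4 pixels per frame at 30 fps (illustrative values).
    const History h{{{100.0, 200.0}, {104.0, 196.0}, {108.0, 192.0}}, {0.000, 0.033, 0.066}};
    const Point pred = predict(h, 0.099);
    std::printf("predicted (%.1f, %.1f), distance to (113, 187) = %.2f\n",
                pred.x, pred.y, matchingDistance(pred, {113.0, 187.0}));
    return 0;
}
```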

6. EXPERIMENTAL RESULTS

In this study, two of the proposed systems were installed in Hsinchu, Taiwan. One was set up on the expressway (N 24°49′34″, E 120°58′13″, near the Wu-Ling interchange of the No. 68 expressway), as shown in Fig. 6. The other was set up on Zhong-Hua road (N 24°45′54″, E 120°54′50″). The stationary CCD cameras captured color images at 30 fps with an image size of 320 by 240 pixels. The average processing time for each color frame was approximately 28 to 32 msec, so the frame-processing rate could exceed the 30 fps of NTSC video. Thus the proposed system can be considered a real-time application. In our experiments, the system was linked to a personal computer with an Intel Pentium IV 2.8 GHz CPU and 1 GB of RAM, using the Microsoft Visual C++ development environment.

To evaluate the performance of the system, we used test video data collected under various weather and lighting conditions, over a period of more than seven days. Fig. 7 displays the vehicle detection results of the proposed system, when applied to a test video taken over a continuous 24-hour period (from 12:00 24/01/07 to 12:00 25/01/07) at the Wu-Ling interchange, under various lighting conditions. According to the Central Weather Bureau in Taiwan, sunrise and sunset occurred at 06:41 and 17:27, respectively.


Fig. 6. The setup of the proposed system.

[Fig. 7 consists of two hourly charts recorded at the Wu-Ling interchange of the No. 68 expressway in Hsinchu, Taiwan: one for 12:00~00:00 on 24/01/2007 and one for 00:00~12:00 on 25/01/2007. For each hour they plot the vehicle counts obtained by the system and by a human observer (left axis) together with the correct rate (right axis, 50%-100%).]
Fig. 7. Vehicle detection results for a continuous 24-hour period.

Fig. 8 shows the vehicle detection results of the system under various weather conditions. Examples of detection and segmentation under various weather conditions are shown in Fig. 9.

Once the vehicles were segmented from the various backgrounds, they could be continuously detected and counted by the recognition and tracking method under all types of weather conditions. Using the proposed system, the average vehicle detection rate was higher than 92%.


[Fig. 8 plots the vision-based recognition rate (70%-100% scale) under seven conditions: sunrise, cloudy, rainy at day, noon, sunset, fog, and rainy at night.]
Fig. 8. The vehicle detection results for various weather conditions.

[Fig. 9 shows detection and segmentation image sequences (frames N, N+2, ...) under daytime, rainy-day, afternoon, cloudy-day, nighttime, rainy-night, and foggy-night conditions.]
Fig. 9. Detection and segmentation images under various weather conditions.

The vehicle recognition rate at night and under low-visibility weather conditions was lower than under better conditions (sunrise, cloudy, rainy, and noon), because the sudden appearance of a light source (such as headlights or rear lights) tended to obscure the outline of a vehicle, as well as the foreground, and reduced the amount of edge information available. However, there was still enough information to detect a vehicle by the reflection of its rear lights.

Table 3 lists vehicle recognition rates in heavy traffic (0700 ~ 0800 on 25/01/2007). The overall accuracy was 98%. The small percentage of errors resulted from occlusions that were not accounted for in the proposed system.

In Fig. 10, the velocity estimation of the system is compared with that of a police laser gun (Stalker LIDAR). The velocity estimation error was no greater than ±5 km/hr when the system was applied to 30 vehicles. In Table 4, three continuous frames are used to demonstrate the distinctness of the visual lengths and visual widths of different vehicle types. As Table 4 indicates, the roofs, lengths, and widths of different types of vehicles varied significantly.


Table 3. Results for different vehicle types under heavy traffic conditions (small vehicles: vans, utility vehicles, sedans, and mini trucks).

                               Large vehicles  Vans   Utility vehicles  Sedans  Mini trucks  Total
Count by the proposed system   27              93     102               513     99           834
Count by human observer        30              100    111               524     87           852
Accuracy                       90%             93%    92%               98%     86%          98%

Date: 0700~0800 24/01/2007. Location: the Wu-Ling interchange of the No. 68 expressway in Hsinchu, Taiwan.

Table 4. The visual lengths, widths, and roofs of different vehicle types in continuous frames (H = 6 m, θ = 14°, f = 8 mm, Rp = 0.025 mm/pixel).

Frame number                      N + 1     N + 3     N + 5
Van              Length (m)       6.4234    6.7785    7.3152
                 Width (m)        1.9934    1.9927    1.8588
                 Roof (m)         2.1045    2.3587    2.4627
Utility vehicle  Length (m)       5.827     6.2543    6.7565
                 Width (m)        1.9274    1.6725    1.5492
                 Roof (m)         1.6847    1.7360    1.7688
Sedan            Length (m)       4.6690    5.1598    5.5563
                 Width (m)        1.7060    1.5218    1.4347
                 Roof (m)         1.1020    1.2320    1.5512
Mini truck       Length (m)       6.5769    6.7249    7.1560
                 Width (m)        2.2766    2.4173    2.5552

[Fig. 10 plots the estimated velocity (km/hr) of 30 sample vehicles, recorded 2006.11.08~2006.11.25 at the Wu-Ling interchange of the No. 68 expressway in Hsinchu, Taiwan:
Vision-based:  80 87 81 72 84 95 96 96 80 74 95 77 90 86 82 78 96 86 77 85 66 75 83 77 69 81 79 78 70 75
Stalker LIDAR: 82 87 77 72 85 99 97 98 83 77 99 77 95 89 83 78 99 87 81 89 66 79 83 81 73 81 84 81 74 79]
Fig. 10. Results of the velocity estimations.

Although a vehicle's pixel dimensions changed according to its position in the frame, the visual length and visual width used fixed threshold values to distinguish different vehicle types, regardless of location. The vehicle recognition and tracking system is shown in Fig. 11. The output screen can display traffic parameters for each lane and the total number of each vehicle type (sedan, utility vehicle, van, mini truck, and large vehicle).


Fig. 11. The proposed system.

7. CONCLUSIONS

In this paper, a real-time vision-based system is proposed for detecting and recognizing vehicles from image sequences. The system uses a moving object detection method to detect moving vehicles, and includes a technique for detecting and segmenting occlusive vehicles, as well as recognition and tracking methods. Even when images contain multiple occlusive vehicles that are continuously merging, our system is able to accurately segment the vehicles. In an image frame, the width and length of a vehicle are measured in pixels. However, pixel width and length vary according to the coordinates in the frame. Hence visual measurements are provided to convert the vehicle dimensions from pixels to meters. Furthermore, visual width and visual length are used to solve the occlusion problem. This system overcomes various issues raised by the complexity of vehicle outlines. Experimental results obtained from different highway images show that the proposed system can successfully detect and recognize various vehicle types. Thus far, the system only uses the parameters of visual length, visual width, and roof size for vehicle recognition. In future work, more vehicle outline features and additional occlusive cases will be considered to increase the recognition rate.

REFERENCES

1. J. B. Kim, C. W. Lee, K. M. Lee, T. S. Yun, and H. J. Kim, “Wavelet-based vehicle tracking for automatic traffic surveillance,” in Proceedings of IEEE International Conference on Electrical and Electronic Technology, Vol. 1, 2001, pp. 313-316.

2. G. L. Foresti, V. Murino, and C. Regazzoni, "Vehicle recognition and tracking from road image sequences," IEEE Transactions on Vehicular Technology, Vol. 48, 1999, pp. 301-318.

3. Y. K. Jung, K. W. Lee, and Y. S. Ho, "Content-based event retrieval using semantic scene interpretation for automated traffic surveillance," IEEE Transactions on Intelligent Transportation Systems, Vol. 2, 2001, pp. 151-163.

4. Z. He, J. Liu, and P. Li, “New method of background update for video-based vehicle detection,” in Proceedings of IEEE Conference on Intelligent Transportation Systems, 2004, pp. 580-584.

5. C. Stauffer and W. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, 2000, pp. 747-757.

6. I. Haritaoglu, D. Harwood, and L. Davis, “W4: Real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, 2000, pp. 809-830.

7. C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, 1997, pp. 780-785.

8. L. Li and M. Leung, “Integrating intensity and texture differences for robust change detection,” IEEE Transactions on Image Processing, Vol. 11, 2002, pp. 105-112.

9. O. Javed, K. Shafique, and M. Shah, “A hierarchical approach to robust background subtraction using color and gradient information,” in Proceedings of IEEE Workshop on Motion Video Computing, 2002, pp. 22-27.

10. N. Paragios and V. Ramesh, "A MRF-based approach for real-time subway monitoring," in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, Vol. 1, 2001, pp. 1034-1040.

11. C. J. Chen, C. C. Chiu, B. F. Wu, S. P. Lin, and C. D. Huang, "The moving object segmentation approach to vehicle extraction," in Proceedings of IEEE International Conference on Networking, Sensing and Control, Vol. 1, 2004, pp. 19-23.

12. C. C. C. Pang, W. W. L. Lam, and N. H. C. Yung, "A novel method for resolving vehicle occlusion in a monocular traffic-image sequence," IEEE Transactions on Intelligent Transportation Systems, Vol. 5, 2004, pp. 129-141.

13. X. Limin, "Vehicle shape recovery and recognition using generic models," in Proceedings of the 4th World Congress on Intelligent Control and Automation, 2002, pp. 1055-1059.

14. W. Wei, Q. Zhang, and M. Wang, "A method of vehicle classification using models and neural networks," in Proceedings of IEEE Conference on Vehicular Technology, Vol. 4, 2001, pp. 3022-3026.

15. X. Song and R. Nevatia, "A model-based vehicle segmentation method for tracking," in Proceedings of IEEE International Conference on Computer Vision, Vol. 2, 2005, pp. 1124-1131.

16. W. Hu, X. Xiao, Z. Fu, D. Xie, T. Tan, and S. Maybank, "A system for learning statistical motion patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, 2006, pp. 1450-1464.

17. N. Ranganathan, R. Mehrotra, and S. Subramanian, "A high speed systolic architecture for labeling connected components in an image," IEEE Transactions on Systems, Man and Cybernetics, Vol. 25, 1995, pp. 415-423.

18. J. Canny, "Finding edges and lines in images," Technical Report AITR-720, M.I.T. Artificial Intelligence Laboratory, 1983.

19. J. R. Parker, Algorithms for Image Processing and Computer Vision, Wiley, New York, 1997, pp. 19-53.

Chung-Cheng Chiu (瞿忠正) received the B.S. and M.S. degrees in Electrical Engineering from the Chung Cheng Institute of Technology, Taiwan, in 1990 and 1993, respectively, and the Ph.D. degree from the Department of Electrical and Control Engineering at National Chiao Tung University, Hsinchu, Taiwan, in 2004. Since 1993, he has been a lecturer in the Department of Electrical Engineering at the Chung Cheng Institute of Technology, and in 2005 he became an Associate Professor. In 2003, he received the Dragon Golden Paper Award sponsored by the Acer Foundation and the Silver Award of the Technology Innovation Competition sponsored by AdvanTech. His research interests include image processing, image compression, document segmentation, computer vision, and applications in intelligent transportation systems.

Min-Yu Ku (古閔宇) received the M.S. degree in Naval Architecture and Marine Engineering from the Chung Cheng Institute of Technology, National Defense University, Taoyuan, Taiwan, in 2003. He is currently working towards the Ph.D. degree in the Department of Electrical Engineering. His research interests include image and video processing, pattern recognition, and applications in intelligent transportation systems.

Chun-Yi Wang (王駿逸) received the M.S. degree in Electrical Engineering from the Chung Cheng Institute of Technology, National Defense University, Taoyuan, Taiwan, in 2006. He is pursuing a career as a large-scale system programming engineer. He is interested in image processing and applications in intelligent transportation systems, especially object segmentation and tracking.