MSc Project Report HSM 1805
Hantao LIU
Distance Determination from Pairs of Images
from Low Cost Cameras
August 2005
The University of Edinburgh
School of Engineering and Electronics
MSc in Signal Processing and Communication
The King’s Buildings
Edinburgh EH9 3JL
MSc Project Mission Statement
Student: Liu Hantao
Supervisor: Dr John Hannah
Project Title: “Distance Determination from pairs of images from low cost cameras”
Project Definition:
This project will involve working with pairs of images from stereo cameras and using
these to determine depth information. Initially the work would be based on using
existing image pairs. The project may extend to using and developing software for
stereo camera hardware we have recently acquired.
Preparatory Tasks:
Read BEng project report
Revision/learning of C or C++ syntax
Look at IceCam stereo vision system at
http://www.icerobotics.co.uk/technology.html
Read literature on depth estimation from stereo
Main Tasks:
Develop a suitable algorithm for depth estimation
Produce C or C++ code implementation
Test software with example images
Acquire new images using IceCam stereo camera hardware
Write project report
Scope for Extension:
Develop ‘real-time’ operation with IceCam system
Background Knowledge:
C(C++) Programming
Signal Processing
Image Processing
Resources:
Sample image pairs
‘Vision’ image processing library
Unix computer access (TLC)
Linux PC for hardware experiments
IceCam stereo vision hardware & software
Location:
TLC
Vision Lab for experimental work
Reference:
Umesh R. Dhond and J. K. Aggarwal. “Structure from stereo - a review.” IEEE
Transactions on Systems, Man, and Cybernetics, 19(6): 1489-1510, 1989.
The supervisor and student are satisfied that this project is suitable for performance
and assessment in accordance with the guidelines of the course documentation.
Signed
Liu Hantao … … … … … … … … … … … … … .
Dr J M Hannah … … … … … … … … … … … … … .
Date: … … … … … … … … … … … … … .
Abstract
This thesis presents an implementation of a stereo vision system, using image
processing techniques to determine depth information from pairs of stereo images.
This system, which aims to aid vehicle reversal, involves working with stereo images
of different vehicles, and the distance between the stereo cameras and the target
vehicle is determined. For a pair of stereo images, the system algorithm extracts
feature points from each image, and then a block matching technique is used to find
the corresponding points between the two images and calculate the displacement. The
experimental results are presented in this thesis as well. An introduction to stereo
vision and some widely used image processing techniques is given. Additionally a
discussion on the improvement and modification of the system is made. Conclusions
are presented regarding the success of the proposed system algorithms and the
possible future research work.
Declaration of Originality
I declare that this thesis is my original work except where stated.
Hantao Liu
August 2005
Table of Contents
MSC PROJECT MISSION STATEMENT
ABSTRACT
DECLARATION OF ORIGINALITY
TABLE OF CONTENTS
ABBREVIATIONS
CHAPTER 1. INTRODUCTION
1.1. AIMS AND OBJECTIVES
1.2. IMPLEMENTATION
1.2.1. Preprocessing
1.2.2. Establishment of Correspondence
1.2.3. Depth Estimation
1.3. THESIS PLAN
CHAPTER 2. BACKGROUND AND THEORY
2.1. STEREO VISION
2.1.1. Extraction of Feature Characteristics
2.1.2. Stereo Correspondence Problem
2.1.3. Depth Information
2.2. IMAGE PROCESSING TECHNIQUES
2.2.1. Smoothing Filters
2.2.2. Thresholding
2.2.3. Edge Detection
2.2.4. Corner Detection
2.3. SOFTWARE DEVELOPMENT
CHAPTER 3. DESIGN AND EXPERIMENTS
3.1. EDGE DETECTION BASED ALGORITHM
3.1.1. Gaussian Smoothing
3.1.2. Vertical Edge Detection
3.1.3. Otsu’s Thresholding
3.1.4. Morphological Operation
3.1.5. Hough Transform
3.1.6. Feature Points Extraction
3.1.7. Summary
3.2. CORNER DETECTION BASED ALGORITHM
3.2.1. System Overview
3.2.2. Corner Detection
3.2.3. Scale Operation
3.2.4. Thresholding
3.2.5. Non-maximal Suppression
3.2.6. Feature Points Extraction
3.2.7. Summary
3.3. STEREO MATCHING
3.3.1. Block Matching Algorithm
3.3.2. Experiments
3.3.3. Summary
3.4. DISTANCE DETERMINATION
CHAPTER 4. RESULTS
4.1. EDGE DETECTION BASED ALGORITHM
4.1.1. Extraction of Feature Points
4.1.2. Block Matching
4.1.3. Matching Results of Initial Database
4.1.4. Matching Results of Additional Database
4.2. CORNER DETECTION BASED ALGORITHM
4.2.1. Extraction of Feature Points
4.2.2. Block Matching
4.2.3. Matching Results of Initial Database
4.2.4. Matching Results of Additional Database
CHAPTER 5. DISCUSSION
5.1. EXTRACTION OF EDGE-BASED FEATURE POINTS
5.2. EXTRACTION OF CORNER-BASED FEATURE POINTS
5.3. MATCHING ALGORITHM
5.4. ADDITIONAL PROBLEMS
CHAPTER 6. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
APPENDIX 1. INITIAL PROJECT IMAGES
A.1.1. FRONT
A.1.2. REAR
APPENDIX 2. ADDITIONAL PROJECT IMAGES
A.2.1. STRAIGHT CAR
A.2.2. ANGLED CAR
A.2.3. TAXI
A.2.4. FOREIGN VAN
A.2.5. LANDROVER
A.2.6. WHITE CAR
Abbreviations
1-D One-Dimensional
2-D Two-Dimensional
3-D Three-Dimensional
LOG Laplacian of Gaussian
HT Hough Transform
MAD Mean Absolute Difference
MSD Mean Squared Distance
NCC Normalized Cross-Correlation
PGM Portable Grey Map
IEEE Institute of Electrical and Electronic Engineers
Chapter 1. Introduction
1.1. Aims and Objectives
Many image processing applications involve detecting a target object and estimating
some meaningful parameters of that object, such as velocity or distance. The aim of
this project is to work with pairs of images from stereo cameras and use these to
determine depth information. In a stereo vision system, two cameras are separated by a
fixed horizontal distance, and correspondence is established between the two stereo
images. Given the camera focal length and the imaging geometry, the depth
information can then be determined. The main work of this project involves the
development of a suitable algorithm to estimate stereo depth.
In this project, the stereo image pairs are sourced from fixed cameras located at a
known distance apart, and this stereo rig is mounted on a vehicle to aid reversal. By
evaluation of all available information from stereo, corresponding points on the two
images will then be found, and the distance between the reversing vehicle and a
stationary vehicle at the rear of it can be determined.
1.2. Implementation
To implement the system, the stereo image pairs in Appendix 1 and Appendix 2 are used.
Appendix 1 contains front and rear images of a car. Appendix 2 consists of
images of different vehicles, and of a vehicle in different orientations such as
straight or angled. These images are used to test the robustness of the system.
The implementation of this system can be divided into three major steps:
preprocessing, establishing correspondence, and estimating depth. This is shown in
Figure 1.1. In this section we briefly describe each of them.
Figure 1.1 Flow Diagram of System Implementation
1.2.1. Preprocessing
Preprocessing of images is an important step for stereo computation. In this stage,
feature characteristics are extracted from each image and are used extensively in
the subsequent matching process; these feature characteristics therefore have to be
chosen carefully. In this project, a fixed number of feature points is obtained in each
image as the matching primitives. Two algorithms are proposed to extract
feature points in this project: one is based on edge detection combined with the Hough
transform (HT), and the other is based on corner detection.
1.2.2. Establishment of Correspondence
Matching is perhaps the most important stage in stereo image processing. Once feature
points have been extracted from the two stereo images, correspondence needs to be
established among these homologous feature characteristics; that is, we have to find the
feature points that are projections of the same physical identity in each image. In this
project, block matching is applied to find the corresponding points in stereo vision.
1.2.3. Depth Estimation
In this stage, corresponding points on the two stereo images are used to calculate the
disparity – the separation between matched pixels. The depth information is then
determined by the consideration of imaging geometry and camera focal length.
1.3. Thesis Plan
The goal of this thesis is to design a suitable algorithm for depth estimation and
produce software for implementation.
This thesis will describe the techniques used and algorithms proposed in the project,
and the results achieved by the system will be analyzed and compared. Finally, some
ideas are put forward for future research.
Chapter two briefly describes the background of this project and some basic theories
that are used for stereo image processing.
Chapter three contains the system design and experiments. As two algorithms are
proposed in this project for feature characteristics extraction, this chapter is divided
into two sections for edge detection based approach and corner detection based
approach respectively.
Chapter four shows some experimental results, analysis and comparisons.
Chapter five includes a discussion on the problems encountered in this project, and
Chapter six gives conclusions and proposes possible future work.
Chapter 2. Background and Theory
2.1. Stereo Vision
Analysis of video images in stereo has emerged as an important passive method for
extracting the three-dimensional (3-D) structure of a scene [4]. A simplified stereo
imaging system is shown in Figure 2.1.
Figure 2.1 A Simplified Stereo Imaging System [17]
The two cameras have their optical axes parallel and are separated by a distance d [17].
The line connecting the camera lens centers is called the baseline [17].
The focal length of both cameras is f [17].
Let the origin O of this system be mid-way between the lens centers [17].
Let the x axis of the 3-D world coordinate system be parallel to the baseline [17].
Consider a point (x, y, z) in 3-D world coordinates on an object [17].
Let the point (x, y, z) have image coordinates $(x'_l, y'_l)$ and $(x'_r, y'_r)$ in the left
and right image planes of the respective cameras [17].
The goal of stereo vision research is to estimate depth information from a pair of
stereo images. With two cameras separated by a fixed distance, each camera receives
a slightly different image of the same scene in the real world. If we can successfully
determine which feature characteristics in the image from the left camera correspond
with which in the image from the right camera, and if we know the stereo imaging
geometry and camera focal length, it is possible to reconstruct the depth information.
Generally, the major stages involved in the stereo vision are preprocessing of images
to obtain matching features, recovering the disparity between the images by a suitable
stereo algorithm, and using geometry to recover the stereo depth.
2.1.1. Extraction of Feature Characteristics
Extraction of feature characteristics from an image for subsequent matching process is
an important step in stereo vision. In this stage, we have to decide carefully which
kind of features should be chosen as the matching primitives, as this choice has a large
influence on the results of stereo matching. The feature characteristics can be
classified into two categories: area-based and feature-based.
The area-based matching primitives are used in some of the early stereo algorithms.
Area patches from two images are matched to establish correspondence.
The feature-based matching schemes, which match features directly, have been
increasingly used in practice. Since physical discontinuities in a scene are mapped to
intensity changes in an image, edges are widely used as the matching primitives.
2.1.2. Stereo Correspondence Problem
The stereo correspondence problem, which consists of finding corresponding points
between two images, is the crucial and most difficult stage in stereo vision. The main
task is to compute the accurate disparity between the left and right images. In the past decades,
a large number of stereo matching algorithms have been proposed, and these
strategies are different according to the matching primitives as well as the stereo
imaging geometry. In terms of the matching primitives, the area-based matching and
the feature-based matching are commonly used. There are also different imaging
geometries, such as parallel-axis and nonparallel-axis, binocular and multi-ocular [4].
Area-based stereo techniques use correlation among brightness (intensity) patterns in
the local neighborhood of a pixel in one image with brightness patterns in a
corresponding neighborhood of a pixel in the other image [4]. This is a simple
matching method, but it is sensitive to changes of overall illumination or perspective.
Also, the selection of interest points and of the similarity measure has a large influence
on the accuracy of the resulting depth information.
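As an illustration of area-based matching, the following is a minimal C++ sketch that compares square neighbourhoods using the mean absolute difference (MAD, listed in the Abbreviations) in place of a correlation measure. It is not the project's implementation: the `Image` struct, the function names, the block radius `r` and the search range `maxDisparity` are all hypothetical, and the search assumes a parallel-axis geometry in which the match lies on the same row, shifted to the left in the right image.

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

// Minimal grey-level image with row-major storage (hypothetical helper,
// not the VS_frame class used in the project).
struct Image {
    int width;
    int height;
    std::vector<int> pixels;
    int at(int x, int y) const { return pixels[y * width + x]; }
};

// Mean absolute difference between two (2r+1) x (2r+1) blocks centred at
// (ax, ay) in image a and (bx, by) in image b. Both blocks must lie
// entirely inside their images.
double blockMad(const Image& a, int ax, int ay,
                const Image& b, int bx, int by, int r) {
    long sum = 0;
    for (int j = -r; j <= r; ++j) {
        for (int i = -r; i <= r; ++i) {
            sum += std::abs(a.at(ax + i, ay + j) - b.at(bx + i, by + j));
        }
    }
    const int n = (2 * r + 1) * (2 * r + 1);
    return static_cast<double>(sum) / n;
}

// Search the same row of the right image for the block that best matches
// the left block centred at (x, y). Returns the disparity in pixels
// (left x minus right x) with the lowest MAD in [0, maxDisparity].
int bestDisparity(const Image& left, const Image& right,
                  int x, int y, int r, int maxDisparity) {
    int best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (int d = 0; d <= maxDisparity; ++d) {
        const int xr = x - d;
        if (xr - r < 0) {
            break;  // candidate block would leave the right image
        }
        const double cost = blockMad(left, x, y, right, xr, y, r);
        if (cost < bestCost) {
            bestCost = cost;
            best = d;
        }
    }
    return best;
}
```

Because the cost is computed directly on intensities, a sketch like this inherits the sensitivity to illumination and perspective changes described above.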
Feature-based stereo techniques use symbolic features derived from intensity images
rather than image intensities themselves [4]. The advantage of this matching approach
is that it is more stable to changes in contrast and illumination, because the
feature-based techniques do not use intensity values directly. In practice, edge points
or edge segments are commonly used as the features.
Stereo matching paradigms are also characterized by the particular imaging geometry
being used [4]. The conventional stereo imaging geometry contains two cameras with
their optical axes mutually parallel, and the factors that could be changed include, but
are not limited to, the mutual orientation of the optical axes of the cameras and the
number of cameras used [4].
2.1.3. Depth Information
The term depth and the term disparity are frequently used in the literature of stereo
vision. Although they are interchangeable in many cases, there is a subtle difference
in meaning [6]. When a point in the left image and another point in the right image are
matched, that is, they are considered to be projections of the same physical identity in
the 3-D world, the difference in their relative positions is recorded as the disparity.
Equation 2.1 shows the relationship between disparity (in pixels) and depth.
$$\text{Depth} = \frac{\text{Baseline} \times \text{Focal Length}}{\text{Disparity}}$$ [6] (2.1)
Equation 2.1 can be obtained from the imaging geometry shown in Figure 2.1. By
considering similar triangles:
$$\frac{x'_l}{f} = \frac{x + d/2}{z}, \qquad \frac{x'_r}{f} = \frac{x - d/2}{z}, \qquad \frac{y'_l}{f} = \frac{y'_r}{f} = \frac{y}{z}$$ [17] (2.2)
Solving for (x, y, z) gives:
$$x = \frac{d\,(x'_l + x'_r)}{2\,(x'_l - x'_r)}, \qquad y = \frac{d\,(y'_l + y'_r)}{2\,(x'_l - x'_r)}, \qquad z = \frac{d\,f}{x'_l - x'_r}$$ [17] (2.3)
The quantity $x'_l - x'_r$ which appears in Equations 2.2 and 2.3 is called the disparity,
and the quantity z is called the depth.
2.2. Image Processing Techniques
There are many image processing techniques which have been widely used in stereo
vision to enhance and manipulate images. Some of the techniques used in this project
are described in this section.
2.2.1. Smoothing Filters
A smoothing filter aims to reduce noise in an image, and it is usually applied in a
preprocessing step to remove small details prior to object extraction. These filters are
also called averaging filters, because the output of a smoothing filter is the average of
the pixels contained in the filter mask.
The basic idea of smoothing is to simply replace the intensity value of every pixel in
an image by the average intensity value of all pixels within the defined filter mask,
and this process can reduce sharp changes in gray levels. Noise can be considered as
high-frequency information in an image, so smoothing is essentially a low-pass filter.
However, edges, which are desirable high-frequency elements, can be removed or
blurred by a smoothing filter. Therefore, the size and properties of the filter used in
practice have to be chosen carefully; that is, a trade-off should be
made between removing more unwanted information and retaining enough desired image features.
Figure 2.2 shows two 3×3 smoothing filters.
[Figure 2.2: two 3×3 smoothing masks, with scale factors 1/9 and 1/16.]
Figure 2.2 3×3 Smoothing Filters
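The averaging operation can be sketched in C++ as below. The mask coefficients of Figure 2.2 are not recoverable from the text, so this sketch assumes the two common choices that match the scale factors 1/9 and 1/16: a box mask of all ones and a centre-weighted mask. The `Image` struct, the function name and the mask constants are hypothetical, and border pixels are simply copied unchanged to keep the example short.

```cpp
#include <vector>

// Minimal grey-level image with row-major storage (hypothetical helper,
// not the VS_frame class used in the project).
struct Image {
    int width;
    int height;
    std::vector<int> pixels;
    int at(int x, int y) const { return pixels[y * width + x]; }
};

// Assumed masks: box average (weights sum to 9) and weighted average
// (weights sum to 16).
const int kBoxMask[3][3] = {{1, 1, 1}, {1, 1, 1}, {1, 1, 1}};
const int kWeightedMask[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};

// Replace each interior pixel by the weighted sum of its 3x3 neighbourhood
// divided by `norm` (integer division). Border pixels are left unchanged.
Image smooth3x3(const Image& in, const int mask[3][3], int norm) {
    Image out = in;
    for (int y = 1; y < in.height - 1; ++y) {
        for (int x = 1; x < in.width - 1; ++x) {
            int sum = 0;
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    sum += mask[j + 1][i + 1] * in.at(x + i, y + j);
                }
            }
            out.pixels[y * in.width + x] = sum / norm;
        }
    }
    return out;
}
```

A call such as `smooth3x3(img, kWeightedMask, 16)` gives more weight to the centre pixel and therefore blurs edges slightly less than the box mask.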
2.2.2. Thresholding
Image thresholding is one of the most commonly used techniques in image processing
applications due to its simplicity of implementation. This operation highlights pixels
which have particular intensity values, or intensity values within a specified range [3].
Choosing a suitable threshold level is the most difficult part in the thresholding
operation, and it also depends on the application requirements. In uniform
thresholding, pixels above the chosen brightness level are set to white, and those
below this level are set to black. Adaptive thresholding divides the original
image into subimages and then applies a different threshold to each subimage [1].
There are more advanced techniques which can select an optimal threshold level
automatically based on the image histogram, such as Otsu’s method [3].
2.2.3. Edge Detection
Since feature-based stereo algorithms have been increasingly applied in many systems,
edge detection has become one of the most commonly used stereo vision techniques. An
edge is a set of connected pixels that lie on the boundary between two regions [1].
Since the edge point is at the position of a step-change in gray level, or it is a
high-frequency element in frequency domain, edge detection highlights contrast and
is robust against brightness changes on an image. There are many excellent edge
detection operators, such as Prewitt, Sobel, Canny, and Marr-Hildreth. These
operators can be classified into two categories: first-order edge detection and
second-order edge detection.
The Sobel operator is a first-order edge detector and is among the most widely used in
practice. It gives a better performance than other contemporaneous
edge detection operators, and it has superior noise-suppression characteristics [1]. The
Sobel operator consists of a vertical template, Mx, and a horizontal template, My,
which are given in Figure 2.3 (a) and (b) respectively.
Figure 2.3 Sobel Templates
The Marr-Hildreth operator is one of the best-known second-order edge
detectors. Hildreth and Marr proposed that a Laplacian of Gaussian (LOG)
operator gives a near-optimal edge detection operator [6]. The basic idea of
the Marr-Hildreth operator is to combine Gaussian smoothing with second-order
differentiation, and then to detect edges via zero-crossings.
Edge detection detects intensity changes and acts as a high-pass filter in the frequency
domain; it therefore also responds to noise. In practical applications, a trade-off has to be
considered, because some edge operators may detect more edges but respond to noise,
while others may be noise-tolerant but remove significant edge information.
2.2.4. Corner Detection
Corners, which can be considered as junctions of edges, are another kind of low-level
feature, and these again can be extracted automatically from an image. Corners are points
of interest, and they are derived from edge information which defines the boundary of
different objects or different parts of the same object. A large number of corner
detection algorithms have been developed in the past decades. There are
three main trends for detection of corners in gray scale images: edge-relation methods,
topology methods, and autocorrelation methods [18].
2.3. Software Development
This project is implemented on a UNIX system using the C++ programming language
and the CMACS compiler. There are two main reasons for using the C++ programming
language in this project: one is to meet the real-time requirement of the system, and
the other is to use the ‘Vision Systems library’. This library [10] is provided by the
Vision Systems group at the University of Edinburgh, and it contains the most
commonly used classes in the Vision Systems code.
Some classes in the ‘Vision Systems library’ are designed specifically for image
processing, and the most used of these in this project are VS_frame and
VS_frame_io. The VS_frame class is used to store an image, and pixels are stored as
integer (int) values [10]. This class provides some useful operations for returning
image attributes, getting intensity values, and setting new intensity values for pixels.
The VS_frame_io class is used for reading images from and writing images to files [10].
Chapter 3. Design and Experiments
3.1. Edge Detection Based Algorithm
Edge detection, which highlights meaningful discontinuities in grey levels, is one of
the most popular approaches for extracting features of an image. Since edges often
occur at the boundaries of features within an image, edge detection is used to separate
the object from its surroundings. Interpreting an image based on edges can reduce the
amount of data while retaining most of the image information. Moreover, edge
detection is insensitive to overall illumination changes, and it is thereby an important
component of the preprocessing of stereo images. The algorithm proposed
in this project combines edge detection with the Hough transform (HT) to extract
reference points from stereo pairs of images for correspondence establishment.
The basic idea is shown in the flow diagram of Figure 3.1.
[Figure 3.1: flow diagram. The left and right original images each pass through
Gaussian Smoothing, Vertical Edge Detection, Otsu’s Thresholding, Thinning & Erosion,
Hough Transform, Vertical Lines Extraction and Feature Points Extraction; the two
resulting sets of feature points feed Block Matching, followed by Distance Determination.]
Figure 3.1 Flow Diagram of Edge Detection Based Algorithm
3.1.1. Gaussian Smoothing
Averaging is used to reduce the noise before edge detection. Gaussian averaging
is considered to be the optimal smoothing for an image [3]. The values of the
Gaussian template are set by the 2-D Gaussian relationship given in Equation 3.1.
$$G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$ [16] (3.1)
where $\sigma$ is the standard deviation of the Gaussian distribution.
In this project, a 5×5 Gaussian template in Figure 3.2 with a $\sigma$ of 1.0 is used.
Gaussian smoothing can offer better performance compared with direct averaging:
more image features are retained while the noise is removed.
Figure 3.2 5×5 Gaussian Template [16]
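A template of this kind can be generated directly from Equation 3.1. The C++ sketch below samples the 2-D Gaussian on an odd-sized grid and then normalizes the coefficients so that they sum to one; the normalization step and the function name `gaussianTemplate` are assumptions made for illustration, and the resulting floating-point values will not be identical to the integer template of Figure 3.2.

```cpp
#include <cmath>
#include <vector>

// Sample the 2-D Gaussian of Equation 3.1 on a size x size grid centred
// on the origin (size is expected to be odd), then normalize the
// coefficients so that they sum to one and preserve overall brightness.
std::vector<std::vector<double>> gaussianTemplate(int size, double sigma) {
    const double pi = 3.14159265358979323846;
    const int half = size / 2;
    std::vector<std::vector<double>> g(size, std::vector<double>(size, 0.0));
    double sum = 0.0;
    for (int y = -half; y <= half; ++y) {
        for (int x = -half; x <= half; ++x) {
            const double v = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                             / (2.0 * pi * sigma * sigma);
            g[y + half][x + half] = v;
            sum += v;
        }
    }
    for (auto& row : g) {
        for (double& v : row) {
            v /= sum;
        }
    }
    return g;
}
```

For the settings stated above, the call would be `gaussianTemplate(5, 1.0)`; the largest coefficient is at the centre and the weights fall off symmetrically with distance.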
3.1.2. Vertical Edge Detection
In this project, it is noticed that vehicles have more horizontal edges than vertical
edges. According to our experimental results, the edge images have very strong
vertical edges, and it is clear that vertical edge detection is better than horizontal edge
detection in suppressing noise [15]. Therefore, it is reasonable to implement vertical
edge detection on vehicle stereo images for feature extraction.
There are many edge detection techniques. The Sobel operator is
among the most widely used in practice, and it is therefore chosen for this project.
The template used for vertical edge detection is given in Figure 3.3.
$$\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$$
Figure 3.3 Template of Sobel Vertical Edge Detection
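Applying the vertical template can be sketched in C++ as follows. This is an illustrative sketch rather than the project's code: the `Image` struct and the function name are hypothetical, the template is taken with the columns (1, 2, 1), (0, 0, 0), (-1, -2, -1), and the output is the absolute response clamped to 255, which is an assumption about how the edge image is scaled. Border pixels are set to zero.

```cpp
#include <cstdlib>
#include <vector>

// Minimal grey-level image with row-major storage (hypothetical helper,
// not the VS_frame class used in the project).
struct Image {
    int width;
    int height;
    std::vector<int> pixels;
    int at(int x, int y) const { return pixels[y * width + x]; }
};

// Apply the Sobel vertical-edge template to every interior pixel and store
// the absolute response, clamped to the range 0..255.
Image sobelVertical(const Image& in) {
    static const int k[3][3] = {{1, 0, -1}, {2, 0, -2}, {1, 0, -1}};
    Image out{in.width, in.height, std::vector<int>(in.pixels.size(), 0)};
    for (int y = 1; y < in.height - 1; ++y) {
        for (int x = 1; x < in.width - 1; ++x) {
            int sum = 0;
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    sum += k[j + 1][i + 1] * in.at(x + i, y + j);
                }
            }
            int magnitude = std::abs(sum);
            if (magnitude > 255) {
                magnitude = 255;
            }
            out.pixels[y * in.width + x] = magnitude;
        }
    }
    return out;
}
```

Because the template differences pixels along each row, a horizontal intensity step (a vertical edge) gives a strong response, while a purely horizontal edge gives none.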
3.1.3. Otsu’s Thresholding
Thresholding is a simple feature extraction technique. Edge images are converted to
binary images by thresholding. Choosing the threshold level is difficult as it requires
knowledge of the grey level. In this project, we use Otsu’s method, which is an
optimal thresholding technique. Otsu’s technique can automatically select a threshold
level that achieves the best separation of an object from its background. The method is
based on the normalized histogram, which represents a probability distribution for
the intensity levels [3]:
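$$p(l) = \frac{N(l)}{N^2}$$ [3] (3.2)
The zero-order and first-order cumulative moments of the normalized histogram [3]:
$$\omega(k) = \sum_{l=1}^{k} p(l) \quad \text{and} \quad \mu(k) = \sum_{l=1}^{k} l \, p(l)$$ [3] (3.3)
The total mean level of the image [3]:
$$\mu_T = \sum_{l=1}^{N_{\max}} l \, p(l)$$ [3] (3.4)
The variance of the class separability is the ratio [3]:
$$\sigma_B^2(k) = \frac{\left(\mu_T \, \omega(k) - \mu(k)\right)^2}{\omega(k)\left(1 - \omega(k)\right)}, \quad \forall k \in 1, N_{\max}$$ [3] (3.5)
The optimal threshold $T_{opt}$ is the level for which the variance is at its maximum [3]:
$$\sigma_B^2(T_{opt}) = \max_{1 \le k < N_{\max}} \sigma_B^2(k)$$ [3] (3.6)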
Since Otsu’s method selects the threshold level automatically rather than manually,
it is well suited to automated stereo vision.
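The threshold selection can be sketched in C++ as follows. This is a minimal illustration, not the project's code: the function name `otsuThreshold` is hypothetical, grey levels are assumed to lie in 0..255 and are indexed from 0 rather than from 1, and the histogram is normalized by the number of pixels supplied (which equals N² for an N×N image).

```cpp
#include <vector>

// Return the grey level k that maximizes the between-class variance for
// pixel values in 0..255. Levels at or below the returned value form one
// class and levels above it form the other. Returns 0 for an empty input.
int otsuThreshold(const std::vector<int>& pixels) {
    const int levels = 256;
    if (pixels.empty()) {
        return 0;
    }
    // Normalized histogram p(l).
    std::vector<double> p(levels, 0.0);
    for (int v : pixels) {
        p[v] += 1.0;
    }
    for (double& q : p) {
        q /= static_cast<double>(pixels.size());
    }
    // Total mean level of the image.
    double muT = 0.0;
    for (int l = 0; l < levels; ++l) {
        muT += l * p[l];
    }
    // Accumulate the zero-order and first-order moments and keep the
    // level with the largest class-separability variance.
    double omega = 0.0;
    double mu = 0.0;
    double bestVariance = -1.0;
    int best = 0;
    for (int k = 0; k < levels; ++k) {
        omega += p[k];
        mu += k * p[k];
        const double denom = omega * (1.0 - omega);
        if (denom <= 0.0) {
            continue;  // one of the two classes is empty
        }
        const double num = muT * omega - mu;
        const double variance = num * num / denom;
        if (variance > bestVariance) {
            bestVariance = variance;
            best = k;
        }
    }
    return best;
}
```

A binary edge image would then be obtained by setting pixels above the returned level to 255 and all others to 0.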
3.1.4. Morphological Operation
(1) Thinning
The edges in the output image of edge detection are always thick. In order to
apply the Hough Transform (HT) to detect different lines, a thinning technique is
used to reduce all lines to a single-pixel thickness. The thinning
operation is based on a structuring element, and it is carried out by translating the
origin of the structuring element to each possible position in the image and
comparing it with the underlying image pixels [16]. If the pixels in the structuring
element exactly match the pixels in the image, then the pixel in the image which is
underneath the origin of the structuring element is set to zero [16]. Otherwise it is left
unchanged [16]. In each iteration, each structuring element must be used in each of its
four 90° rotations.
In this project, two structuring elements are used, which are given in Figure 3.4 (a)
and (b) respectively.
Figure 3.4 Structuring Elements of Thinning [1]
(2) Erosion
Erosion is applied to an edge image to erode away the boundaries of edges, so that the
edges shrink. In the vertical edge image, erosion removes short vertical edges, which
are considered to be noise, and retains the strong vertical edges. This also reduces
the computational cost of the subsequent Hough transform. In the project, a 1×5
structuring element is used to erode the edge image. The structuring element is
superimposed on the input image with its origin centered on each possible position in
turn [16]. If all the pixels underneath the structuring element are 255 (white), the
output pixel is set to 255 (white); if any of them is not 255, the output pixel is set
to 0 (black) [16]. The 1×5 erosion structuring element is given in Figure 3.5.
Figure 3.5 Structuring Element of Erosion
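The erosion rule described above can be sketched as below. The orientation of the 1×5 element is an assumption: it is taken here as one pixel wide and five pixels tall, since that is the orientation that removes short vertical edges. The function name is hypothetical, and rows too close to the top or bottom border for the element to fit are simply set to 0.

```python
def erode_vertical(image, length=5):
    """Erode a binary image (values 0 or 255) with a vertical line element.

    A pixel stays white only if all `length` pixels of the column segment
    centred on it are white; otherwise it becomes black.
    """
    rows, cols = len(image), len(image[0])
    half = length // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(half, rows - half):
        for x in range(cols):
            if all(image[y + d][x] == 255 for d in range(-half, half + 1)):
                out[y][x] = 255
    return out
```

With this element, a vertical run of fewer than five white pixels disappears completely, while a longer run is shortened by two pixels at each end.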
3.1.5. Hough Transform
Many vertical edges remain in the output image of the above processing, and the vehicle
number plate contains more vertical edges than other regions of the image. It is
therefore reasonable to extract an equal number of strong vertical edges from each of
the two stereo images, and then to take points on these candidate lines as matching
primitives. The Hough transform is used in the project to extract the candidate
vertical lines; its advantage is that it is relatively unaffected by noise. A line in
the x-y plane is shown in Figure 3.6.
Figure 3.6 Polar Consideration of a Line [3]
We can describe a set of lines in the form:
$$\rho = x \cos\theta + y \sin\theta$$ [3] (3.7)
where $\theta$ is the angle of the normal to the line in the image and $\rho$ is the
length of the normal from the origin to the line.
The accumulator array has 180 bins along $\theta$: the value of $\theta$ is in the range
0 to 180°, and the value of $\rho$ is in the range 0 to $\sqrt{N^2 + M^2}$, where N×M is
the image size. The peaks in the accumulator array correspond to straight lines in the
edge image. In this project only vertical edges are detected, so only the row of the
accumulator array with $\theta = 0$ is retained. Ten peaks within this row, which
represent ten straight lines in the image, are then extracted. Ten vertical lines are
therefore detected in each image, and edge points are retained only at the positions of
the detected lines.
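With $\theta = 0$, Equation 3.7 reduces to $\rho = x$, so the retained accumulator row is simply a count of edge pixels per image column. The sketch below illustrates this special case; the function name is hypothetical, and taking the highest counts as the peaks (with ties broken by column order) is an assumption about how the ten lines are selected.

```python
def vertical_line_peaks(edge_image, n_lines=10):
    """Return the x positions of the strongest vertical lines.

    Each non-zero edge pixel votes for the bin rho = x of the theta = 0
    accumulator row; the n_lines bins with the most votes are returned.
    """
    cols = len(edge_image[0])
    votes = [0] * cols
    for row in edge_image:
        for x, v in enumerate(row):
            if v:
                votes[x] += 1
    ranked = sorted(range(cols), key=lambda x: votes[x], reverse=True)
    return [x for x in ranked[:n_lines] if votes[x] > 0]
```

Columns that receive no votes are never reported, so fewer than `n_lines` positions may be returned for a sparse edge image.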
3.1.6. Feature Points Extraction
So far we have extracted the candidate lines, which are the strong vertical edges in an
image. The top and bottom points of each candidate vertical edge are chosen as the
feature points. Two further issues need to be considered.
(1) Post-processing of the Candidate Vertical Lines
More than one vertical edge may lie at the same detected vertical position. In that case
the longest vertical edge is chosen as the candidate from which the feature points are
extracted. There may also be gaps or discontinuous points within the candidate vertical
edge; these are handled by the morphological operations.
(2) Points Extraction
For each candidate vertical edge, its top and bottom points are extracted as the feature
points for establishing correspondence, and the coordinates of these points are
recorded. Twenty points are therefore obtained from the ten candidate vertical edges.
To make the results more accurate, more vertical edges can be detected by the Hough
transform, so that more feature points are acquired from the input image.
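The two steps above, choosing the longest edge at a detected position and taking its end points, can be sketched as follows. The function name and the (x, y) tuple convention are assumptions for illustration, and the sketch treats any gap as the end of an edge; it does not include the morphological gap filling mentioned above.

```python
def longest_vertical_run(edge_image, x):
    """Return ((x, top_y), (x, bottom_y)) for the longest run of edge pixels
    in column x, or None if the column contains no edge pixels."""
    rows = len(edge_image)
    best = None      # (start_y, end_y) of the longest run found so far
    start = None     # start_y of the run currently being scanned
    for y in range(rows + 1):
        on = y < rows and edge_image[y][x] != 0
        if on and start is None:
            start = y
        elif not on and start is not None:
            if best is None or (y - start) > (best[1] - best[0] + 1):
                best = (start, y - 1)
            start = None
    if best is None:
        return None
    return (x, best[0]), (x, best[1])
```

Calling this once for each of the ten detected line positions would give up to twenty feature points per image.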
3.1.7. Summary
In the proposed method, we first detect the vertical edges in an image and then extract
the strong vertical edges with the Hough transform. An equal number of vertical edges
is picked out from each image of a stereo pair. The top and bottom points of each
candidate vertical edge are taken as the feature points used to find correspondence.
Vertical edges are detected because they are relatively strong in the car images. The
Hough transform ensures that only strong vertical edges are retained, which increases
the likelihood that the edges extracted from a pair of images are corresponding
features.
Choosing a relatively small number of points for correspondence matching reduces the
computational cost. In the project we detect only ten candidate vertical edges, from
which twenty feature points are extracted. More feature points can be obtained to
increase the accuracy, at the price of more computational effort.
3.2. Corner Detection Based Algorithm
Using interest points to find the correspondence between two stereo images drastically
reduces the computation time compared with processing every pixel in the two images, or
every pixel within certain regions of the images. The proposed method based on vertical
edge detection combined with the Hough transform is one application of interest points.
Corner detection is an alternative way to extract interest points for finding
correspondence.
3.2.1. System Overview
Corners are essentially the points where the edge direction changes rapidly in an image.
A vehicle image contains many strong horizontal and vertical edges, so it is reasonable
to extract corners as the interest points. No single corner detector is optimal; the
choice of detector depends on the particular application (e.g. real-time operation).
The proposed stereo vision system based on corner detection is shown in Figure 3.7.
Figure 3.7 Flow Diagram of Corner Detection Based Algorithm
3.2.2. Corner Detection
Corner detection is applied to an image to obtain a cornerness map. For each pixel, the
corner operator produces a measure indicating the degree to which the pixel is
considered to be a corner. Different corner detectors use different measures, but all
of them are computed over a window centered on the input pixel. In the project, the
Harris/Plessey corner detector is used.
(1) Harris/Plessey Corner Detection
The Harris/Plessey corner detector was developed by Chris Harris and Mike Stephens in
1988 [5]. It is a combined corner and edge detector which measures the variation of the
autocorrelation over all orientations [18]. The method is stated below [5, 18]:
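For each pixel (x, y) in the image, calculate the autocorrelation matrix M:
$$M = \begin{bmatrix} A & C \\ C & B \end{bmatrix}$$ (3.8)
where $A = \left(\frac{\partial I}{\partial x}\right)^2 \otimes w$,
$B = \left(\frac{\partial I}{\partial y}\right)^2 \otimes w$,
$C = \left(\frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\right) \otimes w$,
$\otimes$ is the convolution operator and w is the Gaussian window.
Construct the cornerness map by calculating the cornerness measure C(x, y) for
each pixel (x, y):
$$C(x, y) = \det(M) - k \left(\mathrm{trace}(M)\right)^2$$ (3.9)
$$\det(M) = \lambda_1 \lambda_2 = AB - C^2$$ (3.10)
$$\mathrm{trace}(M) = \lambda_1 + \lambda_2 = A + B$$ (3.11)
$$k = \text{const}$$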
(2) Algorithm Design
In order to implement the Harris/Plessey corner detector and obtain the cornerness
map, we propose the following steps.
Differentiation [3, 18]
The Prewitt operator is commonly used to approximate the first-order derivative of an
image. The values of $\partial I/\partial x$ and $\partial I/\partial y$ are
approximated by the simple templates below.
A1 A2 A3
A4 A5 A6
A7 A8 A9
Figure 3.8 Template labeling
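$$\left.\frac{\partial I}{\partial x}\right|_{A5} = I_{A5} \otimes [-1, 0, 1] \approx I_{A6} - I_{A4}$$
$$\left.\frac{\partial I}{\partial y}\right|_{A5} = I_{A5} \otimes [-1, 0, 1]^T \approx I_{A8} - I_{A2}$$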
Figure 3.9 Horizontal Gradient and Vertical Gradient
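Therefore, for each input pixel we obtain its horizontal and vertical gradients
$\partial I/\partial x$ and $\partial I/\partial y$, from which
$(\partial I/\partial x)^2$, $(\partial I/\partial y)^2$ and
$(\partial I/\partial x)(\partial I/\partial y)$ can be calculated.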
Gaussian Window [3, 18]
The Gaussian window in the Harris/Plessey corner detector reduces the response to
noise. A 5×5 Gaussian window with $\sigma$ = 1.4 is given in Figure 3.10.
Figure 3.10 Gaussian Window
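According to the algorithm, the Gaussian window is convolved with
$(\partial I/\partial x)^2$, $(\partial I/\partial y)^2$ and
$(\partial I/\partial x)(\partial I/\partial y)$ respectively to give the elements of
the autocorrelation matrix M.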
Construction of Cornerness Map
Once the autocorrelation matrix M is known, the cornerness measure for each pixel is
obtained by calculating Trace(M) and Det(M). For each input image, the output of the
Harris/Plessey corner detector is a cornerness map.
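The three steps (differentiation, Gaussian windowing, cornerness measure) can be put together in a short sketch. Several details are assumptions made only for illustration: the value k = 0.04 (the text states only that k is a constant), the gradients taken as simple differences of the neighbouring pixels, the Gaussian window normalised to sum to one, and border pixels left at zero. The function names are hypothetical.

```python
import math


def gaussian_kernel(size=5, sigma=1.4):
    """Return a size x size Gaussian window normalised to sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def smooth(img, kernel):
    """Apply the (symmetric) window to img, leaving a zero border."""
    rows, cols, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(half, rows - half):
        for x in range(half, cols - half):
            out[y][x] = sum(kernel[j + half][i + half] * img[y + j][x + i]
                            for j in range(-half, half + 1)
                            for i in range(-half, half + 1))
    return out


def harris_cornerness(image, k=0.04):
    """Return the cornerness map det(M) - k * trace(M)^2 (Eq. 3.9)."""
    rows, cols = len(image), len(image[0])
    ixx = [[0.0] * cols for _ in range(rows)]
    iyy = [[0.0] * cols for _ in range(rows)]
    ixy = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]   # vertical gradient
            ixx[y][x], iyy[y][x], ixy[y][x] = gx * gx, gy * gy, gx * gy
    w = gaussian_kernel()
    a, b, c = smooth(ixx, w), smooth(iyy, w), smooth(ixy, w)
    return [[a[y][x] * b[y][x] - c[y][x] ** 2 - k * (a[y][x] + b[y][x]) ** 2
             for x in range(cols)]
            for y in range(rows)]
```

On a synthetic image containing one bright rectangle, the measure is positive at the rectangle's corner, negative along its straight edges, and zero in flat regions, which is why large positive values are read as corners.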
3.2.3. Scale Operation
The cornerness map contains corners with very large values, which result from the
cornerness measurement. In order to process the cornerness map efficiently, its values
are mapped into the range 0 to 255 by a scale operation. Each value is replaced with
one computed according to Equation 3.12.
$$N_{x,y} = \frac{O_{x,y} - O_{\min}}{O_{\max} - O_{\min}} \times 255$$ (3.12)
The brightness levels of the old image O start at $O_{\min}$ and extend up to
$O_{\max}$; the image is scaled so that the pixel values in the new image N lie in the
range 0 to 255 [3]. Since the scale operation is a linear brightness transformation,
the overall shape of the image histogram is not changed [3].
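Equation 3.12 translates directly into code. In the sketch below the function name is hypothetical, results are rounded to the nearest integer, and a constant image is mapped to all zeros; the last two choices are assumptions, since the equation is undefined when the maximum and minimum are equal.

```python
def scale_to_byte_range(image):
    """Linearly map the values of `image` into the range 0..255 (Eq. 3.12)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) / (hi - lo) * 255) for v in row] for row in image]
```

Because the mapping is monotonic, the ordering of cornerness values is preserved, so the strongest corners before scaling remain the strongest afterwards.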
3.2.4. Thresholding
Thresholding of the cornerness map is one of the most important steps in corner
detection. Corners are defined as local maxima in the cornerness map [18]. Because the
corner operator produces a measure for every pixel of the input image, not every local
maximum corresponds to a true corner: local maxima with relatively small cornerness
measures are considered to be false corners. To avoid reporting these points as
corners, we threshold the cornerness map by setting all values below a certain
threshold level to zero [18]. Choosing this threshold level is difficult because it
depends on the requirements of the application. The threshold should be high enough to
remove the false corners but low enough to retain most of the true corners, so in
practice a trade-off is made based on the system requirements. In the project, a
relatively low threshold level is chosen to retain enough corners while removing
obvious noise, and non-maximal suppression is then used to extract the local maxima.
Thresholding the cornerness map also reduces the computational load of the later
stages.
3.2.5. Non-maximal Suppression
Non-maximal suppression is applied to the thresholded cornerness map to locate the
local maxima. For each pixel in the thresholded cornerness map, a square window is
centered on it. If the cornerness measure of this pixel is the largest within this window,
the pixel is retained with its cornerness measure. Otherwise, the cornerness measure
of this pixel is set to zero. A 3×3 square window is given in Figure 3.11.
After the implementation of the non-maximal suppression, the corners are simply the
non-zero points remaining in the thresholded cornerness map [18].
A1 A2 A3
A4 A5 A6
A7 A8 A9
Figure 3.11 Non-maximal Suppression
3.2.6. Feature Points Extraction
The number of feature points extracted from each of the two stereo images should be
equal. The corner image contains a large number of corners with different intensity
values. In the Harris/Plessey corner detector, the larger the intensity value of a
pixel, the stronger the corner. The feature points can therefore be chosen according to
their intensity values. In the project, the twenty corners with the highest intensity
are extracted from each corner map as the feature points.
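Sections 3.2.4 to 3.2.6 describe a chain of thresholding, non-maximal suppression and selection of the strongest corners, which can be sketched in one function. The function name and return format are hypothetical. Two details are assumptions: a pixel is kept only if it is strictly larger than every neighbour in the window (the text does not say how ties are handled), and the window is clipped at the image border.

```python
def strongest_corners(cornerness, threshold, n_points=20, window=3):
    """Return up to n_points (x, y) corners, strongest first.

    A pixel is a corner if its cornerness is at least `threshold` and it is
    the strict maximum of the window centred on it.
    """
    rows, cols = len(cornerness), len(cornerness[0])
    half = window // 2
    corners = []
    for y in range(rows):
        for x in range(cols):
            v = cornerness[y][x]
            if v < threshold:
                continue                      # thresholding
            neighbours = [cornerness[j][i]
                          for j in range(max(0, y - half), min(rows, y + half + 1))
                          for i in range(max(0, x - half), min(cols, x + half + 1))
                          if (j, i) != (y, x)]
            if all(v > n for n in neighbours):    # non-maximal suppression
                corners.append((v, x, y))
    corners.sort(reverse=True)                # strongest corner first
    return [(x, y) for _, x, y in corners[:n_points]]
```

Applying the same `n_points` to the left and right cornerness maps gives the equal number of feature points per image that the matching stage expects.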
3.2.7. Summary
The proposed method to extract feature points for stereo matching is based on corner
detection, using the Harris/Plessey detector. A cornerness map is obtained from the
input image by the Harris/Plessey operator, and a thresholding operation is then
applied to remove the false corners while retaining most of the true corners. Finally,
non-maximal suppression is used to find the local maxima in the thresholded cornerness
map, and the feature points, which are defined as corners, are extracted. The
Harris/Plessey corner detector has the advantage that the matching points can be chosen
according to the cornerness measure: the larger the cornerness measure, the stronger
the corner. The matching points extracted are therefore strong corners in the image,
which increases the likelihood of finding real matches. In the project, only twenty
points are extracted from each of the two stereo images. More feature points can be
extracted, but there is a trade-off between accuracy and computational cost.
3.3. Stereo Matching
Matching is the most important stage in stereo image processing. Given two stereo
images, correspondence must be established between homologous features, that is,
features that are projections of the same physical entity in the real world. In the
project, the primitives used for matching are the feature points extracted by the
above algorithms.
3.3.1. Block Matching Algorithm
Block matching is widely used for finding corresponding points in stereo vision. It
groups pixels together into blocks and then matches these blocks [6]. Each block from
the left image is compared with a block in the right image, and the difference between
the intensity values of the two blocks is measured by a chosen criterion. The pairs of
blocks that give relatively small values of this metric are considered to be the real
matches.
In the ideal case, two matching blocks have exactly the same corresponding pixels [7].
This is rare in practice, because many factors cause differences. For example, the
target object may appear distorted because of the angle of view, the overall
illumination of the images may change, and there is always noise in real images.
Despite these problems, block matching remains a simple and popular stereo matching
algorithm.
3.3.2. Experiments
In the project, feature points have already been extracted from the stereo images, so a
block is centered on each of these interest points. By calculating the similarity
measure, we can determine how many points are matched between the two images. Compared
with the full search method, in which block matching is applied to the entire image,
using the feature points as the matching primitives dramatically reduces the
computational complexity.
(1) Block Size
Selecting a proper block size for stereo matching is not an easy task. In general, a
large block is less sensitive to image distortions, while a small block is
computationally efficient. A trade-off must therefore be made in choosing the block
size, and it depends on the requirements of the application. In practice, if a feature
point is close to the image border, a large block centered on this point may extend
partly outside the image. The leading factor in choosing the block size in the project
is the distance between the object and the stereo cameras. It is proposed that an
adaptive block size is used, which changes with the distance. If the object appears
large in the image because it was captured at short distance, a large block size should
be chosen to avoid false matching; if the object is small in the image, a small block
size has to be applied.
(2) Search Region
The search region plays an important part in finding the match. In the project, the
feature points are used as the matching primitives, and the true matching positions are
searched only among these extracted points. If the number of extracted feature points
is large, more real matches can be found, but the computational load grows with the
number of feature points. If the number of feature points is small, the computational
cost is reduced, but false matches may be found. A trade-off must therefore be made
based on the system requirements.
(3) Matching Criteria
There are many commonly used matching criteria based on pixel differencing, such as the
mean absolute difference (MAD), the mean squared difference (MSD), and normalized
cross-correlation (NCC) [7]. In the project, the mean absolute difference (MAD) is
used to measure the similarity [7].
$$\mathrm{MAD} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| A(i, j) - B(i, j) \right|$$ (3.13)
where A and B are the two blocks being compared and the block size is m×n.
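A minimal sketch of Equation 3.13 for square blocks centered on two feature points is given below. The function name, the (x, y) point convention and the decision to raise an error when a block would extend beyond the image are assumptions made for this sketch; for an even block size the block is taken to extend one pixel further to the upper left than to the lower right.

```python
def mad(image_a, image_b, centre_a, centre_b, block=30):
    """Mean absolute difference between two block x block regions (Eq. 3.13).

    centre_a and centre_b are (x, y) points in image_a and image_b.
    """
    half = block // 2
    (xa, ya), (xb, yb) = centre_a, centre_b
    for img, x, y in ((image_a, xa, ya), (image_b, xb, yb)):
        if not (half <= x <= len(img[0]) - (block - half)
                and half <= y <= len(img) - (block - half)):
            raise ValueError("block extends outside the image")
    total = 0
    for j in range(-half, block - half):
        for i in range(-half, block - half):
            total += abs(image_a[ya + j][xa + i] - image_b[yb + j][xb + i])
    return total / (block * block)
```

Two identical blocks give a MAD of zero, and the value grows with the average intensity difference, so a smaller MAD indicates a better match.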
(4) Matching Process
In the project, we extract twenty feature points from each of the two stereo images,
as shown in Figure 3.12.
Figure 3.12 Feature Points in a Pairs of Stereo Images
For each point in the left image, we search all the points in the right image to find the
best matching point with minimal MAD. Therefore, all feature points in the left image
are mapped to the feature points in the right image according to the MAD. The outline
of the matching points is shown in Figure 3.13.
A0 , B?    MAD?
A1 , B?    MAD?
A2 , B?    MAD?
......
A19 , B?   MAD?
Figure 3.13 Outline of Matching Points
Deciding which of the above pairs of points are real matches is not a trivial task. One
of the simplest solutions is to choose the pair of points with the minimal MAD in the
above list as the best match. Although this method works well for extracting the best
match in our experiments, it is sensitive to the number of feature points and to the
block size, and it can find only one real match between the two stereo images. To make
the matching more robust and accurate, we can set a threshold and retain the pairs of
points with a MAD below it. Alternatively, we can sort the MADs in the list in
ascending order and choose the pairs of points with relatively small MADs (e.g. the
first five pairs in the sorted list). These selected pairs of points are called
candidate matches. To decide the real matches, the horizontal offset (x offset) is
calculated for each pair of corresponding points; the real matches should have
approximately the same x offset, since points on the target at a similar distance from
the cameras have a similar disparity. More than one real match can be extracted by this
approach. Figure 3.14 shows the matching method used in the project.
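The candidate selection and x-offset check can be sketched as below. This is an illustration under stated assumptions, not the project's code: `cost` stands for any block similarity function such as the MAD, the pair with the lowest cost is used as the reference offset, and the tolerance of 3 pixels is an arbitrary placeholder for "approximately the same x offset".

```python
def match_points(left_pts, right_pts, cost, n_candidates=5, tolerance=3):
    """Return the (left, right) point pairs accepted as real matches.

    Each left point is paired with the right point of lowest cost; the
    n_candidates cheapest pairs are kept, and those whose x offset is within
    `tolerance` pixels of the best pair's x offset are returned.
    """
    if not left_pts or not right_pts:
        return []
    pairs = []
    for lp in left_pts:
        rp = min(right_pts, key=lambda r: cost(lp, r))
        pairs.append((cost(lp, rp), lp, rp))
    pairs.sort(key=lambda t: t[0])            # smallest cost first
    candidates = pairs[:n_candidates]
    ref_offset = candidates[0][1][0] - candidates[0][2][0]
    return [(lp, rp) for _, lp, rp in candidates
            if abs((lp[0] - rp[0]) - ref_offset) <= tolerance]
```

The x offset of any accepted pair can then be used as the disparity for distance determination.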
3.3.3. Summary
In the project, block matching is applied to establish the correspondence between
stereo image pairs, and the mean absolute difference (MAD) is used to measure the
similarity. As a fixed number of feature points has been extracted from each of the
two stereo images, matches are searched among these interest points by centering a
block on each available point. Because the number of feature points is relatively
small, the block matching algorithm runs very fast even when a large block size is
applied. Each point in the left image is mapped to its matching point in the right
image by calculating the MAD. We outline all candidate matches and determine the real
matches according to the value of the MAD and the horizontal offset of the two
matching points.
[Flow diagram. Boxes, in order: Load Reference Images; Create a Block from Left Image;
Use all Points?; Create a Block from Right Image; Use all Points?; MAD < Low?;
MAD = Low, Update Coordinate; Save the Matching Points and x-offset; Outline all the
Matches; Determine the Real Matches. Decision branches are labelled Y and N.]
Figure 3.14 Flow Diagram of Matching Algorithm
3.4. Distance Determination
The distance between the stereo pair of cameras and the target object can be
determined from the imaging geometry shown in Figure 3.15.
A conventional stereo imaging system consists of a pair of cameras with their optical
axes mutually parallel and separated by a horizontal distance called the stereo
baseline [4]. The optical axes are perpendicular to the stereo baseline, and the image
scan lines are parallel to the baseline (horizontal) [4]. Since the displacement
between the optical centers of the two cameras is purely horizontal, the positions of
corresponding points in the two images can differ only in the horizontal component [4].
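[Diagram labels: left and right camera axes X_L, Y_L, Z_L and X_R, Y_R, Z_R with origins
O_L and O_R; image planes I_L and I_R; projections P_L and P_R of the point P(x, y, z);
focal length f; baseline b.]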
Figure 3.15 Stereo Geometry
In Figure 3.15, the origin of the world coordinate system is $O_L$, the effective
focal length of each camera is $f$, and the stereo baseline is $b$. $P_L(x_L, y_L, z_L)$
and $P_R(x_R, y_R, z_R)$ are the projections of the point $P(x, y, z)$. The disparity
value $d$ is defined as the x offset of each matched pair of points $P_L(x_L, y_L, z_L)$
and $P_R(x_R, y_R, z_R)$, $d = x_L - x_R$. The world coordinates of the point
$P(x, y, z)$ can be obtained by considering similar triangles [4].
$$x = \frac{b \, x_L}{d}, \quad y = \frac{b \, y_L}{d} \quad \text{and} \quad z = \frac{b \, f}{d}$$ [4] (3.14)
The required distance between the stereo cameras and the target object is then the
value of z in Equation 3.14, which can be written as Equation 3.15.
$$\text{Distance} = \frac{\text{Baseline} \times \text{FocalLength}}{\text{Disparity}}$$ (3.15)
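Equations 3.14 and 3.15 can be written as a small function. The function name and argument order are hypothetical, and the sketch assumes that the image coordinates and the focal length are expressed in the same units (for example pixels), so that the result has the units of the baseline.

```python
def triangulate(x_left, y_left, x_right, baseline, focal_length):
    """Return the world coordinates (x, y, z) of a matched point (Eq. 3.14).

    The disparity is d = x_left - x_right; z is the distance of Eq. 3.15.
    """
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive")
    return (baseline * x_left / d,
            baseline * y_left / d,
            baseline * focal_length / d)
```

For example, with an illustrative focal length of 100 pixels and a baseline of 93 mm, a disparity of 10 pixels corresponds to a distance of 930 mm. Halving the disparity doubles the distance, which is why small disparities at long range make the estimate sensitive to one-pixel matching errors.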
Chapter 4. Results
To evaluate the performance of the proposed algorithms, we apply the edge detection
based algorithm and the corner detection based algorithm to our database in turn. The
initial stereo images, shown in Appendix 1, are images of a car taken under different
conditions. The additional stereo images in Appendix 2 were taken of different kinds
of vehicles. The experimental results and their analysis are presented in this chapter.
4.1. Edge Detection Based Algorithm
The proposed algorithm is applied to our database of car stereo images, and the
experimental process is divided into two parts. In the first part, we extract twenty
feature points from each of the two stereo images. In the second part, we find the
best match between the two images. The initial database of Appendix 1 is used first,
and the algorithm is then applied to the additional database in Appendix 2.
4.1.1. Extraction of Feature Points
The results show that the vertical edge image extracted by the system consists of very
clear vertical edges, which are dense in the area of the car number plate. Most of the
noise has been removed, and the edge image contains only the information we are
interested in. In the image of vertical lines, as expected, the ten vertical lines
extracted by means of the Hough Transform tend to appear in the region of the number
plate. The top and bottom points of each vertical line are chosen as the feature
points, so there should be twenty points in the ideal case. In practice, however, a
vertical line that is not long enough is treated as a false feature component, and no
points are extracted from it.
4.1.2. Block Matching
In the process of block matching, we choose a block size of 30×30. Because the block is
centered on the point, there is a border problem. The width of the border is 15 pixels,
so feature points that lie inside this border region are not used for matching. In our
experiments, for each matching point in the left image we search all the matching
points in the right image, and the mean absolute difference (MAD) is used to measure
the similarity between two matching blocks. The best match is defined as the pair of
corresponding points with the minimal MAD in the process of block matching.
4.1.3. Matching Results of Initial Database
The initial database of Appendix 1 contains car stereo images in two different
situations, which we call “front car” and “rear car” in the project. Figures 4.6 to 4.9
show the results of processing the images of Appendix 1 with the system. Figures 4.6
and 4.8 display the results for the image pairs named ‘front 58’ and ‘rear 43’, with
the layout shown in Figure 4.1. For the remaining data, only the image pairs which
show the best match are presented.
Original Image (Left) | Original Image (Right)
Vertical Edge Image (Left) | Vertical Edge Image (Right)
Vertical Lines (Left) | Feature Points (Left) | Vertical Lines (Right) | Feature Points (Right)
Best Match (Left) | Best Match (Right)
Figure 4.1 Layout of Results for Initial Database
(1) Mean Absolute Difference (MAD)
Image        MAD    Best Match (Left)    Best Match (Right)
Front 265    28     (107, 60)            (91, 67)
Front 165    24     (97, 97)             (75, 106)
Front 94     9      (273, 156)           (236, 167)
Front 58     9      (214, 196)           (152, 203)
Rear 205     7      (219, 88)            (205, 98)
Rear 152     10     (185, 66)            (165, 76)
Rear 113     9      (267, 53)            (240, 65)
Rear 67      21     (289, 31)            (241, 43)
Rear 43      5      (244, 50)            (167, 61)
Figure 4.2 Mean Absolute Difference (MAD)
(2) Disparity (or X-offset)
Car Position    Actual Distance (cm)    Disparity (pixels)
Front           265                     16
Rear            205                     14
Front           165                     22
Rear            152                     20
Rear            113                     27
Front           94                      37
Rear            67                      48
Front           58                      62
Rear            43                      77
Figure 4.3 Disparity (or X-offset)
(3) Distance Determination Results
The distance is determined using Equation 3.15, where the baseline is 93 mm. As the
camera focal length is unknown, it has to be estimated from the available data. Since
the stereo images given in the project were taken at measured distances, the focal
length can be calculated by Equation 4.1. In our experiments, we calculate an
individual focal length for each pair of images, and approximate the camera focal
length by averaging all of these calculated focal lengths.
$$\text{FocalLength} = \frac{\text{Distance} \times \text{Disparity}}{\text{Baseline}}$$ (4.1)
In order to reduce the computational complexity, we simply calculate the Factor
defined in Equation 4.2 instead of the focal length.
$$\text{Factor} = \text{Baseline} \times \text{FocalLength} = \text{ActualDistance} \times \text{Disparity}$$ (4.2)
$$\text{Calculated Distance} = \frac{\text{AverageFactor}}{\text{Disparity}}$$ (4.3)
Actual Distance (cm)    Disparity (pixels)    Factor
265                     16                    42.40
205                     14                    28.70
165                     22                    36.30
152                     20                    30.40
113                     27                    30.51
94                      37                    34.78
67                      48                    32.16
58                      62                    35.96
43                      77                    33.11
Average Factor                                33.80
Figure 4.4 Average Factor
According to Equation 4.2, the ‘Average Factor’ is calculated to be 33.80. The distance
can then be estimated using Equation 4.3, and the results are shown in Figure 4.5.
Figure 4.5 Calculated Distance and Percentage Error
Because only a small number of stereo images is available in the project, the
calculated ‘Average Factor’ is inaccurate, which means that the camera focal length
cannot be precisely determined. The errors of the system could be reduced by
increasing the amount of data used for calculating the ‘Average Factor’.
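The calibration of Equations 4.2 and 4.3 can be reproduced from the tabulated distances and disparities of the initial database. In the sketch below the function names are hypothetical, and the division by 100 is inferred from the tabulated Factor values (for example 265 × 16 / 100 = 42.40), which correspond to expressing the distance in metres; it is not stated explicitly in the text.

```python
# (actual distance in cm, disparity in pixels) for the initial database.
MEASUREMENTS = [(265, 16), (205, 14), (165, 22), (152, 20), (113, 27),
                (94, 37), (67, 48), (58, 62), (43, 77)]


def factor(distance_cm, disparity_px):
    """Factor of Eq. 4.2, scaled as in the tabulated values."""
    return distance_cm * disparity_px / 100.0


def average_factor(measurements):
    """Mean of the per-image factors."""
    values = [factor(dist, disp) for dist, disp in measurements]
    return sum(values) / len(values)


def calculated_distance_cm(avg_factor, disparity_px):
    """Distance estimate of Eq. 4.3, converted back to centimetres."""
    return avg_factor / disparity_px * 100.0
```

With the nine measurements above, `average_factor(MEASUREMENTS)` evaluates to about 33.8, in line with the reported ‘Average Factor’.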
Figure 4.6 Front 58
(a) Front 94
(b) Front 165
(c) Front 265
Figure 4.7 Front Car
Figure 4.8 Rear 43
(a) Rear 67
(b) Rear 113
(c) Rear 152
(d) Rear 205
Figure 4.9 Rear Car
4.1.4. Matching Results of Additional Database
The additional database in Appendix 2 consists of stereo images taken of different
kinds of vehicles. The proposed algorithm is applied to these images, and only the
images showing the best match are displayed.
(1) Straight Car
This set of car stereo images is similar to the images of the initial database. The
results of finding the best match are shown in Figure 4.10. In Figure 4.10 (c), the
correct best match is found by increasing the block size from 30×30 to 60×60, which is
necessary because the image was captured at short distance. In practice, we can set an
adaptive block size which changes with the distance between the stereo cameras and the
target object.
(a) Straight Car 196
(b) Straight Car 81
(c) Straight Car 38 (Block Size = 60×60)
Figure 4.10 Straight Car
(2) Angled Car
This set of images was taken of a car viewed at an angle. The matching results are
shown in Figure 4.11. The system correctly finds the best match between the two stereo
images.
(a) Angled Car 137
(b) Angled Car 110
Figure 4.11 Angled Car
(3) Taxi
This set of images was taken of a taxi from different angles of view and at different
distances. The proposed algorithm correctly finds the best match between the stereo
image pairs. In Figure 4.12 (a), however, the best corresponding points were found at
the windows in the images. Although this is a real match with minimal MAD, the match is
supposed to be found in the region of the taxi. A simple solution is to outline all
possible real matches and choose the one with minimal MAD in the region of the taxi.
(a) Taxi 340
(b) Taxi 222
(c) Taxi 122
(d) Taxi 82
Figure 4.12 Taxi
(4) Foreign Van
The stereo images of the foreign van are tested by the system to find the best match.
The results are shown in Figure 4.13; the best match is found correctly. In Figure 4.13
(a) and (b), the detected matching point in the left image and the point in the right
image have a relatively large vertical displacement. As we only consider the offset in
the horizontal direction, this does not affect the system results.
(a) Foreign Van 210
(b) Foreign Van 148
(c) Foreign Van 94
Figure 4.13 Foreign Van
(5) Landrover
This set of stereo images is taken from the rear of a landrover. The system works well
to extract the best match between two stereo images, and the results are shown in
Figure 4.14.
(a) Landrover 380
(b) Landrover 260
(c) Landrover 176
(d) Landrover 141
Figure 4.14 Landrover
(6) White Car
The set of stereo images taken of a white car is the most demanding case. The two
vertical edges of the number plate are significant features for the matching process,
but this important information may be lost because the number plate has similar
intensity values to the surrounding paintwork. Nevertheless, the proposed algorithm
successfully finds the best match in the remaining region of the car. The results are
shown in Figure 4.15.
(a) White Car 186
(b) White Car 130
(c) White Car 68
Figure 4.15 White Car
(7) Mean Absolute Difference (MAD)
Image               Best Match (Left)    Best Match (Right)
Straight Car 196    (244, 106)           (226, 110)
Straight Car 81     (87, 105)            (42, 105)
Straight Car 38     (183, 108)           (82, 109)
Angled Car 137      (63, 110)            (41, 110)
Angled Car 110      (189, 101)           (156, 103)
Taxi 340            (143, 42)            (139, 46)
Taxi 222            (162, 119)           (139, 122)
Taxi 122            (113, 72)            (82, 72)
Taxi 82             (77, 174)            (26, 167)
Foreign Van 210     (286, 60)            (261, 46)
Foreign Van 148     (326, 30)            (298, 44)
Foreign Van 94      (117, 37)            (76, 38)
Landrover 380       (287, 174)           (268, 183)
Landrover 260       (299, 35)            (278, 48)
Landrover 176       (193, 28)            (167, 33)
Landrover 141       (57, 230)            (25, 224)
White Car 186       (313, 98)            (285, 112)
White Car 130       (146, 107)           (111, 124)
White Car 68        (187, 26)            (113, 30)
MAD column (values as extracted; their assignment to rows is not recoverable):
9, 16, 26, 7, 9, 16, 17, 16, 12, 10, 22, 8, 8, 10, 16, 12, 11, 8, 10
Figure 4.16 Mean Absolute Difference (MAD)
(8) Disparity (or X-offset)
Car Position    Actual Distance (cm)    Disparity (pixels)
Landrover       380                     19
Taxi            340                     xx
Landrover       260                     20
Taxi            222                     23
Foreign Van     210                     25
Straight Car    196                     18
White Car       186                     28
Landrover       176                     29
Foreign Van     148                     28
Landrover       141                     32
Angled Car      137                     22
White Car       130                     35
Taxi            122                     31
Angled Car      110                     33
Foreign Van     94                      41
Taxi            82                      51
Straight Car    81                      45
White Car       68                      74
Straight Car    38                      101
Figure 4.17 Disparity (or X-offset)
(9) Distance Determination Results
Actual Distance (cm)    Disparity (pixels)    Factor
380                     19                    72.20
340                     xx                    xx
260                     20                    52.00
222                     23                    51.06
210                     25                    52.50
196                     18                    35.28
186                     28                    52.08
176                     29                    51.04
148                     28                    41.44
141                     32                    45.12
137                     22                    30.14
130                     35                    45.50
122                     31                    37.82
110                     33                    36.30
94                      41                    38.54
82                      51                    41.82
81                      45                    36.45
68                      74                    50.32
38                      101                   38.38
Average Factor                                44.89
Figure 4.18 Average Factor
Actual Distance (cm)    Disparity (pixels)    Calculated Distance (cm)    Error (%)
380                     19                    236.26                      -37.83
340                     xx                    xx                          xx
260                     20                    224.45                      -13.67
222                     23                    195.17                      -12.09
210                     25                    179.56                      -14.50
196                     18                    249.39                      27.24
186                     28                    160.32                      -13.81
176                     29                    154.79                      -12.05
148                     28                    160.32                      8.32
141                     32                    140.28                      -0.51
137                     22                    204.05                      48.94
130                     35                    128.26                      -1.34
122                     31                    144.81                      18.70
110                     33                    136.03                      23.66
94                      41                    109.49                      16.48
82                      51                    88.02                       7.34
81                      45                    99.76                       23.16
68                      74                    60.66                       -10.79
38                      101                   44.45                       16.97
Figure 4.19 Calculated Distance and Percentage Error
4.2. Corner Detection Based Algorithm
We apply the proposed corner detection based algorithm to the database of car stereo
images. Twenty feature points are extracted from each image by Harris/Plessey corner
detection, and we then search for real matches among these points. The initial
database of Appendix 1 is tested first, and the experiments are then extended to the
additional database in Appendix 2.
4.2.1. Extraction of Feature Points
The results show that the cornerness map produced by Harris/Plessey corner detection
consists of very clear cornerness components, with sufficient contrast between each
component and its surroundings. As expected, most of the cornerness components are
detected in the area of the number plate. The cornerness map in fact contains a lot
of noise, which is invisible because of the large range of intensity values.
Thresholding the cornerness map removes most of the noise and retains the required
cornerness components. The proposed method of non-maximal suppression is then used
to obtain the local maxima, and the twenty points of highest intensity are extracted
by the system.
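The selection stage described above (thresholding, non-maximal suppression, then picking the strongest points) can be sketched as follows. This is a minimal illustration in Python rather than the project's own implementation; the function name `select_feature_points`, the 3×3 suppression neighbourhood and the toy cornerness map are assumptions made for the example.

```python
def select_feature_points(cornerness, threshold=10, max_points=20):
    """Threshold a cornerness map, keep local maxima, return the strongest points.

    cornerness: 2-D list of non-negative values (rows of equal length).
    Returns a list of (x, y) tuples sorted by decreasing cornerness.
    """
    height = len(cornerness)
    width = len(cornerness[0])
    candidates = []
    for y in range(height):
        for x in range(width):
            value = cornerness[y][x]
            # Thresholding: discard weak responses, which are mostly noise.
            if value < threshold:
                continue
            # Non-maximal suppression over a 3x3 neighbourhood (assumed size):
            # the point is dropped if any neighbour is strictly stronger.
            is_local_max = True
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    if (nx, ny) != (x, y) and cornerness[ny][nx] > value:
                        is_local_max = False
            if is_local_max:
                candidates.append((value, x, y))
    # Strongest responses first; keep at most max_points of them.
    candidates.sort(key=lambda c: -c[0])
    return [(x, y) for _, x, y in candidates[:max_points]]


# Toy cornerness map (hypothetical values, not taken from the project images).
toy_map = [
    [0, 2, 0, 0, 0],
    [2, 90, 3, 0, 0],
    [0, 3, 0, 40, 60],
    [0, 0, 0, 5, 8],
]
points = select_feature_points(toy_map, threshold=10, max_points=20)
```

In the toy map the response of 40 survives the threshold but is suppressed by its stronger neighbour of 60, so only two points are returned. Equal-valued neighbours on a plateau would all be kept by this sketch.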
4.2.2. Block Matching
The block matching process differs slightly from the one used in the edge
detection based algorithm. As a fixed number of points is obtained from each of the
two stereo images, we aim to extract all real matches. Each point in the left image
is mapped to its best matching point in the right image, and the five pairs of
points with the lowest MADs are chosen as the candidate matches. The match with the
minimal MAD is set as the benchmark, and the x offset of each candidate match is
compared with the x offset of this best match. If the difference is within the
acceptable range, the candidate match is considered to be a real match.
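The matching procedure above can be illustrated with a short Python sketch. It is not the project's code: the function names, the `tolerance` parameter standing in for the "acceptable range", and the synthetic image pair (a right image that is simply the left image shifted by two pixels) are assumptions chosen so that the example is self-contained.

```python
def block_mad(left, right, pl, pr, half):
    """Mean absolute difference between two (2*half+1)-square blocks."""
    (lx, ly), (rx, ry) = pl, pr
    total = 0
    count = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[ly + dy][lx + dx] - right[ry + dy][rx + dx])
            count += 1
    return total / count


def find_real_matches(left, right, left_pts, right_pts, half=1, keep=5, tolerance=2):
    """Keep the candidate matches whose x offset agrees with the best match."""
    pairs = []
    for pl in left_pts:
        # Best right-image partner (lowest MAD) for this left-image point.
        mad, pr = min((block_mad(left, right, pl, pr, half), pr) for pr in right_pts)
        pairs.append((mad, pl, pr))
    # The 'keep' pairs with the lowest MADs are the candidate matches.
    pairs.sort(key=lambda p: p[0])
    candidates = pairs[:keep]
    # The pair with the minimal MAD is the benchmark for the x offset.
    best_offset = candidates[0][1][0] - candidates[0][2][0]
    return [(pl, pr) for _, pl, pr in candidates
            if abs((pl[0] - pr[0]) - best_offset) <= tolerance]


# Synthetic stereo pair: the right image is the left image shifted by 2 pixels.
WIDTH, HEIGHT, SHIFT = 12, 7, 2
left_img = [[(3 * x * x + 17 * y + x * y) % 97 for x in range(WIDTH)]
            for y in range(HEIGHT)]
right_img = [[left_img[y][x + SHIFT] if x + SHIFT < WIDTH else 0
              for x in range(WIDTH)] for y in range(HEIGHT)]
left_points = [(4, 2), (7, 3), (9, 4), (1, 5)]  # the last point has no true partner
right_points = [(2, 2), (5, 3), (7, 4)]
matches = find_real_matches(left_img, right_img, left_points, right_points, half=1)
```

The fourth left-image point is still paired with its closest-looking partner, but its x offset differs from the benchmark offset by more than the tolerance, so it is not reported as a real match.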
4.2.3. Matching Results of Initial Database
Figures 4.25 to 4.28 show the results of processing the initial database of
Appendix 1. Figures 4.25 and 4.27 display the results for the image pairs named
‘front 58’ and ‘rear 67’, with the layout shown in Figure 4.20. For the remaining
data, only the image pairs which show the best match are presented. In our
experiments, a relatively small threshold level of 10 is selected to guarantee that
enough local maxima can be extracted, and twenty matching points are then picked out
according to their MAD values. The block size is set to 50×50 in the project.
Original Image (Left)    Original Image (Right)
Cornerness Map (Left)    Cornerness Map (Right)
Corners (Left)           Corners (Right)
Feature Points (Left)    Feature Points (Right)
Best Match (Left)        Best Match (Right)
Real Matches (Left)      Real Matches (Right)
Figure 4.20 Layout of the Results for Initial Database
(1) Mean Absolute Difference (MAD)
Image       MAD    Best Match
Front 265   34     (194, 64) (178, 74)
Front 165   26     (107, 97) (85, 105)
Front 94    15     (260, 125) (223, 136)
Front 58    12     (189, 206) (127, 215)
Rear 205    13     (114, 43) (102, 51)
Rear 152    14     (202, 85) (182, 95)
Rear 113    17     (239, 83) (212, 93)
Rear 67     14     (190, 37) (141, 47)
Rear 43     15     (257, 81) (180, 91)
Figure 4.21 Mean Absolute Difference (MAD)
(2) Disparity (or X-offset)
Car Position    Actual Distance (cm)    Disparity (pixels)
Front           265                     16
Rear            205                     12
Front           165                     22
Rear            152                     20
Rear            113                     27
Front           94                      37
Rear            67                      49
Front           58                      62
Rear            43                      77
Figure 4.22 Disparity (or X-offset)
(3) Distance Determination Results
Actual Distance (cm)    Disparity (pixels)    Factor
265                     16                    42.40
205                     12                    24.60
165                     22                    36.30
152                     20                    30.40
113                     27                    30.51
94                      37                    34.78
67                      49                    32.83
58                      62                    35.96
43                      77                    33.11
Average Factor 33.43
Figure 4.23 Average Factor
Figure 4.24 Calculated Distance and Percentage Error
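The way the factor, the calculated distance and the percentage error relate to one another can be sketched in Python as below. The relations used here (factor = actual distance × disparity / 100, calculated distance = 100 × average factor / disparity, error relative to the actual distance) are inferred from the tabulated values rather than quoted from the report, and the function names are hypothetical. In a standard stereo model the factor would play the role of the product of focal length and baseline, which is why it can be calibrated from images taken at known distances.

```python
# Corner-based results on the initial database (Figure 4.22):
# (actual distance in cm, disparity in pixels).
measurements = [(265, 16), (205, 12), (165, 22), (152, 20), (113, 27),
                (94, 37), (67, 49), (58, 62), (43, 77)]


def calibrate_factor(samples):
    """Average of distance * disparity / 100 over all calibration samples."""
    factors = [distance * disparity / 100 for distance, disparity in samples]
    return sum(factors) / len(factors)


def estimate_distance(disparity, factor):
    """Distance in cm predicted from a disparity in pixels (inverse relation)."""
    return 100 * factor / disparity


def percentage_error(estimated, actual):
    """Signed error of the estimate, as a percentage of the actual distance."""
    return 100 * (estimated - actual) / actual


average_factor = calibrate_factor(measurements)
for actual, disparity in measurements:
    estimated = estimate_distance(disparity, average_factor)
    print(actual, disparity, round(estimated, 2),
          round(percentage_error(estimated, actual), 2))
```

With the nine measurements above, the average factor agrees with the value of 33.43 given in Figure 4.23.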
Figure 4.25 Front 58
(a) Front 94  (b) Front 165  (c) Front 265
Figure 4.26 Front Car
Figure 4.27 Rear 67
(a) Rear 43  (b) Rear 113  (c) Rear 152  (d) Rear 205
Figure 4.28 Rear Car
4.2.4. Matching Results of Additional Database
The proposed algorithm is applied to the additional database of Appendix 2, and the
results are shown in Figures 4.29 to 4.34. Only the image pairs showing the best
match are displayed in each of these figures.
(1) Straight Car
(a) Straight Car 196  (b) Straight Car 81  (c) Straight Car 38
Figure 4.29 Straight Car
(2) Angled Car
(a) Angled Car 137  (b) Angled Car 110
Figure 4.30 Angled Car
(3) Taxi
(a) Taxi 340  (b) Taxi 222  (c) Taxi 122  (d) Taxi 82
Figure 4.31 Taxi
(4) Foreign Van
(a) Foreign Van 210  (b) Foreign Van 148  (c) Foreign Van 94
Figure 4.32 Foreign Van
(5) Landrover
(a) Landrover 380  (b) Landrover 260  (c) Landrover 176  (d) Landrover 141
Figure 4.33 Landrover
(6) White Car
(a) White Car 186  (b) White Car 130  (c) White Car 68
Figure 4.34 White Car
(7) Mean Absolute Difference (MAD)
Image MAD Best Match
Straight Car 196
Angled Car 137
Taxi 340
12
16
13
18
26
20
22
21
(146 , 125) (128 , 127)
(173 , 151) (127 , 152)
(242 , 216) (139 , 217)
(130 , 107) (105 , 108)
(250 , 38) (215 , 42)
(172 , 41) (168 , 47)
(133 , 116) (110 , 118)
(74 , 150) (38 , 147)
(75 , 145) (27 , 140)
Straight Car 81
Straight Car 38
Angled Car 110
Taxi 222
Taxi 122
Taxi 82
Foreign Van 210
Foreign Van 148
Foreign Van 94
Landrover 380
Landrover 260
Landrover 176
Landrover 141
White Car 186
13
35
21
19
(317 , 64) (310 , 79)
(201 , 69) (170 , 75)
(216 , 29) (174 , 36)
(326 , 136) (313 , 151)
(248 , 35) (228 , 46)
(84 , 173) (56 , 170)
(91, 87) (62 , 86)
15
17
24
White Car 130
White Car 68
12
13
7
(287 , 100) (259 , 111)
(190 , 66) (158 , 71)
(189 , 19) (135 , 23)
17
Figure 4.35 Mean Absolute Difference (MAD)
(8) Disparity (or X-offset)
Car Position    Actual Distance (cm)    Disparity (pixels)
Landrover       380                     xx
Taxi            340                     xx
Landrover       260                     20
Taxi            222                     23
Foreign Van     210                     xx
Straight Car    196                     18
White Car       186                     28
Landrover       176                     28
Foreign Van     148                     31
Landrover       141                     29
Angled Car      137                     25
White Car       130                     32
Taxi            122                     36
Angled Car      110                     35
Foreign Van     94                      42
Taxi            82                      48
Straight Car    81                      46
White Car       68                      54
Straight Car    38                      103
Figure 4.36 Disparity (or X-offset)
(9) Distance Determination Results
Actual Distance (cm)    Disparity (pixels)    Factor
380                     xx                    xx
340                     xx                    xx
260                     20                    52.00
222                     23                    51.06
210                     xx                    xx
196                     18                    35.28
186                     28                    52.08
176                     28                    49.28
148                     31                    45.88
141                     29                    40.89
137                     25                    34.25
130                     32                    41.60
122                     36                    43.92
110                     35                    38.50
94                      42                    39.48
82                      48                    39.36
81                      46                    37.26
68                      54                    31.32
38                      103                   39.14
Average Factor 39.49
Figure 4.37 Average Factor
Actual Distance (cm)    Disparity (pixels)    Calculated Distance (cm)    Error (%)
380                     xx                    xx                          xx
340                     xx                    xx                          xx
260                     20                    197.45                      -24.06
222                     23                    171.70                      -22.66
210                     xx                    xx                          xx
196                     18                    219.39                      11.93
186                     28                    141.04                      -24.17
176                     28                    141.04                      -19.86
148                     31                    127.39                      -13.93
141                     29                    136.17                      -3.43
137                     25                    157.96                      15.30
130                     32                    123.41                      -5.07
122                     36                    109.69                      -10.09
110                     35                    112.83                      2.57
94                      42                    94.02                       0.02
82                      48                    82.27                       0.33
81                      46                    85.85                       5.99
68                      54                    73.13                       7.54
38                      103                   38.34                       0.89
Figure 4.38 Calculated Distance and Percentage Error
Chapter 5. Discussion
The algorithms proposed in this project for depth estimation satisfactorily meet the
system requirements. So that these algorithms can be made robust enough to handle
more demanding situations and to adapt to different applications, this chapter
discusses some problems which remain to be solved and possible modifications to the
project.
5.1. Extraction of Edge-based Feature Points
The edge detection based algorithm combines vertical edge detection with the Hough
transform (HT) to extract the feature points. As edge detection is bound to respond
to noise, the template of the edge detector has to be chosen carefully; in practice,
a trade-off must be made based on the application requirements. The edge image
contains thick vertical edges that need to be processed before further use. On the
one hand, a morphological filter is applied to make the vertical edges thinner,
which prevents the Hough transform from detecting redundant vertical lines at the
same edge position. On the other hand, once the edges have been reduced to single
pixel lines, they may contain discontinuities. The edge points should be connected,
because we have to extract the top and bottom points of each edge. The Hough
transform is applied to assist in choosing the strong vertical edge features. In
this project, we are particularly interested in the vertical edges in the area of
the car number plate, and we assume that these edges are exactly vertical. In
reality, the vertical edges may deviate slightly from the vertical because of the
angle of view; this problem can be solved by allowing a range of deviation angles
for the vertical lines in the Hough transform.
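The idea of restricting the Hough transform to a range of near-vertical angles can be sketched as follows. This is an illustrative Python version under stated assumptions, not the project's implementation: lines are parameterised as rho = x cos(theta) + y sin(theta), so that theta = 0 corresponds to an exactly vertical line, and the function name, the one-degree angle step and the default deviation of 5 degrees are all hypothetical choices.

```python
import math
from collections import Counter


def near_vertical_lines(edge_points, max_deviation_deg=5, top=3):
    """Vote for lines whose normal angle lies within +/- max_deviation_deg of 0.

    edge_points: iterable of (x, y) edge pixel coordinates.
    Returns the 'top' strongest accumulator cells as ((rho, theta_deg), votes).
    """
    votes = Counter()
    for x, y in edge_points:
        # Only near-vertical orientations are considered, instead of 0..180 degrees.
        for theta_deg in range(-max_deviation_deg, max_deviation_deg + 1):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, theta_deg)] += 1
    return votes.most_common(top)


# Toy edge map: a vertical edge at x = 10 and a horizontal edge at y = 30.
edge_points = [(10, y) for y in range(60)] + [(x, 30) for x in range(20, 60)]
strongest = near_vertical_lines(edge_points, top=1)
```

Because only angles close to vertical receive votes, the horizontal edge in the toy data spreads its votes thinly over many cells and never competes with the vertical one.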
5.2. Extraction of Corner-based Feature Points
The corner detection based algorithm extracts the feature points by corner detection.
The major advantage of corner detection is that it directly detects the points of
interest. Compared with the proposed edge detection based algorithm, this approach is
hence insensitive to distortions resulting from changes in perspective. The
Harris/Plessey method is modified and used in this project; it produces a cornerness
map which contains pixels of large intensity values. A scaling operation must be
applied to the cornerness map to bring its intensities into the range 0 to 255.
Clearly, not all points in the cornerness map are true corners, and thresholding is
typically used to choose the true corners with large brightness values. In this
project, we proposed a method which combines thresholding with the selection of
max-intensity points. From this point of view, thresholding is only used to remove
apparent false corners and to reduce the computation cost, so the threshold level
has to be relatively small to retain enough candidate corners. The procedure of
selecting the max-intensity points is the core operation for extracting feature
points. Because of the scaling operation, many strong corners may have the same
intensity value (such as 255) in the new cornerness map, although they differ in the
original cornerness map. So that the selection operation can choose the real corners
according to their intensity values from maximum to minimum, we must extract enough
points from the new cornerness map to guarantee that all the strong corners are
selected.
5.3. Matching Algorithm
Block matching is considered a reasonable technique because of its computational
simplicity, and it operates successfully in the system. One of the major
difficulties in block matching is the selection of the block size. The block size
has a large influence on the stereo matching results: a large block size increases
the computational cost, while a small block size is sensitive to noise. Furthermore,
the border problem must be considered. When the feature points are located in the
border area of an image, a small block size should be chosen so that the MAD can be
calculated correctly. The border width is defined as half the side length of the
block. If feature points lie within the borders, blocks are not set on these points.
An adaptive block size could be used in practice.
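One possible form of such an adaptive block size is sketched below in Python. It is only an illustration of the idea, not something implemented in the project: the function name, the minimum usable half size and the 352×288 image dimensions in the usage lines are assumptions, and the preferred half size of 25 is chosen to approximate the 50×50 block used in the experiments.

```python
def adaptive_half_size(x, y, width, height, preferred_half=25, minimum_half=2):
    """Largest half block size, up to preferred_half, that stays inside the image.

    Returns None when the point is so close to the border that even a block of
    minimum_half cannot be placed, in which case the point is skipped.
    """
    # Distance from the point to the nearest image border, in pixels.
    room = min(x, y, width - 1 - x, height - 1 - y)
    half = min(preferred_half, room)
    return half if half >= minimum_half else None


# Hypothetical 352x288 image: an interior point, a point near the left border,
# and a point too close to the border to carry any block.
interior = adaptive_half_size(100, 100, 352, 288)
near_border = adaptive_half_size(10, 100, 352, 288)
too_close = adaptive_half_size(1, 100, 352, 288)
```

When two points are compared, the smaller of their two half sizes would have to be used for both blocks so that the MAD is computed over equally sized regions.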
5.4. Additional Problems
As can be seen from the images given in this project, some pairs of images are
misaligned. The misalignment of corresponding images has a large influence on the
block matching results, especially when the block size is relatively big. In our
experiments, the images can be rotated manually to align each pair of images.
Another major problem in depth estimation is that the camera focal length is unknown.
In this project, we approximate the focal length based on the available data. The
images are taken at measured distances, so the actual distance is known. As the
available database is not big enough, the estimate of the camera focal length is not
accurate.
Chapter 6. Conclusions
In this thesis, we have proposed two algorithms for stereo depth estimation, named
the edge detection based algorithm and the corner detection based algorithm. Both
operate satisfactorily in the system.
In the edge detection based algorithm, the Sobel edge operator is used to detect the
vertical edges in an image. Each of the available stereo images contains strong
vertical edges on the vehicle, especially in the region of the number plate. Compared
with detecting horizontal edges, detecting vertical edges is more reasonable in terms
of its better noise-suppression characteristics. Since there are many vertical edges
in the edge image, we have to choose a certain number of edges for feature
extraction. The criterion for selecting these edges is their strength. As the Hough
transform extracts lines by an evidence gathering approach in which the evidence is
the votes cast in an accumulator space [3], the vertical edges are thinned to single
pixel lines and a Hough transform is then used to extract strong edge features.
Finally, the top and bottom points of each extracted edge are taken as the required
feature points. This approach is computationally efficient and can correctly extract
points of interest from corresponding features in the two stereo images.
In the corner detection based algorithm, feature points are extracted directly by
corner detection. The Harris/Plessey approach, a combined corner and edge detector,
is used in this project. This corner detection method is simple and can successfully
detect corners which have sharp contrast to the background. The strong corners are
located particularly in the vehicle number plate, because this region contains many
junctions of edges. Corner detection detects many corners in an image, but we are
only interested in the corners with large enough brightness values; the remaining
corners are not considered to be true corners. Thresholding is applied as a
preprocessing step to remove the obvious false corners which have very small
intensity values, and the remaining non-zero points are considered to be true
corners. We sort these corners according to their intensity values from maximum to
minimum, and the first twenty (or more) corners are chosen as the feature points.
This algorithm is simple and robust enough to handle more demanding situations.
The block matching algorithm is implemented in this project to achieve
correspondence between two stereo images. In block matching, a block of fixed size
is centered on each extracted feature point in an image, and the intensity
difference is calculated between each block from the left image and each block from
the right image. The best match is determined by a similarity measure (such as MAD).
The two matching points that have the minimal MAD do not always give the best match.
For this reason, we select a sufficient number of pairs of matching points with
relatively small MADs, and then determine the real matches from the horizontal
disparity of the point positions.
According to our experiments, both algorithms operate successfully in the system to
find correspondences and estimate depth information. Even in more demanding
situations, such as misalignment of corresponding images, the system can overcome
the difficulties and give correct results.
Future work may involve modifying the proposed algorithms to improve the reliability
and robustness of the depth estimation approach. More stereo images need to be
acquired for further experiments. To implement these algorithms successfully in
practice, a system capable of real-time operation needs to be developed.
Acknowledgements
I would like to thank my supervisor Dr John Hannah for his choice of subject,
photography and assistance throughout the last three months. My thanks also go to
Paul Kuo and Dr Peter Hillman for their kind assistance with my project.
References
[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Second Edition,
Prentice Hall, 2002.
[2] D. H. Ballard and C. M. Brown, Computer Vision, Prentice Hall, 1982.
[3] M. S. Nixon and A. S. Aguado, Feature Extraction and Image Processing,
Newnes, 2002.
[4] U. R. Dhond and J. K. Aggarwal, “Structure from Stereo - A Review,” IEEE
Transactions on Systems, Man and Cybernetics, vol. 19, pp. 1489-1510, Dec.
1989.
[5] C. Harris and M. Stephens, “A Combined Corner and Edge Detector,” Proc.
Alvey Vision Conf., Univ. Manchester, pp. 147-151, 1988.
[6] N. W. Walton, “Generating Depth Maps from Stereo Image Pairs,” Ph.D. thesis,
University of Edinburgh, UK, 2002.
[7] A. Gyaourova, C. Kamath, and S. Cheung, “Block Matching for Object
Tracking,” Lawrence Livermore National Laboratory, 2003.
[8] M. Yu and Y. D. Kim, “An Approach to Korean License Plate Recognition Based
on Vertical Edge Matching,” IEEE Int. Conf. SMC, vol. 4, pp. 2975-2980, 2000.
[9] F. Candocia and M. Adjouadi, “A Similarity Measure for Stereo Feature
Matching,” IEEE Transactions on Image Processing, vol. 6, no. 10, Oct. 1997.
[10] A. M. Peacock, “Vision Systems Code Documentation,” University of Edinburgh,
UK, 2000.
[11] T. D. Duan, D. A. Duc, and T. L. H. Du, “Combining Hough Transform and
Contour Algorithm for Detecting Vehicles,” Proceedings of 2004 International
Symposium on Intelligent Multimedia, Video and Speech Processing, Oct. 2004.
[12] Y. Chen, Y. Hung, and C. Fuh, “Fast Block Matching Algorithm Based on the
Winner-Update Strategy,” IEEE Transactions on Image Processing, vol. 10, pp.
1212-1222, 2001.
[13] C. R. Jung and R. Schramm, “Rectangle Detection based on a Windowed Hough
Transform,” SIBGRAPI, pp. 113-120, 2004.
[14] Y. Yanamura, M. Goto, and D. Nishiyama, “Extraction and Tracking of the
License Plate Using Hough Transform and Voted Block Matching,” IEEE IV2003
Intelligent Vehicles Symposium Conference, 2003.
[15] H. Bai, J. Zhu, and C. Liu, “A Fast License Plate Extraction Method on Complex
Background,” IEEE 2003 International Conference on Intelligent Transportation
Systems, vol. 2, Oct. 2003.
[16] B. Fisher, HIPR2, University of Edinburgh, UK,
http://www.inf.ed.ac.uk/people/staff/Robert_Fisher.html
[17] B. Fisher, Introduction to Stereo Imaging - Theory, University of Edinburgh, UK,
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MARSHALL/node11.html
[18] D. Parks and J. Gravel, Corner Detection, McGill University, Canada,
http://www.cim.mcgill.ca/~dparks/index.htm
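Appendix 1. Initial Project Images
A.1.1. Front
Image pairs at 265, 165, 94 and 58 cm
A.1.2. Rear
Image pairs at 205, 152, 113, 67 and 43 cm
Appendix 2. Additional Project Images
A.2.1. Straight Car
Image pairs at 196, 81 and 38 cm
A.2.2. Angled Car
Image pairs at 137 and 110 cm
A.2.3. Taxi
Image pairs at 340, 222, 122 and 82 cm
A.2.4. Foreign Van
Image pairs at 210, 148 and 94 cm
A.2.5. Landrover
Image pairs at 380, 260, 176 and 141 cm
A.2.6. White Car
Image pairs at 186, 130 and 68 cm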