
Comparison of Dense Stereo Matching Metrics for Real Time Applications

Abstract— Stereo matching is one of the classical problems in computer vision: given two or more images of the same scene, compute the disparity map for a reference image. This work focuses on local stereo matching methods, which generally have low computational complexity and modest storage requirements and are therefore suitable for real-time and embedded implementations. Among the candidate classes of algorithms we select correlation-based stereo algorithms, because they combine sufficiently dense range maps with an algorithmic structure that, owing to the simplicity of the underlying computation, lends itself to fast implementation. We compare three block matching similarity measures for computing depth maps: the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC). The results show that NCC gives the closest match to the ground truth, with lower error and less noise than SAD and SSD.

Index Terms— Disparity Map, Epipolar Constraint, Stereo Correspondence, Stereo Vision.

I. INTRODUCTION

The word "stereo" comes from the Greek word "stereos", which means firm or solid. With stereo vision we see an object as solid in three spatial dimensions: width, height and depth, or x, y and z. It is the added perception of depth that makes stereo vision so rich. Stereo matching has been, and continues to be, one of the most active research topics in computer vision. The task of a stereo matching algorithm is to analyse the images taken by a stereo camera pair and to estimate the displacement between corresponding points in the two images, in order to extract depth information about the objects in the scene; depth is inversely proportional to this pixel displacement.

The displacement is measured in pixels and is called the disparity. Disparity values normally lie within a certain range, the disparity range, and the disparities of all image pixels together form the disparity map, which is the output of a stereo matching process. An example with the Teddy benchmark image set is shown in Fig. 1, where disparities are visualized as gray-scale intensities: the brighter the pixel, the closer the object is to the stereo cameras. The disparity map therefore encodes the depth of each pixel, and once depth has been inferred by stereo matching, the 3D scene can be reconstructed by triangulation. Because stereo matching provides depth information, it has many potential uses, including 3D reconstruction, stereoscopic TV, navigation systems and virtual reality.

Fig. 1. Example of a disparity map: (a) image taken by the left camera; (b) image taken by the right camera; (c) ground-truth disparity map associated with the left image.
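For a rectified stereo pair, the inverse relation between depth and disparity stated above is usually written Z = fB/d, where f is the focal length in pixels, B is the baseline between the two camera centres and d is the disparity in pixels. The sketch below assumes such a rectified pinhole setup; the function name and the focal length and baseline values are hypothetical, chosen only for illustration, and are not parameters of the Teddy data set.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (metres) of a point in a rectified stereo pair: Z = f * B / d.

    Assumes a pinhole model with parallel optical axes; the arguments used
    below are illustrative values, not taken from the Teddy data set.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or an invalid match).
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Doubling the disparity halves the depth: closer objects shift more.
print(depth_from_disparity(20.0, focal_px=700.0, baseline_m=0.1))  # 3.5
print(depth_from_disparity(40.0, focal_px=700.0, baseline_m=0.1))  # 1.75
```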

Many stereo algorithms exploit the epipolar constraint: for a pixel in the left image, the corresponding point in the right image must lie on a known line, the epipolar line, which for a rectified image pair is the same horizontal scanline. This strong constraint reduces the correspondence search from two dimensions to one, greatly shrinking the search space of the algorithms that compute depth maps.
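Assuming the image pair is rectified, the constraint means that a left-image pixel (x, y) can only match right-image pixels (x - d, y) on the same row, for disparities d in the disparity range. A minimal sketch of how this turns a 2D search into a short 1D one; the helper name and the image size in the comment are hypothetical.

```python
def candidate_matches(x: int, y: int, max_disparity: int) -> list[tuple[int, int]]:
    """Right-image positions that may correspond to left-image pixel (x, y).

    Assumes a rectified pair, so the epipolar line is row y, and that the
    right-image match lies at x - d for d in [0, max_disparity].
    """
    return [(x - d, y) for d in range(max_disparity + 1) if x - d >= 0]


# A hypothetical 450 x 375 image has 168,750 pixels, but with a disparity
# range of 0..59 each pixel has at most 60 candidate matches.
print(len(candidate_matches(100, 20, 59)))  # 60
print(candidate_matches(2, 5, 4))           # [(2, 5), (1, 5), (0, 5)]
```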

Over the past two decades many stereo matching algorithms have been proposed, and Scharstein and Szeliski [1] surveyed and evaluated them. In their taxonomy, stereo matching algorithms fall into two major types: local (area-based) methods and global (optimization-based) methods. In local methods, the disparity at a given pixel is computed from a similarity measurement performed in a finite window around it. The similarity is defined by a matching cost, and the costs within the local window are usually aggregated to give a more reliable and robust result. Global methods, on the other hand, define a global cost function and solve an optimization problem: they typically do not perform an aggregation step, but instead seek the disparity assignment that minimizes the global cost function.
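The aggregation step of a local method can be made concrete: a per-pixel matching cost is computed first and then summed over a square window. The sketch below aggregates a precomputed cost image with a box sum via a summed-area table, one common way to make the per-pixel aggregation cost independent of window size; it is not presented as the authors' implementation, and the function names and the border handling (windows clipped at the image edge) are assumptions.

```python
def summed_area_table(img: list[list[float]]) -> list[list[float]]:
    """s[y][x] = sum of img over rows < y and columns < x (extra zero row/column)."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def aggregate_box(cost: list[list[float]], radius: int) -> list[list[float]]:
    """Sum each pixel's cost over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(cost), len(cost[0])
    s = summed_area_table(cost)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Four table lookups per pixel, whatever the window size.
            out[y][x] = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
    return out


cost = [[1.0] * 5 for _ in range(5)]  # illustrative uniform per-pixel cost
agg = aggregate_box(cost, radius=1)   # 3x3 window
print(agg[2][2], agg[0][0])           # 9.0 4.0
```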

In this work we focus on local stereo matching methods, which generally have low computational complexity and modest storage requirements and are therefore suitable for real-time and embedded implementations.

Merlin George, Student Member, IEEE, and Rejimol Robinson R.R


II. BLOCK MATCHING

Block matching is one of the most popular local methods because it is simple to implement. The basic idea of block matching for stereo correspondence is as follows: to estimate the disparity of a point in the left image, we define a reference block surrounding this point and then search, within a given range in the right image, for the block that matches it most closely according to a pre-specified matching criterion; the relative displacement between the reference block and the best-matching block is the disparity of the point being evaluated. In this work, the matching criteria compared are the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC).
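The block matching procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: images are assumed to be rectified grayscale lists of rows, the window is (2r+1) x (2r+1), pixels near the border are simply left at disparity 0, and the names `sad` and `block_match` as well as the synthetic test images are hypothetical.

```python
from typing import Callable

Image = list[list[float]]  # grayscale image indexed as img[y][x]


def sad(left: Image, right: Image, x: int, y: int, d: int, r: int) -> float:
    """SAD between the window centred at (x, y) in `left` and at (x - d, y) in `right`."""
    return sum(
        abs(left[y + j][x + i] - right[y + j][x - d + i])
        for j in range(-r, r + 1)
        for i in range(-r, r + 1)
    )


def block_match(left: Image, right: Image, max_disp: int, r: int,
                cost: Callable[[Image, Image, int, int, int, int], float] = sad
                ) -> list[list[int]]:
    """Winner-take-all block matching with a lower-is-better cost.

    Pixels closer than r to the border are left at disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_d, best_cost = 0, float("inf")
            # Search along the same row (epipolar line of a rectified pair),
            # only over disparities whose window stays inside the right image.
            for d in range(min(max_disp, x - r) + 1):
                c = cost(left, right, x, y, d, r)
                if c < best_cost:
                    best_d, best_cost = d, c
            disp[y][x] = best_d
    return disp


# Synthetic rectified pair: the left view is the right view shifted by 2 pixels.
right = [[float(x * x + y) for x in range(10)] for y in range(5)]
left = [[float((x - 2) ** 2 + y) for x in range(10)] for y in range(5)]
disp = block_match(left, right, max_disp=4, r=1)
print(disp[2])  # [0, 0, 1, 2, 2, 2, 2, 2, 2, 0]
```

Pixels near the left edge cannot reach the true disparity because the search range is clipped there, which is why they come out as 0 and 1. A similarity score such as NCC would need to be negated before being passed as `cost` to this lower-is-better loop.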

Normalized Cross-Correlation (NCC) is the standard statistical method for measuring similarity. Because it normalizes both the mean and the variance of each window, it is relatively insensitive to radiometric gain and bias, that is, to intensity changes of the form I' = aI + b with a > 0 between the two images. The Sum of Squared Differences (SSD) is computationally simpler than cross-correlation and can be normalized as well. Many variations of NCC and SSD with different normalization schemes have also been used; one popular example is the Sum of Absolute Differences (SAD), which is often chosen for its computational efficiency [3].
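To illustrate the insensitivity to gain and bias: if a window's intensities are transformed as I' = aI + b with a > 0 (for example, because one camera is brighter than the other), the SAD between the original and transformed windows grows, while zero-mean NCC stays at 1. The helper names and intensity values below are illustrative.

```python
import math


def sad(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences of two equally sized, flattened windows."""
    return sum(abs(p - q) for p, q in zip(a, b))


def ncc(a: list[float], b: list[float]) -> float:
    """Zero-mean normalized cross-correlation of two equally sized windows."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [p - ma for p in a]
    db = [q - mb for q in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0  # flat windows carry no texture


window = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]  # flattened 3x3
brighter = [1.5 * v + 20.0 for v in window]  # same surface, gain 1.5, bias 20

print(sad(window, brighter))  # 405.0: every pixel differs
print(ncc(window, brighter))  # 1.0: perfect correlation
```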

III. MATCHING METRICS

This work compares three block matching similarity measures for computing depth maps: the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC). Their definitions are given in Table I.

A. Sum of Absolute Differences (SAD)

SAD is one of the simplest similarity measures. It is computed by subtracting corresponding pixels of the reference image I1 and the target image I2 within a square neighbourhood, aggregating the absolute differences over the window, and then choosing, with the winner-take-all (WTA) strategy, the disparity with the lowest aggregated cost [1]. If the two windows match exactly, the SAD is zero.
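On a single scanline, the SAD cost of one pixel can be tabulated for every candidate disparity, and the WTA step then reduces to taking the minimum; the cost is exactly zero where the windows match. A minimal sketch, assuming a 1 x (2r+1) window, a hypothetical helper `sad_costs` and made-up intensity values:

```python
def sad_costs(left_row: list[float], right_row: list[float], x: int, r: int,
              max_disp: int) -> list[float]:
    """SAD cost of pixel x for each disparity 0..max_disp along one scanline.

    Uses a 1 x (2r+1) window; assumes x - max_disp - r >= 0 and x + r < len(row).
    """
    return [
        sum(abs(left_row[x + i] - right_row[x - d + i]) for i in range(-r, r + 1))
        for d in range(max_disp + 1)
    ]


right_row = [5.0, 9.0, 2.0, 7.0, 4.0, 8.0, 1.0, 6.0]
left_row = [0.0, 0.0] + right_row[:-2]  # illustrative scene shifted by 2 pixels
costs = sad_costs(left_row, right_row, x=5, r=1, max_disp=3)
best = min(range(len(costs)), key=costs.__getitem__)  # winner-take-all
print(costs, best)  # [6.0, 12.0, 0.0, 15.0] 2
```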

B. Sum of Squared Differences (SSD)

In SSD, the differences are squared and aggregated within a square window, and the disparity is again selected with the WTA strategy. SSD has a higher computational cost than SAD because it requires a multiplication for every pixel in the window.
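Besides the extra multiplications, squaring changes how differences are weighted: a single large difference, such as one caused by an occluded or noisy pixel, dominates SSD much more than SAD. The toy difference vectors below are made up for illustration.

```python
def sad(diffs: list[float]) -> float:
    """SAD of a window, given its per-pixel differences."""
    return sum(abs(v) for v in diffs)


def ssd(diffs: list[float]) -> float:
    """SSD of a window: one multiplication per pixel in the window."""
    return sum(v * v for v in diffs)


spread = [2.0, 2.0, 2.0, 2.0]   # small differences everywhere in the window
outlier = [0.0, 0.0, 0.0, 8.0]  # one badly mismatched pixel (illustrative)

print(sad(spread), sad(outlier))  # 8.0 8.0
print(ssd(spread), ssd(outlier))  # 16.0 64.0
```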

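TABLE I

BLOCK MATCHING METRICS USED FOR COMPARISON

Match metric                           Definition

Sum of Absolute Differences (SAD)      $SAD(x,y,d) = \sum_{(i,j) \in W(x,y)} \lvert I_1(i,j) - I_2(i-d,j) \rvert$

Sum of Squared Differences (SSD)       $SSD(x,y,d) = \sum_{(i,j) \in W(x,y)} \left( I_1(i,j) - I_2(i-d,j) \right)^2$

Normalized Cross-Correlation (NCC)     $NCC(x,y,d) = \dfrac{\sum_{(i,j) \in W(x,y)} \left( I_1(i,j) - \bar{I}_1 \right) \left( I_2(i-d,j) - \bar{I}_2 \right)}{\sqrt{\sum_{(i,j) \in W(x,y)} \left( I_1(i,j) - \bar{I}_1 \right)^2 \sum_{(i,j) \in W(x,y)} \left( I_2(i-d,j) - \bar{I}_2 \right)^2}}$

Here W(x,y) is the square window centred on pixel (x,y) of the reference image $I_1$, i is the column index and j the row index, d is the candidate disparity, and $\bar{I}_1$, $\bar{I}_2$ are the mean intensities of the reference window and of the target window shifted by d.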

C. Normalized Cross-Correlation (NCC)

NCC is computationally more expensive than both SAD and SSD, since every window evaluation involves multiplications, a division and a square root. Unlike SAD and SSD, which are costs to be minimized, NCC is a similarity score, so the WTA step selects the disparity with the highest NCC. However, the results show that NCC gives the best disparity map of the three measures.
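The NCC definition in Table I can be sketched directly; the comments mark the multiplications, the square root and the division that make NCC costlier than SAD and SSD, and the WTA step takes the maximum because NCC is a similarity. This is a minimal pure-Python sketch with hypothetical names, not the authors' code; the synthetic images (a quadratic intensity ramp, with the left view shifted by 2 pixels and given gain 2 and bias 5) are made up for illustration.

```python
import math

Image = list[list[float]]  # grayscale image indexed as img[y][x]


def ncc(left: Image, right: Image, x: int, y: int, d: int, r: int) -> float:
    """Zero-mean NCC between the window at (x, y) in `left` and at (x - d, y) in `right`."""
    offs = [(i, j) for j in range(-r, r + 1) for i in range(-r, r + 1)]
    a = [left[y + j][x + i] for i, j in offs]
    b = [right[y + j][x - d + i] for i, j in offs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)             # window means
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))  # multiplications
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    den = math.sqrt(var_a * var_b)                        # square root
    return num / den if den > 0 else 0.0                  # division


def best_disparity_ncc(left: Image, right: Image, x: int, y: int,
                       r: int, max_disp: int) -> int:
    """WTA for a similarity score: keep the disparity with the HIGHEST NCC."""
    return max(range(max_disp + 1), key=lambda d: ncc(left, right, x, y, d, r))


right = [[float(x * x + y) for x in range(10)] for y in range(5)]
# Left view: same scene shifted by 2 pixels, with gain 2 and bias 5.
left = [[2.0 * ((x - 2) ** 2 + y) + 5.0 for x in range(10)] for y in range(5)]
print(best_disparity_ncc(left, right, x=5, y=2, r=1, max_disp=4))  # 2
```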

IV. RESULTS AND DISCUSSIONS

In this section we present experimental results on the Teddy stereo pair, with ground truth, from the Middlebury Stereo Vision page [10]. The Teddy pair was chosen because it is rich in depth discontinuities. SAD is easier and faster to compute than SSD and NCC, but Table II shows that NCC gives a more accurate disparity map than both: its error is the lowest of the three metrics for every window size tested (3x3, 5x5 and 7x7), and for all three metrics the error decreases as the window grows. NCC also reduces the error and noise of the disparity map, since its computation averages over the whole window. The disparity maps in Table III show that NCC provides a close match to the ground truth, with less of the noise produced by SAD and SSD.
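The error measure behind Table II is not specified here. As one common alternative, Scharstein and Szeliski [1] evaluate a disparity map against ground truth with the RMS disparity error and the fraction of bad pixels, i.e. pixels whose disparity error exceeds a threshold δ. A minimal sketch of both measures, with hypothetical function names and made-up disparities; it is not necessarily the measure used for Table II.

```python
import math


def rms_error(disp: list[list[float]], gt: list[list[float]]) -> float:
    """Root-mean-square disparity error over all pixels."""
    diffs = [d - g for row_d, row_g in zip(disp, gt) for d, g in zip(row_d, row_g)]
    return math.sqrt(sum(e * e for e in diffs) / len(diffs))


def bad_pixel_rate(disp: list[list[float]], gt: list[list[float]],
                   delta: float = 1.0) -> float:
    """Fraction of pixels whose disparity error exceeds `delta` pixels."""
    errs = [abs(d - g) for row_d, row_g in zip(disp, gt) for d, g in zip(row_d, row_g)]
    return sum(e > delta for e in errs) / len(errs)


gt = [[2.0, 2.0], [5.0, 5.0]]
est = [[2.0, 3.0], [5.0, 9.0]]  # illustrative: one pixel off by 1, one off by 4
print(rms_error(est, gt))       # sqrt(17 / 4) ≈ 2.06
print(bad_pixel_rate(est, gt))  # 0.25 (only the pixel off by 4 exceeds delta = 1)
```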

TABLE II

COMPARATIVE PERFORMANCE OF ALGORITHMS ON TEDDY STEREO IMAGE PAIR

Image    Method    Error (3x3)    Error (5x5)    Error (7x7)
Teddy    SAD       4.3420e+004    4.3151e+004    4.3029e+004
Teddy    SSD       4.3286e+004    4.3076e+004    4.2977e+004
Teddy    NCC       4.2908e+004    4.2502e+004    4.2398e+004

TABLE III

DISPARITY MAP COMPARISON OF TEDDY STEREO IMAGE PAIR

[Disparity maps computed with SAD, SSD and NCC (rows) for window sizes 3x3, 5x5 and 7x7 (columns).]

V. CONCLUSIONS AND FUTURE WORK

In general, SAD is easier to compute and less sensitive to outliers than squared-difference measures such as SSD, and stereo matching by SAD correlation has proven a robust and reliable tool in moderately complex environments. This work shows that Normalized Cross-Correlation (NCC) provides a close match to the ground truth, and that its computed error is lower than that of the Sum of Absolute Differences (SAD) and the Sum of Squared Differences (SSD) for every window size tested. However, NCC takes much longer to compute than SAD and SSD, so our future work in this area is to develop an efficient NCC-based stereo matching algorithm that runs faster than conventional NCC.

REFERENCES

[1] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1, pp. 7–42, 2002.

[2] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1, pp. 7–42, 2002.

[3] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in computational stereo," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 8, pp. 993–1001, 2003.

[4] E. Salari and J. Strong, "On the reliability of correlation based stereo matching," in Proc. IEEE Int. Conf. Systems Engineering, 1990, pp. 559–561.

[5] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: Theory and experiments," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920–932, 1994.

[6] S. T. Barnard and M. A. Fischler, "Computational stereo," ACM Computing Surveys, vol. 14, pp. 553–572, 1982.

[7] S. Birchfield and C. Tomasi, "Depth discontinuities by pixel-to-pixel stereo," Technical Report STAN-CS-TR-96-1573, Stanford Univ., 1996.

[8] O. Faugeras, B. Hotz, H. Matthieu, T. Vieville, Z. Zhang, P. Fua, E. Theron, L. Moll, G. Berry, J. Vuillemin, P. Bertin, and C. Proy, "Real time correlation-based stereo: Algorithm, implementations and applications," INRIA Technical Report 2013, 1993.

[9] S. Birchfield and C. Tomasi, "Depth discontinuities by pixel-to-pixel stereo," in Proc. IEEE Int. Conf. Computer Vision, 1998, pp. 1073–1080.

[10] http://vision.middlebury.edu/stereo/data/...

Merlin George received the B.Tech degree in Computer Science and Engineering from M.G University, Kottayam, in 2006 and is currently an M.Tech student in Computer Science and Engineering at Kerala University, Thiruvananthapuram. Her fields of interest include stereo matching, 3D reconstruction and computational photography. She is a student member of the IEEE.

Rejimol Robinson R.R received the B.Tech degree in Computer Science and Engineering from the University of Kerala in 1999 and the M.Tech degree in Computer Science, with specialization in Digital Image Computing, from the same university in 2007. She is currently a Senior Lecturer in Computer Science and Engineering at the University of Kerala. Her research interests include digital image processing, pattern recognition, network security and intrusion detection systems.