CENTER FOR MACHINE PERCEPTION, CZECH TECHNICAL UNIVERSITY
RESEARCH REPORT, ISSN 1213-2365

The CMP Evaluation of Stereo Algorithms

Jana Kostková, Jan Čech, Radim Šára
{kostkova, cechj, sara}@cmp.felk.cvut.cz

CTU–CMP–2003–01, January 29, 2003

Available at ftp://cmp.felk.cvut.cz/pub/cmp/articles/kostkova/Kostkova-TR-2003-01.pdf

This research was supported by the Grant Agency of the Czech Republic under project GACR 102/01/1371 and by the Czech Ministry of Education under project MSM 212300013.

Research Reports of CMP, Czech Technical University in Prague, No. 1, 2003

Published by Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, 166 27 Prague 6, Czech Republic. Fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz



Contents

1 Introduction
  1.1 Related Work
2 Types of Errors
3 Experimental Setup
  3.1 Test scene
  3.2 Ground-Truth
    3.2.1 Object segmentation
    3.2.2 Disparity map composition
  3.3 Experimental data acquisition
4 Evaluation of the errors
5 Experiments
  5.1 Evaluated Algorithms
  5.2 Evaluation
6 Conclusion
A Simulation of the glass reflectance


Abstract

Detailed study of stereo matching algorithm properties and behaviour under varying conditions is crucial for algorithm improvement and development. Towards this purpose we have designed a ground-truth evaluation method focusing on algorithm failure due to insufficient signal-to-noise ratio, since data uncertainty is never totally avoidable. For a complete evaluation, nine types of error statistics have been defined. The errors are focused on basic matching failure mechanisms and their definitions observe the principles of orthogonality, symmetry, completeness and independence.

The test scene consists of a background plane and five thin stripes in the foreground. It has been captured under 20 different levels of texture contrast, ten times at each level with a randomly shifted texture. The scene has been designed to preserve ordering. The ground truth has been obtained semi-manually.

The algorithms are evaluated on all 200 images and the results are shown as disparity maps and as plots depicted for each error separately. The disparity maps allow visual inspection, while the plots give quantitative results on algorithm performance. In this paper, we have tested four state-of-the-art stereo algorithms and demonstrated on their evaluation how to interpret the results. From our experimental analysis we conclude that for the view prediction application the best choice is the Graph Cuts algorithm, while for structure reconstruction it is Confidently Stable Matching.

1 Introduction

Stereo vision has been one of the most investigated topics in computer vision for more than four decades [9], resulting in an enormous number of publications dedicated to this topic. These mainly focused on a correct formulation of the stereo matching problem and on finding its solution efficiently. Consequently, the demand for algorithm evaluation, examination and comparison arose with the increasing number of approaches.

In this paper we put the emphasis on dense area-based binocular stereo matching. Although there exists a very large number of stereo methods, not much work has been done on stereo algorithm evaluation. Only a few attempts at designing methodologies for algorithm comparison have been published recently; the significant ones are discussed in the next section. However, to our knowledge none has been designed for the purpose of studying and examining the algorithms themselves. We consider a methodology allowing detailed algorithm study to be essential for algorithm development.

Our goal is to design an evaluation methodology which enables not only the ranking of stereo algorithms, but also the study of matching error mechanisms with the aim of supporting stereo algorithm development. It is therefore necessary to design a set of experimental data where each individual experiment targets a specific cause of error while other causes are excluded. Our method is based on known ground truth and we focus on failure due to insufficient signal-to-noise ratio, because data uncertainty is never totally


avoidable. Error mechanisms can then be studied in detail to discover specific weaknesses of various algorithms.

The paper is organized as follows: the next section reviews the relevant publications. Then, the definitions of the basic matching error types used for the evaluation are proposed. In Sec. 3, the test scene and the whole experimental setup are described. Results on four different algorithms are given in Sec. 5 to demonstrate the utility of this methodology in (1) discovering not only basic but also small differences among the tested algorithms, and (2) allowing detailed study of algorithm properties and behaviour. The last section concludes the work.

The aim of this paper is not to evaluate all available matching algorithms but to show that this is possible with the proposed method. Our intention is to make the data and the evaluation code public at our web site. If the response is positive, a larger evaluation study will become possible.

1.1 Related Work

The existing evaluation approaches fall into two main classes. Class 1 methods do not use any ground-truth information; they are based on a sequence of images. They use predicted matches obtained from an image subsequence and either evaluate their mutual consistency [12] or validate them in independent images [24].

The self-consistency method [12] works by first dividing the image set into overlapping subsets. The matching problem is solved for each subset and the correspondence information is used to reconstruct a spatial point. A statistic is defined on mutual distances of points reconstructed from different subsets. Points are uniquely identified by their projected coordinates in an image common to the subsets.

The prediction error method [24] partitions the set of images into prediction and validation sets. Matches from the prediction set are transferred to validation images and residual disparity vectors are computed for each predicted point. The prediction error is a statistic defined on the combination of image similarity and residual disparity vector magnitudes. This method may fail to recognize structural errors due to repetitive patterns in the images.

The Class 1 methods are suitable when the data is given and the experimental procedure cannot be re-designed. They can work with very complicated scenes where ground truth would be impossible to obtain. Complicated scenes, on the other hand, do not allow distinguishing errors due to half-occlusions, repetitive structures, low signal-to-noise ratio, surface non-Lambertianity, and systematic errors on occlusion boundaries. Besides these reasons, there is one more objection, probably the most limiting one, to using these methods for evaluation: although the prediction ability for the images may be good, the disparity map itself can be completely erroneous [6].

Class 2 methods are based on ground truth. Ground truth is either obtained from independent measurement (using a range-finder [15], ground control points [5] or a digital elevation model [4]) or semi-manually with the help of a strong prior model (e.g., piecewise


planarity). Recently, the most often used dataset of the second type has been the Lab Scene from the University of Tsukuba [20, 21]. An overview of less often used datasets is given in [24].

Ground truth from independent measurement must be obtained by a method at least an order of magnitude more accurate than stereo, which is not always possible. In principle, complicated scenes can be measured as long as their complexity does not hinder the accuracy of the independent method. Semi-manual ground truth must be obtained from independent images, not from the data set itself. Nine images were used for the Lab Scene [20], for instance.

There have been several medium- to large-scale efforts to evaluate stereo matching algorithms in a systematic way [11, 8, 1, 25, 23].

In [11], ten different stereo algorithms were re-implemented and evaluated. Their comparison was based on the number of correctly matched pixels and thus measures only the overall quality of the matching. The choice of the test images is not focused on any particular application, nor is it motivated or documented.

In [8], performance evaluation was focused on a cartographic feature extraction application, which strongly influenced the test data selection. Only two algorithms based on different matching techniques (area-based and feature-based approaches) were tested.

For a long time the unsurpassed evaluation study was the JISCT effort [1]. It is also application oriented1 but methodologically very advanced. On a large set of 44 different stereo images from real complex scenes, various approaches from different groups (INRIA, SRI, Teleos) were statistically evaluated based on three types of errors: false negatives, false positives and mismatches (using our terminology). Ground-truth information was provided manually. A certain weakness of this study is that the ground truth is only partial and that the method is not focused on selected error mechanisms.

The second attempt to design an evaluation methodology for contemporary stereo algorithms was published in [25]. Two different evaluation methods, comparison with ground truth and prediction error [24], were applied to a few stereo algorithms (such as area-based correlation methods, MLMHV [3], graph cuts [2], and a cooperative algorithm [26]). This study has been employed as the core of a large evaluation effort [23].

The nowadays widely used evaluation study is [23]. The authors selected four different stereo scenes to establish an evaluation test set. The evaluation methodology is based on ground-truth comparison. Two different statistics have been proposed: the percentage of bad matching correspondences and the root-mean-squared error. Textureless, occluded and depth-discontinuity regions have been selected in order to support detailed algorithm comparison. The authors implemented a few stereo algorithms and published the results and a taxonomy of these algorithms, together with free evaluation code, at their evaluation web page [22]. Other stereo researchers are asked to run their algorithms on the test set and to contribute their results to a taxonomy of contemporary stereo approaches. This study is a very well designed methodology for the overall ranking and comparison of stereo algorithms. Nevertheless, it does not offer any ability for

1 Motivated by the development of a stereo vision module for an unmanned ground vehicle.


studying the algorithms' behaviour in detail in order to discover their specific weaknesses. First, it gives no mechanism for parameter setting; the authors set the parameters of their methods somehow (in the worst case separately for each experiment), which leads to non-comparable results. Second, the half-occluded regions are excluded from the considered errors, so the experiments say nothing about how well the algorithms handle occlusions. Third, although the textureless regions are monitored, the experiments have no potential for detailed algorithm investigation in these areas. The textureless regions are only segmented from the input images, so the experiments can only say whether correspondences are or are not established there. No possibility of studying the improvement as a function of varying signal-to-noise ratio is offered. Fourth, the selected scenes are not very convenient for examining algorithm properties. Three of them consist of objects/planes slanted at different angles with various textures. The last one, the Lab scene, on the other hand, represents a very complex and difficult scene. In contrast, there is no scene dedicated to repetitive patterns, and consequently it is impossible to study algorithm failure due to structural ambiguity at all.

2 Types of Errors

In stereoscopic vision, there exist three kinds of elementary matching errors in a disparity map:

1. False positives, i.e. matches found in half-occluded regions,

2. False negatives, i.e. matches missing in binocularly visible regions (holes),

3. Mismatches, i.e. matches in binocularly visible regions where the difference from the ground truth is greater than 1.
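As a concrete illustration, the three traditional error counts can be computed from a dense disparity map as follows. This is a minimal sketch, not the report's code: the array names, the NaN convention for unmatched pixels, and the `visible` mask are our own assumptions.

```python
import numpy as np

def traditional_errors(d, d_gt, visible):
    """Traditional per-pixel error counts for a disparity map d against
    ground truth d_gt, both referenced to one image. `visible` marks
    binocularly visible pixels; NaN in d marks 'no match found'."""
    matched = ~np.isnan(d)
    false_pos = matched & ~visible                      # matches in half-occlusions
    false_neg = ~matched & visible                      # holes in visible regions
    mismatch = matched & visible & (np.abs(d - d_gt) > 1)
    return int(false_pos.sum()), int(false_neg.sum()), int(mismatch.sum())
```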

Although these errors are traditionally used for stereo algorithm comparison and evaluation, they are not correctly defined. The definitions are tied to the selected reference image, and thus the errors are neither symmetric nor independent (one error might imply another). Based on these errors it is impossible to study algorithm behaviour or specific properties (it is even impossible to tell whether the algorithm produces a matching at all). Consequently, they are not very convenient for algorithm study and evaluation.

In our methodology, we are able to evaluate arbitrary results, even those which are not one-to-one matchings (such as winner-takes-all). However, we assume the evaluated stereo algorithms produce results which are matchings; other artefacts, such as false negatives due to the ordering constraint, false positives, etc., are classified as errors. To summarize, the following four principles are important in designing the evaluation method:

1. Orthogonality. One error must not influence (or imply) any other error. This ensures the errors are unrelated and each of them measures a specific property.



Figure 1: Various types of scene structure. "Fly" (left) represents a small obstacle in front of a plane, where the ordering is violated. "Elephant" (center) represents a large obstacle in front of the plane, where the ordering is preserved. "Key hole" (right) represents a scene observed through a small (key) hole; this type of scene structure causes mutually occluded regions in the images.

2. Symmetry. The errors have to be invariant to the selection of the reference image, which guarantees the errors are equivalent for both views.

3. Completeness. The error definitions have to be valid and correct for all the main types of scene structure, which are shown in Fig. 1. This guarantees the errors are usable for arbitrary scenes.

4. Algorithm independence. Completely dense disparity maps and one-to-one matchings are not enforced. Allowing arbitrary matching density makes it possible to evaluate algorithms that do not produce dense disparity maps. The comparability of the results is guaranteed by normalizing by the matching size.

In general, stereo matching approaches assume the input images are rectified such that the epipolar lines coincide with corresponding image rows, so the matching is performed only between corresponding image rows. Consequently, we can define the matching errors on the matching table of one row, which is the Cartesian product of all possible matches between the selected corresponding image rows. The scene designed for the evaluation, described in detail in Sec. 3, consists of a background object and a foreground object, see Figs. 4, 5. A sketch of the matching table is shown in Fig. 2, where the broken diagonal line represents the background, and the short line represents the foreground (the part which is repeated five times in the scene, see Fig. 5).

Under the above assumptions and principles, let us now define the evaluation errors. Having the ground truth, we assume the half-occluded regions are identified. The error definitions make use of the areas illustrated in Fig. 2.

Let P be the matching table for the specified image row; the length of its main diagonal (for the case the table P is not square) is D(P). Let G ⊂ P be the ground truth in the



Figure 2: Matching tables for the error definitions: the specific regions in the matching table P (left), and the areas of error occurrence (right).


Figure 3: The inhibition zone FX(p) ⊂ P corresponding to uniqueness and ordering. The boundary of the zone is included, the pair p itself is excluded: p ∉ FX(p).

matching table P, G = GF ∪ GB, GF ∩ GB = ∅, where GF represents the part of the ground truth corresponding to the foreground object, and GB the part corresponding to the background object. The size of G, |G|, is the number of pairs in the ground truth. Let M be the matching produced by the examined algorithm; its size |M| is the number of pairs in the matching. Let ZG(p) be the inhibition zone for a pair p, which represents the constraints the test scene conforms to. The definition of the inhibition zone requires that any pair q ∈ ZG(p) cannot be in the matching M if p ∈ M, and vice versa. In our case the test scene conforms to uniqueness and ordering, therefore the inhibition zone ZG(p) = FX(p) consists of the pairs shown in Fig. 3.
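Under uniqueness and ordering, membership of a pair q in the inhibition zone FX(p) can be tested directly on matching-table coordinates. A minimal sketch; the function name and the (left, right) tuple encoding of table cells are our own:

```python
def in_inhibition_zone(p, q):
    """True iff pair q lies in the inhibition zone FX(p) of pair p under
    the uniqueness and ordering constraints. Pairs are (left column,
    right column) cells of the matching table; p itself is excluded."""
    (i, j), (k, l) = p, q
    if (k, l) == (i, j):
        return False                 # p is not in its own zone
    if k == i or l == j:
        return True                  # uniqueness: same left or right pixel
    return (k - i) * (l - j) < 0     # ordering: left/right order disagrees
```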

For the complete algorithm evaluation it is important to study algorithm properties also in specific scene regions (such as the occlusion boundary and non-textured areas); thus, the following areas in the matching table P have been defined: let C ⊂ P be the intersection of the half-occluded regions, S ⊂ P the textureless area, and O ⊂ P the


union of the 5 × 5 neighbourhoods of all occlusion boundaries in the disparity space. We define O = OF ∪ OB, OF ∩ OB = ∅, where OF represents the part corresponding to the foreground object, and OB the part corresponding to the background object. The sizes of the defined areas, D(C), D(S) and D(O), are the diagonal lengths of the respective areas (in pixels).

The following types of erroneous correspondences were considered (cf. Figs. 2(a), 2(b)):

Mismatch (MI) is the number of pairs p ∈ M that are matched incorrectly, i.e. p ∉ G. This error is evaluated in textured binocularly visible areas, i.e. excluding the regions C, S, and O:

MI = |{p ∈ P | p ∈ (M \ G \ C \ S \ O)}|.

False Negative (FN) is the number of unmatched ground-truth pairs p ∈ G, p ∉ M. This error counts proper holes, which are independent of mismatches (i.e. such pairs p ∈ G that for every q ∈ M, q ∉ ZG(p)):

FNx = |{p ∈ P | p ∈ (Gx \ M \ S \ O), ZG(p) ∩ M = ∅}|,

where x may stand for the foreground F or the background B, since for this error we distinguish the foreground and the background object.

False Positive (FP) is the number of pairs p ∈ P that are proper false positives (i.e. false positives from both views):

FP = |{p ∈ P | p ∈ (M ∩ C)}|.

Occlusion Boundary Error (OBE) is the number of unmatched pairs from the occlusion boundary region p ∈ O:

OBEx = |{p ∈ P | p ∈ ((G ∩ Ox) \ M)}|,

where x may stand for the foreground F or the background B, since for this error we also distinguish the foreground and the background object.

Mismatch in Textureless Region (MIS) is the number of mismatched pairs from the textureless area, p ∈ S, p ∈ M, and p ∉ G:

MIS = |{p ∈ P | p ∈ ((M \ G) ∩ S)}|.

False Negative in Textureless Region (FNS) is the number of proper false negatives from the textureless region S:

FNS = |{p ∈ P | p ∈ ((G ∩ S) \ M), ZG(p) ∩ M = ∅}|.


For the sake of the evaluation, the following nine types of errors have been defined using the above definitions. We have divided them into three categories:

1. Primary errors:

Mismatch Rate (MIR) is the percentage of mismatches in binocularly visible areas, excluding textureless regions (shadows) and occlusion boundary regions. This error measures the accuracy of the matching M:

MIR = MI / |M \ C \ S \ O| = |M \ G \ C \ S \ O| / |M \ C \ S \ O|.   (1)

False Positive Rate (FPR) is the percentage of proper false positives, i.e. the percentage of pairs matched in the region C. A high value indicates that the algorithm is unable to detect jointly occluded regions:

FPR = FP / |M \ S \ O| = |M ∩ C| / |M \ S \ O|.   (2)

False Negative Rate (FNR) is the percentage of missing correspondences (proper holes). This error measures the sparsity of the disparity map:

FNR = (FNB + FNF) / |G \ S \ O|.   (3)

Failure Rate (FR) is the only error defined in the traditional way: the percentage of incorrect correspondences in the whole image. However, as will be shown in Sec. 5, this error does not provide much information about algorithm properties or behaviour:

FR = (MI + FP + FNB + FNF) / D(P).   (4)
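The four primary statistics translate directly into set operations when M, G and the regions C, S, O are represented as sets of matching-table pairs. A sketch under that representation (our encoding, not the report's code); for brevity it counts every unmatched ground-truth pair as a hole, omitting the inhibition-zone test of the FN definition above:

```python
def primary_error_rates(M, G, C, S, O, D_P):
    """Primary error rates of Eqs. (1)-(4). M is the computed matching,
    G the ground truth, C the jointly occluded region, S the textureless
    region, O the occlusion-boundary region (all sets of (l, r) pairs);
    D_P is the main-diagonal length of the matching table."""
    MI = len(M - G - C - S - O)          # mismatches
    FP = len(M & C)                      # proper false positives
    FN = len(G - M - S - O)              # holes (simplified: no zone test)
    MIR = MI / len(M - C - S - O)        # Eq. (1)
    FPR = FP / len(M - S - O)            # Eq. (2)
    FNR = FN / len(G - S - O)            # Eq. (3)
    FR = (MI + FP + FN) / D_P            # Eq. (4)
    return MIR, FPR, FNR, FR
```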

2. Secondary errors:

Textureless Regions: We want to distinguish three different types of algorithms by their behaviour in textureless regions: (1) algorithms that generate spurious correspondences there, (2) algorithms that interpolate the solution over the region, and (3) algorithms that detect it as a region without information content. We therefore define:

Mismatch Rate in Textureless Regions (MISR) is the percentage of mismatches in the textureless regions S. This error measures the accuracy of the given matching in areas with low signal-to-noise ratio:

MISR = MIS / |M ∩ S| = |(M \ G) ∩ S| / |M ∩ S|.   (5)


False Negative Rate in Textureless Regions (FNSR) is the percentage of proper holes. This error measures the disparity map sparsity in textureless regions:

FNSR = FNS / D(S).   (6)

Occlusion Boundary Inaccuracy (OBI) is the percentage of incorrect correspondences in the vicinity of the occlusion boundary. This error measures the precision of occlusion boundary detection:

OBI = (OBEB + OBEF) / D(O) = |(G \ M) ∩ O| / D(O).   (7)

3. Unbiasedness:

Bias (B): A matching algorithm is biased if it assigns correct correspondences in one of the objects more often than in the other. We measure it as the difference between the fractions of unmatched pairs in the background object and in the foreground object:

B = |GB \ M \ S \ O| / |GB \ S \ O| − |GF \ M \ O| / |GF \ O|.   (8)

Small absolute values of B imply unbiased matching, while increasing values indicate biased matching. Positive B means the foreground is preferred; negative B means the background is preferred.

Occlusion Boundary Bias (OBB) is defined similarly to B, only focused on the occlusion boundary region O:

OBB = (OBEB − OBEF) / D(O) = (|(G ∩ OB) \ M| − |(G ∩ OF) \ M|) / D(O).   (9)

This error measures whether the algorithm exhibits bias at the occlusion boundary and shows whether one of the objects (foreground or background) or the occlusion regions are preferred (widened) by the algorithm.
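Both bias statistics are again simple set expressions. A sketch using the same set-of-pairs representation as the definitions above (the encoding and parameter names are ours; D_O stands for D(O)):

```python
def bias_stats(M, GB, GF, OB, OF, S, O, D_O):
    """Bias B of Eq. (8) and occlusion-boundary bias OBB of Eq. (9).
    GB/GF are the background/foreground parts of the ground truth,
    OB/OF the corresponding occlusion-boundary parts, all sets of
    (l, r) pairs; D_O is the diagonal length of the region O."""
    G = GB | GF
    B = (len(GB - M - S - O) / len(GB - S - O)      # unmatched background fraction
         - len(GF - M - O) / len(GF - O))           # unmatched foreground fraction
    OBE_B = len((G & OB) - M)
    OBE_F = len((G & OF) - M)
    OBB = (OBE_B - OBE_F) / D_O
    return B, OBB
```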

Since various matching algorithms may differ in matching window size, differences in the vicinity of the occlusion boundary must be expected. When large windows are used, the occlusion boundary tends to shift from its true position by as much as the matching window radius [19]. From our experience it follows that a 5 × 5 matching window is sufficient for the test dataset. We therefore require that any tested algorithm use a matching window of 5 × 5 pixels or smaller.

3 Experimental Setup

3.1 Test scene

The actual test scene we used (see Fig. 4:3) consists of five long thin stripes (stuck on a glass table; we call this the foreground) in front of a white flat panel (a paper stuck on



Figure 4: The experimental setup: 1-cameras, 2-texture projector, 3-Test scene, 4-calibration target

Figure 5: Details of cameras and test scene


another glass pane, which forms the background). The distance between the background and the foreground, together with the camera distance, was adjusted so that the scene preserves ordering. The stripe color is yellow; their color was trimmed to produce the same image intensity in the black&white cameras. We could not use the same material for the stripes and the background, because the front glass partially absorbs the rays reflected from the background.

A random texture is projected onto the scene using a projector (Fig. 4:2). The texture used was as fine as possible, but coarse enough not to cause aliasing in the cameras.

The projector's focal length was 80 mm and its distance from the scene was about 1.5 m. The lighting direction was chosen so that the shadows cast by the stripes appear approximately in the middle between the stripes in both views, since we want to avoid textureless areas at background-foreground boundaries. A 150 W/24 V incandescent lamp with a flat filament was powered from a stabilized DC power source.

Illumination nonuniformity, due to the strong vignetting of the projector, was partially reduced by decentering the lamp from its standard position so that the back mirror creates another image of the filament in the lens focal plane. This decreases the quality of focusing, which was compensated for by adding an iris to the projector.

The projector was equipped with a polarization filter, which helps to reduce reflections on the glass (see later). The projector's source voltage was set to 15 V out of 24 V, since higher power would have damaged the polarizer.

The scene was observed with two cameras (Fig. 4:1). We used digital Vosskuehler COOL-1300 cameras (12 bits, Peltier-cooled to 250 K, high resolution of 1280 × 1024 pixels) with two DataTranslation DT3157 digital framegrabbers synchronized from an external source. We used Computar 25 mm lenses, because they are 2/3" chip lenses and have small vignetting.

The cameras were also equipped with polarization filters, oriented in the same plane as the projector's polarizer. The aim is to minimize non-Lambertian reflections from the front glass. The polarizer setting was chosen in accordance with a simulation, see Appendix A.

Geometric camera calibration was performed with a special self-identifying pattern [13] (Fig. 4:4) stuck on a plane, which was translated to three positions in front of the cameras. We also calibrated the radial distortion using the Radial Distortion Toolbox [16].

The rectification homographies computed from the camera matrices [17] included a reduction of the image size to one half (to speed up matching in the ground-truth test). The radial distortion correction was combined with these homographies into a composite transformation, which was applied to the raw images using bicubic interpolation. Finally, the image was cropped to include only the main part with the stripes, so the resulting rectified image is 571 × 351 pixels. The images used for stereo algorithm evaluation were also re-quantised to 8-bit resolution (using the most significant 8 bits), because some algorithms cannot deal with 12-bit data.
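The 12-to-8-bit reduction keeps only the most significant 8 bits, i.e. drops the 4 least significant ones. A one-line sketch (the function name is ours):

```python
import numpy as np

def requantise_12_to_8(img12):
    """Keep the 8 most significant bits of 12-bit data (drop the low 4)."""
    return (np.asarray(img12, dtype=np.uint16) >> 4).astype(np.uint8)
```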



Figure 6: Segmentation: the input image (left) and the filtered image (center) with marked boundaries; part of an intensity profile of the filtered image with marked zero-crossings (right).

3.2 Ground-Truth

Our test scene contains piecewise planar parts: thin stripes lying in a common plane, and a planar background. The process of generating the ground truth has two steps: first, the stripes are segmented from the background; second, a disparity plane is fitted to the segmented parts using a disparity map obtained from a selected matching algorithm. Disparity is expressed in the left camera such that

xr = xl − d, (10)

where xl and xr are the positions in the left and right rectified images, respectively.

3.2.1 Object segmentation

The input image is already rectified and reduced in size. In order to make the stripe segmentation easier, we put a black cloth behind the front glass with the stripes and used the projector without the texture slide to illuminate the scene.

An initial estimate of the stripe boundaries was obtained by thresholding the image. Since we want to determine precisely where the object boundaries are, we found them as zero-crossings of the image filtered with a second-order Laplacian (the standard deviation was set to 2 pixels in both directions). The zero-crossings were found in each image row separately with subpixel precision: a third-order polynomial was fit to the neighbourhood of each zero-crossing, and the boundary was then resampled onto the image grid. The procedure is illustrated in Fig. 6. The output of this segmentation is a logical mask of the stripe regions.

We used the same procedure to segment the shadows of the stripes.
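The subpixel zero-crossing step can be sketched as follows. This is a minimal illustration, not the original implementation: the function name `subpixel_zero_crossings` is ours, and it assumes one row of the already Laplacian-filtered image is given.

```python
import numpy as np

def subpixel_zero_crossings(row):
    """Locate zero-crossings of one row of the filtered image with subpixel
    precision: bracket a sign change, fit a third-order polynomial to the
    4-sample neighbourhood, and take its real root inside the bracket."""
    crossings = []
    sign_change = np.where(np.diff(np.sign(row)) != 0)[0]
    for c in sign_change:
        if c - 1 < 0 or c + 2 >= len(row):
            continue                       # too close to the image border
        xs = np.arange(c - 1, c + 3)       # 4 samples around the crossing
        poly = np.polyfit(xs, row[xs], 3)
        roots = np.roots(poly)
        real = roots[np.abs(roots.imag) < 1e-8].real
        ok = real[(real >= c) & (real <= c + 1)]
        if len(ok):
            crossings.append(float(ok[0]))
    return crossings
```

Running this on every row of the filtered image and rasterising the resulting subpixel columns yields the logical stripe mask described above.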

3.2.2 Disparity map composition

The disparity map obtained from the confidently stable matching algorithm [18] (α = 20, β = 0.05) is quite sparse but relatively error-free, although there are some mismatches. Therefore, the fitting of the disparity plane must be robust.


First, we cut out regions according to the masks obtained by the previous procedure. Then fitting is performed in a robust least-squares sense: the first estimate comes from RANSAC2, outliers are excluded3, and the final fit is computed by standard least squares on the inliers only. Finally, the disparity map is composed from the background and foreground parts. The procedure is illustrated in Fig. 7.
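The robust fitting step can be sketched as follows. This is an illustrative reconstruction under our own naming, combining the ingredients stated in the footnotes: a least-median-of-squares RANSAC over random 3-point planes, 3×MAD outlier rejection, and a least-squares refit on the inliers.

```python
import numpy as np

def robust_plane_fit(x, y, d, n_iter=500, seed=0):
    """Fit a disparity plane d = a*x + b*y + c to sparse, possibly
    mismatch-contaminated disparities: LMedS-style RANSAC, 3*MAD outlier
    rejection, and a final least-squares refit (sketch of Sec. 3.2.2)."""
    rng = np.random.default_rng(seed)
    pts = np.column_stack([x, y, np.ones_like(x)])
    best, best_med = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(len(d), 3, replace=False)
        A = pts[idx]
        if abs(np.linalg.det(A)) < 1e-9:       # degenerate triple
            continue
        p = np.linalg.solve(A, d[idx])         # exact plane through 3 points
        med = np.median(np.abs(pts @ p - d))   # median residual
        if med < best_med:
            best, best_med = p, med
    res = np.abs(pts @ best - d)
    mad = np.median(np.abs(res - np.median(res)))
    inliers = res <= 3 * mad if mad > 0 else res <= best_med
    p, *_ = np.linalg.lstsq(pts[inliers], d[inliers], rcond=None)
    return p, inliers
```

The median-residual criterion tolerates a large fraction of gross mismatches, which is why the sparse but contaminated CSM output is sufficient as input.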

Half-occlusions are found directly from the Ground-Truth: the occluded pixels are those background pixels that violate the uniqueness constraint as seen from the left camera, see Fig. 8.

All masks that are later used in the evaluation algorithm are shown in Fig. 7. These are logical masks forming a disjoint cover of the left image: background (blue), foreground (green), shadows (brown) and half-occlusions (orange).

The ground-truth disparity range is (−89.9, −43.0), of which (−72.9, −43.0) is the disparity range of the background and (−89.9, −61.8) that of the foreground. The stripe width is 18 pixels and the distance between stripes is approximately 85 pixels.

3.3 Experimental data acquisition

The test scene was captured under 20 different exposures corresponding to 20 different image contrast values. By the texture contrast we understand the mean value of the image in the left camera capturing the scene without the projected texture (white frame). We set the exposure-time range according to the required image contrast range of (160, 2400). Then 20 levels were chosen from this interval as a geometric sequence, see Fig. 9. The left rectified images of contrasts 1, 8, 13, and 20 are shown in Fig. 10.
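The geometric spacing of the 20 contrast levels can be reproduced, for illustration, as:

```python
import numpy as np

# 20 texture-contrast levels spanning the required range (160, 2400)
# as a geometric sequence (Sec. 3.3); the ratio between adjacent levels
# is constant
levels = np.geomspace(160.0, 2400.0, num=20)
ratio = (2400.0 / 160.0) ** (1.0 / 19.0)   # common ratio of the sequence
```

A geometric sequence gives uniform spacing on the logarithmic contrast axis used in the plots of Fig. 13.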

For each texture contrast we acquired 10 stereo images (we call them trials). The projected texture was slightly moved inside the projector in each trial.

This gives us the possibility to:

1. average out the dependence of the tested algorithms' results on the texture grain location in the image,

2. analyze the spread of the results due to the grain position relative to the objects in the scene.

We also acquired images for intensity normalization:

• dark frame - the projector was turned off,

• white frame A - the projector was turned on, without texture,

• white frame B - the projector was turned on, without texture, but a white paper was put in front of our scene.

2The procedure randomly samples 3 points, which define a plane, minimizing the median of the residuals.
3Outliers are excluded according to their residuals; the threshold was set to 3×MAD (Median of Absolute Deviations from the median) of the residuals.


Figure 7: Ground-truth disparity map composition procedure: original disparity map, disparity map without outliers, and filled disparity map (for background and foreground); original disparity map, Ground Truth, and all masks together (Background, Object, Occlusion, Shadow).

Figure 8: Half occlusions (d = xl − xr): the half-occluded region is marked with a thick orange line. Disparity-oriented (left) and matching-table-oriented (right) representations of the matching.


Figure 9: Texture contrast versus exposure time texp [s], with the chosen levels marked.

Figure 10: The left rectified images of contrasts (a) 1, (b) 8, (c) 13, and (d) 20.


Figure 11: A part of the matching table (axes xr, xl) with ground-truth and current matches (obtained by the WTA algorithm running on the middle contrast); the jointly-occluded area (red), the textureless area (green) and the background–foreground boundary areas (yellow, pink) are also marked.

None of these data were needed to produce the ground-truth images; they are available should the need arise.

We also acquired images of the scene with regular textures: a regular checkerboard and horizontal stripes.

4 Evaluation of the errors

A usual output of stereo-matching algorithms is a disparity map for the left camera. The matching errors defined in Sec. 2 are evaluated using the matching table.

Initially, (1) the disparity map of the evaluated algorithm, (2) the ground-truth disparity map, and (3) the masks of the areas are represented in the matching table for each scanline separately.

The relation between the left disparity map and the matching table is defined by Eq. (10). A part of a typical matching table is shown in Fig. 11.

The xr coordinates of the ground truth are non-integer, because our ground-truth disparity map is known to subpixel precision. Therefore, the decision whether a match lies on the ground truth is made as follows: a match is correct if its distance from the ground-truth match is smaller than 1 pixel in the horizontal (right) direction.
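The correctness decision can be sketched as a small helper (illustrative, with our own naming):

```python
def is_correct_match(xl, xr, gt_disparity):
    """A match (xl, xr) counts as correct when its horizontal distance from
    the subpixel ground-truth match on the same scanline is below 1 pixel
    (the decision rule of Sec. 4; the ground truth follows Eq. 10)."""
    xr_gt = xl - gt_disparity[xl]      # non-integer ground-truth column
    return abs(xr - xr_gt) < 1.0
```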


Determining the cardinality of the sets defined in Sec. 2 is then straightforward. The contributions to the numerator and denominator of the respective error rates are summed over all scanlines and the final division is performed on these sums.
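Summing the numerators and denominators first and dividing once, rather than averaging per-scanline ratios, weights every scanline by its number of contributions. A sketch with hypothetical names:

```python
def error_rate(per_scanline_counts):
    """Accumulate an error rate over scanlines: sum the per-scanline
    (numerator, denominator) contributions first and divide once at the
    end, as described in Sec. 4. Empty input yields a zero rate."""
    num = sum(n for n, _ in per_scanline_counts)
    den = sum(d for _, d in per_scanline_counts)
    return num / den if den else 0.0
```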

5 Experiments

The purpose of this section is to show in detail how to analyze any tested matching algorithm. We do not aim at a balanced evaluation of existing approaches; on the contrary, we think it is impossible to compare algorithms based on different principles. What we want is to demonstrate and describe the typical characteristics of various approaches.

5.1 Evaluated Algorithms

To demonstrate the evaluation procedure, we have chosen four area-based algorithms: ML matching based on dynamic programming [3], MAP matching based on graph cuts [10], the Winner-takes-all algorithm [23], and Confidently stable matching [18]. The principles of these algorithms are briefly described below.

Our aim was to select algorithms that are typical representatives of different stereo approaches. According to the amount of continuity prior an algorithm exploits, we distinguish algorithms with a:

1. strong continuity prior,

2. weak continuity prior.

The first two algorithms belong to group 1; however, each of them uses a different way of computing the MAP estimate. The latter two algorithms belong to group 2, where the CSM exploits the ordering constraint and the WTA uses no prior.

ML Matching Based on Dynamic Programming (DP) The Maximum Likelihood Matching is our re-implementation of the algorithm of Cox et al. [3]. The correspondences are computed for each epipolar line separately via dynamic programming based on the sum of squared differences (SSD). The ordering constraint is employed.
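A minimal single-scanline sketch in this spirit follows. It is our simplification, not the evaluated implementation: a per-pixel squared difference stands in for the window-based SSD, and `lam` is an assumed occlusion penalty. The ordering constraint is implicit in the monotone dynamic-programming alignment.

```python
import numpy as np

def dp_scanline(left, right, lam=30.0):
    """Sketch of ML scanline matching in the spirit of Cox et al. [3]:
    dynamic programming over one rectified scanline pair with a squared
    intensity-difference cost and an occlusion penalty lam.
    Returns the list of (xl, xr) correspondences."""
    n, m = len(left), len(right)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = lam * np.arange(m + 1)        # leading right occlusions
    C[:, 0] = lam * np.arange(n + 1)        # leading left occlusions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = C[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2
            C[i, j] = min(match, C[i - 1, j] + lam, C[i, j - 1] + lam)
    # backtrack the optimal (ordered) alignment
    i, j, pairs = n, m, []
    while i > 0 and j > 0:
        if C[i, j] == C[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif C[i, j] == C[i - 1, j] + lam:
            i -= 1                          # left pixel occluded
        else:
            j -= 1                          # right pixel occluded
    return pairs[::-1]
```

Note how lam trades occlusions against imperfect matches: skipping a pixel near a disparity jump becomes cheaper than accepting a non-zero match cost, which is exactly the FNR mechanism discussed in Sec. 5.2.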

MAP Matching Based on Graph Cuts (GC) This algorithm, published in [10], is based on graph labeling: the correspondences are established by minimizing a 2-D Potts energy function via graph cuts. The algorithm is able to model occlusions explicitly; the SSD is used as the similarity statistic. For the evaluation, Kolmogorov's public code is used.


Winner-takes-all (WTA) This algorithm is a standard local stereo approach; however, it does not produce a one-to-one matching: each point in the reference image is matched to the most similar point in the other image, as measured by the SSD statistic. No constraint is employed. In this paper, Scharstein's public implementation from [23] is used.
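The WTA principle for one scanline pair can be sketched as follows. This is an illustrative 1-D analogue of the evaluated 2-D implementation; the 5-sample window and the (−100, 100) search range mirror the common parameter settings of Sec. 5.1.

```python
import numpy as np

def wta_scanline(left, right, drange=(-100, 100), half=2):
    """Winner-takes-all sketch for one scanline pair: each reference (left)
    pixel is matched to the right-image pixel minimizing the SSD over a
    (2*half+1)-wide window; no uniqueness or ordering constraint.
    Returns a disparity per left pixel (d = xl - xr, Eq. 10) or None."""
    n = len(left)
    disp = [None] * n
    for xl in range(half, n - half):
        best, best_d = np.inf, None
        for d in range(drange[0], drange[1] + 1):
            xr = xl - d
            if xr - half < 0 or xr + half >= n:
                continue                     # window leaves the image
            ssd = np.sum((left[xl - half:xl + half + 1]
                          - right[xr - half:xr + half + 1]) ** 2)
            if ssd < best:
                best, best_d = ssd, d
        disp[xl] = best_d
    return disp
```

Because every reference pixel gets some match, the FNR is zero by construction, at the price of the high mismatch and false positive rates reported below.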

Confidently Stable Matching (CSM) This algorithm, proposed in [18], establishes the correspondences based on the stability constraint, which guarantees the best unique solution at the user-given confidence level. The ordering and uniqueness constraints are incorporated. The similarity is computed by a modified cross-correlation [14] between image windows.

The tested algorithms have several adjustable parameters, which can be divided into three classes:

1. Parameters common to all the algorithms, which were set identically for all of them: the matching window size (5 × 5 pixels) and the disparity search range (−100, 100).

2. Parameters fundamental for the given method, which were tuned to minimize the FR (Eq. 4) at the middle contrast level, which we call the T-level. We used a simple grid search over the Cartesian product of these parameters, followed by a gradient descent from the grid minimum. We could not use a sophisticated global optimization technique such as simulated annealing, because a single evaluation of the criterion takes more than half an hour for some algorithms, such as Graph Cuts. The tuning process is illustrated in Fig. 12.

• WTA has no parameters.

• DP has only one parameter, occlusion penalty, which was set to λ = 30.

• GC has many parameters; we set only λ = 125 and penalty0 = 45.

• CSM has two parameters (α0 = 20, β = 0.05), which were not set according to the minimal FR (Eq. 4), because the FR has no local extremum in them. The reason is that these parameters define a confidence level, unlike the penalties in the other algorithms. The parameter setting determines the mismatch-rate and false-negative-rate levels, and these two errors are (obviously) in contradiction. We tuned the parameters to produce a low mismatch rate, which consequently produces a higher false negative rate.

3. Auxiliary parameters, which have only a small influence on the result and were set according to the authors' recommendations.

The setting of all the parameters is kept constant for all input stereo pairs over all twenty contrast levels. We do not aim to tune the parameters for the absolutely optimal performance of the algorithms; we want to tune them reasonably, i.e. to avoid artifacts and to ensure comparable conditions.
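The class-2 tuning strategy (a coarse grid search followed by a local descent from the grid minimum) can be sketched generically as follows; `tune` and its arguments are hypothetical names, and in our setting the criterion is the FR at the T-level, which is expensive to evaluate.

```python
import itertools

def tune(criterion, grids, step_shrink=0.5, n_refine=10):
    """Parameter tuning sketch: evaluate the criterion on a coarse Cartesian
    grid, then refine the best grid point by coordinate descent with a
    shrinking step. `criterion` maps a parameter tuple to a scalar cost;
    `grids` gives one list of candidate values per parameter."""
    best = min(itertools.product(*grids), key=criterion)
    best_val = criterion(best)
    steps = [(g[1] - g[0]) / 2 if len(g) > 1 else 1.0 for g in grids]
    for _ in range(n_refine):
        improved = False
        for i, s in enumerate(steps):
            for cand in (best[:i] + (best[i] - s,) + best[i + 1:],
                         best[:i] + (best[i] + s,) + best[i + 1:]):
                v = criterion(cand)
                if v < best_val:
                    best, best_val, improved = cand, v, True
        if not improved:
            steps = [s * step_shrink for s in steps]   # refine the step
    return best, best_val
```

The coarse grid handles the global shape of the criterion; the descent only polishes the best grid point, keeping the total number of expensive evaluations small.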


Figure 12: FR values versus parameter setting: DP (λ, left), GC (λ and penalty0, center), CSM (α and β, right).

5.2 Evaluation

The results of the tested algorithms are shown in the plots in Fig. 13. Texture contrast (on the horizontal axis, with a logarithmic scale in all plots) is directly related to the signal-to-noise ratio. The vertical axes show the respective error rates, also on logarithmic scales, except in Figs. 13(g), 13(h), and 13(i). In all the plots, the error rate of each algorithm is computed as the average error over the respective trial (10 images of the same texture contrast); the vertical bars show the error variance within the trial. The resulting disparity maps under all the texture contrasts are shown separately for each algorithm in Figs. 14, 15, 16, and 17.

We can divide the selected methods into two groups using two different criteria and study whether it is possible to predict an algorithm's behaviour under various conditions based only on the group classification. The first criterion is based on the continuity model the methods use: (1) strong: GC and DP, and (2) weak (or none): CSM and WTA.

The second criterion is based on the purpose the results are used for:

1. 3D model reconstruction: the results must not include FP and MI errors; however, they can be sparse,

2. view prediction: the results have to be dense, while a few errors do not cause many problems [6].

We first discuss the algorithm behaviour with respect to the different errors and then give a summary:

Mismatch Rate (Fig. 13(a)). This error is usually determined by texture self-similarity. The GC gives very poor results below the T-level, as confirmed in Figs. 14(1)–14(5). The accuracy improves by about two orders of magnitude at the maximum contrast.

The DP also produces very poor results below the T-level (cf. Figs. 15(1)–15(4)) and improves in accuracy as fast as the GC. The difference between the DP and GC above the T-level is explained as follows: the DP minimizes the sum of the SSD over all potential matches, which decreases the influence of λ. The MIR thus levels off at the value of the texture self-similarity (which is constant). The GC does not show this behaviour because its isotropic continuity prior is stronger than the directional one used in the DP.


The CSM algorithm reaches a low level very fast and then stays constant over all the contrasts.

The WTA gives a consistently bad MIR, as expected (worse by almost two orders of magnitude).

False Negative Rate (Fig. 13(b)). In the GC, the FNR generally grows with increasing texture contrast. The FNR is induced by disparity jumps; consequently, as the disparity resolution improves, the frequency of the jumps increases faster than their magnitude decreases, cf. Fig. 14.

The DP exhibits similar behaviour to the GC; however, the reason for the increasing FNR is different: it is a result of the relative decrease of λ with increasing contrast. Skipping a match near a disparity jump is cheaper than including a match with a non-zero SSD.

The CSM produces a high and constant FNR (cf. Fig. 16), which is a consequence of the parameter setting discussed in Sec. 5.1.

The WTA matching produces zero FNR (if we exclude the few very low texture-contrast levels), since it does not model occlusions: it establishes a correspondence for every pixel in the reference image, cf. Fig. 17.

False Positive Rate (Fig. 13(c)). This error tests whether the algorithm is able to detect jointly occluded pixels and to avoid totally erroneous correspondences (since such results would be unusable).

The GC and DP show similar rates: the FPR is high for below-T-level contrasts and zero for above-T-level contrasts. This behaviour is due to the very strong continuity prior in both methods, which results in a constant-disparity solution passing through the region C, cf. Figs. 14(6)–14(12) for the GC and Figs. 15(5)–15(13) for the DP. Therefore, for scenes containing regions of lower texture contrast than the algorithm parameters are tuned to, the results of methods with a strong continuity model are very unreliable.

In the CSM, the FPR is zero except for the two lowest contrasts. The WTA, on the contrary, produces the worst results in this error rate, which is a consequence of the lack of any scene model.

Failure Rate (Fig. 13(d)). This is the only traditionally defined error. Although the plot shows different progressions of the failure rate for the different algorithms, they merely follow the mismatch and false negative rates, which are, however, in contradiction.

The GC shows the best results above the T-level. In the DP, the FR increases above the T-level. The CSM shows stable results across the whole contrast range; it is about 5× worse than the GC at the maximum contrast. The performance of the WTA is similar to that of the CSM in the FR error.

Analysis in Textureless Regions (Figs. 13(e), 13(f)). To separate the algorithm behaviour into the three different groups (discussed in Sec. 2), it is necessary to study the mismatch rate (Fig. 13(e)) and the false negative rate (Fig. 13(f)) in this region together.


The GC and DP belong to the second group, but on detailed inspection they show slightly different behaviour. The GC gives empty disparity maps at the lowest texture contrasts. Then, for below-T-level contrasts, the algorithm establishes correspondences, but since the data are still very poor, the continuity model wins and the disparity of the region boundaries is held. For above-T-level contrasts, the data are taken into account, resulting in disparity jumps, which are occluded. Since the disparity in these regions changes (see the ground truth in Fig. 7), keeping a constant disparity over them causes a high mismatch rate. The DP, on the contrary, interpolates the disparity over these regions. This results in a mismatch rate an order of magnitude lower than that of the GC, but a higher false negative rate, as it unmatches pixels at disparity jumps.

The third group of algorithms is represented by the CSM. It establishes almost no correspondences in these regions (cf. Fig. 13(f)), and for good texture contrasts it produces no incorrect correspondences there. For low texture contrasts, however, the mismatch rate is higher. The reason for this behaviour is that in textureless and very dark regions no model can perform well, since the ordering constraint alone is insufficient.

The WTA algorithm belongs to the first group. It fills these regions completely (there are no false negatives); however, as can be seen from the mismatch rate, the established correspondences are almost completely incorrect.

Bias (Fig. 13(g)). Methods based on strong continuity models are expected to be heavily biased, while methods without such models are expected to be unbiased. In agreement with this, we can see that the WTA and CSM are unbiased over all the image contrasts.

The global methods, on the contrary, exhibit a strong bias, except at the lowest contrasts, where there is nothing in the disparity maps, and at the highest contrasts, where the results reflect the data. For the DP matching, we can see a very large negative bias, which means the background disparity is interpolated over the whole image (cf. Figs. 15(5)–15(11)). Below the T-level, the GC algorithm interpolates with planes of constant disparity, which at the beginning corresponds to the background (cf. Figs. 14(6)–14(11)). The positive bias around the T-level is due to fewer disparity jumps in the stripes than in the background.

Occlusion Boundary Accuracy (Fig. 13(h)). In the GC, the OBI is low and constant above the T-level and very high below it, as confirmed in Fig. 14. In the DP, the performance at the T-level is about 1.5× better than in the GC. This is due to the difference between the isotropic and directional priors the methods use.

The CSM has an almost constant OBI above contrast level 4. The WTA has the lowest OBI because of its ability to find one-to-two matchings at disparity jumps.

Occlusion Boundary Bias (Fig. 13(i)). The bias at occlusion boundaries exhibits the same behaviour as the image bias (Fig. 13(g)) discussed above. The non-zero bias of the local methods (WTA and CSM) is caused by the use of windows, which may shift the occlusion boundary. For the highest contrasts, however, the results of all the methods are comparable.


Figure 13: Different types of matching errors versus texture contrast, with the T-level marked: (a) Mismatch Rate, MIR (δ > 1): inaccuracy; (b) False Negative Rate, FNR: sparsity; (c) False Positive Rate, FPR: monocular artifacts; (d) Failure Rate, FR: overall error; (e) Mismatch Rate in Textureless Regions, MISR (δ > 1): textureless inaccuracy; (f) False Negative Rate in Textureless Regions, FNSR: textureless sparsity; (g) Bias, B; (h) Occlusion Boundary Inaccuracy, OBI: occlusion accuracy; (i) Occlusion Boundary Bias, OBB: occlusion bias. 'WTA' is the Winner-takes-all algorithm implementation from [23], 'GC' Kolmogorov's MAP matching based on graph cuts [10], 'DP' Cox's ML matching based on dynamic programming [3], and 'CSM' Confidently stable matching [18].


The disparity maps computed by the tested algorithms are shown in Figs. 14–17 for visual inspection of the results. Each figure shows the results of one tested algorithm under all twenty texture contrasts.

From the experimental analysis discussed above we conclude that the best choice for view prediction is the GC algorithm, while for structure reconstruction it is the CSM algorithm:

• The GC has the best overall failure rate (FR), mainly because of a good disparity map density (low false negative rate) and a low mismatch rate that continuously improves with increasing texture contrast. The GC is able to detect half-occlusions as long as they have good contrast. The GC parameters must be tuned to the worst-contrast texture in the scene, since the method fails completely at low contrasts. In other words, it is prone to illusions: its inability to reject unreliable data is a serious drawback if the method is used for structure reconstruction.

• The CSM gives good-quality results over all the contrast levels, although they are sparser than those of the other methods (especially at low contrasts). The low mismatch rate, zero false positive rate, and unbiasedness make the CSM suitable for structure reconstruction in complex scenes of varying texture contrast. The high false negative rate, however, renders this method unsuitable for view prediction.

• The DP produces results similar to those of the GC at the contrast level the parameters have been tuned to. The sensitivity to the parameter setting is the main disadvantage of the DP, since its performance decreases both below and above the optimal contrast level.

• The WTA produces very erroneous results (independently of texture contrast) and is therefore unsuitable for both structure reconstruction and view prediction, except when speed is a strong concern.

6 Conclusion

In this paper, we have proposed a methodology that allows studying the failure mechanisms of various binocular matching algorithms. For this purpose we designed a test dataset in which the same scene is captured under varying texture contrast. The algorithms are tested under both low and high contrasts, which reveals their ability to cope with real scenes of non-uniform texture. In order to expose the individual weaknesses of the algorithms, the test data are designed to target a specific error mechanism while other causes are excluded.

The evaluation is based on two complementary views: graph plots and disparity maps. The graph plots compare the performance of the algorithms under all twenty contrast levels for each error independently. The disparity maps allow the results to be studied visually, to better interpret the conclusions drawn from the plots.


Graph Cuts Matching

(1) (2) (3) (4)

(5) (6) (7) (8)

(9) (10) (11) (12)

(13) (14) the T -level (15) (16)

(17) (18) (19) (20)

Figure 14: Disparity maps produced by Kolmogorov's MAP matching based on graph cuts under all twenty texture contrasts. The numbers below the figures represent contrast levels.


Dynamic Programming Matching

(1) (2) (3) (4)

(5) (6) (7) (8)

(9) (10) (11) (12)

(13) (14) the T -level (15) (16)

(17) (18) (19) (20)

Figure 15: Disparity maps produced by Cox's ML matching via dynamic programming under all twenty texture contrasts. The numbers below the figures represent contrast levels.


Confidently-Stable Matching

(1) (2) (3) (4)

(5) (6) (7) (8)

(9) (10) (11) (12)

(13) (14) the T -level (15) (16)

(17) (18) (19) (20)

Figure 16: Disparity maps produced by the Confidently stable matching algorithm under all twenty texture contrasts. The numbers below the figures represent contrast levels.


Winner-takes-all Matching

(1) (2) (3) (4)

(5) (6) (7) (8)

(9) (10) (11) (12)

(13) (14) the T -level (15) (16)

(17) (18) (19) (20)

Figure 17: Disparity maps produced by the Winner-takes-all algorithm under all twenty texture contrasts. The numbers below the figures represent contrast levels.


Based on the evaluation, it is possible to infer algorithm properties and to predict their behaviour under various conditions. From our analysis we conclude that for view prediction applications the suitable method is the GC, while for structure reconstruction it is the CSM. The WTA gives the worst results over all the contrast levels and is thus unsuitable for both applications. The strong sensitivity to the parameter setting makes the DP unsuitable as well.

We plan to provide the image set, together with Matlab code implementing the evaluation methodology, at our web site. If the response is positive, a larger evaluation study will be possible.

References

[1] Robert C. Bolles, H. Harlyn Baker, and Marsha Jo Hannah. The JISCT stereo eval-uation. In Proc. DARPA Image Understanding Workshop, pages 263–274, 1993. 3

[2] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimizationvia graph cuts. In Proc. 7th IEEE International Conference on Computer Vision(ICCV’99), volume 1, pages 377–384, September 1999. 3

[3] Ingemar J. Cox, Sunita L. Higorani, Satish B. Rao, and Bruce M. Maggs. A maximumlikelihood stereo algorithm. Computer Vision and Image Understanding, 63(3):542–567, May 1996. 3, 17, 22

[4] T. Day and J.-P. Muller. Digital elevation model production by stereo-matching spotimage-pairs: a comparison of algorithms. Image and Vision Computing, 7(2):95–101,1989. 2

[5] Georgy Gimel’farb. Pros and cons of using ground control points to validate stereo andmultiview terrain reconstruction. Presented at Evaluation and Validation of ComputerVision Algorithms, Schloss Dagstuhl, Wadern, Germany, March 1998. 2

[6] Georgy Gimel’farb and Hao Li. Probabilistic regularisation in symmetric dynamicprogramming stereo. In Proceedings of Image and Vision Computing New Zealand2000, pages 144–149, November 2000. 2, 19

[7] Oliver S. Heavens and Robert W. Ditchburn. Insight into Optics. John Wiley andSons, 2-nd edition, 1991. 31

[8] Yuan C. Hsieh, David M. McKeown, Jr., and Frederic P. Perlant. Performance evalu-ation of scene registration and stereo matching for cartographic feature extraction.IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):214–237,February 1992. 3

[9] Bela Julesz. Towards the automation of binocular depth perception (automap-1). InIFIPS Congress, Munich, Germany, 1962. 1

28

Page 32: CENTER FOR MACHINE PERCEPTION The CMP …cmp.felk.cvut.cz/ftp/articles/kostkova/Kostkova-TR-2003...CENTER FOR MACHINE PERCEPTION CZECH TECHNICAL UNIVERSITY 2365 T The CMP Evaluation

[10] Vladimir Kolmogorov and Ramin Zabih. Computing visual correspondence with oc-clusions using graph cuts. In Proceedings of the 8th International Conference onComputer Vision (ICCV’01), Vancouver, Canada, July 2001. 17, 22

[11] Andreas Koschan. Methodic evaluation of stereo algorithms. In R. Klette and W. G.Kropatsch, editors, Proc. 5th Workshop on Theoretical Foundations of Computer Vi-sion, volume 69 of Mathematical Research, pages 155–166, Berlin, Germany, 1992.Akademie Verlag,. 3

[12] Yvan G. Leclerc, Q.-Tuan Luong, and P. Fua. Measuring the self-consistency ofstereo algorithms. In David Vernon, editor, Proceedings 6th European Conference onComputer Vision–ECCV2000, volume 2 of Lecture Notes in Computer Science, pages282–298, Dublin, Ireland, June 2000. Springer-Verlag. 2

[13] Csaba Meszaros. Automaticka detekce obrazce pro kalibraci pespektivnı kamery : diplo-mova prace. Univerzita Karlova, Matematicko-fyzikalnı fakulta, Praha, 2000. containsCD-ROM. 11

[14] H. P. Moravec. Towards automatic visual obstacle avoidance. In Proc. 5th Int. JointConf. Artifficial Intell., page 584, 1977. 18

[15] Jane Mulligan, Volkan Isler, and Kostas Daniilidis. Performance evaluation ofstereo for tele-presence. In Proc. of International Conference on Computer Vision(ICCV2001), 2001. 2

[16] Tomas Pajdla, Tomas Werner, and Vaclav Hlavac. Correcting radial lens distortionwithout knowledge of 3-D structure. Technical Report K335-CMP-1997-138, FEECTU, FEL CVUT, Karlovo namestı 13, Praha, Czech Republic, June 1997. 11

[17] Radim Sara. Binocular rectification. Working Paper 98/03, Center for Machine Per-ception, Faculty of EE, Czech Technical University, 1998. 11

[18] Radim Sara. Finding the largest unambiguous component of stereo matching. In An-ders Heyden, Gunnar Sparr, Mads Nielsen, and Peter Johansen, editors, Proceedings7th European Conference on Computer Vision, volume 3 of Lecture Notes in ComputerScience, pages 900–914, Berlin, Germany, May 2002. Springer. 12, 17, 18, 22

[19] Radim Sara and Ruzena Bajcsy. On occluding contour artifacts in stereo vision. InDeborah Plummer and Ian Torwick, editors, Proceedings of the International Confer-ence on Computer Vision and Pattern Recognition (CVPR’97), pages 852–857, LosAlamitos, California, June 1997. IEEE Computer Society, IEEE Computer SocietyPress. 9

[20] Kiyohide Satoh and Yuichi Ohta. Occlusion detectable stereo using a camera matrix.In Proc 2nd Asian Conf. on Computer Vision, volume 2, pages 331–335, 1995. Avail-able also from http://image-gw.esys.tsukuba.ac.jp/research/SEA/sea.html. 3



[21] Kiyohide Satoh and Yuichi Ohta. Occlusion detectable stereo — systematic comparison of detection algorithms. In Proc. of International Conference on Pattern Recognition (ICPR'96), 1996. 3

[22] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Technical Report MSR-TR-2001-81, Microsoft Corporation, Redmond, WA 98052, USA, November 2001. Evaluation page: http://www.middlebury.edu/stereo. 3

[23] Daniel Scharstein, Richard Szeliski, and Ramin Zabih. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7–42, May 2002. 3, 17, 18, 22

[24] Richard Szeliski. Prediction error as a quality metric for motion and stereo. In Proc. 7th IEEE International Conference on Computer Vision (ICCV '99), volume 2, pages 781–788, Los Alamitos, CA, September 1999. IEEE Computer Society. 2, 3

[25] Richard Szeliski and Ramin Zabih. An experimental comparison of stereo algorithms. In Proceedings Vision Algorithms: Theory and Practice Workshop (ICCV 99), Corfu, Greece, September 1999. 3

[26] C. Lawrence Zitnick and Takeo Kanade. A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(7):675–684, July 2000. 3



A Simulation of the glass reflectance

The simulation is a model based on Fresnel's equations [7], in which multiple reflections and refractions inside the glass are considered, as illustrated in Fig. 18.

Since we want to minimize the reflected intensity, the source light was polarized parallel to the plane of incidence using the projector's polarizer.4

We have to consider partially polarized light, because the polarizer efficiency is no more than 90 percent. We therefore set Ep = 0.9 and Ek = 0.3, where Ep is the light component polarized parallel to the plane of incidence and Ek the component polarized perpendicular to it.

Plots in Fig. 19 show the reflected and transmitted intensity and their polarization components versus the angle of incidence α. According to Fig. 19(c), the minimal reflectance occurs at α = 56◦, Brewster's angle, marked by an asterisk. This minimum is further decreased by the camera polarizers: they are also set parallel to the plane of incidence, to filter out the Ek component, which is responsible for the non-zero reflected power, cf. Fig. 19(a).

The angle of incidence in our setup is slightly below 50◦. We cannot set Brewster's angle exactly, because we are constrained by the required position of the shadows.

4 In practice, this setting was found by observing the reflection on the opposite wall of the lab while rotating the polarizer.
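The Fresnel computation behind Fig. 19 can be sketched as follows. This is a minimal illustration, not the report's actual simulation code: it assumes a single air–glass interface (no multiple internal reflections) and a glass refractive index of n = 1.5; the function names and the integer-degree scan are our own.

```python
import math

def fresnel_power(alpha_deg, n1=1.0, n2=1.5):
    """Fresnel power reflectances (Rs, Rp) at an air-glass interface.

    Rs is the perpendicular (s-) component, Rp the parallel (p-) component;
    n2 = 1.5 is an assumed refractive index of the glass.
    """
    a = math.radians(alpha_deg)
    b = math.asin(n1 / n2 * math.sin(a))  # Snell's law: angle of refraction
    rs = ((n1 * math.cos(a) - n2 * math.cos(b)) /
          (n1 * math.cos(a) + n2 * math.cos(b))) ** 2
    rp = ((n1 * math.cos(b) - n2 * math.cos(a)) /
          (n1 * math.cos(b) + n2 * math.cos(a))) ** 2
    return rs, rp

def reflected_power(alpha_deg, e_par=0.9, e_perp=0.3):
    """Total reflected power for the partially polarized source light,
    weighting the parallel (Ep = 0.9) and perpendicular (Ek = 0.3)
    components as in the text."""
    rs, rp = fresnel_power(alpha_deg)
    return e_par * rp + e_perp * rs

# The p-polarized reflectance vanishes at Brewster's angle,
# arctan(n2/n1), which is about 56.3 deg for glass -- matching
# the minimum of the reflected power in Fig. 19(c).
brewster = min(range(1, 90), key=lambda a: fresnel_power(a)[1])
print(brewster, reflected_power(brewster))
```

The residual reflected power at the minimum comes entirely from the perpendicular Ek component, which is why the camera polarizers are used to suppress it further.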



Figure 18: Simulation setup. The source ray hits the glass at the angle of incidence α and refracts at the angle β; the reflected and transmitted rays are shown, with multiple internal reflections and refractions inside the glass indicated by dots.

Figure 19: Results of the simulation: (a) power of the reflected light, (b) power of the transmitted light, showing the Ep and Ek components separately; (c) total reflected and total transmitted power. All quantities are plotted against the angle of incidence α (deg).
