

A Stereo Confidence Metric Using Single View Imagery

Geoffrey Egnal, Max Mintz
GRASP Laboratory, University of Pennsylvania
{gegnal,mintz}@grasp.cis.upenn.edu

Richard P. Wildes
Centre for Vision Research, York University
[email protected]

Abstract

Although stereo vision research has progressed remarkably, stereo systems still need a fast, accurate way to estimate confidence in their output. In the current paper, we explore using stereo performance on two different images from a single view as a confidence measure for a binocular stereo system incorporating that single view. Although it seems counterintuitive to search for correspondence in two different images from the same view, such a search gives us precise quantitative performance data. Correspondences significantly far from the same location are erroneous because there is little to no motion between the two images. Using hand-generated ground truth, we quantitatively compare this new confidence metric with five commonly used confidence metrics. We explore the performance characteristics of each metric under a variety of conditions.

1 Introduction

1.1 Overview

We present a new method to diagnose where a stereo algorithm has performed well and where it has performed badly. All stereo systems estimate correspondences, but not all of these correspondences are correct. Many systems do not give an accurate estimate of how trustworthy their results are. Our new confidence method addresses many causes of stereo error, but it focuses on predicting stereo error caused by low-texture regions. This error source is particularly bothersome when operating in urban terrain or when the imagery contains large regions of sky.

The new confidence metric is based upon correspondence in images from a single view. Although it seems counterintuitive to search for correspondence in two different images from the same view, such a search provides valuable information. It gives us precise quantitative performance data; disparities significantly far from zero are erroneous because there is no motion between the two images. We use the term Single View Stereo (SVS) to refer to the disparity map produced by a stereo system applied to two images from one view, separated in time.

Specifically, we use single view stereo performance data as a confidence metric that predicts how a binocular stereo system which incorporates the single view would perform. The SVS output shows empirically where the stereo system has failed on the current scenery, and for this reason, SVS failure predicts failure pixelwise in the binocular case (see Figure 1). In practice, many stereo cameras are in motion, and the SVS disparity will not be zero at each pixel. We discuss how to modify the static SVS algorithm to accommodate images taken from a moving camera.

[Figure 1 diagram labels: Left Image, Right Image, t0, t1, Binocular Stereo, SVS]

Figure 1: Single View Stereo. SVS searches for correspondence in two images from the same view. We use SVS failure as a confidence metric to predict binocular failure.

We compare the performance of SVS as a confidence metric with five commonly-used confidence metrics: (i) left/right consistency (LRC) predicts stereo error where the left-based disparity image values are not the inverse mapping of the right-based disparity image values. (ii) The matching score metric (MSM) directly bases confidence upon the magnitude of the similarity value at which the stereo matcher finds that left and right image elements match. (iii) The curvature metric (CUR) marks disparity points resulting from flat correlation surfaces with low confidence. (iv) The entropy-based confidence score (ENT) predicts stereo error at points where the left image has low image entropy. (v) The peak ratio (PKR) estimates error for those pixels with two similar match candidates. A brief comparison of the five metrics can be found in Table 1.

1.2 Previous Work

Unlike much research on stereo performance, we do not attempt to compare different stereo algorithms to see which is best [10, 5, 24, 29, 9]. Also, we do not attempt to find theoretical limits to stereo performance [4, 8, 19].

Instead, our current research deals with on-line confidence metrics which predict errors within seconds for a given system. Within the confidence metric field, there has been much research. Research that has considered left/right consistency includes [33, 18, 7, 20, 15, 32]. Previous research into the matching score metric includes [12, 30]. Researchers that have examined curvature metrics include [13, 1, 2], and research looking at entropy-like stereo confidence metrics includes [16, 26]. Previous work with the peak ratio includes [18, 28]. More general research into on-line error diagnosis includes [34, 17, 31]. In perhaps the most similar approach to ours, Leclerc, Luong and Fua [17] have shown the effectiveness of the self-consistency approach to estimate confidence. SVS differs from self-consistency in that SVS has a ground truth and does not rely on the agreement of multiple matching processes to predict performance. Other stereo performance research focuses on modeling the effect of sensor errors on stereo [14] and statistical techniques to estimate stereo disparity [22, 21].

Table 1: A Comparison of Five Confidence Metrics. The table lists various error sources and indicates with a 'D' which confidence metrics can detect errors due to the error source.

Metric columns, in order: SVS, LRC, MSM, CUR, ENT, PKR (the column position of each 'D' is not preserved below).

Hardware
  Lens Distortion      D D
  Sensor Noise         D D D
  Quantization         D D
  Resampling           D D
Algorithm (Software)
  Similarity Metric    D D
  Window Size          D D
  Search Range         D D
External (Scene)
  Half-Occlusion       D D
  Foreshortening       D
  Periodic Structure   D D D
  Lighting Change      D D
  Low Texture          D D D D D D

In order to verify stereo performance, one needs ground truth. The oldest source of ground truth comes from artificial imagery that has additive noise from a known distribution [23, 11]. While easy to generate and accurate, this imagery does not accurately model the real world situations that most stereo systems face. A second approach manually estimates disparities in laboratory images [25, 10, 27]. This ground truth measures stereo performance better, but does so at a greater labor cost and a corresponding small loss in accuracy. A third approach measures a 3D scene from the actual location in which the stereo system will operate. Most avoid this option due to the extremely high labor cost, but laser scans have made this task easier [24]. Recently, Szeliski [31] has proposed a new approach using view-prediction as a measurement scheme with ground truth. This approach caters specifically to the view synthesis application and compares a synthesized image with the actual image in that location. Although some have used SVS as ground truth [3], SVS has not been tested as a confidence metric. In this paper, we use a manually-obtained ground truth to find the actual stereo errors so that we can verify the confidence metrics' estimates.

In light of previous work, our main contributions in this report are (i) to show how SVS can be used as an on-line confidence metric and (ii) to quantitatively evaluate and compare this new confidence metric with five more-traditional confidence metrics on a uniform basis. We use three datasets, each comprised of a stereo pair of images taken in a laboratory setting, to gain understanding of the different performance characteristics of each confidence metric. Using manually-obtained ground truths, we check where the stereo actually fails, and we verify how well each confidence metric predicts the failures and correct matches.

2 Implementation

2.1 Single View Stereo (SVS)

We propose using single view stereo results as a pixelwise confidence metric for binocular stereo. The algorithm has three steps: (i) Run the stereo algorithm on two different images taken from the same view in close temporal succession. Since the epipolar geometry is singular with only one view, we rectify both images as if they were left images in the binocular case and search along a horizontal scanline. (ii) Label the errors in the output from step one, where an error is a match outside a threshold interval around zero disparity. (iii) Each error in the SVS output predicts failure at the same pixel location for a binocular stereo system incorporating the view used in the first step. For example, if the left camera takes two pictures in quick temporal succession, we can run stereo on the two left images to generate confidence data for the same location in the left/right stereo. The reverse example uses the right-based SVS results to predict right/left stereo performance.
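Steps (ii) and (iii) above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `svs_error_mask`, the list-of-lists disparity layout, and the default threshold of 0.5 pixels are assumptions made for the example. Step (i), the stereo matcher itself, is represented by its output disparity map.

```python
def svs_error_mask(svs_disparity, threshold=0.5):
    """Label SVS errors (step ii).

    svs_disparity: rows of disparities obtained by matching two images
    taken from the same view (output of step i).
    threshold: half-width of the accepted interval around zero disparity
    (hypothetical default, chosen only for this example).

    Returns a mask of the same shape; True marks a pixel where the
    binocular stereo using this view is predicted to fail (step iii).
    """
    return [[abs(d) > threshold for d in row] for row in svs_disparity]


# Toy SVS output for a static camera: most disparities are near zero.
svs = [[0.0, 0.1, -7.5],
       [0.2, 12.0, 0.0]]
mask = svs_error_mask(svs, threshold=0.5)
```

A larger `threshold` would be the natural knob to turn when the camera moves slightly between the two frames.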

Sensor noises, such as CCD noise and framegrabber jitter, are the primary difference between the two SVS images. Since these noise sources also exist between the binocular cameras, failure on SVS imagery should predict failure on binocular imagery at the same location. If we define signal as image structure that arises from the scene and noise as image structure that arises from other sources, then one can view SVS as a measure of signal to noise. In areas with low signal where noise dominates the scene-based image structure, SVS should return a low confidence score for the stereo matcher. Any stereo system can use this confidence metric because SVS uses the same stereo algorithm that the binocular stereo system uses. Moreover, the computational expense is tolerable for many applications; SVS requires one extra matching process.

In practice, many stereo systems are in motion, and the disparity of the two successive monocular images may not be zero. Still, the SVS disparity values should be small due to the limited motion between the two images. To compensate for this motion, one can relax the threshold in labeling SVS errors. For example, the disparities in the single view stereo results might be allowed to vary up to 3 pixels in either direction, beyond which the system labels the matches as erroneous. Of course, this disparity relaxation depends on the system parameters. A fast frame rate and distant scenery will reduce the disparity between successive camera positions. In this paper, we test the SVS confidence metric on still cameras, and the disparity should be exactly zero at all SVS image pixels.

In our study, we employ a basic scanline-search stereo algorithm. Since we only aim to measure internal confidence, and not compare our stereo approach with others, the absolute performance of our algorithm is less relevant. Following rectification, the images are filtered with a Laplacian operator [6], accentuating high frequency edges. Next, the matcher calculates 7x7 windowed modified normalized cross-correlation values (MNCC) for integral shifts, where

$$\mathrm{MNCC}(I_L, I_R) = \frac{2\,\mathrm{cov}(I_L, I_R)}{\sigma^2(I_L) + \sigma^2(I_R)}.$$
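A windowed score of this kind, searched over integer shifts, might look like the sketch below. It assumes the commonly used form of MNCC, twice the covariance divided by the sum of the two variances, and it simplifies the 7x7 window to a 1-D window on one scanline. The names `mncc` and `best_integer_shift` and the toy scanlines are hypothetical.

```python
def mncc(left_window, right_window):
    """Modified normalized cross-correlation of two equal-length windows.

    Assumed form: 2 * cov(L, R) / (var(L) + var(R)), which lies in [-1, 1].
    Returns 0.0 when both windows are constant (zero denominator).
    """
    n = len(left_window)
    mean_l = sum(left_window) / n
    mean_r = sum(right_window) / n
    dl = [v - mean_l for v in left_window]
    dr = [v - mean_r for v in right_window]
    cov = sum(a * b for a, b in zip(dl, dr)) / n
    var_l = sum(a * a for a in dl) / n
    var_r = sum(b * b for b in dr) / n
    denom = var_l + var_r
    return 2.0 * cov / denom if denom else 0.0


def best_integer_shift(left_row, right_row, x, half_width, max_shift):
    """Return (shift, score) with the peak MNCC for the window centred at
    column x of the left scanline, over shifts in [-max_shift, max_shift].
    Assumes x - half_width >= 0 and x + half_width < len(left_row)."""
    left_win = left_row[x - half_width: x + half_width + 1]
    best_shift, best_score = None, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        lo = x + shift - half_width
        hi = x + shift + half_width + 1
        if lo < 0 or hi > len(right_row):
            continue  # candidate window would leave the image
        score = mncc(left_win, right_row[lo:hi])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


# Toy scanlines: the right row is the left row moved two pixels right.
left_row = [0, 0, 1, 5, 2, 0, 0, 0, 0]
right_row = [0, 0, 0, 0, 1, 5, 2, 0, 0]
shift, score = best_integer_shift(left_row, right_row, x=3,
                                  half_width=1, max_shift=2)
```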

The disparity is the shift of each peak MNCC value. Subpixel peak localization is made via interpolation with a quadratic polynomial. For the tests, both the binocular stereo and SVS algorithm search along scanlines in a disparity range of ±20 pixels.

2.2 Left/Right Consistency (LRC)

Our implementation of left/right checking relies on the consistency between left-based and right-based matching processes. The matching processes should produce a unique match, and left/right checking attempts to enforce this. Formally, if $x_R$ matches $\hat{x}_L = x_R + d_R$, where $x_R$ is a right coordinate and $\hat{x}_L$ is an estimated left match at right-based horizontal disparity $d_R$, then LRC is defined as

$$\mathrm{LRC}(x_R) = \lvert x_R - (\hat{x}_L + d_L) \rvert,$$

with $d_L$ the left-based horizontal disparity.

When the error is high, then the confidence is low. A low confidence score corresponds to the situation when the left and right-based matches disagree significantly.

2.3 Matching Score Metric (MSM)

The matching score metric uses the similarity score, $c_{\max}$, at the disparity where the matcher has chosen a match. MSM estimates whether the match was successful or not based on this score. Intuitively, a low similarity score means that the matcher itself does not believe its best match was very good. The range of MSM is limited to the range of the similarity metric, which in the case of normalized cross-correlation is $[-1, 1]$.

2.4 Curvature Metric (CUR)

The curvature metric measures the curvature of the similarity function around the maximal similarity score $c_{\max}$ and uses the formula:

$$\mathrm{CUR} = 2c_{\max} - c_{\mathrm{left}} - c_{\mathrm{right}},$$

where $c_{\mathrm{left}}$ and $c_{\mathrm{right}}$ are the similarity scores to the left and right of $c_{\max}$. When curvature is low, CUR predicts a false match. A low curvature score happens in low-texture regions, where nearby values appear to be similar.

2.5 Image Entropy (ENT)

The image entropy metric relies on the intuition that stereo works best in high texture areas. Entropy, much like MDL [16], measures image texture. If we assume the pixel values $X$ are discrete random variables (RVs) with point mass function $p$, then we can define the entropy $H$:

$$H(X) = -E_p[\log p(X)].$$

An intuitive description of entropy is that it measures the randomness of a RV. A low entropy means that the probability mass is concentrated on a few values of the support set for a given RV. For example, a constant region in an image has a one point distribution, and thus a lower entropy than a highly textured region. Since low texture causes problems for many stereo systems, a low entropy score can be used to predict stereo error. In our implementation, we use a 20 bin histogram to measure the probability function of a 7x7 support window around each pixel.

2.6 Peak Ratio Metric (PKR)

Ideally, a match should be unique, and PKR attempts to enforce this by rejecting any match with more than one similar candidate. The peak ratio metric computes the ratio of the two most similar matches, $c_{\mathrm{second\,max}}$ and $c_{\max}$. When the ratio is close to one, PKR predicts a false match. When the ratio is small, PKR indicates a unique match, which is ostensibly an accurate match. Typically, the peak ratio is high in areas of repetitive structure and low texture.

2.7 Comparison Methodology

For the comparisons, we first obtain and rectify a stereo pair, from which we manually construct a ground truth at integer resolution (see Figure 2). We use this ground truth to verify our confidence match predictions. First, we measure where our stereo algorithm has matched badly, counting any match within 1 pixel of the truth as valid. Unmatchable half-occlusion points, dead pixels and boundary points are excluded from the statistics. We elect to use a binary hit/miss scoring system because in the current context we feel that once a match is erroneous, it is matching a wrong partner, and the direction and magnitude of the error is irrelevant. After calculating where the matcher actually fails, we calculate where each confidence metric estimates failure.

We characterize the performance of a confidence metric using two numbers, the error and match prediction probabilities: the conditional probability that the binocular stereo actually produces an error (match) given the confidence metric predicts an error (match). We choose these probabilities rather than overall number of prediction hits because they measure false positives and false negatives and because they normalize the results for easy comparison. The probability is a realistic measure because it quantifies the impact if a system always trusts the error estimates of a particular confidence metric.
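The two conditional probabilities can be computed directly from a pair of per-pixel boolean lists. The sketch below is an illustration under assumptions: the function name `prediction_probabilities` is hypothetical, and the excluded points (half-occlusions, dead pixels, boundary points) are assumed to have been removed from both lists beforehand.

```python
def prediction_probabilities(predicted_error, actual_error):
    """Return (P(actual error | predicted error),
               P(actual match | predicted match)).

    Both inputs are equal-length sequences of booleans, one entry per
    scored pixel. A probability is None when its condition never occurs.
    """
    err_total = err_hits = match_total = match_hits = 0
    for predicted, actual in zip(predicted_error, actual_error):
        if predicted:
            err_total += 1
            err_hits += 1 if actual else 0
        else:
            match_total += 1
            match_hits += 0 if actual else 1
    error_prob = err_hits / err_total if err_total else None
    match_prob = match_hits / match_total if match_total else None
    return error_prob, match_prob


# Five scored pixels: two error predictions, three match predictions.
predicted = [True, True, False, False, False]
actual = [True, False, False, False, True]
error_prob, match_prob = prediction_probabilities(predicted, actual)
```

In this toy case one of the two error predictions is a false positive and one of the three match predictions is a false negative, which is what lowers the two probabilities below one.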


Figure 2: Stereo Error Measurement. Using ground truth, we measure where the stereo system fails. The top row contains the first test stereo pair. The stereo results are on bottom left, with a manually-obtained ground-truth to the right. The ground truth labels and excludes unmatchable points, including half-occlusion and boundary points.

3 Results

This section shows quantitative and qualitative comparisons of SVS performance relative to other confidence metrics. The dataset for these comparisons consists of three stereo pairs showing a box, a doll, and a pair of shoes. Each dataset demonstrates different aspects of the confidence metrics' performance. The box demonstrates a planar scene with high texture. The doll test set demonstrates how the confidence metrics react to geometric difficulties, such as occlusion boundaries and a search range that is small with respect to the attendant disparity. The shoes dataset demonstrates how the metrics react to repetitive structure and large areas of low texture.

For each case, we run the stereo on two left images to get SVS data. Then, we run both the left-based and right-based matching processes on the binocular stereo pair. We calculate ENT on the rectified left image before the stereo matching, and the MSM, PKR and CUR metrics are generated during the left-based stereo run. LRC is executed after the two stereo processes have finished.
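The per-pixel bookkeeping described in this paragraph could be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the "second most similar match" for PKR is taken to be the highest other local peak of the similarity curve, similarity scores are assumed positive when forming the ratio, entropy is measured in bits over pixel values in 0..255, and disparities for LRC are integers.

```python
import math


def _is_local_peak(scores, i):
    """True if scores[i] exceeds both neighbours (missing ones ignored)."""
    left = scores[i - 1] if i > 0 else float("-inf")
    right = scores[i + 1] if i < len(scores) - 1 else float("-inf")
    return scores[i] > left and scores[i] > right


def curve_metrics(scores):
    """MSM, CUR and PKR for one pixel, where scores[i] is the similarity
    at the i-th candidate disparity of the left-based run."""
    best = max(range(len(scores)), key=scores.__getitem__)
    c_max = scores[best]
    msm = c_max
    # CUR needs both neighbours of the peak; None at the range border.
    cur = None
    if 0 < best < len(scores) - 1:
        cur = 2 * c_max - scores[best - 1] - scores[best + 1]
    # PKR: highest other local peak divided by the best score (assumption).
    others = [scores[i] for i in range(len(scores))
              if i != best and _is_local_peak(scores, i)]
    pkr = max(others) / c_max if others and c_max > 0 else None
    return {"msm": msm, "cur": cur, "pkr": pkr}


def window_entropy(pixels, bins=20, max_value=255):
    """ENT: entropy in bits of a histogram of one support window."""
    counts = [0] * bins
    for v in pixels:
        counts[min(int(v * bins / (max_value + 1)), bins - 1)] += 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def lrc_error(x_right, disp_right, disp_left_row):
    """LRC: |x_R - (x_L_hat + d_L)| for one right-image column, or None
    when the estimated left match falls outside the row."""
    x_left = x_right + disp_right
    if not 0 <= x_left < len(disp_left_row):
        return None
    return abs(x_right - (x_left + disp_left_row[x_left]))


metrics = curve_metrics([0.1, 0.3, 0.9, 0.4, 0.2, 0.8, 0.3])
```

Here the curve has a strong second peak (0.8 against 0.9), so the peak ratio is close to one, the situation in which PKR would predict a false match.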

For a given test, we calculate false positives and false negatives. In this case, a false positive is where the confidence metric predicts a stereo failure, but does so incorrectly. A false negative is where the confidence metric predicts a stereo hit, but does so incorrectly. False positives decrease the error prediction accuracy, while false negatives decrease the match prediction accuracy. The purpose of SVS is to predict stereo failure, not success. However, we include the false negatives for completeness.

The quantitative comparison rank orders each confidence metric's estimates of binocular failure and plots how well the metrics predict stereo performance as the x axis progresses from best to worst estimates. We compare the metrics at a given participation rate, by which we mean the percent of the possible matches that a confidence metric estimates that the binocular stereo will fail. A higher participation rate includes lower confidence points. Since SVS and LRC do not predict as many errors as others, they will not produce results at higher participation rates. For example in case 1, SVS only predicts that up to 7% of the stereo matches will be failures. Although this termination seems premature, this is not an error; the metric cannot predict any more errors without jumping to the useless case where every point is an error prediction. Similarly, ENT produces many estimates, even at the lowest threshold of zero.

3.1 Case 1: Box

The first test set shows a box on top of an optical bench in front of a textured background (see the Box set in Figure 3). Although this test set does exhibit features which cause stereo error, such as a low-texture patch at top and occluding boundaries near the box, the scene has high texture and planar scenery. Both factors aid the stereo performance and allow the confidence metrics to predict fewer errors.

Qualitatively, one can see the differences between the confidence metrics at the top of Figure 3. The figure displays in white the pixels at which each metric predicts the stereo system will err. For an even comparison, we threshold each metric so that 7% of the possible matches are error predictions. The SVS metric correctly predicts the stereo errors in the textureless patch at top, and also correctly labels points in the lower portion of the image as errors. SVS is not able to label repetitive structure in this test because the holes on the table are distinctive enough that SVS matches correctly at those regions. LRC labels many points in the top patch as true matches, finding that both the left and right-based matching process agree on the false match. LRC counts errors made in either of the two views; so, some right-based errors are introduced into the left-based matching error results. CUR and PKR mistakenly predict errors at correct matches that happen to have smooth correlation curves. MSM and ENT perform well at this participation rate, but mistakenly label a few more errors within the lower portion of the image than SVS.

The quantitative results for the Box test are shown at the left of Figure 4. We present two graphs: error prediction at top and correct match prediction at bottom. Ideally, one would like to see two traits in these graphs: a high-probability point at high participation and also a flat curve followed by a sharp descent. The high probability point indicates that the metric can accurately predict binocular stereo error. The flat curve indicates that the metric is relatively insensitive to thresholding.
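An error prediction curve of the kind described here could be traced by rank ordering pixels by confidence and sweeping the participation rate. The sketch below is illustrative only: `error_prediction_curve` is a hypothetical name, a lower confidence value is assumed to mean a stronger error prediction, and the participation rates are arbitrary sample points.

```python
def error_prediction_curve(confidence, actual_error, rates):
    """For each participation rate, flag that fraction of pixels with the
    lowest confidence as predicted errors and report the fraction of the
    flagged pixels that are actual stereo errors.

    Returns a list of (rate, probability); probability is None when the
    rate flags no pixel at all.
    """
    # Most confident error predictions (lowest confidence) come first.
    order = sorted(range(len(confidence)), key=confidence.__getitem__)
    curve = []
    for rate in rates:
        k = int(round(rate * len(order)))
        flagged = order[:k]
        if not flagged:
            curve.append((rate, None))
            continue
        hits = sum(1 for i in flagged if actual_error[i])
        curve.append((rate, hits / len(flagged)))
    return curve


confidence = [0.9, 0.1, 0.5, 0.2, 0.8]
actual_error = [False, True, False, True, False]
curve = error_prediction_curve(confidence, actual_error, [0.2, 0.4, 0.6])
```

In this toy run the curve stays flat at 1.0 up to 40% participation and then drops, the flat-then-descending shape that the text describes as desirable.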

For error prediction, SVS exhibits both positive traits. It has a higher error prediction probability than the other metrics. Although SVS has lower participation, it consistently estimates errors with above 99% accuracy, even up to a 7% participation. SVS only reaches 7% participation because the SVS threshold ranges from ±0.2 to ±20 pixels away from zero. The binocular stereo actually produces errors for 32% of possible matches, and so the confidence metrics would need at least a 32% participation rate to predict every error. SVS does not predict all errors, but the errors it predicts, it predicts well. The closest competitor to SVS, MSM, performs at a 5% lower probability. This difference may seem negligible, but with repeated application, such as that used in hierarchical stereo, even small prediction error compounds. ENT performs well, correctly labeling the low texture regions. LRC performs well, but spuriously predicts more errors than SVS in the middle of the image. CUR and PKR are relatively inefficient at participation rates above 5%. Also, all confidence algorithms aside from SVS depend heavily upon their threshold. SVS' flat probability of error prediction makes its implementation much easier.

[Figure 3 panels, for each of the Box, Doll and Shoes test sets: Left, Right, Actual, SVS, LRC, MSM, CUR, ENT, PKR]

Figure 3: The Misses Predicted by the Confidence Metrics. The images display the pixels that each confidence metric would predict as bad matches in white, and the predictions of correct matches in black. The results for the Box test are at top, the Doll test in middle and the Shoes set at bottom. Each display shows the stereo pair, the actual stereo misses, and the misses predicted by each confidence metric.

SVS does not fare as well when predicting correct binocular matches as it did when predicting errors. Since the error prediction rates of SVS are low, SVS inherently predicts many stereo matches. Interestingly, SVS performs well at this high level of match prediction, higher than the other metrics. However, the performance gain is marginal as all of the metrics converge to the same point.

[Figure 4 plots, one column per test set (Box, Doll, Shoes). Top row: Error Prediction Accuracy, probability of successful error prediction against participation (percent of image points predicted to be errors). Bottom row: Match Prediction Accuracy, probability of successful match prediction against participation (percent of image points predicted to match correctly). Curves: SVS, LRC, MSM, CUR, ENT, PKR.]

Figure 4: Quantitative Results. The top row contains charts that show how well each confidence metric predicts stereo error, while the bottom row contains charts showing how each metric predicts stereo success. As the metrics predict more binocular error, their error probability decreases. As they do this, they predict fewer stereo hits, and the hit probability increases. We compare the metrics at all possible LRC threshold levels. Note that SVS and LRC do not predict as many errors as the other metrics. The next available threshold would force the two metrics to indiscriminately predict all pixels as erroneous.

3.2 Case 2: Doll

The second test set shows a doll next to a pole (see the Doll set in Figure 3). This test differs from the previous one in that the subject is non-planar, the background has less texture, and the pole is thin with many occluding borders. More importantly, the head and chest of the doll figure lie outside the binocular stereo search range of ±20 pixels. The case demonstrates how the metrics behave around geometric error sources. The error sources lead the binocular stereo to achieve only a 22% match rate.

The qualitative data lies in the middle of Figure 3. For an even comparison, we threshold each metric to show the case where 36% of the possible matches are predicted errors. As before, SVS and ENT best predict the stereo errors in the textureless black patch, and also correctly label points in the lower portion of the image as errors. Unlike LRC and PKR, the other metrics are not able to label points on the doll's chest and head where the match was out of the stereo search range. The same is true for the stereo errors on the floor that are out of range and the stereo errors near the occluding boundary of the pole. MSM and ENT perform well at this participation rate, but mistakenly label a few more errors within the backboard than SVS.

The quantitative results for the Doll test are shown in the middle of Figure 4. SVS has a higher error prediction probability than the other metrics, bolstered by its performance around the black background. The closest competitors to SVS, MSM and ENT, perform at a slightly lower probability. LRC performs well, but spuriously predicts more errors than SVS in the middle of the image. Again, CUR and PKR are inefficient. As before, SVS has the flattest curve, which indicates that SVS does not need its threshold fine-tuned. At the bottom of the figure, SVS is the top performer in predicting correct stereo matches, albeit at a low rate of error prediction. SVS outperforms the other metrics only in the area of the black background, and even though it predicts too many matches (too few errors) in the area of the doll, its overall rate is highest. MSM and ENT have a high match prediction rate for similar reasons. LRC is able to correctly predict matches in the area of the doll, but over-predicts matches in the black background, putting LRC third to MSM and SVS.

3.3 Case 3: Shoes

The last test set shows a pair of sneakers in front of a racquet (see the Shoes set in Figure 3). This test differs from the previous cases in that the objects exhibit significant repetitive structure and the image has a large area without texture. Such test imagery may realistically simulate the low texture patches found in outdoor urban terrain or scenes with large patches of sky. These error sources cause the binocular stereo to achieve only a 15% match rate.

The qualitative data is at the bottom of Figure 3. For an even comparison, we threshold each metric so that the images display the case where 62% of the possible matches are predicted errors. The SVS and ENT metrics perform best in the textureless black patch. Unlike MSM and ENT, SVS predicts stereo error in the repetitive pattern of the racquet strings. SVS performs well in this area, but not as well as it normally does in the low-texture regions. Because the repetitive structure is rarely an exact repeat, while the same structure at the same point is, the zero disparity solution has a strong tendency to beat out the shift-by-one-cycle solution for SVS. PKR is able to label the repetitive structure of the tennis racquet. LRC detects errors both in the background and in the repetitive structure of the racquet. CUR has high confidence in the racquet face, but overpredicts error in the front of the top shoe.

The quantitative results for the Shoes test are shown at the right of Figure 4. SVS error predictions in the repetitive structure of the racquet are correct overall, but at a lower probability than error predictions in the textureless areas. For this reason, SVS drops to 98% probability and scores third to MSM and ENT in predicting stereo error. LRC predicts errors well in both the area of the racquet face and the background, but produces false error predictions, which puts its error prediction probability near 97%, or fourth. All algorithms have a flat curve, which signals their insensitivity to thresholding. When predicting matches, the methods perform similarly. SVS does not achieve the same match prediction rate as MSM and ENT because the repetitive error structure of the racquet face causes SVS to err slightly.

3.4 Grouped Results

To quantify exactly which types of error sources the metrics are estimating, we manually segmented the actual stereo misses into four groups: out-of-range errors, low-texture errors, repetitive structure errors, and other errors (see Figure 5). We measure the successful error estimates for each metric in each of these categories. The plots compare the metrics when working at participation rates of 7, 36, and 62% for the Box, Doll, and Shoes cases, respectively.
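The per-group count described above can be sketched in a few lines. This is an illustrative sketch in Python with NumPy rather than the code used for the experiments. The gray levels are taken from the Figure 5 caption. The function name is hypothetical, and it is assumed that any other gray value marks a pixel that is not an actual stereo error.

```python
import numpy as np

# Gray levels of the manually grouped error image (from the Figure 5 caption).
ERROR_GROUPS = {
    "Out of Range": 50,
    "Low Texture": 150,
    "Repetitive Structure": 200,
    "Other": 255,
}

def grouped_correct_estimates(group_image, predicted_errors):
    """Count correctly estimated errors in each manually labeled group.

    group_image:      integer array holding the gray levels above; any
                      other value is assumed to mean "not an error".
    predicted_errors: boolean array, True where a metric predicts an error.
    """
    counts = {}
    for name, gray in ERROR_GROUPS.items():
        in_group = group_image == gray
        counts[name] = int(np.count_nonzero(in_group & predicted_errors))
    return counts
```

Running this once per metric, with each mask thresholded to the same participation rate, would yield one bar per metric in each of the four error groups.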

The plots show that SVS predicts the most low-texture errors correctly in all but the last case, with MSM and ENT behind SVS in the first two cases. PKR is effective at detecting repetitive structure, especially in the Shoes case, where the racquet face presents difficulty for the matcher. LRC has the most even performance across error categories, picking up all four kinds of error effectively.

Figure 5: Grouped Error Estimates. For each test case, the left image depicts the manually grouped stereo errors, while the right plot shows the correctly labeled subset of errors when each metric estimates 7, 36, and 62% (respectively) of possible matches to be errors. In the manually grouped errors image, the grayscales correspond to the type of error, where 50 means 'Out of Range', 150 means 'Low Texture', 200 means 'Repetitive Structure', and 255 means 'Other'. [Plots omitted: "Grouped Error Estimation at 7% Participation", "Grouped Error Estimation at 36% Participation", and "Grouped Error Estimation at 62% Participation". Horizontal axis: error group (Out of Range, Low Texture, Repetitive, Other). Vertical axis: Number of Errors Estimated Correctly. Series: SVS, LRC, MSM, CUR, ENT, PKR.]

4 Discussion

4.1 Algorithm Results

Each algorithm has different performance characteristics. SVS performs best in regions of low texture. LRC detects most types of stereo error (see Table 1), but does not predict error when the two matching processes mistakenly agree. As evidenced by the black regions for LRC in Figure 3, this happens often. When both matching processes are accurate, LRC predicts fewer false errors. Like SVS, MSM and ENT react to a subset of the possible error sources, but do so with a high probability of success. All three are particularly sensitive to low-texture areas, but MSM is also able to detect errors from more sources than SVS and ENT. Future work includes testing non-MNCC MSM metrics, such as SSD, which may not measure low-texture regions as well. PKR is particularly effective in repetitive structure areas, but produces many false positives for correct matches in areas of repetitive structure (e.g., the Shoes data set) and fails in low-texture regions where erroneous matches are unimodal. CUR has a significant false positive rate, labeling as errors the correct matches that have low curvature.
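The blind spot of LRC noted above, where both matching processes agree on a wrong answer, is easy to see in a sketch of the check. The code below is a generic left-right consistency test in Python with NumPy, not the exact formulation evaluated in this paper. The function name, the `tol` parameter, and the sign convention (a left pixel at column x with disparity d matches the right pixel at column x - d) are all assumptions made for illustration.

```python
import numpy as np

def left_right_check(disp_left, disp_right, tol=1):
    """Return a boolean map that is True where LRC predicts an error.

    Assumed convention: left pixel (y, x) with disparity d corresponds
    to right pixel (y, x - d), and both maps store positive disparities.
    """
    h, w = disp_left.shape
    ys, xs = np.indices((h, w))
    # Column of the matched pixel in the right image.
    xr = xs - np.rint(disp_left).astype(int)
    in_bounds = (xr >= 0) & (xr < w)
    xr_safe = np.clip(xr, 0, w - 1)
    # Disparity reported by the right-to-left match at that pixel.
    back = disp_right[ys, xr_safe]
    consistent = in_bounds & (np.abs(disp_left - back) <= tol)
    return ~consistent
```

The test only compares the two disparity maps with each other. If both matchers return the same incorrect disparity at corresponding pixels, `consistent` is True there and no error is predicted, which is the failure mode described in the text.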

The confidence metrics have other advantages and disadvantages. Most notably, the non-SVS methods predict more error. Labeling more errors with less accuracy may sometimes be desirable. However, when every match is needed, the error prediction probability is more important than the number of error predictions, because a metric with a high error prediction probability discounts fewer matches and does so more accurately. Another difference lies in the metrics' implementation. SVS and LRC do not depend critically on thresholding, which makes implementation easier. The other confidence metrics are sensitive to their threshold and need to be more finely tuned for optimal accuracy. The running time for the metrics is also different. The fastest metric, MSM, requires no additional computation since it relies on the stereo matching process itself. CUR and PKR require an incremental calculation during the stereo maximization phase. SVS and LRC require an additional matching process, while ENT requires a costly histogramming operation.

4.2 Summary

In this paper, we have presented a new confidence metric, SVS, which delivers high probability of error and match prediction, especially around low-texture regions. We have compared the metric against five other metrics, LRC, MSM, CUR, ENT and PKR, under a variety of laboratory settings.

The choice of confidence metric ultimately depends upon the scenery at hand and the operational requirements of the system. When low-texture areas dominate the scenery, SVS has a high probability of success; it also can diagnose difficulties arising from sensor noise and periodic structure. When the matching is generally accurate, LRC will be a good confidence metric because it reacts to most error source types and both of its matching processes should succeed. MSM suits systems that require extreme speed because the system already has to calculate a match score. In the final analysis, it may be desirable to combine these metrics to take advantage of their different strengths.

References

[1] P. Anandan. Computing Dense Displacement Fields with Confidence Measures in Scenes Containing Occlusion. In SPIE Intelligent Robots and Computer Vision, volume 521, pages 184–194, 1984.

[2] P. Anandan. A Computational Framework and an Algorithm for the Measurement of Visual Motion. IJCV, 2:283–310, 1989.

[3] D.N. Bhat and S.K. Nayar. Ordinal Measures for Image Correspondence. PAMI, 20(4):415–423, 1998.

[4] S.D. Blostein and T.S. Huang. Error Analysis in Stereo Determination of 3-D Point Positions. PAMI, 9(6):752–765, 1987.

[5] R. Bolles, H. Baker, and M. Hannah. The JISCT Stereo Evaluation. In DIUW, pages 263–274, 1993.

[6] P. Burt and E. Adelson. The Laplacian Pyramid as a Compact Image Code. Trans. Computers, 31:532–540, 1983.

[7] C. Chang, S. Chatterjee, and P.R. Kube. On an Analysis of Static Occlusion in Stereo Vision. In CVPR, pages 722–723, 1991.

[8] S. Das and N. Ahuja. Performance Analysis of Stereo, Vergence and Focus as Depth Cues for Active Vision. PAMI, 17(2):1213–1219, 1995.

[9] G. Egnal and R.P. Wildes. Detecting Binocular Half-Occlusions: Empirical Comparisons of Five Approaches. PAMI, 2002. To Appear.

[10] O. Faugeras, P. Fua, B. Holz, R. Ma, L. Robert, M. Thonnat, and Z. Zhang. Quantitative and Qualitative Comparison of Some Area and Feature-Based Stereo Algorithms. In Forstner and Ruwiedel, editors, Robust Computer Vision: Quality of Computer Vision Algorithms, pages 1–26. Wichmann, 1992.

[11] W.E.L. Grimson. A Computer Implementation of a Theory of Human Stereo Vision. In Phil. Trans. Royal Soc. London, volume B292, pages 217–253, 1981.

[12] M.J. Hannah. Computer Matching of Areas in Stereo Images. PhD thesis, Stanford University, May 1974.

[13] M.J. Hannah. Detection of Errors in Match Disparities. In DIUW, pages 283–285, 1982.

[14] G. Kamberova and R. Bajcsy. Sensor Errors and the Uncertainties in Stereo Reconstruction. In Kevin W. Bowyer and P. Jonathan Phillips, editors, Empirical Evaluation Techniques in Computer Vision, pages 96–116. IEEE Computer Society Press, 1998.

[15] K. Konolige. Small Vision Systems: Hardware and Implementation. In Intl. Symp. Robotics Research, 1997.

[16] Y.G. Leclerc. Constructing Simple Stable Descriptions for Image Partitioning. IJCV, 3(1):73–102, 1989.

[17] Y.G. Leclerc, Q.-Tuan Luong, and P. Fua. Self-Consistency: A Novel Approach to Characterizing the Accuracy and Reliability of Point Correspondence Algorithms. In DIUW, pages 793–807, 1998.

[18] J. Little and W. Gillett. Direct Evidence for Occlusion in Stereo and Motion. In ECCV, pages 336–340, 1990.

[19] X. Liu, T. Kanungo, and R.M. Haralick. Statistical Validation of Computer Vision Software. In DIUW, pages 1533–1540, 1996.

[20] A. Luo and H. Burkhardt. An Intensity-Based Cooperative Bidirectional Stereo Matching with Simultaneous Detection of Discontinuities and Occlusions. IJCV, 15:171–188, 1995.

[21] R. Mandelbaum, G. Kamberova, and M. Mintz. Stereo Depth Estimation: A Confidence Interval Approach. In ICCV, 1998.

[22] L. Matthies. Stereo Vision for Planetary Rovers: Stochastic Modeling to Near Real-Time Implementation. IJCV, 8:71–91, 1992.

[23] J.E.W. Mayhew and J.P. Frisby. Psychophysical and Computational Studies Towards a Theory of Human Stereopsis. Artificial Intell., 17:349–385, 1981.

[24] J. Mulligan, V. Isler, and K. Daniilidis. Performance Evaluation of Stereo for Tele-presence. In ICCV, 2001.

[25] Y. Ohta and T. Kanade. Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming. PAMI, 7(2):139–154, 1985.

[26] D. Samaras, D. Metaxas, P. Fua, and Y.G. Leclerc. Variable Albedo Surface Reconstruction from Stereo and Shape from Shading. In CVPR, pages 480–487, 2000.

[27] Kiyohide Satoh and Yuichi Ohta. Occlusion Detectable Stereo - Systematic Comparison of Detection Algorithms. In ICPR, 1996.

[28] D. Scharstein. Stereo Vision for View Synthesis. PhD thesis, Cornell University, Jan. 1997.

[29] D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. IJCV, 47(1):7–42, 2002.

[30] T. Smitley and R. Bajcsy. Stereo Processing of Aerial Urban Images. In IJCPR, pages 405–409, 1984.

[31] R. Szeliski. Prediction Error as a Quality Metric for Motion and Stereo. In ICCV, pages 781–788, 1999.

[32] R. Trapp, S. Drue, and G. Hartmann. Stereo Matching with Implicit Detection of Occlusions. In ECCV, pages 17–33, 1998.

[33] J. Weng, N. Ahuja, and T.S. Huang. Two-View Matching. In Proc. Int'l Conf. Computer Vision, pages 64–73, 1988.

[34] Y. Xiong and L. Matthies. Error Analysis of a Real-Time Stereo System. In CVPR, 1997.