stereo-matching of line-segments based on a 3 …...stereo-matching of line-segments based on a...

Stereo-matching of line-segments based on a 3-dimensionalheuristic with potential for parallel implementation.

R.L. Vergnet, S.B. Pollard and J.E.W. Mayhew.Artificial Intelligence Vision Research Unit,University of Sheffield, Sheffield, S10 2TN.

This work provides a segment-based alternative to theedge-based stereo algorithm already existing in the TINAsystem. Our starting point is the algorithm of Ayache &Faverjon (1985). It attempts to recover groups of line-matches having low local disparity variations. In ourimplementation of Ayache & Faverjon's algorithmmatches were built quickly, but glaring mistakes occurredand the general behaviour was difficult to control. There-fore the constraints on the continuity of the world impli-citly used in the original approach were reformulated toachieve more robust matching, more particularly byrequiring mutual support between reconstructed 3-D seg-ments. A new algorithm has been designed that is suit-able for parallel implementation, where 3-D matches andtheir neighbourhood relationships are explicitly computed,cliques found (objects) and uniqueness enforced.

This paper describes a line based approach to binocularstereo that augments the capabilities of the TINA system(Porrill et al 1987, Pollard et al 1987). The alreadyexisting stereo matching is edge-based utilising the PMFalgorithm (Pollard 1985) and its descendents. Whilst anedge based approach is more general, image structuresthat are amenable to description by linear approximationscan be efficiently and robustly matched by an alternativestrategy. Hence a combined approach is adopted allowingeither, redundant cross primitive consistency to giveincreased robustness, or segment matching to seed match-ing at the lower level to improve efficiency.

The starting point for this work is the segment matchingalgorithm due to Ayache and Faverjon (1985) whichemploys a hypothesis, propagation and test strategy toidentify corresponding segments. Only a small number ofwell selected segments act as a basis for matchhypotheses, the remainder being found as a result of thepropagation stage. Their algorithm sets up a few likelycorrect matches and propagates them. Along the propaga-tion, a weak disparity gradient constraint is enforcedthrough an imposed limit upon the disparity differencebetween neighbouring matched segments.

R.L. Vergnet is a visiting research student from ENST, 46rue Barrault, 75013 Paris, and gratefully acknowledges theBritish Council for their support.

In this way the algorithm can be thought to be identifyingthe most geometrically plausible three-dimensional struc-tures from the pool of potential matches by buiding themrecursively. Once a few but strong hypotheses have beenpropagated independently, the final selection of matches isachieved preferring those generated by the most success-fully propagated hypotheses.

One disadvantage of the Ayache and Faverjon's algorithmis its unsuitability for parallel implementation. The finestgrain of potential parallel computation is the propagationof the match hypothesis. In our serial implementation oftheir algorithm only a small number of hypotheses(between 5 and 15) are usually necessary to achieve fullmatching, and only the correct hypotheses (a small pro-portion) account for the largest proportion of computa-tional effort because they propagate most widely. Further-more, there is great redundancy in the matches raised bythese good hypotheses. The improvement of performancewith increasing number of processors soon achievessaturation.

This paper describes an alternative segment based algo-rithm more suited to a parallel approach. While ourcurrent implementation is on a sequential machine it isour intention to develop the algorithm to run on MARVINthe AIVRU transputer-based vision architecture.

The other major departure from Ayache and Faverjon'soriginal algorithm is that we prefer an implementation ofthe continuity constraint based upon actual three dimen-sional connectivity rather than the computationally lessexpensive alternative of rough disparity similarity. Themore powerful constraint actually involves little extracomputational overhead when a simplified parallelrepresentation of the camera geometry is used.

1. LINE DESCRIPTION

1.1. Geometry and Representation

Edges are obtained from Canny operator (1986) at a sin-gle high frequency (small sigma Gaussian smoothing).Following their detection, edge-strings are formed by link-ing edge pixels using a few simple heuristics. Straight-line approximation employs a recursive fit (by orthogonalregression) and segment strategy. Segmentation points areadded when the edge string line deviates from the currentline fit. Robustness and efficiency are obtained throughthe use of a heuristic search strategy to identify thoseregions of strings/sub-strings that are most amenable to

181

straight-line fit.

We have found this scheme to be more reliable at identi-fying true straight lines consistently than more traditionalpolygonal approximations usually intended to segmentedge-strings for data-compression purposes. However thisstrategy proves to be less able to deal with data that doesnot arise from linear features.

Line segments are rectified; that is they are reprojectedinto a parallel camera geometry equivalent to the originalnon-parallel one. This has the effect that the epipolarconstraint becomes a same raster constraint and the pro-cess of 3-D projection is greatly simplified.

The left and right cameras have optic centers Ol and Or

respectively. The principal axes are perpendicular to theinterocular axis that connects the optic centers. Coplanarimage planes, parallel to the interocular axis are illustratedin front of the optic centers for simplicity. The worldcoordinate frame is located at the optic centre of the leftcamera with axes along the focal direction (z : positivetowards the scene), the interocular direction (x : positiveaway from the left optic center), and the direction mutu-ally orthogonal to these (y : positive upwards). The coor-dinate frames of each image are located where the princi-pal axes intersect their respective image planes with the xand y axes aligned with those of the world frame.

Consider a point P at (x, y, z) and its projection into theleft and right images at ? , = QCh Y) and Pr = (Xn Y)respectively. The disparity d between these projections isgiven by d = Xt - Xr.

It follows from similar triangles and algebraic manipula-tion that

x =X£d

y = ILd d.

where / is the focal length of the cameras, and / the dis-tance between their optic centres.

Line segments are denoted L; and Lr in the left and rightimages respectively and as L in the 3-D world. Each linesegment is represented in the overdetermined fashion bythe attribute tuple

(Pi,P2, V, C, con, length)

Where:

• Pi and P-i are end point vectors

• V is the direction Px to P2

• C the centroid

• con the average measured contrast along the line (orits projection into the left image) which is measuredin the direction V with positive polarity defined as"dark to the left/light to the right"

• length is the length of the line

1.2. Windows

Each image has a window bucket structure over the linestructure to provide efficient spatial access. Each image issegmented into 16 by 16 grid of windows. Each windowhas a line "bucket", i.e a list of references to the lines thatintersect the window (these are accessed bucket[i][j] forthe window row i col j).

Additionally each line has a list of windows it intersects.Window buckets simplify the search for match candidates.Search is restricted to the set of window buckets that liewithin the disparity search interval (which is of limitedextent). As shown below, buckets are also used for fastcomputation of image-based segment neighbourood rela-tionships

1.3. Line Matching

Line matches are identified provided they satisfy the fol-lowing criteria (all these criteria are verified symmetri-cally on a left-to-right and right-to-left basis):

(1) Only line features for which

length > /„,„,.

V.i > cos^min) where i is the horizontal axisdirection,

are used as matching primitives.

Small edge-strings are very often noisy and seldomdescribe straight features of the scene. Therefore athresholding is made on the length of the segments.Qnia is an orientation threshold in order to eliminatehorizontal segments for which 3-D computation isdifficult.

i v,jv, - vriv i(2) . / * — < K : a disparity gra-

Wijvy + <yrjvry +1dient constraint for orientation.

We prefer this orientation constraint because thedisparity gradient is related to the slope of the 3-Dlines of the scene with respect to the z-axis. Thedefinition and use of the disparity gradient limit canbe found in (Pollard PMF 1985). With this formu-lation, the limit on reorientation is tuned accordingto the orientation itself. Thus, the angulardifference is maximal for vertical segments (V* = 0)and null for horizontal ones (Vy = 0). The value ofK is set between 1 and 3.

182

1 cont

(3) — < < c : a similar contrast constraint.c conr

Since the contrast criterion is not very robust, thevalue of c is set relatively high (typically 3, 4).However the mere sign of the contrast is still astrongly discriminating feature.

(4) Matches for L, must be intersected by the raster ofthe centroid C, = (X,, Y{) of Lt.This is the epipolar constraint suitably adapted forwhole segments.

(5) Matches must lie within the disparity range[Dion, Dhigh\ ; that is matches for L, must have Xr inthe range [X, - DUgh, X, - DIm/].

The limit on the allowed range for disparity isderived from the possible positions of the featuresof the scene and corresponds to a depth range

Given the existence of a match between L; and Lr wecompute L as follows:

(i) First identify the intersection of Lt and Lr in thevertical direction, call this y((W to YUgh and thepoints of intersection of the of L, and Lr with these"rasters" P, ,Pik.k,Pr, , a n d P . .

'low 'high rlow rhigh

(ii) The 3-D end points of L are computed frommatches P, ->P r , and P, ->P r respectively to

'low rlow 'high rhigh r J

give P(ow and PWgA.(iii) The other attributes (except contrast) of L are com-

puted from P/ow and PWgA.

Thus a set of 3-D lines involving a same left line Lt isgenerated:

Lh -> { L}, L?, L,"' }

Note that it is the intersection of the left and right imagefeatures that give rise to the 3-D segment. A less conser-vative approach would have been to adopt either left, orright image feature (or their union) to provided the verti-cal extent of the image projection of the 3-D primitive.

2. ALGORITHM OVERVIEW

Our basic approach is to construct all possible line matchstructures (in 3-D) prior to a stage of parallel constraintpropagation. The limiting resolution of realistic parallelimplementation being the line segment. In order to sim-plify the structure of the algorithm verification of matchesis performed with respect to one image only (the left).Unlike Ayache and Faverjon we allow line segments tohave multiple matches provided they satisfy certaingeometrical constraints and do not intersect common ras-ters.

The algorithm employs the following heuristics to exploitthe assumption of locally continuous 3-D space:

(1) We define a 3-D line neighbourhood, saying thattwo lines Lu L2 are neighbours when their distanceZ>(Li, L2) is less than a certain threshold e. Thedistance measure is not the minimum Euclidian dis-tance between Lj and L2 but the minimum endpoint distance. A continuous object is defined as aset of lines that are directly or indirectly linked byneighbourhood pairwise relationships. Continuousobjects that contain greater numbers of constituentsare preferred.

(2) Within objects, a line-segment often will haveseveral immediate neighbours therefore lineinterpretations that possess greater numbers of 3-Dneighbours will be preferred.

In order to restrict the 3-D neighbourhood search, A 2-Dimage based neighbourhood graph is computed. To eachleft line L; a list of its neighbours is attached:

-> {Q,L?,....,L?}

If the distance between 3-D lines is less than e, then theirprojections in the images are expected to be less than acertain limit

-JLv =

where zmin is a chosen minimum depth.

In the two images neighbours are computed using thebuckets, as follows :

• For each line L, look at the set of buckets attachedto L (ie that it intersects).

• Given subscripts i and j of a window correspondingto one such bucket, consider the eight neighbours ofthat bucket that differ in subscript at most one.

• Among the lines contained in these buckets, selectthe lines L' such that dist ( L, L') < v.

3. VALIDATION OF THE MATCHES

When all the possible matches have been made and thecorresponding 3-D segments computed, for each proposed3-D segment L the following attributes are maintained:

- match Lr

- 3-D_neighbs- object- obsize- nlinks

Lr is the right line that is matched to L,. The idea is toenforce uniqueness constraint on the matches by eliminat-ing ghost (incorrect) 3-D lines on the basis of the con-tinuity of the 3-D world. Within continuous 3-D objects,we give more support to the lines that have a greatnumber of neighbours.

The algorithm proceeds as follows (see schematic figurein end of this section):

183

(i) Compute 3-D neighbourhood graph:

A pair of lines L and L' are considered to be neigh-bours in 3-D if:

L ; and L{ are neighbours : where L, and L{are the left image line primitives involved inthe generation of L and L' respectively.

dist ( L, LO < e

Furthermore a pair of lines L and L' that have thesame left image line primitive L( are considered tobe neighbours if:

Lr and L / are collinear and non overlapping:where Lr and L / are the right image lineprimitives involved in the generation of L andL' respectively.

dist ( L, LO < e

(ii) Compute connected sets:

Propagation is performed along the links of the 3-Dneighbourhood graph to identify and label connectedsub-graphs. The attributes of each line are set asfollows:

nlinks = number of other segments L' it isconnected to directly.

object = number of the object/sub-graph theline belongs to.

obsize = size of the object/sub-graph the linebelongs to ie the number of lines forming theobject.

Note that each object is not necessarily consistent.It is possible (highly likely) that more than one 3-Dline segment resulting from matches of the same leftor right image primitive are included in the sameobject, most of them being incorrect. In this way itis possible for "leakage" between objects to occur(because of ghost segments that result their mergerto a single entity.

(iii) Enforce within-object uniqueness:

Ideally within object uniqueness would be computedusing some form of "max-clique" algorithm subjectto the constraint that matches are consistent exceptif they involve the same left or right image primi-tive. The maximal consistent clique found wouldprovide a definition for a potential object

The strategy we prefer on grounds of computationalefficiency and potential for parallel implementationis to remove from each object/sub-graph all exceptthe most strongly supported amongst any set of linesthat violate the uniqueness constraint with respect toeach other. Support is determined directly in termsof nlinks score ; the immediate neighbourhood sup-port. Uniqueness is defined over lines Lt from theleft image and allows only single matches exceptwhere they could arise from a single 3-D linefeature in which case they are required to be col-linear the right image.

Note that within-object uniqueness (and uniquenessin general) is determined with respect to the leftimage alone. It would be straight forward to extendthe definition of uniqueness to include the rightimage, however we have found the simplerapproach to be sufficient for our purposes.

(iv) Stage (ii) is repeated to identify new connectedobjects that are now consistent with the uniquenessconstraint (but not necessarily optimal). Discardedsegments are not allowed to form part of theseobjects. Accordingly it is possible that fragmenta-tion of the former objects into sub-objects willoccur at this point.

To summarise: on the first pass, objects are com-puted, and then segmented from low-support lines.On the subsequent pass smaller objects are identifiedthat satisfy within-object uniqueness.

(v) Enforce globed uniqueness : for each L; and for eachpair ( L , L ' ) such that they belong to a differentobjects, choose the one that belong to the biggestobject, ie that has the highest obsize, and eliminatethe other.

(vi) Remove the objects that have a obsize below thres-hold for it is very probable that they are ghostobjects.

LINE MATCHES

The prediction of hypotheses gives raise tomatches between left and right lines, fromwhich 3D segments L are computed.

Then 3D neighbourhood relations betweenlines L are set.

VALIDATION OF THE MATCHES

I. Connected sets of neighbouringsegments are found , labelled andtheir size computed.

II.Within object uniqueness is enforcedon the basis of local support, thereforelines are removed and objects arerecomputed (3 has split into 3 and 4).

IIIGlobal uniqueness with respect to left lines(

is enforced on the basis of object size.(2 has has been cleared from lines correspon-ding to left 2D lines involved in 1,3 ...)

IV.Objects whose size is too small are removed V J f \for being supposedly ghost objects. I J

©

©

©

184

4. EXAMPLES OF PERFORMANCE

Examples of performance on two stereo pairs are shownin the last figures. It must be pointed out that even withloose constraints on local features, the way from ambigu-ous to disambiguated stereo-data is not very long.

On Fig a & b, we show pairs of polygonal approxima-tions, then 3 stages of the algorithm are shown, and on apurpose of intelligibility, with two different 3-D views ofthe same scene for each step.

(aj, bO All the matches (two views).

(a2, b^) After within-object uniqueness (two views).

(a3, b3) After global uniqueness (two views).

The parameters are set to the following values :

MATCHING- length > 4 pix (for the Boxes 512x512 and 12 pix

for the House 256x256)- 0 ^ = 8 deg-K= 1.5, c = 4

VALIDATION- e = 6 mm, obsize > 3

5. CONCLUSION

As the algorithm is data-oriented, it can be implementedin parallel without having to be redesigned. The data canbe split so that it is shared and processed separately, pro-vided fast communication channels between processors areavailable.

Enforcing the uniqueness constraint rigorously allows thealgorithm to rely only loosely on image-to-image similar-ity criteria and therefore achieve a more robust matching.Note that there is no constraint on the length of the seg-ments.

Working with 3-D primitives allows us to exploit power-ful heuristics based directly upon 3-D constraints. It iswell known that horizontal lines present fundamentalambiguity to binocular stereo algorithms in general. How-ever, in the algorithm above it is possible by reasoningover the polygonal scene structure to hypothesise and ver-ify the 3-D position of horizontal lines.

Finally, a few weaknesses (or at least characteristics) ofour approach must be underlined :

• The value of the neighbouring threshold e is scaledon the scene (is it a drawback?).

• There is a limitation to scenes amenable to canoni-cal representation in terms of straight-line imageprimitives.

REFERENCES

N. Ayache "Construction et Fusion de RepresentationsVisuelles" These de Doctorat es Sciences, Univ. de Paris-Sud, Mai 1988.

NAyache and B. Faverjon "Fast Stereo Matching ofEdge Segments Using Prediction and Verification ofHypotheses" ENRIA (France) Short Paper 1985.

J. Canny "A computational approach to edge detection"IEEE Trans. Pattern Anal, and Machine Intell. VolPAMI-8 pp 679-698

R. Horaud, T. Skordas "Stereo Correspondence ThroughFeature Grouping and Maximal Cliques" LIFIA-IMAG,D-LETI (Grenoble) to appear in IEEE Trans. PatternAnal, and Machine Intell.

G. Medioni and R.Nevatia "Matching Images UsingLinear Features" IEEE Trans. Pattern Anal, and MachineIntell. Vol PAMI-6, No 6, November 1984.

G. Medioni and R.Nevatia "Segment-Based Stereo Mach-ing" Comp. Vision, Graph, and Image Proc. Vol 31, pp2-18 (1985)

S.B. Pollard "A stereo Correspondence Algorithm using adisparity gradient limit", Perception Vol 14, 1985, 449-470.

S B P o l l a r d , T P P r i d m o r e , J P o r r i l l , J E W M a y h e w ,and J P Frisby "Components of a Vision Stereo System"Proc. Fourth International Symposium of RoboticsResearch, MIT Press, pp 229-235.

S B P o l l a r d , T P P r i d m o r e , J B B o w e n , J E WMayhew, and J P Frisby "TINA : The Sheffield VisionSystem" Proc. Ninth International Joint Conference onArtificial Intelligence, Milan, pp 1138-1144.

Fig a

FigbSegment-based Stereo-pairs

185

Fig a!

All matches

All matches

p—.

• \ * • ' ' '

1 " ' "

Figa2

Within-object uniqueness

Figb2

Within-object uniqueness

Figa3

Final matches

Figb3

Final matches

186

stereo-matching of line-segments based on a 3 …...stereo-matching of line-segments based on a...

Documents