
Solving correspondence problem with 1D signal matching

Pengcheng Zhan, Dah-Jye Lee, and Randal Beard

Department of Electrical and Computer Engineering Brigham Young University, 459 CB

Provo, Utah 84602

ABSTRACT

Signal matching applies to many problems, such as shape matching, stereo vision, and image registration. With advances in hardware, 1D signal matching can be implemented in hardware to make fast processing more feasible. This is especially important for real-time 3D vision applications such as unmanned air vehicles and mobile robots. When the lighting variance is not significant, as in a controlled lighting environment, or when the baseline is short, images taken from two viewpoints are quite similar. The same holds for each scan line pair when attention is restricted to the 1D signal. By processing the 1D signals line by line, a dense disparity map can be obtained and the 3D scene can be reconstructed. In this paper, we present a robust 1D signal matching method that combines a spline representation with a genetic algorithm to obtain a dense disparity map. With the smoothness constraint imposed implicitly, the matching parameters can be solved in terms of their spline representations by minimizing a cost function. A genetic algorithm is then used to perform the optimization. Reconstruction results for three different scene settings are shown to demonstrate the validity of our algorithm. Because the problems are similar in nature, the algorithm can be easily extended to image registration and motion detection.

Keywords: Signal matching, stereo vision, correspondence problem, spline representation, genetic algorithm

1. INTRODUCTION

Stereo vision is one of the most important research areas in computer vision. It addresses the problem of reconstructing the depth information of a 3D scene from two or more images taken from different viewpoints [1]. A stereo system needs to solve two basic problems: the correspondence problem and 3D reconstruction. 3D reconstruction is straightforward if the geometry is known or the calibration is done properly. The major challenge is the correspondence problem, i.e., determining which pixel or feature in the first view corresponds to which pixel or feature in the second view, and thus inferring the disparity information.

A significant amount of work has been done on the correspondence problem, and most of it falls into two categories: feature-based matching [2] [3] and area-based matching [4] [5]. Feature-based stereo techniques use symbolic features derived from intensity images rather than the image intensities themselves, so feature extraction is required in a preprocessing stage. Area-based stereo techniques, in contrast, correlate the intensity pattern in the local neighborhood of a pixel in one image with the intensity pattern in a corresponding neighborhood of a pixel in the other image [6]. The two approaches also differ in the disparity maps they produce. Feature-based algorithms give a sparse disparity map containing only the feature points, so interpolation is required to obtain a dense map for better reconstruction, whereas area-based algorithms give a dense disparity map directly. However, area-based algorithms are usually sensitive to illumination variations, occlusion, and window size and shape [7].

With advances in hardware technology, more and more algorithms can be implemented in hardware to achieve high-speed processing. Our attention has therefore been drawn to 1D signal processing in stereo vision. If signal matching can be done on two 1D intensity profiles, parallel hardware processing can be used to speed up 3D reconstruction. We have noticed that the scan lines from two viewpoints have very similar intensity profiles, especially for a stereo vision system with a short baseline. Because stereo vision is an ill-posed problem, constraints should be enforced to obtain a satisfactory result without explicit regularization. In this paper, a spline-based representation of the disparity field is introduced to account implicitly for the smoothness constraint in stereo vision. As an optimization tool, the genetic algorithm has the useful ability to escape local extrema. With the help of the genetic algorithm and the spline representation, our algorithm gives a dense disparity map for each scan line, which is very desirable for 3D reconstruction and is also well suited to parallel processing. Most genetic-algorithm-based stereo vision techniques, such as [2] and [3], give a sparse disparity map. With our algorithm, however, no edge detection is required. In contrast to area-based algorithms, window size selection is not necessary.

Details of the algorithm are discussed in Section 2, whose three subsections address the spline representation, the genetic algorithm, and integration and implementation, respectively. Section 3 shows three sets of results for different scene settings. Conclusions and future work are given in Section 4.

2. ALGORITHMS

A general approach to solving the correspondence problem can be formulated as follows. Given a stereo pair $I_l$ and $I_r$, denoting the intensity profiles taken from the left and right viewpoints, in the ideal case and with a canonical camera configuration we have

$$I_l(x + d(x)) = I_r(x). \tag{1}$$

In equation (1), $d(x)$ is the disparity field for each pixel. 3D reconstruction can be performed easily if $d(x)$ can be estimated accurately. The most common way to obtain an optimal match is to minimize the error

$$\sum_x \big( I_l(x + d(x)) - I_r(x) \big)^2. \tag{2}$$

However, because stereo vision is ill-posed by nature, a few more constraints should be considered together with (2), as explained in the following sections.
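As a rough illustration of the error in (2), the following Python sketch evaluates the sum of squared differences for one scan line pair and a candidate disparity field. The function name `matching_error`, the use of linear interpolation for non-integer positions, and the decision to skip pixels whose match falls outside the left profile are assumptions made for this example; they are not specified in the paper.

```python
import numpy as np

def matching_error(left, right, disparity):
    """Sum over x of (I_l(x + d(x)) - I_r(x))^2, following equation (2)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    x = np.arange(right.size)
    warped_pos = x + np.asarray(disparity, dtype=float)
    # Assumption: pixels whose match lands outside the left profile are skipped.
    valid = (warped_pos >= 0) & (warped_pos <= left.size - 1)
    # Assumption: linear interpolation handles non-integer positions.
    warped = np.interp(warped_pos[valid], np.arange(left.size), left)
    return float(np.sum((warped - right[valid]) ** 2))
```

For a right profile that is an exact shifted copy of the left one, the error vanishes at the true disparity and is positive elsewhere, which is the behavior the minimization relies on.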

2.1. Spline representation for disparity field

In stereo vision, the disparity smoothness constraint states that the disparity changes slowly almost everywhere in the image between two points close to each other (see Section 2.3.1 for details). This constraint can be enforced implicitly by using splines to represent the disparity field, i.e., the disparity field can be expressed as 1D splines controlled by a smaller number of displacement estimates $\hat{d}_i$ lying on a coarser spline control grid (Figures 1 and 2) [8].

The small circles in Figure 1 represent the control points in the disparity field. Figure 2 shows that the control points may not fall exactly on pixel positions (the circles represent the control points, and the red crosses represent the pixel positions). The disparity field is given by

$$d(x) = \sum_i \hat{d}_i B_i(x), \tag{3}$$

where $B_i(x)$ are basis functions with finite support. Depending on the basis function used, different spline representations can be derived. The most commonly used bases are listed below.

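Figure 1. Spline Basis

Figure 2. Control Grid

$$
\begin{aligned}
&1.\ \text{Block:} && B(x) = 1, \quad x \in [0,1] \\[4pt]
&2.\ \text{Linear:} && B(x) = \begin{cases} 1 - x, & x \in [0,1] \\ 1 + x, & x \in [-1,0] \end{cases} \\[4pt]
&3.\ \text{B-spline:} && B(x) = \begin{cases}
b_0(x) = \dfrac{x^3}{6}, & x \in [0,1] \\[6pt]
b_1(x) = \dfrac{1 + 3x + 3x^2 - 3x^3}{6}, & x \in [1,2] \\[6pt]
b_2(x) = \dfrac{4 - 6x^2 + 3x^3}{6}, & x \in [2,3] \\[6pt]
b_3(x) = \dfrac{1 - 3x + 3x^2 - x^3}{6}, & x \in [3,4]
\end{cases}
\end{aligned}
\tag{4}
$$

In the B-spline case, each cubic piece $b_k$ is written in the local coordinate of its interval, i.e., $x$ on the right-hand side stands for the offset from the left end of that interval; with this reading the pieces join continuously.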

In our implementation, the basis functions are shifted versions of each other. In other words, they share the form

$$B_i(x) = B(x - \hat{x}_i). \tag{5}$$

We use the linear spline to implement our algorithm for three reasons: 1. easy implementation; 2. fast processing; 3. satisfactory accuracy for applications such as obstacle avoidance and distance estimation. For similar reasons, we impose the condition that the spline control grid is a regular sub-sampling of the pixel grid [8], i.e.,

$$\hat{x}_i = m x_i, \tag{6}$$

where $m$ is the sub-sampling factor.
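The linear-spline representation in (3), (5), and (6) can be sketched in Python as a weight matrix that maps control-point disparities to per-pixel disparities. This is a minimal sketch under stated assumptions: control point $i$ is placed at pixel $m \cdot i$, the argument of the linear basis is divided by $m$ so that neighboring basis functions overlap over one control interval, and the names `linear_basis` and `weight_matrix` are hypothetical.

```python
import numpy as np

def linear_basis(t):
    """Linear (hat) basis from (4): 1 - |t| on [-1, 1], zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def weight_matrix(n_pixels, m):
    """Return W with W[x, i] = B((x - m*i) / m), so that d = W @ d_hat.

    Assumption: control point i sits at pixel m*i, and the basis argument
    is scaled by the sub-sampling factor m.
    """
    n_ctrl = int(np.ceil((n_pixels - 1) / m)) + 1  # enough points to cover the line
    x = np.arange(n_pixels)[:, None]
    centers = m * np.arange(n_ctrl)[None, :]
    return linear_basis((x - centers) / m)

# Example: 9 pixels, control points at pixels 0, 4, and 8.
W = weight_matrix(9, 4)
d = W @ np.array([0.0, 4.0, 8.0])  # piecewise-linear interpolation of the control values
```

Because each row of `W` has at most two non-zero entries that sum to one, the per-pixel disparity is a linear blend of the two nearest control values, which is how the smoothness constraint enters implicitly.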

2.2. Genetic Algorithm

The genetic algorithm (GA) is an optimization tool that uses a stochastic global search modeled on natural biological evolution. Compared with traditional search methods, GAs differ in four main ways [9]: 1. GAs search in parallel; 2. GAs do not require derivative information or other auxiliary knowledge; 3. GAs use probabilistic transition rules, not deterministic ones; 4. GAs work on an encoding of the parameter set rather than the parameter set itself. GAs have great application potential in many scientific fields. Once the solution of a problem can be encoded in the form of chromosomes and the relative performance of each chromosome can be evaluated separately, a GA can be used to achieve satisfactory results.

GAs are iterative procedures that maintain a population of candidate solutions encoded as chromosome strings. After the first population is initialized, each candidate is evaluated and assigned a fitness value that is a function of the decoded bits contained in the candidate's chromosome [10]. Different alphabets can be used to form a chromosome, of which the binary alphabet {0, 1} is the most common. Ternary, integer, and even real-valued alphabets can also be used. Gray codes were introduced to overcome the hidden representation bias of binary encoding.

The most important processing in each generation consists of selecting candidates and reproducing chromosomes. In the selection procedure, the few fittest individuals are selected from all candidates in the population according to their fitness values. For reproduction, three genetic operators, i.e., crossover, mutation, and insertion, are applied with certain probabilities to fill the gene pool so that the GA can continue [9]. Consider the two parent binary strings P1 = 1 0 1 0 1 1 0 and P2 = 1 0 1 1 0 0 0. If an integer position i is selected uniformly at random in [1, k-1], where k is the string length, and the genetic information is exchanged between the individuals about this point, then two new offspring strings are produced. When the crossover point i = 5 is selected, the two offspring are O1 = 1 0 1 0 1 0 0 and O2 = 1 0 1 1 0 1 0. Another operator, called mutation, alters a single bit in the chromosome to change its state. For example, mutating the second bit of O1 produces a new string, OF1 = 1 1 1 0 1 0 0 [9]. With the help of the mutation operator, the GA is able to escape local extrema and thus approach a global optimum. However, crossover and mutation do not happen all the time; they are carried out according to probability rules. When the gene pool is not full after crossover and mutation, the insertion operator is used to fill it up. An insertion simply generates a random chromosome and puts it into the gene pool.
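The crossover and mutation example above can be reproduced with a few lines of Python. The helper names (`crossover`, `mutate`, `binary_to_gray`, `gray_to_binary`) are hypothetical and chosen for this illustration only; the Gray code conversion shown is the standard reflected binary code, which is assumed here to be the variant meant in the text.

```python
def crossover(p1, p2, i):
    """Single-point crossover: exchange the tails of two bit lists after position i."""
    return p1[:i] + p2[i:], p2[:i] + p1[i:]

def mutate(bits, position):
    """Flip one bit; position is 1-based to match the wording in the text."""
    out = list(bits)
    out[position - 1] ^= 1
    return out

def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

P1 = [1, 0, 1, 0, 1, 1, 0]
P2 = [1, 0, 1, 1, 0, 0, 0]
O1, O2 = crossover(P1, P2, 5)  # O1 = 1 0 1 0 1 0 0, O2 = 1 0 1 1 0 1 0
OF1 = mutate(O1, 2)            # OF1 = 1 1 1 0 1 0 0
```

With Gray coding, consecutive integers differ in exactly one bit, so a single mutation can move a decoded value to its neighbor, which plain binary coding does not guarantee.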

2.3. Spline-based GA matching for 1D profile pairs

Based on the tools described in the previous two sections, we present a new technique for 3D reconstruction using 1D intensity profile matching. Our algorithm does not require edge detection or any other explicit feature detection as preprocessing. A dense disparity map is generated for each pixel in the overlapping area of the image pair. The algorithm differs from most feature-based GA stereo vision algorithms in that it has a built-in smoothness constraint, and it therefore gives satisfactory results, as shown in Section 3. Compared with other work that uses a GA with feature-based matching followed by some form of interpolation (possibly with splines), our method integrates the spline representation into every generation and takes every available position in the image into account, and therefore yields a better dense disparity map.

We do assume, however, that the lighting variance between the two images is not significant. This limitation can be addressed by resorting to a local frequency representation of the images, or by requiring a short baseline or a controlled lighting environment. The properties of local frequency make it a candidate for an invariant image representation when matching two scenes with different lighting setups: (1) it is relatively invariant to signal energy; (2) local phase estimates and spatial position are equivariant; (3) the spatial derivative of local phase estimates is equivariant with spatial frequency [11]. In this paper, we assume that the intensity profile pair shares enough similarity for matching. Details of the algorithm are discussed in the following subsections.

2.3.1. Stereo constraints and cost function

In the ideal situation mentioned above, the disparity map can be derived by minimizing equation (2). However, because of the noise introduced during imaging, taking (2) as the only term in the cost function is far from realistic. Moreover, because stereo vision is an ill-posed problem, more constraints should be considered when setting up the cost function. For real applications, the following constraints should also be enforced to reduce the search space [12]:

1. Epipolar Constraint: The corresponding points can only lie on the epipolar line in the second image.
2. Uniqueness Constraint: A pixel in one image can correspond to only one pixel in the other image, except in the case of self-occlusion.
3. Ordering Constraint: For surfaces of similar depth, corresponding feature points typically lie in the same order on the epipolar line.
4. Photometric Compatibility Constraint: The intensities of a point in the first and second images are likely to differ only a little.
5. Geometric Similarity Constraint: Geometric characteristics of the features, such as line length or orientation, found in the first and second images do not differ much.
6. Disparity Smoothness Constraint: The disparity changes slowly almost everywhere in the image between two points close to each other.
7. Figural Disparity Constraint: Corresponding points should lie on an edge element in both the right and left images.
8. Feature Compatibility Constraint: Points can match only if they have the same physical origin (object surface discontinuity or border of a shadow cast by some object).
9. Disparity Limit Constraint: The disparity must be smaller than some limit.

Constraint 1 is satisfied automatically, because we are dealing with the canonical camera configuration, in which the epipolar line coincides with the scan line. Constraints 2 and 3 can be integrated into the cost function by adding two more terms, as shown in equation (7). Constraint 4 is the foundation of matching and corresponds to minimizing equation (2). Constraints 5, 7, and 8 can be used to check the validity of the result. Constraint 6 is taken care of by the spline representation. Finally, Constraint 9 is considered when we create the chromosomes for each generation. Several cost functions appear in the literature; for example, [2] gives a cost function integrating the intensity error, the ordering constraint, and the uniqueness constraint. Here we present a different expression for the cost function:

$$
\begin{aligned}
f(d(x)) &= f\Big(\sum_i \hat{d}_i B_i(x)\Big) \\
&= w_1 \cdot \sum_x \Big( I_l\big(x + \sum_i \hat{d}_i B_i(x)\big) - I_r(x) \Big)^2 + w_2 \cdot \sum_x O_x + w_3 \cdot \sum_y \big( C_y(Y) - K(C_y(Y)) \big).
\end{aligned}
\tag{7}
$$

In equation (7), $i$ is the index of the control points and $x$ is a pixel position in the left image that has a possible match. $O_x$ is the ordering constraint indicator: $O_x = 1$ if the ordering constraint is violated, and $O_x = 0$ otherwise. $Y$ is a vector of positions, each element of which is given by

$$Y(j) = x + d(x) = x + \sum_i \hat{d}_i B_i(x). \tag{8}$$

The summation variable $y$ satisfies

$$y \in \mathbb{N}, \quad y \in [\min(Y), \max(Y)]. \tag{9}$$

$C_y(Y)$ is a counting function: it takes the vector $Y$ as input and returns the number of elements of $Y$ with a value equal to $y$. $K(x)$ is an indicator function of the form

$$K(x) = \begin{cases} 1, & x \ge 1 \\ 0, & x = 0. \end{cases} \tag{10}$$

The weights $w_i$ $(i = 1, 2, 3)$ apply to the intensity error, the ordering constraint, and the uniqueness constraint, respectively. We can also weight the intensity error from various parts of the intensity profile differently to get an even better match within certain local areas. For example, we can weight the error more heavily where the intensity is at a local maximum or minimum; all that is needed is to add some terms to the cost function.
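The three terms of the cost function (intensity error, ordering, and uniqueness) can be sketched in Python as follows. Several details are assumptions made for this illustration and are not taken from the paper: disparities are rounded to integers so that $Y$ holds pixel positions, pixels whose match falls outside the other profile are skipped, an ordering violation is counted whenever the matched position decreases from one pixel to the next, and the name `cost` and its default weights are hypothetical.

```python
import numpy as np

def cost(left, right, disparity, w=(1.0, 1.0, 1.0)):
    """Weighted sum of intensity error, ordering violations, and duplicate matches."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    x = np.arange(right.size)
    # Assumption: round to integers so that Y(j) = x + d(x) is a pixel position.
    Y = x + np.rint(disparity).astype(int)
    # Assumption: matches that fall outside the left profile are ignored.
    valid = (Y >= 0) & (Y < left.size)
    Yv = Y[valid]
    intensity = np.sum((left[Yv] - right[valid]) ** 2)
    # Assumption: O_x = 1 when the matched position moves backwards.
    ordering = np.sum(np.diff(Yv) < 0)
    # C_y(Y) - K(C_y(Y)): number of extra pixels mapped to the same position y.
    counts = np.bincount(Yv)
    uniqueness = np.sum(counts[counts >= 1] - 1)
    return float(w[0] * intensity + w[1] * ordering + w[2] * uniqueness)
```

Positions $y$ that receive no match contribute zero to the uniqueness term, so counting over all non-negative positions up to $\max(Y)$ gives the same value as the range in (9).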

2.3.2. Genetic Algorithm implementation

From the cost function above, we can see that once the cost is minimized with respect to all the $\hat{d}_i$'s, $d(x)$ can be calculated. As an optimization tool, the GA is used to achieve this goal. To get the GA running, we need to encode the solution we are pursuing in chromosome form. As mentioned before, there are many options for the encoding method: binary, Gray codes, integer, or even real-valued alphabets. In our implementation, we use Gray codes to avoid the bias that binary coding could incur. Each $\hat{d}_i$ is represented by a fixed-length binary string with Gray encoding. The probabilities of crossover and mutation need to be selected in advance for the algorithm to work.

The flowchart of the algorithm is shown in Figure 3. Before the GA starts, we express the disparity representation in matrix product form,

$$d(x) = \sum_i \hat{d}_i B_i(x) \;\rightarrow\; \vec{d} = W \cdot \vec{\hat{d}},$$

and pre-calculate the weight matrix $W$.

Figure 3. Flow chart of algorithm

In each generation of the GA, we generate the chromosomes (in Gray codes) to fill the gene pool. After that, we decode the chromosomes and calculate the disparity for each pixel using the pre-calculated weight matrix. A fitness value is assigned to each chromosome by evaluating the cost function. Selection then picks the few fittest chromosomes to carry into the next generation. Genetic operators are applied according to probability rules to fill the gene pool, and a new generation starts. The GA runs until a certain number of generations has been completed or the error is smaller than a threshold. After the last generation finishes, the fittest chromosome is selected as the best solution.
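The loop described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the population size, number of kept individuals, bits per control point, probabilities, and stopping tolerance are arbitrary; the cost is reduced to the intensity term only; out-of-range positions are clipped; insertion is simplified to replacing a child with a random chromosome when crossover is not applied; and all function names are hypothetical.

```python
import numpy as np

def weight_matrix(n_pixels, m):
    """Linear-spline weights so that d = W @ d_hat (control points every m pixels)."""
    n_ctrl = int(np.ceil((n_pixels - 1) / m)) + 1
    x = np.arange(n_pixels)[:, None]
    centers = m * np.arange(n_ctrl)[None, :]
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / m)

def gray_decode(bits):
    """Decode Gray-coded bits (most significant first) to an integer."""
    value, acc = 0, 0
    for b in bits:
        acc ^= int(b)                # running XOR recovers the binary bit
        value = (value << 1) | acc
    return value

def decode(chrom, n_ctrl, n_bits, d_min):
    """Chromosome -> control disparities; the bit width bounds the disparity range."""
    genes = chrom.reshape(n_ctrl, n_bits)
    return np.array([d_min + gray_decode(g) for g in genes], dtype=float)

def ssd_cost(left, right, d):
    """Intensity term only (assumption); positions are clipped to the left profile."""
    pos = np.clip(np.arange(right.size) + d, 0, left.size - 1)
    return float(np.sum((np.interp(pos, np.arange(left.size), left) - right) ** 2))

def run_ga(left, right, W, n_bits=3, d_min=0, pop=30, keep=6,
           p_cross=0.7, p_mut=0.02, generations=40, tol=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n_ctrl = W.shape[1]
    length = n_ctrl * n_bits

    def evaluate(pool):
        return np.array([ssd_cost(left, right, W @ decode(c, n_ctrl, n_bits, d_min))
                         for c in pool])

    pool = rng.integers(0, 2, size=(pop, length))     # random initial gene pool
    history = []
    for _ in range(generations):
        costs = evaluate(pool)
        history.append(float(costs.min()))
        if costs.min() <= tol:                        # error below threshold
            break
        elite = pool[np.argsort(costs)[:keep]]        # selection of the fittest few
        children = [e.copy() for e in elite]          # elite survive unchanged
        while len(children) < pop:
            if rng.random() < p_cross:                # crossover followed by mutation
                a = elite[rng.integers(keep)]
                b = elite[rng.integers(keep)]
                cut = rng.integers(1, length)         # cut point in [1, k-1]
                child = np.concatenate([a[:cut], b[cut:]])
                flip = rng.random(length) < p_mut
                child = np.where(flip, 1 - child, child)
            else:                                     # insertion of a random chromosome
                child = rng.integers(0, 2, size=length)
            children.append(child)
        pool = np.array(children)
    costs = evaluate(pool)
    best = pool[int(np.argmin(costs))]
    return W @ decode(best, n_ctrl, n_bits, d_min), float(costs.min()), history
```

Because the elite are copied unchanged into the next gene pool, the best cost never increases from one generation to the next. Each scan line is processed independently, which is what makes the scheme attractive for parallel execution.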

3. DATA AND RESULTS

Three different scene settings were chosen, and the results of applying our algorithm are presented in this section to show that the algorithm works well in reconstructing 3D scenes and that the correspondence problem can be solved by 1D signal matching.

3.1. Scene settings and original data

The first scene (scene A), shown in Figure 4, is a book placed at a tilt in front of a pair of cameras with the canonical configuration and a short baseline. The book is set up so that its left end is closer to the cameras and its right end is farther away. This creates a flat but tilted surface in front of the cameras. In this case, there are no occlusions. The smoothness built into the spline representation can be appreciated in the result.

The second scene (scene B), shown in Figure 5, has a few books placed side by side, with the second book a little farther from the cameras than the others. In this scene, there are two large discontinuities at the points where the second book joins its neighbors. There are also occlusions at the same locations. This is a good test of how well our algorithm preserves discontinuities while maintaining smoothness, and the effect of occlusion is taken into account in this case.

The last scene (scene C), shown in Figure 6, has a duster can in front of a book. In this setting, our goal is to reconstruct a smooth, arc-shaped front surface. The ideal result should have two sharp discontinuities with a smooth arc surface between them. It is another good test of whether our algorithm can produce a smooth result while still keeping the discontinuities at the strong edges.

Figure 4. A stereo pair for a tilted network book (left view and right view of scene A)

Figure 5. Four books placed together with the second book a little bit farther (scene B)

3.2. 1D profiles matching result

Our algorithm processes the images scan line by scan line. In this section, we show the disparity fields obtained with 1D signal matching. Figure 7 shows an intensity profile pair from scene A. Repetitions of the same intensity pattern can be seen in the profile pair.

Figure 6. Duster can in front of a book (scene C)

Figure 7. Intensity profile pair of the 80th scan line in scene A (left view and right view)

Setting aside fine detail, the two profiles are similar in shape, and our algorithm is able to obtain a good disparity estimate. In Figure 8, the matching result is shown as a disparity field, which exhibits a relatively smooth transition (our result gives the disparity for the pixels from position $x = 66$ to the right end of the signal) and agrees well with the scene setting. The left end of the book is closer to the camera and thus has a larger disparity (in absolute value), while the right end shows the opposite. Figure 9 further supports the correctness of the algorithm: we apply the disparity field to the left signal and superimpose the result on the right signal. The two agree very well, although the match is slightly off near the right end of the signal. This flaw can be handled by weighting the intensity error at the right end more heavily, as mentioned in Section 2.3.1. In our algorithm, the smoothness and ordering constraints play an important role in eliminating false matches among the repetitions of the same pattern.

Figure 8. Disparity field for the 80th scan line

Figure 9. Signals aligned according to the disparity

Figure 10 shows the intensity profile pair of the 150th scan line from scene B. Figure 11 is the disparity calculated using our algorithm. The disparity map shows the relative distance of each book from the camera. The result gives the disparity for the pixels from $x = 61$ to the right end of the signal. Two depth layers are present in this scene, one with a disparity of around -45 and the other around -56. There are also a few corners on the disparity map, i.e., corners a, b, c, and d. Corners a, c, and d are caused by the coves formed where two books join each other. The presence of corner c is the result of occlusion and noise. Generally speaking, the result agrees quite well with the scene setting.

Figure 10. The 150th intensity profile pair for scene B (left view and right view)

Similar results for scene C are illustrated in Figures 13, 14, and 15. The intensity profile pair shown is at the 367th scan line. The disparity result gives the values for the pixels from $x = 81$ to the right end of the signal. The large arc in Figure 14 closely resembles the real scene, which indicates that our algorithm estimates the disparity field successfully. It maintains the discontinuities and keeps the smoothness property. Figure 15 shows that the disparity field calculated by our algorithm gives a good match between the intensity profile pair.

3.3. 3D scene reconstruction

After the disparity map is calculated by our algorithm, depth extraction can be performed. In the canonical camera configuration, we have

$$Z = \frac{fT}{d}. \tag{11}$$

In (11), $f$ is the focal length, $T$ is the baseline, and $d$ is the disparity of each pixel, a few examples of which were shown in Section 3.2. With equation (11), we can calculate the depth of each pixel from its disparity. The reconstruction results (up to a scale) for each scene are shown below.
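Equation (11) translates directly into a short Python function. This is a minimal sketch: the function name is hypothetical, zero disparity is mapped to infinite depth, and the magnitude of the disparity is used because the disparities reported in Section 3.2 are negative; both choices are assumptions of this example.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline):
    """Z = f * T / d for each pixel, following equation (11)."""
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)   # assumption: zero disparity means a point at infinity
    nonzero = d != 0
    # Assumption: use |d|, since the sign depends on the disparity convention.
    depth[nonzero] = focal_length * baseline / np.abs(d[nonzero])
    return depth

# With f = T = 1 the result is depth up to a scale, as in the reconstructions shown.
layers = depth_from_disparity([-45.0, -56.0], 1.0, 1.0)
```

The layer with the larger disparity magnitude comes out closer to the cameras, which matches the interpretation of the two depth layers in scene B.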

Figure 13. The 367th intensity profile pair for scene c. (left view and right view)


Figures 16-18 show the reconstruction of part of each scene (slices of the image). These surfaces closely resemble the real scenes. There is noise in the reconstruction results; however, post-processing techniques can be used to obtain smoother results. Furthermore, if the correlation between neighboring scan lines is introduced, much more can be done to improve the reconstruction accuracy.

4. CONCLUSION AND FUTURE WORK

In this paper, we present a robust method for solving the correspondence problem in stereo vision. It can be easily extended to image registration, motion detection, and so on. The results support the validity of the algorithm. The main motivation of this work is to draw attention to the study of 1D signal matching with applications to stereo vision, in order to achieve fast processing with hardware implementation. Though the results seem very promising, more work can be done to improve the performance:

1. Local frequency representation can be studied to see how well it handles signal energy variations and alleviates the effect of illumination variations;
2. The alphabet used in the GA chromosomes can be changed to an integer-based one, because the disparity can only take integer values;
3. The cost function can be studied further, and more constraints need to be considered;
4. A pyramid version of our algorithm can be tried to speed up the processing.

Figure 16. Surface for scene A Figure 17. Surface for scene B

Figure 18. Surface for scene C

In short, solving the correspondence problem, especially in stereo vision, by 1D signal matching is a feasible approach and worth investigating. A good, quick, robust algorithm for the 1D correspondence problem will lend itself more readily to hardware implementation and thus benefit real-time stereo vision applications such as unmanned air and ground vehicles as well as mobile robots.

REFERENCES

1. E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, p. 140, Prentice Hall, NJ, 1998
2. Y. Ruichek, H. Issa, and J. Postaire, "Genetic approach for obstacle detection using linear stereo vision", 2000 Proceedings of IEEE Intelligent Vehicles Symposium, pp. 261-266, MI, 2000
3. S. Woo and A. Dipanda, "Matching lines and points in an active stereo vision system using genetic algorithms", 2000 Proceedings of International Conference on Image Processing, vol. 3, pp. 332-335, Canada, 2000
4. J. Vlontzos and D. Geiger, "A MRF approach to optical flow estimation", 1992 Proceedings of CVPR '92, pp. 853-856, IL, 1992
5. J. Lotti and G. Giraudon, "Adaptive window algorithm for aerial image stereo", Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 1, pp. 701-703, Israel, 1994
6. U. Dhond and J. Aggarwal, "Structure from stereo-a review", IEEE Transactions on Systems, Man and Cybernetics, vol. 19, issue 6, pp. 1489-1510, 1989
7. J. da Silva, P. Simoni, and K. Bharadwaj, "Multiple correspondence in stereo vision under a genetic algorithm approach", 2000 Proceedings XIII Brazilian Symposium on Computer Graphics and Image Processing, pp. 52-59, Brazil, 2000
8. R. Szeliski and J. Coughlan, "Hierarchical spline-based image registration", 1994 Proceedings CVPR '94, pp. 194-201, WA, 1994
9. Univ. of Sheffield, "The MATLAB Genetic Algorithm Toolbox v1.2 User's Guide", http://www.shef.ac.uk/~gaipp/ga-toolbox/
10. P. Chalermwat and T. El-Ghazawi, "Multi-resolution image registration using genetics", 1999 Proceedings of International Conference on Image Processing, vol. 2, pp. 452-456, Japan, 1999
11. J. Liu, B. Vemuri, and F. Bova, "Multimodal image registration using local frequency", Fifth IEEE Workshop on Applications of Computer Vision, pp. 120-125, CA, 2000
12. M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision, pp. 477-479, PWS Publishing, 1999
