Robust Computer Vision
Midterm
Fundamental Matrix estimation with the
Levenberg-Marquardt algorithm
Eric Wengrowski
March 31, 2016
Inputs and Expected Outputs
Our goal is to estimate the Fundamental Matrix using only 2 stereo images with noisy matches. The stereo images are displayed below. The fundamental matrix encodes the epipolar geometry of the stereo pair: it maps a pixel in one image to its corresponding epipolar line in the other.
left = rgb2gray(imread('midterm_left.bmp'));
right = rgb2gray(imread('midterm_right.bmp'));
Stereo Matching with SURF Features
The assignment called for locally maximum Harris corners as interest points, but for simplicity and convenience, SURF points were used instead. SURF points are computed by approximating the Laplacian of Gaussian with a box filter. For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions over a neighborhood of size 6s, where s is the scale of the detected point. The dominant orientation is estimated by summing all responses within a sliding orientation window of angle 60 degrees. For feature description, SURF again uses Haar wavelet responses in the horizontal and vertical directions. A neighborhood of size 20s x 20s is taken around the keypoint and divided into 4x4 subregions. The horizontal and vertical wavelet responses of the subregions compose a 64-dimensional feature vector whose per-subregion entries have the form:
v = (∑ dx, ∑ dy, ∑ |dx|, ∑ |dy|)
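As an illustration (a Python/NumPy sketch, not part of the original MATLAB pipeline), the subregion accumulation above can be written out directly; the 20x20 grids of Haar responses `dx` and `dy` here are random stand-ins for the real wavelet outputs:

```python
import numpy as np

def surf_descriptor(dx, dy):
    """Build a 64-D SURF-style descriptor from 20x20 grids of Haar
    wavelet responses (dx, dy), one response per sample point.
    The grid is split into 4x4 subregions of 5x5 samples; each
    subregion contributes (sum dx, sum dy, sum|dx|, sum|dy|)."""
    v = []
    for i in range(4):
        for j in range(4):
            sx = dx[5*i:5*(i+1), 5*j:5*(j+1)]
            sy = dy[5*i:5*(i+1), 5*j:5*(j+1)]
            v.extend([sx.sum(), sy.sum(),
                      np.abs(sx).sum(), np.abs(sy).sum()])
    return np.array(v)

# Random responses standing in for real Haar wavelet outputs
rng = np.random.default_rng(0)
dx = rng.standard_normal((20, 20))
dy = rng.standard_normal((20, 20))
v = surf_descriptor(dx, dy)
print(v.shape)  # (64,)
```

The 16 subregions times 4 statistics each give the 64 dimensions quoted above.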
% SURF
% Find SURF features
left_points = detectSURFFeatures(left);
right_points = detectSURFFeatures(right);
% Extract the features
[f_left,vpts_left] = extractFeatures(left,left_points);
[f_right,vpts_right] = extractFeatures(right,right_points);
% Visualize 100 strongest SURF features, including their scales and orientation
% which were determined during the descriptor extraction process.
figure
title('100 Strongest SURF points in each image')
subplot(1,2,1)
imshow(left); hold on;
strongestPoints = left_points.selectStrongest(100);
strongestPoints.plot('showOrientation',true);
hold off
subplot(1,2,2)
imshow(right); hold on;
strongestPoints = right_points.selectStrongest(100);
Figure 1: Left and right stereo images of the same scene given as input.
strongestPoints.plot('showOrientation',true);
hold off
Figure 2: Explanation of the SURF feature vector, taken from OpenCV's documentation: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.html
Figure 3: 100 Strongest SURF points detected in each image.
% Retrieve the locations of matched points.
indexPairs = matchFeatures(f_left,f_right);
matchedPoints_left = vpts_left(indexPairs(:,1));
matchedPoints_right = vpts_right(indexPairs(:,2));
% Display the matching points.
% The data still includes several outliers.
figure(1), showMatchedFeatures(left,right,matchedPoints_left,matchedPoints_right);
legend('matched points left','matched points right');
% Matched Points in each image
figure, title('Matched SURF points in each image')
subplot(1,2,1)
imshow(left); hold on;
strongestPoints = matchedPoints_left;
strongestPoints.plot('showOrientation',false);
hold off
subplot(1,2,2)
imshow(right); hold on;
strongestPoints = matchedPoints_right;
strongestPoints.plot('showOrientation',false);
hold off
RANSAC
% RANSAC
[~,inliersIndex] = estimateFundamentalMatrix(matchedPoints_left,matchedPoints_right,...
'Method','RANSAC','NumTrials',10000,'DistanceThreshold',10,...
'DistanceType','Sampson','Confidence',99);
numInliers = sum(inliersIndex);
% Display the matching points after RANSAC
figure(2), showMatchedFeatures(left,right,matchedPoints_left(inliersIndex),...
matchedPoints_right(inliersIndex));
legend(’matched points left inliers’,’matched points right inliers’);
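The hypothesize-and-verify loop inside estimateFundamentalMatrix can be sketched as follows. This is a minimal Python/NumPy sketch, not MATLAB's actual implementation: each trial fits F to a random 8-point sample with the normalized 8-point algorithm and scores it by Sampson distance, keeping the largest consensus set. The synthetic data, threshold, and helper names are illustrative assumptions:

```python
import numpy as np

def eight_point(x1, x2):
    """Normalized 8-point estimate of F from Nx2 matches (N >= 8)."""
    def normalize(pts):
        c = pts.mean(0)
        s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
        T = np.array([[s, 0, -s*c[0]], [0, s, -s*c[1]], [0, 0, 1]])
        return np.c_[pts, np.ones(len(pts))] @ T.T, T
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each match gives one row of the constraint x2^T F x1 = 0
    A = np.column_stack([p2[:, i] * p1[:, j] for i in range(3) for j in range(3)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0                                   # enforce rank 2
    F = U @ np.diag(S) @ Vt
    return T2.T @ F @ T1                       # undo normalization

def sampson(F, x1, x2):
    """First-order geometric (Sampson) distance per match."""
    p1 = np.c_[x1, np.ones(len(x1))]
    p2 = np.c_[x2, np.ones(len(x2))]
    Fx1 = p1 @ F.T
    Ftx2 = p2 @ F
    num = np.sum(p2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def ransac_F(x1, x2, trials, thresh, rng):
    best_inliers = np.zeros(len(x1), bool)
    for _ in range(trials):
        idx = rng.choice(len(x1), 8, replace=False)
        F = eight_point(x1[idx], x2[idx])
        inliers = sampson(F, x1, x2) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Synthetic check: 50 matches consistent with one geometry, 10 gross outliers
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (60, 3)) + [0, 0, 5]    # 3D points in front of camera
M = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
m = np.array([1.0, 0.2, 0.1])
x1 = X[:, :2] / X[:, 2:]                       # P  = [I | 0]
Xc2 = X @ M.T + m
x2 = Xc2[:, :2] / Xc2[:, 2:]                   # P' = [M | m]
x2[:10] += rng.uniform(5, 10, (10, 2))         # 10 gross outliers
inliers = ransac_F(x1, x2, trials=200, thresh=1e-4, rng=rng)
```

The threshold is in (normalized) image units, playing the role of the 10-pixel 'DistanceThreshold' passed to the MATLAB call above.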
Despite the use of RANSAC, the matched SURF points are still not perfect. Therefore, it would not be suitable to use a subset of matches with the 8-point algorithm for Fundamental matrix estimation. Instead, we want to use a robust estimator to find the fundamental matrix. We employ the Levenberg-Marquardt algorithm, which considers all points and finds the Fundamental matrix F that locally minimizes the reprojection error.
F estimation with Levenberg Marquardt
Levenberg-Marquardt is a heuristic blend of gradient descent and Gauss-Newton iteration. These are all in the class of iterative optimization algorithms that search for local extrema. Levenberg-Marquardt works by dynamically changing the damping parameter λ, which controls the effective step size:
1. Do an update.
2. Evaluate the error at the new parameter vector.
Figure 4: Composite view of the stereo images and point matches before RANSAC. Despite having only 25-30 interest points remaining after matching, there are still significant outliers.
3. If the error has increased as a result of the update, then retract the step (i.e. reset the parameters to their previous values) and increase λ by a factor of 10 or some other significant factor. Then go to (1) and try an update again.
4. If the error has decreased as a result of the update, then accept the step (i.e. keep the parameters at their new values) and decrease λ by a factor of 10 or so.
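The λ-adaptation rule above can be demonstrated on a toy curve fit. This is a minimal Python sketch of the classic multiply-or-divide-by-10 schedule, not the solver used later in the report; the exponential model and helper names are illustrative assumptions:

```python
import numpy as np

def levenberg_marquardt(residual, jac, p0, iters=50, lam=1e-3):
    """Minimal LM loop following the accept/reject rule above:
    grow lambda by 10 on a failed step, shrink by 10 on success."""
    p = np.asarray(p0, float)
    err = np.sum(residual(p) ** 2)
    for _ in range(iters):
        J = jac(p)
        r = residual(p)
        A = J.T @ J + lam * np.eye(len(p))      # damped normal equations
        step = np.linalg.solve(A, -J.T @ r)
        new_err = np.sum(residual(p + step) ** 2)
        if new_err < err:                       # accept: keep step, relax damping
            p, err, lam = p + step, new_err, lam / 10
        else:                                   # reject: retract step, add damping
            lam *= 10
    return p

# Toy problem: fit y = a*exp(b*x) to noiseless data with a=2, b=0.5
x = np.linspace(0, 1, 20)
y = 2.0 * np.exp(0.5 * x)
residual = lambda p: p[0] * np.exp(p[1] * x) - y
jac = lambda p: np.column_stack([np.exp(p[1] * x),
                                 p[0] * x * np.exp(p[1] * x)])
p = levenberg_marquardt(residual, jac, [1.0, 0.0])
print(p)  # approximately [2.0, 0.5]
```

With small λ the step approaches Gauss-Newton; with large λ it approaches a short gradient-descent step, which is exactly the blend described above.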
Levenberg-Marquardt is particularly popular in the computer vision community because it is relatively robust and computationally efficient for sparse estimation problems with relatively few data points, such as this stereo matching example.
% Levenberg-Marquardt algorithm
Figure 5: The 25-30 matched SURF points are shown on each image individually, but significant outliers still remain. Notice the top-leftmost SURF point in the left image. We see no reasonable match in the right image.
% Solved externally using fundest
fileID = fopen(['C:\Users\Eric\Development\matlab_sandbox\' ...
    'Classes\RobustComputerVision\matches.txt'],'w');
inlierIdx = find(inliersIndex);
for i = inlierIdx'
    fprintf(fileID, '%e \t %e \t %e \t %e \n', [matchedPoints_left.Location(i,1) ...
        matchedPoints_left.Location(i,2) matchedPoints_right.Location(i,1) ...
        matchedPoints_right.Location(i,2)]);
end
fclose(fileID);
In the notation of Hartley and Zisserman, the 3D coordinates are of the form:
X = (X_1^T, ..., X_n^T)^T
and each point measured in 2D (each of the camera images) is:
X_i = (x_i^T, x_i'^T)^T
We take the covariance of the 2D image noise as diagonal with σ_x = σ_y = 2 pixels.
The 3D-to-2D camera matrices are denoted P and P' for the left and right cameras. For simplicity's sake, we set
P = [I_3 | 0]
and
P' = [M | m]
so we are only solving for one transformation. But because we have more than 8 noisy point matches, the system is overdetermined and no P' satisfies all of them exactly. So, we employ the Levenberg-Marquardt algorithm to solve for P' such that we minimize the reprojection errors of each of our 3D point estimates back into the 2D image planes.
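The cost being minimized can be sketched as follows (a Python/NumPy sketch with hypothetical names, using inhomogeneous 3D points for simplicity): stack, for every point, the 2D residuals of the estimated 3D point projected through P = [I|0] and P' = [M|m].

```python
import numpy as np

def reprojection_residuals(M, m, X, x1, x2):
    """Residuals of projecting 3D points X (Nx3) through P = [I|0]
    and P' = [M|m], against measured 2D points x1, x2 (Nx2 each)."""
    proj1 = X[:, :2] / X[:, 2:]                 # project through P  = [I | 0]
    Xc2 = X @ M.T + m                           # transform into camera 2
    proj2 = Xc2[:, :2] / Xc2[:, 2:]             # project through P' = [M | m]
    return np.concatenate([(proj1 - x1).ravel(), (proj2 - x2).ravel()])

# With exact geometry and exact measurements the residual vector vanishes
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (5, 3)) + [0, 0, 4]      # points in front of both cameras
M = np.eye(3)
m = np.array([0.5, 0.0, 0.0])                   # pure horizontal baseline
x1 = X[:, :2] / X[:, 2:]
Xc2 = X @ M.T + m
x2 = Xc2[:, :2] / Xc2[:, 2:]
r = reprojection_residuals(M, m, X, x1, x2)
print(np.abs(r).max())  # ~0
```

LM minimizes the squared norm of this residual vector over both the camera parameters and the 3D points.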
Figure 6: After RANSAC, the most significant outliers are pruned out, and we have 15-16 relatively strong matches remaining. But notice that there is still not a perfect consensus for the geometric transform of the image points. In particular, notice the points on the rightmost column in the (blue) right image. The mapping between these points is not totally perfect with respect to the rightmost column in the (red) left image.
The LM algorithm solves for a parameter vector P where

P = (a^T, b_1^T, ..., b_n^T)^T

Here a = p', the 12 entries of P'. Each b_i = (X_i, Y_i, T_i)^T is a 3-vector parametrizing the ith 3D point (X_i, Y_i, 1, T_i). So in total, there are 3n + 12 parameters (3 for each matching interest point and 12 for the transformation between the 2 cameras).
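A quick sanity check on the 3n + 12 count, sketched in Python (the packing order is an illustrative assumption):

```python
import numpy as np

def pack_params(Mm, points3d):
    """Stack a = vec(P') (12 values) followed by b_i = (X_i, Y_i, T_i)
    for each of the n points, giving the LM parameter vector P."""
    return np.concatenate([np.asarray(Mm).ravel(),
                           np.asarray(points3d).ravel()])

n = 17                                  # matched interest points, as in Figure 7
Mm = np.zeros((3, 4))                   # P' = [M | m]
points3d = np.zeros((n, 3))             # (X_i, Y_i, T_i); third homogeneous entry fixed at 1
P = pack_params(Mm, points3d)
print(P.size)  # 63 = 3*17 + 12
```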
Once we use the LM algorithm to solve for P ′ = [M |m], we have M and m.
Figure 7: Robust Fundamental Matrix estimates using the Levenberg-Marquardt algorithm and 17 matching interest points. We see here that the computational cost is negligible for relatively few interest points. The Levenberg-Marquardt estimation was performed using the C++ libraries levmar http://users.ics.forth.gr/~lourakis/levmar/ and fundest http://users.ics.forth.gr/~lourakis/fundest/.
We can now solve for the Fundamental Matrix F:
F = [m]_x M
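A small NumPy sketch of this construction (random M and m for illustration): the cross-product matrix [m]_x makes F = [m]_x M rank 2 with m as its left null vector, and any point projected through P = [I|0] and P' = [M|m] satisfies the epipolar constraint.

```python
import numpy as np

def skew(m):
    """Cross-product matrix [m]_x, so that skew(m) @ v == np.cross(m, v)."""
    return np.array([[0.0, -m[2], m[1]],
                     [m[2], 0.0, -m[0]],
                     [-m[1], m[0], 0.0]])

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
m = rng.standard_normal(3)
F = skew(m) @ M                     # F = [m]_x M

# Epipolar check: project a random homogeneous 3D point through both cameras
X = rng.standard_normal(4)
p1 = X[:3]                          # P  = [I | 0]
p2 = M @ X[:3] + m * X[3]           # P' = [M | m]
print(abs(p2 @ F @ p1))             # ~0 up to floating point
```

The check works because p2^T [m]_x (M p1) splits into a term (M p1)^T [m]_x (M p1), which vanishes by antisymmetry, and a term proportional to m^T [m]_x, which is zero.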
% LM Estimate of F
F = [3.208634e-07 1.179766e-05 -0.003055175;
-5.946336e-06 -1.996869e-05 0.003977885;
0.001530039 0.00502452 -0.9934454];
We can also solve for the covariance of P', which is a 12x12 matrix:
Σ_{P'} = (U − ∑_{i=1}^{n} W_i V_i^{-1} W_i^T)^+
However, we must remember to set the last 5 singular values to 0 when taking the pseudo-inverse, because 2 cameras with n point matches have only 3n + 7 degrees of freedom. Following the text, we set ||F|| = 1. We can then compute the 9 x 12 Jacobian matrix J of F with respect to P'.
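The truncated pseudo-inverse can be sketched as follows: take the SVD, zero the smallest singular values corresponding to the gauge freedoms (5 of them for the 12x12 matrix here), and invert the rest. A Python sketch with a toy 5x5 diagonal matrix standing in for U − ∑ W_i V_i^{-1} W_i^T:

```python
import numpy as np

def truncated_pinv(A, null_dim):
    """Pseudo-inverse of A after forcing its smallest `null_dim`
    singular values to zero (the gauge freedoms of the parametrization)."""
    U, s, Vt = np.linalg.svd(A)
    s_inv = np.zeros_like(s)
    keep = len(s) - null_dim
    s_inv[:keep] = 1.0 / s[:keep]       # invert only the retained spectrum
    return Vt.T @ np.diag(s_inv) @ U.T

# Toy stand-in; for the 12x12 covariance above, null_dim would be 5
A = np.diag([4.0, 3.0, 2.0, 1.0, 0.5])
Ap = truncated_pinv(A, 2)               # zero the 2 smallest singular values
print(np.diag(Ap))                      # approximately [0.25, 0.333, 0.5, 0, 0]
```

The truncated result still satisfies the pseudo-inverse identity Ap @ A @ Ap == Ap, but it assigns no (infinite) uncertainty along the zeroed gauge directions.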