Doc 6.5 p.1

Original Algorithm Overview

[Block diagram showing the processing steps for the stereo pair at frame i and the next pair at frame i+1:]

1. Feature Selection (image 1L)
2. Stereo Matching (image 1R)
3. Feature Tracking (images 2L, 2R)
4. Rigidity Test #1
5. Least Squares Fit
6. ML Refinement
7. Model Refinement
8. Rigidity Test #2

Quantities passed between the blocks: qLi, ΣLi; qRi, ΣRi; Qi, Σvi; qLi+1, ΣLi+1; qRi+1, ΣRi+1; Qi+1, Σvi+1; R0i+1, T0i+1, Θ0i+1; ΣM, Rni+1, Tni+1, Θni+1; Pi.

Doc 6.5 p.2

Algorithm Details

Based on Yang’s code, with notes from Larry’s thesis and Clark’s paper

Doc 6.5 p.3

Yang’s VisOdom Main.cc

• Main() reads args, inits mem mgt, calls doit(), and frees resources. Reading args seems to set flags to say which args have been read.

• Doit() is the main function
  – Instantiates two VisOdoms (front & rear)
  – Init()s them, ReadLogFile()s if filenames are given
  – Sets fusion_flag if rear log file is given
  – Read camera models for front cams (and back cams if fusion_flag)
  – GetNumPics(), TurnonVoStatusFlag(), then subtract attitude[0][0] from all attitude[][0] in front, and fill dpos[][0] with the derivative of position[][0] in front. Yang says to get rid of that last part.
  – If fusion_flag, copy attitude[][0] and position[][0] from front to back
  – One huge if/else thing – see next slide for (similar) contents of each
  – WriteEstimatedMotion() to “motionest.txt”

Doc 6.5 p.4

Huge if/else thing

• Do the following for each pic – see the C++ sketch after this list
  – Do each bullet for front, then repeat for rear if (fusion_flag)
  – Copy… and the bullet after happen as a block, but MotionEst… is only tested once
• For each pic i
  – Read left & right images into memory & set CurImgIndex to that memory bank
  – If (first pic)
    • InitPyramids, GeneratePyramidsMatch // Init & Generate pyramids for left & right images
    • TransferCameras, FeaturesSelection, StereoMatch
  – Else // other pics
    • TransferCameras, GeneratePyramidsTrack, FeaturesTrack, GeneratePyramidsMatch, StereoMatch, RigidityTest
    • If (!MotionEstimation[Fusion]) CopyInitMotion2EstMotion
      – No clear purpose to CopyInitMotion2EstMotion except recording
    • FeaturesSelection, StereoMatch // add features for next time
• StereoMatch == { FeaturesStereoMatch, FeaturesRayGaps, FeaturesCov }
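
The control flow above, condensed into a C++ skeleton. The member names follow the slides, but every signature and body here is a stand-in (the real functions take arguments and operate on VisOdom state), and the rear-camera/fusion pass is omitted:

    #include <cstdio>

    struct VisOdomSketch {
        void ReadImages(int i)          { std::printf("pic %d\n", i); }
        void InitPyramids()             {}
        void GeneratePyramidsMatch()    {}
        void GeneratePyramidsTrack()    {}
        void TransferCameras()          {}
        void FeaturesSelection()        {}
        void StereoMatch()              {}  // FeaturesStereoMatch + FeaturesRayGaps + FeaturesCov
        void FeaturesTrack()            {}
        void RigidityTest()             {}
        bool MotionEstimation()         { return true; }
        void CopyInitMotion2EstMotion() {}

        void processAll(int numPics) {
            for (int i = 0; i < numPics; ++i) {
                ReadImages(i);                          // left & right into the current memory bank
                if (i == 0) {
                    InitPyramids();
                    GeneratePyramidsMatch();            // pyramids for left & right images
                    TransferCameras(); FeaturesSelection(); StereoMatch();
                } else {
                    TransferCameras(); GeneratePyramidsTrack(); FeaturesTrack();
                    GeneratePyramidsMatch(); StereoMatch(); RigidityTest();
                    if (!MotionEstimation()) CopyInitMotion2EstMotion();
                    FeaturesSelection(); StereoMatch(); // add features for next time
                }
            }
        }
    };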

Doc 6.5 p.5

TransferCameras

• Record rover pose (estMotion) for current image
  – In frame 0, copy from logRecord file, which gives rover position and attitude in world coordinates
  – In later frames, use the previous image’s estMotion (as refined by other functions) plus the change in position according to the logRecord file. This is more accurate than reading the log file entry, which has accumulated error. Do not update estMotion attitude at this point. (A minimal sketch of this update follows below.)
• Record camera pose (srcLeftImage[CurImgIndex]->cam)
  – If either camera is NULL for the current frame, initialize it by copying raw camera, which is camera (fixed) in rover coords
  – In frame 0, set cameras to estMotion plus leftrawcam or rightrawcam to get camera initial pose in world coords
  – In later frames, set the camera at estMotion plus the offset from rover to camera, rotated by the logRecord-estimated rotation
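
A minimal sketch of the frame-i>0 rover-pose update described above, assuming simple 3-vector positions; the names (prevEstPos, logPos) are illustrative, not Yang's:

    #include <array>

    using Vec3 = std::array<double, 3>;

    // Keep the refined pose from the previous frame's estMotion and add only the
    // position delta taken from the log file; attitude is deliberately not touched.
    Vec3 propagateEstPosition(const Vec3& prevEstPos,
                              const Vec3& prevLogPos, const Vec3& curLogPos) {
        Vec3 p;
        for (int k = 0; k < 3; ++k)
            p[k] = prevEstPos[k] + (curLogPos[k] - prevLogPos[k]);
        return p;
    }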

Doc 6.5 p.6

1. Feature Selection

• Basic (Larry) algorithm – build a list of 50 features (see the sketch after this list)
  – Divide image into a 10x10 grid of cells
  – For each cell without a feature from previous frame, evaluate interest operator across cell, choose best pixel, and add to a list
  – Sort list of newly identified features by interest, and add the best to the list of previous-frame-features (initially empty) to get 50 features
  – Output: left image feature coords & uncertainty
• Notes
  – Tracking features across multiple frames lets you improve 3D point model
  – Algorithm works poorly if features are collinear or all far away
  – Choose stereo-trackable features, because stereo error hurts more than tracking error, so horizontal trackability >> vertical trackability
  – Good features, well distributed
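
A minimal sketch of the grid-based selection above. It assumes an interest image in which larger values mean better features (the actual Int_forstner() produces smaller-is-better values, so the comparison would flip), and it omits the minimum-spacing check that Yang's SelectMinima() applies:

    #include <algorithm>
    #include <vector>

    struct Feature { int x, y; float interest; };

    std::vector<Feature> selectFeatures(const std::vector<float>& interest,
                                        int width, int height,
                                        std::vector<Feature> kept,    // features carried over
                                        int gridDim = 10, int maxFeatures = 50) {
        const int cw = width / gridDim, ch = height / gridDim;
        std::vector<Feature> fresh;
        for (int gy = 0; gy < gridDim; ++gy)
            for (int gx = 0; gx < gridDim; ++gx) {
                // Skip cells that already contain a carried-over feature.
                bool occupied = false;
                for (const Feature& f : kept)
                    occupied |= (f.x / cw == gx && f.y / ch == gy);
                if (occupied) continue;
                Feature best{-1, -1, 0.f};
                for (int y = gy * ch; y < (gy + 1) * ch; ++y)
                    for (int x = gx * cw; x < (gx + 1) * cw; ++x) {
                        float v = interest[y * width + x];
                        if (v > best.interest) best = {x, y, v};
                    }
                if (best.x >= 0) fresh.push_back(best);
            }
        // Best new features fill the list up to maxFeatures.
        std::sort(fresh.begin(), fresh.end(),
                  [](const Feature& a, const Feature& b) { return a.interest > b.interest; });
        for (const Feature& f : fresh)
            if ((int)kept.size() < maxFeatures) kept.push_back(f);
        return kept;
    }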

Doc 6.5 p.7

FeaturesSelection()

• GenerateInterestImage() on left image via Int_forstner()

• SelectMinima(). Pass in border width. Divide remaining image into 10x10 cells, rounding up. List min nonzero value in each full cell. Sort list. Add to existing feature list, up to numfeatures (which is set to 200 in init), excluding features too close to existing features.

– Question: why round up to find grid, if you don’t look for features in the final row and column?

• Set resulting feature list as the left image’s feature list.

• Do not set ΣL – it will be done by stereo matching

Doc 6.5 p.8

Int_forstner()

• Find gradients (pixel[i-1]/2 – pixel[i+1]/2)
• For each pixel in range

– Sum gxgx, gxgy, gygy across window around pixel

– Put into matrix, then find & record largest eigenvalue

– Does inversion, and scaling by determinant, and scaling by 4 because didn’t do derivatives right, but these all factor out – you should eliminate that code

• The smallest positive answer is the best feature
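
A minimal sketch of Int_forstner() as described above: central-difference gradients, a 2x2 gradient-moment matrix summed over a window, and one interest value per pixel where (as on the slide) the smallest positive value marks the best feature. The inversion/determinant/x4 scalings the slide says factor out are dropped; recording 1/λmin stands in for the "largest eigenvalue after inversion":

    #include <algorithm>
    #include <cmath>
    #include <vector>

    std::vector<float> forstnerInterest(const std::vector<float>& img,
                                        int width, int height, int halfWin = 3) {
        std::vector<float> gx(img.size(), 0.f), gy(img.size(), 0.f), out(img.size(), 0.f);
        for (int y = 1; y + 1 < height; ++y)
            for (int x = 1; x + 1 < width; ++x) {
                gx[y * width + x] = 0.5f * (img[y * width + x + 1] - img[y * width + x - 1]);
                gy[y * width + x] = 0.5f * (img[(y + 1) * width + x] - img[(y - 1) * width + x]);
            }
        const int b = halfWin + 1;                        // stay clear of the gradient border
        for (int y = b; y + b < height; ++y)
            for (int x = b; x + b < width; ++x) {
                double sxx = 0, sxy = 0, syy = 0;         // sum gx*gx, gx*gy, gy*gy over window
                for (int dy = -halfWin; dy <= halfWin; ++dy)
                    for (int dx = -halfWin; dx <= halfWin; ++dx) {
                        double a = gx[(y + dy) * width + x + dx];
                        double c = gy[(y + dy) * width + x + dx];
                        sxx += a * a; sxy += a * c; syy += c * c;
                    }
                // Smallest eigenvalue of [sxx sxy; sxy syy]; its reciprocal is the interest value.
                double tr = sxx + syy, det = sxx * syy - sxy * sxy;
                double lmin = 0.5 * (tr - std::sqrt(std::max(0.0, tr * tr - 4.0 * det)));
                out[y * width + x] = (lmin > 0.0) ? static_cast<float>(1.0 / lmin) : 0.f;
            }
        return out;
    }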

Doc 6.5 p.9

Feature Selection Issues and Improvements

• Algorithm works poorly if features are collinear or all far away.
  – Consider a filter to detect such problems.
• Investigate whether 50 features are as good as 200.
  – Yang says you lose around 50% each frame, but maybe you keep the top 50 in general.
  – If you can use fewer features, consider Larry’s original 10x10 grid, which has fewer cells to search.
• Retain features between frames. Yang does not. If you retain features, then do not apply the interest operator or select minima for cells with a retained feature, and modify how and how many features you add to the feature list. Some sorting would be involved to easily tell which cells to search.
• If you retain features across frames, you could implement step 7 to improve the model. Each feature would store model point P or the address of P, so that reordering the feature list does not disturb the feature-to-model-point map.
• If you retain features, consider allowing a range of number of features, for instance 40-60.
  – If you have at least 40, skip feature selection. Otherwise fill up to 60.
  – Use the time you save for more iteration on step 6.
• Consider using stereo to detect & reject features on occlusion contours, because they are unstable
• If you use a 10x10 grid, and you determine that part of the image is low texture, consider resizing the grid to ignore the low-texture regions while maintaining 10x10
• Consider weighting horizontal gradient more heavily in the interest operator, because horizontal tracking matters more (stereo is more important than 2D tracking)

Doc 6.5 p.10

2. Stereo Matching

• Basic (Larry) Algorithm
  – Correlate in pyramid
    • Olson uses (16x, 4x1, 1x), Yang uses (2x, 1x).
    • If limited depth, limit disparities.
    • Use epipolar line to constrain search window
  – If not found near epipolar line, reject
  – Threshold residue (at each pyramid level)
  – Triangulate to get depth at features
  – Not obvious whether subpixel disparity is intended
  – Bad data near image edges
  – Outputs: right image coords & deviation; 3D coords and deviation

Doc 6.5 p.11

Yang Stereo Matching

• GeneratePyramidsMatch – copy pyramid for left-image from 2nd pyramid into 1st pyramid, and make new pyramid for right-image in 2nd pyramid
  – Would be faster to swap pointers, not copy

• FeaturesStereoMatch() – use pyramids & epipolar search windows to find each feature in right image and the 2D covariance matrix, which is the same in both images

• FeaturesRayGaps() – verify that epipolar lines mostly cross at feature location

• FeaturesCov() – fill in each feature’s cov3d

Doc 6.5 p.12

FeaturesStereoMatch()

– Find min & max disparity based on min & max range
– For each feature,
  • Pose2EpipolarLine to get the epipolar line in the right image
  • Say (xl, yl) are the feature coords in the left image. Make a box in the right image using columns x = xl − mindisp … xl − maxdisp, and rows from 5 pixels above the highest value of the epipolar line in those columns to 5 pixels below the lowest value. If the line is horizontal, this probably has a bug in the window dims. (A sketch of the box construction follows below.)
  • Call MatchOneAffineFeature() to match a feature from the 1st pyramid to the image in the 2nd pyramid, using the above search range. Finds coords in the second image and cov_stereo, which is both ΣL⁻¹ and ΣR⁻¹ for the feature
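
A minimal sketch of the search-box construction above, assuming the epipolar line is available as y = m·x + b in right-image coordinates (the real code gets it from Pose2EpipolarLine()):

    #include <algorithm>
    #include <cmath>

    struct Box { int xmin, xmax, ymin, ymax; };

    Box epipolarSearchBox(double xl, double m, double b,
                          double minDisp, double maxDisp, int margin = 5) {
        Box w;
        w.xmin = static_cast<int>(std::floor(xl - maxDisp));   // columns span the disparity range
        w.xmax = static_cast<int>(std::ceil (xl - minDisp));
        double y0 = m * w.xmin + b, y1 = m * w.xmax + b;       // line height at the box edges
        w.ymin = static_cast<int>(std::floor(std::min(y0, y1))) - margin;
        w.ymax = static_cast<int>(std::ceil (std::max(y0, y1))) + margin;
        return w;
    }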

Doc 6.5 p.13

MatchOneAffineFeature()

– Shrink search window dims by factor of 2 for each pyramid level, but at least 4x4

– For each pyramid level
  • Get img2 from param pyr, img1 (template) from this pyr
  • Feature is targetfeature coords at this resolution
  • Call nav_corimg(). Pass a winsize window of img1 around the int feature location, and a winsize+searchsize window of img2 around the feature, where searchsize was determined based on the epipolar line. Finds the location and inverse covariance of the correlation peak. Add the img1-feature non-int offset – HACK
  • If result was low confidence, return error. Otherwise change search window to 2*recovered move (actual move on next pyramid level) +/- 5 pixels

Doc 6.5 p.14

nav_corimg() p.1

• Well documented, in corr_image.c
• Pseudo-normalized correlation of template img1 across window img2: 2·Σ(i1·i2) / (Σ(i1·i1) + Σ(i2·i2))
  – i1, i2 are pixels in normalized windows.
  – To avoid 2-step normalizing, need Σu1, Σu2, Σu1², Σu2², Σ(u1+u2), where u1, u2 are unnormalized images
  – Dan doubled speed by calculating i1*i2, not i1+i2, and using int instead of double
  – Return correlation peak and covariance
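
A minimal sketch of the pseudo-normalized score above, computed in a single pass over raw sums. It assumes "normalized" means mean-subtracted, and it accumulates Σ(u1·u2) directly rather than the bookkeeping the original code used:

    #include <cstddef>
    #include <vector>

    double pseudoNormalizedCorrelation(const std::vector<double>& u1,
                                       const std::vector<double>& u2) {
        if (u1.empty() || u1.size() != u2.size()) return 0.0;
        const std::size_t n = u1.size();
        double s1 = 0, s2 = 0, s11 = 0, s22 = 0, s12 = 0;
        for (std::size_t k = 0; k < n; ++k) {
            s1 += u1[k];            s2 += u2[k];
            s11 += u1[k] * u1[k];   s22 += u2[k] * u2[k];
            s12 += u1[k] * u2[k];
        }
        // With i = u - mean(u):  Σ(i1*i2) = s12 - s1*s2/n,  Σ(i*i) = s_ii - s_i^2/n
        const double c12 = s12 - s1 * s2 / n;
        const double c11 = s11 - s1 * s1 / n;
        const double c22 = s22 - s2 * s2 / n;
        return (c11 + c22 > 0.0) ? 2.0 * c12 / (c11 + c22) : 0.0;
    }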

Doc 6.5 p.15

nav_corimg() p.2

• Find Σ(u1) and Σ(u1²) before the main loop
• Calculate Σ(u2) and Σ(u2²) for each column in the top swath
• For each row-swath
  – Calculate ΣΣu2, ΣΣu2², and correlation for the leftmost position
  – March ΣΣu2, ΣΣu2² forward and correlate remaining positions
  – March all Σu2 and Σu2² down one swath
• Fit 2D quadratic to the 3x3 around the best correlation score (see the sketch after this list)
  – Returns error if a neighbor of the “best score” ties or is better (peak is more of a ridge)
  – Trust their equations for fitting 9 points to a biquadratic
  – Solve for subpixel offset of peak, add to best-score pixel coords, interpolate the correlation score there – trust their equations
  – Generate “covariance vector”. If the quadratic is Ax² + By² + Cxy + …, then ΣL = [2A C; C 2B]. Calculate ΣL⁻¹ and store the xx, xy, and yy terms. Assume that ΣL⁻¹ and ΣR⁻¹ have the same value
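
A minimal sketch of the subpixel step above. It assumes "fit 9 points to a biquadratic" means a least-squares fit of f(x,y) = Ax² + By² + Cxy + Dx + Ey + F over x, y ∈ {−1, 0, 1}; the closed-form sums below are my reduction of that fit, not the equations in the code:

    #include <cmath>

    struct SubpixelPeak { double dx, dy, value; double sxx, sxy, syy; }; // offset, score, ΣL terms

    // c[row][col] holds the 3x3 correlation scores around the best integer pixel.
    SubpixelPeak fitQuadraticPeak(const double c[3][3]) {
        double S0 = 0, Sx = 0, Sy = 0, Sxx = 0, Syy = 0, Sxy = 0;
        for (int j = -1; j <= 1; ++j)
            for (int i = -1; i <= 1; ++i) {
                double v = c[j + 1][i + 1];
                S0 += v; Sx += i * v; Sy += j * v;
                Sxx += i * i * v; Syy += j * j * v; Sxy += i * j * v;
            }
        // Least-squares coefficients of A x^2 + B y^2 + C xy + D x + E y + F on this grid.
        double A = Sxx / 2.0 - S0 / 3.0, B = Syy / 2.0 - S0 / 3.0, C = Sxy / 4.0;
        double D = Sx / 6.0, E = Sy / 6.0, F = (5.0 * S0 - 3.0 * Sxx - 3.0 * Syy) / 9.0;
        double det = 4.0 * A * B - C * C;                 // ~0 means a ridge; real code errors out
        if (std::fabs(det) < 1e-12) return {0.0, 0.0, c[1][1], 2.0 * A, C, 2.0 * B};
        // Stationary point: [2A C; C 2B] [dx; dy] = -[D; E]
        double dx = (-2.0 * B * D + C * E) / det;
        double dy = ( C * D - 2.0 * A * E) / det;
        double value = A * dx * dx + B * dy * dy + C * dx * dy + D * dx + E * dy + F;
        return {dx, dy, value, 2.0 * A, C, 2.0 * B};      // slide's ΣL = [2A C; C 2B]
    }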

Doc 6.5 p.16

FeaturesRayGaps() – new

• Does old FeaturesRayGaps except records camera matrix
  – Bug: If dotp==0, m2 should be –dotbv2
• Then does same as old FeaturesCov, except different equations for H0.
  – Larry’s document defines H0 as dproj(P)/dP|P=P0, where proj(P) is the projection of P, and then Σ⁻¹ = H0ᵀ · Σvq⁻¹ · H0
  – Yang instead uses P′ = dP/dproj(P) [= H0⁻¹], and thus Σ = H0⁻¹ · Σvq · H0⁻ᵀ = P′ · Σvq · P′ᵀ. Assumes the final param to Image2DToRay3D is d3Dpos/d2Dpos for points on the ray.
  – The new equations find/use Σ, not Σ⁻¹. Must fix nav_corimg() and the motion estimator to conform
  – Yang’s code refills m1, m2 halfway through. That is an error, but it has negligible effect

Doc 6.5 p.17

FeaturesRayGaps() – old

• For each good feature in the new image
  – Convert feature loc in each image into a 3D ray (direction, pinhole location, camera covar)
  – Find the point on each ray at closest approach (see the sketch after this list)
  – Project both into both images
  – If Manhattan distance between projections of closest-approach points in either image exceeds threshold, reject feature
  – Else retain feature in 3D at the average of the two closest-approach points
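
A minimal sketch of the closest-approach step above, for two rays of the form p + t·d; the midpoint is the retained 3D feature position:

    #include <array>

    using Vec3 = std::array<double, 3>;

    static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

    // Returns {closest point on ray 1, closest point on ray 2, midpoint}.
    std::array<Vec3, 3> closestApproach(const Vec3& p1, const Vec3& d1,
                                        const Vec3& p2, const Vec3& d2) {
        Vec3 w{p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2]};
        double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
        double d = dot(d1, w),  e = dot(d2, w);
        double denom = a * c - b * b;                    // ~0 for (near-)parallel rays
        double t1 = (b * e - c * d) / denom;             // parameter along ray 1
        double t2 = (a * e - b * d) / denom;             // parameter along ray 2
        std::array<Vec3, 3> out;
        for (int k = 0; k < 3; ++k) {
            out[0][k] = p1[k] + t1 * d1[k];
            out[1][k] = p2[k] + t2 * d2[k];
            out[2][k] = 0.5 * (out[0][k] + out[1][k]);
        }
        return out;
    }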

Doc 6.5 p.18

FeaturesCov() – old

• Follows Larry thesis A.2 (pp. 143-4)
• For each good feature in the new left image
  – Ip1, pos3d, pos1, ray1 are the left image feature
  – Ip2, pos2, ray2 are the right image feature
  – cov_stereo[3] array gives the xx, xy, and yy terms of ΣL⁻¹ or ΣR⁻¹
    • both have the same values
  – Generate Σvq⁻¹ from ΣL⁻¹ and ΣR⁻¹ as described in the thesis
  – Assume the final param to Image2DToRay3D is the transpose of [sx 0 cx; 0 sy cy]·R, where s and c refer to world-to-pixel scale and image center pixel and R is the camera rotation matrix. So that matrix converts world coords into orthographic projection coords, and you just have to divide by Z to get screen coords.
  – Fill H0 (4x3), which the code calls Ht. It is dprojection(P)/dP
    • I cannot verify the equations, and now Yang has changed them
  – Cov3d = H · Σvq⁻¹ · Ht, the inverse of the cov matrix

Doc 6.5 p.19

Correlation Issues

• MatchOneAffineFeature() is just translation, not Affine
  – Would affine give better stereo results?
  – Would KL be better than fitting parabola during nav_corimg()’s subpixel interpolation?
• Tracking also uses MatchOneAffineFeature
  – Then it does homography transform for refinement

– Perhaps additional refinement and even subpixel interpolation in MatchOneAffineFeature is wasted – so perhaps they should be separated, and used only in stereo

Doc 6.5 p.20

3. Feature Tracking

• Original (Larry) Algorithm
  – Correlate to find 1st-left-image features in 2nd-left-image
  – Use 3D world model and external-source motion estimates to predict search window size and position (Yang uses odometry, no model)
  – Use stereo matching to find 2nd-right-image features
  – Threshold on residue

Doc 6.5 p.21

Yang Feature Tracking

• GeneratePyramidsTrack()
  – generate new 2nd pyramid (2nd-left-image)
• FeaturesTrack() into new left image
  – For each good pixel in 1st-left-image that projects onto 2nd-left-image
    • Define feature window around 0 + expected 2D motion, based on feature 3D position and expected camera motion
    • MatchOneAffineFeature() to get new 2D position
    • If new feature position is good
      – computeLocalHomography to improve 2D estimate
      – if residue is low, use improved estimate
• Repeat stereo matching into new right image

Doc 6.5 p.22

computeLocalHomography()

• Put windows from (-4,-4) to (7,7) about old and estimated-new feature locs on their images

• In reading order, MatchOneAffineFeature() to track pixels of these windows until you find 2 whose correlation > 0.8.

• Use those 2 and the main feature’s locs to compute homography coefficients

• For each pyramid level
  – for each pixel in window around old position
    • Apply homography to find equiv pixel in new position
    • accumulate stats on old-image vs. new-transformed-image pixel intensities
  – If that correlation is high, init final 2 homography coefs
  – mrq minimize, probably to improve coefs

• Use homography to update new pose & covar

Doc 6.5 p.23

Feature Tracking Issues and Possible Improvements

• Why do we need pyramid if we have correlation?
  – Consider correlation vs. pyramid
• Do correlation and pyramid only if fine-tracking fails
• Consider the following order of events
  – Use any external data to estimate rotation
    • Modulate correlation/pyramid size by credibility of external data
  – Use vision to refine roll estimate
  – Begin tracking with features high on image (far away). Use correlation and/or pyramid. Refine pitch & yaw estimate. Use rotation estimate and 3D model to predict location of nearer features. Use smaller/no search window and/or pyramid as accuracy improves.

• If no distant features, choose a large one high on image (per Clark’s paper)

• Consider affine tracker instead of homography – faster? Is a 3-point homography credible? Perhaps 3-point is just to init?

Doc 6.5 p.24

4. Rigidity #1

• Original (Larry) Algorithm
  – Require Δ distance between 3D feature coords < threshold
  – Reject worst offending features & recalculate
• RigidityTest() – one big loop (see the sketch after this list)
  – For each pair of features,
    • VSrigidity_ai() does Larry thesis section 5.2.1 up thru calculating ai, the change in distance between the features, normalized for uncertainty in their measurements
    • Sum ai into A[] for each feature, and track which feature has highest A[]
  – If the highest A[] exceeds a threshold, remove the associated feature.
  – Else break from the loop
• Issues and possible improvements
  – perhaps watch evolution of an offender
  – Perhaps un-reject points that shape up
  – Perhaps predict & re-seek offending points
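
A minimal sketch of the RigidityTest() loop above. The pairwise statistic a_ij is simplified here to the change in inter-feature distance over a caller-supplied per-pair sigma; VSrigidity_ai() (thesis section 5.2.1) normalizes it more carefully:

    #include <cmath>
    #include <vector>

    struct P3 { double x, y, z; };

    static double dist(const P3& a, const P3& b) {
        return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
    }

    // prev/cur: 3D positions of the same features in the previous and current frame.
    // sigma: an uncertainty scale per feature. Returns indices of rejected features.
    std::vector<int> rigidityTest(const std::vector<P3>& prev, const std::vector<P3>& cur,
                                  const std::vector<double>& sigma, double threshold) {
        const int n = static_cast<int>(prev.size());
        std::vector<bool> good(n, true);
        std::vector<int> rejected;
        for (;;) {
            std::vector<double> A(n, 0.0);
            for (int i = 0; i < n; ++i) {
                if (!good[i]) continue;
                for (int j = i + 1; j < n; ++j) {
                    if (!good[j]) continue;
                    double aij = std::fabs(dist(cur[i], cur[j]) - dist(prev[i], prev[j]))
                                 / std::sqrt(sigma[i]*sigma[i] + sigma[j]*sigma[j]);
                    A[i] += aij; A[j] += aij;            // accumulate per-feature score
                }
            }
            int worst = -1; double worstA = 0.0;
            for (int i = 0; i < n; ++i)
                if (good[i] && A[i] > worstA) { worstA = A[i]; worst = i; }
            if (worst < 0 || worstA <= threshold) break; // highest A[] under threshold: done
            good[worst] = false;                         // remove the worst offender and redo
            rejected.push_back(worst);
        }
        return rejected;
    }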

Doc 6.5 p.25

5-6. Motion estimation

• 5. Least Squares Fit – initial motion estimate
  – Could skip this step if you have odometry
  – Err = Σ ( ||Qi+1 – R·Pi – T||² / (det(Σvi) + det(Σvi+1)) )
  – Pi is model point, based on observations Q0…Qi
• 6. ML Refinement – iterative Maximum Likelihood
  – Err = Σ (eᵀ W e), where e = Qi+1 – R·Pi – T and W = Σvi⁻¹ (both cost functions are written out below)
  – Linearize about Θn, Tn and solve for Θn+1, Tn+1. Eqns on pp. 23, 150
  – Also gives ΣM, covariance (confidence) in final Θ, T
  – Confidence is higher in closer points
  – Apparently critical to good results

• Presumably matching 3D-3D is faster or more accurate than matching 3D model to 2D images, Kalman style
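
The two cost functions above, written out with an explicit per-feature index j (the slide leaves it implicit):

    % Step 5: weighted least-squares initial fit
    \mathrm{Err}_{\mathrm{LSQ}} = \sum_{j}
        \frac{\left\lVert Q^{\,j}_{i+1} - R\,P^{\,j}_{i} - T \right\rVert^{2}}
             {\det\!\big(\Sigma^{\,j}_{v_i}\big) + \det\!\big(\Sigma^{\,j}_{v_{i+1}}\big)}

    % Step 6: iterative maximum-likelihood refinement
    \mathrm{Err}_{\mathrm{ML}} = \sum_{j} e_j^{\mathsf{T}} W_j\, e_j ,
    \qquad e_j = Q^{\,j}_{i+1} - R\,P^{\,j}_{i} - T ,
    \qquad W_j = \big(\Sigma^{\,j}_{v_i}\big)^{-1}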

Doc 6.5 p.26

Yang steps 5-6

• MotionEstimation()
  – Init temp cameras but provide irrelevant pose
  – Make sure we have enough points
  – Schoneman_least_middian_square() – weighted least squares solution for R and T to describe motion of points Qi+1 in current frame and their counterparts Pi in previous frame
  – Step 6: Big iteration
  – Move cameras of cur image by inverse of world-motion
    • Cams of cur image are in world coords
  – ComputePose(): member estMotion == param estPose = new cameras’ attitude and position relative to frame 0
  – Find covariance (ΣM aka estMotion.covariance) by eqn. B.11

Doc 6.5 p.27

Schoneman_least_middian_square()

• Weighted least squares solution for R and T to describe motion of points Qi+1 in current frame and their counterparts Pi in previous frame – not motion of camera. Return R and T in params. (A sketch of this solution follows below.)
• Derivation in Larry’s thesis, sec. B.1, says …
  – weight points by w = 1 / (det(Σvi) + det(Σvi+1))
  – find E = (Σ w·Qi+1·Piᵀ) – (Σ w·Qi+1)(Σ w·Pi)ᵀ / (Σ w), then SVD to E = U·S·Vᵀ
  – then R = U·Vᵀ and T = ((Σ w·Qi+1) – R·(Σ w·Pi)) / (Σ w)
• Yang follows this except w = 1 / (|Qi+1| + |Pi|)
  – Suppose stereo is much worse than 2D tracking, so det(Σvi) is dominated by the variance in the forward direction (perpendicular to image plane), so we could calculate that instead of det(Σvi).
  – Define FW as the forward distance from baseline to feature
  – Further suppose that Σvi = J·Jᵀ, which seems to (incorrectly) assume 2D feature covariance = I
  – From there, comments in the code show that forward variance ∝ FW⁴, where all features share the same constant of proportionality, which we can drop from our equations
  – Further suppose that feature 3D coordinates reference an origin on the baseline, the feature is far away, and the field of view is small, such that the feature is at roughly (FW, 0, 0) – then we can use |Q|⁴ instead of FW⁴ as the weight
  – Finally, assume that |Q|² (standard deviation) or even |Q| is a reasonable substitute for variance
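
A minimal sketch of the thesis-B.1 solution above using Eigen. It uses the det-based weights rather than Yang's |Q|+|P| substitute, and it adds a reflection guard that the slide does not mention:

    #include <cstddef>
    #include <vector>
    #include <Eigen/Dense>

    // Weighted least-squares R, T such that Q ≈ R*P + T (motion of the points, not of the camera).
    // p, q are matched 3D points in the previous and current frame; w are the per-point weights,
    // e.g. w = 1 / (det(Σvi) + det(Σvi+1)) as in the thesis.
    void weightedPointSetMotion(const std::vector<Eigen::Vector3d>& p,
                                const std::vector<Eigen::Vector3d>& q,
                                const std::vector<double>& w,
                                Eigen::Matrix3d& R, Eigen::Vector3d& T) {
        double sw = 0.0;
        Eigen::Vector3d sq = Eigen::Vector3d::Zero(), sp = Eigen::Vector3d::Zero();
        Eigen::Matrix3d sqp = Eigen::Matrix3d::Zero();
        for (std::size_t k = 0; k < p.size(); ++k) {
            sw  += w[k];
            sq  += w[k] * q[k];
            sp  += w[k] * p[k];
            sqp += w[k] * q[k] * p[k].transpose();
        }
        Eigen::Matrix3d E = sqp - sq * sp.transpose() / sw;   // weighted cross-covariance
        Eigen::JacobiSVD<Eigen::Matrix3d> svd(E, Eigen::ComputeFullU | Eigen::ComputeFullV);
        R = svd.matrixU() * svd.matrixV().transpose();        // thesis: R = U V^T
        if (R.determinant() < 0.0) {                          // guard against a reflection
            Eigen::Matrix3d V = svd.matrixV();
            V.col(2) *= -1.0;
            R = svd.matrixU() * V.transpose();
        }
        T = (sq - R * sp) / sw;                               // thesis: T = (Σw·Q − R·Σw·P)/Σw
    }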

Doc 6.5 p.28

Big Iteration

• R aka R0 and Θ aka Θ0 = rotation from before iteration or from previous iteration

• For each good feature Q in old left image,
  – Use Larry thesis eqn B.6 (3rd eqn) and pp. 152-153 to get Jj
  – Use Larry thesis eqn B.7 (3rd eqn) to get Qj from prev and cur feature.pos3d
    • You would probably use Pj, not Qpj, for prev feature pos
  – See Larry thesis p.23 to get Wj (inverse covariance of Q noise) from Σpj and Σcj (cur and prev feature.cov3d)
  – Sum the 6 terms shown in B.8
• Use eqns B.9 to find V1 (Θhat) and V2 (That)
• Accept Θ = Θhat, T = That, and make R from Θ
• Loop until out of iterations (return error) or change in Θ (radians) plus fractional change in T < threshold (continue)

Doc 6.5 p.29

Aux functions

• TransformStereoCamerasRot() takes new left-cam pos & this frame’s rotation; rotates A, H, V, O and copies R for both cameras; assigns left C; and updates right C by rotating about baseline

• ComputePose()
  – take raw (frame 0) cams and current cams, and generate the relative rotation and translation since frame 0.

– Each frame’s coord sys is left/right along the camera baseline, forward perpendicular to that on the plane containing baseline and left camera A vector, and centered on the left camera C vector.

• MotionEstimationFusion() – make one set of features from front and rear images, then do same as MotionEstimation(), and move both sets of cameras afterwards

Doc 6.5 p.30

7-8. Motion estimation

• 7. Model Refinement
  – Equations on p.26 improve estimate of model points Pi
  – In practice, does not change R, T
• 8. Rigidity Constraint #2 (see the sketch after this list)
  – For each point
    • Err = Q – R·P – T as before
    • σ = a diagonal element of Σv, minus a function of ΣM (see p. 159)
    • Point is bad if err > K·σ for some K, say 3
  – Reject worst offender, then return to some earlier step, perhaps 5 or 6

• Yang does not track points (P), so he does not do these steps
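
A minimal sketch of the per-point check in step 8, using Eigen. The per-axis sigma here is taken directly from diag(Σv); the correction that subtracts a function of ΣM (thesis p. 159) is omitted, and the per-axis interpretation of the test is an assumption:

    #include <cmath>
    #include <Eigen/Dense>

    // Returns true if the point is consistent with the motion (R, T) within K sigma per axis.
    bool pointIsRigid(const Eigen::Vector3d& P,          // model point, previous frame
                      const Eigen::Vector3d& Q,          // observed point, current frame
                      const Eigen::Matrix3d& R, const Eigen::Vector3d& T,
                      const Eigen::Vector3d& sigmaDiag,  // sqrt of diag(Σv), per axis
                      double K = 3.0) {
        Eigen::Vector3d err = Q - R * P - T;             // Err = Q − R·P − T, as on the slide
        for (int a = 0; a < 3; ++a)
            if (std::fabs(err[a]) > K * sigmaDiag[a]) return false;
        return true;
    }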

Doc 6.5 p.31

Motion Estimation Issues and Possible Improvements

• Make durable list of 3D feature positions (“P”)
  – Init from recovered 3D loc of new features

– Implement thesis step 7 to update 3D locs using each new frame’s estimate

– Modify earlier steps to use this list rather than the previous frame’s 3D estimate

• Implement step 8 (second rigidity test)

Doc 6.5 p.32

Things to improve

• Need image sequence to test on
  – Mast cam, approach – or carry cam in rover-like pattern

– To test 1-pixel target recovery

– So we can compare visodom, 2Donly, ICPonly, etc

– Perhaps two image sets, with and without LED thru pinhole, so we can see actual pixel and compare with our results