Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC
(Excerpt from Computer Vision: Algorithms and Applications, September 3, 2010 draft, p. 164.)
Transformation       Matrix            # DoF   Preserves
translation          [ I | t ]  (2×3)    2     orientation
rigid (Euclidean)    [ R | t ]  (2×3)    3     lengths
similarity           [ sR | t ] (2×3)    4     angles
affine               [ A ]      (2×3)    6     parallelism
projective           [ H̃ ]      (3×3)    8     straight lines
Table 3.5 Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it, i.e., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0ᵀ 1] row to form a full 3×3 matrix for homogeneous coordinate transformations.
…examples of such transformations, which are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.
In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out.
If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?
procedure forwardWarp(f, h, out g):
    For every pixel x in f(x):
        1. Compute the destination location x′ = h(x).
        2. Copy the pixel f(x) to g(x′).

Algorithm 3.1 Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).
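As a concrete illustration (not part of the original text), here is a minimal Python/NumPy sketch of Algorithm 3.1; the function name forward_warp and the callable h are illustrative. The rounding step hints at the problems the slide asks about: some destination pixels are never written (holes), and others are written more than once.

```python
import numpy as np

def forward_warp(f, h):
    """Forward-warp a grayscale image f through the transform h.

    f : 2D numpy array (source image)
    h : function mapping pixel coordinates (x, y) -> (x', y')
    Returns g, same shape as f; pixels that nothing maps to stay 0.
    """
    g = np.zeros_like(f)
    H, W = f.shape
    for y in range(H):
        for x in range(W):
            xp, yp = h(x, y)
            # Rounding to the nearest pixel is where holes and
            # overwrites come from in forward warping.
            xp, yp = int(round(xp)), int(round(yp))
            if 0 <= xp < W and 0 <= yp < H:
                g[yp, xp] = f[y, x]   # later writes overwrite earlier ones
    return g
```

For example, warping with h(x, y) = (x + 1, y) shifts the image one pixel to the right and leaves the first column empty.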
Geometric Transformations

Let's define families of transformations by the properties that they preserve.

Rotations

Definition: an orthogonal transformation preserves dot products:

aᵀ b = T(a)ᵀ T(b), where T(a) = A a,   a ∈ ℝⁿ, A ∈ ℝⁿˣⁿ
aᵀ b = aᵀ AᵀA b   ⟺   AᵀA = I

Linear transformations that preserve distances and angles.
[can conclude by setting a, b = coordinate vectors]

Defn: A is a rotation matrix if AᵀA = I, det(A) = 1
Defn: A is a reflection matrix if AᵀA = I, det(A) = −1
2D Rotations
R = [ cos θ   −sin θ
      sin θ    cos θ ]

1 DOF
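A quick numerical check of the definitions above (illustrative code, not from the slides): R(θ) satisfies RᵀR = I with det(R) = +1, which is exactly what distinguishes a rotation from a reflection.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix R(theta); its single DOF is the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rot2d(np.pi / 3)
# Orthogonality (R^T R = I) preserves dot products; det = +1 (not -1)
# rules out a reflection.
assert np.allclose(R.T @ R, np.eye(2))
assert np.isclose(np.linalg.det(R), 1.0)
```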
3D Rotations
Think of R as a change of basis where the rows rᵢ = R(i,:) are orthonormal basis vectors.
R [X; Y; Z] = [r11 r12 r13; r21 r22 r23; r31 r32 r33] [X; Y; Z]

[figure: rotated coordinate frame with basis vectors r1, r2, r3]

How many DOFs?

3 = (2 to point r1) + (1 to rotate about r1)
3D Rotations

Lots of parameterizations try to capture the 3 DOFs.

A helpful one for vision: the axis-angle representation. Represent a 3D rotation with a unit vector that gives the axis of rotation, and an angle of rotation about that vector.
Shears
A = [ 1     h_xy  h_xz  0
      h_yx  1     h_yz  0
      h_zx  h_zy  1     0
      0     0     0     1 ]

h_xy shears y into x.
Rotations

• 3D rotations are fundamentally more complex than in 2D!
• 2D: amount of rotation
• 3D: amount and axis of rotation

[figure: 2D -vs- 3D rotation]
05-3DTransformations.key - February 9, 2015
Recall: cross-product
Dot product:

a · b = ||a|| ||b|| cos θ

Cross product:

a × b = det [ i  j  k ; a1 a2 a3 ; b1 b2 b3 ]
      = det [a2 a3; b2 b3] i − det [a1 a3; b1 b3] j + det [a1 a2; b1 b2] k

Cross product matrix:

a × b = [a]× b = [ 0  −a3  a2 ; a3  0  −a1 ; −a2  a1  0 ] [b1; b2; b3]
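The cross-product matrix is easy to sanity-check in NumPy (illustrative sketch; the name cross_matrix is made up): multiplying by [a]× must agree with the cross product itself.

```python
import numpy as np

def cross_matrix(a):
    """Skew-symmetric matrix [a]_x such that cross_matrix(a) @ b == a x b."""
    return np.array([[    0, -a[2],  a[1]],
                     [ a[2],     0, -a[0]],
                     [-a[1],  a[0],     0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 0.5, 4.0])
# The matrix form reproduces the cross product exactly.
assert np.allclose(cross_matrix(a) @ b, np.cross(a, b))
# [a]_x is skew-symmetric: its transpose is its negative.
assert np.allclose(cross_matrix(a).T, -cross_matrix(a))
```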
Approach

Given a unit axis ω ∈ ℝ³, ||ω|| = 1, and an angle θ:

1. Write x as the sum of components parallel and perpendicular to ω: x = x∥ + x⊥
2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω

R = I + [ω]× sin θ + [ω]× [ω]× (1 − cos θ)

[R x can be simplified to cross- and dot-product computations]
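The axis-angle formula above translates directly into a few lines of NumPy (an illustrative sketch; the function name rodrigues is not from the slides):

```python
import numpy as np

def rodrigues(omega, theta):
    """Rotation matrix from the axis-angle formula
    R = I + sin(theta) [w]_x + (1 - cos(theta)) [w]_x^2,
    for an axis omega (normalized here) and angle theta."""
    w = np.asarray(omega, dtype=float)
    w = w / np.linalg.norm(w)            # ensure a unit axis
    wx = np.array([[    0, -w[2],  w[1]],
                   [ w[2],     0, -w[0]],
                   [-w[1],  w[0],     0]])
    return np.eye(3) + np.sin(theta) * wx + (1 - np.cos(theta)) * (wx @ wx)

# Rotating 90 degrees about the z-axis reproduces the familiar 2D rotation
# in the xy-plane: the x-axis maps to the y-axis.
R = rodrigues([0, 0, 1], np.pi / 2)
assert np.allclose(R @ np.array([1.0, 0, 0]), [0, 1, 0])
```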
Exponential map

[standard Taylor series expansion of exp(x) at x = 0: 1 + x + (1/2!) x² + …]

R = exp([v]×), where v = ω θ
  = I + [v]× + (1/2!) [v]×² + …

Implication: we can approximate the change in position due to a small rotation as v × x, where v = ω θ.
Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC
Perspective projection
[figure: center of projection (COP), 3D point (X, Y, Z), image point (x, y, 1); right-handed coordinate system with axes x, y, z]

x = f X / Z
y = f Y / Z
Perspective projection revisited
λ [x; y; 1] = [f 0 0; 0 f 0; 0 0 1] [X; Y; Z]

λ x = f X,   λ = Z   ⟹   x = (λ x) / λ = f X / Z
Given (X,Y,Z) and f, compute (x,y) and lambda:
Special case: f = 1
[figure: COP, 3D point (X, Y, Z), image point (x, y, 1)]

Z [x; y; 1] = [X; Y; Z]

Natural geometric intuition:
• The 3D point is obtained by scaling the ray pointed at the image coordinate
• The scale factor is the true depth of the point

[Aside: given an image with a focal length f, resize by 1/f to obtain a unit-focal-length image]
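The f = 1 special case and the "scale the ray by the depth" intuition can be checked in a few lines of NumPy (illustrative code, not from the slides; the function name project is made up):

```python
import numpy as np

def project(P, f=1.0):
    """Perspective projection: (X, Y, Z) -> (x, y) = (f X / Z, f Y / Z)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

# With f = 1, the 3D point is recovered by scaling the ray through the
# image point (x, y, 1) by the true depth Z:  Z * (x, y, 1) = (X, Y, Z).
P = np.array([2.0, 4.0, 8.0])
x, y = project(P, f=1.0)
assert np.allclose(P, P[2] * np.array([x, y, 1.0]))
```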
Homogeneous notation

For now, think of the above as shorthand notation for:

[x; y; z] ∼ [X; Y; Z]

[x; y; z] ≡ [X; Y; Z]   ⟺   ∃λ s.t. λ [x; y; z] = [X; Y; Z]
Camera projection
λ [x; y; 1] = [f 0 0; 0 f 0; 0 0 1] [r11 r12 r13 t_x; r21 r22 r23 t_y; r31 r32 r33 t_z] [X; Y; Z; 1]

• [X; Y; Z; 1]: 3D point in world coordinates
• [R | t]: camera extrinsics (rotation and translation)
• K: camera intrinsic matrix (can include skew & non-square pixel size)

[figure: camera frame and world coordinate frame with axes r1, r2, r3 and translation T]

Aside: homogeneous notation is shorthand for x = (λ x) / λ
�
Fancier intrinsics

x_s = s_x x,      y_s = s_y y        (non-square pixels)
x′ = x_s + o_x,   y′ = y_s + o_y     (shifted origin)
x″ = x′ + s_θ y′                     (skewed image axes)

K = [ s_x  s_θ  o_x
      0    s_y  o_y
      0    0    1   ]

K [f 0 0; 0 f 0; 0 0 1] = [ f s_x  f s_θ  o_x
                            0      f s_y  o_y
                            0      0      1   ]
Notation

λ [x; y; 1] = [f s_x  f s_θ  o_x; 0  f s_y  o_y; 0 0 1] [r11 r12 r13 t_x; r21 r22 r23 t_y; r31 r32 r33 t_z] [X; Y; Z; 1]
            = K [R T] [X; Y; Z; 1]   with K 3×3, R 3×3, T 3×1
            = M [X; Y; Z; 1]         with M 3×4

Claims (without proof):
1. A 3×4 matrix M can be a camera matrix iff its left 3×3 submatrix is nonsingular
2. M is determined only up to a scale factor

[Using Matlab's rows × columns convention]
Notation (more)

M [X; Y; Z; 1] = [A b] [X; Y; Z; 1] = A [X; Y; Z] + b,   with A 3×3, b 3×1

Row notation:

M = [m1ᵀ; m2ᵀ; m3ᵀ],   A = [a1ᵀ; a2ᵀ; a3ᵀ],   b = [b1; b2; b3]
Applying the projection matrix

λ = [X Y Z] a3 + b3
x = (1/λ) ([X Y Z] a1 + b1)
y = (1/λ) ([X Y Z] a2 + b2)

Set of 3D points that project to x = 0:   [X Y Z] a1 + b1 = 0
Set of 3D points that project to y = 0:   [X Y Z] a2 + b2 = 0
Set of 3D points that project to x = ∞ or y = ∞:   [X Y Z] a3 + b3 = 0

Rows of the projection matrix describe the 3 planes defined by the image coordinate system (normals a1, a2, a3; image plane; COP).

What's the set of (X, Y, Z) points that project to the same (x, y)?

[X; Y; Z] = λ w + b̃,   where w = A⁻¹ [x; y; 1],  b̃ = −A⁻¹ b

What's the position of the COP / pinhole?

A [X; Y; Z] + b = 0   ⟹   [X; Y; Z] = −A⁻¹ b
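The COP formula is easy to verify numerically. Below is an illustrative NumPy sketch (not from the slides): camera_center recovers the COP as −A⁻¹b, checked on a hypothetical toy camera M = [I | −C].

```python
import numpy as np

def camera_center(M):
    """Center of projection of a 3x4 camera matrix M = [A | b]:
    the unique point with A X + b = 0, i.e. X = -A^{-1} b."""
    A, b = M[:, :3], M[:, 3]
    return -np.linalg.solve(A, b)

# Toy camera: K = I, R = I, centered at C, so M = [I | -C].
C = np.array([1.0, -2.0, 3.0])
M = np.hstack([np.eye(3), (-C)[:, None]])
assert np.allclose(camera_center(M), C)
# The COP is the one point whose projection is undefined: M [C; 1] = 0.
assert np.allclose(M @ np.append(C, 1.0), 0.0)
```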
Other geometric properties
Affine Cameras
• Example: weak-perspective projection model
• Projection defined by 8 parameters
• Parallel lines are projected to parallel lines
• The transformation can be written as a direct linear transformation

m3ᵀ = [0 0 0 1]   ⟹   x = [X Y Z] a1 + b1,   y = [X Y Z] a2 + b2

Image coordinates (x, y) are an affine function of world coordinates (X, Y, Z).

Affine transformations = linear transformations plus an offset
Geometric Transformations
Euclidean (trans + rot) preserves lengths + angles
Euclidean
Affine
Projective
Affine: preserves parallel lines
Projective: preserves lines
Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC
Calibration: Recover M from scene points P1,..,PN and the corresponding projections in the image plane p1,..,pN
Find M that minimizes the distance between the actual points in the image, pi, and their predicted projections MPi
Problems: • The projection is (in general) non-linear • M is defined up to an arbitrary scale factor
PnP = Perspective n-Point

Write the relation between image point, projection matrix, and point in space:

p_i ≡ M P_i

Write the non-linear relations between coordinates:

u_i = (m1ᵀ P_i) / (m3ᵀ P_i),   v_i = (m2ᵀ P_i) / (m3ᵀ P_i)

Make them linear:

m1ᵀ P_i − u_i (m3ᵀ P_i) = 0
m2ᵀ P_i − v_i (m3ᵀ P_i) = 0
The math for the calibration procedure follows a recipe that is used in many (most?) problems involving camera geometry, so it’s worth remembering:
Put all the relations for all the points into a single matrix:

[ P_1ᵀ  0ᵀ   −u_1 P_1ᵀ
  0ᵀ   P_1ᵀ  −v_1 P_1ᵀ
  ⋮
  P_Nᵀ  0ᵀ   −u_N P_Nᵀ
  0ᵀ   P_Nᵀ  −v_N P_Nᵀ ] m = 0

Write them in matrix form, one correspondence at a time:

[ P_iᵀ  0ᵀ   −u_i P_iᵀ
  0ᵀ   P_iᵀ  −v_i P_iᵀ ] [m1; m2; m3] = 0

(P_i in homogeneous coordinates; m = [m1; m2; m3] stacks the rows of M.)
In the noise-free case: L m = 0 (vector of 0's)

What about the noisy case?

min over ||m|| = 1 of ||L m||²

Is this the right error to minimize? If not, what is?

Solution: m is the right singular vector of L with the minimum singular value (equivalently, the eigenvector of LᵀL with the smallest eigenvalue).
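The construction of L and the SVD solution above can be sketched in NumPy (illustrative; the function name calibrate_dlt is made up, and this is the plain linear method without any point normalization):

```python
import numpy as np

def calibrate_dlt(P, p):
    """Linear camera calibration (DLT).
    P : (N, 3) world points, p : (N, 2) image points, N >= 6.
    Stacks, for each point, the two rows
        [P_i^T 1, 0 0 0 0, -u_i (P_i^T 1)]
        [0 0 0 0, P_i^T 1, -v_i (P_i^T 1)]
    and returns M (3x4) from the right singular vector with the smallest
    singular value -- the minimizer of ||L m|| subject to ||m|| = 1."""
    N = P.shape[0]
    Ph = np.hstack([P, np.ones((N, 1))])   # homogeneous world points
    L = np.zeros((2 * N, 12))
    L[0::2, 0:4] = Ph
    L[0::2, 8:12] = -p[:, 0:1] * Ph
    L[1::2, 4:8] = Ph
    L[1::2, 8:12] = -p[:, 1:2] * Ph
    _, _, Vt = np.linalg.svd(L)
    return Vt[-1].reshape(3, 4)
```

On noise-free synthetic data this recovers the true M up to the expected scale (and sign) ambiguity.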
[figure: world points P_1, …, P_i with image observations (u_1, v_1), …, (u_i, v_i) and predicted projections M P_i]
Ideal error:

Error(M) = Σ_i [ (u_i − (m1 · P_i)/(m3 · P_i))² + (v_i − (m2 · P_i)/(m3 · P_i))² ]

Initialize the nonlinear optimization with the "algebraic" solution.
Radial Lens Distortions
No Distortion Barrel Distortion Pincushion Distortion
Correcting Radial Lens Distortions
Before After
http://www.grasshopperonline.com/barrel_distortion_correction_software.html
Overall approach

Minimize the reprojection error Error(M, k's), now including the distortion coefficients k.

Initialize with the algebraic solution (approaches in the literature are based on various assumptions).
Revisiting homographies
Place the world coordinate frame on the object plane, so planar points have Z = 0:

λ [x; y; 1] = [f 0 0; 0 f 0; 0 0 1] [r11 r12 r13 t_x; r21 r22 r23 t_y; r31 r32 r33 t_z] [X; Y; 0; 1]

Projection of planar points:

λ [x; y; 1] = [f 0 0; 0 f 0; 0 0 1] [r11 r12 t_x; r21 r22 t_y; r31 r32 t_z] [X; Y; 1]
            = [f r11  f r12  f t_x; f r21  f r22  f t_y; r31  r32  t_z] [X; Y; 1]

Convert between a 2D location on the object plane and an image coordinate with a 3×3 matrix H. (The above holds for any intrinsic matrix K.)
Two views of a plane

Image correspondences:

λ1 [x1; y1; 1] = H1 [X; Y; 1]
λ2 [x2; y2; 1] = H2 [X; Y; 1]

[Aside: H is usually invertible]
[LHS and RHS are related by a scale factor]

Eliminating the plane point [X; Y; 1]:

λ [x2; y2; 1] = H2 H1⁻¹ [x1; y1; 1]
Computing homography projections
λ [x2; y2; 1] = [a b c; d e f; g h i] [x1; y1; 1]

Given (x1, y1) and H, how do we compute (x2, y2)?
Is this operation linear in H or in (x1, y1)?

x2 = (λ x2) / λ = (a x1 + b y1 + c) / (g x1 + h y1 + i)
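The division by the third homogeneous coordinate is the whole trick; here is an illustrative NumPy sketch (the function name apply_homography is made up):

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H.
    pts : (N, 2) array of (x1, y1) points.
    The map is linear in H but nonlinear in (x1, y1) because of the
    divide by the third homogeneous coordinate."""
    N = pts.shape[0]
    ph = np.hstack([pts, np.ones((N, 1))])   # to homogeneous coordinates
    q = (H @ ph.T).T
    # x2 = (a x1 + b y1 + c) / (g x1 + h y1 + i), and likewise for y2
    return q[:, :2] / q[:, 2:3]
```

For a pure translation homography, this reduces to adding the offset, since the third coordinate stays 1.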
Estimating homographies

Image correspondences: given corresponding 2D points in the left and right image, estimate H.
How many corresponding points are needed? How many degrees of freedom in H?

Cross-multiplying each correspondence gives a homogeneous linear system A H(:) = 0:

x2 (g x1 + h y1 + i) = a x1 + b y1 + c
⋮

H is determined only up to a scale factor (8 DOFs), so 4 points are the minimum. How to handle more points?

min over ||H(:)|| = 1 of ||A H(:)||²

Solution: the minimum right singular vector of A (the eigenvector of AᵀA with the smallest eigenvalue).
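The homework's computeH is to be written in MATLAB; as an illustration, here is the same DLT construction sketched in Python/NumPy (the function name compute_homography and the (N, 2) point layout are this sketch's conventions, not the assignment's):

```python
import numpy as np

def compute_homography(p1, p2):
    """Estimate H with p2 ~ H p1 from N >= 4 correspondences.
    p1, p2 : (N, 2) arrays. Each correspondence contributes two rows of A,
    from cross-multiplying x2 (g x1 + h y1 + i) = a x1 + b y1 + c (and the
    analogous y2 equation). The solution is the right singular vector of A
    with the smallest singular value, since H is only defined up to scale."""
    N = p1.shape[0]
    A = np.zeros((2 * N, 9))
    for i in range(N):
        x1, y1 = p1[i]
        x2, y2 = p2[i]
        A[2 * i]     = [x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1, -x2]
        A[2 * i + 1] = [0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1, -y2]
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```

With exact correspondences and ≥ 4 points in general position, this recovers H up to the expected scale factor.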
“Frontalizing” planes using homographies
Estimate homography on (at least) 4 pairs of corresponding points (e.g., corners of quad/rect)
Apply the homography to all (x, y) coordinates inside the target rectangle to compute the source pixel location.
Special case of 2 views: rotations about camera center
LECTURE 4. PLANAR SCENES AND HOMOGRAPHY 5
cues (parallax) can only be recovered when T is nonzero. Looking at the homography equation, the limit of H as d approaches infinity is R. Thus any pair of images of an arbitrary scene captured by a purely rotating camera is related by a planar homography.
A planar panorama can be constructed by capturing many overlapping images at different rotations, picking an image to be a reference, and then finding corresponding points between the overlapping images. The pairwise homographies are derived from the corresponding points, forming a mosaic that typically is shaped like a "bow-tie," as images farther away from the reference are warped outward to fit the homography. The figure below is from Pollefeys and Hartley & Zisserman.
4.7. Second Derivation of Homography Constraint
The homography constraint, element by element, in homogeneous coordinates, is as follows:

[x2; y2; z2] = [H11 H12 H13; H21 H22 H23; H31 H32 H33] [x1; y1; z1],   i.e.  x2 ∼ H x1

In inhomogeneous coordinates (x2′ = x2/z2 and y2′ = y2/z2),
Can be modeled as planar transformations, regardless of scene geometry!
(a) incline L.jpg (img1) (b) incline R.jpg (img2) (c) img2 warped to img1’s frame
Figure 5: Example output for Q6.1: Original images img1 and img2 (left and center) and img2 warped to fit img1 (right). Notice that the warped image clips out of the image. We will fix this in Q6.2.
H2to1=computeH(p1,p2)
Inputs: p1 and p2 should be 2×N matrices of corresponding (x, y)ᵀ coordinates between two images.
Outputs: H2to1 should be a 3×3 matrix encoding the homography that best matches the linear equation derived above for Equation 8 (in the least squares sense). Hint: Remember that a homography is only determined up to scale. The Matlab functions eig() or svd() will be useful. Note that this function can be written without an explicit for-loop over the data points.
6 Stitching it together: Panoramas (30 pts)
We can also use homographies to create a panorama image from multiple views of the same scene. This is possible, for example, when there is no camera translation between the views (e.g., only rotation about the camera center), as we saw in Q4.2.
First, you will generate panoramas using matched point correspondences between images, using the BRIEF matching you implemented in Q2.4. We will assume that there is no error in your matched point correspondences between images (although there might be some errors).
In the next section you will extend the technique to use (potentially noisy) keypointmatches.
You will need to use the provided function warp_im = warpH(im, H, out_size), which warps image im using the homography transform H. The pixels in warp_im are sampled at coordinates in the rectangle (1, 1) to (out_size(2), out_size(1)). The coordinates of the pixels in the source image are taken to be (1, 1) to (size(im,2), size(im,1)) and transformed according to H.
• Q6.1 (15pts) In this problem you will implement and use the function (stub provided in matlab/imageStitching.m):
[panoImg] = imageStitching(img1, img2, H2to1)
on two images from the Duquesne incline. This function accepts two images and the output from the homography estimation function. This function will:
Figure 6: Final panorama view, with the homography estimated with RANSAC.
• a folder matlab containing all the .m and .mat files you were asked to write andgenerate
• a pdf named writeup.pdf containing the results, explanations and images asked for in the assignment, along with the answers to the questions on homographies.
Submit all the code needed to make your panorama generator run. Make sure all the .m files that need to run are accessible from the matlab folder without any editing of the path variable. If you downloaded and used a feature detector for the extra credit, include the code with your submission and mention it in your writeup. You may leave the data folder in your submission, but it is not needed. Please zip your homework as usual and submit it using blackboard.
Appendix: Image Blending
Note: This section is not for credit and is for informational purposes only.
For overlapping pixels, it is common to blend the values of both images. You can simply average the values, but that will leave a seam at the edges of the overlapping images. Alternatively, you can obtain a blending value for each image that fades one image into the other. To do this, first create a mask like this for each image you wish to blend:
mask = zeros(size(im,1), size(im,2));
mask(1,:) = 1; mask(end,:) = 1; mask(:,1) = 1; mask(:,end) = 1;
mask = bwdist(mask, ’city’);
mask = mask/max(mask(:));
The function bwdist computes the distance transform of the binarized input image, so this mask will be zero at the borders and 1 at the center of the image. You can warp this mask just as you warped your images. How would you use the mask weights to compute a linear combination of the pixels in the overlap region? Your function should behave well where one or both of the blending constants are zero.
Derivation
…

For a pure rotation R between the two camera frames:

[X2; Y2; Z2] = R [X1; Y1; Z1]

λ2 [x2; y2; 1] = [f2 0 0; 0 f2 0; 0 0 1] [X2; Y2; Z2] = K2 [X2; Y2; Z2]

Combining with the first camera's projection:

λ [x2; y2; 1] = K2 R K1⁻¹ [x1; y1; 1]
Take-home points for homographies

• If the camera rotates about its center, then the images are related by a homography, irrespective of scene depth.
• If the scene is planar, then images from any two cameras are related by a homography.
• The homography mapping is a 3×3 matrix with 8 degrees of freedom:

λ [x2; y2; 1] = [a b c; d e f; g h i] [x1; y1; 1]
Matching features
What do we do about the “bad” matches?
General problem: we are trying to fit a (geometric) model to noisy data
How about we choose the average vector (least-squares soln)? Why will/won’t this work?
Let's generalize the problem a bit: estimate the best model (a line) that fits the data {xi, yi}
min over (w, b) of Σ_i (y_i − f_{w,b}(x_i))²,   where f_{w,b}(x_i) = w x_i + b
Let's generalize the problem a bit: the "least-squares" solution.
RANSAC Line Fitting Example

1. Sample two points
2. Fit a line to them
3. Count the total number of points within a threshold of the line
4. Repeat until a good result is obtained
RAndom SAmple Consensus

Select one match, count inliers.

Least squares fit: find the "average" translation vector for the largest group of inliers.
RANSAC for estimating transformation

RANSAC loop:
1. Select feature pairs (at random)
2. Compute transformation T (exact)
3. Compute inliers (point matches where ||p_i′ − T p_i||² < ε)
4. Keep the largest set of inliers
5. Re-compute the least-squares estimate of T using all of the inliers
Recall homography estimation: A h = 0 with A ∈ ℝ^{8×9} and h, 0 ∈ ℝ⁹. How do we estimate with all the inlier points?
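The five-step loop above can be sketched on the simpler line-fitting problem from earlier (illustrative Python; ransac_line and its parameters are this sketch's names, not the slides'):

```python
import numpy as np

def ransac_line(pts, n_iters=200, thresh=0.1, rng=None):
    """RANSAC for a line y = w x + b: sample a minimal set (2 points),
    fit exactly, count inliers within thresh, keep the largest inlier
    set, then refit on all inliers with least squares (step 5)."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                      # skip degenerate (vertical) samples
        w = (y2 - y1) / (x2 - x1)         # exact fit to the minimal set
        b = y1 - w * x1
        inliers = np.abs(pts[:, 1] - (w * pts[:, 0] + b)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Step 5: least-squares refit using all of the inliers.
    X = np.vstack([pts[best_inliers, 0], np.ones(best_inliers.sum())]).T
    w, b = np.linalg.lstsq(X, pts[best_inliers, 1], rcond=None)[0]
    return w, b, best_inliers
```

For homography RANSAC the structure is identical: the minimal set is 4 correspondences, the exact fit is the 4-point DLT, and step 5 re-solves the homogeneous system using all inliers.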
RANSAC for alignment
Planar object recognition (what transformation is used? how many pairs must be selected in the initial step?)