Structure from Motion
Read Chapter 7 in Szeliski’s book
Schedule (tentative)
2
# date topic1 Sep.22 Introduction and geometry2 Sep.29 Invariant features 3 Oct.6 Camera models and calibration4 Oct.13 Multiple-view geometry5 Oct.20 Model fitting (RANSAC, EM, …)6 Oct.27 Stereo Matching7 Nov.3 2D Shape Matching 8 Nov.10 Segmentation9 Nov.17 Shape from Silhouettes10 Nov.24 Structure from motion11 Dec.1 Optical Flow 12 Dec.8 Tracking (Kalman, particle filter)13 Dec.15 Object category recognition 14 Dec.22 Specific object recognition
Today’s class
• Structure from motion
– factorization – sequential
– bundle adjustment
Factorization
• Factorise observations in structure of the scene and motion/calibration of the camera
• Use all points in all images at the same time
Affine factorisation Projective factorisation
Affine cameraThe affine projection equations are
1
=
j
j
j
yi
xi
ij
ij
ZYX
PP
yx
10001
1
=
j
j
j
yi
xi
ij
ij
ZYX
PP
yx
~~
4
4
=
=
−
−
j
j
j
yi
xi
ij
ijy
iij
xiij
ZYX
PP
yx
PyPx
how to find the origin? or for that matter a 3D reference point?
affine projection preserves center of gravity
∑−=i
ijijij xxx~ ∑−=i
ijijij yyy~
Orthographic factorizationThe orthographic projection equations are
where njmijiij ,...,1,,...,1,Mm === P
All equations can be collected for all i and j
where
[ ]n
mmnmm
n
n
M,...,M,M,,
mmm
mmmmmm
212
1
21
22221
11211
=
=
= M
P
PP
Pm
MPm =
M ~~
m
=
=
=
j
j
j
jyi
xi
iij
ijij
ZYX
,PP,
yx
P
Note that P and M are resp. 2mx3 and 3xn matrices and therefore the rank of m is at most 3
(Tomasi Kanade’92)
Orthographic factorizationFactorize m through singular value decomposition
An affine reconstruction is obtained as follows
TVUm Σ=
TVMUP Σ== ~,~
(Tomasi Kanade’92)
[ ]n
mmnmm
n
n
M,...,M,M
mmm
mmmmmm
min 212
1
21
22221
11211
−
P
PP
Closest rank-3 approximation yields MLE!
0~~1~~1~~
1
1
1
=
=
=
−−
−−
−−
TT
TT
TT
yi
xi
yi
yi
xi
xi
PP
PP
PP
AA
AA
AA
0~~1~~1~~
=
=
=
T
T
T
yi
xi
yi
yi
xi
xi
PP
PP
PP
C
C
C
A metric reconstruction is obtained as follows
Where A is computed from
Orthographic factorizationFactorize m through singular value decomposition
An affine reconstruction is obtained as follows
TVUm Σ=
TVMUP Σ== ~,~
MAMAPP ~,~ 1 == −
0
1
1
=
=
=
T
T
T
yi
xi
yi
yi
xi
xi
PP
PP
PP 3 linear equations per view on symmetric matrix C (6DOF)
A can be obtained from Cthrough Cholesky factorisationand inversion
(Tomasi Kanade’92)
Examples
Tomasi Kanade’92,Poelman & Kanade’94
Examples
Tomasi Kanade’92,Poelman & Kanade’94
Examples
Tomasi Kanade’92,Poelman & Kanade’94
Examples
Tomasi Kanade’92,Poelman & Kanade’94
Perspective factorization
The camera equations
for a fixed image i can be written in matrix form as
where
mjmijiijij ,...,1,,...,1,Mmλ === P
MPm iii =Λ
[ ] [ ]( )imiii
mimiii
λ,...,λ,λdiagM,...,M,M , m,...,m,m
21
2121
=Λ== Mm
Perspective factorization
All equations can be collected for all i as
wherePMm =
=
Λ
ΛΛ
=
mnn P
PP
P
m
mm
m...
,...
2
1
22
11
In these formulas m are known, but Λi,P and Mare unknown
Observe that PM is a product of a 3mx4 matrix and a 4xn matrix, i.e. it is a rank-4 matrix
Perspective factorization algorithm
Assume that Λi are known, then PM is known.
Use the singular value decomposition PM=UΣ VT
In the noise-free caseΣ=diag(σ1,σ2,σ3,σ4,0, … ,0)
and a reconstruction can be obtained by setting:
P=the first four columns of UΣ.M=the first four rows of V.
Iterative perspective factorization
When Λi are unknown the following algorithm can be used:
1. Set λij=1 (affine approximation).
2. Factorize PM and obtain an estimate of P and M. If σ5 is sufficiently small then STOP.
3. Use m, P and M to estimate Λi from the cameraequations (linearly) mi Λi=PiM
4. Goto 2.
In general the algorithm minimizes the proximity measure P(Λ,P,M)=σ5
Note that structure and motion recovered up to an arbitrary projective transformation
Further Factorization work
Factorization with uncertainty
Factorization for dynamic scenes
(Irani & Anandan, IJCV’02)
(Costeira and Kanade ‘94)(Bregler et al. ‘00, Brand ‘01)
(Yan and Pollefeys, ‘05/’06)
Multi-Camera Factorizations
(Static) affine cameras Rigidly moving object Camera calibration using rigid motion 2D feature point trajectories as input No feature point correspondences between different camera views
required
rank ≤13all tracks of all affine cameras form rank 13 subspace!
juxtapose x, y coordinates for all points and cameras
for a
ll fr
ames
(Angst & Pollefeys ICCV09)
Factorization technique 3rd order tensor with rank 13 constraint
C : camera matrices M : motion matrix S : Shape matrix
Multi-Camera Factorizations
(Angst & Pollefeys ICCV09)
Experiment on real data
Minimal cases
extra camera: 1 point
(Angst & Pollefeys ICCV09)
Multi-Camera Factorizations
practical structure and motion recovery from images
• Obtain reliable matches using matching or tracking and 2/3-view relations
• Compute initial structure and motion• Refine structure and motion• Auto-calibrate• Refine metric structure and motion
Initialize Motion (P1,P2 compatibel with F)
Sequential Structure and Motion Computation
Initialize Structure (minimize reprojection error)
Extend motion(compute pose through matches seen in 2 or more previous views)
Extend structure(Initialize new structure,refine existing structure)
Computation of initial structure and motion
according to Hartley and Zisserman “this area is still to some extend a
black-art”
All features not visible in all images⇒ No direct method (factorization not applicable)⇒ Build partial reconstructions and assemble
(more views is more stable, but less corresp.)
1) Sequential structure and motion recovery
2) Hierarchical structure and motion recovery
Sequential structure and motion recovery
• Initialize structure and motion from two views
• For each additional view– Determine pose– Refine and extend structure
• Determine correspondences robustly by jointly estimating matches and epipolar geometry
Initial structure and motion
[ ][ ][ ]eeaFeP
0IPT
x +=
=
2
1
Epipolar geometry ↔ Projective calibration
012 =FmmT
compatible with F
Yields correct projective camera setup(Faugeras´92,Hartley´92)
Obtain structure through triangulationUse reprojection error for minimizationAvoid measurements in projective space
Compute Pi+1 using robust approach (6-point RANSAC)Extend and refine reconstruction
)x,...,X(xPx 11 −= iii
2D-2D
2D-3D 2D-3D
mimi+1
M
new view
Determine pose towards existing structure
Compute P with 6-point RANSAC
• Generate hypothesis using 6 points
• Count inliers
• Projection error ( )( ) ?x,x...,,xXP 11 td iii <−
• Back-projection error ( ) ijtd jiij <∀< ?,x,xF• Re-projection error ( )( ) td iiii <− x,x,x...,,xXP 11
• 3D error ( )( ) ?X,xP 3-1
Dii td <Λ
• Projection error with covariance ( )( ) td iii <−Λ x,x...,,xXP 11
• Expensive testing? Abort early if not promising• Verify at random, abort if e.g. P(wrong)>0.95
(Chum and Matas, BMVC’02)
Calibrated structure from motion
• Equations more complicated, but less degeneracies
• For calibrated cameras:– 5-point relative motion (5DOF)
Nister CVPR03
– 3-point pose estimation (6DOF)Haralick et al. IJCV94
D. Nistér, An efficient solution to the five-point relative pose problem, In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), Volume 2, pages. 195-202, 2003.
R. Haralick, C. Lee, K. Ottenberg, M. Nolle. Review and Analysis of Solutions of the Three Point Perspective Pose Estimation Problem. Int’l Journal of Computer Vision, 13, 3, 331-356, 1994.
• Linear equations for 5 points
• Linear solution space
• Non-linear constraints
5-point relative motion
10 cubic polynomials
scale does not matter, choose
(Nister, CVPR03)
5-point relative motion
• Perform Gauss-Jordan elimination on polynomials
-z
-z
-z
represents polynomial of degree n in z
(Nister, CVPR03)
Three points perspective pose –p3p
(Haralick et al., IJCV94)
All techniques yield 4th order polynomial
Haralick et al. recommends using Finsterwalder’stechnique as it yields the best results numerically
19031841
Minimal solversLot’s of recent activity using Groebner bases:• Henrik Stewénius, David Nistér, Fredrik Kahl, Frederik
Schaffalitzky: A Minimal Solution for Relative Pose with Unknown Focal Length, CVPR 2005.
• H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to minimal generalized relative pose problems. Omnivis 2005.
• D. Nistér, A Minimal solution to the generalised 3-point pose problem, CVPR 2004
• Martin Bujnak, Zuzana Kukelova, Tomás Pajdla: A general solution to the P4P problem for camera with unknown focal length. CVPR 2008.
• Brian Clipp, Christopher Zach, Jan-Michael Frahm and Marc Pollefeys, A New Minimal Solution to the Relative Pose of a Calibrated Stereo Camera with Small Field of View Overlap, ICCV 2009.
• Zuzana Kukelova, Martin Bujnak, Tomás Pajdla: Automatic Generator of Minimal Problem Solvers. ECCV 2008.
• F. Fraundorfer, P. Tanskanen and M. Pollefeys. A Minimal Case Solution to the Calibrated Relative Pose Problem for the Case of Two Known Orientation Angles. ECCV 2010.
• …
Minimal relative pose with know vertical
40
-g
Fraundorfer, Tanskanen and Pollefeys, ECCV2010
5 linear unknowns → linear 5 point algorithm3 unknowns → quartic 3 point algorithm
Vertical direction can often be estimated• inertial sensor• vanishing point
Incremental Structure from Motion
Photo Tourism
Noah Snavely, Steve Seitz, Rick Szeliski
University of Washington
Richard Szeliski
Incremental structure from motion
• Automatically select an initial pair of images
Incremental structure from motion
Incremental structure from motion
Hierarchical structure and motion recovery
• Compute 2-view• Compute 3-view• Stitch 3-view reconstructions• Merge and refine reconstruction
FT
H
PM
Hierarchical structure from motion
• Farenzena, Fusiello, and Gherardi,Structure-and-motion Pipeline on a Hierarchical Cluster Tree, 3DIM 2009, October 2009
Richar
ICVSS 2010 46
Hierarchical structure from motion
Richar
ICVSS 2010 47
Hierarchical structure from motion
Richar
ICVSS 2010 48
Stitching 3-view reconstructions
Different possibilities1. Align (P2,P3)with(P’1,P’2) ( ) ( )-1
23-1
12H
HP',PHP',Pminarg AA dd +
2. Align X,X’ (and C’C’) ( )∑j
jjAd HX',XminargH
3. Minimize reproj. error ( )( )∑
∑+
jjj
jjj
d
d
x',HXP'
x,X'PHminarg 1-
H
4. MLE (merge) ( )∑j
jjd x,PXminargXP,
More issues
• Repetition ambiguities
• Large scale reconstructions
Marc PollefeysICVSS 2010 50
Disambiguating visual relations using loop constraints
51
(Zach et al CVPR‘10)
Repetition and symmetry detection
52
(Wu et al ECCV‘10)
53
•Challenge: Error accumulation yields drift of relative scale, orientation and position
•Solution:Cancel drift by closing loops (e.g. at intersections)Need to visually recognize locations
•
Video-only large-scale reconstruction?
Solving 3D puzzles with VIPs
SIFT features•Extracted from 2D images•Variation due to viewpoint
VIP features•Extracted from 3D model•Viewpoint invariant
54
(Wu et al., CVPR08)
am
55
Where am I? What am I looking at?
3D Database
xSIFT VIP
56
Image query
Location recognition (Baatz et al., ECCV2010)
57
Projective ambiguity
reconstruction from uncalibrated images only⇒ projective ambiguity on reconstruction
´M´M))((Mm 1 PTPTP === − Similarity(7DOF)
Affine(12DOF)
Projective(15DOF)
58
Euclidean projection matrix
=
1yy
xx
cfcsf
K
Factorization of Euclidean projection matrix
Intrinsics:
Extrinsics: ( )t,RNote: every projection matrix can be factorized, but only meaningful for euclidean projection matrices
(camera geometry)
(camera motion)
59
The Absolute Quadric
Eliminate extrinsics from equation
Equivalent to projection of quadric
))(Ω)((Ω *1* TTTTT PTTTPTPPKK −−==
Absolute Quadric also exists in projective world
Absolute Quadric
T´Ω´´ * PP=Transforming world so thatreduces ambiguity to metric
** ΩΩ´ →
2 2 2 0A B C+ + =
0AX BY CZ D+ + + =
60
ω*
Ω*
Absolute Quadric = calibration object which is always present but can only be observed through constraints on the intrinsics
Tii
Tiii Ωω KKPP == ∗∗
Absolute quadric and self-calibration
Projection equation:
• Translate constraints on K • through projection equation• to constraints on Ω*
• Don’t treat all constraints equal
61
Practical linear self-calibration
( ) ( )( )( )( ) 0Ω
0Ω0Ω
0ΩΩ
23T
13T
12T
22T
11T
===
=−
∗
∗
∗
∗∗
PPPPPP
PPPP(Pollefeys et al., ECCV‘02)
2
* 2
ˆ 0 0ˆ0 0
0 0 1
f
f
= Ω ≈
P PT TKK
( ) ( )( ) ( ) 0ΩΩ
0ΩΩ
33T
22T
33T
11T
=−=−
∗∗
∗∗
PPPPPPPP
(only rough aproximation, but still usefull to avoid degenerate configurations)
(relatively accurate for most cameras)
91
91
1.011.0
101.01
2.01
=
1yy
xx
cfcsf
K
after normalization!
when fixating point at image-center not only absolute quadric diag(1,1,1,0) satisfies ICCV98 eqs., but also diag(1,1,1,a), i.e. real or imaginary spheres!
Recent related work (Gurdjos et al. ICCV‘09)
62
Upgrade from projective to metric
• Locate Ω* in projective reconstruction (self-calibration)Problem: not always unique (Critical Motion Sequences)
• Transform projective reconstruction • to bring Ω* in canonical position
Refining structure and motion
• Minimize reprojection error
– Maximum Likelyhood Estimation (if error zero-mean Gaussian noise)
– Huge problem but can be solved efficiently(Bundle adjustment)
( )∑∑= =
m
k
n
iikD
ik 1 1
2
kiM,P
MP,mmin
Non-linear least-squares
• Newton iteration• Levenberg-Marquardt • Sparse Levenberg-Marquardt
(P)X f= (P)X argminP
f−
Newton iteration
Taylor approximation
∆+≈∆+ J)(P )(P 00 ff PXJ∂∂
=Jacobian
)(PX 1f−
∆−=∆−−≈− JJ)(PX)(PX 001 eff
( ) 0T-1T
0TT JJJJJJ ee =∆⇒=∆⇒
∆+=+ i1i PP ( ) 0T-1T JJJ e=∆
( ) 01-T-11-T JJJ eΣΣ=∆
normal eq.
Levenberg-Marquardt
0TT JNJJ e=∆=∆
0TJN' e=∆
Augmented normal equations
Normal equations
J)λdiag(JJJN' TT +=
30 10λ −=
10/λλ :success 1 ii =+
ii λ10λ :failure = solve again
accept
λ small ~ Newton (quadratic convergence)
λ large ~ descent (guaranteed decrease)
Levenberg-Marquardt
Requirements for minimization• Function to compute f• Start value P0
• Optionally, function to compute J(but numerical ok, too)
Richard Szeliski
Bundle Adjustment
• What makes this non-linear minimization hard?– many more parameters: potentially slow– poorer conditioning (high correlation)– potentially lots of outliers– gauge (coordinate) freedom
Chained transformations
• Think of the optimization as a set of transformations along with their derivatives
Richard Szeliski
Richard Szeliski
Levenberg-Marquardt
• Iterative non-linear least squares [Press’92]– Linearize measurement equations
– Substitute into log-likelihood equation: quadratic cost function in ∆m
Richard Szeliski
Robust error models
• Outlier rejection– use robust penalty applied
to each set of jointmeasurements
– for extremely bad data, use random sampling [RANSAC, Fischler & Bolles, CACM’81]
Richard Szeliski
• Iterative non-linear least squares [Press’92]– Solve for minimum
Hessian:
error:
Levenberg-Marquardt
Chained transformations
• Only some derivatives are non-zero:j th camera, i th point
Richard Szeliski
Multi-camera rig
• Easy extension of generalized bundler:
Richard Szeliski
Richard Szeliski
Lots of parameters: sparsity
• Only a few entries in Jacobian are non-zero
Exploiting sparsity
• Schur complement to get reduced camera matrix
Richard Szeliski
Sparse Levenberg-Marquardt
• complexity for solving– prohibitive for large problems
(100 views 10,000 points ~30,000 unknowns)
• Partition parameters– partition A – partition B (only dependent on A and itself)
0T-1 JN' e=∆3N
Sparse bundle adjustment
residuals:normal equations:
with
note: tie points should be in partition A
Sparse bundle adjustment
normal equations:
modified normal equations:
solve in two parts:
Sparse bundle adjustment
U1
U2
U3
WT
W
V
P1 P2 P3 M
Jacobian of has sparse block structure
=J == JJN T
12xm 3xn(in general
much larger)
im.pts. view 1
( )( )∑∑= =
m
k
n
iikD
1 1
2ki MP,m
Needed for non-linear minimization
Sparse bundle adjustment
• Eliminate dependence of camera/motion parameters on structure parametersNote in general 3n >> 11m
WT V
U-WV-1WT
=×
− −
NI0WVI 1
11xm 3xn
Allows much more efficient computationse.g. 100 views,10000 points,
solve ±1000x1000, not ±30000x30000
Often still band diagonaluse sparse linear algebra algorithms
Sparse bundle adjustment
normal equations:
modified normal equations:
solve in two parts:
Sparse bundle adjustment
• Covariance estimation
-1WVY =( )
1a
Tb
1-a
VYYWWVU
−
+
+∑=∑−=∑
Yaab ∑=∑ -
Direct vs. iterative solvers
• How do they work?• When should you use them?
Richard Szeliski
Large reduced camera matrices
• Use iterative preconditioned conjugate gradient– [Jeong, Nister, Steedly, Szeliski, Kweon, CVPR 2009]– [Agarwal, Snavely, Seitz, and Szeliski, ECCV 2009]– Block diagonal works well– Sometimes partial Cholesky does too– Direct solvers for small problems
Richard Szeliski
Large-scale structure from motion
Richard Szeliski
Large-scale reconstruction
• Most of the models shown so far have had ~500 images
• How do we scale from 100s to 10,000s of images?
• Observation: Internet collections represent very non-uniform samplings of viewpoint
Richard Szeliski
Skeletal graphs for efficient structure from motion
Noah Snavely, Steve Seitz,Richard Szeliski
CVPR’2008
Skeletal set
• Goal: select a smaller set of images to reconstruct, without sacrificing quality of reconstruction
• Two problems:– How do we measure quality?– How do we find a subset of images with
bounded quality loss?
Skeletal set
Goal: given an image graph ,• select a small set S of important
images to reconstruct, bounding the loss in quality of the reconstruction
• Reconstruct the skeletal set S • Estimate the remaining images with
much faster pose estimation steps
Properties of the skeletal set
• Should touch all parts ofDominating set
• Should form a single reconstructionConnected dominating set
• Should result in an accurate reconstruction ?
Example
Full graph Skeletal graph
Representing information in a graphWhat kind of information?– No absolute information about camera
positions– Each edge provides information about the
relative positions of two images
– … but not all edges are equally informativeWe model information with the uncertainty (covariance) in pairwise camera positions
Estimating certainty
• Usually measured with covariance matrix
• SfM has thousands or millions of parameters
• We only measure covariance in camera positions (3x3 matrix for every camera)
Pantheon
Full graph Skeletal graph (t=16)
Skeletal reconstruction
101 images
After adding leaves579 images
After final optimization579 images
Pantheon
Trafalgar Square
2973 images registered (277 in skeletal set)
Trafalgar Square
Running time
0
20
40
60
80
100
120
140
160
180
Stonehenge St. Peters Pantheon Pisa Trafalgar
Full reconstruction
Skeletal reconstruction
Skeletal reconstruction + BA
(~10 days) (~30 days)
hour
s
Building Rome in a Day
Sameer Agarwal, Noah Snavely,Ian Simon, Steven M. Seitz,
Richard SzeliskiICCV’2009
Distributed matching pipeline
Query expansion
Results: Dubrovnik
Results: Rome
Changchang Wu’s SfM code
for iconic graph• uses 5-point+RANSAC for 2-view initialization• uses 3-point+RANSAC for adding views• performs bundle adjustmentFor additional images• use 3-point+RANSAC pose estimation
Rome on a cloudless day
(Frahm et al. ECCV 2010)• GIST & clustering (1h35)
SIFT & Geometric verification (11h36)
SfM & Bundle (8h35)
Dense Reconstruction (1h58)
Some numbers• 1PC• 2.88M images• 100k clusters• 22k SfM with 307k images• 63k 3D models• Largest model 5700 images• Total time 23h53
Structure from motion: limitations
• Very difficult to reliably estimate metricstructure and motion unless:– large (x or y) rotation or– large field of view and depth variation
• Camera calibration important for Euclidean reconstructions
• Need good feature tracker
Related problems
• On-line structure from motion and SLaM (Simultaneous Localization and Mapping)– Kalman filter (linear)– Particle filters (non-linear)
Visual SLAM
• Real-time Stereo Visual SLAM
113
(Clipp et al., IROS2010)
Open challenges
• Large scale structure from motion– Complete building– Complete city
Open source vode available, e.g.
Christopher Zach (SfM, Bundle adjustment)http://www.inf.ethz.ch/personal/chzach/opensource.html
Photo Tourism Bundler:http://phototour.cs.washington.edu/bundler/
116
Next week:Optical Flow