structure from motion - cvg€¦ · • obtain reliable matches using matching or tracking and...

Post on 03-Jun-2020

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Structure from Motion

Read Chapter 7 in Szeliski’s book

Schedule (tentative)

2

# date topic1 Sep.22 Introduction and geometry2 Sep.29 Invariant features 3 Oct.6 Camera models and calibration4 Oct.13 Multiple-view geometry5 Oct.20 Model fitting (RANSAC, EM, …)6 Oct.27 Stereo Matching7 Nov.3 2D Shape Matching 8 Nov.10 Segmentation9 Nov.17 Shape from Silhouettes10 Nov.24 Structure from motion11 Dec.1 Optical Flow 12 Dec.8 Tracking (Kalman, particle filter)13 Dec.15 Object category recognition 14 Dec.22 Specific object recognition

Today’s class

• Structure from motion

– factorization – sequential

– bundle adjustment

Factorization

• Factorise observations in structure of the scene and motion/calibration of the camera

• Use all points in all images at the same time

Affine factorisation Projective factorisation

Affine cameraThe affine projection equations are

1

=

j

j

j

yi

xi

ij

ij

ZYX

PP

yx

10001

1

=

j

j

j

yi

xi

ij

ij

ZYX

PP

yx

~~

4

4

=

=

j

j

j

yi

xi

ij

ijy

iij

xiij

ZYX

PP

yx

PyPx

how to find the origin? or for that matter a 3D reference point?

affine projection preserves center of gravity

∑−=i

ijijij xxx~ ∑−=i

ijijij yyy~

Orthographic factorizationThe orthographic projection equations are

where njmijiij ,...,1,,...,1,Mm === P

All equations can be collected for all i and j

where

[ ]n

mmnmm

n

n

M,...,M,M,,

mmm

mmmmmm

212

1

21

22221

11211

=

=

= M

P

PP

Pm

MPm =

M ~~

m

=

=

=

j

j

j

jyi

xi

iij

ijij

ZYX

,PP,

yx

P

Note that P and M are resp. 2mx3 and 3xn matrices and therefore the rank of m is at most 3

(Tomasi Kanade’92)

Orthographic factorizationFactorize m through singular value decomposition

An affine reconstruction is obtained as follows

TVUm Σ=

TVMUP Σ== ~,~

(Tomasi Kanade’92)

[ ]n

mmnmm

n

n

M,...,M,M

mmm

mmmmmm

min 212

1

21

22221

11211

P

PP

Closest rank-3 approximation yields MLE!

0~~1~~1~~

1

1

1

=

=

=

−−

−−

−−

TT

TT

TT

yi

xi

yi

yi

xi

xi

PP

PP

PP

AA

AA

AA

0~~1~~1~~

=

=

=

T

T

T

yi

xi

yi

yi

xi

xi

PP

PP

PP

C

C

C

A metric reconstruction is obtained as follows

Where A is computed from

Orthographic factorizationFactorize m through singular value decomposition

An affine reconstruction is obtained as follows

TVUm Σ=

TVMUP Σ== ~,~

MAMAPP ~,~ 1 == −

0

1

1

=

=

=

T

T

T

yi

xi

yi

yi

xi

xi

PP

PP

PP 3 linear equations per view on symmetric matrix C (6DOF)

A can be obtained from Cthrough Cholesky factorisationand inversion

(Tomasi Kanade’92)

Examples

Tomasi Kanade’92,Poelman & Kanade’94

Examples

Tomasi Kanade’92,Poelman & Kanade’94

Examples

Tomasi Kanade’92,Poelman & Kanade’94

Examples

Tomasi Kanade’92,Poelman & Kanade’94

Perspective factorization

The camera equations

for a fixed image i can be written in matrix form as

where

mjmijiijij ,...,1,,...,1,Mmλ === P

MPm iii =Λ

[ ] [ ]( )imiii

mimiii

λ,...,λ,λdiagM,...,M,M , m,...,m,m

21

2121

=Λ== Mm

Perspective factorization

All equations can be collected for all i as

wherePMm =

=

Λ

ΛΛ

=

mnn P

PP

P

m

mm

m...

,...

2

1

22

11

In these formulas m are known, but Λi,P and Mare unknown

Observe that PM is a product of a 3mx4 matrix and a 4xn matrix, i.e. it is a rank-4 matrix

Perspective factorization algorithm

Assume that Λi are known, then PM is known.

Use the singular value decomposition PM=UΣ VT

In the noise-free caseΣ=diag(σ1,σ2,σ3,σ4,0, … ,0)

and a reconstruction can be obtained by setting:

P=the first four columns of UΣ.M=the first four rows of V.

Iterative perspective factorization

When Λi are unknown the following algorithm can be used:

1. Set λij=1 (affine approximation).

2. Factorize PM and obtain an estimate of P and M. If σ5 is sufficiently small then STOP.

3. Use m, P and M to estimate Λi from the cameraequations (linearly) mi Λi=PiM

4. Goto 2.

In general the algorithm minimizes the proximity measure P(Λ,P,M)=σ5

Note that structure and motion recovered up to an arbitrary projective transformation

Further Factorization work

Factorization with uncertainty

Factorization for dynamic scenes

(Irani & Anandan, IJCV’02)

(Costeira and Kanade ‘94)(Bregler et al. ‘00, Brand ‘01)

(Yan and Pollefeys, ‘05/’06)

Multi-Camera Factorizations

(Static) affine cameras Rigidly moving object Camera calibration using rigid motion 2D feature point trajectories as input No feature point correspondences between different camera views

required

rank ≤13all tracks of all affine cameras form rank 13 subspace!

juxtapose x, y coordinates for all points and cameras

for a

ll fr

ames

(Angst & Pollefeys ICCV09)

Factorization technique 3rd order tensor with rank 13 constraint

C : camera matrices M : motion matrix S : Shape matrix

Multi-Camera Factorizations

(Angst & Pollefeys ICCV09)

Experiment on real data

Minimal cases

extra camera: 1 point

(Angst & Pollefeys ICCV09)

Multi-Camera Factorizations

practical structure and motion recovery from images

• Obtain reliable matches using matching or tracking and 2/3-view relations

• Compute initial structure and motion• Refine structure and motion• Auto-calibrate• Refine metric structure and motion

Initialize Motion (P1,P2 compatibel with F)

Sequential Structure and Motion Computation

Initialize Structure (minimize reprojection error)

Extend motion(compute pose through matches seen in 2 or more previous views)

Extend structure(Initialize new structure,refine existing structure)

Computation of initial structure and motion

according to Hartley and Zisserman “this area is still to some extend a

black-art”

All features not visible in all images⇒ No direct method (factorization not applicable)⇒ Build partial reconstructions and assemble

(more views is more stable, but less corresp.)

1) Sequential structure and motion recovery

2) Hierarchical structure and motion recovery

Sequential structure and motion recovery

• Initialize structure and motion from two views

• For each additional view– Determine pose– Refine and extend structure

• Determine correspondences robustly by jointly estimating matches and epipolar geometry

Initial structure and motion

[ ][ ][ ]eeaFeP

0IPT

x +=

=

2

1

Epipolar geometry ↔ Projective calibration

012 =FmmT

compatible with F

Yields correct projective camera setup(Faugeras´92,Hartley´92)

Obtain structure through triangulationUse reprojection error for minimizationAvoid measurements in projective space

Compute Pi+1 using robust approach (6-point RANSAC)Extend and refine reconstruction

)x,...,X(xPx 11 −= iii

2D-2D

2D-3D 2D-3D

mimi+1

M

new view

Determine pose towards existing structure

Compute P with 6-point RANSAC

• Generate hypothesis using 6 points

• Count inliers

• Projection error ( )( ) ?x,x...,,xXP 11 td iii <−

• Back-projection error ( ) ijtd jiij <∀< ?,x,xF• Re-projection error ( )( ) td iiii <− x,x,x...,,xXP 11

• 3D error ( )( ) ?X,xP 3-1

Dii td <Λ

• Projection error with covariance ( )( ) td iii <−Λ x,x...,,xXP 11

• Expensive testing? Abort early if not promising• Verify at random, abort if e.g. P(wrong)>0.95

(Chum and Matas, BMVC’02)

Calibrated structure from motion

• Equations more complicated, but less degeneracies

• For calibrated cameras:– 5-point relative motion (5DOF)

Nister CVPR03

– 3-point pose estimation (6DOF)Haralick et al. IJCV94

D. Nistér, An efficient solution to the five-point relative pose problem, In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), Volume 2, pages. 195-202, 2003.

R. Haralick, C. Lee, K. Ottenberg, M. Nolle. Review and Analysis of Solutions of the Three Point Perspective Pose Estimation Problem. Int’l Journal of Computer Vision, 13, 3, 331-356, 1994.

• Linear equations for 5 points

• Linear solution space

• Non-linear constraints

5-point relative motion

10 cubic polynomials

scale does not matter, choose

(Nister, CVPR03)

5-point relative motion

• Perform Gauss-Jordan elimination on polynomials

-z

-z

-z

represents polynomial of degree n in z

(Nister, CVPR03)

Three points perspective pose –p3p

(Haralick et al., IJCV94)

All techniques yield 4th order polynomial

Haralick et al. recommends using Finsterwalder’stechnique as it yields the best results numerically

19031841

Minimal solversLot’s of recent activity using Groebner bases:• Henrik Stewénius, David Nistér, Fredrik Kahl, Frederik

Schaffalitzky: A Minimal Solution for Relative Pose with Unknown Focal Length, CVPR 2005.

• H. Stewénius, D. Nistér, M. Oskarsson, and K. Åström. Solutions to minimal generalized relative pose problems. Omnivis 2005.

• D. Nistér, A Minimal solution to the generalised 3-point pose problem, CVPR 2004

• Martin Bujnak, Zuzana Kukelova, Tomás Pajdla: A general solution to the P4P problem for camera with unknown focal length. CVPR 2008.

• Brian Clipp, Christopher Zach, Jan-Michael Frahm and Marc Pollefeys, A New Minimal Solution to the Relative Pose of a Calibrated Stereo Camera with Small Field of View Overlap, ICCV 2009.

• Zuzana Kukelova, Martin Bujnak, Tomás Pajdla: Automatic Generator of Minimal Problem Solvers. ECCV 2008.

• F. Fraundorfer, P. Tanskanen and M. Pollefeys. A Minimal Case Solution to the Calibrated Relative Pose Problem for the Case of Two Known Orientation Angles. ECCV 2010.

• …

Minimal relative pose with know vertical

40

-g

Fraundorfer, Tanskanen and Pollefeys, ECCV2010

5 linear unknowns → linear 5 point algorithm3 unknowns → quartic 3 point algorithm

Vertical direction can often be estimated• inertial sensor• vanishing point

Incremental Structure from Motion

Photo Tourism

Noah Snavely, Steve Seitz, Rick Szeliski

University of Washington

Richard Szeliski

Incremental structure from motion

• Automatically select an initial pair of images

Incremental structure from motion

Incremental structure from motion

Hierarchical structure and motion recovery

• Compute 2-view• Compute 3-view• Stitch 3-view reconstructions• Merge and refine reconstruction

FT

H

PM

Hierarchical structure from motion

• Farenzena, Fusiello, and Gherardi,Structure-and-motion Pipeline on a Hierarchical Cluster Tree, 3DIM 2009, October 2009

Richar

ICVSS 2010 46

Hierarchical structure from motion

Richar

ICVSS 2010 47

Hierarchical structure from motion

Richar

ICVSS 2010 48

Stitching 3-view reconstructions

Different possibilities1. Align (P2,P3)with(P’1,P’2) ( ) ( )-1

23-1

12H

HP',PHP',Pminarg AA dd +

2. Align X,X’ (and C’C’) ( )∑j

jjAd HX',XminargH

3. Minimize reproj. error ( )( )∑

∑+

jjj

jjj

d

d

x',HXP'

x,X'PHminarg 1-

H

4. MLE (merge) ( )∑j

jjd x,PXminargXP,

More issues

• Repetition ambiguities

• Large scale reconstructions

Marc PollefeysICVSS 2010 50

Disambiguating visual relations using loop constraints

51

(Zach et al CVPR‘10)

Repetition and symmetry detection

52

(Wu et al ECCV‘10)

53

•Challenge: Error accumulation yields drift of relative scale, orientation and position

•Solution:Cancel drift by closing loops (e.g. at intersections)Need to visually recognize locations

Video-only large-scale reconstruction?

Solving 3D puzzles with VIPs

SIFT features•Extracted from 2D images•Variation due to viewpoint

VIP features•Extracted from 3D model•Viewpoint invariant

54

(Wu et al., CVPR08)

am

55

Where am I? What am I looking at?

3D Database

xSIFT VIP

56

Image query

Location recognition (Baatz et al., ECCV2010)

57

Projective ambiguity

reconstruction from uncalibrated images only⇒ projective ambiguity on reconstruction

´M´M))((Mm 1 PTPTP === − Similarity(7DOF)

Affine(12DOF)

Projective(15DOF)

58

Euclidean projection matrix

=

1yy

xx

cfcsf

K

Factorization of Euclidean projection matrix

Intrinsics:

Extrinsics: ( )t,RNote: every projection matrix can be factorized, but only meaningful for euclidean projection matrices

(camera geometry)

(camera motion)

59

The Absolute Quadric

Eliminate extrinsics from equation

Equivalent to projection of quadric

))(Ω)((Ω *1* TTTTT PTTTPTPPKK −−==

Absolute Quadric also exists in projective world

Absolute Quadric

T´Ω´´ * PP=Transforming world so thatreduces ambiguity to metric

** ΩΩ´ →

2 2 2 0A B C+ + =

0AX BY CZ D+ + + =

60

ω*

Ω*

Absolute Quadric = calibration object which is always present but can only be observed through constraints on the intrinsics

Tii

Tiii Ωω KKPP == ∗∗

Absolute quadric and self-calibration

Projection equation:

• Translate constraints on K • through projection equation• to constraints on Ω*

• Don’t treat all constraints equal

61

Practical linear self-calibration

( ) ( )( )( )( ) 0Ω

0Ω0Ω

0ΩΩ

23T

13T

12T

22T

11T

===

=−

∗∗

PPPPPP

PPPP(Pollefeys et al., ECCV‘02)

2

* 2

ˆ 0 0ˆ0 0

0 0 1

f

f

= Ω ≈

P PT TKK

( ) ( )( ) ( ) 0ΩΩ

0ΩΩ

33T

22T

33T

11T

=−=−

∗∗

∗∗

PPPPPPPP

(only rough aproximation, but still usefull to avoid degenerate configurations)

(relatively accurate for most cameras)

91

91

1.011.0

101.01

2.01

=

1yy

xx

cfcsf

K

after normalization!

when fixating point at image-center not only absolute quadric diag(1,1,1,0) satisfies ICCV98 eqs., but also diag(1,1,1,a), i.e. real or imaginary spheres!

Recent related work (Gurdjos et al. ICCV‘09)

62

Upgrade from projective to metric

• Locate Ω* in projective reconstruction (self-calibration)Problem: not always unique (Critical Motion Sequences)

• Transform projective reconstruction • to bring Ω* in canonical position

Refining structure and motion

• Minimize reprojection error

– Maximum Likelyhood Estimation (if error zero-mean Gaussian noise)

– Huge problem but can be solved efficiently(Bundle adjustment)

( )∑∑= =

m

k

n

iikD

ik 1 1

2

kiM,P

MP,mmin

Non-linear least-squares

• Newton iteration• Levenberg-Marquardt • Sparse Levenberg-Marquardt

(P)X f= (P)X argminP

f−

Newton iteration

Taylor approximation

∆+≈∆+ J)(P )(P 00 ff PXJ∂∂

=Jacobian

)(PX 1f−

∆−=∆−−≈− JJ)(PX)(PX 001 eff

( ) 0T-1T

0TT JJJJJJ ee =∆⇒=∆⇒

∆+=+ i1i PP ( ) 0T-1T JJJ e=∆

( ) 01-T-11-T JJJ eΣΣ=∆

normal eq.

Levenberg-Marquardt

0TT JNJJ e=∆=∆

0TJN' e=∆

Augmented normal equations

Normal equations

J)λdiag(JJJN' TT +=

30 10λ −=

10/λλ :success 1 ii =+

ii λ10λ :failure = solve again

accept

λ small ~ Newton (quadratic convergence)

λ large ~ descent (guaranteed decrease)

Levenberg-Marquardt

Requirements for minimization• Function to compute f• Start value P0

• Optionally, function to compute J(but numerical ok, too)

Richard Szeliski

Bundle Adjustment

• What makes this non-linear minimization hard?– many more parameters: potentially slow– poorer conditioning (high correlation)– potentially lots of outliers– gauge (coordinate) freedom

Chained transformations

• Think of the optimization as a set of transformations along with their derivatives

Richard Szeliski

Richard Szeliski

Levenberg-Marquardt

• Iterative non-linear least squares [Press’92]– Linearize measurement equations

– Substitute into log-likelihood equation: quadratic cost function in ∆m

Richard Szeliski

Robust error models

• Outlier rejection– use robust penalty applied

to each set of jointmeasurements

– for extremely bad data, use random sampling [RANSAC, Fischler & Bolles, CACM’81]

Richard Szeliski

• Iterative non-linear least squares [Press’92]– Solve for minimum

Hessian:

error:

Levenberg-Marquardt

Chained transformations

• Only some derivatives are non-zero:j th camera, i th point

Richard Szeliski

Multi-camera rig

• Easy extension of generalized bundler:

Richard Szeliski

Richard Szeliski

Lots of parameters: sparsity

• Only a few entries in Jacobian are non-zero

Exploiting sparsity

• Schur complement to get reduced camera matrix

Richard Szeliski

Sparse Levenberg-Marquardt

• complexity for solving– prohibitive for large problems

(100 views 10,000 points ~30,000 unknowns)

• Partition parameters– partition A – partition B (only dependent on A and itself)

0T-1 JN' e=∆3N

Sparse bundle adjustment

residuals:normal equations:

with

note: tie points should be in partition A

Sparse bundle adjustment

normal equations:

modified normal equations:

solve in two parts:

Sparse bundle adjustment

U1

U2

U3

WT

W

V

P1 P2 P3 M

Jacobian of has sparse block structure

=J == JJN T

12xm 3xn(in general

much larger)

im.pts. view 1

( )( )∑∑= =

m

k

n

iikD

1 1

2ki MP,m

Needed for non-linear minimization

Sparse bundle adjustment

• Eliminate dependence of camera/motion parameters on structure parametersNote in general 3n >> 11m

WT V

U-WV-1WT

− −

NI0WVI 1

11xm 3xn

Allows much more efficient computationse.g. 100 views,10000 points,

solve ±1000x1000, not ±30000x30000

Often still band diagonaluse sparse linear algebra algorithms

Sparse bundle adjustment

normal equations:

modified normal equations:

solve in two parts:

Sparse bundle adjustment

• Covariance estimation

-1WVY =( )

1a

Tb

1-a

VYYWWVU

+

+∑=∑−=∑

Yaab ∑=∑ -

Direct vs. iterative solvers

• How do they work?• When should you use them?

Richard Szeliski

Large reduced camera matrices

• Use iterative preconditioned conjugate gradient– [Jeong, Nister, Steedly, Szeliski, Kweon, CVPR 2009]– [Agarwal, Snavely, Seitz, and Szeliski, ECCV 2009]– Block diagonal works well– Sometimes partial Cholesky does too– Direct solvers for small problems

Richard Szeliski

Large-scale structure from motion

Richard Szeliski

Large-scale reconstruction

• Most of the models shown so far have had ~500 images

• How do we scale from 100s to 10,000s of images?

• Observation: Internet collections represent very non-uniform samplings of viewpoint

Richard Szeliski

Skeletal graphs for efficient structure from motion

Noah Snavely, Steve Seitz,Richard Szeliski

CVPR’2008

Skeletal set

• Goal: select a smaller set of images to reconstruct, without sacrificing quality of reconstruction

• Two problems:– How do we measure quality?– How do we find a subset of images with

bounded quality loss?

Skeletal set

Goal: given an image graph ,• select a small set S of important

images to reconstruct, bounding the loss in quality of the reconstruction

• Reconstruct the skeletal set S • Estimate the remaining images with

much faster pose estimation steps

Properties of the skeletal set

• Should touch all parts ofDominating set

• Should form a single reconstructionConnected dominating set

• Should result in an accurate reconstruction ?

Example

Full graph Skeletal graph

Representing information in a graphWhat kind of information?– No absolute information about camera

positions– Each edge provides information about the

relative positions of two images

– … but not all edges are equally informativeWe model information with the uncertainty (covariance) in pairwise camera positions

Estimating certainty

• Usually measured with covariance matrix

• SfM has thousands or millions of parameters

• We only measure covariance in camera positions (3x3 matrix for every camera)

Pantheon

Full graph Skeletal graph (t=16)

Skeletal reconstruction

101 images

After adding leaves579 images

After final optimization579 images

Pantheon

Trafalgar Square

2973 images registered (277 in skeletal set)

Trafalgar Square

Running time

0

20

40

60

80

100

120

140

160

180

Stonehenge St. Peters Pantheon Pisa Trafalgar

Full reconstruction

Skeletal reconstruction

Skeletal reconstruction + BA

(~10 days) (~30 days)

hour

s

Building Rome in a Day

Sameer Agarwal, Noah Snavely,Ian Simon, Steven M. Seitz,

Richard SzeliskiICCV’2009

Distributed matching pipeline

Query expansion

Results: Dubrovnik

Results: Rome

Changchang Wu’s SfM code

for iconic graph• uses 5-point+RANSAC for 2-view initialization• uses 3-point+RANSAC for adding views• performs bundle adjustmentFor additional images• use 3-point+RANSAC pose estimation

Rome on a cloudless day

(Frahm et al. ECCV 2010)• GIST & clustering (1h35)

SIFT & Geometric verification (11h36)

SfM & Bundle (8h35)

Dense Reconstruction (1h58)

Some numbers• 1PC• 2.88M images• 100k clusters• 22k SfM with 307k images• 63k 3D models• Largest model 5700 images• Total time 23h53

Structure from motion: limitations

• Very difficult to reliably estimate metricstructure and motion unless:– large (x or y) rotation or– large field of view and depth variation

• Camera calibration important for Euclidean reconstructions

• Need good feature tracker

Related problems

• On-line structure from motion and SLaM (Simultaneous Localization and Mapping)– Kalman filter (linear)– Particle filters (non-linear)

Visual SLAM

• Real-time Stereo Visual SLAM

113

(Clipp et al., IROS2010)

Open challenges

• Large scale structure from motion– Complete building– Complete city

Open source vode available, e.g.

Christopher Zach (SfM, Bundle adjustment)http://www.inf.ethz.ch/personal/chzach/opensource.html

Photo Tourism Bundler:http://phototour.cs.washington.edu/bundler/

116

Next week:Optical Flow

top related