lecture 23: structure from motion and multi-view stereo cs6670: computer vision noah snavely

40
Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Post on 15-Jan-2016

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Lecture 23: Structure from motion and multi-view stereo

CS6670: Computer VisionNoah Snavely

Page 2: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Readings

• Szeliski, Chapter 7.1 – 7.4, 11.6

Page 3: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Announcements

• Project 2b due on Tuesday by 10:59pm

• Final project proposals feedback soon

Page 4: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Structure from motion

• Input: images with points in correspondence pi,j = (ui,j,vi,j)

• Output• structure: 3D location xi for each point pi• motion: camera parameters Rj , tj possibly Kj

• Objective function: minimize reprojection error

Reconstruction (side) (top)

Page 5: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Structure from motion

Camera 1

Camera 2

Camera 3R1,t1

R2,t2

R3,t3

X1

X4

X3

X2

X5

X6

X7

minimize

g(X,R, T)

p1,1

p1,2

p1,3

non-linear least squares

Page 6: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Structure from motion

• Minimize sum of squared reprojection errors:

• Minimizing this function is called bundle adjustment– Optimized using non-linear least squares,

e.g. Levenberg-Marquardt

predicted image location

observedimage location

indicator variable:is point i visible in image j ?

Page 7: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Incremental structure from motion

Page 8: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Incremental structure from motion

Page 9: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Incremental structure from motion

Page 10: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Photo Explorer

Page 11: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Demo

Page 12: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely
Page 13: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely
Page 14: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Why is this hard?Why is this hard?

Page 16: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Extensions to SfM• Can also solve for intrinsic parameters (focal

length, radial distortion, etc.)• Can use a more robust function than squared

error, to avoid fitting to outliers

• For more information, see: Triggs, et al, “Bundle Adjustment – A Modern Synthesis”, Vision Algorithms 2000.

Page 17: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Questions?

Page 18: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Can we reconstruct entire cities?

Page 19: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely
Page 20: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely
Page 21: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Gigantic matching problem

• 1,000,000 images 500,000,000,000 pairs– Matching all of these on a 1,000-node cluster would

take more than a year, even if we match 10,000 every second

– And involves TBs of data

• The vast majority (>99%) of image pairs do not match

• There are better ways of finding matching images (more on this later)

Page 22: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Gigantic SfM problem

• Largest problem size we’ve seen:– 15,000 cameras– 4 million 3D points– more than 12 million parameters– more than 25 million equations

• Huge optimization problem• Requires sparse least squares techniques

Page 23: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Building Rome in a Day

Rome, Italy. Reconstructed 150,000 in 21 hours on 496 machines

Colosseum

St. Peter’s Basilica

Trevi Fountain

Page 24: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).Total reconstruction time: 23 hoursNumber of cores: 352

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).Total reconstruction time: 23 hoursNumber of cores: 352

Page 25: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Dubrovnik

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).Total reconstruction time: 23 hoursNumber of cores: 352

Page 26: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

San Marco Square

San Marco Square and environs, Venice. 14,079 photos, out of an initial 250,000. Total reconstruction time: 3 days. Number of cores: 496.

Page 27: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Multi-view stereo

StereoMulti-view stereo

Page 28: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Multi-view Stereo

CMU’s 3D Room

Point Grey’s Bumblebee XB3

Point Grey’s ProFusion 25

Page 29: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Multi-view Stereo

Page 30: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Multi-view Stereo

Figures by Carlos Hernandez

Input: calibrated images from several viewpointsOutput: 3D object model

Page 31: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Fua Narayanan, Rander, KanadeSeitz, Dyer

1995 1997 1998Faugeras, Keriven

1998

Hernandez, Schmitt Pons, Keriven, Faugeras Furukawa, Ponce

2004 2005 2006Goesele et al.

2007

Page 32: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Stereo: basic ideaerror

depth

Page 33: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

width of a pixel

Choosing the stereo baseline

What’s the optimal baseline?– Too small: large depth error– Too large: difficult search problem

Large BaselineLarge Baseline Small BaselineSmall Baseline

all of thesepoints projectto the same pair of pixels

Page 34: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

The Effect of Baseline on Depth Estimation

Page 35: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

z

width of a pixel

width of a pixel

z

pixel matching score

Page 36: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely
Page 37: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Multibaseline Stereo

Basic Approach– Choose a reference view– Use your favorite stereo algorithm BUT

• replace two-view SSD with SSSD over all baselines

Limitations– Only gives a depth map (not an “object model”)– Won’t work for widely distributed views:

Page 38: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Some Solutions• Match only nearby photos [Narayanan 98]

• Use NCC instead of SSD,Ignore NCC values > threshold [Hernandez & Schmitt 03]

Problem: visibility

Page 39: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Popular matching scores• SSD (Sum Squared Distance)

• NCC (Normalized Cross Correlation)

– where

– what advantages might NCC have?

Page 40: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely

Questions?