![Page 1: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/1.jpg)
Lecture 23: Structure from motion and multi-view stereo
CS6670: Computer Vision
Noah Snavely
![Page 2: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/2.jpg)
Readings
• Szeliski, Chapter 7.1 – 7.4, 11.6
![Page 3: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/3.jpg)
Announcements
• Project 2b due on Tuesday by 10:59pm
• Final project proposal feedback coming soon
![Page 4: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/4.jpg)
Structure from motion
• Input: images with points in correspondence $p_{i,j} = (u_{i,j}, v_{i,j})$
• Output
  – structure: 3D location $x_i$ for each point $p_i$
  – motion: camera parameters $R_j$, $t_j$, possibly $K_j$
• Objective function: minimize reprojection error
[Figure: reconstruction, side and top views]
![Page 5: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/5.jpg)
Structure from motion
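[Figure: Cameras 1, 2, and 3, with poses $(R_1, t_1)$, $(R_2, t_2)$, $(R_3, t_3)$, observe the 3D points $X_1, \dots, X_7$; $p_{1,1}$, $p_{1,2}$, $p_{1,3}$ are the observations of point 1 in the three images.]
minimize $g(X, R, T)$ (non-linear least squares)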
![Page 6: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/6.jpg)
Structure from motion
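• Minimize the sum of squared reprojection errors:
  $$g(X, R, T) = \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \left\| P(x_i, R_j, t_j) - \begin{bmatrix} u_{i,j} \\ v_{i,j} \end{bmatrix} \right\|^2$$
  – $P(x_i, R_j, t_j)$: predicted image location
  – $(u_{i,j}, v_{i,j})$: observed image location
  – $w_{ij}$: indicator variable: is point i visible in image j?
• Minimizing this function is called bundle adjustment
  – Optimized using non-linear least squares, e.g. Levenberg-Marquardt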
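As a rough illustration of the objective, the sketch below evaluates the sum of squared reprojection errors for a tiny hand-made scene. It assumes a simple pinhole camera with unit focal length and no distortion; the names `project` and `reprojection_error`, and the choice to store observations in a dictionary keyed by (point, image), are illustrative assumptions and not part of the lecture. A missing dictionary entry plays the role of the indicator variable being zero.

```python
def project(point, rotation, translation, focal=1.0):
    """Pinhole projection of a 3D point into a camera with pose (R, t)."""
    # Move the point into the camera frame: X_cam = R * X + t.
    cam = [sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
           for r in range(3)]
    # Perspective division, then scale by the (assumed) focal length.
    return (focal * cam[0] / cam[2], focal * cam[1] / cam[2])


def reprojection_error(points, cameras, observations):
    """Sum of squared reprojection errors over all visible (point, image) pairs.

    observations maps (i, j) -> (u, v). A pair that is absent from the
    dictionary is treated as "point i is not visible in image j".
    """
    total = 0.0
    for (i, j), (u, v) in observations.items():
        rotation, translation = cameras[j]
        pu, pv = project(points[i], rotation, translation)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 4.0), (1.0, 1.0, 5.0)]
cameras = [(IDENTITY, (0.0, 0.0, 0.0)), (IDENTITY, (-1.0, 0.0, 0.0))]
# Point 1 is not observed in image 1, so there is no (1, 1) entry.
observations = {(0, 0): (0.0, 0.0), (0, 1): (-0.25, 0.0), (1, 0): (0.2, 0.2)}

print(reprojection_error(points, cameras, observations))
```

Bundle adjustment would hand a function like this (in residual form) to a non-linear least squares solver and let it update the points and camera poses; the sketch only evaluates the cost.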
![Page 7: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/7.jpg)
Incremental structure from motion
![Page 8: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/8.jpg)
Incremental structure from motion
![Page 9: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/9.jpg)
Incremental structure from motion
![Page 10: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/10.jpg)
Photo Explorer
![Page 11: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/11.jpg)
Demo
![Page 12: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/12.jpg)
![Page 13: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/13.jpg)
![Page 14: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/14.jpg)
Why is this hard?
![Page 16: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/16.jpg)
Extensions to SfM
• Can also solve for intrinsic parameters (focal length, radial distortion, etc.)
• Can use a more robust function than squared error, to avoid fitting to outliers
• For more information, see: Triggs et al., “Bundle Adjustment – A Modern Synthesis”, Vision Algorithms 2000.
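To make the robust-function point concrete, here is a minimal sketch comparing squared error with the Huber loss. Huber is only one common choice and is an assumption here; the slides do not say which robust function is used. The threshold `delta` is likewise a hypothetical parameter.

```python
def squared_loss(residual):
    """Ordinary least squares cost: grows quadratically with the residual."""
    return residual * residual


def huber_loss(residual, delta=1.0):
    """Huber cost: quadratic for small residuals, linear for large ones.

    Scaled so that it equals the squared loss when |residual| <= delta.
    """
    a = abs(residual)
    if a <= delta:
        return residual * residual
    return 2.0 * delta * a - delta * delta


# An inlier (0.5 pixels) costs the same under both; an outlier (10 pixels)
# dominates the squared cost far more than the Huber cost.
for r in (0.5, 10.0):
    print(r, squared_loss(r), huber_loss(r))
```

Because the outlier's cost grows only linearly, a single bad correspondence pulls the solution much less than it would under squared error.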
![Page 17: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/17.jpg)
Questions?
![Page 18: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/18.jpg)
Can we reconstruct entire cities?
![Page 19: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/19.jpg)
![Page 20: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/20.jpg)
![Page 21: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/21.jpg)
Gigantic matching problem
• 1,000,000 images → 500,000,000,000 pairs
  – Matching all of these on a 1,000-node cluster would take more than a year, even if we match 10,000 every second
  – And involves TBs of data
• The vast majority (>99%) of image pairs do not match
• There are better ways of finding matching images (more on this later)
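The numbers above can be checked with a few lines of arithmetic. The sketch assumes that 10,000 pairs per second is the aggregate rate of the whole cluster, which is one reading of the slide.

```python
num_images = 1_000_000

# Every unordered pair of distinct images: n * (n - 1) / 2.
num_pairs = num_images * (num_images - 1) // 2

pairs_per_second = 10_000  # assumed aggregate matching rate
seconds = num_pairs / pairs_per_second
days = seconds / 86_400    # seconds per day

print(f"{num_pairs:,} pairs, about {days:.0f} days of matching")
```

Roughly 5 × 10^11 pairs at that rate comes to well over a year, which is why exhaustive pairwise matching is not practical at this scale.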
![Page 22: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/22.jpg)
Gigantic SfM problem
• Largest problem size we’ve seen:
  – 15,000 cameras
  – 4 million 3D points
  – more than 12 million parameters
  – more than 25 million equations
• Huge optimization problem
• Requires sparse least squares techniques
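A quick count shows where these sizes come from and why sparsity matters. The figure of 9 parameters per camera (3 rotation, 3 translation, 3 intrinsics) is a hypothetical choice for illustration; the slides do not state the camera parameterization.

```python
num_cameras = 15_000
num_points = 4_000_000

params_per_point = 3    # X, Y, Z
params_per_camera = 9   # assumption: 3 rotation + 3 translation + 3 intrinsics

num_params = num_points * params_per_point + num_cameras * params_per_camera

# Each observation of a point in an image contributes two equations (u and v),
# so 25 million equations correspond to 12.5 million observations.
num_equations = 25_000_000
num_observations = num_equations // 2

# One residual depends only on one camera and one point, so each Jacobian row
# has at most this many non-zero entries out of num_params columns.
nonzeros_per_row = params_per_camera + params_per_point
row_density = nonzeros_per_row / num_params

print(num_params, num_observations, nonzeros_per_row, row_density)
```

With only about a dozen non-zeros in a row of roughly 12 million columns, storing or factoring the Jacobian densely is out of the question, which is the motivation for sparse least squares.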
![Page 23: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/23.jpg)
Building Rome in a Day
Rome, Italy. Reconstructed 150,000 images in 21 hours on 496 machines
Colosseum
St. Peter’s Basilica
Trevi Fountain
![Page 24: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/24.jpg)
Dubrovnik, Croatia. 4,619 images (out of an initial 57,845). Total reconstruction time: 23 hours. Number of cores: 352.
![Page 25: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/25.jpg)
Dubrovnik
![Page 26: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/26.jpg)
San Marco Square
San Marco Square and environs, Venice. 14,079 photos, out of an initial 250,000. Total reconstruction time: 3 days. Number of cores: 496.
![Page 27: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/27.jpg)
Multi-view stereo
[Figure panels: Stereo; Multi-view stereo]
![Page 28: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/28.jpg)
Multi-view Stereo
CMU’s 3D Room
Point Grey’s Bumblebee XB3
Point Grey’s ProFusion 25
![Page 29: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/29.jpg)
Multi-view Stereo
![Page 30: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/30.jpg)
Multi-view Stereo
Figures by Carlos Hernandez
Input: calibrated images from several viewpoints
Output: 3D object model
![Page 31: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/31.jpg)
• 1995: Fua
• 1997: Seitz, Dyer
• 1998: Narayanan, Rander, Kanade
• 1998: Faugeras, Keriven
• 2004: Hernandez, Schmitt
• 2005: Pons, Keriven, Faugeras
• 2006: Furukawa, Ponce
• 2007: Goesele et al.
![Page 32: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/32.jpg)
Stereo: basic idea
[Figure: plot with axes "error" and "depth"]
![Page 33: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/33.jpg)
Choosing the stereo baseline
What’s the optimal baseline?
  – Too small: large depth error
  – Too large: difficult search problem
[Figure panels: Large Baseline; Small Baseline. Annotations: "width of a pixel"; "all of these points project to the same pair of pixels"]
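The "too small: large depth error" point can be illustrated numerically. The sketch assumes a rectified two-camera setup where depth = focal length × baseline / disparity, and measures the range of depths that fall within a one-pixel-wide disparity bin. The function names and the example numbers (focal length of 1000 pixels, depth of 10 units) are hypothetical.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth of a point in a rectified stereo pair: z = f * b / d."""
    return focal_px * baseline / disparity_px


def depth_uncertainty(focal_px, baseline, depth, pixel_error=1.0):
    """Range of depths that map into one pixel-wide disparity bin around `depth`."""
    disparity = focal_px * baseline / depth
    near = depth_from_disparity(focal_px, baseline, disparity + pixel_error / 2)
    far = depth_from_disparity(focal_px, baseline, disparity - pixel_error / 2)
    return far - near


focal_px = 1000.0
depth = 10.0
narrow = depth_uncertainty(focal_px, baseline=0.1, depth=depth)
wide = depth_uncertainty(focal_px, baseline=1.0, depth=depth)
print(f"baseline 0.1: +/- {narrow / 2:.3f}, baseline 1.0: +/- {wide / 2:.3f}")
```

Under these assumptions a tenfold larger baseline shrinks the depth ambiguity by roughly a factor of ten, at the price of a larger disparity range to search.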
![Page 34: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/34.jpg)
The Effect of Baseline on Depth Estimation
![Page 35: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/35.jpg)
[Figure: plots with axes "z" and "pixel matching score"; annotations mark the width of a pixel]
![Page 36: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/36.jpg)
![Page 37: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/37.jpg)
Multibaseline Stereo
Basic Approach
  – Choose a reference view
  – Use your favorite stereo algorithm BUT
    • replace two-view SSD with SSSD (sum of SSDs) over all baselines
Limitations
  – Only gives a depth map (not an “object model”)
  – Won’t work for widely distributed views:
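The basic approach can be sketched on a single scanline. The example assumes that each extra view is the reference signal shifted by an integer disparity equal to baseline × inverse depth, and it picks the inverse depth whose SSD, summed over all baselines, is smallest. The synthetic signal, the helper names, and the restriction to integer shifts are all simplifying assumptions for illustration.

```python
def shift_signal(signal, shift):
    """Return a copy of `signal` moved right by `shift` samples, zero padded."""
    n = len(signal)
    return [signal[y - shift] if 0 <= y - shift < n else 0 for y in range(n)]


def ssd(ref, other, x, disparity, half_window):
    """Two-view SSD between a window at x in `ref` and at x + disparity in `other`."""
    total = 0
    for dx in range(-half_window, half_window + 1):
        diff = ref[x + dx] - other[x + dx + disparity]
        total += diff * diff
    return total


def sssd_best_inverse_depth(ref, views, x, candidates, half_window=3):
    """Pick the candidate inverse depth minimizing the SSD summed over all views."""
    best, best_cost = None, None
    for zeta in candidates:
        cost = sum(ssd(ref, image, x, baseline * zeta, half_window)
                   for baseline, image in views)
        if best_cost is None or cost < best_cost:
            best, best_cost = zeta, cost
    return best, best_cost


# Synthetic reference scanline and three views at baselines 1, 2, 3.
ref = [(7 * x * x + 3 * x) % 17 for x in range(40)]
true_zeta = 2
views = [(b, shift_signal(ref, b * true_zeta)) for b in (1, 2, 3)]

best, cost = sssd_best_inverse_depth(ref, views, x=15, candidates=range(6))
print(best, cost)
```

Searching over a shared inverse depth, rather than a per-view disparity, is what lets the costs from different baselines be added together.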
![Page 38: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/38.jpg)
Problem: visibility
Some Solutions
• Match only nearby photos [Narayanan 98]
• Use NCC instead of SSD; ignore NCC values > threshold [Hernandez & Schmitt 03]
![Page 39: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/39.jpg)
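Popular matching scores
• SSD (Sum Squared Distance)
  $$SSD(W_1, W_2) = \sum_{(x,y)} \left( W_1(x,y) - W_2(x,y) \right)^2$$
• NCC (Normalized Cross Correlation)
  $$NCC(W_1, W_2) = \frac{\sum_{(x,y)} \left( W_1(x,y) - \overline{W_1} \right) \left( W_2(x,y) - \overline{W_2} \right)}{\sigma_{W_1} \, \sigma_{W_2}}$$
  – where $\overline{W_i} = \frac{1}{n} \sum_{(x,y)} W_i(x,y)$ and $\sigma_{W_i} = \sqrt{\sum_{(x,y)} \left( W_i(x,y) - \overline{W_i} \right)^2}$
  – what advantages might NCC have?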
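One way to explore the question is to compare the two scores on a window and a copy of it with changed gain and offset (for example, a different exposure). The sketch uses the standard textbook definitions of SSD and NCC on flattened windows; the sample values are made up.

```python
import math


def ssd(w1, w2):
    """Sum of squared differences between two equally sized windows."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2))


def ncc(w1, w2):
    """Normalized cross correlation; undefined (division by zero) for flat windows."""
    n = len(w1)
    mean1 = sum(w1) / n
    mean2 = sum(w2) / n
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum((a - mean1) ** 2 for a in w1))
    norm2 = math.sqrt(sum((b - mean2) ** 2 for b in w2))
    return numerator / (norm1 * norm2)


window = [10.0, 20.0, 15.0, 40.0, 5.0]
# Same pattern seen with twice the gain and a brightness offset of 30.
brighter = [2.0 * v + 30.0 for v in window]

print("SSD:", ssd(window, brighter))
print("NCC:", ncc(window, brighter))
```

SSD reports a large mismatch even though the pattern is identical, while NCC stays at (numerically) 1, so NCC is insensitive to per-window gain and offset changes between photos.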
![Page 40: Lecture 23: Structure from motion and multi-view stereo CS6670: Computer Vision Noah Snavely](https://reader036.vdocuments.us/reader036/viewer/2022062408/56649d3e5503460f94a16b20/html5/thumbnails/40.jpg)
Questions?