Structure and Motion from Line Segments in Multiple Images

Camillo J. Taylor, David J. Kriegman

Presented by David Lariviere

Primary Goal

Given a series of images with known corresponding line segments, calculate the relative locations of the cameras imaging the scene and the three-dimensional locations of the line segments.

Some Previous Work

• (1981) Longuet-Higgins. “A computer algorithm for reconstructing a scene from two projections.”

• (1990) Vieville. “Estimation of 3D-motion and structure from tracking 2D-lines in a sequence of images.”

• (1992) Tomasi, Kanade. “Shape and motion from image streams under orthography.”

Problem Characterization

• Instead of using generalized scenes and points, focus on rigid scenes with clear edges as features.

• Advantages of lines as features:

– Occur frequently in man-made environments.

– Easily located and tracked.

– More accurately localized than points, because more image data along the edge supports each estimate.

Algorithm Overview

• Determine a non-linear objective function whose minimization leads to an estimate of scene structure.

• In this case, estimate 3D camera locations/orientations and locations of line segments in 3D, and then reproject the lines onto the estimated image planes.

• The difference between the predicted projected lines and the actually observed lines is the error function to minimize.

Objective Function

• pi: ith 3D line

• qj: jth camera position/orientation

• uij: observed edge i in image j.

• m images

• n lines

• F: reprojection of line pi onto the image plane of camera qj.

Notation – Line Representation

• Represent a line in 3D space by (v, d):

– v: unit vector pointing in the direction of the line.

– d: vector from the origin to the closest point on the line.

• m: normal vector of the plane defined by the camera center and the line.

• Edge in the image plane defined by m_x x + m_y y + m_z = 0
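As a minimal sketch of how (v, d) and m relate, the following assumes a camera with unit focal length whose frame is obtained as X_cam = R (X_world - t), which gives m = R (v × (d - t)). The convention and the helper names (`line_normal`, `project_point`, `rotation_z`) are assumptions made for illustration, not taken from the slides. The check confirms that points sampled on the 3D line project onto the image edge m_x x + m_y y + m_z = 0.

```python
import numpy as np

def rotation_z(angle):
    """Rotation about the z axis (used here only to build a test camera)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def line_normal(R, t, v, d):
    """Normal m of the plane through the camera center and the line (v, d),
    expressed in the camera frame. Assumed convention: X_cam = R (X_world - t)."""
    return R @ np.cross(v, d - t)

def project_point(R, t, X):
    """Pinhole projection with unit focal length (an assumption)."""
    Xc = R @ (X - t)
    return np.array([Xc[0] / Xc[2], Xc[1] / Xc[2]])

# A line along the x axis, 5 units in front of the camera.
v = np.array([1.0, 0.0, 0.0])
d = np.array([0.0, 1.0, 5.0])      # perpendicular to v by construction
R = rotation_z(0.3)
t = np.array([0.5, -0.2, 0.0])

m = line_normal(R, t, v, d)

# Every point d + s*v on the line should satisfy the image edge equation.
residuals = []
for s in (-2.0, 0.0, 3.0):
    x, y = project_point(R, t, d + s * v)
    residuals.append(m[0] * x + m[1] * y + m[2])
```

The residuals are zero up to floating-point rounding, because m is perpendicular to every ray from the camera center to a point on the line.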

Notation – Reference Frames

• Relate the location/orientation of each camera to some world base frame.

Summary of Parameters

• Camera Location (tj): 3 DOF

• Camera Orientation (Rj): 3 DOF

• Line Location/Orientation (v,d): 4 DOF

• Requires at least 6 edge correspondences in 3 images.
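The minimum of 6 edges in 3 images can be reproduced with a simple counting argument. The sketch below assumes that each observed edge contributes 2 independent measurements and that 7 degrees of freedom (the choice of world frame, 6, and the overall scale, 1) cannot be recovered; both assumptions are standard for structure from motion but are not stated on the slide. The function names are hypothetical.

```python
def unknowns(num_images, num_lines):
    """6 DOF per camera and 4 DOF per line, minus 7 for the world frame and scale."""
    return 6 * num_images + 4 * num_lines - 7

def measurements(num_images, num_lines):
    """Each line seen in each image is assumed to give 2 independent measurements."""
    return 2 * num_images * num_lines

def is_determined(num_images, num_lines):
    """True when there are at least as many measurements as unknowns."""
    return measurements(num_images, num_lines) >= unknowns(num_images, num_lines)

# Smallest number of lines that suffices for three images.
min_lines_three_views = next(n for n in range(1, 100) if is_determined(3, n))
```

Under these assumptions two images are never sufficient, no matter how many lines are observed, since 4n measurements can never reach 4n + 5 unknowns.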

Reprojection Error

• Observed edge has visible endpoints (x1, y1) and (x2, y2).

• For every point on the observed edge between the endpoints, take the shortest distance to the predicted line, and integrate over the interval.

• Normalize error by dividing by length of observed edge.
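A minimal sketch of this error measure follows. It assumes the integrand is the squared point-to-line distance (the slide only says "distance"), in which case the integral has a closed form: the distance varies linearly from h1 at one endpoint to h2 at the other, so the integral over an edge of length l is (l / 3)(h1² + h1·h2 + h2²). The function names are hypothetical.

```python
import math

def point_line_distance(m, x, y):
    """Signed distance from (x, y) to the line m_x x + m_y y + m_z = 0."""
    mx, my, mz = m
    return (mx * x + my * y + mz) / math.hypot(mx, my)

def edge_error(m, p1, p2):
    """Integrated squared distance from the observed edge p1-p2 to the
    predicted line m, divided by the length of the observed edge."""
    h1 = point_line_distance(m, *p1)
    h2 = point_line_distance(m, *p2)
    length = math.dist(p1, p2)
    integral = length / 3.0 * (h1 * h1 + h1 * h2 + h2 * h2)
    return integral / length

predicted = (0.0, 1.0, 0.0)                                # the line y = 0
parallel = edge_error(predicted, (0.0, 1.0), (4.0, 1.0))   # edge one unit above the line
slanted = edge_error(predicted, (0.0, 0.0), (3.0, 3.0))    # edge touching the line at one end
```

For the parallel edge every point is at distance 1, so the normalized error is 1; for the slanted edge h1 = 0 and h2 = 3, giving 9 / 3 = 3.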

Algorithm

• Primary algorithm for minimizing the non-linear function: minimize the line reprojection error through gradient descent to find a local minimum:

– Randomly generate initial values.

– Iteratively follow the function along the direction of steepest descent to reach a local minimum.

• If the local minimum error is below a certain threshold, accept.

• Else, generate new initial values and try again.

• The quality of the initial values heavily influences the number of iterations required before the function converges.
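The restart loop described on this slide can be sketched on a toy problem. The objective below is a hypothetical one-dimensional function with one poor local minimum and one good one; it stands in for the reprojection error, and the step size, iteration count, sampling range, and threshold are all illustrative assumptions rather than values from the paper.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of f at x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def descend(f, x0, rate=0.01, iters=2000):
    """Plain steepest descent with a fixed step size."""
    x = x0.copy()
    for _ in range(iters):
        x = x - rate * numerical_gradient(f, x)
    return x

def minimize_with_restarts(f, dim, threshold, max_restarts=50, seed=0):
    """Accept the first local minimum whose value falls below the threshold;
    otherwise draw new random initial values and try again."""
    rng = np.random.default_rng(seed)
    for _ in range(max_restarts):
        x = descend(f, rng.uniform(-2.0, 2.0, size=dim))
        if f(x) < threshold:
            return x
    return None

def toy_objective(x):
    """Two basins: a good minimum near x = -1 and a poor one near x = +1."""
    return float(np.sum((x ** 2 - 1.0) ** 2 + 0.3 * (x + 1.0)))

solution = minimize_with_restarts(toy_objective, dim=1, threshold=0.1)
```

Starts that fall in the right-hand basin converge to a value of about 0.59 and are rejected, which is why poor initial values translate directly into wasted descent runs.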

Initial Value Estimation

In order to decrease computational cost, additional steps are added to acquire acceptable starting values for gradient descent:

1. User inputs a range for the camera orientations (Rj), and values of Rj within that range are randomly chosen.

2. Holding the estimates from (1) constant, estimate vi subject to a constraint equation.

3. Improve the estimate from (2) by minimizing the same constraint equation with both vi and Rj as free parameters.

4. Generate initial estimates of di and tj, using a second constraint equation.

5. Provide the estimates from (3) and (4) as starting values for gradient descent.

Constraint Equations

• From the defined relations:

• One can derive:

• Which provides two constraint equations:
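The equations on this slide are not preserved in this transcript. As a hedged illustration only: if the camera frame is taken to be X_cam = R (X_world - t), so that m = R (v × (d - t)), then m is perpendicular to both R v and R (d - t), which suggests constraints of the form mᵀ R v = 0 and mᵀ R (d - t) = 0. The sketch below uses the first of these to estimate a line direction from observed normals with the rotations held fixed, by taking the smallest right singular vector. This linear solve is a stand-in chosen for brevity and is not claimed to be the authors' procedure; all names are hypothetical.

```python
import numpy as np

def rotation_about(axis, angle):
    """Rotation matrix from Rodrigues' formula."""
    a = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def estimate_direction(normals, rotations):
    """Unit vector v minimizing sum_j (m_j^T R_j v)^2 with the R_j held fixed."""
    A = np.stack([m @ R for m, R in zip(normals, rotations)])
    _, _, vt = np.linalg.svd(A)
    return vt[-1]

# Synthetic ground truth: one line seen by three cameras.
v_true = np.array([0.0, 0.0, 1.0])
d_true = np.array([2.0, 1.0, 0.0])
rotations = [rotation_about(np.array([0.0, 1.0, 0.0]), 0.2),
             rotation_about(np.array([1.0, 0.0, 0.0]), -0.4),
             rotation_about(np.array([1.0, 1.0, 0.0]), 0.5)]
translations = [np.array([0.0, 0.0, 0.0]),
                np.array([1.0, -0.5, 0.3]),
                np.array([-1.0, 0.0, 0.7])]

# Noise-free observed normals under the assumed convention.
normals = [R @ np.cross(v_true, d_true - t) for R, t in zip(rotations, translations)]

v_est = estimate_direction(normals, rotations)
second_residuals = [m @ R @ (d_true - t)
                    for m, R, t in zip(normals, rotations, translations)]
```

The recovered direction matches the true one up to sign, which is the expected ambiguity for a line direction.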

Results

1. Simulation results:

– Measuring tolerance to noise, the rate of return from increasing the number of images/features, and the rate of convergence of the global minimization.

– Comparing the proposed method to previous linear methods.

2. Real-world results

Simulation Results

• Main results:

– The algorithm is much more sensitive to errors in the edge endpoints than to errors in the calibrated camera center.

– Holding the maximum baseline constant, increasing the number of images beyond 6 or the number of lines beyond 50 does not improve accuracy.

– A small number of large-baseline images is superior to many small-baseline images.

– The rate of convergence of the global minimization algorithm is highly dependent on the initial range of theta.

Simulation Results Continued

Comparison to Linear Method

• This method is significantly less sensitive to noise than the leading linear algorithm [1].

[1] J. Weng, Y. Liu, T. S. Huang, and N. Ahuja, “Estimating motion/structure from line correspondences”

Real-world Results

Real-world Results…

Real-world Results: Hallway

Discussion

• Initial estimation optimizations improve calculation speed.

• Algorithm is significantly less sensitive to noise than the compared linear method.

• Future improvements:

– Automate edge correspondence tracking by using video.

– Impose edge-intersection and other geometric restrictions (coplanarity, parallelism, etc.).

Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach

Paul E. Debevec, Camillo J. Taylor, Jitendra Malik

Overview

• Apply the previous paper’s methods to modeling architectural scenes with restricted geometry.

• Utilize model-based stereo to extract precise geometry from a sparse set of large-baseline photographs.

• Utilize 3D models and view-dependent photographs to construct photorealistic computer-generated views.

Architectural Models: Blocks

• User starts by choosing geometric primitives (blocks) to represent the basic geometry of the building.

• Block: “hierarchical model of a parametric polyhedral primitive”

– Parametrized by a base vertex Po and various other properties (width, height, length, etc.).

Block Relations

• A hierarchy of blocks is used to describe the various geometric primitives that make up the basic architecture.

• User manually maps corresponding edges in images to the edges of the blocks.

• Blocks are related by constraints on their relations in terms of location and orientation:

– For example, ensure that the bottom of one block sits on the top of another block.

• Values of blocks are stored symbolically, meaning if one specifies a series of blocks to be parallel, then only one variable is used to enforce this restriction across all blocks.

• gi(X): rigid transformation mapping one block to adjacent block.

• Pw(x): block vertex in world coordinates

• vw(x): line orientation in world coordinates

Block Relations Continued…

Advantages of Blocks

• Model most architectural scenes well.

• Implicitly contain features commonly found in architecture (e.g., parallel edges, right angles).

• Manipulation by the user is easier due to the reduced number of parameters.

• Surfaces are pre-defined by the model, removing the need to calculate them from edges.

• The number of parameters is greatly reduced when performing minimization of the cost function.

Single Image Examples:

Estimation of 3D Structure

• Very similar to the previous paper: estimate the parameters of the cameras (R, t) and edges (v, d) which minimize the reprojection error.

• Differences:

– Many edges are defined in relation to one another, meaning fewer variables.

– Apply horizontal/vertical constraints on vi to more accurately estimate Rj.

– Instead of using gradient descent, the authors use the Newton-Raphson method to minimize the non-linear error function.

View-Dependent Texture Mapping

• Once camera and edge locations/orientations are known, project the images onto the block models.

• If multiple images of the same area exist, apply weighted averaging to fuse them.

– Weights are inversely proportional to the difference in angle between the virtual view being synthesized and the camera location/orientation which took the particular image.

• Possible to divide planes into faces, calculate the weights only once per face, and apply them to the entire face.
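The weighting rule can be sketched as follows. Each view is represented by a viewing direction toward the surface point, the weight is taken as the reciprocal of the angle to the virtual view, and the weights are normalized to sum to 1. The small `eps` term (to avoid division by zero when a camera coincides with the virtual view), the normalization, and the function names are assumptions made for illustration.

```python
import numpy as np

def angle_between(a, b):
    """Angle in radians between two direction vectors."""
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cosine, -1.0, 1.0))

def blend_weights(virtual_dir, camera_dirs, eps=1e-6):
    """Weights inversely proportional to the angle from the virtual view,
    normalized so that they sum to 1."""
    angles = np.array([angle_between(virtual_dir, c) for c in camera_dirs])
    raw = 1.0 / (angles + eps)
    return raw / raw.sum()

def blend_colors(weights, colors):
    """Weighted average of the colors sampled from each photograph."""
    return weights @ np.asarray(colors, dtype=float)

virtual = np.array([0.0, 0.0, 1.0])
cameras = [np.array([np.sin(a), 0.0, np.cos(a)]) for a in np.radians([10.0, 30.0])]

weights = blend_weights(virtual, cameras)
color = blend_colors(weights, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

With cameras 10° and 30° away from the virtual view, the nearer photograph receives roughly three times the weight of the farther one.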

Example of Texture-Mapping

Model-based Stereopsis

• Use known scene geometry and camera locations to rectify large-baseline images before performing stereo.

• Avoids foreshortening problems, which can be severe when images are taken far apart.

• Maintain the epipolar constraint by projecting the offset image onto the model and then reprojecting it onto the key image’s image plane, creating a rectified image for use in stereopsis.

Model-based Stereopsis Example

Discussion

• For architectural scenes that generally fit the allowed geometric primitives, the approach works quite well.

• Possible future improvements:

– Additional models: surfaces of revolution.

– Estimate BRDF.

– Devise a method of selecting the best images to use for rendering novel views.

Questions?
