
UNIVERSITY OF APPLIED SCIENCES STUTTGART

DEPARTMENT OF SURVEYING, COMPUTER SCIENCE AND MATHEMATICS

IMAGE-BASED SHAPE MODELING USING DEFORMABLE SURFACES

DIPLOMA THESIS

AUTHOR: STEFANIE WUHRER

WINTER SEMESTER 2004/2005

FIRST CORRECTOR: PROF. SUSANNE HARMS, UNIVERSITY OF APPLIED SCIENCES STUTTGART

SECOND CORRECTOR: DR. CHANG SHU, NATIONAL RESEARCH COUNCIL CANADA


Abstract

Image-based modeling techniques construct digital shape models from 2D images of physical objects. They are used in a wide range of applications where both virtual and real data are required. Traditionally, image-based techniques operate in image space and rely on point correspondences between different images. If smooth surfaces are modeled, these techniques require a large number of corresponding image points; they are therefore only suitable for reconstructing mainly planar surfaces. Smooth surfaces can be modeled using deformable surface models, which evolve dynamically through time. These models deform an initial geometric surface according to partial differential equations or variational principles. For image-based modeling, deformable models can be used to deform an initial three-dimensional surface iteratively towards the optimally photo-consistent smooth surface. In this thesis, several deformable models are surveyed. A method that deforms an initial surface according to the general weighted minimal surface flow was implemented, and the experimental results are discussed. Starting from an initial model that has the same topology as the real object, the implemented method reconstructs the real object's geometry.


Acknowledgements

First and foremost, I would like to thank my supervisor and second corrector, Dr. Chang Shu, for his continual insight, advice, and support. Dr. Shu helped me with his encyclopedic knowledge not only in the field of image-based modeling, but also in the fields of free-form, differential, and computational geometry. His enthusiasm for the project was matched only by his ability to find relevant literature.

I wish to thank my first corrector, Prof. Susanne Harms, for her great support. Despite the geographical distance, Prof. Harms was always there to give advice and encouragement.

I wish to thank Dr. Gerhard Roth and Dr. Mark Fiala of the National Research Council Canada for their advice and support. Furthermore, I thank the Institute for Information Technology for giving me the opportunity to complete my thesis at its facilities.


Contents

1. Introduction
   1.1. Motivation
   1.2. Problem Statement
   1.3. Related Work
   1.4. Thesis Organization

2. Background
   2.1. Camera Parameters and Perspective Projection
   2.2. Reconstructing 3D points from multiple images

3. A Survey of Surface Representations
   3.1. Triangular meshes
   3.2. B-Splines
   3.3. Non-Uniform Rational B-Splines (NURBS)
   3.4. Subdivision surfaces
        3.4.1. Implemented schemes

4. A Survey of Dynamic Shape Modeling
   4.1. D-NURBS
        4.1.1. Evaluation
   4.2. Multi-resolution Shape Recovery
        4.2.1. Evaluation
   4.3. Energy-minimizing Snake
        4.3.1. Evaluation
   4.4. PDE-based Deformable Surfaces


        4.4.1. Evaluation

5. Interpretation and Implementation
   5.1. Preliminaries
   5.2. Comparison of dynamic shape models
   5.3. Algorithm
        5.3.1. Mesh smoothing and adaptive refinement
        5.3.2. Mesh regularization
        5.3.3. Deformation of the mesh
        5.3.4. Outline of the algorithm
   5.4. Results

6. Conclusion and Future Work

A. Used libraries
   A.1. Geometry library
   A.2. OpenGL
   A.3. CLAPACK
   A.4. Algorithms from "Numerical Recipes in C"
   A.5. JPEG library

B. Implemented classes
   B.1. Import the model
   B.2. Subdivision
   B.3. Mesh Refinement
   B.4. Deformation
   B.5. Export the model
   B.6. Possible user-interactions
   B.7. Parameters


List of Figures

2.1. The pinhole camera model
3.1. Computing the discrete mean curvature
3.2. Comparison of mesh smoothing. (a) shows the initial mesh, (b) the refined and smoothed mesh with Laplace operator, (c) the refined and smoothed mesh with mean curvature normal, and (d) the refined and regularized mesh with tangential Laplace operator
3.3. Quadrisecting a triangle. Reference: [14]
3.4. Comparison of subdivision. (a) shows the initial mesh, (b) the mesh after Loop subdivision, (c) the mesh after modified Loop subdivision, and (d) the mesh after Butterfly subdivision
5.1. Adaptive refinement of a triangle
5.2. (a) and (c) show the normals of the bag model and the head of Nefertiti respectively, (b) and (d) show their mean curvature normals
5.3. Neighborhoods used for computation of photo consistency, (a) by Duan et al. [6, 7] and (b) in this project
5.4. Photo consistency. (a) shows the intersection between the initial mesh and a plane π. (b)-(e) are plots of the photo consistencies of sample points in π. The photo consistency is low in blue regions and high in red regions
5.5. (a) shows the texture-mapped initial triangular mesh intersected with a plane π and (b) shows a plot of the photo consistency in π and the intersection between the initial model and π as a white polygon


5.6. (a) and (c) show the texture-mapped initial triangular mesh intersected with planes π1, π2, (b) and (d) show plots of the photo consistency in π1, π2 and the intersections between the initial model and π1, π2 as white polygons respectively
5.7. Screenshot of the graphical user interface
5.8. Camera stations for the bag model
5.9. Reconstruction of a bag
5.10. Camera stations for the Nefertiti model
5.11. Reconstruction of Nefertiti
5.12. Camera stations for the frog model
5.13. Reconstruction of a soft toy frog
5.14. Camera stations for the clown model
5.15. Reconstruction of a clown


List of Tables

5.1. Information on the bag model
5.2. Information on the Nefertiti model
5.3. Information on the frog model
5.4. Information on the clown model
5.5. Information on the running times


1. Introduction

Mathematicians have long since regarded it as demeaning to work on problems

related to elementary geometry in two or three dimensions, in spite of the fact that

it is precisely this sort of mathematics which is of practical value.

GRUENBAUM, BRANKO AND SHEPHARD, G. C.

1.1. Motivation

Shape modeling is an old field dating back to the early beginnings of geometry, when people first tried to describe shapes with mathematical formulations. Today, various mathematical disciplines are concerned with the description and analysis of shapes, for example differential geometry, differential topology, and fractal theory. Inspired by improving computing power, disciplines that have the goal of modeling and processing geometric shapes in virtual environments have been developed, for instance computer graphics, computer vision, computer-aided design (CAD), computer-aided manufacturing (CAM), and scientific computation. Describing physical and abstract objects with digital shape models is common in many fields such as engineering, mathematics, biology, physics, and chemistry, where digital shape models are used to carry out computer simulations and animations, to create virtual and augmented reality, and for reverse engineering. As real physical or chemical experiments are often time-consuming, expensive, or even impossible to perform, they have to be simulated using digital shape models obtained from physical objects. It is therefore important to create high-quality 3D models from physical objects.

Different approaches to building three-dimensional geometric models from physical objects exist. An accurate geometric data acquisition device is the laser range scanner; its output is a cloud of 3D points that represents the scanned object. Its main disadvantages are high cost and difficulty of use. Another 3D acquisition technique is image-based modeling, which constructs 3D models from multiple 2D images taken with a digital camera. It has the advantage of being inexpensive and flexible.


1.2. Problem Statement

This project is concerned with building 3D geometric models from physical objects using image-based techniques. The advantage of image-based techniques is that they require no expensive equipment. With the cost of digital cameras decreasing, image-based techniques might soon become the standard for modeling physical objects. However, image-based techniques rely on point correspondences in different images. To model smooth surfaces with polygon meshes, a large number of control points is needed. Therefore, this technique is only adequate when the objects being modeled consist mainly of planar surfaces.

The goal of this project is to improve current image-based model building techniques so that they can handle free-form smooth surfaces. With a user-assisted association of a sparse set of corresponding points, 3D points are reconstructed. A rough initial approximation of the surface can be obtained either by fitting B-Spline surfaces to the points or by creating a triangular mesh from the points. In the next step, the initial surface is deformed to get as close to the true surface as possible using the photo-consistency property of projected points in multiple images. A point in three-dimensional space can be projected to the image planes if the calibration parameters of the camera are known. If a point is on the true surface and the surface is Lambertian¹, its projections in different images should have consistent colors. The problem of surface reconstruction can now be formulated as an inverse problem in which a surface in three-dimensional space is sought that best satisfies the photo-consistency criteria. The aim of this project is to define and implement a surface deformation scheme. A program that demonstrates the theoretical formulation is implemented in C++ and OpenGL in a Windows environment.
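As a concrete illustration of the photo-consistency idea described above, the agreement of a 3D point's projections can be scored by the color variance across the input images: projections of a point on a Lambertian surface should have (nearly) identical colors, hence low variance. The following C++ sketch is illustrative only; the `RGB` type, the variance-based measure, and the function name are assumptions for this sketch, not the exact measure used in the implemented program:

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// A simple RGB color triple (illustrative type).
struct RGB { double r, g, b; };

// Photo-consistency cost of a 3D point, given the colors of its
// projections into the calibrated images. Low values mean the
// projections agree, as expected for points on the true Lambertian
// surface; the measure used here is the mean squared deviation from
// the average color.
double photoConsistencyCost(const std::vector<RGB>& projectedColors) {
    const std::size_t n = projectedColors.size();
    if (n < 2) return 0.0;               // nothing to compare
    RGB mean{0.0, 0.0, 0.0};
    for (const RGB& c : projectedColors) {
        mean.r += c.r; mean.g += c.g; mean.b += c.b;
    }
    mean.r /= n; mean.g /= n; mean.b /= n;
    double var = 0.0;
    for (const RGB& c : projectedColors) {
        var += (c.r - mean.r) * (c.r - mean.r)
             + (c.g - mean.g) * (c.g - mean.g)
             + (c.b - mean.b) * (c.b - mean.b);
    }
    return var / n;                       // 0 for perfectly consistent projections
}
```

A surface point seen identically in all images yields a cost of zero; occluded or off-surface points typically project to unrelated colors and yield a high cost.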

1.3. Related Work

Various approaches to reconstructing a 3D model from a sequence of calibrated images without known corresponding points have been proposed. The problem of finding a photo-consistent shape is ill-posed and in general has uncountably many solutions [16]. Hence, there exists an equivalence class of 3D shapes that all reproduce the input photographs correctly. Therefore, further constraints have to be applied to select one of the solutions.

¹A surface for which the directional hemispheric reflectance is independent of the direction of illumination is called a Lambertian surface. On a Lambertian surface, the brightness of the color is independent of the direction of view of the observer or camera [12].


The voxel-based techniques proposed in [16, 24] operate on a 3D grid space. The idea is to discretize 3D space into voxels and back-project each voxel into each image; the reconstructed object consists of all the voxels with photo-consistency smaller than a threshold. When projecting the voxels to the images, it is necessary to pay attention to occlusions. One disadvantage of voxel-based techniques is that they operate on a regular 3D grid whose resolution cannot be adapted to the shape of the surface; hence, a dense voxel grid is needed to approximate smooth surfaces, which results in a slow algorithm. Therefore, voxel-based techniques are not considered in this project. Instead, surface-based techniques are used. These techniques deform geometric surface representations iteratively using the photo-consistency criteria. The deformation process typically involves one term that attracts the model to fit the photo-consistency criteria and a regularization term that tries to smooth the model. A commonly used class of techniques are level-set methods, which deform an implicit surface by numerically solving a partial differential equation (PDE) [6]. Level-set methods are sensitive to noise in the data and are therefore not examined further in this project. Another approach, the deformable NURBS (D-NURBS) [23], deforms a NURBS surface according to physical quantities, e.g. forces and deformation energies. If the deforming force simulates the effect of the photo-consistency, D-NURBS can be used to deform an initial NURBS surface to the most photo-consistent shape. Various approaches exist that deform and refine an initial triangular mesh or subdivision surface to obtain a photo-consistent shape. A technique that produces multi-resolution photo-realistic models by deforming an initial triangular mesh is discussed in [28]. Esteban and Schmitt [9, 10] present a technique that deforms an initial triangular mesh into a smooth photo-consistent mesh by applying forces. Another approach is proposed in [7], where an initial triangular mesh is deformed by solving a PDE. The PDE-based method forms the basis for the deformation that was implemented and tested in this project. The method was modified to optimally fit the given problem.

1.4. Thesis Organization

Chapter 2 gives some background information on camera parameters, perspective projection, and the reconstruction of 3D points from corresponding points in multiple images. Chapters 3 and 4 give surveys of surface representations and dynamic modeling respectively. Chapter 3 reviews triangular meshes (Section 3.1), B-Spline surfaces (Section 3.2), NURBS surfaces (Section 3.3), and subdivision surfaces (Section 3.4). Chapter 4 discusses D-NURBS (Section 4.1), a multi-resolution technique (Section 4.2), energy-based snake models (Section 4.3), and PDE-based deformable models (Section 4.4). Chapter 5 discusses the algorithm based on PDE-based deformable models that was implemented using C++ and OpenGL in a Windows environment. Sections 5.1 and 5.2 give an introduction and explain why the PDE-based deformable model is a suitable approach for solving the problem. Section 5.3 gives an overview of the implemented algorithm; it contains detailed discussions of the smoothing and regularization of the mesh, the computation and properties of the photo consistency, the computation and properties of the gradient of the photo-consistency function, and the system of PDEs that deforms the initial surface. At the end of that section, an overview of the deformation algorithm is given. In Section 5.4, several experimental results obtained from real data are demonstrated and discussed. Finally, Chapter 6 concludes and gives ideas for future work.


2. Background

Projective geometry is all geometry.

CAYLEY, ARTHUR (1821-1895)

2.1. Camera Parameters and Perspective Projection

In computer vision, a central task is to determine the projection by which a three-dimensional scene is transformed onto the image plane of a camera. This can be done using the pinhole camera model. The pinhole perspective, first proposed at the beginning of the fifteenth century by Brunelleschi, provides an acceptable approximation while remaining a simple mathematical model [12].

Figure 2.1.: The pinhole camera model

The model explained in [26] and shown in Figure 2.1 consists of an image plane π and a 3D point O called the center of projection. The line perpendicular to π and passing through


O is called the optical axis; it intersects π at the principal point o. The distance between O and the principal point is called the focal length f. The camera reference frame is defined as a coordinate system that has its origin at O and its z-axis aligned with the optical axis. The projection of a 3D point P = (X, Y, Z) onto the image plane can be constructed by intersecting the line passing through P and O with π. Denoting the projected point as p = (x, y, z) and using the camera frame, the projection can be written as:

x = f X / Z (2.1)

y = f Y / Z (2.2)

To determine the projection of a three-dimensional point given in a known world coordinate frame onto the image plane, the pinhole model makes use of camera parameters. Camera parameters can be classified into two categories. Parameters that define the position and orientation of the camera (i.e. the camera reference frame) at the time of the exposure with respect to a known world coordinate system are called extrinsic parameters. Another set of parameters, the intrinsic parameters, is internal to the camera that is used and defines the geometric properties of the camera.

Extrinsic parameters can be any set of parameters that uniquely defines the transformation between the known world reference frame and the unknown camera reference frame. The most commonly used method to describe this transformation is to use a three-dimensional translation vector T containing the relative positions of the origins and a 3 × 3 rotation matrix R that aligns corresponding axes of the two frames. A point P can now be described in world coordinates Pw as well as in camera coordinates Pc, and the two descriptions are linked by the following equation

Pc = R(Pw − T) (2.3)

with

R = | r11 r12 r13 |
    | r21 r22 r23 |
    | r31 r32 r33 |

A camera's extrinsic transformation can be expressed as a single homogeneous matrix Mext of dimension 3 × 4.

Intrinsic parameters include the focal length f and the principal point in 2D pixel space; this follows from the pinhole camera model. For most digital cameras, pixels are not exactly square. Therefore the horizontal and vertical pixel sizes (sx, sy) in millimeters are also necessary to determine the projection. With these parameters, a point (xim, yim) on the image plane in pixel units can be transformed to the camera frame, in which the point has coordinates (x, y):

x = −(xim − ox) sx (2.4)

y = −(yim − oy) sy (2.5)

The sign changes in (2.4) and (2.5) are necessary because the x- and y-axes of the image and camera reference frames are reversed. Another aspect intrinsic parameters need to model is the geometric distortion introduced by the optics. One common form of distortion is radial distortion, which occurs when cameras with wide-angle lenses are used. If (xd, yd) are the coordinates of the distorted point, the true coordinates (x, y) of the point can be found with

x = xd (1 + k1 r² + k2 r⁴) (2.6)

y = yd (1 + k1 r² + k2 r⁴) (2.7)

where r² = xd² + yd², and k1 and k2 are further intrinsic parameters. If radial distortion is neglected, the intrinsic transformation can be described with a single 3 × 3 matrix Mint.

Mint = | −f/sx   0      ox |
       |  0     −f/sy   oy |
       |  0      0      1  |

Hence, the perspective projection can be described by the linear matrix equation

| x1 |
| x2 | = Mint Mext Pw (2.8)
| x3 |

where Pw is the three-dimensional point expressed in homogeneous coordinates. The image coordinates in pixel units are xim = x1/x3 and yim = x2/x3.
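The chain from world coordinates to pixel coordinates, equations (2.1)-(2.5) together with the extrinsic transformation (2.3), can be sketched in a few lines of C++. This is a minimal illustration under the conventions above, with radial distortion neglected (as for the matrix Mint); the `Camera` struct and the vector/matrix types are ad-hoc assumptions for this sketch, not part of the thesis implementation:

```cpp
#include <array>
#include <cmath>
#include <cassert>

// Minimal 3-vector and 3x3-matrix types for this sketch.
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Camera {
    Mat3 R;          // rotation aligning world axes with camera axes
    Vec3 T;          // world-frame position of the camera origin
    double f;        // focal length (millimeters)
    double sx, sy;   // horizontal/vertical pixel size (millimeters)
    double ox, oy;   // principal point (pixel units)
};

// Equation (2.3): Pc = R (Pw - T).
Vec3 worldToCamera(const Camera& cam, const Vec3& Pw) {
    Vec3 d{Pw[0] - cam.T[0], Pw[1] - cam.T[1], Pw[2] - cam.T[2]};
    Vec3 Pc{};
    for (int i = 0; i < 3; ++i)
        Pc[i] = cam.R[i][0]*d[0] + cam.R[i][1]*d[1] + cam.R[i][2]*d[2];
    return Pc;
}

// Perspective projection (2.1)-(2.2), then the inverse of (2.4)-(2.5)
// to go from millimeters on the image plane to pixel units.
void projectToPixels(const Camera& cam, const Vec3& Pw,
                     double& xim, double& yim) {
    Vec3 Pc = worldToCamera(cam, Pw);
    double x = cam.f * Pc[0] / Pc[2];   // (2.1)
    double y = cam.f * Pc[1] / Pc[2];   // (2.2)
    xim = -x / cam.sx + cam.ox;         // inverted (2.4)
    yim = -y / cam.sy + cam.oy;         // inverted (2.5)
}
```

For a camera at the world origin with identity rotation, f = sx = sy = 1 and principal point (0, 0), the world point (1, 2, 4) projects to x = 0.25, y = 0.5 on the image plane and, after the sign flip of (2.4)-(2.5), to pixel coordinates (−0.25, −0.5).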

2.2. Reconstructing 3D points from multiple images

The previous section discussed how to find the image coordinates of a 3D point in pixel units when the camera calibration parameters are known. Given a point in image coordinates and the camera calibration parameters, it is not possible to recover the corresponding 3D point. However, the sought 3D point lies on the line passing through the image point and the center of projection. If multiple images of the same scene and corresponding points in the images are known, multiple lines that contain the corresponding 3D point can be constructed; the 3D point is located where all the lines intersect. In this project, point correspondences are found by the user, who marks corresponding image points with a mouse. Therefore, the image coordinates are not accurate and it is often impossible to find analytic intersections. Hence, the 3D points have to be reconstructed using numerical intersection methods. Another problem is caused by the camera parameters: the intrinsic parameters are usually unknown, and the extrinsic ones are always unknown, as they change when the camera moves. Therefore, the camera needs to be calibrated. Most camera calibration techniques involve taking multiple images of a pattern with known geometry. The pattern is then extracted from the images using its known features to determine the calibration parameters [13]. Although camera calibration is an important task in computer vision, it is not discussed in detail here, as it is not central to this project. Both the camera calibration and the reconstruction of a triangle mesh from multiple images are performed using the software PhotoModeler 5 Pro, which was designed for image-based surface reconstruction. PhotoModeler 5 Pro calibrates the camera, finds 3D points from corresponding points in multiple images using iterative numerical methods, and triangulates the 3D points [1].
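One standard way to realize such a numerical intersection is a linear least-squares fit: find the 3D point X minimizing the summed squared distances to all viewing rays, which leads to the 3x3 normal equations Σᵢ (I − dᵢdᵢᵀ) X = Σᵢ (I − dᵢdᵢᵀ) pᵢ for rays pᵢ + t dᵢ with unit directions dᵢ. The C++ sketch below is an illustration of this idea; the `Ray` type and the solver are assumptions of this sketch, and PhotoModeler's actual iterative method is not documented here:

```cpp
#include <array>
#include <vector>
#include <cmath>
#include <cassert>

using Vec3 = std::array<double, 3>;

struct Ray { Vec3 origin; Vec3 dir; };  // dir must be unit length

// Least-squares intersection of viewing rays: accumulate the normal
// equations and solve the resulting 3x3 system with Cramer's rule
// (A is symmetric positive definite unless all rays are parallel).
Vec3 intersectRays(const std::vector<Ray>& rays) {
    double A[3][3] = {{0,0,0},{0,0,0},{0,0,0}};
    double b[3] = {0, 0, 0};
    for (const Ray& r : rays) {
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < 3; ++j) {
                double m = (i == j ? 1.0 : 0.0) - r.dir[i] * r.dir[j];
                A[i][j] += m;             // A += I - d d^T
                b[i] += m * r.origin[j];  // b += (I - d d^T) p
            }
        }
    }
    auto det3 = [](const double (&M)[3][3]) {
        return M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
             - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
             + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]);
    };
    double d = det3(A);
    Vec3 X{};
    for (int c = 0; c < 3; ++c) {         // Cramer's rule, column by column
        double Ac[3][3];
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                Ac[i][j] = (j == c ? b[i] : A[i][j]);
        X[c] = det3(Ac) / d;
    }
    return X;
}
```

With noisy user-marked correspondences the rays do not meet exactly, and this formulation returns the point closest to all of them simultaneously.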


3. A Survey of Surface Representations

The modern, and to my mind true, theory is that mathematics is the abstract form

of the natural sciences; and that it is valuable as a training of the reasoning

powers not because it is abstract, but because it is a representation of actual

things.

SANFORD, T. H.

Deformable surfaces consist of two parts: a geometric representation of the surface and an evolution law deforming the shape of the surface iteratively [6]. This chapter reviews some geometric surface representations commonly used for shape modeling; the overview is not exhaustive. The surface representations can be categorized into two classes: continuous and discrete representations. Continuous representations offer the advantage that quantities from differential geometry, such as normals and curvatures, can be computed accurately at arbitrary points of the surface, given sufficient smoothness. As discrete surfaces are only known at a finite set of points, in theory differential quantities do not exist on them. However, it is possible to approximate these quantities.

3.1. Triangular meshes

Describing objects with complicated shapes using a single functional expression is inconvenient, as the functional expression can be complex. Therefore, piecewise representations of the object are preferable. The simplest piecewise representation is a triangular mesh, which can represent objects of arbitrary topology. Because of their simplicity, meshes are fairly popular; they are commonly used in graphics and for 3D acquisition with laser scanning technologies.

Triangular meshes can be used to approximate continuous surfaces. Different mesh representations of a free-form surface are compared according to two main criteria: smoothness and


complexity. The smoothness can be measured using discrete curvatures, and the complexity is defined by the number of vertices, edges, and faces of the mesh [6].

In theory, differential quantities do not exist on triangular meshes, but various approaches to approximate them have been developed. Although meshes cannot represent smooth surfaces, it is possible to employ mesh smoothing operators to improve the approximation of a smooth surface. The following sections present some ways of calculating normals and mean curvatures, as well as some commonly used smoothing algorithms.

Approximations of the normal at a vertex of triangular meshes

One direct way of approximating the normal vector N at a vertex P of a triangular mesh is to compute N as a weighted sum of the normals of P's neighboring triangles. If Nj denotes the normals of the k triangles in a 1-ring neighborhood of P and ωj denotes the weight associated with Nj, then N can be calculated as

N = (Σ_{j=1}^k ωj Nj) / ‖Σ_{j=1}^k ωj Nj‖ (3.1)

It is important that the normals Nj have consistent orientation. The weights can be chosen in different ways. The following choices are discussed in [27].

• A simple method is to use the same weight for each triangle normal. This only makes sense if the triangles are evenly distributed, and it offers the advantage that no weights have to be computed. However, the results can be poor for meshes where the distribution of the vertices is irregular.

• An efficient method for determining the weights is to use the inverse area of a triangle as the weight ωj. For triangle number j with vertices P1, P2, and P3,

Nj = (P2 − P1) × (P3 − P1) / ‖(P2 − P1) × (P3 − P1)‖,  ωj = 2 / ‖(P2 − P1) × (P3 − P1)‖. (3.2)

Hence, this method does not need significant extra computation time for ωj.

• The weight for a triangle normal can be the inverse radius of its circumcircle.

• The inverse orthogonal distance of the vertex P to the opposite side of the triangle can be the weight associated with the triangle.


• Angles can be used as weights as well. One can choose either the internal triangle angle at vertex P or the angle between N_j and N_{(j+1) \bmod k} as \omega_j. If \omega_j = \angle(N_j, N_{(j+1) \bmod k}) and the triangles j and (j+1) \bmod k are coplanar, it is necessary to set \omega_j to a constant larger than 0.

All of these methods – except for the first one – have the effect that triangles containing vertices that are further from P are weighted less than those that contain only vertices close to P.
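As a concrete illustration (a sketch of ours, not the thesis implementation), the weighted-normal computation of eqs. (3.1)–(3.2) can be written in Python, assuming an interior vertex whose 1-ring neighbors are given in consistent counter-clockwise order:

```python
import numpy as np

def vertex_normal(P, ring):
    """Approximate the normal at vertex P as the weighted sum of the
    normals of its 1-ring triangles, eq. (3.1), using the inverse
    triangle areas as weights, eq. (3.2)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(ring, dtype=float)
    k = len(Q)
    acc = np.zeros(3)
    for j in range(k):
        # triangle j is (P, Q_j, Q_{j+1}); its cross product has length 2*area
        cross = np.cross(Q[j] - P, Q[(j + 1) % k] - P)
        norm = np.linalg.norm(cross)
        acc += (2.0 / norm) * (cross / norm)   # omega_j * N_j
    return acc / np.linalg.norm(acc)
```

For a planar 1-ring, the result is the plane normal regardless of the chosen weights.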

Another way of approximating the normal at a vertex P is to fit a plane to P and its neighbors using least squares; N can then be approximated by the normal of the plane. This can be done with the algorithm of Bradley and Vickers explained in [27], where the centroid M of P's k neighbors Q_j and a covariance matrix

C = \sum_{j=1}^{k} (Q_j - M)(Q_j - M)^T    (3.3)

are computed. The approximated normal vector is the eigenvector associated with the smallest eigenvalue of C. This way of computing the normal vector at P is less efficient than any of the methods mentioned above, as matrix operations need to be performed.
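A minimal sketch of this plane-fitting variant (function name ours), using the fact that `numpy.linalg.eigh` returns eigenvalues in ascending order:

```python
import numpy as np

def plane_fit_normal(neighbors):
    """Normal approximation by a least-squares plane fit: the eigenvector
    of the covariance matrix (3.3) belonging to the smallest eigenvalue."""
    Q = np.asarray(neighbors, dtype=float)
    M = Q.mean(axis=0)                    # centroid of the neighbors
    C = (Q - M).T @ (Q - M)               # covariance matrix, eq. (3.3)
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    return eigenvectors[:, 0]             # column for the smallest eigenvalue
```

The sign of the eigenvector is arbitrary, so a consistent orientation must be fixed separately.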

Approximations of the mean curvature on triangular meshes

Various approaches to compute the mean curvature H associated with discrete meshes have been proposed. Some approaches approximate H associated with an edge of the mesh [18], others with a vertex of the mesh. If we approximate H for a vertex P, we calculate the integral mean curvature of a vicinity of P and divide it by the area of P's vicinity. The integral mean curvature for an area of a surface is defined as the integral of the mean curvature H over that area.

Wilke [27] uses the fact that the mean curvature at a point Q of a continuous surface is defined as the average of the normal curvatures \kappa_n(Q, \Phi):

H = \frac{1}{2\pi} \int_0^{2\pi} \kappa_n(Q, \Phi) \, d\Phi = \frac{1}{2\pi} \int_0^{2\pi} \left( \kappa_1(Q) \cos^2 \Phi + \kappa_2(Q) \sin^2 \Phi \right) d\Phi    (3.4)

where \kappa_1(Q) and \kappa_2(Q) are the principal curvatures at Q and \Phi is the angle in the tangent plane of Q between the direction of \kappa_1(Q) and that of \kappa_n(Q, \Phi). He concludes that H(P) is proportional to the weighted sum of the angles between N_j and N_{(j+1) \bmod k}, where N_j are the normals of


the k faces in a 1-ring neighborhood of P. Dyn et al. [8] compute H according to the same principle.
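A quick numerical check (ours, not from the thesis) that averaging the normal curvatures \kappa_1 \cos^2 \Phi + \kappa_2 \sin^2 \Phi over all directions indeed yields (\kappa_1 + \kappa_2)/2:

```python
import math

def mean_of_normal_curvatures(k1, k2, steps=10000):
    """Midpoint-rule approximation of (1/2*pi) * integral of
    k1*cos^2(Phi) + k2*sin^2(Phi) over [0, 2*pi), cf. eq. (3.4)."""
    total = 0.0
    for i in range(steps):
        phi = 2.0 * math.pi * i / steps
        total += k1 * math.cos(phi) ** 2 + k2 * math.sin(phi) ** 2
    return total / steps

print(mean_of_normal_curvatures(3.0, 1.0))  # ≈ 2.0 = (3 + 1) / 2
```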

The mean curvature can also be derived using minimal surfaces. A minimal surface minimizes the area A for a given boundary \Gamma \subset \mathbb{R}^3. A surface is a minimal surface if and only if H \equiv 0.

Various numerical methods have been proposed to solve the problem, named after J. Plateau, of finding a minimal surface for a given boundary. Mean curvature flow is a tool to decrease the area of a surface iteratively. A surface s(t) flows by mean curvature if it satisfies the PDE \frac{\partial}{\partial t} s(t) = H(t) N(t), where N is the surface normal. Once the flow stops, it is either trapped in a local minimum or it has reached the global minimum, i.e., the surface is a minimal surface [21].

The relation between mean curvature flow and minimization of the area is described by the equation

2 H N = \lim_{\mathrm{diam}(A) \to 0} \frac{\nabla A}{A}    (3.5)

where A is an infinitesimal area around the surface point P, \mathrm{diam}(A) its diameter, and \nabla A its gradient [19].

The mean curvature normal HN for discrete meshes can be calculated as

HN = \frac{1}{2} \sum_j (\cot \alpha_j + \cot \beta_j)(P - Q_j)    (3.6)

where P is the point where the curvature normal is calculated and Q_j are all of its neighbors in a 1-ring neighborhood. In the two neighboring triangles that share the edge e passing through P and Q_j, \alpha_j and \beta_j are the two angles opposite to e; see Figure 3.1. HN is a vector-valued quantity giving the mean curvature in an area associated with P. Hence, it needs to be divided by an area associated with P to give a measurement of the vertex's mean curvature normal [21]. One popular way of calculating an area associated with a vertex in a triangular mesh is the barycentric area. It is defined as one third of the area of the vertex's neighboring triangles [8].
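An illustrative Python sketch (ours, not the thesis code) of eq. (3.6) divided by the barycentric area, assuming an interior vertex with a closed, ordered 1-ring:

```python
import numpy as np

def mean_curvature_normal(P, ring):
    """Discrete mean curvature normal at vertex P, eq. (3.6), divided by
    the barycentric area (one third of the incident triangle areas [8])."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(ring, dtype=float)
    k = len(Q)

    def cot(a, b):
        # cotangent of the angle between the vectors a and b
        return np.dot(a, b) / np.linalg.norm(np.cross(a, b))

    hn = np.zeros(3)
    area = 0.0
    for j in range(k):
        prv, nxt = Q[(j - 1) % k], Q[(j + 1) % k]
        alpha = cot(P - prv, Q[j] - prv)   # angles opposite the edge (P, Q_j)
        beta = cot(P - nxt, Q[j] - nxt)    # in its two adjacent triangles
        hn += (alpha + beta) * (P - Q[j])
        # one third of the area of triangle (P, Q_j, Q_{j+1})
        area += np.linalg.norm(np.cross(Q[j] - P, nxt - P)) / 6.0
    return 0.5 * hn / area
```

For a planar 1-ring the operator vanishes, matching H = 0 for a flat surface.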

Smoothing a triangular mesh

There exist various approaches to smooth a triangular mesh. In this project, we discuss and

compare three commonly used smoothing operators: Laplacian smoothing, tangential Lapla-

cian smoothing, and smoothing with the curvature normal operator.

The Laplace operator offers a simple way of smoothing a mesh. It moves each vertex towards the centroid of its neighbors. All vertices of the mesh are moved iteratively. For every


Figure 3.1.: Computing the discrete mean curvature

vertex x_i of a mesh X, the Laplace operator, also called the umbrella operator, can be written as

L(x_i) = \frac{1}{m} \sum_{j \in N_1(i)} (x_j - x_i), \quad m = \# N_1(i)    (3.7)

\frac{\partial X}{\partial t} = \lambda L(X)    (3.8)

where N_1(i) is the 1-ring neighborhood of vertex x_i and \lambda is a constant. The differential equation (3.8) can be solved using either an explicit or an implicit Euler scheme. This operator significantly distorts the shape of meshes that contain triangles of different sizes [5].
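An illustrative explicit-Euler sketch of eqs. (3.7)–(3.8); the mesh is given as a vertex list plus per-vertex neighbor index lists, and vertices with an empty neighbor list are kept fixed (our convention, e.g. for boundary vertices):

```python
def laplacian_smooth(points, neighbors, lam=0.5, iterations=10):
    """Explicit-Euler Laplacian (umbrella) smoothing, eqs. (3.7)-(3.8):
    each vertex is moved towards the centroid of its 1-ring neighbors."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        new_pts = []
        for i, p in enumerate(pts):
            ring = neighbors[i]
            if not ring:
                new_pts.append(p[:])       # no neighbors: keep the vertex fixed
                continue
            # L(x_i) = centroid of the neighbors minus x_i, eq. (3.7)
            lap = [sum(pts[j][c] for j in ring) / len(ring) - p[c] for c in range(3)]
            new_pts.append([p[c] + lam * lap[c] for c in range(3)])
        pts = new_pts
    return pts
```

A displaced vertex between two fixed neighbors is pulled back towards their centroid by a factor (1 − lam) per iteration.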

The tangential Laplace operator is defined as the Laplace operator L(x) projected onto the tangent plane at the vertex x:

T(x) = C (L - (L \cdot N) N)    (3.9)

where C is a positive constant and N is the normal vector of the mesh at x. The operator is also applied iteratively over the mesh. A mesh is regular if its nodes are distributed evenly, if the node density is adapted to the shape of the mesh, and if the aspect ratio of its triangles is good. The tangential Laplace operator improves the distribution of the mesh's vertices, i.e., it regularizes the mesh [6].

The mean curvature normal (3.6) can also be used for smoothing a triangular mesh. The normalized operator is defined as

HN_{normalized} = \frac{1}{\sum_j (\cot \alpha_j + \cot \beta_j)} \sum_j (\cot \alpha_j + \cot \beta_j)(P - Q_j)    (3.10)


where P, Q_j, \alpha_j and \beta_j are defined as in (3.6). This operator can be used in explicit integration methods; hence the mesh X can be smoothed according to

X^{n+1} = (I + \lambda \, dt \, HN_{normalized}) X^n    (3.11)

where \lambda \, dt < 1 is a time step. This operator neither smooths out features nor depends on the sizes of the triangles. This means that even non-uniform meshes are not distorted, but only smoothed [5].

Figure 3.2.: Comparison of mesh smoothing. (a) shows the initial mesh, (b) the refined and smoothed mesh with the Laplace operator, (c) the refined and smoothed mesh with the mean curvature normal, and (d) the refined and regularized mesh with the tangential Laplace operator.

3.2. B-Splines

B-Splines are among the most popular surface representations and are used in a wide range of applications, e.g., for various simulation tasks.

The basis functions of a B-Spline of order k consist of piecewise polynomials of order k that are connected with C^{k-2} continuity at the points, or knots, t_0 < t_1 < \ldots < t_{n+k}. The basis functions are defined recursively by

N_{i,1}(t) = \begin{cases} 1, & \text{for } t_i \le t \le t_{i+1}, \\ 0, & \text{otherwise} \end{cases}    (3.12)


N_{i,k}(t) = \frac{t - t_i}{t_{i+k-1} - t_i} N_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} N_{i+1,k-1}(t)    (3.13)

By choosing the multiplicity of a knot to be l, the continuity of the B-Spline is reduced to C^{k-l-1} at the point associated with that knot. Integral tensor-product B-Spline surfaces are defined as

s(u, v) = \sum_{i=0}^{m} \sum_{j=0}^{n} p_{i,j} N_{i,k}(u) N_{j,l}(v)    (3.14)

with (m+1)(n+1) control points p_{i,j}, called de Boor points, that form the de Boor net. N_{i,k}(u) and N_{j,l}(v) are B-Spline basis functions of order k and l respectively. The de Boor points only have a local effect: the point p_{r,s} only influences the surface region defined over the parametric domain u_r \le u \le u_{r+k}, v_s \le v \le v_{s+l}. Each B-Spline surface patch lies in the convex hull of the k \cdot l de Boor points associated with that patch. This is called the convex hull property [15].
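The recursion (3.12)–(3.13) translates directly into code (an illustrative sketch of ours; terms with a zero denominator, caused by repeated knots, are dropped using the usual 0/0 := 0 convention):

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion, eqs. (3.12)-(3.13): value of the B-Spline
    basis function N_{i,k} of order k at parameter t for the given
    nondecreasing knot vector."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0.0:                      # skip terms with zero denominator
        value += (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0.0:
        value += (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return value
```

For a clamped knot vector, the basis functions sum to 1 over the valid parameter range (partition of unity).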

3.3. Non-Uniform Rational B-Splines (NURBS)

Today, NURBS surfaces are the standard for representing surfaces in industrial applications

as they can represent both conics and free-form surfaces. Duan [6] gives a survey of NURBS

surfaces.

NURBS surfaces are a generalization of tensor-product B-Spline surfaces. They are defined as

s(u, v) = \frac{\sum_{i=0}^{m} \sum_{j=0}^{n} p_{i,j} \omega_{i,j} N_{i,k}(u) N_{j,l}(v)}{\sum_{i=0}^{m} \sum_{j=0}^{n} \omega_{i,j} N_{i,k}(u) N_{j,l}(v)}    (3.15)

with (m+1)(n+1) de Boor points p_{i,j} and associated weights \omega_{i,j}. N_{i,k}(u) and N_{j,l}(v) are B-Spline basis functions of order k and l, defined in equations (3.12) and (3.13). The weights associated with the de Boor points offer an additional degree of freedom for modeling the shape of the surface. The higher the weight associated with a control point, the more the surface is attracted to that point. A weight of zero indicates that the associated control point is not taken into consideration and hence does not affect the shape of the NURBS surface at all. NURBS surfaces not only inherit all of the properties of B-Spline surfaces, they also offer additional ones. NURBS surfaces can represent conics as well as parametric polynomial surfaces, and they are invariant under scaling, rotation and translation. This means that it is sufficient to scale, rotate, or translate the de Boor points to obtain a scaled, rotated, or translated surface respectively.
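Evaluating eq. (3.15) then amounts to a weighted average of the de Boor points (an illustrative, unoptimized sketch; function names are ours):

```python
def basis(i, k, t, knots):
    # Cox-de Boor recursion for N_{i,k}(t), cf. eqs. (3.12)-(3.13)
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    v = 0.0
    if knots[i + k - 1] > knots[i]:
        v += (t - knots[i]) / (knots[i + k - 1] - knots[i]) * basis(i, k - 1, t, knots)
    if knots[i + k] > knots[i + 1]:
        v += (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) * basis(i + 1, k - 1, t, knots)
    return v

def nurbs_point(u, v, ctrl, weights, ku, kv, knots_u, knots_v):
    """Evaluate the rational surface of eq. (3.15): a weighted average of
    the de Boor points ctrl[i][j] with weights weights[i][j]."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i in range(len(ctrl)):
        for j in range(len(ctrl[0])):
            b = weights[i][j] * basis(i, ku, u, knots_u) * basis(j, kv, v, knots_v)
            den += b
            for c in range(3):
                num[c] += b * ctrl[i][j][c]
    return [n / den for n in num]
```

With all weights equal to one, the rational surface reduces to the integral B-Spline surface of eq. (3.14).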

However, NURBS surfaces are restricted to a rectangular parametric domain. This makes

it difficult and sometimes impossible to represent an object with arbitrary topology with one


single NURBS. If several NURBS are used to describe one shape, smoothness needs to be

modeled at the boundaries of the patches, which is a difficult problem. Another disadvantage

of NURBS is that the surface needs to be trimmed if the boundary of the shape does not fit the

rectangular domain or if sharp features need to be modeled. This is a time-consuming process.

3.4. Subdivision surfaces

B-Spline and NURBS surfaces offer a useful representation for constructing smooth free-form surfaces. But as discussed in Section 3.3, they are not advantageous for representing shapes of arbitrary topology.

A mesh representation of a surface allows shapes of arbitrary topology to be modeled with one single mesh. However, as the surface is only known at a discrete set of points, a mesh cannot represent a smooth surface.

Subdivision surfaces are meshes that are refined iteratively using a fixed set of subdivision rules; they converge towards a smooth limit surface. They offer a way to combine the advantages of B-Spline surfaces and meshes: like meshes, they are not restricted to a rectangular parametric domain, and they can represent smooth B-Spline surfaces as the limit of iterative refinements. During the past decade, subdivision surfaces have become popular in computer graphics and computer-aided geometric design (CAGD), and they have been applied successfully in various fields such as multi-resolution mesh editing and animation.

The reason subdivision has become so important is that it offers a solution to many problems

that occur in computer graphics. The main advantages of subdivision surfaces discussed in

[6, 29] are:

• Subdivision surfaces converge towards smooth limit surfaces without being restricted to rectangular parameter domains like B-Splines. Hence, it is possible to model an object of arbitrary topology with a single subdivision surface without the need for time-consuming trimming.

• As a result of the iterative refinement, subdivision is closely linked to level-of-detail

rendering. With subdivision surfaces, adaptive refinement is possible whereas NURBS

cannot be locally refined.

• While polygonal meshes are commonly used to represent surfaces in graphics, splines are the most popular way to represent surfaces in CAD. Subdivision surfaces offer a


solution to close this gap, as they can be treated either as a mesh or as the smooth limit surface.

• The meshes that result from subdivision can be used for many numerical simulation

tasks in computer animation.

• Subdivision rules are numerically stable and simple to implement. To increase effi-

ciency, they might even be implemented in hardware.

The idea of subdivision is to iteratively refine a mesh by inserting new points such that it defines a smooth curve or surface as the limit of a sequence of successive refinements. In each iteration step, a fixed set of refinement rules is applied to the mesh. According to [29], subdivision techniques can be classified using three criteria:

• The mesh that is generated can be either triangular or quadrilateral.

• The type of refinement rule can be either vertex insertion or corner cutting. Vertex insertion schemes split each edge of the mesh into two pieces and create four triangles or quadrilaterals out of each existing one. In the case of quadrilaterals, an additional vertex is added per face. Corner cutting schemes create a new face inside each existing face and connect the newly created faces.

• The scheme can be either approximating or interpolating. Approximating schemes refine each vertex in each iteration step, whereas interpolating schemes only insert new refined vertices into the existing mesh. Generally speaking, the quality of interpolating surfaces is lower than the quality of approximating surfaces. Furthermore, interpolating schemes converge more slowly than approximating ones. However, each iteration step of an interpolating scheme is faster, as only the newly inserted vertices need to be refined.

3.4.1. Implemented schemes

For this project, three commonly used quadrisecting vertex insertion schemes for triangular

meshes were implemented: the Loop, modified Loop and modified Butterfly schemes. Quadri-

secting vertex insertion schemes split each triangle into four new triangles by splitting each

existing edge. Figure 3.3 demonstrates this triangle splitting scheme. New vertices are shown as black dots.
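The connectivity part of one quadrisecting step can be sketched as follows (illustrative Python of ours; placing the odd vertices at the edge midpoints is only the topological split — the schemes described below additionally reposition vertices with their subdivision masks):

```python
def quadrisect(vertices, triangles):
    """One quadrisecting refinement step: split every edge at its
    midpoint and replace each triangle by four, as in Figure 3.3."""
    vertices = [list(v) for v in vertices]
    midpoint = {}                 # edge (a, b) -> index of its odd vertex

    def mid(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint:   # create the odd vertex only once per edge
            vertices.append([(vertices[a][c] + vertices[b][c]) / 2.0 for c in range(3)])
            midpoint[key] = len(vertices) - 1
        return midpoint[key]

    new_tris = []
    for a, b, c in triangles:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        new_tris += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_tris
```

Each step multiplies the number of triangles by four and adds one new vertex per edge.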


Figure 3.3.: Quadrisecting a triangle. Reference: [14]

Notation

Valence of a point: The valence of a point is the number of its neighbors in a mesh.

Regular and extraordinary points: Points with a valence of 6 in the interior and a valence of 4 on the boundary are called regular in a triangular scheme. Points of other valence are called extraordinary.

Odd and even points: For any subdivision level, all new points that are created at that level are called odd, while all points inherited from the previous level are called even.

Loop Subdivision

This method of subdividing triangulated surfaces was proposed by Charles Loop [29]. It is an approximating vertex insertion scheme, and the limit surface has been proven to be C^1-continuous at each point up to a valence of 100. The limit surface is C^2-continuous at regular points of triangular meshes. This subdivision scheme is based on a spline basis function called the three-directional quartic box spline, a C^2-continuous function defined on a regular triangular grid. The four refinement rules for odd and even points, both on the boundary and in the interior of the mesh, can be found in [29].

Modified Loop Subdivision

This algorithm is explained in [14]. It is also an approximating scheme, but corner points are interpolated. It is supposed to work better than the Loop scheme on arbitrary surfaces because it takes sharp edges into consideration and does not try to smooth them. In this algorithm, edges are categorized into sharp and smooth using a threshold. Roughly speaking, smooth edges are treated as in the Loop scheme, whereas sharp edges are maintained as they


represent features. To achieve this behavior of the surface, five different classes of vertices are distinguished according to the number of sharp edges incident to a vertex. For each class of vertex, a different subdivision mask is used. The rules on how to categorize a vertex and the rules for refinement can be found in [14].

Modified Butterfly Subdivision

The Butterfly scheme is an interpolating subdivision scheme that leads to a C^1-continuous limit surface for regular vertices of triangular meshes, but not for irregular ones. In general, not all vertices of a triangular mesh are regular. Therefore, the subdivision scheme was modified, and according to [29] the modified Butterfly scheme creates C^1-continuous limit surfaces for vertices of arbitrary valence. As the scheme is interpolating, only the odd points have to be computed and inserted in each iterative refinement. They are refined using three masks: one for interior regular vertices, one for interior extraordinary vertices, and one for boundary vertices.

Comparison

None of the three subdivision schemes that were implemented is generally preferable to the others. Which subdivision scheme is most appropriate depends strongly on the data that is subdivided. The Loop subdivision yields good results for smooth shapes without features. The modified Loop scheme is appropriate for shapes with sharp features, and the Butterfly scheme should be used if the existing mesh needs to be interpolated. For a comparison of the different techniques, refer to Figure 3.4.


Figure 3.4.: Comparison of subdivision. (a) shows the initial mesh, (b) the mesh after Loop subdivision, (c) the mesh after modified Loop subdivision, and (d) the mesh after Butterfly subdivision.


4. A Survey of Dynamic Shape Modeling

It is an error to imagine that evolution signifies a constant tendency to increased

perfection. That process undoubtedly involves a constant remodelling of the or-

ganism in adaptation to new conditions; but it depends on the nature of those

conditions whether the directions of the modifications effected shall be upward or

downward.

HUXLEY, THOMAS H. (1825-1895)

Once we have chosen a geometric surface representation, we need to deform the initial shape in an evolution process using dynamic shape models. The evolution usually consists of two parts: one part deforms the shape according to the photo-consistency criteria and the other part smooths or regularizes the surface. The smoothing or regularization is necessary to improve the numerical stability of the evolution process [6]. As the problem of finding a maximally consistent shape is not only ill-posed but also has uncountably many solutions, it is necessary to restrict the space of possible solutions [16]. Smoothing and regularizing the surface is a way to add constraints to the problem: the solution is then the smoothest or most regular surface that minimizes the photo-consistency criteria.

4.1. D-NURBS

Dynamic NURBS, or D-NURBS, were proposed by Hong Qin and Demetri Terzopoulos in [23]. The main idea is to associate physical quantities such as mass distributions, deformation energies and forces with traditional geometric NURBS surfaces. This allows designers to control the surface using physical properties, which are much more intuitive than the traditional ways of shape modeling, where control points and associated weights are used to achieve desired shapes. With D-NURBS, simulated forces and constraints can be applied to a NURBS



surface interactively to deform the shape. D-NURBS can be used in many CAGD applications,

for example fitting scattered data.

When using D-NURBS, a set of nonlinear differential equations has to be integrated numerically through time, which causes the control points and weights to evolve according to the applied forces. Hence, time is fundamental to this physics-based design technique. It is possible to define tensor-product, swung and triangular D-NURBS surfaces. In this survey, only tensor-product D-NURBS surfaces are discussed in detail.

A tensor-product D-NURBS surface is defined as

s(u, v, t) = \frac{\sum_{i=0}^{m} \sum_{j=0}^{n} p_{i,j}(t) \omega_{i,j}(t) N_{i,k}(u) N_{j,l}(v)}{\sum_{i=0}^{m} \sum_{j=0}^{n} \omega_{i,j}(t) N_{i,k}(u) N_{j,l}(v)}    (4.1)

where p_{i,j}(t) are the (m+1)(n+1) control points and \omega_{i,j}(t) are the weights as functions of time t. The basis functions along the parametric axes are of degree k-1 and l-1 respectively. Hence, there are (m+k+1)(n+l+1) knots. Assuming nondecreasing knot sequences t_0 \le t_1 \le \ldots \le t_{m+k} and s_0 \le s_1 \le \ldots \le s_{n+l} and end knot multiplicities of k and l along the u- and v-axis respectively, the surface interpolates the four control points at the corners of the boundary. The control points and weights are collected into one vector:

p(t) = \left[ p_{0,0}^T \; \omega_{0,0} \; \ldots \; p_{i,j}^T \; \omega_{i,j} \; \ldots \; p_{m,n}^T \; \omega_{m,n} \right]^T    (4.2)

As the components of p are functions of time, the velocity and position of a D-NURBS surface can be expressed as

\dot{s}(u, v, p) = J \dot{p}, \quad s(u, v, p) = J p    (4.3)

where J is the 3 \times (4(m+1)(n+1)) Jacobian matrix of s(u, v, p) with respect to p.

The motion of the D-NURBS surface is governed by a system of second-order nonlinear differential equations derived from Lagrangian dynamics:

M \ddot{p} + D \dot{p} + K p = f_p + g_p    (4.4)

where M is the mass matrix, D the damping matrix and K the stiffness matrix:

M(p) = \int\!\!\int \mu J^T J \, du \, dv    (4.5)

with \mu(u, v) as the mass density function,

D(p) = \int\!\!\int \gamma J^T J \, du \, dv    (4.6)


with \gamma(u, v) as the damping density function, and

K(p) = \int\!\!\int \left( \alpha_{1,1} J_u^T J_u + \alpha_{2,2} J_v^T J_v + \beta_{1,1} J_{uu}^T J_{uu} + \beta_{1,2} J_{uv}^T J_{uv} + \beta_{2,2} J_{vv}^T J_{vv} \right) du \, dv    (4.7)

where subscripts on J denote partial derivatives and \alpha_{i,j}(u, v), \beta_{i,j}(u, v) are elasticity functions. f_p(p) and g_p(p) are generalized forces associated with the model. Note that because of the nonlinearity of the system, the three matrices M, D and K have to be recomputed at each time step. The derivation of the equations of motion of a D-NURBS surface can be found in [25].
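One possible finite-difference time step for eq. (4.4) can be sketched as follows (illustrative only, not the integration scheme of [23, 25]: M, D and K are passed in as already-assembled matrices, whereas the real system recomputes them at every step because it is nonlinear; f stands for f_p + g_p):

```python
import numpy as np

def dnurbs_step(p, p_prev, M, D, K, f, dt):
    """One time step of M p'' + D p' + K p = f, using central
    differences for p'' and a backward-style difference (p_next - p)/dt
    for p'. Solves the resulting linear system for p at time t + dt."""
    A = M / dt**2 + D / dt
    rhs = f + (2.0 * M / dt**2 + D / dt) @ p - (M / dt**2) @ p_prev - K @ p
    return np.linalg.solve(A, rhs)
```

At an equilibrium with K p = f and p = p_prev, the step leaves p unchanged, as expected.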

D-NURBS can be used to obtain a NURBS surface that optimally fits regular or scattered

data. The result is optimal in the sense that it is the smoothest surface to interpolate or approx-

imate the given data.

4.1.1. Evaluation

As this technique operates with NURBS surfaces, it can only describe surfaces that are defined over a rectangular or triangular parametric domain, which causes the following problems:

• If a non-regular parametric domain is used, the surface needs to be trimmed, a time-consuming process.

• We cannot parameterize a surface of arbitrary topology in a regular parametric domain without introducing significant distortions. Hence, it is not possible to represent arbitrary shapes using a single NURBS patch; e.g., closed shapes cannot be modeled with one NURBS patch. If several NURBS patches are used to model an object, smooth surface connections along the boundaries need to be modeled.

4.2. Multi-resolution Shape Recovery

Zhang and Seitz [28] proposed a technique that deforms a generic triangular mesh iteratively using both image-based and geometry-based forces. A generic triangular mesh can converge towards the true scene from arbitrarily far away, i.e., a cube or sphere can be used as the initial shape. Two restrictions apply to the initial shape. First, it needs to contain the true scene in its interior volume, as the mesh can only shrink but not expand during the deformation process. Second, the topology of the initial mesh must match the topology of the true object, as the


deformation maintains the initial topological structure. Assuming that the scene is approximately Lambertian, a consistency metric for a vertex is defined as the variance of the colors of the vertex's projections in the different images. The consistency of the entire mesh is defined as the sum of consistencies over the vertices of the mesh. The aim of the deformation is to minimize

the consistency of the mesh to obtain a surface that matches all the input images in an optimal

way. When deforming the mesh, each point is moved according to the surface flow obtained

by minimizing the consistency. While updating the positions of the points, the sum of squared

distances between the original points and the deformed mesh is minimized. Furthermore, ef-

forts are made to subdivide regions where greater detail is needed and to simplify the mesh

where less detail is sufficient.
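This consistency metric can be sketched as follows (illustrative Python of ours; projecting the vertices into the images, which supplies the color samples, is assumed to be done elsewhere):

```python
def vertex_consistency(colors):
    """Photo-consistency of one vertex under the Lambertian assumption:
    the variance of the colors of the vertex's projections into the
    images in which it is visible (lower = more consistent)."""
    n = len(colors)
    channels = len(colors[0])
    var = 0.0
    for c in range(channels):
        mean = sum(col[c] for col in colors) / n
        var += sum((col[c] - mean) ** 2 for col in colors) / n
    return var

def mesh_consistency(per_vertex_colors):
    """Consistency of the whole mesh: the sum over its vertices."""
    return sum(vertex_consistency(cols) for cols in per_vertex_colors)
```

A vertex whose projections all have the same color has consistency 0; any color disagreement increases the value.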

4.2.1. Evaluation

With this technique, a mesh is able to converge to the true scene from arbitrarily far away, an object with adapted resolution is obtained, and sharp corners and edges are modeled correctly. As the initial mesh is only required to contain the true scene in its interior and to match the topology of the true scene, little a priori knowledge of the object is required. If an object with regions of both high and low curvature is created using this technique, the node density of the output mesh is only high in the regions of high curvature. Therefore, the complexity of the model is relatively low and the model can be rendered quickly. This technique creates a photo-realistic and compact output model that is ideal for rendering with graphics hardware, and its main application is multi-resolution rendering. The accurate geometry of the object is not reconstructed, however. Therefore, this approach is suitable for scenes that only need to be displayed realistically; geometric measurements are not possible using the output surface [28].

4.3. Energy-minimizing Snake

Esteban and Schmitt [9, 10] proposed an approach to obtain high-quality 3D objects with texture from a set of calibrated images. The idea is to use a classical snake approach that incorporates both silhouette and texture information of the model to deform an initial triangular mesh. The classical snake approach was first proposed as a technique to find maximal changes of intensity in an image plane using deformable contours. It was then extended to recover surfaces in three-dimensional space. The aim is to find the surface S \subset \mathbb{R}^3 that minimizes a


global energy

E(S) = E_{tex}(S) + E_{sil}(S) + E_{int}(S)    (4.8)

where E_{tex}(S) is a term that describes the texture of the object, E_{sil}(S) is the energy related to the silhouettes, and E_{int}(S) is a term that tries to smooth the surface. E_{tex}(S) and E_{sil}(S) are both energies external to the geometric surface that are related to the sequence of images, and E_{int}(S) is an internal energy related only to the smoothness and regularity of the mesh. If a time variable t is introduced, the minimization of equation (4.8) can be achieved by solving the discrete differential equation:

S^{k+1} = S^k + \Delta t \left( F_{tex}(S^k) + F_{sil}(S^k) + F_{int}(S^k) \right)    (4.9)

where F_{tex}(S) = \nabla E_{tex}(S), F_{sil}(S) = \nabla E_{sil}(S), F_{int}(S) = \nabla E_{int}(S), and \Delta t is the time step. The solution of equation (4.9) is a surface where the internal and external forces cancel each other, i.e., the surface is in equilibrium. To use equation (4.9), an initial triangular mesh S^0, the force resulting from the texture F_{tex}(S), the force driven by the silhouettes F_{sil}(S), and the internal force that controls the smoothness of the surface F_{int}(S) need to be specified.
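One explicit evolution step of eq. (4.9) can be sketched as follows (illustrative; the three force terms are passed in as callables mapping a vertex list to per-vertex displacement vectors, their actual definitions being the subject of the following paragraphs):

```python
def snake_step(S, forces, dt):
    """One explicit step of eq. (4.9): S_{k+1} = S_k + dt * (sum of the
    texture, silhouette and internal forces evaluated on S_k)."""
    total = [[0.0, 0.0, 0.0] for _ in S]
    for F in forces:
        for i, f in enumerate(F(S)):     # accumulate each force field
            for c in range(3):
                total[i][c] += f[c]
    return [[S[i][c] + dt * total[i][c] for c in range(3)] for i in range(len(S))]
```

Iterating this step until the accumulated force vanishes yields the equilibrium surface described in the text.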

The initial surface S^0 needs to be close to the true surface to guarantee convergence of the snake evolution. It needs to have the same topology as the true object, because the evolution process maintains the initial topological structure. The geometric distance between the initial surface and the true surface should be small to keep the number of iteration steps low. The initial surface chosen in [10] is a surface that lies between the real surface and its convex hull – the visual hull of the object. The visual hull is the intersection of all the cones that contain the true object obtained from the images; it can contain an arbitrary number of holes. However, the topologies of the visual hull and of the true surface are not necessarily identical. If the topology of the visual hull, which is used as the initial surface, is wrong, this error is not corrected during the evolution process.

The aim of the texture force F_{tex}(S) is to maximize the consistency of the surface's projections in all images. Instead of using pointwise radiometric comparisons based on photo-consistency as in [7, 28], this approach uses cross-correlation measures to determine the coherence. The cross-correlation C(p_1, p_2) measures the coherence between two pixels p_1 and p_2 in two images by comparing the distributions of intensity in the neighborhoods of the two pixels. The 3D geometry is then reconstructed by maximizing this criterion for a set of images. To avoid local maxima, the optimization tests all configurations that can occur. Furthermore, the criterion is accumulated in a 3D grid to make the algorithm more robust in the


presence of highlights. For a pixel p1 in an image I1, the optic ray that projected a 3D point

to this pixel can be reconstructed if the camera calibration parameters are known (see Chapter

2). The optic ray can then be projected to another image I2; the projection is called the epipo-

lar line. Now, a correlation curve between p1 and I2 can be computed by determining the

cross-correlations between p1 and every pixel on the epipolar line in I2. For pixel p1, a set of

correlation curves can be computed by using all the images and the 3D point associated with

p1 can be determined as the point that maximizes the cross-correlations. The main idea of the

algorithm used to define the texture force is to perform this computation for each pixel in each

image and to cumulate the results in a voxel grid. The algorithm is more efficient because

redundant computations are only performed once. The texture force could now be defined as

the discrete gradient of the voxel grid, but as this is a local force, the texture force is defined

by a gradient vector flow (GVF) field, which we can interpret as the gradient smoothed with

the Laplacian operator.
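The cross-correlation measure on which the texture force is built can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: the two patches are assumed to be equally sized intensity neighborhoods, e.g. of p1 and of a candidate pixel on the epipolar line.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized cross-correlation between two equally sized intensity patches.
// Returns a value in [-1, 1]; 1 means the intensity distributions match.
double crossCorrelation(const std::vector<double>& a,
                        const std::vector<double>& b)
{
    const std::size_t n = a.size();
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n; meanB /= n;

    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - meanA, db = b[i] - meanB;
        num  += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0; // flat patch: undefined
    return num / std::sqrt(varA * varB);
}
```

Because the mean and variance of each patch are divided out, the measure is invariant to affine intensity changes, which is what makes it stable under highlights.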

The silhouette force F_sil(S) is the force that makes the surface match the original silhouettes

of the images. If only this force is used, the model will converge towards its visual hull. In

reality, the entire true surface matches the silhouette, but there are points, namely the object’s

concavities, that are occluded by parts of the surface and do not match the silhouette. This

must be taken into consideration when defining the silhouette force. Therefore, the silhouette

force consists of two components, one that measures how well the silhouette is matched and

one that determines the strength of the force that should be applied. The shortest distance

between a 3D vertex and the visual hull is used to measure the first component. If part of the

surface already matches a particular silhouette, the rest of the surface is not affected by that

silhouette. For the second component, it is distinguished whether a vertex is inside or outside

the visual hull. If a vertex is outside the visual hull, the maximum force will be applied.

For the vertices inside the visual hull, the force is inverse to the distance of the vertex to the

silhouette. This allows the detachment of vertices forming concavities from the visual hull.

The internal force F_int(S) regularizes and smoothes the mesh. The application of

this force results in a smooth limit surface; if only this force is used, the mesh collapses. In

[9], the internal force is defined as the Laplacian regularization (see equation (3.7)). In [10], the

internal force is a linear combination of the Laplacian regularization and the biharmonic

operator, defined as L²(x_i) = (1/m) ∑_{j∈N₁(i)} L(x_j) − L(x_i) with m = #N₁(i), where L, x_i, and N₁ are

defined as in equation (3.7).

Finally, if v_i denotes a vertex of the mesh, the kth iteration of the snake evolution can be


written as

v_i^(k+1) = v_i^k + ∆t (F_tex(v_i^k) + β F_sil(v_i^k) + γ F_int(v_i^k))    (4.10)

where β and γ are weights that control the strengths of the silhouette force and the

regularization of the mesh relative to the texture force. The iteration given by (4.10) is stopped

once steady-state of all vertices is reached. After the evolution is finished, the mesh is refined

and a particle-based approach is used to obtain a good texture map.
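The iteration (4.10) amounts to one explicit Euler step per vertex. A minimal sketch, with the three forces passed as hypothetical callbacks standing in for the texture, silhouette, and internal forces:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

using Vec3 = std::array<double, 3>;

// One iteration of the snake evolution (4.10) for a single vertex:
// v^{k+1} = v^k + dt * (F_tex(v) + beta * F_sil(v) + gamma * F_int(v)).
Vec3 snakeStep(const Vec3& v, double dt, double beta, double gamma,
               const std::function<Vec3(const Vec3&)>& Ftex,
               const std::function<Vec3(const Vec3&)>& Fsil,
               const std::function<Vec3(const Vec3&)>& Fint)
{
    const Vec3 t = Ftex(v), s = Fsil(v), n = Fint(v);
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = v[i] + dt * (t[i] + beta * s[i] + gamma * n[i]);
    return out;
}
```

Applying this step to every vertex and repeating until no vertex moves more than a tolerance reproduces the steady-state stopping criterion of the text.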

4.3.1. Evaluation

The advantage of this algorithm is its stability in the presence of image noise. As the approach

uses cross-correlation measures to determine coherence, it remains stable in the presence of

highlights. Furthermore, the texture force is calculated using a volumetric approach, which

makes it a reliable force. The main disadvantage is that this algorithm is relatively slow com-

pared to the other ones discussed in this chapter as both the computation of the visual hull and

the texture force are time consuming. The running times of the algorithm given in [10, 9] are

on the order of several hours.

4.4. PDE-based Deformable Surfaces

Duan et al. [6, 7] proposed a technique that reconstructs 3D models not only from a set of

calibrated images, but also from volumetric data and 3D point clouds. An initial triangular

mesh, which can be either inside or outside of the true object, is deformed to capture both the

true object’s geometric boundary and its topological structure. When a model is reconstructed

from a set of 2D images, the technique automatically selects the best views and

refines the model if additional images are included at any stage of the deformation process.

Furthermore, the technique is able to model sharp features correctly as the mesh is refined

adaptively in regions of high curvature. The adaptive refinement also has the effect that the

output model has adaptive resolution, i.e., the mesh is denser in regions of sharp features than

in mainly planar regions.

The behavior of the deformable model is governed by the general weighted minimal surface

flow, a system of PDEs, derived in [2, 3]. Caselles et al. [2] examined how to find an object

in a 2D image I by segmentation. They showed that the energy-based models in 2D and

the solution of minimizing the weighted length L_R = ∫ g(I) ds, where g(I) is a non-negative,

monotonically decreasing function acting as an edge detector and s is the Euclidean arc-length,

are related. An initial curve C(0) = C₀ is deformed towards a minimum of L_R by

∂C(t)/∂t = g(I) κ N − (∇g · N) N

where κ is the Euclidean curvature of C, N is the inward unit normal of C, and t is a time step.

Once L_R has reached its global minimum, the contours in I are detected and the stopping

factor g(I) equals 0. Caselles et al. [3] discuss the problem of object segmentation

in 3D images, i.e., reconstructing 3D models from volumetric data. The idea of minimizing

a weighted length was extended to 3D and the result is that an object in a 3D image can be

segmented by minimizing the weighted area

A_R := ∫∫ g(I) da    (4.11)

where da is the Euclidean element of area and I is a 3D image. In analogy to the surface

that minimizes ∫∫ da, the surface that minimizes A_R is called a minimal surface. The Euler-

Lagrange equation of (4.11) results in the gradient descent flow

∂S/∂t = (g H − ∇g · N) N,    S(0) = S₀    (4.12)

where S₀ is the initial surface, S is the surface sought, H is its mean curvature, and N is its

unit normal. This flow is also called general weighted minimal surface flow; it is derived and

used as the basis of the deformation in [6]. An additional constant velocity v, which can be

interpreted as an initial speed of the surface, is added to the deformation to prevent the model

from getting stuck in local minima. The PDE that governs the surface deformation becomes

∂S/∂t = (g(v + H) − ∇g · N) N = F(p, t),    S(0) = S₀    (4.13)

where p is the current position of the surface and F can be interpreted as the speed of the

surface. As the technique allows three kinds of input data for reconstructing an object, namely

volumetric data, 3D point clouds, and multiple images, different definitions of the function g

are provided. In this project, we will only discuss the function g used for multiple images. It

is defined as the photo consistency

g = σ² = σ_R² + σ_G² + σ_B²    (4.14)

with σ_C² = (1/(N−1)) (∑_{i=1}^N C_i² − (1/N) (∑_{i=1}^N C_i)²), where C has to be substituted by one of the

color channels R, G, or B and N is the number of selected views¹. The color vectors R_i, G_i,

¹The equation given for the variance in [6, 7] is σ_C² = (1/(N−1)) ∑_{i=1}^N C_i² − ((1/N) ∑_{i=1}^N C_i)². As this is not

the standard formula for the variance, we corrected the formula.


and B_i for a 3D vertex P of the mesh are obtained by projecting a neighboring patch of P

into the image I_i. The neighboring patch is chosen on the tangent plane of P and it needs to

be large enough to represent local features to increase the numerical robustness of the photo

consistency, but small compared to the distance from the object to the camera. The area of the

projected neighborhood in I_i is used to measure the quality of I_i for the reconstruction of P;

the smaller the area, the more degenerate the view. For the computation of the vertex's photo

consistency, the most degenerate views are not used; only the best N views are selected.

The best views are found as the ones where both the vertex is visible and the projection of the

neighborhood of the vertex has maximal area. The visibility check is performed with OpenGL

library functions, which offers the advantage of being fast.
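The photo consistency (4.14), with the corrected variance formula from the footnote above, can be sketched per color channel as follows. The sketch assumes each argument holds one channel's color samples of a vertex over the N selected views.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sample variance of one color channel over N views, using the corrected
// formula sigma_C^2 = 1/(N-1) * (sum C_i^2 - (1/N) * (sum C_i)^2).
double channelVariance(const std::vector<double>& c)
{
    const double n = static_cast<double>(c.size());
    double sum = 0.0, sumSq = 0.0;
    for (double ci : c) { sum += ci; sumSq += ci * ci; }
    return (sumSq - sum * sum / n) / (n - 1.0);
}

// Photo consistency (4.14): g = sigma_R^2 + sigma_G^2 + sigma_B^2.
double photoConsistency(const std::vector<double>& r,
                        const std::vector<double>& g,
                        const std::vector<double>& b)
{
    return channelVariance(r) + channelVariance(g) + channelVariance(b);
}
```

For a point on a perfectly Lambertian surface, all views see the same color, so g evaluates to 0, which is exactly the theoretical stopping condition of the flow.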

To use equation (4.13), the discrete normal and mean curvature of a mesh need to be cal-

culated. The normal at a vertex P of the mesh is approximated by computing the best-fitting

plane for P and some neighboring points, as explained in Section 3.1. The mean curvature at

P is computed using equation (3.10). Equation (4.13) is solved iteratively using the forward

Euler scheme

S(P, t + ∆t) = S(P, t) + ∆t F(P, t),    S(P, 0) = S₀(P)    (4.15)

for each vertex P of the mesh. When solving a PDE using this scheme, it is important that

the time step ∆t is constrained by the Courant-Friedrichs-Lewy (CFL) condition that was first

published in [4]. It states that the time step used to solve the PDE in an explicit scheme is

restricted by the discretization of space divided by the maximal speed with which the PDE

moves. Duan [6] gives the time step that results from the CFL condition as

∆t ≤ m_E / M_F    (4.16)

where m_E denotes the shortest edge length of the mesh and M_F the magnitude of the maximal

speed vector F. This restricts the speed of the deformation by the minimal detail of the mesh,

which is the shortest edge length. This means that prior to moving the mesh, the speed vectors

for each vertex need to be computed to obtain a valid time step.
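The CFL-constrained step (4.16) and the forward Euler update (4.15) can be sketched together. In this sketch the mesh connectivity is reduced to a list of edge lengths, and the speed vectors F are assumed to be precomputed for all vertices, as the text requires.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// CFL-constrained time step (4.16): dt <= m_E / M_F, where m_E is the
// shortest edge length and M_F the largest speed magnitude |F|.
double cflTimeStep(const std::vector<double>& edgeLengths,
                   const std::vector<Vec3>& speeds)
{
    const double mE = *std::min_element(edgeLengths.begin(), edgeLengths.end());
    double MF = 0.0;
    for (const Vec3& f : speeds)
        MF = std::max(MF, std::sqrt(f[0]*f[0] + f[1]*f[1] + f[2]*f[2]));
    return mE / MF;
}

// One forward Euler step (4.15) for every vertex: P <- P + dt * F(P).
void eulerStep(std::vector<Vec3>& vertices,
               const std::vector<Vec3>& speeds, double dt)
{
    for (std::size_t i = 0; i < vertices.size(); ++i)
        for (int d = 0; d < 3; ++d)
            vertices[i][d] += dt * speeds[i][d];
}
```

Note that `cflTimeStep` must be re-evaluated in every iteration: both the shortest edge and the maximal speed change as the mesh deforms and is refined.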

If we deform the model using only equation (4.15), each vertex of the mesh is moved ac-

cording to the photo consistency criteria, but the mesh is neither refined nor regularized and

the topology of the shape is not changed. Therefore, additional steps for mesh refinement,

mesh regularization, and topology adaptation are proposed in [6, 7]. The mesh is refined both

globally and locally to obtain a multi-resolution mesh with good accuracy. At the beginning of

the evolution process, the mesh is refined and smoothed globally several times using the Loop

subdivision scheme (see Section 3.4.1). After many iterations, it is no longer necessary

to refine the entire mesh; adaptive refinement is sufficient. If a shape is recovered from

3D datasets, it is possible to connect the adaptive refinement directly to the accuracy of the

reconstruction by computing distances between the given 3D data and the mesh. As this is not

possible when multiple 2D images are used as input data, the mesh is subdivided adaptively

in regions where the mean curvature is higher than a user-defined threshold. To improve the

stability of the evolution process, the mesh is regularized using the tangential Laplace oper-

ator (3.9) in each iteration step, which results in an even node distribution. Furthermore, the

mesh is regularized using three operations: edge split and edge collapse keep edge lengths

in a defined range and edge swap keeps the valence of each vertex as close to 6 as possible.

The result of these operators is that the mesh’s node density and its triangles’ aspect ratio are

good. To capture arbitrary topology, the mesh’s topological structure must change during the

evolution process. This can be achieved with two topological operations: merging and split-

ting. A new approach for topology merging called ”lazy merging” is presented and performed

once after the deformation stopped. The main idea is to deactivate two vertices that are close

to each other and that are not adjacent to each other. Topology splitting is performed during

the evolution whenever necessary. If several faces of the mesh converge to a single point, the

mesh needs to split at that location.

4.4.1. Evaluation

This technique can only handle closed surfaces as the differential quantities N and H do not

exist for points on the boundary of a surface. However, the technique offers several advan-

tages. It is possible to start with a coarse initial mesh, e.g., a cube, as both the shape's topology

is retrieved and the mesh is refined adaptively to capture details. Furthermore, the evolution

process has a theoretical stopping term, the function g, which equals 0 once the boundary of

the shape is reached. The runtime of the algorithm given in [7] is on the order of minutes; it is

claimed that the algorithm might achieve interactive running times soon.


5. Interpretation and Implementation

Everything should be made as simple as possible, but not simpler.

EINSTEIN, ALBERT (1879-1955)

5.1. Preliminaries

The aim of this project is to reconstruct three-dimensional smooth surfaces from a set of cali-

brated images and an initial model. We obtain the camera calibration parameters and a coarse

initial triangular mesh model with a texture map from the software package PhotoModeler

5 Pro. First, the camera is calibrated intrinsically by taking multiple images of a predefined

pattern known by the software. Afterwards, a set of images is loaded to the software and

corresponding points on the images are defined and used to obtain both the camera’s extrinsic

parameters at the time of exposure and an initial mesh model. The quality of the extrinsic

parameters depends strongly on the accuracy of the correspondences. We take images that

contain both the object to reconstruct and a pattern; the pattern can then be used to obtain ac-

curate extrinsic parameters. Both a texture-mapped triangular mesh model and the calibration

parameters can be exported from PhotoModeler 5 Pro, as VRML 2.0 and ASCII files respec-

tively [1]. This information along with the original images is sufficient to start the deformation

process. In the following, two assumptions are made:

• The coarse initial model obtained by user assistance has the same topology as the true

object. Hence, the model’s topology is not changed during the deformation process.

The coarse initial mesh that was obtained using PhotoModeler 5 Pro can be subdivided

interactively by the user with one of the subdivision schemes discussed in Section 3.4.1

to obtain a smoother and denser initial mesh for the deformation.

• The surface to reconstruct is Lambertian, which means that a 3D point has consistent

colors in all images.


The deformation model is implemented using C++ in a Windows environment. The render-

ing is performed with the application program interface (API) OpenGL. To implement the

deformation, we need to choose a surface representation and a deformable model.

5.2. Comparison of dynamic shape models

If the D-NURBS explored in Section 4.1 are used as the basis for the deformation, it is necessary

to fit an initial NURBS surface to an unorganized set of 3D points. Either D-NURBS or

other approximation methods for unorganized data points, see [20], can be used to interpolate

or approximate the scattered data. To approximate or interpolate the points, the parameters

(u_i, v_i), 1 ≤ i ≤ k, of each of the k 3D points need to be known. The parameterization has a big

impact on the shape and quality of the result. Various approaches have been proposed to obtain

good parameterizations of 3D points, e.g., shape-preserving parameterizations [11] and most

isometric parameterizations [17]. Once an initial surface has been created, it can be deformed.

When we deform an initial surface consisting of more than one NURBS patch, we need to

model smoothness along patch boundaries in each deformation step. When we deform an

initial NURBS patch with trimmed boundary, we need to update the trimming curve in each

deformation step. These disadvantages make D-NURBS inappropriate for this project.

As all the other techniques surveyed in Chapter 4 operate on a triangular mesh, they do not

suffer from the disadvantage of being restricted to a regular parametric domain. The mesh

obtained with PhotoModeler 5 Pro is a possible initial shape for the deformation. The multi-

resolution technique discussed in Section 4.2 cannot be used in this project, despite the many

advantages it offers, as the geometry of the object is not retrieved correctly.

Both the energy-based deformable snake model (Section 4.3) and the PDE-based deformable

model (Section 4.4) are appropriate methods for solving the problem stated in Chapter 1. The

energy-based snake depends on a 3D grid space to compute the external texture force and the

initial model; computations based on volumetric approaches are time-consuming. Further-

more, the visual hull is needed to compute the silhouette force. The computation of the visual

hull requires finding the 2D silhouettes of the object in each image [9]. The mesh does not

get refined during the deformation process, i.e., the technique does not create a mesh with

adapted resolution like the PDE-based approach. Furthermore, the energy-based algorithm

is much slower than the PDE-based one. As the PDE-based approach creates a mesh with

adapted resolution, does not need to compute the visual hull of the object, and is faster than


the energy-based deformable snake model, it forms the basis of the deformation implemented

in this project. However, the method is modified to optimally fit the given problem.

5.3. Algorithm

We start from the initial model obtained from PhotoModeler 5 Pro (see Section 5.1) and de-

form it according to the general weighted minimal surface flow given by equation (4.13). The

initial mesh is assumed to have the same topology as the final result; therefore, no topology

modification is performed in the evolution process. However, it is necessary to maintain a

smooth and regular mesh throughout the deformation to help improve the numerical stability

of the process.

5.3.1. Mesh smoothing and adaptive refinement

Duan et al. [6, 7] refine the mesh globally using Loop subdivision at the beginning of the

evolution process. This approach not only refines the model, but also keeps it smooth helping

the shape to converge towards the final result. As discussed in Section 3.4.1, this subdivision

does not take sharp features into consideration, but smoothes them. Furthermore, the number

of triangles increases quickly – it quadruples in each iteration step – and this slows the

deformation process down. Therefore, subdivision is not used in this deformation algorithm.

It is assumed that the user subdivides the mesh interactively using one of the three subdivi-

sion schemes discussed in Section 3.4.1 until it is dense enough to roughly capture the global

shape before the deformation starts. As the mesh is not subdivided iteratively, it needs to be

smoothed to maintain numerical stability. The smoothing is performed with the curvature nor-

mal operator given by equation (3.11), which offers the advantage that sharp edges and corner

points are preserved. As proposed by Duan et al. [6, 7], the mesh is locally refined in regions

of high curvature whenever necessary, which helps to capture details and sharp features cor-

rectly. After each deformation step, each triangle adjacent to a vertex with mean curvature

higher than a user-defined threshold is quadrisected. Afterwards, the mesh is connected to

achieve correct topology. The adaptive refinement in a neighborhood of a vertex P is shown in

Figure 5.1, where black dots denote new vertices.
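The refinement criterion described above, quadrisecting every triangle adjacent to a vertex whose mean curvature exceeds the threshold, can be sketched as a selection pass; the actual quadrisection and reconnection of the mesh are omitted.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Select the triangles to quadrisect after a deformation step: every
// triangle adjacent to a vertex whose mean curvature exceeds a
// user-defined threshold is marked for refinement.
std::vector<std::size_t> trianglesToRefine(
    const std::vector<std::array<int, 3>>& triangles,
    const std::vector<double>& meanCurvature, double threshold)
{
    std::vector<std::size_t> selected;
    for (std::size_t t = 0; t < triangles.size(); ++t)
        for (int v : triangles[t])
            if (meanCurvature[v] > threshold) {
                selected.push_back(t);
                break;
            }
    return selected;
}
```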

Page 42: NIVERISTY OF PPLIED CIENCES TUTTGART ...morpheo.inrialpes.fr/~wuhrer/data/uploads/publications/...Abstract Image-based modeling techniques construct digital shape models from 2D images

CHAPTER 5. INTERPRETATION AND IMPLEMENTATION 34

Figure 5.1.: Adaptive refinement of a triangle.

5.3.2. Mesh regularization

It is important that the mesh is regular, i.e., has an even node distribution, a density of nodes

that is adapted to the shape of the model, and a good aspect ratio of triangles, to improve

numerical stability of the deformation process (see Section 4.4).

To create and maintain an even node distribution, we use the regularization term suggested

by Duan et al. [6, 7]: the tangential Laplace operator given by equation (3.9).
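Equation (3.9) is not reproduced in this section; assuming the usual umbrella form of the tangential Laplace operator, a sketch of the regularization vector for one vertex is:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Tangential Laplacian for a vertex x with unit normal n (assumed form):
// take the umbrella vector L = mean(neighbors) - x and remove its component
// along n, so the vertex slides within the surface without shrinking it.
Vec3 tangentialLaplace(const Vec3& x, const std::vector<Vec3>& neighbors,
                       const Vec3& n)
{
    Vec3 L{0.0, 0.0, 0.0};
    for (const Vec3& q : neighbors)
        for (int d = 0; d < 3; ++d) L[d] += q[d];
    double dot = 0.0;
    for (int d = 0; d < 3; ++d) {
        L[d] = L[d] / neighbors.size() - x[d];
        dot += L[d] * n[d];
    }
    for (int d = 0; d < 3; ++d) L[d] -= dot * n[d]; // project out normal part
    return L;
}
```

Restricting the motion to the tangent plane is the design point: a plain Laplacian would also pull vertices inward and make the mesh collapse, as noted for the internal force in Chapter 4.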

As the mesh's edge lengths are generally not of the same order of magnitude, the mesh's

node density is regularized before the deformation process starts. The regularization is per-

formed for the entire mesh; it consists of two steps: edge split and edge collapse.

1. Edge split splits every edge longer than an upper bound b_u; triangles adjacent to that

edge are divided.

2. Edge collapse replaces every edge shorter than a lower bound b_l by a vertex; triangles

adjacent to that edge are deleted. The texture information created by PhotoModeler 5

Pro is mapped on triangles and not on points, i.e., one single 3D point can correspond

to more than one 2D point on the texture map, but one 3D triangle is mapped to exactly

one 2D triangle on the texture map. If a triangle is deleted, the texture information cannot

be updated easily. This results in missing parts of the texture. Hence, the edge collapse

operator damages the texture of the object.

In theory, it is possible that edge split creates edges shorter than b_l and that edge collapse

results in edges longer than b_u, i.e., the two operators need to be used in an iterative process

to guarantee b_l ≤ |e| ≤ b_u for every edge e. However, this problem rarely occurs in practice,

and therefore, the two operators are used only once.
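The single-pass regularization above can be sketched as a per-edge decision. b_l and b_u are the bounds from the text; the actual mesh surgery (dividing or deleting the adjacent triangles and updating the texture map) is omitted.

```cpp
#include <cassert>
#include <string>

// Classify an edge for the one-pass regularization: edges longer than the
// upper bound bu are split, edges shorter than the lower bound bl are
// collapsed, and all others are left untouched.
std::string edgeOperation(double length, double bl, double bu)
{
    if (length > bu) return "split";
    if (length < bl) return "collapse";
    return "keep";
}
```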

Edge swapping improves the aspect ratio of the mesh's triangles and makes the valence

of each vertex as close to 6 as possible. This is necessary if Loop subdivision is used as only

Page 43: NIVERISTY OF PPLIED CIENCES TUTTGART ...morpheo.inrialpes.fr/~wuhrer/data/uploads/publications/...Abstract Image-based modeling techniques construct digital shape models from 2D images

CHAPTER 5. INTERPRETATION AND IMPLEMENTATION 35

vertices of valence 6 converge towards a C²-continuous point on the limit surface. As the

Loop subdivision scheme is not used in the deformation process, it is not required that each

vertex of the mesh has a valence as close to 6 as possible. Therefore, the algorithm does not

swap edges.

5.3.3. Deformation of the mesh

The deformation behavior of the model is governed by the general weighted minimal

surface flow

∂S/∂t = (g(v + H) − ∇g · N) N = F(p, t),    S(0) = S₀

that was discussed in Section 4.4, equation (4.13). To solve this system of PDEs using a

numerical approach, we need to evaluate the discrete normals N, the discrete mean curvatures

H, the photo consistency g, and the gradient of the photo consistency function ∇g for each

vertex of the triangular mesh. We will discuss how to evaluate these quantities and how to

solve the PDE in this section. Since the photo consistency and its gradient make the main

contribution to the surface deformation, they are discussed in detail.

The discrete normal vector N for an interior vertex P of the mesh is computed using equa-

tions (3.1) and (3.2). This approach was chosen over computing the normal of the best-fitting

plane of P's neighbors with equation (3.3), used by Duan et al. [6, 7], because it is efficient

and yields good results for meshes with evenly distributed nodes [27]. As the mesh’s node

distribution is kept regular throughout the deformation process by using the tangential Laplace

operator, this efficient way of computing discrete normals is appropriate. Figure 5.2 (a) and

(c) show examples of meshes with discrete normals obtained with equations (3.1) and (3.2).

One can see that even for meshes with different triangle sizes or sharp features, the normals

correspond to what is intuitively expected.

The discrete mean curvature H for an interior vertex P of the mesh is computed with equa-

tion (3.10), as proposed by Duan et al. [6, 7]. Two meshes and their vector-valued discrete mean

curvature normals are displayed in Figure 5.2 (b) and (d). One can see that the lengths of the

displayed vectors, which correspond to the discrete mean curvatures, are shorter in mainly

planar regions than in regions of high curvature. This behavior is intuitive as it corresponds

to the properties of the mean curvature on continuous surfaces. For boundary vertices, the

quantities N and H do not exist. Therefore, we do not move boundary vertices during the

deformation process. Hence, the approach produces best results for closed triangular meshes.



Figure 5.2.: (a) and (c) show the normals of the bag model and the head of Nefertiti respec-

tively, (b) and (d) show their mean curvature normals.

Photo consistency

As the surface deformation given by equation (4.13) is a gradient descent method, the photo

consistency g(P) and its gradient ∇g(P) of a 3D point P make the main contribution to the

deformation. Therefore, these factors need to be examined carefully and computed accurately,

yet efficiently. As mentioned in Section 4.4, the photo consistency function g that is used

to deform the initial surface needs to be non-negative and monotonically decreasing. The

problem of finding a 3-dimensional photo-consistent shape S for given images requires that

g is minimal at each point on S and gradually increases as the distance from S increases. If

the function g behaves like this, the gradient ∇g offers reliable information and can be used

in equation (4.13) to find S.
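The text does not specify how ∇g is discretized; one plausible sketch uses central finite differences, with g passed as a callback since evaluating it involves projecting the point into the input images. The step size h is a hypothetical parameter.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

using Vec3 = std::array<double, 3>;

// Gradient of the photo consistency g at a 3D point p, approximated by
// central finite differences with step h in each coordinate direction.
Vec3 gradG(const std::function<double(const Vec3&)>& g,
           const Vec3& p, double h)
{
    Vec3 grad{};
    for (int d = 0; d < 3; ++d) {
        Vec3 lo = p, hi = p;
        lo[d] -= h; hi[d] += h;
        grad[d] = (g(hi) - g(lo)) / (2.0 * h);
    }
    return grad;
}
```

Since g gradually increases with the distance from the true surface, this gradient points away from the surface, and the flow (4.13) moves the model against it.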

Let I denote a sequence of calibrated images of a perfectly Lambertian surface, I_P the k

images where P is visible, and P_i the projections of P to I_P. The main idea for measuring

the photo consistency of P is that the P_i have consistent colors if P is a point on the surface to

be reconstructed. However, real-world surfaces are not perfectly Lambertian, cameras intro-

duce distortions that are neglected by the pinhole camera model (see Section 2.1), and real

images are noisy. Therefore, the P_i in general have slightly different colors even if P is a point

on the surface. Hence, it is not accurate enough to measure P's photo consistency by com-

paring the colors of the P_i. The accuracy of P's photo consistency can be improved by projecting

a neighborhood Nbhd(P) of P to I_P and by computing the variance between the projected


neighborhoods Nbhd(P)_i. Note that the variance equals 0 if P and Nbhd(P) lie on the true

surface and if we neglect distortions. In reality, it is not possible to compute a neighborhood

of P on the true surface as the true surface is not known. Duan et al. [6, 7] choose Nbhd(P)

to lie on the tangent plane of P and use the variance between the Nbhd(P)_i to measure P's photo

consistency; the smaller the variance, the smaller P's distance from the real surface. This ap-

proach is illustrated in Figure 5.3 (a). They claim to further improve the numerical stability of

the photo consistency criterion by using only the n ≤ k least degenerate views of I_P to compute

the variance.


Figure 5.3.: Neighborhoods used for computation of photo consistency, (a) by Duan et al. [6,

7] and (b) in this project.

The criterion used to measure the photo consistency in this project differs from the one used

by Duan et al. [6, 7] in the following two ways:

1. Instead of computing the variance between the Nbhd(P)_i, which requires projecting Nbhd(P)

to k images, the variance between rectangular neighborhoods Nbhd(P_i) of the P_i is used to

measure P's photo consistency. This is significantly more efficient as only P needs to be

projected to k images. Note, however, that the colors of the Nbhd(P_i) are not consistent if a

theoretical, perfectly Lambertian surface is examined, as a rectangular neighborhood on

the tangent plane of P does not necessarily project to a rectangle under pinhole projec-

tion. However, experimental results showed that the difference between this approach

and the one used by Duan et al. [6, 7], where the true surface is approximated by its

tangent plane, is mostly theoretical because of the influence of image noise and lens

distortions. The approach we use is shown in Figure 5.3 (b).

2. The photo consistency is computed using a statistical quantity, the variance, which only

produces reliable results for a large number of samples. Therefore, unlike Duan et al. [6,


7], we do not restrict the number of views used to compute the photo consistency to the

n least degenerate views, but use all the k available images. This increases efficiency

as the quality of the views is not evaluated. Experimental results showed that the photo

consistency becomes more reliable when more pictures are used. This is illustrated

in Figure 5.4 (b)-(d), where the photo consistencies for 3D sample points in a plane

π, which intersects the initial model, were computed using different numbers of input

images. Figure 5.4 (b), (c), and (d) show the results for 5, 10, and 22 input images

respectively. Figure 5.4 (a) shows the intersection between the initial mesh and π. The

intersection c between the real object and π is best visible in Figure 5.4 (d), where most

of the images were used to compute the photo consistency. We can also see that the

photo consistency gradually increases in a small neighborhood of c. This behavior is

crucial as it allows capturing the real object using gradient information.

As a result, the photo consistency we use is easier and faster to compute, but slightly less

accurate. This is demonstrated in Figure 5.4 (d) and (e), where the photo consistencies of 3D

sample points in π are displayed for both the approach proposed by Duan et al. [6, 7] (Figure

5.4 (e)) and our approach (Figure 5.4 (d)). Figure 5.4 (d) is slightly more blurry, but overall the

two results are of comparable quality. This approach is implemented because the discussed

changes only decrease the quality slightly, but improve the efficiency significantly.

To compute the photo consistency, the size of Nbhd(P_i) or Nbhd(P)_i needs to be specified. If the projected neighborhood is too small, image noise is not compensated and the numerical stability is not improved significantly. If the projected neighborhood is too large, errors are introduced, as the neighborhood of P does not in general lie on the true surface. Although this problem is mentioned by Duan et al. [6, 7], they make no suggestions about the size of the neighborhood. For all the experiments discussed in this project, the size of the neighborhood was chosen to be 5 × 5 pixels.
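The variance-based photo consistency described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes that the mean intensity of the 5 × 5 neighborhood around P's projection in each image where P is visible has already been computed by a (hypothetical) projection step, and it works on gray values rather than full colors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Variance of the per-image mean intensities of a 3D point's projected
// neighborhoods. Low values mean the point is photo consistent; the value
// is undefined (here: -1) if the point projects to fewer than two images.
double photoConsistency(const std::vector<double>& meanIntensities) {
    const std::size_t k = meanIntensities.size();
    if (k < 2) return -1.0;          // undefined: fewer than two views
    double mean = 0.0;
    for (double m : meanIntensities) mean += m;
    mean /= static_cast<double>(k);
    double var = 0.0;                // variance over all k views
    for (double m : meanIntensities) var += (m - mean) * (m - mean);
    return var / static_cast<double>(k);
}
```

A point whose projections agree in all views yields 0, while disagreeing views drive the value up; using all k views instead of the n least degenerate ones simply means passing a longer vector.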

Note that the photo consistency is defined using the variance between colors in different images; it is therefore only defined for a 3D point that projects to at least two input images. Any vertex of the triangular model that is visible in fewer than two input images is not moved by the deformation, but it is still smoothed and regularized.

Due to the high importance of the photo consistency function, its properties need to be examined. Figures 5.5 and 5.6 show the photo consistency for 3D sample points in planes intersecting the initial models. The photo consistency is low in blue regions, high in red regions, and undefined in green regions, where 3D points project to fewer than two of the input




Figure 5.4.: Photo consistency. (a) shows the intersection between the initial mesh and a plane π. (b)-(e) are plots of the photo consistencies of sample points in π. The photo consistency is low in blue regions and high in red regions.



images. From experimental results, it can be deduced that the photo consistency we use has

the following properties:

• The photo consistency attains its minima for points on S; see Figure 5.5 (b).

• In a small neighborhood of S, the photo consistency increases with increasing distance from S; see Figure 5.5 (b).

• The photo consistency for points that are far from S is arbitrary; this is understandable, as those points may project onto the background in some images and onto the object in others. The background is not controlled, but can be used to obtain accurate calibrations using patterns. This disadvantageous behavior of the photo consistency can be seen in Figures 5.5 (b), 5.6 (b), and 5.6 (d). As a result, the photo consistency's gradient ∇g, which makes a major contribution to the deformation of the surface, only gives useful information in a small neighborhood of the true surface. Therefore, the initial model must be close to the true surface to capture the correct shape.

• Slight changes of the calibration parameters provoke large changes of the photo consistency. Because of this numerical instability, it is highly recommended to use patterns rather than the object itself to determine the cameras' extrinsic calibration parameters with PhotoModeler 5 Pro.

• For specularities on a surface, the Lambertian surface assumption does not hold. This results in noisy and incorrect photo consistency values.

• Although both models shown in Figures 5.5 and 5.6 use accurate calibration parameters and do not contain significant specularities, the photo consistency function is noisy. Therefore, we can deduce that the photo consistency function is generally noisy, possibly because of image noise. This must be kept in mind for the computation of the gradient ∇g.

• For objects with uniformly colored projections in the image planes, the photo consistency is meaningless, as the images do not contain enough information: every 3D point that projects to object points in each image has consistent colors. An example of this is shown in Figures 5.6 (a) and 5.6 (b); in the shaded regions of the model all colors are uniform, and this results in large regions of constant photo consistency.




Figure 5.5.: (a) shows the texture-mapped initial triangular mesh intersected with a plane π, and (b) shows a plot of the photo consistency in π and the intersection between the initial model and π as a white polygon.


Figure 5.6.: (a) and (c) show the texture-mapped initial triangular mesh intersected with planes π1 and π2; (b) and (d) show plots of the photo consistency in π1 and π2 and the intersections between the initial model and π1, π2 as white polygons, respectively.

Gradient of the Photo Consistency

To solve the general weighted minimal surface flow (equation (4.13)), we need the gradient ∇g of the photo consistency, which requires the function's partial derivatives with respect to x, y, and z. Duan et al. [6, 7] do not discuss which approach they use to approximate ∇g. Computing the derivative of a function that is not explicitly known is a difficult problem; the following approaches to approximate the derivative f′(x) of a function f(x) with respect to x are discussed by Press et al. in [22].



• The finite-difference approach is a simple approach that uses finite differences to approximate the derivative:

f′(x) ≈ (f(x + h) − f(x − h)) / (2h)    (5.1)

with h > 0. As h → 0, the right-hand side of equation (5.1) converges to f′(x). The accuracy of this approach is low.

• Richardson's deferred approach to the limit extrapolates the results of finite-difference computations with decreasing values of h using Neville's algorithm. This approach significantly improves the accuracy of the approximation. However, it assumes that f(x) is a smooth function, which is not the case for the photo consistency function g.

• Savitzky-Golay smoothing filters are usually used to smooth noisy data that is tabulated equidistantly. To compute the function value and its derivatives at a point x, the data point and its k neighbors f(x_i) with x_i ∈ {x − (k/2)∆x, …, x, …, x + (k/2)∆x}, where ∆x is the spacing, are used. The Savitzky-Golay filter is a low-pass filter. It fits a polynomial p of order n by least squares to those k + 1 sample points and approximates f(x) by p(x), f′(x) by p′(x), and so on. It is possible to compute k + 1 coefficients c_{si} that depend on the sought derivative by least-squares fitting a polynomial of order n to fictitious data. For an arbitrary data point x, we then obtain g^(s)(x) = Σ_{i=0}^{k} f(x_i) c_{si} as a simple linear combination of x's neighbors. Hence, the least-squares fitting is only performed once, which makes the algorithm efficient if equally spaced data samples of f are known. To use this algorithm, one has to choose the order n of p, the size k of the neighborhood, and the order s of the derivative.

This overview of numerical methods to compute derivatives is not exhaustive, but it contains

all the methods that were taken into consideration for this project.

Savitzky-Golay filters are suitable for computing the partial derivatives of the photo consistency function, as they smooth the noisy data prior to evaluating the derivative. The photo consistency function is not tabulated on a regular grid, but it can be evaluated for any 3D point that projects to at least two input images. The computation of the gradient ∇g(P) for a 3D point P with Savitzky-Golay smoothing filters requires 3k + 1 evaluations of g: k evaluations in a neighborhood of P along the x, y, and z axes respectively, and one evaluation at P.

This costly computation of the gradient is our first choice, because the gradient ∇g has major



influence on the surface deformation given by equation (4.13) and therefore needs to be a reliable and accurate factor. The order of the derivative used in this project is s = 1, as the first derivative is sought; the order of the polynomial p is n = 4, as recommended by Press et al. [22]; and the size of the neighborhood is k = 20, as this is the smallest neighborhood that yields good results for the test models. The spatial sampling factor ∆ = ∆x = ∆y = ∆z we used is proportional to the length of the shortest edge of the triangular model.
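The Savitzky-Golay derivative described above can be sketched as follows, under the assumption that equally spaced samples of g along one coordinate axis are given. For simplicity the least-squares fit is solved directly with Gaussian elimination for each call instead of precomputing the filter coefficients c_{si} once, as the efficient variant would.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// First derivative at the center of k + 1 equally spaced samples f,
// estimated by least-squares fitting a polynomial of order n (default 4,
// as used in the project) and differentiating the fit at the center.
double savitzkyGolayDerivative(const std::vector<double>& f, double dx,
                               int n = 4) {
    const int m = static_cast<int>(f.size());  // m = k + 1 samples
    const int c = (m - 1) / 2;                 // center index
    // Normal equations (A^T A) a = A^T f for p(t) = sum_j a_j t^j,
    // with t_i = i - c measured in units of dx; augmented matrix M.
    std::vector<std::vector<double>> M(n + 1, std::vector<double>(n + 2, 0.0));
    for (int i = 0; i < m; ++i) {
        double t = i - c;
        std::vector<double> pw(n + 1, 1.0);    // powers 1, t, ..., t^n
        for (int j = 1; j <= n; ++j) pw[j] = pw[j - 1] * t;
        for (int r = 0; r <= n; ++r) {
            for (int s = 0; s <= n; ++s) M[r][s] += pw[r] * pw[s];
            M[r][n + 1] += pw[r] * f[i];
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col <= n; ++col) {
        int piv = col;
        for (int r = col + 1; r <= n; ++r)
            if (std::fabs(M[r][col]) > std::fabs(M[piv][col])) piv = r;
        std::swap(M[col], M[piv]);
        for (int r = 0; r <= n; ++r) {
            if (r == col) continue;
            double q = M[r][col] / M[col][col];
            for (int s = col; s <= n + 1; ++s) M[r][s] -= q * M[col][s];
        }
    }
    double a1 = M[1][n + 1] / M[1][1];         // coefficient of t, i.e. p'(0)
    return a1 / dx;                            // rescale to x units
}
```

Applying this along the x, y, and z axes with k = 20 samples each, plus one evaluation at P, gives the 3k + 1 evaluations of g mentioned above.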

Solving the system of PDEs

The PDEs given by equation (4.13) describe a system of nonlinear initial-value problems of first order in 4 dimensions: the spatial dimensions x, y, z and the time dimension t. Press et al. [22] state that the CFL condition for initial-value problems in N + 1 dimensions is given by

∆t ≤ ∆ / (√N ‖v‖)    (5.2)

where ∆t is the discrete time step, ∆ is the spatial discretization step, which is the same for all N spatial dimensions, and v is the maximal propagation velocity of the PDE. The spatial discretization step ∆ we use to compute the gradient ∇g(P) for a vertex P is the size of P's neighborhood that was used for least-squares fitting: ∆ = k∆x = k∆y = k∆z. According to Duan [6], the maximal speed v is max_i F(p_i, t), where F = (g(v + H) − ∇g · N)N is defined as in equation (4.13) and the p_i are all possible positions of the surface S. The dimension N equals 3 in our case. Hence, the CFL condition for this problem becomes:

∆t ≤ ∆ / (√3 ‖max_i F(p_i, t)‖).

Note that the CFL condition derived here differs from equation (4.16) used by Duan [6].

We can approximate the quantities N, H, g, and ∇g for each vertex of the mesh. Hence, we can choose a small initial speed v and compute F, which is the right-hand side of equation (4.13), for any given surface S(t). Note that F is costly to evaluate, as it contains ∇g, and that we can treat F(p, t) as if it contained no partial derivatives, since ∇g is already evaluated. Hence, the only remaining partial derivative of the PDE is ∂S/∂t, the left-hand side of equation (4.13). Therefore, the system of PDEs can be solved as a system of nonlinear initial-value ordinary differential equations (ODEs) of first order. The initial value S_0 is the (possibly subdivided) initial triangular mesh obtained from PhotoModeler 5 Pro.

Solving systems of ODEs of first order analytically or numerically is a central task in mathematics, as many problems in physics and chemistry lead to ODEs, and as ODEs of any order can always be solved by studying an equivalent system of first-order ODEs. Press et al. [22] discuss the following methods for numerically solving the system of N ODEs dy/dx = f(x, y), y(0) = y_0, where y, f ∈ R^N:

• The forward Euler method replaces dy and dx by finite differences ∆y and ∆x = h, which leads to

y_{n+1} = y_n + h f(x_n, y_n),    y(0) = y_0,

and solves the ODEs iteratively using small step sizes. This is an explicit single-level method, as only one value at time level n is needed to compute time level n + 1. The error term of the explicit Euler scheme is O(h^2). Press et al. [22] state that this method is neither accurate nor stable compared to other methods using the same step size.

• Runge-Kutta formulas of order k take k explicit Euler steps, which involves k evaluations of the function f, and combine them into one Runge-Kutta step that matches a Taylor series expansion up to order O(h^k); the order of the error term becomes O(h^{k+1}). This method is not appropriate for solving equation (4.13), as multiple evaluations of F(p, t) are required for each time step and F(p, t) is costly to evaluate.

• Richardson extrapolation computes results with finite step sizes h and extrapolates these results to smaller values of h; the goal is to extrapolate to h = 0. Like any extrapolation technique, Richardson extrapolation requires a smooth function f(x) to produce reliable results. As the function F(p, t) in equation (4.13) involves the noisy photo consistency function, this approach is not appropriate for our problem.

• Predictor-corrector methods store solutions while stepping forward in x and use the stored results to extrapolate the solution one step further. After this predictor step, the extrapolation is corrected using derivative information. As in Richardson extrapolation, the function f(x) needs to be smooth; hence, this approach is not appropriate for solving equation (4.13).

This overview is not exhaustive, but it contains all the methods that were taken into consider-

ation for this project.

Although the forward Euler method is described as neither accurate nor stable by Press et al. [22] and is therefore generally not recommended, we choose it for the integration of equation (4.13) for the following reasons:



• Duan [6] uses the simple explicit scheme of equation (4.15) to integrate the system of PDEs (4.13) iteratively and achieves visually pleasing results. As our aim is to solve the same problem, we can assume that the method produces results of comparable quality.

• As discussed before, we smooth the noisy function g prior to computing its gradient to obtain reliable results for ∇g. This is important, as ∇g has major influence on the deformation. However, this approach has the drawback that the evaluation of the function F(p, t) is costly. When using a forward Euler scheme to solve equation (4.13), only one evaluation of F(p, t) is needed per time step.

• The forward Euler method is a simple approximation; it requires negligible running time compared to the evaluation of F(p, t), and it is easy to implement.

Equation (4.13) is therefore solved using the simple forward Euler scheme, which yields equation (4.15). Except for the restriction that comes from the CFL condition, this is the same scheme as the one used by Duan [6].
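A minimal sketch of one deformation step under the CFL condition derived above (N = 3). Vec3 and the per-vertex speed vectors F are simplified placeholders for the mesh data structures of the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

double norm(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// CFL-bounded time step: dt <= delta / (sqrt(3) * max_i |F(p_i, t)|),
// where delta is the spatial discretization step.
double cflTimeStep(double delta, const std::vector<Vec3>& F) {
    double vmax = 0.0;
    for (const Vec3& f : F) vmax = std::max(vmax, norm(f));
    return delta / (std::sqrt(3.0) * vmax);
}

// One forward Euler step: S(t + dt) = S(t) + dt * F, applied per vertex.
void eulerStep(std::vector<Vec3>& vertices, const std::vector<Vec3>& F,
               double dt) {
    for (std::size_t i = 0; i < vertices.size(); ++i) {
        vertices[i].x += dt * F[i].x;
        vertices[i].y += dt * F[i].y;
        vertices[i].z += dt * F[i].z;
    }
}
```

Only one evaluation of the costly speed field F is needed per step, which is the main reason the explicit scheme is acceptable here.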

Stopping criteria

As equation (4.13) is solved iteratively, we need to define a stopping criterion that halts the deformation once an acceptable solution is found. In theory, g can be used as a stopping term that equals 0 when the photo consistent shape is reached (see Section 4.4). However, this theoretical stopping criterion will in general never be reached: the surfaces that are reconstructed are not perfectly Lambertian, the imaging process is not modeled exactly, but only approximated by the pinhole camera model, and the neighborhood used to compute g does not in general lie on the real surface. Therefore, another stopping criterion consisting of three parts is used.

1. The photo consistency function g is a monotonically decreasing function. Hence, the probability that a (possibly locally) optimal shape has been found is high when the sum of the photo consistency values over all of the mesh's vertices, Σ_{P∈mesh} g(P), does not change significantly. Therefore, one possible stopping criterion for the deformation is to stop after i iterations if |Σ_{P∈mesh} g(P)_{i−1} − Σ_{P∈mesh} g(P)_i| < ε, where ε is a user-defined threshold. Note that we need to take the absolute value although g is monotonically decreasing, because the number of vertices can change as a result of adaptive refinement. To improve this criterion, it is possible to record the changes over several deformation



steps. In this project, the deformation is stopped when Σ_{P∈mesh} g(P) does not change by more than a user-defined threshold in 5 consecutive iterations. The threshold ε is given as a percentage. Hence, the deformation stops if |100 − 100 · Σ_{P∈mesh} g(P)_i / Σ_{P∈mesh} g(P)_{i−l}| < ε for each l = 1, 2, …, 5.

2. Even if the first stopping criterion is fulfilled earlier, at least 10 iterations are performed to avoid the initial model getting stuck in a local minimum. Even if the first stopping criterion is never reached, the deformation stops after 100 iterations, as it is then assumed that an error occurred, e.g., the error criterion provided by the user is too optimistic or the data is too noisy.

3. The third stopping criterion relates to the step size of the algorithm discussed above. If the step size is smaller than a small threshold, further iterations do not improve the shape significantly. Therefore, the deformation is stopped if the step size is smaller than 10^−5.
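The three criteria can be combined into a single test, sketched below. The history vector of recorded sums Σ_{P∈mesh} g(P) (most recent value last) and the function name are illustrative, not the implementation's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Combined stopping test: minimum of 10 iterations, hard cap of 100,
// step-size floor of 1e-5, and relative change of the recorded sums of
// g(P) below epsPercent over 5 consecutive iterations.
bool shouldStop(const std::vector<double>& history, int iteration,
                double stepSize, double epsPercent) {
    if (iteration < 10) return false;     // always run at least 10 steps
    if (iteration >= 100) return true;    // assume an error occurred
    if (stepSize < 1e-5) return true;     // step too small to matter
    if (history.size() < 6) return false; // need 5 previous sums + current
    double cur = history.back();
    for (int l = 1; l <= 5; ++l) {        // |100 - 100*sum_i/sum_{i-l}| < eps
        double prev = history[history.size() - 1 - l];
        if (std::fabs(100.0 - 100.0 * cur / prev) >= epsPercent) return false;
    }
    return true;
}
```

With ε = 0.3 (percent), as in the experiments below, the loop terminates once five consecutive sums agree to within 0.3%.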

In this section, we have explained how the algorithm proposed by Duan [6] was modified to fit the given problem. The following modifications of Duan's algorithm improve efficiency:

• This algorithm does not change topology adaptively.

• This algorithm does not swap edges.

• Loop subdivision is not used for global refinement and smoothing, which keeps the

mesh’s complexity low.

• The normal vector of the mesh is computed using a more efficient approach than the one

used by Duan [6].

• The photo consistency criterion we use can be computed more efficiently. First, not the entire neighborhood is projected to each image, but only the vertex of the mesh. Second, not only the least degenerate images are used, but all available images.

However, the computation of the gradient used in this algorithm is slow as many evaluations

of the photo consistency are required.



5.3.4. Outline of the algorithm

The outline of the complete algorithm implemented for the deformation is:

1. Preprocessing: Import the file from PhotoModeler 5 Pro that contains the camera calibration parameters and the input images. This data needs to be given by the user.

2. Global refinement of the mesh using the edge split and edge collapse operators.

3. Global smoothing of the mesh using the curvature normal operator.

4. Global regularization of the mesh using the tangent Laplace operator.

5. While the stopping criteria is not reached:

a) Adaptive refinement if necessary.

b) Preprocessing: Check the visibility of all vertices for each camera position and

record the results for each vertex.

c) Preprocessing: Find all the neighboring vertices for each vertex and record the

results. The reason is that this information is needed more than once in the loop.

d) Compute and record N for each interior vertex of the mesh. An efficient algorithm, which is proposed in [27] and is linear in the number of triangles in the mesh, is used.

e) Compute and recordH for each interior vertex of the mesh. Here, the neighbor-

hood information that was recorded in preprocessing is used.

f) Compute g and ∇g for each vertex of the mesh that is visible in at least two images. Here, the visibility information that was recorded in preprocessing is used.

g) Compute the speed vector F for each vertex of the mesh. For vertices where N, H, or g is not defined, set F = 0.

h) Find the restriction on the step size using the CFL condition.

i) Move each vertex of the mesh according to the deformation speed.

j) Global smoothing of the mesh using the curvature normal operator.

k) Global regularization of the mesh using the tangent Laplace operator. Here, the

neighborhood information that was recorded in preprocessing is used.

6. Render the new, deformed model.
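The control flow of the outline above can be sketched as a skeleton in which the per-step operations are reduced to comments and counters so that the loop structure is visible; all names are placeholders for the actual mesh operations.

```cpp
#include <cassert>

// Placeholder skeleton of the deformation loop; the real stopping test
// combines the three criteria described earlier, here reduced to a
// simple iteration cap for illustration.
struct DeformationLoop {
    int refinements = 0, smoothings = 0, steps = 0;

    bool stoppingReached(int maxSteps) const { return steps >= maxSteps; }

    void runOnce() {
        ++refinements;   // a) adaptive refinement if necessary
        // b), c) preprocessing: visibility and neighborhood caches
        // d)-g) compute N, H, g, grad g and the speed vectors F
        // h), i) CFL step-size restriction, move each vertex
        ++smoothings;    // j), k) smoothing and regularization
        ++steps;
    }

    void run(int maxSteps) {
        while (!stoppingReached(maxSteps)) runOnce();
    }
};
```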



When the program starts, a VRML 2.0 file from PhotoModeler 5 Pro is imported. The mesh specified in this file and its texture are stored and displayed. The user can interactively subdivide the mesh using one of the subdivision schemes explained in Section 3.4.1 and start the deformation process. A screenshot of the graphical user interface is shown in Figure 5.7. This approach was implemented using C++ and OpenGL in a Windows environment. Further details on the implementation can be found in Appendices A and B. The algorithm contains several parameters that need to be specified. In this project, most of the parameters are set automatically by the program and only a few are provided by the user. The following parameters are chosen automatically: the size of Nbhd(P_i), the initial speed v, the bounds b_l and b_u used for initial refinement, the tolerance for adaptive refinement, and the step size ∆ for the computation of ∇g. For information on how these parameters were chosen, refer to Appendix B.7. The user-defined parameters are stated for each experimental result.

Figure 5.7.: Screenshot of the graphical user interface.



5.4. Results

In this section, the results of the deformation are presented and discussed. All the results are obtained from real datasets. The checkerboard or other pattern visible in the background of the input images was used to obtain accurate camera calibration parameters. The images were taken with a consumer-grade digital camera with 5 megapixels. The running time of the algorithm was measured on a Pentium III 600 MHz computer with 600 MB of RAM.

Figure 5.9 shows the reconstruction of a bag from 22 real images. Figure 5.8 shows the 22 camera positions from where the input images were taken. In Figures 5.9 (a)-(l), 12 of the input images are displayed. Figure 5.9 (m) shows the initial triangular mesh; Figure 5.9 (n) shows the mesh after initial refinement; Figure 5.9 (o) shows the mesh after 10 deformation steps; and Figure 5.9 (p) shows the mesh after 15 deformation steps. Figures 5.9 (q)-(s) show different views of the final result after 28 deformation steps. The initial model is deformed towards the shape that the bag has in the images. However, in Figure 5.9 (s) it is visible that the upper right part of the model is deformed in the wrong direction. One possible reason for this is the specularity near the zipper of the bag. Figure 5.9 (r) illustrates the aforementioned problem with the texture map: due to edge collapses, parts of the texture disappear. The threshold for the stopping criterion was 0.3%. Details on the mesh and the running time of the algorithm can be found in Table 5.1, where Σ_{P∈mesh} g(P) is the sum of the photo consistencies of all vertices of the mesh with defined photo consistency.

Figure  | Triangles | Vertices | Edges | Total Time (s) | Time ∇g (s) | Σ_{P∈mesh} g(P)
5.9 (m) | 112       | 58       | 168   | -              | -           | 127.6
5.9 (n) | 1120      | 562      | 1680  | 83             | 17          | 1187.7
5.9 (o) | 1144      | 574      | 1716  | 288            | 162         | 1123.3
5.9 (p) | 1168      | 586      | 1752  | 409            | 255         | 1133.7
5.9 (q) | 1168      | 586      | 1752  | 719            | 503         | 1131.9

Table 5.1.: Information on the bag model.

Figure 5.11 shows the reconstruction of the head of the Egyptian queen Nefertiti from 14 input images. Figure 5.10 shows the 14 camera positions from where the input images were taken. In Figures 5.11 (a)-(h), 8 of the input images are displayed. Figure 5.11 (i) shows the initial triangular mesh; Figure 5.11 (j) shows the mesh after initial refinement; Figure 5.11 (k) shows the mesh after 10 deformation steps; and Figure 5.11 (l) shows the mesh after 15 deformation steps. Figures 5.11 (m)-(p) show different views of the final result after 18



deformation steps. The initial model is smoothed and deformed towards the real shape of the head. We can see that the mesh is refined adaptively in regions where more detail is required, e.g., the mouth, ears, nose, and eyes. However, the left side of the face displayed in Figure 5.11 (m) captures the real shape much better than the right side shown in Figure 5.11 (o). This is due to the lighting conditions: in the input images, the left side of the model is illuminated while the right side is in the shade. The shaded part of the model has uniform color in the input images, which makes the photo consistency information unreliable. The threshold for the stopping criterion was 0.3%. Details on the mesh and the running time of the algorithm can be found in Table 5.2.

Figure   | Triangles | Vertices | Edges | Total Time (s) | Time ∇g (s) | Σ_{P∈mesh} g(P)
5.11 (i) | 155       | 81       | 235   | -              | -           | 51.2
5.11 (j) | 3371      | 1699     | 5069  | 189            | 36          | 1016.3
5.11 (k) | 3513      | 1770     | 5282  | 748            | 392         | 1006.5
5.11 (l) | 3513      | 1770     | 5282  | 1027           | 593         | 1001.8
5.11 (m) | 3513      | 1770     | 5282  | 1229           | 758         | 1001.3

Table 5.2.: Information on the Nefertiti model.

Figure 5.13 shows the reconstruction of a soft toy frog from 20 input images. Figure 5.12 shows the 20 camera positions from where the input images were taken. In Figures 5.13 (a)-(l), 12 of the input images are displayed. Figure 5.13 (m) shows the initial triangular mesh; Figure 5.13 (n) shows the mesh after one step of Loop subdivision and initial refinement; Figure 5.13 (o) shows the mesh after 10 deformation steps; and Figure 5.13 (p) shows the mesh after 15 deformation steps. Figures 5.13 (q)-(t) show different views of the final result after 33 deformation steps. The initial model is smoothed and deformed towards the real shape of the toy. The result is visually pleasing, because the soft toy does not contain any specularities. We can see that the right side of the frog's face starts from a closer initial mesh and is better reconstructed than the left side. This shows the influence of the quality of the initial model on the quality of the result: if the initial model is too far from the surface to be reconstructed, it cannot capture the shape well. The threshold for the stopping criterion was 0.3%. Details on the mesh and the running time of the algorithm can be found in Table 5.3.

Figure 5.15 shows the reconstruction of a piggy bank in the form of a clown from 16 input images. Figure 5.14 shows the 16 camera positions from where the input images were taken. In Figures 5.15 (a)-(h), 8 of the input images are displayed. Figure 5.15 (i) shows the initial



Figure   | Triangles | Vertices | Edges | Total Time (s) | Time ∇g (s) | Σ_{P∈mesh} g(P)
5.13 (m) | 189       | 130      | 318   | -              | -           | 94.9
5.13 (n) | 5391      | 2870     | 8260  | 534            | 122         | 2049.6
5.13 (o) | 5443      | 2896     | 8338  | 2094           | 1202        | 1906.8
5.13 (p) | 5534      | 2942     | 8475  | 3071           | 1808        | 1896.4
5.13 (q) | 5950      | 3151     | 9100  | 6754           | 3965        | 1975.1

Table 5.3.: Information on the frog model.

triangular mesh; Figure 5.15 (j) shows the mesh after initial refinement; Figure 5.15 (k) shows the mesh after 10 deformation steps; and Figure 5.15 (l) shows the mesh after 15 deformation steps. Figures 5.15 (m)-(p) show different views of the final result after 35 deformation steps. The initial model is smoothed and deformed towards the real shape of the clown. Figure 5.15 (n) shows another example of poor textures resulting from edge collapses. Although our method reconstructs the clown as a whole, some regions like the clown's hat and the sphere in his hand are not reconstructed well. The reason for this behavior is the specularities on the model. The region of the slot of the piggy bank is not reconstructed, but only deformed and refined adaptively. The reason is that our method assumes correct topology and does not put additional holes into the mesh, which would be necessary to reconstruct the slot correctly. The threshold for the stopping criterion was 0.3%. Details on the mesh and the running time of the algorithm can be found in Table 5.4.

Figure   | Triangles | Vertices | Edges | Total Time (s) | Time ∇g (s) | Σ_{P∈mesh} g(P)
5.15 (i) | 240       | 122      | 360   | -              | -           | 97.6
5.15 (j) | 4170      | 2087     | 6255  | 422            | 45          | 1606.1
5.15 (k) | 4466      | 2235     | 6699  | 1402           | 526         | 1563.9
5.15 (l) | 4574      | 2289     | 6861  | 1991           | 753         | 1563.2
5.15 (m) | 5742      | 2415     | 6973  | 3980           | 1785        | 1581.6

Table 5.4.: Information on the clown model.

In Tables 5.1-5.4, we can see that the ratio between Σ_{P∈mesh} g(P) and the number of vertices decreases as the surface is deformed. This shows that the method deforms the initial mesh towards a more photo consistent surface for each of the models shown. Furthermore, Figures 5.9, 5.11, 5.13, and 5.15 show that the initial meshes are refined adaptively and smoothed. Table 5.5 gives a short summary of the running times of the complete algorithm. Note that



the computation of ∇g takes up more than half of the algorithm's running time for three of the reconstructed models.

Model     | Total Time (s) | Time ∇g (s)
Bag       | 719            | 503
Nefertiti | 1229           | 758
Frog      | 6754           | 3965
Clown     | 3980           | 1785

Table 5.5.: Information on the running times.

Figure 5.8.: Camera stations for the bag model.




Figure 5.9.: Reconstruction of a bag.



Figure 5.10.: Camera stations for the Nefertiti model.




Figure 5.11.: Reconstruction of Nefertiti.



Figure 5.12.: Camera stations for the frog model.


CHAPTER 5. INTERPRETATION AND IMPLEMENTATION 57

Figure 5.13.: Reconstruction of a soft toy frog (panels (a)–(t)).


Figure 5.14.: Camera stations for the clown model.


Figure 5.15.: Reconstruction of a clown (panels (a)–(p)).


6. Conclusion and Future Work

The best material model of a cat is another, or preferably the same, cat.

ROSENBLUETH, ARTURO (1900-1970)

The algorithm that was implemented deforms an initial triangular mesh towards a smooth photo-consistent mesh with adapted resolution. The general weighted minimal surface flow that incorporates the photo consistency criterion governs the deformation behavior of the model. During the deformation, the resolution of the mesh is refined adaptively, which allows modeling objects with sharp features. Furthermore, the mesh is smoothed and regularized to ensure numerical stability. The deformation improves the photo consistency significantly and stops once the geometric boundary of the object is detected. Given an initial model that is close to the true surface and accurately calibrated input images of a Lambertian surface taken under good lighting conditions, we obtain visually pleasing results. The method is powerful because smooth three-dimensional models can be created without expensive equipment; a consumer-grade digital camera captures input images with sufficient information for the method. With the prices of digital cameras decreasing, this method could soon become popular for creating 3D models of physical objects.
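The loop summarized above can be sketched as follows. Mesh, DeformationPipeline and all step names here are hypothetical stand-ins; every std::function is a placeholder for the corresponding step of the implementation, not the thesis API itself.

```cpp
#include <cassert>
#include <functional>

// High-level sketch of the deformation pipeline (all names hypothetical).
struct Mesh { /* vertices, faces, texture, ... */ };

struct DeformationPipeline {
    std::function<void(Mesh&)> initialRefinement;
    std::function<void(Mesh&)> moveByWeightedMinimalSurfaceFlow;
    std::function<void(Mesh&)> smooth;              // mean curvature smoothing
    std::function<void(Mesh&)> regularize;          // tangent Laplace operator
    std::function<void(Mesh&)> adaptiveRefinement;  // refine high-curvature regions

    // Measures how far the current mesh is from photo consistency.
    std::function<double(const Mesh&)> photoConsistencyError;

    // Deform until the photo consistency error drops below err
    // (stopping criterion) or an iteration budget is exhausted.
    void run(Mesh& mesh, double err, int maxIterations) {
        initialRefinement(mesh);
        for (int it = 0; it < maxIterations; ++it) {
            if (photoConsistencyError(mesh) < err) break;
            moveByWeightedMinimalSurfaceFlow(mesh);
            smooth(mesh);
            regularize(mesh);
            adaptiveRefinement(mesh);
        }
    }
};
```

The point of the sketch is the ordering of the steps per iteration and the error-based stopping criterion, not the contents of any single step.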

However, the approach has the following limitations, some of which might be considered

for future work.

• As the photo consistency criterion only holds for Lambertian surfaces, objects with specularities cannot be modeled using this approach.

• The technique does not recover the topology of the object as this was not part of the

task. If topological adaptation is desired, the technique of Duan et al. [6, 7] can be used

to modify the topology of the mesh iteratively.

• As normals and mean curvatures are only defined for interior vertices of a mesh, the

boundaries of objects do not get deformed and this might result in distorted models. An


idea for future work is to examine this drawback further and to develop and test theories

on how to deform the boundaries of meshes.

• As the photo consistency has the property of being arbitrary when far away from the true object, initial models that are too far from the true object cannot capture the shape. Note

that this contradicts the results of Duan et al. [6, 7], who claim that any initial model can

be used for the reconstruction. Their statement might be true if the background of the

input images is controlled; it does not hold for images with arbitrary background. The

best way to compensate for this drawback is to start with an initial model that is close to the true surface. Our way of finding an initial surface requires user interaction, which

is disadvantageous in two ways. First, the user needs to define points in several input

images, which is laborious. Second, the quality of the initial model depends strongly

on the accuracy with which the user creates it. An idea for future work is to control the

background of the images and to extract the 2D silhouettes of the object in each image.

Using this information, the visual hull can be constructed and used as an initial model

as in [9, 10]. As the topology of the visual hull might differ from the topology of the

true object, adaptive topology modification becomes necessary.

• As the photo consistency function is noisy, the computation of the gradient requires some effort and, in our case, around half of the computation time (see Section 5.4). This

slows down the algorithm. Furthermore, the gradient is not the most reliable quantity as

it is computed for a noisy function. Therefore, an idea for future work is to examine the

problem of surface deformation without using gradient information. One way of doing

so is to use a volumetric approach like that of Esteban and Schmitt [9, 10]. This approach is also computationally expensive, but it always produces reliable information. Furthermore,

this volumetric approach works better in the presence of highlights than our approach.

Another idea for improving the current algorithm is to use a line search instead of a gradient-based search. The main idea of the line search is to compute the photo consistency at sample points along the discrete normal of a vertex. The vertex is then moved to the sample point with minimal photo consistency.
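The proposed search could be sketched as follows. Vec3 and lineSearchAlongNormal are hypothetical names, and the photo consistency function is passed in as a black-box callback (a stand-in for Deformation::photoConsistency):

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Minimal 3D vector stand-in for the library's g_Vector (hypothetical).
struct Vec3 {
    double x, y, z;
};

// Sketch of the proposed search: sample the photo consistency g at equally
// spaced points along the discrete normal of a vertex (on both sides of the
// surface) and move the vertex to the sample with minimal g.
Vec3 lineSearchAlongNormal(const Vec3& vertex, const Vec3& normal,
                           double range, int samples,
                           const std::function<double(const Vec3&)>& consistency) {
    Vec3 best = vertex;
    double bestG = consistency(vertex);
    for (int i = -samples; i <= samples; ++i) {
        double t = range * static_cast<double>(i) / samples;
        Vec3 p{vertex.x + t * normal.x,
               vertex.y + t * normal.y,
               vertex.z + t * normal.z};
        double g = consistency(p);
        if (g < bestG) { bestG = g; best = p; }
    }
    return best;
}
```

Unlike the gradient computation, this only needs photo consistency *values*, so the noisiness of the function matters less.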

• As the texture maps obtained from PhotoModeler 5 Pro are damaged by the edge collapse operator, the output models do not in general have visually pleasing texture information. This drawback can be compensated for if a new texture map is created after the

deformation is finished. An idea for future work is to create a new texture map using a


particle-based approach that filters out highlights, as explained in [9, 10].

• The quality of the resulting models cannot be clearly validated and quantifiable errors cannot be measured. If a volumetric approach were used for the deformation of the surface, as in [9, 10], the distance between the resulting mesh and the computed volume could be used to estimate the reconstruction error.


A. Used libraries

In this project, several libraries are used. This appendix gives a short overview of these libraries and of the tasks performed with them.

A.1. Geometry library

A C++ library developed in NRC's Computational Video Group is used to store the triangular mesh efficiently. The library consists of various classes and methods; in this section, only the main idea is presented.

• 3D vectors are stored in a class named g_Vector. The class contains methods for standard vector operations: compute the length or squared length of the vector, normalize the vector, negate the vector, add, subtract or multiply (dot product and cross product) two vectors, and compute the angle between two vectors.

• Vertices are stored in a class named g_Node. A g_Node consists of a g_Vector, the coordinates of the vertex, and a list of faces to which the vertex belongs.

• Faces are stored in a class named g_Element. A g_Element consists of a list of g_Nodes, the vertices that form the face, and a list of meshes to which the face belongs.

• Meshes are stored in a class named g_Part. A g_Part consists of a list of g_Nodes and a list of g_Elements. Those are all the vertices and faces that form the mesh.

If a mesh is stored using this structure, no duplicates are necessary, i.e., each vertex and each face is stored exactly once. The library contains methods that make it easy to find neighboring faces, vertices on the boundary, and so on. Edges are not stored, but only created when needed. The structure for edges is a class named g_PEdge that knows the two vertices that form the edge.
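A shared-vertex structure with these roles might look as follows; the types are simplified, hypothetical stand-ins for g_Vector, g_Node, g_Element and g_Part, not the NRC library's actual interface:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sketch of a shared-vertex mesh structure (names hypothetical).
struct Vector3 { double x, y, z; };           // role of g_Vector

struct Node {                                 // role of g_Node
    Vector3 coords;
    std::vector<std::size_t> faces;           // indices of incident faces
};

struct Element {                              // role of g_Element
    std::size_t v[3];                         // indices of the three vertices
};

struct Part {                                 // role of g_Part
    std::vector<Node> nodes;                  // each vertex stored exactly once
    std::vector<Element> elements;            // each face stored exactly once

    std::size_t addNode(const Vector3& c) {
        nodes.push_back({c, {}});
        return nodes.size() - 1;
    }
    std::size_t addTriangle(std::size_t a, std::size_t b, std::size_t c) {
        std::size_t id = elements.size();
        elements.push_back({{a, b, c}});
        nodes[a].faces.push_back(id);   // back-pointers make it cheap to
        nodes[b].faces.push_back(id);   // enumerate the faces around a vertex
        nodes[c].faces.push_back(id);
        return id;
    }
};
```

Because faces reference vertices by index and vertices keep back-pointers to incident faces, queries such as "all faces around a vertex" need no search over the whole mesh.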


A.2. OpenGL

The OpenGL API is used for rendering the meshes. OpenGL is a widely used C API for rendering 2D and 3D scenes and for managing user interactions. The graphical user interface shown in Figure 5.7 was implemented using OpenGL. Furthermore, the library was used to determine which vertices of the mesh are visible from a given camera station.

A.3. CLAPACK

The CLAPACK library is an open source C linear algebra library. In this project, the library is used for the matrix multiplications that are necessary to project a 3D point onto an image plane (see Section 2.1).

A.4. Algorithms from “Numerical Recipes in C”

Press et al. [22] provide C code with their book. Only the code used in this project is mentioned here: a method that smooths noisy data using Savitzky-Golay filters and computes the first derivative of the smoothed function (see Section 5.3.3).

A.5. JPEG library

The JPEG library is a C library that is used to import the input images as well as the texture files, which are given in JPEG format. The RGB color values of each pixel are stored.


B. Implemented classes

In this appendix, the most important classes necessary for the deformation are explained. The project contains more classes, e.g., classes to render the scene in different display modes and subclasses of the Geometry library that are more suitable for storing the given information. These classes are not important for the deformation algorithm and are therefore not discussed further in this thesis. For each of the following classes, only the most important methods are mentioned; variables and small methods are not discussed.

B.1. Import the model

The class ReadVRML imports the VRML2.0 file from PhotoModeler 5 Pro and stores the information about the mesh in a subclass of g_Part. The class does not read general VRML2.0 files, but only the ones that are used to describe a triangular mesh with texture map in PhotoModeler 5 Pro.

ReadVRML::ReadVRML(FILE *file)

reads the mesh from the input file using the following three private methods.

void ReadVRML::readPoints(
    g_Part *part,
    char *buf,
    g_NodeContainer *currentNodes
)

reads the vertices that form the mesh. The input parameters are a mesh part and a buffer buf used to read the file. The method returns a list of vertices in currentNodes.

void ReadVRML::read2dPoints(g_Part *part)

reads texture coordinates and returns them in a new mesh part.


void ReadVRML::readTriangles(g_Part *part, g_NodeContainer *nodes)

reads the faces that form the mesh. The input parameters are the mesh part and the list nodes of all vertices of the mesh. All the faces are stored in part when the method returns.

The class ReadPhotoFile imports the calibration file from PhotoModeler 5 Pro. The information on the input images is stored in a class named Photo. For more information on the input parameters, see Section 2.1.

Photo::Photo(
    double centerX, double centerY, double centerZ,
    double omega, double phi, double kappa,
    double pixelSizeX, double pixelSizeY,
    double focalLength,
    int principalPointX, int principalPointY,
    double K1, double K2,
    int imageWidth, int imageHeight,
    char *name
)

creates a new instance of the Photo class. Input parameters are:

• the coordinates of the center of projection O (centerX, centerY, centerZ) in m.

• the three rotation angles omega, phi and kappa in degrees that define the rotation matrix R. The rotations about the x, y and z axes are omega, phi and kappa, respectively. For more information on the rotation angles, refer to [1].

• pixelSizeX and pixelSizeY give the sizes of pixels in mm in x and y direction, respectively.

• focalLength gives the focal length in mm.

• the coordinates of the principal point o (principalPointX, principalPointY) in mm.

• K1 and K2 are the coefficients for radial distortion.


• imageWidth and imageHeight denote the width and height of the image in pixel units, respectively.

• name contains the filename of the image file. The image is given in JPEG format.

Using this information, the input image is imported using the JPEG library; the matrix that

projects a given 3D point to the image plane is computed using the CLAPACK library.

void Photo::projPtToPhoto(double x, double y, double z, int *returnVal)

projects the given point P(x, y, z) (in m) to this photo. It is assumed that P is visible in the given scene. In returnVal, the RGB color values of the point are returned.
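Ignoring radial distortion (K1, K2) and the exact PhotoModeler angle conventions, the projection performed by the Photo class can be sketched as a plain pinhole model. projectToImage and its parameter handling are illustrative assumptions, with R taken as a ready-made rotation matrix:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Simplified pinhole projection, sketched after the parameters of the
// Photo class (hypothetical function; distortion and angle conventions omitted).
struct Pixel { double u, v; };

Pixel projectToImage(const std::array<std::array<double, 3>, 3>& R,
                     const std::array<double, 3>& center,    // projection center O (m)
                     double focalLength,                     // mm
                     double pixelSizeX, double pixelSizeY,   // mm per pixel
                     double principalX, double principalY,   // pixels (simplified)
                     const std::array<double, 3>& P) {       // world point (m)
    // Transform into the camera frame: p = R * (P - O).
    std::array<double, 3> d{P[0] - center[0], P[1] - center[1], P[2] - center[2]};
    std::array<double, 3> p{};
    for (int i = 0; i < 3; ++i)
        p[i] = R[i][0] * d[0] + R[i][1] * d[1] + R[i][2] * d[2];
    // Perspective division onto the image plane (mm), then mm -> pixels.
    double xmm = focalLength * p[0] / p[2];
    double ymm = focalLength * p[1] / p[2];
    return {principalX + xmm / pixelSizeX, principalY - ymm / pixelSizeY};
}
```

The actual implementation builds this transformation as a matrix product evaluated with CLAPACK and additionally corrects for radial distortion.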

B.2. Subdivision

A class Subdivision that subdivides all the faces selected by the user was implemented (for subdivision, see Section 3.4.1). Some of its methods are:

void Subdivision::quadrisectLoopElements(g_ElementContainer container)

subdivides all the faces given in the list container using the Loop subdivision scheme and stores the result in a member variable of the class.

void Subdivision::quadrisectModLoopElements(
    g_ElementContainer container,
    double tol
)

subdivides all the faces given in the list container using the modified Loop subdivision scheme and stores the result in a member variable of the class. Up to the tolerance tol, edges are considered to be smooth.

void Subdivision::quadrisectButterflyElements(
    g_ElementContainer container,
    bool keepSelected
)

subdivides all the faces given in the list container using the Butterfly subdivision scheme and stores the result in a member variable of the class. If the flag keepSelected is set to true, the faces the user selected stay selected when the method returns. Otherwise, the faces are unselected by the method.


B.3. Mesh Refinement

A class MeshRefinement that refines a mesh both globally and locally was implemented.

MeshRefinement::MeshRefinement(g_ElementContainer cont)

creates a new instance of MeshRefinement; cont contains a list of faces of the mesh.

void MeshRefinement::splitedges(double edgeTol)

splits all the edges of the mesh that are longer than the tolerance edgeTol.

void MeshRefinement::removeShortEdges(double edgeTol)

replaces each edge that is shorter than the tolerance edgeTol by a vertex and removes the faces adjacent to that edge.

g_NodeContainer MeshRefinement::quadrisectElements(
    g_ElementContainer cont
)

refines the mesh adaptively by quadrisecting all the faces contained in cont.

B.4. Deformation

The smoothing, regularization and deformation of the model are implemented in a class named Deformation. All of the algorithms implemented in this class are explained in Section 5.3. The class contains, amongst others, the following methods:

void Deformation::deformShape(
    g_ElementContainer cont,
    int size,
    double err,
    double velocity
)

The input parameters of this public method are a list cont of all faces contained in the mesh, the size size × size of the rectangular neighborhood that is used to compute the photo consistency, the acceptable user-defined error err that is used for the stopping criterion, and the initial speed velocity of the surface. The method performs the entire deformation of the mesh, including initial refinement, smoothing, regularization, and adaptive refinement of the mesh. The deformed mesh is stored in a member variable of the class when the method returns. To perform all these tasks, the method calls several private methods, some of which are discussed in more detail in the following.

void Deformation::visibilityInfo(int **result)

finds all the visible vertices for each camera position using the OpenGL depth buffer. The result is returned in the integer matrix result of dimension (number of vertices) × (number of camera stations), given as input parameter. The space for the matrix needs to be allocated before the method is called. When the method returns, result[i][j] = 1 if vertex i is visible in image j, and result[i][j] = 0 otherwise.

g_Container<g_NodeContainer> Deformation::findOneNeighbourhood()

finds and returns a structure that contains information about neighboring vertices. For each vertex of the mesh, all the vertices in a one-ring neighborhood are found.

g_Container<g_Vector> Deformation::calculateNormals()

finds and returns the normal vectors for each interior vertex of the mesh.

g_Container<g_Vector> Deformation::calculateCurvatures(
    g_Container<bool> &meanCurvOK,
    g_Container<g_NodeContainer> neighbourInfo
)

finds and returns the mean curvature normals for each interior vertex of the mesh. The input parameters are an empty list of booleans that returns information on whether the mean curvature was computed without problems, and a structure neighbourInfo that contains information about neighboring vertices.

double Deformation::photoConsistency(
    std::vector<Photo *> photoVector,
    g_Vector vec,
    g_Vector normal,
    int size
)

computes the photo consistency value for a vertex P. Input parameters are a list of images in which P is visible, the coordinates of vertex P, its discrete normal vector, and the size size × size of the rectangular neighborhood that is used to compute the photo consistency.

g_Vector Deformation::calculateConsistencyGradient(
    std::vector<Photo *> photoVector,
    g_Node *node,
    g_Vector normal,
    int size,
    double rangeIni
)

computes the gradient of the photo consistency function at a vertex P using Savitzky-Golay smoothing filters from [22]. The input parameters are a list of images in which P is visible, the vertex P, its discrete normal vector, the size size × size of the rectangular neighborhood that is used to compute the photo consistency, and the size rangeIni of the neighborhood used to least-squares fit a polynomial. The return value is the gradient vector.

double Deformation::moveShape(
    int size,
    double velocity,
    double stepsize,
    double shortestEdge,
    g_Container<g_NodeContainer> neighbourInfo
)

moves each vertex of the mesh according to equation (4.13) and stores the deformed mesh in a member variable of the class when it returns. The input parameters are the size size × size of the rectangular neighborhood that is used to compute the photo consistency, the initial speed velocity of the surface, a value stepsize smaller than the spatial discretization divided by √3, as required by the CFL condition, the shortest edge length shortestEdge of the mesh, and a structure neighbourInfo that contains information about neighboring vertices.

void Deformation::smoothWithCurvatureNormal(
    double stepsize,
    double tolerance,
    int maxStep,
    int minStep,
    g_Container<g_NodeContainer> neighbourInfo
)

smooths the mesh using the mean curvature normal in an explicit Euler scheme and stores the smoothed mesh in a member variable of the class when it returns. The input parameters are the stepsize to be used in the Euler scheme, a structure neighbourInfo that contains information about neighboring vertices, and three values needed for the stopping criterion. If the mean curvature at the vertices of the mesh changes by less than tolerance in two adjacent iterations and the number of iterations is between minStep and maxStep, the iteration stops. Furthermore, the iteration stops after maxStep steps.
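One such explicit Euler step might be sketched as follows. As a simplification, the uniform umbrella operator (centroid of the one-ring minus the vertex) stands in for the mean curvature normal; that substitution is only a rough approximation on fairly regular meshes, and all names here are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y, z; };

// One explicit Euler smoothing step, P <- P + stepsize * U(P), where the
// uniform umbrella operator U stands in for the mean curvature normal.
// ring[i] lists the one-ring neighbours of vertex i (cf. findOneNeighbourhood).
std::vector<Pt> eulerSmoothingStep(const std::vector<Pt>& verts,
                                   const std::vector<std::vector<std::size_t>>& ring,
                                   double stepsize) {
    std::vector<Pt> out = verts;
    for (std::size_t i = 0; i < verts.size(); ++i) {
        if (ring[i].empty()) continue;          // boundary/isolated: left fixed
        Pt c{0, 0, 0};
        for (std::size_t j : ring[i]) {
            c.x += verts[j].x; c.y += verts[j].y; c.z += verts[j].z;
        }
        double n = static_cast<double>(ring[i].size());
        // Pull the vertex towards the centroid of its one-ring.
        out[i].x += stepsize * (c.x / n - verts[i].x);
        out[i].y += stepsize * (c.y / n - verts[i].y);
        out[i].z += stepsize * (c.z / n - verts[i].z);
    }
    return out;
}
```

Iterating this step until the per-vertex change falls below a tolerance mirrors the stopping criterion described above.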

void Deformation::tangentLaplace(
    double stepsize,
    double tolerance,
    int maxStep,
    int minStep,
    g_Container<g_NodeContainer> neighbourInfo
)

This method regularizes a mesh using the tangent Laplace operator in an explicit Euler scheme and stores the regularized mesh in a member variable of the class when it returns. The input parameters are analogous to the ones used for the method smoothWithCurvatureNormal.

bool Deformation::refineRegionsOfHighCurv(
    MeshRefinement *refine,
    double tolerance,
    g_Container<g_NodeContainer> neighbourInfo
)

Method that refines the mesh adaptively. The input parameters are an instance of the aforementioned class MeshRefinement that contains the methods used for refining the mesh, a tolerance, and a structure neighbourInfo that contains information about neighboring vertices. If the mean curvature at a vertex P is higher than tolerance, all its neighboring faces whose edges are all longer than 1e−3 are refined adaptively.


B.5. Export the model

A class ExportVRML was implemented that exports the mesh as a VRML2.0 file. Like the class that imports the VRML2.0 file from PhotoModeler 5 Pro, this class does not write general VRML2.0 files, but only ones that are analogous to those expected by PhotoModeler 5 Pro as input files.

B.6. Possible user-interactions

Using the program, the user can interact in the following ways:

• rotate and zoom the object.

• view the object in five different display modes: wire frame mode that shows only the

edges of the mesh, filled mode that shows only the faces of the mesh, point mode that

shows only the vertices of the mesh, texture mode that shows the faces with their texture

map, and a standard mode that shows the faces and edges of the mesh.

• display the scene in either Gouraud shading or flat shading mode.

• select faces of the mesh.

• subdivide selected regions of the mesh using three subdivision schemes: Loop, modified

Loop, or Butterfly subdivision.

• project all the selected vertices to the input images and display the result in a new win-

dow.

• start the deformation of the mesh. It is necessary to enter a tolerance for the stopping criterion.

• export the mesh as VRML2.0 file.

Figure 5.7 shows a screenshot of the program with its help menu. Some other interactions that approximate NURBS surfaces for selected regions of the mesh are possible. These methods are not discussed here as they are not central to this project.


B.7. Parameters

In this project, several parameters are specified automatically by the program. In this section, we discuss these parameters and how they are specified.

• The size of Nbhd(Pi) discussed in Section 5.3.3 is 5 × 5 pixels.

• The initial speed v used in equation (4.13) is 0.001.

• The edge lengths bl and bu used for initial refinement (see Section 5.3.2) are chosen in relation to the shortest edge length of the initial mesh. If eS denotes the shortest edge of the initial mesh, bl = (4/5)eS and bu = 2eS.

• When using the mean curvature normal computed according to equation (3.6), the mean curvature of P is connected to the edge lengths in a one-neighborhood of P. Therefore, the tolerance for adaptive refinement is connected to the edge length of the mesh. The tolerance is chosen as eS/2.

• The stepsize ∆ for the computation of ∇g is connected to the edge length of the current mesh. ∆ is the shortest edge length of the current mesh divided by two.

All these parameters were found experimentally. Therefore, they might not be optimal for every model.
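Reading the refinement bounds above as bl = (4/5)·eS and bu = 2·eS, the automatic parameter choices can be collected in one place. AutoParams and deriveParams are hypothetical names, not part of the implementation:

```cpp
#include <cassert>
#include <cmath>

// Sketch of deriving the automatic parameters from the shortest edge
// length eS of the initial mesh (names and the 4/5 reading are assumptions).
struct AutoParams {
    double bl, bu;        // edge length bounds for initial refinement
    double refineTol;     // tolerance for adaptive refinement
    double initialSpeed;  // v in equation (4.13)
};

AutoParams deriveParams(double shortestEdge) {
    AutoParams p;
    p.bl = 0.8 * shortestEdge;          // bl = (4/5) * eS
    p.bu = 2.0 * shortestEdge;          // bu = 2 * eS
    p.refineTol = shortestEdge / 2.0;   // adaptive refinement tolerance eS/2
    p.initialSpeed = 0.001;             // fixed initial speed v
    // Note: the stepsize for computing the gradient of g is recomputed during
    // the deformation as half the shortest edge of the *current* mesh.
    return p;
}
```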


Bibliography

[1] PhotoModeler Pro 5 User Manual. Eos Systems Inc., 2003.

[2] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22(1):61–79, 1997.

[3] V. Caselles, R. Kimmel, G. Sapiro, and C. Sbert. Minimal surfaces based object segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):394–398, April 1997.

[4] R. Courant, K. Friedrichs, and H. Lewy. Über die partiellen Differenzengleichungen der mathematischen Physik. Mathematische Annalen, 100:32–74, 1928.

[5] M. Desbrun, M. Meyer, P. Schroeder, and A. H. Barr. Implicit fairing of irregular meshes using diffusion and curvature flow. In SIGGRAPH 99 Conference Proceedings, pages 317–324, 1999.

[6] Y. Duan. Topology Adaptive Deformable Models for Visual Computing. Ph.D. Dissertation, State University of New York, June 2003.

[7] Y. Duan, L. Yang, H. Qin, and D. Samaras. Shape reconstruction from 3D and 2D data using PDE-based deformable surfaces. In European Conference on Computer Vision, pages 238–251, May 2004.

[8] N. Dyn, K. Hormann, S.-J. Kim, and D. Levin. Optimizing 3D triangulations using discrete curvature analysis. In Innovations in Applied Mathematics. Vanderbilt University Press, Nashville, 2001.

[9] C. Hernandez Esteban and F. Schmitt. Silhouette and stereo fusion for 3D object modeling. In Proceedings, 4th International Conference on 3D Digital Imaging and Modeling (3DIM 2003), Banff, Alberta, Canada, pages 46–53, October 2003.

[10] C. Hernandez Esteban and F. Schmitt. A snake approach for high quality image-based 3D object modeling. In 2nd IEEE Workshop on Variational, Geometric and Level Set Methods in Computer Vision, Nice, France, pages 241–248, October 2003.

[11] M. S. Floater. Parameterization and smooth approximation of surface triangulations. Computer Aided Geometric Design, 14(3):231–250, 1997.

[12] D. A. Forsyth and J. Ponce. Computer Vision: A Modern Approach. Prentice-Hall, 2003.


[13] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

[14] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. In Computer Graphics (SIGGRAPH '94 Proceedings), volume 28, pages 295–302, July 1994.

[15] J. Hoschek and D. Lasser. Fundamentals of Computer Aided Geometric Design. A K Peters, 1993.

[16] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. Technical Report TR692, Computer Science Dept., University of Rochester, May 1998.

[17] U. Labsik, K. Hormann, and G. Greiner. Using most isometric parameterizations for remeshing polygonal surfaces. In Geometric Modeling and Processing: Theory and Applications, Proceedings, pages 220–228, 2000.

[18] J.-L. Maltret and M. Daniel. Discrete curvatures and applications: a survey. Rapport de recherche LSIS.RR.2002.002, Laboratoire des Sciences de l'Information et des Systèmes, 2002.

[19] M. Meyer, M. Desbrun, P. Schroeder, and A. H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In VisMath, 2002.

[20] L. Piegl and W. Tiller. The NURBS Book, second edition. Springer-Verlag, 1997.

[21] K. Polthier. Computational aspects of discrete minimal surfaces. In Proceedings of the Clay Summer School on Global Theory of Minimal Surfaces (Hass, Hoffman, Jaffe, Rosenberg, Schoen, and Wolf, editors), 2002.

[22] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1993.

[23] H. Qin and D. Terzopoulos. D-NURBS: A physics-based geometric design framework. IEEE Transactions on Visualization and Computer Graphics, 2(1):85–96, March 1996.

[24] S. M. Seitz and R. Dyer. Photorealistic scene reconstruction by voxel coloring. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1067–1073, 1997.

[25] D. Terzopoulos and H. Qin. Dynamic NURBS with geometric constraints for interactive sculpting. ACM Transactions on Graphics, 13(2):103–136, April 1994.

[26] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice-Hall, 1998.

[27] W. Wilke. Segmentierung und Approximation grosser Punktwolken. Ph.D. Dissertation, Technische Universitaet Darmstadt, November 2000.

[28] L. Zhang and S. M. Seitz. Image-based multiresolution shape recovery by surface deformation. In Proceedings of SPIE: Videometrics and Optical Methods for 3D Shape Measurement, San Jose, CA, pages 51–61, January 2001.


Declaration of the author

I hereby declare that I have written this work independently and used no aids other than those indicated. Passages taken from other works are clearly referenced.

Ottawa, January 11, 2005 Stefanie Wuhrer
