
A Hybrid Image-Based Modelling Algorithm

Hoang Minh Nguyen, The University of Auckland, New Zealand, [email protected]

Burkhard Wünsche, The University of Auckland, New Zealand, [email protected]

Patrice Delmas, The University of Auckland, New Zealand, [email protected]

Christof Lutteroth, The University of Auckland, New Zealand, [email protected]

Abstract

This paper explores the practical aspects of visual-geometric reconstruction of a complex 3D scene from a sequence of unconstrained and uncalibrated 2D images. These image sequences can be acquired with a video camera or a handheld digital camera without the need for camera calibration. Once supplied with the input images, our system automatically processes them and produces a 3D model. We propose a novel approach, which integrates uncalibrated Structure from Motion (SfM), shape-from-silhouette and shape-from-correspondence, to create a quasi-dense geometry of the observed scene. In the second stage, surface and texture are applied to the generated scene geometry to produce the final 3D model. The advantage of combining silhouette-based and correspondence-based reconstruction is that the new hybrid system can deal with both featureless objects and objects with concave regions, two classes of objects that usually pose great difficulty for shape-from-correspondence and shape-from-silhouette approaches, respectively. As a result, our approach is capable of producing satisfactory results for a large class of objects. It does not require any a priori information about camera and image acquisition parameters. We tested our algorithm using a variety of datasets of objects with different scales, textures and shapes, acquired under different lighting conditions. The results indicate that our algorithm is stable and enables inexperienced users to easily create complex 3D content using a standard consumer-level camera.

Keywords: image-based modelling, correspondence-based reconstruction, silhouette-based reconstruction

1 Introduction

There is an increasing number of applications that require high-quality 3D representations, e.g. in arts, commerce, virtual heritage, training, education, computer games, virtual environments, documentation, information exchange, and social networking. Conventionally, 3D digital models are constructed using modelling tools such as Maya, 3ds Max or Blender. Although these applications enable graphic designers to construct highly realistic and complex 3D models, they have a steep learning curve and often require a considerable amount of training and artistic skill to use. These restrictions render them unsuitable for non-professionals. The introduction of specialised hardware, such as laser scanners, has simplified the creation of models from real physical objects. However, while many of these systems deliver highly accurate results, they are usually extremely costly and often place restrictions on the size and surface properties of objects in the scene. Consequently, there is a critical need to improve on the status quo by making 3D content creation available to a wider group of users.

Copyright © 2013, Australian Computer Society, Inc. This paper appeared at the 36th Australasian Computer Science Conference (ACSC 2013), Adelaide, South Australia, January-February 2013. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 135, Bruce H. Thomas, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

In recent years image-based modelling has emerged as a new approach to simplifying 3D content creation. In contrast to traditional geometry-based modelling and hardware-heavy approaches, image-based modelling techniques confront the formidable, and still unanswered, challenge of creating a comprehensive representation of the 3D structure and appearance of a scene from visual information encoded in 2D still images. Image-based modelling techniques are usually less accurate, but offer very intuitive and low-cost methods for recreating 3D scenes and models.

The ultimate goal of our work is to create a low-cost system that allows users to obtain a 3D reconstruction of an observed scene using a consumer-grade camera. The idea behind the system is very simple: once supplied with the input images, it automatically processes them and produces a 3D model without any a priori information about the scene to be reconstructed.

Reconstructing 3D scenes from a collection of 2D photographic images requires knowing where each photo was taken. A common approach to obtaining this information is to perform camera calibration manually. However, this method requires a setup and preparations that are usually too sophisticated for inexperienced users. Furthermore, it places a restriction on the types of scenes that can be reconstructed, since it is not always feasible to calibrate the camera for a large and complex scene. For this reason, for each input image, our algorithm needs to automatically estimate the intrinsic and extrinsic parameters of the camera being used and compute the 3D coordinates of a sparse set of points in the scene (the scene geometry or point cloud). Surfaces and texture are then applied to this point cloud to produce the final model.

Due to the sparseness of the scene geometry, problems such as surface artifacts, noise, and blurry textures might arise during the surface and texture reconstruction processes. Most previous work approached these problems by constraining the types of objects and requiring manual user input to aid the system in deducing the structure of the object to be reconstructed. However, these requirements conflict with our goal of creating an easy-to-use system capable of reconstructing any type of object. We overcome these problems by using a hybrid approach that integrates shape-from-correspondence and shape-from-silhouette methods. The system performs 3D reconstruction using the following steps:

1. Camera parameter estimation

2. Initial point cloud generation from extracted key features

3. Point cloud densification by exploiting silhouette information

4. Surface and texture reconstruction to produce the final model

The remainder of this paper is organised as follows. After a review of related work on image-based modelling, the algorithms used in our system are discussed in section 3. Results are presented in section 4. Section 5 concludes the paper and gives a brief outlook on directions for future research.

2 Related Work

Although 3D modelling techniques have attracted much interest and study over the last few decades, robustly and automatically obtaining 3D models is still a difficult task. For the past 30 years, creation of artificial 3D models using conventional graphics and animation software such as Maya and 3ds Max has remained the most popular approach. To this end, various tools have been proposed to assist the human designer by using images of the object to be modelled [THP08]. The reason why manual creation of 3D models remains the prevalent approach despite intensive study and research is that, in fact, there is no computer vision or graphics technique that works for every object. The difficulty of selecting the most suitable 3D reconstruction technique for a large class of objects explains the abundant literature on this subject [Est04, REH06].

Various multiple-view reconstruction techniques have been explored in recent years. Amongst them, the most well-known and successful classes of techniques have been shape from silhouette and shape from correspondence.

The shape-from-silhouette class of algorithms exploits silhouette information to create intersected visual cones, which are subsequently used to derive the 3D structure of an object. Shape-from-silhouette methods are popular for shape estimation due to their good approximation qualities for a large number of objects, their stability with respect to the object's material properties and textures, and their ease and speed of implementation. Exploiting silhouette information for 3D reconstruction was first considered by Baumgart in 1974. In his pioneering work [Bau74], he computed polyhedral shape approximations of a baby doll and a toy horse by intersecting silhouette cones. Following Baumgart's work, many variations of the shape-from-silhouette paradigm have been studied and proposed.

Aggarwal et al. [MA83] proposed a method that uses intensity threshold-based segmentation to separate object foreground and background in each input image. A connected component analysis of the segmented image produces the silhouette. In order to compute the intersection of different silhouette cones, the authors used a run-length encoded, uniformly discretised volume. The idea is to consider each image point as a statistical occupancy sensor. The point observations are then analysed to deduce where matter is located in the observed scene. This is achieved by discretising the scene into three-dimensional voxels, which are then projected into the silhouette images. The task is to mark each voxel as either "in" or "out": if the projection of a voxel belongs to the foreground in all silhouette images, the voxel is labelled "in"; otherwise it is labelled "out" and excluded from further computations of the object structure.
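To make the in/out labelling concrete, the following is a minimal voxel-carving sketch, assuming boolean silhouette masks and 3 x 4 projection matrices are already available; the function name and data layout are ours for illustration, not taken from [MA83]:

    import numpy as np

    def carve_voxels(voxel_centers, cameras, silhouettes):
        """Label a voxel 'in' if it projects to foreground in every view.

        voxel_centers: (N, 3) array of voxel centre coordinates.
        cameras:       list of 3x4 projection matrices, one per view.
        silhouettes:   list of boolean images, True where the object is.
        Assumes all voxels lie in front of every camera.
        """
        ones = np.ones((len(voxel_centers), 1))
        homogeneous = np.hstack([voxel_centers, ones])      # (N, 4)
        inside = np.ones(len(voxel_centers), dtype=bool)
        for P, sil in zip(cameras, silhouettes):
            proj = homogeneous @ P.T                        # (N, 3)
            uv = proj[:, :2] / proj[:, 2:3]                 # perspective division
            u = np.round(uv[:, 0]).astype(int)
            v = np.round(uv[:, 1]).astype(int)
            h, w = sil.shape
            visible = (u >= 0) & (u < w) & (v >= 0) & (v < h)
            hit = np.zeros(len(voxel_centers), dtype=bool)
            hit[visible] = sil[v[visible], u[visible]]
            inside &= hit       # 'out' in any view excludes the voxel
        return inside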

Matusik et al. [MBR+00] improved the efficiency of shape-from-silhouette techniques by taking advantage of epipolar geometry. In their method, the 3D silhouette-intersection computation is reduced to 2D by projecting one silhouette onto another; the intersection is then carried out in image space. Franco et al. [FB10] attempted to improve the efficiency of Matusik's method and to produce a watertight model. Their method involves two main steps. First, point clouds of the observed scene are generated by back-projecting viewing edges of the visual hull. Next, missing surface points are recovered by exploiting local orientation and connectivity rules. A final connection walkthrough is then carried out to construct the planar contours for each face of the polyhedron.

In recent years, various correspondence-based techniques have also been explored. However, most of these methods were designed to tackle reconstruction problems for a particular class of objects. As a result, their use is often limited.

Fruh et al. [FZ03] used a combination of aerial imagery, ground colour, and LIDAR scan data to create textured 3D models of an entire city. While the proposed method produces visually acceptable results, it suffers from a number of drawbacks that render it impractical for consumer-level applications. In particular, the method requires intensive use of special hardware during the data acquisition step, including a vehicle equipped with fast 2D laser scanners and a digital camera to acquire texture data for an entire city at ground level, and a LIDAR optical remote sensor. Additionally, the required manual selection of features and correspondences in different views is very tedious, error-prone, and does not scale well.

Xiao et al. [XFT+08] presented a semi-automatic image-based approach to recover the 3D structure of facade models from a sequence of street-view images. The method combines a systematic and automatic decomposition scheme of facades for analysis and reconstruction. The decomposition is accomplished by recursively splitting complete facades into small segments, while still preserving the overall architectural structure. Users are required to provide feedback on facade partitioning. This method demonstrated excellent results.

Quan et al. [QTZ+06] presented a method for modelling plants. In their method, segmentation is performed both in image space (by manually selecting areas in input images) and in 3D space. Using the segmented images and 3D data, the geometry of each leaf is recovered by fitting a deformable leaf model. Users are also required to provide hints on segmentation. The main disadvantage of this method is that it requires full coverage of the observed model (360-degree capture), which may not always be possible in practice due to obstructions and space limitations. Branches are modelled through a simple user interface.

Tan et al. [TZW+06, LQ02] introduced a method for creating 3D models of natural-looking trees from a collection of images. Due to the large leaf count, small image footprint and widespread occlusions, it is not possible to recover an accurate geometric representation of each leaf. To overcome this problem, the authors populate the tree model with leaf replicas from segmented input images to reconstruct the overall tree shape.

3 Algorithms

Our proposed approach uses a coarse-to-fine strategy in which a rough model is first reconstructed and then sequentially refined through a series of steps. The approach consists of two main stages: scene geometry extraction and visualisation. Both stages are realised by dividing the main problem into a number of more manageable subproblems, which can then be solved by separate modules. The entire reconstruction process is illustrated in Figure 1.

The objective of the first stage is to recover the scene geometry from the input images. This stage begins with the camera parameters for each view being estimated. This is accomplished by the automatic extraction of distinctive features and the establishment of point correspondences in stereo image pairs. We then isolate all matching images, selecting those that view a common subset of 3D points. Given a set of matching images, the scene geometry (point cloud) and camera poses can be estimated simultaneously by Structure from Motion and subsequently refined by Bundle Adjustment. Next, additional points are added to the computed scene geometry by exploiting silhouette information to produce a more complete geometry. In the final step, a resampling technique is applied to the point clouds to produce the final scene geometry. In contrast to an initial version of this algorithm [NWDL11], we integrate silhouette information, which is used in the point cloud generation and the surface reconstruction.

In the second stage, we tackle the problem of how to transform the scene geometry recovered in the preceding stage into a realistic representation of the scene. This is accomplished by applying surfaces and texture to the resulting scene geometry. The outcome of this stage is a complete 3D representation of the observed scene.

3.1 Camera Parameter Estimation

One key challenge in extracting a 3D representation of a scene from a sequence of 2D images is that the process requires knowing where each photo was taken and in what direction the camera was pointed (extrinsic parameters), as well as the internal camera settings, such as zoom and focus (intrinsic parameters), which influence how incoming light is projected onto the retinal plane.

Figure 1: Overview of our algorithm for reconstructing 3D models from a set of unconstrained and uncalibrated images.

In order to recover this information, the system first detects and extracts points of interest, such as corners (edges with gradients in multiple directions), in the input images. This is accomplished using the SIFT feature detector [Low06]. Feature points extracted by SIFT are highly distinctive, invariant to different transformations and changes in illumination, and additionally have a high information content [HL07, BL05]. Once features have been identified and extracted from all the images, they are matched. This is known as the correspondence problem: given a feature in an image I1, what is the corresponding feature (the projection of the same 3D feature) in another image I2? This can be solved by using a Euclidean distance function to compare the two feature descriptors. All the detected features in I2 are tested and the one with minimum distance is selected [HQZH08].
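As an illustration, nearest-neighbour descriptor matching of this kind can be sketched with OpenCV; the library choice and the image file names are our assumptions (the paper names neither), and the cross-check filter is a common addition rather than part of the paper's description:

    import cv2

    # Detect SIFT keypoints and 128-D descriptors in two views
    # ("view1.jpg"/"view2.jpg" are placeholder file names).
    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match each descriptor in I1 to its nearest neighbour in I2 under
    # the Euclidean (L2) distance; crossCheck keeps only mutually-best
    # matches, suppressing ambiguous correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "putative correspondences")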

Once all interest points have been found, we match them across views and estimate the camera parameters and 3D coordinates of the matched points simultaneously using the Structure from Motion (SfM) technique. Structure from Motion designates the simultaneous computation of the camera poses and scene geometry (3D points) from a set of images and their feature correspondences. More precisely, SfM can be formulated as an optimisation problem, where the goal is to determine the configuration of cameras and 3D points that, when related through the equations of perspective projection, best agrees with the detected feature correspondences. This computation is carried out by exploiting a constraint between correspondences and the physical configuration of the two cameras. This is a powerful constraint, as two 3D rays chosen at random are very unlikely to pass close to one another. Given enough point correspondences between two images, the geometry of the system is sufficiently constrained to determine the camera poses (up to scale). The key to the success of this procedure is that successive images must not vary significantly, i.e. they must contain overlapping visual features.
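Written out, the optimisation the paragraph refers to is the standard reprojection-error objective minimised in bundle adjustment (notation ours, not from the paper):

$$\min_{\{P_j\},\,\{X_i\}} \;\sum_{i,j} v_{ij}\,\bigl\lVert \mathbf{x}_{ij} - \pi(P_j X_i) \bigr\rVert^2$$

where $P_j$ is the projection matrix of camera $j$, $X_i$ a 3D point, $\mathbf{x}_{ij}$ the detected feature position of point $i$ in image $j$, $\pi$ the perspective projection, and $v_{ij}$ an indicator that is 1 if point $i$ is visible in image $j$ and 0 otherwise.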

Our solution takes an incremental approach, in which a pair of images is selected to initialise the sequence. This initial pair should have a large number of matches, but must also have a large baseline, to ensure that the 3D coordinates of the observed points are well-conditioned. The remaining images are added to the optimisation one at a time, ordered by the number of matches [REH06, SSS06]. The Bundle Adjustment technique is then applied to refine and improve the obtained solution. The accuracy of the reconstruction depends critically on this final step. Figure 2 shows a scene geometry created using Structure from Motion and Bundle Adjustment.

Figure 2: Two views of the scene geometry generated from 37 input images.
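The initialisation-pair heuristic described above can be sketched as follows. The threshold value and both input dictionaries are illustrative assumptions; in particular, the baseline proxy would come from a preliminary two-view geometry estimate, which the paper does not detail:

    def choose_initial_pair(pair_matches, pair_baselines, min_matches=100):
        """Pick the initialisation pair: enough matches, then widest baseline.

        pair_matches:   {(i, j): number of feature matches}
        pair_baselines: {(i, j): baseline proxy for the two views}
        min_matches is an illustrative threshold, not a value from the paper.
        """
        candidates = [p for p, n in pair_matches.items() if n >= min_matches]
        if not candidates:
            raise ValueError("no image pair has enough matches")
        return max(candidates, key=lambda p: pair_baselines[p])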

3.2 Scene Geometry Enhancement

At this stage, we have acquired both the camera parameters and the scene geometry. As the scene geometry is derived from distinctive features of the input images, it is often sparse. In order to produce a more complete and comprehensive model, the obtained scene geometry needs to be enhanced. This can be accomplished by exploiting the silhouette information of the observed scene to generate additional 3D points.

The process is as follows: First, the silhouette of the observed scene in each view is extracted using the Marching Squares algorithm [Lor95], producing sets of silhouette points. Each of these sets is an exhaustive point-by-point isoline list of every pixel that constitutes a silhouette contour. Defining a silhouette by such an enormous contour point set would inevitably increase the computational cost, so the silhouette data must be preprocessed to reduce the number of silhouette contour points. We achieve this by first performing a Delaunay triangulation of the contour points. The resulting triangular mesh is then fed to a mesh simplification algorithm [Mel98] to decrease the number of triangles, which in turn effectively reduces the number of contour points.
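The extraction step can be illustrated with scikit-image, whose find_contours implements marching squares; for the point-reduction step we substitute Douglas-Peucker polyline simplification for the paper's Delaunay-plus-decimation pipeline, purely to keep the sketch short:

    import numpy as np
    from skimage import measure

    def silhouette_contours(mask, tolerance=2.0):
        """Extract and thin silhouette contours from a boolean mask.

        find_contours is a marching-squares implementation; the
        simplification here is a stand-in for the paper's Delaunay
        triangulation plus mesh decimation, and the tolerance is an
        illustrative value.
        """
        contours = measure.find_contours(mask.astype(float), level=0.5)
        return [measure.approximate_polygon(c, tolerance=tolerance)
                for c in contours]

    # Example on a synthetic disc-shaped silhouette:
    yy, xx = np.mgrid[0:200, 0:200]
    mask = (xx - 100) ** 2 + (yy - 100) ** 2 < 80 ** 2
    thinned = silhouette_contours(mask)
    print(len(thinned[0]), "contour points after simplification")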

Each set of contour points together with the camera centre of that view defines a viewing cone, whose apex is located at the camera's optical centre. Each viewing cone consists of a number of cone lines; a cone line is the 3D line formed by a silhouette contour point and the camera's optical centre. The polyhedral visual hull can be obtained by calculating the intersection of these viewing cones. However, 3D polygon intersection is non-trivial and often computationally expensive. Matusik et al. [MBR+00] showed that equivalent results can be obtained by the following steps:

Assume that we want to compute the intersection of silhouettes A and B:

1. Project each cone line of the viewing cone of A onto the silhouette B.

2. Calculate the intersection of the projected line and the silhouette B in 2D.

3. Lift the computed intersection points from 2D to 3D; this yields a set of 3D points, which defines a face of the polyhedral visual hull.

Projecting a cone line onto another silhouette Each of these rays (cone lines) is defined by the camera's optical centre $O_A$ and the projecting ray $\overrightarrow{O_A S_i}$, and its projection onto image B is computed as follows.

Projecting the camera's optical centre of A onto B gives the epipole of B:

$$e_B = P_B \begin{bmatrix} O_A \\ 1 \end{bmatrix} \qquad (1)$$

where $P_B = [Q_B \mid q_B]$ is the $3 \times 4$ projection matrix of B.

Projecting the point $D_i$ at infinity of the projecting ray $\overrightarrow{O_A S_i}$ onto B gives

$$d_B = P_B \begin{bmatrix} D_A \\ 0 \end{bmatrix} = P_B \begin{bmatrix} Q_A^{-1} S_i \\ 0 \end{bmatrix} \qquad (2)$$

These two points define a 2D line in image B, which is subsequently used to compute intersections.
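A direct numpy transcription of equations (1) and (2), assuming only the two projection matrices are given (the function name is ours):

    import numpy as np

    def project_cone_line(P_A, P_B, s_i):
        """Project the cone line through pixel s_i of view A into image B.

        P_A, P_B: 3x4 projection matrices [Q | q] of views A and B.
        s_i:      homogeneous 2D silhouette contour point (3-vector) in A.
        Returns the epipole e_B and direction point d_B, equations (1)
        and (2), as homogeneous 2D points in image B.
        """
        Q_A, q_A = P_A[:, :3], P_A[:, 3]
        # Optical centre of A satisfies P_A [O_A; 1] = 0, so O_A = -Q_A^{-1} q_A.
        O_A = -np.linalg.solve(Q_A, q_A)
        e_B = P_B @ np.append(O_A, 1.0)                          # equation (1)
        d_B = P_B @ np.append(np.linalg.solve(Q_A, s_i), 0.0)    # equation (2)
        return e_B, d_B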

Line-Silhouette Intersection The naive approach to computing the intersections of a line and a silhouette requires traversing each pair of silhouette contour points and performing a line-line intersection with the given line. This method, however, proves to be very inefficient. We use an alternative method [15], which is based on the observation that all projected rays intersect at one common point, the epipole. This makes it possible to subdivide the reference image into partitions so that each projected ray intersects all the edges, and only the edges, in its partition. This way we only have to traverse the edges of a particular partition when computing intersections. The algorithm for partitioning the reference image into regions is as follows:

First, the epipole of the reference image B is computed by projecting the camera's optical centre of A onto B. Each silhouette contour point together with the epipole forms a line segment. The slopes of all the line segments are stored in a list sorted in ascending order. Two consecutive line segments in the list create a partition, or bin. Each bin is defined by a minimum and maximum slope value (αmin and αmax) and a set of edges associated with it.

To determine the set of edges allocated to each bin, we traverse the silhouette contour points while maintaining a list of edges in the current bin. When a contour point is examined, we remove from the current bin all edges ending at that contour point and append all edges starting at it. The start of an edge is defined as the edge end point with the smaller slope value.

The intersections of a given line and a silhouette can then be determined by first examining the slope of the input line segment to establish the bin to which it belongs. Once the corresponding bin is found, we iterate through the edges assigned to that bin and perform line-to-line intersections. The result of this stage is a collection of 2D intersection points.
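A minimal sketch of the bin construction and lookup. We use atan2 angles rather than raw slopes to sidestep the singularity at vertical lines, which is our implementation choice, not the paper's; filling each bin with its edge set via the sweep described above is omitted:

    import bisect
    import math

    def build_bins(epipole, contour_points):
        """Sorted angles of epipole-to-contour-point segments.

        Consecutive angles bound one bin; assigning silhouette edges to
        bins still requires the edge sweep described in the text.
        """
        return sorted(
            math.atan2(p[1] - epipole[1], p[0] - epipole[0])
            for p in contour_points)

    def find_bin(angles, ray_point, epipole):
        """Index of the bin containing the ray through ray_point.

        Index -1 denotes the wrap-around bin, since the angular
        range is circular.
        """
        a = math.atan2(ray_point[1] - epipole[1], ray_point[0] - epipole[0])
        return bisect.bisect_right(angles, a) - 1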

2D points to 3D points Given a 2D intersection point in the reference image, we want to compute the corresponding 3D point. This is accomplished by shooting a ray from the camera's optical centre of the reference image through the intersection point and computing the intersection of this newly created 3D ray with the original 3D ray. The intersection point of the two 3D rays is the 3D point we are looking for. Figure 3 shows an example of the lifting process.

Figure 3: Lifting 2D points to 3D points.

A problem arises when the two rays do not intersect as a result of noise. We handle this by finding the point with the smallest distance to the two rays using a least-squares method. Figure 4 illustrates the newly generated 3D points of the dog dataset. These points together with those of the previous stage form a much more comprehensive geometric representation of the observed scene.
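One standard realisation of this least-squares fallback is the midpoint of the common perpendicular between the two rays; a minimal sketch (function name ours):

    import numpy as np

    def closest_point_between_rays(p1, d1, p2, d2):
        """Point minimising the distance to two (possibly skew) 3D rays.

        Solves [d1 -d2] [t1 t2]^T = p2 - p1 in the least-squares sense,
        then returns the midpoint of the two closest points, one on
        each ray.
        """
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        A = np.stack([d1, -d2], axis=1)                     # 3x2 system
        t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
        q1 = p1 + t[0] * d1
        q2 = p2 + t[1] * d2
        return 0.5 * (q1 + q2)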

3.3 Surface and Texture Reconstruction

The final step is to reconstruct surfaces from the obtained point clouds. Surface reconstruction is the automated generation of a smooth surface that closely approximates the underlying 3D model from which the point clouds were sampled. The reconstructed surface is usually represented by a polygon mesh. Many sophisticated surface reconstruction algorithms have been proposed and extensively studied. In our system, we employ the Poisson surface reconstruction algorithm [KBH06] for remeshing the surface.

Figure 4: Additional 3D points generated by exploiting the silhouette information.

The Poisson reconstruction method [Bol10] extracts surfaces by taking advantage of the integral relationship between oriented points sampled from the surface of an unknown model and the indicator function χ of the model. The indicator function is defined as 1 at points inside the model and 0 at points outside the model.

Formally, the problem can be stated as follows: given an oriented point set P = {s1, s2, ..., sN}, where each point sample si consists of a position p and an inward-facing normal n, and is assumed to lie on or near the surface ∂M of an unknown model M, the aim is to create a smooth and watertight approximation of the original surface by computing the indicator function χ of the model and then extracting an appropriate isosurface [Bol10].

This is accomplished by first deriving a relationship between an integral of the normal field over the surface and the gradient of the model's indicator function. The gradient of the indicator function is a vector field that is zero almost everywhere, as the indicator function is constant practically everywhere, except at points near the surface, where it points in the direction of the inward surface normal. Hence, the input oriented point samples can be thought of as samples of the gradient of the indicator function [Bol10, Kaz05]. An example of Poisson surface reconstruction is shown in Figure 5.

Figure 5: Poisson reconstruction. The input point samples are shown on the left, the reconstructed mesh on the right.
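In the usual formulation (summarising [KBH06], with notation chosen here), one seeks the indicator function whose gradient best matches the smoothed normal field $\vec{V}$ defined by the oriented samples:

$$\min_{\chi}\,\bigl\lVert \nabla\chi - \vec{V} \bigr\rVert^{2} \quad\Longrightarrow\quad \Delta\chi = \nabla\cdot\vec{V}$$

i.e. applying the divergence operator converts the variational problem into a Poisson equation; an isosurface of the solved χ then yields the watertight mesh.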

After the surface reconstruction phase, the resulting model is a shaded but textureless polygonal mesh. Such a representation is usually not an accurate reflection of the object in the input images. A more realistic representation is obtained by creating a surface texture from the colour information in the input images. In the texture reconstruction, each vertex of the polygonal mesh receives the RGB colour of the corresponding vertex in the attached point clouds. The resulting triangle mesh with vertex colours is rendered using Gouraud shading. Since the triangle mesh is very dense, the colour interpolation over triangles gives acceptable results.
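One straightforward way to realise this colour transfer is a nearest-neighbour lookup from mesh vertices into the coloured point cloud; the paper does not specify the lookup, so the following is a sketch under that assumption:

    import numpy as np
    from scipy.spatial import cKDTree

    def colour_mesh_vertices(mesh_vertices, cloud_points, cloud_colours):
        """Assign each mesh vertex the RGB colour of its nearest cloud point.

        mesh_vertices: (M, 3) vertex positions of the reconstructed mesh.
        cloud_points:  (N, 3) positions of the coloured point cloud.
        cloud_colours: (N, 3) RGB values aligned with cloud_points.
        """
        tree = cKDTree(cloud_points)
        _, idx = tree.query(mesh_vertices)   # nearest cloud point per vertex
        return cloud_colours[idx]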

4 Results

We evaluated our system using a wide range of datasets of both indoor and outdoor scenes at different scales. In most test cases, the system produces qualitatively good results. The geometries recovered by our system bear a close resemblance to the original models, even for objects with smooth, uniform, limited-feature surfaces or concave regions. Datasets with both smooth, uniform surfaces and concave regions often resulted in an unsatisfactory geometry, since the silhouettes did not contain sufficient geometric information and there were too few unambiguous features to allow full surface reconstruction. The size of our test datasets varies from as few as 6 images to hundreds of images, all taken with a simple handheld camera.

Owl Dataset The first dataset consists of 32 images (3648 × 2736 pixels) of a paper model of an owl, taken from arbitrary view directions using a normal consumer-level SONY DSC-W180 camera. Three of the 32 input images are shown in Figure 6. The owl object is richly textured: on average, each image contains over 53,000 features, which greatly aids the reconstruction.

Figure 6: Owl dataset input images.

The resulting reconstructed model, illustrated in Figure 7, is of excellent quality. The overall shape, details such as feathers, and the texture of the original model are reconstructed with no visual difference to the original model. This is due to the high number of distinct features of the original object. The resulting model has 678,210 faces in total. The process took approximately 3 hours and 30 minutes on an Intel Quad Core i7 with 6GB RAM.

Lady Statue Dataset This dataset consists of 17 images taken from the front side of a black copper statue; the back side was not accessible. The images were taken with the same camera as in the previous case and under very complex lighting conditions.

Some of the images used for the reconstruction are shown in Figure 8. The resolution of each image in this dataset is 3648 × 2736 pixels. Notice that the surface of the model contains few texture details.

Figure 7: 3D reconstruction of the Owl model.

Figure 8: Two of seventeen input images of the Lady model taken at the Auckland Art Gallery.

The reconstructed model has 268,892 faces and is of moderate quality (Figure 9). This result is surprisingly good considering the relatively low number of input images and the lack of distinct visual features for correspondence matching. The geometry was predominantly obtained from the silhouette information. This shows that our system is capable of producing good results for feature-poor models and few input images.

The reconstructed texture shows several significant differences to the original model. This is caused by changes in lighting conditions within the gallery. Some white patches also appear around the head region of the reconstructed model due to the lack of images from that particular direction. The current system only computes surface colours, rather than material properties, and hence works best for diffuse surfaces and lighting conditions resulting in few isolated highlights. The reconstruction process required approximately 1 hour and 47 minutes on an Intel Quad Core i7 with 6GB RAM.

Bunny Dataset This dataset comprises 49 images (2592 × 1944 pixels) taken from many different views of a bunny model using a normal consumer-level SONY DSC-W180 camera. The original model has a very bumpy surface, which is extremely difficult to reconstruct (Figure 10). The objective of this test is to determine whether the system can effectively deal with rough surfaces and high illumination variations due to surface roughness and self-shadowing.


Figure 9: 3D reconstruction of the Lady Statue.

Figure 10: Two of 31 input images of a bunny model.

The reconstruction result (shown in Figure 11) is of very good quality. The final model, which is composed of 528,637 faces, bears a close resemblance to the original object.

This result demonstrates that our approach is able to recover realistic 3D models of shapes with complex surface geometries. The overall computation time is approximately 4 hours and 40 minutes on an Intel Quad Core i7 with 6GB RAM.

Figure 11: 3D reconstructed model of the Bunny dataset.

Cat Dataset This dataset was obtained from a ceramic statue of a cat. There are 44 images with a resolution of 2592 × 1944 pixels. The object has a very shiny, reflective, smooth and uniform surface. These surface characteristics pose a major problem for correspondence-based methods, as very few features are available. In addition, the object's surface contains a number of concave regions, which cause problems for silhouette-based methods. Three of the 44 input images are presented in Figure 12.

Figure 12: Input images of the Cat dataset.

The visual quality of the reconstruction is not satisfactory. While the object is recognisable, important features such as the eyes and ears are not reconstructed well, and the colour distribution differs from the original object.

These reconstruction results (Figure 13) demonstrate that our approach is able to recover 3D geometry even for models with shiny, reflective and uniform surfaces, but problems remain with concave regions. The overall computation time was approximately 2 hours and 55 minutes on an Intel Quad Core i7 with 6GB RAM.

Figure 13: 3D reconstruction of the Cat model.

Comparison with Commercial Systems Over the past two to three years a number of commercial system prototypes have emerged, which are currently at various stages of user testing. While none of these commercial systems has published any information about the approach it uses, our analysis suggests that most of them predominantly use a silhouette-based approach [NWDL12]. In this section we provide a short comparison of our novel hybrid approach with some of the most promising alternative image-based modelling systems currently in development, 123D Catch and Agisoft.

In order to evaluate these systems, we used a repository of over 40 objects. After initial tests using different objects, we selected one object which reflected the main shortcomings of all tested algorithms. For this test, the 44 input images of the above cat model were supplied to these systems. The reconstruction results from the two systems are shown in Figures 14 and 15.

Figure 14: 3D reconstruction of the Cat model using the Agisoft system.

Agisoft's model is almost unrecognisable. The reconstruction is very incomplete and retains no visual resemblance to the original object. Large portions of the object geometry are missing, mostly because the system is not able to register views when there is only a limited number of features in the input images. Additionally, the reconstructed textures are inadequate, with many black patches covering the reconstructed surfaces.

123D Catch produces a more complete geometry of the reconstructed model. The resulting model still somewhat reflects the structure of the original object. Compared to our reconstruction result, however, the reconstructed geometry appears much rougher. Regions around the back, the head and the tail of the reconstructed model are mostly distorted.

Figure 15: 3D reconstruction of the Cat model using the 123D Catch system.

5 Conclusion and Future Work

Our research was motivated by the observation that there is an increasing demand for virtual 3D models. Existing modelling packages, such as Blender and Maya, enable the construction of realistic and complex 3D models, but have a steep learning curve and require a high level of artistic skill. Additionally, their use is very time-consuming and often involves tedious manual manipulation of meshes. The use of specialised hardware for 3D model reconstruction (laser scanners) makes it possible for inexperienced users to acquire 3D digital models. However, such hardware is very expensive and limited in its applications. Based on this, we concluded that the most promising approach for a general, affordable content creation process is image-based modelling using images obtained with a consumer-level uncalibrated camera.

We demonstrated our system's ability to produce qualitatively good results for a wide range of objects, including those with smooth, uniform, and textureless surfaces or with concave regions. The system has also demonstrated its robustness in the presence of illumination variations and shadows in the input images.

Problems such as missing textures in some regions still exist in the resulting 3D models. They are caused by an insufficient number of input images of those regions. We aim to overcome this problem by employing image in-painting and exemplar-based texture synthesis techniques.

Although we have demonstrated that our system can create 3D content inexpensively and conveniently, the high computational cost partially offsets its advantages. For example, our system takes approximately 2 hours to process 17 images, and roughly 5 hours for 31 images, on an Intel Quad Core i7 with 6GB RAM; the computational cost rises quadratically with the number of input images. With current hardware technology, real-time processing is not feasible on consumer-level machines. However, we can reduce the processing time considerably by parallelising the computations as much as possible and executing them on the GPU.

References

[Bau74] Bruce Guenther Baumgart. Geometric modeling for computer vision. Doctoral dissertation, Stanford University, 1974.

[BL05] Matthew Brown and David Lowe. Unsupervised 3D object recognition and reconstruction in unordered datasets. In International Conference on 3D Digital Imaging and Modelling, pages 56–63, 2005.

[Bol10] Matthew Grant Bolitho. The Reconstruction of Large Three-Dimensional Meshes. Doctoral thesis, The Johns Hopkins University, 2010.

[Est04] Carlos Hernandez Esteban. Geometric modeling for computer vision. Doctoral thesis, Ecole Nationale Superieure des Telecommunications, 2004.

[FB10] Jean-Sebastien Franco and Edmond Boyer. Efficient polyhedral modeling from silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:853–861, 2010.

[FZ03] Cristian Fruh and Avideh Zakhor. Constructing 3D city models by merging ground-based and airborne views. In IEEE International Conference on Computer Vision and Pattern Recognition, 2:562–569, 2003.

[HL07] Shungang Hua and Ting Liu. Realistic 3D reconstruction from two uncalibrated views. International Journal of Computer Science and Network Security, 7:178–183, 2007.

[HQZH08] Shaoxing Hu, Jingwei Qiao, Aiwu Zhang, and Qiaozhen Huang. 3D reconstruction from image sequences taken with a handheld camera. The International Archives of the Photogrammetry, 37(91):559–563, 2008.

[Kaz05] Michael Kazhdan. Reconstruction of solid models from oriented point sets. In SIGGRAPH, pages 73–82, 2005.

[KBH06] Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, pages 61–70, 2006.

[Lor95] William Lorensen. Marching through the visible man. IEEE Visualization, pages 368–373, 1995.

[Low06] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[LQ02] Maxime Lhuillier and Long Quan. Quasi-dense reconstruction from image sequence. In European Conference on Computer Vision, pages 125–139, 2002.

[MA83] Worthy Martin and J. K. Aggarwal. Volumetric descriptions of objects from multiple views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2):150–158, 1983.

[MBR+00] Wojciech Matusik, Chris Buehler, Ramesh Raskar, Steven Gortler, and Leonard McMillan. Image-based visual hulls. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 369–374, 2000.

[Mel98] Stan Melax. A simple, fast, and effective polygon reduction algorithm. Game Developer Magazine, pages 44–49, 1998.

[NWDL11] Minh Hoang Nguyen, Burkhard C. Wünsche, Patrice Delmas, and Christof Lutteroth. Modelling of 3D objects using unconstrained and uncalibrated images taken with a handheld camera. Computer Vision, Imaging and Computer Graphics - Theory and Applications, 274:1–5, 2011.

[NWDL12] Minh Hoang Nguyen, Burkhard C. Wünsche, Patrice Delmas, and Christof Lutteroth. 3D models from the black box: Investigating the current state of image-based modeling. Communication Proceedings, Pilsen, Czech Republic, Union Agency, pages 25–28, 2012.

[QTZ+06] Long Quan, Ping Tan, Gang Zeng, Lu Yuan, Jingdong Wang, and Sing Bing Kang. Image-based plant modeling. ACM Transactions on Graphics, 25(3):599–604, 2006.

[REH06] Fabio Remondino and Sabry El-Hakim. Image-based 3D modelling: A review. The Photogrammetric Record, 21:269–291, 2006.

[SSS06] Noah Snavely, Steven Seitz, and Richard Szeliski. Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics, 25(3):835–846, 2006.

[THP08] Thorsten Thormählen and Hans-Peter Seidel. 3D-modeling by ortho-image generation from image sequences. ACM Transactions on Graphics, 27(3):1–5, 2008.

[TZW+06] Ping Tan, Gang Zeng, Jingdong Wang, Sing Bing Kang, and Long Quan. Image-based tree modeling. ACM Transactions on Graphics, 27(3):418–433, 2006.

[XFT+08] Jianxiong Xiao, Tian Fang, Ping Tan, Peng Zhao, Eyal Ofek, and Long Quan. Image-based facade modeling. ACM Transactions on Graphics, 27(5):26–34, 2008.
