
Calibration of Hand-held Camera Sequences for Plenoptic Modeling

R. Koch*¹, M. Pollefeys†, B. Heigl‡, L. Van Gool†, H. Niemann‡

* Multimedia Information Processing, Inst. for Computer Science, Universität Kiel, Germany
† ESAT-PSI, Katholieke Universiteit Leuven, Belgium
‡ Lehrstuhl für Mustererkennung, Universität Erlangen–Nürnberg, Germany

Abstract

In this contribution we focus on the calibration of very long image sequences from a hand-held camera that samples the viewing sphere of a scene. View sphere sampling is important for plenoptic (image-based) modeling that captures the appearance of a scene by storing images from all possible directions. The plenoptic approach is appealing since it allows in principle fast rendering of scenes with complex geometry and surface reflections, without the need for an explicit geometrical scene model. However, the acquired images have to be calibrated, and current approaches mostly use pre-calibrated acquisition systems. This limits the generality of the approach.

We propose a way out by using only an uncalibrated hand-held camera. The image sequence is acquired by simply waving the camera around the scene objects, creating a zigzag scan path over the viewing sphere. We extend the sequential camera tracking of an existing structure-from-motion approach to the calibration of a mesh of viewpoints. Novel views are generated by piecewise mapping and interpolating the new image from the nearest viewpoints according to the viewpoint mesh. Local depth map estimates enhance the rendering process. Extensive experiments with ground truth data and hand-held sequences confirm the performance of our approach.

1. Introduction

There is an ongoing debate in the computer vision and graphics community between geometry-based and image-based scene reconstruction and visualization methods. Both methods aim at realistic capture and fast visualization of 3D scenes from image sequences.

Image-based rendering approaches like plenoptic modeling [13], lightfield rendering [12] and the lumigraph [6] have lately received a lot of attention, since they can capture the appearance of a 3D scene from images only, without the explicit use of 3D geometry. Thus one may be able to capture objects with very complex geometry that cannot be modeled otherwise. Basically one caches all possible views of the scene and retrieves them during view rendering.

¹ Work performed while at K.U. Leuven.

Geometric 3D modeling approaches generate explicit 3D scene geometry and capture scene details mostly on polygonal (triangular) surface meshes. A limited set of camera views of the scene is sufficient to reconstruct the 3D scene. Texture mapping adds the necessary fidelity for photo-realistic rendering of the object surface. Recently progress has been reported on calibrating and reconstructing scenes from general hand-held camera sequences with a Structure from Motion approach [5, 14].

The problem common to both approaches is the need to calibrate the camera sequence. Typically one uses calibrated camera rigs mounted on a special acquisition device like a robot [12], or a dedicated calibration pattern is used to facilitate calibration [6]. Recently we proposed to combine the structure from motion approach with plenoptic modeling to generate lightfields from hand-held camera sequences [9]. When generating lightfields from a hand-held camera sequence, one typically generates hundreds of images, but with a specific distribution of the camera viewpoints. Since we want to capture the appearance of the object from all sides, we will try to sample the viewing sphere, thus generating a mesh of viewpoints. To fully exploit hand-held sequences, we will also have to deviate from the restricted lightfield data structure and adopt a more flexible rendering data structure based on the viewpoint mesh.

In this contribution we tackle the problem of camera calibration from very many images under special consideration of dense viewsphere sampling. The necessary camera calibration and local depth estimates are obtained with a structure from motion approach. We will first give a brief overview of existing image-based rendering and geometric reconstruction techniques. We will then focus on the calibration problem for plenoptic sequences. Finally we will describe the image-based rendering approach that is adapted to our calibration. Experiments on calibration, geometric approximation and image-based rendering conclude this contribution.

2. Previous work

Plenoptic modeling describes the appearance of a scene through all light rays (2-D) that are emitted from every 3-D scene point, generating a 5D-radiance function [13]. Recently two equivalent realizations of the plenoptic function were proposed in form of the lightfield [12] and the lumigraph [6]. They handle the case when we observe an object surface in free space, hence the plenoptic function is reduced to four dimensions (light rays are emitted from the 2-dimensional surface in all possible directions).

Lightfield data representation. The original 4-D lightfield data structure employs a two-plane parameterization. Each light ray passes through two parallel planes with plane coordinates $(s,t)$ and $(u,v)$. Thus the ray is uniquely described by the 4-tuple $(s,t,u,v)$. The $(s,t)$-plane is the viewpoint plane in which all camera focal points are placed on regular grid points. The $(u,v)$-plane is the focal plane where all camera image planes are placed with regular pixel spacing. The optical axes of all cameras are perpendicular to the planes. This data structure covers one side of an object. For a full lightfield we need to construct six plane pairs on a cube bounding the object.

New views can be rendered from this data structure by placing a virtual camera on an arbitrary view point and intersecting the viewing rays with the planes at $(s,t,u,v)$. This, however, applies only if the viewing ray passes through original camera view points and pixel positions. For rays passing in between the $(s,t)$ and $(u,v)$ grid coordinates an interpolation is applied that will degrade the rendering quality depending on the scene geometry. In fact, the lightfield contains an implicit geometrical assumption: the scene geometry is planar and coincides with the focal plane. Deviation of the scene geometry from the focal plane causes image warping.
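To make the two-plane indexing concrete, the following minimal sketch samples such a regularly gridded lightfield, stored as a hypothetical 5-D NumPy array L[s, t, u, v, rgb]; the helper name and array layout are illustrative assumptions, not part of the original lightfield papers.

```python
import numpy as np

def lightfield_lookup(L, s, t, u, v):
    """Sample a ray (s, t, u, v) from a gridded two-plane lightfield.

    L: 5-D array indexed as L[s_idx, t_idx, u_idx, v_idx, rgb], with all
    coordinates already scaled to grid units and strictly inside the grid.
    The viewpoint plane (s, t) is interpolated bilinearly; the focal
    plane (u, v) is rounded to the nearest pixel. This in-between
    interpolation is exactly the step that degrades rendering quality
    when the scene deviates from the focal plane.
    """
    s0, t0 = int(np.floor(s)), int(np.floor(t))
    ws, wt = s - s0, t - t0
    ui, vi = int(round(u)), int(round(v))
    color = 0.0
    for ds, w_s in ((0, 1.0 - ws), (1, ws)):
        for dt, w_t in ((0, 1.0 - wt), (1, wt)):
            color = color + w_s * w_t * L[s0 + ds, t0 + dt, ui, vi]
    return color
```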

The Lumigraph. The discussion above reveals two major problems when acquiring lightfields from real image sequences: first, the need to calibrate the camera poses in order to construct the viewpoint plane, and second the estimation of local depth maps for view interpolation. The original lumigraph approach [6] already tackles both problems. A calibration of the camera is obtained by incorporating a background with a known calibration pattern. The known specific markers on the background are used to obtain camera parameter and pose estimation [18]. They provide no means to calibrate from scene content only. For depth integration the object geometry is approximated by constructing a visual hull from the object silhouettes. The hull approximates the global surface geometry but cannot handle local concavities. Furthermore, the silhouette approach is not feasible for general scenes and viewing conditions since a specific background is needed. This approach is therefore confined to laboratory conditions and does not provide a general solution for arbitrary scenes. If we want to utilize the image-based approach for general viewing conditions we need to obtain the camera calibration and to estimate local depth for view interpolation.

Structure-From-Motion. The problem of simultaneous camera calibration and depth estimation from image sequences has been addressed for quite some time in the computer vision community. In the uncalibrated case all parameters, camera pose and intrinsic calibration as well as the 3D scene structure, have to be estimated from the 2D image sequence alone. Faugeras and Hartley first demonstrated how to obtain uncalibrated projective reconstructions from image sequences alone [3, 7]. Since then, researchers tried to find ways to upgrade these reconstructions to metric (i.e. Euclidean but unknown scale, see [4, 17]). Recently a method was described to obtain metric reconstructions for fully uncalibrated sequences even for changing camera parameters with methods of self-calibration [14]. For dense structure recovery a stereo matching technique is applied between image pairs of the sequence to obtain a dense depth map for each viewpoint. From this depth map a triangular surface wire-frame is constructed and texture mapping from the image is applied to obtain realistic surface models [8]. The approach allows metric surface reconstruction in a 3-step approach:

1. camera calibration is obtained by tracking of feature points over the image sequence,

2. dense depth maps for all view points are computed from correspondences between adjacent image pairs of the sequence (a stereo sketch follows the list),

3. the depth maps are fused to approximate the geometry, and surface texture is mapped onto it to enhance the visual appearance.
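Step 2 of this pipeline is a standard dense stereo computation. As a rough illustration (not the authors' implementation), OpenCV's semi-global matcher can produce such a depth map for one rectified adjacent image pair; the file names, focal length and baseline below are assumed values for the example.

```python
import cv2
import numpy as np

# Hypothetical rectified adjacent image pair from the sequence.
left = cv2.imread("frame_010.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("frame_011.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching as a stand-in for the stereo matcher.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,  # disparity search range, multiple of 16
    blockSize=5,
)
# OpenCV returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# With focal length f (pixels) and baseline b (assumed values here),
# depth follows from Z = f * b / disparity.
f, b = 800.0, 0.05
depth = np.where(disparity > 0, f * b / disparity, 0.0)
```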

3. Calibration of viewpoint meshes

In this contribution we propose to extend the sequential structure-from-motion approach to the calibration of the viewpoint sphere. Plenoptic modeling amounts to a dense sampling of the viewing sphere that surrounds the object. One can interpret the different camera viewpoints as samples of a generalized surface which we will call the viewpoint surface. It can be approximated as a triangular viewpoint mesh with camera positions as nodes. In the specific case of lightfields this viewing surface is simply a plane and the sampling is the regular camera grid. If a programmable robot with a camera arm is at hand, one can easily program all desired views and record a calibrated image sequence. For sequences from a hand-held video camera however we obtain a general surface with possibly complex geometry and non-uniform sampling. To generate the viewpoints with a simple video camera, one might want to sweep the camera around the object, thus creating a zig-zag scanning path on the viewing surface. The problem that arises here is that typically very long image sequences of several hundreds of views have to be processed. If we process the images strictly in sequential order as they come from the video stream, then images have to be tracked one by one. One can think of walking along the path of camera viewpoints given by the recording frame index [9]. This may cause error accumulation in viewpoint tracking, because object features are typically seen only in a few images and will be lost after some frames due to occlusion and mismatching. It would therefore be highly desirable to detect the presence of a previously tracked but lost feature and to tie it to the new image.

The case of disappearing and reappearing features is very common in viewpoint surface scanning. Since we sweep the camera in a zigzag path over the viewpoint surface, we will generate rows and columns of an irregular mesh of viewpoints. Even if the viewpoints are far apart in the sequence frame index they may be geometrically close on the viewpoint surface. We can therefore exploit the proximity of camera viewpoints irrespective of their frame index.

3.1. Sequential camera tracking

The basic tool for the viewpoint tracking is the two-view matcher. Image intensity features are detected and have to be matched between the two images $I_i$, $I_j$ of the view points $i$ and $j$. Here we rely on a robust computation of the fundamental matrix $F_{ij}$ with the RANSAC (RANdom SAmpling Consensus) method [16]. A minimum set of 7 feature correspondences is picked from a large list of potential image matches to compute a specific $F$. For this particular $F$ the support is computed from the other potential matches. This procedure is repeated randomly to obtain the most likely $F_{ij}$ with best support in feature correspondence.
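As an illustration of this robust two-view step, the sketch below uses OpenCV's RANSAC estimator, which samples 8-point subsets rather than the 7-point sets mentioned above; the function choice and thresholds are assumptions for the example, not the paper's implementation.

```python
import cv2
import numpy as np

def robust_fundamental(pts_i, pts_j):
    """RANSAC estimate of F between two views from tentative matches.

    pts_i, pts_j: Nx2 float32 arrays of matched feature positions in
    images I_i and I_j. Returns F and a boolean mask marking the
    matches that support the winning hypothesis.
    """
    F, mask = cv2.findFundamentalMat(
        pts_i, pts_j,
        method=cv2.FM_RANSAC,
        ransacReprojThreshold=1.0,  # max distance to epipolar line (px)
        confidence=0.99,
    )
    if F is None:
        raise RuntimeError("not enough consistent matches")
    return F, mask.ravel().astype(bool)
```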

The next step after establishment of $F$ is the computation of the $3 \times 4$ camera projection matrices $P_i$ and $P_j$. The fundamental matrix alone does not suffice to fully compute the projection matrices. In a bootstrap step for the first two images we follow the approach by Beardsley et al. [1]. Since the camera calibration matrix $K$ is unknown a priori we assume an approximate $\hat{K}$ to start with. The first camera is then set to $P_0 = \hat{K}\,[I \mid 0]$ to coincide with the world coordinate system, and the second camera $P_1$ can be derived from the epipole $e$ and $F$ as

$$
P_1 = \hat{K}\,\bigl[\,[e]_\times F + e\,a^\top \;\big|\; \rho\,e\,\bigr]
\qquad \text{with} \qquad
[e]_\times =
\begin{bmatrix}
0 & -e_3 & e_2 \\
e_3 & 0 & -e_1 \\
-e_2 & e_1 & 0
\end{bmatrix}.
$$

$P_1$ is defined up to a global scale $\rho$ and the unknown plane $\Pi_\infty$, encoded in $a^\top$ (see also [15]). Thus we can only obtain a projective reconstruction. The vector $a$ should be chosen such that the left $3 \times 3$ matrix of $P_1$ best approximates an orthonormal rotation matrix. The scale $\rho$ is set such that the baseline length between the first two cameras is unity. $K$ and $a$ will be determined later during camera self-calibration.

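A minimal numerical sketch of this bootstrap, assuming the simplest choices $a = 0$ and $\rho = 1$ (the paper instead picks $a$ and $\rho$ as described above), might look as follows.

```python
import numpy as np

def bootstrap_cameras(F, K_hat):
    """Projective bootstrap from F (sketch of the Beardsley-style step).

    Implements P0 = K^ [I | 0] and P1 = K^ [[e]x F + e a^T | rho e]
    with the simplifications a = 0 and rho = 1. The paper instead
    chooses a so that the left 3x3 part of P1 approximates a rotation,
    and rho so that the baseline between the two cameras is unity.
    """
    # Epipole e in the second view: the null vector of F^T.
    _, _, Vt = np.linalg.svd(F.T)
    e = Vt[-1] / np.linalg.norm(Vt[-1])
    e_cross = np.array([[0.0, -e[2], e[1]],
                        [e[2], 0.0, -e[0]],
                        [-e[1], e[0], 0.0]])
    P0 = K_hat @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K_hat @ np.hstack([e_cross @ F, e.reshape(3, 1)])  # a = 0, rho = 1
    return P0, P1
```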
Once we have obtained the projection matrices we can triangulate the corresponding image features to obtain the corresponding projective 3D object features. The object points are determined such that their reprojection error in the images is minimized. In addition we compute the point uncertainty covariance to keep track of measurement uncertainties. The 3D object points serve as the memory for consistent camera tracking, and it is desirable to track the projection of the 3D points through as many images as possible.
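The triangulation itself can be sketched with the standard linear (DLT) method; note that the paper additionally minimizes reprojection error and maintains a covariance per point, which this illustration omits.

```python
import numpy as np

def triangulate(P_i, P_j, x_i, x_j):
    """Linear (DLT) triangulation of one feature match.

    P_i, P_j: 3x4 projection matrices; x_i, x_j: 2-vectors (pixels).
    Returns the homogeneous 4-vector X minimizing the algebraic error.
    """
    A = np.vstack([
        x_i[0] * P_i[2] - P_i[0],
        x_i[1] * P_i[2] - P_i[1],
        x_j[0] * P_j[2] - P_j[0],
        x_j[1] * P_j[2] - P_j[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]
```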

Each new view of the sequence is used to refine the initial reconstruction and to determine the camera viewpoint. Here we rely on the fact that two adjacent frames of the sequence are taken from nearby view points, hence many object features will be visible in both views. The procedure for adding a new frame is much like the bootstrap phase. Robust matching of $F_{i,i+1}$ between the current and the next frame of the sequence relates the 2D image features between views $I_i$ and $I_{i+1}$. Since we also have the 2D/3D relationship between image and object features for view $I_i$, we can transfer the object features to view $I_{i+1}$ as well. We can therefore think of the 3D features as a self-induced calibration pattern and directly solve for the camera projection matrix from the known 2D/3D correspondence in view $I_{i+1}$ with a robust (RANSAC) computation of $P_{i+1}$. In a last step we update the existing 3D structure by minimizing the resulting feature reprojection error in all images. A Kalman filter is applied for each 3D point and its position and covariance are updated accordingly. Unreliable features and outliers are removed, and newly found features are added.
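A sketch of this resectioning step, using a linear DLT solve inside a simple RANSAC loop over the 2D/3D correspondences (sample size, iteration count and threshold are assumed values):

```python
import numpy as np

def resect_camera(X, x):
    """DLT resection: solve P (3x4) from n >= 6 correspondences X <-> x.

    X: nx4 homogeneous 3D points, x: nx2 image points. This is the
    'self-induced calibration pattern' idea: known 3D structure plus
    its 2D projections in the new view determine P linearly.
    """
    rows = []
    for Xk, xk in zip(X, x):
        rows.append(np.hstack([Xk, np.zeros(4), -xk[0] * Xk]))
        rows.append(np.hstack([np.zeros(4), Xk, -xk[1] * Xk]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

def resect_ransac(X, x, iters=500, thresh=2.0):
    """Robust wrapper: sample minimal 6-point sets, keep best support."""
    rng = np.random.default_rng(0)
    best_P, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(X), 6, replace=False)
        P = resect_camera(X[idx], x[idx])
        proj = X @ P.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.sum(np.linalg.norm(proj - x, axis=1) < thresh)
        if inliers > best_inliers:
            best_P, best_inliers = P, inliers
    return best_P
```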

3.2. Viewpoint mesh weaving

The sequential approach as described above yields good results for the tracking of short sequences. New features are added in each image and the existing features are tracked throughout the sequence. Due to scene occlusions and inevitable measurement outliers, however, the features may be lost or wrongly initialized, leading to erroneous estimates and ultimately tracking failure. So far several strategies have been developed to avoid this situation. Recently Fitzgibbon et al. [5] addressed this problem with a hierarchical matching scheme that matches pairs, triplets, short subsequences and finally full sequences. However, they track along the linear camera path only and do not consider the extended relationship in a mesh of viewpoints.

By exploiting the topology of the camera viewpoint distribution on the viewpoint surface we can extend the sequential tracking to a simultaneous matching of neighboring viewpoints. The viewpoint mesh is defined by the node geometry (camera viewpoints) and the topology (which viewpoints are nearest neighbors). We start sequentially as described above to compute the geometry of the camera from the connectivity with the previous viewpoint. To establish the connectivity to all nearest viewpoints we now have two possibilities: look-ahead and backtracking. For look-ahead one computes image relationships between the current and all future frames. Such an approach has been developed for collections of images [10]. It has the advantage that it can handle all images in parallel, but the computational costs are quite high. For backtracking the situation is more fortunate, since for previous cameras we have already calibrated their position. It is therefore easy to compute the geometrical distance between the current and all previous cameras and to find the nearest viewpoints. Of course one has to account for the non-uniform viewpoint distribution and to select only viewpoints that give additional information. We have adopted a scheme to divide the viewing surface into angular sectors around the current viewpoint and to select the nearest cameras that are most evenly distributed around the current position. The search strategy is visualized in fig. 1. The camera produces a path whose positions have been tracked up to viewpoint $i-1$ already, resulting in a mesh of viewpoints (filled dots). The new viewpoint $i$ is estimated from those viewpoints that are inside the shaded part of the sphere. The cut-out section avoids unnecessary evaluation of nearby cameras $i-1, i-2, \ldots$. The radius of the search sphere is adapted to the distance between the last two viewpoints.
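The sector-based backtracking selection could be sketched as follows; the sector count, cut-out angle and search-radius factor are assumed parameters, and the tangent plane of the viewpoint surface is simplified to the xy plane for brevity.

```python
import numpy as np

def select_neighbors(centers, cur, n_sectors=8, cutoff_deg=60.0,
                     radius_factor=3.0):
    """Backtracking: pick nearest calibrated viewpoints, one per sector.

    centers: array of already-estimated camera centers (3-vectors);
    cur: index of the current viewpoint. A cut-out cone towards the
    immediate predecessors i-1, i-2, ... suppresses trivially adjacent
    frames, and the search radius adapts to the last viewpoint spacing.
    """
    c = np.asarray(centers[cur])
    radius = radius_factor * np.linalg.norm(c - centers[cur - 1])
    back_dir = np.asarray(centers[cur - 1]) - c
    best = {}
    for j in range(cur - 1):
        d = np.asarray(centers[j]) - c
        dist = np.linalg.norm(d)
        if dist == 0.0 or dist > radius:
            continue
        cos_ang = d @ back_dir / (dist * np.linalg.norm(back_dir) + 1e-12)
        if cos_ang > np.cos(np.radians(cutoff_deg)):
            continue  # inside the cut-out towards the recent path
        sector = int((np.arctan2(d[1], d[0]) + np.pi)
                     / (2.0 * np.pi) * n_sectors) % n_sectors
        if sector not in best or dist < best[sector][0]:
            best[sector] = (dist, j)  # keep the closest camera per sector
    return sorted(j for _, j in best.values())
```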

Once we have found the local topology to the nearest view points we can update our current position by additional matching. In fact, each connecting edge of our viewpoint mesh allows the computation of $F_{ij}$ between the viewpoints $i$ and $j$. More important, since we are now matching with images way back in the sequence, we can couple the 3D structure much more effectively to image matches. Thus, a 3D feature lives much longer and is seen in more images than with simple sequential tracking (see fig. 4). In addition to the coupling of old features we obtain a much more stable estimate for the single viewpoint as well. Each image is now matched with (typically 3-4) images in different spatial directions, which reduces the risk of critical or degenerate situations.

Figure 1. Search strategy to determine the topology between the viewpoints. [Sketch: camera path over the viewpoint surface (axes s, t); viewpoints 1, 2, 3, ... tracked up to i-1; backtracking search range around the new viewpoint with a cut-out towards the look-ahead direction.]

4. Rendering from the viewpoint mesh

For image-based rendering, virtual camera views are to be reconstructed from the set of calibrated views. The lumigraph approach [6] synthesizes a regular viewpoint grid through rebinning from the estimated irregular cameras. Because the grid is interpolated from the original data, information is lost and blurring effects occur. To prevent the disadvantages of the rebinning step, we render views directly from the originally recorded images. In the simplest way this is achieved by projecting all images onto a common plane of "mean geometry" by a 2D projective mapping. Having a full triangulation of the viewpoint surface, we project this mesh into the virtual camera. For each triangle of the mesh, only the views that span the triangle contribute to the color values inside. Each triangle acts as a window through which the three corresponding mapped textures are seen in the virtual camera. The textures are overlaid by alpha blending, using barycentric weights depending on the distance to the corresponding triangle corner.
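A software sketch of this triangle-wise blending follows; the homographies onto the plane of mean geometry are assumed to be given (their estimation is outside this sketch), triangle corners are assumed in counter-clockwise order, and in practice this step maps onto the texture hardware.

```python
import cv2
import numpy as np

def render_triangle(views, homographies, tri_virtual, out_shape):
    """Composite one viewpoint-mesh triangle into the virtual image.

    views: the three source images spanning the triangle.
    homographies: 3x3 projective mappings from each source image onto
    the mean-geometry plane as seen by the virtual camera.
    tri_virtual: 3x2 array of the triangle corners in the virtual image.
    Each warped texture is weighted by the barycentric coordinate of a
    pixel w.r.t. 'its' corner, so views fade in and out as the virtual
    camera moves across the viewpoint mesh.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    a, b, c = tri_virtual
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (xs - c[0]) + (c[0] - b[0]) * (ys - c[1])) / det
    w1 = ((c[1] - a[1]) * (xs - c[0]) + (a[0] - c[0]) * (ys - c[1])) / det
    w2 = 1.0 - w0 - w1
    inside = (w0 >= 0) & (w1 >= 0) & (w2 >= 0)  # pixels in the triangle
    out = np.zeros((h, w, 3), dtype=np.float64)
    for img, H, wk in zip(views, homographies, (w0, w1, w2)):
        warped = cv2.warpPerspective(img, H, (w, h)).astype(np.float64)
        out += (wk * inside)[..., None] * warped
    return out
```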

Combining images and geometry. The texture interpolation can be improved with local depth maps. Depending on the virtual camera position, a plane of mean geometry is assigned adaptively to each image triplet which forms a triangle. Adaptive to the size of each triangle and the complexity of the geometry, further subdivision of each triangle can improve the accuracy of the rendering. The approach is described in more depth in [11]. As the rendering is a 2D projective mapping, it can be done in real time using the texture mapping and alpha blending facilities of graphics hardware.

5. Experimental results

To evaluate our approach, we recorded a test sequence with known ground truth from a calibrated robot arm. The camera is mounted on the arm of a robot of type SCORBOT-ER VII. The position of its gripper arm is known from the angles of the 5 axes and the dimensions of the arm. Optical calibration methods were applied to determine the eye-hand calibration of the camera w.r.t. the gripper arm. We achieve a mean absolute positioning error of 4.22 mm or 0.17 degree, respectively [2]. The repetition error of the robot is 0.2 mm and 0.03 degrees, respectively. Because of the limited size of the robot, we are restricted to scenes with a maximal size of about 100 mm in diameter.

Figure 2. Images 1 and 64 of the 8x8 original camera images of the sphere sequence.

For the ground truth experiment the robot sampled an 8x8 spherical viewing grid with a radius of 230 mm. The viewing positions enclosed a maximum angle of 45 degrees, which gives an extension of the spherical viewpoint surface patch of 180x180 mm². The scene consists of a cactus and some metallic parts on a piece of rough white wallpaper. Two of the original images are shown in fig. 2. Please note the occlusions, reflections and illumination changes in the images.

We compared the viewpoint mesh weaving algorithm with the sequential tracking and with ground truth data. Fig. 3 shows the camera path and connectivity for the sequential tracking (left) and viewpoint weaving (right). Weaving generates a topological network that tightly connects all neighboring views. On average each node was linked to 3.4 connections. The graph in fig. 4 illustrates very clearly the survival of 3D points. A single point may be tracked throughout the sequence but is lost occasionally due to occlusion. However, as the camera passes near a previous position in the next sweep it is revived and hence tracked again. This results in fewer 3D points (# Pts) which are tracked in more images (# Im/Pts). Some statistics of the tracking are summarized in table 1.

Figure 3. Left: Camera track and view points for sequential tracking. Right: Camera topology mesh and view points for viewpoint mesh weaving. The cameras are visualized as pyramids, the black dots display some of the tracked 3D points.

Figure 4. Distribution of tracked 3D points (vertical) over the 64 images (horizontal). Left: sequential tracking. Right: viewpoint mesh weaving. Please note the specific 2D pattern in the right graph that indicates how a tracked point is lost and found back throughout the sequence.

A minimum of 3 images is required before a feature is kept as a 3D point. For viewpoint weaving, 3D points are usually tracked in double the number of images as compared to sequential tracking, and the average number of image matches (# Pts/Im) is increased. It is also important that the number of points that are tracked in only 3 images (# Min Pts) drops sharply. These points are usually unreliable and should be discarded.

Table 1. Tracking statistics over 64 images.

Algorithm:          sequential tracking    viewpoint weaving
# Pts               3791                   2169
# Im/Pts (ave.)     4.8                    9.1
# Im/Pts (max.)     28                     48
# Pts/Im (ave.)     286                    306
# Min Pts           1495                   458

A quantitative evaluation of the tracking was performed by comparing the estimated metric camera pose with the known Euclidean robot positions. We anticipate two types of errors: 1) a stochastic measurement noise on the camera position, and 2) a systematic error due to a remaining projective skew from imperfect self-calibration. For comparison we transform the measured metric camera positions into the Euclidean robot coordinate frame. With a projective transformation we can eliminate the skew and estimate the measurement error. We estimated the projective transform from the 64 corresponding camera positions and computed the residual distance error. The distance error was normalized to relative depth by the mean surface distance of 250 mm. The mean residual error dropped from 1.1% for sequential tracking to 0.58% for viewpoint weaving (see table 2). The position repeatability error of the robot itself is 0.08%.

If we assume that no projective skew is present, then a similarity transform will suffice to map the coordinate sets onto each other. A systematic skew however will increase the residual error. To test for skew we computed the similarity transform from the corresponding data sets and evaluated the residual error. Here the mean error increased by a factor of about 2.4 to 1.4%, which is still very good for pose and structure estimation from fully uncalibrated sequences.

Table 2. Ground truth comparison of 3D camera positional error between the 64 estimated and the known robot positions [in % of the mean object distance of 250 mm].

Tracking            projective          similarity
error [%]           mean     dev        mean     dev
sequential          1.08     0.69       2.31     1.08
2D viewpoints       0.57     0.37       1.41     0.61
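The alignment step for the similarity test can be sketched with the standard closed-form (Umeyama-style) least-squares similarity estimate; the function below is an illustration of the evaluation procedure, not the authors' code.

```python
import numpy as np

def similarity_residual(est, gt):
    """Mean residual after best similarity (scale, R, t) alignment.

    est, gt: Nx3 arrays of corresponding estimated camera centers and
    ground-truth robot positions. The returned mean distance is what
    gets normalized by the 250 mm mean object distance in the paper.
    """
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g
    U, S, Vt = np.linalg.svd(G.T @ E / len(est))
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # guard against reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / E.var(0).sum()
    t = mu_g - s * R @ mu_e
    aligned = (s * (R @ est.T)).T + t
    return np.linalg.norm(aligned - gt, axis=1).mean()
```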

5.1. Hand-held office sequence

We tested our approach with an uncalibrated hand-held sequence. A digital consumer video camera (Sony DCR-TRV900 with progressive scan) was swept freely over a cluttered scene on a desk, covering a viewing surface of about 1 m². The resulting video stream was then digitized on an SGI O2 by simply grabbing 187 frames at more or less constant intervals. No care was taken to manually stabilize the camera sweep. Fig. 5 (top) displays two images of the sequence. The camera viewpoints are tracked and the viewpoint mesh topology is constructed with the viewpoint mesh weaving. Fig. 5 (bottom) shows the statistics of the tracked 3D feature points (left) and the resulting camera viewpoint mesh with 3D points (right). The point distribution (left) shows the characteristic weaving structure when points are lost and found back throughout the sequence. The camera tracking illustrates nicely the zigzag scan of the hand movement as the camera scanned the scene. The viewpoint mesh is irregular due to the arbitrary hand movements. The black dots represent the reconstructed 3D scene points.

Figure 5. Top: Two images from the hand-held office sequence. Bottom left: Distribution of 3D feature points (7014 points, vertical) over the image sequence (187 images, horizontal). Bottom right: Viewpoint mesh with cameras as pyramids and 3D points (in black).

Figure 6. 3D surface model of the office scene rendered with shading (left) and texture (right).

The statistical evaluation gives an impressive account of the tracking abilities. The camera was tracked over 187 images with on average 452 matches per image. A total of 7014 points were generated and matched on average in 12 images each. One 3D point was even tracked over 181 images, with image matches in 95 images.

Scene reconstruction and viewpoint mesh rendering. From the calibrated sequence we can compute any geometric or image-based scene representation. As an example we show in fig. 6 a geometric surface model of the scene with local scene geometry that was generated with dense surface matching. More details on 3D reconstruction from plenoptic sequences are discussed in [10]. Some results of image-based rendering from the viewpoint mesh are shown in fig. 7. Novel views were rendered with different levels of local geometry (top). In the closeup views (bottom) a detail was viewed from different directions. The changing surface reflections are rendered correctly due to the view-dependent imaging. A more detailed account of the rendering techniques that incorporate local depth maps can be found in [11].

Figure 7. Top: novel scene view rendered with increasing level of geometric detail. Left: single depth plane; right: one depth plane per viewpoint triangle. Bottom: Two closeup views from different viewing directions. Please note the changing surface reflection on the object surface.

6. Further Work and Conclusions

We have proposed a camera calibration algorithm for geometric and plenoptic modeling from uncalibrated hand-held image sequences. During image acquisition the camera is swept over the scene to sample the viewing sphere around an object. The approach considers the two-dimensional topology of the viewpoints and weaves a viewpoint mesh with high accuracy and robustness. It significantly improves the existing sequential structure-from-motion approach and allows us to fully calibrate hand-held camera sequences that are targeted towards plenoptic modeling. The calibrated viewpoint mesh was used for the reconstruction of 3-D surface models and for image-based rendering, which even allows rendering of reflecting surfaces.

Acknowledgments: We acknowledge financial support from the Belgian project IUAP 04/24 'IMechS' and the German Research Foundation (DFG) under the grant number SFB 603.

References

[1] P. Beardsley, P. Torr and A. Zisserman: 3D Model Acquisition from Extended Image Sequences. Proc. ECCV'96, LNCS 1064, vol. 2, pp. 683-695, Springer, 1996.

[2] R. Beß: Kalibrierung einer beweglichen, monokularen Kamera zur Tiefengewinnung aus Bildfolgen. In: Kropatsch, W. G. and Bischof, H. (eds.), Informatics Vol. 5, Mustererkennung 1994, pp. 524-531, Springer Verlag, Berlin, 1994.

[3] O. Faugeras: What can be seen in three dimensions with an uncalibrated stereo rig. Proc. ECCV'92, pp. 563-578, 1992.

[4] O. Faugeras, Q.-T. Luong and S. Maybank: Camera self-calibration: Theory and experiments. Proc. ECCV'92, pp. 321-334, 1992.

[5] A. Fitzgibbon and A. Zisserman: Automatic Camera Recovery for Closed or Open Image Sequences. Proc. ECCV'98, LNCS 1406, Springer, 1998.

[6] S. Gortler, R. Grzeszczuk, R. Szeliski and M. F. Cohen: The Lumigraph. Proc. SIGGRAPH'96, pp. 43-54, ACM Press, New York, 1996.

[7] R. Hartley: Estimation of relative camera positions for uncalibrated cameras. Proc. ECCV'92, pp. 579-587, 1992.

[8] R. Koch, M. Pollefeys and L. Van Gool: Multi Viewpoint Stereo from Uncalibrated Video Sequences. Proc. ECCV'98, Freiburg, June 1998.

[9] R. Koch, B. Heigl, M. Pollefeys, L. Van Gool and H. Niemann: A Geometric Approach to Lightfield Calibration. Proc. CAIP'99, Ljubljana, Slovenia, Sept. 1999.

[10] R. Koch, M. Pollefeys and L. Van Gool: Robust Calibration and 3D Geometric Modeling from Large Collections of Uncalibrated Images. Proc. DAGM'99, Bonn, Germany, 1999.

[11] B. Heigl, R. Koch, M. Pollefeys, J. Denzler and L. Van Gool: Plenoptic Modeling and Rendering from Image Sequences taken by a Hand-Held Camera. Proc. DAGM'99, Bonn, Germany, 1999.

[12] M. Levoy and P. Hanrahan: Lightfield Rendering. Proc. SIGGRAPH'96, pp. 31-42, ACM Press, New York, 1996.

[13] L. McMillan and G. Bishop: Plenoptic Modeling: An Image-Based Rendering System. Proc. SIGGRAPH'95, pp. 39-46, 1995.

[14] M. Pollefeys, R. Koch and L. Van Gool: Self-Calibration and Metric Reconstruction in spite of Varying and Unknown Internal Camera Parameters. Proc. ICCV'98, Bombay, India, Jan. 1998.

[15] M. Pollefeys, R. Koch, M. Vergauwen and L. Van Gool: Metric 3D Surface Reconstruction from Uncalibrated Image Sequences. In: 3D Structure from Multiple Images of Large Scale Environments, LNCS 1506, pp. 139-154, Springer-Verlag, 1998.

[16] P. H. S. Torr: Motion Segmentation and Outlier Detection. PhD thesis, University of Oxford, UK, 1995.

[17] B. Triggs: The Absolute Quadric. Proc. CVPR'97, 1997.

[18] R. Y. Tsai: A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology using Off-the-Shelf Cameras and Lenses. IEEE Journal of Robotics and Automation, RA-3(4), pp. 323-344, Aug. 1987.