
Rao-Blackwellized Importance Sampling of Camera Parameters from Simple User Input with Visibility Preprocessing in Line Space

Anonymous submission

Abstract

Users know what they see before where they are: it is more natural to talk about high-level visibility information ("I see such an object") than about one's location or orientation. In this paper we introduce a method to find, in 3D worlds, a density of viewpoints or camera locations from high-level visibility constraints on objects in this world. Our method is based on Rao-Blackwellized importance sampling. For efficiency, the proposal distribution used for sampling is extracted from a visibility preprocessing technique adapted from computer graphics.

We apply the method to find, in a 3D city model of Atlanta, the virtual locations of real-world cameras and viewpoints.

1 Introduction

3D models of large real-world outdoor and indoor locations are becoming increasingly accessible. We are interested in quickly locating users or cameras inside those models from high-level visibility constraints, such as "I see" or "I do not see" object x. By providing immediately accessible visibility constraints, users can be given back information regarding their location. This can also be used as an initial estimate for other optimizations.

For instance, one might want to find all apartments with a view on a particular monument. Google Earth has 3D models of several major US cities. Those models can be used to extract the apartments with the desired view. Users can also virtually see what the view looks like, and eventually go to the website of the real-estate agency which sells the apartment. Other applications include robot localization: a robot might capture features of particular buildings if outdoors, or walls if indoors, and could then be located using a rough model of the world. Users might also want to locate photographs they took; they are just required to input the buildings they recognize, and might later discover images taken by other users at the same location. In the 4D-cities project, we are interested in generating 3D models from a large set of images. Localization of cameras given images provides initial estimates for further reconstruction methods.

Figure 1: A bimodal density in a city 3D model. The two modes are circled in red. The buildings required as visible are highlighted in color. Samples from this distribution are shown in Figure 6.

Our approach is based on finding a density of camera locations: to each camera location is assigned its likelihood. The density becomes more localized in space as constraints are added. Figure 1 shows a bimodal density with two disjoint regions of space satisfying the constraints, although one is more likely than the other since its samples have higher



weights. The density is inferred using Rao-Blackwellized importance sampling – importance sampling of improved quality obtained by integrating out the subspaces of orientations and focal length. In the case of large 3D models, we do not want to sample the whole space, as only a small subset will satisfy the constraints. Bittner and Wonka [1] precompute the visibility of buildings on the ground plane for urban walkthroughs. The technique relies on solving the occlusion problem for regions of the ground plane (called "view cells" in the 2D case [1]) in the dual line space. By extending this technique to spatial locations, we can use this preprocessed visibility as the basis of the proposal distribution of our sampler.

A related area of work is wide-baseline matching [2, 3, 4, 5, 6, 7]. These algorithms work best for images with similar viewpoints. However, such images might not be available in the case of large models, or with historical images such as those we are interested in within the 4D-cities project. In the field of visibility, an exact solution of the visibility problem can be found using the scale-space aspect graph [8, 9]; however, this solution is computationally very expensive. Visibility has also been extensively studied in computer graphics for accelerated rendering of 3D scenes (for a review, see [10]). Techniques separate into two kinds: from-point and from-region. Preprocessing in the dual line space as in [1] allows efficient from-region preprocessing, and can act as a good prior for the refinement at the point level that the sampling will constitute.

2 Density Estimation

A Bayesian density will be used to represent knowledge about camera locations and orientations given a measurement. Through this density we will extract the most likely camera locations. The density, however, does not have a parametric form. We will make inference on it using Rao-Blackwellized importance sampling.

2.1 Modeling Assumption

We represent knowledge about camera locations with the density P(θ|Z), where θ = (t, R, f) is the 7-dimensional triple made of the three-dimensional translation t, rotation R, and focal length f of a given viewpoint or camera. Z is the measurement: the variable which contains the visibility constraints. P(θ|Z) tells how likely the camera parameters θ are given the visibility constraint Z. It can be rewritten using Bayes' law as

  P(θ|Z) = (1/P(Z)) P(Z|θ) P(θ),

where P(θ) is the prior likelihood of θ independently of any constraints (for instance, always zero under the ground plane).

2.2 Inference through Importance Sampling

This density P(θ|Z) does not have a parametric form and thus needs to be estimated. Importance sampling is a way of making approximate inference about a density P(θ|Z). First, samples θ1, θ2, ..., θr are generated from a proposal distribution Q(θ|Z). The proposal distribution is an initial hypothesis on the density P(θ|Z), given some knowledge. The approximate inference P̂(θ|Z) of P(θ|Z) is obtained by weighting each sample θi by an importance weight wi = P(θi|Z) / Q(θi|Z):

  P̂(θ|Z) = Σ_{i=1..r} wi δ(θi, θ)

where δ(θi, θ) = 1 when θ = θi, and 0 otherwise. We compute the weights using the formula P(θi|Z) = (1/P(Z)) P(Z|θi) P(θi), setting P(Z|θi) to 1 if the constraints Z are satisfied under the location, orientation, and focal length θi = (ti, Ri, fi), and to 0 otherwise.
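As a concrete illustration, the weighting step can be sketched in Python as follows. This is a minimal sketch, not the actual implementation: sample_proposal, proposal_pdf, prior, and constraints_satisfied are hypothetical placeholders for the proposal sampler Q(θ|Z), its density, the prior P(θ), and the binary visibility test P(Z|θ).

```python
def importance_sample(sample_proposal, proposal_pdf, prior,
                      constraints_satisfied, r=1000):
    """Draw r samples theta_i from Q(theta|Z) and weight each by
    w_i = P(theta_i|Z) / Q(theta_i|Z), where P(Z|theta_i) is a binary
    visibility test. Returns (theta, weight) pairs approximating P(theta|Z)."""
    weighted = []
    for _ in range(r):
        theta = sample_proposal()                      # theta = (t, R, f)
        visible = 1.0 if constraints_satisfied(theta) else 0.0
        # Unnormalized weight P(Z|theta) * P(theta) / Q(theta|Z);
        # the constant 1/P(Z) cancels in the normalization below.
        weighted.append((theta, visible * prior(theta) / proposal_pdf(theta)))
    total = sum(w for _, w in weighted)
    return [(th, w / total) for th, w in weighted] if total > 0 else weighted
```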

2.3 Rao-Blackwellized Importance Sampling

The quality of the inference provided by importance sampling can be improved using Rao-Blackwellization. We can rewrite the distribution as P(θ|Z) = P(R, f | Z, t) P(t|Z). It is equivalent, for extracting θ, to first extract t given Z, and then R, f given Z and t. The density P(R, f | Z, t) is easy to approximate when t is known. One can make inference on t by sampling P(θ|Z): this gives the estimator

  t̂(θ = (t, R, f)) = Σ_{i=1..r} t δ(θi, θ)

The Rao-Blackwell theorem says that a better estimator than t̂(θ) is t̂(s) = E[t̂(θ) | S(θ) = s, Z], where S is a sufficient statistic for t_real, the hidden "real" location behind the density. A sufficient statistic simply means that P(θ | S, Z) is independent of t_real, so that the estimator

  t̂(s) = E[t̂(θ) | S(θ) = s, Z] = Σ_θ t̂(θ) P(θ | S(θ) = s, Z)

is properly defined. In our case, a trivial sufficient statistic is S(θ = (t, R, f)) = t. We have that t̂(s) = t̂(t) = Σ_{ti ∈ TS} t δ(t, ti), with TS the set of unique ti in the samples θ1, ..., θr, and t̂(t) is seen with probability P(t|Z). This corresponds to sampling the distribution

  P(t|Z) = Σ_{ti ∈ TS} αi δ(ti, t),

where the αi are the new weights αi = P(ti|Z). We can compute them by integrating over R and f:

  αi = ∫ P(ti, R, f | Z) dR df
     = (P(ti) / P(Z)) ∫ P(Z | ti, R, f) P(R, f | ti) dR df,

using Bayes' law. We can now sample this higher-quality distribution to find a location t, and then sample P(R, f | Z, t) to find R and f.
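In practice, the αi can be approximated by Monte Carlo integration over R and f. The following sketch assumes hypothetical helpers sample_rotation_focal (draws from P(R, f | ti)), constraints_satisfied, and prior_t, with translations given as hashable tuples:

```python
def rao_blackwellized_weights(translations, sample_rotation_focal,
                              constraints_satisfied, prior_t, n_inner=100):
    """Approximate alpha_i = P(t_i|Z) for each candidate translation t_i by
    integrating the 0/1 visibility likelihood over rotation and focal length:
    alpha_i is proportional to P(t_i) * mean_j P(Z | t_i, R_j, f_j)."""
    alphas = {}
    for t in translations:
        hits = sum(1 for _ in range(n_inner)
                   if constraints_satisfied((t, *sample_rotation_focal(t))))
        alphas[t] = prior_t(t) * hits / n_inner   # unnormalized alpha_i
    total = sum(alphas.values())
    return {t: a / total for t, a in alphas.items()} if total > 0 else alphas
```

Sampling a location t from these collapsed weights, and then sampling P(R, f | Z, t) at that location, reproduces the two-stage scheme described above.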

3 Visibility Preprocessing

However, in the case of large models, only a small subset of space will actually satisfy the visibility constraints. It would be inefficient to sample naively, as most of the samples would have zero weight. We thus precompute visibility for 3D blocks of space, and integrate this information into the proposal distribution of the importance sampling. Bittner and Wonka [1] preprocess visibility for regions of the ground plane in order to make tractable the real-time rendering of walkthroughs of large city 3D models. We adapt this method to 3D space to extract the regions of space which satisfy our visibility constraints.

3.1 3D View Cells

We use 3D view cells to carve the space into distinct regions for which we will extract visibility information. A 3D view cell is represented by a box in space of height h whose base is a square of side a. The space is partitioned by a set of view cells centered at (ka, la, mh), where k, l, m ∈ Z (Figure 2). The 2.5D hypothesis on a 3D model consists in saying that every point can be described by the equation z = f(x, y), where x, y range over the ground plane. This hypothesis is satisfied by 3D models of cities, where z represents the height of the object located at (x, y). For a view cell, this hypothesis implies that an object is visible from within the view cell if and only if it is visible from its top square (where z = (m + 1/2)h). To find the objects visible at least from somewhere in the view cells, one may thus just look at those which are visible from each horizontal square centered at (ka, la, (m + 1/2)h). In the case of view cells of reasonable size (smaller than the objects, in order not to contain any of them), we can also make the following assumption: an object visible from within the view cell top square will also be visible from the view cell top square edges. We can thus restrict the study of visibility to the study of all light rays reaching this set of edges (Figure 2).
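Under these assumptions, the mapping from points to view cells and the edges that carry the visibility information can be sketched as follows (function names are illustrative, not from the paper):

```python
def view_cell_index(point, a, h):
    """Index (k, l, m) of the view cell, centered at (k*a, l*a, m*h),
    containing a given 3D point."""
    x, y, z = point
    return (round(x / a), round(y / a), round(z / h))

def top_square_edges(k, l, m, a, h):
    """The four edges of the cell's top square at z = (m + 1/2)*h; under
    the 2.5D hypothesis, visibility from the cell reduces to visibility
    from these edges."""
    z = (m + 0.5) * h
    corners = [(k * a - a / 2, l * a - a / 2), (k * a + a / 2, l * a - a / 2),
               (k * a + a / 2, l * a + a / 2), (k * a - a / 2, l * a + a / 2)]
    return [((corners[i][0], corners[i][1], z),
             (corners[(i + 1) % 4][0], corners[(i + 1) % 4][1], z))
            for i in range(4)]
```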

3.2 Visibility in Line Space

The problem of finding visibility information for all light rays entering a view cell is more easily solved in the dual line space.

First, we ignore the height information and consider the 2D plane containing a view cell top square. This plane contains a set of object edges – intersections of object faces with the plane. We would like to find all the lines which lie in this plane and intersect both the cell edge and an object edge. A line (a, b, c) in canonical space is associated to the point (a/c, b/c) in line space. The set of all lines l : a_l x + b_l y + 1 = 0 passing through a view cell edge and an object edge (represented as the shaded region in Figure 2) can be expressed as barycenters

  l = Σ_{l' ∈ {AC, AD, BC, BD}} α_{l'} (a_{l'} x + b_{l'} y + 1), with Σ_{l'} α_{l'} = 1.

In the dual line space, this means that all such lines are represented by the points contained within the polygon formed by the dual points of the 4 limit lines (Figure 2). An object edge e1 occludes another object edge e2 for all points on a view cell edge c if and only if e2 is located behind e1 and the polygon associated in line space with the four critical lines of e2 and c is fully contained within the polygon of e1. This means that all lines intersecting e2 and the cell edge c intersect e1 first.
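A minimal sketch of the dual mapping and the containment test, assuming the dual polygons are convex with counter-clockwise vertices (helper names are illustrative):

```python
def line_to_dual_point(a, b, c):
    """The line a*x + b*y + c = 0 (with c != 0) maps to the point
    (a/c, b/c) in the dual line space."""
    return (a / c, b / c)

def polygon_contains(poly, pt):
    """True if pt lies inside the convex polygon poly (CCW vertices),
    tested by the sign of the cross product along each edge."""
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1) < 0:
            return False
    return True

def dual_polygon_inside(inner, outer):
    """If e2's dual polygon (inner) lies inside e1's (outer), every line
    through the cell edge and e2 also crosses e1, so e2 is occluded."""
    return all(polygon_contains(outer, v) for v in inner)
```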

However, this reasoning only holds when the objects are of the same height, which is not the case in general. To integrate the height information, we compute the maximum angle with the ground plane made between the cell edge extremities (at height (m + 1/2)h) and all of the object edge extremities at the height of the object. All locations on the view cell edge see the object under a maximum angle below that computed angle. A face f1 of an object is occluded by another face f2 for all points on a view cell c if the projection of f1 on the view cell plane is occluded by the projection of f2 and if the maximum angle with the ground plane of f1 is smaller than the maximum angle of f2 (Figure 2). Figure 3 shows the occlusion mechanism in line space.

Figure 2: (a) A 3D view cell and a 3D object. The intersection of the object faces with the plane containing the view cell top edge creates occluders on this plane. (b) Limit lines between a view cell edge and an occluder. All lines in the shaded region are "averages" of those lines. Behind is one hidden green occluder, for which no limit lines are shown. (c) Line space representation of the 4 limit lines of the red occluder and the green occluder. The polygon of the green occluder is contained inside the one of the red occluder, as it is hidden by the red occluder. (d) A schematic representation of the BSP tree when a new blue occluder comes in.

Figure 3: Seeing occlusion in line space.
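The height test described above can be sketched as follows; this is an illustrative rendering of the angle criterion, not the paper's code:

```python
import math

def max_elevation_angle(cell_edge_pts, object_edge_pts, z_cell, z_obj):
    """Maximum angle with the ground plane between any cell edge extremity
    (at height z_cell = (m + 1/2)*h) and any object edge extremity (at the
    object's height z_obj); cell_edge_pts and object_edge_pts are 2D points."""
    best = -math.inf
    for cx, cy in cell_edge_pts:
        for ox, oy in object_edge_pts:
            dist = math.hypot(ox - cx, oy - cy)
            if dist > 0:
                best = max(best, math.atan2(z_obj - z_cell, dist))
    return best

# f1 is occluded by f2 for the whole cell edge only if its projection is
# occluded in line space (dual polygon containment, as sketched above)
# AND its maximum elevation angle is smaller than that of f2.
```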

3.3 Iterative Solving of Visibility with BSP Trees

We use Binary Space Partitioning trees (BSP trees) to test in line space whether a polygon is hidden by a set of others, and to maintain the occluder structure. BSP trees are convenient for making inside/outside comparisons. For simplicity, the tree can be seen as if each node corresponded to a polygon (in fact, nodes hold the lines generating the polygons). Each node has a "left" descendant corresponding to the subtree containing all the nodes located inside, and a "right" descendant corresponding to the subtree containing all the nodes located outside. When a new polygon comes in, if it lies inside, it is added to the left subtree; if it lies outside, it is added to the right subtree; and if it has both inside and outside parts, it is split into an inside part and an outside part, added to the left and right descendants respectively (Figure 2). We prune at construction the "inside" nodes whose associated angle (for the object's height) is smaller than that of their father, and add the polygons in front-to-back order. If an object is visible, it will remain in the tree.
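The insertion and pruning logic can be sketched as follows. This is a simplified outline: classify is an assumed helper that splits a dual polygon into its parts inside and outside an existing node's polygon, and the front-to-back ordering of insertions is left to the caller.

```python
class BSPNode:
    """One node of the line-space BSP tree: an occluder's dual polygon,
    its maximum elevation angle, and inside/outside subtrees."""
    def __init__(self, polygon, angle):
        self.polygon = polygon
        self.angle = angle
        self.inside = None    # occluder parts falling inside this polygon
        self.outside = None   # occluder parts falling outside this polygon

def insert(node, polygon, angle, classify):
    """Insert an occluder's dual polygon, pruning the parts hidden by an
    existing occluder in front with a larger elevation angle."""
    if node is None:
        return BSPNode(polygon, angle)
    inside_part, outside_part = classify(polygon, node.polygon)
    if inside_part is not None and angle >= node.angle:
        # Kept only if at least as tall; a lower occluder behind is pruned.
        node.inside = insert(node.inside, inside_part, angle, classify)
    if outside_part is not None:
        node.outside = insert(node.outside, outside_part, angle, classify)
    return node
```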

Given a new visibility constraint, one can now simply find which 3D view cells might satisfy the constraint by exploring the tree. Figure 4 shows the result of such preprocessing.

Figure 4: Rough 3D model of Atlanta used for testing. Shaded regions correspond to view cells which do not see the building highlighted in black.

4 Experiments

We tested our method using a rough 3D model of downtown Atlanta, containing 12 landmark buildings. We only looked at ground plane and building locations (Figure 5). This is equivalent to rewriting the prior on θ as the mixture P(θ) = α P_ground(θ) + β P_building(θ), where P_ground(θ) and P_building(θ) denote the prior on θ for ground locations and for locations over the building faces, respectively. Information about orientation is integrated in the priors: for instance, only "up" orientations are valid for the ground plane, and only "outside" orientations are valid for buildings. The parameters α and β are adjusted depending on whether we are looking for ground or building solutions.
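Sampling from this mixture prior is straightforward; a minimal sketch, where sample_ground and sample_building are assumed samplers that already encode the valid orientations:

```python
import random

def sample_mixture_prior(alpha, beta, sample_ground, sample_building):
    """Draw theta from P(theta) = alpha*P_ground(theta) + beta*P_building(theta)
    by picking a mixture component first, then sampling within it."""
    u = random.uniform(0.0, alpha + beta)
    return sample_ground() if u < alpha else sample_building()
```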

Figure 5 presents results when the system only has 2 constraints, for instance solving the problem "I want to test the views seeing those two particular buildings". The distribution is multimodal, as more than one region of space satisfies such constraints. Such a density could be weighted by other information, for instance in order to only sample the locations corresponding to an apartment that one is trying to sell.

Figure 5: Exploring views which satisfy visibility constraints. Red dots correspond to buildings required as visible. The first line of samples corresponds to ground locations, the second to building locations.

As constraints are added, we observe that the density goes from multimodal to unimodal (Figure 6). Figure 6 shows the localization of photographs taken in downtown Atlanta. Sampling the density extracted for each of these photographs often gives views close to the actual one. This confirms our modeling assumptions.

Figure 6: Localizing photographs. We highlighted in color, on both the photograph and the samples, the buildings set as visible. On the map, the buildings with black dots are the ones set as not visible. The last line corresponds to samples from the density of Figure 1.

Results from Figure 7 show the localization of images taken from the web (flickr.com). Interestingly, the first image is manually tagged on flickr with most of the building names. We used those tags to find a probable location of the person taking the picture. Samples are slightly biased towards the left, as the two rightmost buildings are not tagged. The second image corresponds to another image extracted from flickr that we tagged manually. Again, we find the building from which this picture was taken (191 Peachtree Tower).

Figure 7: Localizing photographs. Images are from flickr.com. We used the tags (building names) on the image on flickr to locate the first image. We manually tagged the second one. We highlighted in color, on both the photograph and the samples, the buildings set as visible.

5 Discussion

We introduced a method to efficiently estimate camera positions or viewpoints from simple visibility constraints. Thanks to visibility preprocessing in line space, the method scales to large 3D models. We demonstrated its accuracy for localizing photographs in a city model with as few as 12 buildings. The method will keep getting more accurate as new objects are added to models, as they will further restrict the regions satisfying users' visibility constraints.

References

[1] J. Bittner, P. Wonka, and M. Wimmer, "Visibility preprocessing for urban scenes using line space subdivision," in Proceedings of Pacific Graphics 2001 (Ninth Pacific Conference on Computer Graphics and Applications) (B. Werner, ed.), Los Alamitos, CA, pp. 276–284, IEEE Computer Society Press, Oct. 2001.

[2] C. Schmid and R. Mohr, "Local grayvalue invariants for image retrieval," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 5, pp. 530–535, 1997.

[3] P. Pritchett and A. Zisserman, "Wide baseline stereo matching," in Intl. Conf. on Computer Vision (ICCV), pp. 754–760, 1998.

[4] P. Pritchett and A. Zisserman, "Matching and reconstruction from widely separated views," in SMILE 98 European Workshop on 3D Structure from Multiple Images of Large-Scale Environments, Freiburg, Germany, 1998.


[5] A. Baumberg, "Reliable feature matching across widely separated views," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 774–781, 2000.

[6] T. Tuytelaars and L. Van Gool, "Matching widely separated views based on affinely invariant neighbourhoods," in British Machine Vision Conference (BMVC), pp. 412–422, 2000.

[7] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in British Machine Vision Conference (BMVC), pp. 414–431, 2002.

[8] H. Plantinga and C. R. Dyer, "Visibility, occlusion, and the aspect graph," Int. J. Comput. Vision, vol. 5, no. 2, pp. 137–160, 1990.

[9] D. W. Eggert, K. W. Bowyer, C. R. Dyer, H. I. Christensen, and D. B. Goldgof, "The scale space aspect graph," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1114–1130, 1993.

[10] D. Cohen-Or, Y. Chrysanthou, and C. Silva, "A survey of visibility for walkthrough applications," 2000.
