Automatic generation of high resolution urban zone digital elevation models


ELSEVIER ISPRS Journal of Photogrammetry & Remote Sensing 52 (1997) 33-47

PHOTOGRAMMETRY & REMOTE SENSING

Automatic generation of high resolution urban zone digital elevation models 1

L. Gabet a, G. Giraudon b,*, L. Renouard a

a ISTAR, Route des Lucioles, Sophia Antipolis, 06560 Valbonne, France

b INRIA, 2004, Route des Lucioles, Sophia Antipolis, 06560 Valbonne, France

Received 15 May 1996; accepted 9 September 1996

Abstract

Our paper presents an automatic generation of high resolution urban digital elevation models (DEMs) based on a highly redundant correlation process. We will discuss the difficulties of such a task by commenting on the state of the art, and we propose an approach in three main steps. In the first step, the image acquisition specification as image sequences leads to pairs with various base/height ratios in order to obtain good precision and few errors due to hidden parts. In the second step we use various stereovision methods and we merge the results, thus attributing to each pixel the most probable and precise elevation. In the third step we automatically extract terrain-DEM and building-DEM from computed DEM in order to specifically post-process each class. Finally, we combine these two DEMs to generate a final DEM which presents the best continuity for ground surface, and which respects sharp building discontinuities. The results obtained with an operational example (including image size, difficulty of the scene) demonstrate the feasibility of generating metric resolution urban data bases from automated digital stereo methods.

Keywords: Digital Elevation Model; sequence of aerial images; stereovision process; multiple correlation algorithms; urban database

1. Introduction

Creating 3D databases of terrain and especially of buildings in dense urban zones (e.g., digital elevation models (DEMs) of city areas) is an issue of high importance to many applications: cartography, mobile communications, architecture, photo-interpretation, street fighting simulation, etc. This domain is now a very active research field. We can cite as examples: in the USA, the large Radius project (Gerson, 1992; Gerson and Wood, 1994; Collins et al., 1995) and McKeown's team (McKeown et al., 1994); in Europe, the active groups at ETHZ (project Amobe) (Henricsson, 1996b), at the University of Bonn (Förstner, 1996), at the University of Graz (Gruber et al., 1995) and at the National Geographical Institute in France (Jamet and Le Men, 1995; Jamet, 1996). A workshop was recently held in Ascona (Gruen et al., 1995) on the automatic extraction of man-made objects from aerial and space images, at which 3D reconstruction and building extraction drew a large audience.

* Corresponding author.

1 A first version of this work has already been published in French in the Bulletin de la Société Française de Photogrammétrie et de Télédétection, Vol. 3, No. 135, September 1994.

0924-2716/97/$17.00 Copyright © 1997 Published by Elsevier Science B.V. All rights reserved. PII S0924-2716(96)00030-5

These databases are currently generated almost manually by photogrammetric techniques. However, passive stereovision techniques used in computer vision are now sufficiently developed to start thinking of more automated solutions for digital photogrammetric systems. Ackermann (1996) writes in a recent workshop: "Multiple image data constitute, by the way, a basic advantage of digital photogrammetric systems to [be] exploited in future". In the same spirit, we have developed an approach based not only on multiple image data but also on multiple algorithms whose results we merge. This paper therefore addresses the problem of DEM reconstruction in city areas (with sub-metric resolution) using automated stereovision techniques based on a multiple-baseline/multiple-algorithm approach.

The main principles of passive stereovision consist in (Faugeras, 1993):

• acquiring a stereo pair and a geometric model to obtain epipolar lines (calibration or autocalibration phase);

• choosing and extracting the primitives for matching attributes;

• creating the disparity map which shows the distortions between the two images along the main parallax. This requires choosing and using a matching algorithm to localize homologous attributes on both images corresponding to the same point in the object space;

• smoothing the disparity map;

• reconstructing the attributes in 3D from the disparity map and the geometric registration (absolute DEM).

Finally, a qualification of the process can be made by means of ground control points if they are available.

From the methodology point of view, the type of primitives is closely connected to the choice of the matching algorithm and depends very much on hypotheses about the object observed. Roughly speaking, there are two main categories of solutions: the photometric approach (area-based) and the geometric approach (feature-based). Of course, various mixed approaches are possible. The first category assumes that the object has a rich texture and that the surface is continuous. The 3D reconstruction of the earth surface from SPOT stereo images is the typical case. The second category assumes that the object has a sparse texture and presents large discontinuities of depth; the typical case is interior scenes.

In the first case, dense disparity maps and excellent reconstructions in 3D (Rougé et al., 1991; Muller and Day, 1989; Paar and Poelzleitner, 1991; Renouard, 1991; Fua, 1991) are obtained from methods based on correlation associated with pixel-type attributes. Since 1988 ISTAR's industrial production of 3D data has been based upon these principles (Renouard, 1987, 1990, 1991).

In the second case, the primitives selected are usually associated with contours. The most frequently used are contour chains (Meygret et al., 1992; Robert and Faugeras, 1991; Serra and Berthod, 1994), or line segments resulting from the polygonal approximation of these chains (Medioni and Nevatia, 1985; Long and Giraudon, 1986; Ayache and Faverjon, 1987). The algorithms used in this case are of diverse kinds, e.g., prediction-verification, relaxation or even dynamic programming algorithms. The results obtained are relatively good but not very dense.

The main difference between these two domains lies in the notion of discontinuity of the surface to reconstruct (see Faugeras et al. (1992) for a qualitative and quantitative comparison between these approaches). Correlation-based methods do not precisely localize discontinuities, and a dense disparity map cannot be obtained with contour-based methods. Reconstructing urban zones in 3D by stereovision is a difficult problem since we have a continuous and textured object which presents high surface discontinuities due to buildings.

Moreover, another constraint must be taken into account. Good reconstruction precision requires a high B/H ratio. Conversely, minimizing matching errors, i.e., gross reconstruction errors, requires a low B/H ratio. As we will see below, this is an important constraint which must be taken into account when acquiring a stereo pair.

We present in this paper an original approach for computing urban DEMs based on three principles:

• multiple-baseline: acquiring an image sequence to generate several stereo pairs with various B/H ratios;

• multiple algorithms: generating a digital elevation model (DEM) by using different matching techniques based on correlation;

• classes extraction: segmenting the DEM into two classes, terrain-DEM and building-DEM, and refining the two DEMs taking into account the constraints typical of each class.

We deliberately chose an operational example with 5000 × 5000 pixel images of the city of Marseilles (France) to test our approach, and the quality of the results obtained shows the feasibility of creating 3D urban databases by the passive stereovision process.

In the next section we present briefly the state of the art in computer vision regarding stereo in urban zones, and the main principles of our approach. In Section 3, we present in detail the various points of our methodology with a large number of results. Section 4 gives our conclusions.

2. Defining the problem

2.1. State of the art

A large number of papers on stereovision have been published by the computer vision community. We can refer to the overview articles published by Barnard and Fischler (1982) and more recently by Dhond and Aggarwal (1989), and to the book by Faugeras (1993). For digital photogrammetry approaches, see Heipke's recent overview (Heipke, 1996). Regarding traditional photogrammetry approaches, we can consult the photogrammetry reference manual (The Manual of Photogrammetry, 1980). In 1995, the latest research in stereovision for DEM reconstruction and building extraction was presented at the Ascona workshop (Gruen et al., 1995). However, no automated system has proven to be entirely satisfactory in all the various applications using stereovision. In the high resolution urban stereo domain (pixel size below 1 m) the approaches developed in some articles have become standard.

The first category of work is based on photometry (area-based), and usually uses algorithms based on correlation. Dense maps are obtained with these techniques, but they are often wrong near large discontinuities. Kanade and Okutomi's solution (Okutomi and Kanade, 1992; Kanade and Okutomi, 1994) is based on the computation of an adaptive correlation window whose size is determined by noise modelling and information associated with the image gradients. This method allows refining an initial disparity map. Lotti and Giraudon (1994) compute a disparity map from four adaptive windows around each pixel by incorporating information delivered by a contour detector.

Fua (1991) uses a classical correlation algorithm, but post-processes the disparity map by anisotropic smoothing using the gradient information of the original image; Cochran and Medioni (1992) also follow this idea in their work. These two approaches can be related to the fourth category explained below.

The second category is based on structured attribute matching (feature-based). The early works used segment matching (Medioni and Nevatia, 1985; Long and Giraudon, 1986; Ayache and Faverjon, 1987), and the more recent ones on urban zones (Chung and Nevatia, 1992; Venkateswar and Chellappa, 1992) use segment and corner grouping in order to produce high symbolic-value entities (facets). This facilitates the matching and 3D coherency verification of this information. In this way, a more complete work is realized in the Amobe project (Henricsson, 1996a,b; Bignone et al., 1996). We also have to cite work based on corner features (Förstner, 1986).

Another very interesting category of methods is based on the use of models. For photometric modelling, the most representative work is by Maitre and Luo (1992). They use an initial disparity map and a segmentation of the photometric image into regions. They assume that each region is connected to a projection of the surface model. They correct the disparity map by minimizing the disparity measured in each region with respect to the theoretical value attributed to a planar or quadratic model of the disparity. The satisfactory results obtained over a difficult-to-reconstruct building such as the Grand Palais in Paris show the promise of the approach. Förstner presents an approach based on geometric modelling (Förstner, 1988). His method is based on CAD models which are interactively matched building after building on the stereo pair. To refer to other works, see Hoff and Ahuja (1989).

The fourth category is related to fusion-based methods. For example, works of Hannah (1989) and Hsieh et al. (1992) combine matching results from algorithms based on the photometry and algorithms based on structured attributes. These works aim at combining the advantages of both approaches (dense map and reliability near discontinuities).

The results of McKeown et al. (Hsieh et al., 1992; McKeown and Perlant, 1992; Roux et al., 1995) are very interesting, and are in an application scheme similar to ours with respect to the density of the buildings in the scene and the size of the images processed.

Lastly, the work of Okutomi and Nakahara (1992) presents a stereo matching method which uses several stereo pairs with increasing B/H. This work is theoretically well argued and based on a photometric method which tends to minimize a function of disparity cost on all stereo pairs. The aim is to obtain a dense map which is both as precise as possible (large B/H) and as reliable as possible (small B/H).

In the next section, a method which combines the last approach with a fusion-based approach is presented.

Finally, other methodological approaches using radar sensors (ERS1), which are based on interferometry techniques, are being developed and the first results seem quite promising (Perlant and Massonnet, 1992; Labrousse et al., 1995).

2.2. General strategy

The strategy we have developed to compute urban DEMs is based on two observations:

1. Standard pairs of aerial images with 60% overlap are poorly suited to automated matching because of large deformations, hidden faces, and roads occluded by buildings;

2. Automated digital processing which takes these constraints into account presents great difficulties. Today, no existing algorithm is able to handle all constraints. Each method has its pros and cons with respect to each constraint.

The first observation leads us to adapt the image acquisition. To do so, we undertook an aerial image acquisition program with 87% overlap between consecutive scenes. This large overlap reduces the deformations and the hidden parts between two consecutive scenes. However, we can note that four consecutive scenes (three pairs) can constitute the equivalent of a standard pair with 60% overlap, which guarantees sufficient geometric precision.

The second observation leads us to combine methods. Indeed, combining methods seems better than developing a single universal algorithm which takes all constraints into account. Besides the deformation and hidden-face problems, zones with constant radiometry (absence of texture) must be taken into account in order to obtain a dense disparity map presenting a precise definition of the edges of building tops.

Therefore we developed a stereovision method for high resolution urban scenes based on a highly redundant correlation process. This process combines:

• the use of several stereo pairs with various B/H ratios, validated in an operational context (5000 × 5000 pixel image sequence) over the city of Marseilles (France), which presents a large variety of urbanization cases;

• the use of the results of several matching algorithms based on photometry.

Moreover, the computed DEM is segmented in the first phase into two classes, (1) terrain-DEM and (2) building-DEM, in order to separate problems and adapt the final production method of the DEM with respect to the a priori continuity/discontinuity hypothesis that we make on the objects that we observe.

3. Description of the method

3.1. Data

A track of 30 aerial images of Marseilles was acquired with the following technical characteristics:

• black and white emulsion, images taken on June 29th 1992 at 11.50 am (the time was chosen in order to minimize cast shadows);

• the images are taken at 1 : 8000 scale;

• the long 214.22 mm focal length limits the distortion on the edge of the images;

• the 25 µm digitizing step gives a 20 cm average resolution on the ground;

• images are acquired vertically.

The 1 km × 1 km zone of interest, i.e., 5000 × 5000 pixels, was chosen for the diversity of its urban structures. There are two types of representative structures in this zone: the old town and the new town.

The characteristics of the old town are:

• a high density of four-story buildings with tiled roofs (periodic texture);

• narrow streets (on the order of 4 m);

• irregular street grid.

The characteristics of the new town are:

• a few scattered 20-story buildings;

• 10-story buildings with a large ground area and concrete roofs (uniform texture);

• wide streets sometimes planted with trees.

Four successive representative images covering the zone of interest were selected from the 30 images of the track. Regions of each of these images are shown in Fig. 1. A wider view of the zone is shown in Fig. 7.

Fig. 1. 200 m × 200 m samples of the four vertical images (Image 1 to Image 4).

The geometry of the images was modeled by aerotriangulation with ground control points on 1 : 5000 maps from characteristic points (sidewalks, bases of buildings, etc.) at ground level. The height of these points was computed from the existing contour lines of the map.

3.2. Using the method

The method can automatically generate a Digital Terrain Model from the four initial images. The six main steps of the method are:

• the computation of a DEM for six possible stereo pairs with different matching algorithms;

• the generation of a DEM "filtered" from various DEMs;

• the segmentation of the filtered DEM into regions with a homogeneous surface criterion;

• the classification of the filtered DEM into two types of surfaces, those related to buildings and those related to the ground;

• approximating building blocks with polygons;

• refining the filtered DEM with the building polygons.

3.2.1. The matching algorithms

Three matching algorithms were used. All three are based on intercorrelation with a fixed window size of 7 × 7 or 9 × 9.

• The first is a standard intercorrelation algorithm;

• the second is also based on the intercorrelation algorithm. All the pixels of the image are processed as follows: for all possible disparities, the correlation is computed and the correlation coefficient is stored. In the space thus created, the disparity which defines a planar surface of maximum coefficient must be found for each pixel;

• the third is also based on the intercorrelation algorithm, to which an ordering constraint is added.
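As an illustration of the first, standard intercorrelation algorithm, the following minimal sketch searches the disparity that maximizes a correlation score for one pixel. It rests on several assumptions that are not stated in the paper: the two images are already resampled so that homologous points lie on the same row, the score is the normalized cross-correlation, and the function names `ncc` and `best_disparity` are hypothetical.

```python
import math

def ncc(ref, other, row, col_ref, col_other, half):
    """Normalized cross-correlation of two (2*half+1) x (2*half+1) windows."""
    a, b = [], []
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            a.append(ref[row + dr][col_ref + dc])
            b.append(other[row + dr][col_other + dc])
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A textureless window has zero variance: return a neutral score.
    return num / den if den > 0 else 0.0

def best_disparity(ref, other, row, col, half=3, max_disp=8):
    """Return (disparity, score) maximizing the correlation for pixel
    (row, col) of the reference image; half=3 gives a 7 x 7 window."""
    width = len(other[0])
    best = (None, -1.0)
    for d in range(-max_disp, max_disp + 1):
        c = col + d
        if c - half < 0 or c + half >= width:
            continue  # window would leave the image
        score = ncc(ref, other, row, col, c, half)
        if score > best[1]:
            best = (d, score)
    return best
```

The caller is expected to keep `row` and `col` at least `half` pixels away from the borders of the reference image. The second and third algorithms (planar surface search in the correlation volume, ordering constraint) are not covered by this sketch.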

3.2.2. DEM computations

Three of the six possible pairs are generated from the successive images of the sequence, with 87% overlap. Two pairs are generated by using every second image, giving 74% overlap. The last pair is generated with the first and the last images of the sequence, with 61% overlap.

For each of the six pairs a matching computation is run using the three algorithms described above. Each computation is independent from the others. The results are in the form of disparity maps: one disparity map per pair and per algorithm. Each disparity map shows the distortions between the two images along the main parallax. The disparities between the two images are given with respect to a reference image (here the right image). The reference image is different for each pair, so the disparity maps cannot be compared to each other directly.

The disparity maps can be overlaid and compared by projecting them into a common cartographic frame. Geometric models are used for the projection. The information in the disparity maps (x, y, disp) is projected into absolute cartographic coordinates (X, Y, Z).

Once projected, the disparity maps constitute the elementary DEMs.

3.2.3. Comparison of the DEMs

18 DEMs were computed from the 6 pairs and the 3 algorithms. The theoretical altimetric precision of these DEMs is tied to the stereo pair characteristics. The pairs with 87%, 74% and 61% overlap have a B/H of 0.14, 0.28 and 0.42, respectively, i.e., an error of one pixel in matching generates an absolute height error of 1.43, 0.71 and 0.47 m, respectively.
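The figures above can be checked with a short calculation. The sketch below is only a plausibility check under stated assumptions: the 230 mm image side (`FORMAT_MM`) is an assumed standard aerial film format that is not given in the paper, the base is taken as the non-overlapping fraction of the footprint, and a one-pixel matching error is converted to height by dividing the ground pixel size by B/H. The function names are hypothetical.

```python
FOCAL_MM = 214.22   # focal length given in Section 3.1
FORMAT_MM = 230.0   # ASSUMPTION: 23 cm aerial film format, not stated in the paper
SCALE = 8000        # image scale 1 : 8000
PIXEL_UM = 25.0     # digitizing step in micrometres

def ground_pixel_m():
    """Ground size of one pixel: digitizing step times the scale factor."""
    return PIXEL_UM * 1e-6 * SCALE

def base_to_height(overlap):
    """B/H of a pair: non-overlapping fraction of the image side over the focal length."""
    return (1.0 - overlap) * FORMAT_MM / FOCAL_MM

def height_error_m(b_over_h, matching_error_px=1.0):
    """Height error caused by a matching error expressed in pixels."""
    return matching_error_px * ground_pixel_m() / b_over_h

for overlap in (0.87, 0.74, 0.61):
    bh = round(base_to_height(overlap), 2)
    print(f"overlap {overlap:.0%}: B/H = {bh:.2f}, "
          f"height error = {height_error_m(bh):.2f} m per pixel")
```

Under these assumptions the B/H values of 0.14, 0.28 and 0.42 are recovered, as are the 1.43 m and 0.71 m errors. For B/H = 0.42 the calculation gives about 0.476 m, which rounds to 0.48 m where the text quotes 0.47 m; the difference appears to be a matter of rounding.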

These DEMs differ for various reasons:

• their theoretical altimetric precision, determined by the B/H, is different;

• the reliability of the computed heights (in the sense "free of error") is different: the greater the overlap between the images of a pair (small B/H), the more reliable the computed heights;

• their hidden parts are different. Some narrow streets are hidden in one pair and visible in another.

Comparing the DEMs aims at using all the different characteristics of the DEMs in order to generate the most reliable (minimizing errors), most precise and most complete (minimizing the hidden zones) filtered DEM.

The algorithm used to make this comparison is based on a majority vote over all the DEMs. The DEMs can be overlaid so that heights can be compared pixel by pixel. Two heights are considered identical if their difference is less than 2 m. This threshold was chosen to make the various DEMs compatible whatever the value of the B/H; the 2-m value corresponds to the precision of the least precise DEMs. The DEMs are ranked according to the precision of their results, from the most to the least precise (decreasing B/H). The most precise height is given priority when selecting the final height for the filtered DEM. Practical analysis of the results shows that most errors of individual DEMs do not appear in the filtered DEM (see Fig. 2).

• The noise caused by moving vehicles was suppressed.

• The edges of buildings which are poorly defined in the individual DEMs due to the algorithms used (intercorrelation) are much better localized.

• The number of hidden surfaces is reduced, especially as some narrow streets whose height was not measured in certain initial DEMs now show in the final DEM.
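The majority vote described above can be sketched as follows. This is one possible reading rather than the authors' implementation: the input DEMs are assumed to be co-registered grids ordered from the most to the least precise, `None` marks a pixel without a measurement, and the `min_votes` parameter (the minimum number of agreeing DEMs required to keep a height) is a hypothetical addition.

```python
def fuse_pixel(heights, tol=2.0, min_votes=2):
    """Majority vote for one pixel.

    heights: one value per DEM, most precise DEM first; None = no measurement.
    Two heights agree when they differ by less than tol metres.
    """
    best_height, best_votes = None, 0
    for h in heights:
        if h is None:
            continue
        votes = sum(1 for other in heights
                    if other is not None and abs(other - h) < tol)
        # Strict '>' keeps the earlier (more precise) DEM when votes are tied.
        if votes > best_votes:
            best_height, best_votes = h, votes
    return best_height if best_votes >= min_votes else None

def fuse_dems(dems, tol=2.0, min_votes=2):
    """Apply the vote pixel by pixel to a list of equally sized grids."""
    rows, cols = len(dems[0]), len(dems[0][0])
    return [[fuse_pixel([d[r][c] for d in dems], tol, min_votes)
             for c in range(cols)]
            for r in range(rows)]
```

For example, `fuse_pixel([10.2, 10.0, 35.0, 10.9])` keeps 10.2: three DEMs agree within 2 m, the outlier at 35.0 is outvoted, and the most precise member of the winning group is retained.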

The quality of these results is due in particular to the fact that the correlation algorithms used generate aberrant results when a correlation error occurs. This makes discrimination by comparing the DEMs even more efficient, but only if there are enough DEMs in which the errors are decorrelated.

Fig. 2. For one pair (image 1 and image 2), comparison of the DEMs computed from the three algorithms (Algorithm 1, Algorithm 2, Algorithm 3) and the filtered DEM obtained by filtering the 18 initial DEMs, in which only reliable heights are kept. The grey levels represent heights: high points are in white, low points are in black. Zone size: 200 m × 200 m.

3.2.4. Segmentation of DEMs

An urban DEM presents two aspects: ground and buildings. Errors which remain in a DEM are also divided into two types: errors due to buildings, caused in particular by the edges of buildings, and errors due to the ground height, which includes urban fixtures (lampposts, benches, bus shelters, etc.), parked cars, trees, etc. To rectify these errors the DEM is segmented into two types of regions corresponding on the one hand to surfaces linked to the ground, and on the other hand to surfaces linked to buildings. In order to make this discrimination the DEM is segmented into surfaces of homogeneous height (see Fig. 3). A surface is considered homogeneous if the local height differences are less than 4 m. A homogeneous surface is separated from neighbouring surfaces by a difference of at least 4 m, or by a zone of no information. This segmentation defines a set of surfaces of homogeneous height, with a single number attributed to each surface.
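The homogeneous-height criterion can be illustrated by a simple region-growing pass. The sketch below is an assumption about how such a segmentation might be implemented, not the authors' code: neighbouring pixels (4-connectivity) are grouped when their heights differ by less than 4 m, `None` marks a pixel without information, and the name `segment_heights` is hypothetical.

```python
from collections import deque

def segment_heights(dem, max_step=4.0):
    """Label connected regions whose neighbouring heights differ by < max_step.

    dem: grid of heights, None = no information.
    Returns (labels, region_count); label 0 marks pixels without information.
    """
    rows, cols = len(dem), len(dem[0])
    labels = [[0] * cols for _ in range(rows)]
    region_count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if dem[r0][c0] is None or labels[r0][c0]:
                continue
            region_count += 1
            labels[r0][c0] = region_count
            queue = deque([(r0, c0)])
            while queue:  # breadth-first growth of the current region
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] == 0
                            and dem[rr][cc] is not None
                            and abs(dem[rr][cc] - dem[r][c]) < max_step):
                        labels[rr][cc] = region_count
                        queue.append((rr, cc))
    return labels, region_count
```

Each region receives a single number, as in the text; a gentle slope is kept in one region because only local differences between neighbours are tested.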

3.2.5. Building selection

From a DEM segmented with a homogeneous height criterion, the surfaces are divided into two groups: ground surfaces and building surfaces. The selection criterion is: a surface belongs to the ground group if its height is lower than or equal to that of the neighbouring surfaces; if not, it belongs to the building group. We can cite a recent paper by Weidner and Förstner (Weidner and Förstner, 1995) in which a building extraction method is presented. This method runs in two steps: firstly, buildings are detected on the DEM by mathematical morphology; secondly, each detected building is reconstructed by fitting geometric models (parametric or prismatic) in an iterative way.

Fig. 3. Segmentation of the DEM into homogeneous heights. Panels: 400 m × 400 m DEM; DEM segmented into regions, one grey level per region.

The position (higher, lower or of equal height) of a surface with respect to the surfaces it touches is computed by comparing their heights. We take the border pixels of the two surfaces and compute the height difference for each pair of contact points; these differences determine the difference between the two surfaces. A surface is said to be higher than its neighbour if 90% of the contact points between the two surfaces are higher.
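The 90% contact rule and the ground/building criterion can be sketched as below. The sketch assumes a label grid in which 0 means "no information" and regions are numbered from 1, takes contact points to be 4-adjacent pixel pairs, and uses hypothetical function names; it is an illustration, not the authors' implementation.

```python
def is_higher(dem, labels, a, b, ratio=0.9):
    """True if region a is higher than region b on at least `ratio`
    of their contact points (4-adjacent pixel pairs)."""
    rows, cols = len(dem), len(dem[0])
    higher = total = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != a:
                continue
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == b:
                    total += 1
                    higher += dem[r][c] > dem[rr][cc]
    return total > 0 and higher >= ratio * total

def classify(dem, labels, region_count, ratio=0.9):
    """A region is 'building' if it is higher than at least one neighbour,
    otherwise 'ground' (lower than or equal to all its neighbours)."""
    result = {}
    for a in range(1, region_count + 1):
        above = any(is_higher(dem, labels, a, b, ratio)
                    for b in range(1, region_count + 1) if b != a)
        result[a] = "building" if above else "ground"
    return result
```

With a flat region at about 10 m touching a region at 25 m, the first is classified as ground and the second as building.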

This selection criterion is efficient in most cases for discriminating ground from buildings. The selection errors are caused by buildings which are not right along a road, and by building roofs with a gradual slope to the ground. These surfaces are manually integrated into the building group. In our zone of study, only a few buildings were misclassified, and all roads were correctly selected.

The segmentation into two classes can generate two DEMs, one containing buildings and one the ground (Fig. 4).

Fig. 4. Building DEM (top) and ground DEM (bottom) obtained by classifying the regions into building-type regions and ground-type regions.

3.2.6. Building polygons (contours)

The analysis of the building-DEM shows that the roof height of the buildings is correctly recovered. On the other hand, the outside edges of the buildings are poorly localized. However, filtering from the various DEMs has refined the localization of the edges. In particular, none of the heights attributed to a building is outside the surface of its roof. The errors in positioning edges are from 2 to 5 pixels.

Fig. 5. Comparison between manually (bottom) and automatically (top) extracted building block contours.

Fig. 6. DEM of the bare-ground.

When improving building edge positioning we noticed that most buildings or blocks of buildings were simple geometric shapes (squares or rectangles), and could be defined very precisely by convex contours. Two steps are necessary for the computation: a polygonal approximation of the region borders created previously; then the creation of a convex polygon per region, followed by post-processing that fuses the polygons which intersect. When two convex polygons intersect, we assume that the buildings they surround belong to the same block of buildings. This fusion of convex polygons is iterated until no more polygons intersect. Each polygon is then filtered by a polynomial to determine a maximum of long straight line segments.

Finally, each polygon is assigned the average height of the building.
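The convex-polygon step and the iterated fusion of intersecting polygons can be sketched as follows. The techniques used here are choices made for illustration and are not named in the paper: the convex polygon of a region is computed with Andrew's monotone chain, and intersection is tested with the separating-axis theorem. Each region is assumed to supply at least three non-collinear border points; the function names are hypothetical. The subsequent straight-segment filtering and the height assignment are not covered.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; vertices are returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_intersect(p, q):
    """Separating-axis test for two convex polygons (touching counts as intersecting)."""
    for poly in (p, q):
        n = len(poly)
        for i in range(n):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % n]
            ax, ay = y1 - y2, x2 - x1  # normal of edge i
            proj_p = [ax * x + ay * y for x, y in p]
            proj_q = [ax * x + ay * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # a separating axis exists
    return True

def merge_blocks(regions):
    """regions: list of border-point lists. Fuse intersecting convex
    polygons until none intersect; returns the remaining polygons."""
    hulls = [convex_hull(r) for r in regions]
    merged = True
    while merged:
        merged = False
        for i in range(len(hulls)):
            for j in range(i + 1, len(hulls)):
                if convex_intersect(hulls[i], hulls[j]):
                    hulls[i] = convex_hull(hulls[i] + hulls[j])
                    del hulls[j]
                    merged = True
                    break
            if merged:
                break
    return hulls
```

Two overlapping squares are fused into one convex block, while a distant square is left as a separate polygon.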

However, concave-shaped buildings are very poorly defined with this method and must be out- lined manually. This method is fairly efficient for isolating building-blocks. The polygons defining the buildings within a block are suppressed when fusing the polygons.

For evaluation, all building contours were manually extracted and compared to the automatically computed contours (see Fig. 5). To compare the polygons, three criteria were used: the contour shape, the maximum shift, and the average shift in pixels between contours.

The contour shapes are comparable for 70% of the buildings; the polygon shapes are very different for the other 30%. The polygons with comparable shapes have maximum deviations of 5 pixels between the contours, and the average error is on the order of 2 pixels.
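As an illustration of the two numerical criteria, the sketch below computes a maximum and an average shift between two contours. The exact way the shift was measured is not specified here, so the definition used (distance from each vertex of the automatic contour to the nearest point of the closed manual contour) is an assumption, and the names `point_segment_distance` and `contour_shift` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def contour_shift(auto: List[Point], manual: List[Point]) -> Tuple[float, float]:
    """Return (maximum, average) distance from the auto vertices to the manual contour."""
    n = len(manual)
    dists = [
        min(point_segment_distance(p, manual[i], manual[(i + 1) % n]) for i in range(n))
        for p in auto
    ]
    return max(dists), sum(dists) / len(dists)
```

With coordinates expressed in pixels, both returned values are in pixels and can be compared directly with the figures quoted above.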

3.2.7. Ground processing

The ground-DEM obtained after selecting the homogeneous surfaces contains not only bare-ground information: all small relief along the streets, such as parked cars, bus shelters, trees, telephone booths, etc., is also integrated into the DEM. In order to generate a bare-ground DEM, all these small high frequencies have to be eliminated from the ground-DEM. The amplitude of these high frequencies is low, and is always less than 4 m.

Fig. 7. Mosaic orthophoto computed from four images and the ground-DEM.

Because of the absence of real texture (little radiometric variation), the altimetric information of street centres is difficult to recover. More information is available along sidewalks, on pedestrian crossings, and more generally in all textured zones.

The filtering principle of the ground-DEM is based on the hypothesis that the ground surface is very regular when free of high frequencies. The ground-DEM is filtered in three steps. The information-free zones (building sites, hidden zones, etc.) are interpolated by a regularisation technique. Then the DEM is convolved by a filter (deviation with respect to the average) in order to suppress high frequencies. An interpolation by regularisation is made once more to fill in the gaps left by the filtering. The result is shown in Fig. 6.
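The three filtering steps can be sketched on a small grid as follows. This is only an illustration under stated assumptions: the regularisation is replaced by a simple iterative 4-neighbour averaging (a membrane-style fill), the deviation filter rejects only cells that rise above their local mean, and the window radius, the 1 m threshold, the iteration count and the function names are all hypothetical. Missing heights are represented by `None`, and the grid is assumed to contain at least one known height.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def fill_gaps(dem: Grid, iterations: int = 200) -> List[List[float]]:
    """Fill None cells by repeated 4-neighbour averaging; known cells stay fixed."""
    rows, cols = len(dem), len(dem[0])
    known = [v for row in dem for v in row if v is not None]
    start = sum(known) / len(known)  # initial guess for every gap
    grid = [[start if v is None else v for v in row] for row in dem]
    gaps = [(r, c) for r in range(rows) for c in range(cols) if dem[r][c] is None]
    for _ in range(iterations):
        for r, c in gaps:
            nb = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            grid[r][c] = sum(nb) / len(nb)
    return grid


def remove_spikes(grid: List[List[float]], radius: int = 1, threshold: float = 1.0) -> Grid:
    """Set to None each cell exceeding its local window mean by more than threshold."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = [list(row) for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            if grid[r][c] - sum(window) / len(window) > threshold:
                out[r][c] = None
    return out


def bare_ground(dem: Grid, threshold: float = 1.0) -> List[List[float]]:
    """Interpolate gaps, suppress small raised objects, then interpolate again."""
    filled = fill_gaps(dem)
    despiked = remove_spikes(filled, threshold=threshold)
    return fill_gaps(despiked)
```

For example, a flat 5 x 5 street patch at 10 m with one 12 m cell (a parked vehicle, say) and one missing cell comes back as a uniform 10 m surface.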


Fig. 8. DEM (buildings, bare ground).

With the help of 1:5000 maps, the heights of about twenty spot heights were compared to the heights of the corresponding points in the bare-ground DEM. The height differences have a standard deviation of 40 cm; the maximum errors are 1 m.

3.2.8. Building processing

The polygons (building contours) previously computed are used here to correct the building-DEM. The edges of the polygons represent building edges. Each building is constrained by the limits of the polygon. The lack of altimetric information for buildings, due to non-correlated zones, is filled in by a median filter.
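A median fill constrained to a building polygon might look like the sketch below. It assumes the polygon has already been rasterised into a boolean mask, that missing heights are stored as `None`, and that a 3 x 3 window is used; none of these details is specified above, and `fill_building_gaps` is a hypothetical name.

```python
from statistics import median
from typing import List, Optional


def fill_building_gaps(
    heights: List[List[Optional[float]]],
    mask: List[List[bool]],
    radius: int = 1,
) -> List[List[Optional[float]]]:
    """Fill missing heights inside the building mask with the local median.

    Only valid cells that also lie inside the mask contribute, so ground
    heights outside the polygon never leak into the roof.
    """
    rows, cols = len(heights), len(heights[0])
    out = [list(row) for row in heights]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and heights[r][c] is None:
                window = [
                    heights[rr][cc]
                    for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                    if mask[rr][cc] and heights[rr][cc] is not None
                ]
                if window:  # leave the gap if no valid neighbour exists
                    out[r][c] = median(window)
    return out
```

Restricting the window to the mask is one way to respect the constraint that each building is bounded by its polygon.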

The validation of the building heights is difficult: maps indicating the height of buildings do not exist. The height of the buildings was measured for about thirty points with a stereo viewer from the 60% overlap pair, and compared to the corresponding value in the DEM. The results show a 1.5 m standard deviation in Z.

Fig. 9. Building 3D vectors automatically computed and manually corrected.

3.2.9. The final DEM

The final DEM is computed from the two DEMs: the ground-DEM and the building-DEM. The method is very simple: the building-DEM is overlaid on the ground-DEM in built-up areas.
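The overlay can be written in a few lines. The sketch assumes that both DEMs share the same grid, that built-up areas are given as a boolean mask, and that a building cell without a height falls back to the ground value; these conventions and the name `combine_dems` are illustrative assumptions.

```python
from typing import List, Optional


def combine_dems(
    ground: List[List[float]],
    building: List[List[Optional[float]]],
    built_up: List[List[bool]],
) -> List[List[float]]:
    """Take the building height inside built-up areas, the ground height elsewhere."""
    return [
        [b if m and b is not None else g for g, b, m in zip(g_row, b_row, m_row)]
        for g_row, b_row, m_row in zip(ground, building, built_up)
    ]
```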

4. Conclusion

In this paper we presented an automated method to produce high resolution DEMs in urban zones. In order to take into account the specificities of the application domain, our approach consists in decomposing the problem and optimizing each part. To that effect we have three main steps:

(1) Specification of the image acquisition as an image sequence to create pairs with different B/H ratios, so that we have good precision and few errors due to hidden parts.

(2) Use of different stereovision methods and fusion of the results, which attributes the most realistic and precise height to each pixel.

(3) Segmentation of the computed DEM into two classes, the ground-DEM and the building-DEM, allowing post-processing specific to each class, followed by recombination of the two DEMs to produce the final DEM, which presents the best continuity for the ground surface and respects the sharp discontinuities of the buildings.

Fig. 10. Map of Marseille: same location.

The results that we have presented (see Figs. 7-10 for the whole zone) in this real-scale example (with respect to image size and difficulty of the scene) show the feasibility of generating urban databases at meter resolution from digital methods of automated stereovision. A hard constraint of our method is the need for an absolute reference to combine DEMs in (X, Y, Z) space. This requires a map at a suitable scale, which is sometimes difficult to obtain. The continuation of our work will consist in diversifying the examples presented to take into account all types of urban structures, which vary widely in big cities in North America, Europe and southeast Asia. We can note that Jamet (1996) presents an interesting comparison, in economic terms, of our approach versus a manual production process for creating 3D urban databases, which shows that the profitability of the automatic process strongly depends on the type of city structure.

References

Ackermann, F., 1996. Some considerations about feature matching for the automatic generation of digital elevation models. OEEPE Workshop on the Application of Digital Photogrammetric Workstations, Lausanne (Switzerland), March 1996. See http://dgrwww.epfl.ch/PHOT/publicat/wks96/tocwks96.html.

Ayache, N. and Faverjon, B., 1987. Efficient registration of stereo images by matching graph descriptions of edge segments. Int. J. Comput. Vision, 1(2), April 1987.

Barnard, S.T. and Fischler, M.A., 1982. Computational stereo. Comput. Surveys, 14(4): 553-572.

Bignone, F., Henricsson, O., Fua, P. and Stricker, M., 1996. Automatic extraction of generic house roofs from high resolution aerial imagery. Computer Vision '96, March 1996.

Chung, R.C.K. and Nevatia, R., 1992. Recovering building structures from stereo. IEEE CVPR, No. 2840-5/92: 64-73.

Cochran, S.D. and Medioni, G., 1992. 3-d Surface description from binocular stereo. IEEE Trans. Pattern Anal. Machine Intelligence, 14(10): 981-994.

Collins, R.T., Jaynes, C., Stolle, F., Wang, X., Cheng, Y.Q., Hanson, A.R. and Riseman, M., 1995. A system for automated site model acquisition. In: SPIE (Editor), Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision II, Orlando, FL, April 1995, pp. 244-254.

Dhond, U.R. and Aggarwal, J.K., 1989. Structure from stereo - a review. IEEE Trans. Systems Man Cybernetics, 19(6): 1489-1510.

Faugeras, O., Fua, P., Hotz, B., Ma, R., Robert, L., Thonnat, M. and Zhang, Z., 1992. Quantitative and qualitative comparison of some area- and feature-based stereo algorithms. IWRCV. Bonn, Germany, 1992.

Faugeras, O., 1993. Three-Dimensional Computer Vision: a Geometric Point of View. MIT Press, 1993.

Förstner, W., 1986. A feature based correspondence algorithm for image matching. Int. Arch. Photogramm. Remote Sensing Symp., 26(3/3): 150-166.

Förstner, W., 1988. Model based detection and location of houses as topographic control point in digital images. Int. Arch. Photogramm. Remote Sensing, 27: 505-517.

Förstner, W., 1996. Technical report, Institute of Photogrammetry, University of Bonn, 1996. http://www.ipb.univ-bonn.de

Fua, P., 1991. Combining stereo and monocular information to compute dense depth maps that preserve discontinuities. IJCAI Conference, Sydney, Australia, August 1991.

Gerson, D.J., 1992. Radius: The government viewpoint. SPIE, 20th AIPR Workshop, Vol. 1623, pp. 148-151.

Gerson, D.J. and Wood, S.E., 1994. Radius phase ii: The radius testbed system. ARPA94, pp. 231-237.

Gruber, M., Pasko, M. and Leberl, F., 1995. Geometric versus texture detail in 3d models of real world buildings. Ascona Workshop, April 1995, pp. 189-198.

Gruen, A., Kuebler, O. and Agouris, P. (Editors), 1995. Automatic Extraction of Man-Made Objects from Aerial and Space Images, Birkhäuser, Monte Verita Ascona (Switzerland), April 1995.

Hannah, M.J., 1989. A system for digital stereo image matching. Photogramm. Eng. Remote Sensing, 55(12): 1765-1770.

Heipke, C., 1996. Overview of image matching techniques. OEEPE Workshop on the Application of Digital Photogrammetric Workstations, Lausanne (Switzerland), March 1996. See http://dgrwww.epfl.ch/PHOT/publicat/wks96/tocwks96.html.

Henricsson, O., 1996a. Analysis of image structures using color attributes and similarity relations. Technical sciences, Swiss Federal Institute of Techno. Zurich, Zurich, June 1996.

Henricsson, O., 1996b. Project amobe: Strategies, current work. In: Commission III, Theory and Algorithms. International Society for Photogrammetry and Remote Sensing, Vienna, July 1996, pp. 321-330.

Hoff, W. and Ahuja, N., 1989. Surfaces from stereo: Integrating feature matching, disparity estimation, and contour detection. IEEE Trans. Pattern Anal. Machine Intelligence, 11(2): 121-136.

Hsieh, Y.C., McKeown, D. and Perlant, F.P., 1992. Performance evaluation of scene registration and stereo matching for cartographic feature extraction. IEEE Trans. Pattern Anal. Machine Intelligence, 14(2): 214-238.

Jamet, O. and Le Men, H., 1995. Digital photogrammetry at the French national geographic institute: presentation of the research policy of a national mapping agency. In: SPIE (Editor), Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision II, Orlando, Florida, April 1995, pp. 140-147.

Jamet, O., 1996. Automated feature extraction on digital photogrammetric systems. In: Commission III, Theory and Algorithms, Vienna, July 1996, pp. 365-376.

Kanade, T. and Okutomi, M., 1994. A stereo matching algorithm with an adaptive window: Theory and experiment. IEEE Trans. Pattern Anal. Machine Intelligence, 16(9): 920-932.

Labrousse, D., Dupont, S. and Berthod, M., 1995. Synthetic aperture radar interferometry: a markovian approach for phase unwrapping. In: SPIE (Editor), Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision II, Orlando, Florida, April 1995.


Long, P. and Giraudon, G., 1986. Stereo matching based on contextual line-region primitives. 8th ICPR, Paris, October 1986. IEEE.

Lotti, J.L. and Giraudon, G., 1994. Adaptive window algorithm for aerial image stereo. In: Spatial Information from Digital Photogrammetry and Computer Vision. International Society for Photogrammetry and Remote Sensing, Munich, September 1994, pp. 517-524.

Maitre, H. and Luo, W., 1992. Using models to improve stereo reconstruction. IEEE Trans. Pattern Anal. Machine Intelligence, 14(2): 269-277.

The Manual of Photogrammetry, 1980. Am. Soc. Photogrammetry, Falls Church, VA, 1980.

McKeown, D.M. and Perlant, F.P., 1992. Refinement of disparity estimates through the fusion of monocular image segmentations. IEEE CVPR, No. 2855-3/92: 486-492.

McKeown, D.M., Cochran, S.D. and Ford, S.J., 1994. Research in the automated analysis of remotely sensed imagery: 1993-1994. ARPA Image Understanding Workshop, Monterey, CA, November 1994.

Medioni, G. and Nevatia, R., 1985. Segment-based stereo match- ing. CVGIP Graph. Models Image Processing, 31: 2-18, 1985.

Meygret, A., Thonnat, M. and Berthod, M., 1992. A pyramidal stereovision algorithm based on contour chain points. In: Computer Vision ECCV, Antibes, France, April 1992, pp. 83-88.

Muller, J.P. and Day, T., 1989. Digital elevation model production by stereo-matching spot image-pairs: a comparison of algorithms. Image Vision Comput., 7(2).

Okutomi, M. and Kanade, T., 1992. A locally adaptive window for signal matching. Int. J. Comput. Vision, 7(2): 143-162.

Okutomi, M. and Nakahara, T., 1992. A multiple-baseline stereo method. IEEE Image Understanding Workshop DARPA, 1992, pp. 409-427.

Paar, G. and Poelzleitner, W., 1991. Stereo vision and 3d terrain modelling for planetary exploration. First ESA Workshop on Computer Vision and Image Processing for Spaceborne Applications, Noordwijk, June 1991.

Perlant, F. and Massonnet, D., 1992. Different spot dem applications for studies in sar interferometry. ISPRS Commission IV, Washington, USA, August 1992.

Renouard, L., 1987. Création automatique de m.n.t. à partir de couples d'images spot. Technical report, Congrès, SPOT 1, Utilisation des Images, Bilan, Résultats. CNES, Paris, 1987.

Renouard, L., 1990. Experiences with automated terrain extraction from spot data. In: 10th Earsel symposium, Paris, June 1990.

Renouard, L., 1991. Restitution automatique du relief à partir de Couples Stéréoscopiques d'images du satellite SPOT. Thèse de doctorat, Ecole Polytechnique, July 1991.

Robert, L. and Faugeras, O., 1991. Curve-based stereo: Figural continuity and curvature. In: CVPR. Lahaina Maui, USA, June 1991, pp. 57-62.

Rougé, B., Juline, P., Berthon, J., Laporte, M., Coutin-Faye, S. and Moura, D.J.P., 1991. Martian digital elevation (3d) modelling. First ESA Workshop on Computer Vision and Image Processing for Spaceborne Applications, Noordwijk, June 1991.

Roux, M., Hsieh, Y.C. and McKeown, M., 1995. Performance analysis of object space matching for building extraction using several images. In: SPIE (Editor), Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision II, Orlando, Florida, April 1995, pp. 277-297.

Serra, B. and Berthod, M., 1994. Subpixel contour matching using continuous dynamic programming. CVPR, Seattle, USA, June 1994.

Venkateswar, V. and Chellappa, R., 1992. Hierarchical stereo matching using feature groupings. IEEE Image Understanding Workshop DARPA, January 1992, pp. 427-436.

Weidner, U. and Förstner, W., 1995. Towards automatic building extraction from high-resolution digital elevation models. ISPRS J. Photogramm. Remote Sensing, 50(4): 38-49.