journal of archaeological sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/verhoeven...accurate...

11
Mapping by matching: a computer vision-based approach to fast and accurate georeferencing of archaeological aerial photographs G. Verhoeven a, b, d, * , M. Doneus b, d , Ch. Briese c, d , F. Vermeulen a a Department of Archaeology, Ghent University, Belgium b VIAS e Vienna Institute for Archaeological Science, University of Vienna, Austria c Institute of Photogrammetry and Remote Sensing, Vienna University of Technology, Austria d LBI for Archaeological Prospection and Virtual Archaeology, Vienna, Austria article info Article history: Received 28 July 2011 Received in revised form 16 February 2012 Accepted 21 February 2012 Keywords: 3D model Aerial archaeology Aerial photograph Computer vision Dense multi-view stereo reconstruction Orthophoto Structure from motion abstract To date, aerial archaeologists generally apply simple rectication procedures or more expensive and time-consuming orthorectication algorithms to correct their aerial photographs in varying degrees for geometrical deformations induced by the topographical relief, the tilt of the camera axis and the distortion of the optics. Irrespective of the method applied, the georeferencing of the images is commonly determined with ground control points, whose measurement and identication is a time- consuming operation and often limits certain images from being accurately georeferenced. Moreover, specialised software, certain photogrammetric skills, and experience are required. Thanks to the recent advances in the elds of computer vision and photogrammetry as well as the improvements in pro- cessing power, it is currently possible to generate orthophotos of large, almost randomly collected aerial photographs in a straightforward and nearly automatic way. This paper presents a computer vision- based approach that is complemented by proven photogrammetric principles to generate orthophotos from a range of uncalibrated oblique and vertical aerial frame images. In a rst phase, the method uses algorithms that automatically compute the viewpoint of each photograph as well as a sparse 3D geometric representation of the scene that is imaged. Afterwards, dense reconstruction algorithms are applied to yield a three-dimensional surface model. After georeferencing this model, it can be used to create any kind of orthophoto out of the initial aerial views. To prove the benets of this approach in comparison to the most common ways of georeferencing aerial imagery, several archaeological case studies are presented. Not only will they showcase the easy workow and accuracy of the results, but they will also prove that this approach moves beyond current restrictions due to its applicability to datasets that were previously thought to be unsuited for convenient georeferencing. Ó 2012 Elsevier Ltd. All rights reserved. 1. Introduction Of all archaeological remote sensing techniques, aerial photo- graphic reconnaissance from a low-ying aircraft has been the workhorse since it is one of the most effective methods for site discovery. On the one hand, the non-invasive approach yields easily interpretable imagery with abundant spatial detail. On the other hand, the method is driven by the specic nature of the partly eroded or sub-surface archaeological features, which show up on the surface under certain conditions as visibility marks: i.e. indirect indicators of archaeological residues due to the changed properties of the local topography or the soil matrix (Bewley and Ra ˛ czkowski, 2002; Brophy and Cowley, 2005; Crawford, 1924; Scollar et al., 1990; Wilson, 2000). According to their nature, archaeologists distinguish (i) shadow marks (when earthworks are thrown into relief by low slanting sunlight), (ii) soil marks (due to varying chemical and physical properties affecting the soil colour on the surface), (iii) crop marks (due to variable growth of the vegetation), (iv) snow/frost marks (due to differential snow accu- mulations and differential melting of snow or frost) as well as the less frequent ood and wind marks. Once detected from the air, they are orbited and documented from various positions mostly using small and medium-format Single-Lens Reex (SLR)-cameras. While discovery and photographic documentation of archaeo- logical sites is commonly seen as the prime aim of aerial archae- ology, the need for detailed interpretation (interpretative mapping * Corresponding author. Department of Archaeology, Faculty of Arts and Philosophy, Ghent University (UGent), Sint-Pietersnieuwstraat 35 (UFO e 110.016), B-9000 Gent, Belgium. Tel.: þ32 485 65 05 05. E-mail address: [email protected] (G. Verhoeven). Contents lists available at SciVerse ScienceDirect Journal of Archaeological Science journal homepage: http://www.elsevier.com/locate/jas 0305-4403/$ e see front matter Ó 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.jas.2012.02.022 Journal of Archaeological Science 39 (2012) 2060e2070

Upload: others

Post on 31-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

at SciVerse ScienceDirect

Journal of Archaeological Science 39 (2012) 2060e2070

Contents lists available

Journal of Archaeological Science

journal homepage: http: / /www.elsevier .com/locate/ jas

Mapping by matching: a computer vision-based approach to fast and accurategeoreferencing of archaeological aerial photographs

G. Verhoeven a,b,d,*, M. Doneus b,d, Ch. Briese c,d, F. Vermeulen a

aDepartment of Archaeology, Ghent University, BelgiumbVIAS e Vienna Institute for Archaeological Science, University of Vienna, Austriac Institute of Photogrammetry and Remote Sensing, Vienna University of Technology, Austriad LBI for Archaeological Prospection and Virtual Archaeology, Vienna, Austria

a r t i c l e i n f o

Article history:Received 28 July 2011Received in revised form16 February 2012Accepted 21 February 2012

Keywords:3D modelAerial archaeologyAerial photographComputer visionDense multi-view stereo reconstructionOrthophotoStructure from motion

* Corresponding author. Department of ArchaePhilosophy, Ghent University (UGent), Sint-PietersnieB-9000 Gent, Belgium. Tel.: þ32 485 65 05 05.

E-mail address: [email protected] (G. Ve

0305-4403/$ e see front matter � 2012 Elsevier Ltd.doi:10.1016/j.jas.2012.02.022

a b s t r a c t

To date, aerial archaeologists generally apply simple rectification procedures or more expensive andtime-consuming orthorectification algorithms to correct their aerial photographs in varying degrees forgeometrical deformations induced by the topographical relief, the tilt of the camera axis and thedistortion of the optics. Irrespective of the method applied, the georeferencing of the images iscommonly determined with ground control points, whose measurement and identification is a time-consuming operation and often limits certain images from being accurately georeferenced. Moreover,specialised software, certain photogrammetric skills, and experience are required. Thanks to the recentadvances in the fields of computer vision and photogrammetry as well as the improvements in pro-cessing power, it is currently possible to generate orthophotos of large, almost randomly collected aerialphotographs in a straightforward and nearly automatic way. This paper presents a computer vision-based approach that is complemented by proven photogrammetric principles to generate orthophotosfrom a range of uncalibrated oblique and vertical aerial frame images. In a first phase, the method usesalgorithms that automatically compute the viewpoint of each photograph as well as a sparse 3Dgeometric representation of the scene that is imaged. Afterwards, dense reconstruction algorithms areapplied to yield a three-dimensional surface model. After georeferencing this model, it can be used tocreate any kind of orthophoto out of the initial aerial views. To prove the benefits of this approach incomparison to the most common ways of georeferencing aerial imagery, several archaeological casestudies are presented. Not only will they showcase the easy workflow and accuracy of the results, butthey will also prove that this approach moves beyond current restrictions due to its applicability todatasets that were previously thought to be unsuited for convenient georeferencing.

� 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Of all archaeological remote sensing techniques, aerial photo-graphic reconnaissance from a low-flying aircraft has been theworkhorse since it is one of the most effective methods for sitediscovery. On the one hand, the non-invasive approach yields easilyinterpretable imagery with abundant spatial detail. On the otherhand, the method is driven by the specific nature of the partlyeroded or sub-surface archaeological features, which show up onthe surface under certain conditions as “visibility marks”: i.e.

ology, Faculty of Arts anduwstraat 35 (UFO e 110.016),

rhoeven).

All rights reserved.

indirect indicators of archaeological residues due to the changedproperties of the local topography or the soil matrix (Bewley andRaczkowski, 2002; Brophy and Cowley, 2005; Crawford, 1924;Scollar et al., 1990; Wilson, 2000). According to their nature,archaeologists distinguish (i) shadowmarks (when earthworks arethrown into relief by low slanting sunlight), (ii) soil marks (due tovarying chemical and physical properties affecting the soil colouron the surface), (iii) crop marks (due to variable growth of thevegetation), (iv) snow/frost marks (due to differential snow accu-mulations and differential melting of snow or frost) as well as theless frequent flood and wind marks. Once detected from the air,they are orbited and documented from various positions mostlyusing small and medium-format Single-Lens Reflex (SLR)-cameras.

While discovery and photographic documentation of archaeo-logical sites is commonly seen as the prime aim of aerial archae-ology, the need for detailed interpretation (interpretative mapping

Page 2: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e2070 2061

e cf. Doneus, 2001) of the photographs is rarely emphasized andexecuted (see also Bewley and Raczkowski, 2002). As a result,millions of aerial photographs are just stored in archives wheretheir archaeological information cannot (or will not) be exploitedefficiently. This is unfortunate, since a thorough understanding ofarchaeological landscapes is based on combining the interpretedevidence from various prospection methods.

The neglect of this interpretative mapping may have multiplereasons (such as available resources), but one of the most importantones is likely the time-consuming georeferencing process, which isa necessary prerequisite for any kind of accurate mapping and dataintegration (Doneus and Neubauer, 1998). This process, which is alsoknown as ground registration, assigns spatial information to theimagery to explicitly define their location and rotation in respect toa specific Earth-related coordinate frame. After georeferencing,several multi-temporal aerial observations can be mosaicked into anextensive overall view of an archaeological region that can serve asa basic information layer for further prospection, excavation,protection measures and heritage management.

Obviously, aerial archaeology is in need of fast and accurategeoreferencing approaches that allow straightforward orthophotoproduction of a wide variety of images. This article elaborates onsuch an approach and presents amethod to automate the important,but recurring task of orthophoto generation. The proposed meth-odology tries to overcome the conventional georeferencing prob-lems related to archaeological aerial frame images. To this end, theroutine exploits some of the technological improvements in hard-ware configurations as well as state-of-the-art algorithms mainlydeveloped in the field of computer vision and photogrammetry.Before illustrating this approach by means of three case studies(Section 4), the generally applicable approaches to the georefer-encing of aerial photos (Section 2) and the components of the newlypresented approach (Section 3) will be shortly reviewed below.

2. Georeferencing and orthophoto generation in aerialarchaeology

In the case of aerial imaging, it is important to note that anoptical remote sensing instrument such as a (digital) photographiccamera generates images of the spatial distribution of the Earth’supwelling radiance, but always degrades this analogue signal to

Fig. 1. Mapping of 3D object points onto 2

a certain extent. Consequently, the final image is never a veryfaithful reproduction of the real-world scene. Besides the radio-metric and spectral transformations that occur, the geometricthree-dimensional (3D) properties are mapped to a 2D plane(Fig. 1). This mapping result (i.e. the photograph) is influenced bya wide variety of factors, of which those induced by the topo-graphical relief, the tilt of the camera axis and the distortion of theoptics aremost considerable. Although it is sensu stricto not coveredby its definition, georeferencing often involves the necessary stepsto remove these factors in order to place each image pixel on itstrue location on the Earth’s surface. To do this, a wide variety ofapproaches and software solutions exist.

In general, archaeologists often fit tilted images to a flat surface,a process which is called (planar) rectification and which requiresa projective transformation (Wolf and Dewitt, 2000). Althoughthese rectified images do not suffer any more from tilt displace-ments, they still contain scale variations and image displacementsdue to the topographic relief (hills, buildings etc.). Consequently,projective transformations can only be considered “archaeologi-cally sufficient” when dealing with completely flat areas. If theaerial view suffers from topographically induced deformations (i.e.the relief displacements), georeferencing often employs poly-nomial corrections, spline algorithms or piecewise affine warpingsembedded in archaeologically-dedicated tools such as AERIAL(Haigh, 2005) and Airphoto (Scollar, 2002). Although theseapproaches are very popular and might deliver fairly good metricalinformation when the terrain variations are quite moderate, themethods are often sub-optimal because they do not (or only partly)eliminate all the image displacements, the distortion of the opticsand e to a lesser extent e the atmospheric refraction. To deal withthose issues and create planimetrically correct true orthophoto-graphs, more advanced ortho-correction offered by programs suchas Leica Photogrammetry Suite or Trimble INPHO PhotogrammetricSystem needs to be applied. Although these more expensivepackages embed rigorous orthorectification algorithms to producesuperior geometric quality, they suffer from the fact that photo-grammetric skills, calibrated camera information and an accurate,high-resolution DSM are essential, three conditions that aregenerally not met in aerial archaeology.

Irrespective of the method applied, the georeferencing of theimages is commonly determined with ground control points

D points in two aerial frame images.

Page 3: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e20702062

(GCPs), whose manual measurement and identification is a time-consuming operation that requires experience while being boundto certain prerequisites. As a result of all these issues, many valu-able aerial images never get properly georeferenced. Often, thegeoreferencing is suited for small-scale mapping but inadequatelyaccurate for a detailed multi-temporal and multi-method analysis.

3. Structure from motion and multi-view stereo

A lot of tools and methods exist to obtain information about the3D geometry of scenes from 2D views. One of the possibilities is touse multiple images from the same scene. Using techniques mainlydeveloped in the field of photogrammetry, an image point occur-ring in at least two views can be reconstructed in 3D (Fig. 1).However, this can only be performed if the projection geometries ofthe images are known. These are expressed by the interior or innercalibration parameters of the camera (also called camera intrinsicsand comprising the focal length, the principal point location pluslens distortion coefficients) and camera poses (i.e. the projectioncentre location and image orientation defined by the six exterior/outer orientation parameters or camera extrinsics e Fig. 1) at themoment of image acquisition (Robertson and Cipolla, 2009).

While the discipline of photogrammetry derives these param-eters mainly fromwell-distributed GCPs and tie points, a Structurefrom Motion (SfM; Ullman, 1979) approach allows simultaneouscomputation of both this relative projection geometry and a set of3D points, using only corresponding image features occurring ina series of overlapping photographs captured by a camera movingaround the scene (Fisher et al., 2005; Quan, 2010; Szeliski, 2011). Todo this, the SfM relies on algorithms that detect and describe localfeatures for each image and subsequently match those 2D pointsthroughout the multiple images. Afterwards, a number of potentialcorrespondences are obtained. Using this set of correspondencesas input, SfM computes the locations of those interest points andrenders them as a sparse 3D point cloud that represents thegeometry/structure of the scene in a local coordinate frame. Asmentioned previously, the camera pose and internal cameraparameters are also retrieved (Hartley and Zisserman, 2003;Szeliski, 2011). There is thus no real need to apply calibratedcameras and optics during the image acquisition stage (Quan,2010), which makes the procedure very flexible and well-suitedfor almost any kind of imagery.

SfM algorithms were developed in the field of computer vision,which is often defined as the science that develops mathematicaltechniques to recover a variety of information from images. Thisimage data can take many forms, such as multi-dimensionalimagery from medical scanners, stereo photographs, videosequences or views from multiple still cameras. Initially, manycomputer vision applications were focused on robotic vision andautomation. However, the last decade witnessed a shift of focus to3D visualisations and virtual reality. Additionally, many newinsights in the geometry of multiple images were obtained (seeHartley and Zisserman, 2003 for a good overview). Recently, SfMhas received a great deal of attention due to Bundler (Snavely,2010) and Microsoft’s Photosynth (Microsoft Corporation, 2010):two SfM implementations that are freely available on the Web. Todate, several SfM-based packages can be applied to obtain a (semi-)automated processing pipeline for image-based 3D visualisation.In this study, the commercial package PhotoScan (version 0.8.1build 877) from AgiSoft LLC is used.

Since it is tailored towards high-level 3D model generation,PhotoScan complements the SfM approach with a variety of densemulti-view stereo (MVS) algorithms. One could for example inter-polate the sparse set of 3D SfM points, but this would yield a farfrom optimal result. Therefore, a MVS reconstruction is used to

compute a dense estimate of the surface geometry of the observedscene. Because these solutions operate on the pixel values insteadof on the feature points (Scharstein and Szeliski, 2002; Seitz et al.,2006), this additional step enables the generation of detailed three-dimensional meshed models from the initially calculated sparsepoint clouds, hence enabling proper handling of fine details presentin the scenes. More specifically, PhotoScan (version 0.8.1) offersthree reconstruction methods (besides its “Fast” method). Each ofthose methods uses a pair-wise binocular stereo approach (Bradleyet al., 2008) to compute a depth estimate (i.e. the distance from thecamera to the object surface) for almost every image pixel of eachview (Mellor et al., 1996; Pollefeys et al., 2004). Fusing the resultingindependent depth maps of all the images yields a single 3D modelwhich is afterwards approximated by a triangular mesh. Photo-Scan’s different reconstruction methods (Exact, Smooth andHeight-field) differ by the way how these individual depth mapsare merged into the final digital model.

When working with aerial images, the model calculated can beconsidered a Digital Surface Model (DSM): a numerical represen-tation of the topography and all its imposed structures such astrees, houses and any other object that did not move. As isknown from conventional orthorectification, such a dense DSM iselementary when onewants to generate so-called true orthophotosin which all objects with a certain height (such as houses, towersand trees) are also accurately positioned (Braun, 2003; Kraus,2002).

In a final step, a textured 3D mesh can be created by a texturemapping using a particular selection of the initial images. At thisstage, the reconstructed 3D scene is still expressed in a localcoordinate framework and equivalent of the original scene up toa global scale and rotation factor. In order to transform the surfacemodel into an absolute coordinate system, a Helmert similaritytransformation is applied. Since seven parameters (three trans-lations, one scale, three rotations) must be determined for thisspatial transformation, at least three GCPs with known altitudevalues are needed (although the use of more GCPs is alwaysadvisable).

Because high-quality reconstructions with large image files arevery resource intensive, multicore processors, a decent amount ofRAM (minimum 8 GB) and a 64-bit operating system are essential.Since PhotoScan supports the OpenCL (Open Computing Language)programming platform, it can access the Graphics Processing Units(GPUs) of the video card when intensive computing tasks have tobe executed, additionally helping in shortening processing times.

4. Case studies

When archaeologists fly around in small aeroplanes to acquireimages of the visibility marks seen on the Earth’s surface, theyacquire single images that are mostly oblique in nature. Often, theaircraft may orbit the area of interest to maximise the photogra-pher’s options in getting several shots from a variety of angles,enabling the archaeological site to be documented as completely aspossible with one camera. The acquisition of such a series ofphotographs is perfectly suited for the presented approach: sincethe camera pose and 3D structure are calculated from the overlap ofsingle images, best results can be acquired when the site is equallywell covered from all directions. However, also very orderedcollections of vertical imagery and randomly acquired imagesover multiple years can also be successfully processed into trueorthophotos.

In total, three case studies (Fig. 2) show the potential of thismethod in a wide variety of topographic settings, but also whendealing with various archaeological marks and imagery (e.g.analogue visible frames, digital near-infrared coverage etc.). Since

Page 4: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

Fig. 2. The location of the three presented case studies: Carnuntum, Ricina and Trea.

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e2070 2063

all the images were acquired before the methodwas developed, thetrue potential of the method can be assessed and its possibilitiesconcerning older imagery showcased. The case studies are pre-sented in a common format: first a short introduction to the siteand the acquisition of the photographs is presented; secondly, thebuilding of the orthophoto and possible drawbacks are addressed;and thirdly, each case study will also highlight some very specificadvantages of this approach.

4.1. Carnuntum (Austria)

In this first case study, a very simple topographic setting ischosen to go overall the individual processing steps. The data setused here consists of sixteen images of the central part of theRoman town of Carnuntum, located approximately 40 km south-east of Vienna on the southern bank of the Danube river(N 48�604100, E 16�5105700 e WGS84; Fig. 2). The Roman legionarycamp with attached canabae and a civil town covers some 650 ha,thus constituting the “largest archaeological landscape in Austria”(Jobst, 1983; Vorbeck and Beckel, 1973). As the Roman capital of theprovince Pannonia, Carnuntum was an important town during thefirst four centuries AD. Since aerial archaeology and geophysicalarchaeological prospection have proven to be ideally suited surveymethods for the mapping and documentation of such large Romancity sites, those methods have been intensively applied at Carnun-tum during the last fifteen years (Doneus and Neubauer, 2005;Doneus et al., 2001; Neubauer and Eder-Hinterleitner, 1997;Neubauer et al., 1999, 2002).

The aerial imagery used in this reconstruction was captured atthe end of May 2000 around 11.00 h with a Hasselblad 205 FCCmedium-format camera and an 80 mm lens from a Cessna 172Skyhawk. For storage and data processing purposes, the 6 � 6

frames were digitised with Nikon�’s Super COOLSCAN 9000 ED,yielding images of circa 7000 � 7000 pixels. In Fig. 3A, three ofthose images are displayed. After uploading all sixteen views inPhotoScan, image feature points are detected in all the sourceimages (Fig. 3A-3). The approach is similar to the well-known SIFT(Scale Invariant Feature Transform) algorithm developed by DavidLowe (Lowe, 2004), since the features are also stable under view-point, scale and lighting variations. Based on its local neighbour-hood, each point has its own local descriptor which is subsequentlyused to detect point correspondences across the complete imageset. To do this, robust methods such as a modified version ofRANSAC (Fischler and Bolles, 1981) are used. These algorithms candeal with wrong matches, which would otherwise seriouslycontaminate the retrieved geometric relationships between theimages (the latter relations are called the multi-view constraints ePollefeys et al., 2004). Fig. 3A-2 shows that it is also possible tomask certain image parts that should not be considered during anyof the reconstruction steps. Using the correspondences betweenthe image features as input, the locations of those feature pointscan be estimated and rendered as a sparse 3D point cloud. Duringthe latter process e which is the aforementioned SfM approach e

the camera intrinsic and extrinsic parameters are also computed.PhotoScan uses an algorithm to initially compute approximatecamera locations, which are afterwards refined by a bundle-adjustment algorithm (pers. comm. PhotoScan support). At thisstage, PhotoScan will produce three datasets: (i) a point cloud oftypically a few thousand 3D points representing the geometry/structure of the scene (Fig. 3B); (ii) the relative camera poses at themoment of image acquisition (i.e. the position of the projectioncentre and the camera’s orientation e Fig. 3B); (iii) the interiororientation parameters, being the focal length f, the principal pointlocation (cx, cy), a skew factor s as well as three radial (k1, k2, k3) and

Page 5: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

Fig. 3. (A1e3) Three of the oblique images used in a 3D reconstruction of the scene (A2 indicate a masked area and A3 shows the detected features of this photo). B shows the sparsepoint cloud and the camera positions while C depicts a depth map. D displays the 3D model with GCP markers, while E is a textured version of D.

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e20702064

two tangential distortion coefficients (p1 and p2). The first threeparameters (f, cx, cy) are expressed in pixels (Agisoft LLC, 2011).

In a second step, a dense surface reconstruction is computed.Instead of interpolating the sparse 3D point cloud, a dense stereomatching is executed. Whereas SfM algorithms operate on a sparseset of feature points extracted from the source photographs, thesedense reconstruction algorithms operate on the individual pixelvalues. As all pixels are utilized, this reconstruction step (which isbased on a pair-wise depth map computation e Fig. 3C) enablesproper handling of fine details present in the scenes and representsthem as a 3D mesh (Fig. 3D). Several algorithms are available to dothis, but the height-field method has previously been proven to bevery suited for aerial images (see Verhoeven, 2011b).

In a third stage, the absolute orientation of the model can bederived by the manual measurement of GCPs in the photographs(or the model e Fig. 3D) or by automatic extraction of thegeographical coordinates embedded in the Exif (EXchangeableImage File)-defined metadata tags of the images. Although thelatter approach can be ideal for a very fast and straightforwardgeolocation of the 3D model, it is (currently) not advised consid-ering the accuracy of most GNSS (Global Navigation SatelliteSystem) receivers. Using one of the two approaches, PhotoScanwilluse a seven parameter similarity transformation to execute thegeoreferencing. At this stage, both an orthophoto and a DSM can beexported. If necessary, the initial imagery can be blended with the3D model to form a texture atlas for the 3D scene (Fig. 3E). It needs

Page 6: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e2070 2065

to be stressed that this texturing is not necessary for orthophotoproduction, although it might aid in a location of the positions ofthe GCP markers. In the latter case, the texturing needs to beapplied directly after the dense estimation of the surface geometry.

Even though the terrain is quite homogeneous and flat, this firstcase study already proves some of the potential of the aforemen-tioned algorithms for aerial archaeology. First of all, many photo-graphs look quite dissimilar due to the Bidirectional ReflectanceDistribution Function (BRDF) effects of vegetation in the visiblespectrum (Guyot,1990; Sandmeier and Itten,1999). Despite the factthat archaeological features and meadows look brighter or fainteraccording to a specific viewing and illumination geometry, allimages were accurately matched. Secondly, this alignment hasbeen executed without previous knowledge on the location ofimage acquisition, nor about the instrument the imagery wasacquired with. Although the camera and lens parameters areknown in this specific case, they were not stored in the metadataafter scanning the negatives. As a consequence, there were noinitial values for the SfM algorithms to take into account. Althoughit is obvious that information on focal length can cure certainalignment problems, this case proves that the generation of 3Dmodels and orthophotos can succeed even in the absence of theseparameters.

4.2. Ricina (Italy)

For the next two case studies, the focus is shifted towards Italy.In January 2000, Ghent University initiated the geoarchaeologicalPotenza Valley Survey (PVS) project in the central Adriatic RegionMarche. This interdisciplinary project has mainly aimed at recon-structing the changing physical and human landscape along thePotenza river, one of Marche’s major rivers. Aerial archaeologicalreconnaissance was from the start identified as one of the main

Fig. 4. The Ricina orthophoto calculated out of 27 oblique images and enhanced for crop mimpossible to accurately georeference due to the lack of GCPs and a detailed elevation moreferred to the web version of this article.)

survey techniques (Vermeulen, 2002, 2004). In the beginning ofJune 2009, a series of oblique aerial photographs were acquiredabove the lower valley Roman town of Ricina (N 43�1904100,E 13�2502600 e WGS84; Fig. 2) with an uncalibrated compact digitalcamera (Panasonic DMC-FZ18), yielding photographs of eightmegapixels. The total series consisted of 33 images, of which sixwere withheld for reasons of coverage and lack of sharpness. Allphotographs depict positive and negative crop marks related to thisimperial Roman town, of which only a rather well-preservedtheatre building is fully visible above ground level today(Capodoglio and Cipoletta, 1996). Although there have been a seriesof investigations to understand the character and extent of this city,almost nothing was known about its general layout and organiza-tion (Alfieri, 1937; Mercando, 1971; Percossi Serenelli, 1989; Cecchi,1968) before aerial photographs were taken in the PVS framework(Vermeulen and Verhoeven, 2004, 2006). The specific image seriesunder study here proved again the immense value of archaeologicalaerial photography, since it became for the first time possible tospot the hidden remains of a Roman amphitheatre, visible justabove the indicated area 1 in Fig. 4 (Vermeulen, 2011).

Using four GCPs whose coordinates were extracted from a 1:10 000 topographical map, the georeferencing of the model wasperformed with a final horizontal Root-Mean-Square Error (RMSE)of 0.31m and a vertical RMSE of 0.15 m. To compute the orthophotoof the image, a grid space of 10 cmwas used (which is the value thatwas also proposed by PhotoScan). After applying some additionalimage enhancement to make the crop marks stand out moreclearly, the final orthophotomap nicely displays all archaeologicallyrelevant features in their accurate position (Fig. 4).

The total processing time was about 38 min. After adding 5 minfor image selection, about 25 min for GCP marking plus tweakingand a few additional minutes for contrast enhancement, thetotal orthophoto production time for these 27 images reached

ark display. The area indicated in red is a down sloping zone which was previouslydel. (For interpretation of the references to colour in this figure legend, the reader is

Page 7: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e20702066

approximately 70 min. Compared to conventional georeferencingon an image-by-image basis, this shows the huge saving in totalprocessing time with the additional benefit of having a completeorthorectified overview image with a positional accuracy thatmight be hard to attain using the conventional archaeologically-dedicated low-cost packages.

Furthermore, it becomes apparent that imagery can be usedwhich would have been considered “unusable” before (see Palmer,2005 for some consideration on this issue). Some images of this andthe previous dataset only record three or ill-distributed GCPs,neither of which are considered adequate for most georeferencingapproaches. Furthermore, photographs that suffer from quite sometopographic image displacement cannot fully be corrected usingsimple rectification algorithms such as a projective or polynomialtransformation. Such a problematic area is indicated in Fig. 4 (area1). Although the borders of this field (respectively a road andanother field) are on the topographical map, no single point alongthese lines can be used as a possible control point. Besides this lackof potential GCPs, this south-east town area is slightly undulatingand sloping downwards. Since the presented approach usesa detailed DSM for orthophoto production, the output is a trueorthophoto in which all possible tilt and terrain displacements aretaken into account. The fact that this orthopohoto procedure inte-grates all imagery into one photomosaic, also makes the search forsuitable ground control much easier (hence solving all the issuesmentioned).

Finally, it needs to be stressed that the generated orthophotomight still suffer from some artefacts. Area 2 of Fig. 4 displays anarea which suffers from over-triangulation, most likely due to theimage noise produced by the compact camera and insufficient pixeldata to compute this part of the DSM. However, apart from beinga visual artefact, such defects do in most cases not alter the general

Fig. 5. (A) The relative position of all 203 camera stations and the extracted DSM

appraisal and global positional accuracy of the computed ortho-photo. Obviously, such artefacts are more likely with reconstruc-tions based on imagery from sub-optimal optical quality (such asthose from smaller cameras or blurry images), but they can alwaysbe fixed using third-party software.

4.3. Trea (Italy)

The general advised strategy in using PhotoScan is to solve thecomplex SfM math of as large as possible a set of images, withouthaving to rely on virtual memory. Later, one can ‘disable photos’and perform the dense reconstruction in parts (Verhoeven, 2011b).Although this approach was meant to tackle limited hardwareresources, it opens up a completely new application field for aerialarchaeologists. To illustrate this, a time series covering six years ofaerial PVS research on the Roman town of Trea (central AdriaticItaly, 43�1804000 N, 13�1804200 W e WGS84; Fig. 2) will be used. Incontrast to the former case studies, the Roman city of Trea lies ona hill, surrounded by a seriously undulating landscape of themiddlePotenza valley. The scene can thus be considered quite complexand the terrain displacement in the aerial images very substantial.Like Ricina, the city layout of Trea was only known summarily(see Moscatelli, 1985) before this systematic aerial photographycampaign. The latter now allows a near complete mapping of themain urban structures of this abandoned Roman city, such as thetown defences, the internal street network and themain public andprivate buildings.

From the 208 initially selected images, 203 were alignedcorrectly (Fig. 5A). This number is extremely high given thecircumstances: a wide variety of cameras and lenses were usedduring the reconnaissance flights; the land cover varied frombare soil to crops in various phenological states; 39 images only

of Trea (B); (C) shows a Near-Infrared and (D) a visible rendering of the DSM.

Page 8: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e2070 2067

recorded the reflectance in the Near-InfraRed (NIR) spectral band(see Verhoeven, 2008, 2011a; Verhoeven et al., 2009 for details onthis). Unquestionably, this alignment result was facilitated by thefact that all images still had information on the focal lengthembedded in the Exif metadata tags, so that these values could beused for initializing the SfM step. To execute the dense recon-struction stage, only a subset of images was used. Since all imageshave been aligned in Euclidian space in the SfM step, one canchoose whatever image that is deemed useful for the reconstruc-tion. Since too much pixel data increases computation time anddoes not necessarily enhance the accuracy and detail of the DSM,only a selection of 143 suitable images e largely based on imagescale, scene coverage and sharpness e was used as input for themodelling step. In total, alignment and DSM processing took 4.5 hand could easily be run over night. Once an accurate 3D model ofthe terrain is generated (Fig. 5B), every image or combination ofimages can be applied in a texture mapping or orthophoto gener-ation. The former can for example be interesting to create a newview on the scene. As an example, Fig. 5C and D respectively showa NIR and visible rendering of the DSM from a viewpoint that is verydifferent from all the original aerial views.

Because the orthophoto generation is characterised by the samefreedom in image selection, it is possible to use only the NIR images(Fig. 6A-3) or those that best illustrate the crop marks (Fig. 6A-1) orsoil mark state (Fig. 6A-2). Not only does this approach speed up theprocessing of individual images (or related photo sets) consider-ably, but the final interpretation is more trustworthy as well. Due tothe heavy undulating nature of the terrain and the very steep slopesbordering the central plateau, the archaeologically-dedicated tools(such as AERIAL or Airphoto) andmost GIS packages will typicallyfail to accurately georeference these images. Although this mightnot seem to be a big issue when dealing with vague soil marks, thenature of the crop marks (faint and small) as well as the type of site(a complex Roman townwith different phases) makes the accuratemapping of the features of the utmost importance for comparisonof aerial footage from different years or interpret the datawith respect to a geophysical survey (for this case study, the

Fig. 6. (A) Integration of several orthophotos, showing crop marks (1) and soil marks (2) in t(4) without any useable GCP; (B) shows the comparison between the generated orthophot

georeferencing delivered a planar RMSE of 6.2 cm and an RMSE of4.6 cm for the altitude component). Additionally, the whole processof orthophoto production is straightforward, fast and can deal witha variety of frame imaging sensors of which no calibrationparameters need to be supplied. Besides, Fig. 6A-4 indicates thateven individual images without any GCP can be transformed intoorthophotos. The combination of these advantages largely over-comes the current drawbacks that archaeologists encounter inmost image (ortho)rectification approaches, certainly when dealingwith larger areas (features of a palaeolandscape, extensive sites) orterrain undulation.

However, it must be stressed that processing this amount ofimagery requires significant computing power and DSM andorthophoto artefacts can occur. It should also be noted that such anintegrated approach onlyworks when nomajor scene changes havetaken place during the years of image acquisition. In the presentedcase study, the biggest surface difference was related to thephenological state of the vegetation: sometimes the fields were justharvested, while on other moments the camera recorded the fullcanopy. Although it did not hamper the image alignment, the DSMwill obviously be influenced by this. Therefore, one can best usea set of images displaying the most common surface condition,after which a numerical form of the latter can be used to computethe orthophotos of more or less all images. This approach was usedin this case study and did not result in archaeologically relevantpositional differences of the computed orthophotos. In case thedifference between different topographical conditions is too big,a multitude of DSMs should be computed to cover all possiblesurface states. In the worst case scenario, the landscape haschanged so drastically that the image alignment fails.

Although it is not the aim of this paper to provide a rigorousassessment of the positional accuracy of orthophotos (despite theRMSE values given so far), Fig. 6B give some visual clues about thehorizontal accuracy of the orthophotos. In this illustration, totalstation measurements of the road and some building features wereplaced on top of the orthophotograph. This result clearly illustratesthat the generated PhotoScan output is more than sufficient for

he visible domain, the NIR terrain reflectance (3) and the orthorectification of an imageo and total station measurements of the road and other building structures.

Page 9: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e20702068

large-scale (e.g. 1:500) archaeological mapping from aerialimagery. For a more elaborate accuracy analysis of SfM-basedorthophotos computed out of randomly acquired aerial imagery,consider Verhoeven et al. (2012). Additionally, research by Doneuset al. (2011) proved howwell this method holds upwhen comparedwith terrestrial laser scanning in an excavation context.

5. Considerations and improvements

Aerial photography provides a basis for gathering spatial data.Before archaeological information can be extracted from these datain a way that is useful for mapping and further analysis, the aerialimages must be georeferenced in an absolute manner. This process,which aims at placing each image pixel on its true location on theEarth’s surface, should also try to take care of all geometric trans-formations that occur during the imaging process. By applying theapproach presented in this article, simplicity is combined withgeometrical quality. Besides the automatic estimation of the innercamera calibration parameters, a dense DSM is calculated, which isused in a final phase to generate true orthophotos. As a result, thismethod largely accounts for most relevant kinds of geometricaldegradations and is capable of generating 3D models and ortho-photos that are perfectly suited for archaeological purposes. Thisoption of fast and accurate orthophoto production is very welcomefor aerial archaeologists, given their current approaches which arenot tailored to deal either with aerial frame imagery lacking suffi-cient ground control or with large amounts of photographs fromdifferent cameras shot in different seasons. This method offers theenormous advantage that there are just standard photographicrecording prerequisites. Apart from a sufficient number of sharpimages covering the scene to be reconstructed and at least threeGCPs to georeference the reconstruction in respect to a commoncoordinate frame, no other information is needed. Besides, onlya minimal technical knowledge and user interaction are required.Finally, this approach can also work in the total absence of anyinformation about the instrument the imagery was acquired with,although the Exif-defined metadata tags, including even GPScoordinates, can be utilized. So, it really is obvious that this meth-odology can be integrated seamlessly into any existing aerialarchaeological workflow. The extra investments needed for soft-ware and computing hardware are recovered easily when takingthe time and cost savings of map production into account.

Although the presented case studies prove this semi-automatic,integrated approach to be a valid alternative to the current varietyof (ortho)rectification methods, it has to be stressed that thismethodology is not perfect. First of all, the method is not applicablefor the individual image. At least twoe but preferably more imagese are needed for accurate DSM computation. In addition, erroneousalignment of the imagery can occur when dealing with very largephoto collections, images that suffer from excessive noise or blur,highly oblique photographs or photographs that have a verydissimilar appearance (e.g. due to major underexposure orchanging topographic terrain parameters). Currently, PhotoScanmisses tools that allow straightforward and solid inspection of thequality and accuracy of this alignment step (although a reprojectionerror is provided in later PhotoScan versions), so the user can onlyvisually assess any alignment problems. These can always betackled by disabling the images that cause the error or insertingaccurate camera calibration parameters before the SfM step. Thelatter can also prevent the reconstructed surface from being toocurved, since such distortion is often related to a poor estimation ofthe camera intrinsics when dealing with large image sequences.Because absolute georeferencing in the current approach occursafter building the 3D model, any previous distortion will alsoremain after assigning absolute spatial coordinates to the DSM.

Apart from the initial introduction of previously determinedinternal camera calibration parameters, a possible solution to thisproblem involves excessive coverage of the area using a largeroverlap between the photos. However, it should be clear that high-quality reconstructions with many digital frames are extremelytaxing for the computing hardware. Hence, computer componentsthat are a few years old will most likely not be up to this task. SincePhotoScan and many similar applications support the OpenCL(Open Computing Language) programming platform, the graphical-card is one of the most important hardware components forspeeding up the reconstructions. Still, better and more optimisedalgorithms are needed to permit the time-efficient processing oflarge image sets on standard computers. For instance, it would beinteresting to fix the alignment of the initially used images.Currently, registering new photographs to an image set means thatboth the SfM and dense reconstruction steps have to be executedagain. Being able to just align new imagery based on the previouslycalculated sparse 3D point cloud, would be a huge timesaver andenable on-the-fly production of orthophotos based on newly shotaerial imagery.

Another, rather hardware-based, improvement would certainlybe the possibility to merge existing exterior camera parameterswith the computed camera extrinsics, as was already performedby Wischounig-Strucl et al. (2011). Although PhotoScan alreadysupports the direct import of GPS coordinates, the positionalaccuracy of current handheld GNSS devices is too low for veryaccurate positioning using only the raw latitude, longitude andaltitude metadata. In case the relative camera extrinsics resultingfrom the SfM stage could be linked to absolute position andorientation data acquired by a differential GPS and IMU positioningsystem accompanying the digital camera, an iterative approachusing the SfM data could be used to improve the final accuracy ofthe camera pose parameters. Since the whole 3D model isexpressed in relative Euclidian coordinates, this matching woulddirectly georeference the computed model with high accuracy,while the optimized absolute camera exterior parameters replacethe first raw estimation and are subsequently stored in the meta-data of the imagery. Moreover, if a few absolute camera extrinsicsare added as constraints in the SfM process, it could preventpossible drift in the recovered camera and point locations (Snavelyet al., 2006). For those cases where no accurate camera extrinsicsare available, the georeferencing of the images is still determinedwith time-consuming GCPs identification.

6. Conclusion and outlook

Straightforward orthophoto production is very important inthe discipline of aerial archaeology. In this article, computervision algorithms (Structure from Motion and Multi-View Stereo)complemented by proven photogrammetrical principles (such asbundle adjustments) were exploited to present an integrated, low-cost, semi-automated orthophoto production of archaeologicalaerial (uncalibrated) frame images. This approach is straightfor-ward and requires no assumptions on the camera projectionmatrix, existing photogrammetric and computer vision knowl-edge or the topography of the scenes. One just needs to make surethat enough overlapping aerial images are acquired. Even thoughthis might involve flying one or more orbits of the scene ofinterest, this method will afterwards prove itself in terms oforthophoto quality and e in most occasions e processingspeed, certainly when a larger area must be mapped or uneventerrain is involved. Furthermore, the case studies proved that theorthophoto output is more than sufficiently accurate for archae-ological large-scale photo mapping, while the results are visuallyappealing as well.

Page 10: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e2070 2069

Finally, drawbacks were evaluated and possible improvementsproposed. First of all, it was indicated that the processing is verycomputer resource intensive, while image alignments are notcompletely optimal and solid accuracy measures still lacking. Also,the approach is presently semi-automatic and automation onlymakes sense when it seriously reduces or completely eliminatessteps in a process. In the case of archaeological orthophoto gener-ation, these are the recurring steps of visualizing the images,selecting the essential geodata (GCPs), and setting all the parame-ters for the subsequent execution of the algorithms. Since this iscurrently considered to be the bottleneck in large-scale archaeo-logical projects with thousands of images, a project has beenlaunched in 2012 by the first three authors (funded by the AustrianScience fund, P 24116-N23). It aims at the creation of completelyautomatic solutions for the GCP selection of archaeological aerialphotographs. This would offer possibilities for the consistentcreation and fast updating of archaeologically relevant cartographicdata in our rapidly changing landscapes.

Acknowledgements

The Ludwig Boltzmann Institute for Archaeological Prospectionand Virtual Archaeology (archpro.lbg.ac.at) is based on an inter-national cooperation of the Ludwig Boltzmann Gesellschaft (A), theUniversity of Vienna (A), the Vienna University of Technology (A),the Austrian Central Institute for Meteorology and Geodynamic (A),the office of the provincial government of Lower Austria (A),Airborne Technologies GmbH (A), RGZM-RomaneGermanic CentralMuseumMainz (D), RAÄ-Swedish National Heritage Board (S), IBMVISTA-University of Birmingham (GB) and NIKU-Norwegian Insti-tute for Cultural Heritage Research (N). The case studies from thePotenza Valley Survey project were made possible thanks tosupport from Belgian Science Policy (Interuniversity AttractionPoles, project P6/22).

References

Agisoft LLC, 2011. AgiSoft PhotoScan User Manual. Professional Edition, Version0.8.0 (accessed 03.05.2011.). http://www.agisoft.ru/pscan/help/en/pscan_pro.pdf.

Alfieri, N., 1937. Ricina. In: Atti e Memorie della Deputazione di Storia Patria per leMarche, 5, pp. 21e37.

Bewley, R.H., Raczkowski, W., 2002. Past achievements and prospects for thefuture development of aerial archaeology: an introduction. In: Bewley, R.H.,Raczkowski, W. (Eds.), Aerial Archaeology: Developing Future Practice. IOSPress, Amsterdam, pp. 1e8 (NATO Science Series. Series I: Life and BehaviouralSciences Vol. 337).

Bradley, D., Boubekeur, T., Heidrich, W., 2008. Accurate multi-view reconstructionusing robust binocular stereo and surface meshing. In: CVPR 2008. IEEEConference on Computer Vision and Pattern Recognition. IEEE, Anchorage, AK,pp. 1e8.

Braun, J., 2003. Aspects on true-orthophoto production. In: Fritsch, D. (Ed.),Photogrammetric Week’03. Wichmann Verlag, Heidelberg, pp. 205e214.

Brophy, K., Cowley, D., 2005. From the Air. Understanding Aerial Archaeology.Stroud, Tempus.

Capodoglio, G., Cipoletta, F., 1996. Ricina. Il Teatro, la città. S. Giuseppe srl, Pollenza.Cecchi, D., 1968. Helvia Recina e il Piceno nell’età romana. Studi Maceratesi 4,

126e214.Crawford, O., 1924. Air Survey and Archaeology. Ordnance Survey, Southampton.Doneus, M., 2001. Precision mapping and interpretation of oblique aerial photo-

graphs. Archaeol. Prospect 8, 13e27.Doneus, M., Neubauer, W., 1998. 2D combination of prospection data. Archaeol.

Prospect 5, 29e56.Doneus, M., Neubauer, W., 2005. Multiple survey techniques at Roman Carnuntum.

Integrated prospection of the largest archaeological landscape in Austria. In:Musson, C., Palmer, R., Campana, S. (Eds.), In volo nel passato. Aerofotografia ecartografia archaeological. Biblioteca del dipartimento di archaeologia e storiadelle arti e sezione archaeologica, Università di Siena, Sienna, pp. 272e279.

Doneus, M., Eder-Hinterleitner, A., Neubauer, W., 2001. Roman Carnuntum: pro-specting the largest archaeological landscape in Austria. In: Doneus, M.,Eder-Hinterleitner, A., Neubauer, W. (Eds.), Archaeological Prospection: FourthInternational Conference on Archaeological Prospection. Austrian Academy ofSciences, Vienna, pp. 47e59. Vienna, 19e23 September 2001.

Doneus, M., Verhoeven, G., Fera, M., Briese, Ch., Kucera, M., Neubauer, W., 2011.From deposit to point cloud e A study of low-cost computer vision approachesfor the straightforward documentation of archaeological excavations(XXIIIrd International CIPA Symposium). Geoinformatics 6, 81e88.

Fischler, M., Bolles, R., 1981. Random sample consensus: a paradigm for modelfitting with applications to image analysis and automated cartography. Comm.ACM 24, 381e395.

Fisher, R.B., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E., 2005.Dictionary of Computer Vision and Image Processing. Wiley, Chichester.

Guyot, G., 1990. Optical properties of vegetation canopies. In: Steven, M.D.,Clark, J.A. (Eds.), Applications of Remote Sensing in Agriculture. Butterworths,London, pp. 19e43.

Haigh, J.G.B., 2005. From photographs to maps. A collaborative development. In:Bourgeois, J., Meganck, M. (Eds.), Aerial Photography and Archaeology2003. A Century of Information. Academia Press, Ghent, Belgium, Ghent,201e212 pp.

Hartley, R., Zisserman, A., 2003. Multiple View Geometry in Computer Vision,second ed. Cambridge University Press, Cambridge.

Jobst, W., 1983. Provinzhauptstadt Carnuntum. Österreichs größte archäologischeLandschaft. Wiener Verlag, Himberg.

Kraus, K., 2002. Zur Orthophoto-Terminologie. Photogrammetrie. Fernerkundung.Geoinformation 6, 451e452.

Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. Int. J.Comput. Vis. 60 (2), 91e110.

Mellor, J.P., Teller, S., Lozano-Pérez, T., 1996. Dense Depth Maps from EpipolarImages. Memo 1593, Artificial Intelligence Laboratory, MIT.

Mercando, L., 1971. Villa Potenza (Macerata). e Rinvenimento di edificio romanocon pavimento a mosaico. In: Atti della Accademia Nazionale dei Lincei. Notiziedegli scavi di antichitá Serie 8, 25, pp. 381e401.

Moscatelli, U., 1985. Municipi romani della V regio Augustea: problemi storici edurbanistici del Piceno centro-settentrionale (III e I sec. a.C.). PICUS. Studi ericerche sulle Marche nell’antichità 5, 51e97.

Neubauer, W., Eder-Hinterleitner, A., 1997. Resistivity and magnetics of the Romantown Carnuntum, Austria: an example of combined interpretation of pro-spection data. Archaeol. Prospect 4, 179e189.

Neubauer, W., Eder-Hinterleitner, A., Seren, S., Doneus, M., Melichar, P., 1999. Kom-bination archäologisch-geophysikalischer Prospektionsmethoden am Beispielder römischen Zivilstadt Carnuntum. Archaeologia Austriaca 82/83, 493e551.

Neubauer, W., Eder-Hinterleitner, A., Seren, S., Melichar, P., 2002. Georadar inthe Roman civil town Carnuntum, Austria: an approach for archaeologicalinterpretation of GPR data. Archaeol. Prospect 9, 135e156.

Palmer, R., 2005. If they used their own photographs they wouldn’t take them likethat. In: Brophy, K., Cowley, D. (Eds.), From the Air. Understanding AerialArchaeology. Stroud, Tempus, pp. 94e116.

Percossi Serenelli, E., 1989. Rinvenimenti ed emergenze archeologiche nel terri-torio dell’antica Ricina. PICUS. Studi e ricerche sulle Marche nell’antichità 9,65e117.

Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., Koch, R.,2004. Visualmodelingwith a hand-Held camera. Int. J. Comput. Vis. 59, 207e232.

Quan, L., 2010. Image-based Modeling. Springer, New York.Robertson, D.P., Cipolla, R., 2009. Structure from motion. In: Varga, M. (Ed.),

Practical Image Processing and Computer Vision. John Wiley and Sons Ltd,New York.

Sandmeier, S.R., Itten, K.I., 1999. A field Goniometer system (FIGOS) for acquisitionof hyperspectral BRDF data. IEEE Trans. Geosci. Remote Sensing 37, 978e986.

Scharstein, D., Szeliski, R., 2002. A taxonomy and evaluation of dense two-framestereo correspondence algorithms. Int. Journal Computer Vision 47, 7e42.

Scollar, I., 2002. Making Things look vertical. In: Bewley, R., Raczkowski, W. (Eds.),Aerial Archaeology: Developing Future Practice. IOS Press, Amsterdam,pp. 166e172 (¼ NATO science series Series 1: Life and behavioural sciences).

Scollar, I., Tabbagh, A., Hesse, A., Herzog, I., 1990. Archaeological Prospecting andRemote Sensing. Cambridge University Press, Cambridge.

Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R., 2006. A comparisonand Evaluation of multi-view stereo reconstruction algorithms. In: 2006 IEEEComputer Society Conference on Computer Vision and Pattern Recognition(CVPR’06), vol. 1. IEEE, Washington, pp. 519e528.

Snavely, N., 2010. Bundler: structure from motion for unordered image collections(accessed 28.10.10.). http://phototour.cs.washington.edu/bundler/.

Snavely, N., Seitz, S.M., Szeliski, R., 2006. Photo tourism: exploring photo collectionsin 3D. ACM Trans. Graphics 25, 835e846.

Szeliski, R., 2011. Computer Vision. Algorithms and Applications. Springer, NewYork.

Ullman, S., 1979. The interpretation of structure from motion. Proc. Royal Soc.London B 203, 405e426.

Verhoeven, G., 2008. Imaging the invisible using modified digital still cameras forstraightforward and low-cost archaeological near-infrared photography.J. Archaeological Sci. 35, 3087e3100.

Verhoeven, G., 2011a. Near-Infrared aerial crop mark archaeology: from itshistorical use to current digital implementations. J. Archaeol. Method Theory.

Verhoeven, G., 2011b. Taking computer vision aloft e archaeological three-dimensional reconstructions from aerial photographs with PhotoScan.Archaeol. Prospect 18, 67e73.

Verhoeven, G., Smet, P., Poelman, D., Vermeulen, F., 2009. Spectral characterizationof a digital still camera’s NIR modification to enhance archaeological observa-tion. IEEE Trans. Geosci. Remote Sensing 47, 3456e3468.

Page 11: Journal of Archaeological Sciencearap.univie.ac.at/fileadmin/user_upload/p_arap/Verhoeven...accurate for a detailed multi-temporal and multi-method analysis. 3. Structure from motion

G. Verhoeven et al. / Journal of Archaeological Science 39 (2012) 2060e20702070

Verhoeven, G., Taelman, D., Vermeulen, F., 2012. Computer vision-based orthophotomapping of complex archaeological sites: the ancient quarry of Pitaranha(Portugal-Spain). Archaeometry. doi:10.1111/j.1475-4754.2012.00667.x.

Vermeulen, F., 2002. The Potenza valley survey (Marche). In: Attema, P.A.J.,Burgers, G.-J., van Joolen, E., van Leusen, M., Mater, B. (Eds.), New Developmentsin Italian Landscape Archaeology. Theory and Methodology of Field Survey.Land Evaluation and Landscape Perception. Pottery Production and Distribu-tion. In: Proceedings of a Three-Day Conference Held at the University ofGroningen. Archaeopress, Oxford, pp. 104e106.

Vermeulen, F., 2004. Fotografi a aerea fi nalizzata nelle Marche centrali: un progettointegrato. Archeologia Aerea. Studi di Aerotopografia Archeologica 1, 91e118.

Vermeulen, F., 2011. Reviewing 10 years of aerial photography in the valley ofthe River Potenza (Le Marche). Archeologia Aerea. Studi di AerotopografiaArcheologica 4, 259e264.

Vermeulen, F., Verhoeven, G., 2004. The Contribution of aerial photography andfield survey to the study of urbanization in the Potenza valley (Picenum).J. Roman Archaeol., 57e82.

Vermeulen, F., Verhoeven, G., 2006. An integrated survey of Roman urbanization atPotentia, central Italy. J. Field Archaeol. 31, 395e410.

Vorbeck, E., Beckel, L., 1973. Carnuntum e Rom an der Donau. Otto Müller Verlag,Salzburg.

Wilson, D., 2000. Air Photo Interpretation for Archaeologists. Tempus, Stroud.Wischounig-Strucl, D., Quartisch, M., Rinner, B., 2011. A structure based mosaicking

approach for aerial images from low altitude of non-planar scenes. In:Wendel, A., Sternig, S., Godec, M. (Eds.), 16th Computer Vision WinterWorkshop.

Wolf, P.R., Dewitt, B.A., 2000. Elements of Photogrammetry with Applications inGIS, third ed. McGraw-Hill, Boston.