
Computer Science Department

CPSC 097

Class of 2007

Senior Conference on

Computational Geometry

and GIS

Proceedings of the Conference

Order copies of this proceedings from:


Swarthmore College

500 College Avenue

Swarthmore, PA 19081

USA

Tel: +1-610-328-8272

Fax: +1-610-328-8606

[email protected]


Introduction

About CPSC 097: Senior Conference

This course provides honors and course majors an opportunity to delve more deeply into a particular

topic in computer science, synthesizing material from previous courses. Topics have included advanced

algorithms, networking, evolutionary computation, complexity, encryption and compression, and

parallel processing. CPSC 097 is the usual method used to satisfy the comprehensive requirement for a

computer science major.

During the 2006-2007 academic year, the Senior Conference was led by Andrew Danner in the area of

Computational Geometry with Applications in GIS.


Charles Kelemen, Edward Hicks Magill Professor and Chair

Lisa Meeden, Associate Professor

Tia Newhall, Associate Professor

Richard Wicentowski, Assistant Professor

Andrew Danner, Visiting Assistant Professor

Program Committee Members

Dan Amato

Scott Blaha

Andrew Danner

Taylor Hamilton

Phil Katz

Anthony Manfredi

Shingo Murata

Mustafa Paksoy

Alexandr Pshenichkin

Matt Singleton

Stephen St. Vincent

Giovanna Thron

Bronwyn Woods

Conference Website

http://www.cs.swarthmore.edu/~adanner/cs97/f06/


Conference Program

Wednesday, December 13, 2006

1:20–1:40 Finding Your Inner Blaha: GPS Mapping of the Swarthmore Campus

Matt Singleton and Bronwyn Woods

1:45–2:05 Border Patrol

Shingo Murata and Dan Amato

2:10–2:30 Parallelized Interpolation: A Quantitative Assessment

Scott Blaha and Mustafa Paksoy

2:45–3:05 Bridge Detection from Elevation Data Using a Classifier Cascade

Anthony Manfredi and Alexandr Pshenichkin

3:10–3:30 Flow Routing on Flat Terrains

Taylor Hamilton and Giovanna Thron

3:35–3:55 Shapefile Overlay Using a Doubly-Connected Edge List

Phil Katz and Stephen St. Vincent


Table of Contents

Finding Your Inner Blaha: GPS Mapping of the Swarthmore Campus

Matt Singleton and Bronwyn Woods

Border Patrol

Shingo Murata and Dan Amato

Parallelized Interpolation: A Quantitative Assessment

Scott Blaha and Mustafa Paksoy

Bridge Detection from Elevation Data Using a Classifier Cascade

Anthony Manfredi and Alexandr Pshenichkin

Flow Routing on Flat Terrains

Taylor Hamilton and Giovanna Thron

Shapefile Overlay Using a Doubly-Connected Edge List

Phil Katz and Stephen St. Vincent

Appeared in: Proceedings of the Class of 2007 Senior Conference, pages 1–5, Computer Science Department, Swarthmore College

Finding Your Inner Blaha: GPS Mapping of the Swarthmore Campus

Matt Singleton and Bronwyn Woods
{msingle1,bwoods1}@cs.swarthmore.edu

Abstract

Swarthmore College is a largely unmapped, dangerous region of southeastern Pennsylvania. Or rather, we treat it as such for the purposes of this paper. We present an interactive tool for calculating the shortest path between two points on the Swarthmore campus. We develop our tool using a combination of GPS technology and knowledge of Swarthmore’s buildings. We allow users to specify a Blaha factor, which scales the weights of indoor paths, causing them to be treated as shorter or longer than their real lengths in the shortest path calculations. In this way, users can express a preference for traveling primarily indoors or outdoors, depending on personal taste and weather conditions.

1 Introduction

People familiar with a place often have strong intuitions about the most efficient ways of traveling between locations they frequent. However, people’s intuitions are sometimes in disagreement. Additionally, special circumstances such as extraordinarily nice or foul weather may influence a person’s preference for possible routes. We present a tool for identifying the shortest path between two points on the Swarthmore College campus, allowing for preferences for indoor or outdoor paths.

We mapped the outdoor paths on the campus using GPS technology and estimated the indoor paths based on our knowledge of the buildings. Using a combination of manual and algorithmic techniques, we transformed our raw point data into a graph on which we perform shortest path routing using Dijkstra’s algorithm. We allow indoor and outdoor paths to be weighted differently, effectively discounting or penalizing the distance traveled inside.

Our tool is presented as an interactive GUI that allows the user to select points on a map of Swarthmore’s campus. The tool graphically displays the shortest path according to the value of the Blaha factor, or weighting of the indoor paths, set by the user.

2 Related Work

2.1 Global Positioning System

The NAVSTAR Global Positioning System (GPS) provides precise information about location by using signals transmitted by 24 satellites in Earth’s orbit. Originally designed for exclusive military use, the system was opened for civilian use as it became fully operational in the early 1990s. GPS satellites transmit ranging signals which encode information about the satellite’s location at the time the signal was sent. By combining information received from several satellites, GPS receivers can calculate their 3D location on Earth’s surface. The accuracy of locations determined by GPS can vary dramatically depending on the quality of the GPS receiver. Commercial quality GPS receiver units have typical errors of between 10 m and 30 m, while more expensive systems can reach an accuracy at the sub-centimeter level (US Army Corps of Engineers, 2003).

Many factors contribute to the overall accuracy of measurements taken using GPS. These include, in addition to the quality of the receiver, atmospheric conditions, the environment of the user and the position of the GPS satellites relative to the user. GPS measurements with commercial receivers can only be performed outdoors and can be disrupted by dense tree cover or other large obstacles (US Army Corps of Engineers, 2003).

2.2 Dijkstra’s Algorithm

Dijkstra (1959) describes two graph algorithms, one for constructing the minimum spanning tree and the other for finding the shortest path between two nodes. For the purposes of this project, we were concerned only with the latter. The operation of this algorithm is discussed in section 3.3.

3 Methods

3.1 Data Collection

We collected raw data about the paths on the Swarthmore College campus using a Garmin GPSMap 70 GPS unit. We marked paths by recording points manually at regular intervals. Manually recording points allowed us to determine the frequency with which we recorded points and to ensure that we recorded points at the intersections and endpoints of paths. Each point we recorded was given a unique ID, allowing us to keep track of which points started and ended any given path. In total, we collected 910 points. In addition to the data points delimiting the paths, we recorded individual points representing the doors into the campus buildings. The self-reported accuracy of the GPS unit averaged around 10 m for all of our data collection.

We recorded our data using the UTM coordinate system. The UTM system breaks the globe into zones, or bands running north to south. A location is defined by its zone, an easting and a northing. The easting gives the east-west position within the zone, measured in meters from a false origin 500 km west of the zone’s central meridian, while the northing gives the distance in meters from the equator (for locations in the northern hemisphere).

3.2 Processing Techniques

Once we had gathered our raw data, we needed to make a number of additions and changes to prepare it for screen display and path computation.

3.2.1 Hand Cleanup (First Pass)

Our raw GPS data was surprisingly good, but it still contained a number of erroneous data points. At this point we had a preliminary GUI that allowed us to view the data as a collection of numbered points and lines. Given this view, it was relatively easy to identify the erroneous data points visually and then remove them by hand.

In addition to containing erroneous data points, the raw GPS data is presented as one unbroken line. The result is long segments connecting the end of one path to the beginning of another. We divided the data into the individual paths again by visual examination of the data.

3.2.2 Line Intersection

Our next task was to find the points at which the paths intersected. Because the number of points in our data set was small, we decided to do this using a brute-force algorithm. Each path is made up of a number of straight line segments, so we simply check each segment for intersections with every other segment. If an intersection is found, that point is added to both paths. This operation is O(n^2), where n is the number of line segments.
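The brute-force test described above can be sketched as follows. This is only an illustration under stated assumptions: segments are pairs of (x, y) tuples in planar coordinates, the function names `segment_intersection` and `all_intersections` are hypothetical, and parallel or collinear segments are simply reported as non-intersecting.

```python
from itertools import combinations


def segment_intersection(p1, p2, p3, p4, eps=1e-12):
    """Return the point where segment p1-p2 crosses segment p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    # Cross product of the two direction vectors; zero means parallel.
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if abs(denom) < eps:
        return None  # assumption: parallel/collinear overlaps are ignored
    # Parameters of the crossing point along each segment (0..1 = on segment).
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def all_intersections(segments):
    """Check every pair of segments once: O(n^2) pairs for n segments."""
    hits = []
    for (i, a), (j, b) in combinations(enumerate(segments), 2):
        point = segment_intersection(a[0], a[1], b[0], b[1])
        if point is not None:
            hits.append((i, j, point))
    return hits
```

Consecutive segments of the same path share an endpoint and would be reported as intersecting there, so a real pipeline would probably skip such pairs before adding points to the paths.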

3.2.3 Hand Cleanup (Second Pass)

Figure 1: In the raw path data, some paths end a bit too soon, while others end a bit too late.

While GPS did a very good job of gathering data with good relative positioning (straight paths are straight and curved paths curve where they are supposed to), it did a much poorer job at absolute positioning. As a result, paths often end slightly before or slightly after they should (see figure 1). We used our preliminary GUI to visually identify where these problem areas were. We then added or removed points from the data by hand, as appropriate.

3.2.4 Adding Doors and Indoor Paths

As noted above, we gathered individual points marking the entrances to buildings in addition to our path data. Unfortunately, due to the absolute positioning problems with the GPS data, many of these points were significantly wrong. Based on our knowledge of the buildings on campus, we were able to identify which door points were worth keeping and which needed to be adjusted manually. GPS does not work inside, so we needed to add indoor paths by hand. We approximated these based on our knowledge of the buildings and the locations of the doors.

3.2.5 Creating a Graph

Finally, with all the paths and intersections in place, we needed to convert our data into a graph that we could use to compute shortest paths using Dijkstra’s algorithm. As our data stood at this point, a single path could contain multiple vertices and span multiple edges. We needed to segment it so that each path corresponded to one edge in the graph, and each endpoint corresponded to a vertex in the graph. With the data in this format, it was relatively easy to build the graph structure in one pass through the data.

Figure 2: A simple graph.

Our graph is implemented as a Python dictionary (essentially a hash table) where the keys are vertex IDs for every vertex in the graph and each value is a dictionary whose keys are the vertex IDs of all connected vertices and whose values are the weights of the corresponding edges. Because hash table lookups can be done in constant time on average, building the graph is an O(n) operation, where n is the number of paths.

{'A': {'B': 3, 'C': 2},
 'B': {'A': 3, 'C': 1},
 'C': {'A': 2, 'B': 1}}

Figure 3: An example of our graph representation describing the graph displayed in figure 2.
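A one-pass construction of this dictionary-of-dictionaries might look like the sketch below. The input format (a tuple of start ID, end ID, point list and an indoor flag per path), the name `build_graph`, and the choice to apply the Blaha factor while building the graph are all assumptions made for illustration; the paper does not specify them.

```python
import math


def build_graph(paths, blaha=1.0):
    """Build an undirected weighted graph as a dict of dicts.

    paths: iterable of (start_id, end_id, points, indoor), where points is
           a list of (easting, northing) tuples in meters (hypothetical format).
    blaha: factor applied to the length of indoor paths (assumed placement).
    """
    graph = {}
    for start, end, points, indoor in paths:
        # Path length is the sum of its straight segment lengths.
        length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
        weight = length * blaha if indoor else length
        # One dictionary insertion per direction: constant time on average.
        graph.setdefault(start, {})[end] = weight
        graph.setdefault(end, {})[start] = weight
    return graph
```

If two paths join the same pair of vertices, this sketch keeps only the one seen last; keeping the shorter of the two would be a natural refinement.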

3.3 Computing the Path

Now that we have constructed our graph, we can compute the shortest path between any two vertices using Dijkstra’s algorithm (Dijkstra, 1959). Dijkstra’s algorithm, left to its own devices, will compute the entire shortest-path tree of a graph, starting at a given node. Since all we care about is a single shortest path, we can stop computation as soon as our destination vertex is added to the tree.

The algorithm is simple, and can be easily implemented in Python. To begin, the algorithm sets the distance to the start vertex as 0 and the distance to all other vertices as ∞. Initially, the tree contains only the start vertex. The algorithm proceeds by incrementally adding adjacent vertices to the tree until every reachable vertex is added. The next vertex to be added to the tree is always the vertex outside the tree with the smallest tentative distance from the start vertex. Refer to figure 4 for an example.
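For concreteness, a minimal sketch of Dijkstra’s algorithm with early termination on the dictionary-of-dictionaries representation is given below. It uses a binary heap from the standard library, which is one common choice and not necessarily what our tool does; the name `shortest_path` is hypothetical, and vertex IDs are assumed to be mutually comparable so that ties in the heap can be broken.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return (distance, path) from start to goal, or (inf, []) if unreachable."""
    dist = {start: 0.0}   # best known distance to each discovered vertex
    prev = {}             # predecessor of each vertex on its best path
    done = set()          # vertices already added to the tree
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue      # stale heap entry
        done.add(u)
        if u == goal:
            break         # early stop: the destination joined the tree
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in done:
        return float("inf"), []
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

Because the edge weights already reflect the Blaha factor in this sketch, changing the factor only requires rebuilding the weights and rerunning the search.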

4 Results

4.1 Raw Data

Figure 5: The raw path data from the GPS unit.

Figure 4: A trivial example of the execution of Dijkstra’s algorithm. Panels (a) through (g) show Steps 1 through 7.

Figure 5 shows our raw path data. The map is clearly recognizable as the Swarthmore campus, but there are many flaws to be corrected. We can see several immediate problems with this data. Some paths that should intersect fall short and do not meet. Other lines do intersect, but extend past the intersection when they should end. Lastly, some points are clearly inaccurate by a large margin, causing spikes in a few of the paths.

4.2 GUI

Once we had all of the underlying structure built, we created a GUI to concisely present all of the data and provide an easy method for getting user input. The GUI is composed of three distinct areas.

Map canvas: The largest and most important part of the GUI is the map canvas. This is where the map is displayed and the user can select the start and end points of their desired path. Once two points are selected, the shortest path is calculated based on the current Blaha factor and drawn on the map.

Status bar: Along the bottom of the window, the status bar displays the current x- and y-coordinates of the mouse pointer as well as the current Blaha factor.

Input area: To the right of the map canvas, the input area allows the user to change the Blaha factor and reset the map.

5 Discussion

Our final product is an interactive tool for shortest path routing on the Swarthmore campus. Though it might seem that the tool would be superfluous given the familiarity of students with the campus, anecdotal evidence suggests that some shortest paths, especially with modified Blaha factors, are surprising even to Swarthmore students.

Though our map is created from GPS data, there are several possible sources of inaccuracy which might affect the shortest path calculations. For one thing, the lengths of the indoor paths are only estimated, and do not take into account stairs that must be climbed or doors that must be opened. We also do not consider elevation for the outdoor paths, though this is unlikely to make a significant difference.

There is potential to expand our tool in a variety of directions. For instance, we could expand our map to cover a greater portion of Swarthmore College and the surrounding areas. We could allow the user to specify buildings they cannot pass through due, for instance, to not having a key. We might also be able to improve the accuracy of the door data points by sampling several points at the doors and averaging their positions.

Though the methods we used for creating a searchable map of the Swarthmore campus worked for this task, they would not be scalable. This is due to the large amount of hand cleanup involved. However, the amount of error present in the GPS data we obtained necessitates this hand cleanup. It seems that the task of creating a map of the type we present for a larger area would require a different approach.

Figure 6: The final display given by the GUI, showing the shortest paths from the Science Center to the McCabe Library with the Blaha factor set to (a) 1 and (b) 0.1.

6 Conclusion

We present in this paper an interactive path finding tool for the Swarthmore College campus. The tool allows for differential treatment of indoor and outdoor paths, allowing the user to specify a preference for travel. Though the methods that we used for creating the map and searchable graph would not be scalable to larger maps, the techniques were effective for our task. Our interactive GUI allows the user to discover efficient paths which are occasionally surprising even to individuals familiar with the area.

References

E. W. Dijkstra. 1959. A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1:269–271.

US Army Corps of Engineers. 2003. Engineering and Design - Navstar Global Positioning System Surveying, July.


Border Patrol

Shingo Murata
Swarthmore College
Swarthmore, PA 19081
[email protected]

Dan Amato
Swarthmore College
Swarthmore, PA 19081
[email protected]

Abstract

We implement a border patrol program that computes ideal locations for observation towers overlooking a border of interest. In this particular project, we study the border of Arizona facing Mexico. We use GRASS to manipulate elevation raster data. Our algorithm extracts the border from a raster file and locates candidate positions for observation towers along that border. The viewshed of each observation tower is computed with a direct line-of-sight algorithm. We employ a tower placement algorithm to select only the necessary towers from the set of all candidate towers. Our algorithm selected 92 towers within a 5 km zone from the border to patrol the approximately 680 km border.

1 Introduction

Border patrol is a common concern in national security. Militaries are frequently concerned with detecting threats along the extent of a particular border as completely and efficiently as possible. It is important that border security can be established cost-effectively as well. We model this problem as the task of placing the minimum number of towers necessary to view the entire border of interest within some range of that border.

Current GIS technology makes it possible to automate the process of planning optimal locations for border observation towers. In this paper we develop algorithms to automatically extract a border from a raster file, find candidate tower locations near the border, and select the optimal set of towers that is capable of observing the entire border.

Elevation data for many areas of interest are readily available on the Internet. We use the Geographic Resources Analysis Support System (GRASS) for data manipulation. GRASS is suited for this project because it has full functionality in visualizing elevation, data conversion, and I/O support for ASCII files that are compatible with our C programs. With an abundance of digital elevation models in raster format and the software tools to manipulate them, we can successfully apply useful viewshed computation algorithms to the problem of finding the optimal set of observation towers.

The viewshed of a view point is the set of points in the terrain model that can be observed from that point. Viewshed computation is central to the automation of these algorithms because it is essential to know which border points a candidate tower is capable of observing. Our tower placement algorithm involves the iterative selection of the candidate tower that contributes the most yet unseen border points to the current optimal set of observation towers.

There are several parameters involved in tower placement. One of our goals is to minimize the number of towers we need to observe the entire border. This number is dependent both on the height of the towers and the maximum distance they are allowed to be placed from the border. As the height of the towers is increased, the number of necessary towers decreases, but the cost of each tower increases. Finding a cost-minimizing solution would involve balancing these factors. Increasing the distance from the border in which towers can be placed may incorporate useful elevation maxima into the model that were previously out of range. The gains from this are limited by atmospheric conditions that limit visibility and, ultimately, by the curvature of the Earth.

We begin by reviewing the viewshed algorithms presented by David Izraelevitz in (Izraelevitz, 2003). We then describe in detail the three major steps in our project: border extraction, viewshed computation, and tower placement. Finally, we present the results in section 4.

2 Related Work

Several algorithms for computing the viewshed of a point are presented in (Izraelevitz, 2003). The first method presented is direct computation. This method essentially checks each possible obstruction on a line from the view point to the target point. If there are no obstructions along this line, then the target point is considered visible. This algorithm is straightforward, but is computationally inefficient, because it requires O(n) computations for each grid point on an n x n field, resulting in an O(n^3) algorithm.

The Xdraw algorithm employs the Line of Sight (LOS) function to compute the viewshed of a view point. This algorithm is faster than the direct method because it stores previous results that can be utilized at the next stage of computation along the same line.

The final algorithm improves the Xdraw algorithm by introducing a backtracking method to reduce the number of interpolations and increase the accuracy of the LOS calculations. If any point along the line between the view point and the target point coincides with a grid data point, that data point is used to initialize the LOS computation. Otherwise, the algorithm backtracks a specified distance and initializes the computation with an interpolated LOS value.

To determine border visibility, we do not need to compute the entire viewshed from a given view point. We only need to determine whether target points on the border are visible from a tower’s view point. It is irrelevant whether the points between the observation tower and the border are visible. This negates the benefits of the Xdraw algorithm, which are based on a fast incremental computation. We implement and use the direct algorithm because its inefficient computation is mitigated by the limited number of points we are examining. In addition, its accuracy is superior to that of Xdraw.

Vincent presents an alternative viewshed algorithm based on ray casting in three dimensional polygonal models in (Vincent, 1999). The algorithm is based on a hierarchical partitioning of three dimensional space. This partition is represented by a k-d tree. The hierarchical partitioning of the space continues until the model of the terrain is enclosed in appropriate bounding boxes. This hierarchical structuring of the space allows Vincent to increase the search speed for view obstructions. Other optimizations include a backface culling mechanism to remove irrelevant surface polygons from the search space and a parallelization of the algorithm. Vincent also sets up an error vs. speed tradeoff by adaptively spacing the rays used to test for intersection. As the number of rays increases, accuracy improves at the cost of increased execution time.

3 Methods

Our automated border patrol algorithm has three main phases. First, the relevant border must be extracted from the data. Second, the set of candidate towers must be determined, and the viewshed of these towers must be computed. Finally, the tower placement algorithm must select the optimal set of observation towers from the entire set of candidate towers. These selections are based on the visibility characteristics of the candidate towers. These steps are elaborated below.

3.1 Border Extraction

To patrol the border, we first need the coordinates of each point on the border. This is not trivial because the digital elevation model contains no information regarding the border. Since we are looking at the Arizona border for this particular experiment, and Arizona has a straight border facing Mexico, we could have figured out the equation for the line. However, this method would severely restrict the type of border we can patrol. Therefore, we present an algorithm to extract a border in general.

Figure 1: Start List Construction

Although we have no information about the border in the raster file, we do have a vector file that traces the border. We begin by converting the vector file into raster format using GRASS. Raster cells on the interior of the border are marked with a 1, while raster cells exterior to the border are marked with a NULL flag. We then output the raster to an ASCII file, giving us a file of 1’s and NULL’s. Our algorithm defines a border candidate to be a point whose own value is 1 and that has at least one neighbor that is NULL.
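The border candidate test can be sketched as below. Our programs are written in C; Python is used here only for brevity. The raster is assumed to be a list of rows in which interior cells hold 1 and NULL cells hold `None`, and cells outside the raster are assumed not to count as NULL. The names `is_border_candidate` and `border_candidates` are hypothetical.

```python
# Offsets of the 8 neighbors of a raster cell.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]


def is_border_candidate(raster, r, c):
    """A cell is a border candidate if it is 1 and touches a NULL cell."""
    if raster[r][c] != 1:
        return False
    rows, cols = len(raster), len(raster[0])
    for dr, dc in NEIGHBORS_8:
        nr, nc = r + dr, c + dc
        # Assumption: neighbors that fall off the raster are not NULL.
        if 0 <= nr < rows and 0 <= nc < cols and raster[nr][nc] is None:
            return True
    return False


def border_candidates(raster):
    """Collect every border candidate as a set of (row, col) pairs."""
    return {(r, c)
            for r in range(len(raster))
            for c in range(len(raster[0]))
            if is_border_candidate(raster, r, c)}
```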

Since we are working with a section of the border, we need to specify the start coordinate and the end coordinate. The algorithm checks the 8 neighbors of the start point. If any of them are border candidates, the points are added to the start list. It also marks all 8 neighbors as “visited”. Figure 1 displays the state of the algorithm after the start list has been created. The grey cells are those that have been added to the start list.

For each point in the start list, our algorithm works as follows:

1. Dequeue a point from the start list. Call it p.

2. Look at the 8 neighbors of p, and find border candidates that have not been marked as “visited” yet. If there are any such points, add them to a work list. The work list is implemented as a linked-list queue, as is our temporary “border” described below.

3. Dequeue the first point from the work list and mark it as “visited.” Add the point to the temporary border linked list. Call this point p and then repeat step 2. Continue until the work list becomes empty or we hit the “end” point.

4. If the work list has become empty without reaching the end, we scrap that temporary border. If we hit the end point, we have detected a potential border. In either case, we restart the process from step 1 with another start point, until we exhaust all the start point possibilities.

Once all start points have been expanded, we may have two valid borders: one for each of the two directions along the border that lead from the start point to the end point. We must determine which of the two potential borders is the real border of interest, and which is simply the remaining border of the region which excludes the border of interest. For now, we make this decision by simply choosing the shorter border.

We then copy the linked list into an array, as arrays are easier to work with for our purpose. At this point, each border cell has a height of 1, since the cells were extracted from the 1 or NULL raster. It is important that we go through the actual elevation raster and set the elevation of these border points to their true values.

At this point we have extracted the border between the start point and the end point. We also define the zone in which towers may be placed. This zone is a band of specified width along the extracted border and inside of the “home” territory. To find the points falling in this zone, we essentially perform a breadth first search of the 1 or NULL raster. We begin with the extracted border points, and add them into a queue. While the queue is not empty and the specified width has not been reached, we dequeue a point from the queue. This point is flagged as “visited,” and all of its unvisited, home territory neighbors are added to the queue. All of the points in the zone are stored in an array for future use.
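The breadth first search for the tower zone can be sketched as follows, again in Python for brevity. The sketch assumes the zone width is given as a number of raster cells, that home territory cells hold 1, and that steps to any of the 8 neighbors count as distance one. It marks cells when they are enqueued, not when they are dequeued, a small variation that avoids queueing a cell twice. The name `tower_zone` is hypothetical.

```python
from collections import deque

NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]


def tower_zone(raster, border, width):
    """Return {cell: steps from border} for home cells within `width` steps."""
    rows, cols = len(raster), len(raster[0])
    depth = {cell: 0 for cell in border}  # doubles as the "visited" flag
    queue = deque(border)
    while queue:
        r, c = queue.popleft()
        if depth[(r, c)] == width:
            continue  # the band is already `width` cells deep here
        for dr, dc in NEIGHBORS_8:
            n = (r + dr, c + dc)
            inside = 0 <= n[0] < rows and 0 <= n[1] < cols
            if inside and raster[n[0]][n[1]] == 1 and n not in depth:
                depth[n] = depth[(r, c)] + 1
                queue.append(n)
    return depth
```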

3.2 Viewshed Computation

The viewshed of a view point in a terrain model is the set of all points visible from that point. To patrol the border we do not need to compute the full viewshed of a tower location. We only need to compute the visibility of points on the border. If a set of towers can collectively view each border point, then the border is considered to be fully patrolled by that set.


As we are not interested in computing the full viewshed of potential tower points, but only the visibility of the border points, we do not need the optimized approximation algorithms presented in (Izraelevitz, 2003). We can afford to simply compute the visibility from the potential tower point to each target border point. The relevant points in the viewshed computation are the view point, $p_v$, the target point, $p_t$, and the point of potential view obstruction, $p$. The visibility of $p_t$ from $p_v$ can be determined with equation 1.

\[
\mathrm{elv}(p_t) > \mathrm{elv}(p_v) + \frac{|p_t - p_v|}{|p - p_v|}\,\bigl(S(p) - \mathrm{elv}(p_v)\bigr) \tag{1}
\]

$\mathrm{elv}(p_a)$ denotes the elevation of point $p_a$. $|p_a - p_b|$ represents the Euclidean distance between two points, $p_a$ and $p_b$. Finally, $S(p_a)$ is the interpolated elevation of the terrain model at the $(x, y)$ coordinates of $p_a$.

This inequality places a constraint on the height of the target point, $p_t$, based on the elevation data along a sight line between the view point and the target point. The inequality generates the minimum elevation at the target location that is visible from the view point given the elevation of the potential obstruction at point $p$. If the actual target elevation is less than or equal to this minimum, then the target point is not visible from the view point because the sight line is obstructed by the obstacle at point $p$.

To determine the visibility of the target point, equation 1 must be evaluated at each point, $p$, along the sight line between the view point and the target point. If no point along the sight line obstructs the view, then the target is visible. We approximate this test by evaluating equation 1 at regular intervals along the sight line between the view point and the target point. The theoretical sight line will fall between two grid data points in general. The elevation at point $p$ is the interpolated elevation of the two grid data points on either side of the sight line.
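The sampled test of equation 1 can be sketched as follows, in Python for brevity. The sketch assumes the interpolated terrain elevation $S$ is supplied as a function `surface(x, y)`, that the view elevation already includes the tower height, and that any curvature adjustment has been applied to the elevations beforehand. The name `is_visible` and the `step` parameter are hypothetical.

```python
import math


def is_visible(surface, view, target, view_elev, target_elev, step=1.0):
    """Evaluate equation 1 at regular intervals along the sight line.

    surface: function (x, y) -> interpolated terrain elevation S(p)
    view, target: (x, y) coordinates of p_v and p_t
    view_elev, target_elev: elv(p_v) and elv(p_t)
    """
    (xv, yv), (xt, yt) = view, target
    total = math.hypot(xt - xv, yt - yv)      # |p_t - p_v|
    for k in range(1, int(total / step) + 1):
        d = k * step                          # |p - p_v|
        if d >= total:
            break                             # reached the target itself
        frac = d / total
        x, y = xv + frac * (xt - xv), yv + frac * (yt - yv)
        # Minimum target elevation that clears the terrain at p.
        required = view_elev + (total / d) * (surface(x, y) - view_elev)
        if target_elev <= required:
            return False                      # sight line blocked at p
    return True
```

For example, a 10 m high view point sees a ground-level target 10 m away across flat terrain, but not if a 20 m obstacle stands halfway along the sight line.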

The interpolated elevation at point $p$ and the elevation of the target point are adjusted for the Earth’s curvature prior to the computation of equation 1. The curvature adjustment of a point $p_a$’s elevation, $\Delta\mathrm{elv}(p_a)$, is given by equation 2, where $d$ represents the quantity $|p_a - p_v|$. The geometry of the approximation is illustrated in figure 2.

Figure 2: Geometry of the Earth Curvature Approximation

\[
\Delta\mathrm{elv}(p_a) = \sqrt{R_{\mathrm{Earth}}^2 + d^2} - R_{\mathrm{Earth}} \tag{2}
\]

We use this viewshed computation method to compute the visibility of each border point from a potential tower location.
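To give a sense of the magnitude of the adjustment in equation 2, the sketch below evaluates it numerically. It assumes a mean Earth radius of 6,371 km, a value not stated in the text, and the name `curvature_adjustment` is hypothetical. For distances much smaller than the radius the result is close to $d^2 / (2 R_{\mathrm{Earth}})$.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def curvature_adjustment(d, radius=EARTH_RADIUS_M):
    """Equation 2: sqrt(R^2 + d^2) - R for a point at distance d (meters)."""
    return math.sqrt(radius ** 2 + d ** 2) - radius


if __name__ == "__main__":
    # Print the adjustment for a few distances from the view point.
    for d in (1_000.0, 5_000.0, 20_000.0):
        print(f"d = {d:>8.0f} m  adjustment = {curvature_adjustment(d):.3f} m")
```

At a distance of 5 km the adjustment is roughly 2 m, so it matters mainly for long sight lines and low towers.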

3.3 Placement Algorithm

Our goal is to view the entire border with as few towers as possible. As noted in section 3.1, there is a zone of specified width back from the border in which towers may be placed. We base our tower placement on the principle that an observer can see more if s/he is at a higher elevation. We begin by locating all of the local maxima within the zone. A local maximum is defined as a raster cell having no higher neighbors within the zone. These maxima serve as the candidate positions of our towers.
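The local maximum test can be sketched as follows, in Python for brevity. The sketch assumes the elevation raster is a list of rows and the zone is a set of (row, column) pairs that all lie inside the raster; the name `local_maxima` is hypothetical. Cells that tie with a neighbor still count as maxima, since the definition only excludes cells with a strictly higher neighbor.

```python
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]


def local_maxima(elevation, zone):
    """Return the zone cells that have no higher neighbor within the zone."""
    maxima = []
    for (r, c) in sorted(zone):
        h = elevation[r][c]
        # Only neighbors that belong to the zone are compared.
        if all(elevation[r + dr][c + dc] <= h
               for dr, dc in NEIGHBORS_8
               if (r + dr, c + dc) in zone):
            maxima.append((r, c))
    return maxima
```

When the zone is a single row of border cells, each cell has at most two neighbors in the zone, so this reduces to comparing a cell with its adjacent border points.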

We first figure out which candidate towers are absolutely necessary. A candidate is considered necessary if there are points on the border that can only be seen from that candidate position. Therefore, instead of computing which points each tower would be able to see, we compute, for each point on the border, which towers can see it (although the two are computationally equivalent).

Figure 3: Example of border visibility state

We maintain a table of integers, with each row corresponding to a point on the border. Each column of a row holds the identifier of a tower that can see the point. For each point on the border, we test whether that point can be seen from each of the candidate positions. If the point is visible from a candidate tower location, we add the identifier for that tower to the point’s row in the table.

Our table may look like figure 3, which shows that the first point on the border can be seen by Tower 0, the second by Towers 0 and 1, the third by Towers 0, 1 and 2, and so on. Every time a tower is added to the table, we increment the “score” of that candidate tower by 1. The score corresponds to the number of points the candidate tower could see if it were built.

After we have created the table, we traverse the table to find the border points visible from only one tower. We know that the single towers that can see these points are critical. In the table above, for example, Tower 0 is critical because it is the only tower that can see the first point of the border.

Every time we find a critical tower, we add it to the permanent tower list. Then we traverse the integer table to “close” all border points that can be seen by that tower: we find out which points the tower can see and flag each of them as “secured.” We also traverse the row for each secured point, and if any other towers can see it, we decrement the scores of those towers. By decrementing these scores, we ensure that a score represents only the number of unsecured points viewable from the candidate tower. This prevents redundant towers, which can view few new border points, from being selected as good candidates in future iterations of the algorithm. The effect of adding Tower 0 to the permanent list in our example above is illustrated in figure 4. The score of Tower 1 is decremented by three because three of the border points it can view are made redundant by the addition of Tower 0 to the permanent list.

Figure 4: Example of updated border visibility state

Every time we add a tower to the list, we check whether the entire border has been “secured” yet. Chances are that the border cannot be secured by the critical towers alone. So we move on to the second phase of the placement algorithm, in which we simply pick the candidate tower with the highest score out of the remaining towers (when we put a tower into the permanent tower list, we set its score to -1, so we never look at it again). When we add the tower, we “close” the points it can see as before. We repeat this process until the entire border is secured, or there are no more candidate towers to consider.
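The two-phase placement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_towers` and the representation of the table as a list of sets (one set of tower identifiers per border point) are hypothetical and are not taken from the original implementation.

```python
def place_towers(visible_from):
    """visible_from[p] is the set of candidate tower ids that can see border point p.

    Hypothetical sketch: returns (chosen towers in order, secured flag per point).
    """
    num_towers = 1 + max((t for row in visible_from for t in row), default=-1)

    # Score = number of still-unsecured border points a candidate can see.
    score = [0] * num_towers
    for row in visible_from:
        for t in row:
            score[t] += 1

    secured = [False] * len(visible_from)
    chosen = []

    def add_tower(tower):
        chosen.append(tower)
        score[tower] = -1  # a placed tower is never considered again
        for p, row in enumerate(visible_from):
            if tower in row and not secured[p]:
                secured[p] = True  # "close" the point
                for other in row:
                    if score[other] > 0:
                        score[other] -= 1  # this point is no longer new to them

    # Phase 1: towers that are the only viewer of some point are critical.
    for p, row in enumerate(visible_from):
        if len(row) == 1 and not secured[p]:
            add_tower(next(iter(row)))

    # Phase 2: repeatedly take the remaining candidate with the highest score.
    while not all(secured):
        best = max(range(num_towers), key=lambda t: score[t], default=None)
        if best is None or score[best] <= 0:
            break  # no candidate can secure any further point
        add_tower(best)

    return chosen, secured
```

Border points that no candidate can see stay unsecured, which corresponds to the case in which the candidate towers run out before the border is fully covered.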

4 Results

The input raster, which covers the southern half of Arizona, has 2846 x 5705 data points at 100m resolution. When we extract the border, we find that it has 6768 data points. Of these 6768 points, 1332 are local maxima, so we begin with 1332 candidate points for the towers.

We begin by restricting tower placement to points directly on the border. When we scan for points visible from only one tower, we find that only 6 towers are critical. With those 6 critical towers, we can view 844/6768 border points. The next tower we place, the tower with the highest score, can view 619 points by itself. After securing 6642/6769 points, our additional towers can only see 5 novel points. We also need 15 towers that can only see 1 unique point. In total, we need 120 towers.

When we expand the zone, we have many more points to work with. Within a 5km zone from the border, we have more than 370,000 points, from which we obtain 8974 maxima. While we have about 50 times as many total data points, we only have about 7 times as many maxima because we are now checking all 8 neighbors to test whether a point is a maximum, as opposed to just checking the 2 adjacent points on the border. For 20m towers, by expanding the border zone to 5km wide we decreased the number of necessary towers to 92.

Figure 7: Number of Towers vs. Height of Towers (meters)

5 Discussion

Considering that the border is approximately 680km long, the number of towers we need is relatively low. With 20m towers in a 5km border zone, we need 92 towers. On average, each tower is responsible for roughly 70 data points, or 7km of the border. The number of towers seems to be higher than it could be due to the clusters that we can observe in figure 6. The clusters suggest that our algorithm may not be suited to certain geographical features found in these regions.

Still, we can infer some useful information from the data. We expected the number of towers required to decrease as we increased the tower height, but as we can see from figure 7, it seems to be asymptotic. In fact, when we tested with kilometer-high towers (which is absurdly high), we found that we would still need 23 towers. This tells us that after a certain point, we do not gain as much from making the towers taller. By comparing our data with the cost of building towers of varying height, we can theoretically obtain the optimal tower height in terms of cost.

Figure 5: Tower Locations Directly On The Border

Figure 6: Tower Locations Within A 5km Zone

Figure 8: Number of 20 meter Towers vs. Width of Border Zone (meters)

Figure 9: Effect of Zone Size on Number of Necessary Towers

Figure 10: Pie Chart Overlay of Zone Point Elevation, Maxima Elevation, and Tower Elevation

We can also see that expanding the zone from just on the border to some distance away from the border tends to improve performance. As seen in figure 9, we can reduce the number of towers (ranging in height from 10 meters to 100 meters) by between 13.85% and 28.86% by allowing a 5km zone. The number is bound to be inconsistent, because it depends on how many more important maxima we can gain by backing up, and the definition of an important maximum depends on the tower height. We also note that this behavior seems to be asymptotic, even more so than the tower height. By moving away from the border, we are expanding the visible region, but we are also making ourselves susceptible to more obstacles on the way. Also, we may lose some visible points due to the Earth’s curvature. For 20m towers, it looks like 3km away from the border is the best distance. Still, it seems to be very important that we actually use the zone to improve our performance rather than rely on increasing the tower height. For 20m towers, the 23% improvement due to the 5km zone is roughly equivalent to the improvement that could be gained by increasing the tower height to 50m.

The distribution graph in figure 10 shows the elevation distribution of 3 categories: the outermost ring corresponds to the entire border zone, the middle ring is the local maxima, and the inner ring shows the actually placed towers. Local maxima are distributed fairly evenly across the zone, as inferred from the graph. In contrast, we can tell that the algorithm preferred to place the towers on higher points. Even though the majority of the region is at low elevation (0 - 1000m), almost half of the towers are placed at an elevation of 1000m or higher. We had few towers placed at an elevation of 2km or higher relative to the region, but this is mostly because all the high elevation points are clustered near the east end of the border, and if we place a few towers in the elevated region their visible field should be large, negating the benefit of more towers in that region.

A major problem with our algorithm is its computational complexity. Disregarding the complexity of file I/O, it first takes $O(n)$ time to locate the maxima, where $n$ is the length of the border. Then scanning the points on the border costs $O(n^3)$ time (each border point × each candidate point × linear viewshed computation, though we expect the linear viewshed computation not to be as large as $n$). The addition of critical towers to the permanent tower list requires another $O(n^3)$ time, as we need to traverse the table to find each point visible from the critical tower. For each point found, we need to traverse its row in the table again to “close” the point and decrement the score of all other towers that can see the point. Again, the last step is usually much smaller than $n$, as we do not expect all towers to be able to see the same border point. When placing 20m towers on top of the border, our running time was around 90 seconds after extracting the border. As we increase the border zone, running time increases, as is the case when we increase the tower height. This happens because of our method of viewshed computation. Because we compute a linear line of sight from a point to a point, as soon as we see an obstacle high enough to block the target point on the direct line, we can stop the computation. When more towers tend to be visible from the border, the number of computations required increases.
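The early-exit behavior of the point-to-point line of sight test can be illustrated with a short Python sketch. The function name `line_of_sight`, the nearest-cell sampling of the sight line, and the omission of any Earth-curvature correction are assumptions made for this example; they are not details confirmed by the text.

```python
def line_of_sight(elev, src, dst, tower_height=20.0):
    """Return True if cell dst is visible from a tower standing on cell src.

    elev is a 2D list of elevations; src and dst are (row, col) pairs.
    Hypothetical sketch: samples the nearest cell at each step along the line
    and ignores the Earth's curvature.
    """
    (r0, c0), (r1, c1) = src, dst
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return True
    eye = elev[r0][c0] + tower_height
    target = elev[r1][c1]
    for i in range(1, steps):
        t = i / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        sight = eye + t * (target - eye)  # height of the sight line over this cell
        if elev[r][c] > sight:
            return False  # blocked: stop as soon as one obstacle is found
    return True
```

With this structure a blocked pair is cheap to reject, while a visible pair always costs a full walk along the line, which is consistent with the running time growing as more towers become visible.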


6 Conclusion

We have determined that the border of Arizona can feasibly be patrolled by observation towers of reasonable height and at a reasonable distance from the border. We expect that this algorithm could be successfully applied to any border. The execution time of the algorithm for a large dataset was not prohibitive, and we conclude that the direct line of sight method for determining visibility is sufficient for the border patrol problem. Finally, we conclude that there are diminishing returns in terms of visibility as tower height and distance from the border are increased. Given appropriate cost parameters, the most efficient means of patrolling the border could be obtained with our method.

7 Acknowledgements

We would like to heartily thank seamless.usgs.gov for presenting us with the elevation data.

References

David Izraelevitz. 2003. A Fast Algorithm for Approximate Viewshed Computation. American Society for Photogrammetry and Remote Sensing, 69.7.

Andrew Vincent. 1999. Terrain Occlusion Using Binary Adaptive Ray Casting. Silicon Graphics Inc.



Parallelized Interpolation: A Quantitative Assessment

Scott Blaha
Swarthmore College

Mustafa Paksoy
Swarthmore College

Abstract

The conversion of raw point-cloud elevation data to grid DEMs is done by interpolation. Current interpolation methods either produce high quality results or take an acceptable amount of time, but not both. This paper seeks to reconcile these two objectives by parallelizing a high-quality interpolation method, nearest-neighbor averaging. We explore the speed-up obtained by parallelization and compare run-time with the lower quality binning method.

1 Introduction

LIDAR, one of the primary forms of elevation data collection, yields a cloud of elevation data points. However, Geographic Information Systems (GIS) like GRASS often require data in the form of a digital elevation model (DEM). One of the simplest methods to perform this conversion is called binning. Binning simply averages the points in each grid cell of the DEM to yield an elevation for the grid cell. However, because of the non-uniform nature of the point cloud, some of the grid cells of the DEM might not contain any points. Thus, it is possible to have holes in the DEM after binning.

The solution to this is to use one of a set of methods known as interpolation. A typical simple linear interpolation might take an average of points close to an empty cell, weighted by distance from the cell. Unfortunately, if there are $n$ points, then there are potentially $O(n^2)$ interpolation calculations to be performed. This fact, paired with the typically extremely large data sets of interest in GIS, makes interpolation a highly non-trivial task. In fact, in a recent I/O-efficient point cloud to DEM algorithm (Agarwal et al., 2006), from 52% to 86% of running time was spent interpolating, depending on the data set. Clearly, a faster method of interpolation is needed.

The basic trade-off in interpolation is quality (e.g. representativeness) of the resulting DEM versus the computational complexity of the interpolation. Rather than deal with reduced DEM quality in our quest for better interpolation run-times, we will parallelize the interpolation of point cloud elevation data. Because of the locality of reference of the interpolation task, parallelization can reduce the time to interpolate a set of points by a factor that is more than quadratic in the number of computers (see Section 2.4).

2 Methods

2.1 Serial Binning

Our first method is a simple implementation of serial binning. A grid cell’s value is the average of all points from the point cloud in that cell. A grid cell containing no points is assigned a “no value” constant; in our case, this was -99999. Serial binning will be our baseline for comparison of competing interpolation methods, both in terms of interpolation quality and run-time efficiency. We hope that parallelization will speed up creating a DEM, and that interpolation will improve the quality of the DEM. Since it takes constant time to place a point into a bin, the run-time complexity of binning is $O(n)$, where $n$ is the number of input points. Moreover, the constant hidden in this notation was experimentally shown to be quite small.
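As an illustration of why binning is linear in the number of points, the following Python sketch makes a single pass over the point cloud. The function name `bin_points`, its parameters, and the row-major grid layout are assumptions made for this example; only the “no value” constant comes from the text.

```python
NO_VALUE = -99999  # "no value" constant used for empty cells, as in the text


def bin_points(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Average the elevations of the (x, y, z) points that fall in each grid cell.

    Hypothetical sketch: one pass over the points, so the cost is O(n).
    """
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            sums[row][col] += z
            counts[row][col] += 1
    # Cells that received no points keep the NO_VALUE marker (the "holes").
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else NO_VALUE
         for c in range(n_cols)]
        for r in range(n_rows)
    ]
```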

2.2 Parallelized Binning

Next, we implemented a parallelized binner. This simply splits the task of binning and averaging points between several computers. It will produce the same interpolation as a serial binner, but hopefully with a slight efficiency boost. We do not expect to see much improvement from using this method.

Our parallel applications use the Message Passing Interface (MPI) standard for distributing and collecting data. MPI provides various facilities for passing data around and synchronizing computation. In our case, we just needed to send data back and forth. We use the Local Area Multi-computer (LAM) implementation of MPI, which lets us set up virtual clusters using an arbitrary number of nodes in the Computer Science Department network. These machines are all on the same subnet, so we expect network bandwidth to be high and latencies to be low.

Initially, we split data into equal size y-intervals. This led to different hosts creating overlapping grids, which greatly complicates the process of merging results. To avoid this, we increase the y-interval so it becomes a multiple of the bin height of the grid (see Figure 1). The splits in the data then align with the grid, which prevents different hosts from creating overlapping grids.
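One way to compute grid-aligned splits is sketched below in Python. The function name `aligned_splits` and the choice to round the number of rows per host up are assumptions for illustration; the text only states that each y-interval is made a multiple of the bin height.

```python
import math


def aligned_splits(y_min, y_max, bin_height, hosts):
    """Split [y_min, y_max) into per-host y-intervals whose edges lie on grid rows.

    Hypothetical sketch: every interval covers a whole number of bins, so no
    two hosts ever produce values for the same grid row.
    """
    n_rows = math.ceil((y_max - y_min) / bin_height)
    rows_per_host = math.ceil(n_rows / hosts)  # round the interval up to whole bins
    splits = []
    for h in range(hosts):
        first = h * rows_per_host
        last = min(first + rows_per_host, n_rows)
        if first >= last:
            break  # fewer rows than hosts: the remaining hosts get nothing
        splits.append((y_min + first * bin_height, y_min + last * bin_height))
    return splits
```

For example, ten rows of height 100 shared among three hosts are divided into intervals of four, four, and two rows.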

2.3 Nearest-Neighbor Interpolation

We have implemented a simple type of interpolation, nearest-neighbor interpolation. In this method, we first calculate the centroid of each grid cell, the center point of the cell. Then, we sort points by their Euclidean distance from the centroid. Finally, we set the elevation of the grid cell to be the average of the k points nearest to the centroid. Experimentation showed that 10 is an acceptable value for k.

Because sorting is $O(n \log n)$, if we have $g$ grid cells and $n$ points, then this algorithm has a run-time complexity of $O(g \cdot n \log n)$. This is because we must sort all the points based on distance from each cell’s centroid. As we note below, this method is unacceptably slow. However, it does not suffer from the “holes” in the grid that binning does.
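The method above translates almost directly into Python. In this sketch the function name `nearest_neighbor_dem` and its parameters are assumptions for illustration, and the point list is assumed to be non-empty; the full sort per cell is kept deliberately, since it is the source of the $O(g \cdot n \log n)$ cost.

```python
import math


def nearest_neighbor_dem(points, x_min, y_min, cell_size, n_cols, n_rows, k=10):
    """Set each cell to the mean elevation of the k points nearest its centroid.

    Hypothetical sketch: points is a non-empty list of (x, y, z) tuples, and
    every cell sorts the whole point list by distance from its centroid.
    """
    dem = []
    for row in range(n_rows):
        dem_row = []
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size  # centroid of the cell
            cy = y_min + (row + 0.5) * cell_size
            by_distance = sorted(
                points, key=lambda p: math.hypot(p[0] - cx, p[1] - cy)
            )
            nearest = by_distance[:k]
            dem_row.append(sum(p[2] for p in nearest) / len(nearest))
        dem.append(dem_row)
    return dem
```

Because every cell takes its value from the nearest points wherever they lie, no cell is left without a value.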

2.4 Parallelized Nearest-Neighbor Interpolation

Parallelization of our interpolation algorithm proceeds in much the same way as with simple binning: we break the grid into a number of approximately equal sections, one for each host, and send each host only the points which fall in its section. Each host then performs the nearest-neighbor interpolation described above.

The theoretical speed-up we should see is more than quadratic in the number of hosts. If we have $h$ hosts, and the points are approximately evenly distributed over the grid, then each host will get about $\frac{g}{h}$ of the grid and $\frac{n}{h}$ of the points. So, each host will have a run-time complexity of

$$O\left(\frac{g}{h} \cdot \frac{n}{h} \log \frac{n}{h}\right) = O\left(\frac{gn}{h^2} \log \frac{n}{h}\right).$$

Our tests have supported this analysis: we do in fact get super-quadratic speedups from adding hosts (see Section 3).
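To get a feel for this bound, the ratio of the serial cost $g \cdot n \log n$ to the per-host cost $\frac{gn}{h^2} \log \frac{n}{h}$ can be evaluated directly. The Python helper below is a hypothetical illustration (the name `predicted_speedup` is an assumption); it ignores constant factors and all communication overhead, so it is an idealized upper estimate rather than a prediction of measured times.

```python
import math


def predicted_speedup(n, h):
    """Idealized speed-up: (g n log n) / ((g n / h^2) log(n / h)).

    The grid size g cancels, so only the point count n and the host count h
    matter. Requires n > h so that log(n / h) is positive.
    """
    return h * h * math.log(n) / math.log(n / h)
```

For $n = 100{,}000$ points and $h = 20$ hosts this ratio is roughly 540, compared with $h^2 = 400$ for a purely quadratic speed-up.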

2.5 Smoothing Parallelized Interpolation

Unfortunately, parallelization can result in edge effects in interpolation results. Along the edge of a sub-grid sent to a host for interpolation, the closest elevation data points to a centroid might be in a different sub-grid, and thus are not considered during the interpolation. This can result in noticeable lines or bands in the resultant DEM. We call our solution to this smoothing. We pass the points belonging to the immediately surrounding sub-grids along with the points in the sub-grid we send to each host. This approximately triples the number of points sent to each host, but results in less noticeable edge effects.

3 Results and Discussion

We have fully implemented the algorithms described above. We ran tests on a 100,000 point subset of a LIDAR-generated 100 foot resolution point cloud. This subset was picked by sorting the 2,000,000 points by y-value, and picking every twentieth. See Figure 2 for visualizations of the interpolated data set. Interpolation is the clear winner in the quality department: there are no holes, and the resultant DEM looks like a “filled-in” version of the binned DEM (Figure 2(a)). It is hard to notice the difference between smoothed and non-smoothed DEMs with the naked eye, but Figure 2(d) shows the result of subtracting the two DEMs from each other. Notice that the difference lies along the cuts between the sub-grids sent to different hosts. This shows that smoothing actually does what it is supposed to, that is, it smooths out the edge effects between sub-grids.

Figure 1: Splitting the grid into sub-grids for parallelization

The actual running time of each algorithm was measured several times to test the possible speed-ups obtainable by parallelization. Unexpectedly, parallel binning is actually slower than serial binning (see Figure 3). We believe this is due to the overhead of passing data over the network compared with the blazing speed of the simple binning algorithm. In the case of interpolation, we observed the extreme parallelization speed-ups predicted above. See Figure 4 for a chart of running time versus number of hosts; note that the time axis is logarithmic. So the predicted super-quadratic speed-up does occur. As a simple comparison between binning and interpolation, with 100,000 points, 20 hosts, and a 100 foot resolution, binning takes 1.4 seconds, non-smoothed interpolation takes 24.8 seconds, and smoothed interpolation takes 73.3 seconds. Refer to Figure 5 for a comparison between the three methods.

4 Conclusion

Parallelization provides an excellent speed-up for interpolation methods, but does not decrease the running time of binning. Before parallelization, our interpolation method was intolerably slow, but with 20 hosts, its run-time is very reasonable. Smoothing is also a desirable option when using parallelized nearest-neighbor interpolation; however, it approximately triples the run time of the interpolation. For casual use, parallelized non-smoothing interpolation suffices. However, if more accuracy is required, then smoothing provides that with only a three-fold slowdown.

Future work in this area could include parallelizing the extremely popular quad-tree interpolation algorithm. Also, we can optimize the nearest neighbor interpolation by using a scan-line approach and only sorting a section of the whole dataset at a time.

References

P. K. Agarwal, L. Arge, and A. Danner. From point cloud to grid DEM: A scalable approach. In Andreas Riedl, Wolfgang Kainz, and Gregory Elmes, editors, Progress in Spatial Data Handling. 12th International Symposium on Spatial Data Handling, pages 771–788. Springer-Verlag, 2006.


Figure 2: Results of interpolating sparse data at 100 feet. (a) is the result of binning; note the cells with no value. (b) is our parallel interpolation algorithm without smoothing, and (c) is with smoothing. (d) is the difference between (b) and (c).


Figure 3: Comparison of serial and parallel binning running times. [Plot: Time (s) versus Resolution (m); series: serial, lam5, lam10, lam15, lam20.]

Figure 4: Comparison of non-smoothed and smoothed interpolation running times, showing super-quadratic speed-up for adding hosts. [Plot: Parallel execution of interpolation; Execution time (s) versus Number of nodes; series: No smoothing, Smoothing.]


Figure 5: Comparison of parallel binning, non-smoothed interpolation, and smoothed interpolation running times. [Plot: Time (s) versus Hosts; series: Binned, Not smoothed, Smoothed.]



Bridge Detection from Elevation Data Using a Classifier Cascade

Anthony Manfredi
[email protected]

Alexandr Pshenichkin
[email protected]

Abstract

Bridges and similar non-obstructing features inhibit correct flow routing on high-resolution digital elevation models because their apparent elevation does not reflect the elevation at which water may pass underneath them. Our goal is to identify such features using the elevation data so that flow-routing algorithms may find paths under them correctly. We use an algorithm based on Viola and Jones’ object-recognition system. Simple filters are applied in sequence to efficiently narrow the search space down to a final set of likely candidate features. This paper presents a successful system for identifying bridges that can be fairly easily integrated into existing GIS systems.

1 Introduction

New high-resolution terrain scanning techniques such as laser altimetry (lidar) have greatly improved the accuracy of GIS. The improved resolution has introduced many new details into digital elevation maps; many such features, however, hinder analysis of the underlying bare-earth terrain. Among the most important problem features are bridges. From the air, a bridge appears as a solid ridge, but, in reality, water can pass beneath it. While a raw data dump may contain some points that are visible underneath a bridge, current preprocessing techniques will tend to remove these, leaving a solid obstacle on the processed digital elevation model (DEM). This confuses flow-routing algorithms, which must flood terrain or search for convoluted detours to escape the local minimum created by the presence of the false ridge. Our goal is to identify bridges and similar features, such as drainage tunnels, on digital elevation models, so that water flow can be routed through them. Appropriate flow routing can be accomplished with minimal modification of existing algorithms by simply cutting through a bridge once it is marked out.

1.1 Related Work

Sithole and Vosselman (Sithole and Vosselman, 2006) describe a system for the geometric recognition of bridges as part of a general system for creating bare-earth data from raw lidar input. Their system looks for features that drop off sharply on two sides and fade smoothly into the surrounding terrain on the others. Calculating and analyzing bounding polygons for terrain features, however, is computationally intensive.

Our algorithm is inspired by computer vision research by Viola and Jones (Viola and Jones, 2002). Their system utilizes a “cascade” of simple filters, each of which is sensitive to a specific pattern. The algorithm reliably recognizes faces in real-time video. They also suggest a technique for fast computation of rectangle sums, called the integral image method. Each pixel in the integral image is the sum of the values of the pixels above and to the left of its location in the original image, which allows any rectangle sum to be computed from only four lookups (combined by additions and subtractions) if the integral image already exists. This technique allows us to quickly calculate statistics for subsections of the map. For example, finding the average elevation in a ten-by-ten square area conventionally would require adding together one hundred values; with the integral image method, we need only access four values (the corners of the box) to get the area sum.
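The integral image idea is compact enough to show in full. The Python sketch below is illustrative only: the function names `integral_image` and `rect_sum` and the extra zero row and column used as padding are assumptions of this example, not details of the system described here.

```python
def integral_image(grid):
    """Build a summed-area table with one extra row and column of zeros.

    ii[r][c] holds the sum of grid[0:r][0:c], so the padding removes the
    need for boundary checks when looking up corners.
    """
    rows, cols = len(grid), len(grid[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = grid[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of the cells in rows top..bottom-1 and columns left..right-1.

    Only the four corner entries of the table are read.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Dividing `rect_sum` by the number of cells in the rectangle gives the average elevation of a window in constant time, regardless of the window size.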

2 Methods

2.1 Algorithm

Our system is an implementation of the cascade concept of Viola and Jones in a novel domain. A sliding window moves over the map, examining small sections of the terrain in sequence. The window may move one or several pixels at a time: this is the step size of the window. A larger step size decreases run-time significantly but also decreases accuracy. Empirically we determined that a step size of 2 pixels did not result in a significant decrease in accuracy.

Each window is passed through a series of filters. A filter is a function that evaluates the pixels within the window statistically or geometrically and decides to accept or reject the slice. To save storage space, the algorithm applies all the filters to each window in order before moving on to the next; this way, no intermediate candidate lists (which could be quite large) are stored in memory. If any filter rejects the slice, it ceases to be relevant and the window moves to the next target. As in the Viola-Jones algorithm, the collective action of the filters makes up for their individual inaccuracy. It is important for each individual filter to have a very low rate of false negatives, so that they do not reject good candidates prematurely.
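The sliding-window cascade can be sketched as follows in Python. The names `run_cascade` and `min_elevation_difference`, the window size expressed in cells, and the representation of a filter as a function from a window to a boolean are assumptions made for this illustration; the 7 foot threshold is the one quoted for the minimum elevation difference filter in Section 2.2.

```python
def run_cascade(dem, window, step, filters):
    """Return the top-left corners of all windows accepted by every filter.

    Hypothetical sketch: filters are applied in order, and all() stops at the
    first rejection, so later (possibly costlier) filters are skipped and no
    intermediate candidate list is stored.
    """
    rows, cols = len(dem), len(dem[0])
    accepted = []
    for top in range(0, rows - window + 1, step):
        for left in range(0, cols - window + 1, step):
            patch = [row[left:left + window] for row in dem[top:top + window]]
            if all(f(patch) for f in filters):
                accepted.append((top, left))
    return accepted


def min_elevation_difference(patch, threshold=7.0):
    """Accept if some pair of cells in the window differs by more than threshold."""
    values = [v for row in patch for v in row]
    return max(values) - min(values) > threshold
```

Ordering the filters from cheapest to most expensive would make the early exit most effective, although the text does not state the order actually used.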

In order to accommodate bridges of varying sizes, we make several passes over the map, changing the scale of the window each time. One can reasonably expect bridges to be at least one car lane and no more than a dozen lanes wide, and filters must take in some of the surrounding area for comparison as well. We are currently using window sizes of 100, 150, 200, 250, 300, and 400 feet in an attempt to accommodate all reasonably-sized bridges.

2.2 Filters

We have implemented several filters to detect bridge-like features. Since the overall goal is to aid hydrological modeling, we focus on discovering terrain elements that have a strong effect on existing flow-routing algorithms and trying to identify them as bridges.

1. The high gradient filter accepts an image if at least ten percent of the pixels in the filter window have a gradient above a certain threshold. Currently this threshold is 2.4 feet of elevation per 10 feet of translation (empirically determined), but we may adjust it in the future and analyze how it affects our results. This filter is designed to find the steep edges of bridges.

2. The flood fill filter accepts an image if at least thirty percent of the pixels in the filter window were flood-filled by a flow-routing algorithm. This filter is designed to capitalize on the fact that bridges in general, and particularly the bridges that we want to remove to do correct flow routing, cause flood filling along their length.

3. The minimum fill depth filter accepts an image if there is at least one pixel in the window that was flood-filled higher than 8 feet. This filter is designed to focus on areas that are significantly problematic for hydrological modeling.

4. The low gradient filter accepts an image if at least twenty percent of the window area is low gradient pixels, where the low gradient threshold is 0.5 feet of elevation per 10 feet of translation. This filter is designed to look for the flat area of the bridge itself.

5. The minimum elevation difference filter accepts an image if the difference in elevation between any two pixels in the window is above 7 feet. This filter capitalizes on the fact that bridges will be elevated above the surrounding terrain.

6. The height bridge shape filter accepts an image if a stripe down the middle third of the image matches a low:high:low elevation pattern when compared to the average elevation of the entire window. This filter is rotated eight times at pi/8 radian intervals to catch varying bridge orientations. If any of these rotated filters match, the image is accepted. This filter is designed to find an elevation pattern that looks like a bridge: high in the middle and lower on the sides.

7. Much like the height bridge shape filter, the gradient bridge shape filter accepts an image if a stripe down the middle third of the image matches a high:low:high gradient pattern, using the same thresholds for low and high gradients that were used in the previous gradient filters. This filter is also rotated in the same manner as filter 6. This filter is designed to find a gradient pattern that looks like a bridge: flat in the middle and sharp on both sides.

Figure 1: A DEM with hand-labeled features.
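As an illustration of the two threshold-based gradient filters (filters 1 and 4), the Python sketch below counts the cells whose slope is above or below a threshold. Several details are assumptions not given in the text: the gradient is estimated by forward differences, the cell size is taken to be 10 feet, and the function names are hypothetical. The thresholds 0.24 and 0.05 are simply 2.4 and 0.5 feet of elevation per 10 feet of translation expressed as slopes.

```python
def gradient_magnitudes(patch, cell_size=10.0):
    """Forward-difference slope magnitude for every cell that has a right and lower neighbor.

    cell_size (feet per cell) is an assumed value, not one stated in the text.
    """
    mags = []
    for r in range(len(patch) - 1):
        for c in range(len(patch[0]) - 1):
            dx = (patch[r][c + 1] - patch[r][c]) / cell_size
            dy = (patch[r + 1][c] - patch[r][c]) / cell_size
            mags.append((dx * dx + dy * dy) ** 0.5)
    return mags


def high_gradient_filter(patch, threshold=0.24, min_fraction=0.10):
    """Accept if at least min_fraction of the cells are steeper than threshold."""
    mags = gradient_magnitudes(patch)
    return sum(m > threshold for m in mags) >= min_fraction * len(mags)


def low_gradient_filter(patch, threshold=0.05, min_fraction=0.20):
    """Accept if at least min_fraction of the cells are flatter than threshold."""
    mags = gradient_magnitudes(patch)
    return sum(m < threshold for m in mags) >= min_fraction * len(mags)
```

A window containing a flat deck next to a sharp drop passes both filters, which is the combination a bridge is expected to produce.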

3 Results

We focused our testing on an area outside Durham, where Interstate 85 crosses Highway 70. The combination of multiple roadways and a winding stream produces numerous interesting features to analyze.

Figure 1 shows a DEM of the area with hand-labeled features. Notable elements in this image are:

• Feature 1 (seen in detail in Figure 2) is a drainage pipe under a roadway. While shaped very differently from a bridge, it serves the same hydrological purpose, allowing water to pass beneath it.

• Features 2, 3, 4, and 5 are, very distinctly, bridges. Some (particularly 2 and 5) appear to have been pre-cut. While it appears to be less pronounced than the others, Feature 4 has not been cut by the preprocessor, so it is of interest to us.

• Feature 6 is a set of small roadways, possibly with bridges.

• Feature 7 is an elevated interchange with a stream flowing underneath it. The exact pattern of flow is difficult to discern from lidar and satellite maps, but it is clear that part of this structure needs to be cut.

• Feature 8 is a roadway over an obvious depression. The sharpness of the cutoff between the road and the surrounding terrain indicates that the road is likely to be raised above the ground prominently here.

• Features 9 and 10 represent areas where a roadway seems to have been completely wiped out by the river, probably in the interpolation step. These are bridge-like features, but our algorithm should ignore them in the end.

Figure 2: A drainage pipe under a roadway, corresponding to Feature 1 from Figure 1. Picture from Google Maps.

Figure 3 shows the results of filtering our dataset and grouping the selected locations using the built-in visualization tools in the GRASS software package. The algorithm clearly labels the large uncut bridge-like features in the image, such as the interchange and the drainage pipe, and avoids several pre-cut bridges. It correctly identifies the uncut bridge labeled 4, above, as a noteworthy feature, and isolates several small bridges in the tangle of elements hand-labeled as Feature 6. Features 9 and 10, already deeply cut, are ignored. Overall, the computer-generated map seems to capture all of the relevant features except for a few ambiguous parts of area 6, while ignoring already-cut bridges and generating fairly little noise.

Performing the feature extraction on this relatively small (609180 cells) map took 6m 45s. We expect computational complexity to be linear with respect to the number of cells. This bears out in practice: it takes 25 minutes to process a 21117520-cell grid with our system.

Figure 3: A comparison of hand-labeled (dark) and machine-detected (light) features.

Figure 4: DEM of a road over a ravine in an urban area, demonstrating the shortcomings of the algorithm. Several bridge-like structures are correctly labeled, but there are also many false positives caused by trees in the ravine.

Figure 4 shows a less successful result. This time, the algorithm has also identified a series of noisy-looking areas along what looks like a stream as being bridges. Analysis of the image area reveals that the area isn’t conventionally filled with water, however: the grainy lumps that have been labeled as bridge-like objects that impede the flow of water are actually trees in a ravine. While the image of all those areas being selected is rather unsightly, most represent terrain artifacts that can easily be cut; those that don’t are actually major features. Figures 5 and 6 demonstrate similar results with larger datasets. There are quite a few false positives overall, but numerous bridges are correctly detected.

Figure 5: The bridge detection algorithm applied to a larger area. Note the tendency to over-select low areas when they are not uniform.

4 Conclusion and Future Work

Overall, our algorithm seems quite effective in detecting bridges and similar features that impede flow routing on high-resolution DEMs.

Experimenting on other datasets, however, revealed that the algorithm is fairly sensitive to input error. While noise mostly just impairs its ability to detect useful features, errors that produce regular patterns will often lead to numerous false positives. This problem occurs because our program is searching for regular, mostly-linear features, which can be introduced into the image as artifacts during the various stages of preprocessing if the raw data is sufficiently poor. It should be noted that, while such results include a lot of false positives, most of those are clustered around image artifacts, so cutting through such areas shouldn’t deform the actual map very much. Very few false positives generated by our algorithm are actually objects that would greatly affect the hydrological model if cut.

The computation time currently leaves something to be desired, however. While the integral image method speeds up the first few statistical filtering steps, we currently use naive techniques to find local extrema; these cost us a lot of time spent rescanning the same pixels as the window moves across the map. Finding the extreme value for a strip of data at a time and then simply taking the extrema of those could greatly speed up the execution of this stage. For the shape filters, a more computationally-efficient way to perform the required rotations would be ideal. Overall, the computational overhead of running the algorithm could probably be significantly reduced by running it as part of another window-sweeping algorithm and reimplementing it in C/C++.

Figure 6: The bridge detection algorithm applied to a large area with a very complex road network. While the system is incapable of puzzling out the interchange, it does identify a large number of bridge-like features (and a few false positives).

It may be possible to get improvements in accuracy by running this algorithm iteratively with a bridge splicer, recalculating the flood fill depth after the removal of the current target bridge. Such a system would require a lot of repetitive computation, however.

Since most of our false positives seem to come from artifacts in the DEM, we believe that simply cutting the regions identified by our algorithm, even if the detected feature is not a bridge, will improve flow routing.

References

G. Sithole and G. Vosselman. 2006. Bridge detection in airborne laser scanner data. ISPRS Journal of Photogrammetry and Remote Sensing, 61(1):33–46, October.

Paul Viola and Michael Jones. 2002. Robust real-time object detection. International Journal of Computer Vision - to appear.



Flow Routing on Flat Terrains

Taylor Hamilton
Department of Computer Science

[email protected]

Giovanna Thron
Department of Computer Science

[email protected]

Abstract

Most current flow routing algorithms use digital elevation models (DEMs) to construct flow models. In order to successfully use current techniques for flow routing, they flood local minima and then find a way of routing the flow across the flat surfaces. In this paper, we examine an alternative method for computing flow routing on these flooded surfaces that takes into account the original elevation data. Our approach is based upon Dijkstra’s single source shortest path algorithm. We set the distance between two adjacent cells to be the elevation of one of the cells. While our results are not yet ideal, altering our distance formula shows promise for improvement.

1 Introduction

A current problem in geographic information systems is the automatic extraction of river networks from a set of elevation data using the method of flow accumulation. Calculating river networks is useful in determining flood insurance zones.

The basic idea for solving this problem is relatively simple: for each elevation point, route flow to the neighbor with the steepest downhill slope. This method is effective as long as there are no local minima. Local minima will be pits or valleys in which the water will get trapped. Ideally, all water should flow to some outlet point at the edge of the grid.
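The steepest-descent rule can be sketched in a few lines. This is a minimal illustration, assuming the elevations are held in a list of rows and that slope is measured as drop per unit distance; the names `NEIGHBORS` and `steepest_descent` are ours.

```python
import math

# Row and column offsets to the eight neighbours of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def steepest_descent(elev, r, c):
    """Return the (row, col) of the steepest downhill neighbour, or None.

    None means the cell is a local minimum or lies on a flat surface,
    which is exactly the case that needs separate handling.
    """
    best, best_slope = None, 0.0
    for dr, dc in NEIGHBORS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(elev) and 0 <= nc < len(elev[0]):
            # Diagonal neighbours are sqrt(2) cell widths away.
            slope = (elev[r][c] - elev[nr][nc]) / math.hypot(dr, dc)
            if slope > best_slope:
                best, best_slope = (nr, nc), slope
    return best
```

A return value of `None` marks the cells where this simple rule breaks down.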

Current algorithms (Jenson and Domingue, 1988; Garbrecht and Martz, 1997; Soille et al., 2003) tend to deal with this problem by flooding the minima until they are all removed. This approach is justified by the assumption that minima are the accidental result of poor sampling in the original data. However, this is not always the case. Many minima are caused by large-scale terrain features, such as a bridge over a river. When these minima get flooded, useful information about the underlying river is lost, as can be seen in Figure 1.

These flooding algorithms create large flat surfaces, which cause a new problem in flow routing. Without a steepest downslope neighbor it is not immediately obvious in which direction the water should flow across the surface.

Several algorithms have been developed that attempt to solve the problem of flow routing over flat terrain. A shortcoming common to all of these algorithms is that they ignore the original terrain when routing flow across the flat areas. We have developed an algorithm to route flow across flat surfaces that takes into account the original terrain. This provides river networks that more accurately match reality.

2 Related Work

Current algorithms for solving this problem do not produce ideal results. Jenson and Domingue (1988) focused mainly on flooding, and their method for flow routing on flat surfaces was not very involved. For each point on a flat surface, they assigned the flow to be in the direction directly towards the outlet. This resulted in artificial looking river networks because of long stretches of parallel lines.

Garbrecht and Martz (1997) improved upon this algorithm by not only routing flow towards the outlet, but also away from the bordering high terrain. This provided more natural looking river networks. However, as these river networks do not take into account the underlying elevations, they do not always accurately model the true flow of water in the region.

Figure 1: Original and flooded elevation data

Soille et al. (2003) proposed the method of carving as opposed to flooding for removing minima, which reduces the number of flat areas. They also proposed a flat terrain flow routing algorithm that is an improvement on Garbrecht and Martz's method. Although Soille's algorithm is an improvement, it has the same fundamental issues as that of Garbrecht and Martz.

3 Methods

Our algorithm focuses on improving flow routing over flat terrains. To accomplish this, we use both the flooded terrain information and the original elevation data. The flooded terrain indicates the areas on which to concentrate, and the original terrain provides the elevation data needed for our method. We use Dijkstra's algorithm to calculate the shortest path to the spill points. We vary the metric for computing the distance between two adjacent cells.

We begin with digital elevation models in the GRASS ASCII format: one of the original terrain and one with the local minima flooded. We find the spill points, cells adjacent to the flooded terrain with lower elevations.

We used a single source shortest path algorithm to calculate flow directions for flat areas. This algorithm treats our grid as a connected graph where each cell is connected to its eight neighbors. The weights of the edges can be chosen independently of the algorithm, and we experiment with several options.

To compute the single source shortest path, we used Dijkstra's algorithm. The algorithm begins by initializing the path length from every cell to the spill point to infinity. We create a priority queue that contains the spill points, and set their path lengths to zero. We continue extracting the point from the priority queue with the minimum path length until the queue is empty. Each time we remove a point, we look at each of its neighbors and update their paths if the path through the current point is shorter than the stored path. We then add each updated neighbor to the priority queue.

When the algorithm is finished, the result is a forest that spans the area of interest, where each tree is rooted at a spill point. The leaves of the trees are the points farthest away from the spill points. The path from a node to the root of a tree is the shortest path to a spill point.
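The search described above can be sketched as a multi-source Dijkstra over the flat cells. This is an illustration under stated assumptions rather than our exact implementation: cells are (row, col) tuples, `weight(cell)` is a hypothetical callback giving the cost of flowing into `cell`, and the diagonal adjustment is interpreted here as scaling the step by the Euclidean length of the move.

```python
import heapq
import math

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def route_flow(flat_cells, spill_points, weight):
    """Return (dist, parent); parent[c] is the cell that c flows into."""
    dist = {cell: math.inf for cell in flat_cells}
    parent = {}
    heap = []
    for spill in spill_points:
        dist[spill] = 0.0
        heapq.heappush(heap, (0.0, spill))
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue  # stale queue entry; a shorter path was already found
        for dr, dc in NEIGHBORS:
            nb = (cell[0] + dr, cell[1] + dc)
            if nb not in dist:
                continue  # outside the flat area
            # Water at nb would flow into `cell`, so the step costs weight(cell).
            new_d = d + weight(cell) * math.hypot(dr, dc)
            if new_d < dist[nb]:
                dist[nb] = new_d
                parent[nb] = cell
                heapq.heappush(heap, (new_d, nb))
    return dist, parent
```

The `parent` dictionary encodes the forest: following it from any cell leads to the spill point at the root of that cell's tree.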

Figure 2: River networks and flow directions using the Euclidean distance metric

Figure 3: River networks and flow directions using Soille's algorithm

We can use these trees to calculate flow accumulation. We imagine that a unit of water falls onto each cell in the grid. Using the flow directions of each cell, we can determine the amount of water that accumulates in each cell. Let $p$ be any cell and $F(p)$ be the set of cells flowing into $p$.

$$\mathrm{acc}(p) = 1 + \sum_{q \in F(p)} \mathrm{acc}(q)$$

We calculate the flow accumulation of a node in the forest as the sum of the flows of each of its children plus one. A grid showing cells whose flow accumulations are greater than some threshold should show the locations of the rivers of the terrain.
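The accumulation formula can be evaluated by processing the forest from the leaves towards the roots. The sketch below assumes the forest is given as a dictionary mapping each cell to the cell it flows into; the function name and this representation are our own choices for illustration.

```python
from collections import defaultdict

def flow_accumulation(parent):
    """acc[p] = 1 + sum of acc[q] over all cells q that flow into p."""
    cells = set(parent) | set(parent.values())
    acc = {cell: 1 for cell in cells}  # one unit of water falls on each cell
    pending = defaultdict(int)         # children not yet folded into a cell
    for child, down in parent.items():
        pending[down] += 1
    ready = [cell for cell in cells if pending[cell] == 0]  # the leaves
    while ready:
        cell = ready.pop()
        down = parent.get(cell)
        if down is None:
            continue  # a root (spill point) has nowhere further to flow
        acc[down] += acc[cell]
        pending[down] -= 1
        if pending[down] == 0:
            ready.append(down)
    return acc
```

A river network is then the set of cells whose accumulation exceeds the chosen threshold, for example `{c for c, a in acc.items() if a > threshold}`.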

Our algorithm outputs GRASS ASCII files with the flow directions and the flow accumulations at each cell of the grid. We represent flow direction with numbers 1 through 8, corresponding to the eight possible directions of flow. We create river networks based on the set of points with flow accumulations over a given threshold.

3.1 Metrics

Below we present the weights used to determine the distances between cells for Dijkstra's algorithm. In all cases, we added an extra weight of $\sqrt{2}$ to diagonally adjacent cells to account for the difference in Euclidean distance.

3.1.1 Euclidean Distance

The simplest method we used was setting the distances between adjacent cells to 1. This resulted in the Single Source Shortest Path (SSSP) method, as used by Jenson and Domingue (1988). In effect this method just computes the shortest Euclidean distance from any point to the spill points, and routes flow over that path.

Figure 4: River networks and flow directions using the elevation distance metric

Figure 5: River networks and flow directions using the translated elevation distance metric

3.1.2 Soille's Algorithm

The flat terrain flow routing algorithm introduced by Soille et al. (2003) is designed to route flow through the center of the terrain and avoid having straight parallel lines. We calculate the distance $d(c)$ from each cell to the border of the flat terrain using a breadth-first search away from the border. Let $c$ be a cell and $C$ the set of all cells. The weight of $c$ is then

$$w(c) = \max\{d(f) \mid f \in C\} + 1 - d(c)$$
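The weight above can be computed with one breadth-first search. The sketch below makes two assumptions that the text does not fix: a border cell is any flat cell with at least one of its eight neighbours outside the flat area, and border cells are assigned $d = 0$. The function name is hypothetical.

```python
from collections import deque

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def soille_weights(flat_cells):
    """Return w(c) = max(d) + 1 - d(c) for every cell of the flat area."""
    flat = set(flat_cells)
    d = {}
    queue = deque()
    # Seed the search with the border of the flat area (assumed d = 0).
    for r, c in flat:
        if any((r + dr, c + dc) not in flat for dr, dc in NEIGHBORS):
            d[(r, c)] = 0
            queue.append((r, c))
    # Breadth-first search inwards, away from the border.
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nb = (r + dr, c + dc)
            if nb in flat and nb not in d:
                d[nb] = d[(r, c)] + 1
                queue.append(nb)
    d_max = max(d.values())
    return {cell: d_max + 1 - dist for cell, dist in d.items()}
```

Cells deep in the interior receive the smallest weights, which is what draws the shortest paths towards the center of the flat area.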

3.1.3 Elevation

Our first metric that uses the original elevation data sets the distance between any two adjacent cells to the elevation of the cell flowed to. Thus, the total distance from a cell to the spill point is the sum of the elevations of the cells traversed. This encourages the flow to travel down to lower elevations, as well as traversing a small number of cells between the source and the spill points.

3.1.4 Translated Elevation

To weight more heavily the importance of flowing across low elevations, as opposed to traversing short distances, we translate the elevations of all cells down by the minimum elevation over the relevant area. Thus, the weight of a cell is the difference of the elevation of the cell and the minimum elevation.

Figure 6: River networks and flow directions using the squared translated elevation distance metric

Figure 7: River networks and flow directions using the fourth power of the translated elevation distance metric

3.1.5 Power of Translated Elevation

Raising the translated elevation to a positive power puts a greater penalty on higher elevations. This further encourages flow to follow low elevations. A greater power will put more emphasis on traveling on low elevations.
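The three elevation-based metrics differ only in how a cell's weight is derived from its original elevation. A minimal sketch, assuming elevations are stored in a dictionary keyed by cell; `make_metric` and its `power` parameter are illustrative names, not part of our code.

```python
def make_metric(elev, cells, power=None):
    """Build a weight(cell) function for the elevation-based metrics.

    power=None  -> raw elevation (Section 3.1.3)
    power=1     -> translated elevation (Section 3.1.4)
    power=p > 1 -> translated elevation raised to the power p (Section 3.1.5)
    """
    if power is None:
        return lambda cell: elev[cell]
    # Translate by the minimum elevation over the relevant area.
    lowest = min(elev[c] for c in cells)
    return lambda cell: (elev[cell] - lowest) ** power
```

With elevations of 100, 102, and 104, the fourth-power metric yields weights of 0, 16, and 256, so a two-unit rise costs sixteen times more at the higher cell than at the lower one.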

4 Data

We used raster data of the North Carolina river basin at 10 foot resolution. We used both the original elevation data and elevation data of the terrain with the sinks flooded.

5 Results

For each metric, we show the river networks and flow directions. In computing the river networks, we used an accumulation threshold of 150 cells (150,000 ft²). In the flow directions figures, each color indicates a different flow direction.

The results of using the Euclidean distance metric are shown in Figure 2, Soille's algorithm in Figure 3, the elevation metric in Figure 4, the translated elevation metric in Figure 5, the squared translated elevation metric in Figure 6, and the fourth power of the translated elevation metric in Figure 7.


6 Discussion

As can be seen in Figures 2 through 7, the results improve with each alteration of our algorithm, eventually producing natural looking river networks that follow the elevations of the original terrain.

By comparison, the Euclidean metric (Figure 2) fails to produce natural looking or accurate rivers. The flow is routed in straight parallel lines and hugs the boundaries of the region, as the algorithm searches for the shortest Euclidean distance.

Soille's algorithm (Figure 3), on the other hand, produces more natural looking river networks. However, as this algorithm fails to take into account the original elevation data, the rivers do not follow the terrain features. As an example of this behavior, observe the oxbow near the center of the region. Rather than following the bend in the river, the river stays in the center of the region.

Each of Figures 5 through 7 shows an improvement on the previous river network. As you can see in Figure 7, our river network both looks natural and accurately models the terrain. Our river network tends to stay in the lighter yellow areas, which correspond to the lowest elevations in the region.

From the river networks in Figure 7 it can be seen that our algorithm tends to perform better on lower elevations than on higher ones. This is a result of translating the elevations by the minimum elevation of the region. While this translation succeeds in appropriately weighting elevation against Euclidean distance at lower elevations, the balance tips too heavily in favor of Euclidean distance at higher elevations.

7 Future Work

We would like to develop a distance metric that does not have the drawbacks at higher elevations that our current algorithm displays. To accomplish this, we have experimented with other distance metrics without success. We began by taking the exponential of the elevation, but discovered that this resulted in numbers that overflowed Python's float type. We would like to experiment with taking the differences of elevations of neighboring cells.

References

J. Garbrecht and L. Martz. 1997. The assignment of drainage direction over flat surfaces in raster digital elevation models. Journal of Hydrology, 193:204–213.

S. Jenson and J. Domingue. 1988. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing, 54:1593–1600.

P. Soille, J. Vogt, and R. Colombo. 2003. Carving and adaptive drainage enforcement of grid digital elevation models. Water Resources Research, 39:10–1–10–13.



Shapefile Overlay Using a Doubly-Connected Edge List

Phil Katz and Stephen St. Vincent
Swarthmore College

500 College Ave.
Swarthmore, PA 19081

[pkatz1, sstvinc2]@swarthmore.edu

Abstract

We present a method for finding the overlays of a set of polygons that uses the doubly-connected edge list structure. We first detect all intersections between polygons using a brute-force method. We then build the doubly-connected edge list, maintaining information about the original polygons, from which we can easily perform shapefile overlay operations: intersection, difference, and union.

Our algorithm runs in $O(n^2)$ time. Our doubly-connected edge list construction algorithm runs in $O(n \log n)$ time, with the bottleneck being the brute-force $O(n^2)$ line segment intersection. Once that list is built, any given overlay operation is $O(n)$.

1 Introduction

Natural disasters such as floods often occur swiftly and without warning. It is imperative, therefore, that political entities such as counties and states be adequately prepared to deal with such disasters. Often, this level of preparation varies directly with the amount of funding received, which is in and of itself a function of the perceived threat in that region. The extent to which a region is in danger of flooding is difficult to assess anecdotally. However, by combining geographical data such as watershed layouts with political boundary data, we can use shapefile¹ overlays to assign an unbiased value to the level of danger to any region. This information can then assist in appropriate resource allocation, as well as computation of flood insurance rates.

¹ A shapefile is a common file format for exchanging polygon data that does not maintain topology.

Figure 1: Examples of shapefile overlays. (a) The original polygons in set S. Here, we have two overlapping squares at different orientations. (b) The intersection of the two squares, represented by the green region. (c) The difference of the two polygons, shown by the blue and yellow regions. (d) The union of the two polygons. Note that the interior segments are now gone.

Given a set of polygons S, how can one efficiently determine the new set of polygons P that is defined by the overlay of the polygons in S? Polygons in P can include the intersection, union, or difference of members of S. Figure 1 shows examples of these types of overlays.

To solve the problem of shapefile overlay, we will use methods from (de Berg et al., 1998) to build a data structure called a doubly-connected edge list that will allow us to calculate overlays efficiently. The more general problem of shapefile overlay has special cases which would be unusual in settings such as the one described above, such as polygons with holes in them. Still, we consider these special cases to make our algorithm as general as possible.

In section 3, we present our methods for building the doubly-connected edge list and calculating shapefile overlays. In section 4, we discuss the runtime analysis of our algorithms. In section 5, we present results applying our algorithms to simple test-case polygons as well as to real geographical data. Finally, in section 6, we discuss the implications of our results.

2 Related Work

When building topological representations, it is usually the case that we wish to restrict the topologies to follow a specific set of constraints. (Hoel et al., 1994) describe such a system. Their topologies have certain consistency requirements, and as such must follow an explicit set of rules, such as the following:

• Interiors of polygons in a feature class must not overlap
• Polygons must not have voids within themselves
• Polygons of one feature class must share all of their area with polygons in another feature class

Shapefile overlay is necessary to enforce these rules on large sets of polygons. Without an effective shapefile overlay algorithm, the work of (Hoel et al., 1994) could not be implemented in an efficient, robust manner.

Figure 2: Doubly-connected edge list for two polygons. Half-edge e1 has a next pointer to e2 and a previous pointer to e3. The current face of all three of these is F1. The twin of e2 is e4, whose current face is F2. The twins of e1 and e3 have Null as their current face.

3 Methods

To find the overlays of multiple polygons, we must construct the doubly-connected edge list. Before we can do that, we need to know where the segments of each polygon intersect the segments of the other polygons in the set.

3.1 Line intersection calculations

This is by far the most straightforward step in calculating shapefile overlays. Because the runtime of brute-force algorithms is not significantly slower than more complex algorithms for the data sets we are considering (Andrews et al., 1994), it does not cost us significantly to implement a brute-force algorithm. For each segment of a polygon, we check explicitly whether it intersects any segment of any other polygon, and keep a list of all intersection points that occur on that segment. This allows us to easily create all of the subsegments for the doubly-connected edge list, as well as to keep track of which of the original polygons a segment was associated with.
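The brute-force check can be sketched as below. This is a minimal illustration rather than our implementation: polygons are assumed to be lists of (x, y) vertices, the function names are hypothetical, and parallel or collinear segments are simply reported as non-intersecting.

```python
def segment_intersection(p1, p2, q1, q2):
    """Intersection point of segments p1-p2 and q1-q2, or None."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx  # 2D cross product of the two directions
    if denom == 0:
        return None  # parallel or collinear: not handled in this sketch
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # position along p1-p2
    u = (qpx * ry - qpy * rx) / denom  # position along q1-q2
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

def edges(polygon):
    """Consecutive vertex pairs of a closed polygon."""
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]

def all_intersections(poly_a, poly_b):
    """Test every edge of poly_a against every edge of poly_b."""
    hits = []
    for i, (a1, a2) in enumerate(edges(poly_a)):
        for j, (b1, b2) in enumerate(edges(poly_b)):
            point = segment_intersection(a1, a2, b1, b2)
            if point is not None:
                hits.append((i, j, point))  # edge indices plus the point
    return hits
```

Recording the edge indices alongside each point is what later allows every original segment to be split into its subsegments.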

3.2 The doubly-connected edge list

A doubly-connected edge list (de Berg et al., 1998) stores all of the information regarding the set of polygons that is necessary to calculate the shapefile overlays. The basic setup of a doubly-connected edge list begins with the edges. The edges of each polygon are stored as directed half-edges that go around the polygon in clockwise order such that the face that is bound by the half-edges is always to the right of each half-edge (see figure 2).

Each half-edge stores the following information:

• Starting point
• Ending point
• The ID of the face from which the half-edge originated
• A pointer to the next half-edge on the current face
• A pointer to the previous half-edge on the current face
• A pointer to its twin half-edge

As we build the half-edges that are on the interiors of the original polygons, we can build their twin edges. Twin edges are the same as the original half-edge, but with its orientation reversed and its face set to null. So, for half-edge e, twin(twin(e)) = e.

For each new half-edge that we add, we update a dictionary, vDict, that contains all of the necessary edges. The vDict is a hash table keyed on vertices and contains a list of the half-edges that start at that vertex. This is critical for building the next- and previous-edge pointers.
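The record and dictionary described above might look as follows in Python. The field names and the `add_edge` helper are our own choices for illustration; only the stored information and the role of the vertex dictionary come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]

@dataclass(eq=False, repr=False)
class HalfEdge:
    start: Point
    end: Point
    face: Optional[int]  # ID of the originating face; None for a twin
    next: Optional["HalfEdge"] = None
    prev: Optional["HalfEdge"] = None
    twin: Optional["HalfEdge"] = None

def add_edge(v_dict, start, end, face):
    """Create a half-edge and its twin, and register both by start vertex."""
    edge = HalfEdge(start, end, face)
    twin = HalfEdge(end, start, None)  # reversed orientation, null face
    edge.twin, twin.twin = twin, edge
    v_dict[start].append(edge)
    v_dict[end].append(twin)
    return edge

# Usage: the dictionary is keyed on vertices, as vDict is.
v_dict = defaultdict(list)
e = add_edge(v_dict, (0.0, 0.0), (1.0, 0.0), face=1)
```

Looking up `v_dict[e.end]` then returns every half-edge that could follow `e`, which is the candidate set needed when setting next-pointers.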

3.2.1 Creating next- and previous-pointers

Determining the next and previous pointers for each half-edge is non-trivial. For the next pointer of each half-edge e, we must find the half-edges whose starting point is the same as the ending point of e. Then, we must determine which of these half-edges makes the largest clockwise angle with e, and set the next-pointer of e to that half-edge.

Now that the vDict has been updated for each half-edge, this task becomes much easier. Using figure 3 as an example, let us attempt to find the next-pointer of the half-edge AX. By treating each half-edge as a vector originating at X, we can find the angle between the half-edge AX and all the other half-edges by using the cross product and the dot product:

$$\sin(\theta) = \frac{\lVert AX \times BX \rVert}{\lVert AX \rVert \, \lVert BX \rVert}$$

$$\cos(\theta) = \frac{AX \cdot BX}{\lVert AX \rVert \, \lVert BX \rVert}$$

Figure 3: Finding the next-pointer. Here, we are trying to find the appropriate half-edge for the next-pointer of half-edge AX. θ measures the angles between AX and the other half-edges in the image. We choose the half-edge with the largest value of θ, which in this case is XB.
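The two quantities above can be combined into a single angle with `math.atan2`. The sketch below is an illustration under our own assumptions: it uses the signed two-dimensional cross product so that the result covers the full range from 0 to 2π, it measures the angle clockwise, and both function names are hypothetical.

```python
import math

def clockwise_angle(x, a, b):
    """Clockwise angle in [0, 2*pi) from vector x->a to vector x->b."""
    ax, ay = a[0] - x[0], a[1] - x[1]
    bx, by = b[0] - x[0], b[1] - x[1]
    cross = ax * by - ay * bx  # proportional to sin of the CCW angle
    dot = ax * bx + ay * by    # proportional to cos of the angle
    # atan2 gives the counter-clockwise angle; negate it and wrap.
    return (-math.atan2(cross, dot)) % (2 * math.pi)

def pick_next(a, x, candidates):
    """Among end points of half-edges leaving x, pick the largest angle."""
    return max(candidates, key=lambda b: clockwise_angle(x, a, b))
```

Because only the ratio of the cross and dot products matters to `atan2`, the vector lengths in the denominators never need to be computed.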

After computing cos(θ) and sin(θ), we can calculate the true angle θ (where 0 ≤ θ ≤ 2π) between AX and each of the other three half-edges in the figure. The candidate half-edge with the largest value of θ can now be set as the next-pointer. To complete the example, the next-pointer of AX would be XB, and the previous-pointer of XB can be set to AX. After walking completely around a face in this fashion, all of the next- and previous-pointers for the half-edges that bound that face will have been set, so we do not need to calculate the previous pointers explicitly.

3.3 Computing the overlays

Now that we have walked along the half-edges of every face, we have a list for each new face of all of the original faces from which it was derived. From here, we can easily specify an overlay by selecting the new faces that meet the criteria of the overlay in question.
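Selecting faces for a given overlay can be sketched as a simple filter. The representation is an assumption made for illustration: `face_sources` maps each new face to the set of original polygon IDs it was derived from, and "difference" is read as in Figure 1(c), where regions belonging to exactly one of the two polygons are kept.

```python
def select_faces(face_sources, operation, a, b):
    """Return the new faces that make up the requested overlay of a and b."""
    if operation == "intersection":
        keep = lambda src: a in src and b in src
    elif operation == "union":
        keep = lambda src: a in src or b in src
    elif operation == "difference":
        # Covered by exactly one of the two polygons, as in Figure 1(c).
        keep = lambda src: (a in src) != (b in src)
    else:
        raise ValueError("unknown overlay operation: " + operation)
    return [face for face, src in face_sources.items() if keep(src)]

# Usage: two overlapping polygons A and B produce three new faces.
faces = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}}
intersection = select_faces(faces, "intersection", "A", "B")
```

Each operation is a single pass over the new faces, so no geometry needs to be recomputed once the edge list exists.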


3.4 Non-intersecting overlays

Figure 4: Line-Sweep Algorithm. The dotted line is sweeping rightward. Right before reaching the current point, there should be four edges (A, B, C, E) in the data structure, and upon reaching this point, two (B, C) should be removed.

Not every overlay of two polygons will involve the intersection of their segments. Consider for instance the overlay of the Sahara Desert with Cumberland County, Pennsylvania. The intersection of these two polygons should be null, given that the two polygons are clearly not spatially coincident and have no intersections.

But we cannot simply say that a lack of segment intersections implies a lack of spatial coincidence. Consider now the overlay of Nebraska with the entire United States. While it is not apparent why one would choose to perform this overlay, it should be clear that this is an example of an overlay that has no segment intersections but does have some overlap.

We solve this problem by using a line-sweep algorithm (de Berg et al., 1998). Figure 4 shows two polygons, with one completely interior to the other. Once our line-intersection algorithm determines that there are no segment intersections between these two polygons, we can move into our line-sweep algorithm. We sort the vertices in order of their x-coordinate; we also store the face associated with the first vertex in the sorted list. We then step through the sorted vertices in order while maintaining a list of edges that currently intersect the sweep line. When a vertex is encountered that has an associated face that differs from the face of the first point, we can stop our line-sweep. If the new vertex is below an odd number of segments, then we know that the new face is completely interior to the other; otherwise, it must be completely exterior. This method is robust for concave polygons.
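The parity test at the end of the sweep can be illustrated on its own. The sketch below is not the line-sweep itself: it scans every edge of the other polygon directly instead of maintaining the sweep-line edge list, and the function name is ours. It assumes the two polygons have no segment intersections, as established beforehand.

```python
def is_interior(point, polygon):
    """True if `point` lies below an odd number of edges of `polygon`."""
    px, py = point
    above = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Half-open test: the edge crosses the vertical line x = px.
        # It also guarantees x1 != x2, so the division below is safe.
        if (x1 <= px) != (x2 <= px):
            y_at = y1 + (px - x1) * (y2 - y1) / (x2 - x1)
            if y_at > py:
                above += 1
    return above % 2 == 1
```

Because the count is taken over all edges crossing the vertical line through the point, the test gives the same answer for concave polygons as for convex ones.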

3.5 Polygons with holes

Figure 5: Polygons with holes. The blue polygon has a hole in its center, which is not filled in. Where the yellow polygon intersects with the hole, it remains unchanged.

Consider the nations of South Africa and Lesotho. Lesotho is an independent nation completely interior to the borders of South Africa. As such, South Africa may be treated as a polygon with a hole in it. If we worked for the South African government and needed to determine overlays, we would surely want to take Lesotho into account.

We can handle this by allowing each face to have a pointer to its inner edges. These inner edges will form a closed polygon. The outer half-edges of this interior polygon (which run counter-clockwise) have the same face as the original polygon, while the inner half-edges (which run clockwise) have their face set to null. From here, we can perform our normal shapefile overlay process, starting with segment intersection.

4 Runtime analysis

Our brute-force polygon intersection calculation is $O(n^2)$. For each line segment, we test all of the line segments in the other polygon for intersection. Given that we do not check segments in the same polygon, this upper bound of $O(n^2)$ can never be reached, but is still the appropriate theoretical upper bound. This assumes that the polygons are simple.

Our line-sweep algorithm is $O(n \log n)$. For each of the $n$ vertices, the operations we need to perform (search, insertion, and deletion) can be implemented (with a binary tree, for example) to require $O(\log n)$ operations to maintain the line-sweep data structure.

Once we have the doubly-connected edge list built, we will have $k$ edges, stemming from the $n$ original edges. To build the overlay faces, we need only go through each of the $k$ edges once, removing them from the list of all edges once we assign them to a face. So the overlay algorithm itself (assuming that the doubly-connected edge list has already been built) is $O(k)$. In the worst case, $k$ is $O(n^2)$, but in most real-world polygon intersections, $k$ will be $O(n)$.

5 Results

Figure 6: Example of polygon intersection. The blue area indicates areas that are only covered by the convex polygon (rotated square). The yellow areas were only covered by the concave polygon. The green area represents the intersection of the two polygons. The white circles show points of intersection between the segments of the two polygons.

Figure 6 is a sample run of our intersection algorithm on two hand-made polygons. The image was created using the Python graphics library from (Zelle, 2004). The green polygon in the center shows the intersection polygon. The blue and yellow polygons combine to define the difference of the polygons. The entire shaded area is the union of the two polygons. The small white circles show the points of intersection between the two polygons.

Figure 7: Interior polygons. This image shows the effects of scaling the concave polygon in figure 6 so that it fits completely inside the convex polygon. No area has maintained the yellow coloring from above.

Figure 7 shows the intersection of two polygons where one of the polygons is completely interior to the other. Despite the lack of intersection points, our algorithm has handled this case correctly.

Figure 8: Intersection of Cumberland County with watershed 2050305. The red region is the watershed only, the blue region is Cumberland County only, and the purple region is the intersection of the two.

Figure 8 shows the intersection of Cumberland County, Pennsylvania, with watershed 2050305 (USGS Hydrological Unit Code). Figure 9 shows Cumberland County intersected with watershed 2050306.


Polygons   Time (ms)   Points   Intersect time (ms)   DE list time (ms)   Overlay time (ms)
CC, WS5    8126        693      6132                  238                 1642
CC, WS6    7950        688      6000                  252                 1556

Table 1: Benchmarking on sample county and watershed data. The Cumberland County (CC) polygon had 284 points; the watershed 2050305 (WS5) polygon had 409 points; the watershed 2050306 (WS6) polygon had 404 points.

Figure 9: Intersection of Cumberland County with watershed 2050306. The yellow region is the watershed only, the blue regions are Cumberland County only, and the green region is the intersection of the two.

Table 1 shows benchmarking figures for these two runs. The tests were run on an Intel(R) Pentium(R) 4 running at 3.00 GHz with 156 MB of RAM.

6 Discussion

The goal of this project was to implement shapefile overlay in an efficient manner. In order to accomplish this goal, we implemented a doubly-connected edge list that allowed our algorithms to efficiently compute overlays. With the exception of our brute-force line segment intersection, all of our algorithms are relatively computationally inexpensive. Table 1 shows that the line segment intersection is the clear bottleneck. It is possible to implement faster line segment intersection algorithms, but that is not within the scope of this project.

Our algorithm works successfully for most complex cases, including concave polygons, polygons inside other polygons, polygons with holes, and up to three polygons, as in figure 10. Our algorithm cannot handle the case of intersecting two polygons that share an edge. This case is pathological for line intersection as well as for constructing a doubly-connected edge list.

Figure 10: Intersection of Three Polygons. This image shows that our algorithm can be extended to three polygons.

7 Conclusions

Shapefile overlay is a fundamental building block of computational geometry. It is imperative to the field that shapefile overlay can be computed quickly and correctly. Although we do not introduce any revolutionary methods to accomplish this task, we show that it can be performed with fairly straightforward algorithms on simple data structures.

References

D. S. Andrews, J. Snoeyink, J. Boritz, T. Chan, G. Denham, J. Harrison, and C. Zhu. 1994. Further comparison of algorithms for geometric intersection problems. In 6th International Symposium on Spatial Data Handling.

Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. 1998. Computational Geometry: Algorithms and Applications. Springer.

E. Hoel, S. Menon, and S. Morehouse. 1994. Building a robust relational implementation of topology. In 8th International Symposium on Spatial and Temporal Databases.

John Zelle. 2004. Python Programming: An Introduction to Computer Science. Franklin, Beedle, & Associates.
