TRANSCRIPT
Geospatial Information Integration
Craig A. Knoblock
University of Southern California

Introduction
Huge amount of geospatial data and related online sources now available
Geographic Information Systems (GIS) primarily support the overlay of different layers
Opportunity: Geospatial Information Integration
  Support the retrieval, fusion, reasoning and learning across the available sources

Geospatial Information Integration
Geospatial Data Retrieval
Geospatial Data Fusion
  Vector / Image Fusion
  Map / Image Fusion
Geospatial Reasoning
  Accurately geocoding addresses
  Identifying streets and buildings in satellite imagery
  Predicting the location of moving objects
Geospatial Learning
  Learning thematic maps
Geospatial Integration
Conclusions

Geospatial Data Retrieval
Structured data: databases
Semi-structured data: web pages
Spatial data:
  Vector data: points, lines, polygons, …
  Images: satellite and aerial imagery
  Maps
Text documents
Audio & video: TV & radio on the web
Heracles: Framework to integrate heterogeneous data

Geospatial Data Sources
Imagery
Maps
Vectors
Elevations
Points

Semi-structured Data Sources
Property tax sites
Online telephone books
Railroad schedules
…
<IRANIAN_RAILWAYS>
  <TRAIN>
    <ROW>
      <CITY>Tehran</CITY> <TIME>12:35</TIME>
    </ROW>
    …
    <ROW>
      <CITY>Esfahan</CITY> <TIME>19:45</TIME>
    </ROW>
  </TRAIN>
  <TRAIN>
    <ROW>
      <CITY>Tehran</CITY> <TIME>14:00</TIME>
    </ROW>
    …
  </TRAIN>
</IRANIAN_RAILWAYS>
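As a minimal sketch of how a wrapper's XML output might be consumed downstream, the snippet below parses a schedule in the format shown above with Python's standard `xml.etree.ElementTree`. The sample string contains only the rows visible on the slide (the elided rows are left out), and the helper names `to_minutes` and `parse_schedule` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Only the rows visible on the slide; the elided rows are not reconstructed.
SAMPLE = """
<IRANIAN_RAILWAYS>
  <TRAIN>
    <ROW><CITY>Tehran</CITY> <TIME>12:35</TIME></ROW>
    <ROW><CITY>Esfahan</CITY> <TIME>19:45</TIME></ROW>
  </TRAIN>
  <TRAIN>
    <ROW><CITY>Tehran</CITY> <TIME>14:00</TIME></ROW>
  </TRAIN>
</IRANIAN_RAILWAYS>
"""

def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def parse_schedule(xml_text):
    """Return one list of (city, minutes) stops per <TRAIN> element."""
    root = ET.fromstring(xml_text.strip())
    trains = []
    for train in root.findall("TRAIN"):
        stops = [(row.findtext("CITY"), to_minutes(row.findtext("TIME")))
                 for row in train.findall("ROW")]
        trains.append(stops)
    return trains

schedule = parse_schedule(SAMPLE)
for stops in schedule:
    print(stops)
```

Storing times as minutes after midnight is a design choice of this sketch; it keeps later interval arithmetic simple.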

Accurately Geocoding Addresses
[Figure labels: Los Angeles County Assessor’s Site (Property Tax Records); Street Vector Data (Corrected Tiger Line Files); Constraint Satisfaction; Satellite Image (Terraserver); Geocoded Houses]

Initial Hypothesis
  610 Palm or 645 Sierra
  645 Sierra or 639 Sierra
  639 Sierra or 633 Sierra
  633 Sierra or 629 Sierra
  629 Sierra or 623 Sierra
  604 or 642
  604 or 610
  642 Penn or 636 Penn
  636 Penn or 630 Penn
  630 Penn or 628 Penn
  628 Penn or 624 Penn
  624 Penn or 618 Penn

Result After Constraint Satisfaction
  604
  610
  645 Sierra
  639 Sierra
  633 Sierra
  629 Sierra
  623 Sierra
  642, 644, 646 Penn
  636, 638, 640 Penn
  630, 632, 634 Penn
  628 Penn
  624 Penn

Address         Latitude    Longitude
642 Penn St     33.923413   -118.409809
640 Penn St     33.923412   -118.409809
636 Penn St     33.923412   -118.409809
604 Palm Ave    33.923414   -118.409809
610 Palm Ave    33.923414   -118.409810
645 Sierra St   33.923413   -118.409810
639 Sierra St   33.923412   -118.409810

Data Extracted from On-line Site
Address         # units   Area (sq ft)   Lot size
642 Penn St     3         1793           135.72 * 53.33
604 Palm Ave    1         884            69 * 42
610 Palm Ave    1         756            66 * 42
645 Sierra St   1         1337           120 * 62
639 Sierra St   1         1408           121 * 53.5

Information in Street Sources

Address-range Method of Geocoding
Sierra St
  From: A (33.923413, -118.408709)
  To: B (33.924813, -118.408809)
  Addresses on the Left: 601-699
  Addresses on the Right: 600-698
645: Left Side
  22nd out of the 50 addresses on the left side
Interpolate the address on the street
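The interpolation step can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: it assumes a straight segment from A to B, linear interpolation on the house number within the matching odd or even range, and a hypothetical function name `address_range_geocode`. The coordinates and ranges are the ones on the slide.

```python
def address_range_geocode(number, start, end, left_range, right_range):
    """Interpolate (lat, lon) for a house number along a straight street segment."""
    # Pick the side whose range has the same parity as the house number.
    lo, hi = left_range if number % 2 == left_range[0] % 2 else right_range
    if not lo <= number <= hi:
        raise ValueError("house number outside the segment's address range")
    # Fraction of the way from the start of the segment to its end.
    fraction = 0.5 if hi == lo else (number - lo) / (hi - lo)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon

A = (33.923413, -118.408709)   # segment start, from the slide
B = (33.924813, -118.408809)   # segment end, from the slide
lat, lon = address_range_geocode(645, A, B, (601, 699), (600, 698))
print(round(lat, 6), round(lon, 6))
```

Here 645 lies (645 - 601) / 2 = 22 address slots past the start of the left-side range, so it is placed 44/98 of the way from A to B regardless of where the house actually stands. That is the source of the error the later methods try to reduce.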

[Figure: Address-range (traditional) method]

Uniform lot-size method
Get the information of the street segment from the street data source
Query the property tax source to get the number of parcels on a street segment
Assume all lots are equal in dimension
Approximate the location of the address based on the number of lots
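A minimal sketch of the approximation step, under stated assumptions: the address is placed at the center of its lot, lots are counted from the start of the segment with a zero-based index, and corner lots are ignored. The function name `uniform_lot_geocode` and the lot count of 10 are hypothetical; only the segment endpoints come from the earlier slide.

```python
def uniform_lot_geocode(lot_index, num_lots, start, end):
    """Place an address at the center of its lot, assuming equal lot widths."""
    if not 0 <= lot_index < num_lots:
        raise ValueError("lot index must be in [0, num_lots)")
    # Center of lot i out of n equal lots lies (i + 0.5) / n along the segment.
    fraction = (lot_index + 0.5) / num_lots
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon

A = (33.923413, -118.408709)
B = (33.924813, -118.408809)
# Hypothetical: the property tax source reports 10 parcels on this side,
# and the address is the fifth one (index 4) from A.
lat, lon = uniform_lot_geocode(4, 10, A, B)
print(round(lat, 6), round(lon, 6))
```

Unlike the address-range method, the position depends on how many parcels actually exist rather than on how many house numbers the range could hold.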

[Figure: Uniform lot-size method]

Corner lot problem
Number of dimensions on the street = number of lots on the street + corner lot

Actual Lot-Size Method
Get the coordinates of the block from the street data source
Query the property source and get the dimensions of every lot on the block
Compute the dimensions of the 16 possible orientations
Compare these with the true dimensions
The layout that most closely matches (least error) is chosen as the layout
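The selection step can be sketched as a least-error search over candidate layouts. The true dimensions and the four candidate tuples below are taken from the layout figure in this talk; the assumptions are that each tuple can be compared position by position with the true dimensions and that the error is the sum of absolute differences. The slides do not state the error measure, so treat it as a placeholder.

```python
TRUE_DIMS = (257, 257, 480, 480)

# Four of the 16 candidate layouts shown in the figure (a subset, for brevity).
CANDIDATES = [
    (136, 240, 482, 575),
    (256, 240, 420, 575),
    (256, 256, 482, 482),
    (136, 375, 482, 482),
]

def layout_error(candidate, true_dims):
    """Sum of absolute differences (an assumed error measure)."""
    return sum(abs(c - t) for c, t in zip(candidate, true_dims))

def best_layout(candidates, true_dims):
    """Return the candidate whose computed dimensions have the least error."""
    return min(candidates, key=lambda c: layout_error(c, true_dims))

best = best_layout(CANDIDATES, TRUE_DIMS)
print(best, layout_error(best, TRUE_DIMS))
```

With these values the search picks (256, 256, 482, 482), which is within a few units of the true block dimensions on every side.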

Finding the optimal layout
Calculate the actual length and breadth (width) of the block using the information in the street data source [length, width]
Truedim: 257, 257, 480, 480

[Figure: the 16 candidate layouts, each shown with its computed dimensions]
Layout   Computed dimensions
1        136   240   482   575
2        256   240   420   575
3        204   240   482   533
4        324   240   420   533
5        136   120   542   575
6        256   120   482   575
7        204   120   542   533
8        324   120   482   533
9        136   256   542   482
10       256   256   482   482
11       204   256   542   440
12       324   256   482   440
13       136   375   482   482
14       256   375   420   482
15       204   375   482   440
16       324   375   420   440

[Figure: Actual lot-size method]

Results
Choosing a region
  El Segundo
Data Source
  Conflated TIGER/Lines
Fetch Agent Platform to convert website data into XML
Prometheus 2.0 information mediator
Geocoded 267 addresses spanning 13 blocks
Actual lot-size method could not be applied to 3 blocks (58 addresses)
None of the methods could be applied to one address
Results based on the remaining 208 addresses

[Figure: Chosen area for geocoding]

Comparison of Results (all errors are in meters)
                     Address-range   Uniform lot-size   Actual lot-size
Average Error        36.85359        7.87149            1.62993
Standard Deviation   20.49335        9.92361            1.46958
Minimum Error        0.86578         0.07086            0.03487
Maximum Error        73.80526        56.64072           7.80242

Average percentage of improvement over traditional approach
  Uniform lot-size method: 78.65%
  Actual lot-size method: 95.59%

Normal Distribution of the error
[Figure: probability versus error in meters for the three methods]
  Address-range method: µ = 36.85, σ = 20.49
  Uniform lot-size method: µ = 7.87, σ = 9.92
  Actual lot-size method: µ = 1.63, σ = 1.47

Combining Online Schedules with Vectors and Points [Shahabi et al., 2001]
How do we efficiently determine which trains will pass a given point or region?
  Railroad vectors specify all possible paths of the trains
  Stations show the locations of the stops
  Schedules provide the detailed timetable and stops
[Figure labels: Stations, Railroads, Schedules]

Integrating Schedules with Vector Data
Approach:
  Create a wrapper for the online schedule and download it to a database
  Match the names of the stations in the online schedule with the names of the stations in the gazetteer
    Exploits work we have done on record linkage across sources
  Align the points in the gazetteer with the vector data of the railroads
  Find the shortest paths between the stations
  Compute the trains that will pass a given region within some time interval
    Determines how much real paths can deviate from the shortest distance between two points to compute this efficiently
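The last two steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of Shahabi et al.: the rail network is a small hand-made graph with hypothetical distances, shortest paths are found with Dijkstra's algorithm, and the train is assumed to travel at constant speed between two scheduled stops. The function names are invented for the example; the 12:35 and 19:45 times come from the schedule shown earlier.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (list of nodes, total length)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        if u == target:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        raise ValueError("no path between the stations")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

def passing_time(graph, path, depart, arrive, node):
    """Minutes after midnight at which the train passes `node`,
    assuming constant speed between the two scheduled stops."""
    total = sum(graph[a][b] for a, b in zip(path, path[1:]))
    idx = path.index(node)
    covered = sum(graph[a][b] for a, b in zip(path[:idx + 1], path[1:idx + 1]))
    return depart + (arrive - depart) * covered / total

# Hypothetical rail graph; edge weights are made-up distances.
RAIL = {
    "Tehran": {"Qom": 150, "Kashan": 300},
    "Qom": {"Tehran": 150, "Kashan": 100},
    "Kashan": {"Qom": 100, "Tehran": 300, "Esfahan": 200},
    "Esfahan": {"Kashan": 200},
}

path, length = shortest_path(RAIL, "Tehran", "Esfahan")
# Departs Tehran 12:35 (755 min), arrives Esfahan 19:45 (1185 min).
t_kashan = passing_time(RAIL, path, 755, 1185, "Kashan")
print(path, length, round(t_kashan, 1))
```

A region query would then check whether the estimated passing times of the path vertices inside the region overlap the requested time interval.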

[Figure: Integrating Schedules with Vectors]

Spatio-Temporal Integration
Moving objects
System efficiently computes trains going through user-selected area and time interval
[Figure labels: Current Time; Time train enters area; Time train exits area]

Learning Thematic Maps
[Figure: The California county map]

Application
There is a map that partitions the 2-d space into disjoint regions
Each region is assigned a label
  e.g., zip code areas: a = 90007, b = 90006, …
The label of each point is equal to the label of its surrounding region
[Figure: Original Map]

Application
Problem: No access to the original map
Given: a set of data points and their corresponding labels
Goal: find the best approximation to the original map
[Figure: Approximate Map and Original Map]

Classification Methods: Nearest Neighbor
For each point X in the training set, find a polygon including all points in the 2-d space which are closer to X than any other point in the set (Voronoi Diagram)
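A minimal sketch of nearest-neighbor labeling: a query point receives the label of the closest training point, which is the same as asking which Voronoi cell it falls in. The coordinates are hypothetical; the labels 90007 and 90006 are the zip codes used as examples earlier, and the third label is made up.

```python
def nearest_neighbor_label(query, training):
    """Return the label of the training point closest to `query`."""
    def sq_dist(p):
        return (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
    _, label = min(training, key=lambda item: sq_dist(item[0]))
    return label

# Hypothetical labeled points (x, y) -> zip code label.
TRAINING = [
    ((0.0, 0.0), "90007"),
    ((4.0, 0.0), "90006"),
    ((0.0, 4.0), "90089"),  # made-up third region
]

for q in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.5)]:
    print(q, nearest_neighbor_label(q, TRAINING))
```

The approximate map is never built explicitly here; evaluating the function over a grid of query points would trace out the Voronoi cells.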

Classification Methods: Support Vector Machine
SVM is a classifier derived from statistical learning theory by Vapnik
Many decision boundaries can separate classes
  Which one should we choose?
The decision boundary should be as far away from the data of both classes as possible
  We should maximize the margin, m
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]
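To make the margin idea concrete, the sketch below scores a few hand-picked linear boundaries w·x + b = 0 by their margin (the smallest signed distance from any labeled point to the boundary) and keeps the largest. This is only an illustration of the selection criterion: an actual SVM solves an optimization problem over all boundaries rather than choosing from a candidate list, and the points and candidates here are hypothetical.

```python
import math

def margin(w, b, points):
    """Smallest signed distance from a labeled point to the line w.x + b = 0.
    Negative if some point lies on the wrong side."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in points)

# Hypothetical 2-d data: label -1 for Class 1, +1 for Class 2.
POINTS = [
    ((1.0, 1.0), -1), ((2.0, 1.0), -1),
    ((4.0, 4.0), +1), ((5.0, 4.0), +1),
]

# Three candidate boundaries, each given as (w, b).
CANDIDATES = [
    ((1.0, 0.0), -3.0),   # vertical line x = 3
    ((0.0, 1.0), -2.5),   # horizontal line y = 2.5
    ((1.0, 1.0), -5.5),   # diagonal line x + y = 5.5
]

best = max(CANDIDATES, key=lambda wb: margin(wb[0], wb[1], POINTS))
print(best, round(margin(best[0], best[1], POINTS), 4))
```

All three candidates separate the classes, but the diagonal one keeps the nearest points farthest away, which is the property the slide asks for.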

Experiments: Performance Measures
Area-based measures:
$\mathrm{Precision} = \frac{\mathrm{area}(\mathrm{Retrieved} \cap \mathrm{Relevant})}{\mathrm{area}(\mathrm{Retrieved})}$
$\mathrm{Recall} = \frac{\mathrm{area}(\mathrm{Retrieved} \cap \mathrm{Relevant})}{\mathrm{area}(\mathrm{Relevant})}$
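As a rough sketch of how the area-based measures could be computed, the example below rasterizes both maps onto the same grid and counts equal-sized cells as a stand-in for area. The grid approximation, the tiny example maps, and the function name `area_precision_recall` are assumptions for illustration. For a given label, Relevant is the region carrying that label in the original map and Retrieved is the region carrying it in the approximate map.

```python
def area_precision_recall(original, approximate, label):
    """Area-based precision and recall for one label, counting grid cells."""
    relevant = {(r, c) for r, row in enumerate(original)
                for c, v in enumerate(row) if v == label}
    retrieved = {(r, c) for r, row in enumerate(approximate)
                 for c, v in enumerate(row) if v == label}
    overlap = len(retrieved & relevant)
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical 3x3 rasterized maps with region labels 'a' and 'b'.
ORIGINAL = ["aab",
            "aab",
            "bbb"]
APPROXIMATE = ["aaa",
               "abb",
               "bbb"]

precision, recall = area_precision_recall(ORIGINAL, APPROXIMATE, "a")
print(precision, recall)
```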

Experiments: Results
• The zip map with USGS data (area-based precision)
• The performance is degrading
• No more than 85% precision
• RBF SVM outperforms

Experiments: Results
• The zip map with USGS data (area-based recall)
• Recall is getting better
• Up to 87% recall
• RBF SVM still outperforms

Building a Geospatial Mediator (PrometheusGeo)
Develop a general integration framework that can represent and integrate:
  Imagery, maps, vector, elevation, and other geospatial types
  Databases and text sources with a geospatial extent
Exploit the various fusion and extraction operations described earlier
OpenGIS
  GML (Geography Markup Language)
    XML encoding for the transport and storage of geographic information
  WMS (Web Map Service)
    Supports the creation and display of maps of information that come from multiple sources that are both remote and heterogeneous
  WFS (Web Feature Service)
    Supports the creation and display of features from various information sources

Example
JPL Map Server: Capabilities File (GetCapabilities)
Mediator Model: Maps
  JPL MapServer: Type = ‘Global_mosaic’, Lat in (-90, 90), Long in (-180, 180)
  JPL MapServer: Type = ‘US_Landsat’, Lat in (23, 50), Long in (-127, -66)
  …

Example
Mediator Model: Maps
  JPL MapServer: Type = ‘Global_mosaic’, Lat in (-90, 90), Long in (-180, 180)
  JPL MapServer: Type = ‘US_Landsat’, Lat in (23, 50), Long in (-127, -66)
  …
Mediator: Find all available maps for Belgrade (44.82, 21.41)
  Obtain relevant maps by querying the model
  Send GetMap requests to the JPL Map Server
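The lookup in this example can be sketched as a coverage test against the mediator model, followed by construction of a WMS GetMap request. The two model entries and the Belgrade coordinates come from the slide. Everything else is an assumption for illustration: the dictionary layout, the function names, the use of each Type value as a WMS layer name, the placeholder endpoint `http://example.org/wms`, and the bounding-box size. The query parameters are the standard WMS 1.1.1 GetMap parameters.

```python
from urllib.parse import urlencode

# Mediator model entries from the slide, in an assumed dictionary layout.
MODEL = [
    {"server": "JPL MapServer", "type": "Global_mosaic",
     "lat": (-90, 90), "lon": (-180, 180)},
    {"server": "JPL MapServer", "type": "US_Landsat",
     "lat": (23, 50), "lon": (-127, -66)},
]

def relevant_maps(model, lat, lon):
    """Sources whose declared coverage contains the query point."""
    return [s for s in model
            if s["lat"][0] <= lat <= s["lat"][1]
            and s["lon"][0] <= lon <= s["lon"][1]]

def getmap_url(base_url, layer, lat, lon, half_size=0.5, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL for a box centered on (lat, lon)."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "SRS": "EPSG:4326",
        # BBOX order in WMS 1.1.1 is minx,miny,maxx,maxy (lon/lat here).
        "BBOX": f"{lon - half_size},{lat - half_size},"
                f"{lon + half_size},{lat + half_size}",
        "WIDTH": width, "HEIGHT": height, "FORMAT": "image/jpeg",
    }
    return base_url + "?" + urlencode(params)

matches = relevant_maps(MODEL, 44.82, 21.41)
urls = [getmap_url("http://example.org/wms", s["type"], 44.82, 21.41)
        for s in matches]
print([s["type"] for s in matches])
```

For the Belgrade query only the Global_mosaic entry qualifies, since longitude 21.41 falls outside the US_Landsat range of (-127, -66).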

Representation and Query Processing in PrometheusGeo
Focus on querying and integrating geospatial layers
GAV/LAV representation of sources is supported
Builds on the Prometheus query representation and query processing
Adds an explicit representation of quality, where all sources have an associated quality across multiple dimensions (accuracy, resolution, time, etc.)
Users can specify their own quality metric

Mediation in MIX [Gupta et al.]
Mediator defined by building a structured representation of both GIS and image sources
Global integration model
  Each term in the global model is described as a view on the sources
Mediator relations defined by:
  Containment conditions
  Spatial or temporal joins
  Logical associations
Queries and results in XML

Example
Produce a table of aerial imagery and photographs of houses broken down by 5-year increments and Total Assessed Value

[Figure: Result]

Query Processing in VirGIS [Essid et al., 2004]
Focus on the mediation of geospatial vector data
Also builds on the OpenGIS standards: GML, WFS, etc.
Uses a LAV model with the restriction that all mappings have to be one-to-one
Does support detailed querying of the vector layers

Example
Extract all 2-lane roads that cross bridges and have a length of up to 1000 meters
  For $x in document(bridge), $y in document(road)
  where cross($x/geometry, $y/geometry) = true
    and $y/length <= 1000 and $y/lanes = 2
  return $y
Distributes the query across the different vector sources
Combines the results and returns them in XML

Conclusions
Goal is to move beyond current GIS systems that provide very limited integration capabilities
Exploit all related sources of information to enable the integration of and reasoning about geographic data
Working towards a comprehensive integration framework for geospatial information
  Support the rapid retrieval, fusion, and reasoning of geospatial data to provide new insights