Publishing and Interlinking Linked Geospatial Data
Tutorial in Conjunction with the
12th Extended Semantic Web Conference
http://event.cwi.nl/eswc2015-geo/
Tutorial organization
9:00-9:15 Introduction
9:15-10:30 Background in geospatial data modeling, representing geospatial information in the Semantic Web, and querying linked geospatial data
10:30-11:00 Coffee break
11:00-12:00 Publishing geospatial information as RDF graphs
12:00-12:30 Discovering Spatial and Temporal Links among RDF graphs
12:30-14:00 Lunch break
14:00-14:30 Discovering Spatial and Temporal Links among RDF graphs
14:30-15:30 Hands-on session: Publishing geospatial information as RDF graphs
15:30-16:00 Coffee break
16:00-17:00 Hands-on session: Discovering Spatial and Temporal Links among RDF graphs
17:00-17:10 Conclusions
Part 1: Background in geospatial data modeling
ESWC 2015 Tutorial
Publishing and Interlinking Linked Geospatial Data
Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens
Outline
• Basic GIS concepts and terminology
• Representing geometries
• Representing topological information
• Geospatial data standards
Basic GIS Concepts and Terminology
• Theme: the information corresponding to a particular domain
that we want to model. A theme is a set of geographic
features.
• Example: the countries of Europe
Basic GIS Concepts (cont’d)
• Geographic feature or geographic object: a domain entity
that can have various attributes that describe spatial and non-
spatial characteristics.
• Example: the country Greece with attributes
• Population
• Flag
• Capital
• Geographical area
• Coastline
• Bordering countries
Basic GIS Concepts (cont’d)
• Geographic features can be atomic or complex.
• Example: According to the Kallikratis administrative reform of
2010, Greece consists of:
• 13 regions (e.g., Crete)
• Each region consists of regional units (e.g., Heraklion)
• Each regional unit consists of municipalities (e.g.,
Dimos Chersonisou)
• …
Basic GIS Concepts (cont’d)
• The spatial characteristics of a feature can involve:
• Geometric information (location in the underlying
geographic space, shape etc.)
• Topological information (containment, adjacency etc.).
Municipalities of the regional unit of Heraklion:
1. Dimos Irakliou
2. Dimos Archanon-Asterousion
3. Dimos Viannou
4. Dimos Gortynas
5. Dimos Maleviziou
6. Dimos Minoa Pediadas
7. Dimos Festou
8. Dimos Chersonisou
Geometric Information
• Geometric information can be captured by using geometric primitives
(points, lines, polygons, etc.) to approximate the spatial attributes of
the real world feature that we want to model.
• Geometries are associated with a coordinate reference system which
describes the coordinate space in which the geometry is defined.
Encoding Geometries: Vector Representation
• In this encoding objects in space are represented using points as
primitives as follows:
• A point is represented by a tuple of coordinates.
• A line segment is represented by a pair with its beginning
and ending point.
• More complex objects such as arbitrary lines, curves,
surfaces etc. are built recursively by the basic primitives
using constructs such as lists, sets etc.
• This is the approach used in all GIS and other popular
systems today. It has also been standardized by various
international bodies.
Example
[(1,2) (2,2) (5,3) (3,1) (2,1) (1,2)]
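The vector encoding of the example polygon can be sketched in a few lines of Python. This is only an illustration: the helper names `is_closed` and `signed_area` are made up here, and the shoelace formula is one common way to work with such a point list, not something prescribed by the slides.

```python
# Vector representation: a polygon boundary as a list of (x, y) tuples.
ring = [(1, 2), (2, 2), (5, 3), (3, 1), (2, 1), (1, 2)]


def is_closed(points):
    """A ring is closed when its first and last points coincide."""
    return len(points) >= 4 and points[0] == points[-1]


def signed_area(points):
    """Shoelace formula; the sign is positive for counter-clockwise rings."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


print(is_closed(ring))          # True
print(abs(signed_area(ring)))   # 3.0
```

The negative sign of `signed_area(ring)` indicates that the vertices of this example are listed clockwise.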
Encoding Geometries: Constraint Representation
• In this case objects in space are represented by quantifier free
formulas in a constraint language (e.g., linear constraints).
[Formula not recoverable from the transcript: a quantifier-free formula built from linear constraints over x and y.]
Constraint Databases
• The constraint representation of spatial data was the focus of
much work in databases, logic programming and AI after the
paper by Kanellakis, Kuper and Revesz (PODS, 1991).
• The approach was very fruitful theoretically but was not adopted
in practice.
Topological Information
• Topological information is inherently qualitative and it is
expressed in terms of topological relations (e.g., containment,
adjacency, overlap etc.).
• Topological information can be derived from geometric
information or it might be captured by asserting explicitly the
topological relations between features.
Topological Relations
• The study of topological relations has produced
a lot of interesting results by researchers in:
• GIS
• Spatial databases
• Artificial Intelligence (qualitative reasoning
and knowledge representation)
DE-9IM
• The dimensionally extended 9-intersection model
(DE-9IM) of Clementini and Di Felice.
• It is based on the point-set topology of R².
• It deals with simple, closed and connected
geometries (areas, lines, points).
• It is an extension of earlier approaches: the 4-
intersection (4IM) and 9-intersection (9IM)
models by Egenhofer and colleagues.
Topological Relations in DE-9IM
• It captures topological relationships between two
geometries a and b in R² by considering the
dimensions of the intersections of the
boundaries, interiors and exteriors of the two
geometries:
• The dimension can be 2, 1, 0 and -1 (dimension of
the empty set).
Example
I(C) B(C) E(C)
I(A) -1 -1 2
B(A) -1 -1 1
E(A) 2 1 2
[Figure: two regions A and C.]
Topological Relations in DE-9IM
• The following five named relationships between two different
geometries can be distinguished: disjoint, touches, crosses,
within and overlaps.
• The named relationships have a reasonably intuitive meaning
for users. They are jointly exhaustive and pairwise disjoint
(JEPD).
• The model can also be defined using an appropriate calculus of
geometries that uses these 5 binary relations and boundary
operators.
Example: A disjoint C
I(C) B(C) E(C)
I(A) F F *
B(A) F F *
E(A) * * *
[Figure: two regions A and C.]
Notation:
• T = { 0, 1, 2 }
• F = -1
• * = don’t care = { -1, 0, 1, 2 }
Example: A within C
I(C) B(C) E(C)
I(A) T * F
B(A) * * F
E(A) * * *
[Figure: region A drawn inside region C.]
Notation equivalent to 3x3 matrix:
• String of 9 characters representing the above matrix in row major order.
• In this case: T*F**F***
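As a rough illustration of how such 9-character patterns are used, the sketch below flattens a 3x3 matrix of dimensions into a DE-9IM string and checks it against a pattern. The function names are hypothetical. The pattern alphabet is T, F and * as defined in the notation above, plus the literal dimensions 0, 1 and 2.

```python
def dims_to_string(matrix):
    """Flatten a 3x3 matrix of dimensions (-1, 0, 1, 2) in row-major order.

    The empty intersection (-1) is written as 'F'.
    """
    return "".join("F" if d == -1 else str(d) for row in matrix for d in row)


def matches(de9im, pattern):
    """Check a 9-character DE-9IM string against a 9-character pattern."""
    if len(de9im) != 9 or len(pattern) != 9:
        raise ValueError("both arguments must have exactly 9 characters")
    for value, expected in zip(de9im, pattern):
        if expected == "*":
            continue                      # don't care
        if expected == "T":
            if value == "F":              # T requires a non-empty intersection
                return False
        elif value != expected:           # F, 0, 1, 2 must match exactly
            return False
    return True


# Matrix of the earlier example with rows I(A), B(A), E(A)
# and columns I(C), B(C), E(C).
a_vs_c = dims_to_string([[-1, -1, 2], [-1, -1, 1], [2, 1, 2]])

print(a_vs_c)                         # FF2FF1212
print(matches(a_vs_c, "FF*FF****"))   # True: A disjoint C
print(matches(a_vs_c, "T*F**F***"))   # False: A is not within C
```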
DE-9IM Relation Definitions
The Region Connection Calculus (RCC)
• The primitives of the calculus are spatial regions. These are
non-empty, regular closed subsets of a topological space.
• The calculus is based on a single binary predicate C that
formalizes the “connectedness” relation.
• C(a,b) is true when the closure of a is connected to the
closure of b i.e., they have at least one point in common.
• It is axiomatized using first order logic.
RCC-8
• This is a set of eight JEPD binary relations that can
be defined in terms of predicate C.
RCC-5
• The RCC-5 subset has also been studied. The
granularity here is coarser. The boundary of a region is
not taken into consideration:
• No distinction between DC and EC; both are called DR.
• No distinction between TPP and NTPP; both are called PP.
• RCC-8 and RCC-5 relations can also be defined
using point-set topology, and there are very close
connections to the models of Egenhofer and others.
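The coarsening from RCC-8 to RCC-5 can be written down as a simple lookup table. The sketch below uses the relation names customary in the RCC literature (DC, EC, PO, TPP, NTPP, TPPi, NTPPi, EQ). The inverse relation PPi is not named on the slide and is included here as an assumption, by symmetry with PP.

```python
# RCC-8 relation -> RCC-5 relation (boundaries are ignored in RCC-5).
RCC8_TO_RCC5 = {
    "DC": "DR",      # disconnected
    "EC": "DR",      # externally connected
    "PO": "PO",      # partially overlapping
    "TPP": "PP",     # tangential proper part
    "NTPP": "PP",    # non-tangential proper part
    "TPPi": "PPi",   # inverse of TPP
    "NTPPi": "PPi",  # inverse of NTPP
    "EQ": "EQ",      # equal
}


def coarsen(rcc8_relation):
    """Map an RCC-8 relation name to the RCC-5 relation that subsumes it."""
    return RCC8_TO_RCC5[rcc8_relation]


print(coarsen("EC"))                        # DR
print(sorted(set(RCC8_TO_RCC5.values())))   # the five RCC-5 relations
```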
More Qualitative Spatial Relations
• Orientation/Cardinal directions (left of, right of,
north of, south of, northeast of etc.)
• Distance (close to, far from etc.). This information
can also be quantitative.
Coordinate Systems
• Coordinate: one of n scalar values that determines the position
of a point in an n-dimensional space.
• Coordinate system: a set of mathematical rules for specifying
how coordinates are to be assigned to points.
• Example: the Cartesian coordinate system
Coordinate Reference Systems
• Coordinate reference system: a coordinate system
that is related to an object (e.g., the Earth, a planar
projection of the Earth, a three dimensional
mathematical space such as R³) through a datum
which specifies its origin, scale, and orientation.
• The term spatial reference system is also used.
Geographic Coordinate Reference Systems
• These are 3-dimensional coordinate systems that use latitude
(φ), longitude (λ), and optionally geodetic height (i.e.,
elevation) to capture geographic locations on Earth.
The World Geodetic System
• The World Geodetic System (WGS) is the most well-known
geographic coordinate reference system and its latest revision is
WGS84.
• Applications: cartography, geodesy, navigation (GPS), etc.
Projected Coordinate Reference Systems
• Projected coordinate reference systems: they transform the
3-dimensional approximation of the Earth into a 2-dimensional
surface (distortions!)
• Example: the Universal Transverse Mercator (UTM) system
Coordinate Reference Systems (cont’d)
• There are well-known ways to translate between coordinate
reference systems.
• See the list of coordinate reference systems of the
European Petroleum Survey Group: http://www.epsg-registry.org/
Geospatial Data Standards
• The Open Geospatial Consortium (OGC) and the
International Organization for Standardization (ISO) have
developed many geospatial data standards that are in wide use
today. In this tutorial we will cover:
• Well-Known Text
• Geography Markup Language
• OpenGIS Simple Features Access
Well-Known Text (WKT)
• WKT is an OGC and ISO standard for representing geometries,
coordinate reference systems, and transformations between
coordinate reference systems.
• WKT is specified in OpenGIS Simple Feature Access - Part 1:
Common Architecture standard which is the same as the ISO 19125-1
standard. Download from
http://portal.opengeospatial.org/files/?artifact_id=25355 .
• This standard concentrates on simple features: features with all
spatial attributes described piecewise by a straight line or a
planar interpolation between sets of points.
WKT Class Hierarchy
Example
WKT representation:
GeometryCollection(
Point(5 35),
LineString(3 10,5 25,15 35,20 37,30 40),
Polygon((5 5,28 7,44 14,47 35,40 40,20 30,5 5),
(28 29,14.5 11,26.5 12,37.5 20,28 29))
)
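To show how little machinery the WKT text form needs, here is a minimal sketch that reads the coordinates of the LineString from the example above. The function `parse_linestring` is a hypothetical helper that handles only two-dimensional LineStrings. A real application would normally rely on a full WKT parser from a GIS library.

```python
import re


def parse_linestring(wkt):
    """Parse a 2D WKT LineString into a list of (x, y) float tuples."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a WKT LineString: %r" % wkt)
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()               # coordinates are separated by spaces
        points.append((float(x), float(y)))
    return points


line = parse_linestring("LineString(3 10,5 25,15 35,20 37,30 40)")
print(len(line), line[0], line[-1])   # 5 (3.0, 10.0) (30.0, 40.0)
```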
Geography Markup Language (GML)
• GML is an XML-based encoding standard for the
representation of geospatial data.
• GML provides XML schemas for defining a variety of concepts:
geographic features, geometry, coordinate reference
systems, topology, time and units of measurement.
• GML profiles are subsets of GML that target particular
applications.
• Examples: Point Profile, GML Simple Features Profile etc.
GML Simple Features: Class Hierarchy
Example
GML representation:
<gml:Polygon gml:id="p3" srsName="urn:ogc:def:crs:EPSG:6.6:4326">
<gml:exterior>
<gml:LinearRing>
<gml:coordinates>
5,5 28,7 44,14 47,35 40,40 20,30 5,5
</gml:coordinates>
</gml:LinearRing>
</gml:exterior>
</gml:Polygon>
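The GML fragment above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. It assumes the `gml` prefix is bound to the namespace `http://www.opengis.net/gml`, a declaration that the slide omits, and the helper name `exterior_ring` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the gml prefix is bound to this namespace URI.
GML_NS = {"gml": "http://www.opengis.net/gml"}

doc = """<gml:Polygon xmlns:gml="http://www.opengis.net/gml" gml:id="p3"
    srsName="urn:ogc:def:crs:EPSG:6.6:4326">
  <gml:exterior>
    <gml:LinearRing>
      <gml:coordinates>5,5 28,7 44,14 47,35 40,40 20,30 5,5</gml:coordinates>
    </gml:LinearRing>
  </gml:exterior>
</gml:Polygon>"""


def exterior_ring(gml_text):
    """Return the exterior ring of a gml:Polygon as a list of (x, y) tuples."""
    root = ET.fromstring(gml_text)
    node = root.find("gml:exterior/gml:LinearRing/gml:coordinates", GML_NS)
    # Tuples are separated by whitespace, coordinates within a tuple by commas.
    return [tuple(float(v) for v in pair.split(",")) for pair in node.text.split()]


srs = ET.fromstring(doc).get("srsName")
ring_points = exterior_ring(doc)
print(srs)
print(ring_points[0], len(ring_points))
```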
OpenGIS Simple Features Access
• OGC has also specified a standard for the storage, retrieval,
query and update of sets of simple features using
relational DBMS and SQL.
• This standard is “OpenGIS Simple Feature Access - Part 2: SQL
Option” and it is the same as the ISO 19125-2 standard. Download from
http://portal.opengeospatial.org/files/?artifact_id=25354.
• Related standard: ISO 13249 SQL/MM - Part 3.
OpenGIS Simple Features Access (cont’d)
• The standard covers two implementation options: (i) using only
the SQL predefined data types and (ii) using SQL with
geometry types.
• SQL with geometry types:
• We use the WKT geometry class hierarchy presented earlier
to define new geometric data types for SQL
• We define new SQL functions on those types.
SQL with Geometry Types -Functions
• Functions that request or check properties of a geometry:
• ST_Dimension(A:Geometry):Integer
• ST_GeometryType(A:Geometry):Character Varying
• ST_AsText(A:Geometry): Character Large Object
• ST_AsBinary(A:Geometry): Binary Large Object
• ST_SRID(A:Geometry): Integer
• ST_IsEmpty(A:Geometry): Boolean
• ST_IsSimple(A:Geometry): Boolean
SQL with Geometry Types –Functions (cont’d)
• Functions that test topological relations between two geometries
using the DE-9IM:
• ST_Equals(A:Geometry, B:Geometry):Boolean
• ST_Disjoint(A:Geometry, B:Geometry):Boolean
• ST_Intersects(A:Geometry, B:Geometry):Boolean
• ST_Touches(A:Geometry, B:Geometry):Boolean
• ST_Crosses(A:Geometry, B:Geometry):Boolean
• ST_Within(A:Geometry, B:Geometry):Boolean
• ST_Contains(A:Geometry, B:Geometry):Boolean
• ST_Overlaps(A:Geometry, B:Geometry):Boolean
• ST_Relate(A:Geometry, B:Geometry, Matrix: Char(9)):Boolean
DE-9IM Relation Definitions
• A equals B can also be defined by the pattern TFFFTFFFT.
• A intersects B is the negation of A disjoint B
• A contains B is equivalent to B within A
SQL with Geometry Types –Functions (cont’d)
• Functions for constructing new geometries out of existing
ones:
• ST_Boundary(A:Geometry):Geometry
• ST_Envelope(A:Geometry):Geometry
• ST_Intersection(A:Geometry, B:Geometry):Geometry
• ST_Union(A:Geometry, B:Geometry):Geometry
• ST_Difference(A:Geometry, B:Geometry):Geometry
• ST_SymDifference(A:Geometry, B:Geometry):Geometry
• ST_Buffer(A:Geometry, distance:Double):Geometry
Geospatial Relational DBMS
• The OpenGIS Simple Features Access standard is used today
in all relational DBMS with a geospatial extension.
• The abstract data type mechanism of the DBMS allows
the representation of all kinds of geospatial data types
supported by the standard.
• The query language (SQL) offers the functions of the
standard for querying data of these types.
Readings
• The book Geographic Information Systems and Science is a nice introduction to GIS. See: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-EHEP001475.html
• The following papers present the DE-9IM model:
E. Clementini, P. Di Felice and P. van Oosterom. A Small Set of Formal Topological Relationships Suitable for End-User Interaction. SSD 1993: 277-295. http://link.springer.com/chapter/10.1007%2F3-540-56869-7_16
E. Clementini and P. Di Felice. A Comparison of Methods for Representing Topological Relationships. Information Sciences 80 (1994), pp. 1-34. http://www.sciencedirect.com/science/article/pii/106901159400033X
• The paper below surveys a lot of interesting results on the RCC calculus:
J. Renz and B. Nebel. Qualitative Spatial Reasoning using Constraint Calculi. In: M. Aiello, I. Pratt-Hartmann and J. van Benthem (eds.), Handbook of Spatial Logics, pp. 161-215, Springer, 2007. http://users.cecs.anu.edu.au/~jrenz/papers/renz-nebel-los.pdf
• The two OGC standards mentioned in the slides.
Part 2: Spatial and Temporal Data in RDF: stRDF/stSPARQL and GeoSPARQL
Common Approach
• The two proposals (stRDF/stSPARQL and GeoSPARQL) offer constructs for:
o Developing ontologies for spatial and temporal data.
o Encoding spatial and temporal data that use these ontologies in RDF.
o Extending SPARQL to query spatial and temporal data.
Two Proposals
• stRDF/stSPARQL
• GeoSPARQL
The data model stRDF
An extension of RDF for the representation of geospatial information that changes over time.
Geospatial dimension:
Spatial data types are introduced.
Geospatial information is represented using spatial literals of these datatypes.
OGC standards WKT and GML are used for the serialization of spatial literals.
Temporal dimension (later)
Proposed independently and around the same time as GeoSPARQL (starting with an ESWC 2010 paper by Koubarakis and Kyzirakos).
[Kyzirakos, Karpathiotakis & Koubarakis 2012]
Spatial Datatypes
strdf:geometry rdf:type rdfs:Datatype;
rdfs:subClassOf rdfs:Literal.
strdf:WKT rdf:type rdfs:Datatype;
rdfs:subClassOf strdf:geometry.
strdf:GML rdf:type rdfs:Datatype;
rdfs:subClassOf strdf:geometry.
Example Ontology: Administrative Geography of Greece
[Figure: ontology diagram with labels: Geometry property, strdf:geometry, strdf:GML, strdf:WKT.]
Example Data in stRDF
gag:Olympia
gag:name "Ancient Olympia";
rdf:type gag:MunicipalCommunity .
gag:Olympia gag:hasGeometry
"POLYGON((21.5 18.5, 23.5 18.5, 23.5 21, 21.5 21, 21.5 18.5));<http://www.opengis.net/def/crs/EPSG/0/4326>"^^strdf:WKT .
Callouts: gag:hasGeometry is the geometry property; the quoted string is the spatial literal; the URI after the semicolon is the coordinate reference system; strdf:WKT is the spatial data type.
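The lexical form of an strdf:WKT literal, as shown in this example, is a WKT string optionally followed by a semicolon and a CRS URI in angle brackets. A minimal sketch of splitting the two parts follows. The function name `split_strdf_wkt` is hypothetical, and the sketch only covers the form that appears on the slide.

```python
def split_strdf_wkt(lexical):
    """Split 'WKT;<crs-uri>' into (wkt, crs_uri).

    crs_uri is None when the literal carries no CRS part.
    """
    wkt, separator, crs = lexical.partition(";")
    crs = crs.strip()
    if separator and crs.startswith("<") and crs.endswith(">"):
        return wkt.strip(), crs[1:-1]
    return lexical.strip(), None


literal = ("POLYGON((21.5 18.5, 23.5 18.5, 23.5 21, 21.5 21, 21.5 18.5));"
           "<http://www.opengis.net/def/crs/EPSG/0/4326>")
wkt, crs = split_strdf_wkt(literal)
print(wkt)
print(crs)
```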
Example (cont’d)
gag:Olympia
rdf:type gag:MunicipalCommunity;
gag:name "Ancient Olympia";
gag:population "184"^^xsd:int;
gag:hasGeometry "POLYGON
(((25.37 35.34,…)))"^^strdf:WKT.
gag:OlympiaMUnit
rdf:type gag:MunicipalityUnit;
gag:name "Municipality Unit of Ancient Olympia".
gag:OlympiaMunicipality
rdf:type gag:Municipality;
gag:name "Municipality of Ancient Olympia".
gag:Olympia gag:belongsTo gag:OlympiaMUnit .
gag:OlympiaMUnit gag:belongsTo gag:OlympiaMunicipality.
More Examples
Corine Land Use/Land Cover (http://www.eea.europa.eu/publications/COR0-landcover )
Burnt Area Products (project TELEIOS,
http://www.earthobservatory.eu/ )
Corine Land Use/Land Cover
Corine Land Use/Land Cover in stRDF(http://www.linkedopendata.gr )
clc:Area_24015134
rdf:type clc:Area ;
clc:hasCode "312"^^xsd:decimal;
clc:hasID "EU-203497"^^xsd:string;
clc:hasArea_ha "255.5807904"^^xsd:double;
clc:hasGeometry "POLYGON((15.53 62.54,
…))"^^strdf:WKT;
clc:hasLandUse clc:ConiferousForest .
Geometry Property
Burnt Area Products (http://www.earthobservatory.eu/ontologies/noaOntology.owl)
Burnt Area Products
noa:ba_15
rdf:type noa:BurntArea;
noa:isProducedByProcessingChain
"static thresholds"^^xsd:string;
noa:hasAcquisitionTime
"2010-08-24T13:00:00"^^xsd:dateTime;
noa:hasGeometry "MULTIPOLYGON(((
393801.42 4198827.92, ..., 393008 424131)));
<http://www.opengis.net/def/crs/EPSG/0/2100>"^^strdf:WKT.
Geometry Property
stSPARQL: Geospatial SPARQL 1.1
We define a SPARQL extension function for each function defined in the OpenGIS Simple Features Access standard.
Basic functions
Get a property of a geometry
xsd:int strdf:dimension(strdf:geometry A)
xsd:string strdf:geometryType(strdf:geometry A)
xsd:int strdf:srid(strdf:geometry A)
Get the desired representation of a geometry
xsd:string strdf:asText(strdf:geometry A)
xsd:string strdf:asGML(strdf:geometry A)
Test whether a certain condition holds
xsd:boolean strdf:isEmpty(strdf:geometry A)
xsd:boolean strdf:isSimple(strdf:geometry A)
stSPARQL: Geospatial SPARQL 1.1
Functions for testing topological spatial relationships
OGC Simple Features Access
xsd:boolean strdf:equals(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:disjoint(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:intersects(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:touches(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:crosses(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:within(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:contains(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:overlaps(strdf:geometry A, strdf:geometry B)
xsd:boolean strdf:relate(strdf:geometry A, strdf:geometry B,
xsd:string intersectionPatternMatrix)
Egenhofer
RCC-8
stSPARQL: Geospatial SPARQL 1.1
Spatial analysis functions
Construct new geometric objects from existing geometric objects
strdf:geometry strdf:boundary(strdf:geometry A)
strdf:geometry strdf:envelope(strdf:geometry A)
strdf:geometry strdf:convexHull(strdf:geometry A)
strdf:geometry strdf:intersection(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:union(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:difference(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:symDifference(strdf:geometry A, strdf:geometry B)
strdf:geometry strdf:buffer(strdf:geometry A, xsd:double distance, xsd:anyURI units)
Spatial metric functions
xsd:float strdf:distance(strdf:geometry A, strdf:geometry B, xsd:anyURI units)
xsd:float strdf:area(strdf:geometry A)
Spatial aggregate functions
strdf:geometry strdf:union(set of strdf:geometry A)
strdf:geometry strdf:intersection(set of strdf:geometry A)
strdf:geometry strdf:extent(set of strdf:geometry A)
stSPARQL: Geospatial SPARQL 1.1
Select clause
Construction of new geometries (e.g., strdf:buffer(?geo, 0.1, uom:metre))
Spatial aggregate functions (e.g., strdf:union(?geo))
Metric functions (e.g., strdf:area(?geo))
Filter clause
Functions for testing topological spatial relationships between spatial terms (e.g.,
strdf:contains(?G1, strdf:union(?G2, ?G3)))
Numeric expressions involving spatial metric functions
(e.g., strdf:area(?G1) ≤ 2*strdf:area(?G2)+1)
Boolean combinations
Having clause
Boolean expressions involving spatial aggregate functions and spatial metric
functions or functions testing for topological relationships between spatial terms
(e.g., strdf:area(strdf:union(?geo))>1)
stSPARQL: An example (1/3)
SELECT ?name
WHERE {
?comm rdf:type gag:LocalCommunity;
gag:name ?name;
gag:hasGeometry ?commGeo .
?ba rdf:type noa:BurntArea;
noa:hasGeometry ?baGeo .
FILTER(strdf:overlaps(?commGeo,?baGeo))
}
Return the names of local communities that have been affected by fires.
Callout: strdf:overlaps is a spatial function.
stSPARQL: An example (2/3)
SELECT ?ba ?baGeom
WHERE {
?r rdf:type clc:Region;
clc:hasGeometry ?rGeom;
clc:hasCorineLandUse ?f.
?f rdfs:subClassOf clc:Forest.
?c rdf:type gag:LocalCommunity;
gag:hasGeometry ?cGeom.
?ba rdf:type noa:BurntArea;
noa:hasGeometry ?baGeom.
FILTER( strdf:intersects(?rGeom,?baGeom) &&
strdf:distance(?baGeom,?cGeom,uom:metre) < 200)}
Spatial Functions
Find all burnt forests near local communities
stSPARQL: An example (3/3)
SELECT ?burntArea
(strdf:intersection(?baGeom,
strdf:union(?fGeom))
AS ?burntForest)
WHERE {
?burntArea rdf:type noa:BurntArea;
noa:hasGeometry ?baGeom.
?forest rdf:type clc:Region;
clc:hasLandCover clc:ConiferousForest;
clc:hasGeometry ?fGeom.
FILTER(strdf:intersects(?baGeom,?fGeom))
}
GROUP BY ?burntArea ?baGeom
Compute the parts of burnt areas that lie in coniferous forests.
Callouts: strdf:intersection is a spatial function; strdf:union is a spatial aggregate.
Time dimensions in Linked Data
User-defined time: A time value (literal) with no special semantics.
Valid time: The time when a fact (represented by a triple) is true in the modeled reality.
Transaction time: The time when the triple is current in the database.
The time dimension of stRDF: The valid time of triples
The following extensions are introduced in stRDF:
• Timeline: the (discrete) value space of the datatype xsd:dateTime of XML Schema.
• Two kinds of time primitives are supported: time instants and time periods.
• A time instant is an element of the timeline.
• A time period is an expression of the form [B, E) or [B, E] or (B, E] or (B, E), where B and E are time instants called the beginning and ending time of the period.
• The new datatype strdf:period is introduced.
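The four bracket forms of a time period can be made concrete with a small sketch. It assumes timestamps written in a form that Python's `datetime.fromisoformat` accepts (for instance with a full `+02:00` offset) and treats an ending time of `"UC"` as an open-ended period. The names `parse_period` and `contains` are hypothetical and do not correspond to stSPARQL functions.

```python
from datetime import datetime


def parse_period(text):
    """Parse '[B, E)', '[B, E]', '(B, E]' or '(B, E)'.

    Returns (left_closed, begin, end, right_closed); end is None for "UC".
    """
    text = text.strip()
    if text[0] not in "[(" or text[-1] not in "])":
        raise ValueError("not a period: %r" % text)
    begin_s, end_s = (part.strip().strip('"') for part in text[1:-1].split(","))
    begin = datetime.fromisoformat(begin_s)
    end = None if end_s == "UC" else datetime.fromisoformat(end_s)
    return text[0] == "[", begin, end, text[-1] == "]"


def contains(period, instant):
    """Test whether a time instant lies inside a period."""
    left_closed, begin, end, right_closed = period
    after_begin = instant >= begin if left_closed else instant > begin
    if end is None:                       # "until changed": no upper bound
        return after_begin
    before_end = instant <= end if right_closed else instant < end
    return after_begin and before_end


forest = parse_period("[2006-08-25T11:00:00+02:00, 2007-08-25T11:00:00+02:00)")
burnt = parse_period('[2007-08-25T11:00:00+02:00, "UC")')
fire_time = datetime.fromisoformat("2007-08-25T11:00:00+02:00")

print(contains(forest, fire_time))   # False: the right end is open
print(contains(burnt, fire_time))    # True: the left end is closed
```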
[Figure: datatype hierarchy with rdfs:Literal at the top and strdf:geometry (subtypes strdf:WKT and strdf:GML) and strdf:period below it.]
The time dimension of stRDF (cont’d)
• Triples are extended to quads.
• A temporal triple (quad) is an expression of the form s p o t.
where s p o. is an RDF triple and t is a time instant or time
period called the valid time of the triple.
• The temporal constants NOW and UC (“until changed”) are
introduced.
An example with valid time
[Figure sequence: a region labeled Forest; then Forest and Burnt area; then Forest, Burnt area and Agricultural area.]
Forest:
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02, "UC")"^^strdf:period .
Forest, Burnt area:
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02, "UC")"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02, "UC")"^^strdf:period .
Forest, Burnt area (the valid time of the forest triple is now closed):
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02, "UC")"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02,2007-08-25T11:00:00+02)"^^strdf:period .
Forest, Burnt area, Agricultural area:
clc:region1 clc:hasLandCover clc:AgriculturalArea
"[2009-08-25T11:00:00+02, "UC")"^^strdf:period .
noa:ba1 rdf:type noa:BurntArea
"[2007-08-25T11:00:00+02,2009-08-25T11:00:00+02)"^^strdf:period .
clc:region1 clc:hasLandCover clc:Forest
"[2006-08-25T11:00:00+02,2007-08-25T11:00:00+02)"^^strdf:period .
The time dimension of stSPARQL
The following extensions are introduced:
• Triple patterns are extended to quad patterns (the last component is a temporal
term: variable or constant)
• Temporal extension functions are introduced:
• Allen's temporal relations (e.g., strdf:after)
• Period constructors (e.g., strdf:period_intersect)
• Temporal aggregates (e.g., strdf:maximalPeriod)
Example Query
• Find the current land cover of all areas in the dataset
SELECT ?clc
WHERE {
?R rdf:type clc:Region .
?R clc:hasLandCover ?clc ?t1 .
FILTER(strdf:during ("NOW", ?t1))
}
Callouts: "?R clc:hasLandCover ?clc ?t1" is a quad pattern; strdf:during is a temporal extension function; "NOW" is a temporal constant.
Two Proposals
• stRDF/stSPARQL
• GeoSPARQL
GeoSPARQL
GeoSPARQL is an OGC standard.
Functionalities similar to stRDF/stSPARQL:
Geometries are represented using literals of spatial datatypes.
Literals are serialized using WKT and GML.
The same families of functions are offered for querying geometries.
Functionalities beyond stSPARQL:
High-level ontologies inspired by GIS terminology.
Topological relations can now be asserted as well so that reasoning and querying on them is possible.
A query rewriting mechanism.
Functionalities of stSPARQL that are not included in GeoSPARQL:
• Geospatial aggregate functions
• Temporal dimension
GeoSPARQL Components
• Core
• Topology Vocabulary Extension (parameter: relation family)
• Geometry Extension (parameters: serialization, version)
• Geometry Topology Extension (parameters: serialization, version, relation family)
• Query Rewrite Extension (parameters: serialization, version, relation family)
• RDFS Entailment Extension (parameters: serialization, version, relation family)
Parameters
• Serialization
• WKT
• GML
• Relation Family
• Simple Features
• RCC-8
• Egenhofer
GeoSPARQL Core
Defines two top-level classes, geo:SpatialObject and geo:Feature, that can be used to organize geospatial data.
GeoSPARQL Geometry Extension
Provides vocabulary for asserting and querying data about the geometric attributes of a feature.
Example Ontology: Greek Administrative Geography
Example Data
gag:Olympia
rdf:type gag:MunicipalCommunity;
gag:name "Ancient Olympia";
gag:population "184"^^xsd:int;
geo:hasGeometry ex:polygon1.
ex:polygon1
rdf:type geo:Geometry;
geo:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POLYGON((21.5 18.5,23.5 18.5,23.5 21,21.5 21,21.5 18.5))"^^geo:wktLiteral.
Callouts: geo:hasGeometry and geo:asWKT are properties from the Geometry extension; geo:Geometry is a class from the Geometry extension; the quoted string is a geometry literal; geo:wktLiteral is a datatype from the Geometry extension.
Non-Topological Query Functions of the Geometry Extension
The following non-topological query functions are also offered:
geof:distance
geof:buffer
geof:convexHull
geof:intersection
geof:union
geof:difference
geof:symDifference
geof:envelope
geof:boundary
GeoSPARQL Topology Vocabulary Extension
The extension is parameterized by the family of topological relations supported.
Topological relations for simple features
The Egenhofer relations e.g., geo:ehMeet
The RCC-8 relations e.g., geo:rcc8ec
Greek Administrative Geography
gag:Olympia
rdf:type gag:MunicipalCommunity;
gag:name "Ancient Olympia".
gag:OlympiaMUnit
rdf:type gag:MunicipalityUnit;
gag:name "Municipality Unit of Ancient Olympia".
gag:OlympiaMunicipality
rdf:type gag:Municipality;
gag:name "Municipality of Ancient Olympia".
gag:Olympia geo:sfWithin gag:OlympiaMUnit .
gag:OlympiaMUnit geo:sfWithin gag:OlympiaMunicipality.
Callout: geo:sfWithin is a Simple Features topological relation.
GeoSPARQL: An example
SELECT ?m
WHERE {
?m rdf:type gag:MunicipalityUnit.
?m geo:sfContains gag:Olympia.
}
Find the municipality unit that contains the community of Ancient Olympia
Simple Features topological relation
Answer: ?m = gag:OlympiaMUnit
GeoSPARQL: An example
SELECT ?m
WHERE {
?m rdf:type gag:Municipality.
?m geo:sfContains gag:Olympia.
}
Find the municipality that contains the community of Ancient Olympia
Answer?
Example (cont’d)
The answer to the previous query is
?m = gag:OlympiaMunicipality
GeoSPARQL does not tell you how to compute this answer which needs reasoning about the transitivity of relation geo:sfContains.
Options:
• Use rules
• Use constraint-based techniques
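To make the rule-based option concrete, the sketch below derives the missing fact with two hand-written rules: transitivity of sfWithin, and sfContains as the inverse of sfWithin. It is a naive fixpoint computation over pairs of identifiers and only an illustration. It is not how GeoSPARQL or any particular system implements such reasoning, and the function name `transitive_closure` is made up.

```python
def transitive_closure(pairs):
    """Naive fixpoint: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Asserted geo:sfWithin triples from the example data.
within = {
    ("gag:Olympia", "gag:OlympiaMUnit"),
    ("gag:OlympiaMUnit", "gag:OlympiaMunicipality"),
}

# Rule 1: sfWithin is transitive. Rule 2: sfContains is its inverse.
within_closed = transitive_closure(within)
contains = {(b, a) for a, b in within_closed}

print(("gag:OlympiaMunicipality", "gag:Olympia") in contains)   # True
```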
The Geometry Topology Extension
• Offers vocabulary for querying topological properties of geometry literals.
• Simple Features
• geof:relate
• geof:sfEquals
• geof:sfDisjoint
• geof:sfIntersects
• geof:sfTouches
• geof:sfCrosses
• geof:sfWithin
• geof:sfContains
• geof:sfOverlaps
• Egenhofer (e.g., geof:ehDisjoint)
• RCC-8 (e.g., geof:rcc8dc)
Example Query
SELECT ?name
WHERE {
?comm rdf:type gag:LocalCommunity;
gag:name ?name;
geo:hasGeometry ?commGeo .
?commGeo geo:asWKT ?commWKT .
?ba rdf:type noa:BurntArea;
geo:hasGeometry ?baGeo .
?baGeo geo:asWKT ?baWKT .
FILTER(geof:sfOverlaps(?commWKT,?baWKT))
}
Return the names of local communities that have been affected by fires
Callouts: geof:sfOverlaps is a Geometry Topology Extension function; geo:hasGeometry and geo:asWKT are Geometry Extension properties.
GeoSPARQL Query Rewrite Extension
Provides a collection of RIF rules that use topological extension functions to establish the existence of topological predicates.
Example: given the RIF rule named geor:sfWithin, the serializations of the geometries of gag:Athens and gag:Greece named AthensWKT and GreeceWKT and the fact that
geof:sfWithin(AthensWKT, GreeceWKT)
returns true from the computation of the two geometries, we can derive the triple
gag:Athens geo:sfWithin gag:Greece
One possible implementation is to re-write a given SPARQL query.
RIF Rule
Forall ?f1 ?f2 ?g1 ?g2 ?g1Serial ?g2Serial
(?f1[geo:sfWithin->?f2] :-
Or(
And (?f1[geo:hasDefaultGeometry->?g1]
?f2[geo:hasDefaultGeometry->?g2]
?g1[ogc:asGeomLiteral->?g1Serial]
?g2[ogc:asGeomLiteral->?g2Serial]
External(geof:sfWithin (?g1Serial,?g2Serial)))
And (?f1[geo:hasDefaultGeometry->?g1]
?g1[ogc:asGeomLiteral->?g1Serial]
?f2[ogc:asGeomLiteral->?g2Serial]
External(geof:sfWithin (?g1Serial,?g2Serial)))
And (?f2[geo:hasDefaultGeometry->?g2]
?f1[ogc:asGeomLiteral->?g1Serial]
?g2[ogc:asGeomLiteral->?g2Serial]
External(geof:sfWithin (?g1Serial,?g2Serial)))
And (?f1[ogc:asGeomLiteral->?g1Serial]
?f2[ogc:asGeomLiteral->?g2Serial]
External(geof:sfWithin (?g1Serial,?g2Serial)))
))
Cases covered by the four And branches, in order: Feature-Feature, Feature-Geometry, Geometry-Feature, Geometry-Geometry.
Example
SELECT ?feature
WHERE {
?feature geo:sfWithin geonames:Olympia .
}
Find all features that are inside the municipality of Ancient Olympia
Rewritten Query
SELECT ?feature
WHERE { {?feature geo:sfWithin geonames:Olympia }
UNION
{ ?feature geo:hasDefaultGeometry ?featureGeom .
?featureGeom geo:asWKT ?featureSerial .
geonames:Olympia geo:hasDefaultGeometry ?olGeom .
?olGeom geo:asWKT ?olSerial .
FILTER (geof:sfWithin (?featureSerial, ?olSerial)) }
UNION { ?feature geo:hasDefaultGeometry ?featureGeom .
?featureGeom geo:asWKT ?featureSerial .
geonames:Olympia geo:asWKT ?olSerial .
FILTER (geof:sfWithin (?featureSerial, ?olSerial)) }
UNION { ?feature geo:asWKT ?featureSerial .
geonames:Olympia geo:hasDefaultGeometry ?olGeom .
?olGeom geo:asWKT ?olSerial .
FILTER (geof:sfWithin (?featureSerial, ?olSerial)) }
UNION {
?feature geo:asWKT ?featureSerial .
geonames:Olympia geo:asWKT ?olSerial .
FILTER (geof:sfWithin (?featureSerial, ?olSerial)) }
}
GeoSPARQL RDFS Entailment Extension
Specifies the RDFS entailments that follow from the class and property hierarchies defined in the other components, e.g., the Geometry Extension.
Systems should use an implementation of RDFS entailment to allow the derivation of new triples from those already in a graph.
Example
Given the triples
ex:f1 geo:hasGeometry ex:g1 .
geo:hasGeometry rdfs:domain geo:Feature .
geo:Feature rdfs:subClassOf geo:SpatialObject .
we can infer the following triples:
ex:f1 rdf:type geo:Feature .
ex:f1 rdf:type geo:SpatialObject .
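The inference above can be reproduced with a small forward-chaining sketch. This is only an illustration of two RDFS rules (rdfs2 for domains, rdfs9 for subclasses) over plain Python tuples, not how any particular GeoSPARQL store implements entailment. It assumes that the GeoSPARQL axiom geo:Feature rdfs:subClassOf geo:SpatialObject is part of the graph; the function name rdfs_closure is hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the input triples plus everything derivable by rules rdfs2 and rdfs9."""
    graph = set(triples)
    changed = True
    while changed:
        inferred = set()
        for s, p, o in graph:
            # rdfs2: (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)
            for p2, q, c in graph:
                if q == RDFS_DOMAIN and p2 == p:
                    inferred.add((s, RDF_TYPE, c))
            # rdfs9: (C rdfs:subClassOf D) and (s rdf:type C)  =>  (s rdf:type D)
            if p == RDF_TYPE:
                for c, q, d in graph:
                    if q == RDFS_SUBCLASS and c == o:
                        inferred.add((s, RDF_TYPE, d))
        # Stop once a full pass adds nothing new.
        changed = not inferred <= graph
        graph |= inferred
    return graph


given = {
    ("ex:f1", "geo:hasGeometry", "ex:g1"),
    ("geo:hasGeometry", "rdfs:domain", "geo:Feature"),
    # Assumed to be present: this axiom comes from the GeoSPARQL vocabulary.
    ("geo:Feature", "rdfs:subClassOf", "geo:SpatialObject"),
}
derived = rdfs_closure(given) - given
for triple in sorted(derived):
    print(*triple, ".")
```

The loop runs to a fixpoint, so the second inferred triple (geo:SpatialObject) is only found after the first one (geo:Feature) has been added.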
Readings
• Material from the Strabon web site (http://strabon.di.uoa.gr).
• The following tutorial paper, which introduces the topic of linked geospatial data:
M. Koubarakis, M. Karpathiotakis, K. Kyzirakos, C. Nikolaou and M. Sioutis. Data Models and Query Languages for Linked Geospatial Data. Reasoning Web Summer School 2012.
http://strabon.di.uoa.gr/files/survey.pdf
• The following paper, which introduces stSPARQL and Strabon:
K. Kyzirakos, M. Karpathiotakis and M. Koubarakis. Strabon: A Semantic Geospatial DBMS. 11th International Semantic Web Conference (ISWC 2012). November 11-15, 2012. Boston, USA.
http://iswc2012.semanticweb.org/sites/default/files/76490289.pdf
• The following paper, which introduces the temporal features of stSPARQL and Strabon:
K. Bereta, P. Smeros and M. Koubarakis. Representing and Querying the Valid Time of Triples for Linked Geospatial Data. In the 10th Extended Semantic Web Conference (ESWC 2013). Montpellier, France. May 26-30, 2013.
http://www.strabon.di.uoa.gr/files/eswc2013.pdf
• The GeoSPARQL standard, found at http://www.opengeospatial.org/standards/geosparql
Readings (cont’d)
• The following paper, which introduces the RDFi framework:
Charalampos Nikolaou and Manolis Koubarakis. Incomplete Information in RDF. In the 7th International Conference on Web Reasoning and Rule Systems (RR 2013). Mannheim, Germany. July 27-29, 2013.
http://cgi.di.uoa.gr/~koubarak/publications/rr2013.pdf
• The following paper, which introduces the benchmark Geographica:
G. Garbis, K. Kyzirakos and M. Koubarakis. Geographica: A Benchmark for Geospatial RDF Stores. In the 12th International Semantic Web Conference (ISWC 2013). Sydney, Australia. October 21-25, 2013.
http://cgi.di.uoa.gr/~koubarak/publications/Geographica.pdf
Publishing geospatial information as RDF graphs
Kostis Kyzirakos, Dimitrianos Savva
Outline
Mapping relational data to RDF graphs
Mapping non-relational data to RDF graphs
Geospatial Extensions for mapping geospatial data to RDF graphs
Implemented Systems
Demonstration
Mapping relational data to RDF graphs
Table: ProtectedArea
Sitecode    Sitename     ReleaseDate   …
DE0916391   NTP S-H W    2011-01-27
DE1003301   DOGGERBANK   2011-01-27
Natura 2000 is an ecological network
designated under the Birds Directive and
the Habitats Directive which form the
cornerstone of the nature conservation
policy of the European Union.
http://ec.europa.eu/environment/nature/natura2000/index_en.htm
http://www.eea.europa.eu/data-and-maps/data/natura-6
Direct Mapping
W3C Recommendation from 2012
http://www.w3.org/TR/rdb-direct-mapping/
Relational tables are mapped to classes defined by an RDF vocabulary.
Attributes of each table are mapped to RDF properties that represent the relation between subject and object resources.
Identifiers, class names, properties and instances are generated automatically following the labels of the input data.
Direct Mapping - Example
Table: ProtectedArea
Sitecode    Sitename     ReleaseDate   …
DE0916391   NTP S-H W    2011-01-27
DE1003301   DOGGERBANK   2011-01-27
(Figure: each row becomes a ProtectedArea resource with a Sitename value of type xsd:string and a ReleaseDate value of type xsd:date.)
@base <http://foo.example/DB/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<ProtectedArea/Sitecode=DE0916391> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE0916391> <ProtectedArea#Sitename> "NTP S-H W" .
<ProtectedArea/Sitecode=DE0916391> <ProtectedArea#ReleaseDate>
"2011-01-27"^^xsd:date .
<ProtectedArea/Sitecode=DE1003301> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE1003301> <ProtectedArea#Sitename> "DOGGERBANK" .
<ProtectedArea/Sitecode=DE1003301> <ProtectedArea#ReleaseDate>
"2011-01-27"^^xsd:date .
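The way the triples above follow from the table can be sketched in a few lines of Python. This is a simplified illustration, not a conforming Direct Mapping processor: it assumes a single-column primary key, emits every value as a plain literal (no xsd:date typing), does no percent-encoding, and, like the slide, skips the literal triple for the key column itself. The function name direct_mapping is hypothetical.

```python
def direct_mapping(base, table, primary_key, rows):
    """Yield (subject, predicate, object) strings for each row of one table."""
    for row in rows:
        # Row IRI: <base + table/key=value>
        subject = f"<{base}{table}/{primary_key}={row[primary_key]}>"
        # The table name becomes the class of the row resource.
        yield (subject, "rdf:type", f"<{base}{table}>")
        for column, value in row.items():
            if column == primary_key or value is None:
                continue  # simplification: key column and NULLs produce no triple
            # Each column becomes a property <base + table#column>.
            yield (subject, f"<{base}{table}#{column}>", f'"{value}"')


rows = [
    {"Sitecode": "DE0916391", "Sitename": "NTP S-H W", "ReleaseDate": "2011-01-27"},
    {"Sitecode": "DE1003301", "Sitename": "DOGGERBANK", "ReleaseDate": "2011-01-27"},
]
triples = list(direct_mapping("http://foo.example/DB/", "ProtectedArea", "Sitecode", rows))
for s, p, o in triples:
    print(s, p, o, ".")
```

Nothing in the output is chosen by the user: all IRIs are derived from the table name, the column names and the key values, which is what distinguishes the Direct Mapping from a customized mapping.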
The language R2RML
R2RML is a language for expressing customized mappings from relational databases to RDF graphs.
R2RML is a W3C Recommendation from 2012
http://www.w3.org/TR/r2rml/
R2RML mappings provide the user with the ability to express the desired transformation of existing relational data into the RDF data model, following a structure and a target vocabulary that is chosen by the user.
The language R2RML (cont’d)
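(Figure: R2RML overview. A TriplesMap has a LogicalTable, a SubjectMap and PredicateObjectMaps; each PredicateObjectMap has PredicateMaps and ObjectMaps or RefObjectMaps; a RefObjectMap may carry Join conditions over Child and Parent columns; SubjectMap, PredicateMap, ObjectMap and GraphMap are TermMaps, which are Constant-, Column- or Template-valued.)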
The language R2RML (cont’d)
A logical table can be
• a relational table that is explicitly stored in the database
• an SQL view
• an SQL select query
A triples map is a rule that defines how each tuple of the logical table will be mapped to a set of RDF triples. It consists of
• a subject map
• zero or more predicate-object maps.
The language R2RML (cont’d)
A subject map is a rule that defines how to generate the URI that will be the subject of each generated RDF triple.
A predicate-object map consists of predicate maps and object maps.
A predicate map defines the RDF property to be used to relate the subject and the object of the generated triple.
An object map defines how to generate the object of the triple which originates from the current row of the logical table.
The language R2RML (cont’d)
Subject, predicate, object and graph maps are term maps. A term map is a function that generates an RDF term from a row of a logical table. Three types of term maps are defined:
• constant-valued term maps
• column-valued term maps
• template-valued term maps
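The three kinds of term maps can be pictured as three kinds of functions from a row to a term. The sketch below is only a mental model in Python, not R2RML syntax or the API of any processor; the helper names constant_map, column_map and template_map are hypothetical, and IRI escaping of template values is left out.

```python
import re


def constant_map(value):
    """Constant-valued term map: always returns the same term, ignoring the row."""
    return lambda row: value


def column_map(column):
    """Column-valued term map: returns the value of one column of the row."""
    return lambda row: row[column]


def template_map(template):
    """Template-valued term map: fills each {column} placeholder with the row value."""
    return lambda row: re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


row = {"Sitecode": "DE0916391", "Sitename": "NTP S-H W"}

subject = template_map("http://foo.example/DB/ProtectedArea/Sitecode={Sitecode}")(row)
predicate = constant_map("http://foo.example/DB/ProtectedArea#Sitename")(row)
obj = column_map("Sitename")(row)
print(subject, predicate, repr(obj))
```

A triples map then amounts to applying one subject term map and a set of predicate and object term maps to every row of the logical table.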
The language R2RML (cont’d)
A referencing object map allows using the subjects of another triples map as the objects generated by a predicate-object map. Optionally, it has one or more join condition properties.
(Figure: a PredicateObjectMap points via rr:objectMap to a RefObjectMap, which points via rr:parentTriplesMap to a TriplesMap and via rr:joinCondition* to JoinConditions; each JoinCondition names one column via rr:child and one via rr:parent.)
source: http://www.w3.org/TR/r2rml/#dfn-predicate-map
The language R2RML – Example
Table: ProtectedArea
Sitecode    Sitename     ReleaseDate   …
DE0916391   NTP S-H W    2011-01-27
DE1003301   DOGGERBANK   2011-01-27
(Figure: each row becomes a ProtectedArea resource with a Sitename value of type xsd:string.)
@base <http://foo.example/DB/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
<NaturaMapping>
rr:logicalTable [ rr:tableName "ProtectedArea" ];
rr:subjectMap [
rr:template "ProtectedArea/Sitecode={Sitecode}";
rr:class <ProtectedArea> ];
rr:predicateObjectMap [
rr:predicate <ProtectedArea#Sitename>;
rr:objectMap [ rr:column "Sitename"; ]; ] .
<ProtectedArea/Sitecode=DE0916391> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE0916391> <ProtectedArea#Sitename> "NTP S-H W" .
<ProtectedArea/Sitecode=DE1003301> rdf:type <ProtectedArea> .
<ProtectedArea/Sitecode=DE1003301> <ProtectedArea#Sitename> "DOGGERBANK" .
<ogr:FeatureCollection>
<gml:featureMember>
<ogr:waterways fid="waterways.128">
<ogr:osm_id>8108139</ogr:osm_id>
<ogr:name>Lech</ogr:name>
<ogr:type>river</ogr:type>
<ogr:geometryProperty>
<gml:LineString>
<gml:coordinates>
10.9034096,47.7996669
10.9037025,47.8003338 …
</gml:coordinates>
</gml:LineString>
</ogr:geometryProperty>
</ogr:waterways>
</gml:featureMember>
</ogr:FeatureCollection>
Mapping non-relational data to RDF graphs
OpenStreetMap is a collaborative project
for publishing free maps of the world. OSM
maintains a community-driven global
editable map that gathers map data in a
crowdsourcing fashion.
http://www.openstreetmap.org/
RDF Mapping Language (RML)
RML is a recently proposed mapping language that defines how to map heterogeneous sources into RDF.
http://semweb.mmlab.be/rml/spec.html
RML is defined as a superset of the W3C-standard R2RML.
R2RML                              RML
Logical Table (rr:logicalTable)    Logical Source (rml:logicalSource)
Table Name (rr:tableName)          URI (rml:source)
column (rr:column)                 reference (rml:reference)
(SQL)                              Reference Formulation (rml:referenceFormulation)
per row iteration                  defined iterator (rml:iterator)
source: http://semweb.mmlab.be/rml/RML_R2RML.html
RML Overview
(Figure: RML overview. Same structure as the R2RML overview, with the LogicalTable replaced by a LogicalSource that has a Source, an Iterator and a Reference Formulation, and with Column-valued term maps replaced by Reference-valued ones.)
RML extensions
A logical source refers to the input dataset that will be converted to an RDF graph.
Each logical source has
• a source property pointing to the input data
• a logical iterator that defines the iteration pattern over the input data source
• an optional reference formulation property that defines the query language that may be used (e.g., SQL2008, XPath, JSONPath)
An RML reference is a term map that refers to a column name (SQL, CSV), an XML element or attribute, or a JSON object.
<ogr:FeatureCollection>
<gml:featureMember>
<ogr:waterways fid="waterways.128">
<ogr:osm_id>8108139</ogr:osm_id>
<ogr:name>Lech</ogr:name>
<ogr:type>river</ogr:type>
<ogr:geometryProperty>
<gml:LineString>
<gml:coordinates>
10.9034096,47.7996669
10.9037025,47.8003338 …
</gml:coordinates>
</gml:LineString>
</ogr:geometryProperty>
</ogr:waterways>
</gml:featureMember>
</ogr:FeatureCollection>
RML Example
<#waterways>
rml:logicalSource [
rml:source "/home/leo/osm.gml";
rml:referenceFormulation ql:XPath;
rml:iterator "/ogr:FeatureCollection/gml:featureMember/ogr:waterways";
];
rr:subjectMap [
rr:template
"http://www.example.com/id/{@fid}";
rr:class onto:waterways;
];
rr:predicateObjectMap [
rr:predicate onto:hasOgr-Name;
rr:objectMap [
rr:datatype xsd:string;
rml:reference "ogr:name";
]; ] .
ex_id:waterways.128 rdf:type onto:waterways ;
onto:hasOgr-Name "Lech" ;
onto:hasFid "waterways.128"^^xsd:ID ;
onto:hasOgr-Osm_id "8108139" ;
onto:hasOgr-Type "river" .
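What the iterator and the references do can be imitated with the XPath subset of Python's standard library. This is a rough sketch of the idea, not the RML processor: the namespace URIs are assumptions (the slide omits the xmlns declarations), the sample document is a shortened copy of the one above, and only the class and the name property are produced.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the snippet on the slide does not declare them.
NS = {"ogr": "http://ogr.maptools.org/", "gml": "http://www.opengis.net/gml"}

GML = """<ogr:FeatureCollection xmlns:ogr="http://ogr.maptools.org/"
                       xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ogr:waterways fid="waterways.128">
      <ogr:osm_id>8108139</ogr:osm_id>
      <ogr:name>Lech</ogr:name>
      <ogr:type>river</ogr:type>
    </ogr:waterways>
  </gml:featureMember>
</ogr:FeatureCollection>"""


def map_waterways(xml_text):
    """Apply the #waterways mapping: one subject per iteration, one triple per reference."""
    root = ET.fromstring(xml_text)  # root element is ogr:FeatureCollection
    triples = []
    # Iterator: one iteration per ogr:waterways element.
    for node in root.findall("gml:featureMember/ogr:waterways", NS):
        # Subject template "http://www.example.com/id/{@fid}"
        subject = "http://www.example.com/id/" + node.get("fid")
        triples.append((subject, "rdf:type", "onto:waterways"))
        # Reference "ogr:name", evaluated relative to the current iteration node.
        name = node.findtext("ogr:name", namespaces=NS)
        if name is not None:
            triples.append((subject, "onto:hasOgr-Name", name))
    return triples


for triple in map_waterways(GML):
    print(triple)
```

The key difference from R2RML is visible in the loop: the unit of iteration is no longer a table row but whatever the iterator expression selects.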
Mapping geospatial data to RDF graphs
Geospatial data are available in formats such as:
• ESRI shape files
• KML documents
• GeoJSON documents
• XML documents
Geospatial data may also be stored in spatially-enabled relational databases.
Extending R2RML with transformation-valued term maps
(Figure: the R2RML overview extended with a fourth kind of term map, the Function-valued one, which takes ArgumentMaps; Join conditions may likewise use a Function with ArgumentMaps.)
Extending RML with transformation-valued term maps
(Figure: the RML overview, with its LogicalSource consisting of Source, Iterator and Reference Formulation, extended in the same way: term maps and Join conditions may use a Function with ArgumentMaps.)
Transformation-valued term maps
A transformation-valued term map is a term map that generates an RDF term by applying a SPARQL extension function to one or more term maps.
A transformation-valued term map has
• exactly one rrx:function property that defines a SPARQL extension function that performs the desired transformation
• one rrx:argumentMap property that has as range an rdf:List of term maps that define the arguments to be passed to the transformation function
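In functional terms, a transformation-valued term map is a function applied to the results of other term maps. The Python sketch below illustrates this composition under loud assumptions: wkt_dimension is a toy stand-in for a dimension function that only inspects the WKT type keyword, and the helper names transformation_map and column_map are hypothetical.

```python
def wkt_dimension(wkt):
    """Toy dimension function: derives the dimension from the WKT type keyword only."""
    keyword = wkt.strip().split("(", 1)[0].strip().upper()
    dimensions = {
        "POINT": 0, "MULTIPOINT": 0,
        "LINESTRING": 1, "MULTILINESTRING": 1,
        "POLYGON": 2, "MULTIPOLYGON": 2,
    }
    return dimensions[keyword]


def column_map(column):
    """Column-valued term map, used here as an argument map."""
    return lambda row: row[column]


def transformation_map(function, argument_maps):
    """Term map that evaluates every argument map on the row and applies the function."""
    return lambda row: function(*(argument(row) for argument in argument_maps))


row = {"Sitecode": "DE0916391", "Geom": "POLYGON((0 0, 1 0, 1 1, 0 0))"}
dimension_map = transformation_map(wkt_dimension, [column_map("Geom")])
dimension = dimension_map(row)
print(dimension)
```

Because the arguments are themselves term maps, the same construction works whether the geometry comes from a relational column or from a reference into an XML or JSON document.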
Transformation-valued term maps (cont’d)
Extending join conditions
(Figure: a JoinCondition, reached from a RefObjectMap via rr:joinCondition*, keeps its rr:child and rr:parent column names and additionally has an rrx:function, labelled IRIrefOrFunction, and an rrx:argumentMap whose value is an rdf:List; the RefObjectMap still points to its parent TriplesMap via rr:parentTriplesMap.)
Example
Table: ProtectedArea
Sitecode    Sitename     Geom           …
DE0916391   NTP S-H W    POLYGON((…))
DE1003301   DOGGERBANK   POLYGON((…))
(Figure: a ProtectedArea resource has a Sitename of type xsd:string and is linked to its geometry via geo:hasGeometry.)
<NaturaGeometryMapping>
rr:subjectMap [
rr:template "ProtectedArea/Geometry/Sitecode={Sitecode}";
rr:class geo:Geometry ];
rr:predicateObjectMap [
rr:predicate geo:dimension;
rr:objectMap [
rrx:function strdf:dimension;
rrx:argumentMap ( [rr:column "`Geom`"] ); ]; ] .
<ProtectedArea/Geometry/Sitecode=DE0916391>
rdf:type geo:Geometry ;
geo:dimension "2"^^xsd:integer .
(Figure: a geo:Geometry resource has a geo:dimension value and a geo:asWKT value of type geo:wktLiteral.)
Example
Table: ProtectedArea
Sitecode    Sitename     Geom           …
DE0916391   NTP S-H W    POLYGON((…))
DE1003301   DOGGERBANK   POLYGON((…))
<NaturaGeometryMapping>
rr:subjectMap [
rr:template
"ProtectedArea/Geometry/Sitecode={Sitecode}";
rr:class geo:Geometry ];
rr:predicateObjectMap [
rr:predicate geo:sfIntersects;
rr:objectMap [
rr:parentTriplesMap <#waterwaysGeom> ;
rr:joinCondition [
rrx:function geof:sfIntersects;
rrx:argumentMap (
[rr:column "`Geom`"]
[rml:reference "ogr:geometryProperty";
rr:parentTriplesMap <#waterwaysGeom>]
); ] ; ]; ] .
natura:DE0916391 geo:sfIntersects osm-id:waterways.128 .
<ogr:FeatureCollection>
<gml:featureMember>
<ogr:waterways fid="waterways.128">
<ogr:osm_id>8108139</ogr:osm_id>
<ogr:geometryProperty>
<gml:LineString>
<gml:coordinates>
10.9034096,47.7996669 …
</gml:coordinates>
</gml:LineString>
</ogr:geometryProperty>
</ogr:waterways>
</gml:featureMember>
</ogr:FeatureCollection>
OSM Waterways
<#waterwaysGeom>
rml:logicalSource [
rml:source "/home/leo/osm.gml";
rml:referenceFormulation ql:XPath;
rml:iterator "/ogr:FeatureCollection/gml:featureMember/ogr:waterways";
];
rr:subjectMap [
rr:template
"http://www.osm.org/id/{@fid}";
rr:class onto:waterways;
].
Implemented Systems
Direct Mapping processors: SquirrelRDF
R2RML processors: D2RQ Platform, OpenLink Virtuoso, Ultrawrap, Morph, Ontop, Oracle
RML processors: the processor by iMinds Lab, Ghent University
Other mapping languages: Triplify
Geospatial capabilities
So far: Geometry2RDF, Sparqlify, TripleGeo, GeoTriples
System | Custom Mapping Language | Direct Mapping | R2RML | RML | SPARQL query evaluation | Automatic Mapping Generation | Geospatial support
OpenLink Virtuoso | ✔ | ✖ | ✔* | ✖ | ✔ | ✖ | ✖
RDF-RDB2RDF | ✖ | ✔ | ✔ | ✖ | ✖ | ✖ | ✖
D2RQ Platform | ✔ | ✖ | ✔ | ✖ | ✔ | ✔ | ✖
Db2triples | ✖ | ✔ | ✔ | ✖ | ✖ | ✖ | ✖
Morph | ✔ | ✖ | ✔ | ✖ | ✔ | ? | ✖
Sparqlify | ✔ | ✖ | ✖* | ✖ | ✔ | ✖ | ✔
Ontop | ✔ | ✖ | ✔ | ✖ | ✔ | ✖ | ✖*
Ultrawrap | ✔* | ✔ | ✔ | ✖ | ✔ | ✖ | ✖
Oracle | ✖ | ✔ | ✔ | ✖ | ✔ | ✔ | ✖
Geometry2RDF | ✖ | ✔ | ✖ | ✖ | ✖ | ✖ | ✔*
TripleGeo | ✖ | ✔ | ✖ | ✖ | ✖ | ✖ | ✔*
iMinds lab RML processor | ✖ | ✖ | ✔ | ✔ | ✖ | ✖ | ✖
GeoTriples | ✖ | ✖ | ✔ | (✔) | ✖* | ✔ | ✔
Comparison of Geo2RDF tools
Tool | Direct Mapping | R2RML | RML | Automatic Mapping Generation | GeoSPARQL compliance | RDBMS | ESRI Shapefile | GML | GeoJSON
Geometry2RDF | ✔ | ✖ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ | ✖
TripleGeo | ✔ | ✖ | ✖ | ✖ | (✔) | ✔ | ✔ | ✖ | ✖
GeoTriples | ✖ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
GeoTriples
Open Source software
Released under Mozilla Public Licence v2.0
Available at: https://github.com/LinkedEOData/GeoTriples
Extends the D2RQ Platform
Extends the iMinds lab RML processor
Provides both a graphical user interface and a command line interface
Architecture of GeoTriples
(Figure: architecture of GeoTriples; the only surviving label is "Earth Observation Acquisitions".)
Automatic generation of R2RML mappings (cont’d)
Generate two triples maps for each table that has a geometry column:
• a thematic triples map for the non-geometric information
• a spatial triples map for the geometric information
The spatial triples map contains multiple transformation functions over the input geometries in order to generate a GeoSPARQL-compliant dataset.
(Figure: NaturaArea is linked to NaturaGeometry via geo:hasGeometry, established through an rr:joinCondition.)
Automatic generation of RML mappings for GML documents
Each geometric object is mapped to a geo:Geometry instance.
For each geometric object we generate a set of predicate-object maps that use the appropriate transformation functions for producing a GeoSPARQL-compliant dataset.
Each simple element is mapped to a predicate-object map.
Each non-simple element is mapped to a triples map.
Appropriate mappings are generated for linking nested elements.
(Figure: XSD → Mapping Generator → RML mapping.)
Demonstration
Discovering Spatial and Temporal Links
among RDF Graphs
Publishing and Interlinking Linked Geospatial Data
In Conjunction with the 12th Extended Semantic Web Conference
Portoroz, Slovenia, 1st June 2015
Presenter: Panayiotis Smeros
01/06/2015 Discovering Spatial and Temporal Links among RDF Graphs 2
Outline
• Introduction to Entity Resolution and Link Discovery
– Examples, Definitions, Common Problems
• Spatial Entity Resolution
• Spatial and Temporal Link Discovery
– Background and Developed Methods
– Extensions to the Silk Framework
– Hands-on
Entities in Real-World
Most of our knowledge about the world is based on entities and their relations.
Entities in Data-World
Many names, descriptions or IDs (URIs) are used for the same real-world entity:
Names: Portoroz, Portorož, بورتوروز, Порторож, Πορτορόζ, Portorose, Портороз, Порторожу, Portorožu, Порторож
Description: Portorož (Italian: Portorose, literally "Port of Roses") is an Adriatic-Mediterranean coastal settlement in the Municipality of Piran in southwestern Slovenia. Its modern development began in the late 19th century with the appearance of the first health resorts.
URIs:
http://www.geonames.org/3192682/portoroz.html
http://en.wikipedia.org/wiki/Portoroz
http://www.portoroz.si/en/
…
Content Providers
Many applications provide valuable information about each of these entities:
• News about Portoroz
• Reviews of hotels in Portoroz
• Pictures about Portoroz
• Videos for Portoroz
• Wiki pages about Portoroz
• Social networks in Portoroz
Solution?
Entity Resolution
The problem of recognizing that two (or more) entities in the data world refer to the same real-world entity [Christen, TKDE’11].
Entity Resolution (Example)
DBpedia entity: name = PORTOROZ, population = 2,849
GeoNames entity: name = Portorose, population = 2,851
Link: DBpedia entity sameAs GeoNames entity
The problem of recognizing that two (or more) entities in the data world refer to the same real-world entity [Christen, TKDE’11].
Spatial Entity Resolution (Example)
DBpedia entity: name = PORTOROZ, population = 2,849, location = 45.51663, 13.57996
GeoNames entity: name = Portorose, population = 2,851, location = 45.51661, 13.57998
Link: DBpedia entity sameAs GeoNames entity
The problem of recognizing that two (or more) entities in the data world refer to the same real-world entity [Christen, TKDE’11].
Entity Resolution (Definition)
Let $S$ and $T$ be two sets of entities. We define a distance (similarity) function $d_{similarity}$ and a distance (similarity) threshold $\theta_{d_{similarity}}$ as follows:
$d_{similarity}: S \times T \to [0,1], \quad \theta_{d_{similarity}} \in [0,1]$
We define the set of discovered similarity links $DL_{similarity}$ as follows:
$DL_{similarity} = \{ (s, \mathrm{sameAs}, t) \mid s \in S \wedge t \in T \wedge d_{similarity}(s,t) < \theta_{d_{similarity}} \}$
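The definition translates almost literally into a set comprehension. The sketch below is one possible instantiation, not the method of any cited system: the distance is an assumption (one minus the similarity ratio of Python's difflib on lower-cased names), the threshold 0.3 is arbitrary, and the entity "Ljubljana" is added only to have a non-match.

```python
from difflib import SequenceMatcher


def name_distance(a, b):
    """Distance in [0, 1]: 0 for identical names (ignoring case), 1 for nothing in common."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_same_as(S, T, distance, theta):
    """DL = {(s, sameAs, t) | s in S, t in T, distance(s, t) < theta}."""
    return {(s, "sameAs", t) for s in S for t in T if distance(s, t) < theta}


S = {"PORTOROZ"}
T = {"Portorose", "Ljubljana"}
links = discover_same_as(S, T, name_distance, theta=0.3)
print(links)
```

Lower-casing inside the distance function is a first, small answer to literal heterogeneity: without it, PORTOROZ and Portorose would share only a few characters.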
Link Discovery
Link Discovery is the fourth and the most important Linked Data
Principle.
Establish semantic relations between entities in order to enrich the
information that is known about them. [Bizer et al., IJSWIS’06]
Link Discovery (Definition)
Let $S$ and $T$ be two sets of entities and $R$ the set of relations that can be discovered between entities. For a relation $r \in R$, w.l.o.g., we define a distance function $d_r$ and a distance threshold $\theta_{d_r}$ as follows:
$d_r: S \times T \to [0,1], \quad \theta_{d_r} \in [0,1]$
We define the set of discovered links for relation $r$ ($DL_r$) as follows:
$DL_r = \{ (s, r, t) \mid s \in S \wedge t \in T \wedge d_r(s,t) < \theta_{d_r} \}$
Link Discovery (Example)
(Figure: maps of Natura 2000 areas, Fields and OSM Water Bodies; discovered links: a Natura 2000 area contains a Field, a Field intersects an OSM Water Body.)
Main Problem: Heterogeneity
• Different Data Providers create Heterogeneous Datasets
– Example: Literal Heterogeneity (case, language, etc.), e.g., name = PORTOROZ vs. name = Portorose
• We focus on:
– Heterogeneity in the Representation of Geospatial Information in RDF
– Heterogeneity in the Representation of Temporal Information in RDF
Heterogeneity in the Representation of Geospatial Information in RDF
_:1 rdf:type geo:Geometry .
_:1 geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10 20)"^^geo:wktLiteral .
_:1 rdf:type strdf:Geometry .
_:1 strdf:hasGeometry "<gml:Point crsName='EPSG:2100'><gml:coordinates>10,20</gml:coordinates></gml:Point>"^^strdf:GML .
_:1 rdf:type wgs84Geo:Point .
_:1 wgs84Geo:lat "10"^^xsd:double .
_:1 wgs84Geo:long "20"^^xsd:double .
Heterogeneity in the Representation of Geospatial Information in RDF
_:1 rdf:type geo:Geometry .
_:1 geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(10 20)"^^geo:wktLiteral .
_:1 rdf:type strdf:Geometry .
_:1 strdf:hasGeometry "<gml:Point crsName='EPSG:2100'><gml:coordinates>10,20</gml:coordinates></gml:Point>"^^strdf:GML .
• Different Vocabularies
• Different Serializations of Geometries
• Geometries expressed in Different Coordinate Reference Systems (CRS)
Heterogeneity in the Representation of
Geospatial Information in RDF
• Different Sampling Values
• Different Granularity
• Different Rounding Effects
Heterogeneity in the Representation of Temporal Information in RDF
_:1 ex:hasBirthday "1989-09-24T11:05:00+01:00"^^xsd:dateTime .
_:1 ex:hasAffiliation ex:UoA "[2007-10-15T00:00:00+03:00, 2013-10-15T00:00:00+04:00)"^^strdf:Period .
• Different Vocabularies
• Different Time Zones
• Time Instants and Periods
Spatial Entity Resolution (Example Revisited)
DBpedia entity: name = PORTOROZ, population = 2,849, location = 45.51663, 13.57996
GeoNames entity: name = Portorose, population = 2,851, location = 45.51661, 13.57998
Link: DBpedia entity sameAs GeoNames entity
The problem of recognizing that two (or more) entities in the data world refer to the same real-world entity [Christen, TKDE’11].
Spatial Entity Resolution (1/4)
• Location Name Similarity
– Edit, Jaccard distance
• Location Similarity
– Euclidean distance
• Location Type Similarity
– (e.g. type “river” is similar to type “stream”)
Combines the above similarities to compute the
overall similarity between entities
Spatial Entity Resolution (2/4)
• Similarity measure: Hausdorff Distance
– Intuitively, the Hausdorff distance between two geometric shapes is the largest of all distances from a point of one shape to the closest point of the other shape
• Handling Geospatial Heterogeneity
– Converts geometries to a common vocabulary (NeoGeo)
– Assumes WGS-84 CRS
• Optimization
– Simplifies geometries with the Ramer-Douglas-Peucker algorithm
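The intuition behind the Hausdorff distance can be made concrete for finite point sets. The sketch below is a discrete approximation that only looks at the listed vertices (the true distance between continuous shapes also considers points along edges), uses planar Euclidean distance rather than a geodesic one, and has quadratic cost; the function names are hypothetical.

```python
from math import dist


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its closest point in B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(directed_hausdorff(A, B))  # every point of A is within 1 of B
print(directed_hausdorff(B, A))  # (4, 0) is 3 away from its closest point in A
print(hausdorff(A, B))
```

The two directed values differ, which is why the symmetric version takes the maximum of both directions.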
Spatial Entity Resolution (3/4)
• Heuristic Combination of:
– URI Similarity
– Label Similarity
• Considering the language of the labels
– Location Similarity
• Assuming the W3C Geo vocabulary
– Geometric Similarity
• Minimum Distance between two Geometries
Spatial Entity Resolution (4/4)
• Non-Spatial Criteria
– Implemented within the LIMES framework
• Geometric Similarity
– Hausdorff Distance
– Optimizations
• Bounding Circle: Avoids useless comparisons
$\mu(s,t) = \delta(\zeta(s), \zeta(t)) - r(s) - r(t) > \theta \Rightarrow \delta(s,t) > \theta$
• Space tiling: Reduces the quadratic number of comparisons
Spatial Entity Resolution
• [Sehgal et al., GIS’06]
– Spatial and non-Spatial Criteria
– Only Location Similarity
• [Salas et al., TerraCognita’11]
– Only Spatial Criteria
– Complex Geometric Similarity Methods
• [Vilches-Blázquez et al., AGILE’12]
– Spatial and non-Spatial Criteria
– Simple Geometric Similarity Methods
• [Ngonga Ngomo, ISWC’13]
– Spatial and non-Spatial Criteria
– Complex Geometric Similarity Methods
– Reduced number of comparisons
Link Discovery (reminder)
Link Discovery is the fourth and the most important Linked Data
Principle.
Establish semantic relations between entities in order to enrich the
information that is known about them. [Bizer et al., IJSWIS’06]
Background on Spatial Relations (1/2)
• Dimensionally Extended 9-Intersection Model [Clementini et al., SSD'93]
– Captures topological relations in ℝ², by considering the dimension (dim) of the intersections involving the interior (I), the boundary (B) and the exterior (E) of the two geometries.
– Examples: Intersects, Equals, Touches, Disjoint, Contains, Crosses, Covers, CoveredBy and Within
Background on Spatial Relations (2/2)
• Region Connection Calculus [Randell et al., KR’92]
– RCC-8: a well-known subset of RCC, which is based on eight topological relations
– DC stands for DisConnected, EC for Externally Connected, PO for Partially Overlapping, EQ for EQual, TPP for Tangential Proper Part, NTPP for Non-Tangential Proper Part, and TPPi and NTPPi are the inverse relations of TPP and NTPP
Background on Temporal Relations
• Allen’s Interval Calculus [Allen, Commun. ACM’83]
– thirteen jointly exhaustive and pairwise disjoint qualitative relations between time intervals
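"Jointly exhaustive and pairwise disjoint" means that any two proper intervals stand in exactly one of the thirteen relations. The sketch below classifies a pair of intervals accordingly; it assumes intervals given as (start, end) pairs with start < end on any totally ordered time scale, and the relation names used as strings are a common English rendering, not identifiers from a specific vocabulary.

```python
def allen_relation(a, b):
    """Return the Allen relation that interval a = (a1, a2) has to b = (b1, b2)."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: the intervals properly overlap with all endpoints distinct.
    return "overlaps" if a1 < b1 else "overlapped-by"


examples = [
    ((1, 2), (3, 4)), ((3, 4), (1, 2)), ((1, 2), (2, 3)), ((2, 3), (1, 2)),
    ((1, 4), (1, 4)), ((1, 2), (1, 4)), ((1, 4), (1, 2)), ((3, 4), (1, 4)),
    ((1, 4), (3, 4)), ((2, 3), (1, 4)), ((1, 4), (2, 3)), ((1, 3), (2, 4)),
    ((2, 4), (1, 3)),
]
for a, b in examples:
    print(a, allen_relation(a, b), b)
```

Each branch returns as soon as its condition holds, so exactly one relation is reported per pair, mirroring the pairwise-disjoint property.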
Spatial and Temporal Relations
• We consider the previous Spatial ($R_s$) and Temporal ($R_t$) relations as Boolean relations ($R_B$), i.e., either they hold or they do not:
$R_s, R_t \subset R_B$
• $R_B$ constitutes a special subset of $R$. The distance function $d_r$ and the distance threshold $\theta_{d_r}$ for a relation $r \in R_B$ are defined as follows:
$d_r(s,t) = \begin{cases} 0 & \text{if } r \text{ holds} \\ 1 & \text{otherwise} \end{cases}, \quad \theta_{d_r} = 1$
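With this definition a Boolean relation plugs into the general link discovery definition unchanged: a pair is linked exactly when its distance 0 falls below the threshold 1. The sketch below shows this with an assumed toy relation on axis-aligned boxes given as (min_x, min_y, max_x, max_y) tuples; all function names are hypothetical.

```python
def boolean_distance(relation):
    """d_r(s, t) = 0 if the relation holds between s and t, 1 otherwise."""
    return lambda s, t: 0 if relation(s, t) else 1


def discover_links(S, T, name, distance, theta):
    """DL_r = {(s, r, t) | s in S, t in T, d_r(s, t) < theta}."""
    return {(s, name, t) for s in S for t in T if distance(s, t) < theta}


def boxes_intersect(a, b):
    """Toy spatial relation on boxes (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


S = [(0, 0, 2, 2)]
T = [(1, 1, 3, 3), (5, 5, 6, 6)]
links = discover_links(S, T, "intersects", boolean_distance(boxes_intersect), theta=1)
print(links)
```

The threshold has to be 1 rather than 0 because the link condition uses a strict inequality.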
Spatial and Temporal Transformations (1/2)
• CRS Transformation. The geometries of a dataset can be expressed in a Coordinate Reference System that is more precise for the geographic area that they describe (e.g., the GGRS87 for Greece). This transformation converts the CRS of a geometry to the World Geodetic System (WGS 84).
• Vocabulary Transformation. This transformation converts geometry literals from GeoSPARQL, stRDF or W3C GEO to a common vocabulary (GeoSPARQL).
• Serialization Transformation. This transformation converts the geometries of a dataset to a common serialization (WKT).
• Time-Zone Transformation. This transformation converts the time zone of a given time interval to Coordinated Universal Time (UTC).
• Period Transformation. This transformation converts a time instant to a period with the same starting and ending point.
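The two temporal transformations are simple enough to sketch with Python's standard library. This is an illustration of the idea only, not the code of the Silk extension; the function names to_utc and instant_to_period are hypothetical, and the input is assumed to be an ISO 8601 timestamp with an explicit offset.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Time-zone transformation: parse an ISO 8601 timestamp and express it in UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


def instant_to_period(instant):
    """Period transformation: an instant becomes a period that starts and ends at it."""
    return (instant, instant)


birthday = to_utc("1989-09-24T11:05:00+01:00")
period = instant_to_period(birthday)
print(birthday.isoformat())
print(period[0] == period[1])
```

After both transformations every temporal value is a UTC period, so a single implementation of each temporal relation suffices.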
Spatial and Temporal Transformations
(2/2)
• Simplification Transformation. Some datasets have very complex geometries, which makes the computation of spatial relations inefficient. This transformation simplifies a geometry according to a given distance tolerance, ensuring that the result is a valid geometry having the same dimension and number of components as the input
• Envelope Transformation. This transformation computes the envelope (i.e., the minimum bounding rectangle) of a geometry and it is useful in cases that we want to compute approximate spatial relations between two datasets
• Area Transformation. In some cases it is enough to compare just the areas of two geometries to infer whether they are the same or not. This transformation computes the area of a given geometry in square metres
• Points-To-Centroid Transformation. In crowdsourcing datasets like OpenStreetMap, multiple users can define the position of the same placemark. As a better approximation of the real position of this placemark we can compute the centroid of these positions. This transformation computes the centroid of a cluster of points
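Two of these transformations are simple enough to sketch without a geometry library. The sketch below assumes a geometry is given as a plain list of (x, y) coordinates and takes the centroid of a point cluster to be the arithmetic mean of its points; the function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def envelope(coords: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    pts = list(coords)
    if not pts:
        raise ValueError("empty geometry")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def points_to_centroid(points: Iterable[Point]) -> Point:
    """Centroid (arithmetic mean) of a cluster of points."""
    pts = list(points)
    if not pts:
        raise ValueError("empty cluster")
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)
```

For example, `envelope([(1, 5), (4, 2), (3, 7)])` yields `(1, 2, 4, 7)`, a rectangle that is cheap to compare against other envelopes.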
Techniques for Checking the Relations
• Cartesian Product Technique (Naive) – Performs exhaustive checks between all pairs of entities of the two datasets
– Complete
– Complexity: O(|S||T|) checks
• Blocking Technique [Isele et al., WebDB’11, Papadakis et al, TKDE’13]
– Divides the entities into blocks
– Decreases the number of checks
– Complete
– Complexity: O(|S||T|) checks (worst case), O(|L|) checks (best case)
* |S|, |T|: number of entities in datasets S and T; |L|: number of links between datasets S and T
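The naive technique can be sketched as follows. The helper `cartesian_links` and the interval relation `overlaps` are hypothetical names used only for illustration; the check counter makes the |S||T| cost visible.

```python
from itertools import product


def overlaps(s, t):
    """Example relation on closed intervals given as (start, end)."""
    return s[0] <= t[1] and t[0] <= s[1]


def cartesian_links(source, target, relation):
    """Check every (s, t) pair; return the links found and the number of checks."""
    links = []
    checks = 0
    for s, t in product(source, target):
        checks += 1
        if relation(s, t):
            links.append((s, t))
    return links, checks


S = [(0, 2), (5, 6)]
T = [(1, 3), (7, 8), (5, 5)]
naive_links, naive_checks = cartesian_links(S, T, overlaps)
```

Here two links are found at the cost of six checks, one per pair in S × T.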
Blocking Technique for Spatial Relations
• Divide the surface of the earth into curved rectangles (blocks)
• Adjust the area of the blocks with a blocking factor (bf) (blockArea: $\frac{1}{bf^2}$ square degrees)
• If the MBB of a geometry spatially intersects with a block, then insert it in this block
• Check for a spatial relation only within each block (independently)
• Construct the set of discovered links (𝐷𝐿𝑟) by aggregating the respective links that have been discovered within each block
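The steps above can be sketched in Python. This is a simplified illustration rather than the Silk plugin: geometries are represented directly by their MBBs, blocks form a regular grid with a side of 1/bf degrees, the relation checked is a hypothetical rectangle `intersects`, and all function names are assumptions.

```python
import math
from collections import defaultdict
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # MBB as (min_lon, min_lat, max_lon, max_lat)
BlockId = Tuple[int, int]


def blocks_of(mbb: Rect, bf: float) -> List[BlockId]:
    """Ids of all grid blocks (side 1/bf degrees) that the MBB intersects."""
    min_x, min_y, max_x, max_y = mbb
    xs = range(math.floor(min_x * bf), math.floor(max_x * bf) + 1)
    ys = range(math.floor(min_y * bf), math.floor(max_y * bf) + 1)
    return [(i, j) for i in xs for j in ys]


def intersects(s: Rect, t: Rect) -> bool:
    return s[0] <= t[2] and t[0] <= s[2] and s[1] <= t[3] and t[1] <= s[3]


def blocked_links(source, target, bf):
    """Check the relation only inside each block; return (links, checks)."""
    src_blocks = defaultdict(list)
    tgt_blocks = defaultdict(list)
    for s in source:
        for b in blocks_of(s, bf):
            src_blocks[b].append(s)
    for t in target:
        for b in blocks_of(t, bf):
            tgt_blocks[b].append(t)

    links = set()  # aggregating into a set removes links found in several blocks
    checks = 0
    for b, block_sources in src_blocks.items():
        for s in block_sources:
            for t in tgt_blocks.get(b, []):
                checks += 1
                if intersects(s, t):
                    links.add((s, t))
    return links, checks


S = [(0.0, 0.0, 0.4, 0.4), (2.1, 2.1, 2.3, 2.3)]
T = [(0.3, 0.3, 0.6, 0.6), (5.0, 5.0, 5.2, 5.2), (2.2, 2.2, 2.4, 2.4)]
links, checks = blocked_links(S, T, bf=1)
```

In this example blocking needs two checks where the Cartesian product would need six, and it finds the same links, because two intersecting rectangles always share a point and therefore a block.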
Blocking Technique for Temporal Relations
• Divide the time into intervals (blocks)
• Adjust the length of the blocks with a blocking factor (bf) (blockLength: $\frac{1}{bf}$ time units)
• If a time period or instant temporally intersects with a block, then insert it in this block
• Check for a temporal relation only within each block (independently)
• Construct the set of discovered links (𝐷𝐿𝑟) by aggregating the respective links that have been discovered within each block
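A temporal counterpart can be sketched in the same way. The sketch assumes periods are (start, end) pairs of numbers in some fixed time unit, that an instant is a period with equal start and end, and that the relation checked is a hypothetical `overlaps`, for which sharing a block is a necessary condition; the function names are illustrative.

```python
import math
from collections import defaultdict


def temporal_blocks(period, bf):
    """Indices of all blocks (length 1/bf time units) that the period intersects."""
    start, end = period
    return range(math.floor(start * bf), math.floor(end * bf) + 1)


def overlaps(s, t):
    """Closed periods share at least one time point."""
    return s[0] <= t[1] and t[0] <= s[1]


def blocked_temporal_links(source, target, bf):
    """Check the relation only inside each block; return (links, checks)."""
    src_blocks = defaultdict(list)
    tgt_blocks = defaultdict(list)
    for s in source:
        for b in temporal_blocks(s, bf):
            src_blocks[b].append(s)
    for t in target:
        for b in temporal_blocks(t, bf):
            tgt_blocks[b].append(t)

    links = set()  # a set removes links discovered in more than one block
    checks = 0
    for b, block_sources in src_blocks.items():
        for s in block_sources:
            for t in tgt_blocks.get(b, []):
                checks += 1
                if overlaps(s, t):
                    links.add((s, t))
    return links, checks


S = [(0.0, 1.5), (10.0, 10.0)]  # the second entry is an instant
T = [(1.0, 2.0), (10.0, 12.0), (20.0, 21.0)]
links, checks = blocked_temporal_links(S, T, bf=1)
```

A larger bf gives shorter blocks, which tends to reduce the checks per block at the cost of inserting long periods into more blocks.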
Blocking Technique
• Fully parallelizable with respect to the blocks
• Proven sound and complete
• 100% accurate links
• 100% precision, recall, F-measure
Extensions to the Silk Framework: Spatial and Temporal Relations
Extensions to the Silk Framework: Spatial and Temporal Transformations
Extensions to the Silk Framework
• Spatial and Temporal Extensions for Silk implemented as plugins
• Transparent to all the applications of Silk
– Single Machine
– MapReduce
– Workbench
Hands-on Silk
• Download: https://github.com/silk-framework/silk
• Workbench application pre-installed in the VM
• Discover the following links:
Source Dataset      Relation      Target Dataset
Field Boundaries    Contains      Raster Cells
OSM Water Bodies    Intersects    Natura (2000)
Natura (2000)       Within        Federal States of Germany
All the datasets will be first converted to RDF with GeoTriples!
References (1/3)
• [Bizer et al., IJSWIS’06]
Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. International
Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)
• [Christen, TKDE’11]
Christen, P.: A survey of indexing techniques for scalable record linkage and
deduplication. IEEE Transactions on Knowledge and Data Engineering (2011)
• [Auer, RW’13]
Auer, S., Lehmann, J., Ngomo, A.C.N., Zaveri, A.: Introduction to Linked Data and Its
Lifecycle on the Web. In: Rudolph, S., Gottlob, G., Horrocks, I., van Harmelen, F. (eds.)
Reasoning Web. Lecture Notes in Computer Science, vol. 8067, pp. 1–90. Springer
(2013)
• [Salas et al., TerraCognita’11]
Salas, J., Harth, A.: Finding spatial equivalences accross multiple RDF datasets. In:
Proceedings of the Terra Cognita Workshop on Foundations, Technologies and
Applications of the Geospatial Web. pp. 114–126. Citeseer (2011)
• [Sehgal et al. GIS’06]
Sehgal, V., Getoor, L., Viechnicki, P.D.: Entity resolution in geospatial data integration. In:
Proceedings of the 14th annual ACM international symposium on Advances in
geographic information systems. pp. 83–90. ACM (2006)
References (2/3)
• [Vilches-Blázquez et al., AGILE’12]
Vilches-Blázquez, L.M., Saquicela, V., Corcho, O.: Interlinking geospatial information in
the web of data. In: Bridging the Geographic Information Sciences, pp. 119–139.
Springer (2012)
• [Ngonga Ngomo, ISWC’13]
Ngonga Ngomo, A.C.: Orchid - reduction-ratio-optimal computation of geo-spatial
distances for link discovery. In: Proceedings of ISWC 2013 (2013)
• [Clementini et al., SSD'93]
Clementini, E., Di Felice, P., van Oosterom, P.: A small set of formal topological
relationships suitable for end-user interaction. In: Abel, D., Chin Ooi, B. (eds.) Advances
in Spatial Databases, Lecture Notes in Computer Science, vol. 692, pp. 277–295.
Springer Berlin Heidelberg (1993), http://dx.doi.org/10.1007/3-540-56869-7_16
• [Randell et al. KR’92]
Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In:
KR. pp. 165–176 (1992)
• [Allen, Commun. ACM’83]
Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26(11), 832–
843 (Nov 1983)
References (3/3)
• [Isele et al., WebDB’11]
Isele, R., Jentzsch, A., Bizer, C.: Efficient multidimensional blocking for link discovery
without losing recall. In: WebDB. Citeseer (2011)
• [Papadakis et al, TKDE’13]
Papadakis, G., Ioannou, E., Palpanas, T., Niederée, C., Nejdl, W.: A blocking framework
for entity resolution in highly heterogeneous information spaces. Knowledge and Data
Engineering, IEEE Transactions on 25(12), 2665–2682 (2013)
Thanks for your attention! Questions?
Transforming Natura2000 Shapefile into RDF
Kostis Kyzirakos and Dimitrianos Savva
Natura2000 (South Germany)
GeoTriples GUI
• From terminal execute: geotriples-gui
1. Connect to Natura2000 Shapefile
2. Adjust class/predicate names to ontology
3. Generate mapping
4. Dump RDF
Connect to Shapefile
GeoTriples Layout
Mapping Builder (Left)
1. Adjust triples maps
2. Change DataTypes
3. Change Predicates
Bottom Toolbar
1. Generate Mapping
2. Select your preferred geo-vocabulary
3. Define CRS
4. Select output format
5. Dump RDF
Mapping Editor (Right)
1. Change the mapping by hand
Adjust to Ontology
Generate Mapping
Dump RDF
/home/leo/datasets/naturatriples.n3
RDF graph
Store RDF graph to Strabon
# endpoint store http://localhost:8080/strabonendpoint \
    N-Triples -t /home/leo/datasets/naturatriples.n3
Transforming OpenStreetMap GML document into an RDF graph (1/4)
# cd ~/DEMO_ESWC15
# ./osmmapping.sh
--
geotriples-cmd generate_mapping \
    -o OSM/automatic-mapping.rml.ttl \
    -b http://data.linkedeodata.eu/waterways \
    -r waterways \
    -rp /ogr:FeatureCollection/gml:featureMember \
    -ns "gml|http://www.opengis.net/gml,ogr|http://ogr.maptools.org/" \
    -null -onto OSM/automatic-ontology.txt \
    -x OSM/osm_waterways.xsd OSM/osm_waterways.gml
Transforming OpenStreetMap GML document into an RDF graph (2/4)
# cp OSM/automatic-mapping.rml.ttl OSM/altered-mapping.rml.ttl
# gedit OSM/altered-mapping.rml.ttl
Transforming OpenStreetMap GML document into an RDF graph (3/4)
1. Change the class definition for the triples map <#ogr:waterwaysogr:geometryProperty>
   1. Replace the class onto:LineStringPropertyType with ogc:Geometry
2. Change the predicate that will link the thematic data with the geometric data.
   1. Find the triples map <#waterways>
   2. Replace the text onto:has_geometryProperty with ogc:hasGeometry
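For readers who prefer a scripted edit over gedit, the two replacements can be sketched in Python. This is a plain text substitution under stated assumptions: the helper names are hypothetical, the mapping file is not parsed as RML, and each term is assumed to occur only in the triples maps named above, since a global replace would otherwise change other occurrences too. It also assumes the `ogc:` prefix is already declared in the mapping.

```python
from pathlib import Path

# The two manual edits described above, expressed as text replacements.
REPLACEMENTS = {
    "onto:LineStringPropertyType": "ogc:Geometry",
    "onto:has_geometryProperty": "ogc:hasGeometry",
}


def alter_mapping(text: str) -> str:
    """Apply both replacements to the text of a mapping file."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


def alter_mapping_file(path: str) -> None:
    """Rewrite the mapping file in place (hypothetical convenience wrapper)."""
    p = Path(path)
    p.write_text(alter_mapping(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling `alter_mapping_file("OSM/altered-mapping.rml.ttl")` from `~/DEMO_ESWC15` would apply the edits to the copied mapping; it is worth reviewing the result in an editor afterwards.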
Transforming OpenStreetMap GML document into an RDF graph (4/4)
# ./osmdump.sh
--
geotriples-cmd dump_rdf -rml \
    -o OSM/osmtriples.n3 \
    -ns osm-namespaces.ns \
    OSM/altered-mapping.ttl
--
# endpoint store http://localhost:8080/strabonendpoint \
    N-Triples -t /home/leo/DEMO_ESWC15/OSM/osmtriples.n3
Store TalkingFields datasets to Strabon
# endpoint store http://localhost:8080/strabonendpoint \
    N-Triples -t /home/leo/datasets/fb.n3
# endpoint store http://localhost:8080/strabonendpoint \
    N-Triples -t /home/leo/datasets/rc.n3
Hands-on Silk
• Download: https://github.com/silk-framework/silk
• Workbench application pre-installed in the VM
• Discover the following links:
Source Dataset      Relation      Target Dataset
Field Boundaries    Contains      Raster Cells
OSM Water Bodies    Intersects    Natura (2000)
Natura (2000)       Within        Federal States of Germany
All the datasets will be first converted to RDF with GeoTriples!
Start the Silk Workbench
Open Workspace
Import the project that you will find on the Desktop of the VM
Open the Linkage Rule
Modify the Linkage Rule
Start the Link Generation
Examining Generated Links
$ less /home/leo/Desktop/FieldBoundariesRasterCellsLinks.nt
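Beyond paging through the file, the links can be summarised with a short script. The sketch below assumes the output is N-Triples in which subject, predicate and object are all IRIs, and it skips any line that does not match that shape; the helper names and the sample IRIs are hypothetical.

```python
import re
from collections import Counter

# Matches a triple whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_links(lines):
    """Yield (subject, predicate, object) for every IRI-only triple line."""
    for line in lines:
        match = TRIPLE.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)


def count_by_predicate(lines):
    """Number of links per predicate."""
    return Counter(p for _, p, _ in parse_links(lines))


# Hypothetical sample lines; real links would come from the generated file.
sample = [
    "<http://example.org/field/1> <http://example.org/contains> <http://example.org/cell/7> .",
    "# not a triple",
    "<http://example.org/field/2> <http://example.org/contains> <http://example.org/cell/9> .",
]
summary = count_by_predicate(sample)
```

To summarise the real output, open the links file and pass the file object to `count_by_predicate`, which reads it line by line.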