Querying Incomplete Geospatial Information in RDF. Charalampos Nikolaou and Manolis Koubarakis, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens. International Symposium on Spatial and Temporal Databases (SSTD) 2013, August 23, 2013.

DESCRIPTION

In this paper, we propose the problem of implementing an efficient query processing system for incomplete temporal and geospatial information in RDFi as a challenge to the SSTD community.

TRANSCRIPT

Page 1: Querying Incomplete Geospatial Information in RDF

Querying Incomplete Geospatial Information in RDF

Charalampos Nikolaou and Manolis Koubarakis

Department of Informatics and Telecommunications

National and Kapodistrian University of Athens

International Symposium on Spatial and Temporal Databases (SSTD) 2013

August 23, 2013

Page 2: Querying Incomplete Geospatial Information in RDF

Motivation

• Increased interest in publishing geospatial datasets as linked data (i.e., encoded in RDF and with semantic links to other datasets)

• Geospatial information might be:

o Quantitative (e.g., exact geometric information)

o Qualitative (e.g., topological relations)

... and express knowledge that is

o Complete

o Incomplete (or indefinite)

Page 3: Querying Incomplete Geospatial Information in RDF

Ordnance Survey (UK)

73,546,231 triples

Page 4: Querying Incomplete Geospatial Information in RDF

Global Administrative Areas (GADM)

9,896,532 triples

Page 5: Querying Incomplete Geospatial Information in RDF

Nomenclature of Territorial Units for Statistics (NUTS)

316,246 triples

Page 6: Querying Incomplete Geospatial Information in RDF

Linked Geospatial Data

As of September 2011

[Figure: cloud diagram of linked datasets. Legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.]

~ 62 billion triples

Page 7: Querying Incomplete Geospatial Information in RDF

Question

How do we manage (represent, store,

query) this data efficiently?

Page 8: Querying Incomplete Geospatial Information in RDF

Challenges: Theory

① RDF extensions for representing and querying incomplete qualitative and quantitative geospatial information

• GeoSPARQL

o Standard OGC query language for RDF data with geospatial information

o Topological relations can be expressed/queried, but no reasoning is offered.

• We proposed RDFi

o Can work with any topological/temporal constraint language with/without constant symbols (e.g., RCC-5, RCC-8, IA)

o Formal semantics and algorithm for computing certain answers

o Preliminary complexity results for various constraint languages

• No published algorithm for query processing when considering RCC-8 and constants

Page 9: Querying Incomplete Geospatial Information in RDF

RDFi by example

gag:Region rdfs:subClassOf geo:Feature.

gag:WestGreece rdf:type gag:Region.

gag:Municipality rdfs:subClassOf geo:Feature.

gag:OlympiaMuni rdf:type gag:Municipality.

noa:Hotspot rdfs:subClassOf geo:Feature.

noa:hotspot rdf:type noa:Hotspot.

noa:Fire rdfs:subClassOf geo:Feature.

noa:fire rdf:type noa:Fire.

gag:OlympiaMuni geo:hasGeometry ex:oGeo.

ex:oGeo rdf:type sf:Polygon.

ex:oGeo geo:asWKT "POLYGON((..))"^^geo:wktLiteral.

noa:hotspot geo:hasGeometry ex:rec.

ex:rec geo:asWKT "POLYGON((..))"^^geo:wktLiteral.

gag:WestGreece geo:sfContains gag:OlympiaMuni.

noa:hotspot geo:sfContains noa:fire.

[Figure: regions labelled "West Greece" and "Olympia"]

Page 10: Querying Incomplete Geospatial Information in RDF

RDFi by example (cont’d)

Query: Find fires inside the region of West Greece.

GeoSPARQL query:

CERTAIN SELECT ?f

WHERE {

?f rdf:type noa:Fire.

gag:WestGreece geo:sfContains ?f.

}

[Figure: regions labelled "West Greece" and "Olympia"]

Page 11: Querying Incomplete Geospatial Information in RDF

RDFi by example (cont’d)

Query: Find fires inside the region of West Greece.

GeoSPARQL query:

CERTAIN SELECT ?f

WHERE {

?f rdf:type noa:Fire.

gag:WestGreece geo:sfContains ?f.

}

[Figure: regions labelled "West Greece" and "Olympia", with two arrows labelled "contains"]
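The inference this slide illustrates can be sketched as a chain of containment steps. The following is a minimal illustration in Python, not the RDFi algorithm for certain answers. It assumes that the containment of noa:hotspot in gag:OlympiaMuni has already been derived by comparing the two WKT polygons (the dataset does not state it as a triple). The names CONTAINS, TYPES and certainly_contained are hypothetical.

```python
# Explicit geo:sfContains triples from the example dataset.
CONTAINS = {
    ("gag:WestGreece", "gag:OlympiaMuni"),
    ("noa:hotspot", "noa:fire"),
    # Assumption: derived from the geometries ex:oGeo and ex:rec,
    # not stated as a triple in the dataset.
    ("gag:OlympiaMuni", "noa:hotspot"),
}

# rdf:type triples from the example dataset.
TYPES = {
    "gag:WestGreece": "gag:Region",
    "gag:OlympiaMuni": "gag:Municipality",
    "noa:hotspot": "noa:Hotspot",
    "noa:fire": "noa:Fire",
}


def certainly_contained(region, contains):
    """Return every feature reachable from `region` by following contains edges."""
    children = {}
    for outer, inner in contains:
        children.setdefault(outer, set()).add(inner)

    found, stack = set(), [region]
    while stack:
        current = stack.pop()
        for inner in children.get(current, ()):
            if inner not in found:
                found.add(inner)
                stack.append(inner)
    return found


# Fires that are certainly inside West Greece under the assumptions above.
fires = sorted(
    f for f in certainly_contained("gag:WestGreece", CONTAINS)
    if TYPES.get(f) == "noa:Fire"
)
print(fires)
```

The fire is never given a geometry, so a purely geometric evaluation cannot place it; only the chain of qualitative containment facts does.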

Page 12: Querying Incomplete Geospatial Information in RDF

Challenges: Theory

② Efficient computation of the entailment relation Φ ⊨ Θ

• where Φ and Θ are quantifier-free first-order formulas of a constraint language expressing the topological relations of various frameworks (RCC-8, DE-9IM, etc.)
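The entailment check can be restated as an unsatisfiability check. The formula below is the textbook reduction from first-order logic, given here for orientation; it is not a result specific to RDFi.

```latex
\Phi \models \Theta
\quad\Longleftrightarrow\quad
\Phi \wedge \neg\Theta \ \text{is unsatisfiable}
```

Because Θ is quantifier-free, ¬Θ is again a quantifier-free formula over the same constraint language, so each entailment test becomes a consistency test.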

Page 13: Querying Incomplete Geospatial Information in RDF

Challenges: Theory

③ Computing entailment is equivalent to checking consistency of formulas with constraint networks

• Constraint networks:

o Spatial relations among regions

o Regions might be constant ones (exact geometric information) or identified by a URI

• Most recent results considered basic and complete RCC-5 networks with polygonal regions

• For RCC-8, deciding consistency is NP-complete

• No published algorithm for checking consistency

• Are there tractable cases?

Page 14: Querying Incomplete Geospatial Information in RDF

Challenges: Practice

④ Scale to billions of triples

• Reasoners from QSR scale only up to hundreds of regions with complex spatial relations

How do they perform in our case?

• Setting:

o Real linked geospatial datasets

o No constants

o Only base RCC-8 relations

o Evaluation of consistency checking using the well-known path-consistency algorithm
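A path-consistency check of the kind named above can be sketched in a few lines. To keep the composition table short, the sketch uses RCC-5 (DR, PO, PP, PPi, EQ) as a stand-in for RCC-8; the table is written out by hand here, and all names (COMP, compose, path_consistency) are hypothetical. It is an illustration of the general technique, not the implementation that was evaluated.

```python
from itertools import product

ALL = frozenset({"DR", "PO", "PP", "PPi", "EQ"})
CONVERSE = {"DR": "DR", "PO": "PO", "PP": "PPi", "PPi": "PP", "EQ": "EQ"}


def rel(*names):
    return frozenset(names)


# RCC-5 composition table: COMP[(r, s)] lists the relations possible
# between x and z when x r y and y s z.
COMP = {
    ("DR", "DR"): ALL,
    ("DR", "PO"): rel("DR", "PO", "PP"),
    ("DR", "PP"): rel("DR", "PO", "PP"),
    ("DR", "PPi"): rel("DR"),
    ("PO", "DR"): rel("DR", "PO", "PPi"),
    ("PO", "PO"): ALL,
    ("PO", "PP"): rel("PO", "PP"),
    ("PO", "PPi"): rel("DR", "PO", "PPi"),
    ("PP", "DR"): rel("DR"),
    ("PP", "PO"): rel("DR", "PO", "PP"),
    ("PP", "PP"): rel("PP"),
    ("PP", "PPi"): ALL,
    ("PPi", "DR"): rel("DR", "PO", "PPi"),
    ("PPi", "PO"): rel("PO", "PPi"),
    ("PPi", "PP"): rel("PO", "PP", "PPi", "EQ"),
    ("PPi", "PPi"): rel("PPi"),
}
for r in ALL:
    COMP[("EQ", r)] = rel(r)
    COMP[(r, "EQ")] = rel(r)


def compose(rs, ss):
    """Compose two sets of base relations using the table."""
    out = set()
    for r, s in product(rs, ss):
        out |= COMP[(r, s)]
    return frozenset(out)


def path_consistency(n, constraints):
    """Refine an n-node network; return None if an empty relation is derived."""
    net = [[ALL] * n for _ in range(n)]  # O(n^2) memory
    for i in range(n):
        net[i][i] = rel("EQ")
    for (i, j), rels in constraints.items():
        net[i][j] &= frozenset(rels)
        net[j][i] &= frozenset(CONVERSE[r] for r in rels)

    changed = True
    while changed:
        changed = False
        for k, i, j in product(range(n), repeat=3):  # O(n^3) per pass
            refined = net[i][j] & compose(net[i][k], net[k][j])
            if not refined:
                return None
            if refined != net[i][j]:
                net[i][j] = refined
                net[j][i] = frozenset(CONVERSE[r] for r in refined)
                changed = True
    return net


# 0 = West Greece, 1 = Olympia, 2 = hotspot, 3 = fire (each a proper part of the previous).
network = path_consistency(4, {(1, 0): {"PP"}, (2, 1): {"PP"}, (3, 2): {"PP"}})
print(sorted(network[3][0]))
```

The full matrix makes the cost visible: every pair of regions gets an entry, even though the real datasets state a relation for only a small fraction of pairs.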

Page 15: Querying Incomplete Geospatial Information in RDF

Experimental evaluation

Setup: Intel Xeon E5620, 2.4 GHz, 12MB L3, 48GB RAM, RAID 5, Ubuntu 12.04

• Computation of the complete constraint network

• Running time: O(n^3)

• Memory requirements: O(n^2)

n ≈ thousands to millions

[Figure: plot of experimental results; surviving annotations: "hundreds of regions", "thousands of regions" (three times), "after one day"]
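A rough estimate shows why the quadratic memory requirement matters at this scale. The sketch below assumes one byte per ordered pair of regions (an 8-bit mask with one bit per RCC-8 base relation). That encoding is an assumption made for illustration and is not taken from the slides.

```python
GIB = 1024 ** 3


def complete_network_bytes(n, bytes_per_edge=1):
    """Memory for a full n x n constraint matrix (hypothetical 1-byte encoding)."""
    return n * n * bytes_per_edge


for n in (1_000, 100_000, 1_000_000):
    size = complete_network_bytes(n)
    print(f"n = {n:>9,}: {size / GIB:,.3f} GiB")
```

Under this assumption, one million regions need 10^12 bytes (about 931 GiB), far more than the 48 GB of RAM in the setup above.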

Page 16: Querying Incomplete Geospatial Information in RDF

Network structure

• We have started working on algorithms taking into account the structure of these networks:

o Node degrees fit a power-law distribution

o Network is sparse

Page 17: Querying Incomplete Geospatial Information in RDF

Network structure (cont’d)

• Edges of three kinds: externally connected, equals, non-tangential proper part

• Reflect networks composed of components with hierarchical structure

o R-tree extensions (Papadias, Kalnis, Mamoulis, AAAI’99)

• Parallel algorithms combined with backward-chaining techniques for lazy query processing

o Graph partitioning

o Path compression data structures and indexes
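One plausible reading of "graph partitioning" together with "path compression" is a union-find (disjoint-set) structure that splits a sparse network into independent components, each of which could then be checked separately or in parallel. The slides do not say which data structure the authors have in mind, so the sketch below is only an illustration under that assumption; DisjointSet and components are hypothetical names.

```python
class DisjointSet:
    """Union-find with path compression over nodes 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # First pass: locate the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def components(n, edges):
    """Group nodes into connected components of the constraint graph."""
    ds = DisjointSet(n)
    for a, b in edges:
        ds.union(a, b)
    groups = {}
    for node in range(n):
        groups.setdefault(ds.find(node), []).append(node)
    return sorted(groups.values())


# Six regions, three stated relations: two components plus one isolated region.
print(components(6, [(0, 1), (1, 2), (4, 5)]))
```

The same structure could also merge regions linked by "equals" edges into a single node before any reasoning starts, which shrinks the network without losing information.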

Page 18: Querying Incomplete Geospatial Information in RDF

Related work: Spatial

• Qualitative spatial reasoning

- Efficient algorithms for consistency checking of constraint networks (complex spatial relations, small number of regions)

- Does not consider query processing

• Description logic reasoners

- PelletSpatial: RCC-8 reasoning (cannot handle disjunctions)

- RacerPro: RCC-8 reasoning

Page 19: Querying Incomplete Geospatial Information in RDF

Related work: Temporal

• Chaudhuri (VLDB’88)

• The knowledge representation language Telos (TOIS’90)

• Foundations of temporal constraint databases (Koubarakis, PhD thesis, ‘94)

• Qualitative temporal reasoning community (since 80s)

• SQL+i system (BNCOD‘96)

• Later system (IEEE’97)

• Hurtado and Vaisman (2006)

Page 20: Querying Incomplete Geospatial Information in RDF

Conclusions

• What’s the CHALLENGE?

Implementing an efficient query processing system for incomplete geospatial information in RDFi

• The desired system should:

o reason about qualitative and quantitative spatial information that might be incomplete

o be scalable to billions of triples in the most useful cases

Page 21: Querying Incomplete Geospatial Information in RDF

Thank you

Page 22: Querying Incomplete Geospatial Information in RDF

Dataset characteristics