Semantics-enhanced Geoscience Interoperability, Analytics, and Applications

Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH 45435

Upload: knoesis-center-wright-state-university

Post on 20-Aug-2015

Page 2

Outline

•  Semantics-empowered Cyberinfrastructure for Geoscience Applications
   –  Approaches, Benefits, and Challenges (reflecting cost/convenience/payoff trade-offs)

•  Expressive search and integration using Geospatial information
   –  SPARQL enhancements
   –  Practical applications using semantic technologies, sensor data streams, and spatial information

Page 3

Semantics-empowered Cyberinfrastructure for Geoscience Applications

Page 4

Domain Goals and Challenges

Data-driven understanding of the evolution of oceans, atmosphere, and solid earth over time through physical, chemical and biological processes.

•  Cultural challenges
   –  Proper protection, control, and credit for sharing data

•  Technological challenges
   –  Computational tools and repositories conducive to easy exchange, curation, and attribution of data

Data sharing can promote re-analysis/re-interpretation of extant data, reducing “redundant” data collections.

Page 5

Categories of Geoscience Data: Characteristics, Strategy for Reuse, and CI Strategy

Short-tail science data, created by large organizations and projects
   –  Characteristics: few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated
   –  Strategy for reuse: planned integration strategies; could use formal ontologies/domain models and vocabularies, visualization tools, and APIs
   –  CI strategy: data centers/grids, generally using relational databases and files, maintained by people with significant IT skills

Long-tail science data, created by individual scientists and small groups
   –  Characteristics: many, small (GB+), heterogeneous, invisible (except via publications), poorly curated
   –  Strategy for reuse: multi-domain and broad vocabularies (including community-established ones); create semantic metadata (annotations) and optionally publish, search, and download legacy data, or use an open data initiative
   –  CI strategy: web-based, easy-to-learn-and-use semantic tools for annotation, publication, search, and download that individual scientists without significant IT skills can use

Page 6

Our Thesis

Associating machine-processable semantics with the long tail of science data and documents can help overcome challenges associated with data discovery, integration, and interoperability caused by data heterogeneity.

Page 7

What?: Nature of Data

•  Structured Data (e.g., relational)

•  Semi-structured, Heterogeneous Documents (e.g., Geoscience publications and technical specs, which usually include text, numerics, maps and images)

•  Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)

Page 8

What?: Granularity of Semantics and Associated Applications

•  Lightweight semantics: File-level annotation to enable discovery and sharing of the long tail of science data

•  Richer semantics: Document-level annotation and extraction for semantic search and summarization

•  Fine-grained semantics: Data integration, interoperability, and reasoning in Linked Open Data

Page 9

Why?: Benefits of Lightweight Semantics

•  Ease of use by domain experts
   –  Faster and wider adoption, promoting evolution

•  Low upfront cost to support

•  Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community of geoscientists

•  Bottom-line: “Learn to Walk before we Run”

Page 10

How?: Ingredients for Semantics-based Cyberinfrastructure

•  Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)

•  Ease self-publishing and discovery

•  Data citation index to give credit for data sharing

•  Semi-automatic annotation of data and documents: manual + automatic
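The semi-automatic annotation ingredient (manual + automatic) can be sketched as a two-step loop: an automatic pass proposes controlled-vocabulary concepts whose labels occur in a description, and a curator then rejects wrong suggestions or adds missed ones. This is a minimal illustration under stated assumptions: plain whole-word label matching stands in for whatever extraction the actual system used, and the vocabulary entries and concept IDs (`geo:Granite`, etc.) are hypothetical.

```python
import re
from typing import Dict, Iterable, List

def suggest_concepts(text: str, vocabulary: Dict[str, str]) -> List[str]:
    """Automatic step: propose concept IDs whose vocabulary label occurs in the text.

    Matching is case-insensitive and on whole words, so "granite" does not match "granites".
    """
    lowered = text.lower()
    found: List[str] = []
    for label, concept in vocabulary.items():
        pattern = r"\b" + re.escape(label.lower()) + r"\b"
        if re.search(pattern, lowered) and concept not in found:
            found.append(concept)
    return found

def annotate(text: str, vocabulary: Dict[str, str],
             accepted: Iterable[str] = (), rejected: Iterable[str] = ()) -> List[str]:
    """Manual step: a curator rejects wrong suggestions and adds concepts the matcher missed."""
    rejected_set = set(rejected)
    concepts = [c for c in suggest_concepts(text, vocabulary) if c not in rejected_set]
    for c in accepted:
        if c not in concepts:
            concepts.append(c)
    return concepts
```

Keeping the automatic pass conservative (exact labels only) and leaving judgment to the curator matches the "low upfront cost" goal of lightweight semantics.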

Page 11

Example: Lightweight Semantic Registration of Data

Title of data
Keywords: selected from the five-tier vocabulary provided
Type of data: maps, Excel files, images, text
Data format: structured or unstructured
Description of data: brief unstructured description of content
Contact information of provider(s): name of provider(s), email for verification, lineage
Spatial extent of data and reference system: location
Temporal extent of data: date range, or age range if not recent
Date and type of related publication(s): journal, thesis, agency report, not published
Host site for publication: journal, library, personal computer
Access restrictions: copyright regulations
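As a rough illustration, the registration form above could be held in a simple record type with a few sanity checks. The Python names, the bounding-box representation of spatial extent, the default `EPSG:4326` reference system, and the validation rules are all assumptions made for this sketch, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

@dataclass
class DataRegistration:
    """Hypothetical record mirroring the lightweight registration fields."""
    title: str
    keywords: List[str]                 # assumed to come from the controlled (five-tier) vocabulary
    data_type: str                      # e.g., "map", "Excel file", "image", "text"
    data_format: str                    # "structured" or "unstructured"
    description: str
    provider_name: str
    provider_email: str
    # (min_lon, min_lat, max_lon, max_lat); the default CRS below is an assumption
    spatial_extent: Optional[Tuple[float, float, float, float]] = None
    reference_system: str = "EPSG:4326"
    temporal_extent: Optional[Tuple[str, str]] = None  # date range, or age range if not recent
    publication: str = "not published"  # journal, thesis, agency report, or not published
    host_site: str = ""                 # journal, library, personal computer
    access_restrictions: str = ""

def validate(reg: DataRegistration, vocabulary: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means the record passes these checks."""
    problems: List[str] = []
    if not reg.title.strip():
        problems.append("missing title")
    unknown = [k for k in reg.keywords if k not in vocabulary]
    if unknown:
        problems.append(f"keywords not in vocabulary: {unknown}")
    if reg.data_format not in ("structured", "unstructured"):
        problems.append("data format must be 'structured' or 'unstructured'")
    if "@" not in reg.provider_email:
        problems.append("provider email looks invalid")
    if reg.spatial_extent is not None:
        min_lon, min_lat, max_lon, max_lat = reg.spatial_extent
        if min_lon > max_lon or min_lat > max_lat:
            problems.append("spatial extent is not a valid bounding box")
    return problems
```

Only the fields a non-IT-specialist can fill in quickly are required; everything else defaults, in the spirit of low-cost registration.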

Page 12

System Architecture and Components

Page 13

Problems and a Practical Approach (“Where the rubber meets the road”)

Deeper Issues: Semantic Formalization of Tabular Data

Page 14

Nature of tables

•  Compact structures for sharing information
   –  Minimize duplication

•  Types of tables
   –  Regular: dense grid with explicit schema information in terms of column and row headings => tractable
   –  Irregular: sparse grid with implicit schema and ad hoc placement of headings => hard
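One way to make the regular/irregular distinction operational is a heuristic check on a parsed grid. The sketch below assumes a table already split into rows of cell strings; the `is_regular` name, the requirement of both row and column headings, and the 0.9 density threshold are illustrative choices, not criteria given in the talk.

```python
from typing import Optional, Sequence

Grid = Sequence[Sequence[Optional[str]]]

def _filled(cell: Optional[str]) -> bool:
    return cell is not None and cell.strip() != ""

def is_regular(grid: Grid, min_density: float = 0.9) -> bool:
    """Heuristic test for a 'regular' table: a rectangular grid whose first row holds
    column headings, whose first column holds row headings, and whose body is dense.
    The top-left corner cell may be empty."""
    if len(grid) < 2 or len(grid[0]) < 2:
        return False
    width = len(grid[0])
    if any(len(row) != width for row in grid):
        return False                      # ragged: not a simple rectangular grid
    if not all(_filled(c) for c in grid[0][1:]):
        return False                      # a column heading is missing
    if not all(_filled(row[0]) for row in grid[1:]):
        return False                      # a row heading is missing
    body = [c for row in grid[1:] for c in row[1:]]
    density = sum(_filled(c) for c in body) / len(body)
    return density >= min_density
```

Tables that fail the check are candidates for the manual re-expression into regular structures described in the practical approach.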

Page 15

Challenges Associated with Typical Spreadsheet/Table

•  Meant for human consumption

•  Irregular
   –  Not a simple rectangular grid

•  Heterogeneous
   –  Not all rows are interpreted the same way

•  Complex
   –  Meaning of each row and each column is context-dependent

•  Footnotes modify the meaning of entries (esp. in materials and process specifications)

Page 16

Practical Semi-Automatic Content Extraction

•  DESIGN: Develop regular data structures that can be used to formalize tabular information.
   –  Provide a natural expression of data
   –  Provide semantics to data, thereby removing potential ambiguities
   –  Enable automatic translation

•  USE: Manual population of regular tables and automatic translation into LOD
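The automatic-translation half of the USE step might look like the following for a regular table, assuming the first column identifies the subject (e.g., a sample) and each other column heading names a property. The base IRI `http://example.org/`, the heading-to-IRI mapping, and typing numeric cells as `xsd:decimal` are assumptions; a production pipeline would use a vetted vocabulary and an RDF library rather than string formatting.

```python
import re
from typing import List, Sequence
from urllib.parse import quote

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def _iri(base: str, local: str) -> str:
    # Spaces become underscores; other unsafe characters are percent-encoded.
    return f"<{base}{quote(local.strip().replace(' ', '_'))}>"

def _literal(value: str) -> str:
    if _DECIMAL.fullmatch(value):
        return f'"{value}"^^<{XSD_DECIMAL}>'
    # Escape characters that N-Triples string literals may not contain raw.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def table_to_ntriples(header: Sequence[str], rows: Sequence[Sequence[str]],
                      base: str = "http://example.org/") -> List[str]:
    """Translate a regular table into N-Triples: the first column names the subject,
    each remaining column heading names a property, and each non-empty cell is the object."""
    triples: List[str] = []
    for row in rows:
        subject = _iri(base, row[0])
        for heading, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if not cell:
                continue                  # empty cells produce no triple
            triples.append(f"{subject} {_iri(base, heading)} {_literal(cell)} .")
    return triples
```

Because the table is regular, the translation needs no per-table logic: the explicit headings already supply the schema.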

Page 17

Expressive search and integration using Geospatial information

Page 18

Outline

•  Query Language Support for Spatio-Temporal Context: SPARQL-ST (=> GeoSPARQL)

•  Practical Applications that use Spatio-Temporal information for joining Sensor Data to enable Machine Perception

Page 19

Overview: SPARQL-ST

•  SPARQL
   –  W3C-recommended query language for RDF data (a W3C Recommendation since Jan. 15, 2008)
   –  Graph pattern-based queries (subgraph match)

•  SPARQL-ST
   –  Spatial variables
   –  Temporal variables
   –  Spatial filter expressions
   –  Temporal filter expressions

Page 20

SELECT ?n WHERE {
  ?p foaf:name ?n .
  ?p usgov:hasRole ?r .
  ?r usgov:forOffice ?o .
  ?o usgov:represents ?q .
  ?q stt:located_at %g .
  ?a foaf:name "Nancy Pelosi" .
  ?a usgov:hasRole ?b .
  ?b usgov:forOffice ?c .
  ?c usgov:represents ?d .
  ?d stt:located_at %h .
  SPATIAL FILTER (distance(%g, %h) <= 100 miles)
}

Find all politicians that represent areas within 100 miles of the district represented by Nancy Pelosi.
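The SPATIAL FILTER above joins two bound locations (%g, %h) on distance. As a hedged sketch of that predicate, the code below treats locations as (lat, lon) points and uses the haversine great-circle formula on a spherical Earth; how SPARQL-ST itself represents districts (points vs. polygons) and computes distance is not stated here, so the representation and the function names are assumptions.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical-Earth assumption

def haversine_miles(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def within_distance(locations: Dict[str, Tuple[float, float]],
                    anchor: Tuple[float, float], max_miles: float) -> List[str]:
    """Names whose location satisfies distance(location, anchor) <= max_miles."""
    return [name for name, loc in locations.items()
            if haversine_miles(loc, anchor) <= max_miles]
```

In the query, `anchor` plays the role of %h (the Nancy Pelosi district) and each candidate location plays %g.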

Page 21

SELECT ?p WHERE {
  ?p usgov:hasRole ?r #t1 .
  ?r usgov:forOffice ?o #t2 .
  ?o usgov:represents ?c #t3 .
  ?c stt:located_at %g #t4 .
  SPATIAL FILTER (inside(%g, GEOM(POLYGON((-75.14 40.88, -70.77 40.88, -70.77 42.35, -75.14 42.35, -75.14 40.88)))))
  TEMPORAL FILTER (anyinteract(intersect(#t1, #t2, #t3, #t4), interval(10:01:2013, 10:31:2013, MM:DD:YYYY)))
}

Find all politicians representing congressional districts within a given geographical area at any time in October 2013.
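The two filters in this query combine a point-in-polygon test with interval reasoning over the validity times #t1 to #t4 of the matched triples. The sketch below gives one plausible reading of `inside`, `intersect`, and `anyinteract`, assuming (lon, lat) points in the WKT order used in the query and closed date intervals; it uses a standard ray-casting test and is not SPARQL-ST's implementation.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]        # (lon, lat), the coordinate order used in the WKT polygon
Interval = Tuple[date, date]       # closed interval [start, end]

def inside(point: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting point-in-polygon test; points exactly on an edge are not treated specially."""
    x, y = point
    result = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result

def intersect(*intervals: Interval) -> Optional[Interval]:
    """Common sub-interval of all inputs, or None if they share no instant."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None

def anyinteract(a: Optional[Interval], b: Interval) -> bool:
    """True if two closed intervals share at least one instant."""
    return a is not None and a[0] <= b[1] and b[0] <= a[1]
```

Intersecting #t1 to #t4 first ensures the role, office, representation, and location facts were all true at the same time before testing overlap with October 2013.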

Page 22

Summary of SPARQL-ST

•  Relationship-centric nature of the RDF data model extended for querying spatio-temporal-thematic (STT) data

•  Querying
   –  Supports spatial and temporal relationships in graph pattern queries
   –  Integrates well with current standards

•  Implementation
   –  Good scalability on large synthetic and real-world data
   –  Only system for spatial and temporal RDF

Page 23

4th Annual Spatial Ontology Community of Practice Workshop (SOCoP), USGS, 12201 Sunrise Valley Drive, Reston, VA

December 2, 2011

OGC GeoSPARQL slides by Matt Perry of Oracle (also a Kno.e.sis alumnus)

Page 24

What Does GeoSPARQL Give Us?

•  Vocabulary for query patterns
   –  Classes
      •  Spatial Object, Feature, Geometry
   –  Properties
      •  Topological relations
      •  Links between features and geometries
   –  Datatypes for geometry literals
      •  ogc:WKTLiteral, ogc:GMLLiteral

•  Query functions
   –  Topological relations, distance, buffer, intersection, …

•  Entailment components
   –  RIF rules to expand feature-feature queries into geometry queries
   –  Gives a common interface for qualitative and quantitative systems

Page 25

Example Query

PREFIX : <http://my.com/appSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ogc: <http://www.opengis.net/geosparql#>
PREFIX ogcf: <http://www.opengis.net/geosparql/functions#>
PREFIX epsg: <http://www.opengis.net/def/crs/EPSG/0/>
SELECT ?restaurant
WHERE {
  ?restaurant rdf:type :Restaurant .
  ?restaurant :cuisine :Mexican .
  ?restaurant :pointGeometry ?rGeo .
  ?rGeo ogc:asWKT ?rWKT .
}
ORDER BY ASC(ogcf:distance("POINT(…)"^^ogc:WKTLiteral, ?rWKT, ogc:KM))
LIMIT 3

Find the three closest Mexican restaurants.
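Ordering by `ogcf:distance` and keeping the first three rows is a top-k nearest-neighbour query. A minimal Python sketch of the same ranking follows, assuming 2-D WKT `POINT(lon lat)` literals with no CRS IRI prefix and a spherical Earth of radius 6371 km; the restaurant names are hypothetical, and a GeoSPARQL engine would normally answer this with a spatial index rather than a full scan.

```python
import heapq
import math
import re
from typing import Dict, List, Tuple

_POINT = re.compile(r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*",
                    re.IGNORECASE)

def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Parse a 2-D WKT literal such as 'POINT(-84.06 39.78)' into (lon, lat)."""
    m = _POINT.fullmatch(wkt)
    if m is None:
        raise ValueError(f"not a 2-D WKT POINT: {wkt!r}")
    return float(m.group(1)), float(m.group(2))

def distance_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine great-circle distance between two (lon, lat) points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest(origin_wkt: str, candidates: Dict[str, str], k: int = 3) -> List[str]:
    """Names of the k candidates closest to the origin (ORDER BY distance ... LIMIT k)."""
    origin = parse_wkt_point(origin_wkt)
    return heapq.nsmallest(
        k, candidates,
        key=lambda name: distance_km(origin, parse_wkt_point(candidates[name])))
```

`heapq.nsmallest` avoids sorting every candidate when only the top k are needed, mirroring what LIMIT 3 asks for.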

Page 26

Practical Applications that use Spatio-Temporal information

for joining Sensor Data to enable Machine Perception

Page 27

Applications using spatial and/or temporal information

•  Location-aware applications
   –  Foursquare
   –  OpenStreetMap

•  Spatio-temporal-thematic (STT) context-enhanced data integration, querying, and inferencing (machine perception)
   –  Semantic Sensor Web (+ SemSOS)
      •  Abstract weather sensor data streams to weather features
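Abstracting a weather sensor stream into weather features can be illustrated with simple threshold rules applied to readings grouped by station and time. The property names, thresholds, station IDs, and the composite `BlizzardLike` feature below are hypothetical and chosen only for the example; this is not the SSW/SemSOS implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Reading = Tuple[str, str, str, float]   # (station, timestamp, property, value)
Snapshot = Dict[str, float]             # property -> value at one station and time

# Hypothetical rules with illustrative thresholds; not official meteorological definitions.
RULES: Dict[str, Callable[[Snapshot], bool]] = {
    "Freezing": lambda s: s.get("temperature_F", float("inf")) <= 32.0,
    "HighWind": lambda s: s.get("wind_speed_mph", 0.0) >= 35.0,
    "Snowing": lambda s: s.get("snowfall_in", 0.0) > 0.0,
}

def snapshots(readings: Iterable[Reading]) -> Dict[Tuple[str, str], Snapshot]:
    """Group raw readings by (station, timestamp)."""
    grouped: Dict[Tuple[str, str], Snapshot] = defaultdict(dict)
    for station, timestamp, prop, value in readings:
        grouped[(station, timestamp)][prop] = value
    return dict(grouped)

def abstract_features(readings: Iterable[Reading]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (station, timestamp) to the weather features its readings satisfy."""
    result: Dict[Tuple[str, str], Set[str]] = {}
    for key, snap in snapshots(readings).items():
        features = {name for name, rule in RULES.items() if rule(snap)}
        # A composite feature defined in terms of simpler ones (hypothetical).
        if {"Freezing", "HighWind", "Snowing"} <= features:
            features.add("BlizzardLike")
        result[key] = features
    return result
```

Downstream queries can then ask about features ("where is it freezing?") instead of raw numeric readings.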

Page 28

•  Applications supporting expressive queries
   –  Human-comprehensible vs. machine-processable
      •  GeoNames (LOD) ↔ lat-long, GPS data
         –  What is the current temperature or traffic delay at Dayton International Airport?
   –  Knowledge-based query expansion/reasoning
      •  Bridging vocabulary mismatches between queries and data, e.g., using semantic relationships between regions and landmark locations
         –  Find schools in OH
         –  Find schools near Wright State University

(cont’d)
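The "schools in OH" example fails under literal matching when the data records only towns. A sketch of knowledge-based expansion follows: the query region is matched against the transitive part-of closure of each school's town. "Near a landmark" is approximated, crudely, by the landmark's enclosing region; a real system could instead use GeoNames lat-long data. The gazetteer, places, and schools are hypothetical.

```python
from typing import Dict, List

# Hypothetical gazetteer fragment: each place maps to the region that directly contains it.
# Assumed acyclic (a real gazetteer such as GeoNames would supply this hierarchy).
PART_OF: Dict[str, str] = {
    "Town A": "County X", "Town B": "County X", "Town C": "County Y",
    "County X": "OH", "County Y": "OH",
    "Town D": "County Z", "County Z": "MI",
    "Landmark U": "Town A",
}

# Hypothetical data: each school is annotated only with its town, never with "OH".
SCHOOL_TOWN: Dict[str, str] = {
    "School 1": "Town A", "School 2": "Town B", "School 3": "Town C", "School 4": "Town D",
}

def ancestors(place: str) -> List[str]:
    """All regions containing the place, innermost first (transitive part-of closure)."""
    chain: List[str] = []
    while place in PART_OF:
        place = PART_OF[place]
        chain.append(place)
    return chain

def schools_in(region: str) -> List[str]:
    """Expanded query: a school matches if its town is the region or lies inside it."""
    return sorted(s for s, town in SCHOOL_TOWN.items()
                  if town == region or region in ancestors(town))

def schools_near(landmark: str, levels_up: int = 2) -> List[str]:
    """Crude 'near': schools inside the landmark's enclosing region, levels_up steps out."""
    chain = ancestors(landmark)
    if not chain:
        return []
    return schools_in(chain[max(1, min(levels_up, len(chain))) - 1])
```

With the default, "near Landmark U" resolves to County X, so it returns the two schools in that county.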

Page 29

Semantic Sensor Observation Service Architecture: Making the Data Smart

Page 30

Semantic Sensor Web (SSW) demo with MesoWest data (machine perception)

http://archive.knoesis.org/projects/sensorweb/demos/semsos_mesowest/ssos_demo.htm

Page 31

Implementation of Perception Cycle

Page 32

Trusted Perception Cycle Demo

http://www.youtube.com/watch?v=lTxzghCjGgU

Page 33

Sensor Discovery on Linked Data Demo

http://archive.knoesis.org/projects/sensorweb/demos/sensor_discovery_on_lod/sample.htm

Page 34

Thank you, and please visit us at http://knoesis.org/

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA