

Presenting Semantic Data Through “Instance Hubs” Using Authoritative URI Design Schemes

Alexei Bulazel 1 ([email protected]), Dominic Difranzo 1 ([email protected]), James Hendler 1 ([email protected])

1 Tetherless World Constellation, Department of Computer Science, Rensselaer Polytechnic Institute

INTRODUCTION

METHODS

1. CSV files containing data of interest are converted to RDF using the csv2rdf4lod-automation tool developed at RPI. The RDF is then loaded into a triple store.

2. LODSPeaKr views are then defined, giving rules as to what should be presented at URIs on the basis of the type of resource that they reference. Queries to both local and external data sources (“endpoints”) are written to obtain data about the resources found at given URIs.

3. LODSPeaKr services are defined for intermediate URIs that do not describe resources, and are instead used to provide listings of resources.

4. Users can now traverse the URI scheme of any given resource in the instance hub, seeing categorical data presented at varying levels of specificity. Users can also request data in a variety of formats: HTML for human-readable output, or machine-readable formats such as Turtle, RDF/XML, or JSON.
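Step 1 above can be illustrated with a minimal sketch. It does not reproduce csv2rdf4lod-automation or its output; it only shows the general idea of turning CSV rows into RDF triples (serialized here as N-Triples) that a triple store could load. The `example.org` namespaces, the column names, and the helper functions are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this sketch.
BASE = "http://example.org/id/us/state/"
VOCAB = "http://example.org/vocab/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, key_column: str) -> list[str]:
    """Emit one triple per non-key cell; the key column names the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Spaces become underscores so the URI stays readable (New_York).
        local_name = quote(row[key_column].strip().replace(" ", "_"))
        subject = f"<{BASE}{local_name}>"
        for column, value in row.items():
            # Skip the key itself, empty cells, and overflow cells.
            if column is None or column == key_column or not value:
                continue
            prop = quote(column.strip().replace(" ", "_"))
            triples.append(
                f'{subject} <{VOCAB}{prop}> "{escape_literal(value)}" .'
            )
    return triples


SAMPLE = "name,abbreviation\nNew York,NY\nVermont,VT\n"
NTRIPLES = csv_to_ntriples(SAMPLE, key_column="name")
print("\n".join(NTRIPLES))
```

Each printed line is one subject, predicate, object statement. A real conversion would also assign types and link to existing vocabularies, which this sketch omits.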

RESULTS & DISCUSSION

• Through the use of the LODSPeaKr framework, we were able to greatly speed up the process of publishing semantic data while also creating a richer, easier-to-use user interface.

• We implemented hierarchical URI navigation as per design standards developed at the Tetherless World, Data.gov, and Data.gov.uk. Users are now able to logically navigate the hierarchy of URIs in the instance hub, viewing categorical information at varying levels of specificity.

• Users are able to use content negotiation to request and obtain data in a variety of different semantic formats.

• Future work will integrate datasets catalogued by the Tetherless World lab with country and agency pages.

• Other types of data will be integrated with the instance hub beyond just government-related information. Crops and toxic chemicals are present in the Tetherless World’s previous instance hub, and we plan on porting them over to this new LODSPeaKr-based instance hub.
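The content negotiation mentioned in the results can be sketched as follows. This is not LODSPeaKr's implementation; it is a simplified illustration of how a server might map an HTTP `Accept` header to one of the output formats named above. The media types are the standard ones for HTML, Turtle, RDF/XML, and JSON, while the short format labels and function names are assumptions.

```python
# Media types a hypothetical instance hub could serve, mapped to
# illustrative format labels.
SUPPORTED = {
    "text/html": "html",
    "text/turtle": "turtle",
    "application/rdf+xml": "rdfxml",
    "application/json": "json",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, q) pairs, highest q first, header order on ties."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal-q entries keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_format(accept_header: str, default: str = "html") -> str:
    """Pick the first supported format the client accepts."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
        if media_type == "*/*":
            return default
    # Simplification: fall back to the default instead of answering 406.
    return default


print(choose_format("text/turtle"))
print(choose_format("application/rdf+xml;q=0.9, text/html;q=0.5"))
```

A browser, which typically lists `text/html` first, would get the human-readable page, while a client asking for `text/turtle` would get RDF from the same URI.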

On the Semantic Web there is a need for ways to express authoritative references to entities in a form that describes those entities for both computers and humans. So-called “instance hubs” provide a way of doing this for collections of entities in related categories, e.g., countries, US states, US government agencies, crops, and toxic chemicals. A URI (uniform resource identifier) is a string of characters that identifies a name or a resource on the web (e.g., http://logd.tw.rpi.edu/id/us/state/New_York). The commonly used term "URL" refers to the subset of URIs that also specify where a resource is located and how to retrieve it (e.g., www.google.com). In instance hubs, URI design is of great importance: the URI for each element should describe that element while also allowing intuitive exploration and navigation through hierarchically structured instance categories.

OBJECTIVE

Previous research into instance hubs, conducted by RPI researchers at the Tetherless World Constellation working with the US Government’s Data.gov, was done on an ad hoc basis using hand-written PHP scripts. While the semantic conversion and querying processes for instance hub data are well established, a consistent, easy-to-use process for presenting that data does not yet exist. To make instance hub creation faster, we used LODSPeaKr (Linked Open Data Simple Publishing Kit), a framework developed at RPI for creating linked data applications and publishing RDF data online. One of LODSPeaKr’s greatest strengths for instance hub work is the ability to easily define “services”, which present categorically related data at varying degrees of specificity gleaned from the URI, in keeping with URI design standards developed by Data.gov and Data.gov.uk in collaboration with the Tetherless World.
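The idea of a listing "service" for a category URI can be sketched as below. This is not LODSPeaKr code or configuration; it is a plain illustration under the assumption that instances belong to a category when their URI starts with the category's path. The `example.org` URIs, the agency name, and both functions are hypothetical. `STRSTARTS`, `STR`, and `isIRI` are standard SPARQL 1.1 functions.

```python
# Hypothetical instance URIs standing in for the contents of a triple store.
INSTANCES = [
    "http://example.org/id/us/state/New_York",
    "http://example.org/id/us/state/Vermont",
    "http://example.org/id/us/fed/agency/Example_Agency",
    "http://example.org/id/country/Canada",
]


def list_instances(category_uri: str, instances: list[str]) -> list[str]:
    """Return every instance whose URI falls under the category URI."""
    # The trailing slash keeps ".../id/us" from matching ".../id/usa/...".
    prefix = category_uri.rstrip("/") + "/"
    return sorted(uri for uri in instances if uri.startswith(prefix))


def listing_query(category_uri: str) -> str:
    """Build a SPARQL query that a listing service could send to an endpoint.

    No escaping is done here, so the URI must come from a trusted source.
    """
    prefix = category_uri.rstrip("/") + "/"
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) '
        "} ORDER BY ?s"
    )


US_INSTANCES = list_instances("http://example.org/id/us", INSTANCES)
print(US_INSTANCES)
print(listing_query("http://example.org/id/us/state"))
```

The shorter the category URI, the broader the listing: `id/us` returns states and federal agencies together, while `id/us/state` returns only states.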

The anatomy of a URI. The URI path is descriptive of the entity it refers to, and can be cut off at any point to view data pertaining to the category or entity found there.
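Cutting a URI off at successive points can be sketched in a few lines. The function name and the choice of `id` as the root segment are assumptions for illustration; the example URI is the one given in the introduction.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestor_uris(uri: str, root: str = "id") -> list[str]:
    """Return the URI and each truncation of its path, most specific first.

    Truncation stops at the `root` path segment, which is assumed to be
    the top of the instance hierarchy.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if root not in segments:
        raise ValueError(f"path has no '{root}' segment: {uri}")
    start = segments.index(root)
    result = []
    # Drop one trailing segment at a time until only the root remains.
    for end in range(len(segments), start, -1):
        path = "/" + "/".join(segments[:end])
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


LEVELS = ancestor_uris("http://logd.tw.rpi.edu/id/us/state/New_York")
for level in LEVELS:
    print(level)
```

Each printed URI is a point where a user could stop and expect a page: the entity itself, then its category, then each broader category up to the root.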

Diagram of URI hierarchy. “id” contains all instances, while “id/us” contains all US instances, and “id/us/fed” contains all US federal instances, etc.

Example US federal agency page