modeling and representing national climate assessment information using linked data

1
Modeling and Representing National Climate Assessment Information using Linked Data Jin Guang Zheng 1 ([email protected] ) , Curt Tilmes 2 ( ctilmes@usgcrp. gov ) , Aaron Smith 2 ([email protected] ) , Stephan Zednik 1 ([email protected] ), Xiaogang Ma 1 (max7@rpi .edu ) Peter Fox 1 (foxp@ rpi.edu ) ( 1 Rensselaer Polytechnic Institute, Troy, NY, 12180, 2 United States Global Change Research Program, Washington DC) Every four years, Earth scientists work together on a National Climate Assessment (NCA) report which integrates, evaluates, and interprets the findings of climate change and impacts on affected industries such as agriculture, natural environment, energy production and use, etc. Given the amount of information presented in each report, and the wide range of information sources and topics, it can be difficult for users to find and identify desired information. To ease the user effort of information discovery, well- structured metadata is needed that describes the report's key statements and conclusions and provide for traceable provenance of data sources used. We present an assessment ontology developed to describe terms, concepts and relations required for the NCA metadata. Wherever possible, the assessment ontology reuses terms from well-known ontologies such as Semantic Web for Earth and Environmental Terminology (SWEET) ontology, Dublin Core (DC) vocabulary. We have generated sample National Climate Assessment metadata conforming to our assessment ontology and publicly exposed via a SPARQL-endpoint and website. We have also modeled provenance information for the NCA writing activities using the W3C recommendation- candidate PROV-O ontology. Using this provenance the user will be able to trace the sources of information used in the assessment and therefore make trust decisions. In the future, we are planning to implement a faceted browser over the metadata to enhance metadata traversal and information discovery. ABSTRACT USE CASE SCENARIO THE LINKED DATA OF AN NCA REPORT PROVENANCE USE CASE: The reader of the NCA report wishes to identify the dataset used to generate a particular gure in the report. S/he is directed rst to the gure caption. Selecting the caption displays a page of information about the gure, and, if the gure was originally published in another paper, including a link via the paper’s DOI to the publisher’s site describing that paper and offering it for download. The page of information also includes references to the datasets used in the paper on which the gure was based. Following each of the dataset links presents a page of information about the dataset, including links back to the agency/data center web page which provides more detail on the dataset (metadata) and from which the actual data may be available for order or download. Figure 1: An illustration of the provenance use case using the National Climate Assessment 2009 Report, Chapter Southeast. ONTOLOGIES Poster: MT15A-08 REFERENCE: Sponsors: National Science Foundation The GCIS Ontology defines a set of concepts and relations to model and represent the information of the NCA Report, including the information discussed in the Use Case Scenario. The Ontology reuses many concepts and relations across different ontologies, including FOAF 3 , Dublin Core 4 , and SKOS 5 . The Ontology is encoded using the Web Ontology Language. Structured data access for the NCA Report are designed according to the principles of Linked Data 1 . The structured data are stored in a triple store, a database for Semantic Web data, and can be accessed via Sparql Endpoint. Data entities are linked to Dbpedia 2 resources, which provides more descriptions for the data entity. Users can easily navigate and browse the data via the Linked Data API interface. Data are available in multiple formats including RDF and developer-friendly JSON format FUTURE IMPLEMENTATION The NCA Report contains wide range of information sources and topics, it can be difficult for users to find and identify desired information. Faceted Browse System provides a way for user to better browse and navigate the structured data. The planned Faceted Browse System will be built to browse the structured data included in the NCA report. We modeled provenance information for the NCA writing activities using the W3C recommendation- candidate PROV-O ontology. Using this provenance the user will be able to trace the sources of information used in the assessment and therefore make trust decisions. Figure 3: Subset of GCIS Ontology for NCA Report Figure 4: Example of Provenance Modeling of NCA Report Figure 2: Presenting Linked Data use LinkedData API Figure 5: S2S Faceted Browse System, an option for faceted browsing of the NCA report 1. http://www.w3.org/DesignIssues/LinkedData 2. http://www.dbpedia.org/ 3. http://www.foaf-project.org/ 4. http://dublincore.org/documents/dcmi-terms/ 5. http://www.w3.org/2004/02/skos/

Upload: hastin

Post on 23-Mar-2016

28 views

Category:

Documents


0 download

DESCRIPTION

USE CASE SCENARIO. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Modeling and Representing National Climate Assessment Information using Linked Data

Modeling and Representing National Climate Assessment Information

using Linked Data

Jin Guang Zheng1 ([email protected]), Curt Tilmes2 (ctilmes@usgcrp.

gov), Aaron Smith2 ([email protected]), Stephan Zednik1 ([email protected]), Xiaogang Ma1 ([email protected]) Peter Fox1 ([email protected]) (1Rensselaer Polytechnic Institute, Troy, NY, 12180, 2United States Global Change Research Program, Washington DC)

Every four years, Earth scientists work together on a National Climate Assessment (NCA) report which integrates, evaluates, and interprets the findings of climate change and impacts on affected industries such as agriculture, natural environment, energy production and use, etc. Given the amount of information presented in each report, and the wide range of information sources and topics, it can be difficult for users to find and identify desired information. To ease the user effort of information discovery, well-structured metadata is needed that describes the report's key statements and conclusions and provide for traceable provenance of data sources used. We present an assessment ontology developed to describe terms, concepts and relations required for the NCA metadata. Wherever possible, the assessment ontology reuses terms from well-known ontologies such as Semantic Web for Earth and Environmental Terminology (SWEET) ontology, Dublin Core (DC) vocabulary. We have generated sample National Climate Assessment metadata conforming to our assessment ontology and publicly exposed via a SPARQL-endpoint and website. We have also modeled provenance information for the NCA writing activities using the W3C recommendation-candidate PROV-O ontology. Using this provenance the user will be able to trace the sources of information used in the assessment and therefore make trust decisions. In the future, we are planning to implement a faceted browser over the metadata to enhance metadata traversal and information discovery.

ABSTRACT

USE CASE SCENARIO

THE LINKED DATA OF AN NCA REPORT

PROVENANCE USE CASE: The reader of the NCA report wishes to identify the dataset used to generate a particular figure in the report. S/he is directed first to the figure caption. Selecting the caption displays a page of information about the figure, and, if the figure was originally published in another paper, including a link via the paper’s DOI to the publisher’s site describing that paper and offering it for download. The page of information also includes references to the datasets used in the paper on which the figure was based. Following each of the dataset links presents a page of information about the dataset, including links back to the agency/data center web page which provides more detail on the dataset (metadata) and from which the actual data may be available for order or download.

Figure 1: An illustration of the provenance use case using the National Climate Assessment 2009 Report, Chapter Southeast.

ONTOLOGIES

Poster: MT15A-08REFERENCE:

Sponsors: National Science Foundation

• The GCIS Ontology defines a set of concepts and relations to model and represent the information of the NCA Report, including the information discussed in the Use Case Scenario.

• The Ontology reuses many concepts and relations across different ontologies, including FOAF3, Dublin Core4, and SKOS5.

• The Ontology is encoded using the Web Ontology Language.

• Structured data access for the NCA Report are designed according to the principles of Linked Data1.

• The structured data are stored in a triple store, a

database for Semantic Web data, and can be accessed via Sparql Endpoint.

• Data entities are linked to Dbpedia2 resources, which provides more descriptions for the data entity.

• Users can easily navigate and browse the data via the Linked Data API interface.

• Data are available in multiple formats including RDF and developer-friendly JSON format

FUTURE IMPLEMENTATION

The NCA Report contains wide range of information sources and topics, it can be difficult for users to find and identify desired information.

Faceted Browse System provides a way for user to better browse and navigate the structured data.

The planned Faceted Browse System will be built to browse the structured data included in the NCA report.

• We modeled provenance information for the NCA writing activities using the W3C recommendation-candidate PROV-O ontology.

• Using this provenance the user will be able to trace the sources of information used in the assessment and therefore make trust decisions.

Figure 3: Subset of GCIS Ontology for NCA Report Figure 4: Example of Provenance Modeling of NCA Report

Figure 2: Presenting Linked Data use LinkedData API

Figure 5: S2S Faceted Browse System, an option for faceted browsing of the NCA report

1. http://www.w3.org/DesignIssues/LinkedData2. http://www.dbpedia.org/3. http://www.foaf-project.org/ 4. http://dublincore.org/documents/dcmi-terms/5. http://www.w3.org/2004/02/skos/