M.Benno Blumenthal and John del Corral

International Research Institute for Climate and Society

http://iridl.ldeo.columbia.edu/ontologies/

Using a Resource Description Framework (RDF) to carry metadata for climate datasets

Why RDF?

Make implicit semantics explicit

Web-based system for interoperating semantics

RDF/OWL is an emerging technology, so tools are being built that help solve the semantic problems in handling data

Standard Metadata

Diagram labels: Users, Datasets, Tools, Standard Metadata Schema/Data Services

Many Data Communities

Diagram: five data communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema

Super Schema

Diagram: five data communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema

Super Schema: direct

Diagram: five data communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema/data service

Flaws

• A lot of work

• Super Schema/Service is the Lowest-Common-Denominator

• Science keeps evolving, so that standards either fall behind or constantly change

RDF Standard Data Model Exchange

Diagram: five data communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, plus one shared standard metadata schema; the connections are labeled RDF

RDF Data Model Exchange

Diagram: five data communities, each with its own Tools, Users, Datasets, and Standard Metadata Schema, each labeled with RDF, plus a standard metadata schema also labeled RDF

Why is this better?

• Maps the original dataset metadata into a standard format that can be transported and manipulated

• Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but

• When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping

• Can use enhanced mappings between models that have common concepts beyond the least-common-denominator

• EASIER – tools to enhance the mapping process, mappings build on other mappings
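
The remapping argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `local:`, `std1:` and `std2:` property names, the two mapping tables, and the helper `bind` are hypothetical and are not taken from the IRI Data Library.

```python
# Original, complete-but-nonstandard metadata kept as (subject, property, value)
# triples. Every name here is a hypothetical illustration.
ORIGINAL = [
    ("sst", "local:long_name", "Sea Surface Temperature"),
    ("sst", "local:units", "degC"),
    ("sst", "local:sensor", "AVHRR"),
]

# A first, least-common-denominator standard understands only two properties.
MAPPING_V1 = {"local:long_name": "std1:title", "local:units": "std1:units"}

# A later, richer standard also has a place for the sensor.
MAPPING_V2 = {**MAPPING_V1, "local:sensor": "std2:instrument"}


def bind(triples, mapping):
    """Apply a property mapping at read time; unmapped triples stay in ORIGINAL."""
    return [(s, mapping[p], o) for (s, p, o) in triples if p in mapping]


v1_view = bind(ORIGINAL, MAPPING_V1)
v2_view = bind(ORIGINAL, MAPPING_V2)
```

Because the mapping is applied when the metadata is read rather than when it is stored, switching to `MAPPING_V2` recovers the sensor information without touching `ORIGINAL`.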

RDF Architecture

Diagram labels: RDF (many sources), Virtual (derived) RDF, queries

Example: Search Interface

Diagram labels: Search Interface, Users, Datasets, Search Ontology, Dataset Ontology, Additional Semantics

Distinctive Features of the search

• Search terms are interrelated

• Terms that describe the set of returns are displayed (spanning and not)

• Returned items also have structure (sub-items and superseded items are not shown)

Architectural Features of the search

• Multiple search structures possible

• Multiple languages possible

• Search structure is kept in the database, not in the code

http://iridl.ldeo.columbia.edu/ontologies/query2.pl

RDF: framework for writing connections

Triplets of

• Subject

• Property (or Predicate)

• Object

URIs identify things, i.e. most of the above

Namespaces are used as a convenient shorthand for the URIs
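
As a rough illustration of the triple-plus-namespace idea, the following Python sketch stores one statement as a tuple and expands prefixed names into full URIs. The `dc` URI is the Dublin Core elements namespace; the `ex` prefix, its URI, and the helper `expand` are assumptions made for this example only.

```python
# Namespace prefixes are shorthand for full URIs. The "ex" prefix and its URI
# are hypothetical; "dc" is the Dublin Core elements namespace.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/datasets#",
}


def expand(qname):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# One statement = (subject, property, object). Here the object is a plain literal.
triple = ("ex:WOA", "dc:title", "NOAA NODC WOA01")
subject, prop, obj = triple
full = (expand(subject), expand(prop), obj)
```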

Datatype Properties

{WOA} dc:title “NOAA NODC WOA01”

{WOA} dc:description “NOAA NODC WOA01: World Ocean Atlas 2001, an atlas of objectively analyzed fields of major ocean parameters at monthly, seasonal, and annual time scales. Resolution: 1x1; Longitude: global; Latitude: global; Depth: [0 m,5500 m]; Time: [Jan,Dec]; monthly”

Object Properties

{WOA} iridl:isContainerOf {Grid-1x1},

{Grid-1x1} iridl:isContainerOf {Monthly}

WOA01 diagram

Standard Properties

{WOA} dcterm:hasPart {Grid-1x1},

{Grid-1x1} dcterm:hasPart {MONTHLY}

Alternatively

{WOA} iridl:isContainerOf {Grid-1x1},

{iridl:isContainerOf} rdfs:subPropertyOf {dcterm:hasPart}
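
The "alternatively" form works because a reasoner can derive the dcterm:hasPart statements from the sub-property declaration. A minimal Python sketch of that single derivation step follows; the function name `subproperty_triples` is made up for the example, and it handles only one level of sub-property, not chains.

```python
SUBPROPERTY = "rdfs:subPropertyOf"

stated = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("Grid-1x1", "iridl:isContainerOf", "Monthly"),
    ("iridl:isContainerOf", SUBPROPERTY, "dcterm:hasPart"),
}


def subproperty_triples(triples):
    """Derive (s, q, o) for every stated (s, p, o) where p is a sub-property of q."""
    parents = {}
    for s, p, o in triples:
        if p == SUBPROPERTY:
            parents.setdefault(s, set()).add(o)
    return {
        (s, q, o)
        for s, p, o in triples
        for q in parents.get(p, ())
    }


derived = subproperty_triples(stated)
```

With this, software that only understands dcterm:hasPart can still read data published with the more specific iridl:isContainerOf.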

Data Structures in RDF

{SST} rdf:type {cfatt:non_coordinate_variable},

{SST} cfobj:standard_name {cf:sea_surface_temperature},

{SST} netcdf:hasDimension {longitude}

Object properties provide a framework for explicitly writing down relationships between data objects/components, e.g. vague meaning of nesting is made explicit

Properties also can be related, since they are objects too

Search Interface Term

• http://iri.columbia.edu/~benno/sampleterm.pdf

Virtual Triples

Use Conventions to connect concepts to established sets of concepts

Generate additional “virtual” triples from the original set and semantics

RDFS – some property/class semantics

OWL – additional property/class semantics: more sophisticated (ontological) relationships

SWRL – rules for constructing virtual triples
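
One way to picture virtual triples is a loop that applies rules to the stated triples until nothing new appears. The Python sketch below does this with a single rule modeled on owl:inverseOf. It is an illustration only, not the reasoner used at IRI: the property `ex:isContainedIn` and the inverse declaration are hypothetical.

```python
def inverse_rule(triples):
    """If (p, owl:inverseOf, q) and (s, p, o) are present, derive (o, q, s)."""
    inverses = {(s, o) for s, p, o in triples if p == "owl:inverseOf"}
    out = set()
    for p, q in inverses:
        for s, pred, o in triples:
            if pred == p:
                out.add((o, q, s))
            if pred == q:
                out.add((o, p, s))
    return out


def materialize(stated, rules):
    """Apply every rule repeatedly until no new triples appear."""
    known = set(stated)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


stated = {
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    # Hypothetical declaration, for illustration only.
    ("iridl:isContainerOf", "owl:inverseOf", "ex:isContainedIn"),
}
virtual = materialize(stated, [inverse_rule]) - stated
```

The original triples are never modified; the virtual ones can be recomputed whenever the rules or semantics change.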

OWL

Language for expressing ontologies, i.e. the semantics are very important. However, even without a reasoner to generate the implied RDF statements, OWL classes and properties represent a sophistication of the RDF Schema

However, there are many world views in how to express concepts: concepts as classes vs concepts as individuals vs concept as predicate

Define terms

• Attribute Ontology

• Object Ontology

• Term Ontology

Attribute Ontology

• Subjects are the only type-object

• Predicates are “attributes”

• Objects are datatype

• Isomorphic to simple data tables

• Isomorphic to netcdf attributes of datasets

• Some faceted browsers: predicate = facet

Object Ontology

• Objects are object-type

• Isomorphic to “belongs to”

• Isomorphic to multiple data tables connected by keys

• Express the concept behind netcdf attributes which name variables

• Concepts as objects can be cross-walked

• Concepts as objects can be interrelated

Example: controlled vocabulary

{variable} cfatt:standard_name {“string”}

Where string has to belong to a list of possibilities.

{variable} cfobj:standard_name {stdnam}

Where stdnam is an individual of the class cfobj:StandardName

Example: controlled vocabulary

Bidirectional crosswalk between the two is somewhat trivial, which means all my objects will have both cfatt:standard_name and cfobj:standard_name
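
A minimal Python sketch of such a two-way crosswalk, assuming a small lookup table between the string form and the individual form. The table contents, the variable names `SST` and `T2M`, and the function `crosswalk` are illustrative assumptions, not the IRI implementation.

```python
# Hypothetical controlled list: string form -> individual of cfobj:StandardName.
STANDARD_NAMES = {
    "sea_surface_temperature": "cf:sea_surface_temperature",
    "air_temperature": "cf:air_temperature",
}
BY_INDIVIDUAL = {ind: name for name, ind in STANDARD_NAMES.items()}


def crosswalk(triples):
    """Return the input plus the missing cfatt/cfobj counterpart of each standard_name."""
    out = set(triples)
    for s, p, o in triples:
        if p == "cfatt:standard_name" and o in STANDARD_NAMES:
            out.add((s, "cfobj:standard_name", STANDARD_NAMES[o]))
        elif p == "cfobj:standard_name" and o in BY_INDIVIDUAL:
            out.add((s, "cfatt:standard_name", BY_INDIVIDUAL[o]))
    return out


both = crosswalk({
    ("SST", "cfatt:standard_name", "sea_surface_temperature"),
    ("T2M", "cfobj:standard_name", "cf:air_temperature"),
})
```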

Example: controlled vocabulary

If I am writing software to read/write netcdf files, I use the cfatt ontology and in particular cfatt:standard_name

If I am making connections/cross-walks to other variable naming standards, I use cfobj:standard_name

Term Ontology

Concepts as individuals

Simple Knowledge Organization System (SKOS) is a prime example

The ontology used here is slightly different: facets are classes of terms rather than being top_concepts

Nuanced tagging

Concepts as objects can be interrelated: specific terms imply broader terms

Object ends up being tagged with terms ranging from general to specific.

Search can then be nuanced

Tagging can proceed in the absence of perfect information
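
The "specific terms imply broader terms" idea can be sketched as a graph walk. In the Python below, the term names and the `DIRECTLY_IMPLIES` table are hypothetical stand-ins for trm:directlyImplies statements.

```python
# Hypothetical trm:directlyImplies edges: specific term -> broader term(s).
DIRECTLY_IMPLIES = {
    "sea_surface_temperature": {"temperature", "ocean"},
    "temperature": {"physical_quantity"},
}


def implied_terms(term):
    """Return the term plus every broader term reachable via directlyImplies."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DIRECTLY_IMPLIES.get(current, ()))
    return seen


tags = implied_terms("sea_surface_temperature")
coarse = implied_terms("temperature")
```

An object tagged only with the coarse term still matches a search for "temperature", which is how tagging can proceed without perfect information.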

Faceted Search Explicated

Search Interface

• Items (datasets/maps)

• Terms

• Facets

• Taxa

Search Interface Semantic API

{item} dc:title dc:description rss:link iridl:icon dcterm:isPartOf {item2} dcterm:isReplacedBy {item2}

{item} trm:isDescribedBy {term}

{term} a {facet} of {taxa} of {trm:Term},

{facet} a {trm:Facet},

{taxa} a {trm:Taxa},

{term} trm:directlyImplies {term2}
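
A search over {item} trm:isDescribedBy {term} statements might look like the following Python sketch. The items and terms are invented, and reading "spanning" as "terms shared by every returned item" is an assumption of this example rather than something the slides state.

```python
# Hypothetical {item} trm:isDescribedBy {term} data, with implied terms already added.
DESCRIBED_BY = {
    "item_a": {"ocean", "temperature", "climatology"},
    "item_b": {"ocean", "temperature", "monthly"},
    "item_c": {"atmosphere", "precipitation", "monthly"},
}


def search(selected):
    """Return items described by every selected term, plus the terms describing them."""
    hits = {item for item, terms in DESCRIBED_BY.items() if selected <= terms}
    term_sets = [DESCRIBED_BY[item] for item in hits]
    all_terms = set().union(*term_sets)
    shared = set.intersection(*term_sets) if term_sets else set()
    # shared: terms on every hit; the remainder can narrow the result further.
    return hits, shared, all_terms - shared


hits, shared, refining = search({"ocean"})
```

Because the data drives the result, the same function serves any facet structure stored as triples.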

RDF Architecture

Diagram labels: RDF (many sources), Virtual (derived) RDF, queries

IRI RDF Architecture

Diagram labels: Data Servers, Ontologies, MMI, JPL, Standards Organizations, Start Point, RDF Crawler, RDFS Semantics, OWL Semantics, SWRL Rules, SeRQL CONSTRUCT, Search Queries, Location Canonicalizer, Time Canonicalizer, Sesame, Search Interface, bibliography

Cast of Characters

NC – netcdf data file format

CF – Climate and Forecast metadata convention for netcdf

SWEET – Semantic Web for Earth and Environmental Terminology (OWL Ontology)

IRIDL – IRI Data Library

Diagram labels: CF attributes, SWEET Ontologies (OWL), Search Terms, CF Standard Names (RDF object), IRIDL Terms, NC basic attributes, IRIDL attributes/objects, SWEET as Terms, CF Standard Names As Terms, Gazetteer Terms, CF data objects, Location

Thoughts

• Pure RDF framework seems currently viable for a moderate collection of data

• Potential for making a lot of implicit data conventions explicit

• Explicit conventions can improve interoperability

• Simple RDF concepts can greatly impact searches

Future Work Possibilities

More Usable Search Interface

Tagging Interface that uses tag interrelationships to simplify choices

Data Format translation using semantics

“Related Object Browsing”: given a dataset, find related data, papers, images

Document/execute/create analysis trees

Stovepipe conventions/bash-to-fit

Less Monolithic IRI Data Library

Question: Canonical Objects

Given a set of individuals related by owl:sameAs, semantics says predicates that point to one individual should point to all the individuals. But as a programmer I usually want to work with one (the canonical object).
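
One possible approach, offered only as a sketch, is to group the owl:sameAs individuals and rewrite every triple to use a single representative per group. In the Python below, choosing the lexicographically smallest name as the canonical one is an arbitrary assumption, and the individual names are invented for the example.

```python
def canonical_map(same_as_pairs):
    """Map each individual to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Assumption: the lexicographically smallest name is canonical.
            lo, hi = sorted((ra, rb))
            parent[hi] = lo
    return {x: find(x) for x in list(parent)}


def canonicalize(triples, same_as_pairs):
    """Rewrite subjects and objects to their canonical individual."""
    canon = canonical_map(same_as_pairs)
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}


pairs = [
    ("ex:sst", "cf:sea_surface_temperature"),
    ("cf:sea_surface_temperature", "sweet:SeaSurfaceTemperature"),
]
result = canonicalize(
    {("A", "ex:tagged", "ex:sst"), ("B", "ex:tagged", "sweet:SeaSurfaceTemperature")},
    pairs,
)
```

This trades the full owl:sameAs semantics (every predicate pointing at every equivalent individual) for a smaller set of triples that is easier to program against.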

Question: Concept Class Membership

Seemingly the only way to use a concept which is expressed as an OWL class is rdf:type, i.e. membership. Should I really be putting datasets into conceptual classes? Shouldn’t there be more choice of relationship between a concept and a dataset?

Question: OWL/SKOS Crosswalk

Given a concept in both OWL and Term (SKOS) frameworks, is it possible to crosswalk between them, in particular preserving the ability of Term to have different predicates with the item being described?

I’ve tried dual-defined objects, not sure it works yet …

Stovepipe Conventions

• Fixed Schema

• Agreed upon metadata domain

• Agreed upon data domain

• Designed to be a partial solution

General server software needs to decide whether data legitimately fits the standard

User contemplates bash-to-fit

Overview

Diagram labels: IRI Data Collection (Dataset, Variable, ivar; multidimensional), Generalized Data Tools (Data Viewer, Data Language), Specialized Data Tools (Maproom), URL/URI for data, calculations, figs, etc

IRI Data Collection

Diagram labels: Dataset, Variable, ivar (multidimensional); Economics, Public Health (“geolocated by entity”); GIS (“geolocation by vector object or projection metadata”); Ocean/Atm (“geolocated by lat/lon”, multidimensional); spectral harmonics, equal-area grids, GRIB grid codes, climate divisions

IRI Data Collection

Diagram labels: Dataset, Variable, ivar; Servers (OpenDAP, THREDDS); GRIB, netCDF, images, binary; Database (Tables, queries); spreadsheets, shapefiles; images w/proj

IRI Data Collection

Diagram labels: Dataset, Variable, ivar; Calculations (“virtual variables”); images, graphics; descriptive and navigational pages; OpenGIS (WMS v1.3, WCS); Data Files (netcdf, binary, Images, GeoTiff); Clients (OpenDAP, THREDDS); Tables; Servers (OpenDAP, THREDDS); GRIB, netCDF, images, binary; Database (Tables, queries); spreadsheets, shapefiles; images w/proj

IRI General Data Tools

Data Page

IRI General Data Tools

Data Viewer

IRI General Data Tools

Cut and Paste

IRI Map Room

Malaria Early Warning System

• Front page illustrates most recent dekadal rainfall estimates (FEWS RFE)

• Change dates to view different time periods

• Administrative and epidemiological overlays available

• Click and drag box across map to zoom