in search of what some of it means rda semantics and metadata workshop feb 23, 2015 peter fox (rpi)...

Post on 20-Jan-2016

213 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

In Search of What Some of It Means

RDA Semantics and Metadata Workshop

Feb 23, 2015

Peter Fox (RPI) pfox@cs.rpi.eduTetherless World Constellation

Metadata and documentation

Not more code!

Spectral synthesis components and flow

Getting the metadata?

6

What I wanted ~ 1994-6

Scientists should be able to access a global, distributed knowledge base of scientific data that:

• appears to be integrated

• appears to be locally available

But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And, it is almost always created in a manner to facilitate its generation not its use.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

What I was doing…

pro read_spec, spectra_name, description, auxiliary_info, model_size, mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, brightness_temperature, index1, index2, percent

ncopts = 0;

description_start=0

description_edges=80

i=0

j=0

k=0

; Construct the DB filename

ncid=ncdf_open(string(getenv("SPECTRA")))

inq_struct=ncdf_inquire(ncid)

; /* get dimension info */

tmp_id = ncdf_dimid(ncid, "comment_dim")

ncdf_diminq,ncid, tmp_id, dummy, comment_dim

tmp_id=ncdf_dimid(ncid, "mu_dim")

ncdf_diminq,ncid, tmp_id, dummy, mu_dim

tmp_id=ncdf_dimid(ncid, "wave_dim")

ncdf_diminq,ncid, tmp_id, dummy, wave_dim

tmp_id=ncdf_dimid(ncid, "model_dim")

ncdf_diminq,ncid, tmp_id, dummy, model_dim

tmp_id=ncdf_dimid(ncid, "smodel_dim")

ncdf_diminq,ncid, tmp_id, dummy, smodel_dim

tmp_id=ncdf_dimid(ncid, "item_dim")

ncdf_diminq,ncid, tmp_id, dummy, item_dim

What I was doing… etc.

tmp_id = ncdf_varid (ncid, "description")

ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, description

; Id's for variables

tmp_id=ncdf_varid(ncid, "spectra_name")

ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, spectra_name

tmp_id=ncdf_varid(ncid, "auxiliary_info")

ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=comment_dim, auxiliary_info

tmp_id=ncdf_varid(ncid, "model_size")

ncdf_varget,ncid, tmp_id, OFFSET=0,COUNT=item_dim, model_size

start=intarr(1)

edges=intarr(1)

start(0)=0

edges(0)=model_size

tmp_id=ncdf_varid(ncid, "mu_size")

ncdf_varget,ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges

tmp_id=ncdf_varid(ncid, "model")

ncdf_varget,ncid, tmp_id, model, OFFSET=start, COUNT=edges

start=intarr(2)

edges=intarr(2)

start(0)=0

edges(0)=smodel_dim

start(1)=0

edges(1)=model_size

tmp_id=ncdf_varid(ncid, "smodel")

ncdf_varget,ncid, tmp_id, smodel, OFFSET=start, COUNT=edges

What does It all Mean?

Some version of this…

10

Data Information Knowledge

Context

PresentationOrganization

IntegrationConversation

CreationGathering

Experience

~Metadata?

It and Meaning

• It = things that matter– Context

• Meaning = duh -> semantics• Relations!! Real ones!

• But it was more than that, though that often comes later…– Syntax (structure/form)– Semantics (meaning)– Pragmatics (use)

Metadata-Information-Knowledge Ecosystem

12

Metadata Information Knowledge

Context

FormalizationOrganization

IntegrationShared Conceptualization

CreationGathering

Experience

Provenance

• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility

• Provenance: metadata in a given context! Swallow that.

• Knowledge provenance; meaning and relations in multiple contexts!

Perfect is the enemy of the good… (thanks Voltaire)

Origins …

• In 2000-2001 the need for capturing and preserving knowledge in science data became very clear but the barriers were high

• In 2004 we started a virtual observatory project based on semantic technologies

• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations

• We knew we also needed integration and provenance (but that came later)

• We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)

Tetherless World Constellation 15

In 2004

• 2004 – OWL was a W3 recommendation!!• Protégé 2.x and the Protégé-Java-OWL

API• SWOOP was a viable editor• Jena and the Jena API were in good

shape• Pellet worked• SPARQL was still a twinkle in the RDF

working group’s eye• Semantics were still the realm of computer

scientists

Tetherless World Constellation 16

Design and Development

• We made a conscious decision only to develop ontologies that were required to answer specific use cases and migrate metadata– Both Classes AND Properties (uh-oh…)

• We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said)

• We were pretty sure that rules would be needed (complex logic or late semantic binding)

• We ignored query (see implementation)

Tetherless World Constellation 17

18

Use Case example

• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.

• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.– Meanings and relations

• Objects=Things!– Neutral temperature is a (temperature is a) parameter– Millstone Hill is a (ground-based observatory is a) observatory– Fabry-Perot is a interferometer is a optical instrument is a instrument– Non-vertical mode is a instrument operating mode– January 2000 is a date-time range– Time is a independent variable/ coordinate– Time series is a data plot is a data product

• Metadata just appeared everywhere…

Semantics - Modern informatics enables a new scale-free** framework approach

• Use cases

• Stakeholders

• Distributed authority

• Access control

• Ontologies

• Maintaining Identity

Semantics between 2004 and 2009

• Ontologies were needed for data integration and provenance and mediation for data mining

• Protégé 3.x and then 4.0 came out• SWOOP development was interrupted• Cmap added OWL predicate support*• SPARQL became a recommendation• Triple stores exploded in use and capability• Linked Open Data started to take off• Pellet 2.0 came out• I used the “M” word less frequently!

Tetherless World Constellation 22

Working with knowledge

Expressivity

Maintainability/ Extensibility

Implementability

Working with semantics

Query

Rule execution

Inference

Semantics between 2009 and now

• Semantic data framework (SeSF)• Substantial knowledge provenance work• Data quality, uncertainty and bias

representations and applications (oh, these are in production at NASA)

• Multi-sensor data synergy advisor• Applications:

– Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov ….

Tetherless World Constellation 25

Respect and Mediation … how

Discovering new data

NCA links to GCIS entities

28

http://data.globalchange.gov

Information model

29

Ontology

Core and Framework Semantics - Multi-tiered interoperability

used by

Closing thoughts

• Go ahead, create all the metadata you want, we’ll “materialize” some of it into triples based on semantics for use!

• Go ahead, create all the schema and encodings you want but remember – semantics now lives in an open-world (some of it). You are not the only source of metadata. Not all formal. Link over map.

• Semantics make metadata useful but we do not need all of your metadata

Tetherless World Constellation 31

Contact

• pfox@cs.rpi.edu

• http://tw.rpi.edu

• @taswegian

top related