In Search of What Some of It Means

RDA Semantics and Metadata Workshop

Feb 23, 2015

Peter Fox (RPI) pfox@cs.rpi.edu Tetherless World Constellation

Metadata and documentation

Not more code!

Spectral synthesis components and flow

Getting the metadata?

What I wanted ~ 1994-6

Scientists should be able to access a global, distributed knowledge base of scientific data that:

• appears to be integrated

• appears to be locally available

But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And, it is almost always created in a manner to facilitate its generation not its use.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

What I was doing…

pro read_spec, spectra_name, description, auxiliary_info, model_size, $
    mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, $
    brightness_temperature, index1, index2, percent

ncopts = 0
description_start = 0
description_edges = 80
i = 0
j = 0
k = 0

; Construct the DB filename
ncid = ncdf_open(string(getenv("SPECTRA")))
inq_struct = ncdf_inquire(ncid)

; Get dimension info
tmp_id = ncdf_dimid(ncid, "comment_dim")
ncdf_diminq, ncid, tmp_id, dummy, comment_dim
tmp_id = ncdf_dimid(ncid, "mu_dim")
ncdf_diminq, ncid, tmp_id, dummy, mu_dim
tmp_id = ncdf_dimid(ncid, "wave_dim")
ncdf_diminq, ncid, tmp_id, dummy, wave_dim
tmp_id = ncdf_dimid(ncid, "model_dim")
ncdf_diminq, ncid, tmp_id, dummy, model_dim
tmp_id = ncdf_dimid(ncid, "smodel_dim")
ncdf_diminq, ncid, tmp_id, dummy, smodel_dim
tmp_id = ncdf_dimid(ncid, "item_dim")
ncdf_diminq, ncid, tmp_id, dummy, item_dim

What I was doing… etc.

tmp_id = ncdf_varid(ncid, "description")
ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, description

; IDs for variables
tmp_id = ncdf_varid(ncid, "spectra_name")
ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, spectra_name
tmp_id = ncdf_varid(ncid, "auxiliary_info")
ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, auxiliary_info
tmp_id = ncdf_varid(ncid, "model_size")
ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=item_dim, model_size

start = intarr(1)
edges = intarr(1)
start(0) = 0
edges(0) = model_size
tmp_id = ncdf_varid(ncid, "mu_size")
ncdf_varget, ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges
tmp_id = ncdf_varid(ncid, "model")
ncdf_varget, ncid, tmp_id, model, OFFSET=start, COUNT=edges

start = intarr(2)
edges = intarr(2)
start(0) = 0
edges(0) = smodel_dim
start(1) = 0
edges(1) = model_size
tmp_id = ncdf_varid(ncid, "smodel")
ncdf_varget, ncid, tmp_id, smodel, OFFSET=start, COUNT=edges

What does It all Mean?

Some version of this…

[Diagram: Data, Information, Knowledge. Process labels: Creation, Gathering; Presentation, Organization; Integration, Conversation. Other labels: Context, Experience. Annotation: ~Metadata?]

It and Meaning

• It = things that matter
  – Context
• Meaning = duh -> semantics
  • Relations!! Real ones!
• But it was more than that, though that often comes later…
  – Syntax (structure/form)
  – Semantics (meaning)
  – Pragmatics (use)

Metadata-Information-Knowledge Ecosystem

[Diagram: Metadata, Information, Knowledge. Process labels: Creation, Gathering; Formalization, Organization; Integration, Shared Conceptualization. Other labels: Context, Experience.]

Provenance

• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility

• Provenance: metadata in a given context! Swallow that.

• Knowledge provenance; meaning and relations in multiple contexts!
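One way to read "provenance: metadata in a given context" is that the same item can carry different metadata depending on the context it is viewed in. The sketch below is only an illustration of that reading; the class name, field names, and sample values are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Metadata about one item, qualified by the context in which it holds."""
    item: str      # hypothetical identifier of the thing being described
    context: str   # hypothetical context label, e.g. "generation" or "use"
    metadata: dict = field(default_factory=dict)


def in_context(records, item, context):
    """Return the metadata dictionaries recorded for `item` in one context."""
    return [r.metadata for r in records
            if r.item == item and r.context == context]


# Illustrative records only: one item described in two contexts.
records = [
    ProvenanceRecord("spectrum-001", "generation", {"source": "model run"}),
    ProvenanceRecord("spectrum-001", "use", {"intended_for": "comparison plot"}),
]

generation_view = in_context(records, "spectrum-001", "generation")
```

Knowledge provenance, in this reading, would mean keeping the meaning and relations consistent across several such contexts rather than within one.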

Perfect is the enemy of the good… (thanks Voltaire)

Origins …

• In 2000-2001 the need for capturing and preserving knowledge in science data became very clear but the barriers were high

• In 2004 we started a virtual observatory project based on semantic technologies

• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations

• We knew we also needed integration and provenance (but that came later)

• We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)

In 2004

• 2004 – OWL was a W3C recommendation!!
• Protégé 2.x and the Protégé-Java-OWL API
• SWOOP was a viable editor
• Jena and the Jena API were in good shape
• Pellet worked
• SPARQL was still a twinkle in the RDF working group’s eye
• Semantics were still the realm of computer scientists

Design and Development

• We made a conscious decision only to develop ontologies that were required to answer specific use cases and migrate metadata
  – Both Classes AND Properties (uh-oh…)

• We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said)

• We were pretty sure that rules would be needed (complex logic or late semantic binding)

• We ignored query (see implementation)

Use Case example

• Plot the neutral temperature from the Millstone Hill Fabry-Perot, operating in the non-vertical mode during January 2000 as a time series.
  – Meanings and relations
• Objects=Things!
  – Neutral temperature is a (temperature is a) parameter
  – Millstone Hill is a (ground-based observatory is an) observatory
  – Fabry-Perot is an interferometer is an optical instrument is an instrument
  – Non-vertical mode is an instrument operating mode
  – January 2000 is a date-time range
  – Time is an independent variable/coordinate
  – Time series is a data plot is a data product

• Metadata just appeared everywhere…
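The "is a" chains in this use case can be written down as a small lookup table and walked mechanically. The following is a minimal sketch in Python, assuming a plain dictionary as the store. The talk's actual work used OWL ontologies, so the `IS_A` table and the `ancestors` and `is_a` helpers are illustrative names, not part of the original project.

```python
# Hypothetical is-a table transcribed from the use case terms.
IS_A = {
    "neutral temperature": "temperature",
    "temperature": "parameter",
    "Millstone Hill": "ground-based observatory",
    "ground-based observatory": "observatory",
    "Fabry-Perot": "interferometer",
    "interferometer": "optical instrument",
    "optical instrument": "instrument",
    "non-vertical mode": "instrument operating mode",
    "January 2000": "date-time range",
    "time": "independent variable",
    "time series": "data plot",
    "data plot": "data product",
}


def ancestors(term):
    """Follow is-a links upward; return every broader class, nearest first."""
    chain = []
    seen = {term}
    while term in IS_A:
        term = IS_A[term]
        if term in seen:  # guard against an accidental cycle in the table
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term, cls):
    """True if `cls` is reachable from `term` by following is-a links."""
    return cls in ancestors(term)


fabry_perot_chain = ancestors("Fabry-Perot")
```

Even this toy version shows why the metadata "appeared everywhere": a request that names only the Fabry-Perot already implies that it is an instrument, without anyone storing that fact on the data file itself.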

Semantics - Modern informatics enables a new scale-free** framework approach

• Use cases

• Stakeholders

• Distributed authority

• Access control

• Ontologies

• Maintaining Identity

Semantics between 2004 and 2009

• Ontologies were needed for data integration and provenance and mediation for data mining

• Protégé 3.x and then 4.0 came out
• SWOOP development was interrupted
• Cmap added OWL predicate support*
• SPARQL became a recommendation
• Triple stores exploded in use and capability
• Linked Open Data started to take off
• Pellet 2.0 came out
• I used the “M” word less frequently!

Working with knowledge

Expressivity

Maintainability/ Extensibility

Implementability

Working with semantics

Query

Rule execution

Inference

Semantics between 2009 and now

• Semantic data framework (SeSF)
• Substantial knowledge provenance work
• Data quality, uncertainty and bias representations and applications (oh, these are in production at NASA)
• Multi-sensor data synergy advisor
• Applications:
  – Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov ….

Respect and Mediation … how

Discovering new data

NCA links to GCIS entities

http://data.globalchange.gov

Information model

Ontology

Core and Framework Semantics - Multi-tiered interoperability

used by

Closing thoughts

• Go ahead, create all the metadata you want, we’ll “materialize” some of it into triples based on semantics for use!

• Go ahead, create all the schema and encodings you want but remember – semantics now lives in an open-world (some of it). You are not the only source of metadata. Not all formal. Link over map.

• Semantics make metadata useful but we do not need all of your metadata
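To make "materialize some of it into triples" concrete, here is a minimal sketch in Python. It assumes a flat metadata record and a hand-written mapping from metadata keys to predicates. The `ex:` names, the `PREDICATES` table, the `materialize` function, and the sample record are all hypothetical; the talk does not specify a mapping or a vocabulary.

```python
# Hypothetical mapping from metadata keys to predicate names.
# Only keys listed here carry semantics we want to use.
PREDICATES = {
    "instrument": "ex:measuredBy",
    "parameter": "ex:hasParameter",
    "observatory": "ex:fromObservatory",
}


def materialize(subject, metadata, predicates=PREDICATES):
    """Turn the mapped part of a metadata record into (s, p, o) triples."""
    triples = []
    for key, value in metadata.items():
        predicate = predicates.get(key)
        if predicate is None:
            continue  # unmapped metadata is left where it is, not converted
        triples.append((subject, predicate, value))
    return triples


# Illustrative record: three mapped fields and one that is not needed.
record = {
    "instrument": "Fabry-Perot",
    "parameter": "neutral temperature",
    "observatory": "Millstone Hill",
    "file_checksum": "not-needed-here",
}

triples = materialize("ex:dataset1", record)
```

The unmapped `file_checksum` entry is skipped, which mirrors the last bullet: the semantics decide which metadata gets used, and the rest can stay in its original form.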

Contact

• pfox@cs.rpi.edu

• http://tw.rpi.edu

• @taswegian