Metadata Semantics and the Earth System Curator
Rocky Dunlap, Earth System Curator, Georgia Tech
Post on 18-Dec-2015
Earth System Curator
• 3-year NSF-funded project
• Funded collaborators:
  • Cecelia DeLuca (NCAR, PI)
  • Balaji (GFDL, Co-PI)
  • Don Middleton (NCAR, Co-PI)
  • Chris Hill (MIT, Co-PI)
  • Spencer Rugaber (Ga Tech, Co-PI)
  • Leo Mark (Ga Tech)
  • Julien Chastang (NCAR)
  • Sergey Nikonov (GFDL)
  • Angela Navarro (Ga Tech)
  • Me (Ga Tech)
• Also working with:
  • Lois and Katherine (NMM)
  • Sophie Valcke (PRISM/OASIS)
  • Others...
Curator Doctrine
• Currently a gap in the way we treat models and datasets (are they really so different?)
• Best description of a dataset is a comprehensive description of the model run that created the dataset (+ post processing)
• Model components are data objects for exchange
• Metadata-centric view
  • Don’t start with a dataset and try to find the metadata... Start with good metadata that leads you to the datasets you want, even if they don’t yet exist! (No, really, that’s how we think.)
• Haiku are a valid form of model metadata
Earth System Curator Applications (Proofs of Concept)
• Catalog of modeling components along with comprehensive metadata
  • CDP Curator (Michael B., Don, Luca, Julien)
• Demonstrate compatibility checking of components
  • Primarily “technical” compatibility: platforms, compilers, required fields, field data types, calendar/time
• Demonstrate auto-generation of coupler component based on metadata
• Demonstrate automation of workflow tasks
  • Model assembly, execution, archive, post-processing
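The "technical" compatibility check listed above could be sketched roughly as follows. This is only an illustration: the record layout and the field names (`platforms`, `compilers`, `calendar`, `exports`, `imports`) are hypothetical and are not taken from any Curator or NMM schema.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentMetadata:
    """Hypothetical technical metadata for one model component."""
    name: str
    platforms: set
    compilers: set
    calendar: str
    # Field name -> data type, e.g. {"surface_temp": "float64"}.
    exports: dict = field(default_factory=dict)
    imports: dict = field(default_factory=dict)


def check_compatibility(producer, consumer):
    """Return a list of problems; an empty list means no conflict was found."""
    problems = []
    if not producer.platforms & consumer.platforms:
        problems.append("no common platform")
    if not producer.compilers & consumer.compilers:
        problems.append("no common compiler")
    if producer.calendar != consumer.calendar:
        problems.append(
            f"calendar mismatch: {producer.calendar} vs {consumer.calendar}")
    # Every field the consumer requires must be exported with the same type.
    for name, dtype in consumer.imports.items():
        if name not in producer.exports:
            problems.append(f"required field '{name}' is not provided")
        elif producer.exports[name] != dtype:
            problems.append(
                f"field '{name}' has type {producer.exports[name]}, expected {dtype}")
    return problems


# Made-up example components.
atm = ComponentMetadata("atm", {"linux-x86"}, {"ifort", "gfortran"}, "gregorian",
                        exports={"surface_temp": "float64"})
ocn = ComponentMetadata("ocn", {"linux-x86"}, {"gfortran"}, "noleap",
                        imports={"surface_temp": "float32", "wind_stress": "float64"})
problems = check_compatibility(atm, ocn)
for problem in problems:
    print(problem)
```

The point of the sketch is that every check reads only metadata; no model code has to be compiled or run to detect the conflict.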
Schema Development Fun
To accomplish these goals, we need:
• Comprehensive descriptions of climate models: model metadata
• Includes both “semantic” and “syntactic” elements (“discovery” vs. “use”)
  • Semantic: component name, type, owner, description, source code location, component architecture of model, platform, framework
  • Syntactic: parameter settings, input datasets, boundary conditions, coupling details, grid coordinates
Lots of schemata...
• Component (NMM)
• Potential Model (NMM/Curator)
• Model (NMM)
• PMIOD/SMIOC (PRISM coupling spec)
• CRE/Curator Complete (workflow)
• Application (NMM)
• Gridspec
Reminiscing on Metadata Development
Observations:
• (It seems) much of the community is in support of metadata development
  • Although there are different opinions on levels of comprehensiveness
• People using metadata for different reasons:
  • Annotate large datasets for retrieval
  • Inform analysis tools
  • Archiving of modeling components
  • Automation of workflow (runtime environ.)
  • Exchange datasets
• Each application requires different (but often overlapping) metadata
How should we think about schemata?
• Schemata are typically written for applications:
  • I have a particular task I want to accomplish
  • What metadata do I need to accomplish it?
  • Write a schema.
• But... now we have lots of schemata sitting around
  • They may contain overlapping information
  • Different ways of expressing the same information
  • Each schema is used for a small number of tasks and understood by a small number of applications
  • May need to reference elements in another schema, or aggregate elements from multiple schemata
A Unified View of Metadata
• Given all of the current metadata development efforts, Curator is promoting a unified view of metadata
• Metadata reuse must be a priority
• Metadata aggregation is key: schemata built (generated!) from repository of existing metadata elements (let’s call them types)
• We must think conceptually first and then syntactically; ideally, all groups will agree at both levels
What’s In a Schema?
XML Schema (e.g., gridspec.xsd), containing XML types:
• GridTile
• ContactRegion
• Boundary
• GridDescriptor
These are syntactic and conceptual constructs
Re-using schema elements
How do I best use/re-use metadata elements from (multiple) schema(ta) to accomplish my particular application?
You need:
• A conceptual understanding of the “types” (concepts) in the schema → Glossary
• The syntactic representation of that type (so you can actually use it in implementations) → XML Type Library
WE ARE HERE
Multi-Schema Semantic Glossary
• Community-wide glossary of metadata types/concepts from multiple schemata
• Concepts aggregated into a centralized glossary
• Schema authors and users can get explanations/definitions of metadata elements. Examples:
  • What does the contact_region tag mean in the Gridspec schema?
  • What goes under the intent tag in the PMIOD?
  • What is a potential model anyway?
Multi-Schema Semantic Glossary
For each metadata concept provide:
• Human-readable definition
• Source schema
• Example usage
• Change notes/provenance
• Semantic relationships with other concepts (e.g., broader than, narrower than, part of, parent of, synonym, etc.)
Glossary Design
• Schema authors embed descriptions directly inside each XML schema
  • Keep the human-readable definitions close to the formal syntactic definitions
  • When schema is updated, it is easy to update glossary
• Glossary entries from distributed schemata are harvested (nightly?) and placed into centralized glossary (alternatively, live access?)
• Simple interface allows users to query glossary for concepts
Glossary Design
• Simple Knowledge Organization System (SKOS) data model for glossary entries
  • http://www.w3.org/2004/02/skos/
• SKOS supports knowledge organization systems like glossaries, thesauri, taxonomies, etc.
• RDF based: move the community toward languages with higher semantics (eventually get down to dataset level)
Sample SKOS RDF (Basic)
<skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
  <skos:prefLabel>potential model</skos:prefLabel>
  <skos:definition>
    A set of components at the source code level that can potentially form an executable model....
  </skos:definition>
</skos:Concept>
Where should glossary entries be stored?
Example Annotated Schema
...
<xsd:complexType name="PotentialModel">
  <xsd:annotation>
    <xsd:documentation>
      <skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
        <skos:prefLabel>potential model</skos:prefLabel>
        <skos:definition>
          A set of components at the source code level that can potentially form an executable model.
        </skos:definition>
      </skos:Concept>
    </xsd:documentation>
  </xsd:annotation>
  <!-- rest of complexType definition goes here -->
</xsd:complexType>
...
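Harvesting glossary entries from a schema annotated this way might look like the sketch below, which uses only Python's standard `xml.etree.ElementTree`. It is an illustration, not the Curator harvester: the function name `harvest_concepts`, the returned dictionary keys, and the `http://example.org/...` URI (standing in for the elided one on the slide) are assumptions.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for XML Schema, RDF and SKOS.
NS = {
    "xsd": "http://www.w3.org/2001/XMLSchema",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# A small annotated schema; the rdf:about URI is a placeholder.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <xsd:complexType name="PotentialModel">
    <xsd:annotation>
      <xsd:documentation>
        <skos:Concept rdf:about="http://example.org/schema/1.0#PotentialModel">
          <skos:prefLabel>potential model</skos:prefLabel>
          <skos:definition>
            A set of components at the source code level that can
            potentially form an executable model.
          </skos:definition>
        </skos:Concept>
      </xsd:documentation>
    </xsd:annotation>
  </xsd:complexType>
</xsd:schema>"""


def harvest_concepts(schema_text):
    """Collect the SKOS concept embedded in each complexType annotation."""
    root = ET.fromstring(schema_text)
    entries = []
    for ctype in root.iter(f"{{{NS['xsd']}}}complexType"):
        path = "xsd:annotation/xsd:documentation/skos:Concept"
        for concept in ctype.iterfind(path, NS):
            label = concept.findtext("skos:prefLabel", default="", namespaces=NS)
            definition = concept.findtext("skos:definition", default="", namespaces=NS)
            entries.append({
                "type_name": ctype.get("name"),
                "uri": concept.get(f"{{{NS['rdf']}}}about"),
                "pref_label": label.strip(),
                # Collapse the indentation and line breaks inside the definition.
                "definition": " ".join(definition.split()),
            })
    return entries


entries = harvest_concepts(SCHEMA)
for entry in entries:
    print(entry["type_name"], "->", entry["pref_label"])
```

Because the definition lives inside the type's own `xsd:annotation`, the harvester can always pair a glossary entry with the syntactic type it describes.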
Sample SKOS RDF Triples
esc:PotentialModel  rdf:type  skos:Concept
esc:PotentialModel  skos:prefLabel  ‘potential model’
esc:PotentialModel  skos:definition  ‘A set of components at the source code level that can potentially form an executable model.’
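To make the triple view concrete, here is a minimal sketch that stores the three triples as plain Python tuples and looks them up by pattern. It is a toy stand-in for an RDF store, not a real RDF library: the `match` helper is hypothetical, and the prefixed names are kept as plain strings rather than expanded to full IRIs.

```python
# Each triple is (subject, predicate, object).
triples = {
    ("esc:PotentialModel", "rdf:type", "skos:Concept"),
    ("esc:PotentialModel", "skos:prefLabel", "potential model"),
    ("esc:PotentialModel", "skos:definition",
     "A set of components at the source code level that can potentially "
     "form an executable model."),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All concepts in the store, then the preferred label of one of them.
concepts = [t[0] for t in match(triples, p="rdf:type", o="skos:Concept")]
labels = [t[2] for t in match(triples, s="esc:PotentialModel", p="skos:prefLabel")]
print(concepts, labels)
```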
Other SKOS Fields
<skos:Concept rdf:about="http://purl.oclc.org/NMM/Model/011/#model">
  <skos:prefLabel>model</skos:prefLabel>
  <skos:definition>
    The root element of a NMM Model description. There is one model per xml file. This model can have one or more related component configurations.
  </skos:definition>
  <skos:altLabel>simulation</skos:altLabel>
  <skos:altLabel>job</skos:altLabel>
  <skos:altLabel>run</skos:altLabel>
  <skos:example>UK Met Office Unified Model</skos:example>
  <skos:related rdf:resource="http://...NMMPotentialModel/1.0/#PotentialModel"/>
  <skos:changeNote rdf:parseType="Resource">
    <rdf:value>The label 'model' was changed from NMM_Model.</rdf:value>
    <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">
      <foaf:Person xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <foaf:name>Katherine Bouton</foaf:name>
        <foaf:mbox rdf:resource="mailto:..."/>
      </foaf:Person>
    </dc:creator>
    <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2007-02-02</dc:date>
  </skos:changeNote>
  <dc:source rdf:resource="http://purl.oclc.org/NMM/Model"/>
</skos:Concept>
Semantic Relationships
[Diagram: the concepts esc:PotentialModel, nmm:Component, nmm:Model and prism:Model, linked by skosx:childOf, skos:related and skos:synonym relationships]
Putting it all Together
1. Namespace schemata (e.g., NMM, Curator-NMM, Gridspec, ESG), marked up with glossary metadata (terms, definitions, relationships)
2. Aggregate Glossary RDF: glossary metadata harvested nightly
3. Joseki RDF Server
4. Glossary Web Application on Tomcat (www.earthsystemcurator.org/glossary), using SPARQL queries
5. Client web browser: search for terms, view relationships, etc.
More info:
http://glossary.earthsystemcurator.org/
http://www.earthsystemcurator.org/index.php?option=com_content&task=view&id=54&Itemid=84
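A SPARQL query that the glossary web application might send to the RDF server could be assembled as in the sketch below. The query shape and the helper name `build_definition_query` are assumptions made for illustration; the slides do not show the actual queries. Only the SKOS namespace URI and standard SPARQL syntax are relied on.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def build_definition_query(pref_label):
    """Build a SPARQL SELECT query for concepts with exactly this label."""
    # Escape backslashes and double quotes so the label stays inside the literal.
    escaped = pref_label.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"PREFIX skos: <{SKOS_NS}>",
        "SELECT ?concept ?definition",
        "WHERE {",
        "  ?concept a skos:Concept ;",
        # A plain literal only matches labels stored without a language tag.
        f'           skos:prefLabel "{escaped}" ;',
        "           skos:definition ?definition .",
        "}",
    ])


query = build_definition_query("potential model")
print(query)
```

Sending the query over HTTP is left out on purpose, since the endpoint address and its parameters are not given in the slides.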
Glossary Interface
[Screenshot of the glossary interface, with labeled areas: Search; Schemata to Include; Concept List; Concept Details; Links to related concepts]
Syntactic Metadata Re-use
• So, if we agree on the concepts, what about the syntax? (i.e., XML representation)
• Concept = XML Type
• How do we share XML types from multiple schemata across the community?
• One idea: XML Type Library (or Catalog or Repository)
  • “Preliminary Research”
• This is NOT the same thing as a single complex schema that describes everything: types are first class objects and can be manipulated individually
How does an XML Type Library work?
Operations (web service?)
• Submit an XML type
• Get a list of all types
• Query for types
• Validate a type (Is my XML fragment a valid X?)
• Type membership (What types does my XML fragment fit?)
• Generate an XML Schema
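The operations above can be sketched as a small in-memory class. Everything here is hypothetical, including the `XmlType` record and the method names. In particular, validation is reduced to checking the root element and a list of required child elements; a real type library would validate against the XML Schema definition of the type. Schema generation is not covered.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class XmlType:
    """Simplified stand-in for an XML Schema complexType."""
    name: str                 # type name, e.g. "HorizontalCoordSystem"
    element: str              # expected root element tag
    required_children: tuple  # child tags that must be present
    definition: str = ""      # glossary definition, used for queries


class TypeLibrary:
    def __init__(self):
        self._types = {}

    def submit(self, xml_type):
        """Submit an XML type (replaces any type with the same name)."""
        self._types[xml_type.name] = xml_type

    def list_types(self):
        """Get a list of all type names."""
        return sorted(self._types)

    def query(self, keyword):
        """Query for types whose name or definition mentions the keyword."""
        kw = keyword.lower()
        return sorted(name for name, t in self._types.items()
                      if kw in name.lower() or kw in t.definition.lower())

    def validate(self, fragment, type_name):
        """Is the XML fragment a valid instance of the named type?"""
        xml_type = self._types[type_name]
        root = ET.fromstring(fragment)
        children = {child.tag for child in root}
        return (root.tag == xml_type.element
                and all(tag in children for tag in xml_type.required_children))

    def membership(self, fragment):
        """Which registered types does the XML fragment fit?"""
        return [name for name in self.list_types() if self.validate(fragment, name)]


library = TypeLibrary()
library.submit(XmlType(
    name="HorizontalCoordSystem",
    element="horizontal_coord_system",
    required_children=("x_axis", "y_axis"),
    definition="Horizontal coordinate system of a grid (made-up definition)",
))
fragment = """<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>"""
print(library.validate(fragment, "HorizontalCoordSystem"))
print(library.membership(fragment))
```

Keeping each type as its own record is what lets the library treat types as first-class objects: each can be submitted, queried and validated individually rather than only as part of one large schema.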
How does an XML Type Library work?
What metadata is available per type?
• Definition (e.g., XML Schema complexType)
• SKOS Glossary entry (for queries)
• Example usage scenarios
• Dependencies on other types
• Versioning metadata
• Available operations/web services
  • “If you have an XML fragment of type X, you can use the following services...”
Use Case: Submit Type
<xsd:complexType name="PotentialModel">
  <xsd:annotation>
    <xsd:documentation>
      <skos:Concept rdf:about="http://.../schema/1.0#PotentialModel">
        <skos:prefLabel>potential model</skos:prefLabel>
        <skos:definition>A set of components at the source code... </skos:definition>
      </skos:Concept>
    </xsd:documentation>
  </xsd:annotation>
  <!-- rest of complexType definition goes here -->
</xsd:complexType>
Existing Schemata → Extract Types → Submit to Type Library
Use Case: Validation
XML Fragment:
<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>
XML Fragment → Type Library (Validate) → “Valid” or “Invalid”
Use Case: Find Services
XML Fragment:
<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>
XML Fragment → Type Library (Find Services) → list of available services based on type of fragment:
• Interpolate_Service()
• Extract_Variable()
• Massage_Data()
• Another_Operation()
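The service lookup could be sketched as two small registries. The service names come from the slide; the type name `HorizontalCoordSystem`, the two mapping tables, and the shortcut of identifying a fragment's type from its root element alone are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: type name -> services that accept fragments of that type.
SERVICES_BY_TYPE = {
    "HorizontalCoordSystem": [
        "Interpolate_Service",
        "Extract_Variable",
        "Massage_Data",
        "Another_Operation",
    ],
}

# Hypothetical shortcut: identify the type from the fragment's root element.
TYPE_BY_ELEMENT = {"horizontal_coord_system": "HorizontalCoordSystem"}


def find_services(fragment):
    """Return the services available for the type of the given XML fragment."""
    root = ET.fromstring(fragment)
    type_name = TYPE_BY_ELEMENT.get(root.tag)
    return list(SERVICES_BY_TYPE.get(type_name, []))


fragment = """<horizontal_coord_system type="cartesian">
  <x_axis>...</x_axis>
  <y_axis>...</y_axis>
</horizontal_coord_system>"""
services = find_services(fragment)
print(services)
```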
Some Conclusions
• With a large amount of metadata activity already in progress, metadata re-use must be a priority
• Conceptual understanding is essential
  • Adoption of a glossary of concepts
• Syntactic agreement is desirable
  • Concepts assigned concrete XML types and stored in a library
Some Haiku
Retile the Shower
Tessellated Mosaic
First Write a Gridspec

Forever summer
questions and answers
Curator complete

Potential Model
Like a cool autumn breeze
Potentially mad
Example Gridspec Applications
• Not written for one particular application: general grid metadata has many potential uses
  • IPCC Model Documentation table
  • Moving variables to common grid for analysis
  • Regridding vertical from 24 to 40 levels
• There are two levels: conceptual and syntactic. Ideally, we would agree at both of these levels!
  • If we only have conceptual agreement, we can still interoperate, but must do transformations
Application: NARCCAP Vertical Interpolation
Gridspec.xsd (partial schema): description of vertical coordinate scheme
Metadata required for NARCCAP experiment: interpolate from 24 to 40 vertical levels
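To show why a description of the vertical coordinate is needed for this task, here is a minimal sketch of interpolating one profile from 24 to 40 levels. It assumes evenly spaced levels on a normalized 0 to 1 coordinate and a made-up temperature profile; the actual NARCCAP coordinate scheme and interpolation method are not described in the slides, and plain piecewise-linear interpolation is used here only as a stand-in.

```python
from bisect import bisect_right


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def interp_profile(src_levels, src_values, dst_levels):
    """Piecewise-linear interpolation; src_levels must be strictly increasing."""
    result = []
    for z in dst_levels:
        # Index of the source interval [i, i + 1] containing z, clamped to the ends.
        i = min(max(bisect_right(src_levels, z) - 1, 0), len(src_levels) - 2)
        z0, z1 = src_levels[i], src_levels[i + 1]
        weight = (z - z0) / (z1 - z0)
        result.append(src_values[i] + weight * (src_values[i + 1] - src_values[i]))
    return result


# The level positions are exactly what the grid metadata has to supply.
src_levels = linspace(0.0, 1.0, 24)
dst_levels = linspace(0.0, 1.0, 40)
src_temps = [300.0 - 80.0 * z for z in src_levels]  # made-up linear profile
dst_temps = interp_profile(src_levels, src_temps, dst_levels)
print(len(dst_temps))
```

Without the source and target level positions the function cannot be called at all, which is the sense in which the vertical coordinate description is required metadata for the experiment.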