EMODNET Chemistry 2
Semantic Suggestions
Roy Lowry and Adam Leadbetter
British Oceanographic Data Centre
Semantic Issues
• Parameter semantic issues encountered during the pilot
  – Naming of the aggregated products
  – Inability to aggregate across multiple P01 codes
  – Difficulty mapping local parameter vocabularies to P01
  – P01 scalability issues
  – Inability to discover a specified contaminant
Aggregation Naming
• Problem
  – During the pilot a lot of (circular) e-mail traffic concerned the labelling of aggregated parameters
• Solution
  – Naming needs to be governed
  – Governance decisions need to be implemented as a controlled vocabulary
P01 Aggregation Issues
• Problem
  – Aggregation tools create an aggregated parameter for every P01 code in the source dataset
  – Different P01 codes are used for parameters that are not significantly different (or even not different at all)
  – Fixes for this (retagging source data or merging channels in the aggregation tool) are both labour intensive and error prone
P01 Aggregation Issues
• Solution
  – Define each aggregation as a set of P01 codes
  – Store and serve the resultant mapping in the NERC Vocabulary Server
  – Update aggregation tools to access the mapping and use it to dynamically merge channels with different P01 codes
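The proposed solution can be illustrated with a minimal Python sketch. It assumes the mapping has already been retrieved from the vocabulary server into a plain dictionary; the aggregation names, the P01 codes and the helper functions `build_reverse_index` and `merge_channels` are all hypothetical placeholders, not real vocabulary entries or tool APIs.

```python
from collections import defaultdict

# Hypothetical mapping: aggregated parameter name -> set of P01 codes.
# Names and codes are placeholders, not real vocabulary entries.
AGGREGATION_MAP = {
    "AGG_NITRATE": {"P01_CODE_A", "P01_CODE_B"},
    "AGG_PHOSPHATE": {"P01_CODE_C"},
}


def build_reverse_index(aggregation_map):
    """Invert the mapping so each P01 code points to its aggregation."""
    reverse = {}
    for aggregate, codes in aggregation_map.items():
        for code in codes:
            if code in reverse:
                raise ValueError(f"{code} belongs to more than one aggregation")
            reverse[code] = aggregate
    return reverse


def merge_channels(channels, aggregation_map):
    """Merge data channels whose P01 codes fall in the same aggregation.

    channels: dict of P01 code -> list of values.
    Codes absent from the mapping are kept as their own channel.
    """
    reverse = build_reverse_index(aggregation_map)
    merged = defaultdict(list)
    for code, values in channels.items():
        merged[reverse.get(code, code)].extend(values)
    return dict(merged)


source = {"P01_CODE_A": [1.0, 2.0], "P01_CODE_B": [3.0], "P01_CODE_Z": [9.0]}
result = merge_channels(source, AGGREGATION_MAP)
print(result)
```

Because the merge is driven by the mapping rather than by retagging source data, a governance change only requires updating the served mapping.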
P01 Mapping Difficulties
• Problem
  – There are a lot (>28,000) of codes in P01
  – Finding the code needed for a given local parameter vocabulary term seems to cause a lot of difficulty
  – Text generated from a semantic model isn’t always intuitive (e.g. [dissolved plus reactive particulate phase] = ‘unfiltered’)
P01 Mapping Difficulties
• Solutions
  – Mapping based on the semantic model (matrix, substance, taxon, gender, organ) rather than the preferred label text
  – Improvements to the search algorithm in the client (e.g. addition of an ‘excluding’ clause)
  – Exposure of P01 subsets through NVS2 concept schemes (thesauri)
  – Training in how to map
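As an illustration of the ‘excluding’ clause, the following Python sketch filters a small label table by terms that must appear and terms that must not. The function `search_labels`, the codes and the label texts are invented for the example; they are not real P01 entries, nor the search client's actual implementation.

```python
def search_labels(labels, include, exclude=()):
    """Return codes whose label contains every include term
    and none of the exclude terms (case-insensitive)."""
    include = [term.lower() for term in include]
    exclude = [term.lower() for term in exclude]
    hits = []
    for code, label in labels.items():
        text = label.lower()
        if all(t in text for t in include) and not any(t in text for t in exclude):
            hits.append(code)
    return hits


# Hypothetical codes and labels, for illustration only.
LABELS = {
    "P01_CODE_A": "Concentration of cadmium per unit dry weight of biota",
    "P01_CODE_B": "Concentration of cadmium per unit volume of the water body",
    "P01_CODE_C": "Concentration of lead per unit dry weight of biota",
}

hits = search_labels(LABELS, include=["cadmium"], exclude=["water body"])
print(hits)
```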
P01 Scalability Issues
• Problem
  – Many contaminants in many different biological entities = a number of P01 codes that is predicted to be unmanageable
• Solution (not favoured)
  – Redesign formats to use discrete semantic model not P01 code
    • Different formats for different data types
    • Moves complexity from semantic domain into the data files
P01 Scalability Issues
• Solution (preferred)
  – Retain P01 as a register of semantic element combinations
  – Automate concept registration (part of a semantic model-based mapping tool perhaps)
  – Use NVS V2 concept schemes to expose P01 subsets to make navigation easier
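One way to picture a register of semantic element combinations with automated registration is the Python sketch below. It assumes the five elements named earlier (matrix, substance, taxon, gender, organ) identify a concept; the classes `SemanticKey` and `ConceptRegister` and the `HYPO` code format are hypothetical and are not part of NVS or P01.

```python
from typing import NamedTuple, Optional


class SemanticKey(NamedTuple):
    """One combination of semantic model elements (assumed fields)."""
    matrix: str
    substance: str
    taxon: Optional[str] = None
    gender: Optional[str] = None
    organ: Optional[str] = None


class ConceptRegister:
    """In-memory register: each distinct combination gets exactly one code."""

    def __init__(self):
        self._codes = {}
        self._next_id = 1

    def get_or_register(self, key):
        """Return the existing code for key, or mint a new placeholder code."""
        code = self._codes.get(key)
        if code is None:
            code = f"HYPO{self._next_id:04d}"  # placeholder, not a real P01 code
            self._codes[key] = code
            self._next_id += 1
        return code

    def __len__(self):
        return len(self._codes)


register = ConceptRegister()
key = SemanticKey("biota", "cadmium", taxon="Mytilus edulis", organ="flesh")
first = register.get_or_register(key)
# The same combination, supplied again by a mapping tool, reuses the code.
second = register.get_or_register(
    SemanticKey("biota", "cadmium", "Mytilus edulis", None, "flesh")
)
print(first, second, len(register))
```

Data files would still carry a single code, so the complexity stays in the semantic domain rather than moving into the formats.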
Contaminant Discovery Issues
• Problem
  – Parameter discovery (CDI interface) is based on P02
  – P02 groups contaminants with variable granularity
    • Good for PCBs
    • Not so good for ‘other organic contaminants’
  – A search for datasets with cadmium in Mytilus edulis flesh isn’t possible
  – The nearest is metals in biota, which will give many unwanted hits
Contaminant Discovery Issues
• Possible Solution
  – Mine the P01 codes in the SeaDataNet file stock into the CDI metadatabase
  – Use these for drill-down parameter discovery in the CDI search engine
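The drill-down idea might look like the following Python sketch: P01 codes harvested from each dataset are indexed under their P02 group, so a search can start broad and then narrow. The dataset identifiers, the codes, the P01-to-P02 table and the functions `build_drilldown_index` and `drill_down` are assumptions made for illustration; the real CDI metadatabase schema is not described in these slides.

```python
from collections import defaultdict


def build_drilldown_index(file_stock, p01_to_p02):
    """Build P02 group -> P01 code -> set of dataset ids.

    file_stock: dict of dataset id -> iterable of P01 codes found in the file.
    p01_to_p02: dict of P01 code -> P02 group; unmapped codes are skipped.
    """
    index = defaultdict(lambda: defaultdict(set))
    for dataset_id, p01_codes in file_stock.items():
        for p01 in p01_codes:
            p02 = p01_to_p02.get(p01)
            if p02 is not None:
                index[p02][p01].add(dataset_id)
    return index


def drill_down(index, p02, p01=None):
    """Return datasets in a P02 group, optionally narrowed to one P01 code."""
    by_p01 = index.get(p02, {})
    if p01 is None:
        return set().union(*by_p01.values())
    return set(by_p01.get(p01, set()))


# Hypothetical identifiers, for illustration only.
file_stock = {
    "ds1": ["P01_CD_MUSSEL", "P01_PB_FISH"],
    "ds2": ["P01_PB_FISH"],
}
p01_to_p02 = {
    "P01_CD_MUSSEL": "P02_METALS_BIOTA",
    "P01_PB_FISH": "P02_METALS_BIOTA",
}

index = build_drilldown_index(file_stock, p01_to_p02)
broad = drill_down(index, "P02_METALS_BIOTA")
narrow = drill_down(index, "P02_METALS_BIOTA", "P01_CD_MUSSEL")
print(sorted(broad), sorted(narrow))
```

Here the broad query plays the role of ‘metals in biota’, while the narrowed query removes the unwanted hits.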
Taking This Forward
• Some of the solutions presented are ODIP pilot candidates
• Specifications of these are currently vague
• Not absolutely clear who should be doing what and when
• Meeting (Liverpool or London if easier) to develop the specifications and an implementation roadmap