emodnet chemistry 2 semantic suggestions roy lowry and adam leadbetter british oceanographic data...

EMODNET Chemistry 2: Semantic Suggestions

Roy Lowry and Adam Leadbetter
British Oceanographic Data Centre

Semantic Issues

• Parameter semantic issues encountered during the pilot
  – Naming of the aggregated products
  – Inability to aggregate across multiple P01 codes
  – Difficulty mapping local parameter vocabularies to P01
  – P01 scalability issues
  – Inability to discover a specified contaminant

Aggregation Naming

• Problem
  – During the pilot, a lot of (circular) e-mail traffic concerned the labelling of aggregated parameters
• Solution
  – Naming needs to be governed
  – Governance decisions need to be implemented as a controlled vocabulary

P01 Aggregation Issues

• Problem
  – Aggregation tools create an aggregated parameter for every P01 code in the source dataset
  – Different P01 codes are used for parameters that are not significantly different (or even not different at all)
  – Fixes for this (retagging source data or merging channels in the aggregation tool) are both labour intensive and error prone

P01 Aggregation Issues

• Solution
  – Define each aggregation as a set of P01 codes
  – Store and serve the resultant mapping in the NERC Vocabulary Server
  – Update aggregation tools to access the mapping and use it to dynamically merge channels with different P01 codes
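The aggregation-as-a-set-of-P01-codes idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the NVS or aggregation-tool implementation: the mapping is held in a local dictionary rather than fetched from the NERC Vocabulary Server, and the labels and codes (`Aggregated parameter A`, `P01_CODE_1` and so on) and the function names `invert_mapping` and `merge_channels` are hypothetical.

```python
from collections import defaultdict

# Hypothetical aggregation mapping: each aggregated parameter is defined as a
# set of P01 codes. Labels and codes are placeholders, not real P01 entries;
# in practice the mapping would be served by the NERC Vocabulary Server.
AGGREGATION_MAP = {
    "Aggregated parameter A": {"P01_CODE_1", "P01_CODE_2"},
    "Aggregated parameter B": {"P01_CODE_3"},
}


def invert_mapping(aggregation_map):
    """Build a P01 code -> aggregation label lookup.

    Raises ValueError if one code is assigned to two aggregations, because the
    merge would then be ambiguous.
    """
    code_to_agg = {}
    for label, codes in aggregation_map.items():
        for code in codes:
            if code in code_to_agg:
                raise ValueError(
                    f"{code} is mapped to both {code_to_agg[code]!r} and {label!r}"
                )
            code_to_agg[code] = label
    return code_to_agg


def merge_channels(channels, aggregation_map):
    """Merge data channels keyed by P01 code into aggregated channels.

    channels: dict mapping P01 code -> list of values.
    Codes not covered by any aggregation keep their own channel, keyed by code.
    """
    code_to_agg = invert_mapping(aggregation_map)
    merged = defaultdict(list)
    for code, values in channels.items():
        merged[code_to_agg.get(code, code)].extend(values)
    return dict(merged)


# Usage: three source channels collapse into two aggregated channels, and the
# unmapped code P01_CODE_9 passes through unchanged.
channels = {
    "P01_CODE_1": [0.12, 0.15],
    "P01_CODE_2": [0.11],
    "P01_CODE_3": [4.2],
    "P01_CODE_9": [7.0],
}
print(merge_channels(channels, AGGREGATION_MAP))
```

Because the merge is driven entirely by the mapping, correcting an aggregation becomes a vocabulary edit rather than a retagging of source data.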

P01 Mapping Difficulties

• Problem
  – There are a lot of codes (>28,000) in P01
  – Finding the code needed for a given local parameter vocabulary term seems to cause a lot of difficulty
  – Text generated from a semantic model isn’t always intuitive (e.g. [dissolved plus reactive particulate phase] = ‘unfiltered’)

P01 Mapping Difficulties

• Solutions
  – Mapping based on the semantic model (matrix, substance, taxon, gender, organ) rather than the preferred label text
  – Improvements to the search algorithm in the client (e.g. addition of an ‘excluding’ clause)
  – Exposure of P01 subsets through NVS2 concept schemes (thesauri)
  – Training in how to map
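The first two suggestions, mapping on semantic elements and a label search with an ‘excluding’ clause, might look something like the minimal sketch below. The `P01Concept` structure, the `HYPO0001`-style codes and the short labels are hypothetical simplifications; real P01 entries and the actual client search algorithm are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class P01Concept:
    """A P01 entry decomposed into semantic-model elements (hypothetical structure)."""
    code: str
    preferred_label: str
    substance: str
    matrix: str
    taxon: Optional[str] = None
    organ: Optional[str] = None
    gender: Optional[str] = None


def find_by_model(concepts, **elements):
    """Return concepts whose semantic elements equal every given value.

    Keyword names must be P01Concept fields; an unknown name raises AttributeError.
    """
    return [c for c in concepts if all(getattr(c, k) == v for k, v in elements.items())]


def search_labels(concepts, including, excluding=()):
    """Case-insensitive label search: every 'including' term must appear
    in the preferred label and no 'excluding' term may appear."""
    hits = []
    for c in concepts:
        label = c.preferred_label.lower()
        if all(t.lower() in label for t in including) and not any(
            t.lower() in label for t in excluding
        ):
            hits.append(c)
    return hits


# Hypothetical sample entries (codes and labels are illustrative only).
CONCEPTS = [
    P01Concept("HYPO0001", "Cadmium in Mytilus edulis flesh", "cadmium", "biota", "Mytilus edulis", "flesh"),
    P01Concept("HYPO0002", "Cadmium in Mytilus edulis shell", "cadmium", "biota", "Mytilus edulis", "shell"),
    P01Concept("HYPO0003", "Cadmium in unfiltered water", "cadmium", "water"),
]

# Model-based lookup names the elements directly, so label wording does not matter.
print([c.code for c in find_by_model(CONCEPTS, substance="cadmium", taxon="Mytilus edulis", organ="flesh")])
# Label search narrowed with an excluding clause.
print([c.code for c in search_labels(CONCEPTS, ["cadmium"], excluding=["water"])])
```

The model-based lookup sidesteps unintuitive generated text such as ‘unfiltered’, since the user selects elements rather than guessing how the label was phrased.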

P01 Scalability Issues

• Problem
  – Many contaminants in many different biological entities = a number of P01 codes that is predicted to be unmanageable
• Solution (not favoured)
  – Redesign formats to use the discrete semantic model, not the P01 code
    • Different formats for different data types
    • Moves complexity from the semantic domain into the data files
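Why the count becomes unmanageable can be seen from a simple upper bound, assuming a separate code is needed for every combination of semantic elements. The figures below are purely illustrative, not taken from any real inventory: 200 contaminants, one matrix (biota), 50 taxa, 4 organs and 3 gender values.

```latex
N_{\mathrm{P01}} \le N_{\mathrm{substance}} \times N_{\mathrm{matrix}} \times N_{\mathrm{taxon}} \times N_{\mathrm{organ}} \times N_{\mathrm{gender}}
  = 200 \times 1 \times 50 \times 4 \times 3 = 120\,000
```

Not every combination would actually be observed, so the real number would be lower, but it grows multiplicatively with each element, and even this modest illustration is several times the >28,000 codes already in P01.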

P01 Scalability Issues

• Solution (preferred)
  – Retain P01 as a register of semantic element combinations
  – Automate concept registration (perhaps as part of a semantic-model-based mapping tool)
  – Use NVS V2 concept schemes to expose P01 subsets to make navigation easier
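Automated concept registration could work roughly as follows: a combination of semantic elements is looked up, and a new code is minted only if that combination has not been registered before. This is a sketch under assumptions: an in-memory register, a hypothetical `HYPO` code prefix with sequential numbering, and case-folded element values. The real P01 code-minting rules and NVS publication steps are not modelled, and `concept_scheme` merely stands in for exposing a P01 subset as an NVS concept scheme.

```python
import itertools


class P01Register:
    """Hypothetical register of semantic element combinations.

    Each distinct combination gets exactly one code; registering an existing
    combination returns the code already assigned instead of minting a duplicate.
    """

    def __init__(self, prefix="HYPO"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._by_elements = {}  # normalised element key -> code

    @staticmethod
    def _key(elements):
        # Order-independent, hashable key; empty/None elements are dropped and
        # values are case-folded (a simplifying assumption for this sketch).
        return tuple(sorted((k, v.strip().lower()) for k, v in elements.items() if v))

    def register(self, **elements):
        """Return the code for this combination, minting one if it is new."""
        key = self._key(elements)
        if key not in self._by_elements:
            self._by_elements[key] = f"{self._prefix}{next(self._counter):04d}"
        return self._by_elements[key]

    def concept_scheme(self, **selector):
        """Codes whose elements include every selector value (a navigable subset)."""
        wanted = set(self._key(selector))
        return sorted(code for key, code in self._by_elements.items() if wanted <= set(key))


reg = P01Register()
a = reg.register(substance="cadmium", matrix="biota", taxon="Mytilus edulis", organ="flesh")
# Same combination in a different order and case: no duplicate code is minted.
b = reg.register(organ="flesh", taxon="mytilus edulis", matrix="biota", substance="Cadmium")
c = reg.register(substance="lead", matrix="biota", taxon="Mytilus edulis", organ="flesh")
print(a, b, c)
print(reg.concept_scheme(taxon="Mytilus edulis"))
```

Keying the register on the element combination rather than on label text is what keeps registration automatic: duplicates are caught structurally, and subsets for navigation fall out of the same keys.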

Contaminant Discovery Issues

• Problem
  – Parameter discovery (CDI interface) is based on P02
  – P02 groups contaminants with variable granularity
    • Good for PCBs
    • Not so good for ‘other organic contaminants’
  – A search for datasets with cadmium in Mytilus edulis flesh isn’t possible
  – The nearest is metals in biota, which will give many unwanted hits

Contaminant Discovery Issues

• Possible Solution
  – Mine the P01 codes in the SeaDataNet file stock into the CDI metadatabase
  – Use these for drill-down parameter discovery in the CDI search engine
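The drill-down idea can be illustrated with a small inverted index from P01 code to dataset, built from the codes mined out of the files. The dataset identifiers, codes, semantic elements and the P02 group label are all hypothetical, and nothing here reflects the actual CDI metadatabase schema; the sketch only contrasts a P02-level search with a P01-level drill-down.

```python
from collections import defaultdict

# Hypothetical mined content: which P01 codes occur in each dataset.
DATASET_P01 = {
    "CDI-1": {"HYPO0001"},  # cadmium in Mytilus edulis flesh
    "CDI-2": {"HYPO0002"},  # lead in Mytilus edulis flesh
    "CDI-3": {"HYPO0003"},  # cadmium in Gadus morhua liver
}

# Hypothetical semantic elements of each P01 code.
P01_ELEMENTS = {
    "HYPO0001": {"substance": "cadmium", "taxon": "Mytilus edulis", "organ": "flesh"},
    "HYPO0002": {"substance": "lead", "taxon": "Mytilus edulis", "organ": "flesh"},
    "HYPO0003": {"substance": "cadmium", "taxon": "Gadus morhua", "organ": "liver"},
}

# All three codes fall into the same coarse (hypothetical) P02 group.
P01_TO_P02 = {code: "metals in biota" for code in P01_ELEMENTS}


def build_index(dataset_p01):
    """Invert dataset -> P01 codes into P01 code -> datasets."""
    index = defaultdict(set)
    for dataset, codes in dataset_p01.items():
        for code in codes:
            index[code].add(dataset)
    return index


def search_p02(index, p01_to_p02, group):
    """Coarse discovery: every dataset holding any code in the P02 group."""
    codes = [c for c, g in p01_to_p02.items() if g == group]
    return sorted(set().union(*(index.get(c, set()) for c in codes)))


def drill_down(index, p01_elements, **selector):
    """Fine discovery: datasets holding a P01 code matching every selected element."""
    codes = [
        c for c, el in p01_elements.items()
        if all(el.get(k) == v for k, v in selector.items())
    ]
    return sorted(set().union(*(index.get(c, set()) for c in codes)))


index = build_index(DATASET_P01)
print(search_p02(index, P01_TO_P02, "metals in biota"))
print(drill_down(index, P01_ELEMENTS, substance="cadmium", taxon="Mytilus edulis", organ="flesh"))
```

The P02-level query returns every dataset, mirroring the ‘metals in biota’ problem, while the P01-level drill-down isolates the single cadmium-in-mussel-flesh dataset.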

Taking This Forward

• Some of the solutions presented are ODIP pilot candidates
• Specifications of these are currently vague
• It is not absolutely clear who should be doing what and when
• Hold a meeting (Liverpool, or London if easier) to develop the specifications and an implementation roadmap

