netcdf-ld - towards linked data conventions for delivery of environmental data using netcdf
TRANSCRIPT
Towards linked data conventions for delivery of environmental data using netCDF
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox
CSIRO LWF and BODC / Marine Institute (Galway)
25 March 2014 | ISESS
… netCDF-LD
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu
Outline
1. Big data and netCDF
2. Metadata challenges
3. Linked Data, JSON-LD & CSV-on-the-web
4. netCDF-LD
5. Applications
Big data
• “90% of the world’s data has been produced over the last two years”
• Environmental and earth observation has always been big-data…
National Soil and Landscape Grid
Hydrodynamic models
Real-time wind data
netCDF
• File format
  • Persistence layer
  • Simple: self-describing, user-accessible “flat file”
• Software library
  • Java, C++, Python, others
• API
  • Interfaces to the Data Model
netCDF-4 Data Model
netCDF (cont.)
• Really good at handling array-oriented data – efficient subsetting
• Multi-dimensional grids
• CF conventions: climatology and weather forecasting domains
netCDF metadata example
float Nap_MIM(time, latitude, longitude) ;
    Nap_MIM:_FillValue = -999.f ;
    Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
    Nap_MIM:units = "mg/L" ;
    Nap_MIM:valid_min = 0.01209607f ;
    Nap_MIM:valid_max = 226.9626f ;
    Nap_MIM:standard_name = "total_suspended_solids" ;
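As a minimal sketch of how a reader might honour the attributes in the Nap_MIM example above, the fill value and valid range can be applied to raw values in plain Python. The helper name `mask_invalid` and the sample values are hypothetical and are not from the slides; a real workflow would normally read the attributes through a netCDF library.

```python
# Attribute values copied from the Nap_MIM metadata example.
NAP_MIM_ATTRS = {
    "_FillValue": -999.0,
    "units": "mg/L",
    "valid_min": 0.01209607,
    "valid_max": 226.9626,
}


def mask_invalid(values, attrs):
    """Replace fill values and out-of-range values with None."""
    cleaned = []
    for value in values:
        is_fill = value == attrs["_FillValue"]
        in_range = attrs["valid_min"] <= value <= attrs["valid_max"]
        cleaned.append(value if (not is_fill and in_range) else None)
    return cleaned


# Hypothetical raw samples: one fill value and one value above valid_max.
samples = [1.5, -999.0, 300.0, 12.25]
masked = mask_invalid(samples, NAP_MIM_ATTRS)
print(masked)
```

The attributes tell a client how to interpret the numbers, but nothing here says what "TSS" means outside this dataset, which is the gap the following slides address.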
Use of netCDF for Chlorophyll observations
netCDF challenges
• Encoding format and framework
• “Domain agnostic” but conventions exist for communities – e.g. Climate and Forecast (CF) conventions
• CF conventions are limited
  • Use of ‘standard_name’ attribute to denote the combination of medium/transformation/substance/state/quantity using a particular syntax
  • Assumes English
  • Standard names take a while to be adopted
  • Standard names are variations of each other
  • Units are from UDUNITS – Unidata managed
  • Narrow scope of scientific domain – can’t reuse for other domains
netCDF Metadata challenges – Semantic heterogeneity
• Enviro Application #1 (Data, DB): Chl_MIM
• Enviro Application #2 (Data, DB): mass_conc_chlorophyll_In_sea_water
• Enviro Application #3 (Data, DB): mass_conc_chlorophyll_a_In_sea_water
Semantic Heterogeneity… leads to data silos
[Diagram: Enviro Applications #1, #2 and #3, each with its own Data and DB, using Chl_MIM, mass_conc_chlorophyll_In_sea_water and mass_conc_chlorophyll_a_In_sea_water respectively; the links between them are crossed out (X X). Label: Meetings]
Approaches to address netCDF metadata challenge
1. Strict naming grammars for standard_name
(Peckham 2014) CSDMS Standard Naming Conventions
Approaches to address netCDF metadata challenge (2)
2. Break up various concerns in standard_name into multiple attributes – ref (Yu et al. 2014)
float Nap_MIM(time, latitude, longitude) ;
    Nap_MIM:_FillValue = -999.f ;
    Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
    Nap_MIM:units = "mg/L" ;
    Nap_MIM:valid_min = 0.01209607f ;
    Nap_MIM:valid_max = 226.9626f ;
    Nap_MIM:scaledQuantityKind_id = "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended" ;
    Nap_MIM:unit_id = "http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre" ;
    Nap_MIM:substanceOrTaxon_id = "http://environment.data.gov.au/water/quality/def/object/solids" ;
    Nap_MIM:medium_id = "http://environment.data.gov.au/water/quality/def/object/ocean" ;
    Nap_MIM:procedure_id = "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS" ;
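One way a client could pick up these per-concern bindings is sketched below. It assumes the attributes have already been read into a plain dictionary and that every binding follows the `<concern>_id` naming seen in the example; the function name `semantic_bindings` is hypothetical.

```python
# Attributes transcribed from the Nap_MIM example (numeric ones omitted).
nap_mim_attrs = {
    "long_name": "TSS, MIM SVDC on Rrs",
    "units": "mg/L",
    "scaledQuantityKind_id": "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended",
    "unit_id": "http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre",
    "substanceOrTaxon_id": "http://environment.data.gov.au/water/quality/def/object/solids",
    "medium_id": "http://environment.data.gov.au/water/quality/def/object/ocean",
    "procedure_id": "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS",
}


def semantic_bindings(attrs, suffix="_id"):
    """Return {concern: URI} for attributes named <concern>_id that hold an http URI."""
    return {
        name[: -len(suffix)]: value
        for name, value in attrs.items()
        if name.endswith(suffix)
        and isinstance(value, str)
        and value.startswith("http")
    }


bindings = semantic_bindings(nap_mim_attrs)
for concern, uri in sorted(bindings.items()):
    print(concern, "->", uri)
```

Because each concern (medium, substance, unit, procedure) is a separate attribute, a client can filter on any one of them without parsing a composite standard_name string.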
Harmonised Publish, Discovery, Access and Use
• Relies on community agreed vocabularies
• Describe/Publish the data; Query/Use data
• Enviro Applications #1, #2 and #3 (each with Data and DB) share the same bindings:
  substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
  scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
• Need to communicate more consistently
• Requires shared, precise, agreed semantics
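The difference between matching on local names and matching on shared vocabulary URIs can be sketched as follows. The three records are hypothetical stand-ins for the three applications; binding all of them to the same substanceOrTaxon URI is an assumption made for illustration.

```python
CHLOROPHYLL = "http://environment.data.gov.au/def/object/chlorophyll"

# Hypothetical records: local variable names differ, the URI binding is shared.
datasets = [
    {"app": "Enviro Application #1", "variable": "Chl_MIM",
     "substanceOrTaxon": CHLOROPHYLL},
    {"app": "Enviro Application #2", "variable": "mass_conc_chlorophyll_In_sea_water",
     "substanceOrTaxon": CHLOROPHYLL},
    {"app": "Enviro Application #3", "variable": "mass_conc_chlorophyll_a_In_sea_water",
     "substanceOrTaxon": CHLOROPHYLL},
]

# Matching on the local name only finds the application that coined it.
by_name = [d["app"] for d in datasets if d["variable"] == "Chl_MIM"]

# Matching on the shared URI finds every dataset about the same substance.
by_uri = [d["app"] for d in datasets if d["substanceOrTaxon"] == CHLOROPHYLL]

print(len(by_name), len(by_uri))
```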
Linked Data, JSON-LD & CSV-on-the-web
"LOD Cloud Diagram as of September 2011" by Anja Jentzsch
Linked Data
• A method of connecting related data, existing vocabularies and other semantics using web links (URIs) and a language for describing resources (RDF)
• Applications can then query data, follow the links and infer new insights from the data
JSON-LD
{
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
JSON-LD decorators
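To illustrate what such decoration adds, the sketch below attaches an `@context` to the same record and rewrites the short keys into full IRIs. The context mapping (FOAF and schema.org terms) is an assumption chosen for illustration, and `expand_keys` is a hypothetical helper that handles only this flat case; it is not a conformant JSON-LD processor.

```python
import json

doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "born": "http://schema.org/birthDate",
    "spouse": "http://schema.org/spouse"
  },
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
""")


def expand_keys(document):
    """Replace each short key with the IRI given for it in @context."""
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


expanded = expand_keys(doc)
print(json.dumps(expanded, indent=2))
```

The plain JSON stays readable to existing consumers, while the context lets a linked-data client interpret each key unambiguously.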
JSON-LD and Semantic Web
[Graph: http://dbpedia.org/resource/John_Lennon, with literal values “John Lennon” and 1940-10-09, linked to http://dbpedia.org/resource/Cynthia_Lennon]
CSV-on-the-web
• Linked data for CSV tabular data
• Add context to tables
Example: Domain vocab term
...
wqp:chlorophyll_a_concentration
    a skos:Concept, op:ScaledQuantityKind, qudt:ChemistryQuantityKind ;
    skos:broader wqp:chlorophyll_concentration ;
    skos:prefLabel "chlorophyll a concentration"@en ;
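A client can use the skos:broader link to treat the specific term as an instance of the more general one. The sketch below holds the two statements from the vocabulary term as simple tuples; the helper names `broader_chain` and `is_kind_of` are hypothetical, and a real client would more likely query the vocabulary service with an RDF library.

```python
# (subject, predicate, object) tuples transcribed from the term definition.
triples = [
    ("wqp:chlorophyll_a_concentration", "skos:broader", "wqp:chlorophyll_concentration"),
    ("wqp:chlorophyll_a_concentration", "skos:prefLabel", "chlorophyll a concentration"),
]


def broader_chain(term, statements):
    """Follow skos:broader links upward from term, guarding against cycles."""
    chain = []
    seen = {term}
    current = term
    while True:
        parents = [o for s, p, o in statements
                   if s == current and p == "skos:broader"]
        if not parents or parents[0] in seen:
            break
        current = parents[0]
        seen.add(current)
        chain.append(current)
    return chain


def is_kind_of(term, query, statements):
    """True if term equals query or has query among its broader concepts."""
    return term == query or query in broader_chain(term, statements)


print(broader_chain("wqp:chlorophyll_a_concentration", triples))
```

With this in place, a search for wqp:chlorophyll_concentration can also return data tagged with the narrower chlorophyll a term.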
netCDF-LD
Take the already successful and popular encoding format and…
‘linkify’ netCDF!
netCDF-LD: Linkifying netCDF
‘Context’ boilerplates
netCDF-LD: Linkifying netCDF
[Diagram: Air Temperature definition linked to eReefs vocabularies: Medium = Air; Quantity Kind = Temperature; UoM = Kelvin]
netCDF-LD: Global Attributes
Assigning URIs to variable level attributes
z:units = "meters" ;
z:units_ref = "http://qudt.org/vocab/unit#Meter" ;
z:a = "http://environment.data.gov.au/def/op#quantityKind" ;
z:dcPartOf = "http://foo.bar/linked_netCDF_example" ;
z:valid_range = 0., 5000. ;
netCDF-LD to RDF
@prefix unit: <http://qudt.org/vocab/unit#> .
@prefix qudt: <http://qudt.org/1.1/schema/qudt#> .
@prefix op: <http://environment.data.gov.au/def/op#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:z qudt:unit unit:Meter ;
    a op:ScaledQuantityKind ;
    dcterms:isPartOf <http://foo.bar/linked_netCDF_example> .
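A rough sketch of this conversion step is given below. The attribute-to-predicate table is an assumption inferred from the slides (units_ref to qudt:unit, a to rdf:type, dcPartOf to dcterms:isPartOf), and `to_ntriples` is a hypothetical helper; the slides do not specify the actual mapping rules. The output uses the URI values exactly as they appear in the variable attributes.

```python
QUDT = "http://qudt.org/1.1/schema/qudt#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Assumed mapping from attribute names to RDF predicates.
ATTR_TO_PREDICATE = {
    "units_ref": QUDT + "unit",
    "a": RDF + "type",
    "dcPartOf": DCTERMS + "isPartOf",
}

# URI-valued attributes of variable z, as listed on the earlier slide.
z_attrs = {
    "units": "meters",
    "units_ref": "http://qudt.org/vocab/unit#Meter",
    "a": "http://environment.data.gov.au/def/op#quantityKind",
    "dcPartOf": "http://foo.bar/linked_netCDF_example",
}


def to_ntriples(var_name, attrs):
    """Emit one N-Triples line per mapped attribute, with the variable as a blank node."""
    lines = []
    for name, predicate in ATTR_TO_PREDICATE.items():
        if name in attrs:
            lines.append(f"_:{var_name} <{predicate}> <{attrs[name]}> .")
    return lines


ntriples = to_ntriples("z", z_attrs)
print("\n".join(ntriples))
```

Attributes without a mapping, such as the plain-text units string, are simply left out of the RDF view and remain available to conventional netCDF clients.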
Use of netCDF-LD to support Data Discovery
Data annotated with bindings to vocab URIs
• THREDDS
• THREDDS Catalog
• Domain Vocabs (Water Quality at environment.data.gov.au)
• Quantities/Units ontology (QUDT)
substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
unit = http://qudt.org/vocab/unit#Unitless
medium = http://environment.data.gov.au/def/feature/ocean
DBL Harvesting and End Use
• Data Brokering Layer
• THREDDS
• THREDDS Catalog
• Domain Vocabs (Water Quality at environment.data.gov.au)
• Quantities/Units ontology (QUDT)
substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
unit = http://qudt.org/vocab/unit#Unitless
medium = http://environment.data.gov.au/def/feature/ocean
• End users
• Client application: chlorophyll
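The harvesting and lookup idea might be sketched as an inverted index from vocabulary URI to dataset. Everything named here is hypothetical: the dataset identifiers, the `labels` table that resolves a user keyword to a URI, and the functions `build_index` and `search`. The slides do not describe how the Data Brokering Layer is implemented.

```python
from collections import defaultdict

CHLOROPHYLL = "http://environment.data.gov.au/def/object/chlorophyll"
OCEAN = "http://environment.data.gov.au/def/feature/ocean"

# Hypothetical harvested records: dataset id -> vocabulary bindings.
harvested = {
    "dataset-A": {
        "substanceOrTaxon": CHLOROPHYLL,
        "scaledQuantityKind": "http://environment.data.gov.au/def/property/chlorophyll_concentration",
        "unit": "http://qudt.org/vocab/unit#Unitless",
        "medium": OCEAN,
    },
    "dataset-B": {"medium": OCEAN},
}

# Hypothetical keyword-to-URI table, standing in for a vocabulary service.
labels = {"chlorophyll": CHLOROPHYLL}


def build_index(records):
    """Map every bound URI to the set of datasets that reference it."""
    index = defaultdict(set)
    for dataset_id, bindings in records.items():
        for uri in bindings.values():
            index[uri].add(dataset_id)
    return index


index = build_index(harvested)


def search(keyword):
    """Resolve a keyword to a URI, then return the matching dataset ids."""
    uri = labels.get(keyword)
    return sorted(index.get(uri, set()))


print(search("chlorophyll"))
```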
netCDF-LD benefits
• Enhanced metadata – better self-description for any domain
• Allows binding to rich semantics from existing vocabularies being published via SKOS/OWL/RDF in multiple domains
• Linked Data approach – consistent with Semantic Web technologies, RDF, JSON-LD, CSV-on-the-web
• Hook into toolkits and software libraries for data discovery via Semantic Web technologies and other Linked data already being published online
Future/Current Work
1. netCDF-LD requires adoption at community level – need to test the proposal at OGC, Unidata, CF communities
2. Tooling for parsing netCDF-LD
3. Encoding large volumes using netCDF-LD
4. Demonstrate Data Broker concept using netCDF-LD
Summary
• netCDF is really good at big data – esp. array-oriented data
• Limitations with the current netCDF metadata structure
• *-LD pattern is emerging across multiple formats (JSON, CSV)
• Propose netCDF-LD which is an unintrusive extension on netCDF to facilitate improved self-described datasets
• netCDF-LD has the potential to enhance data discovery, from environmental datasets across to other Linked Data being published online
LAND AND WATER
Thank you
Land and Water
Jonathan Yu
Research Software Engineer
t +61 3 9252 6440
e [email protected]