S-ENDA use cases Documentation

Morten W. Hansen et al.

Feb 14, 2022


BACKGROUND

1 The FAIR Principles
2 User analysis
    2.1 Context
    2.2 Users definition
    2.3 Use Case Descriptions
3 Heritage of metadata management
    3.1 Template for description of heritage systems used in data management
    3.2 MET Norway
4 Requirements Specification
5 S-ENDA Architecture
    5.1 General Contexts
    5.2 S-ENDA C4 Context Diagram
6 Discovery and Configuration Metadata Catalog
7 Data Management Recipes
8 General conventions
    8.1 Versioning
9 GIT Conventions
10 Definition of Done
11 Development environment
    11.1 Workflow
    11.2 Vagrant
    11.3 Installation
    11.4 Generic usage
    11.5 S-ENDA configuration
    11.6 Development of the S-ENDA csw catalog service and relevant Python packages
12 Docker CI with GitHub Actions
    12.1 Set up automatic Build of Containers
    12.2 Set Up Unit Testing
    12.3 Set Up Coverage Testing
13 Writing documentation
    13.1 Software documentation
    13.2 Service documentation
    13.3 Compiling the documentation locally
14 Debating
15 Indices and tables

CHAPTER ONE: THE FAIR PRINCIPLES

S-ENDA’s vision is that everyone, from professional users to the general public, shall have easy, secure and stable access to dynamic geodata.

The FAIR principles constitute the governing principles of S-ENDA. This means that all contributions to the project shall be aligned with the FAIR principles.

The FAIR data principles were introduced in 2016 through The FAIR Guiding Principles for scientific data management and stewardship paper. Since then, they have become the reference point for many projects and efforts, both at the European and global level. The highest goal of these efforts is to move towards open science, i.e., to make scientific research available to the whole society. In this context, scientific research has a broad meaning, from data publications, to digital and physical data, to software or anything that has a value for the community. A basis of this concept is the ability and willingness to share knowledge, including data. This is, in turn, beneficial to, e.g., the general public, the scientific community, the research sponsors and funders, and to everyone working with research and development.

The FAIR data principles state that:

• data should be Findable

• data should be Accessible

• data should be Interoperable

• data should be Reusable

As stated in the original paper, “The Principles may be adhered to in any combination and incrementally, as data providers’ publishing environments evolve to increasing degrees of ‘FAIRness’. Moreover, the modularity of the Principles, and their distinction between data and metadata, explicitly support a wide range of special circumstances.” This shows that a FAIRification process can be approached in several steps, thus valuing any improvements that are made to achieve FAIRness.

The FAIR principles tackle the key points to allow people to find, access and use data, with specific focus on human and machine-to-machine interaction.

• The Findability principles require good descriptions of (meta)data, to allow resources to be discovered.

• The Accessibility principles concern the protocols to access the (meta)data which are discovered.

• The Interoperability principles focus on the content of the digital resources and their representation, to allow data integration and proper understanding of the accessed resources.

• The Reusability principles stress the importance of proper information to allow correct reuse and citation, including requirements for licensing.

Detailed descriptions and explanations of the FAIR principles can be found in many online resources and will therefore not be reported here, but a list of relevant links is provided instead:

• https://www.go-fair.org/

• https://www.fairsfair.eu/


• https://envri.eu/home-envri-fair/

• https://www.openaire.eu/

• https://www.rd-alliance.org/

• https://www.force11.org/

• https://www.dataone.org/

Scientific papers and reports:

• The FAIR Guiding Principles for scientific data management and stewardship

• Turning FAIR into reality

As stated above, the S-ENDA project uses these principles as guidance for the performed work, with the goal of unifying the data management at MET Norway and other Norwegian environmental institutes and agencies, to improve the sharing and reuse of created data and knowledge.


CHAPTER TWO: USER ANALYSIS

2.1 Context

Vision: Everyone, from professional users to the general public, should have easy, secure and stable access to dynamic geodata. S-ENDA shall achieve cross-sectoral coordination of data management and cost-effective use of resources on both the supplier and consumer side.

Goals

• Increased use of real-time and historical dynamic geodata at all user levels

• Information made available in the form of dynamic geodata must be verifiable and traceable (back to basic/raw data)

• Reduced costs for establishment and operation of effective management of dynamic geodata at the partner institutions

Content moved to DMH: https://metno.github.io/data-management-handbook/#human-roles

2.2 Users definition

Content moved to DMH: https://metno.github.io/data-management-handbook/#user-definitions

2.3 Use Case Descriptions

This section contains descriptions of use cases based on typical data needs and related workflows that should be supported by S-ENDA. You can use the use case template as a guide for describing a new use case, or to modify an existing one.

2.3.1 Use case template

Insert the title of the use case in the heading above. No other text should go under this heading.


Use Case Goal

This is required.

Brief description of the reason for and outcome of this Use Case, or a high-level description of the sequence of actions and the outcome of executing the Use Case.

Actors

This is required.

An actor is a person or other entity, external to the system being specified, who interacts with the system (includes the actor that will be initiating this Use Case and any other actors who will participate in completing the Use Case). Different actors often correspond to different user classes, or roles, identified from the customer community that will use the product.

Trigger

Event that initiates the Use Case (an external business event, a system event, or the first step in the normal flow).

Pre-conditions

Activities that must take place, or any conditions that must be true, before the Use Case can be started.

Post-conditions

The state of the system at the conclusion of the Use Case execution.

Normal Flow

Detailed description of the user actions and system responses that will take place during execution of the Use Case under normal, expected conditions. This dialog sequence will ultimately lead to accomplishing the goal stated in the Use Case name and description.

Alternative Flows

Other, legitimate usage scenarios that can take place within this Use Case.

Exceptions

Anticipated error conditions that could occur during execution of the Use Case, and how the system is to respond to those conditions, or how the Use Case execution fails for some reason.


Includes

Other Use Cases that are included (“called”) by this Use Case (common functionality appearing in multiple Use Cases can be described in a separate Use Case included by the ones that need that common functionality).

Notes and Issues

Additional comments about this Use Case and any remaining open issues that must be resolved. (It is useful to identify who will resolve each such issue and by what date.)

Relevant software

Relevant datasets

UML diagram

UML diagram, example (figure).

2.3.2 Farmer in Malawi

Use Case Goal

A farmer in Malawi receives daily forecasts from MET’s open data services. The forecast must arrive on a relevant platform and be presented in a way that makes it usable for the farming work.

Actors

• Random user with limited experience (farmer in Malawi)

• Data owner of NWP data (SUV)


Trigger

• The farmer needs to follow the day-to-day forecast to best plan how to get the most out of the crops

• This can be triggered manually, by the farmer going directly to a link with the latest relevant forecast for this purpose, or automatically, by a script in a cron job updating a NetCDF file every morning.

Pre-conditions

• Point data for the forecast must be available for the actual location, i.e., for the whole world

• Observation data could be available.

• Forecast fields for the actual place must be available, i.e., for the whole world. These must be viewable as a WMS layer and/or available for download

• The WMS view must be tailored for farming purposes, i.e., show parameters relevant for farming

• Metadata must be registered for the products so that the user is able to find them.

Post-conditions

• The user is able to zoom and pan in a WMS with the products overlaid on a map, to easily relate the data to a location.

• The user is able to make decisions based on the available data

• The user is able to download the data and visualize it in a tool of their choice

Normal Flow

• The user searches for parameters/fields relevant for farming (temperature, precipitation, evaporation, dew, wind, humidity and possibly others) in Malawi for today

• The system returns possible sources of these data, with the newest data first:

– the parameters as one link to a WMS, with each parameter as a different layer

– the parameters with one link each to a downloadable dataset

– in the WMS view, the ability to cycle over the available timesteps for each parameter

Alternative Flows

N/A

Exceptions

• The newest expected model results are delayed or missing for some other reason. This can cause the user to get old results when expecting new ones, since the forecast is checked every day.


Includes

N/A

Notes and Issues

N/A

Relevant software

N/A

Relevant datasets

• ECMWF model datasets, one dataset for each parameter for each timestep (?), available for WMS display

• The same data available for download or streaming

• Observations

UML diagram

2.3.3 Temperature from Longyearbyen

Use Case Goal

A user shall extract observed and forecasted temperature time series data values over Longyearbyen. Data consumer perspective; tests the compliance of the data management model with the FAIR principles.

(This use case is already described in MET DMH: UC4)

The user needs both observed and forecasted temperature for the same time period and the same location (the Longyearbyen area).

Actors

• Data Consumer/Advanced User/Researcher (DC)

• Data owner (observations)(Obsklim on behalf of different owners)

• Data owner (model)(Director SUV, maybe on behalf of others)

• Data provider (observations)(Obsklim)

• Data provider (model)(SUV)

• Service provider (external dataservice)?


Trigger

The DC/user needs data for a project, searches for available data via some sort of web search engine.

The user might search in a general web search engine, or he/she might use a data portal/service known to them (e.g., Geonorge, services from MET, etc.).

Pre-conditions

• Model output is gridded data that are stored on a server at MET Norway

• Observations are stored on a server at MET Norway

• Observations must be made available and be well defined (with metadata)

• There is a high degree of standardisation for all types of data

• Datasets (observations and forecast data) are findable and accessible for general web search engines, and for more specialized portals.

Post-conditions

• Relevant datasets are found (user-defined time period)

• Relevant metadata for use and reference are found

• The user understands what types of datasets are available for their use (difference between observations, satellite data and model/forecast data)

• Full datasets or subsets have been downloaded/streamed

• Enough information (datasets + metadata) has been received to produce the results needed, and to reference the data.

Normal Flow

• User searches for relevant datasets

• A list of available datasets, with metadata, in the given area and timeframe is found and displayed to the user

• User downloads the datasets they want to use

• Enough information follows the dataset for:

– The user to produce the products they need

– The user to reference the data properly
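As an illustration of the search-and-download flow above, the following is a minimal sketch of extracting a temperature time series at Longyearbyen over OPeNDAP with the Python netCDF4 library. The dataset URL and variable names are hypothetical placeholders, not actual MET Norway endpoints:

    # Minimal sketch: extract a temperature time series at a point over OPeNDAP.
    # The URL and variable names are hypothetical placeholders.
    import netCDF4
    import numpy as np

    url = "https://thredds.met.no/thredds/dodsC/some/forecast/dataset.nc"
    lat0, lon0 = 78.22, 15.65  # Longyearbyen

    ds = netCDF4.Dataset(url)
    lats = ds.variables["latitude"][:]
    lons = ds.variables["longitude"][:]

    # Nearest grid point to Longyearbyen (assumes 1D coordinate variables)
    i = int(np.abs(lats - lat0).argmin())
    j = int(np.abs(lons - lon0).argmin())

    # Only the requested subset is transferred over the OPeNDAP stream
    t2m = ds.variables["air_temperature_2m"][:, i, j]
    times = netCDF4.num2date(ds.variables["time"][:], ds.variables["time"].units)

    for t, v in zip(times, t2m):
        print(t, float(v))
    ds.close()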

Alternative Flows

• User searches in generic search engine for datasets

• User searches with machine-to-machine API

• User searches for relevant sources for the types of datasets the user needs.


Exceptions

• Anticipated error: Data may be missing for the desired area, point or timeframe.

• System response: Displays understandable error messages.

Includes

Notes and Issues

Relevant software

Relevant datasets

• In-situ observations: temperature data from data storage at MET

• Satellite observations

• Model data: temperature forecast data from ECMWF/EMEPS/MEPS

UML diagram

UML diagram, example (figure).

2.3.4 New NWP model

Use Case Goal

Brief description of the reason for and outcome of this Use Case, or a high-level description of the sequence of actions and the outcome of executing the Use Case.

Datasets produced from a new NWP model are freely available for

• File download as NetCDF

• Streaming via OPeNDAP

• Visualisation via WMS

• Visualisation in Diana(?)

The datasets are discoverable via

• MET Norway catalogue web search interface

• Google?

• Machine-machine search interface


Actors

An actor is a person or other entity, external to the system being specified, who interacts with the system (includes the actor that will be initiating this Use Case and any other actors who will participate in completing the Use Case). Different actors often correspond to different user classes, or roles, identified from the customer community that will use the product.

• Data Provider (DP)

• Data Owner (Director SUV)

• A random user with technical expertise (Python and Jupyter)

• A random user with limited expertise who is just browsing the web

Trigger

Event that initiates the Use Case (an external business event, a system event, or the first step in the normal flow).

Pre-conditions

Activities that must take place, or any conditions that must be true, before the Use Case can be started.

• Model output is gridded data that are stored on a server at MET Norway

• NWP model data are well-known

• There is a high degree of standardisation

Post-conditions

The state of the system at the conclusion of the Use Case execution.

A user is able to visualise the wind speed and direction in

• WMS in a web browser

• Jupyter Notebook

Normal Flow

Detailed description of the user actions and system responses that will take place during execution of the Use Case under normal, expected conditions. This dialog sequence will ultimately lead to accomplishing the goal stated in the Use Case name and description.

• define dataset(s) (including CF variables);

• Update Data Management Plan for NWP;

• (Director DP informs)

• (inform DM/SM);

• build CF- and ACDD-compliant NetCDF files for the dataset(s) from the native files (see the sketch after this list);

• configure dataset(s) in TDS;

• create MMD metadata


• ingest in metadata catalog (e.g., Solr);

• make searchable in, e.g., opensearch (like, e.g., https://colhub.copernicus.eu/userguide/ODataAPI?TWIKISID=b7b00ae74a4fc691a138709825f16fa3)

• add to existing portals (or check that it is made automatically available)
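For the NetCDF-building step referenced above, the following is a minimal sketch using the Python netCDF4 library. The variable, attribute values and file name are illustrative only; a real NWP dataset carries many more variables and attributes:

    # Sketch: write a small CF/ACDD-compliant NetCDF file with netCDF4.
    # Attribute values and the single variable are illustrative placeholders.
    import netCDF4
    import numpy as np

    ds = netCDF4.Dataset("nwp_example.nc", "w", format="NETCDF4")

    # ACDD discovery-level global attributes (subset)
    ds.Conventions = "CF-1.8, ACDD-1.3"
    ds.title = "Example NWP air temperature forecast"
    ds.summary = "Illustrative 2 m air temperature forecast field."
    ds.creator_institution = "Norwegian Meteorological Institute"
    ds.license = "https://creativecommons.org/licenses/by/4.0/"

    ds.createDimension("time", None)
    time = ds.createVariable("time", "f8", ("time",))
    time.standard_name = "time"
    time.units = "seconds since 1970-01-01 00:00:00"

    # CF use-level metadata on the data variable
    t2m = ds.createVariable("air_temperature_2m", "f4", ("time",))
    t2m.standard_name = "air_temperature"
    t2m.units = "K"

    time[:] = np.arange(0, 3 * 3600, 3600)
    t2m[:] = [272.1, 272.6, 273.0]
    ds.close()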

Alternative Flows

Other, legitimate usage scenarios that can take place within this Use Case.

Exceptions

Anticipated error conditions that could occur during execution of the Use Case, and how the system is to respond to those conditions, or how the Use Case execution fails for some reason.

Includes

Other Use Cases that are included (“called”) by this Use Case (common functionality appearing in multiple Use Cases can be described in a separate Use Case included by the ones that need that common functionality).

Notes and Issues

Additional comments about this Use Case and any remaining open issues that must be resolved. (It is useful to identify who will resolve each such issue and by what date.)

2.3.5 Find latest satellite image describing cloud cover for visibility of Northern Lights

Use Case Goal

A user (tourist guide) wants to see Northern Lights, but the current location is cloudy. Find information about cloud cover in the nearby areas to find places with fewer clouds. This can be done with infrared satellite imagery and/or analysed cloud products.


Actors

• Random user with limited experience

• Random user (tourist guide) doing this every evening to plan an excursion

• Data owner (satellite)(Obsklim, FOU-FD)

• Data owner (model)(Director SUV, maybe on behalf of others)

• Data provider (satellite)(Obsklim, NBS)

• Data provider (model)(SUV)


Trigger

• The user experiences cloudy conditions and wants to find nearby areas with fewer clouds.

• User planning an excursion wants to check the recent cloud cover information to plan the excursion.

Pre-conditions

• Satellite product for visualising clouds (gridded) at night must be available as WMS

• Satellite data (gridded) must be accessible for an experienced user to visualize in their preferred tool

• Metadata must be registered for the user to be able to find the dataset

Post-conditions

• The user is able to zoom and pan in a WMS with the product overlaid on a map, to easily relate the data to a location

• The user is able to load the downloaded or streamed data into a desired visualisation tool.

Normal Flow

• The user searches for cloud cover over Northern Norway (“skydekke Nord-Norge”)

• The system returns possible sources of cloud cover data:

– cloud data from a model

– satellite product showing the current cloud coverage

• The data description must explain the strengths of each product, to enable the user to choose the right product for his/her needs

• When selecting the satellite image, it should be displayed in a WMS view (on top of a map), enabling the user to pan and zoom, and possibly also to switch between different times.
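For the WMS steps in this flow, the following is a minimal sketch of how a client could list layers and fetch a rendered cloud-cover image with the OWSLib Python library; the service URL and layer name are assumptions, not actual MET Norway endpoints:

    # Sketch: list layers and fetch a map image from a WMS with OWSLib.
    # The service URL and layer name are hypothetical placeholders.
    from owslib.wms import WebMapService

    wms = WebMapService("https://example.met.no/wms/clouds", version="1.3.0")

    # Discover the available layers
    for name, layer in wms.contents.items():
        print(name, "-", layer.title)

    # Request a PNG over northern Norway (EPSG:4326 bounding box)
    img = wms.getmap(
        layers=["cloud_cover"],
        srs="EPSG:4326",
        bbox=(12.0, 67.0, 26.0, 71.0),
        size=(800, 500),
        format="image/png",
        transparent=True,
    )
    with open("cloud_cover.png", "wb") as f:
        f.write(img.read())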

Alternative Flows

• If a more experienced guide is using this, the normal flow will be different, as this user knows what to search for. This user goes directly to the source, downloads the latest data, and displays it in his/her favorite display tool.

Exceptions

• Satellite data is too old to be relevant for the user

• WMS display system is not working

• The THREDDS server is failing


Includes

N/A

Notes and Issues

N/A

Relevant software

N/A

Relevant datasets

• MEPS model data

• Gridded satellite cloud product from optical and/or infrared sensors, available in WMS, downloadable and/or streamable

UML diagram

2.3.6 Investigation of vegetation health variations in a given area based on measurements from Sentinel-2

Use Case Goal

An advanced user shall be able to define a time span and region of interest in an NDVI tool in order to assess the temporal and spatial variations of vegetation health in the given region.

Actors

An advanced user interacts with an external NDVI tool (on his/her local system), which interacts with the S-ENDA find and access systems via machine-machine APIs. The advanced user is able to interpret the (NDVI) index with respect to vegetation health.


Trigger

• Abnormal deterioration of vegetation health in the region of interest triggers the need to investigate NDVI changes in time

Pre-conditions

• The S-ENDA find metadata store must be populated by Sentinel-2 use and discovery metadata

• The S-ENDA access system must provide OPeNDAP access to Sentinel-2 data

• A simple NDVI Tool must be available on the user side to calculate the NDVI from bands in a Sentinel-2 acquisition

Post-conditions

The user is presented with a timeseries of gridded NDVI (i.e., a 3D dataset of temporal and horizontal dimensions) which can be further analysed in the external system.

Normal Flow

• [External system] The advanced user defines time and region of interest for the NDVI timeseries

• NDVI Tool searches the S-ENDA find system via the machine-machine API for the data of interest

• NDVI Tool retrieves use and discovery metadata from S-ENDA find

• NDVI Tool accesses the S-ENDA access system via OPeNDAP using the http address provided in the metadata

• NDVI Tool extracts spatial subsets of the found Sentinel-2 data

• Subsets of the Sentinel-2 data are streamed to the NDVI Tool

• [External system] The NDVI Tool creates an NDVI timeseries

• [External system] The advanced user analyses and interprets the 3D NDVI dataset
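The core of this flow can be illustrated with a minimal sketch, assuming the Sentinel-2 bands are exposed as variables in a netCDF-CF/OPeNDAP stream; the dataset URL and band variable names are assumptions:

    # Sketch of the NDVI Tool core: stream two Sentinel-2 bands over OPeNDAP
    # and compute NDVI = (NIR - Red) / (NIR + Red).
    # The dataset URL and variable names are hypothetical placeholders.
    import netCDF4
    import numpy as np

    url = "https://thredds.met.no/thredds/dodsC/NBS/S2A/example_granule.nc"
    ds = netCDF4.Dataset(url)

    # Spatial subset indices for the region of interest (illustrative)
    y, x = slice(1000, 1500), slice(2000, 2500)
    red = ds.variables["B4"][y, x].astype("f4")  # red band (~665 nm)
    nir = ds.variables["B8"][y, x].astype("f4")  # near-infrared band (~842 nm)

    ndvi = (nir - red) / (nir + red)
    print("mean NDVI:", float(np.nanmean(ndvi)))
    ds.close()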

Alternative Flows

None

Exceptions

• OPeNDAP access could fail

– The system must offer an opportunity to download full files as a replacement for the OPeNDAP stream

• The search API could fail(?)


Includes

None

Notes and Issues

• We need to make a simple NDVI tool which calculates the NDVI from two bands in Sentinel-2

Relevant software

• Python netCDF4 to access the OPeNDAP stream

• Nansat could be used as a basis for the NDVI tool and to access the Sentinel-2 data stream or any downloaded netCDF-CF files

Relevant datasets

• NBS Sentinel-2

UML diagram

2.3.7 Routine monitoring of vegetation health using NDVI from Sentinel-2

Use Case Goal

Reindeer herders shall be able to assess vegetation health in relevant herding areas from continuous NDVI monitoringwith Sentinel-2.

Actors

• A general user (i.e., a reindeer herder) interacts with a web UI or a mobile app to check the vegetation state inrelevant herding areas


Trigger

• Users need to know where to lead reindeer herds to find fresh vegetation for food

Pre-conditions

• The S-ENDA find metadata store must be updated with the newest Sentinel-2 use and discovery metadata on data arrival

• The S-ENDA access system must provide access to the Sentinel-2 data

• A server side NDVI Tool must be available to calculate the NDVI from bands in a Sentinel-2 acquisition

• The NDVI results are available for streaming to user applications

• A simple description of how to interpret the NDVI must be available, or the NDVI must be presented in a way that is evident/self-describing

• A web page must present the latest NDVI and the ability to browse back in time (e.g., with a slider)

Post-conditions

• The web page contains information about vegetation state based on the latest Sentinel-2 retrievals

• The user is able to find fresh vegetation

Normal Flow

• The user accesses a web page containing information about the current and past vegetation state

• The user interprets the vegetation state in an extended region (relevant for herding) based on color codes andhistoric information (using, e.g., a slider)

Alternative Flows

• The user accesses a mobile app containing information about the current and past vegetation state

• The user interprets the vegetation state in an extended region (relevant for herding) based on color codes andhistoric information (using, e.g., a slider)

Exceptions

• OPeNDAP access could fail (due to, e.g., high server load?)

Includes

• Investigation of vegetation health variations in a given area based on measurements from Sentinel-2


Notes and Issues

• We need to make a simple web page to display the results

• We may want to demonstrate a mobile app

Relevant software

• Python netCDF4 to access the OPeNDAP stream

• Nansat could be used as a basis for the NDVI tool and to access the Sentinel-2 data stream or any downloaded netCDF-CF files

Relevant datasets

• NBS Sentinel-2

UML diagram

UML diagram from user perspective:

2.3.8 Outdoor swimming competition

Use Case Goal

An outdoor swimming competition organizer wants to know sea surface temperature at the event location.

The organizer is interested in the sea surface temperature at the event location. As there are very few locations where sea water temperature is measured, the result shown to the user should include the closest available in-situ measurement (which will probably be some distance away from the actual event location), the sea temperature from a model, and the temperature derived from satellite measurements.

• How to find the right dataset(s) and service?

• How to find the correct variable?

• Is there an uptime guarantee for the service, in case a system that needs to be available for future events is built on top of it?

For this use case, the parameter sea surface temperature should be exchangeable with any other parameter.


Actors

• Random user with limited experience

• Data owner (observations)(Obsklim on behalf of different owners)

• Data owner (model)(Director SUV, maybe on behalf of others)

• Data owner (satellite)(Obsklim, FOU-FD, maybe others)

• Data provider (observations)(Obsklim)

• Data provider (model)(SUV)

• Data provider (satellite)(Obsklim)

Trigger

An external, privately organized sports event needs information about sea surface temperature conditions, both as general information for the competitors and for safety considerations.

Pre-conditions

• Model output is gridded data that are stored on a server at MET Norway

• Observations are stored on a server at MET Norway

• Observations must be made available and be well defined (with metadata)

• Satellite data is gridded data that are stored on a server at MET Norway

• There is a high degree of standardisation for all types of data

Post-conditions

• A user is able to find all available sea temperature data.

• A user is able to choose/access the data he/she is interested in.

• A user is able to visualize the data (or a compilation of data) for the competitors via WCS

• A user is able to discern which data is available for the area the user is interested in (and perhaps also information about what is not available, e.g., as: “No sea surface temperature observations available for inner Oslo fjord, but closest is...”)

Normal Flow

User actions:

• The user searches for “sea surface temperature oslofjord” in some generic search environment (e.g., Google)

System responses:

• Responds with a list of possible datasets and the possible ways of accessing those datasets.

• Responds with existing services that may be useful for this request.

• Access to datasets should preferably be both file download with links to viewers and perhaps a simple WMS visualisation


Alternative Flows

• User searches in a known MET based search environment

• User uses Norwegian search phrases (sjøtemperatur)

System responses should be the same.

Exceptions

The parameter searched for is not found (typing error...):

• Understandable error message, with suggestions for correction (“Did you mean...”)

The location does not have data; the search is for an area outside our datasets:

• Understandable error message: “Observations not available for this point, closest point with observations is...”

A point in time/space is not available in the satellite data because of clouds (SST not possible to calculate from the data).

Includes

Notes and Issues

Relevant software

Relevant datasets

• Observations: sea surface temperature from data storage at MET (Maritime data)

• Satellite: Calculated SST from satellite data.

• Model data: MEPS/ECMWF sea surface temperature

UML diagram

UML diagram, example (figure).


2.3.9 Creation of seNorge gridded observation dataset

Use Case Goal

The seNorge dataset is produced based on historical observations. The objective is to create a high-resolution (1 km) gridded dataset of temperature and precipitation for the Norwegian mainland.

Goal: To produce observational gridded datasets over Norway.

https://github.com/metno/seNorge_docs/wiki/seNorge_2018

These are used in other analyses, such as the one described in the Climate Projection of Yearly Air Temperature in Norway use case.

We follow the model presented in the Users definition section:

• Producers: the Norwegian Meteorological Institute (MET)

• Consumers: advanced-consumers (e.g., researchers), intermediate-consumers, and simple-consumers (e.g., journalists)

Actors

• Researchers at MET and externally

• Data processing and production (PPI)

• Data storage (thredds and lustre)

Producers

Norwegian Meteorological Institute.

Consumers

End users of the data might include:

• A researcher that wants to use the data to remove biases in a global climate prediction model.

• NVE uses the data in their hydrological and snow modeling.

• A journalist that wants to create visualizations of climate changes.


Trigger

An update of the dataset is produced daily, and the data is made available on thredds. Less often, a full rerun of the interpolation is done, and this dataset is published with more traceability (DOI). It also benefits from any updates or improvements in the quality control routines.

Pre-conditions

The following datasets must be available to the producer:

• Historical weather observations for Scandinavia.

• Fine resolution map data for the area of interest.

Post-conditions

• A dataset is created and stored on lustre and thredds.

• The dataset is also used to create visualizations on the seNorge website.

Normal Flow

Producer

1. The producer uses the following data sources:

• Norwegian Meteorological Institute’s Climate Database (via the Frost API)

• Swedish Meteorological and Hydrological Institute Open Data API

• Finnish Meteorological Institute open data API

• Maps created by NVE based on a fine digital elevation model (/lustre/storeB/project/metkl/klinogrid/geoinfo)

2. The producer then uses the data as input and runs the code found here: https://github.com/metno/seNorge2

3. The data is stored at MET.

Consumer

1. The consumer searches for data

2. The consumer investigates and interprets data

• The researcher downloads the whole dataset from thredds to investigate different parameters.

• NVE uses the atmospheric data to initialize the hydrological and snow simulations for modeling.

• The journalist chooses parameters on a website like http://www.senorge.no to look at the data through visualizations

3. The consumer concludes and summarizes their findings

• The researcher publishes a scientific paper

• Hydrological and snow model data is produced.

• The journalist publishes a newspaper article
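As an illustration of the researcher’s download step, the following is a minimal sketch that streams a subset of a seNorge field over OPeNDAP with the Python netCDF4 library. The URL and the variable name (“tg” for daily mean temperature) are assumptions based on the seNorge_2018 documentation linked above and may differ from the live service:

    # Sketch: stream a subset of a seNorge temperature field over OPeNDAP.
    # The URL and variable name are assumed and may differ from the live service.
    import netCDF4

    url = ("https://thredds.met.no/thredds/dodsC/"
           "senorge/seNorge_2018/Archive/seNorge2018_2020.nc")
    ds = netCDF4.Dataset(url)

    tg = ds.variables["tg"]  # daily mean temperature, degrees Celsius
    day0 = tg[0, :, :]       # first day of the year, full 1x1 km grid
    print(day0.shape, float(day0.mean()))
    ds.close()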


Alternative Flows

Exceptions

Includes

Notes and Issues

Relevant software

Relevant datasets

UML diagram

2.3.10 Climate Projection of Yearly Air Temperature in Norway

Use Case Goal

Various users who work with climate adaptation or products for climate adaptation must obtain relevant data on expected future climate change (climate projections), including hydrology and natural hazards. The Norwegian Climate Service Center will calculate new climate and hydrology projections (expected changes in the future) for Norway up to the year 2100.

Climate projections have a systematic bias for temperature and precipitation. Because of this, climate models are run for 100 years or so, such that they overlap with observational gridded datasets for 30-40 years. The time period where the climate projections and observational datasets overlap is used to compute systematic differences, which are then used to rescale the climate projections such that the systematic differences are removed.

National services (such as MET) downscale and remove biases from the global projections to make them fit better locally. MET downscales using both numerical models and statistical methods.
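The rescaling described above can be illustrated with a toy delta-change sketch in Python, using synthetic data: the mean difference between model output and observations over the overlap period is estimated and removed from the projection. Operational bias adjustment (e.g., quantile mapping) is considerably more sophisticated:

    # Toy sketch of the bias-correction idea: estimate a systematic difference
    # over the overlap period and remove it from the projection. Synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    obs_overlap = 5.0 + rng.normal(0.0, 1.0, 30 * 365)    # observed temperature
    model_overlap = obs_overlap + 1.7                     # model with +1.7 K bias
    model_future = 7.0 + 1.7 + rng.normal(0.0, 1.0, 365)  # biased projection

    bias = model_overlap.mean() - obs_overlap.mean()
    corrected_future = model_future - bias
    print(f"estimated bias: {bias:.2f} K")
    print(f"corrected future mean: {corrected_future.mean():.2f}")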

This use case focuses on the new projections for the yearly average and extreme air temperatures in Norway, forward in time to year 2100.

Goal: To produce new climate projections of yearly air temperatures in Norway.

We follow the model presented in the Users definition section:

• Producers: the Norwegian Climate Service Center

• Consumers: advanced-consumers (e.g., researchers), intermediate-consumers, and simple-consumers (e.g., journalists)

The knowledge generation is described in the below figure. The data (see Context) used to generate the air temperature projections is a combination of gridded historical weather observations (level 3) and model simulations (level 4). The processing and interpretation consists of using the weather observations to correct biases in the model results to produce a dataset of downscaled climate model data (from this, likely future scenarios can be estimated to provide general and specific information about the future climate).


Actors

Producers

The Norwegian Climate Service Centre and partners (MET, NVE, NORCE, Bjerknessenteret) will produce the new predictions for the average and extreme air temperatures in Norway, forward in time to year 2100.

Consumers

End users of the data might include:

• A researcher that wants to use the data in their biology model to predict the effect on ticks.

• A state agency that wants to investigate the effects temperature changes will have in each Norwegian municipality.

• A journalist that wants to write about the potential changes that Norway will experience.


Trigger

The Norwegian Environment Agency orders the report “Climate in Norway 2100”. The climate model data is needed to write this report.

Pre-conditions

The following datasets must be available to the producer:

• Bias-adjusted time series from EURO-CORDEX (CMIP5 and CMIP6).

• Calculated climate indexes, averages over certain time periods, etc.

• Gridded historical weather observations.

https://klimaservicesenter.no/faces/desktop/article.xhtml?uri=klimaservicesenteret/klima-og-hydrologiske-data/datagrunnlag-klimafremskrivninger

The following services should be used to present results:

• A solution for getting data

Post-conditions

• A dataset is created that contains climate predictions forward in time until 2100.

• The data should be stored as NetCDF on a 1x1 km grid (follow-up: why NetCDF?)

• This dataset includes predictions about temperature, as well as many other variables.

• The results of this dataset are distilled into a report (e.g., PDF), “Climate in Norway 2100”.


Normal Flow

Producer

1. The producer searches and accesses the following data:

• Gridded historical weather observations

• Climate model data

• Some specific time series may be used in post processing

Currently, at MET, the gridded observations can be found on both lustre and thredds. The global climate models can be found in online portals which can potentially be searched, but it is also possible that the users are told exactly where the data they want is found.

2. The producer creates the climate projections, and other aggregated values / time series (e.g., county (fylke) averages).

3. The data is made available to consumers.

Note: A significant challenge is that the creators of the data feel a need to have some control over how the data is used and presented, since otherwise there is a potential for misinterpretation and/or misrepresentation. For example, averages over larger areas / longer time periods might be considered appropriate use, but using the finer-scale data to make decisions about land use (or zooming way in on a map) is likely inappropriate. Most simple consumers are unable to grasp the uncertainty contained in the model data, and the varying quality for the different aggregation scales.

4. The report is written based on interpreting the predictions.

Consumer

1. The consumer searches for data

2. The consumer investigates and interprets data

• The researcher investigates the data

– Downloads the climate projection for temperature for the whole time range

– Downloads the climate projection medians as an average over the period 2071-2100

– Collocates the climate projections with their biology model simulation results

– Runs an algorithm to predict the effect on ticks

• The journalist chooses parameters on a website to see what the extreme temperatures will be in 2041-2070

• The state agency updates their maps with expected changes, to reflect future changes in permafrost

3. The consumer concludes and summarizes their findings

• The researcher publishes a scientific paper

• The journalist publishes a newspaper article

• The state agency establishes hazard zones due to melting permafrost


Alternative Flows

The journalist wants to know the temperature on 25th June, 2074. It must be clear from the discovery metadata that the projections cannot be used for that purpose.

Relevant datasets

• seNorge2018 for adjusting bias and grid specifications

• EURO-CORDEX climate prediction data that will be downscaled from a 12x12 km grid to 1x1 km for Norway.

• CMIP5 and CMIP6 climate predictions that will be downscaled for Norway.

Current workflow(s)

Because it is very easy for consumers to misinterpret the data (see the note under Normal Flow above), the distribution of the data is somewhat limited.

One can download particular aggregations of the data here: https://klimaservicesenter.no/faces/desktop/scenarios.xhtml and https://nedlasting.nve.no/klimadata/kss

Some of the NetCDF data can be found here: https://drive.google.com/drive/folders/1czjY8UR8RxUCwZsdsqNa-09cvRi5bVLB

See how the current data is used and visualized today: https://klimaservicesenter.no/faces/desktop/scenarios.xhtml


Relevant software:

• CDI and NCO for NetCDF file manipulation

• OGC Web Map Service (WMS) for presenting results

UML diagram

2.3.11 Model of Arctic Fox Distribution in Scandinavia

Use Case Goal

Goal: Creation of a model for spatial distribution of the Arctic fox in Scandinavia.

Biodiversity observation data shows species occurrences in space and time. This can be found from many different sources (including from outside NINA), and combined into a distribution model.

We follow the model presented in the Users definition section:

• Producers: the Norwegian Institute for Nature Research (NINA)

• Consumers: advanced-consumers (e.g., researchers), intermediate-consumers, and simple-consumers (e.g., journalists)


Actors

• Biodiversity occurrence data owner (NINA & other Institutes / sources)

• Model of Arctic fox distribution Provider (NINA)

• Climate and weather observation data owner (Obsklim on behalf of different owners)

• Climate and weather observation data owner (observations)(Obsklim)

Trigger

A researcher aims to investigate what factors determine the spatial distribution of Arctic fox (Vulpes lagopus; Linnaeus, 1758) in Scandinavia.

Pre-conditions

• NINA biodiversity discovery metadata is available in the S-ENDA catalogue endpoint (and thus is also available in Geonorge).

• Observations are stored on a server at MET Norway

• Observations must be made available and be well defined (with metadata)

• Producer is able to cross reference biodiversity data with climate and weather observation data.

• There is a high degree of standardisation for all types of data

Post-conditions

• The consumer is able to find the distribution model results in the S-ENDA catalogue endpoint.

• The consumer is able to find and access the underlying datasets used for creating the model (provenance).

• Producer and consumer are able to give feedback on quality of the data.

Normal Flow

Producer

User actions:

1. Review the literature on Arctic fox distribution and identify (ecologically) plausible variables that could explain the distribution.

2. Search for occurrence data for Arctic fox in the geographic region of interest; this could also be survey data (more robust estimates of presence or absence than ad hoc sighting data). Note: this search could be carried out using the S-ENDA catalogue endpoint.

• Deduplicate the occurrence data, since different sources of data can contain the same data record.

• Identify biases in the data generation.

3. Find covariates that explain the distribution of Arctic fox, particularly climate data and observations.

• Covariates could be, e.g., red fox distribution or locations (from GBIF or other sources); small mammal records (line transect surveys carried out in Norway and Sweden), altitude (from a DEM), NAO (related to winter conditions), climate data, distance to the forest line, etc.


• Identify and mitigate biases in covariates

4. Model the distribution of Arctic fox in relation to the covariates, taking into account the major bias (which is a lack of absence points) and time. Integrate ancillary information into the model (survival estimates, reproduction rates, climate change models, etc.) to develop predictive models.

5. Model results are made available in the S-ENDA catalogue.

• Check licensing of each of the underlying datasets.

• Give appropriate license to newly created dataset.

Consumer

End users of the model could be wildlife managers and NINA researchers.

Wildlife management authorities might use the results as a base for their policy decisions. Other scientists might create similar models and compare them, or create a model for a different species and cross reference.

Alternative Flows

• The occurrence data is found on gbif.org or in data repositories (usually part of published papers that have archived data)

• The climate and weather observation data must still be found from MET.

UML diagram

UML diagram, example (figure).

2.3.12 Generic Data Provider

Use Case Goal

A data provider wants to register their data / metadata so that it is accessible in the S-ENDA portal.


Actors

• An internal provider that creates aggregated or merged datasets

• An external provider that has datasets they wish to cross reference with weather data

Trigger

• The provider has new data that they wish to make available.

Pre-conditions

• All data that this data is based on is available.

Post-conditions

• The provider is able to register their data / metadata so that it is searchable in an appropriate manner.

Normal Flow

• Learn how to register data manually or automatically by reading the documentation.

• Initial setup of the connection between the two systems (if it is automatic).

• The provider connects to the S-ENDA provider interface and sends their data / metadata for registration.

• The provider receives feedback on whether the registration was successful.

Alternative Flows

Exceptions

• The provider is unable to connect to the S-ENDA interface

Includes

N/A

Notes and Issues

N/A


Relevant software

N/A

Relevant datasets

N/A

UML diagram

UML diagram, example (figure).


CHAPTER THREE: HERITAGE OF METADATA MANAGEMENT

3.1 Template for description of heritage systems used in data management

3.1.1 Responsible (Who?)

3.1.2 Description (What?)

3.1.3 Documentation (Where/how?)

3.1.4 Conditions and dependencies (Why?)

3.2 MET Norway

3.2.1 Service catalog

TODO: write or link to documentation about the service catalog.

3.2.2 Stinfosys

Responsible (Who?)

• System owner: Leader, department of observation quality and data processing

• Maintenance group

• Technical management: IT-Geo-Drift, OKD

• Content management: OKD


Description (What?)

Postgres database with site metadata for weather stations. Near-complete information about stations owned by MET Norway, a good amount of information about Norwegian weather stations owned by others, and a small amount of information for some stations outside Norway.

Main information groups:

• Stations: a collection of measurements (on a given location)

• People: Someone who has a role in connection with a station or stinfosys in general

• Equipment: Something that is/can be installed on a station

• Message: communication of data from the stations to MET

Documentation (Where/how?)

The system documentation for stinfosys is only available for internal (MET) users and is (mainly) written in Norwegian.

Conditions and dependencies (why?)

Stinfosys is built to address the needs of:

• Forecasters: Co-location of observations, visualization and quality management for stations.

• Climate statistics: Management of long (high quality) time series.

• Maintenance management: logistic needs

Data from stinfosys is currently used in:

• KRO: logistic management system for weather stations at MET

• Kvalobs: Quality control system for observations at MET

• KDVH/ODA: Data storage at MET

• Obsinn: System for handling incoming observations at MET

• Frost.met.no: API for observations, externally available

• Seklima.met.no: GUI for observations, externally available

External connections to stinfosys:

None are currently (Jan 2020) operational, but work is being done on these fronts:

• M2M connection to the Public Roads Administration (SVV)’s metadatabase for measurement stations (Målestasjonsregisteret)

• External metadata registration software/gui for the Norwegian Institute for Bioeconomy Research (NIBIO)

• M2M interaction with the OSCAR/Surface database (WMO metadatabase for surface-based weather stations)


3.2.3 api.met.no

Responsible (Who?)

MET Norway, IT Department, Team Punkt.

Description (What?)

api.met.no is a system with three roles:

• API gateway, e.g., a proxy for other REST services

• Distribution of static files

• Developer portal / service catalog

The most used service proxied through api.met.no is locationforecast, a REST service for requesting a forecast time series for a specific location on Earth.
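As an illustration, the following is a minimal sketch of calling locationforecast with the Python requests library; api.met.no requires an identifying User-Agent header, and the exact endpoint version and JSON structure should be verified against the current API documentation:

    # Sketch: request a forecast timeseries from locationforecast on api.met.no.
    # Endpoint version and JSON structure should be checked against the docs.
    import requests

    url = "https://api.met.no/weatherapi/locationforecast/2.0/compact"
    headers = {"User-Agent": "s-enda-example/0.1 example@met.no"}  # identify yourself
    params = {"lat": 59.91, "lon": 10.75}  # Oslo

    r = requests.get(url, params=params, headers=headers, timeout=30)
    r.raise_for_status()
    data = r.json()

    first = data["properties"]["timeseries"][0]
    print(first["time"], first["data"]["instant"]["details"]["air_temperature"])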

The metadata system used by api.met.no is the http://apisjson.org/ standard. This standard was created at about the same time as DCAT. api.met.no will support DCAT in the future, but it is not clear at the moment whether apis.json will co-exist with DCAT or be replaced.

Documentation (Where/how?)

api.met.no is documented with swagger and additional free text in html. See https://api.met.no for more information.

Conditions and dependencies (Why?)

• The primary users of api.met.no are yr.no, specific governmental partners and the general public.

• It is meant to be easy to use for people with no knowledge about meteorology.

• It is meant to handle large amounts of requests.

• It is not meant as an archive; it only serves fresh data (data for the last 2-3 days).

• Dependencies: MET Infrastructure services, such as OpenStack, DNS, network.

• Dependencies 2: Since api.met.no proxies other services, those other services need to be operated separately.

3.2.4 Frost

Responsible (Who?)

MET Norway, Department of Observation and Climate, Division for Observation quality and data processing.

Description (What?)

Frost is a RESTful API that provides access to MET Norway’s archive of historical weather and climate data. Frost does not actually contain any metadata itself; however, Frost exposes metadata about stations from ST-INFOSYS through various interfaces, as well as element metadata and the data itself from KDVH (Klimadatavarehuset). An element is, for example, temperature at 2 m or wind speed at 10 m at a specific location and time.
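A minimal sketch of querying Frost with the Python requests library is shown below; a client ID obtained by registering at frost.met.no is used as the HTTP basic-auth username, and the path, station and element names should be checked against the Frost documentation:

    # Sketch: query the Frost API for observed air temperature at a station.
    # Station and element names are illustrative; verify against the Frost docs.
    import requests

    client_id = "YOUR_CLIENT_ID"  # obtained by registering at frost.met.no
    r = requests.get(
        "https://frost.met.no/observations/v0.jsonld",
        params={
            "sources": "SN18700",  # Oslo - Blindern
            "elements": "air_temperature",
            "referencetime": "2020-01-01/2020-01-02",
        },
        auth=(client_id, ""),
        timeout=30,
    )
    r.raise_for_status()
    for item in r.json()["data"]:
        for obs in item["observations"]:
            print(item["referenceTime"], obs["value"], obs["unit"])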


Documentation (Where/how?)

Frost is documented with swagger. See https://frost.met.no/api.html for more information.

Conditions and dependencies (Why?)

• The Frost API is primarily for developers who need to develop scripts or applications that access MET Norway’s archive of historical weather and climate data.

• Dependencies: ST-INFOSYS and KDVH. In the future, Frost will replace KDVH with ODA (Observation, Data, Access). For more information about ODA, see here: https://oda.pages.met.no/page/about/

• The dependencies do not support FAIR, but Frost provides translations, where possible, of element names following CF.

• ST-INFOSYS contains metadata about the MET Norway observation stations. ST-INFOSYS does not support FAIR, because the data is not Findable: metadata is not findable for anyone but special users, and the same holds for Accessible. The metadata does not follow the CF standard, i.e., it is not Interoperable. In ST-INFOSYS, the metadata itself is Reusable.

• KDVH (Klimadatavarehuset) contains the element data. KDVH data is not Interoperable, as the element metadata is not CF compliant.

3.2.5 Heritage system: MMD

Responsible (Who?)

• Øystein Godøy / FoU-FD

Description (What?)

The Metno Metadata Format (MMD) is an XML metadata format for storing information about scientific datasets. It is meant for consumption by internal systems at MET and to be a cornerstone in our data management policy.

The MMD covers a wide range of metadata types and is compatible with ISO19115 and GCMD DIF.

Documentation (Where/how?)

Documentation for the MMD format can be found in the mmd repository on GitHub: https://github.com/steingod/mmd/

Conditions and dependencies (Why?)

The objectives of the MMD standard format:

• To facilitate documentation of data and products managed by METNO.

• To facilitate metadata re-use between different projects and services at METNO.

• To be compatible with the GCMD DIF and ISO19115/ISO19139 metadata standards as imposed by WMO and Norge Digitalt/INSPIRE.

• To provide as lossless a conversion between the different formats as possible.

The MMD is meant to be used for metadata management in support of internal and external services including, but not restricted to:

• BarentsWatch


• Halo

• METSIS

– Arctic Data Centre

– WMO Global Cryosphere Watch

3.2.6 Productstatus

TODO: write or link to documentation about product status.

3.2.7 NINA metadata catalog

3.2.8 Responsible (Who?)

NINA (Norwegian Institute for Nature Research) - Environmental data department (miljødata)

3.2.9 Description (What?)

NINA is building FAIR data infrastructures. The main data portal, under construction, will serve mainly as a FAIR metadata catalog, with limited data sharing capabilities.

Most of the NINA datasets deal with terrestrial ecology and biodiversity, for example species occurrences during field sampling events.

Discovery metadata will be exposed as RDF documents serialized using the Data Catalog vocabulary (DCAT).

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs on the Web.

Improved interoperability will additionally be reached by exposing metadata with the OAI-PMH exchange protocol.

3.2.10 Documentation (Where/how?)

Documentation for the NINA data infrastructure is still under construction, and will be provided when ready.

3.2.11 Conditions and dependencies (Why?)

• The purpose of the main metadata catalog is to publicly share information about NINA-sourced data

• The NINA metadata catalog is designed to share rich metadata exposed with international standards

• datasets may be described using controlled vocabularies

• part of the data is shared with community standards and workflows, such as GBIF for biodiversity data.

• data shared through GBIF are structured using the Darwin Core standard

• only part of the data from NINA can be considered dynamic geodata

• Geospatial data in NINA are shared with a Geodata portal, exposing OGC web-services endpoints

• Geospatial data additionally provide ISO 19115 metadata shared through a CSW service

• NINA metadata sharing facilities could be modified partially to comply with S-ENDA requirements

• Ad-hoc data sharing platforms could be designed to meet specific S-ENDA requirements for data sharing in case of specific selected datasets


CHAPTER FOUR: REQUIREMENTS SPECIFICATION

Requirements based on the User analysis and Heritage of metadata management should come here.


CHAPTER FIVE: S-ENDA ARCHITECTURE

About the architecture drafts:

• They are described using the C4 model (https://c4model.com/). C4 does not define any properties based on the directionality of the used arrows, so each arrow should have a textual description to avoid ambiguity

• They are work in progress, and many updates are expected as we dig into the details

• They are to a high degree based on the use cases outlined in Use Case Descriptions

5.1 General Contexts

S-ENDA is part of a larger effort within the national geodata strategy (“Alt skjer et sted”), and relates to this strategy through Geonorge, which is developed and operated by the Norwegian Mapping Authority (“Kartverket”). Geonorge, in turn, relates to the European INSPIRE Geoportal through the INSPIRE directive. In particular, S-ENDA is responsible for Action 20 of the Norwegian geodata strategy. The goal of Action 20 is to establish a distributed, virtual data center for use and management of dynamic geodata. S-ENDA’s vision is that everyone, from professional users to the general public, should have easy, secure and stable access to dynamic geodata.

The vision of S-ENDA and the goal of Action 20 are aligned with international guidelines, in particular the FAIR Guiding Principles for scientific data management and stewardship.

5.1.1 S-ENDA in a national and international context

Dynamic geodata is weather, environment and climate-related data that changes in space and time and is thus descriptive of processes in nature. Examples are weather observations, weather forecasts, pollution (environmental toxins) in water, air and sea, information on the drift of cod eggs and salmon lice, water flow in rivers, driving conditions on the roads, and the distribution of sea ice. Dynamic geodata provides important constraints for many decision-making processes and activities in society.

Geonorge is the national website for map data and other location information in Norway. Here, users of map data can search for and access such information. Dynamic geodata is one such information type. S-ENDA extends Geonorge by taking responsibility for managing dynamic geodata in a consistent, harmonised manner.

The figure below illustrates S-ENDA’s position in the national and international context. As illustrated, Geonorge CSW harvesting should also make S-ENDA datasets findable by other systems.


5.2 S-ENDA C4 Context Diagram

The diagram below describes the S-ENDA system within the dynamic geodata boundary introduced above. The data consumers are defined in Users definition.

5.2.1 S-ENDA Discovery Metadata Service - C4 container diagram

Dataset catalog service API - C4 component diagram

5.2.2 Production Hubs - C4 container diagram

5.2.3 Distribution Systems - C4 container diagram


CHAPTER SIX

DISCOVERY AND CONFIGURATION METADATA CATALOG

See DMH: https://htmlpreview.github.io/?https://github.com/metno/data-management-handbook/blob/master/html/data-management-handbook.html


CHAPTER SEVEN

DATA MANAGEMENT RECIPES

See DMH: https://metno.github.io/data-management-handbook/#practical-guides


CHAPTER EIGHT

GENERAL CONVENTIONS

S-ENDA concerns both software development and generation of dynamic geodata. In line with the recommendations from MET Norway and the FAIR principles, all S-ENDA relevant data and code shall use the following licenses to foster reusability:

• Double licensing for data:

– CC-BY 4.0 International (https://creativecommons.org/licenses/by/4.0/), and

– The Norwegian Licence for Open Government Data (NLOD) (https://data.norge.no/nlod/en/2.0)

• One of the following licenses for software:

– Apache License, Version 2.0

– The MIT license

– The GNU General Public License (GPL) version 2 (all derivative work must be licensed with GPLv2)

All software repositories should also establish general guidelines/conventions, such as:

• Use version control, preferably git.

• Open-source code should be openly available, e.g., at GitHub or a similar open service.

• The coding style should be specified in a conventions section of the documentation of open-source software, e.g., specifying which standard is followed (e.g., PEP-8 for Python), maximum line length, etc.

• All code should be tested (100% test coverage), i.e.,

– Every function must be accompanied with a test suite

– Tests should be both positive (testing that the function works as intended with valid data) and negative (testing that the function behaves as expected with invalid data, e.g., that the correct exceptions are raised)

– If a function has optional arguments, separate tests for all options should be created

– Examples should be supported by doctests (see the sketch at the end of this section)

• Discussions regarding the S-ENDA framework are kept in the issue trackers of the relevant git repositories at GitHub or in public Gitter chats linked from the respective README files

There are plenty more online guides; see, e.g., https://opensource.guide/.
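
The sketch below is a minimal illustration of these testing conventions: a Python function carrying a doctest, plus one positive and one negative pytest test. The function and test names are invented for the example.

import pytest


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to Kelvin.

    >>> celsius_to_kelvin(0.0)
    273.15
    """
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


def test_celsius_to_kelvin_valid():
    # Positive test: valid input gives the expected result
    assert celsius_to_kelvin(26.85) == pytest.approx(300.0)


def test_celsius_to_kelvin_invalid():
    # Negative test: invalid input raises the expected exception
    with pytest.raises(ValueError):
        celsius_to_kelvin(-300.0)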


8.1 Versioning

Software tools should be properly versioned, following the Semantic Versioning principles.

The general principles of version increments, given a version number MAJOR.MINOR.PATCH, are:

1. MAJOR version when you make incompatible API changes,

2. MINOR version when you add functionality in a backwards compatible manner, and

3. PATCH version when you make backwards compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format. (Reference: https://semver.org/)

It is good practice to use 0 as the MAJOR version during the initial development stage, and to increment the MINOR number for each release up until the software is ready for production use and a 1.0 release.

The main benefit of semantic versioning, when followed properly, is that other software depending on these packages has some idea of what to expect from the version numbers. Developers can then configure their own dependency requirements accordingly, using version ranges and filters rather than pinning to specific versions or commits.
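
As an illustration, the effect of such a version range can be checked programmatically with the Python packaging library; the version numbers below are arbitrary examples:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Accept any backwards compatible release in the 1.x series, from 1.4 on
spec = SpecifierSet(">=1.4,<2.0")

print(Version("1.4.2") in spec)  # True: a backwards compatible bug fix
print(Version("2.0.0") in spec)  # False: a MAJOR bump may break the API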

8.1.1 Python Packages

For Python projects, the version number should be available from a setup.py or setup.cfg file. The package’s main __init__.py file should also define the __version__ variable.

The setup files can generally extract the version number from the main __init__.py file. See the setuptools documentation.

This approach makes it significantly easier to use package tools like pip to deploy packages, and to include them as dependencies in other packages.

When making a release of a Python project, increase the version number first, and then tag the release with a tag matching that version number string. This makes it significantly easier to handle automatic upgrades and installations of packages, both from the Python Package Index and directly from the project’s main repository.
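
One common pattern, sketched below for a hypothetical package called mypackage, is to define __version__ in the package and let setup.py extract it without importing the package:

# mypackage/__init__.py
__version__ = "0.3.1"

# setup.py
import re

from setuptools import find_packages, setup

# Read __version__ from the package file without importing the package
with open("mypackage/__init__.py") as fh:
    version = re.search(r'__version__\s*=\s*"(.*?)"', fh.read()).group(1)

setup(
    name="mypackage",
    version=version,
    packages=find_packages(),
)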


CHAPTER NINE

GIT CONVENTIONS

S-ENDA developers and team members are advised to use branches within the git repositories to write and update code. External contributors may fork the repositories and submit pull requests to apply changes or new code.

We adopt the following system for branching and merging:

• master branch:

– only for complete use case descriptions (corrections/changes are handled in hotfix branches)

– numbered releases of the code

– Never edited

– Merged from issue and hotfix branches

– Long-lived

• issue<NNN>_<short-heading>

– ideas for new use cases should be described in regular issues (at https://github.com/metno/use-cases-for-S-ENDA/issues). They can be simple one-line descriptions or more elaborate ones

– for each issue, we create issue branches in which the respective issue is solved and reviewed

– an issue branch is an issue specific branch (NNN = issue number)

– Main working area

– Short-lived

– Branched from, and merged back into master

• hotfix<NNN>_<short-heading>

– branches that are specific to an existing use case, i.e., that deal with changes to complete use cases already present in the master branch

Note:

• Never edit code in the master branch. Always make a new branch for your edits.

• A new branch should be very specific to only one problem. It should be short-lived.

• Commit often.

• Branch often.

• Branch only from master.

• Create pull requests for your branches and always assign a reviewer to merge, delete the branch, and close the issue (this is easy in GitHub)


The master branch is linked from readthedocs, and shall be complete, nice and understandable.


CHAPTER TEN

DEFINITION OF DONE

The following requirements must be satisfied before a task can be considered done:

1. The solution is in agreement with the design

2. Software repositories are licensed (see General conventions)

3. The headers of all software code files contain a reference to the repository license (i.e., with a statement such as “License: This file is part of <repository>. <Repository> is licensed under <license name> (<link to license>)”)

4. 100% test coverage of production code

5. Aim for 80% test coverage of prototype code

6. Continuous integration

7. All tests are passing

8. The proposed solution is reviewed by at least one team member

9. Documentation is complete, and each component can be used by anyone interested

Example file header (ref. point 3 above):

# License:
# This file is part of the S-ENDA-Prototype repository (https://github.com/metno/S-ENDA-Prototype).
# S-ENDA-Prototype is licensed under GPL-3.0 (https://github.com/metno/S-ENDA-Prototype/blob/master/LICENSE)


CHAPTER ELEVEN

DEVELOPMENT ENVIRONMENT

11.1 Workflow

The workflow from local development to production goes through four main stages: local development, local testing of containers intended for the staging environment, online staging, and online production.


11.2 Vagrant

In Wikipedia, HashiCorp’s Vagrant is defined as

Vagrant is an open-source software product for building and maintaining portable virtual software development environments, e.g. for VirtualBox, KVM, Hyper-V, Docker containers, VMware, and AWS. It tries to simplify the software configuration management of virtualizations in order to increase development productivity.

Vagrant is used in S-ENDA for spinning up reproducible development environments.

11.3 Installation

S-ENDA has standardized on these versions for the development environment:

• Vagrant 2.2.7

• VirtualBox 6.1.x

• Vagrant plugin vagrant-disksize 0.1.3

Download the Vagrant and VirtualBox versions appropriate for your platform and install them. Install the plugin as follows.

vagrant plugin install --plugin-version 0.1.3 vagrant-disksize

11.4 Generic usage

Below is a sample of the most used Vagrant commands. For an overview of other options used in the configuration file, Vagrantfile, and more advanced usage, read the Vagrant documentation.

• Create config file, Vagrantfile, for an Ubuntu Bionic VM:

vagrant init ubuntu/bionic64

• Start VM:

vagrant up

• Access VM:

vagrant ssh

• Stop VM:

vagrant halt

• Rerun provisioning scripts:

vagrant provision


11.5 S-ENDA configuration

This section contains a template Vagrantfile used in S-ENDA development. There is one generic and reusable part. In addition, examples to help developers extend the development environment functionality will be added. Specific Vagrantfiles for reproducing the complete development environment for S-ENDA will reside in the main code repositories.

11.5.1 Generic configuration

1. Create a folder for the development environment. This is usually your git repository folder where your code resides.

mkdir development
cd development

2. Create a new Vagrantfile, and add the generic template we use. Use copy and paste.

vim Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

require 'yaml'

begin
  current_dir = File.dirname(File.expand_path(__FILE__))
  # config.yml is ignored by git, i.e., it is added to .gitignore
  configs = YAML.load_file("#{current_dir}/config.yml")
  vagrant_config = configs['configs'][configs['configs']['use']]
rescue StandardError => msg
  vagrant_config = {}
end

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/bionic64"
  config.vm.box_check_update = false

  config.vm.network "private_network", ip: "10.20.30.10"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"
    vb.cpus = 4
    vb.default_nic_type = "virtio"
  end

  config.vm.define "default" do |config|
    if vagrant_config != {}
      config.vm.network "public_network", ip: vagrant_config['ip'], netmask: vagrant_config['netmask'], bridge: vagrant_config['bridge']
      config.vm.provision "shell", run: "always", inline: "ip route add default via #{ vagrant_config['gateway'] } metric 10 || exit 0"
      config.vm.hostname = vagrant_config['hostname']
    end
  end

  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y wget unattended-upgrades
  SHELL
end

3. Add a configuration file containing external IPs. This is an example. Remember to exclude this file from git in .gitignore.

Explanation: if you add this file in the same directory as your Vagrantfile, the Vagrant VM will automatically get the hostname and external IP on the interface you have defined as bridge. You can have multiple configurations in the same file. Select the configuration you want with the use variable. In this example file, use is set to myip1. With the myip1 configuration, the VM will get the name my.host.foo and the IP 192.168.1.101, bridged on eth0.

vim config.yml

---
configs:
  use: myip1
  myip1:
    hostname: my.host.foo
    ip: 192.168.1.101
    netmask: 255.255.255.0
    bridge: eth0
    gateway: 192.168.1.1
  myip2:
    hostname: other.host.foo
    ip: 192.168.1.102
    netmask: 255.255.255.0
    bridge: eth0
    gateway: 192.168.1.1

4. Start the environment with the myip1 external IP.

vagrant up


11.5.2 Examples extending functionality

This section will be extended as the need for more functionality in the development environment arises.

Resize VM disk size in Vagrantfile

To increase the capacity of the VM disk, you need the vagrant-disksize plugin installed on your system, see Installation. Accepted size units are KB, MB, GB and TB. Change the example size, 50GB, to your desired size. Add this example to your Vagrantfile.

# Add example inside the Vagrant configure block
# Vagrant.configure("2") do |config|

if Vagrant.has_plugin?("vagrant-disksize")
  config.disksize.size = '50GB'
else
  config.vm.post_up_message = <<-MESSAGE
    WARNING:
    Can't resize disk. 'vagrant-disksize' plugin is not installed.
    To install the plugin run:
      vagrant plugin install --plugin-version=0.1.3 vagrant-disksize
  MESSAGE
end

# End of Vagrant configure block
# end

11.6 Development of the S-ENDA csw catalog service and relevant Python packages

See DMH: https://htmlpreview.github.io/?https://github.com/metno/data-management-handbook/blob/master/html/data-management-handbook.html#practical-guides


CHAPTER TWELVE

DOCKER CI WITH GITHUB ACTIONS

All Docker containers, and all code, should be tested. Continuous integration (CI) with GitHub Actions takes care of this for repositories that have a Docker container. Repositories that come without a Docker container can be tested with, e.g., Travis CI, and Coveralls for monitoring test coverage.

12.1 Set up automatic Build of Containers

Linking repositories to Docker Hub and letting the Docker site build containers is excruciatingly slow. Lately, GitHub has started to provide a CI system called GitHub Actions, which integrates CI into the repository. GitHub’s integrated CI is up to 10 times faster than Docker’s system with free accounts. Read more about GitHub Actions in the GitHub Actions Documentation.

Docker Hub already has a recommended way of running test suites on a container. We reuse this with GitHub Actions. We also reuse Docker’s way of building and setting up services with docker-compose. Two files will be needed in our repositories containing containers.

• docker-compose.yml, see Overview of Docker Compose.

Containing definitions for building and running one or more containers.

• docker-compose.test.yml, see Automated repository tests.

Containing definitions for building and testing one or more containers.

12.1.1 Day-to-day usage

This will produce containers on Docker Hub where the GitHub master branch is marked with the tag dev. A release tag in GitHub, for example 1.0.0, will be marked with the tags 1.0.0, 1.0, 1 and latest if it is the latest release.

Release tags are on the semantic versioning format without a v prefix, see Semantic Versioning 2.0.0.
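
As an illustration of this tagging scheme, the following Python sketch expands a release version into the set of Docker tags to publish. It mirrors the shell logic used in the workflow further below and is illustrative only:

def docker_tags(version, latest_tag):
    """Expand a release version into the Docker tags to publish.

    The master branch maps to 'dev'; the latest release additionally
    gets its MAJOR.MINOR and MAJOR aliases plus 'latest'.
    """
    if version == "master":
        return ["dev"]
    tags = [version]
    if version == latest_tag:
        major, minor, _ = version.split(".")
        tags += ["{}.{}".format(major, minor), major, "latest"]
    return tags


print(docker_tags("1.0.0", "1.0.0"))  # ['1.0.0', '1.0', '1', 'latest']
print(docker_tags("0.9.2", "1.0.0"))  # ['0.9.2']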

1. Set the environment variable LATEST_TAG in .github/workflows/docker.yml to the repository release tag.

2. Add the new release tag under jobs.schedule-push.strategy.matrix.ref in .github/workflows/docker.yml.

3. Commit changes to repository and merge.

• Commit changes and push to branch.

git commit -m "new release" .github/workflows/docker.yml
git push

• Merge the changes into the master branch.

59

Page 64: S-ENDA use cases Documentation

S-ENDA use cases Documentation

4. Add release tag to repository after merge.

git checkout master
git pull
git tag 1.0.0
git push --tags

12.1.2 Configure GitHub Actions

Our goals:

1. Reuse the existing workflow with docker-compose.

2. Run the test suite on every pull request.

3. Automatically publish the latest container as name:dev to the Docker Hub registry. name is replaced with your container name.

4. Automatically publish releases as name:1.0.0, name:1.0, name:1 and name:latest. name is replaced with your container name, and the tag with your version.

5. Automatically rebuild all containers every night to ensure security patches are included for all tagged versions and the master branch.

Live example at GitHub metno/pycsw-container and result at Docker Hub metno/pycsw.

Authentication to Docker Hub

Before we start, we need to link a Docker Hub repository with the GitHub repository. We use a loose coupling, meaning we don’t actually link it; we only provide GitHub with our Docker Hub credentials.

1. Create a Docker Hub application credential for this GitHub repository.

• Go to Docker Hub’s Account Settings / Security page.

• Click New Access Token.

– As the Access Token Description, write the name of your repository, e.g. username/reponame, and press Create.

– Copy your Access Token to a safe place. Keep it ready for the next step.

2. Add the Docker Hub credentials to the GitHub repository. They will be used in the CI file.

• Go to the Settings page for the repository, cogwheel icon.

• In the left menu, under Options, click on Secrets.

• Add two new secrets.

– DOCKER_HUB_USER

Add login name for Docker Hub here.

– DOCKER_HUB_PASS

Add the application credential you created in step one here.


Minimum repository content

This is an example. Replace repository content with your content.

• Dockerfile; we use a minimal file as an example.

FROM alpine:latest

• docker-compose.yml

---
version: '3.4'
services:
  image:
    image: docker.io/username/image:${VERSION:-dev}
    build:
      context: .

• docker-compose.test.yml

---
version: '3.4'
services:
  sut:
    build:
      context: .
    command: echo Start test script here e.g. ./run_tests.sh

Add CI definition file

Add the following file in the repository as .github/workflows/docker.yml.

Make a note of FIXME and TODO. TODO marks what you need to update with every release. FIXME marks what could be improved at a later point.

name: docker

# FIXME: add yaml anchors when GitHub supports it, strange that they don't

on:
  push:
    # publish image as master=dev or on new tag
    # except on document and ci changes
    branches:
      - master
    tags:
      - '*'
    paths-ignore:
      - '**.md'
      - '.github/workflows/*yml'

  # always run tests on merge
  # except on document and ci changes
  pull_request:
    paths-ignore:
      - '**.md'
      - '.github/workflows/*yml'

  # schedule full rebuild and push on schedule, see todos
  schedule:
    - cron: '13 3 * * *'

env:
  # TODO: remember to update version on new tag
  LATEST_TAG: 1.0.0
  DOCKER_HUB_USER: ${{ secrets.DOCKER_HUB_USER }}
  DOCKER_HUB_PASS: ${{ secrets.DOCKER_HUB_PASS }}

jobs:
  schedule-push:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule'
    strategy:
      matrix:
        # FIXME: is it possible to automatically parse refs?
        # TODO: remember to add new tags to schedule
        ref:
          - master
          - 1.0.0
    steps:
      - uses: actions/checkout@v2
        with:
          ref: ${{ matrix.ref }}
      - run: echo $DOCKER_HUB_PASS | docker login docker.io -u $DOCKER_HUB_USER --password-stdin
      - run: |
          export VERSION=${{ matrix.ref }}
          [ "$VERSION" == "master" ] && export VERSION=dev
          echo VERSION=$VERSION
          docker-compose build
          docker-compose push
          # tag and push versions X.X and X and latest
          if echo "$VERSION" | grep -qE '^\w+\.\w+\.\w+$' && [ "$LATEST_TAG" == "$VERSION" ]; then
            for VERSION in $(echo $VERSION | cut -d. -f1,2) $(echo $VERSION | cut -d. -f1) latest; do
              export VERSION
              echo VERSION=$VERSION
              docker-compose build
              docker-compose push
            done
          fi

  test:
    runs-on: ubuntu-latest
    if: github.event_name != 'schedule'
    steps:
      - uses: actions/checkout@v2
      - run: |
          docker-compose --file docker-compose.test.yml build
          docker-compose --file docker-compose.test.yml run sut

  push:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    steps:
      - uses: actions/checkout@v2
      - run: echo $DOCKER_HUB_PASS | docker login docker.io -u $DOCKER_HUB_USER --password-stdin
      - run: |
          export VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          [ "$VERSION" == "master" ] && export VERSION=dev
          echo VERSION=$VERSION
          docker-compose build
          docker-compose push
          # tag and push versions X.X and X and latest
          if echo "$VERSION" | grep -qE '^\w+\.\w+\.\w+$' && [ "$LATEST_TAG" == "$VERSION" ]; then
            for VERSION in $(echo $VERSION | cut -d. -f1,2) $(echo $VERSION | cut -d. -f1) latest; do
              export VERSION
              echo VERSION=$VERSION
              docker-compose build
              docker-compose push
            done
          fi

12.2 Set Up Unit Testing

Unit testing can be done in the same way as building containers. Please see a working example in the repository metno/mmd.

Take note of the following files in the repository:

• Dockerfile.unittests

• .github/workflows/unittests.yml

• docker-compose.unittests.yml

• run_unittests.sh

The setup can also be tested locally by running vagrant up.


Note: To work locally, the Vagrantfile should contain the following:

config.vm.provision "shell", "run": "always", inline: <<-SHELL
  docker-compose -f docker-compose.unittests.yml up --build --exit-code-from unittests
SHELL

12.3 Set Up Coverage Testing

Coverage testing can be done in the same way as building containers and unit testing. Please see the same working example in the repository steingod/mmd.

Take note of the following files in the repository:

• Dockerfile.coverage

• .github/workflows/coverage.yml

• docker-compose.coverage.yml

• tests-coverage.sh

The setup can also be tested locally by running vagrant up.

Note: To work locally, the Vagrantfile should contain the following:

config.vm.provision "shell", "run": "always", inline: <<-SHELLdocker-compose -f docker-compose.coverage.yml up --build --exit-code-from coverage

SHELL


CHAPTER THIRTEEN

WRITING DOCUMENTATION

All code should be documented. Integration with readthedocs is an easy and effective way of publishing the documentation. This can be integrated with GitHub in order to stay up-to-date.

In the following, we summarise the requirements for both software and service documentation.

13.1 Software documentation

Software should contain both user documentation and developer documentation.

13.1.1 User documentation

User documentation refers to the documentation provided to the end users. The user documentation is designed to assist end users in using the software.

User documentation is important because it provides an avenue for users to learn:

• how to use your software

• features of your software

• tips and tricks of your software

• how to resolve common problems with your software

Without user documentation, a user may not know how to do the above things. See, e.g., https://computersciencewiki.org/index.php/User_documentation for more information.

13.1.2 Developer documentation

Developer documentation refers to the documentation provided to developers. The developer documentation is designed to assist developers in contributing to the software development.

Developer documentation is important because it provides an avenue for developers to adhere to the basic principles and guidelines used in collaborative software development:

• coding language and style

• version control

• testing

• how to handle issues and bugs

Developer documentation may help other developers to contribute more efficiently to your code.


13.2 Service documentation

Placeholder for possible service documentation...

13.3 Compiling the documentation locally

Check out the S-ENDA documentation repository locally. We use Sphinx to build the S-ENDA documentation. To build it locally, it is best to create a virtual environment with sphinx and plantuml installed. We use Conda or Vagrant for this. Both methods produce the same result.

After building the documentation with either Conda or Vagrant, you can view it in a browser by opening build/html/index.html in the repository.

13.3.1 Using Conda

If you have Conda installed, or plan to install it, use this option.

sudo apt-get install graphviz
sudo apt-get install plantuml
conda env create -f source/env.yml
conda activate docs
make clean
make html

13.3.2 Using Vagrant

Alternatively, you can use the provided Vagrantfile to spin up a VM which compiles the documentation for you. The Vagrantfile resides in the root folder of the S-ENDA documentation. Vagrant is explained in the section Development environment.

Note: Since the operations are performed within an isolated virtual machine environment, this is considered to be safer than the above method using Conda.

To build or rebuild the documentation, run the following command in the root folder of the S-ENDA documentation.

vagrant up

Stop the helper VM with the following command.

vagrant halt


CHAPTER FOURTEEN

DEBATING

Discussions about the S-ENDA documentation are kept in the issue tracker at GitHub or in the public Gitter chat. We aim to keep the actual documentation as short and concise as possible. As such, we will link to background information instead of writing it out in the actual documents.

We seek

• To obtain consensus and win-win solutions

• To understand and communicate

• Synergy (recognising that different backgrounds and opinions can lead to better solutions)

Apart from this, please adhere to general codes of conduct in the open-source community (e.g., the Django Code of Conduct) when debating. If you believe someone is violating the code of conduct, we ask that you report it by email ([email protected]).


CHAPTER FIFTEEN

INDICES AND TABLES

• genindex

• modindex

• search
