
felyx Documentation
Release 0.1.0

    Jeff Piolle

    July 11, 2016

Contents

1 Technical documents
  1.1 User guide
  1.2 Installation Guide
  1.3 Administration and configuration guide
  1.4 Developer guide

2 Tutorials
  2.1 Tutorial for Sentinel-3 OLCI data

3 Indices and tables



felyx is a free software solution, written in Python and JavaScript, whose aim is to provide EO data producers and users with an open-source, flexible and reusable tool to allow the quality and performance of data streams (satellite, in situ and model) to be easily monitored and studied.

The development of felyx is funded by the European Space Agency and led by IFREMER with the support of PML and Pelamis.

For more on the felyx project, please visit the project home page: http://www.felyx.org/


CHAPTER 1

    Technical documents

    1.1 User guide

    1.1.1 Introduction

    Principle

felyx tools primarily work as data extraction tools, sub-setting source data over predefined target areas (static or moving). Those subsets and any associated metrics are accessible by users or machines as raw files, automatic alerts and periodic reports. The felyx open-source tools provide back-end and front-end software components to:

• subset large local or remote collections of Earth Observation data over predefined sites (geographical boxes) or moving targets (ship, buoy, hurricane), storing locally the extracted data (referred to as miniProds). These data can be directly accessed by users; they constitute a much smaller representative subset of the original collection on which one can perform any kind of processing or assessment without having to cope with heavy volumes of data.

• compute statistical (or any other kind of) metrics over these extracted subsets, using for instance a set of classic statistical operators (mean, median, rms, ...) that is fully extensible, over some parameters of each dataset. These metrics are stored in a database coupled with a fast search engine (ElasticSearch) from which they can later be queried by users or automated applications.

• provide periodic reports and raise alerts based on a user-defined set of inference rules through various media (email, twitter feed, ...) and devices. The content and conditions on which this information is sent to the user are fully configurable through a web interface.

• analyse the content of the miniProds and metrics through a dedicated web interface allowing users to dig into this base of information and extract useful knowledge through multidimensional interactive display functions (time series, scatterplots, histograms, maps, ...).


    Usage

    Among several potential applications, users can use felyx for:

• monitoring and assessing the quality of Earth observations (e.g. satellite products and their time series) through statistical analysis and/or comparison with other data sources, in NRT or over longer periods

    • assessing and inter-comparing geophysical inversion algorithms or different datasets

    • alerting and reporting on performance degradation or specific conditions

    • performing geophysical analysis (variability, trends,....)

    • observing a given phenomenon, collecting and cumulating various parameters over a defined area

    • crossing different sources of data for synergy applications

In this context, felyx serves different kinds of users:

    • instrument engineers

    • calibration engineers

    • quality control engineers

    • project validation scientists

    • external validation scientists

    • algorithm developers

    • scientific validation community

    • ocean/atmosphere scientific community.

felyx services are deployable at your own premises and adaptable enough to integrate potentially any kind of parameters. Users can operate their own felyx instance at any location, on datasets and parameters of their own interest. The different instances of felyx will be able to interact with each other, creating a web of systems enabling aggregation and cross comparison of data subsets and metrics from multiple sources.


felyx is based on standard and reusable technologies and components making it portable and cross platform. The client-side front-end runs on any browser and a large range of display devices (computer screen, tablet, smart phone).

    1.1.2 Concepts

    Sites and site collections

Sites are geographical areas where data subsets (or miniprods) will be extracted from larger files. Sites can be either static or dynamic. They can be grouped into site collections.

    Sites

Static sites Static sites correspond to a fixed geographical polygon. The file subsets intersecting this polygon are extracted to produce miniprods.

Fig. 1.1: Example of miniprod extraction (in blue) from several satellite swaths at different acquisition times over a static site (limits in red).


Dynamic sites Dynamic sites correspond to a moving target along a spatial and temporal trajectory, defined as a list of positions and times. The file subsets centered on the trajectory locations within a given time window around these locations' times are extracted to produce miniprods.

Dynamic sites allow a Lagrangian extraction of data subsets following a moving feature, which can be a measuring device (drifting buoy, ship, ...) or a natural feature (hurricane, eddy, ...).

Fig. 1.2: Example of miniprod extraction along the cruise track of a ferry line. Miniprods are extracted at the track location closest to the data time.

    Collections

Grouping Site collections are groups of existing sites. Collections are used to group sites that share a common theme or source: collections could encompass a set of platforms of the same type, observation program or data provider ('Argo floats', 'Drifting buoys from iQUAM', ...) or a category of events ('Tropical hurricanes', 'Eddies', ...).

Warning: It is not possible to mix static and dynamic sites in the same collection, but sites can be shared by different collections (to define for instance subsets of larger site collections).

    Here is an example of static site collection (GHRSST HR-DDS sites):


    Here is an example of dynamic collection (drifting buoys):


    Note: The back-end system ensures that a given site is not processed twice in case it belongs to several collections.

Ownership Collections can be shared among users and even among different felyx instances. Collections are therefore assigned an ownership level (level):

• community level: they are (or should be) exclusively managed at a single instance (master), and defined by the master instance administrator. They can be imported by any other instance (slave) to configure miniProd extraction and metrics processing on the same sites for different datasets.

• local level: they are managed by each instance independently, by the local instance administrator.

• custom level: they are private collections defined at an instance by (authorized) users. Authorization is granted by the instance administrator through email request. Only a limited number of collections is granted to each user.

Note: There is no central registration of any collection and any collection can be exchanged freely between instances through the import/export mechanism. There is therefore no fundamental difference between community and local collections, as it is more a matter of organization and agreement between the instances sharing these collections (a single instance shall be given the responsibility of maintaining a community collection definition - the master instance). A local collection could de facto become a community collection as soon as it is shared with another instance.

    The name of the owner of a collection is defined in a specific attribute of the collection (Owner):


    • For community or local level collections, this should be the name of the lab or organization hosting the instance

• For custom level collections, this should be the name of the user who created the collection

    Tags Collections can be associated with a list of tags, single words defining their purpose, properties or usage.

    Miniprods

Miniprods are subsets from larger input EO files matching the boundaries of static or dynamic sites. We call extraction the process of subsetting larger original Earth Observation files (swath, grid, image, along-track, ...) over these sites.

The extraction process extracts all the fields from the source file: you will get the same list of fields in the extracted miniprod. The miniprods are reformatted to a common NetCDF format using the same naming conventions for spatial and temporal dimensions and variables.

    The extraction works slightly differently over static sites (which have permanent boundaries) and dynamic sites.

    Static miniprods

Over a static site, the miniprod is defined as the section of a file fully including the boundaries of the site. This is purely an intersection problem:

Fig. 1.3: In this example, a swath is subsetted so that it fully includes the site area (in yellow). If its boundaries overlap the swath edge, then only the swath section intersecting the site area will be extracted and it will not fully include the site boundaries in this case.

Note: There is no remapping, filtering or resampling of the extracted data. They are stored in their native projection with the exact same content.


Note: In order to make comparisons between miniprods easier, a mask is added to the miniprod. It specifies which pixels from the extracted subset are within the site limits or outside.

Fig. 1.4: Image of the mask corresponding to the same miniprod as above. The red area indicates which pixels are within the static site limits. It is possible to forbid a miniprod creation when there is not a single valid pixel within the site limits.

    Dynamic miniprods

Dynamic sites have no predefined shape and location like static sites. They consist of a list of times and locations (vertices defining a trajectory).

felyx will extract a miniprod from a source file where it intersects a location of this trajectory, under the condition that the time difference between these source data and the time associated with the trajectory location is lower than a given threshold (or time window).

The size of the extracted miniprod is given in pixels by the user, in the extraction configuration. For a swath or grid source file, it is a box of N x N pixels, where N is the chosen size, centered on the pixel matching the trajectory vertex. For along-track data, it is a segment of N pixels centered on the trajectory vertex.


It is possible to forbid a miniprod creation when there is not a single valid pixel within the colocation radius (the limits of the site).
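For readers who want to reproduce this logic on their own data, here is a minimal sketch of the extraction rule (plain Python/numpy for illustration only, not the felyx implementation); it assumes the index of the pixel matching the trajectory vertex has already been located:

from datetime import datetime, timedelta

import numpy as np

def extract_dynamic_subset(data, data_time, vertex_time, vertex_row, vertex_col,
                           box_size=5, time_window=timedelta(minutes=360)):
    """Illustrative only: N x N subset around a trajectory vertex.

    data                   : 2D array read from the source file (swath or grid)
    data_time, vertex_time : datetime of the source data and of the trajectory vertex
    vertex_row, vertex_col : indices of the pixel matching the trajectory vertex
    box_size               : N, the size in pixels chosen in the extraction configuration
    time_window            : maximum allowed time difference
    """
    # no extraction if the source data are too far in time from the vertex
    if abs(data_time - vertex_time) > time_window:
        return None
    # N x N box centred on the pixel matching the trajectory vertex
    half = box_size // 2
    return data[max(0, vertex_row - half):vertex_row + half + 1,
                max(0, vertex_col - half):vertex_col + half + 1]

# e.g. a 5 x 5 subset around pixel (120, 300), within a 6-hour time window
subset = extract_dynamic_subset(np.random.rand(2000, 1500),
                                datetime(2014, 11, 1, 8, 40),
                                datetime(2014, 11, 1, 10, 0), 120, 300)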

    Where are the miniprods?

The miniprods are created in NetCDF format in a dedicated archive area, on disk. The access to this archive area is dependent on the policy of the felyx host.

Typically, a host will offer access to this archive through:

    • direct local or remote (ssh) access to the archive area

    • an FTP server for direct data download.

    • an OpenDAP server for direct reading in a client script.

    • a selection and download through felyx dedicated web services.

Depending on the host policy, these services may not be offered, or may be offered with some restrictions (e.g. a required account).
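For the OpenDAP route, a miniprod can be read directly from a script without downloading the whole file. The sketch below is purely illustrative and uses a hypothetical OpenDAP URL; the real endpoint and path layout depend on your felyx host:

from netCDF4 import Dataset
import numpy as np

# hypothetical OpenDAP URL of a miniprod on a felyx host; adapt to your host's layout
URL = ('http://felyx.example.org/opendap/miniprods/'
       '20141101120000_ghr014_ostia-ukmo-l4-glob-v2.0.nc')

# netCDF4 opens OpenDAP URLs transparently when built with OPeNDAP support
with Dataset(URL) as nc:
    sst = nc.variables['analysed_sst'][0]

print(float(np.ma.mean(sst)))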


The organization of the archive area is described in a later section of this user manual.

    Metrics

Metrics are operators applied to the content of the extracted miniprods that return a single value characterizing the content of these miniprods.

    Metrics operators

A metric operator returns a single value computed from the miniprod content. This value can be indifferently a number or a string. This means that this value can represent a quantitative value or a qualitative value (a property).

felyx comes with a selection of predefined operators. The list of available operators can be extended through the built-in plugin extension mechanism (refer to the developer guide).

    Statistical operators

Name                                   Description
standard_deviation                     standard deviation of a field
median_percentage_difference          median percentage difference of two fields
standard_error                         standard error of a field
min                                    minimum value of a field
max                                    maximum value of a field
mean_percentage_difference
root_mean_square_error                 root mean square error of a field
median                                 median value of a field
coefficient_of_determination
coefficient_of_correlation             correlation coefficient between two fields
percentile
mode
mean_absolute_percentage_difference
kurtosis
skew
mean

    Content extraction operators

Name                     Description
centre_value             return the value at the center of the miniprod (for dynamic sites, it is the location of the intersected trajectory point)
number_of_values         number of valid values
sum_of_values
sum_of_square_values

    Property operators

Name                 Description
ice_presence         return True if there is any sea ice in the miniprod
day_or_night         return "day" if the miniprod is exclusively in daylight (solar zenith angle < 90 deg), "night" if the miniprod is exclusively in night time (solar zenith angle > 110 deg) and "twilight" under any other circumstance (possible values: day, night or twilight)
cloudy
miniprod_metadata


    Common operator arguments

    Common arguments offered by operator plugins to condition the processing of the metrics include:

    limit_to_site Metrics processing can be limited to the site mask content by setting this value to True (by default).

    must_have Conditions can be applied to data before calculation (field vs threshold)
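To make the role of these operators and of limit_to_site more concrete, here is a minimal sketch of what a mean operator applied to one field of a miniprod could look like. It uses plain numpy and netCDF4 for illustration and is not the felyx plugin API (see the developer guide for that):

import numpy as np
from netCDF4 import Dataset

def mean_of_field(miniprod_path, field, limit_to_site=True):
    """Illustrative only: mean of a miniprod field, optionally restricted to the site mask."""
    with Dataset(miniprod_path) as nc:
        values = nc.variables[field][:]
        inside = nc.variables['miniprod_content_mask'][:]
    if limit_to_site:
        # keep only the pixels flagged as inside the site limits
        values = np.ma.masked_where(inside == 0, values)
    return float(values.mean())

# e.g. mean_of_field('20111118120000_CpF_ostia-ukmo-l4-glob-v2.0_....nc', 'analysed_sst')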

What happens to the metrics?

The metrics processed by the operators are then indexed into a search engine. This is the service from which they can then be queried, using the felyx web services and APIs (application programming interfaces).

    They can also be visualized through the default report generation interface shipped with felyx.

    These services and APIs are described in a later section of this document.

    1.1.3 felyx RESTful API

Queries on miniprods and metrics can be performed using a RESTful web service. The web service syntax (or API, Application Programming Interface) enables users to express simple selection requests as well as more complex workflows chaining several actions to each other to express richer requests.

    A query is provided as a JSON string. This section details the syntax of these queries.

    Calling

RESTful web services can be called in many different ways from any client application. We describe here a simple way to call and test the queries described in this section:

• the service URL is the base URL of your felyx installation followed by 'extraction/extraction/' for queries on miniprods and metrics, e.g.:

    setenv URL 'http://felyx.cersat.fr/extraction/extraction/'

• then call the web service as follows, using the curl command:

    curl -XPOST -d @my_query.json ${URL}

    where my_query.json is the file containing the query in json format.
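The same call can be issued from Python with the requests package; this is just a convenience sketch equivalent to the curl command above (the URL and query file are the ones already defined):

import json

import requests

URL = 'http://felyx.cersat.fr/extraction/extraction/'

with open('my_query.json') as f:
    query = json.load(f)

# POST the JSON query to the felyx extraction web service
response = requests.post(URL, data=json.dumps(query))
response.raise_for_status()
print(response.json())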

    See also:

felyx also comes with a Python API, pyfelyx, which alleviates much of the complexity of this syntax for the user. Check the pyfelyx documentation at http://pyfelyx.readthedocs.org/

    Syntax

The following subsections detail the syntax, illustrated through practical examples, of typical queries a user or an application would perform to a felyx server. They range from very simple selections to more complex tasks and workflows.


    Query format

A query consists in building a complete workflow, or sequence of processing steps, which should always start with a selection block and end with an extraction block.

The selection step retrieves miniprods or metrics from felyx, based on some selection criteria (see examples and reference).

    Example:

{
  "extraction": {
    "selection": {
      "metric_list": ["mean_sst"],
      "site_list": ["ghr115"],
      "data_type": "metrics",
      "dataset_list": [
        "ostia-ukmo-l4-glob-v2.0",
        "ostia-esacci-l4-v01.0"
      ],
      "start_time": "2010-01-01T00:00:00",
      "stop_time": "2010-12-30T00:00:00"
    }
  }
}

Steps can be piped into each other by redirecting the output of one step to the input of the next step in the sequence.

{
  "extraction": {
    "colocation": {
      "input": {
        "selection": [
          {
            "stop_time": "2014-12-01T00:00:00",
            "start_time": "2014-11-01T00:00:00",
            "site_list": ["ghr014"],
            "data_type": "miniprods",
            "dataset_list": ["amsr2-jaxa-l2p-v01.0"]
          },
          {
            "stop_time": "2014-12-01T00:00:00",
            "start_time": "2014-11-01T00:00:00",
            "site_list": ["ghr014"],
            "data_type": "miniprods",
            "dataset_list": ["ostia-ukmo-l4-glob-v2.0"]
          }
        ]
      },
      "maximum_time_difference": 720
    }
  }
}

The complete syntax of each step is provided in the API reference below.

    Result format

The result of a query to the felyx API is a JSON string. It is formatted as follows:

{
  "results": {},
  "arguments": {},
  "errors": {}
}

    where:

• results is the actual result of the query

    • arguments provide the default arguments used at each step of the query workflow.

errors are the errors that occurred in the processing of the query
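A minimal sketch of how a client script might unpack such a reply, assuming it has already been parsed into a Python dictionary (for example with response.json() as above):

def unpack_reply(reply):
    """Illustrative only: split a felyx reply into its three sections."""
    if reply.get('errors'):
        # each step of the workflow may report its own errors
        raise RuntimeError('felyx query failed: %s' % reply['errors'])
    print('default arguments used:', reply.get('arguments'))
    return reply.get('results')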

    Reference

    This section lists all the processing steps you can use to build a workflow, with the arguments that need to be provided.

    Selection

    Description Select metrics or miniprods based on several search criteria.

A JSON packet for selection must at least contain a selection object and some mandatory and optional criteria:

{
  "selection": {}
}

Important: the selection block is the first element of any workflow, which can be complex and involve multiple embedded steps. The most basic workflow uses the selection block to retrieve miniprods or metrics values, in which case the selection block is embedded within an extraction block in order to retrieve the result, as follows:

{
  "extraction": {
    "selection": {}
  }
}

    Arguments


    data_type

    Argument data_type

    Mandatory Yes

    Type string (metrics or miniprods)

Description what to select (metrics will return metrics values and miniprods will return miniprod filenames). By default, miniprods are returned (equivalent to "data_type": "miniprods").

    "data_type": "miniprods"

Important: if data_type is metrics, the name of the requested metrics must be provided in the metric_list argument.

    dataset_list

    Argument dataset_list

    Type string or list of strings

    Mandatory Yes

    Description list of datasets from which to select metrics or miniprods.

    Example

    "dataset_list": ["ostia-esacci-l4-v01.0"]

    site_list

    Argument site_list

    Mandatory Yes

    Type string or list of strings

    Description identifiers of the static or dynamic sites from which to select miniprods or metrics.

    Example

    "site_list": ["ghr014"]

    metric_list

    Argument metric_list

    Mandatory Yes if data_type is metrics

    Type string or list of strings

    Description identifiers of the metrics to select, as specified in the felyx configuration.

    Example

    "metric_list": ["mean_sst"]


    start_time, stop_time

    Argument start_time, stop_time

    Mandatory No

    Type string (YYYY-MM-DDThh:mm:ss)

Description time range of the selection. All available metrics or miniprods are selected by default if no time bounds are provided. These must be added in the format YYYY-MM-DDThh:mm:ss. Valid examples are:

    • 2012-10-10T12:00:00 (noon on 10th October 2012)

    • 1981-02-14T03:00:54 (zero minutes and 54 seconds past 3am on 14th February 1981)

Example

"start_time": "2010-01-01T00:00:00",
"stop_time": "2010-01-05T00:00:00"

    metric_constraint

    Argument metric_constraint

    Mandatory No

    Type string

Description allows reducing the returned selection of miniprods or metrics to those which have indexed (pre-processed) metrics complying with the expressed conditions. Multiple conditions must be separated with a ;. It can be applied indifferently to miniprods or metrics.

    Example

This will only return the metrics or miniprods for which the mean_sst indexed metric is greater than 280 Kelvin:

    "metric_constraint": "mean_sst > 280."

    or with multiple conditions:

    "metric_constraint": "mean_sst > 280.;mean_sst < 300."

    constraint_list

    Argument constraint_list

    Mandatory No

    Type list of tuples

Description this keyword allows further limiting the metrics or miniprods selection based on the value of the metrics. There are three basic types of metrics:

1. Boolean metrics (truth values, i.e. ice_presence is either true or false). The metric_name and value keywords must be provided.

2. Flag metrics (keywords, i.e. day_or_night can be any combination of day, twilight and night). The metric_name and value keywords must be provided.

3. Threshold values (numerical values, i.e. mean sea_surface_temperature). In this case, the metric_name, field, operator and value keywords must be provided. The operator value can be any of: eq, ge, gt, mod, le, lte, equal, not_equal, less, less_equal, greater, greater_equal, bit_flag_set, bit_flag_not_set. For bit_flag_set or bit_flag_not_set, value specifies the tested bit(s) and can be expressed as a single integer (starting from bit 0) or a list.

    Example

    "metric_constraint": "mean_sst > 295."

Examples To set the query to return all miniprods for the dataset upa-l2p-ats_nr_2p_v1.0 for the calendar year 2012, at static site ghr109:

{
  "extraction": {
    "selection": {
      "dataset_list": "upa-l2p-ats_nr_2p_v1.0",
      "site_list": "ghr109",
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods"
    }
  }
}

Miniprods from multiple sites can be selected at once, using a list instead of a single site after the site_list keyword:

{
  "extraction": {
    "selection": {
      "dataset_list": "upa-l2p-ats_nr_2p_v1.0",
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods"
    }
  }
}

You can select miniprods or metrics over several datasets in a single query. The dataset_list keyword behaves the same way as the site_list keyword (i.e. it can be a string or a list of strings):

{
  "extraction": {
    "selection": {
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods"
    }
  }
}

It is further possible to limit the list of miniprods by filtering on metric values, using the constraint_list keyword, here selecting only miniprods where no ice is present ("ice_presence" is False):

{
  "extraction": {
    "selection": {
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods",
      "constraint_list": [{"metric_name": "ice_presence", "value": false}]
    }
  }
}

To further limit the miniprods to daytime-only observations (a flag metric):

{
  "extraction": {
    "selection": {
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods",
      "constraint_list": [
        {"metric_name": "ice_presence", "value": false},
        {"metric_name": "day_or_night", "value": "day"}
      ]
    }
  }
}

    One can also limit by some threshold, for example, only SST equal to or above 300K:

{
  "extraction": {
    "selection": {
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "miniprods",
      "constraint_list": [
        {"metric_name": "mean", "value": 300, "field": "sea_surface_temperature", "operator": "gte"}
      ]
    }
  }
}

To return the mean SST (a metric value), set that metric in the metric_list keyword; metric values will be returned instead of miniprod filenames:

{
  "extraction": {
    "selection": {
      "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "metrics",
      "constraint_list": [
        {"metric_name": "mean", "value": 290, "field": "sea_surface_temperature", "operator": "gte"}
      ]
    }
  }
}

If you rather want to constrain your query with a mask test (keep only metrics where the l2p_flags bits for land (bit 1) or ice (bit 2) are not set):

{
  "extraction": {
    "selection": {
      "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": ["ghr109", "ghr101", "glc006"],
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "metrics",
      "constraint_list": [
        {"value": [1, 2], "field": "l2p_flags", "operator": "bit_flag_not_set"}
      ]
    }
  }
}

Multiple selection options mixed together can be quite complex. A better separation is possible by using a list of selection blocks instead of one selection (the argument of selection is then a list). For instance:

{
  "extraction": {
    "selection": {
      "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
      "dataset_list": ["eur-l2p-avhrr_metop_a", "upa-l2p-ats_nr_2p_v1.0"],
      "site_list": "ghr109",
      "start_time": "2012-01-01T00:00:00",
      "stop_time": "2012-12-31T23:59:59",
      "data_type": "metrics",
      "constraint_list": [
        {"metric_name": "mean", "value": 290, "field": "sea_surface_temperature", "operator": "gte"}
      ]
    }
  }
}

    can be simplified with:

{
  "extraction": {
    "selection": [
      {
        "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
        "dataset_list": ["eur-l2p-avhrr_metop_a"],
        "site_list": "ghr109",
        "start_time": "2012-01-01T00:00:00",
        "stop_time": "2012-12-31T23:59:59",
        "data_type": "metrics",
        "constraint_list": [
          {"metric_name": "mean", "value": 290, "field": "sea_surface_temperature", "operator": "gte"}
        ]
      },
      {
        "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
        "dataset_list": ["upa-l2p-ats_nr_2p_v1.0"],
        "site_list": "ghr109",
        "start_time": "2012-01-01T00:00:00",
        "stop_time": "2012-12-31T23:59:59",
        "data_type": "metrics",
        "constraint_list": [
          {"metric_name": "mean", "value": 290, "field": "sea_surface_temperature", "operator": "gte"}
        ]
      }
    ]
  }
}

This is required when having, for instance, different selection criteria for each selected dataset (here applying a threshold on the mean SST value only to the "upa-l2p-ats_nr_2p_v1.0" dataset):

{
  "extraction": {
    "selection": [
      {
        "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
        "dataset_list": ["eur-l2p-avhrr_metop_a"],
        "site_list": "ghr109",
        "start_time": "2012-01-01T00:00:00",
        "stop_time": "2012-12-31T23:59:59",
        "data_type": "metrics"
      },
      {
        "metric_list": [{"metric_name": "mean", "field": "sea_surface_temperature"}],
        "dataset_list": ["upa-l2p-ats_nr_2p_v1.0"],
        "site_list": "ghr109",
        "start_time": "2012-01-01T00:00:00",
        "stop_time": "2012-12-31T23:59:59",
        "data_type": "metrics",
        "constraint_list": [
          {"metric_name": "mean", "value": 290, "field": "sea_surface_temperature", "operator": "gte"}
        ]
      }
    ]
  }
}

    Colocation

Description Produce match-ups between series of miniprods and/or metrics, based on a time difference (maximum_time_difference). Takes as input a list of miniprod or metrics series, from which the first one in the list is matched against the others (the time difference is therefore checked between the elements of the first series and the ones from each other series independently).
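The matching rule itself can be sketched as follows, before looking at the full example below (illustrative Python only, not the felyx implementation): an element of the reference series and an element of another series form a match-up when their times differ by no more than maximum_time_difference minutes.

from datetime import timedelta

def matchup_indices(reference_times, other_times, maximum_time_difference):
    """Illustrative only: index pairs matched within the time window (in minutes)."""
    window = timedelta(minutes=maximum_time_difference)
    return [(i, j)
            for i, t_ref in enumerate(reference_times)
            for j, t_other in enumerate(other_times)
            if abs(t_ref - t_other) <= window]

# e.g. matchup_indices(amsr2_times, ostia_times, 720) with lists of datetime objects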

    Example

{
  "extraction": {
    "colocation": {
      "input": {
        "selection": [
          {
            "stop_time": "2014-12-01T00:00:00",
            "start_time": "2014-11-01T00:00:00",
            "site_list": ["ghr014"],
            "data_type": "miniprods",
            "dataset_list": ["amsr2-jaxa-l2p-v01.0"]
          },
          {
            "stop_time": "2014-12-01T00:00:00",
            "start_time": "2014-11-01T00:00:00",
            "site_list": ["ghr014"],
            "data_type": "miniprods",
            "dataset_list": ["ostia-ukmo-l4-glob-v2.0"]
          }
        ]
      },
      "maximum_time_difference": 720
    }
  }
}

    yields the following result:

{
  "arguments": {
    "colocation": {},
    "packaging": {
      "include_local_names": false
    }
  },
  "results": {
    "amsr2-jaxa-l2p-v01.0": {
      "ghr014": {
        "times": [
          "2014-11-01T08:40:39",
          "2014-11-01T20:39:08",
          "2014-11-02T07:45:52",
          "2014-11-02T21:20:34"
        ],
        "urls": [
          "https://felyx.cersat.fr/extraction/miniprod/20141101084039_ghr014_amsr2-jaxa-l2p-v01.0_20141101080325-JAXA-L2P_GHRSST-SSTsubskin-AMSR2-v1a_D-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141101203908_ghr014_amsr2-jaxa-l2p-v01.0_20141101202500-JAXA-L2P_GHRSST-SSTsubskin-AMSR2-v1a_A-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141102074552_ghr014_amsr2-jaxa-l2p-v01.0_20141102070752-JAXA-L2P_GHRSST-SSTsubskin-AMSR2-v1a_D-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141102212034_ghr014_amsr2-jaxa-l2p-v01.0_20141102210814-JAXA-L2P_GHRSST-SSTsubskin-AMSR2-v1a_A-v02.nc"
        ]
      }
    },
    "ostia-ukmo-l4-glob-v2.0": {
      "ghr014": {
        "times": [
          "2014-11-01T12:00:00",
          "2014-11-01T12:00:00",
          "2014-11-02T12:00:00",
          "2014-11-02T12:00:00"
        ],
        "urls": [
          "https://felyx.cersat.fr/extraction/miniprod/20141101120000_ghr014_ostia-ukmo-l4-glob-v2.0_20141101120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141101120000_ghr014_ostia-ukmo-l4-glob-v2.0_20141101120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141102120000_ghr014_ostia-ukmo-l4-glob-v2.0_20141102120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.nc",
          "https://felyx.cersat.fr/extraction/miniprod/20141102120000_ghr014_ostia-ukmo-l4-glob-v2.0_20141102120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.nc"
        ]
      }
    }
  }
}
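Consuming such a result from a script is straightforward; the sketch below (illustrative only) walks the results section of the reply above and assumes, as in this example, that the two series are returned index-aligned:

def print_matchups(reply):
    """Illustrative only: pair the times and urls of the two series above."""
    results = reply['results']
    reference = results['amsr2-jaxa-l2p-v01.0']['ghr014']
    matched = results['ostia-ukmo-l4-glob-v2.0']['ghr014']
    for t_ref, url_ref, t_match, url_match in zip(
            reference['times'], reference['urls'],
            matched['times'], matched['urls']):
        print(t_ref, '<->', t_match)
        print('  ', url_ref)
        print('  ', url_match)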

    Arguments

    input

    Argument input

    Mandatory Yes

    Type json, the result of a selection step


Description The series to match to each other, as returned by a selection. The first selection in the result list is the reference, i.e. the series against which all other series are matched, with respect to the specified maximum time difference.

Example

"input": {
  "selection": [
    {
      "stop_time": "2014-12-01T00:00:00",
      "start_time": "2014-11-01T00:00:00",
      "site_list": ["ghr014"],
      "data_type": "miniprods",
      "dataset_list": ["amsr2-jaxa-l2p-v01.0"]
    },
    {
      "stop_time": "2014-12-01T00:00:00",
      "start_time": "2014-11-01T00:00:00",
      "site_list": ["ghr014"],
      "data_type": "miniprods",
      "dataset_list": ["ostia-ukmo-l4-glob-v2.0"]
    }
  ]
}

    maximum_time_difference

    Argument maximum_time_difference

    Mandatory Yes

    Type int

    Description The maximum time difference, in minutes, between the reference series and the matchedseries.

    Example

    "maximum_time_difference": 720

    Remapping

    Description Remap a miniprod onto a user defined grid.

Miniprods are stored in felyx in their native grid and projection (there is no transformation between the original source file and the extracted subset). Remapping is therefore necessary to compare or intercompare miniprods with each other.

Warning: This step can only be applied to a selection of miniprods. Using it on a metrics selection will trigger an error.


    Example

    Arguments

    input

    Argument input

    Mandatory Yes

    Type json, the result of a miniprod selection step

    Description The series of miniprods to remap, as returned by a selection.

    resolution

    Argument resolution

    Mandatory No

    Type number

    Description in degrees

    proj4_string

    Argument proj4_string

    Mandatory No

    Type string

Description a string, in proj4 format, defining the target grid for the reprojection. Refer to the Proj4 documentation (http://trac.osgeo.org/proj/).

    hrdds

    Argument hrdds

    Mandatory No (default is false)

    Type bool

Description Use a default grid, which is a regular lat/lon grid at 0.01 degree resolution matching the extraction site's boundaries.

    linear_zero_weight_distance

    Argument linear_zero_weight_distance

    Type number

    Mandatory

    Description in meters


    neighbours

Argument neighbours

    Type number

    Mandatory

    Description in degrees
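felyx performs this remapping server-side; for reference, a comparable operation can be done on downloaded miniprods with the pyresample package. The sketch below is only an assumption made for illustration (nearest-neighbour resampling onto a regular lat/lon grid) and is not a description of the felyx internals; radius_of_influence plays a role comparable to linear_zero_weight_distance above.

import numpy as np
from pyresample import geometry, kd_tree

def remap_to_regular_grid(lons2d, lats2d, values, resolution=0.01):
    """Illustrative only: remap a miniprod field onto a regular lat/lon grid."""
    source = geometry.SwathDefinition(lons=lons2d, lats=lats2d)
    target_lons, target_lats = np.meshgrid(
        np.arange(lons2d.min(), lons2d.max(), resolution),
        np.arange(lats2d.min(), lats2d.max(), resolution))
    target = geometry.SwathDefinition(lons=target_lons, lats=target_lats)
    # radius_of_influence is in meters
    return kd_tree.resample_nearest(source, values, target,
                                    radius_of_influence=5000, fill_value=None)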

    Formatting

    Syntax

"image_format": "png",
"operation_name": "image_map",
"value_range": [275, 285],
"field": "analysed_sst"

    Argument operation_name

    Type string in “image_map”,

    Mandatory

    Description

    Argument image_format

    Type string in “png”

    Mandatory

    Description

    Argument value_range

    Type [number, number]

    Mandatory

    Description

    Argument field

    Type string

    Mandatory

    Description


I have implemented a basic image renderer (image_map) as a plugin. You give it a previous selection, colocation or remapping, along with a value range (v_min, v_max), a field name (sea_surface_temperature) and a format (png) - you can also give a resolution in DPI.

It will return paths to newly generated images (which the server can now access). If you omit the value ranges, then it will establish them on a per site basis, and return in the JSON result what the ranges are. I have also modified the targzip plugin to include the input JSON string (which is the result of the previous query step), so in the chain:

selection -> colocation -> remapping -> image_format -> targzip, the tar file will include the images and the JSON output of the image_format step. This will of course include the ranges, and will be structured in the same way as the colocation step (so you can easily see what the colocations are).

    For example:

{
  "extraction": {
    "packaging": {
      "input": {
        "formatting": {
          "input": {
            "remapping": {
              "hrdds": true,
              "input": {
                "colocation": {
                  "input": {
                    "selection": [
                      {
                        "start_time": "2012-10-01T00:00:00",
                        "site_list": ["ghr014"],
                        "stop_time": "2012-10-05T00:00:00",
                        "data_type": "miniprods",
                        "dataset_list": ["SST-27"]
                      },
                      {
                        "start_time": "2012-10-01T00:00:00",
                        "site_list": ["ghr014", "ghr060"],
                        "stop_time": "2012-10-05T00:00:00",
                        "data_type": "miniprods",
                        "dataset_list": ["SST-27"]
                      }
                    ]
                  }
                }
              }
            }
          },
          "image_format": "png",
          "operation_name": "image_map",
          "value_range": [275, 285],
          "field": "analysed_sst"
        }
      },
      "operation_name": "targzip"
    }
  }
}

Will return a TGZ file with the PNG mapped images of the reprojected, collocated files, and the JSON document describing the colocation relationships (and also including the value ranges, although in this case you already knew them). Additionally, the JSON document will include in the 'miniprods' section a list of miniprods now with the names "formatted_miniprods//.png"

You would be able to access the images directly from the server with the URL:

felyx.og/extraction/miniprods/ to get the actual image. I believe, although I have not tested, that this will work federally too; there is no reason why it would not.

    I need to modify some of the documentation in the code, but it is functional.

Obviously there are many features still to add (for example, how to deal with some SSTs being in fields called 'sea_surface_temperature' and others in 'analysed_sst') - but that shouldn't be difficult. Obviously I need to add the ability to modify the colour scheme, and perhaps add colours and other things.

One question I have regarding the anomaly step you ask for: I assume that this would be a separate plugin step (processing or compositing I guess), but I do not clearly see yet what the output should be - the result is no longer a miniprod - so my instinct is for it to just return a numpy array. Is that acceptable to you?

    I will also try to answer the issues you have added over the weekend.

    Packaging

    Syntax

    Argument operation_name

    Type string, in “targzip”

    Mandatory

    Description


    Some examples

    Colocation query

Query format The general format for a colocation query is as follows:

{
  "colocation": {
    "maximum_time_difference": ,
    "input": {
      "selection": {}
    }
  }
}

    Important:

As with the selection block, this is only one element of a workflow, which can be complex and involve multiple embedded steps. The most basic workflow uses the colocation block to retrieve colocated miniprods or metrics values, in which case the colocation block is embedded within an extraction block as follows:

{
  "extraction": {
    "colocation": {
      "maximum_time_difference": ,
      "input": {
        "selection": {}
      }
    }
  }
}

    Examples

{
  "extraction": {
    "colocation": {
      "maximum_time_difference": 720,
      "input": {
        "selection": [
          {
            "stop_time": "2014-12-31T00:00:00",
            "start_time": "2012-01-01T00:00:00",
            "site_list": ["ghr060", "ghr014"],
            "data_type": "metrics",
            "dataset_list": ["ifr-l4-sstfnd-odyssea-med_002_v2.1"],
            "metric_list": ["mean_sst", "day_or_night"]
          },
          {
            "stop_time": "2014-12-31T00:00:00",
            "start_time": "2012-01-01T00:00:00",
            "site_list": ["ghr060", "ghr014"],
            "data_type": "metrics",
            "dataset_list": ["viirs_npp-navo-l2p-v1.0"],
            "metric_list": ["mean_sst", "day_or_night"]
          }
        ]
      }
    }
  }
}

    Cross query

    1.1.4 felyx Python API

A Python client API to felyx, pyfelyx, is also provided. This Python package allows users to build all types of metrics or miniprod queries and to insert them into their own scripts. It is also more user-friendly than the JSON RESTful API.

Check it out at http://pyfelyx.readthedocs.org/.

    1.1.5 Miniprod format

Miniprods are in netCDF format, whatever the format of the source data. This independence from the source data format is achieved thanks to the cerbere Python package developed by Ifremer, which exposes the content of any data format through the same interface (close to self-descriptive formats such as NetCDF). This package already handles a large range of formats and products, and can be extended in a flexible way. More information on the cerbere package and how to extend it is provided in the online documentation of cerbere (http://cerbere.readthedocs.org).

    The format of the miniprods ensures that:

    • latitude, longitude and time information are always expressed in the same way (same field names and units).

similar data acquisition patterns will always be structured the same way, having the same spatial and temporal dimensions. The names and list of these dimensions for each type of pattern (swath, grid, trajectory) are also described in the online documentation of cerbere.

    Here is an example of miniprod format for a grid product:

netcdf \20111118120000_CpF_ostia-ukmo-l4-glob-v2.0_20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02 {
dimensions:
    time = UNLIMITED ; // (1 currently)
    lat = 13 ;
    lon = 13 ;
variables:
    float lat(lat) ;
        lat:_FillValue = 9.96921e+36f ;
        lat:long_name = "latitude" ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:valid_min = -90.f ;
        lat:valid_max = 90.f ;
        lat:comment = " Latitude geographical coordinates, WGS84 projection" ;
        lat:axis = "Y" ;
    float lon(lon) ;
        lon:_FillValue = 9.96921e+36f ;
        lon:long_name = "longitude" ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
        lon:valid_min = -180.f ;
        lon:valid_max = 180.f ;
        lon:comment = " Longitude geographical coordinates, WGS84 projection" ;
        lon:axis = "X" ;
    int time(time) ;
        time:_FillValue = -2147483647 ;
        time:long_name = "reference time of sst field" ;
        time:standard_name = "time" ;
        time:units = "seconds since 1981-01-01 00:00:00" ;
        time:comment = " " ;
        time:axis = "T" ;
    byte miniprod_content_mask(time, lat, lon) ;
        miniprod_content_mask:_FillValue = -127b ;
        miniprod_content_mask:valid_max = 1L ;
        miniprod_content_mask:flag_meanings = "outside_miniprod inside_miniprod" ;
        miniprod_content_mask:long_name = "miniprod region of interest mask" ;
        miniprod_content_mask:flag_values = 0b, 1b ;
    float analysis_error(time, lat, lon) ;
        analysis_error:_FillValue = -32768.f ;
        analysis_error:long_name = "estimated error standard deviation of analysed_sst" ;
        analysis_error:standard_name = "sea_surface_temperature_error" ;
        analysis_error:units = "kelvin" ;
        analysis_error:valid_max = 327.67f ;
        analysis_error:comment = " OSTIA foundation SST analysis standard deviation error" ;
        analysis_error:coordinates = "lon lat" ;
    byte mask(time, lat, lon) ;
        mask:_FillValue = -128b ;
        mask:long_name = "land sea ice lake bit mask" ;
        mask:valid_min = 1b ;
        mask:valid_max = 31b ;
        mask:comment = " Land/ open ocean/ sea ice /lake mask" ;
        mask:source = "NAVOCEANO_landmask_v1.0 EUMETSAT_OSI-SAF_icemask ARCLake_lakemask" ;
        mask:flag_meanings = "water land optional_lake_surface sea_ice optional_river_surface" ;
        mask:flag_masks = 1b, 2b, 4b, 8b, 16b ;
        mask:coordinates = "lon lat" ;
    float analysed_sst(time, lat, lon) ;
        analysed_sst:_FillValue = -32768.f ;
        analysed_sst:long_name = "analysed sea surface temperature" ;
        analysed_sst:standard_name = "sea_surface_foundation_temperature" ;
        analysed_sst:units = "kelvin" ;
        analysed_sst:valid_min = 270.15f ;
        analysed_sst:valid_max = 318.15f ;
        analysed_sst:comment = " OSTIA foundation SST" ;
        analysed_sst:source = "REMSS-L2P-AMSRE, UPA-L2P-ATS_NR_2P, NAVO-L2P-AVHRR18_G, NAVO-L2P-AVHRR19_G, EUR-L2P-AVHRR_METOP_A, SEVIRI_SST-OSISAF-L3C-v1.0, REMSS_L2P-TMI, GOES13-OSISAF-L3C-v1.0" ;
        analysed_sst:reference = "C.J. Donlon, M. Martin, J.D. Stark, J. Roberts-Jones, E. Fiedler, W. Wimmer. The operational sea surface temperature and sea ice analysis (OSTIA) system. Remote Sensing Environ., 116 (2012), pp. 140-158 http://dx.doi.org/10.1016/j.rse.2010.10.017" ;
        analysed_sst:coordinates = "lon lat" ;
    float sea_ice_fraction(time, lat, lon) ;
        sea_ice_fraction:_FillValue = -128.f ;
        sea_ice_fraction:long_name = "sea ice area fraction" ;
        sea_ice_fraction:standard_name = "sea_ice_area_fraction" ;
        sea_ice_fraction:units = "1" ;
        sea_ice_fraction:valid_max = 1.f ;
        sea_ice_fraction:comment = " Sea ice area fraction" ;
        sea_ice_fraction:coordinates = "lon lat" ;
        sea_ice_fraction:source = "EUMETSAT OSI-SAF" ;

// global attributes:
        :Conventions = "CF-1.6, Unidata Observation Dataset v1.0" ;
        :netcdf_version_id = "4.1.1 of Nov 7 2011 11:35:16 $" ;
        :date_created = "20141126T150130Z" ;


    :date_modified = "20141126T150130Z" ;:id = "HRDDS_ostia-ukmo-l4-glob-v2.0_CpF" ;:naming_authority = "fr.ifremer.cersat" ;:institution = "Institut Francais de Recherche et d\'Exploitation de la Mer/Centre de Recherche et d\'Exploitation satellitaire" ;:institution_abbreviation = "ifremer/cersat" ;:title = "HRDDS miniprod derived from 20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.0-fv02.0.nc over felyx site CpF" ;:summary = "HRDDS miniprod created by felyx on 20141126T150130 from /home/cerdata/provider/ghrsst/satellite/l4/glob/ostia-nrt/2011/322/20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.0-fv02.0.nc provided by UK Met Office. Contains data over CpF (a polygon with bounds from 49.6555841044 to 49.9897558956 degrees North and from -2.0974158956 to -1.7632441044 degrees East)." ;:cdm_feature_type = "grid" ;:keywords = "Oceans > Ocean Temperature > Sea Surface Temperature" ;:keywords_vocabulary = "NASA Global Change Master Directory (GCMD) Science Keywords" ;:standard_name_vocabulary = "NetCDF Climate and Forecast (CF) Metadata Convention" ;:scientific_project = "" ;:acknowledgement = "" ;:license = "" ;:format_version = "Felyx 1.0" ;:history = "Created by felyx on 20141126T150130 with SourceFile(\'/home/cerdata/provider/ghrsst/satellite/l4/glob/ostia-nrt/2011/322/20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.0-fv02.0.nc\', \'ostia-ukmo-l4-glob-v2.0\', True, None, True, False, \'analysed_sst\', )" ;:publisher_name = "ifremer/cersat" ;:publisher_url = "http://cersat.ifremer.fr" ;:publisher_email = "[email protected]" ;:creator_name = "felyx" ;:creator_url = "" ;:creator_email = "[email protected]" ;:processing_software = "Cersat/Cerbere 1.0" ;:processing_level = "" ;:references = "" ;:geospatial_lat_min = 49.525f ;:geospatial_lat_max = 50.125f ;:geospatial_lat_units = "degrees" ;:geospatial_lon_min = -2.225f ;:geospatial_lon_max = -1.625f ;:geospatial_lon_units = "degrees" ;:geospatial_vertical_min = "" ;:geospatial_vertical_max = "" ;:geospatial_vertical_units = "meters above mean sea level" ;:geospatial_vertical_positive = "up" ;:time_coverage_start = "20111118T120000" ;:time_coverage_stop = "20111118T120000" ;:time_coverage_resolution = "" ;:dynamic_target_latitude = 49.82267 ;:dynamic_target_longitude = -1.93033 ;:dynamic_target_time = "20111118T021849" ;:felyx_dataset_name = "ostia-ukmo-l4-glob-v2.0" ;:felyx_site_collection_name = "isar" ;:felyx_site_geometry = "POLYGON ((-1.7632441044005671 49.8226700000000022, -1.7640486686570991 49.8062927183239452, -1.7664546130330852 49.7900731588231551, -1.7704387669689619 49.7741675247221451, -1.7759627608843662 49.7587289959722128, -1.7829733956981708 49.7439062540447807, -1.7914031551647158 49.7298420500475373, -1.8011708560921633 49.7166718299532562, -1.8121824301810134 49.7045224301810151, -1.8243318299532516 49.6935108560921677, -1.8375020500475341 49.6837431551647200, -1.8515662540447779 49.6753133956981756, -1.8663889959722137 49.6683027608843659, -1.8818275247221457 49.6627787669689624, -1.8977331588231532 49.6587946130330877, -1.9139527183239400 49.6563886686571010, -1.9303299999999999 49.6555841044005675, -1.9467072816760598 49.6563886686571010, -1.9629268411768463 49.6587946130330877, -1.9788324752778541 49.6627787669689624, -1.9942710040277858 49.6683027608843659, -2.0090937459552221 49.6753133956981756, -2.0231579499524659 49.6837431551647200, -2.0363281700467484 49.6935108560921677, -2.0484775698189868 49.7045224301810151, -2.0594891439078364 49.7166718299532562, -2.0692568448352842 49.7298420500475373, 
-2.0776866043018294 49.7439062540447807, -2.0846972391156338 49.7587289959722128, -2.0902212330310386 49.7741675247221451, -2.0942053869669151 49.7900731588231551, -2.0966113313429013 49.8062927183239452, -2.0974158955994331 49.8226700000000022, -2.0966113313429013 49.8390472816760592, -2.0942053869669151 49.8552668411768494, -2.0902212330310386 49.8711724752778593, -2.0846972391156342 49.8866110040277917, -2.0776866043018294 49.9014337459552237, -2.0692568448352842 49.9154979499524671, -2.0594891439078369 49.9286681700467483, -2.0484775698189868 49.9408175698189893, -2.0363281700467484 49.9518291439078368, -2.0231579499524663 49.9615968448352845, -2.0090937459552225 49.9700266043018289, -1.9942710040277865 49.9770372391156386, -1.9788324752778546 49.9825612330310420, -1.9629268411768470 49.9865453869669167, -1.9467072816760604 49.9889513313429035, -1.9303300000000005 49.9897558955994370, -1.9139527183239406 49.9889513313429035, -1.8977331588231541 49.9865453869669167, -1.8818275247221465 49.9825612330310420, -1.8663889959722146 49.9770372391156386, -1.8515662540447786 49.9700266043018289, -1.8375020500475348 49.9615968448352845, -1.8243318299532525 49.9518291439078368, -1.8121824301810141 49.9408175698189893, -1.8011708560921640 49.9286681700467483, -1.7914031551647165 49.9154979499524671, -1.7829733956981713 49.9014337459552237, -1.7759627608843667 49.8866110040277917, -1.7704387669689621 49.8711724752778593, -1.7664546130330854 49.8552668411768494, -1.7640486686570991 49.8390472816760663, -1.7632441044005671 49.8226700000000022))" ;:felyx_site_identifier = "CpF" ;:felyx_site_name = "Cap Finistere" ;:geospatial_lat_resolution = 0.04999924f ;:geospatial_lon_resolution = 0.04999995f ;:percentage_coverage_of_site_by_miniprod = 100. ;:source = "20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.0-fv02.0.nc" ;:source_slices = "(slice(2790, 2803, None), slice(3555, 3568, None))" ;

    }
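Reading such a miniprod from Python is straightforward with the netCDF4 package; a minimal illustrative sketch using the standardised field names shown above:

import numpy as np
from netCDF4 import Dataset, num2date

with Dataset('20111118120000_CpF_ostia-ukmo-l4-glob-v2.0_'
             '20111118120000-UKMO-L4_GHRSST-SSTfnd-OSTIA-GLOB-v02.nc') as nc:
    times = num2date(nc.variables['time'][:], nc.variables['time'].units)
    sst = nc.variables['analysed_sst'][0]               # first (and only) time step
    inside = nc.variables['miniprod_content_mask'][0]   # 1 inside the site, 0 outside

# keep only the pixels falling within the site limits
sst_over_site = np.ma.masked_where(inside == 0, sst)
print(times[0], float(sst_over_site.mean()))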


    1.1.6 Miniprod access and organization

    Access

    FTP

Data can be retrieved with any FTP client by connecting to the FTP server of a felyx host (remember some organizations may not offer this capability). The FTP server provides a direct view to the storage area where the miniprods are produced and stored.

The organization of the FTP repository is completely deterministic, making it very easy to access and download your miniprods of interest if you know the dataset(s), the site(s) and the temporal span of these miniprods.

    1.1.7 In situ server

    What is the felyx in situ server?

The felyx in situ server is an independent system that stores and gives access to in situ data values. It can be used either:

    • to query in situ miniprods and/or metrics in the same way you would query EO data from a felyx server.

• to cross query in situ miniprods and/or metrics with miniprods and/or metrics from a felyx server in order to build a joint dataset (match-up dataset) or intercompare these respective data.

We explained in a previous section that a felyx server stores trajectories (defined by lists of times and geographical locations) defining dynamic extraction sites: these trajectories can be associated with in situ measuring platforms (drifters, ships, etc.) but also with other moving features (hurricanes, ...). Being agnostic to what these trajectories represent, felyx does not store nor deliver any in situ data.

In the same way we can intercompare miniprods and metrics from different EO datasets from a single felyx instance or from different ones, we would like to be able to intercompare the metrics or miniprods of an EO dataset extracted over an in situ device trajectory with the corresponding values measured and collected by the device. To this aim, we provide an in situ server which acts exactly like a felyx server and exposes the same web services.

The equivalent of an in situ miniprod is a time series of values centered around a given time. The equivalent of an in situ metric is a single in situ observation or the result of a metrics operator applied to a miniprod (exactly like in an EO felyx server).

    Ingesting data into the in situ server

Users may run in situ data requests either directly to the in situ server for simple queries (to only get in situ data), or through a felyx instance for colocation with miniprods (provided that felyx instance is aware of the in situ server).

    Querying the datasets metadata

    Each dataset’s metadata can be retrieved with the url /metadata, for example:

    :$> curl "http://127.0.0.1:1080/felyx-insitu/metadata"

    yields:

{
  "metadata": {
    "iQuam_v2": {
      "variable": {
        "Wind_Speed": {
          "units": "meter per second",
          "long_name": "in situ wind speed",
          "standard_name": "wind_speed",
          "comment": "in situ observations from ships, drifting buoys and moored buoys, not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Dew_Point_Temperature": {
          "units": "kelvin",
          "long_name": "in situ dew point temperature",
          "standard_name": "dew_point_temperature",
          "comment": "in situ observations from ships, drifting buoys and moored buoys, not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Extra_Quality_Flag": {
          "comment": "Extra Quality flags packed in 8 bytes. Starting From Least Significant Bit:\nbit 0-1 (ARGO Time QC): 0 - Pass, 1 - Fail ARGO inherited QC, 2 - Fail ARGO SST extraction QC (Range Check);\nbit 2-3 (ARGO Location QC): 0 - Pass, 1 - Fail ARGO inherited QC, 2 - Fail ARGO SST extraction QC (Range Check);\nbit 4-6 (ARGO Temperature/Pressure QC): 0 - Pass, 1 - Fail ARGO inherited QC, 2 - Fail ARGO SST extraction QC (Range Check), 3 - Fail ARGO SST extraction QC (Vertical Spike Check), 4 - not in iQuam favored depth range (i.e. from 3dBar to 8dBar); \nbit 7 (CMS Buoy Blacklist / ARGO Grey List): 0- Not flagged, 1-Flagged. \nbit 8 (Performance History): 0- Normal, 1-Suspicious.\nbit 9: reserved.\nbit 10-13: ICOADS individual flags.\nbit 14-17: ICOADS trimming flags.\nbit 18-28: ICOADS adaptive flags.\nbit 29-63: reserved.",
          "long_name": "extra quality flags of in situ SST generated by iQuam or inherited from original data source",
          "source": "iQuam quality control or original data source"
        },
        "Wind_Direction": {
          "units": "degree",
          "long_name": "in situ wind direction",
          "standard_name": "wind_to_direction",
          "comment": "in situ observations from ships, drifting buoys and moored buoys, not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Sea_Water_Pressure": {
          "units": "dbar",
          "long_name": "argo sea water pressure",
          "standard_name": "sea_water_pressure",
          "comment": "sea water pressure of the depth where argo sst is taken, only available from argo (Platform_Type = 6)",
          "source": "argo floats"
        },
        "Air_Pressure": {
          "units": "pascal",
          "long_name": "in situ air pressure at sea level",
          "standard_name": "air_pressure_at_sea_level",
          "comment": "in situ observations from ships, drifting buoys and moored buoys, not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Platform_Type": {
          "comment": "0: Unknown; 1: Ship; 2: Drifting Buoy; 3: Open-sea Moored Buoy; 4: Coastal Moored Buoy; 5: C-MAN Station; 6: Argo Float; 7+: Reserved. Note: only type 1,2,3,4,6 are QCed.",
          "long_name": "in situ platform type"
        },
        "Air_Temperature": {
          "units": "kelvin",
          "long_name": "in situ air temperature",
          "standard_name": "air_temperature",
          "comment": "in situ observations from ships, drifting buoys and moored buoys, not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Longitude": {
          "units": "degrees east",
          "long_name": "longitude",
          "standard_name": "longitude",
          "comment": "precision 0.01 degree"
        },
        "Cloud_Coverage": {
          "units": "okta",
          "long_name": "in situ cloud coverage",
          "comment": "0: completely clear sky; ... 8: completely overcast; 9: sky obscured or observation is missing. not available from argo (Platform_Type = 6)",
          "source": "ships, drifting buoys and moored buoys"
        },
        "Time": {
          "units": "seconds since 1981-01-01 00:00:00",
          "long_name": "reference time of file"
        },
        "Latitude": {
          "units": "degrees north",
          "long_name": "latitude",
          "standard_name": "latitude",
          "comment": "precision 0.01 degree"
        },
        "Quality_Indicator": {
          "comment": "Quality indicator, i.e. Probability of Gross Error of an individual observation, 0: absolute correct, 1: absolute error.",
          "long_name": "quality indicator (probability of error) of in situ SST generated by iQuam",
          "source": "iQuam quality control"
        },
        "Quality_Flag": {
          "comment": "Quality flags packed in 2 bytes.\n Lower Byte: bit 0-1 (Overall Quality): 0-Normal, 2-Noisy, 1-Erroneous, 3-QC Unavailable;\n bit 2-3 (Duplicate Check): 0-No duplicate, 1-Duplicate kept, 2-Duplicate removed;\nbit 4 (Track Check and Geolocation Check): 0-Pass, 1-Fail;\nbit 5 (SST Spike Check): 0-Pass, 1-Fail; \nbit 6 (ID Validity Check): 0-Valid, 1-Invalid;\nbit 7 (Number of Buddies): 0-Checked with 6+ Buddies, 1-Otherwise. \nHigher Byte: reserved.",
          "long_name": "quality flags of in situ SST generated by iQuam",
          "source": "iQuam quality control"
        },
        "Sea_Surface_Temperature": {
          "units": "kelvin",
          "long_name": "in situ sea surface temperature",
          "standard_name": "sea_surface_temperature",
          "comment": "for argo floats (Platform_Type = 6), it is sea water temperature at depth defined by Sea_Water_Pressure; for others, it is sea surface temperature",
          "source": "in situ observations from ships, drifting buoys, moored buoys and argo floats"
        }
      },
      "global": {
        "comment": "original data from ICOADS v2.5 (1980-2007), NCEP GTS (2007-) and argo USGODAE/GDAC (1997-)",
        "license": "freely and openly available, with no restrictions; reference/acknowledgement would be appreciated",
        "title": "iQuam - in situ SST quality monitor",
        "product_version": "2.0",
        "summary": "in situ SST from ships, drifting buoys, moored buoys and argo floats quality controlled by iQuam",
        "netcdf_version_id": "4.0",
        "source": "NOAA/NESDIS/STAR www.star.nesdis.noaa.gov/sod/sst/iquam/v2/",
        "contact": "[email protected], [email protected] or [email protected]",
        "references": "Xu, F. and A. Ignatov, 2013: in situ SST quality monitor (iQuam), JTECH, in press; http://www.star.nesdis.noaa.gov/sod/sst/iquam",
        "platform": "Ship, Drifting buoy, Moored buoy, Argo float",
        "Conventions": "CF-1.4",
        "institution": "NOAA/NESDIS/STAR"
      }
    },
    "test_data": "no available metadata",
    "my_dataset": "no available metadata",
    "my_in_situ_v1": "no available metadata"
  }
}


    Querying data from the in situ server

    Simple query

General syntax

Requests for data directly to the server follow the same style as felyx queries. In this context, miniprods correspond to data returned within a file, whereas metrics correspond to field values directly returned in the request response.

The requests can be performed through web services (in a similar way as for felyx miniprods and metrics) or directly in the back-end using the felyx-in-situ-query command line tool.

A web query is performed as follows:

:$> curl -XPOST "<in situ server URL>/extraction/" -d '<your query>'

Your query can also be put in a text file:

:$> curl -XPOST "<in situ server URL>/extraction/" -d @<your query file>

The equivalent query from the back-end, using the felyx-in-situ-query command line tool, is:

:$> felyx-in-situ-query selection '<your query>' <output file>

The result is written in the text file provided with the <output file> argument.
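The same web query can also be issued from a script. Below is a minimal sketch, not part of felyx itself, assuming Python with the requests package installed and an in situ server listening on the hypothetical URL used in the examples of this section:

import json

import requests

# Hypothetical in situ server URL; replace with the address of your instance.
URL = "http://localhost:1080/felyx-insitu/extraction/"

query = {
    "selection": {
        "start_time": "2012-07-01T00:00:00",
        "stop_time": "2012-07-02T00:00:14",
        "site_list": ["co25619"],
        "dataset_list": ["coriolis_drifter"],
        "data_type": "miniprods"
    }
}

# Send the query as a raw JSON body, exactly as the curl examples do with -d.
response = requests.post(URL, data=json.dumps(query))
response.raise_for_status()
print(response.json())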

Examples

miniprod selection

    A simple in situ data selection returned as a miniprod file will look like this:

:$> curl -XPOST "http://localhost:1080/felyx-insitu/extraction/" -d '{"selection": {

    "stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "miniprods","dataset_list": ["coriolis_drifter"]

    }}'

    or, using the command line tool in the back-end:

    :$> felyx-in-situ-query selection '{"selection": {

    "stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "miniprods","dataset_list": ["coriolis_drifter"]

    }}' toto.txt

    and will yield:

    {"coriolis_drifter": {

    "co25619": {"miniprods": ["LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc", "LOCAL_IN_SITU:20120701000000_co25619_coriolis_drifter_20120702000014.nc"],


    "times": ["2012-07-01T01:10:00", "2012-07-01T02:10:00", "2012-07-01T03:10:00", "2012-07-01T04:10:00", "2012-07-01T05:10:00", "2012-07-01T06:10:00", "2012-07-01T07:10:00", "2012-07-01T08:10:00", "2012-07-01T09:10:00", "2012-07-01T10:10:00", "2012-07-01T11:10:00", "2012-07-01T12:10:00", "2012-07-01T13:10:00", "2012-07-01T14:10:00", "2012-07-01T15:10:00", "2012-07-01T16:10:00", "2012-07-01T17:10:00", "2012-07-01T18:10:00", "2012-07-01T19:10:00", "2012-07-01T20:10:00", "2012-07-01T21:10:00", "2012-07-01T22:10:00", "2012-07-01T23:10:00"]}

    }}

    metric selection

A direct request for field data (simulating metrics) can be made:

    :$> curl -XPOST "http://localhost:1080/felyx-insitu/extraction/" -d '{

    "selection": {"stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "metrics","metric_list": ["water_temperature"],"dataset_list": ["coriolis_drifter"]

    }}'

    to yield:

    {"coriolis_drifter": {"co25619": {

    "metrics": {"water_temperature": [0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.300000011921, 0.300000011921]

    },"times": ["2012-07-01T01:10:00", "2012-07-01T02:10:00", "2012-07-01T03:10:00", "2012-07-01T04:10:00", "2012-07-01T05:10:00", "2012-07-01T06:10:00", "2012-07-01T07:10:00", "2012-07-01T08:10:00", "2012-07-01T09:10:00", "2012-07-01T10:10:00", "2012-07-01T11:10:00", "2012-07-01T12:10:00", "2012-07-01T13:10:00", "2012-07-01T14:10:00", "2012-07-01T15:10:00", "2012-07-01T16:10:00", "2012-07-01T17:10:00", "2012-07-01T18:10:00", "2012-07-01T19:10:00", "2012-07-01T20:10:00", "2012-07-01T21:10:00", "2012-07-01T22:10:00", "2012-07-01T23:10:00"]

    }}

    }
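The returned structure is straightforward to exploit in a script. A minimal sketch, assuming the response above has been saved to a local file named response.json, pairing each timestamp with its water_temperature value:

import json

# Load a response such as the one shown above.
with open("response.json") as f:
    result = json.load(f)

record = result["coriolis_drifter"]["co25619"]
for time, value in zip(record["times"], record["metrics"]["water_temperature"]):
    print("{} {}".format(time, value))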

    metric selection with filters

Primitive filtering of data by other fields is available by adding a list of constraints to the query. A constraint must be in the format:

"<field_name> <operator> <value>"

    for example:

    :$> curl -XPOST "http://localhost:1080/felyx-insitu/extraction/" -d '{

    "selection": {"stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "metrics","metric_list": ["water_temperature"],"constraint_list": ["solar_zenith_angle lt 68."],"dataset_list": ["coriolis_drifter"]

    }}'

    yields:


    {"coriolis_drifter": {"co25619": {

    "metrics": {"water_temperature": [0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.10000000149, 0.300000011921, 0.300000011921]

    },"times": ["2012-07-01T06:10:00", "2012-07-01T07:10:00", "2012-07-01T08:10:00", "2012-07-01T09:10:00", "2012-07-01T10:10:00", "2012-07-01T11:10:00", "2012-07-01T12:10:00", "2012-07-01T13:10:00", "2012-07-01T14:10:00", "2012-07-01T15:10:00", "2012-07-01T16:10:00", "2012-07-01T17:10:00", "2012-07-01T18:10:00", "2012-07-01T19:10:00", "2012-07-01T20:10:00", "2012-07-01T21:10:00", "2012-07-01T22:10:00", "2012-07-01T23:10:00"]

    }}

    }

The available operators are (a combined example is shown after the list):

    1. Equals: eq, equal

    2. Greater than: gt, greater

    3. Greater than or equal: ge, greater_equal

    4. Less than: lt, less

    5. Less than or equal: le, less_equal

6. Not equal: ne, not_equal

    7. Modulo: mod
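Several constraints can be given in the same constraint_list. As an illustration, here is a sketch of a selection combining two of them (field names are reused from the examples above, and it is assumed that a record must satisfy every constraint in the list to be returned):

selection = {
    "selection": {
        "start_time": "2012-07-01T00:00:00",
        "stop_time": "2012-07-02T00:00:14",
        "site_list": ["co25619"],
        "dataset_list": ["coriolis_drifter"],
        "data_type": "metrics",
        "metric_list": ["water_temperature"],
        "constraint_list": [
            "solar_zenith_angle lt 68.",
            "water_temperature ge 0."
        ]
    }
}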

    Federated query

Additionally, due to the federated nature of felyx, a request to both servers can be made in a single query to the main felyx instance, by simply telling the instance that the in situ dataset is on another instance:

    :$> curl -XPOST "http://127.0.0.1:8000/felyx/extraction/extraction/" -d '{

    "extraction": {"selection": {

    "stop_time": "2012-10-07T00:00:00","start_time": "2012-10-01T00:00:00","site_list": ["LMEL"],"data_type": "metrics","metric_list": ["sea_surface_temperature", "mean_sst"],"dataset_list": ["PELAMIS_TEST_IN_SITU:test_data", "SST-34"]

    }}

    }'

This will cause the felyx instance to make a remote call to the in situ instance (or indeed any other felyx instance that it knows about):

    {"SST-34": {

    "LMEL": {"metrics": {"mean_sst": [

    281.89208984375,281.8932800292969,282.02911376953125,282.4496765136719,282.46356201171875,


    282.44659423828125,282.3799743652344

    ],"sea_surface_temperature": [

    null,null,null,null,null,null,null

    ]},"times": ["2012-10-01T00:00:00","2012-10-02T00:00:00","2012-10-03T00:00:00","2012-10-04T00:00:00","2012-10-05T00:00:00","2012-10-06T00:00:00","2012-10-07T00:00:00"

    ]}

    },"PELAMIS_TEST_IN_SITU:test_data": {"LMEL": {

    "metrics": {"sea_surface_temperature": [

    285.0,286.0

    ]},"times": ["2012-10-03T12:12:03","2012-10-04T12:03:24"

    ]}

    }}

    Colocation query

    time constrained query

A simple form of colocation request (when users have a set of times for which they want to check whether data is available) can be achieved by adding the values associated_times and maximum_time_difference (in minutes) to a selection query:

    :$> curl -XPOST "http://localhost:1080/felyx-insitu/extraction/" -d '{

    "selection": {"stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "metrics","metric_list": ["water_temperature"],"dataset_list": ["coriolis_drifter"],


    "associated_times": ["2012-07-01T12:00:00","2012-07-01T18:00:00"

    ],"maximum_time_difference": 20

    }}'

    yields:

    {"coriolis_drifter": {

    "co25619": {"metrics": {

    "water_temperature": [0.10000000149, 0.10000000149]},"times": ["2012-07-01T12:10:00", "2012-07-01T18:10:00"]

    }}

    }

but with a smaller maximum_time_difference and a non-matching date:

    :$> curl -XPOST "http://localhost:1080/felyx-insitu/extraction/" -d '{

    "selection": {"stop_time": "2012-07-02T00:00:14","start_time": "2012-07-01T00:00:00","site_list": ["co25619"],"data_type": "metrics","metric_list": ["water_temperature"],"dataset_list": ["coriolis_drifter"],"associated_times": [

    "2012-07-01T12:00:00","2012-07-01T18:00:00"

    ],"maximum_time_difference": 5

    }}'

    yields:

    {"coriolis_drifter": {"co25619": {

    "metrics": {},"times": []

    }}

    }

Note that the ‘maximum_time_difference’ value can be omitted, in which case it will default to 30 (meaning +/- 30 minutes).

    colocation query with felyx


    Felyx also allows the generation of colocated datasets, for example:

    :$> curl -XPOST "http://127.0.0.1:8000/felyx/extraction/extraction/" -d '{

    "extraction": {"formatting": {

    "input": {"colocation": {

    "input": {"selection": {

    "stop_time": "2012-10-05T00:00:00","start_time": "2012-10-01T00:00:06","site_list": ["LMEL"],"data_type": "metrics","dataset_list": ["SST-26", "SST-27"],"metric_list": ["mean_sst"]

    }},"maximum_time_difference": 15

    }},"operation_name": "json"

    }}

    }'

    Would return only the colocated values:

    {"colocation": {"SST-27": {"LMEL": {"metrics": {"mean_sst": [281.5019836425781,281.92401123046875

    ]},"times": ["2012-10-03T12:00:00","2012-10-04T12:00:00"

    ]}

    },"SST-26": {

    "LMEL": {"metrics": {"mean_sst": [null,280.98237704918034

    ]},"times": ["2012-10-03T12:13:44","2012-10-04T12:02:55"

    ]}

    }


    }}

By adding only an “associated_dataset” string and an “associated_fields” list to the request, values from the in situ server are included in the colocation:

    :$> curl -XPOST "http://127.0.0.1:8000/felyx/extraction/extraction/" -d '{

    "extraction": {"formatting": {

    "input": {"colocation": {

    "input": {"selection": {

    "stop_time": "2012-10-05T00:00:00","start_time": "2012-10-01T00:00:06","site_list": ["LMEL"],"data_type": "metrics","dataset_list": ["SST-26", "SST-27"],"metric_list": ["mean_sst"]

    }},"maximum_time_difference": 15,"associated_dataset": "PELAMIS_TEST_IN_SITU:test_data","associated_fields": ["sea_surface_temperature"]

    }},"operation_name": "json"

    }}

    }'

    The result from the main felyx server would become:

    {"colocation": {"SST-27": {"LMEL": {"metrics": {"mean_sst": [281.5019836425781,281.92401123046875

    ]},"times": ["2012-10-03T12:00:00","2012-10-04T12:00:00"

    ]}

    },"SST-26": {

    "LMEL": {"metrics": {"mean_sst": [null,280.98237704918034

    ]},


    "times": ["2012-10-03T12:13:44","2012-10-04T12:02:55"

    ]}

    },"test_data": {

    "LMEL": {"metrics": {

    "sea_surface_temperature": [285.0,286.0

    ]},"times": ["2012-10-03T12:12:03","2012-10-04T12:03:24"

    ]}

    }}

    }

Multiple associated fields may be included in the request, but only one associated dataset may be included.
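For instance, the colocation block of the previous request could be written as follows to retrieve two in situ fields at once (a sketch only, shown as a Python dictionary; air_temperature is assumed to be available in the associated in situ dataset):

colocation = {
    "input": {
        "selection": {
            "start_time": "2012-10-01T00:00:06",
            "stop_time": "2012-10-05T00:00:00",
            "site_list": ["LMEL"],
            "data_type": "metrics",
            "dataset_list": ["SST-26", "SST-27"],
            "metric_list": ["mean_sst"]
        }
    },
    "maximum_time_difference": 15,
    "associated_dataset": "PELAMIS_TEST_IN_SITU:test_data",
    "associated_fields": ["sea_surface_temperature", "air_temperature"]
}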

    Cross query

A cross query with a felyx server returning a colocated dataset would be as follows:

    {"extraction": {

    "colocation": {"input": {

    "selection": [{

    "dataset_list": ["navo-l2p-avhrr19_l_v1.0"],"start_time": "2012-01-01T00:00:00","stop_time": "2014","site_list": ["ghr060"]

    },{

    "dataset_list": ["SST-34"],"start_time": "2012-01-01T00:00:00","stop_time": "2014","site_list": ["ghr060"]

    }]

},"maximum_time_difference": 720,"associated_dataset": "PELAMIS_IN_SITU:iQuam_v2"

    }}

    }

Note: As the “navo-l2p-avhrr19_l_v1.0” dataset is listed first in the selection, it is treated as the “primary dataset”; this means that colocations are returned only when there is a colocation between navo-l2p-avhrr19_l_v1.0 and


    SST-34, and navo-l2p-avhrr19_l_v1.0 and iQuam_v2.

The important keyword here, compared to a felyx server query, is:

associated_dataset specifies the in situ server and dataset to match with the queried EO miniprods. It is expressed as server:dataset where server is the name of the in situ server and dataset the identifier of the requested in situ dataset (an in situ server can offer several choices of datasets, like any felyx server instance).

    1.1.8 Tutorials

    Working with miniprods

    Querying miniprods

Topic

What you want

You want to download miniprods (or access their content) to use them as input to your application or your analysis.

    The felyx concepts and operations required for such usage include:

    • access to the miniprods

    • fine selection and extraction of miniprods

Simple access to miniprods

If you are interested in a complete (or partial) time series of a dataset over a given site (or multiple datasets and sites), you can directly access the miniprod files using the provided basic services: FTP or OPeNDAP.
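For programmatic access, a miniprod published through OPeNDAP can be opened directly from Python. The following is a minimal sketch; the URL is hypothetical and depends on how your felyx instance publishes its miniprods, and the netCDF4 package must be built with OPeNDAP support:

from netCDF4 import Dataset

# Hypothetical OPeNDAP address of a miniprod; a locally downloaded file
# path works just as well.
url = ("http://felyx.example.org/thredds/dodsC/miniprods/"
       "20120701000000_co25619_coriolis_drifter_20120702000014.nc")

nc = Dataset(url)
print(list(nc.variables))   # list the fields stored in this miniprod
nc.close()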

    Comparing miniprods from different datasets

    What you want

You want to compare two different datasets, either side by side or looking at their difference. These datasets may be available at different times and may have different resolutions.

    The felyx concepts and operations required for such usage include:

• remapping: remap data onto the same grid to visualize them or to compute a difference or anomaly.

• colocation: computing the difference between miniprods from different datasets requires first that we match them to each other within a given time window. The following request, for example, combines colocation, remapping and packaging into a tar.gz archive:

    {"extraction": {

    "packaging": {"input": {

    "remapping": {"input": {

    "colocation": {"input": {

    "selection": [{


    "start_time": "2012-10-01T00:00:00","dataset_list": ["SST-27"],"stop_time": "2012-10-05T00:00:00","data_type": "miniprods","site_list": ["ghr014", "ghr060"]

    }, {"start_time": "2012-10-01T00:00:00","dataset_list": ["SST-26"],"stop_time": "2012-10-05T00:00:00","data_type": "miniprods","site_list": ["ghr014", "ghr060"]

    }]},"linear_zero_weight_distance": 25000,"resolution": 0.1

    }}

    }},"operation_name": "targzip"

    }}

    }
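The targzip operation packages the resulting remapped miniprods into a tar.gz archive. A minimal retrieval sketch, assuming the request above is stored in query.json, that the front-end runs at the URL used in the previous examples and that the archive is returned directly in the response body:

import io
import tarfile

import requests

URL = "http://127.0.0.1:8000/felyx/extraction/extraction/"

with open("query.json") as f:
    query = f.read()

response = requests.post(URL, data=query)
response.raise_for_status()

# Unpack the returned archive of remapped miniprods into a local directory.
with tarfile.open(fileobj=io.BytesIO(response.content), mode="r:gz") as archive:
    archive.extractall("remapped_miniprods")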

    Working with metrics

    Working with in situ data

    1.2 Installation Guide

    Single page guide

    1.2.1 Installation overview

    Introduction

    This document will guide you through the installation of the Felyx system.

It is recommended to print the single page guide or at least the preparation page in order to take notes and use the configuration memo that you will need to install the system.

    Remarks

Felyx has been developed for Python 2.7, so make sure this version is installed on the machines where you want to install Felyx components.

    Setups with less than the recommended number of machines are possible but will not be treated in this document.

    Components

The following figure gives an overview of the different system elements that compose a felyx instance and that will need to be installed:


    felyx uses the following subsystems:

• a frontend composed of a Django Python application run by a uWSGI application server and an nginx web server. The requests from any user or client application go this way:

    client -> nginx -> uwsgi -> django

The Django framework needs access to a relational database: postgresql (preferred), mysql, ...

In a production environment, the front-end will usually be located in the DMZ (unless you don’t want to open your felyx instance to the outside world).

• RabbitMQ is a producer/consumer messaging system. It is used to communicate safely between the back-end and the front-end. For better performance, felyx requests are processed in the LAN area, where it is easier to access the input data and deploy a cluster of servers for multiprocessing. Alternatively, you can also install the whole system in the DMZ.

• ElasticSearch is a search engine used to store and retrieve the metrics processed by felyx over the extracted miniProd data.

• celery is a framework used to perform distributed processing. It is based on consumer tasks that poll the RabbitMQ messaging queues to get and execute job orders.


    Hardware considerations

    Number of machines

The Felyx system can be installed on a single computer for test purposes, but a minimal production installation should comprise at least 6 machines:

    • 1 for RabbitMQ

    • 1 for Celery worker (miniprod & metrics processing) and ElasticSearch

    • 1 for Celery worker (web requests) and ElasticSearch

    • 1 for Celery worker (logs requests) and ElasticSearch

    • 1 for the downloader

    • 1 for the frontend

Note that this setup can only be used to process a limited volume of data and should be scaled up (with more Celery “processing” workers) if the workload grows too much.

    RabbitMQ machine

    The RabbitMQ server will relay job orders from client applications to Celery workers.

If the server is too busy or does not have enough memory, these orders will be delayed and might be ignored (there is a timeout).

Since its role is critical, the RabbitMQ server must have its own machine, with enough memory to store the potentially thousands of messages it will receive and good network capabilities to handle the communication with workers. The required capacity depends on the number of files you intend to process at the same time or in a reprocessing, the number of sites where you extract miniprods, and the number of metrics you intend to process for each miniprod. There is one entry in the RabbitMQ queue per miniprod extracted or metric processed.
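As a purely illustrative order of magnitude: reprocessing 10,000 granules over 200 sites can generate up to 10,000 x 200 = 2,000,000 miniprod extraction entries, plus one entry per metric computed on each miniprod; the RabbitMQ memory should be sized accordingly.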

    ElasticSearch node

ElasticSearch nodes require a lot of RAM and some disk space to store the indexed data (miniproducts information, metrics and logs). A good CPU will speed up indexation and search, but is not essential.

Note: Celery workers and ElasticSearch nodes can share the same hardware, but it may have a negative impact on performance under heavy load.

The Elasticsearch web site provides some good resources on the hardware requirements, depending on whether you are running it on a cluster, on virtual machines, etc.

Also, using at least two different machines will improve the speed and reliability, as Elasticsearch provides both load-balancing and replication capabilities.

    Celery workers

Celery workers will need both a good CPU and a good amount of RAM to perform the operations requested by Felyx users with optimal response times.


Note: Celery workers and ElasticSearch nodes can share the same hardware, but it may have a negative impact on performance under heavy load.

Note: Different workers (e.g. for processing, log requests, web requests) should run on different cores or servers to avoid concurrent access to CPU resources. The largest number of workers should be assigned to the most frequently accessed process, usually the miniprod/metrics processing (especially when reprocessing data) or the web requests. During web requests, distributed processing is performed whenever possible: a large number of workers will therefore result in faster processing of a single request (in addition to the fact that multiple requests can be processed at the same time).

    Downloader server

    The downloader should be installed on its own machine and have read/write permission on the shared disk space.

Note: The Downloader is not mandatory in your installation. It is provided to help the routine ingestion of EO data into felyx. If you only process data offline, or use your own orchestrator, you don’t need it.

    Shared disk

The shared disk space must be readable and writable by the Celery workers. It will store the miniprods and, optionally, the metrics. The miniprods make up most of the volume, and it grows fast if you have a lot of extraction sites, process a lot of products, or process products with a high resolution. It is difficult to give a figure here as it will depend on your own usage. Try to assess it with respect to the total covered area of your products versus the total spatial coverage of your extraction sites.
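As a purely illustrative order of magnitude: if your extraction sites cover about 1% of the area observed by a dataset amounting to 5 TB per year, the corresponding miniprods will occupy roughly 0.01 x 5 TB = 50 GB per year, to be multiplied by the number of datasets you extract from.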

    Note: The shared disk should also be readable by the machines that will host the frontend.

Warning: felyx creates a huge number of files (miniprods and metrics). This must be taken into account when configuring your file system for felyx usage, as the number of inodes may be an issue on some servers.

    Operating system

Felyx back-end is fully implemented in Python 2.7 and should therefore be able to run on any operating system. It has been successfully tested on Linux (Ubuntu, CentOS) and MacOS.

    Preparing the installation

Before installing the Felyx system, several choices must be made, such as the number of workers, the directory where data will be stored, etc.


    Configuration memo

Please fill in the following table; its content will be used in the next steps of the installation process and can serve as a memo for maintenance (note your own value for each entry in the last column):

Description | Abbreviation | Number of values | Example | Your setup
RabbitMQ host | RHOST | 1 | myhost1 |
ElasticSearch nodes | ES_NODES | 1 or more | eshost1, eshost2 |
Celery worker nodes (processing) | WORKERS_p | 1 or more | ahost1, eshost1 |
Celery worker nodes (logs) | WORKERS_l | 1 or more | ahost1, eshost1 |
Celery worker nodes (web) | WORKERS_w | 1 or more | ahost1, eshost1 |
Downloader host | DHOST | 1 | dhost |
Frontend hosts | FHOST | 1 or more | publichost |
Directory containing all felyx packages and downloaded software | INSTALL_DIR | 1 | /opt/felyx/install/sources |
Directory where the configuration files will be created (MUST be on the shared disk) | FELYX_CFG_DIR | 1 | /data/datasets/Projects/felyx/config |
Directory used as workspace by Celery workers (MUST be on the shared disk) | | 1 | /data/datasets/Projects/felyx/var |
Directory where miniproducts will be stored (MUST be on the shared disk) | | 1 | /data/datasets/Projects/felyx/mprods |
URL of ElasticSearch (the IP or hostname of one of the cluster nodes should do) | | 1 | http://eshost1:9200 |
URL of the Felyx API (if you plan to serve the frontend at http://h.tld/felyx, the API URL is http://h.tld/felyx/api/v1) | | 1 | http://publichost/felyx/api/v1 |
Felyx API username | | 1 | felyx |
Felyx API password | | 1 | xxxxxxxxx |
Felyx backend admin username | | 1 | felyx |
Felyx backend admin password | | 1 | xxxxxxxxxx |

    Firewall issues

    The frontend will have to communicate with the backend via:

    • the RabbitMQ server

    • the shared disk space

If you have a firewall between the backend and the frontend (and you should!), you have to make sure that it opens port 5672 for RabbitMQ and allows the frontend server to read the shared disk space.
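Once the firewall is configured, a quick way to verify the RabbitMQ rule is to try opening a TCP connection from the front-end host. A minimal sketch in Python (the host name is hypothetical):

import socket

RHOST = "myhost1"   # replace with your RabbitMQ host

# create_connection raises an exception if the port cannot be reached in time.
sock = socket.create_connection((RHOST, 5672), timeout=5)
print("RabbitMQ port 5672 is reachable from this host")
sock.close()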

    1.2.2 Backend installation

    Installing the ElasticSearch cluster

ElasticSearch is preferably installed as a cluster on several servers, for better performance. To set up the ElasticSearch cluster, you will need to install ElasticSearch on each of these servers (nodes).


    Note: in the following we designate by ES_NODES the list of hosts on which ElasticSearch will be deployed.

This guide describes the installation of ElasticSearch from source, but you can use your operating system package manager to install it automatically.

    Requirements

    • SSH access to each node in ES_NODES

    • Java JDK installed on each node

    Instructions

    On one of the nodes in ES_NODES (we will call this node NODE1):

    1. Get ElasticSearch source code

    wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.4.tar.gz -O /tmp/elasticsearch-1.7.4.tar.gz

    2. Extract it to /opt

    tar xvzf /tmp/elasticsearch-1.7.4.tar.gz -C /opt

    3. Install the elasticsearch-head plugin

cd /opt/elasticsearch-1.7.4
bin/plugin -install mobz/elasticsearch-head

    4. Edit configuration file config/elasticsearch.yml

cluster.name: felyx_cluster
network.host: <node IP>
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["<node1 IP>", "<node2 IP>", ..., "<nodeN IP>"]

    If you plan to use a cluster, but only use one node, remove/comment the discovery.* properties.

• Replace <node IP> by the actual IP address of the current node

• Replace <node1 IP>, <node2 IP>, ..., <nodeN IP> by the IP addresses of all the nodes contained in ES_NODES.

On a standalone installation (everything installed on your workstation or laptop), you just need the following lines:

cluster.name: felyx_cluster
network.host: 127.0.0.1

    5. Start ElasticSearch

    /opt/elasticsearch-1.7.4/bin/elasticsearch

The first ElasticSearch node should now be up and running; all that’s left is to do the same for the other nodes.

    On each remaining node:

    scp -r NODE1:/opt/elasticsearch-1.7.4 /opt


It should allow you to skip steps 1, 2 and 3.

    Then repeat steps 4. (you should just change the “network.host” setting) and 5.

    Test

Open a web browser on http://<node IP>:9200/_plugin/head/ (replace <node IP> by the IP address of one of the ElasticSearch nodes); you should see the cluster status page, listing all the nodes currently connected.

Check that they are all listed on this page: if a node is not there, check its configuration and restart ElasticSearch.
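The cluster state can also be checked programmatically against the standard _cluster/health endpoint. A minimal sketch using Python and the requests package (replace the node address with one of yours):

import requests

ES_URL = "http://eshost1:9200"   # one of your ElasticSearch nodes

health = requests.get(ES_URL + "/_cluster/health").json()
print("cluster status: {}, nodes: {}".format(
    health["status"], health["number_of_nodes"]))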

    Installing the RabbitMQ server

This guide describes the installation of RabbitMQ from the binary package provided by the RabbitMQ developers, but you can use your operating system package manager to install it automatically.

    Requirements

• SSH access to the server (RHOST) on which you want to install the RabbitMQ server

    • Erlang installed on RHOST

    Instructions

    On RHOST:

    1. Get RabbitMQ source code

    wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.2.3/rabbitmq-server-generic-unix-3.2.3.tar.gz -O /tmp/rmq.tar.gz

    2. Extract it to /opt

    tar xvzf /tmp/rmq.tar.gz -C /opt

    3. Install the management plugin

cd /opt/rabbitmq_server-3.2.3
sbin/rabbitmq-plugins enable rabbitmq_management