Garching, April 2007
The Simple Numerical Access Protocol (SNAP)
for theoretical data
Claudio Gheller, CINECA ([email protected])
1. OVERVIEW
Simple Numerical Access Protocol - SNAP
Simple Numerical Access Protocol (hereafter SNAP) defines a standard to access numerical simulation outputs.
Data can be the outcome of different kinds of numerical applications.
However, SNAP is designed to address numerical simulation outputs organized as follows:
• For each timestep, the information must be sampled in a generic 3D space
• Positions in this volume are called x, y and z.
• The sampling can be regular (e.g. a Cartesian mesh) or irregular (e.g. particle or adaptive mesh positions). Each mesh/particle position in the 3D space hosts the same physical quantities (e.g. mass, density, velocity) at each timestep.
Ultimately, the sampling volume does NOT need to be geometric or even 3D. It could be any N-dimensional set of variables on which a meaningful SNAP operation can be performed. Conditional queries could also be supported (e.g. extract data from a given region where the temperature is higher than a given value).
However, for simplicity, we start by dealing with geometric 3D operations.
SNAP main stages
1. Search for available simulations and data. The query is on metadata. The result is an XML document (possibly a VOTable) with the metadata of the matching results.
2. Identification of the subset of interest. The user identifies and sets a subset of the full simulation data which is of interest. This subset is defined both in time and in space.
3. Snap request. The selection parameters for the Snap action are sent to the server.
4. Data staging and delivery. Metadata and data are delivered (possibly after some time, needed for extraction) via HTTP or FTP, with the data as binary files.
5. Service registration. SNAP services need to be published in an available registry. Registry queries must be performed according to the SNAP data model.
[Figure: example user workflow. "Search for simulations with Lambda>0.7"; "I like this one"; "It's too large!"; "Let's SNAP it!"; results returned as a metadata VOTable and a binary data file.]
Data levels
Mimicking observational data, simulated data can be organized in 3 levels:
Level 0: direct outcome of the simulation. Examples are the coordinates and velocities of particles in an N-Body simulation, the density field on the computational mesh of a jet simulation, etc.
Level 1: data extracted or derived from the simulation results, having the same characteristics as the simulation results themselves. For example, the coordinates of the points that build up a galaxy cluster extracted from a cosmological simulation using a friends-of-friends algorithm.
Level 2: results that have been obtained after an analysis process from Level 0 and Level 1 data. Examples are projected maps, statistical functions, Virtual Telescope applications.
SNAP deals with Level 0 and Level 1 data
SNAP and Data levels
The SNAP protocol deals with Level 0 and Level 1 data. It specifies the following services:
• retrieval of the entire simulation outcome (the particle positions and velocities within the simulation box, or the physical quantities at each grid point), known as a snapshot, at one or more timesteps (this is not a simple download!)
• retrieval of a specific subset or subvolume of a simulation (e.g. all the particles/grid-points within a certain region)
SNAP in action: an example
http://www.astrocomp.it/itvo
Implemented by the Astrophysical Observatory of Catania (Becciani U., Costa A., Costa V.)
Developed according to the first Theoretical Data Model Prototype, together with a similar implementation in Trieste
Available:
• Simulation discovery
• VOTable download
• Thumbnails
• GetSnap
Ask Ugo for details…
Requirements for compliance
The SNAP service MUST be implemented according to a minimal set of characteristics. Each of the defined characteristics should be developed according to the specifications of the SNAP documents.
1. The service MUST support a Simulation Selection service. The SNAP service MUST provide tools to select the datasets and proceed with the following steps of the SNAP procedure.
2. The SNAP service MUST support a getUnits method (or getFields method… to be discussed). This method allows clients to get the list of units associated with the available fields.
3. The Sub-Volume Extraction method SHOULD be supported. If supported, a getThumb method MUST be available.
4. The setSnap method MUST be supported. This method allows clients to submit a SNAP operation.
Requirements for compliance (cont.ed)
6. The data retrieval (getSnap) method MUST be supported. This method allows clients to retrieve single simulation snapshots and cutouts.
7. The SNAP service MUST be registered by providing the information which describes the available functions. Registration allows clients to use a central registry service to locate compliant simulation access services and select an optimal subset of services to query, based on the characteristics of each service and the simulation data collections it serves.
8. Job management request methods, getSnapInfo, cancelSnap, MAY be supported. These methods allow users to inquire about the status of a submitted request and, possibly, to cancel it.
2. SIMULATION SELECTION AND UNITS
Simulation discovery
Available simulations are returned as the result of a query based on a set of physical and technical parameters which to some degree specify the type of simulation of interest to the user.
These parameters can be general or specific to the discipline or research field of interest. The details of the search criteria and of their execution are not part of the SNAP protocol implementation.
SNAP Model
The SNAP service MUST provide tools to select the datasets of interest and proceed with the following steps of the SNAP procedure.
Selection tools must be developed according to the SNAP data model presented by GL. Results must be described according to the same model:
http://www.ivoa.net/twiki/bin/view/IVOA/IVOATheorySimulationDatamodel
Data Units
Data are stored in the archives with specific units, which can be retrieved by the client through a getUnits() method.
The client can present the data in any suitable unit.
However, the client MUST convert any quantity into server-side units before submitting any request. E.g., the center of a computational volume of an N-body cosmological simulation can be specified in Mpc by the client, since this is familiar to the user. However, particle coordinates could be represented by the simulation code in the [0, 1] range. Therefore, the center position must be converted by the client to this internal representation.
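As a worked illustration of this conversion, the sketch below maps a position given in Mpc to the [0, 1] box units of the example. The 100 Mpc box size and the function name are assumptions made for illustration only; a real client would take the box size from the simulation metadata.

```python
def to_box_units(pos_mpc, box_size_mpc):
    """Convert a position in Mpc to dimensionless [0, 1] box coordinates."""
    if box_size_mpc <= 0:
        raise ValueError("box size must be positive")
    return tuple(c / box_size_mpc for c in pos_mpc)


# Hypothetical 100 Mpc box: the user asks for a centre at (30, 25, 90) Mpc.
center_box = to_box_units((30.0, 25.0, 90.0), 100.0)
print(center_box)
```

The client keeps showing Mpc to the user and only applies the conversion when it builds the request.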
getUnits method
In order to submit a getUnits request, the following parameters must be specified and passed to the server:
DATASERVICE and DATASOURCE
which specify the service and the data object (described later)
FIELDS
which specifies the quantities for which we need units. The parameter is described later.
The getUnits method returns a string in which the units are listed in the same order as the quantities specified in the FIELDS parameter, e.g.:
FIELDS="xposition,yposition,zposition,velocity,temperature"
UNITS_RESULT="Mpc,Mpc,Mpc,km sec-1,K"
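A client could pair the two strings as in the following minimal sketch. It assumes only what the slide states (comma-separated values, same ordering); the function name is hypothetical.

```python
def parse_units(fields_value, units_result):
    """Pair each requested field with its unit, relying on the shared ordering."""
    fields = [f.strip() for f in fields_value.split(",")]
    units = [u.strip() for u in units_result.split(",")]
    if len(fields) != len(units):
        raise ValueError("FIELDS and UNITS_RESULT have different lengths")
    return dict(zip(fields, units))


units = parse_units(
    "xposition,yposition,zposition,velocity,temperature",
    "Mpc,Mpc,Mpc,km sec-1,K",
)
print(units["velocity"])  # km sec-1
```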
getFields method
Alternatively, a getFields() method could be implemented, which returns all the available fields together with the corresponding units (in this case some unnecessary information could be transmitted).
In order to submit a getFields request, the following parameters must be specified and passed to the server:
DATASERVICE and DATASOURCE
which specify the service and the data object (described later)
The getFields method returns a string in which fields and the corresponding units are listed, e.g.:
FIELDS_RESULT="xposition,Mpc,yposition,Mpc,zposition,Mpc,velocity,km sec-1,temperature,K"
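The alternating field/unit layout of this string can be unpacked as sketched below. The function name is hypothetical, and the sketch assumes that neither field names nor units contain commas.

```python
def parse_fields_result(fields_result):
    """Split 'name,unit,name,unit,...' into a list of (name, unit) pairs."""
    tokens = [t.strip() for t in fields_result.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of comma-separated tokens")
    # Even positions hold field names, odd positions hold their units.
    return list(zip(tokens[0::2], tokens[1::2]))


pairs = parse_fields_result(
    "xposition,Mpc,yposition,Mpc,zposition,Mpc,velocity,km sec-1,temperature,K"
)
print(pairs[3])  # ('velocity', 'km sec-1')
```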
3. SUBSET SELECTION
Subset selection
The Snap request must be submitted according to the prescription presented (and discussed) later. A data cutout service could be implemented. This allows the user to focus on interesting regions/features of the simulation and to select and download only the related data. If it is implemented, the service MUST provide tools to enable the client to specify the size and position of the subset.
In general, size and position can be any N-tuple of parameters in an N-D phase space.
For simplicity we will focus on geometrical 3D examples.
The “thumbnail”
“A miniature representation of a page or image that is used to identify a file by its contents”
The thumbnail is a representative, but much smaller (with respect to the data size), realization of the whole dataset. Since the whole dataset is not downloadable or directly "usable", this is a way to have a light interaction with the data.
The thumbnail data and the associated tools should allow the user to perform all the operations necessary to select the region for the cutout.
We cannot identify a unique thumbnail tool. It depends on the scientific field, on the data, its geometry, its meaning… Some examples are given in the following slides.
Geometrical sampling
1. The thumbnail is a decimated set of data. Decimation can be obtained by random selection (e.g. for N-Body particles) or by averaging neighbouring cells (for mesh simulations).
2. The thumbnail preserves the dimensionality and geometry of the original dataset.
3. An appropriate application (web based, visualization tool…) is used to set the selected region.
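The two decimation strategies mentioned above can be sketched as follows. This is only an illustration with plain Python lists; the function names, the fixed random seed and the cubic-mesh assumption are not part of the SNAP proposal.

```python
import random


def decimate_particles(particles, fraction, seed=0):
    """Random subsample keeping roughly `fraction` of the particles."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not particles:
        return []
    n_keep = max(1, round(len(particles) * fraction))
    return random.Random(seed).sample(particles, n_keep)


def coarsen_mesh(mesh, factor):
    """Average factor**3 blocks of neighbouring cells of a cubic mesh.

    `mesh` is a nested list indexed as mesh[i][j][k] with equal side lengths.
    """
    n = len(mesh)
    if n % factor != 0:
        raise ValueError("mesh size must be a multiple of factor")
    m = n // factor
    sums = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                sums[i // factor][j // factor][k // factor] += mesh[i][j][k]
    norm = factor ** 3
    return [[[v / norm for v in row] for row in plane] for plane in sums]
```

Both functions return data with the same dimensionality as the input, in line with point 2 above.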
Projections and Cutting Planes
1. The thumbnail is an (N-1)-dimensional representation of the dataset. Examples are the projection along the line of sight (e.g. column density in cosmological simulations), or cutting planes in interesting regions of the computational box (e.g. the main axis in a jet simulation).
2. The user application must provide tools to have multiple views of the data (e.g. orthogonal planes) and to select interesting regions.
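For a 3D mesh, both kinds of thumbnail reduce the data to a 2D map. The sketch below shows one possible way to do it with nested lists indexed as mesh[x][y][z]; the function names and the axis numbering (0 = x, 1 = y, 2 = z) are assumptions of this example.

```python
def project_along_axis(mesh, axis):
    """Sum a 3D mesh along one axis, giving a 2D map (e.g. a column density)."""
    nx, ny, nz = len(mesh), len(mesh[0]), len(mesh[0][0])
    if axis == 0:  # result indexed [y][z]
        return [[sum(mesh[i][j][k] for i in range(nx)) for k in range(nz)]
                for j in range(ny)]
    if axis == 1:  # result indexed [x][z]
        return [[sum(mesh[i][j][k] for j in range(ny)) for k in range(nz)]
                for i in range(nx)]
    if axis == 2:  # result indexed [x][y]
        return [[sum(mesh[i][j][k] for k in range(nz)) for j in range(ny)]
                for i in range(nx)]
    raise ValueError("axis must be 0, 1 or 2")


def cutting_plane(mesh, axis, index):
    """Extract the 2D slice of the mesh at position `index` along `axis`."""
    if axis == 0:
        return [row[:] for row in mesh[index]]
    if axis == 1:
        return [plane[index][:] for plane in mesh]
    if axis == 2:
        return [[row[index] for row in plane] for plane in mesh]
    raise ValueError("axis must be 0, 1 or 2")
```

Calling either function for the three axes gives the orthogonal views mentioned in point 2.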
Selection algorithms
1. The thumbnail is a subset of the whole dataset, determined according to specific selection algorithms. E.g. the highest peaks of a mass distribution or the regions with temperature higher than a certain threshold…
2. The dimensionality of the data is the same (in general) as the original one.
3. The client tools must allow the user to choose some of the resulting objects and get the corresponding data.
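Two simple selection rules of the kind mentioned in point 1 are sketched below: a temperature threshold and an "N highest peaks" selection. The record layout (x, y, z, value) and the function names are assumptions made for the example.

```python
def select_above_threshold(cells, threshold):
    """Keep the (x, y, z, value) records whose value exceeds the threshold."""
    return [c for c in cells if c[3] > threshold]


def highest_peaks(cells, n):
    """Return the n records with the largest value, highest first."""
    return sorted(cells, key=lambda c: c[3], reverse=True)[:n]


cells = [
    (0.1, 0.1, 0.1, 1.0e4),
    (0.5, 0.5, 0.5, 3.0e7),
    (0.9, 0.2, 0.4, 2.0e6),
]
hot = select_above_threshold(cells, 1.0e6)
print(len(hot))  # 2
```

The selected records keep their 3D positions, so the thumbnail has the same dimensionality as the original data.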
The getThumb method
The specific details of the thumbnail services depend on their implementation and they must be published to the registry. However, a minimal set of negotiation methods and interfaces can be defined.
A getThumb method MUST be implemented. The input of this method is a pair of parameters, DATASERVICE and DATASOURCE, which identify the dataset of interest. The output is a SNAP VOTable (or XML descriptor file, see later…) describing the thumbnail features. As for the results of the Snap procedure, thumbnail data are stored in external binary files. Since they are small, they are downloaded together with the VOTable immediately, as a response to the method.
No other details can be specified a priori, since they depend strongly on the service implementation.
4. THE SNAP SERVICE
The SNAP service
The main target of the SNAP service is access to the raw data of a simulation, selected by a general Simulation Query.
The SNAP service in general provides the following functionalities:
1. Extraction of a subset of data selected in a rectangular or spherical volume
2. Storage of the associated metadata in a VOTable (or XML descriptor file)
3. Storage of data in a binary file
4. Delivery of the result to the user via http, ftp etc.
The extraction phase (1) allows the user to focus on regions of interest, without having to download the whole dataset. Nevertheless, retrieving the complete dataset is still possible.
The setSnap method
In order to submit the Snap request, a setSnap() method MUST be implemented, with parameters defined in the following slides.
To select the region of interest, only geometric parameters are necessary. For a rectangular region, the user has to specify the center of the box and the length of each of its sides. For a spherical selection, center and radius of the sphere are required. One or more variables of a given snapshot can be selected in the same cutout operation. Only one timestep corresponds to a setSnap request.
setSnap input parameters
An input Sub-Volume query consists of an x,y,z position in the box, plus the side lengths (or radius) of the rectangular (spherical) region surrounding this point. Units are chosen by the client, but the values must finally be converted by the client into server-compatible units.
The service MUST support the following parameters:
POS
The position of the center of the region of interest, expressed in proper units. Example: "POS=0.3,0.25,0.9". A NULL value represents the center of the whole box (e.g. 0.5,0.5,0.5).
SIZE
The size of the sides (or the radius) of the region in proper units. The region may be specified using either one or three values. If only one value is given it represents the radius of the sphere. The format of the SIZE parameter is the same as that for POS. Example: "SIZE=0.2,0.5,0.3". A special case is SIZE=NULL, which represents the whole box.
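A server (or a client validating its own request) might interpret POS and SIZE as in the sketch below. The function names, the region tags "whole", "sphere" and "box", and the default box center (0.5, 0.5, 0.5) for a unit box are assumptions of this example, not part of the protocol text.

```python
def parse_pos(value, box_center=(0.5, 0.5, 0.5)):
    """Parse 'x,y,z'; NULL (or None) means the center of the whole box."""
    if value is None or value.strip().upper() == "NULL":
        return tuple(box_center)
    parts = [float(p) for p in value.split(",")]
    if len(parts) != 3:
        raise ValueError("POS needs exactly three values")
    return tuple(parts)


def parse_size(value):
    """Return ('whole', None), ('sphere', radius) or ('box', (sx, sy, sz))."""
    if value is None or value.strip().upper() == "NULL":
        return ("whole", None)
    parts = [float(p) for p in value.split(",")]
    if len(parts) == 1:
        return ("sphere", parts[0])
    if len(parts) == 3:
        return ("box", tuple(parts))
    raise ValueError("SIZE needs one or three values")


def inside(point, center, region):
    """True if `point` lies in the region (no boundary handling here)."""
    kind, size = region
    if kind == "whole":
        return True
    if kind == "sphere":
        return sum((p - c) ** 2 for p, c in zip(point, center)) <= size ** 2
    return all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size))
```

For example, with POS=0.3,0.25,0.9 and SIZE=0.2,0.5,0.3 the point (0.3, 0.25, 0.95) is selected, because it lies within half a side length of the center along every axis.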
setSnap input parameters (cont.ed)
The following parameters SHOULD be supported:
BOUNDARY
This parameter can also have one or three values, one for each coordinate direction. If only one value is given it applies to all coordinate axes. Possible values are:
• TRUNC: if the region of interest exceeds the computational box, it is truncated at the box boundary
• PERIODIC: if the region of interest exceeds the computational box, data are selected from the opposite side of the box
The metadata of the service indicates whether PERIODIC is supported.
FIELDS
The service SHOULD support an optional parameter with the name FIELDS, the value of which is a comma-separated list of field names corresponding to the data elements the simulation can return. If the parameter is not provided, the default behavior is to return all fields. Example: "FIELDS=Density,Temperature,Velocity_x"
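The difference between TRUNC and PERIODIC can be made concrete along a single axis. The sketch below returns the coordinate intervals that a request covers; the unit box [0, 1], the function names and the assumption that the center lies inside the box are choices made for this example only.

```python
def parse_boundary(value):
    """Expand a BOUNDARY value with one or three entries to three modes."""
    modes = [m.strip().upper() for m in value.split(",")]
    if len(modes) == 1:
        modes = modes * 3
    if len(modes) != 3 or any(m not in ("TRUNC", "PERIODIC") for m in modes):
        raise ValueError("BOUNDARY needs one or three TRUNC/PERIODIC values")
    return tuple(modes)


def axis_intervals(center, size, mode, box=1.0):
    """Intervals inside [0, box] selected along one axis.

    Assumes 0 <= center <= box. TRUNC clips the interval at the box edge;
    PERIODIC adds the part that wraps around to the opposite side.
    """
    lo, hi = center - size / 2, center + size / 2
    clipped = (max(lo, 0.0), min(hi, box))
    if mode == "TRUNC":
        return [clipped]
    if mode == "PERIODIC":
        if size >= box:
            return [(0.0, box)]
        intervals = [clipped]
        if lo < 0.0:
            intervals.append((lo + box, box))
        if hi > box:
            intervals.append((0.0, hi - box))
        return intervals
    raise ValueError("unknown boundary mode")


print(axis_intervals(0.875, 0.5, "TRUNC"))     # [(0.625, 1.0)]
print(axis_intervals(0.875, 0.5, "PERIODIC"))  # [(0.625, 1.0), (0.0, 0.125)]
```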
setSnap input parameter: data sources
Simulation outputs are stored in files. These files can be indicated by a reference name which unambiguously identifies the data source. The data source can also be a database. However, this does not imply anything about the service interface implementation. The complexity of the database access is hidden behind the setSnap operation and its implementation, which is up to the service provider.
DATASOURCE
The service MUST support a parameter with the name DATASOURCE, the value of which is a single data source reference. Examples: "DATASOURCE=/scratch/my_directory/myfile1.bin", "DATASOURCE=myfile2.ref"
The service id MUST also be specified.
DATASERVICE
Identification of the data service (to be better specified)
A SNAP operation MUST refer to a single data source. Multiple-source cutouts, such as those for various time steps of the same simulation, cannot be supported by the protocol. Their implementation is up to the client, for example as sequences of single-source requests with the same sub-box and fields. The client must verify that such an operation is possible and/or meaningful.
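A client-side loop over several time steps could look like the sketch below, which builds one setSnap parameter set per data source with the same sub-box and fields. The parameter names come from the slides; encoding them as an HTTP query string, the function name and the sample values ("example-service", "t1.ref", "t2.ref") are assumptions of this example.

```python
from urllib.parse import urlencode, parse_qs


def build_setsnap_queries(service, sources, pos, size, fields):
    """One query string per data source, all sharing the same cutout."""
    queries = []
    for source in sources:
        params = {
            "DATASERVICE": service,
            "DATASOURCE": source,
            "POS": ",".join(str(v) for v in pos),
            "SIZE": ",".join(str(v) for v in size),
            "FIELDS": ",".join(fields),
        }
        queries.append(urlencode(params))
    return queries


queries = build_setsnap_queries(
    "example-service",        # hypothetical service id
    ["t1.ref", "t2.ref"],     # hypothetical references, one per time step
    (0.3, 0.25, 0.9),
    (0.2, 0.5, 0.3),
    ["Density", "Temperature"],
)
print(parse_qs(queries[1])["DATASOURCE"])  # ['t2.ref']
```

Each query string is then submitted as a separate setSnap request, and each returns its own result id.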
setSnap input parameters: File Formats etc.
The SNAP service delivers its results as VOTables (or XML descriptor files) with associated binary files.
The service MAY support a parameter with the name FORMAT to indicate the desired format or formats of the data referenced by the output table. Possible formats are:
• data/raw_tabular
• data/raw_sequential
• data/votable
• data/hdf5
• data/fits
Service-Defined Parameters. The service MAY support additional service-specific parameters. The names, meanings, and allowed values are defined by the service. The names need not be upper-case; however, they should not match any of the reserved parameter names defined above.
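A service could check its own extra parameter names against the reserved ones with a few lines like the following. The reserved set lists the parameter names introduced in these slides; treating the comparison as case-insensitive is an assumption of this sketch, as are the function and sample parameter names.

```python
RESERVED = {"POS", "SIZE", "BOUNDARY", "FIELDS", "DATASOURCE", "DATASERVICE", "FORMAT"}


def check_service_parameters(names):
    """Reject service-defined names that clash with reserved SNAP parameters."""
    clashes = sorted(n for n in names if n.upper() in RESERVED)
    if clashes:
        raise ValueError("reserved parameter names used: " + ", ".join(clashes))
    return list(names)


print(check_service_parameters(["resolution", "Smoothing"]))
```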
setSnap output
The setSnap output is an id that refers to the setSnap result.
This id will be used in the next stages of data delivery.
SNAP results
The result of the SNAP query consists of
• A VOTable (or XML descriptor file) with the description of the result and of the data
• A binary file with the extracted data
Both the VOTable and the data could be delivered after a staging procedure (see later).
The description VOTable consists of the following elements:
• A RESOURCE element, identified with the tag type="results", containing one or more TABLE elements with the metadata results of the setSnap operation
• The TABLE in the output VOTable MUST contain FIELDs that refer to the variables stored in the external binary file. FIELDs can be organized either as tables or as sequences
• Variables must be scalars: vectors (or more general multidimensional quantities) are not supported as such. The components of a vector are instead represented by separate FIELDs
SNAP results (cont.ed)
• The VOTable MUST contain a DATASERVICE parameter which identifies the used service.
• The VOTable MUST contain a REQUEST_ID parameter which identifies uniquely the job request on the service.
• The VOTable MUST contain a REQUEST_STATUS parameter which can be Ok or Rejected. In the latter case all the other fields of the VOTable are not present.
• A single TABLE can contain different variables of the same species. Species can differ either in their geometrical representation (e.g. particles, regular meshes, AMR meshes…) or in their "physical meaning" (e.g. star particles vs. dark matter particles). All the FIELDs in a TABLE have the same number of elements, specified by the arraysize TABLE parameter. This parameter also sets the geometry of the quantity. E.g. arraysize="N" represents a point-like quantity; arraysize="NxMxS" represents a grid-based variable. The resulting data FIELDs are stored one after the other in a single binary file, in the same order as they appear in the VOTable.
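Since FIELDs are stored one after the other and all share the element count given by arraysize, the byte range of each variable follows from simple arithmetic. The sketch below computes it; the function names and the use of Python struct type codes ("f" for a 4-byte float) to stand for the datatype are assumptions of this example.

```python
import struct
from math import prod


def parse_arraysize(arraysize):
    """'100000' -> (100000,), '41x41x41' -> (41, 41, 41)."""
    return tuple(int(n) for n in arraysize.lower().split("x"))


def field_offsets(arraysize, fields):
    """Map each field name to (byte offset, byte length) in the binary file.

    `fields` is a list of (name, struct_type_code) in VOTable order.
    """
    count = prod(parse_arraysize(arraysize))
    offsets, position = {}, 0
    for name, code in fields:
        nbytes = count * struct.calcsize("<" + code)
        offsets[name] = (position, nbytes)
        position += nbytes
    return offsets


layout = field_offsets("41x41x41", [("vx", "f"), ("vy", "f"), ("vz", "f")])
print(layout["vy"])  # (275684, 275684)
```

Here 41 x 41 x 41 = 68921 elements of 4 bytes each give 275684 bytes per variable, so vy starts right after vx.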
SNAP results (cont.ed)
• Each TABLE MUST contain FIELDs where the UCDs have been set. FIELDS refer to the variables stored in the external binary file.
• Each FIELD must specify the datatype and the unit of the variable. Furthermore name and ID have to be set.
• The binary data file reference, acref, is specified in a DATA section
• Other parameters may be supported according to the services offered by the data provider.
SNAP VOTables examples 1
VOTable for the velocity field of a fluid on a fixed 3D mesh
<VOTABLE>
  <RESOURCE name="myVectorField" type="results">
    <DESCRIPTION>Velocity Field from N-Body run</DESCRIPTION>
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="VelocityField" ID="Vel" order="sequential" arraysize="41x41x41">
      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float" unit="km/s"/>
      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float" unit="km/s"/>
      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float" unit="km/s"/>
      <DATA>
        <BINARY>
          <STREAM acref="file:///scratch/myhome/test.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
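A client can pull the information it needs out of such a descriptor with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a cleaned copy of the example (an enclosing VOTABLE element without namespaces is assumed); the function name and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

VOTABLE = """<VOTABLE>
  <RESOURCE name="myVectorField" type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="VelocityField" ID="Vel" order="sequential" arraysize="41x41x41">
      <FIELD name="vx" ID="vx1" ucd="phys.veloc;pos.cartesian.x" datatype="float" unit="km/s"/>
      <FIELD name="vy" ID="vy1" ucd="phys.veloc;pos.cartesian.y" datatype="float" unit="km/s"/>
      <FIELD name="vz" ID="vz1" ucd="phys.veloc;pos.cartesian.z" datatype="float" unit="km/s"/>
      <DATA><BINARY><STREAM acref="file:///scratch/myhome/test.bin"/></BINARY></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def describe_tables(xml_text):
    """Collect layout, fields and binary file reference of every TABLE."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("TABLE"):
        stream = table.find("./DATA/BINARY/STREAM")
        tables.append({
            "name": table.get("name"),
            "order": table.get("order"),
            "arraysize": table.get("arraysize"),
            "fields": [(f.get("name"), f.get("datatype"), f.get("unit"))
                       for f in table.findall("FIELD")],
            "acref": stream.get("acref") if stream is not None else None,
        })
    return tables


info = describe_tables(VOTABLE)[0]
print(info["arraysize"], [name for name, _, _ in info["fields"]])
```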
SNAP VOTables examples 3
VOTable for the temperature field of a mesh-based quantity and the positions of N-Body particles extracted from the same spatial region.
<VOTABLE>
  <RESOURCE name="myMixedData" type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE name="Particles" ID="NBody" order="sequential" arraysize="100000">
      <FIELD name="x" ID="x1" ucd="pos.cartesian;pos.cartesian.x" datatype="float" unit="Mpc"/>
      <FIELD name="y" ID="y1" ucd="pos.cartesian;pos.cartesian.y" datatype="float" unit="Mpc"/>
      <FIELD name="z" ID="z1" ucd="pos.cartesian;pos.cartesian.z" datatype="float" unit="Mpc"/>
      <DATA>
        <BINARY>
          <STREAM href="file:///scratch/myhome/particles.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
    <TABLE name="Mesh" ID="MeshTemp" order="sequential" arraysize="41x41x41">
      <FIELD name="temperature" ID="temp" ucd="phys.temperature;pos.cartesian" datatype="float" unit="K"/>
      <DATA>
        <BINARY>
          <STREAM href="file:///scratch/myhome/mesh.bin"/>
        </BINARY>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
5. DATA STAGING AND RESULTS
Data Staging
By Data Staging we refer to the processing the server performs to retrieve or
generate the requested simulation volume or subvolume and cache
them in online storage for retrieval by a client.
Staging is necessary for large archives which must retrieve simulation
data from hierarchical storage, or for services which can
dynamically extract subvolumes, where it may take a substantial
time (e.g. minutes or hours) to retrieve the data in the relevant region of
the simulation box.
The snapshot staging service is optional for the simulation server. If staging
is not implemented, data should be immediately available for retrieval.
The availability of this function is communicated to the registry services.
The getSnap method is identical whether or not staging is used.
Data Delivery
As soon as staged data are available at the given URL, the user can start the
download procedure.
The user can be informed of the availability of the data following two
different approaches:
1. The client searches for the data on the service (e.g. reload a
web/ftp page).
2. The service searches for the client and, if present, sends
information to it.
Server messaging
In the second approach, the server must provide a messaging capability.
The client must have an identity to be recognized by the service.
The service broadcasts messages to identified clients whenever a staging
(processing) event occurs (e.g. data are available).
Service generated messages can also be used to pass informational or
diagnostic messages to clients as processing proceeds.
Client identification
Snap is not just a search-and-download service: it also requires running processes and, possibly, managing them.
Therefore authentication of the client should be required. This is strictly required for approach 2, in which the user must be detected and identified by the service.
However, authentication should always be required for security and privacy reasons: access to the services should be granted only to trusted users with proper privileges.
Authentication could be based on username and password or on more sophisticated methods, like certificates. This choice is up to the service provider.
Authentication allows the user to use the scheduling/batch system which is
implemented by the service provider.
Staging services
The provider should support at least two basic operations:
• Job monitoring
• Job cancellation
The specific implementation of the two operations depends on the adopted
service technology.
Both operations use the SERVICE and REQUEST_ID parameters written in the
VOTable. They are called using proper web methods:
• getSnapInfo(SERVICE, REQUEST_ID, SNAPINFO)
• cancelSnap(SERVICE, REQUEST_ID, SNAPINFO)
The getSnapInfo method returns a SNAPINFO string with the following
information: STATUS (Idle, Hold, Cancelled, Running, reJected),
SUBMISSION_DATE, other (up to the service provider, specified to the
registry). The cancelSnap method returns a SNAPINFO string that can
have the values “Ok” or “Rejected”.
Other services can be implemented and registered by the provider.
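The two job management operations can be illustrated with a small in-memory job table. Everything about the encoding here is an assumption: the slides do not fix the SNAPINFO string format, so this sketch uses "KEY=value" pairs separated by semicolons, one table per service, and hypothetical class and method names.

```python
from datetime import datetime, timezone

STATUSES = {"Idle", "Hold", "Cancelled", "Running", "Rejected"}


class SnapJobs:
    """Toy job table for one service, keyed by REQUEST_ID."""

    def __init__(self):
        self._jobs = {}

    def submit(self, request_id):
        self._jobs[request_id] = {
            "STATUS": "Idle",
            "SUBMISSION_DATE": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        }

    def set_status(self, request_id, status):
        if status not in STATUSES:
            raise ValueError("unknown status: " + status)
        self._jobs[request_id]["STATUS"] = status

    def get_snap_info(self, request_id):
        """Return the SNAPINFO string (assumed 'KEY=value;KEY=value' format)."""
        job = self._jobs.get(request_id)
        if job is None:
            return "STATUS=Rejected"
        return ";".join(f"{key}={value}" for key, value in job.items())

    def cancel_snap(self, request_id):
        """Return 'Ok' if the job was cancelled, 'Rejected' otherwise."""
        job = self._jobs.get(request_id)
        if job is None or job["STATUS"] in ("Cancelled", "Rejected"):
            return "Rejected"
        job["STATUS"] = "Cancelled"
        return "Ok"
```

A real service would back this table with its scheduling or batch system rather than a dictionary.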
Data Delivery
The getSnap(acref, SERVICE, STATUS) web method allows a client to
retrieve a single binary simulation file and the corresponding XML
descriptor file, given the reference returned by the setSnap method.
The files can be downloaded using http, ftp, grid ftp protocols (or any other
useful protocol). All the metadata about the content and the structure of
the data file is stored in the associated VOTable
XML header files (VOTables) are stored as well and they are downloaded
together with the binary file using the same getSnap method.
The getSnap method returns a STATUS string which can be Ok, Rejected or
Deferred (if data are not yet available).
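When staging is in use, a client following the first notification approach (the client asks the service) ends up polling getSnap until the status changes. A minimal sketch is given below; the getSnap call is passed in as a function so the example does not assume any transport, and the function name, retry limit and interval are hypothetical.

```python
import time


def wait_for_snap(get_snap, acref, service, interval=1.0, max_tries=10,
                  sleep=time.sleep):
    """Call get_snap(acref, service) until it stops answering 'Deferred'.

    Returns 'Ok' or 'Rejected', or 'Deferred' if max_tries is exhausted.
    """
    for _ in range(max_tries):
        status = get_snap(acref, service)
        if status in ("Ok", "Rejected"):
            return status
        if status != "Deferred":
            raise ValueError("unexpected STATUS: " + repr(status))
        sleep(interval)
    return "Deferred"


# Stand-in for a real service: deferred twice, then ready.
answers = iter(["Deferred", "Deferred", "Ok"])
result = wait_for_snap(lambda acref, service: next(answers),
                       "file:///scratch/myhome/test.bin", "example-service",
                       sleep=lambda seconds: None)
print(result)  # Ok
```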
6. FILE FORMATS AND STANDARDS
File formats
Data produced by simulation codes are stored in files with different and, usually, non-standard formats.
This makes it difficult to handle and exchange data.
E.g. Gadget has its own file format (although it also supports HDF5). This format has no access library support, it is not extensible, data access is not efficient, and it is strictly linked to the application.
File formats should be:
• Standard
• Flexible
• Extensible
• Portable
• Fast
• Easily usable by applications
• SELF DESCRIPTIVE
Possible solutions:
Raw Binaries
FITS
HDF5
VOTables
NetCDF
…
File formats: archive and results
Data file formats can be different according to their usage
Archive side files should be
• High performance (fast access)
• Standard (portable and persistent)
Result files should be
• Simple (specific I/O libraries are not required to access them)
• Self descriptive (e.g. XML metadata headers)
• Compressed (to minimize transfer effort)
In any case, data size is crucial. ASCII files are "deprecated". Base64 (or similar) encoding for HTTP transfers should be avoided: it is a waste of time (for conversions) and of space (increased size).
Result files
A simple solution is represented by raw binary files with the following characteristics.
• Several variables can be stored in one file
• Each variable represents a scalar quantity
• Components of multidimensional quantities are stored as separate variables
• Variables have the same number of elements but they can have different types
• Variables can be stored either as Tabular or as Sequential (see next slide)
• A descriptor file (XML) is associated with the binary file to make it self-descriptive
Advantages: (little) standardization, simplicity, no I/O specific libraries required, fast access
Drawbacks: limited portability (endianness problem, data types), little standardization, no compression
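The endianness drawback can be seen directly with Python's struct module: the same bytes decode to different numbers unless the byte order is known. In the sketch below the byte order is passed explicitly, as it would be if the XML descriptor recorded it; the function names and that descriptor convention are assumptions of this example.

```python
import struct


def pack_variable(values, code="f", byteorder="<"):
    """Encode a scalar variable as raw bytes ('<' little, '>' big endian)."""
    return struct.pack(f"{byteorder}{len(values)}{code}", *values)


def unpack_variable(raw, count, code="f", byteorder="<"):
    """Decode raw bytes using the byte order declared for the file."""
    return list(struct.unpack(f"{byteorder}{count}{code}", raw))


raw = pack_variable([1.0, 2.0, 0.5])         # written little-endian
good = unpack_variable(raw, 3, byteorder="<")
bad = unpack_variable(raw, 3, byteorder=">")  # wrong byte order assumed
print(good)         # [1.0, 2.0, 0.5]
print(bad == good)  # False
```

Recording the byte order and data types in the descriptor is one way to limit this portability problem without giving up raw binaries.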
Result files: Tabular vs Sequential
Tabular files are closer to observational data, and so more compatible with the standard VOTable idea.
If the file contains the 3 variables vx, vy, vz, their Tabular storage is:
vx(1), vy(1), vz(1)
vx(2), vy(2), vz(2)
…
vx(N), vy(N), vz(N)
This is suitable for variables (like the components of a vector) which are always accessed as N-tuples, or for data analysis tools which need (and load) all the stored variables.
However, it leads to poor performance if variables have to be loaded separately in memory. Loading one variable requires continuous jumps through the file.
Result files: Tabular vs Sequential (cont.ed)
Sequential files are a common choice for “simulators”
If the file contains the 2 variables rho and press, their Sequential storage is:
rho(1)
rho(2)
…
rho(N)
press(1)
press(2)
…
press(N)
Each variable can be read with a single I/O call. This leads to high performance access to the file. This is typically required when dealing with large files.
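The access patterns of the two layouts can be compared with a short sketch that writes the same variables both ways and reads one of them back. It assumes 4-byte little-endian floats and uses in-memory byte strings instead of files; all function names are hypothetical.

```python
import struct

ITEM = struct.calcsize("<f")  # 4 bytes per value


def write_tabular(columns):
    """Interleave the variables row by row: v1(1), v2(1), ..., v1(2), v2(2), ..."""
    return b"".join(struct.pack(f"<{len(columns)}f", *row) for row in zip(*columns))


def write_sequential(columns):
    """Store each variable as one contiguous block, one after the other."""
    return b"".join(struct.pack(f"<{len(col)}f", *col) for col in columns)


def read_sequential(raw, index, n):
    """One contiguous read returns the whole variable."""
    start = index * n * ITEM
    return list(struct.unpack(f"<{n}f", raw[start:start + n * ITEM]))


def read_tabular(raw, index, n, ncols):
    """One small strided read per element: n jumps through the data."""
    return [struct.unpack_from("<f", raw, (row * ncols + index) * ITEM)[0]
            for row in range(n)]


rho = [1.0, 2.0, 3.0]
press = [10.0, 20.0, 30.0]
assert read_sequential(write_sequential([rho, press]), 1, 3) == press
assert read_tabular(write_tabular([rho, press]), 1, 3, 2) == press
```

Both reads return the same values; the difference is that the sequential read touches one contiguous block, while the tabular read needs one access per element.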
Archive files
Archive files are not “visible” to the end user. Therefore the data provider can choose any suitable format.
The choice should be in general driven by several properties:
• The format should be standard and well supported, in order to ensure the preservation of the data, their portability between different computing platforms, software, compilers... (if the technology changes we don’t want to change the data)
• The files should be quickly and efficiently accessible, since the data are large and complex operations could be necessary to handle them (e.g. extract the particles which fall in a certain region)
Various formats, with such features, are available.
File formats: HDF5
HDF5 (http://hdf.ncsa.uiuc.edu) represents a possible solution to deal with such data
HDF5 is
• Portable between most modern platforms
• High performance
• Well supported
• Well documented
• Rich in tools
• Flexible and extensible
HDF5 data files are
• Platform independent (portable)
• Well organized
• Self defined
• Metadata enriched
• Efficiently accessible
HDF5 drawbacks
• Requires some expertise and skill to be used
• Information is difficult to access
• Can be subject to major library changes (see HDF4 to HDF5)
File formats: HDF5 hierarchical structure and self-consistency
The data file can have a (complex) hierarchical, filesystem like, structure with groups (directories) and datasets (files)
The base group is “/” (root). Files can have only the root group
/BmMassDensity Dataset {512, 512, 512}
/BmTemperature Dataset {512, 512, 512}
/BmVelocity Dataset {512, 512, 512, 3}
/DmMassDensity Dataset {512, 512, 512}
/DmPosition Dataset {134217728, 3}
/DmVelocity Dataset {134217728, 3}
Or, they can store different simulation output times in different groups
/Time1/Mesh/BmMassDensity Dataset {512, 512, 512}
/Time1/Mesh/BmTemperature Dataset {512, 512, 512}
/Time1/Particles/DmPosition Dataset {134217728, 3}
/Time2/Mesh/BmMassDensity Dataset {512, 512, 512}
/Time2/Mesh/BmTemperature Dataset {512, 512, 512}
/Time2/Particles/DmPosition Dataset {134217728, 3}
HDF5 metadata make the file completely self-consistent
Structural metadata (strictly required by the library)
• Rank
• Dimensionality
Annotation metadata (required by our implementation)
• Data object name
• Data object description
• Unit
• Formula