
Page 1: Profile of HDF-EOS5 Files

Profile of HDF-EOS5 Files

Abe Taaheri, Raytheon Information Systems
Larry Klein, RS Information Systems

HDF-EOS Workshop X, November 2006

Page 2: Profile of HDF-EOS5 Files


General HDF-EOS5 File Structure

• An HDF-EOS5 file is any valid HDF5 file that contains:
   − a family of global attributes called coremetadata.X
   − optional data objects:
      - a family of global attributes called archivemetadata.X
      - any number of Swath, Grid, Point, ZA, and Profile data structures
      - another family of global attributes: StructMetadata.X

• The global attributes provide information on the structure of the HDF-EOS5 file or on the data granule that the file contains.

• Other optional user-added global attributes such as “PGEVersion”, “OrbitNumber”, etc. are written as HDF5 attributes into a group called “FILE_ATTRIBUTES” (under “/HDFEOS/ADDITIONAL”); see the sketch below.
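As a rough illustration only: the sketch below shows, in plain HDF5 calls, where such a user-added attribute ends up. The file name and the “OrbitNumber” value are hypothetical, and in practice these attributes would normally be written through the HDF-EOS5 library rather than by hand; the group path matches the h5dump listing near the end of these slides.

    #include <hdf5.h>

    int main(void)
    {
        /* Open an existing HDF-EOS5 granule read-write (hypothetical file name). */
        hid_t file  = H5Fopen("MyGranule.he5", H5F_ACC_RDWR, H5P_DEFAULT);

        /* User-added global attributes live under this group in an HDF-EOS5 file. */
        hid_t group = H5Gopen2(file, "/HDFEOS/ADDITIONAL/FILE_ATTRIBUTES", H5P_DEFAULT);

        /* Attach "OrbitNumber" as a scalar integer attribute of that group. */
        int   orbit = 12345;
        hid_t space = H5Screate(H5S_SCALAR);
        hid_t attr  = H5Acreate2(group, "OrbitNumber", H5T_NATIVE_INT, space,
                                 H5P_DEFAULT, H5P_DEFAULT);
        H5Awrite(attr, H5T_NATIVE_INT, &orbit);

        H5Aclose(attr);
        H5Sclose(space);
        H5Gclose(group);
        H5Fclose(file);
        return 0;
    }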

Page 3: Profile of HDF-EOS5 Files


General HDF-EOS5 File Structure

• coremetadata.X
  Used to populate searchable database tables within the ECS archives. Data users use this information to locate particular HDF-EOS5 data granules.

• archivemetadata.X
  Represents information that, by definition, will not be searchable. Contains whatever information the file creator considers useful to be in the file, but which will not be directly accessible by ECS databases.

• StructMetadata.X
  Describes the contents and structure of an HDF-EOS5 file, e.g. dimensions, compression methods, geolocation, projection information, etc. that are associated with the data itself.

Page 4: Profile of HDF-EOS5 Files


General HDF-EOS5 File Structure

• An HDF-EOS5 file
   − can contain any number of Grid, Point, Swath, Zonal Average, and Profile data structures
   − has no size limits. A file containing 1000's of objects could cause program execution slow-downs
   − can be hybrid, containing plain HDF5 objects for special purposes (see the sketch below)
      - such HDF5 objects must be accessed by the HDF5 library and not by the HDF-EOS5 extensions
      - this will require more knowledge of file contents on the part of an applications developer or data user
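A minimal sketch of the hybrid case, assuming a file "MyHybrid.he5" that carries a plain HDF5 dataset at the hypothetical path "/Ancillary/CalibrationTable": such an object is reached with ordinary HDF5 calls, not with the HE5_* extensions.

    #include <hdf5.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Hypothetical hybrid HDF-EOS5 file and plain-HDF5 dataset path. */
        hid_t file = H5Fopen("MyHybrid.he5", H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dset = H5Dopen2(file, "/Ancillary/CalibrationTable", H5P_DEFAULT);

        /* Size the read buffer from the dataset itself. */
        hid_t    space = H5Dget_space(dset);
        hssize_t npts  = H5Sget_simple_extent_npoints(space);
        float   *buf   = malloc((size_t)npts * sizeof *buf);

        /* Read the whole dataset as native floats. */
        H5Dread(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        free(buf);
        H5Sclose(space);
        H5Dclose(dset);
        H5Fclose(file);
        return 0;
    }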

Page 5: Profile of HDF-EOS5 Files


Swath Structure

• For a typical satellite swath, an instrument takes a series of scans perpendicular to the ground track of the satellite as it moves along that ground track

[Figure: swath scan geometry, showing the instrument, the along-track direction, and vertical profiles.]

• Or a sensor measures a vertical profile, instead of scanning across the ground track

Page 6: Profile of HDF-EOS5 Files


Swath Structure

[Figure: layout of a Swath file. The “SWATHS” HDF5 group contains one group per swath (“Swath_1” … “Swath_N”). Each swath group holds “Data Fields” (DataField.1 … DataField.n), “Geolocation Fields” (e.g. Longitude, Latitude, Time, Colatitude), and “Profile Fields” (ProfileField.1 … ProfileField.n) HDF5 groups, whose members are HDF5 datasets. Attributes appear at three levels: Object Attributes (<SwathName>: <AttrName>), Group Attributes (<DataFields>: <AttrName>), and Local Attributes (<FieldName>: <AttrName>).]

• Swath_X groups are created when swaths are created.

• The Data/Geolocation fields' parent groups are created when fields are defined.

• Swath attributes are set as Object Attributes.

• Attributes for the Data, Profile, or Geolocation Fields groups are set as Group Attributes.

• Dataset-related attributes set for each data field or geolocation field are called Local Attributes. They may contain attributes such as fill value, units, etc.

• Each Data Field object can have Attributes and/or Dimension Scales.

Page 7: Profile of HDF-EOS5 Files


Swath Structure

Field Name    Data Type            Format
Longitude     float32 or float64   DD*, range [-180.0, 180.0]
Latitude      float32 or float64   DD*, range [-90.0, 90.0]
Colatitude    float32 or float64   DD*, range [0.0, 180.0]
Time          float64              TAI93 [seconds until(-) / since(+) midnight, 1/1/93]

• Geolocation Fields
   − Geolocation fields allow the Swath to be accurately tied to particular points on the Earth’s surface.
   − At least a time field (“Time”) or a latitude/longitude field pair (“Latitude” and “Longitude”) is required. “Colatitude” may be substituted for “Latitude.”
   − Fields must be either one- or two-dimensional (see the sketch after this list).
   − The “Time” field is always in TAI format (International Atomic Time).

* DD = Decimal Degree
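A minimal sketch of defining such geolocation fields with the HDF-EOS5 C API. The file, swath, and dimension names ("Swath.he5", "Swath_1", "GeoTrack", "GeoXtrack") are hypothetical, and the exact prototypes should be checked against the HDF-EOS5 Users Guide.

    #include <HE5_HdfEosDef.h>   /* HDF-EOS5 C interface */

    int main(void)
    {
        hid_t fid  = HE5_SWopen("Swath.he5", H5F_ACC_TRUNC);
        hid_t swid = HE5_SWcreate(fid, "Swath_1");

        /* Dimensions shared by the geolocation (and later data) fields. */
        HE5_SWdefdim(swid, "GeoTrack",  20);
        HE5_SWdefdim(swid, "GeoXtrack", 10);

        /* Latitude/Longitude pair (float32 here), one- or two-dimensional. */
        HE5_SWdefgeofield(swid, "Longitude", "GeoTrack,GeoXtrack", NULL,
                          H5T_NATIVE_FLOAT, 0 /* no field merging */);
        HE5_SWdefgeofield(swid, "Latitude",  "GeoTrack,GeoXtrack", NULL,
                          H5T_NATIVE_FLOAT, 0);

        /* "Time" is always float64, in TAI93 seconds. */
        HE5_SWdefgeofield(swid, "Time", "GeoTrack", NULL,
                          H5T_NATIVE_DOUBLE, 0);

        HE5_SWdetach(swid);
        HE5_SWclose(fid);
        return 0;
    }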

Page 8: Profile of HDF-EOS5 Files


Swath Structure

• Data Fields
   − Fields may have up to 8 dimensions.
   − An “unlimited” dimension must be the first dimension (in C-order).
   − For all multi-dimensional fields in scan- or profile-oriented Swaths, the dimension representing the “along track” dimension must precede the dimension representing the scan or profile dimension(s) (in C-order).
   − Compression is selectable at the field level within a Swath. All HDF5-supported compression methods are available through the HDF-EOS5 library. The compression method is stored within the file, and subsequent use of the library will un-compress the file. As in HDF5, the data needs to be chunked before the compression is applied (see the sketch below).
   − Field names:
      * may be up to 64 characters in length
      * may contain any character except ",", ";", quotation mark (") and "/"
      * are case sensitive
      * must be unique within a particular Swath structure
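Continuing in the same spirit, a sketch of defining a chunked, compressed data field. The dimension names, chunk sizes, and field name are hypothetical; the compression code is the one listed as value 11 in the table on the next slides; and the HE5_SWdefchunk / HE5_SWdefcomp calls and the exact constant spelling are assumptions to be verified against the HDF-EOS5 Users Guide and headers.

    #include <HE5_HdfEosDef.h>

    int main(void)
    {
        hid_t fid  = HE5_SWopen("Swath.he5", H5F_ACC_RDWR);
        hid_t swid = HE5_SWattach(fid, "Swath_1");

        HE5_SWdefdim(swid, "DataTrack",  20);
        HE5_SWdefdim(swid, "DataXtrack", 10);

        /* Chunking must be defined before compression is applied. */
        hsize_t chunk[2] = { 10, 10 };
        HE5_SWdefchunk(swid, 2, chunk);

        /* Shuffling + deflate (code 11 in the table); parameter = deflate level. */
        int level[1] = { 6 };
        HE5_SWdefcomp(swid, HDFE_COMP_SHUF_DEFLATE, level);

        /* The along-track dimension comes first (C-order). */
        HE5_SWdefdatafield(swid, "Temperature", "DataTrack,DataXtrack", NULL,
                           H5T_NATIVE_FLOAT, 0);

        HE5_SWdetach(swid);
        HE5_SWclose(fid);
        return 0;
    }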

Page 9: Profile of HDF-EOS5 Files


Compression Codes

Compression Code           Value   Explanation
HDFE_COMP_NONE               0     No Compression
HDFE_COMP_RLE                1     Run Length Encoding Compression (not supported)
HDFE_COMP_NBIT               2     NBIT Compression
HDFE_COMP_SKPHUFF            3     Skipping Huffman (not supported)
HDFE_COMP_DEFLATE            4     gzip Compression
HDFE_COMP_SZIP_CHIP          5     szip Compression, Compression exactly as in hardware
HDFE_COMP_SZIP_K13           6     szip Compression, allowing k split = 13 Compression
HDFE_COMP_SZIP_EC            7     szip Compression, entropy coding method
HDFE_COMP_SZIP_NN            8     szip Compression, nearest neighbor coding method
HDFE_COMP_SZIP_K13orEC       9     szip Compression, allowing k split = 13 Compression, or entropy coding method

For Compression the data storage must be CHUNKED first

Page 10: Profile of HDF-EOS5 Files


Compression Codes

Compression Code              Value   Explanation
HDFE_COMP_SZIP_K13orNN         10     szip Compression, allowing k split = 13 Compression, or nearest neighbor coding method
HDFE_COMP_SHUF_DEFLATE         11     shuffling + deflate (gzip) Compression
HDFE_COMP_SHUF_SZIP_CHIP       12     shuffling + Compression exactly as in hardware
HDFE_COMP_SHUF_SZIP_K13        13     shuffling + allowing k split = 13 Compression
HDFE_COMP_SHUF_SZIP_EC         14     shuffling + entropy coding method
HDFE_COMP_SHUF_SZIP_NN         15     shuffling + nearest neighbor coding method
HDFE_COMP_SHUF_SZIP_K13orEC    16     shuffling + allowing k split = 13 Compression, or entropy coding method
HDFE_COMP_SHUF_SZIP_K13orNN    17     shuffling + allowing k split = 13 Compression, or nearest neighbor coding method

For Compression the data storage must be CHUNKED first

Page 11: Profile of HDF-EOS5 Files


Swath Structure

[Figure: a “Normal” dimension map (mapping offset 1, increment 2) and a “Backwards” dimension map (mapping offset -1, increment -2), each relating a geolocation dimension of size 10 (indices 0–9) to a data dimension of size 20 (indices 0–19).]

• Dimension maps are the glue that holds the SWATH together. They define the relationship between data fields and geolocation fields by defining, one-by-one, the relationship of each dimension of each geolocation field with the corresponding dimension in each data field.
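A sketch of defining the “normal” map from the figure with the HDF-EOS5 C API; the file, swath, and dimension names are hypothetical, and the prototype should be checked against the HDF-EOS5 Users Guide.

    #include <HE5_HdfEosDef.h>

    int main(void)
    {
        hid_t fid  = HE5_SWopen("Swath.he5", H5F_ACC_RDWR);
        hid_t swid = HE5_SWattach(fid, "Swath_1");

        /* Geolocation index i corresponds to data index 1 + 2*i
           (offset 1, increment 2), as in the "normal" map above. */
        HE5_SWdefdimmap(swid, "GeoTrack", "DataTrack", 1, 2);

        /* A "backwards" map would use negative values, e.g. offset -1, increment -2. */

        HE5_SWdetach(swid);
        HE5_SWclose(fid);
        return 0;
    }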

Page 12: Profile of HDF-EOS5 Files


Grid Structure

• A grid contains grid corner locations and a set of projection equations (or references to them) along with their relevant parameters.

• The equations and parameters can be used to compute the latitude and longitude for any point in the grid.

• Important features of a Grid data set: the data fields, the dimensions, and the projection

A Data Field in a Mercator-Projected Grid

A Data Field in an Interrupted Goode’s Homolosine-Projected Grid

Page 13: Profile of HDF-EOS5 Files


Grid Structure

Data Field characteristics:

− Fields may have up to 8 dimensions.
− An “unlimited” dimension must be the first dimension.
− Dimension order in field definitions:
   * C: “Band, YDim, XDim”
   * Fortran: “XDim, YDim, Band”
− Compression is selectable at the field level within a Grid. Subsequent use of the library will un-compress the file. Data needs to be tiled before the compression is applied.
− Field names must be unique within a particular Grid structure and are case sensitive. They may be up to 64 characters in length.
− Any character can be used with the exception of ",", ";", quotation mark (") and "/".

Page 14: Profile of HDF-EOS5 Files


Grid Structure

• Fields are two- to eight-dimensional; many fields will need no more than three: the predefined dimensions “XDim” and “YDim” and a third dimension for depth, height, or band.

• Dimensions: two predefined dimensions for Data Fields, “XDim” and “YDim”, which are:
   - defined when the grid is created
   - stored in the structure metadata
   - used to relate data fields to each other and to the geolocation information

Page 15: Profile of HDF-EOS5 Files


Grid Structure

• Projection:
   − Is the heart of the Grid structure.
   − Provides a convenient way to encode geolocation information as a set of mathematical equations, capable of transforming Earth coordinates (lat/lon) to X-Y coordinates on a sheet of paper.
   − The General Coordinate Transformation Package (GCTP) library contains all projection-related conversions and calculations.
   − Supported projections (see the sketch below for attaching one to a Grid): Geographic, Mercator, Transverse Mercator, Universal Transverse Mercator, Cylindrical Equal Area, Hotine Oblique Mercator, Space Oblique Mercator, Sinusoidal, Integerized Sinusoidal, Interrupted Goode’s Homolosine, Polar Stereographic, Lambert Azimuthal Equal Area, Polyconic, Albers Conical Equal Area, Lambert Conformal Conic
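A sketch of creating a projected Grid with the HDF-EOS5 C API. The grid name, sizes, corner points, and projection parameters below simply mirror the “TMGrid” example in the h5dump listing at the end of these slides; the exact prototypes should be checked against the HDF-EOS5 Users Guide.

    #include <HE5_HdfEosDef.h>

    int main(void)
    {
        /* Corner points (meters) and GCTP parameters from the TMGrid example. */
        double upleft[2]    = {  4855670.775390,   9458558.924830 };
        double lowright[2]  = {  5201746.439830, -10466077.249420 };
        double projparm[16] = { 0, 0, 0.999600, 0, -75000000, 0, 5000000, 0 };

        hid_t fid  = HE5_GDopen("Grid.he5", H5F_ACC_TRUNC);
        hid_t gdid = HE5_GDcreate(fid, "TMGrid", 5, 7, upleft, lowright);

        /* Transverse Mercator; the zone code is unused here, sphere code 0. */
        HE5_GDdefproj(gdid, HE5_GCTP_TM, 0, 0, projparm);

        /* A field on the predefined XDim/YDim dimensions. */
        HE5_GDdeffield(gdid, "Voltage", "XDim,YDim", NULL,
                       H5T_NATIVE_FLOAT, 0 /* no field merging */);

        HE5_GDdetach(gdid);
        HE5_GDclose(fid);
        return 0;
    }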

Page 16: Profile of HDF-EOS5 Files


Point Structure

• Made up of a series of data records taken at [possibly] irregular time intervals and at scattered geographic locations

• Loosely organized form of geolocated data supported by HDF-EOS

• Levels are linked by a common field name called the LinkField

• Usually shared information is stored in the Parent level, while data values are stored in the Child level

• The values for the LinkField in the Parent level must be unique

Example Point data:

Station       Lat     Lon
Chicago       41.49   -87.37
Los Angeles   34.03   -118.14
Washington    38.50   -77.00
Miami         25.45   -80.11

Time   Temp(C)
0800   -3
0900   -2
1000   -1
0800   20
0900   21
1000   22
1100   24
1000   6
1100   8
1200   9
1300   11
1400   12
0600   15
0700   16

Lat     Lon       Temp(C)  Dewpt(C)
61.12   -149.48   15.00    5.00
45.31   -122.41   17.00    5.00
38.50   -77.00    24.00    7.00
38.39   -90.15    27.00    11.00
30.00   -90.05    22.00    7.00
37.45   -122.26   25.00    10.00
18.00   -76.45    27.00    4.00
43.40   -79.23    30.00    14.00
34.03   -118.14   25.00    4.00
32.45   -96.48    32.00    8.00
33.30   -112.00   30.00    10.00
42.15   -71.07    28.00    7.00
35.05   -106.40   30.00    9.00
34.12   -77.56    28.00    9.00
46.32   -87.25    30.00    8.00
47.36   -122.20   32.00    15.00
39.44   -104.59   31.00    16.00
21.25   -78.00    28.00    7.00
44.58   -93.15    32.00    13.00
41.49   -87.37    28.00    9.00
25.45   -80.11    19.00    3.00

Page 17: Profile of HDF-EOS5 Files


Point Structure

[Figure: layout of a Point file. The “POINTS” HDF5 group contains one group per point structure (“Point_1” … “Point_n”); each contains “Data” and “Linkage” groups, with one entry per level (Level 1 … Level n) plus FWDPOINTER and BCKPOINTER linkage datasets. Object, Group, and Local attributes (<…>: <AttrName>) attach as in the Swath layout.]

• Point structure groups are created when the user creates “Point_1”, …

• The Data and Linkage groups are created automatically when a level is defined

• The order in which the levels are defined determines the (0-based) level index

• The FWDPOINTER linkage will not be set (actually the first one is set to (-1,-1)) if the records in the Child level are not monotonic in the LinkField

• A level can contain any number of fields and records


Page 18: Profile of HDF-EOS5 Files


ZA Structure

• The “Zonal Average” structure is basically a swath-like structure without geolocation.

• The interface is designed to support data that is not associated with specific geolocation information.

[Figure: layout of a ZA file. The “ZAS” HDF5 group contains one group per zonal average structure (“Za_1” … “Za_n”); each contains a “Data Fields” HDF5 group holding the data field datasets (DataField.n). Object Attributes (<ZaName>: <AttrName>), Group Attributes (<DataFields>: <AttrName>), and Local Attributes (<FieldName>: <AttrName>) attach as in the Swath layout.]

Page 19: Profile of HDF-EOS5 Files


“h5dump” output of a simple HDF-EOS5 file

HDF5 "Grid.he5" {GROUP "/" { GROUP "HDFEOS" { GROUP "ADDITIONAL" { GROUP "FILE_ATTRIBUTES" { } } GROUP "GRIDS" { GROUP "TMGrid" { GROUP "Data Fields" { DATASET "Voltage" { DATATYPE H5T_IEEE_F32BE DATASPACE SIMPLE { ( 5, 7 ) / ( 5, 7 ) } DATA { (0,0): -1.11111,-1.11111,-1.11111,-1.11111,-1.11111, (0,5): -1.11111,-1.11111, ……………………………….. (4,0): -1.11111,-1.11111,-1.11111,-1.11111,-1.11111, (4,5): -1.11111,-1.11111 }

Page 20: Profile of HDF-EOS5 Files


“h5dump” output of a simple HDF-EOS5 file (cont.)

ATTRIBUTE "_FillValue" { DATATYPE H5T_IEEE_F32BE DATASPACE SIMPLE { ( 1 ) / ( 1 ) } DATA { (0): -1.11111 } } } } } } } GROUP "HDFEOS INFORMATION" { ATTRIBUTE "HDFEOSVersion" { DATATYPE H5T_STRING { STRSIZE 32; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; }

Page 21: Profile of HDF-EOS5 Files


“h5dump” output of a simple HDF-EOS5 file (cont.)

         DATASPACE  SCALAR
         DATA {
            (0): "HDFEOS_5.1.10"
         }
      }
      DATASET "StructMetadata.0" {
         DATATYPE  H5T_STRING {
            STRSIZE 32000;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
            (0): "GROUP=SwathStructure
                  END_GROUP=SwathStructure
                  GROUP=GridStructure
                  GROUP=GRID_1
                     GridName="TMGrid"
                     XDim=5
                     YDim=7

Page 22: Profile of HDF-EOS5 Files


“h5dump” output of a simple HDF-EOS5 file (cont.)

                     UpperLeftPointMtrs=(4855670.775390,9458558.924830)
                     LowerRightMtrs=(5201746.439830,-10466077.249420)
                     Projection=HE5_GCTP_TM
                     ProjParams=(0,0,0.999600,0,-75000000,0,5000000,0,0,0,0,0,0)
                     SphereCode=0
                     GROUP=Dimension
                        OBJECT=Dimension_1
                           DimensionName="Time"
                           Size=10
                        END_OBJECT=Dimension_1
                        OBJECT=Dimension_2
                           DimensionName="Unlim"
                           Size=-1
                        END_OBJECT=Dimension_2
                     END_GROUP=Dimension

Page 23: Profile of HDF-EOS5 Files


“h5dump” output of a simple HDF-EOS5 file (cont.)

                     GROUP=DataField
                        OBJECT=DataField_1
                           DataFieldName="Voltage"
                           DataType=H5T_NATIVE_FLOAT
                           DimList=("XDim","YDim")
                           MaxdimList=("XDim","YDim")
                        END_OBJECT=DataField_1
                     END_GROUP=DataField
                     GROUP=MergedFields
                     END_GROUP=MergedFields
                  END_GROUP=GRID_1
                  END_GROUP=GridStructure
                  GROUP=PointStructure
                  END_GROUP=PointStructure
                  GROUP=ZaStructure
                  END_GROUP=ZaStructure
                  END
                  "
         }
      }
   }
}
}

Page 24: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

1. "For all multi-dimension fields in scan- or profile-oriented Swaths, the dimension representing the "along track" dimension must precede the dimension representing the scan or profile dimension(s)." This is incorrect in Fortran.

- This can be clarified

2. A reserved field name is called "Latitude", but there is no software check for this. Users can inadvertently create fields called "latitude" or "LATITUDE" as there is no feedback from the library that this is incorrect.

- This can be implemented easily on reserved field names

3. Also HDF allows parallel I/O while HDF-EOS does not. This may become even more problematic as HDF-EOS is entering maintenance phase and active development on it may be curtailed.

- This cannot be done at this time (Labor intensive).

4. HDF-EOS adds another layer of complexity to an already complex system. When a bug occurs in the HDF-EOS or HDF libraries, it is not always apparent which library is the culprit.

- What can be done?

Page 25: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

5. HDF-EOS is a complicated library which requires support. With the release of HDF-EOS5 in Toolkit 5.2.14, the Swath, Grid and EH support libraries total over 60,000 lines of code.

6. We must be confident that the original HDF5 and HDF-EOS5 developers, or qualified successors, continue to support these formats.

- If we also consider the TOOLKIT, HDF(EOS)View, etc., the total would be more than 2,000,000 lines of code that needs support.

7. If a file contains 2 or more grids, and the grids each contain identically named fields, then everything is fine when data is accessed via the HDF-EOS interface, but users of tools which in turn use the basic HDF4 interface are not able to distinguish between them. These tools reportedly include GrADS, HDFLook, and Giovanni.

- One should access HDF-EOS objects only using the HDF-EOS interfaces.

8. I did not see anything on the Point and Zonal Average APIs in the specification. I don't see anything on datatypes, but that is probably covered in the HDF5 specification. I'm assuming a reference guide would have more detail.

- Zonal Average is like a Swath without geolocation fields. As far as we know, Point has not been used in any product.

Page 26: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

9. I have never tried to develop an implementation of HDF-EOS5. However, the standard doesn't provide sufficient information to either reproduce or parse a correct coremetadata string. The standard stipulates the syntax of the structural metadata string description language (ODL), but contains no discussion of the lexicon or semantics to be used when creating these strings.

- This is in the SDP Toolkit and too much to bring up in the HDF-EOS standard.

10. The description of the standard should be completed so that all components of the metadata are well defined.

- Metadata is in SDP TOOLKIT with ample examples. It was left out to concentrate on Swath and Grid structures in the Draft Community Standard (ESE-RFC-008)

11. Additionally the Draft Community Standard (ESE-RFC-008) states in section 7.2.3 that the SDP toolkit contains the tools for parsing the coremetadata strings. Thus, the HDF-EOS(5) format is based not only upon HDF(5) but also upon the SDP toolkit library. In this way the claim that HDF-EOS(5) is based upon HDF(5) is incomplete - it is also based on the SDP toolkit, or a library of similar functionality.

- HDF-EOS Objects depend only on HDF5 library. Only ECS metadata is handled using Toolkit.

Page 27: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

12. Handling HDF-EOS in a high-level language without HDF-EOS support would be difficult. This is because the standard relies on an obscure, obsolete syntax (Jet Propulsion Laboratory's ODL syntax) for its data serialization. Parsers for ODL are rare and not widely supported in different programming languages. The metadata stored with a dataset is effectively dumped into a metaphorical "black hole".

- XML implementation was abandoned because of high cost.

13. The API documentation describes input variables to functions as being either IN or OUT (or perhaps both). In the case of variables which are passed as IN-type parameters, it has never been clear whether the functions will modify the referent or not. Are IN-type parameter referents constants?

- IN-type parameters are not modified by the functions (if a function does modify one, that is explained in the Users Guide).

14. The primary failing of HDF-EOS as a useful product was made clear by a participant in the 2004 HDF/HDF-EOS meeting in Aurora, CO. The participant pointed out that HDF-EOS was not a standard but an implementation, effectively "locked" to the particular programming languages which have been supported by the HDF-EOS(5) developers. This inhibits use of HDF-EOS(5) format data in programming languages other than C or Fortran.

- We can support any language HDF can, but it would be labor intensive to do so.

Page 28: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

15. For all practical purposes it is impossible to read or write HDF-EOS5 datasets completely without using the existing libraries. Much of the high-level software used for scientific data analysis doesn't support the HDF-EOS5 libraries completely or in a timely fashion. This lack of support (in contrast to HDF) makes it a less than desirable form in which to store data.

- What can be done?

16. The SDP Toolkit, which contains the recommended ODL parser, is not supported under Mac OS X or Windows.

- The SDP Toolkit parser is supported under Mac OS X. A shorter version of the SDP Toolkit, called the MTD Toolkit, provides the metadata handling tools under Windows.

17. ODL should be abandoned as a data serialization syntax. Indeed, the need for data serialization should be reconsidered altogether as the underlying HDF(5) data structure contains the necessary components to maintain structured data.

- Too much work to redesign HDF-EOS5.

18. A set of Quality Assurance (QA) tools should be developed which analyze a target dataset to verify that it is a lexically and syntactically correct HDF-EOS formatted dataset. The QA tools should be both distributed as open source and made available as a Web based service.

- Good idea, but very labor intensive.

Page 29: Profile of HDF-EOS5 Files


RFC Comments Draft Community Standard (ESE-RFC-008)

19. Example codes should be provided in different programming languages which produce and read the most commonly used HDF-EOS data structures.

- Requires more work than can be done in the maintenance phase