Update on HDF5 1.8

Post on 11-Jun-2015


DESCRIPTION

This presentation targets HDF5 application developers and anyone who is interested in the new HDF5 Library features. The following new features available in 1.8.0 will be discussed:

HDF5 cache
The metadata working set size is highly variable, depending on file structure and access pattern. If the cache is too small, performance will deteriorate. In 1.8 we introduce code to configure the metadata cache size automatically and API calls to allow manual configuration of the metadata cache.

Text - data type conversion (10 minutes)
The new high-level API function, H5LTtext_to_dtype, provides the ability to create a data type through the text description of the data type. The function H5LTdtype_to_text facilitates debugging by printing the text description of a data type. The currently supported text description is in DDL format.

External Links
This feature allows links in a group to refer to objects in another file, and for the library to access those objects as if they are in the current file. We will present the API functions and how external links are supported.

Group revisions
We will introduce new features of the HDF5 Group object that include compact group storage, new large group storage, intermediate group creation and support of Unicode for the HDF5 object's names and datatypes. We will also cover new APIs for copying HDF5 objects between HDF5 files.

Compact Groups – This feature allows groups containing only a few links to take up much less space in the file.
New Large Group Storage – The method of storing groups with many links has been updated to be faster and more scalable.
Intermediate Group Creation – This feature allows intermediate groups that don't exist yet to be created when creating an object in a file.
Support for Unicode Character Set – The UTF-8 Unicode encoding is now supported for strings in datasets, the names of links and the names of attributes.

TRANSCRIPT

Update on HDF5 1.8

The HDF Group

HDF and HDF-EOS Workshop X

November 28, 2006

Why HDF5 1.8?

Nov. 28, 2006

HDF and HDF-EOS Workshop X, Landover MD


… as we know, there are known knowns; there are things we know we know.

We also know there are known unknowns; that is to say we know there are some things we do not know.

But there are also unknown unknowns -- the ones we don't know we don't know.

Donald Rumsfeld

Some things we knew we knew

• Need high level APIs – image, etc.
• Need more datatypes – packed n-bit, etc.
• Need external and other links
• Tools needed – h5pack, etc.
• Caching embellishments
• Eventually, multithreading

Things we knew we did not know

• New requirements from EOS and ASCI

• New applications that would use HDF5

• How HDF5 would really perform in parallel

• What new tools, features and options needed

• New APIs, API features

Things we didn’t know we didn’t know

• Completely unanticipated applications
• New data types and structures
  • E.g. DNA sequences
• New operations
  • E.g. write many real-time streams simultaneously

HDF5 1.8 topics

• Dataset and datatype improvements
• Group improvements
• Link revisions
• Shared object header messages
• Metadata cache improvements
• Other improvements
• Platform-specific changes
• High level APIs
• Parallel HDF5
• Tool improvements

Dataset and Datatype Improvements

Text-based data type descriptions

• Why:
  • Simplify datatype creation
  • Make datatype creation code more readable
  • Facilitate debugging by printing the text description of a data type
• What:
  • New routine to create a data type through the text description of the data type: H5LTtext_to_dtype
  • Companion routine to print the text description of a data type: H5LTdtype_to_text

Text data type description – Example

• Create a datatype of compound type.

/* Create the data type with text description */
dtype = H5Ttext_to_type("typedef struct foo {int a; float b;} foo_t;");

/* Convert the data type back to text */
H5Ttype_to_text(dtype, NULL, H5T_C, &tsize);

Serialized datatypes and dataspaces

• Why:
  • Allow datatype and dataspace info to be transmitted between processes
  • Allow datatype/dataspace to be stored in non-HDF5 files
• What:
  • A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.

Int to float convert during I/O

• Why: Convert ints to floats during I/O

• What: Int to float conversion supported during I/O

Revised conversion exception handling

• Why: Give apps greater control over exceptions (range errors, etc.) during datatype conversion.

• What: Revised conversion exception handling

Revised conversion exception handling

• To handle exceptions during conversions, register a handling function through H5Pset_type_conv_cb().
• Cases of exception:
  • H5T_CONV_EXCEPT_RANGE_HI
  • H5T_CONV_EXCEPT_RANGE_LOW
  • H5T_CONV_EXCEPT_TRUNCATE
  • H5T_CONV_EXCEPT_PRECISION
  • H5T_CONV_EXCEPT_PINF
  • H5T_CONV_EXCEPT_NINF
  • H5T_CONV_EXCEPT_NAN
• Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED
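The shape of such a handler can be sketched without the library. The block below is only an illustration: the demo_ enums and the function clamp_on_overflow are hypothetical stand-ins for the HDF5 names on the slide, and the real callback registered with H5Pset_type_conv_cb() takes more arguments. It assumes a handler that clamps out-of-range values and leaves every other case to the library.

```c
#include <limits.h>

/* Hypothetical stand-ins for the exception cases and return values
 * named on the slide, so the sketch compiles without hdf5.h. */
typedef enum { DEMO_RANGE_HI, DEMO_RANGE_LOW, DEMO_TRUNCATE } demo_except_t;
typedef enum { DEMO_ABORT = -1, DEMO_UNHANDLED = 0, DEMO_HANDLED = 1 } demo_ret_t;

/* Clamp a value that does not fit in an int; report everything else
 * as unhandled so the caller applies its default behaviour. */
demo_ret_t clamp_on_overflow(demo_except_t what, int *dst)
{
    switch (what) {
    case DEMO_RANGE_HI:
        *dst = INT_MAX;          /* source was above the destination range */
        return DEMO_HANDLED;
    case DEMO_RANGE_LOW:
        *dst = INT_MIN;          /* source was below the destination range */
        return DEMO_HANDLED;
    default:
        return DEMO_UNHANDLED;   /* destination left untouched */
    }
}
```

A handler that returns the "handled" value is expected to have written the destination element itself, which is why the sketch fills *dst before returning.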

Compression filter for n-bit data

• Why: Compact storage for user-defined datatypes
• What:
  • When data is stored on disk, padding bits are chopped off and only significant bits are stored
  • Supports most datatypes
  • Works with compound datatypes

N-bit compression example

• In memory, one value of N-Bit datatype is stored like this:

| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|

S-sign bit P-significant bit ?-padding bit

• After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this:

| 1st value | 2nd value |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...

• Opposite (decompress) when going from disk to memory
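The bit layout above can be reproduced with plain shifts and masks. This is a minimal sketch, not the filter's implementation: it assumes the layout shown on the slide (16 significant bits starting at bit 4 of a 32-bit word), and the names nbit_pack, nbit_unpack, NBIT_OFFSET and NBIT_PRECISION are made up for the illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Layout assumed from the slide: 16 significant bits starting at bit 4
 * of a 32-bit word; every other bit is padding. */
#define NBIT_OFFSET    4u
#define NBIT_PRECISION 16u

/* Drop the padding bits of each 32-bit value, keeping the 16 significant bits. */
void nbit_pack(const uint32_t *in, uint16_t *out, size_t n)
{
    const uint32_t mask = (1u << NBIT_PRECISION) - 1u;
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)((in[i] >> NBIT_OFFSET) & mask);
}

/* Put the significant bits back at their in-memory position; padding becomes 0. */
void nbit_unpack(const uint16_t *in, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint32_t)in[i] << NBIT_OFFSET;
}
```

With this layout each value shrinks from 4 bytes in memory to 2 bytes on disk, and whatever was in the padding bits is not preserved.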

Offset+size storage filter

• Why: Use less storage when less precision needed
• What:
  • Performs scale/offset operation on each value
  • Truncates result to fewer bits before storing
  • Currently supports integers and floats
• Example:

H5Pset_scaleoffset(dcr, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
H5Dcreate(……, dcr);
H5Dwrite(…);

Example with floating-point type

• Data: {104.561, 99.459, 100.545, 105.644}
• Choose scaling factor: decimal precision to keep
  E.g. scale factor D = 2

1. Find minimum value (offset): 99.459
2. Subtract minimum value from each element
   Result: {5.102, 0, 1.086, 6.185}
3. Scale data by multiplying by 10^D = 100
   Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integer
   Result: {510, 0, 109, 619}
5. Pack and store using min number of bits
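The five steps above can be followed in a few lines of C. This is a sketch of the arithmetic only, under the assumption that the minimum is used as the offset exactly as on the slide; the function name scale_offset and its parameters are hypothetical, and the library's filter may differ in details such as fill-value handling.

```c
#include <stddef.h>

/* Scale-offset arithmetic from the slide: subtract the minimum,
 * multiply by 10^decimals, round to the nearest integer.
 * Returns the number of bits needed for the largest packed value.
 * Requires n >= 1. */
unsigned scale_offset(const double *data, size_t n, unsigned decimals,
                      double *min_out, unsigned long *packed)
{
    /* Step 1: find the minimum (the offset). */
    double min = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];

    /* Scale factor 10^decimals, built by repeated multiplication. */
    double scale = 1.0;
    for (unsigned d = 0; d < decimals; d++)
        scale *= 10.0;

    /* Steps 2-4: subtract, scale, round. The difference is never
     * negative, so adding 0.5 and truncating rounds to nearest. */
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        packed[i] = (unsigned long)((data[i] - min) * scale + 0.5);
        if (packed[i] > max)
            max = packed[i];
    }

    /* Step 5: count the bits needed to hold the largest value. */
    unsigned bits = 0;
    while (max > 0) {
        bits++;
        max >>= 1;
    }
    *min_out = min;
    return bits;
}
```

For the slide's data with D = 2 the largest packed value is 619, which fits in 10 bits, compared with 32 or 64 bits for the original floating-point values.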

“NULL” Dataspace

• Why:
  • Allow datasets with no elements to be described
  • NetCDF 4 needed a “place holder” for attributes
• What:
  • A dataset with no dimensions, no data

Group improvements

Access links by creation-time order

• Why:
  • Allow iteration & lookup of group’s links (children) by creation order as well as by name order
  • Support netCDF access model for netCDF 4
• What: Option to access objects in group according to relative creation time

“Compact groups”

• Why:
  • Save space and access time for small groups
  • If groups are small, don’t need B-tree overhead
• What:
  • Alternate storage for groups with few links
• Example
  • File with 11,600 groups
  • With original group structure, file size ~ 20 MB
  • With compact groups, file size ~ 12 MB
  • Total savings: 8 MB (40%)
  • Average savings/group: ~700 bytes

Better large group storage

• Why: Faster, more scalable storage and access for large groups

• What: New format and method for storing groups with many links

Intermediate group creation

• Why:
  • Simplify creation of a series of connected groups
  • Avoid having to create each intermediate group separately, one by one
• What:
  • Intermediate groups can be created when creating an object in a file, with one function call

Example: add intermediate groups

• Want to create “/A/B/C/dset1”
• “A” exists, but “B/C/dset1” do not

[Diagram: before the call the file holds only group /A; after the call /A holds B, B holds C, and C holds dset1]

H5Dcreate(file_id, “/A/B/C/dset1”,..)

One call creates groups “B” & “C”, then creates “dset1”
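To see which groups such a call has to account for, the path can be split at each separator. The sketch below is plain C string handling, not HDF5 code; the function intermediate_groups and its 64-byte entry size are assumptions made for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy each intermediate group path of `path` into out[], up to max_out
 * entries of at most 63 characters, and return how many were found.
 * The final component (the object itself) is not included. */
int intermediate_groups(const char *path, char out[][64], int max_out)
{
    int count = 0;
    size_t len = strlen(path);

    /* Start at 1 so the leading '/' of an absolute path is skipped. */
    for (size_t i = 1; i < len && count < max_out; i++) {
        if (path[i] == '/' && i < 64) {
            memcpy(out[count], path, i);   /* prefix up to this separator */
            out[count][i] = '\0';
            count++;
        }
    }
    return count;
}
```

For “/A/B/C/dset1” this yields “/A”, “/A/B” and “/A/B/C”; in the slide's scenario the first already exists and the other two are the ones created on the fly.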

Link Revisions

What are links?

Links connect groups to their members

“Hard” links point to a target by address

“Soft” links store the path to a target

[Diagram labels: root group; hard link, <address>, dataset; soft link, “/target dataset”]

New: external Links

• Why: Access objects by file & path within file
• What:
  • Store location of file and path within that file
  • Can link across files

[Diagram labels: file1.h5: root group, “dataset EL”, “file2.h5”, “target dataset”; file2.h5: root group, “target dataset”, <address>, dataset]

New: User-defined Links

• Why:
  • Allow applications to create their own kinds of links and link operations, such as
    • Create “hard” external link that finds an object by address
    • Create link that accesses a URL
    • Keep track of how often a link is accessed, or other behavior
• What:
  • App can create new kinds of links by supplying custom callback functions
  • Can do anything HDF5 hard, soft, or external links do

Shared Object Header Messages

Shared object header messages

• Why: metadata duplicated many times, wasting space
• Example:
  • You create a file with 10,000 datasets
  • All use the same datatype and dataspace
  • HDF5 needs to write this information 10,000 times!

[Diagram: Dataset 1, Dataset 2 and Dataset 3, each with its own data, datatype and dataspace]

Shared object header messages

What:
• Enable messages to be shared automatically
• HDF5 shares duplicated messages on its own!

[Diagram labels: Dataset 1, data 1, datatype, dataspace; Dataset 2, data 2]

Shared Messages

• Happens automatically
• Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
• Saves space if these objects are relatively large
• May be faster if HDF5 can cache shared messages
• Drawbacks
  • Usually slower than non-shared messages
  • Adds overhead to the file
    • Index for storing shared datatypes
    • 25 bytes per instance
  • Older library versions can’t read files with shared messages

Two informal tests

• File with 24 datasets, all with same big datatype
  • 26,000 bytes normally
  • 17,000 bytes with shared messages enabled
  • Saves 375 bytes per dataset
• But, make a bad decision: invoke shared messages but only create one dataset…
  • 9,000 bytes normally
  • 12,000 bytes with shared messages enabled
  • Probably slower when reading and writing, too.
• Moral: shared messages can be a big help, but only in the right situation!

Metadata cache improvements

Metadata Cache improvements

• Why:
  • Improve I/O performance and memory usage when accessing many objects
• What:
  • New metadata cache APIs
    • control cache size
    • monitor actual cache size and current hit rate
  • Under the hood: adaptive cache resizing
    • Automatically detects the current working set size
    • Sets max cache size to the working set size

Metadata cache improvements

• Note: most applications do not need to worry about the cache
• See “Advanced topics” for details
• And if you do see unusual memory growth or poor performance, please contact us. We want to help you.

Other improvements

New extendible error-handling API

• Why: Enable app to integrate error reporting with HDF5 library error stack
• What: New error handling API
  • H5Epush – push major and minor error ID on specified error stack
  • H5Eprint – print specified stack
  • H5Ewalk – walk through specified stack
  • H5Eclear – clear specified stack
  • H5Eset_auto – turn error printing on/off for specified stack
  • H5Eget_auto – return settings for specified stack traversal

Attribute improvements

• Why:
  • Use less storage when large numbers of attributes are attached to a single object
  • Iterate over or look up attributes by creation order
• What:
  • Property to create index on the order in which the attributes are created
  • Improved attribute storage

Support for Unicode Character Set

• Why:
  • So apps can create names using Unicode
  • netCDF 4 needed this
• What:
  • UTF-8 Unicode encoding now supported
  • For string datatypes, names of links and attributes
• Example:

H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8);
H5Llink(file_id, "UTF-8 name", …, lcpl_id, …);

Efficient copying of HDF5 objects

• Why:
  • Enable apps to copy objects efficiently
• What:
  • New routines to copy an object in an HDF5 file within the current file or to another file
  • Done at a low level in the HDF5 file, allowing
    • Entire group hierarchies to be copied quickly
    • Compressed datasets to be copied without going through a decompression/compression cycle

Performance of object copy routines

[Bar chart: relative time for new h5repack using object copy routines vs. old h5repack, vertical axis 0% to 90%. Categories: 80M array, compound datatype; 16K x 16K int array; 10,000 groups; 16K x 16K float array, chunked; 10,000 attributes; 16K x 16K float array, chunked, compressed. Bar labels in extraction order: 88.1%, 58.7%, 35.8%, 20.0%, 0.3%, 0.1%]

Data transformation filter

• Why:
  • Apply arithmetic operations to data during I/O
• What:
  • Data transformation filter
  • Transform expressed by algebraic formula
  • Only +, -, *, and / supported
• Example:
  • Expression parameter set, such as x*(x-5)
  • When dataset read/written, x*(x-5) applied per element
  • When reading, values in file are unchanged
  • When writing, transformed data written to file
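The per-element behaviour described in the example can be illustrated with a small stand-alone function. This is only a sketch of the semantics, with the expression hard-coded rather than parsed; the names transform and apply_transform are hypothetical and are not part of the HDF5 API.

```c
#include <stddef.h>

/* The slide's example expression, x*(x-5), written as a C function. */
double transform(double x)
{
    return x * (x - 5.0);
}

/* Apply the transform element by element into a separate buffer,
 * leaving the source values untouched (as on a read, where the
 * values stored in the file do not change). */
void apply_transform(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = transform(src[i]);
}
```

On a write the roles are reversed: the application's buffer is the source and the transformed values are what end up in the file.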

Stackable Virtual File Drivers

• What is Virtual File Driver (VFD)?

Structure of HDF5 Library

Object API (C, Fortran 90, Java, C++)
• Specify objects and transformation properties
• Invoke data movement operations and data transformations

Library internals
• Performs data transformations and other prep for I/O
• Configurable transformations (compression, etc.)

Virtual file I/O (C only)
• Perform byte-stream I/O operations (open/close, read/write, seek)
• User-implementable I/O (stdio, network, memory, etc.)

Stackable VFD

• HDF5 VFD allows
  • Storing data using different physical file layout. E.g., Family VFD (writes file as “family of files”)
  • Doing different types of I/O. E.g., stdio (standard I/O); MPI-I/O (for parallel I/O)
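As an illustration of a different physical layout, the sketch below shows how one logical file address could map onto a family of fixed-size member files. It is an assumption-laden sketch of the idea, not the Family driver's code: the struct family_loc and the function family_locate are hypothetical names, and member_size must be non-zero.

```c
#include <stdint.h>

/* Where a logical file address lands in a family of fixed-size member files. */
struct family_loc {
    uint64_t member;   /* index of the member file */
    uint64_t offset;   /* byte offset inside that member */
};

/* Split a logical address into (member file, offset within member). */
struct family_loc family_locate(uint64_t addr, uint64_t member_size)
{
    struct family_loc loc;
    loc.member = addr / member_size;
    loc.offset = addr % member_size;
    return loc;
}
```

The layer above keeps working with a single address space, while the actual reads and writes on each member file can be handed to another driver underneath.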

Stackable VFD

• Why “stackable”:
  • Before now, only one VFD could be used at a time
  • VFDs could not interoperate
• What is “stackable”:
  • A non-terminal VFD may stack on top of compatible non-terminal VFDs and, eventually, a terminal VFD
• Two kinds of VFD
  • Non-terminal (e.g. Family)
  • Terminal (e.g. stdio; MPI-I/O)

Stackable VFD

[Diagram labels: Application; HDF5 API; non-terminal VFDs: Family, File split (metadata, raw data); terminal VFDs: Sec2, stdio, mpiio; default I/O path; HDF5 files]

Platform-specific changes

Platform-specific changes

• Why: Better UNIX/Linux portability
• What:
  • 1.8 uses latest GNU “auto” tools (autoconf, automake, libtool)
    • improves portability between many machine and OS configurations
  • Build can now be done in parallel
    • with gmake “-j” flag
    • speeds up build, test and install processes
  • Build infrastructure includes many other improvements as well

Platforms to be dropped

• Operating systems
  • HPUX 11.00
  • MAC OS 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Platforms to be added

• Systems
  • Alpha Open VMS
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2

High level APIs

High-Level Fortran APIs

• Fortran APIs have been added for H5Lite, H5Image and H5Table.

Dimension scales

• Similar to
  • Dimension scales in HDF4
  • Coordinate variables in netCDF
• What is a dimension scale?
  • An HDF5 dataset with additional metadata that identifies the dataset as a “Dimension Scale”
  • Associated with dimensions of HDF5 datasets
  • Meaning of the association is left to applications
  • A dimension scale can be shared by two or more dataset dimensions

Dimension scales example

HDF Explorer image


Sample dimension scale functions

• H5DSset_scale: convert dataset to a dimension scale
• H5DSattach_scale: attach scale to a dimension
• H5DSdetach_scale: detach scale from a dimension
• H5DSis_attached: verify if scale attached to dataset
• H5DSget_scale_name: read name of scale

HDF5Packet

• Why:
  • High performance table writing
  • For data acquisition, when there are many sources of data
  • E.g. flight test
• What:
  • Each row is a “packet”: a collection of fields, fixed or variable length
  • Append only
  • Indexed retrieval

Packets in HDF5

[Diagram: data packets laid out along a Time axis, shown once as fixed-length data records and once as variable-length records]

Parallel HDF5

Collective I/O improvements

• Why
  • Collective I/O not available for chunked data
  • Collective I/O not available for complex selections
  • Collective I/O is key to improving performance for parallel HDF5
• What
  • Collective I/O works for chunked storage
  • Works for irregular selections for both chunked and contiguous storage

Parallel h5diff (ph5diff)

• Compares two files in an MPI parallel environment.

• Compares multiple datasets simultaneously

Windows MPICH support

• Windows MPICH support: prototype

Tool improvements

New features for old tools

• h5dump
  • Dump data in binary format
  • Faster for files with large numbers of objects
• h5diff
  • Can now compare dataset regions
  • Parallel ph5diff now available
• h5repack
  • Efficient data copy using H5Gcopy()
  • Able to handle big datasets

New HDF5 Tools

• h5copy
  • Copies a group, dataset or named datatype from one location to another
  • Copies within a file or across files
• h5repart
  • Partition file into a family of files
• h5import
  • Import binary/ascii data into an HDF5 file
• h5check
  • Verifies an HDF5 file against the defined HDF5 File Format Specification
• h5stat
  • Reports statistics about a file and objects in a file

Thank You

Questions/comments?

For more information

• Go to http://www.hdfgroup.org/HDF5/

• Click on “Obtain HDF5 1.8.0 Alpha”

• Look at table “Information”

Acknowledgement

This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
