may 30-31, 2012hdf5 workshop at psi1 hdf5 at glance quick overview of known topics

24
May 30-31, 2012 HDF5 Workshop at PSI 1 HDF5 at Glance Quick overview of known topics

Upload: anabel-strickland

Post on 01-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 1

HDF5 at Glance

Quick overview of known topics

Page 2: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 2

Outline

• Overview of HDF5 • Topics not covered by PSI HDF5 Tutorial

• Groups and Links

Page 3: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

HDF5 File

HDF5 Workshop at PSI 3

lat | lon | temp----|-----|----- 12 | 23 | 3.1 15 | 24 | 4.2 17 | 21 | 3.6

An HDF5 file is a container that holds data objects.

Experiment Notes:

Serial Number: 99378920

Date: 3/13/09

Configuration: Standard 3

May 30-31, 2012

Page 4: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

HDF5 File

4

lat | lon | temp----|-----|----- 12 | 23 | 3.1 15 | 24 | 4.2 17 | 21 | 3.6

/

SimOutViz

HDF5 groups and links organize data objects.

Every HDF5 file has a root group

Parameters10;100;1000

Timestep36,000

May 30-31, 2012 HDF5 Workshop at PSI

Similar to UNIX directories

Page 5: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

HDF5 Software Layers & Storage

HDF5 File Format File Split

Files

File on Parallel Filesystem

Other?I/O Drivers

Virtual File Layer Posix I/O

Split Files MPI I/O Custom

Internals Memory Mgmt

Datatype Conversion Filters Chunked

StorageVersion

Compatibilityand so on…

LanguageInterfaces

C, Fortran, C++

HDF5 Data Model ObjectsGroups, Datasets, Attributes, …

Tunable PropertiesChunk Size, I/O Driver, …

HD

F5 L

ibra

rySt

orag

e

h5dumptool

High LevelAPIs

HDFview toolTo

ols

h5repack tool

Java Interface…

API

5May 30-31, 2012 HDF5 Workshop at PSI

Page 6: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

GROUPS AND LINKS

May 30-31, 2012 HDF5 Workshop at PSI 6

Page 7: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 7

Groups and Links

• Groups are containers for links (graph edges)• Links were added in 1.8.0• Warning: Many APIs in H5G interface are

obsolete - use H5L interfaces to discover and manipulate file structure

Page 8: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

Example h5_links.py

8

/

BA

Different kinds of links

May 30-31, 2012 HDF5 Workshop at PSI

a External

a soft

dangling

dset.h5

links.h5

Dataset can be “reached” using three paths /A/a/a/soft Dataset is in a different file

Page 9: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 9

Links

• Name• Example: “A”, “B”, “a”, “dangling”, “soft”• Unique within a group; “/” are not allowed in names

• Type• Hard Link

• Value is object’s address in a file• Created automatically when object is created• Can be added to point to existing object

• Soft Link• Value is a string , for example, “/A/a”, but can be

anything• Use to create aliases

Page 10: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 10

Links (cont.)

• Type• External Link

• Value is a pair of strings , for example, (“dset.h5”, “dset” )

• Use to access data in other HDF5 files• HDF5 1.8.7 introduced caching of files

opened via external links H5Pset_elink_file_cache_size

Page 11: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 11

Links Properties

• Links Properties• ASCII or UTF-8 encoding for names• Create intermediate groups

• Saves programming effort

• C examplelcpl_id = H5Pcreate(H5P_LINK_CREATE);

H5Pset_create_intermediate_group( lcpl_id, 1 );

H5Gcreate (fid, "A/B", lcpl_id, H5P_DEFAULT, H5P_DEFAULT);

• Group “A” will be created if it doesn’t exist

Page 12: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 12

Operations on Links

• See H5L interface in Reference Manual• Create• Delete• Copy• Iterate• Check if exists

Page 13: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 13

Groups Properties

• Creation properties• Type of links storage

• Compact (in 1.8.* versions)• Used with a few members (default under 8)

• Dense (default behavior)• Used with many (>16) members (default)

• Tunable size for a local heap• Save space by providing estimate for size of the storage

required for links names

• Can be compressed (in 1.8.5 and later)• Many links with similar names (XXX-abc, XXX-d, XXX-

efgh, etc.)• Requires more time to compress/uncompress data

Page 14: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 14

Groups Properties

• Creation properties• Links may have creation order tracked and indexed

• Indexing by name (default) • A, B, a, dangling, soft

• Indexing by creation order (has to be enabled)• A, B, a, soft, dangling

• http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-c.html

Page 15: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 15

Discovering HDF5 file’s structure

• HDF5 provides C and Fortran 2003 APIs for recursive and non-recursive iterations over the groups and attributes• H5Ovisit and H5Literate (H5Giterate)• H5Aiterate

• Life is much easier with H5Py (h5_visita.py)import h5pydef print_info(name, obj): print name for name, value in obj.attrs.iteritems():

print name+":", valuef = h5py.File('GATMO-SATMS-npp.h5', 'r+')f.visititems(print_info)f.close()

Page 16: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 16

Checking a path in HDF5

• HDF5 1.8.8 provides HL C and Fortran 2003 APIs for checking if paths exists• H5LTvalid_path (h5ltvalid_path_f)• Example: Is there an object with a path /A/B/C/d ?• TRUE if there is a path, FALSE otherwise

Page 17: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

17

Hints

• Use latest file format (see H5Pset_libver_bound function in RM) • Save space when creating a lot of groups in

a file• Save time when accessing many objects

(>1000)• Caution: Tools built with the HDF5 versions prior

to 1.8.0 will not work on the files created with this property

May 30-31, 2012 HDF5 Workshop at PSI

Page 18: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

May 30-31, 2012 HDF5 Workshop at PSI 18

Informal Benchmark

• Create a file and a group in a file• Create up to 10^6 groups with one dataset in

each group• Compare files sizes and performance of HDF5

1.8.1 using the latest group format with the performance of HDF5 1.8.1 (default, old format) and 1.6.7

• Note: Default 1.8.1 and 1.6.7 became very slow after 700000 groups

Page 19: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

Time to Open and Read a Dataset

May 30-31, 2012 HDF5 Workshop at PSI 19

10000 100000 10000000.1

1

10

100

1000

1.61.8 (old groups)1.8 (new groups)

Number of Groups

Tim

e (

mil

lis

ec

on

ds

)

Page 20: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

Time to Close the File

May 30-31, 2012 HDF5 Workshop at PSI 20

10000 100000 10000000

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

1.61.8 (old groups)1.8 (new groups)

Number of Groups

Tim

e (

mil

lis

ec

on

ds

)

Page 21: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

File Size

May 30-31, 2012 HDF5 Workshop at PSI 21

0 200000 400000 600000 8000000

100000

200000

300000

400000

500000

600000

700000

800000

900000

1000000

1.8 (old groups)1.8 (new groups)

Number of Groups

Siz

e (

kil

ob

yte

s)

Page 22: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

DATATYPES

May 30-31, 2012 HDF5 Workshop at PSI 22

Page 23: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

23

Datatypes

• See Tutorial examples

May 30-31, 2012 HDF5 Workshop at PSI

Page 24: May 30-31, 2012HDF5 Workshop at PSI1 HDF5 at Glance Quick overview of known topics

Thank You!

Questions?

May 30-31, 2012 HDF5 Workshop at PSI 24