Data Format and Packaging
Kurt Biery, 04 May 2020, DUNE DAQ/SC Meeting


Page 1:

Data Format and Packaging
Kurt Biery, 04 May 2020, DUNE DAQ/SC Meeting

Page 2:

Reminders

I primarily want to talk about the format and packaging of the raw data at the hand-off (interface) between online and offline.

I’ve used “event” and “Trigger Record” interchangeably in these slides.

In the interest of time, I suggest that we defer detailed discussions to follow-up meetings.

Page 3:

Outline

Introductory comments on:
• Data Format
• Data Packaging
• Metadata and Manifest files

A sample HDF5 file with protoDUNE SP data

Data packaging examples – 3 so far

Sample metadata and manifest ‘files’

Topics for follow-up meetings; Ideas for testing HDF5 at PDSP

Page 4:

Data Format

We are investigating a DUNE-specific binary format stored in HDF5 files. Eric Flumerfelt has done initial work in creating sample code to write/read data fragments in HDF5 (artdaq-demo-hdf5 package). I’ve hacked that code to provide some sample HDF5 files with data from protoDUNE SP run 11037 (Cosmics run type). I will describe the (tentative, non-binding) choices that I made later in this talk…

** For the purposes of this talk, let’s say that the high level ‘data format’ corresponds to the technology/tools that are used to read and write the files.

Page 5:

Data Packaging Intro

There have been a number of discussions over time about data packaging (Data Model workshop, DAQ workshop, etc.)

With help from Alessandro, I have come to believe that there are a few common themes regarding data packaging…

** Data packaging = ‘chunking’ or ‘slicing’ or ‘deciding which pieces of a Trigger Record or Stream go into which file’

Page 6:

Data Packaging Abstractions

Goal:
• Identify a set of parameters (or “choices” or “questions”) that we can use to specify the packaging model for each type of Trigger (e.g. Beam or SNB) or Stream (TP stream or WIB debug stream)

Once those parameters are identified, we can design and build the DAQ (Dataflow) infrastructure to support ‘configurable’ packaging by Trigger and Stream.
• Parameter values will be specified later
• [note to self: focus on the online/offline interface]

Page 7:

Data Packaging Parameters

For Triggered data:
1. Whether each file on disk will have an integer number of Trigger Records, or whether each file can have a fractional number of Trigger Record(s)

For both Triggered and Streamed data:
2. Whether or not the data in each file on disk will have geographically complete coverage (superset; the Trigger Decision has details)
   1. If not GeoCmplt, A) what subdivision will be used, and B) should the file boundaries match between the different subdivided pieces
3. The maximum size of files that will be created
4. The maximum time interval/duration that will be stored in a single file (data time or wall clock time both seem possible)

We will need to specify priority among these for each Trigger/Stream…
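As an illustration only, the parameters above could be captured per Trigger/Stream type in a small configuration record. This is a sketch; all of the field names here are my own invention, not DAQ code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PackagingConfig:
    """Hypothetical per-Trigger/Stream packaging parameters (illustrative names)."""
    integer_trigger_records: bool        # param 1: only whole Trigger Records per file?
    geo_complete: bool                   # param 2: geographically complete coverage?
    subdivision: Optional[str] = None    # param 2.1A: e.g. "APA" when not GeoCmplt
    correlated_boundaries: bool = False  # param 2.1B: matching file boundaries?
    max_file_size_gb: Optional[float] = None   # param 3: max file size
    max_time_span_sec: Optional[float] = None  # param 4: max time interval per file

# Example: a trigger type packaged as fully-built events in 6 GB files,
# with no limit on the time span per file
beam = PackagingConfig(integer_trigger_records=True, geo_complete=True,
                       max_file_size_gb=6.0)
```

A Dataflow component could then select the packaging behavior for each Trigger or Stream by looking up the corresponding record.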

Page 8:

Data Packaging Samples

I will describe some data files that demonstrate sample packaging models later in this talk…

Page 9:

Metadata and Manifest Files

The goal is to have meta-information that describes the raw data in files:
• ‘metadata file’: one-to-one with a raw data file
• ‘manifest file’: one-to-many; provides list(s) when Trigger Records can span multiple files

The meta-information may not need to be in a separate file. Some sample choices appear in the later examples; different choices are certainly possible.

One particular type of meta-information: indicating a region-of-interest when a Trigger Decision specifies one for a particular Trigger Record. I have an example of one way to do that.

Page 10:

HDF5 Samples

Page 11:

Reminders about PDSP data now

art/ROOT files:

Begin processing the 40th record. run: 11037 subRun: 1 event: 40 at 27-Mar-2020 11:07:38 CDT
PRINCIPAL TYPE: Event
PROCESS NAME | MODULE LABEL   | PRODUCT INSTANCE NAME | DATA PRODUCT TYPE             | PRODUCT FRIENDLY TYPE | SIZE
DAQ          | daq            | TIMING                | std::vector<artdaq::Fragment> | artdaq::Fragments     | 1
DAQ          | daq            | ContainerFELIX        | std::vector<artdaq::Fragment> | artdaq::Fragments     | 50
DAQ          | daq            | ContainerCRT          | std::vector<artdaq::Fragment> | artdaq::Fragments     | 4
DAQ          | daq            | ContainerTPC          | std::vector<artdaq::Fragment> | artdaq::Fragments     | 10
DAQ          | TriggerResults |                       | art::TriggerResults           | art::TriggerResults   | -
DAQ          | daq            | ContainerCTB          | std::vector<artdaq::Fragment> | artdaq::Fragments     | 1
DAQ          | daq            | ContainerPHOTON       | std::vector<artdaq::Fragment> | artdaq::Fragments     | 24

• Vectors of artdaq::Fragments, grouped by Product Instance Name
• ‘Container’ fragments for the majority of the types (pull-mode readout)
• artdaq::Fragment is a header plus the verbatim payload from the electronics

Page 12:

Short intro to HDF5

HDF files contain Groups; Groups contain other Groups and Datasets
• Similar to a directory structure (directories contain directories and files)
• Groups and Datasets can each have Attributes (key/value pairs)

HDF Datasets can support lots of different data structures
• Simplifying assumption in these samples: use Datasets with 1-dimensional arrays of unsigned 64-bit integers
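As a rough stdlib-only analogy of that object model (this is not the HDF5 API; it is a toy class I made up to mirror the structure just described):

```python
# Toy model of the HDF5 object tree described above: Groups hold Attributes,
# child Groups, and Datasets; Datasets here are 1-dimensional arrays of
# unsigned 64-bit integers, per the simplifying assumption in the samples.
import array

class Group:
    def __init__(self):
        self.attrs = {}      # key/value pairs, like HDF5 Attributes
        self.groups = {}     # named child Groups
        self.datasets = {}   # named 1-D arrays of uint64

    def create_group(self, name):
        return self.groups.setdefault(name, Group())

    def create_dataset(self, name, words):
        self.datasets[name] = array.array("Q", words)  # "Q" = unsigned 64-bit

# Build a tiny tree resembling one event (names illustrative)
root = Group()
ev = root.create_group("event_1")
ev.attrs["run_number"] = 11037
ev.create_group("FELIX").create_dataset("APA5.0", [0xdeadbeef, 42])
```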

Page 13:

Comments about the samples

I followed the pattern set by Eric: artdaq::Fragment header fields become Dataset attributes, and the artdaq::Fragment payload is the Dataset contents
• And for the event-based examples, I kept the same high-level Group as Eric: an event

I took some liberties with mid-level constructs:
• Got rid of Container Groups when only 1 fragment is inside
• Picked names for Groups and Datasets that seemed reasonable to me; better suggestions welcome (e.g. “APA6.0”, “TimeSliceN”)
• Created a Dataset Attribute that has the artdaq::Fragment timestamp (which corresponds to the Trigger time) in a human-readable string
• Missing data/empty fragments are just skipped

Page 14:

Intro to the event-based layout


file.hdf5
• event N
  • subdetector 1 (e.g. CRT)
    • geometric unit 1
      • time slice 0
      • … time slice P
    • geometric unit 2
    • …
  • subdetector 2 (e.g. CTB)
    • time slice 0
    • … time slice Q
  • subdetector 3 (e.g. PDS) or electronics type 3 (e.g. FELIX or TPC [RCE])
    • geometric unit 1
    • … geometric unit R
  • subdetector 4 (e.g. Timing)
• event M

I chose to minimize the number of levels in each case, at the expense of giving up strict parallelism.

Of course, other choice(s) are possible.
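To make the nesting concrete, a hedged sketch of how Group paths in this event-based layout might be composed (the naming is my own illustration, not an agreed convention):

```python
def event_group_path(event, subdetector, geo_unit=None, time_slice=None):
    """Build an HDF5-style group path for the event-based layout sketched
    above. Names like 'CRT1' or 'TimeSlice0' are illustrative only."""
    parts = [f"event_{event}", subdetector]
    if geo_unit is not None:
        parts.append(geo_unit)          # geometric unit level, when present
    if time_slice is not None:
        parts.append(f"TimeSlice{time_slice}")  # innermost time-slice level
    return "/" + "/".join(parts)

# e.g. a CRT fragment for geometric unit 1, time slice 0, of event 40:
path = event_group_path(40, "CRT", geo_unit="CRT1", time_slice=0)
# path == "/event_40/CRT/CRT1/TimeSlice0"
```

Note how the optional levels mirror the layout above: Timing has neither geometric units nor time slices, while CRT uses both.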

Page 15:

Tools


HDFView
• HDF ‘file browser’; easy ‘install’; screenshots later in this talk; I can give a demo
• A couple of notes: lexical sort; GB in HDFView is (1000)^3 bytes, not (1024)^3 [my talks always use (1024)^3]

h5dump
• Prints out everything, so grep is typically needed

h5copy
• Allowed me to trivially copy individual files into one

Page 16:

Top-level view of one event

Page 17:

Attributes for each event


If/when I remake the examples, I will add attributes for:
• Trigger timestamp
• Requested detector components
• Successfully read out detector components

Page 18:

Datasets within each event

Page 19:

Datasets within each event (2)


Other options are certainly available; for example, APA/1/link/0

Page 20:

Example of Dataset Attributes


artdaq::Fragment header fields

Page 21:

Sample Timing Dataset


“NoBeamTrig” & Cookie
Trigger timestamp
Event counter & checksum

32-bit version, unused bits

Other timestamps

Page 22:

Sample FELIX/APA5.0 Dataset


Crate/slot/fiber/version/sof?

Fragment metadata

Timestamp

Page 23:

Packaging examples

Page 24:

Packaging examples

Remember, these are ones that I just made up; there is no implied official endorsement. If we want to try different schemes for further study, great, let’s do that. I can already see ways to make the Attributes assigned to various Groups and Datasets more common between the event-based and time-based models.

Page 25:

Packaging example 1: “Fully-built events, primary file split by max file size”
1. always an integer number of events (Trigger Records) per file
2. fully-built/geographically-complete events
3. max file size of N GB
4. no limit on the time span covered by the events in each file

Four sample files so far (size limit is 6 GB):
• np04_raw_run011037_GeoCmplt_6GB_0001.hdf5, 5.97 GB, events 1 to 45
• np04_raw_run011037_GeoCmplt_6GB_0002.hdf5, 5.97 GB, events 46 to 90
• np04_raw_run011037_GeoCmplt_6GB_0003.hdf5, 5.96 GB, events 91 to 135
• np04_raw_run011037_GeoCmplt_6GB_0004.hdf5, 5.97 GB, events 136 to 180

The data structure was already shown in the “HDF5” part of the talk…

Page 26:

Packaging example 2: “Data split by APA, primary file split by max file size”
1. always an integer number of Trigger Record fragments per file
2. geographically-separated data
   - APA1 data in one file (TPC & PDS)
   - APA2 data in a separate file (TPC & PDS)
   - APA3 data in a separate file (TPC & PDS)
   - APA4 data in a separate file (TPC & PDS)
   - APA5 data in a separate file (TPC & PDS)
   - APA6 data in a separate file (TPC & PDS)
   - Timing, CTB, and CRT data in a 7th file
3. correlated splitting of the files, by Trigger Record
4. max file size of N GB
5. no limit on the time span covered by the events in each file

Page 27:

Further info on example 2

With geographically-separated data, there are at least two options for closing one file and opening another:
1. Each set of files (for example, the TPC+PDS data from APA1) is independent, and files get closed when they individually reach N GB
2. Files are correlated, and all files get closed when one of them reaches N GB

For this example, I chose option 2.
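Option 2 can be sketched as a small simulation (illustrative only; the stream names and record sizes are made up): all geographic streams advance together, and every stream rolls over to a new file index as soon as any one stream would exceed the size limit.

```python
def correlated_split(record_sizes_gb, max_gb):
    """Assign Trigger Records to file indices so that all geographic streams
    roll over together (option 2 above). record_sizes_gb maps each stream
    name to a list of per-record sizes; returns the file index per record."""
    streams = list(record_sizes_gb)
    n_records = len(record_sizes_gb[streams[0]])
    totals = {s: 0.0 for s in streams}
    file_index, assignment = 1, []
    for i in range(n_records):
        # If adding this record would push any stream past the limit,
        # close *all* files and start the next index (correlated rollover).
        if any(totals[s] + record_sizes_gb[s][i] > max_gb for s in streams):
            file_index += 1
            totals = {s: 0.0 for s in streams}
        for s in streams:
            totals[s] += record_sizes_gb[s][i]
        assignment.append(file_index)
    return assignment

# Made-up sizes: the largest stream drives the rollover for everyone,
# which is why the "Other" files in example 2 stay tiny (0.01 GB).
sizes = {"APA5": [2.5, 2.5, 2.5, 2.5], "Other": [0.001] * 4}
print(correlated_split(sizes, max_gb=6.0))  # -> [1, 1, 2, 2]
```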

Page 28:

Example 2 files

Each set of files was closed when any one of them approached 6 GB.

• np04_raw_run011037_GeoSplit_APA1_0001.hdf5, 5.94 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_APA2_0001.hdf5, 3.61 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_APA3_0001.hdf5, 2.35 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_APA4_0001.hdf5, 5.97 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_APA5_0001.hdf5, 5.99 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_APA6_0001.hdf5, 5.97 GB, events 1 to 225
• np04_raw_run011037_GeoSplit_Other_0001.hdf5, 0.01 GB, events 1 to 225

• np04_raw_run011037_GeoSplit_APA1_0002.hdf5, 5.94 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_APA2_0002.hdf5, 3.61 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_APA3_0002.hdf5, 2.34 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_APA4_0002.hdf5, 5.98 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_APA5_0002.hdf5, 5.99 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_APA6_0002.hdf5, 5.97 GB, events 226 to 450
• np04_raw_run011037_GeoSplit_Other_0002.hdf5, 0.01 GB, events 226 to 450

Page 29:

Views of a few Example 2 files

Page 30:

Packaging example 3: "Fully-built Time Slices, primary file split TBD"
1. packaged by Time Slice, so no requirement of an integer number of Trigger Records per file
2. fully-built/geographically-complete events
3. overall time span of Time Slices per file TBD
4. max file size TBD

Page 31:

Example 3 time-based layout


file.hdf5
• time slice 0
  • subdetector 1 (e.g. FELIX)
    • geometric unit 1
    • …
    • geometric unit T
• time slice 1
• …

It was not clear to me how to subdivide the non-FELIX PDSP data fragments, so I’ve only included FELIX data in this sample so far.

Page 32:

View of the Example 3 file


I didn’t have access to a readily available set of protoDUNE SP data with a long readout window, so I stuck with run 11037 and broke the 3 msec events into three 1 msec chunks. (1 ms / 20 ns = 50,000 ticks)
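The chunking arithmetic quoted above can be checked directly (treat this as a sanity check of the numbers on the slide, nothing more):

```python
# Sanity check of the slide's chunking arithmetic: a 3 ms readout window
# split into 1 ms chunks made of 20 ns ticks.
TICK_NS = 20        # one tick, in nanoseconds
CHUNK_MS = 1        # chunk length, in milliseconds
EVENT_MS = 3        # readout window, in milliseconds

ticks_per_chunk = CHUNK_MS * 1_000_000 // TICK_NS  # ns per chunk / ns per tick
chunks_per_event = EVENT_MS // CHUNK_MS

print(ticks_per_chunk, chunks_per_event)  # -> 50000 3
```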

Page 33:

Additional view for Example 3


APA-level data

Page 34:

Example 3 file


• np04_raw_run011037_TimeBased_GeoCmplt_0001.hdf5, 367 MB, 9 time slices

I wanted to get feedback before adding more time slices to this file
• Bugger the timestamps to get continuous time slices?

Page 35:

Additional examples are possible


Maybe the discussion of additional examples should be delegated to a group that includes offline folks?

Page 36:

Metadata files
• Started with JSON metadata files from PDSP
• Created sample JSON files for the HDF5 sample files
• Some details have been glossed over
• Format and location (e.g. inside the HDF5 file) can be changed

Page 37:

np04_raw_run011037_GeoSplit_APA1_0001.json

{
  "file_name": "np04_raw_run011037_GeoSplit_APA1_0001.hdf5",
  "start_time": "<at PDSP, Linux server time. still useful? If so, which process? HLF? Something upstream?>",
  "end_time": "<still useful?>",
  "earliest_trigger_time": 79184253665845104,
  "latest_trigger_time": 79184255392990149,
  "information_about_the_version_of_the_DAQ_software_used": {},
  "data_stream": "Cosmics",
  "data_tier": "raw",
  "dune_data.daqconfigname": "CRT_noprescale_delay_00008",
  "dune_data.is_fake_data": 0,
  "dune_data.detector_config": "ohFelix100:ohFelix101:ssp101:ssp102:ssp103:ssp104:wib101:wib102:wib103:wib104:wib105",
  "possible_new_way_of_describing_electronics_included_in_partition": {
    "TPC": [ "APA1.0", "APA1.1", "APA1.2", "APA1.3", "APA1.4", "APA1.5", "APA1.6", "APA1.7", "APA1.8", "APA1.9" ],
    "PDS": [ "APA1.0", "APA1.1", "APA1.2", "APA1.3" ]
  },
  "event_count": 225,
  "events": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,…,216,217,218,219,220,221,222,223,224,225],
  "file_type": "detector",
  "first_event": 1,
  "last_event": 225,
  "runs": [[11037,1,"protodune-sp"]]
}
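A sketch of generating such a metadata sidecar with the stdlib json module. Only a few fields from the sample are mirrored here, and the consistency checks are my own suggestion, not an existing tool:

```python
import json

# Minimal metadata record mirroring a few fields from the sample above;
# run/event values follow the sample, everything else is abbreviated.
meta = {
    "file_name": "np04_raw_run011037_GeoSplit_APA1_0001.hdf5",
    "data_stream": "Cosmics",
    "data_tier": "raw",
    "event_count": 225,
    "events": list(range(1, 226)),
    "first_event": 1,
    "last_event": 225,
    "runs": [[11037, 1, "protodune-sp"]],
}

# Cheap internal-consistency checks before writing the sidecar .json file
assert meta["event_count"] == len(meta["events"])
assert meta["events"][0] == meta["first_event"]
assert meta["events"][-1] == meta["last_event"]

text = json.dumps(meta, indent=2)
# open("np04_raw_run011037_GeoSplit_APA1_0001.json", "w").write(text)
```

One design question this raises: the full "events" list is redundant with first_event/last_event when events are contiguous, so a writer could emit only the range in that case.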

Page 38:

np04_raw_run011037_GeoSplit_Manifest_0001.json

{
  "files_for_each_trigger_record": {
    "1": {
      "CRT": "np04_raw_run011037_GeoSplit_Other_0001.hdf5",
      "CTB": "np04_raw_run011037_GeoSplit_Other_0001.hdf5",
      "PDS": {
        "APA1": "np04_raw_run011037_GeoSplit_APA1_0001.hdf5",
        "APA2": "np04_raw_run011037_GeoSplit_APA2_0001.hdf5",
        "APA3": "np04_raw_run011037_GeoSplit_APA3_0001.hdf5",
        "APA4": "np04_raw_run011037_GeoSplit_APA4_0001.hdf5",
        "APA5": "np04_raw_run011037_GeoSplit_APA5_0001.hdf5",
        "APA6": "np04_raw_run011037_GeoSplit_APA6_0001.hdf5"
      },
      "Timing": "np04_raw_run011037_GeoSplit_Other_0001.hdf5",
      "TPC": {
        "APA1": "np04_raw_run011037_GeoSplit_APA1_0001.hdf5",
        "APA2": "np04_raw_run011037_GeoSplit_APA2_0001.hdf5",
        "APA3": "np04_raw_run011037_GeoSplit_APA3_0001.hdf5",
        "APA4": "np04_raw_run011037_GeoSplit_APA4_0001.hdf5",
        "APA5": "np04_raw_run011037_GeoSplit_APA5_0001.hdf5",
        "APA6": "np04_raw_run011037_GeoSplit_APA6_0001.hdf5"
      }
    },
    "2": {
      "CRT": "np04_raw_run011037_GeoSplit_Other_0001.hdf5",
      "CTB": "np04_raw_run011037_GeoSplit_Other_0001.hdf5",
      …

Page 39:

Indicating region of interest

My idea so far is to add Attributes to each event for ‘requested_detector_components’ and ‘successful_detector_components’, which would indicate which pieces of the overall detector were requested to be part of this event (from the Trigger Decision) and which were successfully included in this event.

The thought so far is to use a JSON string like in the metadata files…

"requested_detector_components": { "TPC": [ "APA1.0", "APA1.1", "APA1.2", "APA1.3", "APA1.4", "APA1.5", "APA1.6", "APA1.7", "APA1.8", "APA1.9" ], "PDS": [ "APA1.0", "APA1.1", "APA1.2", "APA1.3" ] }

If this seems reasonable, I can go back and add this to the sample HDF5 files that have been created so far.
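Encoding the proposed attribute as a JSON string (exactly the shape shown above) and reading it back is straightforward with the stdlib json module; actually storing it as an HDF5 string Attribute is the part this sketch leaves out:

```python
import json

# The region-of-interest proposal above: a JSON string listing requested
# detector components, to be stored as an Attribute on each event Group.
requested = {
    "TPC": ["APA1.0", "APA1.1", "APA1.2", "APA1.3", "APA1.4",
            "APA1.5", "APA1.6", "APA1.7", "APA1.8", "APA1.9"],
    "PDS": ["APA1.0", "APA1.1", "APA1.2", "APA1.3"],
}
attr_value = json.dumps(requested)  # the string that would go in the Attribute

# A reader can recover the structure and, e.g., check coverage:
decoded = json.loads(attr_value)
assert "APA1.9" in decoded["TPC"]
assert len(decoded["PDS"]) == 4
```

Comparing the decoded ‘requested’ and ‘successful’ lists would then directly answer which requested pieces are missing from an event.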

Page 40:

Topics for follow-up meetings

• Are we interested in pursuing these directions?
• Are the existing sample files useful, or is it better for folks to first meet and decide on better Groups and Datasets?
• With agreed-upon Groups and Datasets, should we remake the examples, or focus more on writing some HDF5 data from the DAQ at PDSP?
• How do folks envision accessing any HDF5 data that we produce (beyond looking at it with HDFView)? Reconstituted art/ROOT events? How would that work for time slices?

Page 41:

Possibly writing HDF5 files at PDSP

Roland, Phil, Adam, and I have discussed various possibilities for writing long-window data at PDSP for later studies. (Binary dump from FELIX BR [Roland/Adam], HDF5 [me], stitching together 3 msec events [Phil])
• I’ve made progress in creating some routines in dune-artdaq/dune-raw-data that could be used for the HDF5 option. If we want to pursue that, we should discuss it. As I suggested on the previous slide, learning about the expected data access will be a useful ingredient.

-rw-r--r-- 1 np04daq np-comp    123800 May 1 23:13 FelixReceiver_r11166_364.hdf5  (ohFelix600)
-rw-r--r-- 1 np04daq np-comp    123800 May 1 23:13 FelixReceiver_r11166_369.hdf5  (ohFelix601)
-rw-r--r-- 1 np04daq np-comp 111909192 May 1 23:13 FelixReceiver_r11167_364.hdf5
-rw-r--r-- 1 np04daq np-comp 111909192 May 1 23:13 FelixReceiver_r11167_369.hdf5
-rw-r--r-- 1 np04daq np-comp  83946568 May 1 23:13 FelixReceiver_r11168_364.hdf5
-rw-r--r-- 1 np04daq np-comp  83946568 May 1 23:13 FelixReceiver_r11168_369.hdf5
-rw-r--r-- 1 np04daq np-comp 125823736 May 1 23:13 FelixReceiver_r11169_364.hdf5
-rw-r--r-- 1 np04daq np-comp 125823736 May 1 23:13 FelixReceiver_r11169_369.hdf5

Page 42:

Backup Slides

Page 43:

Reminder about Tom’s requirements

Tom has summarized the following requirements:
1. longevity of support
2. integrity checks, for the file format as well as the data fragments
3. ability to read in small subsets of the trigger records and drop from memory data no longer being used
4. ability to navigate through a trigger record to get the adjacent time or space samples
5. compression tools
6. browsable with a lightweight, interactive tool
7. ability to handle evolution of data formats and structure gracefully, with backward compatibility ensured

https://wiki.dunescience.org/wiki/Project_Requirement_Brainstorming#Data_Format

Page 44:

Ideas for common Attributes

In the packaging examples, different attributes were demonstrated for the Datasets. For the event-based grouping, the Attributes were mainly artdaq::Fragment header fields. For the time-based grouping, the Attributes were more focused on timestamps.

Can I/we come up with a superset that works for both types of groupings? I believe so, yes.
• Data size, fragment ID, timestamp, time string
• Fragment type, valid and complete flags
• (FELIX) Number of frames, first frame timestamp, last frame timestamp

For the event or time-slice highest-level grouping, can we come up with a common set? Run number, time window start and end, is_complete. Event ID is redundant in the event-based grouping, so it could be dropped.

The earliest-frame and latest-frame Attributes that are part of the highest-level time-based grouping in example 3 should instead be part of the FELIX group.

Page 45:

Further work and ideas

Sample HDF5 data files for long-window Trigger Records?

Sample HDF5 data files for Stream data?

Implementation ideas in Dataflow subsystem (Dispatcher)

Page 46:

Other ideas

1. Data challenge in Feb 2021
2. Metadata and manifest files…
   - Metadata file for each raw data file
   - Manifest file for each TR that spans multiple files
   - Metadata could instead be internal to the raw data file
   - Sample metadata information for SNB files:
     • the trigger number/identifier
     • the APA number (or whatever geographic identifier(s) are appropriate)
     • the beginning and ending timestamps of the trigger window (or start time and window size)
     • the beginning and ending timestamps of the interval that is covered by the individual file (or start time and window size)

Page 47:

Another idea for time-based layout


file.hdf5
• macro time slice 0
  • subdetector 1 (e.g. CRT)
    • geometric unit 1
      • micro time slice 0
      • … micro time slice J
    • … geometric unit R
  • subdetector 2 (e.g. PDS)
    • geometric unit 1
    • … geometric unit S
  • subdetector 3 (e.g. FELIX)
    • geometric unit 1
    • … geometric unit T
• macro time slice 1
• …