Introduce the new NSX data storage format v1.0 (Backward-incompatible) (!278) · Merge requests · mlz / OpenHKL

Ammar Nejati requested to merge improveExporterImporter_NoBackwardCompatibility into develop Jun 28, 2021

The structure of the NSX data storage format has been simplified and improved:
- A version string is added to distinguish between formats.
- Meta and Info are merged into a single Metadata group. The former 80-character limit on string metadata is removed.
- The counts data are stored under the keyword Dataset to prevent possible name clashes with other group names.
- Peak metadata use the same data structure (MetaData) as the other metadata.
- Peak metadata are stored as attributes of the PeakCollection group.
- Group names are changed to reflect their intent.
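Because v1.0 is not backward-compatible, the version string lets a reader reject files it cannot interpret instead of misreading them. Below is a minimal C++ sketch. It assumes the version has already been read from the file as a string. The names `kNsxFormatVersion` and `checkNsxVersion` are hypothetical and not part of OpenHKL:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical constant: the only format version this reader accepts.
// v1.0 is backward-incompatible, so no older version is tolerated.
inline const std::string kNsxFormatVersion = "1.0";

// Validate the version string read from a file (how it is read, e.g. from
// an HDF5 attribute, is outside this sketch). Throws on a mismatch.
inline void checkNsxVersion(const std::string& fileVersion)
{
    if (fileVersion.empty())
        throw std::runtime_error("NSX file has no version string (legacy pre-1.0 file?)");
    if (fileVersion != kNsxFormatVersion)
        throw std::runtime_error("Unsupported NSX format version '" + fileVersion
                                 + "'; expected '" + kNsxFormatVersion + "'");
}
```

Failing early with a clear message is preferable to a partial import, since the legacy reader has been removed and older files must be converted first.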
```
Structure v1.0:
/DataCollections
  ./<datacollection-name>
    ./Dataset
    ./Detector
    ./Sample
    ./Metadata
  ...

/PeakCollections
  ./<peakcollection-name>
    ./Center
    ./Metric
    ...
    ./DatasetNames
    ./UnitCellNames
  ...

/UnitCells
  ./<unitcell-name>
    ./accepted
    ./rejected
  ...
```
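In the layout above, every path consists of fixed keywords plus a user-chosen collection name. This is how the fixed `Dataset` keyword keeps the counts data apart from other group names. The sketch below is hypothetical: the namespace `nsx_layout` and its helpers are for illustration only. It merely composes path strings and does not touch HDF5:

```cpp
#include <string>

namespace nsx_layout {
// Group names taken from the v1.0 layout above.
inline const std::string DataCollections = "/DataCollections";
inline const std::string PeakCollections = "/PeakCollections";
inline const std::string UnitCells = "/UnitCells";
inline const std::string Dataset = "Dataset";
inline const std::string Metadata = "Metadata";

// Join a parent path and a child name with exactly one '/'.
inline std::string join(const std::string& parent, const std::string& child)
{
    if (!parent.empty() && parent.back() == '/')
        return parent + child;
    return parent + "/" + child;
}

// Counts data of a data collection: /DataCollections/<name>/Dataset
inline std::string datasetPath(const std::string& collection)
{
    return join(join(DataCollections, collection), Dataset);
}

// Metadata of a data collection: /DataCollections/<name>/Metadata
inline std::string metadataPath(const std::string& collection)
{
    return join(join(DataCollections, collection), Metadata);
}

// A peak collection: /PeakCollections/<name>
inline std::string peakCollectionPath(const std::string& name)
{
    return join(PeakCollections, name);
}

// A unit cell: /UnitCells/<name>
inline std::string unitCellPath(const std::string& name)
{
    return join(UnitCells, name);
}
} // namespace nsx_layout
```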
Removed duplicate code for exporting/importing HDF5 files to avoid inconsistencies.
Removed the legacy HDF5 reader, so that there is a single data storage format.
All keywords (including YAML and HDF5 keywords) are declared as constants in a single module (core/raw/DataKeys.h); literal strings are no longer allowed as keywords. This avoids redundant and duplicate keywords, which had caused confusion (e.g., between file_name, filename, real_path and original_filename).
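A keyword module of this kind can be sketched as a namespace of `constexpr` string constants. The namespace `nsx_keys`, its constant names and the helper `makeExampleMetadata` are hypothetical. The string values follow the renamed keywords listed further down in this MR:

```cpp
#include <map>
#include <string>
#include <string_view>

// Hypothetical sketch in the spirit of core/raw/DataKeys.h:
// every key is declared exactly once, as a constant.
namespace nsx_keys {
inline constexpr std::string_view Sources = "sources";
inline constexpr std::string_view DatasetName = "dataset-name";
inline constexpr std::string_view NumberOfFrames = "number-of-frames";
inline constexpr std::string_view PeakType = "peak_type";
} // namespace nsx_keys

// Code refers to keys only through the constants, so misspelling a key
// name (e.g. nsx_keys::Source) is a compile error instead of a silent
// mismatch between the writer and the reader.
inline std::map<std::string, std::string> makeExampleMetadata()
{
    std::map<std::string, std::string> meta;
    meta[std::string(nsx_keys::Sources)] = "scan_001.raw";  // illustrative value
    meta[std::string(nsx_keys::DatasetName)] = "scan_001";  // illustrative value
    return meta;
}
```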
Removed duplicate definitions of the HDF5 Blosc filter. A dedicated module, HDF5BloscFilter, is now used to avoid resource leaks.

TODO: Make the HDF5 Blosc filter a singleton, so that Blosc resources are acquired and released only once.
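The singleton requested in this TODO could be a Meyers singleton, in which a function-local static is initialized exactly once. This is a sketch under assumptions. The class name is hypothetical, and the Blosc calls appear only in comments (`blosc_init()`/`blosc_destroy()` come from the c-blosc API). A counter stands in for the real resource acquisition:

```cpp
#include <atomic>

// Hypothetical sketch: a process-wide object that acquires the Blosc
// resources once and releases them once at program exit.
class BloscFilterSingleton {
public:
    static BloscFilterSingleton& instance()
    {
        // Since C++11, a function-local static is initialized exactly once,
        // even if several threads call instance() concurrently.
        static BloscFilterSingleton s;
        return s;
    }

    BloscFilterSingleton(const BloscFilterSingleton&) = delete;
    BloscFilterSingleton& operator=(const BloscFilterSingleton&) = delete;

    // How often the resources were acquired (illustration/testing only).
    static int acquisitions() { return s_acquisitions.load(); }

private:
    BloscFilterSingleton()
    {
        // Placeholder: a real implementation would initialize Blosc here
        // (e.g. blosc_init()) and register the HDF5 filter.
        ++s_acquisitions;
    }

    ~BloscFilterSingleton()
    {
        // Placeholder: release the Blosc resources (e.g. blosc_destroy()).
    }

    static inline std::atomic<int> s_acquisitions{0};
};
```

Every call site then uses `BloscFilterSingleton::instance()` instead of setting up the filter itself. This replaces per-object acquisition with exactly one acquire/release pair per process.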
Added development scripts to convert data files to the new NSX format v1.0. The HDF5 files in the test folder have already been converted to v1.0.
Corrected the initialization of some classes.

TODO: Some classes (like DataSet) are 'half-initialized' and could be accessed while still in an inconsistent state. Whether they are properly finalized depends strongly on the exact order of statements (e.g., whether IDataReader::end has been called yet). This causes many crashes and should be corrected systematically.
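One way to address this TODO is to make the loading phase explicit and guard every accessor. Premature access then raises an exception instead of crashing. The class `FrameData` is a hypothetical stand-in, not OpenHKL's DataSet. Its `finalize()` plays the role of the reader's end step:

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: the half-initialized phase is explicit, and any use
// before finalize() fails loudly with std::logic_error.
class FrameData {
public:
    // Loading phase: frames may be added until finalize() is called.
    void addFrame(std::vector<int> frame)
    {
        if (m_finalized)
            throw std::logic_error("FrameData: cannot add frames after finalize()");
        m_frames.push_back(std::move(frame));
    }

    // Ends the loading phase; refuses to finalize an empty object.
    void finalize()
    {
        if (m_frames.empty())
            throw std::logic_error("FrameData: no frames loaded");
        m_finalized = true;
    }

    // Accessors are only valid once the object is consistent.
    std::size_t nFrames() const
    {
        requireFinalized();
        return m_frames.size();
    }

    const std::vector<int>& frame(std::size_t i) const
    {
        requireFinalized();
        return m_frames.at(i);
    }

private:
    void requireFinalized() const
    {
        if (!m_finalized)
            throw std::logic_error("FrameData: accessed before finalize()");
    }

    std::vector<std::vector<int>> m_frames;
    bool m_finalized = false;
};
```

A stricter alternative is a factory that returns the object only once it is complete. With that design, no half-initialized instance is ever reachable.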

Renamed variables and keywords to reflect their intent; e.g.,

original_filename => sources
group_name => dataset-name
npdone => number-of-frames
type => peak_type
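A converter for these renames can be a simple lookup table. This is a hypothetical C++ sketch, and the dev conversion scripts mentioned above may be written differently. It applies the renames to every key, which is a simplification, since `type` presumably only needs renaming in peak metadata:

```cpp
#include <map>
#include <string>

// Legacy-to-v1.0 keyword renames, as listed above.
inline const std::map<std::string, std::string>& legacyKeyRenames()
{
    static const std::map<std::string, std::string> renames{
        {"original_filename", "sources"},
        {"group_name", "dataset-name"},
        {"npdone", "number-of-frames"},
        {"type", "peak_type"},
    };
    return renames;
}

// Return the v1.0 name of a key; keys that were not renamed pass through.
inline std::string toV1Key(const std::string& legacyKey)
{
    const auto& renames = legacyKeyRenames();
    const auto it = renames.find(legacyKey);
    return it != renames.end() ? it->second : legacyKey;
}

// Rename all keys of a flat key/value map, keeping the values unchanged.
inline std::map<std::string, std::string>
convertKeys(const std::map<std::string, std::string>& legacy)
{
    std::map<std::string, std::string> out;
    for (const auto& [key, value] : legacy)
        out[toV1Key(key)] = value;
    return out;
}
```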

Updated HDF5 test files to NSX v1.0.

Edited Jul 12, 2021 by Ammar Nejati
