
Introduce the new NSX data storage format v1.0 (Backward-incompatible)

  • The structure of the NSX data storage format is simplified and improved.

    • A version string is added to distinguish different formats.
    • Meta and Info are merged into a single Metadata group. The former restriction of string metadata to 80 characters is removed.
    • The counts data are stored under the keyword Dataset to preclude possible name clashes with other group names.
    • The peak metadata use the same data structure (MetaData) as other metadata.
    • The peak metadata are stored as attributes to the PeakCollection group.
    • Group names are improved to denote their intent properly.
    Structure v1.0:
    /DataCollections
     ./<datacollection-name>
       ./Dataset
       ./Detector
       ./Sample
       ./Metadata
    ...
    
    /PeakCollections
     ./<peakcollection-name>
       ./Center
       ./Metric
       ...
       ./DatasetNames
       ./UnitCellNames
     ...
    
    /UnitCells
     ./<unitcell-name>
       ./accepted
       ./rejected      
    ...
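Since the format now carries a version string and the legacy reader is removed, a reader can reject anything that is not v1.0 up front. The following is a minimal sketch only: how the version string is stored in the file is not stated here, so the sketch assumes the reader has already obtained it as a `std::string`, and the names `kSupportedFormatVersion` and `isSupportedFormat` are hypothetical.

```cpp
#include <string>

// Hypothetical constant name; "1.0" is the format version introduced here.
const std::string kSupportedFormatVersion = "1.0";

// Assumption: the caller has already read the version string from the file.
// An empty string stands for a legacy file that carries no version string.
inline bool isSupportedFormat(const std::string& file_version)
{
    return file_version == kSupportedFormatVersion;
}
```

With such a check, a legacy file (empty version string) is refused explicitly instead of being misread as v1.0.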
  • Removed duplicate code to export/import HDF5 files in order to avoid inconsistencies.

  • Removed the legacy HDF5 reader to have a single data storage format.

  • All keywords (including YAML and HDF keywords) are declared as constants in a single module (core/raw/DataKeys.h). Literal strings are no longer allowed as keywords. This avoids the redundant and duplicate keywords that had caused confusion (e.g., the confusion between file_name, filename, real_path and original_filename).
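As an illustration of the single-module approach, the sketch below declares a few keywords once and builds group paths only from those constants. It is not an excerpt from core/raw/DataKeys.h: the namespace `keys_sketch`, the constant names, and the helper `groupPath` are assumptions; only the keyword values are taken from the v1.0 structure.

```cpp
#include <string>

// Hypothetical excerpt in the spirit of core/raw/DataKeys.h;
// the actual constant names in NSX may differ.
namespace keys_sketch {

const std::string DataCollections = "DataCollections";
const std::string PeakCollections = "PeakCollections";
const std::string UnitCells = "UnitCells";
const std::string Dataset = "Dataset";
const std::string Metadata = "Metadata";

// Compose "/<root>/<name>/<item>" from declared keywords and a user-chosen name.
inline std::string groupPath(
    const std::string& root, const std::string& name, const std::string& item)
{
    return "/" + root + "/" + name + "/" + item;
}

} // namespace keys_sketch
```

For example, `keys_sketch::groupPath(keys_sketch::DataCollections, "scan1", keys_sketch::Dataset)` yields `/DataCollections/scan1/Dataset`, where `scan1` is a made-up collection name. A misspelled keyword then fails at compile time instead of silently creating a second key.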

  • Removed duplicate instances of the HDF5 Blosc-Filter definition. A dedicated module, HDF5BloscFilter, is used to avoid resource leaks.

    TODO: Make HDF5 Blosc-Filter a singleton to acquire and release Blosc resources only once.
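One possible shape for the singleton mentioned in the TODO is a function-local static (Meyers singleton), which is constructed on first use and destroyed once at program exit. The sketch below does not call Blosc or HDF5; the counters `acquireCount` and `releaseCount` are hypothetical stand-ins for the real setup and teardown calls, and the class name `BloscFilterSingleton` is an assumption.

```cpp
namespace sketch {

// Hypothetical stand-ins for the real Blosc acquire/release calls:
// they only count how often acquisition and release happen.
inline int& acquireCount() { static int n = 0; return n; }
inline int& releaseCount() { static int n = 0; return n; }

class BloscFilterSingleton {
public:
    // Constructed on first call (thread-safe since C++11), destroyed at exit.
    static BloscFilterSingleton& instance()
    {
        static BloscFilterSingleton filter;
        return filter;
    }

    // Non-copyable, so no second owner of the resources can exist.
    BloscFilterSingleton(const BloscFilterSingleton&) = delete;
    BloscFilterSingleton& operator=(const BloscFilterSingleton&) = delete;

private:
    BloscFilterSingleton() { ++acquireCount(); }  // acquire resources once
    ~BloscFilterSingleton() { ++releaseCount(); } // release resources once
};

} // namespace sketch
```

Every caller goes through `sketch::BloscFilterSingleton::instance()`, so repeated reads and writes share one acquisition rather than each creating its own filter object.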

  • Added dev scripts to convert data files to the new NSX format v1.0. HDF5 files in the test folder are already converted to version 1.0.

  • Corrected the initialization of some classes.

    TODO: Some classes (like DataSet) are 'half-initialized' and could be accessed while still in an inconsistent state. Their finalization depends strongly on the precise order of statements (e.g., whether IDataReader::end has been called yet). This produces many obnoxious crashes and should be systematically corrected.

  • Improved variable or keyword names to denote the intent properly; e.g.,

    original_filename => sources
    group_name => dataset-name
    npdone => number-of-frames
    type => peak_type
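The renames above can be captured in a single lookup table, for instance when translating keys from older files. This is a sketch only and is not taken from the conversion scripts: the helper name `renamedKey` is hypothetical, and passing unknown keys through unchanged is an assumption.

```cpp
#include <string>
#include <unordered_map>

// Map an old key to its new name, using the renames listed in this merge request.
// Assumption: keys that are not in the table are returned unchanged.
inline std::string renamedKey(const std::string& old_key)
{
    static const std::unordered_map<std::string, std::string> renames = {
        {"original_filename", "sources"},
        {"group_name", "dataset-name"},
        {"npdone", "number-of-frames"},
        {"type", "peak_type"},
    };
    const auto it = renames.find(old_key);
    return it == renames.end() ? old_key : it->second;
}
```

Keeping the table in one place mirrors the single-module rule for keywords: a rename is recorded once and cannot drift between the reader and a converter.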
  • Updated HDF5 test files to NSX v1.0.

Edited by Ammar Nejati
