Disentangle DataSet and DataReaders

Ammar Nejati requested to merge avoidMetaDataDuplicate into develop

Previously, a DataSet could end up in an inconsistent state depending on the order of execution (e.g., the number of frames or the metadata could be invalid). To avoid such inconsistencies, IDataReader and its derived classes are disentangled from DataSet.
The MetaData is now stored only in DataSet (to avoid duplicating it in IDataReader). A DataReader accepts a pointer to a DataSet and stores the acquired data in that DataSet, which is the sole data container. The DataSet metadata are updated as soon as the required data are available. Furthermore, the Diffractometer is accessed only via DataSet (instead of via the DataReader), which gives a single access pattern and avoids interface duplication. A DataSet initializes the required data reader itself via the method addDataFile (for .nsx HDF5 files) or addRawFrame (for raw image files). After reading, DataSet::finishRead must be called to ensure correctness (similar to finishWrite).
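The ownership pattern described above can be sketched in plain Python. This is a toy model under stated assumptions, not the pynsx API: ToyDataSet, ToyReader, add_data_file, finish_read and the faked frame contents are all hypothetical stand-ins that only mirror the roles of DataSet, the data readers, addDataFile and finishRead.

```python
class ToyReader:
    """Hypothetical reader: keeps no metadata and no data of its own."""

    def __init__(self, filename, dataset):
        self.filename = filename
        self.dataset = dataset  # non-owning reference to the data container

    def read(self):
        # A real reader would parse the file; here two 2x3 frames are faked.
        fake_frames = [[[0] * 3 for _ in range(2)] for _ in range(2)]
        self.dataset.frames.extend(fake_frames)
        # Metadata is written straight into the data set, never cached here.
        self.dataset.metadata["filename"] = self.filename


class ToyDataSet:
    """Hypothetical sole container for frames and metadata."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}
        self.frames = []
        self.read_finished = False

    def add_data_file(self, filename):
        # The data set creates the reader itself; callers never hold one.
        ToyReader(filename, self).read()

    def finish_read(self):
        # Consistency check: at least one frame, all frames of equal shape.
        shapes = {(len(f), len(f[0])) for f in self.frames}
        if len(shapes) != 1:
            raise ValueError("no frames read, or frames differ in shape")
        self.read_finished = True


data = ToyDataSet("scan_1")
data.add_data_file("datafile.nsx")
data.finish_read()
print(data.metadata)
```

Because the caller only ever holds the data set, there is no separate reader object whose state could diverge from the container.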

Most importantly, move semantics are no longer exposed to the Python API. Previously, Python code such as the following led to unknown or undefined behaviour, because reader was moved from when it was passed to the DataSet constructor:

import pynsx as nsx

expr = nsx.Experiment('test', 'BioDiff2500')
diffractometer = expr.diffractometer()
reader = nsx.HDF5DataReader("datafile.nsx", diffractometer)
data = nsx.DataSet(reader)

reader.metadata()  # unknown behaviour since `reader` is moved!

NOTE: cppreference for std::move states that

Unless otherwise specified, all standard library objects that have been moved from are placed in a "valid but unspecified state".

Furthermore,

  • The assigned name of the DataSet is made explicit.
  • Avoided duplication of the number of frames, columns, and rows as internal variables.
  • Retired unused modules FakeDataReader and DataReaderFactory as well as the build_server folder (relocated to devtools/deprecated).
  • Code and comments are improved.
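As an illustration of the point about the number of frames, columns, and rows, the following minimal sketch derives those counts from the stored frames instead of keeping them as separate internal variables. ToyFrames and its property names are hypothetical and are not taken from the NSXTool code base.

```python
class ToyFrames:
    """Hypothetical frame stack whose dimensions are derived, not cached."""

    def __init__(self):
        self._frames = []

    def add_frame(self, frame):
        if not frame or not frame[0]:
            raise ValueError("a frame needs at least one row and one column")
        shape = (len(frame), len(frame[0]))
        # Reject frames whose shape differs from the frames already stored.
        if self._frames and shape != (self.n_rows, self.n_cols):
            raise ValueError("frame shape does not match the existing frames")
        self._frames.append(frame)

    @property
    def n_frames(self):
        return len(self._frames)

    @property
    def n_rows(self):
        return len(self._frames[0]) if self._frames else 0

    @property
    def n_cols(self):
        return len(self._frames[0][0]) if self._frames else 0


stack = ToyFrames()
stack.add_frame([[1, 2, 3], [4, 5, 6]])
stack.add_frame([[7, 8, 9], [0, 1, 2]])
print(stack.n_frames, stack.n_rows, stack.n_cols)
```

With no stored counters, the reported dimensions cannot drift away from the data that is actually held.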

Related to issue #262
