labcore.data.datadict_storage#

plottr.data.datadict_storage

Provides file-storage tools for the DataDict class.

Note

Any function in this module that interacts with a ddh5 file creates a lock file while it is using the file. The lock file is named ~<file_name>.lock. The lock file is deleted even if the program crashes. If the process is stopped abruptly, however, there is no guarantee that the lock file will be deleted.
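The naming scheme and the cleanup caveat can be illustrated with a minimal sketch. This is not the module's implementation: `lock_path_for` and `hold_lock` are hypothetical helpers, and both the use of the file stem for <file_name> and the placement of the lock next to the data file are assumptions.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


def lock_path_for(path: Path) -> Path:
    # Build "~<file_name>.lock". Using the stem and the same folder as the
    # data file are assumptions made for this sketch.
    return path.parent / f"~{path.stem}.lock"


@contextmanager
def hold_lock(path: Path):
    lock = lock_path_for(path)
    lock.touch(exist_ok=False)  # raises FileExistsError if a lock already exists
    try:
        yield lock
    finally:
        # Runs on normal exit and on Python exceptions, but not when the
        # process is killed outright, so a stale lock file can remain.
        lock.unlink()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.ddh5"
    with hold_lock(data_file) as lock:
        lock_name = lock.name
        locked_during = lock.exists()
    locked_after = lock.exists()
```

If a stale lock file is left behind after a hard stop, it has to be removed by hand before the file can be opened again under this scheme.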

Functions

`add_cur_time_attr`(h5obj[, name, prefix, suffix])	Add current time information to the given HDF5 object, following the format of: `<prefix><name>_time_sec<suffix>`.
`all_datadicts_from_hdf5`(path[, file_timeout])	Loads all the DataDicts contained in a single HDF5 file.
`data_info`(folder[, fn, do_print])
`datadict_from_hdf5`(path[, groupname, ...])	Load a DataDict from file.
`datadict_to_hdf5`(datadict, path[, ...])	Write a DataDict to DDH5
`deh5ify`(obj)	Convert slightly mangled types back to more handy ones.
`find_data`(root[, newer_than, older_than, ...])
`h5ify`(obj)	Convert an object into something that we can assign to an HDF5 attribute.
`init_file`(f[, groupname])
`load_as_df`(folder[, fn])
`load_as_xr`(folder[, fn, fields])	Load ddh5 data as xarray (only for gridable data).
`most_recent_data_path`(root[, older_than, ...])
`reconstruct_safe_write_data`(path[, ...])	Creates a new DataDict from the data saved in the .tmp folder.
`set_attr`(h5obj, name, val)	Set attribute name of object h5obj to val
`timestamp_from_path`(p)	Return a datetime timestamp from a standard-formatted path.

Classes

`AppendMode`(value)	How/Whether to append data to existing data.
`DDH5Writer`(datadict[, basedir, groupname, ...])	Context manager for writing data to DDH5.
`FileOpener`(path[, mode, timeout, test_delay])	Context manager for opening files; creates its own file lock to indicate to other programs that the file is being used.
`NumpyEncoder`(*[, skipkeys, ensure_ascii, ...])

class labcore.data.datadict_storage.AppendMode(value)[source]#

Bases: Enum

How/Whether to append data to existing data.

all = 1#: All data is appended to existing data.

new = 0#: Data that is additional compared to already existing data is appended.

none = 2#: Data is overwritten.

class labcore.data.datadict_storage.DDH5Writer(datadict: DataDict, basedir: str | Path = '.', groupname: str = 'data', name: str | None = None, filename: str = 'data', filepath: str | Path | None = None, file_timeout: float | None = None, safe_write_mode: bool | None = False)[source]#

Bases: object

Context manager for writing data to DDH5. Based on typical needs in taking data in an experimental physics lab.

Creates lock file when writing data.

Can be used in safe_write_mode to make sure the experiment and data are saved even if the ddh5 file is being used by other programs. In this mode, the data is saved in individual files in a .tmp folder. When the experiment is finished, the data is unified and saved in the original file. If the data is reconstructed correctly, the .tmp folder is deleted. If not, you can use the function reconstruct_safe_write_data() to reconstruct the data.

Parameters:

basedir – The root directory in which data is stored. create_file_structure() creates the structure inside this root and determines the file name of the data. The default structure implemented here is <root>/YYYY-MM-DD/YYYY-mm-ddTHHMMSS_<ID>-<name>/<filename>.ddh5, where <ID> is a short identifier string and <name> is the value of parameter name. To change this, re-implement data_folder() and/or create_file_structure().
datadict – Initial data object. Must contain at least the structure of the data to be able to use add_data() to add data.
groupname – Name of the top-level group in the file container. An existing group of that name will be deleted.
name – Name of this dataset. Used in path/file creation and added as meta data.
filename – Filename to use, without the extension. Defaults to ‘data’, which gives ‘data.ddh5’.
file_timeout – How long to wait for the ddh5 file to unlock. If None, the default value from FileOpener is used.
safe_write_mode – If True, the data is saved in safe writing mode. Defaults to False.

add_data(**kwargs: Any) → None[source]#

Add data to the file (and the internal DataDict).

Requires one keyword argument per data field in the DataDict, with the key being the field name and the value the data to add. All added data must have the same number of ‘rows’, i.e., the outermost dimension has to match for data to be inserted faithfully. If some data is scalar and other data is not, reshape the scalar data to (1, ) and the other data to (1, …); in other words, add an outer dimension of length 1 to all of it.
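The row-count rule can be sketched with plain Python lists standing in for arrays. `check_rows` is a hypothetical helper and not part of DDH5Writer; it only illustrates that every field must share the same outermost length.

```python
from typing import Any, Dict, Sequence


def check_rows(**fields: Sequence[Any]) -> int:
    """Return the common number of rows, or raise if the fields disagree."""
    lengths: Dict[str, int] = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"row counts differ: {lengths}")
    return next(iter(lengths.values()))


# One record: a scalar 'x' and a 3-point trace 'y'.
# Both carry an outer dimension of length 1: shapes (1,) and (1, 3).
n_single = check_rows(x=[0.5], y=[[0.1, 0.2, 0.3]])

# Two records at once: shapes (2,) and (2, 3).
n_double = check_rows(x=[0.5, 1.0], y=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# Passing the trace without the outer dimension would be read as 3 rows.
try:
    check_rows(x=[0.5], y=[0.1, 0.2, 0.3])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The last case shows why the extra outer dimension matters: without it, a single trace is indistinguishable from several scalar rows.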

add_tag(tags: str | Collection[str]) → None[source]#

backup_file(paths: str | Collection[str]) → None[source]#

data_file_path() → Path[source]#

Determine the filepath of the data file.

Returns:

The filepath of the data file.

data_folder() → Path[source]#

Return the folder, relative to the data root path, in which data will be saved.

Default format: <basedir>/YYYY-MM-DD/YYYY-mm-ddTHHMMSS_<ID>-<name>. In this implementation we use the first 8 characters of a UUID as ID.

Returns:

The folder path.
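A minimal sketch of how a folder name in the default format could be assembled follows. `default_data_folder` is a hypothetical function, not the method itself; the UUID version and the handling of a missing name are assumptions.

```python
import uuid
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_data_folder(basedir: str, name: Optional[str], now: datetime) -> Path:
    """Build <basedir>/YYYY-MM-DD/YYYY-mm-ddTHHMMSS_<ID>-<name>."""
    short_id = str(uuid.uuid4())[:8]  # first 8 characters; UUID version is assumed
    day = now.strftime("%Y-%m-%d")
    stamp = now.strftime("%Y-%m-%dT%H%M%S")
    folder_name = f"{stamp}_{short_id}"
    if name:  # dropping the suffix when no name is given is an assumption
        folder_name += f"-{name}"
    return Path(basedir) / day / folder_name


folder = default_data_folder("data", "rabi", datetime(2024, 1, 31, 14, 22, 33))
```

Because the identifier is random, two datasets started within the same second still end up in different folders.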

n_files_per_dir = 1000#

n_files_per_reconstruction = 1000#

n_seconds_per_reconstruction = 10#

save_dict(name: str, d: dict) → None[source]#

save_text(name: str, text: str) → None[source]#

class labcore.data.datadict_storage.FileOpener(path: Path | str, mode: str = 'r', timeout: float | None = None, test_delay: float = 0.1)[source]#

Bases: object

Context manager for opening files. It creates its own file lock to indicate to other programs that the file is being used. The lock file is named “~<file_name>.lock”.

Parameters:

path – The file path.
mode – The file opening mode. Only the following modes are supported: ‘r’, ‘w’, ‘w-’, ‘a’. Defaults to ‘r’.
timeout – Time, in seconds, that the context manager waits for the file to unlock. If None, defaults to 30.
test_delay – Length of time between checks, i.e. how long the FileOpener waits before checking again whether the file has been unlocked.
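How timeout and test_delay interact can be sketched as a simple polling loop. `wait_until_unlocked` is a hypothetical helper, not a FileOpener method, and returning False on timeout is a choice made for this sketch only.

```python
import tempfile
import time
from pathlib import Path


def wait_until_unlocked(lock_path: Path, timeout: float = 30.0, test_delay: float = 0.1) -> bool:
    """Poll every test_delay seconds until lock_path is gone or timeout elapses."""
    deadline = time.monotonic() + timeout
    while lock_path.exists():
        if time.monotonic() >= deadline:
            return False  # what FileOpener does on timeout is not covered here
        time.sleep(test_delay)
    return True


with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "~data.lock"
    free_result = wait_until_unlocked(lock, timeout=0.5, test_delay=0.05)

    lock.touch()  # simulate another program holding the file
    busy_result = wait_until_unlocked(lock, timeout=0.2, test_delay=0.05)
    lock.unlink()
```

A shorter test_delay notices a released lock sooner at the cost of more file system checks.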

open_when_unlocked() → File[source]#

class labcore.data.datadict_storage.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]#

Bases: JSONEncoder

default(obj)[source]#

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
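The description above is inherited from json.JSONEncoder and does not say which types NumpyEncoder handles. As a sketch of the general pattern only, the hypothetical `ArrayLikeEncoder` below assumes that array-like objects are converted through their `tolist()` method; the standard-library `array.array` stands in for a numpy array.

```python
import json
from array import array
from typing import Any


class ArrayLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize anything exposing tolist() as a list."""

    def default(self, o: Any) -> Any:
        if hasattr(o, "tolist"):
            return o.tolist()
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


encoded = json.dumps({"sweep": array("d", [0.0, 0.5, 1.0])}, cls=ArrayLikeEncoder)
decoded = json.loads(encoded)
```

An encoder class is passed to `json.dumps` through the `cls` argument, as in the last lines of the sketch.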

labcore.data.datadict_storage.add_cur_time_attr(h5obj: Any, name: str = 'creation', prefix: str = '__', suffix: str = '__') → None[source]#

Add current time information to the given HDF5 object, following the format of: <prefix><name>_time_sec<suffix>.

Parameters:

h5obj – The HDF5 object.
name – The name of the attribute.
prefix – Prefix of the attribute.
suffix – Suffix of the attribute.
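With the default arguments, the attribute name pattern resolves to `__creation_time_sec__`. The sketch below uses a plain dict in place of the HDF5 attributes; `time_attr_name` is a hypothetical helper, and storing seconds since the epoch as the value is an assumption.

```python
import time
from typing import Dict


def time_attr_name(name: str = "creation", prefix: str = "__", suffix: str = "__") -> str:
    # Follows the documented pattern <prefix><name>_time_sec<suffix>.
    return f"{prefix}{name}_time_sec{suffix}"


attrs: Dict[str, float] = {}            # stand-in for the attributes of h5obj
attrs[time_attr_name()] = time.time()   # epoch seconds as the value is assumed
custom_key = time_attr_name("last_change", prefix="", suffix="")
```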

labcore.data.datadict_storage.all_datadicts_from_hdf5(path: str | Path, file_timeout: float | None = None, **kwargs: Any) → Dict[str, Any][source]#

Loads all the DataDicts contained in a single HDF5 file. Returns a dictionary with the group names as keys and the corresponding DataDicts as values.

Parameters:

path – The path of the HDF5 file.
file_timeout – How long the function will wait for the ddh5 file to unlock. If None, the default value from FileOpener is used.

Returns:

Dictionary with group names as key, and the DataDicts inside them as values.

labcore.data.datadict_storage.data_info(folder: str, fn: str = 'data.ddh5', do_print: bool = True)[source]#

labcore.data.datadict_storage.datadict_from_hdf5(path: str | Path, groupname: str = 'data', startidx: int | None = None, stopidx: int | None = None, structure_only: bool = False, ignore_unequal_lengths: bool = True, file_timeout: float | None = None) → DataDict[source]#

Load a DataDict from file.

Parameters:

path – Full filepath without the file extension.
groupname – Name of hdf5 group.
startidx – Start row.
stopidx – End row + 1.
structure_only – If True, don’t load the data values.
ignore_unequal_lengths – If True, don’t fail when the rows have unequal length; will return the longest consistent DataDict possible.
file_timeout – How long the function will wait for the ddh5 file to unlock. If None, the default value from FileOpener is used.

Returns:

Validated DataDict.
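The row selection can be pictured with plain lists. The sketch below is not the loader: `select_rows` is hypothetical, and reading "longest consistent" as "the rows present in every field" is an assumption.

```python
from typing import Dict, List, Optional


def select_rows(
    fields: Dict[str, List[float]],
    startidx: Optional[int] = None,
    stopidx: Optional[int] = None,
    ignore_unequal_lengths: bool = True,
) -> Dict[str, List[float]]:
    lengths = {len(values) for values in fields.values()}
    if len(lengths) > 1 and not ignore_unequal_lengths:
        raise ValueError("fields have unequal numbers of rows")
    # Longest range present in every field (assumed meaning of "consistent").
    common = min(lengths) if lengths else 0
    stop = common if stopidx is None else min(stopidx, common)
    # stopidx is "end row + 1", i.e. an exclusive bound like a Python slice.
    return {name: values[startidx:stop] for name, values in fields.items()}


stored = {"x": [0, 1, 2, 3, 4], "y": [10, 11, 12, 13]}  # 'y' is one row short
rows_1_to_2 = select_rows(stored, startidx=1, stopidx=3)
all_consistent = select_rows(stored)
```

Unequal lengths typically arise when a file is read while another process is still writing a row, which is why tolerating them is the default.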

labcore.data.datadict_storage.datadict_to_hdf5(datadict: DataDict, path: str | Path, groupname: str = 'data', append_mode: AppendMode = AppendMode.new, file_timeout: float | None = None) → None[source]#

Write a DataDict to DDH5.

Note: Meta data is only written during initial writing of the dataset. If we’re appending to existing datasets, we’re not setting meta data anymore.

Parameters:

datadict – Datadict to write to disk.
path – Path of the file (extension may be omitted).
groupname – Name of the top level group to store the data in.
append_mode –
- AppendMode.none : Delete and re-create group.
- AppendMode.new : Append rows in the datadict that exceed the number of existing rows in the dataset already stored. Note: we’re not checking for content, only length!
- AppendMode.all : Append all data in datadict to file data sets.
file_timeout – How long the function will wait for the ddh5 file to unlock. Only relevant if you are writing to a file that already exists and some other program is trying to read it at the same time. If None, the default value from FileOpener is used.
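The three append modes can be compared on a single list of rows. This is a sketch of the documented semantics only: `Mode` is a local stand-in that mirrors the AppendMode members, and `write_rows` is a hypothetical function that does not touch any file.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    """Local stand-in mirroring the documented AppendMode members."""
    new = 0
    all = 1
    none = 2


def write_rows(existing: List[int], incoming: List[int], mode: Mode) -> List[int]:
    if mode is Mode.none:
        return list(incoming)           # group is deleted and re-created
    if mode is Mode.all:
        return existing + incoming      # everything is appended
    # Mode.new: only rows beyond the stored length; content is not compared.
    return existing + incoming[len(existing):]


on_disk = [1, 2, 3]
in_memory = [9, 9, 9, 4, 5]  # first three rows differ from what is stored
after_new = write_rows(on_disk, in_memory, Mode.new)
after_all = write_rows(on_disk, in_memory, Mode.all)
after_none = write_rows(on_disk, in_memory, Mode.none)
```

In the `new` case the first three in-memory rows are skipped purely because of their position, which is the length-only check the note on AppendMode.new warns about.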

labcore.data.datadict_storage.deh5ify(obj: Any) → Any[source]#

Convert slightly mangled types back to more handy ones.

Parameters:

obj – Input object.

Returns:

Object.

labcore.data.datadict_storage.find_data(root, newer_than: datetime | None = None, older_than: datetime | None = None, folder_filter: str | None = None) → List[Path][source]#

labcore.data.datadict_storage.h5ify(obj: Any) → Any[source]#

Convert an object into something that we can assign to an HDF5 attribute.

Performs the following conversions:
- list/array of strings -> numpy chararray of unicode type

Parameters:

obj – Input object.

Returns:

Object, converted if necessary.

labcore.data.datadict_storage.init_file(f: File, groupname: str = 'data') → None[source]#

labcore.data.datadict_storage.load_as_df(folder, fn='data.ddh5')[source]#

labcore.data.datadict_storage.load_as_xr(folder: Path, fn='data.ddh5', fields: List[str] | None = None) → Dataset[source]#

Load ddh5 data as xarray (only for gridable data).

Parameters:

folder – Data folder.
fn (str, optional) – Filename, by default ‘data.ddh5’.

Returns:

The loaded data as an xarray Dataset.

Return type:

Dataset

labcore.data.datadict_storage.most_recent_data_path(root, older_than: datetime | None = None, folder_filter: str | None = None) → Path[source]#

labcore.data.datadict_storage.reconstruct_safe_write_data(path: str | Path, unification_from_scratch: bool = True, file_timeout: float | None = None) → DataDictBase[source]#

Creates a new DataDict from the data saved in the .tmp folder. This is used when the data is saved in the safe writing mode. The data is saved in individual files in the .tmp folder. This function reconstructs the data from these files and returns a DataDict with the data.

Parameters:

path – The path to the folder containing the .tmp folder.
unification_from_scratch – If True, will reconstruct the data from scratch. If False, will try to load the data from the last reconstructed file.
file_timeout – How long the function will wait for the ddh5 file to unlock. If None, the default value from FileOpener is used.

labcore.data.datadict_storage.set_attr(h5obj: Any, name: str, val: Any) → None[source]#

Set attribute name of object h5obj to val

Use h5ify() to convert the object, then try to set the attribute to the returned value. If that does not succeed due to a HDF5 typing restriction, set the attribute to the string representation of the value.
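The try-then-fall-back behaviour can be sketched without HDF5. `StrictAttrs` is a hypothetical stand-in for an attribute store with type restrictions, `set_attr_with_fallback` is not the module's function, and the use of TypeError as the failure signal is an assumption.

```python
from typing import Any


class StrictAttrs(dict):
    """Stand-in for HDF5 attributes that accepts only a few simple types."""

    def __setitem__(self, key: str, value: Any) -> None:
        if not isinstance(value, (int, float, str, bool)):
            raise TypeError(f"cannot store {type(value).__name__}")
        super().__setitem__(key, value)


def set_attr_with_fallback(attrs: StrictAttrs, name: str, val: Any) -> None:
    try:
        attrs[name] = val        # the real function converts with h5ify() first
    except TypeError:
        attrs[name] = str(val)   # fall back to the string representation


attrs = StrictAttrs()
set_attr_with_fallback(attrs, "n_points", 101)
set_attr_with_fallback(attrs, "settings", {"gain": 2})
```

The fallback keeps the write from failing, but a value stored as a string has to be parsed by the reader if the original type is needed.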

labcore.data.datadict_storage.timestamp_from_path(p: Path) → datetime[source]#

Return a datetime timestamp from a standard-formatted path. Assumes that the path stem has a timestamp that begins in ISO-like format YYYY-mm-ddTHHMMSS.
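Parsing a leading timestamp of the form YYYY-mm-ddTHHMMSS from a path stem could look like the sketch below. `parse_timestamp` is a hypothetical equivalent written with the standard library, not the function's actual code, and the example path is made up.

```python
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%Y-%m-%dT%H%M%S"
STAMP_LENGTH = len("YYYY-mm-ddTHHMMSS")  # 17 characters


def parse_timestamp(p: Path) -> datetime:
    # Only the leading timestamp is parsed; a trailing "_<ID>-<name>" is ignored.
    return datetime.strptime(p.stem[:STAMP_LENGTH], STAMP_FORMAT)


stamp = parse_timestamp(Path("data/2024-01-31/2024-01-31T142233_1a2b3c4d-rabi"))
```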
