labcore.data.datadict#

datadict.py :

Data classes we use throughout the plottr package, and tools to work on them.

Module Attributes

str2dd(description)

shortcut to datastructure_from_string().

Functions

combine_datadicts(*dicts)

Try to make one datadict out of multiple.

datadict_to_meshgrid(data[, target_shape, ...])

Try to make a meshgrid from a dataset.

datasets_are_equal(a, b[, ignore_meta])

Check whether two datasets are equal.

datastructure_from_string(description)

Construct a DataDict from a string description.

dd2df(dd)

make a pandas Dataframe from a datadict.

dd2xr(dd)

makes an xarray Dataset from a MeshgridDataDict.

guess_shape_from_datadict(data)

Try to guess the shape of the datadict dependents from the axes values.

is_meta_key(key)

Checks if key is meta information.

meshgrid_to_datadict(data)

Make a DataDict from a MeshgridDataDict by reshaping the data.

meta_key_to_name(key)

Converts a meta data key to just the name.

meta_name_to_key(name)

Converts name into a meta data key.

str2dd(description)

shortcut to datastructure_from_string().

Classes

DataDict(**kw)

The most basic implementation of the DataDict class.

DataDictBase(**kw)

Simple data storage class that is based on a regular dictionary.

MeshgridDataDict(**kw)

Implementation of DataDictBase meant to be used for when the axes form a grid on which the dependent values reside.

Exceptions

GriddingError

Raised when data cannot be gridded (subclass of ValueError).

class labcore.data.datadict.DataDict(**kw: Any)[source]#

Bases: DataDictBase

The most basic implementation of the DataDict class.

It only enforces that the number of records per data field must be equal for all fields. This refers to the most outer dimension in case of nested arrays.

The class further implements simple appending of datadicts through the DataDict.append method, as well as allowing addition of DataDict instances.
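The layout this implies can be pictured with a plain dictionary (an illustrative sketch only, not the actual class; the field names, units, and values here are made up):

```python
# Illustrative only: a plain dict mimicking the layout a DataDict holds.
# Field names ("x", "signal"), units, and values are invented for this sketch.
dd = {
    "x": {"values": [0, 1, 2], "axes": [], "unit": "mV", "label": ""},
    "signal": {"values": [0.1, 0.4, 0.9], "axes": ["x"], "unit": "nA", "label": ""},
    "__title__": "meta keys are wrapped in double underscores",
}

# The DataDict invariant: every data field has the same number of records
# along the outermost dimension.
n_records = {k: len(v["values"]) for k, v in dd.items() if not k.startswith("__")}
assert len(set(n_records.values())) == 1
```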

add_data(**kw: Any) None[source]#

Add data to all values. The new data must be valid in itself.

This method is useful to easily add data without needing to specify meta data or dependencies, etc.

Parameters:

kw – one array per data field (none can be omitted).

append(newdata: DataDict) None[source]#

Append a datadict to this one by appending data values.

Parameters:

newdata – DataDict to append.

Raises:

ValueError, if the structures are incompatible.

expand() DataDict[source]#

Expand nested values in the data fields.

Flattens all value arrays. If nested dimensions are present, all data with non-nested dims will be repeated accordingly – each record is repeated to match the size of the nested dims.

Returns:

The flattened dataset.

Raises:

ValueError if data is not expandable.
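The repetition rule described above can be sketched in plain Python (a hypothetical helper, not the actual DataDict.expand implementation):

```python
# Minimal sketch of the expansion logic: records of a non-nested field are
# repeated to match the size of the nested records they belong to.
def expand_records(flat, nested):
    """flat: one scalar per record; nested: one inner list per record."""
    out_flat, out_nested = [], []
    for scalar, inner in zip(flat, nested):
        out_flat.extend([scalar] * len(inner))  # repeat the non-nested record
        out_nested.extend(inner)                # flatten the nested record
    return out_flat, out_nested

x, y = expand_records([10, 20], [[1, 2, 3], [4, 5, 6]])
# x == [10, 10, 10, 20, 20, 20]; y == [1, 2, 3, 4, 5, 6]
```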

is_expandable() bool[source]#

Determine if the DataDict can be expanded.

Expansion flattens all nested data values to a 1D array. For doing so, we require that all data fields that have nested/inner dimensions (i.e., inside the records level) share the inner shape. In other words, all data fields must be of shape (N,) or (N, (shape)), where shape is common to all fields whose shape is not equal to (N,).

Returns:

True if expandable. False otherwise.

is_expanded() bool[source]#

Determine if the DataDict is expanded.

Returns:

True if expanded. False if not.

nrecords() int | None[source]#

Gets the number of records in the dataset.

Returns:

The number of records in the dataset.

remove_invalid_entries() DataDict[source]#

Remove all rows that are None or np.nan in all dependents.

Returns:

The cleaned DataDict.

sanitize() DataDict[source]#

Clean-up.

Beyond the tasks of the base class DataDictBase:
  • remove invalid entries as far as reasonable.

Returns:

sanitized DataDict.

validate() bool[source]#

Check dataset validity.

Beyond the checks performed in the base class DataDictBase, check whether the number of records is the same for all data fields.

Returns:

True if valid.

Raises:

ValueError if invalid.

class labcore.data.datadict.DataDictBase(**kw: Any)[source]#

Bases: dict

Simple data storage class that is based on a regular dictionary.

This base class does not make assumptions about the structure of the values. This is implemented in inheriting classes.

add_meta(key: str, value: Any, data: str | None = None) None[source]#

Add meta info to the dataset.

If the key already exists, meta info will be overwritten.

Parameters:
  • key – Name of the meta field (without underscores).

  • value – Value of the meta information.

  • data – If None, meta will be global; otherwise assigned to data field data.

astype(dtype: dtype) T[source]#

Convert all data values to given dtype.

Parameters:

dtype – np dtype.

Returns:

Dataset, with values converted to the given dtype (not a copy).

axes(data: Sequence[str] | str | None = None) List[str][source]#

Return a list of axes.

Parameters:

data – if None, return all axes present in the dataset, otherwise only the axes of the dependent data.

Returns:

The list of axes.

axes_are_compatible() bool[source]#

Check if all dependent data fields have the same axes.

This includes axes order.

Returns:

True or False.

clear_meta(data: str | None = None) None[source]#

Deletes all meta data.

Parameters:

data – If not None, delete all meta only from specified data field data. Else, deletes all top-level meta, as well as meta for all data fields.

copy() T[source]#

Make a copy of the dataset.

Returns:

A copy of the dataset.

data_items() Iterator[Tuple[str, Dict[str, Any]]][source]#

Generator for data field items.

Like dict.items(), but ignores meta data.

Returns:

Generator yielding first the key of the data field and second its value.

data_vals(key: str) ndarray[source]#

Return the data values of field key.

Equivalent to DataDict['key'].values.

Parameters:

key – Name of the data field.

Returns:

Values of the data field.

delete_meta(key: str, data: str | None = None) None[source]#

Deletes specific meta data.

Parameters:
  • key – Name of the meta field to remove.

  • data – If None, this affects global meta; otherwise remove from data field data.

dependents() List[str][source]#

Get all dependents in the dataset.

Returns:

A list of the names of dependents.

extract(data: List[str], include_meta: bool = True, copy: bool = True, sanitize: bool = True) T[source]#

Extract data from a dataset.

Return a new datadict with all fields specified in data included. Will also take any axes fields along that have not been explicitly specified. Will return empty if data consists of only axes fields.

Parameters:
  • data – Data field or list of data fields to be extracted.

  • include_meta – If True, include the global meta data. data meta will always be included.

  • copy – If True, data fields will be deep copies of the original.

  • sanitize – If True, will run DataDictBase.sanitize before returning.

Returns:

New DataDictBase containing only requested fields.

has_meta(key: str) bool[source]#

Check whether meta field exists in the dataset.

Returns:

True if it exists, False if it doesn’t.

label(name: str) str | None[source]#

Get the label for a data field. If no label is present returns the name of the data field as the label. If a unit is present, it will be appended at the end in brackets: “label (unit)”.

Parameters:

name – Name of the data field.

Returns:

Labelled name.

mask_invalid() T[source]#

Mask all invalid data in all values.

Returns:

Copy of the dataset with invalid entries (nan/None) masked.

meta_items(data: str | None = None, clean_keys: bool = True) Iterator[Tuple[str, Dict[str, Any]]][source]#

Generator for meta items.

Like dict.items(), but yields only meta entries. The keys returned do not contain the underscores used internally.

Parameters:
  • data – If None iterate over global meta data. If it’s the name of a data field, iterate over the meta information of that field.

  • clean_keys – If True, remove the underscore pre/suffix.

Returns:

Generator yielding first the key of the meta field and second its value.

meta_val(key: str, data: str | None = None) Any[source]#

Return the value of meta field key (given without underscore).

Parameters:
  • key – Name of the meta field.

  • dataNone for global meta; name of data field for data meta.

Returns:

The value of the meta information.

nbytes(name: str | None = None) int | None[source]#

Get the size of data.

Parameters:

name – Name of the data field. If None, return the size of the entire datadict.

Returns:

size in bytes.

remove_unused_axes() T[source]#

Removes axes not associated with dependents.

Returns:

Cleaned dataset.

reorder_axes(data_names: Sequence[str] | str | None = None, **pos: int) T[source]#

Reorder data axes.

Parameters:
  • data_names – Data name(s) for which to reorder the axes. If None, apply to all dependents.

  • pos – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.

Returns:

Dataset with re-ordered axes (not a copy)

reorder_axes_indices(name: str, **pos: int) Tuple[Tuple[int, ...], List[str]][source]#

Get the indices that can reorder axes in a given way.

Parameters:
  • name – Name of the data field of which we want to reorder axes.

  • pos – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.

Returns:

The tuple of new indices, and the list of axes names in the new order.
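The index computation can be sketched as follows (a hypothetical pure-Python helper mirroring the axis_name = new_position call style; not the actual implementation):

```python
# Sketch of computing a reordering permutation: explicitly positioned axes
# are pinned, remaining axes keep their relative order.
def reorder_indices(axes, **pos):
    """Return (new_index_tuple, axes_in_new_order)."""
    new_order = [None] * len(axes)
    for name, p in pos.items():
        new_order[p] = name                       # pin explicitly placed axes
    rest = iter(a for a in axes if a not in pos)  # others keep their order
    new_order = [a if a is not None else next(rest) for a in new_order]
    return tuple(axes.index(a) for a in new_order), new_order

idx, order = reorder_indices(["x", "y", "z"], z=0)
# idx == (2, 0, 1); order == ["z", "x", "y"]
```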

static same_structure(*data: T, check_shape: bool = False) bool[source]#

Check if all supplied DataDicts share the same data structure (i.e., dependents and axes).

Ignores meta data and values. Checks also for matching shapes if check_shape is True.

Parameters:
  • data – The data sets to compare.

  • check_shape – Whether to include shape check in the comparison.

Returns:

True if the structure matches for all, else False.

sanitize() T[source]#

Clean-up tasks:
  • Removes unused axes.

Returns:

Sanitized dataset.

set_meta(key: str, value: Any, data: str | None = None) None#

Add meta info to the dataset.

If the key already exists, meta info will be overwritten.

Parameters:
  • key – Name of the meta field (without underscores).

  • value – Value of the meta information.

  • data – If None, meta will be global; otherwise assigned to data field data.

shapes() Dict[str, Tuple[int, ...]][source]#

Get the shapes of all data fields.

Returns:

A dictionary of the form {key : shape}, where shape is the np.shape-tuple of the data with name key.

structure(add_shape: bool = False, include_meta: bool = True, same_type: bool = False, remove_data: List[str] | None = None) T | None[source]#

Get the structure of the DataDict.

Return the datadict without values (value omitted in the dict).

Parameters:
  • add_shape – Deprecated – ignored.

  • include_meta – If True, include the meta information in the returned dict.

  • same_type – If True, return type will be the one of the object this is called on. Else, DataDictBase.

  • remove_data – any data fields listed will be removed from the result, also when listed in any axes.

Returns:

The DataDict containing the structure only. The exact type is the same as the type of self.

static to_records(**data: Any) Dict[str, ndarray][source]#

Convert data to records that can be added to the DataDict. All data is converted to np.array, and reshaped such that the first dimension of all resulting arrays have the same length (chosen to be the smallest possible number that does not alter any shapes beyond adding a length-1 dimension as first dimension, if necessary).

If a data field is given as None, it will be converted to numpy.array([numpy.nan]).

Parameters:

data – One keyword argument per data field, with the data as value.

Returns:

Dictionary with properly shaped data.
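The shaping rule can be sketched in plain Python (the real method works on numpy arrays; this hypothetical helper only illustrates the None handling and the length-1 first dimension):

```python
# Sketch only: every input becomes a single record, so all fields end up
# with a first dimension of the same (here, minimal) length.
def to_records_sketch(**data):
    records = {}
    for name, value in data.items():
        if value is None:
            value = float("nan")   # None becomes a single-NaN record
        records[name] = [value]    # length-1 first dimension for everything
    return records

recs = to_records_sketch(x=1, y=[1, 2, 3], z=None)
# recs["x"] == [1]; recs["y"] == [[1, 2, 3]]; recs["z"] holds one NaN
```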

validate() bool[source]#

Check the validity of the dataset.

Checks performed:
  • All axes specified with dependents must exist as data fields.

Other tasks performed:
  • unit keys are created if omitted.

  • label keys are created if omitted.

  • shape meta information is updated with the correct values (only if present already).

Returns:

True if valid, False if invalid.

Raises:

ValueError if invalid.

exception labcore.data.datadict.GriddingError[source]#

Bases: ValueError

class labcore.data.datadict.MeshgridDataDict(**kw: Any)[source]#

Bases: DataDictBase

Implementation of DataDictBase meant to be used for when the axes form a grid on which the dependent values reside.

It enforces that all dependents have the same axes and all shapes need to be identical.

mean(axis: str) MeshgridDataDict[source]#

Take the mean over the given axis.

Parameters:

axis – which axis to take the average over.

Returns:

data, averaged over axis.

reorder_axes(data_names: Sequence[str] | str | None = None, **pos: int) MeshgridDataDict[source]#

Reorder the axes for all data.

This includes transposing the data, since we’re on a grid.

Parameters:
  • data_names – Which dependents to include. If None, all dependents are included.

  • pos – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.

Returns:

Dataset with re-ordered axes.

shape() None | Tuple[int, ...][source]#

Return the shape of the meshgrid.

Returns:

The shape as tuple. None if no data in the set.

slice(**kwargs: Dict[str, slice | int]) MeshgridDataDict[source]#

Return a N-d slice of the data.

Parameters:

kwargs – slicing information in the format axis: spec, where spec can be a slice object, or an integer (usual slicing notation).

Returns:

sliced data (as a copy)

squeeze() None[source]#

Remove size-1 dimensions.

validate() bool[source]#

Validation of the dataset.

Performs the following checks:
  • All dependents must have the same axes.

  • All shapes need to be identical.

Returns:

True if valid.

Raises:

ValueError if invalid.

labcore.data.datadict.combine_datadicts(*dicts: DataDict) DataDictBase | DataDict[source]#

Try to make one datadict out of multiple.

Basic rules:

  • We try to maintain the input type.

  • Return type is ‘downgraded’ to DataDictBase if the contents are not compatible (i.e., different numbers of records in the inputs).

Returns:

Combined data.

labcore.data.datadict.datadict_to_meshgrid(data: DataDict, target_shape: Tuple[int, ...] | None = None, inner_axis_order: None | Sequence[str] = None, use_existing_shape: bool = False, copy: bool = True) MeshgridDataDict[source]#

Try to make a meshgrid from a dataset.

Parameters:
  • data – Input DataDict.

  • target_shape – Target shape. If None we use guess_shape_from_datadict to infer.

  • inner_axis_order

    If axes of the datadict are not specified in the ‘C’ order (1st the slowest, last the fastest axis) then the ‘true’ inner order can be specified as a list of axes names, which has to match the specified axes in all but order. The data is then transposed to conform to the specified order.

    Note

If this is given, then target_shape needs to be given in the order of this inner_axis_order. The output data will keep the axis ordering specified in the axes property.

  • use_existing_shape – If True, simply use the shape that the data already has. For numpy-array data, this might already be present. If False, flatten and reshape.

  • copy – If True, make a copy of the data arrays. If False, the data arrays are modified in place.

Raises:

GriddingError (subclass of ValueError) if the data cannot be gridded.

Returns:

The generated MeshgridDataDict.
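The core reshaping step can be pictured with a toy example (plain Python, assuming 'C' order with the first axis slowest; not the actual implementation):

```python
# Flat records from raster-scanning a 2x3 grid: x is the slow axis,
# y the fast axis.
x_flat = [0, 0, 0, 1, 1, 1]
y_flat = [10, 20, 30, 10, 20, 30]
z_flat = [a + b for a, b in zip(x_flat, y_flat)]

def to_grid(flat, shape):
    """Reshape a flat record list into nested rows (C order)."""
    rows, cols = shape
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

z_grid = to_grid(z_flat, (2, 3))
# z_grid == [[10, 20, 30], [11, 21, 31]]
```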

labcore.data.datadict.datasets_are_equal(a: DataDictBase, b: DataDictBase, ignore_meta: bool = False) bool[source]#

Check whether two datasets are equal.

Compares type, structure, and content of all fields.

Parameters:
  • a – First dataset.

  • b – Second dataset.

  • ignore_meta – If True, do not verify if metadata matches.

Returns:

True or False.

labcore.data.datadict.datastructure_from_string(description: str) DataDict[source]#

Construct a DataDict from a string description.

Examples

  • "data[mV](x, y)" results in a datadict with one dependent data with unit mV and two independents, x and y, that do not have units.

  • "data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]" results in two dependents, one of them depending on x and y, the other only on x. Note that x and y have units. We can (but do not have to) omit them when specifying the dependencies.

  • "data_1[mV](x[mV], y[nT]); data_2[mA](x[mV])". Same result as the previous example.

Rules:

We recognize descriptions of the form field1[unit1](ax1, ax2, ...); field1[unit2](...); ....

  • Field names (like field1 and field2 above) have to start with a letter, and may contain word characters.

  • Field descriptors consist of the name, optional unit (presence signified by square brackets), and optional dependencies (presence signified by round brackets).

  • Dependencies (axes) are implicitly recognized as fields (and thus have the same naming restrictions as field names).

  • Axes are separated by commas.

  • Axes may have a unit when specified as dependency, but besides the name, square brackets, and commas no other characters are recognized within the round brackets that specify the dependency.

  • In addition to being specified as a dependency for a field, axes may also be specified as an additional field without dependencies, for instance to specify the unit (this may simplify the string). For example, z1(x, y); z2(x, y); x[V]; y[V].

  • Units may only consist of word characters.

  • Use of unexpected characters will result in the part containing the symbol being ignored.

  • The regular expression used to find field descriptors is: ((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?
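The quoted regular expression can be tried out directly; whitespace is omitted from the description here, since the pattern itself does not match spaces between descriptors:

```python
import re

# The field-descriptor pattern quoted above, applied to one of the
# example descriptions (written without spaces).
pattern = (r"((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?"
           r"(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?")
description = "data_1[mV](x,y);data_2[mA](x);x[mV];y[nT]"
fields = [m.group(0) for m in re.finditer(pattern, description)]
# fields == ["data_1[mV](x,y)", "data_2[mA](x)", "x[mV]", "y[nT]"]
```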

labcore.data.datadict.dd2df(dd: DataDict)[source]#

Make a pandas DataFrame from a datadict. Uses a MultiIndex, and assumes that all data fields are compatible.

Parameters:

dd (DataDict) – source data

Returns:

pandas DataFrame

Return type:

DataFrame

labcore.data.datadict.dd2xr(dd: MeshgridDataDict) Dataset[source]#

Makes an xarray Dataset from a MeshgridDataDict.

TODO: currently only supports 'regular' grids, i.e., all axes are independent of each other and can be represented by 1d arrays. For each axis, the first slice is used as coordinate values.

Parameters:

dd (MeshgridDataDict) – input data

Returns:

xarray Dataset

Return type:

xr.Dataset

labcore.data.datadict.guess_shape_from_datadict(data: DataDict) Dict[str, None | Tuple[List[str], Tuple[int, ...]]][source]#

Try to guess the shape of the datadict dependents from the axes values.

Parameters:

data – Dataset to examine.

Returns:

A dictionary with the dependents as keys, and inferred shapes as values. Value is None, if the shape could not be inferred.
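For a regular raster scan, the idea can be sketched as follows (a toy heuristic, not the actual algorithm): the inner-axis length is the number of records seen before the slow axis value first changes.

```python
# Toy shape-guessing sketch for a 2D raster scan with a slow and a fast axis.
def guess_2d_shape(slow_vals, fast_vals):
    inner = next(i for i in range(1, len(slow_vals))
                 if slow_vals[i] != slow_vals[0])
    outer, rem = divmod(len(slow_vals), inner)
    return (outer, inner) if rem == 0 else None  # None if it doesn't divide evenly

shape = guess_2d_shape([0, 0, 0, 1, 1, 1], [10, 20, 30, 10, 20, 30])
# shape == (2, 3)
```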

labcore.data.datadict.is_meta_key(key: str) bool[source]#

Checks if key is meta information.

Parameters:

key – The key we are checking.

Returns:

True if it is, False if it isn’t.

labcore.data.datadict.meshgrid_to_datadict(data: MeshgridDataDict) DataDict[source]#

Make a DataDict from a MeshgridDataDict by reshaping the data.

Parameters:

data – Input MeshgridDataDict.

Returns:

Flattened DataDict.

labcore.data.datadict.meta_key_to_name(key: str) str[source]#

Converts a meta data key to just the name. E.g., the key "__meta__" returns "meta".

Parameters:

key – The key that is being converted

Returns:

The name of the key.

Raises:

ValueError if the key is not a meta key.

labcore.data.datadict.meta_name_to_key(name: str) str[source]#

Converts a name into a meta data key. E.g., "meta" gets converted to "__meta__".

Parameters:

name – The name that is being converted.

Returns:

The meta data key based on name.
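Plausible pure-Python sketches of the three meta-key helpers, assuming the double-underscore convention shown in the examples above (not the actual implementations):

```python
# Sketches of the meta-key helpers, assuming keys of the form "__name__".
def is_meta_key(key):
    return key.startswith("__") and key.endswith("__")

def meta_key_to_name(key):
    if not is_meta_key(key):
        raise ValueError(f"{key} is not a meta key")
    return key[2:-2]  # strip the underscore pre/suffix

def meta_name_to_key(name):
    return f"__{name}__"

meta_key_to_name("__meta__")  # "meta"
meta_name_to_key("meta")      # "__meta__"
```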

labcore.data.datadict.str2dd(description: str) DataDict#

shortcut to datastructure_from_string().